Multiple tasks at 100%, still running after 150+ hours



Questions and Answers : Windows : Multiple tasks at 100%, still running after 150+ hours

Author Message
Jeff Rivett
Joined: 4 Mar 15
Posts: 3
Credit: 0
RAC: 0
Message 1981 - Posted: 18 Mar 2015, 17:49:15 UTC

There are eight Atlas tasks listed in BOINC, seven of which are now at 100%. The last is at 99.998% and slowly rising. The tasks at 100% are still shown as 'Running' and still using CPU. Each has a corresponding VBox VM. Some of them have been running for 150+ hours. I'm on 64-bit Windows 8.1 with 16 GB of RAM.

I originally had some trouble getting Atlas to work, but discovered Intel Virtualization was disabled in the BIOS. Once I enabled that, the VMs started to work as expected (no errors). I was able to confirm this by starting one of the VMs while its task was suspended.

Is it possible that the Progress indicator is simply incorrect? Before I enabled virtualization, the tasks seemed to be running, although they definitely weren't: when I started one of the (suspended) VMs, I saw the error message that pointed me to the virtualization setting in the BIOS. Perhaps BOINC thought those tasks were running when in fact they weren't, and the work is only now continuing, even though the Progress indicator is at 100%. If I'm right about that, the tasks should eventually complete.

I've read a few of the other posts here and tried some of the suggestions for similar problems, but so far no luck in figuring this out.

I originally had BOINC configured to use 100% of the CPUs (eight of them), which is why there are eight Atlas tasks. Earlier today I dropped the setting to 50% as a test; the result was that only four of the eight tasks showed as running, which makes sense.

Eventually, of course, the tasks will reach their deadline and time out, at which point I'll most likely just abort them.

Jeff Rivett
Joined: 4 Mar 15
Posts: 3
Credit: 0
RAC: 0
Message 1984 - Posted: 19 Mar 2015, 13:24:21 UTC
Last modified: 19 Mar 2015, 13:25:30 UTC

All eight Atlas tasks now show Progress as 100% and Status as 'Running'.

I installed the VirtualBox Extension Pack to see if I could remote into one of the running Atlas VMs. That works, but all I see is a Linux boot menu, and I can't do or see anything else.

I started one of the suspended Atlas VMs, and recorded some interesting messages that appeared during the boot process:

"mount: mount point /proc/bus/usb does not exist"

"Starting udev:udev-work[1648]: error changing netif name 'eth0' to 'eth8': Device or resource busy"

"Bringing up interface eth1: Device eth1 does not seem to be present, delaying initialization. [FAILED]"

"cp: cannot stat '/home/atlas01/shared/check_env.sh': no such file or directory"

"chown: cannot dereference '/home/atlas01/RunAtlas/runJob/py': no such file or directory"

Are any of these messages serious and/or relevant to the problem?

Georg Stoifl
Joined: 28 Oct 14
Posts: 75
Credit: 5,187,269
RAC: 0
Message 2053 - Posted: 25 Mar 2015, 11:58:59 UTC

Hi,

Check whether you're running into the same issue:

http://atlasathome.cern.ch/forum_thread.php?id=201
http://atlasathome.cern.ch/forum_thread.php?id=187

Running multiple hypervisors on 64-bit Windows doesn't work (VirtualBox + Windows Hyper-V). Windows 8.1 has Hyper-V installed by default.

Georg

Jeff Rivett
Joined: 4 Mar 15
Posts: 3
Credit: 0
RAC: 0
Message 2054 - Posted: 25 Mar 2015, 13:29:28 UTC - in response to Message 2053.

Hyper-V is not installed. In any case, I already gave up and reset Atlas, losing several hundred hours of work. Now nothing is happening at all with Atlas. No new tasks. Nothing. And BOINC now says 'Your app_config.xml file refers to an unknown application 'ATLAS'. Known applications: None'. Too many problems for me. Removing Atlas.

UncleBenZ
Joined: 18 Jun 15
Posts: 1
Credit: 0
RAC: 0
Message 2590 - Posted: 28 Jun 2015, 15:29:11 UTC
Last modified: 28 Jun 2015, 15:29:50 UTC

I have the same problem. Two tasks are at 100% with no remaining time shown (---), and two more are at 100% with the remaining time at 00:00:00.
I have VirtualBox 4.3.28 and its Extension Pack.
Also Windows 8.1 64-bit.

So if 8.1 has Hyper-V installed and I have VirtualBox installed, then I have both, and that's why it's not working?

But then why does it seem to be "working" without ever validating?

The tasks' deadlines are 2 July, 3 July, and 7 July (two tasks).

Where can I see whether Hyper-V is installed, and how do I uninstall it?
Also, virtualization is enabled in the BIOS, but it started working when it was disabled.
Here is a screenshot:
http://www.bilder-upload.eu/show.php?file=a8e1fc-1435505918.png

The first two tasks show this screen in VirtualBox. The other two just show a black screen with the VM's name.
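For the Hyper-V question above: on Windows 8 and later, the built-in `systeminfo` command prints a "Hyper-V Requirements" section. As far as I know, if a hypervisor such as Hyper-V is already running, it prints "A hypervisor has been detected" instead of the individual checks; otherwise one of the checks is "Virtualization Enabled In Firmware", i.e. the BIOS VT-x/AMD-V switch. The sketch below only parses that text. It assumes English-language output, the function name `hyperv_status` is made up, and on Windows 7 (which, as far as I know, has no such section) it returns `None` for the firmware check. You can also look for Hyper-V under "Turn Windows features on or off".

```python
import re
import subprocess
import sys


def hyperv_status(systeminfo_text: str) -> dict:
    """Summarise the 'Hyper-V Requirements' part of English `systeminfo` output."""
    status = {"hypervisor_running": False, "firmware_virtualization": None}
    # If a hypervisor (e.g. Hyper-V) is already active, systeminfo prints this
    # sentence instead of the individual requirement checks.
    if "A hypervisor has been detected" in systeminfo_text:
        status["hypervisor_running"] = True
        return status
    # Otherwise, look for the firmware (BIOS) virtualization check.
    match = re.search(r"Virtualization Enabled In Firmware:\s*(Yes|No)", systeminfo_text)
    if match:
        status["firmware_virtualization"] = match.group(1) == "Yes"
    # None means the line was not found (e.g. Windows 7, or non-English output).
    return status


if __name__ == "__main__" and sys.platform == "win32":
    output = subprocess.run(["systeminfo"], capture_output=True, text=True).stdout
    print(hyperv_status(output))
```

If `hypervisor_running` comes back True, VirtualBox 4.3 would be competing with another hypervisor for VT-x, which is the conflict described earlier in the thread; if `firmware_virtualization` is False, the BIOS switch is still off.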

JezusCorp
Joined: 20 Aug 15
Posts: 3
Credit: 0
RAC: 0
Message 2893 - Posted: 26 Aug 2015, 13:00:04 UTC
Last modified: 26 Aug 2015, 13:00:45 UTC

I also have this problem. But I noticed that whenever the tasks reach 100%, they drop back to 99.996%. It has been running like this since last Friday, August 21st. I just restarted BOINC Manager, but nothing changed.

In case it helps, here is my event log since the restart. The setting is "always run", using 50% while the computer is in use.

PS: I erased all information that could identify me or my PC name.

26/08/2015 8:44:36 AM | | cc_config.xml not found - using defaults
26/08/2015 8:44:37 AM | | Starting BOINC client version 7.4.42 for windows_x86_64
26/08/2015 8:44:37 AM | | log flags: file_xfer, sched_ops, task
26/08/2015 8:44:37 AM | | Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.8
26/08/2015 8:44:37 AM | | Data directory: C:\ProgramData\BOINC
26/08/2015 8:44:37 AM | | Running under account XXXXX
26/08/2015 8:44:37 AM | | CUDA: NVIDIA GPU 0: GeForce 210 (driver version 341.81, CUDA version 6.5, compute capability 1.2, 1024MB, 737MB available, 67 GFLOPS peak)
26/08/2015 8:44:37 AM | | OpenCL: NVIDIA GPU 0: GeForce 210 (driver version 341.81, device version OpenCL 1.0 CUDA, 1024MB, 737MB available, 67 GFLOPS peak)
26/08/2015 8:44:37 AM | | Host name: XXXXX
26/08/2015 8:44:37 AM | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5-2380P CPU @ 3.10GHz [Family 6 Model 42 Stepping 7]
26/08/2015 8:44:37 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm avx vmx tm2 pbe
26/08/2015 8:44:37 AM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
26/08/2015 8:44:37 AM | | Memory: 7.97 GB physical, 15.95 GB virtual
26/08/2015 8:44:37 AM | | Disk: 390.53 GB total, 120.89 GB free
26/08/2015 8:44:37 AM | | Local time is UTC XXXXX hours
26/08/2015 8:44:37 AM | | VirtualBox version: 4.3.12
26/08/2015 8:44:37 AM | ATLAS@home | URL http://atlasathome.cern.ch/; Computer ID 25804; resource share 100
26/08/2015 8:44:37 AM | ATLAS@home | General prefs: from ATLAS@home (last modified 20-Aug-2015 09:21:29)
26/08/2015 8:44:37 AM | ATLAS@home | Host location: none
26/08/2015 8:44:37 AM | ATLAS@home | General prefs: using your defaults
26/08/2015 8:44:37 AM | | Reading preferences override file
26/08/2015 8:44:37 AM | | Preferences:
26/08/2015 8:44:37 AM | | max memory usage when active: 4083.19MB
26/08/2015 8:44:37 AM | | max memory usage when idle: 7349.75MB
26/08/2015 8:44:37 AM | | max disk usage: 100.00GB
26/08/2015 8:44:37 AM | | don't compute while active
26/08/2015 8:44:37 AM | | don't use GPU while active
26/08/2015 8:44:37 AM | | suspend work if non-BOINC CPU load exceeds 25%
26/08/2015 8:44:37 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
26/08/2015 8:44:37 AM | | Not using a proxy
26/08/2015 8:44:37 AM | | Suspending computation - computer is in use
26/08/2015 8:44:37 AM | | Suspending network activity - computer is in use
26/08/2015 8:50:53 AM | | Suspending GPU computation - computer is in use
26/08/2015 8:50:54 AM | | Resuming GPU computation
26/08/2015 8:50:55 AM | | Resuming network activity
26/08/2015 8:50:55 AM | ATLAS@home | Sending scheduler request: To fetch work.
26/08/2015 8:50:55 AM | ATLAS@home | Requesting new tasks for CPU
26/08/2015 8:50:57 AM | ATLAS@home | Scheduler request completed: got 0 new tasks
26/08/2015 8:50:57 AM | ATLAS@home | No tasks sent

JezusCorp
Joined: 20 Aug 15
Posts: 3
Credit: 0
RAC: 0
Message 2899 - Posted: 27 Aug 2015, 18:03:51 UTC - in response to Message 2893.

I tried to start one task's VM manually via VirtualBox Manager. An error message keeps appearing:

VT-x/AMD-V hardware acceleration is not available on your system. Your 64-bit guest will fail to detect a 64-bit CPU and will not be able to boot.


Any ideas what I should do?

My system is Windows 7: Professional x64 Edition, Service Pack 1, Memory: 7.97 GB physical, 15.95 GB virtual

JezusCorp
Joined: 20 Aug 15
Posts: 3
Credit: 0
RAC: 0
Message 2903 - Posted: 28 Aug 2015, 12:43:34 UTC - in response to Message 2899.

Yesterday I managed to get virtualization enabled in my BIOS.

The thing is, now I get this error message in BOINC Manager:

Computation error


Three of the four tasks are at 100% and show this message.

On top of all that, my computer has crashed twice since I enabled virtualization, and I had to reboot everything.

Any ideas?

I will suspend everything until someone can pinpoint my problems.

Valter
Joined: 29 Jun 14
Posts: 3
Credit: 0
RAC: 0
Message 3098 - Posted: 6 Oct 2015, 18:15:48 UTC

Hi to all!

I have a WU stuck at 99.998%. It has been running for 29 hours and has shown 1 second of remaining time for hours.

Should I leave it or abort it?

Many thanks,

Valter Aguiar
denise.valter@cmg.com.br

Mike
Joined: 22 Oct 15
Posts: 7
Credit: 7,033
RAC: 0
Message 3169 - Posted: 27 Oct 2015, 20:17:42 UTC
Last modified: 27 Oct 2015, 20:39:36 UTC

I have the same problem where the units never finish. After poking around for a while, I was able to open the virtual machine's window through the VirtualBox Manager. I then discovered that the virtual machine had never booted at all, yet BOINC didn't indicate any sort of problem, so there must be no communication between BOINC and the virtual machine.

I fiddled with my BIOS settings and was able to get the virtual machines further into their boot process, but they seem to stall at a login prompt and then time out with an error message stating they can't connect. At that point the virtual machine automatically shuts down and BOINC marks it as a computation error. As far as I can tell, I've yet to have a single WU even start.

After considerable poking around in other forums, I suspect it may be a networking issue with VirtualBox. My firewall is configured to allow VirtualBox through, and BOINC has no problem with network connectivity. I get over 73 Mbps at speedtest.net, so I doubt the connection speed is a problem.

Any suggestions?
Thanks,
Mike

André Viveiros
Joined: 28 Oct 15
Posts: 1
Credit: 513
RAC: 0
Message 3264 - Posted: 4 Nov 2015, 13:26:47 UTC
Last modified: 4 Nov 2015, 13:30:47 UTC

Same problem here! :/
I reset the project and lost a huge number of processing hours.

Maybe I will close my account. Is there any resolution for this problem?

Etienne Guyot
Joined: 29 Nov 15
Posts: 10
Credit: 36,003
RAC: 0
Message 3424 - Posted: 6 Dec 2015, 10:21:51 UTC - in response to Message 3264.

Just enable VT-x in your BIOS settings (Intel processor) or the equivalent (AMD-V) on an AMD motherboard.

Depending on your motherboard, you may have to go into the advanced settings to find the corresponding switch.

On my own system (Windows 7 Pro 64-bit on an Asus Z97 Pro, i7 @ 4 GHz, 16 GB of RAM), the switch was disabled by default, so I had to enable it manually. Then I had to abort all previous ATLAS tasks and let BOINC Manager get some new ones.
It runs just fine now; a task completes in a little more than one hour.

Bye

we
Joined: 22 Nov 15
Posts: 1
Credit: 59,572
RAC: 0
Message 3519 - Posted: 27 Dec 2015, 8:22:47 UTC - in response to Message 3264.

Exactly the same issue: 100% for daaaays! But sometimes it works, so some jobs get through. I don't know why.
I wonder why no one from the BOINC team responds with a solution.
I will remove ATLAS if nothing happens; it's stealing computing power that could go to someone else...

BoincBoinc8
Joined: 23 Dec 15
Posts: 15
Credit: 184,308
RAC: 0
Message 3636 - Posted: 25 Jan 2016, 19:18:55 UTC - in response to Message 3519.
Last modified: 25 Jan 2016, 19:33:08 UTC

I've got the same issue and will not accept new tasks until it is fixed. The issue is a bit random for me: sometimes it works perfectly, and sometimes I get a computation error, or a task is stuck at 100% for ages and then ends in a computation error. Better to abort and give my computing power to another project.
Note that Hyper-V is disabled on my machines.

State: All (84) · In progress (3) · Validation pending (0) · Validation inconclusive (0) · Valid (28) · Invalid (12) · Error (41)

Something is wrong, no?
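To put a number on "something is wrong", the state line above can be turned into counts and shares. This is a small sketch: treating Valid, Invalid and Error as the "finished" tasks (leaving out In progress and the two Validation states) is my own assumption, and `task_state_counts` / `finished_shares` are illustrative names.

```python
import re


def task_state_counts(state_line: str) -> dict:
    """Parse 'State: All (84) · In progress (3) · ...' into {label: count}."""
    pairs = re.findall(r"([A-Za-z ]+?)\s*\((\d+)\)", state_line)
    return {label.strip(): int(count) for label, count in pairs}


def finished_shares(counts: dict) -> dict:
    """Share of Valid / Invalid / Error among finished tasks (assumed definition)."""
    finished = counts["Valid"] + counts["Invalid"] + counts["Error"]
    return {label: counts[label] / finished for label in ("Valid", "Invalid", "Error")}


if __name__ == "__main__":
    line = ("State: All (84) · In progress (3) · Validation pending (0) · "
            "Validation inconclusive (0) · Valid (28) · Invalid (12) · Error (41)")
    for label, share in finished_shares(task_state_counts(line)).items():
        print(f"{label}: {share:.0%}")
```

With the numbers posted, 41 of the 81 finished tasks (about 51%) ended in Error and only 28 (about 35%) were Valid.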

Mateon1
Joined: 28 May 15
Posts: 4
Credit: 13,077
RAC: 0
Message 3683 - Posted: 3 Feb 2016, 6:54:14 UTC

I have this issue and I don't know how to fix it. VirtualBox seems to be working, but no work is getting done, and no CPU is being used.
When I open the running ATLAS VMs in VirtualBox, I get a message saying:

FATAL: Could not read from the boot medium! System halted.

No storage is attached to the VM.

hsdecalc
Joined: 21 Feb 15
Posts: 5
Credit: 494,496
RAC: 0
Message 3691 - Posted: 7 Feb 2016, 16:41:47 UTC
Last modified: 7 Feb 2016, 16:51:24 UTC

Same problem here. Some infos:

On my first PC, with Windows 8.1 and an i7-4790, ATLAS WUs run well.
My second PC, with an i7-5820K CPU (no on-chip GPU) on Windows 10, can't start the ATLAS job.
On both PCs, jobs from the VirtualLHC@home project run without problems. So it is not a problem with the VT-x BIOS option?

On the bad one, the file vbox_checkpoint.xml contains:
<vbox_checkpoint>
<elapsed_time>13224.452841</elapsed_time>
<cpu_time>6.656250</cpu_time>
<webapi_port>50134</webapi_port>
<remote_desktop_port>50135</remote_desktop_port>
</vbox_checkpoint>

The CPU time stops between 5 and 6 seconds and never increases.
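This kind of stall can be checked mechanically by comparing `cpu_time` with `elapsed_time` in vbox_checkpoint.xml. A minimal sketch, assuming `cpu_time` is the CPU time the VM has used so far and that a healthy task uses a sizeable fraction of its elapsed time; the 5% cut-off and the function names are hypothetical, not anything the BOINC VirtualBox wrapper defines.

```python
import xml.etree.ElementTree as ET


def cpu_to_elapsed_ratio(checkpoint_xml: str) -> float:
    """Return cpu_time / elapsed_time from a vbox_checkpoint.xml document."""
    root = ET.fromstring(checkpoint_xml)
    elapsed = float(root.findtext("elapsed_time", default="0"))
    cpu = float(root.findtext("cpu_time", default="0"))
    return cpu / elapsed if elapsed > 0 else 0.0


def looks_stalled(checkpoint_xml: str, min_ratio: float = 0.05) -> bool:
    # Hypothetical threshold: a VM that has used under 5% CPU over its
    # whole elapsed time is probably not doing any real work.
    return cpu_to_elapsed_ratio(checkpoint_xml) < min_ratio


if __name__ == "__main__":
    posted = """<vbox_checkpoint>
<elapsed_time>13224.452841</elapsed_time>
<cpu_time>6.656250</cpu_time>
<webapi_port>50134</webapi_port>
<remote_desktop_port>50135</remote_desktop_port>
</vbox_checkpoint>"""
    print(f"{cpu_to_elapsed_ratio(posted):.5f}", looks_stalled(posted))
```

For the posted checkpoint the ratio is about 0.0005 (under 7 seconds of CPU in roughly 3.7 hours), far below any plausible cut-off.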
I compared the log files from both PCs. The only difference I see is in the log file VBox.log:

WIN 10 (bad one):
00:00:02.608402 PIIX3 ATA: Ctl#1: finished processing RESET
00:00:02.610928 PIT: mode=2 count=0x48d3 (18643) - 64.00 Hz (ch=0)
00:00:02.613302 Display::handleDisplayResize: uScreenId=0 pvVRAM=0000000009bf0000 w=640 h=480 bpp=32 cbLine=0xA00 flags=0x1
00:00:05.082823 PIT: mode=2 count=0x10000 (65536) - 18.20 Hz (ch=0)
00:00:05.083085 VMMDev: Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
00:00:05.083988 VMMDev: Guest Log: BIOS: Booting from Hard Disk...
00:00:05.089192 Display::handleDisplayResize: uScreenId=0 pvVRAM=0000000000000000 w=720 h=400 bpp=0 cbLine=0x0 flags=0x1
00:00:05.175215 Display::handleDisplayResize: uScreenId=0 pvVRAM=0000000009bf0000 w=640 h=480 bpp=24 cbLine=0x780 flags=0x1
--- the line above has been the last line for hours ---

--- if I suspend and resume the WU, these lines are added: ---
03:52:17.894767 Changing the VM state from 'RUNNING' to 'SUSPENDING'
03:52:17.920855 PDMR3Suspend: 26 018 954 ns run time
03:52:17.920888 Changing the VM state from 'SUSPENDING' to 'SUSPENDED'
03:52:17.920904 Console: Machine state changed to 'Paused'
03:54:28.060469 Changing the VM state from 'RESUMING' to 'RUNNING'
03:54:28.060491 Console: Machine state changed to 'Running'

WIN8.1 (ok)
00:00:02.963314 PIIX3 ATA: Ctl#1: RESET, DevSel=0 AIOIf=0 CmdIf0=0x00 (-1 usec ago) CmdIf1=0x00 (-1 usec ago)
00:00:02.963341 PIIX3 ATA: Ctl#1: finished processing RESET
00:00:02.976016 PIT: mode=2 count=0x48d3 (18643) - 64.00 Hz (ch=0)
00:00:02.986934 Display::handleDisplayResize: uScreenId=0 pvVRAM=0000000006340000 w=640 h=480 bpp=32 cbLine=0xA00 flags=0x1
00:00:05.457821 Display::handleDisplayResize: uScreenId=0 pvVRAM=0000000000000000 w=720 h=400 bpp=0 cbLine=0x0 flags=0x1
00:00:05.459821 PIT: mode=2 count=0x10000 (65536) - 18.20 Hz (ch=0)
00:00:05.460153 VMMDev: Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
00:00:05.460540 VMMDev: Guest Log: BIOS: Booting from Hard Disk...
00:00:05.543837 Display::handleDisplayResize: uScreenId=0 pvVRAM=0000000006340000 w=640 h=480 bpp=24 cbLine=0x780 flags=0x1
00:00:07.696866 VMMDev: Guest Log: BIOS: KBD: unsupported int 16h function 03
00:00:07.697245 VMMDev: Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
00:00:07.794308 Display::handleDisplayResize: uScreenId=0 pvVRAM=0000000000000000 w=720 h=400 bpp=0 cbLine=0x0 flags=0x1

00:00:08.520127 GIM: KVM: VCPU 0: Enabled system-time struct. at 0x000000008c0eb000 - u32TscScale=0xbda1125a i8TscShift=-1 uVersion=2 fFlags=0x1 uTsc=0x382951677 uVirtNanoTS=0x14cd3f727
00:00:08.520162 TM: Switching TSC mode from 'Dynamic' to 'RealTscOffset'
00:00:08.567302 GIM: KVM: VCPU 0: Enabled system-time struct. at 0x000000008c0eb000 - u32TscScale=0xbda1125a i8TscShift=-1 uVersion=4 fFlags=0x1 uTsc=0x382951677 uVirtNanoTS=0x14cd3f727
00:00:08.611044 GIM: KVM: Enabled wall-clock struct. at 0x0000000001e2d348 - u32Sec=1454840458 u32Nano=830864200 uVersion=2
…..
00:02:17.737835 VMMDev: Guest Log: Copying input files into RunAtlas.
00:02:18.383624 VMMDev: Guest Log: Copied input files into RunAtlas.
00:02:18.412378 VMMDev: Guest Log: Starting ATLAS job. Output is redirected into runtime_log.

I never see the lines after 00:00:07.794.. on my Windows 10 PC.
No error messages. The job is running but consumes no CPU time.
Could it be a problem with CPUs that have no integrated GPU?
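The log comparison above can be automated by pulling out the "VMMDev: Guest Log:" lines from VBox.log, which is where the guest reports its own progress. A sketch; the marker strings are copied from the log excerpts posted here, and `guest_progress` is a made-up name. On the bad machine the last guest message is the BIOS "Booting from Hard Disk...", while on the good one it reaches "Starting ATLAS job".

```python
import sys

GUEST_MARKER = "VMMDev: Guest Log: "
JOB_STARTED = "Starting ATLAS job"


def guest_progress(log_lines):
    """Return (last guest log message, whether the ATLAS job start message appeared)."""
    last_message = None
    job_started = False
    for line in log_lines:
        pos = line.find(GUEST_MARKER)
        if pos == -1:
            continue  # host-side line, not a message from the guest
        last_message = line[pos + len(GUEST_MARKER):].strip()
        if last_message.startswith(JOB_STARTED):
            job_started = True
    return last_message, job_started


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python guest_progress.py path/to/VBox.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(guest_progress(f))
```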

Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
Message 3694 - Posted: 7 Feb 2016, 18:46:59 UTC

Please try this checklist:

http://atlasathome.cern.ch/forum_thread.php?id=438

NateM
Joined: 22 Dec 14
Posts: 1
Credit: 201,011
RAC: 0
Message 3695 - Posted: 7 Feb 2016, 19:10:31 UTC

I had the same issue after installing on a new computer: tasks at 100% but never finishing. After going into the BIOS and verifying that VT-d and Intel Virtualization Technology were both enabled, the problem stopped.

hsdecalc
Joined: 21 Feb 15
Posts: 5
Credit: 494,496
RAC: 0
Message 3696 - Posted: 7 Feb 2016, 19:11:51 UTC

Some weeks ago it did not work with this setting:
Intel (R) VT-d feature = enabled
My system crashed!

Now I have checked my BIOS settings again.
I enabled only this setting:
Intel (R) Intel Virtualization Technology = enabled

Now the VM is working!
I cancelled the WUs that had already started.
WUs that had not started before are running now.
Problem solved.

Administrator
Joined: 13 Jan 16
Posts: 1
Credit: 0
RAC: 0
Message 3717 - Posted: 13 Feb 2016, 3:02:20 UTC - in response to Message 3424.

I'm going to bet that I'm just out of luck resolving the issue on Windows Server 2003 x64.
