The cupboard is bare!


Message boards : Number crunching : The cupboard is bare!

Fuzzy Duck
Joined: 3 Dec 15
Posts: 33
Credit: 5,074,231
RAC: 0
    
Message 5728 - Posted: 29 Nov 2016, 15:47:17 UTC

Please note that there are no new WUs available.

Please replenish.

Erich
Joined: 18 Dec 15
Posts: 253
Credit: 1,942,248
RAC: 0
    
Message 5729 - Posted: 29 Nov 2016, 19:49:37 UTC

David, any major problem around?

Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
    
Message 5730 - Posted: 29 Nov 2016, 23:16:27 UTC

Did anyone from the project team notice that we are out of work again?

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 13 May 14
Posts: 252
Credit: 2,028,556
RAC: 1
    
Message 5731 - Posted: 30 Nov 2016, 8:17:53 UTC - in response to Message 5730.

Yes, we know and are working to get some new tasks defined.

Erich
Joined: 18 Dec 15
Posts: 253
Credit: 1,942,248
RAC: 0
    
Message 5735 - Posted: 1 Dec 2016, 7:24:45 UTC - in response to Message 5731.

Yes, we know and are working to get some new tasks defined.

David, any idea when new tasks will become available?

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 13 May 14
Posts: 252
Credit: 2,028,556
RAC: 1
    
Message 5737 - Posted: 1 Dec 2016, 13:07:07 UTC - in response to Message 5735.

Sorry, it took a bit longer than expected but we have new tasks now.

[AF] Marc
Joined: 20 May 16
Posts: 8
Credit: 1,004,053
RAC: 0
    
Message 5738 - Posted: 1 Dec 2016, 13:31:37 UTC - in response to Message 5737.

Hi, no problem. I think we are all here to help the ATLAS@home project.
So thank you for giving our computers some work; they were greedy :-)

Patrick
Joined: 21 Oct 16
Posts: 2
Credit: 58,961
RAC: 0
    
Message 5740 - Posted: 1 Dec 2016, 22:51:50 UTC
Last modified: 1 Dec 2016, 22:53:34 UTC

Work units from this latest batch are not being processed properly by my machine. It's not properly using my CPUs, when it was doing just fine before. Look at how low the CPU time is compared with the elapsed time; with 4 cores it should be about 4x the elapsed time. The host ID is 60192.


2016-12-01 07:23:11 (9172): Guest Log: Starting ATLAS job. (PandaID=3104537457)
2016-12-01 09:02:43 (9172): Status Report: Elapsed Time: '6001.869226'
2016-12-01 09:02:43 (9172): Status Report: CPU Time: '478.859375'
2016-12-01 10:42:48 (9172): Status Report: Elapsed Time: '12006.620356'
2016-12-01 10:42:48 (9172): Status Report: CPU Time: '550.953125'
2016-12-01 12:22:52 (9172): Status Report: Elapsed Time: '18011.408161'
2016-12-01 12:22:52 (9172): Status Report: CPU Time: '657.734375'
2016-12-01 14:02:57 (9172): Status Report: Elapsed Time: '24016.214005'
2016-12-01 14:02:57 (9172): Status Report: CPU Time: '725.671875'


Here is my app_config

<app_config>
    <app>
        <name>ATLAS_MCORE</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>ATLAS</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app_version>
        <app_name>ATLAS_MCORE</app_name>
        <plan_class>vbox_64_mt_mcore</plan_class>
        <avg_ncpus>4</avg_ncpus>
        <cmdline>--memory_size_mb 5700</cmdline>
    </app_version>
</app_config>
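
For anyone who wants to check their own tasks the same way, here is a minimal sketch that pairs the "Elapsed Time" and "CPU Time" status reports from the log above and prints the ratio between them. The function name `cpu_ratios` and the constant `NCPUS` are just illustrative names; the value 4 is taken from `avg_ncpus` in the app_config, and the assumption is that a healthy 4-core task would show a ratio somewhere near 4.

```python
import re

# Status reports copied from the log posted above.
LOG = """\
2016-12-01 09:02:43 (9172): Status Report: Elapsed Time: '6001.869226'
2016-12-01 09:02:43 (9172): Status Report: CPU Time: '478.859375'
2016-12-01 10:42:48 (9172): Status Report: Elapsed Time: '12006.620356'
2016-12-01 10:42:48 (9172): Status Report: CPU Time: '550.953125'
2016-12-01 12:22:52 (9172): Status Report: Elapsed Time: '18011.408161'
2016-12-01 12:22:52 (9172): Status Report: CPU Time: '657.734375'
2016-12-01 14:02:57 (9172): Status Report: Elapsed Time: '24016.214005'
2016-12-01 14:02:57 (9172): Status Report: CPU Time: '725.671875'
"""

NCPUS = 4  # assumption: avg_ncpus from the app_config above

PATTERN = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def cpu_ratios(log_text):
    """Pair each Elapsed Time report with the CPU Time report that follows it.

    Returns a list of (elapsed_seconds, cpu_seconds, cpu/elapsed) tuples.
    """
    rows = []
    elapsed = None
    for kind, value in PATTERN.findall(log_text):
        if kind == "Elapsed Time":
            elapsed = float(value)
        elif elapsed is not None:
            cpu = float(value)
            rows.append((elapsed, cpu, cpu / elapsed))
            elapsed = None
    return rows


for elapsed, cpu, ratio in cpu_ratios(LOG):
    print(f"elapsed {elapsed:9.0f} s  cpu {cpu:6.0f} s  "
          f"ratio {ratio:.3f} (ideal close to {NCPUS})")
```

With these numbers every ratio stays below 0.1, so the task is using only a small fraction of a single core, let alone four.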

Patrick
Joined: 21 Oct 16
Posts: 2
Credit: 58,961
RAC: 0
    
Message 5741 - Posted: 2 Dec 2016, 3:15:30 UTC

Following the steps in the checklist has not fixed the problem, but I am unwilling to reinstall the BOINC client at the moment, as I have many other projects being worked on. I reinstalled VirtualBox, reset the project, and reset my PC multiple times. I may come back to this project if it begins to work again, but until then I am unable to do any more WUs.

Erich
Joined: 18 Dec 15
Posts: 253
Credit: 1,942,248
RAC: 0
    
Message 5742 - Posted: 2 Dec 2016, 4:48:52 UTC - in response to Message 5740.

Work units from this latest batch are not being processed properly by my machine. It's not properly using my CPUs, when it was doing just fine before.

same thing is happening here.
I have 3 multicore WUs (3 cores each) running for more than 5 hours now, and under "properties" I see that CPU usage was no more than about 10 minutes.
Why so?

Erich
Joined: 18 Dec 15
Posts: 253
Credit: 1,942,248
RAC: 0
    
Message 5744 - Posted: 2 Dec 2016, 6:17:55 UTC - in response to Message 5742.

Work units from this latest batch are not being processed properly by my machine. It's not properly using my CPUs, when it was doing just fine before.

same thing is happening here.
I have 3 multicore WUs (3 cores each) running for more than 5 hours now, and under "properties" I see that CPU usage was no more than about 10 minutes.
Why so?

again, the same problem happens with the next 3 WUs. What's wrong with them?
Are WUs not being tested before they are released for download?

Erich
Joined: 18 Dec 15
Posts: 253
Credit: 1,942,248
RAC: 0
    
Message 5745 - Posted: 2 Dec 2016, 6:56:23 UTC

I have now aborted all WUs that were downloaded yesterday and were waiting in the queue to be processed.
Then I downloaded several new ones.
However, they show the same problem: CPU usage only for the first few minutes, then it is nil.

So I am stopping ATLAS processing for the time being and switching to other projects.

It would be nice if the ATLAS people could tell us once healthy WUs are in the queue.
I will gladly switch back to ATLAS then.

computezrmle
Joined: 29 Oct 14
Posts: 54
Credit: 1,137,404
RAC: 0
    
Message 5746 - Posted: 2 Dec 2016, 7:30:34 UTC - in response to Message 5745.

... CPU usage only for the first few minutes, then it is nil.

Same problem on my hosts.

Andy_Taximan
Joined: 2 Feb 15
Posts: 40
Credit: 2,475,590
RAC: 0
    
Message 5747 - Posted: 2 Dec 2016, 8:30:09 UTC

same problem here :(

maeax
Joined: 25 Jun 14
Posts: 50
Credit: 1,700,662
RAC: 0
    
Message 5748 - Posted: 2 Dec 2016, 8:31:13 UTC

Multi-core tasks finish only after a long time, most of it without any CPU usage.

The Graphics button in BOINC shows no new collisions for ATLAS since yesterday.

http://atlasathome.cern.ch/result.php?resultid=7751768

Erich
Joined: 18 Dec 15
Posts: 253
Credit: 1,942,248
RAC: 0
    
Message 5749 - Posted: 2 Dec 2016, 8:37:48 UTC

David, what can you do to get this problem solved?

Brummig
Joined: 9 Feb 16
Posts: 32
Credit: 107,674
RAC: 0
    
Message 5751 - Posted: 2 Dec 2016, 15:38:32 UTC - in response to Message 5749.

Me too. Just aborted another Never Ending Task. Nine minutes of CPU, hours of slowly counting towards 100%, getting slower and slower.

Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
    
Message 5752 - Posted: 2 Dec 2016, 18:52:03 UTC

I have switched all my clients to LHC-Consolidated.

They will stay there until we are told that the problems with the current WUs are solved.

After switching back, it will take some time until all LHC WUs are finished and my ATLAS throughput returns to its former level.

Jesse Viviano
Joined: 20 Dec 15
Posts: 16
Credit: 328,453
RAC: 0
    
Message 5755 - Posted: 3 Dec 2016, 13:52:05 UTC

I first aborted all of my work units and then removed ATLAS@home because these jobs were barely using my CPU. I then decided to rejoin after seeing that some of the work units I had aborted were actually getting completed. The multicore work units I got did not use my CPU much, but some of them completed after nearly 10 hours each and were validated.

Jesse Viviano
Joined: 20 Dec 15
Posts: 16
Credit: 328,453
RAC: 0
    
Message 5756 - Posted: 3 Dec 2016, 14:04:27 UTC
Last modified: 3 Dec 2016, 14:04:53 UTC

The lines below from the stderr for task 7760181 suggest that there is something wrong with these tasks, because I normally do not see these lines in a stderr:

2016-12-03 05:24:22 (14548): Guest Log: - Last 10 lines from /home/atlas01/RunAtlas/Panda_Pilot_5050_1480725069/PandaJob_3104658639_1480725112/athena_stdout.txt -
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.preExecute 2016-12-03 01:33:15,159 INFO Batch/grid running - command outputs will not be echoed. Logs for EVNTtoHITS are in log.EVNTtoHITS
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.preExecute 2016-12-03 01:33:15,159 INFO Now writing wrapper for substep executor EVNTtoHITS
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2016-12-03 01:33:15,159 INFO Valgrind not engaged
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.preExecute 2016-12-03 01:33:15,160 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.execute 2016-12-03 01:33:15,160 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.execute 2016-12-03 11:21:49,392 INFO EVNTtoHITS executor returns 65
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.postExecute 2016-12-03 11:21:49,392 WARNING AthenaMP run was set to True, but no outputs file was found
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.validate 2016-12-03 11:21:49,469 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (65) (Error code 65)
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.validate 2016-12-03 11:21:49,481 INFO Scanning logfile log.EVNTtoHITS for errors
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.transform.execute 2016-12-03 11:21:49,649 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr ERROR Failure in waiting or sub-process finished abnormally"

This suggests that something is crashing within the virtual machines in the latest batch of work units.
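
To pull the same information out of other stderr files, something like the following sketch might help. It takes four of the lines quoted above, extracts the guest-side timestamp and severity from each, lists the ERROR and CRITICAL entries, and measures how long the EVNTtoHITS step ran before it returned. The regular expression and the names `parse`, `run_seconds` and `failures` are my own assumptions based only on the line layout seen in this one log; other tasks may log differently.

```python
import re
from datetime import datetime

# Four lines copied from the stderr excerpt above.
STDERR = """\
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.execute 2016-12-03 01:33:15,160 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.execute 2016-12-03 11:21:49,392 INFO EVNTtoHITS executor returns 65
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.trfExe.validate 2016-12-03 11:21:49,469 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (65) (Error code 65)
2016-12-03 05:24:22 (14548): Guest Log: PyJobTransforms.transform.execute 2016-12-03 11:21:49,649 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr ERROR Failure in waiting or sub-process finished abnormally"
"""

# Assumed layout: "Guest Log: <source> <date> <time>,<ms> <LEVEL> <message>"
LINE = re.compile(
    r"Guest Log: (\S+) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(INFO|WARNING|ERROR|CRITICAL) (.*)"
)


def parse(text):
    """Return (guest_time, level, source, message) for each matching line."""
    entries = []
    for line in text.splitlines():
        match = LINE.search(line)
        if match:
            source, stamp, level, message = match.groups()
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
            entries.append((when, level, source, message))
    return entries


entries = parse(STDERR)

# Time between the start of the EVNTtoHITS step and its return.
start = next(t for t, _, _, msg in entries if msg.startswith("Starting execution"))
end = next(t for t, _, _, msg in entries if "executor returns" in msg)
run_seconds = (end - start).total_seconds()

# Entries that indicate a failure.
failures = [(level, msg) for _, level, _, msg in entries
            if level in ("ERROR", "CRITICAL")]

print(f"EVNTtoHITS ran for {run_seconds / 3600:.1f} hours before returning")
for level, msg in failures:
    print(level, msg)
```

By the guest timestamps the step ran for just under 10 hours before it returned code 65, which matches how long these work units took on my machine.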
