ATLAS 1.29 is released


Wenjing Wu
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Jun 14
Posts: 31
Credit: 3,151,720
RAC: 204
      
Message 1233 - Posted: 3 Nov 2014, 10:15:38 UTC

ATLAS 1.29 is released. It has a minor change that increases the virtual machine memory from 1.5 GB to 2 GB, as we noticed some job failures due to memory limits.

Phil1966
Joined: 14 Jun 14
Posts: 39
Credit: 1,185,758
RAC: 0
    
Message 1234 - Posted: 3 Nov 2014, 10:28:13 UTC - in response to Message 1233.
Last modified: 3 Nov 2014, 10:28:50 UTC

please delete this post. thank you

Sherri C Bohn
Joined: 10 Oct 14
Posts: 1
Credit: 0
RAC: 0
Message 1235 - Posted: 3 Nov 2014, 11:03:12 UTC - in response to Message 1233.

Do I need to switch to ATLAS 1.29, and if I do, how do I go about that?

Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
    
Message 1236 - Posted: 3 Nov 2014, 11:44:22 UTC - in response to Message 1233.

What about the long runners with 1.28? I have jobs still running at 50 times the normal runtime for the box, and they are still going.

Abort them or keep them going?

Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
    
Message 1238 - Posted: 3 Nov 2014, 12:03:37 UTC - in response to Message 1233.

Are you sure that it is already working?

I'm not sure whether BOINCTask is reporting memory correctly, but it shows the same amount of memory (1.464,84 MB) for 1.28 and 1.29 tasks.

fabby
Joined: 24 Oct 14
Posts: 31
Credit: 13,947
RAC: 0
    
Message 1240 - Posted: 3 Nov 2014, 12:15:48 UTC - in response to Message 1233.

Just to be absolutely clear: for which platforms are you releasing 1.29?
Because I'm on Linux and am still running 1.27 tasks as of a few minutes ago...

;)

Fabby

zombie67 [MM]
Joined: 18 Jun 14
Posts: 31
Credit: 1,175,117
RAC: 0
    
Message 1242 - Posted: 3 Nov 2014, 13:37:40 UTC - in response to Message 1240.
Last modified: 3 Nov 2014, 13:38:22 UTC

All three apps were updated to 1.29.

http://atlasathome.cern.ch/apps.php

New tasks that are downloaded will get the new app.
____________
Dublin, California
Team: SETI.USA

fabby
Joined: 24 Oct 14
Posts: 31
Credit: 13,947
RAC: 0
    
Message 1244 - Posted: 3 Nov 2014, 18:51:24 UTC - in response to Message 1242.

Thanks Zombie! (Feeling dumb now)...

I clicked to like your post, but I don't know whether you saw anything of that, as I can't see the post's rating changing, so here is a little note as well!

F.

Andrej Filipcic
Project administrator
Project developer
Project tester
Project scientist
Joined: 3 Jun 14
Posts: 41
Credit: 3,891,911
RAC: 0
    
Message 1254 - Posted: 4 Nov 2014, 10:26:02 UTC

The main reason for increasing the memory to 2 GB was that a high fraction of the more complex jobs ended badly because the main process in the VM was killed. Those jobs use ~1.1 GB RSS, but their virtual memory is close to 2 GB, so they die if swap space is not configured.

I have been monitoring the 1.29 jobs on my desktops and they seem to proceed fine, without the issues present in 1.28. If you still have long runners with 1.28 active, I suggest killing them.

Note that memory usage rises slowly from a few hundred MB to 2 GB over about 15-30 min of runtime.
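
To make the difference between RSS and virtual memory more concrete, here is a minimal Python sketch that pulls both figures out of text in the format of Linux's /proc/<pid>/status. The function name and the sample values are assumptions for illustration only; this is not how the ATLAS application or the project monitors memory.

```python
def parse_memory_kb(status_text):
    """Return VmRSS (resident) and VmSize (virtual) in kB from /proc/<pid>/status text."""
    wanted = {"VmRSS", "VmSize"}
    result = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            # Lines look like "VmRSS:   1150000 kB"; the first field is the number.
            result[key] = int(rest.split()[0])
    return result


# Hypothetical sample resembling the ~1.1 GB RSS / ~2 GB virtual case described above.
SAMPLE = "Name:\tjob\nVmSize:\t 2050000 kB\nVmRSS:\t 1150000 kB\n"

mem = parse_memory_kb(SAMPLE)
print(f"RSS {mem['VmRSS'] / 1e6:.2f} GB, virtual {mem['VmSize'] / 1e6:.2f} GB")
```

On a real Linux system the same function could be fed the contents of /proc/self/status; the gap between the two numbers is what has to fit in RAM plus swap once the pages are actually touched.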

rbpeake
Joined: 27 Jun 14
Posts: 86
Credit: 8,794,961
RAC: 0
    
Message 1266 - Posted: 4 Nov 2014, 14:11:30 UTC - in response to Message 1254.
Last modified: 4 Nov 2014, 14:13:29 UTC

I am getting some long runners (where the CPU stops but the work unit keeps on running and running) with 1.29 as well. Here http://atlasathome.cern.ch/result.php?resultid=413604 and here http://atlasathome.cern.ch/result.php?resultid=414018.

Thanks.

Tom*
Joined: 28 Jun 14
Posts: 118
Credit: 8,761,428
RAC: 0
    
Message 1267 - Posted: 4 Nov 2014, 14:35:01 UTC

1.29 fixed my long runners. I am sure there are other reasons for long runners than just the VM size, such as disks going offline or being mounted read-only.

So a big thank you for 1.29; I am almost out of babysitting mode.

fabby
Joined: 24 Oct 14
Posts: 31
Credit: 13,947
RAC: 0
    
Message 1276 - Posted: 5 Nov 2014, 19:43:25 UTC - in response to Message 1267.

Just wanted to let everyone know: I'm running my first 1.29 job and it's consuming 2.1 GB of RAM (and 8 MB of swap). So, for everyone out there who has only 2 GB of RAM per CPU core: adjust your computing preferences to reduce the number of concurrent ATLAS@Home jobs (...or buy more RAM) ;)

I use the absolute number, as I only have 1 machine with 2 cores. The Yetis among us might want to:


    Swap some memory around OR
    Set the % to 87.5 and take out the 4GB machine OR
    Keep the 4GB machine and set the % to 75%


to avoid excessive swapping...
Yeti, please feed back to post 1267 what the new rule of thumb should be after you've done some baby-sitting! >:)

Tom*
Joined: 28 Jun 14
Posts: 118
Credit: 8,761,428
RAC: 0
    
Message 1277 - Posted: 5 Nov 2014, 21:02:12 UTC - in response to Message 1276.

On a 12 GB machine I can now run only 5 tasks at once; the sixth task causes a scheduler wait because the VM hypervisor was unable to allocate enough memory. Before 1.29 I could run 6 at once.

On a 16 GB machine, when I ran 4 at once the memory gadget was at 52%. Now those same 4 tasks put the memory gadget at 63%.

I will absolutely take this new behavior over the old behavior :-)
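
As a rough check on the 16 GB figures, the extra memory per task can be estimated from the two gauge readings. This is only a sketch: it assumes the memory gadget shows a percentage of total physical RAM and that nothing else on the machine changed between the two readings. The function name is made up for illustration.

```python
def extra_gb_per_task(total_ram_gb, before_pct, after_pct, tasks):
    """Estimate the additional RAM per task from two memory-gauge readings."""
    extra_total_gb = total_ram_gb * (after_pct - before_pct) / 100.0
    return extra_total_gb / tasks


# Figures from the post: 16 GB machine, 4 tasks, gauge at 52% before and 63% after.
delta = extra_gb_per_task(16, 52, 63, 4)
print(f"about {delta:.2f} GB extra per task")
```

Under those assumptions the estimate comes out a little under the 0.5 GB by which the VM size was raised (1.5 GB to 2 GB), which is the same order of magnitude one would expect.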

Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
    
Message 1278 - Posted: 6 Nov 2014, 20:08:40 UTC - in response to Message 1267.

I have already crunched more than 130 1.29 WUs; they have validated okay, and only 2 short runners have not been validated.

The crunching times vary from "normal" to 2x "normal".

I couldn't check whether the estimated crunch time and the real crunch time are in good sync, but at a glance it looks good.

Yeti

lancone
Project administrator
Project developer
Project tester
Project scientist
Joined: 26 May 14
Posts: 219
Credit: 53,963
RAC: 0
    
Message 1281 - Posted: 7 Nov 2014, 8:59:09 UTC - in response to Message 1278.

Hello Yeti,

That is very good feedback from your side.
We do observe a much better turnaround since 1.29 was introduced.
As the jobs in this validation sample may be of very different types, the running time per WU is not constant, and thus the estimated time is not accurate.

Regards

Ray Murray
Joined: 26 Jun 14
Posts: 35
Credit: 48,094
RAC: 0
    
Message 1283 - Posted: 7 Nov 2014, 10:27:54 UTC

1.29s do seem to be finishing better, closer to the initial estimate of c. 30 hours, rather than the previous estimate of 3 hrs that actually ran to 30. Like Yeti, I've not caught one in the act of finishing, so I don't know how closely the % progress tracks it. Tidy-up at finish still doesn't always remove the attached .vdi, and my last one left the .iso as well; these then have to be removed manually from Virtual Media Manager in VBox.
I have lost a couple where some excessive I/O from one of Eric's dodgy Sixtrack WUs caused a loss of contact between Atlas (and vLHC) and their respective virtual hard drives, and they had to be aborted.

I've only been allowing 1 Atlas and 1 vLHC with 2 Sixtrack, without issue other than the occasional one cited above.

Boinc 7.4.8
VBox 4.3.12
RAM 8GB

tullio
Joined: 27 Jun 14
Posts: 256
Credit: 289,886
RAC: 0
    
Message 1286 - Posted: 7 Nov 2014, 17:02:03 UTC - in response to Message 1283.
Last modified: 7 Nov 2014, 17:02:40 UTC

I am running 2 ATLAS tasks, one vLHC task and one SETI@home GPU task, which uses very little CPU (0.04). They come down in blocks of ten, but run only one at a time. All this on my Windows 8.1 PC with 8 GB RAM.
tullio

Tom*
Joined: 28 Jun 14
Posts: 118
Credit: 8,761,428
RAC: 0
    
Message 1304 - Posted: 12 Nov 2014, 19:00:38 UTC - in response to Message 1276.
Last modified: 12 Nov 2014, 19:03:03 UTC

please feed back to post 1267 what the new rule of thumb should be after you've done some baby-sitting


My old rule of thumb (ROT) was 2 gigs per task. With 1.29 it is now 2.5 gigs per task for all three of my systems.

They increased the VM size of each task from 1.5 gigs to 2.0 gigs; this increased the ROT from 2.0 to 2.5 for Windows 7.

PS - I was the one doing the baby-sitting, not Yeti :-) but then we all may have been with 1.28.
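
The rule of thumb can be turned into a quick calculation of how many tasks fit on a machine. The sketch below is illustrative only: the function name is made up, and rounding down to a whole number of tasks is an assumption about how the rule would be applied.

```python
import math


def max_concurrent_tasks(total_ram_gb, gb_per_task=2.5):
    """Number of tasks that fit under a GB-per-task rule of thumb (rounded down)."""
    if gb_per_task <= 0:
        raise ValueError("gb_per_task must be positive")
    return math.floor(total_ram_gb / gb_per_task)


for ram in (8, 12, 16):
    print(ram, "GB:", max_concurrent_tasks(ram), "tasks at 2.5 GB each,",
          max_concurrent_tasks(ram, 2.0), "at the old 2.0 GB each")
```

Rounded down like this, the rule is slightly conservative: it gives 4 tasks for a 12 GB machine, whereas 5 were reported running on such a machine earlier in the thread (about 2.4 GB each).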

Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
    
Message 1315 - Posted: 14 Nov 2014, 8:30:54 UTC - in response to Message 1278.

Okay, I have now successfully crunched 258 WUs of 1.29.

I had to abort 8 tasks that seemed to have gone into limbo, sadly with crunch times of up to 53 hours.

daniel
Joined: 17 Nov 14
Posts: 7
Credit: 0
RAC: 0
Message 1357 - Posted: 19 Nov 2014, 4:16:32 UTC

Hi
I'm having jobs hang with 1.29. So far I have killed one, and a second one is hung; it says it is waiting to restart later.

Is something wrong with my setup?
