ATLAS Simulation v1.20 released



Message boards : News : ATLAS Simulation v1.20 released

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 13 May 14
Posts: 252
Credit: 2,028,082
RAC: 0
    
Message 182 - Posted: 3 Jul 2014, 18:59:31 UTC

The BOINC developers have provided a new VirtualBox wrapper, which we hope fixes the issue of endless work units. It has been incorporated into release 1.20. Please let us know if you still see work units that hang at the end of execution. If you have work units like this from previous versions, please abort them. Apologies for the wasted CPU time.

Big thanks to the BOINC developers for their help with this problem!

nenym
Joined: 2 Jul 14
Posts: 2
Credit: 235,237
RAC: 0
    
Message 183 - Posted: 3 Jul 2014, 19:31:06 UTC
Last modified: 3 Jul 2014, 19:35:36 UTC

Hm... not sure.
i7-4770K: estimated time 21 min; up to 50% (progress approx. 0.1%/sec) it looked OK, now at 1:19 it shows 97.5% and progress of 0.002%/sec
i5-4570S: estimated time 29 min; up to 50% (progress approx. 0.1%/sec) it looked OK, now at 1:28 it shows 96.3% and progress of 0.002%/sec
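
For a sense of scale, here is a rough Python sketch of how long the remaining progress would take if the slow rate simply held steady. That constant rate is an assumption (a task that really hangs would never get there), and the function name `seconds_remaining` is made up for illustration.

```python
def seconds_remaining(progress_percent: float, rate_percent_per_sec: float) -> float:
    """Seconds left if progress keeps advancing at the given constant rate."""
    if rate_percent_per_sec <= 0:
        raise ValueError("rate must be positive")
    return (100.0 - progress_percent) / rate_percent_per_sec


# Progress figures reported above; the 0.002 %/sec rate is assumed to stay constant.
for host, progress in (("i7-4770K", 97.5), ("i5-4570S", 96.3)):
    minutes = seconds_remaining(progress, 0.002) / 60
    print(f"{host}: roughly {minutes:.0f} more minutes at 0.002%/sec")
```

If the percentage stops moving entirely, this estimate no longer applies and the task is more likely one of the hanging ones.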

Both machines run Windows 7 64-bit.

Abort or let it run?

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 13 May 14
Posts: 252
Credit: 2,028,082
RAC: 0
    
Message 184 - Posted: 3 Jul 2014, 20:45:54 UTC - in response to Message 183.

Sorry, there were some changes missing in 1.20. We are preparing 1.21 just now.

nenym
Joined: 2 Jul 14
Posts: 2
Credit: 235,237
RAC: 0
    
Message 185 - Posted: 3 Jul 2014, 21:04:50 UTC - in response to Message 184.

OK, aborting.

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 13 May 14
Posts: 252
Credit: 2,028,082
RAC: 0
    
Message 186 - Posted: 3 Jul 2014, 21:13:14 UTC - in response to Message 185.

v1.21 is out now.

DJ GrumpyPants
Joined: 4 Jul 14
Posts: 1
Credit: 77,619
RAC: 0
    
Message 193 - Posted: 4 Jul 2014, 11:44:54 UTC

Ran 4 work units overnight. Woke up to all 4 at 100%, but still running. Looks like they will never complete.

LHCByloved
Joined: 28 Jun 14
Posts: 29
Credit: 6,406
RAC: 0
  
Message 194 - Posted: 4 Jul 2014, 16:09:21 UTC

Hi there

On my host (Mac OS 10.8 + BOINC 7.2.42 + VBox 4.3.12 + 4 GB RAM) everything is working fine now. With 1.19 I had some trouble due to high RAM use of up to 2048 MB; now, with version 1.21 and 1024 MB of RAM for the VM, it works fine alongside other apps (T4T and normal computer use).

So far I have not seen the issue of tasks stuck at 100% and never ending.

Greetings,
Bylo

Ray Murray
Joined: 26 Jun 14
Posts: 35
Credit: 48,094
RAC: 0
    
Message 198 - Posted: 4 Jul 2014, 21:09:32 UTC
Last modified: 4 Jul 2014, 21:31:53 UTC

SEEMS to be working fine on Windows 7, but I can't get Linux to play.
I reset my 64-bit guest virtual Xubuntu to get the latest 1.21, but after only 12 seconds I still get this message in the Status column:
Waiting to run (Scheduler wait: Please upgrade Boinc to the latest version.)
as reported in the [Edits] of my earlier post and by Hoshione here. I already have BOINC 7.2.42 installed, which works with T4T, although I do have to use an app_info with wrapper 26037 because I get a different "Scheduler wait: job unmanageable" message with later wrappers.

stderr shows "ERROR: Invalid configuration. VM type requires acceleration but the current configuration cannot support it." and trying to start the VM manually says that
VT-x/AMD-V hardware acceleration is not available on your system. Your 64-bit guest will fail to detect a 64-bit CPU and will not be able to boot.

As already stated, T4T runs its 32bit VM so VT-x/AMD-V must be enabled.

I don't really know what I'm doing with Linux but as something is broken, I thought I'd fiddle a bit to see what would happen. Somebody more knowledgeable with Linux might be able to provide better input.

Making start_atlas.sh executable populated the shared folder with a number of new files including, after restarting Boinc, a result.tar.gz which wasn't previously present. Restarting Boinc, the VM started up but gave the "FATAL can't read from boot medium" as "Not Attached" shows in Storage in VBox manager, however it did then close itself down and I got credit for 90 seconds work.
The stderrs of these two tasks that I interfered with, here and here, look a lot more promising and may provide better insight as to what is going on as I don't really know what to look for.

Hope some of that might be helpful.

! Just spotted a huge 1.24GB file coming in on the Win7 host for an estimated 7hour wu where 1-2 hour wus have 75MB files !

neutrinoboy
Joined: 3 Jul 14
Posts: 2
Credit: 40,896
RAC: 0
    
Message 201 - Posted: 5 Jul 2014, 5:06:26 UTC

Having trouble with v1.21 jobs.
Windows 8.1 (BOINC 7.4.8 + Vbox 4.3.12) runs 1.20 fine but fails on 1.21 jobs asking for BOINC upgrade.
Under Fedora 3.14.9 (BOINC 7.5.0 + Vbox 4.3.12) both 1.20 and 1.21 jobs run fine

MAGIC Quantum Mechanic
Joined: 4 Jul 14
Posts: 331
Credit: 485,372
RAC: 0
    
Message 202 - Posted: 5 Jul 2014, 6:19:13 UTC - in response to Message 201.

I got that too with v1.21 but after updating Boinc it works fine on my Win7 and 8.1

neutrinoboy
Joined: 3 Jul 14
Posts: 2
Credit: 40,896
RAC: 0
    
Message 204 - Posted: 5 Jul 2014, 10:11:23 UTC - in response to Message 202.

What version of BOINC did you upgrade to on Win8.1?

MAGIC Quantum Mechanic
Joined: 4 Jul 14
Posts: 331
Credit: 485,372
RAC: 0
    
Message 214 - Posted: 5 Jul 2014, 20:05:30 UTC - in response to Message 204.

7.2.28 on my only 8.1 machine, and it works with no problems. I am also trying the newer 7.2.42 on a Win7 machine, just to see how it will run, since it is an 8-core laptop.

http://atlasathome.cern.ch/results.php?hostid=1440

jay
Joined: 12 Jul 14
Posts: 21
Credit: 15,290
RAC: 0
    
Message 280 - Posted: 15 Jul 2014, 5:34:42 UTC

Hi,
Some WU finished OK.
Now, some are hanging at completion.

I am just getting started and aborted some WU a day or two ago while learning.
http://atlasathome.cern.ch/results.php?hostid=2156

All was going well. Was able to complete a few WU.

Then, I ran 5 Atlas WU at once.
These started OK. They went to 87% of memory and page-swapped for a few seconds, then ran without swaps.
But all 5 are hanging at completion (still running 2 hours after reaching 100%). There is no appreciable CPU utilization, but the memory is not released from VirtualBox.


Would VBOX Logs help?
This looks suspicious to me:

00:47:55.385583 AIOMgr: Endpoint for file '/var/lib/boinc-client/slots/0/vm_image.vdi' (flags 000c0723) created successfully
00:47:55.391939 AIOMgr: Flush failed with VERR_INVALID_PARAMETER, disabling async flushes
00:47:55.394125 Changing the VM state from 'RESUMING' to 'RUNNING'.
00:48:25.288151 PIIX3 ATA: LUN#0: IDLE IMMEDIATE, CmdIf=0xca (30959554 usec ago)
00:48:25.288173 PIIX3 ATA: LUN#0: aborting current command
00:48:30.293945 PIIX3 ATA: LUN#0: IDLE IMMEDIATE, CmdIf=0xe1 (35965349 usec ago)
00:48:30.293964 PIIX3 ATA: LUN#0: aborting current command
00:48:30.293982 PIIX3 ATA: Ctl#0: RESET, DevSel=0 AIOIf=0 CmdIf0=0xe1 (35965387 usec ago) CmdIf1=0x00 (-1 usec ago)
00:48:30.345762 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:31.095220 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:32.645496 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:34.794803 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:35.144616 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:35.345280 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:35.544638 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:35.744676 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:35.944836 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
00:48:36.144731 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting


All the Logs in the 5 slots were similar - except one that had:
01:35:29.326274 PIIX3 ATA LUN#0: Async I/O threa01:35:34.169336 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting
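
To compare the slots without reading each log by eye, a small Python sketch like the one below could count the "stuck" messages. It assumes only the timestamp format visible in the excerpt (HH:MM:SS.microseconds followed by a space) and works whether or not each entry sits on its own line. The names `split_entries` and `count_stuck` are invented for this example and are not part of VirtualBox or BOINC.

```python
import re

# Zero-width split point in front of every timestamp such as "00:48:30.345762 ".
ENTRY_START = re.compile(r"(?=\d{2}:\d{2}:\d{2}\.\d{6} )")
STUCK = "Async I/O thread probably stuck in operation"


def split_entries(text: str) -> list:
    """Split raw log text into individual entries, one per timestamp."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def count_stuck(text: str) -> int:
    """Count entries that contain the 'stuck' warning."""
    return sum(1 for entry in split_entries(text) if STUCK in entry)


# Three entries copied from the excerpt above, deliberately joined on one line.
sample = (
    "00:48:30.293964 PIIX3 ATA: LUN#0: aborting current command "
    "00:48:30.345762 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting "
    "00:48:31.095220 PIIX3 ATA LUN#0: Async I/O thread probably stuck in operation, interrupting"
)
print(count_stuck(sample))
```

Feeding it the full text of each slot's log would give one number per slot to compare.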

Other logs or debug requested?
Or should I abort and go on - with only 2 at a time??

I am running Linux,
Package linux-image-3.13.0-29-lowlatency: i 3.13.0-29.53
Package virtualbox: i 4.3.10-dfsg-1
Package boinc-client: i A 7.2.42+dfsg-1
Package boinc-manager: i A 7.2.42+dfsg-1


Thanks!!
Jay

lancone
Project administrator
Project developer
Project tester
Project scientist
Joined: 26 May 14
Posts: 219
Credit: 53,963
RAC: 0
    
Message 283 - Posted: 15 Jul 2014, 9:50:43 UTC - in response to Message 280.

Hello Jay,

5 WUs at a time on an 8 GB machine is probably a bit heavy.
Maybe you could also try VirtualBox version 4.2.16.

Regards
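
As a back-of-the-envelope check on why 5 at once is heavy, the Python sketch below estimates how many VMs fit in RAM. The 8 GB total is the figure mentioned above, and the per-VM sizes of 2048 MB and 1024 MB are the values reported earlier in this thread for 1.19 and 1.21. The 2048 MB reserved for the host and other applications is purely an assumption, and `vms_that_fit` is a made-up helper name.

```python
def vms_that_fit(total_ram_mb: int, per_vm_mb: int, reserve_mb: int) -> int:
    """Number of VMs that fit after setting aside reserve_mb for the host."""
    if per_vm_mb <= 0:
        raise ValueError("per_vm_mb must be positive")
    usable = total_ram_mb - reserve_mb
    return max(usable // per_vm_mb, 0)


TOTAL_MB = 8 * 1024   # 8 GB machine
RESERVE_MB = 2048     # assumption: memory kept free for the OS and other projects

for per_vm in (2048, 1024):
    print(f"{per_vm} MB per VM: {vms_that_fit(TOTAL_MB, per_vm, RESERVE_MB)} VMs")
```

The real limit also depends on what else is running, so treat the result as an upper bound at best.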

jay
Joined: 12 Jul 14
Posts: 21
Credit: 15,290
RAC: 0
    
Message 291 - Posted: 16 Jul 2014, 5:08:05 UTC - in response to Message 283.

Greetings!
Thanks for the response.

I really didn't intend to run all 5 at once.
Had started with 2 atlas, 1 CPU support for GPU, 3 WCG.
But the BOINC scheduler picked up 3 more atlas when the WCG completed.
I needed to pay more attention to that.

Now running 2 or 3 at once and all is going OK.

Eventually, I would like to run Atlas with other projects without
running more than 3 Atlas WU.

Does anyone know if using the <max_concurrent>3</max_concurrent>
(in an app_config.xml) would work?

HHHmmm.. I'll try it and see.

<app_config>
<app>
<name>ATLAS</name>
<max_concurrent>3</max_concurrent>
</app>
</app_config>

(place in the Atlas project directory.)

The intent is to run 6 of 8 kernels, limiting Atlas to no more than 3.
I'll keep fingers crossed.
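
For anyone who would rather generate the file than type it, here is a minimal Python sketch that builds the same XML. The element names are the ones shown above; the function name `build_app_config` is made up for illustration, and saving the result into the project directory is left out because the path differs per installation.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config XML limiting one app to max_concurrent running tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if hasattr(ET, "indent"):  # pretty-printing helper, available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_app_config("ATLAS", 3))
```

Changing the second argument is all it takes to try a different limit.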

Thanks again, Jay

captainjack
Joined: 28 Jun 14
Posts: 21
Credit: 182,001
RAC: 0
    
Message 298 - Posted: 16 Jul 2014, 13:40:15 UTC

Hi jay,

Yes, an app_config does work for Atlas. I currently have mine limited to two concurrent. And your app config looks like it should work.

Good luck on your test.

Jacob Klein
Joined: 21 Jun 14
Posts: 48
Credit: 27,798
RAC: 0
    
Message 299 - Posted: 16 Jul 2014, 14:03:36 UTC
Last modified: 16 Jul 2014, 14:07:34 UTC

Just a heads up...

In order to get a BOINC VM Project to work with the newly-released VirtualBox 4.3.14, the application must be using a VBoxWrapper version of 26095 or greater. If a user decides to do this, the VMs will run at normal priority, as the VirtualBox changes in 4.3.14 prevent VBoxWrapper from controlling process priority.
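
The rule above can be written down as a small check. This Python sketch encodes only what is stated here: VirtualBox 4.3.14 needs VBoxWrapper 26095 or greater. Applying the same minimum to later VirtualBox versions, and imposing no minimum on earlier ones, are assumptions of the sketch, and `wrapper_ok` is a hypothetical helper name.

```python
MIN_WRAPPER_FOR_4_3_14 = 26095


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '4.3.14' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def wrapper_ok(vbox_version: str, wrapper_build: int) -> bool:
    """True if the wrapper build is expected to work with this VirtualBox version."""
    if parse_version(vbox_version) >= (4, 3, 14):
        # Assumption: releases after 4.3.14 need at least the same wrapper build.
        return wrapper_build >= MIN_WRAPPER_FOR_4_3_14
    # Assumption: no minimum is known for older VirtualBox releases.
    return True


print(wrapper_ok("4.3.14", 26037))
print(wrapper_ok("4.3.14", 26095))
```

Comparing tuples of integers avoids the trap of comparing version strings, where "4.3.9" would sort after "4.3.14".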

ATLAS 1.22 does not currently use a new enough VBoxWrapper version, I believe. So, once the user has the 1.22 files downloaded, then the user must specify an Anonymous platform app_info.xml in order to make it work. http://boinc.berkeley.edu/wiki/client_configuration ... They'll also need to download the version of the wrapper they want, and extract the files into the project's directory. http://boinc.berkeley.edu/dl/?C=M;O=D

So, with all that being said, here is my current setup. I have both an app_info.xml (to run ATLAS 1.22 as Anonymous platform and make ATLAS work using VirtualBox 4.3.14), as well as an app_config.xml (to limit max_concurrent, to 1 in my case, since I have several other VMs running), as well as the VBoxWrapper files, all in my ATLAS project directory.

--------------------------------------------------------------
app_config.xml:
--------------------------------------------------------------

<!-- ATLAS@Home -->
<!-- Limit max_concurrent, since too many VMs can cause system to entirely freeze. -->
<app_config>
  <!-- ATLAS Simulation -->
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>


--------------------------------------------------------------
app_info.xml:
--------------------------------------------------------------
<app_info>
  <app>
    <name>ATLAS</name>
    <user_friendly_name>ATLAS Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
  </app>
  <file_info>
    <name>vboxwrapper_26095_windows_x86_64.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>vboxwrapper_26095_windows_x86_64.pdb</name>
  </file_info>
  <file_info>
    <name>ATLAS_vbox_1.22_windows_x86_64_vm_image.vdi</name>
  </file_info>
  <file_info>
    <name>ATLAS_vbox_job_1.22_windows_x86_64.xml</name>
  </file_info>
  <app_version>
    <app_name>ATLAS</app_name>
    <version_num>122</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>48887539185.877502</flops>
    <plan_class>vbox64</plan_class>
    <api_version>7.5.0</api_version>
    <file_ref>
      <file_name>vboxwrapper_26095_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>vboxwrapper_26095_windows_x86_64.pdb</file_name>
    </file_ref>
    <file_ref>
      <file_name>ATLAS_vbox_1.22_windows_x86_64_vm_image.vdi</file_name>
      <open_name>vm_image.vdi</open_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>ATLAS_vbox_job_1.22_windows_x86_64.xml</file_name>
      <open_name>vbox_job.xml</open_name>
    </file_ref>
  </app_version>
</app_info>
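
When adapting an app_info.xml like this one to another wrapper version, it is easy to rename a file in one place and forget the other. The Python sketch below is a simple consistency check, assuming only the structure visible above: each `<file_ref>` should name a file that also has a `<file_info>` entry. The function name `missing_file_infos` is invented for this example, and the sample deliberately leaves one declaration out.

```python
import xml.etree.ElementTree as ET


def missing_file_infos(app_info_xml: str) -> list:
    """Return file names used in <file_ref> that have no matching <file_info>."""
    root = ET.fromstring(app_info_xml)
    declared = {info.findtext("name") for info in root.findall("file_info")}
    referenced = [ref.findtext("file_name") for ref in root.iter("file_ref")]
    return [name for name in referenced if name not in declared]


# Shortened sample: the job XML is referenced but not declared.
sample = """
<app_info>
  <file_info><name>vboxwrapper_26095_windows_x86_64.exe</name><executable/></file_info>
  <app_version>
    <app_name>ATLAS</app_name>
    <file_ref>
      <file_name>vboxwrapper_26095_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>ATLAS_vbox_job_1.22_windows_x86_64.xml</file_name>
      <open_name>vbox_job.xml</open_name>
    </file_ref>
  </app_version>
</app_info>
"""
print(missing_file_infos(sample))
```

An empty list means every referenced file is declared; it does not check that the files actually exist in the project directory.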



--------------------------------------------------------------
Regards,
Jacob

jay
Joined: 12 Jul 14
Posts: 21
Credit: 15,290
RAC: 0
    
Message 300 - Posted: 16 Jul 2014, 14:13:38 UTC - in response to Message 298.

Thanks Jack!

I lowered my setting to two as well.

It worked! Great!

Also, I had set all projects to "No new task" and waited for all to finish.
Then, with "% processors" set to 75% (in preferences) I allowed Atlas to get Workunits.
It downloaded 6 WU (6 of 8 kernels) at once!
(Well, almost at once. It took just under an hour to download on my Bellsouth DSL.)
Now, I can start up other projects and crunch away....
:)

Ahhhhh, perhaps I should ask: what should I know about trying to run 100% of the kernels? Does VirtualBox play well with others?
I have one CPU allocated for 1 GPU WU running - so there is a little slack... sometimes.

T H A N K Y O U !!!


Jay

rbpeake
Joined: 27 Jun 14
Posts: 86
Credit: 8,794,961
RAC: 0
    
Message 301 - Posted: 16 Jul 2014, 14:41:05 UTC
Last modified: 16 Jul 2014, 14:41:49 UTC

Seems like there is no compelling reason to upgrade to Virtual Box 4.3.14?

Tom*
Joined: 28 Jun 14
Posts: 118
Credit: 8,761,428
RAC: 1
    
Message 302 - Posted: 16 Jul 2014, 14:45:21 UTC - in response to Message 301.

+1

Seems like there is no compelling reason to upgrade to Virtual Box 4.3.14?
