Checklist Version 2 for Atlas@Home on your PC


Yeti
Joined: 20 Jul 14
Posts: 699
Credit: 22,597,832
RAC: 0
    
Message 5350 - Posted: 21 Sep 2016, 9:40:00 UTC
Last modified: 21 Sep 2016, 9:46:28 UTC

Again and again I see new users complaining that Atlas doesn't run well on their PCs or that they don't get successful results.

This checklist is intended to help.

As BOINC doesn't allow us to keep the original checklist up to date, we have to start a new thread from time to time. This version has been updated with all the new information and hints we have received since the first checklist was written. This checklist was last updated on 21.09.2016.

Please go through this list and make sure you really check every detail, step by step; all of them are important.


  1. Are you using a current 64-bit BOINC client? At the moment, 7.6.22 works very well.

  2. Have you installed VirtualBox? At the moment, 5.1.2 and 5.1.4 work very well, and the Atlas team even recommends them. In the near future, Atlas will stop working on VirtualBox 4.x.

  3. You should install the Extension Pack that matches your VirtualBox version. So, if you are running VirtualBox 5.1.2, you should install the Extension Pack for 5.1.2.

  4. Check that hardware virtualization (Intel = VT-x / AMD = AMD-V / VIA = VIA VT) is switched on in your BIOS. To check, you can use a great tool from the web: download LeoMoon CPU-V and check that it gives you 2 green okays.

  5. Did you try to crunch projects that use VMs in the past while VT-x was not enabled? It could be that BOINC has remembered this!

    To check and fix this, first exit BOINC and make sure that all BOINC tasks have really finished.

    In your BOINC data directory you will find a client_state.xml. Open it with a simple editor and search for:
    <p_vm_extensions_disabled>1</p_vm_extensions_disabled>

    If this is absent or the number is 0 (zero), then all is fine. Otherwise change it to 0 (zero), i.e. <p_vm_extensions_disabled>0</p_vm_extensions_disabled>, and save the file. Be careful to save it as a plain ASCII text file.

    Make sure you have shut down your BOINC client successfully before you change anything in client_state.xml. Otherwise BOINC will overwrite your changes.

  6. Check that you have enough RAM available for Atlas. Each single-core Atlas task needs 2.1 GB of free RAM; MultiCore WUs need 2.5 GB + 0.8 GB * number of cores (so 6.5 GB for a 5-core WU).

  7. Check that your Windows Firewall lets the communication through. BOINC.EXE and VBoxHeadless.exe need outgoing and incoming communication.

  8. Check that your antivirus ignores your BOINC data directory.

  9. Try to run only 1 Atlas task at a time until you get it working successfully.
    ..... A) You can suspend the other tasks manually
    ..... B) You can use an app_config.xml

  10. Atlas connects to its servers on other ports than BOINC users are used to. You will have to open these ports:
    ..... HTTP (Port 80)
    ..... HTTP Proxy (Port 3128)
    ..... HTTPS (Port 443)
    ..... XMPP (Port 5222)
    ..... TCP Port 9094

    There is a new page that gives you official information from the project team.

  11. If all this is ok, you should be ready to start.

  12. While a task is running, you can select it in BOINC and check its Properties. The interesting values are "CPU-Time at last checkpoint" versus "CPU-Time". They should differ by only 10 to 20 minutes. A simple example from my box: 01:04:09 versus 01:22:26. This is a difference of about 18 minutes, and that is okay. If the difference is big, something seems to be wrong.

    With MultiCore WUs, after about 10 minutes the CPU time should climb much faster than the elapsed time.

  13. With Atlas 1.44 / 1.46 I have not seen a single long-runner among thousands of crunched WUs. My slowest PC has finished a task in at most 12 hours; my fastest does it in 01:04, or usually in 1 hour 40 to 50 minutes.

  14. If your WUs seem to start up fine, we can get the following scenarios:


    • Scenario A:

      Your WUs end after 10 or 20 minutes. Then something could still be wrong, most likely on your PC or in your firewall.

    • Scenario B:

      Your WUs run for more than 20 to 30 minutes but your CPU time is only 10 or 20 seconds. In this case we do not know exactly what the reason is.

      In one case we could identify a faulty DNS server as the reason.

      You could help us to find the reason for this. First try a project reset of Atlas.

      If this helped: fine! Let us know.

      If this didn't help, you should consider cleaning up the install as described in point 16.

    • Scenario C:

      Your WUs end after several seconds. In the logs you can find something like "Error Code: ERR_CPU_VM_EXTENSIONS_DISABLED".

      Then you should go back to points 4 and 5 above.


  15. If you think something is still not right, you can take a look inside the VM (that's why we asked you to install the Extension Pack).
    ..... Select the running Atlas job in BOINC Manager
    ..... Choose "Show VM Console" on the left side.
    ..... A console should open showing the following lines (with Atlas 1.44)



    If your console looks like this, all is fine and your WU should finish successfully soon.

  16. If you want to clean up your install:

    • Set the Atlas project to "No New Tasks"
    • Abort all Atlas tasks in BOINC Manager
    • Force BOINC to communicate with the Atlas server until all tasks are gone from your task list
    • Exit BOINC
    • Open VirtualBox Manager and delete all VMs that are listed (be careful not to delete VMs of vLHC or CMS)
    • Exit VirtualBox Manager
    • Reboot your PC


    Now you should be ready for a new try.

    In some circumstances it was necessary to completely uninstall VirtualBox / BOINC, reboot the PC and then reinstall VirtualBox / BOINC.

  17. Want to run MultiCore-WUs but you don't like the number of cores it takes?

    No Problem, look in this thread how to reduce the number of cores MultiCore-WUs use
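
For point 5 above, the manual check of client_state.xml can also be scripted. The following is only a minimal sketch, assuming the flag appears in the file exactly as quoted in point 5; the function names and the sample text are made up for illustration, and BOINC must be fully shut down before the real file is touched.

```python
import re

# Tag name as quoted in point 5 of the checklist.
FLAG_PATTERN = re.compile(
    r"<p_vm_extensions_disabled>\s*(\d+)\s*</p_vm_extensions_disabled>"
)


def vm_extensions_disabled(client_state_text: str) -> bool:
    """Return True if the text marks VM extensions as disabled (value not 0)."""
    match = FLAG_PATTERN.search(client_state_text)
    # An absent flag and a value of 0 both mean "all is fine".
    return match is not None and int(match.group(1)) != 0


def clear_flag(client_state_text: str) -> str:
    """Return a copy of the text with the flag value set to 0."""
    return FLAG_PATTERN.sub(
        "<p_vm_extensions_disabled>0</p_vm_extensions_disabled>",
        client_state_text,
    )


if __name__ == "__main__":
    # Made-up fragment for the demo, not a real client_state.xml.
    sample = "...\n<p_vm_extensions_disabled>1</p_vm_extensions_disabled>\n..."
    print(vm_extensions_disabled(sample))              # flag is set
    print(vm_extensions_disabled(clear_flag(sample)))  # flag cleared
```

To use it on a real file, read client_state.xml as text, pass the content to `vm_extensions_disabled`, and only write the result of `clear_flag` back if the check returns True.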


Still not working? Post your problem here.

Yeti

last edited: 19.09.2016

Georg Stoifl
Joined: 28 Oct 14
Posts: 75
Credit: 5,187,269
RAC: 0
    
Message 5432 - Posted: 4 Oct 2016, 8:52:18 UTC
Last modified: 4 Oct 2016, 8:53:54 UTC

Nice... awesome summary!

I'm adding my app_config.xml here; it took me a while to figure out how to configure it. I have a 12-core CPU and use 10 cores so as not to max out the machine.
-> two MultiCore tasks with 4 cores each + two single-core tasks

<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app>
    <name>ATLAS_MCORE</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS_MCORE</app_name>
    <avg_ncpus>4.000000</avg_ncpus>
    <plan_class>vbox_64_mt_mcore</plan_class>
    <cmdline>--memory_size_mb 30000</cmdline>
  </app_version>
</app_config>


Anything to improve in this app_config?

Thanks,
Georg

computezrmle
Joined: 29 Oct 14
Posts: 54
Credit: 1,137,404
RAC: 0
    
Message 5434 - Posted: 4 Oct 2016, 10:28:06 UTC - in response to Message 5432.

Well done.

As your host has 32 GB RAM, I would only reduce the memory_size_mb value (30000 -> 5700) according to No. 6 in Yeti's checklist.

Yeti
Message 5435 - Posted: 4 Oct 2016, 10:33:22 UTC - in response to Message 5434.

computezrmle wrote:
    Well done.

    As your host has 32 GB RAM, I would only reduce the memory_size_mb value (30000 -> 5700) according to No. 6 in Yeti's checklist.

Yes, 5700 MB will be the right number based on the formula from David

For me I round it up a little to 6000 MB
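
The arithmetic behind these numbers can be written down as a small helper. This is only a sketch of the rule from No. 6 of the checklist (2.5 GB + 0.8 GB per core), taking 1 GB as 1000 MB as the figures in this thread do; the function names and the `headroom_mb` parameter are made up for illustration.

```python
def atlas_mcore_memory_mb(cores: int) -> int:
    """RAM for one MultiCore WU in MB: 2500 MB plus 800 MB per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2500 + 800 * cores


def cmdline_for(cores: int, headroom_mb: int = 0) -> str:
    """Build the <cmdline> element for app_config.xml, with optional extra MB."""
    total = atlas_mcore_memory_mb(cores) + headroom_mb
    return f"<cmdline>--memory_size_mb {total}</cmdline>"


if __name__ == "__main__":
    print(cmdline_for(4))                   # 4-core WU: 5700 MB
    print(cmdline_for(4, headroom_mb=300))  # rounded up a little: 6000 MB
```

The same helper gives 3300 MB for a 1-core setup and 6500 MB for a 5-core WU, which matches the example in the checklist.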

Pavel Hanak
Joined: 28 Jul 16
Posts: 9
Credit: 583,593
RAC: 0
    
Message 5438 - Posted: 5 Oct 2016, 6:47:39 UTC
Last modified: 5 Oct 2016, 6:48:05 UTC

I've checked everything on the new checklist, but unfortunately I'm still getting a lot of WUs that fail according to "Scenario B". However, I've noticed one thing: When I abort such tasks manually, a "BOINC VirtualBox Wrapper has stopped working" window appears. I suspect it may even be the reason why the "Scenario B" failures happen in the first place. Any ideas why the Wrapper may not work properly?

Yeti
Message 5439 - Posted: 5 Oct 2016, 7:05:22 UTC - in response to Message 5438.

Pavel Hanak wrote:
    I've checked everything on the new checklist, but unfortunately I'm still getting a lot of WUs that fail according to "Scenario B". However, I've noticed one thing: When I abort such tasks manually, a "BOINC VirtualBox Wrapper has stopped working" window appears. I suspect it may even be the reason why the "Scenario B" failures happen in the first place. Any ideas why the Wrapper may not work properly?

Your computers are hidden, so I cannot take a look at your log files.

Pavel Hanak
Message 5441 - Posted: 5 Oct 2016, 17:48:54 UTC - in response to Message 5439.
Last modified: 5 Oct 2016, 17:52:54 UTC

Yeti wrote:
    Your computers are hidden, so I cannot take a look at your log files.

I've "unhidden" my computers, please try to look again. I'm having trouble only with computer ID 53002; the other two run fine. The problematic machine is behind a small home router, but I highly doubt this is the source of the problem; it allows all outgoing traffic, and I've opened all the ports listed in your checklist for incoming traffic. The other two machines have true public IPv4 addresses.

Georg Stoifl
Message 5448 - Posted: 6 Oct 2016, 12:57:32 UTC
Last modified: 6 Oct 2016, 12:57:50 UTC

I've changed memory_size_mb to 5700.

Thanks!
Georg

Erich
Joined: 18 Dec 15
Posts: 253
Credit: 1,942,248
RAC: 0
    
Message 5641 - Posted: 15 Nov 2016, 9:27:59 UTC - in response to Message 5350.

Yeti wrote:
    ...
    If your console looks like this, all is fine and your WU should finish successfully soon.

Yeti, one question: Will the WU finish successfully ONLY if the console shows exactly what is shown here?
And will it NOT finish successfully if anything else is shown?

Your reply to this question would help me a lot. Many thanks.

Yeti
Message 5644 - Posted: 15 Nov 2016, 12:20:08 UTC - in response to Message 5641.

Erich wrote:
    Yeti, one question: Will the WU finish successfully ONLY if the console shows exactly what is shown here?
    And will it NOT finish successfully if anything else is shown?

It could finish showing this:



But in that case it should finish within 10 or 20 minutes; otherwise you could abort it.

I don't know of any other screen content with which it could finish successfully.

Could you post or describe what you see?

Erich
Message 5649 - Posted: 15 Nov 2016, 17:48:36 UTC - in response to Message 5644.

Yeti wrote:
    Could you post or describe what you see?

My problem is that I can't manage to copy the content of the VM console window :-(

Yeti
Message 5650 - Posted: 15 Nov 2016, 18:06:42 UTC - in response to Message 5649.

Erich wrote:
    Yeti wrote:
        Could you post or describe what you see?

    My problem is that I can't manage to copy the content of the VM console window :-(

Maybe you will find what you are seeing in this thread ...

Erich
Message 5651 - Posted: 15 Nov 2016, 19:05:44 UTC - in response to Message 5650.

Yeti wrote:
    Erich wrote:
        Yeti wrote:
            Could you post or describe what you see?

        My problem is that I can't manage to copy the content of the VM console window :-(

    Maybe you will find what you are seeing in this thread ...

Yes, indeed. Things like those shown in that thread, and slightly different ones as well.

After the WUs get started, all goes well for a while; then I notice that only one core is used instead of two, and RAM usage is also much lower than it should be.
And from that point on, these strange entries are shown in the VM console.

The interesting thing is that
1) one WU completed and validated fine after I had to switch to multicore recently; all others failed, though.
2) the single-core WUs which were running on this system until about a week ago all went well.

There is only one thing left that I could try: changing from 1 WU with 2 cores to 2 WUs with 1 core each. Maybe this will work, no idea. I will see and report here.

Does anyone have any idea what the problem could be?

Erich
Message 5652 - Posted: 15 Nov 2016, 19:58:04 UTC

As said above, I have now switched, via app_config, from 1 task with 2 cores to 2 tasks with 1 core each.
The app_config is:

<app_config>
  <app_version>
    <app_name>ATLAS_MCORE</app_name>
    <avg_ncpus>1.000000</avg_ncpus>
    <plan_class>vbox_64_mt_mcore</plan_class>
    <cmdline>--memory_size_mb 3300</cmdline>
  </app_version>
  <project_max_concurrent>2</project_max_concurrent>
</app_config>

(I have also tried it with 4000 MB and 5000 MB in the 6th line of the app_config.)

The problem now is that only one task is active; a second one shows "waiting to run" in the status column, which means it would only start after the active one has finished. And a third task which had been downloaded says "waiting for memory" in the status column.

Before, with the real single-core WUs, 2 such tasks were running concurrently without any problem.
So the problem now is obviously with the multicore WUs.

What can I do to get 2 ATLAS tasks to run concurrently, as I had it before with the single-core tasks?

PHILIPPE
Joined: 24 Jul 16
Posts: 84
Credit: 53,413
RAC: 0
    
Message 5653 - Posted: 15 Nov 2016, 20:29:15 UTC - in response to Message 5652.

Erich wrote:
    What can I do to get 2 ATLAS tasks to run concurrently, as I had it before with the single-core tasks?

Because of the lack of RAM in my computer (only 3756.51 MB), I tried to reduce the amount of RAM requested in the app_config.
3300 is the theoretical number that follows from the formula 2.5 + 0.8 * number of cores.
I myself run a one-core WU with 2500:
2016-11-11 10:08:47 (7932): Create VM. (boinc_3b72a85e1c527f88, slot#2)
2016-11-11 10:08:49 (7932): Setting Memory Size for VM. (2500MB)

This lets me go on using my computer while the tasks are running. For the moment, it seems to suit the load on my computer.
So maybe try:
<cmdline>--memory_size_mb 2500</cmdline>
It could solve your problem temporarily...

Erich
Message 5654 - Posted: 15 Nov 2016, 20:53:09 UTC - in response to Message 5653.

Philippe, thanks for your hints.

My RAM is 8GB. I am also running one instance of GPUGRID and one instance of WCG. Both use max. 200MB altogether. Then there are just a few basic things running, like antivirus, Windows Explorer, etc. They all use very little RAM.

When I ran 2 ATLAS single-core tasks before, together with the applications mentioned above, about 6 GB RAM was used (out of 8GB).

So, running 2 ATLAS tasks, allowing total 5000MB RAM should not be a problem.

If I change the MB figure in the cmdline of the app_config to read:
<cmdline>--memory_size_mb 2500</cmdline>
I guess that this will NOT get a second ATLAS task to run concurrently, since this value will be too low for 2 tasks; don't you think so?

I will try, but I doubt it.

Yeti
Message 5655 - Posted: 15 Nov 2016, 21:12:21 UTC - in response to Message 5652.

Erich wrote:
    The problem now is that only one task is active; a second one shows "waiting to run" in the status column, which means it would only start after the active one has finished. And a third task which had been downloaded says "waiting for memory" in the status column.

    Before, with the real single-core WUs, 2 such tasks were running concurrently without any problem.
    So the problem now is obviously with the multicore WUs.

    What can I do to get 2 ATLAS tasks to run concurrently, as I had it before with the single-core tasks?

Put more memory into your PC ;-)

There is a scheduler bug in BOINC projects: the scheduler calculates the memory each WU needs by itself, and in doing so it ignores the settings in your app_config.

So the BOINC client thinks the WU on your small PC needs roughly 5.8 GB to run ...
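
Why this leaves only one task running on an 8 GB machine can be illustrated with simple arithmetic. The sketch below is a deliberately simplified model, not the real BOINC scheduling logic: it ignores memory used by other programs and any memory limits set in the BOINC preferences, and the function name is made up.

```python
def tasks_that_fit(total_ram_mb: int, estimate_per_task_mb: int) -> int:
    """How many tasks fit if each is assumed to need estimate_per_task_mb."""
    if estimate_per_task_mb <= 0:
        raise ValueError("estimate_per_task_mb must be positive")
    return total_ram_mb // estimate_per_task_mb


if __name__ == "__main__":
    total_mb = 8000  # the 8 GB host discussed in this thread

    # With the 3300 MB from the app_config, two tasks would fit.
    print(tasks_that_fit(total_mb, 3300))

    # With the roughly 5.8 GB the client assumes, only one task fits.
    print(tasks_that_fit(total_mb, 5800))
```

So as long as the client works with the larger estimate, lowering memory_size_mb in the app_config does not make room for a second task.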

PHILIPPE
Message 5656 - Posted: 15 Nov 2016, 21:16:27 UTC - in response to Message 5654.

In fact, the previous single-core tasks, run with the app ATLAS Simulation v2.01 (vbox_64) windows_x86_64, used exactly 2241 MB of RAM each.

2016-11-08 20:57:07 (6096): Setting Memory Size for VM. (2241MB)

So before the change, you used only 4482 MB of RAM.

I looked at one 2-core WU
(ATLAS Simulation Running on Multiple Core v1.04 (vbox_64_mt_mcore)
windows_x86_64) that you aborted, and I see an abnormal setting in the log:
2016-11-13 07:24:21 (9952): Setting Memory Size for VM. (3300MB) <-- normally (4100MB)
2016-11-13 07:24:21 (9952): Setting CPU Count for VM. (2) <-- for a 2-core WU


Is this the reason why your 2-core WUs fail?
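
This comparison can be automated for anyone who wants to check their own task logs. The following is a rough sketch, assuming the log lines look exactly like the two quoted above and that the expected value follows the checklist formula (2500 MB + 800 MB per core); the function names and the returned dictionary layout are made up for illustration.

```python
import re

# Patterns modelled on the two log lines quoted in this post.
MEMORY_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def expected_memory_mb(cores: int) -> int:
    """Checklist formula: 2.5 GB + 0.8 GB per core, expressed in MB."""
    return 2500 + 800 * cores


def check_log(log_text: str):
    """Compare the configured VM memory with the formula; None if lines are missing."""
    mem = MEMORY_RE.search(log_text)
    cpu = CPU_RE.search(log_text)
    if mem is None or cpu is None:
        return None
    configured = int(mem.group(1))
    cores = int(cpu.group(1))
    expected = expected_memory_mb(cores)
    return {
        "cores": cores,
        "configured_mb": configured,
        "expected_mb": expected,
        "shortfall_mb": max(0, expected - configured),
    }


if __name__ == "__main__":
    log = (
        "2016-11-13 07:24:21 (9952): Setting Memory Size for VM. (3300MB)\n"
        "2016-11-13 07:24:21 (9952): Setting CPU Count for VM. (2)\n"
    )
    print(check_log(log))
```

For the log above, the check reports 2 cores, 3300 MB configured and 4100 MB expected, i.e. the VM is 800 MB short of what the formula suggests.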

Erich
Message 5657 - Posted: 15 Nov 2016, 21:40:14 UTC - in response to Message 5655.

Yeti wrote:
    Put more memory into your PC ;-)

    There is a scheduler bug in BOINC projects: the scheduler calculates the memory each WU needs by itself, and in doing so it ignores the settings in your app_config.

    So the BOINC client thinks the WU on your small PC needs roughly 5.8 GB to run ...

Unfortunately, this PC's motherboard is limited to 8 GB RAM.

In other words: although I was able to run 2 single-core tasks before, I now cannot run more than 1 multicore task (1 core).
Is this assumption correct?

If so, the recent elimination of single-core tasks is a big disadvantage for me :-(

Yeti
Message 5658 - Posted: 15 Nov 2016, 22:25:55 UTC - in response to Message 5657.

Erich wrote:
    In other words: although I was able to run 2 single-core tasks before, I now cannot run more than 1 multicore task (1 core).
    Is this assumption correct?

    If so, the recent elimination of single-core tasks is a big disadvantage for me :-(

As long as the scheduler bug isn't fixed, you are right.

But you could try 1 MultiCore task with two cores.
