I posted about this error here several days ago, but since there's been no response I thought I'd start my own thread. At the time I posted, I had crunched 20 jobs: one validated, 18 were invalid, and I manually aborted the last one. Every failed job had this error, with a time stamp which I have removed:
PyJobTransforms.trfExe.preExecute 2016-11-25 13:25:46,420 INFO Now writing wrapper for substep executor EVNTtoHITS
PyJobTransforms.trfExe._writeAthenaWrapper 2016-11-25 13:25:46,420 INFO Valgrind not engaged
PyJobTransforms.trfExe.preExecute 2016-11-25 13:25:46,421 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
Guest Log: PyJobTransforms.trfExe.execute 2016-11-25 13:25:46,421 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
Guest Log: PyJobTransforms.trfExe.execute 2016-11-25 13:34:21,094 INFO EVNTtoHITS executor returns 33
Guest Log: PyJobTransforms.trfExe.postExecute 2016-11-25 13:34:21,124 WARNING Failed to process expected perfMon stats file ntuple.pmon.gz: [Errno 2] No such file or directory: 'ntuple.pmon.gz'
Guest Log: PyJobTransforms.trfExe.validate 2016-11-25 13:34:21,124 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (33) (Error code 65)
PyJobTransforms.trfExe.validate 2016-11-25 13:34:21,141 INFO Scanning logfile log.EVNTtoHITS for errors
Guest Log: PyJobTransforms.transform.execute 2016-11-25 13:34:21,215 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: "PyG4AtlasAlg FATAL Standard std::exception is caught"
At this point the task terminates.
I don't know which of these is the real problem. Is it "Valgrind not engaged", and that leads to the errors that follow? Why is ntuple.pmon.gz missing; how is it generated? Is it supposed to be bundled with the task or downloaded during the run? This is important because many of my invalid tasks did validate after a third or fourth resend, but several didn't. Most of the ones that didn't had this error.
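One way to narrow this down is to separate the informational lines from the ones logged as warnings or errors. Below is a rough Python sketch of that idea. The line format (optional "Guest Log: " prefix, logger name, time stamp, level, message) is only inferred from the excerpt above, the function names `parse_line` and `at_least` are made up for illustration, and the severity ordering is the standard one from Python's `logging` module. It does not identify the root cause; it only shows which lines the transform itself flagged as more than INFO.

```python
import re

# Line layout inferred from the excerpt above (an assumption, not an official
# format): optional "Guest Log: " prefix, logger name, time stamp, level, message.
LINE_RE = re.compile(
    r"^(?:Guest Log: )?"
    r"(?P<logger>\S+) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)

# Standard ordering of Python logging levels, lowest to highest.
SEVERITY = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3, "CRITICAL": 4}


def parse_line(line):
    """Return a dict of the line's fields, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def at_least(lines, minimum="WARNING"):
    """Keep only the parsed lines whose level is `minimum` or more severe."""
    threshold = SEVERITY[minimum]
    kept = []
    for line in lines:
        record = parse_line(line)
        if record and SEVERITY.get(record["level"], -1) >= threshold:
            kept.append(record)
    return kept


# A few lines copied from the log above.
SAMPLE = [
    "PyJobTransforms.trfExe._writeAthenaWrapper 2016-11-25 13:25:46,420 INFO Valgrind not engaged",
    "Guest Log: PyJobTransforms.trfExe.execute 2016-11-25 13:34:21,094 INFO EVNTtoHITS executor returns 33",
    "Guest Log: PyJobTransforms.trfExe.postExecute 2016-11-25 13:34:21,124 WARNING Failed to process expected perfMon stats file ntuple.pmon.gz: [Errno 2] No such file or directory: 'ntuple.pmon.gz'",
    "Guest Log: PyJobTransforms.trfExe.validate 2016-11-25 13:34:21,124 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (33) (Error code 65)",
    'Guest Log: PyJobTransforms.transform.execute 2016-11-25 13:34:21,215 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: "PyG4AtlasAlg FATAL Standard std::exception is caught"',
]

if __name__ == "__main__":
    for record in at_least(SAMPLE):
        print(record["timestamp"], record["level"], record["message"])
```

Run on those sample lines, the filter drops the two INFO entries (including "Valgrind not engaged") and keeps the ntuple.pmon.gz warning, the return-code error and the final CRITICAL line.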
This affects both Mac and Windows (Windows 7 in my case). I didn't see it on any Linux hosts, but I don't know whether that's because they're immune or it just happened that way. It affects VirtualBox 5.0.2 through at least 5.1.8. In my case, I had ATLAS as a backup to Cosmology, where I've been running VirtualBox jobs for months without any real problem. I had successfully run ATLAS in the past, so seeing this many invalid tasks was an unpleasant surprise.
A lot more work would get done if this problem were solved. There's a lot of wasted time on machines that are capable but have this error, and a lot of wasted bandwidth resending these jobs until they get to a good host or finally bomb out.