RAPIT tasks failing after few seconds
Author | Message |
---|---|
Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
I've just had two consecutive RAPIT tasks fail after only a few seconds with 'Compute error' (16585904 & 16585822). I've been running RAPIT tasks for some time with no problems and have other models running OK at the moment. Just wondering if the problem is on my side or with the WUs. (No changes on my system.) |
Joined: 16 Jan 10 Posts: 1081 Credit: 7,020,145 RAC: 4,560 |
The message (from an EU rather than RAPIT model): execl(/Library/Application Support/BOINC Data/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-apple-darwin, 159230) failed! ... is usually produced by the Mac permissions problem. There's no evidence of a BOINC Manager upgrade or restoration from backup that I can see, so if you can't get any models to start after a reboot, the next thing to try would be a project reset (or remove) to clear out all the downloaded files. Don't do that until the running models finish, otherwise they'll be aborted. |
Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
The ones I was referring to were further down the task information list (which wasn't in 'sent' date order?) Mon 5 May 12:23:33 2014 climateprediction.net Starting task hadcm3n_890u_1980_40_008721113_0 using hadcm3n version 607 Mon 5 May 12:24:01 2014 climateprediction.net Computation for task hadcm3n_890u_1980_40_008721113_0 finished Task id 16585904 Mon 5 May 13:46:13 2014 climateprediction.net Starting task hadcm3n_88yk_1980_40_008721031_0 using hadcm3n version 607 Mon 5 May 13:46:41 2014 climateprediction.net Computation for task hadcm3n_88yk_1980_40_008721031_0 finished Task id 16585822 After my two current tasks finish I'll do a project reset anyway. I've just looked at my task list a bit more thoroughly and found that there have been more RAPIT failures than I'd realised. The last two happened as I was working and startled me. |
Joined: 19 Sep 04 Posts: 92 Credit: 1,937,001 RAC: 318 |
Another bad batch...? :-( Professor Desty Nova Researching Karma the Hard Way |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
"Another bad batch...? :-(" No. It's the same permissions problem that Iain suggested, this time on the HadCM3N worker process: execl(/Library/Application Support/BOINC Data/projects/climateprediction.net/hadcm3n_um_6.07_i686-apple-darwin, 131270) failed! "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 19 Sep 04 Posts: 92 Credit: 1,937,001 RAC: 318 |
I was kind of asking/stating because I had just downloaded a RAPIT unit, and it crashed after a few seconds (on Windows). Stderr: <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Model crashed: INITTIME: Atmosphere basis time mismatch Professor Desty Nova Researching Karma the Hard Way |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There are several reasons for crashes, some of which are to do with the computer in question, and others which indicate a problem with the data set. So it's necessary to see/quote the error to know which is which. The INITTIME error is one that means there's a problem with the data files. PS: The 3 that I'm running DON'T have a problem, but they're all re-sends, so the other computers that tried to run them have problems of their own. |
Joined: 27 Feb 08 Posts: 41 Credit: 1,402,356 RAC: 0 |
It appears that approximately 3,000 of these RAPIT units were taken off the server in the past few days. Are the ones I am crunching still valid? Thanks! Regards, Bob P. |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
Have a look at this thread. I checked your active ones and didn't find that message, so yours are still needed. |
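
The diagnoses given in this thread amount to a small lookup from the quoted error text to its likely cause. The following is only a minimal Python sketch of that idea, not a BOINC or climateprediction.net tool: the function name `classify_stderr` and the cause labels are hypothetical, and the two patterns come solely from the error messages quoted in the posts above.

```python
import re

# Patterns copied from the error text quoted in this thread.
# The cause labels are illustrative only, not official BOINC categories.
KNOWN_ERRORS = [
    (re.compile(r"execl\(.*\) failed!", re.DOTALL),
     "local: file permissions problem (reported here on Mac)"),
    (re.compile(r"INITTIME: Atmosphere basis time mismatch"),
     "data: problem with the task's data files"),
]


def classify_stderr(stderr_text: str) -> str:
    """Return a likely cause for a failed task, based on its stderr text."""
    for pattern, cause in KNOWN_ERRORS:
        if pattern.search(stderr_text):
            return cause
    # Anything else needs a human to look at the full error.
    return "unknown: quote the full error on the forum"


if __name__ == "__main__":
    print(classify_stderr("Model crashed: INITTIME: Atmosphere basis time mismatch"))
```

Any error not listed falls through to "unknown", which matches the advice above that the error has to be quoted before anyone can tell a computer problem from a data problem.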
©2024 climateprediction.net