Hadcm3n models crashing on new Macintosh computers too

Message boards : Number crunching : Hadcm3n models crashing on new Macintosh computers too

dajashby

Joined: 1 Sep 04
Posts: 55
Credit: 17,223,688
RAC: 967
Message 42431 - Posted: 20 Jun 2011, 22:39:12 UTC


hadcm3n is NOT compatible with CPUs/processors without SSE2 capabilities. That means HADCM3N is NOT compatible with


Pentium 3, Pentium 3 based Celerons and Xeons, and older Intel processors, and
Athlon XP/MP, Athlon XP based Semprons/Durons, and older AMD processors.



A computer with a processor listed above downloading one of these models will result in an immediate crash. This is not expected to change in the future.
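To rule SSE2 in or out on a particular machine, the CPU feature flags can be inspected. The following is only a minimal Python sketch, assuming the flags have already been captured as a string, for example from `sysctl -n machdep.cpu.features` on an Intel Mac or from the `flags` line of `/proc/cpuinfo` on Linux. The helper name `has_sse2` and the sample strings are made up for illustration and are not part of BOINC or CPDN.

```python
def has_sse2(feature_string):
    """Return True if a whitespace-separated CPU feature list includes SSE2."""
    # Compare case-insensitively: macOS reports "SSE2", Linux reports "sse2".
    flags = {flag.upper() for flag in feature_string.split()}
    return "SSE2" in flags


# Made-up, shortened feature strings for illustration only.
print(has_sse2("FPU VME SSE SSE2 SSE3"))
print(has_sse2("fpu vme mmx sse"))
```

Splitting into whole tokens rather than searching for a substring keeps the check from being fooled by longer flag names that merely contain the same letters.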


geophi, you can add Macintosh computers with Intel's new Sandy Bridge processor to that list. I've now had two of these models crash on my 2011 iMac within seconds of launching - which, given the difficulty of getting any work at all, is pretty frustrating.
Derrick Ashby
ID: 42431
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42433 - Posted: 20 Jun 2011, 23:10:32 UTC - in response to Message 42431.  

Are you sure that's the cause, and not the batch of faulty models as mentioned in recent posts in News and Announcements?

ID: 42433
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42434 - Posted: 21 Jun 2011, 8:05:34 UTC

The stderr messages for your failed tasks all contain errors similar to the following:

execl(/Library/Application Support/BOINC Data/projects/climateprediction.net/hadcm3n_um_6.07_i686-apple-darwin, 137095) failed!
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10400, iMonCtr=1
Model crash detected, will try to restart...

That computer's task list has a HadAM3P_SAF task crashing with the same error on 26th May and a HadAM3P_EU task completing successfully on 22nd May. The other notable difference between them is that the EU task reported BOINC core client version 6.10.58 and the SAF one reported 6.12.26, so the error could indicate that the upgrade caused problems with the permissions.
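One way to test the permissions theory is to check whether the model binary named in the stderr output exists and is marked executable for the account BOINC runs under. The following is a minimal Python sketch under that assumption, not a CPDN or BOINC tool; the helper name `describe_executable` is made up for illustration, and the path is copied from the error message above.

```python
import os
import stat


def describe_executable(path):
    """Return a short status string for a file that execl() is expected to launch."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    # Permission bits only (e.g. 0o755), without the file-type bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if not os.access(path, os.X_OK):
        return "not executable by this user (mode %o)" % mode
    return "executable (mode %o)" % mode


if __name__ == "__main__":
    # Path taken from the stderr output; adjust if your BOINC data directory differs.
    model = ("/Library/Application Support/BOINC Data/projects/"
             "climateprediction.net/hadcm3n_um_6.07_i686-apple-darwin")
    print(describe_executable(model))
```

`os.access` answers for the user running the script, so the result is only meaningful when it is run as the same user as the BOINC client.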

Interestingly, the SAF task returned 5 trickles after the error was reported, suggesting that you managed to get it to run from a backup. How did you manage that?
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 42434
dajashby

Joined: 1 Sep 04
Posts: 55
Credit: 17,223,688
RAC: 967
Message 42453 - Posted: 23 Jun 2011, 22:15:40 UTC


That computer's task list has a HadAM3P_SAF task crashing with the same error on 26th May and a HadAM3P_EU task completing successfully on 22nd May. The other notable difference between them is that the EU task reported BOINC core client version 6.10.58 and the SAF one reported 6.12.26, so the error could indicate that the upgrade caused problems with the permissions.


I acquired the current computer (iMac with i7 processor) in late May, but made the mistake of migrating the settings from the laptop (MacBook Air) I had been using (I'm new to the Macintosh). The BOINC settings were transferred along with everything else, and although I reinstalled BOINC it may be that I need to clear the whole thing out and start again. The successfully completed tasks were on the laptop. Everything from CPDN that has run on this machine has crashed.

The case where trickles arrived after the model crashed was probably because the same model was being run on both machines, which was a bit stupid of me.
Derrick Ashby
ID: 42453
dajashby

Joined: 1 Sep 04
Posts: 55
Credit: 17,223,688
RAC: 967
Message 42455 - Posted: 24 Jun 2011, 13:09:15 UTC

OK, I have cleared out my BOINC Data folder and reinstalled the BOINC manager, and, somewhat to my surprise, picked up a new hadcm3n model immediately. It has so far run for several minutes without crashing. I'm crossing my fingers.
Derrick Ashby
ID: 42455
dajashby

Joined: 1 Sep 04
Posts: 55
Credit: 17,223,688
RAC: 967
Message 42464 - Posted: 25 Jun 2011, 13:29:34 UTC - in response to Message 42455.  

15 hours now...:-)
Derrick Ashby
ID: 42464
dajashby

Joined: 1 Sep 04
Posts: 55
Credit: 17,223,688
RAC: 967
Message 42466 - Posted: 26 Jun 2011, 2:15:37 UTC - in response to Message 42464.  

And now I have a second hadcm3n model. Looks like it was my BOINC installation causing the problems.
Derrick Ashby
ID: 42466
