climateprediction.net home page
Model or hardware failure?

Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 33187 - Posted: 2 Apr 2008, 13:33:40 UTC

02.04.2008 08:41:31|climateprediction.net|Computation for task hadcm3istd_7sb9_1920_160_05924482_0 finished
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_10.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_11.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_12.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_13.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_14.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_15.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_16.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent

The model has crashed twice in one day. It crashed right after the upload of the zip file, at 56.25%.
The result says client error.

I reinstalled the PC a week ago, and Prime95 runs OK.
Any advice?
Thanks
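
For anyone who wants to check a longer log for the same pattern, here is a minimal Python sketch. It assumes only the pipe-delimited layout visible in the lines above (timestamp, project, message); the function name `absent_outputs` and the sample constant `LOG` are illustrative, not part of BOINC or CPDN.

```python
import re

# A few lines copied from the log above: timestamp|project|message.
LOG = """\
02.04.2008 08:41:31|climateprediction.net|Computation for task hadcm3istd_7sb9_1920_160_05924482_0 finished
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_10.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
02.04.2008 08:41:31|climateprediction.net|Output file hadcm3istd_7sb9_1920_160_05924482_0_16.zip for task hadcm3istd_7sb9_1920_160_05924482_0 absent
"""

# Matches the "Output file X for task Y absent" wording seen in the log.
ABSENT = re.compile(r"^Output file (?P<file>\S+) for task (?P<task>\S+) absent$")


def absent_outputs(log_text):
    """Return {task name: [missing output files]} from pipe-delimited log text."""
    missing = {}
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        match = ABSENT.match(parts[2].strip())
        if match:
            missing.setdefault(match["task"], []).append(match["file"])
    return missing


if __name__ == "__main__":
    for task, files in absent_outputs(LOG).items():
        print(task, "is missing", len(files), "output file(s)")
```

A run of "absent" lines for one task right after "finished" is the pattern shown in this post; the sketch only counts such lines and says nothing about why the files are missing.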
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33188 - Posted: 2 Apr 2008, 15:21:04 UTC

The error log has a "NEGATIVE THETA DETECTED" message near the end, which suggests that the model was in computational trouble of some sort. A model can sometimes recover from isolated errors like that, but if it keeps crashing at the same point then there's nothing that can be done (on that machine).

However, the curse of those messages is that they don't include a date or time, so the event may relate to an earlier incident and not be related to the crash at all.

With a single error like that, I would be inclined to try a backup, since the model has run for a long time and it would be a shame to lose it because of a PC problem.
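
Since the message carries no timestamp, about the only thing that can be checked is where it sits in the log relative to the end. A minimal sketch of that check follows; the function name `find_marker` and the sample lines are hypothetical, and only the message text itself comes from the error log discussed here.

```python
def find_marker(lines, marker="NEGATIVE THETA DETECTED"):
    """Return (line number, lines remaining after it) for each line containing marker."""
    lines = list(lines)
    total = len(lines)
    return [
        (index + 1, total - (index + 1))
        for index, line in enumerate(lines)
        if marker in line
    ]


if __name__ == "__main__":
    # Hypothetical log content, for illustration only.
    sample = [
        "some earlier output",
        "NEGATIVE THETA DETECTED",
        "some later output",
    ]
    for number, remaining in find_marker(sample):
        print(f"marker on line {number}, {remaining} line(s) before the end")
```

A marker far from the end of the log would be consistent with an earlier incident; one in the last few lines is more likely to be connected with the crash, though position alone proves neither.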
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 33189 - Posted: 2 Apr 2008, 15:43:58 UTC - in response to Message 33188.  

The model has crashed twice at the same point. I restored from backup, but it crashed again. I run 4 models, and they crash from time to time, so I do restore from backup occasionally. But since this one crashed twice at the same point, I am not sure there is any point in restoring it again.

Should I let it go then?
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33190 - Posted: 2 Apr 2008, 15:48:01 UTC - in response to Message 33189.  

... Should I let it go then?

Sadly, yes.
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 33191 - Posted: 2 Apr 2008, 16:33:24 UTC - in response to Message 33190.  

At least it got past 50%, so maybe it is possible to generate a new model from it.
Still, it is sad not to reach the end.
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33201 - Posted: 3 Apr 2008, 19:26:35 UTC
Last modified: 3 Apr 2008, 19:27:45 UTC

Has anyone ever tried taking a backup of a model with this sort of error, which means that a condition impossible in the real world has been generated, and transferring it from Intel to AMD or vice-versa?

We've always assumed that these negative pressure or negative theta errors meant that the initial model parameter values were a combination that was tragically doomed to fail. But I wonder whether a computation error might produce the same result.

Sometimes you can restore a model with a computation error and the same error reoccurs at the same point. But in the case of loopers, a transfer to the other sort of computer has sometimes got models through the loop because of the different ways that Intels and AMDs handle the computations.
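
As a small illustration of how two machines could end up with slightly different numbers, the Python sketch below shows that floating-point addition depends on the order of operations. This is an assumption about one possible mechanism, not a description of what Intel or AMD processors actually do with the model code; the variable names are illustrative.

```python
# Floating-point addition is not associative: grouping the same three
# numbers differently changes the last bit of the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

if __name__ == "__main__":
    print("left  =", repr(left))
    print("right =", repr(right))
    print("equal?", left == right)
```

In a long model run, a last-bit difference of this kind could in principle accumulate over many timesteps, which may be why a model stuck on one machine sometimes gets past the same point on another.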

Steinar, I see that the model crashed on your Intel, but you previously crunched CPDN models with an AMD... If the AMD is still functional you might want to try this to see if a transfer works with this model. Of course you could only try this if you haven't yet deleted the backup.

You may want to wait to see whether other people think this idea is worth trying.
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 33203 - Posted: 4 Apr 2008, 16:26:52 UTC - in response to Message 33201.  


I don't have the AMD anymore, sorry. If I had, it would have been interesting to see. I have also deleted the backup now. Maybe the next time it happens it should be tried on the other type of processor. I can do it when my models are finished; they are now at 60%.