Did this job really suddenly finish?

Questions and Answers : Windows : Did this job really suddenly finish?

Lester Lane
Joined: 3 Nov 07
Posts: 6
Credit: 646,665
RAC: 0
Message 33001 - Posted: 16 Mar 2008, 22:04:17 UTC

It had taken ages to get nearly 40% done on my first job. Then it suddenly vanished. The log reads:

16/03/2008 19:04:08|climateprediction.net|Task hadsm3fub_0555_005909211_9 exited with zero status but no 'finished' file
16/03/2008 19:04:08|climateprediction.net|If this happens repeatedly you may need to reset the project.
16/03/2008 19:04:08|climateprediction.net|Restarting task hadsm3fub_0555_005909211_9 using hadsm3 version 506
16/03/2008 19:04:16|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
16/03/2008 19:04:45|climateprediction.net|Scheduler request succeeded: got 0 new tasks
16/03/2008 19:05:10|climateprediction.net|Computation for task hadsm3fub_0555_005909211_9 finished
16/03/2008 19:05:10|climateprediction.net|Output file hadsm3fub_0555_005909211_9_2.zip for task hadsm3fub_0555_005909211_9 absent
16/03/2008 19:05:10|climateprediction.net|Output file hadsm3fub_0555_005909211_9_3.zip for task hadsm3fub_0555_005909211_9 absent
16/03/2008 19:05:10|climateprediction.net|Restarting task hadsm3fub_033m_005920429_4 using hadsm3 version 506
16/03/2008 19:08:21|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 1 completed tasks
16/03/2008 19:08:26|climateprediction.net|Scheduler request succeeded: got 0 new tasks

Has this really finished? It looks to me as if it failed to produce a file. If this is an error, I would not like it to recur. Can someone explain this, please?
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2170
Credit: 64,555,907
RAC: 5,858
Message 33002 - Posted: 17 Mar 2008, 2:00:44 UTC

I'm assuming this is the result in question. There are a lot of "no heartbeat from core client" messages in the output on that page. Did it fail while some other CPU-intensive task was running?
Lester Lane
Joined: 3 Nov 07
Posts: 6
Credit: 646,665
RAC: 0
Message 33014 - Posted: 18 Mar 2008, 11:23:17 UTC - in response to Message 33002.  

No. I am also running the SETI and Einstein projects, but only two tasks run at any one time.
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 33020 - Posted: 19 Mar 2008, 7:11:34 UTC


I ran into the same problem with a BBC WU. It was caused by an abrupt power failure and, like this WU, it had already reported the failure to the BBC server :( It was the first time that had happened. Anyway, I ran it to completion from a backup I had and got the credits, even though the server had marked it as a client error. Now I am very careful about what is sent to the server: at any sign of a problem, I kill the task and start again from a backup. No more tales being carried back to the server.
Regards
Masud.
Lester Lane
Joined: 3 Nov 07
Posts: 6
Credit: 646,665
RAC: 0
Message 33022 - Posted: 19 Mar 2008, 15:59:32 UTC - in response to Message 33020.  


Thanks, I could try running it again from a backup to see if it works. But it was not a power failure, as I have an APS unit.
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33023 - Posted: 19 Mar 2008, 16:37:17 UTC
Last modified: 19 Mar 2008, 16:42:49 UTC

Even if a task has been reported to the server as crashed, you can still restore a backup and try again. If the server learns of the crash, it just doesn't award you the word 'Success' when the task completes.

Properly made backups of the complete contents of the BOINC folder, done after exiting from BOINC, almost always restore correctly. Hope your restore works, Lester - best of luck.

Lester, to reduce your chances of future model crashes, have a look at the README collection about crashes and problems (link in my signature). The important item for you is #6 by MikeMars. He lists all the usual model crash causes and suggests solutions that are mostly very easy to implement.
Cpdn news
Lester Lane
Joined: 3 Nov 07
Posts: 6
Credit: 646,665
RAC: 0
Message 33029 - Posted: 19 Mar 2008, 18:40:45 UTC - in response to Message 33023.  

Thanks. I think it was the backup itself that took the model down, so no, the restore did not work. I will have to go back to an earlier backup... From now on I will set BOINC to stop while backups run.
Lester Lane
Joined: 3 Nov 07
Posts: 6
Credit: 646,665
RAC: 0
Message 33031 - Posted: 19 Mar 2008, 19:01:39 UTC - in response to Message 33029.  

Got it working again - great! BOINC no longer runs whilst backing up.
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33041 - Posted: 20 Mar 2008, 14:54:55 UTC

To make a backup you must EXIT from BOINC, not just suspend the tasks or close the BOINC Manager window by clicking the X.

Use File > Exit in BOINC Manager, or right-click the system tray icon and select Exit.
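The backup and restore routine described in this thread (exit BOINC completely, then copy the complete contents of the BOINC folder, and copy it back to restore) can be scripted. Below is a minimal Python sketch, assuming BOINC has already been exited via File > Exit; the script does not check that for you. The function names, the folder paths you pass in, and the timestamped "boinc-backup-..." naming scheme are all hypothetical and not part of BOINC.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_boinc_folder(boinc_dir: str, backup_root: str) -> Path:
    """Copy the whole BOINC folder into a new timestamped folder under backup_root.

    Assumption: BOINC has already been EXITED (File > Exit), not merely
    suspended or closed with the X. This function does not verify that.
    """
    src = Path(boinc_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {src}")
    # Hypothetical naming scheme: one folder per backup, stamped to the second.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"boinc-backup-{stamp}"
    # copytree copies every file and subfolder and creates missing parent folders.
    shutil.copytree(src, dest)
    return dest


def restore_boinc_folder(backup_dir: str, boinc_dir: str) -> None:
    """Replace the BOINC folder with an earlier backup (again, only with BOINC exited)."""
    target = Path(boinc_dir)
    if target.exists():
        # Remove the current (possibly damaged) folder so nothing from it survives.
        shutil.rmtree(target)
    shutil.copytree(Path(backup_dir), target)
```

Usage would be along the lines of `backup_boinc_folder(r"<your BOINC folder>", r"<your backup drive>")` after exiting BOINC. Because each call creates a new timestamped folder, older backups are kept, which matters if the most recent backup turns out to be bad and you need to go back further.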


©2024 climateprediction.net