Thread 'HadCM3 short 8.34 workunits errors'

Message boards : Number crunching : HadCM3 short 8.34 workunits errors

Christophe Daulie
Joined: 19 Aug 15
Posts: 5
Credit: 2,899,370
RAC: 0
Message 56516 - Posted: 16 Jul 2017, 7:55:16 UTC

Is anyone else also getting errors when HadCM3 short 8.34 workunits start?

Good luck with climateprediction.
ID: 56516
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 56517 - Posted: 16 Jul 2017, 16:52:41 UTC - in response to Message 56516.  

Hi Christophe,

Can you say what the errors are, please? The tasks of this type are still listed as running on your computer
https://www.cpdn.org/cpdnboinc/results.php?hostid=1396550
so anyone looking is going to have to guess what the problem is. Searching a bit, at least one Windows and one Linux computer have had failures with this batch, but both of the ones I found have quite a high failure rate anyway.

Perhaps it will be possible to work out a bit more once the tasks report, if they have failed, or if anyone else with failures reports in.
ID: 56517
Alan K
Joined: 22 Feb 06
Posts: 492
Credit: 31,510,687
RAC: 14,925
Message 56519 - Posted: 16 Jul 2017, 22:12:30 UTC - in response to Message 56517.  

I've had one fail with an invalid theta error. It failed after only 42 seconds of CPU time.
ID: 56519
Alan K
Joined: 22 Feb 06
Posts: 492
Credit: 31,510,687
RAC: 14,925
Message 56520 - Posted: 17 Jul 2017, 11:49:19 UTC - in response to Message 56519.  

Just had a second one fail after a few seconds with a Visual Fortran run-time error. Is there a problem with this batch?
ID: 56520
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 56521 - Posted: 17 Jul 2017, 12:05:05 UTC

I notice, Alan, that yours are from batch 595, whereas Christophe's are from 599. Certainly, scouting around, I haven't found any completed from 599 yet. I will let the project know.
ID: 56521
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 56522 - Posted: 17 Jul 2017, 12:24:38 UTC - in response to Message 56521.  

I have had a reply from David in Oxford.

Hi Dave,

We have had 64 successes within Batch 599 so far. I agree it has a high failure rate, though the experimental design is likely causing this.

Regards

David


I suspect the same is true of 595, which has the same design. I hadn't actually looked through the 595 tasks to see how many were failing, which is why the reply only mentions the successes in 599.
ID: 56522
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 56523 - Posted: 17 Jul 2017, 14:53:47 UTC

And from Sarah,

Hi,

Yes, an invalid theta error in particular will probably indicate that the model is unstable. This is not unexpected from these batches, as they use a perturbed physics setup. These batches will also be added to in the future, so the high failure rate for this small number of workunits may not be indicative of the final whole batch.

Best wishes,
Sarah
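
To illustrate the point about small numbers of workunits, here is a minimal sketch of how uncertain a failure rate is when only a few tasks have reported. It uses the Wilson score interval, which is a standard statistical technique and not something the project has said it uses. The counts in the usage lines are made up for illustration and are not real batch figures.

```python
import math


def wilson_interval(failures: int, total: int, z: float = 1.96):
    """Wilson score interval for a failure proportion (z = 1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = failures / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: the same 30% observed failure rate at two sample sizes.
for failures, total in [(3, 10), (300, 1000)]:
    low, high = wilson_interval(failures, total)
    print(f"{failures}/{total} failed: plausible rate {low:.2f} to {high:.2f}")
```

With only a handful of results the plausible range is very wide, so an early run of failures says little about how the whole batch will end up.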
ID: 56523
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 56524 - Posted: 17 Jul 2017, 15:04:35 UTC

I have a batch 595 WU running now. It is 85% complete after nearly 6 days. It should complete in 1 day. So not all fail early.
ID: 56524
Alan K
Joined: 22 Feb 06
Posts: 492
Credit: 31,510,687
RAC: 14,925
Message 56525 - Posted: 17 Jul 2017, 20:45:58 UTC - in response to Message 56520.  

And a third.
ID: 56525
Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,610,380
RAC: 3,377
Message 56526 - Posted: 17 Jul 2017, 21:47:17 UTC

I received 9 tasks from batch 595. 8 completed successfully, the 9th hasn't started yet.
ID: 56526
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 56527 - Posted: 18 Jul 2017, 12:30:52 UTC - in response to Message 56516.  

Is anyone else also getting errors when HadCM3 short 8.34 workunits start?


hadcm3s_a092_203412_120_599_011122340_1
Workunit 11122340
Created 18 Jul 2017, 8:31:10 UTC
Sent 18 Jul 2017, 8:31:22 UTC
Report deadline 30 Jun 2018, 13:51:22 UTC
Received 18 Jul 2017, 12:23:28 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 22 (0x16) Unknown error number

https://www.cpdn.org/cpdnboinc/result.php?resultid=20552745
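
For anyone comparing batches, the batch number and workunit number both appear inside the task name. A minimal sketch of pulling them out follows. The meaning of each field is an assumption inferred from this one task name (599 matches the batch discussed above and 011122340 matches Workunit 11122340), not an official naming specification, and `parse_task_name` is a hypothetical helper.

```python
from typing import NamedTuple


class TaskName(NamedTuple):
    # Field meanings are assumed from one example, not from project documentation.
    model: str
    run_id: str
    start: str
    length: str
    batch: int
    workunit: int
    attempt: int


def parse_task_name(name: str) -> TaskName:
    """Split a task name into its underscore-separated fields."""
    parts = name.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}")
    model, run_id, start, length, batch, workunit, attempt = parts
    return TaskName(model, run_id, start, length, int(batch), int(workunit), int(attempt))


task = parse_task_name("hadcm3s_a092_203412_120_599_011122340_1")
print(task.batch, task.workunit)  # prints: 599 11122340
```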

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
forrtl: severe (24): end-of-file during read, unit 5, file /home/boinc/projects/climateprediction.net/hadcm3s_a092_203412_120_599_011122340/jobs/climate.cpdc, line 873, position 0
Image PC Routine Line Source
hadcm3s_um_8.34_i 0848A415 Unknown Unknown Unknown
hadcm3s_um_8.34_i 084AE5F7 Unknown Unknown Unknown
hadcm3s_um_8.34_i 082C98AF Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C028B Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C1E0A Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F95C9 Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F867F Unknown Unknown Unknown
hadcm3s_um_8.34_i 0840346D Unknown Unknown Unknown
libc-2.12.so 00421D26 __libc_start_main Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=27249, iMonCtr=1
Model crash detected, will try to restart...
forrtl: severe (24): end-of-file during read, unit 5, file /home/boinc/projects/climateprediction.net/hadcm3s_a092_203412_120_599_011122340/jobs/climate.cpdc, line 873, position 0
Image PC Routine Line Source
hadcm3s_um_8.34_i 0848A415 Unknown Unknown Unknown
hadcm3s_um_8.34_i 084AE5F7 Unknown Unknown Unknown
hadcm3s_um_8.34_i 082C98AF Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C028B Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C1E0A Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F95C9 Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F867F Unknown Unknown Unknown
hadcm3s_um_8.34_i 0840346D Unknown Unknown Unknown
libc-2.12.so 0033ED26 __libc_start_main Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=27249, iMonCtr=1
Model crash detected, will try to restart...
forrtl: severe (24): end-of-file during read, unit 5, file /home/boinc/projects/climateprediction.net/hadcm3s_a092_203412_120_599_011122340/jobs/climate.cpdc, line 873, position 0
Image PC Routine Line Source
hadcm3s_um_8.34_i 0848A415 Unknown Unknown Unknown
hadcm3s_um_8.34_i 084AE5F7 Unknown Unknown Unknown
hadcm3s_um_8.34_i 082C98AF Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C028B Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C1E0A Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F95C9 Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F867F Unknown Unknown Unknown
hadcm3s_um_8.34_i 0840346D Unknown Unknown Unknown
libc-2.12.so 0030ED26 __libc_start_main Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=27249, iMonCtr=1
Model crash detected, will try to restart...
forrtl: severe (24): end-of-file during read, unit 5, file /home/boinc/projects/climateprediction.net/hadcm3s_a092_203412_120_599_011122340/jobs/climate.cpdc, line 873, position 0
Image PC Routine Line Source
hadcm3s_um_8.34_i 0848A415 Unknown Unknown Unknown
hadcm3s_um_8.34_i 084AE5F7 Unknown Unknown Unknown
hadcm3s_um_8.34_i 082C98AF Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C028B Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C1E0A Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F95C9 Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F867F Unknown Unknown Unknown
hadcm3s_um_8.34_i 0840346D Unknown Unknown Unknown
libc-2.12.so 00322D26 __libc_start_main Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=27249, iMonCtr=1
Model crash detected, will try to restart...
forrtl: severe (24): end-of-file during read, unit 5, file /home/boinc/projects/climateprediction.net/hadcm3s_a092_203412_120_599_011122340/jobs/climate.cpdc, line 873, position 0
Image PC Routine Line Source
hadcm3s_um_8.34_i 0848A415 Unknown Unknown Unknown
hadcm3s_um_8.34_i 084AE5F7 Unknown Unknown Unknown
hadcm3s_um_8.34_i 082C98AF Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C028B Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C1E0A Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F95C9 Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F867F Unknown Unknown Unknown
hadcm3s_um_8.34_i 0840346D Unknown Unknown Unknown
libc-2.12.so 0030ED26 __libc_start_main Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=27249, iMonCtr=1
Model crash detected, will try to restart...
forrtl: severe (24): end-of-file during read, unit 5, file /home/boinc/projects/climateprediction.net/hadcm3s_a092_203412_120_599_011122340/jobs/climate.cpdc, line 873, position 0
Image PC Routine Line Source
hadcm3s_um_8.34_i 0848A415 Unknown Unknown Unknown
hadcm3s_um_8.34_i 084AE5F7 Unknown Unknown Unknown
hadcm3s_um_8.34_i 082C98AF Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C028B Unknown Unknown Unknown
hadcm3s_um_8.34_i 081C1E0A Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F95C9 Unknown Unknown Unknown
hadcm3s_um_8.34_i 083F867F Unknown Unknown Unknown
hadcm3s_um_8.34_i 0840346D Unknown Unknown Unknown
libc-2.12.so 0030ED26 __libc_start_main Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=27249, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Calling boinc_finish...04:32:01 (27249): called boinc_finish(22)
In boinc_exit called with status 22
Calloing set_signal_exit_code with status 22

</stderr_txt>
]]>
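
The log above repeats the same Fortran error each time the controller restarts the model, until it gives up. A minimal sketch for summarising such a stderr dump is below. The regular expression and the restart marker are taken from the text of this log only, so other failure types may need different patterns. `summarize_stderr` is a hypothetical helper, and the path in the sample is a placeholder.

```python
import re
from collections import Counter

# Matches Intel Fortran runtime messages such as
# "forrtl: severe (24): end-of-file during read, unit 5, file ..."
FORRTL_RE = re.compile(r"forrtl:\s*(\w+)\s*\((\d+)\):\s*([^,\n]+)")
RESTART_MARKER = "Model crash detected, will try to restart"


def summarize_stderr(text):
    """Return (restart_attempts, Counter keyed by (severity, code, message))."""
    errors = Counter(
        (m.group(1), int(m.group(2)), m.group(3).strip())
        for m in FORRTL_RE.finditer(text)
    )
    restarts = text.count(RESTART_MARKER)
    return restarts, errors


if __name__ == "__main__":
    # Shortened sample in the same shape as the log above (placeholder path).
    sample = (
        "forrtl: severe (24): end-of-file during read, unit 5, "
        "file /path/to/jobs/climate.cpdc, line 873, position 0\n"
        "Model crash detected, will try to restart...\n"
        "forrtl: severe (24): end-of-file during read, unit 5, "
        "file /path/to/jobs/climate.cpdc, line 873, position 0\n"
        "Model crash detected, will try to restart...\n"
        "Sorry, too many model crashes! :-(\n"
    )
    restarts, errors = summarize_stderr(sample)
    print("restart attempts:", restarts)
    for (severity, code, message), count in errors.items():
        print(f"{count} x forrtl {severity} ({code}): {message}")
```

Grouping by severity, code and message makes it easy to see whether every restart hit the same error, as it did here, or whether different failures are mixed together.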
ID: 56527
KenBob2
Joined: 15 Oct 05
Posts: 1
Credit: 25,257,825
RAC: 7,689
Message 56532 - Posted: 20 Jul 2017, 13:00:55 UTC - in response to Message 56516.  

I'm also getting errors. All start with 'forrtl: severe (24): end-of-file read'. The last line is 'Stack trace terminated abnormally.'
ID: 56532
Christophe Daulie
Joined: 19 Aug 15
Posts: 5
Credit: 2,899,370
RAC: 0
Message 56546 - Posted: 23 Jul 2017, 17:33:08 UTC - in response to Message 56517.  
Last modified: 23 Jul 2017, 17:33:27 UTC

Hi Christophe,

Can you say what the errors are, please? The tasks of this type are still listed as running on your computer
https://www.cpdn.org/cpdnboinc/results.php?hostid=1396550
so anyone looking is going to have to guess what the problem is. Searching a bit, at least one Windows and one Linux computer have had failures with this batch, but both of the ones I found have quite a high failure rate anyway.

Perhaps it will be possible to work out a bit more once the tasks report, if they have failed, or if anyone else with failures reports in.


The requested errors have already been posted in this thread in the meantime: the errors mentioning line 873.
ID: 56546
Solly
Joined: 9 Feb 17
Posts: 4
Credit: 2,447,704
RAC: 0
Message 56606 - Posted: 31 Jul 2017, 20:04:40 UTC - in response to Message 56520.  

I have had exactly the same issue. I just aborted the task.
ID: 56606
