climateprediction.net home page
Task 16040100

Name hadcm3n_ob1d_1900_40_008469428_0
Workunit 8620267
Created 27 Sep 2013, 9:47:57 UTC
Sent 3 Oct 2013, 17:56:36 UTC
Report deadline 3 Jan 2014, 1:23:47 UTC
Received 6 Nov 2013, 21:04:57 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 22 (0x00000016) Unknown error code
Computer ID 1212841
Run time 8 days 6 hours 37 min 52 sec
CPU time 8 days 4 hours 28 min 48 sec
Validate state Invalid
Credit 8,398.08
Device peak FLOPS 2.93 GFLOPS
Application version UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The device does not recognize the command.
 (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5668, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4008, iMonCtr=1
Model crash detected, will try to restart...
07:48:56 (5712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
19:03:27 (5588): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1276, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5556, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2992, iMonCtr=1
Model crash detected, will try to restart...
18:53:53 (5644): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3096, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4508, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
19:22:28 (4260): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4496, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4612, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4564, iMonCtr=1
Model crash detected, will try to restart...
05:32:33 (4792): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4832, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6104, iMonCtr=1
Model crash detected, will try to restart...
20:00:35 (5856): No heartbeat from core client for 30 sec - exiting
20:00:36 (5856): No heartbeat from core client for 30 sec - exiting
20:00:37 (5856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
19:23:23 (3748): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:24:41 (2032): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1

Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
21:06:07 (5552): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1

Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1

Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1

Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1

Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1

Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Sorry, too many model crashes! :-(
Called boinc_finish

</stderr_txt>
]]>
Latest Trickles Received
Time Sent (UTC) Host ID Result ID Result Name Timestep CPU Time (sec) Average (sec/TS)
03 Nov 2013 19:44:31 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 699,840 691,786 0.9885
03 Nov 2013 12:18:51 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 673,920 665,056 0.9868
02 Nov 2013 18:07:03 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 648,000 638,612 0.9855
02 Nov 2013 10:59:01 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 622,080 613,206 0.9857
31 Oct 2013 20:38:12 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 596,160 586,360 0.9836
28 Oct 2013 21:06:58 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 570,240 559,723 0.9816
28 Oct 2013 13:44:44 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 544,320 549,574 1.0097
28 Oct 2013 06:29:28 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 518,400 524,283 1.0113
27 Oct 2013 14:49:56 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 492,480 497,889 1.0110
27 Oct 2013 07:42:57 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 466,560 472,460 1.0126
26 Oct 2013 14:57:32 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 440,640 446,324 1.0129
25 Oct 2013 20:05:51 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 414,720 420,252 1.0133
23 Oct 2013 20:42:12 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 388,800 394,042 1.0135
20 Oct 2013 18:02:53 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 362,880 367,685 1.0132
20 Oct 2013 10:40:16 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 336,960 341,277 1.0128
18 Oct 2013 20:50:47 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 311,040 314,161 1.0100
16 Oct 2013 19:31:51 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 285,120 287,293 1.0076
13 Oct 2013 18:49:41 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 259,200 260,667 1.0057
13 Oct 2013 11:35:58 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 233,280 234,772 1.0064
12 Oct 2013 18:43:15 1212841 16040100 hadcm3n_ob1d_1900_40_008469428_0 207,360 208,663 1.0063
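
The "Average (sec/TS)" column in the trickle table above appears to be the CPU time divided by the timestep count, rounded to four decimal places. The following is a minimal sketch of that arithmetic, assuming this is how the column is derived; the function name `average_sec_per_timestep` is hypothetical and not part of BOINC or the CPDN software.

```python
def average_sec_per_timestep(cpu_time_sec: str, timestep: str) -> float:
    """Return CPU seconds per model timestep, rounded to 4 decimals.

    Both arguments are strings as they appear in the table,
    with commas as thousands separators (e.g. "691,786").
    Assumption: the table's average is simply cpu_time / timestep.
    """
    cpu = int(cpu_time_sec.replace(",", ""))
    steps = int(timestep.replace(",", ""))
    return round(cpu / steps, 4)


# Most recent trickle row: timestep 699,840, CPU time 691,786 sec
latest = average_sec_per_timestep("691,786", "699,840")
print(latest)
```

Under this assumption the most recent row reproduces the listed value of 0.9885, and the oldest row shown (208,663 sec over 207,360 timesteps) reproduces 1.0063.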

