Name | hadcm3n_o59b_2060_40_008116151_0 |
Workunit | 8271265 |
Created | 30 Jul 2012, 16:12:57 UTC |
Sent | 30 Jul 2012, 16:14:22 UTC |
Report deadline | 29 Oct 2012, 23:41:33 UTC |
Received | 25 Aug 2012, 2:22:18 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 583026 |
Run time | 24 days 22 hours 40 min 31 sec |
CPU time | 19 days 16 hours 25 min 44 sec |
Validate state | Invalid |
Credit | 8,398.08 |
Device peak FLOPS | 2.23 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
14:07:42 (4164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:07:43 (4164): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3236, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3036, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
12:58:15 (3764): No heartbeat from core client for 30 sec - exiting
12:58:16 (3764): No heartbeat from core client for 30 sec - exiting
12:58:17 (3764): No heartbeat from core client for 30 sec - exiting
12:58:18 (3764): No heartbeat from core client for 30 sec - exiting
12:58:19 (3764): No heartbeat from core client for 30 sec - exiting
12:58:20 (3764): No heartbeat from core client for 30 sec - exiting
12:58:21 (3764): No heartbeat from core client for 30 sec - exiting
12:58:22 (3764): No heartbeat from core client for 30 sec - exiting
12:58:24 (3764): No heartbeat from core client for 30 sec - exiting
12:58:25 (3764): No heartbeat from core client for 30 sec - exiting
12:58:26 (3764): No heartbeat from core client for 30 sec - exiting
12:58:27 (3764): No heartbeat from core client for 30 sec - exiting
12:58:28 (3764): No heartbeat from core client for 30 sec - exiting
12:58:29 (3764): No heartbeat from core client for 30 sec - exiting
12:58:30 (3764): No heartbeat from core client for 30 sec - exiting
12:58:31 (3764): No heartbeat from core client for 30 sec - exiting
12:58:32 (3764): No heartbeat from core client for 30 sec - exiting
12:58:33 (3764): No heartbeat from core client for 30 sec - exiting
12:58:34 (3764): No heartbeat from core client for 30 sec - exiting
12:58:36 (3764): No heartbeat from core client for 30 sec - exiting
12:58:37 (3764): No heartbeat from core client for 30 sec - exiting
12:58:38 (3764): No heartbeat from core client for 30 sec - exiting
12:58:39 (3764): No heartbeat from core client for 30 sec - exiting
12:58:40 (3764): No heartbeat from core client for 30 sec - exiting
12:58:41 (3764): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:59:16 (1852): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3368, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3368, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3368, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3128, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3128, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3128, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
23 Aug 2012 23:29:05 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 699,840 | 1,643,594 | 2.3485 |
23 Aug 2012 01:34:46 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 673,920 | 1,580,165 | 2.3447 |
22 Aug 2012 04:20:19 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 648,000 | 1,520,259 | 2.3461 |
21 Aug 2012 05:39:11 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 622,080 | 1,458,858 | 2.3451 |
20 Aug 2012 10:34:48 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 596,160 | 1,399,533 | 2.3476 |
19 Aug 2012 12:35:08 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 570,240 | 1,337,672 | 2.3458 |
18 Aug 2012 12:14:44 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 544,320 | 1,276,969 | 2.3460 |
17 Aug 2012 13:15:47 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 518,400 | 1,216,824 | 2.3473 |
16 Aug 2012 16:50:15 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 492,480 | 1,156,330 | 2.3480 |
16 Aug 2012 00:24:03 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 466,560 | 1,094,911 | 2.3468 |
15 Aug 2012 05:49:26 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 440,640 | 1,034,089 | 2.3468 |
14 Aug 2012 10:51:52 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 414,720 | 972,301 | 2.3445 |
13 Aug 2012 14:51:22 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 388,800 | 910,902 | 2.3429 |
12 Aug 2012 16:08:05 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 362,880 | 849,371 | 2.3406 |
11 Aug 2012 18:09:48 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 336,960 | 787,633 | 2.3375 |
10 Aug 2012 19:38:18 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 311,040 | 725,794 | 2.3334 |
09 Aug 2012 17:39:50 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 285,120 | 665,295 | 2.3334 |
08 Aug 2012 18:42:34 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 259,200 | 605,041 | 2.3343 |
07 Aug 2012 19:48:36 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 233,280 | 544,592 | 2.3345 |
06 Aug 2012 22:07:50 | 583026 | 15053398 | hadcm3n_o59b_2060_40_008116151_0 | 207,360 | 484,947 | 2.3387 |
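
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The page does not state this; it is an assumption that happens to match the rows above. A minimal Python sketch of that check, using a hypothetical helper name and values copied from the first and last rows of the table:

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumed formula (not documented on the page): CPU Time / Timestep.
    """
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec, average reported in the table)
rows = [
    (699_840, 1_643_594, 2.3485),  # trickle sent 23 Aug 2012 23:29:05
    (207_360, 484_947, 2.3387),    # trickle sent 06 Aug 2012 22:07:50
]

for timestep, cpu_time_sec, reported in rows:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    print(f"timestep={timestep} computed={computed} reported={reported}")
```

If the assumption holds, the computed and reported values agree for both rows.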