Task 15720750

Name	hadcm3n_4jvy_1940_40_008309388_1
Workunit	8460523
Created	10 Apr 2013, 23:34:13 UTC
Sent	10 Apr 2013, 23:34:45 UTC
Report deadline	11 Jul 2013, 7:01:56 UTC
Received	8 May 2013, 23:30:43 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1239373
Run time	17 days 17 hours 40 min 54 sec
CPU time	14 days 22 hours 43 min 50 sec
Validate state	Invalid
Credit	5,598.72
Device peak FLOPS	1.28 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 i686-pc-linux-gnu
Stderr	<core_client_version>6.6.41</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 63 - Return code = 1
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 64 - Return code = 1
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 65 - Return code = 1
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 66 - Return code = 1
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 67 - Return code = 1
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 68 - Return code = 1
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 69 - Return code = 1
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
00:26:30 (30031): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:49:33 (15111): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:49:34 (15111): No heartbeat from core client for 30 sec - exiting
01:07:06 (24305): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:42:30 (31248): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:46:43 (12912): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:00:48 (14656): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
03:31:07 (20077): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
03:38:30 (23553): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
03:38:33 (23553): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=26456, iMonCtr=1
Model crash detected, will try to restart...
04:23:02 (26456): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11828, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11828, iMonCtr=1
Model crash detected, will try to restart...
07:52:37 (11828): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=30226, iMonCtr=1
Model crash detected, will try to restart...
09:01:23 (30226): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:23:52 (25333): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:24:00 (25333): No heartbeat from core client for 30 sec - exiting
11:17:19 (25637): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:58:22 (14469): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:58:23 (14469): No heartbeat from core client for 30 sec - exiting
12:58:24 (14469): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=22374, iMonCtr=1
Model crash detected, will try to restart...
18:10:23 (22374): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:10:27 (22374): No heartbeat from core client for 30 sec - exiting
18:10:28 (22374): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=15556, iMonCtr=1
Model crash detected, will try to restart...
22:39:56 (15556): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=25834, iMonCtr=1
Model crash detected, will try to restart...
23:13:47 (25834): No heartbeat from core client for 30 sec - exiting
23:13:51 (25834): No heartbeat from core client for 30 sec - exiting
23:13:52 (25834): No heartbeat from core client for 30 sec - exiting
23:13:53 (25834): No heartbeat from core client for 30 sec - exiting
23:13:54 (25834): No heartbeat from core client for 30 sec - exiting
23:13:55 (25834): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6292, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6292, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6292, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6292, iMonCtr=1
Model crash detected, will try to restart...
03:55:34 (6292): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
03:55:36 (6292): No heartbeat from core client for 30 sec - exiting
03:55:37 (6292): No heartbeat from core client for 30 sec - exiting
05:26:09 (20615): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=24311, iMonCtr=1
Model crash detected, will try to restart...
06:42:05 (24311): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:17:54 (21825): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:17:58 (21825): No heartbeat from core client for 30 sec - exiting
14:17:59 (21825): No heartbeat from core client for 30 sec - exiting
14:18:00 (21825): No heartbeat from core client for 30 sec - exiting
14:23:40 (7554): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:23:42 (7554): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9785, iMonCtr=1
Model crash detected, will try to restart...
18:23:14 (2544): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:23:21 (2544): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=16122, iMonCtr=1
Model crash detected, will try to restart...
19:31:52 (16122): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10926, iMonCtr=1
Model crash detected, will try to restart...
20:19:15 (10926): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
SIGABRT: abort called
Stack trace (15 frames):
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x840da8f]
[0xb77bb400]
[0xb77bb424]
/lib/i386-linux-gnu/libc.so.6(gsignal+0x4f)[0xb75e01df]
/lib/i386-linux-gnu/libc.so.6(abort+0x175)[0xb75e3825]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x8401b90]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83f7af5]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83f7b32]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83f7c5a]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83f825e]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x83f829d]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839dfbf]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x839bdf3]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb75cb4d3]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_um_6.07_i686-pc-linux-gnu[0x804cb11]
Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=29843, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=29843, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=29843, iMonCtr=1
Model crash detected, will try to restart...
02:05:47 (29843): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
05:20:41 (5247): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=17828, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=17828, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=17828, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=17828, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=17828, iMonCtr=1
Model crash detected, will try to restart...
18:08:26 (4437): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7317, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7317, iMonCtr=1
Model crash detected, will try to restart...
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
SIGABRT: abort called
Stack trace (23 frames):
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x80b80df]
[0xb7761400]
[0xb7761424]
/lib/i386-linux-gnu/libc.so.6(gsignal+0x4f)[0xb74831df]
/lib/i386-linux-gnu/libc.so.6(abort+0x175)[0xb7486825]
/usr/lib/i386-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x14d)[0xb76f613d]
/usr/lib/i386-linux-gnu/libstdc++.so.6(+0xaaed3)[0xb76f3ed3]
/usr/lib/i386-linux-gnu/libstdc++.so.6(+0xaaf0f)[0xb76f3f0f]
/usr/lib/i386-linux-gnu/libstdc++.so.6(+0xab05e)[0xb76f405e]
/usr/lib/i386-linux-gnu/libstdc++.so.6(_Znwj+0x7f)[0xb76f467f]
/usr/lib/i386-linux-gnu/libstdc++.so.6(_Znaj+0x1b)[0xb76f474b]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_se_6.07_i686-pc-linux-gnu.so(_Z8ReadDataRSt14basic_ifstreamIcSt11char_traitsIcEEPiS4_+0x8e)[0xb6f7d91e]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_se_6.07_i686-pc-linux-gnu.so(_Z11ConvertFileSsSsSsSsi+0x238)[0xb6f78d18]
/home/huo/BOINC/projects/climateprediction.net/hadcm3n_se_6.07_i686-pc-linux-gnu.so(umtonc+0x158)[0xb6f7efc8]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu[0x805afab]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu[0x805490c]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu[0x8058324]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu[0x804f1f4]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu[0x8050491]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu[0x805112c]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu[0x805137a]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb746e4d3]
../../projects/climateprediction.net/hadcm3n_6.07_i686-pc-linux-gnu(__gxx_personality_v0+0x169)[0x804cb51]
Exiting...
</stderr_txt>
]]>
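
The exit status is reported in three forms: 193, 0xc1, and -63. These appear to be the same byte shown as unsigned decimal, hexadecimal, and a signed 8-bit value. The following is a minimal sketch of that arithmetic only; the helper name `describe_exit_code` is hypothetical and is not part of BOINC.

```python
def describe_exit_code(code: int) -> tuple[int, str, int]:
    """Return (unsigned byte, hex string, signed 8-bit value) for an exit code."""
    low_byte = code & 0xFF  # keep only the lowest 8 bits
    # Two's-complement reinterpretation: values 128..255 map to -128..-1.
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return low_byte, hex(low_byte), signed


# Values taken from the report above: "code 193 (0xc1, -63)".
print(describe_exit_code(193))  # (193, '0xc1', -63)
```

This only confirms that the three numbers agree with each other; the meaning of the code itself (EXIT_SIGNAL) is the one given in the Exit status field.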

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
05 May 2013 03:58:43	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	466,560	1,252,412	2.6844
04 May 2013 07:45:53	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	440,640	1,182,710	2.6841
03 May 2013 11:29:27	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	414,720	1,113,095	2.6840
02 May 2013 15:15:35	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	388,800	1,043,400	2.6836
01 May 2013 18:58:38	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	362,880	973,772	2.6835
30 Apr 2013 22:41:04	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	336,960	904,232	2.6835
30 Apr 2013 02:28:30	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	311,040	834,713	2.6836
29 Apr 2013 06:11:57	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	285,120	764,917	2.6828
28 Apr 2013 09:51:28	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	259,200	695,071	2.6816
27 Apr 2013 13:24:27	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	233,280	625,166	2.6799
26 Apr 2013 17:02:01	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	207,360	555,630	2.6795
25 Apr 2013 21:08:23	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	181,440	486,083	2.6790
25 Apr 2013 00:14:11	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	155,520	416,353	2.6772
24 Apr 2013 03:58:35	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	129,600	346,993	2.6774
23 Apr 2013 07:43:44	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	103,680	277,539	2.6769
22 Apr 2013 11:24:47	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	77,760	208,115	2.6764
21 Apr 2013 15:15:32	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	51,840	138,747	2.6764
20 Apr 2013 18:59:52	1239373	15720750	hadcm3n_4jvy_1940_40_008309388_1	25,920	69,337	2.6750