Questions and Answers :
Unix/Linux :
hadam model restart errors
Author | Message |
---|---|
Joined: 22 Oct 05 Posts: 15 Credit: 2,340,122 RAC: 0 |
After booting my computer once, I found that BOINC had crashed and restarted, and that the HADAM model time had been set back by some hours (see log below). Since then, the HADAM model restarts and sends a trickle message twice each time the BOINC client starts. Additionally, the consumed time was reset to something near zero. I guess this time corruption is caused by BOINC rather than the project, as I have seen similar time corruption with 2 Einstein@Home workunits more than once. Is it worth continuing the model?
System information: Linux, Fedora 8 x86_64, etc.
Andris
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018737 A - 11/08/2000 02:50 - H:M:S=0075:45:34 AVG=14.56 DLT=10.57
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018738 A - 11/08/2000 03:00 - H:M:S=0075:45:43 AVG=14.56 DLT= 9.36
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018739 A - 11/08/2000 03:10 - H:M:S=0075:45:53 AVG=14.56 DLT=10.00
Cleaning up graphics data...
Detaching shared memory...
23-Jun-2008 18:55:31 [---] Starting BOINC client version 5.10.45 for x86_64-pc-linux-gnu
23-Jun-2008 18:55:32 [---] log flags: task, file_xfer, sched_ops
23-Jun-2008 18:55:32 [---] Libraries: libcurl/7.18.2 NSS/3.12.0.3 zlib/1.2.3 libidn/0.6.14
23-Jun-2008 18:55:32 [---] Executing as a daemon
23-Jun-2008 18:55:32 [---] Data directory: /var/lib/boinc
23-Jun-2008 18:55:32 [Einstein@Home] Found app_info.xml; using anonymous platform
23-Jun-2008 18:55:32 [---] Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
23-Jun-2008 18:55:32 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
23-Jun-2008 18:55:32 [---] OS: Linux: 2.6.25.6-27.fc8
23-Jun-2008 18:55:32 [---] Memory: 3.87 GB physical, 3.91 GB virtual
23-Jun-2008 18:55:32 [---] Disk: 47.30 GB total, 29.55 GB free
23-Jun-2008 18:55:32 [---] Local time is UTC +3 hours
23-Jun-2008 18:55:33 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 826910; location: home; project prefs: default
23-Jun-2008 18:55:33 [3x+1@home] URL: http://allprojectstats.com/collatz/; Computer ID: 1108; location: (none); project prefs: default
23-Jun-2008 18:55:33 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4174592; location: home; project prefs: default
23-Jun-2008 18:55:33 [orbit@home] URL: http://orbit.psi.edu/oah/; Computer ID: 3512; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Cosmology@Home] URL: http://www.cosmologyathome.org/; Computer ID: 14606; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Milkyway@home] URL: http://milkyway.cs.rpi.edu/milkyway/; Computer ID: 9818; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 1099657; location: home; project prefs: default
23-Jun-2008 18:55:33 [---] General prefs: from http://cosmologyathome.org/ (last modified 26-Jan-2008 22:13:49)
23-Jun-2008 18:55:33 [---] Host location: none
23-Jun-2008 18:55:33 [---] General prefs: using your defaults
23-Jun-2008 18:55:33 [---] Reading preferences override file
23-Jun-2008 18:55:33 [---] Preferences limit memory usage when active to 1983.22MB
23-Jun-2008 18:55:33 [---] Preferences limit memory usage when idle to 3569.80MB
23-Jun-2008 18:55:33 [---] Preferences limit disk usage to 9.31GB
23-Jun-2008 18:55:33 [climateprediction.net] Restarting task hadam3h_c_52s16_2000_2000_1_0 using hadam3 version 503
23-Jun-2008 18:55:41 [Einstein@Home] Restarting task h1_1089.05_S5R3__350_S5R3b_1 using einstein_S5R3 version 438
23-Jun-2008 18:55:41 [Einstein@Home] Restarting task h1_1089.05_S5R3__349_S5R3b_0 using einstein_S5R3 version 438
23-Jun-2008 18:55:42 [Einstein@Home] Restarting task h1_1089.05_S5R3__549_S5R3b_2 using einstein_S5R3 version 438
Beginning work on result hadam3h_c_52s16_2000_2000_1_0...
Starting model in /var/lib/boinc/projects/climateprediction.net...
Created shared memory region key = 113980 of size 5009240 bytes (version 602) .so shmem return code = 1152
Starting model ID hadam3h_c_52s16_2000_2000_1 Phase 1
Program launched with process id # 3583
Climate model starting - use graphics to monitor progress. Or visit the website to see the graphs for this run.
Getting pthread attributes - retval=0
Setting pthread size (576716800 bytes) - retval=0
Executing program hadam3_um_5.03_i686-pc-linux-gnu 113980 dth20l_052s16.anc ssta2000.anc sicea2000.anc
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0075:41:25 AVG=14.56 DLT= 0.00
23-Jun-2008 18:55:49 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
23-Jun-2008 18:55:54 [climateprediction.net] Scheduler request succeeded: got 0 new tasks
Cleaning up graphics data...
Detaching shared memory...
23-Jun-2008 18:56:11 [climateprediction.net] Task hadam3h_c_52s16_2000_2000_1_0 exited with zero status but no 'finished' file
23-Jun-2008 18:56:11 [climateprediction.net] If this happens repeatedly you may need to reset the project.
23-Jun-2008 18:56:11 [climateprediction.net] Restarting task hadam3h_c_52s16_2000_2000_1_0 using hadam3 version 503
Beginning work on result hadam3h_c_52s16_2000_2000_1_0...
Starting model in /var/lib/boinc/projects/climateprediction.net...
Created shared memory region key = 113980 of size 5009240 bytes (version 602) .so shmem return code = 1152
Starting model ID hadam3h_c_52s16_2000_2000_1 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (576716800 bytes) - retval=0
Executing program hadam3_um_5.03_i686-pc-linux-gnu 113980 dth20l_052s16.anc ssta2000.anc sicea2000.anc
Program launched with process id # 3596
Climate model starting - use graphics to monitor progress. Or visit the website to see the graphs for this run.
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0001:13:57 AVG= 0.24 DLT= 0.00
23-Jun-2008 18:56:15 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
23-Jun-2008 18:56:20 [climateprediction.net] Scheduler request succeeded: got 0 new tasks
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018722 A - 11/08/2000 00:20 - H:M:S=0001:15:28 AVG= 0.24 DLT=90.95
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018723 A - 11/08/2000 00:30 - H:M:S=0001:15:37 AVG= 0.24 DLT= 8.99
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018724 A - 11/08/2000 00:40 - H:M:S=0001:15:47 AVG= 0.24 DLT= 9.99 |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Did you Exit from BOINC before re-booting? This is highly recommended if you don't want to crash climate models. Several trickles after a restart are OK. If you look at the trickle file, you'll see that they are type 'cse', whereas 'normal' trickles are type 'orig'. Type 'cse' just tells the server that the model is back up and running. As for continuing, is the day/month/year constantly advancing in the graphics? If so, continue. If it keeps jumping backwards, it may be best to abort. |
Joined: 22 Oct 05 Posts: 15 Credit: 2,340,122 RAC: 0 |
"Did you Exit from BOINC before re-booting?"
The Linux shutdown scripts stop BOINC automatically. I'm using the Fedora 8 boinc-client RPM, which runs BOINC in daemon mode.
The text output from the BOINC client shows that there are no backjumps (except for falling back to the last checkpoint after restarting). In daemon mode, graphics do not seem to be working. I tried the following: 1) suspended the CPDN model; 2) restarted BOINC (/etc/init.d/boinc-client stop, then started it again); 3) resumed the CPDN model. It still exited and restarted from the checkpoint in the same way as in the log included earlier. |
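Since graphics are not available in daemon mode, the "is the model date still advancing?" check can also be done on the text output. The following is only a minimal sketch in Python, not a CPDN or BOINC tool: the function names `parse_progress` and `find_regressions` are hypothetical, and the regular expression assumes that progress lines always look like the `PH 1 TS ... H:M:S=...` entries quoted in the log earlier in this thread.

```python
import re

# Assumed layout of a model progress line, taken from the log in this thread:
# <model> - PH <phase> TS <timestep> A - <dd/mm/yyyy hh:mm> - H:M:S=<h>:<m>:<s> ...
LINE_RE = re.compile(
    r"(?P<model>\S+) - PH\s+(?P<phase>\d+) TS (?P<ts>\d+) A - "
    r"(?P<date>\d{2}/\d{2}/\d{4} \d{2}:\d{2}) - "
    r"H:M:S=(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})"
)


def parse_progress(text):
    """Return a list of (timestep, elapsed_seconds) for each progress entry."""
    entries = []
    for match in LINE_RE.finditer(text):
        elapsed = (
            int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
        )
        entries.append((int(match["ts"]), elapsed))
    return entries


def find_regressions(entries):
    """Compare consecutive entries and report values that went backwards."""
    problems = []
    for (prev_ts, prev_elapsed), (ts, elapsed) in zip(entries, entries[1:]):
        if ts < prev_ts:
            problems.append(("timestep_backjump", prev_ts, ts))
        if elapsed < prev_elapsed:
            problems.append(("elapsed_reset", prev_elapsed, elapsed))
    return problems


# Four progress entries copied from the log in this thread.
SAMPLE = (
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018739 A - 11/08/2000 03:10 - H:M:S=0075:45:53 AVG=14.56 DLT=10.00 "
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0075:41:25 AVG=14.56 DLT= 0.00 "
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0001:13:57 AVG= 0.24 DLT= 0.00 "
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018722 A - 11/08/2000 00:20 - H:M:S=0001:15:28 AVG= 0.24 DLT=90.95"
)

if __name__ == "__main__":
    for problem in find_regressions(parse_progress(SAMPLE)):
        print(problem)
```

A single timestep backjump right after a restart is the expected fall-back to the last checkpoint; repeated backjumps would be the "keeps jumping backwards" case mentioned above. An elapsed-time reset with an unchanged timestep corresponds to the consumed-time corruption described in the first post.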
©2024 climateprediction.net