Message boards : Number crunching : Replanca Error/Sigseg fault.
Author | Message |
---|---|
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
Two more tasks from the same batch 617 failed with SIGSEGV: segmentation violation at 13 h 49 min. I have 4 failures in a row, all at the 1st attempt (_0): https://www.cpdn.org/cpdnboinc/result.php?resultid=20576338 https://www.cpdn.org/cpdnboinc/result.php?resultid=20566129 |
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
Two more from batch 617 failed on the second Linux machine after 12 minutes. Model crashed: In boinc_exit called with status 0 Calling set_signal_exit_code with status 0 https://www.cpdn.org/cpdnboinc/result.php?resultid=20574397 https://www.cpdn.org/cpdnboinc/result.php?resultid=20574922 |
Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
https://www.cpdn.org/cpdnboinc/workunit.php?wuid=11133626 This one is now in the last chance saloon. On both Mac and my Linux laptop it failed with a segmentation fault just before creation of the first zip. Now it is on a Windows box, so I will check back in a while to see what happens and whether the Linux/Mac versions need to be pulled. |
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
So far 9 WUs have failed on my Linux machine (and later on Darwin/Linux at the 2nd attempt). 5 of them report trickles under Windows; the other 4 haven't reported under Windows yet. I still have two more in the queue. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
https://www.cpdn.org/cpdnboinc/workunit.php?wuid=11133626 I'm assuming this is once again one of those batches that fails on the first timestep of the regional model on January 1st on Mac and Linux? |
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
"I'm assuming this is once again one of those batches that fails on the first timestep of the regional model on January 1st on Mac and Linux?" It looks like it. I'm tailing my last one, but will have results in 12 h. One more failed but is stuck in progress, so I have 10 that failed in total on my Linux machine. |
Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
Looks very much like it. Am emailing project. |
Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
Got the following back from Sarah: Hi Dave, So any of this batch on Linux/Mac machines can be aborted. |
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
Thanks Dave, So no need to tail the last 617er. Will abort. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I got two batch 617 work units and each failed with a segmentation fault after about 12 hours on a Linux machine. Computer ID 1256552 wah2_sas50_l09y_198612_13_617_011131907_1 wah2_sas50_l2nz_199512_13_617_011135004_1 |
Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
They will do. On Linux and Mac, please just abort this batch. Not sure what is happening with the ones that crash out after a few minutes though. I am assuming that is a different problem. Don't know if that one affects Windows machines or not. |
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
The 3 WUs that failed after 8-12 minutes (mentioned above) on my Linux machine seem to work fine under Windows, as zips and trickles are piling up. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
"They will do, on Linux and Mac please just abort this batch. Not sure what is happening with the ones that crash out after a few minutes though. I am assuming that is a different problem. Don't know if that one affects windows machines or not." I have a Windows batch 617 WU that's been running for 2 days now with no problems. |
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
3 models failed with SIGSEGV: segmentation violation, Stack trace (12 frames): wah2_afr50_k38v_201512_13_644_011208869_0 wah2_wus25_td9f_200309_25_583_011064495_1 wah2_wus25_tc4t_200409_25_583_011063033_1 |
Joined: 18 Jul 13 Posts: 438 Credit: 25,829,527 RAC: 9,480 |
It might be irrelevant to the current topic, but two WUs from the bad batch 617 that was discussed below are still shown as In Progress on the web, even though the Linux one crashed and the Windows one finished successfully. I will detach/reattach to release them, but the project people might want to have another look at that batch and find out why ghost WUs are reported here. |
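The task names quoted in this thread appear to carry the batch number as the sixth underscore-separated field (617, 644, 583) and the attempt number as the last field (the "_0" mentioned for first attempts). That layout is inferred only from the names posted here, not from any CPDN documentation, so treat it as an assumption. Under that assumption, a minimal Python sketch for picking out the tasks of one batch from a list of names might look like this. The function names `batch_and_attempt` and `tasks_in_batch` are made up for illustration.

```python
from typing import List, Optional, Tuple


def batch_and_attempt(task_name: str) -> Optional[Tuple[int, int]]:
    """Return (batch, attempt) for a task name, or None if it does not fit.

    ASSUMPTION: names have 8 underscore-separated fields, with the batch
    number in field 6 and the attempt number in field 8, as in
    wah2_sas50_l09y_198612_13_617_011131907_1 (batch 617, attempt 1).
    """
    parts = task_name.strip().split("_")
    if len(parts) != 8:
        return None
    batch, attempt = parts[5], parts[7]
    if not (batch.isdigit() and attempt.isdigit()):
        return None
    return int(batch), int(attempt)


def tasks_in_batch(task_names: List[str], batch: int) -> List[str]:
    """Keep only the names whose inferred batch number equals `batch`."""
    selected = []
    for name in task_names:
        parsed = batch_and_attempt(name)
        if parsed is not None and parsed[0] == batch:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Task names copied from the posts in this thread.
    names = [
        "wah2_sas50_l09y_198612_13_617_011131907_1",
        "wah2_sas50_l2nz_199512_13_617_011135004_1",
        "wah2_afr50_k38v_201512_13_644_011208869_0",
        "wah2_wus25_td9f_200309_25_583_011064495_1",
        "wah2_wus25_tc4t_200409_25_583_011063033_1",
    ]
    for name in tasks_in_batch(names, 617):
        print(name)
```

The sketch only filters names. It does not talk to BOINC or abort anything, so the tasks it lists would still have to be aborted by hand.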
©2024 cpdn.org