
Client Error/Computation Error - HADSMs


sphagc
Joined: Apr 24 06
Posts: 4
Credit: 8,984,280
RAC: 0
Message 38003 - Posted 17 Sep 2009 21:31:25 UTC

    For the recent batch of HADSM models I have been getting the following messages:

    17/09/2009 20:31:05 climateprediction.net Started upload of hadsm3fub_k4q9_006418923_1_1.zip
    17/09/2009 20:31:06 climateprediction.net Computation for task hadsm3fub_k4q9_006418923_1 finished
    17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_2.zip for task hadsm3fub_k4q9_006418923_1 absent
    17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_3.zip for task hadsm3fub_k4q9_006418923_1 absent
    17/09/2009 20:31:44 climateprediction.net Finished upload of hadsm3fub_k4q9_006418923_1_1.zip
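For anyone wanting to scan a longer message log for the same symptom, here is a rough Python sketch that collects the "absent" output files per task. The function name `absent_outputs` is made up for illustration and is not part of BOINC; the line format is assumed to match the log lines quoted above.

```python
import re

# Matches log lines of the form:
#   "... Output file <file> for task <task> absent"
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_text):
    """Return {task_name: [missing_file, ...]} for every 'absent' line."""
    missing = {}
    for line in log_text.splitlines():
        match = ABSENT_RE.search(line)
        if match:
            filename, task = match.groups()
            missing.setdefault(task, []).append(filename)
    return missing


# The log lines from the post above, used as sample input.
sample = """\
17/09/2009 20:31:05 climateprediction.net Started upload of hadsm3fub_k4q9_006418923_1_1.zip
17/09/2009 20:31:06 climateprediction.net Computation for task hadsm3fub_k4q9_006418923_1 finished
17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_2.zip for task hadsm3fub_k4q9_006418923_1 absent
17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_3.zip for task hadsm3fub_k4q9_006418923_1 absent
17/09/2009 20:31:44 climateprediction.net Finished upload of hadsm3fub_k4q9_006418923_1_1.zip
"""

for task, files in absent_outputs(sample).items():
    print(task, "is missing", ", ".join(files))
```

Grouping by task makes it easier to see whether every failed model is missing the same later-phase zips.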


    When I look at my account TASK information it indicates Client Error/Computation Error.

    Any ideas why? The HADSM3Ps seem to be running fine.

    Regards

    Coz

    DJStarfox
    Joined: Jan 27 07
    Posts: 262
    Credit: 1,170,691
    RAC: 187
    Message 38004 - Posted 17 Sep 2009 21:58:56 UTC - in response to Message 38003.

      Sphagc,
      Would you post a link to the workunit that you're talking about? And also, which computer is this in your list of computers?

      Les Bayliss
      Forum moderator
      Joined: Sep 5 04
      Posts: 5409
      Credit: 8,958,344
      RAC: 1,679
      Message 38005 - Posted 17 Sep 2009 21:58:56 UTC

        My guess is that you're interrupting the 3 phase slab models at the end of a phase and before the next phase has started. They don't like this!
        There's LOTS of post processing at the end of each phase, which involves extracting data, consolidating it, and then zipping it for upload. Interrupt this and the files are history.

        If a model has reached the end of a phase, wait until after the first trickle in the next phase before interrupting.


        ____________
        Backups: Here

        geophi
        Forum moderator
        Joined: Aug 7 04
        Posts: 1477
        Credit: 22,724,605
        RAC: 5,327
        Message 38006 - Posted 17 Sep 2009 22:23:45 UTC

          It looks like you've had 7 errors right at the end of phase 1. As Les said, something appears to be happening to interrupt post-processing at that critical end-of-phase time. It seems unlikely that you would be manually interrupting each model at the time of failure since the failures occurred at 7 different times.

          If I recall correctly, some executable other than the hadsm3 um process is called at post processing. Perhaps Vista, or an antivirus, or anti-malware application has locked this file that is only needed at that time? Ian/Thyme might have a better idea.

          sphagc
          Joined: Apr 24 06
          Posts: 4
          Credit: 8,984,280
          RAC: 0
          Message 38007 - Posted 18 Sep 2009 13:26:04 UTC - in response to Message 38004.

            Sphagc,
            Would you post a link to the workunit that you're talking about? And also, which computer is this in your list of computers?



            http://climateapps2.oucs.ox.ac.uk/cpdnboinc/hosts_user.php?userid=392646
            Computer which is showing problem:
            Computer ID: 996941
            Name: Cozzie-VistaX64 (location: home)
            Avg. credit: 4,198.64
            Total credit: 88,812
            CPU: GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Intel64 Family 6 Model 15 Stepping 11]
            OS: Microsoft Windows Vista Ultimate x64 Edition, Service Pack 2, (06.00.6002.00)
            Last contact: 18 Sep 2009 12:41:42 UTC

            Machine is left running 24/7 and I only reboot after Microsoft Updates (making sure I close down BOINC before shutdown).

            Tasks with Errors.
            Task ID  Work unit  Sent                      Reported                  Server state  Outcome       Client state   CPU time (sec)  Claimed credit  Granted credit
            9938307  6649750    9 Sep 2009 20:00:32 UTC   15 Sep 2009 17:51:07 UTC  Over          Client error  Compute error  364,086.40      2,282.60        2,282.60
            9894197  6645339    2 Sep 2009 19:15:06 UTC   7 Sep 2009 19:51:26 UTC   Over          Client error  Compute error  417,807.50      2,282.60        2,282.60
            9891457  6645065    11 Sep 2009 15:06:47 UTC  16 Sep 2009 10:13:09 UTC  Over          Client error  Compute error  368,213.10      2,282.60        2,282.60
            9826402  6638561    6 Sep 2009 16:39:28 UTC   11 Sep 2009 15:06:47 UTC  Over          Client error  Compute error  400,210.60      2,282.60        2,282.60
            9811750  6637096    12 Sep 2009 20:35:18 UTC  17 Sep 2009 19:32:18 UTC  Over          Client error  Compute error  392,664.40      2,282.60        2,282.60
            9752529  6631174    7 Sep 2009 19:53:02 UTC   12 Sep 2009 20:35:18 UTC  Over          Client error  Compute error  393,902.40      2,282.60        2,282.60
            9618960  6597657    9 Sep 2009 17:06:57 UTC   15 Sep 2009 19:57:11 UTC  Over          Client error  Compute error  378,376.20      2,282.60        2,282.60

            NB. Everything else seems to be working fine, including the shorter HADSM3Ps. I am doing nothing different with them, and I have not had a problem with the longer models before.

            Many thanks for your help

            Coz.

            geophi
            Forum moderator
            Joined: Aug 7 04
            Posts: 1477
            Credit: 22,724,605
            RAC: 5,327
            Message 38008 - Posted 18 Sep 2009 15:13:55 UTC

              @sphagc

              Are there any differences in setup between that PC and your other Windows PCs that are successfully running hadsm3 type models? Different antivirus? Different antimalware program? Different firewalls?

              DJStarfox
              Joined: Jan 27 07
              Posts: 262
              Credit: 1,170,691
              RAC: 187
              Message 38012 - Posted 20 Sep 2009 4:57:50 UTC - in response to Message 38007.

                Seems like a file permissions problem. Reset security on all files in your BOINC data/projects directory. It could also be Vista security. The climate applications need to be able to spawn themselves and their post-processing items. Without this execute permission, the task will fail. I know there's a Windows Defender or Vista Security something-or-other, or perhaps virus protection, that might be preventing this.

                Other than that, afraid I can't be much help with Vista....
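As a first diagnostic for the permissions or locking theory, a minimal Python sketch like the one below tries to open every file under the project directory and reports any that fail. The function name `unreadable_files` and the example path are assumptions for illustration (substitute your own BOINC data directory). A successful read does not prove the file can be executed, so treat a clean result as only a partial check.

```python
import os


def unreadable_files(directory):
    """Return (path, error) pairs for files that cannot be opened for reading."""
    problems = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                # Reading one byte is enough to trigger a permission
                # or sharing error if the file is blocked or locked.
                with open(path, "rb") as handle:
                    handle.read(1)
            except OSError as exc:
                problems.append((path, str(exc)))
    return problems


if __name__ == "__main__":
    # Hypothetical location; use the data directory your BOINC install reports.
    project_dir = r"C:\ProgramData\BOINC\projects\climateprediction.net"
    for path, error in unreadable_files(project_dir):
        print(path, "->", error)
```

It is worth running this under the same account that BOINC runs as, since permissions can differ between users.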

                Thyme Lawn
                Forum moderator
                Joined: Aug 5 04
                Posts: 1232
                Credit: 10,395,396
                RAC: 1,139
                Message 38018 - Posted 22 Sep 2009 14:29:26 UTC - in response to Message 38006.

                  Last modified: 22 Sep 2009 14:30:07 UTC

                  If I recall correctly, some executable other than the hadsm3 um process is called at post processing.

                  The se process is indeed the problem. All of the HadSM3 tasks are failing with the same error, namely

                  Could not launch smallexecs process. Last Error=5

                  (e.g. click the '+' by stderr out for task id 9938307).

                  Check that projects/climateprediction.net in your BOINC data directory contains the file hadsm3_se_6.07_windows_intelx86.zip (1,958,740 bytes) and that it has been unzipped to hadsm3_se_6.07_windows_intelx86.exe (2,212,352 bytes, modification time 12:11:16 on 21 August 2008).
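The check described above can be scripted. This is only a sketch: the file names and byte counts are the ones quoted in this post, while the function name `check_se_files` and the example directory are assumptions for illustration.

```python
import os

# File names and sizes as quoted in the post above.
EXPECTED = {
    "hadsm3_se_6.07_windows_intelx86.zip": 1958740,
    "hadsm3_se_6.07_windows_intelx86.exe": 2212352,
}


def check_se_files(project_dir, expected=EXPECTED):
    """Return a list of problems; an empty list means every file matched."""
    problems = []
    for name, size in expected.items():
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
        else:
            actual = os.path.getsize(path)
            if actual != size:
                problems.append(f"{name}: size {actual}, expected {size}")
    return problems


if __name__ == "__main__":
    # Hypothetical location; substitute your own BOINC data directory.
    project_dir = r"C:\ProgramData\BOINC\projects\climateprediction.net"
    for problem in check_se_files(project_dir):
        print(problem)
```

A matching size only shows that the file is present and complete. It says nothing about whether Windows will allow it to be launched.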
                  ____________
                  "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

                  Thyme Lawn
                  Forum moderator
                  Joined: Aug 5 04
                  Posts: 1232
                  Credit: 10,395,396
                  RAC: 1,139
                  Message 38019 - Posted 23 Sep 2009 7:50:43 UTC - in response to Message 38018.

                    Could not launch smallexecs process. Last Error=5

                    A further thought about that message. Error number 5 is "Access denied" so the cause could be file permissions or locking.
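To illustrate the lookup, here is a small Python sketch that pulls the number out of a "Last Error=N" line and maps it to a Win32 system error name. The table holds only a handful of standard codes, and the function name `explain_last_error` is hypothetical. The assumption that the model reports a standard Windows system error code comes from the reading of error 5 above.

```python
import re

# A few standard Win32 system error codes (as defined in winerror.h).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    32: "ERROR_SHARING_VIOLATION",
}


def explain_last_error(stderr_text):
    """Return (code, name) for the first 'Last Error=N' found, else None."""
    match = re.search(r"Last Error=(\d+)", stderr_text)
    if match is None:
        return None
    code = int(match.group(1))
    return code, WIN32_ERRORS.get(code, "unknown")


print(explain_last_error("Could not launch smallexecs process. Last Error=5"))
```

Code 32 is included because a file held open by another program, such as a scanner, typically shows up as a sharing violation rather than access denied.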
                    ____________
                    "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

                    sphagc
                    Joined: Apr 24 06
                    Posts: 4
                    Credit: 8,984,280
                    RAC: 0
                    Message 38020 - Posted 23 Sep 2009 8:27:14 UTC - in response to Message 38019.

                      Could not launch smallexecs process. Last Error=5

                      A further thought about that message. Error number 5 is "Access denied" so the cause could be file permissions or locking.



                      Thanks for all the replies. I have checked, and both the exe and zip files are present, with all permissions set correctly as far as I can see.

                      The two quad-core systems are both running Vista X64 Ultimate with Spyware Doctor for malware detection, but the problem system has Kaspersky Internet Security 2009 running, whilst the other has Kaspersky Anti-Virus 6 for Workstations. File permissions etc. have been set identically, unless the security suite has something extra I have missed, although previous HADSM models have caused no problems.

                      Anyway everyone, thanks for the messages. I will keep an eye on the systems and report back if I spot any further problems.

                      Regards

                      Coz.

                      Tamaster
                      Joined: May 20 09
                      Posts: 1
                      Credit: 36,702
                      RAC: 0
                      Message 38321 - Posted 17 Nov 2009 13:03:39 UTC

                        Well... Wish I could figure out why, but I've had far too many compute errors running cpdn tasks and far too much frustration like this one: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6693008 where I've burned hundreds of thousands of compute seconds only to have it punt and get but a fraction of credit. And judging from the above result, I'm not the only one experiencing this type of failure. Perhaps my computer isn't up to the demand, but I don't believe that explains it. I've run Aqua Multithread for hundreds of hours without error, and I've got Folding running on both GPUs daily with nary a problem, all while getting my normal work done. And other BOINC projects crunch along happily side by side with cpdn while it "face-plants" yet again. Ah well... I gave it a go. That should count for something I guess...

                        DJStarfox
                        Joined: Jan 27 07
                        Posts: 262
                        Credit: 1,170,691
                        RAC: 187
                        Message 38371 - Posted 23 Nov 2009 5:07:33 UTC

                          22-Nov-2009 13:59:05 [climateprediction.net] Computation for task hadsm3mh_kv40_006489252_4 finished
                          22-Nov-2009 13:59:05 [climateprediction.net] Output file hadsm3mh_kv40_006489252_4_2.zip for task hadsm3mh_kv40_006489252_4 absent
                          22-Nov-2009 13:59:05 [climateprediction.net] Output file hadsm3mh_kv40_006489252_4_3.zip for task hadsm3mh_kv40_006489252_4 absent
                          22-Nov-2009 13:59:05 [climateprediction.net] Output file hadsm3mh_kv40_006489252_4_4.zip for task hadsm3mh_kv40_006489252_4 absent

                          I have a few (but not all) HadSM_MH models that crash around timestep 260,000. Not sure why, as some of the MH models do finish properly, although not lately.
                          Good one:
                          http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=10543047

                          Bad ones:
                          http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=10531431
                          http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9374407
                          http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9362859

                          Mary
                          Joined: Oct 7 08
                          Posts: 7
                          Credit: 165,698
                          RAC: 0
                          Message 38386 - Posted 24 Nov 2009 20:53:12 UTC

                            I know this WU failed because BOINC switched projects while it was trying to do post processing:

                            http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9616085

                            However this WU failed without any reason I can find just yet:

                            http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=10415116

                            My Primegrid WUs around the time were unaffected, which rules out processor problems, and the NFS WU in memory survived, which rules out a lack of available memory (since NFS is very sensitive to memory issues). None of the other 15 projects showed any issues whatsoever, just the CPDN WU. It had jumped to 100% sometime while I was gone, but was still 'Waiting to Run'. I caught it before it restarted and changed the 'waiting' to 'computer error'. The graphics listed it as being at only 71% (despite the 100% given in the BOINC manager) and the temps had gone blue.
                            ____________
                            ~It only takes one bottle cap moving at 23,000 mph to ruin your whole day~

                            Les Bayliss
                            Forum moderator
                            Joined: Sep 5 04
                            Posts: 5409
                            Credit: 8,958,344
                            RAC: 1,679
                            Message 38387 - Posted 24 Nov 2009 21:07:56 UTC

                              If the temperatures were blue, then either the model hadn't run long enough to generate the data needed by the graphics package to show the correct colours (blue is the default colour immediately on starting, and before sufficient data has been crunched), or it had turned into an 'iceworld'.
                              Iceworld description here, discussion here, and appeal for data here. The latter only applies to people who take regular backups, and are prepared to do some extra work.


                              ____________
                              Backups: Here





                              Copyright © 2002-2014 climateprediction.net