climateprediction.net home page

The world's largest climate forecasting experiment for the 21st century.

Upload problems


staffann
Joined: Oct 23 05
Posts: 22
Credit: 394,767
RAC: 0
Message 41939 - Posted 9 Apr 2011 17:28:59 UTC

    I noticed that my 24/7 Linux server had not received any credit from climateprediction.net for a while, so I took a look to see what was happening. It seems it cannot upload result files, despite the server status on your page saying OK. The messages from BOINC are posted below. As you can see, other projects communicate just fine. Any suggestions? Thanks!


    2011-04-09 16:48:01 rosetta@home Reporting 1 completed tasks, not requesting new tasks
    2011-04-09 16:48:02 rosetta@home Started upload of mem_tid3_run06_A_1afo_SAVE_ALL_OUT_IGNORE_THE_REST_22930_14947_0_0
    2011-04-09 16:48:06 rosetta@home Scheduler request completed
    2011-04-09 16:48:09 rosetta@home Finished upload of mem_tid3_run06_A_1afo_SAVE_ALL_OUT_IGNORE_THE_REST_22930_14947_0_0
    2011-04-09 16:48:11 World Community Grid Sending scheduler request: To fetch work.
    2011-04-09 16:48:11 World Community Grid Reporting 3 completed tasks, requesting new tasks
    2011-04-09 16:48:16 World Community Grid Scheduler request completed: got 1 new tasks
    2011-04-09 16:48:18 World Community Grid Started download of E201784_622_C.22.C20H14N2.00010713.3.set1d06_C.22.C20H14N2.00010713.3.zip
    2011-04-09 16:48:21 World Community Grid Finished download of E201784_622_C.22.C20H14N2.00010713.3.set1d06_C.22.C20H14N2.00010713.3.zip
    2011-04-09 17:05:00 climateprediction.net Started upload of famous_xaby_1999_200_007075221_2_10.zip
    2011-04-09 17:05:00 climateprediction.net Started upload of famous_voj9_999_200_006734889_5_2.zip
    2011-04-09 17:05:24 Project communication failed: attempting access to reference site
    2011-04-09 17:05:24 climateprediction.net Temporarily failed upload of famous_xaby_1999_200_007075221_2_10.zip: connect() failed
    2011-04-09 17:05:24 climateprediction.net Backing off 1 hr 38 min 10 sec on upload of famous_xaby_1999_200_007075221_2_10.zip
    2011-04-09 17:05:24 climateprediction.net Temporarily failed upload of famous_voj9_999_200_006734889_5_2.zip: connect() failed
    2011-04-09 17:05:24 climateprediction.net Backing off 2 hr 42 min 44 sec on upload of famous_voj9_999_200_006734889_5_2.zip
    2011-04-09 17:05:47 BOINC can't access Internet - check network connection or proxy configuration.
    2011-04-09 18:06:01 World Community Grid Computation for task dg01_c002_pr56b1_0 finished
    2011-04-09 18:06:01 World Community Grid Starting E201784_622_C.22.C20H14N2.00010713.3.set1d06_0
    2011-04-09 18:06:01 World Community Grid Starting task E201784_622_C.22.C20H14N2.00010713.3.set1d06_0 using cep2 version 640
    2011-04-09 18:06:03 World Community Grid Started upload of dg01_c002_pr56b1_0_0
    2011-04-09 18:06:03 World Community Grid Started upload of dg01_c002_pr56b1_0_1
    2011-04-09 18:06:10 World Community Grid Finished upload of dg01_c002_pr56b1_0_0
    2011-04-09 18:06:10 World Community Grid Started upload of dg01_c002_pr56b1_0_2
    2011-04-09 18:06:11 World Community Grid Finished upload of dg01_c002_pr56b1_0_1
    2011-04-09 18:06:11 World Community Grid Finished upload of dg01_c002_pr56b1_0_2

    ____________

    Profile Greg van Paassen
    Joined: Nov 17 07
    Posts: 131
    Credit: 3,745,919
    RAC: 4,674
    Message 41941 - Posted 9 Apr 2011 21:22:21 UTC - in response to Message 41939.

      Last modified: 9 Apr 2011 21:45:43 UTC

      Somewhere on these forums there is a discussion about this problem, but I can't find it either. :(

      What I would do is this: Open a terminal and 'cd' to the boinc directory. (On my system it is /var/lib/boinc.)

      grep -A 2 "<upload_when_present/>" client_state.xml

      See if there is anything funny about any of the URLs listed. Sometimes the word "handler" (in the URL) has been corrupted to "handlrr" or "hanndlr" or some other variation.

      If this is the problem, before editing client_state.xml you must shut down Boinc -- otherwise the file can get horribly corrupted.
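The check described above can also be scripted. The following is a minimal Python sketch, not part of BOINC: the function name find_suspect_upload_urls is made up for illustration, and it assumes the only valid ending of an upload URL is file_upload_handler, as in the examples in this thread. It only reads text; it never modifies client_state.xml.

```python
import re

# Expected final path component of an upload URL, as spelled in this thread.
EXPECTED_SUFFIX = "file_upload_handler"


def find_suspect_upload_urls(state_text):
    """Return every <url> value that mentions 'file_upload' but does not
    end in the expected handler name."""
    urls = re.findall(r"<url>\s*(.*?)\s*</url>", state_text, flags=re.DOTALL)
    return [u for u in urls
            if "file_upload" in u and not u.endswith(EXPECTED_SUFFIX)]


# Illustrative input: one corrupted URL and one correct URL.
sample = (
    "<url>http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_handlrr</url>\n"
    "<url> http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_handler </url>\n"
)
for url in find_suspect_upload_urls(sample):
    print("suspect:", url)
```

To run it against a real file, read client_state.xml from your BOINC directory into a string and pass that string to the function. A corruption that hits the "file_upload" part of the name itself would not be caught by this simple filter.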

      If everything looks OK with the URLs, I would try 'pinging' each of the listed servers, e.g.:-

      ping -c 3 boinc1.coas.oregonstate.edu

      Hope this helps.

      EDIT: There is some discussion that might be relevant on the PHPBB message board, here

      EDIT 2: If you need to change client_state.xml, do NOT change anything between
      <signed_xml> and

      </signed_xml>

      Only change the URL that is above the <signed_xml> line for each file.

      Eirik Redd
      Joined: Aug 31 04
      Posts: 193
      Credit: 23,214,844
      RAC: 31,385
      Message 41954 - Posted 10 Apr 2011 9:48:41 UTC

        Somehow client_state.xml gets these missplllllng problems.
        handddlr handler handlrr whatever.
        Perhaps the new crew might try some kind of spell checker?
        Me, I have done at least 30 or 50 spelling fixes in client_state.xml in the last two years. Burbblmm--qqb;eep.
        I mean glorbb sneeel pp.
        Actually -- how can we trust the "scientists" when they can't spell ?
        Probably they get all the models more or less right, give or take a mpb or bom or snoo; or whatever.
        So -- please explain how easy it is to get the name wrong but get the science right? == Eh?
        ____________

        Les Bayliss
        Forum moderator
        Joined: Sep 5 04
        Posts: 5129
        Credit: 8,459,347
        RAC: 5,837
        Message 41955 - Posted 10 Apr 2011 10:19:32 UTC - in response to Message 41954.

          This problem only happens with linux systems, and there's nothing wrong with the data when it leaves Oxford.
          It's only when it arrives on certain Linux based computers that it gets messed up.
          And I think that it also only happens with the PNW versions of the hadam3p models.

          And the researchers / scientists / climatologists DON'T create the files that get sent to people's computers; this is done by the project people, using scripts that create the necessary files, according to the parameter specifications of the researchers.

          Milo has spent a lot of time searching the scripts, and the files still in the data pool, to try and find where this is happening, without success.


          ____________
          Backups: Here

          Eirik Redd
          Joined: Aug 31 04
          Posts: 193
          Credit: 23,214,844
          RAC: 31,385
          Message 41956 - Posted 10 Apr 2011 10:53:43 UTC - in response to Message 41955.

            Good to know that it only affects linux systems. But what a weird glitch it is. Only seems to affect 2 or 3 chars out of the whole works.
            I will continue to process work-units from Climateprediction -- at least until the cows come home. No worries about the science. But what a strange anomaly.
            I have no clue how this XML data goes wrong.
            Very strange indeed.

            ____________

            Profile Iain Inglis
            Forum moderator
            Joined: Jan 16 10
            Posts: 410
            Credit: 9,532
            RAC: 0
            Message 41957 - Posted 10 Apr 2011 11:11:29 UTC

              What might be helpful, though rather onerous, is if some Linux user were to:

              (a) download a PNW and suspend it before it starts (the download will still complete)

              (b) make a backup of client_state.xml

              (c) run the model

              Repeat ad nauseam until there's an upload failure, then compare the current file with the backup. This would at least confirm what we suppose - that the corruption happens at the client end. (Or it might show the corruption happens before the model starts.)
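The comparison step at the end could be done with Python's difflib rather than by eye. A minimal sketch, assuming both snapshots have already been read into strings; changed_lines is a hypothetical helper name, and the file labels are only used for the diff header.

```python
import difflib


def changed_lines(backup_text, current_text):
    """Return only the removed ('-') and added ('+') lines between two
    snapshots of client_state.xml."""
    diff = difflib.unified_diff(
        backup_text.splitlines(),
        current_text.splitlines(),
        fromfile="client_state.backup.xml",
        tofile="client_state.xml",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    body = list(diff)[2:]  # drop the two '---' / '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


# Illustrative snapshots: the handler name changes between backup and current.
before = "<url>http://h/cgi/file_upload_handler</url>\n<status>0</status>"
after = "<url>http://h/cgi/file_upload_hnndler</url>\n<status>0</status>"
for line in changed_lines(before, after):
    print(line)
```

Because client_state.xml also changes for unrelated reasons while a model runs, keeping only the reported lines that contain "url" is probably the quickest way to spot a rewritten upload address.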

              Eirik Redd
              Joined: Aug 31 04
              Posts: 193
              Credit: 23,214,844
              RAC: 31,385
              Message 41958 - Posted 10 Apr 2011 11:22:20 UTC - in response to Message 41955.

                If I can catch a PNW download, I will try to do this.

                To make one thing perfectly clear: the spelling problem has to do with the BOINC infrastructure, not with the climate models.
                ____________

                staffann
                Joined: Oct 23 05
                Posts: 22
                Credit: 394,767
                RAC: 0
                Message 41959 - Posted 10 Apr 2011 12:47:27 UTC

                  Thank you for the replies. I ran grep on the client_state.xml file. I couldn't find any misspelt file_upload_handler. All of the references (and there were many) were to kraken so I pinged kraken from the server, and it worked perfectly.

                  I made the client_state file available here: http://staffannilsson.eu/Unrelated/client_state.xml

                  I'm at a loss as to how to proceed.
                  ____________

                  Les Bayliss
                  Forum moderator
                  Joined: Sep 5 04
                  Posts: 5129
                  Credit: 8,459,347
                  RAC: 5,837
                  Message 41960 - Posted 10 Apr 2011 16:11:00 UTC - in response to Message 41959.

                    The model type mentioned in your manager listing is for FAMOUS, not a PNW model type.
                    So none of this discussion should apply. It's just a red herring.
                    (And the message associated with the spelling problem is something like: file handler is missing.)

                    But what IS in the list is: 2011-04-09 17:05:47 BOINC can't access Internet - check network connection or proxy configuration.

                    So at the time the list was created, the problem was that your computer couldn't get to the internet.
                    And as it's been happening to you for a while, then you'd have to look earlier in the messages for other reasons for the upload failures.

                    One possibility is described in this post by Thyme Lawn.
                    There's another type of BOINC problem, also discussed by Thyme Lawn, where the large cpdn zips can cause a 'log jam' if a large number of them build up, and there are also files from other projects in the transfers queue.

                    This post would probably be from January / February, when the servers were having a problem. The cure involves inserting a few lines into cc_config.xml to limit the number of files that BOINC is allowed to transfer simultaneously during its upload attempts.
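The post referred to is not reproduced in this thread, so the exact lines it recommends are not known here. As a hedged sketch only: the BOINC client's cc_config.xml accepts max_file_xfers and max_file_xfers_per_project inside its options section, and these cap the number of simultaneous file transfers. The Python below just builds such a file as text; the function name build_cc_config and the values 4 and 1 are illustrative assumptions, not the values from Thyme Lawn's post.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_total, max_per_project):
    """Build a cc_config.xml document (as a string) that limits how many
    file transfers the BOINC client runs at the same time."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Overall cap on simultaneous transfers across all projects.
    ET.SubElement(options, "max_file_xfers").text = str(max_total)
    # Cap on simultaneous transfers for any single project.
    ET.SubElement(options, "max_file_xfers_per_project").text = str(max_per_project)
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
print(build_cc_config(4, 1))
```

The resulting text would go into cc_config.xml in the BOINC data directory, and the client needs to re-read its configuration (or be restarted) before the limits take effect.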


                    ____________
                    Backups: Here

                    staffann
                    Joined: Oct 23 05
                    Posts: 22
                    Credit: 394,767
                    RAC: 0
                    Message 41962 - Posted 10 Apr 2011 19:26:58 UTC - in response to Message 41960.

                      Last modified: 10 Apr 2011 19:33:14 UTC

                      Yes, there is the message about not being able to access the internet. It still appears after every attempt to upload a climateprediction file, but at the same time all other projects communicate just fine. If I use ssh to log onto the server, I see that I can access the internet from it (and even ping kraken, as previously mentioned).

                      There are now 31 files waiting to upload, all from climateprediction.
                      ____________

                      staffann
                      Joined: Oct 23 05
                      Posts: 22
                      Credit: 394,767
                      RAC: 0
                      Message 41963 - Posted 10 Apr 2011 19:49:38 UTC - in response to Message 41962.

                        Last modified: 10 Apr 2011 19:50:03 UTC

                        The files are uploading now. All that was needed was to restart the daemon:


                        sudo /etc/init.d/boinc-client restart

                        No idea why it was necessary, but happy that it worked.
                        Thanks for the help
                        ____________

                        Ingleside
                        Joined: Aug 5 04
                        Posts: 85
                        Credit: 6,840,914
                        RAC: 11,870
                        Message 41965 - Posted 10 Apr 2011 23:22:31 UTC - in response to Message 41963.

                          The files are uploading now. All that was needed was to restart the daemon:

                          sudo /etc/init.d/boinc-client restart

                          No idea why it was necessary, but happy that it worked.
                          Thanks for the help

                          You seem to be running v6.10.17. A bug fixed around v6.10.3x had to do with DNS lookup: the client would always use the same, possibly bad, IP address. Restarting the client was the only way to fix this problem.

                          Ingleside
                          Joined: Aug 5 04
                          Posts: 85
                          Credit: 6,840,914
                          RAC: 11,870
                          Message 41966 - Posted 10 Apr 2011 23:40:28 UTC - in response to Message 41957.

                            What might be helpful, though rather onerous, is if some Linux user were to:

                            (a) download a PNW and suspend it before it starts (the download will still complete)

                            (b) make a backup of client_state.xml

                            (c) run the model

                            Repeat ad nauseam until there's an upload failure, then compare the current file with the backup. This would at least confirm what we suppose - that the corruption happens at the client end. (Or it might show the corruption happens before the model starts.)

                            I'll recommend one additional step:

                            a0: Immediately after being assigned a PNW-task, suspend network, and make a backup of sched_reply_climateprediction.net.xml, before enabling network again.

                            It's important that CPDN doesn't contact the scheduling-server again before making the backup.


                            If there's now a mis-spelling in sched_reply* it's either a server-side-problem, a problem during transfer from the scheduling-server, or a problem made by the client in handling of the scheduler-reply.

                            If sched_reply* had everything spelled correctly, but a spelling error shows up in client_state.xml, it's a client problem.
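The decision rule in the two paragraphs above can be written down as a small Python sketch. The names handler_misspelt and locate_corruption are made up for illustration, and the check assumes that a correct upload URL always ends in file_upload_handler, as in the examples in this thread.

```python
import re


def handler_misspelt(text):
    """True if any <url> that mentions 'file_upload' does not end in
    'file_upload_handler'."""
    urls = re.findall(r"<url>\s*(.*?)\s*</url>", text, flags=re.DOTALL)
    return any("file_upload" in u and not u.endswith("file_upload_handler")
               for u in urls)


def locate_corruption(sched_reply_text, client_state_text):
    """Apply the rule: check the saved scheduler reply first, then the
    client state file."""
    if handler_misspelt(sched_reply_text):
        return "server side, transfer, or client handling of the scheduler reply"
    if handler_misspelt(client_state_text):
        return "client side"
    return "no misspelling found"


# Illustrative inputs: a clean scheduler reply and a corrupted client state.
good = "<url>http://h/cgi/file_upload_handler</url>"
bad = "<url>http://h/cgi/file_upload_hnndler</url>"
print(locate_corruption(good, bad))
```

The first argument would be the backup of sched_reply_climateprediction.net.xml taken while the network was suspended, and the second the client_state.xml read afterwards.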

                            Eirik Redd
                            Joined: Aug 31 04
                            Posts: 193
                            Credit: 23,214,844
                            RAC: 31,385
                            Message 41971 - Posted 12 Apr 2011 1:56:15 UTC - in response to Message 41966.

                              Set prefs to pnw only.
                              Allowed new tasks and got this z25a
                              Didn't stop download quick enough but sched_reply_climateprediction.net.xml had no misspellings and client_state.xml had

                              <file_info>
                              <name>hadam3p_pnw_z25a_2005_1_006914102_1_1.zip</name>
                              <nbytes>0.000000</nbytes>
                              <max_nbytes>150000000.000000</max_nbytes>
                              <generated_locally/>
                              <status>0</status>
                              <upload_when_present/>
                              <url>http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_hnndler</url>
                              <signed_xml>
                              <name>hadam3p_pnw_z25a_2005_1_006914102_1_1.zip</name>
                              <generated_locally/>
                              <upload_when_present/>
                              <max_nbytes>150000000</max_nbytes>
                              <url> http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_handler </url>
                              </signed_xml>

                              for the first 12 uploads. The 13th upload was spelled OK but goes to

                              <url>http://climateapps1.oucs.ox.ac.uk/cgi-bin/file_upload_handler</url>


                              Tried again with this z25h
                              This time I stopped the network before the first file downloaded and got exactly the same results in both sched_reply and client_state.xml: 12 instances of 'hnndler' in client_state.xml and no obvious errors in sched_reply.
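The comparison done by eye above (the URL outside signed_xml against the one inside it) can be sketched in Python. This is an illustration only: mismatched_upload_urls is a made-up name, and the pattern assumes the layout shown in the excerpt, with a single url line directly above the signed_xml block.

```python
import re

# An outer <url> directly followed by a <signed_xml> block that contains
# its own <url>. Whitespace around either URL is ignored.
PAIR = re.compile(
    r"<url>\s*(?P<outer>[^<]*?)\s*</url>\s*"
    r"<signed_xml>.*?<url>\s*(?P<signed>[^<]*?)\s*</url>.*?</signed_xml>",
    re.DOTALL,
)


def mismatched_upload_urls(state_text):
    """Return (outer, signed) pairs where the URL above <signed_xml>
    differs from the URL inside it."""
    return [(m["outer"], m["signed"])
            for m in PAIR.finditer(state_text)
            if m["outer"] != m["signed"]]


# Shortened copy of the excerpt posted above.
sample = """<file_info>
<name>hadam3p_pnw_z25a_2005_1_006914102_1_1.zip</name>
<upload_when_present/>
<url>http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_hnndler</url>
<signed_xml>
<name>hadam3p_pnw_z25a_2005_1_006914102_1_1.zip</name>
<url> http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_handler </url>
</signed_xml>
"""
for outer, signed in mismatched_upload_urls(sample):
    print("outer: ", outer)
    print("signed:", signed)
```

Since the signed copy is left untouched by the corruption seen so far, any pair reported here shows directly which spelling the outer URL should have had.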
                              ____________

                              Profile Iain Inglis
                              Forum moderator
                              Joined: Jan 16 10
                              Posts: 410
                              Credit: 9,532
                              RAC: 0
                              Message 41974 - Posted 12 Apr 2011 14:03:52 UTC

                                That confirms that the problem isn't at the server end; at least it isn't immediately a server spelling error - there might conceivably be some wrong context that causes the erroneous re-write at the client.

                                It does look rather like an off-by-one or buffer flushing error somewhere, with the 'n' creeping backward twice in the 'hnndler' case (there are other spelling errors too). I guess the next step is to find where/when the rewrite happens. I assume the science application doesn't rewrite client_state.xml, whereas BOINC Manager does.

                                Profile Thyme Lawn
                                Forum moderator
                                Joined: Aug 5 04
                                Posts: 1212
                                Credit: 10,213,785
                                RAC: 692
                                Message 41975 - Posted 12 Apr 2011 14:16:19 UTC - in response to Message 41974.

                                  The BOINC core client does the updating, merging the data from sched_reply_climateprediction.net.xml into client_state.xml.

                                  I can't see anything in the code which would account for the corruption. The 1024 character working buffer is definitely long enough and the only modifications made outside the <signed_xml> block are deletion of leading and trailing spaces and decoding of XML escape strings; neither applies to the <url> tag.
                                  ____________
                                  "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

                                  Ingleside
                                  Joined: Aug 5 04
                                  Posts: 85
                                  Credit: 6,840,914
                                  RAC: 11,870
                                  Message 41978 - Posted 12 Apr 2011 21:10:21 UTC - in response to Message 41971.

                                    Last modified: 12 Apr 2011 21:16:26 UTC

                                    Tried again with this z25h
                                    This time I stopped the network before the first file downloaded and got exactly the same results in both sched_reply and client_state.xml: 12 instances of 'hnndler' in client_state.xml and no obvious errors in sched_reply.

                                    Ok, so the problem is clearly on the client side of things, and not on the server side or during communication to the client. This is at least a starting point for tracking down the problem...

                                    Looking at my own PNW tasks, it seems PNW is the only CPDN task type that uses http://boinc1.coas.oregonstate.edu/ as its upload server, but why this should have any effect seems strange.

                                    The only other difference that seems to be present is that the upload handler is at /cpdn_cgi_main/, whereas the other CPDN URLs are shorter here and have only one _ or -, so maybe this has some effect even though it really shouldn't...

                                    When it comes to size, /cpdn_cgi_main/ is 13 letters in total, while some BOINC projects use longer paths. For example, SIMAP uses /boincsimap_cgi/ at 14 letters, while Einstein@home uses /EinsteinAtHome_cgi/ for the non-Arecibo tasks, meaning 18 letters. SIMAP also has a longer total URL length than the PNW models, in case this has any meaning.
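The lengths quoted above can be checked with a few lines of Python. This is only a convenience check of the arithmetic; the dictionary labels and the helper name directory_name_length are made up, and the paths are copied from the post.

```python
# Upload handler directories quoted in the post above.
paths = {
    "CPDN PNW": "/cpdn_cgi_main/",
    "SIMAP": "/boincsimap_cgi/",
    "Einstein@home": "/EinsteinAtHome_cgi/",
}


def directory_name_length(path):
    """Length of the directory name without the surrounding slashes."""
    return len(path.strip("/"))


for project, path in paths.items():
    print(project, directory_name_length(path))
```

The counts come out as 13, 14 and 18 characters, matching the figures in the post, so the PNW directory name is not unusually long compared with the other two projects.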


                                    So, while I know this is probably not the reason for the corruption, could you also try attaching to SIMAP and Einstein@home, and see whether you get URL corruption from these projects as well?

                                    Profile geophi
                                    Forum moderator
                                    Joined: Aug 7 04
                                    Posts: 1447
                                    Credit: 22,192,755
                                    RAC: 9,891
                                    Message 41980 - Posted 12 Apr 2011 22:48:41 UTC

                                      I've seen this error a few times on my Core i7 920 in Fedora 13 64 bit. That PC has never had a PNW model, so I don't think that can be a common factor to all these. In fact, it's only run SAF for less than a month, and I don't think I've had that error in the last month. Before that, it only ran FAMOUS and hadsm3 models for the previous year.

                                      Profile Jonathan Miller
                                      Forum moderator
                                      Project administrator
                                      Project developer
                                      Volunteer developer
                                      Joined: Mar 28 11
                                      Posts: 32
                                      Credit: 82,588
                                      RAC: 0
                                      Message 41988 - Posted 21 Apr 2011 8:01:36 UTC - in response to Message 41980.

                                        My feeling is to agree that it is not merely a PNW issue.
                                        I have found references to malformed URLs on more than one upload server.

                                        I think the best strategy would be for the CPDN sysadmins to search the apache logs on our servers and try to find out which models are failing, and use them to pull out the info about the client types that are failing.

                                        I do agree that it would be good to find out whether this happens with other BOINC projects.

                                        ...so I have another item on my to-do list!

                                        Jonathan
                                        CPDN SysAdmin

                                        Eirik Redd
                                        Joined: Aug 31 04
                                        Posts: 193
                                        Credit: 23,214,844
                                        RAC: 31,385
                                        Message 41989 - Posted 21 Apr 2011 9:09:50 UTC

                                          I upgraded two hosts from 6.10.56 to 6.10.58.
                                          Also have at least one running 6.6.40.
                                          Got 3 PNW with no errors before the recent outage.
                                          Will try for more soon.
                                          Beginning to think it might be a problem in glibc, since Windows hosts don't seem to be seeing this problem.
                                          Found in old logs what might be a similar case back last October on Beta FAMOUS. Unfortunately I don't have the details or files any more.
                                          Is this possibly a problem for Uli Drepper and the glibc crew?
                                          Or?
                                          Any advice on how to trap this problem is welcome here.
                                          Any ideas on BOINC versions or Linux versions or glibc versions?
                                          ____________

                                          Eirik Redd
                                          Joined: Aug 31 04
                                          Posts: 193
                                          Credit: 23,214,844
                                          RAC: 31,385
                                          Message 41991 - Posted 21 Apr 2011 11:48:39 UTC

                                            Caught one

                                            6.10.58 BOINC client Ubuntu
                                            2.6.32-30-generic #59-Ubuntu SMP Tue Mar 1 21:30:46 UTC 2011 x86_64 GNU/Linux

                                            All 12 uploads corrupted with <url>http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_hnndler</url>

                                            Noticed that happens only on my Intel machines, not AMD -- too small a sample to be significant.

                                            this wu
                                            ____________

                                            Eirik Redd
                                            Joined: Aug 31 04
                                            Posts: 193
                                            Credit: 23,214,844
                                            RAC: 31,385
                                            Message 41992 - Posted 21 Apr 2011 12:51:14 UTC

                                              Gaaaaargh --
                                              This is driving me nuts!
                                              I looked again at the source code at BOINC but can't see head nor tail.
                                              ONE lousy character in the XML gets changed, sometimes. On some machines. With some models, not others. Sometimes. Only Linux. Maybe only Intel. But not always. Some models are susceptible, but there's no way to figure it.
                                              And now when I want to do another test my only other Core 2 snarfed a hadcm3n while I wasn't looking --good -- need to do those, but --

                                              So -- this is one weird problem -- What to do?
                                              ____________

                                              Eirik Redd
                                              Joined: Aug 31 04
                                              Posts: 193
                                              Credit: 23,214,844
                                              RAC: 31,385
                                              Message 41995 - Posted 21 Apr 2011 13:48:10 UTC

                                                OK trying SIMAP and Einstein
                                                ____________

                                                Profile Warped
                                                Joined: Sep 12 04
                                                Posts: 33
                                                Credit: 750,041
                                                RAC: 0
                                                Message 42016 - Posted 24 Apr 2011 7:47:20 UTC - in response to Message 41988.

                                                  My feeling is to agree that it is not merely a PNW issue.
                                                  I have found references to malformed URLs on more than one upload server.

                                                  I think the best strategy would be for the CPDN sysadmins to search the apache logs on our servers and try to find out which models are failing, and use them to pull out the info about the client types that are failing.

                                                  I do agree that it would be good to find out whether this happens with other BOINC projects.

                                                  ...so I have another item on my to-do list!

                                                  Jonathan
                                                  CPDN SysAdmin


                                                  Welcome Jonathan!

                                                  From what I can gather, the problem seems particularly severe in PNW tasks for Linux. Would it be possible to disable the distribution of these work units to Linux clients until resolved?

                                                  I have tried editing the client_state.xml file to no avail. Does the client make any contact with the URL http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/ at any stage prior to upload? Could it be an error emanating from them?
                                                  ____________
                                                  Warped

                                                  Les Bayliss
                                                  Forum moderator
                                                  Joined: Sep 5 04
                                                  Posts: 5129
                                                  Credit: 8,459,347
                                                  RAC: 5,837
                                                  Message 42018 - Posted 24 Apr 2011 8:45:27 UTC

                                                    The models for PNW disappeared about 3 days ago, as per this thread. The application is gone as well.

                                                    So someone is doing something. Not sure who or what.


                                                    ____________
                                                    Backups: Here

                                                    Ingleside
                                                    Joined: Aug 5 04
                                                    Posts: 85
                                                    Credit: 6,840,914
                                                    RAC: 11,870
                                                    Message 42019 - Posted 24 Apr 2011 11:50:57 UTC - in response to Message 42016.

                                                      Last modified: 24 Apr 2011 11:54:05 UTC

                                                      From what I can gather, the problem seems particularly severe in PNW tasks for Linux. Would it be possible to disable the distribution of these work units to Linux clients until resolved?

                                                      This could fairly easily be done by making a customized plan class, but with the current less-than-optimal staffing situation I wouldn't expect it to happen at this point.

                                                      I have tried editing the client_state.xml file to no avail. Does the client make any contact with the URL http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/ at any stage prior to upload? Could it be an error emanating from them?

                                                      The upload server is only accessed when the client tries to upload a file, not before. Since the corruption was already present when the task was first downloaded to the client, it has nothing to do with connections to the upload server.

                                                      As for editing client_state.xml, make sure you've completely exited the BOINC client before trying to edit the file. This includes exiting any BOINC service (or whatever it's called under Linux); if not, the info in client_state.xml will just be overwritten with new info as BOINC runs.

                                                      If you have exited BOINC, then edited client_state.xml, and the wrong URL somehow gets re-created on the next start of BOINC, this would be very interesting, since it's much easier to test things that happen on a restart of the client than things that only happen on the first download of a task...

                                                      Greg van Paassen
                                                      Joined: Nov 17 07
                                                      Posts: 131
                                                      Credit: 3,745,919
                                                      RAC: 4,674
                                                      Message 42020 - Posted 24 Apr 2011 20:12:05 UTC

                                                        Last modified: 24 Apr 2011 20:35:32 UTC

                                                        The problem does occur with PNWs, but it's maddeningly intermittent. I got 6 PNWs on the 22nd. I have "grepped" my client_state.xml and there are no misspellings of file_upload_handler.

                                                        I have set up a cron job to regularly check client_state.xml for misspellings of "file_upload_handler".
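
For anyone who wants to automate the same check, here is a minimal Python sketch of one way to do it. It flags any token that is close to, but not exactly, "file_upload_handler". The function name `find_misspellings`, the 0.8 similarity threshold, and the sample text are all assumptions made up for illustration; this is not the script used in the cron job mentioned above.

```python
import difflib
import re

EXPECTED = "file_upload_handler"


def find_misspellings(text, expected=EXPECTED, threshold=0.8):
    """Return (line_number, token) pairs for tokens that nearly match `expected`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Split each line into runs of letters and underscores.
        for token in re.findall(r"[A-Za-z_]+", line):
            if token == expected:
                continue  # correctly spelled, nothing to report
            ratio = difflib.SequenceMatcher(None, token, expected).ratio()
            if ratio >= threshold:
                hits.append((lineno, token))
    return hits


# Made-up sample text (hypothetical URLs), not a real client_state.xml.
sample = (
    "http://example.invalid/cgi/file_upload_handler\n"
    "http://example.invalid/cgi/file_upload_hnndler\n"
)
for lineno, token in find_misspellings(sample):
    print(f"line {lineno}: suspicious token {token!r}")
```

To use it on a real installation you would read the contents of client_state.xml into a string and pass that to `find_misspellings`; where that file lives depends on how BOINC was installed.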

                                                        Details: all 6 of my PNWs are re-dispatched tasks (after the first or second client returned an error). One of them has started processing, another should start in 3 days, then 3 more the day after that. Core i7-2600, Arch Linux 64 bit, my own glibc 2.13 compiled with -O2 -march=native -m32.

                                                        Warped
                                                        Joined: Sep 12 04
                                                        Posts: 33
                                                        Credit: 750,041
                                                        RAC: 0
                                                        Message 42023 - Posted 26 Apr 2011 6:07:24 UTC - in response to Message 42019.

                                                          As for editing client_state.xml, make sure you've completely exited the BOINC client before trying to edit the file. This includes exiting any BOINC service (or whatever it's called under Linux); if not, the info in client_state.xml will just be overwritten with new info as BOINC runs.

                                                          If you have exited BOINC, then edited client_state.xml, and the wrong URL somehow gets re-created on the next start of BOINC, this would be very interesting, since it's much easier to test things that happen on a restart of the client than things that only happen on the first download of a task...


                                                          Hi Ingleside.

                                                          For security reasons I run BOINC in a user account, not as root. It is only possible to edit the client_state.xml file when logged in as root. I tried everything I could think of, including rebooting and going straight to the root account in order to edit the file, making sure that BOINC was not running at all. I also edited the client_state_prev.xml file. I am sure that I was able to correct all instances of the misspelled "hnndler" in both files. Hence my suspicion that the file is somehow updated from the website in Oregon.
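
The manual correction described above amounts to a search-and-replace in both files. A minimal Python sketch of that step, with a backup taken first, might look like the following. The function name `fix_upload_handler`, the ".bak" suffix, the UTF-8 encoding, and the demo file contents are assumptions for illustration. It should only be run with the BOINC client fully stopped, and it does nothing to explain why the bad URL might come back.

```python
import shutil
import tempfile
from pathlib import Path

WRONG = "file_upload_hnndler"  # misspelling reported in this thread
RIGHT = "file_upload_handler"


def fix_upload_handler(path, wrong=WRONG, right=RIGHT):
    """Replace `wrong` with `right` in the file at `path`.

    Returns the number of replacements made. If anything needs
    changing, a backup copy with a ".bak" suffix is written first.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")  # assumed encoding
    count = text.count(wrong)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(text.replace(wrong, right), encoding="utf-8")
    return count


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "client_state.xml"
    demo.write_text("http://example.invalid/cgi/file_upload_hnndler\n",
                    encoding="utf-8")
    print(fix_upload_handler(demo), "replacement(s)")
    print(demo.read_text(encoding="utf-8"), end="")
```

The same call can be repeated for client_state_prev.xml. The function returns 0 and leaves the file untouched when no misspelling is found.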

                                                          I aborted the task and have changed my preferences to run only the Southern Africa tasks, which happen to be the only type now available. This is running fine, except that the graphics file seems to have been corrupted on download. I can see that the model has not turned to an ice world but that is all (no timestep or other information). This is not a major issue as I can view the checkpoint progress via "Properties" in BOINC.

                                                          3rkko
                                                          Joined: Feb 12 08
                                                          Posts: 53
                                                          Credit: 4,090,012
                                                          RAC: 2,979
                                                          Message 42027 - Posted 26 Apr 2011 17:07:11 UTC - in response to Message 42023.

                                                            Did you make sure that the BOINC client was not running? The command "sudo /etc/init.d/boinc-client status" tells you whether the client is running. You can stop or start it by replacing "status" with "stop" or "start". This works in Ubuntu, and other flavours of Linux should have something similar.
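
If you want to do this check from a script before touching client_state.xml, a rough Python sketch follows. It wraps the same init script command. The helper names `service_command` and `client_is_running` are made up for illustration, and it assumes the init script follows the LSB convention of exiting with code 0 from "status" when the service is running, which I have not verified for every distribution.

```python
import subprocess

INIT_SCRIPT = "/etc/init.d/boinc-client"  # path quoted above (Ubuntu)
ACTIONS = ("status", "stop", "start")


def service_command(action, use_sudo=True):
    """Build the argument list for the init script."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    prefix = ["sudo"] if use_sudo else []
    return prefix + [INIT_SCRIPT, action]


def client_is_running(run=subprocess.run):
    """Return True if the "status" action exits with code 0.

    Assumption: exit code 0 means "running" (LSB convention).
    `run` is injectable so the logic can be tried without a real service.
    """
    result = run(service_command("status"), check=False)
    return result.returncode == 0


# Show the command that would be executed, without running it.
print(" ".join(service_command("status")))
```

Calling `client_is_running()` with no arguments runs the real command, so it needs sudo rights and an installed boinc-client init script.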

                                                            Lewis Shadoff
                                                            Joined: Oct 23 09
                                                            Posts: 1
                                                            Credit: 1,171,675
                                                            RAC: 0
                                                            Message 42070 - Posted 30 Apr 2011 13:35:06 UTC - in response to Message 41989.

                                                              I am having this problem on Windows 7 with BOINC 6.10.60 (and with the previous version).
                                                              There are 14 data sets waiting to upload.

                                                              Typical messages:


                                                              4/30/2011 5:50:55 AM climateprediction.net Started upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip
                                                              4/30/2011 5:52:14 AM Project communication failed: attempting access to reference site
                                                              4/30/2011 5:52:14 AM climateprediction.net Temporarily failed upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip: HTTP error
                                                              4/30/2011 5:52:14 AM climateprediction.net Backing off 14 min 38 sec on upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip
                                                              4/30/2011 5:52:15 AM Internet access OK - project servers may be temporarily down.
                                                              4/30/2011 5:57:24 AM Project communication failed: attempting access to reference site
                                                              4/30/2011 5:57:24 AM climateprediction.net Temporarily failed upload of hadam3p_pnw_yx88_1967_1_006898128_0_3.zip: HTTP error
                                                              4/30/2011 5:57:24 AM climateprediction.net Backing off 1 hr 47 min 33 sec on upload of hadam3p_pnw_yx88_1967_1_006898128_0_3.zip
                                                              4/30/2011 5:57:25 AM Internet access OK - project servers may be temporarily down.
                                                              4/30/2011 6:18:12 AM climateprediction.net Sending scheduler request: To send trickle-up message.

                                                              This has been going on since the database upgrade started.

                                                              ____________
                                                              Lewis Shadoff, Ph.D.
                                                              Lake Jackson, TX

                                                              Thyme Lawn
                                                              Forum moderator
                                                              Joined: Aug 5 04
                                                              Posts: 1212
                                                              Credit: 10,213,785
                                                              RAC: 692
                                                              Message 42071 - Posted 30 Apr 2011 15:37:18 UTC

                                                                Last modified: 30 Apr 2011 15:41:10 UTC

                                                                That sounds very like the problem I described (and gave a workaround for) here, Lewis.
                                                                ____________
                                                                "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

                                                                Greg van Paassen
                                                                Joined: Nov 17 07
                                                                Posts: 131
                                                                Credit: 3,745,919
                                                                RAC: 4,674
                                                                Message 42073 - Posted 30 Apr 2011 15:55:58 UTC

                                                                  Alternatively, it could be a virus scanner problem as described in this thread.

                                                                  Copyright © 2002-2014 climateprediction.net