
Server can't open log file



Message boards : Number crunching : Server can't open log file

Dave Roberts
Joined: Jan 15 11
Posts: 81
Credit: 1,398,771
RAC: 779
Message 42611 - Posted 11 Jul 2011 12:47:49 UTC

    I've been getting the message "Server can't open log file (../log_climateapps2/scheduler.log" each time a "trickle up" request is generated. Are there any known problems?

    David

    Tom_unoduetre
    Joined: Aug 27 04
    Posts: 5
    Credit: 16,962
    RAC: 0
    Message 42612 - Posted 11 Jul 2011 13:35:13 UTC

      same here
      ____________

      Les Bayliss
      Forum moderator
      Joined: Sep 5 04
      Posts: 5424
      Credit: 9,074,925
      RAC: 2,603
      Message 42615 - Posted 14 Jul 2011 20:18:20 UTC

        Yes, there were problems, as mentioned in this News post on our alternative board.


        ____________
        Backups: Here

        Lockleys
        Joined: Jan 13 07
        Posts: 126
        Credit: 4,373,196
        RAC: 2,696
        Message 42616 - Posted 14 Jul 2011 21:52:17 UTC

          Last modified: 14 Jul 2011 21:52:38 UTC

          Even though the server appears to be back OK on the server status page, communication is still failing with HTTP internal server error.

          I presume this is known and still awaiting completion of the remedial work.

          Les Bayliss
          Forum moderator
          Joined: Sep 5 04
          Posts: 5424
          Credit: 9,074,925
          RAC: 2,603
          Message 42617 - Posted 14 Jul 2011 22:22:54 UTC - in response to Message 42616.

            Correct.
            This board is back, plus a couple of other things, but not everything has been turned back on yet.

            When you have a / and /root failure on a big Apache system, repairs and restores take time.


            ____________
            Backups: Here

            Dave Roberts
            Joined: Jan 15 11
            Posts: 81
            Credit: 1,398,771
            RAC: 779
            Message 42619 - Posted 15 Jul 2011 16:03:16 UTC

              Last modified: 15 Jul 2011 16:04:16 UTC

              .

              Dave Roberts
              Joined: Jan 15 11
              Posts: 81
              Credit: 1,398,771
              RAC: 779
              Message 42620 - Posted 15 Jul 2011 16:03:16 UTC

                I'm succumbing to temptation, not expecting any comments, but why oh why don't organisations investing in huge hardware/database installations check back in history and use the only genuinely 365/24/7 system that has been proven across the world, namely OpenVMS/Rdb? The last system I worked on had zero downtime in 10 years (excluding 1 night out a year for new releases). Yeah, I know, it's a legacy system. Oh well.
                .

                glaesum
                Joined: Feb 24 06
                Posts: 47
                Credit: 604,403
                RAC: 37
                Message 42622 - Posted 15 Jul 2011 23:14:22 UTC

                  Well, my first "Scheduler request completed" log message in days, without the dreaded "Scheduler request failed: HTTP internal server error", came at 15:55 BST, and there is no sign of any recent trickle files in the data folder.

                  They don't seem to be showing on the database yet - I expect that's the huge backlog to process. At least there's some sign of life - well done for getting it up for the weekend!

                  I'll let another model run now as well as the long coupled models. /p

                  DJStarfox
                  Joined: Jan 27 07
                  Posts: 262
                  Credit: 1,178,267
                  RAC: 171
                  Message 42623 - Posted 16 Jul 2011 0:33:26 UTC

                    Yeah, looks like the scheduler is accepting requests again. However, the last trickle showing on my 1 running model is from 09 Jul 2011 22:07:33. I know my computer has sent several trickles since then. I hope they're not lost.

                    JIM
                    Joined: Dec 31 07
                    Posts: 682
                    Credit: 4,211,232
                    RAC: 2,828
                    Message 42625 - Posted 16 Jul 2011 6:58:37 UTC

                      The server is also giving out new WUs again. I just received a new CM3n after several days of having an idle core. As everyone knows, an idle core is the devil's workshop. :-)

                      ____________

                      Ingleside
                      Joined: Aug 5 04
                      Posts: 92
                      Credit: 8,502,569
                      RAC: 8,061
                      Message 42631 - Posted 17 Jul 2011 11:41:57 UTC - in response to Message 42620.

                        I'm succumbing to temptation, not expecting any comments, but why oh why don't organisations investing in huge hardware/database installations check back in history and use the only genuinely 365/24/7 system that has been proven across the world, namely OpenVMS/Rdb? The last system I worked on had zero downtime in 10 years (excluding 1 night out a year for new releases). Yeah, I know, it's a legacy system. Oh well.
                        .

                        Well, it's the first time I've heard of an OS that keeps running flawlessly when the hardware it's running on has stopped working...

                        Mikek69
                        Joined: Dec 31 06
                        Posts: 2
                        Credit: 159,681
                        RAC: 0
                        Message 42635 - Posted 17 Jul 2011 21:46:16 UTC - in response to Message 42623.

                          Me too. I had only been going a couple of hours on a job when they shut down. Since then I have had several trickles fail, plus at least 3 sent since it came back online, and still none are showing... It will be a pain in the butt if I've wasted 110 hours, and more if this goes on.
                          Mike

                          Les Bayliss
                          Forum moderator
                          Joined: Sep 5 04
                          Posts: 5424
                          Credit: 9,074,925
                          RAC: 2,603
                          Message 42636 - Posted 17 Jul 2011 22:03:15 UTC - in response to Message 42635.

                            The trickles aren't showing because of the huge backlog of data on the upload servers that needs to be processed, at the same time as tens of thousands of computers want to download new work.
                            And it's been the weekend. Still is in some parts of the world.
                            And there's some more work to do on the servers. I think that some of the daemons haven't been started yet.

                            Patience is the best cure.


                            ____________
                            Backups: Here

                            Dave Roberts
                            Joined: Jan 15 11
                            Posts: 81
                            Credit: 1,398,771
                            RAC: 779
                            Message 42637 - Posted 17 Jul 2011 22:42:40 UTC

                              Well, it's the first time I've heard of an OS that keeps running flawlessly when the hardware it's running on has stopped working...

                              Ah well, I did say 'installations', which naturally includes a backup server to facilitate 'fail over' procedures. Perhaps I should have added the detail, but OpenVMS has always been a world leader for its reliability, performance and continuity in a clustered environment, even across multiple sites. The clustering, together with various system services, allows it to be virtually completely 'disaster tolerant'. The exception is if every site is nuked at the same time.

                              Mikek69
                              Joined: Dec 31 06
                              Posts: 2
                              Credit: 159,681
                              RAC: 0
                              Message 42639 - Posted 18 Jul 2011 9:23:05 UTC - in response to Message 42636.

                                Les

                                Thanks for that info. I was beginning to think I had lost it all. So I will be patient...

                                Mike

                                glaesum
                                Joined: Feb 24 06
                                Posts: 47
                                Credit: 604,403
                                RAC: 37
                                Message 42644 - Posted 19 Jul 2011 13:23:31 UTC - in response to Message 42636.

                                  "The trickles aren't showing because of the huge backlog of data on the upload servers that needs to be processed, at the same time as tens of thousands of computers want to download new work."

                                  Right now I can see trickles for Jul 10 & Jul 11, so that's some activity, though I'm not sure if any headway is being made into the backlog. How are things with others? /p

                                  Dave Jackson
                                  Joined: May 15 09
                                  Posts: 866
                                  Credit: 655,707
                                  RAC: 163
                                  Message 42645 - Posted 19 Jul 2011 21:34:39 UTC - in response to Message 42644.

                                    They all seem to be going through normally for me now.

                                    Dave

                                    Greg van Paassen
                                    Joined: Nov 17 07
                                    Posts: 142
                                    Credit: 4,271,370
                                    RAC: 0
                                    Message 42658 - Posted 23 Jul 2011 20:14:51 UTC

                                      Re this message from Ananas: restarting the client doesn't help. I'm still getting "HTTP internal server error" on trickle-ups.

                                      Les Bayliss
                                      Forum moderator
                                      Joined: Sep 5 04
                                      Posts: 5424
                                      Credit: 9,074,925
                                      RAC: 2,603
                                      Message 42659 - Posted 23 Jul 2011 20:31:52 UTC - in response to Message 42658.

                                        A server error is just that: an error on one of the project's servers.
                                        It is usually a sign that the upload server is under heavy load from users' computers.
                                        Restarting your 'client' in any form won't help. You just have to try again later.

                                        The News message refers to errors such as 'server (or project), not found'.


                                        ____________
                                        Backups: Here
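To illustrate the "try again later" advice above, here is a minimal sketch of an exponential backoff schedule, the general approach clients use so that thousands of machines do not all retry an overloaded server at once. This is not BOINC's actual retry code; the function name `backoff_delays` and all the numbers are assumptions chosen for illustration.

```python
def backoff_delays(base_seconds=60.0, factor=2.0, max_seconds=3600.0, attempts=6):
    """Return the wait (in seconds) before each retry.

    Each wait is `factor` times the previous one, capped at `max_seconds`.
    All defaults are hypothetical values, not taken from any real client.
    """
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        delays.append(min(delay, max_seconds))
        delay *= factor
    return delays


if __name__ == "__main__":
    # Print the schedule for eight consecutive failed requests.
    for attempt, wait in enumerate(backoff_delays(attempts=8), start=1):
        print(f"retry {attempt}: wait {wait:.0f} s")
```

Real clients usually add some randomness to each wait as well, so that hosts that failed at the same moment do not retry in lockstep.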

                                        Ananas
                                        Forum moderator
                                        Joined: Oct 31 04
                                        Posts: 336
                                        Credit: 3,316,482
                                        RAC: 0
                                        Message 42661 - Posted 23 Jul 2011 22:04:13 UTC - in response to Message 42658.

                                          Last modified: 23 Jul 2011 22:05:32 UTC

                                          Re this message from Ananas: restarting the client doesn't help. I'm still getting "HTTP internal server error" on trickle-ups.

                                          There are 2 errors that are not directly related.

                                          As you already receive "real" server messages and not just "Scheduler request failed: Couldn't connect to server", your BOINC client already knows the correct IP. Those detailed errors are server-side problems; restarting the client does not help.

                                          But BOINC clients cache the IP forever (at least older ones do) and will continue giving you the "connect" error message even if the server is already up and running, just with a fresh IP.


                                          I have 2 models running on 2 hosts. Both have been collecting trickles and were unable to upload them: "Couldn't connect ..."

                                          I restarted only one and that one did upload the trickles, while the other one still sits there with the "connect" error.
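The caching behaviour described in this post can be sketched as follows. This is an illustration only, not BOINC's implementation: the class name `HostCache`, the five-minute lifetime, and the injected `resolver` and `clock` parameters are all assumptions. A cache that never expires its entries keeps returning the old address after the server moves, which is the "connect" error; giving entries a lifetime (or restarting, which empties the cache) forces a fresh lookup.

```python
import socket
import time


class HostCache:
    """Hostname-to-IP cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300.0, resolver=socket.gethostbyname,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolver = resolver   # function: hostname -> IP string
        self._clock = clock         # function: () -> seconds
        self._entries = {}          # hostname -> (ip, time it was resolved)

    def lookup(self, host):
        """Return the cached IP, re-resolving once the entry is too old."""
        now = self._clock()
        entry = self._entries.get(host)
        if entry is None or now - entry[1] >= self._ttl:
            entry = (self._resolver(host), now)
            self._entries[host] = entry
        return entry[0]
```

Passing the resolver and clock in as parameters is just a convenience so the expiry logic can be exercised without touching the network.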

                                          Ingleside
                                          Joined: Aug 5 04
                                          Posts: 92
                                          Credit: 8,502,569
                                          RAC: 8,061
                                          Message 42662 - Posted 24 Jul 2011 1:01:23 UTC - in response to Message 42661.

                                            But BOINC clients cache the IP forever (at least older ones do) and will continue giving you the "connect" error message even if the server is already up and running, just with a fresh IP.

                                            I don't remember the exact version, but this was fixed around v6.10.30, so any recent BOINC clients shouldn't normally have any problems getting the new IP.



                                            Greg van Paassen
                                            Joined: Nov 17 07
                                            Posts: 142
                                            Credit: 4,271,370
                                            RAC: 0
                                            Message 42663 - Posted 24 Jul 2011 3:48:22 UTC - in response to Message 42659.

                                              Last modified: 24 Jul 2011 3:55:43 UTC

                                              What I was suggesting (a bit subtly perhaps) is that perhaps the error is in fact HTTP 500, "Internal Server Error", and not HTTP 504, "Gateway Timeout" (= slow response from the database server), nor HTTP 503, "Service [temporarily] Unavailable", i.e. overload, given that the machine has been rebuilt, and has changed to a new IP address, and all.

                                              Edit: The point being that waiting would fix the latter two problems, but not the first.
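The distinction drawn in this post can be written down as a small check. This is a sketch under the post's own reasoning, not a rule taken from BOINC or from the HTTP specification: the function names and the choice of which codes count as "waiting may help" are assumptions. The status names come from Python's standard `http.HTTPStatus`.

```python
from http import HTTPStatus

# Codes that typically indicate overload or a slow upstream service,
# where simply retrying later is a reasonable response (an assumption
# following the post's argument, not an official classification).
TRANSIENT = {HTTPStatus.SERVICE_UNAVAILABLE, HTTPStatus.GATEWAY_TIMEOUT}


def describe(status_code):
    """Return the code with its standard reason phrase."""
    return f"{status_code} {HTTPStatus(status_code).phrase}"


def waiting_may_help(status_code):
    """True for 503/504; False for 500, which usually needs a server-side fix."""
    return status_code in TRANSIENT


if __name__ == "__main__":
    for code in (500, 503, 504):
        print(describe(code), "-> retry later" if waiting_may_help(code)
              else "-> needs fixing on the server")
```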





                                              Copyright © 2002-2014 climateprediction.net