climateprediction.net home page
Server can't open log file

Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 42611 - Posted: 11 Jul 2011, 12:47:49 UTC

I've been getting the message "Server can't open log file (../log_climateapps2/scheduler.log)" each time a "trickle up" request is generated. Any problems?

David
ID: 42611
Tom_unoduetre

Joined: 27 Aug 04
Posts: 5
Credit: 40,886
RAC: 0
Message 42612 - Posted: 11 Jul 2011, 13:35:13 UTC

Same here.
ID: 42612
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42615 - Posted: 14 Jul 2011, 20:18:20 UTC

Yes, there were problems, as mentioned in this News post on our alternative board.


Backups: Here
ID: 42615
Lockleys

Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 42616 - Posted: 14 Jul 2011, 21:52:17 UTC
Last modified: 14 Jul 2011, 21:52:38 UTC

Even though the server appears to be back OK on the server status page, communication is still failing with HTTP internal server error.

I presume this is known and still awaiting completion of the remedial work.
ID: 42616
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42617 - Posted: 14 Jul 2011, 22:22:54 UTC - in response to Message 42616.  

Correct.
This board is back, plus a couple of other things, but not everything has been turned back on yet.

When you have a / and /root failure on a big Apache system, repairs and restores take time.


Backups: Here
ID: 42617
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 42619 - Posted: 15 Jul 2011, 16:03:16 UTC
Last modified: 15 Jul 2011, 16:04:16 UTC

.
ID: 42619
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 42620 - Posted: 15 Jul 2011, 16:03:16 UTC

I'm succumbing to temptation - not expecting any comments - but why oh why don't organisations investing in huge hardware/database installations check back in history and use the only genuinely 365/24/7 system that has been proven across the world, namely OpenVMS/Rdb? The last system I worked on had zero downtime in 10 years (excluding 1 night out a year for new releases). Yeah, I know - it's a legacy system. Oh well.
ID: 42620
glaesum

Joined: 24 Feb 06
Posts: 47
Credit: 782,082
RAC: 0
Message 42622 - Posted: 15 Jul 2011, 23:14:22 UTC

Well, my first log message in days reading "Scheduler request completed", without the dreaded "Scheduler request failed: HTTP internal server error", was at 15:55 BST, and there is no sign of any recent trickle files in the data folder.

They don't seem to be showing on the database yet - I expect that's the huge backlog to process. At least there's some sign of life - well done for getting it up for the weekend!

I'll let another model run now as well as the long coupled models. /p
ID: 42622
DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 42623 - Posted: 16 Jul 2011, 0:33:26 UTC

Yeah, looks like the scheduler is accepting requests again. However, the last trickle showing on my 1 running model is from 09 Jul 2011 22:07:33. I know my computer has sent several trickles since then. I hope they're not lost.
ID: 42623
JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 42625 - Posted: 16 Jul 2011, 6:58:37 UTC

The server is also giving out new WUs again. I just received a new CM3n after several days of having an idle core. As everyone knows, an idle core is the devil's workshop. :-)

ID: 42625
Ingleside

Joined: 5 Aug 04
Posts: 108
Credit: 18,237,578
RAC: 35,248
Message 42631 - Posted: 17 Jul 2011, 11:41:57 UTC - in response to Message 42620.  

I'm succumbing to temptation - not expecting any comments - but why oh why don't organisations investing in huge hardware/database installations check back in history and use the only genuinely 365/24/7 system that has been proven across the world, namely OpenVMS/Rdb? The last system I worked on had zero downtime in 10 years (excluding 1 night out a year for new releases). Yeah, I know - it's a legacy system. Oh well.

Well, it's the first time I've heard of an OS that keeps running flawlessly when the hardware it's running on has stopped working...
ID: 42631
Mikek69

Joined: 31 Dec 06
Posts: 2
Credit: 159,681
RAC: 0
Message 42635 - Posted: 17 Jul 2011, 21:46:16 UTC - in response to Message 42623.  

Me too. I had only been going a couple of hours on a job when they shut down. Since then I have had several trickles fail and at least 3 since it came back online, and still none showing... It's a pain in the butt if I've wasted 110 hours, and more if this goes on.
Mike
ID: 42635
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42636 - Posted: 17 Jul 2011, 22:03:15 UTC - in response to Message 42635.  

The trickles aren't showing because of the huge backlog of data on the upload servers that needs to be processed, at the same time as tens of thousands of computers want to download new work.
And it's been the weekend. Still is in some parts of the world.
And there's some more work to do on the servers. I think that some of the daemons haven't been started yet.

Patience is the best cure.


Backups: Here
ID: 42636
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 42637 - Posted: 17 Jul 2011, 22:42:40 UTC

"Well, it's the first time I've heard of an OS that keeps running flawlessly when the hardware it's running on has stopped working..."

Ah well, I did say 'installations', which naturally includes a backup server to facilitate 'fail over' procedures. Perhaps I should have added the detail, but OpenVMS has always been a world leader for its reliability, performance and continuity in a clustered environment, even across multiple sites. The clustering, together with various system services, allows it to be virtually completely 'disaster tolerant', an exception being if every site is nuked at the same time.
ID: 42637
Mikek69

Joined: 31 Dec 06
Posts: 2
Credit: 159,681
RAC: 0
Message 42639 - Posted: 18 Jul 2011, 9:23:05 UTC - in response to Message 42636.  

Les

Thanks for that info. I was beginning to think I had lost it all. So I will be patient...

Mike
ID: 42639
glaesum

Joined: 24 Feb 06
Posts: 47
Credit: 782,082
RAC: 0
Message 42644 - Posted: 19 Jul 2011, 13:23:31 UTC - in response to Message 42636.  

"The trickles aren't showing because of the huge backlog of data on the upload servers that needs to be processed, at the same time as tens of thousands of computers want to download new work."

Right now I can see trickles for Jul 10 & Jul 11 - so that's some activity, though I'm not sure if any headway is being made into the backlog. How are things with others? /p

ID: 42644
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 42645 - Posted: 19 Jul 2011, 21:34:39 UTC - in response to Message 42644.  

They all seem to be going through normally for me now.

Dave
ID: 42645
Greg van Paassen

Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42658 - Posted: 23 Jul 2011, 20:14:51 UTC

Re this message from Ananas: restarting the client doesn't help. I'm still getting "HTTP internal server error" on trickle-ups.
ID: 42658
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42659 - Posted: 23 Jul 2011, 20:31:52 UTC - in response to Message 42658.  

A server error is just that - an error on one of the project's servers.
It's usually a sign that the upload server is under heavy load from users' computers.
Restarting your 'client' in any form won't help. You just have to try again later.

The News message refers to errors such as 'server (or project), not found'.


Backups: Here
ID: 42659
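
The advice above to "try again later" is the kind of thing a capped exponential backoff automates. What follows is only a rough Python sketch of that idea, not BOINC's actual retry logic: the function name `retry_delays` and every number in it (60-second base, doubling, one-hour cap) are assumptions picked for illustration.

```python
def retry_delays(base_seconds=60, factor=2, cap_seconds=3600, attempts=6):
    """Return the wait (in seconds) before each retry attempt.

    Hypothetical helper: all defaults are illustrative assumptions,
    not values taken from BOINC or from this thread.
    """
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        # Never wait longer than the cap, however many failures there were.
        delays.append(min(delay, cap_seconds))
        # Back off further after each failed attempt to spare a loaded server.
        delay *= factor
    return delays


if __name__ == "__main__":
    for attempt, wait in enumerate(retry_delays(), start=1):
        print(f"attempt {attempt}: wait {wait} s before retrying")
```

Spacing retries out like this matters most in the situation described here, where tens of thousands of hosts are all trying to reach the same overloaded server at once.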
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 42661 - Posted: 23 Jul 2011, 22:04:13 UTC - in response to Message 42658.  
Last modified: 23 Jul 2011, 22:05:32 UTC

Re this message from Ananas: restarting the client doesn't help. I'm still getting "HTTP internal server error" on trickle-ups.

There are 2 errors that are not directly related.

As you already receive "real" server messages and not just "Scheduler request failed: Couldn't connect to server", your BOINC client already knows the correct IP. Those detailed errors are server-side problems; restarting the client does not help.

But BOINC clients cache the IP forever (at least older ones do) and will continue giving you the "connect" error message even if the server is already up and running, just with a fresh IP.


I have 2 models running on 2 hosts. Both have been collecting trickles and were unable to upload them: "Couldn't connect ..."

I restarted only one and that one did upload the trickles, while the other one still sits there with the "connect" error.
ID: 42661
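
The distinction drawn in the post above (a "connect" failure may be a stale cached IP and can be worth a client restart, while a detailed server error means the client reached the server and can only wait) could be sketched roughly as follows. This is a hypothetical helper, not part of BOINC: the function name `classify_scheduler_error` and its return labels are assumptions, and only the message strings come from this thread.

```python
def classify_scheduler_error(message: str) -> str:
    """Map a BOINC event-log line to a rough hint, following the thread's advice.

    Hypothetical helper: the name and the returned labels are illustrative
    assumptions; the matched strings are the messages quoted in this thread.
    """
    text = message.lower()
    if "couldn't connect to server" in text:
        # The client never reached the server. An older client may still be
        # using a cached IP, so restarting it can help.
        return "restart-client"
    if "http internal server error" in text or "server can't open log file" in text:
        # The server answered, so the address is correct and the fault is
        # server side. Restarting the client will not help.
        return "wait-and-retry"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "Scheduler request failed: Couldn't connect to server",
        "Scheduler request failed: HTTP internal server error",
        "Scheduler request completed",
    ]
    for line in samples:
        print(f"{classify_scheduler_error(line):15} <- {line}")
```

Matching on substrings is deliberately loose, since the exact wording of log lines may differ between client versions.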