climateprediction.net
Signal 10, zip error

Questions and Answers : Macintosh : Signal 10, zip error

old_user4506

Joined: 31 Aug 04
Posts: 2
Credit: 94,996
RAC: 0
Message 12722 - Posted: 20 May 2005, 13:54:49 UTC
Last modified: 20 May 2005, 13:58:41 UTC

Whenever a work unit comes close to completion, the work unit terminates with a client error. I have included the error text below. Does anybody know what this means? Thanks in advance!

<core_client_version>4.13</core_client_version>
<message>process got signal 10
</message>
<active_task_state>3</active_task_state>
<signal>10</signal>
<stderr_txt>
zip warning: Too many open files
zip warning: could not open for reading: 2bjeba.ph35c10.x2.nc
zip warning: zip file empty
zip I/O error: Too many open files

zip error: Temporary file failure (ziMFXzBx)

</stderr_txt>
ID: 12722
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 12748 - Posted: 21 May 2005, 5:35:03 UTC

I've seen this "Too many open files" error before when Mac people have had trouble, but apart from the obvious, I don't know what it means or how to fix it.
And I don't recall anyone offering help.

You could try this thread (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=93), and go to the MacNN team site.

Les
ID: 12748
old_user54269

Joined: 12 Feb 05
Posts: 3
Credit: 143,853
RAC: 0
Message 12921 - Posted: 28 May 2005, 2:13:42 UTC

I have the same issue. My G5 always completes the unit but on the last trickle I get that error. I get the credit for the time but I don't get a valid result. It is getting annoying.
ID: 12921
old_user54269

Joined: 12 Feb 05
Posts: 3
Credit: 143,853
RAC: 0
Message 12922 - Posted: 28 May 2005, 2:26:03 UTC

I wonder if it is the way the client is trying to zip the file and submit it after it is completed. I have many iterations of this error but it looks to me as if whatever my boinc_4.19_ppcG5 client is trying to do in terms of zipping is not working. Is this 4.19 client looking for a specific zip app somewhere?

Here is an example of my error message. It is like Karl's.

4.19
process got signal 10

3
10

zip warning: Too many open files
zip warning: could not open for reading: 24g4ba.ph34c10.x2.nc
zip warning: zip file empty
zip I/O error: Too many open files

zip error: Temporary file failure (zixaciuv)

ID: 12922
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 12958 - Posted: 30 May 2005, 20:39:57 UTC

I don't know what it's trying to do.
And the new BOINC version apparently crashes right at the start of a run for Macs.

Les
ID: 12958
old_user54269

Joined: 12 Feb 05
Posts: 3
Credit: 143,853
RAC: 0
Message 12961 - Posted: 31 May 2005, 1:45:38 UTC - in response to Message 12958.  

Yes, mine does as well. I installed version 4.43 and it downloads the packet but never starts it. I end up exceeding my number of units allowed per day. I tried resetting it on multiple days, to no avail.
ID: 12961
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 12966 - Posted: 31 May 2005, 4:41:51 UTC

Rico,
This problem will probably only be solved with a new version of hadsm.
As Macs are only 2% of the computers running this project, and the programmer is up to his eyebrows with work, a cure may be months away.

Les
ID: 12966
RAW

Joined: 31 Jan 05
Posts: 5
Credit: 909,503
RAC: 296
Message 13060 - Posted: 2 Jun 2005, 14:03:10 UTC - in response to Message 12922.  

> I wonder if it is the way the client is trying to zip the file and submit it
> after it is completed. I have many iterations of this error but it looks to me
> as if whatever my boinc_4.19_ppcG5 client is trying to do in terms of zipping
> is not working. Is this 4.19 client looking for a specific zip app somewhere?
>
> Here is an example of my error message. It is like Karl's.
>
> 4.19
> process got signal 10
>
> 3
> 10
>
> zip warning: Too many open files
> zip warning: could not open for reading: 24g4ba.ph34c10.x2.nc
> zip warning: zip file empty
> zip I/O error: Too many open files
>
> zip error: Temporary file failure (zixaciuv)
>


Exactly the same thing happens to me. This is from the terminal window output that I saved when I completed my first project, about a month ago. At the end of a long list of added files, this is how it ends:

adding: 1retba.ph32c10.x2.nc (deflated 8%)
adding: 1retba.ph33c10.x2.nc (deflated 8%)
adding: 1retba.ph34c10.x2.nc
2005-05-05 02:37:31 [climateprediction.net] Unrecoverable error for result 1ret_400103021_0 (process got signal 10)
2005-05-05 02:37:31 [climateprediction.net] Unrecoverable error for result 1ret_400103021_0 (process got signal 10)

and the stderr out says:

4.19
process got signal 10

3
10

zip warning: Too many open files
zip warning: could not open for reading: 1retba.ph34c10.x2.nc
zip warning: zip file empty
zip I/O error: Too many open files

zip error: Temporary file failure (ziFfyJtV)


The list of zipped files still sitting in the project's folder ends with 1retba.ph34c10.x2.nc.zip, but that file is empty (4 KB). It is followed by 1retba.ph36c10.x2.nc (there is no .ph35c10). In the dataout folder, 1retaa.pc.8yac sits on top, followed by what seem to be the next files in line for zipping: 1retba.ph37c10.x2.nc, 1retba.ph38c10.x2.nc, etc.

It is clear that this is a systematic error in the zip procedure on the Mac, because things always stop working around the same file (###.ph34c10.x2.nc or ###.ph35c10.x2.nc). Is the procedure opening, but not closing, files as it zips along, and so hitting an upper limit on the number of open files? Judging by the number of zip files produced, that limit must be around 240. It also surprises me that the zip procedure seemingly deleted the original files before it had finished all the work. Or is that because the files being zipped were open when the error occurred, and were lost?
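RAW's hypothesis can be illustrated with a small, self-contained sketch. Nothing here comes from the hadsm or BOINC code (which isn't public); it only shows what any Unix process sees when it keeps opening files without closing them: once it reaches its per-process soft limit on open descriptors (`RLIMIT_NOFILE`), the next open fails with `EMFILE`, the same "Too many open files" wording as in the zip warnings above. The limit of 64 is an arbitrary value chosen so the demo finishes quickly. That Mac OS X's default soft limit was commonly 256, which would fit RAW's count of roughly 240 once the model's other open files are subtracted, is an assumption, not something confirmed in this thread.

```python
import errno
import os
import resource  # Unix-only (Linux, macOS)
import tempfile


def open_until_emfile(soft_limit: int = 64) -> tuple[int, int]:
    """Open files without ever closing them until the OS refuses.

    The soft RLIMIT_NOFILE limit is lowered to `soft_limit` for the demo
    and restored afterwards. Returns (files opened before the failure,
    errno of the failure).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    leaked = []  # the "leak": handles are kept open until the very end
    with tempfile.TemporaryDirectory() as tmp:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
        try:
            while True:
                path = os.path.join(tmp, f"part{len(leaked)}.nc")
                try:
                    leaked.append(open(path, "wb"))
                except OSError as exc:
                    return len(leaked), exc.errno
        finally:
            # Close everything and restore the limit before the temporary
            # directory is removed, so the cleanup itself has descriptors.
            for f in leaked:
                f.close()
            resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    opened, err = open_until_emfile()
    print(opened, err == errno.EMFILE, os.strerror(err))
```

The count printed is a little below the limit, because the process already holds a few descriptors (standard input, output and error at least). Closing each file as soon as it has been added to its archive keeps the count flat instead. In a Mac Terminal, `ulimit -n` shows the soft limit that applies to programs started from that shell.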

On a more practical side: are our results still usable without the two files that are missing (###.ph34c10.x2.nc and ###.ph35c10.x2.nc, in my case)? Could we send them manually? If all Mac users are experiencing this problem, then CP is losing 2% of its data, and lots of valuable CPU time. Personally, I don't care about getting credits; I care about supplying useful results to a useful project. If no solution is found for this problem, I might as well stop participating, which I will now do as soon as my current project nears completion. Or at least I will make a copy of the original files in dataout before zipping starts.
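RAW's fallback, copying the original files out of dataout before zipping starts, could look something like the sketch below. Both folder paths are placeholders to adjust for your own installation, and copying only `*.nc` files is an assumption based on the file names quoted in this thread. `shutil.copy2` opens and closes each file in turn, so the backup itself does not accumulate open descriptors.

```python
import shutil
from pathlib import Path


def back_up_netcdf(dataout: Path, backup: Path) -> list[str]:
    """Copy every *.nc file in `dataout` into `backup`, keeping timestamps.

    Returns the copied file names in sorted order. Files of the same name
    already in `backup` are overwritten.
    """
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(dataout.glob("*.nc")):
        # copy2 copies the contents plus metadata such as modification time
        shutil.copy2(src, backup / src.name)
        copied.append(src.name)
    return copied


# Hypothetical usage; both paths are placeholders:
# back_up_netcdf(Path("dataout"), Path.home() / "cpdn_backup")
```

Run it while the model is still well short of its final phase, so the copy is taken before the zip step begins.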

I'll also post a warning on the MacNN-forum, maybe some geek over there can help the programmer solve this.
ID: 13060
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 13071 - Posted: 2 Jun 2005, 16:08:54 UTC

Raw
It certainly needs someone looking at it. There have been lots of posts from people with a problem, who turn out to have this zip problem.

Not being a Mac person, I've no idea what the normal number of zip files is.
I don't even know which is the problem: hadsm, or BOINC.

Perhaps you should also post on the BOINC (SETI) site.
Hope you get a cure. And please post back here if you do.

Les

ID: 13071
old_user70928

Joined: 18 Apr 05
Posts: 3
Credit: 7,245
RAC: 0
Message 13117 - Posted: 4 Jun 2005, 13:28:46 UTC - in response to Message 13071.  

I have the exact same problem. Is anyone successfully finishing models on a Mac these days? I'm not sure how to check (I browsed the top hosts link and found none, but it was a quick look).

How long has this problem been known? I fully understand that the overworked programmer can't prioritize the 2% apparently running Macs, but if it is not working, Climate Prediction should stop asking Mac users to contribute, or post a warning or something. Has anyone been in touch with them about this?

Also, I have to say... if you ask a group to help you with your model and they volunteer their machines and then have major problems, you kind of have to set some time aside to help out (or you shouldn't have asked for help in the first place).

Christian Hansson


ID: 13117
old_user63950

Joined: 16 Mar 05
Posts: 2
Credit: 107,907
RAC: 0
Message 13385 - Posted: 12 Jun 2005, 21:15:31 UTC - in response to Message 13071.  

> Raw
> It certainly needs someone looking at it. There have been lots of posts from
> people with a problem, who turn out to have this zip problem.
>
> Not being a Mac person, I've no idea what is normal about number of zip
> files.
> I don't even know which is the problem: hadsm, or BOINC.
>
> Perhaps you should also post on the BOINC (SETI), site.
> Hope you get a cure. And please post back here if you do.
>

It works fine with Seti Boinc, so I suspect that the problem is with the hadsm. I use Macs, and would be interested in debugging this problem. Is there a link to the source code for this particular bit? In particular the section wrapping up the files prior to sending them back.

I suspect that manually duplicating the same process might reveal the issue....
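One way to try Joel's suggestion without the hadsm source is to zip the dataout files yourself, ideally on a copy of the folder rather than the live one, and see whether the failure recurs. The stderr messages above look like those of the Info-ZIP `zip` command, but exactly how hadsm invokes it isn't known here, so this sketch substitutes Python's standard `zipfile` module. It writes one `<name>.nc.zip` per file, mirroring the `1retba.ph34c10.x2.nc.zip` naming RAW reported, and closes each archive before opening the next. If this finishes cleanly where the client fails, that would suggest the descriptors are being used up elsewhere rather than by the data files themselves.

```python
import zipfile
from pathlib import Path


def zip_each(folder: Path) -> list[str]:
    """Compress each *.nc file in `folder` into its own <name>.nc.zip.

    Each archive is closed by its `with` block before the next one is
    opened, so the number of open descriptors does not grow with the
    number of files. Returns the archive names in the order written.
    """
    written = []
    for src in sorted(folder.glob("*.nc")):
        target = src.with_name(src.name + ".zip")
        with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(src, arcname=src.name)  # store under the bare file name
        written.append(target.name)
    return written


# Hypothetical usage on a copy of the dataout folder:
# print(zip_each(Path("dataout_copy")))
```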

Cheers,

Joel
ID: 13385
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 13392 - Posted: 12 Jun 2005, 22:48:01 UTC

Lupus,
The hadsm is proprietary code belonging to the Met Office, so few people have access to it.

It is probably something simple that could be fixed quickly if:
There is/was a programmer familiar with the Mac,
There is a modern Mac available for him to use,
He can justify the time to look, keeping in mind the huge size of the program and the possibility that it ISN'T a simple matter.

It's possible that something changed in the Mac hardware/software a few models back that is no longer compatible with the hadsm code.

Les

ID: 13392
Christian Hoklas

Joined: 5 Aug 04
Posts: 11
Credit: 63,408
RAC: 0
Message 13724 - Posted: 22 Jun 2005, 20:10:45 UTC

Hey there,

you should not worry too much about this error. Last November Carl Christensen posted in the thread below that this problem isn't really relevant to this project. No work or credit gets lost. Have a look here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=1076

I hope that helps and explains why nobody seems to care to get this fixed.

Bye!
Christian
ID: 13724
old_user4506

Joined: 31 Aug 04
Posts: 2
Credit: 94,996
RAC: 0
Message 13726 - Posted: 22 Jun 2005, 21:04:02 UTC - in response to Message 13724.  

> Hey there,
>
> you should not worry too much about this error. Last November Carl Christensen
> posted in the thread below that this problem isn't really relevant to this
> project. No work or credit gets lost. Have a look here:
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=1076
>

Quoting from Carl's response in that thread:

> As long as the final trickle (phase 3, timestep 259248) went through, and
> there are no "result#_[1-5].zip" files in your boinc/climateprediction.net
> directory, it's probably OK. I don't really use the BOINC status for credits,
> what "counts" are the trickles and the file uploads.

Yup, my last trickle did go through, so that answers most of my question. There's just one thing: What do I do with the old files from the completed-yet-not-completed work unit?
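Carl's second condition can be checked with a few lines. This sketch assumes that the "#" in "result#_[1-5].zip" stands for any result name, so it looks for files ending in an underscore, a digit from 1 to 5 and ".zip"; the project directory path is a placeholder. The first condition, that the final trickle went through, still has to be checked separately.

```python
from pathlib import Path


def leftover_result_zips(project_dir: Path) -> list[str]:
    """List files matching Carl's "result#_[1-5].zip" pattern.

    Assumes "#" means any result name, i.e. the glob "*_[1-5].zip".
    An empty list corresponds to Carl's "probably OK" case.
    """
    return sorted(p.name for p in project_dir.glob("*_[1-5].zip"))


# Hypothetical path; point it at your own climateprediction.net project folder:
# print(leftover_result_zips(Path("boinc/climateprediction.net")))
```

The per-timestep archives such as `1retba.ph34c10.x2.nc.zip` do not match this pattern, so they are not reported.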
ID: 13726
Christian Hoklas

Joined: 5 Aug 04
Posts: 11
Credit: 63,408
RAC: 0
Message 13727 - Posted: 22 Jun 2005, 21:27:27 UTC - in response to Message 13726.  

> There's just one thing: What do I do with the old files from the
> completed-yet-not-completed work unit?

It is always a good idea to keep the files for as long as possible. I know from the pre-BOINC days of this project that only a fraction of the results produced are actually sent back at the end of the computations. It might be that the people from climateprediction.net consider your model very interesting and would like to take a closer look. Then it would be nice to have the files on your hard disk. I am not sure whether this is still true. :-/ Sorry!
ID: 13727
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 13728 - Posted: 22 Jun 2005, 21:54:01 UTC

> What do I do with the old files from the completed-yet-not-completed work unit?

You can put them onto CDs, then delete them from your computer if you need the room. They're possibly safer there as well.
It's what I've been doing.

Any that haven't finished at least one phase are useless, if you have any like that. There was talk that the researchers may at some stage look through partially completed models to see why they failed, but they have so much else to do that it may never happen.

Les

ID: 13728
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 14034 - Posted: 3 Jul 2005, 6:47:22 UTC

Maybe fixing this "open files" problem will fix nearly all Trickle24 problems on most systems.

When I had them (reproducibly!) on a dual-CPU Win2k system, one of my guesses about the cause was exactly this, but the PC froze and gave no related error message. All I could see was that a bunch of results contained 0x00 bytes instead of data.

In this case I'm even a bit happy about your errors as there is a "good" error message for this problem now, which (hopefully) will lead to a really important bugfix.


This one should have one of the highest priorities after having solved all database and server problems.
ID: 14034
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 14037 - Posted: 3 Jul 2005, 7:02:54 UTC

> This one should have one of the highest priorities after having solved all database and server problems.

I agree with this. But how to push it?
Have a look at this user's results (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_user.php?userid=20179). I think he's given up.

ID: 14037
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 14038 - Posted: 3 Jul 2005, 7:22:49 UTC - in response to Message 14037.  

> > This one should have one of the highest priorities after having solved
> all database and server problems.
>
> I agree with this. But how to push it?
> ...


If this stuff is modular (and I bet it is), why not publish the source of the non-science parts so everyone can help?

I have sent in a few source changes for the BOINC client and PHP, and they work even though I hadn't compiled and tested them on my PC. So making even parts of the project source public might help here too.
ID: 14038
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 14040 - Posted: 3 Jul 2005, 8:08:06 UTC

It's Fortran, which uses subroutines, so it IS modular.
But it's a matter of the legal agreement between the Met Office and Oxford Uni.
It might work if a few people agree to secrecy if they work on it.

ID: 14040

©2024 climateprediction.net