climateprediction.net (CPDN)
Thread 'Ghost work units?'

Message boards : Number crunching : Ghost work units?

Terrible T
Joined: 16 Oct 04
Posts: 4
Credit: 2,012,373
RAC: 0
Message 55840 - Posted: 1 Mar 2017, 22:21:34 UTC

I've noticed some tasks listed on the website for my computer (1420079) that are not in its work queue. Does anybody have an idea how or why this happens?

Task      Work unit  Sent                       Deadline
20298856  10955173   22 Feb 2017, 17:40:15 UTC  4 Feb 2018
20298576  10884065   22 Feb 2017, 17:38:32 UTC  4 Feb 2018
20298733  10924236   22 Feb 2017, 17:37:41 UTC  4 Feb 2018
20295494  10885567   22 Feb 2017, 17:36:48 UTC  4 Feb 2018
20278213  10876852   22 Feb 2017, 17:36:48 UTC  4 Feb 2018
20292607  10964113   22 Feb 2017, 17:36:48 UTC  4 Feb 2018
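As a minimal sketch of the idea (not a CPDN or BOINC tool): a ghost task is one that the server lists as "In progress" for a host but that the local client no longer has, so spotting them is just a set difference. The function name find_ghost_tasks and the empty local list are assumptions for illustration; in practice both lists would be copied by hand from the host's task page on the website and from the BOINC Manager task list.

```python
def find_ghost_tasks(server_in_progress, local_tasks):
    """Return IDs the server shows as "In progress" that the client no longer has."""
    return sorted(set(server_in_progress) - set(local_tasks))


# Task IDs listed above, as shown by the server for computer 1420079.
server_in_progress = [20298856, 20298576, 20298733, 20295494, 20278213, 20292607]
# Assumed example: none of these tasks appear in the local work queue.
local_tasks = []

print(find_ghost_tasks(server_in_progress, local_tasks))
# [20278213, 20292607, 20295494, 20298576, 20298733, 20298856]
```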

Terrible T
ID: 55840
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55841 - Posted: 1 Mar 2017, 22:40:57 UTC

Probably "phantom" tasks. This happens now and then when there's some sort of overload on the Oxford servers.
ID: 55841
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 0
Message 55847 - Posted: 4 Mar 2017, 0:43:30 UTC - in response to Message 55840.  

See the thread below (Orphaned Work Units). I have a number of WUs that CPDN believes are active but that are not visible in my BOINC application. My best explanation is that there is a BOINC failure mode on my Intel i7 machine where processing fails and the data and WU information are deleted from my machine, but no error report is sent to CPDN.

As a result, CPDN continues to believe the WU is being processed when in fact it is not, and the WU is not resent to another user until the one-year processing time limit has passed.

Art Masson
ID: 55847
Professor Desty Nova
Joined: 19 Sep 04
Posts: 92
Credit: 2,025,718
RAC: 465
Message 55849 - Posted: 4 Mar 2017, 10:39:39 UTC

And a few weeks ago, when the trickles were missing, I had a WU that finished and was reported with an OK from the servers (in the BOINC Manager log), but it still shows as in progress. At least all the trickles appear now.


Professor Desty Nova
Researching Karma the Hard Way
ID: 55849
pututu
Joined: 18 Jun 17
Posts: 18
Credit: 10,293,533
RAC: 33,275
Message 69414 - Posted: 27 Jul 2023, 23:59:47 UTC

I've noticed that I have some ghost/phantom tasks (i.e. tasks showing up on the server as "in progress" but not on my PC).

I guess I'll just have to let them expire in about 12 months' time.

Let me know if there is a way to recover these tasks or, if not, to recycle them back to other volunteers, and whom I can PM the list of ghost tasks to, if needed. These are tasks whose names start with wah2_eas25*, so they are likely to end up with errors, judging from the few of them I've seen.

Cheers.
ID: 69414
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 69415 - Posted: 28 Jul 2023, 1:03:35 UTC - in response to Message 69414.  
Last modified: 28 Jul 2023, 1:12:31 UTC

If you detach the PC associated with these tasks, their status will go to "Abandoned" and the next task from that work unit will be ready to send out (assuming yours wasn't the last task in that work unit). You can then reattach.

Edit: do this only when you have no CPDN tasks currently running.
ID: 69415
pututu
Joined: 18 Jun 17
Posts: 18
Credit: 10,293,533
RAC: 33,275
Message 69416 - Posted: 28 Jul 2023, 2:14:22 UTC - in response to Message 69415.  


Thanks, but I was under the impression that detaching from the project causes the task to be abandoned AND that it will only be recycled when it expires, which for CPDN tasks is in about a year.
At least, that is what I understood from the PrimeGrid forum when I took part in their challenges: http://www.primegrid.com/forum_thread.php?id=10277&nowrap=true#163886
I also read an older thread here that says the same thing: https://www.cpdn.org/forum_thread.php?id=8585#57815

From the PrimeGrid forum:
Important reminders:
At the Conclusion of the Challenge: We kindly ask users "moving on" to ABORT their tasks instead of DETACHING, RESETTING, or PAUSING. ABORTING tasks allows them to be recycled immediately; thus a much faster "clean up" to the end of a Challenge. DETACHING, RESETTING, and PAUSING tasks causes them to remain in limbo until they EXPIRE. Therefore, we must wait until tasks expire to send them out to be completed. Please consider either completing what's in the queue or ABORTING them. Thanks!
ID: 69416
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 69417 - Posted: 28 Jul 2023, 4:17:20 UTC - in response to Message 69416.  
Last modified: 28 Jul 2023, 4:18:32 UTC

In earlier days, the CPDN server sometimes had trouble keeping up, especially while it was running the weekly credit script. So, occasionally, if tasks were reported during that credit run, their completion status was not logged/stored correctly.

For example, 4 tasks (marked Abandoned when I detached) were sent to one of my computers on May 22 2020 and sent in all 4 trickles,

https://www.cpdn.org/results.php?hostid=1492829&offset=160&show_names=0&state=0&appid=33

and reported to the server on May 27 that they had completed successfully. However, the server did not record the completion report, so those tasks were no longer on my PC but were still "In progress" according to the task status on the server. When sent to my computer, these tasks had a deadline of May 4 2021. On June 4 2020 I detached that computer from climateprediction.net, which is when the BOINC server marked their status as "Abandoned".

Looking at one of those work units

https://www.cpdn.org/workunit.php?wuid=12017682

you can see the next task from it was sent out on June 4 2020 to a computer that completed that task. So it did not wait until the deadline, the next task from that work unit was sent back out immediately after abandonment.
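The sequence described above (a ghost task sits at "In progress" until something retires it, after which the next task from the work unit can go out) can be illustrated with a toy Python model. This is not the BOINC server's actual logic: the per-work-unit cap max_tasks and the rule "resend only when no task is still in progress" are simplifying assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Toy model only, not BOINC server code. max_tasks is an assumed cap on how
# many tasks a work unit may issue. A resend happens only once none of the
# work unit's tasks is still "In progress", so a ghost task blocks it until
# the task is retired: by a detach ("Abandoned") or by the deadline passing.


@dataclass
class WorkUnit:
    max_tasks: int
    tasks: list = field(default_factory=list)

    def in_progress(self):
        return [t for t in self.tasks if t["status"] == "In progress"]

    def issue_task(self, host):
        """Send out a new task, unless the work unit has used all its tasks."""
        if len(self.tasks) >= self.max_tasks:
            return None  # yours was the last task in the work unit
        task = {"host": host, "status": "In progress"}
        self.tasks.append(task)
        return task


def retire_host_tasks(wu, host, new_status):
    """Mark the host's "In progress" tasks with new_status, then try a resend."""
    for t in wu.tasks:
        if t["host"] == host and t["status"] == "In progress":
            t["status"] = new_status
    if not wu.in_progress():
        return wu.issue_task("another volunteer")
    return None


wu = WorkUnit(max_tasks=3)
wu.issue_task("my PC")  # later lost from the PC: a ghost task
print(retire_host_tasks(wu, "my PC", "Abandoned"))  # detach: resent right away
# {'host': 'another volunteer', 'status': 'In progress'}
```

Passing "Timed out - no response" instead models the deadline path. In this model the only difference is when retire_host_tasks gets called: at detach time, or when the deadline finally passes.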

If you do an advanced search of the Number crunching forum with a search limit of "no limit" and the keyword detach or abandoned, you will find some replies by WaterOakley, who is a sharp BOINC user, recommending the same method for tasks that the server lists as in progress for a PC but that are not in the BOINC Manager task list for that PC.
ID: 69417
Glenn Carver
Joined: 29 Oct 17
Posts: 1052
Credit: 16,817,940
RAC: 12,877
Message 69421 - Posted: 28 Jul 2023, 9:18:46 UTC
Last modified: 28 Jul 2023, 9:21:14 UTC

I had some hardware fail while a task was running and had to rebuild the machine and reinstall the OS. In this scenario there's nothing that can be done about the 'In Progress' task still showing for a machine that no longer exists: it's not possible to cancel the task via the web account, which is somewhat irritating given the long timeout of the Hadley model batches.

In practice, though, CPDN will usually close a batch once they get more than 90% of the results back, so late resends are not important to the project.
ID: 69421
pututu
Joined: 18 Jun 17
Posts: 18
Credit: 10,293,533
RAC: 33,275
Message 69422 - Posted: 28 Jul 2023, 13:45:40 UTC
Last modified: 28 Jul 2023, 13:55:47 UTC

@geophi, thanks for the clarification. I'll detach from the project.
ID: 69422
pututu
Joined: 18 Jun 17
Posts: 18
Credit: 10,293,533
RAC: 33,275
Message 69435 - Posted: 30 Jul 2023, 21:08:44 UTC
Last modified: 30 Jul 2023, 21:12:10 UTC

I completed the last WU on this rig this morning and detached from the project, but re-attaching doesn't seem to have changed the ghost task's status from "in progress" to "abandoned". Maybe I need to wait another day?

Here is my rig https://www.cpdn.org/results.php?hostid=1542213. WU#12219305 is the ghost task.

I detached from and re-attached to CPDN twice this morning without shutting down the BOINC client, with no success. Just now I detached from the project, shut down the client, restarted it and re-attached, but the status is still the same.

7/30/2023 8:06:59 AM | climateprediction.net | Resetting project
7/30/2023 8:06:59 AM | climateprediction.net | Detaching from project
7/30/2023 8:07:28 AM |  | Fetching configuration file from https://climateprediction.net/get_project_config.php
7/30/2023 8:07:52 AM | climateprediction.net | Fetching scheduler list
7/30/2023 8:07:58 AM | climateprediction.net | Master file download succeeded
7/30/2023 8:08:03 AM | climateprediction.net | Sending scheduler request: Project initialization.
7/30/2023 8:08:03 AM | climateprediction.net | Requesting new tasks for CPU and NVIDIA GPU
7/30/2023 8:08:04 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
7/30/2023 8:08:04 AM | climateprediction.net | Project has no tasks available
7/30/2023 8:08:04 AM | climateprediction.net | Project requested delay of 3636 seconds

7/30/2023 9:08:21 AM | climateprediction.net | Resetting project
7/30/2023 9:08:21 AM | climateprediction.net | Detaching from project
7/30/2023 9:10:01 AM |  | Fetching configuration file from https://climateprediction.net/get_project_config.php
7/30/2023 9:10:40 AM |  | Project communication failed: attempting access to reference site
7/30/2023 9:10:41 AM |  | Internet access OK - project servers may be temporarily down.
7/30/2023 9:10:44 AM | climateprediction.net | Fetching scheduler list
7/30/2023 9:10:48 AM | climateprediction.net | Master file download succeeded
7/30/2023 9:10:53 AM | climateprediction.net | Sending scheduler request: Project initialization.
7/30/2023 9:10:53 AM | climateprediction.net | Requesting new tasks for CPU and NVIDIA GPU
7/30/2023 9:10:55 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
7/30/2023 9:10:55 AM | climateprediction.net | Project has no tasks available
7/30/2023 9:10:55 AM | climateprediction.net | Project requested delay of 3636 seconds

ID: 69435
Glenn Carver
Joined: 29 Oct 17
Posts: 1052
Credit: 16,817,940
RAC: 12,877
Message 69436 - Posted: 31 Jul 2023, 7:46:15 UTC - in response to Message 69435.  
Last modified: 31 Jul 2023, 7:48:17 UTC

I believe there is a 'memory' in the BOINC server software that retains information on past hosts in case they reconnect later. Some time ago I asked them about a similar issue that was related to this.

I'll try to get some time with Andy this morning to ask him, but I believe that's what's going on. You might need to detach and then wait a couple of days before re-attaching. I'm not sure what the right time period is; the moderators might know.
ID: 69436
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,816,918
RAC: 4,574
Message 69437 - Posted: 31 Jul 2023, 8:32:54 UTC - in response to Message 69436.  

Yes, when you re-attach to the project, the server searches the 'host' table and tries to find a match: if it finds one, it re-cycles the old HostID number to reduce database bloat.

If it issues a new number instead, I expect that the old record won't be changed, and will remain 'fossilised' in the database with the ghost task(s) preserved.
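To make the "fossilised" record concrete, here is a toy Python model of HostID reuse on re-attach. It is only an illustration of the behaviour described above: the real server's matching rules are not given in this thread, so the fingerprint tuple (name, OS, CPU) and all class names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical model of HostID reuse on re-attach. The real matching rules are
# not described here; a host "matches" only if an assumed fingerprint tuple
# is identical to one already stored.


@dataclass
class HostRecord:
    hostid: int
    fingerprint: tuple
    in_progress: list = field(default_factory=list)


class HostTable:
    def __init__(self):
        self._next_id = count(1)
        self.records = []

    def attach(self, fingerprint):
        """Recycle a matching HostID if there is one, else issue a new one."""
        for rec in self.records:
            if rec.fingerprint == fingerprint:
                return rec  # old record reused, so its tasks can be cleaned up
        rec = HostRecord(next(self._next_id), fingerprint)
        self.records.append(rec)
        return rec


table = HostTable()
old = table.attach(("rig", "Linux", "CPU-A"))
old.in_progress.append("ghost task")

# Something in the fingerprint changed, so there is no match: a new HostID is
# issued and the old record keeps its ghost task untouched.
new = table.attach(("rig", "Linux", "CPU-B"))
print(old.hostid, new.hostid, old.in_progress)  # 1 2 ['ghost task']
```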

The user can recover the old HostID manually, but it's a bit fiddly: you have to stop BOINC, and edit the client_state.xml file. Proceed with extreme caution, using a plain-text editor only. DON'T do this if you have active tasks in progress or waiting to start.

You need to change both the HostID itself - you can see the old one on this website - and the <rpc_seqno>. Make that value one larger than the "Number of times client has contacted server" shown on this website for the old HostID. Save the file, and restart BOINC. I can't guarantee that it will exorcise the ghosts, but it's worth a try while things are quiet.
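The manual edit above could be scripted roughly as follows. This is a hypothetical helper, not part of BOINC, and it assumes the project's entry in client_state.xml is a <project>...</project> block containing single-line <master_url>, <hostid> and <rpc_seqno> tags. It does plain text substitution rather than re-serialising the XML, so the rest of the file is left byte-for-byte as it was, and it writes a .bak copy first. The same precautions apply: stop BOINC first, have no CPDN tasks in progress or waiting to start, and pass master_url exactly as it appears in your own file.

```python
import re
import shutil

# Hypothetical helper, not a BOINC tool. Assumes each project in
# client_state.xml is a <project>...</project> block holding single-line
# <master_url>, <hostid> and <rpc_seqno> tags.


def patch_client_state(text, master_url, old_hostid, contacts):
    """Return text with hostid/rpc_seqno changed for one project, or None."""
    for m in re.finditer(r"<project>.*?</project>", text, re.S):
        block = m.group(0)
        if f"<master_url>{master_url}</master_url>" not in block:
            continue
        block = re.sub(r"<hostid>\d+</hostid>",
                       f"<hostid>{old_hostid}</hostid>", block, count=1)
        # One larger than "Number of times client has contacted server".
        block = re.sub(r"<rpc_seqno>\d+</rpc_seqno>",
                       f"<rpc_seqno>{contacts + 1}</rpc_seqno>", block, count=1)
        return text[:m.start()] + block + text[m.end():]
    return None  # no project with that master URL


def edit_client_state(path, master_url, old_hostid, contacts):
    """Back up the file, then write the patched version in place."""
    shutil.copy2(path, path + ".bak")  # restore this copy if BOINC complains
    with open(path, encoding="utf-8") as f:
        patched = patch_client_state(f.read(), master_url, old_hostid, contacts)
    if patched is None:
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(patched)
    return True
```

Splitting the text transformation from the file handling means the patch can be checked on a copy of the file's contents before anything is written back.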
ID: 69437
Glenn Carver
Joined: 29 Oct 17
Posts: 1052
Credit: 16,817,940
RAC: 12,877
Message 69439 - Posted: 31 Jul 2023, 16:33:17 UTC - in response to Message 69437.  

I have seriously messed up my boinc client editing the client_state.xml before now. Might be safer just to ignore the ghost task until it times out.
ID: 69439
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 69440 - Posted: 31 Jul 2023, 16:52:47 UTC - in response to Message 69439.  

I have only edited mine after first backing up! In the days of tasks that lasted for months, if a task crashed due to a power outage, I would restore from a backup. Oh, the fun!
ID: 69440
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,803,805
RAC: 8,874
Message 69447 - Posted: 2 Aug 2023, 17:38:20 UTC

My last remaining ghost WU is now "over" after 9 years in the "In Progress" queue, with a "Timed out - no response" status as of 19.07.2023 (the WU was sent to me on 15.01.2014).

https://www.cpdn.org/result.php?resultid=16272420

Finally my queue is clear :)
ID: 69447

©2024 cpdn.org