
Re: Long running work unit

Posted: 14.10.2017 07:26
by Michael H.W. Weber
Short add-on:
We are aware of the tasks that have to be re-sent, etc. and appropriate measures will be taken to deal with these.

Michael.

Re: Long running work unit

Posted: 14.10.2017 07:40
by robertmiles
Jacob Klein wrote:
robertmiles wrote: This task completed in 2012, apparently successfully, but is still waiting for validation:
http://www.rnaworld.de/rnaworld/result. ... d=14832293
I suspect that it reached a limit of 50 failures for wingmates.
Could you check if repeating it under VM would give a useful output to compare it to?
Michael may be able to confirm, but ... it's likely that that work unit was issued using the "cmsearch XXL (long)" app (before VM apps existed), and in order to get a VM wingman for it, they set initial replication to 0 to not send any more under that app, and your wingman has a different work unit with initial replication of 1 using the "cmsearch VM (VirtualBox) 1.0.2" app.
Is there any way to identify WHAT VM workunit in such a case? Or if not directly, a way to get a list of all VM workunits still in progress?
Jacob Klein wrote:
robertmiles wrote: This workunit appears to be stuck at 98.765% progress; the progress has not changed in the last 86 hours.
http://www.rnaworld.de/rnaworld/result. ... d=14953356

[snip]
Jacob Klein wrote: I'm working on a HUGE MONSTER right now, on a slow laptop, where .... it reached 98.765% at 196.5d, and is currently at 540d ! Talk about patience!

So, as you can see.... you may only be a third the way complete, when you reach that 98.765% point. MONSTERS.
If they're still using CPU, and you feel like continuing the challenge, then DO NOT ABORT. :)

Regards,
Jacob Klein
It's still using CPU.

Could I get a copy of the source code for the VM application? I've had an online class in CUDA, and am thinking of converting some BOINC applications to CUDA. If enough of one can be converted to run in parallel under CUDA, such an application could theoretically run 750 times as fast on my GTX 1080 as on one core of my computers - but most BOINC projects are satisfied with 10 times as fast.

An OpenCL version could run on more graphics cards, but I have not found a suitable online OpenCL class yet.

Re: Long running work unit

Posted: 14.10.2017 13:07
by Michael H.W. Weber
Porting Infernal to GPUs was tried years ago and did not scale.

Michael.

Re: Long running work unit

Posted: 22.10.2017 19:54
by Jacob Klein
Nitro completed his longest running task ever!
548.6d :lol:
http://www.rnaworld.de/rnaworld/workuni ... id=6330820
.. and it validated!

Since this was his last v1.17 task, this frees him up for some serious upgrades:
- New SSD drive (as his promised reward)
- Reinstallation of 4 OSs (Win10 Release, Insider Slow, Insider Fast, Insider Fast Skip Ahead)
- He can now start using the Fall Creators Update, and even the Fast Ring (since he only has v1.18 tasks left)
- VirtualBox v5.1.x (since he only has v1.18 tasks left)

Let's get this party started! :)

Re: Long running work unit

Posted: 22.10.2017 23:05
by gemini8
Yeah! Great thing!
Have a nice party, and don't forget the Ubuntu install. ;-)

Re: Long running work unit

Posted: 25.10.2017 13:28
by Darkness Productions
I, too, have a long running WU, http://www.rnaworld.de/rnaworld/workuni ... id=6330860

I notice on the page linked above that this task has not once been completed correctly. Maybe it's a bad unit?

I received it yesterday (October 24). It currently has completed 0.919% after ~25 hours (which works out to about 113 days, or a completion time around Feb 15, 2018). The deadline for this task is marked at Nov 13, 2017. I'm running on a machine with the following specs:

CPU type GenuineIntel Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz [x86 Family 6 Model 42 Stepping 7]
Number of processors 4
Coprocessors AMD ATI Radeon HD 6970M (1024MB) OpenCL: 1.2
Operating System Darwin 16.6.0
BOINC version 7.8.2
Memory 16384 MB

If this task will not complete for several *months* past the deadline, should I preemptively cancel the task?
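
For reference, the estimate quoted in this post is a straight linear extrapolation from the progress percentage. A minimal Python sketch of that arithmetic, assuming the task started at midnight on 24 October 2017 and that progress grows linearly with time (the function name `naive_eta` is made up for illustration and is not part of BOINC):

```python
from datetime import datetime, timedelta


def naive_eta(start, elapsed_hours, fraction_done):
    """Linearly extrapolate total runtime from progress so far.

    Returns (total_hours, estimated_finish). Assumes progress is
    linear in time, which is only an approximation.
    """
    if fraction_done <= 0:
        raise ValueError("fraction_done must be positive")
    total_hours = elapsed_hours / fraction_done
    return total_hours, start + timedelta(hours=total_hours)


# Figures from the post: 0.919% done after roughly 25 hours.
# The exact start time is an assumption.
start = datetime(2017, 10, 24)
total_hours, eta = naive_eta(start, 25.0, 0.00919)
print(f"estimated total: {total_hours / 24:.1f} days, finishing around {eta.date()}")
```

The result is roughly 113 days, which lands in mid-February 2018. The estimate is only as good as the assumption that the reported progress tracks the real amount of work done.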

Re: Long running work unit

Posted: 25.10.2017 18:40
by Jacob Klein
robertmiles wrote:
Jacob Klein wrote:
robertmiles wrote: This task completed in 2012, apparently successfully, but is still waiting for validation:
http://www.rnaworld.de/rnaworld/result. ... d=14832293
I suspect that it reached a limit of 50 failures for wingmates.
Could you check if repeating it under VM would give a useful output to compare it to?
Michael may be able to confirm, but ... it's likely that that work unit was issued using the "cmsearch XXL (long)" app (before VM apps existed), and in order to get a VM wingman for it, they set initial replication to 0 to not send any more under that app, and your wingman has a different work unit with initial replication of 1 using the "cmsearch VM (VirtualBox) 1.0.2" app.
Is there any way to identify WHAT VM workunit in such a case? Or if not directly, a way to get a list of all VM workunits still in progress?
Robert:

Christian just went through the list of "paused" work units that were basically "completed but requiring manual validation, thus not validated and not reissued to wingmen". He asked me privately about my tasks, and I confirmed that my "paused" tasks were now validated or issued to wingmen.

I also asked him about your task, and he told me who your "VM wingman" is! See below.

Your completed non-VM work unit:
http://www.rnaworld.de/rnaworld/workuni ... id=5986207
Name: cms_GA-p[e5-20MB_Lin64f]_1_Gallus-gallus-(chicken)_CM000111.lin.EMBL_RF00028_Intron_gpI_1328586124_42560
Status: Completed, waiting for validation. Completed 21 Nov 2012, 3:28:28 UTC. Initial replication currently set to 0.

Your VM wingman work unit:
http://www.rnaworld.de/rnaworld/workuni ... id=6341776
Name: cmsvm_GA-p[e5-20MB_Lin64f]_1_Gallus-gallus-(chicken)_CM000111.lin.EMBL_RF00028_Intron_gpI_1328586124_42560
Status: In Progress
minimum quorum: 2
initial replication: 1
max # of error/total/success tasks: 30, 60, 5

So, once someone completes that VM task, then it'll have to be manually verified by Christian.
Hope this helps you!

Thanks,
Jacob Klein

Re: Long running work unit

Posted: 25.10.2017 20:23
by Jacob Klein
Darkness Productions wrote: I, too, have a long running WU, http://www.rnaworld.de/rnaworld/workuni ... id=6330860
<snip>
It is a monster.
Percent complete is largely irrelevant for monsters. Eventually it will get to 98.765%, and stay there for a long time, until it completes successfully.
Estimated remaining time is also irrelevant for monsters, as the estimate given by the app is usually wayyyyyy off.
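
To put a number on how far off the percentage can be, here is a small Python sketch using the figures from the slow-laptop task mentioned earlier in the thread (98.765% reached at 196.5 days, still running at 540 days). The helper name is invented for illustration; the only assumption is that a naive estimate treats progress as linear in time.

```python
STALL_FRACTION = 0.98765  # progress value where monsters sit for a long time


def linear_projection_days(elapsed_days, fraction_done):
    """Total runtime a linear extrapolation would predict."""
    return elapsed_days / fraction_done


# Figures quoted in the thread for one monster task.
projected = linear_projection_days(196.5, STALL_FRACTION)
observed_so_far = 540.0  # days of runtime with the task still unfinished

print(f"linear projection: {projected:.1f} days")
print(f"observed so far:   {observed_so_far:.1f} days")
print(f"projection is low by a factor of at least {observed_so_far / projected:.2f}")
```

A linear projection at the 98.765% mark predicts just under 200 days, while the task had already run more than 2.7 times that long.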

It will likely take your PC about 250-600 days of CPU time, in order to complete.
The server will auto-extend the server-side deadlines, if your PC communicates with the server at least once every 2 weeks.

If you're up for the challenge, keep it, monitor it, and be vigilant (asking questions if needed before going near the abort button, and generally avoiding big upgrades/updates).
If you're not up for the challenge, abort it.

Re: Long running work unit

Posted: 26.10.2017 07:23
by ChristianB
I posted the list of the remaining workunits that need the special manual cross validation here: viewtopic.php?f=75&t=16786

Re: Long running work unit

Posted: 26.10.2017 13:32
by Jacob Klein
Thank you Christian!

Re: Long running work unit

Posted: 26.10.2017 13:41
by Darkness Productions
Jacob Klein wrote: <snip>
The server will auto-extend the server-side deadlines, if your PC communicates with the server at least once every 2 weeks.
<snip>
Thank you for your quick response!

So does that mean that I should see (after two weeks of CPU time) the deadline extended within the client automatically?

I think my only real concern with this task is that no one else has completed it, with quite a few ending in "Error while computing" after anywhere from 11 to 128 days of compute time.

Re: Long running work unit

Posted: 26.10.2017 13:58
by Jacob Klein
Darkness Productions wrote:
Jacob Klein wrote: <snip>
The server will auto-extend the server-side deadlines, if your PC communicates with the server at least once every 2 weeks.
<snip>
Thank you for your quick response!
So does that mean that I should see (after two weeks of CPU time) the deadline extended within the client automatically?
I think my only real concern with this task is that no one else has completed it, with quite a few ending in an Error while computing result after anywhere from 11-128 days of compute time.
Nope, the deadline won't extend on the client. But it will get extended on the server and stay green, as long as your client communicates with the project at least once every 2 weeks. You should expect to see the server deadline always set to a value that's 2 weeks or more from the current date.
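
The rule above can be checked against a client's contact history. This is a rough Python sketch of the condition as described in this post only; it is not taken from the server code, the function names are invented, and the contact dates are hypothetical.

```python
from datetime import datetime, timedelta

# Two-week window as stated in the post; not read from any server setting.
CONTACT_WINDOW = timedelta(weeks=2)


def longest_gap(contacts):
    """Longest interval between consecutive client contacts."""
    ordered = sorted(contacts)
    return max((b - a for a, b in zip(ordered, ordered[1:])),
               default=timedelta(0))


def within_window(contacts, window=CONTACT_WINDOW):
    """True if the client never went longer than `window` without contact."""
    return longest_gap(contacts) <= window


# Hypothetical contact dates for illustration.
contacts = [datetime(2017, 10, 24), datetime(2017, 11, 5), datetime(2017, 11, 18)]
print(within_window(contacts))
```

With gaps of 12 and 13 days, this history stays inside the two-week window; a single gap of 15 days would not.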

And I'm serious about expecting it to take 250-600 days :) Probably hopefully more like 150-300, but still, you get the idea.