Long running work unit

Everything about the project RNA World
Michael H.W. Weber
Association board member
Posts: 22418
Joined: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

#913 Post by Michael H.W. Weber » 14.10.2017 07:26

Short add-on:
We are aware of the tasks that have to be re-sent, etc. and appropriate measures will be taken to deal with these.

Michael.
Support, cooperate and construct instead of demanding, competing and consuming.

robertmiles
XBOX360-Installer
Posts: 86
Joined: 23.02.2010 18:43
Location: northern Alabama, US

#914 Post by robertmiles » 14.10.2017 07:40

Jacob Klein wrote:
robertmiles wrote: This task completed in 2012, apparently successfully, but is still waiting for validation:
http://www.rnaworld.de/rnaworld/result. ... d=14832293
I suspect that it reached a limit of 50 failures for wingmates.
Could you check if repeating it under VM would give a useful output to compare it to?

Michael may be able to confirm, but ... it's likely that that work unit was issued using the "cmsearch XXL (long)" app (before VM apps existed), and in order to get a VM wingman for it, they set initial replication to 0 to not send any more under that app, and your wingman has a different work unit with initial replication of 1 using the "cmsearch VM (VirtualBox) 1.0.2" app.

Is there any way to identify WHICH VM workunit it is in such a case? Or if not directly, a way to get a list of all VM workunits still in progress?

Jacob Klein wrote:
robertmiles wrote: This workunit appears to be stuck at 98.765% progress; the progress has not changed in the last 86 hours.
http://www.rnaworld.de/rnaworld/result. ... d=14953356

[snip]
Jacob Klein wrote: I'm working on a HUGE MONSTER right now, on a slow laptop, where .... it reached 98.765% at 196.5d, and is currently at 540d! Talk about patience!

So, as you can see.... you may only be a third of the way complete when you reach that 98.765% point. MONSTERS.
If they're still using CPU, and you feel like continuing the challenge, then DO NOT ABORT. :)

Regards,
Jacob Klein

It's still using CPU.

Could I get a copy of the source code for the VM application? I've had an online class in CUDA, and am thinking of converting some BOINC applications to CUDA. If enough of one can be converted to run in parallel under CUDA, such an application could theoretically run 750 times as fast on my GTX 1080 as on one core of my computers - but most BOINC projects are satisfied with 10 times as fast.

An OpenCL version could run on more graphics cards, but I have not found a suitable online OpenCL class yet.
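
As a rough illustration of why "if enough of one can be converted" matters, Amdahl's law bounds the overall speedup by the share of runtime that can actually run in parallel. The sketch below is a minimal Python example under stated assumptions: the function name and the sample fractions are hypothetical, 750 is simply the theoretical figure quoted above, and nothing here is a measurement of any BOINC application.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0.0:
        raise ValueError("parallel_speedup must be positive")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / parallel_speedup)


# Hypothetical fractions of the runtime that could be moved to the GPU.
for fraction in (0.5, 0.9, 0.99, 1.0):
    overall = amdahl_speedup(fraction, 750.0)
    print(f"{fraction:.0%} parallel -> about {overall:.1f}x overall")
```

Under these assumptions, roughly 90% of the runtime would have to be parallelized to get near the 10x figure, and the full 750x only appears when nothing serial remains.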

Michael H.W. Weber
Association board member
Posts: 22418
Joined: 07.01.2002 01:00
Location: Marpurk

#915 Post by Michael H.W. Weber » 14.10.2017 13:07

Porting Infernal to GPUs was tried years ago and did not scale.

Michael.

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#916 Post by Jacob Klein » 22.10.2017 19:54

Nitro completed his longest running task ever!
548.6d :lol:
http://www.rnaworld.de/rnaworld/workuni ... id=6330820
.. and it validated!

Since this was his last v1.17 task, this frees him up for some serious upgrades:
- New SSD drive (as his promised reward)
- Reinstallation of 4 OSs (Win10 Release, Insider Slow, Insider Fast, Insider Fast Skip Ahead)
- He can now start using the Fall Creators Update, and even the Fast Ring (since he only has v1.18 tasks left)
- VirtualBox v5.1.x (since he only has v1.18 tasks left)

Let's get this party started! :)

gemini8
Association board member
Posts: 5898
Joined: 31.05.2011 10:30
Location: Hannover

#917 Post by gemini8 » 22.10.2017 23:05

Yeah! Great thing!
Have a nice party, and don't forget the Ubuntu install. ;-)
Regards, Jens
- - - - - -
Low-end user and part-time cruncher

Darkness Productions
Idle collector
Posts: 4
Joined: 25.10.2017 13:19

#918 Post by Darkness Productions » 25.10.2017 13:28

I, too, have a long running WU, http://www.rnaworld.de/rnaworld/workuni ... id=6330860

I notice on the page linked above that this task has not once been completed correctly. Maybe it's a bad unit?

I received it yesterday (October 24). It currently has completed 0.919% after ~25 hours (which works out to about 113 days, or a completion time around Feb 15, 2018). The deadline for this task is marked at Nov 13, 2017. I'm running on a machine with the following specs:

CPU type: GenuineIntel Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz [x86 Family 6 Model 42 Stepping 7]
Number of processors: 4
Coprocessors: AMD ATI Radeon HD 6970M (1024MB) OpenCL: 1.2
Operating System: Darwin 16.6.0
BOINC version: 7.8.2
Memory: 16384 MB

If this task will not complete for several *months* past the deadline, should I preemptively cancel the task?
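
For reference, the 113-day figure above is a straight linear extrapolation from the reported progress. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, the start date assumes the task began on October 24, and the whole estimate assumes progress advances at a constant rate, which may not hold for this application.

```python
from datetime import datetime, timedelta


def estimate_total_runtime(elapsed_hours: float, percent_done: float) -> timedelta:
    """Linear extrapolation: total runtime = elapsed time / fraction completed."""
    if not 0.0 < percent_done <= 100.0:
        raise ValueError("percent_done must be in the range (0, 100]")
    return timedelta(hours=elapsed_hours / (percent_done / 100.0))


# Figures from the post: 0.919% complete after about 25 hours.
total = estimate_total_runtime(25.0, 0.919)
total_days = total.total_seconds() / 86400

# Assumed start: midnight on the day the task was received.
start = datetime(2017, 10, 24)
finish = start + total

print(f"Estimated total runtime: {total_days:.1f} days")
print(f"Estimated completion date: {finish.date()}")
```

With these inputs the estimate comes out at a little over 113 days, landing in mid-February 2018.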

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#919 Post by Jacob Klein » 25.10.2017 18:40

robertmiles wrote:
Jacob Klein wrote:
robertmiles wrote: This task completed in 2012, apparently successfully, but is still waiting for validation:
http://www.rnaworld.de/rnaworld/result. ... d=14832293
I suspect that it reached a limit of 50 failures for wingmates.
Could you check if repeating it under VM would give a useful output to compare it to?

Michael may be able to confirm, but ... it's likely that that work unit was issued using the "cmsearch XXL (long)" app (before VM apps existed), and in order to get a VM wingman for it, they set initial replication to 0 to not send any more under that app, and your wingman has a different work unit with initial replication of 1 using the "cmsearch VM (VirtualBox) 1.0.2" app.

Is there any way to identify WHICH VM workunit it is in such a case? Or if not directly, a way to get a list of all VM workunits still in progress?

Robert:

Christian just went through the list of "paused" work units that were basically "completed but requiring manual validation, thus not validated and not reissued to wingmen". He asked me privately about my tasks, and I confirmed that my "paused" tasks were now validated or issued to wingmen.

I also asked him about your task, and he told me who your "VM wingman" is! See below.

Your completed non-VM work unit:
http://www.rnaworld.de/rnaworld/workuni ... id=5986207
Name: cms_GA-p[e5-20MB_Lin64f]_1_Gallus-gallus-(chicken)_CM000111.lin.EMBL_RF00028_Intron_gpI_1328586124_42560
Status: Completed, waiting for validation. Completed 21 Nov 2012, 3:28:28 UTC. Initial replication currently set to 0.

Your VM wingman work unit:
http://www.rnaworld.de/rnaworld/workuni ... id=6341776
Name: cmsvm_GA-p[e5-20MB_Lin64f]_1_Gallus-gallus-(chicken)_CM000111.lin.EMBL_RF00028_Intron_gpI_1328586124_42560
Status: In Progress
minimum quorum: 2
initial replication: 1
max # of error/total/success tasks: 30, 60, 5

So, once someone completes that VM task, then it'll have to be manually verified by Christian.
Hope this helps you!

Thanks,
Jacob Klein

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#920 Post by Jacob Klein » 25.10.2017 20:23

Darkness Productions wrote: I, too, have a long running WU, http://www.rnaworld.de/rnaworld/workuni ... id=6330860
<snip>

It is a monster.
Percent complete is largely irrelevant for monsters. Eventually it will get to 98.765%, and stay there for a long time, until it completes successfully.
Estimated remaining time is also irrelevant for monsters, as the estimate given by the app is usually wayyyyyy off.

It will likely take your PC about 250-600 days of CPU time, in order to complete.
The server will auto-extend the server-side deadlines if your PC communicates with the server at least once every 2 weeks.

If you're up for the challenge, keep it, monitor it, and be vigilant (asking questions if needed before going near the abort button, and generally avoiding big upgrades/updates).
If you're not up for the challenge, abort it.

ChristianB
Admin
Posts: 1920
Joined: 23.02.2010 22:12

#921 Post by ChristianB » 26.10.2017 07:23

I posted the list of the remaining workunits that need the special manual cross validation here: viewtopic.php?f=75&t=16786

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#922 Post by Jacob Klein » 26.10.2017 13:32

Thank you Christian!

Darkness Productions
Idle collector
Posts: 4
Joined: 25.10.2017 13:19

#923 Post by Darkness Productions » 26.10.2017 13:41

Jacob Klein wrote: <snip>
The server will auto-extend the server-side deadlines if your PC communicates with the server at least once every 2 weeks.
<snip>

Thank you for your quick response!

So does that mean that I should see (after two weeks of CPU time) the deadline extended within the client automatically?

I think my only real concern with this task is that no one else has completed it, with quite a few attempts ending in "Error while computing result" after anywhere from 11 to 128 days of compute time.

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#924 Post by Jacob Klein » 26.10.2017 13:58

Darkness Productions wrote:
Jacob Klein wrote: <snip>
The server will auto-extend the server-side deadlines if your PC communicates with the server at least once every 2 weeks.
<snip>
Thank you for your quick response!
So does that mean that I should see (after two weeks of CPU time) the deadline extended within the client automatically?
I think my only real concern with this task is that no one else has completed it, with quite a few ending in an Error while computing result after anywhere from 11-128 days of compute time.

Nope, the deadline won't extend on the client. But it will get extended on the server and stay green if your client communicates with the project at least once every 2 weeks. You should expect to see the server deadline always set to a value that's 2 weeks or more from the current date.

And I'm serious about expecting it to take 250-600 days :) Hopefully more like 150-300, but still, you get the idea.
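
To make the deadline rule concrete, here is a toy Python model of the behavior as described in this post. It is only a sketch under stated assumptions: the function name, the constant, and the exact rule are hypothetical and are not taken from the project's actual server code. The November 13, 2017 deadline is the one reported earlier in the thread; the contact dates are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the "2 weeks" mentioned in the post, used both as the contact
# window and as the minimum distance of the deadline from the current date.
CONTACT_WINDOW = timedelta(weeks=2)


def server_side_deadline(current_deadline: datetime,
                         last_contact: datetime,
                         now: datetime) -> datetime:
    """Hypothetical model: extend the deadline only for clients seen recently."""
    if now - last_contact <= CONTACT_WINDOW:
        # Client is still reporting in: keep the deadline at least 2 weeks out.
        return max(current_deadline, now + CONTACT_WINDOW)
    # Client has been silent for too long: leave the deadline unchanged.
    return current_deadline


deadline = datetime(2017, 11, 13)
now = datetime(2017, 11, 10)

recent = server_side_deadline(deadline, last_contact=datetime(2017, 11, 9), now=now)
silent = server_side_deadline(deadline, last_contact=datetime(2017, 10, 20), now=now)
print("Client seen yesterday:", recent.date())
print("Client silent for 3 weeks:", silent.date())
```

In this model the client-side deadline never changes; only the value stored on the server moves forward, which matches what the post describes.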
