Long running work unit

Everything about the project RNA World
gemini8
Association board member
Posts: 5898
Registered: 31.05.2011 10:30
Location: Hannover

Re: Long running work unit

#973 Post by gemini8 » 22.09.2018 13:02

Congrats again, Mr. Monster-Baby Hunter. ;-)

Twenty-five days is somewhere between PrimeGrid's Genefer 21 and Genefer 22, back when the latter still ran on CPU, one thread only!
Not nice when I wanted to restart my machine and didn't, because it would've been scary to potentially lose all the work that had already been done.
Well, unlike a lot of other people, I saw them through when I got them. Had no working GPU for them back then.
Validation might still take quite a long time on those.
I think people abandoning those tasks after a while, or just not choosing them at all, was reason enough for PrimeGrid to let them run on GPU only.
Still more than four days on my fastest one.

So, thinking about that in relation to RNA World tasks shows even more just how dedicated the RNA World crunchers are. :-)
Regards, Jens
- - - - - -
Low-end user and part-time cruncher

Jacob Klein
Brain-Bug
Posts: 564
Registered: 26.07.2013 15:41

Re: Long running work unit

#974 Post by Jacob Klein » 22.09.2018 13:53

When I first joined RNA World, I didn't understand why these tasks (non-VM at the time) were even being released without checkpointing! But I think I do understand it now, and ... we are extremely lucky to be able to have VirtualBox support in BOINC, to use snapshots as checkpointing. It's still fragile, but it can and does work.

In my opinion, checkpointing should happen every minute, unless it has to write a ton of data. Most of my projects checkpoint every 1-5 minutes. But in the case of VM snapshots, which do write a lot of data, I think they are [correctly] set to checkpoint every 30 minutes.

It works well.
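
To put rough numbers on that trade-off, here is a minimal Python sketch. It assumes a crash is equally likely at any moment (so the expected lost work is half the checkpoint interval) and that every checkpoint stalls the task for a fixed write time. The function name and the 60-second snapshot write time are made-up illustration values, not measurements from RNA World or VirtualBox.

```python
def checkpoint_cost(interval_min, write_sec, run_hours):
    """Estimate the cost of a checkpoint interval.

    Returns (overhead_hours, expected_loss_min):
    - overhead_hours: total time spent writing checkpoints over the run
    - expected_loss_min: average work lost per crash, assuming the crash
      lands at a uniformly random point inside an interval
    """
    checkpoints = (run_hours * 60) / interval_min
    overhead_hours = checkpoints * write_sec / 3600
    expected_loss_min = interval_min / 2
    return overhead_hours, expected_loss_min


# Hypothetical numbers: a snapshot that takes 60 s to write, over one day of compute.
for interval in (1, 5, 30):
    overhead, loss = checkpoint_cost(interval, write_sec=60, run_hours=24)
    print(f"every {interval:>2} min: {overhead:.1f} h writing, ~{loss:.1f} min lost per crash")
```

With a write that slow, a one-minute interval spends as much time writing as computing, while a 30-minute interval keeps the overhead under an hour per day at the price of about 15 minutes of lost work per crash.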

Jacob Klein
Brain-Bug
Posts: 564
Registered: 26.07.2013 15:41

Re: Long running work unit

#975 Post by Jacob Klein » 06.10.2018 20:34

HUZZAH!

Speed just completed one of the few remaining ones that I'm manually babysitting!

WU 6330804: http://www.rnaworld.de/rnaworld/workuni ... id=6330804
Resumed Task 14951980: http://www.rnaworld.de/rnaworld/result. ... d=14951980

243.4d CPU Time :smoking:

Only 3 more manual ones left to babysit!

gemini8
Association board member
Posts: 5898
Registered: 31.05.2011 10:30
Location: Hannover

Re: Long running work unit

#976 Post by gemini8 » 07.10.2018 01:53

Congrats!
You sweep through 'em like they are small fry! ;-)
Regards, Jens

Michael H.W. Weber
Association board member
Posts: 22418
Registered: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

#977 Post by Michael H.W. Weber » 07.10.2018 10:15

I now have two monsters in progress again... 8)

Michael.
Promote, cooperate, and construct instead of demanding, competing, and consuming.

Jacob Klein
Brain-Bug
Posts: 564
Registered: 26.07.2013 15:41

Re: Long running work unit

#978 Post by Jacob Klein » 07.10.2018 16:35

Congrats. It'd be great if you learned to do backups and can complete them even if they fail within BOINC.
Also, I wish I had more monsters, honestly. I've got CPU resources available to dedicate to them, so we can get them cleared out quicker.
But I'm at the mercy of the project's decisions on how things are done.

Michael H.W. Weber
Association board member
Posts: 22418
Registered: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

#979 Post by Michael H.W. Weber » 08.10.2018 07:37

Jacob Klein wrote: "Congrats. It'd be great if you learned to do backups and can complete them even if they fail within BOINC."

No, this has to work without external action. That's why we use VirtualBox for the long tasks.

Michael.

Jacob Klein
Brain-Bug
Posts: 564
Registered: 26.07.2013 15:41

Re: Long running work unit

#980 Post by Jacob Klein » 08.10.2018 07:45

Yeah, well ... Sure. We *try* to get them done using BOINC+VirtualBox, but it's not as robust as we'd like, so things that should take 1-1.5 years take 3-6. A lot of the huge monsters I'm completing are ones that were completed in 2013 or 2014 and needed a strong wingman.

So, I will continue to carefully take weekly backups, and if BOINC+VirtualBox fails, I'll resume from my backups to ensure a completion. Because I don't want my resources wasted, and I don't want to waste the resources of other wingmen who might get failures too.

"This has to work without external action" seems to be a policy restriction only. Tasks can be completed, and verified with wingmen, all without that policy.
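
For anyone curious what a weekly-backup routine like the one described above could look like, here is a minimal Python sketch. It is only an illustration: the function name, the paths in the usage comment, and the "keep the newest four copies" rule are all hypothetical, and it assumes the task is suspended while the copy runs so the files are not changing underneath it.

```python
import shutil
import time
from pathlib import Path


def backup_dir(source: Path, backup_root: Path, keep: int = 4) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`.

    After copying, only the newest `keep` backups of this source are kept
    (keep is assumed to be at least 1); older ones are deleted.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    shutil.copytree(source, target)

    # Timestamped names sort chronologically, so the oldest come first.
    prefix = source.name + "-"
    backups = sorted(
        p for p in backup_root.iterdir() if p.is_dir() and p.name.startswith(prefix)
    )
    for old in backups[:-keep]:
        shutil.rmtree(old)
    return target


# Hypothetical usage, run once a week with the task suspended:
# backup_dir(Path("/path/to/task/folder"), Path("/path/to/backups"))
```

Restoring is then just a matter of copying the newest folder back before resuming.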

Jacob Klein
Brain-Bug
Posts: 564
Registered: 26.07.2013 15:41

Re: Long running work unit

#981 Post by Jacob Klein » 08.10.2018 20:01

WOW!

Speed just completed a second "manual monster" within the same week!

WU 6330864: http://www.rnaworld.de/rnaworld/workuni ... id=6330864
Resumed Task 14953730: http://www.rnaworld.de/rnaworld/result. ... d=14953730

182.7d CPU Time :smoking:

Only 2 more manual ones left to babysit, outside of BOINC!

gemini8
Association board member
Posts: 5898
Registered: 31.05.2011 10:30
Location: Hannover

Re: Long running work unit

#982 Post by gemini8 » 08.10.2018 22:43

Quite a small monster. ;-)
Regards, Jens

Jacob Klein
Brain-Bug
Posts: 564
Registered: 26.07.2013 15:41

Re: Long running work unit

#983 Post by Jacob Klein » 23.10.2018 03:42

HURRAY!

Speed just completed ANOTHER "manual monster" outside of BOINC!

WU 6341797: http://www.rnaworld.de/rnaworld/workuni ... id=6341797
Resumed Task 14953814: http://www.rnaworld.de/rnaworld/result. ... d=14953814

208.7d CPU Time :smoking:

Just 1 more "manual monster" left to babysit :P

gemini8
Association board member
Posts: 5898
Registered: 31.05.2011 10:30
Location: Hannover

Re: Long running work unit

#984 Post by gemini8 » 23.10.2018 07:08

Congrats.
RNA World should think about creating new monsters for you. ;-)
Regards, Jens
