
Re: Long running work unit

Posted: 22.09.2018 13:02
by gemini8
Congrats again, Mr. Monster-Baby Hunter. ;-)

Twenty-five days is somewhere between PrimeGrid's Genefer 21 and Genefer 22 back when the latter still ran on CPU, one thread only!
Not nice when I wanted to restart my machine and didn't, because it would've been scary to potentially lose all the work that had already been done.
Well, unlike a lot of other people, I saw them through when I got them. Had no working GPU for them back then.
Validation might still take quite a long time on those.
I think people abandoning those tasks after a while, or just not choosing them at all, was reason enough for PrimeGrid to let them run on GPU only.
Still more than four days on my fastest one.

So, thinking about that in relation to RNA World tasks shows even more just how dedicated the RNA World crunchers are. :-)

Re: Long running work unit

Posted: 22.09.2018 13:53
by Jacob Klein
When I first joined RNA World, I didn't understand why these tasks (non-VM at the time) were even being released without checkpointing! But I think I do understand it now, and ... we are extremely lucky to be able to have VirtualBox support in BOINC, to use snapshots as checkpointing. It's still fragile, but it can and does work.

In my opinion, checkpointing should happen every minute, unless it has to write a ton of data. Most of my projects checkpoint every 1-5 minutes. But in the case of VM snapshots, which do write a lot of data, I think they are [correctly] set to checkpoint every 30 minutes.

It works well.
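
The interval trade-off described above can be put into rough numbers. This is only a sketch under stated assumptions: the write cost per checkpoint is a made-up figure, failures are assumed to hit at a uniformly random point within an interval, and the function names are hypothetical, not anything from BOINC or VirtualBox.

```python
def overhead_fraction(interval_s, write_cost_s):
    """Share of wall time spent writing checkpoints.

    Assumes one checkpoint costing write_cost_s seconds is written
    after every interval_s seconds of useful work.
    """
    return write_cost_s / (interval_s + write_cost_s)


def expected_lost_work_s(interval_s):
    """Average work lost per failure, in seconds.

    Assumes the failure strikes at a uniformly random moment
    between two checkpoints, so on average half an interval is lost.
    """
    return interval_s / 2


def compare(intervals_s, write_cost_s):
    """Return (interval, overhead fraction, expected loss) per interval."""
    return [
        (i, overhead_fraction(i, write_cost_s), expected_lost_work_s(i))
        for i in intervals_s
    ]


if __name__ == "__main__":
    # Hypothetical: a VM snapshot that takes 60 s to write,
    # checkpointed every 1 minute versus every 30 minutes.
    for interval, overhead, lost in compare([60, 1800], write_cost_s=60):
        print(f"every {interval:>4} s: overhead {overhead:.1%}, "
              f"avg. loss per failure {lost:.0f} s")
```

With an expensive write, a short interval spends a large share of the time on checkpoints, while a long interval risks more lost work per failure. That is the balance the 30-minute snapshot setting is striking.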

Re: Long running work unit

Posted: 06.10.2018 20:34
by Jacob Klein
HUZZAH!

Speed just completed one of the few remaining ones that I'm manually babysitting!

WU 6330804: http://www.rnaworld.de/rnaworld/workuni ... id=6330804
Resumed Task 14951980: http://www.rnaworld.de/rnaworld/result. ... d=14951980

243.4d CPU Time :smoking:

Only 3 more manual ones left to babysit!

Re: Long running work unit

Posted: 07.10.2018 01:53
by gemini8
Congrats!
You sweep through 'em like they are small fry! ;-)

Re: Long running work unit

Posted: 07.10.2018 10:15
by Michael H.W. Weber
I now have two monsters in progress again... 8)

Michael.

Re: Long running work unit

Posted: 07.10.2018 16:35
by Jacob Klein
Congrats. It'd be great if you learned to do backups and can complete them even if they fail within BOINC.
Also, I wish I had more monsters, honestly. I've got CPU resources available to dedicate to them, so we can get them cleared out quicker.
But I'm at the mercy of the project's decisions on how things are done.

Re: Long running work unit

Posted: 08.10.2018 07:37
by Michael H.W. Weber
Jacob Klein wrote: Congrats. It'd be great if you learned to do backups and can complete them even if they fail within BOINC.
No, this has to work without external action. That's why we use VirtualBox for the long tasks.

Michael.

Re: Long running work unit

Posted: 08.10.2018 07:45
by Jacob Klein
Yeah, well ... Sure. We *try* to get them done using BOINC+VirtualBox, but it's not as robust as we'd like, so things that should take 1-1.5 years take 3-6. A lot of the huge monsters I'm completing were ones that were completed in 2013 or 2014 and needed a strong wingman.

So, I will continue to carefully take weekly backups, and if BOINC+VirtualBox fails, I'll resume from my backups to ensure a completion. Because I don't want my resources wasted, and I don't want to waste the resources of other wingmen who might get failures too.
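
A rough sketch of what such a backup-and-resume routine could look like. Everything here is an assumption for illustration: the thread doesn't say which files get backed up or how, the helper names are hypothetical, and it presumes the task is stopped cleanly before its state directory is copied or restored.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_dir(src, backup_root, now=None):
    """Copy the task state directory into a new timestamped folder.

    src and backup_root are placeholder paths; the caller is assumed
    to have stopped the task so the files are in a consistent state.
    """
    now = now or datetime.now()
    dest = Path(backup_root) / now.strftime("%Y%m%d-%H%M%S")
    shutil.copytree(src, dest)  # fails if dest already exists
    return dest


def latest_backup(backup_root):
    """Return the newest backup folder, or None if there is none.

    Relies on the timestamped folder names sorting chronologically.
    """
    root = Path(backup_root)
    if not root.exists():
        return None
    candidates = sorted(p for p in root.iterdir() if p.is_dir())
    return candidates[-1] if candidates else None


def restore(backup, dest):
    """Replace dest with the contents of the given backup folder."""
    dest = Path(dest)
    if dest.exists():
        shutil.rmtree(dest)  # discard the failed state
    shutil.copytree(backup, dest)
```

Run weekly, `backup_dir` bounds the loss to about a week of crunching; after a failure, `restore(latest_backup(...), ...)` puts the last good state back before the task is started again.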

"This has to work without external action" seems to be a policy restriction only. Tasks can be completed, and verified with wingmen, all without that policy.

Re: Long running work unit

Posted: 08.10.2018 20:01
by Jacob Klein
WOW!

Speed just completed a second "manual monster" within the same week!

WU 6330864: http://www.rnaworld.de/rnaworld/workuni ... id=6330864
Resumed Task 14953730: http://www.rnaworld.de/rnaworld/result. ... d=14953730

182.7d CPU Time :smoking:

Only 2 more manual ones left to babysit, outside of BOINC!

Re: Long running work unit

Posted: 08.10.2018 22:43
by gemini8
Quite a small monster. ;-)

Re: Long running work unit

Posted: 23.10.2018 03:42
by Jacob Klein
HURRAY!

Speed just completed ANOTHER "manual monster" outside of BOINC!

WU 6341797: http://www.rnaworld.de/rnaworld/workuni ... id=6341797
Resumed Task 14953814: http://www.rnaworld.de/rnaworld/result. ... d=14953814

208.7d CPU Time :smoking:

Just 1 more "manual monster" left to babysit :P

Re: Long running work unit

Posted: 23.10.2018 07:08
by gemini8
Congrats.
RNA World should think about creating new monsters for you. ;-)