Long running work unit
Re: Long running work unit
Congrats again, Mr. Monster-Baby Hunter.
Twenty-five days is somewhere between PrimeGrid's Genefer 21 and Genefer 22, back when the latter still ran on CPU, on one thread only!
Not nice when I wanted to restart my machine and didn't, because it would have been scary to potentially lose all the work that had already been done.
Well, unlike a lot of other people, I saw them through when I got them. I had no working GPU for them back then.
Validation might still take quite a long time on those.
I think people abandoning those tasks after a while, or just not choosing them at all, was reason enough for PrimeGrid to decide to let them run on GPU only.
Still more than four days on my fastest one.
So, thinking about that in relation to RNA World tasks shows even more just how dedicated the RNA World crunchers are.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
When I first joined RNA World, I didn't understand why these tasks (non-VM at the time) were even being released without checkpointing! But I think I do understand it now, and ... we are extremely lucky to be able to have VirtualBox support in BOINC, to use snapshots as checkpointing. It's still fragile, but it can and does work.
In my opinion, checkpointing should happen every minute, unless it has to write a ton of data. Most of my projects checkpoint every 1-5 minutes. But in the case of VM snapshots, which do write a lot of data, I think they are [correctly] set to checkpoint every 30 minutes.
It works well.
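To put rough numbers on that trade-off, here is a minimal sketch. The write costs are made-up assumptions for illustration (0.5 s for a small state file, 60 s for a VM snapshot), not measurements from RNA World or BOINC, and `checkpoint_tradeoff` is a hypothetical helper, not part of any BOINC API.

```python
def checkpoint_tradeoff(interval_min: float, write_cost_s: float) -> dict:
    """Estimate checkpoint overhead and average work lost per crash.

    interval_min: minutes of computation between two checkpoints.
    write_cost_s: assumed seconds spent writing one checkpoint.
    """
    interval_s = interval_min * 60.0
    return {
        # Share of wall-clock time spent writing checkpoints instead of computing.
        "overhead_fraction": write_cost_s / (interval_s + write_cost_s),
        # A crash at a random moment loses, on average, half an interval of work.
        "avg_loss_min": interval_min / 2.0,
    }


# Hypothetical write costs: small state file vs. full VM snapshot.
for label, cost_s in (("small state file", 0.5), ("VM snapshot", 60.0)):
    for interval in (1, 5, 30):
        result = checkpoint_tradeoff(interval, cost_s)
        print(
            f"{label:>16} every {interval:>2} min: "
            f"overhead {result['overhead_fraction']:.1%}, "
            f"avg loss {result['avg_loss_min']:.1f} min"
        )
```

Under these assumed costs, a cheap checkpoint every minute costs almost nothing, while a heavy snapshot every minute would eat a large share of the run time. That is the reasoning behind a longer interval for VM snapshots.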
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
HUZZAH!
Speed just completed one of the few remaining ones that I'm manually babysitting!
WU 6330804: http://www.rnaworld.de/rnaworld/workuni ... id=6330804
Resumed Task 14951980: http://www.rnaworld.de/rnaworld/result. ... d=14951980
243.4d CPU Time
Only 3 more manual ones left to babysit!
Re: Long running work unit
Congrats!
You sweep through 'em like they are small fry!
- Michael H.W. Weber
- Association board member
- Posts: 22435
- Registered: 07.01.2002 01:00
- Location: Marpurk
Re: Long running work unit
I now have two monsters in progress again...
Michael.
Support, cooperate and construct instead of demand, compete and consume.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
Congrats. It'd be great if you learned to do backups and can complete them even if they fail within BOINC.
Also, I wish I had more monsters, honestly. I've got CPU resources available to dedicate to them, so we can get them cleared out quicker.
But I'm at the mercy of the project's decisions on how things are done.
- Michael H.W. Weber
- Association board member
- Posts: 22435
- Registered: 07.01.2002 01:00
- Location: Marpurk
Re: Long running work unit
Jacob Klein wrote: Congrats. It'd be great if you learned to do backups and can complete them even if they fail within BOINC.
No, this has to work without external action. That's why we use VirtualBox for the long tasks.
Michael.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
Yeah, well ... Sure. We *try* to get them done using BOINC+VirtualBox, but it's not as robust as we'd like, so things that should take 1-1.5 years take 3-6. A lot of the huge monsters I'm completing were ones that were completed in 2013 or 2014 and needed a strong wingman.
So, I will continue to carefully take weekly backups, and if BOINC+VirtualBox fails, I'll resume from my backups to ensure a completion. Because I don't want my resources wasted, and I don't want to waste the resources of other wingmen who might get failures too.
"This has to work without external action" seems to be a policy restriction only. Tasks can be completed, and verified with wingmen, all without that policy.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
WOW!
Speed just completed a second "manual monster" within the same week!
WU 6330864: http://www.rnaworld.de/rnaworld/workuni ... id=6330864
Resumed Task 14953730: http://www.rnaworld.de/rnaworld/result. ... d=14953730
182.7d CPU Time
Only 2 more manual ones left to babysit, outside of BOINC!
Re: Long running work unit
Quite a small monster.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
HURRAY!
Speed just completed ANOTHER "manual monster" outside of BOINC!
WU 6341797: http://www.rnaworld.de/rnaworld/workuni ... id=6341797
Resumed Task 14953814: http://www.rnaworld.de/rnaworld/result. ... d=14953814
208.7d CPU Time
Just 1 more "manual monster" left to babysit
Re: Long running work unit
Congrats.
RNA World should think about creating new monsters for you.