Long running work unit
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
Speed is a rock star - Another task completed!
216.4d
http://www.rnaworld.de/rnaworld/result. ... d=14952505
- Michael H.W. Weber
- Association board member
- Posts: 22431
- Registered: 07.01.2002 01:00
- Location: Marpurk
Re: Long running work unit
Michael.
Promote, cooperate and construct instead of demanding, competing and consuming.
-
- XBOX360-Installer
- Posts: 86
- Registered: 23.02.2010 18:43
- Location: northern Alabama, US
Re: Long running work unit
Peter Hucker wrote: "Is there no way to make it finish saving the new checkpoint before deleting the previous one?"
Michael H.W. Weber wrote: "Well, apparently that is not that trivial..."
How much more does it take than just writing checkpoints to files of two alternating names, and trying whichever file has the newer timestamp first if a restore is needed?
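For plain checkpoint files, the alternating-name idea could be sketched roughly as below. This is only an illustration under stated assumptions: the slot names, the JSON payload and both function names are made up for the example, and it says nothing about how VirtualBox snapshots are actually stored.

```python
import json
import os

# Hypothetical slot names; any two fixed names would do.
SLOT_NAMES = ("checkpoint_a.json", "checkpoint_b.json")


def _slots(directory):
    return [os.path.join(directory, name) for name in SLOT_NAMES]


def _mtime_ns(path):
    # A missing slot sorts as oldest, so it is filled before anything is overwritten.
    return os.stat(path).st_mtime_ns if os.path.exists(path) else -1


def save_checkpoint(directory, state):
    """Write state into the older slot; the newer slot is never touched."""
    target = min(_slots(directory), key=_mtime_ns)
    with open(target, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk
    return target


def load_checkpoint(directory):
    """Try the newer slot first and fall back to the older one."""
    existing = [p for p in _slots(directory) if os.path.exists(p)]
    for path in sorted(existing, key=_mtime_ns, reverse=True):
        try:
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f)
        except (ValueError, OSError):
            continue  # truncated or unreadable: try the other slot
    return None
```

If a crash interrupts a write, the half-written slot gets the newer timestamp, fails to parse on restore, and the loader falls back to the untouched slot. One caveat: with coarse file timestamps, two saves in quick succession could tie, so a sequence number stored in the payload may be a safer ordering key than the timestamp.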
-
- Mikrocruncher
- Posts: 30
- Registered: 19.08.2017 13:56
Re: Long running work unit
Agreed, sounds like basic safe programming to me. Don't delete the last thing before the new thing is finished.
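For an ordinary file, "don't delete the old thing before the new thing is finished" is often done by writing to a temporary file and swapping it in only once it is complete. A minimal sketch, assuming a single checkpoint file on a local filesystem (the function name and the temp-file prefix are hypothetical, and this is not what vboxwrapper does):

```python
import os
import tempfile


def replace_checkpoint(path, data: bytes):
    """Write data next to path, then swap it in; the old file survives any failure."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # new content is on disk before the swap
        os.replace(tmp_path, path)  # old checkpoint is replaced only now
    except BaseException:
        # Something went wrong before the swap finished: drop the partial file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

On POSIX systems the rename done by `os.replace` is atomic, so a reader sees either the complete old file or the complete new one, never a mix.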
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
They're doing it through snapshots within an Oracle VirtualBox VM. So yes, it might be possible to make it more robust, but at the same time they're at the mercy of how Oracle manages those snapshots. You might want to take a look at BOINC's vboxwrapper code to see if you can improve upon it!
- Michael H.W. Weber
- Association board member
- Posts: 22431
- Registered: 07.01.2002 01:00
- Location: Marpurk
Re: Long running work unit
That's exactly the point.
Michael.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
The 3 main files you'd want to inspect would be here:
https://github.com/BOINC/boinc/blob/mas ... rapper.cpp
https://github.com/BOINC/boinc/blob/mas ... common.cpp
https://github.com/BOINC/boinc/blob/mas ... m_impl.cpp
Offhand, without inspecting files...
I'm curious what would happen if the system was interrupted, DURING/AFTER the creation of a new snapshot, but BEFORE deletion of the old snapshot.
That is where, it seems to me, a weak spot may exist (and the code looks at "current snapshot" which may be bad).
I don't know. I'm not a vboxwrapper dev. But if any of you guys are, have at it.
From my experience, I think things can get into such a wonky state that VirtualBox can't launch ANY of the snapshots. Again, you're at the mercy of VirtualBox saving those snapshots correctly, without hosing the entire VM, and of vboxwrapper doing proper snapshot deletes/resumes.
-
- Mikrocruncher
- Posts: 30
- Registered: 19.08.2017 13:56
Re: Long running work unit
Jacob Klein wrote: "Offhand, without inspecting files...
I'm curious what would happen if the system was interrupted, DURING/AFTER the creation of a new snapshot, but BEFORE deletion of the old snapshot.
That is where, it seems to me, a weak spot may exist (and the code looks at "current snapshot" which may be bad)."
Surely the system must know if a snapshot has been written completely. A simple "finished" marker on the end of the file. If that ain't there, load the previous one. Somebody (perhaps in Oracle) doesn't know what they're doing.
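The "finished" marker idea could look something like this for a plain file. It is a sketch only: the marker bytes and all three function names are invented for the example, and whether VirtualBox's snapshot format has any comparable end-of-file marker is not something this thread establishes.

```python
import os

# Hypothetical trailer written only after the whole payload is on disk.
FINISHED_MARKER = b"\nSNAPSHOT-COMPLETE\n"


def write_with_marker(path, payload: bytes):
    """Write the payload, then the marker as the very last bytes."""
    with open(path, "wb") as f:
        f.write(payload)
        f.write(FINISHED_MARKER)
        f.flush()
        os.fsync(f.fileno())


def is_complete(path):
    """Check only the tail of the file, so large files are not read in full."""
    try:
        with open(path, "rb") as f:
            f.seek(0, os.SEEK_END)
            size = f.tell()
            if size < len(FINISHED_MARKER):
                return False
            f.seek(size - len(FINISHED_MARKER))
            return f.read() == FINISHED_MARKER
    except OSError:
        return False


def pick_restorable(paths_newest_first):
    """Return the newest file that carries the marker, or None if none does."""
    for path in paths_newest_first:
        if is_complete(path):
            return path
    return None
```

A trailing marker only shows that the writer reached the end of the file. It would not catch damage in the middle of the file; a checksum stored alongside the marker would be needed for that.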
-
- XBOX360-Installer
- Posts: 86
- Registered: 23.02.2010 18:43
- Location: northern Alabama, US
Re: Long running work unit
Jacob Klein wrote: "Offhand, without inspecting files...
I'm curious what would happen if the system was interrupted, DURING/AFTER the creation of a new snapshot, but BEFORE deletion of the old snapshot.
That is where, it seems to me, a weak spot may exist (and the code looks at "current snapshot" which may be bad)."
Peter Hucker wrote: "Surely the system must know if a snapshot has been written completely. A simple "finished" marker on the end of the file. If that ain't there, load the previous one. Somebody (perhaps in Oracle) doesn't know what they're doing."
Note, however, that VM snapshots are quite large. Mine is currently about 772 MB. Any change to keep the old snapshot until the new one is fully written needs to be user-selectable, unless you want to have fewer users with enough disk space to handle such tasks.
Oracle could reduce the snapshot size, though, by keeping only one copy of the read-only portion of what is normally in a snapshot.
Re: Long running work unit
I think two tiny little snapshots like that lingering on my hard disk would be no problem if they made the work safer.
Still haven't been able to hunt for RNA World tasks though.
-
- Mikrocruncher
- Posts: 30
- Registered: 19.08.2017 13:56
Re: Long running work unit
772MB isn't much. Most people have many many GB free. Even my old cobbled together computers from old junk with 80GB drives have 30GB free. My main computer has 3TB and has 250GB free. Using 3/4s of a GB for a short time is nothing.
-
- Admin
- Posts: 1920
- Registered: 23.02.2010 22:12
Re: Long running work unit
The thing is, vboxwrapper is supposed to do just what you all described: keep the snapshot around until the next one has been written successfully. Unfortunately, if the snapshot operation itself is disturbed by something, the damaged snapshot is still recognized as the last snapshot (although it is not usable), and vboxwrapper will not try to revert to the snapshot before that. This is hard to reproduce, which is why nobody has investigated this rare occurrence, given the resource constraints at Rechenkraft and BOINC.
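The missing step described here, reverting to the snapshot before the damaged one, can be sketched in the abstract as a loop over snapshots from newest to oldest. This is not vboxwrapper code: `restore_with_fallback` and the `try_restore` callback are hypothetical stand-ins for whatever actually asks VirtualBox to restore a snapshot, and the sketch assumes the older snapshot is still on disk.

```python
from typing import Callable, Iterable, Optional


def restore_with_fallback(
    snapshots_newest_first: Iterable[str],
    try_restore: Callable[[str], bool],
) -> Optional[str]:
    """Return the name of the first snapshot that restores, or None.

    try_restore is a placeholder supplied by the caller; it should return
    True on success and may raise if the snapshot is unusable.
    """
    for name in snapshots_newest_first:
        try:
            if try_restore(name):
                return name
        except Exception:
            # Treat any failure as "this snapshot is unusable" and move on
            # to the next older one instead of giving up on the task.
            pass
    return None
```

The fallback only helps if the previous snapshot is deleted after the new one has been verified as restorable, not merely after the snapshot command has returned.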