Long running work unit

Everything about the project RNA World
Jacob Klein
Brain-Bug
Posts: 564
Registered: 26.07.2013 15:41

Re: Long running work unit

#889 Post by Jacob Klein » 26.08.2017 14:18

Speed is a rock star - Another task completed!
216.4d :smoking:

http://www.rnaworld.de/rnaworld/result. ... d=14952505

Michael H.W. Weber
Association board member
Posts: 22427
Registered: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

#890 Post by Michael H.W. Weber » 26.08.2017 16:06

:good:

Michael.
Support, cooperate, and construct instead of demanding, competing, and consuming.

robertmiles
XBOX360-Installer
Posts: 86
Registered: 23.02.2010 18:43
Location: northern Alabama, US

Re: Long running work unit

#891 Post by robertmiles » 30.08.2017 20:18

Michael H.W. Weber wrote:
Peter Hucker wrote: Is there no way to make it finish saving the new checkpoint before deleting the previous one?
Well, apparently that is not that trivial...

How much more does it take than just writing checkpoints to two files with alternating names and, if a restore is needed, trying whichever file has the newer timestamp first?
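
A minimal sketch of the alternating-file scheme proposed above, for an application that writes its own checkpoint files. The slot names, the use of `pickle`, and both function names are assumptions made for illustration; nothing here reflects how RNA World or vboxwrapper actually store state.

```python
import os
import pickle

# Hypothetical slot names; a real application would choose its own.
SLOTS = ("checkpoint_a.bin", "checkpoint_b.bin")


def _mtime(path):
    """Modification time of path, or -1.0 if the file does not exist."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return -1.0


def save_checkpoint(state, directory="."):
    """Overwrite the OLDER slot, so the newer checkpoint is never touched."""
    paths = [os.path.join(directory, name) for name in SLOTS]
    target = min(paths, key=_mtime)  # a missing slot counts as oldest
    with open(target, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk
    return target


def load_checkpoint(directory="."):
    """Try the newer slot first and fall back to the older one."""
    paths = [os.path.join(directory, name) for name in SLOTS]
    for path in sorted(paths, key=_mtime, reverse=True):
        try:
            with open(path, "rb") as f:
                return pickle.load(f)
        except (OSError, EOFError, pickle.UnpicklingError):
            continue  # missing or truncated: try the other slot
    return None  # no usable checkpoint
```

If a save is interrupted, only the slot being overwritten is damaged, and `load_checkpoint` falls through to the other one. Timestamps are a weak ordering signal (clock changes, coarse filesystem resolution), so a sequence number stored inside the file would be a sturdier choice.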

Peter Hucker
Mikrocruncher
Posts: 30
Registered: 19.08.2017 13:56

Re: Long running work unit

#892 Post by Peter Hucker » 30.08.2017 20:24

Agreed, sounds like basic safe programming to me. Don't delete the last thing before the new thing is finished.
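
For ordinary files, the rule "don't delete the last thing before the new thing is finished" is commonly implemented by writing to a temporary file and renaming it over the old one. The sketch below assumes a single checkpoint file on a local filesystem; `write_atomically` is a hypothetical helper name, and this is not how VirtualBox snapshots are handled.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to path without leaving a half-written file at path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to disk
        # Only now is the old file replaced by the finished new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Until `os.replace` runs, the previous file is untouched, so an interruption during the write leaves the old checkpoint in place plus, at worst, a stray `.tmp` file.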

Re: Long running work unit

#893 Post by Jacob Klein » 30.08.2017 22:07

They're doing it through snapshots within an Oracle VirtualBox VM. So yes, it MIGHT be possible to make it more robust, but at the same time they're at the mercy of how Oracle manages those snapshots. You might want to take a look at BOINC's vboxwrapper code to see if you can improve on it!

Re: Long running work unit

#894 Post by Michael H.W. Weber » 30.08.2017 22:37

That's exactly the point.

Michael.

Re: Long running work unit

#895 Post by Jacob Klein » 30.08.2017 22:54

The 3 main files you'd want to inspect would be here:

https://github.com/BOINC/boinc/blob/mas ... rapper.cpp
https://github.com/BOINC/boinc/blob/mas ... common.cpp
https://github.com/BOINC/boinc/blob/mas ... m_impl.cpp

Offhand, without inspecting files...
I'm curious what would happen if the system was interrupted, DURING/AFTER the creation of a new snapshot, but BEFORE deletion of the old snapshot.
That is where, it seems to me, a weak spot may exist (and the code looks at "current snapshot" which may be bad).

I don't know. I'm not a vboxwrapper dev. But if any of you guys are, have at it :)

From my experience, I think things can get into such a wonky state that VirtualBox can't launch ANY of the snapshots. Again, you're at the mercy of VirtualBox saving those snapshots correctly, without hosing the entire VM, and of vboxwrapper doing proper snapshot deletes and resumes.

Re: Long running work unit

#896 Post by Peter Hucker » 30.08.2017 23:05

Jacob Klein wrote: Offhand, without inspecting files...
I'm curious what would happen if the system was interrupted, DURING/AFTER the creation of a new snapshot, but BEFORE deletion of the old snapshot.
That is where, it seems to me, a weak spot may exist (and the code looks at "current snapshot" which may be bad).

Surely the system must know if a snapshot has been written completely. A simple "finished" marker on the end of the file. If that ain't there, load the previous one. Somebody (perhaps in Oracle) doesn't know what they're doing.
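
The "finished" marker idea can be sketched for a plain checkpoint file as follows. The marker bytes and all three function names are assumptions for illustration; VirtualBox manages its own snapshot format, so this shows the general technique, not a change that could be dropped into vboxwrapper.

```python
import os

# Hypothetical trailer written as the very last bytes of a checkpoint.
MARKER = b"\nCHECKPOINT-COMPLETE\n"


def write_with_marker(path, payload):
    """Write the payload, then the marker, so the marker implies completion."""
    with open(path, "wb") as f:
        f.write(payload)
        f.write(MARKER)
        f.flush()
        os.fsync(f.fileno())


def read_if_complete(path):
    """Return the payload, or None if the file is missing or unfinished."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return None
    if not data.endswith(MARKER):
        return None  # the write never reached the end
    return data[:-len(MARKER)]


def load_latest_complete(paths):
    """Given paths ordered newest first, return the first complete payload."""
    for path in paths:
        payload = read_if_complete(path)
        if payload is not None:
            return payload
    return None
```

A marker only shows that the writer reached the end of the file. Detecting damaged contents as well would need a checksum stored alongside it.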

Re: Long running work unit

#897 Post by robertmiles » 31.08.2017 07:00

Peter Hucker wrote:
Jacob Klein wrote: Offhand, without inspecting files...
I'm curious what would happen if the system was interrupted, DURING/AFTER the creation of a new snapshot, but BEFORE deletion of the old snapshot.
That is where, it seems to me, a weak spot may exist (and the code looks at "current snapshot" which may be bad).
Surely the system must know if a snapshot has been written completely. A simple "finished" marker on the end of the file. If that ain't there, load the previous one. Somebody (perhaps in Oracle) doesn't know what they're doing.

Note, however, that VM snapshots are quite large. Mine is currently about 772 MB. Any change that keeps the old snapshot until the new one is fully written needs to be user-selectable; otherwise fewer users will have enough disk space to run such tasks.

Oracle could reduce the snapshot size, though, by keeping only one copy of the read-only portion of what is normally in a snapshot.

gemini8
Association board member
Posts: 5921
Registered: 31.05.2011 10:30
Location: Hannover

Re: Long running work unit

#898 Post by gemini8 » 31.08.2017 09:49

I think two tiny little snapshots like that lingering on my hard disk would be no problem if they made the work safer.
Still haven't been able to hunt for RNA World tasks though.
Regards, Jens
- - - - - -
Low-end user and part-time cruncher

Re: Long running work unit

#899 Post by Peter Hucker » 31.08.2017 11:18

772 MB isn't much. Most people have many, many GB free. Even my old computers, cobbled together from junk with 80 GB drives, have 30 GB free. My main computer has 3 TB and has 250 GB free. Using three quarters of a GB for a short time is nothing.

ChristianB
Admin
Posts: 1920
Registered: 23.02.2010 22:12

Re: Long running work unit

#900 Post by ChristianB » 31.08.2017 11:26

The thing is, vboxwrapper is supposed to do just what you all described: keep the snapshot around until the next one has been written successfully. Unfortunately, if the snapshot operation itself is disturbed by something, the result is still recognized as the last snapshot (although it is not usable), and vboxwrapper will not try to revert to the snapshot before that. This is hard to reproduce, which is why nobody has investigated this rare occurrence, given the resource constraints at Rechenkraft and BOINC.
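
The missing behaviour described here, reverting to an earlier snapshot when the newest one turns out to be unusable, can be sketched generically. The `restore` callable and the snapshot names are hypothetical stand-ins supplied by the caller; this is not vboxwrapper's code or API, only an illustration of the fallback loop.

```python
def restore_with_fallback(snapshots, restore):
    """Try snapshots newest first and return the first that restores.

    snapshots: list of snapshot names ordered newest first.
    restore:   caller-supplied callable that raises an exception on failure.
    Returns the name that was restored, or None if every attempt failed.
    """
    for name in snapshots:
        try:
            restore(name)
        except Exception:
            continue  # unusable snapshot: fall back to the one before it
        return name
    return None
```

For example, with `["snap-3", "snap-2", "snap-1"]` and a `restore` that fails only for `"snap-3"`, the function returns `"snap-2"`. The approach only helps if the older snapshot still exists at that point, which is the disk-space trade-off raised earlier in the thread.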
