Checkpointing or job restart issue.

Everything about the RNA World project
adrianxw

#1 Post by adrianxw » 25.11.2011 18:30

My machines normally run 24/7, so I don't know whether this is a new problem. I noticed it today because the jobs I'm currently getting run for a long time.

Two or three times today I have had to reboot this machine. Each time, the cmsearch XXL (large) ... task restarted from 0%.

mxplm
Partikel-Strecker
Posts: 966
Registered: 14.09.2009 13:56
Location: Bielefeld

#2 Post by mxplm » 25.11.2011 18:56

The infeRNAl software suite we use for our calculations does not support checkpointing. We attempted to add this feature but have not succeeded (yet). That is why your tasks restart from the beginning after a reboot.

We know that this is a problem, especially for those long-running cmsearch XXL tasks. The missing checkpoints are the reason we split the cmsearch app into S and XXL in the first place: this way, users are able to deselect the big work units. However, we want to reward people who finish monster tasks, so above a certain threshold there is a credit bonus ;)
:Wiki-Benutzerseite: (About me)
:fold.it: (Helping by gaming)

adrianxw

#3 Post by adrianxw » 26.11.2011 13:49

The machines here normally run 24/7 (they run web servers), so the run time and the lack of checkpointing are not normally an issue. I only noticed the problem because I needed to do some work on this machine. I just looked at my settings but could not see an option to preferentially take the long jobs. Such an option might be useful to you.

Michael H.W. Weber
Association board
Posts: 22431
Registered: 07.01.2002 01:00
Location: Marpurk

#4 Post by Michael H.W. Weber » 26.11.2011 15:48

adrianxw wrote: I just looked at my settings, but could not see an option to preferentially take the long jobs. Such an option may be of use to you.
You just need to select the XXL WUs.

Michael.
Promote, cooperate and construct instead of demand, compete and consume.

mxplm

#5 Post by mxplm » 27.11.2011 22:54

Michael H.W. Weber wrote: You just need to select the XXL WUs.
The closest you can get to "prefer XXL" is to check only cmsearch XXL and also check "If nothing is available, send me other WUs".

adrianxw

#6 Post by adrianxw » 21.12.2011 17:25

I had to abort this one because it was a real slowcoach, and right now I am working on the hardware, so reboots are necessary two or three times a day. It is a shame, because judging from the results list, another full run would be useful: there are lots of aborts and errors. Sorry.

I've set the project to No New Tasks for now, but I should be back to normal next week, hopefully, or more probably in the New Year.
