31.8 hour WU for 509.72 credits unacceptable

Everything about the project RNA World
Michael H.W. Weber
Association board
Posts: 22435
Joined: 07.01.2002 01:00
Location: Marpurk

Re: 31.8 hour WU for 509.72 credits unacceptable

#13 Post by Michael H.W. Weber » 14.03.2010 10:32

He got exactly the credits for the work that was actually done, which in RNA World is similar to, if not more than, what other projects would have granted. Also, the 32 hrs were not actually spent on computation.

Michael.
Support, cooperate and build instead of demanding, competing and consuming.


rilian
PDA user
Posts: 47
Joined: 08.02.2010 15:38

Re: 31.8 hour WU for 509.72 credits unacceptable

#14 Post by rilian » 15.03.2010 00:05

oh shi

http://www.rnaworld.de/rnaworld/result. ... id=1291163

It seems this task was restarted a million times (and wasted 6 days).

Could you do something about it in the next app version, please?
I crunch for Ukraine

Ananas
WU pusher
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: 31.8 hour WU for 509.72 credits unacceptable

#15 Post by Ananas » 15.03.2010 00:12

rilian wrote:
oh shi

http://www.rnaworld.de/rnaworld/result. ... id=1291163

It seems this task was restarted a million times (and wasted 6 days).

Could you do something about it in the next app version, please?

I'm not 100% sure, but I think you're wrong.

I might be wrong - but this "wrapper: windows. no checkpoint image" is not a restart.

A restart is always a new cmsearch call, like: "wrapper: running cmsearch (-o out cmfile in)"

You have that line only once, so it actually did run to the bitter end where the RAM was exhausted.


P.S.: Compare yours to this one: http://www.rnaworld.de/rnaworld/result. ... id=1307857 which (IMO) has a restart in it.
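
The check described above, counting how often the wrapper announces a cmsearch launch, can be sketched in Python. This is only an illustration: the marker string is the one quoted in this post, while the function names and the sample log text are made up for the example and do not reproduce the real stderr output of either result.

```python
# Marker quoted in the post; each occurrence is taken to mean one cmsearch launch.
LAUNCH_MARKER = "wrapper: running cmsearch"


def count_cmsearch_launches(stderr_text: str) -> int:
    """Count lines in a task's stderr output that announce a cmsearch launch."""
    return sum(1 for line in stderr_text.splitlines() if LAUNCH_MARKER in line)


def was_restarted(stderr_text: str) -> bool:
    """More than one launch line suggests a restart from scratch."""
    return count_cmsearch_launches(stderr_text) > 1


# Hypothetical stderr excerpts, built only from the two lines quoted in the thread.
single_run = (
    "wrapper: running cmsearch (-o out cmfile in)\n"
    "wrapper: windows. no checkpoint image\n"
    "wrapper: windows. no checkpoint image\n"
)
restarted_run = single_run + "wrapper: running cmsearch (-o out cmfile in)\n"

print(was_restarted(single_run))     # False
print(was_restarted(restarted_run))  # True
```

Repeated "no checkpoint image" lines do not count as launches here, which matches the reading given in the post.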
Last edited by Ananas on 15.03.2010 00:20, edited 1 time in total.
vi BOINC/checkin_notes
:1,$s/bug/feature/g
:wq!

Do biologists actually tell each other small-RNA jokes?

rilian
PDA user
Posts: 47
Joined: 08.02.2010 15:38

Re: 31.8 hour WU for 509.72 credits unacceptable

#16 Post by rilian » 15.03.2010 00:19

OK, since that machine is crunching 24/7 (no machine restarts etc.), and CPU time is 4000 sec while run time is 500000 sec, this looked much like the symptoms of the WU in the topic; that's why I wrote "restarted".

I don't know what "wrapper: windows. no checkpoint image" is either :drinking:

Is there anything in the project folder that may help investigate this? The machine is quite remote, but I can try to get logs.
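
To put the gap between CPU time and run time into numbers, here is a small Python calculation using the two rounded figures from this post (4000 sec and 500000 sec). The function name is just an illustrative choice, not something taken from BOINC.

```python
def cpu_efficiency(cpu_seconds: float, run_seconds: float) -> float:
    """Fraction of wall-clock run time during which the task actually used the CPU."""
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    return cpu_seconds / run_seconds


cpu_s = 4000      # approximate CPU time from the post
run_s = 500000    # approximate run time from the post

efficiency = cpu_efficiency(cpu_s, run_s)
idle_days = (run_s - cpu_s) / 86400  # 86400 seconds per day

print(f"CPU efficiency: {efficiency:.1%}")            # CPU efficiency: 0.8%
print(f"Idle wall-clock time: {idle_days:.1f} days")  # Idle wall-clock time: 5.7 days
```

So the task was busy for less than 1% of its run time, which is consistent with the "wasted 6 days" estimate earlier in the thread.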

Ananas
WU pusher
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: 31.8 hour WU for 509.72 credits unacceptable

#17 Post by Ananas » 15.03.2010 00:26

It uploads what it has (at least most of it); some stderr output currently seems to get lost, though.

We have a thread here with a report that cmsearch sometimes sits there doing nothing:

http://www.rechenkraft.net/phpBB/viewto ... 30#p116830

It's in German, but the screenshot with BOINC and the Task Manager should show what it is about. This might explain the big difference between CPU time and wall-clock time. We do not know yet why this happens.

Gentilli

Re: 31.8 hour WU for 509.72 credits unacceptable

#18 Post by Gentilli » 15.03.2010 00:26

Thank you Michael for explaining the situation.
I will be re-attaching to the project shortly.

Rilian, unfortunately that is not my WU. Mine completed successfully and the claimed credit was 458.15.
And my WU was restarted only a few times, as the BOINC Manager controls the time slots.


Thank you.

Ananas
WU pusher
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: 31.8 hour WU for 509.72 credits unacceptable

#19 Post by Ananas » 15.03.2010 00:32

There are two types of restart.

The "soft / non-destructive" restart happens when the WU is just suspended (without being thrown out of memory) and BOINC reactivates it. I guess that's where the wrapper looks for this "checkpoint image" (which can only exist on Linux32). Yoyo knows the wrapper very well; maybe he can confirm this?

The other one sometimes seems to happen after a crash, which means that it was actually a restart from scratch: a new cmsearch task with a fresh start from 0%. Gentilli, that's (most likely) what your WU did. So the first run gets totally lost (including its CPU time) and only the second (or final) run's CPU time is reported to BOINC.
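
The accounting effect of such a restart from scratch can be illustrated with a few lines of Python. This is a toy model of the behaviour as guessed in this post, not BOINC code: the function name is invented and the per-run CPU times are hypothetical numbers, not values from any real result.

```python
def reported_and_lost_cpu(run_cpu_seconds):
    """Return (reported, lost) CPU seconds when only the final run is reported."""
    if not run_cpu_seconds:
        return 0.0, 0.0
    reported = run_cpu_seconds[-1]    # only the last cmsearch run counts
    lost = sum(run_cpu_seconds[:-1])  # CPU time of all earlier, discarded runs
    return reported, lost


# Hypothetical example: a first run that crashed, then a complete second run.
runs = [30000.0, 80000.0]
reported, lost = reported_and_lost_cpu(runs)

print(reported)  # 80000.0
print(lost)      # 30000.0
```

In this model the wall-clock time covers both runs, while the CPU time that credits are based on covers only the last one.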

rilian
PDA user
Posts: 47
Joined: 08.02.2010 15:38

Re: 31.8 hour WU for 509.72 credits unacceptable

#20 Post by rilian » 15.03.2010 00:36

You are not authorised to read this forum. :robot:

Ananas wrote:
There are two types of restart.

The "soft / non-destructive" restart happens when the WU is just suspended (without being thrown out of memory) and BOINC reactivates it. I guess that's where the wrapper looks for this "checkpoint image" (which can only exist on Linux32). Yoyo knows the wrapper very well; maybe he can confirm this?

If the WU is "left in memory while suspended", why should it search for a checkpoint anyway?

Ananas
WU pusher
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: 31.8 hour WU for 509.72 credits unacceptable

#21 Post by Ananas » 15.03.2010 00:40

rilian wrote:
You are not authorised to read this forum. :robot:

Oops, sorry... In short: it says that BOINC shows the WU as "running" but the Windows Task Manager shows that it does not consume CPU time.

rilian
PDA user
Posts: 47
Joined: 08.02.2010 15:38

Re: 31.8 hour WU for 509.72 credits unacceptable

#22 Post by rilian » 15.03.2010 00:43

Ananas wrote:
rilian wrote:
You are not authorised to read this forum. :robot:
Oops, sorry... In short: it says that BOINC shows the WU as "running" but the Windows Task Manager shows that it does not consume CPU time.

The WU is just waiting for Christmas :D

(Dunno if this joke is popular in the English-speaking IT community, but it is quite well known in the Russian one.)

Ananas
WU pusher
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: 31.8 hour WU for 509.72 credits unacceptable

#23 Post by Ananas » 15.03.2010 00:50

rilian wrote: ... if the WU is "left in memory while suspended", why should it search for a checkpoint anyway?

I can only guess: when the application says that it knows about checkpoints (i.e. it has a checkpoint filename in the WU), the wrapper tries to evaluate the timestamp of that checkpoint file in order to see whether the WU has checkpointed or not (information taken from the Berkeley sample code).

The reason why there is a checkpoint filename is that the WU generator creates the same WU XML files independent of the operating system (even though only Linux32 can handle it).
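
The timestamp check guessed at above can be sketched as follows. This is a Python illustration of the general idea only; it is not the BOINC wrapper's actual code, and the function names and the file name `checkpoint.img` are hypothetical.

```python
import os
import tempfile


def checkpoint_mtime(path):
    """Return the file's modification time, or None if no checkpoint file exists."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def has_checkpointed(path, last_seen_mtime):
    """True if the checkpoint file exists and is newer than the last timestamp seen."""
    current = checkpoint_mtime(path)
    if current is None:
        return False  # nothing to look at, e.g. a platform that writes no checkpoint
    return last_seen_mtime is None or current > last_seen_mtime


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.img")  # hypothetical file name

    missing = has_checkpointed(ckpt, None)    # no file yet

    with open(ckpt, "w") as fh:
        fh.write("state")
    first = has_checkpointed(ckpt, None)      # file appeared
    seen = checkpoint_mtime(ckpt)
    unchanged = has_checkpointed(ckpt, seen)  # same timestamp as before

    os.utime(ckpt, (seen + 10, seen + 10))    # simulate a later checkpoint write
    updated = has_checkpointed(ckpt, seen)

print(missing, first, unchanged, updated)  # False True False True
```

The first case corresponds to a checkpoint filename being present in the WU while no such file is ever written, so a check like this would simply keep finding nothing.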
___________

We know this Christmas joke in German too, and if we want to express an even longer delay (actually infinite), we say "It will happen when Christmas and Easter fall on the same date".
