Long running work unit

Everything about the project RNA World

Re: Long running work unit

Post by Ananas » 03.12.2011 12:53

The runtime estimate is very rough and, AFAIK, the 100% point is set to that estimate.

If the runtime has been underestimated, the task hangs at 100% for quite a while (this can even be weeks). If it has been overestimated, the task is done before 100% is reached.
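
A minimal sketch of that behaviour, assuming the displayed progress is simply elapsed time divided by the estimate and capped at 100% (the function name and the numbers are made up for illustration, this is not the actual wrapper code):

```python
def displayed_progress(elapsed_hours: float, estimated_hours: float) -> float:
    """Progress in percent, assuming progress = elapsed / estimate, capped at 100."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    return min(100.0, 100.0 * elapsed_hours / estimated_hours)


# Underestimated: estimate 40 h, but the task is still running after 400 h.
print(displayed_progress(400, 40))   # stuck at the cap
# Overestimated: estimate 100 h, but the task finishes after 60 h.
print(displayed_progress(60, 100))   # finishes well below the cap
```

Under this assumption the bar carries no information once the estimate has been exceeded, which matches what people see with the long runners.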
vi BOINC/checkin_notes
:1,$s/bug/feature/g
:wq!

Do biologists actually tell each other small-RNA jokes?
Ananas
WU-Schieber
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: Long running work unit

Post by Zachariassen » 06.12.2011 18:57

Hello yoyo
Could you please extend the following task?

13612793 - 5732991

Almost 400 h, and the progress has been at 100% for a week or so...
I hope I can deliver this WU in 2011 (well - if I can avoid a power outage - again... :wink: )
Zachariassen
 

Re: Long running work unit

Post by raddoc » 09.12.2011 18:11

This has been at 100% for some time and I have over 166 hours of work on it. Should I abort?


Name: cms_GA[e20-30MB_Lin64f]_1_Oryza-sativa-Japonica-Group_CM000145.lin.EMBL_RF00028_Intron_gpI_1307967123_61768_23
Workunit: 5288673
Created: 16 Nov 2011 18:10:44 UTC
Sent: 17 Nov 2011 0:18:42 UTC
Received: ---
Server state: Over
Outcome: No reply
Client state: New
Exit status: 0 (0x0)
Computer ID: 10516
Report deadline: 7 Dec 2011 0:18:42 UTC
Run time: 0
CPU time: 0
stderr out:
Validate state: Initial
Claimed credit: 0
Granted credit: 0
Application version: cmsearch XXL (large) 1.0.2 v0.31
raddoc
Idle-Sammler
Posts: 4
Joined: 09.12.2011 18:01

Re: Long running work unit

Post by mxplm » 13.12.2011 08:42

raddoc wrote: This has been at 100% for some time and I have over 166 hours of work on it. Should I abort?

Someone already finished it, but now the WU status is "too many error results". Perhaps yoyo can rescue this one.

@yoyo: Could you please extend Result 13429171 for me? Either you changed the SSH credentials or I forgot them.
:Wiki user page: (About me)
:fold.it: (Helping by gaming)
mxplm
Vereinsmitglied
Posts: 966
Joined: 14.09.2009 13:56
Location: Bielefeld

Re: Long running work unit

Post by yoyo » 13.12.2011 09:47

raddoc wrote: This has been at 100% for some time and I have over 166 hours of work on it. Should I abort?

Name: cms_GA[e20-30MB_Lin64f]_1_Oryza-sativa-Japonica-Group_CM000145.lin.EMBL_RF00028_Intron_gpI_1307967123_61768_23
Workunit: 5288673
[...]

BOINC has already canceled and deleted this workunit because it got too many errors, so you should abort your result as well.
yoyo
HELP out in the Rechenkraft-WiKi; here is what needs doing.
Wiki - FAQ - Association - Chat
yoyo
Vereinsvorstand
Posts: 7116
Joined: 17.12.2002 14:09
Location: Berlin

Re: Long running work unit

Post by yoyo » 13.12.2011 09:48

mxplm wrote: @yoyo: Could you please extend Result 13429171 for me?

I extended it.
yoyo
Vereinsvorstand
 
Posts: 7116
Joined: 17.12.2002 14:09
Location: Berlin

Re: Long running work unit

Post by ftpd » 13.12.2011 16:36

Hi YoYo,

Can you please extend two WUs?
Hostid = 4208

5731106 - 13428348
5714106 - 13394064

Thx,

Ton
ftpd
 

Re: Long running work unit

Post by yoyo » 13.12.2011 17:33

Extended.
yoyo
Vereinsvorstand
 
Posts: 7116
Joined: 17.12.2002 14:09
Location: Berlin

Re: Long running work unit

Post by robertmiles » 14.12.2011 01:38

Michael H.W. Weber wrote: Indeed, these WUs were not defective, just not complete. :roll: The problem is that the progress bar does not work well, as described many times before. Again, RNA World has stochastic elements which underlie the calculations, and that prevents an accurate runtime prediction. In most cases the progress bar is fairly OK, but in some exceptional cases, such as those described above, it fails completely and may indicate a runtime of only 10% of the real runtime. We are of course sad about that, because there is no way to solve this problem except for having checkpointing.

Michael.


I've seen a suitable approach in an earlier application over at Rosetta@Home. As the reported progress approaches 100%, just decrease the size of the steps of reported progress so that the progress keeps increasing but never reaches 100% until the workunit is actually finished.
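
Here is a rough sketch of what I mean, not Rosetta@Home's actual code. The function name, the 90% "knee" and the x / (1 + x) damping are my own assumptions, picked only to illustrate the idea:

```python
def damped_progress(elapsed: float, estimate: float,
                    knee: float = 0.9, finished: bool = False) -> float:
    """Return a progress fraction in [0, 1] that only reaches 1.0 when finished."""
    if finished:
        return 1.0
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    raw = max(0.0, elapsed / estimate)
    if raw <= knee:
        return raw  # below the knee: plain linear progress
    # Past the knee, squeeze the unbounded overrun into the remaining
    # (1 - knee) of the bar. x / (1 + x) keeps growing but stays below 1,
    # so each further hour moves the bar a little less.
    x = (raw - knee) / (1.0 - knee)
    return min(knee + (1.0 - knee) * x / (1.0 + x), 0.9999)


for hours in (45, 100, 400, 1000):
    print(hours, round(100 * damped_progress(hours, 100), 2))
```

With these made-up parameters, a task that runs ten times longer than its estimate would show about 99.9% and still be creeping upwards instead of sitting at 100%.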
robertmiles
PDA-Benutzer
Posts: 51
Joined: 23.02.2010 18:43

Re: Long running work unit

Post by Michael H.W. Weber » 14.12.2011 01:40

Too much effort for very little gain.

Michael.
Support, don't demand. Cooperate, don't compete. Construct, don't consume.
Michael H.W. Weber
Vereinsvorstand
Posts: 18155
Joined: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

Post by robertmiles » 14.12.2011 03:13

Michael H.W. Weber wrote:
ConflictingEmotions wrote: Really, you need to figure out why these intron WUs are so badly underestimated.

Well, we know that but, unfortunately, it cannot be fixed due to stochastic elements in the code.

Michael.


Do you at least get a chance to modify the estimated runtimes after the initial calculation? For example, could you multiply the estimate by the typical ratio by which intron runtimes are off?

Or, if it's easier, divide the estimated speed of the CPU by that ratio?
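
To make the two variants concrete, here is a small sketch. It assumes the estimate is just "amount of work divided by CPU speed", and it takes the median of actual/estimated runtimes as the "typical ratio". The history values are invented for illustration and are not real RNA World data:

```python
from statistics import median


def correction_ratio(history):
    """history: (estimated_hours, actual_hours) pairs from finished tasks."""
    return median(actual / estimated for estimated, actual in history)


def corrected_estimate(estimated_hours, ratio):
    """Variant 1: multiply the initial estimate by the typical ratio."""
    return estimated_hours * ratio


def estimate_from_speed(work, speed):
    """Assumed model: runtime estimate = amount of work / CPU speed."""
    return work / speed


# Hypothetical intron tasks: each was estimated at 10 h.
history = [(10, 20), (10, 40), (10, 100)]
ratio = correction_ratio(history)

print(corrected_estimate(10, ratio))          # variant 1: scale the estimate
print(estimate_from_speed(100, 10 / ratio))   # variant 2: scale the CPU speed
```

Under that model both variants give the same corrected estimate, so whichever value is easier to change on the server side would do.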
robertmiles
PDA-Benutzer
Posts: 51
Joined: 23.02.2010 18:43

Re: Long running work unit

Post by Ananas » 14.12.2011 06:44

That's what the first cmsearch call is supposed to do; it is extremely unreliable, though.
Code:
wrapper: running unzip_cpufeat (cmsearch.zip)
wrapper: no checkpoint file found
wrapper: running cmsearch (--forecast 1 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0 RF00894_mir-790.cm Equus-caballus-(horse)_CM000405.lin.EMBL.fasta)
forecast.txt found.

This "--forecast" run is a runtime forecast. Check forecast.txt in your slot directories and you will see that it contains an estimated runtime. The file is human-readable.

P.S.: This applies only to cmsearch. cmcalibrate uses a loop count for its progress; the last loop needs somewhat more time, so it isn't an exact measure, but it is better than nothing.
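
A small sketch of why a loop count is only approximate, assuming progress is reported as completed loops divided by total loops. The loop durations are invented, not measured cmcalibrate timings:

```python
def loop_progress(loops_done: int, total_loops: int) -> float:
    """Progress as a fraction, assuming it is simply loops done / total loops."""
    return loops_done / total_loops


def time_fraction(loop_durations, loops_done: int) -> float:
    """Fraction of the total runtime that has really elapsed after loops_done loops."""
    return sum(loop_durations[:loops_done]) / sum(loop_durations)


# Hypothetical run: four loops, the last one takes twice as long as the others.
durations = [1, 1, 1, 2]
print(loop_progress(3, 4))          # what the progress bar would show
print(time_fraction(durations, 3))  # share of the runtime actually done
```

In this made-up case the bar runs somewhat ahead of the real runtime, but unlike a pure time estimate it cannot hit 100% before the last loop is done.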
vi BOINC/checkin_notes
:1,$s/bug/feature/g
:wq!

Do biologists actually tell each other small-RNA jokes?
Ananas
WU-Schieber
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln
