Re: Long running work unit

Posted: 03.12.2011 12:53
by Ananas
The runtime estimate is very rough and, afaik, the 100% point is set to that estimate.

If the runtime has been underestimated, the task sits at 100% for quite a while (possibly even weeks); if it has been overestimated, the task finishes before 100% is reached.
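
A minimal sketch of that behaviour, assuming the progress is simply elapsed time divided by the estimate and capped at 100% (the function name and the formula are illustrative assumptions, not taken from the application):

```python
def reported_progress(elapsed_s: float, estimated_s: float) -> float:
    """Return the fraction done, assuming progress = elapsed / estimate.

    Hypothetical helper: the real application may compute this differently.
    """
    if estimated_s <= 0:
        raise ValueError("estimated_s must be positive")
    # An underestimated runtime pushes the ratio past 1.0, so the bar
    # stays pinned at 100% until the task actually finishes.
    return min(elapsed_s / estimated_s, 1.0)
```

With an overestimate, the task simply ends while the returned value is still below 1.0.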

Re: Long running work unit

Posted: 06.12.2011 18:57
by Zachariassen
Hello yoyo
Could you please extend the following task?

13612793 - 5732991

Almost 400 h, and the progress has been at 100% for a week or so...
I hope I can deliver this WU in 2011 (well, if I can avoid another power outage... :wink: )

Re: Long running work unit

Posted: 09.12.2011 18:11
by raddoc
This has been at 100% for some time and I have over 166 hours of work. Should I abort?


Name cms_GA[e20-30MB_Lin64f]_1_Oryza-sativa-Japonica-Group_CM000145.lin.EMBL_RF00028_Intron_gpI_1307967123_61768_23
Workunit 5288673
Created 16 Nov 2011 18:10:44 UTC
Sent 17 Nov 2011 0:18:42 UTC
Received ---
Server state Over
Outcome No reply
Client state New
Exit status 0 (0x0)
Computer ID 10516
Report deadline 7 Dec 2011 0:18:42 UTC
Run time 0
CPU time 0
stderr out

Validate state Initial
Claimed credit 0
Granted credit 0
application version cmsearch XXL (large) 1.0.2 v0.31

Re: Long running work unit

Posted: 13.12.2011 08:42
by mxplm
raddoc wrote: This has been at 100% for some time and I have over 166 hours of work. Should I abort?
Someone already finished it, but now the WU status is "too many error results". Perhaps yoyo can rescue this one.

@yoyo: Could you please extend Result 13429171 for me? Either you changed the SSH credentials or I forgot them.

Re: Long running work unit

Posted: 13.12.2011 09:47
by yoyo
raddoc wrote: This has been at 100% for some time and I have over 166 hours of work. Should I abort?
BOINC has already canceled and deleted this workunit because it got too many errors, so you should abort your result as well.
yoyo

Re: Long running work unit

Posted: 13.12.2011 09:48
by yoyo
mxplm wrote: @yoyo: Could you please extend Result 13429171 for me?
I extended it.
yoyo

Re: Long running work unit

Posted: 13.12.2011 16:36
by ftpd
Hi YoYo,

Can you please extend two WUs?
Hostid = 4208

5731106 - 13428348
5714106 - 13394064

Thx,

Ton

Re: Long running work unit

Posted: 13.12.2011 17:33
by yoyo
Extended.
yoyo

Re: Long running work unit

Posted: 14.12.2011 01:38
by robertmiles
Michael H.W. Weber wrote: Indeed, these WUs were not defective but just not complete. :roll: The problem is that the progress bar does not really work well, as described many times before. Again, RNA World has stochastic elements which underlie the calculations, and that prevents an accurate runtime prediction. In most cases, the progress bar is fairly OK, but in some exceptional cases, as in those described above, it fails completely and may indicate runtimes of only 10% of the real runtime. We are of course sad about that because there is no way to solve this problem except for having checkpointing.

Michael.
I've seen a suitable way in a previous application over at Rosetta@Home. As the reported progress approaches 100%, just decrease the size of the steps of reported progress so that the progress keeps increasing, but never reaches 100% until the workunit is actually finished.
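
A rough sketch of that idea, not Rosetta@Home's actual code: below an assumed "knee" (90% here) the progress is the plain elapsed/estimate ratio, and above it the steps shrink hyperbolically so the value keeps rising but stays below 100%. The function name, the knee value and the damping curve are all my own assumptions.

```python
def damped_progress(elapsed_s: float, estimated_s: float, knee: float = 0.9) -> float:
    """Progress fraction that keeps increasing but stays below 1.0.

    Hypothetical helper: linear up to `knee`, then a hyperbolic tail
    that approaches 1.0 without reaching it.
    """
    if estimated_s <= 0 or not 0.0 < knee < 1.0:
        raise ValueError("need estimated_s > 0 and 0 < knee < 1")
    raw = elapsed_s / estimated_s
    if raw <= knee:
        return raw
    tail = 1.0 - knee        # room left between the knee and 100%
    excess = raw - knee      # how far the raw ratio has run past the knee
    # The slope is 1 at the knee (no visible jump) and shrinks afterwards.
    return 1.0 - tail * tail / (tail + excess)
```

The application would still report 100% itself once the workunit really completes; this function only covers the time before that.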

Re: Long running work unit

Posted: 14.12.2011 01:40
by Michael H.W. Weber
Too much effort for very little outcome.

Michael.

Re: Long running work unit

Posted: 14.12.2011 03:13
by robertmiles
Michael H.W. Weber wrote:
ConflictingEmotions wrote: Really you need to figure out why these intron WUs are so badly underestimated.
Well, we know that but, unfortunately, it cannot be fixed due to stochastic elements in the code.

Michael.
Do you at least get a chance to modify the estimated runtimes after the initial calculation? For example, multiply it by the typical ratio for which intron runtimes are incorrect?

Or, if it's easier, divide the estimated speed of the CPU by that ratio?
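
To make the two options concrete, here is a small sketch. It assumes the estimated runtime is work divided by CPU speed, so multiplying the estimate by the ratio and dividing the speed by the ratio give the same result. Taking the median of past actual/estimated ratios as the "typical ratio" is my own choice, and all names are hypothetical.

```python
from statistics import median


def typical_ratio(history: list[tuple[float, float]]) -> float:
    """Median of actual/estimated runtime over past (estimated, actual) pairs."""
    return median(actual / estimated for estimated, actual in history)


def corrected_estimate(estimated_s: float, ratio: float) -> float:
    """Option 1: multiply the initial runtime estimate by the typical ratio."""
    return estimated_s * ratio


def corrected_speed(cpu_speed: float, ratio: float) -> float:
    """Option 2: divide the assumed CPU speed by the same ratio."""
    return cpu_speed / ratio
```

Since the runtimes vary stochastically, either correction only shifts the average; individual workunits can still be far off.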

Re: Long running work unit

Posted: 14.12.2011 06:44
by Ananas
That's what the first cmsearch call is supposed to do; it is extremely unreliable, though.

Code:

wrapper: running unzip_cpufeat (cmsearch.zip)
wrapper: no checkpoint file found
wrapper: running cmsearch (--forecast 1 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0 RF00894_mir-790.cm Equus-caballus-(horse)_CM000405.lin.EMBL.fasta)
forecast.txt found.
This "--forecast" is a runtime forecast. Check forecast.txt in your slot directories and you will see that it has an estimated runtime in it. The file is human-readable.

P.S.: This applies only to cmsearch. cmcalibrate uses a loop count for its progress; the last loop needs somewhat more time, so it isn't an exact measure, but it is better than nothing.