Long running work unit
Re: Long running work unit
The runtime estimate is very rough and, AFAIK, the 100% point is set to that estimate.
If the result has been underestimated, it hangs at 100% for quite a while (this while can even be weeks); if it has been overestimated, it is done before 100% is reached.
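A minimal sketch of the behaviour described above. It assumes that progress is simply elapsed time divided by the runtime estimate, clamped at 100%. That assumption comes from the "AFAIK" above and is not the actual RNA World or BOINC code.

```python
def naive_progress(elapsed_s, estimated_s):
    """Fraction done when 100% is pinned to the runtime estimate (assumed model)."""
    if estimated_s <= 0:
        raise ValueError("estimated_s must be positive")
    # Clamp so the bar never shows more than 100%.
    return min(elapsed_s / estimated_s, 1.0)


HOUR = 3600
# Underestimated: estimate 100 h, real runtime 400 h -> stuck at 100% from hour 100.
for hours in (50, 100, 200, 400):
    print(hours, f"{naive_progress(hours * HOUR, 100 * HOUR):.0%}")
# Overestimated: estimate 100 h, done after 60 h -> last value shown is 60%.
print(f"{naive_progress(60 * HOUR, 100 * HOUR):.0%}")
```

With this model a task whose real runtime is four times the estimate spends three quarters of its life at 100%.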
vi BOINC/checkin_notes
:1,$s/bug/feature/g
:wq!
Do biologists actually tell each other small-RNA jokes?
Re: Long running work unit
Hello yoyo
Could you please extend the following task?
13612793 - 5732991
Almost 400 h, and the progress has been at 100% for a week or so...
I hope I can deliver this WU in 2011 (well, if I can avoid a power outage again...)
Re: Long running work unit
This has been at 100% for some time and I have over 166 hours of work in it. Should I abort?
Name: cms_GA[e20-30MB_Lin64f]_1_Oryza-sativa-Japonica-Group_CM000145.lin.EMBL_RF00028_Intron_gpI_1307967123_61768_23
Workunit: 5288673
Created: 16 Nov 2011 18:10:44 UTC
Sent: 17 Nov 2011 0:18:42 UTC
Received: ---
Server state: Over
Outcome: No reply
Client state: New
Exit status: 0 (0x0)
Computer ID: 10516
Report deadline: 7 Dec 2011 0:18:42 UTC
Run time: 0
CPU time: 0
stderr out:
Validate state: Initial
Claimed credit: 0
Granted credit: 0
Application version: cmsearch XXL (large) 1.0.2 v0.31
Re: Long running work unit
raddoc wrote: This has been at 100% for some time and I have over 166 hours of work in it. Should I abort?
Someone already finished it, but now the WU status is "too many error results". Perhaps yoyo can rescue this one.
@yoyo: Could you please extend Result 13429171 for me? Either you changed the SSH credentials or I forgot them.
Re: Long running work unit
raddoc wrote: This has been at 100% for some time and I have over 166 hours of work in it. Should I abort?
BOINC has already canceled and deleted this workunit because it got too many errors, so you should cancel your result as well.
yoyo
Re: Long running work unit
mxplm wrote: @yoyo: Could you please extend Result 13429171 for me?
I extended it.
yoyo
Re: Long running work unit
Hi YoYo,
Can you please extend two WUs?
Hostid = 4208
5731106 - 13428348
5714106 - 13394064
Thx,
Ton
Re: Long running work unit
Extended.
yoyo
XBOX360-Installer
Re: Long running work unit
Michael H.W. Weber wrote: Indeed, these WUs were not defective, just not complete. The problem is that the progress bar does not work well, as described many times before. Again, RNA World has stochastic elements underlying the calculations, and that prevents an accurate runtime prediction. In most cases the progress bar is fairly OK, but in some exceptional cases, like those described above, it fails completely and may indicate a runtime of only 10% of the real runtime. We are of course sad about that, because there is no way to solve this problem except checkpointing.
Michael.
I've seen a suitable approach in a previous application over at Rosetta@Home: as the reported progress approaches 100%, just decrease the size of the reported progress steps so that the progress keeps increasing but never reaches 100% until the workunit is actually finished.
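One way to sketch the Rosetta@Home-style idea from the post above: report elapsed/estimate linearly up to a knee, then let the remaining gap to 100% shrink in ever smaller steps so the bar keeps moving. The knee (90%), the hyperbolic tail and the 99.99% ceiling are arbitrary choices for illustration, not Rosetta@Home's actual formula.

```python
def asymptotic_progress(elapsed, estimated, knee=0.9, ceiling=0.9999):
    """Progress that keeps rising but stays below 100% until the task ends.

    `elapsed` and `estimated` may be in any consistent unit. `knee` and
    `ceiling` are illustrative values, not taken from any real project.
    """
    if estimated <= 0:
        raise ValueError("estimated must be positive")
    x = elapsed / estimated
    if x <= knee:
        return x  # plain linear progress while we are within the estimate
    tail = 1.0 - knee
    # Remaining gap shrinks like 1/x: same slope as the linear part at the
    # knee, then ever smaller steps, never reaching zero mathematically.
    remaining = tail / (1.0 + (x - knee) / tail)
    # Guard against floating point rounding 1 - remaining up to exactly 1.0;
    # the application itself reports 100% only when the work is really done.
    return min(1.0 - remaining, ceiling)


for hours in (50, 90, 100, 200, 400, 1000):
    print(hours, f"{asymptotic_progress(hours, 100):.2%}")
```

The slow 1/x tail is deliberate: an exponential tail would hit the ceiling after little more than the estimated runtime and hang there again.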
Michael H.W. Weber
Re: Long running work unit
Too much effort for very little outcome.
Michael.
Promote, cooperate and construct instead of demand, compete and consume.
http://signature.statseb.fr I: Broken page A
http://signature.statseb.fr II: Broken page B
XBOX360-Installer
Re: Long running work unit
Michael H.W. Weber wrote:
ConflictingEmotions wrote: Really, you need to figure out why these intron WUs are so badly underestimated.
Well, we know that, but unfortunately it cannot be fixed due to stochastic elements in the code.
Michael.
Do you at least get a chance to modify the estimated runtimes after the initial calculation? For example, multiply them by the typical ratio by which intron runtimes are underestimated?
Or, if it's easier, divide the estimated speed of the CPU by that ratio?
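The two suggestions are arithmetically the same, because the raw estimate is work divided by speed: r × (W / S) = W / (S / r). A minimal sketch follows. The names `fpops_est` and `host_flops` are hypothetical, and taking the median of actual/estimated runtimes over past results is only one assumed way to get the "typical ratio". As Michael notes, stochastic runtimes may make any single ratio unreliable.

```python
import math
import statistics


def typical_ratio(actual_s, estimated_s):
    """Median of actual/estimated runtime over past results (hypothetical helper)."""
    return statistics.median(a / e for a, e in zip(actual_s, estimated_s))


def corrected_estimate_s(fpops_est, host_flops, ratio):
    """Stretch the raw estimate (work / speed) by the empirical ratio."""
    return ratio * (fpops_est / host_flops)


def corrected_via_speed_s(fpops_est, host_flops, ratio):
    """Same number, obtained by pretending the CPU is `ratio` times slower."""
    return fpops_est / (host_flops / ratio)


# Illustrative inputs only, not measured RNA World data.
ratio = typical_ratio([400.0, 120.0, 300.0], [100.0, 100.0, 100.0])
a = corrected_estimate_s(3.6e14, 1e9, ratio)
b = corrected_via_speed_s(3.6e14, 1e9, ratio)
print(ratio, a / 3600, math.isclose(a, b))
```

The median is used rather than the mean so that a single extreme outlier does not dominate the correction.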
Re: Long running work unit
That's what the first cmsearch call is supposed to do; it is extremely unreliable, though.
This "--forecast" call produces a runtime forecast. Check forecast.txt in your slot directories and you will see an estimated runtime in it. The file is human-readable.
Code:
wrapper: running unzip_cpufeat (cmsearch.zip)
wrapper: no checkpoint file found
wrapper: running cmsearch (--forecast 1 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0 RF00894_mir-790.cm Equus-caballus-(horse)_CM000405.lin.EMBL.fasta)
forecast.txt found.
P.S.: This applies only to cmsearch. cmcalibrate uses a loop count for the progress; the last loop needs somewhat more time, so it isn't an exact measure, but it's better than nothing.
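A sketch of loop-count progress as described for cmcalibrate. With equal weights, the bar reaches (n-1)/n and then waits through the longer last loop. The `last_loop_weight` parameter is a hypothetical tweak for illustration, not a cmcalibrate option.

```python
def loop_progress(completed_loops, total_loops, last_loop_weight=1.0):
    """Fraction done from a loop counter.

    With last_loop_weight == 1.0 every loop counts the same, so the bar sits
    at (n-1)/n while the slower final loop runs. A larger weight (a
    hypothetical tuning knob, not a cmcalibrate option) reserves more of the
    bar for that last loop.
    """
    if total_loops < 1 or not 0 <= completed_loops <= total_loops:
        raise ValueError("need total_loops >= 1 and 0 <= completed_loops <= total_loops")
    if last_loop_weight <= 0:
        raise ValueError("last_loop_weight must be positive")
    if completed_loops == total_loops:
        return 1.0
    total_weight = (total_loops - 1) + last_loop_weight
    # Loops finished so far are all "normal" loops of weight 1.
    return completed_loops / total_weight


print(loop_progress(3, 4))       # equal weights
print(loop_progress(3, 4, 3.0))  # last loop counts three times as much
```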