CMsearch large not saving state on any kind of suspend

Posted: 31.12.2011 20:56
by DavidHoney
I have had to abort all the CMsearch (large) tasks because their state is lost on any kind of suspension, for whatever reason, and they appear to require anywhere from 2 days to more than 5 days of uninterrupted processing on one core of an Intel i7 machine. This is a waste of CPU time. Either the work units need to be made smaller so that they stand a reasonable chance of completing in under 1 day, or they need to save their state so that they can resume where they left off, instead of discarding possibly days of CPU time after a suspension.

Re: CMsearch large not saving state on any kind of suspend

Posted: 31.12.2011 21:02
by yoyo
Neither is possible. That is why we run them as an XXL app, which you can deselect in your preferences. Anyway, there is a small group of XXL fans.
yoyo

Re: CMsearch large not saving state on any kind of suspend

Posted: 01.01.2012 10:18
by DavidHoney
That's a pity.

Where do I find the preference? I don't see it in either the BOINC Manager or the BAM! web pages. I'd like to disable any XXL WUs as they are unlikely to ever finish and the CPU time would be better spent on other tasks or projects.

Thanks,
David.

Re: CMsearch large not saving state on any kind of suspend

Posted: 01.01.2012 10:31
by Michael H.W. Weber
DavidHoney wrote: ...I'd like to disable any XXL WUs as they are unlikely to ever finish and the CPU time would be better spent on other tasks or projects.
These tasks definitely finish, give exorbitant credits as compensation for the (unavoidably) long runtimes, and are worth processing, as they correspond to interesting but complex RNA families. :D

Michael.

P.S.: And of course the CMSEARCH XXL applications can be deselected in the RNA World project settings under the "Run only the selected applications" option.

Re: CMsearch large not saving state on any kind of suspend

Posted: 01.01.2012 11:41
by DavidHoney
Michael H.W. Weber wrote: These tasks definitely finish, give exorbitant credits as compensation for the (unavoidably) long runtimes, and are worth processing, as they correspond to interesting but complex RNA families
It seems they will only complete if you never need to suspend BOINC or the project for at least some days, or possibly more than a week. The moment you have to suspend the project, suspend use of the processor/GPU for BOINC tasks, or reboot the machine (for example to install a Windows update or to install/uninstall an application), all the work for that RNA task is discarded. So on my machine, where such a suspension is likely to be required at least twice a week, the result is that none of those cmsearch XXL tasks will ever complete. Thanks for the pointer on where I can disable cmsearch XXL WUs - I've now done that.

This is a pity. The applications from other BOINC projects save their state on suspension so that they can resume without discarding all the work done on them to date. This is the most user-friendly design, to which BOINC apps should aspire. After all, it's my machine, and when I need to run my own resource-intensive apps, I should be able to suspend BOINC so that resources are relinquished and I can run those applications.

David.
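
The "twice a week" argument can be put into rough numbers. The following is only a back-of-the-envelope sketch: it assumes suspensions arrive at random (a Poisson process) at an average of two per week, which is a modelling assumption and not something measured in this thread. The function name is made up for illustration.

```python
import math

def chance_of_uninterrupted_run(runtime_days, suspensions_per_week):
    """Probability that no suspension falls inside a run of `runtime_days`.

    Assumption: suspensions occur at random (Poisson process) with the
    given average weekly rate. Real usage patterns will differ.
    """
    rate_per_day = suspensions_per_week / 7.0
    return math.exp(-rate_per_day * runtime_days)

# Runtimes mentioned in the thread: under 1 day (the wish), 2 to 5 days (observed).
for days in (1, 2, 5):
    p = chance_of_uninterrupted_run(days, suspensions_per_week=2)
    print(f"{days} day(s): {p:.0%} chance of finishing without a suspension")
```

Under these assumptions a 1-day task gets through about three times out of four, while a 5-day task gets through only about one time in four, so most of the CPU time spent on the long tasks would be thrown away.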

Re: CMsearch large not saving state on any kind of suspend

Posted: 01.01.2012 12:03
by MReed
DavidHoney wrote: The applications from other BOINC projects save their state on suspension so that they can resume without discarding all the work done on them to date. This is the most user-friendly design, to which BOINC apps should aspire.
Well... it's not as if we (the association Rechenkraft.net) don't try to make this possible, BUT the application wasn't originally developed for BOINC, and its coders didn't think it necessary to implement checkpoints. We have to make do with what we've got, and unless one of us can find a way to make checkpoints possible in our WUs, we're unfortunately stuck with not having them, sorry.

Re: CMsearch large not saving state on any kind of suspend

Posted: 01.01.2012 12:44
by Michael H.W. Weber
DavidHoney wrote: It seems they will only complete if you never need to suspend BOINC or the project for at least some days, or possibly more than a week. The moment you have to suspend the project, suspend use of the processor/GPU for BOINC tasks, or reboot the machine (for example to install a Windows update or to install/uninstall an application), all the work for that RNA task is discarded. So on my machine, where such a suspension is likely to be required at least twice a week, the result is that none of those cmsearch XXL tasks will ever complete. Thanks for the pointer on where I can disable cmsearch XXL WUs - I've now done that.

This is a pity. The applications from other BOINC projects save their state on suspension so that they can resume without discarding all the work done on them to date. This is the most user-friendly design, to which BOINC apps should aspire. After all, it's my machine, and when I need to run my own resource-intensive apps, I should be able to suspend BOINC so that resources are relinquished and I can run those applications.

David.
Yes, it is correct and known (FAQ) that RNA World supports checkpointing only on 32-bit Linux machines that have memory randomization disabled in the kernel. However, you should activate the switch "keep application in memory" in the general BOINC settings - for all projects, not only RNA World - to avoid losing data when a task is paused. Putting Windows into hibernation should then NOT lose the computational results, and should consequently allow you to turn your machine off for a while (in hibernation mode only!).

As described earlier on many occasions in diverse discussion fora, we are not keen on implementing checkpointing at the science application level, for multiple reasons. Among these is the fact that the current application would have to be rewritten, and this application is not developed by our team. The second most important argument is that, unlike other distributed computing projects, our project already consists of multiple applications and will be massively extended in the future. We therefore require a BOINC-integrated, universal checkpointing mechanism, as we cannot rewrite all the applications each time a new version is released. Such a universal approach would also benefit all the other projects and, from my point of view, is therefore of utmost importance. Unfortunately, there seems to be no effort from the BOINC developers to take this request seriously. As a consequence, I cannot exclude that RNA World might one day migrate to a different, more advanced DC infrastructure that satisfies our needs in a more timely fashion.

Michael.
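
For readers unfamiliar with the term, "checkpointing at the science application level" means the program itself periodically writes out how far it has got, and reads that back on restart. The following is a minimal, generic Python sketch of the idea. It is not taken from cmsearch, the RNA World wrapper, or the BOINC API; all names (`run_with_checkpoints`, `state.json`, `max_steps`) are hypothetical, and summing a list stands in for the real computation.

```python
import json
import os
import tempfile

def load_state(state_path):
    # Resume from an earlier run if a checkpoint exists, else start fresh.
    if os.path.exists(state_path):
        with open(state_path) as f:
            return json.load(f)
    return {"next_index": 0, "total": 0}

def save_state(state, state_path):
    # Write to a temporary file and rename it into place, so that a crash
    # during the write never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)

def run_with_checkpoints(items, state_path, checkpoint_every=100, max_steps=None):
    """Sum `items`, checkpointing every `checkpoint_every` steps.

    `max_steps` exists only to simulate a suspend: the function returns
    None without saving, so work since the last checkpoint is lost.
    """
    state = load_state(state_path)
    steps = 0
    for i in range(state["next_index"], len(items)):
        if max_steps is not None and steps >= max_steps:
            return None  # simulated suspend or kill
        state["total"] += items[i]      # stand-in for the real computation
        state["next_index"] = i + 1
        steps += 1
        if state["next_index"] % checkpoint_every == 0:
            save_state(state, state_path)
    save_state(state, state_path)
    return state["total"]
```

The sketch also shows why this is intrusive: the main loop has to know which variables make up its state and has to be able to restart from them, which is exactly the kind of change that would have to be made inside each third-party application.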

Re: CMsearch large not saving state on any kind of suspend

Posted: 08.01.2012 17:47
by Xenu
I could live with the lack of suspend if the run time could be counted on to be close to what was predicted.

I started the job I'm currently running with a run time that was supposed to be around 140 hours. It hit 100% after 333.5 high-priority hours, but has since run another 381 hours at high priority and still hasn't finished. It would have been nice to know what I was getting into.
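
To put the overrun in proportion, here is the arithmetic on the figures quoted above, as a small Python snippet (the variable names are just for illustration):

```python
predicted_hours = 140.0              # forecast runtime quoted in the post
hours_to_reach_100_percent = 333.5   # high-priority hours until the bar hit 100%
hours_since_100_percent = 381.0      # further hours run after that, still unfinished

total_hours = hours_to_reach_100_percent + hours_since_100_percent
overrun_factor = total_hours / predicted_hours

print(f"total so far: {total_hours} h ({total_hours / 24:.1f} days)")
print(f"that is {overrun_factor:.1f}x the forecast, and the task is not done yet")
```

That comes to 714.5 hours, a little over five times the forecast.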

Re: CMsearch large not saving state on any kind of suspend

Posted: 08.01.2012 23:19
by Ananas
@Xenu: if you prefer shorter tasks, disabling XXL work (described above: "Run only the selected applications") should relax the situation a bit.

Edit: The first call to the cmsearch program estimates the runtime to the 100% mark ... and you're right, it sometimes isn't too close to reality:

wrapper: no checkpoint file found
wrapper: running cmsearch (--forecast 1 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0
RF00976_mir-583.cm Ornithorhynchus-anatinus-(platypus)_CM000409.lin.EMBL.fasta)
forecast.txt found.


This might be caused by the CPU-specific optimizations, which probably work better for the forecast than for the real calculation.

Re: CMsearch large not saving state on any kind of suspend

Posted: 10.01.2012 17:34
by Xenu
Thanks, I disabled XXL a little over a month ago.

I don't mind jobs that take a really long time, but by now it's been at 100% for close to 3 weeks. Will it complete today? Next week? The week after? Is it caught in a loop and will never complete? I've really given up guessing, but would hope that the coders could figure out some way of preventing this sort of thing.

Re: CMsearch large not saving state on any kind of suspend

Posted: 10.01.2012 17:41
by yoyo
Which workunit ID are you talking about?
yoyo

Re: CMsearch large not saving state on any kind of suspend

Posted: 10.01.2012 19:18
by Xenu
That would be WU 5759022, now at 764 hours on a 2.6 GHz core.