Pretty long WUs: suggestions and stuff

Everything about the RNA World project
Lem
Mikrocruncher
Posts: 29
Joined: 11.04.2011 20:04

Pretty long WUs: suggestions and stuff

#1 Post by Lem » 01.03.2012 11:07

Hi, guys and dolls!

So I'm stuck with some long-running workunits here, and more of them are on the way. ;)

This one http://www.rnaworld.de/rnaworld/workuni ... id=5698909 has been running for about 240 hours, and it's reported at 70%.

This one http://www.rnaworld.de/rnaworld/workuni ... id=5826285 has been running for 220 hours. It reached 100% about 20 hours ago, but it's still running.

This one http://www.rnaworld.de/rnaworld/workuni ... id=5838193 has been running for 184 hours, and it is reported at 80%.

Three more units in the 200-300 hour range are coming up in a while.
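For a rough idea of what those progress figures imply, here is a minimal sketch that extrapolates total and remaining runtime from the elapsed hours and the reported percentage. It assumes progress grows linearly with time, which may not hold (the unit that has been sitting at 100% for 20 hours suggests the reported fraction is not always reliable). The function name is made up for illustration.

```python
def extrapolate_runtime(elapsed_hours, percent_done):
    """Linearly extrapolate (total, remaining) hours from reported progress.

    Assumes progress grows linearly with time, which the project's
    progress reporting may not actually guarantee.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total = elapsed_hours * 100.0 / percent_done
    return total, total - elapsed_hours


# Figures quoted in the post: (elapsed hours, reported percent)
for elapsed, pct in [(240, 70), (184, 80)]:
    total, remaining = extrapolate_runtime(elapsed, pct)
    print(f"{elapsed} h at {pct}%: ~{total:.0f} h total, ~{remaining:.0f} h to go")
```

Under that assumption the first unit would need about 343 hours in total (about 103 still to go), and the third about 230 hours (46 to go).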

My box, based on an Intel i5 2500K currently overclocked to 4600 MHz, happens to be one of the fastest machines involved in this project, at least on a "per task" basis. I'm pretty sure these units will take much longer to complete on other hosts.

I'm happy with long workunits: I usually neither shut down nor reboot my PC, so the runtime itself doesn't really matter. But I don't have a UPS, which means that any blackout would cost me all the work done so far. And when we are talking about wus that can take months of computation on most PCs, the chance of a blackout really should be taken into account.

I think it would be useful to reserve such long wus (let's say more than 100 hours) for people who state, perhaps in their project preferences, that they have a UPS. Don't you think so? Wouldn't it be possible to implement such a thing?

BTW: I'm running Boinc 6.12.33 (Linux 64bit) here. I've noticed that if Boinc tries to connect when no network is available (when I've shut down the router, for example), workunits begin to crash. I mean: the executables keep running, but Boinc reports that the wus have finished with status 0 but no output file (or something like that) and starts other wus, which soon crash as well. This doesn't happen just with RNA workunits, but with every other project too. It always happens when Boinc tries to connect and there's no network. Sometimes more wus crash, sometimes fewer. My obvious workaround has been to set the Boinc network preferences so that no connection is attempted at night, when the network is usually off. HTH.

Bye.

MReed
Task-Killer
Posts: 726
Joined: 10.02.2010 22:26
Location: Berlin

Re: Pretty long WUs: suggestions and stuff

#2 Post by MReed » 01.03.2012 11:22

Lem wrote: I think it would be useful to reserve such long wus (let's say more than 100 hours) for people who state, perhaps in their project preferences, that they have a UPS. Don't you think so? Wouldn't it be possible to implement such a thing?

It is already possible to deselect those long-running wus: in your account on the project website, go to project preferences -> Run only the selected applications.
Just deselect the XXL wus there and you won't be "bothered" with those long runners again ;)
Best regards
MReed


Lem
Mikrocruncher
Posts: 29
Joined: 11.04.2011 20:04

Re: Pretty long WUs: suggestions and stuff

#3 Post by Lem » 01.03.2012 11:59

MReed wrote: It is already possible to deselect those long-running wus: in your account on the project website, go to project preferences -> Run only the selected applications.
Just deselect the XXL wus there and you won't be "bothered" with those long runners again ;)
:-o
I'm not bothered. I run *only* long wus at the moment.

What I'm saying is that, IMHO, it would be wise to distinguish between long wus and wus that "would run for weeks or even more" (let's say "insanely long wus"), because for the latter it's safer to have a UPS, since there are no checkpoints.

Bye.

Michael H.W. Weber
Association board
Posts: 20653
Joined: 07.01.2002 01:00
Location: Marpurk

Re: Pretty long WUs: suggestions and stuff

#4 Post by Michael H.W. Weber » 01.03.2012 20:47

Since we are planning to discontinue the indeed *insanely long WUs* as soon as the current batches are done, we are not planning to introduce another run time selection mechanism.

Michael.
Support, cooperate and construct instead of demand, compete and consume.

Ananas
WU-Schieber
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: Pretty long WUs: suggestions and stuff

#5 Post by Ananas » 01.03.2012 20:56

As for the UPS: brownouts and blackouts are not that common here. I have lost far more work through that buggy heartbeat stuff in BOINC than through power failures.
vi BOINC/checkin_notes
:1,$s/bug/feature/g
:wq!

Do biologists actually tell each other Klein-RNA jokes?

Michael H.W. Weber
Association board
Posts: 20653
Joined: 07.01.2002 01:00
Location: Marpurk

Re: Pretty long WUs: suggestions and stuff

#6 Post by Michael H.W. Weber » 02.03.2012 14:11

Yes, and the heartbeat bug has been known for many years, but the BOINC developers are either unwilling or unable to fix it. It is a plain shame.

Michael.

skgiven
Mikrocruncher
Posts: 17
Joined: 23.10.2010 08:49

Re: Pretty long WUs: suggestions and stuff

#7 Post by skgiven » 03.03.2012 14:12

If you free up a CPU core/thread, the heartbeat issues often go away. On the surface this means losing a core/thread, but in reality the operating system and user apps tend to use ~10 to 20% of a core/thread anyway. It also speeds up tasks on the other threads/cores and reduces failures, so overall you might actually be gaining. Obviously this is more pertinent to high-end CPUs with many cores/threads, especially the HT CPUs. Writing to disk less often might also help (I tend to use an interval of 600 to 1000 s, especially when using an SSD and crunching for high-I/O projects such as Climate). The 60 s default is ancient (meant for single or dual core CPUs) and not a recommended config for high-end CPUs and multi-GPU systems.
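As a rough illustration of those two settings, here is a minimal sketch that builds the contents of a local preferences override file. It assumes the client reads a `global_prefs_override.xml` file from its data directory and that the tag names `max_ncpus_pct` and `disk_interval` are correct; both are written from memory of the BOINC client documentation and should be checked against your client version. The function name and its parameters are made up for illustration. The same values can also be set in the computing preferences instead.

```python
import xml.etree.ElementTree as ET


def build_override_prefs(total_threads, free_threads=1, disk_interval_s=600):
    """Return override-preferences XML as a string.

    Tag names (max_ncpus_pct, disk_interval) are an assumption taken
    from memory of the BOINC client docs; verify them before use.
    """
    if not 0 <= free_threads < total_threads:
        raise ValueError("free_threads must be in [0, total_threads)")
    root = ET.Element("global_preferences")
    # Share of logical CPUs BOINC may use, e.g. 7 of 8 threads -> 87.5
    pct = 100.0 * (total_threads - free_threads) / total_threads
    ET.SubElement(root, "max_ncpus_pct").text = f"{pct:.1f}"
    # Minimum number of seconds between disk writes
    ET.SubElement(root, "disk_interval").text = str(disk_interval_s)
    return ET.tostring(root, encoding="unicode")


# Example: an 8-thread CPU such as the i7-2600K with one thread left free
print(build_override_prefs(total_threads=8))
```

With one of eight threads left free, the sketch sets the CPU share to 87.5% and the disk interval to 600 seconds.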

Presently running several very long tasks :( - I wanted to reinstall and test; oh well, only 4 days to wait (i7-2600K). Must deselect these.

Might have been better if these were not selected by default.

Just checked and "cmsearch XXL" was not selected.
However, "If no work for selected applications is available, accept work from other applications?" was selected!
So when these long apps were added everyone with such settings would have automatically started running the long apps, even if unselected. I think it's difficult to make exceptions to this server side.

skgiven
Mikrocruncher
Posts: 17
Joined: 23.10.2010 08:49

Re: Pretty long WUs: suggestions and stuff

#8 Post by skgiven » 04.03.2012 22:04

The longer run times are a bit 'keen' given that there are no checkpoints.
Just noticed that one of my tasks still has 101 h to go. Yesterday it was 103 h.
That would be Boinc using the short task times to guesstimate the runtime of the long tasks!
So still 4 days to go, and counting... :roll2:

Ananas
WU-Schieber
Posts: 1184
Joined: 27.04.2008 18:37
Location: Nordlichter Köln

Re: Pretty long WUs: suggestions and stuff

#9 Post by Ananas » 04.03.2012 22:59

skgiven wrote: If you free up a CPU core/thread the Heartbeat issues often go away. ...
I had set my strongest machine to 15 for a while; it got slightly better but it didn't go away. It was especially bad with CPDN workunits *sigh*, and those had been the main reason for building that box :´(
skgiven wrote: Reducing the write to disk times might also help (I tend to use 600 to 1000s ...
Mine is set to 600 seconds for the "big box venue" - looks as if we have come to quite similar conclusions :-)
skgiven wrote: ... especially when using a SSD ....
I have already been thinking about getting one of those. The Samsung 830 seems to be good for XP systems, as XP doesn't optimize access for SSDs yet and Samsung SSDs come with an additional program to compensate for that.
