Long running work unit

Everything about the project RNA World
Michael H.W. Weber
Association board member
Posts: 22414
Registered: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

#13 Post by Michael H.W. Weber » 11.03.2010 21:36

Code:

Monodelphis-domestica
...is the largest genome file we have in stock. :evil2:

Michael.
Support, cooperate and construct instead of demand, compete and consume.

http://signature.statseb.fr I: Broken page A
http://signature.statseb.fr II: Broken page B


Al Dente
Finger counter
Posts: 2
Registered: 11.03.2010 19:27

Re: Long running work unit

#14 Post by Al Dente » 12.03.2010 02:30

Task Manager is all over the place, showing between 250 MB and 500 MB.

Conan
XBOX360-Installer
Posts: 80
Registered: 11.02.2010 11:09

Re: Long running work unit

#15 Post by Conan » 16.04.2010 14:21

I have this work unit that, by the original estimate, would run for well over 1,000 hours.

The work unit started and got to 8% after almost 3 hours, with about 950 hours left.
I then had to restart the computer.
After the next checkpoint, the WU reset itself back to zero hours and 0%, with the time left back up to over 1,200 hours.

Still running, it got back up to 22 hours and 17%, which looked like everything was now going fine, with the time left down to 750 hours or so.

I checked half an hour ago, and the WU had gone back to 5% after 24 hours, with the time left back up to 990 hours.

So is this normal?

Will it ever finish?

Should I abort the WU?
I doubt I will get any credit for it, and the way it is going it will probably error out anyway.

Thanks
Conan.

Ananas
WU pusher
Posts: 1184
Registered: 27.04.2008 18:37
Location: Nordlichter Köln

Re: Long running work unit

#16 Post by Ananas » 16.04.2010 14:57

Conan wrote: ... WU has gone back to 5% ...
This can only mean that it has been restarted, i.e. the first run crashed and it started again at 0% at some point.
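Whether a restarted task resumes or starts again at 0% depends on whether the application finds a usable checkpoint. The following Python sketch illustrates the general idea under stated assumptions: the state is a hypothetical JSON file holding a single `position` counter, and `save_checkpoint`, `load_checkpoint` and `run` are illustrative names, not part of the BOINC API or the RNA World application.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temporary file in the same directory, then atomically replace
    # the old checkpoint, so a crash mid-write never leaves a truncated file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    # Without a readable checkpoint the task has no choice but to start at 0%.
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"position": 0}


def run(path: str, total_steps: int, stop_after: Optional[int] = None) -> int:
    """Process steps from the last checkpoint; stop_after simulates a crash."""
    state = load_checkpoint(path)
    for step in range(state["position"], total_steps):
        if stop_after is not None and step == stop_after:
            return step  # simulated crash: progress up to here is on disk
        # ... one unit of real work would happen here ...
        save_checkpoint(path, {"position": step + 1})
    return total_steps


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")
    run(ckpt, 10, stop_after=4)  # first run "crashes" at step 4
    print(load_checkpoint(ckpt))  # {'position': 4}
    print(run(ckpt, 10))  # resumes at step 4 instead of 0, prints 10
```

If the checkpoint file is missing or unreadable, `load_checkpoint` falls back to position 0, which is one way a reset to 0% can come about.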

The BOINC heartbeat bug (for D.A. it's a feature) can cause such things.
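For context: a BOINC science application expects periodic heartbeat messages from the core client and exits when they stop arriving. The log lines Michael quotes later in this thread ("No heartbeat from core client for 30 sec - exiting") are that check firing. A minimal Python sketch of such a watchdog, assuming a 30-second timeout and an injectable clock; `HeartbeatWatchdog` is a hypothetical name, and the real logic lives in BOINC's API library, not here:

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Hypothetical watchdog: exit if the client stays silent past the timeout."""

    def __init__(self, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat = clock()

    def beat(self) -> None:
        # Record the arrival of a heartbeat message from the core client.
        self.last_beat = self.clock()

    def should_exit(self) -> bool:
        # True once no heartbeat has been seen for longer than the timeout.
        return self.clock() - self.last_beat > self.timeout_s


# Simulated clock (in seconds) so the example runs instantly.
now = [0.0]
watchdog = HeartbeatWatchdog(timeout_s=30.0, clock=lambda: now[0])
now[0] = 20.0
watchdog.beat()  # heartbeat received at t = 20 s
now[0] = 45.0
print(watchdog.should_exit())  # False: 25 s of silence
now[0] = 51.0
print(watchdog.should_exit())  # True: 31 s of silence
```

The sketch also shows the weak point: the watchdog cannot tell a dead client from one that is merely too busy to send heartbeats, so on a heavily loaded machine the application may quit even though nothing is wrong, and on restart it can only resume from its last checkpoint.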

The runtime itself seems to be OK (the restart is not, though): one of your wingmen has already returned a result that ran for nearly 2 days.

A remaining runtime of more than 1,000 hours was probably caused by a duration correction factor (DCF) that went way out of bounds as a result of a few ultra-short CMS results that ran for only a few seconds. So don't trust that remaining time. You can check the DCF on your host page.
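To see why an inflated duration correction factor (DCF) alone can produce absurd remaining times, here is a deliberately simplified Python model. It is an assumption, not the BOINC client's exact formula: the full-run estimate is taken to be the base estimate multiplied by the host's DCF, and the remaining part shrinks linearly with the reported fraction done. The 200-hour base estimate and the DCF values are hypothetical inputs.

```python
def static_remaining_hours(base_estimate_h: float, dcf: float,
                           fraction_done: float) -> float:
    """Remaining time from a DCF-scaled static estimate (simplified model)."""
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    # Assumption: full-run estimate = base estimate * DCF, consumed linearly.
    return base_estimate_h * dcf * (1.0 - fraction_done)


# Hypothetical inputs: a task with a 200-hour base estimate at 28% progress.
for dcf in (1.0, 5.0):
    remaining = static_remaining_hours(200.0, dcf, 0.28)
    print(f"DCF {dcf}: about {remaining:.0f} hours left")
```

In this model the DCF multiplies the whole estimate, so a factor that is 5 times too high makes the displayed remaining time 5 times too high as well, no matter how the task is actually progressing.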


Whether to abort it is your decision. The risk is high that it will restart again at some point, but there is a small chance that you can finish it (after all, it isn't such a complex organism). It probably depends a bit on how much RAM is available for this task, i.e. how much of your RAM is eaten by the operating system and the other three tasks on that box.
vi BOINC/checkin_notes
:1,$s/bug/feature/g
:wq!

Do biologists actually tell each other small-RNA jokes?

Michael H.W. Weber
Association board member
Posts: 22414
Registered: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

#17 Post by Michael H.W. Weber » 17.04.2010 08:28

This is a small WU, so any problems you are experiencing with it must somehow be related to your system or your BOINC settings. It won't consume much RAM, and it won't run for anything like 1,000 hours.

Michael.

Conan
XBOX360-Installer
Posts: 80
Registered: 11.02.2010 11:09

Re: Long running work unit

#18 Post by Conan » 17.04.2010 09:45

Thanks for the replies.

I have decided to let it run as it is now behaving like I thought it should.

I don't know about it being a small WU, though.
It is now at 43.38 hours and 28% done, with 600 hours to go (or about 155 hours in actual time) to completion.

And yes, Michael, it does not use a lot of RAM; it just takes a fairly long time to run (around 200 hours).
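Conan's figure of about 155 hours is consistent with a simple linear extrapolation: elapsed time divided by fraction done, i.e. 43.38 h / 0.28 ≈ 155 h for the whole task, which would leave roughly 112 h. A short Python check, assuming progress is roughly linear in time (a restart breaks that assumption):

```python
def extrapolated_total_hours(elapsed_h: float, fraction_done: float) -> float:
    # Assumption: progress grows linearly with time, so total = elapsed / fraction.
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h / fraction_done


elapsed = 43.38
total = extrapolated_total_hours(elapsed, 0.28)
print(f"total ~{total:.0f} h, remaining ~{total - elapsed:.0f} h")
# prints: total ~155 h, remaining ~112 h
```

The large gap between this extrapolation and the displayed 600 hours fits Ananas's advice not to trust the client's remaining-time estimate.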

yoyo
Association board member
Posts: 8043
Registered: 17.12.2002 14:09
Location: Berlin

Re: Long running work unit

#19 Post by yoyo » 17.04.2010 09:48

Do you have CPU throttling switched on, meaning that only a certain percentage of the CPU is used?
yoyo
HELP out in the Rechenkraft wiki; here is what needs doing.
Wiki - FAQ - Association - Chat

Conan
XBOX360-Installer
Posts: 80
Registered: 11.02.2010 11:09

Re: Long running work unit

#20 Post by Conan » 17.04.2010 11:22

Conan wrote: I have this work unit that on original estimate would run for well over 1,000 hours. ...

Well, I finally got sick of this WU constantly going round in circles.

It just reset back from 28% to 4% at 45.12 hours done, and now shows 1,056 hours to go.

This WU was a waste of time for me; it would never have finished. I don't know what is wrong with it: my computer runs all the other projects I do (Docking, FreeHal, QuantumFire, WUProj, Collatz, Ralph and AQUA at the moment; this often changes) with no trouble.

I will see if someone else can finish it.

Conan
XBOX360-Installer
Posts: 80
Registered: 11.02.2010 11:09

Re: Long running work unit

#21 Post by Conan » 17.04.2010 11:25

yoyo wrote: Do you have cpu throttling switched on, means that only some percentage of the CPU should be used?

Thanks yoyo,
no, I don't; everything is running at full throttle.
I have now aborted the WU after the 3rd restart (from 8% done to 0%, then 17% done to 5% and lastly 28% done to 4%).

Michael H.W. Weber
Association board member
Posts: 22414
Registered: 07.01.2002 01:00
Location: Marpurk

Re: Long running work unit

#22 Post by Michael H.W. Weber » 17.04.2010 12:04

Hmmm, what is that (taken from your log)?

Code:

22:39:25 (512): No heartbeat from core client for 30 sec - exiting
22:39:27 (512): No heartbeat from core client for 30 sec - exiting
22:39:28 (512): No heartbeat from core client for 30 sec - exiting
I will look up the original runtime estimate from when this WU was processed on the server (an Intel i7 920 QuadCore).

Michael.

rilian
PDA user
Posts: 47
Registered: 08.02.2010 15:38

Re: Long running work unit

#23 Post by rilian » 17.04.2010 14:32


:good: :3d:
I crunch for Ukraine

Ananas
WU pusher
Posts: 1184
Registered: 27.04.2008 18:37
Location: Nordlichter Köln

Re: Long running work unit

#24 Post by Ananas » 17.04.2010 17:12

We are working on a solution to split cmsearch runs into chunks, and we have actually come quite far :-)

A splitter has already existed for a few days, and I just sent Michael a first attempt at a program that recombines the partial results into a complete one.
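The splitter/recombiner idea can be sketched in a few lines of Python. Everything here is an assumption for illustration, not the project's actual tools: a hit is a hypothetical `(start, end, score)` tuple in 0-based, half-open, chunk-local coordinates (real cmsearch output has its own format), and a hit found in two overlapping chunks is assumed to be reported identically by both. The chunks overlap by `overlap` positions, so any hit no longer than the overlap lies completely inside at least one chunk.

```python
from typing import List, Tuple

Hit = Tuple[int, int, float]  # hypothetical (start, end, score), half-open


def split_with_overlap(length: int, chunk: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) windows covering [0, length), sharing `overlap` positions."""
    if chunk <= overlap:
        raise ValueError("chunk must be larger than overlap")
    windows, start = [], 0
    step = chunk - overlap
    while True:
        end = min(start + chunk, length)
        windows.append((start, end))
        if end == length:
            return windows
        start += step


def recombine(partials: List[Tuple[int, List[Hit]]]) -> List[Hit]:
    """Shift chunk-local hits to global coordinates and drop overlap duplicates."""
    merged = set()
    for offset, hits in partials:
        for s, e, score in hits:
            merged.add((s + offset, e + offset, score))
    return sorted(merged)


print(split_with_overlap(100, chunk=40, overlap=10))  # [(0, 40), (30, 70), (60, 100)]

# Hits as each chunk would report them, keyed by the chunk's start offset.
partials = [
    (0, []),
    (30, [(5, 15, 12.5), (32, 38, 20.0)]),
    (60, [(2, 8, 20.0)]),  # same hit as (32, 38) in the previous chunk
]
print(recombine(partials))  # [(35, 45, 12.5), (62, 68, 20.0)]
```

Exact-match deduplication is the simplest possible choice; a real merger would probably have to reconcile hits whose scores or boundaries differ slightly between neighbouring chunks.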

If this works, we will no longer have results running for several days, and hopefully it will reduce the RAM requirements significantly.

The combined time of all sub-results required for one complete result will increase a bit, as the algorithm needs minor redundancies, but I guess that's a lot better than losing so much time through crashed results.
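How much the "minor redundancies" cost can be estimated with a little arithmetic. Assuming search time grows linearly with the number of positions scanned, n chunks that overlap by `overlap` positions scan `length + (n - 1) * overlap` positions in total, so the extra work relative to one unsplit run is `(n - 1) * overlap / length`. A Python sketch; the sequence, chunk and overlap sizes in the example are hypothetical:

```python
def chunk_count(length: int, chunk: int, overlap: int) -> int:
    # Number of windows of size `chunk`, advancing by `chunk - overlap`,
    # needed to cover `length` positions.
    if chunk <= overlap:
        raise ValueError("chunk must be larger than overlap")
    if length <= chunk:
        return 1
    step = chunk - overlap
    return 1 + (length - chunk + step - 1) // step  # integer ceiling division


def overlap_overhead(length: int, chunk: int, overlap: int) -> float:
    """Extra work of the split run as a fraction of one unsplit run."""
    n = chunk_count(length, chunk, overlap)
    return (n - 1) * overlap / length


# Hypothetical sizes: a 100 Mb sequence, 1 Mb chunks, 10 kb overlap.
n = chunk_count(100_000_000, 1_000_000, 10_000)
print(n, f"{overlap_overhead(100_000_000, 1_000_000, 10_000):.1%}")  # 101 1.0%
```

In this hypothetical setting the redundancy stays around 1%, in line with the expectation above that the combined time increases only a bit.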
