Problem with xxl wu

Everything about the project RNA World

#1 Post by adrianxw » 14.06.2012 08:13

This morning, I looked at this machine and found my XXL wu was back to 5 hours elapsed. The trace shows...

<Many repeats>
wrapper: windows. no checkpoint image
01:38:04 (2044): No heartbeat from core client for 30 sec - exiting
01:38:05 (2044): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
wrapper: running cmsearch (-o cms_GA-p[e5-20MB_Lin64f]_Caenorhabditis-elegans_BX284603.lin.EMBL_RF00028_Intron_gpI_1328586124_13747_0 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0
RF00028_Intron_gpI.cm Caenorhabditis-elegans_BX284603.lin.EMBL.fasta)
forecast.txt found.
wrapper: windows. no checkpoint image
wrapper: windows. no checkpoint image
wrapper: windows. no checkpoint image
wrapper: windows. no checkpoint image
02:30:16 (8028): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
wrapper: running cmsearch (-o cms_GA-p[e5-20MB_Lin64f]_Caenorhabditis-elegans_BX284603.lin.EMBL_RF00028_Intron_gpI_1328586124_13747_0 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0
RF00028_Intron_gpI.cm Caenorhabditis-elegans_BX284603.lin.EMBL.fasta)
forecast.txt found.
02:30:49 (3244): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
wrapper: running cmsearch (-o cms_GA-p[e5-20MB_Lin64f]_Caenorhabditis-elegans_BX284603.lin.EMBL_RF00028_Intron_gpI_1328586124_13747_0 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0
RF00028_Intron_gpI.cm Caenorhabditis-elegans_BX284603.lin.EMBL.fasta)
forecast.txt found.
02:36:59 (5280): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
wrapper: running cmsearch (-o cms_GA-p[e5-20MB_Lin64f]_Caenorhabditis-elegans_BX284603.lin.EMBL_RF00028_Intron_gpI_1328586124_13747_0 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0
RF00028_Intron_gpI.cm Caenorhabditis-elegans_BX284603.lin.EMBL.fasta)
forecast.txt found.
02:39:07 (8016): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
wrapper: running cmsearch (-o cms_GA-p[e5-20MB_Lin64f]_Caenorhabditis-elegans_BX284603.lin.EMBL_RF00028_Intron_gpI_1328586124_13747_0 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0
RF00028_Intron_gpI.cm Caenorhabditis-elegans_BX284603.lin.EMBL.fasta)
forecast.txt found.
wrapper: windows. no checkpoint image
<Repeats>

The first "No heartbeat..." seems to be the start of an hour-long event. Obviously the machine was running, or it could not have been trying to restart; other indications confirm that there was no power issue (the electric clock is still correct).

My Climate Prediction WU here, which had been running for a couple of weeks, also crashed out during the night. "Coincidence"? The WU log file shows nothing.

I also have BOINC running on my wife's laptop. This morning, there were 3 projects with completed tasks pending upload (Docking, Malaria and WCG). Another "coincidence". Her machine is connected by a wireless link. When manually prodded, the files uploaded without incident.

I am suspicious, and would really welcome any comments on the likely cause of these "coincidences".
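To put numbers on a restart loop like the one in the trace above, a short script can count the heartbeat exits and the wrapper restarts. This is only a sketch: the sample lines are copied from the trace (the long cmsearch command lines are left out), and the function name `summarize` and its result keys are made up for illustration.

```python
import re

# A few lines copied from the trace above; the cmsearch command lines are omitted.
TRACE = """\
wrapper: windows. no checkpoint image
01:38:04 (2044): No heartbeat from core client for 30 sec - exiting
01:38:05 (2044): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
forecast.txt found.
02:30:16 (8028): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
02:30:49 (3244): No heartbeat from core client for 30 sec - exiting
RNA World wrapper v0.04
wrapper: no checkpoint file found
"""

# Matches lines such as "01:38:04 (2044): No heartbeat from core client ..."
HEARTBEAT = re.compile(r"^(\d{2}:\d{2}:\d{2}) \((\d+)\): No heartbeat from core client")


def summarize(trace):
    """Count heartbeat exits, wrapper starts and starts without a checkpoint."""
    result = {
        "heartbeat_exits": 0,
        "wrapper_starts": 0,
        "starts_without_checkpoint": 0,
    }
    pids = set()
    times = []
    for line in trace.splitlines():
        match = HEARTBEAT.match(line)
        if match:
            result["heartbeat_exits"] += 1
            times.append(match.group(1))
            pids.add(match.group(2))
        elif line.startswith("RNA World wrapper"):
            result["wrapper_starts"] += 1
        elif line == "wrapper: no checkpoint file found":
            result["starts_without_checkpoint"] += 1
    result["distinct_pids"] = len(pids)
    result["first_exit"] = times[0] if times else None
    result["last_exit"] = times[-1] if times else None
    return result


if __name__ == "__main__":
    print(summarize(TRACE))
```

If every wrapper start is followed by "no checkpoint file found", each restart begins the search from scratch, which would explain the elapsed time dropping back.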

MReed
Task-Killer
Posts: 726
Joined: 10.02.2010 22:26
Location: Berlin

Re: Problem with xxl wu

#2 Post by MReed » 14.06.2012 09:55

The heartbeat bug is a known bug (or feature, according to David Anderson) of BOINC. It can be triggered by, among other things, unpacking several large workunits simultaneously.
Regards,
MReed
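
For readers unfamiliar with the mechanism, here is a minimal sketch of a heartbeat timeout. It is not BOINC's actual code; the class `HeartbeatMonitor`, the `FakeClock` helper and the injectable `clock` parameter are assumptions made for illustration. Only the 30-second figure comes from the log message quoted earlier in the thread.

```python
import time


class HeartbeatMonitor:
    """Remembers when the last heartbeat arrived and reports when it is overdue."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the example can run instantly
        self._last = clock()

    def beat(self):
        """Record that a heartbeat was just received."""
        self._last = self._clock()

    def silent_for(self):
        """Seconds since the last heartbeat."""
        return self._clock() - self._last

    def expired(self):
        """True when no heartbeat has arrived within the timeout."""
        return self.silent_for() > self.timeout_s


class FakeClock:
    """Hypothetical stand-in for a real clock, used only for the demo."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    monitor = HeartbeatMonitor(timeout_s=30.0, clock=clock)
    clock.advance(29)
    print(monitor.expired())   # still within the timeout
    clock.advance(16)          # the sender is alive but busy for 45 s in total
    print(monitor.expired())   # the monitor cannot tell "busy" from "dead"
```

The point of the sketch is that a watchdog of this kind only sees silence: a sender that is alive but stalled for longer than the timeout looks the same as one that has crashed.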


Michael H.W. Weber
Vereinsvorstand
Posts: 20576
Joined: 07.01.2002 01:00
Location: Marpurk

Re: Problem with xxl wu

#3 Post by Michael H.W. Weber » 14.06.2012 18:51

@adrianxw: The heartbeat bug has indeed been known for years but is still not fixed by Anderson's team. We have test code available which should solve the issue. Do you have a 64-bit system running? If so, I could email you the code.

Michael.
Support, cooperate and construct instead of demanding, competing and consuming.


adrianxw

Re: Problem with xxl wu

#4 Post by adrianxw » 15.06.2012 10:45

No, 32-bit XP.

Tex1954
Idle-Sammler
Posts: 5
Joined: 23.11.2011 07:53

Re: Problem with xxl wu

#5 Post by Tex1954 » 08.07.2012 03:07

I can sympathize with the many folks who get super long WUs on their systems. Many of us like to run several projects at the same time, and without micro-managing the system, that becomes a problem.

It's easy to get far too many WUs on a system, and they hog all the CPU power... this leads to us aborting tasks.

I think the simple solution is possibly twofold.

1) Do what Neurona@home does and limit the maximum number of tasks to 1 or 2 per machine.

2) Having done #1, you can also add a selection in the Project Preferences to allow a client to select how many are downloaded.

I don't know if the problem is fixed yet, but check-pointing is an absolute MUST HAVE on long project tasks. By long, I mean anything over 6 hours. In my area, we suffer from power outages fairly regularly, and a UPS protects the system but can't power it for more than a few minutes when fully loaded... And then there are the software updates that require rebooting...

:)
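
As a rough illustration of why checkpointing matters for long tasks, here is a minimal sketch of a resumable loop. It is not the RNA World wrapper or cmsearch; the function names, the JSON checkpoint file and the `stop_after` parameter (which simulates a power cut) are all assumptions for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then atomically replace the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace swaps the file in one step, so a crash never leaves a half-written checkpoint
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"next_item": 0, "total": 0}


def run(items, path, checkpoint_every=100, stop_after=None):
    """Sum the items, resuming from the checkpoint if there is one.

    stop_after simulates an interruption after that many items in this run.
    """
    state = load_checkpoint(path)
    processed = 0
    for index in range(state["next_item"], len(items)):
        if stop_after is not None and processed >= stop_after:
            return None  # simulated power loss: work since the last checkpoint is gone
        state["total"] += items[index]
        state["next_item"] = index + 1
        processed += 1
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    work = list(range(1000))
    run(work, checkpoint, stop_after=250)   # interrupted after 250 items
    print(load_checkpoint(checkpoint))      # last saved position, not item 250
    print(run(work, checkpoint))            # resumes and finishes
```

With a checkpoint every 100 items, an interruption costs at most 100 items of work; without one, the whole run starts over, which is what the "no checkpoint file found" lines in the first post suggest.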

yoyo
Vereinsvorstand
Posts: 7833
Joined: 17.12.2002 14:09
Location: Berlin

Re: Problem with xxl wu

#6 Post by yoyo » 08.07.2012 07:01

We have implemented a different system. You can deselect the cmsearch XXL workunits in your settings. Then you get only short cmsearch S workunits.
yoyo

Tex1954
Idle-Sammler
Posts: 5
Joined: 23.11.2011 07:53

Re: Problem with xxl wu

#7 Post by Tex1954 » 08.07.2012 17:03

yoyo wrote: We have implemented a different system. You can deselect the cmsearch XXL workunits in your settings. Then you get only short cmsearch S workunits.
yoyo
I understand... however, I was under the impression that if nobody ever did the XXL tasks then they would never get done, and I wished to offer a solution to "that" problem.

There are situations where I would like to run one LONG task and let other projects run as well. But when one tries to arrange that manually, one invariably gets sent too many tasks, and they run at high priority, which kills other projects' tasks and takes over the system.

:)

robertmiles
XBOX360-Installer
Posts: 75
Joined: 23.02.2010 18:43
Location: northern Alabama, US

Re: Problem with xxl wu

#8 Post by robertmiles » 09.07.2012 13:51

Tex1954 wrote: I can sympathize with the many folks who get super long WUs on their systems. Many of us like to run several projects at the same time, and without micro-managing the system, that becomes a problem.

It's easy to get far too many WUs on a system, and they hog all the CPU power... this leads to us aborting tasks.

I think the simple solution is possibly twofold.

1) Do what Neurona@home does and limit the maximum number of tasks to 1 or 2 per machine.

2) Having done #1, you can also add a selection in the Project Preferences to allow a client to select how many are downloaded.

I don't know if the problem is fixed yet, but check-pointing is an absolute MUST HAVE on long project tasks. By long, I mean anything over 6 hours. In my area, we suffer from power outages fairly regularly, and a UPS protects the system but can't power it for more than a few minutes when fully loaded... And then there are the software updates that require rebooting...

:)
Another idea to consider: Allow users to set a limit on how many hours of work will be downloaded at one time, preferably calculated using the speed of their machine, although using the speed of the reference machine would at least be useful. That way, longer workunits would be downloaded alone, and the computer could then decide whether to ask for any more workunits. Users would also have finer-grained control over the maximum length of workunits to download, not just S or XXL. This would take more effort on the server: it would have to go down the list of workunits it is ready to send until it finds one shorter than that expected length.
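
The selection rule described above could be sketched roughly as follows. This is not how the BOINC scheduler works; the `Workunit` type, the `host_speed` factor (speed relative to the reference machine) and the hour figures in the demo are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    name: str
    ref_hours: float  # estimated runtime on the reference machine (hypothetical field)


def estimated_hours(workunit, host_speed=1.0):
    """Scale the reference estimate; host_speed 2.0 means twice as fast as the reference."""
    return workunit.ref_hours / host_speed


def select_work(queue, max_hours, host_speed=1.0):
    """Go down the queue in order and take each workunit that fits the remaining budget."""
    chosen = []
    remaining = max_hours
    for workunit in queue:
        hours = estimated_hours(workunit, host_speed)
        if hours <= remaining:
            chosen.append(workunit)
            remaining -= hours
    return chosen


if __name__ == "__main__":
    queue = [
        Workunit("xxl-a", 200.0),
        Workunit("s-a", 2.0),
        Workunit("s-b", 3.0),
        Workunit("xxl-b", 150.0),
    ]
    # A small budget skips the long workunits and picks only short ones.
    print([w.name for w in select_work(queue, max_hours=6)])
    # A large budget is used up by one long workunit, which is then sent alone.
    print([w.name for w in select_work(queue, max_hours=200)])
```

A single long workunit consumes the whole budget, so it arrives alone, which is the behaviour asked for; the cost is that the server has to scan past workunits that do not fit.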

On the UPS power limits: A UPS I'm using has the ability to put the computer into sleep mode when it runs low on power, if used with a computer that supports sleep mode. Restarting from sleep mode preserves the CPU memory contents from before, although probably not the GPU memory contents. In other words, CPU workunits can resume from the point of interruption without a normal checkpoint, although GPU workunits probably cannot. This will not work for restarting after most software updates, though. Note that restarting from sleep mode is manual, not automatic.

On normal checkpoints: Perhaps the project can offer some information to help users decide whether they know the right kinds of computer programming to have a try at adding checkpoints. For example:

1. Whether there are any software license restrictions that would prevent allowing a user to download the source code.

2. What computer language(s) the application is written in.

3. What human language(s) the comments are written in.

4. What operating systems the source code is set up to compile under.

Michael H.W. Weber
Vereinsvorstand
Posts: 20576
Joined: 07.01.2002 01:00
Location: Marpurk

Re: Problem with xxl wu

#9 Post by Michael H.W. Weber » 09.07.2012 22:39

robertmiles wrote: On normal checkpoints: Perhaps the project can offer some information to help users decide whether they know the right kinds of computer programming to have a try at adding checkpoints. For example:

1. Whether there are any software license restrictions that would prevent allowing a user to download the source code.
No restrictions.
robertmiles wrote: 2. What computer language(s) the application is written in.
C.
robertmiles wrote: 3. What human language(s) the comments are written in.
English.
robertmiles wrote: 4. What operating systems the source code is set up to compile under.
We successfully compiled the version currently used by RNA World for Linux, Windows, MacOS and ARM, although it is officially intended only for Linux/Unix/MacOS flavors.
A brand-new version has appeared which, according to our first tests, no longer compiles for ARM, which would be bad.

Michael.

robertmiles
XBOX360-Installer
Posts: 75
Joined: 23.02.2010 18:43
Location: northern Alabama, US

Re: Problem with xxl wu

#10 Post by robertmiles » 10.07.2012 02:36

Is it set up to compile under Windows, not just run there after it is compiled? If so, I'd like to look at the source code to estimate whether I am ready to try adding checkpoints, initially for the Windows version only.

Michael H.W. Weber
Vereinsvorstand
Posts: 20576
Joined: 07.01.2002 01:00
Location: Marpurk

Re: Problem with xxl wu

#11 Post by Michael H.W. Weber » 13.07.2012 07:51

robertmiles wrote: Is it set up to compile under Windows, not just run there after it is compiled? If so, I'd like to look at the source code to estimate whether I am ready to try adding checkpoints, initially for the Windows version only.
No, it is set up to compile under Linux, not Windows.

Michael.

robertmiles
XBOX360-Installer
Posts: 75
Joined: 23.02.2010 18:43
Location: northern Alabama, US

Re: Problem with xxl wu

#12 Post by robertmiles » 15.07.2012 03:48

I've never used Linux, so I doubt that I'm ready.
