Long running work unit

Everything about the RNA World project
Valter
Taschenrechner
Posts: 10
Joined: 18.02.2010 11:26

#745 Post by Valter » 20.10.2015 13:34

Many thanks, Michael. It's still crunching.
Valter.

Michael H.W. Weber
Vereinsvorstand
Posts: 22431
Joined: 07.01.2002 01:00
Location: Marpurk

#746 Post by Michael H.W. Weber » 21.10.2015 12:01

Valter wrote: It's still crunching.
Valter.

Very good! :good:

Michael.
Support, cooperate, and construct instead of demanding, competing, and consuming.

vkliber
Idle-Sammler
Posts: 3
Joined: 21.09.2015 17:48

#747 Post by vkliber » 25.01.2016 13:45

ChristianB wrote:
vkliber wrote: Now:
Progress: 98.765% ... (still the same value)
Time elapsed: 3623:28:41
Time remaining: 45:18:34

What went wrong? Is this a normal state?
Can I complete the calculation of this unit? It would be a shame to lose almost half a year.

The PC is still running without having been powered off, VirtualBox is still running, and the process occupies a core.

Hi,

yes, this is the normal state when the app has underestimated the runtime. The 98.765% is just a number I used to make this recognizable. The app is still calculating, but we can't say anything about the progress, so it is fixed for this task from now on.

I'm still crunching. Time elapsed: 6125 hours. Is that still okay? I hope I don't die before it finishes. :)

Michael H.W. Weber
Vereinsvorstand
Posts: 22431
Joined: 07.01.2002 01:00
Location: Marpurk

#748 Post by Michael H.W. Weber » 25.01.2016 14:58

Should be fine. I have two of that type running with >3000 and >4000 hrs.

Michael.

vkliber
Idle-Sammler
Posts: 3
Joined: 21.09.2015 17:48

#749 Post by vkliber » 11.04.2016 12:41

vkliber wrote:
ChristianB wrote:
vkliber wrote: Now:
Progress: 98.765% ... (still the same value)
Time elapsed: 3623:28:41
Time remaining: 45:18:34

What went wrong? Is this a normal state?
Can I complete the calculation of this unit? It would be a shame to lose almost half a year.

The PC is still running without having been powered off, VirtualBox is still running, and the process occupies a core.

Hi,

yes, this is the normal state when the app has underestimated the runtime. The 98.765% is just a number I used to make this recognizable. The app is still calculating, but we can't say anything about the progress, so it is fixed for this task from now on.

I'm still crunching. Time elapsed: 6125 hours. Is that still okay? I hope I don't die before it finishes. :)

Success :roll2: . The WU is completed, CPU time 27,928,680.00 seconds (more than 323 days) ... ufff. The PC can now have a rest :D .
Hopefully it will be of benefit to you (and to me).
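
As a rough check on the figure above, the conversion can be sketched in a few lines of Python. This assumes the reported CPU time is in seconds (the way BOINC normally reports it); the function name `cpu_seconds_to_days` is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def cpu_seconds_to_days(cpu_seconds: float) -> float:
    """Convert a CPU time given in seconds to fractional days."""
    return cpu_seconds / SECONDS_PER_DAY


# CPU time quoted in the post above (assumed to be seconds).
days = cpu_seconds_to_days(27_928_680.00)
print(f"{days:.2f} days")  # prints "323.25 days"
```

So "more than 323 days" matches the reported number.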

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#750 Post by Jacob Klein » 11.04.2016 13:37

NICE!!! Congratulations!
That's one of the work units that I'm working on too (I'm your wingman :wave: ). I've got 165 days invested in it, and will have to redo calculations to see how long it'll take me :)

Your PC is ~40% faster than mine, so .... It's going to take me a while!
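
A back-of-the-envelope version of that calculation might look like the sketch below. It rests on assumptions that are not confirmed anywhere in this thread: that "~40% faster" means the other PC does 1.4 times the work per unit of time, that the wingman's roughly 323.25 days of CPU time is a fair reference, and that progress scales linearly. The function name is hypothetical.

```python
def estimate_remaining_days(reference_days: float,
                            speed_ratio: float,
                            invested_days: float) -> float:
    """Estimate the days still needed on the slower machine.

    reference_days: total run time on the faster machine
    speed_ratio:    how many times faster that machine is (1.4 = 40% faster)
    invested_days:  days already spent on the slower machine
    """
    estimated_total = reference_days * speed_ratio
    return estimated_total - invested_days


# Assumed inputs: 323.25 days on the faster PC, 1.4x speed ratio, 165 days invested.
remaining = estimate_remaining_days(323.25, 1.4, 165)
print(f"roughly {remaining:.0f} more days")
```

Under those assumptions the estimate comes out at well over half a year of additional crunching, so treat it as an order-of-magnitude guess only.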

Michael H.W. Weber
Vereinsvorstand
Posts: 22431
Joined: 07.01.2002 01:00
Location: Marpurk

#751 Post by Michael H.W. Weber » 12.04.2016 14:21

vkliber wrote: Success :roll2: . The WU is completed, CPU time 27,928,680.00 seconds (more than 323 days) ... ufff. The PC can now have a rest :D .
Hopefully it will be of benefit to you (and to me).

Fantastic! :D

Michael.

IanEdwardJames
Mikrocruncher
Posts: 23
Joined: 15.03.2011 10:40

#752 Post by IanEdwardJames » 08.09.2016 09:35

Very sad news today: two of my longest-running WUs decided to error out.

Task 14950784
and Task 14949656
One had been going since July 2015 and had gotten to 97.67% and done 7835 hours.
The other was at around 78% with 3861 hours on the clock.
Could someone look at the logs and check that I didn't do anything wrong? I find it a little coincidental that both got 'computation errors' at the same time even though they were at different stages of the WU.

Thank you,

Ian

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#753 Post by Jacob Klein » 08.09.2016 12:21

Ian,

I'm sorry. That's a shame.

I summarized the logs below. I'm not sure what happened to your tasks, but it seems that SOMETHING happened between (2016-08-28 18:45:23) and (2016-08-28 19:42:19) [the time I would have expected a snapshot], local time. I'm not sure whether you had a blue screen, a power outage, an abrupt PC restart, a VirtualBox Service failure, or something else between those times. You might want to look at Event Viewer to see if you get any hints.

Also, I noticed you were using VirtualBox 4.2.16, a very old version that is susceptible to service failures. For older v1.15 tasks, I'm using (and recommend) the latest 4.3.x, VirtualBox 4.3.40. But since you don't have those old tasks anymore, now might be a good time to upgrade to the latest 5.0.x, VirtualBox v5.0.26.

Sorry for your loss,
Jacob

-----------------------------------

http://www.rnaworld.de/rnaworld/result. ... d=14950784
- 2016-08-28 18:45:23: Last successful snapshot taken
- 2016-08-28 19:02:18: VM started, resuming from last checkpoint, but something went wrong during or shortly after, because it never took another snapshot.
- 2016-08-28 22:05:54: Retry, failed.
- 2016-08-29 21:21:46: Retry, failed.
- 2016-08-30 22:35:02: Retry, failed.
- 2016-09-03 10:20:46: Retry, failed.
- 2016-09-03 14:03:36: Retry, failed.
- 2016-09-04 12:27:03: Retry, failed.
- 2016-09-04 18:23:57: Retry, failed.
- 2016-09-05 18:18:48: Retry, determined it wanted to re-register the VM, caught an error on re-registration: Error Description: Trying to open a VM config 'C:\ProgramData\BOINC\slots\8\boinc_dd2b3ffe17b31245\boinc_dd2b3ffe17b31245.vbox' which has the same UUID as an existing virtual machine

http://www.rnaworld.de/rnaworld/result. ... d=14949656
- 2016-08-28 18:43:14: Last successful snapshot taken
- 2016-08-28 19:02:19: VM started, resuming from last checkpoint, but something went wrong during or shortly after, because it never took another snapshot.
- 2016-08-28 22:05:51: Retry, failed.
- 2016-08-29 21:21:42: Retry, failed.
- 2016-08-30 22:34:59: Retry, failed.
- 2016-09-03 10:20:42: Retry, failed.
- 2016-09-03 14:03:33: Retry, failed.
- 2016-09-04 12:27:00: Retry, failed.
- 2016-09-04 18:23:53: Retry, failed.
- 2016-09-05 17:43:30: Retry, determined it wanted to re-register the VM, caught an error on re-registration: Error Description: Trying to open a VM config 'C:\ProgramData\BOINC\slots\9\boinc_4cf75c0427eca566\boinc_4cf75c0427eca566.vbox' which has the same UUID as an existing virtual machine
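
For anyone who wants to line up the timestamps from the two summaries above, here is a minimal Python sketch. The timestamps are copied from the summaries; the `gap` helper and the dictionary layout are just for illustration and are not part of any BOINC tooling.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def gap(start: str, end: str) -> timedelta:
    """Return the time elapsed between two log timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# (last successful snapshot, VM start) per task, taken from the summaries above.
events = {
    "14950784": ("2016-08-28 18:45:23", "2016-08-28 19:02:18"),
    "14949656": ("2016-08-28 18:43:14", "2016-08-28 19:02:19"),
}

for task, (last_snapshot, vm_start) in events.items():
    print(task, gap(last_snapshot, vm_start))
# prints:
# 14950784 0:16:55
# 14949656 0:19:05
```

Both VMs were started within one second of each other, which would be consistent with a single event on the host affecting both tasks, though the logs alone don't prove that.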

IanEdwardJames
Mikrocruncher
Posts: 23
Joined: 15.03.2011 10:40

#754 Post by IanEdwardJames » 09.09.2016 11:43

Thank you for having a look for me Jacob.

I am upgrading Vbox and BOINC as I type.

All I found was a VBox application error at 19:02. It didn't tell me a whole lot. I'm always worried about updating things like BOINC and VBox when they are in the middle of big jobs.

Thank you again,

Ian

Jacob Klein
Brain-Bug
Posts: 564
Joined: 26.07.2013 15:41

#755 Post by Jacob Klein » 09.09.2016 13:21

You're welcome.

You are correct that it's inadvisable to upgrade BOINC or VirtualBox while having big tasks on hand, but there is a way to test it: exit BOINC, watch all the BOINC-related processes exit in Task Manager, back up your data folder, disconnect all network adapters, and then test. If something goes wrong, the project won't be notified since you were offline, and you can restore your data folder. I do this all the time, because all my machines run Windows 10 Insider Fast-Ring and I always install and use new test versions of VirtualBox. One day, I'll set up a guide.
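
The backup step could be scripted roughly like this. It is only a sketch: the `backup_data_dir` helper and the backup destination are hypothetical, and it assumes BOINC and all BOINC-related processes have already exited, so that nothing is writing to the data folder while it is copied.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_data_dir(data_dir: Path, backup_root: Path) -> Path:
    """Copy the whole data folder into a new timestamped folder and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{data_dir.name}-backup-{stamp}"
    # copytree creates the target (and missing parents) and fails if it already exists.
    shutil.copytree(data_dir, target)
    return target


# Example call (both paths are assumptions; adjust them to your own machine):
# backup_data_dir(Path(r"C:\ProgramData\BOINC"), Path(r"D:\Backups"))
```

Restoring would be the reverse: with BOINC still stopped, replace the data folder with the saved copy before going back online.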

In the meantime, my recommended rule of thumb is:
- If you don't do offline testing, then don't upgrade BOINC or VirtualBox while running long-running VM tasks.
- If you can/will do offline testing, then you should be able to:
--- upgrade BOINC to any publicly released version without problem
--- upgrade VirtualBox to any publicly released version WITHIN THE SAME "MINOR" VERSION (i.e., v5.0.20 -> v5.0.22, but NOT v5.0.20 -> v5.1.0) without problem.
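
The "same minor version" rule can be expressed as a small check. This is a sketch only, assuming plain `major.minor.patch` version strings with an optional leading `v`; the function names are illustrative and not part of BOINC or VirtualBox.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v5.0.20' or '5.0.20' into (5, 0, 20)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def same_minor_series(current: str, candidate: str) -> bool:
    """True if both versions share the same major and minor numbers."""
    return parse_version(current)[:2] == parse_version(candidate)[:2]


print(same_minor_series("v5.0.20", "v5.0.22"))  # True: upgrade within 5.0.x
print(same_minor_series("v5.0.20", "v5.1.0"))   # False: crosses into 5.1.x
```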

To me, it's very likely that your tasks crashed due to a VirtualBox Service crash. When I first started RNA World VM tasks, I had the exact same thing happen (a cluster of 2 tasks crashing at the same time), on 2 Jul 2014, 11:09:24 UTC, using VirtualBox v4.3.12, and I lost 96 days and 125 days. VirtualBox has matured a ton since the VirtualBox v4.2 and early v4.3 days. New RNA World tasks currently support v5.0.x, but not v5.1.x yet.

So, it is important to update VirtualBox, to the latest version that your tasks support, whenever you have the chance.

IanEdwardJames
Mikrocruncher
Posts: 23
Joined: 15.03.2011 10:40

#756 Post by IanEdwardJames » 10.09.2016 14:39

It's good to know that there is a way to test while upgrading; I didn't think of saving all the project data etc.

I was waiting for these ones to finish before doing any major updates (BOINC had been bugging me for quite some time about newer versions).

:good:
