Long running work unit
Re: Long running work unit
Many thanks, Michael. It's going on crunching.
Valter.
- Michael H.W. Weber
- Association board member
- Posts: 22435
- Registered: 07.01.2002 01:00
- Location: Marpurk
Re: Long running work unit
Valter wrote:
> It's going on crunching.
> Valter.

Very good!
Michael.
Support, cooperate and construct instead of demand, compete and consume.
http://signature.statseb.fr I: Broken page A
http://signature.statseb.fr II: Broken page B
Re: Long running work unit
ChristianB wrote:
> vkliber wrote:
>> Now:
>> Progress: 98.765% ... (still the same value)
>> Time elapsed: 3623:28:41
>> Time remaining: 45:18:34
>> What went wrong? Is this a normal state?
>> Can I complete the calculation of this unit? It would be a shame to lose almost half a year.
>> The PC is still running without a power-off, VirtualBox is still running, and the process occupies a core.
> Hi,
> yes, this is the normal state when the app has underestimated the runtime. The 98.765% is just a number I used to make this recognizable. The app is still calculating, but we can't say anything about the progress, so it stays fixed for this task from now on.

I'm still crunching. Time elapsed: 6125 hours. Is it still okay? I hope that I don't die before it finishes!
- Michael H.W. Weber
- Association board member
- Posts: 22435
- Registered: 07.01.2002 01:00
- Location: Marpurk
Re: Long running work unit
Should be fine. I have two of that type running with >3000 and >4000 hrs.
Michael.
Re: Long running work unit
vkliber wrote:
> ChristianB wrote:
>> vkliber wrote:
>>> Now:
>>> Progress: 98.765% ... (still the same value)
>>> Time elapsed: 3623:28:41
>>> Time remaining: 45:18:34
>>> What went wrong? Is this a normal state?
>>> Can I complete the calculation of this unit? It would be a shame to lose almost half a year.
>>> The PC is still running without a power-off, VirtualBox is still running, and the process occupies a core.
>> Hi,
>> yes, this is the normal state when the app has underestimated the runtime. The 98.765% is just a number I used to make this recognizable. The app is still calculating, but we can't say anything about the progress, so it stays fixed for this task from now on.
> I'm still crunching. Time elapsed: 6125 hours. Is it still okay? I hope that I don't die before it finishes!

Success. The WU is completed, CPU time 27,928,680.00 seconds (more than 323 days) ... ufff. The PC can now have a rest.
Hopefully it will be of benefit to you (and to me).
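For anyone who wants to check the numbers in this post, here is a minimal sketch of the conversion. It assumes the reported CPU time is in seconds (the post does not state the unit, but the "more than 323 days" figure matches that reading); the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def hours_to_days(hours: float) -> float:
    """Convert a duration in hours to days."""
    return hours / 24


# Figures quoted in the thread; the unit of the CPU time is an assumption.
cpu_days = seconds_to_days(27_928_680.00)
elapsed_days = hours_to_days(6125)

print(f"CPU time: {cpu_days:.2f} days")
print(f"6125 hours elapsed: {elapsed_days:.1f} days")
```

Under that assumption, the CPU time comes out a little over 323 days, and the 6125 hours mentioned earlier correspond to roughly 255 days.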
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
NICE!!! Congratulations!
That's one of the work units that I'm working on too (I'm your wingman). I've got 165 days invested in it, and will have to redo my calculations to see how long it'll take me.
Your PC is ~40% faster than mine, so ... it's going to take me a while!
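A rough way to redo that calculation is sketched below. It rests on several assumptions that are not confirmed in the thread: that "~40% faster" means a speed ratio of 1.4, that the runtime scales linearly with speed, that the reported CPU time of 27,928,680 is in seconds, and that the 165 days invested are comparable to CPU time. The function name is hypothetical.

```python
def estimate_total_days(reference_days: float, speed_ratio: float) -> float:
    """Scale a reference runtime by how much faster the reference machine is.

    speed_ratio = 1.4 means the reference machine is 40% faster,
    so the slower machine needs 1.4 times as long.
    """
    return reference_days * speed_ratio


# Reference runtime of the faster PC (unit assumed to be seconds).
reference_days = 27_928_680 / 86_400
invested_days = 165

total_days = estimate_total_days(reference_days, 1.4)
remaining_days = total_days - invested_days

print(f"estimated total: {total_days:.0f} days")
print(f"estimated remaining: {remaining_days:.0f} days")
```

This is only a ballpark figure: it ignores differences in uptime, checkpoint restarts, and anything else that separates elapsed time from CPU time.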
- Michael H.W. Weber
- Association board member
- Posts: 22435
- Registered: 07.01.2002 01:00
- Location: Marpurk
Re: Long running work unit
vkliber wrote:
> Success. The WU is completed, CPU time 27,928,680.00 seconds (more than 323 days) ... ufff. The PC can now have a rest.
> Hopefully it will be of benefit to you (and to me).

Fantastic!
Michael.
- Mikrocruncher
- Posts: 23
- Registered: 15.03.2011 10:40
Re: Long running work unit
Very sad news today: two of my longest-running WUs decided to error out.
Task 14950784
and Task 14949656
One had been going since July 2015 and had gotten to 97.67% after 7835 hours.
The other was at around 78% with 3861 hours on the clock.
Could someone look at the logs and make sure I didn't do anything wrong? I find it a little coincidental that both got 'computation errors' at the same time even though they were at different stages of the WU.
Thank you,
Ian
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
Ian,
I'm sorry. That's a shame.
I summarized the logs below. I'm not sure what happened to your tasks, but it seems that SOMETHING happened between (2016-08-28 18:45:23) and (2016-08-28 19:42:19) [the time I would have expected a snapshot], local time. I'm not sure whether you had a blue screen, a power outage, an abrupt PC restart, a VirtualBox Service failure, or something else between those times. You might want to look at Event Viewer to see if you get any hints.
Also, I noticed you were using VirtualBox 4.2.16, a very old version that is susceptible to service failures. For older v1.15 tasks, I'm using (and recommend) the latest 4.3.x, VirtualBox 4.3.40. But since you don't have those old tasks anymore, now might be a good time to upgrade to the latest 5.0.x, VirtualBox v5.0.26.
Sorry for your loss,
Jacob
-----------------------------------
http://www.rnaworld.de/rnaworld/result. ... d=14950784
- 2016-08-28 18:45:23: Last successful snapshot taken
- 2016-08-28 19:02:18: VM started, resuming from last checkpoint, but something went wrong during or shortly after, because it never took another snapshot.
- 2016-08-28 22:05:54: Retry, failed.
- 2016-08-29 21:21:46: Retry, failed.
- 2016-08-30 22:35:02: Retry, failed.
- 2016-09-03 10:20:46: Retry, failed.
- 2016-09-03 14:03:36: Retry, failed.
- 2016-09-04 12:27:03: Retry, failed.
- 2016-09-04 18:23:57: Retry, failed.
- 2016-09-05 18:18:48: Retry, determined it wanted to re-register the VM, caught an error on re-registration: Error Description: Trying to open a VM config 'C:\ProgramData\BOINC\slots\8\boinc_dd2b3ffe17b31245\boinc_dd2b3ffe17b31245.vbox' which has the same UUID as an existing virtual machine
http://www.rnaworld.de/rnaworld/result. ... d=14949656
- 2016-08-28 18:43:14: Last successful snapshot taken
- 2016-08-28 19:02:19: VM started, resuming from last checkpoint, but something went wrong during or shortly after, because it never took another snapshot.
- 2016-08-28 22:05:51: Retry, failed.
- 2016-08-29 21:21:42: Retry, failed.
- 2016-08-30 22:34:59: Retry, failed.
- 2016-09-03 10:20:42: Retry, failed.
- 2016-09-03 14:03:33: Retry, failed.
- 2016-09-04 12:27:00: Retry, failed.
- 2016-09-04 18:23:53: Retry, failed.
- 2016-09-05 17:43:30: Retry, determined it wanted to re-register the VM, caught an error on re-registration: Error Description: Trying to open a VM config 'C:\ProgramData\BOINC\slots\9\boinc_4cf75c0427eca566\boinc_4cf75c0427eca566.vbox' which has the same UUID as an existing virtual machine
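To put the timeline above into numbers, here is a small sketch that computes the gaps between the key events for task 14950784. The timestamps are copied from the summary; the helper name `gap` and the variable names are made up for illustration, and all times are treated as the same local time zone.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def gap(start: str, end: str) -> timedelta:
    """Return the time between two timestamps given in the log format."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Key events for task 14950784, taken from the summary above.
last_snapshot = "2016-08-28 18:45:23"
vm_restart = "2016-08-28 19:02:18"
final_failure = "2016-09-05 18:18:48"

since_snapshot = gap(last_snapshot, vm_restart)
retry_window = gap(vm_restart, final_failure)

print(f"last snapshot to VM restart: {since_snapshot}")
print(f"VM restart to final failure: {retry_window}")
```

Read this way, the VM restarted about 17 minutes after the last good snapshot, and the retries then went on for almost 8 days before the re-registration error ended the task.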
- Mikrocruncher
- Posts: 23
- Registered: 15.03.2011 10:40
Re: Long running work unit
Thank you for having a look for me, Jacob.
I am upgrading VBox and BOINC as I type.
All I found was a VBox application error at 19:02. It didn't tell me a whole lot. I'm always worried about updating things like BOINC and VBox when they are in the middle of big jobs.
Thank you again,
Ian
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
You're welcome.
You are correct that it's inadvisable to upgrade BOINC or VirtualBox while you have big tasks on hand, but there is a way to test an upgrade safely: exit BOINC, watch all the BOINC-related processes exit in Task Manager, back up your data folder, disconnect all network adapters, and then test. If something goes wrong, the project won't be notified since you were offline, and you can restore your data folder. I do this all the time, because all my machines run Windows 10 Insider Fast-Ring and I always install and use new test versions of VirtualBox. One day, I'll set up a guide.
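The backup and restore steps of that procedure could be sketched roughly as follows. This is not an official BOINC tool: the function names are made up, the data folder location has to be supplied by you (the error messages earlier in this thread show `C:\ProgramData\BOINC` on that particular machine), and the script does not check that BOINC and its VirtualBox processes have really exited or that the network is disconnected. Those steps still have to be done by hand first.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_root: str) -> Path:
    """Copy the whole data folder into a new timestamped folder under backup_root."""
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"data folder not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree refuses to write into an existing folder, so nothing is overwritten.
    shutil.copytree(src, dest)
    return dest


def restore_data_dir(backup_dir: str, data_dir: str) -> None:
    """Replace the data folder with the contents of an earlier backup."""
    dst = Path(data_dir)
    if dst.exists():
        shutil.rmtree(dst)  # discard the state left behind by the failed test
    shutil.copytree(backup_dir, dst)


# Example call (paths are assumptions, adjust them to your machine):
# backup = backup_data_dir(r"C:\ProgramData\BOINC", r"D:\boinc-backups")
# ... upgrade and test while offline ...
# restore_data_dir(str(backup), r"C:\ProgramData\BOINC")  # only if the test failed
```

Long-running VM tasks can make the data folder very large, so make sure the backup drive has enough free space before starting.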
In the meantime, my recommended rule of thumb is:
- If you don't do offline testing, then don't upgrade BOINC or VirtualBox while running long-running VM tasks.
- If you can/will do offline testing, then you should be able to:
--- upgrade BOINC to any publicly released version without problem
--- upgrade VirtualBox to any publicly released version WITHIN THE SAME "MINOR" VERSION (e.g. v5.0.20 -> v5.0.22, but NOT v5.0.20 -> v5.1.0) without problem.
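The VirtualBox part of that rule of thumb can be written down as a small check. This is only a sketch of the rule as stated in this post, assuming plain three-part version strings such as `v5.0.20`; the function names are hypothetical and it knows nothing about which versions a given project's tasks actually support.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn 'v5.0.20' or '5.0.20' into (5, 0, 20)."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def is_same_minor_upgrade(current: str, target: str) -> bool:
    """True if target keeps the same major.minor and is not older than current."""
    cur = parse_version(current)
    new = parse_version(target)
    return cur[:2] == new[:2] and new[2] >= cur[2]


print(is_same_minor_upgrade("v5.0.20", "v5.0.22"))  # True: same 5.0.x line
print(is_same_minor_upgrade("v5.0.20", "v5.1.0"))   # False: crosses into 5.1.x
```

Even when the check passes, the rule above still assumes the upgrade is tested offline first.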
It seems very likely to me that your tasks crashed due to a VirtualBox Service crash. When I first started running RNA World VM tasks, I had the exact same thing happen (a cluster of 2 tasks crashing at the same time), on 2 Jul 2014, 11:09:24 UTC, using VirtualBox v4.3.12, and I lost 96 days and 125 days of work. VirtualBox has matured a ton since the VirtualBox v4.2 and early v4.3 days. New RNA World tasks currently support v5.0.x, but not v5.1.x yet.
So it is important to update VirtualBox to the latest version that your tasks support whenever you have the chance.
- Mikrocruncher
- Posts: 23
- Registered: 15.03.2011 10:40
Re: Long running work unit
It's good to know that there is a way to test things while upgrading. I didn't think of saving all the project data etc.
I was waiting for these ones to finish before doing any major updates (BOINC had been bugging me for quite some time about newer versions).