ChristianB wrote: The thing is. vboxwrapper is supposed to do just what you all described. Keep the snapshot around until the next was written successfully. Unfortunately if the snapshot operation itself is disturbed by something it is still recognized as the last snapshot (although not usable) and vboxwrapper will not try to revert to the snapshot before that. This is hard to reproduce which is why nobody investigated this rare occurrence given the resource constraints at Rechenkraft and BOINC.

Could it not be reproduced by cutting the power to a computer while it's writing?
Long running work unit
-
- Mikrocruncher
- Posts: 30
- Registered: 19.08.2017 13:56
Re: Long running work unit
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
That's what I expected, Christian. It'd be nice if VBoxWrapper could see that multiple snapshots are in the VM and, if the newest doesn't work, keep trying older ones until it finds one that works.
Any chance of a code change to do that magic?
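To make the idea concrete, here is a minimal sketch of the "try the newest snapshot, fall back to older ones" loop. It is only an illustration, not vboxwrapper's actual code: `restore_newest_usable`, `try_restore` and the snapshot names are all hypothetical, and it assumes the snapshot list is ordered oldest to newest.

```python
from typing import Callable, Optional, Sequence


def restore_newest_usable(
    snapshots: Sequence[str],
    try_restore: Callable[[str], bool],
) -> Optional[str]:
    """Try snapshots from newest to oldest; return the first one that restores."""
    # Assumption: `snapshots` is ordered oldest -> newest.
    for name in reversed(snapshots):
        if try_restore(name):
            return name
    return None  # nothing usable, the task would have to start over


# Hypothetical stand-in for a real restore call: pretend the newest
# snapshot ("snap-3") was damaged while it was being written.
def fake_restore(name: str) -> bool:
    return name != "snap-3"


chosen = restore_newest_usable(["snap-1", "snap-2", "snap-3"], fake_restore)
print(chosen)  # snap-2
```

The catch Christian describes still applies: this only works if more than one snapshot is kept around and a broken snapshot can be detected as broken.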
-
- Admin
- Posts: 1920
- Registered: 23.02.2010 22:12
Re: Long running work unit
ChristianB wrote: The thing is. vboxwrapper is supposed to do just what you all described. Keep the snapshot around until the next was written successfully. Unfortunately if the snapshot operation itself is disturbed by something it is still recognized as the last snapshot (although not usable) and vboxwrapper will not try to revert to the snapshot before that. This is hard to reproduce which is why nobody investigated this rare occurrence given the resource constraints at Rechenkraft and BOINC.
Peter Hucker wrote: Could it not be reproduced by cutting the power to a computer while it's writing?

Of course. If you find a developer who is willing to do this to his development machine, please let me know. It might be possible to programmatically interrupt VBox while it is writing to simulate that, but I don't know enough about this to be sure or to investigate.
Jacob: Rom was the only one working on vboxwrapper and he is not working on BOINC anymore. There is a guy at CERN who does some work on vboxwrapper for LHC@home, but they don't use snapshots so they don't touch that. As for RNA World, I have more important projects that I want to do before looking into vboxwrapper.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
I understand, Christian. I was asking from a more "academic", "is it even possible to do" perspective.
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
Yeah, bring them down!
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
RacerX, after having a bit of a heat stroke, proved that he loves his new water cooler...
... by COMPLETING his longest-running-task-ever, less than a day after hardware installation!
355.4d
http://www.rnaworld.de/rnaworld/result. ... d=14952308
This was his last v1.17 task, so ... he'll be moving to the Fast Ring partition, as soon as I get Zathras figured out.
Re: Long running work unit
Congrats again, Mr. Monster-Hunter!
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
Speed just finished a near-record-breaker for him.
303.7d
http://www.rnaworld.de/rnaworld/workuni ... id=6330837
Re: Long running work unit
More than three years for validating. Impressive...
-
- XBOX360-Installer
- Posts: 86
- Registered: 23.02.2010 18:43
- Location: northern Alabama, US
Re: Long running work unit
This task completed in 2012, apparently successfully, but is still waiting for validation:
http://www.rnaworld.de/rnaworld/result. ... d=14832293
I suspect that it reached a limit of 50 failures for wingmates.
Could you check if repeating it under VM would give a useful output to compare it to?
This workunit appears to be stuck at 98.765% progress; the progress has not changed in the last 86 hours. However, a checkpoint was written within the last 10 minutes.
http://www.rnaworld.de/rnaworld/result. ... d=14953356
It was interrupted by what appeared to be a Windows 10 update, followed by running several MT tasks from another BOINC project that required using all CPU cores BOINC is allowed to use.
The estimated remaining time is counting down by one second every second 9 times, then jumping up by 9 seconds in the next second, then repeating that over and over.
Is this normal behavior for a task apparently that close to completion? For example, is the progress not allowed to advance past 98.765% until the VirtualBox portion finishes? Is the VirtualBox portion telling the wrapper enough information that the wrapper can base its progress percentage on something more than the elapsed time as a percentage of the time it estimates the task will run?
-
- Brain-Bug
- Posts: 564
- Registered: 26.07.2013 15:41
Re: Long running work unit
robertmiles wrote: This task completed in 2012, apparently successfully, but is still waiting for validation:
http://www.rnaworld.de/rnaworld/result. ... d=14832293
I suspect that it reached a limit of 50 failures for wingmates.
Could you check if repeating it under VM would give a useful output to compare it to?

Michael may be able to confirm, but ... it's likely that that work unit was issued using the "cmsearch XXL (long)" app (before VM apps existed), and in order to get a VM wingman for it, they set initial replication to 0 to not send any more under that app, and your wingman has a different work unit with initial replication of 1 using the "cmsearch VM (VirtualBox) 1.0.2" app.

robertmiles wrote: This workunit appears to be stuck at 98.765% progress; the progress has not changed in the last 86 hours.
http://www.rnaworld.de/rnaworld/result. ... d=14953356
It was interrupted by what appeared to be a Windows 10 update, followed by running several MT tasks from another BOINC project that required using all CPU cores BOINC is allowed to use.
The estimated remaining time is counting down by one second every second 9 times, then jumping up by 9 seconds in the next second, then repeating that over and over.
Is this normal behavior for a task apparently that close to completion? For example, is the progress not allowed to advance past 98.765% until the VirtualBox portion finishes? Is the VirtualBox portion telling the wrapper enough information that the wrapper can base its progress percentage on something more than the elapsed time as a percentage of the time it estimates the task will run?

We've explained this before. YES, it is normal for it to be at 98.765%, and remain there. Also, time remaining is IRRELEVANT for these VM tasks, and the behavior you are seeing (counts down, then jumps to a larger value) is completely normal. The reason is that it did its best to estimate how long it'd take, but it was way off. I've seen it off by a factor of 3 or 4 before. When it is off, the task will remain at 98.765% until completion. The rule is: If the task is still using CPU, then DO NOT ABORT.
I have had tasks at 98.765% for many many many months, before they completed successfully. Completely normal. These are MONSTERS. For your particular task, a real rough estimate might be that it will take 6 months to 18 months to complete. A wingman hasn't completed it, so I can't give any better estimates, sorry. But your work unit's "estimated runtime on reference system" value of 14 weeks (also a lousy estimate, fyi) ... is one of the largest ones that I've seen.
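What is described here is consistent with a progress figure that is just elapsed time divided by the estimated runtime, capped at 98.765%. A minimal sketch of that idea follows. It is an assumption made for illustration, not something confirmed in this thread or taken from vboxwrapper's source; `reported_progress` and `PROGRESS_CAP` are made-up names, and the 98 days is simply the 14-week estimate mentioned above.

```python
PROGRESS_CAP = 0.98765  # the value tasks in this thread sit at until they finish


def reported_progress(elapsed_days: float, estimated_days: float) -> float:
    """Elapsed time as a fraction of the estimate, never above the cap."""
    if estimated_days <= 0:
        return 0.0
    return min(elapsed_days / estimated_days, PROGRESS_CAP)


estimate = 14 * 7  # 14 weeks expressed in days (98)

print(reported_progress(49, estimate))   # 0.5
print(reported_progress(300, estimate))  # 0.98765
```

Under that assumption, once the elapsed time passes the estimate the figure stops moving, no matter how much real work is left.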
For your reference, here are some data points on my completed tasks ... indicating when they went to "98.765%", and when they completed:
Reached 98.765%, Completed
106.9d, 303.7d
105.9d, 192.4d
123.1d, 216.4d
117.1d, 144.7d
118.3d, 320.3d
126.3d, 189.4d
99.4d, 228.5d
100.9d, 276.1d
177.3d, 355.4d
165.2d, 318.8d
144.6d, 226.9d
225.0d, 353.8d
192.2d, 555.9d
I'm working on a HUGE MONSTER right now, on a slow laptop, where ... it went to 98.765% at 196.5d, and is currently at 540d! Talk about patience!
So, as you can see ... you may only be a third of the way complete when you reach that 98.765% point. MONSTERS.
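For anyone who wants to check the "a third of the way" figure, here is a small sketch that divides the day each task reached 98.765% by the day it completed, using the completed tasks listed above. The variable names are only for this illustration.

```python
# (days elapsed when the task reached 98.765%, days elapsed at completion),
# copied from the list of completed tasks above
runs = [
    (106.9, 303.7), (105.9, 192.4), (123.1, 216.4), (117.1, 144.7),
    (118.3, 320.3), (126.3, 189.4), (99.4, 228.5), (100.9, 276.1),
    (177.3, 355.4), (165.2, 318.8), (144.6, 226.9), (225.0, 353.8),
    (192.2, 555.9),
]

# Fraction of the total runtime that had passed when 98.765% first appeared
fractions = [reached / completed for reached, completed in runs]

lowest = min(fractions)
highest = max(fractions)
print(f"lowest:  {lowest:.2f}")
print(f"highest: {highest:.2f}")
```

The fractions run from about 0.35 to about 0.81, so in the worst of these cases roughly two thirds of the runtime was still ahead when the task hit 98.765%.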
If they're still using CPU, and you feel like continuing the challenge, then DO NOT ABORT.
I'd be careful about the upcoming Fall Creators Update, though. I'm not sure which version of VirtualBox is/isn't compatible. If you are using a v5.1.x version, you might want to carefully close BOINC then upgrade to the LATEST v5.1.x version, at some point. But do NOT upgrade to v5.2.x, because your v1.18 task NEEDS v5.1.x.
Regards,
Jacob Klein