Harmonious Trees 0.03
This version contains a fix to abort long running workunits after ~ 2 days of runtime and send the intermediate result back to the server. The server reissues this workunit to continue from the point where it was aborted.
yoyo
Re: Harmonious Trees 0.03
yoyo, what should we do with WUs still running under ht 0.02?
Re: Harmonious Trees 0.03
If it is a long-running WU, you should consider aborting it. If its result is compared with a version 0.03 WU that was aborted, it will most probably be invalid.
yoyo
Re: Harmonious Trees 0.03
What if a slow host meets a fast one? Will they still validate? 2 days on the slow one will surely result in a very different checkpoint file from the one on the fast host.
vi BOINC/checkin_notes
:1,$s/bug/feature/g
:wq!
Do biologists actually tell each other small-RNA jokes?
Re: Harmonious Trees 0.03
Hi Ananas,
The timing of the abort, or more precisely of the "premature completion when taking too much time", is controlled by an internal counter. As long as the counter is not messed up, things will work out. On the other hand, the counter is only an estimate, so the timing may not be exactly 2 days, and it depends on the machine.
fwjmath.
Re: Harmonious Trees 0.03
Hi!
I also have a task (Result ID 12294926) that is taking far too long for my taste: a net 24 h so far, with just over 5% done. If it continues like this, it will never finish before the deadline. It might not even finish before I am finished. I am very reluctant to lend my computer time to a task with such unspecified and long-running characteristics, and I am seriously considering aborting it. You write that tasks will intentionally self-abort after about 2 days? Are any credits granted then? And can you give more details about the "counter" you mentioned? Does the current counter show up anywhere, for example in the ckpt.txt file?
For info: my ckpt.txt file currently looks like this:
34
189721
68890281
590248
68295211
0
2176794844
0123433123333323332333233221233123
0123433123333323332333233221233123
01GHPL82?7:<>NAEDOCFI;49MBJ@K5=36D
406110275451
Can you read anything from this, especially when the task might come to a regular end, or at least to an intentional, premature self-aborted end?
Re: Harmonious Trees 0.03
Hello Mayoran,
I would like to clear up several things first.
For app version 0.03, the application will "prematurely self-abort", which means that it quits running once a certain amount of computation is reached and returns as a normal result, and the deserved credits will be granted. In this case, the result is no different from a normally finished result as far as volunteers and credits are concerned. We call it "premature" only because the actual segment in the workunit is only partially finished, and the remaining part needs to be redistributed.
I don't know if you are familiar with a finished project called Rectilinear Crossing Number (RCN). In fact, this kind of mechanism to avoid extremely long workunits was used in RCN.
For the progress bar, I have to admit that it does not really reflect progress for very long workunits. However, there is another way to check progress: look at the last line of ckpt.txt. The app will "prematurely self-abort" when that value reaches 2e12 and some more subtle criteria are met. Roughly speaking, you can treat the last line of ckpt.txt as a not-so-precise indicator, and 2e12 as a very soft threshold.
Your workunit is a typical long-running one; just give it time.
fwjmath.
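For anyone who wants to check this without opening the file by hand, here is a minimal Python sketch. It only assumes what is said above: the last line of ckpt.txt is the counter, and 2e12 is a very soft threshold. The function names and the file location are my own assumptions, not part of the application.

```python
from pathlib import Path

# Soft limit quoted in this thread; the app may stop earlier or later.
SOFT_THRESHOLD = 2e12


def read_counter(ckpt_text: str) -> int:
    """Return the integer on the last non-empty line of a ckpt.txt dump."""
    lines = [line.strip() for line in ckpt_text.splitlines() if line.strip()]
    return int(lines[-1])


def fraction_of_threshold(counter: int, threshold: float = SOFT_THRESHOLD) -> float:
    """Rough progress indicator: counter divided by the soft threshold."""
    return counter / threshold


if __name__ == "__main__":
    # Assumed location: run this from the task's slot directory.
    path = Path("ckpt.txt")
    if path.exists():
        counter = read_counter(path.read_text())
        print(f"counter = {counter}, about {fraction_of_threshold(counter):.0%} of 2e12")
```

With the file posted above (last line 406110275451), this gives roughly 20% of the soft threshold.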
Re: Harmonious Trees 0.03
fwjmath wrote: Hello Mayoran,
Hello fwjmath, thank you for your fast reply.
fwjmath wrote: I would like to clear up several things first.
For app version 0.03, the application will "prematurely self-abort", which means that it would quit running when a certain amount of computation is reached, and will return as a normal result, and the deserved credits will be granted. In this case, the result is nothing different than a normally finished result in the aspect of volunteers and credits. We call it "premature" only because the actual segment in the workunit is only partially finished in this case and the remaining part needs to be redistributed once again.
OK, that is how I understood it.
fwjmath wrote: I don't know if you are familiar with a finished project called Rectilinear Crossing Number (RCN). In fact this kind of mechanism to avoid extremely long workunits had been used in RCN.
Nope, I have not done that (sub)project yet. But I understand what you mean.
fwjmath wrote: For the progress bar, I have to admit that it does not really reflect progress for very long workunits. However, there is another way to look for progress. Look at the last line of ckpt.txt. The app will "prematurely self-abort" when it reaches 2e12 and some more subtle kind of criteria. But roughly, you can see the last line of ckpt.txt as a not-so-just indicator, and 2e12 as a very soft threshold.
That's why I posted my file; I thought the last line would be important anyway.
However, I am currently at roughly 4e11 after about 24 h of CPU time, so in linear terms about 20% done. That means I have roughly another 80% to go, which is about four full days of CPU time? Hard to get!!! In practical terms that might come close to the deadline!
And what other, more "subtle" criteria are you talking about? Is it possible that my task will not end after these four long days?
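The linear estimate above can be written out as a small Python sketch. It assumes the counter grows at a constant rate and that the task stops exactly at 2e12; neither is guaranteed, since the threshold was described as very soft. The function name is made up for illustration.

```python
def remaining_hours(counter: float, elapsed_hours: float, threshold: float = 2e12) -> float:
    """Linear extrapolation of remaining CPU hours until the soft threshold.

    Assumes a constant counter rate, which is only a rough guess.
    """
    if counter <= 0 or elapsed_hours <= 0:
        raise ValueError("counter and elapsed_hours must be positive")
    if counter >= threshold:
        return 0.0
    # time left = elapsed time * (work left / work done)
    return elapsed_hours * (threshold - counter) / counter


if __name__ == "__main__":
    hours = remaining_hours(4e11, 24)
    print(f"about {hours:.0f} h ({hours / 24:.1f} days) left under these assumptions")
```

With 4e11 after 24 h, this yields about 96 h, i.e. the four days mentioned above.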
Re: Harmonious Trees 0.03
Ok, finally, Task has ended on my slow Laptop, successfully. So far so good.
But task ended with counter at just above 1e12 and not at 2e12.
Hmmm...
- Beyond
- Prozessor-Polier
- Posts: 111
- Registered: 02.02.2008 01:48
- Location: Rum River watershed, MN, USA
Re: Harmonious Trees 0.03
yoyo wrote: This version contains a fix to abort long running workunits after ~ 2 days of runtime and send the intermediate result back to the server. The server reissues this workunit to continue from the point where it was aborted.
Don't think this fix is working. I have a .03 WU that's been running for 59 hours and is still at 25% completion:
0.03 harmtrees hat_737_34-104574-1727118432_R_1315461737_2
59:03:07 elapsed - 25% - 177:09:10 time left - 225:25:32 deadline - 54.6 °C - Running High P.
What should I do with it?
Regards/Beyond
Re: Harmonious Trees 0.03
Hi Beyond,
Could you please post the content of ckpt.txt here, so that we can investigate further? Note that the ~2 days runtime limit is a soft one, which means it depends both on the machine and on the workunit. Moreover, the progress bar is only an estimate, since the actual progress is very hard to measure.
However, from ckpt.txt, on a case-by-case basis, we can get a better estimate of the running time. As I said in previous posts, a workunit should finish once the last line of ckpt.txt exceeds 2e12 and a more subtle condition is satisfied. Let T be the first tree that makes the last line of ckpt.txt exceed 2e12. The workunit will stop once all trees with the same first subtree as T are processed. However, it is not easy to estimate this number automatically. Therefore, if you post your ckpt.txt here, I can give you an estimate of the remaining running time.
But rest assured in all cases: we can prove mathematically that this application, when fed with correct input, will terminate.
fwjmath.
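To illustrate the stopping rule described above, here is a toy Python sketch. It is not the application's code. It assumes that trees are enumerated so that those sharing a first subtree are contiguous, and it represents each tree as a hypothetical (first_subtree, cost) pair, where cost is whatever that tree adds to the counter.

```python
from typing import Hashable, Iterable, Tuple


def process_until_soft_stop(
    trees: Iterable[Tuple[Hashable, int]],
    threshold: int = 2 * 10**12,
) -> Tuple[int, int]:
    """Return (trees processed, final counter) under the soft stopping rule."""
    counter = 0
    processed = 0
    armed = False      # becomes True once the counter has exceeded the threshold
    stop_key = None    # first subtree of T, the tree that crossed the threshold
    for first_subtree, cost in trees:
        # After T, keep going only while the first subtree is unchanged.
        if armed and first_subtree != stop_key:
            break
        counter += cost
        processed += 1
        if not armed and counter > threshold:
            armed = True
            stop_key = first_subtree
    return processed, counter


if __name__ == "__main__":
    toy = [("a", 4), ("a", 4), ("b", 4), ("b", 4), ("b", 4), ("c", 4)]
    print(process_until_soft_stop(toy, threshold=10))
```

In the toy run, the counter passes 10 at the first "b" tree, but the loop still finishes the two remaining "b" trees before stopping. Under these assumptions the final counter overshoots the threshold, which may be one reason a counter can sit above 2e12 while the task is still running.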
Re: Harmonious Trees 0.03
Here's the contents of ckpt.txt:
34
104573
256502765
22094756
233931644
0
2551069546
0123456745123456634123442123322222
0123456745123456634123442123322222
05E=FI;@HK9PM1B3L<D?7J>O26GN84C:A6
2470400999780
I restarted BOINC earlier to see if it would start progressing again. Hope that didn't mess it up. Looks like it's past 2e12 though.
Edit: the ckpt.txt now:
34
104573
278022765
23803576
253701846
0
2742674845
0123456745123456564543312345345111
0123456745123456564543312345345111
04KBJ@;3G25A<LMD6FO?>1CP789H=:IENC
2672443196488
Edit 2: the ckpt.txt this morning:
34
104573
339262765
28569057
310073384
0
3900813131
0123456745123456455523455542123453
0123456745123456455523455542123456
01FC5I4>GE@9:KPB3<7A;L86NHJMO2D?=K
3229669446809
Last edited by Beyond on 15.09.2011 14:30, edited 1 time in total.