What happens with some ECM_UC? Very long running time

Everything about the project yoyo@home
marsinph
PDA user
Posts: 31
Registered: 05.04.2018 09:08

#1 Post by marsinph » 02.12.2018 19:02

Hello,
On each of my three hosts, I have two very unusual WUs.
Normally, I return a WU after a little more than one hour.
But these six WUs have already been running for between 17 and 22 hours and are stuck at 40%.
What these ECM 705.02 WUs have in common is the name:
ecm_uc_1543642398_np_195_850e6_6_"xxx" (not P1, not P2!)
ecm_uc_1543642398_np_195_850e6_6_7300_0 (which uses 1.1 GB RAM, without changes)
ecm_uc_1543642398_np_195_850e6_6_6535_1
ecm_uc_1543642398_np_195_850e6_6_7735_0
ecm_uc_1543642398_np_195_850e6_6_6660_0
ecm_uc_1543642398_np_195_850e6_6_725_1 (about 1 GB RAM, but it varies)
and
ecm_uc_1543642398_np_195_850e6_6_365_0 (about 1 GB RAM, but it varies)

Then there are other WUs whose names start with
ecm_uc_1543642398_np_195_2900e6_01_...... and which take about 8-9 hours to run.
But they are "P1" tasks, so that seems a more or less normal running time.
I repeat, for months all my WUs finished after 60-90 minutes.
All tasks use about 2 MB RAM, ecmwrapper about 1 MB.


What should we do with these very long WUs?
Restarting BAM is not a solution, because there is no checkpoint.

yoyo
Association board
Posts: 8045
Registered: 17.12.2002 14:09
Location: Berlin

#2 Post by yoyo » 02.12.2018 19:41

Yes, those ecm_uc_1543642398_np_195_850e6* workunits run long. I see runtimes from 15 to 30 hours. RAM consumption is high, but will not exceed 1.8 GB.

As for every ecm workunit (not P1 or P2), a checkpoint is written every 20%.
Each ecm workunit runs 5 curves, and each curve has 2 stages: stage 1 runs long and needs little RAM, stage 2 runs shorter but needs much RAM. The ratio between stage 1 and stage 2 is roughly 4:1.
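
The structure described above can be sketched in a few lines of Python. This is only an illustration: it assumes that all 5 curves take the same time and that each checkpoint coincides with the end of a curve, and the names `phase_timeline` and `checkpoint_hours` are hypothetical helpers, not part of the project's software.

```python
from dataclasses import dataclass

CURVES = 5             # from the post: each ecm workunit runs 5 curves
STAGE1_SHARE = 4 / 5   # from the post: stage 1 : stage 2 is roughly 4:1


@dataclass
class Phase:
    curve: int       # 1-based curve number
    stage: int       # 1 or 2
    start_h: float   # start time in hours since the workunit began
    end_h: float     # end time in hours
    high_ram: bool   # True for stage 2, which needs much RAM


def phase_timeline(total_hours: float) -> list[Phase]:
    """Split a workunit into 5 curves x 2 stages (assumes equal-length curves)."""
    per_curve = total_hours / CURVES
    phases = []
    for c in range(CURVES):
        start = c * per_curve
        split = start + per_curve * STAGE1_SHARE
        phases.append(Phase(c + 1, 1, start, split, high_ram=False))
        phases.append(Phase(c + 1, 2, split, start + per_curve, high_ram=True))
    return phases


def checkpoint_hours(total_hours: float) -> list[float]:
    """Assumed checkpoint times: one at the end of each curve, i.e. every 20%."""
    return [total_hours * (c + 1) / CURVES for c in range(CURVES)]
```

Under these assumptions, a 20 hour workunit would checkpoint every 4 hours, and a restart just before a checkpoint would lose up to 4 hours of work. The high-RAM stage 2 would occupy roughly the last 0.8 hours of each curve.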
HELP with the Rechenkraft wiki; here is what needs to be done.
Wiki - FAQ - Association - Chat


marsinph
PDA user
Posts: 31
Registered: 05.04.2018 09:08

#3 Post by marsinph » 02.12.2018 21:26

yoyo wrote: Yes, those ecm_uc_1543642398_np_195_850e6* workunits run long. I see runtimes from 15 to 30 hours. RAM consumption is high, but will not exceed 1.8 GB.

As for every ecm workunit (not P1 or P2), a checkpoint is written every 20%.
Each ecm workunit runs 5 curves, and each curve has 2 stages: stage 1 runs long and needs little RAM, stage 2 runs shorter but needs much RAM. The ratio between stage 1 and stage 2 is roughly 4:1.


Hello, thank you for the explanation.
For short WUs, there is a checkpoint every 1200 seconds. Why not for the huge WUs, which take vastly longer to run?
As for the RAM use of the huge WUs: on all three of my hosts (all with the same CPU and RAM), ECM takes only a few megabytes.
Sometimes about one gigabyte, but not for long (about 10 minutes).

About your ratio of stage 1 to stage 2 (I am not considering RAM).
All of this is a very rough estimate.
With a 4:1 ratio, stage 2 takes about 25% of the time of stage 1: if the first stage takes 12 hours, the second stage would need 3 hours to finish.
It does not!
Those monster WUs were at 20% after about 3 hours.
20 hours later, they were only at 40%. So I think the ratio is not 4:1 but 1:4.
That means these WUs will finish AFTER the deadline (only four days).
The small WUs have a deadline of 10 days, the monsters only four days (running 24/7, without restart, logoff, ...).


I hope the credits granted will also be proportionate. On the same host (so the figures can be compared):
A P1 WU runs on my host for 29,300 sec for a credit of 322.97 (ratio: 39.6 credits/hour).
An ECM_xy_... or hc WU runs about 1.1 hours (3,900 sec for 76.98 credits) (ratio: 71 credits/hour).
Almost twice as much for small WUs!
It already shows: the longer the WU, the fewer the credits.
And that is without considering the missing checkpoints, errors, and host restarts with loss of the crunched part.
I cannot do any update that requires a restart on my hosts, because the very long WUs block everything.
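
The credit rates quoted above can be reproduced with a short Python sketch. The function name `credits_per_hour` is a hypothetical helper; the input figures are the ones given in this post for a single host, so they say nothing about other hosts or batches.

```python
def credits_per_hour(credits: float, runtime_seconds: float) -> float:
    """Credit rate of a finished task, in credits per hour of runtime."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return credits / (runtime_seconds / 3600)


# Figures quoted in the post (same host for both tasks).
p1_rate = credits_per_hour(322.97, 29_300)    # P1 workunit
small_rate = credits_per_hour(76.98, 3_900)   # short ECM workunit

print(f"P1 task:    {p1_rate:.1f} credits/hour")
print(f"small task: {small_rate:.1f} credits/hour")
print(f"ratio:      {small_rate / p1_rate:.2f}")
```

Comparing rates rather than totals keeps tasks of very different length on the same scale, which is the comparison being made here.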

My conclusion: the next time I receive such a "monster", I will cancel it.
Sorry. That is not the goal of the research, but I will not block some hosts for nothing.
Look at my signature; you will see I crunch for several projects, as does my team.

I will let the monster finish on one host, hoping for credits in line with the running time.
The others still running I will abort if they need more than 8 hours to complete.
Once again, sorry, but I need to do maintenance on my hosts.

Suggestion: reduce the size of the WUs. The monster requires 266 TFLOP, yes, two hundred sixty-six TERA FLOP!
CPU only!
For comparison, an i7-2600K overclocked to 4.0 GHz delivers 4.22 GFLOPS (and 13.5 GIOPS integer).


Best regards

marsinph
PDA user
Posts: 31
Registered: 05.04.2018 09:08

#4 Post by marsinph » 03.12.2018 15:07

Hello,
As expected, the credit for the "monster" WU is much lower than for small WUs: 1713 credits / 124,348 sec, so about 49 credits/hour.

How can I avoid receiving these WUs with an expected running time of 2 days?
The one above was predicted to run 9 hours; it took 34 hours.
I changed the local BAM preferences to accept only 0.5 days of work, but I still receive "monsters" with a deadline of 5 days.
Considering the above and a ratio of 1:3 (expected : actual running time), the received WUs will never finish before the deadline.
So I need to abort them. That is not the goal of the project, but it is the only solution, unless someone has an idea how not to get the ECM_UC_...._NP_195..... WUs?
Best regards

Michael H.W. Weber
Association board
Posts: 22419
Registered: 07.01.2002 01:00
Location: Marpurk

#5 Post by Michael H.W. Weber » 04.12.2018 12:34

Excellent. I need to re-connect my machines to this project to get such nice long tasks.

Michael.
Support, cooperate and construct instead of demand, compete and consume.

http://signature.statseb.fr I: Broken page A
http://signature.statseb.fr II: Broken page B


marsinph
PDA user
Posts: 31
Registered: 05.04.2018 09:08

#6 Post by marsinph » 01.03.2019 19:41

Hello,
I see it is starting again!
ECM_UC_........
The project writes in the "input file" a computation size of about 98,000 GFLOP (ninety-eight thousand).
A host with 4.5 GFLOPS of arithmetic speed should need about 6 hours.
Not 25 hours.
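
The 6 hour figure follows from dividing the stated work by the host speed. A minimal Python sketch, assuming the host sustains its benchmark speed for the whole run (real workloads generally do not, so this is a lower bound rather than a prediction); `ideal_runtime_hours` is a hypothetical helper name:

```python
def ideal_runtime_hours(work_gflop: float, host_gflops: float) -> float:
    """Stated work divided by sustained speed, converted to hours."""
    if host_gflops <= 0:
        raise ValueError("host_gflops must be positive")
    return work_gflop / host_gflops / 3600


estimate = ideal_runtime_hours(98_000, 4.5)  # figures quoted in the post
observed = 25.0                              # hours, as reported in the post

print(f"ideal estimate: {estimate:.1f} h")
print(f"observed is {observed / estimate:.1f}x the ideal estimate")
```

The gap between the two numbers is what this post is about; the sketch only makes the arithmetic explicit and does not explain where the difference comes from.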

BOINC estimated a running time of about 2 hours. After 12 hours it was still running, at 68%.
I let one WU finish: 25 hours for a ridiculous credit of 52 (fifty-two)!
Host: i7-3930K, 12 cores at 4.2 GHz.

ECM_XY_... WUs have no problem at all, with a credit of about 142.09 and a running time of 2-3 hours.
I have set my preferences NOT to accept WUs longer than 0.5 days. The project still sends them.
Because there is no checkpoint, on a host restart (or logoff) everything is lost.
I can accept losing one hour, not one day.
Certainly not for such ridiculous credits.
By comparison, ODLK gives about 6,000 / day (running 24/7).
ECM_UC gives 400 (8 simultaneous, 24/7).

I tested on a modest dual-core Eee PC at 1.8 GHz: ODLK gives 800 on 2 cores, 24/7.


I will not block powerful hosts for next to nothing. "Peanuts" is perhaps the better word.
So when I see ECM_UC_..., I cancel it. Sorry to the other users.

No reaction at all from the project admin!

yoyo
Association board
Posts: 8045
Registered: 17.12.2002 14:09
Location: Berlin

#7 Post by yoyo » 01.03.2019 23:45

You are complaining very loudly and a lot, without giving a link to any example. And your keyboard is broken; it often types many '!'.
And if you keep complaining like this, I am not willing to answer.

What I see in the results is that ecm_uc workunits are granted 530 credits and runtimes are between 5 and 15 hours.

marsinph
PDA user
Posts: 31
Registered: 05.04.2018 09:08

#8 Post by marsinph » 06.03.2019 20:10

Hello Yoyo,
Sorry for the repeated "!" and "?".
You asked for links to WUs.
Please read above.
Then look at this: http://www.rechenkraft.net/yoyo/workuni ... d=44471340
I have one WU running, but it is impossible to send the content of stderr.txt because the file is locked.
It is also impossible to copy and paste the name of a WU while it is on the host. And when it is returned, everything vanishes.
Here is the link to the only WU running on my host. May I ask you to follow it?
http://www.rechenkraft.net/yoyo/workuni ... d=44737333
Running on host http://www.rechenkraft.net/yoyo/show_ho ... tid=427975

I will let it finish completely, so you will be able to see everything. I think it will finish after 7 March, 06:00 UTC (I hope).
As admin, you can see it. It is not a "monster", but also not a small one.
If I close BOINC and look in the slot directory, it is empty, because there is no checkpoint for big WUs.
And it starts again from zero.
In other words, there seem to be problems with your "ecm_ru" WUs.
Certainly with the estimated computation size given by the WU itself.
I repeat: if a WU has about 88,000 GFLOP, on a host with 4 GFLOPS it should take about 22,000 seconds (6.1 hours).
Not twice as long. The WU mentioned above has already been running for 6.5 hours and is stuck at 60% (as reported by BOINC, so not trustworthy, I agree).
But because stderr.txt is not readable, it is impossible to see what is happening.


Once again, my apologies if my previous message was not very clear.
But I see more and more complaints about ecm_ru.
I think I am not the only one who runs several projects at the same time. But if there are so many problems, some will probably leave.
Not because of the project itself, but because of the competition between users and teams.
Once again, I repeat, this is only about ecm_ru.
There is no problem with the others.

All other WUs run without any problem. (I am not considering P1; you have explained that, so it is OK.)

yoyo
Association board
Posts: 8045
Registered: 17.12.2002 14:09
Location: Berlin

#9 Post by yoyo » 06.03.2019 20:34

The ecm_uc workunit you linked ran for 7 hours, not 25 as claimed above. And you got 366 credits for it, not 52 as you claimed in your posting. Your computer claimed 280 credits for it, but got 366. So I do not see any big problem here.
The name of this ecm_uc workunit ends with P1. P1 and P2 workunits have no checkpoints; that is technically not possible.

The ecm_ru workunit is from the batch ecm_ru_1551846634_*. Some of them have already been returned. Runtimes were between 5 and 12 hours. The workunit names do NOT end with P1 or P2, which means they checkpoint every 20%. So checkpoints are written every 1 to 2.4 hours. And you will get 564 credits for it.

marsinph
PDA user
Posts: 31
Registered: 05.04.2018 09:08

#10 Post by marsinph » 06.03.2019 21:51

Sorry,
I give up!
I have tried to be as accurate as possible.

I never spoke about the WU above.
Neither P1 nor P2!

Once again, I repeat: the problem is the checkpoint. If the host restarts, all work is lost.

Then the time given is completely wrong.
The WU still running was already "blocked" at 60% after 2 hours and 30 minutes. No change is visible in BOINC (I know it is not a trustworthy reference).
But the WU still running shows me a running time of 7 (seven) hours but a CPU time of 8 (eight) hours!

And still no explanation about the computation size given by the WU itself (not by BOINC, I repeat).

I have analyzed the returned (validated) WUs; nothing to see.
And I am not talking about credits!
The longer, the fewer (comparing on the same host, of course).

So from now on, when I see an ecm_ru, I cancel it. Sorry.
All other WUs do not have any problem. Why ecm_ru?

Certainly not a "client" problem.

Philippe

yoyo
Association board
Posts: 8045
Registered: 17.12.2002 14:09
Location: Berlin

#11 Post by yoyo » 07.03.2019 06:47

First you complained about the credits of ecm_uc. Then I explained them. Then you said this is not the problem.
Then you complained that ecm_ru has no checkpoints and runs too long. So I explained how long they run and when they checkpoint. Even this is not what you want.
I don't know why they run long on your host but not on others.
It seems that we do not understand each other.
In that case further communication doesn't make any sense.

yoyo
Association board
Posts: 8045
Registered: 17.12.2002 14:09
Location: Berlin

#12 Post by yoyo » 07.03.2019 18:46

So, finally your ecm_ru workunit finished: http://www.rechenkraft.net/yoyo/result. ... d=59062437
It used 9.4 CPU hours; your host claimed 330 credits but got 564.
In stderr you can see that ecm was started 5 times by the wrapper. That means a checkpoint was made each time, so always after 20%.

I do not see where the problem is.
