Anyone having issues with ecm 700.02 WUs?

Everything about the project yoyo@home
UBT - Timbo
Idle-Sammler
Posts: 6
Registered: 18.12.2016 11:58

Anyone having issues with ecm 700.02 WUs?

#1 Post by UBT - Timbo » 18.12.2016 12:06

Hi all,

I've been crunching some yoyo@home tasks, and I have some ecm 700.02 tasks that started off with an estimate of 6 hours to crunch.

I now have at least 6 of these tasks that are only at 20% completion after nearly 12 hours. Do I leave them to finish, or do I abort them and start some of the other WUs in the cache?

regards
Tim

PS: The deadline for these is fast approaching, and if the tasks are all going to take this long, then some will not be finished in time...

The CPU is a 3.6GHz Intel i7, with 6 cores running BOINC, so I cannot see that the CPU is the issue.

yoyo
Vereinsvorstand
Posts: 8048
Registered: 17.12.2002 14:09
Location: Berlin

Re: Anyone having issues with ecm 700.02 WUs?

#2 Post by yoyo » 18.12.2016 18:48

Can you give me the result IDs of those tasks?
HELP out in the Rechenkraft wiki; here is what needs doing.
Wiki - FAQ - Association - Chat

xyzzy
Fingerzähler
Posts: 1
Registered: 18.12.2016 20:36

Re: Anyone having issues with ecm 700.02 WUs?

#3 Post by xyzzy » 18.12.2016 20:42

Yes, same problem: mine have been running for 36 hrs, some at 80% complete, some as low as 20%. One WU ended and only got 250 credits; if another ends with the same amount of credit, I'm cancelling the rest and opting out of ECM.


Re: Anyone having issues with ecm 700.02 WUs?

#4 Post by UBT - Timbo » 19.12.2016 20:11

yoyo wrote: Can you give me the result IDs of those tasks?

Hi

Thanks for the msg.

It's a bit difficult to give these, as the "Results for computer" page doesn't list the WU names, so I have to check each one in BOINC Manager and hope I get the right file names...

So, the issue is with ecm_op_1481818542_444741478546331_17M.C235 files:

_2515_0 - this one is currently at 80% after 23 hours, and the remaining time is just increasing...
_2965_0 - now on 20% after 6+ hrs - remaining 1d 5hr
_2920_0 - now on 20% after 6+ hrs - remaining 1d 3hr
_2960_0 - now on 20% after 6+ hrs - remaining 1d 2hr
_2415_0 - now on 20% after 6+ hrs - remaining 1d 0hr
_2305_0 - now on 20% after 6+ hrs - remaining 1d 0hr
_2440_0 - now on 20% after 6+ hrs - remaining 23hr

I have had no issue with the ecm_xy files...these seem to be OK...but NOT the ecm_op files.

I have already aborted a number of ecm_op files as they were not going to be completed by the deadline...but I kept the above, as with a 7 core i7, I thought I would give these a chance to finish...but it looks unlikely.... :-(

Thanks in advance for your help

Tim
Last edited by UBT - Timbo on 19.12.2016 22:43, edited 1 time in total.


Re: Anyone having issues with ecm 700.02 WUs?

#5 Post by yoyo » 19.12.2016 20:53

This is really strange.
All of those WUs were finished with a runtime of 1 - 5 hours. Even a 32-bit Odroid (32-bit ARM) finished this WU after 11 hours.
There must be something wrong with your BOINC.
Have you throttled CPU usage, or is your CPU clock reduced?

Would be interesting to see the stderr of such workunits.
- check in boinc in which slot the wu is running
- shutdown boinc
- send me the stderr
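
For anyone following these steps, here is a minimal Python sketch for gathering the stderr files in one go. It assumes the usual BOINC layout, where each running task has a numbered folder under `slots` in the BOINC data directory and the application log is a file called `stderr.txt` in that folder. The function name and the data directory path are placeholders, so adjust them to your installation.

```python
from pathlib import Path


def collect_slot_stderr(boinc_data_dir):
    """Map slot folder name -> contents of its stderr.txt, if present.

    Assumption: tasks live in <data dir>/slots/<number>/ and log to
    stderr.txt. Slots without that file are skipped.
    """
    slots_dir = Path(boinc_data_dir) / "slots"
    logs = {}
    if not slots_dir.is_dir():
        return logs
    for slot in sorted(slots_dir.iterdir()):
        log_file = slot / "stderr.txt"
        if log_file.is_file():
            # errors="replace" keeps going even if the log has odd bytes
            logs[slot.name] = log_file.read_text(errors="replace")
    return logs
```

Run it after BOINC has been shut down, passing your own data directory, and post the text for the slot that holds the slow task.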

I have only seen some crediting issues with ecm_ru_ workunits. There the Windows app needs 4 times longer than the Linux app, while on other workunits it is the other way round. Anyway, for such workunits I now give 3 times more credit. I think this was @xyzzy's issue.

yoyo


Re: Anyone having issues with ecm 700.02 WUs?

#6 Post by UBT - Timbo » 19.12.2016 22:32

Hi

All of those WUs were finished with a runtime of 1 - 5 hours.

- Maybe, but not on my PC !!

There must be something wrong with your BOINC.

- Maybe, but ALL other projects crunch tasks perfectly well...and even the yoyo muon tasks and ecm_xy tasks are all OK

Have you throttled cpu usage or is your cpu clock reduced?

- Nope - running at standard CPU speed

Would be interesting to see the stderr of such workunits.
- check in boinc in which slot the wu is running - I have 13 slots with yoyo files - slots 1 thru 5 and 11 thru 18 - this is because I suspended 6 ecm_op tasks and I'm running 6 ecm_xy and 1 ecm_op tasks.
- shutdown boinc - if I do that, I am not sure that the crunching time for the 7 "running" ecm tasks will be checkpointed. I noticed before, on some earlier ecm_op tasks, that I lost many hours (with elapsed time going from 13+ hrs down to 6+ hours :-( ) when I shut down BOINC, thinking it might be the issue...
- send me the stderr - that will have to wait until one ecm_oy file that is current, has finished.

regards and thanks

Tim


Re: Anyone having issues with ecm 700.02 WUs?

#7 Post by UBT - Timbo » 19.12.2016 23:53

UPDATE:

Ok, so I was WRONG about the ecm_xy tasks.

The 6 I had running were doing fine and got to about 97% or so and then, as one, they all restarted from zero again :-(

So, I shut down BOINC Manager and restarted it....

Looked in the slots folders and I found the folders for each of the yoyo tasks.

for the ecm_op 2515_0 task, here's the content of the stderr file:
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
No heartbeat from core client for 30 sec - exiting
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
wrapper: starting
wrapper: running ecm (-v -nn -timestamp -chkpnt checkpnt -inp in -maxmem 1800 110e6)
The checkpoint.txt file says this:
4 73902.187500
and there's an "out" file full of numbers...not sure if this is relevant as it is very long.

So, if any of the files that have been created in various slots folders are of any use to you, please let me know and I'll maybe "zip" them up and you can look at them in detail.
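
The restart pattern in the stderr above can be tallied with a short script rather than by eye. This is only a sketch: it matches the two message texts exactly as they appear in this log, and the function name is made up for illustration.

```python
def summarize_wrapper_log(text):
    """Count wrapper starts and heartbeat exits in a stderr dump."""
    starts = 0
    heartbeat_exits = 0
    for line in text.splitlines():
        line = line.strip()
        if line == "wrapper: starting":
            starts += 1
        elif line.startswith("No heartbeat from core client"):
            heartbeat_exits += 1
    return {"starts": starts, "heartbeat_exits": heartbeat_exits}
```

Feeding it the contents of the stderr file gives one number for how often the wrapper was started and one for how often it exited for lack of a heartbeat, which makes it easier to compare slots.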

regards
Tim


Re: Anyone having issues with ecm 700.02 WUs?

#8 Post by yoyo » 20.12.2016 00:05

This "no heartbeat" looks bad and partly explains the long runtime. It means that BOINC restarts the workunit.
Are you running the latest BOINC version?

Beyond
Prozessor-Polier
Posts: 111
Registered: 02.02.2008 01:48
Location: Rum River watershed, MN, USA

Re: Anyone having issues with ecm 700.02 WUs?

#9 Post by Beyond » 20.12.2016 04:35

Code:

ecm_ru_1481927124_10_1155.c367_1670_0		44:26:31 (42:45:18)	20.000	177:46:02	55:32:25	23.9 °C	0	700.01 ecm	Running High P.	[1] 20:53:19	74.04	1140.00 MB	
ecm_ru_1481927124_10_1155.c367_1770_0		44:26:31 (43:04:17)	20.000	177:46:07	55:31:23	23.9 °C	0	700.01 ecm	Running High P.	[1] 21:13:13	83.03	964.92 MB	
ecm_ru_1481927124_10_1155.c367_2450_0		44:26:31 (42:48:33)	20.000	177:46:07	55:31:52	23.9 °C	0	700.01 ecm	Running High P.	[1] 20:57:55	75.37	1031.77 MB	
ecm_ru_1481927124_10_1155.c367_2120_0		44:26:31 (43:02:36)	20.000	177:46:07	55:32:08	23.9 °C	0	700.01 ecm	Running High P.	[1] 21:11:53	88.70	995.49 MB	
ecm_ru_1481927124_10_1155.c367_3020_0		44:26:31 (42:54:26)	20.000	177:46:07	55:37:18	23.9 °C	0	700.01 ecm	Running High P.	[1] 21:03:11	80.83	1051.81 MB	
All 5 of the above WUs have been running for over 44 hours and estimate that they have 177 hours to go (with the deadline in 55 hours). All are using a full CPU core and over 1 GB of memory each. BOINC 7.6.22. This machine has run more WUs than any other computer on the project :-). So far there has been only 1 checkpoint, at around 21 hours. What do you want me to do?

Edit: All 5 jumped to 40% done at about 45.5 hours. New figures:

Code:

ecm_ru_1481927124_10_1155.c367_1770_0		46:24:33 (44:47:33)	40.000	69:36:50	53:33:21	23.9 °C	0	700.01 ecm	Running High P.	[2] 01:03:47	97.92	151.10 MB	
ecm_ru_1481927124_10_1155.c367_2450_0		46:24:33 (44:31:09)	40.000	69:36:50	53:33:50	23.9 °C	0	700.01 ecm	Running High P.	[2] 00:49:15	98.31	151.09 MB	
ecm_ru_1481927124_10_1155.c367_2120_0		46:24:33 (44:50:07)	40.000	69:36:50	53:34:06	23.9 °C	0	700.01 ecm	Running High P.	[2] 01:06:24	100.00	151.10 MB	
ecm_ru_1481927124_10_1155.c367_1670_0		46:24:33 (44:26:46)	40.000	69:36:50	53:34:23	23.9 °C	0	700.01 ecm	Running High P.	[2] 00:42:35	97.45	151.09 MB	
ecm_ru_1481927124_10_1155.c367_3020_0		46:24:33 (44:35:40)	40.000	69:36:50	53:39:16	23.9 °C	0	700.01 ecm	Running High P.	[2] 00:51:39	98.94	151.10 MB	
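
A rough sketch of the deadline check behind the numbers above. It assumes the column order described in this post: of the `H:MM:SS` values in a row, the third is the estimated time remaining and the fourth is the time left until the deadline. That reading of the columns, and the helper names, are assumptions for illustration only.

```python
import re

# Matches durations such as 44:26:31 or 177:46:02
TIME_RE = re.compile(r"\d+:\d{2}:\d{2}")


def to_seconds(hms):
    """Convert an H:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def will_miss_deadline(row):
    """True if the estimated remaining time exceeds the time to deadline.

    Assumed order of durations in a row:
    elapsed, CPU time, remaining, time to deadline, ...
    """
    times = TIME_RE.findall(row)
    remaining = to_seconds(times[2])
    deadline = to_seconds(times[3])
    return remaining > deadline
```

By this reading, both sets of figures say the same thing: even after the jump to 40%, the estimate of about 69 hours remaining is longer than the roughly 53 hours left before the deadline.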


Re: Anyone having issues with ecm 700.02 WUs?

#10 Post by yoyo » 20.12.2016 09:51

I deployed a version 700.02 for Win 64 which should be much faster.

I would abort those workunits. They might run too long and error out with time limit exceeded.
Usually they should finish in 50 hours.


Re: Anyone having issues with ecm 700.02 WUs?

#11 Post by UBT - Timbo » 20.12.2016 11:38

yoyo wrote: This "no heartbeat" looks bad and explains a bit the long runtime. It means that boinc restarts the workunit.
Do you run the latest boinc version?

Hi

OK - that's interesting.

The BOINC version is 7.6.22, so quite recent.

The PC spec is Win XP Pro, Intel i7 3820 @ 3.6GHz with 3 GB RAM. So not a slowcoach (usually).

I plan to abort these - I left 1x op and 1x xy running overnight and the elapsed time is higher but the %age to completion hasn't changed much.

Once aborted, I guess I will lose all the info in the folders, so if you need any info from the folders please advise. I will wait until I hear from you.

regards
Tim


Re: Anyone having issues with ecm 700.02 WUs?

#12 Post by Beyond » 20.12.2016 18:31

yoyo wrote: I deployed a version 700.02 for Win 64 which should be much faster.

I would abort those workunits. They might run too long and error out with time limit exceeded.
Usually they should finish in 50 hours.

They all aborted themselves at exactly 58:10:45. Strange:

Code:

ecm_ru_1481927124_10_1155.c367_1770_0	58:10:45 (56:16:03)	12/20/2016 11:12:38 AM	12/20/2016 11:27:37 AM		96.71	Reported: Computation error (197,)	1317.20 MB	1274.58 MB	
ecm_ru_1481927124_10_1155.c367_2450_0	58:10:45 (55:58:16)	12/20/2016 11:12:38 AM	12/20/2016 11:27:37 AM		96.20	Reported: Computation error (197,)	1317.20 MB	1255.02 MB	
ecm_ru_1481927124_10_1155.c367_2120_0	58:10:45 (56:20:23)	12/20/2016 11:12:38 AM	12/20/2016 11:27:37 AM		96.84	Reported: Computation error (197,)	1317.19 MB	1219.29 MB	
ecm_ru_1481927124_10_1155.c367_1670_0	58:10:45 (55:53:38)	12/20/2016 11:12:38 AM	12/20/2016 11:27:37 AM		96.07	Reported: Computation error (197,)	1317.20 MB	1212.13 MB	
ecm_ru_1481927124_10_1155.c367_3020_0	58:10:45 (56:04:59)	12/20/2016 11:12:38 AM	12/20/2016 11:27:37 AM		96.40	Reported: Computation error (197,)	1317.20 MB	1179.36 MB
Here's the reason:

<message>
exceeded elapsed time limit 209444.61 (1000000.00G/4.77G)
</message>
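
The identical 58:10:45 on all five tasks is probably not a coincidence. Below is a small sketch of the arithmetic, assuming the two figures in parentheses are the task's operation bound and the client's speed estimate for the app (both in units of 1e9), and that their quotient is the elapsed time limit in seconds. That interpretation is an assumption based on the message text, not something confirmed in this thread.

```python
def hms(seconds):
    """Format a number of seconds as H:MM:SS, rounded to the nearest second."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


limit_seconds = 209444.61   # from the error message
fpops_bound_g = 1000000.00  # first figure in parentheses

print(hms(limit_seconds))                       # 58:10:45
print(round(fpops_bound_g / limit_seconds, 2))  # 4.77
```

So the limit in the message matches the elapsed time at which every task stopped, and dividing the first figure by that limit gives back the 4.77G shown as the second figure.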

Edit: these were sent out again to some poor soul. Shouldn't they be cancelled? They've already caused almost 300 hours of wasted CPU time.
