VboxHeadless running without vboxwrapper on Mac OS

JeromeC
XBOX360-Installer
Posts: 76
Joined: 23.10.2010 19:38
Location: Poissy/France

VboxHeadless running without vboxwrapper on Mac OS

#1 Post by JeromeC » 18.06.2014 22:55

Something very weird is happening with this long WU, look at this:

[Image]

[Image]

I currently have 3 VM projects ongoing. Normally T4T and Atlas (a new project) are running and RNA is in a waiting state, yet I have 3 VBoxHeadless processes in memory and all 3 are running. You can see I have 9 apps actually running, and I only have 8 cores on my i7 CPU! (This explains why each task shows a relatively low CPU % compared to the usual 90% or more.)

I think RNA is the middle one, since I know T4T uses little RAM (the 297 MB one, it seems) and the new Atlas requires a huge amount (the 2 GB one, it seems), so RNA would be the 1.1 GB one.

The setting to keep apps in memory while they are waiting is OFF in my BOINC, so it should not be there at all!

What do you think?

ChristianB
Admin
Posts: 1920
Joined: 23.02.2010 22:12

Re: Long running work unit

#2 Post by ChristianB » 19.06.2014 08:00

I also noticed this on my Linux desktop. There were three RNA World tasks running although there should only have been 2. Could you send me the stderr.txt of the task? What happens if you stop the BOINC client? Are all VBoxHeadless processes terminated? What happens if you restart the computer? How many VBoxHeadless processes are there after rebooting?
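
The "how many VBoxHeadless processes are there" question can be answered from a process listing instead of a screenshot. Below is a minimal Python sketch, assuming a Unix-like system (Mac OS or Linux) where `ps -A -o pid= -o comm=` prints one process per line. The function names are made up for this illustration, and matching by substring is an assumption about how the wrapper and VM binaries show up in `ps`.

```python
import subprocess


def count_vm_processes(ps_output):
    """Count wrapper and VM processes in the text printed by `ps -A -o pid= -o comm=`."""
    counts = {"vboxwrapper": 0, "VBoxHeadless": 0}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # "<pid> <command name or path>"
        if len(parts) != 2:
            continue
        command = parts[1].lower()
        if "vboxwrapper" in command:
            counts["vboxwrapper"] += 1
        elif "vboxheadless" in command:
            counts["VBoxHeadless"] += 1
    return counts


def unmanaged_vm_suspected(counts):
    """True when more VM processes than wrappers are alive (the symptom in this thread)."""
    return counts["VBoxHeadless"] > counts["vboxwrapper"]


def snapshot():
    """Run ps once and return the counts for the current machine."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return count_vm_processes(result.stdout)
```

Calling `snapshot()` before stopping the client, after stopping it, and after a reboot would give the three numbers asked for above. A VM started by hand outside BOINC would also be counted, so the comparison is only a hint.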

Re: Long running work unit

#3 Post by JeromeC » 19.06.2014 15:00

I'm afraid I won't be able to answer all these questions: a few minutes after I posted this, another project's WU finished and RNA was again "officially running" in the BOINC Manager, with 8 tasks at the same time as usual (not counting NCI tasks, obviously). It is also the first time I have ever had 3 VMs at once: I don't think I ever got more than one RNA task at a time, same for T4T, and this is my first Atlas task ever. When I had two (T4T + RNA) it seemed to behave properly, so maybe it's an issue only when more than 2 BOINC VMs are present?

I'll try to send you the stderr.txt though (is there one per project or per task? In the slot directory or in the project directory?), and I'll see whether this behavior shows up again, but I'm not very often in front of my BOINC Manager...

Re: Long running work unit

#4 Post by ChristianB » 19.06.2014 18:10

The stderr.txt is in the slot directory. The interesting thing to find out is how the newly started vboxwrapper got control over the already running VBoxHeadless process. I hope I can find it in the log file.
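
To make "in the slot directory" concrete, here is a small Python sketch that lists the stderr.txt file of every slot under a BOINC data directory. The layout `<data_dir>/slots/<n>/stderr.txt` and the helper names are assumptions made for this illustration; the data directory itself has to be supplied by the user, since its location differs between systems.

```python
from pathlib import Path


def find_slot_stderr(data_dir):
    """Return the stderr.txt files found directly inside <data_dir>/slots/<n>/."""
    slots = Path(data_dir) / "slots"
    return sorted(p for p in slots.glob("*/stderr.txt") if p.is_file())


def looks_like_vboxwrapper_log(path):
    """Heuristic: the wrapper logs quoted in this thread contain the word 'vboxwrapper'."""
    with open(path, errors="replace") as handle:
        return any("vboxwrapper" in line for line in handle)
```

For example, `[p for p in find_slot_stderr(data_dir) if looks_like_vboxwrapper_log(p)]` narrows the list down to the slots that belong to VM tasks. Which of those belongs to RNA World still has to be checked by hand.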

Re: Long running work unit

#5 Post by JeromeC » 22.06.2014 16:22

Sorry for the delay, here is the file.

I found this:
2014-06-17 23:53:43 (17146): Error in delete stale snapshot for VM: -2147024809
Command:
VBoxManage -q snapshot "boinc_ab0e47072b0b9d80" delete "92c0a719-ce9b-447d-93e1-e029ca1d9df2"
Output:
VBoxManage: error: Code NS_ERROR_INVALID_ARG (0x80070057) - Invalid argument value (extended info not available)
VBoxManage: error: Context: "DeleteSnapshot(bstrSnapGuid.raw(), pProgress.asOutParam())" at line 421 of file VBoxManageSnapshot.cpp

2014-06-17 23:53:43 (17146): ERROR: Checkpoint maintenance failed, rescheduling task for a later time. (-2147024809)
2014-06-17 23:53:43 (17146): Powering off VM.
2014-06-17 23:53:43 (17146): Error in poweroff VM for VM: -2135228414
Command:
VBoxManage -q controlvm "boinc_ab0e47072b0b9d80" poweroff
Output:
VBoxManage: error: Invalid machine state: DeletingSnapshotOnline (must be Running, Paused or Stuck)
VBoxManage: error: Details: code VBOX_E_INVALID_VM_STATE (0x80bb0002), component Console, interface IConsole, callee nsISupports
VBoxManage: error: Context: "PowerDown(progress.asOutParam())" at line 222 of file VBoxManageControlVM.cpp

2014-06-17 23:53:43 (17146): VM did not power off when requested.
and the next thing is:
2014-06-18 23:54:46 (96533): vboxwrapper (7.3.26086): starting
2014-06-18 23:54:46 (96533): Feature: Checkpoint interval offset (561 seconds)
2014-06-18 23:54:46 (96533): Feature: Enabling trickle-ups (Interval: 14400.000000)
2014-06-18 23:54:46 (96533): Detected: VirtualBox 4.3.12r93733
2014-06-18 23:54:46 (96533): Detected: Sandbox Configuration Enabled
2014-06-18 23:54:47 (96533): Detected: Minimum checkpoint interval (1800.000000 seconds)
2014-06-18 23:54:48 (96533): Powering off VM.
2014-06-18 23:54:50 (96533): Successfully powered off VM.
2014-06-18 23:54:50 (96533): Restore from previously saved snapshot.
2014-06-18 23:54:51 (96533): Restore completed.
2014-06-18 23:54:51 (96533): Starting VM.
2014-06-18 23:55:13 (96533): Successfully started VM. (PID = '96578')
2014-06-18 23:55:13 (96533): Reporting VM Process ID to BOINC.
2014-06-18 23:55:16 (96533): Lowering VM Process priority.
2014-06-18 23:55:18 (96533): VM state change detected. (old = 'poweroff', new = 'running')
2014-06-18 23:55:20 (96533): Status Report: Elapsed Time: '921383.190122'
2014-06-18 23:55:20 (96533): Status Report: CPU Time: '771475.250000'
2014-06-18 23:55:20 (96533): Status Report: Trickle-Up Event.
2014-06-18 23:55:20 (96533): Preference change detected
2014-06-18 23:55:20 (96533): Setting CPU throttle for VM. (100%)
2014-06-18 23:55:21 (96533): Checkpoint Interval is now 240 seconds.
These two sections directly follow each other in the file, so I don't know what happened in between that made BOINC "take back" the VM properly (I hope).
Attachments
RNA stderr.txt.zip
(26.03 KiB) Downloaded 279 times
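
The negative numbers printed by vboxwrapper in the excerpt above appear to be the same status codes that VBoxManage reports in hex, read as signed 32-bit integers. This is an inference from the numbers in the log, not something the log states. A short Python check, with a helper name invented for this illustration:

```python
def as_unsigned_hex(code):
    """Show a signed 32-bit status code in the unsigned hex form VBoxManage prints."""
    return "0x{:08x}".format(code & 0xFFFFFFFF)


print(as_unsigned_hex(-2147024809))  # 0x80070057, logged with "delete stale snapshot"
print(as_unsigned_hex(-2135228414))  # 0x80bb0002, logged with "poweroff VM"
```

The two results match NS_ERROR_INVALID_ARG (0x80070057) and VBOX_E_INVALID_VM_STATE (0x80bb0002) as quoted in the VBoxManage output, so the numeric code carries no extra information beyond the error name.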

Re: Long running work unit

#6 Post by JeromeC » 25.06.2014 22:25

It's happening again, but this time RNA is running OK: the Atlas VM has started and it's the T4T VM that is

[Image]

and is still running in memory...

I'm going to mention it in the T4T forum, but I guess the new Atlas VM is the guilty one: it started when the Atlas VM started!

Re: Long running work unit

#7 Post by JeromeC » 26.06.2014 20:40

When I came back home this evening I found that Atlas had been "abandoned" (but not by me!), and also that the T4T and RNA WUs were "pending / unmanageable", still running in memory with 8 other WUs at the same time... So I had to stop BOINC and kill the 2 VBoxHeadless processes, which would not stop on their own.

Some time after restarting everything I got a new Atlas WU, and so far things seem to be OK:

[Image]

[Image]

Wait & see.

Re: Long running work unit

#8 Post by JeromeC » 26.06.2014 20:53

... I didn't have to wait long:

[Image]

[Image]

T4T is "unmanageable", with only 2 VB wrappers in memory for 3 running VMs and 9 tasks running at the same time on my i7...

I think it happened when one malaria task finished and another one started; at that moment something broke one of the 3 wrappers.

Re: Long running work unit

#9 Post by ChristianB » 27.06.2014 07:41

That's odd, I thought we had fixed this issue with "vboxheadless running without the wrapper". This zombie VM is bad for overall performance and is only resolved with a restart. What does the stderr.txt of the T4T vboxwrapper say about why it can't manage the VM?

Re: Long running work unit

#10 Post by JeromeC » 27.06.2014 09:48

I'll try to send this tonight. For the moment I can tell you that the wrapper disappears from the task list and only the VBoxHeadless process remains. If I suspend BOINC (without keeping apps in memory) or even shut it down, the VBoxHeadless process keeps running and I have to kill it separately.

The other VM wrappers (T4T, Atlas) continue to run when this one fails, but then Atlas fails after a while anyway (actually its tasks are "abandoned", not by me but by the project!). I guess this newcomer is at fault, since before I tried it I had no problem running T4T + RNA at the same time.

Or maybe the issue comes from having 3 VMs at the same time, while 2 were OK?

Re: Long running work unit

#11 Post by JeromeC » 28.06.2014 09:36

Well, I have set Atlas not to send any more WUs, but this morning I found RNA in this "suspended / unmanageable" state again, with T4T running normally, 2 VBoxHeadless processes in memory and only one wrapper running (I assume the T4T one)... I had to suspend, stop and restart BOINC, and now it has found RNA again: there are 2 wrappers running plus the 2 VMs, and the status is "running" for both...

Very strange: this never happened before and now it happens all the time. I thought it was because of Atlas, but it does it even with no Atlas WU!

I attach the latest stderr of RNA.
Attachments
stderr.txt.zip
(33.15 KiB) Downloaded 293 times

Re: Long running work unit

#12 Post by JeromeC » 28.06.2014 11:39

Grrrrr... again. There is actually quite some stuff in the stderr:
2014-06-28 12:34:34 (56938): Error in delete stale snapshot for VM: -2147024809
Command:
VBoxManage -q snapshot "boinc_ab0e47072b0b9d80" delete "8c3e3a94-a697-41d4-b0c6-802014f8e4ee"
Output:
VBoxManage: error: Code NS_ERROR_INVALID_ARG (0x80070057) - Invalid argument value (extended info not available)
VBoxManage: error: Context: "DeleteSnapshot(bstrSnapGuid.raw(), pProgress.asOutParam())" at line 421 of file VBoxManageSnapshot.cpp

2014-06-28 12:34:34 (56938): ERROR: Checkpoint maintenance failed, rescheduling task for a later time. (-2147024809)
2014-06-28 12:34:34 (56938): Powering off VM.
2014-06-28 12:34:35 (56938): Error in poweroff VM for VM: -2135228414
Command:
VBoxManage -q controlvm "boinc_ab0e47072b0b9d80" poweroff
Output:
VBoxManage: error: Invalid machine state: DeletingSnapshotOnline (must be Running, Paused or Stuck)
VBoxManage: error: Details: code VBOX_E_INVALID_VM_STATE (0x80bb0002), component Console, interface IConsole, callee nsISupports
VBoxManage: error: Context: "PowerDown(progress.asOutParam())" at line 222 of file VBoxManageControlVM.cpp

2014-06-28 12:34:35 (56938): VM did not power off when requested.
So what am I to do now? Is it actually an issue with this WU? Will it run normally again? Are the 17 days of calculation already done somehow lost? Can I let this WU "continue"?
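
Since the same error block shows up on several days, it may help to pull only the failing steps out of a long stderr.txt. The Python sketch below assumes the line format seen in the excerpts of this thread, `YYYY-MM-DD HH:MM:SS (number): message`, where the number in parentheses appears to be the process ID of the wrapper. The function name and the choice of which messages count as errors are assumptions made for this illustration.

```python
import re
from collections import defaultdict

# Timestamp, number in parentheses, then the message text.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def errors_by_wrapper(log_text):
    """Map each wrapper PID to the (timestamp, message) pairs that report a failure."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # VBoxManage output and blank lines carry no timestamp prefix
        stamp, pid, message = match.groups()
        if message.lower().startswith("error") or "did not power off" in message:
            found[int(pid)].append((stamp, message))
    return dict(found)
```

Run over the whole file, this would show at a glance whether each wrapper run ended with the "delete stale snapshot" failure followed by the failed poweroff, or whether other errors are mixed in.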
