Rechenkraft.net e.V.

Posted: **08.11.2014 19:03**

Hi, so I started crunching and it went well for a while (I'm at 1.4% completion). Then I started seeing this, and since then it never seems to actually start crunching again. Does anyone know what it's waiting for?

The deadline has passed, could that be it? In the past the deadline didn't cause an issue.

Posted: **09.11.2014 13:14**

See if you can find a "replay" text file in your slots directory. It usually tells you what commands were sent, and if you scroll to the bottom, you might find a command that failed. Return values of 0 usually mean success.

I have sometimes seen this behavior when the following situation happened:
- BOINC created a 2nd VM snapshot (like it normally does)
- User/system stopped BOINC before it had a chance to delete the old snapshot
- BOINC is started again, and runs for a little (~10-20 minutes) and creates a new snapshot (like it normally does)
- But when BOINC tries to delete the oldest snapshot, that oldest parent snapshot now has 2 children, and VirtualBox disallows the deletion (it gives an error message/code)
- BOINC doesn't know what to do, and considers the task temporarily unmanageable.
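
As a rough illustration of the deletion rule described in the list above, here is a toy Python model. It is not VirtualBox or BOINC code: the `Snapshot` class and the `can_delete` helper are hypothetical names made up for this sketch, and the rule it encodes is simply the one stated above (a snapshot with 2 children can't be deleted).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snapshot:
    """Hypothetical stand-in for one node in a VM's snapshot tree."""
    name: str
    children: List["Snapshot"] = field(default_factory=list)


def can_delete(snapshot: Snapshot) -> bool:
    """Toy version of the rule: deletable only with at most 1 child."""
    return len(snapshot.children) <= 1


# Normal case: the old snapshot has a single child (the new one).
old = Snapshot("old")
old.children.append(Snapshot("new"))
normal_case = can_delete(old)

# Problem case: the oldest snapshot has ended up with 2 children.
oldest = Snapshot("oldest")
oldest.children.append(Snapshot("second"))
oldest.children.append(Snapshot("third"))
problem_case = can_delete(oldest)

print(normal_case, problem_case)
```

In the normal case the delete goes through; in the problem case it is refused, which matches the point where BOINC gives up and postpones the task.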

So, see if the replay/trace texts lead you to suspect that problematic behavior.

If you suspect this has happened, you might be able to fix it. Close BOINC, open VirtualBox Manager, click the VM, and click Snapshots. If you see 2 snapshots (the node that says "Current state" is NOT a snapshot), right-click the oldest one and remove it. Then see if BOINC will run the task successfully for 20+ minutes (enough to create a snapshot and auto-delete the parent).

See, it's supposed to only have 1 snapshot ever, except when BOINC is saving off a new copy, in which case BOINC creates a 2nd snapshot, and then deletes the older snapshot, again leaving 1 snapshot.

Also, if the replay/trace files indicate that some other failure happened, then I apologize for misdiagnosing your problem. Perhaps you could paste a snippet, so we can investigate further. Basically, you have to help us dig here. And it's all about the text files in the slots directory.

Note: Regarding the deadline, the client program on your computer will communicate with RNA World occasionally, and RNA World will auto-extend the deadline on the server. So, you should be focusing on the server deadline: if it's in the future (because it got auto-extended), then you're good. The client doesn't currently update it in the UI, but seeing an old deadline there will not be the cause of any problems.

Regards,
Jacob

Posted: **16.11.2014 19:26**

Sorry it's taken me a while to respond. I do see the 20-minute activity, then it stops.

I exited BOINC (stopped all crunching) and opened the VirtualBox Manager. I actually have 3 snapshots: one from today, one from 21 days ago and one from 22 days ago. The problem is I can't delete the one from 22 or 21 days ago.
When I try to delete the first time, nothing happens. When I try a second time, I get the error:

There is no virtual machine with the identifier c87e884a-ac2b-4cca-9c4f-68b2c28e9b1d.

Callee RC: RPC_S_SERVER_UNAVAILABLE 0x800706BA (0x800706BA)

Posted: **16.11.2014 19:38**

Try this:
- In BOINC, tell it Activity -> Suspend (so it will stop communicating with the VM)
- Restart the PC, to make absolutely sure it releases all of its communications, and also to make sure VirtualBox starts up fresh
- Open VirtualBox Manager
- For the VM in question, look at the snapshots and start from the bottom. You'll want to keep the latest, but starting from the bottom, see if you can delete the duplicates. You'll only be allowed to delete a snapshot if it has at most 1 child, so be sure to delete in an order that respects that.
- At the end of the day, you should be left with only 1 snapshot.
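
If the VirtualBox Manager GUI won't cooperate, the same steps could in principle be tried from the command line. The following is only a sketch: `VBoxManage snapshot <vm> list` and `VBoxManage snapshot <vm> delete <uuid>` are real VBoxManage subcommands, but the helper function names, the VM name, and the UUID placeholder are made up for illustration. Nothing is executed unless you call `run` yourself.

```python
import subprocess
from typing import List


def list_snapshots_cmd(vm_name: str) -> List[str]:
    """Build the command that lists a VM's snapshots."""
    return ["VBoxManage", "snapshot", vm_name, "list"]


def delete_snapshot_cmd(vm_name: str, snapshot: str) -> List[str]:
    """Build the command that deletes one snapshot (by name or UUID)."""
    return ["VBoxManage", "snapshot", vm_name, "delete", snapshot]


def run(cmd: List[str]) -> int:
    """Run a command and return its exit code (0 usually means success)."""
    return subprocess.run(cmd).returncode


# Hypothetical VM name; use the one shown in VirtualBox Manager.
vm = "my_boinc_vm"
print(" ".join(list_snapshots_cmd(vm)))
# Placeholder UUID; take the real one from the list output.
print(" ".join(delete_snapshot_cmd(vm, "oldest-snapshot-uuid")))
```

As with the GUI steps, do this only while BOINC is suspended, so nothing else is holding the VM.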

Any luck?

Posted: **16.11.2014 19:41**

OK, I managed to delete the newest and the 21-day-old snapshots by restoring the 22-day-old snapshot, which created a save of the current state. This then allowed me to delete the most recent and the 21-day-old snapshots.
I did this by pure trial and error, but it seems to be working. I will have to see if it goes wrong after 20 minutes again.

Posted: **16.11.2014 19:48**

Okay. The replay file in the slot directory is the best way to see what went wrong, whenever it says things like "Scheduler Wait (VM job unmanageable, restarting later.)" The replay shows what commands were sent, and what the results were.
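
Since the interesting part is usually at the bottom of the file, a small helper that prints just the last few lines can save some scrolling. This is a minimal sketch: the function names are made up, the sample text is invented, and it makes no assumption about the replay file's format beyond it being plain text.

```python
from collections import deque
from typing import List


def last_lines(text: str, count: int = 5) -> List[str]:
    """Return the last `count` non-empty lines of `text`."""
    stripped = (line.strip() for line in text.splitlines())
    return list(deque((line for line in stripped if line), maxlen=count))


def last_lines_of_file(path: str, count: int = 5) -> List[str]:
    """Same as last_lines, but reads the text from a file first."""
    with open(path, "r", errors="replace") as handle:
        return last_lines(handle.read(), count)


# Made-up sample content, just to show the behavior.
sample = "command one\n\ncommand two\ncommand three\n"
print(last_lines(sample, 2))
```

To use it on a real task, pass `last_lines_of_file` the full path to the replay file inside that task's slot directory.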

Posted: **17.11.2014 10:39**

Hmm, I'm now getting BOINC Manager saying:
Postponed: VM Hypervisor failed to enter an online state in a timely fashion.

So I need more help. Sorry, you will have to explain "the replay file in the slot directory". Where is the slot directory located?

Posted: **17.11.2014 10:57**

OK, I found the replay file. Unfortunately I can't understand it as I'm not familiar with the syntax. Any pointers on keywords to search for?

Posted: **17.11.2014 12:05**

Well, the WU just failed, so I guess don't worry about it. Hopefully if I get a new one it will all be OK.

Posted: **17.11.2014 15:34**

Well, part of the replay was put into the stderr.txt file upon task failure.
http://www.rnaworld.de/rnaworld/result. ... d=14948544

The errors at the bottom look like they're saying that the VM was locked by some other task/process.
Did you at any time start the VM outside of BOINC?

Anyway, good luck with your next task when you get it.

Posted: **20.11.2014 09:36**

I think I did start it accidentally while I was messing around trying to delete those snapshots. Will bear that in mind for the future, thanks.

Posted: **15.12.2014 03:13**

For the past several days, mine has been saying, "Postponed: VM job unmanageable, restarting later."

Last entry in vbox_replay.txt

VBoxManage -q --version
VBoxManage -q list hostinfo
VBoxManage -q showvminfo "boinc_edaf664144deb5f3" --machinereadable
VBoxManage -q snapshot "boinc_edaf664144deb5f3" restorecurrent
VBoxManage -q startvm "boinc_edaf664144deb5f3" --type headless
VBoxManage -q controlvm "boinc_edaf664144deb5f3" cpuexecutioncap 80
VBoxManage -q bandwidthctl "boinc_edaf664144deb5f3" set "boinc_edaf664144deb5f3_net" --limit 600K
VBoxManage -q controlvm "boinc_edaf664144deb5f3" pause
VBoxManage -q snapshot "boinc_edaf664144deb5f3" take boinc_977423
VBoxManage -q controlvm "boinc_edaf664144deb5f3" resume
VBoxManage -q snapshot "boinc_edaf664144deb5f3" list
VBoxManage -q snapshot "boinc_edaf664144deb5f3" delete "4aa1143d-a677-445b-aab2-03fe53cde355"
VBoxManage -q snapshot "boinc_edaf664144deb5f3" delete "5df11cbc-47f4-413f-b09b-4a8ca854d97a"
VBoxManage -q controlvm "boinc_edaf664144deb5f3" poweroff

In Snapshots directory, I have a bunch of .vdi files, and these:
2014-12-08T21-24-51-626094900Z.sav 12/8/2014 667,835kb
2014-12-10T07-11-03-661487600Z.sav 12/10/2014 667,842kb
2014-12-14T22-20-51-265033600Z.sav 12/14/2014 667,838kb

Since it's been messing up for several days, which should I delete?

Scheduler Wait (VM job unmanageable, restarting later).
