Seite 1 von 3

New Unit. Strange (maybe?) behavior.

Verfasst: 31.01.2016 02:34
von Gibson Praise
I received a unit a couple of days ago and even knowing the unit is going to run forever, I'm not sure it is actually running right.

I checked and found little about problems at the beginning of these VM units (other than a quick abort). After a couple of hours it claimed 0.1 percent completion and 1306:56:28 hours to go. The fact that 33 hours have passed and percent complete has not moved doesn't bother me so much as the 11 - 12 second loop the "remaining time" seems caught in. ( It is still 1306:56:2? ).

If this is expected behavior, great I'll let it run. If not I can return it to the pool.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 31.01.2016 09:45
von ChristianB
As long as the progress bar is progressing the task is working. The progress bar will get "stuck" at 98,765% at some time but that only means that our estimate of calculations needed was way too short.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 31.01.2016 15:49
von Gibson Praise
ChristianB hat geschrieben:As long as the progress bar is progressing the task is working.
I was going to edit my post to say I thought I had my answer, based on further post reading.

However, based on your statement, I am not sure now. The progress bar is NOT progressing. :worry: It remains at 0,1% and has never moved from that initial number. However, another post mentioned the file "progress.txt" file in the shared directory under the proper slot. That number is increasing (it now stands at 0.031158) and is updated every minute or less. vbox.log shows regular saves of the state of the VM and no apparent errors.

I am concerned.

edit -> task 14950811, work unit 6330836

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 31.01.2016 16:44
von ChristianB
Ah, ok. That seems to be a bug Jacob reported too. The vboxwrapper is not reporting the value from progress.txt to the Client. If you don't want to babysitt and watch the progress.txt manually you can abort the task now that it has not so much hours.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 31.01.2016 22:06
von Gibson Praise
I don't mind checking in on progress.txt every couple of days so I'll let it run.

As far as the bug, it has persisted through a reboot of the system. BOINC progress bar still claims 0,1% done. progress.txt is incrementing ( currently 0.035200 .. which I am hoping reads as 3.5% ).

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 01.02.2016 05:50
von ChristianB
Gibson Praise hat geschrieben:I don't mind checking in on progress.txt every couple of days so I'll let it run.

As far as the bug, it has persisted through a reboot of the system. BOINC progress bar still claims 0,1% done. progress.txt is incrementing ( currently 0.035200 .. which I am hoping reads as 3.5% ).
Yes, that's right. The value in progress.txt is always between 0 and 1.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 02.02.2016 18:03
von Jacob Klein
Gibson:
Might I recommend updating to the latest version of BOINC? I think I see that you're running 7.6.9, but 7.6.22 is the current recommended version. It might help the progress problem, but it might not. Still worth upgrading.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 03.02.2016 10:20
von Michael H.W. Weber
Jacob Klein hat geschrieben:Might I recommend updating to the latest version of BOINC? I think I see that you're running 7.6.9, but 7.6.22 is the current recommended version. It might help the progress problem, but it might not. Still worth upgrading.
Maybe Christian should first comment on the question whether an upgrade of BOINC might negatively affect one of these long running tasks? Even I have several extremely long running tasks in progress and use the same "outdated" BOINC version because so far I did not dare to change to a newer one...

Michael.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 03.02.2016 14:43
von VitaliiKoshura
Hello.

I have the same issue.
Also I saw a plenty of next errors in stderr.txt:

2016-02-03 15:38:08 (32144): Guest Log: ./boinc_app: line 80: cpu_time_cmd.txt: Read-only file system
2016-02-03 15:38:08 (32144): Guest Log: ./boinc_app: line 93: free_mem.txt: Read-only file system

I think this is a root of problem: application can't save its progress and stales in an infinite cycle.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 03.02.2016 14:54
von ChristianB
VitaliiKoshura hat geschrieben:Hello.

I have the same issue.
Also I saw a plenty of next errors in stderr.txt:

2016-02-03 15:38:08 (32144): Guest Log: ./boinc_app: line 80: cpu_time_cmd.txt: Read-only file system
2016-02-03 15:38:08 (32144): Guest Log: ./boinc_app: line 93: free_mem.txt: Read-only file system

I think this is a root of problem: application can't save its progress and stales in an infinite cycle.
That is something new. Why would the file system inside the VM be Read-only? I would think that the VM would not start if the .vdi is read-only. Then again why would the .vdi file be read-only? Can you give me some more context what's going on in your stderr.txt? It would be interesting to know what over files can not be written. The science app updates some files in the same folder as cpu_time_cmd.txt

Also I don't think that updating the BOINC Client has a great impact on stability of the vboxwrapper. As far as I can see there were no changes to the API so the communication between vboxwrapper and the Client should be undisturbed. But those two files in the message above are used to calculate the progress. If they can not be written the progress seems to stop. I have no idea why the filesystem inside the VM should switch to read-only.

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 03.02.2016 16:27
von VitaliiKoshura
2016-02-03 07:43:57 (32144): vboxwrapper (7.7.26179): starting
2016-02-03 07:43:59 (32144): Feature: Checkpoint interval offset (33 seconds)
2016-02-03 07:43:59 (32144): Feature: Enabling trickle-ups (Interval: 14400.000000)
2016-02-03 07:43:59 (32144): Detected: VirtualBox COM Interface (Version: 5.0.14)
2016-02-03 07:43:59 (32144): Detected: Minimum checkpoint interval (1800.000000 seconds)
2016-02-03 07:43:59 (32144): Create VM. (boinc_43d904d9254d3dd5, slot#5)
2016-02-03 07:43:59 (32144): Setting Memory Size for VM. (3072MB)
2016-02-03 07:43:59 (32144): Setting CPU Count for VM. (1)
2016-02-03 07:43:59 (32144): Setting Chipset Options for VM.
2016-02-03 07:43:59 (32144): Setting Boot Options for VM.
2016-02-03 07:43:59 (32144): Disabling VM Network Access.
2016-02-03 07:43:59 (32144): Setting Network Configuration for NAT.
2016-02-03 07:43:59 (32144): Disabling USB Support for VM.
2016-02-03 07:43:59 (32144): Disabling COM Port Support for VM.
2016-02-03 07:43:59 (32144): Disabling LPT Port Support for VM.
2016-02-03 07:43:59 (32144): Disabling Audio Support for VM.
2016-02-03 07:43:59 (32144): Disabling Clipboard Support for VM.
2016-02-03 07:43:59 (32144): Disabling Drag and Drop Support for VM.
2016-02-03 07:43:59 (32144): Adding storage controller(s) to VM.
2016-02-03 07:43:59 (32144): Adding virtual disk drive to VM. (vm_image.vdi)
2016-02-03 07:44:00 (32144): Adding VirtualBox Guest Additions to VM.
2016-02-03 07:44:00 (32144): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2016-02-03 07:44:00 (32144): Enabling remote desktop for VM.
2016-02-03 07:44:00 (32144): Required extension pack not installed, remote desktop not enabled.
2016-02-03 07:44:00 (32144): Enabling shared directory for VM.
2016-02-03 07:44:00 (32144): Starting VM. (boinc_43d904d9254d3dd5, slot#5)
2016-02-03 07:44:20 (32144): Guest Log: BIOS: VirtualBox 5.0.14
2016-02-03 07:44:20 (32144): Guest Log: BIOS: ata0-0: PCHS=4161/16/63 LCHS=1024/64/63
2016-02-03 07:44:20 (32144): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2016-02-03 07:44:20 (32144): Guest Log: BIOS: Booting from Hard Disk...
2016-02-03 07:44:20 (32144): Guest Log: vboxguest: major 0, IRQ 20, I/O port d020, MMIO at 00000000f0000000 (size 0x400000)
2016-02-03 07:44:20 (32144): Successfully started VM. (PID = '27556')
2016-02-03 07:44:20 (32144): Reporting VM Process ID to BOINC.
2016-02-03 07:44:30 (32144): Guest Log: [boinc_app] cms_vbox_control_1.12 started
2016-02-03 07:44:30 (32144): Guest Log: [boinc_app] unzip operations completed
2016-02-03 07:44:30 (32144): Guest Log: [boinc_app] init phase completed
2016-02-03 07:44:30 (32144): Guest Log: [boinc_app] read forecast file
2016-02-03 07:44:30 (32144): Guest Log: [boinc_app] start iteration over fcfilelist
2016-02-03 07:44:30 (32144): VM state change detected. (old = 'poweroff', new = 'running')
2016-02-03 07:44:50 (32144): Preference change detected
2016-02-03 07:44:50 (32144): Setting CPU throttle for VM. (100%)
2016-02-03 07:44:50 (32144): Setting checkpoint interval to 1800 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 1800 seconds))
2016-02-03 07:45:54 (32144): Guest Log: # cmsearch :: search a sequence database with an RNA CM
2016-02-03 07:45:54 (32144): Guest Log: # INFERNAL 1.0.2 (October 2009)
2016-02-03 07:45:54 (32144): Guest Log: # Copyright (C) 2009 HHMI Janelia Farm Research Campus
2016-02-03 07:45:54 (32144): Guest Log: # Freely distributed under the GNU General Public License (GPLv3)
2016-02-03 07:45:54 (32144): Guest Log: # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2016-02-03 07:45:54 (32144): Guest Log: # command: ./cmsearch --forecast 1 -T 0.0 --fil-T-hmm 0.0 --fil-T-qdb 0.0 RF00028_Intron_gpI.cm Oryza-sativa-Japonica-Group_CM000144.lin.EMBL.fasta
2016-02-03 07:45:54 (32144): Guest Log: # date: Wed Feb 3 05:44:27 2016
2016-02-03 07:45:54 (32144): Guest Log: # compiler: 4.3.4, -m64 -O2 -static -msse4
2016-02-03 07:45:54 (32144): Guest Log: # num seqs: 1
2016-02-03 07:45:54 (32144): Guest Log: # dbsize(Mb): 57.253018
2016-02-03 07:45:54 (32144): Guest Log: #
2016-02-03 07:45:54 (32144): Guest Log: # Pre-search info for CM 1: Intron_gpI
2016-02-03 07:45:54 (32144): Guest Log: #
2016-02-03 07:45:54 (32144): Guest Log: # cutoffs predictions
2016-02-03 07:45:54 (32144): Guest Log: # ------------------- --------------------
2016-02-03 07:45:54 (32144): Guest Log: # rnd mod alg cfg beta E value bit sc surv run time
2016-02-03 07:45:54 (32144): Guest Log: # --- --- --- --- ----- ---------- ------- ------- -----------
2016-02-03 07:45:54 (32144): Guest Log: 1 hmm fwd loc - 4572.163 0.00 0.5967 01:20:54.99
2016-02-03 07:45:54 (32144): Guest Log: 2 cm ins loc 1e-15 19882.111 0.00 0.1743 1961:45:11.44
2016-02-03 07:45:54 (32144): Guest Log: # --- --- --- --- ----- ---------- ------- ------- -----------
2016-02-03 07:45:54 (32144): Guest Log: all - - - - - - - 1963:06:06.44
2016-02-03 07:45:54 (32144): Guest Log: #
2016-02-03 07:45:54 (32144): Guest Log: #
2016-02-03 07:45:54 (32144): Guest Log: # CPU time: 84.01u 1.86s 00:01:25.87 Elapsed: 00:01:26
2016-02-03 07:45:54 (32144): Guest Log: grep: forecast.txt: No such file or directory
2016-02-03 07:45:54 (32144): Guest Log: [boinc_app] forecast phase completed
2016-02-03 07:45:54 (32144): Guest Log: [boinc_app] forecast: 0 sec
2016-02-03 07:45:54 (32144): Guest Log: [boinc_app] start iteration over filelist
2016-02-03 07:45:54 (32144): Guest Log: [boinc_app] worker process with PID: 1859 started
2016-02-03 08:15:27 (32144): Creating new snapshot for VM.
2016-02-03 08:15:41 (32144): Checkpoint completed.
2016-02-03 08:45:30 (32144): Creating new snapshot for VM.
2016-02-03 08:45:44 (32144): Deleting stale snapshot.
2016-02-03 08:45:44 (32144): Checkpoint completed.
2016-02-03 09:15:33 (32144): Creating new snapshot for VM.
2016-02-03 09:15:48 (32144): Deleting stale snapshot.
2016-02-03 09:15:48 (32144): Deleting stale snapshot.
2016-02-03 09:15:48 (32144): Checkpoint completed.
2016-02-03 09:24:59 (32144): Status Report: Elapsed Time: '6008.714500'
2016-02-03 09:24:59 (32144): Status Report: CPU Time: '5725.735903'
2016-02-03 09:45:43 (32144): Creating new snapshot for VM.
2016-02-03 09:45:58 (32144): Deleting stale snapshot.
2016-02-03 09:45:58 (32144): Deleting stale snapshot.
2016-02-03 09:45:58 (32144): Deleting stale snapshot.
2016-02-03 09:45:58 (32144): Checkpoint completed.
2016-02-03 10:15:48 (32144): Creating new snapshot for VM.
2016-02-03 10:16:06 (32144): Deleting stale snapshot.
2016-02-03 10:16:06 (32144): Deleting stale snapshot.
2016-02-03 10:16:32 (32144): Deleting stale snapshot.
2016-02-03 10:16:33 (32144): Deleting stale snapshot.
2016-02-03 10:16:35 (32144): Checkpoint completed.
2016-02-03 10:45:52 (32144): Creating new snapshot for VM.
2016-02-03 10:46:09 (32144): Deleting stale snapshot.
2016-02-03 10:46:09 (32144): Deleting stale snapshot.
2016-02-03 10:46:09 (32144): Deleting stale snapshot.
2016-02-03 10:46:09 (32144): Deleting stale snapshot.
2016-02-03 10:46:12 (32144): Checkpoint completed.
2016-02-03 11:04:59 (32144): Status Report: Elapsed Time: '12008.977501'
2016-02-03 11:04:59 (32144): Status Report: CPU Time: '10588.396274'
2016-02-03 11:15:52 (32144): Creating new snapshot for VM.
2016-02-03 11:16:09 (32144): Deleting stale snapshot.
2016-02-03 11:16:14 (32144): Deleting stale snapshot.
2016-02-03 11:16:14 (32144): Deleting stale snapshot.
2016-02-03 11:16:14 (32144): Deleting stale snapshot.
2016-02-03 11:16:14 (32144): Deleting stale snapshot.
2016-02-03 11:16:14 (32144): Checkpoint completed.
2016-02-03 11:17:16 (32144): Guest Log: expr: write error: Read-only file system
2016-02-03 11:17:16 (32144): Guest Log: ./boinc_app: line 93: free_mem.txt: Read-only file system
2016-02-03 11:17:47 (32144): Guest Log: ./boinc_app: line 80: cpu_time_cmd.txt: Read-only file system

Re: New Unit. Strange (maybe?) behavior.

Verfasst: 03.02.2016 16:31
von VitaliiKoshura