RNA World/FAQ/en


In simple words, what are the goals of RNA World and why should I participate?

RNA World focuses on RNA research and is the first distributed computing project of its kind. Recent Nobel Prizes in chemistry and medicine were awarded for discoveries concerning telomerase and the ribosome, both RNA-based cell machineries, and before that for RNA interference / miRNAs, small RNAs that regulate cell development and are involved in cancer. This shows how important RNA research is. You may also have taken antibiotics to help your immune system fight a bacterial infection. Many such antibiotics bind to the RNA components of bacterial ribosomes and thereby block protein synthesis and, consequently, the growth of these microbes. Other small RNAs in bacteria are required for attaching to and eventually even invading human host cells. Many further aspects of fundamental cellular processes involving RNAs are of interest to researchers, for example the question of whether microorganisms such as bacteria possess some sort of primitive immune system (CRISPR RNA family). Identifying non-protein-coding RNAs in as many organisms as possible therefore establishes a fundamental basis of knowledge for a diverse range of questions and may even promote the development of drugs to combat diseases. For further information, please see the project description.

What applications are currently supported by RNA World?

type              application                       purpose
INFERNAL 1.0.2    CMBUILD, CMCALIBRATE, CMSEARCH    RNA covariance analysis
InReAlyzer 1.0    Converter                         to produce high-resolution graphics from INFERNAL output files (in-house development)

What types of work units are available?

At present we have four types of work units, which are based on (1) CMBUILD, (2) CMCALIBRATE, (3) CMSEARCH and (4) InReAlyzer. CMBUILD work units produce an RNA covariance model from a text alignment of RNAs belonging to the same RNA family. CMCALIBRATE work units calibrate a covariance model produced by CMBUILD so that it can be used to score the probability that a potential RNA identified as a member of a certain RNA family is indeed a true candidate of that family. CMSEARCH work units use the output of CMCALIBRATE, i.e. a calibrated covariance model, to search for an RNA family in the genome of a specified organism. InReAlyzer work units convert the somewhat cryptic text output of CMSEARCH work units into high-resolution PNG graphics that allow convenient visual judgement of whether or not a given CMSEARCH candidate belongs to the RNA family under investigation.

Does RNA World support GPU/CUDA/STREAM processing?

At present, RNA World applications do not benefit from GPU processing. We can think of applications that will, however, and once such applications are added we will do our best to provide clients that support these performance improvements.

Does RNA World support checkpointing?

RNA World is a universal framework comprising a diverse set of RNA-relevant bioinformatic tools. These tools were not originally designed for a distributed computing environment, although most of them require significant compute resources. Traditional checkpointing must be supported by the scientific application itself, so checkpointing is currently not available. However, the RNA World development team is seeking ways to establish novel, universal checkpointing procedures. These rely either on creating "suspend RAM to disk" images at the system level or on running RNA World generally inside a virtual machine (VM). For the former we have a functional 32-bit Linux workaround, which, however, requires that memory randomization is deactivated in the Linux kernel. For the latter we have a promising cooperation with CERN.

What is the RAM requirement of RNA World?

type              application    RAM requirements
INFERNAL 1.0.2    CMCALIBRATE    up to 2.5 GB of RAM, usually below 600 MB of RAM
INFERNAL 1.0.2    CMSEARCH       up to 1 GB of RAM, usually below 100 MB of RAM
InReAlyzer 1.0                   below 50 MB of RAM

Please note one important thing: on a multi-core machine, RNA World may run on all of your cores simultaneously. As long as we do not use the OpenMPI-based multithreaded application mode, each core processes its own work unit, so the RAM required to complete all of the work units is additive.
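As a rough illustration of this additive behavior, the following sketch sums the peak RAM of the work units running at the same time. The function name and the example mixes of work units are hypothetical; the per-application figures are the upper bounds from the RAM table in this F.A.Q., with 1 GB counted as 1000 MB for simplicity.

```python
# Upper-bound RAM per work unit type in MB, taken from the RAM table in this
# F.A.Q. (typical usage is much lower). 1 GB is counted as 1000 MB here.
PEAK_RAM_MB = {
    "CMCALIBRATE": 2500,
    "CMSEARCH": 1000,
    "InReAlyzer": 50,
}

def total_peak_ram_mb(running_work_units):
    """Sum the peak RAM of all single-threaded work units running at once."""
    return sum(PEAK_RAM_MB[name] for name in running_work_units)

# Hypothetical quad-core machine running one work unit per core.
worst_case = total_peak_ram_mb(["CMCALIBRATE"] * 4)
mixed = total_peak_ram_mb(["CMCALIBRATE", "CMSEARCH", "CMSEARCH", "InReAlyzer"])
print(worst_case, mixed)  # 10000 4550
```

On a machine with little RAM, excluding CMCALIBRATE or limiting the number of cores used by BOINC lowers this sum accordingly.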

What are the typical running times of RNA World work units on an Intel P8600 Core2Duo 2.4 GHz machine with Ubuntu Linux x64?

type              application    runtime
INFERNAL 1.0.2    CMCALIBRATE    20 minutes to 10 days, typically a day
INFERNAL 1.0.2    CMSEARCH       a few minutes to 100 hours, typically a few hours
InReAlyzer 1.0                   a few seconds to 5 hours, typically a few minutes

Does RNA World support multithreaded (MT) applications?

RNA World supports whatever its scientific applications support. In the case of INFERNAL, multithreaded applications are well supported on the basis of OpenMPI. MT applications run well in a local computing environment under Linux and technically work very well in a distributed environment too. However, for reasons not yet fully understood, validation of results generated by MT applications in a distributed computing environment fails in a subset of cases. We have not yet identified the exact cause of this inconsistent behavior, but INFERNAL makes use of random number generation that is tied to the processor core time at which individual threads are started. On a multicore system, the threads of an MT application are started at slightly different times on each of the cores involved. Consequently, on a different (distributed) system the random numbers differ and so do the results, although the differences are only minor. If the random number generator is supplied with a fixed value, the issue is largely but, strangely, not entirely solved, and that is a problem for BOINC-based WU validation. The RNA World team is currently corresponding with the INFERNAL developers to solve this issue and to fully enable distributed MT applications as soon as possible.
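The effect of seeding on reproducibility can be illustrated with a small sketch. It uses Python's standard random module as a stand-in; the function simulated_score is hypothetical and has nothing to do with the actual INFERNAL code. It only shows why results from two hosts can be compared bit by bit when the generator starts from a fixed value, and why they cannot when the seed depends on the moment of start-up.

```python
import random

def simulated_score(seed=None):
    """Stand-in for a stochastic computation; NOT the INFERNAL algorithm."""
    # With seed=None the generator is seeded from OS entropy or the clock.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000))

# Fixed seed: two "hosts" obtain identical results that can be validated
# against each other.
fixed_a = simulated_score(seed=42)
fixed_b = simulated_score(seed=42)
print(fixed_a == fixed_b)  # True

# No fixed seed: each run starts from a different state, so the two results
# will in practice differ slightly.
free_a = simulated_score()
free_b = simulated_score()
print(abs(free_a - free_b))
```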

What is a multithreaded (MT) application?

A multithreaded application makes use of multiple CPU cores on modern machines to compute a single RNA World work unit. Our measurements for CMCALIBRATE running in a Linux environment showed that a quad-core machine crunches one work unit in even slightly less (!) than a fourth of the computation time required for single-threaded computation of the same work unit. This shows that CMCALIBRATE scales excellently in MT mode.
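Scaling of this kind is usually expressed as speedup and parallel efficiency. The sketch below computes both; the timings are hypothetical placeholders chosen to match the "slightly less than a fourth" observation and are not our measured values.

```python
def speedup(single_thread_time, multi_thread_time):
    """How many times faster the multithreaded run is."""
    return single_thread_time / multi_thread_time

def parallel_efficiency(single_thread_time, multi_thread_time, cores):
    """Speedup per core; 1.0 means perfect linear scaling."""
    return speedup(single_thread_time, multi_thread_time) / cores

# Hypothetical timings in hours (not measured values): the quad-core run
# takes slightly less than a fourth of the single-threaded time.
t_single, t_quad = 20.0, 4.8

print(round(speedup(t_single, t_quad), 2))
print(round(parallel_efficiency(t_single, t_quad, cores=4), 2))
```

An efficiency slightly above 1.0 corresponds to the better-than-linear scaling described above.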

Will RNA World research ultimately lead to the development of medications?

Although nobody can foresee this for sure, it seems highly likely that the results generated by RNA World will contribute to the development of medications (see the section on potential applications in this F.A.Q.).

What are potential applications of the RNA World project?

RNAs, like proteins, are macromolecules of defined structure that serve vital cellular tasks. Consequently, structural knowledge of RNAs is as important for drug design as structural knowledge of proteins. What you know as antibiotics are actually small-molecule drugs that mainly target RNA components of bacterial ribosomes. This illustrates nicely how important RNAs are and what huge potential they represent in the context of drug design. However, before we can use RNA structures to design drugs, we first need to identify which RNAs are present and where exactly they are located in the genome of any given organism. This is what we are currently trying to accomplish on a global scale using the RNA World supercomputer.

Will the RNA World project results be published and if so under what license?

All RNA World results will be published in high-quality peer-reviewed science journals, preferably in those offering an open access policy for the general public.

Are work units generated automatically or manually?

BOINC-based work units are always generated in a fully automated manner from operator-supplied input files. RNA World currently relies on operator-curated input archives that are automatically processed by the RNA World server to yield several thousand work units per archive. Archives can be placed in an on-hold queue so that, once the server is running low on work units, it can generate new ones from this supply.

We are currently working on implementing user job submission interfaces so that, under strict security guidelines, researchers can use the RNA World distributed supercomputer to process their own project files. For security reasons we do not plan to allow batch job processing, and users will have to register and use a digital certificate for clear identification.

It is also planned to derive work units fully automatically by regularly scanning RNA-relevant databases for novel sequences that could be analyzed.

Is a continuous work supply guaranteed?

Our objective is to continuously add more and more RNA-relevant bioinformatic tools to RNA World. Moreover, the data sources containing RNA-relevant information that require analysis by RNA World are growing daily. For these two reasons, we expect that RNA World will require increasing compute capacity and is therefore unlikely to run out of work soon. However, we compute on an individual project basis, and we are also trying to build up databases containing pre-computed results, e.g. for listing potential RNA candidate genes in any given organism. Once our objectives are reached, we will naturally stop sending out work units until we have new projects in store. This will be announced in time on the RNA World website so that machines do not sit idle.

Are the work units identical for different operating systems?

RNA World delivers work units based on available applications. If an RNA World application is available for more than one operating system, then the work units assigned are usually identical. This does not mean, however, that identical work units will be computed equally fast on systems that have identical hardware but different operating systems. One reason for this, among others, is differences in compiler performance.

Does RNA World award project performance certificates?

Yes.

How can I save my work for non-checkpointed RNA World sub-applications when I need to turn my machine off?

If you run RNA World on a laptop or if you have multiple operating systems installed, you will need to turn your machine off from time to time. The current RNA World sub-applications INFERNAL and InReAlyzer do not support checkpointing, so your work would be lost if you shut down your machine the standard way. To avoid losing your work, put the machine into hibernation (suspend-to-disk sleep mode) instead. Under these conditions the entire RAM is saved to hard disk, and when the system is started again your work is reloaded into memory so that RNA World can continue to calculate where it left off.

What operating systems (32/64-bit) are supported by RNA World?

Since RNA World is a framework consisting of many different bioinformatic applications, operating system support will always depend on the individual application. If its source code is available, the RNA World development team will do its best to make it available for as many operating systems as possible, i.e. Linux, Windows and OS X at the minimum. In addition, care will be taken to support 32-bit as well as 64-bit versions and CPU instruction set extensions (SSE, SSE2, etc.) wherever possible and useful.

What are the costs of participating in RNA World?

None, except for your private electricity, network and hardware maintenance costs.

Are there privileges that apply only to Rechenkraft.net team members?

As fairness dictates: No.

What are the plans concerning language support on the RNA World website?

Just take a look here: http://www.rnaworld.de/rnaworld/language_select.php

Will there be CPU-optimized applications?

These are already implemented: the initially downloaded package contains a program that checks which type of application is optimal for your machine (x86/x64, various SSE versions).
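The selection logic of such a check can be sketched as follows. This is only an illustration of the idea, not the program shipped with RNA World: the variant names and feature labels are hypothetical, and detecting the features on a real machine is left out.

```python
# Hypothetical application variants, ordered from most to least optimized.
# Each entry lists the CPU features (hypothetical labels) the variant needs.
VARIANTS = [
    ("x64_sse2", {"x64", "sse", "sse2"}),
    ("x86_sse2", {"sse", "sse2"}),
    ("x86_sse", {"sse"}),
    ("x86_generic", set()),
]

def pick_variant(cpu_features):
    """Return the first (most optimized) variant whose requirements are met."""
    for name, required in VARIANTS:
        if required <= cpu_features:  # subset test: all required features present
            return name
    raise ValueError("no suitable variant")  # unreachable while a generic variant exists

print(pick_variant({"x64", "sse", "sse2", "sse3"}))  # x64_sse2
print(pick_variant({"sse"}))                         # x86_sse
print(pick_variant(set()))                           # x86_generic
```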

Is it possible to exclude participation in certain sub-applications?

Yes. In your RNA World project settings you can decide for yourself which RNA World sub-applications to support and which not. In fact, if you have an older machine it may be advisable to exclude, for example, CMCALIBRATE, as work units for this program demand large amounts of RAM and often take a long time to complete.

Is it possible to enable/disable participation in alpha/beta tests using a simple switch in the RNA World user profile?

Yes.

Can I as a scientist submit tasks to RNA World?

Hopefully soon. We are working on implementing user job submission interfaces for each of the various RNA World applications. However, you will have to register and receive a digital certificate such that jobs submitted to our system are clearly correlated to an individual known to us.

What systems are going to be supported in the future?

At present we support Linux, Windows and Mac wherever possible. PS3 will most likely not be supported due to its small RAM capacity, although it might be possible to use it for other applications which we have not yet implemented. If we manage to establish a virtual machine approach, however, it might even be possible to support a number of additional systems.

Are BOINC and the RNA World applications safe, i.e. free from viruses and other malware?

Yes. BOINC as well as all RNA World applications are open source, i.e. they can be inspected by anyone who is interested. The RNA World applications are compiled in-house using widely used, publicly available compiler tools, which are also used, for example, to build the software running on the majority of today's web servers. Consequently, if these tools were malicious, we would already face a much bigger problem.

How much hard disk space is required to run RNA World?

The required disk space varies depending on the types of work units assigned to you. At present, RNA World core files require around 25 MB of hard disk space. CMBUILD and CMCALIBRATE work units should not require more than approximately 10 MB, while CMSEARCH work units may use up to 300 MB, typically 2-20 MB. InReAlyzer hard disk space requirements cannot be predicted reliably because the number of images generated depends on the CMSEARCH output file size. However, we do not expect InReAlyzer to require much more than 1-100 MB. Remember that the total required hard disk space is the sum over all work units that have been downloaded plus the RNA World core files. Currently, a maximum of 10 work units can be downloaded per CPU core. Note that you can manually specify the maximum hard disk space that RNA World is allowed to occupy, either from within your local BOINC manager or on the RNA World website.
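A back-of-the-envelope estimate based on these figures might look like the sketch below. The function name and the dual-core example are hypothetical, and the calculation deliberately ignores InReAlyzer output, which cannot be predicted reliably.

```python
CORE_FILES_MB = 25      # RNA World core files
MAX_WUS_PER_CORE = 10   # current download limit per CPU core

def worst_case_disk_mb(cpu_cores, mb_per_work_unit):
    """Core files plus a full download queue of equally sized work units."""
    return CORE_FILES_MB + cpu_cores * MAX_WUS_PER_CORE * mb_per_work_unit

# Hypothetical dual-core machine.
print(worst_case_disk_mb(2, 20))   # 425  (typical upper size of CMSEARCH work units)
print(worst_case_disk_mb(2, 300))  # 6025 (every work unit at the 300 MB maximum)
```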

What Internet traffic can be expected?

All files are transferred in compressed format. Most files contain simple ASCII data, so compression reduces them to around 30% of their original size. In general, CMBUILD and CMCALIBRATE work units are the smallest and should require less than 1 MB (usually even less than 100 kB) of data traffic. CMSEARCH work units cause somewhat higher download traffic, depending on the size of the genome to be analyzed. Current upper limit: with a maximum of 512 MB (uncompressed file size) for one of the chromosomes of an opossum, about 150 MB (compressed file size) would have to be transferred for a CMSEARCH work unit, plus a few kB for additional control files. The upload traffic contains only the result file and not the genome that was searched, and consequently will be much, much smaller. Normal traffic: a typical bacterial genome such as that of E. coli is about 4.6 MB (uncompressed) in size, so about 1.3 MB (compressed) of data plus the control files (just a few kB) will be transferred. Lower limit: many viral genomes as well as plasmid sequences contain less than 10 kB of data in uncompressed format. Note, however, that small CMSEARCH work units are expected to complete quickly, so your machine may request new data over and over again, depending on your system's performance.
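The download figures can be approximated with a short calculation. This is a sketch under the assumption of a flat 30% compressed size; the function name and the default of 5 kB for the control files are hypothetical, and real compression ratios vary from file to file, so the results only roughly match the rounded numbers given above.

```python
COMPRESSED_FRACTION = 0.30  # compressed size as a fraction of the original size

def download_mb(uncompressed_mb, control_files_kb=5):
    """Approximate download size in MB.

    control_files_kb is a hypothetical stand-in for "a few kB" of control files.
    """
    return uncompressed_mb * COMPRESSED_FRACTION + control_files_kb / 1000

examples = [
    ("opossum chromosome", 512),
    ("E. coli genome", 4.6),
    ("small viral genome", 0.01),
]
for label, size_mb in examples:
    print(f"{label}: about {download_mb(size_mb):.2f} MB")
```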

Can RNA World be operated in offline mode?

Currently not. We do not allow caching of large sets of work unit packages because we require fast turnaround. In the case of CMCALIBRATE work units this is easy to understand, since their results are the basis for all subsequent CMSEARCH work units. Generally, CMBUILD results are the basis for CMCALIBRATE calculations, and CMCALIBRATE results are required as input data for CMSEARCH. The output of CMSEARCH in turn serves as input for InReAlyzer. However, since we are planning to add further applications in the future that lack these strict interdependencies, it is very likely that certain types of future work units will allow for offline computation.
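The chain of dependencies described here can be written down as a small graph. The sketch below uses the graphlib module from the Python standard library (Python 3.9 or later) to derive the processing order; the dictionary is only a model of the description above, not the scheduling code of the RNA World server.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each application maps to the application(s) whose output it needs as input.
DEPENDS_ON = {
    "CMBUILD": set(),
    "CMCALIBRATE": {"CMBUILD"},
    "CMSEARCH": {"CMCALIBRATE"},
    "InReAlyzer": {"CMSEARCH"},
}

order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)  # ['CMBUILD', 'CMCALIBRATE', 'CMSEARCH', 'InReAlyzer']
```

Because every stage waits for the one before it, a slow return of results at any stage delays everything downstream, which is why a fast turnaround matters.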

What are the minimal CPU requirements for participation in RNA World?

Concerning the CPU, even processors lacking SSE such as Intel Pentium II (or older) could in principle participate since we have the appropriate applications ready for delivery. However, you should consider the average run times, deadlines and RAM requirements for certain types of work units as detailed elsewhere in this F.A.Q.

What are the network requirements for participation in RNA World?

Currently we recommend RNA World participation only for machines that are connected to the Internet on a 24/7 basis, i.e. all around the clock.

Does RNA World offer a screensaver function?

Yes.

According to the server status page, work units should be available, so why don't I get any?

Assuming you have activated, in your RNA World project profile, the type of work units announced to be available for processing, the reason is RNA World's homogeneous redundancy policy: a work unit delivered to, for example, a Linux x64 machine with a certain CPU type will only be sent for validation to another Linux x64 machine with the same CPU type. If your system no longer gets any work units, then the remaining work indicated on the server status page can only be delivered to machines whose operating system and/or CPU differ from yours.
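The matching rule can be sketched in a few lines. This is a simplified model and not BOINC's actual scheduler code: the Host type, the host names and the coarse operating system and CPU labels are hypothetical.

```python
from collections import namedtuple

Host = namedtuple("Host", "name os cpu")

def hr_class(host):
    """Hosts are interchangeable for validation only if OS and CPU type match."""
    return (host.os, host.cpu)

def eligible_hosts(first_host, candidates):
    """Hosts that may receive further copies of a work unit first sent to first_host."""
    return [
        h for h in candidates
        if h is not first_host and hr_class(h) == hr_class(first_host)
    ]

# Hypothetical host population.
hosts = [
    Host("a", "Linux x64", "Intel"),
    Host("b", "Linux x64", "Intel"),
    Host("c", "Windows x64", "Intel"),
    Host("d", "Linux x64", "AMD"),
]
print([h.name for h in eligible_hosts(hosts[0], hosts)])  # ['b']
```

In this example, hosts "c" and "d" see work listed on the server status page but cannot receive the remaining copies of that work unit.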

I came home and my machine was basically unresponsive with multiple RNA World screensaver windows open - what is going on here?

This (rarely occurring) strange behaviour is not yet completely understood. For simplicity, our current screensaver makes use of Adobe FlashPlayer. Consequently, the problem you describe can occur only on machines where FlashPlayer is installed (on others, the screensaver function will not work). To resolve the issue, it seems you need to either upgrade to the latest FlashPlayer version or uninstall it completely from your machine (of course, uninstalling is not really a good suggestion as many websites make use of Flash).

It seems that the entire RNA World website is available only in German?

No, you can individually customize your display language. Forum settings for example are found here. Setting BOINC pages to English (only necessary if it doesn't work properly with the browser's ACCEPT setting) can be done here. Since 18 January 2010 we have also incorporated the BOINC translation system (BTS).

I got the message "redundant result", what exactly does that mean?

First, a few remarks on the terms used in BOINC. A work unit is a computational job which we would like participants to complete. A result, by contrast, is a collective term for the files which the server generates and sends to a participant. A work unit is complete once enough results (the quorum) are successful (this includes the data transfer to the participant, computation of the job, return of the result files to the server, etc.) and have been validated (i.e. each is identical to at least one other result successfully returned to the server). For example, in RNA World, three results are generated for each CMSEARCH-based work unit and sent to three different machines. If two of these (the quorum) are successful and validated, the work unit is completed. As a consequence, the third result is no longer required, i.e. it is redundant (redundant result). This third result then (1) will not be sent out (if it has not yet been sent out), (2) will be aborted on the client machine if it has been sent out but computation has not yet commenced, or (3) will be completed and receive credit if its computation has already started. We generate more results (three) per work unit than required for the quorum (two) in order to collect results more quickly. If we did not do it this way, we would always have to wait for the deadline to pass before the server detects that a client is not going to return anything. Only then would the server generate an additional result, send it out and again wait for incoming data.
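The quorum logic and the three possible fates of a redundant result can be modelled as follows. This is a simplified sketch, not the BOINC validator: the function names are hypothetical and outputs are compared as plain strings.

```python
from collections import Counter

QUORUM = 2  # matching results needed to complete a CMSEARCH work unit

def canonical_output(returned_outputs, quorum=QUORUM):
    """Return the output that at least `quorum` results agree on, or None."""
    if not returned_outputs:
        return None
    output, votes = Counter(returned_outputs).most_common(1)[0]
    return output if votes >= quorum else None

def fate_of_redundant_result(sent, started):
    """What happens to the third result once the quorum has been reached."""
    if not sent:
        return "not sent"
    if not started:
        return "aborted on client"
    return "completed and credited"

print(canonical_output(["hit list A", "hit list A"]))      # hit list A
print(canonical_output(["hit list A", "hit list B"]))      # None
print(fate_of_redundant_result(sent=True, started=False))  # aborted on client
```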

The progress bar is at 100% and seems to sit there for hours - what is happening here?

This is common behavior in BOINC projects, especially if you have just switched from another project to RNA World or if the work units of a given BOINC project differ strongly from each other. RNA World work units are in fact extremely heterogeneous in their system requirements. For each computation, a series of small simulations is run on the server to estimate the time required for completion on the server. Since your machine differs from our server hardware, the results of the benchmarks performed from time to time on your machine are used to scale the duration determined on the server to your machine. This scaling process is good but not perfectly accurate, so the completion time of the first work units often differs noticeably from what the progress bar indicates. As more and more work units of that type arrive on your system, a calculation mechanism built into BOINC progressively corrects for that deviation, so "sitting at 100%" should become rarer over time. However, if the incoming work units are extremely different from each other (as is often the case for RNA World work units even if they are based on the same application), this adjustment may again turn out to be inaccurate for the new work units and an automatic re-adjustment will take place. In the worst case, this may give the impression that the progress bar is constantly unreliable. The bottom line is that you should simply expect a work unit to take longer than indicated and not conclude that something is wrong with the work unit or your hardware.
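The interplay of scaling and progressive correction can be illustrated with a toy model. It is explicitly not BOINC's actual algorithm: the function names, the update rule, the weight and all numbers are assumptions chosen only to show why the bar can sit at 100% at first and become more accurate later.

```python
def raw_estimate(server_seconds, server_benchmark, host_benchmark):
    """Scale the server-side time estimate to the host using benchmark ratios."""
    return server_seconds * server_benchmark / host_benchmark

def update_correction(correction, raw, actual_seconds, weight=0.2):
    """Move the correction factor a step towards the observed actual/raw ratio."""
    return correction + weight * (actual_seconds / raw - correction)

def progress(elapsed_seconds, estimate_seconds):
    """Displayed progress, capped at 100% when a work unit overruns its estimate."""
    return min(1.0, elapsed_seconds / estimate_seconds)

# Hypothetical numbers: the host is half as fast as the server, and work units
# of this type really take twice as long as the scaled estimate suggests.
raw = raw_estimate(3600, server_benchmark=2000, host_benchmark=1000)  # 7200 s
correction = 1.0
for _ in range(50):
    correction = update_correction(correction, raw, actual_seconds=2 * raw)

print(progress(9000, raw))               # 1.0: the bar "sits at 100%"
print(progress(9000, raw * correction))  # below 1.0 once the estimate has adapted
```

If the next batch of work units behaves differently, the learned correction no longer fits and the adaptation starts over, which matches the re-adjustment described above.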

Why is RNA World not using the standard BOINC forum?

The RNA World forums are multilingual, which means there is more than just one forum, and they are located on a server separate from the BOINC servers and from the RNA World server. The reason is that we need to make sure that forum communication remains intact even if the BOINC and the RNA World project servers are non-functional. It is actually surprising that several other DC projects do not do it the same way. The single drawback is that you have to register on our forum server to make use of it, but we feel that, given the advantages, this drawback is bearable.

It seems a long-named RNA World work unit is blocking my entire BOINC system

This is a known issue which occurs only very rarely and relates to an as yet unresolved bug in the BOINC manager. It also happens exclusively on Windows-based machines. The source of the error is that Windows allows at most 256 characters for the combined length of path name plus file name. RNA World uses explicit file names, i.e. from the long file names the user can easily derive what is being computed on his or her machine. We would like to keep it that way to allow third-party developers to conveniently construct RNA World monitoring programs. However, if such a long-named work unit is sent to your Windows machine, it will get stuck in the downloading process because it cannot be written to your hard drive. As a baffling consequence, your BOINC manager will stop downloading work units for any DC project it is attached to. To resolve the issue, you just have to delete that WU from within the BOINC manager. We hope that the BOINC developers will fix this issue soon.
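Whether a given file name would run into this limit can be checked in advance with a sketch like the one below. The directory and file names are hypothetical, and the default limit of 256 characters is the figure quoted in this F.A.Q.; the ntpath module from the Python standard library applies Windows path rules on any platform.

```python
import ntpath  # Windows path handling, usable on any platform

PATH_LIMIT = 256  # combined path + file name limit as quoted in this F.A.Q.

def fits_windows_path(project_dir, file_name, limit=PATH_LIMIT):
    """True if the full path (directory + file name) stays within the limit."""
    full_path = ntpath.join(project_dir, file_name)
    return len(full_path) <= limit

# Hypothetical BOINC project directory and work unit file names.
project_dir = r"C:\ProgramData\BOINC\projects\example_project"
short_name = "cmsearch_example.fasta"
long_name = "cmsearch_" + "very_descriptive_part_" * 12 + ".fasta"

print(fits_windows_path(project_dir, short_name))  # True
print(fits_windows_path(project_dir, long_name))   # False
```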

Can't you refresh the server status page more frequently?

We could, but we will not do that because we need to give priority to server performance. Updating the status page more frequently would cause a considerable increase in database queries, which in turn would cause additional server load. We prefer to dedicate that power to more important tasks such as WU processing and serving. In any case, the status page is refreshed every 10 minutes, and if your browser shows something different, then you have a caching issue.

I started RNA World on my Debian Etch machine and got a lot of 'compute error' messages whereas running it on the same machine using RedHat gives no errors at all - is it possible that I experience an OS-dependent problem here?

Yes. If you have an older glibc version, please check your Linux distribution for an upgrade that provides glibc version 2.4. This will resolve the problem.
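To find out which glibc version a machine runs, a small script such as the following can help. It is a sketch that treats version 2.4 as a minimum, which is an assumption on top of the answer above; platform.libc_ver() is part of the Python standard library and may return empty strings on systems that do not use glibc.

```python
import platform

MIN_GLIBC = (2, 4)  # version named in this F.A.Q., treated here as a minimum (assumption)

def parse_version(text):
    """Turn a version string like '2.3.6' into a comparable tuple (2, 3)."""
    return tuple(int(part) for part in text.split(".")[:2])

def glibc_is_recent_enough(version_text, minimum=MIN_GLIBC):
    return parse_version(version_text) >= minimum

lib, version = platform.libc_ver()  # may be ('', '') on systems without glibc
if lib == "glibc" and version:
    print(version, glibc_is_recent_enough(version))
else:
    print("glibc version could not be determined")
```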

How can I monitor the RAM usage of individual RNA World WUs?

The simplest way to do this under Windows is to use the task manager (TM), but you need to tweak its settings a bit: start the TM -> choose the 'processes' tab -> go to the TM menu and click 'view' -> 'select columns' and now check the checkbox for 'Peak Memory Usage'.

