Installation
Installation on PMACS Cluster for Developers
Somewhere under your home directory, clone the develop branch of the BEERS2.0 repository:
git clone -b develop git@github.com:itmat/BEERS2.0.git
Now set up a virtual environment using Python 3.6. I use conda on laptops, but on PMACS I stick with Python’s venv module and place the virtual environment inside my project:
cd BEERS2.0
python3 -m venv ./venv_beers
I put venv* in .gitignore, so you can use any name you want, as long as it starts with venv, without
worrying about accidentally committing it.
Now activate the environment thus:
source ./venv_beers/bin/activate
You’ll know the virtual environment is activated because the virtual environment path will precede
your terminal prompt. Now you need to add the python packages/modules upon which BEERS depends. You
do that by installing the packages/modules listed in the requirements_dev.txt file like so:
pip install -r requirements_dev.txt
The requirements_dev.txt file is meant to be a superset of the requirements.txt file and, in fact,
pulls in the requirements.txt file. Any packages/modules needed exclusively for development should
be listed in the requirements_dev.txt file. Requirements needed for a user to run the code should
live in the requirements.txt file.
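To see what the combined list looks like, here is a minimal sketch that flattens a requirements file, following pip's `-r other.txt` include lines. It assumes requirements_dev.txt pulls in requirements.txt through such a line; the helper name collect_requirements is hypothetical and not part of BEERS.

```python
import re
from pathlib import Path

# pip treats a '#' at the start of a line, or preceded by whitespace, as a comment.
COMMENT = re.compile(r"(^|\s+)#.*$")


def collect_requirements(path, seen=None):
    """Return requirement lines from `path`, following `-r other.txt` includes."""
    path = Path(path).resolve()
    seen = set() if seen is None else seen
    if path in seen:  # guard against circular includes
        return []
    seen.add(path)

    requirements = []
    for raw in path.read_text().splitlines():
        line = COMMENT.sub("", raw).strip()
        if not line:
            continue
        if line.startswith(("-r ", "--requirement ")):
            included = line.split(None, 1)[1]
            # Nested files are resolved relative to the file that includes them.
            requirements += collect_requirements(path.parent / included, seen)
        else:
            requirements.append(line)
    return requirements
```

Calling collect_requirements("requirements_dev.txt") from the top-level directory would list the development packages together with everything inherited from requirements.txt.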
Next, we need to put the beers package where Python can find it. This is where the top-level
setup.py file comes in. From the top-level directory once again, do the following:
pip install -e .
This takes the current directory, packages it, and creates a link to the packaged version in
<virtualenv>/lib/python3.6/site-packages. The file name is beers.egg-link. This allows Python
to find the beers package and its subpackages while we continue to edit them in place.
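A quick way to confirm both the activated environment and the editable install is to ask Python where it would import the package from. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the package name beers is taken from the text above.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside an environment created by venv."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def package_location(name):
    """Return where a top-level package or module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    locations = spec.submodule_search_locations
    # Packages expose their directory; plain modules only have an origin.
    return list(locations)[0] if locations else spec.origin


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("beers importable from:", package_location("beers"))
```

With the editable install in place, the reported location should be inside your clone rather than inside site-packages.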
Next, go to the configuration directory and copy config.json to a personal config file
(e.g., my_config.json). You can put it anywhere you like, but you will have to reference it
when running beers. Open your version and modify all the absolute pathnames to conform to your
directory structure. Modify any parameters you wish to alter and save it.
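If most of the absolute paths share a common prefix, the edit can be scripted. The sketch below assumes the configuration is plain JSON whose paths are stored as string values; the function names and the prefixes passed in are hypothetical, so review the result by hand before using it.

```python
import json


def rebase_paths(node, old_prefix, new_prefix):
    """Recursively replace a leading `old_prefix` in every string value."""
    if isinstance(node, dict):
        return {key: rebase_paths(value, old_prefix, new_prefix)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rebase_paths(item, old_prefix, new_prefix) for item in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node  # numbers, booleans, None and unrelated strings are left alone


def make_personal_config(template_path, output_path, old_prefix, new_prefix):
    """Copy a JSON config to `output_path`, rewriting paths under `old_prefix`."""
    with open(template_path) as handle:
        config = json.load(handle)
    config = rebase_paths(config, old_prefix, new_prefix)
    with open(output_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config
```

Only values that start with the old prefix are touched, so parameters that are not paths stay as they were.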
There is one command, run_beers.py, which you can find in the bin directory under the top level.
Calling help on it will show you what is currently possible with it:
./run_beers.py -h
usage: run_beers.py [-h] -c CONFIG [-r RUN_ID] [-d]
                    {expression_pipeline,library_prep_pipeline,sequence_pipeline}
                    ...

BEERS Simulator, Version 2.0

positional arguments:
  {expression_pipeline,library_prep_pipeline,sequence_pipeline}
                        pipeline subcommand
    expression_pipeline
                        Run the expression pipeline only
    library_prep_pipeline
                        Run the library prep pipeline only
    sequence_pipeline   Run the sequence pipeline only

optional arguments:
  -h, --help            show this help message and exit

required named arguments:
  -c CONFIG, --config CONFIG
                        Full path to configuration file.

optional named arguments - these override configuration file arguments.:
  -r RUN_ID, --run_id RUN_ID
                        Integer used to specify run id.
  -d, --debug           Indicates whether additional diagnostics are printed.
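For readers unfamiliar with how such an interface is put together, the following is a minimal argparse sketch that yields a similar layout: a required -c, optional -r and -d, and one mandatory subcommand. It is reconstructed from the help output above only and is not the actual run_beers.py source.

```python
import argparse


def build_parser():
    """Build a parser resembling the run_beers.py help output (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="run_beers.py", description="BEERS Simulator, Version 2.0")

    required = parser.add_argument_group("required named arguments")
    required.add_argument("-c", "--config", required=True,
                          help="Full path to configuration file.")

    optional = parser.add_argument_group("optional named arguments")
    optional.add_argument("-r", "--run_id", type=int,
                          help="Integer used to specify run id.")
    optional.add_argument("-d", "--debug", action="store_true",
                          help="Indicates whether additional diagnostics are printed.")

    subparsers = parser.add_subparsers(dest="pipeline", help="pipeline subcommand")
    subparsers.required = True  # attribute form also works on Python 3.6
    for name, text in (
        ("expression_pipeline", "Run the expression pipeline only"),
        ("library_prep_pipeline", "Run the library prep pipeline only"),
        ("sequence_pipeline", "Run the sequence pipeline only"),
    ):
        subparsers.add_parser(name, help=text)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because -c, -r, and -d belong to the top-level parser, they go before the subcommand name on the command line.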
Of the three subcommands, expression_pipeline, library_prep_pipeline, and sequence_pipeline, the
library_prep_pipeline is probably the easiest to run currently. You would run it from the bin
directory thus:
./run_beers.py -r123 -d -c ../config/my_config.json library_prep_pipeline
Both the run id and the path to the configuration file are needed for a run, although the help
output marks -r as an optional argument that overrides the configuration file. The -d flag is a
debug switch; without it, exception tracebacks will not appear. The library_prep_pipeline currently
accepts just one molecule packet, which it locates via the configuration file. For example:
"input": {
"directory_path": "/home/crislawrence/Documents/beers_project/BEERS2.0/data/library_prep",
"molecule_packet_filename": "molecule_packet_plus_source.pickle"
}
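As a rough illustration of how these two keys combine into a file location, here is a minimal sketch. It assumes the packet is an ordinary pickle file and that the "input" section has already been extracted from the configuration; the helper names are hypothetical and this is not the pipeline's own loading code.

```python
import os
import pickle


def molecule_packet_path(input_section):
    """Join the directory and file name from the "input" section of the config."""
    return os.path.join(input_section["directory_path"],
                        input_section["molecule_packet_filename"])


def load_molecule_packet(input_section):
    """Unpickle the molecule packet named in the "input" section."""
    # Unpickling requires the classes stored in the file to be importable
    # (presumably the beers package here), and pickle files should only be
    # loaded from sources you trust.
    with open(molecule_packet_path(input_section), "rb") as handle:
        return pickle.load(handle)
```

Running this inside the activated virtual environment, after the editable install, is the simplest way to make sure the packet's classes can be found.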
We only have the one packet so it is kind of precious right now. A copy of
molecule_packet_plus_source.pickle is available under /projects/itmatlab/for_cris. Feel free to
grab it. It has 10K molecules (all polyadenylated) derived from Test_data.1002_baseline.sorted.bam.
I have been using 100 as a seed to get reproducible results; I suggest that others use different seeds so that we avoid tunnel vision.
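The effect of a fixed seed can be shown with the standard library alone. This is a generic sketch, not BEERS code: how BEERS consumes its seed is not described here, and the function simulate_draws is hypothetical.

```python
import random


def simulate_draws(seed, count=5):
    """Return `count` pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    print(simulate_draws(100) == simulate_draws(100))  # same seed, same sequence
    print(simulate_draws(100) == simulate_draws(101))  # different seed, different sequence
```

Repeating a run with the same seed reproduces a result exactly, while varying the seed across developers exercises different random paths through the code.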
One can use the molecule_packet output from the library prep pipeline as input for the sequence pipeline, but again, you will need to tell the sequence pipeline where to find it via the configuration file. For example:
"input": {
"directory_path": "/home/crislawrence/Documents/beers_project/BEERS2.0/data/library_prep/output",
"molecule_packet_filename": "final_output.pickle"
}
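Until the stages are connected, the hand-off can be scripted. The sketch below derives the sequence pipeline's "input" section from the library prep one, assuming (as in the two examples above) that the output lands in an output subdirectory as final_output.pickle. The function name and both defaults are assumptions drawn from those examples.

```python
import posixpath


def sequence_input_from_library_prep(library_prep_input,
                                     output_subdir="output",
                                     output_filename="final_output.pickle"):
    """Build the sequence pipeline "input" section from the library prep one."""
    return {
        # posixpath keeps forward slashes, matching the cluster paths shown above
        "directory_path": posixpath.join(
            library_prep_input["directory_path"], output_subdir),
        "molecule_packet_filename": output_filename,
    }
```

The returned dictionary can be written into your personal configuration file in place of the existing "input" section before you run sequence_pipeline.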
Running the pipeline one stage at a time is a bit inconvenient at present. We have yet to wire the stages together into a complete pipeline.
The expression pipeline is more difficult to use because it currently requires the reference genome and a pair of alignment files (bam and bai), and it really only runs the variants finder portion of the pipeline. I threw in a BeagleStep that will eventually call the Beagle process. For now, I pass my own Java program as a parameter to that step so I’d have something to run. You can add your own external process as a placeholder for now, if you like.
Requirements for Users
If the user chooses to supply their own reference genome, it should be edited so that each sequence contains no line breaks.
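One way to do that editing is sketched below, assuming the genome is a standard FASTA file with header lines starting with ">". The function names are hypothetical; each sequence is held in memory while it is joined, which is usually acceptable one chromosome at a time.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each sequence joined onto a single line."""
    sequence_parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if sequence_parts:  # flush the previous record's sequence
                yield "".join(sequence_parts)
                sequence_parts = []
            yield line
        else:
            sequence_parts.append(line)
    if sequence_parts:  # flush the final record
        yield "".join(sequence_parts)


def unwrap_fasta_file(source_path, target_path):
    """Write an unwrapped copy of `source_path` to `target_path`."""
    with open(source_path) as source, open(target_path, "w") as target:
        for line in unwrap_fasta(source):
            target.write(line + "\n")
```

The output keeps every header line unchanged and places each sequence on the single line that follows its header.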
If the user provides no gender for any sample, no sample will have X, Y, or MT data. If the user provides gender for only some of the samples, X, Y, and MT data will be generated for the samples that have a gender, and a warning will be issued to the user.
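The rule can be expressed as a short sketch. It assumes samples are given as a mapping from sample name to gender, with None or an empty string meaning "not provided"; the function name and the warning text are hypothetical and not taken from the BEERS source.

```python
import warnings


def samples_with_xy_mt_data(sample_genders):
    """Return the samples that would receive X, Y and MT data."""
    with_gender = [name for name, gender in sample_genders.items() if gender]
    missing = [name for name, gender in sample_genders.items() if not gender]
    # Warn only in the mixed case: some samples have a gender and some do not.
    if with_gender and missing:
        warnings.warn(
            "No gender provided for: {}. X, Y and MT data will be generated "
            "only for samples with a gender.".format(", ".join(missing)))
    return with_gender
```

When no sample has a gender, the function returns an empty list without warning, matching the first case described above.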