Installation
============

Installation on PMACS Cluster for Developers
--------------------------------------------

Somewhere under your home directory, clone the develop branch of the BEERS2.0
repository::

    git clone -b develop git@github.com:itmat/BEERS2.0.git

Now set up a virtual environment using *python 3.6*. I use conda on laptops,
but on PMACS I stick with python's *venv* module and place the virtual
environment inside my project::

    cd BEERS2.0
    python3 -m venv ./venv_beers

I put ``venv*`` in ``.gitignore``, so you can use any name you want as long as
it starts with ``venv`` and you won't have to worry about accidentally
committing it. Now activate the environment::

    source ./venv_beers/bin/activate

You'll know the virtual environment is activated because its name will precede
your terminal prompt.

Now you need to add the python packages/modules upon which BEERS depends. You
do that by installing the packages/modules listed in the
``requirements_dev.txt`` file like so::

    pip install -r requirements_dev.txt

The ``requirements_dev.txt`` file is meant to be a superset of the
``requirements.txt`` file and, in fact, pulls in the ``requirements.txt`` file.
Any packages/modules needed exclusively for development should be listed in
the ``requirements_dev.txt`` file. Requirements needed for a user to run the
code should live in the ``requirements.txt`` file.

Next, we need to put the beers package where python can find it. This is where
the ``setup.py`` file at the top level comes in. From the top level directory
once again, do the following::

    pip install -e .

This takes the current directory, packages it, and creates a link to the
packaged version in the virtual environment's ``lib/python3.6/site-packages``
directory. The link file is named ``beers.egg-link``. This allows python to
find the beers package and subpackages while we continue to edit them in place.

Next go to the ``configuration`` directory and copy ``config.json`` to a
personal config file (*e.g.*, ``my_config.json``).
You can put it anywhere you like. You will have to reference it when running
beers. Open your version and modify all the absolute pathnames to conform to
your directory structure. Modify any parameters you wish to alter and save it.

There is one command, called ``run_beers.py``, which you can find in the
``bin`` directory under the top level. Calling help on it will show you what is
currently possible with it::

    ./run_beers.py -h
    usage: run_beers.py [-h] -c CONFIG [-r RUN_ID] [-d]
                        {expression_pipeline,library_prep_pipeline,sequence_pipeline}
                        ...

    BEERS Simulator, Version 2.0

    positional arguments:
      {expression_pipeline,library_prep_pipeline,sequence_pipeline}
                            pipeline subcommand
        expression_pipeline
                            Run the expression pipeline only
        library_prep_pipeline
                            Run the library prep pipeline only
        sequence_pipeline   Run the sequence pipeline only

    optional arguments:
      -h, --help            show this help message and exit

    required named arguments:
      -c CONFIG, --config CONFIG
                            Full path to configuration file.

    optional named arguments - these override configuration file arguments.:
      -r RUN_ID, --run_id RUN_ID
                            Integer used to specify run id.
      -d, --debug           Indicates whether additional diagnostics are printed.

Of the three subcommands (``expression_pipeline``, ``library_prep_pipeline``,
and ``sequence_pipeline``), ``library_prep_pipeline`` is probably the easiest
to run currently. You would run it from the ``bin`` directory thus::

    ./run_beers.py -r123 -d -c ../config/my_config.json library_prep_pipeline

The run id and the path to the configuration file are both required. The
``-d`` is a debug switch. Without it, exception tracebacks will not appear.

The library_prep_pipeline currently accepts just one molecule packet, which it
locates via the configuration file. For example::

    "input": {
        "directory_path": "/home/crislawrence/Documents/beers_project/BEERS2.0/data/library_prep",
        "molecule_packet_filename": "molecule_packet_plus_source.pickle"
    }

We only have the one packet so it is kind of precious right now.
A copy of ``molecule_packet_plus_source.pickle`` is available under
``/projects/itmatlab/for_cris``. Feel free to grab it. It has 10K molecules
(all polyadenylated) derived from ``Test_data.1002_baseline.sorted.bam``. I
have been using 100 as a seed to get reproducible results. I would suggest
others use other seeds so that we avoid getting tunnel vision.

One can use the molecule packet output from the library prep pipeline as input
for the sequence pipeline but, again, you will need to tell the sequence
pipeline where to find it via the configuration file. For example::

    "input": {
        "directory_path": "/home/crislawrence/Documents/beers_project/BEERS2.0/data/library_prep/output",
        "molecule_packet_filename": "final_output.pickle"
    }

Running the pipeline one stage at a time is a bit inconvenient presently. We
have yet to wire the stages together into a complete pipeline.

The expression pipeline is more difficult to use, as it presently requires the
reference genome and the pair of alignment files (bam and bai) and really only
runs the variants finder portion of the pipeline. I threw in a BeagleStep that
will eventually call the Beagle process. For now, I put my own Java program as
a parameter to that step so I'd have something to run. You can add your own
external process as a placeholder for now, if you like.

Requirements for Users
----------------------

If the user chooses to supply their own reference genome, it should be edited
so that each sequence contains no line breaks.

If the user declines to provide gender for each sample, the samples will not
have X, Y, or MT data. If the user neglects to provide gender for just some of
the samples, X, Y, and MT data will be generated only for those samples that
have gender, and a warning will be issued to the user.
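
The reference genome requirement above (no line breaks within a sequence) can
be met by unwrapping a standard FASTA file before handing it to BEERS. The
following is a minimal sketch, not part of BEERS itself: the function names
``unwrap_fasta`` and ``unwrap_fasta_file`` are hypothetical, and it assumes a
plain-text FASTA file in which header lines start with ``>``.

```python
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line.

    Hypothetical helper for illustration; it is not provided by BEERS.
    """
    chunks = []  # sequence pieces of the record currently being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header ends the previous record: emit its sequence first.
            if chunks:
                yield "".join(chunks)
                chunks = []
            yield line
        else:
            chunks.append(line)
    # Emit the sequence of the final record, if any.
    if chunks:
        yield "".join(chunks)


def unwrap_fasta_file(in_path: str, out_path: str) -> None:
    """Write an unwrapped copy of the FASTA file at in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in unwrap_fasta(src):
            dst.write(line + "\n")
```

For example, calling ``unwrap_fasta_file("genome.fa", "genome.unwrapped.fa")``
(both file names are placeholders) writes a copy in which every header is
followed by exactly one sequence line. The input is streamed line by line, but
each sequence is held in memory until its record ends, which should be kept in
mind for very large chromosomes.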