Installation
============

Installation on PMACS Cluster for Developers
--------------------------------------------

Somewhere under your home directory, clone the develop branch of the BEERS2.0 repository::

    git clone -b develop git@github.com:itmat/BEERS2.0.git

Now set up a virtual environment using *Python 3.6*.  I use conda on laptops, but on PMACS I stick
with Python's *venv* module and place the virtual environment inside my project::

    cd BEERS2.0
    python3 -m venv ./venv_beers

I put ``venv*`` in ``.gitignore``, so you can use any name you want as long as it starts with ``venv``,
without having to worry about accidentally committing the environment.

Now activate the environment thus::

    source ./venv_beers/bin/activate

You'll know the virtual environment is activated because its name (here, ``(venv_beers)``) will precede
your terminal prompt.  Now you need to add the Python packages/modules upon which BEERS depends.  You
do that by installing the packages/modules listed in the ``requirements_dev.txt`` file like so::

    pip install -r requirements_dev.txt

The ``requirements_dev.txt`` file is meant to be a superset of the ``requirements.txt`` file and, in fact,
pulls in the ``requirements.txt`` file.  Any packages/modules needed exclusively for development should
be listed in the ``requirements_dev.txt`` file.  Requirements needed for a user to run the code should
live in the ``requirements.txt`` file.

Next, we need to put the beers package where Python can find it.  This is where the ``setup.py``
file at the top level comes in.  From the top level directory once again, do the following::

    pip install -e .

This takes the current directory, packages it, and creates a link to the packaged version in
``<virtualenv>/lib/python3.6/site-packages``.  The file name is ``beers.egg-link``.  This allows Python
to find the beers package and subpackages while we continue to edit them in place.
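
As a quick sanity check after the editable install, a short Python snippet can confirm that the
package resolves from the active environment.  This is only a minimal sketch: the helper name
``is_importable`` is hypothetical, and the package name ``beers`` is taken from the text above::

    import importlib.util


    def is_importable(package_name):
        """Return True if Python can locate the named top-level package."""
        return importlib.util.find_spec(package_name) is not None


    if __name__ == "__main__":
        # "beers" is the package name used in this document.
        print("beers importable:", is_importable("beers"))

If this reports ``False``, the virtual environment is probably not activated or the
``pip install -e .`` step was run from the wrong directory.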

Next, go to the ``configuration`` directory and copy ``config.json`` to a personal config file
(*e.g.*, ``my_config.json``).  You can put the copy anywhere you like, but you will have to reference it
when running beers.  Open your version and modify all the absolute pathnames to conform to your
directory structure.  Modify any other parameters you wish to alter and save it.
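
If there are many absolute pathnames to change, the edit can be scripted.  The following is a
rough sketch, not part of BEERS: it assumes the configuration file is plain JSON, and the function
names ``rewrite_path_prefix`` and ``personalize_config`` are made up for illustration.  It
replaces one leading path prefix with another in every string value, wherever it sits in the file::

    import json


    def rewrite_path_prefix(node, old_prefix, new_prefix):
        """Recursively swap old_prefix for new_prefix at the start of string values."""
        if isinstance(node, dict):
            return {key: rewrite_path_prefix(value, old_prefix, new_prefix)
                    for key, value in node.items()}
        if isinstance(node, list):
            return [rewrite_path_prefix(item, old_prefix, new_prefix) for item in node]
        if isinstance(node, str) and node.startswith(old_prefix):
            return new_prefix + node[len(old_prefix):]
        # Numbers, booleans, nulls, and unrelated strings pass through unchanged.
        return node


    def personalize_config(template_path, output_path, old_prefix, new_prefix):
        """Write a copy of the template config with path prefixes rewritten."""
        with open(template_path) as template_file:
            config = json.load(template_file)
        with open(output_path, "w") as output_file:
            json.dump(rewrite_path_prefix(config, old_prefix, new_prefix),
                      output_file, indent=4)

You would call ``personalize_config`` with the template, the name of your personal copy, the
prefix found in the template, and the corresponding prefix in your own directory structure.
Review the result by hand afterwards, since not every path necessarily shares the same prefix.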

There is one command, ``run_beers.py``, which you can find in the ``bin`` directory under the top level.
Calling help on it will show you what is currently possible with it::

    ./run_beers.py -h
    usage: run_beers.py [-h] -c CONFIG [-r RUN_ID] [-d]
                        {expression_pipeline,library_prep_pipeline,sequence_pipeline}
                        ...

    BEERS Simulator, Version 2.0

    positional arguments:
    {expression_pipeline,library_prep_pipeline,sequence_pipeline}
                            pipeline subcommand
        expression_pipeline
                            Run the expression pipeline only
        library_prep_pipeline
                            Run the library prep pipeline only
        sequence_pipeline   Run the sequence pipeline only

    optional arguments:
     -h, --help            show this help message and exit

    required named arguments:
    -c CONFIG, --config CONFIG
                            Full path to configuration file.

    optional named arguments - these override configuration file arguments.:
    -r RUN_ID, --run_id RUN_ID
                            Integer used to specify run id.
    -d, --debug           Indicates whether additional diagnostics are printed.


Of the three subcommands, expression_pipeline, library_prep_pipeline, and sequence_pipeline, the
library_prep_pipeline is probably the easiest to run currently.  You would run it from the ``bin``
directory thus::

    ./run_beers.py -r123 -d -c ../config/my_config.json library_prep_pipeline

A run id and the path to the configuration file are both needed; per the help output, ``-r`` overrides
any run id given in the configuration file.  The ``-d`` is a debug switch.
Without it, exception tracebacks will not appear.  The library_prep_pipeline currently accepts just
one molecule packet, which it locates via the configuration file.  For example::

     "input": {
        "directory_path": "/home/crislawrence/Documents/beers_project/BEERS2.0/data/library_prep",
        "molecule_packet_filename": "molecule_packet_plus_source.pickle"
     }

We only have the one packet so it is kind of precious right now.  A copy of
``molecule_packet_plus_source.pickle`` is available under ``/projects/itmatlab/for_cris``.  Feel free to
grab it.  It has 10K molecules (all polyadenylated) derived from ``Test_data.1002_baseline.sorted.bam``.
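
Before launching a run, it can save time to confirm that the packet named in your configuration
actually exists.  Here is a small sketch of such a check.  The helper names are hypothetical, and
it assumes you pass in the dictionary that directly contains the ``"input"`` key shown above
(where that dictionary sits within the full configuration file is not covered here)::

    import os


    def molecule_packet_path(section):
        """Join the directory and filename found under the section's "input" key."""
        input_section = section["input"]
        return os.path.join(input_section["directory_path"],
                            input_section["molecule_packet_filename"])


    def check_molecule_packet(section):
        """Return the packet path, or raise if no file exists at that location."""
        packet_path = molecule_packet_path(section)
        if not os.path.isfile(packet_path):
            raise FileNotFoundError("Molecule packet not found: " + packet_path)
        return packet_path

The same check applies to the sequence pipeline, which uses the same two keys to locate its input.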

I have been using 100 as a seed to get reproducible results.  I would suggest that others use different
seeds so that we avoid tunnel vision.

One can use the molecule_packet output from the library prep pipeline as input for the sequence
pipeline but, again, you will need to tell the sequence pipeline where to find it via the
configuration file.  For example::

    "input": {
        "directory_path": "/home/crislawrence/Documents/beers_project/BEERS2.0/data/library_prep/output",
        "molecule_packet_filename": "final_output.pickle"
    }

Running the pipeline one stage at a time is a bit inconvenient presently.  We have yet to wire
the stages together into a complete pipeline.

The expression pipeline is more difficult to use, as it currently requires the reference genome and a
pair of alignment files (bam and bai), and it really only runs the variants finder portion of
the pipeline.  I threw in a BeagleStep that will eventually call the Beagle process.  For now, I
put my own Java program as a parameter to that step so I'd have something to run.  You can
add your own external process as a placeholder for now, if you like.

Requirements for Users
----------------------

If the user chooses to supply their own reference genome, it should be edited so that each
sequence sits on a single line, with no line breaks inside it.
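
One way to do that edit is sketched below.  It assumes the reference genome is a FASTA file with
``>`` header lines, which this document does not state explicitly, and the function names are
hypothetical.  Each record's wrapped sequence lines are joined into a single line::

    def unwrap_fasta(lines):
        """Yield FASTA lines with each record's sequence joined onto one line."""
        sequence_parts = []
        for line in lines:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # A new header closes the previous record's sequence, if any.
                if sequence_parts:
                    yield "".join(sequence_parts)
                    sequence_parts = []
                yield line
            elif line:
                sequence_parts.append(line)
        if sequence_parts:
            yield "".join(sequence_parts)


    def unwrap_fasta_file(input_path, output_path):
        """Write an unwrapped copy of the FASTA file at input_path."""
        with open(input_path) as source, open(output_path, "w") as target:
            for line in unwrap_fasta(source):
                target.write(line + "\n")

Note that this holds one full sequence in memory at a time, which is simple but may be heavy
for very large chromosomes.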

If the user provides gender for none of the samples, no sample will have X, Y, or MT
data.  If the user provides gender for only some of the samples, X, Y, and MT data
will be generated for those samples that have gender and a warning will be issued to
the user.
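
To make the rule concrete, here is an illustrative sketch of the decision described above.  It is
not the BEERS implementation: the function name and the mapping of sample names to gender values
(with ``None`` meaning "not provided") are assumptions made for the example::

    import warnings


    def samples_with_xy_mt(sample_genders):
        """Return the names of samples that would get X, Y, and MT data.

        sample_genders maps sample name to a gender string, or None if not provided.
        """
        with_gender = [name for name, gender in sample_genders.items() if gender]
        missing = [name for name, gender in sample_genders.items() if not gender]
        # Warn only in the mixed case: some samples have gender and some do not.
        if with_gender and missing:
            warnings.warn("No gender provided for: " + ", ".join(missing)
                          + "; X, Y, and MT data will not be generated for them.")
        return with_gender

When no sample has a gender, the function returns an empty list without a warning, matching the
first case above.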