.. _develop:

==========
Developers
==========

Developing for GenomeKit currently requires a C++ compiler that supports
C++20 features. On Linux and macOS, a suitable compiler can be installed via
conda (``gxx_linux-64`` and ``clangxx_osx-64``, respectively). On macOS, you
will also need the SDK, available via ``xcode-select --install`` or from
https://developer.apple.com/download/all/.

Setting up
----------

Clone the source tree::

    git clone git@github.com:deepgenomics/GenomeKit.git

From the ``GenomeKit`` directory, install the provided conda environment,
which contains all dependencies::

    conda env create -f genomekit_dev.yml
    conda activate genomekit_dev

On Windows, you'll need to comment out the Mac/Linux-only test dependencies
in ``genomekit_dev.yml``.

On M1 Macs, you might need to set up the environment differently::

    conda create -n cxx cxx-compiler zlib
    conda activate cxx
    conda install -c conda-forge -c bioconda --file a-file-with-the-deps-from-genomekit_dev-yml.txt

Build the package in development mode::

    pip install -e .

This builds the C++ extension and copies it into your source tree
(``genome_kit/_cxx.so``). It also ensures that ``import genome_kit``
works from any directory by linking your source tree from Python's
``site-packages``.

.. note:: Windows Prerequisites

    You will need VS 2019 or newer installed.
    To get a compatible shell, either locate and run ``vcvars64.bat``, or
    start the x64 Native Tools Command Prompt from the Start menu.

    To open VS with a preconfigured project, run the following directly in
    that command prompt::

        .vcproj\genome_kit.sln

Finally, run all the tests::

    python -m unittest discover

You can also run examples from the ``demos`` directory.

JetBrains CLion setup
---------------------

In the CMake settings, set the following environment variables::

    IN_CLION=1;CONDA_PREFIX=$HOME/conda/envs/genomekit_dev

Making changes
--------------

If the C/C++ code changed, you must re-run the development-mode install::

    pip install -e .

This applies after switching branches, merging changes, or editing the
C/C++ code yourself.
*Forgetting this step may lead to unpredictable behaviour.*

.. tip::

    To speed up compilation on Ubuntu or Mac, install ``ccache``.

Before checking in any changes, run all tests locally::

    python -m unittest discover

Adding tests
------------

Tests are located in the ``tests`` directory, and any data they need
is located in the ``tests/data`` directory.

While developing a test, you may want to run it repeatedly without running
all the other tests. For example, to run just the
``TestInterval.test_serialize`` method in ``tests/test_interval.py``, use::

    python -m unittest tests.test_interval.TestInterval.test_serialize

C++ tests
^^^^^^^^^

To test C++ code directly, you can compile and run ``src/main.cpp``::

    cmake -DCMAKE_BUILD_TYPE=Debug -B unittestbuild
    cmake --build unittestbuild --parallel --verbose --target main test

Debugging tests
^^^^^^^^^^^^^^^

Define the environment variable ``GK_DEBUGBREAK`` to break on ``GK_CHECK``
failures when running under a debugger.

Building data files
-------------------

GenomeKit relies on many pre-built files. For example, the binary annotation
``gencode.v19.annotation.dganno`` is built from
``gencode.v19.annotation.gff3.gz``.

Reasons to re-build these files include:

* Changes to the binary file format.
* Updates to the source data.
* Changes to the processing of source data.

GenomeKit has two sets of data files:

* *Full data files* are for normal use.
  They are stored remotely in the GenomeKit store and pulled to
  the user's local file system on demand.

* *Test data files* are for testing.
  They are tiny excerpts of the full files, small enough to check in
  to source control and fast enough to run in continuous integration
  testing. They are stored in the source tree under ``tests/data``.

The ``genome_kit`` module's ``build`` command can be used to build full
Appris/MANE data files, and Appris/MANE/dganno/2bit test data files.
For a full set of options, run::

    python -m genome_kit build --help

Building full data files
^^^^^^^^^^^^^^^^^^^^^^^^

For instructions on how to build annotation (dganno) files and
assembly (2bit) files, see the Genomes page.

Full-sized data files reside in a local user directory reserved for
GenomeKit and are downloaded from the data store on demand.

.. note::

    See the API Documentation for instructions on how to build
    data tracks, read alignments, read distributions,
    junction read alignments, and VCF tables.

Building test data files
^^^^^^^^^^^^^^^^^^^^^^^^

Test data files reside in the source tree under ``tests/data``.
To build them, you must have registered your source tree in
develop mode::

    pip install -e .

Now that your source tree is the default ``genome_kit`` import, the
``build`` subcommand will be able to find your test data directory.

To build test annotation, 2bit, Appris, and MANE files, use the
``--test-*`` flags on the ``build`` subcommand::

    python -m genome_kit build --test-anno --test-2bit --test-appris --test-mane

Releasing GenomeKit
-------------------

The GenomeKit repo uses the Release Please bot to create GitHub releases
based on PRs. When the bot creates a PR, you can merge it to create a
release.

Once a GitHub release is created, a PR will automatically be created in the
GenomeKit conda-forge feedstock repo by regro-cf-autotick-bot. Once that PR
is merged, conda-forge's CI pipeline is kicked off and the new version of
GenomeKit is built and published to conda-forge.
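
Release Please derives the next version number from Conventional Commit
messages: by default, ``fix:`` commits produce a patch bump, ``feat:`` commits
a minor bump, and commits marked with ``!`` or a ``BREAKING CHANGE:`` footer a
major bump. The following is only a rough sketch of that rule for reasoning
about what a release PR is likely to contain. It is not the bot's
implementation, the helper names ``bump_for`` and ``next_version`` are
hypothetical, and it assumes the repository uses the default configuration
(pre-1.0 versions and custom settings can behave differently).

```python
import re

# Conventional Commit header: type, optional (scope), optional "!", then ": text".
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: .+")

# Ranking used to keep the largest bump seen so far.
_RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(messages):
    """Return 'major', 'minor', 'patch', or None for a list of commit messages.

    Hypothetical helper: approximates the default Conventional Commits rule.
    """
    best = None
    for message in messages:
        lines = message.splitlines()
        if not lines:
            continue
        match = _HEADER.match(lines[0])
        if match is None:
            continue  # not a Conventional Commit header; ignored here
        if match.group("breaking") or "BREAKING CHANGE:" in message:
            level = "major"
        elif match.group("type") == "feat":
            level = "minor"
        elif match.group("type") == "fix":
            level = "patch"
        else:
            level = None  # e.g. docs:, chore:, test: do not trigger a release
        if _RANK[level] > _RANK[best]:
            best = level
    return best


def next_version(current, bump):
    """Apply a bump ('major', 'minor', 'patch', or None) to an 'X.Y.Z' string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

For example, a set of merged commits containing one ``feat:`` and several
``fix:`` entries would yield a minor bump under these assumptions, so the
release PR would propose the next minor version.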