Developers
Developing for GenomeKit currently requires a C++ compiler that supports C++20 features. GCC and Clang can both be installed via conda (gxx_linux-64 and clangxx_osx-64, respectively). For OSX, you will also need the SDK, which you can install via xcode-select --install or download from https://developer.apple.com/download/all/.
Setting up
Clone the source tree:
git clone git@github.com:deepgenomics/GenomeKit.git
From the GenomeKit directory, install the provided conda environment, which contains all dependencies:
conda env create -f genomekit_dev.yml
conda activate genomekit_dev
On Windows, you’ll need to comment out the macOS/Linux-only test dependencies in genomekit_dev.yml.
On M1 Macs, you might need to set up the environment differently:
conda create -n cxx cxx-compiler zlib fmt
conda activate cxx
conda install -c conda-forge -c bioconda --file a-file-with-the-deps-from-genomekit_dev-yml.txt
Build the package in development mode:
pip install -e .
This builds the C++ extension and copies it into your source tree (genome_kit/_cxx.so).
It also ensures that import genome_kit works from any directory by linking your source tree from Python’s site-packages.
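One way to confirm that the development install took effect is to ask Python where it would import the package from. The sketch below uses only the standard library; module_origin is a hypothetical helper name chosen for illustration, not part of GenomeKit.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    None means the module cannot be found on the current import path.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Run this from a directory outside the source tree.
    print(module_origin("genome_kit"))
```

If the install worked, the printed path would be expected to point into your clone of the source tree rather than into a copy under site-packages.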
Note
Windows Prerequisites
You will need VS 2019 or newer installed. To get a compatible shell, either locate and run vcvars64.bat, or start the x64 Native Tools Command Prompt from the Start menu.
To open VS with a preconfigured project, run the following directly in that command prompt:
.vcproj\genome_kit.sln
Finally, run all the tests:
python -m unittest discover
You can also run examples from the demos directory.
JetBrains CLion setup
In the CMake settings, set the following environment variables:
IN_CLION=1;CONDA_PREFIX=$HOME/conda/envs/genomekit_dev
Making changes
If the C/C++ code has changed, you must re-run the develop command:
pip install -e .
This includes switching branches, merging changes, or editing the C/C++ code yourself. Forgetting this step may lead to unpredictable behaviour.
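As a rough guard against forgetting the rebuild, you could compare file modification times before running tests. This is only a heuristic sketch: extension_is_stale is a hypothetical helper, the list of source suffixes is an assumption, and the extension file name differs by platform (genome_kit/_cxx.so is the name used in this guide).

```python
from pathlib import Path

# Assumed set of C/C++ source suffixes; adjust to match the source tree.
SOURCE_SUFFIXES = {".c", ".cpp", ".h", ".hpp"}


def extension_is_stale(extension, source_dir):
    """Return True if the built extension is missing or older than any source file."""
    extension = Path(extension)
    if not extension.exists():
        return True
    built = extension.stat().st_mtime
    return any(
        path.stat().st_mtime > built
        for path in Path(source_dir).rglob("*")
        if path.is_file() and path.suffix in SOURCE_SUFFIXES
    )


if __name__ == "__main__":
    if extension_is_stale("genome_kit/_cxx.so", "src"):
        print("C/C++ sources look newer than the extension; re-run: pip install -e .")
```

Modification times will not catch every case (for example, changed build settings), so re-running pip install -e . remains the reliable option.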
Tip
To speed up compilation on Ubuntu or Mac, install ccache.
Before checking in any changes, run all tests locally:
python -m unittest discover
Adding tests
Tests are located in the tests directory, and any data they need is located in the tests/data directory.
While developing a test, you may want to run it repeatedly, without all other tests.
For example, to run just the TestInterval.test_serialize method in tests/test_interval.py, use:
python -m unittest tests.test_interval.TestInterval.test_serialize
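The same single-method selection can be done from Python, which can be convenient inside a debugger or notebook. The sketch below is self-contained: TestRoundTrip is a hypothetical stand-in for a real test class such as TestInterval, and run_one is an illustrative helper, not something provided by GenomeKit.

```python
import io
import json
import unittest


class TestRoundTrip(unittest.TestCase):
    """Hypothetical stand-in for a real test class in the tests directory."""

    def test_serialize(self):
        # Round-trip a plain record through JSON and check nothing is lost.
        record = {"chromosome": "chr1", "strand": "+", "start": 100, "end": 200}
        self.assertEqual(json.loads(json.dumps(record)), record)

    def test_other(self):
        self.assertTrue(True)


def run_one(case_cls, method_name):
    """Run a single test method and return the unittest result object."""
    suite = unittest.TestSuite([case_cls(method_name)])
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_one(TestRoundTrip, "test_serialize")
    print("passed" if result.wasSuccessful() else "failed")
```

Only the named method runs; test_other is defined but never executed by run_one.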
C++ tests
To test C++ code directly, you can compile and run src/main.cpp:
cmake -DCMAKE_BUILD_TYPE=Debug -B unittestbuild
cmake --build unittestbuild --parallel --verbose --target main test
Debugging tests
Define the environment variable GK_DEBUGBREAK to break on GK_CHECK failures when running under a debugger.
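If you launch the debugged process from a Python script, the variable can be added to the child environment as sketched below. The helper name env_with_debugbreak is hypothetical, and the value "1" is an assumption: this guide only says the variable must be defined, not what value it should hold.

```python
import os


def env_with_debugbreak(base=None):
    """Return a copy of an environment mapping with GK_DEBUGBREAK defined."""
    env = dict(os.environ if base is None else base)
    # Assumption: any value works, since the guide only says to define it.
    env.setdefault("GK_DEBUGBREAK", "1")
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when starting your debugger; the caller's own environment is left unchanged.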
Building data files
GenomeKit relies on many pre-built files.
For example, the binary annotation gencode.v19.annotation.dganno is built from gencode.v19.annotation.gff3.gz.
Reasons to re-build these files include:
Changes to the binary file format.
Updates to the source data.
Changes to the processing of source data.
GenomeKit has two sets of data files:
Full data files are for normal use. They are stored remotely in the GenomeKit store and pulled to the user’s local file system on-demand.
Test data files are for testing. They are tiny excerpts of the full files, small enough to check in to source control and fast enough to run in continuous integration testing. They are stored in the source tree under tests/data.
The genome_kit module’s build command can be used to build full Appris data files, and Appris/dganno/2bit test data files.
For a full set of options, run:
python -m genome_kit build --help
Building full data files
For instructions on how to build annotation (dganno) files and assembly (2bit) files, see Genomes.
Full-sized data files reside in a local user directory reserved for GenomeKit, downloaded from the data store on-demand.
Note
See the API Documentation for instructions on how to build data tracks, read alignments, read distributions, junction read alignments, and VCF tables.
Building test data files
Test data files reside in the source tree under tests/data.
To build them, you must have registered your source tree in develop mode:
pip install -e .
Now that your source tree is the default genome_kit import, the build subcommand will be able to find your test data directory.
To build test annotation, 2bit, and Appris files, use --test-<type> flags on the build subcommand:
python -m genome_kit build --test-anno --test-2bit --test-appris
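If you rebuild test data from a script, the command line can be assembled programmatically. This is a minimal sketch: build_test_data_argv is a hypothetical helper, and it only knows the three --test-<type> flags named above; run python -m genome_kit build --help for the authoritative list.

```python
import sys

# The three test data kinds named in this guide.
TEST_KINDS = ("anno", "2bit", "appris")


def build_test_data_argv(kinds=TEST_KINDS):
    """Return the argv list for building the requested kinds of test data."""
    unknown = [kind for kind in kinds if kind not in TEST_KINDS]
    if unknown:
        raise ValueError(f"unknown test data kinds: {unknown}")
    flags = [f"--test-{kind}" for kind in kinds]
    return [sys.executable, "-m", "genome_kit", "build", *flags]


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to
    # subprocess.run(..., check=True) to actually build.
    print(" ".join(build_test_data_argv(["anno"])))
```

Using sys.executable keeps the build in the same interpreter (and therefore the same conda environment) as the calling script.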
Releasing GenomeKit
The GenomeKit repo uses the Release Please bot to create GitHub releases based on PRs. When the bot creates a PR, you can merge it to create a release.
Once a GitHub release is created, regro-cf-autotick-bot automatically opens a PR in the GenomeKit conda-forge feedstock repo. Once that PR is merged, conda-forge’s CI pipeline is kicked off, and the new version of GenomeKit is built and published to conda-forge.