digest
Visualization of different minimizer schemes supported in Digest and code example using library
Digest is a C++ library that supports various sub-sampling schemes for $k$-mers in DNA sequences. It uses the rolling hash function from ntHash to order the $k$-mers in a window.

Digest is available on bioconda. Installing it from there provides both the C++ library and the Python library. The include and lib directories are placed in the conda environment directory (you can find it using conda env list).
After cloning from GitHub, we use the Meson build-system to install the library.
PREFIX is the absolute path where the library files (*.h and *.a files) will be installed.
PREFIX should not be the root directory of the digest/ repo, to avoid any issues with installation.
For example, use --prefix=$(pwd)/build from within the root directory of the digest/ repo.
The installation creates include and lib folders in the PREFIX folder.

If your coding project uses Meson to build the executable(s), you can include a file called subprojects/digest.wrap in your repository and let Meson install it for you.
To use Digest in your C++ project, you just need the header files (*.h) and the library file (*.a) that were installed in the first step. Assuming that build/ is the directory you installed them in, point your compiler's include path at build/include and its library search path at build/lib, and link against the installed library file.
There are three types of minimizer schemes that can be used: windowed minimizers, modimizers, and syncmers.
The general steps to use Digest are as follows: (1) include the relevant header files, (2) declare the Digest object, and (3) find the positions where the minimizers are present in the sequence.
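To make step (3) concrete, here is a minimal standalone sketch of what "finding minimizer positions" means for a windowed minimizer. It does not use Digest or ntHash: the function names `toy_kmer_hash` and `naive_window_minimizers` are hypothetical, the hash is a simple 2-bit encoding chosen so the result can be checked by hand, and ties are broken toward the leftmost $k$-mer, which may differ from Digest's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy k-mer hash: a 2-bit encoding (A=0, C=1, G=2, T=3), so k-mers are ordered
// lexicographically. This is a stand-in for ntHash, not what Digest uses.
// Assumes the k-mer contains only A, C, G, T and that k <= 32.
std::uint64_t toy_kmer_hash(const std::string& seq, std::size_t pos, std::size_t k) {
    std::uint64_t code = 0;
    for (std::size_t i = 0; i < k; ++i) {
        std::uint64_t base = 0;
        switch (seq[pos + i]) {
            case 'C': base = 1; break;
            case 'G': base = 2; break;
            case 'T': base = 3; break;
            default:  base = 0; break;  // 'A'
        }
        code = (code << 2) | base;
    }
    return code;
}

// Returns the start positions of the windowed minimizers of seq: for every
// window of w consecutive k-mers, the k-mer with the smallest hash
// (leftmost on ties). Each position is reported once, in increasing order.
std::vector<std::size_t> naive_window_minimizers(const std::string& seq,
                                                 std::size_t k, std::size_t w) {
    assert(k > 0 && k <= 32 && w > 0);
    std::vector<std::size_t> positions;
    if (seq.size() < k + w - 1) return positions;  // no full window fits

    // Hash every k-mer once.
    const std::size_t num_kmers = seq.size() - k + 1;
    std::vector<std::uint64_t> hashes(num_kmers);
    for (std::size_t i = 0; i < num_kmers; ++i) hashes[i] = toy_kmer_hash(seq, i, k);

    // Scan each window in O(w); a real library would use a faster structure.
    for (std::size_t start = 0; start + w <= num_kmers; ++start) {
        std::size_t best = start;
        for (std::size_t j = start + 1; j < start + w; ++j) {
            if (hashes[j] < hashes[best]) best = j;
        }
        // Consecutive windows often share a minimizer; report it only once.
        if (positions.empty() || positions.back() != best) positions.push_back(best);
    }
    return positions;
}
```

For the sequence ACGTACGT with k = 2 and w = 3, the seven 2-mers encode to 1, 6, 11, 12, 1, 6, 11, so the five windows select positions 0, 1, 4, 4, 4 and the function returns {0, 1, 4}.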
digest::BadCharPolicy::WRITEOVER means that any time the code encounters a non-ACTG character, it will replace it with an A.
digest::BadCharPolicy::SKIPOVER will skip any $k$-mers with non-ACTG characters.
digest::ds::Adaptive is our recommended data structure for finding the minimum value in a window (see the wiki for other options).

If you would like to obtain both the positions and hash values for each minimizer, you can pass a vector of paired integers to do so.
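The difference between the two bad-character policies, and the idea of collecting (position, hash) pairs, can be pictured with a small standalone sketch. Everything here is an assumption made for illustration: `ToyBadCharPolicy`, `toy_encode`, and `kmers_under_policy` are hypothetical names rather than Digest's API, the hash is a toy 2-bit encoding instead of ntHash, and only upper-case A, C, G, T are treated as valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical enum for this sketch; it mirrors the two policies described
// above but is not Digest's digest::BadCharPolicy type.
enum class ToyBadCharPolicy { WRITEOVER, SKIPOVER };

// True for upper-case A, C, G, T only (an assumption of this sketch).
bool is_actg(char c) { return c == 'A' || c == 'C' || c == 'G' || c == 'T'; }

// Toy 2-bit encoding (A=0, C=1, G=2, T=3); a stand-in for a real hash.
std::uint64_t toy_encode(const std::string& kmer) {
    std::uint64_t code = 0;
    for (char c : kmer) {
        std::uint64_t base = (c == 'C') ? 1 : (c == 'G') ? 2 : (c == 'T') ? 3 : 0;
        code = (code << 2) | base;
    }
    return code;
}

// Lists (start position, toy hash) for every k-mer that survives the policy.
// seq is taken by value so WRITEOVER can modify a private copy.
std::vector<std::pair<std::size_t, std::uint64_t>>
kmers_under_policy(std::string seq, std::size_t k, ToyBadCharPolicy policy) {
    assert(k > 0 && k <= 32);
    std::vector<std::pair<std::size_t, std::uint64_t>> out;
    if (seq.size() < k) return out;

    if (policy == ToyBadCharPolicy::WRITEOVER) {
        // Replace every non-ACTG character with 'A', so no k-mer is dropped.
        for (char& c : seq) {
            if (!is_actg(c)) c = 'A';
        }
    }
    for (std::size_t pos = 0; pos + k <= seq.size(); ++pos) {
        bool clean = true;
        for (std::size_t i = 0; i < k; ++i) {
            if (!is_actg(seq[pos + i])) { clean = false; break; }
        }
        if (!clean) continue;  // only reachable under SKIPOVER
        out.emplace_back(pos, toy_encode(seq.substr(pos, k)));
    }
    return out;
}
```

With the input ACNGT and k = 2, the WRITEOVER variant sees ACAGT and yields four pairs, (0, 1), (1, 4), (2, 2), (3, 11), while the SKIPOVER variant drops the two 2-mers that contain N and yields only (0, 1) and (3, 11). Note that the two policies can therefore produce different minimizers near ambiguous bases.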
Documentation generated with Doxygen can be found here
Included in the library are function bindings for each sub-sampling scheme for use in Python. The simplest way to install the Python module is through conda (conda install bioconda::digest). To install the Python module from source, first install the library with meson (see above for detailed instructions), then install the module with pip. For this setup, the meson prefix must be set to --prefix=/$DIGEST_REPO/build.
Alternatively, copy the lib and include directories from the earlier meson installation to a directory in the repo called build, and run pip install . from the repo root.
We recommend using a conda or Python virtual environment. Once installed, you can import and use the Digest library in Python.