digest
digest::thread_out Namespace Reference

A possible implementation for multi-threading the digestion of a single sequence. By carefully choosing where each digester starts digesting, each k-mer is considered exactly once. For more details on a function, see its entry under Function Documentation.

Classes

class  BadThreadOutParams
 Exception thrown when invalid parameters are passed to the thread functions.
 

Functions

template<digest::BadCharPolicy P>
void thread_mod (unsigned thread_count, std::vector< std::vector< uint32_t > > &vec, const char *seq, size_t len, unsigned k, uint32_t mod, uint32_t congruence=0, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 
template<digest::BadCharPolicy P>
void thread_mod (unsigned thread_count, std::vector< std::vector< uint32_t > > &vec, const std::string &seq, unsigned k, uint32_t mod, uint32_t congruence=0, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as the other thread_mod, except it can take a C++ string, and does not need to be provided the length of the string
 
template<digest::BadCharPolicy P>
void thread_mod (unsigned thread_count, std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &vec, const char *seq, size_t len, unsigned k, uint32_t mod, uint32_t congruence=0, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as other thread_mod that takes a c-string, except here vec is a vector of vectors of pairs of uint32_ts
 
template<digest::BadCharPolicy P>
void thread_mod (unsigned thread_count, std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &vec, const std::string &seq, unsigned k, uint32_t mod, uint32_t congruence=0, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as other thread_mod that takes a C++ string, except here vec is a vector of vectors of pairs of uint32_ts
 
template<digest::BadCharPolicy P, class T >
void thread_wind (unsigned thread_count, std::vector< std::vector< uint32_t > > &vec, const char *seq, size_t len, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 
template<digest::BadCharPolicy P, class T >
void thread_wind (unsigned thread_count, std::vector< std::vector< uint32_t > > &vec, const std::string &seq, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as the other thread_wind, except it can take a C++ string, and does not need to be provided the length of the string
 
template<digest::BadCharPolicy P, class T >
void thread_wind (unsigned thread_count, std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &vec, const char *seq, size_t len, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as other thread_wind that takes a c-string, except here vec is a vector of vectors of pairs of uint32_ts
 
template<digest::BadCharPolicy P, class T >
void thread_wind (unsigned thread_count, std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &vec, const std::string &seq, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as other thread_wind that takes a C++ string, except here vec is a vector of vectors of pairs of uint32_ts
 
template<digest::BadCharPolicy P, class T >
void thread_sync (unsigned thread_count, std::vector< std::vector< uint32_t > > &vec, const char *seq, size_t len, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 
template<digest::BadCharPolicy P, class T >
void thread_sync (unsigned thread_count, std::vector< std::vector< uint32_t > > &vec, const std::string &seq, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as the other thread_sync, except it can take a C++ string, and does not need to be provided the length of the string
 
template<digest::BadCharPolicy P, class T >
void thread_sync (unsigned thread_count, std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &vec, const char *seq, size_t len, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as other thread_sync that takes a c-string, except here vec is a vector of vectors of pairs of uint32_ts
 
template<digest::BadCharPolicy P, class T >
void thread_sync (unsigned thread_count, std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &vec, const std::string &seq, unsigned k, uint32_t large_wind_kmer_am, size_t start=0, digest::MinimizedHashType minimized_h=digest::MinimizedHashType::CANON)
 same as other thread_sync that takes a C++ string, except here vec is a vector of vectors of pairs of uint32_ts
 

Detailed Description

A possible implementation for multi-threading the digestion of a single sequence. The key idea is that by carefully choosing where each digester starts digesting, each k-mer is considered exactly once. For more details on a function, see its entry under Function Documentation.
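The partitioning idea can be sketched as follows. This is a minimal illustration and not the library's code: Chunk and partition_kmers are hypothetical names, and the even split of k-mer start positions is an assumption about one reasonable way to divide the work. Each thread owns a contiguous range of k-mer start positions, so no k-mer is assigned twice, and reads k - 1 extra characters past its last start position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the share of work given to one thread.
struct Chunk {
    std::size_t first_kmer; // start position of the first k-mer owned by the thread
    std::size_t kmer_count; // number of consecutive k-mer start positions owned
    std::size_t seq_end;    // one past the last character the thread has to read
};

// Split the k-mer start positions of seq[start, len) evenly across threads.
// Every start position lands in exactly one chunk, so each k-mer is
// considered once even though neighbouring chunks share k - 1 characters.
std::vector<Chunk> partition_kmers(std::size_t len, std::size_t start,
                                   unsigned k, unsigned thread_count) {
    assert(k > 0 && thread_count > 0 && start + k <= len);
    const std::size_t total = len - start - k + 1; // number of k-mers
    const std::size_t base = total / thread_count;
    const std::size_t extra = total % thread_count; // first `extra` chunks get one more
    std::vector<Chunk> chunks;
    std::size_t next = start;
    for (unsigned i = 0; i < thread_count; ++i) {
        const std::size_t count = base + (i < extra ? 1 : 0);
        const std::size_t end = (count == 0) ? next : next + count + k - 1;
        chunks.push_back({next, count, end});
        next += count;
    }
    return chunks;
}
```

For len = 14, k = 4 and two threads, this gives the first thread k-mer starts 0 to 5 and the second thread starts 6 to 10.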

IMPORTANT:
This approach does not produce correct results for sequences that contain non-ACTG characters. Consider seq = ACTGANACNACTGA, k = 4, l_wind = 4, thread_count = 2. The sequence contains 4 valid k-mers in total, and therefore exactly 1 valid large window, but this cannot be known until the sequence has actually been scanned. The sequence is therefore partitioned into ACTGANACNA and ANACNACTGA, and each piece is fed to its own digester object. Each digester now sees 0 valid large windows.
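The counts in this example can be checked with a small stand-alone sketch. It does not use the library: count_valid_kmers and count_large_windows are hypothetical helpers, only uppercase A, C, T, G are treated as valid, and a large window is assumed to be any l_wind consecutive valid k-mers with invalid ones skipped, which is the reading the example implies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption for this sketch: only uppercase A, C, T, G count as valid.
bool is_actg(char c) { return c == 'A' || c == 'C' || c == 'T' || c == 'G'; }

// Count k-mers that contain no non-ACTG character.
std::size_t count_valid_kmers(const std::string &seq, unsigned k) {
    assert(k > 0);
    std::size_t valid = 0;
    std::size_t run = 0; // length of the current run of valid characters
    for (char c : seq) {
        run = is_actg(c) ? run + 1 : 0;
        if (run >= k) ++valid; // a valid k-mer ends at this character
    }
    return valid;
}

// Count large windows, assuming a window is l_wind consecutive valid k-mers
// and invalid k-mers are skipped over.
std::size_t count_large_windows(const std::string &seq, unsigned k, unsigned l_wind) {
    assert(l_wind > 0);
    const std::size_t valid = count_valid_kmers(seq, k);
    return valid >= l_wind ? valid - l_wind + 1 : 0;
}
```

Under these assumptions the full sequence ACTGANACNACTGA yields 4 valid k-mers and 1 large window, while each of the two partitions yields 2 valid k-mers and no large window.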

Function Documentation

◆ thread_mod() [1/4]

template<digest::BadCharPolicy P>
void digest::thread_out::thread_mod ( unsigned  thread_count,
std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &  vec,
const char *  seq,
size_t  len,
unsigned  k,
uint32_t  mod,
uint32_t  congruence = 0,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as other thread_mod that takes a c-string, except here vec is a vector of vectors of pairs of uint32_ts

Parameters
vec  contains both the index and the hash of each minimizer. Everything else previously stated about vec still applies.

◆ thread_mod() [2/4]

template<digest::BadCharPolicy P>
void digest::thread_out::thread_mod ( unsigned  thread_count,
std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &  vec,
const std::string &  seq,
unsigned  k,
uint32_t  mod,
uint32_t  congruence = 0,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as other thread_mod that takes a C++ string, except here vec is a vector of vectors of pairs of uint32_ts

Parameters
vec  contains both the index and the hash of each minimizer. Everything else previously stated about vec still applies.

◆ thread_mod() [3/4]

template<digest::BadCharPolicy P>
void digest::thread_out::thread_mod ( unsigned  thread_count,
std::vector< std::vector< uint32_t > > &  vec,
const char *  seq,
size_t  len,
unsigned  k,
uint32_t  mod,
uint32_t  congruence = 0,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)
Parameters
thread_count  the number of threads to use.
vec  a vector of vectors in which the minimizers will be placed. Each vector corresponds to one thread. The minimizers within each vector are in ascending order by index, and the vectors themselves are also in ascending order by index, i.e. all minimizers in vector_i come before all minimizers in vector_(i+1).
seq  char pointer pointing to the C-string of the DNA sequence to be hashed.
len  length of seq.
k  k-mer size.
mod  mod space to be used to calculate universal minimizers.
congruence  value that minimizer hashes must be congruent to in the mod space.
start  0-indexed position in seq to start hashing from.
minimized_h  hash to be minimized: 0 for canonical, 1 for forward, 2 for reverse.
Exceptions
BadThreadOutParams
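
The roles of mod and congruence can be illustrated with a short sketch. It is not the library's implementation: select_mod_minimizers is a hypothetical function, and it takes already computed k-mer hashes as input instead of hashing a sequence. A k-mer is kept when its hash is congruent to congruence modulo mod.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the positions i for which hashes[i] % mod == congruence.
// `hashes` stands in for the per-k-mer hash values (an assumption of this
// sketch; the real functions compute them from the sequence).
std::vector<std::uint32_t> select_mod_minimizers(const std::vector<std::uint32_t> &hashes,
                                                 std::uint32_t mod,
                                                 std::uint32_t congruence = 0) {
    assert(mod > 0 && congruence < mod);
    std::vector<std::uint32_t> positions;
    for (std::size_t i = 0; i < hashes.size(); ++i) {
        if (hashes[i] % mod == congruence) {
            positions.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return positions;
}
```

Because each k-mer is tested independently of its neighbours, this selection rule parallelizes without any overlap beyond the k - 1 characters shared between adjacent partitions.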

◆ thread_mod() [4/4]

template<digest::BadCharPolicy P>
void digest::thread_out::thread_mod ( unsigned  thread_count,
std::vector< std::vector< uint32_t > > &  vec,
const std::string &  seq,
unsigned  k,
uint32_t  mod,
uint32_t  congruence = 0,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as the other thread_mod, except it can take a C++ string, and does not need to be provided the length of the string

Parameters
seq  C++ string of the DNA sequence to be hashed.

◆ thread_sync() [1/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_sync ( unsigned  thread_count,
std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &  vec,
const char *  seq,
size_t  len,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as other thread_sync that takes a c-string, except here vec is a vector of vectors of pairs of uint32_ts

Parameters
vec  contains both the index and the hash of each minimizer. Everything else previously stated about vec still applies.

◆ thread_sync() [2/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_sync ( unsigned  thread_count,
std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &  vec,
const std::string &  seq,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as other thread_sync that takes a C++ string, except here vec is a vector of vectors of pairs of uint32_ts

Parameters
vec  contains both the index and the hash of each minimizer. Everything else previously stated about vec still applies.

◆ thread_sync() [3/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_sync ( unsigned  thread_count,
std::vector< std::vector< uint32_t > > &  vec,
const char *  seq,
size_t  len,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)
Template Parameters
P  policy for dealing with non-ACTG characters.
T  min query data structure to use; refer to the docs of the classes in the ds namespace for more info.
Parameters
thread_count  the number of threads to use.
vec  a vector of vectors in which the minimizers will be placed. Each vector corresponds to one thread. The minimizers within each vector are in ascending order by index, and the vectors themselves are also in ascending order by index, i.e. all minimizers in vector_i come before all minimizers in vector_(i+1).
seq  char pointer pointing to the C-string of the DNA sequence to be hashed.
len  length of seq.
k  k-mer size.
large_wind_kmer_am  number of k-mers in each large window.
start  0-indexed position in seq to start hashing from.
minimized_h  hash to be minimized: 0 for canonical, 1 for forward, 2 for reverse.
Exceptions
BadThreadOutParams
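
Because the per-thread vectors are ordered both internally and relative to each other, a single ordered result can be obtained by plain concatenation, with no merge or sort step. The helper below is a hypothetical sketch (flatten is not part of the library) that works for both the uint32_t and the pair-valued forms of vec.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Concatenate the per-thread result vectors in thread order.
// Relies on the documented guarantee that everything in vector_i
// precedes everything in vector_(i+1).
template <class T>
std::vector<T> flatten(const std::vector<std::vector<T>> &per_thread) {
    std::size_t total = 0;
    for (const auto &v : per_thread) {
        total += v.size();
    }
    std::vector<T> out;
    out.reserve(total); // one allocation for the whole result
    for (const auto &v : per_thread) {
        out.insert(out.end(), v.begin(), v.end());
    }
    return out;
}
```

A thread that found no minimizers simply contributes an empty vector, which concatenation handles without special cases.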

◆ thread_sync() [4/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_sync ( unsigned  thread_count,
std::vector< std::vector< uint32_t > > &  vec,
const std::string &  seq,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as the other thread_sync, except it can take a C++ string, and does not need to be provided the length of the string

Parameters
seq  C++ string of the DNA sequence to be hashed.

◆ thread_wind() [1/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_wind ( unsigned  thread_count,
std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &  vec,
const char *  seq,
size_t  len,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as other thread_wind that takes a c-string, except here vec is a vector of vectors of pairs of uint32_ts

Parameters
vec  contains both the index and the hash of each minimizer. Everything else previously stated about vec still applies.

◆ thread_wind() [2/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_wind ( unsigned  thread_count,
std::vector< std::vector< std::pair< uint32_t, uint32_t > > > &  vec,
const std::string &  seq,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as other thread_wind that takes a C++ string, except here vec is a vector of vectors of pairs of uint32_ts

Parameters
vec  contains both the index and the hash of each minimizer. Everything else previously stated about vec still applies.

◆ thread_wind() [3/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_wind ( unsigned  thread_count,
std::vector< std::vector< uint32_t > > &  vec,
const char *  seq,
size_t  len,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)
Template Parameters
P  policy for dealing with non-ACTG characters.
T  min query data structure to use; refer to the docs of the classes in the ds namespace for more info.
Parameters
thread_count  the number of threads to use.
vec  a vector of vectors in which the minimizers will be placed. Each vector corresponds to one thread. The minimizers within each vector are in ascending order by index, and the vectors themselves are also in ascending order by index, i.e. all minimizers in vector_i come before all minimizers in vector_(i+1).
seq  char pointer pointing to the C-string of the DNA sequence to be hashed.
len  length of seq.
k  k-mer size.
large_wind_kmer_am  number of k-mers in each large window.
start  0-indexed position in seq to start hashing from.
minimized_h  hash to be minimized: 0 for canonical, 1 for forward, 2 for reverse.
Exceptions
BadThreadOutParams
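
As a rough illustration of window-based selection, the sketch below picks the smallest hash in every window of w consecutive k-mer hashes. It is not the library's code: window_minimizers is a hypothetical function, it works on precomputed hashes, it uses a naive scan where the library delegates to the min query structure T, and breaking ties in favour of the rightmost k-mer is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For each window of w consecutive hashes, record the position of the minimum.
// A position is reported once even if it is the minimum of several windows.
std::vector<std::uint32_t> window_minimizers(const std::vector<std::uint32_t> &hashes,
                                             std::size_t w) {
    assert(w > 0);
    std::vector<std::uint32_t> picked;
    for (std::size_t first = 0; first + w <= hashes.size(); ++first) {
        std::size_t best = first;
        for (std::size_t i = first + 1; i < first + w; ++i) {
            // "<=" makes ties resolve to the rightmost position (assumption).
            if (hashes[i] <= hashes[best]) best = i;
        }
        // The chosen position never moves left between windows, so comparing
        // with the last reported position is enough to avoid duplicates.
        if (picked.empty() || picked.back() != best) {
            picked.push_back(static_cast<std::uint32_t>(best));
        }
    }
    return picked;
}
```

Unlike the mod-based rule, the choice here depends on neighbouring k-mers, which is why adjacent partitions have to overlap by a full large window.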

◆ thread_wind() [4/4]

template<digest::BadCharPolicy P, class T >
void digest::thread_out::thread_wind ( unsigned  thread_count,
std::vector< std::vector< uint32_t > > &  vec,
const std::string &  seq,
unsigned  k,
uint32_t  large_wind_kmer_am,
size_t  start = 0,
digest::MinimizedHashType  minimized_h = digest::MinimizedHashType::CANON 
)

same as the other thread_wind, except it can take a C++ string, and does not need to be provided the length of the string

Parameters
seq  C++ string of the DNA sequence to be hashed.