digest
|
an abstract class for Digester objects.
#include <digester.hpp>
Public Member Functions | |
Digester (const char *seq, size_t len, unsigned k, size_t start=0, MinimizedHashType minimized_h=MinimizedHashType::CANON) | |
Digester (const std::string &seq, unsigned k, size_t start=0, MinimizedHashType minimized_h=MinimizedHashType::CANON) | |
bool | get_is_valid_hash () |
unsigned | get_k () |
size_t | get_len () |
bool | roll_one () |
moves the internal pointer to the next valid k-mer. Time Complexity: O(1) | |
virtual void | roll_minimizer (unsigned amount, std::vector< uint32_t > &vec)=0 |
gets the positions, as defined by get_pos(), of minimizers up to the amount specified | |
virtual void | roll_minimizer (unsigned amount, std::vector< std::pair< uint32_t, uint32_t > > &vec)=0 |
gets the positions (pair.first), as defined by get_pos(), and the hashes (pair.second) of minimizers up to the amount specified | |
size_t | get_pos () |
uint64_t | get_chash () |
uint64_t | get_fhash () |
uint64_t | get_rhash () |
virtual void | new_seq (const char *seq, size_t len, size_t start) |
replaces the current sequence with the new one. It's like starting over with a completely new sequence | |
virtual void | new_seq (const std::string &seq, size_t pos) |
replaces the current sequence with the new one. It's like starting over with a completely new sequence | |
void | append_seq (const char *seq, size_t len) |
simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have rolled to its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are the same as the minimizers you would get from rolling across ACTGACCCGGCCGG | |
void | append_seq (const std::string &seq) |
simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have rolled to its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are the same as the minimizers you would get from rolling across ACTGACCCGGCCGG | |
MinimizedHashType | get_minimized_h () |
const char * | get_sequence () |
an abstract class for Digester objects.
a | BadCharPolicy enum value. The policy to adopt when handling non-ACTG characters. |
|
inline |
seq | const char pointer pointing to the c-string of DNA sequence to be hashed. |
len | length of seq. |
k | kmer size. |
start | 0-indexed position in seq to start hashing from. |
minimized_h | whether we are minimizing the canonical, forward, or reverse hash |
BadConstructionException | Thrown if k is less than 4, or if the starting position is after the end of the string |
|
inline |
seq | const string of the DNA sequence to be hashed. |
k | kmer size. |
start | 0-indexed position in seq to start hashing from. |
minimized_h | whether we are minimizing the canonical, forward, or reverse hash |
BadConstructionException | Thrown if k is less than 4, or if the starting position is after the end of the string |
|
inline |
simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have rolled to its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are the same as the minimizers you would get from rolling across ACTGACCCGGCCGG
seq | const C string of DNA sequence to be appended |
len | length of the sequence |
NotRolledTillEndException | Thrown when the internal iterator is not at the end of the current sequence |
|
inline |
simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have rolled to its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are the same as the minimizers you would get from rolling across ACTGACCCGGCCGG
seq | const std string of DNA sequence to be appended |
NotRolledTillEndException | Thrown when the internal iterator is not at the end of the current sequence |
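The append_seq behaviour described above can be illustrated with a small, self-contained sketch. It does not use the digest API: `ToyKmerStream` and `feed` are hypothetical names, and the sketch emits plain k-mers rather than minimizers. It only shows why keeping the tail of the old sequence is enough for rolling to continue across the boundary as if the two sequences had been concatenated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy class, NOT part of digest. It mimics the idea behind
// append_seq: the old sequence is forgotten except for a short tail, yet
// k-mers that span the boundary are still produced.
class ToyKmerStream {
public:
    explicit ToyKmerStream(std::size_t k) : k_(k) { assert(k >= 1); }

    // Feed one chunk of sequence; returns every k-mer completed by it.
    std::vector<std::string> feed(const std::string &chunk) {
        std::vector<std::string> out;
        for (char c : chunk) {
            tail_.push_back(c);
            if (tail_.size() > k_) {
                tail_.erase(0, 1);  // keep only the last k characters
            }
            if (tail_.size() == k_) {
                out.push_back(tail_);
            }
        }
        return out;
    }

private:
    std::size_t k_;
    std::string tail_;  // the only part of earlier chunks that is kept
};
```

Feeding ACTGAC and then CCGGCCGG to one `ToyKmerStream` yields the same k-mers, in the same order, as feeding ACTGACCCGGCCGG in a single call, including those that straddle the boundary between the two chunks.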
|
inline virtual |
replaces the current sequence with the new one. It's like starting over with a completely new sequence
seq | const char pointer to new sequence to be hashed |
len | length of the new sequence |
start | position in new sequence to start from |
BadConstructionException | thrown if the starting position is greater than the length of the string |
Reimplemented in digest::WindowMin< P, T >.
|
inline virtual |
replaces the current sequence with the new one. It's like starting over with a completely new sequence
seq | const std string reference to the new sequence to be hashed |
start | position in new sequence to start from |
BadConstructionException | thrown if the starting position is greater than the length of the string |
Reimplemented in digest::WindowMin< P, T >.
|
pure virtual |
gets the positions (pair.first), as defined by get_pos(), and the hashes (pair.second) of minimizers up to the amount specified
amount | number of minimizers you want to generate |
vec | a reference to a vector of uint32_t pairs; the returned positions and hashes are stored there |
Implemented in digest::ModMin< P >, digest::Syncmer< P, T >, and digest::WindowMin< P, T >.
|
pure virtual |
gets the positions, as defined by get_pos(), of minimizers up to the amount specified
amount | number of minimizers you want to generate |
vec | a reference to a vector of uint32_t; the returned positions are stored there |
Implemented in digest::ModMin< P >, digest::Syncmer< P, T >, and digest::WindowMin< P, T >.
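The "up to the amount specified" contract of roll_minimizer can be sketched with a toy stand-in. Everything here is an assumption made for illustration: `ToyDigester` is a hypothetical class, the selection rule (toy hash divisible by 3) is not the rule used by ModMin, Syncmer, or WindowMin, and the sketch assumes that results are appended to the caller's vector and that a later call resumes where the previous one stopped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a concrete digester, NOT part of digest.
class ToyDigester {
public:
    ToyDigester(std::string seq, unsigned k)
        : seq_(std::move(seq)), k_(k), pos_(0) {
        assert(k_ >= 1);
    }

    // Appends at most `amount` selected positions to `vec` and remembers
    // where it stopped, so the next call continues from there.
    void roll_minimizer(unsigned amount, std::vector<std::uint32_t> &vec) {
        unsigned found = 0;
        while (found < amount && pos_ + k_ <= seq_.size()) {
            if (toy_hash(pos_) % 3 == 0) {  // illustrative selection rule
                vec.push_back(static_cast<std::uint32_t>(pos_));
                ++found;
            }
            ++pos_;
        }
    }

private:
    // Simple polynomial hash of the k-mer starting at `start` (toy only).
    std::uint32_t toy_hash(std::size_t start) const {
        std::uint32_t h = 0;
        for (std::size_t i = 0; i < k_; ++i) {
            h = h * 31u + static_cast<unsigned char>(seq_[start + i]);
        }
        return h;
    }

    std::string seq_;
    std::size_t k_;
    std::size_t pos_;  // start of the next k-mer to examine
};
```

Under these assumptions, asking for the positions in several small batches fills the vector with the same positions as one large request, which is the property a caller relies on when processing a long sequence incrementally.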
|
inline |
moves the internal pointer to the next valid k-mer.
Time Complexity: O(1)
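The O(1) cost of advancing by one k-mer comes from updating a rolling hash instead of rehashing the whole window. The sketch below uses a generic polynomial rolling hash, which is an assumption chosen for illustration and not necessarily the hash function digest uses; `hash_from_scratch`, `pow_base`, and `roll` are hypothetical helper names. It also ignores the skipping of invalid (non-ACTG) k-mers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::uint64_t kBase = 131;  // arbitrary base for the toy hash

// O(k): hash of the k-mer s[start, start + k), computed directly.
// Arithmetic is on uint64_t, so it wraps modulo 2^64 by definition.
std::uint64_t hash_from_scratch(const std::string &s, std::size_t start,
                                std::size_t k) {
    assert(start + k <= s.size());
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < k; ++i) {
        h = h * kBase + static_cast<unsigned char>(s[start + i]);
    }
    return h;
}

// kBase raised to the power e (modulo 2^64); computed once per k.
std::uint64_t pow_base(std::size_t e) {
    std::uint64_t p = 1;
    for (std::size_t i = 0; i < e; ++i) {
        p *= kBase;
    }
    return p;
}

// O(1): slide the window one character to the right.
// `top` must equal kBase^(k-1); `out` leaves the window, `in` enters it.
std::uint64_t roll(std::uint64_t h, char out, char in, std::uint64_t top) {
    return (h - static_cast<unsigned char>(out) * top) * kBase +
           static_cast<unsigned char>(in);
}
```

Each call to `roll` does a constant amount of work regardless of k, and its result matches `hash_from_scratch` for the shifted window.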