digest
digest::Digester< P > Class Template Reference (abstract)

an abstract class for Digester objects.

#include <digester.hpp>

Inheritance diagram for digest::Digester< P >:
digest::ModMin< P >, digest::WindowMin< P, T >, digest::Syncmer< P, T >

Public Member Functions

 Digester (const char *seq, size_t len, unsigned k, size_t start=0, MinimizedHashType minimized_h=MinimizedHashType::CANON)
 
 Digester (const std::string &seq, unsigned k, size_t start=0, MinimizedHashType minimized_h=MinimizedHashType::CANON)
 
bool get_is_valid_hash ()
 
unsigned get_k ()
 
size_t get_len ()
 
bool roll_one ()
 moves the internal pointer to the next valid k-mer.
Time Complexity: O(1)
 
virtual void roll_minimizer (unsigned amount, std::vector< uint32_t > &vec)=0
 gets the positions, as defined by get_pos(), of minimizers up to the amount specified
 
virtual void roll_minimizer (unsigned amount, std::vector< std::pair< uint32_t, uint32_t > > &vec)=0
 gets the positions (pair.first), as defined by get_pos(), and the hashes (pair.second) of minimizers up to the amount specified
 
size_t get_pos ()
 
uint64_t get_chash ()
 
uint64_t get_fhash ()
 
uint64_t get_rhash ()
 
virtual void new_seq (const char *seq, size_t len, size_t start)
 replaces the current sequence with the new one. It's like starting over with a completely new sequence
 
virtual void new_seq (const std::string &seq, size_t pos)
 replaces the current sequence with the new one. It's like starting over with a completely new sequence
 
void append_seq (const char *seq, size_t len)
 simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have reached its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are equivalent to the minimizers you would get from rolling across ACTGACCCGGCCGG
 
void append_seq (const std::string &seq)
 simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have reached its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are equivalent to the minimizers you would get from rolling across ACTGACCCGGCCGG
 
MinimizedHashType get_minimized_h ()
 
const char * get_sequence ()
 

Detailed Description

template<BadCharPolicy P>
class digest::Digester< P >

an abstract class for Digester objects.

Template Parameters
P: a BadCharPolicy enum value. The policy to adopt when handling non-ACTG characters.

Constructor & Destructor Documentation

◆ Digester() [1/2]

template<BadCharPolicy P>
digest::Digester< P >::Digester ( const char *  seq,
size_t  len,
unsigned  k,
size_t  start = 0,
MinimizedHashType  minimized_h = MinimizedHashType::CANON 
)
inline
Parameters
seq: const char pointer pointing to the C string of the DNA sequence to be hashed.
len: length of seq.
k: k-mer size.
start: 0-indexed position in seq to start hashing from.
minimized_h: whether we are minimizing the canonical, forward, or reverse hash.
Exceptions
BadConstructionException: thrown if k is less than 4, or if the starting position is after the end of the string

◆ Digester() [2/2]

template<BadCharPolicy P>
digest::Digester< P >::Digester ( const std::string &  seq,
unsigned  k,
size_t  start = 0,
MinimizedHashType  minimized_h = MinimizedHashType::CANON 
)
inline
Parameters
seq: const string of the DNA sequence to be hashed.
k: k-mer size.
start: 0-indexed position in seq to start hashing from.
minimized_h: whether we are minimizing the canonical, forward, or reverse hash.
Exceptions
BadConstructionException: thrown if k is less than 4, or if the starting position is after the end of the string

Member Function Documentation

◆ append_seq() [1/2]

template<BadCharPolicy P>
void digest::Digester< P >::append_seq ( const char *  seq,
size_t  len 
)
inline

simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have reached its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are equivalent to the minimizers you would get from rolling across ACTGACCCGGCCGG

Parameters
seq: const C string of the DNA sequence to be appended
len: length of the sequence
Exceptions
NotRolledTillEndException: thrown when the internal iterator is not at the end of the current sequence
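
The equivalence described above (rolling over ACTGAC, appending CCGGCCGG, and getting the same result as rolling over ACTGACCCGGCCGG) can be illustrated with a small stand-alone sketch. It does not use digest at all: `ToyRoller` and `hash_chunks` are hypothetical names, and the polynomial rolling hash is an assumption chosen for brevity, not the hash function the library uses. The point is only that a roller which remembers its last k characters produces the same k-mer hashes whether the input arrives in one piece or in chunks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy hasher, NOT the digest implementation: a polynomial
// rolling hash that remembers only the last k characters it has seen.
class ToyRoller {
public:
    explicit ToyRoller(unsigned k) : k_(k), top_(1) {
        assert(k >= 1);
        // top_ = kBase^(k-1), with natural wrap-around modulo 2^64.
        for (unsigned i = 1; i < k; ++i) top_ *= kBase;
    }

    // Feed one chunk; append the hash of every completed k-mer to out.
    void feed(const std::string &chunk, std::vector<uint64_t> &out) {
        for (char c : chunk) {
            if (window_.size() == k_) {
                // Remove the oldest character's contribution, then drop it.
                hash_ -= top_ * static_cast<unsigned char>(window_.front());
                window_.erase(window_.begin());
            }
            hash_ = hash_ * kBase + static_cast<unsigned char>(c);
            window_.push_back(c);
            if (window_.size() == k_) out.push_back(hash_);
        }
    }

private:
    static constexpr uint64_t kBase = 131;
    unsigned k_;
    uint64_t top_;
    uint64_t hash_ = 0;
    std::string window_;  // the last (at most k) characters seen
};

// Hashes produced by feeding the chunks one after another.
std::vector<uint64_t> hash_chunks(const std::vector<std::string> &chunks,
                                  unsigned k) {
    ToyRoller roller(k);
    std::vector<uint64_t> out;
    for (const auto &chunk : chunks) roller.feed(chunk, out);
    return out;
}
```

Under these assumptions, `hash_chunks({"ACTGAC", "CCGGCCGG"}, 4)` and `hash_chunks({"ACTGACCCGGCCGG"}, 4)` return the same vector, including the k-mers that straddle the chunk boundary.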

◆ append_seq() [2/2]

template<BadCharPolicy P>
void digest::Digester< P >::append_seq ( const std::string &  seq)
inline

simulates appending a new sequence to the end of the old one. The old sequence is no longer stored, but the rolling hashes proceed as if the two sequences were concatenated. Can only be called once you have reached the end of the current sequence. For example, if your current sequence is ACTGAC, you have reached its end, and you call append_seq with CCGGCCGG, then the minimizers obtained from ACTGAC together with those obtained after calling append_seq are equivalent to the minimizers you would get from rolling across ACTGACCCGGCCGG

Parameters
seq: const std::string of the DNA sequence to be appended
Exceptions
NotRolledTillEndException: thrown when the internal iterator is not at the end of the current sequence

◆ get_chash()

template<BadCharPolicy P>
uint64_t digest::Digester< P >::get_chash ( )
inline
Returns
uint64_t, the canonical hash of the kmer that was rolled over when roll_one was last called (roll_minimizer() calls roll_one() internally).
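
This page does not define "canonical". In k-mer hashing the term usually means a value that is the same for a k-mer and for its reverse complement, so that both DNA strands map to one hash. The sketch below illustrates that property only. `reverse_complement`, `toy_hash`, and `toy_canonical` are hypothetical helpers, the FNV-1a hash is a stand-in, and taking the minimum of the two strand hashes is an assumption; the library may combine its forward and reverse hashes differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Reverse complement of an upper-case DNA string (other characters are
// reversed but left unchanged).
std::string reverse_complement(const std::string &kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    for (char &c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;
        }
    }
    return rc;
}

// Stand-in string hash (64-bit FNV-1a); not the hash digest uses.
uint64_t toy_hash(const std::string &s) {
    uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;  // FNV prime
    }
    return h;
}

// Assumed combination rule: the smaller of the two strand hashes.
uint64_t toy_canonical(const std::string &kmer) {
    return std::min(toy_hash(kmer), toy_hash(reverse_complement(kmer)));
}
```

With this rule, `toy_canonical("ACTG")` equals `toy_canonical("CAGT")`, because CAGT is the reverse complement of ACTG.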

◆ get_fhash()

template<BadCharPolicy P>
uint64_t digest::Digester< P >::get_fhash ( )
inline
Returns
uint64_t, the forward hash of the kmer that was rolled over when roll_one was last called (roll_minimizer() calls roll_one() internally).

◆ get_is_valid_hash()

template<BadCharPolicy P>
bool digest::Digester< P >::get_is_valid_hash ( )
inline
Returns
bool, true if the values of the 3 hashes are meaningful, false otherwise. The hashes are not meaningful if the object could not initialize with a valid hash, or if roll_one() was called when already at the end of the sequence

◆ get_k()

template<BadCharPolicy P>
unsigned digest::Digester< P >::get_k ( )
inline
Returns
unsigned, the value of k (kmer size)

◆ get_len()

template<BadCharPolicy P>
size_t digest::Digester< P >::get_len ( )
inline
Returns
size_t, the length of the sequence

◆ get_minimized_h()

template<BadCharPolicy P>
MinimizedHashType digest::Digester< P >::get_minimized_h ( )
inline
Returns
MinimizedHashType, a value representing the hash you are minimizing: 0 for canonical, 1 for forward, 2 for reverse

◆ get_pos()

template<BadCharPolicy P>
size_t digest::Digester< P >::get_pos ( )
inline
Returns
the index of the first character of the current k-mer, i.e. the k-mer that was most recently hashed. Sequences that have been appended onto each other count as one long sequence: if you first had a sequence of length 10, then appended another sequence of length 20, and the current k-mer starts at index 4 (0-indexed) of the second sequence, then get_pos() returns 14
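
The position arithmetic in this example (10 + 4 = 14) can be written out as a short sketch. `global_position` is a hypothetical helper, not part of digest; it only spells out how an index inside the most recently appended sequence maps onto a position over all sequences taken together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of digest: converts an index inside the
// most recently appended chunk into a position over all chunks combined.
// chunk_lengths lists the lengths in the order the chunks were supplied.
std::size_t global_position(const std::vector<std::size_t> &chunk_lengths,
                            std::size_t local_index) {
    assert(!chunk_lengths.empty());
    assert(local_index < chunk_lengths.back());
    std::size_t offset = 0;
    // Every chunk except the last lies entirely before the current chunk.
    for (std::size_t i = 0; i + 1 < chunk_lengths.size(); ++i)
        offset += chunk_lengths[i];
    return offset + local_index;
}
```

For the case described above, `global_position({10, 20}, 4)` evaluates to 14.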

◆ get_rhash()

template<BadCharPolicy P>
uint64_t digest::Digester< P >::get_rhash ( )
inline
Returns
uint64_t, the reverse hash of the kmer that was rolled over when roll_one was last called (roll_minimizer() calls roll_one() internally).

◆ get_sequence()

template<BadCharPolicy P>
const char * digest::Digester< P >::get_sequence ( )
inline
Returns
const char* representation of the sequence

◆ new_seq() [1/2]

template<BadCharPolicy P>
virtual void digest::Digester< P >::new_seq ( const char *  seq,
size_t  len,
size_t  start 
)
inline virtual

replaces the current sequence with the new one. It's like starting over with a completely new sequence

Parameters
seq: const char pointer to the new sequence to be hashed
len: length of the new sequence
start: position in the new sequence to start from
Exceptions
BadConstructionException: thrown if the starting position is greater than the length of the string

Reimplemented in digest::WindowMin< P, T >.

◆ new_seq() [2/2]

template<BadCharPolicy P>
virtual void digest::Digester< P >::new_seq ( const std::string &  seq,
size_t  pos 
)
inline virtual

replaces the current sequence with the new one. It's like starting over with a completely new sequence

Parameters
seq: const std::string reference to the new sequence to be hashed
pos: position in the new sequence to start from
Exceptions
BadConstructionException: thrown if the starting position is greater than the length of the string

Reimplemented in digest::WindowMin< P, T >.

◆ roll_minimizer() [1/2]

template<BadCharPolicy P>
virtual void digest::Digester< P >::roll_minimizer ( unsigned  amount,
std::vector< std::pair< uint32_t, uint32_t > > &  vec 
)
pure virtual

gets the positions (pair.first), as defined by get_pos(), and the hashes (pair.second) of minimizers up to the amount specified

Parameters
amount: number of minimizers you want to generate
vec: a reference to a vector of pairs of uint32_t; the positions and hashes returned will go there

Implemented in digest::ModMin< P >, digest::Syncmer< P, T >, and digest::WindowMin< P, T >.

◆ roll_minimizer() [2/2]

template<BadCharPolicy P>
virtual void digest::Digester< P >::roll_minimizer ( unsigned  amount,
std::vector< uint32_t > &  vec 
)
pure virtual

gets the positions, as defined by get_pos(), of minimizers up to the amount specified

Parameters
amount: number of minimizers you want to generate
vec: a reference to a vector of uint32_t; the positions returned will go there

Implemented in digest::ModMin< P >, digest::Syncmer< P, T >, and digest::WindowMin< P, T >.
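
To make the "up to the amount specified" contract of the two overloads concrete, here is a stand-alone sketch that mirrors their signatures without using digest. `ToySelector`, its hash, and its selection rule (keep every k-mer whose toy hash is divisible by `mod`) are all assumptions for illustration and do not describe how ModMin, WindowMin, or Syncmer choose minimizers. The sketch also assumes that results are appended to `vec` and that each call resumes where the previous one stopped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in, NOT digest code. Names and selection rule are assumptions.
class ToySelector {
public:
    ToySelector(std::string seq, unsigned k, uint32_t mod)
        : seq_(std::move(seq)), k_(k), mod_(mod) {
        assert(k_ >= 1 && mod_ >= 1);
    }

    // Positions only: appends at most `amount` new entries to vec.
    void roll_minimizer(unsigned amount, std::vector<uint32_t> &vec) {
        uint32_t pos = 0, hash = 0;
        for (unsigned found = 0; found < amount && next(pos, hash); ++found)
            vec.push_back(pos);
    }

    // Positions (pair.first) and hashes (pair.second).
    void roll_minimizer(unsigned amount,
                        std::vector<std::pair<uint32_t, uint32_t>> &vec) {
        uint32_t pos = 0, hash = 0;
        for (unsigned found = 0; found < amount && next(pos, hash); ++found)
            vec.emplace_back(pos, hash);
    }

private:
    // Advances the cursor to the next selected k-mer and reports it;
    // returns false once the end of the sequence is reached.
    bool next(uint32_t &pos, uint32_t &hash) {
        while (cursor_ + k_ <= seq_.size()) {
            uint32_t h = 0;
            for (unsigned i = 0; i < k_; ++i)
                h = h * 31u + static_cast<unsigned char>(seq_[cursor_ + i]);
            const std::size_t here = cursor_++;
            if (h % mod_ == 0) {
                pos = static_cast<uint32_t>(here);
                hash = h;
                return true;
            }
        }
        return false;
    }

    std::string seq_;
    unsigned k_;
    uint32_t mod_;
    std::size_t cursor_ = 0;  // start of the next k-mer to examine
};
```

With `mod` set to 1 every k-mer is selected, so on ACTGACCCGGCCGG with k = 4 a call with `amount` 3 yields positions 0, 1, 2, and a following call with a large `amount` appends the remaining positions 3 through 10 and then stops at the end of the sequence.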

◆ roll_one()

template<BadCharPolicy P>
bool digest::Digester< P >::roll_one ( )
inline

moves the internal pointer to the next valid k-mer.
Time Complexity: O(1)

Returns
bool, true if we were able to generate a valid hash, false otherwise
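
What "next valid k-mer" can mean is sketched below for one possible policy: skip any k-mer that contains a character other than A, C, T, or G. `next_valid_kmer` is a hypothetical helper, not digest code. It rescans the sequence instead of rolling, so it shows only which k-mer would be landed on, not the O(1) update, and it treats lower-case letters as invalid, which may differ from the library. The behaviour actually selected by the BadCharPolicy template parameter may also differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not digest code. Returns the start index of the
// first k-mer at or after `from` that consists only of A, C, T, G, or
// std::string::npos if there is none.
std::size_t next_valid_kmer(const std::string &seq, std::size_t from,
                            unsigned k) {
    assert(k >= 1);
    std::size_t run = 0;  // length of the current run of valid characters
    for (std::size_t i = from; i < seq.size(); ++i) {
        const char c = seq[i];
        const bool ok = (c == 'A' || c == 'C' || c == 'T' || c == 'G');
        run = ok ? run + 1 : 0;
        if (run >= k) return i + 1 - k;  // the last k characters are valid
    }
    return std::string::npos;
}
```

For example, in ACNTGACT with k = 4 the first valid k-mer starts at index 3 (TGAC), because every k-mer starting at index 0, 1, or 2 contains the N.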

The documentation for this class was generated from the following file: