src/bioseq/nucleotide


The nucleotide module provides several enum types, each representing a single nucleotide (base) in a DNA or RNA sequence. Using enums rather than raw characters gives convenience and type safety.

There are two categories of nucleotide types. Each category name is also a type alias covering the DNA and RNA types in that category. These aliases, together with the AnyNucleotide alias, which covers every type in this module, are useful for overloading procs across all types in a category when appropriate.
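As a rough illustration of how such an alias supports overloading, the sketch below uses two hypothetical two-member enums (MiniDNA, MiniRNA) and a hypothetical union alias (MiniNucleotide). None of these names come from the module; they only mirror the pattern of Nucleotide = DNA | RNA.

```nim
type
  # Hypothetical stand-ins for the module's enums, trimmed to two members each.
  MiniDNA = enum mdA, mdT
  MiniRNA = enum mrA, mrU
  # Union alias: matches either enum, in the style of Nucleotide = DNA | RNA.
  MiniNucleotide = MiniDNA | MiniRNA

func symbol(n: MiniDNA): char =
  # Overload specific to the DNA-like enum.
  case n
  of mdA: 'A'
  of mdT: 'T'

func symbol(n: MiniRNA): char =
  # Overload specific to the RNA-like enum.
  case n
  of mrA: 'A'
  of mrU: 'U'

func describe(n: MiniNucleotide): string =
  # One generic body serves both enums; `symbol` resolves per concrete type.
  "base " & symbol(n)

doAssert describe(mdT) == "base T"
doAssert describe(mrU) == "base U"
```

The alias keeps shared logic in one place while type-specific behaviour stays in the per-type overloads.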

Type Categories
  • Nucleotide
  • StrictNucleotide

Nucleotide

The DNA and RNA types aliased by Nucleotide follow the IUPAC nucleic acid notation, with one additional character, '?', for positions where it is not known whether there is a gap or an unknown nucleic acid. Use these types when base ambiguity must be represented. Each value maps to an 8-bit unsigned integer following Paradis 2007, which allows very fast comparison of nucleotides even when base identities are ambiguous. The table below summarizes the binary and uint8 representations along with the IUPAC symbols, definitions, and complementary nucleotides.

Symbol  Binary    uint8  Definition         Complement
A       10001000  136    Adenine            T/U
G       01001000  72     Guanine            C
C       00101000  40     Cytosine           G
T/U     00011000  24     Thymine/Uracil     A
R       11000000  192    A or G             Y
M       10100000  160    A or C             K
W       10010000  144    A or T/U           W
S       01100000  96     G or C             S
K       01010000  80     G or T/U           M
Y       00110000  48     C or T/U           R
V       11100000  224    Not T/U            B
H       10110000  176    Not G              D
D       11010000  208    Not C              H
B       01110000  112    Not A              V
N       11110000  240    Any base           N
-       00000100  4      Alignment gap      -
?       00000010  2      Unknown character  ?
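To make the bit layout concrete, here is a minimal self-contained sketch that works directly on the byte codes from the table. It reads the four high bits as one flag per base (A, G, C, T/U) and bit 3 (value 8) as a marker for a single known base. The helper names isKnown and mayBe are hypothetical and are not part of the module.

```nim
const
  # Byte codes copied from the table above.
  codeA = 0b10001000'u8  # 136
  codeG = 0b01001000'u8  # 72
  codeC = 0b00101000'u8  # 40
  codeT = 0b00011000'u8  # 24
  codeR = 0b11000000'u8  # 192, A or G
  codeN = 0b11110000'u8  # 240, any base

func isKnown(b: uint8): bool =
  # Bit 3 (value 8) is set only in the codes for A, G, C and T/U.
  (b and 0b00001000'u8) != 0'u8

func mayBe(b, base: uint8): bool =
  # True when the code `b` shares one of the four high "base" bits with `base`.
  (b and base and 0b11110000'u8) != 0'u8

doAssert isKnown(codeA) and not isKnown(codeR)
doAssert mayBe(codeR, codeA) and not mayBe(codeR, codeC)
# An ambiguity code is the bitwise OR of the high bits of its members.
doAssert(((codeA or codeG) and 0b11110000'u8) == codeR)
```

Because every test is a single mask-and-compare, checks on ambiguous bases need no table lookup or branching over symbol lists.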

Example:

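import src/bioseq/nucleotide

# Parse a character into the DNA enum and test its identity.
let t = parseChar('T', DNA)
assert t.isThymine

# The complement of T is A.
let comp = t.complement
assert comp.toChar == 'A'

# RNA uses U in place of T.
let u = parseChar('U', RNA)
assert u.isUracil

# Converting RNA to DNA maps U to T.
let ut = u.toDNA
assert ut.toChar == 'T'

# R is an ambiguity code (A or G), so it still counts as a purine.
let r = parseChar('R', DNA)
assert r.isPurine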

StrictNucleotide

The StrictDNA and StrictRNA types aliased by StrictNucleotide use the same encoding as the Nucleotide types but are restricted to the A, G, C, and T/U nucleotides, so they do not allow ambiguity.

Symbol  Binary    uint8  Definition      Complement
A       10001000  136    Adenine         T/U
G       01001000  72     Guanine         C
C       00101000  40     Cytosine        G
T/U     00011000  24     Thymine/Uracil  A

Example:

import src/bioseq/nucleotide

# Parse into the strict type, then take the complement.
let
  t = parseChar('T', StrictDNA)
  a = t.complement
assert t.isThymine
assert a.toChar == 'A'

Types

DNA = enum
  dnaA, dnaG, dnaC, dnaT, dnaR, dnaM, dnaW, dnaS, dnaK, dnaY, dnaV, dnaH, dnaD,
  dnaB, dnaN, dnaGap, dnaUnk
Nucleotide = DNA | RNA
RNA = enum
  rnaA, rnaG, rnaC, rnaU, rnaR, rnaM, rnaW, rnaS, rnaK, rnaY, rnaV, rnaH, rnaD,
  rnaB, rnaN, rnaGap, rnaUnk
StrictDNA = enum
  sdnaA, sdnaG, sdnaC, sdnaT
StrictRNA = enum
  srnaA, srnaG, srnaC, srnaU

Consts

dnaByte: array[DNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u, 192'u, 160'u,
                             144'u, 96'u, 80'u, 48'u, 224'u, 176'u, 208'u,
                             112'u, 240'u, 4'u, 2'u]
dnaChar: array[DNA, char] = ['A', 'G', 'C', 'T', 'R', 'M', 'W', 'S', 'K', 'Y',
                             'V', 'H', 'D', 'B', 'N', '-', '?']
dnaComplement: array[DNA, DNA] = [dnaT, dnaC, dnaG, dnaA, dnaY, dnaK, dnaW,
                                  dnaS, dnaM, dnaR, dnaB, dnaD, dnaH, dnaV,
                                  dnaN, dnaGap, dnaUnk]
dnaUnambiguousSet: array[DNA, set[DNA]] = [{dnaA}, {dnaG}, {dnaC}, {dnaT},
    {dnaA, dnaG}, {dnaA, dnaC}, {dnaA, dnaT}, {dnaG, dnaC}, {dnaG, dnaT},
    {dnaC, dnaT}, {dnaA, dnaG, dnaC}, {dnaA, dnaC, dnaT}, {dnaA, dnaG, dnaT},
    {dnaT, dnaG, dnaC}, {dnaA, dnaG, dnaC, dnaT}, {dnaGap},
    {dnaA, dnaG, dnaC, dnaT, dnaGap}]
rnaByte: array[RNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u, 192'u, 160'u,
                             144'u, 96'u, 80'u, 48'u, 224'u, 176'u, 208'u,
                             112'u, 240'u, 4'u, 2'u]
rnaChar: array[RNA, char] = ['A', 'G', 'C', 'U', 'R', 'M', 'W', 'S', 'K', 'Y',
                             'V', 'H', 'D', 'B', 'N', '-', '?']
rnaComplement: array[RNA, RNA] = [rnaU, rnaC, rnaG, rnaA, rnaY, rnaK, rnaW,
                                  rnaS, rnaM, rnaR, rnaB, rnaD, rnaH, rnaV,
                                  rnaN, rnaGap, rnaUnk]
rnaUnambiguousSet: array[RNA, set[RNA]] = [{rnaA}, {rnaG}, {rnaC}, {rnaU},
    {rnaA, rnaG}, {rnaA, rnaC}, {rnaA, rnaU}, {rnaG, rnaC}, {rnaG, rnaU},
    {rnaC, rnaU}, {rnaA, rnaG, rnaC}, {rnaA, rnaC, rnaU}, {rnaA, rnaG, rnaU},
    {rnaU, rnaG, rnaC}, {rnaA, rnaG, rnaC, rnaU}, {rnaGap},
    {rnaA, rnaG, rnaC, rnaU, rnaGap}]
strictDnaByte: array[StrictDNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u]
strictDnaChar: array[StrictDNA, char] = ['A', 'G', 'C', 'T']
strictDnaComplement: array[StrictDNA, StrictDNA] = [sdnaT, sdnaC, sdnaG, sdnaA]
strictRnaByte: array[StrictRNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u]
strictRnaChar: array[StrictRNA, char] = ['A', 'G', 'C', 'U']
strictRnaComplement: array[StrictRNA, StrictRNA] = [srnaU, srnaC, srnaG, srnaA]
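The constants above are arrays indexed by the enum itself, so each conversion is a single array lookup. A minimal sketch of that pattern, using a hypothetical four-member enum MiniDNA and hypothetical tables miniChar and miniComplement (the module's own funcs may be written differently):

```nim
type
  # Hypothetical four-member enum, in the spirit of StrictDNA.
  MiniDNA = enum mA, mG, mC, mT

const
  # Lookup tables indexed by the enum, one entry per member in declaration order.
  miniChar: array[MiniDNA, char] = ['A', 'G', 'C', 'T']
  miniComplement: array[MiniDNA, MiniDNA] = [mT, mC, mG, mA]

func toChar(n: MiniDNA): char =
  # Character lookup by enum value.
  miniChar[n]

func complement(n: MiniDNA): MiniDNA =
  # Complement lookup by enum value.
  miniComplement[n]

doAssert mA.complement.toChar == 'T'
# Complementing twice returns the original base.
for n in MiniDNA:
  doAssert n.complement.complement == n
```

Indexing by the enum type means the compiler checks that each table has exactly one entry per member.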

Procs

func byte(n: DNA): byte {.raises: [], tags: [].}
Byte representation of the base (byte is an alias of uint8).
func byte(n: RNA): byte {.raises: [], tags: [].}
Byte representation of the base (byte is an alias of uint8).
func byte(n: StrictDNA): byte {.raises: [], tags: [].}
Byte representation of the base (byte is an alias of uint8).
func byte(n: StrictRNA): byte {.raises: [], tags: [].}
Byte representation of the base (byte is an alias of uint8).
func complement(n: DNA): DNA {.raises: [], tags: [].}
Complementary base.
func complement(n: RNA): RNA {.raises: [], tags: [].}
Complementary base.
func complement(n: StrictDNA): StrictDNA {.raises: [], tags: [].}
Complementary base.
func complement(n: StrictRNA): StrictRNA {.raises: [], tags: [].}
Complementary base.
func diffBase(a, b: AnyNucleotide): bool
Returns true if the bases are unambiguously different. An unknown base '?' is treated as different, but any base 'N' and a gap '-' are not.
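One way to test for "unambiguously different" on the byte codes is to check that two codes share none of the four high base bits. The sketch below is an illustration of that bit test only: the helper shareNoBase is hypothetical, it is not claimed to be how diffBase is implemented, and it does not special-case the gap '-'.

```nim
const
  # Byte codes from the Nucleotide table.
  codeA = 136'u8
  codeG = 72'u8
  codeT = 24'u8
  codeR = 192'u8    # A or G
  codeN = 240'u8    # any base
  codeUnk = 2'u8    # '?'

func shareNoBase(a, b: uint8): bool =
  # The AND of two codes keeps only the bits they have in common.
  # A result below 16 means none of the four high base bits is shared.
  (a and b) < 16'u8

doAssert shareNoBase(codeA, codeG)        # A vs G: clearly different
doAssert not shareNoBase(codeA, codeR)    # R may be A
doAssert not shareNoBase(codeN, codeT)    # N may be T
doAssert shareNoBase(codeUnk, codeA)      # '?' shares no base bit with A
```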
func isAdenine(n: AnyNucleotide): bool
Returns true if the base is unambiguously adenine (A).
func isCytosine(n: AnyNucleotide): bool
Returns true if the base is unambiguously cytosine (C).
func isGuanine(n: AnyNucleotide): bool
Returns true if the base is unambiguously guanine (G).
func isPurine(n: AnyNucleotide): bool
Returns true if the base is unambiguously a purine (A or G).
func isPyrimidine(n: AnyNucleotide): bool
Returns true if the base is unambiguously a pyrimidine (T/U or C).
func isThymine(n: DNA | StrictDNA): bool
Returns true if the base is unambiguously thymine (T).
func isUracil(n: RNA | StrictRNA): bool
Returns true if the base is unambiguously uracil (U).
func knownBase(n: AnyNucleotide): bool
Returns true if the base is not ambiguous.
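The purine and pyrimidine checks can be expressed as bit masks over the byte codes: a code is unambiguously a purine when it has no bits set outside the A and G flags and the known-base bit. The sketch below works from the table values only; the names looksPurine and looksPyrimidine and the masks are assumptions made for illustration, not the module's implementation.

```nim
const
  # Bits that must be clear for a purine: the C and T/U flags plus the low bits.
  purineMask = 0b00110111'u8      # 55
  # Bits that must be clear for a pyrimidine: the A and G flags plus the low bits.
  pyrimidineMask = 0b11000111'u8  # 199

func looksPurine(b: uint8): bool =
  # Non-zero code with nothing set outside A, G and the known-base bit.
  b != 0'u8 and (b and purineMask) == 0'u8

func looksPyrimidine(b: uint8): bool =
  # Non-zero code with nothing set outside C, T/U and the known-base bit.
  b != 0'u8 and (b and pyrimidineMask) == 0'u8

doAssert looksPurine(136'u8)          # A
doAssert looksPurine(192'u8)          # R (A or G) is still certainly a purine
doAssert not looksPurine(240'u8)      # N could be a pyrimidine
doAssert looksPyrimidine(48'u8)       # Y (C or T/U)
doAssert not looksPyrimidine(4'u8)    # gap
```

This agrees with the earlier example in which R, although ambiguous between A and G, passes isPurine.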
func parseChar(c: char; typ: typedesc[DNA]): DNA
Parse character to DNA enum type.
func parseChar(c: char; typ: typedesc[RNA]): RNA
Parse character to RNA enum type.
func parseChar(c: char; typ: typedesc[StrictDNA]): StrictDNA
Parse character to StrictDNA enum type.
func parseChar(c: char; typ: typedesc[StrictRNA]): StrictRNA
Parse character to StrictRNA enum type.
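The typedesc parameter lets the caller choose the target enum at the call site. A minimal sketch of that dispatch style follows, using a hypothetical enum MiniDNA and table miniChar. Raising ValueError on an unrecognised character is an assumption of this sketch; the documentation above does not say how the module handles invalid input.

```nim
type
  # Hypothetical four-member enum, in the spirit of StrictDNA.
  MiniDNA = enum mA, mG, mC, mT

const miniChar: array[MiniDNA, char] = ['A', 'G', 'C', 'T']

func parseChar(c: char; typ: typedesc[MiniDNA]): MiniDNA =
  # Scan the character table for a match and return the matching enum value.
  for n in MiniDNA:
    if miniChar[n] == c:
      return n
  # Assumption of this sketch: reject anything not in the table.
  raise newException(ValueError, "invalid base: " & c)

doAssert parseChar('G', MiniDNA) == mG
doAssertRaises(ValueError):
  discard parseChar('X', MiniDNA)
```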
func sameBase(a, b: AnyNucleotide): bool
Returns true if the bases are unambiguously the same.
func toChar(n: DNA): char {.raises: [], tags: [].}
Character representation of the base.
func toChar(n: RNA): char {.raises: [], tags: [].}
Character representation of the base.
func toChar(n: StrictDNA): char {.raises: [], tags: [].}
Character representation of the base.
func toChar(n: StrictRNA): char {.raises: [], tags: [].}
Character representation of the base.
func toDNA(n: RNA): DNA {.raises: [], tags: [].}
Transcribe from RNA to DNA (U becomes T).
func toDNA(n: StrictRNA): StrictDNA {.raises: [], tags: [].}
Transcribe from RNA to DNA (U becomes T).
func toRNA(n: DNA): RNA {.raises: [], tags: [].}
Transcribe from DNA to RNA (T becomes U).
func toRNA(n: StrictDNA): StrictRNA {.raises: [], tags: [].}
Transcribe from DNA to RNA (T becomes U).
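Because the DNA and RNA enums list their members in the same order, a conversion between them can in principle be a plain ordinal cast. The sketch below shows that idea on two hypothetical four-member enums; whether the module converts this way is an assumption, not something the documentation states.

```nim
type
  # Hypothetical enums whose members line up position by position.
  MiniDNA = enum mdA, mdG, mdC, mdT
  MiniRNA = enum mrA, mrG, mrC, mrU

func toMiniRNA(n: MiniDNA): MiniRNA =
  # Same position in the other enum, so T maps to U.
  MiniRNA(ord(n))

func toMiniDNA(n: MiniRNA): MiniDNA =
  # Same position in the other enum, so U maps to T.
  MiniDNA(ord(n))

doAssert toMiniRNA(mdT) == mrU
# A round trip returns the original base.
for n in MiniDNA:
  doAssert n.toMiniRNA.toMiniDNA == n
```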
proc toUnambiguousSet(n: DNA): set[DNA] {.raises: [], tags: [].}
Returns the set of unambiguous DNA characters that a given character can represent.
proc toUnambiguousSet(n: RNA): set[RNA] {.raises: [], tags: [].}
Returns the set of unambiguous RNA characters that a given character can represent.
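Expanding a symbol into a set makes compatibility checks a matter of set intersection. The following sketch uses a hypothetical six-member enum MiniDNA and table miniSet modelled on dnaUnambiguousSet; the compatible helper is an illustration and is not part of the module.

```nim
type
  # Hypothetical enum: four plain bases plus R (A or G) and N (any base).
  MiniDNA = enum mA, mG, mC, mT, mR, mN

const miniSet: array[MiniDNA, set[MiniDNA]] = [
  {mA}, {mG}, {mC}, {mT},   # plain bases expand to themselves
  {mA, mG},                 # R
  {mA, mG, mC, mT}]         # N

func compatible(a, b: MiniDNA): bool =
  # Two symbols are compatible when their expansions overlap.
  (miniSet[a] * miniSet[b]).card > 0

doAssert compatible(mR, mA)
doAssert not compatible(mR, mC)
doAssert compatible(mN, mT)
```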