The nucleotide module provides several enum types, each representing a single DNA or RNA nucleotide (one base) in a sequence. Using an enum rather than a raw character provides convenience and type safety.
There are two categories of nucleotide types, and each category name is also a type alias for the DNA and RNA types that fall within it. These aliases, together with the AnyNucleotide alias that covers every type in this module, are useful for overloading procs over all types in a category when appropriate.
- Type Categories
  - Nucleotide
  - StrictNucleotide
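In Nim, a category alias of this kind can be written as a type class (`A | B`), so that one generic proc body is instantiated separately for each member type. The sketch below illustrates that overloading pattern with two hypothetical four-value enums and a hypothetical `SketchNucleotide` class; it is an assumption-laden illustration, not a reproduction of how this module defines `Nucleotide`, `StrictNucleotide`, or `AnyNucleotide`.

```nim
# Hypothetical enums and type class; not this module's definitions.
type
  SketchDNA = enum sdA, sdG, sdC, sdT
  SketchRNA = enum srA, srG, srC, srU
  SketchNucleotide = SketchDNA | SketchRNA  # category alias as a type class

func isAdenineSketch(n: SketchNucleotide): bool =
  ## One generic body, instantiated separately for each member type.
  when n is SketchDNA:
    result = n == sdA
  else:
    result = n == srA

when isMainModule:
  assert isAdenineSketch(sdA)
  assert isAdenineSketch(srA)
  assert not isAdenineSketch(srU)
```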
Nucleotide
The DNA and RNA types aliased by Nucleotide follow the IUPAC nucleic acid notation, with one additional character, '?', used where it is not known whether a position is a gap or an unknown nucleotide. These types can be used where base ambiguity is needed. They are mapped to an 8-bit unsigned integer representation following Paradis 2007, which allows very fast comparison of nucleotides even when base identities are ambiguous. The binary and uint8 representations, along with the IUPAC symbols, definitions, and complementary nucleotides, are summarized in the table below.
Symbol | Binary | uint8 | Definition | Complement |
---|---|---|---|---|
A | 10001000 | 136 | Adenine | T/U |
G | 01001000 | 72 | Guanine | C |
C | 00101000 | 40 | Cytosine | G |
T/U | 00011000 | 24 | Thymine/Uracil | A |
R | 11000000 | 192 | A or G | Y |
M | 10100000 | 160 | A or C | K |
W | 10010000 | 144 | A or T/U | W |
S | 01100000 | 96 | G or C | S |
K | 01010000 | 80 | G or T/U | M |
Y | 00110000 | 48 | C or T/U | R |
V | 11100000 | 224 | Not T/U | B |
H | 10110000 | 176 | Not G | D |
D | 11010000 | 208 | Not C | H |
B | 01110000 | 112 | Not A | V |
N | 11110000 | 240 | Any base | N |
- | 00000100 | 4 | Alignment gap | - |
? | 00000010 | 2 | Unknown character | ? |
Example:
```nim
import src/bioseq/nucleotide

let t = parseChar('T', DNA)
assert t.isThymine
let comp = t.complement
assert comp.toChar == 'A'

let u = parseChar('U', RNA)
assert u.isUracil
let ut = u.toDNA
assert ut.toChar == 'T'

let r = parseChar('R', DNA)
assert r.isPurine
```
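The speed of the Paradis 2007 encoding follows from the bit layout in the table: the high four bits act as a mask of the possible bases (bit 7 = A, bit 6 = G, bit 5 = C, bit 4 = T/U), and bit 3 is set only for the four unambiguous bases, so many comparisons reduce to a single bitwise AND. The following is a minimal sketch on raw `uint8` codes taken from the table. The helper names (`isKnownCode`, `baseMask`, `mayBeSame`) are hypothetical and are not part of this module's API; the module's own procs may implement their comparisons differently.

```nim
# Hypothetical helpers on raw uint8 codes taken from the table above.
# They illustrate the bit layout only; they are not this module's API.
const
  codeA = 0b1000_1000'u8    # 136, adenine
  codeG = 0b0100_1000'u8    # 72, guanine
  codeC = 0b0010_1000'u8    # 40, cytosine
  codeR = 0b1100_0000'u8    # 192, A or G
  codeY = 0b0011_0000'u8    # 48, C or T/U
  codeN = 0b1111_0000'u8    # 240, any base
  codeGap = 0b0000_0100'u8  # 4, alignment gap

func isKnownCode(b: uint8): bool =
  ## Bit 3 is set only for A, G, C and T/U.
  (b and 0b0000_1000'u8) != 0

func baseMask(b: uint8): uint8 =
  ## High nibble: bit 7 = A, bit 6 = G, bit 5 = C, bit 4 = T/U.
  b and 0b1111_0000'u8

func mayBeSame(a, b: uint8): bool =
  ## True when the two codes share at least one possible base.
  (baseMask(a) and baseMask(b)) != 0

when isMainModule:
  assert isKnownCode(codeA) and not isKnownCode(codeR)
  assert baseMask(codeR) == (baseMask(codeA) or baseMask(codeG))
  assert mayBeSame(codeA, codeR)        # R may be A
  assert not mayBeSame(codeR, codeY)    # R (A/G) and Y (C/T) share no base
  assert mayBeSame(codeN, codeC)        # N overlaps every base
  assert not mayBeSame(codeGap, codeA)  # a gap carries no base bits
```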
StrictNucleotide
The StrictDNA and StrictRNA types aliased by StrictNucleotide use the same encoding as the Nucleotide types, but are restricted to the A, G, C, and T/U nucleotides and therefore do not allow ambiguity.
Symbol | Binary | uint8 | Definition | Complement |
---|---|---|---|---|
A | 10001000 | 136 | Adenine | T/U |
G | 01001000 | 72 | Guanine | C |
C | 00101000 | 40 | Cytosine | G |
T/U | 00011000 | 24 | Thymine/Uracil | A |
Example:
```nim
import src/bioseq/nucleotide

let t = parseChar('T', StrictDNA)
let a = t.complement
assert t.isThymine
assert a.toChar == 'A'
```
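Because the strict types admit only four values, a parser for them has to reject ambiguity codes rather than map them. The sketch below shows one way to express that restriction with a four-value enum and a `case` that raises `ValueError` for anything else. The enum `DemoStrictDNA` and the proc `parseStrictDemo` are hypothetical stand-ins; how the module's parseChar(c, StrictDNA) actually reports invalid input, and whether it accepts lowercase letters, is not stated on this page, so the error behaviour here is an assumption.

```nim
# Hypothetical stand-in for a strict DNA type; not this module's StrictDNA.
type DemoStrictDNA = enum
  dsA, dsG, dsC, dsT

func parseStrictDemo(c: char): DemoStrictDNA =
  ## Accepts only unambiguous uppercase DNA bases; anything else is an error.
  ## (Assumption: the real parseChar may treat invalid input differently.)
  case c
  of 'A': dsA
  of 'G': dsG
  of 'C': dsC
  of 'T': dsT
  else: raise newException(ValueError, "not an unambiguous DNA base: " & c)

when isMainModule:
  assert parseStrictDemo('T') == dsT
  doAssertRaises(ValueError):
    discard parseStrictDemo('R')  # ambiguity codes are rejected
```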
Consts
dnaByte: array[DNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u, 192'u, 160'u, 144'u, 96'u, 80'u, 48'u, 224'u, 176'u, 208'u, 112'u, 240'u, 4'u, 2'u]
dnaChar: array[DNA, char] = ['A', 'G', 'C', 'T', 'R', 'M', 'W', 'S', 'K', 'Y', 'V', 'H', 'D', 'B', 'N', '-', '?']
dnaComplement: array[DNA, DNA] = [dnaT, dnaC, dnaG, dnaA, dnaY, dnaK, dnaW, dnaS, dnaM, dnaR, dnaB, dnaD, dnaH, dnaV, dnaN, dnaGap, dnaUnk]
dnaUnambiguousSet: array[DNA, set[DNA]] = [{dnaA}, {dnaG}, {dnaC}, {dnaT}, {dnaA, dnaG}, {dnaA, dnaC}, {dnaA, dnaT}, {dnaG, dnaC}, {dnaG, dnaT}, {dnaC, dnaT}, {dnaA, dnaG, dnaC}, {dnaA, dnaC, dnaT}, {dnaA, dnaG, dnaT}, {dnaT, dnaG, dnaC}, {dnaA, dnaG, dnaC, dnaT}, {dnaGap}, {dnaA, dnaG, dnaC, dnaT, dnaGap}]
rnaByte: array[RNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u, 192'u, 160'u, 144'u, 96'u, 80'u, 48'u, 224'u, 176'u, 208'u, 112'u, 240'u, 4'u, 2'u]
rnaChar: array[RNA, char] = ['A', 'G', 'C', 'U', 'R', 'M', 'W', 'S', 'K', 'Y', 'V', 'H', 'D', 'B', 'N', '-', '?']
rnaComplement: array[RNA, RNA] = [rnaU, rnaC, rnaG, rnaA, rnaY, rnaK, rnaW, rnaS, rnaM, rnaR, rnaB, rnaD, rnaH, rnaV, rnaN, rnaGap, rnaUnk]
rnaUnambiguousSet: array[RNA, set[RNA]] = [{rnaA}, {rnaG}, {rnaC}, {rnaU}, {rnaA, rnaG}, {rnaA, rnaC}, {rnaA, rnaU}, {rnaG, rnaC}, {rnaG, rnaU}, {rnaC, rnaU}, {rnaA, rnaG, rnaC}, {rnaA, rnaC, rnaU}, {rnaA, rnaG, rnaU}, {rnaU, rnaG, rnaC}, {rnaA, rnaG, rnaC, rnaU}, {rnaGap}, {rnaA, rnaG, rnaC, rnaU, rnaGap}]
strictDnaByte: array[StrictDNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u]
strictDnaChar: array[StrictDNA, char] = ['A', 'G', 'C', 'T']
strictDnaComplement: array[StrictDNA, StrictDNA] = [sdnaT, sdnaC, sdnaG, sdnaA]
strictRnaByte: array[StrictRNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u]
strictRnaChar: array[StrictRNA, char] = ['A', 'G', 'C', 'U']
strictRnaComplement: array[StrictRNA, StrictRNA] = [srnaU, srnaC, srnaG, srnaA]
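The constants above are lookup tables indexed directly by enum value (array[DNA, char], array[DNA, DNA], and so on), so converting or complementing a base is a single array access. The following sketch reproduces that pattern for a hypothetical four-value enum, using the same order and values as strictDnaChar and strictDnaComplement; the names `DemoBase`, `demoChar`, `demoComplement`, `toCharDemo`, and `complementDemo` are illustrative only and are not claimed to match the module's internals.

```nim
# Hypothetical enum mirroring the order used by the strict DNA constants.
type DemoBase = enum
  dbA, dbG, dbC, dbT

const
  # Tables indexed by the enum, in the same order as strictDnaChar
  # and strictDnaComplement.
  demoChar: array[DemoBase, char] = ['A', 'G', 'C', 'T']
  demoComplement: array[DemoBase, DemoBase] = [dbT, dbC, dbG, dbA]

func toCharDemo(n: DemoBase): char = demoChar[n]
func complementDemo(n: DemoBase): DemoBase = demoComplement[n]

when isMainModule:
  assert dbG.complementDemo.toCharDemo == 'C'
  # Complementing twice returns the original base for every value.
  for b in DemoBase:
    assert b.complementDemo.complementDemo == b
```

Because the index type is the enum itself, the compiler checks that each table has exactly one entry per base.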
Procs
func byte(n: DNA): byte {.raises: [], tags: [].}
- Byte representation of the base; byte is an alias of uint8.
func byte(n: RNA): byte {.raises: [], tags: [].}
- Byte representation of the base; byte is an alias of uint8.
func byte(n: StrictDNA): byte {.raises: [], tags: [].}
- Byte representation of the base; byte is an alias of uint8.
func byte(n: StrictRNA): byte {.raises: [], tags: [].}
- Byte representation of the base; byte is an alias of uint8.
func complement(n: DNA): DNA {.raises: [], tags: [].}
- Returns the complementary base.
func complement(n: RNA): RNA {.raises: [], tags: [].}
- Returns the complementary base.
func complement(n: StrictDNA): StrictDNA {.raises: [], tags: [].}
- Returns the complementary base.
func complement(n: StrictRNA): StrictRNA {.raises: [], tags: [].}
- Returns the complementary base.
func diffBase(a, b: AnyNucleotide): bool
- Returns true if the bases are unambiguously different. A base is treated as different if it is unknown ('?'), but not if it is any base ('N') or a gap ('-').
func isAdenine(n: AnyNucleotide): bool
- Returns true if base is unambiguously adenine (A).
func isCytosine(n: AnyNucleotide): bool
- Returns true if base is unambiguously cytosine (C).
func isGuanine(n: AnyNucleotide): bool
- Returns true if base is unambiguously guanine (G).
func isPurine(n: AnyNucleotide): bool
- Returns true if base is unambiguously a purine (A or G).
func isPyrimidine(n: AnyNucleotide): bool
- Returns true if base is unambiguously a pyrimidine (T/U or C).
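Under the bit layout in the Nucleotide table, "unambiguously a purine" can be read as: the code's base mask is non-empty and contains only A and/or G. That reading is consistent with the module example above, where r.isPurine holds for R. A minimal sketch of this reading on raw `uint8` codes follows; `isPurineCode` and `isPyrimidineCode` are hypothetical names, and the module's isPurine and isPyrimidine are not claimed to be implemented this way.

```nim
# Hypothetical classification on raw uint8 codes from the Nucleotide table.
const
  purineBits = 0b1100_0000'u8      # A (bit 7) and G (bit 6)
  pyrimidineBits = 0b0011_0000'u8  # C (bit 5) and T/U (bit 4)

func isPurineCode(b: uint8): bool =
  ## Non-empty base mask that contains only A and/or G.
  let mask = b and 0b1111_0000'u8
  mask != 0 and (mask and pyrimidineBits) == 0

func isPyrimidineCode(b: uint8): bool =
  ## Non-empty base mask that contains only C and/or T/U.
  let mask = b and 0b1111_0000'u8
  mask != 0 and (mask and purineBits) == 0

when isMainModule:
  assert isPurineCode(136) and isPurineCode(72)  # A, G
  assert isPurineCode(192)                       # R = A or G
  assert isPyrimidineCode(48)                    # Y = C or T/U
  assert not isPurineCode(240)                   # N could be anything
  assert not isPurineCode(4) and not isPyrimidineCode(4)  # gap
```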
func isThymine(n: DNA | StrictDNA): bool
- Returns true if base is unambiguously thymine (T).
func isUracil(n: RNA | StrictRNA): bool
- Returns true if base is unambiguously uracil (U).
func knownBase(n: AnyNucleotide): bool
- Returns true if base is not ambiguous.
func parseChar(c: char; typ: typedesc[StrictDNA]): StrictDNA
- Parse a character into a StrictDNA base.
func parseChar(c: char; typ: typedesc[StrictRNA]): StrictRNA
- Parse a character into a StrictRNA base.
func sameBase(a, b: AnyNucleotide): bool
- Returns true if bases are unambiguously the same.
func toChar(n: StrictDNA): char {.raises: [], tags: [].}
- Character representation of base.
func toChar(n: StrictRNA): char {.raises: [], tags: [].}
- Character representation of base.
func toDNA(n: StrictRNA): StrictDNA {.raises: [], tags: [].}
- Back-transcribe an RNA base to DNA (U becomes T; other bases are unchanged).
func toRNA(n: StrictDNA): StrictRNA {.raises: [], tags: [].}
- Transcribe a DNA base to RNA (T becomes U; other bases are unchanged).
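The module example near the top of this page shows u.toDNA giving 'T', so the conversion swaps uracil for thymine rather than complementing. Since the DNA and RNA constants list their members in the same order, one way to express such a conversion is to reuse the enum ordinal. The sketch below assumes that shared ordering; `DemoDNA`, `DemoRNA`, `toDNADemo`, and `toRNADemo` are hypothetical names, not the module's implementation.

```nim
# Hypothetical enums that share an ordering, like dnaChar and rnaChar.
type
  DemoDNA = enum ddA, ddG, ddC, ddT
  DemoRNA = enum drA, drG, drC, drU

func toDNADemo(n: DemoRNA): DemoDNA =
  ## Same position in both enums, so the ordinal maps U to T.
  DemoDNA(ord(n))

func toRNADemo(n: DemoDNA): DemoRNA =
  ## Inverse mapping: T becomes U, other bases keep their position.
  DemoRNA(ord(n))

when isMainModule:
  assert drU.toDNADemo == ddT
  assert ddG.toRNADemo == drG
  for n in DemoRNA:
    assert n.toDNADemo.toRNADemo == n  # round trip is lossless
```

This relies entirely on both enums being declared in the same order; a lookup table like those in the Consts section avoids that coupling.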
proc toUnambiguousSet(n: DNA): set[DNA] {.raises: [], tags: [].}
- Returns the set of unambiguous DNA values represented by a given value (for example, R gives {dnaA, dnaG}).
proc toUnambiguousSet(n: RNA): set[RNA] {.raises: [], tags: [].}
- Returns the set of unambiguous RNA values represented by a given value (for example, R gives {rnaA, rnaG}).
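The dnaUnambiguousSet and rnaUnambiguousSet tables list, for each value, the concrete bases it may stand for. Under the bit layout in the Nucleotide table, the nucleotide part of those sets can also be derived from the high nibble of the `uint8` code, as in this sketch. The `Base` enum and `expandCode` proc are hypothetical. Unlike the module's tables, this version returns only the four nucleotides, so a gap ('-', 4) and an unknown character ('?', 2) both map to the empty set rather than to sets containing the gap.

```nim
# Hypothetical expansion of a raw uint8 code into the bases it may denote.
type Base = enum
  bA, bG, bC, bT  # bits 7, 6, 5, 4 of the code, in that order

func expandCode(code: uint8): set[Base] =
  ## Collects every base whose bit is set in the high nibble.
  for i, b in [bA, bG, bC, bT]:
    if (code and (0b1000_0000'u8 shr i)) != 0:
      result.incl b

when isMainModule:
  assert expandCode(192) == {bA, bG}          # R
  assert expandCode(176) == {bA, bC, bT}      # H = not G
  assert expandCode(240) == {bA, bG, bC, bT}  # N
  assert expandCode(4).card == 0              # gap: no base bits
```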