The nucleotide module provides several enum types, each of which represents a single nucleotide (one base) in a DNA or RNA sequence. Using enums rather than raw characters provides convenience and type safety.

The nucleotide types fall into two categories, Nucleotide and StrictNucleotide. Each category name is also a type alias covering the DNA and RNA types within that category. These aliases, together with AnyNucleotide, which covers every type in this module, are useful for writing a single proc that accepts every type in a category when appropriate.
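
As an illustration of how such aliases support overloading, here is a minimal sketch. The enums are shortened to four members each, and the procs `category` and `index` are hypothetical names that do not exist in the module; only the alias layout mirrors the Types section.

```nim
# Shortened stand-ins for the module's enums (illustrative only).
type
  DNA = enum dnaA, dnaG, dnaC, dnaT
  RNA = enum rnaA, rnaG, rnaC, rnaU
  StrictDNA = enum sdnaA, sdnaG, sdnaC, sdnaT
  StrictRNA = enum srnaA, srnaG, srnaC, srnaU
  Nucleotide = DNA | RNA
  StrictNucleotide = StrictDNA | StrictRNA
  AnyNucleotide = StrictNucleotide | Nucleotide

# Hypothetical procs: one overload per category covers both enums in it.
func category(n: Nucleotide): string = "ambiguity-aware"
func category(n: StrictNucleotide): string = "strict"

# A proc taking AnyNucleotide accepts all four enums.
func index(n: AnyNucleotide): int = ord(n)

doAssert category(dnaT) == "ambiguity-aware"
doAssert category(srnaU) == "strict"
doAssert index(rnaC) == 2
```

Because the aliases are type classes, the compiler instantiates each proc separately for every concrete enum it is called with, so no runtime dispatch is involved.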

Type Categories

Nucleotide
StrictNucleotide

Nucleotide

The DNA and RNA types aliased by Nucleotide follow the IUPAC nucleic acid notation, with one additional character, '?', for positions where it is not known whether there is a gap or an unknown nucleotide in the sequence. Use the DNA and RNA types when base ambiguity must be represented. Each base is mapped to an 8-bit unsigned integer following Paradis 2007, which allows very fast comparison of nucleotides even when base identities are ambiguous. The table below summarizes the binary and uint8 representations along with the IUPAC symbols, definitions, and complementary nucleotides.

Symbol	Binary	uint8	Definition	Complement
A	10001000	136	Adenine	T/U
G	01001000	72	Guanine	C
C	00101000	40	Cytosine	G
T/U	00011000	24	Thymine/Uracil	A
R	11000000	192	A or G	Y
M	10100000	160	A or C	K
W	10010000	144	A or T/U	W
S	01100000	96	G or C	S
K	01010000	80	G or T/U	M
Y	00110000	48	C or T/U	R
V	11100000	224	Not T/U	B
H	10110000	176	Not G	D
D	11010000	208	Not C	H
B	01110000	112	Not A	V
N	11110000	240	Any base	N
-	00000100	4	Alignment gap	-
?	00000010	2	Unknown character	?

Example:

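import src/bioseq/nucleotide

# Parse a DNA base and take its complement.
let t = parseChar('T', DNA)
assert t.isThymine

let comp = t.complement
assert comp.toChar == 'A'

# Convert an RNA base to its DNA equivalent.
let u = parseChar('U', RNA)
assert u.isUracil

let ut = u.toDNA
assert ut.toChar == 'T'

# R means A or G, so it is a purine either way.
let r = parseChar('R', DNA)
assert r.isPurine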
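
To show why the bit layout in the table makes ambiguous comparisons cheap, here is a small sketch. The masks are read off the table (the four high bits flag A, G, C, and T/U; the bit with value 8 is set only for the four unambiguous bases). The constants and the helpers `isKnown`, `compatible`, and `surelySame` are hypothetical and are not taken from the module's source.

```nim
# Sketch only: masks inferred from the table, not the module's implementation.
const
  baseBits = 0b11110000'u8   # one bit each for A, G, C, T/U
  knownBit = 0b00001000'u8   # set only for A, G, C, T/U

  adenine = 0b10001000'u8
  guanine = 0b01001000'u8
  thymine = 0b00011000'u8
  codeR   = 0b11000000'u8    # A or G
  codeY   = 0b00110000'u8    # C or T/U
  codeN   = 0b11110000'u8    # any base

# True when the byte encodes exactly one base.
func isKnown(b: uint8): bool = (b and knownBit) != 0'u8

# True when the two codes share at least one possible base.
func compatible(a, b: uint8): bool = (a and b and baseBits) != 0'u8

# True only when both codes are the same unambiguous base.
func surelySame(a, b: uint8): bool = isKnown(a) and a == b

doAssert compatible(adenine, codeR)
doAssert not compatible(adenine, codeY)
doAssert compatible(codeN, thymine)
doAssert surelySame(guanine, guanine)
doAssert not surelySame(codeR, codeR)
```

Each test is a single bitwise AND plus a comparison, with no table lookup or branching on the symbol.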

StrictNucleotide

The StrictDNA and StrictRNA types aliased by StrictNucleotide are consistent with the Nucleotide types except that they are restricted to only A, G, C, and T/U nucleotides and thus do not allow ambiguity.

Symbol	Binary	uint8	Definition	Complement
A	10001000	136	Adenine	T/U
G	01001000	72	Guanine	C
C	00101000	40	Cytosine	G
T/U	00011000	24	Thymine/Uracil	A

Example:

import src/bioseq/nucleotide
let
  t = parseChar('T', StrictDNA)
  a = t.complement
assert t.isThymine
assert a.toChar == 'A'
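
The restriction to four bases can be pictured with the sketch below. How the module's own parseChar reports an invalid character is not documented here, so the helper `parseStrictSketch` and its use of ValueError are assumptions made purely for illustration.

```nim
type StrictDNA = enum sdnaA, sdnaG, sdnaC, sdnaT

# Hypothetical helper; the module's parseChar may signal errors differently.
func parseStrictSketch(c: char): StrictDNA =
  case c
  of 'A': sdnaA
  of 'G': sdnaG
  of 'C': sdnaC
  of 'T': sdnaT
  else: raise newException(ValueError, "not a strict DNA base: " & $c)

doAssert parseStrictSketch('G') == sdnaG

# An ambiguity code such as R has no strict counterpart.
doAssertRaises(ValueError):
  discard parseStrictSketch('R')
```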

Imports

parserMacro

Types

AnyNucleotide = StrictNucleotide | Nucleotide
DNA = enum dnaA, dnaG, dnaC, dnaT, dnaR, dnaM, dnaW, dnaS, dnaK, dnaY, dnaV, dnaH, dnaD, dnaB, dnaN, dnaGap, dnaUnk
Nucleotide = DNA | RNA
RNA = enum rnaA, rnaG, rnaC, rnaU, rnaR, rnaM, rnaW, rnaS, rnaK, rnaY, rnaV, rnaH, rnaD, rnaB, rnaN, rnaGap, rnaUnk
StrictDNA = enum sdnaA, sdnaG, sdnaC, sdnaT
StrictNucleotide = StrictDNA | StrictRNA
StrictRNA = enum srnaA, srnaG, srnaC, srnaU

Consts

dnaByte: array[DNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u, 192'u, 160'u, 144'u, 96'u, 80'u, 48'u, 224'u, 176'u, 208'u, 112'u, 240'u, 4'u, 2'u]
dnaChar: array[DNA, char] = ['A', 'G', 'C', 'T', 'R', 'M', 'W', 'S', 'K', 'Y', 'V', 'H', 'D', 'B', 'N', '-', '?']
dnaComplement: array[DNA, DNA] = [dnaT, dnaC, dnaG, dnaA, dnaY, dnaK, dnaW, dnaS, dnaM, dnaR, dnaB, dnaD, dnaH, dnaV, dnaN, dnaGap, dnaUnk]
dnaUnambiguousSet: array[DNA, set[DNA]] = [{dnaA}, {dnaG}, {dnaC}, {dnaT}, {dnaA, dnaG}, {dnaA, dnaC}, {dnaA, dnaT}, {dnaG, dnaC}, {dnaG, dnaT}, {dnaC, dnaT}, {dnaA, dnaG, dnaC}, {dnaA, dnaC, dnaT}, {dnaA, dnaG, dnaT}, {dnaT, dnaG, dnaC}, {dnaA, dnaG, dnaC, dnaT}, {dnaGap}, {dnaA, dnaG, dnaC, dnaT, dnaGap}]
rnaByte: array[RNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u, 192'u, 160'u, 144'u, 96'u, 80'u, 48'u, 224'u, 176'u, 208'u, 112'u, 240'u, 4'u, 2'u]
rnaChar: array[RNA, char] = ['A', 'G', 'C', 'U', 'R', 'M', 'W', 'S', 'K', 'Y', 'V', 'H', 'D', 'B', 'N', '-', '?']
rnaComplement: array[RNA, RNA] = [rnaU, rnaC, rnaG, rnaA, rnaY, rnaK, rnaW, rnaS, rnaM, rnaR, rnaB, rnaD, rnaH, rnaV, rnaN, rnaGap, rnaUnk]
rnaUnambiguousSet: array[RNA, set[RNA]] = [{rnaA}, {rnaG}, {rnaC}, {rnaU}, {rnaA, rnaG}, {rnaA, rnaC}, {rnaA, rnaU}, {rnaG, rnaC}, {rnaG, rnaU}, {rnaC, rnaU}, {rnaA, rnaG, rnaC}, {rnaA, rnaC, rnaU}, {rnaA, rnaG, rnaU}, {rnaU, rnaG, rnaC}, {rnaA, rnaG, rnaC, rnaU}, {rnaGap}, {rnaA, rnaG, rnaC, rnaU, rnaGap}]
strictDnaByte: array[StrictDNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u]
strictDnaChar: array[StrictDNA, char] = ['A', 'G', 'C', 'T']
strictDnaComplement: array[StrictDNA, StrictDNA] = [sdnaT, sdnaC, sdnaG, sdnaA]
strictRnaByte: array[StrictRNA, byte] = [0b10001000'u8, 72'u, 40'u, 24'u]
strictRnaChar: array[StrictRNA, char] = ['A', 'G', 'C', 'U']
strictRnaComplement: array[StrictRNA, StrictRNA] = [srnaU, srnaC, srnaG, srnaA]
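
The constants above are arrays indexed directly by an enum, so a complement or an ambiguity expansion is a single array access. The sketch below shows the pattern on a shortened, hypothetical enum `Base`; the tables `comp` and `expand` are illustrative stand-ins and not the module's constants.

```nim
# Shortened, hypothetical enum: four bases plus the codes R and Y.
type Base = enum bA, bG, bC, bT, bR, bY

const
  # Complement of each member, in enum order.
  comp: array[Base, Base] = [bT, bC, bG, bA, bY, bR]
  # Unambiguous bases that each member can stand for.
  expand: array[Base, set[Base]] = [{bA}, {bG}, {bC}, {bT}, {bA, bG}, {bC, bT}]

doAssert comp[bA] == bT
doAssert comp[comp[bR]] == bR              # complementing twice is the identity
doAssert bG in expand[bR]                  # R can stand for G
doAssert card(expand[bR] * expand[bY]) == 0  # purines and pyrimidines are disjoint
```

Indexing by the enum itself means the compiler checks that every member has an entry, which guards against a table falling out of step with the type.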

Procs

func byte(n: DNA): byte {.raises: [], tags: [].}: Byte representation of base, alias of uint8.
func byte(n: RNA): byte {.raises: [], tags: [].}: Byte representation of base, alias of uint8.
func byte(n: StrictDNA): byte {.raises: [], tags: [].}: Byte representation of base, alias of uint8.
func byte(n: StrictRNA): byte {.raises: [], tags: [].}: Byte representation of base, alias of uint8.
func complement(n: DNA): DNA {.raises: [], tags: [].}: Complementary base.
func complement(n: RNA): RNA {.raises: [], tags: [].}: Complementary base.
func complement(n: StrictDNA): StrictDNA {.raises: [], tags: [].}: Complementary base.
func complement(n: StrictRNA): StrictRNA {.raises: [], tags: [].}: Complementary base.
func diffBase(a, b: AnyNucleotide): bool: Returns true if bases are unambiguously different. An unknown base '?' is treated as different, but an any-base 'N' or a gap '-' is not.
func isAdenine(n: AnyNucleotide): bool: Returns true if base is unambiguously adenine (A).
func isCytosine(n: AnyNucleotide): bool: Returns true if base is unambiguously cytosine (C).
func isGuanine(n: AnyNucleotide): bool: Returns true if base is unambiguously guanine (G).
func isPurine(n: AnyNucleotide): bool: Returns true if base is unambiguously a purine (A or G).
func isPyrimidine(n: AnyNucleotide): bool: Returns true if base is unambiguously a pyrimidine (T/U or C).
func isThymine(n: DNA | StrictDNA): bool: Returns true if base is unambiguously thymine (T).
func isUracil(n: RNA | StrictRNA): bool: Returns true if base is unambiguously uracil (U).
func knownBase(n: AnyNucleotide): bool: Returns true if base is not ambiguous.
func parseChar(c: char; typ: typedesc[DNA]): DNA: Parse character to DNA enum type.
func parseChar(c: char; typ: typedesc[RNA]): RNA: Parse character to RNA enum type.
func parseChar(c: char; typ: typedesc[StrictDNA]): StrictDNA: Parse character to StrictDNA enum type.
func parseChar(c: char; typ: typedesc[StrictRNA]): StrictRNA: Parse character to StrictRNA enum type.
func sameBase(a, b: AnyNucleotide): bool: Returns true if bases are unambiguously the same.
func toChar(n: DNA): char {.raises: [], tags: [].}: Character representation of base.
func toChar(n: RNA): char {.raises: [], tags: [].}: Character representation of base.
func toChar(n: StrictDNA): char {.raises: [], tags: [].}: Character representation of base.
func toChar(n: StrictRNA): char {.raises: [], tags: [].}: Character representation of base.
func toDNA(n: RNA): DNA {.raises: [], tags: [].}: Reverse transcribe from RNA to DNA.
func toDNA(n: StrictRNA): StrictDNA {.raises: [], tags: [].}: Reverse transcribe from RNA to DNA.
func toRNA(n: DNA): RNA {.raises: [], tags: [].}: Transcribe from DNA to RNA.
func toRNA(n: StrictDNA): StrictRNA {.raises: [], tags: [].}: Transcribe from DNA to RNA.
proc toUnambiguousSet(n: DNA): set[DNA] {.raises: [], tags: [].}: Returns the set of unambiguous DNA characters represented by a given character.
proc toUnambiguousSet(n: RNA): set[RNA] {.raises: [], tags: [].}: Returns the set of unambiguous RNA characters represented by a given character.
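
Since the DNA and RNA enums list their members in the same order, one plausible way to convert between them is through the ordinal value. The sketch below assumes that approach; whether the module's toDNA and toRNA work this way is not stated here, and `toRnaSketch` and `toDnaSketch` are hypothetical names on shortened enums.

```nim
# Shortened stand-ins with members in matching order (illustrative only).
type
  DNA = enum dnaA, dnaG, dnaC, dnaT
  RNA = enum rnaA, rnaG, rnaC, rnaU

# Assumed approach: reuse the ordinal position in the other enum.
func toRnaSketch(n: DNA): RNA = RNA(ord(n))
func toDnaSketch(n: RNA): DNA = DNA(ord(n))

doAssert toRnaSketch(dnaT) == rnaU
doAssert toDnaSketch(toRnaSketch(dnaG)) == dnaG
```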