This module contains the AminoAcid enum type for working with amino acid sequence data. Using an enum type provides convenience and type safety. The AminoAcid type represents an extended IUPAC code that includes Pyrrolysine, Selenocysteine, and the two ambiguous characters 'B' and 'Z'. The table below lists every symbol with its enum value, abbreviation, and definition.
The genetic codes for translating to amino acids are sourced from NCBI at https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi.
Symbol | BioSeq | Abbreviation | Definition |
---|---|---|---|
A | aaA | Ala | Alanine |
C | aaC | Cys | Cysteine |
D | aaD | Asp | Aspartic Acid |
E | aaE | Glu | Glutamic Acid |
F | aaF | Phe | Phenylalanine |
G | aaG | Gly | Glycine |
H | aaH | His | Histidine |
I | aaI | Ile | Isoleucine |
K | aaK | Lys | Lysine |
L | aaL | Leu | Leucine |
M | aaM | Met | Methionine |
N | aaN | Asn | Asparagine |
O | aaO | Pyl | Pyrrolysine |
P | aaP | Pro | Proline |
Q | aaQ | Gln | Glutamine |
R | aaR | Arg | Arginine |
S | aaS | Ser | Serine |
T | aaT | Thr | Threonine |
U | aaU | Sec | Selenocysteine |
V | aaV | Val | Valine |
W | aaW | Trp | Tryptophan |
Y | aaY | Tyr | Tyrosine |
B | aaB | Asx | Aspartic acid or asparagine |
Z | aaZ | Glx | Glutamic acid or glutamine |
* | aaStp | Stp | Stop |
X | aaX | Amb | Ambiguous/Unknown |
Example:

    import src/bioseq/aminoAcid
    import bioseq

    let ala = parseChar('A', AminoAcid)
    assert ala.toChar == 'A'

    let amino = translateCodon([dnaT, dnaT, dnaT], gCode1)
    assert amino == aaF
Types
AminoAcid = enum aaA, aaC, aaD, aaE, aaF, aaG, aaH, aaI, aaK, aaL, aaM, aaN, aaO, aaP, aaQ, aaR, aaS, aaT, aaU, aaV, aaW, aaY, aaB, aaZ, aaStp, aaX
GeneticCode = enum
  gCode1 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSSLLFF",
  gCode2 = "KKNN**SSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode3 = "KKNNRRSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPTTTT**YYWWCCSSSSLLFF",
  gCode4 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode5 = "KKNNSSSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode6 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLQQYY*WCCSSSSLLFF",
  gCode9 = "NKNNSSSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode10 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYCWCCSSSSLLFF",
  gCode11 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSSLLFF",
  gCode12 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLSLL**YY*WCCSSSSLLFF",
  gCode13 = "KKNNGGSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode14 = "NKNNSSSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLY*YYWWCCSSSSLLFF",
  gCode16 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL*LYY*WCCSSSSLLFF",
  gCode21 = "NKNNSSSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode22 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL*LYY*WCC*SSSLLFF",
  gCode23 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSS*LFF",
  gCode24 = "KKNNSKSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode25 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYGWCCSSSSLLFF",
  gCode26 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLALL**YY*WCCSSSSLLFF",
  gCode27 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLQQYYWWCCSSSSLLFF",
  gCode28 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLQQYYWWCCSSSSLLFF",
  gCode29 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLYYYY*WCCSSSSLLFF",
  gCode30 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLEEYY*WCCSSSSLLFF",
  gCode31 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLEEYYWWCCSSSSLLFF",
  gCode33 = "KKNNSKSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLY*YYWWCCSSSSLLFF"

Genetic codes for translating nucleotides to amino acids. The genetic codes follow the NCBI definitions:
- gCode1 - The Standard Code
- gCode2 - The Vertebrate Mitochondrial Code
- gCode3 - The Yeast Mitochondrial Code
- gCode4 - The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
- gCode5 - The Invertebrate Mitochondrial Code
- gCode6 - The Ciliate, Dasycladacean and Hexamita Nuclear Code
- gCode9 - The Echinoderm and Flatworm Mitochondrial Code
- gCode10 - The Euplotid Nuclear Code
- gCode11 - The Bacterial, Archaeal and Plant Plastid Code
- gCode12 - The Alternative Yeast Nuclear Code
- gCode13 - The Ascidian Mitochondrial Code
- gCode14 - The Alternative Flatworm Mitochondrial Code
- gCode16 - Chlorophycean Mitochondrial Code
- gCode21 - Trematode Mitochondrial Code
- gCode22 - Scenedesmus obliquus Mitochondrial Code
- gCode23 - Thraustochytrium Mitochondrial Code
- gCode24 - Rhabdopleuridae Mitochondrial Code
- gCode25 - Candidate Division SR1 and Gracilibacteria Code
- gCode26 - Pachysolen tannophilus Nuclear Code
- gCode27 - Karyorelict Nuclear Code
- gCode28 - Condylostoma Nuclear Code
- gCode29 - Mesodinium Nuclear Code
- gCode30 - Peritrich Nuclear Code
- gCode31 - Blastocrithidia Nuclear Code
- gCode33 - Cephalodiscidae Mitochondrial UAA-Tyr Code
The amino acids in each genetic code string are ordered relative to this sequence of codons, in which each position cycles through A, G, C, T and the third base varies fastest:

AAA, AAG, AAC, AAT, AGA, AGG, AGC, AGT, ACA, ACG, ACC, ACT, ATA, ATG, ATC, ATT,
GAA, GAG, GAC, GAT, GGA, GGG, GGC, GGT, GCA, GCG, GCC, GCT, GTA, GTG, GTC, GTT,
CAA, CAG, CAC, CAT, CGA, CGG, CGC, CGT, CCA, CCG, CCC, CCT, CTA, CTG, CTC, CTT,
TAA, TAG, TAC, TAT, TGA, TGG, TGC, TGT, TCA, TCG, TCC, TCT, TTA, TTG, TTC, TTT
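The codon ordering implies a simple index calculation, which the following minimal sketch illustrates. It is not the module's actual implementation: `baseRank`, `codonIndex`, and `lookup` are hypothetical helper names, codons are plain strings rather than nucleotide enums, and the two code strings are copied from `gCode1` and `gCode2`.

```nim
# Illustrative sketch only; not the bioseq implementation.
# The code strings are copied from the GeneticCode enum (gCode1 and gCode2).
const
  standardCode = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSSLLFF"
  vertebrateMitoCode = "KKNN**SSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF"

func baseRank(base: char): int =
  ## Rank of a base in the A, G, C, T ordering used by the codon list.
  case base
  of 'A': 0
  of 'G': 1
  of 'C': 2
  of 'T': 3
  else: raise newException(ValueError, "unsupported base: " & $base)

func codonIndex(codon: string): int =
  ## Position of a three-letter codon within a 64-character code string.
  if codon.len != 3:
    raise newException(ValueError, "a codon has exactly three bases")
  16 * baseRank(codon[0]) + 4 * baseRank(codon[1]) + baseRank(codon[2])

func lookup(code, codon: string): char =
  ## Amino acid symbol that `code` assigns to `codon`.
  code[codonIndex(codon)]

when isMainModule:
  doAssert codonIndex("AAA") == 0
  doAssert codonIndex("TTT") == 63
  doAssert lookup(standardCode, "TTT") == 'F'
  # TGA is a stop codon in the standard code but Trp in vertebrate mitochondria.
  doAssert lookup(standardCode, "TGA") == '*'
  doAssert lookup(vertebrateMitoCode, "TGA") == 'W'
```

Because every code string uses the same codon order, switching genetic codes only changes which string is indexed; the index calculation stays the same.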
Consts
aminoAcidAbreviation: array[AminoAcid, string] = ["Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu", "Met", "Asn", "Pyl", "Pro", "Gln", "Arg", "Ser", "Thr", "Sec", "Val", "Trp", "Tyr", "Asx", "Glx", "Stp", "Amb"]
aminoAcidChar: array[AminoAcid, char] = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y', 'B', 'Z', '*', 'X']
aminoAcidDefinition: array[AminoAcid, string] = ["Alanine", "Cysteine", "Aspartic Acid", "Glutamic Acid", "Phenylalanine", "Glycine", "Histidine", "Isoleucine", "Lysine", "Leucine", "Methionine", "Asparagine", "Pyrrolysine", "Proline", "Glutamine", "Arginine", "Serine", "Threonine", "Selenocysteine", "Valine", "Tryptophan", "Tyrosine", "Aspartic acid or asparagine", "Glutamic acid or glutamine", "Stop", "Ambiguous/Unknown"]
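The three constants are arrays indexed by the enum itself, so a lookup is a direct index with an enum value. The sketch below shows the pattern on a hypothetical three-member enum (`Residue`, `residueChar`, `residueName`, `toSymbol`, and `parseResidue` are illustrative names, not part of this module), including a reverse lookup of the kind a character parser might perform; how the module actually parses characters is an assumption here.

```nim
# Illustrative sketch with a hypothetical three-member enum;
# the real AminoAcid enum has 26 members.
type
  Residue = enum
    resA, resC, resD

const
  residueChar: array[Residue, char] = ['A', 'C', 'D']
  residueName: array[Residue, string] = ["Alanine", "Cysteine", "Aspartic Acid"]

func toSymbol(r: Residue): char =
  ## Forward lookup: index the table directly with the enum value.
  residueChar[r]

func parseResidue(c: char): Residue =
  ## Reverse lookup: scan the table for a matching character.
  for r in Residue:
    if residueChar[r] == c:
      return r
  raise newException(ValueError, "unknown residue symbol: " & $c)

when isMainModule:
  doAssert toSymbol(resC) == 'C'
  doAssert residueName[resD] == "Aspartic Acid"
  doAssert parseResidue('A') == resA
```

Indexing by enum lets the compiler check that each table has exactly one entry per enum member, which is one source of the type safety mentioned in the module description.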
Procs
func abreviation(a: AminoAcid): string {.raises: [], tags: [].}
- Returns the amino acid abbreviation.
func definition(a: AminoAcid): string {.raises: [], tags: [].}
- Returns the amino acid definition.
func parseChar(c: char; T: typedesc[AminoAcid]): AminoAcid
- Parses a character into the AminoAcid enum type.
func toChar(a: AminoAcid): char {.raises: [], tags: [].}
- Returns the amino acid character.
func translateCodon(nucleotides: array[3, AnyNucleotide]; code: GeneticCode): AminoAcid
- Translates a nucleotide codon to an amino acid. See the documentation for GeneticCode for the code parameter options.