src/bioseq/aminoAcid


This module contains the AminoAcid enum type for working with amino acid sequence data. Using an enum type provides convenience and type safety. The AminoAcid type represents an extended IUPAC code. It covers the 20 standard amino acids, plus pyrrolysine (O), selenocysteine (U), the ambiguity codes B and Z, the unknown residue X, and a stop symbol (*). The table below lists every member.

The genetic codes for translating to amino acids are sourced from NCBI at https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi.

Symbol  Enum value  Abbreviation  Definition
A       aaA         Ala           Alanine
C       aaC         Cys           Cysteine
D       aaD         Asp           Aspartic Acid
E       aaE         Glu           Glutamic Acid
F       aaF         Phe           Phenylalanine
G       aaG         Gly           Glycine
H       aaH         His           Histidine
I       aaI         Ile           Isoleucine
K       aaK         Lys           Lysine
L       aaL         Leu           Leucine
M       aaM         Met           Methionine
N       aaN         Asn           Asparagine
O       aaO         Pyl           Pyrrolysine
P       aaP         Pro           Proline
Q       aaQ         Gln           Glutamine
R       aaR         Arg           Arginine
S       aaS         Ser           Serine
T       aaT         Thr           Threonine
U       aaU         Sec           Selenocysteine
V       aaV         Val           Valine
W       aaW         Trp           Tryptophan
Y       aaY         Tyr           Tyrosine
B       aaB         Asx           Aspartic acid or asparagine
Z       aaZ         Glx           Glutamic acid or glutamine
*       aaStp       Stp           Stop
X       aaX         Amb           Ambiguous/Unknown
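Each ambiguity code in the table stands for a set of residues. B means D or N, Z means E or Q, and X means any residue. The minimal Nim sketch below illustrates that matching rule. It uses a hypothetical seven-member Residue enum and a hypothetical compatible proc, and neither is part of this module. Treating X as matching every residue is an assumption.

```nim
# Hypothetical, trimmed-down enum for illustration only; it is not the
# module's AminoAcid type, which has 26 members.
type Residue = enum
  rD, rE, rN, rQ, rB, rZ, rX

proc compatible(query, residue: Residue): bool =
  ## True when `query` (possibly an ambiguity code) could stand for `residue`.
  case query
  of rB: residue in {rD, rN, rB}   # Asx: aspartic acid or asparagine
  of rZ: residue in {rE, rQ, rZ}   # Glx: glutamic acid or glutamine
  of rX: true                      # assumption: unknown matches anything
  else: query == residue           # concrete residues match only themselves
```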

Example:

import src/bioseq/aminoAcid
import bioseq

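# Parse a one-letter code into the AminoAcid enum and convert it back.
let ala = parseChar('A', AminoAcid)
assert ala.toChar == 'A'

# Translate the codon TTT with the standard genetic code (NCBI table 1).
let amino = translateCodon([dnaT, dnaT, dnaT], gCode1)
assert amino == aaF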

Types

AminoAcid = enum
  aaA, aaC, aaD, aaE, aaF, aaG, aaH, aaI, aaK, aaL, aaM, aaN, aaO, aaP, aaQ,
  aaR, aaS, aaT, aaU, aaV, aaW, aaY, aaB, aaZ, aaStp, aaX
GeneticCode = enum
  gCode1 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSSLLFF",
  gCode2 = "KKNN**SSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode3 = "KKNNRRSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPTTTT**YYWWCCSSSSLLFF",
  gCode4 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode5 = "KKNNSSSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode6 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLQQYY*WCCSSSSLLFF",
  gCode9 = "NKNNSSSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode10 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYCWCCSSSSLLFF",
  gCode11 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSSLLFF",
  gCode12 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLSLL**YY*WCCSSSSLLFF",
  gCode13 = "KKNNGGSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode14 = "NKNNSSSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLY*YYWWCCSSSSLLFF",
  gCode16 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL*LYY*WCCSSSSLLFF",
  gCode21 = "NKNNSSSSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode22 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL*LYY*WCC*SSSLLFF",
  gCode23 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSS*LFF",
  gCode24 = "KKNNSKSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF",
  gCode25 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYGWCCSSSSLLFF",
  gCode26 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLALL**YY*WCCSSSSLLFF",
  gCode27 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLQQYYWWCCSSSSLLFF",
  gCode28 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLQQYYWWCCSSSSLLFF",
  gCode29 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLYYYY*WCCSSSSLLFF",
  gCode30 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLEEYY*WCCSSSSLLFF",
  gCode31 = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLEEYYWWCCSSSSLLFF",
  gCode33 = "KKNNSKSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLLY*YYWWCCSSSSLLFF"
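Genetic codes for translating nucleotides to amino acids. The codes follow the NCBI definitions. The number in each name is the NCBI translation table ID. For example, gCode1 is the standard code and gCode2 is the vertebrate mitochondrial code.

Each string holds 64 amino acid characters, one per codon. The characters are ordered relative to this sequence of codons:
AAA, AAG, AAC, AAT, AGA, AGG, AGC, AGT, ACA, ACG, ACC, ACT, ATA, ATG, ATC, ATT, GAA, GAG, GAC, GAT, GGA, GGG, GGC, GGT, GCA, GCG, GCC, GCT, GTA, GTG, GTC, GTT, CAA, CAG, CAC, CAT, CGA, CGG, CGC, CGT, CCA, CCG, CCC, CCT, CTA, CTG, CTC, CTT, TAA, TAG, TAC, TAT, TGA, TGG, TGC, TGT, TCA, TCG, TCC, TCT, TTA, TTG, TTC, TTT
In this ordering the bases rank A, G, C, T and the first codon position changes slowest. So AAA is position 0, ATG is position 13, and TTT is position 63.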
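Because the ordering is fixed, base-4 arithmetic maps a codon to its position in a genetic code string. The Nim sketch below works on plain strings rather than the library's nucleotide types. The names baseRank, codonIndex, translate and standardCode are hypothetical. Treating 'U' like 'T', so that RNA codons also work, is an assumption.

```nim
import std/strutils

# Hypothetical helper: rank of each base in the A, G, C, T ordering.
# Assumption: 'U' (RNA) is ranked the same as 'T'.
proc baseRank(b: char): int =
  case b.toUpperAscii
  of 'A': result = 0
  of 'G': result = 1
  of 'C': result = 2
  of 'T', 'U': result = 3
  else: raise newException(ValueError, "not a nucleotide: " & b)

proc codonIndex(codon: string): int =
  ## Position of `codon` in the 64-codon ordering (AAA = 0, TTT = 63).
  doAssert codon.len == 3
  result = 16 * baseRank(codon[0]) + 4 * baseRank(codon[1]) + baseRank(codon[2])

# Copy of the gCode1 string (standard code) from the GeneticCode enum.
const standardCode = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSSLLFF"

proc translate(codon: string; code = standardCode): char =
  ## Looks up the one-letter amino acid for a single codon.
  code[codonIndex(codon)]

echo translate("ATG")  # M
```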

Consts

aminoAcidAbreviation: array[AminoAcid, string] = ["Ala", "Cys", "Asp", "Glu",
    "Phe", "Gly", "His", "Ile", "Lys", "Leu", "Met", "Asn", "Pyl", "Pro", "Gln",
    "Arg", "Ser", "Thr", "Sec", "Val", "Trp", "Tyr", "Asx", "Glx", "Stp", "Amb"]
aminoAcidChar: array[AminoAcid, char] = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y', 'B',
    'Z', '*', 'X']
aminoAcidDefinition: array[AminoAcid, string] = ["Alanine", "Cysteine",
    "Aspartic Acid", "Glutamic Acid", "Phenylalanine", "Glycine", "Histidine",
    "Isoleucine", "Lysine", "Leucine", "Methionine", "Asparagine",
    "Pyrrolysine", "Proline", "Glutamine", "Arginine", "Serine", "Threonine",
    "Selenocysteine", "Valine", "Tryptophan", "Tyrosine",
    "Aspartic acid or asparagine", "Glutamic acid or glutamine", "Stop",
    "Ambiguous/Unknown"]
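The constants above are arrays indexed directly by AminoAcid. Converting a member to its character, abbreviation, or name is therefore a single array access. The sketch below shows the idea with a hypothetical three-member Aa enum. It also includes parseAa, a hypothetical reverse lookup that scans the members linearly. Its fallback to aaX for unrecognized characters is an assumption, and parseChar may not behave the same way.

```nim
# Hypothetical three-member enum standing in for AminoAcid.
type Aa = enum
  aaA, aaC, aaX

# Arrays indexed by the enum: one entry per member, in declaration order,
# so each lookup is a direct index.
const aaChar: array[Aa, char] = ['A', 'C', 'X']
const aaAbbrev: array[Aa, string] = ["Ala", "Cys", "Amb"]

proc toChar(a: Aa): char = aaChar[a]

proc parseAa(c: char): Aa =
  ## Reverse lookup by scanning every member.
  ## Assumption: unknown characters fall back to aaX.
  for a in Aa:
    if aaChar[a] == c:
      return a
  result = aaX
```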

Procs

func abreviation(a: AminoAcid): string {.raises: [], tags: [].}
Returns the three-letter abbreviation of an amino acid, e.g. "Ala" for aaA.
func definition(a: AminoAcid): string {.raises: [], tags: [].}
Returns the full name of an amino acid, e.g. "Alanine" for aaA.
func parseChar(c: char; T: typedesc[AminoAcid]): AminoAcid
Parses a one-letter character into the AminoAcid enum type.
func toChar(a: AminoAcid): char {.raises: [], tags: [].}
Returns the one-letter character of an amino acid, e.g. 'A' for aaA.
func translateCodon(nucleotides: array[3, AnyNucleotide]; code: GeneticCode): AminoAcid
Translates a nucleotide codon to an amino acid. See the documentation for GeneticCode for the available code parameter options.
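translateCodon handles one codon at a time. To translate a whole reading frame, you step through the sequence three bases at a time. The sketch below does this over plain strings instead of the library's nucleotide enums. It uses the hypothetical helpers baseRank and translateSeq, and it compares the standard code (gCode1) with the vertebrate mitochondrial code (gCode2).

```nim
import std/strutils

# Copies of the gCode1 and gCode2 strings from the GeneticCode enum.
const
  standard = "KKNNRRSSTTTTIMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YY*WCCSSSSLLFF"
  vertMito = "KKNN**SSTTTTMMIIEEDDGGGGAAAAVVVVQQHHRRRRPPPPLLLL**YYWWCCSSSSLLFF"

# Hypothetical helper: rank of each base in the A, G, C, T ordering.
proc baseRank(b: char): int =
  case b.toUpperAscii
  of 'A': result = 0
  of 'G': result = 1
  of 'C': result = 2
  of 'T', 'U': result = 3
  else: raise newException(ValueError, "not a nucleotide: " & b)

proc translateSeq(dna: string; code: string): string =
  ## Translates complete codons left to right; a trailing partial codon is ignored.
  var i = 0
  while i + 3 <= dna.len:
    let idx = 16 * baseRank(dna[i]) + 4 * baseRank(dna[i + 1]) + baseRank(dna[i + 2])
    result.add code[idx]
    inc i, 3

echo translateSeq("ATGTGATTT", standard)  # M*F
echo translateSeq("ATGTGATTT", vertMito)  # MWF
```

With gCode1 the codon TGA is a stop ('*'). With gCode2 it encodes tryptophan ('W').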