IUPAC ambiguity codes. Nucleotide ambiguity code. Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences.

Nucleotide ambiguity code (IUPAC)

Nucleotide ambiguity code (as defined in DNA Sequence Assembler)

Code	Represents	Complement
A	Adenine	T
G	Guanine	C
C	Cytosine	G
T	Thymine	A
Y	Pyrimidine (C or T)	R
R	Purine (A or G)	Y
W	weak (A or T)	W
S	strong (G or C)	S
K	keto (T or G)	M
M	amino (C or A)	K
D	A, G, T (not C)	H
V	A, C, G (not T)	B
H	A, C, T (not G)	D
B	C, G, T (not A)	V
X/N	any base	X/N
-	Gap	-
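The Complement column above is enough to reverse complement a sequence that contains ambiguity codes. The following is a minimal sketch, not the implementation used by DNA Sequence Assembler; the names COMPLEMENT and reverse_complement are chosen here for illustration, and X and N are treated as separate symbols that each complement to themselves.

```python
# Complement of each code, transcribed from the table above.
COMPLEMENT = {
    "A": "T", "G": "C", "C": "G", "T": "A",
    "Y": "R", "R": "Y", "W": "W", "S": "S",
    "K": "M", "M": "K",
    "D": "H", "V": "B", "H": "D", "B": "V",
    "X": "X", "N": "N", "-": "-",
}


def reverse_complement(seq):
    """Return the reverse complement of seq, preserving letter case."""
    out = []
    for ch in reversed(seq):
        # Raises KeyError for any character that is not in the table.
        comp = COMPLEMENT[ch.upper()]
        # Keep lowercase input lowercase; the gap character is left unchanged.
        out.append(comp.lower() if ch.islower() else comp)
    return "".join(out)


print(reverse_complement("CACCTGCNNNN"))
```

Because W, S, N, X and the gap are their own complements, only the order of those symbols changes when a sequence is reverse complemented.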

Code example:

Restriction enzyme: AarI

Recognition site: CACCTGCNNNN'NNNN_

Cleavage of DNA (/):

5'- C A C C T G C N N N N/N N N N -3'
3'- G T G G A C G N N N N N N N N/-5'
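A recognition site written with ambiguity codes can be searched for by turning each code into the set of bases it represents. The sketch below does this with a regular expression; IUPAC_BASES, site_to_regex and find_sites are hypothetical names, the cleavage marks (' and _) are assumed to have been removed from the site string beforehand, and only the strand given is searched.

```python
import re

# Bases each code stands for, taken from the ambiguity code table.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "X": "ACGT",
}


def site_to_regex(site):
    """Compile a site such as 'CACCTGCNNNNNNNN' into a regular expression."""
    parts = []
    for code in site.upper():
        bases = IUPAC_BASES[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    # A lookahead is used so that overlapping occurrences are all reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_sites(site, seq):
    """Return the 0-based start position of every match of site in seq."""
    return [m.start() for m in site_to_regex(site).finditer(seq.upper())]


print(find_sites("CACCTGCNNNNNNNN", "TTCACCTGCAAAACCCCGG"))
```

To cover the opposite strand as well, the same search could be repeated on the reverse complement of the sequence.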

The letter codes and complement translations are those proposed by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB).

Note:

DNA Sequence Assembler can automatically detect SNPs and convert them to IUPAC codes. Its Reverse Complement tool can also reverse complement sequences that contain IUPAC codes.
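The idea of converting a polymorphic position to an IUPAC code can be illustrated as follows. This is only a sketch under simplifying assumptions and does not reproduce how DNA Sequence Assembler detects SNPs: the reads are assumed to be already aligned, of equal length and free of gaps, and every observed base counts equally (a real tool would typically weigh base quality and frequency). The names ambiguity_code and consensus are invented for this example.

```python
# Bases each code stands for; N is used for "any base".
BASES_FOR_CODE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Reverse lookup: an unordered set of bases -> the single code that covers it.
CODE_FOR_BASES = {frozenset(bases): code for code, bases in BASES_FOR_CODE.items()}


def ambiguity_code(bases):
    """Return the IUPAC code for the bases observed at one position."""
    return CODE_FOR_BASES[frozenset(b.upper() for b in bases)]


def consensus(reads):
    """Collapse aligned reads column by column into one IUPAC sequence."""
    # zip(*reads) yields one tuple of bases per alignment column.
    return "".join(ambiguity_code(column) for column in zip(*reads))


print(consensus(["ACGT", "ACAT", "ACGT"]))
```

In this example the third column contains both G and A, so it is reported as R (purine); the other columns are unanimous and keep their base.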

Standard Ambiguity Codes

The standard ambiguity codes for nucleotides and for the one-letter and three-letter designations of amino acids are given. The synonymous codons for the amino acids, and their depiction in IUB codes (Nomenclature Committee, 1985, Eur. J. Biochem. 150:1-5) are also shown.

Nucleotide	Symbol	3-Letter	Amino Acid	Codons (IUB)
(Adenosine) A	A	Ala	Alanine	GCX
C or G or T/U	B	Asx	Aspartate or Asparagine	RAY
(Cytidine) C	C	Cys	Cysteine	UGY
A or G or T/U	D	Asp	Aspartate	GAY
-	E	Glu	Glutamate	GAR
-	F	Phe	Phenylalanine	UUY
(Guanosine) G	G	Gly	Glycine	GGX
A or C or T/U	H	His	Histidine	CAY
(Inosine) I	I	Ile	Isoleucine	AUH
-	J	-	-	-
G or T/U	K	Lys	Lysine	AAR
-	L	Leu	Leucine	UUR,CUX,YUR
A or C	M	Met	Methionine	AUG
unknown base	N	Asn	Asparagine	AAY
-	O	-	-	-
-	P	Pro	Proline	CCX
-	Q	Gln	Glutamine	CAR
(Purine) A or G	R	Arg	Arginine	CGX,AGR,MGR
C or G	S	Ser	Serine	UCX,AGY
(Thymidine) T	T	Thr	Threonine	ACX
(Uridine) U	U	-	-	-
A or C or G	V	Val	Valine	GUX
A or T/U	W	Trp	Tryptophan	UGG
unknown base	X	-	unknown amino acid	XXX
(Pyrimidine) C or T/U	Y	Tyr	Tyrosine	UAY
-	Z	Glx	Glutamate or Glutamine	SAR
no base (deletion/gap)	.	no amino acid (deletion/gap)	-	-
-	*	End	terminator	UAR,URA
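The IUB column is a compact way of writing several codons at once. As an illustration, the sketch below expands such an entry into the concrete RNA codons it covers. RNA_BASES and expand_codons are names made up for this example, and X is treated as "any base", as in the table.

```python
from itertools import product

# RNA bases each code stands for (U in place of T).
RNA_BASES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "W": "AU", "S": "CG", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU", "X": "ACGU",
}


def expand_codons(patterns):
    """Expand a comma-separated IUB entry such as 'UAR,URA' into codons."""
    codons = set()
    for pattern in patterns.split(","):
        # One string of candidate bases per codon position.
        choices = [RNA_BASES[c] for c in pattern.strip().upper()]
        codons.update("".join(combo) for combo in product(*choices))
    return sorted(codons)


print(expand_codons("AUH"))      # isoleucine
print(expand_codons("UAR,URA"))  # terminator
```

Entries that list overlapping patterns, such as UUR,CUX,YUR for leucine, expand to the same codon more than once; collecting the results in a set keeps each codon only once.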

How the standard ambiguity codes were assigned

Standard Amino Acid Codes

A = Ala = Alanine
C = Cys = Cysteine (not Cystine!)
G = Gly = Glycine
I = Ile = Isoleucine
L = Leu = Leucine
M = Met = Methionine
P = Pro = Proline
S = Ser = Serine
T = Thr = Threonine
V = Val = Valine

should be obvious codes

Standard Nucleotide Codes

A = Adenylic acid
C = Cytidylic acid
G = Guanylic acid
T = Thymidylic acid
U = Uridylic acid
I = Inosylic acid

should be obvious codes

F = Phe = Phenylalanine
N = Asn = Asparagine
R = Arg = Arginine
Y = Tyr = Tyrosine

are phonetic codes

R = A or G = puRine
Y = C or T = pYrimidine
K = G or T = Keto
M = A or C = aMino
S = G or C = Strong base pair
W = A or T = Weak base pair

double base codes

D = Asp = Aspartic acid
E = Glu = Glutamic acid
K = Lys = Lysine
Q = Gln = Glutamine
W = Trp = Tryptophan (big letter big residue)

non-obvious codes (you just have to learn them!)

B = not A (G or C or T)
D = not C (A or G or T)
H = not G (A or C or T)
V = not T/U (A or C or G)

triple base codes

B = Asx = Aspartic acid or Asparagine
Z = Glx = Glutamic acid or Glutamine

these are ambiguity codes

N = aNy base (by convention, X is used for unknown amino acids, N for unknown nucleotides)

X = any amino acid
J, O, U = no amino acid codes
.(dot) = deletion or gap
*(star) = End or terminator

E, F, J, L, O, P, Q, Z
have no base codes

.(dot) = deletion or gap
