Glycosylation of proteins is arguably the most diverse 
post-translational modification. Proteins are glycosylated 
either enzymatically or through non-enzymatic glycation, in which 
glucose (aldehyde form) reacts with lysine and arginine 
residues in proteins; the resulting adducts undergo further 
changes that eventually lead to advanced glycation end products, 
which play important roles in ageing and disease, 
especially diabetes1. The enzymatic protein glycosylation 
processes (herein referred to as glycosylation) mostly 
involve sequential concerted steps in the endoplasmic 
reticulum (ER) and the Golgi system resulting in glyco-
sylation of most (>85%) secretory proteins2,3. In addition, 
the majority of nuclear and cytoplasmic proteins undergo 
dynamic O-GlcNAcylation4, perhaps making glycosylation 
the most abundant post-translational modification (even 
more abundant than phosphorylation)5.
Protein glycosylation is a complex, multistep pro-
cess that employs around 200 glycosyltransferase 
enzymes that determine which proteins are to become 
glycoproteins, the positions of glycans on those proteins 
and the glycan structures assembled (Table 1). 
Furthermore, the initial attachment of glycans may be 
differentially regulated in cells by expression of distinct 
glycosyltransferase isoenzymes, and thus some proteins 
may or may not become glycoproteins depending on 
their cell of origin and in response to functional needs. 
Glycosylation greatly amplifies the proteome by pro-
ducing diverse proteoforms with different properties, 
thereby instructing myriad functions6–8. The ensem-
ble of glycans found on glycoproteins, including 
glycosylphosphatidylinositol (GPI)-anchored proteins and 
proteoglycans, alongside glycosphingolipids and free 
oligosaccharides and polysaccharides constitutes the 
glycome of a cell, and the ensemble of glycoconjugates 
at the cell surface constitutes the glycocalyx (FIG. 1).
The glycome is diverse, involving different types 
of glycoconjugates and oligosaccharides with varying 
compositions, sequences and linkages of sugar moieties. 
Nevertheless, despite this diversity, the glycoconjugates 
do share certain features, such as common structural 
scaffolds and terminal modifications9. The diversity in 
types of glycans and structures has expanded during 
evolution of eukaryotes likely in response to needs for 
increased molecular cues and regulation10,11. Protein 
glycosylation pathways are nearly identical across mam-
malian cells, although several glycan features were elim-
inated late in evolution by gene inactivation resulting in 
xenoantigens12–14.
Studies of deficiencies in glycosylation enzymes 
in animal models and human diseases have advanced 
understanding of biological functions of protein glyco-
sylation and demonstrated that most glycosyltransferases 
serve essential roles in mammalian physiology15–18. 
Glycosylation of proteins is thus integral to their func-
tion and should be considered in functional studies of 
the proteome. However, our understanding of specific 
structure–function relationships and roles of specific 
glycans on specific proteins is incomplete.

O-GlcNAcylation
The enzymatic process directed by the N-acetyl-d-glucosamine (GlcNAc) 
glycosyltransferase (OGT) that transfers GlcNAc to proteins 
(Ser and Thr residues), occurring in the cytosol and nucleus of cells.

Isoenzymes
Enzymes that catalyse the same reactions but differ in amino acid 
sequence and often have partially distinct (non-redundant) functions.

Global view of human protein glycosylation pathways and functions
Katrine T. Schjoldager, Yoshiki Narimatsu, Hiren J. Joshi ✉ and Henrik Clausen ✉

Abstract | Glycosylation is the most abundant and diverse form of post-translational modification 
of proteins that is common to all eukaryotic cells. Enzymatic glycosylation of proteins involves 
a complex metabolic network and different types of glycosylation pathways that orchestrate 
enormous amplification of the proteome in producing diversity of proteoforms and its biological 
functions. The tremendous structural diversity of glycans attached to proteins poses analytical 
challenges that limit exploration of specific functions of glycosylation. Major advances in quantitative 
transcriptomics, proteomics and nuclease-based gene editing are now opening new global ways to 
explore protein glycosylation through analysing and targeting enzymes involved in glycosylation 
processes. In silico models predicting cellular glycosylation capacities and glycosylation outcomes 
are emerging, and refined maps of the glycosylation pathways facilitate genetic approaches to 
address functions of the vast glycoproteome. These approaches apply commonly available cell 
biology tools, and we predict that use of (single-cell) transcriptomics, genetic screens, genetic 
engineering of cellular glycosylation capacities and custom design of glycoprotein therapeutics 
are advancements that will ignite wider integration of glycosylation in general cell biology.

Center for Glycomics, Department of Cellular and Molecular Medicine, Faculty of 
Health Sciences, University of Copenhagen, Copenhagen, Denmark.
✉e-mail: joshi@sund.ku.dk; hclau@sund.ku.dk
https://doi.org/10.1038/s41580-020-00294-x
REVIEWS
Nature Reviews | Molecular Cell Biology
http://orcid.org/0000-0002-8592-6763
http://orcid.org/0000-0003-1428-5695
http://orcid.org/0000-0002-8192-2829
http://orcid.org/0000-0002-0915-5055

The glycome is produced and regulated by the glycosylation machinery 
in a single cell, yet analysis of glycans at the single-cell level is 
not possible with current glycomics methods, which are limited to 
probing with glycan-specific antibodies and glycan-binding proteins 
(GBPs, such as lectins). It is therefore often perceived as a daunting 
task to uncover and dissect specific biological functions of glycans 
and the underlying molecular mechanisms. Advances in next-generation 
sequencing and proteomics are beginning to provide single-cell 
transcriptomes and proteomes, which has opened the way for global 
analysis of the network of enzymes that orchestrate protein 
glycosylation and the assessment of the glycosylation capacities of 
any given cell. Accompanying these are the nuclease-based gene editing 
technologies that, through precise manipulation of glycosylation 
enzymes, provide virtually unlimited opportunities for engineering, 
exploration and custom design of cellular glycosylation capacities. 
We can now probe glycosylation systematically through a genetic entry 
point, and we foresee that with additional efforts we will be able to 
connect information on cellular glycosylation capacities with the 
actual outcome of the glycome and roles of glycosylation in cells.

Table 1 | Initiation steps for human protein glycosylation pathways

Type | Symbol (anomer, residue) | Linkage | Initiation enzymes | Sequence motifs | Specific domain
--- | --- | --- | --- | --- | ---
N-Glycosylation | β, Asn | LLO/GlcNAcβ–Asn | OST complex (STT3A/B) | N-X-S/T, X≠P; N-X-C; N-G; N-X-V; X≠P | None
O-Glycosylation | α, Ser/Thr/Tyr | GalNAcα–Ser/Thr (Tyr) | GALNT1–20 | Weak isoform-specific motifs51 | None
O-Glycosylation | α, Ser/Thr/Tyr | GalNAcα–Ser/Thr (Tyr) | GALNT11 | C6-X-X-X-T-C1 | LA
O-Glycosylation | α, Ser/Thr | Fucα–Ser/Thr | POFUT1 | C2-X-X-X-X-(S/T)-C3 | EGF
O-Glycosylation | α, Ser/Thr | Fucα–Ser/Thr | POFUT2 | C-X-X-(S/T)-C-X-X-G | TSR
O-Glycosylation | β, Ser/Thr | GlcNAcβ–Ser/Thr | EOGT | C5XX(G/P/S)(Y/F/W)(T/S)GXXC6 | EGF
O-Glycosylation | α, Ser/Thr | Manα–Ser/Thr | POMT1, POMT2 | ND | None
O-Glycosylation | α, Ser/Thr | Manα–Ser/Thr | TMTC1–4 | ND | EC
O-Glycosylation | α, Ser/Thr | Manα–Ser/Thr | Unknown | ND | IPT
O-Glycosylation | β, Ser | Glcβ–Ser | POGLUT1 | C1-X-S-X-(A/P)-C2 | EGF
O-Glycosylation | β, Ser | Glcβ–Ser | POGLUT2–3 | C3-X-N-T-X-G-S-(F/Y)-X-C4 | EGF
O-Glycosylation | β, Ser | Xylβ–Ser | XYLT1, XYLT2 | a-a-a-a-G-S-G-a-(a/G)-a (a = D/E) | None
O-Glycosylation | β, Hyl | Galβ–Hyl | COLGALT1–2 | Collagen repeats | Collagen
O-Glycosylation | β, Ser/Thr | GlcNAcβ–Ser/Thr | OGT | None | None
O-Glycosylation | α, Tyr | Glcα–Tyr | GYG | Y194 in glycogenin (a) | NA
C-Mannosylation | α, Trp | Manα–Trp | DPY19L1–4 | W-X-X-W | TSR
Glypiation (GPI anchor) | NA | Protein–C(O)EthNP–Man | Transamidase | Carboxy-terminal hydrophobic segment | None

The minimal protein–glycan linkages for the distinct human protein glycosylation pathways are shown with the monosaccharide 
linkage (including the anomeric α/β configuration of linkage) to amino acid residues in proteins (FIGS 1 and 3), which include 
11 types of O-glycosylation when the 2 O-Fuc, 3 O-Man and 2 O-Glc pathways are considered distinct. For references and details, 
please refer to the main text. COLGALT, collagen O-Gal transferase; DPY19L, dpy-19 like C-Man transferase; EC, extracellular 
cadherin immunoglobulin-like; EGF, epidermal growth factor-like repeat; EOGT, epidermal growth factor-domain specific 
O-GlcNAc transferase; EthNP, ethanolaminephosphate; Fuc, l-fucose; Gal, d-galactose; GalNAc, N-acetyl-d-galactosamine; 
GALNT, polypeptide GalNAc transferase; Glc, d-glucose; GlcNAc, N-acetyl-d-glucosamine; GPI, glycosylphosphatidylinositol; 
Hyl, hydroxylysine; IPT, immunoglobulin-like, plexin and transcription factor; LA, low-density lipoprotein receptor (LDLR) class A 
repeats; LLO, lipid-linked oligosaccharide; Man, d-mannose; NA, not applicable; ND, not determined; OGT, O-GlcNAc transferase; 
OST, oligosaccharyltransferase; POFUT, protein O-fucosyltransferase; POGLUT, protein O-Glc transferase; POMT, protein O-Man 
transferase; TMTC, transmembrane O-Man transferase targeting cadherin (also transmembrane and tetra-trico-peptide repeat 
(TPR) repeat-containing protein); TSR, thrombospondin type 1 repeat; unknown, transmembrane O-Man transferase to be 
reported; X, any of the 20 amino acids; Xyl, d-xylose; XYLT, protein O-Xyl transferase. (a) The unique glycosylation of glycogenin 
(GYG) involves autoglucosylation of Tyr residue 194 (Glcα–Tyr) followed by formation of the glycogenin polymer, and is not 
further discussed in the text.

Glycosylphosphatidylinositol (GPI)-anchored glycoproteins
A class of proteins that are attached to the membrane lipid bilayer 
via a carboxy-terminal glycolipid anchor consisting of 
phosphoethanolamine, an oligosaccharide core and phosphatidylinositol.

Proteoglycans
Proteins carrying one or more glycosaminoglycan chains attached covalently.

Glycocalyx
The cell coat comprising glycans and glycoconjugates surrounding 
animal cells, found as an electron-dense layer by electron microscopy. 
It protects the cell from physical stress and mediates a plethora of 
macromolecular and cell–cell interactions.

Xenoantigens
Antigens found in multiple species that elicit antibodies in a species 
without the antigen after transplantation of tissues and organs. A major 
xenoantigen in porcine to human transplantation is the 
Galα1–3Galβ1–R glycan epitope (αGal).

Lectins
Proteins that bind to glycans. Major animal lectin families include 
Galectin, C-type, P-type and I-type lectins. Lectins, mostly from plants, 
with well-characterized binding specificities are frequently used as 
tools in glycobiology. Lectins are often multivalent, with binding 
affinities in the low micromolar range and binding avidities approaching 
the nanomolar range for larger glycans with multiple epitopes.

www.nature.com/nrm

Here, we provide an overview of the current know-
ledge of the human protein glycosylation pathways 
and the genes encoding the orchestrating enzymes. 
We also discuss the emerging genetic and systems bio-
logy approaches to the study of protein glycosylation and 
dissection of biological functions. We are fully aware that 
the inherent space limitations necessitate some level of 
generalization, with omission of numerous details (and of 
references to the available literature). Our aim here is to provide 
a global view of how the human glycome is established 
and highlight our current understanding of its 
roles in physiology.
Basis of protein glycosylation
Glycans are not primary gene products and, in contrast to 
proteins, their synthesis occurs without a template. The 
human genome contains around 700 genes encoding 
enzymes, transporters and chaperones required for the 
cellular glycosylation machinery, glycan modifications 
and their degradation (corresponding to 3–4% of the 
genome)19–22. Over 200 of these genes encode glycosyltransferases, 
and 173 of these work sequentially to establish 
the complex patterns of sugars found on glycoproteins 
and lipids20 (Table 1). Tremendous efforts by a large number 
of laboratories allowed isolation, cloning, expression 
and characterization of these enzymes, thereby establishing 
insight into their substrate and kinetic properties 
and roles in glycosylation pathways (Supplementary 
Table 1) (for comprehensive reviews, see refs16,23–26). 
With complete genome sequences available, the repertoire 
of glycosyltransferase genes is well characterized, 
and most known glycosidic linkages established between 
sugar moieties are accounted for20. Nevertheless, new 
glycosyltransferase genes and even distinct glycosylation 
pathways are still being discovered27–30.

[Figure 1 graphic. Panels: GPI anchor; N-linked; O-GalNAc; glycosphingolipid; 
domain-specific glycans on EGF repeats (for example, NOTCH; O-Fuc/O-Glc/O-GlcNAc), 
TSR domain (C-Man), LDLR class A repeats, α-dystroglycan, cadherin EC domain 
and plexin/IPT domain (O-Man), collagen (Hyl–Gal) and proteoglycan (O-Xyl) at the 
plasma membrane; O-GlcNAc in the nucleus and cytoplasm. Symbol key: ethanolamine 
phosphate; phosphate (P); phosphoinositol; d-glucose (Glc); d-galactose (Gal); 
d-mannose (Man); N-acetyl-d-glucosamine (GlcNAc); N-acetyl-d-galactosamine (GalNAc); 
N-acetylneuraminic acid (NeuAc); d-glucuronic acid (GlcA); d-glucosamine (GlcN); 
l-fucose (Fuc); d-xylose (Xyl); d-ribitol (Rbo); sulfation (S).]

Fig. 1 | Main classes of glycoconjugates of the human cellular glycome. 
Depiction of the key components of the cellular glycome, highlighting types 
of glycosylation that are specific to distinct protein classes or protein 
domains. The glycans depicted are only illustrative examples of the glycan 
structures that can be synthesized by the different types of glycosylation 
pathways. N-glycans and most GalNAc-type O-glycans are widely found 
on most proteins trafficking the cellular secretory pathway, whereas the 
occurrence of domain-specific glycans is limited to specific protein 
domains. The enzymatic processes that orchestrate glycosylation of the 
different types of glycoconjugates are partly distinct and partly overlapping. 
The initial attachment of the first monosaccharide (or oligosaccharide for 
N-glycosylation) to proteins represents the key initiation step that 
determines which proteins and positions become glycosylated. These 
initiation steps are distinct for the different types of glycosylation pathways 
and, to a large extent, direct the structures of glycans that are generated. 
Some overlap in the later processing steps that involve elongation, 
branching and capping of oligosaccharides is found among N-glycosylation 
and several types of O-glycosylation as well as in glycosphingolipid 
biosynthesis (FIG. 3). The background colour scheme is organized according 
to the colour of the first monosaccharide attached to the core protein, 
except for glycosylphosphatidylinositol (GPI)-anchored proteins and 
glycolipids (shown in grey). The colouring scheme is useful for distinguishing 
the protein glycosylation pathways involved in the synthesis of different 
types of glycans215 (see also Table 1). Glycan symbols are drawn according to 
the Symbol Nomenclature for Glycans (SNFG) format246. Sugar repeat units 
are indicated by square brackets with ‘n’ to indicate a number of possible 
repeats. EC, extracellular cadherin; EGF, epidermal growth factor; Hyl, 
hydroxylysine; IPT, immunoglobulin-like, plexin, transcription factor; LDLR, 
low-density lipoprotein receptor; NS, non-specific; TSR domain, 
thrombospondin type 1 repeat domain.

Protein glycosylation takes place in the secretory pathway 
(ER and Golgi), nucleus, cytoplasm and mitochondria 
of all eukaryotic cells (FIG. 2). Ten monosaccharides, namely 
d-glucose (Glc), d-galactose (Gal), N-acetyl-d-glucosamine 
(GlcNAc), N-acetyl-d-galactosamine (GalNAc), l-fucose 
(Fuc), d-glucuronic acid (GlcA), d-mannose (Man), 
N-acetylneuraminic acid (Neu5Ac), d-xylose (Xyl) and 
d-ribose (Rib), derived from activated donor sugar 
nucleotides or dolichol (Dol)-linked donors are used to 
build the human glycome. Glycans are attached to pro-
teins in four different ways — N-linked to asparagine 
(Asn), O-linked to the hydroxyl groups of serine (Ser), 
threonine (Thr) or tyrosine (Tyr), C-linked to trypto-
phan (Trp) and glypiation — and there are several dif-
ferent O-linked sugars including GalNAc, Fuc, GlcNAc, 
Man, Glc, Xyl and Gal (Table 1). Different types of protein 
glycosylation are broadly defined by the sugar–
protein linkage, the initial monosaccharide linked to 
proteins and, for some types of O-glycosylation, by the 
enzymes directing the first step in protein glycosylation 
(FIG. 3). Many types of protein glycosylation (Table 1) start 
in the ER, two start in the early Golgi (GalNAc-type and 
Xyl-type) and O-GlcNAcylation takes place in the cyto-
plasm and nucleus (GalNAc-type O-glycosylation has 
also been reported in the nucleus31, but this result may 
be an experimental artefact resulting from glycosylation 
occurring during cellular fractionation). Most glyco-
syltransferases are type II transmembrane glycoproteins 
with ER and Golgi lumen-oriented catalytic domains 
that make use of activated sugar nucleotides (UDP-Glc/
GlcNAc/Gal/GalNAc/Xyl/GlcA, GDP-Man/Fuc, 
CMP-NeuAc and CDP-ribitol) as donor substrates, 
and have a short amino-terminal segment required 
for retrograde transport from the Golgi to the ER via 
COPI-coated vesicles32,33. Type II glycosyltransferases are 
prone to proteolytic cleavage in stem regions, unteth-
ering their catalytic domains from the ER or Golgi 
membranes34–36 and accounting for the presence of glycosyltransferases 
in body fluids37. Some ER-resident 
glycosyltransferases are multipass transmembrane proteins 
and utilize Dol-linked donor substrates (Dol-P-Man, 
Dol-P-Glc or the lipid-linked oligosaccharide precursor), 
whereas some glycosyltransferases are soluble ER-resident 
enzymes (protein O-fucosyltransferase 1/2 (POFUT1/2), 
protein O-Glc transferase 1–3 (POGLUT1–3), collagen 
O-Gal transferase 1/2 (COLGALT1/2) and epidermal 
growth factor-domain specific O-GlcNAc transferase 
(EOGT) — note that throughout the article we refer to 
glycosyltransferases by their gene names and when refer-
ring specifically to genes we use italics) retained in the 
ER by C-terminal KDEL signals and use activated sugar 
nucleotides for glycosylation38,39.
Structural elaboration of glycans by sequential 
addition of monosaccharides to extend, branch and 
cap growing oligosaccharides occurs largely in the 
Golgi — one exception has been reported in the pro-
tein O-Man transferase (POMT)-directed synthesis of 
O-Man glycans, with the first elongation step mediated 
by POMGNT2 occurring in the ER40 (FIG. 3). Following 
anterograde transport of glycoproteins to the surface, 
further glycosylation (in particular, sialylation) during 
recycling of membrane glycoproteins can occur41–43, 
and more extensive modifications including change of 
O-glycan core structures have been suggested44–47.
Glycosylation is orchestrated mainly by the kinetic 
properties of glycosyltransferases and their compart-
mentalization in Golgi stacks, with a distribution related 
to sequential biosynthetic steps19,20,22,48 (FIG. 2). Formation 
of multimeric (homomeric and heteromeric) enzyme 
complexes may contribute to the orchestration of these 
glycosylation steps49. Insight into the structures and 
catalytic mechanisms of glycosyltransferases reveals 
common structural scaffolds with distinct acceptor 
substrate specificities partly conferred by variable loop 
regions extending from the core catalytic unit50,51. It is 
further proposed that evolutionary diversification of 
the functions of glycosyltransferases involves mutations 
in the common core sugar nucleotide binding region 
and varying loop regions, which drive the divergence 
in donor sugars and acceptor substrate recognition, 
respectively50–52. Glycosyltransferases utilizing activated 
donor nucleotide sugars have high specificity for the 
nucleotide (although they may have some flexibility for 
the donor monosaccharide) and, in general, form only 
one type of glycosidic linkage structure50. The final glyco-
sylation outcome is influenced by many other factors, 
including the availability of substrates and sugar nucleo-
tides, competing glycosylation reactions, co-factors 
(such as Mn²⁺), intracellular transport, pH and actions 
of protein chaperones and glycosidases, as well as by 
general factors, for example stress, that may affect the 
normal cellular state.
Human protein glycosylation pathways
The known glycome is generated through 16 distinct 
glycosylation pathways — distinguished on the basis of 
the sugar–protein linkage, the initial monosaccharide 
linked to proteins and the unique initiating enzymes — 
which are directed by at least 173 distinct glycosyltrans-
ferases (FIG. 3; see also Supplementary Fig. 1). These gly-
cosylation pathways include, apart from 2 types of lipid 
glycosylation, 14 distinct types of protein glycosylation, 
including N-glycosylation, 11 types of O-glycosylation, 
C-mannosylation and generation of GPI-anchored proteins 
(Table 1). Protein glycosylation involves a series of 
sequential steps to build characteristic oligosaccharide 
structures, including an initiation step determining 
proteins to be glycosylated, immediate core extension 
steps with options for different core structures, 
elongation/branching steps that expand (and repeat) common 
structural motifs and capping steps that terminate 
oligosaccharide chains (FIG. 3).

Dolichol
(Dol). A polyisoprenol lipid that serves as an acceptor for the 
lipid-linked oligosaccharides in N-glycan biosynthesis.

Type II transmembrane glycoproteins
Single-pass transmembrane glycoproteins with the amino terminus 
oriented towards the cytosol and the carboxyl terminus facing the 
lumen of the secretory pathway or cell exterior.

COPI-coated vesicles
Coat protein complex I-coated vesicles that mediate intra-Golgi and 
Golgi-to-endoplasmic reticulum retrograde transport.

Multipass transmembrane proteins
Proteins spanning the membrane more than once.

KDEL signals
A carboxy-terminal Lys-Asp-Glu-Leu (KDEL) retention sequence found 
on endoplasmic reticulum (ER)-resident proteins. The KDEL receptors 
recognizing this signal facilitate the retrograde movement of 
ER-based proteins from the Golgi and back to the ER by coat protein 
complex I (COPI) vesicles.

Sialylation
Modification by the addition of sialic acids, which are a large 
family of glycans derived from the neuraminic acid (Neu) 
monosaccharide with a nine-carbon backbone. In humans, 
N-acetylneuraminic acid (Neu5Ac) is the most common sialic acid, 
often found in the non-reducing terminal of glycoconjugates.

O-Glycan core structures
The initiating O-GalNAc glycan can be extended to form four different 
common core structures. Core1, Galβ1–3GalNAcα1–O-Ser/Thr; 
Core2, GlcNAcβ1–6(Galβ1–3)GalNAcα1–O-Ser/Thr; 
Core3, GlcNAcβ1–3GalNAcα1–O-Ser/Thr; and 
Core4, GlcNAcβ1–6(GlcNAcβ1–3)GalNAcα1–O-Ser/Thr. 
The core structures can be further elongated or branched.

Initiation of protein glycosylation. The initiation step for 
each type of protein glycosylation is distinct and regulated 
by one or more unique glycosyltransferases or, for 
N-glycosylation, by an oligosaccharyltransferase (OST) complex 
or, for GPI-anchored proteins, by a GPI–transamidase 
complex that transfers the preassembled GPI anchor to 
the C-terminus of select proteins in the ER53,54 (Table 1). 
A total of 47 of the 173 glycosyltransferases direct initiation 
steps of protein glycosylation. Initiation of N-glycosylation 
and likely POMT-directed O-mannosylation occur 
co-translationally.
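
The pathway counts quoted in this section can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (2 lipid pathways; 1 N-, 11 O-, 1 C-type and 1 GPI pathway; 47 of 173 glycosyltransferases directing initiation); the variable names and dictionary keys are labels chosen here for illustration.

```python
# Pathway counts as stated in the text; the keys are illustrative labels only.
protein_pathways = {
    "N-glycosylation": 1,
    "O-glycosylation": 11,
    "C-mannosylation": 1,
    "GPI anchoring": 1,
}
lipid_pathways = 2

n_protein = sum(protein_pathways.values())   # distinct types of protein glycosylation
n_total = n_protein + lipid_pathways         # all distinct glycosylation pathways

# Share of the 173 pathway glycosyltransferases that direct initiation steps.
initiation_share = 47 / 173

print(n_protein, n_total, f"{initiation_share:.1%}")
```

The sums reproduce the 14 protein glycosylation types and 16 pathways in total, and show that roughly a quarter of the glycosyltransferases act at the initiation step.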
N-glycosylation is initiated in the ER by the oligo-
saccharyltransferase (OST) complex assembled with 
STT3A or STT3B catalytic subunits for co-translational 
and post-translational glycosylation, respectively, and 
these subunits appear to provide some regulation of 
N-glycosites55,56. The OST–STT3A complex is asso-
ciated with the ER peptide translocon54, whereas the 
OST–STT3B complex includes MAGT1 or TUSC3 
oxidoreductase subunits57. The OST–STT3B complex 
appears to be the main source of released oligosac-
charides derived from deglycosylation of misfolded 
N-glycoproteins destined for proteasomal degradation 
and found widely in the cytosol56,58,59.
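
To make the canonical N-glycosylation sequon from Table 1 (N-X-S/T, X≠P) concrete, the following minimal sketch scans a protein sequence for candidate sites. It is an illustration only: the function name `find_sequons` and the toy sequence are hypothetical, the non-canonical motifs listed in Table 1 are ignored, and a matching sequon marks a candidate site, not a site that is necessarily occupied in a cell.

```python
import re

# Canonical sequon from Table 1: N-X-S/T, where X is any residue except proline.
# A zero-width lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq):
    """Return 1-based positions of Asn residues that sit in an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]

# Hypothetical toy sequence, not a real protein.
toy = "MKNGSAANPTLLNNTSW"
print(find_sequons(toy))  # [3, 13, 14]; the Asn in N-P-T at position 8 is skipped
```

The lookahead matters for stretches such as N-N-T-S, where two overlapping sequons share residues and a plain pattern would report only the first.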
GalNAc-type O-glycosylation of Ser/Thr and pos-
sibly Tyr is initiated in the Golgi by up to 20 poly-
peptide GalNAc transferase (GALNT) isoenzymes 
with distinct and partly overlapping specificities 
(of note, only 15 of those have so far been confirmed 
to be active enzymes25,60), which leads to the genera-
tion of the simple GalNAcα1–O-Ser/Thr monosac-
charide structure also known as the cancer-associated 
Tn antigen. GALNTs generally lack clear acceptor 
sequence motifs in target proteins, but they do exhibit 
differences in substrate specificities and orchestrate 
regulation of sites and patterns of O-glycans in pro-
teins in cooperative ways25,51. Some GALNTs initiate 
GalNAc transfer directly to peptides, whereas others 
only transfer to prior GalNAc-glycosylated peptides 
(designated follow-up glycosylation). Follow-up glyco-
sylation can occur within a short range (1–3 residues) 
or a long range (6–17 residues) from the initial glyco-
sylation site, and the latter is mediated by C-terminal 
GalNAc-binding lectin domains, a unique prop-
erty among metazoan glycosyltransferases51. The 
GalNAc-type O-glycoproteome is extensive, with 
more than 3,000 human O-glycoproteins and over 
15,000 identified O-glycosites61. Analysis of differen-
tial O-glycoproteomes of pairs of isogenic cells with 
disabled GALNT genes confirms that the expressed 
repertoire of GALNTs in a given cell determines 
its O-glycoproteome, with individual isoenzymes 
making distinct non-redundant contributions62–64. 
Some GALNTs, for example GALNT1 and GALNT2, 
have major contributions to the O-glycoproteome, 
whereas others serve specific proteins and functional 
roles (for example, GALNT11 serves specifically the 
low-density lipoprotein receptor (LDLR)-related recep-
tor family (LRPs) and activates their ligand binding 
properties)65. GalNAc-type O-glycosylation cross-talks 
with other post-translational modifications (examples 
include FAM20C Ser phosphorylation66,67, VLK Tyr 
phosphorylation68 and TPST1/2 Tyr sulfation69).
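
The distance ranges quoted above for follow-up glycosylation can be expressed as a simple rule. This is a sketch under stated assumptions: the function name `classify_follow_up` is hypothetical, distances are counted in residues between the initial GalNAc site and the new site, and separations that fall outside the two quoted ranges are reported as such instead of being forced into a category.

```python
def classify_follow_up(initial_site, new_site):
    """Classify a follow-up O-GalNAc site by its distance (in residues) from a prior site."""
    distance = abs(new_site - initial_site)
    if 1 <= distance <= 3:
        return "short-range"
    if 6 <= distance <= 17:
        # According to the text, this range depends on the C-terminal lectin domain.
        return "long-range"
    return "outside the quoted ranges"

print(classify_follow_up(100, 102))  # short-range
print(classify_follow_up(100, 110))  # long-range
print(classify_follow_up(100, 105))  # outside the quoted ranges
```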
Fuc, Glc and GlcNAc types of O-glycosylation are initi-
ated in the ER. The most prominent targets for these types 
of O-glycosylation are the NOTCH receptor epidermal 
growth factor (EGF)-like repeats (FIG. 1). NOTCH receptor 
O-glycosylation represents one of the most complex types 
of glycan-mediated regulation of receptor functions (see 
also Functions of glycosylation below)39,70–72. Initiation 
of Fuc-type O-glycosylation is directed by two POFUTs, 
wherein POFUT1 serves EGF repeats and POFUT2 
serves related thrombospondin type 1 repeats. Glc-type 
O-glycosylation of EGF repeats in NOTCH is differen-
tially regulated by the three POGLUTs: POGLUT1 has 
wide specificity for many NOTCH EGF repeats, whereas 
POGLUT2 and POGLUT3 have specificities for a single 
functionally important repeat (NOTCH1 EGF11 and 
NOTCH3 EGF10) and glycosylate at a different position73 
(Table 1). GlcNAc-type O-glycosylation of EGF repeats 
is regulated by EOGT74. All of these initiation enzymes 
require folded repeat domains for activity and acceptor 
sites have defined sequence motifs.
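
The defined sequence motifs for EGF repeats listed in Table 1 can be written as regular expressions, as in the rough sketch below. Several simplifications are assumed: the numbered cysteines (C1 to C6) are not verified, so any cysteines with the right spacing will match; the acceptor is taken to be the Ser/Thr position shown in each motif; and, because these enzymes require folded repeat domains, a sequence match identifies a candidate only. The names `EGF_MOTIFS` and `candidate_sites` and the toy sequence are hypothetical.

```python
import re

# Motifs transcribed from Table 1 as (lookahead pattern, offset of the acceptor residue).
# Cysteine numbering (C1..C6) is NOT checked here; this is a simplifying assumption.
EGF_MOTIFS = {
    "POFUT1 (O-Fuc)": (re.compile(r"(?=C.{4}[ST]C)"), 5),               # C2-X-X-X-X-(S/T)-C3
    "POGLUT1 (O-Glc)": (re.compile(r"(?=C.S.[AP]C)"), 2),               # C1-X-S-X-(A/P)-C2
    "EOGT (O-GlcNAc)": (re.compile(r"(?=C..[GPS][YFW][TS]G..C)"), 5),   # C5XX(G/P/S)(Y/F/W)(T/S)GXXC6
}

def candidate_sites(seq):
    """Return (enzyme, 1-based position, residue) for every motif match in seq."""
    hits = []
    for enzyme, (pattern, offset) in EGF_MOTIFS.items():
        for m in pattern.finditer(seq):
            pos = m.start() + offset
            hits.append((enzyme, pos + 1, seq[pos]))
    return hits

# Hypothetical toy sequence containing one instance of each motif.
toy = "GGCAAAATCGGCASAACGGCAAGYTGAAC"
for hit in candidate_sites(toy):
    print(hit)
```

On the toy sequence this reports one candidate per enzyme: Thr8 for POFUT1, Ser14 for POGLUT1 and Thr25 for EOGT.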
Man-type O-glycosylation initiates in the ER and 
involves at least three distinct types of initiation enzymes. 
A yeast-related O-mannosylation type is directed by the 
POMT1/2 heteromeric complex75. Interestingly, whereas 
the yeast Man O-glycoproteome is diverse and similar 
to the GalNAc-type O-glycoproteome76, the human 
POMT1/2 complex appears to selectively target the 
mucin-like region in α-dystroglycan and a very limited 
number of other proteins27. Two additional types of 
animal O-mannosylation were recently discovered. 
These are driven by four transmembrane O-Man trans-
ferase targeting cadherins (TMTCs), dedicated primar-
ily to modifying the cadherin superfamily, and an as yet 
unreported enzyme that selectively targets IPT domains 
found in plexins and in receptor tyrosine kinases c-MET 
and RON27 (Table 1).
Mannosyl moieties can also be attached to Trp resi-
dues (in the consensus WxxW motif; shown for throm-
bospondin type 1 repeats and type I cytokine receptors), 
known as C-mannosylation. This modification is driven 
by four dpy-19 like C-Man transferase (DPY19L) gly-
cosyltransferases and occurs in the ER, presumably 
co-translationally77.
Xyl-type O-glycosylation is characteristic of pro-
teoglycans and initiated in the Golgi by protein O-Xyl 
transferase 1 (XYLT1) and XYLT2 that both have a relatively 
defined sequence motif (Table 1). Xyl-type glycosylation 
at select Ser residues primes the biosynthesis of 
glycosaminoglycan chains (GAGs). XYLT1 and XYLT2 are 
differentially expressed and have overlapping substrate 
specificities78. The diversity of proteoglycans is limited, 
but a recent sensitive glycoproteomics strategy almost 
doubled the number of known proteoglycans79.
Oligosaccharyltransferase (OST) complex
A membrane protein complex in the endoplasmic reticulum that 
transfers an oligosaccharide from a dolichol pyrophosphate-activated 
donor to N-linked acceptor sequences on secreted proteins.

Translocon
A protein complex that mediates translocation of newly synthesized 
polypeptides from the cytosol across the endoplasmic reticulum membrane.

Oxidoreductase
An enzyme that catalyses thiol–disulphide exchange reactions. 
In vivo oxidoreductases are important in the oxidative protein folding 
that takes place in the endoplasmic reticulum. Well-known examples are 
PDI and ERp57.

Lectin domains
Carbohydrate-binding protein domains.

Sulfation
An enzymatic process that transfers a sulfo group to another molecule, 
for example a glycan, by modifying a hydroxyl group on a monosaccharide 
by addition of a sulfo group. Sulfotransferases catalyse the reaction 
using 3′-phospho-5′-adenylyl sulfate (PAPS) as a donor.

EGF-like repeats
Common motifs of 30–40 amino acids found in the extracellular domain of 
transmembrane proteins or in proteins known to be secreted. 
The epidermal growth factor (EGF)-like repeats include six 
conserved cysteines forming three disulfide bonds.

Thrombospondin type 1 repeats
(TSRs) Common protein motifs of 50–60 amino acids 
(6 conserved cysteines forming 3 disulfide bonds) found on 
transmembrane proteins and proteins in the extracellular matrix.

Mucin
A large viscous heavily O-glycosylated protein. 
Mucins are the most abundant macromolecules in biofluids and 
mucus, covering most epithelial surfaces in the body.

[Figure graphic: map of glycosylation along the secretory pathway. Labels recovered 
from the image: compartments (ribosome, ER, Golgi, TGN, lysosome, nucleus, cytoplasm, 
plasma membrane); numbered steps 1–7, including initiation, core extension, 
elongation and branching, and capping; transport (COPI, COPII, clathrin, adaptor 
proteins, retrograde transport, secretion, endocytosis, degradation); initiation 
enzymes (OST, GALNTs, XYLT1/2, POMTs, TMTCs, DPY19Ls, POFUT1/2, POGLUTs, 
COLGALT1/2, EOGT, OGT); other components (SIATs, NEUs, GNPTAB/G, SPPL3/BACE1, 
KDELR, M6PR or IGF2R, ACP2/ACP5, galectins, lysosomal enzyme, cell-surface 
glycoprotein); key (sugar nucleotide donors: UDP, GDP and CMP sugars; sugar 
nucleotide transporter; soluble enzyme with KDEL; Dol-P-Man/Glc; multipass 
transmembrane protein; single-pass Golgi resident; phosphate; Glc, Xyl, NeuAc, 
GlcNAc, Gal, GalNAc, Man, Fuc, GlcA, Rbo).]

Hydroxylysine (HYL)-Gal O-glycosylation is 
limited to collagens (important components of the 
extracellular matrix). HYL residues are generated by the 
activity of pro-collagen lysine hydroxylases (PLOD1–3), 
and these are subsequently glycosylated by COLGALT1/2 
isoenzymes. These events take place in the ER before 
formation of the mature collagen triple helix, which is 
secreted80. In addition to catalysing lysine hydroxyla-
tion, PLOD3 also harbours galactosyltransferase and 
glucosyltransferase activity, at least in vitro, suggesting 
that this enzyme may also be involved in generating the 
collagen-attached sugar chain81.
Finally, O-GlcNAcylation is a highly abundant 
glycosylation affecting most cytosolic and nuclear 
proteins that serves as a nutrient sensor and a master 
switch in regulating signalling, transcription and cellular 
metabolism4,82. O-GlcNAcylation in the cytosol and 
nucleus is directed by a single soluble O-GlcNAc 
transferase (OGT) that, in complex with an O-GlcNAc 
hydrolase (OGA; also known as MGEA5), dynamically 
regulates on/off O-GlcNAc modifications on proteins 
in concert with phosphorylation4. OGT contains 
amino-terminal tetratricopeptide repeats (TPRs) that 
mediate protein–protein interactions and orchestrate 
access to substrates83. 
The residues modified by O-GlcNAcylation are not fur-
ther glycosylated but pose particular analytic challenges, 
being labile and of low stoichiometry.
Further processing of protein glycosylation. The process-
ing of protein glycosylation involves sequential steps 
adding further monosaccharides to growing oligosac-
charide chains by glycosyltransferases, resulting in core 
extension, elongation and branching, and final capping 
of glycans (FIG. 3). The core extension refers to glycosyla-
tion steps and glycosyltransferases unique to particular 
types of protein glycosylation. For N-glycosylation, the 
initially attached preformed N-glycan oligosaccharide 
is trimmed by α-mannosidases in the ER, and sequen-
tial addition of β-GlcNAc residues by MGATs gen-
erates complex-type bi-antennary, tri-antennary and 
tetra-antennary N-glycan core structures. GalNAc-type 
O-glycosylation involves four distinct O-glycan core 
structures (Core1–Core4). POMT1/2-driven Man-type 
O-glycosylation involves three distinct core struc-
tures (Cores M1–M3). Interestingly, TMTC-initiated 
O-mannosylation does not appear to be processed 
and appears as the Man monosaccharide. Xyl-type 
O-glycosylation involves a common tetrasaccharide 
that is extended by either chondroitin sulfate or heparan 
sulfate polysaccharides.
The elongation and branching biosynthetic steps may 
be shared among different types of protein glycosylation, 
and the involved glycosyltransferases therefore function 
in multiple glycosylation pathways (FIG. 3). Elongation 
primarily involves N-acetyllactosamine (LacNAc 
type 2 chain Galβ1–4GlcNAc), often in the form of 
repeating disaccharides (polyLacNAc) and branches 
(Galβ1–4GlcNAcβ1–3(Galβ1–4GlcNAcβ1–6)Galβ1–4GlcNAc). 
The isomeric disaccharide type 1 LacNAc 
(Galβ1–3GlcNAc) and the N,N′-diacetyllactosamine 
(LacDiNAc, GalNAcβ1–4GlcNAc) disaccharide are 
also found as terminal disaccharides on a common 
scaffold of LacNAc glycans. Most of these elongation 
and branching reactions are shared with glycolipids. 
The capping step mainly involves terminal decoration 
of the oligosaccharide chains with Fuc and the sialic acid 
N-acetylneuraminic acid (Neu5Ac) and is directed 
by the large sialyltransferase and fucosyltransferase 
families9,84,85 (FIG. 3).
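As an illustration of the linkage notation used above, the following Python sketch assembles linear type 2 polyLacNAc strings from repeated Galβ1–4GlcNAc units joined through β1–3 linkages, as in the branched example given in the text. The function name `poly_lacnac` and the plain-string representation are assumptions made for this example and do not correspond to any established nomenclature tool.

```python
# Type 2 LacNAc disaccharide unit, written as in the text.
TYPE2_LACNAC = "Galβ1–4GlcNAc"


def poly_lacnac(n_repeats: int) -> str:
    """Return a linear polyLacNAc chain made of n_repeats type 2 units.

    Consecutive units are joined by a β1–3 linkage from GlcNAc to the
    next Gal, matching the GlcNAcβ1–3Gal linkage shown in the text.
    Branching (β1–6) is deliberately not modelled in this sketch.
    """
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return "β1–3".join([TYPE2_LACNAC] * n_repeats)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, poly_lacnac(n))
```

A single unit gives the type 2 LacNAc disaccharide itself; larger values of `n_repeats` give the repeating polyLacNAc backbone on which capping by Fuc and Neu5Ac can then occur.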
Processing steps may be specific for certain types 
of protein glycosylation (pathway specific) or shared 
among several types (pathway non-specific). Most 
pathway-specific enzymes do not have close paralogues, 
which implies that expression of these enzymes allows 
predictions to be made about the cellular glycosylation 
capacity and the glycan structures being produced (FIG. 3) 
(Supplementary Box 1). This concerns glycosyltrans-
ferases involved in initiation and most core extension 
steps of the different types of protein glycosylation, with 
a few notable exceptions: initiation of GalNAc-type 
α-Dystroglycan
Dystroglycan is encoded by the 
DAG1 gene and comprises two 
non-covalently-bound subunits 
(α and β). The extracellular 
α-subunit with the O-Man 
matriglycan provides binding to 
laminin, and the transmembrane 
β-subunit provides binding to 
dystrophin and the cytoskeleton.
IPT domains
(also known as TIG domains). 
Immunoglobulin–plexin–
transcription (IPT) protein 
domains found on cell- 
surface transmembrane 
receptors and intracellular 
transcription factors with 
an immunoglobulin-like fold.
Fig. 2 | Subcellular organization of protein glycosylation. The initiation steps of 
most types of protein glycosylation occur in the endoplasmic reticulum (ER) and involve 
transfer of a first monosaccharide to an amino acid (to Ser, Thr, Tyr, Trp), or in the case of 
N-glycosylation to a preformed oligosaccharide (to Asn). Two types of O-glycosylation 
(GalNAc-type and Xyl-type) are initiated in the Golgi. O-GlcNAcylation directed by 
O-GlcNAc transferase (OGT) occurs in the cytosol and nucleus. The glycosyltransferases 
directing these steps are indicated by their gene name (or for N-glycosylation, the oligo-
saccharyltransferase (OST) enzyme complex). Further glycosylation steps (core extension, 
elongation and capping) occur throughout the Golgi and trans-Golgi network (TGN). 
ER-resident glycosyltransferases directing initiation of protein glycosylation are 
transmembrane proteins (type III) or are equipped with a Lys-Asp-Glu-Leu (KDEL) retrieval 
signal recognized by the KDEL receptor (KDELR) associated with COPI vesicles (1). 
Glycosyltransferases in the Golgi are type II transmembrane proteins with luminal catalytic 
domains and short cytoplasmic carboxy-terminal sequences (2). Glycosyltransferases 
use dolichol (Dol)-linked donor substrates (Dol-P-Man, Dol-P-Glc or the lipid-linked 
oligosaccharide precursor) or activated sugar nucleotides transported from the cytosol 
into the ER and/or Golgi lumen by members of the SLC35 family of solute carriers (3). 
Although there are over 30 human members of the SLC35 family247, only 9 of these have 
been demonstrated to serve in sugar nucleotide transport and their specificities for 
donors are not fully clarified. The cisternal maturation model dictates that Golgi-resident 
glycosyltransferases are maintained and distributed across stacks by retrograde coat 
protein complex I (COPI) vesicular transport directed by motifs in short cytoplasmic tail 
sequences (4). Stem/transmembrane regions of glycosyltransferases can undergo 
proteolytic cleavage by, for example, signal peptide peptidase-like 3 (SPPL-3) or 
β-secretase 1 (BACE-1) proteases in the secretory pathway releasing catalytic domains 
to the extracellular milieu (5). N-Glycoproteins acquiring mannose-6-phosphate (M6P) 
in the early Golgi by action of the GlcNAc-1-phosphotransferase complex (complex of 
GNPTA, GNPTB and GNPTG, which catalyses formation of GlcNAc1-P-Man linkages) 
followed by the uncovering enzyme GlcNAc-1-phosphate hydrolase (NAGPA, which 
removes the GlcNAc residue leaving M6P) are recognized by the cation-dependent 
(M6PR) and cation-independent (insulin-like growth factor 2 receptor (IGF2R)) M6P 
receptors, and transported in clathrin-coated vesicles to the lysosome where mannose is 
dephosphorylated by lysosomal acid phosphatases (ACP2 and ACP5), and the receptors 
are recycled back to the Golgi (6). From the cell surface, glycosylated proteins can 
undergo endocytosis followed by recycling, degradation in lysosomes or retrograde 
transport to the TGN/Golgi (7). Neuraminidases (NEU1–4) may remove sialic acids 
previously attached during the capping step of glycosylation (sialylation), and such 
desialylated glycoproteins may be recognized by different glycan-binding proteins, 
such as galectins, and upon internalization undergo resialylation by sialyltransferases 
(SIATs) again in the TGN. CDP, cytidine diphosphate; CMP, cytidine monophosphate; 
COLGALT, collagen O-Gal transferase; DPY19L, dpy-19 like C-Man transferase; EOGT, 
epidermal growth factor-domain specific O-GlcNAc transferase; GALNT, polypeptide 
GalNAc-transferase; GDP, guanosine diphosphate; GMP, guanosine monophosphate; 
GNPTAB/G, GlcNAc-1-phosphotransferase; POFUT, protein O-fucosyltransferase; 
POGLUT, protein O-Glc transferase; POMT, protein O-Man transferase; TMTC, 
transmembrane O-Man transferase targeting cadherins (also transmembrane and 
tetra-trico-peptide repeat (TPR) repeat-containing protein); UDP, uridine diphosphate; 
UMP, uridine monophosphate; XYLT, protein O-Xyl transferase.
[Fig. 3 graphic: map of glycosyltransferase and sulfotransferase genes arranged by biosynthetic step (initiation, core extension, elongation and branching, capping, sulfation) across lipid glycosylation, N-glycosylation, secretory O-glycosylation, secretory C-mannosylation and nucleocytoplasmic O-glycosylation, with a key to the monosaccharide symbols.]
O-glycosylation is regulated by 20 GALNT isoenzymes; 
tri-antennary branching of N-glycans is regulated by 
MGAT4A and MGAT4B (note that the role of MGAT4C is 
unclear86); branching of GalNAc-type O-glycans to form 
Core2 or Core4 O-glycans is mediated by three isoen-
zymes (GCNT1, GCNT3 or GCNT4); extension of the 
POFUT1-mediated O-Fuc glycan is regulated by three 
isoenzymes (MFNG, LFNG or RFNG); and the impor-
tant steps determining chondroitin sulfate or heparan 
sulfate GAG biosynthesis on the tetrasaccharide linker 
are governed by CSGALNACT1 or CSGALNACT2 and 
by EXTL2 or EXTL3, respectively.
Glycosyltransferases considered pathway non- 
specific include 17 enzymes that are involved in elon-
gation steps, including B3GNTs, B4GALTs, B3GALTs 
and B4GALNTs, and 35 glycosyltransferases involved 
in capping steps, including FUTs, ST3GALs, ST6GALs, 
ST6GALNACs and ST8SIAs as well as B3GATs, A4GNT 
and ABO (FIG. 3). Arguably, these glycosylation pathway 
non-specific enzymes direct the greatest diversity in the 
glycome, although they also produce common struc-
tural scaffolds that may reduce this diversity in terms 
of distinct functional binding epitopes of the glycome9. 
Moreover, most of these glycosyltransferases belong to 
isoenzyme families with overlapping properties and 
poorly understood non-redundant functions, which 
at least partly hampers the ability to predict the glycan 
structures produced by analysis of enzyme expression 
(Supplementary Box 1).
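The idea that expression of pathway-specific initiating enzymes allows predictions about glycosylation capacity can be sketched in Python as follows. This is a deliberately simplified illustration: the grouping covers only some of the initiating enzymes named in the text, the rule that any one expressed gene is sufficient is an assumption, and the threshold and expression values are hypothetical. The labels GALNT1 to GALNT20 are shorthand for the 20 isoenzymes and may not match the gene symbols used in a given dataset.

```python
# Initiating enzymes named in the text, grouped by glycosylation type.
# GALNT1..GALNT20 is shorthand for the 20 GALNT isoenzymes (assumption:
# real datasets may use different symbols for some family members).
INITIATION_GENES = {
    "GalNAc-type O-glycosylation": {f"GALNT{i}" for i in range(1, 21)},
    "Xyl-type O-glycosylation": {"XYLT1", "XYLT2"},
    "O-Fuc glycosylation": {"POFUT1", "POFUT2"},
    "O-Glc glycosylation": {"POGLUT1", "POGLUT2", "POGLUT3"},
    "TMTC-initiated O-mannosylation": {"TMTC1", "TMTC2", "TMTC3", "TMTC4"},
    "Nucleocytoplasmic O-GlcNAcylation": {"OGT"},
}


def predicted_initiation_capacity(expression, threshold=1.0):
    """Return the glycosylation types a cell may be able to initiate.

    expression: dict mapping gene symbol -> expression value (units are
    whatever the caller uses; the default threshold is hypothetical).
    A type is reported when at least one of its initiating genes is
    expressed at or above the threshold (a simplifying assumption).
    """
    expressed = {gene for gene, value in expression.items() if value >= threshold}
    return sorted(
        pathway
        for pathway, genes in INITIATION_GENES.items()
        if genes & expressed
    )


if __name__ == "__main__":
    # Hypothetical expression profile for illustration only.
    profile = {"GALNT3": 5.0, "OGT": 2.0, "XYLT1": 0.1}
    print(predicted_initiation_capacity(profile))
```

The same approach would be much less informative for the pathway non-specific elongation and capping enzymes, because their isoenzyme families have overlapping properties, as noted above.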
Side-chain modifications of glycans. Sulfation is the most 
abundant and diverse glycan modification. Whereas 35 
Golgi sulfotransferases are involved in glycan sulfation, 
only two sulfotransferases (TPST1/2) direct tyrosine pro-
tein sulfation87. The majority of these sulfotransferases 
serve in decorating GAGs, and different sulfation pat-
terns produced on these large polysaccharides serve as 
distinct binding motifs for proteins and regulate a wide 
range of essential biological functions (FIG. 3). Whereas the 
biosynthesis and sulfation of GAGs are well understood, insight into 
the specific structures of biologically active GAG motifs 
and the sulfotransferase isoenzymes directing these is 
still limited88,89. A different type of GAG, keratan sulfate, 
is found on N-glycoproteins and O-glycoproteins and is 
built on polyLacNAc repeats. Keratan sulfate synthesis 
is initiated by 6-O-sulfation of GlcNAc carried out by 
CHST6/2 and subsequent galactosylation by B4GALT4 
and elongation by B3GNT7 with further 6-O-sulfation of 
Gal involving CHST1/3 (REF. 90). Some 14 sulfotransferases 
are predicted to direct sulfation of N-glycoproteins 
and O-glycoproteins, but the specific roles of these 
isoenzymes are only partly understood87 (FIG. 3).
Phosphorylation of glycans regulates glycosylation 
and serves as a gatekeeper for progression to elonga-
tion steps. For example, the extracellular kinase POMK 
phosphorylates the O-Man residue in α-dystroglycan 
to induce synthesis of the elaborate matriglycan — an 
extracellular matrix-binding motif on α-dystroglycan91,92 
(FIG. 3). FAM20B phosphorylates the Xyl residue in the 
forming tetrasaccharide linker, and this phosphorylation 
affects the third synthesis step directed by B3GALT6 
regulating GAG synthesis93,94.
Acetylation of sialic acids is an abundant modifica-
tion that regulates the interaction of sialoglycoproteins 
with cellular receptors such as the sialic acid-binding 
immunoglobulin-like lectins (Siglecs)95. Acetylation 
also conveys resistance to sialic acid removal by most 
sialidases7. Sialic acid acetylation occurs through 9-O 
or 7-O-acetylation of the activated Neu5Ac donor sugar 
nucleotide by CASD1, and incorporation of acetylated 
Neu5Ac into glycans by sialyltransferases in the Golgi96. 
Sialate 9-O-acetylesterases serve as NeuAc deacetylases97. 
Of interest, certain viral receptors bind 9-O-acetyl 
Neu5Ac (REF. 98). Thus, acetylation of sialic acids in glycans 
may function in regulating interactions with endogenous 
receptors, while being exploited by pathogens.
Context-specificity of glycosylation
The glycosylation of proteins, in general, reflects the 
glycosyltransferase repertoire and the glycosylation 
capacities of the producing cell. However, the individ-
ual protein may not be efficiently glycosylated, leading 
to heterogeneities, and certain glycosylation features are 
targeted to specific proteins and, hence, not universally 
found. Moreover, the secretory pathway and glycosylation 
machinery, in general, are influenced by numerous 
cellular and environmental factors that affect the 
glycosylation efficiency.
Heterogeneity in protein glycosylation. Different effi-
ciencies in the initiation of glycosylation at specific sites 
in proteins (stoichiometry of glycan attachment) and 
Fig. 3 | Human glycosylation pathways and enzymes. A global view of glycosylation 
pathways, the major structural elements of the glycans and the assigned (predicted) 
biosynthetic roles for glycosyltransferases as well as carbohydrate sulfotransferases 
(indicated are genes encoding these enzymes; please note that the enzymes that initiate 
the novel O-mannosylation specific for immunoglobulin-like, plexin, transcription factor 
(IPT) domains are currently not reported (question mark)). The 16 known glycosylation 
pathways are organized into major biosynthetic steps specific for pathways (initiation 
and core extension) and those that are non-specific (elongation and branching, and 
capping). For pathways involving isoenzymes, all isoforms are listed (with a dash or solidus 
character). The background colours for the different protein glycosylation pathways 
mimic the colours of the initiating monosaccharide (lipid glycosylation pathways are 
shown in grey), and these are maintained for the pathway-specific glycosylation steps. 
Dashed lines indicate alternative pathways for chondroitin sulfate (CS) and dermatan 
sulfate (DS) or heparan sulfate (HS) biosynthesis on the common tetrasaccharide linker. 
The display is useful to peruse individual genes as well as the glycosylation processes 
as an integrated system. Please note that the major structural variations characteristic 
of the glycosylation pathways are illustrated, but the diagram is not intended to show all 
structural permutations and variations found in cells (linkages only indicated for select 
structures). Supplementary Table 1 provides detailed information of the glycosyltrans-
ferase genes mapped. For information on transcriptional regulation of glycosylation 
enzymes shown and their association with congenital diseases and genome-wide 
association study traits, see Supplementary Figs 1 and 2 (note that these present the 
biosynthetic pathways in a vertical orientation as compared with horizontal orientation 
shown here and used before215). *Transferases that appear twice in the figure due to dual 
pathway-specificity. COLGALT, collagen O-Gal transferase; DPY19L, dpy-19 like C-Man 
transferase; EOGT, epidermal growth factor-domain specific O-GlcNAc transferase; 
ER, endoplasmic reticulum; GALNT, polypeptide GalNAc-transferase; GDP, guanosine 
diphosphate; Hyl, hydroxylysine; LacNAc, N-acetyllactosamine (Galβ1–4GlcNAc); 
LacDiNAc, N,N′-diacetyllactosamine (GalNAcβ1–4GlcNAc); LLO, lipid-linked 
oligosaccharide; NS, non-specific; OGT, O-GlcNAc transferase; OST, oligosaccharyl-
transferase; POFUT, protein O-fucosyltransferase; POGLUT, protein O-Glc transferase; 
POMT, protein O-Man transferase; QC, quality control; R, variable underlying core 
glycan; S, sulfation; TMTC, transmembrane O-Man transferase targeting cadherins 
(also transmembrane and tetra-trico-peptide repeat (TPR) repeat-containing protein); 
XYLT, protein O-Xyl transferase.
Glycosaminoglycan chains
(GAGs). Extended linear 
polysaccharides comprising 
repeating disaccharides.
Stoichiometry
The fraction of a glycosylation 
site in a glycoprotein that is 
occupied by a glycan.
variable processing of glycans at sites in proteins (struc-
tures of the glycan) will result in macroheterogeneity 
and microheterogeneity of glycosylation, respectively. 
Glycosylation may be influenced by the overall pro-
tein structure and conformational constraints around 
glycosites. For example, the presence of high-Man 
N-glycans at select sites in mature glycoproteins can 
correlate with steric hindrance of the action of manno-
sidases at these sites in the ER. Preferences for acquir-
ing bi-antennary or multi-antennary glycans regardless 
of the available glycosylation capacity are found; for 
example, generation of the conserved IgG1 biantennary 
N-glycan (Asn297) in the Fc region shows that glycan–
peptide backbone interactions and steric hindrance 
for processing enzymes result in inefficient branching, 
galactosylation and sialylation99–102. Assessing hetero-
geneity in glycosylation of specific proteins is inherently 
challenging in glycoproteomics and requires elabo-
rate site-specific analysis with isolated glycoproteins, 
but progress towards quantitative glycosite-specific 
glycoproteomics workflows is being made103.
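The distinction between macroheterogeneity and microheterogeneity can be made concrete with a small calculation. The Python sketch below, offered only as an illustration, takes hypothetical abundance values measured at one glycosite and returns the site occupancy (stoichiometry of glycan attachment) together with the relative distribution of glycan structures among the occupied copies. The function name, the `unoccupied` key and the example labels and numbers are assumptions and are not taken from any published workflow.

```python
def site_heterogeneity(abundances):
    """Summarize heterogeneity at a single glycosite.

    abundances: dict mapping a glycoform label -> measured abundance.
    The key "unoccupied" (an assumed convention) holds the abundance of
    the site without any glycan attached.

    Returns (occupancy, distribution):
      occupancy    - fraction of the site carrying a glycan
                     (macroheterogeneity)
      distribution - fraction of each glycan structure among occupied
                     copies (microheterogeneity)
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    occupied = {k: v for k, v in abundances.items() if k != "unoccupied"}
    occupied_total = sum(occupied.values())
    occupancy = occupied_total / total
    if occupied_total == 0:
        return occupancy, {}
    distribution = {k: v / occupied_total for k, v in occupied.items()}
    return occupancy, distribution


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = {"unoccupied": 20, "glycan_A": 40, "glycan_B": 30, "glycan_C": 10}
    occupancy, distribution = site_heterogeneity(example)
    print(f"occupancy: {occupancy:.2f}")
    for label, fraction in distribution.items():
        print(f"{label}: {fraction:.3f}")
```

In practice the abundances would come from site-specific analysis of isolated glycoproteins, and differences in detection efficiency between glycoforms would need to be accounted for before such fractions could be interpreted quantitatively.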
Protein-specific glycosylation features. Some gly-
cosylation features are only observed in a subset of 
cellular proteins. The most notable example here is 
tagging of lysosomal hydrolases (some ~60 different 
N-glycoproteins) with mannose-6-phosphate (Man-6-P) 
by the GlcNAc-1-phosphotransferase (GNPTAB–
GNPTG) and the uncovering enzyme (UCE)23 (FIG. 2), 
which serves as a ligand for the Man-6-P receptors in the 
trans-Golgi network to direct transport of these enzymes 
to the endo-lysosomal system104. Mechanistically, the 
GlcNAc-1-phosphotransferase appears to recognize 
conformational motifs in the diverse lysosomal enzyme 
proteins to select high-Man N-glycans105. Polysialylation 
(α2–8NeuAc) is another protein-specific modification 
found on select N-glycans of the neural cell adhesion 
molecule and a few other proteins that depends on interactions 
with protein motifs106,107. Polysialylation may also 
be found on O-glycans of Neuropilin-2 (REF. 108).
Glycosylation features derived from the extracellular 
milieu. The glycome of a cell may to some degree also 
depend on the glycosylation capacities of other cells 
through transfer of glycoconjugates and extracellular 
glycosylation reactions. For example, uptake of ABH 
blood group glycolipids from plasma lipoproteins to 
red blood cells109 and uptake of GPI-anchored glyco-
protein CD52 in sperm during maturation110 both serve to 
introduce new glycan structures. Glycosylation enzymes 
can also be secreted. Secreted ST6GAL1 can contri-
bute to extracellular sialylation of IgGs and remodel-
ling of cell-surface glycans111–114. Glycosyltransferases 
and glycoproteins can also be transferred between 
cells via extracellular vesicles and non-membranous 
nanoparticles (exomeres), having functional conse-
quences in recipient cells115. Finally, pathogenic bacte-
ria encode and can inject virulent glycosyltransferases 
that O-glycosylate and N-glycosylate host proteins and 
interfere with cellular functions, as well as glycoside 
hydrolases that reprogramme cell-surface glycans116–118.
Functions of glycosylation
Glycosylation has roles in folding, quality control, sta-
bility, transport and function of proteins7, and many 
of these functions serve the proteome globally. For 
example, in the ER, the initial steps of N-glycosylation 
guide the folding and quality control of proteins119. 
Fig. 4 | Protein glycosylation serves general roles and specific roles for 
protein functions. a | Specific roles of glycosylation in regulating protein 
functions. Intracellularly, in the nucleus and cytosol, dynamic 
O-GlcNAcylation regulates numerous cellular processes, including 
transcription and signalling, and serves in nutrient sensing. In the secretory 
pathway, endoplasmic reticulum (ER)-initiated types of glycosylation may 
serve in folding, stability and trafficking of proteins. Several types of 
glycosylation serve specific roles for select proteins driven by differentially 
regulated glycosylation at specific glycosites, for example, co-regulating 
processing (for activation or inactivation) by proprotein convertases 
(GalNAc-type). Within the glycocalyx at the cell surface (zoom-in box), 
specific glycosites serve to (from left to right): regulate the stability and/or 
activity of glycoproteins, by affecting their susceptibility to regulated 
proteolytic processing (for example, G protein-coupled receptor (GPCR) 
amino-terminal cleavage to regulate their signalling or entire ectodomain 
shedding of membrane proteins) (GalNAc-type); by providing contact 
points for cell adhesion (Poly-Sia and Man-type); by modulating the ligand 
specificity and/or affinity of receptors like low-density lipoprotein receptors 
(LDLRs) and NOTCH (GalNAc-type, Fuc-type and Glc-type); by modulating the 
effector function of immunoglobulins (N-glycans); or by regulating receptor 
dimerization and signalling (GalNAc-type). Glycans also serve in a myriad of 
interactions with endogenous glycan-binding proteins (GBPs) (intrinsic, left) 
and microbial GBPs (extrinsic, right). Intrinsic glycan interactions serve in 
cell–cell adhesion and communication, and are mediated by, for example, 
sialic acid-binding immunoglobulin-like lectins (Siglecs) binding to sialic 
acid capped glycans for immunomodulation; selectins binding to, for 
example, sialyl-Lewis a antigen for cell trafficking; and galectins binding to 
different β-Gal glycans for numerous functions, for example, cross-linking 
cell-surface glycoproteins by Galectin-3 pentamers in a dynamic lattice that 
regulates endocytosis and compartmentalization248. The large Siglec family 
promotes cell–cell interactions and regulates important functions in the 
immune system by engaging numerous sialylated glycan epitopes. Selectins 
(P-selectin, L-selectin and E-selectin) specifically require both sialic acid and 
fucose, and in some cases also sulfate, for binding. Extrinsic glycan 
interactions involve binding by bacterial lectins or adhesins mediating 
bacteria–host interactions, including initial attachment and potential 
invasion and colonization; for example, Helicobacter pylori encodes 
numerous adhesins that recognize different histo-blood group-related 
glycans in the human gastrointestinal tract mucosa249–251. Bacteria also 
produce glycoconjugates like lipopolysaccharides and glycan structures 
that mimic host glycans, allowing the bacteria to avoid recognition by host 
immune cells (mimicry). b | Overview of the common scaffolds and capping 
motifs, and the combinations of these that help generate diversity of 
glycans, which serve as the major ligands for GBPs (see FIG. 3 for enzymes 
involved). Core glycan structures (bottom left) are elongated with one of 
three types of chains (type 1 N-acetyllactosamine (LacNAc, Galβ1–3GlcNAc), 
type 2 LacNAc (Galβ1–4GlcNAc) or N,N′-Diacetyllactosamine (LacDiNAc, 
GalNAcβ1–4GlcNAc); labelled R1–R3, respectively). The core glycan 
structures can be extended by chains from R1–R3 (except for O-Man cores, 
which can only be extended by R2). Type 2 chains may be repeated to form 
poly-LacNAc chains (indicated by square brackets, ‘n’ indicating the number 
of repeat units). The terminal core structures may be variously fucosylated, 
sulfated and capped by sialic acids, as may residues along the poly-LacNAc 
chain, and the combinatorial action of relatively few transferases results in 
complex glycan structures with large diversity. To better indicate which 
residues have been added in a synthesis step, the residues added 
by previous steps are presented in grayscale. Pol II, polymerase II; X, any 
amino acid.
ABH
Carbohydrate antigens of the 
ABO blood group system.
Lipoproteins
Large complexes of lipids 
and proteins that transport 
lipids in the blood.
Lipopolysaccharide
A bacterial glycolipid endotoxin 
and a major constituent 
of the outer membrane of 
Gram-negative bacteria.
Further, O-glycosylation via O-Fuc and O-Glc of folded 
EGF-like and thrombospondin type 1 repeat domains 
stabilizes these after folding, enabling secretion71,120,121. 
Interestingly, other types of O-glycosylation do not 
appear to be required for protein transport and secre-
tion. On the cell surface, glycoconjugates constitute the 
glycocalyx that provides a barrier and a protective layer 
shielding the plasma membrane from physical stress 
and shaping the cell surface. The glycocalyx serves to 
mediate numerous intrinsic and extrinsic interactions. 
Excellent recent reviews cover these topics7–9, and here 
we focus on non-global roles of glycosylation, high-
lighting examples where specific glycosyltransferases 
regulate specific functions at the level of glycosites or 
glycostructures (FIG. 4a).
Functions determined at the glycosite level. The NOTCH 
receptor illustrates how multiple types of O-glycosylation — 
O-Fuc (directed by POFUT1), O-GlcNAc (directed by 
EOGT)122 and O-Glc (directed by the POGLUT1–3 
isoenzymes) — converge to modulate complex protein 
functions by decorating distinct glycosites in the many 
NOTCH EGF repeats39 (TABLE 1). With 20 isoenzymes, 
GalNAc-type O-glycosylation by GALNTs provides the 
greatest opportunity for differential regulation of specific 
sites in proteins, and GALNT isoenzymes serve highly 
specific co-regulatory roles in fundamental processes, 
including in the inhibition of protein processing by 
proprotein convertases35,123, in the inhibition (or, in some 
cases, activation) of proteolytic cleavage by ADAM pro-
teases with altered shedding of ectodomains of surface 
proteins124, in the activation of β1-adrenergic receptor 
by modulating its N-terminal cleavage125, in the modu-
lation of peptide hormone function and promoting their 
stability126,127, and in promoting ligand binding by LDLR 
and LRPs65 (FIG. 4a). These functions are subject to tight 
regulation by expression levels of specific GALNTs128, 
which may be controlled by feedback loops as shown 
for GALNT3: GALNT3 uniquely co-regulates propro-
tein convertase-mediated processing of FGF23 that 
serves to regulate phosphate homeostasis129, and in turn 
the expression of GALNT3 is responsive to blood phos-
phate levels130. Man-type O-glycosylation by TMTC1–4 
also provides for differential regulation of O-glycans on 
cadherins and protocadherins that may regulate their 
functions in cell adhesion131. Initiation of GAG synthesis 
by XYLT1 and XYLT2 also likely directs different subsets 
of proteoglycans and corresponding functions78.
The dynamic and widespread O-GlcNAcylation in 
the nucleus and cytosol regulated by the OGT and OGA 
pair of enzymes has a major role in nutrient sensing 
through the hexosamine biosynthetic pathway and modu-
lating the cellular response to stress4. O-GlcNAcylation 
at specific glycosites regulates functions of proteins, 
including their enzymatic activity, stability and locali-
zation, and the dynamic on/off nature of O-GlcNAc on 
glycosites enables O-GlcNAcylation to regulate tran-
scription and signalling events in crosstalk with protein 
phosphorylation4. O-GlcNAcylation regulates transcrip-
tion by modifying drivers of gene expression like Pol II, 
where C-terminal addition and removal of GlcNAc by 
OGT and OGA during the transcription cycle regulates 
transcription activation and repression. Site-specific 
O-GlcNAcylation also regulates important processes 
such as neuronal depolarization132 and intermediate fil-
ament morphology and cell migration133. The OGT sub-
strate selectivity is partly regulated through interactions 
between the TPR domains, adaptor proteins and target 
protein substrates83.
Functions regulated at the glycan structure level. There is 
abundant evidence that the structure of glycans on pro-
teins regulates diverse protein interactions and biological 
functions. For N-glycans, for example, tetra-antennary 
branching by MGAT5 favours synthesis of polyLacNAc 
chains that promote multivalent interactions with lectin 
receptors, for example galectins134–136. Functionally, 
MGAT5-mediated branching on α5β1 integrin serves to 
enhance cell motility and growth137. By contrast, MGAT3 
introduces a bisecting GlcNAc residue to the early stages 
of complex N-glycans that is not elongated and leads to 
inhibition of further N-glycan branching, and there-
fore is a major determining factor for the N-glycome 
architecture138. In line with this, MGAT3-directed gly-
cosylation of α5β1 integrin reduces its ability to bind to 
its ligands in the extracellular matrix and interferes with 
cell migration139. As another example, FUT8-directed 
fucosylation of the N-glycan core modulates several 
specific protein interactions, including EGF–EGFR 
binding140, interactions between the T cell receptor and 
the major histocompatibility complex II during T cell 
activation141, and IgG1 interactions with the FcγRIIIa 
receptor99,142 (FIG. 4a).
Protein function can also be modulated by specific 
GalNAc-type O-glycan core extensions. GCNT1- 
directed Core2 O-glycosylation promotes Galectin-
1-induced T cell death of immature T cell precursors and 
activated T cells143. GCNT3-directed Core2 O-glycans 
are also required for surface expression of intestinal 
cell differentiation markers144. The β3GlcNAc exten-
sion of O-Fuc on NOTCH regulates the interaction of 
the receptor with its ligands, Delta and Jagged, and the three 
isoenzymes, MFNG, LFNG and RFNG, directing the 
O-Fuc elongation appear to target distinct EGF repeats 
and determine differential binding between these two 
ligands71. Another example is assembly of the matri-
glycan on α-dystroglycan. In the ER, POMGNT2 primes 
matriglycan synthesis by elongating select O-Man gly-
cans on α-dystroglycan145, and in the Golgi, LARGE1/2 
produce the functional glycosaminoglycan-like 
matriglycan polymer28.
The common structural capping motifs on glycans 
found on several types of glycoproteins (FIG. 4b) comprise 
the major interactome for GBPs, including the endog-
enous lectin receptors galectins, Siglecs and selectins, 
and diverse microbial lectins and adhesins7,9,146 (FIG. 4). 
Common to most mammalian lectins are shallow 
binding sites and low affinity interactions (micromolar 
to millimolar) for single glycan epitopes, allowing for 
dynamic and reversible glycan–receptor associations. 
GBPs may also recognize more complex structures (clus-
tered saccharides or discontinuous glycans) involving a 
higher-order presentation of glycans that can be within 
a context of specific glycoconjugates and even specific 
proteins resulting in increased affinity towards their 
target glycans147. Glycan binding motifs are generally 
directed by multiple isoenzymes that are glycosylation 
pathway non-specific and highly regulated. However, 
these isoenzymes may have non-redundant activities 
and selectively regulate glycosylation: in megakaryo-
cytes, the B4GALT1-driven LacNAc epitope regulates 
β1 integrin activity required for the formation of mature 
platelets148; terminal sialylation of LacNAc or LacDiNAc 
N-glycans by ST6GAL1 was found to be important to 
regulate B cell activation via lectin Siglec-2 (CD22)95,149; 
and ST3GAL1-directed capping of Core1 O-glycans 
controls CD8+ T lymphocyte homeostasis by inhibiting 
apoptosis and regulating memory cell formation150,151.
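The low affinity of single glycan epitopes for lectins (micromolar to millimolar, as stated above) can be made concrete with a short calculation. The sketch below assumes a simple one-site equilibrium binding model, which is an illustrative simplification and not a model taken from the text; the function name and the example concentrations are hypothetical.

```python
def fractional_occupancy(ligand_conc_molar, kd_molar):
    """Fraction of binding sites occupied at equilibrium.

    Assumption: simple one-site binding, occupancy = [L] / (Kd + [L]).
    Multivalent or clustered glycan presentation is not modelled here.
    """
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (kd_molar + ligand_conc_molar)


if __name__ == "__main__":
    # Hypothetical glycan epitope concentration of 10 micromolar,
    # compared across dissociation constants from 1 micromolar to 1 millimolar.
    ligand = 10e-6
    for kd in (1e-6, 1e-4, 1e-3):
        print(f"Kd = {kd:.0e} M -> occupancy = {fractional_occupancy(ligand, kd):.3f}")
```

Under this simplified model, occupancy falls steeply as the dissociation constant moves into the millimolar range, which is consistent with the dynamic and reversible associations described above and with the gain in affinity attributed to higher-order glycan presentation.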
Changes of glycosylation in disease
Glycosylation processes in cells are highly sensitive to 
the physiological state, and glycans are prevalent report-
ers of disease152,153. An overwhelming number of studies 
using lectins, antibodies and direct structural analyses of 
glycan features have documented diverse changes in gly-
cosylation in human diseases and especially cancers153. 
However, our insight into how these changes in gly-
cosylation occur, what the functional consequences, 
if any, are and the nature of specific molecular mechanisms 
is still limited. Congenital disorders of glycosylation 
(CDGs) and genome-wide association studies 
(GWAS) are beginning to enable us to deconvolve highly 
specific roles of glycosylation in common diseases (for 
a global overview of the glycosylation steps and path-
ways served by glycosyltransferase genes known to 
cause CDGs and/or implicated by GWAS traits, see 
Supplementary Fig. 2).
Lessons from congenital disorders of glycosylation. 
The importance of glycosylation is clearly under-
scored by decades of studies of rare monogenic CDGs, 
where more than 60 of the 100 CDGs identified to 
date are caused by complete or partial loss of function 
of a glycosyltransferase17,18 (Supplementary Table 1). 
CDGs may be caused by mutations in glycosyltrans-
ferase genes in all types of glycosylation pathways, but 
currently most glycosyltransferase-related CDGs are 
caused by deficiency in unique enzymes functioning in 
pathway-specific glycosylation steps where loss of func-
tion results in global glycome changes (Supplementary 
Fig. 2a). These CDGs are generally characterized by 
multisystemic disorders, with a wide spectrum and 
severity of clinical manifestations predicted to arise 
from a myriad of impaired direct and indirect biologi-
cal functions of glycans, and these are often difficult to 
trace and dissect17,18.
CDGs were often identified by profiling N-glycans 
on an abundant serum glycoprotein (usually transfer-
rin), but today CDGs are also discovered by genome 
sequencing that is uncovering CDGs with more subtle 
phenotypes154. This has resulted in the discovery of novel 
CDGs caused by deficiencies in isoenzymes with subtle 
non-global glycosylation functions20,155, and these are 
pointing us to highly specific regulatory roles of protein 
glycosylation. The most illustrative examples so far are 
CDGs caused by loss of function within the GALNT 
isoenzymes, which have uncovered how site-specific 
O-glycosylation fine-tunes highly specific protein functions 
and regulates major physiological processes such as 
blood phosphate homeostasis (GALNT3) and likely also 
kidney function (GALNT11)129,156,157 (FIG. 5).
Glossary. Proprotein convertases: a family of seven secretory 
mammalian serine proteinases that post-translationally 
activate proproteins in the secretory pathway by limited 
proteolysis after multiple basic residues with a general 
recognition motif, (R/K)Xn(R/K). A prototypical proprotein 
convertase is furin.
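The general proprotein convertase recognition motif quoted in the glossary entry, (R/K)Xn(R/K), lends itself to a simple sequence scan. The following is a minimal sketch, assuming that n takes the values 0, 2, 4 or 6; the glossary entry gives only the general form, so this restriction, the function name and the example peptide are assumptions made for illustration. A match indicates only a candidate motif, not a validated cleavage site.

```python
import re

# Assumption: n is restricted to 0, 2, 4 or 6 spacer residues; the glossary
# entry itself only states the general form (R/K)Xn(R/K).
# The lookahead lets candidates that overlap each other be reported.
PC_MOTIF = re.compile(r"(?=([RK](?:[A-Z]{2}){0,3}[RK]))")


def find_pc_motifs(sequence):
    """Return (start, matched_text) pairs for candidate motifs.

    Start positions are 0-based. For each start position only the longest
    candidate is reported, because the spacer quantifier is greedy.
    """
    return [(m.start(), m.group(1)) for m in PC_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the pattern.
    for start, motif in find_pc_motifs("AARAKRSA"):
        print(start, motif)
```

A scan of this kind could be combined with O-glycosite positions to shortlist sites where glycosylation might lie close to a convertase motif, although any such shortlist would need experimental confirmation.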
[Figure 5: schematic comparing glycosylation pathways in healthy and cancer cells. 
Genes marked in the figure: GALNT3/T6↑, FUT8↑↓, MGAT3↓, MGAT5↑, ST6GALNAC1↑, 
C1GALT1C1↓, COSMC↓, GCNT1↓, GCNT3↓, B3GNT6↓, B3GALT5↓, FUT3↑, FUT6↑, ST6GAL1↑. 
Changes listed in the figure: reduced Core2 branching; reduced Core1 elongation; 
reduced Core3/4 structures; accumulation of Tn and STn; introduction of bisecting 
GlcNAcs; changes in branching; changes in core Fuc; switch from 2-3 to 2-6 sialic 
acid capping; increased SLex and SLea epitopes; changes in GAGs.]
Fig. 5 | Common dysregulated glycosyltransferase genes in cancer. 
Three major glycosylation pathways (N-glycosylation, GalNAc-type 
O-glycosylation and O-Xyl glycosaminoglycans) that undergo characteristic 
changes in cancer (left) and examples of the biological effects these 
may exert in cancer (right). Glycosylation pathways are simplified 
according to the scheme in FIG. 3, and the key glycosyltransferase genes 
undergoing altered expression in cancer are indicated (arrows indicate 
up/downregulation of expression). Examples of specific effects of altered 
glycosylation in cancer are illustrated. Adhesion and migration: increased 
branching of N-glycans (MGAT5↑, MGAT3↓) and core fucosylation (FUT8) on 
adhesion receptors (for example, E-cadherin, integrins) modulates both 
cell–cell adhesion252,253 and, alongside increased α2-6-sialic acid (α2-6-SA) 
capping (ST6GAL1), cell–extracellular matrix (ECM) adhesion through 
integrin receptors253. Mechanisms behind the changes in adhesiveness can 
be related to the abundance of receptors, conformational changes or 
presentation of binding epitopes to adhesive partners. These glycosylation 
changes are one of the drivers of epithelial–mesenchymal transition (EMT). 
Metastasis: alongside increased propensity for EMT, the metastatic potential 
of cells is increased by the upregulation of expression of sialyl-Lewisx (SLex) 
(FUT6 and ST6GAL1) and sialyl-Lewisa (SLea) (FUT3) glycan epitopes, which 
are recognized by selectins on epithelial and endothelial cells promoting 
tumour cell extravasation from vasculature and metastatic site homing196–198. 
Immune recognition: altered sialylation patterns on tumour cells allow for 
evasion from immune surveillance. This is mediated by the sialic acid-binding 
immunoglobulin-like lectin (Siglec) family of receptors, both inhibitory (for 
example, Siglec 7/9) and activating (for example, Siglec 15), which recognize 
different sialic acid ligands and modulate the signalling responses to tumour 
ligands resulting in immunosuppression254. Changes in sialylation may be 
attributed to cancer-associated changes in glycosyltransferase genes in the 
GalNAc-type O-glycosylation pathway, most notably the ST6GALNAC1 
gene that induces expression of the STn epitope185–187. Signalling and 
apoptosis: glycosylation can modulate the proximity of cell-surface 
receptors to each other, affecting the ability to directly interact or cluster, 
thereby affecting their signalling capability. For example, loss of a single 
GalNAc-type O-glycosylation site (unknown polypeptide GalNAc 
transferase (GALNT) isoform) on one of the cytokine receptors of 
granulocytes, colony-stimulating factor 3 receptor (CSF3R), is suggested to 
result in increased ligand-independent dimerization and aberrant 
signalling255. Similarly, increased N-linked branching and core fucosylation 
(FUT8) on epidermal growth factor receptor (EGFR) increase receptor 
dimerization and signalling promoting growth and progression of cancer181. 
By contrast, clustering of death receptors is reduced in cancers due to the 
truncated GalNAc-type O-glycans and modified N-glycosylation, which 
leads to tumour cells escaping apoptotic fates256. CS, chondroitin sulfate; 
GAG, glycosaminoglycan; NK cell, natural killer cell; NS, non-specific; 
S, sulfation; TGFβ, transforming growth factor-β.
Lessons from genome-wide association studies. Surveying 
glycosyltransferase genes implicated by GWAS for 
diverse traits or predispositions interestingly revealed 
that GWAS candidate genes represent almost a mirror of 
the glycosyltransferase genes so far identified as causing 
CDGs, and they concentrate around those found to 
exhibit high transcriptional regulation158 (Supplementary 
Fig. 2a). This suggests the somewhat controversial con-
clusion that dysregulation of glycosyltransferase iso-
enzymes, rather than loss of function of these enzymes, 
is involved in more common disease conditions, such as 
coronary disease, osteoporosis, chronic kidney disease 
and schizophrenia identifiable by GWAS158. This predic-
tion was first supported by studies of GALNT2 impli-
cated in regulation of high-density lipoprotein (HDL) 
and triglycerides important for cardiovascular health159. 
Individuals with disrupted GALNT2 show lower HDL 
levels155, which has been recapitulated in several animal 
models with loss of function of Galnt2 (REF. 160). The GWAS 
signal for GALNT2 is located close to a liver-specific 
regulatory element that induces differential allele- 
specific transcription161,162, supporting the idea that 
liver-specific dysregulation of GALNT2 is the cause 
of the altered HDL metabolism. GALNT2 serves 
non-redundant O-glycosylation on ANGPTL3 and phos-
pholipid transfer protein (PLTP), two proteins involved 
in regulating HDL. GALNT3 is a GWAS candidate for 
bone mineral density163, which is consistent with CDG 
loss of function causing hyperphosphataemia and familial 
tumoral calcinosis with ectopic bone formations129,156,164. 
GALNT11 is a GWAS candidate for chronic kidney decline, which 
likely relates to its role in directing O-glycosylation of 
the ligand binding regions of LDLR and LRPs, includ-
ing LRP2 (also known as megalin) that serves as the 
major endocytic receptor in the proximal tubules of 
the kidney65. GALNT11 O-glycosylation enhances 
ligand affinity of LDLR65, and a mouse Galnt11–/– model 
revealed an essential role of O-glycosylation in LRP2 
regulation and kidney function157.
Dysregulation of glycosyltransferases in cancer. Recent 
reviews detail the prevalent glycome changes that equip 
cancer cells with distinct glycan features required dur-
ing different stages of tumour growth and dissemination, 
including immune evasion and metastasis153,165–167 (FIG. 5). 
Given such common aberrations of glycosylation in can-
cer, it may be surprising that somatic mutations in genes 
controlling cellular glycosylation are extremely rare, and 
very few validated mutations in glycosyltransferases have 
been reported in cancer. Mutations in COSMC encoding 
a private chaperone required for C1GALT1 that directs 
GalNAc-type O-glycan Core1 elongation were reported 
in a few cervical cancers168,169, but studies in colorec-
tal and pancreatic cancers found hypermethylation of 
COSMC rather than somatic mutations to be the major 
reason for the O-glycan truncation that is a hallmark 
of epithelial cancers (see also below)169,170. Relatively 
rare heterozygous missense mutations in GALNT12 
were reported in colorectal cancers, and these were also 
shown to affect catalytic properties in recent structural 
analysis171. An analysis of data from The Cancer Genome 
Atlas (TCGA) reveals that cancer genes are more often 
mutated (for example, KRAS is mutated in 8% of can-
cers) than typical glycosyltransferases, which have pro-
tein coding mutations in only around 1% of cancers, 
underlining the rarity of somatic mutations deleterious 
to the glycosylation machinery.
In contrast to protein-coding somatic mutations, 
there is more evidence to support aberrant expression 
of glycosyltransferases in cancer153,172–175; for example, 
as seen with a hotspot of non-coding mutations around 
the ST6GAL1 gene in B cell non-Hodgkin lymphomas176. 
Still, only a few studies have provided compelling insights 
into the biosynthetic, structural and functional conse-
quences of misexpression of specific glycosyltransferases 
in cancer, and we limit our discussion to these (of note, 
we excluded discussion of studies using RNA silencing to 
validate involvement of specific glycosyltransferase genes 
as these generally require further validation due to ineffi-
ciencies in reduction of enzyme activities and off-target 
effects). The mechanisms behind altered expression 
may include epigenetic regulation169,177
