Global view of human protein glycosylation pathways and functions

Katrine T. Schjoldager, Yoshiki Narimatsu, Hiren J. Joshi ✉ and Henrik Clausen ✉

Reviews, Nature Reviews Molecular Cell Biology
https://doi.org/10.1038/s41580-020-00294-x

Center for Glycomics, Department of Cellular and Molecular Medicine, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.
✉e-mail: joshi@sund.ku.dk; hclau@sund.ku.dk

Abstract | Glycosylation is the most abundant and diverse form of post-translational modification of proteins that is common to all eukaryotic cells. Enzymatic glycosylation of proteins involves a complex metabolic network and different types of glycosylation pathways that orchestrate enormous amplification of the proteome in producing diversity of proteoforms and its biological functions. The tremendous structural diversity of glycans attached to proteins poses analytical challenges that limit exploration of specific functions of glycosylation. Major advances in quantitative transcriptomics, proteomics and nuclease-based gene editing are now opening new global ways to explore protein glycosylation through analysing and targeting enzymes involved in glycosylation processes. In silico models predicting cellular glycosylation capacities and glycosylation outcomes are emerging, and refined maps of the glycosylation pathways facilitate genetic approaches to address functions of the vast glycoproteome. These approaches apply commonly available cell biology tools, and we predict that use of (single-cell) transcriptomics, genetic screens, genetic engineering of cellular glycosylation capacities and custom design of glycoprotein therapeutics are advancements that will ignite wider integration of glycosylation in general cell biology.

Glycosylation of proteins is arguably the most diverse post-translational modification. Proteins are glycosylated by enzymes or through non-enzymatic glycation where glucose (aldehyde form) reacts with lysine and arginine residues in proteins, and undergo further changes that eventually lead to advanced glycation end products that serve important functions in ageing and disease, especially diabetes [1]. The enzymatic protein glycosylation processes (herein referred to as glycosylation) mostly involve sequential concerted steps in the endoplasmic reticulum (ER) and the Golgi system resulting in glycosylation of most (>85%) secretory proteins [2,3]. In addition, the majority of nuclear and cytoplasmic proteins undergo dynamic O-GlcNAcylation [4], perhaps making glycosylation the most abundant post-translational modification (even more abundant than phosphorylation) [5]. Protein glycosylation is a complex, multistep process that employs around 200 glycosyltransferase enzymes that determine which proteins are to become glycoproteins, the positions of glycans on those proteins and the glycan structures assembled (Table 1). Furthermore, the initial attachment of glycans may be differentially regulated in cells by expression of distinct glycosyltransferase isoenzymes, and thus some proteins may or may not become glycoproteins depending on their cell of origin and in response to functional needs. Glycosylation greatly amplifies the proteome by producing diverse proteoforms with different properties, thereby instructing myriad functions [6–8]. The ensemble of glycans found on glycoproteins, including glycosylphosphatidylinositol (GPI)-anchored proteins and proteoglycans, alongside glycosphingolipids and free oligosaccharides and polysaccharides constitutes the glycome of a cell, and the ensemble of glycoconjugates at the cell surface constitutes the glycocalyx (Fig. 1).

Glossary
O-GlcNAcylation: The enzymatic process directed by the N-acetyl-d-glucosamine (GlcNAc) glycosyltransferase (OGT) that transfers GlcNAc to proteins (Ser and Thr residues) occurring in the cytosol and nucleus of cells.
Isoenzymes: Enzymes that catalyse the same reactions but differ in amino acid sequence and often have partially distinct (non-redundant) functions.

The glycome is diverse, involving different types of glycoconjugates and oligosaccharides with varying compositions, sequences and linkages of sugar moieties. Nevertheless, despite this diversity, the glycoconjugates do share certain features, such as common structural scaffolds and terminal modifications [9]. The diversity in types of glycans and structures has expanded during evolution of eukaryotes likely in response to needs for increased molecular cues and regulation [10,11]. Protein glycosylation pathways are nearly identical across mammalian cells, although several glycan features were eliminated late in evolution by gene inactivation resulting in xenoantigens [12–14]. Studies of deficiencies in glycosylation enzymes in animal models and human diseases have advanced understanding of biological functions of protein glycosylation and demonstrated that most glycosyltransferases serve essential roles in mammalian physiology [15–18]. Glycosylation of proteins is thus integral to their function and should be considered in functional studies of the proteome. However, our understanding of specific structure–function relationships and roles of specific glycans on specific proteins is incomplete.
The glycome is produced and regulated by the glycosylation machinery in a single cell, yet analysis of glycans at the single-cell level is not possible with current glycomics methods, which are limited to probing with glycan-specific antibodies and glycan-binding proteins (GBPs, such as lectins). It is therefore often perceived as a daunting task to uncover and dissect specific biological functions of glycans and the underlying molecular mechanisms. Advances in next-generation sequencing and proteomics are beginning to provide single-cell transcriptomes and proteomes, which has opened the way for global analysis of the network of enzymes that orchestrate protein glycosylation and the assessment of the glycosylation capacities of any given cell. Accompanying these are the nuclease-based gene editing technologies that, through precise manipulation of glycosylation enzymes, provide virtually unlimited opportunities for engineering, exploration and custom design of cellular glycosylation capacities. We can now probe glycosylation systematically through a genetic entry point, and we foresee that with additional efforts we will be able to connect information on cellular glycosylation capacities with the actual outcome of the glycome and roles of glycosylation in cells.

Table 1 | Initiation steps for human protein glycosylation pathways

Type | Symbol | Linkage | Initiation enzymes | Sequence motifs (glycosylation sites in target proteins) | Specific domain
N-Glycosylation | β Asn | LLO/GlcNAcβ–Asn | OST complex (STT3A/B) | N-X-S/T, X≠P; N-X-C; N-G; N-X-V; X≠P | None
O-Glycosylation | α Ser/Thr/Tyr | GalNAcα–Ser/Thr (Tyr) | GALNT1–20 | Weak isoform-specific motifs [51] | None
O-Glycosylation | α Ser/Thr/Tyr | GalNAcα–Ser/Thr (Tyr) | GALNT11 | C6-X-X-X-T-C1 | LA
O-Glycosylation | α Ser/Thr | Fucα–Ser/Thr | POFUT1 | C2-X-X-X-X-(S/T)-C3 | EGF
O-Glycosylation | α Ser/Thr | Fucα–Ser/Thr | POFUT2 | C-X-X-(S/T)-C-X-X-G | TSR
O-Glycosylation | β Ser/Thr | GlcNAcβ–Ser/Thr | EOGT | C5XX(G/P/S)(Y/F/W)(T/S)GXXC6 | EGF
O-Glycosylation | α Ser/Thr | Manα–Ser/Thr | POMT1, POMT2 | ND | None
O-Glycosylation | α Ser/Thr | Manα–Ser/Thr | TMTC1–4 | ND | EC
O-Glycosylation | α Ser/Thr | Manα–Ser/Thr | Unknown | ND | IPT
O-Glycosylation | β Ser | Glcβ–Ser | POGLUT1 | C1-X-S-X-(A/P)-C2 | EGF
O-Glycosylation | β Ser | Glcβ–Ser | POGLUT2–3 | C3-X-N-T-X-G-S-(F/Y)-X-C4 | EGF
O-Glycosylation | β Ser | Xylβ–Ser | XYLT1, XYLT2 | a-a-a-a-G-S-G-a-(a/G)-a (a = D/E) | None
O-Glycosylation | β Hyl | Galβ–Hyl | COLGALT1–2 | Collagen repeats | Collagen
O-Glycosylation | β Ser/Thr | GlcNAcβ–Ser/Thr | OGT | None | None
O-Glycosylation | α Tyr | Glcα–Tyr | GYG | Y194 in glycogeninᵃ | NA
C-Mannosylation | α Trp | Manα–Trp | DPY19L1–4 | W-X-X-W | TSR
Glypiation (GPI anchor) | NA | Protein–C(O) EthNP–Man | Transamidase | Carboxy-terminal hydrophobic segment | None

The minimal protein–glycan linkages for the distinct human protein glycosylation pathways are shown with the monosaccharide linkage (including the anomeric α/β configuration of linkage) to amino acid residues in proteins (Figs 1 and 3), which include 11 types of O-glycosylation when the 2 O-Fuc, 3 O-Man and 2 O-Glc pathways are considered distinct. For references and details, please refer to the main text. COLGALT, collagen O-Gal transferase; DPY19L, dpy-19 like C-Man transferase; EC, extracellular cadherin immunoglobulin-like; EGF, epidermal growth factor-like repeat; EOGT, epidermal growth factor-domain specific O-GlcNAc transferase; EthNP, ethanolaminephosphate; Fuc, l-fucose; Gal, d-galactose; GalNAc, N-acetyl-d-galactosamine; GALNT, polypeptide GalNAc transferase; Glc, d-glucose; GlcNAc, N-acetyl-d-glucosamine; GPI, glycosylphosphatidylinositol; Hyl, hydroxylysine; IPT, immunoglobulin-like, plexin and transcription factor; LA, low-density lipoprotein receptor (LDLR) class A repeats; LLO, lipid-linked oligosaccharide; Man, d-mannose; NA, not applicable; ND, not determined; OGT, O-GlcNAc transferase; OST, oligosaccharyltransferase; POFUT, protein O-fucosyltransferase; POGLUT, protein O-Glc transferase; POMT, protein O-Man transferase; TMTC, transmembrane O-Man transferase targeting cadherin (also transmembrane and tetra-trico-peptide repeat (TPR) repeat-containing protein); TSR, thrombospondin type 1 repeat; unknown, transmembrane O-Man transferase to be reported; X, any of the 20 amino acids; Xyl, d-xylose; XYLT, protein O-Xyl transferase.
ᵃThe unique glycosylation of glycogenin (GYG) involves autoglucosylation of Tyr residue 194 (Glcα–Tyr) followed by formation of the glycogenin polymer, and is not further discussed in the text.

Glossary
Glycosylphosphatidylinositol (GPI)-anchored glycoproteins: A class of proteins that are attached to the membrane lipid bilayer via a carboxy-terminal glycolipid anchor consisting of phosphoethanolamine, an oligosaccharide core and phosphatidylinositol.
Proteoglycans: Proteins carrying one or more glycosaminoglycan chains attached covalently.
Glycocalyx: The cell coat comprising glycans and glycoconjugates surrounding animal cells found as an electron-dense layer by electron microscopy. It protects the cell from physical stress and mediates a plethora of macromolecular and cell–cell interactions.
Xenoantigens: Antigens found in multiple species that elicit antibodies in a species without the antigen after transplantation of tissues and organs. A major xenoantigen in porcine to human transplantation is the Galα1–3Galβ1–R glycan epitope (αGal).
Lectins: Proteins that bind to glycans. Major animal lectin families include Galectin, C-type, P-type and I-type lectins. Lectins, mostly from plants, with well-characterized binding specificities are frequently used as tools in glycobiology. Lectins are often multivalent with binding affinities in the low micromolar range and binding avidities approaching the nanomolar range for larger glycans with multiple epitopes.

Here, we provide an overview of the current knowledge of the human protein glycosylation pathways and the genes encoding the orchestrating enzymes. We also discuss the emerging genetic and systems biology approaches to the study of protein glycosylation and dissection of biological functions.
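The idea of assessing the glycosylation capacity of a cell from its transcriptome can be sketched in a few lines. The following is a minimal illustration only, not the authors' method: the gene-to-pathway map is a partial selection of initiation enzymes named in Table 1, and the function name `initiation_capacity`, the expression units and the threshold of 1.0 are all hypothetical assumptions. A pathway is called "available" when every gene of at least one alternative enzyme set is expressed, which ignores the other OST subunits, protein levels, localization and substrate availability.

```python
# Partial, illustrative map from glycosylation pathway to initiating enzymes
# (gene names taken from Table 1). Each pathway lists alternative gene sets;
# a set counts only if ALL of its genes are expressed (assumption).
INITIATION_ENZYMES = {
    "N-glycosylation": [{"STT3A"}, {"STT3B"}],      # catalytic OST subunits only
    "O-Man (POMT-directed)": [{"POMT1", "POMT2"}],  # heteromeric complex
    "O-Man (TMTC-directed)": [{"TMTC1"}, {"TMTC2"}, {"TMTC3"}, {"TMTC4"}],
    "O-Fuc (EGF repeats)": [{"POFUT1"}],
    "O-Fuc (TSR)": [{"POFUT2"}],
    "O-Xyl": [{"XYLT1"}, {"XYLT2"}],
    "O-GlcNAc (cytosol/nucleus)": [{"OGT"}],
    "C-mannosylation": [{"DPY19L1"}, {"DPY19L2"}, {"DPY19L3"}, {"DPY19L4"}],
}


def initiation_capacity(expression, threshold=1.0):
    """Return {pathway: bool} for a dict of gene -> expression level.

    `threshold` is a hypothetical cut-off in the same (arbitrary) units
    as the expression values.
    """
    expressed = {gene for gene, level in expression.items() if level >= threshold}
    return {
        pathway: any(genes <= expressed for genes in alternatives)
        for pathway, alternatives in INITIATION_ENZYMES.items()
    }


if __name__ == "__main__":
    # Made-up expression profile for demonstration.
    profile = {"STT3A": 50.0, "POMT1": 20.0, "POMT2": 0.1, "OGT": 30.0, "XYLT2": 5.0}
    for pathway, available in initiation_capacity(profile).items():
        print(f"{pathway}: {'yes' if available else 'no'}")
```

In this toy profile the POMT-directed pathway is reported as unavailable because only one partner of the heteromeric complex passes the threshold, whereas a single expressed isoenzyme is enough for pathways served by alternative enzymes.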
We are fully aware that the inherent space limitations necessitate some level of generalization with omission of numerous details (and referencing available literature). Our aim here is to provide a global view of how the human glycome is established and highlight our current understanding of its roles in physiology.

[Fig. 1 graphic not reproduced. Symbol key: ethanolamine phosphate; phosphate (P); phosphoinositol; d-glucose (Glc); N-acetyl-d-glucosamine (GlcNAc); d-galactose (Gal); N-acetyl-d-galactosamine (GalNAc); d-mannose (Man); N-acetylneuraminic acid (NeuAc); d-glucuronic acid (GlcA); d-glucosamine (GlcN); l-fucose (Fuc); d-xylose (Xyl); d-ribitol (Rbo); sulfation (S). Labels: O-GlcNAc; domain specific; nucleus and cytoplasm; proteoglycan; O-Xyl; plexin/IPT domain; cadherin EC domain; α-dystroglycan; collagen; Hyl–Gal; TSR domain; GPI anchor; N-linked; O-GalNAc; EGF repeats (for example, NOTCH); glycosphingolipid; plasma membrane; LDLR class A repeats; O-Man; O-Fuc/O-Glc/O-GlcNAc/C-Man.]

Fig. 1 | Main classes of glycoconjugates of the human cellular glycome. Depiction of the key components of the cellular glycome, highlighting types of glycosylation that are specific to distinct protein classes or protein domains. The glycans depicted are only illustrative examples of the glycan structures that can be synthesized by the different types of glycosylation pathways. N-glycans and most GalNAc-type O-glycans are widely found on most proteins trafficking the cellular secretory pathway, whereas the occurrence of domain-specific glycans is limited to specific protein domains. The enzymatic processes that orchestrate glycosylation of the different types of glycoconjugates are partly distinct and partly overlapping. The initial attachment of the first monosaccharide (or oligosaccharide for N-glycosylation) to proteins represents the key initiation step that determines which proteins and positions become glycosylated. These initiation steps are distinct for the different types of glycosylation pathways and, to a large extent, direct the structures of glycans that are generated. Some overlap in the later processing steps that involve elongation, branching and capping of oligosaccharides is found among N-glycosylation and several types of O-glycosylation as well as in glycosphingolipid biosynthesis (Fig. 3). The background colour scheme is organized according to the colour of the first monosaccharide attached to the core protein, except for glycosylphosphatidylinositol (GPI)-anchored proteins and glycolipids (shown in grey). The colouring scheme is useful for distinguishing the protein glycosylation pathways involved in the synthesis of different types of glycans [215] (see also Table 1). Glycan symbols are drawn according to the Symbol Nomenclature for Glycans (SNFG) format [246]. Sugar repeat units are indicated by square brackets with 'n' to indicate a number of possible repeats. EC, extracellular cadherin; EGF, epidermal growth factor; Hyl, hydroxylysine; IPT, immunoglobulin-like, plexin, transcription factor; LDLR, low-density lipoprotein receptor; NS, non-specific; TSR domain, thrombospondin type 1 repeat domain.

Basis of protein glycosylation

Glycans are not primary gene products and, in contrast to proteins, their synthesis occurs without a template. The human genome contains around 700 genes encoding enzymes, transporters and chaperones required for the cellular glycosylation machinery, glycan modifications and their degradation (corresponding to 3–4% of the genome) [19–22]. Over 200 of these genes encode glycosyltransferases, and 173 of these work sequentially to establish the complex patterns of sugars found on glycoproteins and lipids [20] (Table 1).
Tremendous efforts by a large number of laboratories allowed isolation, cloning, expression and characterization of these enzymes, thereby establishing insight into their substrate and kinetic properties and roles in glycosylation pathways (Supplementary Table 1) (for comprehensive reviews, see refs [16,23–26]). With complete genome sequences available, the repertoire of glycosyltransferase genes is well characterized, and most known glycosidic linkages established between sugar moieties are accounted for [20]. Nevertheless, new glycosyltransferase genes and even distinct glycosylation pathways are still being discovered [27–30].

Protein glycosylation takes place in the secretory pathway (ER and Golgi), nucleus, cytoplasm and mitochondria of all eukaryotic cells (Fig. 2). Ten monosaccharides, namely d-glucose (Glc), d-galactose (Gal), N-acetyl-d-glucosamine (GlcNAc), N-acetyl-d-galactosamine (GalNAc), l-fucose (Fuc), d-glucuronic acid (GlcA), d-mannose (Man), N-acetylneuraminic acid (Neu5Ac), d-xylose (Xyl) and d-ribose (Rib), derived from activated donor sugar nucleotides or dolichol (Dol)-linked donors are used to build the human glycome. Glycans are attached to proteins in four different ways: N-linked to asparagine (Asn), O-linked to the hydroxyl groups of serine (Ser), threonine (Thr) or tyrosine (Tyr), C-linked to tryptophan (Trp) and glypiation. There are several different O-linked sugars, including GalNAc, Fuc, GlcNAc, Man, Glc, Xyl and Gal (Table 1). Different types of protein glycosylation are broadly defined by the sugar–protein linkage, the initial monosaccharide linked to proteins and, for some types of O-glycosylation, by the enzymes directing the first step in protein glycosylation (Fig. 3).
Many types of protein glycosylation (Table 1) start in the ER, two start in the early Golgi (GalNAc-type and Xyl-type) and O-GlcNAcylation takes place in the cytoplasm and nucleus (GalNAc-type O-glycosylation has also been reported in the nucleus [31], but this result may be an experimental artefact resulting from glycosylation occurring during cellular fractionation). Most glycosyltransferases are type II transmembrane glycoproteins with ER and Golgi lumen-oriented catalytic domains that make use of activated sugar nucleotides (UDP-Glc/GlcNAc/Gal/GalNAc/Xyl/GlcA, GDP-Man/Fuc, CMP-NeuAc and CDP-ribitol) as donor substrates, and have a short amino-terminal segment required for retrograde transport from the Golgi to the ER via COPI-coated vesicles [32,33]. Type II glycosyltransferases are prone to proteolytic cleavage in stem regions, untethering their catalytic domains from the ER or Golgi membranes [34–36] and accounting for the presence of glycosyltransferases in body fluids [37]. Some ER-resident glycosyltransferases are multipass transmembrane proteins and utilize Dol-linked donor substrates (Dol-P-Man, Dol-P-Glc or the lipid-linked oligosaccharide precursor), whereas some glycosyltransferases are soluble ER-resident enzymes (protein O-fucosyltransferase 1/2 (POFUT1/2), protein O-Glc transferase 1–3 (POGLUT1–3), collagen O-Gal transferase 1/2 (COLGALT1/2) and epidermal growth factor-domain specific O-GlcNAc transferase (EOGT)) that are retained in the ER by C-terminal KDEL signals and use activated sugar nucleotides for glycosylation [38,39]. Note that throughout the article we refer to glycosyltransferases by their gene names, and when referring specifically to genes we use italics.
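The pairing of monosaccharides with their activating nucleotides listed above can be written down as a small lookup table. This is only a convenience sketch of the list given in the text; the names `DONOR_NUCLEOTIDE`, `donor_substrate` and `sugars_by_nucleotide` are hypothetical, and the Dol-linked donors (Dol-P-Man, Dol-P-Glc and the lipid-linked oligosaccharide) are deliberately left out.

```python
# Activating nucleotide for each donor sugar, as listed in the text:
# UDP-Glc/GlcNAc/Gal/GalNAc/Xyl/GlcA, GDP-Man/Fuc, CMP-NeuAc, CDP-ribitol.
DONOR_NUCLEOTIDE = {
    "Glc": "UDP",
    "GlcNAc": "UDP",
    "Gal": "UDP",
    "GalNAc": "UDP",
    "Xyl": "UDP",
    "GlcA": "UDP",
    "Man": "GDP",
    "Fuc": "GDP",
    "NeuAc": "CMP",
    "ribitol": "CDP",
}


def donor_substrate(sugar):
    """Return the activated sugar nucleotide name, e.g. 'UDP-Gal'."""
    return f"{DONOR_NUCLEOTIDE[sugar]}-{sugar}"


def sugars_by_nucleotide():
    """Group the sugars by their activating nucleotide."""
    groups = {}
    for sugar, nucleotide in DONOR_NUCLEOTIDE.items():
        groups.setdefault(nucleotide, []).append(sugar)
    return groups


if __name__ == "__main__":
    print(donor_substrate("Gal"))
    print(sugars_by_nucleotide())
```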
Structural elaboration of glycans by sequential addition of monosaccharides to extend, branch and cap growing oligosaccharides occurs largely in the Golgi; one exception has been reported in the protein O-Man transferase (POMT)-directed synthesis of O-Man glycans, with the first elongation step mediated by POMGNT2 occurring in the ER [40] (Fig. 3). Following anterograde transport of glycoproteins to the surface, further glycosylation (in particular, sialylation) during recycling of membrane glycoproteins can occur [41–43], and more extensive modifications including change of O-glycan core structures have been suggested [44–47].

Glycosylation is orchestrated mainly by the kinetic properties of glycosyltransferases and their compartmentalization in Golgi stacks, with a distribution related to sequential biosynthetic steps [19,20,22,48] (Fig. 2). Formation of multimeric (homomeric and heteromeric) enzyme complexes may contribute to the orchestration of these glycosylation steps [49]. Insight into the structures and catalytic mechanisms of glycosyltransferases reveals common structural scaffolds with distinct acceptor substrate specificities partly conferred by variable loop regions extending from the core catalytic unit [50,51]. It is further proposed that evolutionary diversification of the functions of glycosyltransferases involves mutations in the common core sugar nucleotide binding region and varying loop regions, which drive the divergence in donor sugars and acceptor substrate recognition, respectively [50–52]. Glycosyltransferases utilizing activated donor nucleotide sugars have high specificity for the nucleotide (although they may have some flexibility for the donor monosaccharide) and, in general, form only one type of glycosidic linkage structure [50].
The final glycosylation outcome is influenced by many other factors, including the availability of substrates and sugar nucleotides, competing glycosylation reactions, co-factors (such as Mn²⁺), intracellular transport, pH and actions of protein chaperones and glycosidases, as well as by general factors, for example stress, that may affect the normal cellular state.

Glossary
Dolichol (Dol): A polyisoprenol lipid that serves as an acceptor for the lipid-linked oligosaccharides in N-glycan biosynthesis.
Type II transmembrane glycoproteins: Single-pass transmembrane glycoproteins with the amino terminus oriented towards the cytosol and the carboxyl terminus facing the lumen of the secretory pathway or cell exterior.
COPI-coated vesicles: Coat protein complex I-coated vesicles that mediate intra-Golgi and Golgi-to-endoplasmic reticulum retrograde transport.
Multipass transmembrane proteins: Proteins spanning the membrane more than once.
KDEL signals: A carboxy-terminal Lys-Asp-Glu-Leu (KDEL) retention sequence found on endoplasmic reticulum (ER)-resident proteins. The KDEL receptors recognizing this signal facilitate the retrograde movement of ER-based proteins from the Golgi and back to the ER by coat protein complex I (COPI) vesicles.
Sialylation: Modification by the addition of sialic acids, which are a large family of glycans derived from the neuraminic acid (Neu) monosaccharide with a nine-carbon backbone. In humans, N-acetylneuraminic acid (Neu5Ac) is the most common sialic acid, often found in the non-reducing terminal of glycoconjugates.
O-Glycan core structures: The initiating O-GalNAc glycan can be extended to form four different common core structures. Core1, Galβ1–3GalNAcα1–O-Ser/Thr; Core2, GlcNAcβ1–6(Galβ1–3)GalNAcα1–O-Ser/Thr; Core3, GlcNAcβ1–3GalNAcα1–O-Ser/Thr; and Core4, GlcNAcβ1–6(GlcNAcβ1–3)GalNAcα1–O-Ser/Thr. The core structures can be further elongated or branched.

Human protein glycosylation pathways

The known glycome is generated through 16 distinct glycosylation pathways (distinguished on the basis of the sugar–protein linkage, the initial monosaccharide linked to proteins and the unique initiating enzymes), which are directed by at least 173 distinct glycosyltransferases (Fig. 3; see also Supplementary Fig. 1). These glycosylation pathways include, apart from 2 types of lipid glycosylation, 14 distinct types of protein glycosylation, including N-glycosylation, 11 types of O-glycosylation, C-mannosylation and generation of GPI-anchored proteins (Table 1). Protein glycosylation involves a series of sequential steps to build characteristic oligosaccharide structures, including an initiation step determining proteins to be glycosylated, immediate core extension steps with options for different core structures, elongation/branching steps that expand (and repeat) common structural motifs and capping steps that terminate oligosaccharide chains (Fig. 3).

Initiation of protein glycosylation. The initiation step for each type of protein glycosylation is distinct and regulated by one or more unique glycosyltransferases, or, for N-glycosylation, an oligosaccharyltransferase (OST) complex or, for GPI-anchored proteins, a GPI–transamidase complex that transfers the preassembled GPI anchor to the C-terminus of select proteins in the ER [53,54] (Table 1). A total of 47 of the 173 glycosyltransferases direct initiation steps of protein glycosylation. Initiation of N-glycosylation and likely POMT-directed O-mannosylation occur co-translationally.

N-glycosylation is initiated in the ER by the oligosaccharyltransferase (OST) complex assembled with STT3A or STT3B catalytic subunits for co-translational and post-translational glycosylation, respectively, and these subunits appear to provide some regulation of N-glycosites [55,56].
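Several of the sequence motifs listed in Table 1 are simple enough to scan for with regular expressions. The sketch below is illustrative only: it covers three motifs (N-X-S/T with X≠P for N-glycosylation, W-X-X-W for C-mannosylation and the acidic G-S-G motif for Xyl-type O-glycosylation), the names `MOTIFS` and `scan_motifs` are hypothetical, and the test sequence is made up. A hit marks a candidate site at most; whether a site is used also depends on the enzymes expressed, protein folding and domain context, none of which is modelled here.

```python
import re

# Regular-expression renderings of three Table 1 motifs (one-letter amino acid code).
MOTIFS = {
    "N-glycosylation": r"N[^P][ST]",      # N-X-S/T, X != P
    "C-mannosylation": r"W..W",           # W-X-X-W
    "O-Xyl": r"[DE]{4}GSG[DE][DEG][DE]",  # a-a-a-a-G-S-G-a-(a/G)-a, a = D/E
}


def scan_motifs(sequence):
    """Return {motif name: [(1-based start position, matched segment), ...]}.

    A zero-width lookahead is used so that overlapping matches are reported.
    """
    hits = {}
    for name, pattern in MOTIFS.items():
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]
    return hits


if __name__ == "__main__":
    # Made-up peptide containing one instance of each motif and one N-P-S decoy.
    for name, found in scan_motifs("MNGTNPSWAAWEEEEGSGEDE").items():
        print(name, found)
```

The decoy N-P-S in the example is skipped because proline is excluded at the X position, which is the only constraint on X stated in Table 1 for the canonical sequon.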
The OST–STT3A complex is associated with the ER peptide translocon [54], whereas the OST–STT3B complex includes MAGT1 or TUSC3 oxidoreductase subunits [57]. The OST–STT3B complex appears to be the main source of released oligosaccharides derived from deglycosylation of misfolded N-glycoproteins destined for proteasomal degradation and found widely in the cytosol [56,58,59].

GalNAc-type O-glycosylation of Ser/Thr and possibly Tyr is initiated in the Golgi by up to 20 polypeptide GalNAc transferase (GALNT) isoenzymes with distinct and partly overlapping specificities (of note, only 15 of those have so far been confirmed to be active enzymes [25,60]), which leads to the generation of the simple GalNAcα1–O-Ser/Thr monosaccharide structure also known as the cancer-associated Tn antigen. GALNTs generally lack clear acceptor sequence motifs in target proteins, but they do exhibit differences in substrate specificities and orchestrate regulation of sites and patterns of O-glycans in proteins in cooperative ways [25,51]. Some GALNTs initiate GalNAc transfer directly to peptides, whereas others only transfer to prior GalNAc-glycosylated peptides (designated follow-up glycosylation). Follow-up glycosylation can occur within a short range (1–3 residues) or a long range (6–17 residues) from the initial glycosylation site, and the latter is mediated by C-terminal GalNAc-binding lectin domains, a unique property among metazoan glycosyltransferases [51]. The GalNAc-type O-glycoproteome is extensive, with more than 3,000 human O-glycoproteins and over 15,000 identified O-glycosites [61]. Analysis of differential O-glycoproteomes of pairs of isogenic cells with disabled GALNT genes confirms that the expressed repertoire of GALNTs in a given cell determines its O-glycoproteome, with individual isoenzymes making distinct non-redundant contributions [62–64].
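The short-range and long-range windows quoted above for follow-up glycosylation can be made concrete with a small helper. This is a toy sketch under stated assumptions: distances are counted in residues in either direction from the initial site, only Ser and Thr are treated as acceptors, and the function name `follow_up_candidates` and the example peptide are hypothetical. Real GALNT isoenzymes differ in the direction and preference of follow-up glycosylation, which is not modelled.

```python
def follow_up_candidates(sequence, primed_position):
    """List Ser/Thr residues within follow-up range of an existing GalNAc site.

    sequence: peptide in one-letter code.
    primed_position: 1-based position of the residue already carrying GalNAc.
    Returns 1-based positions in the short (1-3 residues) and long
    (6-17 residues) ranges given in the text.
    """
    short_range, long_range = [], []
    for position, residue in enumerate(sequence, start=1):
        if residue not in "ST" or position == primed_position:
            continue
        distance = abs(position - primed_position)
        if 1 <= distance <= 3:
            short_range.append(position)
        elif 6 <= distance <= 17:
            long_range.append(position)
    return {"short_range": short_range, "long_range": long_range}


if __name__ == "__main__":
    # Made-up peptide; Thr3 is assumed to carry the initial GalNAc.
    print(follow_up_candidates("APTSAPGSTAPPAHGVTS", primed_position=3))
```

Residues at a distance of 4 or 5 from the initial site fall in neither window, so Ser8 in the example is not reported.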
Some GALNTs, for example GALNT1 and GALNT2, have major contributions to the O-glycoproteome, whereas others serve specific proteins and functional roles (for example, GALNT11 serves specifically the low-density lipoprotein receptor (LDLR)-related receptor family (LRPs) and activates their ligand binding properties) [65]. GalNAc-type O-glycosylation cross-talks with other post-translational modifications (examples include FAM20C Ser phosphorylation [66,67], VLK Tyr phosphorylation [68] and TPST1/2 Tyr sulfation [69]).

Fuc, Glc and GlcNAc types of O-glycosylation are initiated in the ER. The most prominent targets for these types of O-glycosylation are the NOTCH receptor epidermal growth factor (EGF)-like repeats (Fig. 1). NOTCH receptor O-glycosylation represents one of the most complex types of glycan-mediated regulation of receptor functions (see also Functions of glycosylation below) [39,70–72]. Initiation of Fuc-type O-glycosylation is directed by two POFUTs, wherein POFUT1 serves EGF repeats and POFUT2 serves related thrombospondin type 1 repeats. Glc-type O-glycosylation of EGF repeats in NOTCH is differentially regulated by the three POGLUTs: POGLUT1 has wide specificity for many NOTCH EGF repeats, whereas POGLUT2 and POGLUT3 have specificities for a single functionally important repeat (NOTCH1 EGF11 and NOTCH3 EGF10) and glycosylate at a different position [73] (Table 1). GlcNAc-type O-glycosylation of EGF repeats is regulated by EOGT [74]. All of these initiation enzymes require folded repeat domains for activity and acceptor sites have defined sequence motifs.

Man-type O-glycosylation initiates in the ER and involves at least three distinct types of initiation enzymes. A yeast-related O-mannosylation type is directed by the POMT1/2 heteromeric complex [75].
Interestingly, whereas the yeast Man O-glycoproteome is diverse and similar to the GalNAc-type O-glycoproteome [76], the human POMT1/2 complex appears to selectively target the mucin-like region in α-dystroglycan and a very limited number of other proteins [27]. Two additional types of animal O-mannosylation were recently discovered. These are driven by four transmembrane O-Man transferase targeting cadherins (TMTCs), dedicated primarily to modifying the cadherin superfamily, and an as yet unreported enzyme that selectively targets IPT domains found in plexins and in receptor tyrosine kinases c-MET and RON [27] (Table 1).

Mannosyl moieties can also be attached to Trp residues (in the consensus WxxW motif; shown for thrombospondin type 1 repeats and type I cytokine receptors), known as C-mannosylation. This modification is driven by four dpy-19 like C-Man transferase (DPY19L) glycosyltransferases and occurs in the ER, presumably co-translationally [77].

Xyl-type O-glycosylation is characteristic of proteoglycans and initiated in the Golgi by protein O-Xyl transferase 1 (XYLT1) and XYLT2 that both have a relatively defined sequence motif (Table 1). Xyl-type glycosylation at select Ser residues primes the biosynthesis of glycosaminoglycan chains (GAGs). XYLT1 and XYLT2 are differentially expressed and have overlapping substrate specificities [78]. The diversity of proteoglycans is limited, but a recent sensitive glycoproteomics strategy almost doubled the number of known proteoglycans [79].

Glossary
Oligosaccharyltransferase (OST) complex: A membrane protein complex in the endoplasmic reticulum that transfers an oligosaccharide from a dolichol pyrophosphate-activated donor to N-linked acceptor sequences on secreted proteins.
Translocon: A protein complex that mediates translocation of newly synthesized polypeptides from the cytosol across the endoplasmic reticulum membrane.
Oxidoreductase: An enzyme that catalyses thiol–disulphide exchange reactions. In vivo oxidoreductases are important in the oxidative protein folding that takes place in the endoplasmic reticulum. Well-known examples are PDI and ERp57.
Lectin domains: Carbohydrate-binding protein domains.
Sulfation: An enzymatic process that transfers a sulfo group to another molecule, for example a glycan, by modifying a hydroxyl group on a monosaccharide by addition of a sulfo group. Sulfotransferases catalyse the reaction using 3′-phospho-5′-adenylyl sulfate (PAPS) as a donor.
EGF-like repeats: Common motifs of 30–40 amino acids found in the extracellular domain of transmembrane proteins or in proteins known to be secreted. The epidermal growth factor (EGF)-like repeats include six conserved cysteines forming three disulfide bonds.
Thrombospondin type 1 repeats (TSRs): Common protein motifs of 50–60 amino acids (6 conserved cysteines forming 3 disulfide bonds) found on transmembrane proteins and proteins in the extracellular matrix.
Mucin: A large viscous heavily O-glycosylated protein. Mucins are the most abundant macromolecules in biofluids and mucus, covering most epithelial surfaces in the body.

[Fig. 2 graphic not reproduced. Compartments and processes labelled: ribosome, ER, Golgi, TGN, lysosome, nucleus, cytoplasm, plasma membrane, cell-surface glycoprotein, secretion, endocytosis, retrograde transport, degradation; numbered labels 1–7. Biosynthetic steps labelled: initiation, core extension, elongation and branching, capping. Proteins labelled: OST, POMTs, TMTCs, DPY19Ls, POFUT1/2, POGLUTs, COLGALT1/2, EOGT, GALNTs, XYLT1/2, OGT, SIATs, NEUs, galectins, GNPTAB/G, M6PR or IGF2R, KDELR, SPPL3/BACE1, ACP2/ACP5, clathrin, COPI, COPII, adaptor proteins, lysosomal enzyme. Key: sugar nucleotide donors (UDP, GDP and CMP sugars) and sugar nucleotide transporters; Dol-P-Man/Glc; soluble enzyme with KDEL; multipass transmembrane protein; single-pass Golgi resident; phosphate; Glc, Xyl, NeuAc, GlcNAc, Gal, GalNAc, Man, Fuc, GlcA and Rbo symbols.]

Hydroxylysine (HYL)-Gal O-glycosylation is limited to collagens (important components of the extracellular matrix). HYL residues are generated by the activity of pro-collagen lysine hydroxylases (PLOD1–3), and these are subsequently glycosylated by COLGALT1/2 isoenzymes. These events take place in the ER before formation of the mature collagen triple helix, which is secreted [80]. In addition to catalysing lysine hydroxylation, PLOD3 also harbours galactosyltransferase and glucosyltransferase activity, at least in vitro, suggesting that this enzyme may also be involved in generating the collagen-attached sugar chain [81].

Finally, O-GlcNAcylation is a highly abundant glycosylation affecting most cytosolic and nuclear proteins that serves as a nutrient sensor and a master switch in regulating signalling, transcription and cellular metabolism [4,82].
O-GlcNAcylation in the cytosol and nucleus is directed by a single soluble O-GlcNAc transferase (OGT) glycosyltransferase that in complex with an O-GlcNAc hydrolase (OGA; also known as MGEA5) serves to dynamically regulate on/off O-GlcNAc modifications on proteins in concert with phosphorylations4. OGT contains amino-terminal tetra-trico-peptide repeats (TPRs) that mediate protein–protein interactions and orchestrate access to substrates83. The residues modified by O-GlcNAcylation are not further glycosylated but pose particular analytic challenges, being labile and of low stoichiometry.
Further processing of protein glycosylation. The processing of protein glycosylation involves sequential steps in which glycosyltransferases add further monosaccharides to growing oligosaccharide chains, resulting in core extension, elongation and branching, and final capping of glycans (FIG. 3). Core extension refers to glycosylation steps and glycosyltransferases unique to particular types of protein glycosylation. For N-glycosylation, the initially attached preformed N-glycan oligosaccharide is trimmed by α-mannosidases in the ER, and sequential addition of β-GlcNAc residues by MGATs generates complex-type bi-antennary, tri-antennary and tetra-antennary N-glycan core structures. GalNAc-type O-glycosylation involves four distinct O-glycan core structures (Core1–Core4). POMT1/2-driven Man-type O-glycosylation involves three distinct core structures (Cores M1–M3). Interestingly, TMTC-initiated O-mannosylation does not appear to be processed and appears as the Man monosaccharide. Xyl-type O-glycosylation involves a common tetrasaccharide that is extended by either chondroitin sulfate or heparan sulfate polysaccharides. The elongation and branching biosynthetic steps may be shared among different types of protein glycosylation, and the involved glycosyltransferases therefore function in multiple glycosylation pathways (FIG. 3).
Elongation primarily involves N-acetyllactosamine (LacNAc type 2 chain, Galβ1–4GlcNAc), often in the form of repeating disaccharides (polyLacNAc) and branches (Galβ1–4GlcNAcβ1–3(Galβ1–4GlcNAcβ1–6)Galβ1–4GlcNAc). The isomeric disaccharide type 1 LacNAc (Galβ1–3GlcNAc) and the N,N′-Diacetyllactosamine (LacDiNAc, GalNAcβ1–4GlcNAc) disaccharide are also found as terminal disaccharides on a common scaffold of LacNAc glycans. Most of these elongation and branching reactions are shared with glycolipids. The capping step mainly involves terminal decoration of the oligosaccharide chains with Fuc and the sialic acid N-acetylneuraminic acid (Neu5Ac) and is directed by the large sialyltransferase and fucosyltransferase families9,84,85 (FIG. 3).
Processing steps may be specific for certain types of protein glycosylation (pathway specific) or shared among several types (pathway non-specific). Most pathway-specific enzymes do not have close paralogues, which implies that expression of these enzymes allows predictions to be made about the cellular glycosylation capacity and the glycan structures being produced (FIG. 3) (Supplementary Box 1). This concerns glycosyltransferases involved in initiation and most core extension steps of the different types of protein glycosylation, with a few notable exceptions: initiation of GalNAc-type
α-Dystroglycan: Dystroglycan is encoded by the DAG1 gene and comprises two non-covalently bound subunits (α and β). The extracellular α-subunit with the O-Man matriglycan provides binding to laminin, and the transmembrane β-subunit provides binding to dystrophin and the cytoskeleton.
IPT domains (also known as TIG domains): Immunoglobulin–plexin–transcription (IPT) protein domains with an immunoglobulin-like fold, found on cell-surface transmembrane receptors and intracellular transcription factors.
Fig. 2 | Subcellular organization of protein glycosylation.
The initiation steps of most types of protein glycosylation occur in the endoplasmic reticulum (ER) and involve transfer of a first monosaccharide to an amino acid (Ser, Thr, Tyr or Trp) or, in the case of N-glycosylation, transfer of a preformed oligosaccharide (to Asn). Two types of O-glycosylation (GalNAc-type and Xyl-type) are initiated in the Golgi. O-GlcNAcylation directed by O-GlcNAc transferase (OGT) occurs in the cytosol and nucleus. The glycosyltransferases directing these steps are indicated by their gene name (or for N-glycosylation, the oligosaccharyltransferase (OST) enzyme complex). Further glycosylation steps (core extension, elongation and capping) occur throughout the Golgi and trans-Golgi network (TGN). ER-resident glycosyltransferases directing initiation of protein glycosylation are transmembrane proteins (type III) or are equipped with a Lys-Asp-Glu-Leu (KDEL) retrieval signal recognized by the KDEL receptor (KDELR) associated with COPI vesicles (1). Glycosyltransferases in the Golgi are type II transmembrane proteins with luminal catalytic domains and short cytoplasmic amino-terminal sequences (2). Glycosyltransferases use dolichol (Dol)-linked donor substrates (Dol-P-Man, Dol-P-Glc or the lipid-linked oligosaccharide precursor) or activated sugar nucleotides transported from the cytosol into the ER and/or Golgi lumen by members of the SLC35 family of solute carriers (3). Although there are over 30 human members of the SLC35 family247, only 9 of these have been demonstrated to serve in sugar nucleotide transport and their specificities for donors are not fully clarified. The cisternal maturation model dictates that Golgi-resident glycosyltransferases are maintained and distributed across stacks by retrograde coat protein complex I (COPI) vesicular transport directed by motifs in short cytoplasmic tail sequences (4).
Stem/transmembrane regions of glycosyltransferases can undergo proteolytic cleavage by, for example, signal peptide peptidase-like 3 (SPPL3) or β-secretase 1 (BACE1) proteases in the secretory pathway, releasing catalytic domains to the extracellular milieu (5). N-Glycoproteins that acquire mannose-6-phosphate (M6P) in the early Golgi by action of the GlcNAc-1-phosphotransferase complex (complex of GNPTA, GNPTB and GNPTG, which catalyses formation of GlcNAc1-P-Man linkages) followed by the uncovering enzyme GlcNAc-1-phosphate hydrolase (NAGPA, which removes the GlcNAc residue leaving M6P) are recognized by the cation-dependent (M6PR) and cation-independent (insulin-like growth factor 2 receptor (IGF2R)) M6P receptors. They are transported in clathrin-coated vesicles to the lysosome, where mannose is dephosphorylated by lysosomal acid phosphatases (ACP2 and ACP5), and the receptors are recycled back to the Golgi (6). From the cell surface, glycosylated proteins can undergo endocytosis followed by recycling, degradation in lysosomes or retrograde transport to the TGN/Golgi (7). Neuraminidases (NEU1–4) may remove sialic acids previously attached during the capping step of glycosylation (sialylation), and such desialylated glycoproteins may be recognized by different glycan-binding proteins, such as galectins, and upon internalization undergo resialylation by sialyltransferases (SIATs) in the TGN.
CDP, cytidine diphosphate; CMP, cytidine monophosphate; COLGALT, collagen O-Gal transferase; DPY19L, dpy-19 like C-Man transferase; EOGT, epidermal growth factor-domain specific O-GlcNAc transferase; GALNT, polypeptide GalNAc-transferase; GDP, guanosine diphosphate; GMP, guanosine monophosphate; GNPTAB/G, GlcNAc-1-phosphotransferase; POFUT, protein O-fucosyltransferase; POGLUT, protein O-Glc transferase; POMT, protein O-Man transferase; TMTC, transmembrane O-Man transferase targeting cadherins (also known as transmembrane and tetra-trico-peptide repeat (TPR)-containing protein); UDP, uridine diphosphate; UMP, uridine monophosphate; XYLT, protein O-Xyl transferase.
[Fig. 3 graphic: map of human glycosylation pathways showing the genes assigned to each biosynthetic step, organized as initiation, N-linked LLO precursor biosynthesis, core extension, elongation and branching, capping, and sulfation, with a symbol key for monosaccharides, linkages and sulfation.]
O-glycosylation is regulated by 20 GALNT isoenzymes; tri-antennary branching of N-glycans is regulated by MGAT4A and MGAT4B (note that the role of MGAT4C is unclear86); branching of GalNAc-type O-glycans to form Core2 or Core4 O-glycans is mediated by three isoenzymes (GCNT1, GCNT3 or GCNT4); extension of the POFUT1-mediated O-Fuc glycan is regulated by three isoenzymes (MFNG, LFNG or RFNG); and the important steps determining chondroitin sulfate or heparan sulfate GAG biosynthesis on the tetrasaccharide linker are governed by CSGALNACT1 or CSGALNACT2 and by EXTL2 or EXTL3, respectively.
Glycosyltransferases considered pathway non-specific include 17 enzymes that are involved in elongation steps, including B3GNTs, B4GALTs, B3GALTs and B4GALNTs, and 35 glycosyltransferases involved in capping steps, including FUTs, ST3GALs, ST6GALs, ST6GALNACs and ST8SIAs as well as B3GATs, A4GNT and ABO (FIG. 3).
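The idea that expression of pathway-specific enzymes hints at a cell's glycosylation capacity can be illustrated with a minimal Python sketch. The lookup table below only covers the isoenzyme groups named in the text as exceptions; the step labels, the function names `predicted_capacity` and `feasible_steps`, and the use of GALNT1 to GALNT20 as gene symbols (following the figure labelling rather than official nomenclature) are assumptions made for illustration, and real predictions would need expression levels, not mere presence.

```python
# Hypothetical lookup: isoenzyme groups named in the text, keyed by the step they control.
# Step labels are illustrative; gene symbols follow the labels used in the figure.
STEP_ISOENZYMES = {
    "GalNAc-type O-glycosylation initiation": {f"GALNT{i}" for i in range(1, 21)},
    "N-glycan tri-antennary branching": {"MGAT4A", "MGAT4B"},
    "Core2/Core4 O-glycan branching": {"GCNT1", "GCNT3", "GCNT4"},
    "O-Fuc glycan extension": {"MFNG", "LFNG", "RFNG"},
    "Chondroitin sulfate commitment": {"CSGALNACT1", "CSGALNACT2"},
    "Heparan sulfate commitment": {"EXTL2", "EXTL3"},
}


def predicted_capacity(expressed_genes):
    """Map each step to the expressed isoenzymes that could carry it out."""
    expressed = set(expressed_genes)
    return {
        step: sorted(isoenzymes & expressed)
        for step, isoenzymes in STEP_ISOENZYMES.items()
    }


def feasible_steps(expressed_genes):
    """List the steps for which at least one isoenzyme is expressed."""
    capacity = predicted_capacity(expressed_genes)
    return [step for step, hits in capacity.items() if hits]


if __name__ == "__main__":
    # Made-up expression profile, for demonstration only.
    profile = ["GALNT3", "MGAT4A", "EXTL3", "ACTB"]
    for step in feasible_steps(profile):
        print(step, "->", predicted_capacity(profile)[step])
```

Because several isoenzymes can satisfy the same step, presence of any one of them marks the step as feasible, which mirrors why isoenzyme families make structure prediction from expression data less certain.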
Arguably, these glycosylation pathway non-specific enzymes direct the greatest diversity in the glycome, although they also produce common structural scaffolds that may reduce this diversity in terms of distinct functional binding epitopes of the glycome9. Moreover, most of these glycosyltransferases belong to isoenzyme families with overlapping properties and poorly understood non-redundant functions, which at least partly hampers the ability to predict the glycan structures produced by analysis of enzyme expression (Supplementary Box 1).
Side-chain modifications of glycans. Sulfation is the most abundant and diverse glycan modification. Whereas 35 Golgi sulfotransferases are involved in glycan sulfation, only two sulfotransferases (TPST1/2) direct tyrosine protein sulfation87. The majority of these sulfotransferases serve in decorating GAGs, and different sulfation patterns produced on these large polysaccharides serve as distinct binding motifs for proteins and regulate a wide range of essential biological roles (FIG. 3). Whereas the biosynthesis and sulfation of GAGs is well understood, insight into the specific structures of biologically active GAG motifs and the sulfotransferase isoenzymes directing these is still limited88,89. A different type of GAG, keratan sulfate, is found on N-glycoproteins and O-glycoproteins and is built on polyLacNAc repeats. Keratan sulfate synthesis is initiated by 6-O-sulfation of GlcNAc carried out by CHST6/2 and subsequent galactosylation by B4GALT4 and elongation by B3GNT7 with further 6-O-sulfation of Gal involving CHST1/3 (ref. 90). Some 14 sulfotransferases are predicted to direct sulfation of N-glycoproteins and O-glycoproteins, but the specific roles of these isoenzymes are only partly understood87 (FIG. 3).
Phosphorylation of glycans regulates glycosylation and serves as a gatekeeper for progression to elongation steps.
For example, the extracellular kinase POMK phosphorylates the O-Man residue in α-dystroglycan to induce synthesis of the elaborate matriglycan, an extracellular matrix-binding motif on α-dystroglycan91,92 (FIG. 3). FAM20B phosphorylates the Xyl residue in the forming tetrasaccharide linker, and this phosphorylation affects the third synthesis step directed by B3GALT6, regulating GAG synthesis93,94.
Acetylation of sialic acids is an abundant modification that regulates the interaction of sialoglycoproteins with cellular receptors such as the sialic acid-binding immunoglobulin-like lectins (Siglecs)95. Acetylation also conveys resistance to sialic acid removal by most sialidases7. Sialic acid acetylation occurs through 9-O-acetylation or 7-O-acetylation of the activated Neu5Ac donor sugar nucleotide by CASD1, and incorporation of acetylated Neu5Ac into glycans by sialyltransferases in the Golgi96. Sialate 9-O-acetylesterases serve as NeuAc deacetylases97. Of interest, certain viral receptors bind 9-O-acetyl Neu5Ac (ref. 98). Thus, acetylation of sialic acids in glycans may function in regulating interactions with endogenous receptors, while being exploited by pathogens.
Context-specificity of glycosylation
The glycosylation of proteins, in general, reflects the glycosyltransferase repertoire and the glycosylation capacities of the producing cell. However, the individual protein may not be efficiently glycosylated, leading to heterogeneities, and certain glycosylation features are targeted to specific proteins and, hence, not universally found. Moreover, the secretory pathway and glycosylation machinery, in general, are influenced by numerous cellular and environmental factors that affect the glycosylation efficiency.
Heterogeneity in protein glycosylation. Different efficiencies in the initiation of glycosylation at specific sites in proteins (stoichiometry of glycan attachment) and
Fig. 3 | Human glycosylation pathways and enzymes.
A global view of glycosylation pathways, the major structural elements of the glycans and the assigned (predicted) biosynthetic roles for glycosyltransferases as well as carbohydrate sulfotransferases (indicated are genes encoding these enzymes; please note that the enzymes that initiate the novel O-mannosylation specific for immunoglobulin-like, plexin, transcription factor (IPT) domains are currently not reported (question mark)). The 16 known glycosylation pathways are organized into major biosynthetic steps specific for pathways (initiation and core extension) and those that are non-specific (elongation and branching, and capping). For pathways involving isoenzymes, all isoforms are listed (with a dash or solidus character). The background colours for the different protein glycosylation pathways mimic the colours of the initiating monosaccharide (lipid glycosylation pathways are shown in grey), and these are maintained for the pathway-specific glycosylation steps. Dashed lines indicate alternative pathways for chondroitin sulfate (CS) and dermatan sulfate (DS) or heparan sulfate (HS) biosynthesis on the common tetrasaccharide linker. The display is useful to peruse individual genes as well as the glycosylation processes as an integrated system. Please note that the major structural variations characteristic of the glycosylation pathways are illustrated, but the diagram is not intended to show all structural permutations and variations found in cells (linkages only indicated for select structures). Supplementary Table 1 provides detailed information of the glycosyltransferase genes mapped. For information on transcriptional regulation of glycosylation enzymes shown and their association with congenital diseases and genome-wide association study traits, see Supplementary Figs 1 and 2 (note that these present the biosynthetic pathways in a vertical orientation as compared with horizontal orientation shown here and used before215).
*Transferases that appear twice in the figure due to dual pathway-specificity. COLGALT, collagen O-Gal transferase; DPY19L, dpy-19 like C-Man transferase; EOGT, epidermal growth factor-domain specific O-GlcNAc transferase; ER, endoplasmic reticulum; GALNT, polypeptide GalNAc-transferase; GDP, guanosine diphosphate; Hyl, hydroxylysine; LacNAc, N-acetyllactosamine (Galβ1–4GlcNAc); LacDiNAc, N,N′-Diacetyllactosamine (GalNAcβ1–4GlcNAc); LLO, lipid-linked oligosaccharide; NS, non-specific; OGT, O-GlcNAc transferase; OST, oligosaccharyltransferase; POFUT, protein O-fucosyltransferase; POGLUT, protein O-Glc transferase; POMT, protein O-Man transferase; QC, quality control; R, variable underlying core glycan; S, sulfation; TMTC, transmembrane O-Man transferase targeting cadherins (also known as transmembrane and tetra-trico-peptide repeat (TPR)-containing protein); XYLT, protein O-Xyl transferase.
Glycosaminoglycan chains (GAGs): Extended linear polysaccharides comprising repeating disaccharides.
Stoichiometry: The fraction of a glycosylation site in a glycoprotein that is occupied by a glycan.
variable processing of glycans at sites in proteins (structures of the glycan) will result in macroheterogeneity and microheterogeneity of glycosylation, respectively. Glycosylation may be influenced by the overall protein structure and conformational constraints around glycosites. For example, the presence of high-Man N-glycans at select sites in mature glycoproteins can correlate with steric hindrance of the action of mannosidases at these sites in the ER.
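The distinction between macroheterogeneity (whether a site is occupied) and microheterogeneity (which glycan occupies it) can be made concrete with a small Python sketch. It assumes a hypothetical list of per-molecule observations for one glycosite, where `None` marks an unoccupied copy; the function names and the example values are illustrative only and do not come from any dataset in this review.

```python
from collections import Counter


def site_occupancy(observations):
    """Macroheterogeneity: fraction of copies of the site that carry any glycan."""
    if not observations:
        raise ValueError("at least one observation is required")
    occupied = sum(1 for glycan in observations if glycan is not None)
    return occupied / len(observations)


def glycoform_distribution(observations):
    """Microheterogeneity: relative abundance of each glycan among occupied copies."""
    occupied = [glycan for glycan in observations if glycan is not None]
    if not occupied:
        return {}
    counts = Counter(occupied)
    return {glycan: n / len(occupied) for glycan, n in counts.items()}


if __name__ == "__main__":
    # Made-up observations for a single glycosite (None = no glycan attached).
    example_site = [
        "high-Man", "bi-antennary", None, "bi-antennary",
        None, "bi-antennary", "high-Man", "bi-antennary",
    ]
    print("occupancy:", site_occupancy(example_site))
    print("glycoforms:", glycoform_distribution(example_site))
```

Keeping the two measures separate reflects the definitions above: occupancy is computed over all copies of the site, whereas the glycoform distribution is normalized only over the occupied copies.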
Some sites preferentially acquire bi-antennary or multi-antennary glycans regardless of the available glycosylation capacity; for example, generation of the conserved IgG1 biantennary N-glycan (Asn297) in the Fc region shows that glycan–peptide backbone interactions and steric hindrance for processing enzymes result in inefficient branching, galactosylation and sialylation99–102. Assessing heterogeneity in glycosylation of specific proteins is inherently challenging in glycoproteomics and requires elaborate site-specific analysis with isolated glycoproteins, but progress towards quantitative glycosite-specific glycoproteomics workflows is being made103.
Protein-specific glycosylation features. Some glycosylation features are only observed in a subset of cellular proteins. The most notable example is tagging of lysosomal hydrolases (about 60 different N-glycoproteins) with mannose-6-phosphate (Man-6-P) by the GlcNAc-1-phosphotransferase (GNPTAB–GNPTG) and the uncovering enzyme (UCE)23 (FIG. 2), which serves as a ligand for the Man-6-P receptors in the trans-Golgi network to direct transport of these enzymes to the endo-lysosomal system104. Mechanistically, the GlcNAc-1-phosphotransferase appears to recognize conformational motifs in the diverse lysosomal enzyme proteins to select high-Man N-glycans105. Polysialylation (α2–8NeuAc) is another protein-specific modification, found on select N-glycans of the neural cell adhesion molecule and a few other proteins, that depends on interactions with protein motifs106,107. Polysialylation may also be found on O-glycans of Neuropilin-2 (ref. 108).
Glycosylation features derived from the extracellular milieu. The glycome of a cell may to some degree also depend on the glycosylation capacities of other cells through transfer of glycoconjugates and extracellular glycosylation reactions.
For example, uptake of ABH blood group glycolipids from plasma lipoproteins to red blood cells109 and uptake of GPI-anchored glycoprotein CD52 in sperm during maturation110 both serve to introduce new glycan structures. Glycosylation enzymes can also be secreted. Secreted ST6GAL1 can contribute to extracellular sialylation of IgGs and remodelling of cell-surface glycans111–114. Glycosyltransferases and glycoproteins can also be transferred between cells via extracellular vesicles and non-membranous nanoparticles (exomeres), having functional consequences in recipient cells115. Finally, pathogenic bacteria encode and can inject virulent glycosyltransferases that O-glycosylate and N-glycosylate host proteins and interfere with cellular functions, as well as glycoside hydrolases that reprogramme cell-surface glycans116–118.
Functions of glycosylation
Glycosylation has roles in folding, quality control, stability, transport and function of proteins7, and many of these functions serve the proteome globally. For example, in the ER, the initial steps of N-glycosylation guide protein folding and their quality control119.
Fig. 4 | Protein glycosylation serves general roles and specific roles for protein functions. a | Specific roles of glycosylation in regulating protein functions. Intracellularly, in the nucleus and cytosol, dynamic O-GlcNAcylation regulates numerous cellular processes, including transcription and signalling, and serves in nutrient sensing. In the secretory pathway, endoplasmic reticulum (ER)-initiated types of glycosylation may serve in folding, stability and trafficking of proteins. Several types of glycosylation serve specific roles for select proteins driven by differentially regulated glycosylation at specific glycosites, for example, co-regulating processing (for activation or inactivation) by proprotein convertases (GalNAc-type).
Within the glycocalyx at the cell surface (zoom-in box), specific glycosites serve to (from left to right): regulate the stability and/or activity of glycoproteins by affecting their susceptibility to regulated proteolytic processing (for example, G protein-coupled receptor (GPCR) amino-terminal cleavage to regulate their signalling or entire ectodomain shedding of membrane proteins) (GalNAc-type); provide contact points for cell adhesion (Poly-Sia and Man-type); modulate the ligand specificity and/or affinity of receptors like low-density lipoprotein receptors (LDLRs) and NOTCH (GalNAc-type, Fuc-type and Glc-type); modulate the effector function of immunoglobulins (N-glycans); or regulate receptor dimerization and signalling (GalNAc-type). Glycans also serve in a myriad of interactions with endogenous glycan-binding proteins (GBPs) (intrinsic, left) and microbial GBPs (extrinsic, right). Intrinsic glycan interactions serve in cell–cell adhesion and communication, and are mediated by, for example, sialic acid-binding immunoglobulin-like lectins (Siglecs) binding to sialic acid-capped glycans for immunomodulation; selectins binding to, for example, sialyl-Lewis a antigen for cell trafficking; and galectins binding to different β-Gal glycans for numerous functions, for example, cross-linking cell-surface glycoproteins by Galectin-3 pentamers in a dynamic lattice that regulates endocytosis and compartmentalization248. The large Siglec family promotes cell–cell interactions and regulates important functions in the immune system by engaging numerous sialylated glycan epitopes. Selectins (P-selectin, L-selectin and E-selectin) specifically require both sialic acid and fucose, and in some cases also sulfate, for binding.
Extrinsic glycan interactions involve binding by bacterial lectins or adhesins mediating bacteria–host interactions, including initial attachment and potential invasion and colonization; for example, Helicobacter pylori encodes numerous adhesins that recognize different histo-blood group-related glycans in the human gastrointestinal tract mucosa249–251. Bacteria also produce glycoconjugates like lipopolysaccharides and glycan structures that mimic host glycans, allowing the bacteria to avoid recognition by host immune cells (mimicry). b | Overview of the common scaffolds and capping motifs, and the combinations of these that help generate diversity of glycans, which serve as the major ligands for GBPs (see FIG. 3 for enzymes involved). Core glycan structures (bottom left) are elongated with one of three types of chains (type 1 N-acetyllactosamine (LacNAc, Galβ1–3GlcNAc), type 2 LacNAc (Galβ1–4GlcNAc) or N,N′-Diacetyllactosamine (LacDiNAc, GalNAcβ1–4GlcNAc); labelled R1–R3, respectively). The core glycan structures can be extended by chains from R1–R3 (except for O-Man cores, which can only be extended by R2). Type 2 chains may be repeated to form poly-LacNAc chains (indicated by square brackets, ‘n’ indicating the number of repeat units). The terminal core structures may be variously fucosylated, sulfated and capped by sialic acids, as may residues along the poly-LacNAc chain, and the combinatorial action of relatively few transferases results in complex glycan structures with large diversity. To better indicate which residues have been added in a synthesis step, the residues added by previous steps are presented in grayscale. Pol II, polymerase II; X, any amino acid.
ABH: Carbohydrate antigens of the ABO blood group system.
Lipoproteins: Large complexes of lipids and proteins that transport lipids in the blood.
Lipopolysaccharide: A bacterial glycolipid endotoxin and a major constituent of the outer membrane of Gram-negative bacteria.
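How a small number of scaffolds and capping modifications multiplies into many structures, as described in panel b, can be sketched with a short Python enumeration. This is a counting illustration under loud simplifying assumptions: the four core names, the treatment of caps as freely combinable, and the omission of poly-LacNAc repeats and of any enzymatic or chemical compatibility rules are all choices made here, so the resulting number is not a biological estimate.

```python
from itertools import combinations

# Chain types as labelled in the caption (R1-R3).
CHAINS = {"R1": "type 1 LacNAc", "R2": "type 2 LacNAc", "R3": "LacDiNAc"}

# Hypothetical core list; O-Man cores are restricted to R2 as stated in the caption.
CORE_ALLOWED_CHAINS = {
    "N-glycan": ("R1", "R2", "R3"),
    "Core1 O-GalNAc": ("R1", "R2", "R3"),
    "Core2 O-GalNAc": ("R1", "R2", "R3"),
    "O-Man": ("R2",),
}

# Terminal modifications named in the caption, treated here as independent options.
CAPS = ("fucose", "sulfate", "sialic acid")


def cap_subsets(caps=CAPS):
    """All subsets of the capping modifications, including the uncapped case."""
    return [
        subset
        for size in range(len(caps) + 1)
        for subset in combinations(caps, size)
    ]


def enumerate_structures():
    """Every (core, chain, caps) combination permitted by the simplified rules."""
    return [
        (core, chain, caps)
        for core, chains in CORE_ALLOWED_CHAINS.items()
        for chain in chains
        for caps in cap_subsets()
    ]


if __name__ == "__main__":
    structures = enumerate_structures()
    print(len(structures), "combinations from",
          len(CORE_ALLOWED_CHAINS), "cores,", len(CHAINS), "chains and",
          len(CAPS), "capping modifications")
```

Even this restricted model yields dozens of combinations from ten building blocks, which is the qualitative point of the caption; allowing chain repeats or position-specific modifications would increase the count further.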
Further, O-glycosylation via O-Fuc and O-Glc of folded EGF-like and thrombospondin type 1 repeat domains stabilizes these after folding, enabling secretion71,120,121. Interestingly, other types of O-glycosylation do not appear to be required for protein transport and secretion. On the cell surface, glycoconjugates constitute the glycocalyx that provides a barrier and a protective layer shielding the plasma membrane from physical stress and shaping the cell surface. The glycocalyx serves to mediate numerous intrinsic and extrinsic interactions. Excellent recent reviews cover these topics7–9, and here we focus on non-global roles of glycosylation, highlighting examples where specific glycosyltransferases regulate specific functions at the level of glycosites or glycostructures (FIG. 4a).
Functions determined at the glycosite level. The NOTCH receptor illustrates how multiple types of O-glycosylation (O-Fuc, directed by POFUT1; O-GlcNAc, directed by EOGT122; and O-Glc, directed by the POGLUT1–3 isoenzymes) converge to modulate complex protein functions by decorating distinct glycosites in the many NOTCH EGF repeats39 (Table 1). With 20 isoenzymes, GalNAc-type O-glycosylation by GALNTs provides the greatest opportunity for differential regulation of specific sites in proteins, and GALNT isoenzymes serve highly specific co-regulatory roles in fundamental processes, including in the inhibition of protein processing by proprotein convertases35,123, in the inhibition (or, in some cases, activation) of proteolytic cleavage by ADAM proteases with altered shedding of ectodomains of surface proteins124, in the activation of β1-adrenergic receptor by modulating its N-terminal cleavage125, in the modulation of peptide hormone function and promoting their stability126,127, and in promoting ligand binding by LDLR and LRPs65 (FIG. 4a).
These functions are subject to tight regulation by expression levels of specific GALNTs128, which may be controlled by feedback loops as shown for GALNT3: GALNT3 uniquely co-regulates proprotein convertase-mediated processing of FGF23 that serves to regulate phosphate homeostasis129, and in turn the expression of GALNT3 is responsive to blood phosphate levels130. Man-type O-glycosylation by TMTC1–4 also provides for differential regulation of O-glycans on cadherins and protocadherins that may regulate their functions in cell adhesion131. Initiation of GAG synthesis by XYLT1 and XYLT2 also likely directs different subsets of proteoglycans and corresponding functions78.
The dynamic and widespread O-GlcNAcylation in the nucleus and cytosol regulated by the OGT and OGA pair of enzymes has a major role in nutrient sensing through the hexosamine biosynthetic pathway and modulating the cellular response to stress4. O-GlcNAcylation at specific glycosites regulates functions of proteins, including their enzymatic activity, stability and localization, and the dynamic on/off nature of O-GlcNAc on glycosites enables O-GlcNAcylation to regulate transcription and signalling events in crosstalk with protein phosphorylation4.
O-GlcNAcylation regulates transcription by modifying drivers of gene expression like Pol II, where C-terminal addition and removal of GlcNAc by OGT and OGA during the transcription cycle regulates transcription activation and repression. Site-specific O-GlcNAcylation also regulates important processes such as neuronal depolarization132 and intermediate filament morphology and cell migration133. The OGT substrate selectivity is partly regulated through interactions between the TPR domains, adaptor proteins and target protein substrates83.
Functions regulated at the glycan structure level. There is abundant evidence that the structure of glycans on proteins regulates diverse protein interactions and biological functions. For N-glycans, for example, tetra-antennary branching by MGAT5 favours synthesis of polyLacNAc chains that promote multivalent interactions with lectin receptors, for example galectins134–136.
Functionally, MGAT5-mediated branching on α5β1 integrin serves to enhance cell motility and growth137. By contrast, MGAT3 introduces a bisecting GlcNAc residue to the early stages of complex N-glycans that is not elongated and leads to inhibition of further N-glycan branching, and there- fore is a major determining factor for the N-glycome architecture138. In line with this, MGAT3-directed gly- cosylation of α5β1 integrin reduces its ability to bind to its ligands in the extracellular matrix and interferes with cell migration139. As another example, FUT8-directed fucosylation of the N-glycan core modulates several specific protein interactions, including EGF–EGFR binding140, interactions between the T cell receptor and the major histocompatibility complex II during T cell activation141, and IgG1 interactions with the FcγRIIIa receptor99,142 (FIG. 4a). Protein function can also be modulated by specific GalNAc-type O-glycan core extensions. GCNT1- directed Core2 O-glycosylation promotes Galectin- 1-induced T cell death of immature T cell precursors and activated T cells143. GCNT3-directed Core2 O-glycans are also required for surface expression of intestinal cell differentiation markers144. The β3GlcNAc exten- sion of O-Fuc on NOTCH regulates the interaction of the receptor with its ligands, Delta and Jagged, and the three isoenzymes, MFNG, LFNG or RFNG, directing the O-Fuc elongation appear to target distinct EGF repeats and determine differential binding between these two ligands71. Another example is assembly of the matri- glycan on α-dystroglycan. In the ER, POMGNT2 primes matriglycan synthesis by elongating select O-Man gly- cans on α-dystroglycan145, and in the Golgi, LARGE1/2 produce the functional glycosaminoglycan-like matriglycan polymer28. The common structural capping motifs on glycans found on several types of glycoproteins (FIG. 
4b) comprise the major interactome for GBPs, including the endogenous lectin receptors galectins, Siglecs and selectins, and diverse microbial lectins and adhesins7,9,146 (FIG. 4). Common to most mammalian lectins are shallow binding sites and low-affinity interactions (micromolar to millimolar) with single glycan epitopes, allowing for dynamic and reversible glycan–receptor associations. GBPs may also recognize more complex structures (clustered saccharides or discontinuous glycans) involving a higher-order presentation of glycans, which can be within the context of specific glycoconjugates and even specific proteins, resulting in increased affinity towards their target glycans147. Glycan binding motifs are generally directed by multiple isoenzymes that are glycosylation pathway non-specific and highly regulated. However, these isoenzymes may have non-redundant activities and selectively regulate glycosylation: in megakaryocytes, the B4GALT1-driven LacNAc epitope regulates β1 integrin activity required for the formation of mature platelets148; terminal sialylation of LacNAc or LacDiNAc O-glycans by ST6GAL1 was found to be important for regulating B cell activation via the lectin Siglec-2 (CD22)95,149; and ST3GAL1-directed capping of Core1 O-glycans controls CD8+ T lymphocyte homeostasis by inhibiting apoptosis and regulating memory cell formation150,151.
Changes of glycosylation in disease
Glycosylation processes in cells are highly sensitive to the physiological state, and glycans are prevalent reporters of disease152,153. An overwhelming number of studies using lectins, antibodies and direct structural analyses of glycan features have documented diverse changes in glycosylation in human diseases and especially cancers153. However, our insight into how these changes in glycosylation occur, what the functional consequences, if any, are and the nature of the specific molecular mechanisms is still limited. 
Congenital disorders of glycosylation (CDGs) and genome-wide association studies (GWAS) are beginning to enable us to deconvolve highly specific roles of glycosylation in common diseases (for a global overview of the glycosylation steps and pathways served by glycosyltransferase genes known to cause CDGs and/or implicated by GWAS traits, see Supplementary Fig. 2).
Lessons from congenital disorders of glycosylation. The importance of glycosylation is clearly underscored by decades of studies of rare monogenic CDGs, where more than 60 of the 100 CDGs identified to date are caused by complete or partial loss of function of a glycosyltransferase17,18 (Supplementary Table 1). CDGs may be caused by mutations in glycosyltransferase genes in all types of glycosylation pathways, but currently most glycosyltransferase-related CDGs are caused by deficiency in unique enzymes functioning in pathway-specific glycosylation steps where loss of function results in global glycome changes (Supplementary Fig. 2a). These CDGs are generally characterized by multisystemic disorders, with a wide spectrum and severity of clinical manifestations predicted to arise from a myriad of impaired direct and indirect biological functions of glycans, and these are often difficult to trace and dissect17,18.
CDGs were often identified by profiling N-glycans on an abundant serum glycoprotein (usually transferrin), but today CDGs are also discovered by genome sequencing, which is uncovering CDGs with more subtle phenotypes154. This has resulted in the discovery of novel CDGs caused by deficiencies in isoenzymes with subtle non-global glycosylation functions20,155, and these are pointing us to highly specific regulatory roles of protein glycosylation. 
The most illustrative examples so far are CDGs caused by loss of function within the GALNT isoenzymes, which have uncovered how site-specific O-glycosylation fine-tunes highly specific protein functions and regulates major physiological processes such as blood phosphate homeostasis (GALNT3) and likely also kidney function (GALNT11)129,156,157 (FIG. 5).
Proprotein convertases
A family of seven secretory mammalian serine proteinases that post-translationally activate proproteins in the secretory pathway by limited proteolysis after multiple basic residues with a general recognition motif, (R/K)Xn(R/K). A prototypical proprotein convertase is furin.
www.nature.com/nrm
[Fig. 5 graphic (not reproduced). Genes marked with altered expression: GALNT3/T6↑, FUT8↑↓, MGAT3↓, MGAT5↑, ST6GALNAC1↑, C1GALT1C1 (COSMC)↓, GCNT1↓, GCNT3↓, B3GNT6↓, B3GALT5↓, FUT6↑, FUT3↑, ST6GAL1↑. Changes listed in the panel: reduced Core2 branching; reduced Core1 elongation; reduced Core3/4 structures; accumulation of Tn and STn; introduction of bisecting GlcNAcs; changes in branching; changes in core Fuc; switch from 2-3 to 2-6 sialic acid capping; increased SLex and SLea epitopes; changes in GAGs. Monosaccharide key: Xyl, GlcNAc, Gal, GalNAc, Man, NeuAc, GlcA, Fuc.]
Fig. 5 | Common dysregulated glycosyltransferase genes in cancer. Three major glycosylation pathways (N-glycosylation, GalNAc-type O-glycosylation and O-Xyl glycosaminoglycans) that undergo characteristic changes in cancer (left) and examples of the biological effects these may exert in cancer (right). Glycosylation pathways are simplified according to the scheme in FIG. 3, and the key glycosyltransferase genes undergoing altered expression in cancer are indicated (arrows indicate up/downregulation of expression). Examples of specific effects of altered glycosylation in cancer are illustrated. Adhesion and migration: increased branching of N-glycans (MGAT5↑, MGAT3↓) and core fucosylation (FUT8) on adhesion receptors (for example, E-cadherin, integrins) modulates both cell–cell adhesion252,253 and, alongside increased α2-6-sialic acid (α2-6-SA) capping (ST6GAL1), cell–extracellular matrix (ECM) adhesion through integrin receptors253. Mechanisms behind the changes in adhesiveness can be related to the abundance of receptors, conformational changes or presentation of binding epitopes to adhesive partners. These glycosylation changes are one of the drivers of epithelial–mesenchymal transition (EMT). Metastasis: alongside increased propensity for EMT, the metastatic potential of cells is increased by the upregulation of expression of sialyl-Lewis x (SLex) (FUT6 and ST6GAL1) and sialyl-Lewis a (SLea) (FUT3) glycan epitopes, which are recognized by selectins on epithelial and endothelial cells, promoting tumour cell extravasation from vasculature and metastatic site homing196–198. Immune recognition: altered sialylation patterns on tumour cells allow for evasion from immune surveillance. This is mediated by the sialic acid-binding immunoglobulin-like lectin (Siglec) family of receptors, both inhibitory (for example, Siglec 7/9) and activating (for example, Siglec 15), which recognize different sialic acid ligands and modulate the signalling responses to tumour ligands, resulting in immunosuppression254. Changes in sialylation may be attributed to cancer-associated changes in glycosyltransferase genes in the GalNAc-type O-glycosylation pathway, most notably the ST6GALNAC1 gene that induces expression of the STn epitope185–187. Signalling and apoptosis: glycosylation can modulate the proximity of cell-surface receptors to each other, affecting the ability to directly interact or cluster, thereby affecting their signalling capability. For example, loss of a single GalNAc-type O-glycosylation site (unknown polypeptide GalNAc transferase (GALNT) isoform) on one of the cytokine receptors of granulocytes, colony-stimulating factor 3 receptor (CSF3R), is suggested to result in increased ligand-independent dimerization and aberrant signalling255. Similarly, increased N-linked branching and core fucosylation (FUT8) on epidermal growth factor receptor (EGFR) increase receptor dimerization and signalling, promoting growth and progression of cancer181. By contrast, clustering of death receptors is reduced in cancers due to the truncated GalNAc-type O-glycans and modified N-glycosylation, which leads to tumour cells escaping apoptotic fates256. CS, chondroitin sulfate; GAG, glycosaminoglycan; NK cell, natural killer cell; NS, non-specific; S, sulfation; TGFβ, transforming growth factor-β.
Lessons from genome-wide association studies. Surveying glycosyltransferase genes implicated by GWAS for diverse traits or predispositions interestingly revealed that GWAS candidate genes represent almost a mirror of the glycosyltransferase genes so far identified as causing CDGs, and they concentrate around those found to exhibit high transcriptional regulation158 (Supplementary Fig. 2a). 
This suggests the somewhat controversial conclusion that dysregulation of glycosyltransferase isoenzymes, rather than loss of function of these enzymes, is involved in more common disease conditions identifiable by GWAS, such as coronary disease, osteoporosis, chronic kidney disease and schizophrenia158. This prediction was first supported by studies of GALNT2, which is implicated in regulation of high-density lipoprotein (HDL) and triglycerides important for cardiovascular health159. Individuals with disrupted GALNT2 show lower HDL levels155, which has been recapitulated in several animal models with loss of function of Galnt2 (REF. 160). The GWAS signal for GALNT2 is located close to a liver-specific regulatory element that induces differential allele-specific transcription161,162, supporting the idea that liver-specific dysregulation of GALNT2 is the cause of the altered HDL metabolism. GALNT2 performs non-redundant O-glycosylation of ANGPTL3 and phospholipid transfer protein (PLTP), two proteins involved in regulating HDL. GALNT3 is a GWAS candidate for bone mineral density163, which is consistent with its loss of function in CDG causing hyperphosphataemia and familial tumoral calcinosis with ectopic bone formations129,156,164. The association of GALNT11, a GWAS candidate for chronic kidney decline, likely relates to its role in directing O-glycosylation of the ligand binding regions of LDLR and LRPs, including LRP2 (also known as megalin), which serves as the major endocytic receptor in the proximal tubules of the kidney65. GALNT11 O-glycosylation enhances ligand affinity of LDLR65, and a mouse Galnt11–/– model revealed an essential role of O-glycosylation in LRP2 regulation and kidney function157.
Dysregulation of glycosyltransferases in cancer. Recent reviews detail the prevalent glycome changes that equip cancer cells with distinct glycan features required during different stages of tumour growth and dissemination, including immune evasion and metastasis153,165–167 (FIG. 5). 
Given such common aberrations of glycosylation in cancer, it may be surprising that somatic mutations in genes controlling cellular glycosylation are extremely rare, and very few validated mutations in glycosyltransferases have been reported in cancer. Mutations in COSMC, which encodes a private chaperone required for C1GALT1, the enzyme that directs GalNAc-type O-glycan Core1 elongation, were reported in a few cervical cancers168,169, but studies in colorectal and pancreatic cancers found hypermethylation of COSMC rather than somatic mutations to be the major reason for the O-glycan truncation that is a hallmark of epithelial cancers (see also below)169,170. Relatively rare heterozygous missense mutations in GALNT12 were reported in colorectal cancers, and these were also shown to affect catalytic properties in recent structural analysis171. An analysis of data from The Cancer Genome Atlas (TCGA) reveals that cancer genes are more often mutated (for example, KRAS is mutated in 8% of cancers) than typical glycosyltransferases, which have protein-coding mutations in only around 1% of cancers, underlining the rarity of somatic mutations deleterious to the glycosylation machinery.
In contrast to protein-coding somatic mutations, there is more evidence to support aberrant expression of glycosyltransferases in cancer153,172–175; for example, as seen with a hotspot of non-coding mutations around the ST6GAL1 gene in B cell non-Hodgkin lymphomas176. Still, only a few studies have provided compelling insights into the biosynthetic, structural and functional consequences of misexpression of specific glycosyltransferases in cancer, and we limit our discussion to these (of note, we excluded discussion of studies using RNA silencing to validate involvement of specific glycosyltransferase genes, as these generally require further validation due to inefficiencies in reduction of enzyme activities and off-target effects). 
The mechanisms behind altered expression may include epigenetic regulation169,177