Some genomes may contain >100,000 protein-coding genes1, but which are really indispensable for life? The answer to this fundamental, yet deceptively simple, question depends on the definition of gene essentiality. Working on Mendelian segregation patterns in mice, Castle and Little2 were the first to define the ‘lethal’ phenotype in genetic terms. Since then, several definitions have been used to describe this seemingly intuitive concept, but herein we refer to essential genes as those indispensable for reproductive success. To assess the ‘essentiality’ of a gene, researchers therefore need to assess the phenotype of a living system that either entirely lacks that gene or in which the expression or function of that gene has been substantially impaired. In the case of single-celled organisms or of single cells derived from multicellular organisms, this would translate into identifying genes required for the proliferation of individual cells (cellular gene essentiality). By contrast, in the case of multicellular organisms, this would mean finding genes required for growth and development from a zygote into a fertile adult (organismal gene essentiality). Here, we focus mostly on cellular gene essentiality and in particular on genes whose inactivation or loss causes either severe growth impairment (looser definition) or irreversible growth arrest or cell death (stricter definition). Owing to a lack of space, we do not cover viral gene essentiality, and even though several non-coding RNAs are known to be essential in various contexts3,4, the scope of this Review is focused on protein-coding genes.

Classical genetic approaches, such as transposon mutagenesis or targeted single-gene knockout (KO) studies, have enabled the first systematic screens of gene essentiality in a variety of microorganisms, thus unveiling the first essentialomes in bacteria and yeasts.
The recent availability of high-throughput genetic resources, such as genome-wide RNA interference (RNAi) and genome editing technologies, enabled these screens to be extended to non-model organisms and to increasingly more complex systems, such as mice and human cells. As discussed in greater detail later, the availability of essentialomes in a wide range of species has proved critical for a variety of applications, including the definition of a minimal genome for synthetic biology, the elucidation of design principles of cellular networks, the understanding of genome organization and evolution and the development of chemical genomics tools for drug target identification and prioritization.

In this Review, we first provide a historical perspective on how key technological breakthroughs in genetics and genomics have contributed to spurring the study of gene essentiality towards increasingly complex biological systems, from bacteria to human cells. Next, we build a consensus around common properties of essential genes, drawn from findings in a wide range of species, and discuss how these properties have been exploited for a multitude of synthetic or therapeutic purposes. Then, we review important classical and recent findings that challenge the view of gene essentiality as an absolute and static property and instead shed light on its context-dependent and evolvable nature. Finally, we discuss the consequences of incorporating these novel concepts of gene essentiality into both basic and applied biomedical sciences and in particular how they could be exploited to improve current antimicrobial and anticancer treatment strategies.

Evolution of gene essentiality research

Distinguishing essential genes from non-essential genes has been a long-standing question in genetics.
Although numerous attempts have been made to predict gene essentiality in silico based on evolutionary conservation, expression levels and/or systems-level properties5–10, here we focus on experimental approaches for defining essentialomes (TABLE 1).

Emerging and evolving concepts in gene essentiality
Giulia Rancati1, Jason Moffat2,3, Athanasios Typas4 and Norman Pavelka5

Abstract | Gene essentiality is a founding concept of genetics with important implications in both fundamental and applied research. Multiple screens have been performed over the years in bacteria, yeasts, animals and more recently in human cells to identify essential genes. A mounting body of evidence suggests that gene essentiality, rather than being a static and binary property, is both context dependent and evolvable in all kingdoms of life. This concept of a non-absolute nature of gene essentiality changes our fundamental understanding of essential biological processes and could directly affect future treatment strategies for cancer and infectious diseases.

1Institute of Medical Biology, Agency of Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos #05, Singapore 138648, Singapore.
2Donnelly Centre, University of Toronto, Toronto, Ontario M5S3E1, Canada.
3Canadian Institute for Advanced Research, Toronto, Ontario M5G1Z8, Canada.
4European Molecular Biology Laboratory (EMBL), Genome Biology, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
5Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos #04, Singapore 138648, Singapore.
Correspondence to N.P. norman_pavelka@immunol.a-star.edu.sg
doi:10.1038/nrg.2017.74
Published online 16 Oct 2017
NATURE REVIEWS | GENETICS, ADVANCE ONLINE PUBLICATION. © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

Gene essentiality: The extent to which a gene is required for the reproductive success of a living system, for example, a virus, a single-celled organism, a cell line or a multicellular organism.
Reproductive success: The ability of a living system to generate fertile progeny, that is, viable offspring that can themselves generate further viable offspring.
Cellular gene essentiality: The extent to which a gene is required for the reproductive success of a single-celled organism or of a cell line derived from a multicellular organism.
Organismal gene essentiality: The extent to which a gene is required for the reproductive success of a multicellular organism, that is, for it to grow and develop from a zygote into a fertile adult.
Viral gene essentiality: The extent to which a gene is required for the reproductive success of a virus.

The pre-genomic era. Attempts to determine the proportion of a genome that is required for life date back to the early days of molecular biology research. In 1951, Horowitz and Leupold11 proposed that the majority of proteins are likely to be dispensable for life. They isolated a large number of temperature-sensitive Escherichia coli and Neurospora crassa mutants grown in minimal medium and observed that only a quarter of the E. coli mutants and half of the N. crassa mutants did not grow at the restrictive temperature in rich medium, implying that the majority of genes are non-essential in both organisms. More than two decades later, saturating random mutagenesis by chemicals, followed by analysis of offspring viability, enabled more accurate estimations of essentiality in diploid organisms: ~50%, ~15% and ~12% of the genome was reported as essential in Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae, respectively12–15.
However, mapping the identity of essential genetic elements was cumbersome at the time and could be performed only on a case-by-case basis. Transposon mutagenesis16, shotgun sequencing17, restriction enzymes and PCR all made mutant mapping easier in the subsequent decades, but it was not until the genomic era that the first repertoires of essential genes within an organism (essentialomes) were defined (FIG. 1).

Table 1 | Tools to determine essentialomes

Method category: Random mutagenesis
General advantages: The fastest way to generate mutants.
General disadvantages: Does not guarantee loss of function of mutated gene; applicable only to haploid cells or organisms.
  Method: Chemical mutagenesis
    Specific advantage: Simple and low cost to implement.
    Specific disadvantage: Low-throughput and high-cost mutant mapping; several genes mutated per genome.
  Method: Transposon mutagenesis
    Specific advantage: When coupled with next-generation sequencing, it allows maximal rapidity and throughput; when coupled with barcoding, it allows bulk analysis of pools of mutants in different conditions.
    Specific disadvantage: Mutation site bias; when in pools, trans-complementation of essential properties (for example, metabolic requirements) can occur.
  Method: Gene trapping
    Specific advantage: Reports the expression of the trapped gene; allows for rapid identification of the disrupted gene.
    Specific disadvantage: Applicable only to species with intron-containing genomes.

Method category: Targeted mutagenesis
General advantages: Complete KO achievable; few off-target effects.
General disadvantages: Labour-intensive generation of mutants.
  Method: Homologous recombination
    Specific advantage: Can be used to achieve complete KO.
    Specific disadvantage: Does not work efficiently enough in all species.
  Method: Gene editing
    Specific advantage: Greatest flexibility in terms of the type of genetic modification that can be introduced; CRISPR is scalable to genome-wide pooled screens.
    Specific disadvantage: High cost and longer timelines for generating ZFN and TALEN enzymes; cutting sites that are highly amplified in a particular cell will trigger cell proliferation defects.

Method category: Knockdown approaches
General advantages: Amenable for high-throughput screening in a wide range of organisms.
General disadvantages: Does not always lead to complete suppression.
  Method: Repressible promoters
    Specific advantage: Most specific repression of gene of interest.
    Specific disadvantage: As with targeted mutagenesis, mutants have to be individually generated.
  Method: RNAi
    Specific advantage: Inducible, tuneable and reversible suppression of gene expression.
    Specific disadvantage: More off-target effects than for gene editing, as short 6–8 base seed regions can trigger RNAi.
  Method: dCas9-based gene repression (CRISPRi)
    Specific advantage: Highly specific targeting of the gene of interest.
    Specific disadvantage: Polar effects on polycistron-encoded mRNAs; limited to genes containing protospacer adjacent motifs flanking the transcription start site.

CRISPRi, CRISPR-based transcriptional inhibition; dCas9, a nuclease-dead Cas9 mutant; KO, knockout; RNAi, RNA interference; TALEN, transcription activator-like effector nuclease; ZFN, zinc-finger nuclease.

[Figure 1 timeline graphic: a ‘Technology’ panel (1950–2015) and a ‘Biology’ panel (1905–2015) listing the milestones summarized in the Figure 1 legend.]

Essentialomes: Comprehensive sets of essential genes in genomes.
RNA interference (RNAi): A technique to inhibit the production of a protein by destabilizing a target mRNA molecule.
Minimal genome: A genome consisting solely of a minimal set of genes that are required and sufficient to sustain cellular life.
Evolvable: Able to change via a process of adaptive evolution, that is, via acquisition and fixation of genetic mutations that confer selective advantages.

From complete genomes to essentialomes. Regardless of the methods used to systematically inactivate genes or their products, the complete genome sequence of an organism is a prerequisite for compiling the full list of genes as well as for designing and interpreting targeted or random mutagenesis screens. Owing to the invention of shotgun sequencing and improvements in genome assembly software18, the complete genome sequences of the first free-living organisms, Haemophilus influenzae and Mycoplasma genitalium, were published in 1995 (REFS 19,20). Although the complete genomes of model organisms such as E. coli, Bacillus subtilis and S. cerevisiae were published in the subsequent 2 years21–23, it was not until 1999 that the first essentialome screen was reported for M. genitalium, by use of an inventive combination of transposon mutagenesis and shotgun sequencing24. As high-throughput Sanger sequencing was not universally accessible, alternative techniques were developed to map transposon insertion sites. Microarray technology and PCR-based genetic footprinting enabled individual laboratories to determine the essentialome of numerous bacteria, including H. influenzae, Mycobacterium tuberculosis, Pseudomonas aeruginosa and Helicobacter pylori25–29. Whereas whole-genome sequences of multicellular eukaryotic model organisms, including C. elegans, D. melanogaster and Arabidopsis thaliana30–32, became available by the turn of the century, systematic catalogues of essential genes lagged behind: it took a few years more for the first essentialomes of genetic workhorse models such as S. cerevisiae and E. coli to be published33,34.

From random to targeted mutagenesis.
Random mutagenesis approaches, such as transposon mutagenesis, catalysed early attempts at defining essentialomes in bacteria and eukaryotic microorganisms and still represent the main approach to identifying essential genes in microorganisms to date35,36. However, although these methods have been optimized over the years and some of their drawbacks have been mitigated, they suffer from a few limitations. First, there is no guarantee that all genes in the genome will be disrupted, even when working under saturating conditions. This is especially important in higher organisms, in which only a small fraction of the genome is coding. Second, not all mutations necessarily completely disrupt the expression or the function of a gene, leaving room for misidentification of essential genes.

Figure 1 | Milestones of technological and biological breakthroughs in gene essentiality research. The figure illustrates some of the most important technological advancements61,72,188–199 (left panel) that were conducive to some key biological discoveries2,24,27–29,33,34,40,55,62,67,91,92,97,200–205 (right panel), which have shaped our current understanding of gene essentiality. C. elegans, Caenorhabditis elegans; CRISPRa, CRISPR-based transcriptional activation; CRISPRi, CRISPR-based transcriptional inhibition; D. rerio, Danio rerio; D. melanogaster, Drosophila melanogaster; E. coli, Escherichia coli; esiRNA, endoribonuclease-prepared small interfering RNA; H. influenzae, Haemophilus influenzae; H. pylori, Helicobacter pylori; M. genitalium, Mycoplasma genitalium; M. musculus, Mus musculus; RNAi, RNA interference; P. aeruginosa, Pseudomonas aeruginosa; S. aureus, Staphylococcus aureus; S. cerevisiae, Saccharomyces cerevisiae; S. pneumoniae, Streptococcus pneumoniae; S. pombe, Schizosaccharomyces pombe; S. Typhimurium, Salmonella enterica subsp. enterica serovar Typhimurium; shRNA, short hairpin RNA; siRNA, small interfering RNA.

Transposon mutagenesis: A gene disruption strategy based on random insertion of transposable genetic elements into a host genome.
Shotgun sequencing: A sequencing method based on sequencing several random fragments from a long DNA molecule followed by bioinformatic assembly of the fragments based on similarity of overlapping ends.
Gene traps: High-throughput approaches to introduce genome-wide insertional mutations in mammalian genomes. They inactivate the trapped gene by introducing a premature polyadenylation site.
Next-generation sequencing: A term used to describe different high-throughput sequencing technologies that became available over the past decade.
Genetic interactions (GIs): Phenomena by which concomitant mutations in two genes result in a phenotype that is not readily predictable from the phenotype of the two individual mutations.
Signature mutagenesis: A genetic technique based on transposon mutagenesis, where each transposable element contains a different molecular tag that uniquely identifies it. This allows the phenotype of pools of mutants to be analysed en masse.

Targeted approaches in which the entire open-reading frame (ORF) of a single gene is completely deleted from the genome can more accurately determine its essentiality. A PCR-based method to simply and reliably achieve this goal by homologous recombination was developed for S. cerevisiae37, but in order to apply this technology genome-wide, the concerted effort of several laboratories was required. An international consortium, the Saccharomyces Genome Deletion project, was founded in 1998 with the goal of deleting every single ORF in the budding yeast genome. The first draft of the S.
cerevisiae essentialome arising from this consortium was published in 1999 (REF. 38), and the first complete yeast single-gene KO collection was published in 2002 (REF. 33). A similar approach was used to generate the first KO library in bacteria, the E. coli Keio collection39. Despite automation of many steps of the process, targeted approaches are still labour-intensive, and such libraries are available only for a few model bacteria and yeasts40–43. For some fungal organisms, such as Candida albicans, targeted gene inactivation is further hindered by their obligate diploid nature, and only partial homozygous KO libraries exist to date44. However, the discovery of haploid C. albicans cells45 might accelerate the generation of the first genome-scale gene KO collection in this important human pathogen.

From microorganisms to animals. Initial efforts to perform targeted gene disruption in animals were hindered because homologous recombination is typically inefficient in metazoan tissue culture46. The first breakthrough came with the discovery that homologous recombination works surprisingly well in embryonic stem (ES) cells derived from mouse blastocysts47–49, which led to the development of the first KO mice in 1989 (REFS 50,51). Several international consortia have been established since then with the long-term goal of systematically inactivating all individual genes in the mouse genome and scoring the phenotypes of the resultant strains52–57. When completed, these efforts will provide the first organismal essentialome of a mammalian species. Another breakthrough came with the serendipitous discovery of ‘co-suppression’ in plants in 1990 (REF. 58), which was later dubbed RNAi. This phenomenon, which is part of a plant’s natural immunity against pathogens, was co-opted as an effective way to knock down the expression of a targeted gene59–61.
Taking advantage of this new technology, the first genome-wide RNAi screen to systematically define mutant phenotypes was performed in C. elegans in 2003 (REF. 62). This technology was then rapidly adapted for use in mammalian cells63, and several groups generated human and mouse genome-scale RNAi libraries, which were widely adopted by the community not only to screen for gene essentiality but also to study the function of both essential and non-essential genes in metazoan genomes64. Finally, the identification of haploid cell lines of diploid organisms enabled insertional mutagenesis methods, such as gene traps, to be applied for the first time in higher eukaryotes65. Although in its first iterations the method was used more for classical forward genetics, later improvements allowed the construction of comprehensive KO libraries66, mapping of essentialomes in cell lines67 and more quantitative reverse genetics68, in which the phenotype of each gene KO could be measured.

The next-generation sequencing revolution. Thanks to next-generation sequencing69, the past two decades witnessed progression from a few sequenced bacterial genomes, each requiring years of work by many groups, to the processing of thousands of bacterial genomes in single studies70,71 and from the human genome project72,73 to the 1,000 and 100,000 genomes projects74,75. This ease in acquiring sequencing data further fuelled other methodological leaps to introduce tractable genetic variation. The use of transposons with restriction sites recognized by enzymes cutting within flanking chromosomal regions enabled transposon sequencing (Tn-seq) to map transposon insertion sites by sequencing these flanking regions76. Coupled with the power of next-generation sequencing, these types of transposon library have been used to identify the essentialomes of dozens of different microorganisms and to probe conditional essentiality and genetic interactions (GIs)35.
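The reasoning behind transposon-based essentialome mapping can be illustrated with a minimal sketch: once insertion sites have been mapped to genomic coordinates, genes that tolerate no insertions in a saturated library become candidate essential genes. The gene names, coordinates, insertion positions and the `max_insertions` threshold below are hypothetical, and the sketch deliberately ignores gene length, insertion-site bias and library saturation, all of which a real analysis would need to model.

```python
from bisect import bisect_left, bisect_right


def insertion_counts(genes, insertion_sites):
    """Count mapped transposon insertions that fall inside each gene.

    genes: dict mapping gene name -> (start, end), inclusive coordinates.
    insertion_sites: iterable of integer genomic positions.
    """
    sites = sorted(insertion_sites)
    counts = {}
    for name, (start, end) in genes.items():
        # Number of sorted sites with start <= position <= end.
        counts[name] = bisect_right(sites, end) - bisect_left(sites, start)
    return counts


def candidate_essential(genes, insertion_sites, max_insertions=0):
    """Return genes with at most `max_insertions` insertions (hypothetical cutoff)."""
    counts = insertion_counts(genes, insertion_sites)
    return sorted(name for name, n in counts.items() if n <= max_insertions)


# Hypothetical toy genome and insertion library, for illustration only.
genes = {"geneA": (100, 400), "geneB": (500, 900), "geneC": (1000, 1200)}
sites = [120, 250, 390, 950, 1010, 1150]

print(insertion_counts(genes, sites))
print(candidate_essential(genes, sites))
```

A gene with no insertions is only a candidate: as noted for random mutagenesis in general, an absence of insertions can also reflect incomplete saturation or insertion-site bias rather than essentiality.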
A further advance came from an earlier concept of introducing barcodes next to transposons in order to track the abundance of all mutants within a pool. This pioneering idea was first introduced in Salmonella enterica subsp. enterica serovar Typhimurium for identifying virulence genes required for mouse infection and was dubbed signature mutagenesis77. This concept was later improved for the construction of the yeast KO collection and subsequently used in almost all single-gene KO libraries78, radically accelerating the mapping of transposon insertion sites36 and the ability to link genes to phenotypes and to probe for conditional essentiality79,80.

The genome editing revolution. Advances in genome sequencing capabilities were paralleled by major innovations in our ability to edit genome sequences, including the development of zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and CRISPR–Cas9 RNA-guided technologies. ZFNs, TALENs and Cas9 nucleases can be programmed to recognize and cut specific DNA sequences, thereby enabling a broad range of genetic modifications. In particular, CRISPR–Cas9, which was discovered as part of the bacterial immune system against phages81, enables cost-effective and straightforward genome editing in yeasts, plants and animals82,83. Further development of the technology enabled single and multiplex gene editing in both mouse and human cells84,85. An engineered ‘dead’ Cas9 (dCas9) variant with inactivating mutations in the endonuclease domains can be guided to bind specific DNA locations without cutting86, providing a platform for recruiting different functional moieties to target sites. Among others, the dCas9 system has been used for transcriptional activation and repression (known as CRISPRa and CRISPRi, respectively), allowing genome-wide gain-of-function and loss-of-function screens to functionally annotate genomes87,88 or to study the role of essential genes89. Notably, both Cas9 and dCas9 have been used to map essentialomes in human cell lines with much improved results over RNAi, including less data variation, more functional constructs and fewer off-target effects90 (TABLE 1).

As a result of the above-described technological leaps, the landscape of essential genes in human cells is now being explored using the same conceptual frameworks established in yeast67,91–93, a reality that was inconceivable less than 5 years ago. Moreover, the marriage of new and old genetic engineering tools with high-throughput and ‘omics’ approaches has paved the way for studying the function and interconnection of essential genes with unprecedented ease from bacteria to human cells.

[Figure 2 plots: a | estimated total number of essential genes and b | estimated percentage of essential genes, each plotted against the total number of protein-coding genes on logarithmic axes, for bacteria, eukaryotes (single cells) and eukaryotes (whole organisms).]

Emerging properties of essential genes

The existence today of essentialomes for a variety of species enables comparative analyses of gene essentiality across the tree of life94. As reviewed below, essential genes share several common features, which have also been used by in silico prediction tools to infer the essentiality of uncharacterized genes95.
These properties of essential genes, emerging from a vast body of literature, have repercussions in a variety of fields, from evolutionary and systems biology to drug development.

How many genes are required for life? Genome sizes vary greatly across species, offering an opportunity to detect emerging trends in the number of essential genes in genomes96. Focusing on a subset of species for which near-complete essentialomes have been reported, we note that genomes harbouring larger numbers of ORFs tend to have higher total numbers but lower percentages of essential genes, suggesting an approximate power-law scaling (FIG. 2; Supplementary information S1 (table)). On one end of the spectrum, the obligate intracellular parasite M. genitalium has one of the smallest genomes known, and 382 (~80%) of its 482 genes were reported to be essential for growth under laboratory conditions97. This might not be surprising, as extreme genome reduction is a common feature of parasites98, making them potential outliers in this analysis. Considering other bacteria, ~22% of ~1,600 genes are essential in H. influenzae99, whereas only ~7% of ~4,000 are essential in E. coli39,100, but essentialomes of bacteria with much larger genomes (such as certain cyanobacteria with >12,000 predicted ORFs101) have not yet been determined. Among eukaryotes, ~20% of ~6,000 genes are essential in budding yeast, whereas only ~10% of ~20,000 are so in cultured human cells (Supplementary information S1 (table)); however, eukaryotic essentialomes are still very limited, and methodological biases might exist between studies. More essentialomes from both eukaryotes and large-genome-bearing bacteria will be required to verify the power-law scaling hypothesis, as well as to understand the origin and potential consequences of this relationship.

What functions do essential genes encode?
A clear common property of essential genes is that they tend to encode ancient functions that are fundamental for the very existence of cellular life itself. From bacteria to eukaryotes, essentialomes are enriched in genes required for DNA, RNA and protein synthesis40,95. Interestingly, while essential cellular functions can be highly conserved between species, one, more or even all components of a pathway or protein complex can be replaced by functional equivalents with an independent evolutionary origin. This process is known as ‘non-orthologous gene displacement’ (REF. 102) and occurs frequently when comparing evolutionarily distant organisms, such as the Gram-positive and Gram-negative model bacteria, B. subtilis and E. coli43.

Purifying selection: Negative selection against deleterious alleles.
Phyletic retention: The tendency of genes to be retained in genomes along phylogenetic lineages.
Orthologue: Orthologues are genes found in different species that have evolved from a common ancestor.
Interactomes: Complete sets of genetic, protein–protein, metabolic or other types of molecular interaction within a given genome.
Degree: Within the context of biological networks, the number of genetic or physical interactions a gene or protein is involved in.
Pleiotropy: The production of two or more apparently unrelated phenotypes or traits by a single gene.
Epistasis: A form of genetic interaction, whereby an allele of one gene influences non-additively the phenotype associated with the allele of another gene. Positive or negative epistasis exists when a double mutant is fitter or less fit than the sum of the fitness effects of the two single mutants, respectively.

Figure 2 | Scaling of essential gene number with genome size. The relationship between the number or percentage of essential genes in a genome and the size of the genome was investigated based on data compiled in Supplementary information S1 (table) (http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg.2017.74.html#supplementary-information). a | The total number of essential genes is plotted against the number of protein-coding genes. b | The percentage of essential genes is plotted against the number of protein-coding genes. Taken together, these analyses show that whereas the total number of essential genes in a genome increases with total gene content, the percentage of essential genes decreases as a function of the same quantity. Mathematically, this type of scaling can be approximated by a power-law relationship between the number of essential genes and the total number of genes in a genome, in which the power coefficient lies between 0 and 1. Biologically, this result suggests the existence of a type of economy of scale, by which larger genomes require a proportionately smaller number of essential genes to support survival of the species. These analyses have been restricted to species for which a near-complete essentialome has been experimentally determined and to the most up-to-date publications. When available, the genome assembly of the same strain used for the essentialome screen was extracted from the Ensembl database (http://www.ensembl.org) to obtain the number of protein-coding genes. Otherwise, the genome assembly of the reference strain of the corresponding species was used.
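The power-law scaling described for Figure 2 can be made concrete with a short sketch. If the number of essential genes E scales with the total number of protein-coding genes N as E = c·N^α, then the percentage of essential genes is 100·c·N^(α−1), which falls as N grows whenever 0 < α < 1. The code below fits α by least squares on log-transformed values, using only the approximate figures quoted in the text. Pooling bacteria and eukaryotes into a single fit, and the function names `fit_power_law` and `percent_essential`, are assumptions made for illustration; this is not the analysis behind Figure 2.

```python
import math


def fit_power_law(total_genes, essential_genes):
    """Least-squares fit of log10(E) = log10(c) + alpha * log10(N)."""
    xs = [math.log10(n) for n in total_genes]
    ys = [math.log10(e) for e in essential_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = 10 ** (mean_y - alpha * mean_x)
    return c, alpha


def percent_essential(c, alpha, n):
    """Percentage of essential genes implied by E = c * N**alpha."""
    return 100 * c * n ** (alpha - 1)


# Approximate values quoted in the text (gene totals and essential fractions):
# M. genitalium 382 of 482; H. influenzae ~22% of ~1,600; E. coli ~7% of ~4,000;
# budding yeast ~20% of ~6,000; cultured human cells ~10% of ~20,000.
total = [482, 1600, 4000, 6000, 20000]
essential = [382, 352, 280, 1200, 2000]

c, alpha = fit_power_law(total, essential)
print(f"alpha = {alpha:.2f}")
for n in (500, 5000, 20000):
    print(n, f"{percent_essential(c, alpha, n):.1f}% essential (fitted)")
```

With these five rough data points the fitted exponent falls between 0 and 1, so the fitted curve reproduces the qualitative trend stated in the legend: more essential genes in absolute terms, but a smaller essential fraction, as gene number increases.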
Consistent with frequent non-orthologous gene displacement, only ~60 of the 500–600 genes that are estimated to have been present in the last universal common ancestor of all forms of cellular life are found in all modern-day genomes103. A showcase example of this phenomenon is the kinetochore, which is a protein complex that mediates chromosome–microtubule interactions during mitosis and that is essential in all eukaryotes104. Despite its universally essential function, the majority of kinetochore subunits found in kinetoplastids, a group of flagellated protists, are phylogenetically unrelated to those of yeasts and metazoans105,106. Overall, this concept implies that essentiality is likely to be a property more of functions than of genes and thus has important implications for our understanding of genetics, evolution and systems biology.

How conserved are essential genes? As genes that are essential for cellular life cannot be easily lost or modified, an intuitive prediction is that they should evolve more slowly than non-essential genes. This prediction has been tested many times, revealing a more complex reality107. In support of the hypothesis, many essential proteins are infrequently lost during evolution and display quasi-invariant amino acid sequences. For instance, 70 kDa heat shock proteins (Hsp70s) are present in virtually all living organisms and serve essential chaperone functions; remarkably, 567 amino acids of the Hsp70 protein sequence have remained identical from bacteria to eukaryotes for the past ~2 billion years108. At the nucleotide level, lower non-synonymous to synonymous substitution ratios, which are indicative of purifying selection, have been reported in essential versus non-essential bacterial genes109,110 (FIG. 3a). The same trend was also observed in mice but was shown to be driven by an overrepresentation of immune-related genes among the non-essential genes; immune-related genes are thought to be under positive selection owing to co-evolution with pathogens111.
Moreover, the evolutionary distance between yeast and worm essential genes was similar to that of non-essential genes112. This discrepancy is potentially explained by the fact that purifying selection acts more strongly in bacteria than in eukaryotes owing to their vastly different effective population sizes113 but could also be due to fundamental issues in the dichotomous classification of gene essentiality (see the section below on the non-absolute nature of gene essentiality). When analysing phyletic retention, a somewhat clearer picture emerges across the tree of life. Counting the number of organisms a gene orthologue is present in proved highly predictive of gene essentiality in E. coli, B. subtilis and S. cerevisiae7,114,115. Bergmiller et al. found that the level of conservation of essential genes across bacterial taxa could be predicted by the extent to which their loss could be compensated by overexpression of non-homologous genes116. These data suggest that essential gene loss does not necessarily lead towards an evolutionary dead end. Conversely, many non-essential bacterial genes are retained at high frequency, or persist, across bacterial genomes114, suggesting that their loss has a substantial fitness impact that can lead to lineage extinction, even though it does not cause immediate cell death. Consistent with this hypothesis, conditionally essential genes are on average nearly as conserved as essential genes in S. Typhimurium or E. coli (A.T., unpublished observations). Moreover, the majority of C. elegans genes, although not essential for the immediate survival of individual organisms, are nonetheless required for the multigenerational fitness of worm populations117. Together, these observations challenge the pervasive concept that essential genes are more important in evolution than non-essential genes and underline the necessity for a new definition of essentiality.

How connected are essential genes?
Integrating essentialomes with other genome-wide information is leading towards a deeper understanding of how cells work at the systems level. For instance, the availability of high-quality and comprehensive interactomes for some model organisms, ranging from genome-wide protein–protein interactions (PPIs) to GI maps, has led to the initial observation that essential genes tend to act as hubs in molecular networks5,118,119. This is generally known as the ‘centrality–lethality’ rule, whereby genes and proteins with a high degree of connectivity, that is, playing a more ‘central’ role in molecular networks, are hypothesized to be essential because their inactivation is more likely to disrupt overall network architecture (FIG. 3b). Subsequent reports showed that essential genes encode proteins that are more likely to be involved in densely connected functional modules and in protein complexes120–123. Therefore, the use of pull-down approaches, which do not discriminate between direct and indirect interactions, might have inflated the degree of essential proteins in PPI networks. In fact, a reanalysis of the yeast PPI network focusing on direct binary interactions did not provide support for the centrality–lethality rule124. Rather, protein connectivity was more related to genetic pleiotropy than to gene essentiality. Similarly, no relationship between centrality and essentiality was found in metabolic networks125. A recent study described the use of automated yeast genetics to generate >23 million double mutants and reported ~850,000 GIs126. This near-complete GI network in budding yeast confirmed the earlier hypothesis that essential genes are network hubs, displaying on average five times as many interactions as non-essential genes118,126. Consistent with earlier analyses of the PPI network119,127, genes associated with higher fitness effects tended to be more pleiotropic.
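The centrality–lethality comparison described above amounts to computing each gene's degree in an interaction network and comparing the mean degree of essential and non-essential genes. The sketch below assumes an undirected edge list; the node names (E1–E3, N1–N2), the toy network and the helper names are hypothetical and only illustrate the calculation.

```python
from collections import defaultdict
from statistics import mean


def node_degrees(edges):
    """Map each node to its number of distinct interaction partners."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    return {node: len(p) for node, p in partners.items()}


def mean_degree(degrees, nodes):
    """Mean degree over the given nodes; nodes without edges count as 0."""
    return mean(degrees.get(n, 0) for n in nodes)


# Hypothetical toy network: E1-E3 form a complex, N1-N2 are peripheral.
EDGES = [("E1", "E2"), ("E1", "E3"), ("E2", "E3"), ("E1", "N1"), ("N1", "N2")]
ESSENTIAL = {"E1", "E2", "E3"}
NON_ESSENTIAL = {"N1", "N2"}

if __name__ == "__main__":
    deg = node_degrees(EDGES)
    print("essential:", round(mean_degree(deg, ESSENTIAL), 2))
    print("non-essential:", round(mean_degree(deg, NON_ESSENTIAL), 2))
```

Because the three essential nodes here belong to one fully connected complex, their degree is raised by complex membership alone. This mirrors the caveat in the text that pull-down data, which mix direct and indirect interactions, might inflate the degree of essential proteins.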
Hence, essential genes are expected to be in epistasis with a larger number of pathways, explaining their higher degree in GI networks. Thus, whereas the centrality–lethality rule appears to hold well for GI but not for other networks, the ‘centrality–pleiotropy’ rule appears to be a more general property of biological networks.

REVIEWS | ADVANCE ONLINE PUBLICATION | www.nature.com/nrg | Nature Reviews Genetics | © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

[Figure 3 image not reproduced. Part a: mean Ka/Ks of essential, non-essential and all genes in bacterial species grouped by phylum. Part b: mean protein–protein interaction degree of essential, non-essential and all genes in bacteria and eukaryotes (plant, worm and yeast).]

Figure 3 | Emerging properties of essential genes. Representative parameters for evolutionary conservation (the ratio of non-synonymous to synonymous sites (Ka/Ks); part a) and of connectivity in protein–protein interaction networks (degree; part b) are plotted for essential and non-essential genes in the indicated species. These graphs show that the amino acid sequence of essential genes tends to be more conserved than that of non-essential genes and that essential genes tend to participate in larger numbers of protein–protein interactions than non-essential genes. The former can be explained by a more stringent purifying (negative) selection on essential genes due to their critical function for species fitness, as any amino acid change that negatively affects protein function would not be well tolerated by the organism and would probably be pruned by evolution. The latter can be explained by the fact that, relative to non-essential genes, essential genes more often participate in large, multisubunit protein complexes, such as ribosomes, many of which play key roles in essential cellular functions, such as translation. Data have been extracted and replotted from REF. 110 (part a) and REF. 206 (part b).

Synthetic lethality: An extreme form of negative epistasis, whereby the combination of mutations in two or more genes causes cell death, whereas none of the single mutations are lethal in isolation.
Ploidy: The number of sets (that is, full complements) of chromosomes in a genome.

The non-absolute nature of gene essentiality
Despite its deceptively simple definition, gene essentiality is neither binary nor static in its nature. In this section, we review key research milestones that redefined gene essentiality as a property that is dependent on both environmental and genetic contexts and that is subject to evolutionary change.

The context-dependent nature of gene essentiality. Classic genetics has been based on a strict classification of genes as either essential or non-essential for cellular life. Yet, this binary classification was soon proved to be too simplistic, and gene essentiality is now well accepted to be a conditional trait128 (FIG. 4). First, the essentiality of a gene often depends on the environmental context in which it is probed129–131. Classical examples include the concept of auxotrophy, whereby metabolic genes that are required for the synthesis of certain building blocks of life (for example, amino acids or nucleotides) are essential only if the same building blocks are absent from the growth medium or natural environment (FIG. 4a). By carefully modelling the yeast metabolic flux network, Papp et al. extended this concept, demonstrating that the majority of dispensable genes for growth in rich media are actually important for fitness in other growth conditions132. This was later confirmed experimentally by screening libraries of KO deletion strains in hundreds of conditions, as a plethora of non-essential genes were found to be required for optimal fitness in at least one of several tested growth conditions130,131. Qian et al.
even found hundreds of non-essential gene deletions that were actually beneficial in some environments, while being deleterious in others133, a phenomenon known as antagonistic pleiotropy. Gene essentiality also depends on the genetic context. In the case of synthetic lethality, a gene becomes essential only when the function of a second gene is lost134 (FIG. 4b). On the other hand, some essential genes can become dispensable upon the loss of another gene; this is often because the former encodes protective functions towards the toxic effects of the latter135,136 (FIG. 4c). Due to differences in genetic or epigenetic backgrounds, the essentiality of some genes could differ between individuals of the same species or cell types of the same individual, respectively (FIG. 4d). The presence of protective alleles in some healthy humans can even completely mask the effects of loss-of-function mutations associated with highly penetrant Mendelian childhood diseases137 or with sterility138. Furthermore, in two widely used laboratory strains of S. cerevisiae, 44 genes are uniquely essential in the Sigma1278b strain, whereas 13 are essential only in the S288c strain139. In bacterial species, where strains of the same species can differ by as much as half of their genomic content (owing to rampant horizontal gene transfer), changes in essentiality within species are presumably more dramatic, although this remains to be measured. Other genetic contexts that can influence gene essentiality include ploidy: ~1% of genes known to be non-essential in haploid budding yeast were essential in tetraploid cells, where the presence of a larger number of chromosomes increases the burden on the mitotic machinery and the requirement for genome stability genes140. More interestingly, environmental and genetic context dependency seem to be linked. For example, synthetic lethality can be largely dependent on the environment141–143.
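The genetic interactions discussed above can be made concrete with a small calculation. The sketch below follows the glossary definition of epistasis used in this Review, in which the fitness effects of the two single mutants are summed to give the expected double-mutant fitness; many yeast studies instead use a multiplicative expectation, so the additive model, the tolerance `tol` and the function names are assumptions made for illustration.

```python
def epistasis(w_x, w_y, w_xy, w_wt=1.0):
    """Observed double-mutant fitness minus the additive expectation.

    Fitness effects are measured relative to wild type (w_wt); the
    expectation adds the two single-mutant effects to the wild-type fitness.
    """
    expected = w_wt + (w_x - w_wt) + (w_y - w_wt)
    return w_xy - expected


def classify_interaction(w_x, w_y, w_xy, tol=0.05):
    """Label a gene pair; tol is an arbitrary, assumed noise threshold."""
    if w_x > 0 and w_y > 0 and w_xy == 0:
        return "synthetic lethal"  # both singles viable, double inviable
    eps = epistasis(w_x, w_y, w_xy)
    if eps > tol:
        return "positive epistasis"
    if eps < -tol:
        return "negative epistasis"
    return "no epistasis"


if __name__ == "__main__":
    print(classify_interaction(0.9, 0.8, 0.0))  # redundant genes X and Y
    print(classify_interaction(0.0, 0.9, 0.8))  # loss of Y rescues loss of X
    print(classify_interaction(0.9, 0.8, 0.7))  # effects simply add up
```

The second call corresponds to the protective essential gene scenario: deleting X alone is lethal, but the double mutant is viable, which registers as strong positive epistasis.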
Moreover, yeast genes with more environment-dependent phenotypes or chemical–genetic interactions tend to have larger numbers of GIs119,141. In accordance with the centrality–pleiotropy rule, this could again be explained by the fact that essential genes are associated with a higher level of functional pleiotropy than non-essential genes. Comparing essentialomes across different species and contexts has led to some re-evaluation of earlier notions. For instance, the previously mentioned significant difference in evolutionary conservation between bacterial essential and non-essential genes appears to be both species-dependent and media-dependent110,144. Nevertheless, we expect that the invariant set of core essential genes across eukaryotic and prokaryotic organisms could eventually be identified, and these genes are likely to underlie the fundamental processes that drive the reproductive success of cellular organisms145. Cataloguing core and context-dependent essential genes is particularly relevant for studying disease in multicellular organisms, where delineating tissue-specific essential genes has the potential to reveal both the genetic roots of and candidate targets for tissue-specific diseases. Similarly, understanding the set of genes and GIs that are required for cellular and/or organismal reproductive success is the first step for developing cancer-specific or pathogen-specific therapies and for precision medicine91,93,146 (see the subsection ‘Implications for therapeutic applications’).

[Figure 4 image not reproduced. Panels: a, auxotrophy; b, synthetic lethality; c, protective essential genes; d, cell-type-specific essentiality; e, karyotype-dependent essentiality.]

Figure 4 | Context-dependent gene essentiality. Schematic representations illustrating different examples of context-dependent gene essentiality. a | A hypothetical gene X encodes enzyme X, which is required for the production of the essential metabolite B. In an environment where metabolite B is present, gene X is dispensable. When metabolite B is absent, gene X becomes essential. This phenomenon is also known as auxotrophy. b | Hypothetical genes X and Y encode enzymes performing redundant biochemical reactions. Whereas inactivation of either gene alone leads to viable cells, the simultaneous deletion of both genes causes cell death. This is an example of synthetic lethality. c | Hypothetical gene X encodes an inhibitor of toxin Y. In the absence of toxin Y, gene X is dispensable, but its activity is required for viability in the presence of the toxin. Gene X is an example of a protective essential gene. d | Hypothetical genes X and Xʹ encode mutually exclusive and redundant subunits of an essential protein complex with subunit Y. In cells in which the expression of gene Xʹ is epigenetically silenced, gene X becomes essential. This could form the basis of cell type-specific essentiality in multicellular eukaryotes. e | Hypothetical gene X encodes a protein that promotes essential process X. At a normal level of expression, the product of gene Y does not contribute to process X. Upon upregulation of protein Y (for example, due to aneuploidy of the chromosome encoding gene Y), a hidden function of protein Y is unmasked, leading to its promotion of process X. Therefore, the essentiality of gene X could be bypassed by the acquisition of mutations that upregulate gene Y. This is the basis of high copy number suppression screens and occurs frequently during adaptive evolution of yeast species.

The evolvable nature of gene essentiality. The essentiality of a gene does not only depend on the context in which it is probed; it can also change in the course of evolution. Indeed, as genomes evolve, genetic backgrounds can be altered in ways that change the essentiality of a gene. For instance, roughly one-third of the genes found to be essential in E. coli are non-essential in B. subtilis and vice versa39,43, whereas ~27% of essential genes in S. pombe fission yeast are non-essential in S.
cerevisiae budding yeast, and ~17% of essential budding yeast genes are non-essential in fission yeast40, confirming that the essentiality of genes often changes during evolution. This change in essentiality can be due to various reasons: genes or functions could arise separately or be lost or replaced by others during evolution (distinct biology of the species); the cellular network could become more robust (for example, by acquiring separate pathways or protein complexes performing the same function); or the network could be rewired to bypass the essentiality43. Overall, protein complexes typically behave uniformly in this evolutionary switch of essentiality, with all subunits being either non-essential or essential within one species120,123,147 and switching essentiality states in a coherent manner between species148,149. This suggests that essentiality is largely a property of entire molecular machines and/or functional modules rather than of individual genes. If changes in genetic background could render an essential gene dispensable, could essential gene loss be compensated by short-term adaptive evolution, thus preventing lineage extinction? One potential mechanism is spontaneous genetic suppression. Suppression screens have been used by geneticists for decades to infer the function of uncharacterized genes and are still successfully utilized today to systematically identify connections between and within biological pathways150. The phenomenon arises when a secondary mutation suppresses the phenotype of a primary mutation, and spontaneous genetic suppression by means of natural acquisition of compensatory mutations can be readily observed in cells that harbour a deletion of a non-essential gene151.
By definition, complete deletion of an essential gene is incompatible with life; therefore, researchers wishing to isolate suppressors of an essential gene typically have to either mutagenize a conditional KO strain before inactivation of the essential gene or resort to a less severe alteration of the essential gene, for example, by employing temperature-sensitive or hypomorphic alleles. However, cells can spontaneously circumvent the lethality associated with the complete deletion of at least some essential genes. Following genetic inactivation of the essential type II myosin-encoding MYO1 gene, which is required for cytokinesis in budding yeast, although the majority of cells succumbed after a few failed cell divisions, some myo1 cells survived after prolonged incubation and improved their fitness and cytokinesis proficiency upon serial passaging. Interestingly, these evolved myo1 cells performed cytokinesis by mechanisms that were fundamentally different from wild-type cells152. Specifically, a subset of evolved myo1 strains acquired extra copies of chromosome XVI, which led to increased expression of RLM1 and MKK2, which encode a transcription factor and a signalling molecule, respectively, that play key roles in the cell wall integrity pathway153. As a result, these evolved myo1 strains achieved cytokinesis not by pulling the plasma membrane from the inside via the actomyosin ring, but by pushing it from the outside via thickening of the cell wall around the bud neck. As type II myosins drive cytokinesis from yeast to human cells154, these results suggest that short-term evolutionary processes, such as the acquisition of aneuploid chromosomes, are sufficient to overcome even the loss of a highly conserved essential gene (FIG. 4e). More recently, some of us undertook a genome-wide effort to establish the generality of this finding. Liu et al.155 tested the extent to which S.
cerevisiae cells could withstand the individual deletion of each of ~1,100 essential genes and designated 88 (~9%) of these as evolvable essential genes. We define evolvable essential genes as essential genes that can be acutely removed from the genome without causing stereotypical cell death; instead, a subset of cells with these genes deleted can undergo short-term adaptive evolution and spontaneously acquire compensatory mutations that suppress the lethal phenotype. Interestingly, evolved strains in which different essential subunits of the nuclear pore complex were deleted increased the gene dosage of BRL1 (which encodes an integral membrane protein) and restored nuclear–cytoplasmic transport by altering membrane fluidity. Consistent with other reports in bacteria and yeast116,152,156, this indicates that adaptation to essential gene loss often occurs by tinkering with seemingly unrelated biological functions rather than by fixing the broken molecular machinery. Efforts to systematically map the interconnections between essential functional units89,118 might help us understand when and how networks can be rescued from such deep valleys in fitness landscapes. Importantly, evolutionary responses to genetic perturbations are not restricted to essential genes. In fact, yeasts and bacteria can also acquire adaptive mutations in response to the deletion of non-essential genes151,157,158, indicating that non-essential genes might encode functions that are still essential for overall cellular fitness and that their loss can thus act as a powerful selective pressure.
Together, these observations suggest the existence of a gradient of gene essentiality, with some essential genes being less essential than others (meaning that they can become dispensable via a change in the environment or via short-term or long-term evolutionary processes) and some non-essential genes being less dispensable than others (meaning that they encode functions that are nevertheless required for full fitness). In support of the existence of such an essentiality gradient, properties previously found to be differentially associated with essential and non-essential genes, such as sequence conservation, phyletic retention and centrality in molecular networks, were found to display intermediate values in evolvable essential genes155. Overall, these observations support a paradigm shift from a qualitative to a quantitative definition of gene essentiality, which must take into account not only the viability of the corresponding mutant cells but also the environmental and genetic context in which it is probed, as well as the ability of the mutant cells to evolve compensatory mechanisms via spontaneous acquisition of suppressor mutations (TABLE 2). Moving forward, just as gene essentiality was found to be context-dependent, the evolvability of gene essentiality is likely to depend on both environmental and genetic contexts. Consistent with this idea, compensatory evolution of S. cerevisiae cells in response to a genetic alteration modelling a human disease was found to be dependent on the genetic background, the environmental condition and a combination of the two159.

Broader implications across various fields
Implications for metabolic engineering and synthetic biology.
Knowledge of which genes are essential or non-essential in a genome in a given context is of crucial importance for metabolic engineering and the growing field of synthetic biology. The optimal production of chemicals, nutrients, pharmaceuticals and biofuels via genetic engineering of microbial cell factories relies on the addition and deletion of key metabolic genes to redirect metabolic fluxes towards desired end products160. To this end, understanding the context-dependent essentiality of individual cellular reactions is key for developing and optimizing relevant pathways, as genes that are non-essential under standard laboratory conditions might become essential under industrial conditions and vice versa. One of the main goals of synthetic biology is the rational design and complete synthesis of a living organism161–163, and minimal genomes are seen as promising starting points for such endeavours164,165. Whereas the essentialome represents the set of genes that are required for reproductive success, the minimal genome represents those genes that are sufficient to sustain cellular life166. By use of a top-down approach, genomes have been successfully minimized by progressively deleting increasing numbers of non-essential genes167–169. These efforts are, however, time consuming, as genes are typically deleted in a stepwise manner and therefore can follow only specific design trajectories. This may lead to unexpected dead ends, as at each step of the process, the genetic context is modified in such a way that could alter the essentiality of other genes. The introduction of synthetic chromosome rearrangement and modification by loxP-mediated evolution (SCRaMbLE)162 promises to accelerate the generation of strains carrying multiple random gene deletions. Multiple SCRaMbLE sites will be incorporated in the first synthetic eukaryotic genome, Sc2.0 (REF. 164).
Once completed, the Sc2.0 genome could then be subjected to multiple rounds of ‘SCRaMbLEing’, which could potentially bypass some of the fitness valleys caused by context-dependent gene essentiality, until a eukaryotic minimal genome is achieved. Bottom-up approaches based on the synthesis of minimal genomes consisting only of essential genes, on the other hand, have proved much less straightforward. In silico modelling of bacterial metabolic networks demonstrated that minimal genomes require more than just the essential genes170. Experimental efforts to build a minimal genome consisting of only the 375 genes found to be individually essential in M. genitalium24 have consistently failed to yield viable cells165. An approximately minimal bacterial genome, Syn3.0, was eventually built based on several rounds of rational design and random mutagenesis data on progressively reduced genomes and contained 98 more genes than the initially predicted set of individually essential genes. This observation was attributed to non-essential genes in the original genome becoming essential or quasi-essential in the context of the reduced genome due to synthetic lethality165. This result highlights the difficulty in predicting minimal genomes from single-gene deletion or knockdown (KD) studies alone and the importance of epistasis in determining the essentiality of a gene. Genome-wide GI maps will be required to improve these predictions, but due to the combinatorial nature of the problem, mapping even only all binary interactions within an organism is currently a colossal effort119,126,150. However, as more ‘non-essential’ genes are removed from a genome, knowledge of higher-order GIs becomes important, rendering such predictions almost impossible.
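The gap between the set of individually essential genes and a viable minimal genome can be illustrated with a toy model. In the sketch below, a cell is assumed to be viable if it retains every core gene and at least one member of each redundant (synthetic lethal) pair; the gene names, the viability rule and the function names are hypothetical simplifications and are not taken from the Syn3.0 work.

```python
def is_viable(genome, core, redundant_pairs):
    """Viable if all core genes and one gene of each redundant pair remain."""
    return core <= genome and all(a in genome or b in genome for a, b in redundant_pairs)


def individually_essential(genome, core, redundant_pairs):
    """Genes whose single deletion from the full genome is lethal."""
    return {g for g in genome if not is_viable(genome - {g}, core, redundant_pairs)}


def minimize(genome, deletion_order, core, redundant_pairs):
    """Top-down reduction: delete genes one at a time while cells stay viable."""
    current = set(genome)
    for gene in deletion_order:
        if gene in current and is_viable(current - {gene}, core, redundant_pairs):
            current.remove(gene)
    return current


# Hypothetical genome: two core genes, one redundant pair, one dispensable gene.
CORE = {"c1", "c2"}
PAIRS = [("x", "y")]
GENOME = {"c1", "c2", "x", "y", "z"}

if __name__ == "__main__":
    print(sorted(individually_essential(GENOME, CORE, PAIRS)))
    print(sorted(minimize(GENOME, ["x", "y", "z"], CORE, PAIRS)))
    print(sorted(minimize(GENOME, ["y", "x", "z"], CORE, PAIRS)))
```

In this toy case only the two core genes are individually essential, yet every reduced genome keeps three genes, and which member of the redundant pair survives depends on the order of deletion. This is the same path dependence that makes stepwise genome reduction follow only specific design trajectories.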
On the other hand, as more efficient technologies become available for generating completely synthetic genomes by replacing long fragments of genomic DNA with synthetic ones171, bottom-up approaches are bound to increasingly yield greatly reduced, if not minimal, microbial genomes even following a trial-and-error process. Synthetic microbial communities and the emerging field of synthetic ecology172 offer another potential avenue for applying the concept of context-dependent gene essentiality. Although multi-species microbial communities often carry out complex bioconversions more efficiently than single-strain microbial cultures, it is challenging to maintain them long term172.

Table 2 | Quantitative definitions of gene essentiality

Extent of essentiality, defined by context dependency:
No essentiality: Dispensable in all environmental and genetic contexts.
Low essentiality: Dispensable in most environmental and genetic contexts.
High essentiality: Indispensable in most environmental and genetic contexts.
Complete essentiality: Indispensable in all environmental and genetic contexts.

Extent of essentiality, defined by evolvability following gene inactivation:
No essentiality: No compensatory mutations required for survival.
Low essentiality: Compensatory mutations are required for survival. For these compensatory mutations, multiple independent compensatory mechanisms exist and/or the mutations occur at high frequency and/or they are easily selected and fixed in the population.
High essentiality: Compensatory mutations are required for survival. For these compensatory mutations, only a few compensatory mechanisms exist and/or the mutations occur at low frequency and/or they are not easily selected and fixed in the population.
Complete essentiality: No compensatory mechanism exists.
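The context-dependency row of Table 2 can be read as a simple scoring rule over the contexts in which a gene has been tested. The sketch below is one possible reading, not a published method: the function name is ours, the `most` threshold of 0.5 is an arbitrary assumption, and in practice "all contexts" can only mean all contexts that were actually tested.

```python
def context_essentiality(required_in_context, most=0.5):
    """Classify a gene from per-context flags (True = indispensable there).

    The `most` cut-off separating low from high essentiality is an assumed
    value; a fraction exactly equal to it is treated as low essentiality.
    """
    flags = list(required_in_context)
    if not flags:
        raise ValueError("at least one tested context is needed")
    fraction = sum(flags) / len(flags)
    if fraction == 0:
        return "no essentiality"
    if fraction == 1:
        return "complete essentiality"
    return "high essentiality" if fraction > most else "low essentiality"


if __name__ == "__main__":
    # Hypothetical gene required in one of four tested growth conditions.
    print(context_essentiality([True, False, False, False]))
    # Hypothetical gene required in three of four tested growth conditions.
    print(context_essentiality([True, True, True, False]))
```

A continuous score, such as the fraction itself, would arguably fit the quantitative view of essentiality better than four bins; the bins are kept here only to match the table.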
Syntrophic relationships: A phenomenon by which one species requires the product of another species to survive.

Cooperative interactions could be predicted by in silico modelling173 or established by engineering mutual syntrophic relationships in a synthetic community. These relationships can be achieved by introducing mutual viability dependencies between community members by taking advantage of what could be termed ‘ecology-dependent gene essentiality’, that is, by deleting genes in one species that are essential only if another species is absent from the community. This strategy has been successfully implemented for synthetic communities of up to 14 members and effectively prevented some members from dominating the community and others from going extinct174.

Implications for therapeutic applications. Given the fundamental role played by essential genes, it is unsurprising that they represent current and potential novel targets of many antimicrobial and anticancer compounds91,146,175,176. Yet the concept of core versus context-dependent gene essentiality could be exploited in different ways to eliminate pathogens or cancer cells. For antimicrobial therapy, the goal would be to target a cellular function that is universally essential for the microorganism under most, if not all, environments and genetic backgrounds, so that it will be effective against different clinical isolates of the pathogen and in any possible body part of the host (FIG. 5a). This strategy can translate to both broad-spectrum and narrow-spectrum compounds, with the former focusing on conserved essential genes across species and the latter on conserved essential genes within a specific pathogenic species. Pathogen-specific drugs, which could also include drugs targeting non-essential virulence genes177, may also mitigate the collateral damage on the commensal microbiome and the associated dysbiosis or secondary infections178.
By contrast, for cancer therapies, the focus would be on genes that are specifically essential in tumour cells but not in normal cells; this context-dependent essentiality would provide therapeutic efficacy without having excessive toxic effects on the whole body (FIG. 5b). An example of this strategy has been implemented in the case of cancers that have mutations in the BRCA1 or BRCA2 breast and ovarian cancer susceptibility genes. Tumour cells with homozygous BRCA1 or BRCA2 loss of function (obtained through germline acquisition of a heterozygous mutant state, followed by cancer-associated somatic inactivation of the remaining functional allele) become dependent on poly(ADP-ribose) polymerase (PARP) activity179. This observation has been exploited for developing PARP inhibitors, such as olaparib, which is now clinically approved for the treatment of ovarian cancer180 (FIG. 5c). Indeed, the concept of synthetic lethality, whereby the presence of cancer-specific mutations in certain genes renders other genes essential for the proliferation and survival of cancer cells181, has matured into a drug-targeting strategy for oncology programmes in both industry and academia. Patient-tailored therapy based on individualized cancer drug susceptibility profiles is already yielding promising results for precision medicine93,182 (FIG. 5d).

Finally, the fact that gene essentiality is an evolvable property has tremendous implications for our understanding of how drug resistance arises and our ability to curb its alarmingly progressive incidence183. Selection of drug-resistant pathogens or cancer cells by antimicrobials or chemotherapy, respectively, is an inherent evolutionary process184. Whether to fight off an infection or to beat cancer, anti-proliferative drugs typically target essential functions that are required for cell growth and survival.
If the essentiality of cellular function truly lies on a gradient of evolvability, then genes associated with maximum essentiality (and hence, least evolvability) would make better drug targets. One way to address this is to screen KD or conditional KO libraries of essential genes in pathogens or cancer cells for the gene mutations that show the least propensity to acquire spontaneous suppressors. Such genes would represent superior targets for further drug screening and development, as they would probably be associated with a lower incidence of drug resistance. However, if the evolvability of gene essentiality is as context-dependent as gene essentiality itself, in vitro evolution of mutant cells may not always translate to in vivo emergence of drug resistance. Thus, more research is required to fully define gene essentiality across different contexts and timescales.

Conclusions
Recent technological advances have enabled massive genome-wide screening efforts that are uncovering the complex and multifaceted nature of essential genes. From the examples reviewed here, it is evident that gene essentiality is not a fixed property of a gene but strongly depends on the environmental and genetic context and can be altered in the course of both short-term and long-term evolution. Thus, the essentiality of a gene is a quantitative rather than a binary trait and should be measured on a continuous scale. This idea could be further extended by claiming that no gene is absolutely essential; only functions can be. These emerging concepts are opening up exciting avenues for fundamental research into the basic requirements for life as well as illuminating new paths towards therapeutic exploitation against diseases ranging from cancer to infectious diseases.
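The target-prioritization idea raised under therapeutic implications, namely ranking essential genes by how rarely their inactivation is rescued by spontaneous suppressors, can be sketched as a simple sort. Everything in the example is assumed for illustration: the class and function names, the gene labels and the colony counts are invented, and using suppressor colonies per plated cell as the evolvability proxy is one possible choice rather than an established protocol. As noted in the text, in vitro frequencies of this kind may not carry over to drug resistance in vivo.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SuppressorScreenResult:
    """Outcome of plating cells depleted of one essential gene (toy data)."""
    gene: str
    cells_plated: int
    suppressor_colonies: int

    def __post_init__(self) -> None:
        if self.cells_plated <= 0:
            raise ValueError("cells_plated must be positive")

    @property
    def suppressor_frequency(self) -> float:
        # Assumed proxy for evolvability: surviving colonies per plated cell.
        return self.suppressor_colonies / self.cells_plated


def rank_targets(results: List[SuppressorScreenResult]) -> List[str]:
    """Order genes from lowest to highest suppressor frequency."""
    ordered = sorted(results, key=lambda r: r.suppressor_frequency)
    return [r.gene for r in ordered]


# Hypothetical results for three essential genes.
screen = [
    SuppressorScreenResult("geneX", 100_000_000, 120),
    SuppressorScreenResult("geneY", 100_000_000, 0),
    SuppressorScreenResult("geneZ", 100_000_000, 3),
]
ranking = rank_targets(screen)
print(ranking)
```

Genes at the front of `ranking` would be the first candidates for drug screening under the assumption that low suppressor frequency predicts a lower incidence of resistance.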
Some of the key next steps include re-assessing gene essentiality in the light of its context-dependent and quantitative nature not only in the few model organisms but also in non-model organisms across the tree of life80. This assessment will be instrumental not only for understanding the evolutionary plasticity of essential cellular functions but also for gaining more knowledge of medically and industrially relevant microorganisms. Once systematic quantifications of gene essentiality are available for a large number of species, the next leap will be to understand how these genes are interconnected within the cell. GI maps need to become truly ‘genome-wide’ and not only focus on non-essential genes, which could be accomplished by employing hypomorphic, temperature-sensitive or repressible alleles of essential genes126,185. By comprehensively mapping connections within and between cellular pathways across various species and environmental conditions, these studies will facilitate our understanding of archetypal network
[Figure 5, panels a–d. Panel a: strategy to target a pathogenic microbe causing a systemic infection (target core essential genes required for growth and survival in all body sites; sites shown: blood, spinal fluid, respiratory tract, gastrointestinal tract, kidneys, liver, brain, spleen). Panel b: strategy to target colon cancer (target context-dependent essential genes specifically required for growth and survival of cancer cells; cell types shown: colon cancer cells, normal colonocytes, hepatocytes, bone marrow cells, neurons, hair follicles, cardiomyocytes, splenocytes). Panel c: repair of DNA damage by base-excision repair and homologous recombination in normal cells and in BRCA1–/– or BRCA2–/– cells, with and without a PARP inhibitor. Panel d: essential genes in AML cell lines with or without oncogenic RAS (genes labelled: RCE1, ICMT, RAF1, SHOC2, PREX1) and testing of a patient blood sample for NRAS or KRAS mutations.]

Figure 5 | Exploiting the non-absolute nature of gene essentiality for drug targeting. The figure illustrates examples of how the concept of the core versus context-dependent nature of gene essentiality could be exploited to define drug targets.
a | In the case of infectious diseases, the goal is to eradicate from the human body a microbial pathogen that might have spread to various body sites. In this case, an ideal drug target for antimicrobial therapy should be chosen from the core set of context-independent essential genes of that microbial species. This will ensure that the therapy will work in any part of the human body.
b | In the case of cancer, the aim is to eradicate tumour cells while sparing healthy tissues. In this case, an ideal drug target should be chosen from the set of cancer-specific essential genes, which should be dispensable in normal tissues.
c | In the presence of poly(ADP-ribose) polymerase (PARP), single-strand DNA breaks are effectively repaired by the base-excision repair pathway regardless of BRCA1 or BRCA2 functionality. In the absence of PARP, cells become dependent on BRCA1 and BRCA2 to repair their DNA by homologous recombination. Therefore, PARP is essential only in BRCA1-deficient or BRCA2-deficient cells, providing a therapeutic avenue to selectively eliminate cancer cells.
d | Acute myeloid leukaemia (AML) cells were found to be dependent on PREX1 for survival but only if they were carrying oncogenic RAS mutations93. This could be a strategy to stratify patients with AML for treatment: a sample of blood could be tested for KRAS or NRAS mutations (indicated by the asterisks), which, if detected, could guide the use of hypothetical future anti-PREX1 treatments.
ICMT, isoprenylcysteine carboxyl methyltransferase; RAF1, Raf-1 proto-oncogene, serine/threonine kinase; RCE1, Ras converting CAAX endopeptidase 1; SHOC2, SHOC2, leucine rich repeat scaffold protein. Part c is from REF. 207, Macmillan Publishers Limited.

1. Clavijo, B. J. et al. An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations. Genome Res. 27, 885–896 (2017).
2. Castle, W. E. & Little, C. C. On a modified Mendelian ratio among yellow mice. Science 32, 868–870 (1910).
3. Liu, S. J. et al. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science 355, aah7111 (2017).
4. Lluch-Senar, M. et al. Defining a minimal cell: essentiality of small ORFs and ncRNAs in a genome-reduced bacterium. Mol. Syst. Biol. 11, 780 (2015).
5. Jeong, H., Mason, S. P., Barabasi, A. L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41–42 (2001).
6. Joyce, A. R. & Palsson, B. O. Predicting gene essentiality using genome-scale in silico models. Methods Mol. Biol. 416, 433–457 (2008).
7. Hwang, Y. C. et al. Predicting essential genes based on network and sequence analysis. Mol. Biosyst. 5, 1672–1678 (2009).
8. Ye, Y. N., Hua, Z. G., Huang, J., Rao, N. & Guo, F. B. CEG: a database of essential gene clusters. BMC Genomics 14, 769 (2013).
9. Wei, W., Ning, L. W., Ye, Y. N. & Guo, F. B. Geptop: a gene essentiality prediction tool for sequenced bacterial genomes based on orthology and phylogeny. PLoS ONE 8, e72343 (2013).
10. Jiang, P. et al. Network analysis of gene essentiality in functional genomics experiments. Genome Biol. 16, 239 (2015).
11. Horowitz, N. H. & Leupold, U. Some recent studies bearing on the one gene-one enzyme hypothesis. Cold Spring Harb. Symp. Quant. Biol. 16, 65–74 (1951).
12. Judd, B. H., Shen, M. W. & Kaufman, T. C. The anatomy and function of a segment of the X chromosome of Drosophila melanogaster. Genetics 71, 139–156 (1972).
13. Lefevre, G. Jr. The one band-one gene hypothesis: evidence from a cytogenetic analysis of mutant and nonmutant rearrangement breakpoints in Drosophila melanogaster. Cold Spring Harb. Symp. Quant. Biol. 38, 591–599 (1974).
14. Brenner, S. The genetics of Caenorhabditis elegans. Genetics 77, 71–94 (1974).
15. Goebl, M. G. & Petes, T. D. Most of the yeast genomic sequences are not essential for cell growth and division. Cell 46, 983–992 (1986).
16. Kleckner, N., Roth, J. & Botstein, D. Genetic engineering in vivo using translocatable drug-resistance elements. New methods in bacterial genetics. J. Mol. Biol. 116, 125–159 (1977).
17. Anderson, S. Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic Acids Res. 9, 3015–3027 (1981).
18. Sutton, G. G., White, O., Adams, M. D. & Kerlavage, A. R. TIGR Assembler: a new tool for assembling large shotgun sequencing projects. Genome Sci. Technol. 1, 9–19 (1995).
19. Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995).
20. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
21. Goffeau, A. et al. Life with 6000 genes. Science 274, 546–567 (1996).
22. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
23. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249–256 (1997).
24. Hutchison, C. A. et al. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286, 2165–2169 (1999).
25. Smith, V., Botstein, D. & Brown, P. O. Genetic footprinting: a genomic strategy for determining a gene’s function given its sequence. Proc. Natl Acad. Sci. USA 92, 6479–6483 (1995).
26. Sassetti, C. M., Boyd, D. H. & Rubin, E. J. Comprehensive identification of conditionally essential genes in mycobacteria. Proc. Natl Acad. Sci. USA 98, 12712–12717 (2001).
27. Akerley, B. J. et al. A genome-scale analysis for identification of genes required for growth or survival of Haemophilus