Genômica e Bioinformática (Genomics and Bioinformatics), CEN5789
Diego Mauricio Riaño Pachón
diego.riano@cena.usp.br
Thursday, theory: 8:00h - 12:00h. Thursday, practice: 13:30h - 16:30h.

Rules of the game
You are expected to come to class.
You are expected to read the mandatory literature BEFORE each class.
You are expected to participate actively during class.
Respect: no cell phones, no headphones, no chatting, no SMS/WhatsApp, etc. Pay attention to the professor and your classmates, and participate.
Grades:
Presentation of 1 subject: 30%
Submission of pre-project: 10%
Final project presentation: 40%
Surprise quizzes: 20%
Final grade rounding: grades above *.25 are rounded up to *.5; grades below *.25 are rounded down to the previous integer. Grades above *.75 are rounded up to the next integer; grades below *.75 and above *.5 are rounded to *.5.
All partial grades are on a scale from 0 to 10, in increments of 0.5 only.
e-Disciplinas

Rules of the game: grades and final concept
Grades will be on the scale 0-10.
Final concept                      Grade
A: Excellent, with credit          ≥9.0 - ≤10.0
B: Good, with credit               ≥6.5 - <9.0
C: Regular, with credit            ≥5.0 - <6.5
R: Failed, no credit               <5.0

Rules of the game: weekly presentation of 1 subject per group (30%)
Groups of students will present one theoretical topic, following the weekly schedule available in e-Disciplinas.
Groups have already been created by the professor.
The presentation should last at least 1.5h and no more than 3h, and should cover the topics listed in the program and/or requested by the professor.
Presentation material (e.g., slides) should be sent to the professor by 19h the night before the presentation.
At noon during class, the group in charge of next week's topic will be chosen.

Rules of the game: pre-project (10%)
Groups of 2 students (same groups as before) must choose a project to be developed outside the classroom, applying techniques and concepts learned during the course.
The project idea (2 pages, including half a page of literature references) must be delivered to the professor via e-Disciplinas before the class of week 3.
Students are encouraged to discuss their project with the professor before the class of week 3.

Rules of the game: final project (40%)
The oral presentations of the projects will take place in week 10.
All members of the group must take part in the oral presentation.
The presentation should last no more than 20 min, plus 10 min for questions.
Slides must be sent via e-Disciplinas by 19h the night before the presentations.
A report in the form of a research paper must be delivered to the professor, via e-Disciplinas, before the oral presentation.

Rules of the game: surprise quizzes (20%)
There will be surprise quizzes throughout the course, in an undetermined number and on undefined dates. Look forward to them.

Rules of the game: e-Disciplinas
Everyone, please log into the Moodle system and check that you have access to the course CEN5789-2022.
https://edisciplinas.usp.br/
All tests, quizzes and homework will be done and delivered via e-Disciplinas.

Check groups
Select the group responsible for next week's presentation!

Survey
Have you used Linux before?
Do you know about EMBOSS? Have you used it?
Do you regularly use BLAST at NCBI? With the command line?
Do you use Pfam? HMMER?
Do you know any computer programming language?
Which computational tools do you use regularly to solve biological problems?
In which course are you enrolled?

Talk a little bit about yourselves
In one sentence: what is your research topic?
Why are you interested in this course?
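The grading arithmetic described in the "Rules of the game" slides can be sketched in a few lines. This is only an illustration under stated assumptions: the weights are the percentages from the grading slide (30/10/40/20), the function names are invented here, and because the slides do not say what happens to a grade of exactly *.25 or *.75, the sketch assumes those ties round up.

```python
import math

# Weights from the grading slide, as integer percentages.
WEIGHTS = {"presentation": 30, "pre_project": 10, "final_project": 40, "quizzes": 20}


def weighted_grade(partials):
    """Combine partial grades (0-10 scale) using the slide's percentages."""
    return sum(WEIGHTS[name] * partials[name] for name in WEIGHTS) / 100


def round_to_half(grade):
    """Round a grade to a 0.5 step following the slide's rule.

    ASSUMPTION: exactly *.25 and *.75 round up (the slide leaves ties open).
    """
    whole = math.floor(grade)
    fraction = grade - whole
    if fraction < 0.25:
        return float(whole)
    if fraction < 0.75:
        return whole + 0.5
    return float(whole + 1)


def final_concept(grade):
    """Map a 0-10 grade to the final concept letter (A, B, C or R)."""
    if grade >= 9.0:
        return "A"
    if grade >= 6.5:
        return "B"
    if grade >= 5.0:
        return "C"
    return "R"


# Hypothetical partial grades, for illustration only.
partials = {"presentation": 8.0, "pre_project": 9.0, "final_project": 7.5, "quizzes": 6.0}
grade = round_to_half(weighted_grade(partials))
print(grade, final_concept(grade))  # prints: 7.5 B
```

Using integer percentages keeps the weighted sum exact for partial grades given in 0.5 steps.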
Flow of information in the cell
Replication: an assembly line. The blue protein at the bottom left is the helicase, which unwinds the DNA into its two strands, the leading and the lagging. The lagging strand is synthesized backwards, in Okazaki fragments. The result is two complete, identical copies of the original molecule.
Transcription: the animation starts with TBP bound to the promoter. RNA polymerase and the basal transcription factors arrive, and the transcription initiation complex forms. A specific transcription factor activates the complex and transcription starts. The nascent RNA folds on itself, and this can affect the speed of transcription. Transcription continues until it reaches the termination site. The RNA has to mature before being exported to the cytosol.
Translation: the mature messenger RNA is recognized by the ribosome, which scans it for a start codon; once it finds one, translation begins. View of translation with the small ribosomal subunit removed. tRNA, RNA secondary structure. Shows the aminoacyl site, the peptidyl site and the tRNA exit site. Shows tRNAs charged with amino acids. Shows the nascent polypeptide, which folds on itself autonomously, sometimes helped by other proteins.

Molecular biology for bioinformatics
The complete collection of metabolites is part of the cellular phenotype; in fact, we often use metabolites as a proxy for the state of the cell, e.g., in diagnostics (blood glucose).

What is bioinformatics?
Today's biologists try to work with complete collections of things (omes).
Our working definition: the use of computational, mathematical and statistical tools to handle, analyse, interpret and generate biological data.
Bioinformatics is a discipline that arises from the interaction between biology, statistics and computer science (Figure 1.1).
Its main goals are the handling and analysis of large volumes of data, mainly produced by new technologies in molecular biology such as genomics, proteomics and metabolomics, especially now that new nucleic acid sequencing technologies are revolutionizing the way we study genomes. Another important aspect is the development of new computational methods, algorithms and/or software for analysing these data. According to Philip Bourne (UCSD), "bioinformatics has become the interpreter of the genomic language of DNA and is trying to decipher more complex languages in which proteins are the nouns, interactions are the syntax, metabolic pathways are the sentences and living systems are the complete volume" (Bourne, 2004). Therefore, much like molecular biology, bioinformatics is today a toolbox that every researcher in biology has to master (Stein, 2008 presents a very interesting point of view).

What does a bioinformatician do?
Handling large amounts of data, mostly sequences, spectra and networks.
Design and implementation of databases.
Use and design of standardized vocabularies (ontologies).
Sequence and network alignment.
Statistical assessments: differential gene expression, cluster analysis, functional annotation of clusters.
Data visualization.

Databases

What is a database?
It is a collection of whatever you want. But that collection must be:
Structured
Indexed
Regularly updated (±)
Cross-referenced (±)
It must include tools for updating, adding and deleting entries.
Different formats exist: plain text, relational schemas, XML, etc.

Some biological databases
GenBank, DDBJ, EMBL                DNA, RNA
Swissprot                          Proteins
MEDLINE                            Literature
RCSB Protein Data Bank             3D structures
KEGG                               Metabolic pathways
Swiss-2DPAGE                       2D gels, MS data
Pfam, PROSITE, InterPro            Protein domains
TRANSFAC, PLACE                    Regulatory sites
Want more? http://www.expasy.org/links.html
See also the special database issue of the journal Nucleic Acids Research, published every year in January.

Organism-specific databases
TAIR: The Arabidopsis Information Resource (Arabidopsis thaliana)
J. Craig Venter Institute (wheat, rice, microorganisms)
Ensembl (Human, Pan, Gallus, Mus, Canis, Apis, …)
Chlamydomonas reinhardtii genomeDB (JGI-DOE)
Saccharomyces cerevisiae (www.yeastgenome.org)
Drosophila melanogaster (www.fruitfly.org)
etc.
http://genomesonline.org/
Databases for one species or a group of species.
More detailed information, with references to material banks (seeds, cultures) and specialized analysis tools.

Databases in Biology: NCBI, Entrez and some of its tools

NCBI
Created on Nov 4th, 1988, as a division of the NLM at the NIH.
What it does:
Research at the molecular level in biomedical sciences, employing mathematical and computational methods.
Collaboration with other NIH institutes, with academia, industry and other agencies.
Science communication, through meetings, workshops and conferences.
Training in basic and applied research in computational biology.
Develops, distributes, supports and coordinates access to many databases and software for the scientific and medical community.
Develops and promotes standards for data interchange.
http://www.ncbi.nlm.nih.gov/About/glance/ourmission.html

NCBI: Site map
http://www.ncbi.nlm.nih.gov/Sitemap
NCBI manual: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook

NCBI: Entrez
The life science search engine.
Not a database itself, but an interface to query several databases at once:
Literature
Sequences (DNA, proteins)
3D structures
Protein domains
Population studies
Gene expression data
Genome data
Taxonomic information
http://www.ncbi.nlm.nih.gov/Entrez/
Databases are cross-referenced!

Entrez: neighbors
Besides running the search simultaneously in all databases and creating links among them, Entrez implements the concept of neighbor entries within the same database, e.g., related sequences, related 3D structures.

Online exercise (hint: use the advanced search option)
Go to http://www.ncbi.nlm.nih.gov/
Search for Hemoglobin Subunit Beta (HBB) mRNA in humans.
Select Nucleotide from the results screen.
Show the entry NM_000518.5.
Identify the different pieces of information on the results screen.

Entrez: programmatic access
API: application program interface.
E-utilities (Eutils): access to the data in Entrez without using the web interface.
Very useful for retrieving data automatically.
http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

Some of the databases at NCBI

GenBank
Nucleotide sequence database.
Part of the "International Nucleotide Sequence Database Collaboration", which also includes EMBL and DDBJ.
In its latest release (15 June 2022) there are 1,395,628,631,187 bases from 239,017,893 reported sequences.
It is updated approximately every 2 months.
Sequences are submitted directly by their own authors. GenBank staff carry out only a minimal quality check, and all the information associated with a sequence depends completely on its submitters.
The level of redundancy is high.
http://www.ncbi.nlm.nih.gov/genbank/
Annotating a sequence consists of identifying regions in its primary structure, e.g., introns, exons, domains, motifs, and assigning biological roles to the sequence, e.g., tissues where it is expressed, molecular processes in which it participates.

Record format for a GenBank entry
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

Identifiers in the GenBank format
Locus name: up to 10 characters, must be unique. Because of the growth in the number of sequences over the last decades, the 10-character limit is no longer enough. It was the first identifier in use, and one of its goals was to hint at the function of the gene and its source organism, e.g., HUMBB: human β-globin region.
Accession number: unique identifier. It is conserved (stable) across releases. It has the structure 1+5 or 2+6 (letters + digits). The letters indicate the database. This identifier never changes for a given entry in the database.
Version: this is combined with the accession number, and indicates the version of the entry.

RefSeq
In contrast with GenBank, this is a curated (secondary) database. It is curated by NCBI staff and includes DNA, RNA and protein sequences. Each molecule has a single entry in RefSeq, so there is no redundancy.
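The identifier rules and the E-utilities access described above can be sketched as follows. This is a minimal illustration, not NCBI's own code: the function names are invented here, the GenBank pattern covers only the 1+5 and 2+6 forms mentioned in the slides, and the RefSeq pattern (two letters, an underscore, digits) is an assumption added so that the exercise entry NM_000518.5 can be parsed. The block only builds the request URL for the `efetch` endpoint; it does not contact the server.

```python
import re
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities "efetch" endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# GenBank-style accession as described in the slides: 1 letter + 5 digits
# or 2 letters + 6 digits, optionally followed by ".version".
GENBANK_RE = re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.(\d+))?$")
# ASSUMPTION: RefSeq-style accession such as NM_000518.5.
REFSEQ_RE = re.compile(r"^([A-Z]{2}_\d+)(?:\.(\d+))?$")


def split_accession(identifier):
    """Return (accession, version); version is an int, or None if absent."""
    for pattern in (GENBANK_RE, REFSEQ_RE):
        match = pattern.match(identifier)
        if match:
            version = match.group(2)
            return match.group(1), int(version) if version else None
    raise ValueError(f"unrecognised accession: {identifier!r}")


def efetch_fasta_url(identifier, db="nuccore"):
    """Build the URL that asks efetch for one record in FASTA format."""
    params = {"db": db, "id": identifier, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


print(split_accession("NM_000518.5"))   # ('NM_000518', 5)
print(efetch_fasta_url("NM_000518.5"))
```

Passing the resulting URL to `urllib.request.urlopen` (or to a command-line tool such as `curl`) would download the record; when doing this in bulk, follow NCBI's usage guidelines for request rates.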
GenBank vs. RefSeq

Growth of GenBank data and cost of data generation
[Figure labels: PCR, Draft HGP, Finished HGP, Human Microbiome, 1000 Genomes; Illumina, PacBio, Ion Torrent, ONT, PacBio HiFi, ONT Q20+]
https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data

The data deluge: costs
Stein, 2010. Genome Biology, 11:207
"in the not too distant future it will cost less to sequence a base of DNA than to store it on a hard disk"
There are not enough bioinformaticians to cope with the speed of data generation. Biologists should become savvy in analysing their own sequence data.
http://www.genomebiology.com/2010/11/5/207/figure/F2?highres=y
http://genomebiology.com/2010/11/5/207
Historical trends in storage prices versus DNA sequencing costs. The blue squares describe the historic cost of disk prices in megabytes per US dollar. The long-term trend (blue line, which is a straight line here because the plot is logarithmic) shows exponential growth in storage per dollar with a doubling time of roughly 1.5 years. The cost of DNA sequencing, expressed in base pairs per dollar, is shown by the red triangles. It follows an exponential curve (yellow line) with a doubling time slightly slower than disk storage until 2004, when next generation sequencing (NGS) causes an inflection in the curve to a doubling time of less than 6 months (red line). These curves are not corrected for inflation or for the 'fully loaded' cost of sequencing and disk storage, which would include personnel costs, depreciation and overhead.

PubMed
http://www.pubmed.gov/

Some other databases @ NCBI
OMIM: Online Mendelian Inheritance in Man. Catalogue of human genes and genetic disorders.
Books: NCBI offers several dozen books online.
Taxonomy: taxonomy search engine for the major taxonomic divisions.
Structure: Molecular Modelling Database. Contains 3D protein structures.
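To make the doubling times in the storage-versus-sequencing figure (Stein, 2010) concrete: a quantity that doubles every T months grows by a factor of 2^(t/T) after t months. The sketch below applies that formula to the doubling times printed in the figure legend (14, 19 and 5 months); the 60-month horizon and the function name are chosen here purely for illustration.

```python
# Doubling times (months) as printed in the legend of the Stein (2010) figure.
DOUBLING_MONTHS = {
    "hard disk storage (MB/$)": 14,
    "pre-NGS sequencing (bp/$)": 19,
    "NGS sequencing (bp/$)": 5,
}


def fold_increase(months_elapsed, doubling_months):
    """Fold growth of an exponentially growing quantity: 2 ** (t / T)."""
    return 2 ** (months_elapsed / doubling_months)


horizon = 60  # hypothetical 5-year horizon, chosen only for illustration
for label, doubling in DOUBLING_MONTHS.items():
    growth = fold_increase(horizon, doubling)
    print(f"{label}: x{growth:,.1f} after {horizon} months")
```

With a 5-month doubling time, 60 months correspond to 12 doublings (a factor of 4096), whereas a 14-month doubling time gives only a little over 4 doublings in the same period. That gap is why sequencing output per dollar outpaces storage per dollar.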
Entrez
Go to the site: http://www.ncbi.nlm.nih.gov/
Select the database: PubMed

Entrez
Search for the papers of a single author (Weisshaar B) from one specific year (2009).
Use the field tags DP and AU, for example: Weisshaar B[AU] AND 2009[DP]
Alternatively, you can use the "Advanced search" option.
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp#pubmedhelp.Search_Field_Descrip
http://www.ncbi.nlm.nih.gov/

Entrez
Select the paper shown in the figure.
You can follow this link to find other papers on the same subject.
Click on the title of the paper.

Entrez
Look for the link to related genes and follow it.
Abstract

Entrez
The link "Gene" leads to something like a review paper for that specific transcript. It is the result of someone (an expert) collecting the information in the database (curation).
From this page you can access all the sequences associated with the gene.

We need standardization
A common vocabulary
diego.riano@cena.usp.br - https://diriano.github.io/

Ontologies
A standardized vocabulary in a knowledge domain.
They allow sharing information. For example, in genome annotation, they allow information to be transferred from one species to another.
They allow us to answer questions such as: what are all the transcription factors in rice and Arabidopsis that are anchored to the membrane and active upon developmental cues?

Ontologies
Bard & Rhee, 2004. https://www.nature.com/articles/nrg1295
'dwarf plant' phenotype
Observable: Anatomy: Stem
Trait: Attribute: Height; Value: Height
Experimental condition: Assay: Measurement with a ruler

Ontologies
Anatomy:
Biological process
Behaviour
Metabolism
Chemical structure
Cell types
Cell component
Disease
. . .
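Ontology terms are connected by typed relationships, and the Gene Ontology covered in this part of the course is a directed acyclic graph in which a term can have more than one parent. The sketch below shows how such a graph can be stored and how all the more general terms of a given term can be collected. The term names and edges form a toy example written for this illustration; they are not an extract of the real GO, and the function name is hypothetical.

```python
# Toy graph: each term maps to a list of (relationship, parent term) pairs.
# Term names and edges are illustrative only, not a copy of the real GO.
TOY_ONTOLOGY = {
    "cellular component": [],
    "organelle": [("is_a", "cellular component")],
    "membrane": [("is_a", "cellular component")],
    "nucleus": [("is_a", "organelle")],
    "nucleolus": [("part_of", "nucleus")],
    "nuclear membrane": [("part_of", "nucleus"), ("is_a", "membrane")],
}


def ancestors(term, ontology, relations=("is_a", "part_of")):
    """Collect every more general term reachable from `term`."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in ontology[current]:
            if relation in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("nuclear membrane", TOY_ONTOLOGY)))
# ['cellular component', 'membrane', 'nucleus', 'organelle']
```

Because "nuclear membrane" has two parents, the structure is a graph rather than a tree. This is what lets a gene product annotated with a specific term be found by a query for any of its more general terms.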
Ontologies
Attribute:
Value:
Qualifier
Unit

Ontologies
Assay:
Condition
Environmental
Genotypic
Multiple ontologies and terms can be used to describe a phenotype/trait.

Biological Ontologies
http://www.obofoundry.org/

Gene Ontology (GO)
Developed to unify the genome annotations of the fruit fly, the mouse and yeast.
A large fraction of the annotations are made by human experts.
Describes the functions of gene products.
GO is actually 3 different ontologies:
Cellular Component
Biological Process
Molecular Function
http://www.geneontology.org/
GO is represented as a directed acyclic graph.
Plessis, et al., 2011. doi: 10.1093/bib/bbr002

GO as a directed acyclic graph
[Figure: terms range from general to specific]

Relationships among GO terms
Relationships:
is a (is a subtype of)
part of
regulates
negatively regulates
positively regulates

Gene Ontology example: searching with keywords
http://www.geneontology.org/

Gene Ontology example: selecting the appropriate term

Gene Ontology example: term definition and related info

Gene Ontology example: gene products directly associated with the term
Filtering: only Viridiplantae, within the molecular function ontology.

Gene Ontology example: gene-GO term associations are supported by evidence

Bioinformatics: sequence similarity

Sequence comparison
Goal: to find regions of similarity in a set of sequences.

Sequence comparison
What for?
Transfer functional information from something well studied to something new.
Sequence assembly: transcripts or genomes. Feature identifications, e.g., exon borders, domains. Identify conserved regions, phylogenetics. 60 Sequence comparison: Transferring functional information 61 You just obtained the following sequence of your gene of interest, from a cell cycle experiment. You do not know yet its function >unknown_seq TAAAATTCCCTCCTTCCCTCGTTTTCTGCTCTCTCCTTTTCTTTTCTTCTTCCTCTTTCTCTCACTAAAACCCTTGTTTC TTCACTCGCCGTCGCTTTTCCCGTCATCGGAATCTTCAAATTCGACTCTCGCTTCACTACGATCCATGTCCGGTGTCGTA GATCTTCTCCCGGTTCTTCTCAGCCGCCACCGCCGCCGCCGCACCATCCACCGTCATCTCCGGTTCCGGTTACATCTACG CGGTTATACCACCTATACGTCGTCACTTAGCTTTCGCCTCAACAAAACCTCCGTTTCATCCTTCCGATGATTACCATCGA TTAACCCTTCTTCGCTCAGTAATAATAACGACAGGAGCTTCGTTCATGGTTGTGGTGTTGTAGATCGGGAGGAAGATGCT TCGTTGTTAGATCTCCTTCACGAAAGAGAAAGGCGACAATGGATATGGTTGTTGCTCCATCTAATAATGGATTCACGAGT CTGGTTTCACTAACATACCTAGCAGTCCCTGTCAAACTCCTAGAAAAGGGGGCAGAGTCAACATCAAGTCAAAGGCCAAA GAAACAAGTCAACTCCTCAAACACCCATCTCGACAAACGCTGGTTCTCCTATCACACTTACTCCATCAGGAAGTTGTCGT ATGACAGTTCTTTAGGTCTCCTTACAAAAAAGTTCGTCAATCTAATTAAACAAGCCAAAGATGGAATGCTGGACCTAAAC AAGCTGCAGAAACATTGGAGGTGCAGAAACGACGTATATATGATATTACAAACGTTTTGGAGGGGATAGATCTCATTGAA AGCCTTTCAAGAATCGAATACTTTGGAAGGGAGTTGATGCGTGTCCTGGCGATGAGGATGCTGACGTATCTGTATTACAG CAGAAATTGAAAACCTCGCCCTCGAAGAGCAAGCATTAGACAACCAAATCAGACAAACAGAGGAAAGATTAAGAGACCTG GCGAAAATGAAAAGAATCAGAAATGGCTTTTTGTAACTGAAGAGGATATCAAGAGTTTACCAGGTTTCCAGAACCAGACT TGATAGCCGTCAAAGCTCCTCATGGCACAACTTTGGAAGTGCCTGATCCAGATGAAGCGGCTGACCACCCACAAAGGAGA ACAGGATCATTCTTAGAAGTACAATGGGACCTATTGACGTATACCTCGTCAGCGAATTTGAAGGGAAATTCGAAGACACA ATGGGAGTGGTGCAGCACCACCAGCATGCTTGCCTATTGCTTCTAGCTCAGGATCTACAGGACACCATGACATCGAAGCC TAACTGTTGACAACCCAGAAACTGCTATTGTGTCTCATGATCATCCTCATCCTCAACCCGGCGATACCTCTGATCTTAAT ATTTGCAAGAGCAAGTAGGAGGAATGCTTAAGATTACTCCCTCTGATGTTGAAAATGATGAGTCGGACTACTGGCTTCTC CAAATGCTGAGATTAGCATGACGGATATTTGGAAAACTGACTCTGGTATCGATTGGGATTATGGAATAGCCGACGTGAGT CTCCACCACCAGGAATGGGCGAAATAGCACCAACAGCTGTTGACTCAACCCCGAGATGATCGAATACCAAGCACACTTCT 
AACTTCTGATCCCAAATGTGTTACCTCACAACACTCCCTAAAATCATATACAAGGAGGGAGCAACTACAGAACGTGTATG ACCAATGGCAGGTGCGTTCCATACAATGTACCATTAGATTATGATTCATTTATCGCCTAGAGTGATGTTGTAGAGGAGCA CGAGAAACTAATGTAAGTTTAACAGAGAATGTACTTCATCGGCTGCATTGGTACACTATTTGATTATAATATTTTTGACC CTCAAATGCATCTTTATAATCAGCTA
Your first option is to compare this sequence against a database of sequences of known function.

Sequence comparison: Transferring functional information
For more info check: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3820096/

Sequence comparison: Genome and transcript assembly
http://www.nature.com/scitable/topicpage/complex-genomes-shotgun-sequencing-609
Building unigene sets from EST sequences.

Sequence comparison: Identify features: exon-intron borders
http://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi

Sequence comparison: Identify conserved regions

About similarity and homology
'Protein X is 43% homologous to protein Y': WRONG!
You could say that the two sequences are 43% identical or 43% similar.
Percent homology does not exist.
You can't be partially homologous: that would be like being partially dead, or partially pregnant. You're either homologous or you're not.
Petsko 2001, Genome Biology 2(2)
Similarity: a measure of the degree of match between corresponding positions in two sequences. It is usually expressed as percent identity or percent conservation.
Identity: a measure of the proportion of positions that do not change between two sequences.
Homology: not a measure, but a statement about the relationship between sequences. Two sequences are homologous if and only if they are related by divergent evolution from a common ancestor.

About similarity and homology
Similarity can suggest homology.
Homologous traits/sequences might (or might not) have the same function.
If two sequences are similar along their full length, in most cases they are homologues.
>40% identity strongly suggests homology.
Low-complexity regions (repeats) can be highly similar without being homologous.
Homologous sequences are not always similar.

Types of homology: orthologous and paralogous genes
Orthologous genes: the most recent evolutionary event by which two genes are related to each other is a speciation event.
Co-orthologues or in-paralogues: when a genomic duplication event follows the speciation event.
Paralogous genes: the most recent evolutionary event by which two genes are related to each other is a gene duplication event.
Xenologous genes: genes that originate from a horizontal transfer event.
We need to know the evolutionary history of both the species and the genes to identify these types of homology.

Types of homology: orthologous and paralogous genes
Fitch, W. M. Homology: a personal view on some of the problems. Trends Genet, 2000, 16, 227-231
[Figure labels: Moss, Arabidopsis, Rice]

Genomics
Genome: the collection of all the genes of a biological system.
Most widely used experimental methods: PCR, real-time PCR, microarrays, second-generation sequencing techniques.

Genomics
What role does bioinformatics play in genomics?
Assembly of sequences and of complete genomes (chromosomes).
Prediction of genes and ORFs.
Prediction of gene structure (exons, introns).
Functional annotation of genes through comparative genomics.
Visualization of genomes and their annotation.

During this course we will focus on the application of bioinformatics to genomics, and a little bit on transcriptomics, as one aspect of functional genomics.

Ensembl (http://www.ensembl.org/)
Diego M. Riaño Pachón - MPIMP
Splice variants
Orthologs and paralogs
Regulation
Population variation

That's all for today

[Figure: hard disk storage (MB/$) and DNA sequencing (bp/$) by year, 1990-2012, logarithmic axes. Legend: hard disk storage (MB/$), doubling time 14 months; pre-NGS sequencing (bp/$), doubling time 19 months; NGS sequencing (bp/$), doubling time 5 months.]
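As a closing illustration of the distinction drawn in the similarity slides, percent identity is something that can be computed from an alignment, whereas homology is a conclusion about ancestry and cannot be. The sketch below computes percent identity for two rows of an existing alignment. The function name is invented here, and the choice of denominator is an assumption: columns in which at least one sequence has a residue. Other definitions exist, so values from different tools are not always comparable.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned columns that hold the same residue.

    ASSUMPTION: the denominator is the number of columns in which at least
    one sequence has a residue; other definitions divide by the shorter
    sequence length or by ungapped columns only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("both rows must come from the same alignment")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column that is empty in both rows is ignored
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


# Hypothetical toy alignment, not taken from the course material.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))  # 66.7
```

In the toy alignment four of the six columns match, giving 66.7% identity. Whether two sequences with that score are homologous is a separate question that depends on their evolutionary history.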