Genomics and Bioinformatics

Diego Mauricio Riaño Pachón
diego.riano@cena.usp.br
CEN5789
Thursday: Theory 8:00h - 12:00h.
Thursday: Practice 13:30h - 16:30h
Rules of the game
You are supposed to come to class 
You are supposed to read the mandatory literature BEFORE each class
You are supposed to actively participate during class
Respect: no cell phones, no headphones, no chatting, no SMS/WhatsApp, etc. Pay attention to the professor and your classmates, and participate.
Grades:
Presentation of 1 subject		30%
Submission of pre-project		10%
Final project presentation		40%
Surprise Quizzes			20%
Final grade rounding
Grades above *.25 will be rounded up to *.5; grades below *.25 will be rounded down to the previous integer. Grades above *.75 will be rounded up to the next integer; grades below *.75 and above *.5 will be rounded down to *.5.
All partial grades will be on a scale from 0 to 10, with increments of 0.5 only.
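The weighting and rounding rules can be sketched in Python. This is only an illustration, not the official grade calculator: the function names are hypothetical, and because the slide does not say what happens at exactly *.25 or *.75, both cases are assumed here to round to *.5.

```python
from decimal import Decimal

# Weights taken from the Grades list (presentation, pre-project,
# final project, surprise quizzes).
WEIGHTS = {
    "presentation": Decimal("0.30"),
    "pre_project": Decimal("0.10"),
    "final_project": Decimal("0.40"),
    "quizzes": Decimal("0.20"),
}

def weighted_grade(partials):
    """Combine partial grades (0-10) into the unrounded final grade."""
    return sum(weight * Decimal(str(partials[name]))
               for name, weight in WEIGHTS.items())

def round_final_grade(grade):
    """Round a final grade to the nearest allowed value (integer or *.5)."""
    grade = Decimal(str(grade))
    whole = grade // 1            # integer part
    fraction = grade - whole      # decimal part, 0 <= fraction < 1
    if fraction < Decimal("0.25"):
        return whole              # below *.25: previous integer
    if fraction <= Decimal("0.75"):
        return whole + Decimal("0.5")  # *.25 and *.75 themselves: assumption
    return whole + 1              # above *.75: next integer

example = {"presentation": 8.5, "pre_project": 9.0,
           "final_project": 7.5, "quizzes": 6.0}
raw = weighted_grade(example)
print(raw, round_final_grade(raw))
```

Decimal arithmetic is used instead of floats so that comparisons against the .25 and .75 thresholds are exact.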
e-Disciplinas
Rules of the game
Grades and final concept
Grades will be on the scale 0-10
	Final Concept	Grade
	A: Excellent, with credit	≥9.0 and ≤10.0
	B: Good, with credit	≥6.5 and <9.0
	C: Fair, with credit	≥5.0 and <6.5
	R: Failed, no credit	<5.0
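The concept table can be written as a small lookup. A minimal sketch, where `final_concept` is a hypothetical helper name and the thresholds are copied from the table above:

```python
def final_concept(grade):
    """Map a final grade on the 0-10 scale to its concept letter."""
    if not 0 <= grade <= 10:
        raise ValueError("grade must be between 0 and 10")
    if grade >= 9.0:
        return "A"
    if grade >= 6.5:
        return "B"
    if grade >= 5.0:
        return "C"
    return "R"

print(final_concept(7.5))
```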
Rules of the game
Weekly presentation of one subject per group	30%
Groups of students will present one theoretical topic following the weekly schedule available in e-Disciplinas. Groups have already been created by the professor.
Presentations should last at least 1.5h and no more than 3h and should cover the topics presented in the program and/or requested by the professor.
Presentation material (e.g., slides) should be sent to the professor by 19h the night before the presentation.
At noon during class, the group in charge of next week's topic will be chosen.
Rules of the game
Pre-project 	10%
Groups of 2 students (same groups as before) must choose a project to be developed outside the classroom, applying techniques/concepts learned during the course.
A 2-page project proposal, including half a page of literature references, must be submitted to the professor via e-Disciplinas before the class of week 3.
Students are encouraged to discuss their project with the professor before the class of week 3.
Rules of the game
Final project 	40%
The oral presentations of the project will take place in week 10. All members of the group must participate in the oral presentation. Presentations should not last more than 20 min + 10 min for questions. Slides must be sent via e-Disciplinas by 19h the night before the presentations.
A report in the form of a research paper must be delivered to the professor, via e-Disciplinas, before the oral presentation.
Rules of the game
Surprise Quizzes	20%
There will be surprise quizzes throughout the course: an undetermined number of them, on unannounced dates. Look forward to them.
Rules of the game
e-Disciplinas
Everyone please log into the Moodle system and check whether you have access to the course CEN5789-2022
https://edisciplinas.usp.br/
All tests, quizzes and homework will be done and delivered via e-Disciplinas.
Check groups
Select the group responsible for next week's presentation!
Survey
Have you used Linux before?
Do you know about EMBOSS? Have you used it?
Do you regularly use BLAST at NCBI? With the command line?
Do you use Pfam? HMMER?
Do you know any computer programming language?
Which computational tools do you use regularly to solve biological problems?
In which course are you enrolled?
Tell us a little bit about yourselves
In one sentence: what is your research topic?
Why are you interested in this course?
Flow of information in the cell
Replication: an assembly line. The blue protein at the bottom left is the helicase, which unwinds the DNA into its two strands, the leading and the lagging strand. The lagging strand is synthesized backwards in Okazaki fragments. The result is two complete, identical copies of the original molecule.
Transcription: the animation starts with TBP bound to the promoter. RNA polymerase and the basal transcription factors arrive, and the transcription initiation complex forms. A specific transcription factor activates the complex and transcription starts. The nascent RNA folds onto itself, which can affect the speed of transcription. Transcription continues until it reaches the termination site. The RNA has to mature before being exported to the cytosol.
Translation: the mature messenger RNA is recognized by the ribosome, which scans it in search of a start codon; once it finds one, translation starts. View of translation with the small ribosomal subunit removed. tRNA, RNA secondary structure. Shows the aminoacyl site, the peptidyl site and the tRNA exit site. Shows tRNAs charged with amino acids. Shows the nascent polypeptide, which folds onto itself autonomously, sometimes helped by other proteins.
Molecular biology for bioinformatics
The complete collection of metabolites is part of the cellular phenotype; in fact, we often use metabolites as a proxy for the state of the cell, e.g., in diagnostics (glycemia).
What’s Bioinformatics?
Today’s biologists try to work with complete collections of things (omes)
Our working definition:
The use of computational, mathematical and statistical tools to handle, analyse, interpret and generate biological data
Bioinformatics is a discipline that arises from the interaction between biology, statistics and computer science (Figure 1.1). Its main goals are the handling and analysis of large volumes of data, mainly produced by new technologies in molecular biology such as genomics, proteomics and metabolomics, especially today with the advent of new nucleic acid sequencing technologies that are revolutionizing the way we study genomes. Another important aspect is the development of new computational methods, algorithms and/or software for the analysis of those data. According to Philip Bourne (UCSD), “bioinformatics has become the interpreter of the genomic language of DNA and is trying to decipher more complex languages in which proteins are the nouns, interactions are the syntax, metabolic pathways are the sentences and living systems are the complete volume” (Bourne, 2004). Therefore, much like molecular biology, bioinformatics is today a toolbox that every researcher in biology has to master (Stein, 2008 presents a very interesting point of view).
What does a bioinformatician do?
Handling large amounts of data, mostly sequences, spectra and networks.
Design and implementation of databases.
Use and design of standardized vocabularies (ontologies).
Sequence and network alignment.
Statistical assessments: differential gene expression, cluster analysis, functional annotation of clusters.
Data visualization.
Databases
What is a database?
It is a collection of whatever you want.
But . . . that collection must be:
Structured
Indexed
Regularly updated (±)
Cross-referenced (±)
It must include tools to update, add and delete entries
Different formats exist: plain text, relational schemas, XML, etc.
Some biological databases
GenBank, DDBJ, EMBL	DNA, RNA
Swiss-Prot				Proteins
MEDLINE				Literature
RCSB Protein Data Bank	3D structures
KEGG				Metabolic pathways
Swiss-2DPAGE			2D gels, MS data
Pfam, PROSITE, InterPro	Protein domains
TRANSFAC, PLACE		Regulatory sites
Want more?
http://www.expasy.org/links.html
The database special issue of the journal Nucleic Acids Research, published every year in January
Organism-specific databases
TAIR: The Arabidopsis Information Resource (Arabidopsis thaliana)
J. Craig Venter Institute (Wheat, Rice, microorganisms)
Ensembl (Human, Pan, Gallus, Mus, Canis, Apis, …)
Chlamydomonas reinhardtii genomeDB (JGI-DOE)
Saccharomyces cerevisiae (www.yeastgenome.org)
Drosophila melanogaster (www.fruitfly.org)
etc. . . 
http://genomesonline.org/
Databases for one species or a group of species. They hold more detailed information, with references to material banks (seeds, cultures) and specialized analysis tools
Databases in Biology: NCBI Entrez and some of its tools
Created on Nov 4th, 1988, as a division of the NLM at the NIH
What it does:
Research at the molecular level in biomedical sciences, employing mathematical and computational methods.
Collaboration with other institutes from the NIH, with academia, industry and other agencies
Science communication, through meetings, workshops and conferences.
Training in basic and applied research in computational biology.
Develops, distributes, supports and coordinates access to many databases, and software for the scientific and medical community.
Develops and promotes standards for data interchange.
http://www.ncbi.nlm.nih.gov/About/glance/ourmission.html
NCBI: Site map
http://www.ncbi.nlm.nih.gov/Sitemap
Manual NCBI: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook
NCBI: Entrez The life science search engine
Not a database itself, but an interface to query several databases at once:
Literature 
Sequences
DNA
Proteins
3D structures
Protein domains
Population studies
Gene expression data
Genome data
Taxonomic information
http://www.ncbi.nlm.nih.gov/Entrez/
Databases are cross-referenced!
Entrez: neighbors
Besides running the search simultaneously in all databases and creating links among them, Entrez implements the concept of neighbor entries within the same database, e.g., related sequences, related 3D structures.
Online exercise (Hint: use the option for advanced search)
Go to http://www.ncbi.nlm.nih.gov/
Search for Hemoglobin Subunit Beta (HBB) mRNA in humans
Select nucleotide from the result screen
Show the entry NM_000518.5
Identify the different pieces of information from the results screen.
Entrez: Programmatic access
API: Application programming interface
E-utilities (Eutils): access to the data in Entrez without using the web interface. Very useful for retrieving data automatically.
http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
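As a hedged illustration of programmatic access, the sketch below builds an E-utilities `efetch` request for the HBB mRNA record from the online exercise (NM_000518.5). The `efetch.fcgi` endpoint and its `db`, `id`, `rettype` and `retmode` parameters belong to NCBI E-utilities; the helper name `efetch_url` is hypothetical, and the download step is left commented out because it needs network access.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build an efetch URL for one record in one Entrez database."""
    params = {"db": db, "id": record_id,
              "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"

url = efetch_url("nuccore", "NM_000518.5")
print(url)

# To actually retrieve the sequence (requires network access):
# from urllib.request import urlopen
# with urlopen(url) as handle:
#     print(handle.read().decode()[:200])
```

For larger jobs, check NCBI's usage guidelines (request rate, API key) before automating many requests.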
Some of the databases at NCBI
GenBank
Nucleotide sequence database
Part of the “International Nucleotide Sequence Database Collaboration”, together with EMBL and DDBJ
In its latest release (15 June 2022) there are 1,395,628,631,187 bases from 239,017,893 reported sequences.
It is updated approx. every 2 months
Sequences are submitted directly by their authors. GenBank staff carry out only a minimal quality check, and all the information associated with a sequence depends entirely on its submitters
The level of redundancy is high
http://www.ncbi.nlm.nih.gov/genbank/
Annotating a sequence consists of
	identifying regions in the primary structure, e.g., introns, exons, domains, motifs.
	assigning biological roles to the sequence, e.g., tissues where it is expressed, molecular processes in which it participates.
Record format for a GenBank entry
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
Identifiers in the GenBank format
Locus name: Up to 10 characters, must be unique. With the growth in the number of sequences over the last decades, the 10-character limit is no longer enough. It was the first identifier in use, and one of its goals was to give an idea of the function of the gene and its source organism, e.g., HUMHBB: human β-globin region
Accession Number: Unique identifier. It is stable across releases and never changes for a given entry in the database. It has the structure 1+5 or 2+6 (letters + digits). The letters indicate the source database.
Version: This is appended to the accession number (accession.version, e.g., NM_000518.5) and indicates the version of the entry.
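A minimal sketch of how an accession.version identifier can be split in Python. The regular expression is an assumption, not the full GenBank specification: it covers the 1+5 and 2+6 formats described on the slide, plus RefSeq-style accessions with an underscore (such as NM_000518.5), and ignores other formats in use. The function name `split_accession` is hypothetical.

```python
import re

# Assumed patterns: 1 letter + 5 digits, 2 letters + 6 digits,
# or a RefSeq-style prefix (2 letters, underscore, digits).
ACCESSION_VERSION = re.compile(
    r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}_\d+)(?:\.(\d+))?$"
)

def split_accession(identifier):
    """Return (accession, version); version is None when absent."""
    match = ACCESSION_VERSION.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized accession format: {identifier!r}")
    accession, version = match.groups()
    return accession, (int(version) if version else None)

print(split_accession("NM_000518.5"))
```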
RefSeq
In contrast to GenBank, this is a curated (secondary) database. It is curated by NCBI staff and includes DNA, RNA and protein sequences.
Each molecule has a single entry in RefSeq, so there is no redundancy.
GenBank vs. RefSeq
Growth of GenBank data and cost of data generation
Figure annotations: sequencing technologies (Illumina, PacBio, Ion Torrent, ONT, PacBio HiFi, ONT Q20+) and milestones (PCR, Draft HGP, Finished HGP, Human Microbiome, 1000 Genomes)
https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data
The data deluge: Costs
Stein, 2010. Genome Biology, 11:207
“in the not too distant future it will cost less to sequence a base of DNA than to store it on a hard disk”
There are not enough bioinformaticians to cope with the speed of data generation.
Biologists should become savvy in analysing their own sequence data
http://www.genomebiology.com/2010/11/5/207/figure/F2?highres=y
http://genomebiology.com/2010/11/5/207
Historical trends in storage prices versus DNA sequencing costs. The blue squares describe the historic cost of disk prices in megabytes per US dollar. The long-term trend (blue line, which is a straight line here because the plot is logarithmic) shows exponential growth in storage per dollar with a doubling time of roughly 1.5 years. The cost of DNA sequencing, expressed in base pairs per dollar, is shown by the red triangles. It follows an exponential curve (yellow line) with a doubling time slightly slower than disk storage until 2004, when next generation sequencing (NGS) causes an inflection in the curve to a doubling time of less than 6 months (red line). These curves are not corrected for inflation or for the 'fully loaded' cost of sequencing and disk storage, which would include personnel costs, depreciation and overhead.
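To get a feeling for what these doubling times mean, the sketch below computes the fold increase over a period of time. The values are assumptions taken from the caption: 18 months for disk storage ("roughly 1.5 years") and 5 months for NGS ("less than 6 months"). The function name is hypothetical.

```python
def fold_increase(months, doubling_time_months):
    """Fold increase after `months`, given a constant doubling time."""
    return 2 ** (months / doubling_time_months)

period = 60  # five years, in months
disk = fold_increase(period, 18)  # storage per dollar
ngs = fold_increase(period, 5)    # sequenced bases per dollar
print(f"disk storage: about {disk:.0f}-fold; NGS: about {ngs:.0f}-fold")
```

Under these assumptions, five years gives roughly a 10-fold gain in storage per dollar against a 4096-fold gain in bases per dollar, which is the gap behind the "data deluge".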
PubMed
http://www.pubmed.gov/
Some other databases @ NCBI
OMIM: Online Mendelian Inheritance in Man. Catalogue of human genes and genetic disorders
Books: NCBI offers several dozen books online
Taxonomy: Taxonomy search engine for the major taxonomic divisions.
Structure: Molecular Modelling Database. Contains 3D protein structures.
Entrez
Go to the site: http://www.ncbi.nlm.nih.gov/
Select the database: PubMed
Entrez
Search for the papers of a single author (Weisshaar B) from one specific year (2009). Use the field tags DP (date of publication) and AU (author), for example: Weisshaar B[AU] AND 2009[DP]
As an alternative, you can use the “Advanced search” option
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp#pubmedhelp.Search_Field_Descrip
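The same search can be expressed as a query string and, as a sketch, turned into an E-utilities `esearch` request. The `[AU]` and `[DP]` field tags and the `esearch.fcgi` endpoint with its `db` and `term` parameters are part of PubMed/E-utilities; the helper names are hypothetical, and nothing is sent over the network here.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_query(author, year):
    """Combine an author and a publication year using PubMed field tags."""
    return f"{author}[AU] AND {year}[DP]"

def esearch_url(term, db="pubmed"):
    """Build an esearch URL for the given query term."""
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"

query = pubmed_query("Weisshaar B", 2009)
print(query)
print(esearch_url(query))
```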
http://www.ncbi.nlm.nih.gov/
Entrez
Select the paper shown in the figure:
You can follow this link to find other papers on the same topic.
Click on the title of the paper.
Entrez
Look for the link to related genes and follow it.
Abstract
Entrez
The link “Gene” leads to what reads like a review paper for that specific transcript. This is the result of someone (an expert) collecting the information in the database (curation). From this page you can access all the sequences associated with the gene.
We need standardization
A common vocabulary
diego.riano@cena.usp.br - https://diriano.github.io/
Ontologies
Standardized vocabulary in a knowledge domain: they allow sharing information
For example, in genome annotation they allow transferring information from one species to another
They allow answering questions such as: What are all the transcription factors in rice and Arabidopsis that are anchored to the membrane and active upon developmental cues?
Ontologies
Bard & Rhee, 2004. https://www.nature.com/articles/nrg1295 
‘dwarf plant’ phenotype 
Observable
Anatomy:
Stem
Trait
Attribute:
Height
Value:
Height
Experimental condition
Assay:
Measurement with a ruler
Ontologies
Anatomy:
Biological process
	Behaviour
	Metabolism
Chemical Structure
Cell Types
Cell component
Disease
. . . 
Ontologies
Attribute:
Value:
Qualifier
Unit
Ontologies
Assay:
Condition
	Environmental
	Genotypic
Multiple ontologies and terms can be used to describe a phenotype/trait
Biological Ontologies
http://www.obofoundry.org/
Gene Ontology (GO)
Developed to unify the genome annotations of the fruit fly, the mouse and the yeast
A large fraction of the annotations are made by human experts.
Describes the functions of gene products
GO is actually 3 different ontologies:
Cellular Component
Biological Process
Molecular Function
http://www.geneontology.org/
GO is represented as a directed acyclic graph (DAG)
du Plessis et al., 2011. doi: 10.1093/bib/bbr002
GO as a directed and acyclic graph
General
Specific
Relationships among GO terms
Relationships:
is a (is a subtype of)
part of
regulates
negatively regulates
positively regulates
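Because GO is a directed acyclic graph, a term can have more than one parent, and annotating a gene with a specific term implies all of its more general ancestors. The sketch below shows this on a toy graph: the four terms and their `is_a` links are a hand-written illustration, not data loaded from a GO release, and the names `TOY_PARENTS` and `ancestors` are hypothetical.

```python
# Toy graph: each term maps to a list of (relationship, parent term).
TOY_PARENTS = {
    "biological_process": [],
    "metabolic process": [("is_a", "biological_process")],
    "cellular process": [("is_a", "biological_process")],
    "cellular metabolic process": [
        ("is_a", "metabolic process"),
        ("is_a", "cellular process"),
    ],
}

def ancestors(term, parents=TOY_PARENTS):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("cellular metabolic process")))
```

The `seen` set makes sure a shared ancestor (here the root) is visited only once, even though it is reachable through two different paths.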
Gene Ontology Example
Searching with keywords
http://www.geneontology.org/
Gene Ontology Example
Selecting appropriate term
Gene Ontology Example
Term definition and related info
Gene Ontology Example
Gene products directly associated with term
Filtering: only Viridiplantae, within the molecular function ontology.
Gene Ontology Example
Gene-GO term associations are backed by evidence
Bioinformatics
Sequence similarity
Sequence comparison
Goal: To find regions of similarity in a set of sequences.
Sequence comparison
What for?
Transfer functional information from something well studied to something new.
Sequence assembly: transcripts or genomes.
Feature identification, e.g., exon borders, domains.
Identify conserved regions, phylogenetics.
Sequence comparison:
Transferring functional information
You just obtained the following sequence of your gene of interest, from a cell cycle experiment.
You do not yet know its function
>unknown_seq
TAAAATTCCCTCCTTCCCTCGTTTTCTGCTCTCTCCTTTTCTTTTCTTCTTCCTCTTTCTCTCACTAAAACCCTTGTTTC
TTCACTCGCCGTCGCTTTTCCCGTCATCGGAATCTTCAAATTCGACTCTCGCTTCACTACGATCCATGTCCGGTGTCGTA
GATCTTCTCCCGGTTCTTCTCAGCCGCCACCGCCGCCGCCGCACCATCCACCGTCATCTCCGGTTCCGGTTACATCTACG
CGGTTATACCACCTATACGTCGTCACTTAGCTTTCGCCTCAACAAAACCTCCGTTTCATCCTTCCGATGATTACCATCGA
TTAACCCTTCTTCGCTCAGTAATAATAACGACAGGAGCTTCGTTCATGGTTGTGGTGTTGTAGATCGGGAGGAAGATGCT
TCGTTGTTAGATCTCCTTCACGAAAGAGAAAGGCGACAATGGATATGGTTGTTGCTCCATCTAATAATGGATTCACGAGT
CTGGTTTCACTAACATACCTAGCAGTCCCTGTCAAACTCCTAGAAAAGGGGGCAGAGTCAACATCAAGTCAAAGGCCAAA
GAAACAAGTCAACTCCTCAAACACCCATCTCGACAAACGCTGGTTCTCCTATCACACTTACTCCATCAGGAAGTTGTCGT
ATGACAGTTCTTTAGGTCTCCTTACAAAAAAGTTCGTCAATCTAATTAAACAAGCCAAAGATGGAATGCTGGACCTAAAC
AAGCTGCAGAAACATTGGAGGTGCAGAAACGACGTATATATGATATTACAAACGTTTTGGAGGGGATAGATCTCATTGAA
AGCCTTTCAAGAATCGAATACTTTGGAAGGGAGTTGATGCGTGTCCTGGCGATGAGGATGCTGACGTATCTGTATTACAG
CAGAAATTGAAAACCTCGCCCTCGAAGAGCAAGCATTAGACAACCAAATCAGACAAACAGAGGAAAGATTAAGAGACCTG
GCGAAAATGAAAAGAATCAGAAATGGCTTTTTGTAACTGAAGAGGATATCAAGAGTTTACCAGGTTTCCAGAACCAGACT
TGATAGCCGTCAAAGCTCCTCATGGCACAACTTTGGAAGTGCCTGATCCAGATGAAGCGGCTGACCACCCACAAAGGAGA
ACAGGATCATTCTTAGAAGTACAATGGGACCTATTGACGTATACCTCGTCAGCGAATTTGAAGGGAAATTCGAAGACACA
ATGGGAGTGGTGCAGCACCACCAGCATGCTTGCCTATTGCTTCTAGCTCAGGATCTACAGGACACCATGACATCGAAGCC
TAACTGTTGACAACCCAGAAACTGCTATTGTGTCTCATGATCATCCTCATCCTCAACCCGGCGATACCTCTGATCTTAAT
ATTTGCAAGAGCAAGTAGGAGGAATGCTTAAGATTACTCCCTCTGATGTTGAAAATGATGAGTCGGACTACTGGCTTCTC
CAAATGCTGAGATTAGCATGACGGATATTTGGAAAACTGACTCTGGTATCGATTGGGATTATGGAATAGCCGACGTGAGT
CTCCACCACCAGGAATGGGCGAAATAGCACCAACAGCTGTTGACTCAACCCCGAGATGATCGAATACCAAGCACACTTCT
AACTTCTGATCCCAAATGTGTTACCTCACAACACTCCCTAAAATCATATACAAGGAGGGAGCAACTACAGAACGTGTATG
ACCAATGGCAGGTGCGTTCCATACAATGTACCATTAGATTATGATTCATTTATCGCCTAGAGTGATGTTGTAGAGGAGCA
CGAGAAACTAATGTAAGTTTAACAGAGAATGTACTTCATCGGCTGCATTGGTACACTATTTGATTATAATATTTTTGACC
CTCAAATGCATCTTTATAATCAGCTA
Your first option is to compare this sequence against a database of sequences of known function
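Before any comparison, the sequence has to be read from its FASTA representation. The sketch below is a minimal, dependency-free FASTA reader (in practice a library such as Biopython would usually be used instead); the function names are hypothetical, and the example uses only the first 20 bases of the sequence above.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # first word is the identifier
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}

def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

example = ">unknown_seq\nTAAAATTCCC\nTCCTTCCCTC\n"
sequences = read_fasta(example)
print(len(sequences["unknown_seq"]), gc_content(sequences["unknown_seq"]))
```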
For more info check: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3820096/
Sequence comparison:
Genome and transcript assembly
http://www.nature.com/scitable/topicpage/complex-genomes-shotgun-sequencing-609
Building unigene sets from EST sequences
Sequence comparison:
Identify features: exon-intron borders
http://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi
Sequence comparison:
Identify conserved regions
About similarity and homology
‘Protein X is 43% homologous to protein Y’: WRONG!
You could say that the two sequences are 43% identical or 43% similar.
Percent homology does not exist.
You can't be partially homologous: that would be like being partially dead, or partially pregnant. You're either homologous or you're not.
 Petsko 2001, Genome Biology 2(2)
	Similarity	A measure of the degree of match between corresponding positions in two sequences. It is usually expressed as percent identity or percent conservation.
	Identity	The proportion of positions that do not change between two sequences.
	Homology	Not a measure but a statement about the relationship between sequences. Two sequences are homologous if and only if they are related by divergent evolution from a common ancestor.
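Percent identity is straightforward to compute once two sequences have been aligned. The sketch below assumes the sequences are already aligned (same length, `-` for gaps) and divides by the number of alignment columns, skipping columns that are gaps in both sequences. Other conventions for the denominator exist (e.g., the length of the shorter sequence), so this is one possible definition, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap and res_b == gap:
            continue  # column with gaps in both sequences carries no information
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared

# 4 identical columns out of 6
print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```

Note that the result says nothing about homology by itself; it is only a measure that may support that inference.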
About similarity and homology
Similarity can suggest homology. Homologous traits/sequences might (or might not) have the same function.
If two sequences are similar along their full length, in most cases they are homologous.
>40% identity strongly suggests homology.
Low complexity regions (repeats) can be highly similar without being homologous.
Homologous sequences are not always similar.
Types of Homology:
Orthologous and Paralogous genes
Orthologous genes: The most recent evolutionary event by which two genes are related to each other is a speciation event.
Co-orthologues or in-paralogues: When a genomic duplication event follows the speciation event.
Paralogous genes: The most recent evolutionary event by which two genes are related to each other is a gene duplication event.
Xenologous genes: Genes that originate from a horizontal transfer event.
We need to know both the evolutionary history of the species and of the genes to identify such types of homology
Types of Homology:
Orthologous and Paralogous genes
Fitch, W. M. Homology: a personal view on some of the problems. Trends Genet, 2000, 16, 227-231
Moss
Arabidopsis
Rice
Genomics
Genome: the collection of all the genes of a biological system.
Most widely used experimental methods: PCR, real-time PCR, microarrays, second-generation sequencing techniques
Genomics
What role does bioinformatics play in genomics?
Assembly of sequences and of complete genomes (chromosomes).
Prediction of genes and ORFs.
Prediction of gene structure (exons, introns).
Functional annotation of genes through comparative genomics
Visualization of genomes and their annotation.
During this course we will focus on the application of bioinformatics to genomics, and a little bit to transcriptomics, as one aspect of functional genomics
Ensembl (http://www.ensembl.org/)
Diego M. Riaño Pachón - MPIMP
Splicing variants
Orthologs and paralogs
Regulation
Population variation
That’s all for today
Figure: hard disk storage (MB/$) and DNA sequencing (bp/$) per year, 1990-2012, on a logarithmic scale. Doubling times: hard disk storage 14 months; pre-NGS sequencing 19 months; NGS sequencing 5 months.