::: marginnote
Things to do
- Read the class slides
- Read the bibliography
- Watch the videos from NCBI
- Check the prerequisites and the syllabus
:::
This course teaches how to interpret and understand the results of bioinformatic analyses. Most molecular biologists will work in a team with (or hire) bioinformatic teams, so even if they do not use the tools themselves, all molecular biologists need to understand what the results mean. It is important to speak the same language and to be aware of the key aspects that can lead to an experiment’s success or failure.
Classes
Here you will find the slides from the classes and other supplementary material. Notice that some things are said in class but not written on the slides, so you should take good notes. We recommend taking notes with pen and paper using the Cornell Method.
- Class 1: Why do we care about Bioinformatics? (Sep 27, 2021). [Video], [Slides].
  What is and what is not Bioinformatics. What will we do here.
- Class 2: Finding data online. (Sep 30, 2021). [Video], [Slides].
  Finding data online.
- Class 3: Taxonomy and Ontologies. (Oct 7, 2021). [Slides].
  How we decompose complex things to understand them.
- Class 4: Automatization of NCBI searches. (Oct 7, 2021). [Slides].
  Let the computer do the boring part.
- Class 5: Optimal pairwise alignment. (Oct 11, 2021). [Slides].
  How to know if two sequences are similar.
- Class 6: Global and Local Alignment. (Oct 14, 2021). [Video], [Slides].
  How to know if (parts of) two sequences are similar. See also:
  - Needleman, Saul B., and Christian D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48, no. 3 (1970): 443–53.
  - Smith, T. F., and M. S. Waterman. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147, no. 1 (1981): 195–97. https://doi.org/10.1016/0022-2836(81)90087-5.
  - Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.
  - Henikoff, S., and J. G. Henikoff. “Amino Acid Substitution Matrices from Protein Blocks.” Proceedings of the National Academy of Sciences 89, no. 22 (1992): 10915–19. https://doi.org/10.1073/pnas.89.22.10915.
- Class 7: Affine Gaps are more realistic. (Oct 18, 2021). [Video], [Slides].
  It is not like Google. Results depend on the options you choose. What are the options? See also:
  - Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. “Basic Local Alignment Search Tool.” Journal of Molecular Biology 215, no. 3 (October 5, 1990): 403–10. https://doi.org/10.1016/S0022-2836(05)80360-2.
  - NCBI BLAST topics.
  - NCBI BLAST documentation.
  - Five Teaching Examples Using BLAST. (29:37)
  - Using BLAST Well. (43:53)
  - BLAST Results: Expect Values, Part 1. (02:30)
  - BLAST Results: Expect Values, Part 2. (03:39)
  - Introducing a New Web BLAST Results Page. (03:13)
  - Getting the Most out of Web BLAST Tabular Format. (27:06)
  - A Practical Guide to NCBI BLAST. (1:22:09)
  - Using BLAST for Genomic Analysis. (10:37)
- Class 8: BLAST. (Oct 21, 2021). [Video], [Slides].
  It is not like Google. Results depend on the options you choose. What are the options?
- Class 9: Computational Cost and Heuristics. (Oct 25, 2021). [Slides].
  How does BLAST work? Solving the easy question, not the correct one.
- Class 10: Multiple Sequence Analysis. (Nov 1, 2021). [Video], [Slides].
  What is conserved among several sequences? What are the polymorphisms? How to find patterns without aligning.
- Class 11: Clustal. (Nov 1, 2021). [Slides].
  Practice doing multiple alignments.
- Class 12: Review of Midterm and Homework. (Nov 22, 2021). [Slides].
  Practical comments about your answers. There are no slides. Everything is discussed in the classroom.
- Class 13: Phylogenetic Trees. (Nov 25, 2021). [Video], [Slides].
  Building a time machine, and failing.
- Class 14: DNA Melting Temperature. (Nov 29, 2021). [Slides].
  How to design primers.
- Class 15: Designing primers. (Dec 2, 2021). [Slides].
  How to design primers.
- Class 16: DNA Sequencing. (Dec 6, 2021). [Video], [Slides].
  How can we know the genome of an organism?
- Class 17: Genome Assembly. (Dec 9, 2021). [Video], [Slides].
  How can we know the genome of an organism? See also:
  - “Introduction to Bioinformatics”, lecture by Yuzhen Ye, Indiana University Bloomington.
  - Network Algorithms for Molecular Biology lesson on “Introduction to (de novo) assembly”, by Blerina Sinaimeri, Université Lyon I.
  - “Foundations of Computational Systems Biology”, lecture by David K. Gifford, MIT.
  - Chou, H-H., and M. H. Holmes. “DNA Sequence Quality Trimming and Vector Removal.” Bioinformatics 17, no. 12 (2001): 1093–1104.
- Class 18: Solution of Homework 2. (Dec 13, 2021). [Slides].
  We practice the solution of Neighbor Joining.
- Class 19: Overlap-Layout-Consensus Assembly & Statistics. (Dec 16, 2021). [Video], [Slides].
  The classical method to assemble a genome.
- Class 20: Definitions. (Dec 20, 2021). [Slides].
  Make sure you know the meaning of these words.
- Class 21: De Bruijn Assembly. (Dec 23, 2021). [Slides].
  The new method to assemble a genome.
- Class 22: Mapping reads to a reference. (Dec 27, 2021). [Video], [Slides].
  How can we know the genome of an organism?
- Class 23: Understanding SAM files. (Dec 30, 2021). [Slides].
  Reads are mapped to the genome using one of many tools (bowtie, bwa, hisat). All these programs give their results in SAM/BAM format. What is that format?
- Class 23.1: Summary of the course. (Dec 30, 2021). [Slides].
  How can we know the genome of an organism?
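As a rough illustration of the SAM format mentioned in Class 23, the sketch below splits one alignment line into its eleven mandatory tab-separated fields and checks two bits of the FLAG field. It is not taken from the class slides: the names `SamRecord`, `parse_sam_line`, `is_unmapped`, and `is_reverse_strand` are hypothetical, and the example read is made up. Header lines (those starting with `@`) and optional tags are ignored here.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory fields of a SAM alignment line."""
    qname: str   # read (query) name
    flag: int    # bitwise flags describing the alignment
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string (matches, insertions, deletions, ...)
    rnext: str   # reference name of the mate / next read
    pnext: int   # position of the mate / next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional tags after field 11 are dropped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10],
    )


def is_unmapped(record: SamRecord) -> bool:
    """FLAG bit 0x4 marks a read that could not be mapped."""
    return bool(record.flag & 0x4)


def is_reverse_strand(record: SamRecord) -> bool:
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(record.flag & 0x10)


# Made-up example line: a read mapped to the reverse strand of "chr1".
example = "read1\t16\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, is_reverse_strand(rec), is_unmapped(rec))
```

In practice, dedicated tools and libraries handle SAM/BAM files; the point of the sketch is only that a SAM file is plain text that can be read field by field.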
Sequences used in classes
- Terje Steinum’s metagenomic data
- E. coli 16S gene
- Several dehydrin proteins
- Accession numbers for the figures of Aas 2005: Fig 1, Fig 2, Fig 3, Fig 4, Fig 5, Fig 6, Fig 7, Fig 8.
Attendance
By regulation from the Rectory, students need to attend at least 70% of the classes. If you cannot attend, you must deliver all homework on time. Late submissions will not be accepted.
The attendance book is updated every week and can be seen in Google Sheets.
Prerequisites
This course does not require knowledge of coding or programming, but it will always be a strong advantage —in this course and in professional life— to know how to code a program.
You will need:
- A computer with internet access for doing the homework.
- To know how to handle files and folders on the computer: how to copy and move files, and how the folder structure works.
- To know the difference between text and binary files, and between text editors, word processors, and integrated development environments.
- A text editor (not a word processor). There are many, and you can use your favorite one. We recommend Visual Studio Code.
We recommend (but do not require):
- Learn how to use the Unix/Linux command line.
- You can install Linux on your computer, either alongside Windows or as a virtual machine.
- Alternatively, you can install Git for Windows and use the bash command line in Windows. This will work for ≈90% of the commands.
- Sometimes it is an advantage to write some small programs. It is good to know a little bit of R or Python. We recommend using RStudio and Jupyter Notebooks.
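To give an idea of what such a small program can look like, here is a minimal Python sketch that converts FASTQ records into FASTA. It assumes the simplest FASTQ layout, with exactly four lines per record; the function names `read_fastq` and `fastq_to_fasta` and the two demo reads are hypothetical, not course material.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality), assuming four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip()
        if not header:          # skip blank lines between records
            continue
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of record, got: {header!r}")
        sequence = handle.readline().rstrip()
        handle.readline()       # the '+' separator line is ignored
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def fastq_to_fasta(handle: TextIO) -> str:
    """Return FASTA text; the quality line is dropped in the conversion."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in read_fastq(handle))


# Made-up input, held in memory so the example needs no file on disk.
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nFFFF\n")
fasta_text = fastq_to_fasta(demo)
print(fasta_text, end="")
```

To run it on a real file, the `io.StringIO` object could be replaced by `open("reads.fastq")`, where the file name is again only an example.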
Syllabus
We partially follow the plan proposed by Sayres et al. (2018) (Sayres, et al. “Bioinformatics Core Competencies for Undergraduate Life Sciences Education.” PLoS ONE 13, no. 6 (2018): 1–20. https://doi.org/10.1371/journal.pone.0196878). At the end of the course, students should be able to:
- Understand the role of computation and data mining in hypothesis-driven processes within the life sciences
- Understand computational concepts used in bioinformatics
- Know the basic file types used in bioinformatics (FASTA, GBK, GFF, BLAST, FASTQ, SAM)
- Understand tree structures that are used to understand biological entities: phylogeny, taxonomy, ontology. Understand the difference between taxonomy and phylogeny
- Know how to access genomic data on the web
  - Access NCBI nucleotide, protein, GEO, SRA databases, Entrez query system, EBI databases.
- Know how to handle the basic file types used in bioinformatics
  - How to read them, how to understand them, how to transform one into another.
- Know how to visualize DNA sequences, partial genome assembly results, and protein domains
- Understand the results given by a bioinformatic tool
- Know the different types of pairwise alignments (global, semi-global, local) and when to use each one
- Know the biological hypotheses behind the alignment scores
- Understand the challenges of multiple alignment and how to use multiple alignments to find SNPs. Know how to build phylogenetic trees.
- Understand how database search works:
- Understand the difference between algorithms and heuristics, the role of indices
- Assigning putative functions to coding genes, using COG and Gene Ontology
- Assigning putative taxonomic identity, using alignment and alignment-free methods
- Understand the main DNA-assembly methodologies: Overlap-layout-consensus and De Bruijn graphs.
- Know how to design PCR primers and understand how to calculate the DNA melting temperature
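For the melting temperature item above, one common rule of thumb for short primers is the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in °C. The sketch below implements only that rule; whether the course uses this formula or a more accurate one (for example, a nearest-neighbor model) is an assumption, and the function name `wallace_tm` and the example primer are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate the melting temperature (deg C) with the Wallace rule.

    Tm = 2 * (number of A and T) + 4 * (number of G and C).
    The rule is a rough approximation meant for short oligonucleotides.
    """
    seq = primer.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in primer: {sorted(unexpected)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# Made-up 18-nt primer: 10 A/T and 8 G/C bases.
example_primer = "ATGCATGCATGCATGCAT"
example_tm = wallace_tm(example_primer)
print(example_primer, example_tm)
```

The rule weights G and C twice as much as A and T because G·C pairs are more stable, which is why GC-rich primers melt at higher temperatures.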
Online supplementary material
Bibliography
The list of recommended and mandatory papers is on a separate page.
Web references
NCBI Videos: Sequences
These videos are complementary to our classes. They cover the same topics in more detail. Please watch them to better understand this course.
- NCBI Minute: A Beginner’s Guide to Genes and Sequences at NCBI (33:44)
- NCBI Minute: How to Quickly Retrieve Sequences from NCBI (23:38)
- NCBI: Download a custom set of records (03:11)
- NCBI: Retrieve Sequences for an Organism (01:36)
- Obtain Genomic Sequence for a gene (02:47)
- Webinar: Accessing 1000 Genomes Data at NCBI (32:15)
- NCBI Minute: Important Changes Coming to the Sequence Databases - GI Numbers (24:26)
NCBI Genome Visualization
NCBI Literature Search
- Webinar: Pubmed for Scientists (45:19)
- NCBI Minute: Tailor Your PubMed Search Experience with My NCBI (07:47)
- NCBI Minute: Keeping Current and Getting Help with NCBI Resources (14:22)
- NCBI Minute: On the NCBI Bookshelf, Textbooks for Free! (19:42)
- NCBI Minute: An Updated PubMed is on its Way! (25:30)
- Need the Full Text Article? (02:03)
- The NCBI Minute: PubMed Commons (12:06)
- NCBI Minute: Finding Genes in PubMed (11:50)
- The NCBI Minute: How You and Your Journal Club Can Contribute Using PubMed Commons (12:48)
- PubMed: Using the Advanced Search Builder (03:12)
Searching
- NCBI Minute: Finding Gene, Protein and Chemical Names, Aliases and Synonyms (15:17)
- NCBI Minute: How to Locate and Use Human Genomes and Annotations from the NCBI (09:08)
- Find in This Sequence (02:17)
- Save Search Results in Collections, including Favorites (02:57)
- NCBI Minute: Setting Up Alerts for New Data in My NCBI (07:46)
- NCBI Minute: Automate PubMed Searches & Save Citation Collections with My NCBI (12:55)
- My NCBI (02:30)
- PubMed Advanced Search Builder (02:27)
- PubMed: The Filters Sidebar (02:02)
- Use MeSH to Build a Better PubMed Query (03:03)
- E-Utilities Introduction (03:46)
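The E-utilities mentioned in the last video are the programmatic interface to the Entrez query system. As a hedged sketch of what an automated search looks like, the code below only builds the URL of an ESearch request and does not contact NCBI. The helper name `esearch_url` and the search term are illustrative assumptions, not part of the course material.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for the given database and Entrez query.

    db     -- Entrez database name, e.g. "nuccore" or "pubmed"
    term   -- the query, written as in the NCBI web search box
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode escapes spaces and brackets so the query is a valid URL.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative query: nucleotide records for E. coli with "16S" in the title.
url = esearch_url("nuccore", "Escherichia coli[Organism] AND 16S[Title]", retmax=5)
print(url)
```

Opening the printed URL in a browser, or fetching it with a tool of your choice, returns the matching record identifiers. NCBI asks users to keep automated requests to a reasonable rate.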