::: marginnote
Things to do
- Read the class slides
- Read the bibliography
- Watch the videos from NCBI
- Check the prerequisites and the syllabus
:::
In this course we teach how to interpret and understand the results of bioinformatic analyses. Most molecular biologists will work in a team with (or hire, or be hired by) bioinformatics teams, so even if they do not use the tools themselves, all molecular biologists need to understand what the results mean. It is important to speak the same language and to be aware of the key aspects that can lead to an experiment’s success or failure.
Classes
This year’s slides differ from those of previous years in content and organization. Sometimes we use copyrighted material that can be shown in class but cannot be published on the web; in those cases the slides are not here. We recommend that you take notes during classes, since many important things are written on the whiteboard but not in the slides. We recommend taking notes with pen and paper using the Cornell Method.
You can find the slides and videos of previous years at Bioinfo 2021 and Bioinfo 2022. Here you will find this year’s slides.
- Class 1: Why do we care about Bioinformatics? (Oct 5, 2023). [Slides].
  What is and what is not Bioinformatics. What will we do here?
- Class 2: Taxonomy. (Oct 12, 2023). [Slides].
  How to understand the universe.
- Class 3: Comparing sequences. (Oct 19, 2023). [Slides].
  How many genetic codes? What are their differences? See also:
  - Google Sheet used in class: “class03-bioinfo-2023”.
- Class 4: Global and Local Alignment. (Oct 26, 2023). [Slides].
  How to know if (parts of) two sequences are similar. See also:
  - Google Sheet used in class: “class04-bioinfo-2023”.
  - Needleman, Saul B., and Christian D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48, no. 3 (1970): 443–53.
  - Smith, T. F., and M. S. Waterman. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147, no. 1 (1981): 195–97. https://doi.org/10.1016/0022-2836(81)90087-5.
  - Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.
  - Henikoff, S., and J. G. Henikoff. “Amino Acid Substitution Matrices from Protein Blocks.” Proceedings of the National Academy of Sciences 89 (1992). https://doi.org/10.1073/pnas.89.22.10915.
- Class 5: Finding Local Alignments. (Nov 2, 2023). [Slides].
  Looking for local matches is different from looking for global ones. We need to use scores; they make more biological sense. See also:
  - Google Sheet used in class: “class05-bioinfo-2023”.
- Class 6: Understanding BLAST. (Nov 9, 2023). [Slides].
  It is not like Google. Results depend on the options you choose. What are the options?
- Class 7: Trees representing distance. (Nov 30, 2023). [Slides].
  Trees are used to represent phylogenetic relationships.
- Class 8: Multiple Sequence Alignment. (Dec 7, 2023). [Slides].
  What is conserved among several sequences? What are the polymorphisms? How to find patterns without aligning.
- Class 9: Phylogenetic Trees. (Dec 14, 2023). [Slides].
  Building a time machine, and failing.
- Class 10: DNA Melting Temperature. (Dec 21, 2023). [Slides].
  How to design primers.
- Class 11: Designing primers. (Dec 28, 2023). [Slides].
  How to design primers.
- Class 12: Course Summary. (Jan 4, 2024). [Slides].
  What you should remember from this course.
Homework
Homework is an integral part of this course, since we want to match theory and practice. Besides, without practice it is easier to forget. To get a grade, send each homework to andres.aravena+bioinfo@istanbul.edu.tr before its deadline. Please be careful: otherwise you will get a grade of zero.
- Homework 1 (Deadline: Thursday 19 of October at 12:00).
  Practice creating NCBI Entrez queries.
- Homework 2 (Deadline: Thursday 19 of October at 12:00).
  Read about genetic codes.
- Homework 3 (Deadline: Thursday 26 of October at 12:00).
  How would you calculate the Hamming distance between genetic codes?
- Homework 4 (Deadline: Thursday 2 of November at 12:00).
  Calculate global and semi-global Levenshtein distances between sequences.
- Homework 5 (Deadline: Thursday 9 of November at 12:00).
  Find proteins similar to human hemoglobin.
- Homework 6 (Deadline: Saturday 9 of December at 12:00).
  Build trees.
- Homework 7 (Deadline: Thursday 21 of December at 12:00).
  Reconstruct phylogenetic trees.
- Homework 8 (Deadline: Thursday 4 of January at 12:00).
  Design primers for finding a gene in a metagenomic sample.
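As a warm-up for the distance homework, the sketch below shows one common way to compute Hamming and global Levenshtein distances between plain strings. It is a minimal illustration and not the official solution: the function names `hamming` and `levenshtein` are our own, every edit is assumed to cost 1, and the semi-global variant is not covered.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Global edit distance; substitution, insertion and deletion cost 1 (assumption)."""
    # previous[j] is the distance between the letters of `a` seen so far
    # (before the current one) and the first j letters of `b`.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning i letters of `a` into an empty string
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # match (0) or substitute (1)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(hamming("GATTACA", "GACTATA"))     # two mismatching positions
    print(levenshtein("kitten", "sitting"))  # classic textbook pair
```

The Levenshtein function keeps only two rows of the dynamic programming table, which is enough when we want the distance but not the alignment itself.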
Sequences used in classes
Attendance
By regulation from the Rectory, students need to attend at least 70% of the classes. If you cannot attend, you must deliver all homework on time. Late submissions will not be accepted.
The attendance book is updated every week and can be seen in Google Sheets.
Prerequisites
This course does not require knowledge of coding or programming, but knowing how to program will always be a strong advantage, both in this course and in professional life.
You will need:
- A computer with internet access for doing the homework.
- To know how to handle files and folders on the computer, how to copy and move files, and to understand the folder structure.
- To know the difference between text and binary files, and between text editors, word processors, and integrated development environments.
- A text editor (not a word processor) installed on your computer. There are many, and you can use your favorite one. We recommend Visual Studio Code.
We recommend (but do not require):
- Learn how to use the Unix/Linux command line.
  - You can install Linux on your computer, either in parallel with Windows or as a virtual machine.
  - Alternatively, you can install Git for Windows and use the bash command line in Windows. This will work for ≈90% of the commands.
- Sometimes it is an advantage to write small programs. It is good to know a little bit of R or Python. We recommend using RStudio and Jupyter Notebooks.
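As an example of the kind of small program that can be useful, the sketch below reads sequences in FASTA format and reports the GC fraction of each one. It uses only the Python standard library. The function names `read_fasta` and `gc_fraction` and the toy records are our own illustrative assumptions, not part of the course material.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a dictionary {identifier: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            # Header line: the identifier is the first word after ">".
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so we concatenate them.
            records[name] += line.upper()
    return records


def gc_fraction(seq: str) -> float:
    """Fraction of letters that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Toy data invented for this example.
    example = """>seq1 toy example
ACGT
GGCC
>seq2
ATAT
"""
    for name, seq in read_fasta(example.splitlines()).items():
        print(name, len(seq), gc_fraction(seq))
```

To read a real file, pass an open file object instead of the toy text, for instance `read_fasta(open("my_sequences.fasta"))`, where the file name is hypothetical.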
Syllabus
We partially follow the plan proposed by Sayres et al. (2018), “Bioinformatics Core Competencies for Undergraduate Life Sciences Education,” PLoS ONE 13, no. 6 (2018): 1–20, https://doi.org/10.1371/journal.pone.0196878. At the end of the course students should be able to:
- Understand the role of computation and data mining in hypothesis-driven processes within the life sciences
- Understand computational concepts used in bioinformatics
- Know the basic file types used in bioinformatics (FASTA, GBK, GFF, BLAST, FASTQ, SAM)
- Understand tree structures that are used to understand biological entities: phylogeny, taxonomy, ontology. Understand the difference between taxonomy and phylogeny
- Know how to access genomic data on the web
  - Access NCBI nucleotide, protein, GEO, SRA databases, Entrez query system, EBI databases.
- Know how to handle the basic file types used in bioinformatics
  - How to read them, how to understand them, how to transform one into another.
- Know how to visualize DNA sequences, partial genome assembly results, and protein domains
- Understand the results given by a bioinformatic tool
- Know the different types of pairwise alignments (global, semi-global, local) and when to use each one
- Know the biological hypotheses behind the alignment scores
- Understand the challenges of multiple alignment and how to use multiple alignments to find SNPs. Know how to build phylogenetic trees.
- Understand how database search works:
- Understand the difference between algorithms and heuristics, the role of indices
- Assigning putative functions to coding genes, using COG and Gene Ontology
- Assigning putative taxonomic identity, using alignment and alignment-free methods
- Understand the main DNA-assembly methodologies: Overlap-layout-consensus and De Bruijn graphs.
- Know how to design PCR primers and understand how to calculate the DNA melting temperature
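For the melting temperature goal, a common first approximation for a short primer is the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). The sketch below applies it. This is only one simple estimate, usually quoted for oligonucleotides of roughly 14 to 20 bases, and it is not necessarily the method used in class. The function name `wallace_tm` and the example primer are our own.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (in °C) with the Wallace rule.

    Assumption: the primer is short and contains only A, C, G and T.
    """
    primer = primer.upper()
    unknown = set(primer) - set("ACGT")
    if unknown:
        raise ValueError(f"Unexpected letters in primer: {sorted(unknown)}")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    # Each A/T pair contributes about 2 °C and each G/C pair about 4 °C.
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Made-up 18-base primer: 10 A/T and 8 G/C.
    print(wallace_tm("ATGCATGCATGCATGCAT"))
```

The rule ignores salt concentration and neighboring-base effects, so treat the result as a rough starting point when comparing candidate primers.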
Online supplementary material
Bibliography
The list of recommended and mandatory papers is on a separate page.
Web references
NCBI Videos: Sequences
These videos complement our classes. They cover the same topics in more detail. Please watch them to understand this course better.
- NCBI Minute: A Beginner’s Guide to Genes and Sequences at NCBI (33:44)
- NCBI Minute: How to Quickly Retrieve Sequences from NCBI (23:38)
- NCBI: Download a custom set of records (03:11)
- NCBI: Retrieve Sequences for an Organism (01:36)
- Obtain Genomic Sequence for a gene (02:47)
- Webinar: Accessing 1000 Genomes Data at NCBI (32:15)
- NCBI Minute: Important Changes Coming to the Sequence Databases - GI Numbers (24:26)
NCBI Genome Visualization
NCBI Literature Search
- Webinar: Pubmed for Scientists (45:19)
- NCBI Minute: Tailor Your PubMed Search Experience with My NCBI (07:47)
- NCBI Minute: Keeping Current and Getting Help with NCBI Resources (14:22)
- NCBI Minute: On the NCBI Bookshelf, Textbooks for Free! (19:42)
- NCBI Minute: An Updated PubMed is on its Way! (25:30)
- Need the Full Text Article? (02:03)
- The NCBI Minute: PubMed Commons (12:06)
- NCBI Minute: Finding Genes in PubMed (11:50)
- The NCBI Minute: How You and Your Journal Club Can Contribute Using PubMed Commons (12:48)
- PubMed: Using the Advanced Search Builder (03:12)
Searching
- NCBI Minute: Finding Gene, Protein and Chemical Names, Aliases and Synonyms (15:17)
- NCBI Minute: How to Locate and Use Human Genomes and Annotations from the NCBI (09:08)
- Find in This Sequence (02:17)
- Save Search Results in Collections, including Favorites (02:57)
- NCBI Minute: Setting Up Alerts for New Data in My NCBI (07:46)
- NCBI Minute: Automate PubMed Searches & Save Citation Collections with My NCBI (12:55)
- My NCBI (02:30)
- PubMed Advanced Search Builder (02:27)
- PubMed: The Filters Sidebar (02:02)
- Use MeSH to Build a Better PubMed Query (03:03)
- E-Utilities Introduction (03:46)