Things to do
- Read the class slides
- Read the bibliography
- Watch the videos from NCBI
- Check the prerequisites and the syllabus
In this course we teach how to interpret and understand the results of bioinformatic analyses. Most molecular biologists will work in teams with (or hire, or be hired by) bioinformatic teams, so even if they do not use the tools themselves, all molecular biologists need to understand what the results mean. It is important to speak the same language and to be aware of the key aspects that can lead to an experiment’s success or failure.
Classes
This year’s slides differ from those of previous years in content and organization. Sometimes we use copyrighted material that is fine to show in class but not to publish on the web; in those cases the slides are not here. We recommend that you take notes during classes, since many important things are written on the whiteboard but not in the slides. We recommend taking notes with pen and paper using the Cornell Method.
You can find the slides and videos of previous years at Bioinfo 2021 and Bioinfo 2020. Here you will find this year’s slides.
- Class 1: Why do we care about Bioinformatics? (Sep 27, 2022). [Slides].
  What is and what is not Bioinformatics. What will we do here.
- Class 2: Taxonomy. (Sep 29, 2022). [Slides].
  How to understand the universe.
- Class 3: Distance. (Sep 29, 2022). [Slides].
  Comparing generic codes.
- Class 4: Global and Local Alignment. (Oct 13, 2022). [Slides].
  How to know if (parts of) two sequences are similar. See also:
  - Needleman, Saul B., and Christian D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48, no. 3 (1970): 443–53.
  - Smith, T. F., and M. S. Waterman. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147, no. 1 (1981): 195–97. https://doi.org/10.1016/0022-2836(81)90087-5.
  - Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.
  - Henikoff, S., and J. G. Henikoff. “Amino Acid Substitution Matrices from Protein Blocks.” Proc Natl Academy Sci 89 (1992). https://doi.org/10.1073/pnas.89.22.10915.
- Class 5: Practice using Spreadsheets. (Oct 14, 2022). [Slides].
  Learn how to use Excel or Google Sheets.
- Class 6: Finding Local Alignments. (Oct 20, 2022). [Slides].
  Looking for local matches is different from looking for global ones. We need to use scores. They make more biological sense.
- Class 7: BLAST. (Oct 21, 2022). [Slides].
  It is not like Google. Results depend on the options you choose. What are the options?
- Class 8: How BLAST works. (Oct 27, 2022). [Slides].
  How does BLAST work? Solving the easy question, not the correct one.
- Class 9: Q&A. (Oct 28, 2022). [Slides].
  Practice before Midterm.
- Class 10: Trees representing distance. (Nov 17, 2022). [Slides].
  Trees are used to represent phylogenetic relationships.
- Class 11: Probabilities. (Nov 18, 2022). [Slides].
  Probabilities are the tools we use to understand experimental results.
- Class 12: Multiple Sequence Alignment. (Nov 24, 2022). [Slides].
  What is conserved among several sequences? What are the polymorphisms? How to find patterns without aligning.
- Class 13: Essential Maths for Bioinformatics. (Nov 25, 2022). [Slides].
  What is conserved among several sequences? What are the polymorphisms? How to find patterns without aligning.
- Class 14: Phylogenetic Trees. (Dec 1, 2022). [Slides].
  Building a time machine, and failing.
- Class 15: DNA Sequencing. (Dec 2, 2022). [Slides].
  How can we know the genome of an organism?
- Class 16: Mapping Reads to Reference. (Dec 8, 2022). [Slides].
  Reads are mapped to the genome using one of many tools (bowtie, bwa, hisat). All these programs give their results in SAM/BAM format. What is that format?
- Class 17: Understanding SAM files. (Dec 9, 2022). [Slides].
  Reads are mapped to the genome using one of many tools (bowtie, bwa, hisat). All these programs give their results in SAM/BAM format. What is that format?
- Class 18: DNA Melting Temperature. (Dec 14, 2022). [Slides].
  How to design primers.
- Class 20: Designing primers. (Dec 22, 2022). [Slides].
  How to design primers.
There are no slides for classes 19 and 21, since they were entirely practical and notes were written only on the whiteboard. Check your own (or your classmates’) handwritten notes.
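Classes 4 and 6 contrast global and local alignment. As a rough illustration of that difference, the sketch below computes only the best alignment score (not the alignment itself) with a simple dynamic-programming table. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name `alignment_score` are assumptions chosen for this example, not the values used in class; real tools use substitution matrices such as PAM or BLOSUM and more elaborate gap penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score between sequences a and b.

    local=False: global score (Needleman-Wunsch style recurrence).
    local=True:  local score (Smith-Waterman style recurrence).
    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # In a global alignment, leading gaps must be paid for.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences.
    # Local: best score found anywhere in the table.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("TTTACGTTT", "GGACGGG"))              # global
    print(alignment_score("TTTACGTTT", "GGACGGG", local=True))  # local
```

With these two sequences the local score rewards the shared `ACG` segment, while the global score is pulled down by the mismatching flanks. This is why local alignment is the better choice when only parts of two sequences are expected to be similar.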
Homework
Homework is an integral part of this course, since we want to match theory and practice. Besides, without practice we tend to forget more easily. To get a grade, send all homework to andres.aravena+bioinfo@istanbul.edu.tr before the deadline. Please be careful: homework that misses the deadline gets a grade of zero.
- Homework 1 (Deadline: Friday 30 of September at 9:00).
  Practice creating NCBI Entrez queries.
- Homework 2 (Deadline: Thursday 6 of October at 9:00).
  Read about genetic codes.
- Homework 3 (Deadline: Friday 14 of October at 9:00).
  Prepare dot plots, calculate Hamming distance.
- Homework 4 (Deadline: Monday 24 of October at 9:00).
  Learning Excel and factorials.
- Homework 5 (Deadline: Friday 28 of October at 9:00).
  Prepare for the midterm exam.
- Homework 6 (Deadline: Friday 25 of November at 9:00).
  Better searches, and comparing bacterial strains.
- Homework 7 (Deadline: Friday 25 of November at 9:00).
  Explain probabilities, draw trees.
- Homework 8 (Deadline: Friday 9 of December at 9:00).
  This week we practice drawing trees and doing multiple alignments.
- Homework 9 (Deadline: Friday 16 of December at 9:00).
  This week we practice using Galaxy to analyze RNAseq data.
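Homework 3 asks for dot plots and Hamming distances. The homework itself is meant to be done by hand or in a spreadsheet, but for readers who know a little Python, a minimal sketch of both ideas could look like this. The function names `hamming_distance` and `dot_plot`, and the choice of `*` and `.` as plot symbols, are assumptions made for this example.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def dot_plot(a, b):
    """Return a text dot plot: one row per symbol of a, one column per symbol of b.

    A '*' marks identical symbols and a '.' marks different ones.
    """
    return ["".join("*" if x == y else "." for y in b) for x in a]


if __name__ == "__main__":
    print(hamming_distance("GATTACA", "GACTATA"))
    for row in dot_plot("GATTACA", "GACTATA"):
        print(row)
```

Runs of `*` along a diagonal of the dot plot show regions where the two sequences match position by position. Each break in the main diagonal corresponds to one of the differences counted by the Hamming distance.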
Attendance
By regulation from the Rectory, students need to attend at least 70% of the classes. If you cannot attend, you must deliver all homework on time. Late submissions will not be accepted.
The attendance book is updated every week and can be seen in Google Sheets.
Prerequisites
This course does not require knowledge of coding or programming, but it will always be a strong advantage —in this course and in professional life— to know how to code a program.
You will need:
- A computer with internet access for doing the homework.
- To know how to handle files and folders on the computer, how to copy and move files, and how the folder structure works.
- To know the difference between text and binary files, and between text editors, word processors, and integrated development environments.
- A text editor (not a word processor) installed on your computer. There are many, and you can use your favorite one. We recommend Visual Studio Code.
We recommend (but do not require):
- Learn how to use the Unix/Linux command line.
  - You can install Linux on your computer, either in parallel with Windows, or as a virtual machine.
  - Alternatively, you can install Git for Windows and use the bash command line in Windows. This will work for ≈90% of the commands.
- Sometimes it is an advantage to write some small programs. It is good to know a little bit of R or Python. We recommend using RStudio and Jupyter Notebooks.
Syllabus
We partially follow the plan proposed by Sayres et al. (Sayres, et al. “Bioinformatics Core Competencies for Undergraduate Life Sciences Education.” PLoS ONE 13, no. 6 (2018): 1–20. https://doi.org/10.1371/journal.pone.0196878). At the end of the course students should be able to:
- Understand the role of computation and data mining in hypothesis-driven processes within the life sciences
- Understand computational concepts used in bioinformatics
- Know the basic file types used in bioinformatics (FASTA, GBK, GFF, BLAST, FASTQ, SAM)
- Understand tree structures that are used to understand biological entities: phylogeny, taxonomy, ontology. Understand the difference between taxonomy and phylogeny
- Know how to access genomic data on the web
  - Access NCBI nucleotide, protein, GEO, SRA databases, Entrez query system, EBI databases.
- Know how to handle the basic file types used in bioinformatics
  - How to read them, how to understand them, how to transform one into another.
- Know how to visualize DNA sequences, partial genome assembly results, and protein domains
- Understand the results given by a bioinformatic tool
- Know the different types of pairwise alignments (global, semi-global, local) and when to use each one
- Know the biological hypotheses behind the alignment scores
- Understand the challenges of multiple alignments and how to use them to find SNPs. Know how to build phylogenetic trees.
- Understand how database searches work:
  - Understand the difference between algorithms and heuristics, and the role of indices
- Assign putative functions to coding genes, using COG and Gene Ontology
- Assign putative taxonomic identity, using alignment and alignment-free methods
- Understand the main DNA-assembly methodologies: Overlap-layout-consensus and De Bruijn graphs.
- Know how to design PCR primers and understand how to calculate the DNA melting temperature
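As a small illustration of the last objective, the sketch below estimates a primer's melting temperature with the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule of thumb is only meant for short oligonucleotides (roughly under 14 nt) and is our choice for this example; the formula used in class may differ, and primer design tools typically use nearest-neighbor thermodynamic models instead. The function names `wallace_tm` and `gc_fraction` are hypothetical.

```python
def _count_bases(primer):
    """Return (A+T count, G+C count), rejecting anything that is not A, C, G or T."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("primer must be a non-empty string of A, C, G and T")
    return at, gc


def wallace_tm(primer):
    """Estimate Tm in degrees Celsius with the Wallace rule (short primers only)."""
    at, gc = _count_bases(primer)
    return 2 * at + 4 * gc


def gc_fraction(primer):
    """Return the fraction of G and C bases in the primer, between 0 and 1."""
    at, gc = _count_bases(primer)
    return gc / (at + gc)


if __name__ == "__main__":
    primer = "ATGCGTACGT"  # made-up sequence, for illustration only
    print(wallace_tm(primer), gc_fraction(primer))
```

G·C pairs contribute more than A·T pairs because they form three hydrogen bonds instead of two, which is why primers with higher GC content melt at higher temperatures.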
Online supplementary material
Bibliography
The list of recommended and mandatory papers is on a separate page.
Web references
NCBI Videos: Sequences
These videos complement our classes. They cover the same topics in more detail. Please watch them to understand this course better.
- NCBI Minute: A Beginner’s Guide to Genes and Sequences at NCBI (33:44)
- NCBI Minute: How to Quickly Retrieve Sequences from NCBI (23:38)
- NCBI: Download a custom set of records (03:11)
- NCBI: Retrieve Sequences for an Organism (01:36)
- Obtain Genomic Sequence for a gene (02:47)
- Webinar: Accessing 1000 Genomes Data at NCBI (32:15)
- NCBI Minute: Important Changes Coming to the Sequence Databases - GI Numbers (24:26)
NCBI Genome Visualization
NCBI Literature Search
- Webinar: Pubmed for Scientists (45:19)
- NCBI Minute: Tailor Your PubMed Search Experience with My NCBI (07:47)
- NCBI Minute: Keeping Current and Getting Help with NCBI Resources (14:22)
- NCBI Minute: On the NCBI Bookshelf, Textbooks for Free! (19:42)
- NCBI Minute: An Updated PubMed is on its Way! (25:30)
- Need the Full Text Article? (02:03)
- The NCBI Minute: PubMed Commons (12:06)
- NCBI Minute: Finding Genes in PubMed (11:50)
- The NCBI Minute: How You and Your Journal Club Can Contribute Using PubMed Commons (12:48)
- PubMed: Using the Advanced Search Builder (03:12)
Searching
- NCBI Minute: Finding Gene, Protein and Chemical Names, Aliases and Synonyms (15:17)
- NCBI Minute: How to Locate and Use Human Genomes and Annotations from the NCBI (09:08)
- Find in This Sequence (02:17)
- Save Search Results in Collections, including Favorites (02:57)
- NCBI Minute: Setting Up Alerts for New Data in My NCBI (07:46)
- NCBI Minute: Automate PubMed Searches & Save Citation Collections with My NCBI (12:55)
- My NCBI (02:30)
- PubMed Advanced Search Builder (02:27)
- PubMed: The Filters Sidebar (02:02)
- Use MeSH to Build a Better PubMed Query (03:03)
- E-Utilities Introduction (03:46)