These are some of the papers we want to read and understand this semester. The most important ones are marked in boldface; start with those.
If you find a broken web link, or you find a URL for an entry that is missing one, please let me know.
Protein Clusters
Tatusov, R L, M Y Galperin, D A Natale, and E V Koonin. “The COG Database: A Tool for Genome-Scale Analysis of Protein Functions and Evolution.” Nucleic Acids Research 28, no. 1 (January 1, 2000): 33–36.
Tatusov, R L, D A Natale, I V Garkavtsev, and T A Tatusova. “The COG Database: New Developments in Phylogenetic Classification of Proteins from Complete Genomes.” Nucleic Acids Research, January 1, 2001. http://nar.oxfordjournals.org/cgi/content/abstract/29/1/22.
Tatusov, R L, N D Fedorova, J D Jackson, A R Jacobs, B Kiryutin, E V Koonin, D M Krylov, et al. “The COG Database: An Updated Version Includes Eukaryotes.” BMC Bioinformatics 4 (September 11, 2003): 41. http://www.biomedcentral.com/1471-2105/4/41.
Alignment
Needleman, Saul B., and Christian D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48, no. 3 (1970): 443–53.
Smith, T. F., and M. S. Waterman. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147, no. 1 (1981): 195–97. https://doi.org/10.1016/0022-2836(81)90087-5.
Karlin, S, and S F Altschul. “Methods for Assessing the Statistical Significance of Molecular Sequence Features by Using General Scoring Schemes.” Proceedings of the National Academy of Sciences of the United States of America 87, no. 6 (1990): 2264–68. https://doi.org/10.1073/pnas.87.6.2264.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. “Basic Local Alignment Search Tool.” Journal of Molecular Biology 215, no. 3 (October 5, 1990): 403–10. https://doi.org/10.1016/S0022-2836(05)80360-2.
Karlin, S, and S F Altschul. “Applications and Statistics for Multiple High-Scoring Segments in Molecular Sequences.” Proceedings of the National Academy of Sciences of the United States of America 90, no. 12 (June 15, 1993): 5873–77.
Altschul, S. F., and W. Gish. “Local Alignment Statistics.” Methods in Enzymology 266 (1996): 460–80. https://doi.org/10.1016/S0076-6879(96)66029-7.
Altschul, S F, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, and D J Lipman. “Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs.” Nucleic Acids Research 25, no. 17 (September 1, 1997): 3389–3402.
The Statistics of Local Pairwise Sequence Alignment, Part 1 (YouTube video).
The Statistics of Local Pairwise Sequence Alignment, Part 2 (YouTube video).
Protein Alignment
Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.
Altschul, S F. “Amino Acid Substitution Matrices from an Information Theoretic Perspective.” J Mol Biol 219 (1991). https://doi.org/10.1016/0022-2836(91)90193-A.
Henikoff, S, and J G Henikoff. “Amino Acid Substitution Matrices from Protein Blocks.” Proc Natl Acad Sci 89 (1992). https://doi.org/10.1073/pnas.89.22.10915.
Henikoff, S, and J G Henikoff. “Performance Evaluation of Amino Acid Substitution Matrices.” Proteins 17 (1993): 49–61.
Zhang, Z, A A Schäffer, W Miller, T L Madden, D J Lipman, E V Koonin, and S F Altschul. “Protein Sequence Similarity Searches Using Patterns as Seeds.” Nucleic Acids Research 26, no. 17 (September 1, 1998): 3986–90.
R package Biostrings (part of Bioconductor), containing PAM and BLOSUM matrices.
Sequencing
Chou, H.-H., and M. H. Holmes. “DNA Sequence Quality Trimming and Vector Removal.” Bioinformatics 17, no. 12 (2001): 1093–1104. https://doi.org/10.1093/bioinformatics/17.12.1093.
Assembly
Staden, R. “A Strategy of DNA Sequencing Employing Computer Programs.” Nucleic Acids Research 6, no. 7 (1979): 2601–10. https://doi.org/10.1093/nar/6.7.2601.
Lander, E S, and M S Waterman. “Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis.” Genomics 2, no. 3 (April 1, 1988): 231–39. https://doi.org/10.1016/0888-7543(88)90007-9.
Pevzner, P A, H Tang, and M S Waterman. “An Eulerian Path Approach to DNA Fragment Assembly.” Proceedings of the National Academy of Sciences of the United States of America 98, no. 17 (August 14, 2001): 9748–53.
Chaisson, M, D Brinza, and P Pevzner. “De Novo Fragment Assembly with Short Mate-Paired Reads: Does the Read Length Matter?” Genome Research, December 3, 2008, 25.
Sims, David, Ian Sudbery, Nicholas E. Ilott, Andreas Heger, and Chris P. Ponting. “Sequencing Depth and Coverage: Key Considerations in Genomic Analyses.” Nature Reviews Genetics 15, no. 2 (2014): 121–32. https://doi.org/10.1038/nrg3642.
Bankevich, Anton, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, et al. “SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing.” Journal of Computational Biology 19, no. 5 (2012): 455–77. https://doi.org/10.1089/cmb.2012.0021.
Li, Zhenyu, Yanxiang Chen, Desheng Mu, Jianying Yuan, Yujian Shi, Hao Zhang, Jun Gan, et al. “Comparison of the Two Major Classes of Assembly Algorithms: Overlap-Layout-Consensus and de-Bruijn-Graph.” Briefings in Functional Genomics 11, no. 1 (2012): 25–37. https://doi.org/10.1093/bfgp/elr035.
Nagarajan, Niranjan, and Mihai Pop. “Sequence Assembly Demystified.” Nature Reviews. Genetics 14, no. 3 (2013): 157–67. https://doi.org/10.1038/nrg3367.
Wick, Ryan R., Mark B. Schultz, Justin Zobel, and Kathryn E. Holt. “Bandage: Interactive Visualization of de Novo Genome Assemblies.” Bioinformatics 31, no. 20 (2015): 3350–52. https://doi.org/10.1093/bioinformatics/btv383.
Phillippy, Adam M. “New Advances in Sequence Assembly.” Genome Research 27, no. 5 (May 1, 2017): xi–xiii. https://doi.org/10.1101/gr.223057.117.
Metagenomics
Aas, Jørn A, Bruce J Paster, Lauren N Stokes, Ingar Olsen, and Floyd E Dewhirst. “Defining the Normal Bacterial Flora of the Oral Cavity.” Journal of Clinical Microbiology 43, no. 11 (2005): 5721–32. https://doi.org/10.1128/JCM.43.11.5721-5732.2005.
Dina Fine Maron. “Dirty Money.” Scientific American, 2017. https://www.scientificamerican.com/article/dirty-money/.
Jeff Leach. “Going Feral: My One-Year Journey to Acquire the Healthiest Gut Microbiome in the World,” January 2014. http://humanfoodproject.com/going-feral-one-year-journey-acquire-healthiest-gut-microbiome-world-heard/.
Tyson, Gene W, Jarrod Chapman, Philip Hugenholtz, Eric E Allen, Rachna J Ram, Paul M Richardson, Victor V Solovyev, Edward M Rubin, Daniel S Rokhsar, and Jillian F Banfield. “Community Structure and Metabolism through Reconstruction of Microbial Genomes from the Environment.” Nature 428, no. 6978 (2004): 37–43. https://doi.org/10.1038/nature02340.
Qin, Junjie, Ruiqiang Li, Jeroen Raes, Manimozhiyan Arumugam, Kristoffer Solvsten Burgdorf, Chaysavanh Manichanh, Trine Nielsen, et al. “A Human Gut Microbial Gene Catalogue Established by Metagenomic Sequencing.” Nature 464, no. 7285 (March 4, 2010): 59–65. https://doi.org/10.1038/nature08821.
Ünal, Burcu. “Phylogenetic Analysis of Bacterial Communities in Kefir by Metagenomics.” Izmir Institute of Technology, Turkey, 2008.
Ünal, Burcu, and Alper Arslanoğlu. “Phylogenetic Identification of Bacteria within Kefir by Both Culture-Dependent and Culture-Independent Methods.” African Journal of Microbiology Research 7, no. 36 (2013): 4533–38. https://doi.org/10.5897/AJMR2013.6064.
Handelsman, Jo. “Metagenomics: Application of Genomics to Uncultured Microorganisms.” Microbiology and Molecular Biology Reviews 68, no. 4 (2004): 669–85. https://doi.org/10.1128/MMBR.68.4.669-685.2004.
Baker, Brett J., and Jillian F. Banfield. “Microbial Communities in Acid Mine Drainage.” FEMS Microbiology Ecology 44, no. 2 (2003): 139–52. https://doi.org/10.1016/S0168-6496(03)00028-X.
Wooley, John C., and Yuzhen Ye. “Metagenomics: Facts and Artifacts, and Computational Challenges.” Journal of Computer Science and Technology 25, no. 1 (2009): 71–81. https://doi.org/10.1007/s11390-010-9306-4.
Sharpton, Thomas J. “An Introduction to the Analysis of Shotgun Metagenomic Data.” Frontiers in Plant Science 5 (June 16, 2014): 209. https://doi.org/10.3389/fpls.2014.00209.
Hunter, Chris I, Alex Mitchell, Philip Jones, Craig McAnulla, Sebastien Pesseat, Maxim Scheremetjew, and Sarah Hunter. “Metagenomic Analysis: The Challenge of the Data Bonanza.” Briefings in Bioinformatics 13, no. 6 (November 1, 2012): 743–46. https://doi.org/10.1093/bib/bbs020.
Teeling, Hanno, and Frank Oliver Glöckner. “Current Opportunities and Challenges in Microbial Metagenome Analysis–a Bioinformatic Perspective.” Briefings in Bioinformatics 13, no. 6 (December 1, 2012): 728–42. https://doi.org/10.1093/bib/bbs039.
Mande, Sharmila S, Monzoorul Haque Mohammed, and Tarini Shankar Ghosh. “Classification of Metagenomic Sequences: Methods and Challenges.” Briefings in Bioinformatics 13, no. 6 (November 1, 2012): 669–81. https://doi.org/10.1093/bib/bbs054.
Motifs
Bailey, T. L., and C. Elkan. “Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers.” Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 1994, 28–36. https://doi.org/citeulike-article-id:878292.
Eskin, E, M S Gelfand, and P Pevzner. “Genome Wide Analysis of Bacterial Promoter Regions.” Pacific Symposium on Biocomputing 2003: Kauai, Hawaii, 3-7 January 2003, 2002, 29.
Others
Searls, David B. “The Computational Linguistics of Biological Sequences.” In Artificial Intelligence and Molecular Biology, 47–121, 2002.
Subramanian, A, P Tamayo, V K Mootha, S Mukherjee, B L Ebert, M A Gillette, A Paulovich, et al. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences of the United States of America 102, no. 43 (October 25, 2005): 15545–50.
Reshef, D. N., Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. “Detecting Novel Associations in Large Data Sets.” Science 334, no. 6062 (2011): 1518–24. https://doi.org/10.1126/science.1205438.
Yates, Andrew, Kathryn Beal, Stephen Keenan, William McLaren, Miguel Pignatelli, Graham R.S. Ritchie, Magali Ruffier, Kieron Taylor, Alessandro Vullo, and Paul Flicek. “The Ensembl REST API: Ensembl Data for Any Language.” Bioinformatics 31, no. 1 (2015): 143–45. https://doi.org/10.1093/bioinformatics/btu613.
Zerbino, Daniel R., Premanand Achuthan, Wasiu Akanni, M. Ridwan Amode, Daniel Barrell, Jyothish Bhai, Konstantinos Billis, et al. “Ensembl 2018.” Nucleic Acids Research 46, no. D1 (2018): D754–61. https://doi.org/10.1093/nar/gkx1098.
Web references
NCBI Videos
These videos complement our classes. They cover the same topics in more detail. Please watch them to better understand this course.
Sequences
- NCBI Minute: A Beginner’s Guide to Genes and Sequences at NCBI (33:44)
- NCBI Minute: How to Quickly Retrieve Sequences from NCBI (23:38)
- NCBI: Download a custom set of records (03:11)
- NCBI: Retrieve Sequences for an Organism (01:36)
- Obtain Genomic Sequence for a gene (02:47)
- Webinar: Accessing 1000 Genomes Data at NCBI (32:15)
- NCBI Minute: Important Changes Coming to the Sequence Databases - GI Numbers (24:26)
Visualization
Literature
- Webinar: Pubmed for Scientists (45:19)
- NCBI Minute: Tailor Your PubMed Search Experience with My NCBI (07:47)
- NCBI Minute: Keeping Current and Getting Help with NCBI Resources (14:22)
- NCBI Minute: On the NCBI Bookshelf, Textbooks for Free! (19:42)
- NCBI Minute: An Updated PubMed is on its Way! (25:30)
- Need the Full Text Article? (02:03)
- The NCBI Minute: PubMed Commons (12:06)
- NCBI Minute: Finding Genes in PubMed (11:50)
- The NCBI Minute: How You and Your Journal Club Can Contribute Using PubMed Commons (12:48)
- PubMed: Using the Advanced Search Builder (03:12)
Searching
- NCBI Minute: Finding Gene, Protein and Chemical Names, Aliases and Synonyms (15:17)
- NCBI Minute: How to Locate and Use Human Genomes and Annotations from the NCBI (09:08)
- Find in This Sequence (02:17)
- Save Search Results in Collections, including Favorites (02:57)
- NCBI Minute: Setting Up Alerts for New Data in My NCBI (07:46)
- NCBI Minute: Automate PubMed Searches & Save Citation Collections with My NCBI (12:55)
- My NCBI (02:30)
- PubMed Advanced Search Builder (02:27)
- PubMed: The Filters Sidebar (02:02)
- Use MeSH to Build a Better PubMed Query (03:03)
- E-Utilities Introduction (03:46)