what it is and what it isn’t
for this course, according to
“Bioinformatics Core Competencies for
Undergraduate Life Sciences Education.”
Sayres, et al. PLoS ONE 13, no. 6 (2018): 1–20.
Measuring gene expression
Mostly about statistics
Sayres, et al. “Bioinformatics Core Competencies for
Undergraduate Life Sciences Education.”
PLoS ONE 13, no. 6 (2018): 1–20. https://doi.org/10.1371/journal.pone.0196878.
Understand the role of computation and data mining in hypothesis-driven processes within the life sciences
Understand computational concepts used in bioinformatics
Know statistical concepts used in bioinformatics
Know how to access genomic data
Be able to use bioinformatics tools to analyze genomic data
Know how to access gene expression data
Be able to use bioinformatics tools to analyze gene expression data
Know how to access proteomic data
Be able to use bioinformatics tools to examine protein structure and function
Know how to access metabolomic and systems biology data
Be able to use bioinformatics tools to examine the flow of molecules within pathways/networks
Be able to use bioinformatics tools to examine metagenomics data
Know how to write short computer programs as part of the scientific discovery process
Be able to use software packages to manipulate and analyze bioinformatics data
Operate in a variety of computational environments to manipulate and analyze bioinformatics data
We focus on how to understand results
My blog is at https://www.dry-lab.org/
The course blog is at https://www.dry-lab.org/blog/2023/bioinfo/
All material will be published there
about bioinformatics
In 2001, the cost of sequencing the first human genome was about USD 10^8 (100 million)
Today you can have your own genome sequenced for about USD 1000
The problem is no longer how to do the experiment
Instead, it is how to make sense of the results
There are three large data repositories
These three databases interchange all sequence data
but each may structure it differently
All data is available for free
Data from research paid for with public money must be uploaded here
Good journals also require authors to upload their data
The Clipboard is a temporary place on the NCBI website to save records.
My Collections, which is part of the My NCBI service, is a more permanent place to save records.
You need to create an NCBI account to use My NCBI. It is easy and free
There are two major kinds of relationships in the NCBI website: neighbors and hard links
Combining neighbors and hard links can be an especially effective method for navigating across data and finding the most useful information
Searching NCBI offers many more options than Google
(do you know Google options?)
By default the query text is searched in any part of any database
But you can specify which fields to search
protease NOT hiv1[organism]
1000:2000[slen]
Mus musculus[organism] AND biomol_mrna[properties]
10000:100000[mlwt]
src specimen voucher[properties]
all[filter] NOT environmental sample[filter] NOT metagenomes[orgn]
Quotes (") are important
The fields are written inside brackets []
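Queries like the ones above can also be sent from a short program. The sketch below only builds the request address for NCBI's E-utilities ESearch service and does not contact the server. The helper name `esearch_url`, the choice of the `nuccore` database, and the `retmax` value are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(database, query, max_records=20):
    """Build (but do not send) an ESearch request URL for a query.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    params = {
        "db": database,         # which Entrez database to search
        "term": query,          # the query text, including [field] tags
        "retmax": max_records,  # how many record IDs to return at most
        "retmode": "json",      # ask for a JSON answer
    }
    # urlencode escapes spaces and brackets so the query is safe in a URL.
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = esearch_url("nuccore", "Mus musculus[organism] AND biomol_mrna[properties]")
print(url)
```

The brackets and spaces are escaped automatically, so the query can be written exactly as it would be typed in the search box.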
Each database page includes an Advanced Search option
Entrez queries can be single words, short phrases, sentences, database identifiers, gene symbols, or names
AND: Finds documents that contain the terms on both sides of the operator. The intersection of both searches.
OR: Finds documents that contain either term. The union of both searches.
NOT: Finds documents that contain the term on the left but not the term on the right of the operator. The subtraction of the right side from the left side
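One way to picture the three operators is as set operations on the records that each term matches. The sketch below uses made-up record identifiers (not real NCBI records) and Python's built-in sets. It illustrates the logic only, not how Entrez works internally.

```python
# Hypothetical record IDs matched by two search terms (illustration only).
protease_hits = {"rec1", "rec2", "rec3", "rec4"}
hiv1_hits = {"rec3", "rec4", "rec5"}

# AND: records matched by both terms (intersection).
both = protease_hits & hiv1_hits

# OR: records matched by either term (union).
either = protease_hits | hiv1_hits

# NOT: records matched by the left term but not the right one (difference).
left_only = protease_hits - hiv1_hits

print(sorted(both))
print(sorted(either))
print(sorted(left_only))
```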
AND must be in uppercase. It is recommended to also use uppercase for OR and NOT
Operators are processed left-to-right
promoters OR response elements NOT human AND mammals
Parentheses can be used to control the evaluation order
g1p3 AND (response element OR promoter)
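The effect of evaluation order can be checked with a small simulation. The sketch below applies the operators strictly left to right to made-up sets of record numbers, then compares the result with a hypothetical regrouping of the same terms using parentheses. The record numbers, the function name, and the regrouped query are assumptions for illustration.

```python
from functools import reduce

# Hypothetical record IDs matched by each term (illustration only).
hits = {
    "promoters": {1, 2, 3},
    "response elements": {3, 4, 7},
    "human": {2, 4},
    "mammals": {1, 2, 3, 5},
}

# Each Boolean operator mapped to the matching set operation.
OPERATIONS = {
    "AND": lambda left, right: left & right,
    "OR": lambda left, right: left | right,
    "NOT": lambda left, right: left - right,
}


def evaluate_left_to_right(first_term, steps):
    """Apply (operator, term) pairs one after another, left to right."""
    return reduce(
        lambda result, step: OPERATIONS[step[0]](result, hits[step[1]]),
        steps,
        hits[first_term],
    )


# promoters OR response elements NOT human AND mammals
no_parentheses = evaluate_left_to_right(
    "promoters",
    [("OR", "response elements"), ("NOT", "human"), ("AND", "mammals")],
)

# promoters OR (response elements NOT human AND mammals)
inner = evaluate_left_to_right(
    "response elements", [("NOT", "human"), ("AND", "mammals")]
)
with_parentheses = hits["promoters"] | inner

print(sorted(no_parentheses))
print(sorted(with_parentheses))
```

With these made-up sets the two groupings return different records, which is why parentheses matter whenever a query mixes operators.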
Certain fields can accept ranges of values
Low and high numbers are entered with a colon “:” between them followed by the field
110:500[Sequence Length]
2015/3/1:2016/4/30[Publication Date]
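The range syntax is regular enough to be generated by a short helper. This is a minimal sketch. The function name `range_term` is hypothetical, and the combined query at the end simply joins two terms that appear earlier in these notes.

```python
def range_term(low, high, field):
    """Format a range as low:high[field], the syntax shown above."""
    return f"{low}:{high}[{field}]"


length_filter = range_term(110, 500, "Sequence Length")
date_filter = range_term("2015/3/1", "2016/4/30", "Publication Date")

# A range can be combined with other terms using a Boolean operator.
mouse_by_length = "Mus musculus[organism] AND " + range_term(1000, 2000, "slen")

print(length_filter)
print(date_filter)
print(mouse_by_length)
```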
We can get a different explanation in the public documentation made by NCBI
All documents made by NCBI are public domain