Using BLAST
There are two ways of using BLAST:
- Going to NCBI’s website: https://blast.ncbi.nlm.nih.gov/
  - Runs on NCBI servers with NCBI databases
- Using a command-line version (BLAST+)
  - Runs on your server with your databases
  - Can also send jobs to NCBI servers to search NCBI databases
  - Download it from the NCBI website
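As a sketch of the command-line route, the Python helper below only assembles a BLAST+ command line and does not run anything. The option names (`-query`, `-db`, `-out`, `-evalue`, `-outfmt`, `-remote`) are standard BLAST+ options; the helper name, the file names and the default values are hypothetical choices for illustration.

```python
import shlex

# The five core BLAST+ programs (one executable per program)
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_blast_command(program, query, db, out, remote=False, evalue=1e-5, outfmt=6):
    """Return a BLAST+ command as a list of arguments (hypothetical helper; does not run it)."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    cmd = [program, "-query", query, "-db", db, "-out", out,
           "-evalue", str(evalue), "-outfmt", str(outfmt)]
    if remote:
        # -remote sends the search to NCBI servers; db must then be an NCBI database name
        cmd.append("-remote")
    return cmd


cmd = build_blast_command("blastp", "query.fasta", "swissprot", "hits.tsv", remote=True)
print(shlex.join(cmd))
# blastp -query query.fasta -db swissprot -out hits.tsv -evalue 1e-05 -outfmt 6 -remote
```

Once BLAST+ is installed, the list can be passed to `subprocess.run(cmd, check=True)`. Without `-remote`, `-db` should name a local database built from your own sequences.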
Types of BLAST
Depending on the alphabet of the query and subject:
- BlastN
  - Searches nucleotide queries against nucleotide databases
- BlastP
  - Searches protein queries against protein databases
- BlastX
  - Searches nucleotide queries against protein databases
  - Each query is translated into 6 putative proteins (one per reading frame)
Types of BLAST
- TBlastN
  - Searches protein queries against nucleotide databases
  - Each subject is translated into 6 putative proteins
- TBlastX
  - Searches nucleotide queries against nucleotide databases
  - Translates each query and each subject into 6 proteins
  - Compares all the resulting proteins
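The "6 putative proteins" used by BlastX, TBlastN and TBlastX come from reading a DNA sequence in three frames on each strand. A minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and ignoring ambiguity codes (unknown codons become `X`); the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    # Codon by codon; a trailing partial codon is ignored, unknown codons give 'X'
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Frames +1..+3 on the given strand, -1..-3 on the reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")["+1"])
# MAIVMGR*KGAR*
```

Stop codons are kept as `*` so that every frame has a fixed length, which makes it easy to map a protein hit back to nucleotide coordinates.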
NCBI protein databases
- nr
  - Non-redundant protein sequences
- refseq_protein
  - Reference proteins
- refseq_select
  - Reference Select proteins
NCBI protein databases
- landmark
  - Model organisms
- swissprot
  - UniProtKB/Swiss-Prot
- pat_aa
  - Patented protein sequences
NCBI protein databases
- pdb
  - Protein Data Bank proteins
- env_nr
  - Metagenomic proteins
- tsa_nr
  - Transcriptome Shotgun Assembly proteins
NCBI nucleotide databases
- Human G+T
  - Human genomic plus transcript
- Mouse G+T
  - Mouse genomic plus transcript
- nr/nt
  - Nucleotide collection
NCBI nucleotide databases
- Bacteria and Archaea
  - 16S ribosomal RNA sequences
- refseq_select
  - Reference Select sequences
- refseq_rna
  - Reference RNA sequences
NCBI nucleotide databases
- refseq_representative_genomes
  - RefSeq Representative genomes
- refseq_genomes
  - RefSeq Genome Database
NCBI nucleotide (reads)
- SRA
  - Sequence Read Archive
- TSA
  - Transcriptome Shotgun Assembly
- HTGS
  - High-throughput genomic sequences
NCBI nucleotide databases
- pat
  - Patent sequences
- pdb
  - Nucleotide sequences from the Protein Data Bank
- RefSeq_Gene
  - Human RefSeqGene sequences
BlastN variants
- megablast
  - Highly similar sequences
- discontiguous megablast
  - More dissimilar sequences
- blastn
  - Somewhat similar sequences
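These variants differ mainly in how BLAST finds seeds, the short exact matches it later extends. A longer word, as in megablast (BLAST+ default `-word_size` 28, versus 11 for `-task blastn`), requires a longer exact match: faster, but only reliable for highly similar sequences. Discontiguous megablast instead uses spaced templates in which some positions may mismatch, such as codon third positions. The Python sketch below is a simplification: the helper names and the template `110110110110` are illustrative, not the templates BLAST actually uses.

```python
def word_hits(query, subject, word_size):
    """Contiguous seeds: (query_pos, subject_pos) of exact word matches."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def spaced_seed_hits(query, subject, template):
    """Spaced seeds: only positions marked '1' in the template must match."""
    care = [k for k, c in enumerate(template) if c == "1"]
    span = len(template)

    def key(seq, start):
        return "".join(seq[start + k] for k in care)

    index = {}
    for i in range(len(query) - span + 1):
        index.setdefault(key(query, i), []).append(i)
    hits = []
    for j in range(len(subject) - span + 1):
        for i in index.get(key(subject, j), []):
            hits.append((i, j))
    return hits


query = "ATGGCCATTGTA"
subject = "ATAGCTATCGTG"  # every third base changed
print(word_hits(query, subject, 8))                      # []
print(spaced_seed_hits(query, subject, "110110110110"))  # [(0, 0)]
```

The contiguous 8-letter word finds nothing because no 8 consecutive bases agree, while the spaced seed of the same weight (8 positions checked) still hits, which is the intuition behind using discontiguous megablast for more divergent, protein-coding sequences.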
BlastP variants
- blastp
  - Protein-protein BLAST
- PSI-BLAST
  - Position-Specific Iterated BLAST
  - Builds a position-specific scoring matrix (PSSM)
- PHI-BLAST
  - Pattern Hit Initiated BLAST
  - Limits alignments to those that match a pattern in the query
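PHI-BLAST patterns are written in a PROSITE-style syntax. To make "alignments that match a pattern in the query" concrete, here is a Python sketch that converts a small subset of that syntax (single residues, `x`, `[...]`, `{...}`, and `(n)` or `(n,m)` repeats; no `<`/`>` anchors) into a regular expression and reports every start position. The function names are hypothetical, and this is not how BLAST parses patterns internally.

```python
import re

# One pattern element: residue, x (any), [allowed] or {forbidden}, then optional (n) or (n,m)
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    regex = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        residue, low, high = m.groups()
        if residue == "x":
            part = "."
        elif residue.startswith("{"):
            part = "[^" + residue[1:-1] + "]"  # forbidden residues
        else:
            part = residue  # a single letter or a [..] class is already valid regex
        if low and high:
            part += "{" + low + "," + high + "}"
        elif low:
            part += "{" + low + "}"
        regex.append(part)
    return "".join(regex)


def pattern_hits(pattern, protein):
    """0-based start positions of all (possibly overlapping) pattern matches."""
    rx = re.compile("(?=" + prosite_to_regex(pattern) + ")")  # lookahead allows overlaps
    return [m.start() for m in rx.finditer(protein)]


print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]"))  # C.{2,4}C.{3}[LIVMFYWC]
print(pattern_hits("C-x(2)-C", "ACAACAACGGC"))         # [1, 4, 7]
```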
BlastP variants
- Quick BLASTP
  - Accelerated protein-protein BLAST
  - Very fast; works best if the target percent identity is 50% or more
- DELTA-BLAST
  - Domain Enhanced Lookup Time Accelerated BLAST
  - Builds a PSSM from a Conserved Domain Database (CDD) search
  - Then searches a sequence database with that PSSM
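PSI-BLAST and DELTA-BLAST both score database sequences with a PSSM instead of a fixed matrix such as BLOSUM62. A simplified Python sketch of a log-odds PSSM, assuming a uniform background frequency and a flat pseudocount; real PSI-BLAST additionally weights sequences and uses more elaborate pseudocounts and background frequencies, so treat the numbers as illustrative. Function names are hypothetical.

```python
import math
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM in bits from equal-length aligned sequences; one dict per column."""
    background = 1 / len(AMINO)  # uniform background, an assumption for illustration
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment if s[col] in AMINO)  # gaps ignored
        total = sum(counts.values()) + pseudocount * len(AMINO)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO})
    return pssm


def score(pssm, segment):
    """Sum of column scores for an ungapped segment of the same length."""
    return sum(col[aa] for col, aa in zip(pssm, segment))


pssm = build_pssm(["ACD", "ACD", "ACE", "GCD"])
print(score(pssm, "ACD") > score(pssm, "GCE"))  # True
```

Residues frequent in a column get positive scores and absent ones get negative scores, which is how a PSSM rewards position-specific conservation that a single substitution matrix cannot capture.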
Homework
- Write a document explaining the details of one BLAST algorithm
- Each student takes a different algorithm:
  - megablast and discontiguous megablast
  - PSI-BLAST
  - PHI-BLAST
  - Quick BLASTP
  - DELTA-BLAST
  - and …
Filters & Masking
Explain what these options do:
- Low complexity regions filter
- Mask for lookup table only
- Mask lower-case letters
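Two ingredients behind these options, sketched in Python: finding lower-case (soft-masked) stretches of a sequence, and flagging low-complexity windows. BLAST's actual filters are DUST for nucleotides and SEG for proteins; the Shannon-entropy window below is a simplified stand-in with a hypothetical window size and threshold, not either of those algorithms.

```python
import math
from collections import Counter


def lowercase_intervals(seq):
    """Half-open (start, end) intervals of lower-case (soft-masked) letters."""
    intervals, start = [], None
    for i, ch in enumerate(seq):
        if ch.islower() and start is None:
            start = i
        elif not ch.islower() and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(seq)))
    return intervals


def shannon_entropy(window):
    """Entropy in bits of the letter composition (case-insensitive)."""
    counts = Counter(window.upper())
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq, window=12, threshold=1.5, mask_char="N"):
    """Hard-mask every window whose entropy is below threshold (simplified, not DUST/SEG)."""
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < threshold:
            for k in range(i, i + window):
                masked[k] = mask_char
    return "".join(masked)


print(lowercase_intervals("ACGTacgtACGTnn"))  # [(4, 8), (12, 14)]
print(low_complexity_mask("ACGTTGCA" + "A" * 12 + "TGCATCGA"))
```

Replacing letters (hard masking) removes them from every step of the search, whereas lower-case letters leave the sequence intact and let the program decide where the masking applies.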
Rules
Understand and explain:
- why the algorithm is useful
- how it works
- how it differs from standard BLAST
- Read the associated paper
Use your own words: do not copy and paste.
Give proper references when citing someone else’s work.