We are starting the advanced part of our course, so your answers are expected to be more professional. The answer to this homework should be a well-structured document, like a paper or a report. It can be written in Markdown or Google Docs.
The first part is a continuation of the class “How BLAST works”. We want to search for words that have parts that match the query.
How can we search for all words that start with a specific prefix? In other terms, how can we search for words whose beginning matches the query?
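One possible way to think about the prefix question, as a minimal sketch rather than the expected answer: if the words are sorted, all words sharing a prefix sit next to each other, so a binary search can find the block. The function names `build_prefix_index` and `search_prefix` and the example words are made up for illustration.

```python
import bisect

def build_prefix_index(words):
    """Sort the words once so that words sharing a prefix are adjacent."""
    return sorted(words)

def search_prefix(index, prefix):
    """Return all words in the sorted index that start with `prefix`."""
    # First position where a word >= prefix could be placed.
    lo = bisect.bisect_left(index, prefix)
    # Assumption: no word contains the largest Unicode code point, so
    # prefix + that character sorts after every word starting with prefix.
    hi = bisect.bisect_left(index, prefix + "\U0010ffff")
    return index[lo:hi]

# Hypothetical example words.
words = ["GATTACA", "GATC", "TACA", "ATTAC", "GAT"]
index = build_prefix_index(words)
print(search_prefix(index, "GAT"))
```

Sorting costs time once, but afterwards each query needs only two binary searches instead of a scan over every word.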
How would you build an index to search efficiently for words that end in a specific suffix? In other words, if the query can appear only at the end of each word, how can you efficiently find which words end with that query?
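As a hedged sketch of one option for the suffix case: a word ends with the query exactly when the reversed word starts with the reversed query, so a sorted list of reversed words can be searched by binary search. The names `build_suffix_index` and `search_suffix` and the example words are illustrative assumptions, not part of the course material.

```python
import bisect

def build_suffix_index(words):
    """Store every word reversed, in sorted order."""
    return sorted(word[::-1] for word in words)

def search_suffix(index, suffix):
    """Return (sorted) all words that end with `suffix`."""
    key = suffix[::-1]  # a suffix of the word is a prefix of the reversed word
    lo = bisect.bisect_left(index, key)
    # Assumption: no word contains the largest Unicode code point.
    hi = bisect.bisect_left(index, key + "\U0010ffff")
    # Reverse the matches back to their original orientation.
    return sorted(rev[::-1] for rev in index[lo:hi])

# Hypothetical example words.
words = ["GATTACA", "GATC", "TACA", "ATTAC", "GAT"]
suffix_index = build_suffix_index(words)
print(search_suffix(suffix_index, "ACA"))
```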
What if the query could appear anywhere in the word? How would you make an index for that case?
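For a query that may appear anywhere, one approach in the spirit of the word index discussed for BLAST is to index every substring of a fixed length k (a k-mer). The following is only a sketch under that assumption; `build_kmer_index`, `search_anywhere`, the choice k = 3, and the example words are all hypothetical. Other structures, such as an index of all suffixes, would also work.

```python
from collections import defaultdict

def build_kmer_index(words, k=3):
    """Map every length-k substring to the set of words that contain it."""
    index = defaultdict(set)
    for word in words:
        for i in range(len(word) - k + 1):
            index[word[i:i + k]].add(word)
    return index

def search_anywhere(index, words, query, k=3):
    """Return (sorted) all words that contain `query` as a substring."""
    if len(query) < k:
        # The query is too short for the index: fall back to a full scan.
        return sorted(w for w in words if query in w)
    # Any word containing the query must contain the query's first k-mer.
    candidates = index.get(query[:k], set())
    # Verify each candidate, since sharing one k-mer is not sufficient.
    return sorted(w for w in candidates if query in w)

# Hypothetical example words.
words = ["GATTACA", "GATC", "TACA", "ATTAC", "GAT"]
kmer_index = build_kmer_index(words, k=3)
print(search_anywhere(kmer_index, words, "TTAC"))
```

The index narrows the search to a few candidate words, and the final check removes candidates that share the first k-mer but not the whole query.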
Now we want to expand on question 7 of the midterm exam and compare strains of E. coli and of Paenibacillus larvae. We will use BLAST, not for a database search, but to align two sequences.
Compare Escherichia coli O157:H7 str. Sakai (accession NC_002695.2) against Escherichia coli strain K-12 substrain MG1655 (accession NC_000913.3). What does the dot-plot show?
Compare Paenibacillus larvae strain ATCC 9545 (accession NZ_CP019687.1) against Paenibacillus larvae strain Eric_V (accession NZ_CP019717.1). What does the dot-plot show?
In both cases, please save the result as a CSV file and load it into Google Sheets. Use this data to draw a scatter plot of query start and subject start. Copy this plot into your answer and describe what you see.
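If you want to check the data outside Google Sheets, the two columns can also be extracted with a short script. This is a sketch under one loud assumption: that the CSV hit table has no header row and uses the usual BLAST tabular column order, with query start in the 7th column and subject start in the 9th. Verify this against your own file. The function name `read_start_points` and the two data rows are made up for illustration and are not real results.

```python
import csv
import io

# Assumed zero-based column positions in the BLAST hit table (check your file).
QUERY_START_COL = 6
SUBJECT_START_COL = 8

def read_start_points(csv_file):
    """Return a list of (query start, subject start) pairs, one per hit."""
    points = []
    for row in csv.reader(csv_file):
        # Skip blank, truncated, or comment lines.
        if len(row) <= SUBJECT_START_COL or row[0].startswith("#"):
            continue
        points.append((int(row[QUERY_START_COL]), int(row[SUBJECT_START_COL])))
    return points

# Two invented rows standing in for a downloaded hit table.
example = io.StringIO(
    "NC_002695.2,NC_000913.3,98.5,1000,15,0,1,1000,501,1500,0.0,1700\n"
    "NC_002695.2,NC_000913.3,97.0,800,24,0,2001,2800,3300,2501,0.0,1300\n"
)
points = read_start_points(example)
print(points)
```

For a real file, replace the `io.StringIO` object with `open("your_file.csv", newline="")`. Each pair is one dot of the scatter plot, with query start on the horizontal axis and subject start on the vertical axis.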
The dot plot should look like this
- (bonus) What is ATCC?
Please send your answers as attached files in an email to andres.aravena+bioinfo@istanbul.edu.tr. If you use Markdown, also send the relevant images so the document can be reproduced completely. If you use Google Docs, send the document link and a copy in PDF format. Use the option File->Email in the menu.