This week we practice drawing trees and doing multiple alignments.
The answer for this homework should be a well-structured document, like a paper or a report. It can be written in Markdown or Google Docs. In either case, send your answer as an attached file, so that it remains stored on the email server and cannot be changed after delivery. In other words, sending links is not enough: you must send files.
The difference between an attached file and a link is important from the technical and legal perspectives. Be sure to understand it.
1. Build a neighbor joining tree of Turkey
The following table shows the distance (in kilometers) between some cities in Turkey, according to Wolfram Alpha.
City | Izmir | Bursa | Ankara | Adana |
---|---|---|---|---|
Istanbul | 336 | 100 | 351 | 711 |
Izmir | | 257 | 520 | 737 |
Bursa | | | 323 | 649 |
Ankara | | | | 390 |
Using these distances build a Neighbor Joining tree. Show every step of the construction:
- The Q matrix of every step
- The distances between the old nodes and the new node at every step
- The new D matrix of each step
- The partial tree of each step
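For reference, the quantities in the list above are usually defined as follows. This is the standard neighbor-joining notation, not something fixed by this assignment: the symbols $n$ (current number of nodes), $d$ (current distance matrix), $f, g$ (the pair being joined) and $u$ (the new node) are chosen here only for illustration.

```latex
% Q matrix: the pair (f, g) with the smallest Q(f, g) is joined
Q(i,j) = (n-2)\,d(i,j) - \sum_{k=1}^{n} d(i,k) - \sum_{k=1}^{n} d(j,k)

% Distances from the two old nodes f and g to the new node u
\delta(f,u) = \tfrac{1}{2}\,d(f,g)
            + \frac{1}{2(n-2)}\left[\sum_{k=1}^{n} d(f,k) - \sum_{k=1}^{n} d(g,k)\right],
\qquad
\delta(g,u) = d(f,g) - \delta(f,u)

% New D matrix: distance from u to every remaining node k
d(u,k) = \tfrac{1}{2}\left[d(f,k) + d(g,k) - d(f,g)\right]
```

After each join, $n$ decreases by one and the three formulas are applied again to the new D matrix.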
This step should be done “manually”. You can use a calculator, but not a packaged function or an advanced program. Everyone should build at least one tree by hand once in their life.
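The construction itself must still be done by hand. If you want to check the arithmetic of your first Q matrix afterwards, a minimal sketch in plain Python could look like the following. It assumes the usual neighbor-joining definition of Q, and the names `CITIES`, `UPPER`, `full_matrix` and `q_matrix` are hypothetical helpers invented for this illustration.

```python
# Distances in km, copied from the table above (upper triangle only).
CITIES = ["Istanbul", "Izmir", "Bursa", "Ankara", "Adana"]
UPPER = {
    ("Istanbul", "Izmir"): 336, ("Istanbul", "Bursa"): 100,
    ("Istanbul", "Ankara"): 351, ("Istanbul", "Adana"): 711,
    ("Izmir", "Bursa"): 257, ("Izmir", "Ankara"): 520,
    ("Izmir", "Adana"): 737,
    ("Bursa", "Ankara"): 323, ("Bursa", "Adana"): 649,
    ("Ankara", "Adana"): 390,
}


def full_matrix(names, upper):
    """Expand an upper-triangle dict into a symmetric n x n list of lists."""
    n = len(names)
    d = [[0] * n for _ in range(n)]
    for (a, b), km in upper.items():
        i, j = names.index(a), names.index(b)
        d[i][j] = d[j][i] = km
    return d


def q_matrix(d):
    """Q(i,j) = (n-2)*d(i,j) - sum_k d(i,k) - sum_k d(j,k); diagonal stays 0."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    q = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
    return q


D = full_matrix(CITIES, UPPER)
Q = q_matrix(D)

# Print the first Q matrix, one row per city, to compare with the hand result.
for name, row in zip(CITIES, Q):
    print(f"{name:>8}", *(f"{value:7d}" for value in row))
```

The sketch stops at the first Q matrix on purpose: choosing the pair to join, computing the branch lengths and building the reduced D matrix are the parts to work out on paper.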
2. Align keratin proteins
We want to align the following proteins and build a phylogenetic tree from them:
XP_036351858.1 XP_041361653.1 XP_031771706.1
XP_020892825.1 XP_037745389.1 XP_050186301.1
XP_044574083.1 XP_041361499.1 XP_030371273.1
XP_048565601.1 XP_027050739.1 WP_003121308.1
XP_040571383.1 XP_046913081.1 XP_045135256.1
XP_019233715.1 XP_024370096.1 XP_020325383.1
XP_045899490.1 XP_047326303.1 XP_027525420.1
XP_035144276.1 XP_035778297.1 XP_046802344.1
XP_027981243.1 XP_018564106.1 XP_017722073.1
Use at least two methods on the EBI website (https://www.ebi.ac.uk/Tools/msa) to build a multiple sequence alignment and the corresponding phylogenetic tree. Please let me know which methods you will use.
You will probably need to download the sequences from NCBI’s Protein database. There are several ways of getting the FASTA files. One of them is the Batch Entrez page (https://ncbi.nlm.nih.gov/sites/batchentrez), which requires a text file with one accession number on each line. Do not use Microsoft Word for this: Word files are not text files. You need a text editor, not a word processor. We recommend Visual Studio Code, but there are hundreds of alternatives, including Notepad and WordPad, which are already included in Microsoft Windows. On macOS you can use TextEdit.
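If you would rather produce the accession file with a short script than type it in an editor, the following is a minimal sketch. The output name `accessions.txt` and the helper `accession_lines` are hypothetical choices made for this example. The accession numbers are the ones listed above.

```python
from pathlib import Path

# Accession numbers exactly as listed in the assignment (any whitespace works).
RAW = """
XP_036351858.1 XP_041361653.1 XP_031771706.1
XP_020892825.1 XP_037745389.1 XP_050186301.1
XP_044574083.1 XP_041361499.1 XP_030371273.1
XP_048565601.1 XP_027050739.1 WP_003121308.1
XP_040571383.1 XP_046913081.1 XP_045135256.1
XP_019233715.1 XP_024370096.1 XP_020325383.1
XP_045899490.1 XP_047326303.1 XP_027525420.1
XP_035144276.1 XP_035778297.1 XP_046802344.1
XP_027981243.1 XP_018564106.1 XP_017722073.1
"""


def accession_lines(raw: str) -> str:
    """Return the accessions one per line, ending with a newline."""
    ids = raw.split()  # split on any run of spaces or line breaks
    return "\n".join(ids) + "\n"


text = accession_lines(RAW)

# Write a plain text file (hypothetical name) in the current directory.
Path("accessions.txt").write_text(text, encoding="ascii")
print(f"Wrote {len(text.splitlines())} accession numbers to accessions.txt")
```

The resulting file is plain text, so it can be opened in any text editor to confirm that each line holds exactly one accession number.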