Genome Assembly
Using all reads and SPAdes
- The reads are on the rstudio.iu.edu.tr server, at
/home/bioinfo/reads/4-F20-96_S2_L001_R?_001.fastq.gz
- quality control using fastqc
- put the results in the public_html folder, so you can see them online
- clean and trim using a tool such as trimmomatic
- Suggest some alternative tools
- run quality control again using fastqc. Is the result better?
- assemble using SPAdes
- visualize using Bandage
- Calculate N50
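The N50 step can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes the assembly is a FASTA file of contigs (SPAdes normally writes one called contigs.fasta in its output folder), and the function names are invented for this example.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None  # length of the sequence being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

For example, `n50(read_fasta_lengths("contigs.fasta"))` would give the N50 of the contigs, where the file name is only a placeholder for your own output.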
Using phrap
- Convert all reads from fastq format to fasta and qual. It can be done using Python
- assemble using phrap
- visualize using consed
- calculate N50
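The conversion from fastq to fasta and qual could look like the sketch below. It rests on several assumptions: each record takes exactly four lines, qualities are encoded as Phred+33 (the usual encoding for recent Illumina data), and the `suffix` parameter is a hypothetical addition to keep R1 and R2 names distinct. phrap usually expects the quality file to be named after the FASTA file plus ".qual", but check the phrap documentation on the server.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_records(handle):
    """Yield (name, sequence, quality_string), assuming 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        # keep only the read name, dropping the '@' and the description
        yield header[1:].split()[0], seq, qual


def fastq_to_fasta_qual(fastq_path, fasta_path, qual_path, suffix="", offset=33):
    """Write one FASTA file and one QUAL file; return the number of reads."""
    count = 0
    with open_text(fastq_path) as fq, \
            open(fasta_path, "w") as fa, open(qual_path, "w") as ql:
        for name, seq, qual in fastq_records(fq):
            scores = " ".join(str(ord(c) - offset) for c in qual)
            fa.write(f">{name}{suffix}\n{seq}\n")
            ql.write(f">{name}{suffix}\n{scores}\n")
            count += 1
    return count
```

A call such as `fastq_to_fasta_qual("R1.fastq.gz", "R1.fasta", "R1.fasta.qual", suffix="/1")` converts one file; the file names here are placeholders. The R1 and R2 outputs would then have to be concatenated before assembly.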
Using reference sequence
- Terje suggested that this primer could be similar to NC_025175.
Test this hypothesis by aligning all reads to this sequence.
- Use bowtie, bowtie2, bwa mem and bwa aln
- all these produce SAM files
- are all results the same? How can you compare them?
- Which alignment is “the best”?
- prepare fastq files containing only the reads that align to the plasmid
- assemble these reads using phrap
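One possible way to prepare those fastq files is sketched below. It assumes an uncompressed SAM file, four-line fastq records, and read names that match the SAM QNAME column once the fastq description is dropped. The function names are made up for this example. Since both mates of a pair share one name, a pair is kept when either mate aligned.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def mapped_read_names(sam_path):
    """Return the set of read names with at least one mapped record."""
    names = set()
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):
                continue  # header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete alignment record
            flag = int(fields[1])
            if not flag & 0x4:  # bit 0x4 set means "read unmapped"
                names.add(fields[0])
    return names


def filter_fastq(fastq_in, fastq_out, names):
    """Copy the records whose name is in `names`; return how many were kept."""
    kept = 0
    with open_text(fastq_in) as src, open(fastq_out, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0].strip():
                break  # end of file
            if record[0][1:].split()[0] in names:
                dst.writelines(record)
                kept += 1
    return kept
```

The sets returned by `mapped_read_names` also give a simple way to compare aligners: with one set per SAM file, `a & b` holds the reads both aligners mapped and `a - b` the reads only the first one mapped.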
Analysis of alignment
- what is the coverage of each nucleotide of the reference plasmid?
- are there any missing regions? What genes are there?
- are there any regions with atypical coverage? What genes are there?
- Which reads are
- only in the plasmid
- only in the chromosome
- in both plasmid and chromosome
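The coverage questions can be explored with a sketch like the one below, which counts how many aligned bases cover each reference position by walking the CIGAR string of every record in an uncompressed SAM file. The reference name and length are assumptions you supply; both can be read from the @SQ header line of the SAM file. Deleted positions are not counted as covered here, which is a choice of this example and not the only possible one.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def coverage_from_sam(sam_path, ref_name, ref_length):
    """Return a list with the read depth at each reference position."""
    depth = [0] * ref_length
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):
                continue  # header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue
            flag = int(fields[1])
            if flag & 0x4 or fields[2] != ref_name or fields[5] == "*":
                continue  # unmapped, or mapped to another sequence
            pos = int(fields[3]) - 1  # SAM positions are 1-based
            for length, op in CIGAR_OP.findall(fields[5]):
                length = int(length)
                if op in "M=X":  # aligned bases: count them and advance
                    for i in range(max(pos, 0), min(pos + length, ref_length)):
                        depth[i] += 1
                    pos += length
                elif op in "DN":  # gap in the read: advance without counting
                    pos += length
                # I, S, H and P do not consume reference positions
    return depth


def uncovered_regions(depth):
    """Return 1-based inclusive (start, end) intervals with zero coverage."""
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d != 0 and start is not None:
            regions.append((start + 1, i))
            start = None
    if start is not None:
        regions.append((start + 1, len(depth)))
    return regions
```

The intervals returned by `uncovered_regions` can then be compared with the gene coordinates in the reference annotation. Regions with atypical coverage can be found in a similar way, for example by comparing each depth value with the median depth.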
Multiple sequence alignment
We want to recreate some of the phylogenetic trees from the paper
Aas, Jørn A, Bruce J Paster, Lauren N Stokes, Ingar Olsen, and Floyd E Dewhirst. “Defining the Normal Bacterial Flora of the Oral Cavity.” Journal of Clinical Microbiology 43, no. 11 (2005): 5721–32. https://doi.org/10.1128/JCM.43.11.5721-5732.2005.
This paper has 8 figures. You can replicate any of them. The
accession IDs of the sequences used in each figure can be found in the
following files (thanks to Reyhan Aydın for doing the manual labor):
- Aas2005-fig1.acc
- Aas2005-fig2.acc
- Aas2005-fig3.acc
- Aas2005-fig4.acc
- Aas2005-fig5.acc
- Aas2005-fig6.acc
- Aas2005-fig7.acc
- Aas2005-fig8.acc
You can download all of them at once in a zip file: Aas2005-fig-acc.zip
Some programs can work directly with accession numbers. Others will need the sequences in FASTA format. You will need to download them. It is better to download only the sequences you need for the figure you are making.
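Downloading the sequences for one figure could be done along these lines. This is only a sketch under stated assumptions: it supposes each .acc file lists one accession per line (check the files first), and it uses the NCBI E-utilities efetch service with the nuccore database. The function names are invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(path):
    """Read accession IDs, assuming one per line; blank lines are skipped."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def efetch_url(accessions):
    """Build the efetch URL that returns the sequences in FASTA format."""
    query = urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def download_fasta(accessions, out_path):
    """Download all the sequences in one request and save them to a file."""
    with urlopen(efetch_url(accessions)) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

For instance, `download_fasta(read_accessions("Aas2005-fig1.acc"), "Aas2005-fig1.fasta")` would fetch the sequences for the first figure; the output file name is a placeholder. For long lists it is safer to split the accessions into smaller batches.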
You can use multiple sequence alignment programs on the web or on the server.
Then you can build the phylogenetic tree from the alignment.
Some of these tools are available on the server. If you need something else, let me know.