One of the basic problems we want to address in this course is
finding a pattern (such as a word, a motif, a gene, or a protein
domain) in a larger text (such as a novel, a genome, or a
protein). For example, we would like to know where the word
Sancho occurs in the file Don_Quixote.txt.
Your mission is to write a function (in any reasonable
computer language) that takes two inputs, pattern and text,
and returns the set of locations where pattern occurs in text.
For example, if pattern="BR" and text="ABRACADABRA", then your
function should return 2 and 9. (In some languages, such as C++,
Java, and Python, indices start at 0, so in that case the result
is 1 and 8.)
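
To make the expected behaviour concrete, here is a minimal sketch in Python. The name find_occurrences is hypothetical, and returning a sorted list of 0-based positions (rather than a set) is an assumption of this sketch. It relies on the built-in str.find and is only one possible approach, not a reference solution.

```python
def find_occurrences(pattern, text):
    """Return the 0-based start positions of every occurrence of pattern in text.

    Hypothetical helper for illustration; the exercise does not prescribe
    a name or a return type.
    """
    positions = []
    if not pattern:
        # Assumption of this sketch: an empty pattern matches nowhere.
        return positions
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return positions


if __name__ == "__main__":
    hits = find_occurrences("BR", "ABRACADABRA")
    print(hits)                   # 0-based positions
    print([i + 1 for i in hits])  # the same positions, 1-based
```

Adding 1 to each position converts the result to the 1-based convention. Moving forward by a single character after each hit is a design choice: it reports overlapping matches, such as "AA" at positions 0, 1 and 2 of "AAAA".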
- Write the function, and test it with a FASTA file and with Don_Quixote. Try several patterns.
- How long does your function take to find all matches? What factors affect the execution time?
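
One way to explore the timing question is sketched below in Python. The names naive_search and best_time are hypothetical, and the synthetic text is a stand-in for a real file such as Don_Quixote.txt. The sketch slides the pattern over every position of the text and measures elapsed time with time.perf_counter, so you can see how the time changes as the text grows.

```python
import time


def naive_search(pattern, text):
    """Compare pattern against every window of text; return 0-based positions."""
    m = len(pattern)
    if m == 0:
        return []  # assumption of this sketch: an empty pattern matches nowhere
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def best_time(pattern, text, repeats=3):
    """Return the fastest of several wall-clock timings, in seconds."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        naive_search(pattern, text)
        timings.append(time.perf_counter() - t0)
    return min(timings)


if __name__ == "__main__":
    # Synthetic text; replace it with the contents of a real file to test.
    base = "ABRACADABRA" * 10_000
    for factor in (1, 2, 4):
        text = base * factor
        print(len(text), best_time("BR", text))
```

Taking the minimum of a few repetitions reduces noise from other processes. Besides the length of the text, it is worth varying the length of the pattern and how often a prefix of the pattern almost matches, since both change the number of character comparisons.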