They seem more complex, but they are still polymers
DNA and proteins are large molecules
They are made using only a few types of pieces
And they form a string or chain
They are easy to represent with symbols
We can represent DNA with four letters. The sequence
ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATGGTACGTTATGTGTTATAAGAATTTCTGGAA
AAAATGTAAAATTTTTAATACAAAAAATTGTAAAAAAAAATATAAAAGAAAAAATAGCTACTTTTTCTAA
ATTATTTTTAGATAAAGAATGTGTAGATTATGCAATGATTATTTTTTTTAAAAAACCAAATACGTTCACT
GGAGAAGATATAATCGAATTTCATATTCACAATAATGAAACTATTGTAAAAAAAATAATTAATTATTTAT
TATTAAATAAAGCAAGATTTGCAAAAGCTGGCGAATTTTTAGAAAGACGATATTTAAATGGAAAAATTTC
TTTAATAGAATGCGAATTAATAAATAATAAAATTTTATATGATAATGAAAATATGTTTCAATTAACAAAA
AATTCTGAAAAAAAAATATTTTTATGTATAATTAAAAATTTAAAATTTAAAATAAATTCTTTAATAATTT
uses only A, C, G, and T.
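This can be checked directly in R. The following is a minimal sketch in base R, using only the first line of the sequence above; the variable names `dna`, `bases` and `letters_used` are chosen for illustration and are not part of the course material.

```r
# First line of the sequence shown above, stored as one character string
dna <- "ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATGGTACGTTATGTGTTATAAGAATTTCTGGAA"

# Split the string into single letters
bases <- strsplit(dna, "")[[1]]

# Which distinct symbols appear in the sequence?
letters_used <- sort(unique(bases))
print(letters_used)

# How many times does each symbol appear?
print(table(bases))
```

Storing the sequence as a vector of single letters makes counting easy, because each position becomes one element.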
We can represent proteins as combinations of 20 letters.
MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERI
FAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHVLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEA
RGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYS
AAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPC
LIKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLIT
QSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGISAKFFAAL
ARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSW
LKNKHIDLRVCGVANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAV
ADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELM
In molecular biology we often work with sequences
This is the main reason why computing is useful for molecular biology
For example:
Each of these values is a number with decimals,
and with a margin of error
Statistics is a way to tell a story that makes sense of the data
In genomics, we look for biological sense
That story can be global: about the complete genome
Or it can be local: about some region of the genome
We will start with global properties
GC content is the percentage of nitrogenous bases in a DNA molecule that are either guanine or cytosine.
It can be measured experimentally from the melting temperature of the DNA double helix, using spectrophotometry
The absorbance of DNA at a wavelength of 260 nm increases when the DNA separates into two single strands at the melting temperature
If the DNA has been sequenced then the GC-content can be accurately calculated by simple arithmetic.
GC-content percentage is calculated as \[\frac{G+C}{A+T+G+C} \times 100\%\]
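The arithmetic can be sketched in base R as follows. The function name `gc_content` is hypothetical, chosen only for this illustration, and the sketch assumes the sequence is given as a single character string. Letters other than A, C, G and T are ignored, so the denominator is A+T+G+C as in the formula.

```r
# Hypothetical helper: GC content (in percent) of a DNA string
gc_content <- function(dna) {
  # split into single upper-case letters
  bases <- strsplit(toupper(dna), "")[[1]]
  # keep only unambiguous bases, so the denominator is A+T+G+C
  bases <- bases[bases %in% c("A", "C", "G", "T")]
  # count G and C, then divide by the total and scale to percent
  gc <- sum(bases %in% c("G", "C"))
  100 * gc / length(bases)
}

gc_content("ATGC")      # 2 of 4 bases are G or C: 50
gc_content("AAAATTTT")  # no G or C at all: 0
```

The same function works on a whole genome once the lines of the FASTA file are pasted into one string.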
Write a step-by-step plan to find the GC content of the first gene of E. coli
Write it in English
There are several ways to store DNA or protein data
Most of the time they are stored in FASTA format
FASTA files are text files, with some rules
Microsoft Word files (doc or docx) are NOT text files
You should never use Microsoft Word to store sequences
A good alternative is Visual Studio Code, also by Microsoft
>AP009180.1 Candidatus Carsonella ruddii PV DNA, complete genome
ATGAATACTATATTTTCAAGAATAACACCATTAGGAAATGGTACGTTATGTGTTATAAGAATTTCTGGAA
AAAATGTAAAATTTTTAATACAAAAAATTGTAAAAAAAAATATAAAAGAAAAAATAGCTACTTTTTCTAA
ATTATTTTTAGATAAAGAATGTGTAGATTATGCAATGATTATTTTTTTTAAAAAACCAAATACGTTCACT
GGAGAAGATATAATCGAATTTCATATTCACAATAATGAAACTATTGTAAAAAAAATAATTAATTATTTAT
TATTAAATAAAGCAAGATTTGCAAAAGCTGGCGAATTTTTAGAAAGACGATATTTAAATGGAAAAATTTC
TTTAATAGAATGCGAATTAATAAATAATAAAATTTTATATGATAATGAAAATATGTTTCAATTAACAAAA
AATTCTGAAAAAAAAATATTTTTATGTATAATTAAAAATTTAAAATTTAAAATAAATTCTTTAATAATTT
GTATTGAAATCGCAAATTTTAATTTTAGTTTTTTTTTTTTTAATGATTTTTTATTTATAAAATATACATT
TAAAAAACTATTAAAACTTTTAAAAATATTAATTGATAAAATAACTGTTATAAATTATTTAAAAAAGAAT
TTCACAATAATGATATTAGGTAGAAGAAATGTAGGAAAGTCTACTTTATTTAATAAAATATGTGCACAAT
ATGACTCGATTGTAACTAATATTCCTGGTACTACAAAAAATATTATATCAAAAAAAATAAAAATTTTATC
TAAAAAAATAAAAATGATGGATACAGCAGGATTAAAAATTAGAACTAAAAATTTAATTGAAAAAATTGGA
ATTATTAAAAATATAAATAAAATTTATCAAGGAAATTTAATTTTGTATATGATTGATAAATTTAATATTA
AAAATATATTTTTTAACATTCCAATAGATTTTATTGATAAAATTAAATTAAATGAATTAATAATTTTAGT
TAACAAATCAGATATTTTAGGAAAAGAAGAAGGAGTTTTTAAAATAAAAAATATATTAATAATTTTAATT
TCTTCTAAAAATGGAACTTTTATAAAAAATTTAAAATGTTTTATTAATAAAATCGTTGATAATAAAGATT
TTTCTAAAAATAATTATTCTGATGTTAAAATTCTATTTAATAAATTTTCTTTTTTTTATAAAGAATTTTC
ATGTAACTATGATTTAGTGTTATCAAAATTAATTGATTTTCAAAAAAATATATTTAAATTAACAGGAAAT
TTTACTAATAAAAAAATAATAAATTCTTGTTTTAGAAATTTTTGTATTGGTAAATGAATATTTTTAATAT
AATTATTATTGGAGCAGGACATTCTGGTATAGAAGCAGCTATATCTGCATCTAAAATATGTAATAAAATA
AAAATAATTACTTCAAATTTAGAAAACTTAGGTATAATGTCTTGTAATCCTTCAATAGGAGGTATTGGAA
AATCACATTTAGTTAAAGAATTAGAATTATTTGGTGGAATAATGCCAGAAGCATCTGATTATAGTAGAAT
ACATTCTAAATTATTAAATTATAAAAAAGGAGAATCTGTTCATTCTTTAAGATATCAAATTGATAGAATT
TTATATAAAAATTACATATTGAAAATTTTATTTTTAAAAAAAAATATTTTAATAGAACAAAATGAAATAA
ATAAAATTATTAGATTTAAAAAAAAAATTTTAATCTTTAACAAATTAAAATTTTTTAATATAGCAAAAAT
TATTATTGTTTGTGCTGGTACTTTTATTAATTCTAAAATATATATAGGCAAAAATATTAAAGCTTTGAAC
AAAGCAGAAAAAAAATCTATTTCTTATTCTTTTAAAAAAATAAATTTATTTATTTCAAAATTAAAAACAG
GCACACCTCCAAGATTAGATTTAAATTATTTAAATTATAAAAAATTAAGTGTTCAATATAGTGATTATAC
TATTTCATATGGTAAAAATTTCAATTTTAATAATAACGTAAAATGCTTTATAACAAATACTGATAATAAA
ATTAATAACTTTATTAAAAAAAATATTAAAAATTCATCTTTATTTAATTTAAAATTTAAATCTATAGGAC
CCAGATATTGTCCAAGTATTGAAGATAAAATTTTTAAATTTCCAAATAATAAAAATCATCAAATTTTTTT
AGAGCCAGAAAGTTATTTTAGTAAAGAAATTTACGTTAATGGATTATCTAATTCATTATCTTATAATATT
CAAAAAAAATTAATAAAAAAAATTTTAGGAATTAAAAAAAGTTATATTATAAGATATGCGTATAATATTC
AATATGATTATTTTGACCCTAGGTGTTTAAAAATTTCTTTAAATATTAAATTTGCTAATAATATATTTTT
AGCAGGACAAATTAATGGTACAACTGGTTATGAAGAAGCTTCTTCACAAGGTTTTGTTGCAGGAATAAAT
TCCGCAAGAAAAATTTTAAAACTACCTTTATGGAAACCAAAAAAATGGAATTCTTATATAGGAGTTTTAT
TGTATGACTTAACTAATTTTGGAATTCAAGAACCTTATAGAATTTTTACTTCAAAATCAGACAATCGCTT
ATTTTTAAGATTTGATAATGCAATATTTAGATTAATAAATATTTCTTATTATTTAGGATGTTTACCTATT
GTTAAATTTAAATATTATAATTCTTTAATATACAAATTTTACAAAAATTTAATTAATATTAGAAAAATAA
AGTTATTTGATAATTTTTATTTGTTTAAGTTAATAATTATAATGTCAAAATATTATGGTTATATTAAAAA
AAAATATTTTAAATAATTTTCTTAATTTTAAAATAATTGATTTAAATTTAATAATATTATTATTATTTAT
ACATTTAATTGTATTTTATTTATTAAAAAATAATAATTTAATGATATTATTATCAATATATTTAAACAAT
TTTATTAAAAATTCTATCAACCTAAATTCAAGAAATATAATTTTTTTTTTTTCACTAGTATTGTTTAATA
TAATATTATTTTCTAATTTTATTGATTTATTTCCAAATAATTTAATAAAAAATTTTTTAAATTTAAAACA
AATTGAAATTGTTCCAACTTCAAATATAAATATAACTTTTTGTTTTTCAATAATTTCTTTTTTAATAATT
ATAATGTTAACACATAAAAAAATAGGTTTTAAAAAGTATATATATAGTTTTTTTATTTATCCAATAAACA
CTGAATACTTATATTTATTTAATTTTATTATTGAAAGTATTTCTTATATAATGAAACCGATATCTTTATC
TTTAAGATTATTTGGAAATATTTTTTCTTCTGAAATTATATTTAATATAATTAATAATATGAATGTATTT
ATTAATAGTTTTTTAAATTTAATTTGGGGAATTTTTCATTTTATAATTTTACCTCTTCAATCTTTTATTT
TTATTACATTGGTTATAATATATGTTTCACAAACTTTAAATCATTAAAAAAAAAAATGAATAATTTATTA
ATATTATCTTCATCAATAATGATAGGATTATCATCTATTGGAACAGGTATAGGATTTGGAATTTTAGGAG
GAAAACTTTTAGATTCCATATCAAGACAACCAGAATTAGATAATTTATTATTAACTAGAACTTTTTTAAT
GACAGGATTATTAGATGCTATTCCAATGATAAGCGTAGGTATAGGTTTATACTTAATATTTGTTTTATCA
AATAAATAATATGAATTTCAATTATACTATTATTAATGAATTTGTATCTTTTTTAATTTTTTTTTATGTT
TCATTTAAAATTATATTTCCAGTTATATTAAAAAAAATAAATAATTTTTTAATAATTGATTATAAAAATT
TTGTTTTTAACAATCAAGAAAAAATTATTAAAAAAAAATTATTAGATGAAATAGTTAAAAACGAAAATTT
AACAAATAAGAAATTTATATCTTTAATAGAAAAAATAAAAAAAAGTATTTTATTAGAAAAACAAAATTTT
ATTAATTTTATAAAATTAGAAAAAATAAACGTTCTAAAAATTTTTAAAAAAAAAATATTAAATAATAATA
TGTTAATTATTAAAAACTTTTTAATTGAGATTAAAAAATTGTTTATAAATAGCTTTAAAAATATTTTTAA
TGAAATTATTTGTTATAACAATGAATTTATAATTAATTATGTTTAAATTTATAAACAGGTTTTTAAATTT
AAAAAAAAGATATTTTTATATTTTTTTAATAAATTTTTTTTATTTTTTTAATAAATGTAATTTTATTAAA
AAAAAAAAAATATATAAAAAAATAATTACTAAAAAATTTGAAAATTATTTATTAAAATTAATTATTCAAA
AATATGCTAAATGAAGGAATAATAAACAAAATTTATGATAGTGTAGTTGAAGTTCTTGGATTGAAAAATG
CTAAATATGGTGAAATGATTTTATTTAGTAAAAATATTAAAGGAATAGTATTCAGTTTAAACAAAAAAAA
TGTAAATATAATTATATTAAATAATTATAACGAGTTAACACAAGGAGAAAAATGTTATTGCACAAACAAA
ATATTTGAAGTTCCTGTTGGAAAACAATTAATAGGTAGAATAATAAATTCTAGAGGAGAAACTCTCGATT
TGTTACCAGAAATTAAAATAAATGAATTTTCACCTATTGAAAAAATAGCACCAGGTGTTATGGATAGAGA
AACAGTAAATGAGCCATTATTAACTGGAATAAAATCTATTGATTCAATGATTCCTATTGGAAAAGGACAA
CGAGAATTAATTATTGGTGATAGACAAACTGGAAAAACTACAATTTGTATTGATACTATTATTAATCAAA
AAAATAAAAATATTATTTGTGTTTATGTTTGTATAGGTCAAAAAATATCTTCTTTAATAAATATTATTAA
TAAGCTTAAAAAATTTAATTGCTTAGAATATACAATTATTGTAGCTTCAACTGCCTCAGATAGTGCAGCG
GAGCAGTATATTGCTCCATATACTGGAAGCACAATAAGTGAATATTTTCGTGATAAAGGACAAGATTGCC
TAATTGTTTATGATGATTTAACAAAACATGCTTGGGCATATAGACAAATTTCTTTACTATTAAGACGTCC
ACCTGGTCGTGAAGCTTATCCTGGTGATGTATTTTATCTTCATTCAAGATTATTAGAAAGATCATCTAAA
GTGAACAAATTTTTTGTAAATAAAAAATCTAATATTTTAAAAGCAGGTTCTTTAACTGCATTTCCTATAA
TTGAAACTTTAGAAGGAGACGTAACTTCTTTTATTCCAACAAATGTTATTTCTATAACTGATGGTCAAAT
TTTTTTAGATACAAATTTATTTAATTCAGGAATTAGACCATCAATAAACGTTGGATTATCTGTTTCTAGA
GTTGGTGGCGCTGCTCAATATAAAATTATTAAAAAATTAAGTGGAGACATTAGAATTATGTTAGCTCAGT
ATAGAGAATTAGAAGCATTTTCTAAATTTTCATCCGATCTTGATAGTGAAACTAAAAATCAATTAATAAT
TGGAGAAAAAATAACAATATTAATGAAACAAAATATACATGATGTTTATGATATATTTGAATTAATATTA
ATATTATTGATAATTAAACATGATTTTTTTAGACTAATTCCAATAAACCAAGTTGAATATTTTGAAAATA
AAATTATAAATTATTTAAGAAAAATTAAATTTAAAAATCAAATTGAAATTGACAACAAAAATTTAGAAAA
TTGTTTAAACGAATTAATAAGTTTTTTTATATCAAACAGTATATTATGATTATTAAAGAAATAAATAGTA
AAATAAAAATAACAACAAATATCAATAAATTAACTAATACTTTGAGTATGATTTCATTGTCTAAAATGAA
TAAATATATAAATTTAATTAATAATTTAGATTATATTAACATTGAATTAAAAAAAATTTTAGAATATATT
ATTATTAACATTAAAAGTAACGTATTTTGTTTAATAATAATTACTTCAAACAAAGGATTGTGTGGAAATT
TAAATAATGAAATTATTAAATACTCGCTTAATTATATTAAAAACAATAAAAATTTAGATTTAATTTTAAT
AGGAAAAAAAGGAATAGATTTTTTTAATAAAAAAAATTTTTATATTAAAGAAAAAATAATTTTTAAAGAC
AATGAATTAAAAAATTTAGTTTTTAATAATAAAATTTTAAATGATTTAAAAAAATACGAAAATATTTTTT
TTATTAGTTCAAAAATTATTAAAAATAACGTTAAAATAATAAAAACAGATTTGTATTTAAAAAAAAAATA
TAATTATTTAATAAAACATAATTTTAATTATGATTGTTTTTTAAAAAATTTTTATAATTATAATTTAAAA
TGTTTGTATTTAAATAACTTGTTTTGTGAATTAAAATCTAGAATGATTACAATGAAGTCTGCTGCTGATA
ATTCAAAAAAAATAATTAAAGACATGAAATTAATAAAAAATAAAATTAGACAATTTAAAGTTACTCAAGA
TATGCTTGAAATAATAAATGGAAGTAATTTATGATAGGAAGAATTGTACAAATTTTAGGTTCTATAGTAG
ACGTTGAATTTAAAAAAAACAATATTCCATATATATATAATGCTTTATTTATTAAAGAATTTAATTTATA
TTTAGAAGTTCAACAACAAATTGGAAATAATATTGTAAGAACTATAGCTTTAGGTAGTACCTATGGATTA
AAAAGATATCTTTTAGTAATAGATACTAAAAAACCAATTTTAACTCCTGTTGGAAATTGTACTTTAGGAC
GTATATTGAATGTTTTAGGTAATCCCATTGATAATAATGGTGAAATTATTTCAAACAAAAAAAAACCAAT
ACATTGTTCACCGCCAAAATTTTCAGATCAAGTATTTTCAAATAATATATTAGAAACTGGAATAAAAGTA
ATAGATTTATTGTGTCCATTTTTAAGAGGAGGAAAAATTGGTTTATTTGGTGGAGCAGGTGTTGGTAAAA
CTATAAATATGATGGAATTAATAAGAAATATTGCAATTGAACATAAAGGATGTTCTGTATTTATAGGAGT
TGGTGAAAGAACTCGTGAAGGAAATGATTTTTATTATGAAATGAAAGAATCAAATGTATTAGACAAAGTT
TCTTTAATATATGGTCAAATGAATGAACCTTCAGGTAATAGATTAAGAGTTGCATTAACTGGATTAAGTA
TAGCAGAAGAATTTAGAGAAATGGGTAAAGATGTACTTTTATTTATAGATAATATTTACAGATTTACGTT
AGCAGGTACTGAAATTTCAGCATTATTGGGAAGAATGCCTTCAGCTGTTGGATATCAGCCTACTTTAGCA
GAAGAAATGGGAAAATTACAAGAAAGAATTTCTTCAACAAAAAATGGAAGTATTACTTCAGTACAAGCTA
TATACGTACCTGCTGATGATTTAACAGATCCATCTCCAAGTACTACTTTTACTCATTTAGATTCTACTAT
TGTTTTGTCTAGACAAATAGCGGAATTAGGAATTTATCCTGCTATTGATCCATTAGAATCTTATTCTAAA
CAATTAGATCCTTATATAGTAGGAATTGAACATTATGAAATTGCTAATTCTGTAAAATTTTATTTACAAA
AATATAAAGAATTAAAAGATACAATAGCTATTTTAGGAATGGACGAATTATCAGAAAATGATCAAATTAT
TGTTAAAAGAGCAAGAAAGTTGCAAAGATTTTTTTCTCAACCTTTTTTTGTTGGTGAAATATTTACAGGA
ATAAAAGGAGAATATGTAAATATAAAAGATACAATTCAATGTTTTAAAAATATTTTAAATGGTGAATTTG
ATAATATTAATGAAAAAAATTTTTATATGATAGGAAAAATATGAATTTATTAATTTTAAGTATAAAAAAT
ATTATAGAATATAAAAATGCTTCTATATTAAATGTAAAAACATACTTAAAACTTTTTTCAATTATGAATA
ATCATATAAATAATATTTGCGATGTTAATCAAATTAAGTTAATATTTAAAAATAAAATCATAAATATAAG
AATTAATAATGGTTTTTTATTTCAAAAAAAAAATAATACTAAAATAATATGTAATTTTTATGAATTTTTA
TAATAAACATATATTAAATGATTTTTCTTTTAAAAAGTATGAAATTTTAACTTTATTTGAAATTAGTAAA
AAAAAAATAAAAAATTTTTTAAATAATAAAAATATTTGTATTTTAAATGATAAAAAATCATTAAGAACAA
TTAATTCACTAATTAATAGTTTTAATTATTTAAATATTAAATATTTGCAAATTTTAAATAATCATAATAT
TAAAAAAGAAAGTTTTAAAGATTTTTCAAGAACAATAGGTTTAAATTTTGATTATTTATATTATAGATGT
TTAAATGACAAAATATTAAAAATTATTGCAAAATATTCAAGTTTAATAATTGTAAACTTATTAAGTAATG
GATATCATCCAATTCAAGCATTAACTGATATTAATAGTTTTTTTTATAATAAAAAAGATGTTTTAATGTA
TATAGGAAATATAACTTCAAATGTAATTAGATCAATAATTATATTATTATCAAAGATAAATTATCTTGTT
GTTTTAATATCACCTATTAAATATTGGTTTAAATTTTTAATAAAAAAAATTTTTCCAAAAAAGAAAATAC
TTATAAGTGAAAAATTAATTTTATTTAAAAAAAAATATTATGTATATACAGATGTTTGGGAATCAATGAA
TAATAAAAATGTAAAAATAACTGATTTTTTAAACTTACAAATTAATAAAAAATTATTTGATTTAATTAAA
ATAAAAAAAGTATTACATTGTATGCCAAGATTTAATAAAAGTTATTTAGATTTTGAAATTTCAAATTTAG
TATTTGAATCAGATTACTTTTTAGTTAATAATTCGATAATTAAAAAAAATAAAATATTTAAAAGTTATAT
TTTTATTAGTAATTCATTTTTTTTTAAAATCATTTAGTTCTTTTAAATTAATATTATAAGATAGTTTGTT
TATATAATCAAAAATTTCATTTTTTTTATATTCAATAATTTTAATAATTTTTTTCATAAACTTAAAATAT
AATTTATTGCATGAAAATATCCATTCTCTTTCATACCTGAAATTACAACATTTGAATTAAGTGAAATTAT
AGAATTTTTTCTATAATATTCTCTATTGTATATATTTGATATATGTAATTCGATAATTTTACCTTTAAAA
ATTTTTATACAATCTAATAAAGCAATTGAATAATGACTATATGCACCTGGATTTATAATAATATAATTAA
AGTTTATATTTTTTTGAATAAAATTAATTATTTTTCCTTCGCAATTTGAATTATAAAATTTAATATTTAT
AATATTTTTTGAGTATTTTAAAATTTTTTTTTTTAATTTTTTAAAAGAAATTTTAGAATAAATTTTTTCT
CTTTTTTTTAAAAAATTAATATTTGGTCCATTTATTATTAATACATTTATAATTTTATTACAAACAAACA
TAGTTTAATTAAAAATTTTTTGTTTAAATTAGTTTTTTTTTTTAGTTCTAGTTCGTTACTAGAATATCCA
AATTTTTTTATGTTTAACACATACGTAAAGTATTTTTTATATTTATACCAAAAATCATCATTTGAAGATT
CAACAAAAATTATTTTTTTACAATTTAAAATTTTGTTTTTATATATTTTTTTTTGTTTATCAAATAATTT
ATTGCAGAATAATGAAATTATTTGAATAATAAAATATTTTTTTAAGAAAAAATAACATTCAAAACAGATT
TCTAAATCTGATCCATTAGAAACAATTATTAATTCTATTTTTTTTTTATAAAAACATGAATAAGTACCAG
TTATAATATTTTTAATATTATATATTTTAATAAAATTGTTTTTAAAATTTTGTCTTGATAAAATTAGAGA
CGAACAATTATTTAAAAATTTCAAAATTAATATCCAACATAGAATTAATTCTATATAATTATATGGTCTA
AATATATAATTTCTTGGTATTATTCTAATTGAATGTAATTGTTCAATTGGTTGATGTGATGGTCCATCTT
CACCAACTAAAATTGAATCATGTGTAAATATAAAAATATTTTTAAGTTTAGATAAACAAAAATTTCTTAT
TGCACTATACATATAATTTGAAAAAACTAAAAAAGTAGAACAATAATTTATTCCTATTTTATCAGAAGAT
AACCCGTAATTTATTAATCCCATTGTAAATTCTCGTACTCCATAATTTATATATCTATTTTTAAAATTTT
TATATCTAATAGAATTAATAAAATTGTTTTTTGTTAAGTTAGAATTTGTTAAATCTGCGCTTCCTCCAAA
TGTTTCATTTATTGCATATATATTTTTTAATATATTAGAACAAACAAATCTAGTAGACTTATTTAAATTT
ATTTTATAGTATTTAAAATATAATTTTAAAAAATTTATTTTTGGTATAATGTTATTAAAAATTCTTATTA
ACTCAAAAAAATATTTTTTATATTTTTTTTTGTAGTATATTAAATATTTTTTTTTATTATCAAAAAACAT
TTTTTTAACATAATCATATGTTAATGTAAAATTTTTTAAAATTTCTAAAAATTCAAATTTTGTAAAAATA
TTTCCATGAGAATTTTCATTATATGATTTACATGGAGAAATAAATCCTATTATAGTATTGTAAATTATAA
TTGTTGGAAAATAACTTTTTTTTGCTTTTAATAAAGATTTAATTATTGAAAAATAGCAATGTCCATTTAT
TGGTCCAATAACATTCCAATTTAATGAAATAAATTTTAACTTAATATTTTCATTAAAATAATTTTTAACA
TTTCCATCTATTGAAATATTATTACTATCATATAATAATATAATATTGTTAATATTATAGCATCCACAAA
AAGAACATGATTCGGAGGACACTCCTTCCATTAAACATCCATCTCCACAAAATATCCAAACTTTATTATT
GAATATATTAAAAAAATTATTAAATTTATTTTTATACTTTTTACTTTTTAAACCAATTCCAATTCCAATT
CCAATTCCTTGTCCTAATGGACCAGTTGAAGCATCAATAAAATTTCCAATTTCAGGATGACCTGGTGTAT
TAGAATTAAACCTTCTAAAATTTATTAAATCTTTTATTTTATATACATTGTATAAATAAAGTAATACATA
ATTTATAATTATTCCATGCCCATTTGAAATTATAAGTTTATCTTTATTAATTGATTTTAAATTGTTAAAA
TTTATTTTATAAAAATTTAAAAAAAAAATCGTAAATACATCACAAATTCCAAGAGGCATACCGGGATGTC
CAGAATTAGCTTTTGAAATTGATTTAATACAAATTAATCTAATATTATTTATTATGTTATATAACATTTT
AAAATTTAAAATTTTTTTTTTCAAAATTTATTCAATTTGTAATTATAAAACAAATACTTTTTCTATTTTA
AATAAAAAAATAAAATATAATTTTTTTTTGAACTTTATTAATTATTATATAAATTATTTAAATTATAACA
ATAAAAAAAAAATTGGAATTTTAATGTATTTTAAAGTATCAAAAGTAATTTCTTCTTTTAACATAGAAAA
AAATGGTATCTTTTTTTTTTCAAACAAGAATGTTTTTTTATATAAAATATTAAAAAATTATGATATAAAC
AATATTTATCACGTAATTAAAATAATTAAAATAAATAAAATAAAGTTTAACTTAAAAATTTTAAAAAAAA
TATTTACAAAAATTTTAAAAAAAAAAAGAAAAGAAGTATATGAAAAATTAGAAGAAAGATATTTAATTAC
AATACTATTAAATAATTTAAACGAAACAAAAAATAAGATTATTAATATTTATAAATCATTAATTAATTAT
AATACTAATAATTTTTTTTTAATTAATAAAGAATTTAACAAAGTATGTTCTTTACTGTATTTAAGTAAAA
ATGAAAGTTTGTCGAAAAAAATTCATTTAGGATTAATAAAAAATAATTTTAAAGAAGAAACTCCTTTTTA
TTTAAATTACATATTTAATTATTTCTTAAAATTTAATGAGCTAAAATTAACAATTTCAATTGAAATTTAT
AACTTAGATATTTTAAAAATAATTAAAACAATCAAAAAAAATAAAAAAATAAAAATTTTCATTAATGTTG
GTATAAATGATTTATTTTTTGAAAAAATTTTTAAAAAAAAAAAAATAATTTTATTTAATTCGTTTAAAAT
AAAAAAAGAATATGGTTATTACGTACAAAATTTTTTTGATGAATATGTTGGATATGGATCATTTAGAAAA
ATGTATTTTAAAATATTTAAAAACAAAAATATTTTTAAGATAAAAATTTGTGCTAAATATTTTTTTTTAA
AAATTTTAAAAACTAAAAATTTAAAAATTTATTTTTTAGATTCTTTAAACAGAAACAATTTAAATAAACA
TATTAGTAATTTACTTACTGGATTTTTTCATCCAAAAATATTTGATAAAAATAATTTTTTTAAAAAAAAA
TATTTTTTTTACAAAAACAATAATATTTTAATAAATAAAAATAATTCTTTTTATTTAGAAATAAAATTTT
TTGTAAATTTTAAAATTTGTAAATATATTAAAAAAAAAATTGTTTTTTTATATAAATTTTTTAACAAAGA
AAGTGAAAATTATATTATAAAAAAAGAAATAAATTTTTGTTTAAATTATCGAATAAAACCAATAACAATT
TATTTTCATGTAGTAAATAAAAAAGTTGAAGAATATATTAATTTTTTAATTTTACAAATTAATTGTAATT
TATCAAAGAAAAATAATTCATATTGTTGGTACTTTGGTAGTAATATTTATAATAGCAATTTTTTTTATAT
TAAAAAATATATATCAAAAAAATGGAATTTTATTATTAAGAAAATCATTTTATTTAAAATAAAAAATTCT
GTTTATTTAAATTTTAAAATTAAAAAAACAAATTTAAAACTAATATCATTAGATAATTTTTTATTAAAAT
TAATAATTAAAAATTGGCAAAAAAAAAATGAAAAATATTAGTTTTGAAATATTTCCTTGTAATAACATTA
AAGACTTATCTGTTTTAATAAATTATTTAAACAAAAATAAACCTAGTTTTGTTTCTGTAACATTTGGAAA
AATCAATAACTTAAAATTTGTTAAAAATATACAAAAACAGATTTCTACAAAAATAATACCACATTTAATA
TGTGATAATATATTTAATATTATTAATTATATAATTTATTTTATTAAAATAAAAATATTTAATTTTTTAA
TAATTACAGGAGACAAAAACAAAAATAATTCTATAAAATATATTTATTTTATTAGATTTTTGTTTGGTCA
TATAATTAAGATAATAACAGGATGTTATTTTGAAAATCACAAATTTTCTAAAAATTTTAAAAACGAAATT
TTATTTCATTATAAAAAAAATAAAATAGGAACTAATATGTGTATTACACAGTTTTTTTATAATTTTAACA
CAATAAAGTATTACATTAATATTATTAAAAAAACTGGTATTAGTAAAAATTTTATATTAGGAATAATTTC
AAAAAAAAATATAAAAGATATTTTAAATTATACTAATTTATGTAAAATAGATATTCCAATTTGGATAATT
AAAAATTATAAAGAATTTAATATTGAACTTTTTTTTGTTAAAAATTTAAAAAAATACAAAAATTTGCATT
TTTATACTTTTAACAATATTAATTTAATTAAAAATTATTTTAAATAAATTTTATTGTTATAAAATAAGTA
TACAAAATAATTAATAATAAAAAAAAATTTTTTATTAATAAAAAAAAAAATTTTTTTTATTAAAAAGTTT
CTAACAAAATTTAAAACATTTACTTTAATCATTTAAATTATTTTAAAAAAAAAAAAAATAAACAATTCAT
TATACTAAAAATAGTTAAAATTTAATTTTTAAATTACTTTATTAAACTTGATATTTTTAAAAAAAAAAAA
… and more
>CRP_004 F0F1-type ATP synthase C subunit
ATGAATAATTTATTAATATTATCTTCATCAATAATGATAGGATTATCATCTATTGGAACAGGTATAGGAT
TTGGAATTTTAGGAGGAAAACTTTTAGATTCCATATCAAGACAACCAGAATTAGATAATTTATTATTAAC
TAGAACTTTTTTAATGACAGGATTATTAGATGCTATTCCAATGATAAGCGTAGGTATAGGTTTATACTTA
ATATTTGTTTTATCAAATAAATAA
>CRP_005 putative F0F1-type ATP synthase B subunit
ATGAATTTCAATTATACTATTATTAATGAATTTGTATCTTTTTTAATTTTTTTTTATGTTTCATTTAAAA
TTATATTTCCAGTTATATTAAAAAAAATAAATAATTTTTTAATAATTGATTATAAAAATTTTGTTTTTAA
CAATCAAGAAAAAATTATTAAAAAAAAATTATTAGATGAAATAGTTAAAAACGAAAATTTAACAAATAAG
AAATTTATATCTTTAATAGAAAAAATAAAAAAAAGTATTTTATTAGAAAAACAAAATTTTATTAATTTTA
TAAAATTAGAAAAAATAAACGTTCTAAAAATTTTTAAAAAAAAAATATTAAATAATAATATGTTAATTAT
TAAAAACTTTTTAATTGAGATTAAAAAATTGTTTATAAATAGCTTTAAAAATATTTTTAATGAAATTATT
TGTTATAACAATGAATTTATAATTAATTATGTTTAA
>CRP_006 hypothetical protein
ATGTTTAAATTTATAAACAGGTTTTTAAATTTAAAAAAAAGATATTTTTATATTTTTTTAATAAATTTTT
TTTATTTTTTTAATAAATGTAATTTTATTAAAAAAAAAAAAATATATAAAAAAATAATTACTAAAAAATT
TGAAAATTATTTATTAAAATTAATTATTCAAAAATATGCTAAATGA
>CRP_007 F0F1-type ATP synthase alpha subunit
ATGCTAAATGAAGGAATAATAAACAAAATTTATGATAGTGTAGTTGAAGTTCTTGGATTGAAAAATGCTA
AATATGGTGAAATGATTTTATTTAGTAAAAATATTAAAGGAATAGTATTCAGTTTAAACAAAAAAAATGT
AAATATAATTATATTAAATAATTATAACGAGTTAACACAAGGAGAAAAATGTTATTGCACAAACAAAATA
TTTGAAGTTCCTGTTGGAAAACAATTAATAGGTAGAATAATAAATTCTAGAGGAGAAACTCTCGATTTGT
TACCAGAAATTAAAATAAATGAATTTTCACCTATTGAAAAAATAGCACCAGGTGTTATGGATAGAGAAAC
AGTAAATGAGCCATTATTAACTGGAATAAAATCTATTGATTCAATGATTCCTATTGGAAAAGGACAACGA
GAATTAATTATTGGTGATAGACAAACTGGAAAAACTACAATTTGTATTGATACTATTATTAATCAAAAAA
ATAAAAATATTATTTGTGTTTATGTTTGTATAGGTCAAAAAATATCTTCTTTAATAAATATTATTAATAA
GCTTAAAAAATTTAATTGCTTAGAATATACAATTATTGTAGCTTCAACTGCCTCAGATAGTGCAGCGGAG
CAGTATATTGCTCCATATACTGGAAGCACAATAAGTGAATATTTTCGTGATAAAGGACAAGATTGCCTAA
TTGTTTATGATGATTTAACAAAACATGCTTGGGCATATAGACAAATTTCTTTACTATTAAGACGTCCACC
TGGTCGTGAAGCTTATCCTGGTGATGTATTTTATCTTCATTCAAGATTATTAGAAAGATCATCTAAAGTG
AACAAATTTTTTGTAAATAAAAAATCTAATATTTTAAAAGCAGGTTCTTTAACTGCATTTCCTATAATTG
AAACTTTAGAAGGAGACGTAACTTCTTTTATTCCAACAAATGTTATTTCTATAACTGATGGTCAAATTTT
TTTAGATACAAATTTATTTAATTCAGGAATTAGACCATCAATAAACGTTGGATTATCTGTTTCTAGAGTT
GGTGGCGCTGCTCAATATAAAATTATTAAAAAATTAAGTGGAGACATTAGAATTATGTTAGCTCAGTATA
GAGAATTAGAAGCATTTTCTAAATTTTCATCCGATCTTGATAGTGAAACTAAAAATCAATTAATAATTGG
AGAAAAAATAACAATATTAATGAAACAAAATATACATGATGTTTATGATATATTTGAATTAATATTAATA
TTATTGATAATTAAACATGATTTTTTTAGACTAATTCCAATAAACCAAGTTGAATATTTTGAAAATAAAA
TTATAAATTATTTAAGAAAAATTAAATTTAAAAATCAAATTGAAATTGACAACAAAAATTTAGAAAATTG
TTTAAACGAATTAATAAGTTTTTTTATATCAAACAGTATATTATGA
>CRP_008 F0F1-type ATP synthase gamma subunit
ATGATTATTAAAGAAATAAATAGTAAAATAAAAATAACAACAAATATCAATAAATTAACTAATACTTTGA
GTATGATTTCATTGTCTAAAATGAATAAATATATAAATTTAATTAATAATTTAGATTATATTAACATTGA
ATTAAAAAAAATTTTAGAATATATTATTATTAACATTAAAAGTAACGTATTTTGTTTAATAATAATTACT
TCAAACAAAGGATTGTGTGGAAATTTAAATAATGAAATTATTAAATACTCGCTTAATTATATTAAAAACA
ATAAAAATTTAGATTTAATTTTAATAGGAAAAAAAGGAATAGATTTTTTTAATAAAAAAAATTTTTATAT
TAAAGAAAAAATAATTTTTAAAGACAATGAATTAAAAAATTTAGTTTTTAATAATAAAATTTTAAATGAT
TTAAAAAAATACGAAAATATTTTTTTTATTAGTTCAAAAATTATTAAAAATAACGTTAAAATAATAAAAA
CAGATTTGTATTTAAAAAAAAAATATAATTATTTAATAAAACATAATTTTAATTATGATTGTTTTTTAAA
AAATTTTTATAATTATAATTTAAAATGTTTGTATTTAAATAACTTGTTTTGTGAATTAAAATCTAGAATG
ATTACAATGAAGTCTGCTGCTGATAATTCAAAAAAAATAATTAAAGACATGAAATTAATAAAAAATAAAA
TTAGACAATTTAAAGTTACTCAAGATATGCTTGAAATAATAAATGGAAGTAATTTATGA
>CRP_009 F0F1-type ATP synthase beta subunit
ATGATAGGAAGAATTGTACAAATTTTAGGTTCTATAGTAGACGTTGAATTTAAAAAAAACAATATTCCAT
ATATATATAATGCTTTATTTATTAAAGAATTTAATTTATATTTAGAAGTTCAACAACAAATTGGAAATAA
TATTGTAAGAACTATAGCTTTAGGTAGTACCTATGGATTAAAAAGATATCTTTTAGTAATAGATACTAAA
AAACCAATTTTAACTCCTGTTGGAAATTGTACTTTAGGACGTATATTGAATGTTTTAGGTAATCCCATTG
ATAATAATGGTGAAATTATTTCAAACAAAAAAAAACCAATACATTGTTCACCGCCAAAATTTTCAGATCA
AGTATTTTCAAATAATATATTAGAAACTGGAATAAAAGTAATAGATTTATTGTGTCCATTTTTAAGAGGA
GGAAAAATTGGTTTATTTGGTGGAGCAGGTGTTGGTAAAACTATAAATATGATGGAATTAATAAGAAATA
TTGCAATTGAACATAAAGGATGTTCTGTATTTATAGGAGTTGGTGAAAGAACTCGTGAAGGAAATGATTT
TTATTATGAAATGAAAGAATCAAATGTATTAGACAAAGTTTCTTTAATATATGGTCAAATGAATGAACCT
TCAGGTAATAGATTAAGAGTTGCATTAACTGGATTAAGTATAGCAGAAGAATTTAGAGAAATGGGTAAAG
ATGTACTTTTATTTATAGATAATATTTACAGATTTACGTTAGCAGGTACTGAAATTTCAGCATTATTGGG
AAGAATGCCTTCAGCTGTTGGATATCAGCCTACTTTAGCAGAAGAAATGGGAAAATTACAAGAAAGAATT
TCTTCAACAAAAAATGGAAGTATTACTTCAGTACAAGCTATATACGTACCTGCTGATGATTTAACAGATC
CATCTCCAAGTACTACTTTTACTCATTTAGATTCTACTATTGTTTTGTCTAGACAAATAGCGGAATTAGG
AATTTATCCTGCTATTGATCCATTAGAATCTTATTCTAAACAATTAGATCCTTATATAGTAGGAATTGAA
CATTATGAAATTGCTAATTCTGTAAAATTTTATTTACAAAAATATAAAGAATTAAAAGATACAATAGCTA
TTTTAGGAATGGACGAATTATCAGAAAATGATCAAATTATTGTTAAAAGAGCAAGAAAGTTGCAAAGATT
TTTTTCTCAACCTTTTTTTGTTGGTGAAATATTTACAGGAATAAAAGGAGAATATGTAAATATAAAAGAT
ACAATTCAATGTTTTAAAAATATTTTAAATGGTGAATTTGATAATATTAATGAAAAAAATTTTTATATGA
TAGGAAAAATATGA
>CRP_010 hypothetical protein
ATGAATTTATTAATTTTAAGTATAAAAAATATTATAGAATATAAAAATGCTTCTATATTAAATGTAAAAA
CATACTTAAAACTTTTTTCAATTATGAATAATCATATAAATAATATTTGCGATGTTAATCAAATTAAGTT
AATATTTAAAAATAAAATCATAAATATAAGAATTAATAATGGTTTTTTATTTCAAAAAAAAAATAATACT
AAAATAATATGTAATTTTTATGAATTTTTATAA
>CRP_004 F0F1-type ATP synthase C subunit
MNNLLILSSSIMIGLSSIGTGIGFGILGGKLLDSISRQPELDNLLLTRTFLMTGLLDAIPMISVGIGLYL
IFVLSNK
>CRP_005 putative F0F1-type ATP synthase B subunit
MNFNYTIINEFVSFLIFFYVSFKIIFPVILKKINNFLIIDYKNFVFNNQEKIIKKKLLDEIVKNENLTNK
KFISLIEKIKKSILLEKQNFINFIKLEKINVLKIFKKKILNNNMLIIKNFLIEIKKLFINSFKNIFNEII
CYNNEFIINYV
>CRP_006 hypothetical protein
MFKFINRFLNLKKRYFYIFLINFFYFFNKCNFIKKKKIYKKIITKKFENYLLKLIIQKYAK
>CRP_007 F0F1-type ATP synthase alpha subunit
MLNEGIINKIYDSVVEVLGLKNAKYGEMILFSKNIKGIVFSLNKKNVNIIILNNYNELTQGEKCYCTNKI
FEVPVGKQLIGRIINSRGETLDLLPEIKINEFSPIEKIAPGVMDRETVNEPLLTGIKSIDSMIPIGKGQR
ELIIGDRQTGKTTICIDTIINQKNKNIICVYVCIGQKISSLINIINKLKKFNCLEYTIIVASTASDSAAE
QYIAPYTGSTISEYFRDKGQDCLIVYDDLTKHAWAYRQISLLLRRPPGREAYPGDVFYLHSRLLERSSKV
NKFFVNKKSNILKAGSLTAFPIIETLEGDVTSFIPTNVISITDGQIFLDTNLFNSGIRPSINVGLSVSRV
GGAAQYKIIKKLSGDIRIMLAQYRELEAFSKFSSDLDSETKNQLIIGEKITILMKQNIHDVYDIFELILI
LLIIKHDFFRLIPINQVEYFENKIINYLRKIKFKNQIEIDNKNLENCLNELISFFISNSIL
>CRP_008 F0F1-type ATP synthase gamma subunit
MIIKEINSKIKITTNINKLTNTLSMISLSKMNKYINLINNLDYINIELKKILEYIIINIKSNVFCLIIIT
SNKGLCGNLNNEIIKYSLNYIKNNKNLDLILIGKKGIDFFNKKNFYIKEKIIFKDNELKNLVFNNKILND
LKKYENIFFISSKIIKNNVKIIKTDLYLKKKYNYLIKHNFNYDCFLKNFYNYNLKCLYLNNLFCELKSRM
ITMKSAADNSKKIIKDMKLIKNKIRQFKVTQDMLEIINGSNL
>CRP_009 F0F1-type ATP synthase beta subunit
MIGRIVQILGSIVDVEFKKNNIPYIYNALFIKEFNLYLEVQQQIGNNIVRTIALGSTYGLKRYLLVIDTK
KPILTPVGNCTLGRILNVLGNPIDNNGEIISNKKKPIHCSPPKFSDQVFSNNILETGIKVIDLLCPFLRG
GKIGLFGGAGVGKTINMMELIRNIAIEHKGCSVFIGVGERTREGNDFYYEMKESNVLDKVSLIYGQMNEP
SGNRLRVALTGLSIAEEFREMGKDVLLFIDNIYRFTLAGTEISALLGRMPSAVGYQPTLAEEMGKLQERI
SSTKNGSITSVQAIYVPADDLTDPSPSTTFTHLDSTIVLSRQIAELGIYPAIDPLESYSKQLDPYIVGIE
HYEIANSVKFYLQKYKELKDTIAILGMDELSENDQIIVKRARKLQRFFSQPFFVGEIFTGIKGEYVNIKD
TIQCFKNILNGEFDNINEKNFYMIGKI
>CRP_010 hypothetical protein
MNLLILSIKNIIEYKNASILNVKTYLKLFSIMNNHINNICDVNQIKLIFKNKIINIRINNGFLFQKKNNT
KIICNFYEFL
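Because FASTA is plain text, files like the ones above can be read with a few lines of base R. The sketch below is only an illustration: `parse_fasta` is a hypothetical helper, not a function from any package, and it assumes that the first non-empty line is a header and that every header is followed by at least one sequence line.

```r
# Hypothetical helper: parse FASTA text given as a character vector of lines
parse_fasta <- function(lines) {
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")      # header lines start with ">"
  # each header starts a new record; cumsum numbers the records 1, 2, 3, ...
  record <- cumsum(is_header)
  # paste together the sequence lines that belong to each record
  seqs <- tapply(lines[!is_header], record[!is_header], paste, collapse = "")
  # the identifier is the first word after ">"
  ids <- sub("^>([^ ]*).*$", "\\1", lines[is_header])
  setNames(as.character(seqs), ids)
}

# Two of the protein records shown above
example_lines <- c(
  ">CRP_004 F0F1-type ATP synthase C subunit",
  "MNNLLILSSSIMIGLSSIGTGIGFGILGGKLLDSISRQPELDNLLLTRTFLMTGLLDAIPMISVGIGLYL",
  "IFVLSNK",
  ">CRP_006 hypothetical protein",
  "MFKFINRFLNLKKRYFYIFLINFFYFFNKCNFIKKKKIYKKIITKKFENYLLKLIIQKYAK"
)

proteins_by_id <- parse_fasta(example_lines)
names(proteins_by_id)   # the identifiers
nchar(proteins_by_id)   # the length of each sequence
```

For a file on disk, the lines could be obtained with `readLines()` and passed to the same helper.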
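A line starting with > marks the beginning of a new sequence
The text after > is the sequence identifier
The sequence continues until the next > or the end of file
NCBI stores all public biological sequences at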
https://www.ncbi.nlm.nih.gov/nuccore
Anybody can upload sequences, and they may be wrong
NCBI has a curation process to validate the sequences
If a sequence is good enough to be a reference, then it is stored in the RefSeq collection https://www.ncbi.nlm.nih.gov/refseq
Accession numbers are the best way to identify a biological sequence
Different sequences can have the same name, but never the same accession
This is an important idea
Everything needs a name. A unique name.
We assign an identifier (or id) to each thing
If two things have the same id, then they are the same thing.
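The difference between names and identifiers can be illustrated with a small table in base R. This is a sketch with assumed data: the `records` data frame is made up for the example, using three gene labels from the FASTA listing above as stand-ins for identifiers (they are locus labels, not NCBI accession numbers).

```r
# Hypothetical table: two different genes share the name "hypothetical protein"
records <- data.frame(
  id   = c("CRP_004", "CRP_006", "CRP_010"),
  name = c("F0F1-type ATP synthase C subunit",
           "hypothetical protein",
           "hypothetical protein"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when every value is unique
names_repeat  <- anyDuplicated(records$name) > 0
ids_are_unique <- anyDuplicated(records$id) == 0
print(names_repeat)
print(ids_are_unique)

# Looking up by id returns exactly one record
records[records$id == "CRP_010", ]
```

Searching by name would return two rows here, which is why the identifier is the safer key.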
You download the FASTA file
Store it on your computer, and change the name
.fna: FASTA nucleotide. Full genome
.ffn: FASTA nucleotide with features, such as genes
.faa: FASTA amino acids. Proteins or peptides
In this course we use two sequences for most of the examples
We use this genome in classes because it is a small example
Candidatus Carsonella ruddii is an obligate symbiont of Pachypsylla venusta.
It is not clear if this is a living cell or simply an organelle. It is missing genes needed for living independently.
Published as “The 160-kilobase genome of the bacterial endosymbiont Carsonella.” https://www.ncbi.nlm.nih.gov/pubmed/17038615
To handle sequence data in R, we use the seqinr library
You have to install it once.
Then you have to load it on every session
The main options of read.fasta are:
file: the name of the FASTA file
seqtype: "DNA" or "AA". Default is "DNA"
set.attributes: if TRUE, gets extra data. We will choose FALSE
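library(seqinr)  # load the package (needed in every session)
# read the protein sequences; set.attributes = FALSE keeps only the letters
proteins <- read.fasta("AP009180.faa", seqtype="AA", set.attributes = FALSE)
proteins[1:10]  # show the first ten proteins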
$`lcl|AP009180.1_prot_BAF35032.1_1`
[1] "M" "N" "T" "I" "F" "S" "R" "I" "T" "P" "L" "G" "N" "G" "T" "L"
[17] "C" "V" "I" "R" "I" "S" "G" "K" "N" "V" "K" "F" "L" "I" "Q" "K"
[33] "I" "V" "K" "K" "N" "I" "K" "E" "K" "I" "A" "T" "F" "S" "K" "L"
[49] "F" "L" "D" "K" "E" "C" "V" "D" "Y" "A" "M" "I" "I" "F" "F" "K"
[65] "K" "P" "N" "T" "F" "T" "G" "E" "D" "I" "I" "E" "F" "H" "I" "H"
[81] "N" "N" "E" "T" "I" "V" "K" "K" "I" "I" "N" "Y" "L" "L" "L" "N"
[97] "K" "A" "R" "F" "A" "K" "A" "G" "E" "F" "L" "E" "R" "R" "Y" "L"
[113] "N" "G" "K" "I" "S" "L" "I" "E" "C" "E" "L" "I" "N" "N" "K" "I"
[129] "L" "Y" "D" "N" "E" "N" "M" "F" "Q" "L" "T" "K" "N" "S" "E" "K"
[145] "K" "I" "F" "L" "C" "I" "I" "K" "N" "L" "K" "F" "K" "I" "N" "S"
[161] "L" "I" "I" "C" "I" "E" "I" "A" "N" "F" "N" "F" "S" "F" "F" "F"
[177] "F" "N" "D" "F" "L" "F" "I" "K" "Y" "T" "F" "K" "K" "L" "L" "K"
[193] "L" "L" "K" "I" "L" "I" "D" "K" "I" "T" "V" "I" "N" "Y" "L" "K"
[209] "K" "N" "F" "T" "I" "M" "I" "L" "G" "R" "R" "N" "V" "G" "K" "S"
[225] "T" "L" "F" "N" "K" "I" "C" "A" "Q" "Y" "D" "S" "I" "V" "T" "N"
[241] "I" "P" "G" "T" "T" "K" "N" "I" "I" "S" "K" "K" "I" "K" "I" "L"
[257] "S" "K" "K" "I" "K" "M" "M" "D" "T" "A" "G" "L" "K" "I" "R" "T"
[273] "K" "N" "L" "I" "E" "K" "I" "G" "I" "I" "K" "N" "I" "N" "K" "I"
[289] "Y" "Q" "G" "N" "L" "I" "L" "Y" "M" "I" "D" "K" "F" "N" "I" "K"
[305] "N" "I" "F" "F" "N" "I" "P" "I" "D" "F" "I" "D" "K" "I" "K" "L"
[321] "N" "E" "L" "I" "I" "L" "V" "N" "K" "S" "D" "I" "L" "G" "K" "E"
[337] "E" "G" "V" "F" "K" "I" "K" "N" "I" "L" "I" "I" "L" "I" "S" "S"
[353] "K" "N" "G" "T" "F" "I" "K" "N" "L" "K" "C" "F" "I" "N" "K" "I"
[369] "V" "D" "N" "K" "D" "F" "S" "K" "N" "N" "Y" "S" "D" "V" "K" "I"
[385] "L" "F" "N" "K" "F" "S" "F" "F" "Y" "K" "E" "F" "S" "C" "N" "Y"
[401] "D" "L" "V" "L" "S" "K" "L" "I" "D" "F" "Q" "K" "N" "I" "F" "K"
[417] "L" "T" "G" "N" "F" "T" "N" "K" "K" "I" "I" "N" "S" "C" "F" "R"
[433] "N" "F" "C" "I" "G" "K"
$`lcl|AP009180.1_prot_BAF35033.1_2`
[1] "M" "N" "I" "F" "N" "I" "I" "I" "I" "G" "A" "G" "H" "S" "G" "I"
[17] "E" "A" "A" "I" "S" "A" "S" "K" "I" "C" "N" "K" "I" "K" "I" "I"
[33] "T" "S" "N" "L" "E" "N" "L" "G" "I" "M" "S" "C" "N" "P" "S" "I"
[49] "G" "G" "I" "G" "K" "S" "H" "L" "V" "K" "E" "L" "E" "L" "F" "G"
[65] "G" "I" "M" "P" "E" "A" "S" "D" "Y" "S" "R" "I" "H" "S" "K" "L"
[81] "L" "N" "Y" "K" "K" "G" "E" "S" "V" "H" "S" "L" "R" "Y" "Q" "I"
[97] "D" "R" "I" "L" "Y" "K" "N" "Y" "I" "L" "K" "I" "L" "F" "L" "K"
[113] "K" "N" "I" "L" "I" "E" "Q" "N" "E" "I" "N" "K" "I" "I" "R" "F"
[129] "K" "K" "K" "I" "L" "I" "F" "N" "K" "L" "K" "F" "F" "N" "I" "A"
[145] "K" "I" "I" "I" "V" "C" "A" "G" "T" "F" "I" "N" "S" "K" "I" "Y"
[161] "I" "G" "K" "N" "I" "K" "A" "L" "N" "K" "A" "E" "K" "K" "S" "I"
[177] "S" "Y" "S" "F" "K" "K" "I" "N" "L" "F" "I" "S" "K" "L" "K" "T"
[193] "G" "T" "P" "P" "R" "L" "D" "L" "N" "Y" "L" "N" "Y" "K" "K" "L"
[209] "S" "V" "Q" "Y" "S" "D" "Y" "T" "I" "S" "Y" "G" "K" "N" "F" "N"
[225] "F" "N" "N" "N" "V" "K" "C" "F" "I" "T" "N" "T" "D" "N" "K" "I"
[241] "N" "N" "F" "I" "K" "K" "N" "I" "K" "N" "S" "S" "L" "F" "N" "L"
[257] "K" "F" "K" "S" "I" "G" "P" "R" "Y" "C" "P" "S" "I" "E" "D" "K"
[273] "I" "F" "K" "F" "P" "N" "N" "K" "N" "H" "Q" "I" "F" "L" "E" "P"
[289] "E" "S" "Y" "F" "S" "K" "E" "I" "Y" "V" "N" "G" "L" "S" "N" "S"
[305] "L" "S" "Y" "N" "I" "Q" "K" "K" "L" "I" "K" "K" "I" "L" "G" "I"
[321] "K" "K" "S" "Y" "I" "I" "R" "Y" "A" "Y" "N" "I" "Q" "Y" "D" "Y"
[337] "F" "D" "P" "R" "C" "L" "K" "I" "S" "L" "N" "I" "K" "F" "A" "N"
[353] "N" "I" "F" "L" "A" "G" "Q" "I" "N" "G" "T" "T" "G" "Y" "E" "E"
[369] "A" "S" "S" "Q" "G" "F" "V" "A" "G" "I" "N" "S" "A" "R" "K" "I"
[385] "L" "K" "L" "P" "L" "W" "K" "P" "K" "K" "W" "N" "S" "Y" "I" "G"
[401] "V" "L" "L" "Y" "D" "L" "T" "N" "F" "G" "I" "Q" "E" "P" "Y" "R"
[417] "I" "F" "T" "S" "K" "S" "D" "N" "R" "L" "F" "L" "R" "F" "D" "N"
[433] "A" "I" "F" "R" "L" "I" "N" "I" "S" "Y" "Y" "L" "G" "C" "L" "P"
[449] "I" "V" "K" "F" "K" "Y" "Y" "N" "S" "L" "I" "Y" "K" "F" "Y" "K"
[465] "N" "L" "I" "N" "I" "R" "K" "I" "K" "L" "F" "D" "N" "F" "Y" "L"
[481] "F" "K" "L" "I" "I" "I" "M" "S" "K" "Y" "Y" "G" "Y" "I" "K" "K"
[497] "K" "Y" "F" "K"
$`lcl|AP009180.1_prot_BAF35034.1_3`
[1] "M" "V" "I" "L" "K" "K" "N" "I" "L" "N" "N" "F" "L" "N" "F" "K"
[17] "I" "I" "D" "L" "N" "L" "I" "I" "L" "L" "L" "F" "I" "H" "L" "I"
[33] "V" "F" "Y" "L" "L" "K" "N" "N" "N" "L" "M" "I" "L" "L" "S" "I"
[49] "Y" "L" "N" "N" "F" "I" "K" "N" "S" "I" "N" "L" "N" "S" "R" "N"
[65] "I" "I" "F" "F" "F" "S" "L" "V" "L" "F" "N" "I" "I" "L" "F" "S"
[81] "N" "F" "I" "D" "L" "F" "P" "N" "N" "L" "I" "K" "N" "F" "L" "N"
[97] "L" "K" "Q" "I" "E" "I" "V" "P" "T" "S" "N" "I" "N" "I" "T" "F"
[113] "C" "F" "S" "I" "I" "S" "F" "L" "I" "I" "I" "M" "L" "T" "H" "K"
[129] "K" "I" "G" "F" "K" "K" "Y" "I" "Y" "S" "F" "F" "I" "Y" "P" "I"
[145] "N" "T" "E" "Y" "L" "Y" "L" "F" "N" "F" "I" "I" "E" "S" "I" "S"
[161] "Y" "I" "M" "K" "P" "I" "S" "L" "S" "L" "R" "L" "F" "G" "N" "I"
[177] "F" "S" "S" "E" "I" "I" "F" "N" "I" "I" "N" "N" "M" "N" "V" "F"
[193] "I" "N" "S" "F" "L" "N" "L" "I" "W" "G" "I" "F" "H" "F" "I" "I"
[209] "L" "P" "L" "Q" "S" "F" "I" "F" "I" "T" "L" "V" "I" "I" "Y" "V"
[225] "S" "Q" "T" "L" "N" "H"
$`lcl|AP009180.1_prot_BAF35035.1_4`
[1] "M" "N" "N" "L" "L" "I" "L" "S" "S" "S" "I" "M" "I" "G" "L" "S"
[17] "S" "I" "G" "T" "G" "I" "G" "F" "G" "I" "L" "G" "G" "K" "L" "L"
[33] "D" "S" "I" "S" "R" "Q" "P" "E" "L" "D" "N" "L" "L" "L" "T" "R"
[49] "T" "F" "L" "M" "T" "G" "L" "L" "D" "A" "I" "P" "M" "I" "S" "V"
[65] "G" "I" "G" "L" "Y" "L" "I" "F" "V" "L" "S" "N" "K"
$`lcl|AP009180.1_prot_BAF35036.1_5`
[1] "M" "N" "F" "N" "Y" "T" "I" "I" "N" "E" "F" "V" "S" "F" "L" "I"
[17] "F" "F" "Y" "V" "S" "F" "K" "I" "I" "F" "P" "V" "I" "L" "K" "K"
[33] "I" "N" "N" "F" "L" "I" "I" "D" "Y" "K" "N" "F" "V" "F" "N" "N"
[49] "Q" "E" "K" "I" "I" "K" "K" "K" "L" "L" "D" "E" "I" "V" "K" "N"
[65] "E" "N" "L" "T" "N" "K" "K" "F" "I" "S" "L" "I" "E" "K" "I" "K"
[81] "K" "S" "I" "L" "L" "E" "K" "Q" "N" "F" "I" "N" "F" "I" "K" "L"
[97] "E" "K" "I" "N" "V" "L" "K" "I" "F" "K" "K" "K" "I" "L" "N" "N"
[113] "N" "M" "L" "I" "I" "K" "N" "F" "L" "I" "E" "I" "K" "K" "L" "F"
[129] "I" "N" "S" "F" "K" "N" "I" "F" "N" "E" "I" "I" "C" "Y" "N" "N"
[145] "E" "F" "I" "I" "N" "Y" "V"
$`lcl|AP009180.1_prot_BAF35037.1_6`
[1] "M" "F" "K" "F" "I" "N" "R" "F" "L" "N" "L" "K" "K" "R" "Y" "F"
[17] "Y" "I" "F" "L" "I" "N" "F" "F" "Y" "F" "F" "N" "K" "C" "N" "F"
[33] "I" "K" "K" "K" "K" "I" "Y" "K" "K" "I" "I" "T" "K" "K" "F" "E"
[49] "N" "Y" "L" "L" "K" "L" "I" "I" "Q" "K" "Y" "A" "K"
$`lcl|AP009180.1_prot_BAF35038.1_7`
[1] "M" "L" "N" "E" "G" "I" "I" "N" "K" "I" "Y" "D" "S" "V" "V" "E"
[17] "V" "L" "G" "L" "K" "N" "A" "K" "Y" "G" "E" "M" "I" "L" "F" "S"
[33] "K" "N" "I" "K" "G" "I" "V" "F" "S" "L" "N" "K" "K" "N" "V" "N"
[49] "I" "I" "I" "L" "N" "N" "Y" "N" "E" "L" "T" "Q" "G" "E" "K" "C"
[65] "Y" "C" "T" "N" "K" "I" "F" "E" "V" "P" "V" "G" "K" "Q" "L" "I"
[81] "G" "R" "I" "I" "N" "S" "R" "G" "E" "T" "L" "D" "L" "L" "P" "E"
[97] "I" "K" "I" "N" "E" "F" "S" "P" "I" "E" "K" "I" "A" "P" "G" "V"
[113] "M" "D" "R" "E" "T" "V" "N" "E" "P" "L" "L" "T" "G" "I" "K" "S"
[129] "I" "D" "S" "M" "I" "P" "I" "G" "K" "G" "Q" "R" "E" "L" "I" "I"
[145] "G" "D" "R" "Q" "T" "G" "K" "T" "T" "I" "C" "I" "D" "T" "I" "I"
[161] "N" "Q" "K" "N" "K" "N" "I" "I" "C" "V" "Y" "V" "C" "I" "G" "Q"
[177] "K" "I" "S" "S" "L" "I" "N" "I" "I" "N" "K" "L" "K" "K" "F" "N"
[193] "C" "L" "E" "Y" "T" "I" "I" "V" "A" "S" "T" "A" "S" "D" "S" "A"
[209] "A" "E" "Q" "Y" "I" "A" "P" "Y" "T" "G" "S" "T" "I" "S" "E" "Y"
[225] "F" "R" "D" "K" "G" "Q" "D" "C" "L" "I" "V" "Y" "D" "D" "L" "T"
[241] "K" "H" "A" "W" "A" "Y" "R" "Q" "I" "S" "L" "L" "L" "R" "R" "P"
[257] "P" "G" "R" "E" "A" "Y" "P" "G" "D" "V" "F" "Y" "L" "H" "S" "R"
[273] "L" "L" "E" "R" "S" "S" "K" "V" "N" "K" "F" "F" "V" "N" "K" "K"
[289] "S" "N" "I" "L" "K" "A" "G" "S" "L" "T" "A" "F" "P" "I" "I" "E"
[305] "T" "L" "E" "G" "D" "V" "T" "S" "F" "I" "P" "T" "N" "V" "I" "S"
[321] "I" "T" "D" "G" "Q" "I" "F" "L" "D" "T" "N" "L" "F" "N" "S" "G"
[337] "I" "R" "P" "S" "I" "N" "V" "G" "L" "S" "V" "S" "R" "V" "G" "G"
[353] "A" "A" "Q" "Y" "K" "I" "I" "K" "K" "L" "S" "G" "D" "I" "R" "I"
[369] "M" "L" "A" "Q" "Y" "R" "E" "L" "E" "A" "F" "S" "K" "F" "S" "S"
[385] "D" "L" "D" "S" "E" "T" "K" "N" "Q" "L" "I" "I" "G" "E" "K" "I"
[401] "T" "I" "L" "M" "K" "Q" "N" "I" "H" "D" "V" "Y" "D" "I" "F" "E"
[417] "L" "I" "L" "I" "L" "L" "I" "I" "K" "H" "D" "F" "F" "R" "L" "I"
[433] "P" "I" "N" "Q" "V" "E" "Y" "F" "E" "N" "K" "I" "I" "N" "Y" "L"
[449] "R" "K" "I" "K" "F" "K" "N" "Q" "I" "E" "I" "D" "N" "K" "N" "L"
[465] "E" "N" "C" "L" "N" "E" "L" "I" "S" "F" "F" "I" "S" "N" "S" "I"
[481] "L"
$`lcl|AP009180.1_prot_BAF35039.1_8`
[1] "M" "I" "I" "K" "E" "I" "N" "S" "K" "I" "K" "I" "T" "T" "N" "I"
[17] "N" "K" "L" "T" "N" "T" "L" "S" "M" "I" "S" "L" "S" "K" "M" "N"
[33] "K" "Y" "I" "N" "L" "I" "N" "N" "L" "D" "Y" "I" "N" "I" "E" "L"
[49] "K" "K" "I" "L" "E" "Y" "I" "I" "I" "N" "I" "K" "S" "N" "V" "F"
[65] "C" "L" "I" "I" "I" "T" "S" "N" "K" "G" "L" "C" "G" "N" "L" "N"
[81] "N" "E" "I" "I" "K" "Y" "S" "L" "N" "Y" "I" "K" "N" "N" "K" "N"
[97] "L" "D" "L" "I" "L" "I" "G" "K" "K" "G" "I" "D" "F" "F" "N" "K"
[113] "K" "N" "F" "Y" "I" "K" "E" "K" "I" "I" "F" "K" "D" "N" "E" "L"
[129] "K" "N" "L" "V" "F" "N" "N" "K" "I" "L" "N" "D" "L" "K" "K" "Y"
[145] "E" "N" "I" "F" "F" "I" "S" "S" "K" "I" "I" "K" "N" "N" "V" "K"
[161] "I" "I" "K" "T" "D" "L" "Y" "L" "K" "K" "K" "Y" "N" "Y" "L" "I"
[177] "K" "H" "N" "F" "N" "Y" "D" "C" "F" "L" "K" "N" "F" "Y" "N" "Y"
[193] "N" "L" "K" "C" "L" "Y" "L" "N" "N" "L" "F" "C" "E" "L" "K" "S"
[209] "R" "M" "I" "T" "M" "K" "S" "A" "A" "D" "N" "S" "K" "K" "I" "I"
[225] "K" "D" "M" "K" "L" "I" "K" "N" "K" "I" "R" "Q" "F" "K" "V" "T"
[241] "Q" "D" "M" "L" "E" "I" "I" "N" "G" "S" "N" "L"
$`lcl|AP009180.1_prot_BAF35040.1_9`
[1] "M" "I" "G" "R" "I" "V" "Q" "I" "L" "G" "S" "I" "V" "D" "V" "E"
[17] "F" "K" "K" "N" "N" "I" "P" "Y" "I" "Y" "N" "A" "L" "F" "I" "K"
[33] "E" "F" "N" "L" "Y" "L" "E" "V" "Q" "Q" "Q" "I" "G" "N" "N" "I"
[49] "V" "R" "T" "I" "A" "L" "G" "S" "T" "Y" "G" "L" "K" "R" "Y" "L"
[65] "L" "V" "I" "D" "T" "K" "K" "P" "I" "L" "T" "P" "V" "G" "N" "C"
[81] "T" "L" "G" "R" "I" "L" "N" "V" "L" "G" "N" "P" "I" "D" "N" "N"
[97] "G" "E" "I" "I" "S" "N" "K" "K" "K" "P" "I" "H" "C" "S" "P" "P"
[113] "K" "F" "S" "D" "Q" "V" "F" "S" "N" "N" "I" "L" "E" "T" "G" "I"
[129] "K" "V" "I" "D" "L" "L" "C" "P" "F" "L" "R" "G" "G" "K" "I" "G"
[145] "L" "F" "G" "G" "A" "G" "V" "G" "K" "T" "I" "N" "M" "M" "E" "L"
[161] "I" "R" "N" "I" "A" "I" "E" "H" "K" "G" "C" "S" "V" "F" "I" "G"
[177] "V" "G" "E" "R" "T" "R" "E" "G" "N" "D" "F" "Y" "Y" "E" "M" "K"
[193] "E" "S" "N" "V" "L" "D" "K" "V" "S" "L" "I" "Y" "G" "Q" "M" "N"
[209] "E" "P" "S" "G" "N" "R" "L" "R" "V" "A" "L" "T" "G" "L" "S" "I"
[225] "A" "E" "E" "F" "R" "E" "M" "G" "K" "D" "V" "L" "L" "F" "I" "D"
[241] "N" "I" "Y" "R" "F" "T" "L" "A" "G" "T" "E" "I" "S" "A" "L" "L"
[257] "G" "R" "M" "P" "S" "A" "V" "G" "Y" "Q" "P" "T" "L" "A" "E" "E"
[273] "M" "G" "K" "L" "Q" "E" "R" "I" "S" "S" "T" "K" "N" "G" "S" "I"
[289] "T" "S" "V" "Q" "A" "I" "Y" "V" "P" "A" "D" "D" "L" "T" "D" "P"
[305] "S" "P" "S" "T" "T" "F" "T" "H" "L" "D" "S" "T" "I" "V" "L" "S"
[321] "R" "Q" "I" "A" "E" "L" "G" "I" "Y" "P" "A" "I" "D" "P" "L" "E"
[337] "S" "Y" "S" "K" "Q" "L" "D" "P" "Y" "I" "V" "G" "I" "E" "H" "Y"
[353] "E" "I" "A" "N" "S" "V" "K" "F" "Y" "L" "Q" "K" "Y" "K" "E" "L"
[369] "K" "D" "T" "I" "A" "I" "L" "G" "M" "D" "E" "L" "S" "E" "N" "D"
[385] "Q" "I" "I" "V" "K" "R" "A" "R" "K" "L" "Q" "R" "F" "F" "S" "Q"
[401] "P" "F" "F" "V" "G" "E" "I" "F" "T" "G" "I" "K" "G" "E" "Y" "V"
[417] "N" "I" "K" "D" "T" "I" "Q" "C" "F" "K" "N" "I" "L" "N" "G" "E"
[433] "F" "D" "N" "I" "N" "E" "K" "N" "F" "Y" "M" "I" "G" "K" "I"
$`lcl|AP009180.1_prot_BAF35041.1_10`
[1] "M" "N" "L" "L" "I" "L" "S" "I" "K" "N" "I" "I" "E" "Y" "K" "N"
[17] "A" "S" "I" "L" "N" "V" "K" "T" "Y" "L" "K" "L" "F" "S" "I" "M"
[33] "N" "N" "H" "I" "N" "N" "I" "C" "D" "V" "N" "Q" "I" "K" "L" "I"
[49] "F" "K" "N" "K" "I" "I" "N" "I" "R" "I" "N" "N" "G" "F" "L" "F"
[65] "Q" "K" "K" "N" "N" "T" "K" "I" "I" "C" "N" "F" "Y" "E" "F" "L"
The result of read.fasta() is a list of character vectors. Each element is a sequence object.
The first sequence is:
[1] "M" "N" "T" "I" "F" "S" "R" "I" "T" "P" "L" "G" "N" "G" "T" "L"
[17] "C" "V" "I" "R" "I" "S" "G" "K" "N" "V" "K" "F" "L" "I" "Q" "K"
[33] "I" "V" "K" "K" "N" "I" "K" "E" "K" "I" "A" "T" "F" "S" "K" "L"
[49] "F" "L" "D" "K" "E" "C" "V" "D" "Y" "A" "M" "I" "I" "F" "F" "K"
[65] "K" "P" "N" "T" "F" "T" "G" "E" "D" "I" "I" "E" "F" "H" "I" "H"
[81] "N" "N" "E" "T" "I" "V" "K" "K" "I" "I" "N" "Y" "L" "L" "L" "N"
[97] "K" "A" "R" "F" "A" "K" "A" "G" "E" "F" "L" "E" "R" "R" "Y" "L"
[113] "N" "G" "K" "I" "S" "L" "I" "E" "C" "E" "L" "I" "N" "N" "K" "I"
[129] "L" "Y" "D" "N" "E" "N" "M" "F" "Q" "L" "T" "K" "N" "S" "E" "K"
[145] "K" "I" "F" "L" "C" "I" "I" "K" "N" "L" "K" "F" "K" "I" "N" "S"
[161] "L" "I" "I" "C" "I" "E" "I" "A" "N" "F" "N" "F" "S" "F" "F" "F"
[177] "F" "N" "D" "F" "L" "F" "I" "K" "Y" "T" "F" "K" "K" "L" "L" "K"
[193] "L" "L" "K" "I" "L" "I" "D" "K" "I" "T" "V" "I" "N" "Y" "L" "K"
[209] "K" "N" "F" "T" "I" "M" "I" "L" "G" "R" "R" "N" "V" "G" "K" "S"
[225] "T" "L" "F" "N" "K" "I" "C" "A" "Q" "Y" "D" "S" "I" "V" "T" "N"
[241] "I" "P" "G" "T" "T" "K" "N" "I" "I" "S" "K" "K" "I" "K" "I" "L"
[257] "S" "K" "K" "I" "K" "M" "M" "D" "T" "A" "G" "L" "K" "I" "R" "T"
[273] "K" "N" "L" "I" "E" "K" "I" "G" "I" "I" "K" "N" "I" "N" "K" "I"
[289] "Y" "Q" "G" "N" "L" "I" "L" "Y" "M" "I" "D" "K" "F" "N" "I" "K"
[305] "N" "I" "F" "F" "N" "I" "P" "I" "D" "F" "I" "D" "K" "I" "K" "L"
[321] "N" "E" "L" "I" "I" "L" "V" "N" "K" "S" "D" "I" "L" "G" "K" "E"
[337] "E" "G" "V" "F" "K" "I" "K" "N" "I" "L" "I" "I" "L" "I" "S" "S"
[353] "K" "N" "G" "T" "F" "I" "K" "N" "L" "K" "C" "F" "I" "N" "K" "I"
[369] "V" "D" "N" "K" "D" "F" "S" "K" "N" "N" "Y" "S" "D" "V" "K" "I"
[385] "L" "F" "N" "K" "F" "S" "F" "F" "Y" "K" "E" "F" "S" "C" "N" "Y"
[401] "D" "L" "V" "L" "S" "K" "L" "I" "D" "F" "Q" "K" "N" "I" "F" "K"
[417] "L" "T" "G" "N" "F" "T" "N" "K" "K" "I" "I" "N" "S" "C" "F" "R"
[433] "N" "F" "C" "I" "G" "K"
Last semester we used two data structures
Now we introduce a new data type: lists
Lists are like vectors, but they can mix different kinds of elements
people <- list(c(60, 72, 57, 90, 95, 72),
               c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
               c("Ali", "Deniz", "Fatma", "Emre",
                 "Volkan", "Onur"),
               TRUE, c(2017, 10, 10),
               factor(c("M","F","F","M","M","M")))
Notice that the elements can have different lengths
[[1]]
[1] 60 72 57 90 95 72
[[2]]
[1] 1.75 1.80 1.65 1.90 1.74 1.91
[[3]]
[1] "Ali" "Deniz" "Fatma" "Emre" "Volkan" "Onur"
[[4]]
[1] TRUE
[[5]]
[1] 2017 10 10
[[6]]
[1] M F F M M M
Levels: F M
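Printing the whole list is not the only way to see that its elements differ in length and kind. As a small sketch using only base R, the code below rebuilds the same people list and asks for the length and the class of each element.

```r
# Same list as above: six elements of different kinds and lengths
people <- list(c(60, 72, 57, 90, 95, 72),
               c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
               c("Ali", "Deniz", "Fatma", "Emre", "Volkan", "Onur"),
               TRUE,
               c(2017, 10, 10),
               factor(c("M", "F", "F", "M", "M", "M")))

length(people)          # number of elements in the list: 6
lengths(people)         # length of each element: 6 6 6 1 3 6
sapply(people, class)   # "numeric" "numeric" "character" "logical" "numeric" "factor"
```

Note the difference between length(), which counts the elements of the list, and lengths(), which reports the size of every element.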
When a list is printed, each element is introduced by its position in double brackets
An element can be a vector, another list, or any other object
When the element is a vector, each output line also starts with an index in single brackets
[[1]]
[1] 60 72 57 90 95 72
[[2]]
[1] 1.75 1.80 1.65 1.90 1.74 1.91
This is a sublist (with one element):
[[1]]
[1] 60 72 57 90 95 72
This is an element:
[1] 60 72 57 90 95 72
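The difference between a sublist and an element comes from the kind of brackets used for indexing. The sketch below assumes a shortened version of the people list (only its first three elements) and compares single and double brackets.

```r
# Shortened version of the people list, for illustration only
people <- list(c(60, 72, 57, 90, 95, 72),
               c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
               c("Ali", "Deniz", "Fatma", "Emre", "Volkan", "Onur"))

people[1]            # single brackets: a sublist with one element
class(people[1])     # "list"

people[[1]]          # double brackets: the element itself
class(people[[1]])   # "numeric"

people[1:2]          # single brackets can select several elements at once
people[[1]][2]       # second value of the first element: 72
```

Single brackets always return a list, even when only one element is selected. Double brackets return whatever is stored in that position, so the result can be indexed again as an ordinary vector.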
read.fasta returns a list of vectors of characters
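To see what this means in practice without needing a FASTA file, the sketch below builds by hand a small list with the same shape. The names prot_1 and prot_2 are hypothetical, and the two sequences are only short fragments, so this is a stand-in for the real result and not the output of read.fasta itself.

```r
# Hypothetical stand-in for the result of read.fasta():
# a named list where every element is a vector of single characters
sequences <- list(
  prot_1 = strsplit("MNTIFSRITPLG", "")[[1]],
  prot_2 = strsplit("MNNLLILSSS", "")[[1]]
)

length(sequences)       # how many sequences are in the list
names(sequences)        # the name of each sequence
sequences[[1]]          # the first sequence, as a character vector
sequences[[1]][1:3]     # its first three letters
lengths(sequences)      # the length of every sequence
table(sequences[[1]])   # how many times each letter appears
```

Because each sequence is an ordinary character vector, the vector tools we already know (indexing, length, table) work on it once it has been extracted with double brackets.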