March 1st, 2016
We recognize patterns, such as
Bottom up: joining one by one
microarray <- read.table("microarray.txt") d <- dist(microarray, method="euclidean") tree <- hclust(d, method = "average")
\[\text{dist}_2(x,y)=\sqrt{(x_1-y_1)^2+\cdots +(x_n-y_n)^2}\]
\[\text{dist}(x, C)=\text{mean} (\text{dist}(x, y): y \in C)\]
plot(tree, labels = FALSE)
Since this is a hierarchical clustering, the number of clusters depends on the distance threshold.
Higher cut value means less clusters
To finally split the samples into the clusters defined by the tree, we use the function cutree
cutree(tree, k = NULL, h = NULL)
hclust
.cluster <- cutree(tree, h = 5) cluster
b4634 b3241 b3240 b3243 b3242 b2836 b0885 b1338 b1337 b1339 1 2 3 4 3 1 5 6 3 3 [ reached getOption("max.print") -- omitted 4287 entries ]
table(cluster)
cluster 1 2 3 4 5 6 7 8 9 10 277 74 386 440 327 486 78 223 22 38 [ reached getOption("max.print") -- omitted 166 entries ]
>gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTT GCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGC TGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGT TACTGTTATCGATCCGGTCGAAAAACTGCTGGCAGTGGGGCATTACCTCGAATCTACCGTCGATATTGCT GAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGATCACATGGTGCTGATGGCAGGTTTCACCG CCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAACGGTTCCGACTACTCTGCTGCGGTGCTGGC TGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTGCGACCCGCGT CAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCG CTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATAC CGGAAATCCTCAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGC ATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCTGGTCCGGGGATGAAAGGGATGGTCGGCATGG CGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCCGTGGTGCTGATTACGCAATCATCTTCCGA ATACAGCATCAGTTTCTGCGTTCCACAAAGCGACTGTGTGCGAGCTGAACGGGCAATGCAGGAAGAGTTC TACCTGGAACTGAAAGAAGGCTTACTGGAGCCGCTGGCAGTGACGGAACGGCTGGCCATTATCTCGGTGG TAGGTGATGGTATGCGCACCTTGCGTGGGATCTCGGCGAAATTCTTTGCCGCACTGGCCCGCGCCAATAT CAACATTGTCGCCATTGCTCAGGGATCTTCTGAACGCTCAATCTCTGTCGTGGTAAATAACGATGATGCG ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTG GCGTCGGTGGCGTTGGCGGTGCGCTGCTGGAGCAACTGAAGCGTCAGCAAAGCTGGCTGAAGAATAAACA TATCGACTTACGTGTCTGCGGTGTTGCCAACTCGAAGGCTCTGCTCACCAATGTACATGGCCTTAATCTG GAAAACTGGCAGGAAGAACTGGCGCAAGCCAAAGAGCCGTTTAATCTCGGGCGCTTAATTCGCCTCGTGA AAGAATATCATCTGCTGAACCCGGTCATTGTTGACTGCACTTCCAGCCAGGCAGTGGCGGATCAATATGC CGACTTCCTGCGCGAAGGTTTCCACGTTGTCACGCCGAACAAAAAGGCCAACACCTCGTCGATGGATTAC TACCATCAGTTGCGTTATGCGGCGGAAAAATCGCGGCGTAAATTCCTCTATGACACCAACGTTGGGGCTG GATTACCGGTTATTGAGAACCTGCAAAATCTGCTCAATGCAGGTGATGAATTGATGAAGTTCTCCGGCAT TCTTTCTGGTTCGCTTTCTTATATCTTCGGCAAGTTAGACGAAGGCATGAGTTTCTCCGAGGCGACCACG CTGGCGCGGGAAATGGGTTATACCGAACCGGACCCGCGAGATGATCTTTCTGGTATGGATGTGGCGCGTA AACTATTGATTCTCGCTCGTGAAACGGGACGTGAACTGGAGCTGGCGGATATTGAAATTGAACCTGTGCT GCCCGCAGAGTTTAACGCCGAGGGTGATGTTGCCGCTTTTATGGCGAATCTGTCACAACTCGACGATCTC TTTGCCGCGCGCGTGGCGAAGGCCCGTGATGAAGGAAAAGTTTTGCGCTATGTTGGCAATATTGATGAAG ATGGCGTCTGCCGCGTGAAGATTGCCGAAGTGGATGGTAATGATCCGCTGTTCAAAGTGAAAAATGGCGA AAACGCCCTGGCCTTCTATAGCCACTATTATCAGCCGCTGCCGTTGGTACTGCGCGGATATGGTGCGGGC AATGACGTTACAGCTGCCGGTGTCTTTGCTGATCTGCTACGTACCCTCTCATGGAAGTTAGGAGTCTGAC ATGGTTAAAGTTTATGCCCCGGCTTCCAGTGCCAATATGAGCGTCGGGTTTGATGTGCTCGGGGCGGCGG TGACACCTGTTGATGGTGCATTGCTCGGAGATGTAGTCACGGTTGAGGCGGCAGAGACATTCAGTCTCAA CAACCTCGGACGCTTTGCCGATAAGCTGCCGTCAGAACCACGGGAAAATATCGTTTATCAGTGCTGGGAG CGTTTTTGCCAGGAACTGGGTAAGCAAATTCCAGTGGCGATGACCCTGGAAAAGAATATGCCGATCGGTT CGGGCTTAGGCTCCAGTGCCTGTTCGGTGGTCGCGGCGCTGATGGCGATGAATGAACACTGCGGCAAGCC GCTTAATGACACTCGTTTGCTGGCTTTGATGGGCGAGCTGGAAGGCCGTATCTCCGGCAGCATTCATTAC GACAACGTGGCACCGTGTTTTCTCGGTGGTATGCAGTTGATGATCGAAGAAAACGACATCATCAGCCAGC AAGTGCCAGGGTTTGATGAGTGGCTGTGGGTGCTGGCGTATCCGGGGATTAAAGTCTCGACGGCAGAAGC CAGGGCTATTTTACCGGCGCAGTATCGCCGCCAGGATTGCATTGCGCACGGGCGACATCTGGCAGGCTTC ATTCACGCCTGCTATTCCCGTCAGCCTGAGCTTGCCGCGAAGCTGATGAAAGATGTTATCGCTGAACCCT ACCGTGAACGGTTACTGCCAGGCTTCCGGCAGGCGCGGCAGGCGGTCGCGGAAATCGGCGCGGTAGCGAG CGGTATCTCCGGCTCCGGCCCGACCTTGTTCGCTCTGTGTGACAAGCCGGAAACCGCCCAGCGCGTTGCC
>NC_000913.3_prot_NP_414542.1_1 [gene=thrL] [protein=thr operon leader peptide] [protein_id=NP_414542.1] [location=190..255] MKRISTTITTTITITTGNGAG >NC_000913.3_prot_NP_414543.1_2 [gene=thrA] [protein=Bifunctional aspartokinase/homoserine dehydrogenase 1] [protein_id=NP_414543.1] [location=337..2799] MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERI FAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHVLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEA RGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYS AAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPC LIKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLIT QSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGISAKFFAAL ARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSW LKNKHIDLRVCGVANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAV ADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELM KFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIE IEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFK VKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLRTLSWKLGV >NC_000913.3_prot_NP_414544.1_3 [gene=thrB] [protein=homoserine kinase] [protein_id=NP_414544.1] [location=2801..3733] MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWE RFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHY DNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGF IHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVA DWLGKNYLQNQEGFVHICRLDTAGARVLEN >NC_000913.3_prot_NP_414545.1_4 [gene=thrC] [protein=L-threonine synthase] [protein_id=NP_414545.1] [location=3734..5020] MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDEMLKLDFVTRSAKILSAFIGDEIPQE ILEERVRAAFAFPAPVANVESDVGCLELFHGPTLAFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAA VAHAFYGLPNVKVVILYPRGKISPLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNS ANSINISRLLAQICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVP RFLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTMRELKELGYTS EPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKELAERADLPLLSHNLPADFAAL RKLMMNHQ >NC_000913.3_prot_NP_414546.1_5 [gene=yaaX] [protein=DUF2502 family putative periplasmic protein] [protein_id=NP_414546.1] [location=5234..5530] MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHL HGPPPPPRHHKKAPHDHHGGHGPGKHHR >NC_000913.3_prot_NP_414547.1_6 [gene=yaaA] [protein=peroxide resistance protein, lowers intracellular iron] [protein_id=NP_414547.1] [location=complement(5683..6459)] MLILISPAKTLDYQSPLTTTRYTLPELLDNSQQLIHEARKLTPPQISTLMRISDKLAGINAARFHDWQPD FTPANARQAILAFKGDVYTGLQAETFSEDDFDFAQQHLRMLSGLYGVLRPLDLMQPYRLEMGIRLENARG
##gff-version 3 #!gff-spec-version 1.20 #!processor NCBI annotwriter ##sequence-region NC_000913.3 1 4641652 ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=511145 NC_000913.3 RefSeq region 1 4641652 . + . ID=id0;Name=ANONYMOUS;Dbxref=taxon:511145;Is_circular=true;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=K-12;substrain=MG1655 NC_000913.3 RefSeq gene 190 255 . + . ID=gene0;Name=thrL;Dbxref=EcoGene:EG11277,GeneID:944742;gbkey=Gene;gene=thrL;gene_synonym=ECK0001,JW4367;locus_tag=b0001 NC_000913.3 RefSeq CDS 190 255 . + 0 ID=cds0;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;gene=thrL;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11 NC_000913.3 RefSeq gene 337 2799 . + . ID=gene1;Name=thrA;Dbxref=EcoGene:EG10998,GeneID:945803;gbkey=Gene;gene=thrA;gene_synonym=ECK0002,Hs,JW0001,thrA1,thrA2,thrD;locus_tag=b0002 NC_000913.3 RefSeq CDS 337 2799 . + 0 ID=cds1;Name=NP_414543.1;Parent=gene1;Note=bifunctional: aspartokinase I %28N-terminal%29%3B homoserine dehydrogenase I %28C-terminal%29;Dbxref=ASAP:ABE-0000008,UniProtKB%2FSwiss-Prot:P00561,Genbank:NP_414543.1,EcoGene:EG10998,GeneID:945803;gbkey=CDS;gene=thrA;product=Bifunctional aspartokinase%2Fhomoserine dehydrogenase 1;protein_id=NP_414543.1;transl_table=11 NC_000913.3 RefSeq gene 2801 3733 . + . ID=gene2;Name=thrB;Dbxref=EcoGene:EG10999,GeneID:947498;gbkey=Gene;gene=thrB;gene_synonym=ECK0003,JW0002;locus_tag=b0003 NC_000913.3 RefSeq CDS 2801 3733 . + 0 ID=cds2;Name=NP_414544.1;Parent=gene2;Dbxref=ASAP:ABE-0000010,UniProtKB%2FSwiss-Prot:P00547,Genbank:NP_414544.1,EcoGene:EG10999,GeneID:947498;gbkey=CDS;gene=thrB;product=homoserine kinase;protein_id=NP_414544.1;transl_table=11 NC_000913.3 RefSeq gene 3734 5020 . + . ID=gene3;Name=thrC;Dbxref=EcoGene:EG11000,GeneID:945198;gbkey=Gene;gene=thrC;gene_synonym=ECK0004,JW0003;locus_tag=b0004 NC_000913.3 RefSeq CDS 3734 5020 . + 0 ID=cds3;Name=NP_414545.1;Parent=gene3;Dbxref=ASAP:ABE-0000012,UniProtKB%2FSwiss-Prot:P00934,Genbank:NP_414545.1,EcoGene:EG11000,GeneID:945198;gbkey=CDS;gene=thrC;product=L-threonine synthase;protein_id=NP_414545.1;transl_table=11 NC_000913.3 RefSeq gene 5234 5530 . + . ID=gene4;Name=yaaX;Dbxref=EcoGene:EG14384,GeneID:944747;gbkey=Gene;gene=yaaX;gene_synonym=ECK0005,JW0004;locus_tag=b0005 NC_000913.3 RefSeq CDS 5234 5530 . + 0 ID=cds4;Name=NP_414546.1;Parent=gene4;Dbxref=ASAP:ABE-0000015,UniProtKB%2FSwiss-Prot:P75616,Genbank:NP_414546.1,EcoGene:EG14384,GeneID:944747;gbkey=CDS;gene=yaaX;product=DUF2502 family putative periplasmic protein;protein_id=NP_414546.1;transl_table=11 NC_000913.3 RefSeq repeat_region 5565 5669 . + . ID=id1;Note=RIP1 %28repetitive extragenic palindromic%29 element%3B contains 2 REP sequences and 1 IHF site;gbkey=repeat_region NC_000913.3 RefSeq gene 5683 6459 . - . ID=gene5;Name=yaaA;Dbxref=EcoGene:EG10011,GeneID:944749;gbkey=Gene;gene=yaaA;gene_synonym=ECK0006,JW0005;locus_tag=b0006 NC_000913.3 RefSeq CDS 5683 6459 . - 0 ID=cds5;Name=NP_414547.1;Parent=gene5;Dbxref=ASAP:ABE-0000018,UniProtKB%2FSwiss-Prot:P0A8I3,Genbank:NP_414547.1,EcoGene:EG10011,GeneID:944749;gbkey=CDS;gene=yaaA;product=peroxide resistance protein%2C lowers intracellular iron;protein_id=NP_414547.1;transl_table=11 NC_000913.3 RefSeq gene 6529 7959 . - . ID=gene6;Name=yaaJ;Dbxref=EcoGene:EG11555,GeneID:944745;gbkey=Gene;gene=yaaJ;gene_synonym=ECK0007,JW0006;locus_tag=b0007 NC_000913.3 RefSeq CDS 6529 7959 . - 0 ID=cds6;Name=NP_414548.1;Parent=gene6;Note=inner membrane transport protein;Dbxref=ASAP:ABE-0000020,UniProtKB%2FSwiss-Prot:P30143,Genbank:NP_414548.1,EcoGene:EG11555,GeneID:944745;gbkey=CDS;gene=yaaJ;product=putative transporter;protein_id=NP_414548.1;transl_table=11 NC_000913.3 RefSeq gene 8238 9191 . + . ID=gene7;Name=talB;Dbxref=EcoGene:EG11556,GeneID:944748;gbkey=Gene;gene=talB;gene_synonym=ECK0008,JW0007,yaaK;locus_tag=b0008 NC_000913.3 RefSeq CDS 8238 9191 . + 0 ID=cds7;Name=NP_414549.1;Parent=gene7;Dbxref=ASAP:ABE-0000027,UniProtKB%2FSwiss-Prot:P0A870,Genbank:NP_414549.1,EcoGene:EG11556,GeneID:944748;gbkey=CDS;gene=talB;product=transaldolase B;protein_id=NP_414549.1;transl_table=11 NC_000913.3 RefSeq gene 9306 9893 . + . ID=gene8;Name=mog;Dbxref=EcoGene:EG11511,GeneID:944760;gbkey=Gene;gene=mog;gene_synonym=bisD,chlG,ECK0009,JW0008,mogA,yaaG;locus_tag=b0009 NC_000913.3 RefSeq CDS 9306 9893 . + 0 ID=cds8;Name=NP_414550.1;Parent=gene8;Note=putative molybdochetalase in molybdopterine biosynthesis;Dbxref=ASAP:ABE-0000030,UniProtKB%2FSwiss-Prot:P0AF03,Genbank:NP_414550.1,EcoGene:EG11511,GeneID:944760;gbkey=CDS;gene=mog;product=molybdochelatase incorporating molybdenum into molybdopterin;protein_id=NP_414550.1;transl_table=11 NC_000913.3 RefSeq gene 9928 10494 . - . ID=gene9;Name=satP;Dbxref=EcoGene:EG11512,GeneID:944792;gbkey=Gene;gene=satP;gene_synonym=ECK0010,JW0009,yaaH;locus_tag=b0010 NC_000913.3 RefSeq CDS 9928 10494 . - 0 ID=cds9;Name=NP_414551.1;Parent=gene9;Dbxref=ASAP:ABE-0000032,UniProtKB%2FSwiss-Prot:P0AC98,Genbank:NP_414551.1,EcoGene:EG11512,GeneID:944792;gbkey=CDS;gene=satP;product=succinate-acetate transporter;protein_id=NP_414551.1;transl_table=11 NC_000913.3 RefSeq gene 10643 11356 . - . ID=gene10;Name=yaaW;Dbxref=EcoGene:EG14340,GeneID:944771;gbkey=Gene;gene=yaaW;gene_synonym=ECK0011,JW0010;locus_tag=b0011 NC_000913.3 RefSeq CDS 10643 11356 . - 0 ID=cds10;Name=NP_414552.1;Parent=gene10;Dbxref=ASAP:ABE-0000037,UniProtKB%2FSwiss-Prot:P75617,Genbank:NP_414552.1,EcoGene:EG14340,GeneID:944771;gbkey=CDS;gene=yaaW;product=UPF0174 family protein;protein_id=NP_414552.1;transl_table=11 NC_000913.3 RefSeq gene 11382 11786 . - . ID=gene11;Name=yaaI;Dbxref=EcoGene:EG11513,GeneID:944751;gbkey=Gene;gene=yaaI;gene_synonym=ECK0013,JW0012;locus_tag=b0013 NC_000913.3 RefSeq CDS 11382 11786 . - 0 ID=cds11;Name=NP_414554.1;Parent=gene11;Dbxref=ASAP:ABE-0000043,UniProtKB%2FSwiss-Prot:P28696,Genbank:NP_414554.1,EcoGene:EG11513,GeneID:944751;gbkey=CDS;gene=yaaI;product=UPF0412 family protein;protein_id=NP_414554.1;transl_table=11 NC_000913.3 RefSeq gene 12163 14079 . + . ID=gene12;Name=dnaK;Dbxref=EcoGene:EG10241,GeneID:944750;gbkey=Gene;gene=dnaK;gene_synonym=ECK0014,groPAB,groPC,groPF,grpC,grpF,JW0013,seg;locus_tag=b0014 NC_000913.3 RefSeq CDS 12163 14079 . + 0 ID=cds12;Name=NP_414555.1;Parent=gene12;Note=chaperone Hsp70%3B DNA biosynthesis%3B autoregulated heat shock proteins;Dbxref=ASAP:ABE-0000052,UniProtKB%2FSwiss-Prot:P0A6Y8,Genbank:NP_414555.1,EcoGene:EG10241,GeneID:944750;gbkey=CDS;gene=dnaK;product=chaperone Hsp70%2C with co-chaperone DnaJ;protein_id=NP_414555.1;transl_table=11 NC_000913.3 RefSeq gene 14168 15298 . + . ID=gene13;Name=dnaJ;Dbxref=EcoGene:EG10240,GeneID:944753;gbkey=Gene;gene=dnaJ;gene_synonym=ECK0015,faa,groP,grpC,JW0014;locus_tag=b0015 NC_000913.3 RefSeq CDS 14168 15298 . + 0 ID=cds13;Name=NP_414556.1;Parent=gene13;Note=chaperone with DnaK%3B heat shock protein;Dbxref=ASAP:ABE-0000054,UniProtKB%2FSwiss-Prot:P08622,Genbank:NP_414556.1,EcoGene:EG10240,GeneID:944753;gbkey=CDS;gene=dnaJ;product=chaperone Hsp40%2C DnaK co-chaperone;protein_id=NP_414556.1;transl_table=11
LOCUS NC_000913 4641652 bp DNA circular CON 15-MAY-2014 DEFINITION Escherichia coli str. K-12 substr. MG1655, complete genome. ACCESSION NC_000913 VERSION NC_000913.3 GI:556503834 DBLINK BioProject: PRJNA57779 BioSample: SAMN02604091 KEYWORDS RefSeq. SOURCE Escherichia coli str. K-12 substr. MG1655 ORGANISM Escherichia coli str. K-12 substr. MG1655 Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia. FEATURES Location/Qualifiers source 1..4641652 /organism="Escherichia coli str. K-12 substr. MG1655" /mol_type="genomic DNA" /strain="K-12" /sub_strain="MG1655" /db_xref="taxon:511145" gene 190..255 /gene="thrL" /locus_tag="b0001" /gene_synonym="ECK0001; JW4367" /db_xref="EcoGene:EG11277" /db_xref="GeneID:944742" CDS 190..255 /gene="thrL" /locus_tag="b0001" /gene_synonym="ECK0001; JW4367" ORIGIN 1 agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc 61 tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg 121 tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac 181 acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt 241 aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg 301 cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt 361 acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc 421 aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg
The percentage of nitrogenous bases on a DNA molecule that are either guanine or cytosine.
The GC-content can be measured by several means,
In alternative manner, if the DNA has been sequenced then the GC-content can be accurately calculated by simple arithmetic.
GC-content percentage is calculated as \[\frac{G+C}{A+T+G+C}\]
Determine the GC content of E.coli
Is this GC content uniform through all genome?
Discovered by Austrian chemist Erwin Chargaff in 1952.
DNA from any cell of all organisms has a 1:1 ratio of pyrimidine and purine bases
The amount of guanine is equal to cytosine and the amount of adenine is equal to thymine.
Elson D, Chargaff E (1952). On the deoxyribonucleic acid content of sea urchin gametes. Experientia 8 (4): 143–145
Chargaff E, Lipshitz R, Green C (1952). Composition of the deoxypentose nucleic acids of four genera of sea-urchin. J Biol Chem 195 (1): 155–160
A double-stranded DNA molecule globally has percentage base pair equality: \(\%A = \%T\) and \(\%G = \%C\)
Both \(\%A \approx \%T\) and \(\%G \approx \%C\) are valid for each of the two DNA strands
Why the first rule is always valid?
How do you determine if \(\%G \approx \%C\)?
Is this valid through all the genome?
What is the difference \(G-C\) respect to the total \(G+C\)?
Calculate the ratio \[\frac{G-C}{G+C}\] using sliding windows and plot it
What do you see?
In each DNA strand the frequency of occurrence of G is equal to C because the substitution rate is presumably equal
Hence, the second parity rule only exists when there is no mutation or substitution
There is a richness of G over C and T over A in the leading strand
GC skew changes sign at the boundaries of the two replichores
This corresponds to DNA replication origin or terminus
Roten C-AH, Gamba P, Barblan J-L, Karamata D. Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes. Nucleic Acids Research. 2002;30(1):142-144
Chair image by Alex Rio Brazil - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=8045709
dogs By YellowLabradorLooking_new.jpg:
derivative work: Djmirko (talk)YellowLabradorLooking.jpg: User:HabjGolden_Retriever_Sammy.jpg:
Pharaoh HoundCockerpoo.jpg: ALMMLonghaired_yorkie.jpg:
Ed Garcia from United StatesBoxer_female_brown.jpg:
Flickr user boxercabMilù_050.JPG:
AleRBeagle1.jpg: TobycatBasset_Hound_600.jpg:
ToBNewfoundland_dog_Smoky.jpg:
Flickr user DanDee Shotsderivative work: December21st2012Freak (talk) - YellowLabradorLooking_new.jpg
Golden_Retriever_Sammy.jpg
Cockerpoo.jpg Longhaired_yorkie.jpg
Boxer_female_brown.jpg
Milù_050.JPGBeagle1.jpgBasset_Hound_600.jpg
Newfoundland_dog_Smoky.jpg
, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10793219
Allegory of the cave By Veldkamp, Gabriele and Maurer, Markus - Veldkamp, Gabriele. Zukunftsorientierte Gestaltung informationstechnologischer Netzwerke im Hinblick auf die Handlungsfähigkeit des Menschen. Aachener Reihe Mensch und Technik, Band 15, Verlag der Augustinus Buchhandlung, Aachen 1996, Germany, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24826744