It depends on the technology and the algorithm
Let’s ignore the technology
Small inputs take a short time
Larger inputs take a longer time
How many comparisons/multiplications are needed?
It depends on the query and subject sizes, so \[\text{Cost}=f(m, n)\] where \(m\) and \(n\) are the query and subject lengths
So, what is the formula?
We care about how the time depends on the input size
We do not care about constant factors
For example, \(100m^2 n^4\) is treated as equivalent to \(m^2 n^4\)
We say “the cost is on the order of \(m^2 n^4\)”
We write \[O(m^2 n^4)\]
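A quick numerical check of why the constant factor is dropped, as a minimal Python sketch. The function names are made up for this illustration. Doubling \(m\) multiplies both costs by the same amount, so both grow in the same way.

```python
def cost_with_constant(m, n):
    """Hypothetical cost 100 * m^2 * n^4."""
    return 100 * m**2 * n**4


def cost_without_constant(m, n):
    """The same cost without the fixed factor."""
    return m**2 * n**4


def growth_when_m_doubles(cost, m, n):
    """How many times larger the cost becomes when m doubles."""
    return cost(2 * m, n) / cost(m, n)


ratio_a = growth_when_m_doubles(cost_with_constant, 10, 5)
ratio_b = growth_when_m_doubles(cost_without_constant, 10, 5)
print(ratio_a, ratio_b)  # both ratios are 4.0
```

The factor 100 cancels in the ratio, which is the sense in which the two expressions are "of the same order".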
The size of the dot plot matrix is \(m\cdot n\)
Building the dot plot matrix takes time proportional to \(m\cdot n\)
Then we have to find “the best diagonals”
Thus, the computational cost is \(O(m\cdot n)\)
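The count of \(m\cdot n\) comparisons can be seen in a minimal Python sketch. It only builds the matrix and does not look for the best diagonals. The function name `dot_plot` and the two example sequences are assumptions chosen for illustration.

```python
def dot_plot(query, subject):
    """Build the dot plot matrix and count the letter comparisons."""
    comparisons = 0
    matrix = []
    for q in query:              # m rows, one per query letter
        row = []
        for s in subject:        # n columns, one per subject letter
            row.append(q == s)   # True where the two letters match
            comparisons += 1
        matrix.append(row)
    return matrix, comparisons


matrix, comparisons = dot_plot("GATTACA", "GCATGCT")
print(len(matrix), len(matrix[0]), comparisons)  # 7 7 49
```

Every cell of the matrix needs exactly one comparison, so the count is always the product of the two lengths.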
Search for “and”, “it”, “mlo”, and “symbols”. Without gaps, full words.
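One possible way to do this search by program, as a minimal Python sketch. It assumes the text is available as a list of lines. The function name `find_word_lines` and the short sample are made up for illustration. The word boundary `\b` is what enforces "full words": “and” must not match inside “hand” or “command”.

```python
import re


def find_word_lines(lines, word):
    """Return the 1-based numbers of the lines containing `word` as a full word."""
    pattern = re.compile(r"\b" + re.escape(word.lower()) + r"\b")
    return [number
            for number, line in enumerate(lines, start=1)
            if pattern.search(line.lower())]


# Small made-up sample, not the real text
sample = [
    "The Library of Babel",
    "letters and symbols, written by hand",
    "Axaxaxas mlo is a title; it is not a command",
    "the handiwork of a god",
]

for word in ["and", "it", "mlo", "symbols"]:
    print(word, find_word_lines(sample, word))
```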
# The Library of Babel
by Jorge Luis Borges
The universe (which others call the Library) is composed of an indefinite,
perhaps infinite number of hexagonal galleries. In the center of each gallery
is a ventilation shaft, bounded by a low railing. From any hexagon one can see
the floors above and below-one after another, endlessly. The arrangement of
the galleries is always the same: Twenty bookshelves, five to each side, line
four of the hexagon's six sides; the height of the bookshelves, floor to
ceiling, is hardly greater than the height of a normal librarian. One of the
hexagon's free sides opens onto a narrow sort of vestibule, which in turn opens
onto another gallery, identical to the first-identical in fact to all. To the
left and right of the vestibule are two tiny compartments. One is for
sleeping, upright; the other, for satisfying one's physical necessities.
Through this space, too, there passes a spiral staircase, which winds upward
and downward into the remotest distance. In the vestibule there is a mirror,
which faithfully duplicates appearances. Men often infer from this mirror that
the Library is not infinite-if it were, what need would there be for that
illusory replication? I prefer to dream that burnished surfaces are a
figuration and promise of the infinite... . Light is provided by certain
spherical fruits that bear the name "bulbs." There are two of these bulbs in
each hexagon, set crosswise. The light they give is insufficient, and unceasing.
Like all the men of the Library, in my younger days I traveled; I have
journeyed in quest of a book, perhaps the catalog of catalogs. Now that my
eyes can hardly make out what I myself have written, I am preparing to die, a
few leagues from the hexagon where I was born. When I am dead, compassionate
hands will throw me over the railing; my tomb will be the unfathomable air,
my body will sink for ages, and will decay and dissolve in the wind engendered
by my fall, which shall be infinite. I declare that the Library is endless.
Idealists argue that the hexagonal rooms are the necessary shape of absolute
space, or at least of our perception of space. They argue that a triangular or
pentagonal chamber is inconceivable. (Mystics claim that their ecstasies
reveal to them a circular chamber containing an enormous circular book with a
continuous spine that goes completely around the walls. But their testimony is
suspect, their words obscure. That cyclical book is God.) Let it suffice for
the moment that I repeat the classic dictum: _The Library is a sphere whose
exact center is any hexagon and whose circumference is unattainable._
Each wall of each hexagon is furnished with five bookshelves; each bookshelf
holds thirty-two books identical in format; each book contains four hundred ten
pages; each page, forty lines; each line, approximately eighty black letters.
There are also letters on the front cover of each book; those letters neither
indicate nor prefigure what the pages inside will say. I am aware that that
lack of correspondence once struck men as mysterious. Before summarizing the
solution of the mystery (whose discovery, in spite of its tragic consequences,
is perhaps the most important event in all history), I wish to recall a few axioms.
First: The Library has existed _ab eternitate_. That truth, whose immediate
corollary is the future eternity of the world, no rational mind can doubt.
Man, the imperfect librarian, may be the work of chance or of malevolent
demiurges; the universe, with its elegant appointments-its bookshelves, its
enigmatic books, its indefatigable staircases for the traveler, and its water
closets for the seated librarian-can only be the handiwork of a god. In order
to grasp the distance that separates the human and the divine, one has only to
compare these crude trembling symbols which my fallible hand scrawls on the
cover of a book with the organic letters inside---neat, delicate, deep black,
and inimitably symmetrical.
Second: There are twenty-five orthographic symbols[^1]. That discovery enabled
mankind, three hundred years ago, to formulate a general theory of the Library
and thereby satisfactorily solve the riddle that no conjecture had been able to
divine-the formless and chaotic nature of virtually all books. One book, which
my father once saw in a hexagon in circuit 15-94, consisted of the letters M C
V perversely repeated from the first line to the last. Another (much
consulted in this zone) is a mere labyrinth of letters whose penultimate page
contains the phrase 0 Time thy pyramids. This much is known: For every
rational line or forthright statement there are leagues of senseless cacophony,
verbal nonsense, and incoherency. (I know of one semibarbarous zone whose
librarians repudiate the "vain and superstitious habit" of trying to find sense
in books, equating such a quest with attempting to find meaning in dreams or
in the chaotic lines of the palm of one's hand.... They will acknowledge that
the inventors of writing imitated the twenty-five natural symbols, but contend
that that adoption was fortuitous, coincidental, and that books in themselves
have no meaning. That argument, as we shall see, is not entirely fallacious.)
For many years it was believed that those impenetrable books were in ancient or
far-distant languages. It is true that the most ancient peoples, the first
librarians, employed a language quite different from the one we speak today; it
is true that a few miles to the right, our language devolves into dialect and
that ninety floors above, it becomes incomprehensible. All of that, I repeat,
is true-but four hundred ten pages of unvarying M C V 's cannot belong to any
language, however dialectal or primitive it may be. Some have suggested that
each letter influences the next, and that the value of M C V on page 71, line
3, is not the value of the same series on another line of another page, but
that vague thesis has not met with any great acceptance. Others have mentioned
the possibility of codes; that conjecture has been universally accepted, though
not in the sense in which its originators formulated it.
Some five hundred years ago, the chief of one of the upper hexagons[^2] came
across a book as jumbled as all the others, but containing almost two pages of
homogeneous lines. He showed his find to a traveling decipher, who told him
that the lines were written in Portuguese; others said it was Yiddish. Within
the century experts had determined what the language actually was: a
Samoyed-Lithuanian dialect of Guarani, with inflections from classical Arabic.
The content was also determined: the rudiments of combinatorial analysis,
illustrated with examples of endlessly repeating variations. Those examples
allowed a librarian of genius to discover the fundamental law of the Library.
This philosopher observed that all books, however different from one another
they might be, consist of identical elements: the space, the period, the comma,
and the twenty-two letters of the alphabet. He also posited a fact which all
travelers have since confirmed: In all the Library, there are no two identical
books. From those incontrovertible premises, the librarian deduced that the
Library is "total"---perfect, complete, and whole---and that its bookshelves
contain all possible combinations of the twenty-two orthographic symbols (a
number which, though unimaginably vast, is not infinite)---that is, all that is
able to be expressed, in every language. _All_---the detailed history of the
future, the autobiographies of the archangels, the faithful catalog of the
Library, thousands and thousands of false catalogs, the proof of the falsity of
those false catalogs, a proof of the falsity of the true catalog, the gnostic
gospel of Basilides, the commentary upon that gospel, the commentary on the
commentary on that gospel, the true story of your death, the translation of
every book into every language, the interpolations of every book into all
books, the treatise Bede could have written (but did not) on the mythology of
the Saxon people, the lost books of Tacitus.
When it was announced that the Library contained all books, the first reaction
was unbounded joy. All men felt themselves the possessors of an intact and
secret treasure. There was no personal problem, no world problem, whose
eloquent solution did not exist-somewhere in some hexagon. The universe was
justified; the universe suddenly became congruent with the unlimited width and
breadth of humankind's hope. At that period there was much talk of The
Vindications---books of apologie and prophecies that would vindicate for all time
the actions of every person in the universe and that held wondrous arcana for
men's futures. Thousands of greedy individuals abandoned their sweet native
hexagons and rushed downstairs, upstairs, spurred by the vain desire to find
their Vindication. These pilgrims squabbled in the narrow corridors, muttered
dark imprecations, strangled one another on the divine staircases, threw
deceiving volumes down ventilation shafts, were themselves hurled to their
deaths by men of distant regions. Others went insane... . The Vindications do
exist (I have seen two of them, which refer to persons in the future, persons
perhaps not imaginary), but those who went in quest of them failed to recall
that the chance of a man's finding his own Vindication, or some perfidious
version of his own, can be calculated to be zero.
At that same period there was also hope that the fundamental mysteries of
mankind-the origin of the Library and of time-might be revealed. In all
likelihood those profound mysteries can indeed be explained in words; if the
language of the philosophers is not sufficient, then the multiform Library must
surely have produced the extraordinary language that is required, together
with the words and grammar of that language. For four centuries, men have been
scouring the hexagons.... There are official searchers, the "inquisitors." I
have seen them about their tasks: they arrive exhausted at some hexagon, they
talk about a staircase that nearly killed them-rungs were missing-they speak
with the librarian about galleries and staircases, and, once in a while, they
take up the nearest book and leaf through it, searching for disgraceful or
dishonorable words. Clearly, no one expects to discover anything.
That unbridled hopefulness was succeeded, naturally enough, by a similarly
disproportionate depression. The certainty that some bookshelf in some hexagon
contained precious books, yet that those precious books were forever out of
reach, was almost unbearable. One blasphemous sect proposed that the searches
be discontinued and that all men shuffle letters and symbols until those
canonical books, through some improbable stroke of chance, had been
constructed. The authorities were forced to issue strict orders. The sect
disappeared, but in my childhood I have seen old men who for long periods would
hide in the latrines with metal disks and a forbidden dice cup, feebly
mimicking the divine disorder.
Others, going about it in the opposite way, thought the first thing to do was
eliminate all worthless books. They would invade the hexagons, show
credentials that were not always false, leaf disgustedly through a volume, and
condemn entire walls of books. It is to their hygienic, ascetic rage that we
lay the senseless loss of millions of volumes. Their name is execrated today,
but those who grieve over the "treasures" destroyed in that frenzy overlook two
widely acknowledged facts: One, that the Library is so huge that any
reduction by human hands must be infinitesimal. And two, that each book is
unique and irreplaceable, but (since the Library is total) there are always
several hundred thousand imperfect facsimiles-books that differ by no more than
a single letter, or a comma. Despite general opinion, I daresay that the
consequences of the depredations committed by the Purifiers have been
exaggerated by the horror those same fanatics inspired. They were spurred on
by the holy zeal to reach-someday, through unrelenting effort-the books of the
Crimson Hexagon-books smaller than natural books, books omnipotent,
illustrated, and magical.
We also have knowledge of another superstition from that period: belief in
what was termed the Book-Man. On some shelf in some hexagon, it was argued,
there must exist a book that is the cipher and perfect compendium of all other
books, and some librarian must have examined that book; this librarian is
analogous to a god. In the language of this zone there are still vestiges of
the sect that worshiped that distant librarian. Many have gone in search of
Him. For a hundred years, men beat every possible path and every path in
vain. How was one to locate the idolized secret hexagon that sheltered Him?
Someone proposed searching by regression: To locate book A, first consult book
B, which tells where book A can be found; to locate book B, first consult
book C, and so on, to infinity.... It is in ventures such as these that I have
squandered and spent my years. I cannot think it unlikely that there is such a
total book[^3] on some shelf in the universe. I pray to the unknown gods that
some man-even a single man, tens of centuries ago-has perused and read that
book. If the honor and wisdom and joy of such a reading are not to be my own,
then let them be for others. Let heaven exist, though my own place be in hell.
Let me be tortured and battered and annihilated, but let there be one instant,
one creature, wherein thy enormous Library may find its justification.
Infidels claim that the rule in the Library is not "sense," but "non-sense,"
and that "rationality" (even humble, pure coherence) is an almost miraculous
exception. They speak, I know, of "the feverish Library, whose random volumes
constantly threaten to transmogrify into others, so that they affirm all
things, deny all things, and confound and confuse all things, like some mad and
hallucinating deity." Those words, which not only proclaim disorder but
exemplify it as well, prove, as all can see, the infidels' deplorable taste and
desperate ignorance. For while the Library contains all verbal structures, all
the variations allowed by the twenty-five orthographic symbols, it includes not
a single absolute piece of nonsense. It would be pointless to observe that the
finest volume of all the many hexagons that I myself administer is titled
Combed Thunder, while another is titled The Plaster Cramp, and another,
Axaxaxas mlo. Those phrases, at first apparently incoherent, are undoubtedly
susceptible to cryptographic or allegorical "reading"; that reading, that
justification of the words' order and existence, is itself verbal and, ex
hypothesi, already contained somewhere in the Library. There is no combination
of characters one can make---dhcmrlchtdj, for example---that the divine Library
has not foreseen and that in one or more of its secret tongues does not hide a
terrible significance. There is no syllable one can speak that is not filled
with tenderness and terror, that is not, in one of those languages, the mighty
name of a god. To speak is to commit tautologies. This pointless, verbose
epistle already exists in one of the thirty volumes of the five bookshelves in
one of the countless hexagons---as does its refutation. (A number n of the
possible languages employ the same vocabulary; in some of them, the symbol
"library" possesses the correct definition "everlasting, ubiquitous system of
hexagonal galleries," while a library---the thing---is a loaf of bread or a
pyramid or something else, and the six words that define it themselves have
other definitions. You who read me---are you certain you understand my language?)
Methodical composition distracts me from the present condition of humanity.
The certainty that everything has already been written annuls us, or renders us
phantasmal. I know districts in which the young people prostrate themselves
before books and like savages kiss their pages, though they cannot read a
letter. Epidemics, heretical discords, pilgrimages that inevitably degenerate
into brigandage have decimated the population. I believe I mentioned the
suicides, which are more and more frequent every year. I am perhaps misled by
old age and fear, but I suspect that the human species---the only
species---teeters at the verge of extinction, yet that the Library enlightened,
solitary, infinite, perfectly unmoving, armed with precious volumes, pointless,
incorruptible, and secret---will endure.
I have just written the word "infinite." I have not included that adjective out
of mere rhetorical habit; I hereby state that it is not illogical to think that
the world is infinite. Those who believe it to have limits hypothesize that in
some remote place or places the corridors and staircases and hexagons may,
inconceivably, end-which is absurd. And yet those who picture the world as
unlimited forget that the number of possible books is not. I will be bold
enough to suggest this solution to the ancient problem: The Library is
unlimited but periodic. If an eternal traveler should journey in any
direction, he would find after untold centuries that the same volumes are
repeated in the same disorder-which, repeated, becomes order: the Order. My
solitude is cheered by that elegant hope.[^4]
[^1]: The original manuscript has neither numbers nor capital letters;
punctuation is limited to the comma and the period. Those two marks, the
space, and the twenty-two letters of the alphabet are the twenty-five
sufficient symbols that our unknown author is referring to. [Ed. note.]
[^2]: In earlier times, there was one man for every three hexagons. Suicide
and diseases of the lung have played havoc with that proportion. An
unspeakably melancholy memory: I have sometimes traveled for nights on end,
down corridors and polished staircases, without coming across a single librarian.
[^3]: I repeat: In order for a book to exist, it is sufficient that it be
possible. Only the impossible is excluded. For example, no book is also a
staircase, though there are no doubt books that discuss and deny and prove that
possibility, and others whose structure corresponds to that of a staircase.
[^4]: Letizia Alvarez de Toledo has observed that the vast Library is
pointless; strictly speaking, all that is required is a single volume, of the
common size, printed in nine- or ten-point type, that would consist of an
infinite number of infinitely thin pages. (In the early seventeenth century,
Cavalieri stated that every solid body is the superposition of an infinite
number of planes.) Using that silken vademecum would not be easy: each apparent
page would open into other similar pages; the inconceivable middle page would
have no "back."
<!-- -->
Let’s say that
Then each word comparison takes \(O(mn)\)
and searching the whole database takes \(O(mnd)\)
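The bound can be checked by counting letter comparisons in a naive search. This is a minimal Python sketch under an assumption: \(m\) is taken to be the length of the word, \(n\) the length of one line, and \(d\) the number of lines in the database. The function names are hypothetical, and the search is a plain substring search, not a full-word search.

```python
def count_comparisons(word, line):
    """Naive search: try every start position, compare letter by letter.

    Returns (found, comparisons); comparisons never exceeds len(word) * len(line).
    """
    m, n = len(word), len(line)
    comparisons = 0
    found = False
    for start in range(n - m + 1):
        for k in range(m):
            comparisons += 1
            if line[start + k] != word[k]:
                break            # mismatch: move to the next start position
        else:
            found = True         # all m letters matched at this position
    return found, comparisons


def search_database(word, lines):
    """Search every line; return the matching line numbers and the total cost."""
    hits, total = [], 0
    for number, line in enumerate(lines, start=1):
        found, comparisons = count_comparisons(word, line)
        total += comparisons
        if found:
            hits.append(number)
    return hits, total


lines = ["aaaaaaab", "aaaaaaaa", "baaaaaaa"]   # d = 3 lines of length n = 8
hits, total = search_database("aab", lines)     # word of length m = 3
bound = len("aab") * 8 * len(lines)             # m * n * d
print(hits, total, bound)
```

The total number of comparisons stays below \(m\cdot n\cdot d\), which is what the \(O(mnd)\) notation promises.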
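Let’s say that \(m\) is the query length, \(n\) is the subject length, and \(d\) is the number of subjects in the database
Then each query-subject comparison takes \(O(mn)\)
and searching the whole database takes \(O(mnd)\)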
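The word list that follows pairs each word with the numbers of the lines where it appears. An index of that kind can be prepared once, so that later lookups do not scan the whole text again. A minimal Python sketch, assuming the text is a list of lines and that words are runs of lowercase letters or digits. The tokenization rule and the name `build_index` are assumptions, not necessarily what produced the list below.

```python
import re
from collections import defaultdict


def build_index(lines):
    """Map each lowercase word to the line numbers where it appears."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Assumed tokenization: runs of letters or digits, lowercased.
        # set() keeps a line number from being listed twice for one word.
        for word in set(re.findall(r"[a-z0-9]+", line.lower())):
            index[word].append(number)
    return dict(index)


# Small made-up sample, not the real text
sample = [
    "The Library of Babel",
    "by Jorge Luis Borges",
    "The universe (which others call the Library) is composed of an indefinite,",
]

index = build_index(sample)
for word in sorted(index):
    print(word, *index[word])
```

Once the index exists, finding a word is a single dictionary lookup such as `index.get("library", [])`, whatever the size of the text.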
0 69
1 62 253
12 278
15 66
2 92 258
2015 278
226 278
3 87 191 264
4 250 269
71 86
94 66
a 6 10 11 15 16 19 26 27 33 35 38 48 56 59 63 66 68 73 81 82 93 94 96 100 103 107 112 135 146 147 151 159 164 172 181 183 185 187 188 190 192 193 207 215 218 220 223 231 261 264 265 267 270
ab 51
abandoned 127
able 64 109
about 145 146 147 162
above 7 83
absolute 32 207
absurd 244
acceptance 88
accepted 89
acknowledge 74
acknowledged 168
across 93 261
actions 126
actually 96
adjective 240
administer 208
adoption 76
affirm 201
after 7 248
age 235
ages 30
ago 63 92 192
air 29
all 12 25 48 65 83 93 101 103 104 107 108 109 115 119 120 125 139 155 163 181 201 202 204 205 208 270
allegorical 211
allowed 100 206
almost 93 154 199
alphabet 103 255
already 213 219 229
also 44 98 103 138 179 265
alvarez 269
always 8 164 170
am 27 28 45 234
an 4 35 120 199 247 259 271 273
analogous 183
analysis 98
ancient 79 80 246
and 7 13 16 20 22 30 39 55 57 60 64 65 71 72 76 82 86 103 106 111 120 123 125 126 128 139 143 147 148 155 159 164 169 170 177 181 182 185 189 190 192 193 195 199 202 204 209 212 215 217 224 231 234 235 238 243 244 254 255 259 261 266 267
annihilated 195
announced 119
annuls 229
another 7 12 67 87 101 130 179 209
any 6 39 84 88 168 247
anything 149
apologie 125
apparent 274
apparently 210
appearances 17
appointments 54
approximately 43
arabic 97
arcana 126
archangels 110
are 13 19 21 32 44 62 70 104 144 170 183 193 210 225 234 248 255 266
argue 32 33
argued 180
argument 77
armed 237
around 36
arrangement 7
arrive 145
as 46 77 93 189 204 220 244
ascetic 165
at 33 124 138 145 210 236
attempting 73
author 256
authorities 157
autobiographies 110
aware 45
axaxaxas 210
axioms 49
b 188
babel 1 278
back 276
basilides 113
battered 195
be 18 29 31 53 56 85 102 109 136 139 140 155 169 188 193 194 195 207 245 264 274
bear 21
beat 185
became 123
becomes 83 249
bede 116
been 64 89 143 156 173 229
before 46 231
belief 179
believe 233 242
believed 79
belong 84
below 7
black 43 59
blasphemous 154
body 30 273
bold 245
book 26 35 37 42 44 59 65 93 115 148 169 180 181 182 187 188 189 191 193 264 265
books 42 55 65 73 76 79 101 105 116 117 119 125 153 156 163 165 171 175 176 182 231 245 266
bookshelf 41 152
bookshelves 8 9 41 54 106 219
borges 2 278
born 28
bounded 6
bread 223
breadth 124
brigandage 233
bulbs 21
burnished 19
but 36 75 84 87 93 116 134 158 167 170 195 198 203 235 247
by 2 6 20 31 128 132 151 169 171 173 174 175 187 206 234 250
c 66 84 86 189
cacophony 70
calculated 136
call 4
came 92
can 6 27 52 56 136 140 188 204 214 216
cannot 84 190 231
canonical 156
capital 253
catalog 26 110 112
catalogs 26 111 112
cavalieri 273
ceiling 10
center 5 39
centuries 143 192 248
century 96 272
certain 20 225
certainty 152 229
chamber 34 35
chance 53 135 156
chaotic 65 74
characters 214
cheered 250
chief 92
childhood 158
cipher 181
circuit 66
circular 35
circumference 39
claim 34 198
classic 38
classical 97
clearly 149
closets 56
codes 89
coherence 199
coincidental 76
combed 209
combination 213
combinations 107
combinatorial 98
coming 261
comma 102 172 254
commentary 113 114
commit 218
committed 173
common 271
compare 58
compartments 13
compassionate 28
compendium 181
complete 106
completely 36
composed 4
composition 228
condemn 165
condition 228
confirmed 104
confound 202
confuse 202
congruent 123
conjecture 64 89
consequences 47 173
consist 102 271
consisted 66
constantly 201
constructed 157
consult 187 188
consulted 68
contain 107
contained 119 153 213
containing 35 93
contains 42 69 205
contend 75
content 98 278
continuous 36
corollary 52
correct 222
correspondence 46
corresponds 267
corridors 129 243 261
could 116
countless 220
cover 44 59
cramp 209
creature 196
credentials 164
crimson 176
crosswise 22
crude 58
cryptographic 211
cup 159
cyclical 37
daresay 172
dark 130
days 25
de 269
dead 28
death 114
deaths 132
decay 30
deceiving 131
decimated 233
decipher 94
declare 31
deduced 105
deep 59
define 224
definition 222
definitions 225
degenerate 232
deity 203
delicate 59
demiurges 54
deny 202 266
deplorable 204
depredations 173
depression 152
desire 128
desperate 205
despite 172
destroyed 167
detailed 109
determined 96 98
devolves 82
dhcmrlchtdj 214
dialect 82 97
dialectal 85
dice 159
dictum 38
did 116 122
die 27
differ 171
different 81 101
direction 248
disappeared 158
discontinued 155
discords 232
discover 100 149
discovery 47 62
discuss 266
diseases 259
disgraceful 148
disgustedly 164
dishonorable 149
disks 159
disorder 160 203 249
disproportionate 152
dissolve 30
distance 16 57
distant 80 132 184
distracts 228
districts 230
divine 57 65 130 160 214
do 132 162
does 215 220
doubt 52 266
down 131 261
downstairs 128
downward 16
dream 19
dreams 73
duplicates 17
each 5 8 22 41 42 43 44 86 169 274
earlier 258
early 272
easy 274
ecstasies 34
ed 256
edu 278
effort 175
eighty 43
elegant 54 250
elements 102
eliminate 163
eloquent 122
else 224
employ 221
employed 81
enabled 62
end 244 260
endless 31
endlessly 7 99
endure 238
engendered 30
enigmatic 55
enlightened 236
enormous 35 196
enough 151 246
entire 165
entirely 77
epidemics 232
epistle 219
equating 73
eternal 247
eternitate 51
eternity 52
even 192 199
event 48
evergreen 278
everlasting 222
every 69 109 115 126 185 234 258 273
everything 229
ex 212
exact 39
exaggerated 174
examined 182
example 214 265
examples 99
exception 200
excluded 265
execrated 166
exemplify 204
exhausted 145
exist 122 133 181 194 264
existed 51
existence 212
exists 219
expects 149
experts 96
explained 140
expressed 109
extinction 236
extraordinary 142
eyes 27
facsimiles 171
fact 12 103
facts 168
failed 134
faithful 110
faithfully 17
fall 31
fallacious 77
fallible 58
false 111 112 164
falsity 111 112
fanatics 174
far 80
father 66
fear 235
feebly 159
felt 120
feverish 200
few 28 48 82
figuration 20
filled 216
find 72 73 94 128 196 248
finding 135
finest 208
first 12 51 67 80 119 162 187 188 210
five 8 41 62 75 92 206 219 255
floor 9
floors 7 83
for 13 14 18 30 37 55 56 69 79 125 126 143 148 158 185 194 205 214 258 260 264 265
forbidden 159
forced 157
foreseen 215
forever 153
forget 245
format 42
formless 65
formulate 63
formulated 90
forthright 70
fortuitous 76
forty 43
found 188
four 9 42 84 143
free 11
frenzy 167
frequent 234
from 6 17 28 67 81 97 101 105 179 228
front 44
fruits 21
fundamental 100 138
furnished 41
future 52 110 133
futures 127
galleries 5 8 147 223
gallery 5 12
general 63 172
genius 100
give 22
gnostic 112
god 37 56 183 218
gods 191
goes 36
going 162
gone 184
gospel 113 114
grammar 143
grasp 57
great 88
greater 10
greedy 127
grieve 167
guarani 97
habit 72 241
had 64 96 156
hallucinating 203
hand 58 74
handiwork 56
hands 29 169
hardly 10 27
has 51 57 88 89 192 215 229 253 269
have 25 27 77 85 88 104 116 133 142 143 145 158 173 179 182 184 189 224 233 240 242 259 260 276
havoc 259
he 94 103 248
heaven 194
height 9 10
held 126
hell 194
hereby 241
heretical 232
hexagon 6 9 11 22 28 39 41 66 122 145 152 176 180 186
hexagonal 5 32 223
hexagons 92 128 144 163 208 220 243 258
hide 159 215
him 94 185
him? 186
his 94 135 136
history 48 109
holds 42
holy 175
homogeneous 94
honor 193
hope 124 138 250
hopefulness 151
horror 174
how 186
however 85 101
https 278
huge 168
human 57 169 235
humanity 228
humankind 124
humble 199
hundred 42 63 84 92 171 185
hurled 131
hygienic 165
hypothesi 213
hypothesize 242
i 19 25 27 28 31 38 45 48 71 83 133 144 158 172 189 190 191 200 208 230 233 234 235 240 241 245 260 264
idealists 32
identical 12 42 102 104
idolized 186
if 18 140 193 247
ignorance 205
illogical 241
illusory 19
illustrated 99 177
imaginary 134
imitated 75
immediate 51
impenetrable 79
imperfect 53 171
important 48
impossible 265
imprecations 130
improbable 156
in 5 11 12 16 21 25 26 30 42 47 48 56 66 68 73 74 76 79 90 95 104 109 122 126 129 133 134 139 140 147 152 158 159 162 167 179 180 183 184 185 189 191 194 198 213 215 217 219 221 230 242 247 249 258 264 271 272
included 240
includes 206
incoherency 71
incoherent 210
incomprehensible 83
inconceivable 34 275
inconceivably 244
incontrovertible 105
incorruptible 238
indeed 140
indefatigable 55
indefinite 4
indicate 45
individuals 127
inevitably 232
infer 17
infidels 198 204
infinite 5 18 20 31 108 237 240 242 272 273
infinitely 272
infinitesimal 169
infinity 189
inflections 97
influences 86
inimitably 60
inquisitors 144
insane 132
inside 45 59
inspired 174
instant 195
insufficient 22
intact 120
interpolations 115
into 16 82 115 201 233 275
invade 163
inventors 75
irreplaceable 170
is 4 6 8 10 13 16 18 20 22 31 34 36 37 38 39 41 48 52 68 69 77 80 82 84 87 106 108 141 142 165 166 168 169 170 181 182 189 190 198 199 208 209 212 213 216 217 218 223 241 242 244 245 246 250 254 256 264 265 269 270 273
issue 157
it 18 37 79 80 81 83 85 90 95 119 148 162 165 180 189 190 204 206 207 224 241 242 264
its 47 54 55 90 106 196 215 220
itself 212
jorge 2
journey 247
journeyed 26
joy 120 193
jumbled 93
just 240
justification 196 212
justified 123
killed 146
kiss 231
know 71 200 230
knowledge 179
known 69
labyrinth 68
lack 46
language 81 82 85 96 109 115 141 142 143 183
language? 226
languages 80 217 221
last 67
latrines 159
law 100
lay 166
leaf 148 164
leagues 28 70
least 33
left 13
let 37 194 195
letizia 269
letter 86 172 232
letters 43 44 59 66 68 103 155 253 255
librarian 10 53 56 100 105 147 182 184 262
librarians 72 81
library 1 4 18 25 31 38 51 63 100 104 106 111 119 139 141 168 170 196 198 200 205 213 214 222 223 236 246 269 278
light 20 22
like 25 202 231
likelihood 140
limited 254
limits 242
line 8 43 67 70 86 87
lines 43 74 94 95
lithuanian 97
loaf 223
locate 186 187 188
long 158
loss 166
lost 117
low 6
luis 2
lung 259
m 66 84 86
mad 202
magical 177
make 27 214
malevolent 53
man 53 135 180 192 258
mankind 63 139
manuscript 253
many 79 184 208
marks 254
may 53 85 196 243
me 29 195 225 228
meaning 73 77
melancholy 260
memory 260
men 17 25 46 120 127 132 143 155 158 185
mentioned 88 233
mere 68 241
met 88
metal 159
methodical 228
middle 275
might 102 139
mighty 217
miles 82
millions 166
mimicking 160
mind 52
miraculous 199
mirror 16 17
misled 234
missing 146
mlo 210
moment 38
more 171 215 234
most 48 80
much 67 69 124
multiform 141
must 141 169 181 182
muttered 129
my 25 26 29 30 31 58 66 158 190 193 194 225 249
myself 27 208
mysteries 138 140
mysterious 46
mystery 47
mystics 34
mythology 116
n 220
name 21 166 218
narrow 11 129
native 127
natural 75 176
naturally 151
nature 65
nearest 148
nearly 146
neat 59
necessary 32
necessities 14
need 18
neither 44 253
next 86
nights 260
nine 271
ninety 83
no 52 64 77 104 121 149 171 213 216 265 266 276
non 198
nonsense 71 207
nor 45 253
normal 10
not 18 77 87 88 90 108 116 122 134 141 164 193 198 203 206 215 216 217 240 241 245 274
note 256
now 26
number 5 108 220 245 272 274
numbers 253
obscure 37
observe 207
observed 101 269
of 1 4 5 7 9 10 11 13 20 21 25 26 32 33 41 44 46 47 52 53 56 59 63 65 66 68 70 71 72 74 75 83 84 86 87 89 92 93 97 98 99 100 102 103 107 109 110 111 112 113 114 115 116 117 120 124 125 126 127 132 133 134 135 136 138 139 141 143 153 156 165 166 173 175 179 181 183 184 192 193 200 207 208 212 214 215 217 218 219 220 221 222 223 228 236 241 245 255 259 267 270 271 272 273 274 278
official 144
often 17
old 158 235
omnipotent 176
on 44 58 86 87 113 114 116 130 174 180 189 191 260
once 46 66 147
one 6 7 10 13 14 57 65 71 74 81 92 101 130 149 154 168 186 195 196 214 215 216 217 219 220 258
only 56 57 203 235 265
onto 11 12
open 275
opens 11
opinion 172
opposite 162
or 33 53 70 73 79 85 135 148 172 211 215 223 224 229 243 271
order 56 212 249 264
orders 157
organic 59
origin 139
original 253
originators 90
orthographic 62 107 206
other 14 181 225 275
others 4 88 93 95 132 162 194 201 267
our 33 82 256
out 27 153 240
over 29 167
overlook 167
own 135 136 193 194
page 43 68 86 87 275
pages 43 45 84 93 231 272 275
palm 74
passes 15
path 185
pdf 278
pentagonal 34
penultimate 68
people 117 230
peoples 80
perception 33
perfect 106 181
perfectly 237
perfidious 135
perhaps 5 26 48 134 234
period 102 124 138 179 254
periodic 247
periods 158
person 126
personal 121
persons 133
perused 192
perversely 67
phantasmal 230
philosopher 101
philosophers 141
phrase 69
phrases 210
physical 14
picture 244
piece 207
pilgrimages 232
pilgrims 129
place 194 243
places 243
planes 274
plaster 209
played 259
point 271
pointless 207 218 237 270
polished 261
politicalshakespeares 278
population 233
portuguese 95
posited 103
possesses 222
possessors 120
possibility 89 267
possible 107 185 221 245 265
pray 191
precious 153 237
prefer 19
prefigure 45
premises 105
preparing 27
present 228
primitive 85
printed 271
problem 121 246
proclaim 203
produced 142
profound 140
promise 20
proof 111 112
prophecies 125
proportion 259
proposed 154 187
prostrate 230
prove 204 266
provided 20
punctuation 254
pure 199
purifiers 173
pyramid 224
pyramids 69
quest 26 73 134
quite 81
rage 165
railing 6 29
random 200
rational 52 70
rationality 199
reach 154 175
reaction 119
read 192 225 231
reading 193 211
recall 48 134
reduction 169
refer 133
referring 256
refutation 220
regions 132
regression 187
remote 243
remotest 16
renders 229
repeat 38 83 264
repeated 67 249
repeating 99
replication? 19
repudiate 72
required 142 270
reveal 35
revealed 139
rhetorical 241
riddle 64
right 13 82
rooms 32
rudiments 98
rule 198
rungs 146
rushed 128
s 9 11 14 74 84 124 127 135
said 95
same 8 87 138 174 221 248 249
samoyed 97
satisfactorily 64
satisfying 14
savages 231
saw 66
saxon 117
say 45
scouring 144
scrawls 58
search 184
searchers 144
searches 154
searching 148 187
seated 56
second 62
secret 121 186 215 238
sect 154 157 184
see 6 77 204
seen 133 145 158
semibarbarous 71
sense 72 90 198
senseless 70 166
separates 57
series 87
set 22
seventeenth 272
several 171
shaft 6
shafts 131
shall 31 77
shape 32
shelf 180 191
sheltered 186
should 247
show 163
showed 94
shuffle 155
side 8
sides 9 11
significance 216
silken 274
similar 275
similarly 151
since 104 170
single 172 192 207 261 270
sink 30
sites 278
six 9 224
size 271
sleeping 14
smaller 176
so 168 189 201
solid 273
solitary 237
solitude 250
solution 47 122 246
solve 64
some 85 92 122 135 145 152 156 180 182 191 192 202 221 243
someday 175
someone 187
something 224
sometimes 260
somewhere 122 213
sort 11
space 15 33 102 255
speak 81 146 200 216 218
speaking 270
species 235 236
spent 190
sphere 38
spherical 21
spine 36
spiral 15
spite 47
spurred 128 174
squabbled 129
squandered 190
staircase 15 146 266 267
staircases 55 130 147 243 261
state 241
stated 273
statement 70
still 183
story 114
strangled 130
strict 157
strictly 270
stroke 156
struck 46
structure 267
structures 205
succeeded 151
such 73 189 190 193
suddenly 123
suffice 37
sufficient 141 256 264
suggest 246
suggested 85
suicide 258
suicides 234
summarizing 46
superposition 273
superstition 179
superstitious 72
surely 142
surfaces 19
susceptible 211
suspect 37 235
sweet 127
syllable 216
symbol 221
symbols 58 62 75 107 155 206 256
symmetrical 60
system 222
tacitus 117
take 148
talk 124 146
tasks 145
taste 204
tautologies 218
teeters 236
tells 188
ten 42 84 271
tenderness 217
tens 192
termed 180
terrible 216
terror 217
testimony 36
than 10 171 176
that 17 18 19 21 26 31 32 33 34 36 37 38 45 51 57 62 64 74 76 77 79 80 82 83 85 86 88 89 95 101 105 106 108 113 114 119 124 125 126 135 138 142 143 146 151 152 153 154 155 164 165 167 168 169 171 172 179 181 182 184 186 189 190 191 192 198 199 201 207 208 211 214 215 216 217 224 229 232 235 236 240 241 242 245 248 250 256 259 264 266 267 269 270 271 273 274
the 1 4 5 7 8 9 10 12 13 14 16 18 20 21 22 25 26 28 29 30 31 32 36 38 44 45 46 47 48 51 52 53 54 55 56 57 58 59 63 64 65 66 67 69 72 74 75 80 81 82 86 87 89 90 92 93 95 96 98 100 102 103 104 105 107 109 110 111 112 113 114 115 116 117 119 120 122 123 124 126 128 129 130 132 133 135 138 139 140 141 142 143 144 147 148 152 154 157 159 160 162 163 166 167 168 170 172 173 174 175 180 181 183 184 186 191 193 198 200 204 205 206 207 208 209 212 213 214 217 219 220 221 222 223 224 228 229 230 233 235 236 240 242 243 244 245 246 248 249 253 254 255 259 265 269 270 272 273 275 278
their 34 36 37 127 129 131 145 165 166 231
them 35 133 134 145 146 194 221
themselves 76 120 131 224 230
then 141 194
theory 63
there 15 16 18 21 44 62 70 104 121 124 138 144 170 181 183 190 195 213 216 258 266
thereby 64
these 21 58 129 189
thesis 88
they 22 33 74 102 145 146 147 163 174 200 201 231
thin 272
thing 162 223
things 202
think 190 241
thirty 42 219
this 15 17 68 69 101 182 183 218 246
those 44 79 99 105 112 134 140 153 155 167 174 203 210 217 242 244 254
though 89 108 194 231 266
thought 162
thousand 171
thousands 111 127
threaten 201
three 63 258
threw 130
through 15 148 156 164 175
throw 29
thunder 209
thy 69 196
time 69 125 139
times 258
tiny 13
titled 208 209
to 8 9 12 19 27 35 48 57 63 64 67 72 73 82 84 94 100 109 128 131 133 134 136 149 157 162 165 175 183 186 187 188 189 191 193 201 207 211 218 241 242 246 254 256 264 267
today 81 166
together 142
told 94
toledo 269
tomb 29
tongues 215
too 15
tortured 195
total 106 170 191
tragic 47
translation 114
transmogrify 201
traveled 25 260
traveler 55 247
travelers 104
traveling 94
treasure 121
treasures 167
treatise 116
trembling 58
triangular 33
true 80 82 84 112 114
truth 51
trying 72
turn 11
twenty 8 62 75 103 107 206 255
two 13 21 42 93 103 104 107 133 167 169 254 255
type 271
ubiquitous 222
unattainable 39
unbearable 154
unbounded 120
unbridled 151
unceasing 23
understand 225
undoubtedly 210
unfathomable 29
unimaginably 108
unique 170
universally 89
universe 4 54 122 123 126 191
unknown 191 256
unlikely 190
unlimited 123 245 247
unmoving 237
unrelenting 175
unspeakably 260
until 155
untold 248
unvarying 84
up 148
uploads 278
upon 113
upper 92
upright 14
upstairs 128
upward 15
us 229
using 274
v 67 84 86
vademecum 274
vague 88
vain 72 128 186
value 86 87
variations 99 206
vast 108 269
ventilation 6 131
ventures 189
verbal 71 205 212
verbose 218
verge 236
version 136
vestibule 11 13 16
vestiges 183
vindicate 125
vindication 129 135
vindications 125 132
virtually 65
vocabulary 221
volume 164 208 270
volumes 131 166 200 219 237 248
wall 41
walls 36 165
was 28 76 79 95 96 98 119 120 121 122 124 138 151 154 162 180 186 258
water 55
way 162
we 77 81 165 179
well 204
went 132 134
were 18 79 95 131 146 153 157 164 174
what 18 27 45 96 180
when 28 119
where 28 188
wherein 196
which 4 11 15 17 31 58 65 90 103 108 133 188 203 230 234 244 249
while 147 205 209 223
who 94 134 158 167 225 242 244
whole 106
whose 38 39 47 51 68 71 121 200 267
widely 168
width 123
will 29 30 45 74 238 245
wind 30
winds 15
wisdom 193
wish 48
with 35 41 54 59 73 88 97 99 123 143 147 159 217 237 259
within 95
without 261
wondrous 126
word 240
words 37 140 143 149 203 212 224
work 53
world 52 121 242 244
worshiped 184
worthless 163
would 18 125 158 163 207 248 271 274 275
wp 278
writing 75
written 27 95 116 229 240
year 234
years 63 79 92 185 190
yet 153 236 244
yiddish 95
you 225
young 230
younger 25
your 114
zeal 175
zero 136
zone 68 71 183
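An index like the one above (each word followed by the line numbers where it appears) could be built with a short script. This is only a minimal sketch: the function name `build_line_index` and the tokenization rule (lowercase runs of letters) are assumptions, not the procedure actually used to produce the listing.

```python
import re
from collections import defaultdict


def build_line_index(text):
    """Map each word to the sorted list of line numbers where it occurs."""
    index = defaultdict(list)
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Assumed tokenization: lowercase, keep only runs of letters.
        # set() ensures a line is listed once even if the word repeats in it.
        for word in set(re.findall(r"[a-z]+", line.lower())):
            index[word].append(line_number)
    # Sort the words alphabetically, as in a printed index.
    return dict(sorted(index.items()))


sample = (
    "The universe (which others call the Library)\n"
    "is composed of an indefinite number\n"
    "of hexagonal galleries"
)
sample_index = build_line_index(sample)
for word, lines in sample_index.items():
    print(word, *lines)
```

Because the words come out sorted, looking one up no longer requires reading the whole text.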
The database is \((s_1,…,s_d)\)
We need two auxiliary variables, a lower and an upper bound. Let \(l \leftarrow 1,\ u \leftarrow d\)
To search in a sorted file, we start in the middle
In a sorted file, we can discard half of the database after every comparison
So we need to compare the query with about \(\log_2(d)\) subjects
Each comparison costs \(O(mn)\), thus the search cost is \(O(mn\log_2(d))\)
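The halving idea can be sketched as a binary search. This is an illustration under stated assumptions: the database is a sorted Python list, indices are 0-based (so \(l\) and \(u\) start at 0 and \(d-1\)), and the names `binary_search`, `lower`, and `upper` are chosen here for the example.

```python
def binary_search(database, query):
    """Return (position, comparisons); position is None if query is absent.

    `database` must be sorted in ascending order.
    """
    lower, upper = 0, len(database) - 1  # l <- 1, u <- d in 1-based notation
    comparisons = 0
    while lower <= upper:
        middle = (lower + upper) // 2  # start in the middle
        comparisons += 1
        if database[middle] == query:
            return middle, comparisons
        if database[middle] < query:
            lower = middle + 1  # discard the lower half
        else:
            upper = middle - 1  # discard the upper half
    return None, comparisons


words = sorted(["and", "it", "symbols", "mystery", "zone", "library", "the"])
position, comparisons = binary_search(words, "symbols")
print(position, comparisons)

# A database of 1000 entries never needs more than 10 probes.
_, probes = binary_search(list(range(1000)), 999)
print(probes)
```

Here each probe counts as one comparison between the query and a subject; in the sequence-search setting that single comparison is the \(O(mn)\) step.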
Let’s say that \(m=100, n=100.\) Then
d | comparisons (plain database) | time (plain) | comparisons (with index) | time (with index) |
1000 | 1e+07 | 10 sec | 99658 | 0.1 sec |
10000 | 1e+08 | 1.7 min | 132877 | 0.13 sec |
1e+05 | 1e+09 | 16.7 min | 166096 | 0.17 sec |
1e+06 | 1e+10 | 2.8 hours | 199316 | 0.2 sec |
1e+07 | 1e+11 | 1.2 days | 232535 | 0.23 sec |
1e+08 | 1e+12 | 11.6 days | 265754 | 0.27 sec |
1e+09 | 1e+13 | 115.7 days | 298974 | 0.3 sec |
Times assume \(10^{6}\) comparisons per second
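The numbers in the table can be reproduced with a few lines of arithmetic. The sketch below assumes the plain search costs \(m\cdot n\cdot d\) comparisons and the indexed search costs \(m\cdot n\cdot\log_2(d)\); the function names and the `RATE` constant are introduced here for illustration.

```python
import math

RATE = 10**6  # comparisons per second, the table's assumption


def comparisons_plain(m, n, d):
    """Compare the query against every one of the d subjects."""
    return m * n * d


def comparisons_indexed(m, n, d):
    """Compare the query against about log2(d) subjects."""
    return m * n * math.log2(d)


for exponent in range(3, 10):
    d = 10**exponent
    plain = comparisons_plain(100, 100, d)
    indexed = comparisons_indexed(100, 100, d)
    print(f"d={d:>10}  plain={plain:.0e} ({plain / RATE:.0f} s)  "
          f"indexed={indexed:.0f} ({indexed / RATE:.2f} s)")
```

The plain cost grows tenfold with each row, while the indexed cost only grows by a constant amount.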
There are many strategies to index databases
This is one of the main differences between tools
A big part of bioinformatics is knowing which index to use
All the previous discussion assumed an exact match
It can be extended to partial matches
Let’s say that there are at most 3 differences (mismatches or gaps) between the query and the best subject
To find a partial match with \(n\) mismatches
BLAST uses indices to look for an initial hit
(sometimes called seed)
Then it tries to extend the hit by building a dot matrix around it
The key parameter is the word size
Larger words are faster to search
A smaller word size is more sensitive
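The effect of word size can be seen with a toy word index. This is a simplified sketch of the seed-finding idea only, not BLAST's actual implementation: it looks for exact word matches, skips the extension step, and the names `build_word_index` and `find_seeds` and the two example sequences are made up for illustration.

```python
from collections import defaultdict


def build_word_index(subject, word_size):
    """Map every word of length `word_size` to its start positions in `subject`."""
    index = defaultdict(list)
    for start in range(len(subject) - word_size + 1):
        index[subject[start:start + word_size]].append(start)
    return index


def find_seeds(query, index, word_size):
    """Return (query_pos, subject_pos) pairs where a query word matches exactly."""
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


subject = "ACGTACGTTAGC"
query = "CGTTAG"

seeds_small = find_seeds(query, build_word_index(subject, 3), 3)
seeds_large = find_seeds(query, build_word_index(subject, 5), 5)
print(len(seeds_small), seeds_small)
print(len(seeds_large), seeds_large)
```

With the smaller word size there are more seeds to examine, including a weaker hit that the larger word size never reports; that is the trade-off between speed and sensitivity.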
In particular Multiple Alignment