November 14, 2018
For today’s class we need to “copy” several files into each one of your folders
But we will not modify the files. They will be read-only
And we do not want 30 copies of the same file. That would use too much disk space
Instead of copying, we will link the files into your folder
mkdir Gutenberg
cd Gutenberg
ln /home/andres/Gutenberg/* .
ls -l
total 6316
-rw-r--r-- 1 andres andres  594933 Nov 14 12:46 Adventures_of_Sherlock_Holmes.txt
-rw-r--r-- 1 andres andres 2347825 Nov 14 12:46 Don_Quixote.txt
-rw-r--r-- 1 andres andres  405900 Nov 14 12:46 Dubliners.txt
-rw-r--r-- 1 andres andres  748536 Nov 14 12:46 Educating_by_Story-Telling.txt
-rw-rw-r-- 1 andres andres 1938980 Nov 15 09:52 english
-rw-r--r-- 1 andres andres  141420 Nov 14 12:46 Metamorphosis.txt
-rw-r--r-- 1 andres andres  272277 Nov 14 12:46 Study_In_Scarlet.txt
ln gives new names to existing files
Each physical file on the disk can have several names
ln creates a new name for an existing file
You can see the number of names of a file in the output of ls -l (second column)
What happens when we use rm?
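A minimal sketch of what ln and rm do to the number of names, run in a scratch directory. The file names original.txt and alias.txt are made up for illustration, and the count is read from the second column of ls -l.

```shell
# Work in a throwaway directory (hypothetical file names).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

echo "some text" > original.txt
ln original.txt alias.txt          # a second name for the same file

# The second column of `ls -l` is the number of names (hard links).
links_before=$(ls -l original.txt | awk '{print $2}')
echo "names after ln: $links_before"

rm original.txt                    # removes one name, not the data
links_after=$(ls -l alias.txt | awk '{print $2}')
content=$(cat alias.txt)
echo "names after rm: $links_after"
echo "content still readable: $content"

cd / && rm -rf "$tmp"
```

The data is only released when the last name is removed, which is why deleting your link does not affect anyone else's.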
We use grep to look for a pattern in one or more files
We use these options:
grep --color 'regex' file ...
grep --only 'regex' file ...
grep --count 'regex' file ...
The pattern is a regex. This means Regular Expression
A regex describes a set of words with a single expression
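A small sketch of how the counting and matching options differ, assuming a made-up three-line file sample.txt. It uses the short forms -c (for --count) and -o (for --only-matching), which should behave the same as the long options above.

```shell
# Hypothetical sample file, created only for this illustration.
tmp=$(mktemp -d)
printf '%s\n' 'the cat sat' 'a dog' 'cat and cat' > "$tmp/sample.txt"

# -c reports the number of matching LINES, not the number of matches.
line_count=$(grep -c 'cat' "$tmp/sample.txt")

# -o prints each match on its own line, so counting those lines
# gives the number of occurrences.
match_count=$(grep -o 'cat' "$tmp/sample.txt" | wc -l | tr -d ' ')

echo "matching lines: $line_count"
echo "matches: $match_count"
rm -rf "$tmp"
```

The two numbers differ here because the last line contains the pattern twice.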
grep --color 'analyze' english
analyze analyzed analyzer analyzer's analyzers analyzes psychoanalyze psychoanalyzed psychoanalyzes
grep --color '^analyze' english
analyze analyzed analyzer analyzer's analyzers analyzes
The symbol ^ represents “start of line”
grep --color 'analyze$' english
analyze psychoanalyze
The symbol $ represents “end of line”
grep --color '^analyze$' english
analyze
Symbols ^ and $ are called “anchors”
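The effect of the anchors can be checked on a tiny word list. This is only a sketch: the file words.txt and its four words are an assumption standing in for the english file.

```shell
# Hypothetical four-word list standing in for the `english` file.
tmp=$(mktemp -d)
printf '%s\n' analyze analyzed psychoanalyze psychoanalyzed > "$tmp/words.txt"

# paste -s joins the output lines so each result fits on one line.
starts=$(grep '^analyze' "$tmp/words.txt" | paste -s -d ' ' -)
ends=$(grep 'analyze$' "$tmp/words.txt" | paste -s -d ' ' -)
exact=$(grep '^analyze$' "$tmp/words.txt" | paste -s -d ' ' -)

echo "start of line: $starts"
echo "end of line:   $ends"
echo "both anchors:  $exact"
rm -rf "$tmp"
```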
Count how many times the word “Sherlock” appears in each text file in the Gutenberg folder
There are small differences between American and British versions of the English language
grep --color '^analyze$' english
analyze
grep --color '^analyse$' english
analyse
grep --color '^analy[sz]e$' english
analyze analyse
The symbols [ and ] indicate a character class
That is, one letter from a set of letters
A character class allows you to match a range or set of characters
Example: [aeiou] will match any (English) vowel
This matches “c”, followed by a vowel, followed by “t”
grep --only 'c[aeiou]t' english | sort |uniq
cat cet cit cot cut
We can also use character classes to specify characters we don’t want to match. These are called negated character classes
They are created by putting a caret ^ at the beginning of the class
This will match a “c”, followed by a non-vowel, followed by a “t”:
grep --only 'c[^aeiou]t' english | sort |uniq
cht ckt cst cyt
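A sketch comparing a character class with its negated version on a short, made-up word list (words.txt is hypothetical). It uses -o, the short form of --only-matching.

```shell
# Hypothetical word list; LC_ALL=C keeps the sort order predictable.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' cat cot cut catch yacht ecstasy > "$tmp/words.txt"

# "c", then one vowel, then "t"
with_vowel=$(grep -o 'c[aeiou]t' "$tmp/words.txt" | sort | uniq | paste -s -d ' ' -)

# "c", then one character that is NOT a vowel, then "t"
without_vowel=$(grep -o 'c[^aeiou]t' "$tmp/words.txt" | sort | uniq | paste -s -d ' ' -)

echo "c[aeiou]t:  $with_vowel"
echo "c[^aeiou]t: $without_vowel"
rm -rf "$tmp"
```

Note that a negated class matches any character outside the set, so it also accepts digits, spaces, and punctuation, not only consonants.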
Show the complete matching line with the pattern in color for the following regex
You can also match a range of characters using a character class. For example, [a-i] will match any of the letters between a and i (inclusive)
Character classes work with numbers too
This matches a year between 1000 and 9999:
grep '[1-9][0-9][0-9][0-9]' *txt
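A sketch of both kinds of range on made-up input (the file lines.txt and its contents are assumptions). It also shows a limitation of the four-digit pattern: without anchors it matches the first four digits of any longer number.

```shell
# Hypothetical input; LC_ALL=C makes [a-i] mean exactly the letters a to i.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' 'Published in 1887' 'Chapter 12' 'Room 0221' 'Order 123456' > "$tmp/lines.txt"

# Four digits, the first one not 0. "0221" is rejected, but the
# first four digits of "123456" are accepted.
years=$(grep -o '[1-9][0-9][0-9][0-9]' "$tmp/lines.txt" | paste -s -d ' ' -)

# A letter range: only single letters from a to i pass.
letters=$(printf '%s\n' a e i j z | grep '^[a-i]$' | paste -s -d ' ' -)

echo "four-digit matches: $years"
echo "letters in [a-i]:   $letters"
rm -rf "$tmp"
```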
The symbol . represents “any letter”
More precisely, any character
grep --only 'c.t' english | sort |uniq
cat cet cht cit ckt cot cst cut cyt
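To see that the dot really accepts any character and not only letters, here is a sketch on a made-up list (items.txt is hypothetical) that includes a hyphen and a space.

```shell
# Hypothetical list with a letter, a hyphen, a space, and nothing
# between "c" and "t".
tmp=$(mktemp -d)
printf '%s\n' 'cat' 'c-t' 'c t' 'ct' 'coat' > "$tmp/items.txt"

# The dot needs exactly one character: "ct" and "coat" do not match.
dot_matches=$(grep -o 'c.t' "$tmp/items.txt" | paste -s -d ',' -)

echo "$dot_matches"
rm -rf "$tmp"
```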
Find all the lines ending with “Holmes” followed by a single character
The * symbol means that something should be repeated zero or more times
That is, it follows an optional expression
grep '^colou*r$' english
color colour
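A sketch of "zero or more times" on a made-up list (spellings.txt is hypothetical). The invented word "colouur" is there only to show that * allows more than one repetition.

```shell
# Hypothetical list; "colouur" is not a real word.
tmp=$(mktemp -d)
printf '%s\n' color colour colouur colr > "$tmp/spellings.txt"

# u* accepts zero, one, or many "u"; "colr" fails because the
# second "o" is missing, not because of the star.
star_matches=$(grep '^colou*r$' "$tmp/spellings.txt" | paste -s -d ' ' -)

echo "$star_matches"
rm -rf "$tmp"
```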
The characters ., *, [, ], ^, $ are special
They are called meta-characters
How can we look for them?
To take out the “superpowers”, we use \
\., \*, \[, \], \^, \$, and \\
The rest of the characters match themselves
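A sketch of the difference escaping makes, on a made-up file (literal.txt is hypothetical). The patterns are kept in single quotes so that the shell passes the backslash to grep unchanged.

```shell
# Hypothetical lines containing meta-characters as plain text.
tmp=$(mktemp -d)
printf '%s\n' '3.14' '3x14' 'a*b' 'ab' > "$tmp/literal.txt"

# Unescaped dot: any character, so "3x14" matches too.
unescaped=$(grep -c '3.14' "$tmp/literal.txt")

# Escaped dot: only a real "." matches.
escaped=$(grep -c '3\.14' "$tmp/literal.txt")

# Escaped star: a real "*" between "a" and "b".
star_line=$(grep 'a\*b' "$tmp/literal.txt")

echo "lines matching 3.14:  $unescaped"
echo "lines matching 3\\.14: $escaped"
echo "line with a real star: $star_line"
rm -rf "$tmp"
```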
Look for “Holmes.”
Instead of grep we will use egrep
Now the characters .?*[]^${}()+|\ are special
As before, we can always escape them
? is like * but means “zero or one time”
egrep --only 'lo?k' english |sort |uniq
lk lok
egrep --only 'lo*k' english |sort |uniq
lk lok look
+ means one or more times
That is, [a-z][a-z]* is the same as [a-z]+
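The three repetition symbols can be compared side by side. This sketch uses grep -E, the standard spelling of egrep, on a made-up list (looks.txt is hypothetical, and "loook" is an invented word).

```shell
# Hypothetical list with zero, one, two, and three "o".
tmp=$(mktemp -d)
printf '%s\n' lk lok look loook > "$tmp/looks.txt"

zero_or_one=$(grep -E '^lo?k$' "$tmp/looks.txt" | paste -s -d ' ' -)
zero_or_more=$(grep -E '^lo*k$' "$tmp/looks.txt" | paste -s -d ' ' -)
one_or_more=$(grep -E '^lo+k$' "$tmp/looks.txt" | paste -s -d ' ' -)

echo "lo?k: $zero_or_one"
echo "lo*k: $zero_or_more"
echo "lo+k: $one_or_more"
rm -rf "$tmp"
```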
We can use curly braces to repeat something between a range of times:
^a{3,5}$
That will match the letter “a” repeated 3, 4, or 5 times.
If you want to match something repeated up to a certain number of times, you can use 0 as the first number.
If you want to match something repeated at least a certain number of times, with no maximum, you can just leave the second number blank:
^a{3,}$
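A sketch of the three forms of curly braces, again with grep -E (the standard spelling of egrep) and a made-up list of repeated letters (repeats.txt is hypothetical).

```shell
# Hypothetical list: the letter "a" repeated 2 to 6 times.
tmp=$(mktemp -d)
printf '%s\n' aa aaa aaaa aaaaa aaaaaa > "$tmp/repeats.txt"

between=$(grep -E '^a{3,5}$' "$tmp/repeats.txt" | paste -s -d ' ' -)   # 3 to 5 times
at_least=$(grep -E '^a{3,}$' "$tmp/repeats.txt" | paste -s -d ' ' -)   # 3 or more
up_to=$(grep -E '^a{0,2}$' "$tmp/repeats.txt" | paste -s -d ' ' -)     # at most 2

echo "a{3,5}: $between"
echo "a{3,}:  $at_least"
echo "a{0,2}: $up_to"
rm -rf "$tmp"
```

The anchors matter here: without ^ and $, a pattern such as a{3,5} would also match inside a longer run of the letter.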
If you want to match two different expressions, you can use |
egrep 'cat|dog' english
Alcatraz Alcatraz's Decatur Decatur's Hecate Hecate's Ladoga Ladoga's Mercator Mercator's Muscat Muscat's Popocatepetl Popocatepetl's Yucatan Yucatan's abdicate abdicated abdicates abdicating abdication abdication's abdications ... watchdog watchdog's watchdogs wildcat wildcat's wildcats wildcatted wildcatting
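The long output comes from the fact that the pattern matches anywhere inside a word. A sketch on a made-up five-word list (pets.txt is hypothetical) shows this, and how anchoring each alternative restricts the match to the whole line. It uses grep -E, the standard spelling of egrep.

```shell
# Hypothetical five-word list.
tmp=$(mktemp -d)
printf '%s\n' cat dog catalog bulldog bird > "$tmp/pets.txt"

# Unanchored: every line that contains "cat" or "dog" somewhere.
anywhere=$(grep -E -c 'cat|dog' "$tmp/pets.txt")

# Each alternative carries its own anchors: only the exact words.
exact_words=$(grep -E '^cat$|^dog$' "$tmp/pets.txt" | paste -s -d ' ' -)

echo "lines containing cat or dog: $anywhere"
echo "lines that are exactly cat or dog: $exact_words"
rm -rf "$tmp"
```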
Look for cats and dogs in all the text files
We can use ( and ) to define groups of expressions
egrep --only '([aeiou][^aeiou]){2}' english |sort |uniq
aliy one' ones axes ases izen utop ired ates emun aron ones onin axil asin izes ilot ires atin erat asid akin used axim eleg ebeg utop irin ator emun elar ated uses izat ated eful utos aged edic erat ilen ates ical axim eleg oken utow ages ated emun aham atin usin izat ates erin utum agin edic erat alom ire' iced axim eleg oman uxil ured ates emun uja' ired icem ized atin ized ilab ures edic erat apul ires ices axim eleg oman ilit urin atin emun ure' irin icin izes elen izer ilab ined edic erat
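A sketch of what the parentheses change, on a made-up three-word list (group.txt is hypothetical). With the group, {2} repeats the whole vowel plus non-vowel pair. Without it, {2} applies only to the non-vowel class. It uses grep -E, the standard spelling of egrep.

```shell
# Hypothetical word list.
tmp=$(mktemp -d)
printf '%s\n' banana unit apple > "$tmp/group.txt"

# Grouped: (vowel, non-vowel) twice in a row.
grouped=$(grep -E -o '([aeiou][^aeiou]){2}' "$tmp/group.txt" | paste -s -d ' ' -)

# Ungrouped: one vowel followed by two non-vowels.
ungrouped=$(grep -E -o '[aeiou][^aeiou]{2}' "$tmp/group.txt" | paste -s -d ' ' -)

echo "with group:    $grouped"
echo "without group: $ungrouped"
rm -rf "$tmp"
```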