+ : addition, sum
- : subtraction, difference
* : multiplication, product
/ : division, quotient
% : remainder, modulo
^ : exponentiation, power
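As a quick check of these operators, here is a minimal sketch that applies each one to the two fields of a single input line. The values 7 and 2 and the variable name result are chosen only for illustration.

```shell
# Feed one line with two fields ($1 = 7, $2 = 2) and apply every operator to them
result=$(echo 7 2 | awk '{ print $1 + $2, $1 - $2, $1 * $2, $1 / $2, $1 % $2, $1 ^ $2 }')
echo "$result"   # prints: 9 5 14 3.5 1 49
```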
December 12, 2019
What does this command do?
seq 100 | awk '$1 % 7 == 0 {print $1}'
What does this command do?
seq 100 | awk '($1 % 3 == 0) && ($1 % 2 == 0) {print $1}'
The symbol = is used to assign a new value to a variable
variable = value
Variables are created automatically when you assign them
Variables can be numeric or text
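A small sketch of both kinds of variable may help here. The names total and label are made up for this example; total is never declared, it simply appears the first time it is assigned, and an unset variable counts as 0 in arithmetic.

```shell
# total is numeric and grows with every record; label holds text
running=$(seq 4 | awk '{ total = total + $1; label = "total so far:"; print label, total }')
echo "$running"
# The last line printed is: total so far: 10
```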
Besides the variables we create with assignments, we have some predefined variables
$1, $2, and so on are the fields of each record
$0: complete input record
NF: Number of Fields in the current record
NR: Number of the current Record
FILENAME: name of the current file being processed
FNR: Number of the current Record in the current file
Write an awk command that prints
Apply this to all .txt files in your folder
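To see the predefined variables side by side, the following sketch creates two throwaway files and prints the variables for every record. The file names one.txt and two.txt and their contents are invented for the example; any small text files would do.

```shell
# Two small temporary files (hypothetical names and contents)
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file
report=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF, $0 }' one.txt two.txt)
echo "$report"
# one.txt 1 1 3 a b c
# one.txt 2 2 2 d e
# two.txt 3 1 1 f

rm -r "$dir"
```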
==, !=, <, >, <=, >=
BEGIN, END
&&, || and !
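The comparison and logic operators can be combined freely in a condition. The two commands below are only a sketch, with numbers and variable names picked for illustration.

```shell
# Numbers from 1 to 20 that are below 3 OR at least 19
edges=$(seq 20 | awk '($1 < 3) || ($1 >= 19) {print $1}')
echo $edges          # 1 2 19 20

# Numbers from 1 to 10 that are NOT even AND different from 5
odd_not_five=$(seq 10 | awk '!($1 % 2 == 0) && ($1 != 5) {print $1}')
echo $odd_not_five   # 1 3 7 9
```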
BEGIN and END
These special conditions are only true once in every program
BEGIN is true before reading any record
END is true after reading all records
ls -l | awk '$6 == "Nov" { sum += $5 } END { print sum }'
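The output of ls -l depends on the folder and the system, so here is a sketch of the same BEGIN and END pattern with a predictable input. The variable name count and the printed message are made up for the example.

```shell
# BEGIN runs once before the first record, END once after the last one
summary=$(seq 100 | awk 'BEGIN { print "counting multiples of 7" }
                         $1 % 7 == 0 { count += 1 }
                         END { print count }')
echo "$summary"
# counting multiples of 7
# 14
```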
FS: Field Separator. Regex that separates fields
RS: Record Separator. Regex that separates records
OFS: Output Field Separator. Text to separate printed fields
ORS: Output Record Separator. Text to separate printed records
Notice that Output separators are text, but Input separators are regular expressions
By default,
ORS="\n", that is, records are separated by new line
OFS=" ", that is, fields are separated by space
Here we find something new
Most characters are easy to write. Just use the keyboard
Some are harder, because they have other meanings, or they are not on the keyboard
To write them, we use \ followed by a letter
There are several cases, such as tab, new line, beep, backspace
(if you are curious, look at “line endings in UNIX vs. Windows”)
The most common special characters are written as follows
name | how to write it |
---|---|
TAB | \t |
new line | \n |
backslash | \\ |
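The output separators and these special characters work well together. The following sketch sets OFS to a TAB and ORS to a semicolon; the input values and the variable names tabbed and joined are chosen only for illustration.

```shell
# Print three fields separated by TAB instead of the default space
tabbed=$(echo 'a b c' | awk 'BEGIN { OFS = "\t" } { print $1, $2, $3 }')
echo "$tabbed"

# End every printed record with ";" instead of a new line
joined=$(seq 3 | awk 'BEGIN { ORS = ";" } { print $1 }')
echo "$joined"   # 1;2;3;
```

The commas in print $1, $2, $3 matter: they are the places where awk writes OFS.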
Notice that Input separators are regular expressions
By default,
RS="\n", that is, records are separated by new line
FS=/[ \t]+/, that is, fields are separated by whitespace: one or more spaces or tabs
We can change the input’s field separator to process different kinds of files
For example, the file /home/andres/population_total.csv contains values separated by comma
geo,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,
Afghanistan,3280000,3280000,3280000,3280000,3280000,3280000,3280000,3280000,
Albania,410000,412000,413000,414000,416000,417000,418000,420000,421000,422000,
Here fields are separated by ,
Let’s print the names of the countries (field 1) that had over 100 million people in 1800 (field 2)
awk 'BEGIN {FS=","} $2>100e6 {print $1}' population_total.csv
Changing FS is so useful that there is a shortcut for it
awk has the -F option for it. Upper case F
awk -F "," '$2>100e6 {print $1}' population_total.csv
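If the CSV file is not at hand, the idea can be tried on a few lines typed in directly. This sketch reuses the first three columns of the rows shown above; the threshold of one million (1e6) is an assumption made only so that one of these sample rows matches, and the variable names sample and big are invented.

```shell
# First three columns of the rows shown above, typed in by hand
sample=$(printf 'geo,1800,1801\nAfghanistan,3280000,3280000\nAlbania,410000,412000\n')

# -F "," tells awk that fields are separated by commas
big=$(echo "$sample" | awk -F "," '$2 > 1e6 {print $1}')
echo "$big"   # Afghanistan
```

The header line does not match because its second field is 1800, which is below the threshold.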