FS
: Field Separator. Regex that separates fields
RS
: Record Separator. Regex that separates records
OFS
: Output Field Separator. Text to separate printed fields
ORS
: Output Record Separator. Text to separate printed records
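As a minimal sketch of how the four variables work together, the command below reads a made-up input in which records end with ";" and fields are split by ",". The input text and the chosen output separators are assumptions for illustration only.

```shell
# Hypothetical input: two records, "a,b" and "c,d", each ended by ";"
printf 'a,b;c,d;' |
  awk 'BEGIN { FS = ","; RS = ";"; OFS = "-"; ORS = ".\n" } { print $1, $2 }'
# prints:
# a-b.
# c-d.
```

FS and RS control how the input is cut up; OFS and ORS only affect what print writes.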
December 19, 2019
FS is super useful
Changing FS is so useful that there is a shortcut for it
awk has the -F option for it (upper case F)
awk -F "," '$2>100e6 {print $1}' population_total.csv
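To see that -F is only a shortcut for setting FS, the sketch below runs the same rule both ways on a small made-up data set. The names and numbers are hypothetical and stand in for a real file.

```shell
# Hypothetical sample data: name,population
data='Atlantis,150000000
Lilliput,2000'

# Shortcut: set the field separator on the command line
printf '%s\n' "$data" | awk -F ',' '$2 > 100e6 { print $1 }'

# Long form: set FS in a BEGIN rule, before any input is read
printf '%s\n' "$data" | awk 'BEGIN { FS = "," } $2 > 100e6 { print $1 }'
# both commands print: Atlantis
```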
awk reads the input files one line at a time
For each record, awk tries the patterns of each rule
If several patterns match, then several actions execute in the order in which they appear in the awk program
If no patterns match, then no actions run
After processing all the rules that match the line, awk reads the next line
This continues until the program reaches the end of the file
For example, the following awk program contains two rules:
/12/ { print $0 }
/21/ { print $0 }
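A quick way to watch the two rules at work is to feed them a few made-up lines. In this sketch the input values are invented; the line 1221 matches both patterns, so it is printed twice, and 33 matches neither, so it is not printed at all.

```shell
printf '12\n21\n1221\n33\n' | awk '/12/ { print $0 }
/21/ { print $0 }'
# prints:
# 12
# 21
# 1221
# 1221
```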
Sooner or later you will have too many rules to fit on the command line
And it becomes tedious to type them all again and again
In this case we can write them all in an .awk file
We use a text editor (like nano or vim) to edit it
Let’s write the file gdp.awk with this content
BEGIN { FS="\t" }
NF>0 { gdp = $3*$4; total += gdp; print $1, gdp }
END { print "Total", total }
This time we take the commands from a file
To tell awk to read commands from a file, we use the option -f (lower case f)
awk -f gdp.awk world2017.txt
Be careful. Do not confuse -f and -F: -f names the file that holds the program, while -F sets the field separator
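The whole workflow can be tried without the real data. The sketch below writes gdp.awk into a temporary directory and runs it on a tiny tab-separated file; the file sample.txt, its country names and its column meanings (population in column 3, GDP per person in column 4) are assumptions made up for this example.

```shell
# Work in a temporary directory so nothing is overwritten
dir=$(mktemp -d)

# The program file
cat > "$dir/gdp.awk" <<'EOF'
BEGIN { FS = "\t" }
NF > 0 { gdp = $3 * $4; total += gdp; print $1, gdp }
END { print "Total", total }
EOF

# Hypothetical input: name, code, population, GDP per person
printf 'Aland\tAL\t10\t5\nBorduria\tBO\t20\t3\n' > "$dir/sample.txt"

# Lower case -f: read the program from a file
awk -f "$dir/gdp.awk" "$dir/sample.txt"
# prints:
# Aland 50
# Borduria 60
# Total 110

rm -r "$dir"
```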
AWK has the following built-in arithmetic functions:
| Function | Description |
|---|---|
| int(expr) | Truncate to integer. |
| rand() | Return a random number N, between 0 and 1, such that 0 ≤ N < 1. |
| srand([expr]) | Use expr as the new seed for the random number generator. If no expr is provided, use the time of day. Return the previous seed for the random number generator. |
| atan2(y, x) | Return the arctangent of y/x in radians. |
| cos(expr) | Return the cosine of expr, which is in radians. |
| sin(expr) | Return the sine of expr, which is in radians. |
| exp(expr) | The exponential function. |
| log(expr) | The natural logarithm function. |
| sqrt(expr) | Return the square root of expr. |
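A few of these functions can be tried directly in a BEGIN rule, which needs no input file. This is only a sketch with arbitrarily chosen arguments; note that int() cuts off the decimals toward zero and that atan2(0, -1) gives the number pi.

```shell
awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), exp(0), log(1), atan2(0, -1) }'
# prints: 3 -3 4 1 0 3.14159
```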
Print seven random numbers from 0 to 99, inclusive:
awk 'BEGIN { for (i = 1; i <= 7; i++) print int(100 * rand()) }'
rand() is a real number between 0 and 1.
It can be 0, but cannot be 1
i.e. if r = rand(), then 0 <= r && r < 1
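Since rand() never reaches 1, multiplying by 6, truncating and adding 1 gives a whole number from 1 to 6, like a die. The sketch below also calls srand() with a fixed seed so that repeated runs with the same awk give the same numbers; the seed 42 is an arbitrary choice, and calling srand() with no argument would use the time of day instead.

```shell
awk 'BEGIN {
    srand(42)                      # fixed seed (42 is an arbitrary choice)
    for (i = 1; i <= 5; i++)
        print 1 + int(6 * rand())  # whole number from 1 to 6
}'
```

The actual numbers depend on the awk implementation, so no output is shown here.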
tolower(str)
: Return a copy of str, with all the uppercase characters in str translated to their corresponding lowercase counterparts.
toupper(str)
: Return a copy of str, with all the lowercase characters in str translated to their corresponding uppercase counterparts.
length([s])
: Return the length of s, or the length of $0 if s is not supplied.
substr(s, i [, n])
: Return the substring of s starting at i, at most n characters long (the first character is at position 1). If n is omitted, use the rest of s.
Write an awk program that changes the first word to Title Case
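One possible way to approach this exercise, offered as a sketch rather than the only answer, is to split the first field into its first character and the rest, then glue the two pieces back together. The input lines are made up. Assigning to $1 makes awk rebuild the line with single spaces between the fields.

```shell
printf 'hello WORLD\naWK is fun\n' |
  awk '{ $1 = toupper(substr($1, 1, 1)) tolower(substr($1, 2)); print }'
# prints:
# Hello WORLD
# Awk is fun
```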
In an AWK script you can write comments to help you understand what is happening
This is super practical, since other people (or yourself) can understand the program later
Comments start with # and continue to the end of the line
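As a small sketch, here is a commented program run on one made-up line of tab-separated input. Comments can sit on their own line or after a command. When the program is typed inside single quotes on the command line, as here, avoid apostrophes in the comments, because the shell would treat them as the end of the program.

```shell
printf 'pens\t3\t4\n' | awk '
# Fields are separated by tabs
BEGIN { FS = "\t" }

# Multiply column 2 by column 3 and print it next to the name
{ print $1, $2 * $3 }   # a comment can also follow a command
'
# prints: pens 12
```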
The file /home/andres/population_total.csv has data for all years and all countries
Take a look at it with this command:
head /home/andres/population_total.csv
We want to change the shape of this table
The output should be in three columns: country, year and population
We need to use the -F option. Something like
awk -F ',' '{print $1, 1800, $2; print $1, 1801, $3; print $1, 1802, $4; }' /home/andres/population_total.csv
with one print command for every field
Can we do it smarter?
for loops
Like many other computer languages, awk can repeat the same commands several times
awk -F ',' '{for(i=2; i<=NF; i++) { print $1, 1798+i, $i } }' /home/andres/population_total.csv
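The loop can be tested on a tiny made-up table before running it on the real file. This sketch assumes that the first line is a header and that column 2 holds the year 1800; the pattern NR > 1 is an addition here that skips the header line, and the country name and numbers are invented.

```shell
printf 'country,1800,1801,1802\nAtlantis,10,11,12\n' |
  awk -F ',' 'NR > 1 { for (i = 2; i <= NF; i++) print $1, 1798 + i, $i }'
# prints:
# Atlantis 1800 10
# Atlantis 1801 11
# Atlantis 1802 12
```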
for loops have four parts
The general form of a for loop looks like this:
for(A; B; C){ D }
The word for, the two ; (semicolon) separators and the braces {} are fixed; A, B, C and D are the parts you write
A, C and D are normal awk commands or assignments
B is a TRUE/FALSE condition
The D part is repeated while B is true
B must become FALSE at some point, otherwise the loop never finishes
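To make the roles of the four parts concrete, the sketch below counts from 1 to 3 with a for loop and then does the same with a while loop, which spells out the order: A runs once, then B is tested, D runs, C runs, and B is tested again.

```shell
awk 'BEGIN {
    # A: i = 1    B: i <= 3    C: i++    D: print
    for (i = 1; i <= 3; i++) { print "for", i }

    # The same loop written with while: A once, then B, D, C repeatedly
    i = 1
    while (i <= 3) { print "while", i; i++ }
}'
# prints:
# for 1
# for 2
# for 3
# while 1
# while 2
# while 3
```

Here C (i++) is what eventually makes B false; without it the loop would run forever.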