operation | symbol |
---|---|
addition, sum | + |
subtraction, difference | - |
multiplication, product | * |
division, quotient | / |
remainder, modulo | % |
exponentiation, power | ^ |
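As a quick check of the operators, here is a minimal sketch that evaluates each one on the numbers 7 and 2. The values are chosen only for illustration, and the variable name ops is hypothetical. The BEGIN rule lets awk run without reading any input.

```shell
# Evaluate each arithmetic operator on 7 and 2 (illustrative values).
# BEGIN runs before any input is read, so no input file is needed.
ops=$(awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 7 ^ 2 }')
echo "$ops"   # → 9 5 14 3.5 1 49
```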
November 29, 2018
What does this command do?
seq 100 | awk '$1 % 7 == 0 {print $1}'
What does this command do?
seq 100 | awk '($1 % 3 == 0) && ($1 % 2 ==0) {print $1}'
The symbol = is used to assign a new value to a variable
variable = value
Variables are created automatically when you assign them
Variables can be numeric or text
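A small sketch of assignment, assuming two made-up variable names (x and name) that are not part of the original slides. It creates one numeric and one text variable, then reassigns the numeric one.

```shell
# x holds a number, name holds text; both are created by the assignment itself.
vars=$(awk 'BEGIN { x = 3; name = "world"; x = x * 2; print name, x }')
echo "$vars"   # → world 6
```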
operator | effect |
---|---|
x += increment | Add increment to the value of x. |
x -= decrement | Subtract decrement from the value of x. |
x *= coefficient | Multiply the value of x by coefficient. |
x /= divisor | Divide the value of x by divisor. |
x %= modulus | Set x to its remainder by modulus. |
x ^= power | Raise x to the power power. |
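The assignment operators can be tried one after another. This is only a sketch with an arbitrary starting value of 10; the comments show the value of x after each step.

```shell
# Apply each assignment operator in turn and print x after every step.
steps=$(awk 'BEGIN {
    x = 10
    x += 5;  print x    # 15
    x -= 3;  print x    # 12
    x *= 2;  print x    # 24
    x /= 4;  print x    # 6
    x %= 4;  print x    # 2
    x ^= 3;  print x    # 8
}')
echo "$steps"
```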
We want to read a value, and change it by 1
long form | short form |
---|---|
y = x; x += 1 | y = x++ |
y = x; x -= 1 | y = x-- |
x += 1; y = x | y = ++x |
x -= 1; y = x | y = --x |
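To see the difference between the two positions of ++ and --, here is a minimal sketch that resets x to 5 (an arbitrary value) before each case and prints y followed by x.

```shell
# Postfix forms give y the old value of x; prefix forms give y the new value.
incr=$(awk 'BEGIN {
    x = 5; y = x++; print y, x    # 5 6
    x = 5; y = ++x; print y, x    # 6 6
    x = 5; y = x--; print y, x    # 5 4
    x = 5; y = --x; print y, x    # 4 4
}')
echo "$incr"
```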
Besides the variables we create with assignments, we have some predefined variables
$1, $2, and so on: the fields of each record
$0: complete input record
NF: Number of Fields in the current record
NR: Number of the current Record
FILENAME: name of the current file being processed
FNR: Number of the current Record in the current file
Write an awk command that prints
Apply this to all .txt files in your folder
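A sketch of how the predefined variables behave with more than one input file. The two files (first.txt and second.txt) and their contents are made up for this illustration and are created in a temporary folder. Notice that NR keeps counting across files, while FNR restarts in each file.

```shell
# Create two small hypothetical input files in a temporary folder.
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/first.txt"
printf 'f\n' > "$dir/second.txt"

# For every record print: file name, global record number,
# record number within the file, and number of fields.
info=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' first.txt second.txt)
echo "$info"
# first.txt 1 1 3
# first.txt 2 2 2
# second.txt 3 1 1

rm -r "$dir"
```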
Comparison operators: ==, !=, <, >, <=, >=
Special conditions: BEGIN, END
Logical operators: &&, ||, and !
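Comparisons can be combined with the logical operators to build one condition. The following is only a sketch on the numbers produced by seq; the limits 3, 6, and 9 are arbitrary.

```shell
# Keep numbers that are (at least 3 and below 6) or exactly 9.
picked=$(seq 10 | awk '($1 >= 3 && $1 < 6) || $1 == 9 {print $1}')
echo $picked   # → 3 4 5 9

# ! negates a condition: keep numbers that are NOT even.
odd=$(seq 5 | awk '!($1 % 2 == 0) {print $1}')
echo $odd      # → 1 3 5
```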
BEGIN and END
These special conditions are only true once in every program
BEGIN is true before reading any record
END is true after reading all records
ls -l | awk '$6 == "Nov" { sum += $5 } END { print sum }'
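The same idea can be tried without depending on the files in your folder. In this sketch the three input lines (a name and a size) are made up; BEGIN prints a header once, the middle rule runs for every record, and END prints the total once.

```shell
report=$(printf 'a 10\nb 20\nc 12\n' | awk '
    BEGIN { print "name size" }         # once, before the first record
          { sum += $2; print $1, $2 }   # for every record
    END   { print "total", sum }        # once, after the last record
')
echo "$report"
# name size
# a 10
# b 20
# c 12
# total 42
```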
FS: Field Separator. Regex that separates fields
RS: Record Separator. Regex that separates records
OFS: Output Field Separator. Text to separate printed fields
ORS: Output Record Separator. Text to separate printed records
Notice that Output separators are text, but Input separators are regular expressions
By default,
ORS="\n", that is, records are separated by new line
OFS=" ", that is, fields are separated by space
Here we find something new
Most characters are easy to write. Just use the keyboard
Some are harder, because they have other meanings, or they are not on the keyboard
To write them, we use \ followed by a letter
There are several cases, such as tab, new line, beep, backspace
(if you are curious, look at “line endings in UNIX vs. Windows”)
The most common special characters are written as follows
name | how to write it |
---|---|
TAB | \t |
new line | \n |
backslash | \\ |
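A short sketch that uses these special characters in the output separators and in a printed text. The input lines and the variable names (tabbed, oneline, slash) are made up for illustration.

```shell
# OFS="\t": printed fields are separated by a TAB instead of a space.
tabbed=$(printf 'a b c\n' | awk 'BEGIN { OFS = "\t" } { print $1, $2, $3 }')
printf '%s\n' "$tabbed"

# ORS=";": printed records end with ";" instead of a new line.
oneline=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ";" } { print $1 }')
printf '%s\n' "$oneline"   # → a;b;c;

# "\\" inside an awk text prints a single backslash.
slash=$(awk 'BEGIN { print "one\\two" }')
printf '%s\n' "$slash"     # → one\two
```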
Notice that Input separators are regular expressions
By default,
RS="\n", that is, records are separated by new line
FS=/[ \t]+/, that is, fields are separated by whitespace: one or more spaces or tabs
This is why the file /home/andres/world_2007.txt is different from /home/andres/gapminder-2007.txt
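To see how the field separator changes the number of fields, here is a sketch on a single made-up line (it is not taken from either file). The line has a two-word name, a TAB, and a number.

```shell
# Default FS: the space inside the name also splits, so there are 3 fields.
default_nf=$(printf 'Costa Rica\t4\n' | awk '{ print NF }')
echo "$default_nf"   # → 3

# FS="\t": only the TAB splits, so there are 2 fields.
tab_nf=$(printf 'Costa Rica\t4\n' | awk 'BEGIN { FS = "\t" } { print NF }')
echo "$tab_nf"       # → 2
```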
Write an awk command that prints the file name and the number of fields for /home/andres/world_2007.txt and /home/andres/gapminder-2007.txt
We can change the input’s field separator to process different kinds of files
For example, the list of users in UNIX is stored in the file /etc/passwd
busrabal:x:1060:1022:Busra Bala,,,:/home/busrabal:/bin/bash
simay-24:x:1061:1006:Simay Goknil Urek,0401170068:/home/simay-24:/bin/bash
mert-sir:x:1062:1006:Mert Sırmalı,0401170090:/home/mert-sir:/bin/bash
Here fields are separated by :
Let’s print the username (field 1) for the users in group (field 4) 1006
awk 'BEGIN {FS=":"} $4==1006 {print $1}' /etc/passwd
Write an awk command that prints the file name and the number of fields for /home/andres/world_2007.txt and /home/andres/gapminder-2007.txt
Changing FS is so useful that there is a shortcut for it
awk has the -F option for it (upper case F)
awk -F ":" '$4==1006 {print $1}' /etc/passwd
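Both ways of setting the separator should give the same answer. This sketch checks it on a temporary copy of the three sample lines shown earlier, so it does not depend on the real /etc/passwd of your system. The variable names are made up.

```shell
# Put the three sample lines in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
busrabal:x:1060:1022:Busra Bala,,,:/home/busrabal:/bin/bash
simay-24:x:1061:1006:Simay Goknil Urek,0401170068:/home/simay-24:/bin/bash
mert-sir:x:1062:1006:Mert Sırmalı,0401170090:/home/mert-sir:/bin/bash
EOF

# Same rule, separator set in BEGIN or with the -F option.
with_begin=$(awk 'BEGIN {FS=":"} $4==1006 {print $1}' "$sample")
with_flag=$(awk -F ":" '$4==1006 {print $1}' "$sample")
echo "$with_flag"
# simay-24
# mert-sir

rm "$sample"
```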
awk reads the input files one line at a time
For each line, awk tries the patterns of each rule
If several patterns match, then several actions execute in the order in which they appear in the awk program
If no patterns match, then no actions run.
After processing all the rules that match the line, awk reads the next line.
This continues until the program reaches the end of the file.
For example, the following awk program contains two rules:
/12/ { print $0 }
/21/ { print $0 }
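A sketch of these two rules on four made-up input lines. The line 1221 matches both patterns, so it is printed twice; the line 33 matches none, so it is not printed.

```shell
matches=$(printf '12\n21\n1221\n33\n' | awk '/12/ { print $0 } /21/ { print $0 }')
echo "$matches"
# 12
# 21
# 1221
# 1221
```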
Sooner or later you will have too many rules to fit on the command line
And it becomes hard to write them all again and again
In this case we can write all the rules in an .awk file
We use a text editor (like nano or vim) to edit it
Let’s write the file gdp.awk with this content
BEGIN { FS="\t" }
NF>0 { gdp = $2*$5; total+=gdp; print $1,gdp }
END { print "Total", total }
To tell awk to read commands from a file, we use the option -f (lower case f)
awk -f gdp.awk /home/andres/gapminder-2007.txt
Be careful not to confuse -f (program file) with -F (field separator)
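To try the -f option without the real data, this sketch writes gdp.awk and a tiny made-up input file into a temporary folder. The input (two TAB-separated records with five columns and one empty line) is hypothetical and only stands in for gapminder-2007.txt, whose columns are not shown here. The empty line has no fields, so the rule with NF>0 skips it.

```shell
work=$(mktemp -d)

# The program file, one rule per line.
cat > "$work/gdp.awk" <<'EOF'
BEGIN { FS="\t" }
NF>0 { gdp = $2*$5; total+=gdp; print $1,gdp }
END { print "Total", total }
EOF

# Hypothetical input: columns 2 and 5 hold the numbers that get multiplied.
printf 'A\t10\tx\ty\t2\n\nB\t3\tx\ty\t5\n' > "$work/sample.txt"

result=$(awk -f "$work/gdp.awk" "$work/sample.txt")
echo "$result"
# A 20
# B 15
# Total 35

rm -r "$work"
```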