November 28, 2018

AWK

  • Programs in awk consist of condition–action pairs.
  • Conditions are either true or false
  • Actions are blocks of commands, wrapped in {}
  • An action without a condition always runs.
  • A condition without an action runs the default { print $0 }.
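
As a small sketch of these rules, the three commands below run on a made-up three-line input. The sample text and the shell variable names (input, both, action_only, condition_only) are assumptions chosen only for this illustration.

```shell
# Sample input, invented for this illustration
input='alpha beta
gamma
delta epsilon zeta'

# Condition and action: print the first word of lines with more than one word
# (NF is the number of fields in the current line)
both=$(printf '%s\n' "$input" | awk 'NF > 1 { print $1 }')

# Action without a condition: runs on every line
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Condition without an action: prints the whole line when the condition is true
condition_only=$(printf '%s\n' "$input" | awk 'NF == 1')

echo "$both"
echo "$action_only"
echo "$condition_only"
```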

Everything is optional

but not at the same time

awk 'statements' file1 file2 ...
  • If there is no condition, the action runs on every input line
  • If there is no action, awk prints the whole line when the condition is true
  • If there is no input file, awk reads from the standard input

AWK data model

  • Each file is a sequence of records
    • By default one line is one record
  • Each record contains several fields
    • By default each word is a field

AWK automatic variables

There are many automatic variables in awk, such as

  • $1, $2, and so on are the fields of each record
    • i.e., the first word, the second word
  • $0 is the complete input record
  • NF is the Number of Fields in the current record
    • i.e., the number of words
  • NR is the Number of the current Record
    • i.e., the line number
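
To see these variables at work, here is a minimal sketch on a two-line input invented for this example. The commas in print separate the printed values with a single space.

```shell
# Two sample lines, invented for this illustration
input='the quick brown fox
hello world'

# For each line: line number, number of words, first word, second word
fields=$(printf '%s\n' "$input" | awk '{ print NR, NF, $1, $2 }')

echo "$fields"
```

The first output line should start with 1 4 (line 1 has 4 words) and the second with 2 2.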

Trivial Example

This program is equivalent to the cat command

awk '{ print $0 }'

It copies the standard input to its standard output

It also works with one or more files

awk '{ print $0 }' file1 file2 ...

Actions with many commands

If we want to use several commands in an action, we have to separate them with ;

For example, this program prints the number of words in each line, followed by the line itself

awk '{ print NF; print $0 }'

The output has 2 lines for each line in the input

More actions

So far we have seen only one command: print

There are several more commands, starting with assignments

Example

x = 1

Assignments

The symbol = is used to assign a new value to a variable

variable = value

Variables are created automatically when you assign them

Variables can be numeric or text

Arithmetic

To create a value we can use the basic arithmetic operators

  • + addition, sum
  • - subtraction, difference
  • * multiplication, product
  • / division, quotient
  • % remainder, modulo
  • ^ exponentiation, power

Examples

a = 4*x^2 - 2*x + 1;
b = n / 2;
c = n % 2;
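
A quick way to try these expressions is to feed the values in as fields. In this sketch the input values 3 and 7 (used as x and n) are assumptions picked for the example.

```shell
# x comes from the first field, n from the second
result=$(echo "3 7" | awk '{
    x = $1
    n = $2
    a = 4*x^2 - 2*x + 1   # ^ binds tighter than *, so this is 4*9 - 6 + 1
    b = n / 2             # division keeps the fractional part
    c = n % 2             # remainder of n divided by 2
    print a, b, c
}')

echo "$result"
```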

Updating values

It is very common to write something like this

x = x + y

The value of x is incremented by y

There is a shorter way to write this

x += y

Assignments

  • x += increment: add increment to the value of x
  • x -= decrement: subtract decrement from the value of x
  • x *= coefficient: multiply the value of x by coefficient
  • x /= divisor: divide the value of x by divisor
  • x %= modulus: set x to its remainder by modulus
  • x ^= power: raise x to the power power
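
The following sketch applies each of these operators in turn, printing x after every step. The starting value 10 and the operands are arbitrary choices for the illustration.

```shell
# Start from x = 10 and update it step by step
steps=$(echo 10 | awk '{
    x = $1
    x += 5; print x   # 10 + 5
    x -= 3; print x   # 15 - 3
    x *= 2; print x   # 12 * 2
    x /= 4; print x   # 24 / 4
    x %= 4; print x   # remainder of 6 divided by 4
    x ^= 3; print x   # 2 to the power 3
}')

echo "$steps"
```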

Text variables

To assign a text constant, it must be inside ""

name = "Andres"
surname = "Aravena"

The basic operation on text variables is concatenation

full_name = name " " surname

We just put the text variables or constants next to each other. There is no symbol for concatenation
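
The assignments above can be tried from the shell as in this sketch. The bare echo only supplies one empty input line so that the action runs once.

```shell
# Double quotes delimit text constants inside the single-quoted awk program
full=$(echo | awk '{
    name = "Andres"
    surname = "Aravena"
    full_name = name " " surname   # three pieces of text placed side by side
    print full_name
}')

echo "$full"
```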

Text to number and back

Text and numbers can be mixed when it makes sense

For example, if a=1 and b="2", then

  • a + b is equal to 3
    • b is used as a number, since + is a number operation
  • a b is equal to "12"
    • a is used as a text, since concatenation is a text operation

The result depends on the operations
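
A minimal sketch of this example, run from the shell (the bare echo supplies a single empty line so the action runs once):

```shell
mixed=$(echo | awk '{
    a = 1
    b = "2"
    print a + b   # + is a number operation, so b is read as the number 2
    print a b     # concatenation is a text operation, so a is read as "1"
}')

echo "$mixed"
```

The first line printed should be 3 and the second 12.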

Initial values

If we use the value of a variable that has never been assigned, the result is an empty text ""

Thus, we can do this

all = all $0

and the variable all will collect all the text in the file (without the line breaks, since $0 does not include them)

Initial values

If the empty text "" is used in a numeric context, then its value is 0

Thus, we can do this

n += NF

After processing the whole file, the variable n will contain the number of words in the file
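
Both ideas can be checked together, as in this sketch. It uses the END condition (true after the last record) to print the collected values; the two input lines are invented for the example.

```shell
# Sample input, invented for this illustration
input='one two
three four five'

# all starts as "" and n starts as 0, with no explicit initialization
summary=$(printf '%s\n' "$input" | awk '
    { all = all $0; n += NF }
    END { print all; print n }')

echo "$summary"
```

Note that the lines are glued together with nothing between them, because $0 does not include the line break.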

Conditions

  • comparisons
    • ==, !=, <, >, <=, >=
  • regular expressions
    • matching anywhere
    • matching a single column
  • BEGIN, END
  • combinations using &&, || and !

Comparisons

a == b
a is equal to b. Comparison uses ==, assignment uses =
a != b
a is not equal to b
a < b
a is less than b
a > b
a is greater than b
a <= b
a is less than or equal to b
a >= b
a is greater than or equal to b
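
Here is a short sketch of comparisons used as conditions, including combinations with && and ||. The fruit names and numbers are sample data made up for the example.

```shell
# Sample input: a name and a number on each line (invented data)
input='apple 3
banana 12
cherry 7'

# Numeric comparison on the second field
big=$(printf '%s\n' "$input" | awk '$2 >= 7 { print $1 }')

# Both comparisons must be true
middle=$(printf '%s\n' "$input" | awk '$2 > 3 && $2 < 10 { print $1 }')

# At least one comparison must be true; text constants go inside ""
either=$(printf '%s\n' "$input" | awk '$1 == "apple" || $2 == 12 { print $1 }')

echo "$big"
echo "$middle"
echo "$either"
```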

Regular Expressions

We write regular expressions surrounded by //

  • /regex/ is true if any part of the record matches regex
  • $2 ~ /regex/ is true if the second field matches regex

In general you can use ~ (tilde) to test whether any variable or field matches a regular expression
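
The difference between matching anywhere and matching a single field can be seen in this sketch, again on invented sample data.

```shell
# Sample input, invented for this illustration
input='apple red
banana yellow
cherry red'

# /e/ looks at the whole record: every line contains an "e" somewhere
anywhere=$(printf '%s\n' "$input" | awk '/e/ { print $1 }')

# $1 ~ /e/ looks only at the first field: "banana" has no "e"
first_field=$(printf '%s\n' "$input" | awk '$1 ~ /e/ { print $1 }')

# ! negates a condition: lines that do not contain "red"
not_red=$(printf '%s\n' "$input" | awk '!/red/')

echo "$anywhere"
echo "$first_field"
echo "$not_red"
```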

BEGIN and END

Each of these special conditions is true only once per program run

  • BEGIN is true before reading any record
    • We can define initial values for some variables
  • END is true after reading all records
    • We can print the variables that we computed while reading the records
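
A minimal sketch that uses both conditions: BEGIN sets an initial value, the middle action runs once per record, and END prints the result. The numbers and the starting value 100 are arbitrary choices for the example.

```shell
# Sample input: one number per line (invented data)
input='4
8
15'

report=$(printf '%s\n' "$input" | awk '
    BEGIN { print "start"; total = 100 }   # runs before the first record
    { total += $1 }                        # runs for every record
    END { print "total:", total }          # runs after the last record
')

echo "$report"
```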

Examples

Print every line that has at least one field:

awk 'NF > 0' data

This is an easy way to delete blank lines from a file

Examples

Print the total number of bytes used by files:

ls -l | awk '{ x += $5 }
             END { print "total bytes: " x }'

Examples

Print the total number of kilobytes used by files:

ls -l | awk '{ x += $5 }
             END { print "total Kilobytes:", x / 1024 }'

Examples

Print a sorted list of the countries:

awk -F: '{ print $1 }' gapminder-2007.txt | sort

(this is like cut -d: -f1; the -F: option tells awk to split fields at : instead of at spaces)
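
To try this without the data file, the sketch below uses three made-up lines in a colon-separated layout. The real layout of gapminder-2007.txt is not shown here, so the sample rows are only an assumption.

```shell
# Sample colon-separated input (invented; not taken from gapminder-2007.txt)
input='Turkey:Asia
Chile:Americas
Kenya:Africa'

# -F: makes ":" the field separator, so $1 is the text before the first colon
countries=$(printf '%s\n' "$input" | awk -F: '{ print $1 }' | sort)

# The same result with cut, for comparison
with_cut=$(printf '%s\n' "$input" | cut -d: -f1 | sort)

echo "$countries"
```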

Examples

Count the lines in a file:

awk 'END { print NR }' gapminder-2007.txt

(this is like wc -l)

Examples

Print the even-numbered lines in the data file:

awk 'NR % 2 == 0' data

If we use NR % 2 == 1 instead, the program prints the odd-numbered lines