Source: World Health Organization website
COVID-19 data
I will show you some commands to load real data into R
Follow all the steps carefully
We will explain the commands later
This example shows what is possible
Open the following address in your web browser to download the file
https://covid19.who.int/WHO-COVID-19-global-data.csv
The extension .csv means “comma-separated values”
The file is saved in your Downloads folder
In RStudio, choose Environment → Import Dataset → From Text (base)…
Find the file that you already downloaded
This will move data from the disk to the main memory
Name the dataset covid
The console shows the command used to read the file
Next time we can import data by writing this command
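A minimal sketch of that command, assuming the base importer, which writes a read.csv() call to the console. The real file sits in your Downloads folder, so to keep the example runnable anywhere it first saves a small made-up table to a temporary file. The column names Country and New_cases are assumptions for illustration.

```r
# A made-up three-row table stands in for the WHO file
example <- data.frame(Country   = c("Turkey", "Turkey", "Chile"),
                      New_cases = c(5, 0, 12))
path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)

# read.csv() moves the data from the disk into a variable in memory.
# For the real download the call would look something like
#   covid <- read.csv("WHO-COVID-19-global-data.csv")
# with the path pointing to wherever the file was saved.
covid <- read.csv(path)
nrow(covid)    # number of rows (3 here)
names(covid)   # column names
```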
The new variable covid appears in the Environment panel
It is a big table, with thousands of rows
Write the following commands in R
Be careful with UPPER and lower case
There are two = signs in the command
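The exact commands from the slides are not reproduced here. This is a hedged sketch of the idea, assuming the table has columns named Country and New_cases (check the real names with names(covid)) and using made-up values: == compares every element and keeps only the matching rows. The last line shows why UPPER and lower case matter.

```r
# Made-up stand-in for the covid table (real column names may differ)
covid <- data.frame(Country   = c("Turkey", "Chile", "Turkey", "Chile"),
                    New_cases = c(5, 12, 0, 30))

# == compares each element; the result is a vector of TRUE/FALSE
is_turkey <- covid$Country == "Turkey"
is_turkey                # TRUE FALSE TRUE FALSE

# Keep only the New_cases values where the comparison is TRUE
Turkey <- covid$New_cases[is_turkey]
Turkey                   # 5 0

# "turkey" in lower case matches nothing, so the result is empty
covid$New_cases[covid$Country == "turkey"]   # numeric(0)
```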
What do we have in Environment now?
Turkey
[1] 5 0 4 0 0 42 0 144 0 475 277 0 289
[14] 293 343 561 1196 2069 1704 1869 1556 2704 2148 2456 2786 3013
[27] 3135 3148 3892 4117 4056 4747 5138 4789 4093 4062 4281 4801 4353
[40] 3783 3977 4674 4611 3083 3116 3122 2861 2357 2131 2392 2936 2615
[53] 2188 1983 1670 1614 1832 2253 1977 1848 1546 1542 1114 1704 1639
[66] 1635 1708 1610 1368 1158 1022 972 961 952 1186 1141 987 948
[79] 1035 1182 1141 983 839 827 786 867 988 930 878 914 989
[92] 993 922 987 1195 1459 1562 1592 1467 1429 1304 1214 1248 1192
[105] 1212 1268 1492 1458 1396 1372 1356 1374 1293 1192 1186 1172 1154
[118] 1148 1086 1053 1041 1024 1003 1016 1012 1008 992 947 933 926
[131] 918 924 931 928 902 913 937 921 927 919 963 942 967
[144] 982 996 987 995 1083 1178 1153 1185 1172 1182 1193 1183 1212
[157] 1243 1226 1256 1192 1233 1263 1303 1412 1203 1309 1217 1443 1502
[170] 1313 1491 1517 1549 1482 1587 1572 1596 1642 1612 1673 1578 1703
[183] 1761 1673 1512 1671 1509 1527 1716
This is what we call a vector in R
In the previous class we used variables to store single numbers
It is useful to handle several values at the same time, grouped in a single variable
Vectors are the simplest objects in R
A vector is a group of values, all of the same type
For example, a set of numbers
[1] 2392 2936 2615 2188 1983 1670 1614 1832 2253 1977 1848
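A short sketch of “all of the same type”: a vector of numbers has type numeric, and if you mix a number with text, R converts every element to text.

```r
numbers <- c(2392, 2936, 2615)
class(numbers)    # "numeric"

mixed <- c(1, "a")
mixed             # "1" "a"  (the number became text)
class(mixed)      # "character"
```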
The structure of a variable is the way its data is organized
Vectors are the simplest way to organize data
We will learn others later
Previously we saw functions that work on a single number
[1] 3
Now we will use functions that work on a whole vector
length(): number of elements in the vector
[1] 189
sum(): total of all values in the vector
[1] 292878
min(): smallest value
[1] 0
max(): largest value
[1] 5138
mean(): mean value
[1] 1549.619
median(): median value
[1] 1233
var(): variance
[1] 1074685
sd(): standard deviation
[1] 1036.67
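To check these functions by hand, here is a small made-up vector where every result is easy to verify. Note that var() divides by n - 1, not by n, and sd() is the square root of var().

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # small vector, easy to check by hand
length(x)   # 8
sum(x)      # 40
min(x)      # 2
max(x)      # 9
mean(x)     # 5
median(x)   # 4.5  (average of the two middle values, 4 and 5)
var(x)      # 4.571429  (32 / 7: squared deviations divided by n - 1)
sd(x)       # 2.13809   (square root of the variance)
```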
What happens if you use sqrt() on a vector?
The function c() (“concatenate”) takes many values and makes a single vector
[1] 1 2 3
[1] 10 20
Concatenation means “to put in the same chain”
We use the <- operator for assignment
Now we look inside the variables
[1] 1 2 3
[1] 10 20
Variables x and y are two vectors
We can concatenate them into a larger vector
[1] 1 2 3 10 20 5
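The commands behind the outputs above are not shown in the slides. This sketch assumes the values x = 1, 2, 3 and y = 10, 20, read from the printed results, plus an extra 5 at the end.

```r
x <- c(1, 2, 3)    # assignment with <-
y <- c(10, 20)
x                  # 1 2 3
y                  # 10 20

z <- c(x, y, 5)    # concatenate two vectors and one more number
z                  # 1 2 3 10 20 5
```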
If the vector has only one element, you do not need to use c()
Instead of c(3), just write 3
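A quick check that a single number is already a vector of length 1:

```r
single <- 3
identical(c(3), single)   # TRUE: c(3) and 3 are the same object
length(single)            # 1
```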
A common case is a vector with the same value repeated several times
For that case, we use the rep() function
[1] 1 1 1
The first input of rep() is the value to repeat
The second is how many times to repeat it
rep() can work with vectors
The first input can be a vector
[1] 7 9 13 7 9 13 7 9 13 7 9 13
The complete vector is repeated
The second input can also be a vector; then both vectors must have the same length
[1] 7 7 9 13 13 13
Each element of the first vector is repeated according to the corresponding value in the second vector
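The three uses of rep() above, as a sketch that reproduces the printed results. The last line is left as a comment because it stops with an error: the lengths do not match.

```r
ones  <- rep(1, 3)                      # 1 1 1
whole <- rep(c(7, 9, 13), 4)            # whole vector repeated 4 times
each  <- rep(c(7, 9, 13), c(2, 1, 3))   # 7 twice, 9 once, 13 three times
ones
whole   # 7 9 13 7 9 13 7 9 13 7 9 13
each    # 7 7 9 13 13 13

# rep(c(7, 9, 13), c(2, 1))   # error: "invalid 'times' argument"
```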
A vector with numbers between 4 and 20
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Two numbers separated by : result in a vector from the first up to the second
a:b becomes c(a, a+1, a+2, …, b)
A vector with numbers between 4 and 20
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
This is the same as 4:20
It is easier to write using :
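A sketch of the : operator. One detail worth knowing: 4:20 stores whole numbers (integers) while c(4, 5, ...) stores decimal numbers, so the values are equal with == even though identical() would return FALSE.

```r
v <- 4:20
v                # 4 5 6 ... 20
length(v)        # 17

# Same values as writing every number by hand
all(v == c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20))  # TRUE

down <- 5:1      # counting down also works
down             # 5 4 3 2 1
```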
The seq() function
We can go from 4 to 20 in steps of 2
[1] 4 6 8 10 12 14 16 18 20
The function can take an extra input
But it is hard to remember what it means
Instead of
[1] 4 6 8 10 12 14 16 18 20
we can write
[1] 4 6 8 10 12 14 16 18 20
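The two forms, assuming the example uses the inputs from, to, and by (the slides do not show the exact calls). Both produce the same vector; the named form is easier to read.

```r
by_position <- seq(4, 20, 2)                   # inputs by position: from, to, by
by_name     <- seq(from = 4, to = 20, by = 2)  # the same call with names
by_position   # 4 6 8 10 12 14 16 18 20
by_name       # 4 6 8 10 12 14 16 18 20
```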
seq() is more flexible
We can say how many numbers we want, instead of the last value
[1] 4 5 6 7 8 9 10 11 12 13 14 15
seq() fills in the missing inputs
[1] 4.0 4.5 5.0 5.5 6.0
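A sketch of the two examples above, written to match the printed results; length.out is the name of the “how many numbers” input in base R's seq(). In the second call seq() works out the missing step by itself.

```r
first12 <- seq(from = 4, length.out = 12)       # 12 numbers starting at 4
first12                                          # 4 5 6 ... 15

five <- seq(from = 4, to = 6, length.out = 5)   # seq() computes the step
five                                             # 4.0 4.5 5.0 5.5 6.0
diff(five)                                       # the step: 0.5 each time
```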
The inputs go inside round parentheses
Their role is given by position or by name
If an input is optional, you should write its name
The help page shows the default value of each input
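A small sketch of inputs given by name: once the names are written, their order no longer matters.

```r
a <- seq(to = 20, by = 2, from = 4)   # named inputs, in any order
b <- seq(4, 20, 2)                    # positional inputs
identical(a, b)                       # TRUE

# To see each input and its default value, open the help page with ?seq
```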
[1] 1 2 3 4 5 6 7 8
[1] 2 5 8 11 14 17 20 23
(this is just an example)
[1] 3 7 11 15 19 23 27 31
[1] -1 -3 -5 -7 -9 -11 -13 -15
[1] 2 10 24 44 70 102 140 184
Arithmetic works component by component
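The slides print x and y but not the commands that built them. This sketch assumes x <- 1:8 and builds y with seq() so that the printed results above are reproduced.

```r
x <- 1:8
y <- seq(2, 23, by = 3)   # an arbitrary second vector: 2 5 8 ... 23

x + y    # 3 7 11 15 19 23 27 31
x - y    # -1 -3 -5 -7 -9 -11 -13 -15
x * y    # 2 10 24 44 70 102 140 184
```

Each result uses the first element of x with the first of y, the second with the second, and so on.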
[1] 3 4 5 6 7 8 9 10
[1] -1 0 1 2 3 4 5 6
[1] 2 4 6 8 10 12 14 16
Again, component by component
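Operations between a vector and a single number, reproducing the outputs above under the same assumption x <- 1:8. The number is applied to every element.

```r
x <- 1:8
x + 2   # 3 4 5 6 7 8 9 10
x - 2   # -1 0 1 2 3 4 5 6
x * 2   # 2 4 6 8 10 12 14 16
```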
Question: What happens if the vectors do not have the same length?
We can make new vectors by combining c(), rep(), and seq()
a:b is shorthand for seq(from=a, to=b)
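A sketch that combines the three functions into one vector and checks the a:b shorthand; the values are arbitrary.

```r
v <- c(rep(0, 2), seq(1, 5, by = 2), 10:12)
v                                        # 0 0 1 3 5 10 11 12

all(4:20 == seq(from = 4, to = 20))      # TRUE: a:b gives the same values
```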