either numeric, logical, or text
The variables are chosen before doing the experiment
The observations are collected during the experiment
It is hard to add new columns in a text file
But it is very easy to add rows
Therefore we write observations as rows,
and variables as columns
One observation on each row
One variable on each column
Data enters the computer from instruments
Most modern instruments have digital output
In some cases it has to be entered manually
This is dangerous, because humans make many mistakes
For us, data always comes from another program
There are several file formats used to store data tables
The most common are
For now, we work with tab- and comma-separated values
Today we will use data from
http://www.dry-lab.org/static/2020/cmb1/students2018-2020.tsv
Take a look at it
What can you say about it?
The classical way to read this data is through the menu
Environment → Import Dataset → From Text (base)…
which corresponds to the command
(you can load data with the menu or the keyboard)
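The command is missing from these notes. As a minimal sketch, assuming base R's read.delim() (which reads tab-separated text), it could look like the lines below. To run without internet, the sketch reads three rows copied from the survey through the text argument; the name tsv_text is made up, and the real call with the URL is left as a comment.

```r
# Real data (needs internet), assuming read.delim() is used:
# survey <- read.delim("http://www.dry-lab.org/static/2020/cmb1/students2018-2020.tsv")

# Offline sketch: three rows copied from the survey, values separated by tabs (\t)
tsv_text <- paste(
  "answer_date\tid\tsex\thandness\thand_span",
  "2018-09-17\t3e501d\tMale\tRight\t15",
  "2018-09-17\t479d88\tFemale\tRight\t14",
  "2018-09-17\t39df0d\tFemale\tRight\t18",
  sep = "\n"
)
survey <- read.delim(text = tsv_text)  # the first line gives the column names
survey
```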
answer_date id english_level sex
1 2018-09-17 3e501d I can speak fluently Male
2 2018-09-17 479d88 I can understand movies without subtitles Female
3 2018-09-17 39df0d I can read and understand technical papers Female
4 2018-09-17 d2b091 I can read and understand technical papers Male
5 2018-09-17 f22b12 I can read and understand technical papers Female
6 2018-09-17 849c75 İngilizce bilmiyorum Female
7 2018-09-17 83812b I can speak fluently Female
8 2018-09-17 b0dde9 I can read and understand technical papers Male
9 2018-09-17 297223 I can read and understand technical papers Female
10 2018-09-17 72c073 I can read and understand technical papers Female
11 2018-09-17 d29251 I can read and understand technical papers Male
12 2018-09-17 6f0831 I can read and understand technical papers Female
13 2018-09-17 75b355 I can read and understand technical papers Female
14 2018-09-17 0b0da7 I can read and understand technical papers Female
15 2018-09-17 352b9f I can read and understand technical papers Female
16 2018-09-17 6f28ac I can read and understand technical papers Female
17 2018-09-17 ee5ef4 I can read and understand technical papers Female
18 2018-09-17 ba52ec I can read and understand technical papers Male
19 2018-09-17 9d98b6 I can read and understand technical papers Female
20 2018-09-17 f92274 I can speak fluently Female
21 2018-09-17 1c7531 I can read and understand technical papers Female
22 2018-09-17 8c9730 I can understand movies without subtitles Male
23 2018-09-18 371f15 I can read and understand technical papers Female
24 2018-09-18 52766e I can read and understand technical papers Female
25 2018-09-18 644c22 I can read and understand technical papers Female
26 2018-09-18 df8cf1 I can read and understand technical papers Female
27 2018-09-18 c0bd32 I can understand movies without subtitles Female
28 2018-09-19 ddbc78 İngilizce bilmiyorum Female
29 2018-09-19 6c394f I can understand movies without subtitles Male
30 2018-09-19 9fb139 İngilizce bilmiyorum Female
31 2018-09-20 70bd4d I can write poetry better than Shakespeare Male
32 2018-09-20 567104 I can read and understand technical papers Female
33 2018-09-20 b2571a I can read and understand technical papers Female
34 2018-09-20 dcc268 I can read and understand technical papers Male
35 2018-09-20 ac1b6f I can understand movies without subtitles Male
36 2018-09-20 89cd86 I can speak fluently Male
37 2018-09-20 ba5f4b I can read and understand technical papers Female
38 2018-09-20 ba5f4b I can read and understand technical papers Female
39 2018-09-21 b45951 İngilizce bilmiyorum Male
40 2018-09-21 c6208d I can read and understand technical papers Male
41 2018-09-23 412ea2 I can understand movies without subtitles Female
42 2018-09-24 b741bc I can read and understand technical papers Female
43 2018-09-24 715173 I can read and understand technical papers Female
44 2018-09-24 bc23db I can read and understand technical papers Male
45 2018-09-24 e9d1f5 I can read and understand technical papers Male
46 2018-09-24 08d7a1 English is my native language Female
47 2018-09-24 08d7a1 English is my native language Female
48 2018-09-24 219959 I can understand movies without subtitles Female
49 2018-09-24 383ce5 İngilizce bilmiyorum Female
50 2018-09-24 7b5198 I can speak fluently Female
51 2018-09-24 68efdf I can read and understand technical papers Female
52 2018-09-24 7afb3f İngilizce bilmiyorum Male
53 2018-09-24 cbda9b I can read and understand technical papers Male
54 2018-09-24 3a597c I can speak fluently Male
55 2018-09-24 cd7205 I can read and understand technical papers Male
56 2018-09-24 dcaf3d I can understand movies without subtitles Male
57 2018-09-24 dcaf3d I can understand movies without subtitles Male
58 2018-09-29 70de11 I can read and understand technical papers Female
59 2018-10-04 b43e2b I can read and understand technical papers Male
60 2018-10-06 3b85c4 I can understand movies without subtitles Female
61 2018-10-08 6961a2 I can understand movies without subtitles Male
62 2018-10-09 0dd83b I can read and understand technical papers <NA>
63 2018-10-11 213231 I can speak fluently Female
64 2018-10-11 998d64 İngilizce bilmiyorum Male
65 2018-10-15 008c4d I can understand movies without subtitles Male
66 2018-11-07 7955ff I can speak fluently Male
67 2018-11-09 a896b2 I can read and understand technical papers Female
68 2019-09-25 b2571a I can read and understand technical papers Female
69 2019-09-27 68a1cf İngilizce bilmiyorum Female
70 2019-09-27 dbf5bc I can read and understand technical papers Female
71 2019-09-29 a7ff02 İngilizce bilmiyorum Female
72 2019-10-01 cbda9b I can read and understand technical papers Male
73 2019-10-07 3a597c I can speak fluently Male
74 2019-10-09 213231 I can speak fluently Female
75 2019-10-09 1e2e83 I can understand movies without subtitles Male
76 2019-10-11 a45fe6 İngilizce bilmiyorum Female
77 2019-10-14 6961a2 I can understand movies without subtitles Male
78 2019-10-14 7b5198 I can speak fluently Female
79 2019-10-14 68efdf I can read and understand technical papers Female
80 2019-10-15 08d7a1 English is my native language Female
81 2020-10-19 70f3de I can speak fluently Female
82 2020-10-19 b81bd1 I can understand movies without subtitles Female
83 2020-10-19 692637 I can understand movies without subtitles Female
84 2020-10-19 42c891 English is my native language Male
85 2020-10-19 242bf7 I can understand movies without subtitles Female
86 2020-10-19 cd7205 I can speak fluently Male
87 2020-10-19 f8d60d I can speak fluently Female
88 2020-10-19 47e2e0 I can read and understand technical papers Female
89 2020-10-19 50988d I can read and understand technical papers Female
90 2020-10-19 60a92f I can read and understand technical papers Female
91 2020-10-19 432cf7 I can speak fluently Male
92 2020-10-19 9bba74 I can read and understand technical papers Female
93 2020-10-19 a7ff02 I can read and understand technical papers Female
94 2020-10-19 5012ed I can read and understand technical papers Male
95 2020-10-19 91e5e8 I can understand movies without subtitles Female
96 2020-10-19 fe26f8 I can understand movies without subtitles Female
97 2020-10-19 4f5875 I can speak fluently Female
98 2020-10-19 52b150 I can understand movies without subtitles Female
99 2020-10-21 d29251 I can read and understand technical papers Male
100 2020-10-21 849c75 İngilizce bilmiyorum Female
101 2020-10-21 c9a95d I can read and understand technical papers Female
102 2020-10-21 2f4b15 I can read and understand technical papers Female
103 2020-10-22 3fe6b5 I can read and understand technical papers Female
104 2020-10-22 412ea2 I can understand movies without subtitles Female
105 2020-10-23 a45fe6 I can read and understand technical papers Female
106 2020-10-23 287c3a I can understand movies without subtitles Female
107 2020-10-24 6961a2 I can understand movies without subtitles Male
108 2020-10-24 6961a2 I can understand movies without subtitles Male
109 2020-10-26 6e5137 I can speak fluently Female
110 2020-10-26 3a597c I can speak fluently Male
111 2020-10-26 f5dafd I can read and understand technical papers Female
112 2020-11-05 242bf7 I can understand movies without subtitles Female
113 2020-11-05 91e5e8 I can read and understand technical papers Female
114 2020-11-05 60a92f I can read and understand technical papers Female
115 2020-11-05 b041ba I can understand movies without subtitles Male
116 2020-11-06 c9b8b1 İngilizce bilmiyorum Female
117 2020-11-06 68a1cf I can read and understand technical papers Female
birthdate birthplace height_cm weight_kg handness hand_span
1 1993-02-01 turkey 179.00 67.0 Right 15.0
2 1998-05-21 Kahramanmaraş 1.68 55.0 Right 14.0
3 1998-01-18 Batman, Türkiye NA NA Right 18.0
4 1998-08-29 Antalya,Turkey 170.00 74.0 Right 25.0
5 1998-05-03 izmir 162.00 68.0 Right 13.0
6 1995-10-09 Türkiye / Yalova 167.00 58.0 Right 18.0
7 1997-09-19 Adıyaman,Turkey 174.00 72.0 Right 16.0
8 1997-11-27 Bursa 180.00 68.0 Right 19.0
9 1999-01-02 İstanbul/Türkiye 162.00 58.0 Right 19.0
10 1998-10-02 İstanbul,Turkey 172.00 55.0 Right 20.0
11 1997-05-18 VAN/TURKEY 181.00 81.0 Right 20.0
12 1997-12-08 <NA> NA NA Right 20.0
13 1997-10-13 Sümeyye Onat 155.00 42.5 Right 20.0
14 1998-02-03 Istanbul NA NA Right 30.0
15 1998-06-10 İstanbul 1.59 69.0 Right 18.0
16 1998-05-17 Samsun, Türkiye 165.00 58.0 Right 19.0
17 1997-07-07 Mardin,Turkey 166.00 47.0 Right 20.0
18 1998-10-13 gaziantep turkey 182.00 78.0 Right 21.0
19 1998-06-09 İstanbul,Turkey 158.00 57.0 Right 19.0
20 2018-09-03 Yıldırım, BURSA 1.64 55.0 Right 20.0
21 1998-09-17 Istanbul/Turkey 173.00 55.0 Right 8.0
22 1998-07-28 Bursa / TURKEY 185.00 65.0 Left 22.0
23 1998-08-17 Yalova 163.00 60.0 Right 15.0
24 1998-03-24 Ordu Turkey 167.00 50.0 Right 30.0
25 2018-04-24 Istanbul, Turkey NA NA Right 19.0
26 1997-10-13 İstanbul 171.00 52.0 Right 25.0
27 1997-05-18 Edirne, Türkiye 165.00 54.0 Right 18.0
28 1997-01-14 Malatya, Türkiye 162.00 75.0 Left 18.0
29 1997-06-25 <NA> 188.00 105.0 Right 20.0
30 1995-01-28 Türkiye/Hatay/Antakya 1.70 56.0 Left 18.0
31 2018-12-08 istanbul NA NA Right 20.0
32 1997-07-03 Çorum 160.00 50.0 Right 15.0
33 1996-01-04 İstanbul NA NA Left 15.0
34 1997-01-05 Muğla/Turkey 178.00 67.0 Right 24.0
35 1997-12-26 City 176.00 59.0 Right 24.0
36 1998-10-31 Istanbul, TURKEY 184.00 75.0 Right 22.0
37 1991-01-01 Suriye 160.00 60.0 Right 19.0
38 1991-01-01 Suriye 160.00 60.0 Right 19.0
39 1998-01-10 Yıldırım, Bursa 175.00 106.0 Right 15.0
40 1992-08-11 Malatya/Turky 1.80 94.0 Right 25.0
41 1999-05-02 Balıkesir 165.00 63.0 Left 17.0
42 1997-07-29 Istanbul/Türkiye 1.60 54.0 Right 20.0
43 1998-02-05 Nakhchivan/Azerbaijan 1.57 53.0 Right 20.0
44 1998-11-19 Azerbaijan 175.00 75.0 Right 20.0
45 1997-02-09 Sivas,Turkey 183.00 70.0 Right 20.0
46 1997-06-30 Ankara 158.00 65.0 Right 8.0
47 1997-06-30 Ankara 158.00 65.0 Right 8.0
48 1998-09-03 Samsun 174.00 55.0 Right 22.0
49 1998-11-16 Adana,türkiye 163.00 68.0 Right 13.0
50 1999-05-23 Almaty, Kazakhstan 178.00 55.0 Right 12.0
51 1998-04-07 istanbul 165.00 NA Right 9.0
52 1997-05-01 Antalya/Türkiye 173.00 80.0 Right 16.0
53 1996-09-26 Hatay/Turkey 175.00 77.0 Right 18.0
54 1993-03-14 Tekirdag / Turkey 195.00 85.0 Right 30.0
55 1997-12-06 turkey 166.00 65.0 Right 15.0
56 1998-11-06 İzmir-Turkey 163.00 64.0 Right 15.0
57 1998-11-06 İzmir-Turkey 163.00 64.0 Right 15.0
58 1998-09-01 Van,Turkey 174.00 60.0 Right 24.0
59 2018-01-15 Bursa,türkiye 175.00 76.0 Right 20.0
60 1996-04-05 Tunceli,Turkey 173.00 56.0 Right 21.0
61 1994-01-01 Aleppo NA 78.0 Right 25.0
62 <NA> <NA> NA NA Right 22.0
63 <NA> <NA> NA NA Right 17.0
64 1996-03-09 İstanbul 177.00 77.0 Right 23.0
65 1996-10-25 Safranbolu/KARABUK 181.00 72.0 Left 26.0
66 1994-01-05 <NA> NA NA Right 25.0
67 1998-04-18 İstanbul 165.00 58.0 Right 20.5
68 1996-01-04 İstanbul NA NA Left 20.0
69 1995-03-26 YALOVA 168.00 66.0 Right 18.0
70 1994-08-18 Edremit (Balıkesir) 1.64 52.0 Right 19.0
71 1997-03-23 Turkmenistan 179.00 NA Right 18.0
72 1996-09-26 Hatay/Antakya 175.00 73.0 Right 20.0
73 1993-03-14 Tekirdağ/Turkey 195.00 82.0 Right 25.0
74 2019-06-06 İstanbul 160.00 55.0 Right 17.0
75 1996-10-25 İstanbul 180.00 86.0 Right 23.0
76 1997-02-03 Sivas 161.00 63.0 Right 18.0
77 1994-01-01 Aleppo/Syria 183.00 85.0 Right 22.0
78 1999-05-23 Almaty, Kazakhstan 178.00 58.0 Right 21.0
79 1998-04-07 istanbul 165.00 65.0 Right 20.0
80 1997-06-30 Ankara 158.00 65.0 Right 14.0
81 2000-11-07 Konya, Türkiye 165.00 70.0 Right 18.0
82 2001-12-25 Afyon, Türkiye 169.00 NA Right 21.0
83 1999-05-23 Antalya, Türkiye 167.00 47.0 Right 20.0
84 1994-01-05 tekirdag 1.80 82.0 Right 21.0
85 2001-11-01 İstanbul, Türkiye 162.00 70.0 Left 16.0
86 1997-06-12 Kırklareli 169.00 75.0 Right 20.0
87 1998-02-20 Aydın, Turkey 165.00 47.0 Right 21.0
88 1997-07-24 İstanbul,Turkey 168.00 72.0 Right 21.0
89 2000-12-28 Hannover, Germany 171.00 NA Right 18.0
90 1998-12-28 Istanbul/ Turkey 171.00 61.0 Right 21.0
91 2001-07-04 Mersin, Turkey 184.00 79.0 Right 25.0
92 2000-01-22 TÜrkiye/Bursa 165.00 55.0 Right 14.0
93 1997-03-23 Turkmenistan 179.00 NA Right 21.0
94 1999-10-29 Bodrum/Muğla 180.00 74.0 Left 23.0
95 2000-07-26 Afyonkarahisar, Turkey 164.00 47.0 Right 19.0
96 2000-04-15 Istanbul/ Turkey 156.00 54.0 Right 15.0
97 1998-01-21 Istanbul NA NA Right 19.0
98 2000-12-06 Ordu/Turkey 1.63 60.0 Left 19.0
99 1997-05-18 VAN / TURKEY 183.00 74.0 Right 19.5
100 1995-10-09 OSMANGAZİ, TÜRKİYE 167.00 56.0 Right 17.0
101 1996-08-14 Manisa/ Turkey NA NA Right 18.0
102 1998-08-02 Turkey /İstanbul 1.75 65.0 Right 20.0
103 1999-03-21 Istanbul, Turkey 162.00 49.0 Right 17.0
104 1999-05-02 Turkey 168.00 63.0 Left 18.0
105 1997-02-03 Sivas,Turkey 161.00 65.0 Right 18.0
106 1999-06-22 İstanbul, Türkiye 165.00 47.0 Right 18.0
107 1994-01-01 istanbul 184.00 90.0 Right 23.0
108 1994-01-01 istanbul 184.00 90.0 Right 23.0
109 2001-08-01 Istanbul/ Turkey 162.00 76.0 Right 24.0
110 1993-03-14 Tekirdağ, Turkey 195.00 88.0 Right 24.0
111 1977-03-08 İstanbul 167.00 80.0 Right 22.0
112 2001-11-01 İstanbul/Türkiye 162.00 72.0 Left 16.0
113 2000-07-26 Afyonkarahisar, Turkey 164.00 47.0 Right 19.0
114 1998-12-28 İstanbul 171.00 61.0 Right 21.0
115 1991-11-15 Istanbul 192.00 95.0 Right 26.0
116 1996-01-18 istanbul,turkey 168.00 67.0 Right 21.0
117 1995-03-26 YALOVA / TÜRKİYE 168.00 80.0 Right 15.0
Data frames are two-dimensional structures
Each column can be of a different type
All columns have the same length
All columns need a name
Usually too big to print
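A minimal sketch of these properties, using values copied from the first three survey rows. The column name left_handed and its logical values are made up for illustration. The second call fails on purpose, because its columns have different lengths.

```r
# One observation per row, one variable per column
students_small <- data.frame(
  id          = c("3e501d", "479d88", "39df0d"),  # text
  height_cm   = c(179, 1.68, NA),                 # numeric (NA means missing)
  left_handed = c(FALSE, FALSE, FALSE)            # logical (hypothetical column)
)
str(students_small)  # each column keeps its own type

# Columns of different length cannot form a data frame
result <- tryCatch(
  data.frame(id = c("a", "b", "c"), height_cm = c(170, 180)),
  error = function(e) "error: columns have different lengths"
)
result
```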
How can we see survey?
In RStudio we can use the command View(survey)
But this does not work in R Markdown,
so we cannot use it in a paper or report
answer_date id english_level sex
1 2018-09-17 3e501d I can speak fluently Male
2 2018-09-17 479d88 I can understand movies without subtitles Female
3 2018-09-17 39df0d I can read and understand technical papers Female
4 2018-09-17 d2b091 I can read and understand technical papers Male
5 2018-09-17 f22b12 I can read and understand technical papers Female
6 2018-09-17 849c75 İngilizce bilmiyorum Female
birthdate birthplace height_cm weight_kg handness hand_span
1 1993-02-01 turkey 179.00 67 Right 15
2 1998-05-21 Kahramanmaraş 1.68 55 Right 14
3 1998-01-18 Batman, Türkiye NA NA Right 18
4 1998-08-29 Antalya,Turkey 170.00 74 Right 25
5 1998-05-03 izmir 162.00 68 Right 13
6 1995-10-09 Türkiye / Yalova 167.00 58 Right 18
Notice that there are too many columns to fit in one block
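In a report, a common alternative is head(), which returns only the first rows of a data frame (six by default). The short printout of survey above looks like what head(survey) produces, but that exact call is an assumption. A small sketch using the id and sex of the first seven survey rows:

```r
survey_small <- data.frame(
  id  = c("3e501d", "479d88", "39df0d", "d2b091", "f22b12", "849c75", "83812b"),
  sex = c("Male", "Female", "Female", "Male", "Female", "Female", "Female")
)
head(survey_small)     # first 6 rows (the default)
head(survey_small, 3)  # first 3 rows
```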
One basic question we need to answer is how many observations are in our data frame
In other words, we want to know the number of rows
Use the command
nrow(survey)
[1] 117
We also want to know the number of columns
ncol(survey)
[1] 10
Together, the numbers of rows and columns are called the dimension
dim(survey)
[1] 117 10
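A minimal sketch of nrow(), ncol() and dim() on a small data frame built from four survey rows (the name survey_small is made up). On the full survey, the numbers printed above are 117 rows and 10 columns.

```r
survey_small <- data.frame(
  id        = c("3e501d", "479d88", "39df0d", "d2b091"),
  sex       = c("Male", "Female", "Female", "Male"),
  hand_span = c(15, 14, 18, 25)
)
nrow(survey_small)  # number of rows (observations): 4
ncol(survey_small)  # number of columns (variables): 3
dim(survey_small)   # rows first, then columns: 4 3
```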
Each column represents a variable
The column name is the name of the variable
[1] "answer_date" "id" "english_level" "sex"
[5] "birthdate" "birthplace" "height_cm" "weight_kg"
[9] "handness" "hand_span"
You can use $ to get the vector stored in each column
survey$weight_kg
[1] 67.0 55.0 NA 74.0 68.0 58.0 72.0 68.0 58.0 55.0 81.0 NA
[13] 42.5 NA 69.0 58.0 47.0 78.0 57.0 55.0 55.0 65.0 60.0 50.0
[25] NA 52.0 54.0 75.0 105.0 56.0 NA 50.0 NA 67.0 59.0 75.0
[37] 60.0 60.0 106.0 94.0 63.0 54.0 53.0 75.0 70.0 65.0 65.0 55.0
[49] 68.0 55.0 NA 80.0 77.0 85.0 65.0 64.0 64.0 60.0 76.0 56.0
[61] 78.0 NA NA 77.0 72.0 NA 58.0 NA 66.0 52.0 NA 73.0
[73] 82.0 55.0 86.0 63.0 85.0 58.0 65.0 65.0 70.0 NA 47.0 82.0
[85] 70.0 75.0 47.0 72.0 NA 61.0 79.0 55.0 NA 74.0 47.0 54.0
[97] NA 60.0 74.0 56.0 NA 65.0 49.0 63.0 65.0 47.0 90.0 90.0
[109] 76.0 88.0 80.0 72.0 47.0 61.0 95.0 67.0 80.0
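A minimal sketch: names() (or colnames()) gives the variable names, and $ returns one column as an ordinary vector with one value per row. The small data frame survey_small is made up from the first three survey rows.

```r
survey_small <- data.frame(
  id        = c("3e501d", "479d88", "39df0d"),
  weight_kg = c(67, 55, NA)
)
names(survey_small)                  # "id" "weight_kg"
weights <- survey_small$weight_kg    # a numeric vector, not a data frame
weights
length(weights) == nrow(survey_small)  # one value for each observation
```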
This data is real, and belongs to you
To use it here, we deleted some of your personal data
It does not show your name, email or student number
Instead, there is an id column, unique to each person
[1] "3e501d" "479d88" "39df0d" "d2b091" "f22b12" "849c75"
The id column was created using a digital signature
(we discuss them in class 14)
The same id always means the same person, but privacy is preserved
This is one step towards a blind analysis
It is essential to keep patient data anonymous
and to avoid researcher bias
As with vectors, we want to choose which parts to see
We can use logical values to filter the rows
For example, we may want to know about the left-handed people attending our course this year
We can do it like this
answer_date id english_level sex
85 2020-10-19 242bf7 I can understand movies without subtitles Female
94 2020-10-19 5012ed I can read and understand technical papers Male
98 2020-10-19 52b150 I can understand movies without subtitles Female
104 2020-10-22 412ea2 I can understand movies without subtitles Female
112 2020-11-05 242bf7 I can understand movies without subtitles Female
birthdate birthplace height_cm weight_kg handness hand_span
85 2001-11-01 İstanbul, Türkiye 162.00 70 Left 16
94 1999-10-29 Bodrum/Muğla 180.00 74 Left 23
98 2000-12-06 Ordu/Turkey 1.63 60 Left 19
104 1999-05-02 Turkey 168.00 63 Left 18
112 2001-11-01 İstanbul/Türkiye 162.00 72 Left 16
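The command is not shown. A minimal base R sketch, assuming the rows were chosen either with a logical row index or with subset(). Four rows are copied from the survey, keeping only three columns; only one of them (id 242bf7) is left-handed and answered in 2020.

```r
survey_small <- data.frame(
  answer_date = as.Date(c("2018-09-17", "2020-10-19", "2020-10-19", "2019-09-27")),
  id          = c("3e501d", "242bf7", "70f3de", "68a1cf"),
  handness    = c("Right", "Left", "Right", "Right")
)

# Logical vector: TRUE for left-handed people who answered in 2020
keep <- survey_small$handness == "Left" &
  survey_small$answer_date > as.Date("2020-01-01")
survey_small[keep, ]   # rows where keep is TRUE, all columns

# subset() looks for the column names inside the data frame,
# so we do not need to write survey_small$ in the condition
left_2020 <- subset(survey_small, handness == "Left" & answer_date > "2020-01-01")
left_2020
```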
Here we do not need to write survey$
but…
In recent years, people have improved data frames to make them easier to use
The new version is called a tibble
The easiest way to load data is to use the menu
Environment → Import Dataset → From Text (readr)…
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
cols(
answer_date = col_date(format = ""),
id = col_character(),
english_level = col_character(),
sex = col_character(),
birthdate = col_date(format = ""),
birthplace = col_character(),
height_cm = col_double(),
weight_kg = col_double(),
handness = col_character(),
hand_span = col_double()
)
(we will explain library(readr) later)
# A tibble: 117 x 10
answer_date id english_level sex birthdate birthplace height_cm
<date> <chr> <chr> <chr> <date> <chr> <dbl>
1 2018-09-17 3e50… I can speak … Male 1993-02-01 turkey 179
2 2018-09-17 479d… I can unders… Fema… 1998-05-21 Kahramanm… 1.68
3 2018-09-17 39df… I can read a… Fema… 1998-01-18 Batman, T… NA
4 2018-09-17 d2b0… I can read a… Male 1998-08-29 Antalya,T… 170
5 2018-09-17 f22b… I can read a… Fema… 1998-05-03 izmir 162
6 2018-09-17 849c… İngilizce bi… Fema… 1995-10-09 Türkiye /… 167
7 2018-09-17 8381… I can speak … Fema… 1997-09-19 Adıyaman,… 174
8 2018-09-17 b0dd… I can read a… Male 1997-11-27 Bursa 180
9 2018-09-17 2972… I can read a… Fema… 1999-01-02 İstanbul/… 162
10 2018-09-17 72c0… I can read a… Fema… 1998-10-02 İstanbul,… 172
# … with 107 more rows, and 3 more variables: weight_kg <dbl>, handness <chr>,
# hand_span <dbl>
This is much easier to read
These commands work on tibbles just as on data frames
[1] 117 10
[1] 117
[1] 10
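A minimal sketch showing that a tibble is still a data frame, so nrow(), ncol() and dim() keep working. It uses as_tibble() from the tibble package (part of tidyverse) on a small data frame made up from three survey rows.

```r
library(tibble)

survey_small <- data.frame(
  id  = c("3e501d", "479d88", "39df0d"),
  sex = c("Male", "Female", "Female")
)
students_small <- as_tibble(survey_small)

class(students_small)  # "tbl_df" "tbl" "data.frame"
dim(students_small)    # same as dim(survey_small): 3 2
nrow(students_small)
ncol(students_small)
```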
As before, we can ask for column names
[1] "answer_date" "id" "english_level" "sex"
[5] "birthdate" "birthplace" "height_cm" "weight_kg"
[9] "handness" "hand_span"
Each column can be accessed by its name
Left Right
12 105
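The counts above look like the output of table() applied to the handness column, for example table(students$handness); that exact call is an assumption. A minimal sketch with a short made-up vector:

```r
handness <- c("Right", "Right", "Left", "Right", "Left")
counts <- table(handness)  # how many times each value appears
counts
```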
What is the height of left-handed people?
To answer this question, we need new tools
Let’s get new tools for our R
Remember how we read data from the file?
library(readr)
Now we will explain library(readr): we use it to enable the read_tsv() command
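A minimal sketch of read_tsv(), the readr command mentioned above. To run without internet, it writes three rows copied from the survey to a temporary file (made with tempfile()) and reads them back; with the real data you would give the URL instead. Note how readr guesses a type for each column, as in the column specification shown earlier.

```r
library(readr)

# Real data (needs internet):
# students <- read_tsv("http://www.dry-lab.org/static/2020/cmb1/students2018-2020.tsv")

# Offline sketch: write a tiny tab-separated file, then read it
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "answer_date\tid\tsex\theight_cm",
  "2018-09-17\t3e501d\tMale\t179",
  "2018-09-17\t479d88\tFemale\t1.68",
  "2018-09-17\t39df0d\tFemale\tNA"
), tmp)

students_small <- read_tsv(tmp)  # returns a tibble; column types are guessed
students_small
```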
Out of the box, your R system has many commands
But there are more commands that you can also use
The new commands are in packages or libraries
To enable a package, we use the command library()
Running library() without arguments shows the installed packages
If you click on a package name, you can see its commands
To use them, write library(package name)
You need to do this once in every session
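A minimal sketch of loading a package and then using one of its commands. It uses tools, a package that comes installed with R, and its file_ext() function; the choice of package is only for illustration.

```r
library(tools)  # attach the package for this session

"package:tools" %in% search()      # TRUE: the package is now attached
file_ext("students2018-2020.tsv")  # file_ext() comes from tools: "tsv"
```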
What if you need more packages?
If the package is not on your computer,
you need to use install.packages()
This command downloads new packages from the web
We install a package only once
We load it every time we need it
You can use the menu Packages → Install
To work with tibbles we need to install several packages
This set of packages is called tidyverse
In the command line, you write
install.packages("tidyverse")
This command will download all the packages
and store them on your computer
You only need to do this one time.
We will use several packages from tidyverse
There is a lot of free material online
Read it. Watch it
Today we use only the dplyr package
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Do not worry about these messages for now
We will deal with them later
We can easily choose the relevant rows
# A tibble: 5 x 10
answer_date id english_level sex birthdate birthplace height_cm
<date> <chr> <chr> <chr> <date> <chr> <dbl>
1 2020-10-19 242b… I can unders… Fema… 2001-11-01 İstanbul,… 162
2 2020-10-19 5012… I can read a… Male 1999-10-29 Bodrum/Mu… 180
3 2020-10-19 52b1… I can unders… Fema… 2000-12-06 Ordu/Turk… 1.63
4 2020-10-22 412e… I can unders… Fema… 1999-05-02 Turkey 168
5 2020-11-05 242b… I can unders… Fema… 2001-11-01 İstanbul/… 162
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>
(notice that we use == for comparisons)
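A minimal sketch of filter() from dplyr on a small tibble made up from four survey rows. Inside filter() we can use the column names directly. A comparison needs ==; a single = is used for naming arguments, not for comparing.

```r
library(dplyr)

students_small <- tibble(
  answer_date = as.Date(c("2018-09-17", "2020-10-19", "2020-10-19", "2019-09-27")),
  id          = c("3e501d", "242bf7", "70f3de", "68a1cf"),
  handness    = c("Right", "Left", "Right", "Right")
)

# Keep only the rows where the condition is TRUE
left_2020 <- filter(students_small, handness == "Left" & answer_date > "2020-01-01")
left_2020
```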
# A tibble: 117 x 2
weight_kg height_cm
<dbl> <dbl>
1 67 179
2 55 1.68
3 NA NA
4 74 170
5 68 162
6 58 167
7 72 174
8 68 180
9 58 162
10 55 172
# … with 107 more rows
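The printout above shows only two columns of students. A minimal sketch of select() from dplyr, which keeps only the named columns, in the order we write them; the tibble is made up from three survey rows.

```r
library(dplyr)

students_small <- tibble(
  id        = c("3e501d", "479d88", "39df0d"),
  height_cm = c(179, 1.68, NA),
  weight_kg = c(67, 55, NA)
)
# Keep two columns, in the order we write them
weight_height <- select(students_small, weight_kg, height_cm)
weight_height
```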
We can combine both steps: filter() chooses the rows and select() chooses the columns
left_handed <- filter(students, handness=="Left" & answer_date > "2020-01-01")
select(left_handed, answer_date, weight_kg, height_cm)
# A tibble: 5 x 3
answer_date weight_kg height_cm
<date> <dbl> <dbl>
1 2020-10-19 70 162
2 2020-10-19 74 180
3 2020-10-19 60 1.63
4 2020-10-22 63 168
5 2020-11-05 72 162
Normally we use <- for assignment
There is another way, which is sometimes nicer
The -> arrow goes from the value to the variable
filter(students, handness=="Left" & answer_date > "2020-01-01") -> left_handed
select(left_handed, answer_date, weight_kg, height_cm)
# A tibble: 5 x 3
answer_date weight_kg height_cm
<date> <dbl> <dbl>
1 2020-10-19 70 162
2 2020-10-19 74 180
3 2020-10-19 60 1.63
4 2020-10-22 63 168
5 2020-11-05 72 162
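A minimal sketch showing that both arrows store the same value; only the direction changes. The names a and b are arbitrary.

```r
a <- sqrt(16)    # value on the right, variable on the left
sqrt(16) -> b    # value on the left, variable on the right
identical(a, b)  # TRUE
```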
left_handed is an intermediate variable
We use it only for one step, and we don’t need it at the end
filter(students, handness=="Left" & answer_date > "2020-01-01") %>% select(answer_date, weight_kg, height_cm)
# A tibble: 5 x 3
answer_date weight_kg height_cm
<date> <dbl> <dbl>
1 2020-10-19 70 162
2 2020-10-19 74 180
3 2020-10-19 60 1.63
4 2020-10-22 63 168
5 2020-11-05 72 162
The key thing is %>%, called the pipe
filter(students, handness=="Left" & answer_date > "2020-01-01") %>%
select(answer_date, weight_kg, height_cm)
# A tibble: 5 x 3
answer_date weight_kg height_cm
<date> <dbl> <dbl>
1 2020-10-19 70 162
2 2020-10-19 74 180
3 2020-10-19 60 1.63
4 2020-10-22 63 168
5 2020-11-05 72 162
If you write %>% at the end of a line, you can continue on the next line
The %>% symbol helps us write clear code
Instead of
we write
The first function input is taken from the pipe
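A minimal sketch of this rule, using round() as an example: the value before %>% becomes the first input of the function after it, and the other inputs are written as usual. It loads magrittr, the package that provides the pipe.

```r
library(magrittr)

round(pi, 2)     # usual call: pi is the first input
pi %>% round(2)  # same call: pi comes from the pipe
```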
Instead of
we write
We can read %>% as “then”
“Take x, then calculate the sine, then the square root, then take the smallest of the result and z, and store it in y”
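The sentence above can be sketched in both styles. The values of x and z are made up; x is chosen so that sin(x) is positive and the square root is defined.

```r
library(magrittr)

x <- 1
z <- 2

# Nested calls: read from the inside out
y_nested <- min(sqrt(sin(x)), z)

# Pipe: take x, then sine, then square root, then the smallest of that and z
x %>% sin() %>% sqrt() %>% min(z) -> y_piped

identical(y_nested, y_piped)  # TRUE
```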
The package providing pipes is called magrittr
Why?
Tell me in the next class
(no writing necessary)