readr
filter(), select(), and %>%
Tidy up survey data
Answer some interesting questions
To answer this question we need to check the answer date
There is an old way, which you may know if you took this course previously. This method uses indices.
# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>
In the “old” way we had to write students many times
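A minimal sketch of the index-based style, to make the repetition visible. The data frame name students and the columns id, answer_date and height_cm come from the output above; the three rows and the cutoff date are made up for illustration and are not the survey data.

```r
# Toy stand-in for the survey data (values are invented, not from the survey)
students <- data.frame(
  id          = c("a1", "b2", "c3"),
  answer_date = as.Date(c("2019-10-15", "2020-10-19", "2020-11-05")),
  height_cm   = c(170, 162, 180)
)

# Hypothetical cutoff date, chosen only for this example
cutoff <- as.Date("2020-01-01")

# Index-based filtering: "students" is written twice,
# once for the table and once inside the row condition
recent <- students[students$answer_date >= cutoff, ]

# With two conditions, "students" is written three times
recent_tall <- students[students$answer_date >= cutoff &
                          students$height_cm > 170, ]
```

Every extra condition adds another students$ prefix, which is what makes this style tiring to write and easy to get wrong.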
With filter() we do not need to repeat the name of the data frame
# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>
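A possible way to write the same kind of row selection with filter() from dplyr. This is only a sketch: the toy rows and the cutoff date are assumptions, and the condition used on the original slide is not known.

```r
library(dplyr)

# Toy stand-in for the survey data (values are invented, not from the survey)
students <- data.frame(
  id          = c("a1", "b2", "c3"),
  answer_date = as.Date(c("2019-10-15", "2020-10-19", "2020-11-05")),
  height_cm   = c(170, 162, 180)
)

# The data frame is named once, as the first argument.
# Inside filter() the columns are referred to by their bare names.
recent <- filter(students, answer_date >= as.Date("2020-01-01"))
```

The result has the same rows as the index-based version, but the condition reads answer_date instead of students$answer_date.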
Now we can use several conditions, separated by commas; a row is kept only if all of them are true
# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>
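A small sketch of several conditions in one filter() call. Conditions separated by commas are combined with "and", so the comma form and the & form below should give the same rows. The toy data and both thresholds are assumptions made for this example.

```r
library(dplyr)

# Toy stand-in for the survey data (values are invented, not from the survey)
students <- data.frame(
  id          = c("a1", "b2", "c3"),
  answer_date = as.Date(c("2019-10-15", "2020-10-19", "2020-11-05")),
  height_cm   = c(170, 162, 180)
)

# Two conditions separated by a comma: both must be TRUE for a row to be kept
both <- filter(students,
               answer_date >= as.Date("2020-01-01"),
               height_cm > 170)

# The same selection written with an explicit logical "and"
same <- filter(students,
               answer_date >= as.Date("2020-01-01") & height_cm > 170)
```

The comma form is usually easier to read when there are many conditions, since each one can go on its own line.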
In that course we used the UNIX command line to process data
We did something like
Here we follow the same philosophy
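One way to picture the same philosophy in R is the %>% pipe, which plays a role similar to | on the UNIX command line: the result of one step becomes the first argument of the next step. The sketch below uses invented toy data and a hypothetical cutoff date; only the names students, id, answer_date and height_cm come from the output shown earlier.

```r
library(dplyr)

# Toy stand-in for the survey data (values are invented, not from the survey)
students <- data.frame(
  id          = c("a1", "b2", "c3"),
  answer_date = as.Date(c("2019-10-15", "2020-10-19", "2020-11-05")),
  height_cm   = c(170, 162, 180)
)

# Each step takes the table produced by the previous step:
# first keep some rows with filter(), then keep some columns with select()
result <- students %>%
  filter(answer_date >= as.Date("2020-01-01")) %>%
  select(id, height_cm)
```

Each function does one small job, and the pipe chains the small jobs into a larger one, so the code reads top to bottom in the order the steps happen.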