readr, filter(), select(), and %>%
Some people forgot to use valid_value and first_3_letters
students$birthplace[!is.na(students$birthplace) &
  substr(students$birthplace, start = 1, stop = 3) == "MUG"] <- "MUGLA/TURKEY"
These variables are there to help you.
Moreover, the new names should not be ALL CAPS
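A minimal sketch of the same replacement written with the two helper variables. The values given to first_3_letters and valid_value here are assumptions chosen for illustration, and the small students table is made up.

```r
# Toy data, made up for illustration (the real table has more columns)
students <- data.frame(
  birthplace = c("MUGLA", "MUG", "Izmir/Turkey", NA),
  stringsAsFactors = FALSE
)

# Assumed values: the prefix to look for, and the text to write instead
first_3_letters <- "MUG"
valid_value <- "Mugla/Turkey"   # mixed case, not ALL CAPS

# TRUE only for non-missing birthplaces that start with first_3_letters
needs_fix <- !is.na(students$birthplace) &
  substr(students$birthplace, start = 1, stop = 3) == first_3_letters

students$birthplace[needs_fix] <- valid_value
print(students)
```

Keeping the prefix and the replacement in variables means the same two lines can be reused for another city by changing only the two assignments.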
Some students found that
did not work.
This happens only in Microsoft Windows® with non-English symbols
Non-English letters can be encoded in different ways
There used to be several alternatives
Today there is a universal standard, called Unicode or UTF-8
All professional systems use UTF-8
Microsoft Windows® still uses the old standard, but they are changing it.
Unicode is a general idea. UTF-8 is an implementation of it.
More details at “UTF-8 Support in Windows”
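To make the encoding issue concrete, this sketch looks at the bytes R stores for one Turkish letter. The letter (dotless ı, written as the escape \u0131 so the example does not depend on how the script file itself is saved) is an example chosen here, not something taken from the course data.

```r
# Dotless i (U+0131), written as a Unicode escape
letter <- "\u0131"

# Number of bytes used to store it in UTF-8
n_bytes <- nchar(letter, type = "bytes")
print(n_bytes)

# The bytes themselves
utf8_bytes <- charToRaw(letter)
print(utf8_bytes)   # c4 b1

# A plain ASCII letter needs a single byte
ascii_bytes <- charToRaw("i")
print(ascii_bytes)  # 69
```

Older single-byte encodings store the same letter as one byte with a different value, so a file written under one convention can look broken when it is read under another.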
The combination of “Windows® + R + Non-English” is bad
Choose two of three
I made a clean version for us. Let’s use this file:
http://www.dry-lab.org/static/2020/cmb1/students2018-2020-tidy.tsv
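One possible way to load a file like this with readr. The helper name read_students is hypothetical. The encoding is declared as UTF-8 explicitly (readr already assumes UTF-8 by default, so this only makes the choice visible), and the URL in the usage comment is the one given above, assumed to contain no spaces.

```r
library(readr)

# Hypothetical helper: read a tab-separated file, declaring it as UTF-8
read_students <- function(path) {
  read_tsv(path, locale = locale(encoding = "UTF-8"))
}

# Usage (needs an internet connection):
# students <- read_students(
#   "http://www.dry-lab.org/static/2020/cmb1/students2018-2020-tidy.tsv"
# )
```

read_tsv() returns a tibble and guesses the column types from the data.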
Answer some interesting questions
Which ones can be answered with our data?
Answer in the Quiz
# A tibble: 117 x 10
answer_date id english_level sex birthdate birthplace height_cm
<date> <chr> <chr> <chr> <date> <chr> <dbl>
1 2018-09-17 3e50… I can speak … Male 1993-02-01 -/Turkey 179
2 2018-09-17 479d… I can unders… Fema… 1998-05-21 Kahramanm… 168
3 2018-09-17 39df… I can read a… Fema… 1998-01-18 Batman/Tu… NA
4 2018-09-17 d2b0… I can read a… Male 1998-08-29 Antalya/T… 170
5 2018-09-17 f22b… I can read a… Fema… 1998-05-03 Izmir/Tur… 162
6 2018-09-17 849c… İngilizce bi… Fema… 1995-10-09 Yalova/Tu… 167
7 2018-09-17 8381… I can speak … Fema… 1997-09-19 Adıyaman/… 174
8 2018-09-17 b0dd… I can read a… Male 1997-11-27 Bursa/Tur… 180
9 2018-09-17 2972… I can read a… Fema… 1999-01-02 Istanbul/… 162
10 2018-09-17 72c0… I can read a… Fema… 1998-10-02 Istanbul/… 172
# … with 107 more rows, and 3 more variables: weight_kg <dbl>,
# handedness <chr>, hand_span <dbl>
From dplyr package
filter(): choose rows
select(): choose columns
arrange(): sort
mutate(): change or add columns
summarize(): calculate on all rows
group_by(): separate in many tibbles
Combine different tools with pipe %>%
distinct(): eliminate duplicates
slice_head(): a modern version of head()
slice_min(): keep only the “best” rows
summarize():
  n(): count
  n_distinct(): count without repetitions
From knitr package
kable(): prints nicer tables in the document
Same questions with weight
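A short sketch of how several of the dplyr tools listed above can be combined with the pipe, here applied to weight. The data are made up for illustration. Only the column names follow the course table.

```r
library(dplyr)

# Toy data, made up for illustration; "d" appears twice on purpose
toy <- tibble(
  id        = c("a", "b", "c", "d", "d"),
  sex       = c("Female", "Male", "Female", "Male", "Male"),
  weight_kg = c(55, 80, NA, 70, 70)
)

weight_by_sex <- toy %>%
  distinct() %>%                   # drop the repeated row for "d"
  filter(!is.na(weight_kg)) %>%    # keep rows with a known weight
  group_by(sex) %>%                # one group per value of sex
  summarize(
    n           = n(),             # rows in each group
    n_ids       = n_distinct(id),  # different ids in each group
    mean_weight = mean(weight_kg)  # average weight in each group
  )

print(weight_by_sex)
```

Each step receives the tibble produced by the previous one, so the pipeline reads from top to bottom in the order the operations happen.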
Sort the table by body mass index \[BMI=\frac{Weight}{Height^2}\]
Show the top three and bottom three
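One way this could be approached, shown on made-up data. BMI is conventionally expressed in kg/m², while the table stores height in centimeters, so this sketch assumes the height has to be divided by 100 first. slice_max() is the counterpart of slice_min().

```r
library(dplyr)

# Toy data, made up for illustration; column names follow the course table
toy <- tibble(
  id        = c("a", "b", "c", "d", "e", "f", "g"),
  height_cm = c(160, 170, 180, 175, 165, 185, 172),
  weight_kg = c(50, 70, 90, 60, 80, 75, 65)
)

ranked <- toy %>%
  mutate(bmi = weight_kg / (height_cm / 100)^2) %>%  # assumed unit: kg/m^2
  arrange(desc(bmi))                                 # largest BMI first

top3    <- ranked %>% slice_max(bmi, n = 3)  # three largest BMI values
bottom3 <- ranked %>% slice_min(bmi, n = 3)  # three smallest BMI values

print(top3)
print(bottom3)
```

With the real table, rows with a missing height or weight produce a missing BMI, so it may be worth filtering them out before ranking.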