Class 18: Practice on tidying up data

Computing in Molecular Biology and Genetics 1

Andrés Aravena, PhD

23 November 2020

Which are the students this semester?

To answer this question we need to check the answer date

There is an old way, which you may know if you took this course before. This method uses indices.

students[students$handness=="Left" & students$answer_date > "2020-01-01", ]

# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162   
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180   
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168   
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162   
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>
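As a minimal sketch of what the index method does, the example below builds a small made-up data frame called toy (hypothetical data, not the course survey) that reuses the column names handness and answer_date. Each condition produces one TRUE/FALSE value per row, and the rows marked TRUE are the ones kept.

```r
# Hypothetical toy data, only for illustration (not the real survey)
toy <- data.frame(
  id = c("a", "b", "c", "d"),
  handness = c("Left", "Right", "Left", "Left"),
  answer_date = as.Date(c("2020-10-19", "2020-10-20", "2019-11-04", "2020-11-05")),
  stringsAsFactors = FALSE
)

# Each comparison gives one logical value per row
is_left   <- toy$handness == "Left"
is_recent <- toy$answer_date > as.Date("2020-01-01")

# & combines the two vectors element by element
keep <- is_left & is_recent
print(keep)

# Rows go before the comma; the empty column part keeps all columns
recent_left <- toy[keep, ]
print(recent_left$id)
```

Here the date is converted explicitly with as.Date(); this is a choice made for the sketch, while the slide compares the column directly with a text date.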

Avoid repeating the name

In the “old” way we had to write students three times

With filter() we do not need to repeat the name

library(dplyr)
filter(students, handness=="Left" & answer_date > "2020-01-01")

# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162   
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180   
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168   
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162   
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>

Comma is AND

Now we can use several conditions, separated by commas. A row is kept only when all the conditions are true

filter(students, handness=="Left" , answer_date > "2020-01-01")

# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162   
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180   
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168   
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162   
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>
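To check that the comma behaves like &, one possible test is to run both forms on the same data and compare the results. The sketch below assumes dplyr is installed and uses a made-up data frame called toy (hypothetical, not the course survey).

```r
library(dplyr)

# Hypothetical toy data, only for illustration (not the real survey)
toy <- data.frame(
  id = c("a", "b", "c", "d"),
  handness = c("Left", "Right", "Left", "Left"),
  answer_date = as.Date(c("2020-10-19", "2020-10-20", "2019-11-04", "2020-11-05")),
  stringsAsFactors = FALSE
)

# One condition built with &
with_and <- filter(toy, handness == "Left" & answer_date > as.Date("2020-01-01"))

# Two conditions separated by a comma
with_comma <- filter(toy, handness == "Left", answer_date > as.Date("2020-01-01"))

# Both forms should keep the same rows
print(isTRUE(all.equal(with_and, with_comma)))
print(with_comma$id)
```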

Using pipes

students %>% filter(handness=="Left" , answer_date > "2020-01-01")

# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162   
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180   
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168   
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162   
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>

students %>% filter(handness=="Left") %>% filter(answer_date > "2020-01-01")

# A tibble: 5 x 10
  answer_date id    english_level sex   birthdate  birthplace height_cm
  <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
1 2020-10-19  242b… I can unders… Fema… 2001-11-01 İstanbul,…    162   
2 2020-10-19  5012… I can read a… Male  1999-10-29 Bodrum/Mu…    180   
3 2020-10-19  52b1… I can unders… Fema… 2000-12-06 Ordu/Turk…      1.63
4 2020-10-22  412e… I can unders… Fema… 1999-05-02 Turkey        168   
5 2020-11-05  242b… I can unders… Fema… 2001-11-01 İstanbul/…    162   
# … with 3 more variables: weight_kg <dbl>, handness <chr>, hand_span <dbl>
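The pipe passes the value on its left as the first argument of the function on its right, so a chain of pipes reads from left to right instead of from the inside out. The sketch below compares the nested and the piped form on a made-up data frame called toy (hypothetical, not the course survey), assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, only for illustration (not the real survey)
toy <- data.frame(
  id = c("a", "b", "c", "d"),
  handness = c("Left", "Right", "Left", "Left"),
  answer_date = as.Date(c("2020-10-19", "2020-10-20", "2019-11-04", "2020-11-05")),
  stringsAsFactors = FALSE
)

# Nested calls: the inner filter() runs first, so we read inside-out
nested <- filter(filter(toy, handness == "Left"),
                 answer_date > as.Date("2020-01-01"))

# Piped calls: the same two steps, read left to right
piped <- toy %>%
  filter(handness == "Left") %>%
  filter(answer_date > as.Date("2020-01-01"))

# Both forms should give the same rows
print(isTRUE(all.equal(nested, piped)))
print(piped$id)
```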

