Someone who does Science
Science is not only making experiments
Science is the process of creating knowledge
We search for truth with the Scientific Method
This way we avoid fooling ourselves
The society expect from us, the scientists, to answer these questions with the truth
Tomorrow you may work in a blood bank.
Is the blood safe?
Or in a food factory.
Is the food safe? Is it GMO?
Or in a University.
Is this pharmaceutical company telling the truth?
Or you do a paternity test.
Is this person the real father?
You can be wrong, but you cannot lie
Models are tested with experiments
To be valid, experiments must be replicable
There may be some variation between experiments
library(readr)
students <- read_tsv("students2018-2020-tidy.tsv",
col_types = cols(
sex = col_factor(levels = c("Male", "Female")),
handedness = col_factor(levels = c("Left", "Right"))
)
)
students
# A tibble: 117 x 10
answer_date id english_level sex birthdate birthplace height_cm
<date> <chr> <chr> <fct> <date> <chr> <dbl>
1 2018-09-17 3e50… I can speak … Male 1993-02-01 -/Turkey 179
2 2018-09-17 479d… I can unders… Fema… 1998-05-21 Kahramanm… 168
3 2018-09-17 39df… I can read a… Fema… 1998-01-18 Batman/Tu… NA
4 2018-09-17 d2b0… I can read a… Male 1998-08-29 Antalya/T… 170
5 2018-09-17 f22b… I can read a… Fema… 1998-05-03 Izmir/Tur… 162
6 2018-09-17 849c… İngilizce bi… Fema… 1995-10-09 Yalova/Tu… 167
7 2018-09-17 8381… I can speak … Fema… 1997-09-19 Adıyaman/… 174
8 2018-09-17 b0dd… I can read a… Male 1997-11-27 Bursa/Tur… 180
9 2018-09-17 2972… I can read a… Fema… 1999-01-02 Istanbul/… 162
10 2018-09-17 72c0… I can read a… Fema… 1998-10-02 Istanbul/… 172
# … with 107 more rows, and 3 more variables: weight_kg <dbl>,
# handedness <fct>, hand_span <dbl>
plot()
functionplot(y ~ x)
looks like plot(x, y)
plot(y~x, data=dframe)
In science we work by creating models of how nature works
There are several kinds of models
One of the easiest and more commonly used are the linear models
We approximate all our data by a straight line that shows the relationship between some variables, with a formula like \[y=a + b\cdot x\]
We want to describe the relationship between weight and height
A basic rule of thumb is that weight in kg is usually equal to height in cm minus 100 \[weight = height - 100\]
That is, the weight is the number of centimeters over one meter
This command adds a straight line in a specific position
abline(h=1)
adds a horizontal line in 1
abline(v=2)
adds a vertical line in 2
abline(a=3, b=4)
adds an \(a +b\cdot x\) line
The hard part is to decide
Here we are using a linear model \[y=a + b\cdot x\] more precisely \[weight = a + b\cdot height\]
Which are the best \(a\) and \(b\) values?
We draw the figure using a formula
Using the same formula we can get a linear model
Call:
lm(formula = weight_kg ~ height_cm, data = students)
Coefficients:
(Intercept) height_cm
-84.0079 0.8822
We draw using abline()
with the coefficients of lm()
We must avoid copying data manually or using fixed values
Call:
lm(formula = weight_kg ~ height_cm, data = students)
Coefficients:
(Intercept) height_cm
-84.0079 0.8822
(Intercept) height_cm
-84.007888 0.882157
The formula is \(y=a + b\cdot x\)
\(a\) is coef(model)[1]
\(b\) is coef(model)[2]
Beyond giving a description of the data, models are often used to get a prediction of what would be the output of the system when we have new data
In this case we need to provide a data.frame
with at least one column. The column name must be the same as the one used to create the model. For example
Robert Hooke (1635–1703) was an English natural philosopher, architect and polymath.
In 1660, Hooke discovered the law of elasticity which describes the linear variation of tension with extension
“The extension is proportional to the force”
Natural philosophy was the study of nature and the physical universe that was dominant before the development of modern science
Polymath (from Greek “having learned much”) is a person whose expertise spans a significant number of different subject areas
Biologist. Hooke used the microscope and was the fists to use the term cell for describing biological organisms.
The essence of the coil is:
n_balls | length | repetition |
---|---|---|
0 | 78.00 | 1 |
1 | 82.61 | 1 |
2 | 85.85 | 1 |
3 | 90.26 | 1 |
4 | 95.05 | 1 |
0 | 79.21 | 2 |
2 | 85.55 | 2 |
3 | 90.06 | 2 |
4 | 94.35 | 2 |
(some data omitted from the table. Get all data at
http://dry-lab.org/static/2017/rubber1.txt)
When data seems to be in a straight line, we can find that line
The best-fitting line is found using a linear model
Call:
lm(formula = length ~ n_balls, data = rubber)
Coefficients:
(Intercept) n_balls
76.48 5.06
Remember that straight lines can be represented by the formula \[\text{n_marbles}=A+B\cdot \text{length}\] The coefficient \(A\) is the value where the line intercepts the vertical axis
The coefficient \(B\) is how much length
goes up when n_marbles
increases. This is called slope
The formula from Hooke’s Law is \[\text{force}=K\cdot(L-\text{height})\] Since force
is the weight of the balls, we can write \[-m g\cdot\text{n_balls}=K\cdot(L-\text{lenght})\] which can be re-written as \[\text{lenght}=\underbrace{L}_{A}+\underbrace{\frac{m g}{K}}_{B}\cdot\text{n_balls}\]
coef(model)
When there are no balls, the length of the coil is \(L\), in this case
(Intercept)
76.48188
If the mass of each ball is 20gr, we can find \(K\) as
n_balls
25.81566