November 21st, 2016
For us, data always come from another program
The function used to read text files is
read.table(file, header = FALSE, sep = "", quote = "\"'", row.names, col.names, na.strings = "NA", stringsAsFactors = default.stringsAsFactors(), dec = ".", comment.char = "#", ...)
Please take a look at the help page of read.table().
The output of this function is a data.frame. The only mandatory argument is file: the name of the file to read, given as a path or a URL.
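As a minimal sketch of the simplest possible call, the following writes a small made-up file to a temporary location and reads it back giving only file. The file contents and the variable names tmp and small are illustrative assumptions, not part of the course data.

```r
# Write a tiny whitespace-separated file to a temporary path.
# The contents are made up for illustration only.
tmp <- tempfile(fileext = ".txt")
writeLines(c("10 a", "20 b", "30 c"), tmp)

# Only `file` is supplied; every other argument keeps its default.
small <- read.table(tmp)

class(small)   # the result is a data.frame
dim(small)     # number of rows and columns
names(small)   # with no header line, R names the columns V1, V2, ...

unlink(tmp)    # remove the temporary file
```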
Other important options
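Sometimes R detects row and column names automatically (when?)
header: TRUE if the first line of the file contains the column names
sep: the character that separates the fields; the default in read.table() is "", which means any white space
quote: the characters used to quote text; the default in read.table() is " and '
dec: the decimal mark; in English-speaking countries it is ., in Europe it is ,. The default in read.table() is .
comment.char: the character that starts a comment; the default is #
stringsAsFactors: if TRUE, then all character columns in the file are converted to factors. The default was TRUE in versions of R before 4.0.0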
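The options above can be tried without any file on disk by passing the lines through the text argument of read.table(). This is only a sketch: the data are made up, and the names txt, eu, eu_f, txt2 and auto are hypothetical. The last part illustrates one documented case of automatic detection: when header is not given and the first line has one field fewer than the other lines, R treats it as a header and uses the first column as row names.

```r
# Made-up data in "European" style: ';' separates the fields and
# ',' is the decimal mark. The first line is a comment.
txt <- c(
  "# measurements exported by another program",
  "name;height;weight",
  "ana;1,62;55,5",
  "ben;1,80;82,0"
)

eu <- read.table(text = txt, header = TRUE, sep = ";", dec = ",",
                 comment.char = "#", stringsAsFactors = FALSE)
str(eu)           # height and weight are numeric, name is character

# Same input, but character columns become factors.
eu_f <- read.table(text = txt, header = TRUE, sep = ";", dec = ",",
                   stringsAsFactors = TRUE)
class(eu_f$name)

# Automatic detection: the first line has one field fewer than the rest,
# so it becomes the header and the first column becomes the row names.
txt2 <- c("x y", "r1 1 2", "r2 3 4")
auto <- read.table(text = txt2)
colnames(auto)
rownames(auto)
```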
There are more options that may sometimes be necessary. We have shown only the most frequently used ones.
Today we will use data from
http://anaraven.bitbucket.io/static/birth.txt
Take a look at it. What can you say about it?
We read data with
birth <- read.table("http://anaraven.bitbucket.io/static/birth.txt", header=TRUE)
which results in a data frame like this:
head(birth)
    id birth apgar5 sex weight head  age parity weeks
1 4347     1      8   F   1610 41.0 28.5      1    31
2 4346     1      9   F   3580 51.0 35.0      1    39
3 4300     1      9   F   3350 52.0 37.0      1    40
4 4345     1      9   F   3230 50.5 35.0      1    38
5 4349     1      8   F   3650 52.0 36.5      1    40
6 4315     2      8   F   3900 51.0 35.0      1    38
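To explore the structure of such a data frame without network access, the six rows printed above can be re-entered as text. This is a sketch under that assumption: birth6 and sample_txt are hypothetical names, and the result holds only these six rows, not the whole file. Because every one of these six rows has sex equal to F, which R would otherwise read as the logical value FALSE, the column class is set explicitly here.

```r
# The six rows shown by head(birth), typed in again as text.
sample_txt <- "
id birth apgar5 sex weight head age parity weeks
4347 1 8 F 1610 41.0 28.5 1 31
4346 1 9 F 3580 51.0 35.0 1 39
4300 1 9 F 3350 52.0 37.0 1 40
4345 1 9 F 3230 50.5 35.0 1 38
4349 1 8 F 3650 52.0 36.5 1 40
4315 2 8 F 3900 51.0 35.0 1 38
"

# colClasses keeps `sex` as text; a column containing only "F"
# would otherwise be converted to logical.
birth6 <- read.table(text = sample_txt, header = TRUE,
                     colClasses = c(sex = "character"))

dim(birth6)             # rows and columns
str(birth6)             # type of each column
summary(birth6$weight)  # quick numeric summary of one column
```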
plot(birth$head)
points(birth$age, pch=2)
The first command, plot(), defines the scale of the axes; points() only adds to the existing plot
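One consequence is that anything drawn later outside that scale is not visible. The sketch below uses the head and age values of the six rows printed earlier (head_vals, age_vals, usr_first and both are hypothetical names) and draws to a null PDF device so that it also runs without a screen; remove the pdf(NULL) and dev.off() lines to see the plots. Passing ylim to plot() is one possible way to make room for both variables.

```r
# head and age of the first six rows of the birth data
head_vals <- c(41.0, 51.0, 52.0, 50.5, 52.0, 51.0)
age_vals  <- c(28.5, 35.0, 37.0, 35.0, 36.5, 35.0)

pdf(NULL)  # null device: nothing is written to disk

# plot() chooses the y scale from head_vals only ...
plot(head_vals)
usr_first <- par("usr")[3:4]   # lower and upper y limits of the plot region
# ... so these points fall below the plot region and cannot be seen
points(age_vals, pch = 2)

# Give plot() a y range that covers both variables
both <- range(head_vals, age_vals)
plot(head_vals, ylim = both, ylab = "value")
points(age_vals, pch = 2)

invisible(dev.off())
```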
plot(birth$head)
points(birth$age, pch=2)
abline(h=mean(birth$head), lwd = 3)
abline(h=mean(birth$age), lwd = 3, col = "blue")
The abline() command adds a straight line at a specific position
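The three usual forms of abline() can be tried on an empty plot. This is only a sketch: the axis ranges are chosen arbitrarily so that every line is visible, and y_line is a hypothetical helper showing where the sloped line meets the left and right edges.

```r
pdf(NULL)  # null device; remove this and dev.off() to see the plot

# Empty plot (type = "n") with x from 0 to 5 and y from 0 to 25
plot(c(0, 5), c(0, 25), type = "n", xlab = "x", ylab = "y")

abline(h = 1)                        # horizontal line at y = 1
abline(v = 2, lty = 2)               # vertical dashed line at x = 2
abline(a = 3, b = 4, col = "blue")   # intercept 3, slope 4: y = 3 + 4 * x

# Height of the sloped line at x = 0 and x = 5
y_line <- 3 + 4 * c(0, 5)

invisible(dev.off())
```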
abline(h=1) adds a horizontal line at 1
abline(v=2) adds a vertical line at 2
abline(a=3, b=4) adds the line \(a + b\cdot x\), here \(3 + 4\cdot x\)
plot(birth$age, birth$apgar5)
plot(birth$age, birth$head)
Sometimes it is easier to describe the relationship between variables using a formula
Instead of
plot(birth$age, birth$head)
we can write
plot(birth$head ~ birth$age)
or even
plot(head ~ age, data = birth)
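The three forms can be compared side by side on a small data frame. The sketch below assumes the head and age values of the six rows printed earlier, stored under the hypothetical name birth6. Note the order: in plot(x, y) the horizontal variable comes first, while in a formula the vertical variable is on the left of ~.

```r
# head and age of the first six rows of the birth data
birth6 <- data.frame(
  head = c(41.0, 51.0, 52.0, 50.5, 52.0, 51.0),
  age  = c(28.5, 35.0, 37.0, 35.0, 36.5, 35.0)
)

pdf(NULL)  # null device; remove this and dev.off() to see the plots

plot(birth6$age, birth6$head)     # x first, then y
plot(birth6$head ~ birth6$age)    # formula: y ~ x
plot(head ~ age, data = birth6)   # same plot, columns looked up in `data`

invisible(dev.off())
```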
plot(head ~ age, data = birth)
plot(head ~ age, data = birth, subset = sex=="F")
plot(head ~ age, data = birth, subset = sex=="M")
With a formula it is easier to specify the data.frame and which of its rows to plot
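A self-contained sketch of the subset argument follows. The data frame toy holds made-up values that do not come from birth.txt, and xl and yl are hypothetical names. Each subset plot chooses its own axis limits by default, so one possible way to make the two panels comparable is to pass the same xlim and ylim to both.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  sex  = c("F", "F", "F", "M", "M", "M"),
  age  = c(35.0, 37.0, 36.5, 34.0, 36.0, 38.0),
  head = c(51.0, 52.0, 52.0, 50.0, 51.5, 53.0)
)

pdf(NULL)  # null device; remove this and dev.off() to see the plots

plot(head ~ age, data = toy)                       # all rows
plot(head ~ age, data = toy, subset = sex == "F")  # only rows with sex F
plot(head ~ age, data = toy, subset = sex == "M")  # only rows with sex M

# Same limits in both panels, computed from all rows
xl <- range(toy$age)
yl <- range(toy$head)
plot(head ~ age, data = toy, subset = sex == "F", xlim = xl, ylim = yl)
plot(head ~ age, data = toy, subset = sex == "M", xlim = xl, ylim = yl)

invisible(dev.off())
```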
Try these commands at home.
What is wrong with these graphics?