November 12th, 2018
We have our own data
survey <- read.table("survey1-tidy.txt")
We want to say something about them
There are many functions for this. You have to explore and learn
So far we have seen:
length()
min(), max(), range()
head(), tail()
summary()
table()
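A minimal sketch of the functions above applied to a small vector. The vector `w` is hypothetical and only for illustration; it is not taken from the survey.

```r
# Hypothetical weights in kg, for illustration only
w <- c(55, 64, 42.5, 74.5, 106, 55, 60)

n <- length(w)       # number of elements
lo <- min(w)         # smallest value
hi <- max(w)         # largest value
r <- range(w)        # both at once: c(min(w), max(w))
first <- head(w, 3)  # first 3 elements
last <- tail(w, 3)   # last 3 elements
s <- summary(w)      # minimum, quartiles, median, mean, maximum
counts <- table(w)   # how many times each distinct value appears

print(r)
print(counts)
```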
length(survey$handness)
[1] 51
summary(survey$handness)
 Left Right
    4    47
table(survey$handness)
 Left Right
    4    47
length(survey$weight_kg)
[1] 51
summary(survey$weight_kg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  42.50   55.00   64.00   65.56   74.50  106.00
table(survey$weight_kg)
42.5   47   50   52   53   54   55   56   57   58   59
   1    1    2    1    1    2    6    2    1    3    1
  60   63   64   65   67   68   69   70   72   74   75
   3    1    1    3    2    3    1    1    1    1    3
  76   77   78   80   81   85   94  105  106
   1    2    1    1    1    1    1    1    1
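For numeric data, summary() reports the quartiles (the same values quantile() computes by default) plus the mean. A small sketch on a hypothetical vector, chosen so that the quartiles are exact numbers:

```r
# Hypothetical values, chosen so that the quartiles are exact
x <- 1:9

s <- summary(x)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
q <- quantile(x)  # 0%, 25%, 50%, 75%, 100%

print(s)
print(q)
```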
Sometimes the best way to tell the story of the data is with a graphic
plot(survey$weight_kg)
When we plot a single vector, the x axis goes from 1 to length(vector) and the y axis shows the values
If vector[3] contains the value 170, we will have a point at the coordinates (3, 170)
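A short sketch of that rule: plotting one vector is the same as plotting its indices against its values. The heights in `h` are hypothetical.

```r
# Hypothetical heights in cm, for illustration only
h <- c(165, 180, 170, 158)

# With a single vector, x is the index and y is the value
plot(h)

# The same points, with the x coordinates written out explicitly
idx <- seq_along(h)   # 1, 2, ..., length(h)
plot(idx, h)

# The third point is drawn at (3, h[3]), that is (3, 170)
```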
plot(survey$height_cm)
plot(survey$height_cm, col="red")
There are several ways to specify the color
The easiest one is to use a number
Each point can have a different color. To do this, use a vector of the same length as the data
Something like this
plot(1:8, col=1:8)
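A common use of a color vector is to color each point by its group. This sketch uses R's built-in iris data set instead of the survey (an assumption made only so the code runs anywhere) and turns the factor Species into the numbers 1, 2, 3, which select colors from the palette.

```r
# Built-in data set, used here because the survey file is not included
species <- iris$Species

# as.integer() turns the factor levels into 1, 2, 3
cols <- as.integer(species)

# One color per point: the color vector has the same length as the data
plot(iris$Petal.Length, col = cols)
```

A shorter color vector is recycled, so plot(1:8, col = c("red", "blue")) alternates the two colors.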
plot(survey$height_cm, cex=2)
plot(survey$height_cm, cex=0.5)
The parameter cex means character expansion
Each point can have a different size. Use a vector of the same length as the data
plot(1:8, cex=1:8)
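Since cex accepts a vector, the size of each point can encode a value. A sketch with hypothetical values, scaled so that the largest point has cex = 3 (the factor 3 is an arbitrary choice):

```r
# Hypothetical values, for illustration only
v <- c(2, 5, 1, 8, 4)

# Scale the sizes so that the largest value gets cex = 3
sizes <- 3 * v / max(v)

plot(v, cex = sizes)
```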
plot(survey$height_cm, pch=16)
plot(survey$height_cm, pch=".")
The parameter pch means plot character
Each point can have a different symbol. Use a vector of the same length as the data
The plot character can be chosen by a number
plot(1:25, pch=1:25)
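Symbols 21 to 25 have both a border and a fill: col sets the border color and bg sets the fill color. A minimal sketch:

```r
# Symbols 21 to 25 use two colors: col is the border, bg is the fill
fill_symbols <- 21:25
plot(1:5, pch = fill_symbols, col = "black", bg = "yellow", cex = 2)
```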
The plot character can also be chosen by a letter
plot(1:7, pch=c("A", "T", "a", "t", ".", "0", "1"))
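Letters for pch can also come from R's built-in vectors LETTERS ("A" to "Z") and letters ("a" to "z"), indexed by numbers. The group numbers below are hypothetical.

```r
# Hypothetical group numbers, for illustration only
groups <- c(1, 3, 2, 1)

# Indexing LETTERS with numbers turns each number into a letter
symbols <- LETTERS[groups]   # "A" "C" "B" "A"
plot(1:4, pch = symbols)
```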
Notice that:
pch="." is faster to draw and easier to read when there are many points
pch=1 (an open circle) is different from pch="1" (the character "1")
LETTERS and letters can be used to transform numbers into letters: LETTERS[2] is "B"
Plots should help you tell a story.
Ask yourself:
“Is this telling the story I want to tell?”
plot(survey$height_cm, type = "l")
plot(survey$height_cm, type = "b")
plot(survey$height_cm, type = "o")
plot(survey$height_cm, type = "p")
The type depends on the story you want to tell
type="p" (points) is the default
If length(vector) > 300, better use type="p": lines joining so many points are hard to read
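To compare the types on the same data, a sketch can draw them side by side on one page with par(mfrow). The heights in `h` are hypothetical.

```r
# Hypothetical heights in cm, for illustration only
h <- c(165, 180, 170, 158, 175, 162)

types <- c("p", "l", "b", "o")

# Split the page into a 2 x 2 grid; par() returns the old settings
old <- par(mfrow = c(2, 2))
for (tp in types) {
  plot(h, type = tp, main = paste0('type = "', tp, '"'))
}
par(old)  # restore the previous layout
```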
plot(survey$height_cm, pch=16)
plot(survey$height_cm, pch=16, xlim=c(1,20))
Adding a main title, a subtitle, and x and y axis labels
plot(survey$height_cm, main="Length of survey$height_cm", sub = "51 samples", xlab="Person", ylab="Height [cm]")
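Labels can also be built from the data, so the subtitle stays correct if the number of samples changes. A sketch with hypothetical heights; paste() joins the count and the word:

```r
# Hypothetical heights in cm, for illustration only
h <- c(165, 180, 170, 158, 175)

# Build the subtitle from the data instead of typing the number by hand
subtitle <- paste(length(h), "samples")

plot(h, main = "Height of the people in the survey", sub = subtitle,
     xlab = "Person", ylab = "Height [cm]")
```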
plot(survey$height_cm, ylim=c(0,200))
points(survey$weight_kg, pch=2)
The first plot defines the scale, so its ylim must be wide enough for both vectors
points() adds points to a pre-existing plot
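Instead of typing ylim by hand, a sketch can compute it with range() over both vectors, then add a legend() so the reader knows which symbol is which. The heights and weights here are hypothetical.

```r
# Hypothetical heights (cm) and weights (kg), for illustration only
height <- c(165, 180, 170, 158)
weight <- c(60, 85, 72, 50)

# ylim covering both vectors, since the first plot fixes the scale
yl <- range(c(height, weight))

plot(height, ylim = yl)
points(weight, pch = 2)

# Explain the two symbols
legend("right", legend = c("height", "weight"), pch = c(1, 2))
```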
plot(survey$height_cm, type="l", ylim=c(0,200))
lines(survey$weight_kg, col="red")
lines() is like points(), but with type="l" by default
Both accept the type parameter, so types can be combined:
plot(survey$height_cm, type="o", ylim=c(0,200))
lines(survey$weight_kg, col="red", type="b")
The previous graphics used numeric data. What about factors?
plot(survey$handness)
Plotting a vector of type factor produces a barplot
Each bar's height corresponds to the number of elements in that level, as counted by table()
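A sketch of that equivalence with a hypothetical factor: plot() on a factor draws the same bars as barplot() on the table of counts.

```r
# Hypothetical handedness answers, for illustration only
hand <- factor(c("Right", "Left", "Right", "Right"))

counts <- table(hand)   # Left 1, Right 3

plot(hand)        # bar plot of the counts of each level
barplot(counts)   # the same bars, drawn from the table explicitly
```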
plot(survey$weight_kg)
barplot(survey$weight_kg)
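barplot() on a numeric vector draws one bar per element, and each bar's height is the value itself; nothing is counted. To count the values first, pass a table. A sketch with hypothetical weights:

```r
# Hypothetical weights in kg, for illustration only
w <- c(55, 64, 55, 74.5, 64, 55)

# One bar per element: the bar height is the weight itself
mids <- barplot(w)

# One bar per distinct value: the bar height is how many times it appears
counts <- table(w)
barplot(counts)
```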
Remember that we can use cut() to make a factor from numeric values. We need to say how many groups we want
cut(survey$weight_kg, 10)
 [1] (61.5,67.9] (55.2,61.5] (55.2,61.5] (93.3,99.7]
 [5] (55.2,61.5] (74.2,80.6] (55.2,61.5] (74.2,80.6]
 [9] (74.2,80.6] (99.7,106]  (55.2,61.5] (67.9,74.2]
[13] (55.2,61.5] (48.9,55.2] (74.2,80.6] (48.9,55.2]
[17] (99.7,106]  (67.9,74.2] (67.9,74.2] (61.5,67.9]
[21] (74.2,80.6] (42.4,48.9] (48.9,55.2] (67.9,74.2]
[25] (55.2,61.5] (55.2,61.5] (48.9,55.2] (42.4,48.9]
[29] (61.5,67.9] (61.5,67.9] (67.9,74.2] (67.9,74.2]
[33] (48.9,55.2] (48.9,55.2] (55.2,61.5] (48.9,55.2]
[37] (48.9,55.2] (55.2,61.5] (74.2,80.6] (48.9,55.2]
[41] (80.6,86.9] (48.9,55.2] (48.9,55.2] (67.9,74.2]
[45] (61.5,67.9] (61.5,67.9] (48.9,55.2] (80.6,86.9]
[49] (61.5,67.9] (74.2,80.6] (74.2,80.6]
10 Levels: (42.4,48.9] (48.9,55.2] ... (99.7,106]
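Instead of a number of groups, cut() also accepts the break points themselves, and labels can give the groups readable names. A sketch with hypothetical weights; the breaks and labels are arbitrary choices for illustration:

```r
# Hypothetical weights in kg, for illustration only
w <- c(55, 64, 42.5, 74.5, 106, 58, 81)

# Intervals (40,60], (60,80], (80,110], each with a name
groups <- cut(w, breaks = c(40, 60, 80, 110),
              labels = c("light", "medium", "heavy"))

table(groups)
```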
Now we have a factor that we can plot
plot(cut(survey$weight_kg, 10))
plot(survey$weight_kg)
hist(survey$weight_kg)
Numeric data can be grouped into classes
By default the number of classes is chosen automatically (Sturges' rule), but you can change it
Frequency means “how many times”
Histogram bars are not separated, because the classes are adjacent intervals
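hist() also returns the classes it used. With plot = FALSE it only computes them, so you can inspect the class limits and the frequency of each class. A sketch with hypothetical weights:

```r
# Hypothetical weights in kg, for illustration only
w <- c(55, 64, 42.5, 74.5, 106, 58, 81, 55, 60, 67)

h <- hist(w, plot = FALSE)

h$breaks   # limits of the classes
h$counts   # frequency: how many values fall in each class
```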
hist(survey$weight_kg, col="grey")
hist(survey$weight_kg, col="grey", nclass = 20)
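In hist(), nclass is kept for compatibility with S and means the same as breaks when given a single number. That number is only a suggestion: R rounds the class limits to "pretty" values. To fix the classes exactly, a sketch can pass a vector of break points (here, hypothetical weights and classes 10 kg wide):

```r
# Hypothetical weights in kg, for illustration only
w <- c(55, 64, 42.5, 74.5, 106, 58, 81, 55, 60, 67)

# A single number is only a suggestion for the number of classes
h1 <- hist(w, breaks = 20, plot = FALSE)

# A vector of break points fixes the classes exactly
br <- seq(40, 110, by = 10)
h2 <- hist(w, breaks = br, col = "grey")
```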