Class 14: Some summary statistics

November 28th, 2016

Some summary statistics

Summary of a data.frame

Let’s take a minute to look at the data about US states, stored in the built-in matrix state.x77

head(state.x77)

           Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
California      21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766
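Before averaging anything, it helps to confirm what kind of object state.x77 is, because that decides which functions apply. A quick check using only base R (state.x77 comes with the datasets package, which R loads by default):

```r
# state.x77 ships with base R (package "datasets"), so no library() call is needed
x <- state.x77
is.matrix(x)   # TRUE: a numeric matrix, not a data frame
dim(x)         # 50 rows (one per state) and 8 columns (variables)
colnames(x)    # the variable names shown in the header above
```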

What is the average of each column?

Averages of matrices

If we apply mean() directly, we get

mean(state.x77)

[1] 9956.887

That is the average of all 400 numbers in the matrix (50 states times 8 columns), mixing populations, incomes, areas and so on, which is probably not what we want
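To see that mean() really ignores the rows and columns, we can compare it with the sum of all entries divided by their count. This is a small sanity check, not something the slides require:

```r
x <- state.x77
# mean() on a matrix ignores its shape and averages every entry
overall <- mean(x)
by_hand <- sum(x) / length(x)   # length(x) is 50 * 8 = 400
all.equal(overall, by_hand)     # TRUE
```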

Instead, we can use

colMeans(state.x77)

Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost       Area 
 4246.4200  4435.8000     1.1700    70.8786     7.3780    53.1080   104.4600 70735.8800
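Each entry of the result is just the ordinary mean of one column. A minimal check on the Murder column, picking the entry out of the named vector by its name:

```r
x <- state.x77
col_avg <- colMeans(x)
# colMeans() keeps the column names, so one entry can be pulled out by name
col_avg["Murder"]
# ...and it agrees with averaging that single column directly
mean(x[, "Murder"])   # 7.378, matching the output above
all.equal(unname(col_avg["Murder"]), mean(x[, "Murder"]))   # TRUE
```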

Summary graphics

The result of colMeans() is a named vector, so it can be plotted easily with barplot()

barplot(colMeans(state.x77))
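In this plot the Area bar (about 70,736) dwarfs columns such as Illiteracy (1.17), so most bars are barely visible. One possible fix, which is our own choice and not part of the original slides, is to divide each column mean by that column's maximum so that every bar lies between 0 and 1:

```r
x <- state.x77
avg <- colMeans(x)
# Rescale: mean of each column divided by the largest value in that column
# (this normalization is an illustrative assumption, not the only option)
relative <- avg / apply(x, 2, max)
# las = 2 turns the axis labels perpendicular so long names fit
barplot(relative, las = 2, ylab = "mean / max")
```

The bars now compare how "typical" each column's average is relative to its extreme value, instead of comparing raw units.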

Averages of data frames

If all the columns are numeric, we can also use colMeans() on data frames
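If a data frame has a text column, colMeans() stops with the error "'x' must be numeric". A minimal sketch of one workaround, assuming we build such a data frame ourselves by adding the state names as a column, is to keep only the numeric columns first:

```r
# A data frame with one text column plus the eight numeric columns
# (data.frame() renames "Life Exp" to "Life.Exp" and "HS Grad" to "HS.Grad")
states <- data.frame(name = rownames(state.x77), state.x77)
# colMeans(states) would fail here because the "name" column is not numeric
numeric_cols <- sapply(states, is.numeric)   # TRUE/FALSE for each column
colMeans(states[, numeric_cols])
```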

There are also similar functions for rows and for sums:

rowMeans()
colSums()
rowSums()
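A short sketch of these functions on state.x77. The row example counts, for each state, on how many of the 8 variables it is above the column average; that particular question is just an illustration we chose:

```r
x <- state.x77
# colSums(): one total per column
totals <- colSums(x)
totals["Population"]          # 212321, i.e. 50 * 4246.42
# colMeans() is just colSums() divided by the number of rows
all.equal(colMeans(x), totals / nrow(x))   # TRUE
# rowSums() on a TRUE/FALSE matrix counts the TRUE values in each row:
# sweep() compares every entry with its column mean
above <- sweep(x, 2, colMeans(x), FUN = ">")
head(rowSums(above))          # one count (0 to 8) per state
```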

For other summaries, such as the median of each column, we can use apply(), which is more general (and a bit more advanced)

Read the manual page of `apply()` (type `?apply` in R)
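A few typical calls, as a starting point for reading the manual. The second argument (MARGIN) says whether the function runs over rows (1) or columns (2), and any extra arguments are passed on to the function:

```r
x <- state.x77
# MARGIN = 2 applies the function to each column (MARGIN = 1 would use rows)
apply(x, 2, median)                  # column medians (base R has no colMedians())
apply(x, 2, sd)                      # column standard deviations
apply(x, 2, quantile, probs = 0.9)   # extra arguments go to the function
# apply(x, 2, mean) gives the same numbers as colMeans(x), but colMeans() is faster
all.equal(apply(x, 2, mean), colMeans(x))   # TRUE
```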

Application: Exam Grades

Please go to

http://anaraven.github.io/cmb1/2016/midterm/

Each question is worth 0 to 6 points. That makes it easy to represent the fractions of the full score

\[ 0, \quad \frac{1}{6}, \quad \frac{1}{3},\quad\frac{1}{2},\quad\frac{2}{3},\quad \frac{5}{6},\quad 1 \]

You can load the grades into R using read.table()
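The grade file itself is not reproduced here, so this sketch assumes a hypothetical layout: one row per student, a student column, then one column per question holding 0 to 6 points. The numbers are made-up toy values typed inline, not real grades. Dividing by 6 turns points into the fractions listed above, and then the functions from this class give per-question and per-student averages:

```r
# Hypothetical toy data; with the real file you would pass its name to read.table()
grades <- read.table(header = TRUE, text = "
student Q1 Q2 Q3
A       6  3  4
B       2  5  6
C       0  6  1
")
points <- grades[, -1]       # drop the student column, keep the scores
fractions <- points / 6      # 0..6 points become 0, 1/6, ..., 1
colMeans(fractions)          # average fraction per question
rowMeans(fractions)          # average fraction per student
```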