September 27th, 2018

About your questions

If you have any questions about the course or homework

  • Send your questions to iu-cmb@googlegroups.com by email. You get points for doing this
  • You can also use the web page https://groups.google.com/d/forum/iu-cmb
  • Write in English or Turkish
  • You can also answer other people’s questions. You get extra points if you do so

About your answers

  • Send your answers to andres.aravena+cmb@istanbul.edu.tr
  • If you send them to any other address, I will not see them
  • Send only .Rmd files. No .pdf, no .html
  • Always write your student number in the header

About Structured Documents

We want to identify the meaning, not the shapes

Documents have two components:

  • visual and design aspects (presentation and style)
  • core material and structure (content) of a document

This is called “Separation of content and presentation”. Google it

We can also do it in Word

Once you have identified the structure of the document, you have to describe it to the computer

Markdown is one way to describe the structure. There are other ways

You can also do it in Word, using the mouse

(but the keyboard is faster)

R and RStudio

How to use RStudio

You have to install R and RStudio on your computer

You have to start RStudio. Then

  • We read data from one or more files
  • We transform this data according to a program we design
  • We write the results to new files

Command line

RStudio, like almost all serious programs, is controlled by the keyboard

The mouse can be used for some shortcuts, but the real deal is the keyboard

A goal of this course is to become comfortable with the keyboard

These tools are for people who read books and don’t watch TV

The keyboard

your real friend

Talking with the computer

R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

This > symbol is called the prompt

You do not write the > part. This is a message from the computer to you

You write after the prompt

prompt [präm(p)t]

verb

  • Assist or encourage (a hesitating speaker) to say something: “What do you want?” he prompted.
  • Computing (of a computer) request input from a user.

From “New Oxford American Dictionary”

An interactive session

  • The computer shows the prompt
  • You write some commands using the keyboard
  • You finish by pressing Enter or Return
  • The computer executes your commands
  • When the execution finishes you get a new prompt

and repeat

Tab is your friend

In RStudio you can press Tab and get superpowers!

  • The computer will propose alternatives depending on the context
  • You can select the good one using the arrows
  • If there is only one option then it is completed automatically
  • You write faster and make fewer mistakes

You can also repeat and edit previous commands using the arrows

You can delete the whole line using Escape

Learning a new Language

beyond English

Basic Rules of a Language

Each phrase in a program is imperative.

Each phrase involves nouns, verbs and adverbs

Today we will focus on nouns

The first verb we need today is assign <-

Data represent objects

  • We know that computers store numbers
  • The numbers represent other things
  • What they represent depends on the type of the object
  • How they are used depends on the structure of the object

Objects

Every object in R has 2 important properties:

Type
What does it represent
Structure
How can we read and modify parts of it
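
As a minimal sketch of these two properties, the lines below build a small numeric vector and ask R about it. The name heights and its values are made up for this illustration. The functions class() and length() and the square brackets are standard R, although indexing with brackets is not covered in these slides.

```r
# 'heights' is a hypothetical vector used only for this example
heights <- c(150, 162, 175)

# Type: what the values represent
class(heights)    # "numeric"

# Structure: how many parts it has, and how to reach one of them
length(heights)   # 3
heights[2]        # read the second element: 162

# Modify one part and look at the whole vector again
heights[2] <- 160
heights           # 150 160 175
```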

Basic Objects

Nouns are names of objects

To handle objects we give them names

We “store” the objects in variables

If we don’t give a name to an object, it is lost forever

Vectors

The simplest objects in R

rivers
  [1]  735  320  325  392  524  450 1459  135  465  600  330  336  280  315
 [15]  870  906  202  329  290 1000  600  505 1450  840 1243  890  350  407
 [29]  286  280  525  720  390  250  327  230  265  850  210  630  260  230
 [43]  360  730  600  306  390  420  291  710  340  217  281  352  259  250
 [57]  470  680  570  350  300  560  900  625  332 2348 1171 3710 2315 2533
 [71]  780  280  410  460  260  255  431  350  760  618  338  981 1306  500
 [85]  696  605  250  411 1054  735  233  435  490  310  460  383  375 1270
 [99]  545  445 1885  380  300  380  377  425  276  210  800  420  350  360
[113]  538 1100 1205  314  237  610  360  540 1038  424  310  300  444  301
[127]  268  620  215  652  900  525  246  360  529  500  720  270  430  671
[141] 1770

Vectors

  • Group of values, all with the same type
  • Basic types are
    • Character
    • Numeric
    • Logical
    • Factor

Factors

Also known as categorical variables.

They are used for discrete values, for example when there is no natural order

  • Color
  • Gender/Sex
  • Country of Origin

These are variables that you would never average
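
A small sketch of why averaging makes no sense for a factor, and what is usually done instead. The vector colour and its values are invented for this example. The function factor() converts a character vector into a factor, and table() counts how many times each category appears.

```r
# 'colour' is a hypothetical factor used only for this example
colour <- factor(c("red", "blue", "red", "green"))

# Counting each category is meaningful
counts <- table(colour)
counts

# Averaging is not: mean() returns NA for a factor.
# Without suppressWarnings(), R also prints a warning message.
avg <- suppressWarnings(mean(colour))
avg    # NA
```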

Examples

Data about USA states

The R system already has some predefined vectors

For historical reasons, it has data about USA states

Later we will use data about Turkey and other countries

Today we use US states just as an example

Example: character vector

US States

state.name
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"      
 [5] "California"     "Colorado"       "Connecticut"    "Delaware"      
 [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"         
[13] "Illinois"       "Indiana"        "Iowa"           "Kansas"        
[17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"      
[21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"   
[25] "Missouri"       "Montana"        "Nebraska"       "Nevada"        
[29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"      
[33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"      
[37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
[41] "South Dakota"   "Tennessee"      "Texas"          "Utah"          
[45] "Vermont"        "Virginia"       "Washington"     "West Virginia" 
[49] "Wisconsin"      "Wyoming"       

Example: character vector

US States

state.abb
 [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN"
[15] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV"
[29] "NH" "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN"
[43] "TX" "UT" "VT" "VA" "WA" "WV" "WI" "WY"

Example: numeric vector

US States

state.area
 [1]  51609 589757 113909  53104 158693 104247   5009   2057  58560  58876
[11]   6450  83557  56400  36291  56290  82264  40395  48523  33215  10577
[21]   8257  58216  84068  47716  69686 147138  77227 110540   9304   7836
[31] 121666  49576  52586  70665  41222  69919  96981  45333   1214  31055
[41]  77047  42244 267339  84916   9609  40815  68192  24181  56154  97914

Example: logical vector

US States

state.area > 80000
 [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
[12]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[23]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
[34] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
[45] FALSE FALSE FALSE FALSE FALSE  TRUE

A logical vector can be created using a comparison
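
One possible use of such a comparison, as a sketch: count the states that pass the test and list their names. The variable name big is chosen just for this example. Using a logical vector inside square brackets goes a little beyond this slide, so take it as a preview.

```r
# 'big' is a name chosen for this example
big <- state.area > 80000

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(big)          # 14

# Keep only the names where 'big' is TRUE
state.name[big]
```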

Example: factor vector

US States

state.region
 [1] South         West          West          South         West         
 [6] West          Northeast     South         South         South        
[11] West          West          North Central North Central North Central
[16] North Central South         South         Northeast     South        
[21] Northeast     North Central North Central South         North Central
[26] West          North Central West          Northeast     Northeast    
[31] West          Northeast     South         North Central North Central
[36] South         West          Northeast     Northeast     South        
[41] North Central South         South         West          Northeast    
[46] South         West          South         North Central West         
Levels: Northeast South North Central West
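
To check the type of each of the example vectors shown so far, one option is the standard function class(). This is only a sketch to try at the prompt. It uses the same predefined vectors as the examples.

```r
class(state.name)          # "character"
class(state.abb)           # "character"
class(state.area)          # "numeric"
class(state.area > 80000)  # "logical"
class(state.region)        # "factor"

# All of them describe the same 50 states
length(state.name)         # 50
```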

Creating vectors

Simple concatenation

c(1,2,3)
[1] 1 2 3
c(10,20)
[1] 10 20

The function c() takes many values and makes a single vector. All values should be of the same type

Exercise for home

What happens if you create a vector with elements of different type?

We will discuss this in the next class

Storing vectors in variables

x <- c(1,2,3)
y <- c(10,20)

We use the <- operator for assignment.

x
[1] 1 2 3
y
[1] 10 20

Vectors can also be concatenated

x and y are two numeric vectors. We can concatenate them

c(x, y, 5)
[1]  1  2  3 10 20  5
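
A short sketch of two consequences of concatenation: the result is again a single vector that can be stored, and the order of the arguments decides the order of the elements. The name z is chosen only for this example.

```r
x <- c(1, 2, 3)
y <- c(10, 20)

# Store the concatenated vector under a new name
z <- c(x, y, 5)
length(z)   # 6

# Changing the order of the arguments changes the result
c(y, x)     # 10 20 1 2 3
```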

Creating Logical Vectors

c(TRUE, TRUE, FALSE, TRUE)
[1]  TRUE  TRUE FALSE  TRUE

We can also write c(T,T,F,T)

  • T is short for TRUE
  • F is short for FALSE
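
A word of caution, shown as a small sketch: T and F are ordinary variables that happen to hold TRUE and FALSE, so they can be overwritten by accident. TRUE and FALSE are reserved words and cannot be changed. The name shadowed is invented for this example.

```r
# The short and long forms give the same vector
identical(c(T, T, F, T), c(TRUE, TRUE, FALSE, TRUE))   # TRUE

# T is just a variable, so R lets us replace its value
T <- 0
shadowed <- c(T, T)
shadowed        # 0 0, no longer logical

# Remove our own T, so that T means TRUE again
rm(T)
T               # TRUE
```

Writing TRUE and FALSE in full avoids this kind of surprise.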

Creating Logical Vectors

A comparison creates a logical vector

weight <- c(60, 72, 57, 90, 95, 72)
weight > 25
[1] TRUE TRUE TRUE TRUE TRUE TRUE
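
With a different threshold the same vector gives a mix of TRUE and FALSE. The sketch below uses the same weight values. The threshold 70 and the name heavy are chosen only for illustration.

```r
weight <- c(60, 72, 57, 90, 95, 72)

# One answer for each element of 'weight'
heavy <- weight > 70
heavy          # FALSE TRUE FALSE TRUE TRUE TRUE

# Other comparisons work the same way; == tests for equality
weight == 72   # FALSE TRUE FALSE FALSE FALSE TRUE
```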

Character vectors

Same idea. Concatenation

Each element must be between single or double quotes

c("alpha", 'beta', "gamma")
[1] "alpha" "beta"  "gamma"

You can use either ' or ", but you have to be consistent: each element must start and end with the same kind of quote

Writing quotes inside quotes

You can use single quotes inside double quotes, and vice-versa

c('he said "yes"', "I don't know")
[1] "he said \"yes\"" "I don't know"   

Some special characters are coded with two symbols: \", \\, \n, \t
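
The backslash is only part of how R prints the text, not part of the text itself. A minimal sketch, using the standard functions print(), cat() and nchar(); the name quoted is made up for this example.

```r
# 'quoted' is a hypothetical name for this example
quoted <- 'he said "yes"'

print(quoted)       # shows the escaped form: "he said \"yes\""
cat(quoted, "\n")   # shows the real text:    he said "yes"

# The backslashes are not counted as characters
nchar(quoted)       # 13

# Each two-symbol code stands for one character
nchar("\n")         # 1 (new line)
nchar("a\tb")       # 3 (a, tab, b)
```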

Factor vectors

Easy. Any character vector can be transformed into a factor

chr.vector <- c("female", "male", "male", "female", "male", "male", 
                "female", "female")
chr.vector
[1] "female" "male"   "male"   "female" "male"   "male"   "female" "female"
fact.vector <- factor(chr.vector)
fact.vector
[1] female male   male   female male   male   female female
Levels: female male
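
To look inside the factor, one option is the sketch below. It rebuilds the same vectors and then uses the standard functions levels() and as.integer(). The last step, choosing the order of the levels with the levels argument of factor(), is an extra that is not in the slides. The name reordered is invented for this example.

```r
chr.vector <- c("female", "male", "male", "female", "male", "male",
                "female", "female")
fact.vector <- factor(chr.vector)

# The distinct categories, in alphabetical order by default
levels(fact.vector)        # "female" "male"

# Internally each element is stored as the number of its level
as.integer(fact.vector)    # 1 2 2 1 2 2 1 1

# Optional: choose the order of the levels yourself
reordered <- factor(chr.vector, levels = c("male", "female"))
as.integer(reordered)      # 2 1 1 2 1 1 2 2
```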