A sample R session

The following session is intended to introduce you to some features of the R environment by using them. Many features of the system will be unfamiliar and puzzling at first, but this puzzlement will soon disappear.

In the text below, text in this font is R code. To avoid confusion, the R prompt (a ">") is not displayed. Comments appear on the line after the R code they describe, and are in the font you are reading now. Short comments are prefaced by a "#" since this is the way R indicates a comment. Longer comments appear in paragraphs and are indicated by context.

None of the output from R (either text or graphics) is displayed in this document. You can copy and paste these commands (with or without comments) into an R window to see what happens. If you are trying this session yourself, I encourage you to experiment!

If you are experimenting with this session yourself, you will need a working copy of R. Instructions for installation of R on various platforms are on the preliminaries for R page.

This example is modified from the online Introduction to R manual. Modifications are quite extensive, placing more emphasis on loading datasets, manipulation of objects, and training models in "learning" problems.

Session 1: Start, help, quit

# Log in, start your windowing system.

$ R
# Start R as appropriate for your platform. The above command is the unix/linux way to start R. On Windows/Mac, you'd double click an R icon or choose from the start menu.

# The R program begins, with a banner.

# (Within R, the prompt on the left hand side will not be shown to avoid confusion.)

help.start()
Start the HTML interface to on-line help (using a web browser available at your machine). You should briefly explore the features of this facility with the mouse.

# Iconify the help window and move on to the next part.

q()
# This stops R

Session 2: constructing and manipulating vectors and matrices of data

# Now start R up again, and we'll continue with the second session.

x <- rnorm(50)
y <- rnorm(x)
# Generate two pseudo-random normal vectors of x- and y-coordinates. Because x is a vector, rnorm(x) draws length(x) values, so y also has 50 elements.
plot(x, y)
# Plot the points in the plane. A graphics window will appear automatically.
x
# Typing the name of an object will display its contents.
ls()
# See which R objects are now in the R workspace.
rm(x, y)
# Remove objects no longer needed. (Clean up). What happens if you "ls()" after this?
x <- 1:20
# Make x = (1, 2, ..., 20).
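x = 1:20
# For ordinary assignments like this one, "=" and "<-" do the same thing. ("=" inside a function call, as in data.frame(x=x), names an argument instead.)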
dummy <- data.frame(x=x, y= x + rnorm(x))
dummy
# Make a data frame of two columns, x and y, and look at it.
fm <- lm(y ~ x, data=dummy)
summary(fm)
# Fit a simple linear regression of y on x and look at the analysis.
attach(dummy)
# Make the columns in the data frame visible as variables.
plot(x, y)
# Standard point plot.
abline(0, 1, lty=3)
# The true regression line: (intercept 0, slope 1).
abline(coef(fm))
# Add the regression line.
detach()
# Remove data frame from the search path.
rm(fm, x, dummy)
# Clean up again.
q()
Quit. You will be asked if you want to save your workspace, which contains all the objects you have created. For this session, you don't need to do this. Saving a workspace will make objects you create available to you in future sessions.
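
The regression steps in this session can also be collected into one reproducible sketch. The seed value (1) and the use of with() instead of attach() are choices made here for illustration, not part of the session above; fixing the seed simply makes the simulated noise the same on every run.

```r
# A reproducible variant of the Session 2 regression (seed value is arbitrary).
set.seed(1)
x <- 1:20
dummy <- data.frame(x = x, y = x + rnorm(length(x)))  # true line: intercept 0, slope 1

fm <- lm(y ~ x, data = dummy)   # simple linear regression of y on x
coef(fm)                        # estimated intercept and slope

# with() evaluates the plotting call inside the data frame,
# so there is nothing to detach afterwards.
with(dummy, plot(x, y))
abline(0, 1, lty = 3)           # true regression line
abline(coef(fm))                # fitted regression line
```

The estimated slope should land close to 1, since the data were simulated around the line y = x.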

Session 3: A bit of learning

Now start R again. If you are running this yourself, you will need to download the e1071 package. See the preliminaries for R page if you don't know how to do this.

library(e1071)
# load the "e1071" package, which is a collection of machine learning tools and datasets.
data(Glass)
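# load the "Glass" dataset. It's built into the "e1071" package. If R reports that the data set is not found, your version of e1071 may not include it; the "mlbench" package provides a Glass dataset, which you can load with data(Glass, package="mlbench") once mlbench is installed.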

An alternate approach is to read the data from a file or from a URL (below):

glass2 <- read.table('http://www.ics.uci.edu/~mlearn/databases/glass/glass.data',
                     sep=',', header=FALSE)
We'll use the built-in version of the data, instead of the downloaded one.
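
Reading from a local file works the same way as reading from a URL. The sketch below writes three made-up comma-separated rows to a temporary file and reads them back; the values and the column names are hypothetical and only illustrate the sep and header arguments.

```r
# Write a small comma-separated file with no header line (values are made up).
tmp <- tempfile(fileext = ".data")
writeLines(c("1,1.52,13.6,1",
             "2,1.51,13.9,1",
             "3,1.52,13.5,2"), tmp)

# sep=',' splits fields on commas; header=FALSE says the first line is data.
small <- read.table(tmp, sep = ",", header = FALSE)
names(small) <- c("id", "RI", "Na", "type")   # assign hypothetical column names
dim(small)                                    # 3 rows, 4 columns
str(small)                                    # column types as R read them
unlink(tmp)                                   # remove the temporary file
```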
dim(Glass)
# What are dimensions of the data matrix?
summary(Glass)
# summarize each column of Glass.
? Glass
# Ask for help on the Glass dataset. Because the data are part of a package, there is a help page that describes the data. You can ask for help on any documented R object (either a dataset or a function) using ? or help(), for example help(Glass).
library(lattice)
histogram(~Na|Type,data=Glass)
# look at histograms of the "Na" variable for each type of glass separately. Note that the more basic "hist" command is included with R, and "histogram" is part of the more fancy "lattice" package.
plot(Glass[,1:2],col=as.numeric(Glass$Type),pch=19)
# get an idea of the separability of the classes (the glass types), using the first two variables
plot(Glass[,1:5],col=as.numeric(Glass$Type),pch=19)
# look at more scatterplots simultaneously

# Lots of things are going on in the above operations.

Glass[,1:2]
# Subscript columns 1 and 2

It's also possible to subscript rows or both rows and columns, for example

Glass[1:10,]
Glass[1:10,1:2]
Glass$Type
# We can also subscript data frames by name
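
The same subscripting rules can be tried on a small data frame built by hand. The data frame toy below and its contents are invented for illustration; the last two lines show logical subscripting, which is not used above but follows the same [rows, columns] pattern.

```r
toy <- data.frame(a = 1:5,
                  b = c(2.5, 3.5, 1.0, 4.0, 0.5),
                  g = factor(c("u", "v", "u", "v", "u")))

toy[, 1:2]          # columns 1 and 2, all rows
toy[1:3, ]          # rows 1 to 3, all columns
toy[1:3, 1:2]       # rows 1 to 3 of columns 1 and 2
toy$g               # one column, selected by name
toy[, "b"]          # the same idea using a quoted column name
toy[toy$g == "u", ] # rows where g equals "u" (logical subscript)
toy$b[toy$a > 3]    # elements of b for which a exceeds 3
```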

Below we'll fit a neural network to the data.

library(nnet)
# load the neural network library
tempdata <- Glass
tempdata[,1:9] <- scale(tempdata[,1:9])
nn1 <- nnet(Type~.,data=tempdata,size=10,decay=1)
# fit the model
predict(nn1,type='class')
table(actual=Glass$Type,predicted=predict(nn1,type='class'))
sum(Glass$Type==predict(nn1,type='class'))
# look at predictions.
# The first line above just retrieves the predicted class.
# The next line makes a table of the frequencies of all combinations of actual and predicted classes.
# The last line calculates the number of observations whose predicted class was correct.
# In the last expression, Glass$Type==predict(nn1,type='class') makes a logical vector of length 214, and sum counts the number of "TRUE" values.
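
The counting step can be seen in isolation on two short vectors. The vectors actual and predicted below are invented; they stand in for Glass$Type and the network's predictions. Taking the mean of the logical vector gives the proportion classified correctly. Keep in mind that this is accuracy on the training data, which tends to be optimistic compared with accuracy on new data.

```r
actual    <- factor(c("1", "1", "2", "2", "3", "3"))
predicted <- factor(c("1", "2", "2", "2", "3", "1"),
                    levels = levels(actual))

actual == predicted                            # logical vector, one entry per observation
sum(actual == predicted)                       # TRUE counts as 1, FALSE as 0
mean(actual == predicted)                      # proportion correct
table(actual = actual, predicted = predicted)  # correct counts lie on the diagonal
```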

Stop the press!

Dateline: May 22, 7:53 pm

Since developing other tutorial material, I have found more that I hope to cover. In any case, since the above material was distributed to students a week in advance, everyone has probably figured it out. The topics below are ordered (roughly) by concept.

Creating matrices and vectors of data

Systematically:
 
1:10                   # integers
seq(3,10,.1)           # a sequence 3 to 10 in steps of .1
seq(3,10,l=20)         # instead specify the length
rep(0,20)              # repeat 0, 20 times
c(1:10,3,seq(1,2,l=4)) # combine into a single vector

All of the above could be assigned, e.g. x<-1:10 or x=1:10

Random:

x <- c(1,1,1,1,2,2,3)
sample(x,2)      # pick 2 elements of x w/o replacement
sample(x)        # permute elements of x
sample(x,rep=T)  # sample length(x) elements with replacement
rnorm(10,2,1)    # simulate 10 N(2,1) observations
rbinom(10,1,.5)  # simulate 10 binomials with 1 trial each, and prob=.5
rbinom(10,1,1:10/11) # second or third (here) argument can be vector
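
Because these functions draw pseudo-random numbers, repeated calls give different results. If you need to reproduce a result exactly, set the seed of the generator first. The seed value 42 below is arbitrary.

```r
set.seed(42)            # fix the state of the random number generator
a <- rnorm(5)
set.seed(42)            # same seed again
b <- rnorm(5)
identical(a, b)         # the same seed gives the same draws

x <- c(1, 1, 1, 1, 2, 2, 3)
p <- sample(x)          # a random permutation of x
sort(p)                 # sorting shows it contains the same elements as x
```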

Matrix operations

x <- matrix(1:6,2,3)
y <- matrix(c(1,1,1),3,1)
x%*%y            # matrix multiplication
x*x              # elementwise multiplication
mean(x)          # mean of elements of x
apply(x,2,mean)  # apply the mean function over dimension 2,
                 # i.e. compute one mean per column
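
For the 2 x 3 matrix above, the margin argument of apply decides which dimension is kept. The sketch below compares margins 1 and 2 and also uses rowMeans and colMeans, which are base R shortcuts for the same computation.

```r
x <- matrix(1:6, 2, 3)   # filled column by column: columns are (1,2), (3,4), (5,6)
y <- matrix(c(1, 1, 1), 3, 1)

x %*% y                  # 2 x 1 result; multiplying by a column of ones gives row sums
apply(x, 1, mean)        # margin 1: one mean per row
apply(x, 2, mean)        # margin 2: one mean per column
rowMeans(x)              # same values as apply(x, 1, mean)
colMeans(x)              # same values as apply(x, 2, mean)
t(x)                     # transpose: a 3 x 2 matrix
```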

Loops, functions

sqr <- function(x) {
  x*x
}  # define a function
sqr(3.3)
y <- rep(0,10)  
for (i in 1:10){
  y[i] <- sqr(i)
}
y
WARNING: The above is an inefficient way to square the elements of a vector. In general, in R, it's much faster to "vectorize" operations than to carry out for loops, for example y <- (1:10)*(1:10). However, situations will arise when loops are necessary.
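
A short sketch comparing the loop with two alternatives. The variable names y_loop, y_vec and y_sapply are introduced here for illustration only.

```r
sqr <- function(x) {
  x * x
}

# Loop version: fill a preallocated vector one element at a time.
y_loop <- numeric(10)
for (i in 1:10) {
  y_loop[i] <- sqr(i)
}

# Vectorized version: arithmetic operators already work elementwise,
# so sqr can be called on the whole vector at once.
y_vec <- sqr(1:10)

# sapply applies a function to each element and collects the results.
y_sapply <- sapply(1:10, sqr)

all.equal(y_loop, y_vec)      # compare loop and vectorized results
all.equal(y_loop, y_sapply)   # compare loop and sapply results
```

For a vector this short the speed difference is negligible; it becomes noticeable as vectors grow.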

Object-oriented programming

R will call an appropriate method on the basis of the class of the first argument provided. In plain English, a command like plot(x) will do different things depending on whether x is a vector, a matrix, a data frame, or an object created by a complex function like lm. We'll encounter object-oriented functions such as plot, summary, and predict.
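
The mechanism behind this is usually the S3 system: a generic function looks at the class of its first argument and calls a matching method. A minimal sketch follows; the generic describe and its methods are hypothetical names invented for this example, not functions that come with R.

```r
# A generic function: UseMethod dispatches on the class of the first argument.
describe <- function(obj, ...) UseMethod("describe")

# Fallback used when no more specific method exists.
describe.default <- function(obj, ...) {
  paste("an object of class", class(obj)[1])
}

# Method chosen automatically for data frames.
describe.data.frame <- function(obj, ...) {
  paste("a data frame with", nrow(obj), "rows and", ncol(obj), "columns")
}

# Method chosen automatically for fitted linear models.
describe.lm <- function(obj, ...) {
  paste("a linear model with", length(coef(obj)), "coefficients")
}

d <- data.frame(x = 1:10, y = c(2, 4, 5, 4, 6, 7, 8, 9, 12, 11))
describe(d)                 # calls describe.data.frame
describe(lm(y ~ x, d))      # calls describe.lm
describe(1:3)               # no more specific method, so describe.default is used
```

plot, summary and predict work the same way: methods(summary) lists the classes for which a summary method is currently available.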

Statistical Models in R

Many models are specified in a unified framework, even though the details of models differ. Below is a summary of some of the notation we'll use.

Model formulas identify the predictors (inputs, features) and response (output, target). They can specify functions of predictors to use, automatically generate indicator variables for categorical predictors, and represent more complex modelling structure, such as nested terms, conditional structure, etc. The variables identified in the formula usually correspond to named columns of a data frame.

library(MASS)
crabs[1:2,]
glm(sp~FL,data=crabs,family=binomial)   # use FL only as a predictor
glm(sp~.,data=crabs,family=binomial)    # use everything except sp as a predictor
glm(sp~.-sex-index,data=crabs,family=binomial) # exclude terms sex and index
glm(sp~FL+RW,data=crabs,family=binomial) # use FL and RW as predictors.
The way in which a predictor actually enters the model will depend on the model being fit. For example, a linear regression (lm) or generalized linear model (glm) may use them as linear predictors, while a tree or a neural network will estimate nonlinear functions and possibly interactions. In these latter cases, the main purpose of the formula is to say which variables to use.
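
To see what a formula does with a categorical predictor or a function of a predictor, you can ask R for the design matrix that a linear model would use. The data frame d below is invented for illustration; model.matrix is part of the standard stats package.

```r
d <- data.frame(y = c(1.0, 2.1, 2.9, 4.2),
                g = factor(c("a", "b", "a", "b")),
                x = c(0.5, 1.0, 1.5, 2.0))

# The factor g is expanded into an indicator column
# (named "gb", equal to 1 when g is "b").
model.matrix(y ~ g + x, data = d)

# I() protects arithmetic so that x^2 is treated as a squared predictor.
model.matrix(y ~ x + I(x^2), data = d)

# "." means all other columns; "-" removes a term.
colnames(model.matrix(y ~ . - x, data = d))
```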

Other goodies

  1. str is a good way to "look under the hood" of any object without getting 10,000 lines of printed output...
    junk <- kmeans(matrix(rnorm(100),25,4),4)
    str(junk)           # show a compact summary of the "str"ucture of junk
    str(junk,2)         # as above, but only recurse to depth 2
    
  2. We don't have the time or resources to show you a really cool visualization package called "ggobi". It's designed for dynamic, interactive graphics, and has recently become more stable and has an R interface. See www.ggobi.org .

...In the lab...

  1. You will be experimenting with quite a bit of R code. A fair number of examples are in place, but these were written with the attitude "you're a smart bunch, and should be able to figure things out."
  2. You should probably have an editor window open and develop code there, pasting it into R as you go. This way you keep a record of your work and it's easier to modify.
  3. If you like, you can execute a file of commands (say "mycommands.txt") using source("mycommands.txt"). R has a batch mode but it's not likely to be relevant here.
  4. You're welcome to work in teams. At least, if you haven't used R before, try to sit beside someone who has, so they can help you out.
  5. So now we go to the lab. You'll log in, start R, see some of the system highlights, and install some libraries. Then there is a series of things to try, with questions and suggestions for modification.

    In places these labs may be "over the top", in that they present much more material than you have a hope of finishing. Don't panic. Pick and choose what looks interesting. Ask lots of questions. Have fun.
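
Item 3 in the list above mentions source(). A minimal sketch, assuming a temporary file stands in for "mycommands.txt"; the commands written to the file and the object names z and total are made up for illustration.

```r
# Write two commands to a temporary file (stands in for "mycommands.txt").
cmdfile <- tempfile(fileext = ".txt")
writeLines(c("z <- (1:5)^2",
             "total <- sum(z)"), cmdfile)

source(cmdfile)               # run the commands in the file
total                         # objects created by the file are now in the workspace
source(cmdfile, echo = TRUE)  # echo=TRUE also prints each command as it runs
unlink(cmdfile)               # remove the temporary file
```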


The last time I remembered to update the "modification date" for this page was May 22, 2006.
Hugh Chipman, Acadia University