### Solutions ### ### Exercise 1 # Data d = read.table("pr1012.txt", header=TRUE) obj = lm(y ~ x1 + I(x1^2) + x2 + I(x2^2) + I(x1*x2), data=d) summary(obj) # backwards elimination summary(lm(y ~ x1 + I(x1^2) + I(x2^2) + I(x1*x2), data=d)) summary(lm(y ~ I(x1^2) + I(x2^2) + I(x1*x2), data=d)) summary(lm(y ~ I(x1^2) + I(x2^2), data=d)) summary(lm(y ~ I(x2^2), data=d)) # R can do this automatically with the step() function, but this # uses the so-called Akaike information criterion for comparing # models - there are in fact many different versions of backwards # elimination. # Let's try out the step() function step(obj) # Not the same model. # Beware: It is considered bad statistical practice to rely on an # automatic model selection procedure by some. # Cleaning up rm(list=ls()) ### Exercise 2 y = scan("pr0506.txt") # We use "gl" to generate levels of factors G = gl(2, 3, 18); G P = gl(3, 6); P ## Question (b) obj1 = aov(y ~ G * P) anova(obj1) # No interaction indicated, additive model accepted. ## Question (a) # Arguably, one should continue with the model aov(y ~ G + P) # This is rarely done or discussed in American textbooks. # My position is that if you accept a model then you should use it! obj2 = aov(y ~ G + P) anova(obj2) # Both main effect are highly significant, Glass very much so. ## Question (c) res = resid(obj2) yhat = fitted(obj2) plot(res ~ yhat); abline(h=0, lty=2) plot(res ~ G); abline(h=0, lty=2) plot(res ~ P); abline(h=0, lty=2) plot(res); abline(h=0, lty=2) # If you don't like the box plots, you can recode the factors to # numeric variables like this: plot(res ~ as.numeric(as.character(G))); abline(h=0, lty=2) rm(list=ls()) ### Exercise 3 # (b) y = scan("pr0509.txt") temp = rep(c(100,125,150),each=9) glass = gl(3,3,27) # Plot of the data plot(y ~ temp, col=rep(rep(c("red", "green", "blue"),3), rep(3,9)), pch=16) # The blue dots does not seem to follow a straight line # We will need a second order polynomial with different coefficients # for each glass type: anova(lm(y ~ (I(temp^2) + temp) * glass)) # All the terms seem relevant (small p-values) summary(lm(y ~ (I(temp^2) + temp) * glass)) # But all the terms containing glass2 is potentially zero - this means # there probably is very little difference between type 1 and 2 # Let's plot the model as it is a = coef(lm(y ~ (I(temp^2) + temp) * glass)) curve(a[1]+a[2]*x^2+a[3]*x,add=TRUE,col="red") curve(a[1]+a[4]+(a[2]+a[6])*x^2+(a[3]+a[8])*x,add=TRUE,col="green") curve(a[1]+a[5]+(a[2]+a[7])*x^2+(a[3]+a[9])*x,add=TRUE,col="blue") # Lots of other things should be done, e.g. residual plots, # confidence/prediction intervals, etc., but we will stop here