##############################
#                            #
#        LECTURE FOUR        #
#                            #
##############################


############################# PART ONE ###############################


# We start by loading the datasets for Chapter 3.
etch = read.table("etch.txt",header=TRUE)
y = etch$y; power = as.factor(etch$power)

# For the Plasma Etching data (Example 3.1) we assumed the model given
# in Sec.3.2 (p.68). Now we shall check if the model is reasonable,
# using residuals. We form the ANOVA object again.
obj = aov(y ~ power)

# The residuals are the observed values minus the fitted values. If the
# model is correct, they should look random, i.e. show no discernible
# structure.
res = resid(obj)

# We find these numbers in the small boxes in Table 3.6 (p.81). The
# table also shows one more, very important, thing: the randomised run
# order. It would be a serious error to first run all the experiments
# with power 160W, then those with power 180W, etc. The order numbers
# are not included in the data frame in R, so we enter them by hand.
# (The variable is called order, like the base function order(); calls
# to the function itself still work.)
order = c(13,4,14,8,5,17,6,18,16,9,20,1,19,7,10,2,15,11,12,3)

# In the normal probability plot, Figure 3.4, I prefer the axes
# swapped, i.e. having observations on the ordinate axis. If you want
# the form in the book, you can use the option datax=TRUE
qqnorm(res, pch=15)
qqline(res)
# The plot gives no reason to doubt the normality assumption.
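
# A minimal sketch of the book's layout (observations on the abscissa),
# using the datax=TRUE option mentioned above. The vector demo_res is
# made-up toy data, not the etch residuals.
demo_res = c(-1.8, 0.4, 2.1, -0.6, 1.2, -2.5, 0.9, 0.3)
demo_qq = qqnorm(demo_res, datax=TRUE, pch=15)
qqline(demo_res, datax=TRUE)
# qqnorm() invisibly returns the plotted coordinates; with datax=TRUE
# the data are in the x component and the normal quantiles in y.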

# Next we make an index plot, i.e. residuals against the order of
# experimentation, as in Figure 3.5
plot(res ~ order, pch=15)
abline(h=0, lty=2)
# This plot is not the same as Figure 3.5. The correct index plot is
# better than Montgomery's, because it illustrates a point: contrary to
# Figure 3.5, there is a clear decreasing tendency, which shows the
# importance of randomisation. Had we not randomised the run order, we
# would have made a systematic error, probably undiscovered, instead of
# a random error governed by the laws of probability.
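
# A minimal sketch of how a randomised run order could be generated for
# a new experiment with 4 treatments and 5 replicates. The name
# demo_design is hypothetical, and the seed is arbitrary (it is only
# there to make the example reproducible).
set.seed(1)
demo_design = data.frame(power = rep(c(160, 180, 200, 220), each=5))
demo_design$run = sample(nrow(demo_design))  # random permutation of 1..20
demo_design[order(demo_design$run), ]        # the design listed in run order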

# The standard residual plot, residuals against fitted values as in
# Figure 3.6, shows no pattern (except for the built-in grouping on the
# abscissa).
plot(res ~ fitted(obj), pch=15)
abline(h=0, lty=2)

# Next we turn to the question of variance homogeneity using
# Bartlett's test:
bartlett.test(etch$y, etch$power)
# Montgomery also mentions the Levene test. We could follow the
# recipe in Montgomery, but there is a simpler way: it is implemented
# in the package car, available on CRAN.
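
# A minimal sketch of both routes on made-up toy data (demo_y and
# demo_g are hypothetical, not the etch data). By hand, the modified
# Levene test is a one-way ANOVA on the absolute deviations from the
# group medians. The car function is only called if the package is
# installed.
demo_y = c(10, 12, 11, 13, 9,  20, 25, 15, 30, 10,  5, 6, 5, 7, 6)
demo_g = factor(rep(c("A", "B", "C"), each=5))
demo_d = abs(demo_y - ave(demo_y, demo_g, FUN=median))
demo_lev = anova(aov(demo_d ~ demo_g)); demo_lev
if (requireNamespace("car", quietly=TRUE)) {
  # center=median gives the median-based (modified) version
  print(car::leveneTest(demo_y ~ demo_g,
                        data=data.frame(demo_y, demo_g), center=median))
}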

# Cleaning up
rm(list=ls())

############################## EXERCISE #############################


# Problem 1
# Look at the Peak Discharge dataset (Table 3.7, p.86). If you have
# done the above, you should already have the data frame peak in R.
# (a) Do the ANOVA in Table 3.8.
# (b) Check the model with a residual plot and Bartlett's test -
# you should find that the variances are not homogeneous.
# (c) A transformation can fix this. Check Table 3.9 for possible
# transformations, calculate the mean and standard deviation for
# each treatment, and make Figure 3.8. Fit a straight line to log(sd)
# versus log(mean) (hint: the command lm(y~x) fits a straight line to
# data y versus x, and abline(lm(y~x)) plots it). Check that the slope
# is around 1/2, justifying a square root transformation of y.
# (d) Apply the square root transformation and redo the ANOVA table,
# residual plot and Bartlett's test to check that everything is now OK.
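
# The idea behind (c) can be sketched on simulated data (not the peak
# discharge data). Poisson counts have sd = sqrt(mean), so the slope of
# log(sd) on log(mean) should come out close to 1/2. All sim_* names,
# the chosen means and the seed are assumptions for illustration only.
set.seed(42)
sim_mu = c(4, 16, 64, 256)
sim_g = factor(rep(sim_mu, each=200))
sim_y = rpois(length(sim_g), lambda=rep(sim_mu, each=200))
sim_mean = tapply(sim_y, sim_g, mean)
sim_sd = tapply(sim_y, sim_g, sd)
sim_fit = lm(log(sim_sd) ~ log(sim_mean))
plot(log(sim_sd) ~ log(sim_mean), pch=15)
abline(sim_fit)
coef(sim_fit)[2]  # estimated slope alpha; Table 3.9 then suggests y^(1-alpha)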


########################### PART TWO ###############################


# We use the etch dataset again to illustrate contrasts as well as
# Tukey's test, Fisher's LSD test and Dunnett's test
etch = read.table("etch.txt",header=TRUE)
y = etch$y; power = as.factor(etch$power)

# We know from the ANOVA table (R or Table 3.4) that power influences
# the etch rate. The four levels of power do not all have the same
# effect, but some of them still might. We start with the ANOVA table
obj = aov(y ~ power)
anova(obj)   # This is Table 3.4

# We could just read off the Mean Square Error from the table, but
# computing it directly gives more digits. A direct way, using the
# formulas on p.71-72, is as follows:
SSE = sum(resid(obj)^2); SSE
MSE = SSE/df.residual(obj); MSE  # error df = N - a = 20 - 4 = 16
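
# A minimal sketch of other ways to get the MSE at full precision,
# shown on made-up toy data (the demo_* names are hypothetical). All
# three expressions give the same number.
demo_y = c(10, 12, 11,  20, 25, 15,  5, 6, 7)
demo_g = factor(rep(c("A", "B", "C"), each=3))
demo_fit = aov(demo_y ~ demo_g)
demo_mse1 = sum(resid(demo_fit)^2)/df.residual(demo_fit)
demo_mse2 = anova(demo_fit)["Residuals", "Mean Sq"]
demo_mse3 = summary.lm(demo_fit)$sigma^2
c(demo_mse1, demo_mse2, demo_mse3)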

# We also need the group averages to get the contrasts
powmean = tapply(y, power, mean); powmean

# Contrasts
# Contrasts ought to be motivated by the subject matter, not chosen by
# looking at the data. If they are nevertheless chosen after inspecting
# the data ("data snooping"), Scheffe's method must be used.
# We do the calculations in Example 3.6 on p.95.
# The three degrees of freedom for comparing treatments are split into
# three orthogonal contrasts.
c1 = c(1, -1, 0, 0)
c2 = c(1, 1, -1, -1)
c3 = c(0, 0, 1, -1)
C1 = sum(c1*powmean); C2 = sum(c2*powmean); C3 = sum(c3*powmean)
c(C1, C2, C3)
ni = 5
SS1 = C1^2/(sum(c1^2)/ni); SS2 = C2^2/(sum(c2^2)/ni); SS3 = C3^2/(sum(c3^2)/ni)
c(SS1, SS2, SS3)
sum(c(SS1, SS2, SS3))  # they sum to the treatment SS
F0 = c(SS1, SS2, SS3)/MSE
round(F0, 2)
1-pf(F0, 1, 16)  # p-values, valid only for pre-chosen contrasts
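
# A minimal sketch checking that the three contrasts are orthogonal
# (balanced design): each sums to zero and all pairwise cross products
# vanish, which is why the contrast SS add up to the treatment SS.
# The matrix name demo_K is hypothetical.
demo_K = cbind(c1 = c(1, -1, 0, 0),
               c2 = c(1, 1, -1, -1),
               c3 = c(0, 0, 1, -1))
colSums(demo_K)    # all zero: each column is a contrast
crossprod(demo_K)  # diagonal matrix: the contrasts are orthogonal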

# Scheffe's Method
FSch = c(SS1, SS2, SS3)/(3*MSE)
1-pf(FSch, 3, 16)
# Now the first contrast is only just significant at the 5% level
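
# The same test can be phrased with a critical value: a contrast C is
# significant if |C| exceeds
#   S = sqrt(MSE*sum(c^2)/n) * sqrt((a-1)*F(alpha; a-1, N-a)).
# A minimal sketch for a balanced design; the function name
# scheffe_crit and the numbers in the example call are hypothetical.
scheffe_crit = function(cvec, mse, n, dfe, alpha=0.05) {
  a = length(cvec)                  # number of treatments
  se = sqrt(mse * sum(cvec^2) / n)  # standard error of the contrast
  se * sqrt((a - 1) * qf(1 - alpha, a - 1, dfe))
}
demo_S = scheffe_crit(c(1, -1, 0, 0), mse=4, n=5, dfe=16); demo_S
# With the objects above: abs(C1) > scheffe_crit(c1, MSE, ni, 16)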

# Comparing pairs using Tukey and LSD
q = qtukey(0.95, 4, 16); q
tukey.span =  q*sqrt(MSE/5); tukey.span
powmean
# We find all absolute pairwise differences
abs(outer(powmean,powmean,"-"))
# There is also a ready-made function
tuk = TukeyHSD(obj, "power"); tuk
plot(tuk)
# All pairwise comparisons are significant
lsd.span = qt(0.975, 16)*sqrt(MSE*2/5); lsd.span
# This value is smaller than tukey.span, hence all pairwise differences
# are still significant!
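
# A minimal sketch, on made-up toy data (the demo_* names are
# hypothetical), showing that the Tukey span computed by hand is the
# half-width of the confidence intervals reported by TukeyHSD() in a
# balanced design.
demo_y = c(10, 12, 11, 13,  20, 25, 15, 22,  5, 6, 7, 4)
demo_g = factor(rep(c("A", "B", "C"), each=4))
demo_fit = aov(demo_y ~ demo_g)
demo_mse = sum(resid(demo_fit)^2)/df.residual(demo_fit)
demo_span = qtukey(0.95, nlevels(demo_g), df.residual(demo_fit))*sqrt(demo_mse/4)
demo_tuk = TukeyHSD(demo_fit, "demo_g")$demo_g
demo_half = (demo_tuk[, "upr"] - demo_tuk[, "lwr"])/2
cbind(by.hand = demo_span, TukeyHSD = demo_half)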

# Comparing treatments with a control using Dunnett
q = 2.59  # Found in Table VIII, p. 631 under (3,16)
dunnett.span = q * sqrt(MSE*2/5); dunnett.span
# Absolute differences between group 4 (the control) and the others
abs(powmean[-4] - powmean[4])

# There is an R package, multcomp, which can do multiple comparisons.
# I'll briefly show how to use it for Dunnett's test.
# Note: to install a new package use the command install.packages().
# To make its functions available, use library().
# You only need to install it once, but you must run library() every
# time you restart RStudio.
# Note that RStudio has a graphical interface for installing and
# loading packages.
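
# A minimal sketch of a helper that installs a package only if it is
# missing. The function name ensure_package is hypothetical; installing
# needs an internet connection and a configured CRAN mirror.
ensure_package = function(pkg) {
  if (!requireNamespace(pkg, quietly=TRUE)) install.packages(pkg)
  requireNamespace(pkg, quietly=TRUE)  # TRUE if the package is available
}
# Usage: ensure_package("multcomp"), followed by library(multcomp).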
library(multcomp)
summary(glht(obj, linfct = mcp(power = "Dunnett")))
# The first level is taken as the control, so we must make 220 the
# first level! An unordered factor is sufficient for this.
pow2 = relevel(power, ref="220"); pow2
obj2 = aov(y ~ pow2)
summary(obj2)  # Exactly as before
summary(glht(obj2, linfct = mcp(pow2 = "Dunnett")))
# Tukey also works
summary(glht(obj, linfct = mcp(power = "Tukey")))
# cleaning up
rm(list=ls())

####### Exercises ########


# Problem 2
# Problem 3.22 (p.135). Use "pr0322.txt" as dataset.
# Skip (c).
# As a bonus question you can try out Fisher's LSD and Dunnett's test
# using the same data if you finish the other questions.

