*******************************************************************************
*** ***
*** Stata-program (do-file) associated with the Exercises on day two. ***
*** ***
*** ***
*******************************************************************************
* Specifying the path to where the data are, for example
cd "C:\ANOVA\DAY 3"
***** Exercise 8 (crp in pigs undergoing CPB or a sham-operation) ******
* Reading in the data.
use crp.dta, clear
* 1. Make a plot of the mean curves in each group and add the best straight
* line (for each group).
egen groupmean = mean(crp), by(group time)
twoway (scatter groupmean time if group==1, msymbol(Oh)) ///
(scatter groupmean time if group==2, msymbol(O)) ///
(lfit groupmean time if group==1) ///
(lfit groupmean time if group==2), ///
legend(label(1 "CPB") label(2 "Sham")) ///
ytitle("Mean crp")
* With only four time-points it is hard to say anything about whether or not it
* is reasonable to describe the development of the crp level using a straight
* line. There is a tendency in the sham group that the variation around the
* line is larger at later time-points.
* 2. Analyze the data using the random coefficient model.
mixed crp bn.group bn.group#c.time, nocons || id: time, cov(un) reml
pwcompare group#c.time, eff
* Mixed-effects REML regression Number of obs = 88
* Group variable: id Number of groups = 22
*
* Obs per group:
* min = 4
* avg = 4.0
* max = 4
*
* Wald chi2(4) = 165.18
* Log restricted-likelihood = -184.11485 Prob > chi2 = 0.0000
*
* -----------------------------------------------------------------------------
* crp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
* -------------+---------------------------------------------------------------
* group |
* 1 | 1.535417 .5836943 2.63 0.009 .3913969 2.679436
* 2 | 1.295 .639405 2.03 0.043 .0417891 2.548211
* |
* group#c.time |
* 1 | .8241667 .2687084 3.07 0.002 .2975079 1.350825
* 2 | 2.2205 .2943553 7.54 0.000 1.643574 2.797426
* -----------------------------------------------------------------------------
*
* -----------------------------------------------------------------------------
* Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
* -----------------------------+-----------------------------------------------
* id: Unstructured |
* var(time) | .3915264 .2921052 .0907208 1.689721
* var(_cons) | .526458 1.49937 .001982 139.8347
* cov(time,_cons) | .0552238 .5525121 -1.02768 1.138128
* -----------------------------+-----------------------------------------------
* var(Residual) | 2.37462 .5062683 1.563573 3.606368
* -----------------------------------------------------------------------------
* LR test vs. linear model: chi2(3) = 35.32 Prob > chi2 = 0.0000
* | Contrast Std. Err. z P>|z| [95% Conf. Interval]
* -------------+---------------------------------------------------------------
* crp |
* group#c.time |
* 2 vs 1 | 1.396333 .398559 3.50 0.000 .6151721 2.177495
* We see a clear difference between the two groups with a significantly larger
* increase in crp over time in the sham group (1.4 per time-unit higher in the
* sham group compared to the CPB group, 95%-CI: 0.6 - 2.2, p<0.001).
predict fit, fitted
predict res, rstandard
scatter res fit, name(g1,replace)
qnorm res, name(g2,replace)
graph combine g1 g2
* We note that there is some deviation from a straight line in the QQ-plot
* whereas the plot of the standardized reiduals against the fitted values looks
* fine.
* 3. Repeat this for the log-transformed data.
generate logcrp = log(crp)
drop groupmean
egen groupmean = mean(logcrp), by(group time)
twoway (scatter groupmean time if group==1, msymbol(Oh)) ///
(scatter groupmean time if group==2, msymbol(O)) ///
(lfit groupmean time if group==1) ///
(lfit groupmean time if group==2), ///
legend(label(1 "CPB") label(2 "Sham")) ///
ytitle("Mean log-crp")
* Again it is difficult to say anything definitive about a straight line
* relationship with only four time-points. Perhaps you would conclude that the
* points are more evenly distributed around the lines, and that we still see
* a larger increase in crp over time in the sham group compared to the CPB
* group even though the development is much more similar on the log-scale.
mixed logcrp bn.group bn.group#c.time, nocons || id: time, cov(un) reml
pwcompare group#c.time, eff
* | Contrast Std. Err. z P>|z| [95% Conf. Interval]
* -------------+---------------------------------------------------------------
* logcrp |
* group#c.time |
* 2 vs 1 | .0805914 .0759009 1.06 0.288 -.0681717 .2293545
* We conclude that there is really not much evidence against the hypothesis of
* equal slopes in the two groups on the log-scale (p=0.29). Let us estimate a
* common slope in the two groups and test whether the two lines are shifted
* vertically:
mixed logcrp bn.group c.time, nocons || id: time, cov(un) reml
pwcompare group, eff eform
* Mixed-effects REML regression Number of obs = 88
* Group variable: id Number of groups = 22
*
* Obs per group:
* min = 4
* avg = 4.0
* max = 4
*
* Wald chi2(3) = 340.46
* Log restricted-likelihood = -53.633622 Prob > chi2 = 0.0000
*
* -----------------------------------------------------------------------------
* logcrp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
* -------------+---------------------------------------------------------------
* group |
* 1 | .4835285 .1474973 3.28 0.001 .1944391 .7726179
* 2 | 1.056275 .1561077 6.77 0.000 .7503094 1.36224
* |
* time | .2702991 .0379078 7.13 0.000 .1960012 .3445971
* -----------------------------------------------------------------------------
*
* -----------------------------------------------------------------------------
* Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
* -----------------------------+-----------------------------------------------
* id: Unstructured |
* var(time) | .0101232 .0107786 .001256 .0815887
* var(_cons) | .1867305 .1146633 .0560439 .62216
* cov(time,_cons) | -.0239918 .0309177 -.0845895 .0366058
* -----------------------------+-----------------------------------------------
* var(Residual) | .1074542 .0229093 .0707533 .1631924
* -----------------------------------------------------------------------------
* LR test vs. linear model: chi2(3) = 25.74 Prob > chi2 = 0.0000
* | exp(b) Std. Err. z P>|z| [95% Conf. Interval]
* -------------+---------------------------------------------------------------
* logcrp |
* group |
* 2 vs 1 | 1.77313 .3006793 3.38 0.001 1.271738 2.472199
* We see that there is clear evidence against the hypothesis of identical lines
* describing the development of crp over time, p=0.001. The conclusion is that
* the crp-level in the sham group is 77% higher (95%-CI: 27-147%) than in the
* CPB group at all time-points. With every time-unit we expect a exp(0.2702991)
* = 1.310356 or a 31% increase (95%-CI: 22-41%) in crp in both groups.
drop fit res
predict fit, fitted
predict res, rstandard
scatter res fit, name(g1,replace)
qnorm res, name(g2,replace)
graph combine g1 g2
* The residual plots look very similar to the ones for the original data.
* Perhaps the QQ-plot is closer to a straight line, but there is not a great
* difference. All in all no clear deviations from the random coefficient model.
* 4. Write a short summary of the statistical methods used in the analysis and
* the findings. How do these findings compare to what we saw in exercise 6?
* We analyzed the data using a random coefficient model on a log-scale. There
* was clear evidence against the hypothesis of identical lines describing the
* development in crp over time, p=0.001. At all time-points the crp-level in
* the sham group is 77% higher (95%-CI: 27-147%) than in the CPB group. With
* every time-unit we expect a 31% increase (95%-CI: 22-41%) in crp in both
* groups. In exercise 6 we analyzed the same data on a log-scale with the
* univariate repeated measurements ANOVA and found a 78% (95%-CI: 28-148%,
* p=0.001) higher median crp-level in the sham group compared to the CPB group
* at all time-points. The conclusions regarding the group differences are
* therefore very much alike.
***** Exercise 9 (FEV1 after different concentrations of smoke) ******
* Reading in the data.
use smoke.dta, clear
* 1. Plot the individual and the mean curves.
scatter fev1 time, connect(L) sort(conc time) by(sub) msy(i)
scatter fev1 time, connect(L) sort(conc sub time) by(conc)
* We see that there is no or less the same variation within and between
* subjects (and between days on the same subject) for the different
* concentrations of wood burning smoke. It does not appear that there are very
* deviant observations or individuals.
egen groupmean = mean(fev1), by(conc time)
sort conc time sub
twoway (connected groupmean time if conc==1, msymbol(Oh)) ///
(connected groupmean time if conc==2, msymbol(O)) ///
(connected groupmean time if conc==3, msymbol(+)), ///
legend(label(1 "0 ug/m3") label(2 "200 ug/m3") label(3 "400 ug/m3") ///
rows(1)) ytitle("Mean FEV1 (l)")
* On average we observe a clear drop in FEV1 corresponding to the
* concentrations 200 and 400 ug/m3, followed by a slow increase. For
* concentration 0 ug/m3 the average FEV1 is more or less at the same level for
* the whole time-period.
* 2. Consider the decrease in FEV1 from baseline to 2 hours as a summary
* statistic. What is the effect of dose on this difference in FEV1?
drop groupmean
reshape wide fev1, i(sub day) j(time)
generate FEV1dif = fev10 - fev12
anova FEV1dif order/sub day conc
pwcompare conc, eff
* Number of obs = 36 R-squared = 0.6964
* Root MSE = .805374 Adj R-squared = 0.4687
*
* Source | Partial SS df MS F Prob>F
* -----------+----------------------------------------------------
* Model | 29.756076 15 1.9837384 3.06 0.0105
* |
* order | 2.8257779 5 .56515557 2.99 0.1074
* sub | 1.1339833 6 .18899722
* -----------+----------------------------------------------------
* day | 2.3145068 2 1.1572534 1.78 0.1936
* conc | 22.144538 2 11.072269 17.07 0.0000
* |
* Residual | 12.972556 20 .64862779
* -----------+----------------------------------------------------
* Total | 42.728632 35 1.220818
* | Contrast Std. Err. t P>|t| [95% Conf. Interval]
* ------------------+----------------------------------------------------------
* conc |
* 200 vs 0 ug/m3 | 1.14 .3287927 3.47 0.002 .4541504 1.825849
* 400 vs 0 ug/m3 | 1.909167 .3287927 5.81 0.000 1.223317 2.595016
* 400 vs 200 ug/m3 | .7691667 .3287927 2.34 0.030 .0833171 1.455016
* From the anova-command we see that there is clear evidence against the
* hypothesis of equal mean decrease in FEV1 from baseline to 2 hours for the
* three concentrations (p<0.0001). This corresponds to a oneway ANOVA but takes
* into account subject, day and order. Our best estimate of the difference in
* decrease from baseline to 2 hours in that it is 1.14 l (95%-CI: 0.45-1.83 l,
* p=0.002) higher at a concentration of 200 ug/m3 compared to 0. The mean
* decrease from baseline to 2 hours after a concentration of 400 ug/m3 is
* 0.77 l (95%-CI: 0.08-1.46 l, p=0.03) higher than after 200 ug/m3.
* 3. Make an analysis of all the data.
drop FEV1dif
reshape long fev1, i(sub day) j(time)
anova fev1 order /sub day conc /sub|day time conc#time, bse(sub#day) ///
repeated(time)
* Number of obs = 180 R-squared = 0.7883
* Root MSE = .545361 Adj R-squared = 0.7130
*
* Source | Partial SS df MS F Prob>F
* -----------+----------------------------------------------------
* Model | 146.22137 47 3.111093 10.46 0.0000
* |
* order | 11.235737 5 2.2471473 0.44 0.8054
* sub | 30.458809 6 5.0764681
* -----------+----------------------------------------------------
* day | 2.104563 2 1.0522815 2.15 0.1428
* conc | 6.3111662 2 3.1555831 6.44 0.0069
* sub|day | 9.7946863 20 .48973431
* -----------+----------------------------------------------------
* time | 38.759931 4 9.6899827 32.58 0.0000
* conc#time | 26.379986 8 3.2974983 11.09 0.0000
* |
* Residual | 39.259324 132 .29741912
* -----------+----------------------------------------------------
* Total | 185.48069 179 1.036205
*
* Between-subjects error term: sub|day
* Levels: 36 (20 df)
* Lowest b.s.e. variable: sub
* Covariance pooled over: day (for repeated variable)
*
* Repeated variable: time
* Huynh-Feldt epsilon = 1.3141
* *Huynh-Feldt epsilon reset to 1.0000
* Greenhouse-Geisser epsilon = 0.6492
* Box's conservative epsilon = 0.2500
*
* ------------ Prob > F ------------
* Source | df F Regular H-F G-G Box
* -----------+----------------------------------------------------
* time | 4 32.58 0.0000 0.0000 0.0000 0.0000
* conc#time | 8 11.09 0.0000 0.0000 0.0000 0.0002
* Residual | 132
* ----------------------------------------------------------------
* As we would expect we see clear evidence against the hypothesis of equal
* development over time for the three concentrations (p<0.0001). This is based
* on the univariate repeated measurements ANOVA but taking into account the
* different design variables corresponding to the cross-over design. It is not
* altered by the different corrections to the F-test that Stata provides.
* Analysis using mixed to get estimates, fitted values and residuals:
mixed fev1 bn.order bn.day bn.conc bn.time bn.conc#bn.time || sub: || day:
predict fit if e(sample), fit
predict res if e(sample), rstandard
scatter res fit, yline(0) name(q1,replace)
qnorm res, title ("residual probability-plot") name(q2,replace)
graph combine q1 q2
* Nothing in the plot of the standardized residuals against the fitted values
* or the QQ-plot to suggest that the statistical analysis is not appropriate
* for these data.
margins conc#time
marginsplot, xdimension(time)
* 4. Is the FEV1-level back to normal after 6 hours?
contrast r(0 4).time@conc, eff nowald
* | Contrast Std. Err. z P>|z| [95% Conf. Interval]
* --------------------+--------------------------------------------------------
* fev1 |
* time@conc |
* (4 vs 0) 0 ug/m3 | .0158334 .2131643 0.07 0.941 -.401961 .4336278
* (4 vs 0) 200 ug/m3 | .0825 .2131643 0.39 0.699 -.3352944 .5002944
* (4 vs 0) 400 ug/m3 |-.9891667 .2131643 -4.64 0.000 -1.406961 -.5713723
* We see that we are more or less back to normal efter 6 hours for the
* concentrations 0 and 200 ug/m3 (0: p=0.94, 200: p=0.70), no evidence against
* that anyway. For 400 ug/m3, however, our best guess is that the mean FEV1
* after 6 hours is 0.99 l (95%-CI: 0.57-1.41 l, p<0.0001) lower than at
* baseline.