*******************************************************************************
*** ***
*** Stata-program (do-file) associated with the Exercises on day two. ***
*** ***
*** ***
*******************************************************************************
* Specifying the path to where the data are, for example
cd "C:\ANOVA\DAY 4"
**** Exercise 10 (Capillary density for ulcerated patients and controls) ****
* Reading in the data.
use capillary.dta, clear
* 1. Make a scatter plot of the data in each group and connect density
* measurements corresponding to the same subject.
scatter density foot, connect(L) sort(subject foot) by(group)
* It looks as if there is considerable variation between individuals. There
* are no obviously deviating observations or subjects. The capillary density
* seems slightly higher in the control group than in the patient group, but
* it is not clear whether there is a systematic difference between feet.
* 2. Consider the healthy controls. Is there any systematic difference between
* the capillary density corresponding to the right and left foot? What about
* the worse and the better foot for the ulcerated patients?
mixed density bn.foot if group==1, nocons || subject: , reml var ///
dfmethod(kroger)
pwcompare foot, eff small
* | Contrast Std. Err. t P>|t| [95% Conf. Interval]
* ---------------+------------------------------------------------------------
* density |
* foot |
* Left vs Right | .7222222 .8922524 0.81 0.429 -1.160266 2.60471
* ----------------------------------------------------------------------------
* We conclude that, in the controls, the mean capillary density is slightly
* higher in the left foot than in the right foot (0.7 /mm3, 95%-CI: -1.2 to
* 2.6), though not significantly so (p=0.43).
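* Sketch (not part of the original solution): with two measurements per
* subject, the random-intercept model above should reproduce a classical
* paired t-test, provided every control has both feet measured and the
* estimated between-subject variance is positive. This ASSUMES foot is coded
* 1 (Right) and 2 (Left); if it is coded differently, the variable names
* created by -reshape- change accordingly.
preserve
keep if group==1
keep subject foot density
reshape wide density, i(subject) j(foot)
* Paired t-test of Left minus Right (density2 - density1 under the assumed coding)
ttest density2 == density1
restore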
mixed density bn.foot if group==2, nocons || subject: , reml var ///
dfmethod(kroger)
pwcompare foot, eff small
* The conclusion is the same in the ulcerated patient group (better - worse:
* 0.8 /mm3, 95%-CI: -2.4 to 4.0, p=0.60).
* 3. Test whether there is any systematic difference between the patients and
* the controls. Find estimates and 95%-confidence intervals for the
* capillary density in each group and for the difference.
mixed density bn.group, nocons || subject: , reml var dfmethod(kroger)
contrast group, small eff nowald
* density | Coef. Std. Err. t P>|t| [95% Conf. Interval]
* -------------+---------------------------------------------------------------
* group |
* Controls | 34.08333 1.736798 19.62 0.000 30.54559 37.62107
* Patients | 23.65625 1.842152 12.84 0.000 19.90391 27.40859
* -----------------------------------------------------------------------------
*
* -----------------------------------------------------------------------------
* Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
* -----------------------------+-----------------------------------------------
* subject: Identity |
* var(_cons) | 48.31111 13.6515 27.76636 84.05725
* -----------------------------+-----------------------------------------------
* var(Residual) | 11.97058 2.903293 7.441641 19.25582
* | Contrast Std. Err. t P>|t| [95% Conf. Interval]
* --------------------+--------------------------------------------------------
* density |
* group |
* (Patients vs base) |-10.42708 2.531796 -4.12 0.000 -15.58418 -5.269983
* The mean capillary density is significantly higher in the control group than
* in the patient group (difference 10.4, 95%-CI: 5.3 - 15.6, p<0.001). The
* estimated mean capillary density is 34.1 (95%-CI: 30.5 - 37.6) in the control
* group and 23.7 (95%-CI: 19.9 - 27.4) in the patient group.
* Model validation:
predict fit, fitted
predict res, rstandard
scatter res fit, name(g1,replace)
qnorm res, name(g2,replace)
graph combine g1 g2
* No obvious pattern in the plot of the standardized residuals against the
* fitted values, and no clear deviation from a straight line in the QQ-plot.
* 4. Estimate the variation between feet and the variation between subjects.
* Which source of variation explains most of the variation in the
* measurements (how much)?
* The variance component associated with the variation between subjects is
* estimated to be 48.31, and the variation between feet on the same person
* is estimated to be 11.97:
*
* Variation Estimate Percent
* Subject 48.31 80
* Feet 11.97 20
* Total 60.28 100
*
* So 80% of the variation in the capillary density is related to the variation
* between subjects, whereas 20% is associated with the variation between feet
* within subjects.
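* The percentages in the table can be reproduced from the two variance
* estimates printed by -mixed- above (values copied by hand from that output):
display "Between subjects: " %4.1f 100*48.31111/(48.31111 + 11.97058) " %"
display "Between feet:     " %4.1f 100*11.97058/(48.31111 + 11.97058) " %"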
* 5. Observations corresponding to the same person are now positively
* correlated. What is (according to the model) the estimated correlation
* between measurements from the same person?
estat icc
* Level | ICC Std. Err. [95% Conf. Interval]
* ---------------------------+------------------------------------------------
* subject | .8014226 .0623 .6520235 .8968286
* We see (again) that the estimated correlation between measurements on the
* same subject is 80%, 95%-CI: 65% - 90%.
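* Sketch: the ICC is var(_cons)/(var(_cons) + var(Residual)), the same ratio
* as the between-subject share in question 4. It can also be computed from the
* stored estimates with -nlcom-. This ASSUMES -mixed- stores the variance
* parameters as log standard deviations in equations named lns1_1_1 and
* lnsig_e (check with -matrix list e(b)- first). The delta-method CI from
* -nlcom- will differ somewhat from the one reported by -estat icc-.
matrix list e(b)
nlcom exp(2*_b[lns1_1_1:_cons]) / (exp(2*_b[lns1_1_1:_cons]) + exp(2*_b[lnsig_e:_cons]))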
* 6. Write a short summary of statistical methods used in the analysis and the
* findings.
* First we compared the capillary density measurements on the two feet within
* each group to see if there was a systematic difference between feet. We did
* not find a significant difference in either group (paired comparisons within
* subjects, using a random-intercept model in each group):
*
* Controls: 0.7 capillaries/mm3, 95%-CI: -1.2 to 2.6, p=0.43
* Patients: 0.8 capillaries/mm3, 95%-CI: -2.4 to 4.0, p=0.60
*
* We then proceeded to analyze all the data without including a systematic
* effect of foot in the analysis. Here we found a significantly higher
* capillary density in the control group than in the patient group
* (p<0.001):
*
* Control: 34.1, 95%-CI: 30.5 - 37.6
* Patient: 23.7, 95%-CI: 19.9 - 27.4
* Difference: 10.4, 95%-CI: 5.3 - 15.6
*
* An inspection of the residuals yielded no clear pattern in the plot of the
* residuals against the fitted values and no obvious departures from a straight
* line in the QQ-plot. Finally, around 80% of the variation was due to the
* variation between individuals and the remaining 20% was connected to the
* variation between feet within individuals.
**** Exercise 11 (PSS in an intervention and a waiting list group) ****
* Reading in the data.
use pss.dta, clear
* 1. Analyze the data with special focus on the effectiveness of the
* intervention.
bysort group: pwcorr pss1 pss2 pss3 pss4
tabstat pss1 pss2 pss3 pss4, nototal by(group) stat(sd)
* -> group = WLC-group
*
* | pss1 pss2 pss3 pss4
* -------------+------------------------------------
* pss1 | 1.0000
* pss2 | 0.4759 1.0000
* pss3 | 0.4628 0.3594 1.0000
* pss4 | 0.5182 0.5006 0.6717 1.0000
*
* -> group = I-group
*
* | pss1 pss2 pss3 pss4
* -------------+------------------------------------
* pss1 | 1.0000
* pss2 | 0.3055 1.0000
* pss3 | 0.0506 0.6039 1.0000
* pss4 | . . . .
*
* group | pss1 pss2 pss3 pss4
* ----------+----------------------------------------
* WLC-group | 5.243305 6.324403 6.221911 6.454911
* I-group | 5.095656 5.902225 6.222718 .
* We see roughly the same standard deviations and correlations in the two
* groups, bearing in mind that we only have three measurements in the
* intervention group and four in the waiting list group.
reshape long pss, i(id) j(time)
scatter pss time, connect(L) sort(time) by(id)
scatter pss time, connect(L) sort(id time) by(group) cmissing(n)
* No markedly deviating observations or individuals. There is substantial
* variation within subjects, but the variation between subjects is less
* pronounced.
egen groupmean = mean(pss), by(group time)
sort group time id
twoway (connected groupmean time if group==0, msymbol(Oh)) ///
(connected groupmean time if group==1, msymbol(O)), ///
legend(label(1 "WLC-group") label(2 "I-group")) ///
ytitle("Mean PSS")
* Regarding the mean curves: In the intervention group we see a clear fall in
* the mean PSS during the intervention period. There is a further smaller drop
* in the period after the intervention in that group. In the waiting list group
* we see a small drop in mean PSS while on the waiting list and a more
* pronounced one over the intervention period. Also in this group we see a small
* decline in the period following the intervention.
xi: mixed pss bn.time##bn.group || id: i.time, cov(un) matlog ///
technique(bfgs) reml
predict fit, xb
predict res, rstandard
scatter res fit, name(g1,replace)
qnorm res, name(g2,replace)
graph combine g1 g2
* There is some pattern in the plot of the standardized residuals against the
* fitted values, due to the discrete nature of the data. The QQ-plot looks fine.
testparm bn.time#bn.group
* ( 1) [pss]1bn.time#0bn.group = 0
* ( 2) [pss]2.time#0bn.group = 0
*
* chi2( 2) = 17.28
* Prob > chi2 = 0.0002
* We see clear evidence against the hypothesis of equal development over time
* in the mean perceived stress score in the two groups (Test 1), p=0.0002.
* This is no surprise given the shape of the mean curves in the plot.
margins bn.time#bn.group
contrast group@time, eff nowald
* | Contrast Std. Err. z P>|z| [95% Conf. Interval]
* ---------------------+-------------------------------------------------------
* pss |
* group@time |
* (I-group vs base) 1 | 1.174143 1.031164 1.14 0.255 -.8469006 3.195187
* (I-group vs base) 2 |-4.182571 1.261815 -3.31 0.001 -6.655684 -1.709458
* (I-group vs base) 3 |-1.377265 1.386605 -0.99 0.321 -4.094962 1.340431
* (I-group vs base) 4 | . (not estimable)
* We notice that the two groups are not significantly different at baseline
* (this was a randomized trial). At time point 2, where the intervention group
* has finished the intervention whereas the waiting list group has been, well,
* on a waiting list, the intervention group has a significantly lower mean PSS,
* 4.2 points, 95%-CI: 1.7 - 6.7, p=0.001. At time point 3, where the waiting
* list group has also undergone the intervention, the mean PSS is 1.4 points
* (95%-CI: -1.3 to 4.1) lower in the intervention group compared to the waiting
* list group (p=0.32). This is mainly due to the fact that the mean PSS
* continues to drop in the intervention group.
* It looks as if the waiting list patients also reduce their stress during
* their time on the waiting list, so the question is whether the drop in the
* intervention group is larger than the drop in the waiting list group.
lincom (1.time#0.group - 2.time#0.group) - (1.time#1.group - 2.time#1.group)
* pss | Coef. Std. Err. z P>|z| [95% Conf. Interval]
* -----------+----------------------------------------------------------------
* (1) | -5.356714 1.293952 -4.14 0.000 -7.892813 -2.820615
* So our best estimate is that the drop in mean PSS from time 1 to time 2 is
* 5.4 points (95%-CI: 2.8 - 7.9) larger if you start out with the intervention
* than if you start out on the waiting list, p<0.0001.
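* The estimate above is a difference in differences: it equals the change in
* the group difference (I-group vs WLC-group) from time 1 to time 2 in the
* -contrast- output. A hand check, with the two contrasts copied from that
* output:
display -4.182571 - 1.174143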
* 2. Is there the same effect of the intervention even though it is postponed
* by three months?
lincom (2.time#0.group - 3.time#0.group) - (1.time#1.group - 2.time#1.group)
* pss | Coef. Std. Err. z P>|z| [95% Conf. Interval]
* -----------+----------------------------------------------------------------
* (1) | 2.805306 1.402769 2.00 0.046 .0559288 5.554682
* The drop in mean PSS during the intervention period is 2.8 points higher in
* the intervention group compared to the waiting list group (95%-CI: 0.1-5.6)
* and only just statistically significant, p=0.046.
lincom (1.time#0.group - 3.time#0.group) - (1.time#1.group - 3.time#1.group)
* pss | Coef. Std. Err. z P>|z| [95% Conf. Interval]
* -----------+----------------------------------------------------------------
* (1) | -2.551408 1.48144 -1.72 0.085 -5.454977 .352161
* You could say that the combined effect of waiting list and intervention is
* more or less the same as that of the intervention and a post-intervention
* period: the drop is 2.6 points larger in the intervention group than in the
* waiting list group (95%-CI: -0.4 to 5.5), which is borderline but not
* statistically significant at the 5% level, p=0.09.
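* This estimate is the change in the group difference (I-group vs WLC-group)
* from time 1 to time 3 in the -contrast- output. A hand check, with the two
* contrasts copied from that output:
display -1.377265 - 1.174143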
* 3. Write a short summary of statistical methods used in the analysis and the
* findings.
* The data were analyzed using a multivariate repeated measurements ANOVA with
* time and group as the factors of interest. Model validation was performed by
* inspecting the standardized residuals. While the QQ-plot did not reveal any
* deviations from a straight line, there was some pattern in the plot of the
* standardized residuals against the fitted values due to the discrete nature
* of the data.
*
* An overall test gave clear evidence against the hypothesis of equal
* development over time in the mean perceived stress score in the intervention
* group and the waiting list group, p=0.0002.
*
* The drop in mean PSS during the intervention period is 5.4 points (95%-CI:
* 2.8 - 7.9) larger in the intervention group (those who start out with the
* intervention) than the drop during the waiting list period in the waiting
* list group, p<0.0001.
*
* The drop in mean PSS during the intervention period is 2.8 points larger in
* the intervention group than the drop during the intervention period in the
* waiting list group (95%-CI: 0.1 - 5.6), which is only just statistically
* significant, p=0.046. This can partly be explained by the fact that there is
* a drop in mean PSS during the waiting list period in the waiting list group.