Preliminary plan
Notes, Data sets and programs
for
Postgraduate course
in
Linear and logistic regression

Edited September 23rd, 2010
By Morten Frydenberg
morten@biostat.au.dk



If you are using your own laptop for the exercises, download the datasets and do-files.
A list of all data sets: All data
If you want the data in SAS or R format, go here.

Where to store downloaded and homemade Stata programs (so-called ado files):
By default, Stata assumes that ado and help files are located in
C:\ado\personal (for your personal/homemade programs),
C:\ado\plus (for the ones you download from the net) and
C:\ado (ado files made many years ago).

You can check the settings with the command sysdir in Stata.

If you want to store the programs in another location, say on your R drive,
you can do this with the commands

sysdir set PERSONAL "R:\ado\personal\"
sysdir set PLUS     "R:\ado\plus\"
sysdir set OLDPLACE "R:\ado\"

These lines should be either in your profile.do file or at the beginning of every do-file you use.

User-written Stata procedures you need during the course:
The .ado and .sthlp files should be placed in your personal ado folder (see above).
  • Confidence interval for the standard deviation in a regression model
    (This now works with Stata 9, 10 and 11):
    cisd.ado and cisd.sthlp
  • Save coefficients and write the equation after a regression model:
    (Version 1.1 now works on complicated models with many parameters)
    regeq.ado and regeq.sthlp
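
Once the files are in place, one way to check that Stata can find them is sketched below. It assumes that cisd.ado and regeq.ado (with their .sthlp files) have been copied to the PERSONAL folder reported by sysdir; sysdir, which and help are standard Stata commands.

```stata
* list the folders Stata uses for ado files
sysdir
* report where each ado file was found (an error means Stata cannot find it)
which cisd
which regeq
* open the help file that came with the program
help cisd
```

If which cannot find a command, the file is most likely in a folder that is not among those shown by sysdir.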

Day 1: Monday November 8th 2010
9.15 - 10.30 Lecture: Simple linear regression - 1.
The model, the parameters, estimation and inference.

All Stata code used at the lecture.
Data set used at the lecture: lung
10.30 - 12.00 Exercises.
Data set used at the exercises: lung.
12.00 - 13.00 Lunch break
13.00 - 14.30 Lecture: Simple linear regression - 2.
Checking the model, residuals, leverage, diagnostic plots, transformation of variables.

Most of the Stata code used at the lecture.
Data set used at the lecture: lung and gfrdata.
14.30 - 16.00 Exercises.
Data set used at the exercises: gfrdata and glyco.


Day 2: Wednesday November 10th 2010

9.15 - 9.30 Summarizing Monday's exercises.
9.30 - 10.30 Lecture: Multiple linear regression - 1.
The model, the parameters, estimation and inference.
Checking the model.

All of the Stata code used at the lectures today.
Data set used at the lecture: fram200.
10.30 - 12.00 Exercises.
Data set used at the exercises: lung and fram200.
12.00 - 12.30 Lunch break
12.30 - 14.00 Lecture:
Prior to Stata 11: Multiple linear regression - 2
Stata 11: Multiple linear regression - 2
Working with categorical explanatory variables
Interaction/effect modification.
14.00 - 15.30 Exercises.
Data set used at the exercises: lung and fram200.


Day 3: Friday November 12th 2010

9.15 - 12.00 Exercises.
Data set used at the exercises: serumchol.
12.00 - 12.30 Lunch break
12.30 - 13.30 Exercises continued.
13.30 - 15.30 Lecture: Linear regression, collinearity, splines and extensions
Collinearity
Restricted cubic splines
Random coefficient models
Clustered data

Some of the Stata code used at the lecture.
Data set used at the lecture: serumchol194, Framingham, FEV and greymatter.

Homework
The homework for the last week is to go through the lecture on logistic regression, day 7 of the Basic Statistics course: Day7.pdf (Day7.do).
After that you should complete the exercises in Exercise7.pdf with the data sets postterm and tatsoib.
SPSS users should replace exercise 7.1 with SPSSday7_1.pdf.


Day 4: Monday November 22nd 2010

9.15 - 10.00 Discussing the homework.
10.15 - 12.00 Lecture:
Prior to Stata 11: Logistic regression.
Stata 11: Logistic regression.
Odds ratios via logistic regression
Continuous independent variables
Categorical independent variables
Interactions
Wald and likelihood ratio tests
The logistic regression model in general

Most of the Stata code used at the lectures.
Data set used at the lecture: obese and case_control.
12.00 - 12.30 Lunch break.
12.30 - 14.00 The lecture continued.
14.00 - 15.30 Exercises
Data set used at the exercises: obese.


Day 5: Wednesday November 24th 2010

9.15 - 10.00 Exercises: Monday afternoon continued
10.15 - 12.00 Lecture: Model building in regression models
Model building: things to consider
Confounding and adjustment
Model selection and its consequences
Over-fitting
A strategy
12.00 - 12.30 Lunch break
12.30 - 15.30 Exercises
Data set used at the exercises: coffee.


Day 6: Friday November 26th 2010

9.15 - 10.00 Discussing Wednesday's exercise
10.15 - 12.00 Lecture: Working with logistic regression models and extensions.
Diagnostics for logistic regression
ROC curves and the area under the ROC-curve
Conditional logistic regression
Models for relative risk or risk differences
Clustered binary data
Missing data

Some of the Stata code used at the lectures.
Data set used at the lecture: obese and euroscore.
12.00 - 12.30 Lunch break
12.30 - 14.45 Case studies
14.45 - 15.30 Course evaluation


The assignment