Preliminary plan
Notes, Data sets and programs
for
Postgraduate course
in
Linear and logistic regression

Edited October 31st, 2013
By Morten Frydenberg
morten@biostat.au.dk



If you are using your own laptop at the exercises, download the data sets and do-files.
A list of all data sets: All data
If you want the data in SAS or R format, go here.

Where to store downloaded and homemade Stata programs (so-called ado files):
By default Stata assumes that the ado and help files are located in
C:\ado\personal (for your personal/homemade programs),
C:\ado\plus (for the ones you download from the net) and
C:\ado (ado files made many years ago).

You can check the settings with the command sysdir in Stata.

If you want to store the programs in another location, say on your R drive,
you can do so with the commands

sysdir set PERSONAL "R:\ado\personal\"
sysdir set PLUS     "R:\ado\plus\"
sysdir set OLDPLACE "R:\ado\"

These lines should be either in your profile.do file or at the beginning of every do-file you use.

We use some extra Stata commands. You can install these in Stata by
net from http://www.biostat.au.dk/teaching/Ados
net install cis
net install regeq
net from http://www.stata.com
net install gr42_6
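
To check that the installation worked, something like the following can be run. This is only a sketch: it assumes that the qplot command is the one provided by the gr42_6 package.

```stata
ado dir          // list the user-written packages Stata has installed
which qplot      // show where the qplot ado file was found
```

If which cannot find the command, Stata is probably looking in other folders than the ones the packages were installed in.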

The command qplot can make several qq-plots at the same time. Note that you have to apply the option "trscale(invnorm(@))".
An example: to make qnorm plots of res for the two sexes: qplot res, trscale(invnorm(@)) by(sex)
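
A small worked sketch of the same idea, using the auto data set that ships with Stata instead of the course data. The model and the variable names (mpg, weight, foreign) are only chosen for illustration, and it is assumed that qplot is already installed.

```stata
sysuse auto, clear          // example data shipped with Stata
regress mpg weight          // a simple linear regression
predict res, residuals      // store the residuals in the variable res

* normal quantile plots of the residuals, one panel per level of foreign
qplot res, trscale(invnorm(@)) by(foreign)
```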

Day 1: Monday November 4th 2013
9.15 - 10.30 Lecture: Simple linear regression - 1
The model, the parameters, estimation and inference.

All Stata code used at the lecture.
Data set used at the lecture: lung
10.30 - 12.00 Exercises.
Data set used at the exercises: lung.
12.00 - 13.00 Lunch break
13.00 - 14.30 Lecture: Simple linear regression - 2
Checking the model, residuals, leverage, diagnostic plots, transformation of variables.

Most of the Stata code used at the lecture.
Data sets used at the lecture: lung and gfrdata.
14.30 - 16.00 Exercises.
Data sets used at the exercises: gfrdata and glyco.
How to get from the log model to the original scale.


Day 2: Wednesday November 6th 2013

9.15 - 9.30 Summarizing Monday's exercises.
9.30 - 10.30 Lecture: Multiple linear regression - 1
The model, the parameters, estimation and inference.
Checking the model.

All of the Stata code used at the lectures today.
Data set used at the lecture: fram200.
10.30 - 12.00 Exercises.
Data sets used at the exercises: lung and fram200.
12.00 - 12.30 Lunch break
12.30 - 14.00 Lecture:
Prior to Stata 11: Multiple linear regression - 2
Stata 11+12: Multiple linear regression - 2
Stata 13: Multiple linear regression - 2
Working with categorical explanatory variables
Interaction/effect modification
14.00 - 15.30 Exercises.
Data sets used at the exercises: lung and fram200.


Day 3: Friday November 8th 2013

9.15 - 12.00 Exercises.
Data set used at the exercises: serumchol.
12.00 - 12.30 Lunch break
12.30 - 13.30 Exercises continued.
13.30 - 15.30 Lecture: Linear regression, collinearity, splines and extensions
Collinearity
Restricted cubic splines
Clustered data

Some of the Stata code used at the lecture.
Data sets used at the lecture: serumchol194, Framingham and FEV.

Homework
The homework for the last week is to go through the lecture on logistic regression, day 7 of the Basic Statistics course: Day7.pdf (Day7.do).
After that you should complete the exercises in Exercise7.pdf with the data sets postterm and tatsoib.
SPSS users should substitute exercise 7.1 with SPSSday7_1.pdf.


Day 4: Monday November 18th 2013

9.15 - 10.00 Discussing the homework.
10.15 - 12.00 Lecture:
Prior to Stata 11: Logistic regression
Stata 11: Logistic regression
Odds ratios via logistic regression
Continuous independent variables
Categorical independent variables
Interactions
Wald and likelihood ratio tests
The logistic regression model in general

Most of the Stata code used at the lectures.
Data set used at the lecture: obese and case_control.
12.00 - 12.30 Lunch break.
12.30 - 14.00 The lecture continued.
14.00 - 15.30 Exercises
Data set used at the exercises: obese.


Day 5: Wednesday November 20th 2013

9.15 - 10.00 Exercises - Monday afternoon continued.
10.15 - 12.00 Lecture: Model building in regression models
Model building: things to consider
Confounding and adjustment
Model selection and its consequences
Over-fitting
A strategy
12.00 - 12.30 Lunch break
12.30 - 15.30 Exercises
Data set used at the exercises: coffee.


Day 6: Friday November 22nd 2013

9.15 - 12.00 Working with Wednesday's exercise
12.00 - 12.30 Lunch break
12.30 - 13.30 Discussing Wednesday's exercise
13.30 - 15.00 Lecture: Working with logistic regression models and extensions
Diagnostics for logistic regression
Conditional logistic regression
Models for relative risk or risk differences
Missing data
Binary data with several random components

Some of the Stata code used at the lectures.
15.00 - 15.30 Course evaluation