Notes, Data sets and programs
for
Postgraduate course
in
Linear and logistic regression

Edited August 17, 2007
By Morten Frydenberg
morten@biostat.au.dk



Day 1: Monday May 7 2007
9.15 - 10.30 Lecture: Simple linear regression - 1.
The model, the parameters, estimation and inference.

All Stata code used at the lecture.
Data set used at the lecture: Stata: lung.dta. SPSS: lung.sav.

If you want the data in SAS, go here.
10.30 - 12.00 Exercises.
The lung data: Stata: lung.dta. SPSS: lung.sav.
12.00 - 13.00 Lunch break
13.00 - 14.30 Lecture: Simple linear regression - 2.
Checking the model, residuals, leverage, diagnostic plots, transformation of variables.

Most of the Stata code used at the lecture.
Data sets used at the lecture: Stata: lung.dta and gfrdata.dta. SPSS: lung.sav and gfrdata.sav.
14.30 - 16.00 Exercises.
The gfr data: Stata: gfrdata.dta. SPSS: gfrdata.sav.
The glyco data: Stata: glyco.dta. SPSS: glyco.sav.


Day 2: Thursday May 10 2007

9.15 - 9.30 Summarizing Monday's exercises.
9.30 - 10.30 Lecture: Multiple linear regression - 1.
The model, the parameters, estimation and inference.
Checking the model.

All of the Stata code used at the lecture.
Data set used at the lecture: Stata: fram200.dta. SPSS: fram200.sav.
10.30 - 12.00 Exercises.
Data: Stata: lung.dta and fram200.dta. SPSS: lung.sav and fram200.sav.
12.00 - 13.00 Lunch break
13.00 - 14.30 Lecture: Multiple linear regression - 2 (updated 11/24/05)
Working with categorical explanatory variables
Interaction/effect modification.
14.30 - 16.00 Exercises.
Data: Stata: lung.dta and fram200.dta. SPSS: lung.sav and fram200.sav.

Homework

The homework with data sets: Stata: case_control.dta and serumchol.dta. SPSS: case_control.sav and serumchol.sav.
Slides used at the discussion of the homework.


Day 3: Monday May 21 2007

9.15 - 10.00 Summarizing the homework exercises.
10.15 - 12.00 Lecture: Logistic regression.
Odds ratios via logistic regression
Continuous independent variables
Categorical independent variables
Interactions
Wald and likelihood ratio tests
The logistic regression model in general

Most of the Stata code used at the lecture.
Data sets used at the lecture: Stata: obese.dta and case_control.dta. SPSS: obese.sav and case_control.sav.
12.00 - 13.00 Lunch break
13.00 - 14.30 The lecture continued.
14.30 - 16.00 Exercises.
The prostate cancer data set: Stata: prossub.dta. SPSS: prossub.sav.



Day 4: Thursday May 24 2007

9.15 - 10.00 Exercises: Monday afternoon continued.
10.15 - 12.00 Lecture: Working with linear and logistic regression models.
Diagnostics for logistic regression
Test and estimation after the model has been fitted in Stata
Collinearity
Things to consider when specifying a model
Model selection and its consequences

All Stata code used at the lecture.
Data sets used at the lecture: Stata: obese.dta and serumchol194.dta.
SPSS: obese.sav and serumchol194.sav.
12.00 - 13.00 Lunch break
13.00 - 15.00 Lecture: Extensions.
Conditional logistic regression
Models for relative risk or risk differences
Clustered data
Non-linear regression

All Stata code used at the lecture.
Data sets used at the lecture: Stata:
obese.dta, oralcancer.dta, FEV.dta and AZT.dta.
SPSS:
obese.sav, oralcancer.sav, FEV.sav and AZT.sav.
15.15 - 16.00 Course evaluation

Lecture notes with 4 slides per page
Day 1 morning and Day 1 afternoon
Day 2 morning and Day 2 afternoon
Day 3
Day 4 morning and Day 4 afternoon

The assignment
You can choose between two assignments
Linear regression: The assignment and the data: Stata: ExamF2007lin.dta, SPSS: ExamF2007lin.sav.
Logistic regression: The assignment and the data: Stata: ExamF2007log.dta, SPSS: ExamF2007log.sav.
The assignment takes the form of a statistical analysis of a data set, and a solution in the form of a short report should be returned before
Wednesday June 13 2007 at 12 a.m. at the Department of Biostatistics.

A satisfactory analysis of the data and satisfactory answers to the questions posed in the assignment will be credited with 1 ECTS point.