Preliminary plan
Notes, Data sets and programs
for
Postgraduate course
Linear regression models for continuous and binary data
Notes, lectures, and exercises are under revision, and the hyperlinks will be dead during this revision.

Edited November 22nd, 2018
By Morten Frydenberg
morten@biostat.au.dk



A list of all data sets: All data
Some shortcuts and other tricks for Macs: StataMAc.pdf

Where to store downloaded and homemade Stata programs (so called ado files):
By default, Stata assumes that the downloaded ado and help files are located in
C:\ado\personal (for your personal/homemade programs),
C:\ado\plus (for the ones you download from the net), and
C:\ado (ado files made many years ago).

You can check the settings with the command sysdir in Stata.

If you are running Stata via CITRIX, you might not be able to store downloaded Stata programs on the C drive.
The solution can be to store them on your personal drive (here H:). You can do this by changing the locations with the commands

sysdir set PERSONAL "H:\ado\personal\"
sysdir set PLUS     "H:\ado\plus\"
sysdir set OLDPLACE "H:\ado\"

These lines should either be in your profile.do file or at the beginning of every do file you use.

We use some extra Stata commands. You can install these in Stata with:
net install cis, from(https://www.biostat.au.dk/teaching/Ados)
net install regeq, from(https://www.biostat.au.dk/teaching/Ados)
net install gr42_7.pkg, from(https://www.stata-journal.com/software/sj16-3)

If this does not work, download the four files below and put them in the folder(s) that you use for this course:
cisd.ado
cisd.sthlp
regeq.ado
regeq.sthlp
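
If the files were downloaded by hand, a quick check that Stata can find them might look like the sketch below. It assumes that the folder holding the files is the current working directory or one of the folders listed by sysdir. The path in the cd line is a hypothetical example, not an actual course folder.

```stata
* Hypothetical folder; replace with the folder where you saved the four files
cd "H:\regression_course"

* Show the folders Stata searches for ado files (the current folder "." is one of them)
adopath

* Report where Stata finds each command; an error means the file is not on the ado-path
which cisd
which regeq

* Open the help files that belong to the two commands
help cisd
help regeq
```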

The command qplot can make several qq-plots at the same time. Note that you have to apply the option "trscale(invnorm(@))".
An example: to make qnorm plots of res for the two sexes: qplot res, trscale(invnorm(@)) by(sex)
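
The example can be tried on a data set that ships with Stata. The sketch below is not course material: it assumes that qplot has been installed, uses the auto data with the variable foreign standing in for sex, and computes res as the residuals from an arbitrary regression.

```stata
* Example data shipped with Stata (an assumption; not one of the course data sets)
sysuse auto, clear

* Fit an arbitrary simple linear regression and store the residuals as res
regress mpg weight
predict res, residuals

* Normal quantile plot of all residuals in one panel, for comparison
qnorm res

* One normal quantile plot per group; foreign plays the role of sex here
qplot res, trscale(invnorm(@)) by(foreign)
```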

Day 1: Monday November 5th 2018 - Updated October 26th 2018
9.15 - 10.30 Lecture: Simple linear regression - 1.
The model, the parameters, estimation and inference.

All Stata code used at the lecture.
Data set used at the lecture: lung
10.30 - 12.00 Exercises.
Data set used at the exercises: lung.
12.00 - 13.00 Lunch break
13.00 - 14.30 Lecture: Simple linear regression - 2.
Checking the model, residuals, leverage, diagnostic plots, transformation of variables.

Most of the Stata code used at the lecture.
Data set used at the lecture: lung and gfrdata.
14.30 - 16.00 Exercises.
Data set used at the exercises: gfrdata and glyco.
How to get from the log-model to the original scale.


Day 2: Wednesday November 7th 2018 - Updated October 26th 2018

9.00 - 9.15 Summarizing Monday's exercises.
9.15 - 10.15 Lecture: Multiple linear regression - 1
The model, the parameters, estimation and inference.
Checking the model.

All of the Stata code used at the lectures today.
Data set used at the lecture: fram200.
10.15 - 12.00 Exercises.
Data sets used at the exercises: lung and fram200.
12.00 - 12.30 Lunch break
12.30 - 14.00 Lecture: Multiple linear regression - 2
Working with categorical explanatory variables
Interaction/effect modification.
14.00 - 15.15 Exercises.
Data sets used at the exercises: lung and fram200.


Day 3: Friday November 9th 2018 - Updated November 8th 2018

9.00 - 12.00 Exercises.
Data set used at the exercises: serumchol.
Some answers to the exercises
12.00 - 12.30 Lunch break
12.30 - 13.30 Exercises continued.
13.30 - 15.15 Lecture: Linear regression, collinearity, splines and extensions
Collinearity
Restricted cubic splines
Clustered data

Some of the Stata code used at the lecture.
Data sets used at the lecture: serumchol194, Framingham, and FEV.

Homework
The homework for the last week is to go through the lectures on logistic regression from day 7 of the Basic Statistics course: Day7.pdf (Day7.do).
After that, you should complete the exercises in Homework.pdf, using the data set postterm and the do-file HomeworkPartB.do.


Day 4: Monday November 19th 2018 - Updated November 14th 2018

9.00 - 11.00 Lecture: Regression model for binary data.
The logistic regression model in general
Most of the Stata code used at the lectures.
Data set used at the lecture: obese.
11.00 - 12.00 Morning exercises
Data set used at the exercises: obese.
12.00 - 12.30 Lunch break.
12.30 - 14.00 The lecture continued.
14.00 - 15.15 Afternoon exercises
Data set used at the exercises: obese.


Day 5: Wednesday November 21st 2018 - Updated November 14th 2018

9.00 - 10.00 Exercises - Monday afternoon continued
10.00 - 12.00 Lecture: Model building in regression models
Model building: things to consider
Confounding and adjustment
Model selection and its consequences
Over-fitting
A strategy
12.00 - 12.30 Lunch break
12.30 - 15.15 Exercises
Data set used at the exercises: coffee.


Day 6: Friday November 23rd 2018 - Updated November 22nd 2018

9.00 - 12.00 Working with Wednesday's exercise
12.00 - 12.30 Lunch break
12.30 - 13.30 Discussing Wednesday's exercise
13.30 - 15.00 Lecture: Working with logistic regression models and extensions.
Hosmer-Lemeshow test in logistic regression models
Conditional logistic regression
Binary data with several random components
Missing data

Some of the Stata code used at the lectures.
15.00 - 15.15 Course evaluation