

Advanced Medical Statistics - 2013


week | date (for you to complete) | material (link to pdf) | datasets | tick when completed | exercises | tick when completed | MCQs | my YouTube videos (theoldorganplayer: link to videos)



Assignment details

Survival analysis


Chan Biostatistics survival analysis article


additional information for Chan article (including R Commander equivalent analysis)


Example Course handbook

(students will be provided with individually edited version of this)

Description of all datasets








Continue survival analysis handout


  Logistic regression






  Multiple regression








Continue multiple regression handout above



  Mixed models approaches:
Getting familiar with the Linear Mixed Models (LMM) options in SPSS






  Repeated measures 2 - Four repeated measures in a single group same as above    



Repeated measures 3 - Five repeated measures in two treatment groups


The following links provide the data at the various stages described in the chapter - you should not need them:


Week 9: Each candidate can select either the factor analysis or the binary analysis chapter to work through

Factor analysis


R script for correlation matrix input



grnt_fem.sav (two factor model)





Analysing a set of binary questions

somtot.dat (somtat = mus scores; 30 vars. n=161)





(sexual attitude data from Dimitris Rizopoulos' ltm R package; 10 vars, n=1077)



Welcome. This is the second course in medical statistics (you can see the first course here). This course has been designed specifically for medics and those concerned directly with healthcare provision. Students from other disciplines are welcome to use the material, but they are advised to seek out examples of the various techniques within their own discipline to complement the materials presented here.

The course assumes that you have completed the introductory course before undertaking this one. If you have not, you are unlikely to understand the material above and, in particular, you will probably find using R for the various exercises extremely difficult.

This is a 10-week course (around 100 hours of work) which introduces you to survival analysis, logistic regression and various other varieties of multiple regression. However, the focus of the course is on the modern analysis of repeated measures, the so-called Linear Mixed Models (LMM) approach. There has been much research and development of specific software for this type of analysis over the last decade and statistical practices are changing rapidly. Linked closely with this approach are what are called multilevel models.

The LMM and multilevel approaches have much in common, as both allow the modelling of non-independent data. For example, you may have a cluster randomised trial where patients are seen by particular GPs within various practices, or you may be investigating the effects of a training program delivered to many subjects by several trainers across many locations. Considering the GP example, regardless of how standardised the GPs try to be, you need to be able to take into account the GP effect upon the particular cluster of patients, along with the practice effect at a higher level. A similar argument goes for the training program example. With repeated measures on a particular patient/client/subject it is often also essential to take into account the individual's baseline measure, any possible correlation among that individual's repeated measures, and the independent nature of the data across different patients.
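To make the GP effect concrete, here is a minimal simulation sketch. The course itself uses SPSS and R; Python is used here purely for illustration. All names and numbers (100 GPs, 25 patients each, the standard deviations) are invented assumptions, not values from the course datasets. Each simulated patient outcome is an overall mean plus a GP effect shared by every patient of that GP plus individual noise. The intraclass correlation (ICC) is then estimated from a one-way ANOVA, which is a simpler stand-in for fitting a full mixed model.

```python
import random
import statistics


def simulate_clustered(n_gps=100, patients_per_gp=25,
                       sd_gp=2.0, sd_patient=4.0, seed=1):
    """Return a list of clusters; each cluster holds one GP's patient outcomes.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_gps):
        gp_effect = rng.gauss(0.0, sd_gp)  # shared by every patient of this GP
        clusters.append([100.0 + gp_effect + rng.gauss(0.0, sd_patient)
                         for _ in range(patients_per_gp)])
    return clusters


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equally sized clusters."""
    k = len(clusters)        # number of clusters (GPs)
    m = len(clusters[0])     # patients per cluster
    grand_mean = statistics.fmean(x for c in clusters for x in c)
    means = [statistics.fmean(c) for c in clusters]
    ms_between = m * sum((mu - grand_mean) ** 2 for mu in means) / (k - 1)
    ms_within = sum((x - mu) ** 2
                    for c, mu in zip(clusters, means)
                    for x in c) / (k * (m - 1))
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


clusters = simulate_clustered()
icc = anova_icc(clusters)
# Design effect: how much the variance of the overall mean is inflated
# compared with the same number of fully independent patients.
design_effect = 1 + (len(clusters[0]) - 1) * icc

print(f"estimated ICC: {icc:.3f}")
print(f"design effect: {design_effect:.2f}")
```

With these assumed standard deviations the true ICC is 4 / (4 + 16) = 0.2, so even a modest GP effect means that 25 patients from one GP carry far less information than 25 independent patients. Ignoring the clustering would make standard errors too small.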

Finally we look briefly at a data reduction technique - Factor analysis (Principal component analysis). This much older technique, developed in the early 20th century, allows an analyst to take a large number of variables and, given certain circumstances, reduce them down to a core set of one or more factors, each of which consists of a subset of the original variables. The classic example is where a whole battery of tests concerning intelligence is reduced down to a few factors, such as motor skills, language, etc. It is also a very common technique in psychiatry, where a questionnaire with a large number of items is reduced to a smaller set of indicators concerning certain traits. Unfortunately this technique does not work well for a set of binary variables, a common situation in questionnaire design, so we consider this separately.
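The reduction idea can be sketched with a toy example. This is in Python for illustration only, as the course uses SPSS and R. The correlation matrix below is invented: six hypothetical tests, three "language" and three "motor", correlating 0.6 within a group and 0.1 between groups. The sketch extracts principal components by power iteration and keeps those with eigenvalue above 1, a common rule of thumb. It does not perform the rotation step that a full factor analysis would normally add.

```python
import math

# Hypothetical correlation matrix: tests 1-3 are "language", tests 4-6 "motor".
WITHIN, BETWEEN = 0.6, 0.1
corr = [[1.0 if i == j else (WITHIN if i // 3 == j // 3 else BETWEEN)
         for j in range(6)] for i in range(6)]


def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_eigenpairs(matrix, n_pairs, iterations=200):
    """Largest eigenvalues and eigenvectors of a symmetric matrix.

    Uses power iteration, removing each component once found (deflation).
    """
    a = [row[:] for row in matrix]
    size = len(a)
    pairs = []
    for _ in range(n_pairs):
        # Uneven start vector so it is not orthogonal to the leading directions.
        v = [1.0 + 0.1 * i for i in range(size)]
        for _ in range(iterations):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))
        pairs.append((eigenvalue, v))
        # Deflate: subtract the component just found.
        a = [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(size)]
             for i in range(size)]
    return pairs


pairs = top_eigenpairs(corr, n_pairs=3)
# Keep components whose eigenvalue exceeds 1.
n_retained = sum(1 for eigenvalue, _ in pairs if eigenvalue > 1.0)
# For a correlation matrix the total variance equals the number of variables.
explained = sum(ev for ev, _ in pairs[:n_retained]) / len(corr)

for eigenvalue, vector in pairs[:n_retained]:
    loadings = [round(x * math.sqrt(eigenvalue), 2) for x in vector]
    print(f"eigenvalue {eigenvalue:.2f}, loadings {loadings}")
print(f"components retained: {n_retained}, variance explained: {explained:.0%}")
```

In this constructed example two components are retained for six variables. Unrotated, the first loads on all six tests and the second contrasts the language tests with the motor tests. A rotation would usually be applied to obtain the cleaner "language" and "motor" factors described above.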

An aspect of this course that you should find familiar, as it reflects the approach taken in the Essential Medical Statistics course, is the introduction of a variety of software packages for specific situations. The two core software applications here are SPSS (for a short time called PASW) and R, including a free add-on called R Commander.

R, being a free open source application, has seen a meteoric rise in use in the last few years and is increasingly used for introductory statistics courses. I will also be introducing you to MLwiN, a free application specifically for carrying out multilevel modelling.

This course requires you to carry out a large number of practical exercises yourselves. To facilitate this I have included screenshots in the PDF handouts and also over 30 HD YouTube videos. It is essential that from the very first week you carry out the analyses yourselves, and do not just read the notes and watch the videos, as the assignment requires you to carry out a similar analysis.

To complete this course you will need:

Additional non-essential texts