Statistics

A Critical Analysis of Null Hypothesis Testing and its Alternatives (Including Bayesian Analysis)

This module will provide a detailed critique of the methods and philosophy of the Null Hypothesis Significance Testing (NHST) approach to statistics, which is currently dominant in social and biomedical science. We will briefly contrast NHST with alternatives, especially Bayesian methods. We will use some computer code (Matlab and R) to demonstrate some issues. However, we will focus on the big picture rather than on the implementation of specific procedures.

Prerequisites

Students should have studied some basic statistics before this course.

Format

Interactive lectures

Sessions

Date	Time	Teaching Format
Wed 6 Mar 2024	13:00 - 18:00	In person venue
Wed 13 Mar 2024	13:00 - 18:00	In person venue

How to Book

Bookings can be made via the Modules List where you will also find the module dates. Click on the module you want and you will be taken to a booking screen. As soon as you book, you will receive an automated email confirming your place. Please note, some modules (though not all) have multiple iterations in the Michaelmas and Lent Term.

Bayesian Statistics

The purpose of this course is to familiarise students with the basic concepts of Bayesian theory. It is designed to provide an introduction to the principles, methods, and applications of Bayesian statistics. Bayesian statistics offers a powerful framework for data analysis and inference, allowing for the incorporation of prior knowledge and uncertainty in a coherent and systematic manner. Throughout this course, we will cover key concepts such as Bayes' theorem, prior and posterior distributions, likelihood functions, and the fundamental differences between Bayesian and frequentist approaches. You will learn to formulate and estimate statistical models, update beliefs using new data, and make informed decisions based on the posterior probabilities generated through Bayesian inference. By the end of this course, you will possess the necessary skills to perform Bayesian data analysis, interpret results, and apply Bayesian methods in various contexts.
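As a minimal illustration of updating beliefs with Bayes' theorem, consider the textbook conjugate case. A Beta(a, b) prior on a success probability, combined with k successes in n binomial trials, gives a Beta(a + k, b + n - k) posterior. This sketch is not taken from the course materials, and the function names and the coin-flip numbers are hypothetical; the course itself uses Stata.

```python
from fractions import Fraction

def beta_binomial_update(a, b, successes, trials):
    """Return posterior Beta parameters after observing binomial data.

    Assumes a conjugate Beta(a, b) prior on the success probability.
    """
    failures = trials - successes
    return a + successes, b + failures

def beta_mean(a, b):
    # The mean of a Beta(a, b) distribution is a / (a + b)
    return Fraction(a, a + b)

# Hypothetical example: uniform Beta(1, 1) prior, 7 successes in 10 trials
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
print(post_a, post_b)             # 8 4
print(beta_mean(post_a, post_b))  # 2/3
```

The posterior mean of 2/3 lies between the prior mean (1/2) and the observed proportion (0.7), which is the sense in which prior knowledge and new data are combined.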

Prerequisites

Some familiarity with using Stata is recommended. You may wish to take the module 'Introduction to Stata' before beginning this module.

Sessions

Date	Time	Teaching Format
Tue 7 May 2024	10:00 - 12:00	SSRMP pre-recorded lecture(s) on Moodle
Tue 7 May 2024	14:00 - 16:00	SSRMP Zoom
Tue 14 May 2024	10:00 - 12:00	SSRMP pre-recorded lecture(s) on Moodle
Tue 14 May 2024	14:00 - 16:00	SSRMP Zoom

How to Book

Evaluation Methods

This course aims to provide students with a range of specific technical skills that will enable them to undertake impact evaluation of policy. Too often policy is implemented but not fully evaluated, and without evaluation we cannot tell what its short- or longer-term impact has been. On this course, students will learn the skills needed to evaluate particular policies and will have the opportunity to do some hands-on data manipulation. A particular feature of this course is that it teaches these skills in the real-world context of policy evaluation. It also focuses primarily not on experimental evaluation (randomised controlled trials, RCTs) but on quasi-experimental methodologies that can be used where an experiment is not desirable or feasible.

Topics

Regression-based techniques
Evaluation framework and concepts
The limitations of regression-based approaches and RCTs
Before/after and difference-in-differences (DID) methods
Computer exercise on difference-in-differences methods
Instrumental variables techniques
Regression discontinuity design
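The core difference-in-differences calculation can be illustrated with the basic two-group, two-period case. This is a minimal sketch with invented group means, not course data. It assumes parallel trends, i.e. that without the policy the treated group's outcome would have changed by the same amount as the comparison group's; the function name is hypothetical.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Two-group, two-period difference-in-differences estimate.

    The change in the control group stands in for the change the treated
    group would have experienced without the policy (parallel trends).
    """
    treated_change = treat_after - treat_before
    control_change = control_after - control_before
    return treated_change - control_change

# Hypothetical group means of an outcome before and after a policy
effect = diff_in_diff(treat_before=10.0, treat_after=15.0,
                      control_before=9.0, control_after=11.0)
print(effect)  # 3.0
```

In practice the same estimate comes from a regression of the outcome on a treatment-group dummy, a post-period dummy and their interaction; the coefficient on the interaction is the DID estimate.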

The target audience for this course is students who are keen to learn more about impact evaluation and the associated statistical techniques. Participants will have the opportunity to develop some skills in the use of Stata software.

Prerequisites

Basic knowledge of Stata is assumed
Familiarity with regression analysis would be an advantage for this course
Knowledge of descriptive and inferential statistics is required

Format

Four pre-recorded lectures and corresponding lab sessions.

NB. Each lecture should be watched before the corresponding practical lab (e.g. Session 1 should be watched before Session 2 takes place, Session 3 should be watched before Session 4 takes place, and so on).

System requirements

Information will be provided on the course Moodle page for how to access the relevant software on your personal machine.

Sessions

Date	Time	Teaching Format
Thu 1 Feb 2024	10:00 - 11:15	SSRMP pre-recorded lecture(s) on Moodle
Thu 1 Feb 2024	14:00 - 15:15	In Person
Thu 8 Feb 2024	10:00 - 11:15	SSRMP pre-recorded lecture(s) on Moodle
Thu 8 Feb 2024	14:00 - 15:15	In Person
Thu 15 Feb 2024	10:00 - 11:15	SSRMP pre-recorded lecture(s) on Moodle
Thu 15 Feb 2024	14:00 - 15:15	In Person
Thu 22 Feb 2024	10:00 - 11:15	SSRMP pre-recorded lecture(s) on Moodle
Thu 22 Feb 2024	14:00 - 15:15	In Person

How to Book

Factor Analysis

This module introduces the statistical techniques of Exploratory and Confirmatory Factor Analysis. Exploratory Factor Analysis (EFA) is used to uncover the latent structure (dimensions) of a set of variables. It reduces the attribute space from a larger number of variables to a smaller number of factors. Confirmatory Factor Analysis (CFA) examines whether collected data correspond to a model of what the data are meant to measure. Stata will be introduced as a powerful tool to conduct confirmatory factor analysis. A brief introduction will be given to confirmatory factor analysis and structural equation modelling.

Session 1: Exploratory Factor Analysis Introduction
Session 2: Factor Analysis Applications
Session 3: CFA and Path Analysis with Stata
Session 4: Introduction to SEM and programming

Prerequisites

Students are expected to be familiar with basic statistical concepts such as variance, correlation and regression
The course also assumes familiarity with using the essential features of Stata

Software

Stata

Assessment

There may be a practical exercise at the end of the module.

Textbook(s)

Bartholomew, D. J., Steele, F., & Moustaki, I. (2008). Analysis of Multivariate Social Science Data (2nd ed.). New York, NY: Chapman & Hall/CRC Press.
Long, J. S. (1983). Confirmatory Factor Analysis: A Preface to LISREL. SAGE Publications.
Kim, J. O., & Mueller, C. W. (1978). Factor Analysis: Statistical Methods and Practical Issues. SAGE Publications.
StataCorp LP, & Hamilton, L. C. (2011). Stata Structural Equation Modeling Reference Manual: Release 12. StataCorp LP.

In addition, there are chapters on Exploratory and Confirmatory Factor Analysis in the Electronic Statistics Textbook (EST).

Sessions

Date	Time	Teaching Format
Mon 19 Feb 2024	11:00 - 13:00	SSRMP pre-recorded lecture(s) on Moodle
Mon 19 Feb 2024	14:00 - 16:00	In Person
Mon 26 Feb 2024	11:00 - 13:00	SSRMP pre-recorded lecture(s) on Moodle
Mon 26 Feb 2024	14:00 - 16:00	In Person

How to Book

Meta Analysis

In this module students will be introduced to meta-analysis, a powerful statistical technique that allows researchers to synthesize the available evidence for a given research question using standardized (comparable) effect sizes across studies. The sessions teach students how to compute treatment effects, how to compute effect sizes from correlational studies, and how to address questions such as: what is the association between bullying victimization and depression? The module will be useful for students who seek to draw statistical conclusions in a standardized manner from literature reviews they are conducting.

Prerequisites

Students need a clear understanding of fundamental statistical concepts, bivariate association and linear regression
Students will also need to be familiar with using the statistical software R. If you are not familiar with using R, then we strongly recommend that you take the SSRMP module Introduction to R.

Topics covered

Session 1: Introductory session (systematic reviews and meta-analysis); using Comprehensive Meta-Analysis to calculate effect sizes, run meta-analyses (under fixed- and random-effects models) and produce forest plots

Session 2: Heterogeneity in effect sizes (tau-squared, tau and I-squared); subgroup analysis and meta-regression
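The heterogeneity statistics named in Session 2 can be computed from each study's effect size and sampling variance. The sketch below uses inverse-variance weighting with the DerSimonian-Laird estimator of tau-squared, one common choice; the course software (Comprehensive Meta-Analysis, R) may offer other estimators, and the function name and study values are hypothetical.

```python
import math

def meta_analysis(effects, variances):
    """Inverse-variance meta-analysis with DerSimonian-Laird heterogeneity.

    effects: study effect sizes (e.g. standardized mean differences)
    variances: within-study sampling variances of those effect sizes
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # DerSimonian-Laird estimate of the between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: percentage of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's own variance
    w_re = [1.0 / (v + tau2) for v in variances]
    random_mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)

    return {"fixed": fixed, "Q": q, "tau2": tau2, "tau": math.sqrt(tau2),
            "I2": i2, "random": random_mean}

# Hypothetical data: three studies with equal sampling variance
res = meta_analysis([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print({key: round(val, 4) for key, val in res.items()})
```

For these made-up inputs the sketch gives Q = 8, tau-squared = 0.03 and I-squared = 75%: most of the spread between the three studies is attributed to genuine between-study differences rather than sampling error.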

Aims

To understand and judge the results produced by a meta-analysis
To learn how to compute effect sizes based on dichotomous and continuous data
To become familiar with heterogeneity tests
To learn how to calculate and report subgroup analysis and meta-regression

System requirements

Excel and R

Textbook(s)

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. Chichester: Wiley.
Lipsey, M. W., & Wilson, D. B. (2001). Practical Meta-Analysis. London: Sage.

Sessions

Date	Time	Teaching Format
Thu 7 Mar 2024	09:00 - 13:00	In Person
Fri 8 Mar 2024	09:00 - 13:00	In Person

How to Book

Propensity Score Matching

Propensity score matching (PSM) is a technique that simulates an experimental study in an observational data set in order to estimate a causal effect. In an experimental study, subjects are randomly allocated to “treatment” and “control” groups; if the randomisation is done correctly, there should be no differences in the background characteristics of the treated and non-treated groups, so any differences in the outcome between the two groups may be attributed to a causal effect of the treatment. An observational survey, by contrast, will contain some people who have been subject to the “treatment” and some people who have not, but they will not have been randomly allocated to those groups. The characteristics of people in the treatment and control groups may differ, so differences in the outcome cannot simply be attributed to the treatment. PSM attempts to mimic the experimental situation by creating two groups from the sample whose background characteristics are virtually identical. People in the treatment group are “matched” with similar people in the control group. The difference in outcomes between the treatment and control groups may therefore more plausibly be attributed to the treatment itself. PSM is widely applied in many disciplines, including sociology, criminology, economics, politics, and epidemiology. The module covers the basic theory of PSM, the steps in its implementation (e.g. choice of matching variables and types of matching algorithm), and assessment of matching quality. We will also work through practical exercises using Stata, in which students will learn how to apply the technique to the analysis of real data and how to interpret the results.
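The matching step described above can be sketched in a few lines. This minimal example assumes the propensity scores have already been estimated (typically from a logit or probit of treatment status on background characteristics), uses one-to-one nearest-neighbour matching with replacement and no caliper, and omits the balance checks the module covers. The function name and data are hypothetical; the practical exercises use Stata.

```python
def att_nearest_neighbour(treated, controls):
    """Estimate the average treatment effect on the treated (ATT).

    treated, controls: lists of (propensity_score, outcome) pairs.
    Each treated unit is matched, with replacement, to the control unit
    whose propensity score is closest; the ATT is the mean outcome gap.
    """
    if not controls:
        raise ValueError("need at least one control unit")
    gaps = []
    for score, outcome in treated:
        # Nearest control on the propensity score (one-to-one, with replacement)
        _, matched_outcome = min(controls, key=lambda c: abs(c[0] - score))
        gaps.append(outcome - matched_outcome)
    return sum(gaps) / len(gaps)

# Hypothetical (propensity score, outcome) pairs
treated = [(0.8, 12.0), (0.6, 10.0)]
controls = [(0.79, 9.0), (0.55, 8.0), (0.2, 5.0)]
print(att_nearest_neighbour(treated, controls))  # 2.5
```

The control unit with a score of 0.2 is never used: it resembles none of the treated units, which is exactly the kind of comparison PSM is designed to exclude.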

Prerequisites

Students wishing to take this module should have either successfully completed Doing Multivariate Analysis, including the end-of-module test, or have had previous equivalent training in statistics (verified by the Skill Check). You will need to be confident in the use of Stata; if you are not, then please take the SSRMC’s Introduction to Stata or 90-minute Stata courses.

Readings and resources

Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31-72.
Dehejia, R.H., & Wahba, S. (2002). Propensity score-matching methods for nonexperimental causal studies. The Review of Economics and Statistics, 84(1), 151-161.
Guo, S., & Fraser, M.W. (2010). Propensity score analysis: Statistical methods and applications. SAGE Publications Ltd. USA.
Morgan, S.L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge University Press.
Rosenbaum, P.R., & Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Rubin, D.B. (2001). Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Services & Outcomes Research Methodology, 2, 169-188.

Assessment

There may be an online open-book test at the end of the module; for most students, the test is not compulsory.

Sessions

Date	Time	Teaching Format
Tue 20 Feb 2024	09:00 - 13:00	In Person
Wed 21 Feb 2024	09:00 - 13:00	In Person

How to Book

Secondary Data Analysis

Using secondary data (that is, data collected by someone else, usually a government agency or large research organisation) has a number of advantages in social science research: sample sizes are usually larger than can be achieved by primary data collection, samples are more nearly representative of the populations they are drawn from, and using secondary data for a research project often represents significant savings in time and money. This short course, taught by Dr Deborah Wiltshire of the UK Data Archive, will discuss the advantages and limitations of using secondary data for research in the social sciences, and will introduce students to the wide range of available secondary data sources. Students will learn how to search online for suitable secondary data by browsing the database of the UK Data Archive.

Assessment

This module is not assessed.

Sessions

Date	Time	Teaching Format
Tue 6 Feb 2024	09:00 - 12:00	In Person
Tue 6 Feb 2024	14:00 - 17:00	In Person

How to Book

Structural Equation Modelling

This intensive course on structural equation modelling will provide an introduction to SEM using the statistical software Stata. The aim of the course is to introduce structural equation modelling as an analytical framework and to familiarize participants with the applications of the technique in the social sciences.

The application of the structural equation modelling framework to a variety of social science research questions will be illustrated through examples of published papers. The examples are drawn from recent papers as well as from publications from the early days of the technique; some use path analysis with cross-national data, others confirmatory factor analysis, and others still full structural models, to test particular hypotheses. Some example papers are listed below; they should not be treated as the gold standard, but rather as an illustration of the variety of approaches and reporting techniques within SEM.

Duff, A., Boyle, E., Dunleavy, K., & Ferguson, J. (2004). The relationship between personality, approach to learning and academic performance. Personality and Individual Differences, 36(8), 1907-1920.
Garnier, M., & Hout, M. (1976). Inequality of educational opportunity in France and the United States. Social Science Research, 5(3), 225-246.
Helm, F., Müller-Kalthoff, H., Mukowski, R., & Möller, J. (2018). Teacher judgment accuracy regarding students' self-concepts: Affected by social and dimensional comparisons? Learning and Instruction, 55, 1-12.
Parker, P. D., Jerrim, J., Schoon, I., & Marsh, H. W. (2016). A multination study of socioeconomic inequality in expectations for progression to higher education: The role of between-school tracking and ability stratification. American Educational Research Journal, 53(1), 6-32.

Students will engage in a critique of such examples, with the aim of gaining a better understanding of the SEM framework, as well as its application to real-life data. To further facilitate this application focus, the theoretical introduction will be accompanied by practical examples based on real, publicly-available data.

This course is intended for students who in their research may want to engage with the testing of multiple hypotheses, or with complex relationships between variables.

Prerequisites

No prior knowledge of SEM is assumed, and only a basic familiarity with Stata (or any other command-based statistical software) is needed.
Students should have an understanding of the principles of multivariate regression
Background reading (see below)

Topics

Introduction to the general principles of SEM;
Latent variables, measurement models, and confirmatory factor analysis;
Path analysis and mediation analysis, with practical application in Stata;
Confirmatory factor analysis and latent variable models.

Reading

Schumacker, R. E., & Lomax, R. G. (2016). A Beginner's Guide to Structural Equation Modeling (4th ed.; other editions are also fine).
- Chapter 1 (Introduction);
- Chapter 5 (Path Models);
- Chapter 6 (Factor Analysis).

The text is a particularly accessible introduction to SEM. It contains examples in a variety of software packages, although not in Stata.

Students should focus on understanding the concepts of the technique rather than software issues in preparation for the course. Chapters 1, 5, and 6 provide the core concepts of structural equation modeling, and are required reading for everyone enrolling on the course.

Students who are less confident about their background in quantitative data analysis may want to also read Chapters 2 (Data Entry and Edit Issues), 3 (Correlation) and 4 (Regression Models).

Assessment

There may be an online open-book test at the end of the module; for most students, the test is not compulsory.

Sessions

Date	Time	Teaching Format
Tue 27 Feb 2024	09:00 - 13:00	SSRMP pre-recorded lecture(s) on Moodle
Wed 28 Feb 2024	09:00 - 13:00	In Person

How to Book

Time Series Analysis

This module introduces time series techniques relevant to forecasting in social science research, together with their computer implementation. Background in basic statistical theory and regression methods is assumed. Topics covered include time series regression, Vector Error Correction and Vector Autoregressive models, time-varying volatility, and ARCH models. The study of applied work is emphasized in this non-specialist module.

Topics

Introduction to Time Series: time series and cross-sectional data; components of a time series; overview of forecasting methods; measuring forecasting accuracy; choosing a forecasting technique
Time Series Regression: modelling linear and nonlinear trends; detecting autocorrelation; modelling seasonal variation using dummy variables
Stationarity; unit root tests; cointegration
Vector Error Correction and Vector Autoregressive models; impulse responses and variance decompositions
Time-varying volatility and ARCH models; GARCH models
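One standard way of detecting autocorrelation in time series regression residuals is the Durbin-Watson statistic, DW = Σ(e_t - e_{t-1})² / Σe_t². The sketch below is a minimal, hypothetical illustration of that formula (the function name and residuals are invented); the module itself uses Stata.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order autocorrelation in residuals.

    Values near 2 suggest no first-order autocorrelation; values towards 0
    suggest positive, and towards 4 negative, autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals that flip sign every period
alternating = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(alternating))  # 3.0, pointing to negative autocorrelation
```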

Prerequisites

A background in basic statistical theory and regression methods
A working knowledge of statistical concepts up to the level of Linear Regression

Assessment

There may be an online open-book test at the end of the module; for most students, the test is not compulsory.

Textbook

Hill, Griffiths & Lim (2011). Principles of Econometrics (4th ed). John Wiley & Sons. ISBN-10: 0470626739. ISBN-13: 978-0470626733.

Software

Stata

Sessions

Date	Time	Teaching Format
Fri 23 Feb 2024	09:00 - 13:00	In Person
Fri 23 Feb 2024	14:00 - 18:00	In Person

How to Book


Panel Data Analysis

Description

Panel data consists of repeated observations of multiple individuals, entities, or subjects, measured at multiple time points. Examples include child A’s numeracy test scores in Years 1, 2, 3 and 4, or country B’s GDP per capita in 2020, 2021, 2022 and 2023. Panel data analysis, a subset of longitudinal data analysis, is particularly useful for research questions about how variables change over time and how individual units differ in their responses to change. An example research question could be: how do children's numeracy scores vary across socioeconomic backgrounds, and how have these disparities changed over the years? Panel data analysis has several advantages, such as (1) increased statistical efficiency, (2) more effective control for unobserved individual- or entity-specific effects, and (3) a greater ability to study the dynamics of relationships over time.

Over the course of this module, participants will learn how to work with panel data. Through hands-on exercises and practical examples, participants will gain proficiency in data manipulation, visualisation, and advanced statistical techniques tailored specifically for panel data.

Target audience

The module is suitable for postgraduate students and researchers at any stage of their study and research. However, foundational Stata skills are required.

Prerequisites

If you are not already an experienced user of Stata software then we recommend you take the Introduction to Stata module.

Learning objectives

1. Understand panel data: Define and differentiate between cross-sectional, time series and panel data. Explain the advantages and limitations of using panel data in different research contexts

2. Prepare panel data and exploratory data analysis: Collect, clean and prepare panel datasets for analyses. Conduct descriptive analyses, including summary statistics and data visualisation. Identify patterns and trends within longitudinal data

3. Fixed effects, random effects and mixed models: Understand the differences between fixed effects, random effects and mixed models in panel data analysis. Apply these models to account for individual-specific and/or time-specific effects

4. Diagnostic tests: Use diagnostic tests to examine which model should be used for the panel data, and address issues of heteroskedasticity and endogeneity.

5. Interpretation and reporting: Communicate the results of panel data analyses.
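The idea behind the fixed effects model in objective 3 can be shown with the "within" transformation: subtracting each unit's own mean from the outcome and the regressor removes any time-invariant unit-specific effect before the slope is estimated. This is a minimal sketch with one regressor, no standard errors and hypothetical data and function name; in Stata the corresponding estimator is `xtreg` with the `fe` option.

```python
from collections import defaultdict

def within_estimator(ids, x, y):
    """Fixed-effects (within) slope for a single regressor.

    ids: unit identifier for each observation; x, y: regressor and outcome.
    """
    # Group each unit's (x, y) observations together
    groups = defaultdict(list)
    for unit, xi, yi in zip(ids, x, y):
        groups[unit].append((xi, yi))

    num = den = 0.0
    for obs in groups.values():
        x_bar = sum(xi for xi, _ in obs) / len(obs)
        y_bar = sum(yi for _, yi in obs) / len(obs)
        for xi, yi in obs:
            # Only deviations from the unit's own means are used
            num += (xi - x_bar) * (yi - y_bar)
            den += (xi - x_bar) ** 2
    return num / den

# Hypothetical panel: both units have slope 2 but very different levels
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 58, 60, 62]
print(within_estimator(ids, x, y))  # 2.0
```

A pooled regression of y on x that ignores the unit effects would give a slope of about 12.3 for the same data, because unit B has both higher x and a higher baseline level.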

Teaching Format

The module consists of 8 hours of teaching spread over two half days.

Sessions

Date	Time	Teaching Format
Mon 4 Mar 2024	09:00 - 13:00	In Person
Tue 5 Mar 2024	09:00 - 13:00	In Person

How to Book


Longitudinal Data Analysis

Longitudinal data analysis is a statistical method used to examine data collected from the same subjects or entities over multiple time points. This type of data analysis is particularly valuable for understanding how variables change over time and for investigating trends, patterns, and relationships within a dynamic context. For instance, how does children’s early home environment affect their future mathematical development?

Longitudinal data analysis has several advantages, such as (1) revealing individual-level trajectories, giving a deeper understanding of how different subjects respond to interventions or external factors over time, (2) supporting stronger causal inference by tracking changes before and after an intervention, and (3) accounting for heterogeneity, since it recognises that not all subjects respond uniformly to changes over time.

Over the course of this module, participants will learn how to work with longitudinal data. Through hands-on exercises and practical examples, participants will gain proficiency in data manipulation, visualisation, and advanced statistical techniques tailored specifically for longitudinal data. From understanding growth trajectories to uncovering causal relationships, this module will empower participants to navigate the complexities of longitudinal data with confidence.

Target audience

The module is suitable for postgraduate students and researchers at any stage of their study and research. However, foundational Stata skills are required.

Prerequisites

If you are not already an experienced user of Stata software then we recommend you first take the Introduction to Stata module.

Learning objectives:

1. Foundational concepts:

Define and differentiate between cross-sectional and longitudinal data.

2. Prepare panel data and exploratory data analysis:

Collect, clean and prepare panel datasets for analyses. Conduct descriptive analyses, including summary statistics and data visualisation. Identify trends in the data.

3. Linear mixed models:

Apply mixed models to account for within-subject correlations and individual-specific effects. Interpret fixed and random effects.

4. Non-linear growth modelling:

Explore non-linear growth modelling techniques, such as growth curve modelling. Fit and interpret growth models to describe changes over time.

5. Interpretation and reporting

Communicate the results of longitudinal data analyses.

Teaching Format

This is an 8-hour in-person module, including a practical session, spread over two half days.

System requirements

For the practical lab session you will need to bring a fully charged laptop with Stata software already installed. Students of Cambridge University with a valid CRSID who book a place on this module can download a copy of Stata MP4 free of charge.

Sessions

Date	Time	Teaching Format
Wed 31 Jan 2024	09:00 - 13:00	In Person
Thu 1 Feb 2024	09:00 - 13:00	In Person

How to Book


Causal Inference Methods

The module introduces causal inference methods that are commonly used in quantitative research, in particular in social policy evaluation. It covers the contexts, principles and applications of several specific methods: the instrumental variable approach, regression discontinuity design, and difference-in-differences analysis. For each method, the module examines its theoretical basis and statistical process, with illustrative examples drawn from research papers published in leading academic journals. The module is suitable for those who are interested in quantitative research and the analysis of causality across a range of topics in the social sciences.

Topics covered

· Lecture 1: Introduction to causal inference methods

· Lecture 2: Instrumental variable approach

· Lecture 3: Regression discontinuity design

· Lecture 4: Difference-in-differences analysis
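As a minimal illustration of the instrumental variable idea in Lecture 2, the simplest case is a binary instrument with no covariates, where the IV estimate reduces to the Wald ratio: the instrument's effect on the outcome divided by its effect on the treatment. It relies on the assumption that the instrument affects the outcome only through the treatment. The function name and data below are hypothetical, not drawn from the lectures.

```python
def wald_iv(z, d, y):
    """Wald (instrumental variable) estimate with a binary instrument z.

    Ratio of the instrument's effect on the outcome y (reduced form) to
    its effect on the treatment d (first stage).
    """
    def mean_where(values, flag):
        selected = [v for v, zi in zip(values, z) if zi == flag]
        return sum(selected) / len(selected)

    reduced_form = mean_where(y, 1) - mean_where(y, 0)
    first_stage = mean_where(d, 1) - mean_where(d, 0)
    if first_stage == 0:
        raise ValueError("instrument does not shift the treatment")
    return reduced_form / first_stage

# Hypothetical data: instrument, treatment take-up and outcome
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [14, 13, 12, 9, 12, 10, 9, 9]
print(wald_iv(z, d, y))  # 4.0
```

Here the instrument raises take-up by 0.5 and the mean outcome by 2, so the implied effect of the treatment itself is 2 / 0.5 = 4. With a single binary instrument and no covariates, two-stage least squares gives the same number.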

Learning Objectives

By the end of the module, students will be able to:

· Understand the contexts and principles of causal inference methods

· Grasp the theoretical and statistical basis for several common causal inference methods

· Analyse and interpret research papers that draw on causal inference methods

Pre-requisites

Basic knowledge of Stata is recommended to maximise the learning experience.

Sessions

Date	Time	Teaching Format
Tue 23 Jan 2024	10:00 - 12:00	In Person
Tue 23 Jan 2024	14:00 - 16:00	In Person
Thu 25 Jan 2024	10:00 - 12:00	In Person
Thu 25 Jan 2024	14:00 - 16:00	In Person

How to Book


Social Sciences Research Methods Programme (SSRMP)
