Qualitative Saturation Model

<aside> 👤 Colin Whaley, MSc(Pharm,c) BSc University of Waterloo School of Pharmacy [email protected] | colinwhaley.com

</aside>

Overview

The poster presented at the Qualitative Health Research (QHR) Conference can be found here:

A PDF version can be found here.

This method is based on the mathematical model developed by Lowe and colleagues in Field Methods (ResearchGate upload). Note that I renamed some of their variables to make the variables and concepts more recognizable to qualitative researchers working with interviews. In their work, Lowe et al. proposed and tested a number of mathematical models for determining how many interviews are needed to reach a given percent saturation in a set of interviews undergoing qualitative analysis.

Upon seeing the model that Lowe et al. developed, I recognized that it could be of more use than they may have anticipated. As a relatively new trainee in qualitative methods, I was very frustrated with the ambiguity surrounding the concept of saturation. Conceptually it made sense to me, but I realized that this would not translate into confidence from my Advisory Committee. Thus, after some thinking (and frustration with algebra), I am pleased to present the following.

What this model can do

Lowe and colleagues presented two equations that are considered here. The first is the model they proposed for use, the so-called information weighting model, which they selected after evaluating a number of possible models:

$$ C_{T} = \frac{\frac{(N-1)\,C_{1}\,C_{N}}{N C_{1}-C_{N}}\,R\,n_{int}}{1+R\,(n_{int}-1)} $$

Equation 1

where:

C_T is the expected total number of codes
N is the number of interviews evaluated (usually between 3 and 5)
C_1 is the number of codes in the first interview coded
C_N is the total number of codes from all N interviews
R is a constant, and is the average prevalence of themes (i.e. the number of interviews per new code observed)
- R is based on the properties of the population being interviewed (e.g. homogeneity or heterogeneity of the population, types of codes being extracted, how structured the interview was, etc.)
n_int is the total number of interviews.
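
To make Equation 1 concrete, here is a minimal Python sketch that evaluates it directly from the variables defined above. The function name, argument names, and the input values are invented for illustration and do not come from Lowe et al. or from any real study. The check on the denominator is an assumption of this sketch, not part of the published model.

```python
def expected_total_codes(n_eval, c_first, c_eval, r, n_int):
    """Evaluate Equation 1 (the information weighting model).

    n_eval  -- N, the number of interviews evaluated
    c_first -- C_1, the number of codes in the first interview coded
    c_eval  -- C_N, the total number of codes from all N interviews
    r       -- R, the average prevalence of themes
    n_int   -- n_int, the total number of interviews
    """
    denominator = n_eval * c_first - c_eval
    # Assumption of this sketch: reject inputs where N*C_1 - C_N is not
    # positive, since the inner fraction would be undefined or negative.
    if denominator <= 0:
        raise ValueError("N * C_1 - C_N must be positive")

    # Inner fraction of Equation 1: (N - 1) * C_1 * C_N / (N * C_1 - C_N)
    inner = (n_eval - 1) * c_first * c_eval / denominator

    # Full expression: inner * R * n_int / (1 + R * (n_int - 1))
    return inner * r * n_int / (1 + r * (n_int - 1))


# Hypothetical inputs: 5 interviews evaluated, 20 codes in the first,
# 50 codes across all 5, R = 0.25, and 13 interviews in total.
print(expected_total_codes(5, 20, 50, 0.25, 13))  # 65.0
```

With these made-up inputs, the inner fraction works out to 80 and the function returns an expected total of 65 codes for 13 interviews.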