Trisha Greenhalgh, professor1
To bake good cookies, start with good cookie dough. To use a different metaphor, to build a brick wall, take a large collection of bricks—all the same size and in perfect shape—and line them up neatly.
A systematic review is a review of primary research undertaken according to an explicit, rigorous, and reproducible methodology.1 The Cochrane Handbook recommends asking a tightly focused question (usually of the format “What is the impact of intervention A on outcome B?”), finding all randomised controlled trials (RCTs) that have addressed it, extracting data on (for example) sample size, completeness of follow-up, and numbers of participants with each outcome, and summing the findings.2 The poster child of the systematic review is a cumulative meta-analysis of 33 trials of streptokinase for myocardial infarction, which showed that as each study was added to the numerical synthesis, the confidence interval became progressively narrower.3
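The “progressively narrowing confidence interval” pattern can be illustrated with a minimal sketch of fixed-effect, inverse-variance cumulative meta-analysis. The trial effects and variances below are invented for illustration; they are not the streptokinase data:

```python
import math

# Hypothetical trials: (log odds ratio, variance of the estimate).
# These numbers are made up purely to illustrate the method.
trials = [(-0.4, 0.30), (-0.2, 0.25), (-0.35, 0.20), (-0.3, 0.15), (-0.25, 0.10)]

def cumulative_meta(trials):
    """Pool trials one at a time (fixed-effect, inverse-variance weighting);
    return the pooled estimate and full width of the 95% CI after each step."""
    results = []
    weights_sum = 0.0
    weighted_effect_sum = 0.0
    for effect, variance in trials:
        w = 1.0 / variance              # inverse-variance weight
        weights_sum += w
        weighted_effect_sum += w * effect
        pooled = weighted_effect_sum / weights_sum
        se = math.sqrt(1.0 / weights_sum)  # standard error of pooled estimate
        ci_width = 2 * 1.96 * se           # full width of the 95% CI
        results.append((pooled, ci_width))
    return results

for i, (pooled, width) in enumerate(cumulative_meta(trials), 1):
    print(f"after trial {i}: pooled log OR {pooled:+.3f}, 95% CI width {width:.3f}")
```

Because the pooled variance is the reciprocal of the summed weights, each added trial can only shrink the confidence interval; this is the behaviour the streptokinase example made famous.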
But what if there are no bricks? What if RCTs are either practically impossible or unethical? Would a research ethics committee approve a trial in which half the patients on a theatre list would be randomised to an anaesthetist (a doctor with 4-6 years’ training plus a postgraduate qualification) and half to an anaesthetic associate (with two years’ training), with no correction for case mix? Would a patient sign the consent form?
Perhaps that is one reason why the 52 papers we identified in a search for research on physician associates (PAs) and anaesthetic associates (AAs) turned up no randomised controlled trials.4 Perhaps it is why every study that compared the performance of PAs or AAs with that of doctors found large differences in case mix. Quite rightly, triage processes were in place to ensure that complex patients (e.g. those at extremes of age, or with more severe or risky medical conditions, multimorbidity, or challenging social circumstances) were seen by someone with longer and more in-depth training. Also rightly, PAs and AAs in the studies were closely supervised by senior doctors.
How, then, can we reasonably compare apples (PAs or AAs seeing low-complexity patients under supervision) with oranges (doctors seeing more complex patients with less supervision) in studies where nobody was randomised and there was very little (usually no) blinding of assessors?
In such circumstances, we need to accept that the “risk of bias” tools beloved by the GRADE methodologists2 are going to tell us only the screamingly obvious—that there are significant biases in the study designs. What those risk of bias tools won’t tell us is what we should do with those biased data.
For starters, we need to question how researchers attempted to correct for these biases. One team produced an elegant methodology for correcting for what they called “medical acuity.”5 But as with any non-randomised study, the presence of unknown (and as-yet unimagined) confounders cannot be excluded. The study is still comparing apples with oranges, even when you’ve added a fudge factor for the disproportionate hardness of apples and greater juiciness of oranges.
We need to go beyond the “single focused question” approach to examining the primary studies. Different research teams looked at the PA/AA issue in different ways, each asking a different question and studying a different primary outcome. In such situations, the primary studies are not tidy bricks but a set of misshapen stones. Ogilvie et al use the metaphor of the dry stone wall to depict how reviewers need to use narrative synthesis to weave these disparate studies together, highlight the strengths and limitations of each design, and show what each study contributes (and what it fails to contribute) to the overall picture.6
Whereas the conventional systematic reviewer’s task is statistical (to summarise data), the dry stone wall reviewer’s task is interpretive (to make sense of those data). Both are important and provide complementary information.7
Our “dry stone wall” review turned up some troubling findings, chief of which was that only a handful of primary studies had looked at any aspect of the clinical performance of PAs or AAs in a UK context.4 Even fewer had made any attempt to correct for case mix. Many of those studies had been undertaken in the early 2010s, when the UK had a more resilient, better-staffed health service and associate professionals were being deployed cautiously in low-risk roles. Yet when national policymakers were interviewed, they appeared to view the evidence base on efficacy and safety as a closed case (it had, they thought, been demonstrated definitively).
The government has asked Gillian Leng to examine the evidence (including but not limited to the studies we included in our review) on the efficacy and safety of PAs and AAs.8 If they are expecting her to build a brick wall, they will be disappointed.
Footnotes
Competing interests: TG was approached by Gillian Leng in the context of the review into PAs that Leng has been commissioned to lead, which was the initial prompt to review the academic literature for submission to the Leng review. Please see the linked research paper for a full COI statement.
Provenance and peer review: commissioned, not externally peer reviewed.
References
5. Halter M, Joly L, de Lusignan S, et al. Capturing complexity in clinician case-mix: classification system development using GP and physician associate data. BJGP Open 2018;2(1):bjgpopen18X101277. doi:10.3399/bjgpopen18X101277
8. Department of Health and Social Care. Leng review: independent review of physician associate and anaesthesia associate professions terms of reference. London: gov.uk, 2024.