“Do we still remember the question we are trying to answer?
Or have we substituted an easier one?”
(Daniel Kahneman) 
Good news: Fluoride helps raise bone mineral density.
Sounds promising, doesn’t it?
But wait a second, will that change translate into a reduction in fracture risk?
Actually, no. Don’t cry, Joni. Life’s not that easy.
The thing is, when dealing with a difficult question, we tend to seek the answer to a simpler one that seems relevant. Instead of addressing fracture risk directly, we take a detour: we study bone mineral density, then carry on as if an increase in bone mineral density equaled a reduction in fracture risk.
At first glance, this seems reasonable.
However, a seductive trap awaits us here.
We can be misled by focusing too much on surrogate endpoints.
Surrogate endpoints are not the same as “direct” endpoints: on their own, they do not capture real benefit to patients.
- A surrogate endpoint is “a laboratory measure or a physical sign that is intended to be used as a substitute for a clinically meaningful endpoint”. Examples include blood sugar level, prostate-specific antigen (PSA) measurement, and CD4 count (an indicator of how well the immune system is working). Surrogate endpoints are not “felt” by patients.
- “Direct” endpoints are “clinically meaningful endpoints that directly measure how a patient feels, functions, or survives”. They can be objectively measured (such as survival rate) or subjectively assessed (such as quality of life).
Why do people use surrogate endpoints?
A drug or other treatment must show its efficacy before it can be approved for marketing.
However, waiting long enough to count complication events or to assess survival rate is not easy. Such a study is costly, lengthy, and arduous, and it needs a large sample size. Surrogate endpoints help keep studies simple.
In some cases, proving effect on “direct” endpoints may not be feasible. For example, in the case of preventive treatment for asymptomatic deep venous thrombosis, clinical events (such as death, pulmonary embolism, etc.) don’t occur frequently. So follow-ups with ultrasound to screen for blood clots (an outcome that is not “felt” by patients) are reasonable.
Pitfalls of surrogate endpoints
- Clofibrate (an agent used for controlling high cholesterol) helps lower the blood cholesterol level, but may increase mortality in patients.
- Strict blood glucose control keeps HbA1c (a measure of average blood sugar level over time) lower than 6%, but can raise mortality.
- Hormone replacement therapy raises HDL concentration (i.e. concentration of “good” cholesterol) but may also increase stroke risk in women after menopause.
As the above examples indicate, in some cases, treatments do improve surrogate outcomes, but may fail to show real benefit to patients.
It’s a bit like putting our faith in people who turn out to be wrong.
That’s understandable, because our knowledge of the relevant pharmacologic and biologic events is always imperfect and incomplete.
We don’t fully know what lies behind the relationship between a surrogate endpoint and the pathology of a disease. We make an inference based on surrogate endpoints just because it seems logical to do that. Well, it also seemed reasonable that no bacteria could live inside an acidic stomach. Until Helicobacter pylori was found.
Some surrogate endpoints may not lie in the causal pathway of a disease. For example, the risk that HIV-infected pregnant women will transmit the infection to their infants is correlated with CD4 count (an indicator of how well the immune system is working). However, interleukin-2 (a cancer treatment) fails to reduce transmission risk, despite its beneficial impact on CD4 count. It turns out that there is no causal relationship between CD4 count and transmission. Maternal CD4 and transmission risk are both influenced by a third factor: maternal viral load (i.e. the amount of HIV in the mother’s blood). Therefore, the most appropriate outcome to measure in this case is the direct one: the proportion of infants infected with HIV.
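To see how a surrogate can move while the real outcome stays put, here is a toy simulation of the CD4 example. It is only a sketch: every number in it (the viral load range, the CD4 formula, the 150-cell boost, the risk formula) is made up for illustration, and the names `simulate_arm` and `pearson` are hypothetical helpers, not taken from any study or library.

```python
import random
from statistics import mean


def simulate_arm(n_mothers, il2_treated, seed=0):
    """Simulate one trial arm; all coefficients are invented for illustration."""
    rng = random.Random(seed)
    cd4_counts, risks = [], []
    for _ in range(n_mothers):
        # Viral load (log10 copies/mL) is the hidden third factor.
        viral_load = rng.uniform(2.0, 6.0)
        # Higher viral load pushes CD4 down (plus some noise).
        cd4 = 900.0 - 120.0 * viral_load + rng.gauss(0.0, 30.0)
        if il2_treated:
            # Assumed treatment effect: CD4 goes up, viral load is untouched.
            cd4 += 150.0
        # Transmission risk depends on viral load only, not on CD4.
        risk = 0.05 * viral_load
        cd4_counts.append(cd4)
        risks.append(risk)
    return cd4_counts, risks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Same seed in both arms, so both arms contain the same simulated mothers.
untreated_cd4, untreated_risk = simulate_arm(1000, il2_treated=False)
treated_cd4, treated_risk = simulate_arm(1000, il2_treated=True)

print("CD4 vs. risk correlation (untreated):",
      round(pearson(untreated_cd4, untreated_risk), 2))
print("Mean CD4, untreated vs. treated:",
      round(mean(untreated_cd4)), round(mean(treated_cd4)))
print("Mean risk, untreated vs. treated:",
      round(mean(untreated_risk), 3), round(mean(treated_risk), 3))
```

In this toy world, CD4 count and transmission risk are strongly correlated, yet raising CD4 changes nothing about transmission, because the treatment never touches viral load.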
Drugs may have other unfavorable effects, despite having a “beneficial” impact on surrogate endpoints. Take hormone replacement therapy for women after menopause: the treatment raises HDL level (i.e. concentration of “good” cholesterol), but it may also increase the risk of deep vein thrombosis and breast cancer.
In short, there are many pitfalls. So caution is needed when dealing with surrogate endpoints, in order to avoid false hope.
What to do now, as medical practitioners?
Are you looking for something that brings direct benefits to patients?
Look for POEMs!
POEMs stands for Patient-Oriented Evidence that Matters. It is direct evidence that a medical intervention, on average, lengthens life, decreases symptoms, or improves quality of life.
If a study showed that a new drug reduces total cholesterol by 15%, the evidence is not yet persuasive. We could be far more confident if that drug were also shown to reduce cardiovascular mortality. That would be better for patients.
However, POEMs are not always available.
In that case, just follow the endpoint hierarchy. You should start with level 1 (that is, POEMs). Then, if level-1 evidence is not available, move on to level 2. And so on.
- Level 1: a true clinical-efficacy measure.
Examples: 1) death or hospitalization, in heart failure; 2) cardiovascular death, stroke, symptomatic myocardial infarction, in acute coronary syndrome; 3) stroke or systemic embolic event, in atrial fibrillation; 4) pain or loss of joint function, in rheumatoid arthritis; etc.
- Level 2: a validated surrogate endpoint.
Examples: 1) HbA1c (a measure of average blood sugar level over time) for clinical effects on long-term risk of microvascular complications, in type II diabetes mellitus; 2) systolic and diastolic blood pressure, for multiple classes of antihypertensives; 3) improvements of more than 40 meters in 6-minute walk distance, in pulmonary arterial hypertension; etc.
- Level 3: a nonvalidated surrogate endpoint, yet one established to be “reasonably likely to predict clinical benefit”.
Examples: 1) large and durable effects on viral load, in some HIV treatment settings; 2) durable complete responses, in some hematologic oncology settings; 3) large effects on progression-free survival, in some solid tumor oncology settings; etc.
- Level 4: a correlate that is a measure of biological activity but that has not been established to be at a higher level.
Examples: 1) CD4 count (an indicator of how well the immune system is working), in HIV-infected patients; 2) prostate-specific antigen (PSA) levels, in prevention of prostate cancer symptoms; etc.
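If you like to think in code, the "start at level 1, then move down" rule can be sketched as below. This is just an illustration of the idea: `EndpointLevel`, `Evidence`, and `best_available` are hypothetical names invented here, and the sample entries are a made-up scenario loosely based on the cholesterol example above.

```python
from dataclasses import dataclass
from enum import IntEnum


class EndpointLevel(IntEnum):
    """The four levels of the endpoint hierarchy (lower number = stronger)."""
    CLINICAL_EFFICACY = 1            # true clinical-efficacy measure (POEMs)
    VALIDATED_SURROGATE = 2          # validated surrogate endpoint
    REASONABLY_LIKELY_SURROGATE = 3  # nonvalidated, but reasonably likely to predict benefit
    BIOLOGICAL_CORRELATE = 4         # measure of biological activity only


@dataclass
class Evidence:
    description: str
    level: EndpointLevel


def best_available(evidence):
    """Return the piece of evidence highest in the hierarchy, or None if there is none."""
    if not evidence:
        return None
    return min(evidence, key=lambda item: item.level)


# Hypothetical evidence for an imaginary drug.
found = [
    Evidence("lowers total cholesterol by 15%", EndpointLevel.BIOLOGICAL_CORRELATE),
    Evidence("reduces cardiovascular mortality", EndpointLevel.CLINICAL_EFFICACY),
]

top = best_available(found)
print(f"Rely first on: {top.description} (level {int(top.level)})")
```

The point of the sketch is only the ordering: lower-level evidence is a fallback when nothing higher in the hierarchy exists, not something to weigh equally.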
Now you have an invaluable compass to explore the medical evidence jungle.
Feel free to send me a postcard if you have any problems during the trip.
1/ Kahneman, Daniel. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux, 2011.
2/ Haguenauer D, Welch V, Shea B, Tugwell P, Adachi JD, Wells G. Fluoride for the treatment of postmenopausal osteoporotic fractures: a meta-analysis. Osteoporos Int. 2000;11:727–738.
3/ “Reporting Information Regarding Falsification of Data” Proposed Rule. http://www.fda.gov/downloads/drugs/newsevents/ucm440796.pdf
4/ Oliver MF, et al. A co-operative trial in the primary prevention of ischaemic heart disease using clofibrate. Br Heart J. 1978 Oct;40(10):1069–1118.
5/ Gerstein HC, Miller ME, Byington RP, Goff DC, Bigger JT, Buse JB, et al. Effects of intensive glucose lowering in type 2 diabetes. N Engl J Med. 2008 Jun;358(24):2545–2559.
6/ Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA. 2002 Jul;288(3):321–333.
7/ Fleming TR, Powers JH. Biomarkers and surrogate endpoints in clinical trials. Stat Med. 2012;31(25):2973–2984.