Without standards for testing treatment ideas,
we'd have thousands of choices with no basis for making a helpful
and no foundation on which to make reliable scientific or clinical
Observations can fool us, as in this historic example:
"For many centuries doctors used leeches and lancets to relieve
patients of their blood.
They knew that bloodletting worked.
EVERYBODY said it did. When you had a fever and the doctor bled
you, you got better.
EVERYONE knew of a friend or relative who
had been at death’s door until bloodletting cured him.
could recount thousands of successful cases."
Why We Need Science: “I saw it with my own eyes” Is Not Enough
of trying to see things as they are:
... and knowing what we do not know.
Our goal is to
help patients and caregivers to weigh the strengths and weaknesses of medical claims
We encourage patients and caregivers to develop the good habit of
asking informed questions.
If a study finding is strong the authors will be able to provide good
answers and will be happy to do so.
Do the results reported predict outcomes in the real world ... predict
what will happen to me or you?
Was everything that is important to report (clinically relevant),
provided in the abstract or the press release?
How do we start?
By focusing on the methods of the study, not just the results and
the conclusions of the authors.
By learning about
the key factors that can influence how the study is interpreted, such as:
How many people participated in the study (1, 6, 30, 50, or 300)?
How were the participants selected?
(type of lymphoma, eligibility: age, exclusions, prior
Was there a control group?
What is the natural history of the disease for the eligible
(varied, predictable, favorable, short survival without
How were the treatment effects (good and bad) measured?
What was the length of follow up?
(did the effects last, were there late bad effects?)
By reviewing the
source document, instead of the press release
By recognizing the potential influence of conflict of interest
By asking if other groups have had similar findings in similar
By asking independent experts in the field
reasons for different groups:
"Bias" has two
inclination of temperament or outlook; especially: a personal and
sometimes unreasoned judgment." Bias can influence how one looks at
outcomes, or in what we choose to read or ignore.
(2) In a study design, a bias is defined as an error in the method of
study that leads to a deviation in the outcome away from the truth.
Sources of Bias:
Wikipedia: "A conflict of
interest is a situation in which someone in a position of trust,
such as a lawyer, insurance adjuster, executive or director of a
corporation or a medical research scientist or physician, has
competing professional or personal interests. Such competing
interests can make it difficult to fulfill his or her duties
impartially. A conflict of interest exists even if no unethical or
improper act results from it. A conflict of interest can create an
appearance of impropriety that can undermine confidence in the
person, profession, or court system."
Conflicts of interest occur when personal, professional, or financial
interests intentionally or unintentionally influence decisions on
scientific methods, or how data from the study are interpreted.
conflict of interest, I believe, is any financial association that would
cause an investigator to prefer one outcome of his research to another.
Let me give you an example. If an investigator is comparing drug A with
drug B and owns a large amount of stock in the company that makes drug
A, he will prefer to find that drug A is better than drug B. That is a
conflict of interest." ~ Marcia Angell, M. D. Source:
Those who develop
new drugs or sell supplements have an inherent financial conflict of
interest with respect to objectively evaluating the true worth and
benefits of their products or services, which can lead to selective
reporting or "hyping" the products in order to maximize shareholder
confidence and the profitability of the company that sells them.
Scientists who have financial interests in products or services must
disclose these relationships, which make them inherently less able to
overcome the biases these monetary interests can create.
"While most people
think conflicts of interest are a problem of overt corruption, that is,
that professionals consciously and intentionally misrepresent the advice
they give so as to secure personal gain, considerable research suggests
that bias is more frequently the result of motivational processes that
are unintentional and unconscious " 4
Consider that, by investing in a company, a scientist may demonstrate a
belief in the value of its product, perhaps in advance of evidence.
Scientists have an ethical responsibility to avoid arriving at
conclusions ahead of time. The discipline of science requires that
theories be tested in well-controlled studies, and that the outcomes be
evaluated objectively before conclusions are made.
"Was the research funded by an organization that generally advocates
a specific point of view?
Do the findings of the research parallel the organization's point of
view - or too closely?"
Because of personal inclinations, expectations, or other biases, a
participant in a study may report events more favorable to the
hypothesis, and leave out, or not see, events that contradict it. There
is also the potential for this type of bias in case reports
Investigators may introduce bias into a study by selecting patients who
have characteristics (such as young age) that are favorable to a desired
outcome; and also by excluding patients who do not (such as patients
with low blood counts). Randomized studies protect against this kind of
Was the data source manipulated to produce analyzable data?
Was the data source selected with an unusual methodology?
Was the data source unusually small or narrow, or tightly controlled by
Was the data source self-selected (not random)?
Investigators and scientists can develop unintended prejudices about the
value of their own work, ideas, or intellectual property.
Investigators may unconsciously see benefit when none exists, or they
may set up a study in ways that are less likely to reveal weaknesses and more likely to exaggerate
the benefits, overlook unanticipated side effects, and so on.
know that most men, including those at ease with problems of the
greatest complexity, can seldom accept even the simplest and most
obvious truth if it be such as would oblige them to admit the falsity of
conclusions which they have delighted in explaining to colleagues, which
they have proudly taught to others, and which they have woven, thread by
thread, into the fabric of their lives.”
~ Leo Tolstoy
Is the researcher affiliated with an organization that promotes a
specific point of view?
Does the researcher produce studies that consistently generate the same
- Is the research group the ONLY group interested in the research
is the tendency
to give more weight to incidents and data that conform to preexisting
beliefs and to forget things that do not.
"We're all prone to it, including scientists. One major advantage of
the scientific method is that it is pretty good at overcoming
confirmation bias." Source:
Biases common to
patients and caregivers:
In order to cope with living with a life-threatening disease, patients
or caregivers may develop a tendency to minimize the dangers of the
disease, or to inflate the potential of alternative and other less toxic
approaches to control it. Denial can lead to missed opportunities and
delays that can make the disease more difficult to treat.
Fear: To be fearful of a cancer or cancer treatment is to be
human, and sometimes it's justified. In patients, the fear of the
toxicity associated with many standard cancer therapies can form a bias
in favor of claims made for safer alternative, or even investigational
We are highly prone to wishful thinking.
Physician biases (reasons to consult independent experts):
Even a trained
oncologist can have conflicts of interest, biases, or gaps in knowledge
- especially if he or she does not specialize in lymphomas.
Investigators may have intellectual biases about any therapies they may
Community doctors might have biases in favor of what is easiest to
HMO physicians may prescribe what is least expensive. Other doctors
might be influenced, perhaps unconsciously, by sales promotions from the
Patients expressing their desire to continue working without
interruption may influence a busy physician to prescribe what meets the
immediate needs, without fully discussing possible negative long-term
implications of that treatment decision.
For a readable
and concise paper on scientific integrity
Sources of Bias
The Dirt on
Coming Clean: Perverse Effects of Disclosing Conflicts of Interest
"It is important to recognize your biases, and to periodically evaluate
whether they are interfering with your judgment. The first step to not
letting bias interfere with your judgment is to accept that it's there
and decide to deal with it."
Cause and Effect?
Did an action lead to an outcome? ... or was it coincidental?
An association is an observation that one event or condition occurs with
another. But associations do not mean one thing caused another.
That is, associations do not prove causality - the relating of
causes to the effects they produce.
Suppose a survey shows that people who drink wine live longer than those who drink beer.
We might be
tempted to interpret this association to mean wine is better for you
than beer. However,
it may be that
people who choose wines are more likely to eat healthier foods, or that foods that go well with wine are better for you
than foods that go well with beer.
So the chips and pizza are confounding
variables and the study does not yet prove causality: the consumption of wine might not influence health one way or the
There's a likely association between people who wear crash
helmets and brain injury.
However, it's easy to recognize it's the high-risk activities
of folks who must wear helmets that increase risk of brain injury,
not the helmets.
In any study,
better outcomes may be observed for people born in August or January.
The last example is
obviously explained by chance.
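The wine and beer example can be worked through with a small table of made-up numbers. The sketch below is only an illustration: the counts and lifespans are hypothetical, chosen so that diet alone determines lifespan, and the name `average_lifespan` is introduced here for the example. It shows how a gap can appear between wine and beer drinkers overall and vanish once people with the same diet are compared.

```python
# Hypothetical survey data, invented for illustration only.
# Key: (drink, diet) -> (number of people, average lifespan in years).
# By construction, lifespan depends on diet alone, never on the drink.
survey = {
    ("wine", "healthy"): (80, 82.0),
    ("wine", "unhealthy"): (20, 76.0),
    ("beer", "healthy"): (20, 82.0),
    ("beer", "unhealthy"): (80, 76.0),
}


def average_lifespan(drink, diet=None):
    """Weighted average lifespan for one drink, optionally within one diet group."""
    rows = [
        (people, years)
        for (row_drink, row_diet), (people, years) in survey.items()
        if row_drink == drink and (diet is None or row_diet == diet)
    ]
    total_people = sum(people for people, _ in rows)
    return sum(people * years for people, years in rows) / total_people


# Crude comparison: ignores diet, so the confounder leaks into the result.
crude_gap = average_lifespan("wine") - average_lifespan("beer")

# Stratified comparison: compare like with like, one diet group at a time.
gap_healthy = average_lifespan("wine", "healthy") - average_lifespan("beer", "healthy")
gap_unhealthy = average_lifespan("wine", "unhealthy") - average_lifespan("beer", "unhealthy")

print(f"Wine vs. beer, everyone pooled: {crude_gap:+.1f} years")
print(f"Wine vs. beer, healthy eaters only: {gap_healthy:+.1f} years")
print(f"Wine vs. beer, unhealthy eaters only: {gap_unhealthy:+.1f} years")
```

With these invented numbers the pooled gap is 3.6 years in favor of wine, while the gap inside each diet group is zero. The association is real, but the drink is not the cause.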
conclusions that A caused B can range from harmless to
when a baseball player hits a home run and recalls that he had eggs
for breakfast that day - who then continues to have eggs on game
Dangerous, when a patient delays a needed treatment based on anecdotal
reports (an association taken as fact) that use of an herb can control
However, making observations and exploring
possible connections between events
(forming hypotheses) is the starting point of science - inspiring
further research and well-controlled experiments that prove or disprove
associations are so common, plausibility is required before
expensive research would be done. For example, it would be
foolish to study the effects of birth dates on outcomes, or to test
if minute fractions of a compound - too small to measure - could
have therapeutic effects on human disease. See
Here are some
common assumptions about cause and effect related to indolent lymphoma,
kept alive by testimonials:
My disease is
stable, therefore the life style changes I have made are helping.1
resistance assay predicted my response to treatment and my response
was great. Therefore,
the assay is proven to be valid. 2
My lymph nodes
are regressing, therefore the investigational vaccine I took is
My lymph nodes
started growing after I had chocolates, therefore chocolate cause
My lymph nodes
started regressing while taking Aloe Vera, therefore it's effective
against the disease.1
with indolent lymphomas may be particularly susceptible to
confusing cause and effect, because the natural course of the
disease is so variable. It may remain stable for many years without any
regress spontaneously "as many as 20% to 30% of patients will
experience regressions at some time in the clinical course of their
Therefore, if a practitioner prescribes a life style or alternative
protocol that 100 patients follow, as many as 30% are likely to do well
because they would have done well anyway. This "effect," - which has
good probability of being unrelated to the practice - will often result
in strong belief and promotions, as in: "How can you argue with
sensitive to many treatments and has a variable natural course - can wax
and wane independent of therapy. The proof that an assay can predict
response would therefore require controlled studies on many patients over time.
Practitioners have an obligation to
state that their ideas and practices have not been proven to
provide benefit or predict outcomes.
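The arithmetic behind the 100-patient example above can be sketched in a few lines. This is a rough illustration, not a clinical model: it assumes every patient has the same 30% chance of a spontaneous regression (the upper end of the range quoted above) and that the protocol itself does nothing. The function name `probability_at_least` is made up for this sketch.

```python
import math

# Assumption for illustration: 30% of patients regress spontaneously,
# the upper end of the 20% to 30% range quoted above.
SPONTANEOUS_REGRESSION_RATE = 0.30
PATIENTS = 100


def probability_at_least(successes, trials, rate):
    """Exact binomial probability of seeing `successes` or more in `trials`."""
    return sum(
        math.comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Patients expected to do well even if the protocol does nothing at all.
expected_regressions = PATIENTS * SPONTANEOUS_REGRESSION_RATE

# Chance that at least 20 of the 100 would have a regression to report.
chance_of_20_or_more = probability_at_least(20, PATIENTS, SPONTANEOUS_REGRESSION_RATE)

print(f"Expected regressions with an inert protocol: {expected_regressions:.0f}")
print(f"Chance of 20 or more such cases: {chance_of_20_or_more:.3f}")
```

Under these assumptions an inert protocol would still be followed by about 30 regressions, which is more than enough to supply sincere testimonials.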
Vaccines and Autism link:
It's easy to
confuse correlation with causation, by
Tony L. Hines
John Godfrey Saxe's version of an Indian parable applies to the quest
for understanding - and actually understates the complexity of human
biology and drug interactions,
It was six men of
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.
approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"
... (a target shooting analogy)
The drug failed to achieve the primary goal(s)/ endpoints the
researchers (and drug company) set for themselves ahead of the clinical
By slicing and dicing the data after the fact, in a small subset of
the patient cohort, they drew statistically iffy conclusions that served
As some wag once put it:
After-the-fact subset analysis is like shooting at a barn, then
painting a bull's-eye around the bullet hole, wherever it happened to be.
... Hardly compelling "proof".
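The reason after-the-fact subsets are so unreliable can be shown with simple arithmetic. The sketch below assumes a drug with no effect at all, a conventional threshold of .05, and subgroup tests that are independent of one another (a simplification). The function name `chance_of_false_alarm` is invented for this illustration.

```python
def chance_of_false_alarm(num_subgroups, alpha=0.05):
    """Probability that at least one of several independent tests looks
    'significant' by luck alone when the treatment truly does nothing."""
    return 1 - (1 - alpha) ** num_subgroups


# 12 could be one subgroup per birth sign; the other counts are arbitrary.
for num_subgroups in (1, 5, 12, 20):
    chance = chance_of_false_alarm(num_subgroups)
    print(f"{num_subgroups:>2} subgroups examined: {chance:.0%} chance of a spurious 'hit'")
```

With 12 subgroups the chance of at least one spurious "significant" result is about 46%, even though nothing is really happening. This is why the target has to be painted before the shot is fired.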
Careful patient advocacy requires that we be able to tell the difference
marketing hype of companies with clear financial interests and cases
regulatory hurdles are keeping back a truly valuable drug from the
The drug in question is hardly a good example for getting the
Subset analysis - an example
This subgroup analysis suggested
that the treatment was quite effective and statistically
significant for all patients except those born under the sign of
Gemini or Libra. The difference in outcome with respect to
astrologic sign was naturally an artifact and would not be
reproducible in subsequent studies. This was the point of their
believe every close pitch is a strike and that the strike zone is too
small. So just as we recognize the need for a neutral party to call
balls and strikes in baseball, we should also recognize the more urgent
need for an impartial agency, such as the FDA, to evaluate claims of
medical benefits and risks.
In order to test claims the drug sponsor must conduct well-designed
studies that minimize bias and demonstrate safety and efficacy for a
given condition. Absent evidence-based tests and assessments we'd
return to a "Wild West" environment with no means of making informed or
safe medical decisions, and no good foundation on which advances in
clinical science could be made.
Is the FDA without bias and conflict of interest?
No agency or human being is completely free of bias, but the agency is
committed and mandated by law to achieve impartiality. There are strong
policies on ethics and conflicts of interest in place, and criminal
penalties can be invoked for violations of these regulations.
About FDA's Ethics
Program: "The Agency’s ethics program is administered to help ensure
that decisions made by Agency employees are not, nor appear to be,
tainted by any question of conflict of interest. The "ethics" laws and
regulations were established to promote and strengthen the public’s
confidence in the integrity of the Federal Government. The Agency’s
Ethics and Integrity Branch provides advice and assistance to FDA
employees on a variety of ethics related matters including, but not
limited to, financial disclosure, prohibited financial interests,
outside activities, co-sponsorship agreements, and post employment."
Clinical Trial Design:
Retrospective versus Prospective trial designs
There are two
basic types of studies: retrospective and prospective. Understanding
each can help the patient advocate to better converse with scientists on
drug development and assessment ... and help consumers (all of us) to
evaluate the strength of evidence in scientific reports, and other
A patient advocate provided a nice analogy to help compare retrospective
and prospective studies:
"Shoot a cue ball into a pack of billiard balls and the 7 ball goes
in the side pocket. A retrospective analysis *looks back* at the shot
with the objective of finding evidence that guides how to play pool to
win. The prospective study, on the other hand, starts with a hypothesis
and tests it going forward: "I will shoot the ball into the pack this
way, and I predict the 7 ball will go in the side pocket." So with a
prospective study you must call your shot in advance.
Thus, the prospective study provides a much higher level of confidence
that the outcome was determined by the action (that it was causal), and
can be repeated ... that it was not the result of chance or other
Are the results of a single prospective experiment sufficient
evidence? Generally not, unless the findings are "robust." Most
often a second experiment will be needed to validate the first. You also
want to scrutinize the DESIGN of the experiment to see if it contains
BIASES (study flaws) that may have "rigged" the outcome. ... Perhaps
the 7-ball was put near the side pocket, or the table slants that way.
An example of bias in a clinical trial is when investigators select
"ideal" patients that have a favorable prognosis (good counts, young
age) ... or they don't count participants in the analysis who died from
"unrelated" causes, or they do not give sufficient weight to side
The main purpose of doing controlled experiments is to achieve an
acceptable LEVEL OF CONFIDENCE that the positive effects measured in the
experiment predict what will happen to patients in the real world. The
alternative to this expensive process is to rely on OPINION ... a return
to the dark ages.
Importantly, there is never absolute certainty in these matters.
Statistics is about measuring the level of certainty that an outcome in
an experiment predicts outcomes for the rest of us ... so that we can
have confidence that making a new drug available for an indication
(cancer, diabetes, osteoporosis) is on balance better for the patients
afflicted with the disease than no treatment, or an existing treatment.
So the billiard table analogy is useful but it oversimplifies. A
clinical trial is many times more complicated. For example: what is the
outcome that you are measuring (the end point), and how well does it
predict clinical benefit? Does tumor shrinkage increase survival, or
outweigh the risk of bone marrow toxicity? Does an increase in time to
progression offset the long-term risk of secondary MDS? Does the
intervention improve overall survival or quality of life? Thus, the
indication (cancer versus a cold), and what's already available to treat
it, have a lot to do with how much risk is acceptable for the new drug.
Not surprisingly the drug sponsor will have a bias, because they are
driven to do this difficult work in order to realize a profit. So the
industry is prone to setting up, interpreting, and reporting on the experiment in ways
that favor the benefit side of the equation.
however, that the PROFIT motive is ESSENTIAL to the process and to
progress. Without it new drugs would NEVER be developed or tested. We
need all of these: the profit incentive, rigorous scientific method
(controlled prospective studies), patient participation, and independent
and impartial FDA review.
The barometer that we are heading in the right direction in general is
an increase in life expectancy, and overall survival (OS) for various
indications: Note the recent improvements in OS for indolent follicular
The purpose of conducting well-designed trials
is to avoid the many dangers of practicing medicine based on opinion. We
want to be sure that an intervention - for a specific condition -
provides clinical benefit. ...
For example, without the use of a controlled
study, Hormone Replacement Therapy (HRT) would still be a common
practice today ... we would still be giving these hormones to
women and, contrary to what was anticipated based on case
observation and theories, increasing their risk of
heart disease and cancers.
Finally, importantly, with opinion-based medicine we would have no
scientific foundation to build on. With the value of drug A based on
opinion (observations and theory), we could not reliably compare it to
drug B, or evaluate how prudent it is to test drug C with drug A.
If we took away standards for approval -- as the Abigail Alliance appears to
advocate -- we would soon be victimized by claims, counterclaims,
sales pitches, and promotions of inadequately tested drugs. Drugs would
be released into the market with insufficient safety and efficacy
information. Gathering this information is much more difficult
after marketing, and absent standards of evidence for the release
of new drugs, the difficulty would increase exponentially.
perspective more believable?
Perhaps we can
learn quickly from the perspectives of doctors and scientists who have
cancer, as the threatening nature of the disease is likely to remove any
financial biases they may have had in respect to the integrity of the
drug evaluation system in America.
don’t understand the difference between information based on theory,
anecdote, historical analysis, or double-blind placebo controlled
studies are making ill-informed decisions, believing alternative
therapies are safer or more effective when they are not.
Even patients who presume that alternative therapies are ineffective may
use them. Why? When faced with a life-threatening disease requiring
highly toxic treatments with no guarantees, or when dying because there
are no effective conventional treatments, it takes guts to reject
something or someone claiming to be able to save you, just in case you
might be wrong." - Wendy S. Harpham, MD (NHL survivor) Full text:
are often used to
promote unorthodox therapies
Please consider that such a conspiracy would require the complicity of
many thousands of scientists, doctors, and regulators - who also get
cancer and spouses, parents, grandparents, children, and loved ones also
lymphoma are particularly vulnerable to believing in easy, risk-free
The charlatan or
quack has a message that appeals to wishful thinking and also our desire
for certainty, which is an easy message to craft when unencumbered by
the truth ... or an objective test that a theory actually works and
provides clinical benefit.
In medicine - as
in life - benefit is not often achievable without risk, and outcomes are
rarely certain. Unlike the charlatan, the trained doctor is obligated
to provide accurate information that is based on published clinical
studies, which will describe both risks and benefits, relative to the
disease untreated or treated differently.
Red flags of quackery:
See also our printable Red Flags and Free Speech
certainty that you will be helped or cured
as a cure for many types of disease.
solely by the practitioner.
The lack of
acceptance of the promoted remedy is blamed on a conspiracy.
The remedy has
is based entirely on
studies are cited.
"A charlatan (also called swindler) is a person practicing
quackery or some similar
confidence trick in order to obtain money, fame or other
advantages via some form of
"Most people think that quackery is easy to spot. Often it is not.
Its promoters wear the cloak of science."
Patients Deal with Questionable Cancer Treatments, William Jarvis,
and Quackery -
"Drs. Barrett and Herbert define a quack as anyone who
fraudulently pretends to medical skills they do not possess. They
distinguish among three types: dumb quacks (ignorant), deluded
quacks (self-righteous, true believers), and lastly dishonest quacks
The term pseudoscience can be applied to any information
masquerading as science. The fakery may be obvious, as in the case
of supermarket tabloids, or much more subtle, and potentially
harmful, as is the case when well-known personalities recommend
unproven remedies for serious medical conditions.
What are some signs that can alert us to the presence of
Medicine: Identifying Potential Cancer Treatments Of Herbal Origin
(Mar. 5, 2008) — Curing cancer with natural products -- a case for
shamans and herbalists?
Not at all, for many chemotherapies to fight cancer applied in
modern medicine are natural products or were developed on the basis
of natural substances. Thus, taxanes used in prostate and breast
cancer treatment are made from yew trees. The popular periwinkle
plant, which grows along the ground of many front yards, is the
source of vinca alkaloids that are effective, for example, against
malignant lymphomas. The modern anti-cancer drugs topotecan and
irinotecan are derived from a constituent of the Chinese Happy
help you assess clinical data and medical claims:
"All scientific work is
incomplete - whether it be observational or experimental.
All scientific work is liable to be upset or modified by advancing
That does not confer upon us a freedom to ignore the knowledge
we already have or postpone the action that it appears to demand at a
given time. "
- Sir Austin Bradford Hill (1965)
Treatment Response |
Statistical Significance (p-value & confidence)
Abstracts are summaries of larger papers and therefore do not
contain all the available details of the study methods and data.
Abstract conclusions may not be accepted by experts in the
field. Reputable peer-reviewed journals sometimes require
modifications to conclusions from the original abstract for this
reason -- or they may reject the paper from publication because it
was determined that the methods (methodology) or data did not
support the conclusions made in the abstract.
Therefore, it's important to
avoid forming conclusions on the
basis of abstracts. They should be considered only a
starting point for discussions with your doctors and perhaps a basis
for additional inquiries and research.
"I think it's important to note that one must be very cautious in
drawing conclusions from merely reading an abstract. It's important to
read the full article, understand the methodology, and the strength of
the statistics and research design to determine if the conclusions the
authors present in the abstract are reasonable. The better the journal -- and the higher the quality of the peer review
necessary to be published in the journal, the more likely the methods
and design, etc. will be good. Even with that, I've seen some
questionable studies get into good journals." - L - (NHL-survivor &
Questions to ask:
1) Is the paper published in a respected journal?
2) What types of studies and methods were used to reach the
3) Do papers published by other groups support the conclusions?
Reproducibility is valued in science, especially when a
finding comes from a different investigative group. The reason for this
is that it reduces the chance that bias, or choice of methods, or chance
influenced the findings or the conclusions.
Key: start by asking questions of (not just accepting) the
information we receive.
"A hypothesis consists either of a suggested explanation
phenomenon (observable event or, quite literally, something that
can be seen.)"
Theories are starting points for experiments and
studies. They should not be regarded as a proof, no matter the
reputation of the author. When someone tells you that this
is how a treatment works and that it is therefore desirable, you
What clinical data supports the theory?
Who published the findings, and in what journal?
Evidence-based medicine requires that a theory - the hypothesis
- be tested objectively ...
in a way that minimizes biases - that the pool table is not tilted
in a way that favors the theory.
Describes clinical outcomes from
therapy based on a clinical
change, such as the reduction in size of a lymph node. Responses, however, may or may not result in
benefit - defined as improved survival or the reduction of symptoms.
For example, with lymphatic cancers the lymph nodes can increase and
decrease in size because of transient inflammatory reactions, which
could lead to false assumptions about the benefit of a drug or a
life style intervention. Also, the reduction in tumor size might
be offset by the short or long-term toxicities of the drug.
Some questions to
ask about response:
1) How long was the response?
2) At what intervals were the outcomes measured and with what tests?
3) Did the measured response correspond to clinical benefits?
4) Who reported the responses and were the outcomes verified by
5) How large was the patient sample, and how were they selected?
6) What is the expected clinical course of the disease?
7) What were the short or long-term toxicities (the costs) relative to
Mean (average), Median
(middle), Mode (most common)
Mean: the total of all numbers included divided by the quantity of numbers.
Median: the number midway through an odd-sized set of numbers (sorted by size), or
the value halfway between the two middle numbers in an even-sized set.
Mode: the number or value that occurs most frequently in a series.
If two or more values tie for the highest frequency, the series has more
than one mode.
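A short worked example may make the three measures concrete. The remission times below are hypothetical numbers invented for this sketch, which uses Python's standard `statistics` module. Where values tie for most frequent, the sketch simply lists every tied value.

```python
import statistics

# Hypothetical remission times in months, invented for illustration.
remission_months = [2, 3, 3, 5, 7, 10, 40]

mean_months = statistics.mean(remission_months)      # sum of values / number of values
median_months = statistics.median(remission_months)  # middle value of the sorted list
modes = statistics.multimode(remission_months)       # every most-frequent value

print(f"mean={mean_months}, median={median_months}, mode(s)={modes}")

# Even-sized set: the median is halfway between the two middle values.
print(statistics.median([2, 3, 5, 7]))

# Tie for most frequent: multimode reports each tied value.
print(statistics.multimode([1, 1, 2, 2]))
```

In this made-up sample, the single long remission (40 months) pulls the mean up to 10 months while the median stays at 5 and the mode at 3. This is one reason study reports often quote the median rather than the average.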
Note: These calculations are meant to
provide information based on the study sample for comparison purposes,
not to predict individual outcomes. The significance of these
calculations depends on other important questions as described below:
Survival curves - the basics
"You will often
find survival curves inhabiting technical papers in the medical
literature, and some understanding of these curves is essential to
understanding the technical literature."
Statistics (a primer):
The purpose of statistical analysis is to tell us how
likely it is that the finding in the experiment predicts outcomes in the real
world. Was it fully or partially due to chance? What
is the level of confidence?
Is the question being asked relevant?
Do the data come from reliable sources?
Margin of error/confidence
interval—when is a change really a change?
Are all data reported, or just the best/worst?
Are the data presented in context?
Have the data been interpreted correctly?
Does the author confuse correlation with causation?
About statistical significance:
In order to
draw better conclusions about data we need to know just a little
about measures of statistical significance, which I think of as a
level of certainty that an outcome was not due to chance.
So, hypothetically, if you gave two groups identical placebo therapies there
would likely be a difference in the outcome - even if those two groups were
quite large. That is, one or the other group will
do better given identical drugs because by chance its members will have
higher or lower risk disease, or capacity to heal, and so on. So it's very common to see what
looks to be a sizable difference in a study outcome be described as
"no significant difference."
conclusions, about responses to treatment in a small group of
patients for example, are not absolute. Everything is possible, but
some things are very possible, some are less possible and others are
very unlikely -- but still possible to occur.
We draw conclusions with a certain amount of confidence,
conventionally 95% or 99%,
but there is still some chance of an error (5% or lower)."
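The point made above about two groups given identical placebo therapies can be checked with a small simulation. This is a sketch under stated assumptions, not data from any study: 50 patients per group, a 30% chance that any patient improves, and a fixed random seed so the run is repeatable. The function names are invented for the illustration.

```python
import random


def simulate_placebo_trial(rng, patients_per_group=50, response_rate=0.30):
    """Give two groups the same placebo; return the number of responders in each."""
    group_a = sum(rng.random() < response_rate for _ in range(patients_per_group))
    group_b = sum(rng.random() < response_rate for _ in range(patients_per_group))
    return group_a, group_b


def share_of_trials_with_gap(trials=2000, min_gap=5, seed=1):
    """Fraction of simulated trials where the groups differ by `min_gap` or more responders."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        group_a, group_b = simulate_placebo_trial(rng)
        if abs(group_a - group_b) >= min_gap:
            hits += 1
    return hits / trials


share = share_of_trials_with_gap()
print(f"Trials with a gap of 5+ responders (10 percentage points): {share:.0%}")
```

Under these assumptions a gap of 10 percentage points or more between two identically treated groups turns up in a sizable minority of trials. A difference of that size in a small study can therefore be nothing more than chance.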
Margin of Error - Confidence
We might just look
for two measures of statistical significance in scientific reports to
quickly estimate the strength of the finding: p-value and
If the p-value
is .05 or less, the findings are considered statistically significant -
unlikely to be due to chance alone.
The confidence interval (CI) shows the level of confidence that
the study outcome predicts the result in a larger population, expressed
as a range. The wider the range, the less confidence we have that the
results of the study predict outcomes in the real world.
P-value: probability value that tells you if the outcome is due to chance.
Value or range that indicates statistical significance: .05 or less.
Examples: .01 is very good; .05 is on threshold (borderline); .08 is not statistically significant.
Confidence interval: 95% confidence interval (were the study repeated multiple
times, it would contain the true effect 95% of the time) - a
range of expected results.
Examples: 71% (95% CI: 42-92%); 83% (95% CI: 63-95%); 94% (95% CI: 63-97%).
In the last example, we might say that we are 95% confident that
the response rate is between 63 and 97%.
The wider the confidence interval the lower the confidence.
P-value or Probability value:
A value that results from a calculation that tells you how likely or
unlikely it is that the finding (of a difference in treatment response as an
example) was due to chance.
Common language for low p-value:
Statistically significant - means unlikely to be due to chance.
Caveats about importance of P-value:
Threshold (cap) is arbitrary, therefore, the closer the
p-value is to the threshold (.05), the less statistically
Statistical significance does not necessarily = clinical significance.
Chance is rarely the most pressing issue. The biggest threat is systematic
error (bias). Therefore, more qualitative questions include: Are
these the right patients? Are these the right outcomes? Are
there measurement biases? Are observed associations confounded
by other factors?
P-values provide no information on the results' precision - that is,
the degree to which they would vary if measured multiple
times. Consequently, journals are increasingly emphasizing a
second approach: reporting a range of plausible results,
better known as the 95% confidence interval (CI).
Factors that influence P-values:
Magnitude of the main effect: a larger difference will have a lower p-value.
Number of observations: a difference noted in a study of 500 patients
will have a lower p-value than the same difference observed
in a 25-patient group.
Spread of the data (standard deviation): if the observed differences
in response are consistent and less spread out, the p-value will
be lower (more statistically significant).
Adapted from American College of Physicians-American
Society of Internal Medicine
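To see how effect size and number of observations move the p-value, here is a minimal sketch using a two-proportion z-test. The test choice (a normal approximation), the function name, and the response counts are assumptions made for illustration; published analyses may use other tests, especially for small groups. The third factor, spread of the data, applies to continuous measurements and is not covered by this sketch.

```python
import math

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a difference in response rates (z-test, pooled)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: the same 60% vs 40% response rates at two study sizes.
p_small = two_proportion_p_value(15, 25, 10, 25)
p_large = two_proportion_p_value(300, 500, 200, 500)
# Hypothetical larger effect (80% vs 20%) at the small study size.
p_bigger_effect = two_proportion_p_value(20, 25, 5, 25)

print(f"25 per arm, 60% vs 40%:  p = {p_small:.3f}")
print(f"500 per arm, 60% vs 40%: p = {p_large:.2g}")
print(f"25 per arm, 80% vs 20%:  p = {p_bigger_effect:.2g}")
```

The same 20-point difference is not statistically significant with 25 patients per arm but is with 500 per arm, and a larger difference lowers the p-value even in the small study.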
95% Confidence Interval (CI):
"An interval computed from the sample data which, were the study
repeated multiple times, would contain the true effect 95% of the
time." "Where confidence intervals are wide, they indicate less
precise estimates of effect."
A study on a limited number of patients can only estimate real world
results; and if you did the same study 10 times (on different
participants with the same inclusion criteria), you would get a
different outcome each time.
So CI is a shorthand way to state this.
Example: in one study:
"Median duration of response was 21 months (95% CI, 18 to 24 months)."
The 95% part is to show, again, that the median outcome in any
single trial cannot with 100% certainty predict outcomes for the
entire population - in this case, all people with relapsed
indolent lymphomas or MCL using this drug.
You can also think of the CI as a margin of error:
21 months, plus or minus 3 months (18 to 24), as is done in election polls.
The narrower the CI, the more confidence there is in the
finding. If the 95% CI were very wide (2 to 100), that would show
very low confidence.
This ties into n (the number of patients in the study). In a ten
patient study, the CI would be very wide - probably not even
worth calculating - just as only 4 flips of a coin (all Heads)
would not show the true odds of coin-flip outcomes.
The CI will widen (lower confidence) when the study population is
small, or when it includes a mix of lymphoma types.
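The link between the number of patients and the width of the CI can be sketched for a simple response rate. The Wilson score method used here is one common way to approximate a 95% CI for a proportion; it is an assumption of this sketch, not necessarily the method used in the studies quoted, and the patient counts are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% CI for a proportion (Wilson score method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Same observed response rate (70%), three hypothetical study sizes.
for responders, patients in [(7, 10), (70, 100), (700, 1000)]:
    low, high = wilson_interval(responders, patients)
    print(f"{responders}/{patients} responded: 95% CI {low:.0%} to {high:.0%}")

# Four coin flips, all heads: the interval is far too wide to be informative.
low, high = wilson_interval(4, 4)
print(f"4/4 heads: 95% CI {low:.0%} to {high:.0%}")
```

The observed rate is identical in the first three cases, yet the interval narrows steadily as the number of patients grows.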
Measures of Association
ASSOCIATIONS & CAUSATIONS
difference | risk ratio | odds ratio
Risk Difference (excess risk):
The difference in the incidence among those who are exposed and
those who are unexposed.
If the incidence of lung cancer among smokers is 10 per
1000 and incidence among non-smokers is 1 per 1000, then
the risk difference is 10 per 1000, minus 1 per 1000
= 9 per 1000.
Risk Ratio (Relative Risk or RR):
The ratio of incidence among exposed and unexposed.
Using the same smoking example above, this will be 10 per
1000 (smokers), divided by 1 per 1000 (non-smokers) =
10/1 or 10. A Relative Risk of 10 means
that smokers have 10 times the risk of developing lung cancer
compared to non-smokers.
A risk ratio is an extremely powerful measure of association. The
greater the RR the more strength you have for the observed
association (in other words, an RR of 10 implies a much stronger
association between smoking and lung cancer compared to an RR of
An RR of 1 implies no association between two variables. In
practice, it's difficult to estimate the RR because true incidence
figures are obtained only from studies which have a longitudinal
component (cohort study) and such studies are difficult to do.
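The risk difference and risk ratio arithmetic from the smoking example can be written out as a short sketch. The function names are illustrative only; the incidence figures are the ones used in the example above.

```python
def risk_difference(exposed_incidence, unexposed_incidence):
    """Excess risk: incidence in the exposed minus incidence in the unexposed."""
    return exposed_incidence - unexposed_incidence

def risk_ratio(exposed_incidence, unexposed_incidence):
    """Relative risk: incidence in the exposed divided by incidence in the unexposed."""
    return exposed_incidence / unexposed_incidence

smokers = 10 / 1000      # 10 lung cancers per 1000 smokers
non_smokers = 1 / 1000   # 1 lung cancer per 1000 non-smokers

rd = risk_difference(smokers, non_smokers)
rr = risk_ratio(smokers, non_smokers)
print(f"Risk difference: {rd * 1000:.0f} per 1000")
print(f"Risk ratio: {rr:.0f}")
```

The two measures answer different questions: the ratio describes the strength of the association, while the difference describes how many extra cases occur per 1000 people exposed.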
The Hazard Ratio (HR) is a kind of relative risk
The hazard ratio is the effect of an explanatory variable
on the hazard or risk of an event (Wikipedia.org), such as
the effect of a dietary practice (high red meat consumption) on
mortality (the event), compared to those who have less of it.
If the Hazard Ratio (HR) is 1.00, there is no difference in the incidence
of the event when the two patient groups are compared - one with
the explanatory variable, the other without.
Red meat consumption and mortality
(women): HR, 1.36 ... confidence range: [95% CI, 1.30-1.43]
(men): HR, 1.31 ... confidence range: [95% CI, 1.27-1.35]
(modest increased risk of mortality from high red meat consumption)
Statin use during treatment in follicular lymphoma patients was
also associated with better event-free survival (the measured
event), though this difference was not statistically significant:
(HR = 0.67, ... confidence range: 95% CI: 0.39-1.16, p=0.15).
(NOTE: The Hazard Ratio (HR) above is less than 1, but the wide
Confidence Interval (CI) and the fact that the upper end of the range was
above 1 indicate that this difference between the two patient
categories could be from random variation - not statistically significant.)
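A quick way to read a ratio measure such as the HR is to check whether its 95% CI includes 1.00, the value that means no difference. The sketch below applies that check to the hazard ratios quoted above; the function and variable names are illustrative.

```python
def ci_includes_no_effect(lower, upper, no_effect=1.0):
    """True when the interval contains the 'no difference' value (1.0 for ratios)."""
    return lower <= no_effect <= upper

# (hazard ratio, lower 95% limit, upper 95% limit), as quoted in the text.
examples = {
    "red meat, women": (1.36, 1.30, 1.43),
    "red meat, men": (1.31, 1.27, 1.35),
    "statins, follicular lymphoma": (0.67, 0.39, 1.16),
}

for label, (hr, lower, upper) in examples.items():
    if ci_includes_no_effect(lower, upper):
        verdict = "CI includes 1.00: could be random variation"
    else:
        verdict = "CI excludes 1.00: statistically significant"
    print(f"{label}: HR {hr} ({lower}-{upper}) -> {verdict}")
```

This check says nothing about bias or clinical importance; it only flags whether chance alone is a plausible explanation at the 95% level.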
Reference point: The Hazard Ratio for smoking (variable) and lung
cancer (event) is
about 10 (range 2.35 to 19.33, depending on race)
Odds Ratio or OR (Relative Odds):
The odds (not risk) of occurrence of an event or disease compared
between two groups (exposed and unexposed).
An OR is
usually computed in a case control study where it is not possible
to get the true risk (incidence). Risk and odds tend to be very
similar when the disease occurrence is rare.
If a group finds that the odds of having a certain virus are 5
times higher in lymphatic tissue from patients with NHL than in
controls (normal tissue), that would be expressed as
OR = 5. An OR of 1 would mean no difference between the two groups.
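The statement that risk and odds are similar when the disease is rare can be checked with a little arithmetic. In this sketch the rare-disease counts reuse the smoking incidence figures from the risk ratio example, while the common-outcome counts are hypothetical; the function names are illustrative.

```python
def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of risks: events divided by everyone in each group."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)

def odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of odds: events divided by non-events in each group."""
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_unexposed = events_unexposed / (n_unexposed - events_unexposed)
    return odds_exposed / odds_unexposed

# Rare outcome: 10 per 1000 exposed vs 1 per 1000 unexposed.
rare = (10, 1000, 1, 1000)
# Common outcome (hypothetical): 500 per 1000 exposed vs 100 per 1000 unexposed.
common = (500, 1000, 100, 1000)

for label, counts in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*counts):.2f}, OR = {odds_ratio(*counts):.2f}")
```

When the outcome is rare the two measures nearly coincide; when it is common the odds ratio is noticeably larger than the risk ratio, so an OR should not be read as "times more likely" in that setting.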
Absolute risk /
Life time risk
The risk of developing a disease over a period of time. We all have
absolute risks of developing various diseases such as heart disease,
cancer, stroke, etc. The same absolute risk can be expressed in
different ways. For example, you have a 1 in 50 risk of developing a
lymphoma in your life. This can be expressed as a 2% risk, or a 0.02 probability.
Clinical Data (most reliable first)
Randomized controlled clinical trials:
(Provides strongest evidence of clinical benefit)
Participants are assigned randomly (by chance
instead of by investigator selection) to separate groups
(arms) for the comparison of different treatments -- usually a
standard and an investigational treatment. Patient informed consent
is required. Neither the investigators nor the patient choose the
group in which participants will be placed.
Using chance to assign people to treatment arms helps to avoid
selection bias -- putting patients in better health in the
investigational arm, for example. It also helps to ensure that the
groups will be similar and that the treatments they receive can be
compared objectively.
Randomized trials can be "double-blinded" or "non-blinded." In
double-blind studies, neither the investigator nor the participants
are informed of which arm the participants have been assigned to.
This also reduces bias and improves confidence in the findings.
NOTE: Systematic reviews that evaluate the outcomes in many
trials, including randomized trials, may be the best source of
evidence to guide clinical practice.
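Assignment by chance, as described for randomized trials, can be sketched in a few lines. This is a simplified illustration with hypothetical participant IDs and arm names; actual trials often use more elaborate schemes (for example, randomizing in blocks) and keep the allocation hidden from investigators.

```python
import random

def randomize(participant_ids, arms=("standard", "investigational"), seed=None):
    """Shuffle participants by chance, then deal them out evenly across arms."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {}
    for position, participant in enumerate(shuffled):
        assignment[participant] = arms[position % len(arms)]
    return assignment

# Hypothetical participant IDs; the seed only makes the sketch repeatable.
participants = [f"P{number:03d}" for number in range(1, 21)]
allocation = randomize(participants, seed=42)

for arm in ("standard", "investigational"):
    members = sorted(p for p, a in allocation.items() if a == arm)
    print(f"{arm} ({len(members)}): {', '.join(members)}")
```

Because neither the investigator nor the patient influences the shuffle, healthier patients cannot be steered into the investigational arm.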
Non-randomized controlled clinical trials
Participants are assigned to a treatment group based on criteria
determined by the investigators, such as prognostic indicators, and
disease type. This study design makes it possible for investigator
bias to influence the findings, and therefore there is less
confidence that the group receiving the treatment under study and
the control group are comparable.
Case series are studies (usually retrospective) that describe
outcomes (such as responses, time to progression, etc.) from
patients who received the treatment under investigation.
These provide weaker evidence than do experimental studies because
of the potential for biases such as, but not limited to, who is
observed and what outcomes the observer is looking for, unknown
association between factors and outcomes -- such as not accounting
for other reasons that could explain the observed result.
The value of these types of studies (e.g., case series, ecologic,
case-control, cohort) is that they provide preliminary evidence that
can be used as the basis for hypothesis testing in stronger
experimental studies, such as randomized controlled trials. Consider
the recent HRT report finding that using estrogens increases the
risk of heart disease and cancers. The hypothesis that it might
reduce these risks was based on observations that were proven to be
and the reports you don't see
Protocols, ethical principles, and a desire to maintain credibility
are beneficial forces that encourage responsible public reporting of
drug development research and clinical trial outcomes.
Be aware, however, that an easy way to put a positive spin on a company's drug
development project is to selectively report on favorable outcomes,
and to keep less than stellar results from being released at all.
Ask: How is response to treatment being defined? How were
the patients selected? How many patients were tested? What was the
control? Has the finding been replicated by an independent group?
Have all the study outcomes been reported? Is this an interim
report, or a report on all - predetermined number of
The Problems with Testimonials
"For many centuries doctors used leeches and lancets
to relieve patients of their blood.
They KNEW bloodletting worked. EVERYBODY said it did.
When you had a fever and the doctor bled you, you got better.
EVERYONE knew of a friend or relative who had been at
death’s door until bloodletting cured him. Doctors could
recount thousands of successful cases."
When Laypersons Give Medical Advice
The most obvious problem is that you can't tell in any
individual case what would have
happened if nothing was done, or something else -
particularly for a type of lymphoma that is known to wax and
wane, that is, one with a variable clinical course.
Such accounts can't inform
about the number of persons who have used the intervention
and did not benefit - or were harmed.
There is no all important denominator - the number of patients studied,
that can tell us if the result will occur in 1 of 5, or rarely -
in 1 of 30,000.
Testimonials cannot provide even an
estimate of a rate of effect in others, or if the
effect that is measured was even caused by the intervention.
A pre-specified study size is required to provide a rate of an event or
treatment effect. A control group is often required to
establish causality - that the intervention is the cause of the
This is the reason that expensive and large controlled studies are
often needed before a drug can be approved for the treatment of a
People who die cannot testify - are not included in the all
Patients who have tried and failed an alternative strategy
cannot be accounted for. Only the "successful" outcomes are reported,
which may be 1 in many thousands.
Compare with a peer-reviewed clinical trial, where the number of
patients receiving the treatment
is known up front (prospectively), and the positive and
negative outcomes are measured uniformly - and reviewed
authenticity of the report?
We cannot know if the person reporting the benefit really has
the medical condition, or if he or she is reporting the outcome accurately.
With testimonials there is no follow up, or independent review of
of the individual reporting the case?
Does the individual have a financial conflict of interest or
strong belief? Do they sell
the product or charge a fee for dispensing the information?
Is the testimonial a way of validating their personal decision
process and theories?
specifics of the case, such as the natural history of the
Even for cancers with a very poor prognosis there are case
reports in the literature of spontaneous remissions, independent
of any intervention.
People sometimes win the lottery, but this does not make
playing the lottery a good bet - particularly when betting your
outcomes were measured, when, and by whom?
Was the reported success objectively measured and validated by
Is it a patient reported
outcome? Was it that the patient felt better? What
tests were used to measure it? What happened later? ... did the intervention lead to a lasting
Is the condition self-limiting - does it sometimes self-correct
medical treatments were given shortly before or after?
A CT scan will often show lesions after standard treatment that
are necrotic scar tissue. Credit might be given to an alternative practice used after this
treatment, when it was merely the resolution of scar tissue
over time, a normal bodily process.
accuracy of the diagnosis?
Was it a false diagnosis of a cancer, or a cancer of a type with
an indolent course?
For all of
these reasons it's prudent to regard testimonials with suspicion -
particularly if the report is implausible, scientifically.
Scientists get cancer too, as do their children. Do
scientists think the approach is plausible?
Similarly, case reports have many of the above limitations
- cannot establish causality, and can't be the basis for predicting
in-vitro means in a test tube or cell culture; in-vivo means in the body
We frequently read or hear about the anti-cancer properties of this
or that supplement based on scientific research findings. Here are
some questions to ask of this kind of information:
response detected in a test tube (in-vitro/cell culture)?
The human body is far more complex than a test tube.
The tumor cells change when removed from the body; oftentimes
they will die spontaneously.
Nevertheless, indications of activity in a test tube often
become the basis for product claims about natural supplements.
Be aware that this can only be a starting point for additional
experiments. Using such data as the basis of medical claims is
irresponsible, and bias should be suspected. Furthermore, you
might ask if the dose used to produce the in-vitro effect is
possible to achieve in the body, or if it can be achieved
Is the claim for the promise of a drug or supplement based solely on
animal studies? While animals are useful for preclinical testing of
new drugs, there are many differences between animals and humans;
and drugs that show promise in tumors implanted in animals are not
always effective in cancers originating in humans, nor can they always be given
safely to humans.
See also A Mouse is not a Man (or Woman)
of Medical Evidence
Evidence for Clinical Benefit?
Description with notes
Random assignment prevents patient-selection
bias -- "cherry picking" lower-risk participants in the
Provides the most objective basis for understanding
risks and benefits.
|Phase III randomized, blinded, multi-center studies
Best if studies had large numbers of participants, and
the results are confirmed by other studies completed by independent
The data from phase III studies are subject to
peer review and sometimes to impartial independent 3rd-party
review, such as FDA.
is a study of
studies, including randomized types. A systematic review may
provide strongest evidence to guide clinical practice.
Look for the
reputation of the journal - it is best if published in highly
respected, peer-review journal.
Best results will show low
p-values and narrow confidence intervals.
P-values of .05 or
lower are considered statistically significant (unlikely to be due to
chance), but this does not speak to potential weaknesses in
The analysis should include all of the participants in
the study - the intent-to-treat population, not a subset.
studies do not necessarily show best use, but provide the best
information about risks and benefits for a specific
condition and setting, relative to the control - typically
the standard of care.
Low to Moderate
Subject to patient selection-bias
|Phase II studies
- typically small single-arm studies that look for
indications of clinical activity and safety, and sometimes
explore dose or dose schedule
Determine if phase III
testing is worth effort and expense, and importantly, if
potential benefits offset risks to future study
Phase II studies are often not conclusive.
(Watch for sponsor hype in press releases.)
If the safety
and signals of activity are good, data from phase II studies might provide
the rationale for phase III testing.
dose-finding, safety study
Phase I studies
- earliest clinical phase: dose-finding studies in human
This type of study cautiously seeks
bioavailability and safety information with
close supervision of the participants. For example,
helps to determine if the drug goes
to target cells, or organs of concern.
Most drugs fail at this phase, but
even the great ones start here. Typically participants have
tried and failed other options.
- anecdotal reports made by
physicians, usually for off-label use of an approved drug,
for which there is existing safety information.
Rationale for use is often mechanism-based. Note that the
rationale for Hormone Replacement Therapy was supported by
observation and case reports, and proven incorrect in a
Animal studies provide (imperfect) models for what
might happen in humans.
These next-step experiments provide clues about toxicity
and bioavailability of agents shown to have activity in cell
Tumors are often transplanted into
animals - setting up an artificial host/tumor environment.
Animal studies can't account for important differences in
biology, metabolism, tumor/host interactions in humans.
(in-vitro) These are
first-step experiments to show the activity of
compounds on cultured cancer cell lines, often of limited
types and strains ... cells that have been removed from the host
environment ... And like a fish out of water, these cells do
not behave like cancer cells in the body. Consider
that even malignant cells, resistant to cell death in the
body, will die spontaneously when removed from the body and
put into cell
Further, according to some sources, approximately 1 in 5,000 agents
showing activity in cell culture assays (such as a change in
proliferation rate) will become useful therapeutic agents.
Such experiments lack information on
bioavailability, toxicity or activity in
the body at clinically relevant doses.
Thus the results of in-vitro experiments are
hypothesis forming -- where research starts; not what conclusions can be based on.
Unfortunately, the limitations of in-vitro experiments
are not always explained in ads for herbal products promoted
as having anti-cancer activity in cell culture experiments.
Cause for suspicion
Anecdotal reports made by individuals about improvements in
health associated with an intervention.
sometimes found on commercial sites as a marketing strategy.
Reliance on testimonials is a
flag that controlled studies have not been done, else the
study, and not the story, would be the basis for the
Limitations of testimonials
- they cannot inform about (1) who is reporting the result; if they
have biases or conflicts of interest - or if it is a truthful
account; (2) the case details, such as prior or subsequent
treatments, or how the reported benefits were measured, or
how long the reported benefits lasted. (3) The background,
such as the natural history of the disease.
There is no way to know if negative reports are excluded -
those who die cannot provide a testimonial.
Not every scientific paper that is published is of high
Concentrate on the study
methods (such as the number of participants and how they
were selected, how long the follow up), not the conclusions
of the study authors
Have the outcomes been reproduced by another research
An active drug stops or interrupts a
disease process, such as cell division in cancer, but an
active drug is not necessarily an effective drug, because
the side effects of the drug might offset the positive effects.