This time, he talks about the statistics in the article we discussed in the previous post: "Risk stratification at diagnosis for children with hypertrophic cardiomyopathy: an analysis of data from the Pediatric Cardiomyopathy Registry". He explains some of the jargon of medical statistics for those who can just about spell the word!

Here we go:

__“There are lies, damned lies and statistics” …__

__Mark Twain certainly did not have medical biostatistics in mind when he popularized this phrase.__

The contribution of statistics to medical science cannot be viewed in the same light. The weight of a study is determined by its statistics. So let us take a breather from satisfying our intellectual taste buds by tasting only the abstract of an article, and proceed to look at the entire cooking process.

I was first exposed to statistics when I accompanied my mother to the vegetable market. The potatoes were “sampled” by size, freshness, color etc. Then they were screened for “predictive markers” indicating their likely shelf life before they reached their “end point”: either cooked or rotten. Then came the inevitable remark about what percent of the potatoes from the last purchase went bad (“failed to survive”) within a fortnight (“period of observation”). These observations helped us predict and pick the best potatoes.

In the article under discussion, “Risk stratification at diagnosis for children with hypertrophic cardiomyopathy: an analysis of data from the Pediatric Cardiomyopathy Registry”, the clinical implications and their applicability are quite obvious. So the million dollar question would be “Why do we need statistics to prove it?” The answer is simple. Any data needs to be authenticated before it can be applied in practice or replicated. This authentication is done by statistics.

In this study, two important statistical analytical methods were employed. Let’s discuss them in brief.

__Cox proportional-hazards regression__

**How important is it?**

It appears in one in ten papers.

**How easy is it to understand?**

Aim to understand the end result – the “hazard ratio” (HR).

**When is it used?**

The Cox regression model is used to investigate the relationship between an event (usually death) and possible explanatory variables. For instance, look at Tables 3 and 4 in the article in detail: “Idiopathic, diagnosed at age less than 1 year; Idiopathic, diagnosed at age more than or equal to 1 year; Malformation syndromes; Inborn errors of metabolism; With restrictive cardiomyopathy; With dilated cardiomyopathy.”

**What does it mean?**

The Cox regression model provides us with estimates of the effect that different factors have on the time until the end event. As well as considering the significance of the effect of different factors, the model can give us an estimate of life expectancy for an individual.

Regression and correlation are easily confused. Correlation measures the *strength* of the association between variables. Regression *quantifies* the association. It should only be used if one of the variables is thought to precede or cause the other. Interpreting the Cox model involves examining the coefficients for each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher and thus the prognosis is worse. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable.

The “HR” is the hazard (chance of something harmful happening) of an event in one group of observations divided by the hazard of an event in another group. An HR of 1 means the risk is 1 × that of the second group, i.e. the same. An HR of 2 implies twice the risk.
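
To connect the regression coefficient with the hazard ratio: in the Cox model the coefficient is the natural logarithm of the HR, so the HR is obtained by exponentiating the coefficient. The short Python sketch below illustrates this. The function name `hazard_ratio` and the coefficient values are made up for illustration and are not taken from the article.

```python
import math


def hazard_ratio(coefficient: float) -> float:
    """Convert a Cox regression coefficient (a log hazard ratio) into an HR."""
    return math.exp(coefficient)


# Hypothetical coefficients, chosen only to illustrate the sign rule.
examples = [
    ("no effect", 0.0),             # coefficient 0 gives an HR of 1
    ("worse prognosis", math.log(2)),   # positive coefficient gives an HR above 1
    ("better prognosis", -math.log(2)), # negative coefficient gives an HR below 1
]

for label, beta in examples:
    print(f"{label}: coefficient = {beta:+.3f}, HR = {hazard_ratio(beta):.2f}")
```

A coefficient of zero corresponds to an HR of 1 (same risk), a positive coefficient to an HR above 1, and a negative coefficient to an HR below 1.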

__Kaplan–Meier estimate of the survivor function__

To determine the Kaplan–Meier estimate of the survivor function, a series of time intervals is formed. Each interval is constructed so that it contains one observed death, and the time of this death is taken to occur at the start of the interval. A plot of the Kaplan–Meier estimate of the survivor function (Figure 1) is a step function, in which the estimated survival probabilities are constant between adjacent death times and only decrease at each death. Figure 1 of the said article depicts a stepwise graph of the surviving subjects in the study up to five years, where the number of surviving subjects can be seen on the Y axis of the graph.
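
For the curious, the step-function idea can be sketched in a few lines of Python. This is a minimal illustration of the usual product-limit calculation, not the software or the data used in the article: the function name `kaplan_meier` and the follow-up times below are hypothetical. At each death time, the running survival probability is multiplied by the fraction of subjects still at risk who survive that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  -- follow-up time for each subject
    events -- 1 if a death was observed at that time, 0 if the subject
              was censored (left the study alive)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # The curve only steps down at death times.
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data in years (not from the article).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(times, events):
    print(f"t = {t} years: estimated survival = {s:.3f}")
```

Between two death times the estimate stays flat, which is what gives the plot its staircase shape. Censored subjects do not cause a step, but they do reduce the number at risk for later steps.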

I have tried to simplify these statistical tests, probably a bit too much. The intricacies of a test lie in deciding where and when to use it. This narration cannot dwell on these details. Secondly, the calculations are quite enormous and confusing. But thankfully, suitable software is available to carry out these tests. It is imperative for a researcher to be conversant with the various statistical software packages available before embarking on a study. A few examples are Epi 6, SPSS, SAS etc. A researcher should have the statistical test decided before embarking on a study; therefore it is prudent to consult a biostatistician beforehand.

So much for now.

That is Dr Ravi Ramamurthy for you! If you have any doubts or applause, please email them to kiran.vs.dr@nhhospitals.com or write a comment in the comments box below. It will be honored!