Thursday, May 22, 2014

Dr Ravi Ramamurthy has done it again!

This time, he is talking about the statistics behind the article we discussed in the previous post: "Risk stratification at diagnosis for children with hypertrophic cardiomyopathy: an analysis of data from the Pediatric Cardiomyopathy Registry". He explains some of the jargon of medical statistics for those of us who can barely spell the word!

Here we go: 
“There are lies, damned lies and statistics” …..

Mark Twain certainly did not have medical biostatistics in mind when he popularized this phrase.
The contribution of statistics to medical science cannot be dismissed in the same light. The weight of a study is determined by its statistics. So let us take a breather from merely satisfying our intellectual taste buds with the abstract of an article, and actually look at the entire cooking process.
I was first exposed to statistics when I accompanied my mother to the vegetable market. The potatoes were "sampled" by size, freshness, color and so on. Then they were screened for "predictive markers" indicating their likely shelf life before reaching their "end point", either cooked or rotten. And there was the inevitable remark about what percentage of the last purchase went bad ("failed to survive") within a fortnight ("period of observation"). These observations helped us predict and pick the best potatoes.
In the article under discussion, "Risk stratification at diagnosis for children with hypertrophic cardiomyopathy: an analysis of data from the Pediatric Cardiomyopathy Registry", the clinical implications and their applicability are quite obvious. So the million dollar question would be: "Why do we need statistics to prove it?" The answer is simple. Any data needs to be authenticated before it can be applied in practice or replicated. This authentication is done by statistics.
In this study, two important statistical analytical methods were employed. Let’s discuss them in brief.

Cox proportional-hazards regression:
How important is it?
It appears in roughly one in ten medical papers.
How easy is it to understand?
Aim to understand the end result – the “hazard ratio” (HR).
When is it used?
The Cox regression model is used to investigate the relationship between an event (usually death) and possible explanatory variables; for instance, see Tables 3 and 4 of the article: "Idiopathic, diagnosed at age less than 1 year; Idiopathic, diagnosed at age more than or equal to 1 year; Malformation syndromes; Inborn errors of metabolism; With restrictive cardiomyopathy; With dilated cardiomyopathy."
What does it mean?
The Cox regression model provides us with estimates of the effect that different factors have on the time until the end event. As well as considering the significance of the effect of different factors the model can give us an estimate of life expectancy for an individual.
Regression and correlation are easily confused. Correlation measures the strength of the association between variables; regression quantifies the association, and should only be used if one of the variables is thought to precede or cause the other. Interpreting the Cox model involves examining the coefficient for each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher and thus the prognosis is worse. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable.
The "HR" is the ratio of the hazard (chance of something harmful happening) of an event in one group of observations divided by the hazard of an event in another group. A HR of 1 means the risk is 1 × that of the second group, i.e. the same. A HR of 2 implies twice the risk.
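To make the coefficient-to-HR relationship concrete, here is a minimal sketch in Python. The coefficient values are purely illustrative assumptions, not figures from the article; the only real relationship shown is that the hazard ratio is the exponential of the Cox regression coefficient, so a positive coefficient gives HR > 1 (worse prognosis) and a negative one gives HR < 1 (better prognosis).

```python
import math

# Hypothetical Cox regression coefficients (illustrative values only,
# NOT taken from the Pediatric Cardiomyopathy Registry article).
coefficients = {
    "diagnosed_under_1_year": 0.69,   # positive -> higher hazard, worse prognosis
    "later_calendar_year": -0.22,     # negative -> lower hazard, better prognosis
}

for variable, beta in coefficients.items():
    hazard_ratio = math.exp(beta)  # HR = exp(coefficient)
    print(f"{variable}: beta = {beta:+.2f} -> HR = {hazard_ratio:.2f}")
```

A coefficient of +0.69 corresponds to a HR of about 2, i.e. roughly double the risk, while −0.22 corresponds to a HR of about 0.8, i.e. roughly a 20% lower risk.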
Kaplan–Meier estimate of the survivor function:
To determine the Kaplan–Meier estimate of the survivor function, a series of time intervals is formed. Each interval is constructed so that it contains one observed death, and the time of that death is taken to occur at the start of the interval. A plot of the Kaplan–Meier estimate of the survivor function is a step function, in which the estimated survival probabilities are constant between adjacent death times and decrease only at each death. Figure 1 of the said article depicts such a stepwise graph of the surviving subjects in the study up to five years, where the number of surviving subjects can be seen on the Y-axis of the graph.
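The step-down logic above can be sketched in a few lines of plain Python. The observations here are made-up illustrative data, not the registry data: each subject contributes a follow-up time and a flag saying whether a death was observed or the subject was censored (lost to follow-up or still alive at the end of observation). Censored subjects leave the risk set without causing a step down.

```python
# Minimal Kaplan-Meier estimator (illustrative data, NOT the registry data).
# Each observation is (time, event): event=True means an observed death,
# event=False means the subject was censored at that time.
observations = [(1, True), (2, False), (3, True), (4, True), (5, False), (6, True)]

def kaplan_meier(obs):
    obs = sorted(obs)           # process subjects in order of follow-up time
    at_risk = len(obs)          # everyone starts in the risk set
    survival = 1.0
    curve = []                  # (death time, estimated survival probability)
    for time, event in obs:
        if event:
            # the survival curve steps down only at an observed death
            survival *= (at_risk - 1) / at_risk
            curve.append((time, survival))
        at_risk -= 1            # deaths and censored subjects both leave the risk set
    return curve

for time, prob in kaplan_meier(observations):
    print(f"t = {time}: S(t) = {prob:.3f}")
```

Plotting `curve` as a step function reproduces the staircase shape seen in Figure 1: flat between death times, dropping at each death.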

I have tried to simplify these statistical tests, probably a bit too much. The intricacies of a test lie in deciding where and when to use it, and this narration cannot dwell on those details. Secondly, the calculations are quite laborious, but thankfully suitable software packages are available to carry out these tests; a few examples are Epi 6, SPSS and SAS. A researcher should have the statistical tests decided before embarking on a study, so it is prudent to discuss them with a biostatistician beforehand.

So much for now. 

That is Dr Ravi Ramamurthy for you! If you have any doubts or applause, please email them or write a comment in the comments box below. They will be honored!

