Teaching Statistics Without Teaching Thinking - Reform is not optional. It is overdue

Dr. Aakash Kembhavi &

Vd. Ayudha Kembhavi, BAMS, Founder and Clinical Director, Aaranya Ayurveda and Holistic Wellness Center, Bengaluru.

National Statistics Day is observed in India on June 29 each year to honour the birth anniversary of Professor P.C. Mahalanobis — a man who understood, with rare clarity, that statistical thinking is not a technical luxury but a civilisational necessity. Mahalanobis did not merely teach Indians to calculate; he taught a newly independent nation to ask better questions about itself. He built institutions, not just methods. He insisted that numbers mean nothing without the reasoning structures that give them interpretive force. It is a bitter irony, then, that on the day we celebrate his legacy, the institutions entrusted with producing India’s next generation of healthcare researchers continue to teach statistics as a ritual of calculation — executed without comprehension, cited without understanding, and deployed without the critical thinking that Mahalanobis spent his life insisting was the whole point.

The crisis this essay documents is not a peripheral one. Across Ayurvedic medical colleges in India, biostatistics is taught as a procedural module — a set of decision rules and formula templates transmitted in the final year of training and promptly reduced, in practice, to the selection of a p-value threshold and the writing of a conclusion that declares victory or defeat on its basis. This is not statistical literacy. It is statistical mimicry. And the cost of that mimicry is borne not by the institutions that perpetuate it, not by the faculty who transmit it, and not by the examination boards that reward it — but by the patients whose care depends on research that cannot reliably distinguish a real effect from a statistical artefact, and by a tradition that deserves far better than the appearance of rigour in place of its substance. Reform is not optional. It is overdue.

There is a specific kind of examination answer that every biostatistics teacher in an Ayurvedic institution has seen so many times that it has become invisible — absorbed into the texture of normal academic performance without triggering the alarm it should.

The question asks: “A clinical study comparing two Ayurvedic formulations for Amavata reports a p-value of 0.03. What does this mean and what are the limitations of this conclusion?”

The answer writes itself: “A p-value of 0.03 means that the probability of obtaining the observed result by chance is 3%, which is less than the significance threshold of 0.05. Therefore the result is statistically significant, meaning the difference between the two formulations is real and not due to chance. The study proves that Formulation A is superior to Formulation B.”

Every sentence of this answer contains an error. The p-value does not measure the probability that the result occurred by chance. Statistical significance does not mean the difference is real in any clinically meaningful sense. And a single study, regardless of its p-value, proves nothing — it provides evidence of varying strength for a conclusion that remains probabilistic.

The student who wrote this answer has been taught statistics. They can define a p-value, identify a significance threshold, and apply a paired t-test to a dataset. They have passed their biostatistics examination. They are, by the metrics the system uses, statistically literate.

They are not. They are statistically fluent in a language they do not understand — able to reproduce the vocabulary and the procedures without possessing the conceptual framework that gives those procedures meaning. They have been taught the grammar of statistical reasoning without being taught to think statistically. And when they graduate — when they design dissertations, supervise research, review manuscripts, make clinical decisions based on published evidence — the gap between their statistical fluency and their statistical understanding will cost their patients, their students, and the discipline dearly.

This essay, written from two generational perspectives (that of a teacher who has watched this pattern for three decades and that of a recent graduate who lived through it), is about that gap. About why it exists, what it costs, and what genuine statistical education would require.

Part One: The Student’s Experience — Learning Statistics as a Foreign Language

Vd. Ayudha Kembhavi writes:

I learned biostatistics in my fifth year of BAMS. It arrived at the end of a curriculum that had spent four and a half years teaching me to think in an entirely different register — the register of classical Ayurvedic reasoning, where concepts are defined through their relationships within the Tridosha framework, where knowledge is validated by textual authority and clinical experience, and where the primary intellectual activity is interpretation rather than quantification.

Biostatistics arrived speaking a completely different language. Normal distribution. Standard deviation. Null hypothesis. Chi-square test. Confidence interval. The words were English — or an approximation of English dressed in mathematical notation — but they were no more immediately meaningful to me than Sanskrit shlokas had been on the first day of BAMS. They required translation. They required context. They required a framework of understanding within which they could be placed.

That framework was never provided. We were given definitions. We were shown formulas. We were walked through calculation procedures with worked examples. We were told when to use which test — “use Student’s t-test when comparing two means of normally distributed data; use Mann-Whitney U when the distribution is non-normal.” We memorized these decision rules. We reproduced them in examinations. We moved on.

What we were not given was the answer to the question that should have preceded every definition, every formula, and every decision rule: Why does this matter? Why does it matter whether data is normally distributed? Why does the sample size affect the reliability of conclusions? Why is a confidence interval more informative than a p-value? Why does randomization matter? Why does blinding matter? Why does a statistically significant result not necessarily mean a clinically meaningful one?

These “why” questions are not advanced biostatistics. They are the foundational questions that make every technical detail interpretable. Without them, the technical details are procedures without purposes — algorithms to be executed without understanding what they are for or what their outputs mean.

I am not describing an unusual educational experience. I am describing, based on conversations with peers across multiple institutions, the standard experience of biostatistics education in BAMS programs across India. The language is taught. The thinking is not.

Part Two: What Statistical Thinking Actually Is — And Is Not

Dr. Aakash Kembhavi writes:

Statistical thinking is not, at its foundation, mathematical. It is a specific orientation toward uncertainty — a way of asking and answering questions about the world that acknowledges, systematically and rigorously, that our observations are always incomplete, always variable, and always capable of misleading us if we do not account for the ways in which they can go wrong.

The statistician and writer Nate Silver describes statistical thinking as the discipline of knowing what you don’t know — of building explicit uncertainty into every conclusion so that the conclusion accurately represents the evidential situation rather than the conclusion you hoped the evidence would support. The physician and methodologist David Sackett describes it as the application of the best available evidence to the care of individual patients — which requires not merely the ability to find evidence but the ability to evaluate its quality and relevance.

Neither description mentions formulas. Neither mentions t-tests or chi-square distributions or standard errors. The mathematical machinery of statistics is the implementation of statistical thinking — the tools by which the orientation is operationalized. But the orientation must precede the tools, because tools without orientation are procedures without purpose, and procedures without purpose produce exactly what the examination answer in this essay’s opening illustrates: technically correct operations performed in the service of conceptually wrong conclusions.

What does statistical thinking require, concretely?

An understanding of variability as inherent, not incidental. The reason statistics is necessary in clinical research is that biological systems are variable — the same treatment applied to different patients produces different outcomes, and the same outcome measure applied to the same patient on different occasions produces different readings. This variability is not error to be eliminated. It is the fundamental reality that statistical methods are designed to characterize. A researcher who does not understand variability — who treats it as noise that gets in the way of the “true” result — has not grasped the foundational premise of the enterprise.

An understanding of the sampling relationship. Every clinical study is a sample from a population — a subset of observations used to make inferences about a larger reality. The reliability of those inferences depends on how the sample was drawn, how large it is, and how variable the underlying population is. This is the conceptual foundation of sample size calculations, confidence intervals, and statistical power — all of which make sense only when the sampling relationship is understood.
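
The sampling relationship can be made visible with a small simulation. The sketch below is illustrative only: the population mean of 50, the standard deviation of 10, and the function name sample_mean_spread are assumptions chosen for the example, not figures from any study. It repeats an imaginary study many times at two sample sizes and measures how much the study averages disagree with one another.

```python
import random
import statistics


def sample_mean_spread(population_mean, population_sd, n, repeats=2000, seed=0):
    """Return the standard deviation of the sample means obtained from
    `repeats` simulated studies, each enrolling `n` patients."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = [
        statistics.fmean(rng.gauss(population_mean, population_sd) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


# Hypothetical outcome score: population mean 50, standard deviation 10.
spread_small = sample_mean_spread(50.0, 10.0, n=10)
spread_large = sample_mean_spread(50.0, 10.0, n=160)

print(f"Spread of study means with n=10:  {spread_small:.2f}")
print(f"Spread of study means with n=160: {spread_large:.2f}")
```

Under these assumptions the spread of the study means shrinks roughly in proportion to the square root of the sample size, so a sixteen-fold increase in patients buys about a four-fold gain in precision. That is the intuition behind sample size calculations, confidence intervals, and power.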

An understanding of what a p-value actually is and is not. The p-value is the probability of obtaining a result as extreme as or more extreme than the observed result, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true. It is not the probability that the result occurred by chance. It is not a measure of the size of the effect. It is not a measure of the clinical importance of the finding. These misinterpretations — all of which appear in published Ayurvedic research — are not minor technical errors. They are fundamental conceptual confusions that render the conclusions built on them unreliable.
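
One way to see what the definition says is to compute a p-value by brute force. The following is a minimal sketch of a permutation test, assuming two small groups of invented scores; the function name permutation_p_value and the data are hypothetical. It asks: if the group labels were meaningless (the null hypothesis), how often would a random relabelling produce a difference at least as large as the one observed?

```python
import random
import statistics


def permutation_p_value(group_a, group_b, shuffles=10000, seed=1):
    """Approximate two-sided p-value for a difference in means.

    The null hypothesis is that group labels are irrelevant, so the
    pooled scores are reshuffled and relabelled many times."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (shuffles + 1)


# Invented scores for two hypothetical groups of eight patients each.
separated = permutation_p_value([1, 2, 3, 4, 5, 6, 7, 8],
                                [11, 12, 13, 14, 15, 16, 17, 18])
identical = permutation_p_value([5, 6, 7], [5, 6, 7])

print(f"Clearly separated groups: p = {separated:.4f}")
print(f"Identical groups:         p = {identical:.4f}")
```

The number returned is a statement about the data under the assumption of no difference. Nothing in the calculation estimates the probability that the null hypothesis itself is true, and nothing in it measures how large or how important the difference is.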

An understanding of the distinction between statistical and clinical significance. A result can be statistically significant (unlikely to have arisen from sampling variability alone if there were no true effect) and still be clinically meaningless, reflecting an effect too small to matter to patients. This distinction, which requires understanding both the p-value and the effect size, is one of the most consistently confused in clinical research worldwide and one of the most consistently absent from Ayurvedic research reporting.
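
A short calculation shows how the two kinds of significance come apart. The sketch below assumes a hypothetical 0-100 pain scale with a standard deviation of 15 and a true difference of a single point between groups; it uses a simple two-sample z-test with a known standard deviation, which is a simplification chosen for clarity rather than the analysis any particular study would use.

```python
import math
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized
    groups, assuming a known common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Same one-point difference on a 0-100 scale, two very different study sizes.
p_small_study = two_sample_z_p_value(mean_diff=1.0, sd=15.0, n_per_group=50)
p_large_study = two_sample_z_p_value(mean_diff=1.0, sd=15.0, n_per_group=5000)

print(f"50 patients per group:   p = {p_small_study:.3f}")
print(f"5000 patients per group: p = {p_large_study:.5f}")
```

The difference between the groups is identical in both runs. Only the sample size changes, and with it the p-value crosses the conventional threshold. Whether a one-point change matters to a patient is a clinical question that the p-value cannot answer.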

An understanding of uncertainty as information. A confidence interval, the range of values within which the true effect plausibly lies, is more informative than a p-value because it communicates the magnitude and direction of the effect along with the precision of the estimate. A 95% confidence interval that runs from a trivially small to a clinically substantial effect is telling us something important: that the study was too imprecise to determine whether the effect is important or trivial. A p-value of 0.04 attached to the same data tells us only that data this extreme would be unusual if there were no effect at all. It does not tell us whether the effect matters.
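
To illustrate, the sketch below computes a confidence interval from an estimate and its standard error using the normal approximation. The numbers (a 10-point improvement with a standard error of 4.8) and the function name normal_ci are assumptions invented for the example.

```python
from statistics import NormalDist


def normal_ci(estimate, standard_error, level=0.95):
    """Confidence interval for an estimate under the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical result: 10-point mean improvement, standard error 4.8.
low, high = normal_ci(10.0, 4.8)
p_value = 2.0 * (1.0 - NormalDist().cdf(10.0 / 4.8))

print(f"95% CI: {low:.1f} to {high:.1f} points")
print(f"Two-sided p-value: {p_value:.3f}")
```

Under these assumed numbers the result is "significant", yet the interval stretches from well under one point to more than nineteen. A reader shown only the p-value would not know that the data are compatible with both a negligible effect and a large one.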

None of these understandings require advanced mathematics. They require conceptual clarity — the patient, careful development of ideas that take time and teaching quality to transmit, and that the current biostatistics curriculum, oriented toward procedures rather than understanding, consistently fails to build.

Part Three: The Textbook Problem — Procedures Without Principles

Dr. Aakash Kembhavi writes:

The biostatistics textbooks prescribed in most Ayurvedic postgraduate programs are not, in the main, designed to develop statistical thinking. They are designed to convey statistical procedures — the decision trees, the calculation methods, the tables of critical values, the worked examples. This is understandable: statistical procedures are teachable in a structured, sequential way; statistical thinking is harder to sequence and harder to examine.

But a textbook organized around procedures teaches students to ask “which test should I use?” before they have learned to ask “what question am I trying to answer?” and “what are the assumptions my analysis requires?” The procedural orientation produces exactly the kind of statistical fluency without understanding that the examination answer in this essay’s opening illustrates — the ability to identify the right test and execute it correctly in the service of a conceptually wrong conclusion.

The internationally recognized biostatistics texts that develop statistical thinking alongside statistical procedure — Altman’s Practical Statistics for Medical Research, Bland’s An Introduction to Medical Statistics, Greenhalgh’s How to Read a Paper — are rarely prescribed in Ayurvedic postgraduate programs. They are written for medical researchers rather than statisticians, they prioritize conceptual understanding alongside technical instruction, and they explicitly connect statistical reasoning to clinical decision-making in ways that make the relevance of every technical concept immediately clear.

Their absence from Ayurvedic biostatistics curricula is not a minor bibliographic omission. It is a reflection of a curriculum design that prioritizes the appearance of statistical literacy — the ability to cite tests, reproduce formulas, and calculate results — over the substance of it.

Part Four: The Faculty Problem — Teaching What Was Never Learned

Dr. Aakash Kembhavi writes:

The biostatistics teaching deficit in Ayurvedic institutions is not primarily a curriculum problem. It is a faculty problem — and it is a faculty problem for exactly the same reason that the dissertation methodology deficit described in Essay 9 exists: the faculty teaching biostatistics were themselves taught biostatistics as procedure rather than as thinking, and they are transmitting what they received.

The Ayurvedic faculty member teaching biostatistics in the fifth year is typically a senior clinician in Kayachikitsa, Swasthavritta, or Community Medicine whose statistical training was the same compressed, procedure-oriented module that they are now teaching. They can define the tests, demonstrate the calculations, and grade the examination answers. They cannot, in most cases, explain why a confidence interval is more informative than a p-value, demonstrate how a sample size calculation is connected to the clinical question it is designed to answer, or model the process of reading a published trial with the critical awareness that genuine statistical literacy provides.

This is not a critique of any individual faculty member. It is a description of a transmission failure that operates across generations and that requires systematic intervention — not the replacement of existing faculty with statisticians, but the sustained professional development of existing faculty in statistical thinking alongside statistical procedure.

The resources for this development exist. The international clinical research community has produced a wealth of accessible, non-mathematically intimidating introductions to statistical thinking for clinicians: the BMJ’s Statistics Notes series, the Users’ Guides to the Medical Literature series, the EQUATOR network’s methodological resources. Online courses in clinical biostatistics from Johns Hopkins, Edinburgh, and Toronto are freely available. The Cochrane Collaboration’s training resources are explicitly designed for clinicians with limited statistical backgrounds.

None of these are being systematically deployed in Ayurvedic faculty development programs. The professional development infrastructure that would translate them into institutional capacity — regular faculty biostatistics workshops, journal club programs with methodological focus, collaborative relationships with biostatistics departments of medical universities — is almost entirely absent.

Part Five: The Research Committee as Statistical Illiteracy’s Last Defense

Dr. Aakash Kembhavi writes:

If there is a single institutional moment at which statistical illiteracy could most efficiently be intercepted, it is the research committee review of dissertation protocols — the point at which a proposed study is evaluated before data collection begins. A research committee that asks the right methodological questions at this stage can prevent underpowered, unvalidated, and analytically unsound studies from being conducted in the first place, rather than noting their limitations after the fact.

The right methodological questions, at the protocol stage, include: What is your sample size justification? What power calculation have you performed? What is your primary outcome measure and how has it been validated? What is your blinding strategy? How will you handle missing data? What is your analysis plan for the primary and secondary outcomes? These questions are not exotic. They are standard components of any competent protocol review in clinical research.
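
The first two questions can be answered with a few lines of arithmetic. The sketch below uses a common textbook approximation for comparing two group means, n per group = 2 × ((z for alpha + z for power) × SD / difference)², and should be read as an illustration under stated assumptions rather than a substitute for a statistician's advice. The standard deviation of 15 and the target differences are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sd, min_difference, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect `min_difference`
    between two means with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) * sd / min_difference) ** 2
    return math.ceil(n)  # round up: a fraction of a patient cannot be enrolled


# Hypothetical outcome with SD 15: detect a 10-point or a 5-point difference.
n_for_10_points = n_per_group(sd=15.0, min_difference=10.0)
n_for_5_points = n_per_group(sd=15.0, min_difference=5.0)

print(f"To detect 10 points: {n_for_10_points} per group")
print(f"To detect 5 points:  {n_for_5_points} per group")
```

Halving the difference worth detecting roughly quadruples the required sample. A protocol that proposes thirty patients per group should therefore be able to say which clinically meaningful difference that number was chosen to detect.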

In the typical Ayurvedic research committee meeting, these questions are not asked — not because the committee members are indifferent but because many of them do not have the statistical background to ask them with confidence or to evaluate the answers. The committee reviews the classical rationale, the formulation justification, the ethical considerations, and the logistics of data collection. The methodology chapter is reviewed for structural completeness rather than analytical soundness. The study is approved. The data is collected. The analysis reveals whatever it reveals. The conclusions are written to match the significant findings. The limitations section acknowledges what the study could not establish. The degree is awarded.

Every member of every Ayurvedic research committee should be required to complete a formal, assessed training program in clinical research methodology before they are authorized to review dissertation protocols. This is not an unreasonable standard. It is the minimum that the ethical responsibility of research oversight demands.

Part Six: What Statistical Education That Develops Thinking Would Look Like

Both authors write:

We offer this section from both our perspectives — the teacher who has seen what the current system produces and the graduate who experienced it from the inside — because the reform we are describing requires both the institutional authority that experience provides and the student’s-eye-view that only recent formation can supply.

Genuine statistical education in an Ayurvedic context would begin not with definitions but with questions. Not “what is a p-value?” but “How do you know that this treatment helped this patient rather than the patient improving on their own?” Not “what is a confidence interval?” but “If I tell you that in this study the treatment reduced pain scores on average, what else do you need to know before you decide to use it?” Not “when do you use a chi-square test?” but “What makes one study more trustworthy than another?”

These questions engage the student’s existing reasoning capacity (the capacity for clinical judgment, for pattern recognition, for causal reasoning) and connect it to the statistical framework rather than replacing it with an alien procedural system. They establish the relevance of every technical concept before the concept is introduced. They teach the student to read a research paper as a claim to be questioned rather than as a source of facts.

The curriculum would include structured journal club sessions beginning in the third year, in which students apply a simplified critical appraisal tool to a published Ayurvedic study. The tool would ask: How many patients? Was there a comparison group? How were outcomes measured? What did the statistics show? Does the conclusion match the evidence? These questions require no mathematical training. They require the application of common sense to the specific logic of clinical evidence — and they develop, through repeated practice, the orientation that statistical thinking requires.

The curriculum would include explicit teaching of the p-value’s limitations — not merely its definition but the documented history of its misinterpretation, the international movement toward effect size reporting and confidence intervals, and the reasons why many leading journals are moving away from significance thresholds as the primary criterion for publication. Students who understand why the p-value is problematic understand what it is far more deeply than students who can only define it correctly.

The curriculum would include the reading of retracted papers — studies that were published, found to be flawed, and subsequently withdrawn — as pedagogical tools. Nothing develops critical appraisal skills more effectively than the analysis of a study that went wrong: where was the flaw, when could it have been caught, and what would a careful review at each stage have identified? This is the statistical equivalent of the clinical case study, and it is absent from every Ayurvedic biostatistics curriculum we are aware of.

The examination would test statistical thinking rather than statistical procedure. Not “calculate the standard deviation of this dataset” but “a study reports a statistically significant result with p = 0.04 and an effect size (mean difference between groups) of 0.2 points on a scale of 0-100. The study enrolled 200 patients. Is this finding clinically important? Justify your answer.” The answer to this question requires understanding variability, effect size, clinical significance, and the relationship between sample size and statistical power. It does not require the ability to execute a formula.
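
One possible line of reasoning for such a question can be sketched numerically. The code below rests on assumptions that the question itself does not state: that the 0.2 is a raw difference in mean scores, that the 200 patients were split into two equal groups of 100, and that a simple z-test approximates the analysis. Under those assumptions it works backwards to the variability the data must have had.

```python
import math
from statistics import NormalDist


def implied_sd(mean_diff, p_value, n_per_group):
    """Common standard deviation implied by a two-sided p-value for a
    difference in means between two equal-sized groups (z-test)."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    standard_error = mean_diff / z
    return standard_error / math.sqrt(2.0 / n_per_group)


sd = implied_sd(mean_diff=0.2, p_value=0.04, n_per_group=100)
share_of_scale = 0.2 / 100  # the difference as a fraction of the full scale

print(f"Implied standard deviation: {sd:.2f} points")
print(f"Difference as share of scale: {share_of_scale:.1%}")
```

A student reasoning this way would notice that the difference is a fifth of one point on a hundred-point scale, and that the result reaches significance only because the scores varied very little. Neither observation suggests a change that a patient would feel.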

Conclusion: The Number That Means Nothing Until It Means Everything

The p-value is the most misunderstood number in clinical medicine. It has been the subject of an extraordinary volume of methodological commentary, a formal statement by the American Statistical Association calling for a fundamental change in how it is used and interpreted, and a growing international movement toward statistical approaches that better represent the uncertainty in clinical research findings.

In Ayurvedic research, it is cited in virtually every dissertation and published trial as the primary — often the only — criterion for the validity of conclusions. It is taught as a decision threshold. It is reported as a verdict. And it is almost never explained in terms of what it actually measures, what it does not measure, and what additional information is required before a clinical conclusion can be drawn from it.

This is the emblematic failure of statistical education in Ayurvedic institutions: the most important number in the research enterprise is the number that students can calculate without understanding, cite without interpreting, and deploy without knowing what it costs them when they get it wrong.

Statistical thinking is not a supplementary skill for researchers. It is the epistemological infrastructure of evidence-based practice. A clinician who cannot think statistically cannot evaluate the evidence for what they prescribe, cannot protect their patients from treatments that appear to work but do not, and cannot contribute to the collective knowledge enterprise that gives clinical medicine its self-correcting capacity.

Teaching the formula without the thinking is not statistical education. It is statistical theater — the performance of quantitative rigor in the service of qualitative certainty that the numbers were never designed to provide.

The thinking must come first. Everything else follows.

