By Milan Vodicka, Ph.D.
These days, in August 2020, the “news” is full of arguments that rest on statistics and polling (which itself uses statistics). As usual, driven by US cultural trends of knowledge deprivation and ignorance, many of these arguments rest on fallacies that come from not understanding what statistics and polling are and how they work.
The questions: Are the published Covid-19 statistics for the US and the world (number of cases, number of hospitalizations, number of deaths) trustworthy? Does the published polling for the upcoming elections reflect the true situation? Will the 2020 US Census count be reliable? Does anyone simply lie about the associated statistics and use them to “prove” his or her point?
Let us start with Statistics 101. Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.” Joseph Stalin: “A single death is a tragedy, a million deaths is a statistic.” Andrew Lang: “Most people use statistics like a drunk man uses a lamppost; more for support than illumination.”
You can Google more. Or, interpret and learn by yourself what is transpiring in the public discourse, in the nation and around you.
I cannot resist paraphrasing one quote from Roger Jones. The original is: “I guess I think of lotteries as a tax on the mathematically challenged.” Now, it could be: “I can think of Covid-19 as a science lesson for the logically challenged.”
Put through the mill of logic, exposing oneself to a potential infection can turn out to be a fatal error. Such an error is “statistically significant.” In fact, it can literally turn one into a statistic. No excuses or pretzel-twisted apologies work. Reality asserts itself. The consequences instill the discipline of “no logic errors.”
Statistics is a branch of mathematics that is, paradoxically, “precise in its ambiguity.” It deals with “probability,” not “certainty,” yet in a precise way. Here is the first stone to trip on: many people are looking for certainty. It is not there, and therefore, according to them, what is there is worthless. This faulty logic leads to, “it is not certain that by not wearing a mask or by not keeping social distance I shall get infected by Covid-19, therefore it makes no sense to adhere to these measures.” We all witness the outcomes of such “logic.”
Another fundamental stone to trip on is the relationship between a “sample” and its associated “population.” Statistics, and polling in particular, infer a probable, not certain, characteristic or trait of the entire population from a sample, not from the entirety of that population. Polling, the jury selection process, and vaccine testing are obvious examples of sampling and of inference from samples.
The selection of a sample, where one is involved, is crucial to the reliability of the statistical findings. Both the size of the sample and how well its profile “represents the population” matter. Clearly, if, for example, the sample for estimating the average family income consisted of the Walmart and Bill Gates families, the result would be vastly different than if the sample families were yours and mine.
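The income example can be put into numbers with a minimal Python sketch. All of the income figures below are made up purely for illustration (they are assumptions, not real data), and the function name `average_income` is hypothetical.

```python
import statistics

# Hypothetical annual family incomes in dollars (illustrative, not real data).
ordinary_families = [48_000, 55_000, 62_000, 70_000, 81_000]
billionaire_families = [2_000_000_000, 4_000_000_000]


def average_income(sample):
    """Return the arithmetic mean income of the sampled families."""
    return statistics.mean(sample)


# A sample of "yours and mine" versus a sample of billionaires only.
ordinary_average = average_income(ordinary_families)
billionaire_average = average_income(billionaire_families)

# Even two billionaire families mixed into the ordinary sample dominate the mean.
mixed_average = average_income(ordinary_families + billionaire_families)

print(f"Ordinary families only:    ${ordinary_average:,.0f}")
print(f"Billionaire families only: ${billionaire_average:,.0f}")
print(f"Mixed sample:              ${mixed_average:,.0f}")
```

The same calculation gives wildly different “average family incomes” depending only on who ended up in the sample, which is the point about representing the population.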
Now for the statistical errors. Let us use “voting” to illustrate. Sampling bias: the sample is not representative of the population (only Republicans or only Democrats are sampled). Sampling imprecision (due to sample size). Testing bias (only certain questions are asked, or something untrue is presented as true, or vice versa). Measurement bias (only a certain methodology or terminology is used). Reporting bias (failure to report).
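How much sample size alone affects a poll can be sketched with the standard margin-of-error formula for a proportion, z * sqrt(p * (1 - p) / n). The sketch below assumes a simple random sample and a 95% confidence level (z = 1.96); real polls use more elaborate designs and weighting, so treat it as an illustration only. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sampled proportion.

    p: observed share of respondents (0.5 is the worst case)
    n: number of respondents, assumed to be a simple random sample
    z: critical value; 1.96 corresponds to roughly 95% confidence
    """
    return z * math.sqrt(p * (1 - p) / n)


# A candidate polling at 50% with progressively larger samples.
for respondents in (100, 1_000, 10_000):
    moe = margin_of_error(0.5, respondents)
    print(f"{respondents:>6} respondents: +/- {100 * moe:.1f} percentage points")
```

Under these assumptions the uncertainty shrinks from roughly ten points at 100 respondents to about one point at 10,000. A larger sample reduces imprecision, but it does nothing to cure the biases listed above.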
Bottom line: Statistics are not infallible. They are a tool, to be used for a certain purpose. A lot of people use them in that way – indeed to the point of justifying the unjustifiable.
Any polling or statistical endeavor is invariably tied to its context. The notion “the more Covid-19 testing we do, the more Covid-19 infections we have” falls on its face. Yes, if we had zero positive tests, we would have zero infections, but only relative to those test numbers. There would still be hospital admissions, symptom diagnoses, and deaths, each counted relative to its own occurrences. That is the context.
Interpreting statistics, translating them into probabilities and risk taking, is the responsibility of each one of us. However, how each one of us does that “may or may not” (statistically speaking) affect oneself and others.
The trustworthiness of statistics depends on avoiding these errors, which means adhering to mathematics and science. “Statistics are no substitute for judgement.” (Henry Clay, Sr.)