How Scientists Massage Results With “P-Hacking”


The pursuit of science is geared towards searching for meaning in a maze of data. At least that’s how it’s supposed to work.

According to some reports, that facade began to crumble in 2010, when Cornell University social psychologist Daryl Bem published a 10-year analysis in the respected Journal of Personality and Social Psychology that used widely accepted statistical methods to show that extrasensory perception (ESP), essentially a “sixth sense,” was an observable phenomenon. Unable to replicate the paper’s findings, Bem’s colleagues were quick to blame what we now call “p-hacking”: massaging and over-analyzing your data in search of statistically significant (and publishable) results.
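
A toy simulation shows why this kind of searching works. The sketch below assumes data that are pure noise (no real effect exists anywhere) and a simple two-sided z-test with a known standard deviation of 1. Each simulated study measures several outcomes and counts as a “success” if any one of them reaches p < 0.05. The setup and the function names are illustrative assumptions, not taken from Bem’s study or any real trial.

```python
import math
import random


def two_sided_z_p(x, y):
    """Two-sided p-value for a difference in means (z-test), assuming both
    samples come from populations with a known standard deviation of 1."""
    diff = sum(x) / len(x) - sum(y) / len(y)
    z = diff / math.sqrt(1 / len(x) + 1 / len(y))
    return math.erfc(abs(z) / math.sqrt(2))


def hacked_study(n_outcomes, n_per_group, rng, alpha=0.05):
    """One simulated study of pure noise that measures several outcomes and
    counts as a "success" if ANY of them reaches p < alpha."""
    for _ in range(n_outcomes):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_z_p(treated, control) < alpha:
            return True  # stop at the first "significant" outcome
    return False


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=10, seed=1):
    """Fraction of pure-noise studies that report at least one significant outcome."""
    rng = random.Random(seed)
    hits = sum(hacked_study(n_outcomes, n_per_group, rng) for _ in range(n_studies))
    return hits / n_studies


def expected_rate(n_outcomes, alpha=0.05):
    """Chance that at least one of n independent null tests falls below alpha."""
    return 1 - (1 - alpha) ** n_outcomes


for k in (1, 5, 20):
    print(k, "outcomes:", round(false_positive_rate(k), 3), "simulated vs",
          round(expected_rate(k), 3), "expected")
```

With a single outcome, roughly 5 percent of the pure-noise studies come out “significant,” which is what the 0.05 threshold promises. With 20 outcomes to pick from, the expected share rises to about 64 percent, even though nothing real is going on.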

To support or refute a hypothesis, the goal is to establish statistical significance by obtaining a “p-value” of less than 0.05, explains Benjamin Baer, a postdoctoral researcher and statistician at the University of Rochester whose recent work addresses this problem. The “p” in p-value stands for probability: the p-value measures how likely it would be to see a result at least as extreme as the one observed if chance alone were at work, that is, if the null hypothesis were true.
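
To make that definition concrete, here is a minimal sketch of an exact p-value calculation for a simpler case: testing whether a coin is fair (the null hypothesis) after seeing a given number of heads. The coin example and the function name coin_p_value are illustrative assumptions, not drawn from Baer’s work.

```python
from math import comb


def coin_p_value(heads, flips):
    """Exact two-sided p-value for the null hypothesis "the coin is fair".

    Adds up the probabilities of every outcome that is at least as far
    from flips / 2 as the observed number of heads.
    """
    observed_gap = abs(2 * heads - flips)
    extreme = sum(comb(flips, x) for x in range(flips + 1)
                  if abs(2 * x - flips) >= observed_gap)
    return extreme / 2 ** flips


print(round(coin_p_value(15, 20), 4))  # 0.0414: below 0.05, "significant"
print(round(coin_p_value(14, 20), 4))  # 0.1153: not significant
```

A single extra head out of 20 flips moves the result across the 0.05 line, which hints at how little it can take to turn a “nonresult” into a publishable one.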

For example, if you wanted to test whether all roses were red, you would count the red roses and the roses of other colors in a sample and perform a hypothesis test to compare the values. If this test returns a p-value of less than 0.05, then you have statistically significant grounds to say that only red roses exist, even if evidence outside your flower sample suggests otherwise.

Misusing p-values to support the idea that ESP exists may be relatively benign, but when the practice is used in medical trials, it can have far deadlier results, Baer says. “I think the big risk is that the wrong decision is made,” he explains. “… based on what they should be.”

Baer was the first author of a paper published in late 2021 in the journal PNAS with his former Cornell mentor, statistics professor Martin Wells, that studied how a newer statistic could improve the use of p-values. The metric they examined is called the Fragility Index, and it is designed to complement and improve on p-values.

This measure describes how fragile a data set is when some of its data points flip from a positive to a negative result: for example, when a patient recorded as having benefited from a drug actually felt no effect. If changing just a few of these data points is enough to downgrade a result from statistically significant to not significant, the result is considered fragile.
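
For a trial summarized as a 2×2 table (events and non-events in two arms), the original recipe can be sketched as follows, assuming, as in Walsh and colleagues’ 2014 formulation, that significance is judged with a two-sided Fisher’s exact test. In the arm with fewer events, one patient at a time is switched from “no event” to “event” until p reaches 0.05 or more; the number of switches is the Fragility Index. This is a hedged sketch of that basic version, not Baer and Wells’ refined index, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are trial arms; columns are event / no event.
    """
    r1, r2 = a + b, c + d
    c1 = a + c
    n = r1 + r2

    # Unnormalized hypergeometric weight of the table whose top-left cell is x
    # (all margins fixed). Integer arithmetic keeps the comparisons exact.
    def weight(x):
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, c1)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Number of non-event -> event switches in the arm with fewer events needed
    to make p >= alpha (0 if the result is not significant to begin with)."""
    if events_1 > events_2:  # always modify the arm with fewer events
        events_1, n_1, events_2, n_2 = events_2, n_2, events_1, n_1
    for flips in range(n_1 - events_1 + 1):
        e = events_1 + flips
        if fisher_exact_two_sided(e, n_1 - e, events_2, n_2 - events_2) >= alpha:
            return flips
    return None  # significance never lost (arms too unbalanced for this rule)


print(fragility_index(0, 4, 4, 4))  # 1
```

In this made-up trial with four patients per arm, 0 treated patients and all 4 controls have the event (p ≈ 0.029). Recording just one treated patient as having the event pushes p to about 0.14, so the Fragility Index is 1: an extremely fragile result.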

In 2014, physician Michael Walsh and his colleagues first proposed the Fragility Index in the Journal of Clinical Epidemiology. In that article, they applied it to nearly 400 randomized controlled trials with statistically significant results and found that one in four had low Fragility Index scores, meaning their results may not be very reliable or robust.

However, the Fragility Index has yet to gain much momentum in medical studies. Some critics of the approach have emerged, such as the Mayo Clinic’s Rickey Carter, who says it is too similar to p-values without offering enough improvement. “The irony is that the Fragility Index was a p-hacking approach,” says Carter.

To improve the Fragility Index and answer earlier criticisms, Baer, Wells, and colleagues focused on two main changes: allowing only sufficiently likely modifications to the data, and generalizing the approach beyond binary 2×2 tables (which record positive or negative outcomes for the control and experimental groups).
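
The article does not spell out how Baer and Wells’ revised index is computed, so the sketch below is only a hypothetical illustration of the two ideas, not their method. It works on a 2×3 table (three outcome categories instead of two), judges significance with a Pearson chi-square test (an approximation that assumes reasonably large counts), and lets a patient’s outcome be changed only to a category that is reasonably common in the original data, a made-up plausibility rule with threshold q. It searches greedily, one change at a time, so the count it returns is an upper bound on the smallest number of allowed changes rather than an exact minimum.

```python
import math
from itertools import product


def chi2_p_value(table):
    """Pearson chi-square test of independence for a 2 x 3 table of counts.

    Closed-form chi-square tail probabilities exist for 1 and 2 degrees of
    freedom, so no statistics library is needed. Outcome columns that are
    empty in both groups are dropped. Assumes both groups are non-empty.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    cols = [k for k in range(3) if table[0][k] + table[1][k] > 0]
    stat = 0.0
    for g, k in product(range(2), cols):
        expected = row_totals[g] * (table[0][k] + table[1][k]) / n
        stat += (table[g][k] - expected) ** 2 / expected
    df = len(cols) - 1  # (2 rows - 1) * (non-empty columns - 1)
    if df == 2:
        return math.exp(-stat / 2)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    return 1.0  # a single outcome category: nothing to test


def greedy_fragility(table, q=0.1, alpha=0.05, max_steps=100):
    """HYPOTHETICAL greedy fragility search on a 2 x 3 outcome table.

    One modification moves one patient to a different outcome category.
    Made-up plausibility rule: the new category must account for at least a
    fraction q of all patients in the original data. Returns the number of
    modifications needed to reach p >= alpha, or None if none is found.
    """
    n = sum(map(sum, table))
    plausible = [k for k in range(3) if (table[0][k] + table[1][k]) / n >= q]
    current = [list(row) for row in table]
    for steps in range(max_steps + 1):
        if chi2_p_value(current) >= alpha:
            return steps
        best_p, best_table = -1.0, None
        for g, src, dst in product(range(2), range(3), plausible):
            if src == dst or current[g][src] == 0:
                continue
            candidate = [list(row) for row in current]
            candidate[g][src] -= 1
            candidate[g][dst] += 1
            p = chi2_p_value(candidate)
            if p > best_p:
                best_p, best_table = p, candidate
        if best_table is None:
            return None  # no allowed modification is left
        current = best_table  # keep the single change that raises p the most
    return None


outcomes = [[20, 5, 5],   # treatment: improved, unchanged, worse (made-up counts)
            [5, 5, 20]]   # control
print(round(chi2_p_value(outcomes), 6))  # 0.000123
print(greedy_fragility(outcomes))
```

Raising q makes fewer changes count as plausible, which is the spirit of restricting the index to “sufficiently likely” modifications; with q = 1.0 no change is allowed at all and the function returns None.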

Despite the uphill battle the Fragility Index has faced so far, Baer believes it is still a useful metric for medical statisticians and hopes the improvements in his recent work will help convince others of the same.

“Talking to the victim’s family after an operation went wrong is an entirely different [experience] from that of statisticians who sit at their desks and calculate,” says Baer.

