How To Spot Bad Science: What To Watch Out For When You Read Scientific Studies | Healthy Living articles | Well Being center

How many times have you taken a news report that claims that "studies show..." at face-value? How many times have you quickly looked through an abstract and simply concluded that the study must have reached the results the authors say it did — because the study was, after all, in a peer-reviewed journal?

Research studies tend to offer more reliable evidence than, say, first-hand experience ("I had the flu and survived, so I know influenza sucks but isn't serious"), being a direct witness ("My neighbor was diagnosed with depression but he's fine on medication, so depression doesn't have to ruin your life"), or third-hand information ("My sister knows someone who managed to treat cancer homeopathically, so it's possible").

Studies are not, however, always free from methodological errors that make their conclusions unreliable. Studies can be poorly designed, their method of data collection may be flawed, the manner in which the data is analyzed can leave much to be desired, or other studies may have reached opposing conclusions.

And that's just the start — studies themselves can be bad, but the way they're talked about and reported in the news can be very bad, too! Sometimes the problem isn't with the study itself at all, but rather the way people read it or perhaps didn't read it.

What should you watch out for? Here, a doctor answers common questions laypeople have about reading scientific studies.

Isn't it enough for me to just read the abstract? The most important results are right there.

The abstract is there to give you a very brief overview of the study. Although it should correctly sum up all the chapters of the study, an abstract is not comprehensive enough and focuses only on the key points of each chapter. Therefore, some of the facts that are most relevant for you may be left out. This is especially important if you want to find out more about an over-the-counter drug, for which you don’t need a prescription.

For example, in this article, the abstract focuses on the benefits of aspirin for the treatment and prevention of cardiovascular disease, without going deeper into the usage details. If you read the entire article, you will see that there is a whole chapter (Chapter 6) referencing all the adverse effects of aspirin, some of which may be very serious and relevant to you.

I'm not very good with math, but studies often have lots of it. What can I do to understand the results? What do I need to know about error margins?

There is a lot of math and statistics in scientific studies, and most of it is contained in the "Results" section. This is the section where the study results are presented in absolute numbers, percentages, and statistical significance levels. The level of statistical significance is the only value you should take into account if you are not very familiar with science math principles.

For example, this study compared weight loss (and other parameters) in obese and overweight individuals using two dietary patterns: IER and CER. If you look at the mean values for weight loss, you will see that people from the IER group lost more weight than people from the CER group (6.4Kg vs 5.6Kg), which would suggest that ICR was a better method. However, this conclusion cannot be drawn simply based on these mean values. Other parameters matter — such as the number of participants, their distribution among the two groups, range of values, and much more.

Statistics take into account all those parameters and, based on them, chooses the best method to run the statistical analysis. In this case, although the IER group's mean value was better, statistical analysis has shown that the difference between the groups was not statistically significant. To you, this would mean that both groups had similar results and that IER and CER methods had similar weight loss outcomes.

In medical research, the level of statistical significance (p) is the value you are looking for and it is interpreted like this:

p < 0.10 – not significant

0.05 ≤ p ≤ 0.10 – suggestive

0.01 ≤ p < 0.05 – significant

0.001 ≤ p < 0.01 – highly significant

P < 0.001 – very highly significant

In the obesity study, the level of statistical significance was p=0.4, which means that the difference between the groups was not significant.

What about studies with contradictory results, or studies that different people analyze in different ways?

You should not miss the main point of the scientific research of this type. It is not performed to give you — as an individual — direct recommendations on what to do. It is performed in order to raise awareness around a subject, so that other researchers become interested and involved. Eventually, when there is enough data, we will be able to make recommendations for widespread use.

That is how we came to many clear conclusions over a long period of research, and most of them are familiar to you. Some obvious examples are:

Tobacco smoke is harmful for human health.
Aspirin is not safe for use in children.
Statins used in patients with hypercholesterolemia can cause liver damage.

How do I know what sample size is sufficient? How can I tell if the sample wasn't representative?

Choosing the appropriate sample size is a complicated subject and it depends on the research objectives. Statisticians can use complex methods to determine what is the necessary sample size for a particular case. Researchers can also rely on experience with previously conducted studies and use them as a reference to determine the appropriate sample size. For example, when investigating the prevalence of a rare disease, such as hemophilia, we need a much larger size than for more frequent diseases, such as gastric ulcers.

We have a similar situation if we want to find out whether some drug is efficient for the treatment of a particular condition. A smaller size is adequate to study an antibiotic drug used in the treatment of infection caused by a known bacteria. An antihypertensive drug used in the treatment of essential hypertension — which is a disease of unknown origin and with very variable clinical presentation and response to therapy — requires a much larger sample size.

Not having enough participants does not mean that the study is not valid! In a good study, if a researcher doesn’t have a large enough sample, they will make a note about it in the sub-section of the article often called “limitations of the study”. It is usually placed at the bottom of the discussion, before or after the conclusion section, and it often suggests that the study should be only taken as a foundation for conducting further research with larger sample size.

Can studies conducted specifically in order to reach certain conclusions, for commercial reasons for instance, ever reach scientifically accurate results?

Studies are always conducted in order to confirm or reject a certain hypothesis. In that sense, any study, conducted for any reason, should be considered valid if it follows proven methodology of scientific research. There is a process called “critical appraisal”, which is conducted in order to investigate all the aspects of the study and approve or disapprove its validity. “Peer review” is another way of proving the validity of the study, and it is an evaluation of the study performed by people dealing with the same or similar problems in the same scientific field. Peer reviewed journals contain peer reviewed articles, and many of them require critical appraisal to be conducted before the study can be published.

There is a large number of biased studies conducted for commercial and even political reasons, but the process of critical appraisal is taking that into consideration as well.

In conclusion, as far as science is concerned, there are tools to examin the validity of any study. The question is whether they have been used or not in a particular case. There is no place for personal opinion when it comes to scientific research. If you want to find relevant studies, it is better that you give up your personal opinion on the subject in question too, and to start looking for studies that have been thoroughly evaluated.

Why doesn't correlation equal causation? Surely, if a very large sample shows that breastfed babies have larger IQs later in life, that's got to have something to do with breastfeeding?

The simple fact that two variables are in correlation does not necessarily mean that there is a causal connection between them. Correlation between breastfeeding and higher IQ is a perfect example of how confounding factors can trick you, at a first glance, into thinking that breastfeeding makes babies smarter.

Confounding factors are variables which are bound to one of the variables that we try to correlate, thus making the correlation irrelevant. In this case, it didn’t take scientists a long time to find out why some studies had come up with this correlation. It has been found that mothers of breastfed babies were more likely to have a higher socioeconomic status, more likely to have a partner in the household, and less likely to be very young or smoking during pregnancy. These confounding factors are more likely to affect a child’s IQ than breastfeeding itself.

Why is a double-blind study considered the gold standard? Does that mean that studies which cannot be conducted in this way cannot be of good quality?

A double-blind study is a study in which neither the researchers nor the participants know who is in the experimental group and who is in the control group. This concept is good to follow whenever possible, because it cuts the probability of bias to a minimum. Of course, it cannot be applied in all studies and it is not necessary in order for the study to be perfectly valid. Usage of automated tools (software tests, etc.) to analyze results can help to prevent subjective opinion and influence of researchers on study results in cases in which the double-blind study design is not possible.

Couldn't find what you looking for?

TRY OUR SEARCH!