The P-value: A Beginner’s Guide for Medical and Public Health Researchers

Nov 22, 2024 | Epidemiology

If you have ever taken a course in statistics or read a scientific article, you have almost certainly come across the p-value. It is an unavoidable and very commonly used concept in research, especially in fields like medicine and public health. But what exactly does it mean? Why is it important? And how can you use it in your research? These are questions every beginning researcher needs to be able to answer, as the p-value plays a key role in making decisions based on data.

This guide is meant to demystify the p-value, breaking down its purpose, how to interpret it, and its advantages and limits. By the end, you should feel confident in your ability to not only understand the p-value but also to use it effectively in your research.

So, what is the p-value?

Essentially, the p-value helps us answer one important question: if there were no true effect (i.e., the null hypothesis is true), how likely is it that I would see an effect at least as large as the one observed in my study just by chance? Think of it this way: suppose you’re studying whether a new drug improves patient outcomes compared to the standard treatment. If there truly were no difference between the two (this is what we call the null hypothesis), the p-value tells you how likely it is that a difference at least as large as the one you observed would arise from random noise alone. A smaller p-value means it is less likely that chance alone explains what you observed, suggesting there may be a genuine effect.

Here’s a more formal definition: the p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. The lower the p-value, the less compatible your data are with the null hypothesis.
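
To make the definition concrete, here is a minimal sketch in Python (the test statistic of 2.1 is a hypothetical number, not taken from any study mentioned in this post): the two-sided p-value is simply the probability, under the null hypothesis, of a standardized test statistic at least as extreme as the one observed.

```python
# A minimal sketch (hypothetical numbers, not from any study in this post):
# the two-sided p-value for an observed standardized test statistic.
from scipy import stats

z_observed = 2.1  # hypothetical test statistic comparing a new drug to standard care

# Probability of a result at least this extreme, in either direction,
# if the null hypothesis of "no difference" were true.
p_value = 2 * stats.norm.sf(abs(z_observed))
print(f"p = {p_value:.3f}")  # about 0.036, i.e., fairly incompatible with the null
```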

Understanding Null Hypothesis Significance Testing (NHST)

To understand the p-value better, you need to understand what a hypothesis is and what a hypothesis test is all about. In research, a hypothesis is a formal and testable statement or claim about the relationship between variables (or about an observable phenomenon). For example, we can hypothesize that there are more women in the world than men. This is a claim that is testable.

Null Hypothesis Significance Testing (NHST), also known as Statistical Hypothesis Testing (SHT), is a fundamental approach in statistical inference used to make decisions about parameters in the broader population based on a sample of data from it. This method involves formulating and testing two competing hypotheses: the null hypothesis and the alternative hypothesis.

  • Null Hypothesis (H₀): A statement of no effect, no difference, no association, or no change in your study; think of it as the “status quo”. For example, if you’re testing whether a new treatment is better than the current one, the null hypothesis would claim they are equally effective. It is the hypothesis that researchers try to reject or disprove, and it is tested statistically under the assumption that it is true.
  • Alternative Hypothesis (H₁): This is the opposite of the null hypothesis. In our example, this would claim the new treatment is better than the current one. It typically represents the research hypothesis or the effect that researchers are looking for. It is accepted if there is sufficient evidence to reject the null hypothesis.

Your goal in NHST is to collect data and use statistical tests to decide whether to reject the null hypothesis in favor of the alternative. If your data provide enough evidence (i.e., if the p-value is low enough), you reject the null hypothesis and conclude that the data support a genuine effect.

How Do You Use the P-value in Hypothesis Testing?

Let’s walk through an example. Imagine you’re investigating whether smoking increases the risk of lung cancer.

  • The null hypothesis (H₀) would be: Smoking is not associated with lung cancer.
  • The alternative hypothesis (H₁) would be: Smoking is associated with lung cancer.

To test this hypothesis pair, you collect data from your study participants and run a statistical test (for an association between two categorical variables like these, typically a chi-squared test), and the software calculates a p-value for you. If the p-value is small (commonly judged against a threshold of 0.05), it suggests that the observed association between smoking and lung cancer is unlikely to be due to chance alone. You would then reject the null hypothesis, concluding that smoking is indeed associated with lung cancer. However, if the p-value is greater than 0.05, you wouldn’t have enough evidence to reject the null hypothesis. This doesn’t necessarily mean the null hypothesis is true; it just means your data aren’t strong enough to say otherwise. Findings yielding p-values below the cut-off (<0.05) are typically described as ‘statistically significant’, while findings with p-values equal to or larger than the cut-off are described as ‘not statistically significant’.
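
As an illustration, the sketch below applies a chi-squared test to a made-up 2×2 table of smoking and lung cancer counts using Python’s scipy library. The counts are invented purely to show the mechanics of computing and interpreting a p-value, not taken from real data.

```python
# A minimal sketch with made-up counts (not real data): testing the
# smoking / lung cancer association from the example with a chi-squared test.
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = smokers / non-smokers,
# columns = lung cancer / no lung cancer.
table = np.array([[90, 910],
                  [30, 970]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")

alpha = 0.05
if p_value <= alpha:
    print("Reject H0: the data suggest an association with lung cancer.")
else:
    print("Fail to reject H0: insufficient evidence of an association.")
```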

Procedure of NHST

The NHST process typically involves these steps:

  1. Clearly formulate the null and alternative hypotheses, based on the research question.
  2. Choose a significance level (α, commonly 0.05 or 0.01), which is the Type I error rate you are willing to accept: the probability of rejecting the null hypothesis when it is actually true (a false positive). The smaller the α, the stricter the test, reducing the chance of a Type I error but increasing the chance of a Type II error.
  3. Choose an appropriate statistical test, based on your study design, data type, and research question. Be mindful of test assumptions (e.g., approximately normal data for t-tests), decide whether a one-tailed or two-tailed test fits your hypothesis, and select statistical software to compute it.
  4. Compute the test statistic and p-value, ensuring proper data handling and cleaning beforehand.
  5. Compare the p-value to the significance level and make a decision using the rule for rejecting H₀:

If p ≤ α, reject the null hypothesis and accept the alternative hypothesis.

If p > α, fail to reject the null hypothesis.

Remember, failing to reject does not prove the null hypothesis is true; it only suggests the data do not strongly contradict it. Consider the practical significance of your results alongside the statistical significance, and interpret both in the context of the research question.
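
Putting the steps together, here is a minimal Python sketch of the full procedure using simulated outcome scores for the new-treatment-versus-standard example from earlier. The group means, spreads, and sample sizes are assumptions chosen only for illustration.

```python
# A minimal sketch of the NHST steps above, using simulated (hypothetical)
# outcome scores for a new treatment vs. the standard treatment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Steps 1-2: H0 = "no difference in mean outcome"; alpha = 0.05 (two-tailed).
alpha = 0.05

# Hypothetical data: outcome scores for 40 patients per arm.
standard = rng.normal(loc=50, scale=10, size=40)
new_drug = rng.normal(loc=56, scale=10, size=40)

# Steps 3-4: an independent-samples t-test (assumes roughly normal outcomes).
t_stat, p_value = stats.ttest_ind(new_drug, standard)

# Step 5: compare p to alpha and decide.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value <= alpha:
    print("Reject H0: the data are not compatible with 'no difference'.")
else:
    print("Fail to reject H0: the data do not contradict 'no difference'.")
```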

Interpreting P-values in Research

The conventional threshold for p-values in most fields is 0.05. This means that if your p-value is less than or equal to 0.05, the results are considered statistically significant. In other words, you’re saying there’s only a 5% (or lower) chance of seeing results at least this extreme if the null hypothesis were true.

  • A p-value of 0.05 means there’s a 5% probability of observing your data, or something more extreme, if the null hypothesis is true.
  • A p-value of 0.01 means there’s only a 1% chance of seeing such extreme data under the null hypothesis.

On the other hand, if the p-value is greater than 0.05, it suggests that the observed results are not unusual under the null hypothesis. This doesn’t mean the null hypothesis is true, but it does mean that you lack strong evidence to reject it.

How to Report P-values

There are a couple of ways to report p-values in your research:

  1. Threshold reporting: You can report the p-value relative to a cut-off, for example “p > 0.05” or “p < 0.01.”
  2. Exact reporting: You can also report the exact p-value, to two or three decimal places, such as “p = 0.023.” If the p-value is less than 0.001, it is conventional to simply write “p < 0.001” (see the small sketch after this list).
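
If you prepare results programmatically, a small helper like the hypothetical format_p function below can apply these conventions consistently. Journals differ in their exact requirements, so treat it as a sketch rather than a standard.

```python
# A small hypothetical helper reflecting the conventions above; journal
# requirements vary, so treat this as a sketch rather than a standard.
def format_p(p: float, decimals: int = 3) -> str:
    """Report an exact p-value, or 'p < 0.001' when it is very small."""
    if p < 0.001:
        return "p < 0.001"
    return f"p = {p:.{decimals}f}"

print(format_p(0.0234))    # p = 0.023
print(format_p(0.000042))  # p < 0.001
```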

Figure 1 below depicts one way to present p-values: they are denoted with asterisks, and the corresponding thresholds are given in the table footer. In other words, the manuscript does not print the p-values themselves in the table; instead, it adds asterisks to the corresponding odds ratios (95% CI) and specifies in the table legend which thresholds count as statistically significant, helping readers judge the strength of the evidence without crowding the main table with extra numbers.

Figure 1: The results section of a manuscript (Predictors and Consequences of HIV Status Disclosure in Adolescents Living with HIV in Eastern Cape, South Africa: A Prospective Cohort Study) presenting p-value significance as table footnotes

Meanwhile, in Figure 2, p-values are presented directly after the OR and the 95% CI. This is one of the most common presentation styles: the reader can directly see all the statistical tests performed for each baseline characteristic and observe their significance across the row. Here, the statistical significance for each variable appears on the same line as its estimate.

Figure 2: The results section of a manuscript (Hypertension among People living with HIV/AIDS in Cameroon: A Cross-sectional Analysis from Central Africa International Epidemiology Databases to Evaluate AIDS) presenting p-value significance in the main table.

Advantages of using the p-value in medical research

Why are p-values so popular in research? Here are some reasons:

  • Simplicity: P-values are easy to compute and interpret.
  • Decision-making tool: They help determine whether results are likely due to chance, guiding researchers on whether to pursue a hypothesis further.
  • Versatility: P-values can be used with a wide range of statistical tests and models, making them applicable across different research designs.
  • Continuous measure of evidence: Unlike a binary decision (reject or fail to reject the null hypothesis), the p-value gives a graded measure of evidence against it. For example, a p-value of 0.001 suggests stronger evidence against the null hypothesis than a p-value of 0.04. This continuous scale allows researchers to gauge the strength of the evidence instead of making a strict yes/no call.
  • Control of false positives: When used correctly, p-values help control the rate of false positive results (Type I errors). If a researcher sets the threshold at 0.05, there is a 5% chance of concluding there is an effect when the null hypothesis is actually true. This limits false positives and helps maintain the integrity of scientific findings (see the simulation sketch after this list).
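
The last point is easy to check by simulation. In the sketch below, two groups are repeatedly drawn from the same distribution, so the null hypothesis is true by construction, and roughly 5% of the tests come out ‘significant’ at α = 0.05, as expected.

```python
# A minimal simulation sketch: when the null hypothesis is truly correct,
# a threshold of alpha = 0.05 flags roughly 5% of studies as "significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_studies = 10_000
false_positives = 0

for _ in range(n_studies):
    # Two groups drawn from the SAME distribution: H0 is true by construction.
    a = rng.normal(loc=0, scale=1, size=30)
    b = rng.normal(loc=0, scale=1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p <= alpha:
        false_positives += 1

print(f"False positive rate: {false_positives / n_studies:.3f}")  # close to 0.05
```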

Limitations of P-values

Despite their usefulness, p-values are not without their shortcomings:

  • A small p-value doesn’t “prove” the alternative hypothesis is true. It only indicates that the data are unlikely under the null hypothesis; it does not tell you why, or which explanation for the observed effect is correct.
  • P-values are sensitive to sample size: a small sample can yield a large p-value even when a real effect exists, while a very large sample can yield a tiny p-value for a trivially small effect (see the sketch after this list).
  • Researchers may misinterpret the p-value as the probability that the null hypothesis is true or false, which it is not; this misreading is compounded by the fact that p-values themselves often replicate poorly across studies.
  • They encourage p-hacking: researchers may be tempted to tweak analyses or data until p < 0.05, which can lead to false positive results and publication bias.
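
The sample-size point can be demonstrated directly. In the sketch below, the same assumed true difference (0.2 standard deviations, a value chosen purely for illustration) is tested at several sample sizes; small studies tend to return large p-values while very large studies return tiny ones.

```python
# A minimal sketch of the sample-size point above (hypothetical numbers):
# the same underlying difference yields very different p-values at different n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

for n in (15, 150, 1500):
    # True mean difference of 0.2 standard deviations in every case.
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.2, scale=1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    print(f"n per group = {n:5d}, p = {p:.4f}")

# Small samples often miss a real effect (large p); very large samples can
# make even a trivial effect "statistically significant".
```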

How to Address the Limitations of P-values

To get a more complete picture, complement p-values with other statistical measures. You should:

  • Report effect sizes along with p-values; knowing the magnitude of an effect is more meaningful than just knowing whether it exists.
  • Report confidence intervals, which give a range of plausible values for your estimate and help you gauge the precision of your results (see the sketch after this list).
  • Consider Bayesian methods, which offer an alternative approach that incorporates prior knowledge into the analysis.
  • Consider practical or clinical significance, not just statistical significance.
  • Pre-register studies (publish protocols and data analysis plans) to prevent p-hacking at the data analysis stage.
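
As an example, the sketch below reports an effect size (Cohen’s d) and a 95% confidence interval for the mean difference alongside the p-value for two hypothetical treatment groups. All of the data are simulated for illustration.

```python
# A minimal sketch of reporting an effect size and confidence interval
# alongside the p-value (hypothetical data, two-group comparison).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(loc=50, scale=10, size=60)   # standard treatment (hypothetical)
b = rng.normal(loc=55, scale=10, size=60)   # new treatment (hypothetical)

t_stat, p_value = stats.ttest_ind(b, a)

# Effect size: difference in means and Cohen's d (pooled SD, equal group sizes).
diff = b.mean() - a.mean()
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd

# 95% confidence interval for the mean difference (equal-variance t interval).
se = pooled_sd * np.sqrt(1 / len(a) + 1 / len(b))
t_crit = stats.t.ppf(0.975, df=len(a) + len(b) - 2)
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"p = {p_value:.4f}, difference = {diff:.1f}, d = {cohens_d:.2f}, "
      f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```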

Conclusion

The p-value remains a cornerstone of statistical inference in medical and public health research, offering a standardized measure to assess the strength of evidence against a null hypothesis. While it provides valuable insights, researchers must be aware of its limitations and potential for misinterpretation. By addressing the limitations of p-values through pre-registration of studies, transparent reporting, and a more holistic interpretation of results, researchers can improve the reliability and reproducibility of their work. To enhance the robustness of research findings, it’s crucial to complement p-values with effect sizes, confidence intervals, and, where appropriate, Bayesian methods. Additionally, researchers should focus on practical and clinical significance alongside statistical significance.


Authors

  • Gabriel Mabou

    Gabriel Mabou (MPH) leads the Ethics Unit at CRENC-IeDEA, specializing in study protocol preparation and submission to various ethics committees. He also serves as a Data Manager within the organization.

  • Dr Barche is a physician and holds a Master’s in Public Health. He is a senior fellow at CRENC with interests in Data Science and Data Analysis.

  • Dr Ebasone is a physician and PhD Candidate at the University of Cape Town. He is the Director of Research Operations at CRENC. He is charged with coordinating the International Epidemiology Databases to Evaluate AIDS (IeDEA) in Cameroon.
