If a good research question equates to a story, then a roadmap is vital for good storytelling. We advise every student or researcher to write their own data analysis plan before seeking any advice. In this blog article, we explore how to create a data analysis plan: its content and structure.
A data analysis plan serves as a roadmap for how the collected data will be organised and analysed. A good plan:
- Clearly states the research objectives and hypotheses
- Identifies the dataset to be used
- Specifies the inclusion and exclusion criteria
- Clearly states the research variables
- States the statistical tests used to test the hypotheses and the software for statistical analysis
- Includes shell tables
1. Stating research question(s), objectives and hypotheses:
All research objectives or goals must be clearly stated. They must be Specific, Measurable, Attainable, Realistic and Time-bound (SMART). Hypotheses are testable statements derived from personal experience or previous literature. They lay the foundation for the statistical methods that will be applied to extrapolate results from the sample to the entire population.
2. The dataset:
The dataset that will be used for statistical analysis must be described and its important aspects outlined. These include: the owner of the dataset, how to get access to it, how it was checked for quality control, and the program in which it is stored (Excel, Epi Info, SQL, Microsoft Access, etc.).
3. The inclusion and exclusion criteria:
The inclusion and exclusion criteria determine which parts of the dataset will be used for data analysis. These criteria also guide the choice of variables included in the main analysis.
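As a minimal sketch of how such criteria can be written down unambiguously, the Python snippet below applies one inclusion and one exclusion criterion to a handful of made-up records. The field names (`age`, `consented`), the age threshold and the records themselves are assumptions for illustration only; they do not come from any real study.

```python
# Hypothetical records; in practice these would come from the study dataset.
records = [
    {"id": 1, "age": 34, "consented": True},
    {"id": 2, "age": 16, "consented": True},
    {"id": 3, "age": 52, "consented": False},
    {"id": 4, "age": 41, "consented": True},
]


def is_eligible(record):
    """Return True if the record meets the (assumed) criteria."""
    meets_inclusion = record["age"] >= 18       # inclusion: adults only
    meets_exclusion = not record["consented"]   # exclusion: no consent given
    return meets_inclusion and not meets_exclusion


# Split the dataset so that both groups can be reported in the plan.
analysis_set = [r for r in records if is_eligible(r)]
excluded = [r for r in records if not is_eligible(r)]

print(f"Included: {len(analysis_set)}, excluded: {len(excluded)}")
```

Keeping the excluded records in a separate list makes it easy to report how many participants were dropped and why.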
4. Variables:
Every variable collected in the study should be clearly stated. Variables should be presented by level of measurement (nominal, ordinal, interval or ratio) or by the role they play in the study (independent/predictor or dependent/outcome variables). The variable types should also be outlined, because the variable type, together with the research hypothesis, forms the basis for selecting the appropriate statistical tests for inferential statistics. A good data analysis plan should summarize the variables as demonstrated in Figure 1 below.
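One possible way to keep such a variable summary is a small data dictionary that records the role and level of measurement of each variable. The sketch below is only an illustration: the variable names are hypothetical, and the layout is an assumption rather than a reproduction of Figure 1.

```python
# Hypothetical variable summary for a study of a disease "X".
variables = [
    {"name": "disease_x", "role": "dependent", "level": "nominal"},
    {"name": "age_years", "role": "independent", "level": "ratio"},
    {"name": "sex", "role": "independent", "level": "nominal"},
    {"name": "education_level", "role": "independent", "level": "ordinal"},
]

ALLOWED_ROLES = {"dependent", "independent"}
ALLOWED_LEVELS = {"nominal", "ordinal", "interval", "ratio"}


def check_variables(variable_list):
    """Raise ValueError if any entry uses an unknown role or level."""
    for var in variable_list:
        if var["role"] not in ALLOWED_ROLES:
            raise ValueError(f"Unknown role for {var['name']}: {var['role']}")
        if var["level"] not in ALLOWED_LEVELS:
            raise ValueError(f"Unknown level for {var['name']}: {var['level']}")


def summary_lines(variable_list):
    """Return the summary as aligned plain-text lines (header first)."""
    header = f"{'Variable':<18}{'Role':<14}{'Level':<10}"
    rows = [
        f"{v['name']:<18}{v['role']:<14}{v['level']:<10}" for v in variable_list
    ]
    return [header] + rows


check_variables(variables)
print("\n".join(summary_lines(variables)))
```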
5. Statistical software:
Many software packages are available for data analysis; common examples are SPSS, Epi Info, SAS, STATA and Microsoft Excel. State the version number, year of release and author/manufacturer of the package you use. Beginners tend to try several packages and end up mastering none. It is better to select one and master it, because almost all statistical packages perform equally well for basic analyses and for most of the advanced analyses needed for a student thesis. This is what we recommend to all our students at CRENC before they begin writing their results section.
6. Selecting the appropriate statistical method to test hypotheses:
Depending on the research question, hypothesis and type of variable, several statistical methods can be used to answer the research question appropriately. This part of the data analysis plan explains clearly why each statistical method will be used to test the hypotheses. The level of statistical significance, that is, the threshold below which a p-value is considered significant (often, but not always, 0.05), should also be stated. Figures 2a and 2b present decision trees for some common statistical tests based on the variable type and research question.
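The kind of decision logic described here can be sketched as a small Python function. This is a simplified, commonly taught mapping for independent groups only. It is an assumption for illustration and does not reproduce Figures 2a and 2b; the function name, its parameters and the `ALPHA` constant are all hypothetical.

```python
ALPHA = 0.05  # assumed significance level; state the one used in your plan


def suggest_test(outcome, predictor, n_groups=2, normal=True):
    """Suggest a common test for independent observations.

    outcome, predictor: "categorical" or "continuous"
    n_groups: number of groups when the predictor is categorical
    normal: whether the continuous data are roughly normally distributed
    """
    if n_groups < 2:
        raise ValueError("At least two groups are needed for a comparison")
    if outcome == "categorical" and predictor == "categorical":
        return "Chi-square test (Fisher's exact test if expected counts are small)"
    if outcome == "continuous" and predictor == "categorical":
        if n_groups == 2:
            return "Independent-samples t-test" if normal else "Mann-Whitney U test"
        return "One-way ANOVA" if normal else "Kruskal-Wallis test"
    if outcome == "continuous" and predictor == "continuous":
        return "Pearson correlation" if normal else "Spearman rank correlation"
    raise ValueError("Combination not covered by this sketch")


def is_significant(p_value, alpha=ALPHA):
    """Compare a p-value with the pre-specified significance level."""
    return p_value < alpha


print(suggest_test("continuous", "categorical", n_groups=3, normal=False))
```

Writing the choice down as explicit rules, whether in code or in prose, forces the plan to justify each test before the data are analysed.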
A good analysis plan should also clearly describe how missing data will be handled.
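Before deciding how to deal with missing data, it helps to know how much is missing in each variable. The sketch below counts missing values per field in a few made-up records and lists the complete cases. The field names and values are hypothetical, and complete-case analysis appears only as one possible option, not as a recommendation from this article.

```python
# Hypothetical records in which None marks a missing value.
records = [
    {"age": 34, "sex": "F", "bmi": 22.1},
    {"age": None, "sex": "M", "bmi": 27.5},
    {"age": 52, "sex": None, "bmi": None},
    {"age": 41, "sex": "F", "bmi": None},
]


def missing_summary(rows):
    """Return {field: (number missing, percentage missing)}."""
    total = len(rows)
    summary = {}
    for field in rows[0].keys():
        n_missing = sum(1 for row in rows if row.get(field) is None)
        summary[field] = (n_missing, 100 * n_missing / total)
    return summary


# Complete cases: records with no missing value in any field.
complete_cases = [
    row for row in records if all(value is not None for value in row.values())
]

for field, (count, percent) in missing_summary(records).items():
    print(f"{field}: {count} missing ({percent:.0f}%)")
print(f"Complete cases: {len(complete_cases)} of {len(records)}")
```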
7. Creating shell tables:
Data analysis involves three levels of analysis, in increasing order of complexity: univariable, bivariable and multivariable analysis. Shell tables should be created in anticipation of the results that will be obtained from these levels of analysis. Read our blog article on how to present tables and figures for more details. Suppose you carry out a study to investigate the prevalence and associated factors of a certain disease “X” in a population; the shell tables can then be represented as in Tables 1, 2 and 3 below.
Table 1: Example of a shell table from univariate analysis
Table 2: Example of a shell table from bivariate analysis
Table 3: Example of a shell table from multivariate analysis
aOR = adjusted odds ratio
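A shell table is simply a results table with the headings filled in and the result cells left empty. As a rough sketch, the Python function below builds one as plain text. The row labels and the "95% CI" and "p-value" columns are assumptions for illustration; only the aOR column is taken from the table note above.

```python
def shell_table(title, columns, row_labels, placeholder="--"):
    """Return a plain-text table whose result cells are left blank."""
    first_width = max(len(columns[0]), max(len(label) for label in row_labels))
    other_widths = [max(len(col), len(placeholder)) for col in columns[1:]]
    widths = [first_width] + other_widths

    def format_row(cells):
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths))

    lines = [title, format_row(columns), "-+-".join("-" * w for w in widths)]
    for label in row_labels:
        # One placeholder per result column; values are filled in after analysis.
        lines.append(format_row([label] + [placeholder] * len(other_widths)))
    return "\n".join(lines)


table = shell_table(
    "Shell table: factors associated with disease X (multivariable analysis)",
    ["Variable", "aOR", "95% CI", "p-value"],
    ["Age (years)", "Sex (female vs male)", "Smoking status"],
)
print(table)
```

Preparing the empty table first makes it clear which estimates the analysis has to produce.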
Summary
Now that you have learned how to create a data analysis plan, here are the takeaway points. The plan should clearly state the:
- Research question, objectives, and hypotheses
- Dataset to be used
- Inclusion and exclusion criteria
- Variable types and their role
- Statistical software and statistical methods
- Shell tables for univariate, bivariate and multivariate analysis
Further readings
Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552232/pdf/cjhp-68-311.pdf
Creating an Analysis Plan: https://www.cdc.gov/globalhealth/healthprotection/fetp/training_modules/9/creating-analysis-plan_pw_final_09242013.pdf
Data Analysis Plan: https://www.statisticssolutions.com/dissertation-consulting-services/data-analysis-plan-2/