09: Test Flight
Overview
Today’s tutorial is a bit different - less tutorial and more supported practice. Instead of more new information, today’s session is an opportunity to try out the skills we’ve been working on.
We’ll look at a new dataset and work through the process of inspecting, cleaning, summarising, visualising, analysing, and reporting. Rather than solutions, this tutorial will support you to do these tasks yourself - but writing the code is up to you!
Setup
There’s no workbook this week, because we’re emulating the process of doing a data analysis from start to finish in R. Instead, create a new Quarto document to work in.
Make sure you start out by loading any necessary packages, minimally {tidyverse}.
Data
Today’s data comes from Body Positivity, but not for everyone (Simon & Hurst, 2021), which has publicly available data hosted on Figshare (h/t Dr Hurst for this!). You can certainly use the data directly from Figshare, but for the purposes of this tutorial, I’ve prepared a subset of the data that you can load in using the code below.
Read in from file:
bp_data <- here::here("data/bp_data.csv") |> readr::read_csv()
Read in from URL:
## readRDS() needs a connection, not a plain URL string, to read from the web
bp_data <- readRDS(url("https://raw.githubusercontent.com/drmankin/practicum/master/data/bp_data.rds"))
Codebooks
There are two separate codebooks for this dataset. The first describes the demographic and other single-item variables in the dataset. The second describes the variables that are items belonging to measures, and which items belong to what measures.
As in previous datasets we’ve used, I’ve introduced some changes to the data to mess it up a bit and give us opportunities to practice. If you want the real dataset, make sure you use the links above!
| Variable Name | Type | Item | Response Options |
|---|---|---|---|
| recordeddate | date | Recorded date | NA |
| id | character | Unique participant id number | Alphanumeric identifier (three letters, four numbers) |
| gender | factor | What gender do you identify with? | Female, Male |
| age | factor | Which age bracket do you fall under? | 18-29, 30-39, 40-49, 50+ |
| height | numeric | Height (in m) | Range: 1.49 - 1.9 |
| weight | numeric | Weight (in kg) | Range: 44 - 158.75 |
|  | factor | Do you have an instagram account? | Yes, No |
| condition | factor | Whether participants viewed a body-positivity post featuring an average-sized model, larger model, or a control image about travel | Control, BPaverage, BPlarger |
| choice_perc | numeric | Percentage of healthy picks | Range: 0 - 100 |

| Scale | Full Name | Subscale | Variable Prefix | Item Numbers | Items | Response Scale |
|---|---|---|---|---|---|---|
| PANAS | Positive and Negative Affect Scale | Positive | panas | 1, 3, 5, 9, 10, 12, 14, 16, 17, 19 | Interested, Excited, Strong, Enthusiastic, Proud, Alert, Inspired, Determined, Attentive, Active | Very Slightly/Not at all, A little, Moderately, Quite a bit, Extremely |
| PANAS | Positive and Negative Affect Scale | Negative | panas | 2, 4, 6, 7, 8, 11, 13, 15, 18, 20 | Distressed, Upset, Guilty, Scared, Hostile, Irritable, Ashamed, Nervous, Jittery, Afraid | Very Slightly/Not at all, A little, Moderately, Quite a bit, Extremely |
| BSS | Body Satisfaction Scale (State) | Single scale | bss | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 | Whole Body, Head, Face, Jaw, Teeth, Nose, Mouth, Ears, Eyes, Shoulders, Neck, Chest, Tummy, Arms, Hands, Legs, Feet | Very dissatisfied, Moderately dissatisfied, Slightly dissatisfied, Undecided, Slightly satisfied, Moderately satisfied, Very satisfied |
| SAC | Social Comparison | Single scale | sac | 1, 2, 3, 4 | (1) To what extent did you think overall about your appearance when viewing these images? (2) To what extent did you compare your overall appearance to the individuals in the Instagram images? (3) To what extent did you compare your stomach to the individuals in the Instagram images? (4) To what extent did you compare your thighs to the individuals in the Instagram images? | Not at all, 2, 3, 4, A lot |
| BAS | Body Appreciation Scale (Trait) | Single scale | bas | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | (1) I respect my body. (2) I feel good about my body. (3) I feel that my body has at least some good qualities. (4) I take a positive attitude toward my body. (5) I am attentive to my body's needs. (6) I feel love for my body. (7) I appreciate the different and unique characteristics of my body. (8) My behaviour reveals my positive attitude toward my body: e.g. I walk holding my head high and smiling. (9) I am comfortable in my body. (10) I feel like I am beautiful even if I am different from media images of attractive people, e.g. models, actresses. | Never, Seldom, Sometimes, Often, Always |
| PACSR | Physical Appearance Comparison Scale, Revised | Single scale | pacsr | 1, 2, 3, 4 | (1) When I meet a new person (same sex), I compare my body size to their body size. (2) When I am out in public, I compare my body fat to the body fat of others. (3) When I am at a party, I compare my body shape to the body shape of others. (4) When I am out in public, I compare my body size to the body size of others. | Never, Sometimes, About half the time, Most of the time, Always |
Data Analysis Tips
Plan Your Process
The headings in the next section provide a basic outline of the steps of the analysis, along with some suggestions about things to check or do at each step. However, you may want to add steps, or find that others aren’t necessary. For your own data analysis, it’s a good idea to have a clear idea of what you are trying to achieve overall and at each step, so you know when you’ve done it successfully!
Keep a Record
Today it’s up to you to create your own code chunks, make your own notes, and write your own report. As you go, you’ll be writing a continuous script across multiple code chunks. For this code to work as expected, it’s essential that you keep a record of each bit of code that contributes to the final output.
However, for future use, it’s almost as important to clearly mark the code that didn’t work, or that only served to test or understand your code rather than contributing to the analysis itself. For this, make liberal use of # comments. You can add comments to your code, and also “comment out” code that was useful while you were working but doesn’t need to run again.
In the example below, I’ve made a comment, preceded by ##. This is my personal convention - just one # will do, but I use two (or more) to visually distinguish comments from commented-out code. The two lines after the comment are commented-out code: a bit of code that I might have used to check whether my exclusions or mutate commands worked as expected. This code isn’t a part of the main tasks I want to do, so I don’t need it to print out when I run the code in the future; but it’s handy to keep, so I know that I checked and how, and so that if I need to check again, I have the code to hand.
## Checking whether the counts came out
# data |>
#   dplyr::count(condition)
To easily comment out any bit of code, highlight (select) it with your cursor and press Ctrl/Cmd + Shift + C on your keyboard.
Be Consistent
This tutorial is a chance to work on some skills that you might not have had a lot of practice with yet, like creating and naming objects, managing your environment, and keeping track of what you have done. You may find this confusing or frustrating at times - don’t worry, that’s normal, and will get much easier with practice.
However, you can make things easier from the start by thinking strategically about how to manage your code and objects. The key is to be consistent, in a variety of respects.
Variable and object names
Decide on a consistent convention for naming objects, and variables/elements within objects. Thus far we’ve been using snake_case, but you can of course use another convention of your choice; I’d recommend avoiding a_mixOf.cases.
Consider using suffixes, such as _data for datasets, _tab for tables, _lm or _afx for models, _n for objects containing counts, etc.
Packages and functions
Decide whether you use explicit style (i.e. package::function()) or not. Do you need to load more packages to make sure all your functions work, if you are not using explicit style?
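As a minimal sketch of the difference, the code below does the same count twice, once in each style. The toy_data object and its values are made up for illustration and are not part of bp_data.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  condition = c("Control", "BPaverage", "BPlarger", "Control")
)

## Explicit style: no library() call needed, as long as the package is installed
toy_n <- toy_data |>
  dplyr::count(condition)

## Non-explicit style: the package must be loaded first,
## otherwise R won't find the dplyr version of count()
library(dplyr)

toy_loaded_n <- toy_data |>
  count(condition)
```

Both versions produce the same table; the choice mainly affects how much you need to keep track of which packages are loaded.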
Overwriting vs. creating an object
This is one of the more nuanced elements of managing your environment. When you make many changes to a dataset, consider how many you may need to make before you’ve made something “different”. It may help to think of creating a new object as a “checkpoint” that you can return to easily.
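One way to think about the checkpoint idea is sketched below, assuming a made-up toy dataset (the object names and values are hypothetical). A small tweak overwrites the existing object, while a change that removes cases is saved under a new name.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  id = c("ABC1234", "DEF5678", "GHI9012"),
  weight = c(60, 72.5, NA)
)

## Small tweak (adding a variable): overwrite the existing object
toy_data <- toy_data |>
  dplyr::mutate(weight_g = weight * 1000)

## Bigger change (removing cases): save a new "checkpoint" object,
## so toy_data is still there to go back to
toy_clean_data <- toy_data |>
  dplyr::filter(!is.na(weight))
```

Where exactly to draw the line between the two is a judgement call; the point is only that toy_data is still intact after the exclusion.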
Reduce Redundancies
Once you have some hands-on practice, you can start to look for opportunities to make your code more resilient to errors, more efficient, and more versatile. Keep an eye out for places where your code repeats or is identical, and/or you had to copy-paste multiple times; you might want to use a more efficient iteration function, or write your own function.
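For instance, here is a sketch of the same summary written twice, first with the calculation copy-pasted and then with dplyr::across(). The data are made up, and across() is only one of several possible options for this kind of repetition.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  height = c(1.60, 1.75, 1.82),
  weight = c(55, 70, 90)
)

## Repetitive version: the same calculation is typed out for each variable
toy_long_tab <- toy_data |>
  dplyr::summarise(
    height_mean = mean(height, na.rm = TRUE),
    weight_mean = mean(weight, na.rm = TRUE)
  )

## Same result with across(): the calculation is written only once
toy_short_tab <- toy_data |>
  dplyr::summarise(
    dplyr::across(
      c(height, weight),
      ~ mean(.x, na.rm = TRUE),
      .names = "{.col}_mean"
    )
  )
```

With two variables the saving is small, but with the 20 PANAS items, say, the second version would stay the same length.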
Similarly, watch out for places where you, the individual, have to remember to do something - like add or delete elements, do a calculation, or update the code - in order for it to work. These are places where you might consider using an object instead of a hard-coded value, so you can easily update your code later.
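A rough sketch of this idea, using made-up data: the limits are stored once in an object (choice_limits, a hypothetical name; the 0 - 100 range comes from the codebook entry for choice_perc), and the number of excluded cases is calculated rather than typed in by hand.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  id = c("ABC1234", "DEF5678", "GHI9012", "JKL3456"),
  choice_perc = c(40, 120, 75, -5)
)

## Store the valid range once, instead of typing 0 and 100 wherever they are needed
choice_limits <- c(min = 0, max = 100)

toy_clean_data <- toy_data |>
  dplyr::filter(
    dplyr::between(choice_perc, choice_limits[["min"]], choice_limits[["max"]])
  )

## Calculated from the data, so it updates itself if the data or limits change
excluded_n <- nrow(toy_data) - nrow(toy_clean_data)
```

If the limits ever change, only choice_limits needs editing, and excluded_n can be used directly in your write-up.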
The Whole Shebang
The following sections are intended to prompt you through the whole process of inspecting, cleaning, wrangling, summarising, visualising, analysing, and reporting results. You will notice there are no solutions! All of the exercises (and hints) are suggestions, but you’re welcome to make use of this dataset and workbook however is helpful for you.
Remember, rather than working in a workbook, you should create your own Quarto document and work in that file, creating code chunks as necessary.
Planning
The independent manipulation in this experiment was the image condition: whether participants saw an average-sized or larger-sized model in a body positivity post, or a control travel image. A key outcome was the selections participants subsequently made from an example menu, with percentage of healthy picks provided in the dataset.
Percentage of healthy picks would be a sensible outcome variable to investigate, but you are welcome to choose something else. You may want to choose a couple of categorical variables - for example, condition, age, and gender - along with a couple of measures of your choice.
Inspecting
Start by getting familiar with the overall dataset. Have a look at it, and look through the codebook. Is everything in order? Are there any issues?
Create some summaries of the variables in the dataset. Are there any potential problems that need addressing?
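As a starting point rather than a solution, the sketch below shows a few quick checks on a made-up toy dataset with some deliberately planted problems. The values are invented and do not reflect what is in bp_data.

```r
## Made-up toy data with deliberate problems, for illustration only
toy_data <- tibble::tibble(
  condition = c("Control", "BPaverage", "BPlarger", "control", NA),
  weight = c(60, 72.5, 580, 81, 66)
)

## Structure: variable names, types, and the first few values
dplyr::glimpse(toy_data)

## Categorical variables: counts reveal unexpected or missing categories
condition_n <- toy_data |>
  dplyr::count(condition)

## Numeric variables: the range reveals implausible values
weight_range <- range(toy_data$weight, na.rm = TRUE)
```

Here the counts would show a stray lowercase "control" and a missing value, and the range would show a weight far outside what the codebook describes.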
Cleaning
Make any changes necessary to clean the data. Remember to keep track of any exclusions that you need to report.
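One possible way to keep track of exclusions is to label each case with a reason before removing anything, as sketched below. The data, the exclusion reasons, and the height_max limit are all hypothetical; you would need to choose and justify your own criteria.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  id = c("ABC1234", "DEF5678", "GHI9012", "JKL3456", "MNO7890"),
  height = c(1.62, 16.8, 1.75, NA, 1.80)
)

## Hypothetical plausibility limit for height in metres
height_max <- 2.5

## Label every case with the reason it is excluded (or "Retained")
toy_flag_data <- toy_data |>
  dplyr::mutate(
    exclusion = dplyr::case_when(
      is.na(height) ~ "Missing height",
      height > height_max ~ "Implausible height",
      TRUE ~ "Retained"
    )
  )

## Counts per reason, ready to report
exclusion_n <- toy_flag_data |>
  dplyr::count(exclusion)

## Only now remove the excluded cases
toy_clean_data <- toy_flag_data |>
  dplyr::filter(exclusion == "Retained")
```

Because the reasons are stored in the data, the counts in exclusion_n can be reported without having to remember what was removed at each step.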
Wrangling
Change or create any new variables. Minimally, you will need to create mean scores for each of the multi-item measures, making use of the Codebook to do so.
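A minimal sketch of a row-wise mean score is below. It assumes that item variables are named with the prefix, an underscore, and the item number (e.g. sac_1), and that responses are stored as numbers; both are assumptions to check against the real data. The toy values are made up.

```r
## Made-up toy data; the sac_1 ... sac_4 naming pattern is an assumption
toy_data <- tibble::tibble(
  id = c("ABC1234", "DEF5678"),
  sac_1 = c(1, 4),
  sac_2 = c(2, 5),
  sac_3 = c(3, NA),
  sac_4 = c(2, 3)
)

toy_scored_data <- toy_data |>
  dplyr::rowwise() |>   # calculate within each participant's row
  dplyr::mutate(
    sac_mean = mean(
      dplyr::c_across(dplyr::num_range("sac_", 1:4)),
      na.rm = TRUE
    )
  ) |>
  dplyr::ungroup()      # switch row-wise behaviour back off
```

Selecting items by number with num_range() also suits subscales such as the PANAS, where the Codebook lists which item numbers belong to each subscale.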
Summarising
Create at least one nicely formatted summary table describing the variables of interest, split up by (at least) experimental condition. Feel free to add more if you like.
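The first step towards such a table might look like the sketch below, which calculates descriptives by condition on made-up data. The result is a plain tibble, so it would still need passing to a table-formatting function of your choice (knitr::kable() is one option) to count as nicely formatted.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  condition = c("Control", "Control", "BPaverage", "BPaverage", "BPlarger", "BPlarger"),
  choice_perc = c(50, 70, 40, 60, 30, 50)
)

## One row per condition, with n, mean, and SD of the outcome
choice_tab <- toy_data |>
  dplyr::group_by(condition) |>
  dplyr::summarise(
    n = dplyr::n(),
    mean_choice = mean(choice_perc, na.rm = TRUE),
    sd_choice = sd(choice_perc, na.rm = TRUE)
  )
```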
Analysing
Choose an analysis to perform and do it.
Remember you can always refer to the discovr tutorials for more help!
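Purely as an illustration of the general shape, and not a recommendation of which analysis to choose, the sketch below fits a linear model comparing conditions on made-up data. Setting the factor levels explicitly is an assumption about what is wanted here: it makes Control the reference group that the other conditions are compared to.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  condition = c("Control", "Control", "BPaverage", "BPaverage", "BPlarger", "BPlarger"),
  choice_perc = c(50, 70, 40, 60, 30, 50)
) |>
  dplyr::mutate(
    ## Put Control first so it becomes the reference category
    condition = factor(condition, levels = c("Control", "BPaverage", "BPlarger"))
  )

## Linear model predicting the outcome from condition
choice_lm <- lm(choice_perc ~ condition, data = toy_data)

summary(choice_lm)
```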
Visualising
Choose a visualisation that reflects the key comparison or relationship of interest that you investigated in your analysis, and create it. Remember that you can build a visualisation from scratch with {ggplot2}, or use one of the “shortcut” plotting functions we have covered.
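If you build the plot from scratch, the skeleton might look something like the sketch below, which shows individual points plus condition means for made-up data. The choice of geoms, labels, and theme here is only one possibility.

```r
## Made-up toy data, for illustration only
toy_data <- tibble::tibble(
  condition = c("Control", "Control", "BPaverage", "BPaverage", "BPlarger", "BPlarger"),
  choice_perc = c(50, 70, 40, 60, 30, 50)
)

choice_plot <- toy_data |>
  ggplot2::ggplot(ggplot2::aes(x = condition, y = choice_perc)) +
  ## Individual participants, nudged sideways so they don't overlap
  ggplot2::geom_jitter(width = 0.1, alpha = 0.5) +
  ## Mean of each condition as a larger point
  ggplot2::stat_summary(fun = mean, geom = "point", size = 3) +
  ggplot2::labs(x = "Condition", y = "Percentage of healthy picks") +
  ggplot2::theme_minimal()

choice_plot
```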
Reporting
Write up a formal report of the analysis from start to finish, including:
- A statement of the key comparison or relationship of interest
- A description of relevant data cleaning procedures, including any exclusions
- A nicely formatted descriptive table
- A full report and interpretation of the analysis
- An accompanying visualisation