09: Test Flight

Overview

Today’s tutorial is a bit different - less tutorial and more supported practice. Instead of more new information, today’s session is an opportunity to try out the skills we’ve been working on.

We’ll look at a new dataset and work through the process of inspecting, cleaning, summarising, visualising, analysing, and reporting. Rather than solutions, this tutorial will support you to do these tasks yourself - but writing the code is up to you!

Setup

There’s no workbook this week, because we’re emulating the process of doing a data analysis from start to finish in R. Instead, create a new Quarto document to work in.

Make sure you start out by loading any necessary packages, minimally {tidyverse}.

Data

Today’s data comes from Body Positivity, but not for everyone (Simon & Hurst, 2021), which has publicly available data hosted on Figshare (h/t Dr Hurst for this!). You can certainly use the data directly from Figshare, but for the purposes of this tutorial, I’ve prepared a subset of the data that you can load in using the code below.

Read in from file:

bp_data <- here::here("data/bp_data.csv") |> readr::read_csv()

Read in from URL:

bp_data <- readRDS("https://raw.githubusercontent.com/drmankin/practicum/master/data/bp_data.rds")

Codebooks

There are two separate codebooks for this dataset. The first describes the demographic and other single-item variables in the dataset. The second describes the variables that are items belonging to measures, and which items belong to what measures.

Important

As in previous datasets we’ve used, I’ve introduced some changes to the data to mess it up a bit and give us opportunities to practice. If you want the real dataset, make sure you use the links above!

Demographics and Single-Item Variables
Variable Name	Type	Item	Response Options
recordeddate	date	Recorded date	NA
id	character	Unique participant id number	Alphanumeric identifier (three letters, four numbers)
gender	factor	What gender do you identify with?	Female, Male
age	factor	Which age bracket do you fall under?	18-29, 30-39, 40-49, 50+
height	numeric	Height (in m)	Range: 1.49 - 1.9
weight	numeric	Weight (in kg)	Range: 44 - 158.75
instagram	factor	Do you have an instagram account?	Yes, No
condition	factor	Whether participants viewed a body-positivity post featuring an average-sized model, larger model, or a control image about travel	Control, BPaverage, BPlarger
choice_perc	numeric	Percentage of healthy picks	Range: 0 - 100

Scale Items
Scale	Full Name	Subscale	Variable Prefix	Item Numbers	Items	Response Scale
PANAS	Positive and Negative Affect Scale	Positive	panas	1, 3, 5, 9, 10, 12, 14, 16, 17, 19	Interested, Excited, Strong, Enthusiastic, Proud, Alert, Inspired, Determined, Attentive, Active	Very Slightly/Not at all, A little, Moderately, Quite a bit, Extremely
PANAS	Positive and Negative Affect Scale	Negative	panas	2, 4, 6, 7, 8, 11, 13, 15, 18, 20	Distressed, Upset, Guilty, Scared, Hostile, Irritable, Ashamed, Nervous, Jittery, Afraid	Very Slightly/Not at all, A little, Moderately, Quite a bit, Extremely
BSS	Body Satisfaction Scale (State)	Single scale	bss	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17	Whole Body, Head, Face, Jaw, Teeth, Nose, Mouth, Ears, Eyes, Shoulders, Neck, Chest, Tummy, Arms, Hands, Legs, Feet	Very dissatisfied, Moderately dissatisfied, Slightly dissatisfied, Undecided, Slightly satisfied, Moderately satisfied, Very satisfied
SAC	Social Comparison	Single scale	sac	1, 2, 3, 4	To what extent did you think overall about your appearance when viewing these images?To what extent did you compare your overall appearance to the individuals in the Instagram images?To what extent did you compare your stomach to the individuals in the Instagram images?To what extent did you compare your thighs to the individuals in the Instagram images?	Not at all, 2, 3, 4, A lot
BAS	Body Appreciation Scale (Trait)	Single scale	bas	1, 2, 3, 4, 5, 6, 7, 8, 9, 10	I respect my bodyI feel good about my bodyI feel that my body has at least some good qualitiesI take a positive attitude toward my bodyI am tentative to my body's needsI feel love for my bodyI appreciate the different and unique characteristics of my bodyMy behaviour reveals my positive attitude toward my body: eg. I walk holding my head high and smilingI am comfortable in my bodyI feel like I am beautiful even if I am different from media images of attractive people eg. models, actresses	Never, Seldom, Sometimes, Often, Always
PACSR	Physical Appearance Comparison Scale, Revised	Single scale	pacsr	1, 2, 3, 4	When I meet a new person (same sex), I compare my body size to their body size.When I am out in public, I compare my body fat to the body fat of others.When I am at a party, I compare my body shape to the body shape of others.When I am out in public, I compare my body size to the body size of others	Never, Sometimes, About half the time, Most of the time, Always

Data Analysis Tips

Plan Your Process

The headings in the next section provide a basic outline of the steps to help you progress through the steps of the analysis, along with some suggestions about things to check or do at each step. However, you may want to add steps, or find that others aren’t necessary. For your own data analysis, it’s a good idea to have a clear idea of what you are try to achieve overall and at each step, so you know when you’ve done it successfully!

Keep a Record

Today it’s up to you to create your own code chunks, make your own notes, and write your own report. As you go, you’ll be writing a continuous script across multiple code chunks. For this code to work as expected, it’s essential that you keep a record of each bit of code that contributes to the final output.

However, for future use, it’s almost as important to clearly note the code that didn’t work, or that was only for the purpose of testing or understanding your code rather than that actually contributes to the analysis process. For this, make liberal use of # comments. You can add comments to your code, and also “comment out” code that was useful at the time but was only for you at the time.

In the example below, I’ve made a comment, preceded by ##. This is my personal convention - just one # will do, but I use two (or more) to visually distinguish from commented-out code. That’s the next two lines: a bit of code that I might have used to check whether my exclusions or mutate commands worked as expected. This code isn’t a part of the main tasks I want to do with my code, so I don’t need this to print out when I run the code in the future; but it’s handy to have it, so I know I checked and how, and so if I need to check again, I have the code easy to hand.

## Checking whether the counts came out

# data |> 
#   dplyr::count(condition)

Tip

To easily comment out any bit of code, highlight (select) it with your cursor and press Ctrl/Cmd + Shift + C on your keyboard.

Be Consistent

This tutorial is chance to work on some skills that you might not have had a lot of practice with yet, like creating and naming objects, managing your environment, and keeping track of what you have done. You may find this confusing or frustrating at times - don’t worry, that’s normal, and will get much easier with practice.

However, you can make things easier from the start by thinking strategically about how to manage your code and objects. The key is to be consistent, in a variety of respects.

Variable and object names

Decide on a consistent convention for naming objects, and variables/elements within objects. Thus far we’ve been using snake_case, but you can of course use another convention of your choice; I’d recommend avoiding a_mixOf.cases

Consider using suffixes, such as _data for datasets, _tab for tables, _lm, _afx for models, _n for objects containing counts, etc.

Packages and functions

Decide whether you use explicit style (i.e. package::function()) or not. Do you need to load more packages to make sure all your functions work, if you are not using explicit style?

Overwriting vs. creating an object

This is one of the more nuanced elements of managing your environment. When you make many changes to a dataset, consider how many you may need to make before you’ve made something “different”. It may help to think of creating a new object as a “checkpoint” that you can return to easily.

Reduce Redundancies

Once you have some hands-on practice, you can start to look for opportunities to make your code more resilient to errors, more efficient, and more versatile. Keep an eye out for places where your code repeats or is identical, and/or you had to copy-paste multiple times; you might want to use a more efficient iteration function, or write your own function.

Similarly, watch out for places where you, the individual, have to remember to do something - like add or delete elements, do a calculation, or update the code - in order for it to work. These are places where you might consider instead using an object instead of hard-coded value, so you can easily update your code later.

The Whole Shebang

The following sections are intended to prompt you through the whole process of inspecting, cleaning, wrangling, summarising, visualising, analysing, and reporting results. You will notice there are no solutions! All of the exercises (and hints) are suggestions, but you’re welcome to make use of this dataset and workbook however is helpful for you.

Remember, rather than working in a workbook, you should create your own Quarto document and work in that file, creating code chunks as necessary.

Planning

The independent manipulation in this experiment was the image condition: whether participants saw an average-sized or larger-sized model in a body positivity post, or a control travel image. A key outcome was the selections participants subsequently made from an example menu, with percentage of healthy picks provided in the dataset.

Percentage of healthy picks would be a sensible outcome variable to investigate, but you are welcome to choose something else. You may want to choose a couple categorical variables - for example, condition, age, and gender - along with a couple measures of your choice.

Inspecting

Start by getting familiar with the overall dataset. Have a look at it, and look through the codebook. Is everything in order? Are there any issues?

Create some summaries of the variables in the dataset. Are there any potential problems that need addressing?

Cleaning

Make any changes necessary to clean the data. Remember to keep track of any exclusions that you need to report.

Wrangling

Change or create any new variables. Minimally, you will need to create mean scores for each of the multi-item measures, making use of the Codebook to do so.

Summarising

Create at least one nicely formatted summary table describing the variables of interest, split up by (at least) experimental condition. Feel free to add more if you like.

Analysing

Choose an analysis to perform and do it.

Tip

Remember you can always refer to the discovr tutorials for more help!

Visualising

Choose a visualisation that reflects the key comparison or relationship of interest that you investigated in your analysis, and create it. Remember that you can build a visualisation from scratch with {ggplot2}, or use one of the “shortcut” plotting functions we have covered.

Reporting

Write up a formal report of the analysis from start to finish, including:

A statement of the key comparison or relationship of interest
A description of relevant data cleaning procedures, including any exclusions
A nicely formatted descriptive table
A full report and interpretation of the analysis
An accompanying visualisation