From data to disaggregation to decisions: Data manipulation and descriptives
Step 1: Load R packages we will be using today
Remember to load your R packages every time you start a new R session in RStudio.
#| load packages
library(psych)
library(ggplot2)
library(gtsummary)
library(dplyr)
library(kfa)
Step 2: Read your data into R and check it
Before starting any analysis, it’s important to check that your data is properly formatted. There are several commands for doing so in R; we give two examples here.
#Load your data
dat <- read.csv(file.path("Data", "cint_data.csv"), as.is = TRUE)
#Check your data: Option 1
head(dat)
This will give you all variable names and the first six rows in the data frame you just loaded. It’s looking good!
#Check your data: Option 2
str(dat)
This lists all variables, their types, and their first 10 entries (rows), with one variable per line.
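Beyond `head()` and `str()`, a quick count of missing and out-of-range values can catch coding problems early. The sketch below is only an illustration: it uses a small made-up data frame (`toy`, with invented values) rather than the workshop data, and it assumes that valid responses are the integers 0 to 3.

```r
# Toy data standing in for the real file (values are invented for illustration)
toy <- data.frame(
  cint1 = c(0, 1, 2, 3, NA),
  cint2 = c(3, 2, 7, 0, 1)   # 7 is a deliberate out-of-range value
)

# Count missing values per item
n_missing <- colSums(is.na(toy))

# Count non-missing values outside the assumed valid range 0-3
n_out_of_range <- sapply(toy, function(x) sum(!is.na(x) & !(x %in% 0:3)))

print(n_missing)
print(n_out_of_range)
```

Any item with a non-zero count in either vector is worth inspecting before moving on.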
Step 3: Select your items and rename your data to facilitate descriptive analysis
In this step, you want to create a “mini data set” or data frame that contains only the items in your measure, not the demographic variables. This involves several sub-steps:
1. Create a character vector with your item variable names: Think of this like a list of labels, with each label being an item variable name in your measure. It contains no data.
2. Create a new mini data set or data frame: Tell R to take your original dataset (“dat”) and select the items from your list of labels (cint) to create a new data frame with the data from all internalizing items (cint.dat). Also create mini data frames for just the hypothesized depression items (cdep.dat) and just the hypothesized anxiety items (canx.dat).
3. Check your new data frames: Make sure that everything looks right!
#|item data frame
#1. Create a character vector or list with your internalizing item names
cint <- paste0("cint", c(1, 2, 4, 11, 19, 21:24, 27:30))
print(cint)
## Also make separate lists of your hypothesized depression and anxiety items
cdep <- paste0("cint", c(1, 2, 4, 11, 27:30))
canx <- paste0("cint", c(19, 21:24))
#2. Create an internalizing item data frame
cint.dat <- dat %>%
  select(all_of(cint))
## Also create separate dataframes for our hypothesized depression and anxiety items.
cdep.dat <- dat %>%
  select(all_of(cdep))
canx.dat <- dat %>%
  select(all_of(canx))
#3. Check your new dataframes
str(cint.dat)
str(cdep.dat)
str(canx.dat)
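To see what the `paste0()` and `select(all_of())` combination does in isolation, here is a minimal sketch on a made-up data frame. The names `toy`, `items`, `toy.items`, and `cint99` are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Toy data: two demographic columns plus three item columns (invented values)
toy <- data.frame(
  id    = 1:3,
  age   = c(10, 11, 12),
  cint1 = c(0, 1, 2),
  cint2 = c(3, 2, 1),
  cint4 = c(1, 1, 0)
)

# Build the item names: "cint1" "cint2" "cint4"
items <- paste0("cint", c(1, 2, 4))

# Keep only the item columns
toy.items <- toy %>%
  select(all_of(items))
names(toy.items)

# all_of() is strict: asking for a column that does not exist is an error,
# which is a useful safeguard against typos in the item list
bad <- tryCatch(
  toy %>% select(all_of(c(items, "cint99"))),
  error = function(e) "error"
)
bad
```

Because `all_of()` fails loudly on a missing name, a mistyped item number is caught when the data frame is created rather than later in the analysis.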
Now, we are going to create another dataframe in which we rename the items with their actual questions so that it is easier to interpret how people are responding to the items. To do this, we will:
1. Create two new character vectors with both the item variable names and the item questions, one vector in Spanish and one in English. This takes a bit longer, but it helps in interpreting the results!
2. Create more new data frames: This time we will tell R to take the internalizing data frame we created above (cint.dat) and rename the variables with their longer labels.
#|item data frame - descriptive
#1. Create character vectors with your item variable names and questions in Spanish and English.
cint.labels.esp <- c(
"cint1: Se sienten tristes o decaídos",
"cint2: No disfrutan haciendo nada",
"cint4: Se sienten muy cansados y sin energía",
"cint11: Se sienten malhumorados o renegones",
"cint19: Temen interactuar con personas nuevas",
"cint21: Se preocupan cuando piensan que han hecho algo mal",
"cint22: Se preocupan cuando piensan que alguien está molesto con ellos",
"cint23: Se preocupan por lo que otras personas piensan de ellos",
"cint24: Se preocupan por cometer errores",
"cint27: Se sienten solos incluso cuando están con otras personas",
"cint28: Les molestan cosas que antes no les molestaban",
"cint29: Se van a su habitación y lloran",
"cint30: Se sienten intranquilos y caminan de un lado a otr"
)
cint.labels.en <- c(
"cint1: Feel sad or down",
"cint2: Do not enjoy anything anymore",
"cint4: Feel tired and drained of energy",
"cint11: Feel moody or grumpy",
"cint19: Afraid to interact with new people",
"cint21: Worry when they think they have done something poorly",
"cint22: Worry when they think someone is angry at them",
"cint23: Worry about what other people think of them",
"cint24: Worry about making mistakes",
"cint27: Feel lonely around other peoples",
"cint28: Get bothered by things that didn't bother them before",
"cint29: Go to their room and cry",
"cint30: Feel restless and walk around"
)
#2. Rename the items in your mini dataframe (cint.dat)
cint.dat.items.esp <- cint.dat %>% #tells R to take the mini dataset created above
  rename_with(~cint.labels.esp) #tells R to rename the cint items with their labels.
cint.dat.items.en <- cint.dat %>% #tells R to take the mini dataset created above
  rename_with(~cint.labels.en) #tells R to rename the cint items with their labels.
str(cint.dat.items.esp)
str(cint.dat.items.en)
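Renaming this way is positional: the first label goes to the first column, the second to the second, and so on. One possible safeguard, sketched here with hypothetical names (`items`, `labels`, `label_prefix`) and only three items, is to confirm that the text before each colon matches the variable names in the same order.

```r
# Variable names in the order they appear in the item data frame
items <- c("cint1", "cint2", "cint4")

# Long labels, which should follow the same order
labels <- c(
  "cint1: Feel sad or down",
  "cint2: Do not enjoy anything anymore",
  "cint4: Feel tired and drained of energy"
)

# Strip everything from the colon onward to recover the variable name
label_prefix <- sub(":.*$", "", labels)

# Stops with an error if the labels are out of order or one is missing
stopifnot(identical(label_prefix, items))
label_prefix
```

With the workshop objects, the same idea would compare the label prefixes against `names(cint.dat)`.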
Step 4: Calculate descriptive statistics for your items
Let’s start with the percentage of children who endorsed each response option in each item. Remember, children were asked to respond whether statements were true for them “never” (0), “rarely” (1), “sometimes” (2), or “almost always” (3).
#|item frequencies
##ESP
cint.table.freq.esp <- tbl_summary(cint.dat.items.esp) %>%
  modify_header(label ~ "**Item**")
knitr::kable(cint.table.freq.esp)
Item | N = 840 |
---|---|
cint1: Se sienten tristes o decaídos | NA |
0 | 112 (13%) |
1 | 305 (36%) |
2 | 304 (36%) |
3 | 119 (14%) |
cint2: No disfrutan haciendo nada | NA |
0 | 218 (26%) |
1 | 266 (32%) |
2 | 224 (27%) |
3 | 132 (16%) |
cint4: Se sienten muy cansados y sin energía | NA |
0 | 124 (15%) |
1 | 260 (31%) |
2 | 318 (38%) |
3 | 138 (16%) |
cint11: Se sienten malhumorados o renegones | NA |
0 | 135 (16%) |
1 | 204 (24%) |
2 | 355 (42%) |
3 | 146 (17%) |
cint19: Temen interactuar con personas nuevas | NA |
0 | 190 (23%) |
1 | 186 (22%) |
2 | 249 (30%) |
3 | 215 (26%) |
cint21: Se preocupan cuando piensan que han hecho algo mal | NA |
0 | 81 (9.6%) |
1 | 163 (19%) |
2 | 299 (36%) |
3 | 297 (35%) |
cint22: Se preocupan cuando piensan que alguien está molesto con ellos | NA |
0 | 139 (17%) |
1 | 182 (22%) |
2 | 329 (39%) |
3 | 190 (23%) |
cint23: Se preocupan por lo que otras personas piensan de ellos | NA |
0 | 248 (30%) |
1 | 197 (23%) |
2 | 258 (31%) |
3 | 137 (16%) |
cint24: Se preocupan por cometer errores | NA |
0 | 98 (12%) |
1 | 159 (19%) |
2 | 320 (38%) |
3 | 263 (31%) |
cint27: Se sienten solos incluso cuando están con otras personas | NA |
0 | 285 (34%) |
1 | 204 (24%) |
2 | 214 (25%) |
3 | 137 (16%) |
cint28: Les molestan cosas que antes no les molestaban | NA |
0 | 188 (22%) |
1 | 231 (28%) |
2 | 278 (33%) |
3 | 143 (17%) |
cint29: Se van a su habitación y lloran | NA |
0 | 424 (50%) |
1 | 179 (21%) |
2 | 164 (20%) |
3 | 73 (8.7%) |
cint30: Se sienten intranquilos y caminan de un lado a otr | NA |
0 | 304 (36%) |
1 | 202 (24%) |
2 | 215 (26%) |
3 | 119 (14%) |
##EN
cint.table.freq.en <- tbl_summary(cint.dat.items.en) %>%
  modify_header(label ~ "**Item**")
knitr::kable(cint.table.freq.en)
Item | N = 840 |
---|---|
cint1: Feel sad or down | NA |
0 | 112 (13%) |
1 | 305 (36%) |
2 | 304 (36%) |
3 | 119 (14%) |
cint2: Do not enjoy anything anymore | NA |
0 | 218 (26%) |
1 | 266 (32%) |
2 | 224 (27%) |
3 | 132 (16%) |
cint4: Feel tired and drained of energy | NA |
0 | 124 (15%) |
1 | 260 (31%) |
2 | 318 (38%) |
3 | 138 (16%) |
cint11: Feel moody or grumpy | NA |
0 | 135 (16%) |
1 | 204 (24%) |
2 | 355 (42%) |
3 | 146 (17%) |
cint19: Afraid to interact with new people | NA |
0 | 190 (23%) |
1 | 186 (22%) |
2 | 249 (30%) |
3 | 215 (26%) |
cint21: Worry when they think they have done something poorly | NA |
0 | 81 (9.6%) |
1 | 163 (19%) |
2 | 299 (36%) |
3 | 297 (35%) |
cint22: Worry when they think someone is angry at them | NA |
0 | 139 (17%) |
1 | 182 (22%) |
2 | 329 (39%) |
3 | 190 (23%) |
cint23: Worry about what other people think of them | NA |
0 | 248 (30%) |
1 | 197 (23%) |
2 | 258 (31%) |
3 | 137 (16%) |
cint24: Worry about making mistakes | NA |
0 | 98 (12%) |
1 | 159 (19%) |
2 | 320 (38%) |
3 | 263 (31%) |
cint27: Feel lonely around other peoples | NA |
0 | 285 (34%) |
1 | 204 (24%) |
2 | 214 (25%) |
3 | 137 (16%) |
cint28: Get bothered by things that didn’t bother them before | NA |
0 | 188 (22%) |
1 | 231 (28%) |
2 | 278 (33%) |
3 | 143 (17%) |
cint29: Go to their room and cry | NA |
0 | 424 (50%) |
1 | 179 (21%) |
2 | 164 (20%) |
3 | 73 (8.7%) |
cint30: Feel restless and walk around | NA |
0 | 304 (36%) |
1 | 202 (24%) |
2 | 215 (26%) |
3 | 119 (14%) |
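The percentages in these tables are simply each count divided by the total number of children. As a worked check, the sketch below takes the four counts reported above for cint1 and recomputes the percentages in base R; the object names (`counts`, `n_total`, `pct`) are chosen only for this illustration.

```r
# Counts for cint1 as reported in the table above (responses 0, 1, 2, 3)
counts <- c("0" = 112, "1" = 305, "2" = 304, "3" = 119)

# Total number of respondents
n_total <- sum(counts)
n_total

# Percentage endorsing each response option, rounded to whole percentages
pct <- round(100 * counts / n_total)
pct
```

The total comes to 840 and the rounded percentages are 13, 36, 36, and 14, matching the cint1 rows of the table.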
Now let’s examine children’s average responses on each of the items as well as the variation in responses.
#|item means and SDs
##ESP
cint.table.msd.esp <- tbl_summary(cint.dat.items.esp,
  type = all_of(cint.labels.esp) ~ "continuous", ## tells R that all of your items - remember to use the character vector with the labels! - are continuous
  statistic = all_continuous() ~ "{mean} ({sd})", ## tells R that you want the means and standard deviations for all of your continuous items
  digits = all_of(cint.labels.esp) ~ 2, ##tells R to print means and standard deviations to two decimal places
  missing = "no" ##tells R that you don't want missing data listed
) %>%
  modify_header(label ~ "**Item**")
knitr::kable(cint.table.msd.esp)
Item | N = 840 |
---|---|
cint1: Se sienten tristes o decaídos | 1.51 (0.89) |
cint2: No disfrutan haciendo nada | 1.32 (1.03) |
cint4: Se sienten muy cansados y sin energía | 1.56 (0.93) |
cint11: Se sienten malhumorados o renegones | 1.61 (0.95) |
cint19: Temen interactuar con personas nuevas | 1.58 (1.10) |
cint21: Se preocupan cuando piensan que han hecho algo mal | 1.97 (0.97) |
cint22: Se preocupan cuando piensan que alguien está molesto con ellos | 1.68 (1.00) |
cint23: Se preocupan por lo que otras personas piensan de ellos | 1.34 (1.07) |
cint24: Se preocupan por cometer errores | 1.89 (0.98) |
cint27: Se sienten solos incluso cuando están con otras personas | 1.24 (1.09) |
cint28: Les molestan cosas que antes no les molestaban | 1.45 (1.02) |
cint29: Se van a su habitación y lloran | 0.86 (1.02) |
cint30: Se sienten intranquilos y caminan de un lado a otr | 1.18 (1.07) |
##EN
cint.table.msd.en <- tbl_summary(cint.dat.items.en,
  type = all_of(cint.labels.en) ~ "continuous", ## tells R that all of your items - remember to use the character vector with the labels! - are continuous
  statistic = all_continuous() ~ "{mean} ({sd})", ## tells R that you want the means and standard deviations for all of your continuous items
  digits = all_of(cint.labels.en) ~ 2, ##tells R to print means and standard deviations to two decimal places
  missing = "no" ##tells R that you don't want missing data listed
) %>%
  modify_header(label ~ "**Item**")
knitr::kable(cint.table.msd.en)
Item | N = 840 |
---|---|
cint1: Feel sad or down | 1.51 (0.89) |
cint2: Do not enjoy anything anymore | 1.32 (1.03) |
cint4: Feel tired and drained of energy | 1.56 (0.93) |
cint11: Feel moody or grumpy | 1.61 (0.95) |
cint19: Afraid to interact with new people | 1.58 (1.10) |
cint21: Worry when they think they have done something poorly | 1.97 (0.97) |
cint22: Worry when they think someone is angry at them | 1.68 (1.00) |
cint23: Worry about what other people think of them | 1.34 (1.07) |
cint24: Worry about making mistakes | 1.89 (0.98) |
cint27: Feel lonely around other peoples | 1.24 (1.09) |
cint28: Get bothered by things that didn’t bother them before | 1.45 (1.02) |
cint29: Go to their room and cry | 0.86 (1.02) |
cint30: Feel restless and walk around | 1.18 (1.07) |
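The means and standard deviations follow directly from the response counts shown in the frequency tables. As a sketch, the block below rebuilds the 840 cint1 responses from the reported counts (112, 305, 304, 119) and recomputes the two statistics in base R; `x`, `item_mean`, and `item_sd` are names made up for this illustration.

```r
# Rebuild the cint1 responses from the reported counts for options 0, 1, 2, 3
x <- rep(0:3, times = c(112, 305, 304, 119))
length(x)

# Mean and (sample) standard deviation, rounded to two decimal places
item_mean <- round(mean(x), 2)
item_sd   <- round(sd(x), 2)
item_mean
item_sd
```

This gives 1.51 and 0.89, the values listed for cint1 in the table.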
Step 5: Create histograms of your items
Sometimes it’s easier to interpret patterns in item responses if we present the information visually. So let’s also look at the histograms.
#|item histograms
#a function to create a set of histograms (don't need to change anything)
gen.hist <- function(dat_s, nrow = 3, ncol = 3){
  par(mfrow = c(nrow, ncol),
      mar = c(4, 1, 2, 1))
for (i in 1:ncol(dat_s)) {
hist(dat_s[, i], main = "", xlab = names(dat_s)[i], col = "dark blue")
}
}
gen.hist(cint.dat)
Step 6: Examine item correlations
Now we want to move from thinking about each item individually to think about whether and how the items are associated with each other. We can use the psych package in R to easily create a correlation matrix - but with this many variables, it’s hard to interpret.
#|item correlation table
#item correlation table
corr.test(cint.dat)
So let’s see about a correlation heatmap.
#|item correlation heatmap: internalizing
#item correlation heatmap: internalizing
corPlot(cint.dat,
numbers=TRUE,
upper=FALSE,
diag=FALSE,
cex=.8,
main="cint correlations",
stars=TRUE)
Even this is a lot! Let’s look at the depression and anxiety items separately.
#|item correlation heatmap: depression/anxiety
#item correlation heatmap: depression/anxiety
corPlot(cdep.dat,
numbers=TRUE,
upper=FALSE,
diag=FALSE,
cex=.8,
main="cdep correlations",
stars=TRUE)
corPlot(canx.dat,
numbers=TRUE,
upper=FALSE,
diag=FALSE,
cex=.8,
main="canx correlations",
stars=TRUE)
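If you want the numbers behind a heatmap as a plain matrix, base R's `cor()` is one option. The sketch below uses an invented three-column data frame (`toy`) and Pearson correlations with pairwise deletion of missing values; treating this as comparable to the psych defaults is an assumption you should confirm in the package documentation.

```r
# Toy item data (invented values); column c is constructed as 3 - a,
# so a and c are perfectly negatively correlated
toy <- data.frame(
  a = c(0, 1, 2, 3, 2, 1),
  b = c(0, 1, 2, 3, 3, 1),
  c = c(3, 2, 1, 0, 1, 2)
)

# Pearson correlation matrix, using all available pairs of observations
r <- cor(toy, use = "pairwise.complete.obs")
round(r, 2)

# Show only the lower triangle, as in the heatmaps above
r_lower <- round(r, 2)
r_lower[upper.tri(r_lower, diag = TRUE)] <- NA
r_lower
```

With the workshop data, `toy` would be replaced by `cint.dat`, `cdep.dat`, or `canx.dat`.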