From data to disaggregation to decisions: Data manipulation and descriptives

Author

NYU Global TIES for Children

Step 1: Load R packages we will be using today

Remember to load your R packages every time you open your workspace in RStudio.

#| load packages

library(psych)
library(ggplot2)
library(gtsummary)
library(dplyr)
library(kfa)
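
If library() stops with a “there is no package called ...” error, that package has not been installed yet. Here is a minimal sketch for checking this up front, assuming the five packages above are the only ones you need. The names pkgs and missing.pkgs are ours, chosen just for illustration.

```r
# Packages used in this tutorial
pkgs <- c("psych", "ggplot2", "gtsummary", "dplyr", "kfa")

# requireNamespace() returns TRUE when a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of the packages that are not installed
missing.pkgs <- pkgs[!installed]

if (length(missing.pkgs) > 0) {
  message("Install these first: ", paste(missing.pkgs, collapse = ", "))
  # install.packages(missing.pkgs)  # uncomment to install (assumes the packages are available from your CRAN mirror)
} else {
  message("All packages are installed.")
}
```

You only need to install a package once, but you need to call library() in every new R session.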

Step 2: Read your data into R and check it

Before starting any analysis, it’s important to check that your data are properly formatted. There are several commands for doing this in R; we give two examples here.

#Load your data
dat <- read.csv(file.path("Data", "cint_data.csv"), as.is = TRUE)  #as.is = TRUE keeps text columns as text

#Check your data: Option 1
head(dat)

This will give you all variable names and the first six rows in the data frame you just loaded. It’s looking good!

#Check your data: Option 2
str(dat)

This lists every variable, its type, and its first few values, with one variable per line.
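
Beyond head() and str(), a few quick checks can catch common formatting problems. The sketch below uses a small made-up data frame (toy) rather than the real data, and assumes items are coded 0 to 3 as described in Step 4. To run it on your own data, replace toy with a data frame that contains only item columns.

```r
# Toy stand-in for the real data (hypothetical values)
toy <- data.frame(cint1 = c(0, 1, 3, 2),
                  cint2 = c(1, NA, 2, 5))

dim(toy)              # number of rows and columns
colSums(is.na(toy))   # number of missing values in each variable

# Count values outside the expected 0-3 response range in each variable
out.of.range <- sapply(toy, function(x) sum(!is.na(x) & !(x %in% 0:3)))
out.of.range
```

In this toy example, cint2 has one missing value and one out-of-range value (the 5), which would be worth investigating before going further.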

Step 3: Select your items and rename your data to facilitate descriptive analysis

In this step, you want to create a “mini data set” or data frame that contains only the items in your measure, not the demographic variables. This involves several sub-steps:

  1. Create a character vector with your item variable names: Think of this like a list of labels, with each label being an item variable name in your measure. It contains no data.

  2. Create a new mini data set or data frame: Tell R to take your original dataset (“dat”), and select the items from your list of labels (cint) to create a new data frame with the data from all internalizing items (cint.dat). Also create mini data frames for just the hypothesized depression items (cdep.dat) and just the hypothesized anxiety items (canx.dat).

  3. Check your new data frames: Make sure that everything looks right!

#|item data frame

#1. Create a character vector or list with your internalizing item names

cint<-paste0("cint", c(1, 2, 4, 11, 19, 21:24, 27:30))
print(cint)

## Also make separate lists of your hypothesized depression and anxiety items

cdep<-paste0("cint", c(1, 2, 4, 11, 27:30))
canx<-paste0("cint", c(19, 21:24))

#2. Create an internalizing item data frame

cint.dat<-dat %>%
select(all_of(cint))

## Also create separate dataframes for our hypothesized depression and anxiety items. 
cdep.dat<-dat %>%
select(all_of(cdep))

canx.dat<-dat %>%
select(all_of(canx))

#3. Check your new dataframes
str(cint.dat)
str(cdep.dat)
str(canx.dat)
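
If select(all_of(...)) stops with an error, it usually means that one of the names in your character vector is not a column in the data. A minimal sketch of how to find the culprit, using a made-up data frame (toy) and a shortened item list (items) that are for illustration only:

```r
# paste0() glues the prefix "cint" onto each number
items <- paste0("cint", c(1, 2, 4))   # "cint1" "cint2" "cint4"

# Toy data frame that is missing cint4 (hypothetical values)
toy <- data.frame(id = 1:3,
                  cint1 = c(0, 1, 2),
                  cint2 = c(3, 2, 1))

# Names we asked for that are not columns in the data
missing.items <- setdiff(items, names(toy))
missing.items
```

Here missing.items contains "cint4". An empty result (character(0)) means every item name was found.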

Now, we are going to create another dataframe in which we rename the items with their actual questions so that it is easier to interpret how people are responding to the items. To do this, we will:

  1. Create two new character vectors with both the item variable names and the item questions, one vector in Spanish and one in English. This takes a bit longer, but it helps in interpreting the results!

  2. Create more new data frames: This time we will tell R to take our internalizing data frame we created above (cint.dat) and rename the variables with their longer labels.

#|item data frame - descriptive

#1. Create character vectors with your item variable names and questions in Spanish and English. 

cint.labels.esp <- c(
"cint1: Se sienten tristes o decaídos",
"cint2: No disfrutan haciendo nada",
"cint4: Se sienten muy cansados y sin energía",
"cint11: Se sienten malhumorados o renegones",
"cint19: Temen interactuar con personas nuevas",
"cint21: Se preocupan cuando piensan que han hecho algo mal",
"cint22: Se preocupan cuando piensan que alguien está molesto con ellos",
"cint23: Se preocupan por lo que otras personas piensan de ellos",
"cint24: Se preocupan por cometer errores",
"cint27: Se sienten solos incluso cuando están con otras personas",
"cint28: Les molestan cosas que antes no les molestaban",
"cint29: Se van a su habitación y lloran",
"cint30: Se sienten intranquilos y caminan de un lado a otr"
)

cint.labels.en <- c(
"cint1: Feel sad or down",
"cint2: Do not enjoy anything anymore",
"cint4: Feel tired and drained of energy",
"cint11: Feel moody or grumpy",
"cint19: Afraid to interact with new people",
"cint21: Worry when they think they have done something poorly",
"cint22: Worry when they think someone is angry at them",
"cint23: Worry about what other people think of them",
"cint24: Worry about making mistakes",
"cint27: Feel lonely around other peoples",
"cint28: Get bothered by things that didn't bother them before",
"cint29: Go to their room and cry",
"cint30: Feel restless and walk around"
)

#2. Rename the items in your mini dataframe (cint.dat)

cint.dat.items.esp <- cint.dat%>%  #tells R to take the mini dataset created above
    rename_with(~cint.labels.esp)  #tells R to rename the cint items with their labels.

cint.dat.items.en <- cint.dat%>%  #tells R to take the mini dataset created above
    rename_with(~cint.labels.en)  #tells R to rename the cint items with their labels.

str(cint.dat.items.esp)
str(cint.dat.items.en)
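
Because the renaming above replaces names by position, the labels must be in exactly the same order as the columns. One way to double-check this, sketched here with two made-up labels, is to compare the text before the colon in each label with the column names. The names labels, cols, and label.prefix are ours.

```r
# Two example labels and the column names they should match (for illustration)
labels <- c("cint1: Feel sad or down",
            "cint2: Do not enjoy anything anymore")
cols <- c("cint1", "cint2")

# Keep only the text before the first colon in each label
label.prefix <- sub(":.*$", "", labels)

# TRUE only if every label lines up with its column, in order
identical(label.prefix, cols)
```

With the real objects, the same comparison would use cint.labels.en (or cint.labels.esp) in place of labels and names(cint.dat) in place of cols.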

Step 4: Calculate descriptive statistics for your items

Let’s start with the percentage of children who endorsed each response option for each item. Remember, children were asked whether each statement was true for them “never” (0), “rarely” (1), “sometimes” (2), or “almost always” (3).

#|item frequencies

##ESP
cint.table.freq.esp<-tbl_summary(cint.dat.items.esp)%>%
      modify_header(label ~ "**Item**")
knitr::kable(cint.table.freq.esp)

| Item | N = 840 |
|:-----|:--------|
| cint1: Se sienten tristes o decaídos | NA |
| 0 | 112 (13%) |
| 1 | 305 (36%) |
| 2 | 304 (36%) |
| 3 | 119 (14%) |
| cint2: No disfrutan haciendo nada | NA |
| 0 | 218 (26%) |
| 1 | 266 (32%) |
| 2 | 224 (27%) |
| 3 | 132 (16%) |
| cint4: Se sienten muy cansados y sin energía | NA |
| 0 | 124 (15%) |
| 1 | 260 (31%) |
| 2 | 318 (38%) |
| 3 | 138 (16%) |
| cint11: Se sienten malhumorados o renegones | NA |
| 0 | 135 (16%) |
| 1 | 204 (24%) |
| 2 | 355 (42%) |
| 3 | 146 (17%) |
| cint19: Temen interactuar con personas nuevas | NA |
| 0 | 190 (23%) |
| 1 | 186 (22%) |
| 2 | 249 (30%) |
| 3 | 215 (26%) |
| cint21: Se preocupan cuando piensan que han hecho algo mal | NA |
| 0 | 81 (9.6%) |
| 1 | 163 (19%) |
| 2 | 299 (36%) |
| 3 | 297 (35%) |
| cint22: Se preocupan cuando piensan que alguien está molesto con ellos | NA |
| 0 | 139 (17%) |
| 1 | 182 (22%) |
| 2 | 329 (39%) |
| 3 | 190 (23%) |
| cint23: Se preocupan por lo que otras personas piensan de ellos | NA |
| 0 | 248 (30%) |
| 1 | 197 (23%) |
| 2 | 258 (31%) |
| 3 | 137 (16%) |
| cint24: Se preocupan por cometer errores | NA |
| 0 | 98 (12%) |
| 1 | 159 (19%) |
| 2 | 320 (38%) |
| 3 | 263 (31%) |
| cint27: Se sienten solos incluso cuando están con otras personas | NA |
| 0 | 285 (34%) |
| 1 | 204 (24%) |
| 2 | 214 (25%) |
| 3 | 137 (16%) |
| cint28: Les molestan cosas que antes no les molestaban | NA |
| 0 | 188 (22%) |
| 1 | 231 (28%) |
| 2 | 278 (33%) |
| 3 | 143 (17%) |
| cint29: Se van a su habitación y lloran | NA |
| 0 | 424 (50%) |
| 1 | 179 (21%) |
| 2 | 164 (20%) |
| 3 | 73 (8.7%) |
| cint30: Se sienten intranquilos y caminan de un lado a otr | NA |
| 0 | 304 (36%) |
| 1 | 202 (24%) |
| 2 | 215 (26%) |
| 3 | 119 (14%) |

##EN
cint.table.freq.en<-tbl_summary(cint.dat.items.en)%>%
      modify_header(label ~ "**Item**")
knitr::kable(cint.table.freq.en)

| Item | N = 840 |
|:-----|:--------|
| cint1: Feel sad or down | NA |
| 0 | 112 (13%) |
| 1 | 305 (36%) |
| 2 | 304 (36%) |
| 3 | 119 (14%) |
| cint2: Do not enjoy anything anymore | NA |
| 0 | 218 (26%) |
| 1 | 266 (32%) |
| 2 | 224 (27%) |
| 3 | 132 (16%) |
| cint4: Feel tired and drained of energy | NA |
| 0 | 124 (15%) |
| 1 | 260 (31%) |
| 2 | 318 (38%) |
| 3 | 138 (16%) |
| cint11: Feel moody or grumpy | NA |
| 0 | 135 (16%) |
| 1 | 204 (24%) |
| 2 | 355 (42%) |
| 3 | 146 (17%) |
| cint19: Afraid to interact with new people | NA |
| 0 | 190 (23%) |
| 1 | 186 (22%) |
| 2 | 249 (30%) |
| 3 | 215 (26%) |
| cint21: Worry when they think they have done something poorly | NA |
| 0 | 81 (9.6%) |
| 1 | 163 (19%) |
| 2 | 299 (36%) |
| 3 | 297 (35%) |
| cint22: Worry when they think someone is angry at them | NA |
| 0 | 139 (17%) |
| 1 | 182 (22%) |
| 2 | 329 (39%) |
| 3 | 190 (23%) |
| cint23: Worry about what other people think of them | NA |
| 0 | 248 (30%) |
| 1 | 197 (23%) |
| 2 | 258 (31%) |
| 3 | 137 (16%) |
| cint24: Worry about making mistakes | NA |
| 0 | 98 (12%) |
| 1 | 159 (19%) |
| 2 | 320 (38%) |
| 3 | 263 (31%) |
| cint27: Feel lonely around other peoples | NA |
| 0 | 285 (34%) |
| 1 | 204 (24%) |
| 2 | 214 (25%) |
| 3 | 137 (16%) |
| cint28: Get bothered by things that didn’t bother them before | NA |
| 0 | 188 (22%) |
| 1 | 231 (28%) |
| 2 | 278 (33%) |
| 3 | 143 (17%) |
| cint29: Go to their room and cry | NA |
| 0 | 424 (50%) |
| 1 | 179 (21%) |
| 2 | 164 (20%) |
| 3 | 73 (8.7%) |
| cint30: Feel restless and walk around | NA |
| 0 | 304 (36%) |
| 1 | 202 (24%) |
| 2 | 215 (26%) |
| 3 | 119 (14%) |
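
To see where the percentages in these tables come from, here is a minimal base R sketch for a single item. The responses in resp are made up for illustration, and the response labels follow the 0 to 3 coding described at the start of this step.

```r
# Hypothetical responses from eight children to one item
resp <- c(0, 1, 1, 2, 3, 2, 1, 0)

# Attach the response labels; levels = 0:3 keeps any unused option in the table
resp.f <- factor(resp, levels = 0:3,
                 labels = c("never", "rarely", "sometimes", "almost always"))

counts <- table(resp.f)                       # how many children chose each option
pct <- round(100 * prop.table(counts), 1)     # the same information as percentages
pct
```

In this toy example, three of the eight children answered “rarely”, which is 37.5%.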

Now let’s examine children’s average responses on each of the items as well as the variation in responses.

#|item means and SDs

##ESP
cint.table.msd.esp<- tbl_summary(cint.dat.items.esp,
      type=all_of(cint.labels.esp)~"continuous", ## tells R that all of your items - remember to use the character vector with the labels! - are continuous
      statistic = all_continuous() ~ "{mean} ({sd})",  ## tells R that you want the means and standard deviations for all of your continuous items
      digits=all_of(cint.labels.esp)~2, ##tells R to print means and standard deviations to two decimal places
      missing = "no"  ##tells R that you don't want missing data listed
      )%>%
      modify_header(label ~ "**Item**")

knitr::kable(cint.table.msd.esp)

| Item | N = 840 |
|:-----|:--------|
| cint1: Se sienten tristes o decaídos | 1.51 (0.89) |
| cint2: No disfrutan haciendo nada | 1.32 (1.03) |
| cint4: Se sienten muy cansados y sin energía | 1.56 (0.93) |
| cint11: Se sienten malhumorados o renegones | 1.61 (0.95) |
| cint19: Temen interactuar con personas nuevas | 1.58 (1.10) |
| cint21: Se preocupan cuando piensan que han hecho algo mal | 1.97 (0.97) |
| cint22: Se preocupan cuando piensan que alguien está molesto con ellos | 1.68 (1.00) |
| cint23: Se preocupan por lo que otras personas piensan de ellos | 1.34 (1.07) |
| cint24: Se preocupan por cometer errores | 1.89 (0.98) |
| cint27: Se sienten solos incluso cuando están con otras personas | 1.24 (1.09) |
| cint28: Les molestan cosas que antes no les molestaban | 1.45 (1.02) |
| cint29: Se van a su habitación y lloran | 0.86 (1.02) |
| cint30: Se sienten intranquilos y caminan de un lado a otr | 1.18 (1.07) |

##EN
cint.table.msd.en<- tbl_summary(cint.dat.items.en,
      type=all_of(cint.labels.en)~"continuous", ## tells R that all of your items - remember to use the character vector with the labels! - are continuous
      statistic = all_continuous() ~ "{mean} ({sd})",  ## tells R that you want the means and standard deviations for all of your continuous items
      digits=all_of(cint.labels.en)~2, ##tells R to print means and standard deviations to two decimal places
      missing = "no"  ##tells R that you don't want missing data listed
      )%>%
      modify_header(label ~ "**Item**")

knitr::kable(cint.table.msd.en)

| Item | N = 840 |
|:-----|:--------|
| cint1: Feel sad or down | 1.51 (0.89) |
| cint2: Do not enjoy anything anymore | 1.32 (1.03) |
| cint4: Feel tired and drained of energy | 1.56 (0.93) |
| cint11: Feel moody or grumpy | 1.61 (0.95) |
| cint19: Afraid to interact with new people | 1.58 (1.10) |
| cint21: Worry when they think they have done something poorly | 1.97 (0.97) |
| cint22: Worry when they think someone is angry at them | 1.68 (1.00) |
| cint23: Worry about what other people think of them | 1.34 (1.07) |
| cint24: Worry about making mistakes | 1.89 (0.98) |
| cint27: Feel lonely around other peoples | 1.24 (1.09) |
| cint28: Get bothered by things that didn’t bother them before | 1.45 (1.02) |
| cint29: Go to their room and cry | 0.86 (1.02) |
| cint30: Feel restless and walk around | 1.18 (1.07) |
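
If you want the means and standard deviations as plain numbers rather than a formatted table (for example, to sort items or reuse the values later), base R can compute them directly. This is a minimal sketch on a made-up data frame (toy); the name item.msd is ours.

```r
# Toy item data with one missing response (hypothetical values)
toy <- data.frame(cint1 = c(0, 1, 2, 3),
                  cint2 = c(1, 1, 2, NA))

# One row per item; na.rm = TRUE ignores missing responses
item.msd <- data.frame(
  mean = sapply(toy, mean, na.rm = TRUE),
  sd   = sapply(toy, sd, na.rm = TRUE)
)

round(item.msd, 2)
```

Replacing toy with cint.dat should give one row per internalizing item.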

Step 5: Create histograms of your items

Sometimes it’s easier to interpret patterns in item responses if we present the information visually. So let’s also look at the histograms.

#|item histograms

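#a function to create a set of histograms (you don't need to change anything)
gen.hist <- function(dat_s, nrow = 3, ncol = 3){
  #arrange the plots in a grid and shrink the margins; restore the old settings on exit
  old.par <- par(mfrow = c(nrow, ncol),
                 mar = c(4, 1, 2, 1))
  on.exit(par(old.par))
  #draw one histogram per item (column), labelled with the item name
  for (i in seq_len(base::ncol(dat_s))) {
    hist(dat_s[[i]], main = "", xlab = names(dat_s)[i], col = "dark blue")
  }
}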

gen.hist(cint.dat)
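
Because each item has only four possible responses, a bar chart of the counts is another way to show the same information, with exactly one bar per response option. A minimal sketch for one item, using made-up responses (resp):

```r
# Hypothetical responses from eight children to one item
resp <- c(0, 1, 1, 2, 3, 2, 1, 0)

# Count each response option; levels = 0:3 keeps options nobody chose as 0
counts <- table(factor(resp, levels = 0:3))

# One bar per response option
barplot(counts,
        xlab = "Response (0 = never, 3 = almost always)",
        ylab = "Number of children",
        col = "dark blue")
```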

Step 6: Examine item correlations

Now we want to move from thinking about each item individually to thinking about whether and how the items are associated with each other. We can use the psych package in R to easily create a correlation matrix, but with this many variables it’s hard to interpret.

#|item correlation table

#item correlation table
corr.test(cint.dat)
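
One way to make a large correlation table a little easier to scan is to round the values and blank out the upper triangle, which only repeats the lower one. The sketch below uses base R's cor() on simulated data (toy) rather than corr.test() on the real items, so the numbers themselves are meaningless.

```r
# Simulated data: b is built from a, c is unrelated (for illustration only)
set.seed(1)
toy <- data.frame(a = rnorm(50))
toy$b <- toy$a + rnorm(50)
toy$c <- rnorm(50)

# Pairwise correlations, rounded to two decimal places
r <- round(cor(toy, use = "pairwise.complete.obs"), 2)

# Blank out the upper triangle, which duplicates the lower one
r[upper.tri(r)] <- NA
print(r, na.print = "")
```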

So let’s try a correlation heatmap instead.

#|item correlation heatmap: internalizing

#item correlation heatmap: internalizing
corPlot(cint.dat,
        numbers=TRUE,
        upper=FALSE,
        diag=FALSE,
        cex=.8, 
        main="cint correlations", 
        stars=TRUE) 

Even this is a lot! Let’s look at the depression and anxiety items separately.

#|item correlation heatmap: depression/anxiety

#item correlation heatmap: depression/anxiety
corPlot(cdep.dat,
        numbers=TRUE,
        upper=FALSE,
        diag=FALSE,
        cex=.8, 
        main="cdep correlations", 
        stars=TRUE) 

corPlot(canx.dat,
        numbers=TRUE,
        upper=FALSE,
        diag=FALSE,
        cex=.8, 
        main="canx correlations", 
        stars=TRUE)
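
To put a single number on what each heatmap shows, you could average the correlations among the items in a set. This is a minimal sketch of that idea; the helper mean.r is ours (it is not part of psych), and the toy data are made up so that the answer is easy to check by hand.

```r
# Hypothetical helper: average of the unique pairwise correlations among the columns of d
mean.r <- function(d) {
  r <- cor(d, use = "pairwise.complete.obs")
  mean(r[lower.tri(r)])   # lower triangle = each pair of items counted once
}

# Toy data: x and y move together perfectly, z moves in the opposite direction
toy <- data.frame(x = 1:5,
                  y = c(2, 4, 6, 8, 10),
                  z = c(5, 4, 3, 2, 1))

mean.r(toy)   # average of 1, -1, and -1
```

Under these assumptions, mean.r(cdep.dat) and mean.r(canx.dat) would give the average inter-item correlation for the depression and anxiety items.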