To install the package, please use:
library(devtools)
install_github("doomlab/ViSe")
This installation will give you the latest version of the package, regardless of status on CRAN.
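If a CRAN release is currently available, the standard installation also works:
# alternatively, install the CRAN release (assumes one is currently on CRAN)
install.packages("ViSe")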
library(ViSe)
library(cowplot)
library(ggplot2)
library(plotly)
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
To demonstrate the use of visual sensitivity analysis, we use the effect of child maltreatment on the extent of mental health problems in terms of internalising and externalising behaviour. The study by Kisely and colleagues (Kisely et al., 2018) was based on a general population sample in Brisbane, Australia, and compared 3554 mother-child pairs without ‘substantiated child maltreatment’ to, for example, 73 pairs with child neglect. Note that the results vary across the types of maltreatment assessed; we chose child neglect because its results (a smaller but still considerable l after adjustment) are particularly suited to illustrating sensitivity analysis. Maltreatment was assessed ‘by linkage to state child protection agency data’. Internalising and externalising behaviours were measured using the Youth Self-Report (YSR) scale (Achenbach & Rescorla, 2001) at around the age of 21. The study reports unadjusted mean differences and mean differences adjusted for ‘gender, parental ethnicity, maternal age, family income, maternal relationship status, maternal education, youth income level, youth education level, youth marital status’ (likely based on ordinary least squares regression, although the paper does not specify).
See the supplemental document at https://static.cambridge.org/content/id/urn:cambridge.org:id:article:S0007125018002076/resource/name/S0007125018002076sup001.docx
To obtain the estimates and two-tailed 95% confidence intervals on the effect size d scale (via d = M_difference / SD_total, similar to Glass’ Δ), we divided the reported unadjusted and adjusted mean differences for internalising and externalising by the respective standard deviations of the total sample.
internal_unadjusted <- list(
mean_diff = 3.68,
lower_diff = 1.73,
upper_diff = 5.62,
sd = 8.29
)
internal_adjusted <- list(
mean_diff = 2.73,
lower_diff = 0.77,
upper_diff = 4.69,
sd = 8.28
)
external_unadjusted <- list(
mean_diff = 3.72,
lower_diff = 2.13,
upper_diff = 5.32,
sd = 6.81
)
external_adjusted <- list(
mean_diff = 3.10,
lower_diff = 1.49,
upper_diff = 4.71,
sd = 6.81
)
list_values <- list(internal_unadjusted, internal_adjusted,
external_unadjusted, external_adjusted)
list_names <- c("internal_unadjusted", "internal_adjusted",
"external_unadjusted", "external_adjusted")
names(list_values) <- list_names
for (i in list_names){
list_values[[i]][["d"]] <- list_values[[i]][["mean_diff"]] / list_values[[i]][["sd"]]
list_values[[i]][["lower_d"]] <- list_values[[i]][["lower_diff"]] / list_values[[i]][["sd"]]
list_values[[i]][["upper_d"]] <- list_values[[i]][["upper_diff"]] / list_values[[i]][["sd"]]
}
unlist(list_values)
#> internal_unadjusted.mean_diff internal_unadjusted.lower_diff
#> 3.68000000 1.73000000
#> internal_unadjusted.upper_diff internal_unadjusted.sd
#> 5.62000000 8.29000000
#> internal_unadjusted.d internal_unadjusted.lower_d
#> 0.44390832 0.20868516
#> internal_unadjusted.upper_d internal_adjusted.mean_diff
#> 0.67792521 2.73000000
#> internal_adjusted.lower_diff internal_adjusted.upper_diff
#> 0.77000000 4.69000000
#> internal_adjusted.sd internal_adjusted.d
#> 8.28000000 0.32971014
#> internal_adjusted.lower_d internal_adjusted.upper_d
#> 0.09299517 0.56642512
#> external_unadjusted.mean_diff external_unadjusted.lower_diff
#> 3.72000000 2.13000000
#> external_unadjusted.upper_diff external_unadjusted.sd
#> 5.32000000 6.81000000
#> external_unadjusted.d external_unadjusted.lower_d
#> 0.54625551 0.31277533
#> external_unadjusted.upper_d external_adjusted.mean_diff
#> 0.78120411 3.10000000
#> external_adjusted.lower_diff external_adjusted.upper_diff
#> 1.49000000 4.71000000
#> external_adjusted.sd external_adjusted.d
#> 6.81000000 0.45521292
#> external_adjusted.lower_d external_adjusted.upper_d
#> 0.21879589 0.69162996
To visualise the lower confidence limit l, we must obtain the lower bound of the confidence interval for our proposed effect size. ViSe includes a function calculate_d() that calculates d from t.test() output, dataframes, individual vectors of data, sample statistics (M, SD, and n for each group), t-test values, or a pre-calculated d value. In this example, we have our d value from the original research, along with the sample sizes from the study, which can be used to calculate the two-tailed (dlow_central) or one-tailed (done_low_central) lower confidence limit. The package vignette shows all possible ways to calculate these values from data and includes the non-central confidence intervals for effect size d as well (below).
internal_adj_output <- calculate_d(
d = list_values$internal_adjusted$d, # adjusted d value
a = .05, # alpha for confidence interval
lower = TRUE, # you expect d to be positive
n1 = 71, # sample size group 1
n2 = 3653 # sample size group 2
)
internal_adj_output$dlow_central
#> [1] 0.09465985
internal_adj_output$done_low_central
#> [1] 0.1324648
# note: calculate_d() also provides noncentral t confidence intervals
# in this case, they are unusable because d has been calculated as
# mean difference / SD of the total sample rather than
# mean difference / spooled, so the approximation of t and the
# noncentral limits is not appropriate
All the ways to use calculate_d():
# from dataframe
calculate_d(
df = mtcars,
x_col = "am",
y_col = "hp",
a = .05,
lower = TRUE
)
#> $d
#> [1] -0.501465
#>
#> $dlow
#> [1] -1.2067
#>
#> $dhigh
#> [1] 0.2260463
#>
#> $dlow_central
#> [1] -1.247618
#>
#> $dhigh_central
#> [1] 0.2446883
#>
#> $done_low
#> [1] -1.091478
#>
#> $done_low_central
#> [1] -1.121567
#>
#> $M1
#> [1] 126.8462
#>
#> $sd1
#> [1] 84.06232
#>
#> $se1
#> [1] 23.31469
#>
#> $M1low
#> [1] 76.0478
#>
#> $M1high
#> [1] 177.6445
#>
#> $M2
#> [1] 160.2632
#>
#> $sd2
#> [1] 53.9082
#>
#> $se2
#> [1] 19.28522
#>
#> $M2low
#> [1] 119.7464
#>
#> $M2high
#> [1] 200.7799
#>
#> $spooled
#> [1] 67.60359
#>
#> $sepooled
#> [1] 24.33304
#>
#> $n1
#> [1] 13
#>
#> $n2
#> [1] 19
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] -1.373318
#>
#> $p
#> [1] 0.1798309
#>
#> $estimate
#> [1] "$d_s$ = -0.50, 95\\% CI [-1.21, 0.23]"
#>
#> $statistic
#> [1] "$t$(30) = -1.37, $p$ = .180"
# from two columns
x <- mtcars$am
y <- mtcars$hp
calculate_d(
x_col = x,
y_col = y,
a = .05,
lower = TRUE
)
#> $d
#> [1] -0.501465
#>
#> $dlow
#> [1] -1.2067
#>
#> $dhigh
#> [1] 0.2260463
#>
#> $dlow_central
#> [1] -1.247618
#>
#> $dhigh_central
#> [1] 0.2446883
#>
#> $done_low
#> [1] -1.091478
#>
#> $done_low_central
#> [1] -1.121567
#>
#> $M1
#> [1] 126.8462
#>
#> $sd1
#> [1] 84.06232
#>
#> $se1
#> [1] 23.31469
#>
#> $M1low
#> [1] 76.0478
#>
#> $M1high
#> [1] 177.6445
#>
#> $M2
#> [1] 160.2632
#>
#> $sd2
#> [1] 53.9082
#>
#> $se2
#> [1] 19.28522
#>
#> $M2low
#> [1] 119.7464
#>
#> $M2high
#> [1] 200.7799
#>
#> $spooled
#> [1] 67.60359
#>
#> $sepooled
#> [1] 24.33304
#>
#> $n1
#> [1] 13
#>
#> $n2
#> [1] 19
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] -1.373318
#>
#> $p
#> [1] 0.1798309
#>
#> $estimate
#> [1] "$d_s$ = -0.50, 95\\% CI [-1.21, 0.23]"
#>
#> $statistic
#> [1] "$t$(30) = -1.37, $p$ = .180"
# from summary statistics
calculate_d(m1 = 14.37, # neglect mean
sd1 = 10.716, # neglect sd
n1 = 71, # neglect n
m2 = 10.69, # none mean
sd2 = 8.219, # none sd
n2 = 3653, # none n
a = .05, # alpha/confidence interval
lower = TRUE) # lower or upper bound
#> $d
#> [1] 0.4448249
#>
#> $dlow
#> [1] 0.2097233
#>
#> $dhigh
#> [1] 0.6798669
#>
#> $dlow_central
#> [1] 0.2096767
#>
#> $dhigh_central
#> [1] 0.6799731
#>
#> $done_low
#> [1] 0.2475166
#>
#> $done_low_central
#> [1] 0.2474974
#>
#> $M1
#> [1] 14.37
#>
#> $sd1
#> [1] 10.716
#>
#> $se1
#> [1] 1.271755
#>
#> $M1low
#> [1] 11.83356
#>
#> $M1high
#> [1] 16.90644
#>
#> $M2
#> [1] 10.69
#>
#> $sd2
#> [1] 8.219
#>
#> $se2
#> [1] 0.135986
#>
#> $M2low
#> [1] 10.42338
#>
#> $M2high
#> [1] 10.95662
#>
#> $spooled
#> [1] 8.272918
#>
#> $sepooled
#> [1] 0.9913101
#>
#> $n1
#> [1] 71
#>
#> $n2
#> [1] 3653
#>
#> $df
#> [1] 3722
#>
#> $t
#> [1] 3.712259
#>
#> $p
#> [1] 0.0002084243
#>
#> $estimate
#> [1] "$d_s$ = 0.44, 95\\% CI [0.21, 0.68]"
#>
#> $statistic
#> [1] "$t$(3722) = 3.71, $p$ < .001"
# from t-test model
output <- t.test(mtcars$hp ~ mtcars$am, var.equal = TRUE)
n_values <- tapply(mtcars$hp, mtcars$am, length)
calculate_d(
model = output,
n1 = unname(n_values[1]),
n2 = unname(n_values[2]),
a = .05,
lower = TRUE
)
#> $d
#> [1] 0.501465
#>
#> $dlow
#> [1] -0.2260463
#>
#> $dhigh
#> [1] 1.2067
#>
#> $dlow_central
#> [1] -0.2446883
#>
#> $dhigh_central
#> [1] 1.247618
#>
#> $done_low
#> [1] -0.1109204
#>
#> $done_low_central
#> [1] -0.1186368
#>
#> $M1
#> NULL
#>
#> $sd1
#> NULL
#>
#> $se1
#> NULL
#>
#> $M1low
#> NULL
#>
#> $M1high
#> NULL
#>
#> $M2
#> NULL
#>
#> $sd2
#> NULL
#>
#> $se2
#> NULL
#>
#> $M2low
#> NULL
#>
#> $M2high
#> NULL
#>
#> $spooled
#> NULL
#>
#> $sepooled
#> NULL
#>
#> $n1
#> [1] 19
#>
#> $n2
#> [1] 13
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] 1.373318
#>
#> $p
#> [1] 0.1798309
#>
#> $estimate
#> [1] "$d_s$ = 0.50, 95\\% CI [-0.23, 1.21]"
#>
#> $statistic
#> [1] "$t$(30) = 1.37, $p$ = .180"
# from t-values
calculate_d(
t = 1.37,
n1 = unname(n_values[1]),
n2 = unname(n_values[2]),
a = .05,
lower = TRUE
)
#> $d
#> [1] 0.5002533
#>
#> $dlow
#> [1] -0.2271793
#>
#> $dhigh
#> [1] 1.205462
#>
#> $dlow_central
#> [1] -0.2458469
#>
#> $dhigh_central
#> [1] 1.246353
#>
#> $done_low
#> [1] -0.1120615
#>
#> $done_low_central
#> [1] -0.1198044
#>
#> $M1
#> NULL
#>
#> $sd1
#> NULL
#>
#> $se1
#> NULL
#>
#> $M1low
#> NULL
#>
#> $M1high
#> NULL
#>
#> $M2
#> NULL
#>
#> $sd2
#> NULL
#>
#> $se2
#> NULL
#>
#> $M2low
#> NULL
#>
#> $M2high
#> NULL
#>
#> $spooled
#> NULL
#>
#> $sepooled
#> NULL
#>
#> $n1
#> [1] 19
#>
#> $n2
#> [1] 13
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] 1.37
#>
#> $p
#> [1] 0.1808537
#>
#> $estimate
#> [1] "$d_s$ = 0.50, 95\\% CI [-0.23, 1.21]"
#>
#> $statistic
#> [1] "$t$(30) = 1.37, $p$ = .181"
Not all research studies use d as the effect size of interest, and ViSe provides functionality to convert between effect sizes. The other_to_d() function can be used to convert the effect sizes f, f2, NNT, r, probability of superiority, U1, U2, U3, and proportional overlap into d.
All options of other_to_d() (a usage sketch follows the list):
f = NULL,
f2 = NULL,
nnt = NULL,
r = NULL,
prob = NULL,
prop_u1 = NULL,
prop_u2 = NULL,
prop_u3 = NULL,
prop_overlap = NULL
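A minimal usage sketch, assuming each call supplies a single effect size via one of the argument names listed above:
# convert a correlation of .30 into d
other_to_d(r = .30)
# convert a number needed to treat of 10 into d
other_to_d(nnt = 10)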
The effect size d value can then be used to visualize all effects and their conversions at once in the visualize_effects() function. You can use the percent, color, and font family options to adjust the resulting graph for readability and color scheme.
visualize_effects(d = list_values$internal_adjusted$d,
circle_color = "lightblue",
circle_fill = "gray",
percent_color = "darkblue",
percent_size = 10,
text_color = "black",
font_family = "Times")
#> $graph
# note graphs look better scaled, try saving them
# ggsave(filename = "visualize_effects.png")
# you can make very ugly graphs if you want
visualize_effects(d = .2,
circle_color = "green",
circle_fill = "orange",
percent_color = "pink",
percent_size = 20,
text_color = "purple",
font_family = "Arial")
#> $graph
This package uses the following conversion functions, which can also be called separately:
d_to_r(d = list_values$internal_adjusted$d)
#> [1] 0.1626596
d_to_f2(d = list_values$internal_adjusted$d)
#> $f
#> [1] 0.1648551
#>
#> $f2
#> [1] 0.02717719
d_to_nnt(d = list_values$internal_adjusted$d)
#> [1] 5.424537
probability_superiority(d = list_values$internal_adjusted$d)
#> [1] 0.5921738
proportion_overlap(d = list_values$internal_adjusted$d)
#> $u1
#> [1] 0.2315626
#>
#> $u2
#> [1] 0.565471
#>
#> $u3
#> [1] 0.6291905
#>
#> $p_o
#> [1] 0.8690581
Now that we have an idea of our d value, the lower confidence limit l, and other potential effect sizes, we can create a sensitivity plot of the effect size to bias. To visualize different d values and their representation of the distribution overlap, use estimate_d() to examine different effect sizes:
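For example, a minimal sketch (we assume estimate_d() takes the effect size through a d argument, mirroring calculate_d() above):
# visualize the distribution overlap for the adjusted internal d value
estimate_d(d = list_values$internal_adjusted$d)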
Similarly, estimate_r() shows a scatterplot of the entered correlation coefficient to visualize the relation between two variables:
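For example, another minimal sketch (the r argument name is an assumption):
# scatterplot visualizing a correlation of .30
estimate_r(r = .30)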
You can then use the two estimation functions to create a graph that shows which combinations of effect size and r would indicate a causal effect. You can enter any effect size from our conversion options, and these will be converted to d with labels for inspection. The shaded area represents the region that would be considered the effect, and the points represent the entered combinations of r and effect size. Points in the shaded area would be considered sensitive to bias.
You can use the plotly library to hover over those dots to see their values (this is also embedded in our shiny app). Note: not run here to keep the package CRAN compatible (plotly makes the HTML files large).
visual_c_mapped <-
# your lower confidence limit required
visualize_c_map(dlow = list_values$internal_adjusted$lower_d,
# correlation values required
r_values = c(.1, .4, .3),
# other effect sizes you want to plot
d_values = c(.2, .8),
nnt_values = c(60),
# if you think d will be positive
lower = TRUE,
# as many values as the max number effects
point_colors = c("red", "green", "blue"),
# a size for the shapes
size = 2,
# shape 1
shape_1 = 2,
# shape 2, make these the same number if you
# want the shapes overlapping
# we think two different ones helps readability
shape_2 = 3,
# color of the background highlighted area
ribbon_color = "lightblue"
)
visual_c_mapped$graph
ggsave(filename = "visualize_c_map.png", width = 8,
height = 6, dpi = 300)
# ggplotly(visual_c_mapped$graph)
All options for the graph, including their defaults:
visualize_c_map(
dlow,
r_values,
d_values = NULL,
f_values = NULL,
f2_values = NULL,
nnt_values = NULL,
prob_values = NULL,
prop_u1_values = NULL,
prop_u2_values = NULL,
prop_u3_values = NULL,
prop_overlap_values = NULL,
lower = TRUE,
point_colors = c("red", "green", "blue"),
size = 2,
shape_1 = 2,
shape_2 = 3,
ribbon_color = "lightblue"
)