I have a dummy dataset `df` which contains data on the treatment, disease grade and survival outcome of 50 people.
```r
library(Hmisc)

# Generate initial cases.
set.seed(123)
n <- 50
df <- data.frame(
  treatment = sample(0:1, n, replace = TRUE),
  grade = sample(0:1, n, replace = TRUE),
  death = sample(0:1, n, replace = TRUE)
)

# Add labels to columns.
label(df$treatment) <- "Drug Intervention"
label(df$grade) <- "Cancer Grade"
label(df$death) <- "Death Occurrence"

# Factor the values.
df$treatment <- factor(df$treatment, levels = c(0, 1), labels = c("Drug A", "Drug B"))
df$grade <- factor(df$grade, levels = c(0, 1), labels = c("Grade 1", "Grade 2"))
df$death <- factor(df$death, levels = c(0, 1), labels = c("Survived", "Died"))
```
I want to generate a 2 x 3 cross table which shows the number of people who survived by grade and treatment. I am using `tbl_strata()` and `tbl_summary()` from gtsummary to do this.
This code is getting close to the desired outcome:
```r
library(tidyverse)
library(gtsummary)

df %>%
  tbl_strata(
    strata = grade,
    ~ .x %>%
      tbl_summary(
        by = death,
        percent = "row"
      )
  )
```
This produces a table that looks like this:
Characteristic | Grade 1 Survived, N = 10 | Grade 1 Died, N = 17 | Grade 2 Survived, N = 9 | Grade 2 Died, N = 14 |
---|---|---|---|---|
Treatment | ||||
Drug A | 5 (28%) | 13 (72%) | 5 (42%) | 7 (58%) |
Drug B | 5 (56%) | 4 (44%) | 4 (36%) | 7 (64%) |
However, the desired output is:
Treatment | Grade 1 Died | Grade 2 Died | P-Value |
---|---|---|---|
Drug A | 13 (72%) | 7 (58%) | 0.46 |
Drug B | 4 (44%) | 7 (64%) | 0.65 |
How can I use gtsummary to filter out or collapse the `Survived` columns to simplify the table, and is it possible to add a Fisher's exact p-value for the relationship between death and grade (computed separately for each of the two drugs)?
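
To make the requested test concrete, here is a minimal sketch of the per-drug Fisher's exact test done outside gtsummary with base R's `fisher.test()`. It assumes the same seed and column names as the dummy data above (the Hmisc labels are left out because they do not affect the counts), and the name `p_values` is only illustrative. It is not a gtsummary solution, just the calculation the P-Value column is meant to report.

```r
# Recreate the dummy data (same seed and sampling order as above).
set.seed(123)
n <- 50
df <- data.frame(
  treatment = sample(0:1, n, replace = TRUE),
  grade = sample(0:1, n, replace = TRUE),
  death = sample(0:1, n, replace = TRUE)
)
df$treatment <- factor(df$treatment, levels = c(0, 1), labels = c("Drug A", "Drug B"))
df$grade <- factor(df$grade, levels = c(0, 1), labels = c("Grade 1", "Grade 2"))
df$death <- factor(df$death, levels = c(0, 1), labels = c("Survived", "Died"))

# Split the data by drug, then run one Fisher's exact test per drug
# on the 2 x 2 table of grade by death.
p_values <- sapply(split(df, df$treatment), function(d) {
  fisher.test(table(d$grade, d$death))$p.value
})

# Named vector with one p-value per drug.
round(p_values, 2)
```

The result is a named numeric vector with one entry for `Drug A` and one for `Drug B`, which corresponds to the row layout of the desired table.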