To demonstrate how eGFR slopes can be inferred with a joint modelling approach using open-source R packages, we first create a synthetic dataset. We simulate a cohort of 100 patients sampled from two disease groups, “Disease A” and “Disease B”.

The support functions used to simulate the data are in the script code/syntheticData.R.

Load necessary libraries:

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<>) to force all conflicts to become errors

Load support functions:

source("code/syntheticData.R")

Simulate the dataset

Simulate patient metadata for our synthetic cohort:

patient_metadata <- sim_cohort_metadata(n_patients = 100)
patient_id  sex  disease  age_at_biopsy
1           M    B        29
2           M    A        79
3           M    B        35
4           F    B        60
5           M    A        43
6           F    B        80

Simulate longitudinal data:

longitudinal_data <- sim_cohort_longitudinal_data(n_patients = 100,
                                                  patient_metadata = patient_metadata)
patient_id  years_from_biopsy  measurement  type  units   days_from_biopsy
1           0.0054757          107.02514    SCr   umol/L  2
1           0.0191650          107.14072    SCr   umol/L  7
1           0.0273785          101.41421    SCr   umol/L  10
1           0.0301164          106.18664    SCr   umol/L  11
1           0.7091034          95.95474     SCr   umol/L  259
1           0.9637235          95.74575     SCr   umol/L  352
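
The measurements above are serum creatinine (SCr) in umol/L, whereas the quantity of interest is eGFR. As a minimal sketch of how one could convert between the two, the block below applies the CKD-EPI 2021 creatinine equation. The choice of equation, the function name scr_to_egfr, and the toy values are assumptions made for illustration; the original analysis may use a different equation or helper.

```r
# Sketch only: SCr (umol/L) -> eGFR (mL/min/1.73 m^2) via CKD-EPI 2021.
# `scr_to_egfr` is a hypothetical helper, not part of code/syntheticData.R.
scr_to_egfr <- function(scr_umol_l, age_years, sex) {
  scr_mg_dl <- scr_umol_l / 88.4                 # convert umol/L to mg/dL
  female <- sex == "F"
  kappa <- ifelse(female, 0.7, 0.9)              # sex-specific creatinine threshold
  alpha <- ifelse(female, -0.241, -0.302)        # sex-specific exponent below threshold
  ratio <- scr_mg_dl / kappa
  142 * pmin(ratio, 1)^alpha * pmax(ratio, 1)^(-1.200) *
    0.9938^age_years * ifelse(female, 1.012, 1)
}

# Toy rows in the same layout as the simulated data (values are illustrative)
toy <- data.frame(
  patient_id        = c(1, 1, 4),
  sex               = c("M", "M", "F"),
  age_at_biopsy     = c(29, 29, 60),
  years_from_biopsy = c(0.005, 0.964, 0.500),
  measurement       = c(107.0, 95.7, 80.0)       # SCr in umol/L
)

# Age at measurement = age at biopsy + time since biopsy
toy$age  <- toy$age_at_biopsy + toy$years_from_biopsy
toy$egfr <- scr_to_egfr(toy$measurement, toy$age, toy$sex)
print(toy[, c("patient_id", "years_from_biopsy", "measurement", "egfr")])
```

Because the equation depends on age and sex, the conversion needs the patient metadata joined onto the longitudinal data first.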

The simulated longitudinal data currently contains two types of measurements:

[1] "SCr"      "Dialysis"

Follow-up times vary across patients. Some dropout is random, because it depends only on the time of biopsy relative to the end of follow-up. Other dropout is not at random, because patients leave the cohort when they start dialysis.

endpoint_date <- longitudinal_data %>%
  dplyr::filter(type == "Dialysis") %>%
  dplyr::transmute(patient_id = patient_id, endpoint_years = years_from_biopsy)

to_plot <- longitudinal_data %>%
  dplyr::full_join(endpoint_date, by = "patient_id") %>%
  dplyr::group_by(patient_id) %>%
  dplyr::summarise(start = 0,
                   end = max(years_from_biopsy),
                   endpoint = min(endpoint_years))

to_plot %>%
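  dplyr::mutate(patient_id = forcats::fct_reorder(patient_id, end, .desc = TRUE)) %>%
  ggplot() +
  # one horizontal segment per patient, from biopsy to last observation
  geom_segment(aes(x = start, xend = end,
                   y = patient_id, yend = patient_id),
               linewidth=1, lineend = "round") +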
  geom_point(aes(x = endpoint, 
                 y = patient_id, color = "Dialysis" ),
             na.rm=TRUE) + 
  labs(x = "Follow-up time (years from biopsy)", y = "Patient", color = "Endpoint reached") +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        legend.position = c(0.85, 0.93))
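
Follow-up and dropout can also be summarised numerically, one row per patient. The sketch below uses a small toy data frame with the same column names as longitudinal_data. The values and the output column names (follow_up_years, reached_dialysis, n_scr) are illustrative assumptions, not output from the simulation.

```r
library(dplyr)

# Toy longitudinal data (illustrative values, same column layout as above)
toy_long <- data.frame(
  patient_id        = c(1, 1, 1, 2, 2, 3, 3, 3),
  years_from_biopsy = c(0.1, 1.0, 2.5, 0.2, 4.0, 0.3, 1.5, 1.8),
  type              = c("SCr", "SCr", "Dialysis", "SCr", "SCr",
                        "SCr", "SCr", "Dialysis")
)

# One row per patient: length of follow-up, whether dialysis was reached,
# and how many SCr measurements were recorded
follow_up <- toy_long %>%
  group_by(patient_id) %>%
  summarise(
    follow_up_years  = max(years_from_biopsy),
    reached_dialysis = any(type == "Dialysis"),
    n_scr            = sum(type == "SCr")
  )

print(follow_up)

# Number of patients with and without the dialysis endpoint
follow_up %>% count(reached_dialysis)
```

A per-patient table of this shape, a follow-up time plus an event indicator, is the kind of time-to-event input that a joint model typically takes alongside the longitudinal measurements.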

Plot longitudinal SCr measurements for the first eight patients:

longitudinal_data %>%
  dplyr::filter(patient_id %in% 1:8, type == "SCr") %>%  # keep SCr rows only
  ggplot(aes(x = years_from_biopsy,
             y = measurement)) +
  geom_point(size = 1) +
  xlim(0, 20) +
  facet_wrap(vars(patient_id), nrow = 2)

Save patient metadata and longitudinal data:

write.csv(patient_metadata, "data/simulated_metadata.csv", row.names = FALSE)
write.csv(longitudinal_data, "data/simulated_longitudinal_data.csv", row.names = FALSE)
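
write.csv() does not create missing directories, so the data directory has to exist before the files are written. The sketch below shows one way to guard against that and to check that a file reads back unchanged. It writes to a temporary directory as a stand-in for data/, and the toy data frame copies only the first three metadata rows shown earlier.

```r
# Stand-in for the "data" directory used above; tempdir() keeps the sketch
# from touching the project folder
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# First three rows of the patient metadata shown earlier
toy_metadata <- data.frame(
  patient_id    = 1:3,
  sex           = c("M", "M", "M"),
  disease       = c("B", "A", "B"),
  age_at_biopsy = c(29L, 79L, 35L)
)

out_file <- file.path(out_dir, "simulated_metadata.csv")
write.csv(toy_metadata, out_file, row.names = FALSE)

# Read the file back and confirm that nothing changed in the round trip
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)
stopifnot(isTRUE(all.equal(toy_metadata, reloaded)))
```

One thing to watch when reloading real data: read.csv() guesses column types, so a column containing only "F" or "T" values would be read as logical unless colClasses is set.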

