Skip to contents
library(bmstate)
#> Attached bmstate 0.2.7. Type ?bmstate to get started.
library(ggplot2)
theme_set(theme_bw())

Multistate hazard model

States and transitions

In a multistate model, we can represent the possible transitions between states as directed edges of a graph whose SS nodes correspond to states s1,,Ss \in {1, \ldots, S}. The transitions that start from the same node are all at risk simultaneously when the system is in this state. The transitions that are at risk simultaneously are competing, meaning that occurrence of one transition censors the other ones.

Below is an example multistate system.

tmat <- transmat_illnessdeath()
tmat$plot()

tmat$states_df()
#>   state_idx    state terminal source
#> 1         1  Healthy    FALSE   TRUE
#> 2         2 Diseased    FALSE  FALSE
#> 3         3     Dead     TRUE  FALSE
tmat$trans_df()
#>   trans_idx prev_state state          trans_char trans_type
#> 1         1          1     2 Healthy -> Diseased          1
#> 2         2          1     3     Healthy -> Dead          2
#> 3         3          2     3    Diseased -> Dead          2

Hazard model

For each transition h=1,,Hh = 1, \ldots, H and observation unit (subject) n=1,,Nn = 1, \ldots, N, the hazard rates λn(h)(t)\lambda^{(h)}_n(t), h=1,,Hh = 1, \ldots, H, are modeled as λn(h)(t)=b(h)(t)exp(βTh𝐱n) \lambda^{(h)}_n(t) = b^{(h)}(t)\exp\left(\beta_{T_h}^{\top} \mathbf{x}_n \right)

where

  • b(h)(t)b^{(h)}(t) represents the baseline hazard for transition hh
  • ThT_h is an integer indexing the type (trans_type) of transition hh
  • βTh\beta_{T_h} is the regression coefficient for transition type ThT_h
  • 𝐱n\mathbf{x}_n includes time-independent covariates for unit nn

We group transitions that end in the same state together to have the same transition type.

el <- parse(text = paste0("lambda[n]^(", c(1, 2, 3), ")"))
tmat$plot(edge_labs = el, edge.label.cex = 1.5)

Baseline hazards

Each log baseline hazard logb(h)(t)\log b^{(h)}(t) is modeled nonparametrically so that log(b(h)(t))=b0(h)+B(h)(t) \log \left(b^{(h)}(t)\right) = b_0^{(h)} + B^{(h)}(t) where b0(h)b_0^{(h)} is a constant and B(h)(t)B^{(h)}(t) is time-dependent variation modeled using B-splines. The knot locations for the splines are same for all transitions.

Priors

Currently the package uses a standard normal prior for the coefficients β\beta. The prior for the weights of the spline basis functions is set hierarchically so that transitions of same type have a shared mean. Also the average log hazard rates b0(h)b_0^{(h)} have a hierarchical prior so that transitions of the same type have a shared mean. This prior mean is estimated from the data using a standard Cox proportional hazards model fit.

You can view the prior in the Stan code for example like this.

stan_lines <- readLines(default_stan_filepath())
line_idx <- which(grepl(stan_lines, pattern = "HAZARD PARAM PRIOR"))
stan_lines_prior <- stan_lines[line_idx[1]:line_idx[2]]
prior_stan_code <- paste(stan_lines_prior, collapse = "\n")
cat(prior_stan_code)
#>   // HAZARD PARAM PRIOR
#>   if(do_haz == 1){
#>     if(nc_haz > 0){
#>       for(k in 1:nc_haz){
#>         beta_oth[1, k] ~ normal(0, 1);
#>       }
#>     }
#>     if(I_auc==1){
#>       beta_auc[1, 1] ~ normal(0, 1); // only if has pk submodel
#>     }
#> 
#>     // Baseline hazard base level
#>     sig_w0[1] ~ normal(0, 3);
#>     to_vector(z_w0[1]) ~ std_normal();
#> 
#>     // Baseline hazard spline weights
#>     to_vector(mu_weights[1]) ~ normal(0, 1);
#>     to_vector(sig_weights[1]) ~ normal(0, 0.3);
#>     to_vector(z_weights[1]) ~ std_normal();
#>   }
#>   // END HAZARD PARAM PRIOR

Using custom Stan code

stan_lines <- readLines(default_stan_filepath())
full_stan_code <- paste(stan_lines, collapse = "\n")

You should be able to write full_stan_code to some file, edit that file, and set options(bmstate_stan_file = path_to_your_file). If a path to custom Stan model code has been set, the correctness of package functionality cannot be guaranteed. However, it should be safe to change for example just the prior of some parameter from, say, normal(0, 1) to student_t(4, 0, 1) for example.

Likelihood evaluation

Binary matrix representation of observed data

For each subject i=1,,Ni = 1, \ldots, N, the state transition times divide the follow-up time period into Ki1K_i \geq 1 intervals. We assume t0(i)=0t^{(i)}_0 = 0 as the start time of the first interval and for k1k \geq 1 use tk(i)t^{(i)}_k to denote the end time of the kkth interval. Furthermore, let s0(i)s^{(i)}_0 be the initial state and sk(i){1,,S}s^{(i)}_k \in \{1, \ldots, S\} to denote the state that subject ii was in at the end of the time interval.

We use the binary variable δk,i(h)\delta_{k,i}^{(h)} to indicate whether transition h{1,,H}h \in \{1, \ldots, H\} occurred at the end of the kkth time interval for unit ii. Another binary variable Ik,i(h)I_{k,i}^{(h)} indicates whether unit ii was at risk for transition hh during the kkth time interval. We collect the entire observed multistate path (event sequence) data into 𝐃MS={Δi,𝐈i,𝐭i}i=1N\mathbf{D}_{\textrm{MS}} = \left \{\Delta_i, \mathbf{I}_i, \mathbf{t}_i \right\}_{i=1}^N where Δi\Delta_i, 𝐈i\mathbf{I}_i and 𝐭i\mathbf{t}_i contain the transition indicator, risk indicator, and interval time data for unit ii, respectively.

We generate example data for a single subject here and plot the realized path.

set.seed(2344)
mod <- create_msm(tmat, n_grid = 12) # set very low n_grid for demo
h0_true <- rep(1e-3, 3)
mod$set_prior_mean_h0(h0_true) # has no effect for simulation
dat <- mod$simulate_data(N_subject = 1, w0 = h0_true)
#> Generating 1 paths
dat$paths$plot_paths(truncate = TRUE, alpha = 1) + ggtitle("")

Here we plot the corresponding binary values δk,1(h)\delta_{k,1}^{(h)} and Ik,1(h)I_{k,1}^{(h)} in H×KH \times K matrices transition and at_risk, respectively.

sd <- create_stan_data(mod, dat)
a <- plot_stan_data_matrix(mod, sd, "transition", 1)
b <- plot_stan_data_matrix(mod, sd, "at_risk", 1)
gridExtra::grid.arrange(a, b, nrow = 1, ncol = 2)

Likelihood formula

The log likelihood of the path data is (Kneib and Hennerfeind 2008)

logp(𝐃MSθ)=i=1Nk=1Kih=1H[δk,i(h)log(λi(h)(tk(i)θ))Ik,i(h)tk1(i)tk(i)λi(h)(tθ)dt]\begin{equation} \label{eq: mslik} \log p(\mathbf{D}_{\textrm{MS}} \mid \theta) = \sum_{i=1}^{N} \sum_{k=1}^{K_i}\sum_{h=1}^H\left[\delta_{k,i}^{(h)}\log\left(\lambda_i^{(h)}(t^{(i)}_k \mid \theta) \right) - \text{I}_{k,i}^{(h)} \int_{t^{(i)}_{k-1}}^{t^{(i)}_{k}} \lambda_i^{(h)}(t \mid \theta) \text{d}t \right] \end{equation} where θ={θ0,β}\theta = \{\theta_0, \beta\} contains the baseline hazard parameters θ0\theta_0 and regression coefficients β\beta.

Numerical integration

The package relies on numerical integration of the hazard functions to evaluate the likelihood. The integration step size is the max time divided by number of grid points, which we can check here.

delta_grid <- mod$get_tmax() / mod$get_n_grid()
a <- c(delta_grid, sd$delta_grid)
names(a) <- c("expected delta_grid", "found delta_grid")
print(a)
#> expected delta_grid    found delta_grid 
#>            83.33333            83.33333

We demonstrate the integration grid with an example hazard function.

w <- c(0.2, 0.3, 0.5, -0.5, 0, 0.1)
h0 <- 0.001
plot_stan_data_integral(mod, sd, 1, h0, w) # first interval

plot_stan_data_integral(mod, sd, 2, h0, w) # second interval

State visit risk prediction

We often ultimately want the model to be able to predict, for a given subject, the probability that a particular state is visited at least once before time tt, given an initial state s0s_0 at time t0t_0. We denote this probability by P(st,s0,t0)P(s \mid t, s_0, t_0).

Path generation

For any subject, which does not need to be part of the training data, we can generate a number of state paths starting from s0s_0 at time t0t_0. This can be done via the generate_paths() function. For drawing the arrival times for a non-homogeneous Poisson process in path generation, we use the thinning algorithm (Lewis and Shedler 1979).

The probability P(st,s0,t0)P(s \mid t, s_0, t_0) can then be estimated as the proportion of paths in which ss was visited before time tt. This can be done via the function p_state_visit_per_subject().

Kolmogorov forward equations

Generator matrix

If the transition matrix of the multistate model has no self-loops, it is a continuous-time Markov process (Norris 1997). In this special case, we can define a generator matrix 𝐐(t)=[qjk(t)]\mathbf{Q}(t) = [q_{jk}(t)], where qkj(t)q_{kj}(t) is called the transition rate from state kk to state jj. The transition rate for kjk \neq j is 00 if the corresponding transition is not possible, and otherwise it is given by the corresponding hazard function. Additionally, qkk(t)=jkqkj(t)\begin{equation} q_{kk}(t) = -\sum_{j \neq k} q_{kj}(t) \end{equation} for all diagonal values k1,,Sk \in {1, \ldots, S}.

Transition probability matrix

We also define a transition probability matrix 𝐏t0(t)\mathbf{P}_{t_0}(t), where the element on the iith row and jjth column is the probability that the process is in state jj at time tt, given that it started in state ii at time t0t_0. The Kolmogorov forward equations are ddt𝐏t0(t)=𝐏t0(t)𝐐(t),\begin{equation} \frac{\text{d}}{\text{d}t} \mathbf{P}_{t_0}(t) = \mathbf{P}_{t_0}(t) \mathbf{Q}(t), \end{equation} and we can solve 𝐏t0(t)\mathbf{P}_{t_0}(t) using the initial condition 𝐏t0(t0)=I\mathbf{P}_{t_0}(t_0) = I.

State occupancy probability

The function p_state_occupancy() can be used to compute, for a set of output times tendt_{\mathrm{end}}, the probability that a subject occupies the a certain state ss given their state s0s_0 at the initial time. For terminal states, the state occupancy probability at time tt is the state visit probability P(st,s0,t0)P(s \mid t, s_0, t_0).

Examples of multistate models

In this section, we illustrate some common multistate models.

N_sub <- 1000

# Plot event time distribution
plot_time_dist <- function(t) {
  checkmate::assert_numeric(t)
  ggplot(data.frame(Time = t), aes(x = .data$Time)) +
    ggdist::stat_halfeye() +
    geom_vline(
      mapping = NULL,
      xintercept = mean(t),
      color = "gray20",
      lty = 2
    )
}

Basic survival model

tm <- transmat_survival()
s <- tm$states

The basic survival model can be seen as a multistate model with two states: Alive and Dead. The death event corresponds to transitioning from the state Alive to the terminal state Dead. First we create a TransitionMatrix. The transition matrix is a directed graph encoded by a binary matrix.

print(tm)
#>       Alive Dead
#> Alive     0    1
#> Dead      0    0
tm$plot()

Then we create a MultistateModel using this graph.

tmax <- 3 * 365.25
mod <- create_msm(tm, t_max = tmax)
print(mod)
#> A MultistateModel with:
#>  - Hazard covariates: {}
#> 
#> A MultistateSystem with:
#>  - States: {Alive, Dead}
#>  - Number of spline knots: 5

Now we can simulate death events with known constant hazard rate lambda. For this model, we are simulating from a homogeneous Poisson process with rate λ\lambda, meaning that the event times should follow an exponential distribution with mean 1λ\frac{1}{\lambda}.

lambda <- 1e-2
pd <- mod$simulate_data(N_sub, w0 = lambda)
#> Generating 1000 paths
pd$paths$plot_paths()

pd$paths$plot_graph()

We plot the distribution of event times (i.e., transition times) and the theoretical mean (red line).

times <- pd$paths$transition_times()
plot_time_dist(times) + geom_vline(xintercept = 1 / lambda, color = "firebrick")

Here generate paths with a lower hazard so that they can get censored before the follow-up ends. We compute the risk of death based on paths and ensure that it is correct, by comparing to two different more analytic implementations of solving a transition probability matrix.

# 1) This is the general way using the package
solve_p <- function(mod, lambda) {
  tt <- c(0, mod$system$get_tmax())
  P <- solve_trans_prob_matrix(mod$system, t_out = tt, log_w0 = log(lambda))
  P[2, 1, ]
}

# 2) This is a special case which can be used if the baseline hazards are constant,
# currently not in the package but we use it to validate that we get the
# Same result
solve_P_consthaz <- function(mod, h0) {
  Q_true <- mod$system$intensity_matrix(mod$system$get_tmax(), log(h0))
  Matrix::expm(Q_true * mod$system$get_tmax())
}

lambda <- 1e-3
pd <- mod$simulate_data(N_sub, w0 = lambda)
#> Generating 1000 paths
p <- solve_p(mod, lambda)
P <- solve_P_consthaz(mod, lambda)
r <- c(p_state_visit(pd$paths)$prob, p[2], P[1, 2])
names(r) <- c("Paths", "Analytic1", "Analytic2")
print(r)
#>     Paths Analytic1 Analytic2 
#> 0.6500000 0.6657109 0.6657112

Competing risks model

The transition matrix is for a competing risks model can be created like so.

tm <- transmat_comprisk()
print(tm)
#>   A B C D
#> A 0 1 1 1
#> B 0 0 0 0
#> C 0 0 0 0
#> D 0 0 0 0
tm$plot()

mod <- create_msm(tm, t_max = tmax)

This model has 3 possible transitions. Now we can simulate events with known constant hazard rates lambda_1, lambda_2, lambda_3. For this model, we are simulating from independent homogeneous Poisson processes with rates λi\lambda_i, meaning that the transition probabilities should be P(i)=λij=13λj. P(i) = \frac{\lambda_i}{\sum_{j=1}^ 3 \lambda_j}.

lambda <- 1e-2
w0 <- c(lambda, 2 * lambda, 3 * lambda)
print(w0 / sum(w0))
#> [1] 0.1666667 0.3333333 0.5000000
pd <- mod$simulate_data(N_sub, w0 = w0)
#> Generating 1000 paths
pd$paths$plot_paths()

pd$paths$plot_graph()

The distribution of event times should be exponentially distributed with mean 1λ0\frac{1}{\lambda_0} where λ0=j=13λj \lambda_0 = \sum_{j=1}^ 3 \lambda_j is the combined hazard rate. We plot the empirical distribution and the theoretical mean (red line).

times <- pd$paths$as_transitions(truncate = TRUE)$time
plot_time_dist(times) + geom_vline(xintercept = 1 / sum(w0), color = "firebrick")

Here we again generate paths with a lower hazard so that they can get censored before the follow-up ends. We compute the probability of different terminal states at the end of follow up based on paths and ensure that they are correct.

pd <- mod$simulate_data(N_sub, w0 = 0.01 * w0)
#> Generating 1000 paths
p <- solve_p(mod, 0.01 * w0)
P <- solve_P_consthaz(mod, 0.01 * w0)
r <- rbind(p_state_visit(pd$paths)$prob, p[2:4], P[1, 2:4])
rownames(r) <- c("Paths", "Analytic1", "Analytic2")
colnames(r) <- paste0("P(state = ", colnames(r), ")")
print(r)
#>           P(state = B) P(state = C) P(state = D)
#> Paths       0.08100000    0.1440000    0.2380000
#> Analytic1   0.08030481    0.1606096    0.2409144
#> Analytic2   0.08030484    0.1606097    0.2409145

Illness-death model

The illness-death model is another common multistate model.

tm <- transmat_illnessdeath()
print(tm)
#>          Healthy Diseased Dead
#> Healthy        0        1    1
#> Diseased       0        0    1
#> Dead           0        0    0
tm$plot()

mod <- create_msm(tm)
w0 <- 1e-2
pd <- mod$simulate_data(N_sub, w0 = w0)
#> Generating 1000 paths
pd$paths$plot_paths()

pd$paths$plot_graph()

Here we again generate paths with a lower hazard so that they can get censored before the follow-up ends. We compute the probability of death before the end of follow up based on paths and ensure that it is correct.

h0_true <- 0.5 * 1e-3
h0_true_vec <- rep(h0_true, 3)
h0_true_vec[2] <- 0.1 * h0_true
h0_true_vec[3] <- 5 * h0_true
pd <- mod$simulate_data(N_sub, w0 = h0_true_vec)
#> Generating 1000 paths
p <- solve_p(mod, h0_true_vec)
P_true <- solve_P_consthaz(mod, h0_true_vec)
p_death1 <- p_state_visit(pd$paths)$prob[2]
p_death2 <- p[3]
p_death3 <- P_true[1, 3]
r <- c(p_death1, p_death2, p_death3)

names(r) <- c("Paths", "Analytic", "Analytic2")
print(r)
#>     Paths  Analytic Analytic2 
#> 0.3110000 0.2961619 0.2961618

Diamond model

This model is a more complex multistate model.

tm <- transmat_diamond()
print(tm)
#>   A B C D
#> A 0 1 1 1
#> B 0 0 1 1
#> C 0 1 0 1
#> D 0 0 0 0
tm$plot()

mod <- create_msm(tm)
w0 <- 1e-2
pd <- mod$simulate_data(N_sub, w0 = w0)
#> Generating 1000 paths
pd$paths$plot_paths()

pd$paths$plot_graph()

Here we again generate paths with a lower hazard so that they can get censored before the follow-up ends. We compute the probability of death before the end of follow up based on paths and ensure that it is correct.

h0_true <- 0.5 * 1e-3
h0_true_vec <- rep(h0_true, 7)
h0_true_vec[3] <- 0.1 * h0_true
h0_true_vec[5] <- 5 * h0_true
h0_true_vec[7] <- 20 * h0_true

P_true <- solve_P_consthaz(mod, h0_true_vec)
p_death3 <- P_true[1, 4]
w0 <- h0_true_vec
pd <- mod$simulate_data(N_sub, w0 = w0)
#> Generating 1000 paths
p <- solve_p(mod, w0)
p_death1 <- p_state_visit(pd$paths)$prob[3]
p_death2 <- p[4]
r <- c(p_death1, p_death2, p_death3)
names(r) <- c("Paths", "Analytic", "Analytic2")
print(r)
#>     Paths  Analytic Analytic2 
#> 0.5270000 0.5459059 0.5459058

General multistate models

General multistate models can have multiple terminal states and various transitions.

tm <- transmat_full(state_names = LETTERS[1:5], sources = 1, terminal = c(4, 5))
print(tm)
#>   A B C D E
#> A 0 1 1 1 1
#> B 0 1 1 1 1
#> C 0 1 1 1 1
#> D 0 0 0 0 0
#> E 0 0 0 0 0
tm$plot()

mod <- create_msm(tm)
pd <- mod$simulate_data(N_sub, w0 = 1e-3)
#> Generating 1000 paths
pd$paths$plot_paths()

pd$paths$plot_graph()

Some of the paths may not reach a terminal state before t_max, meaning they get censored.

pd$paths$prop_matrix()
#>     to
#> from         A         B         C         D         E  no event
#>    A 0.0000000 0.2600000 0.2550000 0.2380000 0.2330000 0.0140000
#>    B 0.0000000 0.2339545 0.2091097 0.2070393 0.2111801 0.1387164
#>    C 0.0000000 0.2412281 0.2192982 0.1973684 0.2302632 0.1118421
#>    D 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000
#>    E 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000

References

Kneib, Thomas, and Andrea Hennerfeind. 2008. “Bayesian Semi Parametric Multi-State Models.” Statistical Modelling 8: 169–98. https://api.semanticscholar.org/CorpusID:1219145.
Lewis, P. A. W., and G. S. Shedler. 1979. “Simulation of Nonhomogeneous Poisson Processes with Degree-Two Exponential Polynomial Rate Function.” Operations Research 27 (5): 1026–40.
Norris, J. R. 1997. Markov Chains. Cambridge: Cambridge University Press.