Department of Statistical Science, UCL
Resources
Slides and code here: github.com/n8thangreen/stan-code-generation-talk
if/else for the Distribution
sample_if_else <- function(distribution_name, n, ...) {
  if (distribution_name == "norm") {
    return(rnorm(n, ...))
  } else if (distribution_name == "pois") {
    return(rpois(n, ...))
  } else if (distribution_name == "unif") {
    return(runif(n, ...))
  } else if (distribution_name == "exp") {
    return(rexp(n, ...))
  } else {
    stop("Unsupported distribution.")
  }
}
sample_if_else("norm", 5, mean = 0, sd = 1)
sample_if_else("pois", 5, lambda = 5)
Provide a string for the distribution name
Could use a switch statement instead
Duplicates code across branches
Pass kernel / root of distribution name
Create the appropriate function name
Then execute with additional arguments
generate_code <- function(dist_base_name, n, ...) {
  func_name_string <- paste0("r", dist_base_name)
  dots <- match.call(expand.dots = FALSE)$`...`
  arg_strings <- c()
  arg_strings <- c(arg_strings, paste0("n = ", n))
  # Add other arguments from '...' as "name = value" strings
  if (!is.null(dots)) {
    arg_strings <- c(arg_strings,
                     paste0(names(dots), " = ", sapply(dots, deparse)))
  }
  all_args_string <- paste(arg_strings, collapse = ", ")
  r_code_string <-
    paste0(func_name_string, "(", all_args_string, ")")
  return(r_code_string)
}
norm_code <-
generate_code("norm", 10, mean = 5, sd = 2.5)
norm_samples <- eval(parse(text = norm_code))
Separate the creation of the sampling code from its execution
The two-step approach allows the generated code to be reused
Extend to a combination of generated code and passed arguments
A functional paradigm may be better (see the sketch below)
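A minimal sketch of the functional alternative (not from the talk): look the sampling function up by name with match.fun() and call it directly, with no string building or eval(parse()).
sample_fn <- function(dist_base_name) {
  rng <- match.fun(paste0("r", dist_base_name))  # e.g. "norm" -> rnorm
  function(n, ...) rng(n, ...)
}
norm_sampler <- sample_fn("norm")
norm_sampler(5, mean = 0, sd = 1)
# or as a one-off call
do.call(paste0("r", "pois"), list(n = 5, lambda = 5))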
Code Generation vs. Configuration/Parameterization
Compile-time vs. Run-time Flexibility
Code generation leans towards compile-time flexibility. The variations are created before the program runs (at “compile time” in a broader sense, even for interpreted languages like R where it’s more about script generation). This can lead to highly optimized, lean code for each specific use case.
Flexible inputs/flags lean towards run-time flexibility. The same code handles different scenarios by evaluating parameters or flags at the time the program is executed. This offers greater adaptability without needing to re-generate or re-deploy.
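To make the contrast concrete, a small illustrative sketch (not from the slides): the compile-time flavour generates and evaluates a specialised sampler once, while the run-time flavour keeps one general function that checks a flag on every call.
# "compile-time" flavour: generate a specialised function definition as text, then evaluate it
spec_code <- "sampler_norm <- function(n) rnorm(n, mean = 0, sd = 1)"
eval(parse(text = spec_code))
sampler_norm(3)
# "run-time" flavour: one general function that branches on an argument
sampler_general <- function(n, dist = c("norm", "pois")) {
  dist <- match.arg(dist)
  if (dist == "norm") rnorm(n) else rpois(n, lambda = 1)
}
sampler_general(3, dist = "pois")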
Generality vs. Specialization
Code generation allows for highly specialized code, tailored precisely to a given set of requirements. This can lead to better performance and simpler code for that specific instance.
Flexible inputs favour generality, creating a single solution that can handle a broader range of cases. This can lead to more complex code internally (e.g., more if/else statements, more parameters) but fewer distinct code artifacts.
DRY (Don’t Repeat Yourself) vs. YAGNI (You Ain’t Gonna Need It)
Sometimes code generation is used to adhere to DRY principles by generating repetitive boilerplate. However, if the variations are minor, generating separate code for each can introduce repetition at a higher level (e.g., repeating the “model structure” with slight changes).
The “flexible inputs” approach often aligns with YAGNI, as you’re not generating extra code until it’s explicitly needed via configuration.
Advantages of code generation
Performance: Generated code can be highly optimized for a specific task, as it doesn’t need to carry the overhead of conditional logic for other scenarios.
Simplicity (per generated file): Each generated code file can be simpler and more readable because it only contains the logic relevant to its specific configuration.
Reduced runtime overhead: No need to parse flags or configuration settings at runtime.
Type safety/early error detection: If the generated code is truly distinct, type errors or other issues might be caught earlier in the development cycle.
Disadvantages of code generation
Increased Maintenance Overhead: You now have multiple code files to manage. If the underlying logic changes, you might need to regenerate all of them, which can be prone to errors if the generation process isn’t robust.
Debugging Challenges: Debugging generated code can be harder, especially if the generation logic itself is complex. The actual code you’re debugging might not be what you wrote directly.
Version Control Complexity: Managing many generated files in version control can be cumbersome.
Potential for “Code Bloat”: If the variations are numerous, you could end up with a large number of very similar code files.
Advantages of flexible inputs/parameterization
Easier Maintenance: You maintain one core piece of code. Changes to the underlying logic only need to be applied in one place.
Greater Flexibility: It’s easier to introduce new variations or modify existing ones by simply changing configuration or input parameters, without needing to touch the core code or regenerate anything.
Reduced Code Duplication: Promotes the DRY principle by centralizing common logic.
Simpler Deployment: Only one version of the code needs to be deployed.
Disadvantages of flexible inputs/parameterization
Increased Code Complexity: The single code base might become more complex due to the need for conditional logic (e.g., if/else, switch statements) to handle different configurations.
Potential Performance Overhead: Runtime checks for flags or parameters can introduce minor performance overhead.
Readability Challenges: A highly flexible, parameterized code base can sometimes be harder to read and understand due to the many branches and conditions.
Risk of “God Function”: A single function or object trying to do too much based on various flags can become an unmanageable “God” entity.
From Green et al. (2025)
Reference
Green, N., Kurt, M., Moshyk, A., Larkin, J. and Baio, G. (2025), A Bayesian Hierarchical Mixture Cure Modelling Framework to Utilize Multiple Survival Datasets for Long-Term Survivorship Estimates: A Case Study From Previously Untreated Metastatic Melanoma. Statistics in Medicine, 44: e70132. https://doi.org/10.1002/sim.70132
functions {
#include /include/distributions.stan
}
data {
int<lower=1> nTx;
int<lower=1, upper=3> cf_model; // cure fraction
int<lower=0> N_1;
array[nTx] int<lower=0> n_1;
int<lower=0> H_1;
vector<lower=0>[N_1] t_1;
vector<lower=0, upper=1>[N_1] d_1;
matrix[N_1, H_1] X_1;
vector[H_1] mu_S_1;
vector<lower=0>[H_1] sigma_S_1;
int<lower=0> N_2;
array[nTx] int<lower=0> n_2;
int<lower=0> H_2;
vector<lower=0>[N_2] t_2;
vector<lower=0, upper=1>[N_2] d_2;
matrix[N_2, H_2] X_2;
vector[H_2] mu_S_2;
vector<lower=0>[H_2] sigma_S_2;
int<lower=0> N_3;
array[nTx] int<lower=0> n_3;
int<lower=0> H_3;
vector<lower=0>[N_3] t_3;
vector<lower=0, upper=1>[N_3] d_3;
matrix[N_3, H_3] X_3;
vector[H_3] mu_S_3;
vector<lower=0>[H_3] sigma_S_3;
int<lower=0> N_4;
array[nTx] int<lower=0> n_4;
int<lower=0> H_4;
vector<lower=0>[N_4] t_4;
vector<lower=0, upper=1>[N_4] d_4;
matrix[N_4, H_4] X_4;
vector[H_4] mu_S_4;
vector<lower=0>[H_4] sigma_S_4;
int<lower=1, upper=2> bg_model;
vector[bg_model == 1 ? H_1 : 0] mu_bg;
vector<lower=0>[bg_model == 1 ? H_1 : 0] sigma_bg;
vector<lower=0>[bg_model == 2 ? N_1 : 0] h_bg_1;
vector<lower=0>[bg_model == 2 ? N_2 : 0] h_bg_2;
vector<lower=0>[bg_model == 2 ? N_3 : 0] h_bg_3;
vector<lower=0>[bg_model == 2 ? N_4 : 0] h_bg_4;
matrix[nTx, nTx] Tx_dmat; // treatment design matrix
vector[cf_model == 3 ? nTx : 0] mu_alpha; // treatment regression coefficients
vector<lower=0>[cf_model == 3 ? nTx : 0] sigma_alpha;
int<lower=0> t_max;
vector[cf_model == 2 ? nTx : 0] mu_alpha_1;
vector<lower=0>[cf_model == 2 ? nTx : 0] sigma_alpha_1;
vector[cf_model == 2 ? nTx : 0] mu_alpha_2;
vector<lower=0>[cf_model == 2 ? nTx : 0] sigma_alpha_2;
vector[cf_model == 2 ? nTx : 0] mu_alpha_3;
vector<lower=0>[cf_model == 2 ? nTx : 0] sigma_alpha_3;
vector[cf_model == 2 ? nTx : 0] mu_alpha_4;
vector<lower=0>[cf_model == 2 ? nTx : 0] sigma_alpha_4;
array[cf_model == 1 ? 1 : 0] real<lower=0> a_cf;
array[cf_model == 1 ? 1 : 0] real<lower=0> b_cf;
vector[cf_model == 3 ? nTx : 0] mu_sd_cf;
vector<lower=0>[cf_model == 3 ? nTx : 0] sigma_sd_cf;
}
parameters {
// coefficients in linear predictor (including intercept)
vector[bg_model == 1 ? H_1 : 0] beta_bg;
vector[cf_model != 2 ? nTx : 0] alpha;
vector[cf_model == 2 ? nTx : 0] alpha_1;
vector[H_1] beta_1;
vector[cf_model == 2 ? nTx : 0] alpha_2;
vector[H_2] beta_2;
vector[cf_model == 2 ? nTx : 0] alpha_3;
vector[H_3] beta_3;
vector[cf_model == 2 ? nTx : 0] alpha_4;
vector[H_4] beta_4;
vector<lower=0, upper=1>[cf_model == 1 ? nTx : 0] cf_pooled;
vector[cf_model == 3 ? nTx : 0] lp_cf_1;
vector[cf_model == 3 ? nTx : 0] lp_cf_2;
vector[cf_model == 3 ? nTx : 0] lp_cf_3;
vector[cf_model == 3 ? nTx : 0] lp_cf_4;
vector<lower=0>[cf_model == 3 ? nTx : 0] sd_cf;
}
transformed parameters {
vector[N_1] lp_1_bg;
vector<lower=0>[N_1] lambda_1_bg;
vector[N_2] lp_2_bg;
vector<lower=0>[N_2] lambda_2_bg;
vector[N_3] lp_3_bg;
vector<lower=0>[N_3] lambda_3_bg;
vector[N_4] lp_4_bg;
vector<lower=0>[N_4] lambda_4_bg;
vector[N_1] lp_1;
// rate parameters
vector<lower=0>[N_1] lambda_1;
vector[N_2] lp_2;
// rate parameters
vector<lower=0>[N_2] lambda_2;
vector[N_3] lp_3;
// rate parameters
vector<lower=0>[N_3] lambda_3;
vector[N_4] lp_4;
// rate parameters
vector<lower=0>[N_4] lambda_4;
vector<lower=0, upper=1>[cf_model == 3 ? nTx : 0] cf_global;
vector<lower=0, upper=1>[nTx] cf_1;
vector[cf_model == 2 ? nTx : 0] tx_cf_1;
vector<lower=0, upper=1>[nTx] cf_2;
vector[cf_model == 2 ? nTx : 0] tx_cf_2;
vector<lower=0, upper=1>[nTx] cf_3;
vector[cf_model == 2 ? nTx : 0] tx_cf_3;
vector<lower=0, upper=1>[nTx] cf_4;
vector[cf_model == 2 ? nTx : 0] tx_cf_4;
vector[cf_model == 3 ? nTx : 0] lp_cf_global;
// correlated event times
lp_1 = X_1*beta_1;
// background survival with uncertainty
if (bg_model == 1) {
lp_1_bg = X_1*beta_bg;
} else {
lp_1_bg = log(h_bg_1);
}
lambda_1_bg = exp(lp_1_bg);
// correlated event times
lp_2 = X_2*beta_2;
// background survival with uncertainty
if (bg_model == 1) {
lp_2_bg = X_2*beta_bg;
} else {
lp_2_bg = log(h_bg_2);
}
lambda_2_bg = exp(lp_2_bg);
// correlated event times
lp_3 = X_3*beta_3;
// background survival with uncertainty
if (bg_model == 1) {
lp_3_bg = X_3*beta_bg;
} else {
lp_3_bg = log(h_bg_3);
}
lambda_3_bg = exp(lp_3_bg);
// correlated event times
lp_4 = X_4*beta_4;
// background survival with uncertainty
if (bg_model == 1) {
lp_4_bg = X_4*beta_bg;
} else {
lp_4_bg = log(h_bg_4);
}
lambda_4_bg = exp(lp_4_bg);
lambda_1 = exp(lp_1);
lambda_2 = exp(lp_2);
lambda_3 = exp(lp_3);
lambda_4 = exp(lp_4);
if (cf_model == 1) {
cf_1 = cf_pooled;
cf_2 = cf_pooled;
cf_3 = cf_pooled;
cf_4 = cf_pooled;
}
if (cf_model == 3) {
lp_cf_global = Tx_dmat*alpha;
cf_global = inv_logit(lp_cf_global);
cf_1 = inv_logit(lp_cf_1);
cf_2 = inv_logit(lp_cf_2);
cf_3 = inv_logit(lp_cf_3);
cf_4 = inv_logit(lp_cf_4);
}
if (cf_model == 2) {
tx_cf_1 = Tx_dmat*alpha_1;
cf_1 = inv_logit(tx_cf_1);
tx_cf_2 = Tx_dmat*alpha_2;
cf_2 = inv_logit(tx_cf_2);
tx_cf_3 = Tx_dmat*alpha_3;
cf_3 = inv_logit(tx_cf_3);
tx_cf_4 = Tx_dmat*alpha_4;
cf_4 = inv_logit(tx_cf_4);
}
}
model {
int idx_1;
int idx_2;
int idx_3;
int idx_4;
beta_1 ~ normal(mu_S_1, sigma_S_1);
beta_2 ~ normal(mu_S_2, sigma_S_2);
beta_3 ~ normal(mu_S_3, sigma_S_3);
beta_4 ~ normal(mu_S_4, sigma_S_4);
if (bg_model == 1) {
beta_bg ~ normal(mu_bg, sigma_bg);
}
// cure fraction
if (cf_model == 3) {
alpha ~ normal(mu_alpha, sigma_alpha);
sd_cf ~ cauchy(mu_sd_cf, sigma_sd_cf); // truncated
lp_cf_1 ~ normal(lp_cf_global, sd_cf);
lp_cf_2 ~ normal(lp_cf_global, sd_cf);
lp_cf_3 ~ normal(lp_cf_global, sd_cf);
lp_cf_4 ~ normal(lp_cf_global, sd_cf);
} else if (cf_model == 2) {
alpha_1 ~ normal(mu_alpha_1, sigma_alpha_1);
alpha_2 ~ normal(mu_alpha_2, sigma_alpha_2);
alpha_3 ~ normal(mu_alpha_3, sigma_alpha_3);
alpha_4 ~ normal(mu_alpha_4, sigma_alpha_4);
} else {
cf_pooled ~ beta(a_cf, b_cf);
}
idx_1 = 1;
// likelihood
for (Tx in 1:nTx) {
for (i in idx_1:(idx_1 + n_1[Tx] - 1)) {
target += log_sum_exp(
log(cf_1[Tx]) +
exp_log_S(t_1[i], lambda_1_bg[i]),
log1m(cf_1[Tx]) +
exp_log_S(t_1[i], lambda_1_bg[i]) + exp_log_S(t_1[i], lambda_1[i]));
target += d_1[i] * log_sum_exp(
log(lambda_1_bg[i]),
log1m(cf_1[Tx]) +
exp_lpdf(t_1[i] | lambda_1[i]) - log(cf_1[Tx] + (1 - cf_1[Tx])*exp_Surv(t_1[i], lambda_1[i])));
}
idx_1 = idx_1 + n_1[Tx];
}
idx_2 = 1;
// likelihood
for (Tx in 1:nTx) {
for (i in idx_2:(idx_2 + n_2[Tx] - 1)) {
target += log_sum_exp(
log(cf_2[Tx]) +
exp_log_S(t_2[i], lambda_2_bg[i]),
log1m(cf_2[Tx]) +
exp_log_S(t_2[i], lambda_2_bg[i]) + exp_log_S(t_2[i], lambda_2[i]));
target += d_2[i] * log_sum_exp(
log(lambda_2_bg[i]),
log1m(cf_2[Tx]) +
exp_lpdf(t_2[i] | lambda_2[i]) - log(cf_2[Tx] + (1 - cf_2[Tx])*exp_Surv(t_2[i], lambda_2[i])));
}
idx_2 = idx_2 + n_2[Tx];
}
idx_3 = 1;
// likelihood
for (Tx in 1:nTx) {
for (i in idx_3:(idx_3 + n_3[Tx] - 1)) {
target += log_sum_exp(
log(cf_3[Tx]) +
exp_log_S(t_3[i], lambda_3_bg[i]),
log1m(cf_3[Tx]) +
exp_log_S(t_3[i], lambda_3_bg[i]) + exp_log_S(t_3[i], lambda_3[i]));
target += d_3[i] * log_sum_exp(
log(lambda_3_bg[i]),
log1m(cf_3[Tx]) +
exp_lpdf(t_3[i] | lambda_3[i]) - log(cf_3[Tx] + (1 - cf_3[Tx])*exp_Surv(t_3[i], lambda_3[i])));
}
idx_3 = idx_3 + n_3[Tx];
}
idx_4 = 1;
// likelihood
for (Tx in 1:nTx) {
for (i in idx_4:(idx_4 + n_4[Tx] - 1)) {
target += log_sum_exp(
log(cf_4[Tx]) +
exp_log_S(t_4[i], lambda_4_bg[i]),
log1m(cf_4[Tx]) +
exp_log_S(t_4[i], lambda_4_bg[i]) + exp_log_S(t_4[i], lambda_4[i]));
target += d_4[i] * log_sum_exp(
log(lambda_4_bg[i]),
log1m(cf_4[Tx]) +
exp_lpdf(t_4[i] | lambda_4[i]) - log(cf_4[Tx] + (1 - cf_4[Tx])*exp_Surv(t_4[i], lambda_4[i])));
}
idx_4 = idx_4 + n_4[Tx];
}
}
generated quantities {
real mean_bg;
// real pbeta_bg;
real log_lik = 0;
vector[t_max] S_bg;
vector[t_max] S_1;
vector[nTx] rmst_1;
vector[nTx] median_1;
matrix[t_max, nTx] S_1_pred;
real mean_1;
int idx_1;
real log_lik_1;
// real pbeta_1 = normal_rng(mu_S_1[1], sigma_S_1[1]);
vector[t_max] S_2;
vector[nTx] rmst_2;
vector[nTx] median_2;
matrix[t_max, nTx] S_2_pred;
real mean_2;
int idx_2;
real log_lik_2;
// real pbeta_2 = normal_rng(mu_S_2[1], sigma_S_2[1]);
vector[t_max] S_3;
vector[nTx] rmst_3;
vector[nTx] median_3;
matrix[t_max, nTx] S_3_pred;
real mean_3;
int idx_3;
real log_lik_3;
// real pbeta_3 = normal_rng(mu_S_3[1], sigma_S_3[1]);
vector[t_max] S_4;
vector[nTx] rmst_4;
vector[nTx] median_4;
matrix[t_max, nTx] S_4_pred;
real mean_4;
int idx_4;
real log_lik_4;
// real pbeta_4 = normal_rng(mu_S_4[1], sigma_S_4[1]);
mean_1 = exp(beta_1[1]);
mean_2 = exp(beta_2[1]);
mean_3 = exp(beta_3[1]);
mean_4 = exp(beta_4[1]);
// background rate
if (bg_model == 1) {
mean_bg = exp(beta_bg[1]);
} else {
// mean_bg = 0.001;
mean_bg = mean(h_bg_1);
mean_bg = mean(h_bg_2);
mean_bg = mean(h_bg_3);
mean_bg = mean(h_bg_4);
}
// posterior mean checks
for (j in 1:nTx) {
for (i in 1:t_max) {
S_bg[i] = exp_Surv(i, mean_bg);
S_1[i] = exp_exp_Surv(i, mean_1, mean_bg);
S_1_pred[i, j] = cf_1[j]*S_bg[i] + (1 - cf_1[j])*S_1[i];
}
}
// posterior mean checks
for (j in 1:nTx) {
for (i in 1:t_max) {
S_bg[i] = exp_Surv(i, mean_bg);
S_2[i] = exp_exp_Surv(i, mean_2, mean_bg);
S_2_pred[i, j] = cf_2[j]*S_bg[i] + (1 - cf_2[j])*S_2[i];
}
}
// posterior mean checks
for (j in 1:nTx) {
for (i in 1:t_max) {
S_bg[i] = exp_Surv(i, mean_bg);
S_3[i] = exp_exp_Surv(i, mean_3, mean_bg);
S_3_pred[i, j] = cf_3[j]*S_bg[i] + (1 - cf_3[j])*S_3[i];
}
}
// posterior mean checks
for (j in 1:nTx) {
for (i in 1:t_max) {
S_bg[i] = exp_Surv(i, mean_bg);
S_4[i] = exp_exp_Surv(i, mean_4, mean_bg);
S_4_pred[i, j] = cf_4[j]*S_bg[i] + (1 - cf_4[j])*S_4[i];
}
}
// restricted mean survival time
for (i in 1:nTx) {
rmst_1[i] = cf_1[i]*t_max + (1 - cf_1[i])*rmst_exp(mean_1, t_max);
median_1[i] = median_surv_cf_exp(mean_1, cf_1[i]);
//median_1[i] = median_surv_exp(mean_1); // uncured
}
// restricted mean survival time
for (i in 1:nTx) {
rmst_2[i] = cf_2[i]*t_max + (1 - cf_2[i])*rmst_exp(mean_2, t_max);
median_2[i] = median_surv_cf_exp(mean_2, cf_2[i]);
//median_2[i] = median_surv_exp(mean_2); // uncured
}
// restricted mean survival time
for (i in 1:nTx) {
rmst_3[i] = cf_3[i]*t_max + (1 - cf_3[i])*rmst_exp(mean_3, t_max);
median_3[i] = median_surv_cf_exp(mean_3, cf_3[i]);
//median_3[i] = median_surv_exp(mean_3); // uncured
}
// restricted mean survival time
for (i in 1:nTx) {
rmst_4[i] = cf_4[i]*t_max + (1 - cf_4[i])*rmst_exp(mean_4, t_max);
median_4[i] = median_surv_cf_exp(mean_4, cf_4[i]);
//median_4[i] = median_surv_exp(mean_4); // uncured
}
// likelihood
idx_1 = 1;
for (Tx in 1:nTx) {
for (i in idx_1:(idx_1 + n_1[Tx] - 1)) {
log_lik_1 += log_sum_exp(
log(cf_1[Tx]) +
surv_exp_lpdf(t_1[i] | d_1[i], lambda_1_bg[i]),
log1m(cf_1[Tx]) +
joint_exp_exp_lpdf(t_1[i] | d_1[i], lambda_1[i], lambda_1_bg[i]));
}
idx_1 = idx_1 + n_1[Tx];
}
log_lik = log_lik + log_lik_1;
// likelihood
idx_2 = 1;
for (Tx in 1:nTx) {
for (i in idx_2:(idx_2 + n_2[Tx] - 1)) {
log_lik_2 += log_sum_exp(
log(cf_2[Tx]) +
surv_exp_lpdf(t_2[i] | d_2[i], lambda_2_bg[i]),
log1m(cf_2[Tx]) +
joint_exp_exp_lpdf(t_2[i] | d_2[i], lambda_2[i], lambda_2_bg[i]));
}
idx_2 = idx_2 + n_2[Tx];
}
log_lik = log_lik + log_lik_2;
// likelihood
idx_3 = 1;
for (Tx in 1:nTx) {
for (i in idx_3:(idx_3 + n_3[Tx] - 1)) {
log_lik_3 += log_sum_exp(
log(cf_3[Tx]) +
surv_exp_lpdf(t_3[i] | d_3[i], lambda_3_bg[i]),
log1m(cf_3[Tx]) +
joint_exp_exp_lpdf(t_3[i] | d_3[i], lambda_3[i], lambda_3_bg[i]));
}
idx_3 = idx_3 + n_3[Tx];
}
log_lik = log_lik + log_lik_3;
// likelihood
idx_4 = 1;
for (Tx in 1:nTx) {
for (i in idx_4:(idx_4 + n_4[Tx] - 1)) {
log_lik_4 += log_sum_exp(
log(cf_4[Tx]) +
surv_exp_lpdf(t_4[i] | d_4[i], lambda_4_bg[i]),
log1m(cf_4[Tx]) +
joint_exp_exp_lpdf(t_4[i] | d_4[i], lambda_4[i], lambda_4_bg[i]));
}
idx_4 = idx_4 + n_4[Tx];
}
log_lik = log_lik + log_lik_4;
}
create_stancode <- function(models) {
n_grps <- length(models)
# generate separate blocks of Stan code
stancode <- create_code_skeleton(n_grps)
cf_code <- create_cf_code(n_grps)
latent_model_code <- list()
priorpred_code <- postpred_code <- list()
estimates_code <- list()
loo_code <- loglik_code <- list()
for (i in seq_along(models)) {
latent_model_code[[i]] <- make_latent_model_code(models[i], id = i)
priorpred_code[[i]] <- make_priorpred(models[i], id = i)
postpred_code[[i]] <- make_postpred(models[i], id = i)
estimates_code[[i]] <- make_summary_estimates(models[i], id = i)
loglik_code[[i]] <- make_pop_loglik(models[i], id = i)
# loglik_code[[i]] <- make_loglik(models[i], id = i)
loo_code[[i]] <- make_loo(models[i], id = i)
}
# interleave all model lists
latent_model_code <- rearrange_blocks(latent_model_code)
priorpred_code <- rearrange_blocks(priorpred_code)
postpred_code <- rearrange_blocks(postpred_code)
estimates_code <- rearrange_blocks(estimates_code)
loglik_code <- rearrange_blocks(loglik_code)
loo_code <- rearrange_blocks(loo_code)
scode <- list()
scode$functions <-
c("functions {\n#include /include/distributions.stan\n}\n\n")
# generate data block
scode$data <- paste0(
"data {\n",
stancode$data_def,
cf_code$data_def,
latent_model_code$data_def,
stancode$data_main,
cf_code$data_main,
"\n}\n\n"
)
# generate parameters block
scode$parameters <- paste0(
"parameters {\n",
latent_model_code$parameters,
stancode$parameters,
cf_code$parameters,
"\n}\n\n"
)
# generate transformed parameters block
scode$trans_params <- paste0(
"transformed parameters {\n",
stancode$trans_params_def,
latent_model_code$trans_params_def,
cf_code$trans_params_def,
stancode$trans_params_main,
latent_model_code$trans_params_main,
cf_code$trans_params_main,
"\n}\n\n"
)
# combine likelihood with prior part
scode$model <- paste0(
"model {\n",
stancode$model,
latent_model_code$model,
cf_code$model,
loglik_code,
"\n}\n\n"
)
# generate generated quantities block
scode$generated_quantities <- paste0(
"generated quantities {\n",
stancode$generated_quantities_def,
latent_model_code$generated_quantities_def,
latent_model_code$generated_quantities_main,
stancode$generated_quantities_main,
cf_code$generated_quantities_main,
do.call(paste, postpred_code),
do.call(paste, estimates_code),
do.call(paste, priorpred_code),
do.call(paste, loo_code),
"}\n"
)
# combine all elements into a complete Stan model
paste0(
scode$functions,
scode$data,
scode$parameters,
scode$trans_params,
scode$model,
scode$generated_quantities
)
}
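A hypothetical usage sketch (not from the talk; the file name, include path, and data list are assumed): generate the Stan code for four exponential submodels, write it to file, then compile it with cmdstanr.
models <- c("exp", "exp", "exp", "exp")
stan_code <- create_stancode(models)
stan_file <- file.path(tempdir(), "cure_model.stan")
writeLines(stan_code, stan_file)
library(cmdstanr)
# include_paths should point at the folder containing include/distributions.stan (assumed layout)
mod <- cmdstan_model(stan_file, include_paths = ".")
# fit <- mod$sample(data = stan_data)  # stan_data assembled separately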
make_latent_model_code <- function(model, id = 1L) {
scode <- list()
scode$data_def <-
common_code_event_data(id)
scode$trans_params_def <-
glue("\nvector[N_{id}] lp_{id};\n")
if (model == "exp") {
scode$trans_params_def <-
glue(scode$trans_params_def,
"\n// rate parameters
vector<lower=0>[N_{id}] lambda_{id};\n")
scode$parameters <- ""
scode$trans_params_main <-
glue("\nlambda_{id} = exp(lp_{id});\n")
scode$model <- ""
scode$generated_quantities_def <- ""
scode$generated_quantities_main <-
glue("mean_{id} = exp(beta_{id}[1]);\n")
}
if (model %in% c("gompertz", "loglogistic", "weibull")) {
scode$data_def <-
glue(scode$data_def,
"real<lower=0> a_shape_{id};\n",
"real<lower=0> b_shape_{id};\n")
# "array[0] real x_r; // empty data array\n",
# "array[0] int x_i; // for integration\n")
scode$parameters <-
glue("real<lower=0> shape_{id};\n")
scode$trans_params_def <-
glue(scode$trans_params_def,
"// rate parameters
vector[N_{id}] lambda_{id};\n")
scode$trans_params_main <-
glue("lambda_{id} = exp(lp_{id});\n",
"for (i in 1:num_elements(lambda_{id})) {{\n",
" if (lambda_{id}[i] > 100) {{ // constrain upper limit\n",
" lambda_{id}[i] = 100;\n",
" }\n",
" if (lambda_{id}[i] <= 0) {{ // constrain lower limit\n",
" lambda_{id}[i] = 1e-10;\n",
" }\n",
"}\n")
scode$model <-
glue("shape_{id} ~ gamma(a_shape_{id}, b_shape_{id});\n")
scode$generated_quantities_def <-
glue("real pshape_{id} = gamma_rng(a_shape_{id}, b_shape_{id});\n")
scode$generated_quantities_main <-
glue("mean_{id} = exp(beta_{id}[1]);\n")
}
scode
}
create_code_skeleton <- function(n_grp) {
scode <- list()
ids <- data.frame(id = 1:n_grp)
scode$data_def <-
c(" int<lower=1> nTx;\n")
scode$data_main <-
paste0(
paste("\nint<lower=1, upper=2> bg_model;\n",
"vector[bg_model == 1 ? H_1 : 0] mu_bg;\n",
"vector<lower=0>[bg_model == 1 ? H_1 : 0] sigma_bg;\n", collapse = "\n"),
cglue_data(ids, " vector<lower=0>[bg_model == 2 ? N_{id} : 0] h_bg_{id};\n"),
paste("\n matrix[nTx, nTx] Tx_dmat; // treatment design matrix\n",
"vector[cf_model == 3 ? nTx : 0] mu_alpha; // treatment regression coefficients\n",
"vector<lower=0>[cf_model == 3 ? nTx : 0] sigma_alpha;\n",
"int<lower=0> t_max;\n", collapse = "\n"),
cglue_data(ids,
"vector[cf_model == 2 ? nTx : 0] mu_alpha_{id};
vector<lower=0>[cf_model == 2 ? nTx : 0] sigma_alpha_{id};\n"))
scode$parameters <-
paste0(
paste("// coefficients in linear predictor (including intercept)\n",
"vector[bg_model == 1 ? H_1 : 0] beta_bg;\n",
"vector[cf_model != 2 ? nTx : 0] alpha;\n", collapse = "\n"),
cglue_data(ids, "vector[cf_model == 2 ? nTx : 0] alpha_{id};
vector[H_{id}] beta_{id};\n"), collapse = "\n")
scode$trans_params_def <-
cglue_data(ids,
"vector[N_{id}] lp_{id}_bg;\n
vector<lower=0>[N_{id}] lambda_{id}_bg;\n
")
scode$trans_params_main <-
cglue_data(ids,
"// correlated event times
lp_{id} = X_{id}*beta_{id};
// background survival with uncertainty\n
if (bg_model == 1) {{
lp_{id}_bg = X_{id}*beta_bg;
} else {{
lp_{id}_bg = log(h_bg_{id});
}\n
lambda_{id}_bg = exp(lp_{id}_bg);\n
")
scode$model <-
paste0(
cglue_data(ids, "\nint idx_{id};"),
cglue_data(ids,"
\nbeta_{id} ~ normal(mu_S_{id}, sigma_S_{id});\n"),
paste0("\nif (bg_model == 1) {\n",
"\t beta_bg ~ normal(mu_bg, sigma_bg);\n",
"}\n"), collapse = "\n")
...
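The templates above rely on the glue package; cglue_data appears to be a project-specific helper rather than a glue export. A minimal sketch of what such a helper might do (assumed, not from the source): apply a glue template to each row of the ids data frame and collapse the results into a single string.
library(glue)
# apply a template row-wise over a data frame, then collapse to one string
cglue_data <- function(.data, template, sep = "") {
  paste(glue_data(.data, template), collapse = sep)
}
ids <- data.frame(id = 1:2)
cglue_data(ids, "vector[N_{id}] lp_{id};", sep = "\n")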
Degree of variation: If the differences between your Stan models are truly substantial (e.g., entirely different likelihoods, highly custom priors that fundamentally change the model structure), then code generation might be beneficial.
Commonalities: If there’s a lot of shared structure and only minor tweaks (e.g., changing the number of groups, adding a specific type of covariate), flexible inputs with flags would likely be more maintainable.
Expertise and comfort: If you have strong metaprogramming skills in R and are comfortable with generating code dynamically, it might be a powerful approach. If you prioritize simpler, more direct code that’s easy to read and debug, flexible inputs are probably better.
Performance requirements: For highly performance-critical Stan models, generating optimized code for each specific case might be worth the extra complexity.
Nathan Green | UCL | n.green@ucl.ac.uk