Standard errors in case–cohort studies with small sample sizes and adjustment for prognostic covariates

RSS International Conference 2024

1 References

Estimation and variance methods:

Prentice, R. L. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11 (1986).

Self, S. G. & Prentice, R. L. Asymptotic Distribution Theory and Efficiency Results for Case-Cohort Studies. The Annals of Statistics 16, 64–81 (1988).

Barlow, W. E. Robust Variance Estimation for the Case-Cohort Design. Biometrics 50, 1064–1072 (1994).

Barlow, W. E., Ichikawa, L., Rosner, D. & Izumi, S. Analysis of Case-Cohort Designs. Journal of Clinical Epidemiology 52, 1165–1172 (1999).

Lin, D. Y. & Ying, Z. Cox Regression with Incomplete Covariate Measurements. Journal of the American Statistical Association 88, 1341–1349 (1993).

Therneau, T. M. & Li, H. Computing the Cox Model for Case Cohort Designs. Lifetime Data Analysis 5, 99–112 (1999).

Review of case-cohort studies:

Sharp, S. J., Poulaliou, M., Thompson, S. G., White, I. R. & Wood, A. M. A Review of Published Analyses of Case-Cohort Studies and Recommendations for Future Reporting. PLoS One 9, e101176 (2014).

Multiple imputation and bootstrap:

Keogh, R. H. & White, I. R. Using full-cohort data in nested case–control and case–cohort studies by multiple imputation. Statistics in Medicine 32, 4021–4043 (2013).

Keogh, R. H., Seaman, S. R., Bartlett, J. W. & Wood, A. M. Multiple imputation of missing data in nested case-control and case-cohort studies. Biometrics 74, 1438–1449 (2018).

Huang, Y. Bootstrap for the case-cohort design. Biometrika 101, 465–476 (2014).

Other useful references:

Jiao, J. Q. Comparison of variance estimators in case-cohort studies. (University of Southern California, 2002).

Kulathinal, S., Karvanen, J., Saarela, O. & Kuulasmaa, K. Case-cohort design in practice – experiences from the MORGAM Project. Epidemiologic Perspectives & Innovations 4, 15 (2007).

Onland-Moret, N. C. et al. Analysis of case-cohort data: A comparison of different methods. Journal of Clinical Epidemiology 60, 350–355 (2007).

2 Abstract

Measuring covariates for all participants in a large cohort study can be prohibitively expensive or impractical. In a case–cohort design, covariates are measured only in a subcohort (often a simple random sample of the full cohort) and in some or all of the cases. This design can be more efficient and more flexible than a nested case–control study; for example, the same subcohort can serve as the comparison group for several different outcomes. Case–cohort studies that estimate exposure–disease associations will usually also need to adjust for prognostic covariates and confounders.
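A minimal sketch of how a case–cohort sample can be drawn from a simulated cohort is given below. It assumes that the subcohort is a simple random sample without replacement from the whole cohort, drawn irrespective of case status, and that all cases are then added; the helper name `sample_case_cohort` is hypothetical and this is not the authors' simulation code.

```python
import numpy as np


def sample_case_cohort(event, subcohort_size, rng):
    """Draw a case-cohort sample from a full cohort (illustrative sketch).

    event: boolean array over the full cohort, True for cases.
    subcohort_size: number of individuals in the random subcohort.
    rng: a numpy.random.Generator.
    Returns (selected, in_subcohort), both boolean arrays over the cohort.
    """
    event = np.asarray(event, dtype=bool)
    n = event.size
    # Subcohort: simple random sample without replacement from the whole
    # cohort, drawn regardless of whether an individual later becomes a case.
    idx = rng.choice(n, size=subcohort_size, replace=False)
    in_subcohort = np.zeros(n, dtype=bool)
    in_subcohort[idx] = True
    # Assumption: covariates are measured in the subcohort plus ALL cases.
    selected = in_subcohort | event
    return selected, in_subcohort


rng = np.random.default_rng(2024)
event = rng.random(10_000) < 0.05          # roughly 5% of the cohort are cases
selected, in_sub = sample_case_cohort(event, 500, rng)
print(in_sub.sum(), event.sum(), selected.sum())
```

Here the sampling fraction of the subcohort is 500 / 10,000 = 0.05; it is this fraction that the weighting methods use to up-weight subcohort members when estimating hazard ratios.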

Hazard ratios can be estimated by fitting a Cox proportional hazards model under one of several weighting methods, each of which yields a pseudo-partial likelihood that is maximised in place of the usual partial likelihood. Standard errors based on the naïve variance estimator (the inverse of the information matrix) overstate the precision of these estimators, because the pseudo-partial likelihood is not a true likelihood and does not account for the sampling of the subcohort. Several variance estimators have been proposed, including asymptotic and robust (approximate jackknife) estimators. Both are based on dfbeta residuals, which are computed from the score residuals and the inverse of the information matrix and approximate the change in the parameter estimates when a single observation is removed.
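As a minimal sketch of the robust (approximate jackknife) estimator described above, the hypothetical helper below assumes that the per-individual weighted score residuals U (an n × p matrix) and the information matrix I have already been obtained from a fitted case–cohort Cox model. It forms the dfbeta residuals D = U I^-1 and returns the naïve variance I^-1 alongside the robust variance D'D. It does not implement the asymptotic estimator or any particular weighting method.

```python
import numpy as np


def dfbeta_variances(score_resid, information):
    """Naive and robust (approximate jackknife) variances from dfbetas.

    score_resid: (n, p) per-individual (weighted) score residuals at the
        pseudo-partial-likelihood estimate; a 1-D array means p = 1.
    information: (p, p) information matrix at the same estimate.
    Returns (dfbeta, naive_var, robust_var).
    """
    U = np.asarray(score_resid, dtype=float)
    if U.ndim == 1:
        U = U[:, None]                      # single covariate: make it (n, 1)
    info = np.atleast_2d(np.asarray(information, dtype=float))
    info_inv = np.linalg.inv(info)
    # Row i approximates the change in beta-hat when individual i is deleted
    # (software differs in sign convention; the variance is unaffected).
    dfbeta = U @ info_inv
    naive_var = info_inv                    # inverse information
    robust_var = dfbeta.T @ dfbeta          # sum of outer products of dfbetas
    return dfbeta, naive_var, robust_var


d, naive, robust = dfbeta_variances([1.0, -1.0, 2.0], 4.0)
print(d.ravel(), naive, robust)
```

With a single covariate, I = 4 and score residuals (1, -1, 2), the dfbetas are (0.25, -0.25, 0.5), the naïve variance is 0.25 and the robust variance is 0.0625 + 0.0625 + 0.25 = 0.375.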

Using simulations, we investigate the performance of hazard ratio estimators, and of standard errors derived from the asymptotic and robust variance estimators, across different sample sizes and covariate patterns. We consider ranges of subcohort sizes and numbers of cases that emulate real study designs, and examine both covariates that are independent of exposure and covariates that confound the exposure–disease association. We demonstrate substantial bias in the standard errors estimated by these methods when adjusting for highly prognostic covariates in scenarios with a small subcohort. We explore alternative approaches to calculating standard errors in case–cohort studies and make recommendations for the planning and analysis of future studies.
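In simulation studies, bias in standard errors is commonly judged by comparing the average estimated standard error with the empirical standard deviation of the estimates across replicates. The sketch below uses a hypothetical helper `se_performance` and assumes a normal-theory 95% confidence interval. It computes these quantities together with the bias of the log hazard ratio estimate and the interval coverage; it is an illustrative summary, not the authors' simulation code.

```python
import numpy as np


def se_performance(beta_hat, se_hat, beta_true, z=1.959963984540054):
    """Summarise estimator and SE performance over simulation replicates.

    beta_hat: estimated log hazard ratios, one per replicate.
    se_hat: estimated standard errors, one per replicate.
    beta_true: log hazard ratio used to generate the data.
    z: normal quantile for the (assumed) Wald confidence interval.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se_hat = np.asarray(se_hat, dtype=float)
    empirical_se = beta_hat.std(ddof=1)     # SD of estimates across replicates
    mean_model_se = se_hat.mean()           # average estimated SE
    lower = beta_hat - z * se_hat
    upper = beta_hat + z * se_hat
    return {
        "bias": beta_hat.mean() - beta_true,
        "empirical_se": empirical_se,
        "mean_model_se": mean_model_se,
        # Negative values: SEs too small on average, so precision is overstated.
        "relative_se_error_pct": 100.0 * (mean_model_se / empirical_se - 1.0),
        "coverage": np.mean((lower <= beta_true) & (beta_true <= upper)),
    }


print(se_performance([0.1, 0.3, 0.5], [0.1, 0.1, 0.1], 0.3))
```

In this toy example the empirical standard deviation is 0.2 while every estimated SE is 0.1, so the relative SE error is about -50% and only one of the three intervals covers the true value.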