Repeated split sample validation to assess logistic regression and recursive partitioning: an application to the prediction of cognitive impairment

Stat Med. 2005 Oct 15;24(19):3019-35. doi: 10.1002/sim.2154.

Abstract

Screening strategies play an important part in the identification and diagnosis of illness. Testing such strategies in a clinical trial can have important implications for the treatment of these illnesses. Before such a trial, however, it is important to develop a practical screening/classification procedure that accurately predicts the presence of the illness in question. Recently published studies have shown a growing preference for classification tree/recursive partitioning procedures.

This paper compares the application of logistic regression and recursive partitioning to a neuropsychological data set of 252 patients recruited from four Veterans Affairs Medical Centers. Logistic regression and recursive partitioning were used to predict cognitive impairment in 12 randomly selected exploratory/validation samples. We assessed the effect of sampling on variable selection and predictive accuracy.

The predictive accuracy of the logistic regression and recursive partitioning procedures was comparable across the exploratory samples but varied across the validation samples. Based on shrinkage (the loss in predictive accuracy from the exploratory to the validation samples), both classification procedures performed equally well for the prediction of cognitive impairment across the 12 samples. Although logistic regression provided an estimated probability of the outcome for each patient, it required several mathematical calculations to do so; it did, however, select one or two fewer predictors than recursive partitioning with comparable predictive accuracy. Recursive partitioning, on the other hand, readily identified patient characteristics and variable interactions, was easy to interpret clinically, and required no mathematical calculations. There was a high degree of overlap in the predictor variables selected by the two procedures.

In the context of neuropsychological screening, logistic regression and recursive partitioning performed equally well and were quite stable in selecting predictors for the identification of patients with cognitive impairment, although recursive partitioning may be easier to use in a clinical setting because it is based on a simple decision tree.
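
The abstract describes repeated split-sample validation only at a high level. The following is a minimal sketch of the general idea, not the authors' code, and it rests on several assumptions that the abstract does not state: a 50/50 exploratory/validation split, synthetic data in place of the 252-patient data set, shrinkage expressed as exploratory accuracy minus validation accuracy, a 0.5 probability cut-off, and two simplified stand-ins for the study's procedures (logistic regression fitted by plain gradient descent, and a depth-2 Gini-impurity tree for recursive partitioning). It performs no variable selection. All function names (`fit_logistic`, `fit_tree`, `repeated_split_validation`, `make_synthetic`) are hypothetical.

```python
import math
import random
from statistics import mean

def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted by batch gradient descent on the log-loss (simplified stand-in)."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept

    def score(x):
        return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(xi)) - yi  # derivative of the log-loss w.r.t. the score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    # Assumed rule: predict "impaired" (1) when the estimated probability is at least 0.5.
    return lambda x: 1 if sigmoid(score(x)) >= 0.5 else 0

def gini(labels):
    # Gini impurity of a list of 0/1 labels.
    p = sum(labels) / len(labels) if labels else 0.0
    return 2 * p * (1 - p)

def fit_tree(X, y, max_depth=2, min_leaf=5):
    """Recursive partitioning: greedy threshold splits minimising weighted Gini impurity."""
    majority = 1 if 2 * sum(y) >= len(y) else 0
    if max_depth == 0 or len(set(y)) == 1:
        return lambda x: majority
    best = None  # (impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({xi[f] for xi in X}):
            lo = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            hi = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if len(lo) < min_leaf or len(hi) < min_leaf:
                continue
            imp = (len(lo) * gini(lo) + len(hi) * gini(hi)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, f, t)
    if best is None:
        return lambda x: majority
    _, feat, thr = best
    go_left = [xi[feat] <= thr for xi in X]
    left = fit_tree([xi for xi, g in zip(X, go_left) if g],
                    [yi for yi, g in zip(y, go_left) if g], max_depth - 1, min_leaf)
    right = fit_tree([xi for xi, g in zip(X, go_left) if not g],
                     [yi for yi, g in zip(y, go_left) if not g], max_depth - 1, min_leaf)
    return lambda x: left(x) if x[feat] <= thr else right(x)

def accuracy(predict, X, y):
    return sum(predict(xi) == yi for xi, yi in zip(X, y)) / len(y)

def repeated_split_validation(X, y, fit, n_repeats=12, explore_frac=0.5, seed=0):
    """Refit on each random exploratory subset; score it there and on the held-out rest."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    results = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = int(explore_frac * len(idx))
        ex, va = idx[:cut], idx[cut:]
        model = fit([X[i] for i in ex], [y[i] for i in ex])
        acc_ex = accuracy(model, [X[i] for i in ex], [y[i] for i in ex])
        acc_va = accuracy(model, [X[i] for i in va], [y[i] for i in va])
        results.append((acc_ex, acc_va, acc_ex - acc_va))  # last entry: assumed shrinkage
    return results

def make_synthetic(n=252, seed=1):
    """Synthetic stand-in data with three hypothetical test scores (NOT the study's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        X.append(x)
        y.append(1 if 1.5 * x[0] - x[1] + rng.gauss(0, 0.8) > 0 else 0)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic()
    for name, fit in [("logistic regression", fit_logistic),
                      ("recursive partitioning", fit_tree)]:
        res = repeated_split_validation(X, y, fit)
        print(f"{name}: exploratory {mean(r[0] for r in res):.3f}, "
              f"validation {mean(r[1] for r in res):.3f}, "
              f"shrinkage {mean(r[2] for r in res):.3f}")
```

The two fitted models also mirror the abstract's usability contrast: the logistic model must evaluate a weighted sum and a logistic transform for every patient, whereas the tree classifies a patient by following at most `max_depth` threshold comparisons.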

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cognition Disorders / diagnosis*
  • Decision Trees*
  • Humans
  • Logistic Models*
  • Models, Psychological*
  • Predictive Value of Tests
  • Psychological Tests
  • ROC Curve