• DocumentCode
    2008053
  • Title

    Missing Data Imputation in Longitudinal Cohort Studies: Application of PLANN-ARD in Breast Cancer Survival

  • Author

    Fernandes, Ana S. ; Jarman, Ian H. ; Etchells, Terence A. ; Fonseca, José M. ; Biganzoli, Elia ; Bajdik, Chris ; Lisboa, Paulo J G

  • Author_Institution
    Fac. de Cienc. e Tecnologia, Univ. Nova de Lisboa, Lisboa
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    644
  • Lastpage
    649
  • Abstract
    Missing values are common in medical datasets and may be amenable to data imputation when modelling a given data set or validating on an external cohort. This paper discusses model averaging over samples of the imputed distribution and extends this approach to generic non-linear modelling with the Partial Logistic Artificial Neural Network (PLANN) regularised within the evidence-based framework with Automatic Relevance Determination (ARD). The study then applies the imputation to external validation over new patient cohorts, considering also the case of predictions made for individual patients. A prognostic index is defined for the non-linear model and validation results show that 4 statistically significant risk groups identified at the 95% level of confidence from the modelling data, from Christie Hospital (n=931), retain good separation during external validation with data from the British Columbia Cancer Agency (n=4,083).
  • Keywords
    cancer; data analysis; mammography; medical computing; medical information systems; neural nets; British Columbia Cancer Agency; Christie Hospital; PLANN-ARD; automatic relevance determination; breast cancer survival; evidence-based framework; external cohort; generic nonlinear modelling; imputed distribution; longitudinal cohort study; medical datasets; missing data imputation; modelling data; new patient cohorts; partial logistic artificial neural network; prognostic index; Breast cancer; Hospitals; Logistics; Machine learning; Medical treatment; Metastasis; Predictive models; Recruitment; Training data; Tumors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type

    conf

  • DOI
    10.1109/ICMLA.2008.106
  • Filename
    4725043