• DocumentCode
    2250573
  • Title
    The quantification of unstructured information and its use in predictive modeling
  • Author
    Dumrong, Prae; Gould, Jared; Lee, Greg; Nicholson, Logan; Gao, Kelly; Beling, Peter; Blume, Matthias; Robinson, Jeff
  • Author_Institution
    Dept. of Syst. & Inf. Eng., Virginia Univ., Charlottesville, VA, USA
  • fYear
    2003
  • fDate
    24-25 April 2003
  • Firstpage
    225
  • Lastpage
    232
  • Abstract
    Managing text-based information is crucial when trying to extract valuable information from documents. Assigning a numerical value to text-based (unstructured) information is one way to extract that value. This research studied the quantification of unstructured text and its forecasting power. To examine how unstructured information relates to predictive models, the Beige Books were used to investigate and predict changes in the U.S. economy. The Beige Books describe current economic conditions and discuss fluctuations in real gross domestic product (GDP). To quantify the text-based unstructured information, the direct scoring algorithm (DSA) was proposed. It uses keywords in the document and their subjectively determined numerical weights to score individual sentences. Statistical analyses were then conducted to determine which sections of the Beige Books contributed the most significant information to the prediction of GDP. Using the significant sections, a linear regression model was constructed to predict future GDP growth. The adjusted R² values of the DSA model were compared with those obtained when an economic expert scored the same documents. The comparison demonstrated that the DSA model using the Beige Books contributed significantly to the prediction of GDP and explained a similar amount of variance as the expert's scores. A comparison between a structured predictive model and the DSA model was also conducted to further demonstrate the significance of text-based information.
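    The keyword-weighting idea behind the direct scoring algorithm, followed by the adjusted R² used to compare models, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword list and weights are invented, the sentence splitter is naive, the section score is assumed to be the mean sentence score, the regression uses a single predictor, and every helper name (score_sentence, score_section, fit_simple_ols, adjusted_r_squared) is hypothetical.

```python
import re

# Hypothetical keyword weights for illustration only; the paper's actual
# keywords and subjectively determined weights are not given in this record.
KEYWORD_WEIGHTS = {
    "increased": 1.0,
    "improved": 1.0,
    "strong": 2.0,
    "decreased": -1.0,
    "declined": -1.0,
    "weak": -2.0,
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, weights=KEYWORD_WEIGHTS):
    """Sum the weights of every keyword occurrence in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(weights.get(tok, 0.0) for tok in tokens)


def score_section(text, weights=KEYWORD_WEIGHTS):
    """Assumed aggregation: mean sentence score over one section."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(score_sentence(s, weights) for s in sentences) / len(sentences)


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

    For example, under these toy weights score_section("Retail sales increased. Manufacturing was strong.") returns 1.5, the mean of sentence scores 1.0 and 2.0. The abstract describes a regression on several significant sections, so a closer version would replace fit_simple_ols with a multiple-regression fit over one score per section and pass the number of section predictors as p to adjusted_r_squared.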
  • Keywords
    economic indicators; information retrieval; regression analysis; text analysis; word processing; Beige books; direct scoring algorithm; linear regression model; predictive modeling; real gross domestic product; statistical analyses; text-based information; unstructured information quantification; Books; Data mining; Economic forecasting; Economic indicators; Fluctuations; Information management; Linear regression; Power generation economics; Predictive models; Statistical analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Systems and Information Engineering Design Symposium, 2003 IEEE
  • Print_ISBN
    0-9744559-0-3
  • Type
    conf
  • DOI
    10.1109/SIEDS.2003.158028
  • Filename
    1242423