  • DocumentCode
    570692
  • Title
    Comparing methods to extract technical content for technological intelligence
  • Author
    Newman, Nils C. ; Porter, Alan L. ; Newman, David ; Courseault, Cherie ; Bolan, Stephanie D.
  • Author_Institution
    IISC, Atlanta, GA, USA
  • fYear
    2012
  • fDate
    July 29 2012-Aug. 2 2012
  • Firstpage
    1279
  • Lastpage
    1285
  • Abstract
    We are developing indicators for the emergence of science and technology (S&T) topics. We are targeting various S&T information resources, including metadata (i.e., bibliographic information) and full text. We explore alternative text analysis approaches - principal components analysis (PCA) and topic modeling - to extract technical topic information. We analyze the topical content to pursue potential applications and innovation pathways. In this presentation we compare alternative ways of consolidating messy sets of key terms [e.g., using Natural Language Processing (NLP) on abstracts and titles, together with various keyword sets]. Our process includes combinations of stopword removal, fuzzy term matching, association rules, and tf-idf weighting. We compare PCA results to topic modeling results. Our key test set consists of 4104 Web of Science records on Dye-Sensitized Solar Cells (DSSCs). Results suggest good potential to enhance our technical intelligence payoffs from database searches on topics of interest.
  • Keywords
    content-based retrieval; data mining; meta data; principal component analysis; scientific information systems; PCA; S&T information resources; alternative text analysis; association rules; bibliographic information; fuzzy term matching; metadata; principal components analysis; science and technology topics; stopword removal; technical content extraction; technological intelligence; tf-idf weighting; topic modeling; Abstracts; Clustering algorithms; Decision support systems; Electrodes; Films; Photovoltaic cells; Principal component analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2012 Proceedings of PICMET '12: Technology Management for Emerging Technologies
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    978-1-4673-2853-1
  • Type
    conf
  • Filename
    6304150
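
The abstract names stopword removal, fuzzy term matching, and tf-idf weighting as steps in the term-consolidation process. The following is a minimal sketch of how such steps might be chained, not the authors' pipeline: the stopword list, the 0.85 similarity threshold, the toy records, and the function names `tokenize`, `consolidate`, and `tfidf` are all assumptions made for illustration. It uses only the Python standard library, with `difflib.SequenceMatcher` standing in for whatever fuzzy matcher the paper used.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "of", "on", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def consolidate(tokens, vocab, threshold=0.85):
    """Map each token to the first previously seen term it closely resembles.

    `vocab` is a shared list that grows as new terms appear, so variants
    such as "cell" and "cells" collapse onto whichever form was seen first.
    """
    out = []
    for tok in tokens:
        match = next(
            (v for v in vocab if SequenceMatcher(None, tok, v).ratio() >= threshold),
            None,
        )
        if match is None:
            vocab.append(tok)
            match = tok
        out.append(match)
    return out


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    vocab = []
    term_lists = [consolidate(tokenize(d), vocab) for d in docs]
    n_docs = len(term_lists)
    # Document frequency: number of documents containing each term.
    df = Counter(t for terms in term_lists for t in set(terms))
    weights = []
    for terms in term_lists:
        tf = Counter(terms)
        weights.append(
            {t: (c / len(terms)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Toy records, invented for illustration only.
records = [
    "Dye-sensitized solar cells with TiO2 electrodes",
    "Electrode films for dye sensitized solar cell efficiency",
    "Topic modeling of patent abstracts",
]
weights = tfidf(records)
```

In this sketch, terms shared by several records (such as "dye") receive lower weights than terms unique to one record, and near-duplicate spellings are merged before weighting. A real term-cleanup step would likely need a curated stopword list and a more careful matching rule than a single similarity threshold.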