• DocumentCode
    178626
  • Title

    Playscript Classification and Automatic Wikipedia Play Articles Generation

  • Author

    Banerjee, S. ; Caragea, C. ; Mitra, P.

  • Author_Institution
    Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    3630
  • Lastpage
    3635
  • Abstract
    In this work, we aim to create Wikipedia pages on plays automatically by extracting relevant information from various web sources. Our approach involves building an efficient classifier that identifies which web documents are play scripts. From the documents correctly classified as play scripts, we extract play-related information and use it to obtain additional information from various sources on the web. This information is aggregated, and human-readable Wikipedia pages are created using a bot. Our experiments show that classifiers trained on a combination of our designed features and "bag-of-words" (bow) features outperform classifiers trained using only bow features. Our results further show that good-quality, human-readable pages can be created using our bot. Such an automatic page generation process can eventually lead to a more complete Wikipedia.
  • Keywords
    Internet; Web sites; Web documents; Web sources; Wikipedia pages; articles generation; automatic Wikipedia; automatic page generation process; bag-of-words features; human readable pages; playscript classification; Data mining; Electronic publishing; Encyclopedias; Feature extraction; Radio frequency;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type

    conf

  • DOI
    10.1109/ICPR.2014.624
  • Filename
    6977336