• DocumentCode
    3487906
  • Title

    Automatic Detection of Pseudocodes in Scholarly Documents Using Machine Learning

  • Author

    Tuarob, Suppawong ; Bhatia, Sumit ; MITRA, PINAKI ; Giles, C. Lee

  • Author_Institution
    Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    738
  • Lastpage
    742
  • Abstract
    A significant number of scholarly articles in computer science and other disciplines contain algorithms that provide concise descriptions for solving a wide variety of computational problems. For example, Dijkstra's algorithm describes how to find the shortest path between two nodes in a graph. Automatic identification and extraction of these algorithms from scholarly digital documents would enable automatic algorithm indexing, searching, analysis and discovery. An algorithm search engine, which identifies pseudocodes in scholarly documents and makes them searchable, has been implemented as a part of the CiteSeerX suite. Here, we illustrate the limitations of the state-of-the-art rule-based pseudocode detection approach, and present a novel set of machine-learning-based techniques that extend previous methods.
  • Keywords
    document handling; electronic publishing; indexing; information retrieval; learning (artificial intelligence); search engines; CiteSeer suite; algorithm search engine; automatic algorithm analysis; automatic algorithm discovery; automatic algorithm extraction; automatic algorithm identification; automatic algorithm indexing; automatic algorithm searching; automatic pseudocode detection; computer science; machine learning; pseudocode identification; scholarly articles; scholarly digital documents; Algorithm design and analysis; Approximation algorithms; Feature extraction; Machine learning algorithms; Portable document format; Software algorithms; Standards; Algorithm; Classification; Experiment; Machine Learning; Pseudocode;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2013.151
  • Filename
    6628716