• DocumentCode
    3486521
  • Title

    Improving Formula Analysis with Line and Mathematics Identification

  • Author

    Alkalai, Mohamed ; Baker, Josef B. ; Sorge, Volker ; Lin, Xiaoyan

  • Author_Institution
    Sch. of Comput. Sci., Univ. of Birmingham, Birmingham, UK
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    334
  • Lastpage
    338
  • Abstract
    The explosive growth of the internet and electronic publishing has made a huge number of scientific documents available to users. However, these documents are usually inaccessible to those with visual impairments and are often only partially compatible with software and modern hardware such as tablets and e-readers. In this paper we revisit Maxtract, a tool for analysing and converting documents into accessible formats, and combine it with two advanced segmentation techniques: statistical line identification and machine learning formula identification. We show how these techniques improve the quality of both Maxtract's underlying document analysis and its output. We re-run and compare experimental results over a number of datasets, presenting a qualitative review of the improved output and drawing conclusions.
  • Keywords
    document image processing; electronic publishing; image segmentation; learning (artificial intelligence); statistical analysis; Internet; Maxtract document analysis; e-readers; electronic publishing; electronic readers; formula analysis; machine learning formula identification; mathematics identification; segmentation techniques; statistical line identification; tablets; Accuracy; Feature extraction; Histograms; Layout; Portable document format; Text recognition; Math formula recognition; formula identification; line segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2013.74
  • Filename
    6628639