• DocumentCode
    3204908
  • Title
    An efficient method for in memory construction of suffix arrays
  • Author
    Itoh, Hideo ; Tanaka, Hozumi
  • Author_Institution
    Software Res. Center, Ricoh Co. Ltd., Japan
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    81
  • Lastpage
    88
  • Abstract
    The suffix array is a string-indexing structure and a memory-efficient alternative to the suffix tree. It has many advantages for text processing. We propose an efficient algorithm for sorting suffixes, which we call the two-stage suffix sort. One of our ideas is to exploit the specific relationships between adjacent suffixes. Our algorithm makes it possible to use the suffix array for much larger texts and suggests new areas of application. Our experiments on several text data sets (including 514 MB of Japanese newspaper text) demonstrate that our algorithm is 4.5 to 6.9 times faster than Quicksort, and 2.5 to 3.6 times faster than K. Sadakane's (1998) algorithm, which is considered to be the fastest algorithm in previous work.
  • Keywords
    sorting; string matching; text analysis; Japanese newspapers; Quicksort; adjacent suffixes; in memory construction; larger texts; memory efficient alternative; string-indexing structure; suffix array; suffix arrays; suffix tree; text data sets; text processing; two-stage suffix sort; Automata; Costs; Data structures; Dictionaries; Indexing; Natural language processing; Natural languages; Personal communication networks; Sorting; Tail
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware
  • Conference_Location
    Cancun
  • Print_ISBN
    0-7695-0268-7
  • Type
    conf
  • DOI
    10.1109/SPIRE.1999.796581
  • Filename
    796581