• DocumentCode
    2326071
  • Title
    Recovering traceability links in multilingual Web sites
  • Author
    Tonella, Paolo ; Ricca, Filippo ; Pianta, Emanuele ; Girardi, Christian
  • Author_Institution
    Centro per la Ricerca Scientifica e Tecnologica, ITC-irst, Trento, Italy
  • fYear
    2001
  • fDate
    10 Nov. 2001
  • Firstpage
    14
  • Lastpage
    21
  • Abstract
    The problem of verifying the consistency between Web site portions devoted to different languages is investigated. The purpose is to support the activity of the site maintainer, who is responsible for the alignment between different site versions. Anomalies that typically occur in such situations include the absence of pages in some languages, differences in the page structure in different languages, missing information, and untranslated parts. The proposed approach recovers traceability links so as to simplify the update of the site to a consistent state, and is based on a mix of structural and textual information extracted from the page. The syntax trees of the pages to be compared drive the page matching process. When structurally corresponding nodes are encountered during the tree visit, their text attributes are compared to determine whether they are translations of each other.
  • Keywords
    computational linguistics; information resources; language translation; program diagnostics; text analysis; Web site maintainer; Web site portions; Web site update; consistency verification; consistent state; missing information; multilingual Web sites; page matching process; page structure; structurally corresponding nodes; syntax trees; text attributes; textual information; traceability link recovery; tree visit; Face recognition; Web page design
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Site Evolution, 2001. Proceedings. 3rd International Workshop on
  • Print_ISBN
    0-7695-1399-9
  • Type
    conf
  • DOI
    10.1109/WSE.2001.988780
  • Filename
    988780