DocumentCode :
1694117
Title :
Bootstrapping a Comparable Corpus from Patent Family Members
Author :
Lupu, Mihai
Author_Institution :
ESTeam AB, Vienna Univ. of Technol., Sweden, Austria
fYear :
2012
Firstpage :
144
Lastpage :
148
Abstract :
We present a method to generate comparable corpora from different patent documents covering the same invention. We rely on the fact that many inventors apply for protection in more than one jurisdiction. Often, these jurisdictions have different publication languages, and therefore the same invention is described in more than one language. We use this fact to generate comparable corpora in any language pair for which patent documents are available. We do this at the level of the title, abstract, description and claims, and present statistics for the English-Spanish data thus generated. We then show that an additional filtering step reduces the errors that the automated procedure introduces into the collection.
Keywords :
document handling; natural language processing; patents; English-Spanish data; automated procedure; comparable corpus bootstrapping; patent documents; patent family members; publication languages; Abstracts; Databases; Law; Manuals; Patents; Technological innovation; comparable corpora; patent
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database and Expert Systems Applications (DEXA), 2012 23rd International Workshop on
Conference_Location :
Vienna
ISSN :
1529-4188
Print_ISBN :
978-1-4673-2621-6
Type :
conf
DOI :
10.1109/DEXA.2012.60
Filename :
6327417