• DocumentCode
    3318125
  • Title
    Hybrid framework for information extraction for geographical terms in Hindi language texts
  • Author
    Dutta, Kamlesh ; Prakash, Nupur ; Kaushik, Saroj
  • Author_Institution
    Nat. Inst. of Technol., Hamirpur, India
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    577
  • Lastpage
    581
  • Abstract
    A hybrid information extraction (IE) framework based on a geographical term detection approach has been developed to extract geographical information from unrestricted Hindi text. The relationship between the extracted geographical entities and the adjacent text is shown graphically so that information about these entities can be related. The system, a combination of statistically and linguistically motivated techniques, identifies single geographical names as well as multiple geographical names. The method is applied to Hindi language text, but the approach can also be adapted for other languages. The paper presents some experiments illustrating the accuracy of the method. The system is at a prototype stage and will be extended to include relation mark-up as well.
  • Keywords
    geography; grammars; information retrieval; linguistics; natural languages; text analysis; Hindi language texts; geographical term detection; information extraction; statistical techniques; Biological materials; Biology; Biomedical materials; Biomedical monitoring; Data mining; Finance; Internet; Prototypes; Research and development
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598803
  • Filename
    1598803