Abstract :
This paper discusses the significance of metaphor annotation for Southern Sotho (Sesotho), a resource-scarce Bantu language of South Africa. In doing so, it indicates the need to develop NLP tools for this language and others like it. The challenges that have led to the lack of representation of this language in NLP include the absence of a corpus. Since this project draws on a recently compiled corpus for the language, it was decided that the corpus needs to be annotated in order to further prime language processing programs with its linguistic attributes. There are various reasons to annotate a corpus and various ways of doing so; this paper focuses on metaphor annotation. This kind of annotation can be performed at word level or at phrase/sentence level. For this project, the corpus will be annotated at word level. Another type of annotation that the corpus requires is word class tagging, which is the classification of word senses into their respective lexical classes. At first glance, metaphor annotation at word level may seem to be the same as word class tagging. However, metaphor annotation does not only assign a word to its part-of-speech category; it also includes semantic interpretation, and thereby simultaneously disambiguates word senses. In addition, the computational requirements for such annotation will be examined. Word class tagging with conventional processing tools has already presented challenges, and these indicate that the difficulty is due to the internal structure of Sesotho. Metaphor annotation, however, promises to open the way to other forms of language processing for Sesotho, which will in the long run make the language less tedious to process and easier for computer programs to recognize.
Keywords :
natural language processing; pattern classification; semantic networks; text analysis; word processing; NLP; Sesotho text corpus; linguistic attribute; metaphor annotation; resource-scarce Bantu language representation; semantic interpretation; word sense classification; word sense disambiguation; Animals; Computers; Manuals; Ontologies; Pragmatics; Semantics; NLP Tools; Resource-scarce languages; Southern Sotho;