DocumentCode :
3614599
Title :
Vocabulary size prediction of Croatian texts
Author :
T. Miroslav;M. Nives;B. Damir
Author_Institution :
Dept. of Inf. Sci., Zagreb Univ., Croatia
fYear :
2003
fDate :
2003
Firstpage :
223
Lastpage :
228
Abstract :
Preliminary research on the vocabulary size of Croatian lexical corpora shows that the distribution of types is regular and that deviations of the calculated values are within theoretically acceptable limits. The research also led us to the conclusion that Zipf's law is not applicable to the Croatian language because lexical density differs between languages: the proportion of types to tokens varies from language to language, so the parameters of that proportion need to be calculated for each language separately.
Keywords :
"Vocabulary","Natural languages","Density measurement","Natural language processing","Particle measurements","Size measurement","Differential equations","Terminology","Information technology"
Publisher :
IEEE
Conference_Titel :
Information Technology Interfaces, 2003. ITI 2003. Proceedings of the 25th International Conference on
ISSN :
1330-1012
Print_ISBN :
953-96769-6-7
Type :
conf
DOI :
10.1109/ITI.2003.1225349
Filename :
1225349