DocumentCode
3055386
Title
Zipf’s law of burstiness in Turkish: The length of intervals between repetitions
Author
Kocabas, Ilker ; Kisla, Tarik ; Karaoglan, Bahar
Author_Institution
Ege Univ., Izmir
fYear
2007
fDate
7-9 Nov. 2007
Firstpage
1
Lastpage
3
Abstract
Zipf's law of burstiness of content words has been studied less than his laws describing the relation between the rank and the frequency of words. Zipf counted the number of intervals of the same length between repetitions of words belonging to the same frequency class and, on a 260,000-word English corpus, showed empirically that the number of intervals, F, having a given size is inversely related to that interval size, I, between successive occurrences of a word: F ∝ I^(-p), where p varied between 1 and 1.3. In this study we investigated the validity of the law of burstiness on a Turkish corpus of size 55,000 and found p varying between 0.5 and 0.8.
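The counting procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `interval_histogram` and `fit_exponent`, the definition of an interval as the difference between token positions of successive occurrences, and the ordinary least-squares fit on log-log axes are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def interval_histogram(tokens, freq_class=None):
    """Count how many repetition intervals of each size occur.

    Assumption: an interval is the difference between the token positions
    of two successive occurrences of the same word. If freq_class is given,
    only words occurring exactly that many times are pooled.
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    histogram = Counter()
    for occurrences in positions.values():
        if freq_class is not None and len(occurrences) != freq_class:
            continue
        for previous, current in zip(occurrences, occurrences[1:]):
            histogram[current - previous] += 1
    return histogram


def fit_exponent(histogram):
    """Estimate p in F ∝ I^(-p) by least squares on log F versus log I.

    Assumption: a plain log-log regression is used here; the paper's own
    fitting method is not stated in the abstract.
    """
    xs = [math.log(size) for size in histogram]
    ys = [math.log(count) for count in histogram.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct interval sizes")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of the fitted line is -p, so negate it.
    return -(sxy / sxx)
```

Calling `interval_histogram(tokens, freq_class=k)` on a tokenized corpus and passing the result to `fit_exponent` would give one estimate of p per frequency class k, which could then be compared with the ranges reported above.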
Keywords
natural language processing; English corpus; Turkish word burstiness; Zipf law; content word; Books; Differential equations; Frequency measurement; Indexing; Information retrieval; Length measurement; Mathematical model; Measurement standards; Measurement units; Natural language processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Sciences, 2007. ISCIS 2007. 22nd International Symposium on
Conference_Location
Ankara
Print_ISBN
978-1-4244-1363-8
Electronic_ISBN
978-1-4244-1364-5
Type
conf
DOI
10.1109/ISCIS.2007.4456847
Filename
4456847