Title of article :
Untangling Herdan's law and Heaps' law: Mathematical and informetric arguments
Author/Authors :
Leo Egghe
Issue Information :
Monthly; serial year 2007
Pages :
8
From page :
702
To page :
709
Abstract :
Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary size is a concave, increasing power law of text size. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows, by presenting exact formulas from Lotkaian informetrics, that the total number T of sources is not only a function of the total number A of items but also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A) variability to increasing samples (e.g., in a text, as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ≈ L·A^θ, where θ is a constant with θ < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, θ is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
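The central relation in the abstract is Heaps'/Herdan's law, T ≈ L·A^θ. The following Python sketch is only a rough empirical illustration of that relation, not the paper's Lotkaian derivation: it draws items from a hypothetical Zipf-like source distribution, counts distinct sources T for increasing sample sizes A, and estimates θ from the slope of log T against log A. The vocabulary size, sample sizes, and random seed are illustrative assumptions.

```python
import math
import random

# Minimal sketch (assumptions, not the paper's method): illustrate
# Heaps'/Herdan's law T ~ L * A**theta by sampling items from a
# Zipf-like source distribution and counting distinct sources.

random.seed(0)

# Hypothetical finite vocabulary with rank-frequency weights ~ 1/rank.
V = 50_000
weights = [1.0 / r for r in range(1, V + 1)]

def distinct_count(total_items: int) -> int:
    """Draw `total_items` items and return T, the number of distinct sources."""
    sample = random.choices(range(V), weights=weights, k=total_items)
    return len(set(sample))

# T(A) for increasing sample sizes A, then estimate theta as the
# least-squares slope of log T versus log A.
sizes = [1_000, 5_000, 25_000, 125_000]
points = [(math.log(a), math.log(distinct_count(a))) for a in sizes]

n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
theta = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)

# In this toy setting the estimate comes out below 1, i.e. T grows
# concavely with A, in line with the abstract's claim that theta < 1.
print(f"estimated theta ~ {theta:.3f}")
```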
Journal title :
Journal of the American Society for Information Science and Technology
Serial Year :
2007
Record number :
993484