Title :
Nearest neighbor classification using bottom-k sketches
Author :
Dahlgaard, Søren ; Igel, Christian ; Thorup, Mikkel
Abstract :
Bottom-k sketches are an alternative to k×minwise sketches when using hashing to estimate the similarity of documents represented by shingles (or set similarity in general) in large-scale machine learning. They are faster to compute and have nicer theoretical properties. In the case of k×minwise hashing, the bias introduced by hash functions that are not truly random is independent of the number k of hashes, whereas this bias decreases with increasing k when bottom-k sketches are employed. In practice, bottom-k sketches can expedite classification systems if the trained classifiers are applied to many data points with many features (i.e., to many documents encoded by a large number of shingles on average). An advantage of b-bit k×minwise hashing is that it can be efficiently incorporated into machine learning methods relying on scalar products, such as support vector machines (SVMs). Still, experimental results indicate that a nearest neighbor classifier with bottom-k sketches can be preferable to a linear SVM with b-bit k×minwise hashing if the amount of training data is low or the number of features is high.
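For illustration, a minimal Python sketch of bottom-k similarity estimation as commonly described in the literature (the function names, the blake2b hash function, and the toy shingle sets are assumptions for this example, not taken from the paper): each set is hashed once per element, the k smallest hash values are kept, and the Jaccard similarity is estimated as the fraction of the k smallest hashes of the union that occur in both sets.

import hashlib

def bottom_k_sketch(elements, k):
    # Hash every distinct element once and keep the k smallest hash values.
    hashes = sorted(
        int(hashlib.blake2b(e.encode(), digest_size=8).hexdigest(), 16)
        for e in set(elements)
    )
    return hashes[:k]

def jaccard_estimate(sketch_a, sketch_b, k):
    # The k smallest hashes of the union can be recovered by merging the
    # two sketches; the estimate is the fraction of them present in both.
    union_bottom_k = sorted(set(sketch_a) | set(sketch_b))[:k]
    common = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union_bottom_k if h in common) / k

# Toy example with two small shingle sets (hypothetical data).
a = {"the quick", "quick brown", "brown fox"}
b = {"the quick", "quick brown", "brown dog"}
k = 2
print(jaccard_estimate(bottom_k_sketch(a, k), bottom_k_sketch(b, k), k))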
Keywords :
document handling; learning (artificial intelligence); pattern classification; support vector machines; bottom-k sketches; classification systems; document similarity; hash function; k-minwise hashing; large-scale machine learning; linear SVM; nearest neighbor classification; scalar products; support vector machines; Accuracy; Indexes; Kernel; Support vector machines; Training; Training data; Vectors; document encoding; hashing; large-scale machine learning; nearest neighbor classification; set similarity
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691730