Efficient Document Indexing Using Pivot Tree

Gaurav Singh; Benjamin Piwowarski

Report (Research Report) Year: 2016


Gaurav Singh

Role: Author

Benjamin Piwowarski

Role: Author
PersonId : 9362
IdHAL : benjamin-piwowarski
ORCID : 0000-0001-6792-3262
IdRef : 226846601

Bases de Données

Abstract

We present a novel method for efficiently retrieving the top-k neighbors of documents represented in a high-dimensional term space, using cosine similarity. Documents are usually stored as bag-of-words tf-idf vectors, and one of the most widely used measures of similarity between two documents is the cosine similarity of their vector representations. Cosine similarity, however, is not a metric distance measure because it does not satisfy the triangle inequality, so most metric search methods cannot be applied directly. We propose an efficient method for indexing documents using a pivot tree that leads to efficient retrieval. We also study the relationship between precision and efficiency for the proposed method and compare it with a state-of-the-art method for document search based on the inner product.
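The claim about the triangle inequality can be checked on a tiny example. The sketch below is not the pivot tree method of the report; it only illustrates, under the common assumption that cosine "distance" is taken as 1 minus cosine similarity, that this quantity can violate the triangle inequality, and it shows the exhaustive top-k search that an index is meant to avoid. The function names and toy vectors are hypothetical and chosen for illustration.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed conversion of similarity to a dissimilarity: 1 - cos(u, v)."""
    return 1.0 - cosine_similarity(u, v)


def top_k_brute_force(query, documents, k):
    """Exhaustive baseline: score every document, keep the k most similar."""
    scored = ((cosine_similarity(query, doc), i) for i, doc in enumerate(documents))
    return heapq.nlargest(k, scored)


# Toy 2-dimensional "documents" (hypothetical data).
a = (1.0, 0.0)
b = (1.0, 1.0)
c = (0.0, 1.0)

d_ac = cosine_distance(a, c)  # orthogonal vectors
d_ab = cosine_distance(a, b)  # 45 degrees apart
d_bc = cosine_distance(b, c)  # 45 degrees apart

# A metric would require d(a, c) <= d(a, b) + d(b, c); here that fails.
violates_triangle_inequality = d_ac > d_ab + d_bc

# Indices of the 2 documents most similar to a, best first.
neighbors = [i for _, i in top_k_brute_force(a, [a, b, c], k=2)]
```

The brute-force search scores every document for each query, which is the cost that tree-based indexes aim to reduce by pruning candidates.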

Domains

Computer Science [cs] / Information Retrieval [cs.IR]


https://hal.sorbonne-universite.fr/hal-01358681

Submitted on: Thursday, 1 September 2016, 11:25:37

Last modified on: Tuesday, 11 April 2023, 15:16:28

Dates and versions

hal-01358681 , version 1 (01-09-2016)

Identifiers

HAL Id : hal-01358681 , version 1
ARXIV : 1605.06693

Cite

Gaurav Singh, Benjamin Piwowarski. Efficient Document Indexing Using Pivot Tree. [Research Report] Sorbonne Universités, UPMC Univ Paris 06, CNRS, LIP6 UMR 7606. 2016. ⟨hal-01358681⟩


Collections

UPMC CNRS LIP6 LARA SORBONNE-UNIVERSITE SU-SCIENCES

