Section: Software
SimJoin (Distributed Approximate Similarity Join)
Participant: Alexis Joly [contact].
SimJoin is distributed software for the efficient computation of the
full approximate k-nn graph of large collections of high-dimensional
features. It is developed within a MapReduce framework and is
therefore easily portable to large cloud computing platforms. It is
based on recent theoretical contributions related to locality-preserving
hash functions [34]. Its first main feature
is to allow splitting a large collection of high-dimensional features
into highly balanced pages that preserve locality according to any
given similarity kernel. Its second main feature is to build in