A generic tool to perform automatic clustering of scanned images
Anyone who needs to group a large set of images so that images in the same group are more similar to each other than to those in other groups, for instance in incoming mail processing.
Two kinds of methods have been implemented. The first method applies optical character recognition to the pages. Distances are then computed between the images to be classified and the images in a database of labeled images. The second method randomly selects a pool of images from a directory. For each image, invariant key points are extracted and characteristic features (SIFT or SURF) are computed to build the clusters.
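As a rough illustration of the first method, the sketch below assigns a page to the class of its nearest labeled neighbor, given text that an OCR step has already produced. The description above does not say which distance the tool uses, so the cosine distance over word counts is an assumption, and the names `word_counts`, `cosine_distance` and `classify`, as well as the sample texts, are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Return lowercased bag-of-words counts for a piece of OCR output."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors.

    Assumption: the real tool may use a different distance.
    """
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty page shares nothing with any other page.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def classify(ocr_text, labeled_texts):
    """Return the label of the labeled text closest to ocr_text.

    labeled_texts is a list of (label, text) pairs standing in for
    the database of labeled images (hypothetical layout).
    """
    query = word_counts(ocr_text)
    nearest = min(
        labeled_texts,
        key=lambda item: cosine_distance(query, word_counts(item[1])),
    )
    return nearest[0]


# Hypothetical labeled database and incoming page.
labeled = [
    ("invoice", "invoice number total amount due payment"),
    ("letter", "dear sir regards sincerely yours"),
]
print(classify("Invoice total: amount due 120 EUR", labeled))
```

A nearest-neighbor rule keeps the sketch short; with a larger labeled database, a vote among the k closest pages would likely be more robust to OCR noise.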
Any POSIX-compliant system