TY - JOUR
T1 - Cofe
T2 - A scalable method for feature extraction from complex objects
AU - Hristescu, Gabriela
AU - Farach-Colton, Martin
PY - 2000
Y1 - 2000
N2 - Feature Extraction, also known as Multidimensional Scaling, is a basic primitive associated with indexing, clustering, nearest neighbor searching and visualization. We consider the problem of feature extraction when the data-points are complex and the distance evaluation function is very expensive to evaluate. Examples of expensive distance evaluations include those for computing the Hausdorff distance between polygons in a spatial database, or the edit distance between macromolecules in a DNA or protein database. We propose Cofe, a method for sparse feature extraction which is based on novel random non-linear projections. We evaluate Cofe on real data and find that it performs very well in terms of quality of features extracted, number of distances evaluated, number of database scans performed and total run time.We further propose Cofe-GR, which matches Cofe in terms of distance evaluations and run-time, but outperforms it in terms of quality of features extracted.
AB - Feature Extraction, also known as Multidimensional Scaling, is a basic primitive associated with indexing, clustering, nearest neighbor searching and visualization. We consider the problem of feature extraction when the data-points are complex and the distance evaluation function is very expensive to evaluate. Examples of expensive distance evaluations include those for computing the Hausdorff distance between polygons in a spatial database, or the edit distance between macromolecules in a DNA or protein database. We propose Cofe, a method for sparse feature extraction which is based on novel random non-linear projections. We evaluate Cofe on real data and find that it performs very well in terms of quality of features extracted, number of distances evaluated, number of database scans performed and total run time.We further propose Cofe-GR, which matches Cofe in terms of distance evaluations and run-time, but outperforms it in terms of quality of features extracted.
UR - http://www.scopus.com/inward/record.url?scp=84947610412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947610412&partnerID=8YFLogxK
U2 - 10.1007/3-540-44466-1_36
DO - 10.1007/3-540-44466-1_36
M3 - Article
AN - SCOPUS:84947610412
SN - 0302-9743
VL - 1874
SP - 358
EP - 371
JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -