Abstract
A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications. The projection mapping of the compound database (described as vectors in the high-dimensional space of chemical descriptors) is based on the singular value decomposition (SVD) combined with a minimization procedure implemented with the efficient truncated-Newton program package (TNPACK). Numerical experiments on four chemical datasets with real-valued descriptors (ranging from 58 to 27 255 compounds) show that the SVD/TNPACK projection duo achieves reasonable accuracy in 2D: between 30% and about 100% of pairwise distance segments lie within 10% of the original distances. The lowest percentages, which correspond to scaled datasets, can be brought close to 100% by projecting onto a 10-dimensional space instead. We also show that the SVD/TNPACK duo is efficient for minimizing the distance error objective function (especially for scaled datasets), and that TNPACK is much more efficient than the currently popular steepest descent minimization approach in this application context. Applications of our projection technique to similarity and diversity sampling in drug design can be envisioned.
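The two-stage idea in the abstract (an SVD projection used as the starting point for a distance-error minimization) can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' implementation: TNPACK is not part of SciPy, so SciPy's `Newton-CG` method (a truncated Newton-type solver) stands in for it; the objective, a sum of squared differences between squared low-dimensional and original distances, is one plausible form of a "distance error" function; the helper names `svd_projection`, `distance_error`, `refine`, and `fraction_within` are hypothetical; and the data are synthetic random vectors, not the chemical datasets from the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform


def svd_projection(X, k=2):
    """Project the centered rows of X onto the top-k singular directions."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def distance_error(y_flat, D2, n, k):
    """Assumed objective: 0.25 * sum over pairs of (d_low^2 - d_orig^2)^2.

    Returns the value and its analytic gradient (flattened).
    """
    Y = y_flat.reshape(n, k)
    diff = Y[:, None, :] - Y[None, :, :]
    R = (diff ** 2).sum(axis=-1) - D2      # squared-distance residuals, zero diagonal
    value = 0.125 * (R ** 2).sum()         # R counts each pair twice: 0.25 / 2
    grad = R.sum(axis=1)[:, None] * Y - R @ Y
    return value, grad.ravel()


def refine(X, k=2, max_iter=200):
    """SVD start, then Newton-CG refinement (stand-in for TNPACK)."""
    n = X.shape[0]
    D2 = squareform(pdist(X, "sqeuclidean"))
    Y0 = svd_projection(X, k)
    res = minimize(distance_error, Y0.ravel(), args=(D2, n, k), jac=True,
                   method="Newton-CG", options={"maxiter": max_iter})
    return Y0, res.x.reshape(n, k)


def fraction_within(Y, X, tol=0.10):
    """Fraction of pairwise distances in Y within tol of those in X."""
    d_orig = pdist(X)
    d_low = pdist(Y)
    return float(np.mean(np.abs(d_low - d_orig) <= tol * d_orig))


# Synthetic stand-in data: 40 "compounds" with 10 real-valued descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
Y_svd, Y_ref = refine(X, k=2)
print("within 10% (SVD only):", fraction_within(Y_svd, X))
print("within 10% (refined): ", fraction_within(Y_ref, X))
```

The `fraction_within` helper mirrors the accuracy measure quoted in the abstract, the share of pairwise distances that land within 10% of their original values. The dense `n x n` residual matrix keeps the sketch short but would not scale to the largest dataset sizes mentioned above without a more memory-conscious formulation.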
Original language | English (US) |
---|---|
Pages (from-to) | 167-177 |
Number of pages | 11 |
Journal | Journal of Chemical Information and Computer Sciences |
Volume | 40 |
Issue number | 1 |
State | Published - 2000 |
ASJC Scopus subject areas
- General Chemistry
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics