Mining database structure; Or, how to build a data quality browser

Tamraparni Dasu, Theodore Johnson, S. Muthukrishnan, Vladislav Shkapenyuk

    Research output: Contribution to journal › Conference article

    Abstract

    Data mining research typically assumes that the data to be analyzed has been identified, gathered, cleaned, and processed into a convenient form. While data mining tools greatly enhance the ability of the analyst to make data-driven discoveries, most of the time spent performing an analysis goes into identifying, gathering, cleaning, and processing the data. Similarly, schema mapping tools have been developed to help automate the task of using legacy or federated data sources for a new purpose, but they assume that the structure of the data sources is well understood. However, the data sets to be federated may come from dozens of databases containing thousands of tables and tens of thousands of fields, with little reliable documentation about primary or foreign keys. We are developing a system, Bellman, that performs data mining on the structure of the database. In this paper, we present techniques for quickly identifying which fields have similar values, identifying join paths, estimating join directions and sizes, and identifying structures in the database. The results of the database structure mining allow the analyst to make sense of the database content. This information can be used, for example, to prepare data for data mining, to find foreign key joins for schema mapping, or to identify steps that will keep the database from collapsing under the weight of its complexity.
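    The abstract does not say how field similarity, join direction, or join size are computed, so the following is only a sketch under stated assumptions, not Bellman's implementation. It uses min-hash signatures, a standard technique for estimating the Jaccard resemblance of two value sets, to find fields with similar values; a containment-plus-uniqueness heuristic to guess which field is a foreign key referencing the other; and a textbook uniform-frequency formula for join size. All function and column names, the 0.9 threshold, and the signature length k = 256 are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set


def _seeded_hash(value: str, seed: int) -> int:
    """64-bit BLAKE2b hash of value; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(values: Iterable[str], k: int = 256) -> List[int]:
    """Min-hash signature of a column: the minimum hash of its distinct values, per seed."""
    distinct = set(values)
    if not distinct:
        raise ValueError("cannot build a signature for an empty column")
    return [min(_seeded_hash(v, seed) for v in distinct) for seed in range(k)]


def estimated_resemblance(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing minima estimates |A ∩ B| / |A ∪ B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def estimated_containment(r: float, distinct_a: int, distinct_b: int) -> float:
    """Estimate |A ∩ B| / |A| from resemblance r and the distinct-value counts.

    Solving r = I / (|A| + |B| - I) for the intersection size gives
    I = r * (|A| + |B|) / (1 + r).
    """
    intersection = r * (distinct_a + distinct_b) / (1 + r)
    return min(1.0, intersection / distinct_a)


def containment(a: Set[str], b: Set[str]) -> float:
    """Exact fraction of A's distinct values that also occur in B (0.0 if A is empty)."""
    return len(a & b) / len(a) if a else 0.0


def guess_fk_direction(a: List[str], b: List[str], threshold: float = 0.9) -> str:
    """'a->b' if column a looks like a foreign key into key column b, 'b->a' for the reverse."""
    set_a, set_b = set(a), set(b)
    if len(set_b) == len(b) and containment(set_a, set_b) >= threshold:
        return "a->b"  # b has no duplicates and holds (almost) every value of a
    if len(set_a) == len(a) and containment(set_b, set_a) >= threshold:
        return "b->a"  # a has no duplicates and holds (almost) every value of b
    return "none"


def estimated_join_size(n_a: int, d_a: int, n_b: int, d_b: int, shared: float) -> float:
    """Equi-join size assuming every distinct value repeats uniformly within each column.

    n_* are row counts, d_* distinct counts, shared the number of common distinct values.
    """
    return shared * (n_a / d_a) * (n_b / d_b)


# Hypothetical columns: orders.customer_id is meant to reference customers.id.
customer_ids = [f"c{i}" for i in range(1, 101)]           # 100 rows, all distinct
order_customer_ids = [f"c{i}" for i in range(1, 51)] * 4  # 200 rows, 50 distinct

r = estimated_resemblance(
    minhash_signature(order_customer_ids), minhash_signature(customer_ids)
)
print(f"estimated resemblance: {r:.2f} (exact: 0.50)")
print(guess_fk_direction(order_customer_ids, customer_ids))  # a->b
print(estimated_join_size(200, 50, 100, 100, shared=50))     # 200.0
```

    A signature has a fixed length k regardless of column size, so in a database with tens of thousands of fields each signature can be computed once per field and compared pairwise cheaply; the standard error of the resemblance estimate is about sqrt(r(1 - r)/k).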

    Original language: English (US)
    Pages (from-to): 240-251
    Number of pages: 12
    Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
    State: Published - 2002
    Event: ACM SIGMOD 2002 Proceedings of the ACM SIGMOD International Conference on Management of Data - Madison, WI, United States
    Duration: Jun 3 2002 - Jun 6 2002

    ASJC Scopus subject areas

    • Software
    • Information Systems
