The string edit distance matching problem with moves

Graham Cormode, S. Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    Abstract

    The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. A well-known dynamic programming algorithm takes time O(nm) to solve this problem, and it is an important open problem in Combinatorial Pattern Matching to significantly improve this bound. We relax the problem so that (a) we allow an additional operation, namely, substring moves, and (b) we approximate the string edit distance up to a factor of O(log n log∗ n). Our result is a near linear time deterministic algorithm for this version of the problem. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique we call Edit Sensitive Parsing (ESP). This embedding is approximately distance preserving, and we show many applications of this embedding to string proximity problems including nearest neighbors, outliers, and streaming computations with strings.
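For orientation, the O(nm) baseline mentioned in the abstract can be sketched as below. This is only the classical dynamic program for matching a pattern against all substrings of a text. It does not handle substring moves and is not the ESP-based algorithm of the paper. The function name `edit_distance_matching` is an illustrative assumption, not taken from the source.

```python
def edit_distance_matching(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Classical O(n*m) dynamic program (inserts, deletes, changes only).
    Hypothetical helper for illustration; not the paper's algorithm.
    """
    m = len(pattern)
    # prev[i]: best distance between pattern[:i] and a substring of the
    # text ending at the previous text position. Before any text is read,
    # pattern[:i] can only be matched by deleting all i characters.
    prev = list(range(m + 1))
    best = prev[m]  # matching against the empty substring costs m
    for c in text:
        # curr[0] = 0 because a match may start at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            change = 0 if pattern[i - 1] == c else 1
            curr[i] = min(
                prev[i - 1] + change,  # match or change a character
                prev[i] + 1,           # extra text character
                curr[i - 1] + 1,       # extra pattern character
            )
        best = min(best, curr[m])
        prev = curr
    return best


if __name__ == "__main__":
    print(edit_distance_matching("abc", "xxabdxx"))
```

The table is filled one text character at a time, so the sketch uses O(m) space while still taking O(nm) time, which is the bound the paper sets out to beat for its relaxed version of the problem.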

    Original language: English (US)
    Title of host publication: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2002
    Publisher: Association for Computing Machinery
    Pages: 667-676
    Number of pages: 10
    ISBN (Electronic): 089871513X
    State: Published - 2002
    Event: 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2002 - San Francisco, United States
    Duration: Jan 6 2002 to Jan 8 2002

    Publication series

    Name: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
    Volume: 06-08-January-2002

    Other

    Other: 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2002
    Country: United States
    City: San Francisco
    Period: 1/6/02 to 1/8/02

    ASJC Scopus subject areas

    • Software
    • Mathematics(all)


  • Cite this

    Cormode, G., & Muthukrishnan, S. (2002). The string edit distance matching problem with moves. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2002 (pp. 667-676). (Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms; Vol. 06-08-January-2002). Association for Computing Machinery.