The string edit distance matching problem with moves

Graham Cormode, S. Muthukrishnan

    Research output: Contribution to journalArticlepeer-review

    Abstract

    The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t.We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).

    Original languageEnglish (US)
    Article number1219947
    JournalACM Transactions on Algorithms
    Volume3
    Issue number1
    DOIs
    StatePublished - Feb 1 2007

    Keywords

    • Approximate pattern matching
    • Data streams
    • Edit distance
    • Embedding
    • Similarity search
    • String matching

    ASJC Scopus subject areas

    • Mathematics (miscellaneous)

    Fingerprint Dive into the research topics of 'The string edit distance matching problem with moves'. Together they form a unique fingerprint.

    Cite this