Non-standard stringology: Algorithms and complexity

S. Muthukrishnan, K. Palem

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Non-standard stringology concerns string matching problems, wherein a position in the "text" (of size n) matches one in the "pattern)) (of size m), based on very general relationships between the corresponding "symbols". For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the "match graph" defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the "dominating" cliques in the match graph, and the "clique edge covers/ partitions" in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.

    Original languageEnglish (US)
    Title of host publicationProceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994
    PublisherAssociation for Computing Machinery
    Pages770-779
    Number of pages10
    ISBN (Electronic)0897916638
    DOIs
    StatePublished - May 23 1994
    Event26th Annual ACM Symposium on Theory of Computing, STOC 1994 - Montreal, Canada
    Duration: May 23 1994May 25 1994

    Publication series

    NameProceedings of the Annual ACM Symposium on Theory of Computing
    VolumePart F129502
    ISSN (Print)0737-8017

    Conference

    Conference26th Annual ACM Symposium on Theory of Computing, STOC 1994
    Country/TerritoryCanada
    CityMontreal
    Period5/23/945/25/94

    ASJC Scopus subject areas

    • Software

    Fingerprint

    Dive into the research topics of 'Non-standard stringology: Algorithms and complexity'. Together they form a unique fingerprint.

    Cite this