Language Edit Distance and Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms and Connection to Fundamental Graph Problems

Abstract:

Given a context-free grammar G over an alphabet Σ and a string s ∈ Σ*, the language edit distance problem seeks the minimum number of edits (insertions, deletions and substitutions) required to convert s into a valid member of the language L(G). The well-known dynamic programming algorithm solves this problem in O(n³) time (ignoring grammar size), where n is the string length [Aho, Peterson 1972, Myers 1985]. Despite its numerous applications, to date there exists no algorithm that computes the language edit distance, exactly or approximately, in truly subcubic time. In this paper we give the first such algorithm, and it computes language edit distance almost optimally. For any ε > 0, our algorithm runs in Õ(n^ω/poly(ε)) time and, with high probability, returns an estimate within a multiplicative factor of (1 + ε), where ω is the exponent of ordinary matrix multiplication of n × n matrices. It also computes the actual edits required. We further solve the local alignment problem: for all substrings of s, we estimate their language edit distance within a (1 ± ε) factor in Õ(n^ω/poly(ε)) time with high probability. Next, we design the first subcubic (Õ(n^ω)) algorithm that, given an arbitrary stochastic context-free grammar and a string, returns a nearly optimal maximum likelihood parse of that string. Stochastic context-free grammars significantly generalize hidden Markov models; they lie at the foundation of statistical natural language processing and have found widespread applications in many other fields. To complement our upper bounds, we show that computing the maximum likelihood parse of a stochastic grammar, or the language edit distance, exactly in truly subcubic time would imply a truly subcubic algorithm for all-pairs shortest paths, a long-standing open question. Such an algorithm would be a breakthrough for a large range of graph and matrix problems, owing to their subcubic equivalence with all-pairs shortest paths.
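As a point of reference for the O(n³) baseline mentioned above, here is a minimal sketch of a cubic dynamic program for language edit distance. It assumes the grammar is in Chomsky normal form with no ε-productions (so every nonterminal derives only non-empty strings), and the names `language_edit_distance`, `terminal_rules`, `binary_rules` and the example grammar are hypothetical. It is a textbook-style CYK variant, not the Aho and Peterson or Myers algorithm as published, and not the paper's subcubic approximation. The table entry for span s[i:j] and nonterminal A holds the minimum number of edits turning s[i:j] into some string derived from A.

```python
import math

INF = math.inf


def language_edit_distance(s, terminal_rules, binary_rules, start):
    """Minimum insertions/deletions/substitutions turning s into a string of L(G).

    Assumes G is in Chomsky normal form with no epsilon-productions:
      terminal_rules: dict mapping nonterminal A to the set of terminals a with A -> a
      binary_rules:   list of triples (A, B, C) meaning A -> B C
    """
    n = len(s)
    nonterminals = set(terminal_rules)
    for rule in binary_rules:
        nonterminals.update(rule)

    # D[(i, j)][A] = min edits turning s[i:j] into some string derived from A.
    D = {}
    for length in range(n + 1):          # shorter spans first (empty spans included)
        for i in range(n - length + 1):
            j = i + length
            cell = {A: INF for A in nonterminals}
            # A -> a: keep one character (matched or substituted), delete the rest;
            # on an empty span, insert a.
            for A, terminals in terminal_rules.items():
                for a in terminals:
                    cost = 1 if length == 0 else length - 1 + (a not in s[i:j])
                    cell[A] = min(cell[A], cost)
            D[(i, j)] = cell
            # A -> B C: split s[i:j] at k. Splits k == i or k == j read this same
            # cell (one child is inserted wholesale), so relax until nothing changes.
            # Costs are non-negative integers that only decrease, so this terminates.
            changed = True
            while changed:
                changed = False
                for A, B, C in binary_rules:
                    best = min(D[(i, k)][B] + D[(k, j)][C] for k in range(i, j + 1))
                    if best < cell[A]:
                        cell[A] = best
                        changed = True
    return D[(0, n)][start]


# Example grammar (hypothetical): non-empty balanced parentheses,
# S -> S S | L R | L X,  X -> S R,  L -> "(",  R -> ")".
DYCK_TERMINALS = {"L": {"("}, "R": {")"}}
DYCK_BINARY = [("S", "S", "S"), ("S", "L", "R"), ("S", "L", "X"), ("X", "S", "R")]

print(language_edit_distance("(()", DYCK_TERMINALS, DYCK_BINARY, "S"))  # prints 1
```

For a fixed grammar, each of the O(n²) spans tries O(n) split points, giving O(n³) time up to grammar-dependent factors (including the relaxation rounds), which is the exact cubic computation that the (1 + ε)-approximation above avoids.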
By a known lower bound result [Lee 2002] and a recent development [Abboud et al. 2015], even the much simpler problem of parsing a context-free grammar requires Ω(n^ω) time. Therefore, nontrivial multiplicative approximation algorithms for either of the two problems running in o(n^ω) time are unlikely to exist.
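The lower-bound statements above rest on two different matrix products. Lee's parsing bound concerns Boolean matrix multiplication, which reduces to ordinary matrix multiplication and so runs in O(n^ω) time, while all-pairs shortest paths is tied to the (min, +) distance product: APSP reduces to about log₂ n distance products by repeated squaring, and conversely. The sketch below is a minimal, naive-cubic illustration of both products; the function names are hypothetical, and it shows only the definitions, not fast algorithms or the paper's reductions.

```python
import math

INF = math.inf


def min_plus_product(A, B):
    """Distance product: C[i][j] = min over k of A[i][k] + B[k][j] (naive O(n^3))."""
    inner, cols = len(B), len(B[0])
    return [[min((row[k] + B[k][j] for k in range(inner)), default=INF)
             for j in range(cols)] for row in A]


def boolean_product(A, B):
    """Boolean product: C[i][j] = OR over k of (A[i][k] AND B[k][j]) (naive O(n^3))."""
    inner, cols = len(B), len(B[0])
    return [[any(row[k] and B[k][j] for k in range(inner))
             for j in range(cols)] for row in A]


def apsp_by_squaring(W):
    """All-pairs shortest paths via repeated min-plus squaring.

    W[i][j] is the edge weight (INF if absent); assumes no negative cycles.
    After t squarings, D covers all paths with at most 2**t edges.
    """
    n = len(W)
    D = [[0 if i == j else W[i][j] for j in range(n)] for i in range(n)]
    hops = 1
    while hops < n - 1:                  # simple paths have at most n - 1 edges
        D = min_plus_product(D, D)
        hops *= 2
    return D
```

Replacing (min, +) by (OR, AND) is what lets Boolean products be computed through fast ring multiplication in O(n^ω) time; no truly subcubic algorithm is known for the distance product.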