string constraint solving refers to solving combinatorial problems involving constraints over string variables. stringsolving approaches have become popular over the past few years given the massive use of strings in...
详细信息
string constraint solving refers to solving combinatorial problems involving constraints over string variables. stringsolving approaches have become popular over the past few years given the massive use of strings in different application domains like formal analysis, automated testing, database query processing, and cybersecurity. This article reports a comprehensive survey on string constraint solving by exploring the large number of approaches that have been proposed over the past few decades to solve stringconstraints.
constraintsolving is an essential technique for detecting vulnerabilities in programs, since it can reason about input sanitization and validation operations performed on user inputs. However, real-world programs typ...
详细信息
ISBN:
(纸本)9781538638682
constraintsolving is an essential technique for detecting vulnerabilities in programs, since it can reason about input sanitization and validation operations performed on user inputs. However, real-world programs typically contain complex string operations that challenge vulnerability detection. State-of-the-art stringconstraint solvers support only a limited set of string operations and fail when they encounter an unsupported one;this leads to limited effectiveness in finding vulnerabilities. In this paper we propose a search-driven constraintsolving technique that complements the support for complex string operations provided by any existing stringconstraint solver. Our technique uses a hybrid constraintsolving procedure based on the Ant Colony Optimization meta-heuristic. The idea is to execute it as a fallback mechanism, only when a solver encounters a constraint containing an operation that it does not support. We have implemented the proposed search-driven constraintsolving technique in the ACO-Solver tool, which we have evaluated in the context of injection and XSS vulnerability detection for Java Web applications. We have assessed the benefits and costs of combining the proposed technique with two state-of-the-art constraint solvers (Z3-str2 and CVC4). The experimental results, based on a benchmark with 104 constraints derived from nine realistic Web applications, show that our approach, when combined in a state-of-the-art solver, significantly improves the number of detected vulnerabilities (from 4.7% to 71.9% for Z3-str2, from 85.9% to 100.0% for CVC4), and solves several cases on which the solver fails when used stand-alone (46 more solved cases for Z3-str2, and 11 more for CVC4), while still keeping the execution time affordable in practice.
Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short of deep crawling. A major roadblock is the crawlers' limited abil...
详细信息
ISBN:
(纸本)9798400700507
Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short of deep crawling. A major roadblock is the crawlers' limited ability to pass input validation checks when web applications require data of a certain format, such as email, phone number, or zip code. This paper develops Black Ostrich, a principled approach to deep web crawling and scanning. The key idea is to equip web crawling with string constraint solving capabilities to dynamically infer suitable inputs from regular expression patterns in web applications and thereby pass input validation checks. To enable this use of constraint solvers, we develop new automata-based techniques to process JavaScript regular expressions. We implement our approach extending and combining the Ostrich constraint solver with the Black Widow web crawler. We evaluate Black Ostrich on a set of 8,820 unique validation patterns gathered from over 21,667,978 forms from a combination of the July 2021 Common Crawl and Tranco top 100K. For these forms and reconstructions of input elements corresponding to the patterns, we demonstrate that Black Ostrich achieves a 99% coverage of the form validations compared to an average of 36% for the state-of-the-art scanners. Moreover, out of the 66,377 domains using these patterns, we solve all patterns on 66,309 (99%) while the combined efforts of the other scanners cover 52,632 (79%). We further show that our approach can boost coverage by evaluating it on three open-source applications. Our empirical studies include a study of email validation patterns, where we find that 213 (26%) out of the 825 found email validation patterns liberally admit XSS injection payloads.
Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx) such as JavaScript, feature non-standard semantics of operators (e.g. greedy/lazy Kleene star...
详细信息
Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx) such as JavaScript, feature non-standard semantics of operators (e.g. greedy/lazy Kleene star), as well as additional features such as capturing groups and references. While symbolic execution of programs containing RegExes appeals to string solvers natively supporting important features of RegEx, such a string solver is hitherto missing. In this paper, we propose the first string theory and string solver that natively provides such support. The key idea of our string solver is to introduce a new automata model, called prioritized streaming string transducers (PSST), to formalize the semantics of RegEx-dependent string functions. PSSTs combine priorities, which have previously been introduced in prioritized finite-state automata to capture greedy/lazy semantics, with string variables as in streaming string transducers to model capturing groups. We validate the consistency of the formal semantics with the actual JavaScript semantics by extensive experiments. Furthermore, to solve the stringconstraints, we show that PSSTs enjoy nice closure and algorithmic properties, in particular, the regularity-preserving property (i.e., pre-images of regular constraints under PSSTs are regular), and introduce a sound sequent calculus that exploits these properties and performs propagation of regular constraints by means of taking post-images or pre-images. Although the satisfiability of the stringconstraint language is generally undecidable, we show that our approach is complete for the so-called straightline fragment. We evaluate the performance of our string solver on over 195 000 stringconstraints generated from an open-source RegEx library. The experimental results show the efficacy of our approach, drastically improving the existing methods (via symbolic execution) in both precision and efficiency.
暂无评论