检索结果-内蒙古大学图书馆

A scalable learning algorithm for data-driven program analysis

INFORMATION AND SOFTWARE TECHNOLOGY 2018年 104卷 1-13页

作者： Cha, Sooyoung Jeong, Sehun Oh, Hakjoo Korea Univ Dept Comp Sci & Engn Seoul South Korea

Context: Recently data-driven program analysis has emerged as a promising approach for building cost-effective static analyzers. The ideal static analyzer should apply accurate but costly techniques only when they benefit. However, designing such a strategy for real-world programs is highly nontrivial and requires labor-intensive work. The goal of data-driven program analysis is to automate this process by learning the strategy from data through a learning algorithm. Objective: Current learning algorithms for data-driven program analysis are not scalable enough to be used with large codebases. The objective of this paper is to overcome this shortcoming and present a new algorithm that is able to efficiently learn a strategy from large codebases. Method: The key idea is to use an oracle and transform the existing blackbox learning problem into a whitebox one that is much easier to solve. The oracle quantifies the relative importance of each part of the program with respect to the analysis precision. The oracle can be obtained by running the most and least precise analyses only once over the codebase. Results: Our learning algorithm is much faster than the existing algorithms while producing high quality strategies. The evaluation is done with 140 open-source C programs, comprising of 2.1 MLoC in total. Learning at this large scale was previously impractical. Conclusion: Our work advances the state-of-the-art of data-driven program analysis by addressing the scalability issue of the existing learning algorithm. Our technique will make the data-driven approach more practical in the real-world.

关键词： data-driven program analysis Learning algorithm

来源：评论

学校读者我要写书评

暂无评论

A Machine-Learning Algorithm with Disjunctive Model for data-driven program analysis

引用

ACM TRANSACTIONS ON programMING LANGUAGES AND SYSTEMS 2019年第2期41卷 13-13页

作者： Jeon, Minseok Jeong, Sehun Cha, Sungdeok Oh, Hakjoo Korea Univ Dept Comp Sci & Engn 145 Anam Ro Seoul 02841 South Korea

We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simple-minded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.

关键词： data-driven program analysis static analysis context-sensitivity flow-sensitivity

来源：评论

学校读者我要写书评

暂无评论

Precise and Scalable Points-to analysis via data-driven Context Tunneling

引用

PROCEEDINGS OF THE ACM ON programMING LANGUAGES-PACMPL 2018年第OOPSLA期2卷 1–29页

作者： Jeon, Minseok Jeong, Sehun Oh, Hakjoo Korea Univ Dept Comp Sci & Engn 145 Anam Ro Seoul 02841 South Korea

We present context tunneling, a new approach for making k-limited context-sensitive points-to analysis precise and scalable. As context-sensitivity holds the key to the development of precise and scalable points-to analysis, a variety of techniques for context-sensitivity have been proposed. However, existing approaches such as k-call-site-sensitivity or k-object-sensitivity have a significant weakness that they unconditionally update the context of a method at every call-site, allowing important context elements to be overwritten by more recent, but not necessarily more important, context elements. In this paper, we show that this is a key limiting factor of existing context-sensitive analyses, and demonstrate that remarkable increase in both precision and scalability can be gained by maintaining important context elements only. Our approach, called context tunneling, updates contexts selectively and decides when to propagate the same context without modification. We attain context tunneling via a data-driven approach. The effectiveness of context tunneling is very sensitive to the choice of important context elements. Even worse, precision is not monotonically increasing with respect to the ordering of the choices. As a result, manually coming up with a good heuristic rule for context tunneling is extremely challenging and likely fails to maximize its potential. We address this challenge by developing a specialized data-driven algorithm, which is able to automatically search for high-quality heuristics over the non-monotonic space of context tunneling. We implemented our approach in the Doop framework and applied it to four major flavors of contextsensitivity: call-site-sensitivity, object-sensitivity, type-sensitivity, and hybrid context-sensitivity. In all cases, 1-context-sensitive analysis with context tunneling far outperformed deeper context-sensitivity with k = 2 in both precision and scalability.

关键词： Points-to analysis Context-sensitive analysis data-driven program analysis

来源：评论

学校读者我要写书评

暂无评论

Automatically Generating Features for Learning program analysis Heuristics for C-Like Languages

引用

PROCEEDINGS OF THE ACM ON programMING LANGUAGES-PACMPL 2017年第OOPSLA期1卷 1-25页

作者： Chae, Kwonsoo Oh, Hakjoo Heo, Kihong Yang, Hongseok Korea Univ Seoul South Korea Seoul Natl Univ Seoul South Korea Univ Oxford Oxford England

We present a technique for automatically generating features for data-driven program analyses. Recently data-driven approaches for building a program analysis have been developed, which mine existing codebases and automatically learn heuristics for finding a cost-effective abstraction for a given analysis task. Such approaches reduce the burden of the analysis designers, but they do not remove it completely;they still leave the nontrivial task of designing so called features to the hands of the designers. Our technique aims at automating this feature design process. The idea is to use programs as features after reducing and abstracting them. Our technique goes through selected program-query pairs in codebases, and it reduces and abstracts the program in each pair to a few lines of code, while ensuring that the analysis behaves similarly for the original and the new programs with respect to the query. Each reduced program serves as a boolean feature for program-query pairs. This feature evaluates to true for a given program-query pair when (as a program) it is included in the program part of the pair. We have implemented our approach for three real-world static analyses. The experimental results show that these analyses with automatically-generated features are cost-effective and consistently perform well on a wide range of programs.

关键词： data-driven program analysis Automatic feature generation

来源：评论

学校读者我要写书评

暂无评论

data-driven Context-Sensitivity for Points-to analysis

引用

PROCEEDINGS OF THE ACM ON programMING LANGUAGES-PACMPL 2017年第OOPSLA期1卷 1-28页

作者： Jeong, Sehun Jeon, Minseok Cha, Sungdeok Oh, Hakjoo Korea Univ Seoul South Korea

We present a new data-driven approach to achieve highly cost-effective context-sensitive points-to analysis for Java. While context-sensitivity has greater impact on the analysis precision and performance than any other precision-improving techniques, it is difficult to accurately identify the methods that would benefit the most from context-sensitivity and decide how much context-sensitivity should be used for them. Manually designing such rules is a nontrivial and laborious task that often delivers suboptimal results in practice. To overcome these challenges, we propose an automated and data-driven approach that learns to effectively apply context-sensitivity from codebases. In our approach, points-to analysis is equipped with a parameterized and heuristic rules, in disjunctive form of properties on program elements, that decide when and how much to apply context-sensitivity. We present a greedy algorithm that efficiently learns the parameter of the heuristic rules. We implemented our approach in the Doop framework and evaluated using three types of context-sensitive analyses: conventional object-sensitivity, selective hybrid object-sensitivity, and type-sensitivity. In all cases, experimental results show that our approach significantly outperforms existing techniques.

关键词： data-driven program analysis Points-to analysis Context-sensitivity

来源：评论

学校读者我要写书评

暂无评论

Adaptive Static analysis via Learning with Bayesian Optimization

引用

ACM TRANSACTIONS ON programMING LANGUAGES AND SYSTEMS 2018年第4期40卷 14-14页

作者： Heo, Kihong Oh, Hakjoo Yang, Hongseok Yi, Kwangkeun Seoul Natl Univ Seoul South Korea Korea Univ Seoul South Korea Univ Oxford Oxford England

Building a cost-effective static analyzer for real-world programs is still regarded an art. One key contributor to this grim reputation is the difficulty in balancing the cost and the precision of an analyzer. An ideal analyzer should be adaptive to a given analysis task and avoid using techniques that unnecessarily improve precision and increase analysis cost. However, achieving this ideal is highly nontrivial, and it requires a large amount of engineering efforts. In this article, we present a new learning-based approach for adaptive static analysis. In our approach, the analysis includes a sophisticated parameterized strategy that decides, for each part of a given program, whether to apply a precision-improving technique to that part or not. We present a method for learning a good parameter for such a strategy from an existing codebase via Bayesian optimization. The learnt strategy is then used for new, unseen programs. Using our approach, we developed partially flow- and context-sensitive variants of a realistic. C static analyzer. The experimental results demonstrate that using Bayesian optimization is crucial for learning from an existing codebase. Also, they show that among all program queries that require flow- or context-sensitivity, our partially flow- and context-sensitive analysis answers 75% of them, while increasing the analysis cost only by 3.3x of the baseline flow- and context-insensitive analysis, rather than 40x or more of the fully sensitive version.

关键词： Static program analysis data-driven program analysis Bayesian optimization

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：