Because current real-time data compression algorithms are not efficient enough, we propose a two-phase real-time data compression algorithm that compresses data very quickly while achieving a high compression ratio. Th...
With the development of multicore chips, there is a great need to study optimization algorithms for matrix operations in multicore environments so as to make full use of the CPU's power; however, the existin...
With the development of systems biology, more and more researchers are focusing on bio-molecular networks. In recent years, researchers in different fields have accumulated a large amount of biological experimental data and many algorithms for analyzing and computing bio-molecular networks, but these data and methods remain largely isolated from one another and are difficult for biologists to use. Building on PSE-Bio, a problem-solving environment for bioinformatics, this paper describes an integrated computing environment for bio-molecular networks that supports molecular homology analysis as well as bio-molecular network construction, querying, statistics, and visualization.
Recently, combining a video recording of a presentation with the digital slides used in it has become popular in e-learning and presentation archives. For users of such archives, it is useful to preview a digest of the content to grasp the atmosphere and/or outline of the presentation. This paper proposes a method for automatically generating digests by extracting important scenes from presentation content. Scenes are selected based on several factors, such as word frequency and specificity, scene duration, and scene order. Finally, the effectiveness of the proposed methods is evaluated by comparing their output with testers' answer sets for actual lectures.
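The scene-selection idea can be illustrated with a small scoring sketch. It assumes each scene is represented by its start time, its duration, and the words from its slide text or transcript; the weights, the idf-style specificity measure, and all names (`Scene`, `score_scenes`, `make_digest`) are hypothetical and are not the paper's actual formula.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Scene:
    start: float                      # seconds from the start of the recording
    duration: float                   # length of the scene in seconds
    words: list[str] = field(default_factory=list)  # slide/transcript words


def score_scenes(scenes, w_freq=1.0, w_spec=1.0, w_dur=0.5):
    """Return one importance score per scene (hypothetical weighting)."""
    n = len(scenes)
    # Total occurrences of each word across the whole presentation.
    total = Counter(w for s in scenes for w in s.words)
    # Number of scenes in which each word appears (document frequency).
    df = Counter(w for s in scenes for w in set(s.words))
    max_count = max(total.values(), default=1)
    max_dur = max((s.duration for s in scenes), default=0.0) or 1.0
    scores = []
    for s in scenes:
        dur = s.duration / max_dur    # longer scenes get a modest boost
        if not s.words:
            scores.append(w_dur * dur)
            continue
        # Frequency: words that recur throughout the talk are likely key terms.
        freq = sum(total[w] for w in s.words) / (len(s.words) * max_count)
        # Specificity: idf-style, high for words concentrated in few scenes.
        spec = sum(math.log(n / df[w]) for w in s.words) / len(s.words)
        scores.append(w_freq * freq + w_spec * spec + w_dur * dur)
    return scores


def make_digest(scenes, k):
    """Pick the k highest-scoring scenes and return them in original order."""
    scores = score_scenes(scenes)
    top = sorted(range(len(scenes)), key=lambda i: scores[i], reverse=True)[:k]
    return [scenes[i] for i in sorted(top)]
```

Sorting the selected indices before returning keeps the digest in the original presentation order, which is one simple way to account for scene order.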
Proper naming of methods can make program code easier to understand and thus enhance software maintainability. Yet developers may use inconsistent names due to poor communication or unfamiliarity with conventions within the software development lifecycle. To address this issue, much research effort has been invested in building automatic tools that check for method name inconsistency and recommend consistent names. However, existing datasets generally do not provide precise details about why a method name was deemed improper and had to be changed. Such information can give useful hints on how to improve the recommendation of adequate method names. Accordingly, we construct a sample method-naming benchmark, ReName4J, by matching name changes with code reviews. We then present an empirical study of how state-of-the-art techniques perform on ReName4J in detecting inconsistent method names and recommending consistent ones. The main purpose of the study is to reveal a different perspective based on reviewed names rather than to propose a complete benchmark. We find that existing techniques underperform on our review-driven benchmark in both inconsistency checking and name recommendation. We further identify potential biases in the evaluation of existing techniques that future research should carefully consider.
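To make the idea of matching name changes with code reviews concrete, here is a minimal sketch. The record types and the matching rule (a review comment on the same file, within a few lines of the rename, that mentions the old name) are hypothetical assumptions for illustration, not the procedure actually used to build ReName4J.

```python
from dataclasses import dataclass


@dataclass
class Rename:
    file: str
    line: int
    old_name: str
    new_name: str


@dataclass
class ReviewComment:
    file: str
    line: int
    text: str


def match_renames(renames, comments, window=3):
    """Pair each rename with review comments that plausibly motivated it.

    Hypothetical rule: same file, within `window` lines, and the comment
    mentions the old method name.
    """
    matched = []
    for r in renames:
        reasons = [
            c.text
            for c in comments
            if c.file == r.file
            and abs(c.line - r.line) <= window
            and r.old_name in c.text
        ]
        if reasons:  # keep only renames backed by at least one review comment
            matched.append((r, reasons))
    return matched
```

Renames with no linked comment are dropped, so every retained entry carries a reviewer's stated reason for the change.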
Code generation models based on the pre-training and fine-tuning paradigm have been increasingly explored by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. After being pre-trained on a large-scale code corpus, a model is further fine-tuned on datasets specific to the target downstream task, e.g., generating code from a natural language description. The target code can be classified into two types: a standalone function, i.e., a function that invokes or accesses only built-in functions and standard libraries, and a non-standalone function, i.e., a function that invokes or accesses user-defined functions or third-party libraries. To effectively generate code, especially non-standalone functions (largely ignored by existing work), in this article we present Wenwang, an approach to improving a pre-trained model's capability to generate code beyond standalone functions. Wenwang consists of two components: a fine-tuning dataset named WenwangData and a fine-tuned model named WenwangCoder. Compared with existing fine-tuning datasets, WenwangData additionally covers non-standalone functions. Besides the docstring and code snippet of each function, WenwangData also includes its contextual information, collected via program analysis. We produce WenwangCoder by fine-tuning PanGu-Coder on WenwangData with our context-aware fine-tuning technique, so that the contextual information can be fully leveraged during code generation. On CoderEval and HumanEval, WenwangCoder outperforms three state-of-the-art models of similar parameter size (around 300M), namely CodeGen, PanGu-Coder, and PanGu-FT. Although WenwangCoder does not outperform ChatGPT on HumanEval, it achieves results similar to ChatGPT's on CoderEval despite its smaller parameter size. Our experimental results also shed light on a number of promising optimization directions based on