ISBN: (print) 9781479976164
Similarity join plays an important role in many applications, such as data cleaning and integration, where it addresses poor data quality. Most existing studies focus on performing similarity join on static datasets, and few consider running it on dynamic data streams. With the development of network technology, the data access paradigm has shifted from disk-oriented storage to online data streams, which makes continuous similarity-join queries over data streams a novel query processing paradigm. Unlike a static dataset, a data stream is unbounded, continuous and unpredictable. These differences pose serious challenges, such as real-time query performance. To this end, this paper studies the problem of continuous similarity join on data streams, based on the edit distance metric and the filter-and-verify framework with sliding-window semantics. Two subcases are studied: self similarity join on a single data stream and similarity join on two streams. We introduce a sliding-window model based on basic windows to facilitate updating the sliding window and its index. Further details of our method, including signature extraction schemes, filtering and verification algorithms, and re-evaluation strategies, are discussed in turn. Finally, extensive experimental results show that our method works efficiently on real data streams.
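The filter-and-verify idea with sliding-window semantics can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a count-based window, swaps in a simple length filter in place of the signature extraction schemes the abstract mentions, and omits the index and the basic-window model. The names `edit_distance`, `stream_self_join`, `window_size` and `tau` are hypothetical.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def stream_self_join(stream, window_size, tau):
    """Yield (earlier, later) pairs inside the window with edit distance <= tau.

    Assumption: a count-based sliding window holding the last
    `window_size` strings; the oldest string expires automatically.
    """
    window = deque(maxlen=window_size)
    for s in stream:
        for t in window:
            # Filter step (assumed length filter): the length difference
            # is a lower bound on the edit distance, so prune cheaply.
            if abs(len(s) - len(t)) > tau:
                continue
            # Verify step: compute the exact edit distance.
            if edit_distance(s, t) <= tau:
                yield (t, s)
        window.append(s)


if __name__ == "__main__":
    data = ["kitten", "sitten", "dog", "sitting", "kitten"]
    for pair in stream_self_join(data, window_size=3, tau=1):
        print(pair)
```

The filter only discards pairs that cannot satisfy the threshold, so the join result is the same as verifying every pair in the window; a stronger filter would simply send fewer candidates to the verification step.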
To let traditional applications benefit from multicore processors, the traditional Gaussian elimination algorithm is improved through matrix partitioning to enhance its parallel performance on multicore architectures. ...
Internet of Things (IoT) or Cyber-Physical Systems (CPS) is a new trend of real-time systems in the area of information technology. This paper introduces a spatiotemporal consistence language for real-time systems (Sh...
Because current real-time data compression algorithms are not efficient enough, we propose a two-phase real-time data compression algorithm that compresses data very quickly while achieving a high compression rate. Th...
With the development of multicore chips, there is a great need to study optimization algorithms for matrix operations in multicore environments, so as to make full use of CPU power; however, the existin...
With the development of systems biology, more and more researchers focus on the study of bio-molecular networks. In recent years, researchers in different fields have accumulated a large amount of biological experimental data and many algorithms for the analysis and calculation of bio-molecular networks, but these data and methods are relatively independent and difficult for biologists to use. Based on PSE-Bio, a problem solving environment for bioinformatics, this paper describes an integrated computing environment for bio-molecular networks that supports molecular homology analysis and bio-molecular network building, querying, statistics and visualization.
Recently, combining a video recording of a presentation with the digital slides used in it has become popular in e-learning and presentation archives. For users of the archives, it is useful to preview a digest of such content to grasp the atmosphere and/or an outline of the presentation. This paper proposes a method of automatic digest generation that extracts important scenes from the presentation content. The extracted scenes are chosen based on several factors, such as the frequency and specificity of words, scene duration and scene order. Finally, the effectiveness of the proposed method is evaluated by comparison with testers' answer sets for actual lectures.
Proper naming of methods can make program code easier to understand, and thus enhance software maintainability. Yet, developers may use inconsistent names due to poor communication or a lack of familiarity with conventions within the software development lifecycle. To address this issue, much research effort has been invested in building automatic tools that can check for method name inconsistency and recommend consistent names. However, existing datasets generally do not provide precise details about why a method name was deemed improper and had to be changed. Such information can give useful hints on how to improve the recommendation of adequate method names. Accordingly, we construct a sample method-naming benchmark, ReName4J, by matching name changes with code reviews. We then present an empirical study on how state-of-the-art techniques perform in detecting or recommending consistent and inconsistent method names based on ReName4J. The main purpose of the study is to reveal a different perspective based on reviewed names rather than to propose a complete benchmark. We find that the existing techniques underperform on our review-driven benchmark, in both inconsistency checking and recommendation. We further identify potential biases in the evaluation of existing techniques, which future research should consider thoroughly.