The current trend for building an ontology-based data management system (DMS) is to capitalize on efforts made to design a preexisting, well-established DMS (a reference system). The method amounts to extracting from the reference DMS a piece of schema relevant to the new application needs (a module), possibly personalizing it with extra constraints w.r.t. the application under construction, and then managing a data set using the resulting schema. In this paper, we extend the existing definitions of modules and we introduce novel properties of robustness that provide means for easily checking that a robust module-based DMS evolves safely w.r.t. both the schema and the data of the reference DMS. We carry out our investigations in the setting of description logics, which underlie modern ontology languages such as RDFS, OWL, and OWL2 from W3C. Notably, we focus on the DL-liteA dialect of the DL-lite family, which encompasses the foundations of the QL profile of OWL2 (i.e., DL-liteR): the W3C recommendation for efficiently managing large data sets.
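The module notion above can be illustrated with a toy, purely syntactic closure over subclass axioms. This is only a sketch: the paper's modules and robustness properties rest on DL-Lite semantics, and the axioms, class names, and closure rule here are invented for illustration.

```python
# Naive "module" extraction from a set of subclass axioms, each a
# (subclass, superclass) pair: start from a seed signature and keep
# pulling in axioms that mention any symbol collected so far.
# Note this closure over-approximates (it drags in the Student axiom
# through the shared symbol Person); real locality-based extraction
# is more selective.

def extract_module(axioms, signature):
    """Return all axioms reachable from the seed signature."""
    sig = set(signature)
    module = set()
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            if (sub, sup) not in module and (sub in sig or sup in sig):
                module.add((sub, sup))
                sig.update((sub, sup))
                changed = True
    return module

reference = {
    ("Professor", "Staff"), ("Staff", "Person"),
    ("Student", "Person"), ("Car", "Vehicle"),
}
mod = extract_module(reference, {"Professor"})
```

The extracted schema piece can then be personalized with extra constraints and used to manage the new application's data, as the abstract describes.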
With the Semantic Web relying on ontologies to establish online machine-interpretable information, the Internet is growing into a semantically aware computing paradigm that facilitates Web entities' discovery of the knowledge and resources they need. Ambient intelligence aims to enable smart interaction beyond the Internet by embedding intelligence into our environment to unobtrusively support users' daily activities. To accomplish these goals, ontologies and semantic awareness are crucial for better understanding a user's context. While interest in the Semantic Web has spurred the development of large-scale semantic grid architectures, expanding the Semantic Web to the other end of the computing spectrum is a complex undertaking. The techniques and tools that support the Semantic Web aren't designed to deal with the resource-constrained devices with which people frequently interact in an ambient-intelligence environment. A proposed coding scheme for ontologies embeds semantic awareness in devices with limited memory and processing capabilities, such as sensory nodes and smart phones. This scheme provides a compact representation of an ontology and is enhanced with an efficient and effective semantic-matching algorithm similar to subsumption testing in many ontology reasoners.
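One classic way to make subsumption testing cheap on memory-constrained devices is an interval (pre-order) encoding of the class hierarchy, where subsumption reduces to two integer comparisons. The sketch below illustrates that general idea under toy assumptions; it is not the article's actual coding scheme, and the taxonomy is invented.

```python
# Compact interval encoding of a class hierarchy (a tree): each class
# gets (entry, last-descendant-entry) from a DFS numbering, so
# "sup subsumes sub" becomes an interval-containment check.

def encode(tree, root):
    """DFS numbering: map each class to (entry, last descendant entry)."""
    intervals, counter = {}, [0]
    def visit(node):
        start = counter[0]
        counter[0] += 1
        for child in tree.get(node, []):
            visit(child)
        intervals[node] = (start, counter[0] - 1)
    visit(root)
    return intervals

def subsumes(intervals, sup, sub):
    s1, e1 = intervals[sup]
    s2, e2 = intervals[sub]
    return s1 <= s2 and e2 <= e1

taxonomy = {"Thing": ["Device", "Agent"],
            "Device": ["Sensor", "Phone"],
            "Agent": ["User"]}
iv = encode(taxonomy, "Thing")
```

With two integers per class, even a sensory node can answer taxonomy queries without a full reasoner on board.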
Nowadays, storing data derived from deep sequencing experiments has become pivotal, and standard compression algorithms do not exploit their structure in a satisfying manner. A number of reference-based compression algorithms have been developed, but they are less adequate when approaching new species without fully sequenced genomes, or nongenomic data. We developed a tool that takes advantage of fastq characteristics and encodes them in a binary format optimized to be further compressed with standard tools (such as gzip or lzma). The algorithm is straightforward and does not need any external reference file; it scans the fastq only once and has a constant memory requirement. Moreover, we added the possibility to perform lossy compression, losing some of the original information (IDs and/or qualities) but resulting in smaller files; it is also possible to define a quality cutoff under which the corresponding base calls are converted to N. We achieve 2.82 to 7.77 compression ratios on various fastq files without losing information, and 5.37 to 8.77 when losing IDs, which are often not used in common analysis pipelines. In this paper, we compare the algorithm's performance with known tools, usually obtaining higher compression levels.
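A minimal sketch of this kind of pipeline, assuming a toy read and illustrative 4-bit base codes (not the tool's actual binary format): mask base calls below the quality cutoff as N, pack the sequence densely, and hand the result to a standard compressor (zlib here, standing in for gzip).

```python
# Quality-cutoff masking + dense packing + standard compression.
# The field layout and codes are invented for illustration.
import zlib

CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

def mask_low_quality(seq, qual, cutoff, offset=33):
    """Convert base calls whose Phred score is below `cutoff` to N."""
    return "".join(b if ord(q) - offset >= cutoff else "N"
                   for b, q in zip(seq, qual))

def pack(seq):
    """Two 4-bit base codes per byte; 0xF pads an odd-length tail."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        hi = CODE[seq[i]]
        lo = CODE[seq[i + 1]] if i + 1 < len(seq) else 0xF
        out.append((hi << 4) | lo)
    return bytes(out)

seq, qual = "ACGTACGTNN", "IIII!!IIII"   # '!' = Phred 0, 'I' = Phred 40
masked = mask_low_quality(seq, qual, cutoff=20)
compressed = zlib.compress(pack(masked))
```

The single pass and fixed tables mirror the constant-memory, one-scan property claimed in the abstract.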
Autonomic decision-making based on rules and metrics is inevitably on the rise in distributed software systems. Often, the metrics are acquired from system observations such as static checks and runtime traces. To avoid bias propagation and hence reduce wrong decisions in increasingly autonomous systems due to poor observation data quality, multiple independent observers can exchange their findings and produce a majority-accepted, complete, and outlier-cleaned ground truth in the form of consensus-supported metrics. In this work, we motivate the growing importance of metrics for informed and autonomic decisions in clouds and other distributed systems, present reasons for diverging observations, and describe a federated approach to produce ground truth with data-centric consensus voting for more reliable decision-making processes. We validate the system design with experiments in the area of cloud software artefact observations and highlight the benefits for reproducible distributed system behaviour.
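Data-centric consensus over observer readings could look roughly like this; the median-based tolerance rule and the readings are invented for illustration, not the paper's voting protocol.

```python
# Consensus-supported metric: discard readings far from the overall
# median (outlier cleaning), then agree on the median of the rest.
from statistics import median

def consensus(readings, tolerance=0.2):
    """Majority-accepted value: median of readings within
    `tolerance` (relative) of the overall median."""
    m = median(readings)
    accepted = [r for r in readings if abs(r - m) <= tolerance * abs(m)]
    return median(accepted), accepted

# Five observers report the same latency metric; one is way off.
value, kept = consensus([99.1, 100.0, 100.4, 180.0, 99.7])
```

The outlier (180.0) is voted out, so a downstream autonomic rule sees a ground-truth value near 100 rather than a biased mean.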
Performance analysis of computing systems is an increasingly difficult task due to growing system complexity. Traditional tools rely on ad hoc procedures. With these, determining which of the manifold system and workload parameters to examine is often a lengthy and highly speculative process. The analysis is often incomplete and, therefore, prone to yielding faulty conclusions and leaving useful tuning knowledge undiscovered. We address this problem by introducing a data mining approach called ADMiRe (Analyzer for data Mining Results). In this scheme, regression analysis is first applied to performance data to discover correlations between various system and workload parameters. The results of this analysis are summarized in sets of regression rules. The user can then formulate intuitive algebraic expressions to manipulate these sets of rules to capture critical information. To demonstrate this approach, we use ADMiRe to analyze an Oracle database system running the TPC-C (Transaction Processing Performance Council) benchmark. The results generated by ADMiRe were confirmed by Oracle experts. We also show that by applying ADMiRe to Microsoft Internet Information Server performance data, we can improve system performance by 20 percent.
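The first step of such a scheme, regressing a performance metric on candidate workload parameters and keeping only influential ones as rule material, can be sketched as follows. The parameters, data, and significance threshold are invented; ADMiRe's actual rule language is richer.

```python
# Per-parameter least-squares slope of latency vs. each workload
# parameter; parameters with a large slope magnitude become
# candidate "regression rules".

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

samples = {
    "concurrent_users": [10, 20, 30, 40],
    "cache_size_mb":    [64, 64, 128, 128],
}
latency_ms = [100, 200, 300, 400]   # response time per sample

rules = {p: slope(xs, latency_ms) for p, xs in samples.items()}
significant = [p for p, s in rules.items() if abs(s) > 5.0]
```

Here the analyst's attention is directed to `concurrent_users` (about 10 ms of latency per user) rather than a lengthy speculative search over all parameters.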
In recent years, we have witnessed an increasing demand to process big data in numerous applications. It is observed that there often exist substantial amounts of repetitive data in different portions of a big data repository/dataset for applications such as genome sequence analyses. In this paper, we present a novel method, called the VA-Store, to reduce the large space requirement for repetitive data in prevailing genome sequence analysis tasks using k-mers (i.e., subsequences of length k) with multiple k values. The VA-Store maintains a physical store for one portion of the input dataset (i.e., k0-mers for a fixed length k0) and supports multiple virtual stores for other portions of the dataset (i.e., k-mers with k ≠ k0). Utilizing important relationships among repetitive data, the VA-Store transforms a given query on a virtual store into one or more queries on the physical store for execution. Both precise and approximate transformations are considered. Accuracy estimation models for approximate solutions are derived. Query optimization strategies are suggested to improve query performance. Our experiments using real and synthetic datasets demonstrate that the VA-Store is quite promising in providing effective storage and efficient query processing for solving a kernel database problem on repetitive big data for genome sequence analysis applications.
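The virtual-to-physical transformation can be illustrated for the simple case k < k0: a k-mer count is approximated by aggregating the physical k0-mer counts that share the k-length prefix. This is a toy sketch, not the VA-Store's actual rewrite; the boundary loss it exhibits is exactly why some transformations are only approximate and need accuracy estimation.

```python
# Physical store: exact k0-mer counts.  Virtual store for k < k0:
# derive k-mer counts by summing counts of k0-mers with that prefix.
# Occurrences within the last k0-1 positions of the sequence are not
# covered by any k0-mer prefix, hence the approximation.
from collections import Counter

def kmer_counts(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def virtual_counts(physical, k):
    """Approximate k-mer counts from a k0-mer physical store (k < k0)."""
    derived = Counter()
    for k0mer, n in physical.items():
        derived[k0mer[:k]] += n
    return derived

seq = "ACGTACGA"
physical = kmer_counts(seq, 4)          # physical store, k0 = 4
approx = virtual_counts(physical, 2)    # virtual store,  k  = 2
exact = kmer_counts(seq, 2)
```

Comparing `approx` against `exact` shows interior occurrences recovered exactly while tail-end ones (here "CG" at position 5 and "GA" at 6) are missed, the kind of error an accuracy model would bound.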
The widespread adoption of the Internet of Things (IoT) has motivated the emergence of mixed workloads in smart cities, where fast-arriving geo-referenced big data streams are joined with archive tables, aiming at enriching streams with descriptive attributes that enable insightful analytics. Applications now rely on finding, in real time, which geographical regions streaming data tuples belong to. This problem requires a computationally intensive stream-static join for joining a dynamic stream with a disk-resident static table. In addition, the time-varying fluctuation of geospatial data arriving online calls for an approximate solution that can trade off QoS constraints while ensuring that the system survives sudden spikes in data loads. In this paper, we present SpatialSSJP, an adaptive spatial-aware approximate query processing system that specifically focuses on stream-static joins in a way that guarantees achieving an agreed set of Quality-of-Service goals and maintains geo-statistics of stateful online aggregations over stream-static join results. SpatialSSJP employs a state-of-the-art stratified-like sampling design to select well-balanced, representative geospatial data stream samples and serve them to a stream-static geospatial join operator downstream. We implemented a prototype atop Spark Structured Streaming. Our extensive evaluations on big real datasets show that our system can survive and mitigate harsh join workloads and outperform state-of-the-art baselines by significant margins, without violating rigorous error bounds on the accuracy of the output results. SpatialSSJP achieves a relative accuracy gain over plain Spark joins of approximately 10% in the worst cases, reaching up to 50% in the best-case scenarios.
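The stratified-like sampling idea can be sketched as follows, with an invented schema, regions, and sampling rate; the real system's QoS-driven rate control and Spark operators are far more involved. Under load, sampling per region (stratum) keeps every region represented before the enrichment join.

```python
# Per-stratum sampling of a geo-referenced stream, then a toy
# "stream-static join" that enriches each sampled tuple with
# descriptive attributes from a static table keyed by region.
import random
from collections import defaultdict

def stratified_sample(tuples, rate, seed=0):
    rng = random.Random(seed)
    strata = defaultdict(list)
    for t in tuples:
        strata[t["region"]].append(t)
    sample = []
    for region, rows in strata.items():
        n = max(1, round(rate * len(rows)))   # never drop a region entirely
        sample.extend(rng.sample(rows, n))
    return sample

static_table = {"R1": {"name": "Downtown"}, "R2": {"name": "Harbour"}}
stream = [{"region": "R1", "v": i} for i in range(8)] + \
         [{"region": "R2", "v": 99}]
joined = [{**t, **static_table[t["region"]]}
          for t in stratified_sample(stream, rate=0.5)]
```

A uniform sample would likely drop the sparse region R2 altogether; the per-stratum minimum keeps the join output's geo-statistics balanced.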
Despite being commonly used in big-data analytics, the outcome of dimensionality reduction remains a black box to most of its users. Understanding the quality of a low-dimensional embedding is important, as not only does it enable trust in the transformed data, but it can also help to select the most appropriate dimensionality reduction algorithm in a given scenario. As existing research primarily focuses on the visual exploration of embeddings, there is still a need to enhance the interpretability of such algorithms. To bridge this gap, we propose two novel interactive explanation techniques for low-dimensional embeddings obtained from any dimensionality reduction algorithm. The first technique, LAPS, produces a local approximation of the neighborhood structure to generate interpretable explanations of the preserved locality for a single instance. The second method, GAPS, explains the retained global structure of a high-dimensional dataset in its embedding by combining non-redundant local approximations from a coarse discretization of the projection space. We demonstrate the applicability of the proposed techniques using 16 real-life tabular, text, image, and audio datasets. Our extensive experimental evaluation shows the utility of the proposed techniques in interpreting the quality of low-dimensional embeddings, as well as in selecting the most suitable dimensionality reduction algorithm for any given dataset.
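A much-simplified stand-in for the locality question LAPS addresses: for a single instance, compare its k nearest neighbours in the original space with those in the embedding (Jaccard overlap). The data are toy and the "embedding" simply drops a coordinate; LAPS itself builds an interpretable local approximation rather than this bare score.

```python
# Per-instance neighbourhood preservation: Jaccard overlap of the
# k-NN sets computed in the high-dimensional and embedded spaces.

def knn(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (squared L2)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(points[idx], p)), j)
                   for j, p in enumerate(points) if j != idx)
    return {j for _, j in dists[:k]}

def locality_score(high, low, idx, k):
    a, b = knn(high, idx, k), knn(low, idx, k)
    return len(a & b) / len(a | b)

high = [(0, 0, 0), (1, 0, 0), (0, 0, 9), (0, 2, 0)]
low = [p[:2] for p in high]          # toy "embedding": drop the z axis
score = locality_score(high, low, idx=0, k=2)
```

Dropping the z axis collapses point 2 onto point 0, so the embedded neighbourhood disagrees with the original one and the score falls below 1, flagging poorly preserved locality for that instance.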
In this paper, a production planning problem, named k-most demanding products (k-MDP) discovering, is formulated. Given a set of customers demanding a certain type of product with multiple attributes, a set of existing products of the type, a set of candidate products that can be offered by a company, and a positive integer k, we want to help the company select k products from the candidate products such that the expected number of total customers for the k products is maximized. We show that the problem is NP-hard when the number of attributes for a product is 3 or more. One greedy algorithm is proposed to find an approximate solution to the problem. We also attempt to find the optimal solution of the problem by estimating the upper bound of the expected number of total customers for a set of k candidate products, thereby reducing the search space of the optimal solution. An exact algorithm is then provided to find the optimal solution of the problem using this pruning strategy. The experimental results demonstrate that both the efficiency and memory requirement of the exact algorithm are comparable to those of the greedy algorithm, and that the greedy algorithm scales well with respect to k.
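A minimal sketch of the greedy approach under an illustrative customer-choice model: each customer buys one product chosen uniformly at random among the available products acceptable to them, and we greedily add the candidate with the largest marginal gain in expected customers. Both this choice model and the data are assumptions for illustration, not the paper's formulation.

```python
# Greedy selection of k candidate products maximizing the expected
# number of customers who buy one of *our* offered products.

def expected_customers(offered, existing, customers):
    total = 0.0
    for acceptable in customers:          # each customer: set of acceptable products
        avail = acceptable & (offered | existing)
        ours = acceptable & offered
        if avail:
            total += len(ours) / len(avail)   # uniform choice among available
    return total

def greedy_kmdp(candidates, existing, customers, k):
    offered = set()
    for _ in range(k):
        best = max(candidates - offered,
                   key=lambda c: expected_customers(offered | {c},
                                                    existing, customers))
        offered.add(best)
    return offered

existing = {"e1"}
candidates = {"c1", "c2", "c3"}
customers = [{"c1", "e1"}, {"c1"}, {"c2", "e1"}, {"c3", "e1", "c2"}]
chosen = greedy_kmdp(candidates, existing, customers, k=2)
```

Each greedy step re-evaluates the objective, mirroring how the abstract's greedy algorithm trades optimality for scalability in k; the exact algorithm would instead prune the search over all k-subsets with an upper bound.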
This paper presents flott, a fast, low-memory T-transform algorithm which can be used to compute the string complexity measure T-complexity. The algorithm uses approximately one third of the memory of its predecessor while reducing the running time by about 20 percent. The flott implementation has the same worst-case memory requirements as state-of-the-art suffix tree construction algorithms. A suffix tree can be used to efficiently compute the Lempel-Ziv production complexity, which is another measure of string complexity. The C implementation of flott is available as open-source software.
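The Lempel-Ziv production complexity mentioned above can also be computed naively, without a suffix tree, by counting LZ76-style factors: each factor is the longest prefix of the remainder that already occurs earlier (self-overlap allowed), extended by one fresh symbol. This quadratic-time sketch in the Kaspar-Schuster style is only for illustration; the suffix-tree route runs in linear time.

```python
# Naive O(n^2) LZ76 factor count as a string-complexity measure.

def lz76_complexity(s):
    i, factors = 0, 0
    while i < len(s):
        length = 1
        # Grow the factor while its current form already occurs
        # in the text seen so far (overlapping copies permitted).
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        factors += 1
        i += length
    return factors
```

Highly repetitive strings yield few factors (low complexity), matching the intuition behind both T-complexity and Lempel-Ziv production complexity as measures of string structure.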