In order to extract knowledge from databases, data mining algorithms heavily query the databases. Inefficient processing of these queries will inevitably have its impact on the performance of these algorithms, making ...
详细信息
ISBN:
(纸本)0818681489
In order to extract knowledge from databases, data mining algorithms heavily query the databases. Inefficient processing of these queries will inevitably have its impact on the performance of these algorithms, making them less valuable. In this paper, we describe an optimization framework for an efficient processing of queries generated by different data mining algorithms. We show how to take advantage of the physical organization of the database, the operators and the control structures used in an algorithm. Finally, we discuss how our framework fits into conventional query optimization frameworks.
We consider the problem of computing the product of two n x n Boolean matrices A and B. For two 0 - I strings s = s(1)s(2) .... s(m) and u = u(1)u(2) ... u(m), an extended Hamming distance, eh(s, u), between the strin...
详细信息
ISBN:
(纸本)3540405453
We consider the problem of computing the product of two n x n Boolean matrices A and B. For two 0 - I strings s = s(1)s(2) .... s(m) and u = u(1)u(2) ... u(m), an extended Hamming distance, eh(s, u), between the strings, is defined by a recursive equation eh(s, u) = eh(s(l+1) ... s(m), u(l+1) ... u(m)) + (s(1) + u(1) mod 2), where l is the maximum number, s.t., s(j) = s, and u(j) = u(1) for j = For any n x n Boolean matrix C, let GC be a complete weighted graph on the rows of C, where the weight of an edge between two rows is equal to its extended Hamming distance. Next, let MWT(C) be the weight of a minimum weight spanning, tree of GC. WE! show that the product of A and B as well as the so called witnesses of the product can be computed in time (O) over bar (n(n + min{MWT(A), MWT(B-t)}))(1). Since the extended Hamming distance between two strings never exceeds the standard Hamming distance between them, our result subsumes an earlier similar result on the Boolean matrix product in terms of the Hamming distance due to Bjorklund and Lingas [4]. We also observe that min{MWT(A),MWT(B-t)} = O(min{r(A),r(B)}), where r(A) and r(B) reflect the minimum number of rectangles required to cover ls in A and B, respectively. Hence, our result also generalizes the recent upper bound on the Boolean matrix product in terms of r(A) and r(B), due to Lingas [12].
Tensor decompositions are powerful tools for large data analytics, as they jointly model multiple aspects of data into one framework and enable the discovery of the latent structures and higher-order correlations with...
详细信息
ISBN:
(数字)9781728155494
ISBN:
(纸本)9781728155494
Tensor decompositions are powerful tools for large data analytics, as they jointly model multiple aspects of data into one framework and enable the discovery of the latent structures and higher-order correlations within the data. One of the most widely studied and used decompositions, especially in data mining and machine learning, is the Canonical Polyadic or PARAFAC decomposition. However, today's datasets are not static and often grow and change over time. To operate on such large dynamic data, we present OCTEN, the first ever compression-based online parallel implementation for the CP/PARAFAC decomposition. We conduct an extensive empirical analysis of the algorithms in terms of fitness, memory used and CPU time and in order to demonstrate the compression and scalability of the method, we apply OCTEN to big tensor data. Indicatively, OCTEN performs on-par or better than state-of-the-art online and offline methods in terms of decomposition accuracy and efficiency, while achieving memory savings ranging in 40-200%.
In different application fields, heterogeneous data sets are structured into either matrices or higher-order tensors. In some cases, these structures present the property of having common underlying factors, which is ...
详细信息
ISBN:
(纸本)9781728155494
In different application fields, heterogeneous data sets are structured into either matrices or higher-order tensors. In some cases, these structures present the property of having common underlying factors, which is used to improve the efficiency of factor-matrices estimation in the process of the so-called coupled matrix-tensor factorization (CMTF). Many methods target the CMTF problem relying on alternating algorithms or gradient approaches. However, computational complexity remains a challenge when the data sets are tensors of high-order, which is linked to the well-known "curse of dimensionality". In this paper, we present a methodological approach, using the Joint dImensionality Reduction And Factors rEtrieval (JIRAFE) algorithm for joint factorization of high-order tensor and matrix. this approach reduces the high-order CMTF problem into a set of 3-order CMTF and canonical polyadic decomposition (CPD) problems. the proposed algorithm is evaluated on simulation and compared with a gradient-based method.
Phylogenetic reconstruction from gene-rearrangement data has seen increased attention over the last five years. Existing methods are limited computationally and by the assumption (highly unrealistic in practice) that ...
详细信息
the optimal prefix-free code problem is to determine, for a given array p = [p(i) \i is an element of {1...n}] of n weights, an integer array I = [l(i) \i is an element of (1...n)] of n codeword lengths such that Sigm...
详细信息
ISBN:
(纸本)3540602208
the optimal prefix-free code problem is to determine, for a given array p = [p(i) \i is an element of {1...n}] of n weights, an integer array I = [l(i) \i is an element of (1...n)] of n codeword lengths such that Sigma(i=1)(n) 2(-li) less than or equal to 1 and Sigma(i=1)(n) p(i)l(i) is minimized. Huffman's famous greedy algorithm solves this problem in O(n log n) time, if p is unsorted;and can be implemented to. execute in O(n) time, if the input array p is sorted. Here we consider the space requirements of the greedy method. We show that if p is sorted then it is possible to calculate the array 1 in-place, with l(i) overwriting pi, in O(n) time and using O(1) additional space. the new implementation leads directly to an O(n log n)-time and n + O(1) words of extra space implementation for the case when p is not sorted. the proposed method is simple to implement and executes quickly.
this paper aims to reveal a methodology used to quantitatively evaluate the impact of an earthquake on a region, considering multi temporal high resolution optical images. the proposed approach was initiated in the fr...
详细信息
ISBN:
(纸本)9781467371193
this paper aims to reveal a methodology used to quantitatively evaluate the impact of an earthquake on a region, considering multi temporal high resolution optical images. the proposed approach was initiated in the frame of GEODIM Project whose goal is to develop a Romanian downstream emergency response service in order to contribute to current disaster and risk management approach based on Earth observation data. the project is focused on developing experimental processing algorithms and mapping products for natural disasters (floods, earthquakes, landslides) damage assessment in urban areas based on very high resolution optical and SAR satellite imagery acquired worldwide. the prospective scenario considers knowledge discovery from pre and post event satellite images by mapping the extracted data features into semantic classes and symbolic representations like "buildings", "vegetation", "streets", "bare land" and "damaged buildings", etc.
the proceedings contain 92 papers. the topics discussed include: multimodal image fusion of anatomical structures for diagnosis, therapy planning and assistance;a robust geometrical method for blind separation of nois...
ISBN:
(纸本)9781467355407
the proceedings contain 92 papers. the topics discussed include: multimodal image fusion of anatomical structures for diagnosis, therapy planning and assistance;a robust geometrical method for blind separation of noisy mixtures of non-negatives sources;blind unmixing of remote sensing data with some pure pixels: extension and comparison of spatial methods exploiting sparsity and nonnegativity properties;comparative performance analysis of non orthogonal joint diagonalization algorithms;tensor estimation and visualization using dMRI;toward an adaptive extraction method of masses in digitized mammograms;temporal signatures of electrodermal activity for the evaluation of runners' performance: start and finish phases;stereo image coding: state of the art;and hybrid chaos-based image encryption approach using block and stream ciphers.
Modular system to acquire, store;analyze and process experimental nuclear data is described. Besides standard methods new developed algorithms also have been included in the system. (C) 2003 Elsevier Science B.V. All ...
详细信息
Modular system to acquire, store;analyze and process experimental nuclear data is described. Besides standard methods new developed algorithms also have been included in the system. (C) 2003 Elsevier Science B.V. All rights reserved.
this paper presents a mathematical framework for modeling arithmetic operators and other RTL design modules as discrete word-level functions and proposes a polynomial representation of those functions. the proposed re...
详细信息
ISBN:
(纸本)0780382366
this paper presents a mathematical framework for modeling arithmetic operators and other RTL design modules as discrete word-level functions and proposes a polynomial representation of those functions. the proposed representation attempts to bridge the gap between bit-level BDD representations and word-level representations, such as *BMDs and TEDs.
暂无评论