ISBN: (print) 9781538676721
With its rapid development and growing popularity, the Internet has become the most convenient way to publish and obtain information, and the quantity and variety of data are growing sharply as a result. Finding potentially valuable information in these data is difficult, and it is the primary problem of data mining. Mining a company's hot events from Internet news can effectively reflect how its business is operating. We therefore propose a method for discovering and extracting hot events from Internet news. The method modifies the single-pass clustering algorithm by using a Gaussian kernel, instead of the global cluster, to update the clustering centers. The result is a dynamic, incremental clustering algorithm that does not need the number of clusters to be initialized. The Top-N hot events are then obtained from the clustering centers. Experimental comparison shows that the improved algorithm has higher clustering efficiency than the classic algorithm. Case studies from the Shanghai pilot free-trade zone (FTZ) also show the effectiveness of the proposed method.
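The single-pass idea can be sketched in a few lines. The abstract does not give the update rule, so the Gaussian-weighted center update below is only one plausible guess. The function names, the Euclidean distance, and the `threshold` and `sigma` parameters are all assumptions made for illustration.

```python
import math

def single_pass_cluster(points, threshold, sigma=1.0):
    """Read each point once: join the nearest cluster if it is within
    `threshold` (Euclidean distance), otherwise open a new cluster."""
    centers, counts = [], []
    for x in points:
        best, best_d = None, float("inf")
        for i, c in enumerate(centers):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:
            # No cluster is close enough, so the number of clusters grows.
            centers.append(list(x))
            counts.append(1)
        else:
            # ASSUMED update rule: a Gaussian kernel of the distance scales
            # how strongly the new point pulls the center toward itself.
            w = math.exp(-best_d ** 2 / (2 * sigma ** 2)) / (counts[best] + 1)
            centers[best] = [c + w * (xi - c) for c, xi in zip(centers[best], x)]
            counts[best] += 1
    return centers, counts

def top_n(centers, counts, n):
    """Rank clusters by size and return the n largest as (center, count)."""
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    return [(centers[i], counts[i]) for i in order[:n]]
```

For example, `top_n(*single_pass_cluster(vectors, threshold=1.0), n=3)` would return the three largest clusters, with cluster size standing in for "hotness" in this sketch.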
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image, or sketch, of the matrix. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
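To illustrate what reconstructing from a sketch can look like, here is a minimal NumPy example of one common construction. It uses two Gaussian test matrices, a range sketch Y = AΩ and a co-range sketch W = ΨA, followed by a least-squares solve. The abstract does not spell out its algorithms, so the function names, the Gaussian test matrices, and the parameters `k` and `l` are assumptions rather than the paper's exact method.

```python
import numpy as np

def make_sketch(A, k, l, seed=0):
    """Form the linear sketch (Y, W) of A using random test matrices.
    Both products are linear in A, so they can be accumulated in one pass."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k))   # assumed Gaussian test matrix
    Psi = rng.standard_normal((l, m))     # assumed Gaussian test matrix
    return A @ Omega, Psi @ A, Psi

def reconstruct(Y, W, Psi):
    """Build a rank-<=k approximation Q @ X from the sketch alone."""
    Q, _ = np.linalg.qr(Y)                             # orthonormal basis for range(Y)
    X = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]     # least-squares solve of (Psi Q) X = W
    return Q @ X
```

Note that `reconstruct` never touches `A` itself; it only needs the two sketch matrices and `Psi`. In this sketch `l` should be at least `k` so that the least-squares problem is overdetermined.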
ISBN: (print) 9780769547497
Non-rigid registration is crucial in imaging, in particular for correcting deformations produced during image acquisition and improving the accuracy of datasets. However, conventional imaging systems lack the speed and computational bandwidth needed for additional non-rigid registration of the deformed images, so this functionality is usually unavailable in time-critical settings. The expensive computation and memory-intensive nature of non-rigid image registration algorithms such as the Demons algorithm further limit the realization of such systems. In response, we propose an efficient, custom hardware-based Demons registration algorithm that uses pipelined streaming models to minimize memory fetches during computation. Designed for highly customizable hardware, our design requires only a single pass over the images to compute the Demons kernel. Implementation results on the Xilinx ML605 FPGA system are presented and evaluated quantitatively, in clock cycle counts, against a software-based implementation.
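The single-pass, streaming idea can be illustrated in software with a 1D toy version. The example below applies the classic Demons displacement formula, u = (m - f) f' / (f'^2 + (m - f)^2), using a three-sample sliding window, so each sample is read exactly once. It is not the paper's hardware design: the 1D setting, the central-difference gradient, and the function name are assumptions, and sign conventions differ between formulations.

```python
from collections import deque

def demons_stream(fixed, moving, eps=1e-12):
    """Yield the 1D Demons displacement for each interior sample while
    reading both input signals exactly once."""
    window = deque(maxlen=3)  # holds the last three (fixed, moving) pairs
    for pair in zip(fixed, moving):
        window.append(pair)
        if len(window) == 3:
            (f_prev, _), (f_mid, m_mid), (f_next, _) = window
            grad = (f_next - f_prev) / 2.0      # central difference of the fixed signal
            diff = m_mid - f_mid                # intensity mismatch at the middle sample
            denom = grad * grad + diff * diff
            # Guard against division by zero in flat, perfectly matched regions.
            yield diff * grad / denom if denom > eps else 0.0
```

Because the generator keeps only three samples in memory, it mirrors, in a very loose way, how a pipelined design can avoid repeated memory fetches.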
Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem because of the inherent characteristics of streaming data. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. In the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest), which maintains the set of all frequent itemsets embedded in the transaction data stream generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms existing algorithms for one-pass mining of frequent itemsets.
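The projection-and-insertion step can be pictured with a small prefix forest built from nested dictionaries. This is a simplified illustration only, not the SFI-forest itself. The suffix-based projection, the dictionary layout, and the function names are assumptions, and the sketch omits pruning and the final frequent-itemset extraction.

```python
def suffix_projections(transaction):
    """Project a transaction into sub-transactions: every suffix of its
    sorted, de-duplicated item list (ASSUMED projection scheme)."""
    items = sorted(set(transaction))
    return [items[i:] for i in range(len(items))]

def insert(forest, items):
    """Insert one sub-transaction into the prefix forest, incrementing the
    count of every node along its path."""
    node = forest
    for item in items:
        child = node.setdefault(item, {"count": 0, "children": {}})
        child["count"] += 1
        node = child["children"]

def build_forest(stream):
    """Consume the transaction stream once and return the summary forest."""
    forest = {}
    for transaction in stream:
        for sub in suffix_projections(transaction):
            insert(forest, sub)
    return forest
```

In this sketch the count stored at a root-level node equals the number of transactions containing that item, because each item of a transaction starts exactly one suffix.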
Online mining of frequent patterns from music data is one of the most important research issues in multimedia data mining. Most previous studies require the specification of a min_support threshold and aim at mining the complete set of frequent patterns satisfying min_support. However, in practice, it is difficult for users to provide an appropriate value for the min_support threshold. In this paper, we propose a new problem in multimedia data mining: online mining of top-k melody structures of length no less than min_1, where k is the desired number of hot melody structures to be mined and min_1 is the minimal length of each melody structure. An efficient single-pass algorithm, called top-k-HMS (top-k Hot Melody Structures), is developed for mining such melody structures without min_support. Within the top-k-HMS framework, a new summary data structure, called TKM-list (top-k melody list), is developed to maintain the essential information about the top-k hot melody structures from the current melody sequence streams. Experimental studies show that the proposed top-k-HMS algorithm is an efficient one-pass method for mining the set of top-k hot melody structures over a continuous stream of melody sequences. (C) 2008 Elsevier B.V. All rights reserved.
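The following toy example shows how a result size k can replace a support threshold. It counts contiguous note subsequences as sequences arrive and returns the k most frequent ones. It does not implement TKM-list: the exhaustive counter, the treatment of a "melody structure" as a contiguous subsequence, and the `min_len` and `max_len` parameters are assumptions, and memory use is unbounded here.

```python
from collections import Counter
import heapq

def update_counts(counts, melody, min_len, max_len):
    """Add one melody sequence to the running counts. Each distinct
    contiguous subsequence is counted once per sequence."""
    seen = set()
    for i in range(len(melody)):
        # max_len is an ASSUMED cap that keeps the enumeration bounded.
        for j in range(i + min_len, min(i + max_len, len(melody)) + 1):
            seen.add(tuple(melody[i:j]))
    counts.update(seen)

def top_k(counts, k):
    """Return the k most frequent structures; no min_support is needed."""
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
```

A caller would keep one `Counter()`, call `update_counts` for every arriving sequence, and call `top_k` whenever the current hot structures are needed.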
Mining Web click streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some interesting characteristics, such as unknown or unbounded length, possibly a very fast arrival rate, inability to backtrack over previously arrived click-sequences, and a lack of system control over the order in which the data arrive. In this paper, we propose a projection-based, single-pass algorithm, called DSM-PLW (Data Stream Mining for Path traversal patterns in a Landmark Window), for online incremental mining of path traversal patterns over a continuous stream of maximal forward references generated at a rapid rate. According to the algorithm, each maximal forward reference of the stream is projected into a set of reference-suffix maximal forward references, and these reference-suffix maximal forward references are inserted into a new in-memory summary data structure, called SP-forest (Summary Path traversal pattern forest), which is an extended prefix tree-based data structure for storing essential information about frequent reference sequences of the stream so far. The set of all maximal reference sequences is determined from the SP-forest by a depth-first-search mechanism, called MRS-mining (Maximal Reference Sequence mining). Theoretical analysis and experimental studies show that the proposed algorithm has gently growing memory requirements and makes only one pass over the streaming data. (c) 2005 Elsevier B.V. All rights reserved.
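As background for the terms used above, the example below derives maximal forward references from a raw click sequence using the usual definition: a forward path ends when the user revisits a page already on the current path. It then projects one reference into its suffixes. The abstract takes maximal forward references as input and does not describe their extraction, so both function names and the suffix-based reading of "reference-suffix" are assumptions.

```python
def maximal_forward_references(clicks):
    """Split a click sequence into maximal forward references: a path is
    emitted whenever a backward move follows a forward move."""
    path, result, moving_forward = [], [], False
    for page in clicks:
        if page in path:
            # Backward reference: emit the path once, then rewind to the page.
            if moving_forward:
                result.append(list(path))
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        result.append(list(path))   # the last forward path is also maximal
    return result

def reference_suffixes(mfr):
    """Project one maximal forward reference into all of its suffixes
    (ASSUMED form of the reference-suffix projection)."""
    return [mfr[i:] for i in range(len(mfr))]
```

For the click sequence A B C D C B E G H G W A O U O V, the first function returns the paths ABCD, ABEGH, ABEGW, AOU and AOV.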