ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Mining maximal frequent itemsets (MFIs) in data streams is more difficult than mining them in static databases because data streams are huge, high-speed and continuous. In this paper, we propose a novel one-pass algorithm called FpMFI-DS, which mines all maximal frequent itemsets in Landmark windows or Sliding windows in data streams based on the FP-Tree. A new FP-Tree structure is designed for storing all transactions in Landmark windows or Sliding windows in data streams. To improve the efficiency of the algorithm, a new pruning technique, extension support equivalency pruning (ESEquivPS), is introduced. The experiments show that our algorithm is efficient and scalable. It is suitable for mining MFIs both in static databases and in data streams.
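To make the notion of maximal frequent itemsets over a sliding window concrete, here is a minimal brute-force sketch in Python. It is not FpMFI-DS: it uses no FP-Tree and no ESEquivPS pruning, and it recomputes everything whenever the window moves. The function names and the absolute-count `min_support` parameter are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force illustration: return frequent itemsets with no frequent proper superset.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions)) if transactions else []
    frequent = []
    for size in range(1, len(items) + 1):
        level = [
            frozenset(candidate)
            for candidate in combinations(items, size)
            if sum(1 for t in transactions if set(candidate) <= t) >= min_support
        ]
        if not level:
            # Supersets of infrequent itemsets are infrequent, so we can stop here.
            break
        frequent.extend(level)
    # An itemset is maximal if no other frequent itemset strictly contains it.
    return [s for s in frequent if not any(s < other for other in frequent)]


def sliding_window_mfis(stream, window_size, min_support):
    """Yield the maximal frequent itemsets of the last window_size transactions."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(transaction)
        yield maximal_frequent_itemsets(window, min_support)
```

A Landmark window could be imitated in the same sketch by never evicting old transactions, for example by passing a `window_size` at least as large as the stream.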
ISBN (print): 0769529941
Data mining, also known as knowledge discovery in databases, has been defined as the nontrivial extraction of implicit, previously unknown and potentially useful information from data. It uses machine learning, statistical and visualization techniques to discover and present knowledge in a form that is easily comprehensible to humans. In this paper the authors first introduce the idea, basic concepts and process of data mining; then an example and methods of applying data mining to physical statistics are analyzed. Data mining is applied in physical training and evaluation, such as constitution data analysis, the PE industry and competitive sports. Thus, we think data mining will become an important task in scientific research on sports in the future.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Fractal theory has been used for computer graphics, image compression and different fields of pattern recognition. In this paper, a fractal-based method for recognition of both on-line and off-line Farsi/Arabic handwritten digits is proposed. Our main goal is to verify whether fractal theory is able to capture discriminatory information from digits for the pattern recognition task. The digit classification problem (on-line and off-line) deals with patterns which do not have a complex structure. So, a general-purpose fractal coder, introduced for image compression, is simplified for use in this application. To do that, the contrast and luminosity information of each point in the input pattern is ignored during the coding process. Therefore, this approach can deal with on-line data and binary images of handwritten Farsi digits. In fact, our system represents the shape of the input pattern by searching for a set of geometrical relationships between its parts. Some fractal-based features are directly extracted by the fractal coder. We show that the resulting features have invariant properties which can be used for object recognition.
ISBN (print): 9780769529943
In the area of multimedia processing, a number of studies have been devoted to narrowing the gap between multimedia content and human sense. In fact, multimedia understanding is a difficult and challenging task even when machine-learning techniques are used. To deal with this challenge, in this paper we propose an innovative method that employs data mining techniques and a content-based paradigm to conceptualize videos. Our proposed method focuses mainly on: (1) construction of prediction models, namely the speech-association model Model(Sass) and the visual-statistical model Model(CRM), and (2) fusion of the prediction models to annotate unknown videos automatically. Without additional manual cost, the discovered speech-association patterns can show the implicit relationships among the sequential images. On the other hand, visual features can compensate for the inadequacy of speech-association patterns. Empirical evaluations reveal that our approach yields, on average, more promising results than other methods for annotating videos.
Forecasting extremes of Indian summer monsoon rainfall (ISMR), one or more seasons in advance, is of great economic significance for the Indian economy. Most of the statistical forecasting models that have been developed ...
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
In recent years there has been a tremendous increase in the number of users maintaining online blogs on the Internet. Companies, in particular, have become aware of this medium of communication and have taken a keen interest in what is being said about them through such personal blogs. This has given rise to a new field of research directed towards mining useful information from the large amount of unformatted data present in online blogs and online forums. We discuss an implementation of such a blog mining application. The application is broadly divided into two parts, the indexing process and the search module. Blogs pertaining to different organizations are fetched from a particular blog domain on the Internet. After their textual content is analyzed, these blogs are assigned a sentiment rating. Specific data from such blogs, along with their sentiment ratings, are then indexed on the physical hard drive. The search module searches through these indexes at run time for the input organization name and produces a list of blogs conveying both positive and negative sentiments about the organization.
ISBN (print): 9780769529943
The main challenge of mining sequential patterns is the high processing cost of support counting for the large number of candidate patterns; moreover, many of the patterns found are not interesting to users. In this paper, a novel algorithm, MSMA (Maximal Sequential pattern mining Based on Simultaneous Monotone and Anti-monotone constraints), is proposed that incorporates both maximal and constraint-based sequential pattern mining in the mining process. It allows the efficient mining of sequential patterns when both monotone and anti-monotone constraints are simultaneously pushed into the mining process at different strategic stages. Our experiment shows that MSMA is an efficient algorithm for handling simultaneous monotone and anti-monotone constraints.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Association rule mining often results in an overwhelming number of rules. In practice, it is difficult for the final user to select the most relevant rules. To tackle this problem, various interestingness measures have been proposed. Nevertheless, the choice of an appropriate measure remains a hard task, and the use of several measures may lead to conflicting information. In this paper, we give a unified view of objective interestingness measures. We define a new framework embedding a large set of measures called SBMs, and we prove that the SBMs have a similar behavior. Furthermore, we identify the whole collection of the rules simultaneously optimizing all the SBMs. We provide an algorithm to efficiently mine a reduced set of rules among the rules optimizing all the SBMs. Experiments on real datasets highlight the characteristics of such rules.
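As a small illustration of what an objective interestingness measure looks like, the sketch below computes three textbook measures (support, confidence and lift) for a rule "antecedent implies consequent". The abstract does not define the SBM framework, so this code makes no claim about which measures belong to it; the function name and return format are assumptions made for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the confidence with the baseline frequency of the consequent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}
```

Ranking the same rule set by each of these measures in turn is an easy way to see the conflicting orderings the abstract refers to.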
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Advances in wireless and mobile technology flood us with amounts of moving object data that preclude all means of manual data processing. The volume of data gathered from position sensors of mobile phones, PDAs, or vehicles defies human ability to analyze the stream of input data. On the other hand, vast amounts of gathered data hide interesting and valuable knowledge patterns describing the behavior of moving objects. Thus, new algorithms for mining moving object data are required to unearth this knowledge. An important function of the mobile objects management system is the prediction of the unknown location of an object. In this paper we introduce a data mining approach to the problem of predicting the location of a moving object. We mine the database of moving object locations to discover frequent trajectories and movement rules. Then, we match the trajectory of a moving object with the database of movement rules to build a probabilistic model of object location. Experimental evaluation of the proposal reveals prediction accuracy close to 80%. Our original contribution includes the elaboration of the location prediction model, the design of an efficient mining algorithm, the introduction of movement rule matching strategies, and a thorough experimental evaluation of the proposed model.
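The mine-then-match idea described above can be sketched in a few lines of Python. This is a simplified stand-in, not the authors' algorithm: it treats a trajectory as a list of discrete cell identifiers, counts which cell follows each short prefix, and matches the longest known suffix of a new trajectory. The names `mine_movement_rules`, `predict_next` and the `max_prefix` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def mine_movement_rules(trajectories, max_prefix=2):
    """Count, for every prefix of up to max_prefix cells, which cell follows it."""
    rules = defaultdict(Counter)
    for trajectory in trajectories:
        for i in range(1, len(trajectory)):
            for k in range(1, max_prefix + 1):
                if i - k < 0:
                    break  # not enough history for a prefix of length k
                rules[tuple(trajectory[i - k:i])][trajectory[i]] += 1
    return rules


def predict_next(rules, recent, max_prefix=2):
    """Match the longest known suffix of recent and return next-cell probabilities."""
    for k in range(min(max_prefix, len(recent)), 0, -1):
        prefix = tuple(recent[-k:])
        if prefix in rules:
            total = sum(rules[prefix].values())
            return {cell: count / total for cell, count in rules[prefix].items()}
    return {}  # no rule matches the observed movement
```

Falling back from longer to shorter prefixes is only one possible matching strategy; the abstract mentions several strategies without detailing them.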
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
We present a method, called equivalence learning, which applies a two-class classification approach to object pairs defined within a multi-class scenario. The underlying idea is that instead of classifying objects into their respective classes, we classify object pairs either as equivalent (belonging to the same class) or non-equivalent (belonging to different classes). The method is based on a vectorisation of the similarity between the objects and the application of a machine learning algorithm (SVM, ANN, LogReg, Random Forests) to learn the differences between equivalent and non-equivalent object pairs, and define a unique kernel function that can be obtained via equivalence learning. Using a small dataset of archaeal, bacterial and eukaryotic 3-phosphoglycerate-kinase sequences, we found that the classification performance of equivalence learning slightly exceeds that of several simple machine learning algorithms at the price of a minimal increase in time and space requirements.
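The reformulation from multi-class labels to a two-class problem over pairs can be sketched as follows. The abstract does not specify how the similarity is vectorised, so the component-wise absolute difference used here is an assumption, as is the function name `make_pair_dataset`; any two-class learner could then be trained on the result.

```python
from itertools import combinations


def make_pair_dataset(objects, labels):
    """Turn labelled feature vectors into (pair_vector, is_equivalent) examples.

    The pair vector is the component-wise absolute difference, which is an
    assumed vectorisation chosen for this sketch.
    """
    pairs = []
    for i, j in combinations(range(len(objects)), 2):
        pair_vector = [abs(a - b) for a, b in zip(objects[i], objects[j])]
        is_equivalent = 1 if labels[i] == labels[j] else 0  # same class means equivalent
        pairs.append((pair_vector, is_equivalent))
    return pairs
```

Note that n objects produce n(n-1)/2 pairs, which is one source of the extra time and space cost mentioned in the abstract.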