ISBN (Print): 9783319419206; 9783319419190
To ensure the reliability of civil aircraft, engines have to be tested after production. Vibrations are among the most informative measurements for diagnosing damage in an engine, if any is present. Representing these vibrations as spectrograms provides visual signatures related to damage. However, this representation is noisy and high-dimensional. Moreover, the relevant signatures are localized in small parts of the spectrogram, and the number of damaged engines in the database is extremely low. These factors complicate the development of detection algorithms. A new, more suitable representation computed from the spectrograms is needed in order to perform automatic diagnosis of aircraft engines. In this paper, we study two kinds of dictionary-based representations, in which the dictionary is either learnt from the data (NMF) or fixed in advance (curvelets). We present dictionary comparison methods that take into account the low number of damaged engines.
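As a rough illustration of the learnt-dictionary route, the sketch below factorizes a batch of spectrograms with scikit-learn's NMF; the array shapes, the number of components, and the random data are assumptions for illustration, not values from the paper.

```python
import numpy as np
from sklearn.decomposition import NMF

# Assumed toy data: 200 engines, each spectrogram flattened to 1,024
# non-negative magnitude bins (sizes are illustrative only).
rng = np.random.default_rng(0)
spectrograms = rng.random((200, 1024))

# Learn a dictionary of 16 spectral atoms; the activations serve as a
# compact representation of each spectrogram for later damage detection.
nmf = NMF(n_components=16, init="nndsvda", max_iter=500, random_state=0)
activations = nmf.fit_transform(spectrograms)   # shape (200, 16)
dictionary = nmf.components_                    # shape (16, 1024)

print(activations.shape, dictionary.shape)
```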
ISBN (Print): 9783319419206; 9783319419190
Multi-view learning is a very useful classification technique when multiple, conditionally independent feature sets are available in a dataset. In this paper, multi-view learning is used to classify sequences of protein crystallization images that were obtained over periods of time ranging from a few hours to a few months. We introduce the use of difference-image features, along with the original image features, as a second feature set for classifying X-ray crystallography images, after arranging the images according to the timeline of an experiment. The use of multi-view learning is proposed after carrying out experiments to determine the features that should be used in each view to increase classification accuracy. Random forests are used as the classifier in each view, as preliminary experiments suggested that they provide higher classification accuracy on crystallography datasets. An accuracy of 97.2% was obtained using multi-view learning based on the original and difference features, which is the highest obtained so far in the classification of protein crystallography images.
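A minimal sketch of the two-view idea follows: one random forest per feature set, with class probabilities averaged as a simple stand-in for the paper's combination step. The toy data, dimensions, and the averaging rule are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed toy data: 300 images, 64 original and 64 difference features each.
rng = np.random.default_rng(1)
X_original = rng.random((300, 64))
X_difference = rng.random((300, 64))
y = rng.integers(0, 3, size=300)          # e.g. three crystallization classes

# One random forest per view; their class probabilities are averaged.
rf_original = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_original, y)
rf_difference = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_difference, y)

proba = (rf_original.predict_proba(X_original) +
         rf_difference.predict_proba(X_difference)) / 2
predictions = proba.argmax(axis=1)
print(predictions[:10])
```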
ISBN (Digital): 9783319419206
ISBN (Print): 9783319419206; 9783319419190
The goal of this study is to develop a method that is capable of inferring geo-locations for non-representative data. In order to protect the privacy of surveyed individuals, most data collectors release coarse geo-information (e.g., tract), rather than detailed geo-information (e.g., street, apartment number), when sharing surveyed data. Without the exact locations, many point-based analyses cannot be performed. While several scholars have developed new methods to address this issue, little attention has been paid to how to correct for it when the data are not representative. To fill this knowledge gap, we propose a bias correction method that adjusts for the bias using a bias factor approach. Applying our method to an empirical dataset with a known bias associated with gender, we found that our method could generate reliable results despite the non-representativeness of the sample.
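One possible reading of the bias-factor step is sketched below: the factor for each group is the ratio of its reference-population share to its sample share, and it is attached to records as a weight before fine-grained locations are allocated within each coarse tract. The column names, the gender attribute, and the population shares are assumptions, not details from the paper.

```python
import pandas as pd

# Toy sample with coarse tract identifiers and a biased gender mix.
sample = pd.DataFrame({
    "tract": ["A", "A", "A", "B", "B"],
    "gender": ["F", "F", "M", "F", "M"],
})
population_share = {"F": 0.51, "M": 0.49}        # assumed reference distribution

sample_share = sample["gender"].value_counts(normalize=True)
# Bias factor: how strongly each group is under- or over-represented.
bias_factor = {g: population_share[g] / sample_share[g] for g in population_share}
sample["weight"] = sample["gender"].map(bias_factor)

# These weights can down-weight over-sampled groups when assigning records
# to candidate point locations inside each tract.
print(sample)
```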
ISBN (Print): 9783319419206; 9783319419190
We present a solution named LiveDoc, which augments natural language text documents with relevant contextual background information. This background information helps readers better understand the context of the discourse by fetching relevant information from other sources such as Wikipedia. Readers often do not possess all the background and supplementary information required for comprehending the purport of a narrative such as a news op-ed article. At the same time, it is not possible for authors to provide all contextual information while addressing a particular topic. LiveDoc processes the information in a document; uses extracted entities to fetch relevant background information in the context of the document from various sources (as defined by the user) using semantic matching and topic modeling techniques such as Latent Dirichlet Allocation and the Hierarchical Dirichlet Process; and presents the background information to the user by augmenting the original document with the fetched information. The reader is then better equipped to understand the document with this additional background information. We demonstrate the effectiveness of our solution through extensive experimentation and the associated results.
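The snippet below sketches only the LDA topic-modeling step with gensim, comparing the topic mixture of an article with that of a candidate background passage; the toy documents and the library choice are assumptions, and the full system also involves entity extraction, semantic matching, and HDP.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Invented toy corpus standing in for the document collection.
documents = [
    "central bank raises interest rates to curb inflation".split(),
    "researchers train a neural network on protein structures".split(),
    "parliament debates a new climate and energy policy".split(),
]
dictionary = Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

# Compare the article being read with a candidate background passage
# through their inferred topic mixtures.
article = dictionary.doc2bow("bank inflation interest policy".split())
background = dictionary.doc2bow("climate policy debate parliament".split())
print(lda.get_document_topics(article))
print(lda.get_document_topics(background))
```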
ISBN (Print): 9783319419206; 9783319419190
This paper proposes a novel methodology for discovering interestingness hotspots in spatial datasets using a graph-based algorithm. We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert's notion of interestingness, captured by an interestingness function. In our recent work, we proposed a computational framework which discovers interestingness hotspots in gridded datasets using a 3-step approach consisting of seeding, hotspot growing, and post-processing steps. In this work, we extend our framework to discover hotspots in any given spatial dataset. We propose a methodology which first creates a neighborhood graph for the given dataset and then identifies seed regions in the graph using the interestingness measure. Next, we grow interestingness hotspots from the seed regions by adding neighboring nodes, maximizing the given interestingness function. Finally, after all interestingness hotspots have been identified, we create a polygon model for each hotspot using an approach based on Voronoi tessellations and the convex hull of the objects belonging to the hotspot. The proposed methodology is evaluated in a case study on a 2-dimensional earthquake dataset, in which we find interestingness hotspots based on variance and correlation interestingness functions.
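A hedged sketch of the seed-and-grow idea on a k-NN neighborhood graph follows; the random data, the choice of k, and the variance-based interestingness function are illustrative stand-ins for the expert-defined function and the paper's seeding and post-processing steps.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(2)
points = rng.random((100, 2))                      # spatial coordinates
values = rng.random(100)                           # attribute of interest

# Neighborhood graph: each point connected to its 5 nearest neighbors.
graph = kneighbors_graph(points, n_neighbors=5, mode="connectivity")
neighbors = [set(graph[i].indices) for i in range(len(points))]

def interestingness(region):
    return np.var(values[list(region)])            # assumed measure

# Grow a hotspot from a seed node as long as adding a neighbor improves it.
region = {0}
improved = True
while improved:
    improved = False
    frontier = set().union(*(neighbors[i] for i in region)) - region
    for node in frontier:
        if interestingness(region | {node}) > interestingness(region):
            region.add(node)
            improved = True
print(sorted(region))
```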
ISBN (Print): 9781467382045
Data mining and machine learning are among the most popular research areas in computer science and are highly relevant in today's world of unfathomable data. To keep up with the rising size of data, there arises a need to quickly extract knowledge from data sources to aid data analysis research and to meet industry and market needs. Primary data mining algorithms such as k-means, Apriori, and PageRank are in use today, but machine learning techniques can enhance them by learning from complex patterns. This paper focuses on the various existing approaches in which machine learning algorithms have been used to improve data classification and pattern recognition in data mining, especially for feature selection. It compares and contrasts the existing techniques and identifies the best one among them. Further, the paper proposes a heuristic approach to theoretically overcome most of the limitations in the existing algorithms.
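As a small illustration of the general theme surveyed here (machine learning applied to feature selection), the snippet below uses model-based importances from a random forest to prune features; it is a generic example, not a method proposed in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 30 features, only 5 of which are informative.
X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

# Keep only features whose forest importance exceeds the default threshold.
selector = SelectFromModel(RandomForestClassifier(n_estimators=100,
                                                  random_state=0))
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)
```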
ISBN (Print): 9781467393119
In this research work, an ensemble of bagging, boosting, rotation forest, DECORATE, and random subspace methods, with 5 symbolic sub-classifiers in each, is presented. A voting methodology is then used for the final prediction. In order to decrease training time, redundant features were removed before building the ensemble using a lightweight filter feature selection method. A comparison with plain bagging, boosting, rotation forest, DECORATE, and random subspace ensembles of 25 symbolic sub-classifiers each, as well as with other well-known combining methods, is performed on standard benchmark datasets. The proposed technique is shown to be more accurate than the other related methods in most cases.
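A rough scikit-learn sketch of the voting idea is given below; decision trees stand in for the symbolic sub-classifiers, the dataset is a stock benchmark, and rotation forest and DECORATE are omitted because they have no standard scikit-learn implementation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
base = DecisionTreeClassifier(random_state=0)

# Majority vote over three sub-ensembles of 5 members each:
# bagging, boosting, and a random-subspace variant of bagging.
ensemble = VotingClassifier([
    ("bagging", BaggingClassifier(base, n_estimators=5, random_state=0)),
    ("boosting", AdaBoostClassifier(n_estimators=5, random_state=0)),
    ("subspace", BaggingClassifier(base, n_estimators=5, max_features=0.5,
                                   bootstrap=False, random_state=0)),
], voting="hard")

print(cross_val_score(ensemble, X, y, cv=5).mean())
```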
The proceedings contain 55 papers. The special focus of this conference is on Pattern Recognition and Machine Intelligence. The topics include: Recent Advances in Recommender Systems and Future Directions; On the Number of Rules and Conditions in Mining Data with Attribute-Concept Values and "Do Not Care" Conditions; Simplifying Contextual Structures; Towards a Robust Scale Invariant Feature Correspondence; Hierarchical Agglomerative Method for Improving NPS; A New Linear Discriminant Analysis Method to Address the Over-Reducing Problem; Procedural Generation of Adjustable Terrain for Application in Computer Games Using 2D Maps; Fixed Point Learning Based 3D Conversion of 2D Videos; Fast and Accurate Foreground Background Separation for Video Surveillance; Enumeration of Shortest Isothetic Paths Inside a Digital Object; Modified Exemplar-Based Image Inpainting via Primal-Dual Optimization; A Novel Approach for Image Super Resolution Using Kernel Methods; Generation of Random Triangular Digital Curves Using Combinatorial Techniques; Tackling Curse of Dimensionality for Efficient Content Based Image Retrieval; Face Profile View Retrieval Using Time of Flight Camera Image Analysis; Context-Based Semantic Tagging of Multimedia Data; Improved Simulation of Holography Based on Stereoscopy and Face Tracking; Head Pose Tracking from RGBD Sensor Based on Direct Motion Estimation; A Novel Hybrid CNN-AIS Visual Pattern Recognition Engine; Modified Orthogonal Neighborhood Preserving Projection for Face Recognition; An Optimal Greedy Approximate Nearest Neighbor Method in Statistical Pattern Recognition.
ISBN (Print): 9783319089799; 9783319089782
Binary decision diagrams (BDDs) are a compact and efficient representation of Boolean functions, with extensions available for sets and finite-valued functions. The key feature of the BDD is its ability to exploit the internal structure (not necessarily known upfront) of the object being modelled in order to provide a compact in-memory representation. In this paper we propose an application of BDDs to machine learning as a tool for fast general pattern recognition. Multiple BDDs are used to capture sets of training samples (patterns) and to estimate the similarity of a given test sample with the memorized training sets. Then, given multiple similarity estimates, further analysis is done using an additional layer of BDDs or common machine learning techniques. We describe training algorithms for BDDs (supervised, unsupervised, and combined), an approach for constructing multi-layered networks combining BDDs with traditional artificial neurons, and present experimental results for handwritten digit recognition on the MNIST dataset.
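A loose sketch of the memorization idea using the `dd` Python package (an assumed tooling choice, not the authors' implementation) is shown below: the training bit-patterns of one class are stored as a single Boolean function, and a test pattern is scored here by plain membership, whereas the paper's similarity estimate is more elaborate.

```python
from dd.autoref import BDD

N_BITS = 8                                   # toy pattern length
VARS = [f"x{i}" for i in range(N_BITS)]

bdd = BDD()
bdd.declare(*VARS)

def pattern_to_node(bits):
    """Conjunction of literals describing one exact bit-pattern."""
    node = bdd.true
    for var, bit in zip(VARS, bits):
        node &= bdd.var(var) if bit else ~bdd.var(var)
    return node

# Memorize a tiny training set for one class as a disjunction of patterns.
training = [(1, 0, 1, 1, 0, 0, 1, 0), (1, 1, 1, 0, 0, 0, 1, 0)]
memory = bdd.false
for sample in training:
    memory |= pattern_to_node(sample)

# Score a test sample by membership in the memorized set.
test = (1, 0, 1, 1, 0, 0, 1, 0)
assignment = {var: bool(bit) for var, bit in zip(VARS, test)}
print(bdd.let(assignment, memory) == bdd.true)
```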
ISBN (Print): 9781479959341
Measuring similarity or distance between two data points is fundamental to many machine learning algorithms such as K-Nearest-Neighbor, clustering, etc. Depending on the nature of the data points, various measurements can be used. DTW is widely used for mining time series, but it is poorly suited to large datasets because of its quadratic complexity. Global constraints narrow the search path in the matrix, which results in a significant decrease in the number of calculations performed. Ideally, the distance between examples from the same class is small, while instances from different classes are separated by large distances; the field of metric learning was introduced to enforce such criteria. In some time series classification tasks, it is common for two time series to be out of phase even though they share the same class label. An appropriate constraint on DTW can strongly improve classification performance; the difficulty lies in choosing the appropriate size of the global constraint. A Tabu search algorithm is used to find the optimal size of the global constraint. Results show the efficiency of the proposed method in terms of both the improvement in classification results and the CPU time.
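The sketch below implements DTW restricted by a Sakoe-Chiba band, the kind of global constraint whose size the paper tunes with Tabu search; the window value and test signals are illustrative, and the Tabu search itself is not shown.

```python
import numpy as np

def dtw_distance(a, b, window):
    """DTW between 1-D series a and b with a band of half-width `window`."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)               # only cells inside the band
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Two out-of-phase sine waves: a small window already aligns them well.
t = np.linspace(0, 2 * np.pi, 50)
x, y = np.sin(t), np.sin(t + 0.3)
print(dtw_distance(x, y, window=5))
```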