ISBN: 9783319119328 (print)
This volume contains 95 papers presented at FICTA 2014: Third International Conference on Frontiers in Intelligent Computing: Theory and Applications. The conference was held on 14-15 November 2014 in Bhubaneswar, Odisha, India. The papers focus mainly on Data Warehousing and Mining, Machine Learning, Mobile and Ubiquitous Computing, AI, E-commerce & Distributed Computing and Soft Computing, Evolutionary Computing, Bio-inspired Computing and its Applications.
ISBN: 9783319089799; 9783319089782 (print)
In this paper, we present a novel model for document representation. In contrast with the classical Vector Space Model, which represents each document by a single vector in the feature space, our model represents each document by a vector in the space of the training documents of each category. For this model we develop a discriminative classifier based on the norms of the vectors it generates. We call this algorithm the Nearest Centroid based on Vector Norms. Our main goal in proposing this classification framework is to overcome the problems of high dimensionality and vector sparsity that are common in text classification. We evaluate the proposed framework by comparing its effectiveness and efficiency with those of standard classifiers used with the classical document representation: Naive Bayes (NB), Support Vector Machines (SVM) and k-Nearest Neighbors (kNN). We conduct our experiments on multilingual balanced and unbalanced binary data sets. The results show that our algorithm is competitive with the classical methods and, at the same time, dramatically faster, especially in comparison with NB and kNN. We also apply our model to the Reuters21578 corpus to evaluate its performance in a multi-class setting. The result obtained (85.4% micro-F1) is promising and can be improved in future work.
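The abstract does not give the algorithm's details, so the following is only a minimal sketch of one possible reading of the idea, not the authors' method. It assumes (hypothetically) that a document's coordinates in a category's space are its cosine similarities to that category's training documents, and that the predicted category is the one whose vector has the largest root-mean-square norm. All function names are illustrative.

```python
import math
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector for a whitespace-tokenised text."""
    return Counter(text.lower().split())


def dot(a, b):
    """Dot product of two sparse vectors stored as Counters."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(term, 0) for term, weight in a.items())


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is empty."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def category_space_vector(doc, training_docs):
    """ASSUMPTION: one coordinate per training document of the category."""
    return [cosine(doc, t) for t in training_docs]


def classify_by_norm(text, training):
    """Pick the category whose category-space vector has the largest norm.

    ASSUMPTION: the norm is divided by sqrt(number of training documents)
    so that larger categories are not favoured on unbalanced data.
    """
    doc = tf_vector(text)
    scores = {}
    for label, docs in training.items():
        v = category_space_vector(doc, docs)
        scores[label] = math.sqrt(sum(x * x for x in v) / len(v))
    return max(scores, key=scores.get), scores


# Toy training set (invented for illustration only).
training = {
    "sports": [tf_vector("football match goal"), tf_vector("tennis match score")],
    "finance": [tf_vector("stock market price"), tf_vector("bank interest rate")],
}
label, scores = classify_by_norm("the match ended with a late goal", training)
print(label, scores)
```

Under these assumptions each document is described by as many coordinates as the category has training documents, rather than one per vocabulary term, which is the kind of dimensionality reduction the abstract refers to.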
In recent years, learning from imbalanced data sets has become a challenging issue in the machine learning and data mining communities. This problem occurs when some classes of data have a smaller number of instances than o...
ISBN: 9783319089799; 9783319089782 (print)
Monitoring data streams in a distributed system is a challenging problem with profound applications. The task of feature selection (e.g., by monitoring the information gain of various features) is an example of an application that requires special techniques to avoid a very high communication overhead when performed using straightforward centralized algorithms. Motivated by recent contributions based on geometric ideas, we present an alternative approach that combines system theory techniques and clustering. The proposed approach enables monitoring values of an arbitrary threshold function over distributed data streams through a set of constraints applied independently on each stream and/or clusters of streams. The clusters are designed to adapt themselves to the data stream. A correct choice of clusters yields a reduction in communication load. Unlike many clustering algorithms that attempt to collect together similar data items, monitoring requires clusters with dissimilar vectors canceling each other as much as possible. In particular, sub-clusters of a good cluster do not have to be good. This novel type of clustering dictated by the problem at hand requires development of new algorithms, and the paper is a step in this direction. We report experiments on real-world data that detect instances where communication between nodes is required, and show that the clustering approach reduces communication load.
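The remark that good clusters contain dissimilar vectors that cancel each other can be illustrated with a small sketch. It is not the paper's algorithm. It assumes (hypothetically) that each node tracks a drift vector, meaning its current value minus a reference value, and that the monitored condition is safe as long as the average drift stays inside a ball of radius `delta`. Because the global average drift is a weighted average of the cluster averages, it is enough for every cluster's average drift to stay inside that ball.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


def cluster_is_quiet(drifts, delta):
    """True if the cluster's mean drift stays within the ball of radius delta."""
    return norm(mean_vector(drifts)) <= delta


def clusters_needing_sync(drifts, clusters, delta):
    """Names of clusters whose local constraint is violated."""
    return [
        name
        for name, members in clusters.items()
        if not cluster_is_quiet([drifts[m] for m in members], delta)
    ]


# Hypothetical drift vectors for three nodes.
drifts = {"a": [3.0, 0.0], "b": [-3.0, 0.2], "c": [0.1, 0.1]}
delta = 1.0

# Every node monitored on its own: a and b both violate the constraint.
singletons = {name: [name] for name in drifts}
print(clusters_needing_sync(drifts, singletons, delta))

# Pairing the two opposing nodes: their drifts cancel and nothing is reported.
paired = {"ab": ["a", "b"], "c": ["c"]}
print(clusters_needing_sync(drifts, paired, delta))
```

The cluster `ab` satisfies its constraint even though neither of its members does on its own, which matches the observation that sub-clusters of a good cluster do not have to be good.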
ISBN: 9783642550324; 9783642550317 (print)
As the landscape around Big Data continues to evolve exponentially, the "big" facet of Big Data is no longer the number one priority of researchers and IT professionals. The race has recently become more about how to sift through torrents of data to find the hidden diamond and engineer a better, smarter and healthier world. The ease with which our mobile devices capture daily data about ourselves makes them an exceptionally suitable means for ultimately improving the quality of our lives and gaining valuable insights into our affective, mental and physical state. This paper takes a first exploratory step in this direction by using the mobile device to process and analyze the "digital exhaust" it collects, in order to automatically recognize our emotional states and respond to them in the most effective and "human" way possible. To achieve this, we treat all technical, psychosomatic, and cognitive aspects of emotion observation and prediction, and repackage these elements into a mobile multimodal emotion recognition system that can be used on any mobile device.
Cross-lingual sentiment classification aims to automatically predict the sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The f...
Biomedical experts are confronted with "Big Data", driven by the trend towards precision medicine. Despite the fact that humans are excellent at pattern recognition in dimensions of ≤ 3, most biomedical dat...
We propose a set of features to study the effects of data streams on complex systems. This feature set is called the signature representation of a stream. It has its origin in pure mathematics and relies on a rela...
The paper describes a novel approach to categorize users' reviews according to the three Quality in Use (QU) indicators defined in ISO: effectiveness, efficiency and freedom from risk. With the tremendous amount o...
Pattern matching is one of the principal methodologies in speech signal processing and isolated word recognition. A number of similarity measures exist for finding the similarity between target words in acquired speech by matching against a template. The retrieved similarity result depends both on the feature vector extraction technique and on the distance metric produced by the similarity measure. The Short Time Fourier Transform is used to obtain spectrograms, which are used to compare the performance of different similarity measures. In this paper, a novel similarity measure based on vector addition is proposed. It has a lower execution time than the other similarity measures. We have done a detailed study of the performance of different similarity measurement techniques for a number of spectrograms on a large database of speech recordings. Manhattan distance, Euclidean distance, vector cosine angle distance, Bhattacharyya coefficients and the posterior probability measure are used for the performance comparison. The results show that the proposed similarity measure outperforms all of the above measures in terms of time complexity and similarity performance.
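For reference, four of the baseline measures named above can be sketched as follows, using their standard textbook definitions on two equal-length feature vectors (for example, flattened spectrogram magnitudes). The proposed vector-addition measure and the posterior probability measure are not defined in the abstract, so they are omitted. Whether the paper uses the cosine of the angle or one minus it is not stated; the sketch returns the cosine itself. Function names are illustrative.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_of_angle(a, b):
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def bhattacharyya_coefficient(a, b):
    """Overlap of two non-negative vectors after normalising each to sum 1.

    ASSUMPTION: inputs are non-negative with a positive sum, as spectrogram
    magnitudes would be.
    """
    sa, sb = sum(a), sum(b)
    return sum(math.sqrt((x / sa) * (y / sb)) for x, y in zip(a, b))


# Hypothetical template and test vectors.
template = [1.0, 2.0, 3.0]
candidate = [1.0, 2.0, 4.0]
for measure in (manhattan, euclidean, cosine_of_angle, bhattacharyya_coefficient):
    print(measure.__name__, measure(template, candidate))
```

The two distances are smaller for more similar vectors, while the cosine and the Bhattacharyya coefficient are larger, so any comparison across measures has to account for the opposite orientation.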