ISBN: 9781479958962 (print)
This paper presents an algorithm that judges whether the output label produced by a multiclass support vector machine (SVM) is correct or incorrect without knowing the true answer. The judgment relies only on a confidence analysis based on pre-training/testing with the training data. Such a true/false judgment is useful for refining the output labels. We experimentally demonstrate that the difference in decision value between the top candidate and the second candidate is a good measure. In addition, a proper threshold can be determined by pre-training/testing using only the training data. Experimental results on three standard image datasets demonstrate that the proposed algorithm improves the Matthews correlation coefficient (MCC) considerably more than simply thresholding the decision value of the top candidate.
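The top-two margin idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`top2_margin`, `mcc`, `pick_threshold`) are hypothetical, the input is assumed to be a matrix of per-class decision values, and the threshold search simply maximizes MCC over the margins observed on held-out training data.

```python
import numpy as np

def top2_margin(decision_values):
    """Return (predicted class index, top-minus-second margin) for each row."""
    scores = np.asarray(decision_values, dtype=float)
    order = np.argsort(scores, axis=1)           # ascending per row
    rows = np.arange(scores.shape[0])
    top, second = order[:, -1], order[:, -2]
    return top, scores[rows, top] - scores[rows, second]

def mcc(judged_true, actually_true):
    """Matthews correlation coefficient for boolean judgments."""
    j = np.asarray(judged_true, dtype=bool)
    a = np.asarray(actually_true, dtype=bool)
    tp, tn = int(np.sum(j & a)), int(np.sum(~j & ~a))
    fp, fn = int(np.sum(j & ~a)), int(np.sum(~j & a))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def pick_threshold(margins, correct):
    """Assumed selection rule: the observed margin that maximizes MCC."""
    margins = np.asarray(margins, dtype=float)
    candidates = np.unique(margins)
    scores = [mcc(margins >= t, correct) for t in candidates]
    return float(candidates[int(np.argmax(scores))])
```

A label would then be judged "true" when its margin is at least the chosen threshold. Thresholding the top decision value alone, the baseline named in the abstract, corresponds to replacing the margin with the top score.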
ISBN: 9781479942367 (print)
The classical front-end analysis in speech recognition is a spectral analysis that parameterizes the speech signal into feature vectors. This paper proposes a voice recognition model that can automatically classify and recognize a voice signal with background noise. The model uses the spectrogram, pitch period, short-time energy, zero-crossing rate, mel frequency scale, and cepstral coefficients to calculate feature vectors. k-Nearest Neighbor (k-NN) classification is used to classify and recognize the real-time input signal. The analytic hierarchy process is used to decide the weighting of the different features.
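Two of the features listed above, short-time energy and zero-crossing rate, are simple enough to sketch directly. The snippet below is an illustrative assumption rather than the paper's pipeline: the function names, frame length, and hop size are hypothetical, and the signal is assumed to be at least one frame long.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1                       # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
```

Per-frame values such as these could be concatenated with the other features into one vector before a k-NN classifier is applied.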
ISBN: 9781509061822 (print)
There is very little practical significance in proving the equivalence between a pseudo-inverse linear discriminant (PILD) whose desired outputs are inversely proportional to the number of within-class samples and a Fisher linear discriminant (FLD) with total-projected-mean thresholds, because such thresholds are disadvantageous for improving the overall classification accuracy. Even so, several examples have shown that a PILD is not wholly equivalent to an FLD. Consequently, the most often used total-projected-mean thresholds usually perform poorly. Starting from the customary desired targets {1, -1}, a simple practical threshold is obtained that depends only on the sample sizes. By substituting the desired targets with the actual algebraic distances of all training samples, a new threshold is obtained. When the desired targets differ from each other, the weight vector and the threshold given by a PILD are no longer equal to those given by an FLD. In that case, a PILD is wholly different from an FLD.
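For readers unfamiliar with the PILD, a generic version can be sketched as a least-squares fit through the Moore-Penrose pseudo-inverse. This is a textbook-style illustration under stated assumptions, not the thresholds derived in the paper: the names `fit_pild` and `predict` are hypothetical, and the bias is obtained by appending a constant column to the data.

```python
import numpy as np

def fit_pild(X, targets):
    """Least-squares weights and bias via the pseudo-inverse of [X, 1]."""
    X = np.asarray(X, dtype=float)
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # augmented data matrix
    solution = np.linalg.pinv(A) @ np.asarray(targets, dtype=float)
    return solution[:-1], solution[-1]             # weight vector, bias

def predict(X, w, b):
    """Assign +1 or -1 according to the sign of w.x + b."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)
```

Changing the `targets` vector, for example from {1, -1} to class-size-dependent values, changes both the weight vector and the bias, which is the degree of freedom the abstract discusses.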
This thesis focuses on the field of job recommendation. In particular, we use the implicit preferences that job seekers exhibit in their interactions with a web platform to propose an improved ranking algorithm for a job recommendation platform called Magnet.me. We also study the evaluation of relevance and the evaluation of recommendation sorting algorithms to determine the degree of improvement achieved by the proposed algorithm. Using NDCG with different relevance evaluations, we test the performance of the proposed algorithm in an online experiment on the job recommendation platform. We find that the evaluation of relevance strongly affects the discriminative power of NDCG. The evaluation shows that our sorting algorithm outperforms the original algorithm when using classical binary relevance, or relevance evaluations that consider items with negative feedback less relevant than items with missing feedback. However, when using relevance evaluations for NDCG that penalize missing feedback more than negative feedback, NDCG loses its ability to distinguish between the algorithms' performance. Based on the baseline evaluation using MRR and the different evaluations using NDCG, we conclude that the proposed recommendation sorting algorithm outperforms the original algorithm.
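The effect of the relevance mapping on NDCG, described above, can be made concrete with a small sketch. The logarithmic discount used here is the common NDCG definition. The feedback labels and the two mappings (`BINARY`, `GRADED`) are hypothetical illustrations, not the relevance evaluations used in the thesis.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance mappings for three kinds of feedback.
BINARY = {"positive": 1, "missing": 0, "negative": 0}
GRADED = {"positive": 2, "missing": 1, "negative": 0}  # negative below missing

def score(ranked_feedback, mapping):
    """NDCG of a ranked list of feedback labels under a given mapping."""
    return ndcg([mapping[f] for f in ranked_feedback])
```

Under `BINARY`, two rankings that differ only in where the negative and missing items sit receive the same score, whereas `GRADED` separates them.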
ISBN: 9781467389808 (print)
To address the problem of low speech recognition rates, an improved method that combines a Deep Belief Network (DBN) with a support vector machine (SVM) for analyzing small-sample speech signals is proposed. The speech
ISBN: 9781509041343 (print)
Word segmentation is the very first task in Vietnamese language processing. Word-segmented text is the input to almost all other NLP tasks. The task faces some challenges due to specific characteristics of the language. As in many other Asian languages such as Japanese, Korean, and Chinese, white spaces in Vietnamese are not always used as word separators, and a word may contain one or more syllables. In this paper, we propose an efficient hybrid approach to detecting word boundaries in Vietnamese texts that uses logistic regression as a binary classifier in combination with the longest matching algorithm. First, the longest matching algorithm is used to catch words that contain more than two syllables in the input sentence. Next, the system uses the classifier to determine the boundaries of 2-syllable words and proper names. Then, the low-confidence predictions made by the classifier are verified against a dictionary to obtain the final result. Our system achieves an F-measure of 98.82%, which is, to the best of our knowledge, the most accurate result for Vietnamese word segmentation. Moreover, the system is fast: it can segment nearly 34k tokens per second.
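The longest matching step mentioned above can be sketched as a greedy left-to-right dictionary lookup over syllables. This is a generic version under assumptions, not the paper's system: the function name, the toy dictionary, and the `max_len` limit are hypothetical, and the logistic regression stage is omitted.

```python
def longest_matching(syllables, dictionary, max_len=4):
    """Greedily group syllables into the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest span first; a single syllable is always accepted.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i : i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words

toy_dictionary = {"học sinh", "sinh học"}   # "pupil", "biology"
print(longest_matching("học sinh học sinh học".split(), toy_dictionary))
```

In this toy example the greedy pass returns "học sinh", "học sinh", "học", even though "học sinh", "học", "sinh học" is also a valid reading. Ambiguities of this kind are what a classifier stage for 2-syllable words is meant to resolve.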
ISBN: 9781509020980 (print)
In this paper, we propose a novel Payload-based One-class Classifier for Anomaly Detection called POCAD, which combines a generalized 2w-gram feature extractor and a one-class SVM classifier to effectively detect network intrusion attacks. We extensively evaluate POCAD with real-world datasets of HTTP-based attacks. Our experimental results show that POCAD can quickly detect malicious payloads and achieves a high detection rate as well as a low false positive rate. The results also show that POCAD outperforms state-of-the-art payload-based detection schemes such as McPAD [4] and PAYL [8].
ISBN: 9781509027118 (print)
We present an online algorithm for detecting boundary classification errors that improves the accuracy of the original distributed boundary detection algorithm for networked multi-robot systems. It is a fully decentralized method based on a geometric approach that suppresses boundary errors without a recursive process or global synchronization. The accuracy, measured as the ratio of correctly identified robots to the total number of robots, reaches 100%. We have demonstrated the effectiveness of this boundary detection algorithm in both simulation and real-world environments.
ISBN: 9781479973972 (print)
In real-time anomaly detection, reducing the dimensionality and improving the recognition rate are the two most crucial problems. An unbalanced data distribution is one of the main causes of a low recognition rate. In this paper, a hybrid approach using Tabu search (TS) and an ensemble classification algorithm is proposed. Tabu search is applied to select the features and the weights of the ensemble classifier simultaneously. To relieve the unbalanced data problem, three policies are used: exploiting the cost function of TS to attach more importance to a high recognition rate for the minority class during feature selection, constructing new sample sets by oversampling and undersampling, and using the ensemble classification method to improve the detection accuracy at low false positive rates. Experimental results show that the approach effectively improves the classification accuracy for unbalanced classes.
In the field of information technology, one of the fundamental skills of every programmer is a command of sorting algorithms. Sorting algorithms are used in a very wide range of applications, and although they are written differently in each programming language, their principle remains the same. In this thesis, I deal with sorting algorithms and describe the program that is part of this bachelor's thesis. The program is intended to demonstrate, in a simple form, the principles of the most commonly used sorting algorithms and thus serves as an aid to understanding sorting methods more easily.