The Higgs boson is an elementary particle that gives mass to everything in the natural world. Discovering the Higgs boson is a major challenge for particle physics. This paper proposes to solve the Higgs boson classification problem with four machine learning (ML) methods in the PySpark environment: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Gradient Boosted Tree (GBT). We compare the accuracy and AUC of these methods. In the experiments, we use large Higgs boson datasets: one collected from the public UCI site and one downloaded from Kaggle.
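The two metrics compared in this abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's PySpark pipeline; the function names, the 0.5 threshold, and the toy scores are assumptions made only for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count how often a positive example is scored above a negative one.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = signal, 0 = background
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.3, 0.1]
print(accuracy(labels, scores), auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC only depends on how the scores rank positives against negatives, which is one reason to report both when comparing classifiers.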
ISBN: 9781450388894 (print)
Outlier detection is one of the main fields in machine learning, and it has been growing rapidly due to its wide range of applications. In the last few years, deep learning-based methods have outperformed machine learning and handcrafted outlier detection techniques, and our method is no different. We present a new twist on generative models that leverages variational autoencoders as a source of uniform distributions, which can be used to separate the inliers from the outliers. Both the generative and adversarial parts of the model are used to obtain three main losses (reconstruction loss, KL divergence, and discriminative loss), which in turn are wrapped with a one-class SVM that makes the predictions. We evaluated our method on several image and tabular datasets. It showed strong results on the zero-shot outlier detection problem and generalized easily to supervised outlier detection tasks, where performance increased further. For comparison, we evaluated our method against several common outlier detection techniques, such as DBSCAN-based outlier detection, GMM, K-means, and a one-class SVM applied directly, and it outperformed all of them on all datasets.
This paper proposes a new form of diagnosis and repair based on reinforcement learning. Self-interested agents learn locally which agents may provide a low quality of service for a task. The correctness of learned ass...
This paper shows a new approach for anomaly detection by combining the extraction of so-called triples consisting of a subject, predicate, and object using dynamic anomaly-detection. First, the methods used to extract...
Methods from supervised machine learning allow the classification of new data automatically and are tremendously helpful for data analysis. The quality of supervised machine learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually, relying on selections and annotations by expert analysts, and is often a tedious and time-consuming process. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations (similarity map, scatterplot matrix (SPLOM), and parallel coordinates) with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used.
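As a rough illustration of how an active learning algorithm can pick instances to label, the sketch below uses uncertainty sampling for a binary classifier. This is one common strategy, assumed here for illustration only; the abstract does not say which active learning algorithms the study used, and the function name and probabilities are hypothetical.

```python
def select_for_labelling(probabilities, k):
    """Return indices of the k instances whose predicted positive-class
    probability is closest to 0.5, i.e. where a binary classifier is least sure."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Hypothetical predicted probabilities for five unlabelled instances
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
print(select_for_labelling(probs, 2))  # [1, 3]
```

In an interactive visual labelling setting, the analyst makes this selection by exploring the visualisation instead of relying on such a score.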
For a class 𝒟 of drawings of loopless (multi-)graphs in the plane, a drawing D ∈ 𝒟 is saturated when the addition of any edge to D results in D′ ∉ 𝒟; this is analogous to saturated graphs in a graph class as introd...
ISBN: 9781728193205 (digital)
ISBN: 9781728193236 (print)
Training end-to-end automatic speech recognition models requires a large amount of labeled speech data, which is challenging to obtain for languages with fewer resources. In contrast to the commonly used feature-level data augmentation, we propose to expand the training set at the data level by using different audio codecs. The augmentation method applies different audio codecs with changed bit rate, sampling rate, and bit depth. These changes ensure variation in the input data without drastically affecting the audio quality. In addition, the audio remains perceivable by humans, and any feature extraction is still possible afterwards. To demonstrate the general applicability of the proposed augmentation technique, we evaluated it with an end-to-end automatic speech recognition architecture in four languages. After applying the method to the Amharic, Dutch, Slovenian, and Turkish datasets, we achieved an average improvement of 1.57 in character error rate (CER) without integrating language models. Compared to the baseline results, the CER improved by 2.78, 1.25, 1.21, and 1.05 for the respective languages. On the Amharic dataset, we reached a syllable error rate reduction of 6.12 compared to the baseline result.
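The idea of creating training variants by changing bit depth and sampling rate can be sketched in pure Python as follows. This is not an audio codec and not the authors' pipeline: the function names, the block-averaging downsampler, and the toy waveform are assumptions chosen to keep the example self-contained.

```python
def reduce_bit_depth(samples, bits):
    """Re-quantize float samples in [-1.0, 1.0] to a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1)
    out = []
    for x in samples:
        q = round(x * levels)
        q = max(-levels, min(levels - 1, q))  # clip to the signed integer range
        out.append(q / levels)
    return out


def downsample(samples, factor):
    """Lower the sampling rate by an integer factor by averaging each block.

    Block averaging is only a crude low-pass filter; real resamplers do better.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def augment(samples, settings):
    """Return one degraded copy of `samples` per (bits, factor) setting."""
    return [downsample(reduce_bit_depth(samples, bits), factor)
            for bits, factor in settings]


# Hypothetical toy waveform and (bit depth, downsampling factor) settings
wave = [0.0, 0.26, 0.5, 0.74, 1.0, 0.74, 0.5, 0.26]
variants = augment(wave, [(8, 1), (4, 2)])
print([len(v) for v in variants])
```

Each variant keeps the same transcript as the original recording, so the labeled training set grows without any new annotation.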
Analyzing multi-way measurements with variations across one mode of the dataset is a challenge in various fields including data mining, neuroscience and chemometrics. For example, measurements may evolve over time or ...