In this paper, we have discussed different data pre-processing techniques and different machine learning and deep learning models which are used for sentiment analysis. The dataset used was 'Restaurant Reviews'...
ISBN: 9781665490627 (digital); 9781665490627 (print)
Machine learning-based classification algorithms typically operate under the assumptions that the underlying data-generating distribution is stationary and draws from a finite set of categories. In some scenarios, these assumptions might not hold, but identifying violating inputs, here referred to as anomalies, is a challenging task. Recent publications propose deep learning-based approaches that perform anomaly detection and classification jointly by (implicitly) learning a mapping that projects data points to a lower-dimensional space, such that the images of points of one class reside inside a hypersphere, while others are mapped outside of it. In this work, we propose Multi-Class Hypersphere Anomaly Detection (MCHAD), a new hypersphere learning algorithm for anomaly detection in classification settings, as well as a generalization of existing hypersphere learning methods that allows incorporating example anomalies into the training. Extensive experiments on competitive benchmark tasks, as well as theoretical arguments, provide evidence for the effectiveness of our method. Our code is publicly available.
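The hypersphere idea described in the abstract above can be illustrated with a minimal sketch. This is not MCHAD itself: the class centers, the shared radius, and the function names `nearest_center` and `classify_or_flag` are hypothetical, and the embeddings are assumed to have been produced already by some learned mapping.

```python
import math

def nearest_center(z, centers):
    """Return (class_label, distance) for the center closest to embedding z."""
    best_label, best_dist = None, math.inf
    for label, center in centers.items():
        d = math.dist(z, center)  # Euclidean distance in the embedding space
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

def classify_or_flag(z, centers, radius):
    """Predict the nearest class, or flag an anomaly if z lies outside every hypersphere."""
    label, dist = nearest_center(z, centers)
    return ("anomaly", dist) if dist > radius else (label, dist)

# Hypothetical 2-D embeddings: two class centers sharing one radius (assumed values).
centers = {"cat": (0.0, 0.0), "dog": (4.0, 0.0)}
radius = 1.0

inlier = classify_or_flag((0.3, 0.4), centers, radius)   # lies inside the "cat" hypersphere
outlier = classify_or_flag((2.0, 3.0), centers, radius)  # lies outside both hyperspheres
```

In this toy setup the distance to the nearest center doubles as an anomaly score, so classification and anomaly detection share one embedding.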
ISBN: 9798331529246 (print)
Multifarious approaches exist for data preprocessing in machine learning (ML), pattern recognition, and data mining, and feature selection (FS) is regarded as the most efficacious technique for structured dimensionality reduction. The presence of irrelevant, redundant, and disproportionate features directly diminishes a model's performance and dramatically increases its complexity. Some FS algorithms outperform others, and the most influential aspect of a well-performing FS algorithm is discovering the important features that help optimize model performance by processing data more efficiently. Therefore, this study introduces a novel FS technique based on joint mutual information (FSJMI) for life science data. This technique reduces the number of features and enhances the model's performance, or at least maintains it in most cases, making it a unique and promising approach. It also improves predictive accuracy while minimizing computational overhead by substituting the correlation between the features and the correlation substitution of the class values with the features. Comprehensive experiments on ten diverse life science datasets have been conducted to validate the suggested model's performance. The experimental results demonstrate a substantial improvement of the proposed technique over other well-known FS techniques and over no selection (NoSel) in the primary dataset. The results obtained using widely used ML algorithms such as Decision Tree (DT), K Nearest Neighbour (KNN), Naive Bayes (NB) Classifier, and Logistic Regression (LR) reveal that the presented algorithm outperformed NoSel and other FS algorithms in mean accuracy by at least 2.7%, 3.466%, 2.038%, and 1.792%, respectively. Additionally, regarding the LR model, the proposed algorithm outperforms one of the FS models by at most 11.631%. These findings of the proposed algorithm regarding outcomes highlight its capabili
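For readers unfamiliar with joint mutual information, the following is a minimal sketch of the generic greedy JMI criterion for discrete features. It is an assumption-laden illustration, not the FSJMI technique of the abstract above: the function names `mutual_information` and `jmi_select` and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def jmi_select(features, target, k):
    """Greedy JMI: pick argmax I(f;Y) first, then repeatedly add argmax sum_s I((f,s);Y)."""
    remaining = dict(features)
    first = max(remaining, key=lambda name: mutual_information(remaining[name], target))
    selected = [first]
    del remaining[first]
    while remaining and len(selected) < k:
        def score(name):
            # Joint variable (f, s) is formed by pairing values sample by sample.
            return sum(
                mutual_information(list(zip(remaining[name], features[s])), target)
                for s in selected
            )
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Toy data (made up): the target is fully determined by a and b together;
# a_copy duplicates a and therefore adds nothing once a is selected.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b}

selected = jmi_select(features, y, k=2)
```

On this toy data all three features tie on individual relevance, so the first listed one is taken; the second pick is `b`, because pairing it with `a` explains the target completely while the redundant `a_copy` does not.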
ISBN: 9783031706448; 9783031706455 (print)
This article outlines the digitization process and methodology applied to the archive of parliamentary questions from the 1st Parliamentary Term (1974-1977) in the Hellenic Parliament. A collaborative pilot project involving parliament, academia, and a research center facilitated the conversion of printed material to open data. The main tasks of the project include capturing digital images, developing a custom Optical Character Recognition (OCR) software solution employing machine learning, and rigorously validating the accuracy of a fragmented polytonic corpus of variable quality written in Katharevousa, a variety of Modern Greek. The article discusses the approach and challenges as well as the initial results of the digitization effort, emphasizing ongoing research steps. Overall, 1,674 images were digitally processed, corresponding to 1,338 questions. Following algorithmic training, character recognition accuracy is over 98.5%. Successful implementation streamlines further similar digitization operations in the vast parliamentary archives, while enabling in-depth studies on parliamentary control in the turbulent period of the immediate post-junta era in Greece. A preliminary comparative analysis with a corpus of newer parliamentary questions (2009-2019) provides insights and incentives for the further study of the characteristics and evolution of the Greek language.
Due to the increasing amount of internet usage, mining useful information and knowledge from proxy server logs is evolving into a significant research area. Web usage mining is the method of extracting interesting patt...
Information technology has become an integral part of modern life, but with this come new cyber threats. One of them is botnets—networks of infected computers that criminals use for DDoS attacks, data theft, and spam...
In this research paper, our main focus is to design and develop a classification and recognition system for identifying and retrieving sunflowers in their natural environment, based on a multi-layer method. Further, we design applications for their better classification. To handle this difficult task, an interdisciplinary collaboration draws on the latest advances in software engineering, implemented with machine learning. The proposed work is designed to extend the use of machine learning techniques. It uses texture features, RST-invariant features, pattern classification, and the k-nearest neighbor (KNN) algorithm. First, the paper studies how to gather flower images from the natural environment together with their backgrounds; second, it focuses on sunflower classification through machine learning. The automated approach covers six types of sunflower, with images of fully bloomed sunflowers captured by a digital camera. Recognition was carried out on 280 pictures. The method recognizes and classifies sunflowers using the k-nearest neighbor algorithm, with an overall accuracy of 88.52%. We trained the model on labeled data; when unseen data is presented, the predictive model recognizes the sunflower using supervised machine learning. (c) 2021 Elsevier Ltd. All rights reserved. Selection and peer-review under responsibility of the scientific committee of the 1st International Conference on Computations in Materials and Applied Engineering - 2021.
ISBN: 9781665490627 (digital); 9781665490627 (print)
With the advances in machine learning, lie detection technology has gained significant attention. In recent years, several multi-modal techniques achieved as high as 99% accuracy results using the Real-life Trial dataset with only 121 data points. This led to considerable media hype and research interest in lie detection with machine learning. In this paper, we analyze the effect of dataset bias in deception detection. More specifically, we train a classifier to predict the sex of the identity appearing in the video. On a test data point, we use the sex predictor to predict sex, which we use as a proxy for predicting deception, predicting lie for females and truth for males. This lie predictor simulates a classifier that uses nothing but dataset bias. Nevertheless, we find that the performance of this biased classifier is comparable to those of state-of-the-art papers. More specifically, when using IDT features, our biased classifier achieves 64.6% and 59.3% AUC while a classifier trained normally on truth/lie labels achieves 57.4% accuracy and 69.3% AUC. We perform similar experiments on the Bag-of-Lies dataset and show that it too is biased with respect to sex. In addition, we apply the state-of-the-art techniques on an unbiased dataset and show that their performance is no better than chance. Our experiments strongly suggest that the results of recent deception detection techniques can be explained by the bias inherent in the datasets.
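The bias-only predictor described in the abstract above is simple enough to sketch directly. The samples below are synthetic and made up for illustration, not drawn from the Real-life Trial or Bag-of-Lies datasets, and the function names are hypothetical.

```python
def bias_only_predictor(predicted_sex):
    """Map a predicted sex to a deception label using dataset bias alone."""
    return "lie" if predicted_sex == "female" else "truth"

def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Synthetic samples: (sex predicted by some upstream classifier, true deception label).
samples = [
    ("female", "lie"), ("female", "lie"), ("female", "truth"),
    ("male", "truth"), ("male", "truth"), ("male", "lie"),
]

preds = [bias_only_predictor(sex) for sex, _ in samples]
labels = [label for _, label in samples]
acc = accuracy(preds, labels)  # 4 of the 6 toy samples are predicted correctly
```

Because the toy labels correlate with sex, the predictor scores above chance without ever looking at deception cues, which is the effect the abstract attributes to dataset bias.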
ISBN: 9783031395390 (digital); 9783031395383; 9783031395390 (print)
Surgical action triplet recognition provides a better understanding of the surgical scene. This task is of high relevance as it provides the surgeon with context-aware support and safety. The current go-to strategy for improving performance is the development of new network mechanisms. However, the performance of current state-of-the-art techniques is substantially lower than on other surgical tasks. Why is this happening? This is the question that we address in this work. We present the first study to understand the failure of existing deep learning models through the lens of robustness and explainability. Firstly, we study current existing models under weak and strong δ-perturbations via an adversarial optimisation scheme. We then analyse the failure modes via feature-based explanations. Our study reveals that the key to improving performance and increasing reliability is in the core and spurious attributes. Our work opens the door to more trustworthy and reliable deep learning models in surgical data science. https://***/robustIVT/.
ISBN: 9781450392365 (print)
The field of graph data mining, one of the most important AI research areas, has been revolutionized by graph neural networks (GNNs), which benefit from training on real-world graph data with millions to billions of nodes and links. Unfortunately, the training data and process of GNNs involving graphs beyond millions of nodes are extremely costly on a centralized server, if not impossible. Moreover, due to the increasing concerns about data privacy, emerging data from realistic applications are naturally fragmented, forming distributed private graphs of multiple "data silos", among which direct transfer of data is forbidden. The nascent field of federated learning (FL), which aims to enable individual clients to jointly train their models while keeping their local data decentralized and completely private, is a promising paradigm for large-scale distributed and private training of GNNs. FedGraph2022 aims to bring together researchers from different backgrounds with a common interest in how to extend current FL algorithms to operate with graph data models such as GNNs. FL is an extremely hot topic of large commercial interest and has been intensively explored for machine learning with visual and textual data. Exploration by graph mining researchers and industrial practitioners has only recently begun to catch up. There are many unexplored challenges and opportunities, which urge the establishment of an organized and open community to collaboratively advance the science behind it. The prospective participants of this workshop will include researchers and practitioners from both graph mining and federated learning communities, whose interests include, but are not limited to: graph analysis and mining, heterogeneous network modeling, complex data mining, large-scale machine learning, distributed systems, optimization, meta-learning, reinforcement learning, privacy, robustness, explainability, fairness, ethics, and trustworthiness.
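As a rough illustration of how clients can train jointly without sharing data, here is a minimal sketch of the aggregation step of generic federated averaging (FedAvg). It is not specific to graphs or to any method presented at the workshop; the function name `fed_avg`, the flat parameter lists, and the client sizes are assumptions for illustration.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Hypothetical round: two clients send locally trained parameters, never raw data.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [100, 300]  # number of local training examples per client (assumed)

global_weights = fed_avg(client_weights, client_sizes)
```

Only model parameters cross the network in this scheme; the server would send `global_weights` back to the clients for the next round of local training.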