ISBN (print): 9781450392495
Data series classification is an important and challenging problem in data science. Explaining classification decisions by finding the discriminant parts of the input that led the algorithm to a given decision is a real need in many applications. Convolutional neural networks perform well on the data series classification task; however, the explanations provided by this type of algorithm are poor in the specific case of multivariate data series. Addressing this important limitation is a significant challenge. In this paper, we propose a novel method that solves this problem by highlighting both the temporal and dimensional discriminant information. Our contribution is two-fold: we first describe a convolutional architecture that enables the comparison of dimensions; then, we propose a method that returns dCAM, a Dimension-wise Class Activation Map specifically designed for multivariate time series (and CNN-based models). Experiments with several synthetic and real datasets demonstrate that dCAM is not only more accurate than previous approaches, but also the only viable solution for discriminant feature discovery and classification explanation in multivariate time series.
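For readers unfamiliar with class activation maps, the following is a minimal sketch of the standard (univariate) CAM computation that dCAM builds on. It is not the dCAM method itself, whose details the abstract does not give. The function name, the toy feature maps, and the class weights are all hypothetical and chosen only for illustration.

```python
def class_activation_map(feature_maps, class_weights):
    """Standard CAM for a 1D series (illustrative, not dCAM).

    feature_maps: K lists of T activations, one per filter of the last
                  convolutional layer.
    class_weights: K weights linking each filter (after global average
                   pooling) to the class being explained.
    Returns T scores; higher values mark time steps that contribute
    more to the class score.
    """
    n_steps = len(feature_maps[0])
    return [
        sum(w * fm[t] for w, fm in zip(class_weights, feature_maps))
        for t in range(n_steps)
    ]


# Hypothetical toy input: 2 filters, 4 time steps.
feature_maps = [[0.0, 1.0, 2.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
class_weights = [0.5, -1.0]
cam = class_activation_map(feature_maps, class_weights)
```

In this toy case the highest score falls on the third time step, where the positively weighted filter is most active. Standard CAM of this kind yields one score per time step, which is why it says little about which dimension of a multivariate series matters.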
Large pre-trained language models help to achieve state of the art on a variety of natural language processing (NLP) tasks; nevertheless, they still suffer from forgetting when incrementally learning a sequence of ***...
Tabular data synthesis is a long-standing research topic in machine learning. Many different methods have been proposed over the past decades, ranging from statistical methods to deep generative methods. However, it h...
Pancreatic cancer poses a significant challenge in early detection and treatment due to its malignant nature within the digestive tract. Recent studies, such as those conducted by the Pancreatic Cancer Action Network,...
ISBN (digital): 9781665468800
ISBN (print): 9781665468800
Micro-mobility services have gained popularity in recent years, becoming a relevant part of the transportation network in a plethora of cities. This has given rise to a fruitful research area, ranging from the impact and relationships of these transportation modes with preexisting ones to the different ways of estimating the demand for such services in order to guarantee the quality of service. Within this domain, docked bike sharing systems constitute an interesting surrogate for understanding the mobility of the whole city, as origin-destination matrices can be obtained straightforwardly from the information available at the docking stations. This work elaborates on the characterization of such origin-destination matrices, providing an essential set of insights on how to estimate their behavior in the long term. To do so, the main non-mobility features that affect mobility are studied and used to train different machine learning algorithms to produce viable mobility patterns. The case study performed over real data captured by the bike sharing system of Bilbao (Spain) reveals that, by virtue of a properly selected set of features and the adoption of specialized modeling algorithms, reliable long-term estimations of such origin-destination matrices can be effectively achieved.
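As a small illustration of how an origin-destination matrix can be read off docking-station records, the sketch below counts trips per (origin, destination) pair. The station names, the trip list, and the function name are hypothetical; the abstract does not describe the actual data schema of the Bilbao system.

```python
from collections import Counter


def od_matrix(trips, stations):
    """Build an origin-destination count matrix.

    trips: iterable of (origin_station, destination_station) pairs,
           e.g. one pair per undocking/docking event.
    stations: ordered list of station identifiers.
    Returns a list of rows; entry [i][j] is the number of trips from
    stations[i] to stations[j].
    """
    counts = Counter(trips)
    return [[counts[(o, d)] for d in stations] for o in stations]


# Hypothetical stations and trips.
stations = ["A", "B", "C"]
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
matrix = od_matrix(trips, stations)
```

Here row A of `matrix` is `[0, 2, 1]`: no trips from A back to A, two to B, and one to C.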
Day by day, the role of data science and machine learning in cricket is increasing due to the large amount of data generated from a single player on a whole line. The field of data science is the intensive study of da...
Accurate network traffic classification is the key to achieving a controllable and manageable Internet. This paper analyzes and compares the application of machine learning algorithms to the classification of network traffic by source country. The experimental results show that logistic regression is the most efficient. SVM and neural networks take longer, probably because their algorithms are more complex. The time consumption of the neural network can be optimized by setting an appropriate batch size. The precision of the three machine learning models is about 92%. In terms of recall and F1 score, the three machine learning algorithms mainly fall between 89% and 90%. There is no significant difference between the algorithm models. SVM was superior in precision (92%), recall (89%) and F1 score (90%).
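The three metrics quoted above are related by the usual definitions, which can be sketched as follows. The country labels and predictions in the example are hypothetical and are not taken from the paper's data.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 for the label `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical source-country labels for five flows.
y_true = ["CN", "CN", "US", "US", "CN"]
y_pred = ["CN", "US", "US", "CN", "CN"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="CN")

# Consistency check on the reported SVM figures (precision 92%, recall 89%).
svm_f1 = f1_from(0.92, 0.89)
```

With a precision of 0.92 and a recall of 0.89, `f1_from` gives roughly 0.90, which agrees with the F1 score reported for SVM.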
ISBN (print): 9781665409155
Data augmentation reduces the generalization error by forcing a model to learn invariant representations given different transformations of the input image. In computer vision, on top of the standard image processing functions, data augmentation techniques based on regional dropout such as CutOut, MixUp, and CutMix and policy-based selection such as AutoAugment have demonstrated state-of-the-art (SOTA) results. With an increasing number of data augmentation algorithms being proposed, the focus is always on optimizing the input-output mapping, without realizing that there might be untapped value in the transformed images with the same label. We hypothesize that by forcing the representations of two transformations to agree, we can further reduce the model generalization error. We call our proposed method Agreement Maximization or simply AgMax. With this simple constraint applied during training, empirical results show that data augmentation algorithms can further improve the classification accuracy of ResNet50 on ImageNet by up to 1.5%, WideResNet40-2 on CIFAR10 by up to 0.7%, WideResNet40-2 on CIFAR100 by up to 1.6%, and LeNet5 on the Speech Commands dataset by up to 1.4%. Experimental results further show that, unlike other regularization terms such as label smoothing, AgMax can take advantage of the data augmentation to consistently improve model generalization by a significant margin. On downstream tasks such as object detection and segmentation on PascalVOC and COCO, AgMax pre-trained models outperform other data augmentation methods by as much as 1.0 mAP (box) and 0.5 mAP (mask). Code is available at https://***/roatienza/agmax.
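The idea of making two transformed views of the same image agree can be sketched as an extra penalty on the model outputs. The abstract does not state which agreement measure AgMax uses, so the sketch below substitutes a symmetric KL divergence between the two softmax outputs purely as an assumption. All function names and logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def agreement_penalty(logits_a, logits_b):
    """Symmetric KL between predictions for two augmented views.

    Assumed stand-in for an agreement term: it is zero when both views
    yield the same distribution and grows as they disagree.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical logits for two augmentations of the same image.
same = agreement_penalty([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = agreement_penalty([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In training, a term of this kind would typically be added to the usual classification loss with a weighting factor, so that the model is rewarded both for predicting the label and for giving consistent outputs across the two views.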
ISBN (digital): 9781665468800
ISBN (print): 9781665468800
In this paper, we systematically study how to use edge computing to monitor the movements of multiple connected and automated vehicles (CAV) and warn of potential accidents (e.g., lane departures, collisions). Compared to conventional approaches that only use the sensing data of individual vehicles, cooperative vehicle infrastructure systems directly collect the movement data of vehicles via vehicle-to-everything (V2X) communications and thus easily calculate the risk of every vehicle synthetically. We propose a fast algorithm and the corresponding data structure model to calculate collision risks based on the timely received data. We also discuss the data accuracy and transmission delay requirements to guarantee the driving safety of CAVs. Testing results show the effectiveness of the proposed approach.
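The abstract does not describe the proposed algorithm or data structure, so the following is only a generic sketch of one common way to score pairwise collision risk from received positions and velocities: the time and distance of closest approach under a constant-velocity assumption. The function names, the 2 m safety distance, and the 5 s horizon are hypothetical.

```python
import math


def closest_approach(p1, v1, p2, v2):
    """Time (s, clamped to >= 0) and distance (m) of closest approach
    for two vehicles moving at constant velocity in the plane."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]  # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0:
        t = 0.0  # same velocity: the gap never changes
    else:
        t = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def is_at_risk(p1, v1, p2, v2, safe_distance=2.0, horizon=5.0):
    """Flag a pair if they come within safe_distance inside the horizon."""
    t, d = closest_approach(p1, v1, p2, v2)
    return t <= horizon and d < safe_distance


# Hypothetical case: a vehicle at 10 m/s approaching a stopped one 50 m ahead.
t_hit, d_hit = closest_approach((0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (0.0, 0.0))
risky = is_at_risk((0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (0.0, 0.0))
# Two vehicles in different lanes at the same velocity keep their gap.
parallel = is_at_risk((0.0, 0.0), (10.0, 0.0), (50.0, 10.0), (10.0, 0.0))
```

A check of this kind depends directly on the accuracy and freshness of the received data, which is why position error and transmission delay bound how small the safety distance and horizon can be.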
ISBN (digital): 9798331527662
ISBN (print): 9798331527679
This paper presents a new hybrid model, termed Random Long Short Term Memory (RLST), which merges the Random Forest Algorithm's effective feature selection capability with the Long Short-Term Memory (LSTM) network's potent capacity for modeling time series data, thereby introducing a novel methodology for the analysis of electrocardiogram (ECG) data. The model first uses random forests to extract the most discriminative features from complex ECG signals, which are then fed into the LSTM network for in-depth sequence learning to capture subtle patterns in the ECG signal over time. This comprehensive approach not only improves the recognition accuracy of ECG abnormalities, but also enhances the model's joint understanding of different ECG characteristics and time series dynamics, which brings significant performance improvement for heart health assessment and early diagnosis of disease.