Sparse mobile crowdsensing is a new crowdsensing paradigm that leverages the spatial and temporal correlation between data sensed at different locations over time to reduce the overall sensing cost by significantly reducing the number of sensing tasks. Consequently, only sparsely selected spatio-temporal cells report sensed data, whereas data for the remaining cells must be inferred from the sensed data. This process, widely known as missing data inference, is the focus of this study. We examine the KNN (K-Nearest Neighbor) approach, which is known to be relatively fast and simple but is generally accepted to perform poorly when the sensed data is sparse. In the context of environmental crowdsensing, we examine whether it becomes a viable missing data inference approach if the algorithm incorporates the spatio-temporal correlation of data, instead of exploiting only the spatial or only the temporal correlation. Thus, we examine three variants of KNN on sparse data: KNN-ST (KNN-Spatio-Temporal), KNN-S (KNN-Spatial), and KNN-T (KNN-Temporal). We also find that voxelization is a natural way of exploiting the spatio-temporal properties of sensed data and thereby the spatio-temporal correlation between them. Interestingly, we find that KNN-ST indeed shows good performance (normalized absolute error of about 0.1) even when the loss probability is as high as 0.9. Additionally, we implement an existing method on the same experimental datasets and present corresponding comparative simulation results.
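The KNN-ST idea can be illustrated with a minimal sketch. It assumes that voxels are indexed by integer (x, y, t) coordinates, that distance is Euclidean over those indices, and that neighbours are combined by inverse-distance weighting; the abstract does not state the distance metric or weighting actually used. The names `knn_st_infer`, `space_scale`, and `time_scale` are hypothetical.

```python
import math

def knn_st_infer(observed, target, k=3, space_scale=1.0, time_scale=1.0):
    """Estimate the value of the voxel `target` = (x, y, t).

    observed: dict mapping (x, y, t) -> sensed value for reporting voxels.
    space_scale / time_scale: hypothetical knobs that weight the spatial
    and temporal parts of the distance (assumptions, not from the study).
    """
    if not observed:
        raise ValueError("at least one observed voxel is required")

    def dist(a, b):
        spatial = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
        temporal = (a[2] - b[2]) ** 2
        return math.sqrt(space_scale ** 2 * spatial + time_scale ** 2 * temporal)

    # Keep the k observed voxels closest to the target in space-time.
    neighbours = sorted(observed.items(), key=lambda kv: dist(kv[0], target))[:k]

    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (dist(cell, target) + 1e-9) for cell, _ in neighbours]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, neighbours))
    return weighted_sum / sum(weights)


# Two observed voxels at equal distance from the missing one.
observed = {(0, 0, 0): 10.0, (2, 0, 0): 20.0}
estimate = knn_st_infer(observed, (1, 0, 0), k=2)
```

In this sketch, setting `time_scale=0` ignores time and setting `space_scale=0` ignores space, which loosely mimics spatial-only and temporal-only variants; whether KNN-S and KNN-T are defined this way in the study is not stated.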
Effective network intrusion detection using anomaly scores from unsupervised machine learning models depends on the performance of the models. Although unsupervised models do not require labels during the training and testing phases, the assessment of their performance metrics during the evaluation phase still requires comparing anomaly scores against labels. In real-world scenarios, the absence of labels in massive network datasets makes it infeasible to calculate performance metrics. Therefore, it is valuable to develop an algorithm that calculates robust performance metrics without using labels. In this paper, we propose a novel algorithm, Expectation Maximization-Area Under the Curve (EM-AUC), to derive the Area Under the ROC Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PR) by treating the unavailable labels as missing data and replacing them through their posterior probabilities. This algorithm was applied to two network intrusion datasets, yielding robust results. To the best of our knowledge, this is the first time AUC-ROC and AUC-PR, derived without labels, have been used to evaluate network intrusion detection systems. The EM-AUC algorithm enables model training, testing, and performance evaluation to proceed without comprehensive labels, offering a cost-effective and scalable solution for selecting the most effective models for network intrusion detection.
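One way to read "treating the unavailable labels as missing data and replacing them through their posterior probabilities" is sketched below. This is not the EM-AUC algorithm itself, whose details the abstract does not give. The sketch assumes that anomaly scores follow a two-component one-dimensional Gaussian mixture, fits it with EM, and then computes a posterior-weighted AUC-ROC. The function names `fit_two_gaussians` and `soft_auc_roc` are hypothetical, and AUC-PR is omitted.

```python
import math

def fit_two_gaussians(scores, iters=100):
    """EM for a two-component 1-D Gaussian mixture (an assumed score model).

    Returns P(anomalous | score) for each score, where the "anomalous"
    component is the one initialised at the high-score end.
    """
    lo, hi = min(scores), max(scores)
    mu = [lo, hi]
    var = [((hi - lo) / 2) ** 2 or 1.0] * 2
    weight = [0.5, 0.5]
    post = [0.5] * len(scores)
    for _ in range(iters):
        # E-step: posterior probability that each score is anomalous.
        post = []
        for s in scores:
            dens = [
                weight[c] * math.exp(-(s - mu[c]) ** 2 / (2 * var[c]))
                / math.sqrt(2 * math.pi * var[c])
                for c in (0, 1)
            ]
            total = dens[0] + dens[1]
            post.append(dens[1] / total if total > 0 else 0.5)
        # M-step: re-estimate each component from its soft assignments.
        for c in (0, 1):
            resp = [p if c == 1 else 1 - p for p in post]
            n = sum(resp)
            if n < 1e-12:
                continue
            mu[c] = sum(r * s for r, s in zip(resp, scores)) / n
            var[c] = max(sum(r * (s - mu[c]) ** 2 for r, s in zip(resp, scores)) / n, 1e-6)
            weight[c] = n / len(scores)
    return post

def soft_auc_roc(scores, post):
    """AUC-ROC with each (anomalous, normal) pair weighted by posteriors.

    With hard 0/1 posteriors this reduces to the usual pairwise AUC.
    """
    num = 0.0
    for si, p_i in zip(scores, post):
        for sj, p_j in zip(scores, post):
            pair_weight = p_i * (1 - p_j)
            if si > sj:
                num += pair_weight
            elif si == sj:
                num += 0.5 * pair_weight
    den = sum(post) * sum(1 - p for p in post)
    return num / den


scores = [0.10, 0.12, 0.11, 0.09, 0.10, 0.90, 0.92, 0.88]
posteriors = fit_two_gaussians(scores)
auc = soft_auc_roc(scores, posteriors)
```

The pairwise loop is quadratic in the number of scores, so a real network dataset would need a sorted or sampled formulation instead.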
Mobile Crowd Sensing (MCS) involves allocation of sensing tasks associated with an area of interest to a crowd of participants over time. Consequently, the collective amount of time and energy spent on sensing can be quite large. Sparse Mobile Crowd Sensing (Sparse-MCS) aims at reducing this overhead by reducing the number of sensing tasks, which results in obtaining sensed values from only some portions of the area or time. For the portions that are not covered, the corresponding values can be inferred from the collected sensed values. Hence, missing data inference is an integral part of Sparse-MCS. This study is divided into two phases. First, we explore the viability of using machine learning, viz., regression, for missing data inference in Sparse-MCS. To this end, we explore several representative regression algorithms: Linear Regression, LASSO, Elastic Net, Ridge, Decision Tree (DT), Random Forest (RF), and KNN. Using two real datasets, we conclude that some algorithms, such as DT and RF, exhibit good performance (giving normalized mean absolute error much less than 0.1 most of the time), whereas the rest do not. Moreover, we compare these techniques with a state-of-the-art missing data inference method known as Compressive Sensing with the help of simulation results. Next, we propose a divide-and-conquer polynomial-time algorithm for task reduction which is based on the proposed inference approach. We also present the results of the analysis of the algorithm in terms of: (i) its time complexity, and (ii) lower and upper bounds on task reduction.
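Regression-based missing data inference can be sketched as follows: a small regression tree is fitted on the coordinates of the sensed cells and used to predict the unsensed ones. This is a simplified pure-Python stand-in for the Decision Tree regressor named above, not the study's implementation. The feature layout (x, y, t), the names `fit_tree`, `predict`, and `nmae`, and the error definition (mean absolute error divided by mean absolute true value) are all assumptions.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(X, y, max_depth=4):
    """Fit a regression tree; a leaf is a float, a node is (feature, thr, left, right)."""
    if max_depth == 0 or len(set(y)) == 1:
        return sum(y) / len(y)
    best = None
    for f in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr)
    if best is None:  # no feature can separate the rows
        return sum(y) / len(y)
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            fit_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

def nmae(estimates, truths):
    """Assumed definition: mean absolute error over mean absolute truth."""
    abs_err = sum(abs(e - t) for e, t in zip(estimates, truths))
    return abs_err / sum(abs(t) for t in truths)


# Sensed cells (x, y, t) with their values; one unsensed cell to infer.
X_sensed = [(0, 0, 0), (0, 1, 0), (5, 0, 0), (5, 1, 0)]
y_sensed = [10.0, 10.0, 30.0, 30.0]
tree = fit_tree(X_sensed, y_sensed)
inferred = predict(tree, (4, 1, 0))
```

A Random Forest would average many such trees fitted on resampled data; that extension is left out to keep the sketch short.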
ISBN: (print) 9798350344868; 9798350344851
Multi-view clustering aims to improve clustering performance by leveraging information from multiple views. Most existing works assume that all views are complete. However, samples in real-world scenarios cannot always be observed in all views, leading to the challenging problem of Incomplete Multi-View Clustering (IMVC). Although some attempts have been made recently, they still suffer from two limitations: (1) they usually adopt shallow models, which are unable to sufficiently explore the consistency and complementarity of multiple views; (2) they lack a suitable measurement to evaluate the quality of the recovered data during the learning process. To address these limitations, we introduce a novel method, Incomplete Multi-View Clustering via Inference and Evaluation (IMVC-IE). Specifically, IMVC-IE first adopts a contrastive learning strategy on the features of different views to extract the underlying information from existing samples. Subsequently, a large amount of candidate simulated data is inferred for the missing views, and a novel evaluation strategy is presented to select the proper data for completing the missing views. Extensive experiments verify the effectiveness of our method.
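The contrastive step can be illustrated with a generic InfoNCE-style loss between the features of two views, where the same sample in both views forms a positive pair and all other samples act as negatives. The abstract does not specify the loss used by IMVC-IE, so this is only a common formulation; `info_nce` and `temperature` are hypothetical names.

```python
import math

def _cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def info_nce(view_a, view_b, temperature=0.5):
    """Average InfoNCE loss over samples observed in both views.

    view_a[i] and view_b[i] are features of the same sample (positive pair);
    view_b[j] for j != i serve as negatives for view_a[i].
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [_cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denominator)
    return total / len(view_a)


features_a = [(1.0, 0.0), (0.0, 1.0)]
aligned_loss = info_nce(features_a, [(1.0, 0.0), (0.0, 1.0)])
shuffled_loss = info_nce(features_a, [(0.0, 1.0), (1.0, 0.0)])
```

The loss is lower when matching samples have similar features across views, which is the behaviour a contrastive objective rewards.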