One-Class anomaly detection aims to detect anomalies from normal samples using a model trained on normal data. With recent advancements in deep learning, researchers have designed efficient one-class anomaly detection methods. Existing works commonly use neural networks to map the data into a more informative representation and then apply an anomaly detection algorithm. In this paper, we propose a method, DASVDD, that jointly learns the parameters of an autoencoder while minimizing the volume of an enclosing hypersphere on its latent representation. We propose a novel anomaly score that combines the autoencoder's reconstruction error and the distance from the center of the enclosing hypersphere in the latent representation. Minimizing this anomaly score aids us in learning the underlying distribution of the normal class during training. Including the reconstruction error in the anomaly score ensures that DASVDD does not suffer from the hypersphere collapse issue since the DASVDD model does not converge to the trivial solution of mapping all inputs to a constant point in the latent representation. Experimental evaluations on several benchmark datasets show that the proposed method outperforms the commonly used state-of-the-art anomaly detection algorithms while maintaining robust performance across different anomaly classes.
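The combined anomaly score described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the squared-Euclidean form of both terms, the weighting factor `gamma`, and all function and variable names are assumptions that the abstract does not specify.

```python
import numpy as np

def anomaly_score(x, x_hat, z, center, gamma=1.0):
    """Per-sample score: reconstruction error plus weighted latent distance.

    x, x_hat : (n, d) inputs and their autoencoder reconstructions
    z        : (n, k) latent codes
    center   : (k,) hypersphere center in latent space
    gamma    : assumed trade-off weight (hypothetical, not from the abstract)
    """
    recon_error = np.sum((x - x_hat) ** 2, axis=1)     # squared L2 reconstruction error
    center_dist = np.sum((z - center) ** 2, axis=1)    # squared distance to the center
    return recon_error + gamma * center_dist

# Toy data: the first sample is reconstructed perfectly and sits on the center.
x = np.array([[1.0, 2.0], [0.0, 0.0]])
x_hat = np.array([[1.0, 2.0], [1.0, 1.0]])
z = np.array([[0.5], [2.0]])
center = np.array([0.5])
scores = anomaly_score(x, x_hat, z, center, gamma=0.5)
```

Because the reconstruction term is part of the score, mapping every input to one latent point would leave the reconstruction error large, which is the intuition behind the abstract's claim about avoiding hypersphere collapse.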
Reconstruction of multi-view 3-dimensional images is essential in robotics and computer vision to obtain an accurate 3-dimensional representation of objects by analyzing the 2-dimensional input data. For reconstructing the 3-dimensional image, it is mandatory to analyze the 3-dimensional geometry features from multiple viewpoints. It includes feature extraction and transformation from 2-dimensional features to 3-dimensional volumetric meshes. However, the existing research cannot produce consistent reconstruction results for the same input images with different orders. Therefore, the deep learning-based Residual Network-50 model is developed for 3-dimensional image reconstruction from multi-view images in the present work. The proposed system model comprises a 2-dimensional and 3-dimensional network and a backpropagation layer. From the input image, 2-dimensional features are computed using a 2-dimensional network. Then, the metaheuristic Adaptive School of Fish Optimization is used to improve the neural network's output, determining the optimal weight that gives less classification error. Then, the testing process uses the deep autoencoder, which decodes the output of the training model. Residual Network-50 is used to reconstruct 2-dimensional images to 3-dimensional using single or multi-views. Finally, the experimental analysis is performed in Python. The experiment is performed on the ShapeNet dataset and compared with the existing works. The proposed model yields better accuracy, F-score and Intersection-over-Union values of 99.3%, 0.893 and 0.734, respectively, which is more efficient than other existing models.
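For readers unfamiliar with the Intersection-over-Union metric reported above, the following sketch shows how it is commonly computed on voxel occupancy grids. The 0.5 occupancy threshold, the grid size, and the function name are illustrative assumptions; the abstract does not state how the authors binarize their predictions.

```python
import numpy as np

def voxel_iou(pred, target, threshold=0.5):
    """IoU between a predicted occupancy grid and a binary ground-truth grid."""
    pred_occ = pred >= threshold            # binarize predicted occupancy (assumed threshold)
    target_occ = target.astype(bool)
    intersection = np.logical_and(pred_occ, target_occ).sum()
    union = np.logical_or(pred_occ, target_occ).sum()
    # Two empty grids are treated as a perfect match.
    return float(intersection / union) if union > 0 else 1.0

# Toy 2x2x2 grids: one voxel overlaps, three voxels are in the union.
pred = np.zeros((2, 2, 2))
pred[0, 0, 0] = 0.9
pred[0, 0, 1] = 0.8
pred[1, 1, 1] = 0.2
target = np.zeros((2, 2, 2))
target[0, 0, 0] = 1
target[1, 1, 1] = 1
iou = voxel_iou(pred, target)
```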
This paper presents a deep auto-encoder model and a phased framework approach to predict the next 12 h of vessel trajectories using 1 to 3 h of Automatic Identification System (AIS) data as input. The strategy involves fusing spatiotemporal features from AIS messages with probabilistic features engineered from historical AIS data to reduce forecasting uncertainty. The probabilistic features have an F1-Score of approximately 85% and 75% for the vessel route and destination prediction, respectively. Under such circumstances, we achieved an R2 Score of over 98% with different layer structures and varying feature combinations; the high R2 Score is a natural outcome of the well-defined shipping lanes in the study region. However, our proposal stands out among competing approaches as it demonstrates the capability of complex decision-making during turnings and route selection. Furthermore, we have shown that our model achieves more accurate forecasting with average and median errors of 11 km and 6 km, respectively, a 25% improvement over the current state-of-the-art approaches. The resulting model from this proposal is deployed as part of a broader Decision Support System to safeguard whales by preventing the risk of vessel-whale collisions under the smartWhales initiative and acting on the Gulf of St. Lawrence in Atlantic Canada.
The past decade has witnessed a proliferation of sparsity-inspired face hallucination methods that use sparse priors and instances to generate High-Resolution (HR) faces. However, they need numerous Low-Resolution (LR) and HR instance pairs and adopt approximate sparse coding, which biases the recovery and incurs a high computational burden. In this paper we advance a Single Face Image Hallucination (SFIH) method from a new perspective of Non-linear Learning Compressive Sensing (NLCS), which can recover HR faces from a surprisingly small number of HR faces. The nonlinear sparse coding of facial images is explored, and a deep autoencoder (DAE) network is constructed for learning a kernel function from a single HR instance set. SFIH is then reduced to an analytic compressive recovery problem by reformulating linear sparse coding as a nonlinear DAE model. By exploring the nonlinear sparsity in the feature space, NLCS can accurately and rapidly recover HR facial images with large magnification factors and exhibits robustness to the LR-HR instance pair mapping. Experiments on 3X, 6X, and 9X magnification of face images demonstrate its efficiency and superiority over its counterparts.
Effective student group formation is crucial in higher education to foster collaborative learning environments. Grouping students by academic disciplines enhances peer-to-peer interactions and facilitates in-depth discussions on specialized topics. However, due to classroom space and resource constraints, it is challenging to accommodate all students from similar disciplines in one class. This necessitates a grouping method that can ensure a balanced distribution of students across available groups. Traditional K-means clustering, commonly used for this purpose, often results in inconsistent group sizes and fails to guarantee a balanced distribution of group members. Hard balanced clustering, which strictly enforces precise size limits on each cluster, offers a promising alternative for organizing balanced student sections to optimize classroom utilization. Nonetheless, most hard balanced clustering methods are limited in feature learning capability, which can lead to the overlooking of significant data patterns and result in ineffective clustering. To address this limitation, this paper introduces a new unsupervised model, deep Hard Balanced Clustering (DHBC), which integrates hard balanced clustering with a deep learning framework to enhance feature learning. DHBC incorporates a balanced clustering mechanism within the optimization process of an autoencoder architecture. It enhances the generated latent space representation by introducing a joint loss function that combines reconstruction and balanced clustering objectives, ensuring the embedded representation supports a balanced distribution of students. The model optimizes balanced clustering centroids during training. Comparative experiments conducted on real-world student enrollment datasets, evaluated by WCSS scores, demonstrate DHBC's superiority in creating more cohesive and balanced student groups compared to state-of-the-art methods.
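The two evaluation notions mentioned above, WCSS and balanced group sizes, can be illustrated with a short sketch. This is not the DHBC model itself: it only computes the within-cluster sum of squares for a given assignment and checks a balance condition. Treating "hard balanced" as "cluster sizes differ by at most one" is an assumption made for illustration, as are the function names.

```python
import numpy as np

def wcss(points, labels, k):
    """Within-cluster sum of squares: squared distances to each cluster's mean."""
    total = 0.0
    for c in range(k):
        members = points[labels == c]
        if len(members) == 0:
            continue                       # an empty cluster contributes nothing
        centroid = members.mean(axis=0)
        total += float(np.sum((members - centroid) ** 2))
    return total

def is_hard_balanced(labels, k):
    """Assumed balance criterion: cluster sizes differ by at most one."""
    sizes = np.bincount(labels, minlength=k)
    return int(sizes.max() - sizes.min()) <= 1

# Four students described by two features, split into two groups.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
balanced_labels = np.array([0, 0, 1, 1])
skewed_labels = np.array([0, 0, 0, 1])

balanced_wcss = wcss(points, balanced_labels, k=2)
```

A lower WCSS indicates more cohesive groups, so a balanced method is judged by how low it can keep WCSS while still satisfying the size constraint.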
By providing accurate and speedy diagnosis and, in turn, treatment, automated medical image analysis plays a significant role in improving survival rates. The various inherent uncertainties and complexities of medical images make machine-learning-based classification approaches, particularly dictionary-learning-based ones, very promising. This work, which continues the evolution of our machine-learning-based medical image classifiers, concerns class-specific fuzzy discriminative dictionary learning using deep features. In (DFC)-F-3, a deep autoencoder generates a more relevant, representative, and compact feature set. The distinctive hidden information and the inherent complexity and uncertainty of medical images are addressed using fuzzy-discriminative terms in the optimization function, simultaneously improving the inter-class-representation distance and intra-class-representation similarity. A comprehensive set of experiments on cancer tumor images from three different databases shows that (DFC)-F-3 outperforms related state-of-the-art competitors in accuracy, sensitivity, specificity, precision, convergence speed, and noise resilience. The significance of the experimental results is statistically verified.
To alleviate the sparsity issue, many recommender systems have been proposed to consider the review text as the auxiliary information to improve the recommendation quality. Despite success, they only use the ratings as the ground truth for error backpropagation. However, the rating information can only indicate the users' overall preference for the items, while the review text contains rich information about the users' preferences and the attributes of the items. In real life, reviews with the same rating may have completely opposite semantic information. If only the ratings are used for error backpropagation, the latent factors of these reviews will tend to be consistent, resulting in the loss of a large amount of review information. In this article, we propose a novel deep model termed deep rating and review neural network (DRRNN) for recommendation. Specifically, compared with the existing models that adopt the review text as the auxiliary information, DRRNN additionally considers both the target rating and target review of the given user-item pair as ground truth for error backpropagation in the training stage. Therefore, we can keep more semantic information of the reviews while making rating predictions. Extensive experiments on four publicly available datasets demonstrate the effectiveness of the proposed DRRNN model in terms of rating prediction.
Recent advancements in bio-photonics have enabled physicians to combine techniques such as narrow-band imaging, fluorescence spectroscopy, and optical coherence tomography with visible-spectrum endoscopy video to provide in vivo microscopic tissue characterization in online optical biopsy (Ye et al. 2015; Wang and Van Dam 2004). Despite these advantages, it is challenging for gastroenterologists to retarget the optical biopsy sites during endoscopic examinations because of the degraded quality of endoscopic video, which is corrupted by haze, noise, oversaturated illumination, etc. Enhancing video frames by considering color channels independently produces unintended phantom colors because it ignores the psycho-visual correspondence. To address these issues, we propose a novel algorithm to enhance video with faster performance. The proposed C²D²A (Cross Color Dominant Deep Autoencoder) uses the strength of (a) bilateral filtering in both the spatial neighborhood domain and the psycho-visual range; and (b) a deep autoencoder, which learns salient patterns. The domain-based color sparseness further improves the performance, turning the classical deep autoencoder into a color dominant deep autoencoder. The work has shown promise towards not only a generic framework of quality enhancement of video streams but also addressing performance. The current work in turn improves image and video analytics such as segmentation, detection, and tracking of objects or regions of interest.
The brain-like functionality of artificial neural networks, together with their strong performance in various areas of scientific application, makes them a reliable tool for Audio-Visual Speech Recognition (AVSR) systems. The applications of such networks in AVSR systems extend from the preliminary stage of feature extraction to the higher levels of information combination and speech modeling. In this paper, some carefully designed deep autoencoders are proposed to produce efficient bimodal features from the audio and visual stream inputs. The basic proposed structure is modified in three successive steps to make better use of the visual information from the speakers' lips Region of Interest (ROI). The performance of the proposed structures is compared to both the unimodal and bimodal baselines in a professional phoneme recognition task, under different noisy audio conditions. This is done by employing a state-of-the-art DNN-HMM hybrid as the speech classifier. In comparison to the MFCC audio-only features, the finally proposed bimodal features yield an average relative reduction in Phoneme Error Rate (PER) of 36.9% across a range of noisy conditions and a relative reduction of 19.2% for the clean condition. (C) 2018 Elsevier Inc. All rights reserved.
Speech recognition suffers from low recognition accuracy because of poor denoising and inaccurate endpoint detection. Therefore, a new intelligent speech multifeature recognition method based on deep machine learning is proposed. In this method, speech signals are digitally processed, a first-order finite impulse response (FIR) high-pass digital filter is used to pre-emphasize the digital speech signals, and short-term energy and zero-crossing rate are combined for endpoint detection of the speech signal. The detected speech signal is input into the deep autoencoder, and the features of the speech signal are extracted through deep learning. The Gaussian mixture model of deep machine learning is constructed using a continuous distribution hidden Markov model, and the extracted features are input into the model to complete feature recognition. The experimental results show that the proposed method has high endpoint detection accuracy, good denoising effect, and high recognition accuracy, giving it high practical value.
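The front-end steps named above (first-order FIR pre-emphasis, short-term energy, and zero-crossing rate) are standard signal-processing operations and can be sketched as follows. The filter coefficient `alpha`, the frame length and hop, the thresholds, and the simple OR rule in `detect_speech` are all illustrative assumptions; the abstract does not give the authors' settings or their exact decision rule.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    """First-order FIR high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into frames (trailing samples that do not fill a frame are dropped)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_term_energy(frames):
    """Sum of squared samples per frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1                  # treat exact zeros as positive
    crossings = np.sum(signs[:, 1:] != signs[:, :-1], axis=1)
    return crossings / (frames.shape[1] - 1)

def detect_speech(frames, energy_thr, zcr_thr):
    """Assumed rule: a frame is speech if either feature exceeds its threshold."""
    return (short_term_energy(frames) > energy_thr) | (zero_crossing_rate(frames) > zcr_thr)

# Toy signal: an oscillating first half followed by silence.
signal = np.array([1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
frames = frame_signal(signal, frame_len=4, hop=4)
energy = short_term_energy(frames)
zcr = zero_crossing_rate(frames)
speech_mask = detect_speech(frames, energy_thr=1.0, zcr_thr=0.5)
emphasized = pre_emphasize(np.array([1.0, 2.0, 3.0]), alpha=0.5)
```

Energy tends to pick up voiced sounds while the zero-crossing rate helps with low-energy unvoiced consonants, which is why the two features are usually combined for endpoint detection.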