ISBN: (print) 9783030638191; 9783030638207
Many object detection methods have been proposed recently. However, they cannot adaptively detect objects using semantic features. Building on channel and spatial attention mechanisms, we analyze how different methods detect objects adaptively. Some state-of-the-art detectors combine different feature pyramids with many mechanisms, but at a higher cost. This work addresses that cost with an anchor-free detector built on a shared encoder-decoder with an attention mechanism that extracts shared features. We take features from different levels of the backbone (e.g., ResNet-50) as the basis features, feed them into a simple module, and follow it with a detector head to detect objects. Meanwhile, we use the semantic features to revise geometric locations, so the detector performs a pixel-semantic revision of position. More importantly, this work analyzes the impact of different pooling strategies (e.g., mean, maximum, or minimum) on multi-scale objects and finds that minimum pooling improves detection performance on small objects more than the alternatives. Compared with the state-of-the-art MNC based on ResNet-101 on the standard MSCOCO 2014 baseline, our method improves detection AP by 3.8%.
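The three pooling strategies compared in this abstract differ only in the reduction applied to each window. The following is a minimal sketch of that difference, not the authors' implementation: the helper `pool2d`, the 2x2 non-overlapping windows, and the toy feature map `fmap` are all assumptions made for illustration.

```python
from statistics import mean


def pool2d(feature_map, size, reduce_fn):
    """Apply non-overlapping size x size pooling to a 2D list with reduce_fn."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every value inside the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 single-channel feature map.
fmap = [
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 4.0, 2.0, 2.0],
    [5.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 3.0, 3.0],
]

max_pooled = pool2d(fmap, 2, max)    # keeps the strongest response per window
min_pooled = pool2d(fmap, 2, min)    # keeps the weakest response per window
mean_pooled = pool2d(fmap, 2, mean)  # averages the responses per window
```

Swapping `reduce_fn` is the only change between the three variants, which is what makes a controlled comparison across object scales straightforward.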
In recent years, data-driven approaches to remaining useful life (RUL) prognostics have attracted widespread attention. Bearings are a fundamental component of machinery, and their condition is closely tied to the normal operation of equipment. Hence, it is crucial to accurately predict the remaining useful life of bearings. This paper explores the degradation process of bearings and proposes an enhanced encoder-decoder framework. The framework constructs a decoder that can look back and selectively mine underlying information in the encoder. Additionally, trigonometric functions and a cumulative operation are employed to enhance the quality of health indicators. To verify the effectiveness of the proposed method, vibration data from the PRONOSTIA platform are used for RUL prognostics. Compared with several state-of-the-art methods, the experimental results demonstrate the superiority and feasibility of the proposed method.
ISBN: (print) 9781713820697
Recent improvements in Automatic Speech Recognition (ASR) systems have enabled the growth of myriad applications such as voice assistants, intent detection, keyword extraction, and sentiment analysis. These applications, which are now widely used in industry, are very sensitive to the errors generated by ASR systems. This sensitivity could be mitigated by associating a reliable confidence measure with the predicted output. This work presents a novel method that uses internal neural features of a frozen ASR model to train an independent neural network to predict a softmax temperature value. This value is computed at each decoder time step and multiplied by the logits in order to redistribute the output probabilities. The resulting softmax values corresponding to the predicted tokens constitute a more reliable confidence measure. Moreover, this work also studies the effect of teacher forcing on the training of the proposed temperature prediction module. The output confidence estimation shows an improvement of -25.78% in EER and +7.59% in AUC-ROC with respect to the unaltered softmax values of the predicted tokens, evaluated on a proprietary dataset consisting of News and Entertainment videos.
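The rescaling step described in this abstract, multiplying the logits by a per-step value before the softmax, can be sketched as follows. This is only an illustration under stated assumptions: in the paper the value is predicted by a trained network, whereas here `scale` is a hand-picked constant, and `token_confidence` and `step_logits` are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidence(logits, scale):
    """Probability of the top-scoring token after multiplying the logits by scale."""
    probs = softmax([scale * x for x in logits])
    return max(probs)


# Hypothetical logits for one decoder time step.
step_logits = [2.0, 1.0, 0.1]

raw_conf = token_confidence(step_logits, 1.0)        # unaltered softmax
softened_conf = token_confidence(step_logits, 0.5)   # scale < 1 flattens the distribution
sharpened_conf = token_confidence(step_logits, 2.0)  # scale > 1 sharpens the distribution
```

For any positive scale the predicted token stays the same; only the confidence attached to it changes, which is why the rescaling can be trained separately from the frozen recognizer.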
ISBN: (print) 9781713820697
As a popular modeling approach for end-to-end speech recognition, attention-based encoder-decoder models are known to suffer from length bias and the corresponding beam problem. Different approaches have been applied in simple beam search to ease the problem, most of which are heuristic-based and require considerable tuning. We show that heuristics are not a proper modeling refinement, which results in severe performance degradation when the beam size is greatly increased. We propose a novel beam search derived from reinterpreting the sequence posterior with explicit length modeling. By applying the reinterpreted probability together with beam pruning, the obtained final probability leads to a robust model modification that allows reliable comparison among output sequences of different lengths. Experimental verification on the LibriSpeech corpus shows that the proposed approach solves the length bias problem without heuristics or additional tuning effort. It provides robust decision making and consistently good performance under both small and very large beam sizes. Compared with the best results of the heuristic baseline, the proposed approach achieves the same WER on the 'clean' sets and a 4% relative improvement on the 'other' sets. We also show that it is more efficient with the additionally derived early stopping criterion.
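The length bias mentioned in this abstract arises because a sequence score is a sum of per-token log-probabilities, each of which is negative, so shorter hypotheses tend to score higher. The sketch below illustrates the bias and one common heuristic fix (length normalization). It does not implement the explicit length modeling the paper proposes, whose details are not given here; the hypotheses and their token probabilities are invented for illustration.

```python
import math


def sequence_score(token_probs):
    """Sum of per-token log-probabilities (the usual beam search score)."""
    return sum(math.log(p) for p in token_probs)


def length_normalized_score(token_probs):
    """Heuristic fix: average log-probability per token."""
    return sequence_score(token_probs) / len(token_probs)


# Hypothetical hypotheses: a short one with mediocre tokens,
# and a longer one whose tokens are individually more probable.
short_hyp = [0.6, 0.6]
long_hyp = [0.8, 0.8, 0.8, 0.8, 0.8]

# The raw score prefers the short hypothesis despite its weaker tokens.
raw_prefers_short = sequence_score(short_hyp) > sequence_score(long_hyp)

# After length normalization the longer hypothesis wins.
normalized_prefers_long = (
    length_normalized_score(long_hyp) > length_normalized_score(short_hyp)
)
```

Heuristics of this kind usually come with tunable parameters, which is the tuning burden the abstract argues against.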
ISBN: (print) 9781728163741
This paper presents a density enhancement method for airborne LiDAR point clouds that uses the corresponding image and is based on a fused encoder-decoder network. Unlike terrestrial indoor or outdoor scenes, large-scale airborne data poses a challenge because of the variance of objects and depth ranges. To address the problem of objects at different scales, we propose an RGB and depth fused encoder-decoder structure inspired by UNet. In addition, we propose a heuristic method for refining the result if instance segmentation labels are available. Both quantitative and qualitative evaluations, performed on a dataset covering a 24 km² area of Osaka, Japan, validate the feasibility of the proposed method for point cloud densification in large-scale environments.
ISBN: (print) 9780738111803
Social networks have become widely used for understanding patients' shared experiences and for reaching a vast audience in a matter of seconds. In particular, many health-related organizations use sentiment analysis to automatically report treatment issues, drug misuse, and new infectious disease symptoms. Few approaches have been proposed for this task, especially for detecting different drug reaction descriptions in patient-generated narratives on social networks. Most of them detect only adverse drug reactions (ADRs) and may fail to retrieve other aspects, e.g., beneficial drug reactions or drug retroviral effects such as "relieve intraocular pressure associated with glaucoma". In this study, we propose an encoder-decoder for drug reaction discrimination that uses an enhanced distributed biomedical representation built from controlled medical vocabulary sources such as PubMed and MIMIC III clinical notes. The embedding mechanism primarily leverages contextual information and learns from predefined clinical relationships between medical conditions in order to define the possible drug reactions of individual words and multi-word expressions. Within a distributional semantics configuration, it clarifies the similarity of sentences in the same contextual target space that share semantically common drug description meanings. Furthermore, a bidirectional sentiment inductive model is created to enhance the vectorization of drug reactions from real-world patient descriptions, which achieves higher performance in disambiguating false positive and/or false negative assessments. As a result, we achieve 85.2% accuracy, and the architecture encodes real-world drug entity descriptions well.
Authors: Li, Yuemeng; Fan, Yong (Univ Penn, Ctr Biomed Image Comp & Analyt, Perelman Sch Med, Dept Radiol, Philadelphia, PA 19104, USA)
ISBN: (print) 9781538693308
Pulmonary nodule detection plays an important role in lung cancer screening with low-dose computed tomography (CT) scans. It remains challenging to build nodule detection deep learning models that generalize well, because positive and negative samples are unbalanced. To overcome this problem and further improve state-of-the-art nodule detection methods, we develop a novel deep 3D convolutional neural network with an encoder-decoder structure in conjunction with a region proposal network. In particular, we use a dynamically scaled cross entropy loss to reduce the false positive rate and combat the sample imbalance problem associated with nodule detection. We adopt the squeeze-and-excitation structure to learn effective image features and exploit inter-dependency information among different feature maps. We have validated our method on publicly available CT scans with manually labelled ground truth from the LIDC/IDRI dataset and its subset LUNA16, which has thinner slices. Ablation studies and experimental results demonstrate that our method can outperform state-of-the-art nodule detection methods by a large margin.
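A well-known example of a dynamically scaled cross entropy loss is the focal loss, which multiplies the cross entropy by a factor that shrinks for confidently classified samples. Whether the authors use this exact form is not stated in the abstract, so the sketch below is an assumption: the binary formulation, the function names, and the exponent `gamma=2.0` are chosen for illustration only.

```python
import math


def cross_entropy(p, label):
    """Binary cross entropy for predicted positive-class probability p."""
    p_t = p if label == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def scaled_cross_entropy(p, label, gamma=2.0):
    """Cross entropy scaled by (1 - p_t)^gamma, as in the focal loss."""
    p_t = p if label == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Easy negative: the model already gives the true class probability 0.9.
easy_ce = cross_entropy(0.1, 0)
easy_scaled = scaled_cross_entropy(0.1, 0)

# Hard positive: the model gives the true class only probability 0.1.
hard_ce = cross_entropy(0.1, 1)
hard_scaled = scaled_cross_entropy(0.1, 1)
```

With these values the easy negative keeps about 1% of its original loss while the hard positive keeps about 81%, so the many easy negatives in an imbalanced dataset contribute far less to the gradient.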
ISBN: (print) 9781728119908
Contemporary diffusion MRI analysis with HARDI, which provides more accurate fiber orientation, can be performed using single or multiple b-values (single-shell or multi-shell). Single-shell HARDI cannot provide volume fractions for different tissue types, which can produce biased and noisier estimates of the fiber ODF. Multi-shell acquisition can resolve this issue. However, it requires more scanning time and is therefore not well suited to clinical settings. Considering this, we propose a novel deep learning architecture, MSR-Net, for reconstructing diffusion MRI volumes at one b-value from acquisitions at another b-value. In this work, we demonstrate this for b = 2000 s/mm² and b = 1000 s/mm². We learn such a transformation in the space of spherical harmonic coefficients. The proposed network consists of an encoder-decoder along with an attention module and a feature module. We use L2 and content losses for optimization and to improve performance. We have trained and validated the network on the HCP dataset with standard qualitative and quantitative performance measures.
ISBN: (print) 9781728163741
Hyperspectral image (HSI) classification is an important task in the remote sensing community. Many hyperspectral classification methods are based on pixel patches, which leads to information redundancy. In this paper, we propose a multi-scale encoder-decoder network for HSI classification. First, we adopt an encoder-decoder framework as the backbone network and use a skip connection between the encoder and decoder; this network captures the spatial information. Second, we develop a multi-scale block to capture multi-scale information. Third, we retain complete spectral information by keeping the number of spectral channels constant. Finally, an optimization strategy is designed to fit our model to the HSI classification task. We evaluate our method and other methods on two public datasets, and the results indicate that our model is useful for the HSI classification task.