Intent detection (ID) and slot filling (SF) are important components of spoken language understanding (SLU) in a dialogue system. The most widely used method is the pipeline manner, which first detects the user's intent and then labels the slots. To address error propagation, some researchers combine the two tasks in a joint ID and SF model. However, joint models usually perform well on only one of the tasks, depending on the value of the trade-off parameter. We therefore propose an encoder-decoder model with a new tag scheme that unifies the two tasks into a single sequence labeling task. In our model, the slot filling process receives intent information, and performance on words with multiple tags is improved. Moreover, we introduce a length-variable attention mechanism that selectively looks at a subset of the source sentence in the sequence labeling model. Experimental results on two datasets show that the proposed model with length-variable attention outperforms other joint models. Besides, our method automatically finds the balance between the two tasks and achieves better overall performance. (C) 2019 Elsevier B.V. All rights reserved.
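The abstract does not give the exact formulation of the length-variable attention, but its core idea — restricting attention at each labeling step to a subset of encoder states around the current position — can be sketched as follows. The window size, dot-product scoring, and all tensor names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def length_variable_attention(dec_state, enc_states, center, window=3):
    """Attend only to encoder states within a window around `center`.

    dec_state:  (hidden,)         current decoder hidden state
    enc_states: (seq_len, hidden) all encoder hidden states
    The window size and dot-product scoring are illustrative choices.
    """
    lo = max(0, center - window)
    hi = min(enc_states.size(0), center + window + 1)
    subset = enc_states[lo:hi]           # (w, hidden) selected source states
    scores = subset @ dec_state          # (w,) alignment scores
    weights = F.softmax(scores, dim=0)   # attention weights over the window
    return weights @ subset              # (hidden,) context vector

enc = torch.randn(10, 64)               # toy encoder outputs
dec = torch.randn(64)
print(length_variable_attention(dec, enc, center=4).shape)  # torch.Size([64])
```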
Sea surface temperature (SST) prediction plays an important role in ocean-related fields. It is challenging due to the nonlinear temporal dynamics with changing complex factors and the inherent difficulties of long-scale prediction. Conventional models often lack efficient information extraction and cannot meet the requirements of long-scale prediction. Therefore, this letter proposes a gated recurrent unit (GRU) encoder-decoder (GED) with SST codes and a dynamic influence link (DIL), which considers both static and dynamic influence. Each SST code, which captures the static information more effectively, is computed from all hidden states of the encoder and is individually associated with each predicted SST. The DIL, which captures the dynamic influence, connects the SST code with the early predicted future SSTs to address the long-scale dependence problem. GED was tested on the Bohai Sea and South China Sea SST data sets and compared with fully connected long short-term memory (FC-LSTM) and support vector regression. The results demonstrate that GED outperforms the others on different prediction scales and different prediction terms (daily, weekly, and monthly), especially for long-scale and long-term predictions. In addition, the attention relationships between historical and future SSTs were further explored, with a meaningful finding: each future daily mean SST of the Bohai Sea correlates most strongly with the historical values from 27 to 29 days earlier.
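A minimal sketch of the GED idea: a GRU encoder-decoder where every predicted step gets its own context vector (the "SST code") computed from all encoder hidden states, and the code is folded back into the decoder state before the next prediction, loosely reflecting the DIL. Layer sizes, the additive scoring, and the `h + code` coupling are assumptions, not the letter's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEDSketch(nn.Module):
    def __init__(self, hidden=32, horizon=7):
        super().__init__()
        self.encoder = nn.GRU(1, hidden, batch_first=True)
        self.decoder = nn.GRUCell(1, hidden)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each encoder state
        self.out = nn.Linear(hidden, 1)
        self.horizon = horizon

    def forward(self, x):                       # x: (batch, seq, 1) SST history
        enc_out, h = self.encoder(x)            # enc_out: (batch, seq, hidden)
        h = h.squeeze(0)
        y = x[:, -1, :]                         # seed with last observed SST
        preds = []
        for _ in range(self.horizon):
            # per-step "SST code": weighted sum over all encoder states
            q = h.unsqueeze(1).expand_as(enc_out)
            w = F.softmax(self.attn(torch.cat([enc_out, q], -1)), dim=1)
            code = (w * enc_out).sum(1)
            h = self.decoder(y, h + code)       # fold code into decoder state
            y = self.out(h)                     # next predicted SST
            preds.append(y)
        return torch.cat(preds, dim=1)          # (batch, horizon)

model = GEDSketch()
print(model(torch.randn(4, 30, 1)).shape)       # torch.Size([4, 7])
```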
Purpose: Acute ischemic stroke is one of the primary causes of death worldwide. Recent studies have shown that assessing collateral status can help improve treatment for patients with acute ischemic stroke. We present a 3D deep regression neural network that automatically generates collateral images from dynamic susceptibility contrast-enhanced magnetic resonance perfusion (DSC-MRP) in acute ischemic stroke. Methods: This retrospective study includes 144 subjects with acute ischemic stroke (stroke cases) and 201 subjects without acute ischemic stroke (controls). DSC-MRP images of these subjects were manually inspected for collateral assessment in the arterial, capillary, early venous, late venous, and delay phases. The proposed network was trained on 205 subjects, and the optimal model was chosen using a validation set of 64 subjects. The predictive power of the network was assessed on a test set of 76 subjects using the squared correlation coefficient (R-squared), mean absolute error (MAE), Tanimoto measure (TM), and structural similarity index (SSIM). Results: The proposed network predicted the five phase maps with high accuracy. On average, it achieved 0.897 R-squared, 0.581 × 10⁻¹ MAE, 0.946 TM, and 0.846 SSIM across the five phase maps. In general, no statistically significant difference was found between controls and stroke cases. The performance of the proposed network was lower in the arterial and venous phases than in the other three phases. Conclusion: The results suggest that the proposed network performs equally well for both the control and acute ischemic stroke groups. It could help automate the assessment of collateral status efficiently and effectively and improve the quality and yield of diagnosis of acute ischemic stroke. A follow-up study will entail clinical evaluation of the collateral images generated by the proposed network.
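For reference, the evaluation metrics named above can be computed as sketched below. The R-squared here is the squared Pearson correlation, as the abstract states; the continuous Tanimoto variant shown is a common choice, though whether the paper uses exactly this form is an assumption.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Squared Pearson correlation between flattened volumes."""
    return float(np.corrcoef(y_true.ravel(), y_pred.ravel())[0, 1] ** 2)

def tanimoto(a, b):
    """Continuous Tanimoto measure: <a,b> / (|a|^2 + |b|^2 - <a,b>),
    defined for non-negative intensity maps."""
    a, b = a.ravel(), b.ravel()
    dot = float(a @ b)
    return dot / (a @ a + b @ b - dot)

rng = np.random.default_rng(0)
truth = rng.random((16, 16, 16))                       # toy phase map
pred = np.clip(truth + 0.05 * rng.standard_normal(truth.shape), 0, None)
print(r_squared(truth, pred))
print(tanimoto(truth, pred))
print(np.abs(truth - pred).mean())                     # MAE
```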
The efficient and accurate extraction of building feature information from remote-sensing images has become one of the most important elements of satellite remote-sensing image research. This paper proposes a convolutional neural network with a symmetric encoder-decoder structure. Alternating convolutional blocks and max-pooling downsampling at the encoder end perform the relevant operations. The convolutional blocks are built from linear residual blocks, and zero padding is applied to the 3 × 3 convolutional layers to keep the feature-map dimensions consistent. The traditional ReLU activation function is replaced with a SELU activation function to retain more feature information during training and to avoid the dead-neuron problem. Finally, a 1 × 1 convolutional layer and a sigmoid function complete the building extraction. The experimental results show that the model is more effective in densely populated urban areas than in Alpine towns, but overcrowding of buildings also makes accurate edge segmentation difficult.
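The building blocks described above — a linear residual block with zero-padded 3 × 3 convolutions and SELU activations, capped by a 1 × 1 convolution and sigmoid — can be sketched as follows. Channel counts and block depth are assumptions; the paper's encoder-decoder wiring is omitted.

```python
import torch
import torch.nn as nn

class SELUResidualBlock(nn.Module):
    """Linear residual block: two zero-padded 3x3 convolutions with SELU
    activations and an identity shortcut. padding=1 keeps H x W fixed."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.act = nn.SELU()

    def forward(self, x):
        return self.act(x + self.body(x))

# Final 1x1 convolution + sigmoid producing the building probability map
head = nn.Sequential(nn.Conv2d(64, 1, kernel_size=1), nn.Sigmoid())
x = torch.randn(2, 64, 128, 128)
print(head(SELUResidualBlock(64)(x)).shape)   # torch.Size([2, 1, 128, 128])
```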
ISBN (print): 9781450387637
Generating medical reports manually is a difficult task, especially in rural areas and in urgent medical cases. It can also be error-prone for inexperienced physicians. Various deep learning methodologies, such as image captioning and image classification, have previously been applied to this problem. Generating a medical report automatically is difficult given the small amount of open-source data available, and paired data containing both medical images and reports is also limited. Another challenging issue is data bias in medical imaging. A generative encoder-decoder model is suggested to solve this problem efficiently. There are further challenges. First, the medical report itself contains heterogeneous information such as paragraphs, tags, and keywords. Second, it is difficult to identify the abnormal regions in medical images. To address these, a multi-task framework is built that performs both tag generation and paragraph generation. An LSTM (Long Short-Term Memory) network is built to generate the long, heterogeneous paragraphs of the medical report. The model is demonstrated on a chest X-ray dataset and a pathology dataset.
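The multi-task shape of such a framework — one visual feature vector feeding both a multi-label tag head and an LSTM paragraph decoder — can be sketched as below. Vocabulary and tag sizes, the single-layer LSTM, and all names are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ReportSketch(nn.Module):
    def __init__(self, feat_dim=512, n_tags=50, vocab=1000, hid=256):
        super().__init__()
        self.tag_head = nn.Linear(feat_dim, n_tags)       # tag generation task
        self.embed = nn.Embedding(vocab, hid)
        self.init_h = nn.Linear(feat_dim, hid)            # image -> initial state
        self.lstm = nn.LSTM(hid, hid, batch_first=True)   # paragraph generation
        self.word = nn.Linear(hid, vocab)

    def forward(self, feats, tokens):
        tags = torch.sigmoid(self.tag_head(feats))        # multi-label tag scores
        h0 = self.init_h(feats).unsqueeze(0)
        out, _ = self.lstm(self.embed(tokens), (h0, torch.zeros_like(h0)))
        return tags, self.word(out)                       # per-step word logits

model = ReportSketch()
tags, logits = model(torch.randn(2, 512), torch.randint(0, 1000, (2, 20)))
print(tags.shape, logits.shape)   # torch.Size([2, 50]) torch.Size([2, 20, 1000])
```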
Trip planning/recommendation is an important task for a plethora of applications in urban settings (e.g., tourism, transportation, social outings), relying on services provided by Location-Based Social Networks (LBSN). To provide greater context-awareness in trajectory planning, LBSNs combine historical trajectories of users to generate various hand-crafted features, e.g., geo-tags of photos taken by tourists and textual characteristics derived from reviews. Those features are used to learn tourists' preferences, which are then used to generate a travel plan recommendation. However, many such features are extracted based on prior knowledge or empirical analysis specific to particular datasets, so the corresponding solutions do not generalize to diverse data sources. Thus, one important question for managing mobility is how to learn an accurate tour planning model based solely on POI visits or user check-ins, without hand-crafted feature engineering. Inspired by recent successes of deep learning in sequence learning, we develop a solution to the tour planning problem based on the semi-supervised learning paradigm. An important aspect of our solution is that it involves no feature engineering. Specifically, we propose a trip recommendation method via a trajectory encoder and decoder: a novel end-to-end approach that encodes historical trajectories into vectors while capturing both the intrinsic characteristics of individual POIs and the transition patterns among POIs. We also incorporate a historical attention mechanism into our sequence-to-sequence trip recommendation task to improve effectiveness. Experiments conducted on multiple publicly available LBSN datasets demonstrate significantly superior performance of our method.
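The feature-engineering-free encoding step can be sketched with nothing but learned POI embeddings (intrinsic POI characteristics) and a recurrent layer (transition patterns). This is a minimal sketch; the full model also has a decoder and the historical attention mechanism, and all sizes here are assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Encodes a check-in sequence into a vector using only learned POI
    embeddings, i.e., no hand-crafted features."""
    def __init__(self, n_pois=5000, dim=128):
        super().__init__()
        self.poi_embed = nn.Embedding(n_pois, dim)      # per-POI characteristics
        self.gru = nn.GRU(dim, dim, batch_first=True)   # POI-to-POI transitions

    def forward(self, poi_ids):                         # (batch, trip_len)
        _, h = self.gru(self.poi_embed(poi_ids))
        return h.squeeze(0)                             # (batch, dim) trip vector

enc = TrajectoryEncoder()
print(enc(torch.randint(0, 5000, (8, 6))).shape)        # torch.Size([8, 128])
```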
Chinese couplets, a traditional Chinese art form, are a treasure of Chinese civilization and an inheritance of Chinese history. Given a sentence (an antecedent clause), people reply with another sentence (a subsequent clause) of equal length. Because of the complexity of the semantic and grammatical rules of couplets, it is not easy to create a suitable couplet that meets the requirements of sentence pattern, context, and tonal pattern. With the development of neural models and natural language processing, the automatic generation of Chinese couplets has drawn significant attention due to its artistic and cultural value. Most existing works focus on generating couplets from given text, while visual inspiration for couplet generation has rarely been explored. In this paper, we design a Chinese couplet generation model based on NIC (Neural Image Caption) that can compose a couplet suited to the artistic conception of an image. First, we use an improved VGG16 model to analyze the input image: its content is automatically recognized, and the corresponding description is generated and translated into Chinese keywords. Then, the encoder-decoder framework is applied repeatedly to process these keywords, and finally the couplet is generated. Moreover, to satisfy the special characteristics of couplets, we incorporate an attention mechanism into the encoding-decoding process, which greatly improves the accuracy of the automatically generated couplets.
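One distinctive constraint of couplet generation — the subsequent clause must match the antecedent in length — is easy to enforce in a seq2seq decoder by decoding exactly one character per input character, as sketched below. The character vocabulary size, greedy decoding, and the `<bos>` index are assumptions; the paper's attention mechanism is omitted for brevity.

```python
import torch
import torch.nn as nn

class CoupletDecoderSketch(nn.Module):
    def __init__(self, vocab=8000, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hid)
        self.encoder = nn.GRU(hid, hid, batch_first=True)
        self.decoder = nn.GRUCell(hid, hid)
        self.out = nn.Linear(hid, vocab)

    def forward(self, antecedent):                 # (batch, n_chars) char ids
        _, h = self.encoder(self.embed(antecedent))
        h = h.squeeze(0)
        y = torch.zeros_like(antecedent[:, 0])     # <bos> assumed at index 0
        chars = []
        for _ in range(antecedent.size(1)):        # one step per input character
            h = self.decoder(self.embed(y), h)
            y = self.out(h).argmax(-1)             # greedy next character
            chars.append(y)
        return torch.stack(chars, dim=1)           # (batch, n_chars): equal length

model = CoupletDecoderSketch()
print(model(torch.randint(1, 8000, (2, 7))).shape)  # torch.Size([2, 7])
```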
Self-attention networks and the Transformer have come to dominate machine translation and natural language processing, and have shown great potential in image vision tasks such as image classification and object detection. Inspired by the great progress of the Transformer, we propose a novel, general, and robust voxel feature encoder for 3D object detection based on the traditional Transformer. We first investigate the permutation invariance of the self-attention on sequence data and apply it to point cloud processing. We then construct a voxel feature layer based on self-attention to adaptively learn a local and robust context for each voxel, according to the spatial relationships and the context information exchanged among all points within the voxel. Finally, we construct a general voxel feature learning framework with the voxel feature layer as its core for 3D object detection. The voxel feature with Transformer (VFT) can easily be plugged into any other voxel-based 3D object detection framework and serves as the backbone for voxel feature extraction. Experimental results on the KITTI dataset demonstrate that our method achieves state-of-the-art performance on 3D object detection.
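The permutation-invariance property that makes self-attention suitable for points inside a voxel can be demonstrated directly: attention is permutation-equivariant, and a max pool over points is permutation-invariant, so the voxel feature does not depend on point order. The dimensions, head count, and max-pool reduction below are illustrative assumptions, not VFT's exact layer.

```python
import torch
import torch.nn as nn

class VoxelSelfAttention(nn.Module):
    """Self-attention over the points inside each voxel, max-pooled into a
    single per-voxel feature vector."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, points):                      # (n_voxels, pts, dim)
        ctx, _ = self.attn(points, points, points)  # points exchange context
        return ctx.max(dim=1).values                # (n_voxels, dim)

layer = VoxelSelfAttention()
pts = torch.randn(10, 32, 64)
perm = pts[:, torch.randperm(32)]                   # shuffle points in each voxel
f1, f2 = layer(pts), layer(perm)
print(torch.allclose(f1, f2, atol=1e-5))            # True: order-invariant
```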
This paper introduces a two-stage deep learning-based methodology for clustering time series data. First, a novel technique is introduced that utilizes characteristics (e.g., volatility) of the given time series data to create labels, thus transforming the problem from unsupervised into supervised learning. Second, an autoencoder-based deep learning model is built to model both known and hidden non-linear features of time series data. The paper reports a case study in which selected financial and stock time series data from over 70 stock indices are clustered into distinct groups using the introduced two-stage procedure. The results show that the proposed methodology achieves 87.5% accuracy in clustering and predicting the labels of unseen time series data. The paper also reports an important finding: the performance of the two techniques (i.e., autoencoder and K-means) is comparable. However, a few instances of time series data are classified differently by the autoencoder-based methodology than by the K-means algorithm. This may indicate that the proposed deep learning-based approach takes into account additional hidden features that might be overlooked by conventional K-means. The finding raises the question of whether the explicit features of data should be analyzed for clustering, or whether more advanced techniques such as deep learning should be adopted, in which hidden features and relationships are explored for clustering purposes.
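The two-stage idea can be sketched end to end: derive labels from a series characteristic (stage one), then cluster a compressed representation and compare (stage two). A real run would train an autoencoder and cluster its latent codes; the random projection below merely stands in for the bottleneck to keep the sketch short, and the median volatility split is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
series = rng.standard_normal((70, 250)).cumsum(axis=1)   # 70 toy price series

# Stage 1: labels from a series characteristic (volatility of daily returns),
# turning the unsupervised problem into a supervised one.
vol = np.diff(series, axis=1).std(axis=1)
labels = (vol > np.median(vol)).astype(int)

# Stage 2 stand-in: cluster a low-dimensional representation of each series.
latent = series @ rng.standard_normal((250, 8))          # stand-in bottleneck
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(latent)

# Compare volatility-derived labels with the clustering (up to relabeling)
agree = max((clusters == labels).mean(), (clusters != labels).mean())
print(f"label/cluster agreement: {agree:.2f}")
```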
Video Object Segmentation (VOS) is a fundamental task required in many high-level real-world computer vision applications. VOS is challenging due to the presence of background distractors as well as object appearance variations. Many existing VOS approaches use online model updates to capture appearance variations, which incurs a high computational cost. Template matching and propagation-based VOS methods, although cost-effective, suffer from performance degradation in challenging scenarios such as occlusion and background clutter. To tackle these challenges, we propose a network architecture dubbed 4G-VOS that encodes video context for improved VOS performance. To preserve long-term semantic information, we propose a guided transfer embedding module. We employ a global instance matching module to generate similarity maps from the initial image and the mask. Besides, we use a generative directional appearance module to estimate and dynamically update the foreground/background class probabilities in a spherical embedding space. Moreover, existing approaches may lose contextual information during feature refinement. Therefore, we propose a guided pooled decoder to exploit global and local contextual information during feature refinement. The proposed framework is an end-to-end learning architecture that is trained offline. Evaluations on three VOS benchmark datasets, DAVIS2016, DAVIS2017, and YouTube-VOS, demonstrate outstanding performance of the proposed algorithm compared to 40 existing state-of-the-art methods. (C) 2021 Elsevier B.V. All rights reserved.
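Global instance matching of the kind described — similarity maps between the query frame and the masked first frame — typically reduces to a cosine-similarity computation between feature locations. The shapes and the max reduction over foreground template pixels below are assumptions about such modules, not 4G-VOS's exact formulation.

```python
import torch
import torch.nn.functional as F

def similarity_maps(query_feats, template_feats, template_mask):
    """Cosine similarity between every query location and the foreground
    template locations, reduced to one per-pixel score map.

    query_feats/template_feats: (C, H, W); template_mask: (H, W) in {0,1}
    """
    C, H, W = query_feats.shape
    q = F.normalize(query_feats.reshape(C, -1), dim=0)    # unit feature columns
    t = F.normalize(template_feats.reshape(C, -1), dim=0)
    sim = q.T @ t                                         # (HW, HW) all pairs
    fg = template_mask.reshape(-1).bool()                 # foreground columns
    return sim[:, fg].max(dim=1).values.reshape(H, W)     # best match per pixel

q = torch.randn(64, 30, 30)
t = torch.randn(64, 30, 30)
mask = (torch.rand(30, 30) > 0.7).float()                 # toy first-frame mask
print(similarity_maps(q, t, mask).shape)                  # torch.Size([30, 30])
```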