Active compound jamming, particularly compound interrupted sampling repeater jamming (ISRJ), possesses excellent flexibility and jamming effectiveness, making it one of the major threats to radar systems. Accurately measuring the key parameters of each ISRJ component within the compound ISRJ can provide critical prior information for subsequent anti-jamming efforts. However, most of the existing ISRJ parameter measurement methods target a single ISRJ and lack in-depth research on the measurement of compound ISRJ parameters. Therefore, we propose a unified framework for compound ISRJ parameter measurement that contains a compound ISRJ separation network based on an encoder-decoder architecture and a parameter regression module for measuring the key parameters of each ISRJ component in the compound ISRJ. Experimental results indicate that the proposed framework achieves parameter measurement accuracies of over 89% and 85% for dual-compound and multicompound ISRJ, respectively, significantly outperforming existing methods.
General sign language recognition models are designed only to recognize categories; that is, they do not distinguish between standard and nonstandard sign language actions made by learners. This makes them inadequate for use in sign language education software. To address this issue, this paper proposes a sign language category and standardization correctness discrimination model for sign language education. The proposed model is implemented with a hand detection method and a standard sign language discrimination method. For hand detection, the proposed method utilizes flow-guided features and acquires relevant proposals using stable and flow key frame detections. This approach can resolve the inconsistency between the forward optical flow and the box center point offset. In addition, the proposed method employs an encoder-decoder model structure for sign language correctness discrimination. The encoder combines 3D convolution and 2D deformable convolution results with residual structures, and it implements a sequence attention mechanism. A Sign Language Correctness Discrimination dataset (SLCD dataset) was also constructed in this study. In this dataset, each sign language video has two recognition labels, i.e., sign language category and standardization category. A semi-supervised learning method was employed to generate pseudo hand position labels. The hand detection model achieved sufficiently high hand detection results. The sign language correctness discrimination model was tested with hand patches or full images. The SLCD dataset is available at https://***/10.21227/p9sn-dz70.
Semantic segmentation can help the perception pipeline build a better understanding of complex scenes and can help unmanned systems better perceive scene content. To address the loss of detailed information and the blurring of segmentation edges in semantic segmentation of complex scenes, we propose a modified version of Deeplabv3+ based on an improved ASPP and a fusion module. First, we propose an RA-ASPP module combining a residual network and an asymmetric atrous convolution block (AACB), which further enriches the scales of feature extraction and achieves denser multi-scale feature extraction. It significantly enhances the representation power of the network. Then, we propose a parallel fusion module named convolution combined with bottleneck block (CBB), which combines 1 x 1 convolution and a bottleneck block to reduce the information loss in the whole network transmission process. We perform ablation experiments on the PASCAL VOC2012 dataset. When the backbone is Xception, the Mean Intersection over Union (MIoU) of Ours1 is 79.78%. At the cost of 1.72 frames per second (FPS), its MIoU is 2.81% higher than that of Deeplabv3+. The proposed modules significantly improve the accuracy of semantic segmentation and achieve segmentation results comparable to state-of-the-art algorithms. When MobileNetV2 is the backbone, Ours2 achieves 37.54 FPS and an MIoU of 73.32%, which ensures a balance between real-time segmentation speed and accuracy. In summary, our proposed modules improve the segmentation performance of Deeplabv3+, and the different backbones also provide additional options for semantic segmentation tasks in complex scenes.
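As a reminder of what the MIoU figures above measure, here is a minimal sketch of the metric in plain Python. It assumes flattened lists of per-pixel class labels and averages only over classes that occur in the prediction or the ground truth; the function name `mean_iou` and the toy labels are illustrative and are not taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))
```

Benchmark toolkits usually accumulate a confusion matrix over the whole dataset before taking the per-class ratio, so per-image averages computed this way may differ slightly from reported scores.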
At present, gastric cancer patients account for a large proportion of all tumor patients. Gastric tumor image segmentation can provide a reliable additional basis for the clinical analysis and diagnosis of gastric cancer. However, the existing gastric cancer image datasets have disadvantages such as small data sizes and difficulty in labeling. Moreover, most existing CNN-based methods are unable to generate satisfactory segmentation masks without accurate labels, which is due to the limited context information and insufficiently discriminative feature maps obtained after consecutive pooling and convolution operations. This paper presents a gastric cancer lesion dataset for gastric tumor image segmentation research. A multiscale boundary neural network (MBNet) is proposed to automatically segment the real tumor area in gastric cancer images. MBNet adopts an encoder-decoder architecture. In each stage of the encoder, a boundary extraction refinement module is first proposed to obtain and refine multi-granular edge information. Then, we build a selective fusion module to selectively fuse features from the different stages. By cascading the two modules, the richer context and fine-grained features of each stage are encoded. Finally, the atrous spatial pyramid pooling is improved to capture the long-range dependencies of the overall context and the fine spatial structure information. The experimental results show that the accuracy of the model reaches 92.3% and the similarity coefficient (DICE) reaches 86.9%; the proposed method also outperforms existing approaches on the CVC-ClinicDB and Kvasir-SEG datasets.
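The DICE similarity coefficient reported above can be illustrated with a short sketch. It assumes binary masks flattened to lists of 0/1 values and treats two empty masks as a perfect match; the function name `dice_coefficient` and the sample masks are hypothetical.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for binary masks given as 0/1 lists."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * inter / total if total else 1.0


# Toy example: one overlapping pixel out of two foreground pixels in each mask.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
print(dice_coefficient(pred_mask, true_mask))
```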
In recent years, significant progress has been made in semantic segmentation methods. Traditional semantic segmentation methods based on convolutional neural networks (CNNs) are prone to losing spatial information in the feature extraction stage and pay less attention to global context information, especially in some lightweight real-time semantic segmentation networks. This is a huge challenge for semantic segmentation tasks. In addition, although some methods have alleviated this problem to a certain extent, they are often embedded in specific networks and cannot be applied to other network models. To address these problems, a semantic segmentation method based on multilayer feature fusion is proposed. The flexible and lightweight squeeze-excitation module is used to improve the spatial pyramid pooling (SPP) network, and the accuracy of the semantic segmentation method is further improved by extracting network feature information at different levels. To verify the efficiency and generality of our methodology, we selected the ERFNet and Deeplabv3 networks for experiments on the Cityscapes and COCO data sets. Experiments show that our best method improves mIoU by 3.1% and mAcc by 3.2% on the Cityscapes data set relative to ERFNet while achieving 61.93 FPS on 1024 x 512 resolution images, and that the best improvement on the Deeplabv3 network is 0.9% mIoU and 1.4% mAcc. The experimental results show that the improved multilayer feature fusion structure can improve the accuracy of the semantic segmentation network.
Gales are a type of disastrous weather, and wind speed forecasting is a difficult point in operational weather forecasting. In this study, we propose a method to forecast the future wind speed time series at a target station by using the past wind speed time series at the target station and its adjacent stations. This method is built with deep learning technology. Based on an encoder-decoder architecture, the driving series at the adjacent stations and the target series at the target station are taken as the input of the encoder module and the decoder module, respectively. There are two attention layers in the encoder module. One is used to strengthen the contribution of each influence factor in the input driving series to the hidden state in the long short-term memory (LSTM) layer. The other is used to enable the encoder to adaptively select the hidden state output by the LSTM layer. A loss function based on the Gaussian kernel function is adopted in the forecast model, and a dynamic weight is designed to adjust the attention paid to output errors at different forecast lead times during training, thus improving the model's forecast performance at longer lead times. The results show that the method performs well in wind speed forecasting from T+1 to T+24. The mean absolute error and root mean squared error of the forecast results at T+24 are 0.796 m s⁻¹ and 1.029 m s⁻¹, respectively, which are better than those of the other two models in the experiment. The results indicate that the proposed method can not only be applied to wind speed forecasting but can also provide technical support for operational applications such as early warning of gale disasters and wind power prediction.
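The two error measures quoted above follow the usual definitions, sketched below in plain Python. The function names `mae` and `rmse` and the sample wind speeds are illustrative assumptions, not values from the study.

```python
import math


def mae(forecast, observed):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def rmse(forecast, observed):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
    )


# Hypothetical wind speeds in m/s at three forecast lead times.
forecast = [3.0, 5.0, 7.0]
observed = [2.0, 5.0, 9.0]
print(mae(forecast, observed), rmse(forecast, observed))
```

Because RMSE squares each error before averaging, it is never smaller than MAE, which is consistent with the reported pair of 0.796 and 1.029 m s⁻¹.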
Electric shorting induced by tall vegetation is one of the major hazards affecting power transmission lines extending through rural regions and rough terrain for tens of kilometres. This raises the need for an accurate, reliable, and cost-effective approach for continuous monitoring of canopy heights. This paper proposes and evaluates two deep convolutional neural network (CNN) variants based on Seg-Net and Res-Net architectures, characterized by their small number of trainable weights (nearly 800,000) while maintaining high estimation accuracy. The proposed models utilize the freely available data from Sentinel-2 and a digital surface model to estimate forest canopy heights with high accuracy and a spatial resolution of 10 metres. Various factors affect canopy height estimation, including topography signature, dataset diversity, input layers, and model structure. The proposed models are applied separately to two powerline regions located in the northern and southern parts of Thailand. The application results show that the proposed encoder-decoder CNN Seg-Net model presents an average mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) of 1.38 m, 1.85 m, and 0.87, respectively, and is nearly 4.8 times faster than the CNN Res-Net model in conversion. These results demonstrate the proposed model's capability of estimating and monitoring canopy heights with high accuracy and fine spatial resolution.
Predicting human motion based on past observed motion is one of the challenging issues in computer vision and graphics. Existing research addresses this issue with discriminative models and reports results for cases in which training and testing data follow the same distribution (in distribution). It does not discuss the domain shift problem, in which training and testing data follow different distributions (out of distribution, OoD), which is the reality when such models are used in practice. However, recent research proposed addressing domain shift by augmenting the discriminative model with a generative model and obtained better results. In the present investigation, we propose regularizing the extended network by inserting linear layers to minimize the rank of the latent space, and we train the entire network end to end. We regularize the network so that the model deals effectively with domain shift scenarios. Because training and testing data come from different distributions, we toughen our network by adding the extra linear layers to the network encoder. We tested our model on the benchmark datasets CMU Motion Capture and Human3.6M and showed that it outperforms existing methods on 14 OoD actions of H3.6M and 7 OoD actions of CMU MoCap in terms of the Euclidean distance calculated between predicted and ground truth joint angle values. Our average results over the 14 OoD actions of H3.6M for short-term prediction (80, 160, 320, 400) are 0.34, 0.6, 0.96, 1.07, and over the 7 OoD actions of CMU MoCap for short-term and long-term prediction (80, 160, 320, 400, 1000) are 0.28, 0.45, 0.77, 0.89, 1.46. All these results are much better than the other state-of-the-art results.
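The evaluation measure named above, the Euclidean distance between predicted and ground truth joint angles, can be sketched as follows. This is a minimal illustration for a single frame, assuming each pose is a flat list of joint angle values; the function name `joint_angle_error` and the sample values are hypothetical, and the abstract does not state how errors are averaged over frames or sequences.

```python
import math


def joint_angle_error(predicted, ground_truth):
    """Euclidean (L2) distance between two joint-angle vectors of equal length."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(predicted, ground_truth)))


# Hypothetical two-angle pose, chosen so the distance is easy to check by hand.
predicted = [3.0, 0.0]
ground_truth = [0.0, 4.0]
print(joint_angle_error(predicted, ground_truth))
```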
The aim of the image captioning task is to understand various semantic concepts, such as objects and their relationships in an image, and combine them to generate a natural language description. Thus, it needs an algorithm to understand the visual content of a given image and translate it into a sequence of output words. In this paper, a Local Relation Network (LRN) is designed over the objects and image regions, which not only discovers the relationship between the objects and the image regions but also generates significant context-based features corresponding to every region in the image. Also, a multilevel attention approach is used to focus on a given image region and its related image regions, thus enhancing the image representation capability of the proposed method. Finally, a variant of the traditional long short-term memory (LSTM) with an attention mechanism is employed, which focuses on relevant contextual information, spatial locations, and deep visual features. With these measures, the proposed model encodes an image in an improved way, which gives the model significant cues and thus leads to improved caption generation. Extensive experiments have been performed on three benchmark datasets: Flickr30k, MSCOCO, and Nocaps. On Flickr30k, the obtained evaluation scores are 31.2 BLEU@4, 23.5 METEOR, 51.5 ROUGE, 65.6 CIDEr and 17.2 SPICE. On MSCOCO, the proposed model has attained 42.4 BLEU@4, 29.4 METEOR, 59.7 ROUGE, 125.7 CIDEr and 23.2 SPICE. The overall CIDEr score achieved by the proposed model on the Nocaps dataset is 114.3. The above scores clearly show the superiority of the proposed method over the existing methods.
The quality of parts manufactured by the laser powder bed fusion process is significantly affected by porosity. Existing work on process-property relationships for porosity prediction requires many experiments or computationally expensive simulations and does not consider environmental variations. Meanwhile, efforts that adopt real-time monitoring sensors can only detect porosity after its occurrence rather than predicting it ahead of time. In this study, a novel porosity detection-prediction framework based on deep learning is proposed that predicts porosity in the next layer from the thermal signatures of the previous layers. The proposed framework is validated in terms of its ability to accurately predict lack-of-fusion porosity using computerized tomography (CT) scans, achieving an F1-score of 0.75. The framework presented in this work can be effectively applied to quality control in additive manufacturing. Based on the predicted porosity positions, laser process parameters in the next layer can be adjusted to avoid further part porosity, or the existing porosity can be filled. If the predicted part porosity is not acceptable regardless of laser parameters, the building process can be stopped to minimize the loss.
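For readers unfamiliar with the F1-score quoted above, the sketch below computes it from counts of true positives, false positives, and false negatives. The function name `f1_score` and the counts are made up for illustration and do not come from the study.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0  # no correct positive predictions (assumed convention)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 pores correctly predicted, 2 false alarms, 4 missed.
print(f1_score(tp=6, fp=2, fn=4))
```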