License plate detection has wide applications in intelligent transportation systems, but improving its robustness under varying shooting distances and observation angles remains challenging. To get better...
Lung tumor PET and CT image fusion is a key technology in clinical diagnosis. However, existing fusion methods struggle to produce fused images with high contrast, prominent morphological features, and accurate spatial localization. In this paper, an isomorphic Unet fusion model (GMRE-iUnet) for lung tumor PET and CT images is proposed to address these problems. The main ideas are as follows. Firstly, an isomorphic Unet fusion network is constructed that contains two independent multiscale dual-encoder Unets; it captures the features of the lesion region and its spatial localization, and enriches the morphological information. Secondly, a hybrid CNN-Transformer feature extraction module (HCTrans) is constructed to effectively integrate local lesion features and global contextual information. In addition, a residual axial attention feature compensation module (RAAFC) is embedded into the Unet to capture fine-grained information as compensation features, which makes the model focus on local connections between neighboring pixels. Thirdly, a hybrid attentional feature fusion module (HAFF) is designed for multiscale feature fusion; it aggregates edge information and detail representations using local entropy and Gaussian filtering. Finally, experimental results on a multimodal lung tumor medical image dataset show that the proposed model achieves excellent fusion performance compared with eight other fusion models. In the comparison experiment on CT mediastinal window images and PET images, the AG, EI, QAB/F, SF, SD, and IE indexes are improved by 16.19%, 26%, 3.81%, 1.65%, 3.91% and 8.01%, respectively. GMRE-iUnet highlights the information and morphological features of lesion areas and provides practical help for the aided diagnosis of lung tumors.
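Two of the indexes quoted in the abstract above can be illustrated with a minimal sketch. It assumes that SD and IE denote the standard deviation and the information entropy of the fused image's gray levels, which is the common reading in the fusion literature but is not spelled out in the abstract. The function names and the nested-list image format are illustrative choices only, not the authors' evaluation code.

```python
import math
from collections import Counter


def standard_deviation(image):
    """Population standard deviation of all gray levels (assumed meaning of SD)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def information_entropy(image):
    """Shannon entropy in bits of the gray-level histogram (assumed meaning of IE)."""
    pixels = [p for row in image for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy 2x2 "fused image": half the pixels are black, half are white.
    toy = [[0, 0], [255, 255]]
    print(standard_deviation(toy), information_entropy(toy))
```

Under these assumed definitions, a higher SD indicates stronger contrast and a higher IE indicates that the fused image carries more gray-level information.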
Non-invasive gaze estimation from eye images alone, captured by a camera, is a challenging problem because of varying eye shapes, eye structures and image quality. Recently, CNNs have been applied to directly regress e...
Recently, the Joint Video Experts Team (JVET) completed the new Versatile Video Coding (H.266/VVC) standard. VVC employs a new block partition structure, the quad-tree with nested multi-type tree (QTMT), to improve coding efficiency. However, because of brute-force rate-distortion (RD) optimization, the new block partition structure greatly increases encoding time compared with HEVC. To reduce encoding complexity, this paper proposes a Support Vector Machine (SVM) based fast CU partitioning algorithm for VVC intra coding, which terminates redundant partitions early by predicting the partition of a CU from texture information. Separate classifiers are trained for CUs of different sizes to improve accuracy and to control the complexity of the classifiers themselves. A different threshold is set for each classifier to achieve a trade-off between encoding complexity and RD performance. Experimental results show that the proposed method saves 30.78% to 63.16% of encoder time with a BD-BR increase of 1.10% to 2.71%.
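The early-termination idea in the abstract above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the three texture features, the linear decision function, the weights, the bias and the three-way outcome ("split", "no_split", "full_rdo") are all hypothetical. The abstract only states that SVM classifiers use texture information and per-classifier thresholds.

```python
def texture_features(block):
    """Hypothetical texture features: variance and mean absolute gradients."""
    h, w = len(block), len(block[0])
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    grad_h = sum(abs(row[j + 1] - row[j]) for row in block
                 for j in range(w - 1)) / (h * (w - 1))
    grad_v = sum(abs(block[i + 1][j] - block[i][j])
                 for i in range(h - 1) for j in range(w)) / ((h - 1) * w)
    return (variance, grad_h, grad_v)


def svm_score(features, weights, bias):
    """Decision value of a linear SVM: w . x + b (weights are assumed, not trained)."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def partition_decision(score, threshold):
    """Skip the RD search only when the classifier is confident enough."""
    if score >= threshold:
        return "split"      # confident: partition the CU, skip testing the unsplit mode
    if score <= -threshold:
        return "no_split"   # confident: terminate partitioning early
    return "full_rdo"       # uncertain: fall back to the full RD search


if __name__ == "__main__":
    weights, bias, threshold = (0.01, 0.05, 0.05), -2.0, 1.0  # illustrative values
    flat = [[128] * 4 for _ in range(4)]
    checker = [[255 if (i + j) % 2 else 0 for j in range(4)] for i in range(4)]
    for name, block in (("flat", flat), ("checker", checker)):
        score = svm_score(texture_features(block), weights, bias)
        print(name, partition_decision(score, threshold))
```

Raising the threshold sends more CUs to the full RD search, which costs encoding time but protects RD performance; lowering it does the opposite. This is the trade-off the abstract attributes to the per-classifier thresholds.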
The emergence of infectious disease COVID-19 has challenged and changed the world in an unprecedented manner. The integration of wireless networks with edge computing (namely wireless edge networks) brings opportuniti...
Facial Expression Analysis (FEA) plays a vital role in diagnosing and treating early-stage neurological disorders (NDs) like Alzheimer's and Parkinson's. Manual FEA is hindered by expertise, time, and training...
At present, most high-performing saliency prediction models for omnidirectional images (ODIs) depend on deeper or wider convolutional neural networks (CNNs), benefiting from their superior feature representation capab...
In the Chinese character writing task for robotic arms, stroke category and position information must be extracted by object detection. Detection algorithms based on predefined anchor frames have difficulty handling the differences among the many styles of Chinese character strokes. Deformable detection transformer (deformable DETR) algorithms, which do not use predefined anchor frames, sample points randomly in the deformable attention module, so some invalid sampling points contribute nothing to the feature update of the current reference point. This limits the effectiveness of the correlation calculations between reference points in Chinese strokes and their surrounding sampled points, and consequently reduces the speed at which the vectors in the detection head learn stroke features. To address this problem, this paper proposes a new method for detecting multi-style strokes of Chinese characters, SCSQ-MDD (Simple Conditional Spatial Query Mask Deformable DETR). Firstly, a mask prediction layer is jointly determined from the shallow feature map of the Chinese character image and the query vector of the transformer encoder; it is used to keep the points that make an actual contribution and to resample the points that do not, which removes the randomness of the correlation calculation among reference points. Secondly, by separating the content query and the spatial query of the transformer decoder, the content embedding and the spatial embedding can each be attended to separately when cross-attention is computed. Thus the dependence of the prediction task on the content embedding is relaxed and the training process is simplified. Finally, an anchor-free detection model based on deformable DETR, called SCSQ-MDD, is constructed using the mask mechanism and the simple conditional spatial query mechanism, and is trained and validated on a multi-style Chinese character stroke dataset.
To reduce the over-rasterization distortion caused by global uniform quantization of static surface point clouds, an adaptive quantization coding method based on feature mining is proposed. By combining the spatial position and texture features of point clouds with levels of detail, the quantization increment is set dynamically according to feature priority, which preserves as many effective points as possible and reduces rasterization distortion. Experimental results show that the proposed method effectively enhances the subjective reconstruction quality of compressed point clouds and achieves better rate-distortion performance.
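The effect described in the abstract above can be illustrated with a toy sketch: with one global step size, nearby points collapse onto the same grid cell, while a finer step in high-priority regions keeps more distinct points. The mapping from priority to step size (`adaptive_step`) and the priority values themselves are hypothetical; the abstract does not give the actual rule.

```python
import math


def quantize_point(point, step):
    """Uniform quantization: map each coordinate to the nearest multiple of step."""
    return tuple(math.floor(c / step + 0.5) for c in point)


def adaptive_step(base_step, priority):
    """Hypothetical rule: priority in [0, 1], higher priority gives a finer step."""
    return base_step / (1.0 + 3.0 * priority)


def count_surviving_points(points, steps):
    """Number of distinct points left after quantization (duplicates are merged)."""
    return len({(quantize_point(p, s), s) for p, s in zip(points, steps)})


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0), (0.9, 0.0, 0.0)]
    priorities = [1.0, 1.0, 0.0, 0.0]  # illustrative feature priorities
    uniform = count_surviving_points(points, [1.0] * len(points))
    adaptive = count_surviving_points(
        points, [adaptive_step(1.0, p) for p in priorities])
    print(uniform, adaptive)
```

In this toy case the two high-priority points stay separate under the finer step, whereas the global step merges them into one cell.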
ISBN: 9781665428132 (print)
Gaze following in 2D images, i.e., detecting the gaze target of a human subject, has become an active topic in computer vision. However, it usually suffers from the out-of-frame issue caused by the limited field-of-view (FoV) of 2D images. In this paper, we introduce a novel task, gaze following in 360-degree images, which provide an omnidirectional FoV and can alleviate the out-of-frame issue. We collect the first dataset for this task, "GazeFollow360", containing around 10,000 360-degree images with complex gaze behaviors in various scenes. Existing 2D gaze following methods suffer from performance degradation in 360-degree images because they may assume that a gaze target lies on the 2D gaze sight line. This assumption no longer holds for long-distance gaze behaviors in 360-degree images, owing to the distortion introduced by sphere-to-plane projection. To address this challenge, we propose a 3D sight line guided dual-pathway framework that detects the gaze target within a local region (here) and from a distant region (there) in parallel. Specifically, the local region is obtained as a 2D cone-shaped field along the 2D projection of the sight line starting at the human subject's head position, and the distant region is obtained by searching along the sight line in 3D sphere space. Finally, the location of the gaze target is determined by fusing the estimations from both the local region and the distant region. Experimental results show that our method achieves significant improvements over previous 2D gaze following methods on our GazeFollow360 dataset.
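The sphere-to-plane distortion mentioned in the abstract above can be made concrete with a small sketch. It assumes an equirectangular projection, which is a common storage format for 360-degree images but is not named in the abstract. The axis convention and all function names are illustrative. The example shows that the midpoint of a path traced on the sphere does not land on the straight 2D line between its endpoints.

```python
import math


def pixel_to_unit_vector(u, v, width, height):
    """Equirectangular pixel -> unit direction (assumed convention: y up, z forward)."""
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))


def unit_vector_to_pixel(vec, width, height):
    """Unit direction -> equirectangular pixel (inverse of the mapping above)."""
    x, y, z = vec
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))
    return ((lon + math.pi) / (2.0 * math.pi) * width,
            (math.pi / 2.0 - lat) / math.pi * height)


def arc_midpoint_pixel(pixel_a, pixel_b, width, height):
    """Project the midpoint of the great-circle arc between two pixels back to 2D."""
    a = pixel_to_unit_vector(pixel_a[0], pixel_a[1], width, height)
    b = pixel_to_unit_vector(pixel_b[0], pixel_b[1], width, height)
    mid = tuple(p + q for p, q in zip(a, b))
    norm = math.sqrt(sum(c * c for c in mid))
    return unit_vector_to_pixel(tuple(c / norm for c in mid), width, height)


if __name__ == "__main__":
    width, height = 400, 200
    head, target = (150.0, 50.0), (250.0, 50.0)
    straight_mid = ((head[0] + target[0]) / 2.0, (head[1] + target[1]) / 2.0)
    print(straight_mid, arc_midpoint_pixel(head, target, width, height))
```

Both endpoints lie on image row 50, yet the arc midpoint projects to a row closer to the top of the image. Under this assumed projection, searching along a straight 2D line would therefore miss points that lie on the 3D path.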