ISBN: 9798331529543; 9798331529550 (print)
The structural similarity of point clouds makes it difficult to accurately recognize and segment semantic information at the boundaries of complex scenes or objects. In this study, we propose a multi-scale graph transformer network (MGTN) for 3D point cloud semantic segmentation. First, a multi-scale graph convolution (MSG-Conv) is devised to address the limitations of existing methods in simultaneously extracting local and global features from point cloud data of varying density. Subsequently, we employ a graph-transformer (G-T) module to enhance edge details and spatial position information in the point cloud, thereby improving recognition accuracy for small objects and easily confused elements such as columns and beams. Extensive tests on the ShapeNet parts and S3DIS datasets demonstrate the effectiveness of MGTN. Compared with the baseline network DGCNN, MGTN increases mIoU by 1.5% and 18.5% on the ShapeNet parts and S3DIS datasets, respectively. Additionally, MGTN outperforms the recent CFSA-Net by 2.3% in OA and 3.4% in mIoU.
ISBN: 9781479961399 (print)
Visual attention plays an important role in image and video processing. High-definition (HD) techniques are now widely used, and ultra-high-definition (UHD) video is becoming increasingly popular. However, existing research on visual attention focuses mainly on relatively low-resolution videos or images, and very few studies address visual attention in UHD videos. In this paper, we build an Ultra High Definition (4K) Video Saliency Database. Using this database, we explore the characteristics of visual attention in UHD videos. The concept of aggregation maps (AGM) for videos is put forward to better analyse these characteristics. Our experiments show fairly strong correlations between video resolution and visual attention behavior. We also find that people tend to focus on the center of videos of relatively low resolution. The database will be made publicly available at *** soon.
ISBN: 9781467355636; 9781467355629 (print)
In this study, a method is proposed for pasting a user-selected, copied region of a source image into a target image. Because the selected areas in the source and target images are not homogeneous, that is, they contain texture information, most previous methods in the literature that rely on the Poisson equation produce adverse effects such as blur or color leakage in the processed region. In most cases the proposed method does not cause those artifacts, and where they do appear it reduces them. The visual results also indicate that the method is promising.
ISBN: 9780819469946 (print)
The end of the performance entitlement historically achieved by classic scaling of CMOS devices is within sight, driven ultimately by fundamental limits. Performance entitlements predicted by classic CMOS scaling have progressively failed to be realized in recent process generations due to excessive leakage, increasing interconnect delays and scaling of gate dielectrics. Prior to reaching fundamental limits, trends in technology, architecture and economics will pressure the industry to adopt new paradigms. A likely response is to repartition system functions away from digital implementations and into new architectures. Future architectures for visual communications will require extending the implementation into the optical and analog processing domains. The fundamental properties of these domains will in turn give rise to new architectural concepts. The limits of CMOS scaling and impact on architectures will be briefly reviewed. Alternative approaches in the optical, electronic and analog domains will then be examined for advantages, architectural impact and drawbacks.
ISBN: 9798331529543; 9798331529550 (print)
Because 4D medical images have substantial storage requirements, compressing them efficiently is a crucial topic. Existing traditional image/video coding methods have achieved remarkable results in most compression tasks, but their performance in encoding 4D medical images remains poor, because these methods cannot fully exploit the spatio-temporal correlations in 4D images. Recently, implicit neural representation (INR) based image/video compression methods have made significant progress, with coding performance comparable to that of traditional methods. However, like traditional methods, they suffer significant performance losses in 4D medical image compression. In this paper, we propose an efficient hybrid representation framework that comprises six learnable feature planes and a tiny MLP decoder. This framework addresses the inability of previous methods to exploit the spatio-temporal correlations in 4D medical images, enabling it to capture this information more effectively. We also introduce a novel adaptive plane scaling strategy that allocates the number of parameters in each plane based on the resolution of the image. This design allows the model to further enhance reconstruction quality at the same compression ratio. Extensive experiments show that our model achieves better RD performance than traditional and INR-based methods, and that it also offers faster encoding than INR-based methods.
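A hybrid plane-plus-MLP representation of this kind can be sketched as follows. The sketch assumes that the six planes correspond to the six pairs of axes of a 4D coordinate (x, y, z, t), that each plane stores one scalar channel, that plane features are concatenated, and that the decoder has a single ReLU hidden layer. None of these details is stated in the abstract, and all names are hypothetical.

```python
from itertools import combinations

# Assumption: one plane per pair of the four axes (x, y, z, t), giving six planes.
AXIS_PAIRS = list(combinations(range(4), 2))


def bilinear(grid, u, v):
    """Bilinearly sample a 2D grid (list of rows, at least 2x2) at u, v in [0, 1]."""
    h, w = len(grid), len(grid[0])
    x, y = u * (h - 1), v * (w - 1)
    x0, y0 = min(int(x), h - 2), min(int(y), w - 2)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * grid[x0][y0]
            + (1 - fx) * fy * grid[x0][y0 + 1]
            + fx * (1 - fy) * grid[x0 + 1][y0]
            + fx * fy * grid[x0 + 1][y0 + 1])


def plane_features(planes, coord):
    """Project a normalized 4D coordinate onto each plane and sample a feature."""
    return [bilinear(planes[i], coord[a], coord[b])
            for i, (a, b) in enumerate(AXIS_PAIRS)]


def tiny_mlp(features, w1, b1, w2, b2):
    """One ReLU hidden layer followed by a linear output (predicted intensity)."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

In a trained model the plane grids and MLP weights would be the learnable parameters that are stored as the compressed representation; plane resolutions could differ per axis pair, which is where a scaling strategy such as the one described would apply.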
ISBN: 9781665475921 (print)
Computer vision tasks suffer from the high cost of collecting large amounts of labeled data. Few-shot learning (FSL) is a dominant approach to this problem because it offers a way to learn novel categories from few training samples. In FSL tasks, meta-learning and metric learning have achieved impressive results. However, performance is still limited by the large intra-class variance and small inter-class distance caused by the limited number of samples. To solve this problem, we propose a new method that integrates meta-learning and metric learning techniques. Specifically, we first propose a feature representation (FR) module to construct representative support class prototypes and query features. Then, we design a bias loss to minimize the bias between support and query samples. Furthermore, we design an intra-class loss to minimize the distance between the query class prototype and each query sample. We denote this model as ML-FDA and validate it on standard few-shot classification benchmark datasets (miniImageNet, CIFAR-FS, FC100). The results show that our method improves on other methods of the same paradigm and achieves the best performance on most benchmarks. The ablation study and visualization analysis also demonstrate the effectiveness of our method.
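The prototype and intra-class ideas can be sketched in a few lines. This is an illustration only: it assumes prototypes are feature means, distances are Euclidean, and the intra-class loss is the mean squared distance between each query feature and its class prototype. The abstract does not give the actual definitions used in ML-FDA, and the function names are hypothetical.

```python
import math


def prototype(features):
    """Class prototype: the coordinate-wise mean of a class's feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def classify(query, prototypes):
    """Assign the query to the label whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


def intra_class_loss(queries, proto):
    """Assumed form: mean squared distance from each query feature to the prototype."""
    return sum(math.dist(q, proto) ** 2 for q in queries) / len(queries)
```

Minimizing a loss of this shape pulls same-class query features toward a common centre, which is one way to reduce intra-class variance.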
ISBN: 9781479902880 (print)
Compressive sensing imaging (CSI) is a new framework for image coding that acquires and compresses a scene simultaneously. The CS encoder efficiently shifts the bulk of the system complexity to the decoder. Ideally, an implementation of CSI provides lossless compression in image coding. In this paper, we consider lossy compression of the CS measurements in a CSI system. We design a universal quantizer for the CS measurements of any input image. The proposed method first establishes a universal probability model for the CS measurements in advance, without any knowledge of the input image. A fast quantizer is then designed based on this model. Simulation results demonstrate that the proposed method achieves nearly optimal rate-distortion (R-D) performance while maintaining very low computational complexity at the CS encoder.
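The encoder-side pipeline of measuring and then quantizing can be sketched as below. This is a stand-in, not the proposed universal quantizer: it assumes a random Gaussian measurement matrix and substitutes a plain uniform scalar quantizer for the model-based design, whose details the abstract does not give. All names and parameters are hypothetical.

```python
import random


def cs_measure(x, m, seed=0):
    """Take m random Gaussian measurements y = Phi @ x of the signal x."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5  # Assumed normalization of the measurement matrix.
    return [sum(rng.gauss(0.0, scale) * xi for xi in x) for _ in range(m)]


def quantize(y, step):
    """Uniform scalar quantizer: map each measurement to an integer index."""
    return [round(v / step) for v in y]


def dequantize(indices, step):
    """Reconstruct measurement values from quantizer indices."""
    return [i * step for i in indices]
```

Because the quantizer step is fixed before any image is seen, the encoder only performs one division and one rounding per measurement, which is the kind of low encoder complexity the abstract emphasizes.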
ISBN: 9781728180687 (print)
As the short-video industry grows, quality assessment of user-generated videos has become a hot issue. Existing no-reference video quality assessment methods are not suitable for this application scenario because they target synthetic videos. In this paper, we propose a novel deep blind quality assessment model for user-generated videos that accounts for content variety and the temporal memory effect. Content-aware features of frames are extracted by a deep neural network, and a patch-based method is adopted to obtain frame quality scores. Moreover, we propose a temporal memory-based pooling model that considers the temporal memory effect to predict video quality. Experimental results on the KoNViD-1k and LIVE-VQC databases demonstrate that our proposed method outperforms other state-of-the-art methods, and the comparative analysis confirms the efficiency of our temporal pooling model.
ISBN: 9781728180687 (print)
The unmixing of hyperspectral data is a hot topic in the field of remote sensing. However, in the presence of various types of noise, especially noisy channels, the performance of unmixing approaches deteriorates seriously. Enhancing the robustness of unmixing methods is therefore a subject worth studying. This paper presents a robust unmixing method based on the recently proposed multilinear mixing model, where the ℓ2,1 norm is adopted in the loss function to suppress the influence of noise. The sparseness of abundance is also considered to improve the parameter estimation. The resulting optimization problem is solved by the alternating direction method of multipliers (ADMM). Experiments on both synthetic and real images demonstrate the performance of the proposed unmixing strategy.
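For reference, the l(2,1) (that is, ℓ2,1) norm and its proximal operator, which is the row-wise shrinkage step that typically appears inside an ADMM solver, can be written as the short sketch below. It assumes the norm is taken over rows, with each row standing for one spectral channel of the residual matrix; whether the paper groups by rows or columns is not stated in the abstract.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows of the matrix."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def prox_l21(matrix, tau):
    """Proximal operator of tau * l21_norm: shrink each row toward zero.

    A row with norm at most tau is set to zero; otherwise it is scaled
    by (1 - tau / norm).
    """
    out = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out
```

Because the penalty grows linearly rather than quadratically with a row's norm, a few heavily corrupted channels contribute less to the loss than they would under a squared Frobenius norm, which is the usual motivation for using it against channel-wise noise.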
ISBN: 9781467373869 (print)
Recent advances in mobile device technology have turned mobile phones into powerful devices with high-resolution cameras and fast processing capabilities. Because they offer more potential for user interaction than regular PCs, mobile devices with cameras can enable richer content-based object image queries: the user can capture multiple images of the query object from different viewing angles and at different scales, thereby providing much more information about the object to improve retrieval accuracy. The goal of this paper is to improve mobile image retrieval performance using multiple query images. To this end, we use the well-known bag-of-visual-words approach to represent the images, and employ early and late fusion strategies to utilize the information in multiple query images. Through extensive experiments on an object image dataset with a single object per image, we show that multi-image queries yield higher average precision than single-image queries.
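The difference between early and late fusion of multiple query images can be sketched as follows. The sketch assumes each image is already a bag-of-visual-words histogram, uses cosine similarity, averages histograms for early fusion, and takes the maximum per-query score for late fusion. The paper may use other similarity measures or combination rules, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two histograms (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def early_fusion_rank(queries, database):
    """Early fusion: merge the query histograms first, then search once."""
    fused = [sum(col) / len(queries) for col in zip(*queries)]
    return sorted(database, key=lambda name: cosine(fused, database[name]),
                  reverse=True)


def late_fusion_rank(queries, database):
    """Late fusion: search with each query separately, then combine the scores."""
    score = {name: max(cosine(q, hist) for q in queries)
             for name, hist in database.items()}
    return sorted(database, key=score.get, reverse=True)
```

Early fusion costs a single database search, while late fusion runs one search per query image but lets a single well-matched view dominate the final ranking.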