ISBN (Digital): 9798331529543
ISBN (Print): 9798331529550
In recent years, Neural Radiance Fields (NeRF) have demonstrated significant advantages in representing and synthesizing 3D scenes. Explicit NeRF models enable practical NeRF applications through faster rendering, and their huge storage cost has also drawn considerable attention to NeRF compression. To support research on NeRF compression, in this paper we construct a new dataset, called Explicit-NeRF-QA. We use 22 3D objects with diverse geometries, textures, and material complexities to train four typical explicit NeRF models across five parameter levels. Lossy compression is introduced during model generation by varying key parameters, such as the hash table size for InstantNGP and the voxel grid resolution for Plenoxels. By rendering the NeRF samples to processed video sequences (PVS), a large-scale subjective experiment is conducted in a lab environment to collect subjective scores from 21 viewers. The diversity of content, the accuracy of the mean opinion scores (MOS), and the characteristics of NeRF distortion are comprehensively presented, establishing the heterogeneity of the proposed dataset. State-of-the-art objective metrics are evaluated on the new dataset. The best Pearson correlation, around 0.85, is obtained by a full-reference objective metric. All tested no-reference metrics report very poor results, with correlations of 0.4 to 0.6, demonstrating the need for more robust no-reference metrics. The dataset, including the NeRF samples, source 3D objects, multiview images for NeRF generation, PVSs, and MOS, is publicly available at: https://***/YukeXing/Explicit-NeRF-QA.
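As a rough illustration of the evaluation protocol described above, the following sketch computes per-PVS quality scores and their Pearson correlation with MOS. The frames, the MOS values, and the use of PSNR as a stand-in for a full-reference metric are all made-up assumptions, not the paper's actual pipeline:

```python
# Hypothetical sketch: score rendered PVS frames against references with a
# full-reference metric (PSNR as a stand-in) and correlate scores with MOS.
import numpy as np
from scipy.stats import pearsonr

def psnr(ref: np.ndarray, dist: np.ndarray, peak: float = 255.0) -> float:
    """Full-reference PSNR between two frames of identical shape."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def sequence_score(ref_frames, dist_frames):
    """Average per-frame metric over a processed video sequence (PVS)."""
    return np.mean([psnr(r, d) for r, d in zip(ref_frames, dist_frames)])

rng = np.random.default_rng(0)
# Stand-in data: 5 PVSs of 10 frames each, plus hypothetical MOS values.
refs = [[rng.integers(0, 256, (64, 64, 3)) for _ in range(10)] for _ in range(5)]
dists = [[f + rng.normal(0, s, f.shape) for f in seq]
         for s, seq in zip([2, 5, 10, 20, 40], refs)]
scores = [sequence_score(r, d) for r, d in zip(refs, dists)]
mos = [4.6, 4.1, 3.4, 2.5, 1.6]  # hypothetical mean opinion scores
r, _ = pearsonr(scores, mos)
print(f"PLCC between metric and MOS: {r:.3f}")
```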
ISBN (Print): 9781665475938
Human-object interaction (HOI) detection is a meaningful research topic in human activity understanding. Recent works have made significant progress by focusing on efficient triplet matching and leveraging image-wide features based on encoder-decoder architectures. However, in previous methods the ability to gather contextual information relevant to humans is limited, and the different sub-tasks of HOI detection are not explicitly decoupled. To this end, we propose a new transformer-based method for HOI detection, namely the Mask-Guided Transformer (MGT). Our model, which is composed of five parallel decoders with a shared encoder, not only emphasizes interactive regions by applying body features, but also disentangles the prediction of instances and interactions. We achieve a favorable result of 63.3 mAP on the well-known HOI detection dataset V-COCO.
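The shared-encoder, parallel-decoder layout can be sketched in PyTorch as follows; the module sizes, query count, and per-decoder task assignment are illustrative placeholders rather than MGT's actual implementation:

```python
# Minimal sketch of the "shared encoder, five parallel decoders" idea.
import torch
import torch.nn as nn

class ParallelDecoderHOI(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=3, num_decoders=5,
                 num_queries=100):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)  # shared
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # One decoder per sub-task, so instance and interaction prediction
        # are disentangled instead of sharing a single decoding path.
        self.decoders = nn.ModuleList(
            nn.TransformerDecoder(dec_layer, num_layers)
            for _ in range(num_decoders)
        )
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))

    def forward(self, image_tokens):          # (B, N, d_model) image features
        memory = self.encoder(image_tokens)   # shared image-wide features
        q = self.queries.unsqueeze(0).expand(image_tokens.size(0), -1, -1)
        # Each decoder attends to the same memory but specializes per task.
        return [dec(q, memory) for dec in self.decoders]

outs = ParallelDecoderHOI()(torch.randn(2, 196, 256))
print(len(outs), outs[0].shape)  # 5 task-specific output sets
```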
In this paper, a new haze restoration algorithm is proposed to restore hazy images. Hazy images always suffer from color distortion, loss of detail information, and contrast reduction, which degrade the visual quality of ...
In this paper, we present a novel yet intuitive unsupervised feature learning approach, referred to as Minimizing Interframe Differences (MID). The idea is the following: as long as the unsupervised features successfu...
ISBN (Print): 9781665475938
Video cropping is a key research task in the video processing field. In this paper, a spatio-temporal saliency based video cropping framework (SalCrop) is introduced, comprising four core modules: a video scene detection module, a video saliency prediction module, an adaptive cropping module, and a video codec module. It can automatically reframe videos at desired aspect ratios. In addition, a large-scale video cropping dataset (VCD) is built for training and testing. Experiments on the VCD test set show that SalCrop outperforms state-of-the-art algorithms with high efficiency. Furthermore, an FFmpeg video filter is developed based on the framework, which can be widely used in different scenarios. A demo is available at: https://***/smartcontent/videoCrop (access token: test_token).
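The adaptive cropping step can be illustrated with a simple saliency-driven window search. The function below is a hypothetical, non-learned stand-in for SalCrop's cropping module: it uses an integral image to find the maximum-saliency window at a target aspect ratio:

```python
# Illustrative sketch: choose the crop window of a target aspect ratio that
# captures the most predicted saliency in one frame. The real framework adds
# scene detection, learned saliency, temporal smoothing, and encoding.
import numpy as np

def best_crop(saliency: np.ndarray, target_ar: float):
    """Return (x, y, w, h) of the max-saliency window with w/h ~ target_ar."""
    H, W = saliency.shape
    # Largest window of the requested aspect ratio that fits the frame.
    if W / H > target_ar:
        h, w = H, int(round(H * target_ar))
    else:
        w, h = W, int(round(W / target_ar))
    # Integral image makes every window sum an O(1) lookup.
    ii = np.pad(saliency, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best, best_xy = -1.0, (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s = ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
            if s > best:
                best, best_xy = s, (x, y)
    return (*best_xy, w, h)

sal = np.zeros((90, 160)); sal[30:60, 100:140] = 1.0  # toy saliency blob
print(best_crop(sal, target_ar=9 / 16))  # reframe landscape to portrait
```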
ISBN (Print): 9781665475938
In this study, we propose Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition (CdCLR). Instead of treating whole sequences as instances, CdCLR extracts clips from the sequences as new instances. It aims to realize inherent supervision-guided contrastive learning through the joint training of sequence discrimination, clip discrimination, and order verification, mining abundant positive/negative pairs within each sequence while learning inter- and intra-sequence semantic representations. Extensive experiments on the NTU RGB+D 60, UCLA, and iMiGUE datasets show that CdCLR achieves superior performance under various evaluation protocols and reaches the state of the art. Our code is available at https://***/Erich-G/CdCLRI.
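A minimal sketch of the clip-as-instance idea follows, assuming a toy linear encoder and an InfoNCE-style loss where clips from the same sequence act as positives; the real CdCLR additionally optimizes sequence discrimination and order verification:

```python
# Hedged sketch: split skeleton sequences into clips, embed each clip, and
# contrast clips so that same-sequence clips attract and others repel.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(seqs, encoder, clips_per_seq=4, tau=0.1):
    B, T, D = seqs.shape
    # Consecutive, equal-length clips become the new contrastive instances.
    clips = seqs.reshape(B * clips_per_seq, T // clips_per_seq, D)
    z = F.normalize(encoder(clips.mean(dim=1)), dim=1)   # (B*C, d) embeddings
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    # Positives: clips originating from the same source sequence.
    ids = torch.arange(B).repeat_interleave(clips_per_seq)
    pos = (ids[:, None] == ids[None, :]) & ~torch.eye(len(ids), dtype=torch.bool)
    log_p = sim.log_softmax(dim=1)
    return -(log_p[pos]).mean()

encoder = torch.nn.Linear(75, 128)   # toy stand-in for a skeleton encoder
seqs = torch.randn(8, 64, 75)        # 8 sequences, 64 frames, 25 joints x 3
print(clip_contrastive_loss(seqs, encoder).item())
```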
ISBN (Print): 9798400707674
Deep learning models perform exceptionally well in image processing; however, their growing number of parameters and computational burden makes it difficult to deploy these models on mobile devices. A widely adopted remedy is knowledge distillation, which compresses and optimizes deep learning models. Traditional knowledge distillation trains a student network with soft targets from a teacher network. This paper proposes a Novel Dual Knowledge Distillation (NDKD) network that integrates Convolutional Block Attention Module (CBAM) distillation and feature distillation. Specifically, a new CBAM module and a hash layer are first added to improve the teacher-student network. Next, an attention distillation loss is computed to transfer CBAM knowledge from the teacher to the student network, enhancing the student's ability to capture low-level visual features. Finally, a feature distillation loss is constructed over the hash layers to transfer semantic knowledge from the teacher to the student network, enhancing the student's high-level semantic ability. Our method is compared with other knowledge distillation methods on benchmark datasets including CIFAR-10, SVHN, EMNIST, and the tire pattern image dataset CIIP-TPID. Experimental results show that our method outperforms the other methods on these datasets.
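A simplified sketch of a dual distillation objective in this spirit is shown below, with a CBAM-like spatial attention term and a hash-feature term. The attention pooling, the tanh "soft hash" relaxation, and the loss weights are assumptions, not NDKD's exact formulation:

```python
# Toy dual distillation loss: align spatial attention maps (low-level) and
# hash-layer embeddings (high-level semantics) between teacher and student.
import torch
import torch.nn.functional as F

def spatial_attention(feat):                 # (B, C, H, W) -> (B, 1, H, W)
    """Channel-pooled map, in the spirit of CBAM's spatial branch."""
    return torch.sigmoid(feat.mean(dim=1, keepdim=True) +
                         feat.amax(dim=1, keepdim=True))

def dual_kd_loss(t_feat, s_feat, t_emb, s_emb, alpha=1.0, beta=0.5):
    # Attention distillation: student mimics the teacher's attention map.
    att = F.mse_loss(spatial_attention(s_feat), spatial_attention(t_feat))
    # Feature distillation through hash-like codes (tanh soft binarization).
    feat = F.mse_loss(torch.tanh(s_emb), torch.tanh(t_emb))
    return alpha * att + beta * feat

t_feat, s_feat = torch.randn(4, 256, 8, 8), torch.randn(4, 256, 8, 8)
t_emb, s_emb = torch.randn(4, 64), torch.randn(4, 64)
print(dual_kd_loss(t_feat, s_feat, t_emb, s_emb).item())
```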
ISBN (Print): 9781665475938
In this paper, a new predictive wavelet transform (PWT) is proposed for LiDAR point cloud attribute compression. Our method combines predictive coding with the Haar wavelet transform. Based on spatial information, a hierarchical predictive transform tree is designed to represent irregular 3D data points efficiently. Each node at every level is classified as a predictive node (P-node) or a transform node (T-node) according to the distances to its adjacent nodes. Then, in a top-down coding process, the Haar transform is applied to all T-node pairs and predictive coding is applied to all P-nodes alternately. Experimental results show that the proposed PWT method offers better rate-distortion (R-D) performance than state-of-the-art methods.
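A toy 1D sketch of the P-node/T-node split follows, assuming a made-up distance threshold: close neighbor pairs get an unnormalized Haar average/difference, while isolated points are coded predictively against the previous reconstructed value. This is an interpretation of the classification rule, not the paper's tree construction:

```python
# Toy P-node / T-node encoder over a 1D attribute signal with positions.
import numpy as np

def encode(attrs, positions, dist_thresh=1.5):
    out, prev, i = [], 0.0, 0
    while i < len(attrs):
        near = (i + 1 < len(attrs)
                and abs(positions[i + 1] - positions[i]) < dist_thresh)
        if near:  # T-node pair: unnormalized Haar average / difference
            a, b = attrs[i], attrs[i + 1]
            out.append(("T", (a + b) / 2.0, b - a))
            prev, i = (a + b) / 2.0, i + 2
        else:     # P-node: predict from the previous value, code the residual
            out.append(("P", attrs[i] - prev))
            prev, i = attrs[i], i + 1
    return out

attrs = np.array([10.0, 11.0, 30.0, 31.0, 80.0])
positions = np.array([0.0, 1.0, 5.0, 6.0, 20.0])
print(encode(attrs, positions))
# Close pairs become (mean, diff) coefficients; the isolated point
# becomes a prediction residual.
```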
ISBN (Print): 9781665475938
In this paper, we propose a method to skip unnecessary CU encoding modes for VVC intra coding based on the extra trees model. Two extra trees models with calculated features are used to simplify the encoding process: the first model determines whether to terminate the partition early and which partition direction is best, and the second model selects the better partition mode between the binary and ternary partition modes. Experimental results show that our proposed method saves 34.68% to 46.70% of encoding time with only a 0.81% to 1.65% increase in BDBR compared to the VVC reference software (VTM 10.0). Moreover, the method achieves a good trade-off when applied to VVenC 1.0, an efficient VVC encoder, at both the slower and medium presets.
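Conceptually, the two-model pipeline might look like the following scikit-learn sketch. The features, labels, and mode names are placeholders, since the real features are computed from CU statistics inside the encoder:

```python
# Conceptual two-stage mode skipping with extra trees classifiers.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))           # stand-in CU features
y_stage1 = rng.integers(0, 3, 1000)      # 0: terminate, 1: horizontal, 2: vertical
y_stage2 = rng.integers(0, 2, 1000)      # 0: binary split, 1: ternary split

model1 = ExtraTreesClassifier(n_estimators=50).fit(X, y_stage1)
model2 = ExtraTreesClassifier(n_estimators=50).fit(X, y_stage2)

def candidate_modes(cu_features):
    """Return the reduced set of partition modes the encoder should try."""
    stage1 = model1.predict(cu_features[None])[0]
    if stage1 == 0:
        return ["no_split"]              # early termination of partitioning
    direction = "hor" if stage1 == 1 else "ver"
    split = "bt" if model2.predict(cu_features[None])[0] == 0 else "tt"
    return [f"{split}_{direction}"]      # e.g. skip all other partition modes

print(candidate_modes(X[0]))
```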
ISBN (Print): 9781665475938
A set of autoencoders is trained to perform intra prediction for block-based video coding. Each autoencoder consists of an encoding network and a decoding network. The encoding and decoding networks are jointly optimized and integrated into the state-of-the-art VVC reference software VTM-11.0 as an additional intra prediction mode. The simulation is conducted under common test conditions with the all-intra configuration, and the test results show Bjøntegaard delta bit-rate savings of 1.55%, 1.04%, and 0.99% for the Y, U, and V components, respectively, compared to the VTM-11.0 anchor. The overall relative decoding running time of the proposed autoencoder-based intra prediction mode on top of VTM-11.0 is 408% compared to VTM-11.0.
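A minimal PyTorch sketch of such an autoencoder-based intra mode is given below, assuming the input is a flattened vector of reconstructed neighboring samples and the output is an 8x8 block prediction; the actual networks integrated into VTM-11.0 are more elaborate:

```python
# Sketch: encoding network maps reconstructed reference samples around the
# block to a latent code; decoding network emits the block prediction.
import torch
import torch.nn as nn

class IntraAutoencoder(nn.Module):
    def __init__(self, ref_len=33, block=8, latent=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(ref_len, 64), nn.ReLU(),
                                 nn.Linear(64, latent))          # encoding net
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, block * block))   # decoding net
        self.block = block

    def forward(self, refs):             # (B, ref_len) neighboring samples
        pred = self.dec(self.enc(refs))
        return pred.view(-1, self.block, self.block)

model = IntraAutoencoder()
refs = torch.rand(1, 33)                 # placeholder: above row + left column
pred = model(refs)
print(pred.shape)                        # torch.Size([1, 8, 8])
# Trained end to end; inside the encoder such a mode would then compete in
# rate-distortion cost against the conventional intra prediction modes.
```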