In this golden age of multimedia, realistic content is in high demand, with users seeking more immersive and interactive experiences. As a result, new image modalities for 3D representation have emerged in recent years, among which point clouds have deserved special attention. Naturally, with this increase in demand, efficient storage and transmission became a must, with standardization groups such as MPEG and JPEG entering the scene, as happened before with other types of visual media. In a surprising development, JPEG issued a Call for Proposals on point cloud coding targeting exclusively learning-based solutions, in parallel with a similar call for image coding. This is a natural consequence of the growing popularity of deep learning, which, due to its excellent performance, is currently dominant in the multimedia processing field, including coding. This article presents the coding solution selected by JPEG as the best-performing response to the Call for Proposals and adopted as the first version of the JPEG Pleno point cloud coding Verification Model, in practice the first step in developing a standard. The proposed solution offers a novel joint geometry and color approach for point cloud coding, in which a single deep learning model processes both geometry and color simultaneously. To maximize the RD performance for a large range of point clouds, the proposed solution uses down-sampling and learning-based super-resolution as pre- and post-processing steps. Compared to the MPEG point cloud coding standards, the proposed coding solution comfortably outperforms G-PCC for geometry, color, and joint quality metrics.
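The down-sampling pre-processing step mentioned in this abstract can be illustrated with a minimal sketch. It assumes voxelized integer coordinates and an integer `factor`; the function names are hypothetical, and the naive up-sampling below only stands in for the learning-based super-resolution, which is not reproduced here.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[int, int, int]


def downsample(points: Iterable[Point], factor: int) -> Set[Point]:
    """Merge all voxels that fall in the same factor^3 block (assumed scheme)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return {(x // factor, y // factor, z // factor) for x, y, z in points}


def naive_upsample(points: Iterable[Point], factor: int) -> Set[Point]:
    """Placeholder for super-resolution: rescale coordinates, add no new points."""
    return {(x * factor, y * factor, z * factor) for x, y, z in points}


cloud = {(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)}
coarse = downsample(cloud, 2)         # fewer points to code
restored = naive_upsample(coarse, 2)  # original grid scale, lower density
```

The restored cloud has the original scale but fewer points, which is the gap a learned super-resolution stage would be expected to fill.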
Efficient point cloud coding has become increasingly critical for multiple applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations may functionally make the difference. Deep learning has emerged as a powerful tool in this domain, offering advanced techniques for compressing point clouds more efficiently than conventional coding methods. It also allows computer vision tasks to be performed effectively in the compressed domain, thus, for the first time, making available a common compressed visual representation that is effective for both humans and machines. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based point cloud coding (PCC) standard, offering efficient lossy coding of static point clouds and targeting both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded using the JPEG AI standard, which is also learning-based. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state of the art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, achieving significant rate reductions. Color compression performance is less competitive, but this is offset by the power of a fully learning-based coding framework for both geometry and color and the associated effective compressed domain processing.
ISBN: 9781728198354 (print)
In this paper, a stability analysis of the JPEG Pleno Learning-based point cloud coding Verification Model (VmUC) is performed. The codec is a deep learning-based solution that is able to compress both color and geometry. Three different training sessions were conducted using the default training set and cost function, and six point clouds were encoded/decoded with the resulting operating points for six target distortion/bitrate ratios. The VmUC performance was compared with the MPEG codecs V-PCC and G-PCC, considering three objective metrics, notably PSNR MSE D1, PSNR MSE D2, and PCQM. PSNR MSE D1 was also computed at each training epoch for the six decoded point clouds. It is concluded that the VmUC is able to outperform G-PCC and V-PCC in geometry encoding. However, it is outperformed by V-PCC in color encoding across all three training sessions. Furthermore, it is also shown that the codec does not present a high level of stability, as its performance changes considerably between training sessions.
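For readers unfamiliar with the PSNR MSE D1 (point-to-point) metric named above, the following is a minimal brute-force sketch. The symmetric maximum of the two one-way errors, the `peak` value, and the default `factor` of 3 are assumed conventions; the abstract does not state the exact configuration used, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _mse_one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean squared distance from each src point to its nearest dst point."""
    total = 0.0
    for p in src:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
    return total / len(src)


def d1_mse(ref: Sequence[Point], dec: Sequence[Point]) -> float:
    """Symmetric point-to-point error: the worse of the two directions."""
    return max(_mse_one_way(ref, dec), _mse_one_way(dec, ref))


def d1_psnr(ref: Sequence[Point], dec: Sequence[Point],
            peak: float, factor: float = 3.0) -> float:
    """PSNR in dB; `peak` and `factor` are assumed normalization choices."""
    mse = d1_mse(ref, dec)
    if mse == 0.0:
        return math.inf  # identical geometry
    return 10.0 * math.log10(factor * peak ** 2 / mse)


reference = [(0, 0, 0), (1, 0, 0)]
decoded = [(0, 0, 0), (1, 0, 1)]
error = d1_mse(reference, decoded)
```

The nearest-neighbor search here is quadratic in the number of points; a practical tool would use a spatial index such as a k-d tree.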
Author: Pereira, Fernando (Univ Lisbon, Inst Super Tecn, Inst Telecomunicacoes, Av Rovisco Pais, P-1049001 Lisbon, Portugal)
ISBN: 9781450392037 (print)
The recent advances in visual data acquisition and consumption have led to the emergence of the so-called plenoptic visual models, where point clouds (PCs) are playing an increasingly important role. A point cloud is a 3D visual model where the visual scene is represented through a set of points and associated attributes, notably color. To offer realistic and immersive experiences, point clouds need to have millions, or even billions, of points, thus calling for efficient representation and coding solutions. This is critical for emerging applications and services, notably virtual and augmented reality, personal communications and meetings, education and medical applications, and virtual museum tours. The point cloud coding field has received many contributions in recent years, notably adopting deep learning-based approaches, and it is critical for the future of immersive media experiences. In this context, the key objective of this tutorial is to review the most relevant point cloud coding solutions available in the literature, with a special focus on deep learning-based solutions and their specific novel features. Special attention will be dedicated to the ongoing standardization projects in this domain, notably in JPEG and MPEG.
As the interest in deep learning tools continues to rise, new multimedia research fields begin to discover their potential. Both image and point cloud coding are good examples of technologies where deep learning-based solutions have recently displayed very competitive performance. In this context, this article brings two novel contributions to the point cloud geometry coding state of the art: first, a novel neighborhood adaptive distortion metric to be used in the training loss function, which significantly improves the rate-distortion performance with commonly used objective quality metrics; second, an explicit quantization approach at training and coding time to generate varying rate/quality with a single trained deep learning coding model, effectively reducing the training complexity and storage requirements. The result is an improved deep learning-based point cloud geometry coding solution, which is both more compression efficient and less demanding in training complexity and storage.
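The idea of obtaining several rate/quality points from one trained model through explicit quantization can be sketched as follows. This is an illustration under assumptions, not the article's method: it applies a uniform scalar quantizer with a hypothetical `step` parameter to a made-up latent vector and uses empirical entropy as a rough proxy for bitrate.

```python
import math
from collections import Counter
from typing import List, Sequence


def quantize(latents: Sequence[float], step: float) -> List[int]:
    """Uniform scalar quantization; a larger step gives coarser symbols."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [round(v / step) for v in latents]


def dequantize(symbols: Sequence[int], step: float) -> List[float]:
    """Map integer symbols back to reconstruction values."""
    return [s * step for s in symbols]


def empirical_entropy(symbols: Sequence[int]) -> float:
    """Bits per symbol under the empirical distribution (a rate proxy)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


latents = [0.2, 1.4, -2.6, 0.9, 3.1, -0.4]  # made-up latent values
fine = quantize(latents, 1.0)    # higher rate, lower distortion
coarse = quantize(latents, 4.0)  # lower rate, higher distortion
```

Changing only `step` moves the operating point along the rate-distortion curve, so a single set of network weights could serve several target rates.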
ISBN: 9798350387261; 9798350387254 (print)
In the ever-evolving landscape of deep learning, attention models have helped boost performance in diverse fields such as computer vision and natural language processing. Following this trend, this paper proposes a novel Relational Neighborhood Self-Attention (RNSA) model, specifically designed for point cloud (PC) geometry coding and to be integrated in the emerging learning-based JPEG PCC standard. The RNSA model proposes three new methods: first, correlations between the points are effectively learned by capturing the relational features and positions of neighboring points; second, to address the inefficiencies of conventional dot product attention, a novel Relational Scoring method is adopted to generate an attention map able to capture both linear and non-linear relationships between points and their neighbors; third, the created attention maps are normalized by Sparsemax instead of Softmax to generate sparse probabilities, assigning higher scores to the most important neighbors while marginalizing the less significant ones. Experimental results show that the proposed attention model achieves around 8% gains in both BD-Rate PSNR D1 and PSNR D2 compared to the baseline codec, i.e., JPEG PCC, while adding a small number of model parameters to JPEG PCC.
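The difference between Softmax and Sparsemax normalization can be seen in a small sketch. It uses the usual definition of Sparsemax as a Euclidean projection onto the probability simplex, applied to made-up attention scores; it does not reproduce the RNSA model or its Relational Scoring.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Dense normalization: every neighbor keeps a non-zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores: Sequence[float]) -> List[float]:
    """Projection onto the simplex: low scores are clipped to exactly zero."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    k, support_sum = 0, 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:  # zj still belongs to the support
            k, support_sum = j, cumsum
    tau = (support_sum - 1.0) / k  # threshold shared by the support
    return [max(s - tau, 0.0) for s in scores]


scores = [1.0, 0.8, 0.1]  # made-up attention scores for three neighbors
dense = softmax(scores)
sparse = sparsemax(scores)
```

With these scores, Sparsemax gives the third neighbor a weight of exactly zero, while Softmax still spreads some probability mass to it.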
ISBN: 9781728198354 (print)
Point clouds represent one of the most versatile 3D visual representation models, as they can provide the user with the six degrees of freedom required for a truly immersive experience. In the last decade, several point cloud coding solutions have been proposed using distinct approaches, notably two MPEG standards addressing static and dynamic point cloud coding. More recently, learning-based coding approaches have also started to be considered for point cloud coding. The performance of these solutions has been so competitive that JPEG has already decided to develop a point cloud coding standard adopting this novel approach. This paper proposes the first learning-based rate control mechanism to minimize the complexity associated with the selection of appropriate coding parameters for the learning-based point cloud geometry codec adopted as the initial Verification Model for the development of the JPEG Pleno Learning-based point cloud coding standard.
Multimedia applications have been evolving towards providing users with more immersive and realistic experiences. A common way to model the light available for the users' eyes is the so-called plenoptic function -...
Humans mainly communicate among themselves and with the world around them using light and vision, which implies that visual representation technologies play a central role in human societies. While visual representation has been based on the 2D representation paradigm for many decades, multiple developments are nowadays pressing towards the adoption of more realistic and immersive 3D visual representation models. Point clouds are one of these emerging representation models. However, the huge amount of data involved calls for highly efficient coding solutions, some of which have recently started to be developed by the MPEG and JPEG standardization groups. In this hectic context, this paper proposes a privileged view over the current point cloud coding technologies, driven by a novel, appropriate classification taxonomy. For this purpose, some of the most representative point cloud coding solutions available in the literature will be reviewed to exercise the most relevant classification paths in the proposed taxonomy. It is expected that this type of classification taxonomy and privileged view may help to better understand the point cloud coding landscape for further solid and consistent advancements in this emerging technical area.
ISBN: 9781728193205 (print)
A surface light-field (SLF) is a mapping of a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic viewpoints in extended reality applications. However, representing an SLF requires significantly more data. Therefore, storing and distributing SLFs requires an efficient compressed representation. The Moving Picture Experts Group (MPEG) has an ongoing standardization activity for the compression of point clouds. Until recently, this activity targeted the compression of single-texture information, but it is now investigating view-dependent textures. In this paper, we propose methods to optimize the coding of view-dependent color without compromising visual quality. Our results show that the optimizations provided in this paper reduce the coded HEVC bit rate by 64% for the all-intra configuration and 52% for the random-access configuration, when compared to coding all textures independently.