In this golden age of multimedia, realistic content is in high demand, with users seeking more immersive and interactive experiences. As a result, new image modalities for 3D representations have emerged in recent years, among which point clouds have received special attention. Naturally, with this increase in demand, efficient storage and transmission became a must, with standardization groups such as MPEG and JPEG entering the scene, as happened before with other types of visual media. In a surprising development, JPEG issued a Call for Proposals on point cloud coding targeting exclusively learning-based solutions, in parallel with a similar call for image coding. This is a natural consequence of the growing popularity of deep learning, which, due to its excellent performance, is currently dominant in the multimedia processing field, including coding. This article presents the coding solution selected by JPEG as the best-performing response to the Call for Proposals and adopted as the first version of the JPEG Pleno point cloud coding Verification Model, in practice the first step in developing a standard. The proposed solution offers a novel joint geometry and color approach for point cloud coding, in which a single deep learning model processes geometry and color simultaneously. To maximize the rate-distortion (RD) performance for a large range of point clouds, the proposed solution uses down-sampling and learning-based super-resolution as pre- and post-processing steps. Compared to the MPEG point cloud coding standards, the proposed coding solution comfortably outperforms G-PCC for geometry, color, and joint quality metrics.
Efficient point cloud coding has become increasingly critical for multiple applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations can make a functional difference. Deep learning has emerged as a powerful tool in this domain, offering advanced techniques for compressing point clouds more efficiently than conventional coding methods while also allowing computer vision tasks to be performed effectively in the compressed domain, thus making available, for the first time, a common compressed visual representation effective for both humans and machines. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based point cloud coding (PCC) standard, which offers efficient lossy coding of static point clouds and targets both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded using the JPEG AI standard, which is also learning-based. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state-of-the-art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, achieving significant rate reductions. Color compression performance is less competitive, but this is offset by the power of a fully learning-based coding framework for both geometry and color and the associated effective compressed domain processing.
ISBN (print): 9781728198354
In this paper, a stability analysis of the JPEG Pleno Learning-based point cloud coding Verification Model (VmUC) is performed. The codec is a deep learning-based solution that is able to compress both color and geometry. Three different training sessions were conducted using the default training set and cost function, and six point clouds were encoded and decoded with the resulting operating points for six target distortion/bitrate ratios. The VmUC performance was compared with the MPEG codecs V-PCC and G-PCC, considering three objective metrics, namely PSNR MSE D1, PSNR MSE D2, and PCQM. PSNR MSE D1 was also computed at each training epoch for the six decoded point clouds. It is concluded that the VmUC is able to outperform G-PCC and V-PCC in geometry encoding. However, it is outperformed by V-PCC in color encoding across all three training sessions. Furthermore, it is also shown that the codec does not present a high level of stability, as its performance changes considerably between training sessions.
Author: Pereira, Fernando (Univ Lisbon, Inst Super Tecn, Inst Telecomunicacoes, Av Rovisco Pais, P-1049001 Lisbon, Portugal)
ISBN (print): 9781450392037
The recent advances in visual data acquisition and consumption have led to the emergence of the so-called plenoptic visual models, where point clouds (PCs) are playing an increasingly important role. Point clouds are a 3D visual model in which the visual scene is represented through a set of points and associated attributes, notably color. To offer realistic and immersive experiences, point clouds need to have millions, or even billions, of points, thus requiring efficient representation and coding solutions. This is critical for emerging applications and services, notably virtual and augmented reality, personal communications and meetings, education and medical applications, and virtual museum tours. The point cloud coding field has received many contributions in recent years, notably adopting deep learning-based approaches, and it is critical for the future of immersive media experiences. In this context, the key objective of this tutorial is to review the most relevant point cloud coding solutions available in the literature, with a special focus on deep learning-based solutions and their specific novel features. Special attention will be dedicated to the ongoing standardization projects in this domain, notably in JPEG and MPEG.
As the interest in deep learning tools continues to rise, new multimedia research fields begin to discover their potential. Both image and point cloud coding are good examples of technologies where deep learning-based solutions have recently displayed very competitive performance. In this context, this article brings two novel contributions to the point cloud geometry coding state-of-the-art: first, a novel neighborhood adaptive distortion metric to be used in the training loss function, which significantly improves the rate-distortion performance with commonly used objective quality metrics; second, an explicit quantization approach at training and coding time to generate varying rate/quality with a single trained deep learning coding model, effectively reducing the training complexity and storage requirements. The result is an improved deep learning-based point cloud geometry coding solution, which is both more compression efficient and less demanding in training complexity and storage.
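The idea of obtaining several rate/quality points from one trained model through explicit quantization can be sketched as follows. This is a generic illustration under stated assumptions, not the article's method: uniform scalar quantization with a hypothetical `step` parameter is applied to a list of latent values, and the empirical symbol entropy stands in for the coded rate.

```python
import math
from collections import Counter


def quantize(latents, step):
    """Map each latent value to an integer symbol using a uniform step size."""
    return [round(v / step) for v in latents]


def dequantize(symbols, step):
    """Reconstruct approximate latent values from integer symbols."""
    return [s * step for s in symbols]


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol, used here as a rough rate proxy."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    latents = [0.1, 0.9, 1.6, 2.4, -0.7, -1.2]  # made-up example values
    for step in (1.0, 4.0):
        symbols = quantize(latents, step)
        print(step, symbols, empirical_entropy_bits(symbols))
```

A larger step merges more latent values into the same symbol, which lowers the rate proxy and increases the reconstruction error, so varying the step at coding time moves along the rate/quality curve without retraining.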
Efficient coding and streaming of 360-degree video and point cloud video are critical for the continued development of lifelike virtual reality (VR) experiences. Interactive 360-degree video applications, e.g. video conferencing, require an extremely low delay in video delivery and robustness to both network dynamics and field of view (FoV) prediction errors. We propose a frame-level FoV-adaptive coding structure that varies the bit rates for different regions of a coded frame based on the predicted FoV. Integrating such frame-level FoV adaptation with temporal predictive coding is challenging due to the temporal variations of the FoV. We propose novel ways of modeling the influence of FoV dynamics on the quality-rate performance of temporal predictive coding. Compared with other benchmark systems, our system shows significantly improved rendered video quality, while achieving very low end-to-end delay and low frame-freeze probability. Octree-based point cloud representation and compression have been adopted by the MPEG G-PCC standard. However, the standard only uses handcrafted methods to predict the probability that a leaf node is non-empty, which is used for entropy coding. We propose a 3D convolution-based machine learning model to predict such probabilities for geometry coding using the context information from the previously coded and currently coded octree levels. We further propose a convolution-based model to upsample the decoded point cloud at a coarse resolution on the decoder side. Integration of the two approaches significantly improves the octree-based geometry coding performance. A key advantage of our work over prior related studies is that our octree-based entropy coding model is naturally scalable. This benefits the future design of point cloud streaming systems.
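To make the octree representation mentioned in the abstract above concrete, the sketch below computes per-level occupancy bytes for a voxelized point cloud; these are the symbols whose probabilities an entropy coder would need to predict. The function name, the bit ordering of the eight children, and the ordering of nodes within a level are assumptions chosen for illustration and are not taken from G-PCC or from the abstract.

```python
def octree_occupancy(points, depth):
    """Return one list of occupancy bytes per octree level.

    points: integer (x, y, z) tuples with coordinates in [0, 2**depth).
    Each byte has bit i set when child i of that node contains a point.
    Assumed child index: (x_bit << 2) | (y_bit << 1) | z_bit.
    """
    levels = []
    for level in range(depth):
        shift = depth - level - 1  # log2 of the child voxel size at this level
        occupancy = {}
        for x, y, z in points:
            parent = (x >> (shift + 1), y >> (shift + 1), z >> (shift + 1))
            child = ((((x >> shift) & 1) << 2)
                     | (((y >> shift) & 1) << 1)
                     | ((z >> shift) & 1))
            occupancy[parent] = occupancy.get(parent, 0) | (1 << child)
        # Emit nodes in sorted coordinate order (an arbitrary but fixed choice).
        levels.append([occupancy[key] for key in sorted(occupancy)])
    return levels


if __name__ == "__main__":
    print(octree_occupancy([(0, 0, 0), (3, 3, 3)], depth=2))
```

Because each level refines the previous one, decoding only the first few levels already yields a coarse version of the geometry, which is the sense in which an octree-based representation is naturally scalable.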
ISBN (print): 9798350387261; 9798350387254
In the ever-evolving landscape of deep learning, attention models have contributed to boosting performance in diverse fields such as computer vision and natural language processing. Following this trend, this paper proposes a novel Relational Neighborhood Self-Attention (RNSA) model, specifically designed for point cloud (PC) geometry coding and intended for integration in the emerging learning-based JPEG PCC standard. The RNSA model proposes three new methods: first, correlations between points are learned effectively by capturing the relational features and positions of neighboring points; second, to address the inefficiencies of conventional dot product attention, a novel Relational Scoring method is adopted to generate an attention map able to capture both linear and non-linear relationships between points and their neighbors; third, the created attention maps are normalized by Sparsemax instead of Softmax to generate sparse probabilities that assign higher scores to the most important neighbors while marginalizing the less significant ones. Experimental results show that the proposed attention model achieves around 8% gains in both BD-Rate PSNR D1 and PSNR D2 compared to the baseline codec, i.e., JPEG PCC, while adding a small number of model parameters to JPEG PCC.
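The difference between Softmax and Sparsemax normalization mentioned in the abstract above can be seen in a small sketch. This implements the standard Sparsemax projection onto the probability simplex for a single score vector; it says nothing about how the RNSA model computes its scores, and the example values are made up.

```python
import math


def softmax(scores):
    """Dense normalization: every output is strictly positive."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of scores onto the probability simplex.

    Low scores are mapped to exactly zero, giving sparse probabilities.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        # Entry k stays in the support while 1 + k * z_k exceeds the running sum.
        if 1 + k * zk > cumsum:
            tau = (cumsum - 1) / k
    return [max(s - tau, 0.0) for s in scores]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]  # hypothetical attention scores for three neighbors
    print(softmax(scores))
    print(sparsemax(scores))
```

With these scores, Softmax spreads some probability over all three neighbors, whereas Sparsemax assigns the full weight to the highest-scoring neighbor and exactly zero to the others.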
ISBN (print): 9781728198354
Point clouds represent one of the most versatile 3D visual representation models, as they can provide the user with the six degrees of freedom required for a truly immersive experience. In the last decade, several point cloud coding solutions have been proposed using distinct approaches, notably two MPEG standards addressing static and dynamic point cloud coding. More recently, learning-based coding approaches have also started to be considered for point cloud coding. The performance of these solutions has been so competitive that JPEG has already decided to develop a point cloud coding standard adopting this novel approach. This paper proposes the first learning-based rate control mechanism to minimize the complexity associated with the selection of appropriate coding parameters for the learning-based point cloud geometry codec adopted as the initial Verification Model for the development of the JPEG Pleno Learning-based point cloud coding standard.
Multimedia applications have been evolving towards providing users with more immersive and realistic experiences. A common way to model the light available for the users' eyes is the so-called plenoptic function -...
Humans mainly communicate among themselves and with the world around them using light and vision, which implies that visual representation technologies play a central role in human societies. While visual representation has been based on the 2D representation paradigm for many decades, multiple developments are nowadays pressing towards the adoption of more realistic and immersive 3D visual representation models. Point clouds are one of these emerging representation models. However, the huge amount of data involved calls for highly efficient coding solutions, some of which have recently started to be developed by the MPEG and JPEG standardization groups. In this hectic context, this paper proposes a privileged view over the current point cloud coding technologies, driven by a novel, appropriate classification taxonomy. For this purpose, some of the most representative point cloud coding solutions available in the literature will be reviewed to exercise the most relevant classification paths in the proposed taxonomy. It is expected that this type of classification taxonomy and privileged view may help to better understand the point cloud coding landscape for further solid and consistent advancements in this emerging technical area.