Versatile Video Coding (VVC) now supports Screen Content Coding (SCC) by integrating two efficient coding modes: Intra Block Copy (IBC) and Palette (PLT). However, the numerous modes and the Quad-Tree plus Multi-Type Tree (QTMT) partitioning structure inherent to VVC result in very high coding complexity. To effectively reduce the computational complexity of VVC SCC, we propose a fast Intra mode prediction algorithm. More specifically, we first use the difference between the minimum Sum of Absolute Transformed Differences (SATD) cost of the four Intra Directional Modes (DMs) and the SATD cost of the IBC-merge mode to decide whether to skip Intra checking early. Subsequently, we use a decision tree to decide whether to terminate checking early after block differential pulse coded modulation (BDPCM). Finally, we employ a decision tree to decide whether to skip multiple transform selection (MTS) and low-frequency non-separable transform (LFNST) checking. The results demonstrate that our algorithm achieves an average encoding time reduction of 34.34% with a negligible Bjøntegaard delta bitrate increase of 0.46%.
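As a rough, non-authoritative illustration of the first step, the Python sketch below shows how the gap between the best DM SATD cost and the IBC-merge SATD cost could drive the early-skip decision. The helper name `should_skip_intra`, the threshold, and the sample SATD values are assumptions for illustration only, not details from the paper or the VTM reference software.

```python
# Minimal sketch of the Intra early-skip test described above (step 1).
# The threshold and all names/values here are illustrative assumptions.

def should_skip_intra(dm_satd_costs, ibc_merge_satd, threshold):
    """Decide whether full Intra mode checking can be skipped.

    dm_satd_costs  : SATD costs of the four Intra Directional Modes (DMs)
    ibc_merge_satd : SATD cost of the IBC-merge candidate
    threshold      : tuning parameter controlling how aggressive the skip is
    """
    min_dm_satd = min(dm_satd_costs)
    # If IBC-merge is already much cheaper than the best directional mode,
    # checking the remaining Intra modes is unlikely to win the RD decision.
    return (min_dm_satd - ibc_merge_satd) > threshold

# Example usage with made-up SATD values:
print(should_skip_intra([5200, 4800, 5100, 4950], 3600, 800))  # -> True
```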
ISBN (digital): 9798350368741; ISBN (print): 9798350368758
Rate control is a critical component of image and video compression. Particularly under limited network bandwidth, bitrate control is essential for efficient image transmission through effective allocation of channel resources. In this work, since both channel-wise and spatial-wise characteristics are related to rate allocation, we first propose a joint channel-wise and spatial-wise quantization scheme to determine optimal quantization parameters. Subsequently, we develop a quantization step estimation network that produces the parameters needed to allocate the rate efficiently according to the target rate. Experiments demonstrate that our algorithm significantly improves compressed image quality with minimal bitrate distortion and achieves accurate rate control with an average bitrate error of nearly 3%.
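As a hedged sketch of what a joint channel-wise and spatial-wise quantization step might look like for a learned-codec latent, the Python snippet below scales a latent tensor by a per-channel step modulated by a spatial map before rounding. The shapes, the `joint_quantize` helper, and the toy step values stand in for the outputs of the paper's quantization step estimation network, which are not specified in the abstract.

```python
import numpy as np

# Illustrative sketch of joint channel-wise and spatial-wise quantization of a
# latent tensor y with shape (C, H, W). The per-channel steps q_c and the
# spatial modulation map q_s are assumptions, not the paper's learned outputs.

def joint_quantize(y, q_c, q_s):
    """Quantize latents with a combined channel/spatial step.

    y   : latent tensor, shape (C, H, W)
    q_c : per-channel quantization steps, shape (C,)
    q_s : per-position modulation map, shape (H, W)
    """
    step = q_c[:, None, None] * q_s[None, :, :]   # broadcast to (C, H, W)
    y_hat = np.round(y / step) * step             # quantize and dequantize
    return y_hat

rng = np.random.default_rng(0)
y = rng.normal(size=(8, 4, 4))
q_c = np.linspace(0.5, 2.0, 8)     # coarser steps for less important channels
q_s = np.ones((4, 4))              # uniform spatial allocation in this toy case
print(joint_quantize(y, q_c, q_s).shape)  # (8, 4, 4)
```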
Group re-identification (G-ReID) aims to re-identify a group of people observed by non-overlapping camera systems. The existing literature has mainly addressed the RGB-based problem, while the RGB-infrared (RGB-IR) cross-modality matching problem has not yet been studied. In this paper, we propose a metric learning method, Closest Permutation Matching (CPM), for RGB-IR G-ReID. We model each group as a set of single-person features extracted by MPANet, and then propose the Closest Permutation Distance (CPD) metric to measure the similarity between two sets of features. CPD is invariant to changes in the order of group members, so it addresses the layout change problem in G-ReID. Furthermore, we introduce the problem of G-ReID without person labels. For this weakly supervised case, we design a Relation-aware Module (RAM) that exploits visual context and relations among group members to produce a modality-invariant order of features within each group, with which the group member features can be sorted to form a group representation that is robust to modality change. To support the study of RGB-IR G-ReID, we construct a new large-scale RGB-IR G-ReID dataset, CM-Group. The dataset contains 15,440 RGB images and 15,506 infrared images of 427 groups and 1,013 identities. Extensive experiments on the new dataset demonstrate the effectiveness of the proposed models and the complexity of CM-Group. The code and dataset are available at: https://***/WhollyOat/CM-Group.
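The abstract does not give the exact form of CPD, so the sketch below shows one natural permutation-invariant set distance in the same spirit: the cost of the best one-to-one matching between the member features of two groups, computed with the Hungarian algorithm. The cosine cost, the equal group sizes, and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Permutation-invariant set distance in the spirit of CPD (assumed form):
# the average cost of the optimal one-to-one matching between group members.

def closest_permutation_distance(feats_a, feats_b):
    """feats_a, feats_b: arrays of shape (n_members, feat_dim)."""
    cost = cdist(feats_a, feats_b, metric="cosine")   # pairwise member costs
    rows, cols = linear_sum_assignment(cost)          # optimal permutation
    return cost[rows, cols].mean()

rng = np.random.default_rng(0)
group_rgb = rng.normal(size=(4, 256))                 # 4 members, one modality
group_ir = group_rgb[[2, 0, 3, 1]] + 0.05 * rng.normal(size=(4, 256))
# Small distance: the second group is the same members, reordered and perturbed.
print(closest_permutation_distance(group_rgb, group_ir))
```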
ISBN (print): 9798331314385
How can we test AI performance? This question seems trivial, but it isn't. Standard benchmarks often have problems such as in-distribution and small-size test sets, oversimplified metrics, unfair comparisons, and short-term outcome pressure. As a consequence, good performance on standard benchmarks does not guarantee success in real-world scenarios. To address these problems, we present Touchstone, a large-scale collaborative segmentation benchmark of 9 types of abdominal organs. This benchmark is based on 5,195 training CT scans from 76 hospitals around the world and 5,903 testing CT scans from 11 additional hospitals. This diverse test set enhances the statistical significance of benchmark results and rigorously evaluates AI algorithms across out-of-distribution scenarios. We invited 14 inventors of 19 AI algorithms to train their algorithms, while our team, as a third party, independently evaluated these algorithms. In addition, we evaluated pre-existing AI frameworks, which, unlike individual algorithms, are more flexible and can support different algorithms; these include MONAI from NVIDIA, nnU-Net from DKFZ, and numerous other open-source frameworks. We are committed to expanding this benchmark to encourage more innovation of AI algorithms for the medical domain.
Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users’ viewing experience in various real-world video-enabled media applications. As an experimental field, the im...
Due to multi-layer encoding and inter-layer prediction, Spatial Scalable High-Efficiency Video Coding (SSHVC) has extremely high coding complexity. Improving its coding speed is therefore crucial to promote widespread and cost-effective SSHVC applications. In this paper, we propose a novel Mode Selection-Based Fast Intra Prediction algorithm for SSHVC. We reveal that the RD costs of the Inter-layer Reference (ILR) mode and the Intra mode differ significantly, and that the RD costs of these two modes follow Gaussian distributions. Based on this observation, we propose to apply the classic Gaussian Mixture Model and Expectation Maximization from machine learning to determine whether ILR is the best mode, so that the Intra mode check can be skipped. Experimental results demonstrate that the proposed algorithm significantly improves coding speed with negligible coding efficiency loss.
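One plausible reading of this decision rule is sketched below: fit a two-component Gaussian mixture to ILR RD costs collected from previously coded CUs, then skip the Intra check when a new CU is confidently assigned to the low-cost component where ILR tends to win. The synthetic costs, the single RD-cost feature, and the 0.9 posterior threshold are assumptions rather than details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch of a GMM/EM-based early decision (assumed reading of the abstract).
rng = np.random.default_rng(0)
# Synthetic ILR RD costs from previously coded CUs: a low-cost cluster where
# ILR wins and a high-cost cluster where Intra tends to win.
ilr_rd_costs = np.concatenate([rng.normal(2e3, 3e2, 500),
                               rng.normal(8e3, 1e3, 500)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(ilr_rd_costs)
ilr_best_comp = int(np.argmin(gmm.means_))      # component with lower mean cost

def skip_intra(ilr_rd_cost, posterior_thr=0.9):
    """Skip the Intra check if the CU is confidently in the 'ILR wins' cluster."""
    post = gmm.predict_proba([[ilr_rd_cost]])[0, ilr_best_comp]
    return post > posterior_thr

print(skip_intra(1.8e3), skip_intra(7.5e3))     # -> True False
```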
Establishing reliable correspondences between two sets of feature points is a critical preprocessing step in many computer vision and pattern recognition tasks. In this paper, we propose a novel robust Local Neighbor ...
With the idea of divide and rule, there exist two different forms of semantic features flowing in two-stage instance segmentation paradigms. They are the global features at the image level and the instance feature...
Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse ar...
ISBN (print): 9781665429825
Alzheimer’s disease (AD) is one of the major causes of dementia and is characterized by slow progression over several years. There have been efforts to identify the risk of developing AD at its earliest stage. Recently, multi-task feature learning (MTFL) methods with the sparsity-inducing $\ell_{2,1}$-norm have been widely studied for selecting a discriminative feature subset from MRI features. However, they ignore the complex relationships among imaging markers and among cognitive outcomes, and constructing these relationships with a simple Pearson correlation coefficient may degrade model generalizability. To better capture the complicated yet flexible relationship between cognitive scores and neuroimaging measures, we propose a two-stage framework that jointly learns the structure within the feature correlations as well as within the task correlations. Moreover, we propose a dual graph regularization to encode the learned correlation structure, which guides the training of MTFL by incorporating both inherent correlations. Extensive results on benchmark datasets show that the proposed FTSMTFL model trained with the dual graph regularization outperforms existing methods and achieves state-of-the-art cognitive prediction performance for AD.
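For concreteness, a generic dual-graph-regularized MTFL objective consistent with this description can be written as

$$
\min_{W}\ \|XW - Y\|_F^2 \;+\; \lambda \|W\|_{2,1} \;+\; \gamma_1\, \operatorname{tr}\!\big(W^{\top} L_f W\big) \;+\; \gamma_2\, \operatorname{tr}\!\big(W L_t W^{\top}\big),
$$

where $X \in \mathbb{R}^{n \times d}$ holds the MRI features, $Y \in \mathbb{R}^{n \times t}$ the cognitive scores, $W \in \mathbb{R}^{d \times t}$ the regression weights, and $L_f \in \mathbb{R}^{d \times d}$, $L_t \in \mathbb{R}^{t \times t}$ are graph Laplacians built from the learned feature and task correlation structures, with trade-off hyperparameters $\lambda$, $\gamma_1$, $\gamma_2$. This is only a plausible form; the exact FTSMTFL objective and its Laplacian construction are not specified in the abstract.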