ISBN: (Print) 9781728173221
Camera calibration for sport videos enables precise and natural delivery of graphics on video footage and several other special effects. This in turn substantially improves the visual experience for the audience and facilitates sports analysis during or after the live show. In this paper, we propose a high-accuracy camera calibration method for sport videos. First, we generate a homography database by uniformly sampling camera parameters. This database includes more than 91 thousand different homography matrices. Then, we use a conditional generative adversarial network (cGAN) to perform semantic segmentation, splitting the broadcast frames into four classes. In a subsequent processing step, we build an effective feature extraction network to extract features from the semantically segmented images. After that, we search the database for the extracted feature to find the best-matching homography. Finally, we refine the homography by image alignment. In a comprehensive evaluation using the 2014 World Cup dataset, our method outperforms other state-of-the-art techniques.
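The database search step can be pictured as a nearest-neighbor lookup. The following is a minimal sketch, assuming each database entry pairs a feature vector with a homography and that features are compared by Euclidean distance; the abstract does not state the feature dimension or the distance measure, and the names `retrieve_homography` and `toy_database` are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve_homography(query_feature, database):
    """Return the homography whose stored feature is closest to the query.

    `database` is a list of (feature_vector, homography) pairs; this layout
    is an assumption made for illustration, not the paper's data structure.
    """
    _, best_homography = min(
        database, key=lambda entry: l2_distance(query_feature, entry[0])
    )
    return best_homography

# Toy database: three placeholder 3x3 homographies with made-up 2-D features.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
SHIFT_X = [[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
SCALE_2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
toy_database = [
    ([0.0, 0.0], IDENTITY),
    ([1.0, 0.0], SHIFT_X),
    ([0.0, 1.0], SCALE_2),
]

# The query feature lies closest to the second entry.
best = retrieve_homography([0.9, 0.1], toy_database)
```

A linear scan is enough for a toy example; with tens of thousands of entries, an approximate nearest-neighbor index would be a natural substitute.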
This paper presents a learning-based complexity reduction scheme for Versatile Video Coding (VVC) intra-frame prediction. VVC introduces several novel coding tools to improve the coding efficiency of intra-frame prediction at the cost of a high computational effort. Thus, we developed an efficient complexity reduction scheme composed of three solutions based on machine learning and statistical analysis to reduce the number of intra prediction modes evaluated in the costly Rate-Distortion Optimization (RDO) process. Experimental results demonstrate that the proposed solution provides an 18.32% encoding time saving with a negligible impact on coding efficiency.
In recent years, Location Based Services (LBS) have developed rapidly. There is an urgent need for a low-cost and convenient method to obtain precise positions indoors. Therefore, we choose the simpler approach of using a monocular camera mounted on a mobile terminal to achieve indoor visual positioning. In this paper, two main problems are studied under this condition. Firstly, the feature extraction method is designed to minimize the loss of positioning information while processing quickly in textureless environments. Secondly, the relative direction vectors obtained from the epipolar geometry cannot converge at the point to be located due to errors, so a position determination method is proposed to transform the positioning problem into a distance calculation. A WeChat mini program was built for users to obtain location information. Experimental results show that 90% of positioning errors were less than 0.5 m, and the average error was less than 0.32 m.
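One common way to turn non-converging direction vectors into a distance problem is to pick the point with the smallest summed squared distance to all direction lines. The sketch below illustrates that generic least-squares idea in 2-D; it is an assumption for illustration and not necessarily the position determination method of the paper, and the function name `closest_point_to_lines` is hypothetical.

```python
import math

def closest_point_to_lines(origins, directions):
    """Least-squares point nearest to a set of 2-D lines.

    Each line passes through origins[i] with direction directions[i].
    Solves (sum P_i) x = sum P_i p_i, where P_i = I - d_i d_i^T projects
    onto the normal of line i.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(origins, directions):
        norm = math.hypot(dx, dy)
        dx, dy = dx / norm, dy / norm
        # Entries of the symmetric projector P = I - d d^T.
        p11, p12, p22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += p11
        a12 += p12
        a22 += p22
        b1 += p11 * px + p12 * py
        b2 += p12 * px + p22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; position is not determined")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y

# Two reference views at (0, 0) and (4, 0) looking toward the same target.
estimate = closest_point_to_lines(
    origins=[(0.0, 0.0), (4.0, 0.0)],
    directions=[(1.0, 1.0), (-1.0, 1.0)],
)
```

With noise-free directions the lines intersect and the estimate is that intersection; with noisy directions the same formula returns a compromise point.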
Synthetic DNA has received much attention recently as an alternative long-term archival medium due to its high density and durability. However, most current work has primarily focused on using DNA as a precise storage medium. In this work, we take an alternate view of DNA. Using neural-network-based compression techniques, we transform images into a latent-space representation, which we then store on DNA. By doing so, we transform DNA into an approximate image storage medium, as images generated back from DNA are only approximate representations of the original images. Using several datasets, we investigate the storage benefits of approximation, and study the impact of DNA storage errors (substitutions, indels, bias) on the quality of approximation. In doing so, we demonstrate the feasibility and potential of viewing DNA as an approximate storage medium.
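To make the effect of a substitution error concrete, here is a minimal sketch that maps bytes to nucleotides at two bits per base. This simple mapping is an assumption for illustration; the abstract does not describe the encoding actually used, and practical DNA codes add constraints and error correction that are omitted here.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(bases)

def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | NUCLEOTIDES.index(base)
        out.append(byte)
    return bytes(out)

latent = bytes([0x1B, 0xFF, 0x00])   # stand-in for a quantized latent vector
strand = bytes_to_dna(latent)
recovered = dna_to_bytes(strand)

# A single substitution (last base of the first byte: T -> A) alters one byte.
corrupted = dna_to_bytes(strand[:3] + "A" + strand[4:])
```

Under this mapping a substitution corrupts only the byte it falls in, whereas an insertion or deletion would shift every following base.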
Underwater images suffer from low contrast, color distortion and visibility degradation due to light scattering and attenuation. Over the past few years, the importance of underwater image enhancement has increased because of ocean engineering and underwater robotics. Existing underwater image enhancement methods are based on various assumptions. However, it is almost impossible to define appropriate assumptions for underwater images due to their diversity. Therefore, these methods are only effective for specific types of underwater images. Recently, underwater image enhancement algorithms using CNNs and GANs have been proposed, but they are not as advanced as other image processing methods due to the lack of suitable training data sets and the complexity of the issues. To solve these problems, we propose a novel underwater image enhancement method which combines a residual feature attention block with a novel combination of multi-scale and multi-patch structures. The multi-patch network extracts local features to adapt to various underwater images, which are often non-homogeneous. In addition, our network includes a multi-scale network, which is often effective for image restoration. Experimental results show that our proposed method outperforms the conventional method for various types of images.
As an emerging media format, virtual reality (VR) has attracted the attention of researchers. 6-DoF VR can reconstruct the surrounding environment with the help of the depth information of the scene, so as to provide users with an immersive experience. However, due to the lack of depth information in panoramic images, it is still a challenge to convert a panorama to 6-DoF VR. In this paper, we propose a new depth estimation method, SPCNet, based on spherical convolution to solve the problem of depth information restoration for panoramic images. In particular, spherical convolution is introduced to improve depth estimation accuracy by reducing the distortion caused by Equi-Rectangular Projection (ERP). The experimental results show that SPCNet is better than other advanced networks on many indicators. For example, its RMSE is 0.419 lower than that of UResNet. Moreover, the threshold accuracy of depth estimation has also been improved.
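The two indicators named above can be computed as follows. This is a minimal sketch using the definitions common in depth estimation work; the threshold value 1.25 is a widely used convention and an assumption here, since the abstract does not state which thresholds were evaluated.

```python
import math

def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth depths."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))

def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below threshold.

    Depths are assumed to be strictly positive.
    """
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)

# Flattened toy depth maps in meters (made-up values).
predicted = [1.0, 2.0, 4.0]
ground_truth = [1.0, 2.0, 2.0]

error = rmse(predicted, ground_truth)
accuracy = threshold_accuracy(predicted, ground_truth)
```

Lower RMSE is better, while higher threshold accuracy is better, which is why the abstract reports a decrease in the former and an improvement in the latter.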
Video chat is becoming more and more popular in our daily life. However, providing high-quality video chat with limited bandwidth is a key challenge. In this paper, going beyond the state-of-the-art video compression system, we propose an encoder-decoder joint enhancement algorithm for video chat. In particular, the sparse map of the original frame is extracted at the encoder side and signaled to the decoder, where it is used together with the sparse map of the decoded frame to obtain the boundary transformation map. In this manner, the boundary transformation map represents the key difference between the original frame and the decoded frame and hence can be used to enhance the decoded frame. Experimental results show that the proposed algorithm brings clear subjective and objective quality improvements. At the same quality, the proposed algorithm can achieve 35% bitrate savings compared to VVC.
In the video saliency prediction task, one of the key issues is the utilization of temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the information most correlated with the keyframe for saliency prediction. More specifically, an action set decides step by step whether to expand the window, while a state set and reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is used to train the agent to learn a policy that achieves its goal. The proposed agent can be regarded as a plug-and-play module that is compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method achieves advanced performance.
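The update rule behind Q-learning can be shown with a small table. This is a tabular sketch of the standard rule only; the paper trains a deep network, and the state names, the actions `expand` and `stop`, the reward, and the hyperparameters below are all hypothetical.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step: Q(s,a) += alpha * (target - Q(s,a)).

    The target is reward + gamma * max over actions of Q(next_state, action).
    """
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]

# Hypothetical states: contextual windows of growing size around a keyframe.
q_table = {
    "window_1": {"expand": 0.0, "stop": 0.0},
    "window_2": {"expand": 2.0, "stop": 1.0},
}

# Expanding the window from size 1 to size 2 earned a (made-up) reward of 1.0.
updated = q_update(q_table, "window_1", "expand", reward=1.0, next_state="window_2")
```

In the deep variant, the table is replaced by a network that predicts the action values from the state, and the same target drives the training loss.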
Recently, scene text detection based on deep learning has progressed substantially. Nevertheless, most previous models with FPN are limited by the drawbacks of simple interpolation algorithms, which fail to generate high-quality up-sampled features. Accordingly, we propose an end-to-end trainable text detector to address this problem. Specifically, a Back Projection Enhanced Up-sampling (BPEU) block is proposed to alleviate the drawbacks of simple interpolation algorithms. It significantly enhances the quality of up-sampled features by employing back projection and detail compensation. Furthermore, a Multi-Dimensional Attention (MDA) block is devised to learn different knowledge from the spatial and channel dimensions, and it intelligently selects features to generate more discriminative representations. Experimental results on three benchmarks, ICDAR2015, ICDAR2017-MLT and MSRA-TD500, demonstrate the effectiveness of our method.
This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, the motion compensation of blocks from already decoded reference pictures stands out as the principal tool used to predict the current frame. In particular, bi-prediction, in which a block is obtained by averaging two different motion-compensated prediction blocks, significantly improves the final temporal prediction accuracy. In this context, we introduce a simple neural network that further improves the blending operation. A complexity balance, both in terms of network size and encoder mode selection, is carried out. Extensive tests on top of the recently standardized VVC codec are performed and show a BD-rate improvement of −1.4% in random access configuration for a network size of fewer than 10k parameters. We also propose a simple CPU-based implementation and direct network quantization to assess the complexity/gains tradeoff in a conventional codec framework.
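The conventional blending that the network is meant to improve can be sketched as a rounded integer average of two prediction blocks. This is a simplified illustration; real codecs perform the average at a higher intermediate precision, and the function name `bi_predict_average` is hypothetical.

```python
def bi_predict_average(block0, block1):
    """Average two motion-compensated prediction blocks with rounding.

    Blocks are lists of rows of integer sample values with equal shapes.
    """
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(block0, block1)
    ]

# Two 2x2 toy prediction blocks taken from different reference pictures.
prediction0 = [[100, 102], [50, 51]]
prediction1 = [[104, 103], [50, 52]]

blended = bi_predict_average(prediction0, prediction1)
```

The learned approach described above replaces this fixed average with a small network that takes the same two blocks as input.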