ISBN: 9781728173221 (print)
Camera calibration for sport videos enables precise and natural overlay of graphics on video footage, as well as several other special effects. This in turn substantially improves the visual experience for the audience and facilitates sports analysis during or after the live show. In this paper, we propose a high-accuracy camera calibration method for sport videos. First, we generate a homography database by uniformly sampling camera parameters; the database contains more than 91 thousand distinct homography matrices. Then, we use a conditional generative adversarial network (cGAN) to perform semantic segmentation, splitting the broadcast frames into four classes. Next, we build a feature extraction network to extract features from the semantically segmented images. We then search the database for the extracted feature to find the best-matching homography. Finally, we refine the homography by image alignment. In a comprehensive evaluation on the 2014 World Cup dataset, our method outperforms other state-of-the-art techniques.
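The database lookup step described in this abstract can be pictured with a minimal sketch. It assumes the database is a list of (feature vector, homography) pairs and that matching uses Euclidean distance; the abstract does not state the metric or the feature dimension, and the function name `nearest_homography` and the toy values are hypothetical.

```python
import math


def nearest_homography(query, database):
    """Return the (feature, homography) entry whose feature is closest to query.

    database is a list of (feature_vector, homography) pairs. Euclidean
    distance is an assumption; the abstract does not name the metric.
    """
    return min(database, key=lambda entry: math.dist(entry[0], query))


# Toy database: 2-D features paired with 3x3 homographies (illustrative values only).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
toy_database = [((0.0, 0.0), identity), ((1.0, 1.0), shifted)]

# The query feature lies closer to (1.0, 1.0), so the shifted homography is returned.
feature, homography = nearest_homography((0.9, 0.8), toy_database)
```

A linear scan like this is only practical for small databases; with tens of thousands of entries an indexed nearest-neighbor structure would likely be preferable, though the abstract does not say which approach the authors use.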
ISBN: 9781728173221 (print)
Recently, scene text detection based on deep learning has progressed substantially. Nevertheless, most previous models with FPN are limited by the drawbacks of sample interpolation algorithms, which fail to generate high-quality up-sampled features. Accordingly, we propose an end-to-end trainable text detector to address this problem. Specifically, a Back Projection Enhanced Up-sampling (BPEU) block is proposed to overcome the drawbacks of sample interpolation algorithms. It significantly enhances the quality of up-sampled features by employing back projection and detail compensation. Furthermore, a Multi-Dimensional Attention (MDA) block is devised to learn different knowledge from the spatial and channel dimensions, selecting features to generate more discriminative representations. Experimental results on three benchmarks, ICDAR2015, ICDAR2017-MLT and MSRA-TD500, demonstrate the effectiveness of our method.
ISBN: 9781728173221 (print)
This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, motion compensation of blocks from already decoded reference pictures stands out as the principal tool used to predict the current frame. In particular, bi-prediction, in which a block is obtained by averaging two different motion-compensated prediction blocks, significantly improves the final temporal prediction accuracy. In this context, we introduce a simple neural network that further improves the blending operation. A complexity balance is carried out, both in terms of network size and encoder mode selection. Extensive tests on top of the recently standardized VVC codec show a BD-rate improvement of −1.4% in random access configuration for a network size of fewer than 10k parameters. We also propose a simple CPU-based implementation and direct network quantization to assess the complexity/gains tradeoff in a conventional codec framework.
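The conventional averaging that this abstract sets out to improve can be illustrated with a simplified sketch. It works on small integer sample blocks and ignores the higher intermediate precision that real codecs use; the `blend` function is a hypothetical weighted generalization added for comparison and is not the paper's neural network.

```python
def bi_predict(block0, block1):
    """Simplified conventional bi-prediction: rounded integer average of two blocks."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]


def blend(block0, block1, weight0):
    """Hypothetical weighted blend; weight0 in [0, 1] is the weight of block0."""
    return [[round(weight0 * a + (1.0 - weight0) * b) for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]


# Two toy 2x2 motion-compensated prediction blocks (illustrative sample values).
p0 = [[100, 102], [98, 97]]
p1 = [[104, 101], [100, 97]]

averaged = bi_predict(p0, p1)  # [[102, 102], [99, 97]]
```

A learned blending operation would replace the fixed equal weights with an output that depends on the content of both prediction blocks.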
An RGB-T tracker can fuse two different yet complementary target observations, which makes it a promising technology for all-weather tracking. Existing convolutional neural network based ...
ISBN: 9781728173221 (print)
Anomaly detection is an important task in many traffic applications. Methods based on deep learning networks reach high accuracy; however, they typically rely on supervised training with large annotated datasets. Considering that anomalous data are not easy to obtain, we present data transformation methods that convert the data obtained at one intersection for use at other intersections, reducing the effort of collecting training data. The proposed methods are demonstrated on the task of anomalous trajectory detection. A General model and a Universal model are proposed. The former focuses on saving data collection effort; the latter further reduces the network training effort. We evaluated the methods on a dataset with trajectories from four intersections in the GTA V virtual world. The experimental results show that, with a significant reduction in data collection and network training effort, the proposed anomalous trajectory detection still achieves state-of-the-art accuracy.
Secret keys generated from a wireless channel can provide a high level of security. The error correction code (ECC) based information reconciliation method offers both high security and efficiency. In this paper, we design a highly robust secret key reconciliation system that uses low-density parity-check (LDPC) codes to correct disagreeing key bits. In this system, we cache some high-quality secret key bits in a cyclic shift buffer (CSB) for compensation, which effectively reduces the overall key disagreement rate (KDR). Combined with a simple hybrid automatic repeat request (HARQ) implementation, the system significantly improves the correction success rate of information reconciliation. We built a real-time secret key generation system using universal software radio peripheral (USRP) devices. Extensive experiments were carried out in different channel state information (CSI) scenarios. The proposed system effectively improves the success rate of secret key generation in all experimental scenarios, especially with low USRP Tx/Rx gain setups.
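The key disagreement rate mentioned in this abstract is the fraction of bit positions at which the two parties' keys differ. The following is a minimal sketch of that metric only; the function name and the example bit strings are hypothetical, and it does not model the LDPC, CSB, or HARQ components.

```python
def key_disagreement_rate(key_a, key_b):
    """Return the fraction of positions where two equal-length bit sequences differ."""
    if len(key_a) != len(key_b):
        raise ValueError("keys must have the same length")
    if not key_a:
        raise ValueError("keys must not be empty")
    mismatches = sum(a != b for a, b in zip(key_a, key_b))
    return mismatches / len(key_a)


# Illustrative key bits as measured by the two parties; they differ at two positions.
alice = [1, 0, 1, 1, 0, 0, 1, 0]
bob = [1, 0, 0, 1, 0, 1, 1, 0]

kdr = key_disagreement_rate(alice, bob)  # 2 mismatches out of 8 bits = 0.25
```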
ISBN: 9781665491129 (print)
With the development of science and technology, more and more advanced technologies are being applied to tracking systems for dance virtual humans. The remote dynamic interaction detection method based on the AdaBoost algorithm is widely used in real life. This paper designs the remote dynamic interaction, dynamic monitoring and auxiliary functions based on the AdaBoost algorithm (an image processing and matching algorithm). First, digital coding and compression are widely used in video transmission. Second, the paper gives a brief theoretical analysis of AdaBoost and its related mathematical models, then establishes a virtual human tracking system using MATLAB and presents the corresponding experimental results. The final experimental results show that the detection accuracy of the virtual human tracking technology is more than 90%, and the false detection rate is low.
ISBN: 9781728173221 (print)
Two-pass rate control (RC) schemes have proven useful for generating low-bitrate video-on-demand or streaming catalogs. Visually optimized encoding, particularly using latest-generation coding standards like Versatile Video Coding (VVC), however, is still a subject of intensive study. This paper describes the two-pass RC method integrated into version 1 of VVenC, an open VVC encoding software. The RC design is based on a novel two-step rate-quantization parameter (R-QP) model to derive the second-pass coding parameters, and it uses the low-complexity XPSNR visual distortion measure to provide numerically as well as visually stable, perceptually R-D optimized encoding results. Random-access evaluation experiments confirm the improved objective as well as subjective performance of our RC solution.
ISBN: 9781665438452 (print)
The proceedings contain 57 papers. The topics discussed include: text information source modeling for learning monitoring system; providing a desired quality of BPG compressed images for FSIM metric; a model for managing the functional services of a common information space for defense forces based on smart technology; marker information coding for structural clustering of spectral space; web application critical resources protection; model of steganographic system depending on indirect conditional dependencies; method of determining the required number of database nodes in a distributed data processing system; and analysis of noisy image lossy compression by BPG using visual quality metrics.
ISBN: 9781728173221 (print)
The saliency prediction of panoramic images is dramatically affected by the distortion caused by their non-Euclidean geometry. Traditional CNN-based saliency prediction algorithms for 2D images are no longer suitable for 360-degree images. We propose a graph-based fully convolutional network for saliency prediction of 360-degree images, which maps panoramic pixels to spherical graph data structures for representation. The saliency prediction network is based on a residual U-Net architecture, with dilated graph convolutions and an attention mechanism in the bottleneck. Furthermore, we design a fully convolutional layer for graph pooling and unpooling operations in spherical graph space to retain node-to-node features. Experimental results show that our proposed method outperforms other state-of-the-art saliency models on the large-scale dataset.