In this paper, we propose a simplified cubic polynomial R-D model with corresponding rate control methods for Versatile Video Coding (VVC) intra frame coding. First, we explore the rate-distortion (R-D) characteristics of VVC intra coding. After comparing several candidate R-D modeling approaches, we propose a new intra coding R-D model based on a simplified cubic polynomial function. We then derive the corresponding $R-\lambda$ model and introduce a complexity measure to improve the performance of intra frame rate control. Furthermore, we propose a Coding Tree Unit (CTU)-level rate control method based on the new R-D model and build a pre-compression-based approach on top of it. Experimental results show that the proposed method achieves 1.87% and 0.55% bit rate reductions for the All-Intra (AI) and Random-Access (RA) configurations, respectively, over the original rate control in the VVC Test Model (VTM), with a negligible increase in computational complexity. The proposed methods also improve the bit rate accuracy of rate control.
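As a rough illustration of how an $R-\lambda$ relation follows from an R-D model: the Lagrange multiplier is the negative slope of the R-D curve, $\lambda = -\partial D / \partial R$. The abstract does not state the simplified form of the cubic, so the sketch below assumes a generic cubic D(R) = aR^3 + bR^2 + cR + d. The function names and coefficients are hypothetical and this is not the paper's model.

```python
def cubic_distortion(rate, a, b, c, d):
    """Distortion under an assumed generic cubic R-D model (hypothetical form)."""
    # Horner evaluation of a*R^3 + b*R^2 + c*R + d.
    return ((a * rate + b) * rate + c) * rate + d


def rd_slope_lambda(rate, a, b, c):
    """Lagrange multiplier lambda = -dD/dR for the assumed cubic model."""
    # dD/dR = 3*a*R^2 + 2*b*R + c, so lambda is its negation.
    return -((3.0 * a * rate + 2.0 * b) * rate + c)


# Illustrative coefficients only: D(R) = -R^3 + 10 decreases with rate for R > 0.
lam = rd_slope_lambda(1.0, a=-1.0, b=0.0, c=0.0)
```

Given a target rate, a rate control scheme of this kind would evaluate the slope to pick $\lambda$; how the paper maps rate to $\lambda$ in practice is not described in the abstract.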
Anomaly detection and localization are widely used in industrial manufacturing for their efficiency and effectiveness. Anomalies are rare and hard to collect, and supervised models easily overfit to these seen anomalies...
Due to the proliferation of internet evaluations brought on by the rising demand for smartphones, consumers find it challenging to make accurate selections when purchasing. In this paper, we offer ensemble voting meth...
Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection. In this paper, we propose ObjectFormer to de...
Weakly-supervised audio-visual violence detection aims to distinguish snippets containing multimodal violence events with video-level labels. Many prior works perform audio-visual integration and interaction in an ear...
Adapting quickly to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, the agent must probe and identify the peer's strategy efficiently, as this is the prerequisite for carrying out the best response during adaptation. However, exploring the strategies of unknown peers is difficult, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer from the historical context, such as observations over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior and achieves faster adaptation and better outcomes than existing methods.
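One way to picture a reward based on "how well the agent can identify the peer" is the log-probability that an identifier assigns to the true peer type given the context. The sketch below assumes a hypothetical identifier that outputs one logit per candidate peer type; the function name, the log-likelihood form, and the example logits are illustrative assumptions, not the paper's definition.

```python
import math


def peer_identification_reward(logits, true_peer):
    """Log-probability assigned to the true peer type (assumed reward form)."""
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[true_peer] - log_norm


# Hypothetical identifier outputs for two candidate peer types.
uncertain = peer_identification_reward([0.0, 0.0], true_peer=0)
confident = peer_identification_reward([4.0, 0.0], true_peer=0)
```

Under this assumed form the reward is higher when the context lets the identifier single out the true peer, which is what would push the agent toward informative exploration.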
Medical image registration is vital for disease diagnosis and treatment with its ability to merge diverse information of images, which may be captured under different times, angles, or modalities. Although several sur...
ISBN: 9781665428132 (print)
Video captioning is an important vision task and has been intensively studied in the computer vision community. Existing methods that utilize fine-grained spatial information have achieved significant improvements; however, they either rely on costly external object detectors or do not sufficiently model spatial and temporal relations. In this paper, we aim to design a spatial information extraction and aggregation method for video captioning that does not need external object detectors. To this end, we propose a Recurrent Region Attention module to better extract diverse spatial features. By employing Motion-Guided Cross-frame Message Passing, our model is aware of the temporal structure and able to establish high-order relations among the diverse regions across frames. Together, these two components encourage information communication and produce compact and powerful video representations. Furthermore, we propose an Adjusted Temporal Graph Decoder to flexibly update video features and model high-order temporal relations during decoding. Experimental results on three benchmark datasets (MSVD, MSR-VTT, and VATEX) demonstrate that our proposed method outperforms state-of-the-art methods.
Text semantic matching is a fundamental task that has been widely used in various scenarios, such as community question answering, information retrieval, and recommendation. Most state-of-the-art matching models, e.g....
In this paper, we propose a method to predict the success of primer amplification based on the relationship between the primer and template sequences, which can be used to optimize primer design and to select the primers with better amplification from a candidate primer set. The double-stranded structure formed by the primer and template nucleotide sequences is represented as a number of five-character words that form sentences, and these sentences serve as the dataset for the experiment. An attention-based bidirectional long short-term memory neural network model (Attention-BiLSTM) learns from this dataset and then predicts primer amplification. The model predicted the results of polymerase chain reaction (PCR) involving specific primers and specific DNA templates with 82% accuracy, an improvement of about 2% over the LSTM, and with more stable values. These results show that the model can be used to effectively predict the results of PCR. This is the first paper to optimize primer design by screening the candidate primer set with a neural network model.
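The "five-character words forming sentences" representation can be pictured as a simple tokenization step. The sketch below assumes the primer-template structure has already been encoded as a single string and splits it into non-overlapping five-character words. The abstract does not specify the encoding or whether words overlap, so the function name, the non-overlapping split, and the example string are all hypothetical.

```python
def to_words(encoded, word_len=5):
    """Split an encoded primer-template string into fixed-length words."""
    # Assumption: non-overlapping words; a trailing remainder shorter than
    # word_len is dropped rather than padded.
    usable = len(encoded) - len(encoded) % word_len
    return [encoded[i:i + word_len] for i in range(0, usable, word_len)]


# Hypothetical encoded string, for illustration only.
sentence = to_words("ACGTACGTACGT")
```

The resulting list of words would then be mapped to token indices and fed to a sequence model such as the Attention-BiLSTM named in the abstract.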