The Multimedia and Computer Vision Lab of the University of Augsburg participated in the VTT task only. We use the VATEX [1] and TRECVID-VTT [2] datasets for training our VTT models. We base our model on the Transform...
The current state-of-the-art No-Reference Image Quality Assessment (NR-IQA) methods typically rely on feature extraction from upstream semantic backbone networks, assuming that all extracted features are relevant. However, we make a key observation that not all features are beneficial, and some may even be harmful, necessitating careful selection. Empirically, we find that many image pairs with small feature-space distances can have vastly different quality scores, indicating that the extracted features may contain quality-irrelevant noise. To address this issue, we propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) that takes an adversarial perspective to remove harmful semantic noise features inherited from the upstream task. Specifically, QFM-IQM strengthens the model's ability to distinguish semantic noise by matching image pairs with similar quality scores but differing semantic features as adversarial semantic noise, and adaptively adjusts the upstream task's features by reducing their sensitivity to adversarial noise perturbation. Furthermore, we utilize a distillation framework to expand the dataset and improve the model's generalization ability. Extensive experiments conducted on eight standard IQA datasets demonstrate the effectiveness of our proposed QFM-IQM.
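The pairing step the abstract describes — find, for each image, a partner with a near-identical quality score but maximally dissimilar features — can be sketched as follows. This is a minimal illustrative reading, not the authors' implementation; the function name, tolerance, and data are assumptions.

```python
import numpy as np

def match_adversarial_pairs(features, scores, score_tol=0.1):
    """For each sample, pick the partner whose features are most
    dissimilar among all samples with a near-identical quality score.

    features: (n, d) array of backbone features (illustrative)
    scores:   length-n list of quality scores
    Returns a list where entry i is the index of sample i's adversarial
    partner, or None if no sample has a close enough score.
    """
    n = len(scores)
    pairs = []
    for i in range(n):
        # candidates: other samples with almost the same quality score
        candidates = [j for j in range(n)
                      if j != i and abs(scores[j] - scores[i]) <= score_tol]
        if not candidates:
            pairs.append(None)
            continue
        # among those, take the one with the largest feature distance
        dists = [np.linalg.norm(features[i] - features[j]) for j in candidates]
        pairs.append(candidates[int(np.argmax(dists))])
    return pairs

feats = np.array([[0.0, 0.0], [10.0, 0.0], [0.5, 0.0]])
print(match_adversarial_pairs(feats, [0.50, 0.55, 0.90]))
```

Samples 0 and 1 have close scores but distant features, so they pair with each other; sample 2 has no score-matched partner.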
Blind Image Quality Assessment (BIQA) mirrors subjective quality judgments made by human observers. Generally, humans favor comparing relative qualities over predicting absolute qualities directly. However, current BIQA models focus on mining the "local" context, i.e., the relationship between information within individual images and their absolute quality, ignoring the "global" context of relative quality contrasts among different images in the training data. In this paper, we present the Perceptual Context and Sensitivity BIQA (CSIQA), a novel contrastive learning paradigm that seamlessly integrates "global" and "local" perspectives into BIQA. Specifically, CSIQA comprises two primary components: 1) a Quality Context Contrastive Learning module, equipped with different contrastive learning strategies to effectively capture potential quality correlations in the global context of the dataset; and 2) a Quality-Aware Mask Attention module, which employs random masking to remain consistent with local visual sensitivity, thereby improving the model's perception of local distortions. Extensive experiments on eight standard BIQA datasets demonstrate performance superior to state-of-the-art BIQA methods.
ISBN (digital): 9781728114859
ISBN (print): 9781728114866
This paper presents a model that robustly estimates important flight parameters for ski jumpers during their flight phase, based on several camera views from the side along the jumpers' typical flight trajectories. A convolutional neural network for pose estimation, additionally trained to detect skis, serves as the base model. It identifies 98.0% of the relevant flight parameters correctly within an angle threshold of 5 degrees, improving by 11.6% over previous work. In postprocessing, a pose checker first removes all wrong poses by comparing distances and relative positions of the detected keypoints. A second step runs two RANSAC variants: one robustly estimates the average pose, the other the average pose angles. This model lifts the detection performance to 99.3% of the relevant flight parameters within a threshold of 5 degrees.
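The RANSAC-style robust averaging mentioned above can be sketched in a few lines: repeatedly pick a candidate measurement, count how many others fall within a tolerance, and average the largest inlier set. This is an illustrative sketch, not the paper's implementation; the 5-degree threshold matches the abstract, while the function name, iteration count, and sample data are assumptions.

```python
import random

def ransac_average(values, threshold=5.0, iterations=100, seed=0):
    """Robustly estimate the average of noisy angle measurements.

    Each iteration samples one value as a candidate model, collects all
    values within `threshold` of it as inliers, and finally returns the
    mean of the largest inlier set, discarding outlier detections.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        candidate = rng.choice(values)
        inliers = [v for v in values if abs(v - candidate) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return sum(best_inliers) / len(best_inliers)

# hypothetical body-angle detections across frames; 75.0 is an outlier
angles = [31.2, 30.8, 29.9, 30.5, 75.0, 30.1]
print(round(ransac_average(angles), 2))
```

A plain mean of these values would be pulled to about 37.9 degrees by the single outlier, whereas the RANSAC estimate stays at 30.5.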
Automatic medical report generation from chest X-ray images is one possibility for assisting doctors to reduce their workload. However, the different patterns and data distribution of normal and abnormal cases can bia...
With the development of ubiquitous computing, entering text on HMDs and smart TVs using handheld touchscreen devices (e.g., smartphone and controller) is becoming more and more attractive. In these indirect touch scen...
Speech input, such as voice assistant and voice message, is an attractive interaction option for mobile users today. However, despite its popularity, there is a use limitation for smartphone speech input: users need t...
Simultaneous localisation and categorization of objects in medical images, also referred to as medical object detection, is of high clinical relevance because diagnostic decisions often depend on rating of objects rat...
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the...
Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences disregarding fine-...
详细信息