The Multimedia and Computer Vision Lab of the University of Augsburg participated in the VTT task only. We use the VATEX [1] and TRECVID-VTT [2] datasets for training our VTT models. We base our model on the Transform...
The current state-of-the-art No-Reference Image Quality Assessment (NR-IQA) methods typically rely on feature extraction from upstream semantic backbone networks, assuming that all extracted features are relevant. However, we make a key observation that not all features are beneficial, and some may even be harmful, necessitating careful selection. Empirically, we find that many image pairs with small feature-space distances can have vastly different quality scores, indicating that the extracted features may contain quality-irrelevant noise. To address this issue, we propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) that takes an adversarial perspective to remove harmful semantic noise features inherited from the upstream task. Specifically, QFM-IQM strengthens the model's ability to distinguish semantic noise by matching image pairs with similar quality scores but differing semantic features as adversarial semantic noise, and adaptively adjusts the upstream task's features by reducing their sensitivity to adversarial noise perturbation. Furthermore, we utilize a distillation framework to expand the dataset and improve the model's generalization ability. Extensive experiments conducted on eight standard IQA datasets demonstrate the effectiveness of our proposed QFM-IQM.
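The pairing step the abstract describes — find, for each image, a partner with a near-identical quality score but maximally dissimilar features — can be sketched as follows. This is a minimal illustrative reading, not the authors' implementation; the function name, tolerance, and data are assumptions.

```python
import numpy as np

def match_adversarial_pairs(features, scores, score_tol=0.1):
    """For each sample, pick the partner whose features are most
    dissimilar among all samples with a near-identical quality score.

    features: (n, d) array of backbone features (illustrative)
    scores:   length-n list of quality scores
    Returns a list where entry i is the index of sample i's adversarial
    partner, or None if no sample has a close enough score.
    """
    n = len(scores)
    pairs = []
    for i in range(n):
        # candidates: other samples with almost the same quality score
        candidates = [j for j in range(n)
                      if j != i and abs(scores[j] - scores[i]) <= score_tol]
        if not candidates:
            pairs.append(None)
            continue
        # among those, take the one with the largest feature distance
        dists = [np.linalg.norm(features[i] - features[j]) for j in candidates]
        pairs.append(candidates[int(np.argmax(dists))])
    return pairs

feats = np.array([[0.0, 0.0], [10.0, 0.0], [0.5, 0.0]])
print(match_adversarial_pairs(feats, [0.50, 0.55, 0.90]))
```

Samples 0 and 1 have close scores but distant features, so they pair with each other; sample 2 has no score-matched partner.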
Blind Image Quality Assessment (BIQA) mirrors subjective quality judgments made by human observers. Generally, humans favor comparing relative qualities over predicting absolute qualities directly. However, current BIQA models focus on mining the "local" context, i.e., the relationship between information within individual images and their absolute quality, ignoring the "global" context of relative quality contrasts among different images in the training data. In this paper, we present the Perceptual Context and Sensitivity BIQA (CSIQA), a novel contrastive learning paradigm that seamlessly integrates "global" and "local" perspectives into BIQA. Specifically, CSIQA comprises two primary components: 1) a Quality Context Contrastive Learning module, equipped with different contrastive learning strategies to effectively capture potential quality correlations in the global context of the dataset; and 2) a Quality-Aware Mask Attention module, which employs random masking to remain consistent with local visual sensitivity, thereby improving the model's perception of local distortions. Extensive experiments on eight standard BIQA datasets demonstrate performance superior to state-of-the-art BIQA methods.
ISBN (digital): 9781728114859
ISBN (print): 9781728114866
This paper presents a model that robustly estimates important flight parameters for ski jumpers during their flight phase, based on several camera views from the side along the jumpers' typical flight trajectories. A convolutional neural network for pose estimation, additionally trained to detect skis, serves as the base model. It identifies 98.0% of the relevant flight parameters correctly within an angle threshold of 5 degrees, improving by 11.6% over previous work. In postprocessing, a pose checker first removes all wrong poses by comparing distances and relative positions of the detected keypoints. A second step runs two RANSAC variants: one robustly estimates the average pose, the other the average pose angles. This model lifts the detection performance to 99.3% of the relevant flight parameters within a threshold of 5 degrees.
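The RANSAC-style robust averaging mentioned above can be sketched in a few lines: repeatedly pick a candidate measurement, count how many others fall within a tolerance, and average the largest inlier set. This is an illustrative sketch, not the paper's implementation; the 5-degree threshold matches the abstract, while the function name, iteration count, and sample data are assumptions.

```python
import random

def ransac_average(values, threshold=5.0, iterations=100, seed=0):
    """Robustly estimate the average of noisy angle measurements.

    Each iteration samples one value as a candidate model, collects all
    values within `threshold` of it as inliers, and finally returns the
    mean of the largest inlier set, discarding outlier detections.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        candidate = rng.choice(values)
        inliers = [v for v in values if abs(v - candidate) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return sum(best_inliers) / len(best_inliers)

# hypothetical body-angle detections across frames; 75.0 is an outlier
angles = [31.2, 30.8, 29.9, 30.5, 75.0, 30.1]
print(round(ransac_average(angles), 2))
```

A plain mean of these values would be pulled to about 37.9 degrees by the single outlier, whereas the RANSAC estimate stays at 30.5.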
Automatic medical report generation from chest X-ray images is one possibility for assisting doctors to reduce their workload. However, the different patterns and data distribution of normal and abnormal cases can bia...
With the development of ubiquitous computing, entering text on HMDs and smart TVs using handheld touchscreen devices (e.g., smartphone and controller) is becoming more and more attractive. In these indirect touch scen...
Speech input, such as voice assistant and voice message, is an attractive interaction option for mobile users today. However, despite its popularity, there is a use limitation for smartphone speech input: users need t...
Simultaneous localisation and categorization of objects in medical images, also referred to as medical object detection, is of high clinical relevance because diagnostic decisions often depend on rating of objects rat...
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the...
Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences disregarding fine-...
详细信息