Search results - Inner Mongolia University Library

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

International Journal of Computer Vision, 2019, vol. 127, no. 4, pp. 398-414

Authors: Goyal, Yash; Khot, Tejas; Agrawal, Aishwarya; Summers-Stay, Douglas; Batra, Dhruv; Parikh, Devi

Affiliations: Georgia Tech, Atlanta, GA 30332, USA; Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; Army Res Lab, Adelphi, MD, USA; Facebook AI Res, Menlo Pk, CA, USA

The problem of visual question answering (VQA) is of significant importance both as a challenging research question and for the rich set of applications it enables. In this context, however, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in VQA models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of VQA and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset (Antol et al., in: ICCV, 2015) by collecting complementary images such that every question in our balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question. Our dataset is by construction more balanced than the original VQA dataset and has approximately twice the number of image-question pairs. Our complete balanced dataset is publicly available at http://***/ as part of the 2nd iteration of the VQA Dataset and Challenge (VQA v2.0). We further benchmark a number of state-of-the-art VQA models on our balanced dataset. All models perform significantly worse on our balanced dataset, suggesting that these models have indeed learned to exploit language priors. This finding provides the first concrete empirical evidence for what seems to be a qualitative sense among practitioners. We also present interesting insights from analysis of the participant entries in VQA Challenge 2017, organized by us on the proposed VQA v2.0 dataset. The results of the challenge were announced in the 2nd VQA Challenge Workshop at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. Finally, our data collection protocol for identifying complementary images enables us to develop a novel interpretable model, which in addition to providing an answer to the given (image, question) pair, also provides a counter-example ba

Keywords: Visual question answering; VQA; VQA challenge

Lip reading using external viseme decoding

12th Iranian/2nd International Conference on Machine Vision and Image Processing, MVIP 2022

Authors: Peymanfard, Javad; Reza Mohammadi, Mohammad; Zeinali, Hossein; Mozayani, Nasser

Affiliations: Iran University of Science and Technology, School of Computer Engineering, Tehran, Iran; Amirkabir University of Technology, Department of Computer Engineering, Tehran, Iran

ISBN: (print) 9781665412162

Lip-reading is the task of recognizing speech from lip movements. It is difficult because some words produce similar lip movements when pronounced. Visemes are used to describe lip movements during a conversation. This paper aims to show how to use external text data (for viseme-to-character mapping) by dividing video-to-character conversion into two stages, namely converting video to visemes and then converting visemes to characters, using separate models. Our proposed method improves word error rate by an absolute 4% compared to the typical sequence-to-sequence lipreading model on the BBC-Oxford Lip Reading dataset (LRS2). © 2022 IEEE.
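
Word error rate, the metric quoted in this abstract, is conventionally defined as the word-level edit distance between the reference transcript and the recognized transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the sample strings are illustrative and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, an absolute improvement of 4% means the ratio drops by 0.04, for example from 0.50 to 0.46 (illustrative figures only).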

Keywords: Speech recognition

Adaptation of Eikonal Equation over Weighted Graph

2nd International Conference on Scale Space and Variational Methods in Computer Vision

Authors: Ta, Vinh-Thong; Elmoataz, Abderrahim; Lezoray, Olivier

Affiliation: Univ Caen Basse Normandie, CNRS, GREYC Image Team, UMR 6072, Caen, France

ISBN: (print) 9783642022555

In this paper, an adaptation of the eikonal equation is proposed by considering the latter on weighted graphs of arbitrary structure. This novel approach is based on a family of discrete morphological local and nonlocal gradients expressed by partial difference equations (PdEs). Our formulation of the eikonal equation on weighted graphs generalizes local and nonlocal configurations in the context of image processing and extends this equation for the processing of any unorganized high dimensional discrete data that can be represented by a graph. Our approach leads to a unified formulation for image segmentation and high dimensional irregular data processing.
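
As a rough intuition for what solving an eikonal-type equation on a weighted graph produces, one can compute arrival times from a set of seed nodes, where each node's value is the cheapest accumulated cost of reaching it. The sketch below uses Dijkstra's shortest-path algorithm as a classical stand-in; it is not the PdE-based morphological gradient scheme described in the abstract, and the graph layout, edge costs, and function name are assumptions made for illustration.

```python
import heapq


def arrival_times(adjacency, seeds):
    """Shortest accumulated cost from any seed to every node.

    adjacency maps node -> list of (neighbour, edge_cost) pairs, with
    non-negative costs; every neighbour must also be a key of adjacency.
    """
    dist = {node: float("inf") for node in adjacency}
    heap = []
    for seed in seeds:
        dist[seed] = 0.0
        heapq.heappush(heap, (0.0, seed))

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry, a shorter path was already found
        for neighbour, cost in adjacency[node]:
            candidate = d + cost
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


if __name__ == "__main__":
    # Hypothetical three-node graph with symmetric edge costs.
    graph = {
        "a": [("b", 1.0), ("c", 4.0)],
        "b": [("a", 1.0), ("c", 2.0)],
        "c": [("a", 4.0), ("b", 2.0)],
    }
    print(arrival_times(graph, ["a"]))
```

Because the graph is given only as an adjacency structure, the same routine applies to pixel grids, nonlocal patch graphs, or arbitrary point clouds, which is the kind of generality the abstract emphasizes.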

Keywords: Geometrical optics

Realization of Anti-counterfeiting and Traceability of Ceramics Based on Computer Vision and Blockchain Technology-Taking Jingdezhen Ceramics as an Example

2nd International Conference on Algorithm, Image Processing and Machine Vision, AIPMV 2024

Authors: Chang, Yangyang; Cheng, Xien; Hu, Jingfang

Affiliation: School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China

ISBN: (print) 9798350390254

Jingdezhen ceramics have a long history and are world-famous, and thus are often the object of imitation. Because current ceramic anti-counterfeiting and traceability technology is not precise enough, a new scheme is proposed. First, professional equipment is used to capture the ceramic micropores, exploiting the randomness and uniqueness of the arrangement of micropores in the ceramic microstructure. Afterwards, these micropores are processed with the Canny edge detection method and then combined with the high-precision SIFT method from computer vision for feature extraction. Next, the paper introduces the InterPlanetary File System (IPFS), combined with decentralized, tamper-resistant blockchain technology, to write a smart contract for traceability. Finally, in the identification process, the picture under test is compared with the picture stored on IPFS using knnmatch; if the result satisfies the three-standard-deviation principle, the piece is judged authentic, and otherwise it is judged a fake. The experimental results show that the results can serve as a strong basis for ceramic anti-counterfeiting and traceability. © 2024 IEEE.
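
The abstract does not say which quantity the three-standard-deviation rule is applied to. The sketch below assumes it is a scalar match score (for instance, an average descriptor distance from the matching step) and that reference scores from known-authentic comparisons are available; both are assumptions, as are the function name and the sample numbers.

```python
from statistics import mean, stdev


def within_three_sigma(reference_scores, candidate_score):
    """Return True if the candidate lies within mean +/- 3 standard deviations.

    reference_scores: scores from comparisons known to be authentic
    (hypothetical input, at least two values are needed for stdev).
    """
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)
    return abs(candidate_score - mu) <= 3 * sigma


if __name__ == "__main__":
    authentic = [10, 12, 11, 9, 10, 11, 12, 10]  # illustrative scores
    print(within_three_sigma(authentic, 11))  # close to the reference mean
    print(within_three_sigma(authentic, 30))  # far outside the usual range
```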

Keywords: Micropores

The Detection of Blastocyst Embryo in Vitro Fertilization (IVF)

12th Iranian/2nd International Conference on Machine Vision and Image Processing, MVIP 2022

Authors: Dehkordi, Kimiya Samie; Moghaddam, Mohsen Ebrahimi

Affiliation: Shahid Beheshti University, Computer Engineering, Tehran, Iran

ISBN: (print) 9781665412162

One of the most important stages in the fate of the embryo in in vitro fertilization (IVF) is the blastocyst stage. There is currently no way to diagnose the blastocyst. In this study, ResNet and U-Net networks are used to detect embryos in the blastocyst state. The proposed method is trained on a dataset of 40392 samples, with 24365 samples for training and 5814 for validation, and is tested on 10263 samples obtained from various sources. The results show an accuracy of 92.9%, a precision of 93.7%, and a recall of 92.1%, which confirm that the proposed method was well able to detect the states in which the fetus is in the blastocyst state. © 2022 IEEE.
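
For reference, accuracy, precision, and recall for a binary detector follow directly from the four confusion-matrix counts. The sketch below shows the standard formulas; the counts in the example are hypothetical and are not the paper's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical counts for a 200-sample test set.
    acc, prec, rec = classification_metrics(tp=90, fp=10, fn=5, tn=95)
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```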

Keywords: Deep learning

Spherical image denoising and its application to omnidirectional imaging

2nd International Conference on Computer Vision Theory and Applications, VISAPP 2007

Authors: Bigot, Stephanie; Kachi, Djemaa; Durand, Sylvain; Mouaddib, El Mustapha

Affiliations: LA.M.F.A., UMR CNRS 6140, U.P.J.V., 33 rue Saint Leu, 80000 Amiens Cedex 1, France; C.R.E.A., EA 3299, U.P.J.V., 7 rue du moulin neuf, 80039 Amiens Cedex 1, France

This paper addresses the problem of spherical image processing. Thanks to projective geometry, the omnidirectional image can be represented as a function on the sphere S2. The target application includes omnidirectional image smoothing. We describe a new method of smoothing for spherical images. For that purpose, we introduce a suitable Wiener filter and we apply the Tikhonov method to these images. In order to compare their performances, we present the most used classical spherical kernels. We present several examples of filtering real and synthetic spherical images.

Keywords: Image denoising

A face detection method via ensemble of four versions of YOLOs

12th Iranian/2nd International Conference on Machine Vision and Image Processing, MVIP 2022

Authors: Khalili, Sanaz; Shakiba, Ali

Affiliation: Vali-e-Asr University of Rafsanjan, Department of Computer Sciences, Rafsanjan, Iran

ISBN: (print) 9781665412162

We implemented a real-time ensemble model for face detection by combining the results of YOLO v1 to v4. We used the WIDER FACE benchmark for training YOLOv1 to v4 in the Darknet framework. Then, we ensemble their results by two methods, namely, WBF (Weighted boxes fusion) and NMW (Non-maximum weighted). The experimental analysis showed that the mAP increases in the WBF ensemble of the models for all the easy, medium, and hard images in the datasets by 7.81%, 22.91%, and 12.96%, respectively. These numbers are 6.25%, 20.83%, and 11.11% for the NMW ensemble. © 2022 IEEE.
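
In weighted boxes fusion as it is commonly described, overlapping detections from different models are grouped by intersection over union, and each group is replaced by a single box whose coordinates are the confidence-weighted average of its members. The sketch below shows only the overlap measure and that core averaging step, under the assumption that boxes are (x1, y1, x2, y2) tuples; clustering across models and any confidence rescaling are left out, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(boxes, scores):
    """Fuse one cluster of overlapping boxes into a single box.

    Coordinates are averaged with the confidence scores as weights;
    the fused score is the plain mean of the member scores.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    fused_box = tuple(
        sum(score * box[k] for box, score in zip(boxes, scores)) / total
        for k in range(4)
    )
    return fused_box, total / len(scores)


if __name__ == "__main__":
    # Hypothetical detections of the same face from two models.
    boxes = [(0, 0, 10, 10), (2, 2, 12, 12)]
    scores = [0.75, 0.25]
    print(iou(boxes[0], boxes[1]))
    print(fuse_cluster(boxes, scores))
```

Unlike non-maximum suppression, which keeps one box and discards the rest, this averaging uses every member of the cluster, so the higher-confidence box pulls the fused coordinates toward itself.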

Keywords: Face recognition

Application of BM3D algorithm in CT image denoising of liver cancer

2nd International Conference on Computer Vision, Image, and Deep Learning

Authors: Wu, Changhe; Gao, Tianhan

Affiliations: Department of Software Engineering, School of Software, Northeastern University, 500 Wisdom Street, Liaoning, Shenyang, China; Department of Digital Media, School of Software, Northeastern University, 500 Wisdom Street, Liaoning, Shenyang, China

ISBN: (digital) 9781510646827

ISBN: (print) 9781510646810

CT images play a vital role in the diagnosis of liver cancer. However, CT images often have significant image noise, which is unfavourable for doctors' diagnoses. In response to this problem, this paper applies the BM3D denoising algorithm to the denoising of CT images of liver cancer. The BM3D denoising algorithm first obtains similar blocks through block matching, stacks these similar blocks into three-dimensional blocks, performs collaborative filtering, and finally obtains the clear image through aggregation. Experimental results show that the BM3D algorithm can effectively remove the noise of CT images of liver cancer. © 2021 SPIE.
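
The first stage named above, block matching, can be sketched as a search for the patches most similar to a reference patch under a sum-of-squared-differences distance. The example covers only that grouping step; collaborative filtering and aggregation are not shown. The exhaustive search over the whole image, the helper names, and the tiny integer image are simplifying assumptions.

```python
def block_at(image, top, left, size):
    """Extract a size x size block from a 2D list of pixel values."""
    return [row[left:left + size] for row in image[top:top + size]]


def block_distance(a, b):
    """Sum of squared pixel differences between two equally sized blocks."""
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def match_blocks(image, ref_top, ref_left, size, max_matches):
    """Return the max_matches most similar blocks as (distance, top, left)."""
    reference = block_at(image, ref_top, ref_left, size)
    height, width = len(image), len(image[0])
    candidates = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            d = block_distance(reference, block_at(image, top, left, size))
            candidates.append((d, top, left))
    candidates.sort()  # smallest distance first, ties broken by position
    return candidates[:max_matches]


if __name__ == "__main__":
    # Hypothetical 4x4 image with a repeating 2x2 pattern.
    image = [
        [1, 2, 1, 2],
        [3, 4, 3, 4],
        [1, 2, 1, 2],
        [3, 4, 3, 4],
    ]
    print(match_blocks(image, 0, 0, size=2, max_matches=4))
```

The matched blocks would then be stacked into the three-dimensional group that the abstract describes as the input to collaborative filtering.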

Keywords: Image denoising

Sign Language Detection using Action Recognition

2nd International Conference on Advance Computing and Innovative Technologies in Engineering, ICACITE 2022

Authors: Iyer, Vishwa Hariharan; Prakash, U.M; Vijay, Aashrut; Sathishkumar, P.

Affiliations: SRM Institute of Science and Technology, Department of Computer Science and Engineering, Chennai, India; Pgp College of Engineering and Technology, Department of Computer Science and Engineering, Chennai, India

ISBN: (print) 9781665437899

Sign language detection has become crucial and effective for humans; research in this area is ongoing, and it is one of the applications of computer vision. Earlier works performed detection on static signs with the help of a simple deep learning-based convolutional neural network. This proposal is based on continuous detection over image frames in real time, using action detection to detect the action performed by the user. The model uses an LSTM neural network after identifying keypoints using MediaPipe Holistic, which includes face, pose and hand features. The proposed work is done by collecting keypoint values for training and testing, pre-processing the data, and creating labels and features. It saves the weights and evaluates the model using a confusion matrix and accuracy. © 2022 IEEE.

Keywords: Computer vision

Combining local and global features for image segmentation using iterative classification and region merging

2nd Canadian Conference on Computer and Robot Vision, CRV 2005

Authors: Yu, Qiyao; Clausi, D.A.

Affiliation: Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada

ISBN: (print) 0769523196

In MRF based unsupervised segmentation, the MRF model parameters are typically estimated globally. Those global statistics sometimes are far from accurate for local areas if the image is highly non-stationary, and hence will generate false boundaries. The problem cannot be solved if local statistics are not considered. This work incorporates the local feature of edge strength in the MRF energy function, and segmentation is obtained by reducing the energy function using iterative classification and region merging. © 2005 IEEE.

Keywords: Edge detection
