Search Results - Inner Mongolia University Library

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie (Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Fudan University; Shanghai AI Laboratory; CMIC, Shanghai Jiao Tong University)

This paper considers the problem of open-vocabulary semantic segmentation (OVS), which aims to segment objects of arbitrary classes beyond a pre-defined, closed set of categories. The main contributions are as follows. First, we propose a transformer-based model for OVS, termed OVSegmentor, which exploits only web-crawled image-text pairs for pre-training, without using any mask annotations. OVSegmentor assembles the image pixels into a set of learnable group tokens via a slot-attention-based binding module, then aligns the group tokens to the corresponding caption embeddings. Second, we propose two proxy tasks for training, namely masked entity completion and cross-image mask consistency. The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities. The latter enforces consistent mask predictions between images that contain shared entities, encouraging the model to learn visual invariance. Third, we construct the CC4M dataset for pre-training by filtering CC12M for frequently appearing entities, which significantly improves training efficiency. Fourth, we perform zero-shot transfer on four benchmark datasets: PASCAL VOC, PASCAL Context, COCO Object, and ADE20K. OVSegmentor achieves superior results over state-of-the-art approaches on PASCAL VOC using only 3% of the data (4M vs 134M) for pre-training.

SMRD: A Local Feature Descriptor for Multi-modal Image Registration

IEEE Visual Communications and Image Processing (VCIP)

Authors: Jiayu Xie, Xin Jin, Hongkun Cao (Shenzhen Key Lab of Broadband Network and Multimedia, Shenzhen International Graduate School, Tsinghua University, Shenzhen, China)

ISBN: (print) 9781728173221

Multi-modal image registration has received increasing attention in computer vision and computational photography. However, non-linear intensity variations prevent accurate feature point matching between image pairs of different modalities. Thus, a robust image descriptor for multi-modal image registration is proposed, named the shearlet-based modality robust descriptor (SMRD). The anisotropic features of edge and texture information at multiple scales are encoded to describe the region around a point of interest, based on the discrete shearlet transform. We conducted experiments comparing the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results showed that SMRD outperforms the other methods in terms of precision, recall and F1-score.

Keywords: Photography, image registration, computer vision, visual communication, image edge detection, transforms, robustness
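
The SMRD abstract above reports precision, recall and F1-score for descriptor matching. As a minimal sketch of how these three metrics relate (not the authors' evaluation code), the following assumes three hypothetical counts: the number of correct matches, the number of matches returned by the matcher, and the number of ground-truth correspondences.

```python
def matching_scores(correct_matches, total_matches, ground_truth_matches):
    """Return (precision, recall, f1) for a set of putative feature matches.

    All three arguments are hypothetical counts chosen for illustration:
    correct_matches      -- returned matches that agree with ground truth
    total_matches        -- all matches returned by the matcher
    ground_truth_matches -- all correspondences that exist in the image pair
    """
    precision = correct_matches / total_matches if total_matches else 0.0
    recall = correct_matches / ground_truth_matches if ground_truth_matches else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, 80 correct matches out of 100 returned, with 160 true correspondences, gives a precision of 0.8 and a recall of 0.5.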

Accurate Recognition of Kiwifruit Based on Improved YOLOv5

International Conference on Natural Language Processing (ICNLP)

Authors: Sun Wei Sun Yi Jun Li Zhao Chen Guo Jing (School of Communications and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China)

To meet the urgent need for automated, intelligent picking of kiwifruit, and to address the problems of unreasonable construction of kiwifruit data sets, low fruit recognition accuracy and poor spatial positioning in the natural orchard environment, a precise recognition and visual positioning method for kiwifruit based on an improved Yolov5s is proposed. In view of the growth characteristics of kiwifruit in trellis orchards, a multi-type kiwifruit data set was first constructed. Furthermore, an attention mechanism and a multi-scale module are combined to improve the Yolov5s network structure, identify kiwifruit and extract the center coordinates of the prediction box. The experimental results show that the average accuracy of the model for six kiwifruit types under different weather and light conditions is 98%. The recognition time for a single $1280\times 720$ pixel image is about 13.8 ms, and the weight is only 15.21 Mb. This study can provide technical support for the vision system of a kiwifruit automatic picking robot, and a reference for the intelligent recognition and positioning of other fruits (such as apples, mangoes and oranges).
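
The abstract above mentions extracting the center coordinates of the prediction box. As a small illustration only, and assuming the box is given by its corner coordinates in pixels (the abstract does not state the box format, and `box_center` is a hypothetical helper name), the center is the midpoint of the corners:

```python
def box_center(x_min, y_min, x_max, y_max):
    """Return the (x, y) center of an axis-aligned box given its corners in pixels."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


# Hypothetical detection inside a 1280x720 image.
center = box_center(600, 300, 680, 420)
print(center)  # (640.0, 360.0)
```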

A Novel Steganography Method for Halftone Images

IEEE Signal Processing and Communications Applications (SIU)

Authors: Efe Çiftci, Emre Sümer (Bilgisayar Mühendisliği Bölümü, Çankaya Üniversitesi, Ankara, Türkiye)

ISBN: (digital) 9781665450928

ISBN: (print) 9781665450935

Steganography is the common name for methods that aim at secret communication. In this conference proceeding, a novel steganography algorithm that hides a plaintext payload in halftone images, and a payload extraction algorithm suitable for messages hidden using this method, are presented. Our steganography algorithm uses a modified pattern-based halftone image generation procedure and distributes the payload across multiple output images. The proposed method has proven to be secure and able to hide large payloads. According to the objective and subjective evaluations, the proposed method produces promising results.

Keywords: Steganography, image synthesis, signal processing algorithms, signal processing, payloads

Object-Based Resolution Selection for Efficient Edge-Assisted Multi-Task Video Analytics

GLOBECOM 2022 - 2022 IEEE Global Communications Conference

Authors: Chengzhi Wang, Peng Yang, Jie Lin, Wen Wu, Ning Zhang (School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China; Peng Cheng Laboratory, Frontier Research Center, Shenzhen, China; Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada)

ISBN: (print) 9781665435413

Camera-based monitoring is becoming increasingly popular, as multi-objective detection tasks can be enabled by video analytics over captured frames. Yet video frames have to be delivered to computation-capable edge nodes for further processing, because the amount of required resources exceeds the capacity of the built-in hardware of video cameras. In this paper, observing that video resolution directly determines the subsequent bandwidth and computing resource consumption, as well as the analytic accuracy, we propose an edge-assisted object-based resolution configuration algorithm to achieve efficient multi-task video analytics. The proposed algorithm harnesses the diversity of neural networks used for detecting different objects in one frame, which opens up two possibilities for bandwidth saving. On the one hand, background information should not be indiscriminately transmitted, as it is unlikely to contribute to improving the analytics accuracy. On the other hand, fine-grained resolution selection allows an object-level optimal resolution that minimizes the transmitted data volume under accuracy and latency constraints. Simulation results demonstrate that the proposed method can reduce the transmitted data volume by up to 50% compared to existing benchmarks.

Keywords: Sensitivity, visual analytics, simulation, image edge detection, neural networks, bandwidth, multitasking

Cattle Region Extraction using Image Processing Technology

10th IEEE Global Conference on Consumer Electronics, GCCE 2021

Authors: Motomura, Yuya; Zin, Thi Thi; Horii, Yoichiro (University of Miyazaki, Graduate School of Engineering, Miyazaki, Japan; University of Miyazaki, Center For Animal Disease Control, Miyazaki, Japan)

ISBN: (print) 9781665436762

In recent years, the number of dairy and beef cattle farms has been decreasing, while the total number of cattle and the number of cattle per farm have been increasing, so systems for automatically monitoring cattle have been actively introduced. However, most of them are contact-type systems, which cause physical or mental stress to the cows and are costly when the equipment is damaged. Therefore, in this research, we proposed a method for extracting the approximate shape of cattle using a non-contact 360-degree camera to reduce the burden on livestock farmers and cattle, and confirmed its effectiveness through experiments. © 2021 IEEE.

Keywords: Morphology

AI Based Automated Image Caption Tool Implementation for Visually Impaired

1st International Conference on Industrial Electronics Research and Applications, ICIERA 2021

Authors: Wadhwa, Vanshika; Gupta, Bhoomi; Gupta, Sachin (Maharaja Agrasen Institute of Technology, Department of IT, Delhi, India; School of Engineering and Technology, MVN University, Haryana, India)

ISBN: (print) 9781665435420

Image captioning is a rapidly emerging area in Artificial Intelligence applications for natural language definitions. It works at the confluence of image data obtained through datasets and sentence definitions, towards capturing meaningful interpretations of the interaction that exists between them. It uses CNN (Convolutional Neural Network) techniques on images together with LSTM (Long Short Term Memory) type RNNs (Recurrent Neural Networks) over sentences, so that the computer can see the context of the image and describe it in a natural language like English. This paper combines the application of computer vision and natural language processing towards building assistive technology that supplements visual data like images by providing braille-readable captions for the visually impaired to get a better sense of what is happening around them and understand their surroundings. © 2021 IEEE.

Keywords: Computer vision

Image Precise Matching With Illumination Robust in Vehicle Visual Navigation

IEEE Access, 2020, vol. 8, pp. 92503-92513

Authors: Zhou Jingmei, Cheng Xin, Han Ruizhi, Zhao Xiangmo (Changan Univ, Sch Elect & Control Engn, Xian 71064, Peoples R China; Changan Univ, Sch Informat Engn, Xian 71064, Peoples R China)

In vehicle visual navigation, the image matching algorithm is critical to positioning accuracy and processing efficiency. No single matching algorithm can accurately acquire all types of image features, so Harris, SUSAN, FAST, SIFT, and SURF are each applied to various road images under normal lighting conditions. In practical applications, the appropriate algorithm can be selected based on the detection rate and running time of these algorithms. To address illumination change interference in the images collected during vehicle visual navigation, for which many traditional matching algorithms are not optimal, an image precise matching algorithm that is robust to illumination change is proposed. Because image edges and detail information are less sensitive to illumination change, SURF feature points are optimized by image gradient based on the idea of Canny, and a bidirectional search is used to obtain precise matching points. The experimental results show that the algorithm's feature point detection remains stable under illumination change in images, and the matching accuracy can reach more than 94%. The algorithm is not only robust to illumination change, but also ensures higher matching speed while significantly improving matching accuracy.

Keywords: Lighting, robustness, feature extraction, image matching, navigation, visualization, histograms, illumination robust, SURF feature, gradient, bidirectional search
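
The abstract above names a bidirectional search for obtaining precise matching points but does not spell it out. One common reading, assumed here, is a mutual nearest-neighbour cross-check: a pair is kept only if each descriptor is the other's nearest neighbour. The sketch below illustrates that reading with plain Python lists standing in for SURF descriptors; the function names are hypothetical, and this is not the paper's implementation.

```python
def nearest(query, candidates):
    """Index of the candidate descriptor closest to query (squared Euclidean distance)."""
    best_index, best_dist = -1, float("inf")
    for i, cand in enumerate(candidates):
        dist = sum((q - c) ** 2 for q, c in zip(query, cand))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bidirectional_matches(desc_a, desc_b):
    """Keep pairs (i, j) where j is i's nearest neighbour in B and i is j's nearest in A."""
    matches = []
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)  # forward search: A -> B
        if j >= 0 and nearest(desc_b[j], desc_a) == i:  # backward search: B -> A
            matches.append((i, j))
    return matches
```

With `desc_a = [[0, 0], [10, 10], [5, 5]]` and `desc_b = [[9, 9], [0, 1]]`, the third descriptor in A is rejected because its nearest neighbour in B prefers a different descriptor in A.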

Non-invasive Image Quality Assessment Based on Eye-tracking

7th International Conference on Computer and Communications, ICCC 2021

Authors: Wei, Hongan; Lin, Sang; Chen, Weiling; Chen, Jing; Zheng, Yannan (Fuzhou University, Fujian Key Lab For Intelligent Processing and Wireless Transmission of Media Information, Fuzhou, China)

ISBN: (print) 9781665409506

To measure the perceptual quality of images, it is important to find suitable Image Quality Assessment (IQA) methods. Compared with traditional objective IQA methods, subjective IQA methods more faithfully reflect users' subjective impressions of image quality. However, the evaluation steps in subjective experiments interfere with the process of subjects viewing images. We propose a non-intrusive IQA method based on eye-tracking technology to overcome this shortcoming. Through subjective experiments, we collect the eye movement data of subjects and calculate four main features from them. The analysis of the eye movement parameters indicates that users pay attention to different areas of images with different qualities. We then use the eye movement parameters to design the Non-Intrusive Subjective (NIS) IQA algorithm, a new subjective quality metric for images. Experimental results demonstrate the validity and feasibility of the proposed metric. © 2021 IEEE.

Keywords: Eye movements

SEMANTIC-PRESERVING IMAGE COMPRESSION

IEEE International Conference on Image Processing (ICIP)

Authors: Patwa, Neel; Ahuja, Nilesh; Somayazulu, Srinivasa; Tickoo, Omesh; Varadarajan, Srenivas; Koolagudi, Shashidhar (Intel, Mountain View, CA 94039, USA; Samsung, Seoul, South Korea; NIT Karnataka, Mangalore, India)

ISBN: (print) 9781728163956

Video traffic comprises a large majority of the total traffic on the internet today. Uncompressed visual data requires a very large data rate; lossy compression techniques are employed in order to keep the data rate manageable. Increasingly, a significant amount of the visual data being generated is consumed by analytics (such as classification, detection, etc.) residing in the cloud. Image and video compression can produce visual artifacts, especially at lower data rates, which can result in a significant drop in performance on such analytic tasks. Moreover, standard image and video compression techniques aim to optimize perceptual quality for human consumption by allocating more bits to perceptually significant features of the scene. However, these features may not necessarily be the most suitable ones for semantic tasks. We present here an approach to compress visual data in order to maximize performance on a given analytic task. We train a deep auto-encoder using a multi-task loss to learn the relevant embeddings. An approximate differentiable model of the quantizer is used during training, which helps boost the accuracy during inference. We apply our approach to an image classification problem and show that, for a given level of compression, it achieves higher classification accuracy than that obtained by performing classification on images compressed using JPEG. Our approach also outperforms the relevant state-of-the-art approach by a significant margin.

Keywords: image compression, deep image compression, semantic preserving
