ISBN (Print): 9783031707889
This research explores the integration and application of advanced deep learning models, specifically Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) models, in the field of age and gender detection. The study begins by outlining the significance and challenges of accurate age and gender detection in domains such as targeted advertising, security, and human-computer interaction. It then delves into the technical aspects of CNNs and ViTs, elucidating their architectures, working principles, and suitability for image-based tasks. The proposed techniques were able to differentiate between the following age groups: 0–15, 15–20, 20–25, 25–30, 30–35, and 40. The purpose is to offer a technique for building accurate classification and age estimation systems that achieve high accuracy by integrating a variety of feature extractors and algorithms. Pre-processing evaluates the raw data, configures it, and transforms it into a standard format. The feature extraction component of the age and gender prediction pipeline is crucial; three extraction methods (ResNet-50, ViT-Small, and ViT-Base) are used, together with CNN and ViT classifiers. Performance assessment, an optional component of pattern recognition system design, focuses on system accuracy. Several approaches can be employed for it, including Mean Absolute Error (MAE), Cumulative Score (CS), leave-one-out cross-validation, and the confusion matrix; in this study, gender and age prediction were evaluated using the confusion matrix and MAE. Python, a high-level, general-purpose interpreted programming language, was used for the implementation. Precision, recall, F1-score, and accuracy were the performance metrics used. A precision of 99% was achieved for male classification…
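To make the evaluation step concrete, here is a minimal sketch of how the reported metrics could be computed with scikit-learn. The predictions below are illustrative placeholders, not the study's data, and the 0/1 gender encoding is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_absolute_error, precision_score, recall_score)

# Hypothetical model outputs for a small validation batch (0 = male, 1 = female).
gender_true = np.array([0, 0, 1, 1, 1, 0])
gender_pred = np.array([0, 0, 1, 0, 1, 0])
age_true = np.array([14.0, 22.0, 31.0, 27.0, 18.0, 36.0])    # years
age_pred = np.array([12.5, 24.0, 30.0, 29.5, 17.0, 38.0])

print(confusion_matrix(gender_true, gender_pred))             # gender evaluation
print("precision:", precision_score(gender_true, gender_pred, pos_label=0))
print("recall:   ", recall_score(gender_true, gender_pred, pos_label=0))
print("f1-score: ", f1_score(gender_true, gender_pred, pos_label=0))
print("accuracy: ", accuracy_score(gender_true, gender_pred))
print("age MAE:  ", mean_absolute_error(age_true, age_pred))  # age evaluation
```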
Nowadays, to improve animal well-being in livestock farming applications, a wireless video sensor network (WVSN) can be deployed to detect injuries early and monitor animals. Such networks are composed of small embedded video and camera motes that capture video frames periodically and send them to a specific node called a sink. Sending all the captured images to the sink consumes a lot of energy on every sensor and may cause a bottleneck at the sink level. Energy consumption and bandwidth limitation are two important challenges in WVSNs because of the limited energy resources of the nodes and the scarcity of the medium. In this work, we introduce two mechanisms to reduce the overall number of frames sensed and sent to the sink. The first approach is applied on each sensor node, where the FRABID algorithm, a joint data reduction and frame rate adaptation mechanism covering the sensing and transmission phases, is introduced. This approach reduces the number of sensed frames based on a similarity method: the number of sensed frames is adapted according to the degree of difference between two consecutive sensed frames in each period. This adaptation technique maintains the accuracy of the video while capturing frames holding new information. The approach is validated through simulations using real datasets from video sensors (Wang et al., in: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 393-400, 2014). The results show that the amount of sensed data is reduced by more than 70% compared to a recent algorithm in Christian et al. (Multimed Tools Appl 79(3):1801-1819, 2020) while guaranteeing the detection of all critical events at the sensor node level. The second approach exploits the spatio-temporal correlation between neighboring nodes to reduce the number of captured frames. For that purpose, the Synchronization with Frame Rate Adaptation (SFRA) algorithm is introduced, where overlapping nodes capture frames in a synchronized fashion every N - 1 periods…
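The per-node adaptation logic lends itself to a short illustration. The following is a minimal sketch of the idea described above, not the FRABID algorithm itself; the difference measure, thresholds, and rate bounds are all assumptions.

```python
import numpy as np

def frame_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute pixel difference between two frames, normalized to [0, 1]."""
    return float(np.mean(np.abs(a.astype(np.float32) - b.astype(np.float32))) / 255.0)

def adapt_frame_rate(rate: int, diff: float, low: float = 0.02, high: float = 0.10,
                     min_rate: int = 1, max_rate: int = 30) -> int:
    """Halve the sensing rate when consecutive frames are near-identical,
    double it when they differ strongly, otherwise keep it unchanged."""
    if diff < low:
        return max(min_rate, rate // 2)   # scene is static: sense less, save energy
    if diff > high:
        return min(max_rate, rate * 2)    # new information: sense more often
    return rate
```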
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This ch...
Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings. While easy for data processing, these representations admit unrealistic pose estimates because of the lack of dependency modeling between the body joints. In this paper, we present a structured representation, named Pose as Compositional Tokens (PCT), to explore the joint dependency. It represents a pose by M discrete tokens, each characterizing a sub-structure with several interdependent joints (see Figure 1). The compositional design enables it to achieve a small reconstruction error at a low cost. We then cast pose estimation as a classification task: in particular, we learn a classifier to predict the categories of the M tokens from an image. A pre-learned decoder network is used to recover the pose from the tokens without further post-processing. We show that it achieves pose estimation results better than or comparable to existing methods in general scenarios, yet continues to work well when occlusion occurs, which is ubiquitous in practice. The code and models are publicly available at https://***/Gengzigang/PCT.
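As a rough illustration of the pipeline, the sketch below predicts categories for M tokens and decodes them into K joint coordinates. All dimensions, the codebook size, and the network shapes are placeholders; PCT's actual architecture differs.

```python
import torch
import torch.nn as nn

M, V, K = 34, 1024, 17                       # tokens per pose, codebook size, joints

class TokenClassifier(nn.Module):
    """Predicts a category in [0, V) for each of the M compositional tokens."""
    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.head = nn.Linear(feat_dim, M * V)

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:   # (B, feat_dim)
        return self.head(img_feat).view(-1, M, V)                # (B, M, V) logits

class PoseDecoder(nn.Module):
    """Pre-learned decoder: maps token categories back to joint coordinates."""
    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.codebook = nn.Embedding(V, emb_dim)
        self.mlp = nn.Sequential(nn.Linear(M * emb_dim, 512), nn.ReLU(),
                                 nn.Linear(512, K * 2))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:  # (B, M) ints
        emb = self.codebook(token_ids).flatten(1)
        return self.mlp(emb).view(-1, K, 2)                      # (B, K, 2)

logits = TokenClassifier()(torch.randn(1, 2048))                 # image features
pose = PoseDecoder()(logits.argmax(dim=-1))                      # no post-processing
```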
Implicit neural representation (INR), sometimes also referred to as coordinate-based representation or fitting, has achieved state-of-the-art performance in numerous research fields including computer vision and comput...
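For readers unfamiliar with the term, the toy sketch below fits a small MLP that maps 2D coordinates to RGB values, so the network weights themselves become the representation of the signal. The sizes, target data, and training loop are arbitrary illustrations.

```python
import torch
import torch.nn as nn

class INR(nn.Module):
    """A coordinate-based MLP: (x, y) in [-1, 1]^2 -> RGB."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)

model = INR()
coords = torch.rand(1024, 2) * 2 - 1       # query coordinates
target = torch.rand(1024, 3)               # pixel values to fit (placeholder data)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                       # "fitting" = training the representation
    loss = ((model(coords) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```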
ISBN (Print): 9783031417337; 9783031417344
Table detection and structure recognition are important components of document analysis systems. Deep learning-based transformer models have recently demonstrated significant success in various computer vision and document analysis tasks. In this paper, we introduce PyramidTabNet (PTN), a method that builds upon the convolution-less Pyramid Vision Transformer to detect tables in document images. Furthermore, we present a tabular image generative augmentation technique to effectively train the architecture. The proposed augmentation process consists of three steps, namely clustering, fusion, and patching, for the generation of new document images containing tables. Our proposed pipeline demonstrates significant performance improvements for table detection on several standard datasets and achieves performance comparable to state-of-the-art methods on structure recognition tasks.
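As a rough illustration of the final patching step, the sketch below pastes cropped table images onto a document page and records their bounding boxes as detection labels. The placement policy and all names are assumptions, not the paper's implementation; the clustering and fusion steps are omitted.

```python
import random
from PIL import Image

def patch_tables(page: Image.Image, tables: list, margin: int = 20):
    """Paste cropped table images onto a page; return the page and boxes."""
    page = page.copy()
    boxes, y = [], margin
    for tbl in tables:
        if y + tbl.height + margin > page.height:
            break                                            # page is full
        x = random.randint(margin, max(margin, page.width - tbl.width - margin))
        page.paste(tbl, (x, y))
        boxes.append((x, y, x + tbl.width, y + tbl.height))  # detection label
        y += tbl.height + margin
    return page, boxes
```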
The attention mechanism has been widely used and has achieved good results in many visual tasks. However, computing attention in vision tasks consumes a great deal of memory and time, which is an obvious disadvantage...
In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time, which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue that, in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing, and we present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms for acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin on three publicly available SITS semantic segmentation and classification datasets. All model, training, and evaluation code can be found at https://***/michaeltrs/DeepSatModels.
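The factorization order can be sketched in a few lines: tokens are first processed along the temporal axis for each spatial patch, then across patches. The dimensions and layer counts below are placeholders rather than TSViT's actual configuration.

```python
import torch
import torch.nn as nn

B, T, P, D = 2, 16, 64, 128                         # batch, time, patches, dim
layer = lambda: nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
temporal = nn.TransformerEncoder(layer(), num_layers=2)
spatial = nn.TransformerEncoder(layer(), num_layers=2)

tokens = torch.randn(B, T, P, D)                    # tokenized SITS record
x = tokens.permute(0, 2, 1, 3).reshape(B * P, T, D) # attend over time per patch
x = temporal(x)
x = x.mean(dim=1).view(B, P, D)                     # pool the temporal axis
out = spatial(x)                                    # then attend over space
```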
This paper presents a method that effectively combines two prevalent visual recognition methods, i.e., image classification and contrastive language-image pre-training, dubbed iCLIP. Instead of naïve multi-task learning that uses two separate heads for each task, we fuse the two tasks in a deep fashion that adapts image classification to share the same formula and the same model weights with language-image pre-training. To further bridge the two tasks, we propose to enhance the category names in image classification tasks using external knowledge, such as their descriptions in dictionaries. Extensive experiments show that the proposed method combines the advantages of both tasks well: the strong discrimination ability of image classification tasks due to the clean category labels, and the good zero-shot ability of CLIP tasks ascribed to the richer semantics in the text descriptions. In particular, it reaches 82.9% top-1 accuracy on IN-1K, and meanwhile surpasses CLIP by 1.8%, with similar model size, on zero-shot recognition of the Kornblith 12-dataset benchmark. The code and models are publicly available at https://***/weiyx16/iCLIP.
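The shared-formula idea can be sketched as follows: classification logits are computed exactly like CLIP's image-text similarity, with class prompts enriched by dictionary glosses. The encoders, prompt strings, and temperature below are placeholders, not iCLIP's actual implementation.

```python
import torch
import torch.nn.functional as F

def shared_logits(image_encoder, text_encoder, images, class_texts, tau=0.07):
    """One formula for both tasks: cosine similarity between embeddings."""
    img = F.normalize(image_encoder(images), dim=-1)      # (B, D) image features
    txt = F.normalize(text_encoder(class_texts), dim=-1)  # (C, D) class features
    return img @ txt.t() / tau                            # (B, C) logits

# Category names enhanced with dictionary descriptions (illustrative only).
class_texts = [
    "goldfish: a small golden or orange-red freshwater fish",
    "ambulance: a vehicle equipped for taking sick or injured people to hospital",
]
```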
Image-to-image translation is an important and challenging problem in computer vision and image processing. Diffusion models (DMs) have shown great potential for high-quality image synthesis and have achieved competitive performance on the task of image-to-image translation. However, most existing diffusion models treat image-to-image translation as a conditional generation process and suffer heavily from the gap between distinct domains. In this paper, a novel image-to-image translation method based on the Brownian Bridge Diffusion Model (BBDM) is proposed, which models image-to-image translation as a stochastic Brownian bridge process and learns the translation between two domains directly through a bidirectional diffusion process rather than a conditional generation process. To the best of our knowledge, this is the first work to propose a Brownian bridge diffusion process for image-to-image translation. Experimental results on various benchmarks demonstrate that the proposed BBDM model achieves competitive performance through both visual inspection and measurable metrics.
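For intuition, a discrete Brownian bridge step can be written down directly: the state interpolates between the source image x0 and the target-domain image y, with a variance that vanishes at both endpoints. This sketch follows the textbook bridge formula; BBDM's actual variance schedule and training objective may differ.

```python
import math
import torch

def brownian_bridge_sample(x0: torch.Tensor, y: torch.Tensor,
                           t: float, T: float) -> torch.Tensor:
    """Sample x_t of a Brownian bridge pinned at x0 (t = 0) and y (t = T)."""
    m = t / T                             # interpolation coefficient
    var = t * (T - t) / T                 # bridge variance, zero at both ends
    return (1 - m) * x0 + m * y + math.sqrt(var) * torch.randn_like(x0)
```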