Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Khan, Asifullah Sohail, Anabia Fiaz, Mustansar Hassan, Mehdi Afridi, Tariq Habib Marwat, Sibghat Ullah Munir, Farzeen Ali, Safdar Naseem, Hannan Zaheer, Muhammad Zaigham Ali, Kamran Sultana, Tangina Tanoli, Ziaurrehman Akhter, Naeem Pattern Recognition Lab DCIS PIEAS Nilore Islamabad 45650 Pakistan PIEAS Nilore Islamabad 45650 Pakistan Deep Learning Lab Center for Mathematical Sciences PIEAS Nilore Islamabad 45650 Pakistan Center of Secure Cyber-Physical Security Systems Khalifa University Abu Dhabi United Arab Emirates IBM Research United States Department of Computer Science Air University Islamabad Pakistan Department of Computer Science and Engineering Kyung Hee University Global Campus 1732 Gyeonggi-do Yongin 17104 Korea Republic of Department of Electrical Engineering and Automation Aalto University Finland Finnish Center of Artificial Center Finland Faculty of Engineering and Green Technology Universiti Tunku Abdul Rahman Malaysia Computer Vision Department Mohamed Bin Zayed University of Artificial Intelligence United Arab Emirates Karachi Pakistan Department of Electronics and Communication Engineering Hajee Mohammad Danesh Science and Technology University Bangladesh HiLIFE University of Helsinki Finland

Vision Transformers (ViTs) have recently demonstrated remarkable performance in computer vision tasks. However, their parameter-intensive nature and reliance on large amounts of data for effective performance have shifted the focus from traditional human-annotated labels to unsupervised learning and pretraining strategies that uncover hidden structures within the data. In response to this challenge, self-supervised learning (SSL) has emerged as a promising paradigm. SSL leverages inherent relationships within the data itself as a form of supervision, eliminating the need for manual labeling and offering a more scalable and resource-efficient alternative for model training. Given these advantages, it is imperative to explore the integration of SSL techniques with ViTs, particularly in scenarios with limited labeled data. Inspired by this evolving trend, this survey aims to systematically review SSL mechanisms tailored for ViTs. We propose a comprehensive taxonomy to classify SSL techniques based on their representations and pre-training tasks. Additionally, we discuss the motivations behind SSL, review prominent pre-training tasks, and highlight advancements and challenges in this field. Furthermore, we conduct a comparative analysis of various SSL methods designed for ViTs, evaluating their strengths, limitations, and applicability to different scenarios. Copyright © 2024, The Authors. All rights reserved.

Keywords: Self-supervised learning

A Low-Cost Pathological Gait Detection System in Multi-Kinect Environment

20th International Symposium on Optomechatronic Technologies, ISOT 2019

Authors: Chakraborty, Saikat Mishra, Rishabh Dwivedi, Anurag Das, Tania Nandy, Anup Machine Intelligence and Bio-motion Research Lab Department of Computer Science and Engineering National Institute of Technology Rourkela Rourkela, Odisha India Department of Computer Science and Engineering National Institute of Technology Sikkim Sikkim India Department of Electronics and Communication Engineering Heritage Institute of Technology Kolkata, West Bengal India

ISBN: (digital) 9789811564673

ISBN: (print) 9789811564666

Traditional vision-based systems for automatic gait pathology detection are costly. With the advent of the Microsoft Kinect sensor, researchers have tried to build low-cost gait assessment systems, but these suffer from generic device-specific constraints. This study attempts to mitigate those pitfalls by introducing a novel multi-Kinect setup for automated gait diagnosis. Ten healthy participants were recruited to simulate pathological gait. Extracted salient features were classified using supervised learning, leading to an overall accuracy of 93%, which outperformed the state of the art. © 2020, Springer Nature Singapore Pte Ltd.

Keywords: Costs

Sad: Saliency-based defenses against adversarial examples

arXiv, 2020

Authors: Tran, Richard Patrick, David Geyer, Michael Fernandez, Amanda S. Vision and Artificial Intelligence Lab Department of Computer Science University of Texas at San Antonio

With the rise in popularity of machine and deep learning models, there is an increased focus on their vulnerability to malicious inputs. These adversarial examples drift model predictions away from the original intent of the network and are a growing concern in practical security. In order to combat these attacks, neural networks can leverage traditional image processing approaches or state-of-the-art defensive models to reduce perturbations in the data. Defensive approaches that apply noise reduction globally are effective against adversarial attacks; however, their lossy nature often distorts important data within the image. In this work, we propose a visual saliency based approach to cleaning data affected by an adversarial attack. Our model leverages the salient regions of an adversarial image in order to provide a targeted countermeasure while comparatively reducing loss within the cleaned images. We measure the accuracy of our model by evaluating the effectiveness of state-of-the-art saliency methods prior to attack, under attack, and after application of cleaning methods. We demonstrate the effectiveness of our proposed approach in comparison with related defenses and against established adversarial attack methods, across two saliency datasets. Our targeted approach shows significant improvements in a range of standard statistical and distance saliency metrics, in comparison with both traditional and state-of-the-art approaches. Copyright © 2020, The Authors. All rights reserved.

Keywords: Noise abatement

NTIRE 2023 Challenge on Image Super-Resolution (×4): Methods and Results

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023

Authors: Zhang, Yulun Zhang, Kai Chen, Zheng Li, Yawei Timofte, Radu Zhang, Junpei Zhang, Kexin Peng, Rui Ma, Yanbiao Jiao, Licheng Huang, Huaibo Zhou, Xiaoqiang Ai, Yuang He, Ran Qiu, Yajun Zhu, Qiang Li, Pengfei Li, Qianhui Zhu, Shuyuan Zhang, Dafeng Li, Jia Wang, Fan Li, Chunmiao Kim, TaeHyung Kil, Jungkeong Kim, Eon Yu, Yeonseung Lee, Beomyeol Lee, Subin Lim, Seokjae Chae, Somi Choi, Heungjun Huang, ZhiKai Chen, YiChung Chiang, YuanChun Yang, HaoHsiang Chen, WeiTing Chang, HuaEn Chen, I-Hsiang Hsieh, ChiaHsuan Kuo, SyYen Choi, Ui-Jin Conde, Marcos V. Khowaja, Sunder Ali Yoon, Jiseok Lee, Ik Hyun Gendy, Garas Sabor, Nabil Hou, Jingchao He, Guanghui Zhang, Zhao Li, Baiang Zheng, Huan Zhao, Suiyi Gao, Yangcheng Wei, Yanyan Ren, Jiahuan Wei, Jiayu Li, Yanfeng Sun, Jia Cheng, Zhanyi Li, Zhiyuan Yao, Xu Wang, Xinyi Li, Danxu Cui, Xuan Cao, Jun Li, Cheng Zheng, Jianbin Sarvaiya, Anjali Prajapati, Kalpesh Patra, Ratnadeep Barik, Pragnesh Rathod, Chaitanya Upla, Kishor Raja, Kiran Ramachandra, Raghavendra Busch, Christoph Computer Vision Lab ETH Zurich Switzerland Shanghai Jiao Tong University China University of Würzburg Germany Xidian University China Mais&cripac Institute of Automation Chinese Academy of Sciences China School of Artificial Intelligence University of Chinese Academy of Sciences China University of Science and Technology of China China Beijing Institute of Technology China School of Information Science and Technology ShanghaiTech University China School of Information and Communication Engineering University of Electronic Science and Technology of China China China Lotte Data Communication Company Seoul Korea Republic of Graduate Institute of Electronics Engineering National Taiwan University Taiwan Department of Electrical Engineering National Taiwan University Taiwan Graduate Institute of Communication Engineering National Taiwan University Taiwan ServiceNow United States MegaStudyEdu Korea Republic of Computer Vision Lab CAIDAS University of Würzburg Germany University of Sindh Pakistan Iklab Inc. Tech University of Korea Siheung-Si Korea Republic of Micro-Nano Electronics Department Shanghai Jiao Tong University China Electrical Engineering Department Faculty of Engineering Assiut University Egypt Hefei University of Technology China Beijing Jiaotong University China South China University of Technology Guangdong Guangzhou China Sardar Vallabhbhai National Institute of Technology India Norwegian University of Science and Technology Norway

ISBN: (print) 9798350302493

This paper reviews the NTIRE 2023 challenge on image super-resolution (×4), focusing on the proposed solutions and results. The task of image super-resolution (SR) is to generate a high-resolution (HR) output from a corresponding low-resolution (LR) input by leveraging prior information from paired LR-HR images. The aim of the challenge is to obtain a network design/solution capable of producing high-quality results with the best performance (e.g., PSNR). We want to explore how high a performance can be achieved regardless of computational cost (e.g., model size and FLOPs) and data. The challenge track measures the restored HR images against the ground-truth HR images on the DIV2K testing dataset. The ranking of the teams is determined directly by the PSNR value. The challenge attracted 192 registered participants, of which 15 teams made valid submissions. They achieve state-of-the-art performance in single image super-resolution. © 2023 IEEE.

Keywords: Statistical tests
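
The abstract above states that teams are ranked directly by PSNR. As a rough illustration of that metric, the sketch below computes PSNR between two flat pixel sequences and sorts submissions by it. It assumes 8-bit pixel values (peak 255); the function names `psnr` and `rank_teams` and the team labels are hypothetical, and the challenge's actual evaluation protocol (color space, border cropping) is not described in this record and may differ.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rank_teams(reference, submissions):
    """Return hypothetical team names sorted by descending PSNR."""
    scores = {team: psnr(reference, image) for team, image in submissions.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    ground_truth = [10, 20, 30]
    entries = {"team_a": [10, 20, 31], "team_b": [10, 25, 30]}
    print(rank_teams(ground_truth, entries))
```

A higher PSNR means a lower mean squared error against the ground truth, so the submission with the smallest pixel-wise error ranks first.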

Learning Graph Representation of Person-specific Cognitive Processes from Audio-visual Behaviours for Automatic Personality Recognition

arXiv, 2021

Authors: Song, Siyang Shao, Zilong Jaiswal, Shashank Shen, Linlin Valstar, Michel Gunes, Hatice Department of Computer Science and Technology University of Cambridge Cambridge United Kingdom Computer Vision Institute Shenzhen University Shenzhen China Shenzhen Institute of Artificial Intelligence of Robotics of Society Shenzhen China Guangdong Key Laboratory of Intelligent Information Processing Shenzhen University Shenzhen China Computer Vision Lab University of Nottingham Nottingham United Kingdom

This paper proposes to recognise the true (self-reported) personality from the learned simulation of the target subject’s cognition. This approach builds on the following two findings in cognitive science: (i) human cognition partially determines expressed behaviour and is directly linked to true personality traits; and (ii) in dyadic interactions individuals’ nonverbal behaviours are influenced by their conversational partner’s behaviours. In this context, we hypothesise that during a dyadic interaction, a target subject’s facial reactions are driven by two main factors, i.e. their internal (person-specific) cognitive process, and the externalised nonverbal behaviours of their conversational partner. Consequently, we propose to represent the target subject’s (defined as the listener) person-specific cognition in the form of a person-specific CNN architecture that has unique architectural parameters and depth, which takes audio-visual non-verbal cues displayed by the conversational partner (defined as the speaker) as input, and is able to reproduce the target subject’s facial reactions. Each person-specific CNN is explored by the Neural Architecture Search (NAS) and a novel adaptive loss function, and is then represented as a graph representation for recognising the target subject’s true personality. Experimental results not only show that the produced graph representations are well associated with target subjects’ personality traits in both human-human and human-machine interaction scenarios, and outperform the existing approaches with significant advantages, but also demonstrate that the proposed novel strategies, such as adaptive loss and the end-to-end vertices/edges feature learning, help the proposed approach in learning more reliable personality representations. Building on our earlier version of this work, this paper further proposes: (i) assigning a unique depth for each CNN; (ii) a novel end-to-end graph vertex feature learning strategy; (iii) a transformer-bas

Keywords: Cognitive systems

LocalViT: Analyzing Locality in Vision Transformers

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Authors: Yawei Li Kai Zhang Jiezhang Cao Radu Timofte Michele Magno Luca Benini Luc Van Gool Computer Vision Lab D-ITET ETH Zurich Switzerland Center for Artificial Intelligence and Data Science (CAIDAS) University of Würzburg Germany Center for Project-Based Learning D-ITET ETH Zurich Switzerland Integrated Systems Laboratory D-ITET ETH Zurich Switzerland Department of Electrical Electronic and Information Engineering University of Bologna Italy Processing Speech and Images (PSI) KU Leuven Belgium

The aim of this paper is to study the influence of locality mechanisms in vision transformers. Transformers originated from machine translation and are particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between the token embeddings can be modelled well by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local region. In this paper, the locality mechanism is systematically investigated through carefully designed controlled experiments. We add locality to vision transformers within the feed-forward network. This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks. The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) are available for incorporating locality mechanisms, and proper choices can lead to a performance gain over the baseline, and 2) the same locality mechanism is successfully applied to vision transformers with different architecture designs, which shows the generalization of the locality concept. For ImageNet2012 classification, the locality-enhanced transformers outperform the baselines Swin-T [1], DeiT-T [2] and PVT-T [3] by 1.0%, 2.6% and 3.1% with a negligible increase in the number of parameters and computational effort. Code is available at https://***/ofsoundof/LocalViT.

Efficient Deep Models for Real-Time 4K Image Super-Resolution. NTIRE 2023 Benchmark and Report

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023

Authors: Conde, Marcos V. Zamfir, Eduard Timofte, Radu Motilla, Daniel Liu, Cen Zhang, Zexin Peng, Yunbo Lin, Yue Guo, Jiaming Zou, Xueyi Chen, Yuyi Liu, Yi Hao, Jia Yan, Youliang Zhang, Yuanfan Li, Gen Sun, Lei Kong, Lingshun Bai, Haoran Pan, Jinshan Dong, Jiangxin Tang, Jinhui Ayazoglu, Mustafa Bilecen, Bahri Batuhan Li, Mingxi Zhang, Yuhang Fan, Xianjun Sheng, Yankai Sun, Long Liu, Zibin Gou, Weiran Li, Shaoqing Yi, Ziyao Xiang, Yan Kong, Dehui Xu, Ke Gankhuyag, Ganzorig Yoon, Kihwan Zhang, Jin Yu, Gaocheng Zhang, Feng Wang, Hongbin Zhou, Zhou Chao, Jiahao Gao, Hongfan Gong, Jiali Yang, Zhengfeng Zeng, Zhenbing Chen, Chengpeng Guo, Zichao Park, Anjin Liu, Yuqing Jia, Qi Yu, Hongyuan Yin, Xuanwu Zuo, Kunlong Zhang, Dongyang Fu, Ting Cheng, Zhengxue Zhu, Shiai Zhou, Dajiang Yu, Weichen Ge, Lin Dong, Jiahua Zou, Yajun Wu, Zhuoyuan Han, Binnan Zhang, Xiaolin Zhang, Heng Shao, Ben Zheng, Shaolong Yin, Daheng Chen, Baijun Liu, Mengyang Nistor, Marian-Sergiu Chen, Yi-Chung Huang, Zhi-Kai Chiang, Yuan-Chun Chen, Wei-Ting Yang, Hao-Hsiang Chang, Hua-En Chen, I-Hsiang Hsieh, Chia-Hsuan Kuo, Sy-Yen Vo, Tu Yan, Qingsen Zhu, Yun Su, Jinqiu Zhang, Yanning Zhang, Cheng Luo, Jiaying Cho, Youngsun Lee, Nakyung Computer Vision Lab CAIDAS IFI University of Würzburg Germany Sony Interactive Entertainment CA United States Huawei Technologies Co. Ltd. China NetEase Games AI Lab Nanjing University of Science and Technology China Tencent China Attrsense Korea Republic of Sanechips Co Ltd Ant Group China East China Normal University China Shopee Dalian University of Technology Xiaomi Inc. China China Zhejiang Dahua Technology Co. Ltd. China Multimedia Department Xiaomi Inc. China Korea Photonic Technology Institute Korea Republic of School of Computer Science and Engineering Southeast University China University Al. I. Cuza Iasi Romania Graduate Institute of Electronics Engineering National Taiwan University Taiwan Department of Electrical Engineering National Taiwan University Taiwan Graduate Institute of Communication Engineering National Taiwan University Taiwan ServiceNow United States Northwestern Polytechnical University China KC Machine Learning Lab CJ OliveNetworks AI Research

ISBN: (print) 9798350302493

This paper introduces a novel benchmark for efficient up-scaling as part of the NTIRE 2023 Real-Time Image Super-Resolution (RTSR) Challenge, which aimed to upscale images from 720p and 1080p resolution to native 4K (×2 and ×3 factors) in real-time on commercial GPUs. For this, we use a new test set containing diverse 4K images ranging from digital art to gaming and photography. We assessed the methods devised for 4K SR by measuring their runtime, parameters, and FLOPs, while ensuring a minimum PSNR fidelity over Bicubic interpolation. Out of the 170 participants, 25 teams contributed to this report, making it the most comprehensive benchmark to date and showcasing the latest advancements in real-time SR. © 2023 IEEE.

Keywords: Program processors

Crypt-OR: A privacy-preserving system for exemplar-based object-removal over the cloud

TechRxiv, 2020

Authors: Tanwar, Vishesh Kumar Raman, Balasubramanian Bhargava, Rama Department of Mathematics Machine Vision Lab Department of Computer Science & Engineering Indian Institute of Technology Roorkee India

Object removal is a technique for removing undesired object(s) from an image and then filling in the empty region(s) such that the modified image is visually plausible. Existing algorithms are unable to provide promising results when the region to be removed has a varying textured neighborhood, is small in size and the depth of the image, and is of a specific geometric shape such as a triangle or rectangle. In this paper, we propose a new algorithm that incorporates the merits of partial differential equations (PDEs) and exemplar-based schemes to address these challenges. The data term, which measures the continuity of isophotes in exemplar-based methods, is modified by incorporating a regularizer term and partial derivatives up to second order of the input image. This regularizer enhances the strength of isophotes striking the boundary and boosts the information propagation in an unbiased manner, in terms of pixel intensity values. Additionally, the low cost, agility, and flexible access of cloud services have attracted users’ attention. However, users are concerned about using these services for their data, as they are operated by untrusted third parties. To address these privacy concerns for object removal in an image over a cloud server, we extended and modified our algorithm to make it compatible with the (T, N)-threshold Shamir secret sharing scheme (SSS). The resulting privacy-preserving system, named Crypt-OR, is an end-to-end system for object removal in the ED over the cloud server. Crypt-OR is evaluated by removing synthetically imposed objects in real images. Further, Crypt-OR is shown to be secure under various pixel-based cryptographic attacks such as the frequency-known attack and the pixel-correlation attack. © 2020, CC BY.

Keywords: Pixels
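
The abstract above relies on the (T, N)-threshold Shamir secret sharing scheme. The sketch below illustrates the generic scheme on a single 8-bit pixel value and is not the Crypt-OR algorithm itself. The field prime 257, the function names `split_secret` and `recover_secret`, and the per-pixel framing are assumptions made for illustration only.

```python
import secrets

# Assumption: smallest prime above 255, so every 8-bit pixel value is a field element.
# Note that a share value can equal 256 and therefore may not fit in one byte.
PRIME = 257


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= threshold <= num_shares < PRIME:
        raise ValueError("need 1 <= threshold <= num_shares < PRIME")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    pixel = 200
    shares = split_secret(pixel, threshold=3, num_shares=5)
    print(recover_secret(shares[:3]) == pixel)
```

Any subset of at least T shares reconstructs the pixel exactly, while fewer than T shares leave every field value equally likely, which is what lets an untrusted server hold individual shares.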

Visual object tracking with discriminative filters and siamese networks: A survey and outlook

arXiv, 2021

Authors: Javed, Sajid Danelljan, Martin Khan, Fahad Shahbaz Khan, Muhammad Haris Felsberg, Michael Matas, Jiri The EECS Department Khalifa University of Science and Technology P.O Box: 127788 Abu Dhabi United Arab Emirates The Computer Vision Lab Dept. of Information Technology and Electrical Engineering ETH Zürich Switzerland Computer Vision Department MBZUAI Abu Dhabi United Arab Emirates Computer Vision Laboratory Linköping University Sweden Center for Machine Perception Czech Technical University Prague Czech Republic

Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location, and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating tracking paradigms, which have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCFs and Siamese trackers, based on results in nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these tracking paradigms. Furthermore, we thoroughly analyze the performance of DCF and Siamese trackers on nine benchmarks, covering different experimental aspects of visual tracking: datasets, evaluation metrics, performance, and speed comparisons. We finish the survey by presenting recommendations and suggestions for distinguished open challenges based on our analysis. © 2021, CC BY-SA.

Keywords: Surveys

Causal Disentanglement for Semantics-Aware Intent Learning in Recommendation

arXiv, 2022

Authors: Wang, Xiangmeng Li, Qian Yu, Dianer Cui, Peng Wang, Zhichao Xu, Guandong Data Science and Machine Intelligence Lab Faculty of Engineering and Information Technology University of Technology Sydney NSW Australia The School of Electrical Engineering Computing and Mathematical Sciences Curtin University Perth Australia School of Electrical Engineering and Telecommunications University of New South Wales Sydney Australia The Department of Computer Science and Technology Tsinghua University Beijing 100084 China

Although traditional recommendation models trained on observational interaction data have had a large impact in a wide range of applications, they face bias problems that obscure users’ true intent and thus degrade recommendation effectiveness. Existing methods tackle this problem by eliminating bias for robust recommendation, e.g., by re-weighting training samples or learning disentangled representations. Disentangled representation methods, the current state of the art, eliminate bias by revealing the cause-effect relations of bias generation. However, how to design a semantics-aware and unbiased representation of users’ true intents is largely unexplored. To bridge the gap, we are the first to propose an unbiased and semantics-aware disentanglement learning method, called CaDSI (Causal Disentanglement for Semantics-Aware Intent Learning), from a causal perspective. In particular, CaDSI explicitly models the causal relations underlying the recommendation task, and thus produces semantics-aware representations by disentangling users’ true intents with awareness of the specific item context. Moreover, a causal intervention mechanism is designed to eliminate confounding bias stemming from context information, which further aligns the semantics-aware representation with users’ true intent. Extensive experiments and case studies both validate the robustness and interpretability of our proposed model. © 2022, CC BY.

Keywords: Semantics
