We address the weakly supervised video highlight detection problem for learning to detect segments that are more attractive in training videos given their video event label but without expensive supervision of manuall...
Hand electromyogram (EMG) signals, instrumental in tasks like movement recognition, rehabilitation monitoring, disease diagnosis, and human-computer collaboration, are typically obtained via high-density EMG electrode...
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. A range of defense methods have been proposed to train adversarially robust DNNs, among which adversarial training has demonstrated promis...
Person re-identification (ReID) has made impressive progress in recent years. However, occlusion is still a common and challenging problem for recent ReID methods. Several mainstream methods utilize extra cue...
ISBN:
(Print) 9781665428132
Person re-identification (ReID) has made impressive progress in recent years. However, occlusion is still a common and challenging problem for recent ReID methods. Several mainstream methods utilize extra cues (e.g., human pose information) to distinguish human parts from obstacles and thus alleviate the occlusion problem. Although these methods achieve inspiring progress, they rely heavily on fine-grained extra cues and are sensitive to estimation errors in those cues. In this paper, we show that existing methods may degrade if the extra information is sparse or noisy. We therefore propose a simple yet effective method that is robust to sparse and noisy pose information. This is achieved by discretizing pose information into visibility labels of body parts, so as to suppress the influence of occluded regions. Our experiments show that leveraging pose information in this way is more effective and robust. Besides, our method can be embedded into most person ReID models easily. Extensive experiments validate the effectiveness of our model on common occluded person ReID datasets.
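The core idea above, discretizing pose estimates into per-part visibility labels and using them to suppress occluded regions, can be sketched as follows. This is a minimal illustration, not the authors' code: the keypoint-to-part grouping, the confidence threshold, and all names (part_visibility, PART_KEYPOINTS) are assumptions.

```python
# Sketch: turn continuous pose-keypoint confidences into binary part-visibility
# labels, then mask per-part features with them. All names and the COCO-style
# keypoint grouping are illustrative assumptions, not the paper's definitions.
import torch

# Assume a 17-keypoint pose estimator (COCO layout) and 4 coarse body parts.
PART_KEYPOINTS = {
    "head":  [0, 1, 2, 3, 4],
    "torso": [5, 6, 11, 12],
    "arms":  [7, 8, 9, 10],
    "legs":  [13, 14, 15, 16],
}

def part_visibility(keypoint_conf: torch.Tensor, threshold: float = 0.3) -> torch.Tensor:
    """keypoint_conf: (B, 17) confidences -> (B, num_parts) binary visibility labels."""
    labels = []
    for idxs in PART_KEYPOINTS.values():
        # A part counts as visible if any of its keypoints is confident enough.
        visible = (keypoint_conf[:, idxs] > threshold).any(dim=1).float()
        labels.append(visible)
    return torch.stack(labels, dim=1)  # (B, 4)

# Usage: zero out occluded part features before pooling/matching.
B, num_parts, dim = 2, 4, 256
part_feats = torch.randn(B, num_parts, dim)   # per-part embeddings from any ReID backbone
conf = torch.rand(B, 17)                      # pose-estimator confidences (possibly noisy)
vis = part_visibility(conf)                   # (B, 4) in {0, 1}
masked = part_feats * vis.unsqueeze(-1)       # occluded parts no longer contribute
```

Because the pose input is reduced to a coarse binary label per part, small localization errors in the keypoints do not change the masking, which is consistent with the robustness claim in the abstract.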
Speaker extraction requires a speech sample from the target speaker as the reference. However, enrolling a speaker with a long speech is not practical. We propose a speaker extraction technique that performs in multi...
Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, as a more practical scenario, a...
ISBN:
(Digital) 9798350353006
ISBN:
(Print) 9798350353013
Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, the more practical scenario of aerial-ground person re-identification (AGPReID) among heterogeneous cameras has received minimal attention. To alleviate the disruption of discriminative identity representations by dramatic view discrepancy, the most significant challenge in AGPReID, we propose the view-decoupled transformer (VDT) as a simple yet effective framework. Two major components are designed in VDT to decouple view-related and view-unrelated features, namely hierarchical subtractive separation and orthogonal loss, where the former separates these two features inside the VDT and the latter constrains them to be independent. In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images. Experiments on two datasets show that VDT is a feasible and effective solution for AGPReID, surpassing the previous method on mAP/Rank1 by up to 5.0%/2.7% on CARGO and 3.7%/5.2% on AG-ReID, while keeping the same magnitude of computational complexity. Our project is available at https://***/LinlyAC/VDT-AGPReID.
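The two components named above, subtractive separation of view-unrelated features and an orthogonality constraint, admit a compact sketch. This is not the authors' implementation; the tensor shapes, the cosine-based penalty, and the idea of using two transformer output tokens are assumptions for illustration only.

```python
# Sketch of view decoupling: subtract a view-related component from a global
# feature and penalize correlation between the two resulting components.
import torch
import torch.nn.functional as F

def subtractive_separation(global_feat: torch.Tensor, view_feat: torch.Tensor):
    """Separate a view-unrelated (identity) component by subtracting the view-related one."""
    id_feat = global_feat - view_feat
    return id_feat, view_feat

def orthogonal_loss(id_feat: torch.Tensor, view_feat: torch.Tensor) -> torch.Tensor:
    """Penalize per-sample cosine similarity so the two components stay independent."""
    cos = F.cosine_similarity(id_feat, view_feat, dim=-1)
    return cos.abs().mean()

# Usage with dummy features standing in for two transformer output tokens.
global_feat = torch.randn(8, 768)   # e.g. a [CLS]-like global token (assumption)
view_feat = torch.randn(8, 768)     # e.g. a learned view token (assumption)
id_feat, view_feat = subtractive_separation(global_feat, view_feat)
loss_orth = orthogonal_loss(id_feat, view_feat)
```

In this reading, identity matching would use only id_feat, so camera-viewpoint information is pushed into view_feat and kept out of the retrieval representation.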
Recent vision foundation models can extract universal representations and show impressive abilities in various tasks. However, their application on object detection is largely overlooked, especially without fine-tunin...
While video-based person re-identification (Re-ID) has drawn increasing attention and made great progress in recent years, it is still very challenging to effectively overcome the occlusion problem and the visual ambi...
ISBN:
(Digital) 9781728171685
ISBN:
(Print) 9781728171692
While video-based person re-identification (Re-ID) has drawn increasing attention and made great progress in recent years, it is still very challenging to effectively overcome the occlusion problem and the visual ambiguity problem for visually similar negative samples. On the other hand, we observe that different frames of a video can provide complementary information for each other, and the structural information of pedestrians can provide extra discriminative cues for appearance features. Thus, modeling the temporal relations of different frames and the spatial relations within a frame has the potential to solve the above problems. In this work, we propose a novel Spatial-Temporal Graph Convolutional Network (STGCN) to solve these problems. The STGCN includes two GCN branches, a spatial one and a temporal one. The spatial branch extracts structural information of a human body. The temporal branch mines discriminative cues from adjacent frames. By jointly optimizing these branches, our model extracts robust spatial-temporal information that is complementary to appearance information. As shown in the experiments, our model achieves state-of-the-art results on the MARS and DukeMTMC-VideoReID datasets.
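The two-branch design described above can be sketched with a simple graph convolution applied once over body parts within a frame (spatial) and once over frames per part (temporal). This is a minimal sketch under stated assumptions, not the STGCN architecture itself: layer sizes, the fully connected adjacencies, and the names (SimpleGCN, TwoBranchSTGCN) are illustrative.

```python
# Sketch: a two-branch graph convolution over per-part frame features,
# assuming each frame is already split into P body-part feature vectors.
import torch
import torch.nn as nn

class SimpleGCN(nn.Module):
    """One graph-convolution layer: X' = ReLU(A_norm @ X @ W)."""
    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Row-normalize the adjacency so each node averages its neighbors.
        adj_norm = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return torch.relu(adj_norm @ self.fc(x))

class TwoBranchSTGCN(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.spatial_gcn = SimpleGCN(dim)   # relations among parts within a frame
        self.temporal_gcn = SimpleGCN(dim)  # relations among frames for each part

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, P, D) = batch, frames, body parts, feature dim
        B, T, P, D = feats.shape
        spatial_adj = torch.ones(P, P)      # fully connected parts (assumption)
        temporal_adj = torch.ones(T, T)     # fully connected frames (assumption)

        s = self.spatial_gcn(feats.reshape(B * T, P, D), spatial_adj)
        t = self.temporal_gcn(feats.permute(0, 2, 1, 3).reshape(B * P, T, D),
                              temporal_adj)
        # Pool each branch to a clip-level vector and fuse by concatenation.
        s = s.reshape(B, T, P, D).mean(dim=(1, 2))
        t = t.reshape(B, P, T, D).mean(dim=(1, 2))
        return torch.cat([s, t], dim=-1)    # (B, 2D)

# Usage with dummy per-part frame features: 4 clips, 8 frames, 4 parts, 256-d.
model = TwoBranchSTGCN(dim=256)
clip_feat = model(torch.randn(4, 8, 4, 256))   # -> (4, 512)
```

The clip-level vector from the two branches would then be combined with global appearance features for matching, matching the abstract's claim that the spatial-temporal cues are complementary to appearance.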
Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining), recent pioneer works have proposed to adapt the powerful CLIP to video data, leading to efficient and effective video learners for...
Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simpl...