Search Results - Inner Mongolia University Library

arXiv, 2025

Authors: Esfahani, Arash Nasr; Hosseini, Hamed; Masouleh, Mehdi Tale; Kalhor, Ahmad; Sajedi, Hedieh. Affiliations: Human and Robot Interaction Lab, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran; School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran

As smart homes become more prevalent in daily life, the ability to understand dynamic environments is essential and increasingly depends on AI systems. This study focuses on developing an intelligent algorithm that can navigate a robot through a kitchen, recognize objects, and track their relocation. The kitchen was chosen as the testing ground because of its dynamic nature: objects are frequently moved, rearranged, and replaced. Various techniques, such as SLAM feature-based tracking and deep learning-based object detection (e.g., Faster R-CNN), are commonly used for object tracking. Methods such as optical flow analysis and 3D reconstruction have also been used to track the relocation of objects. These approaches often struggle with lighting variations and partial occlusions, where parts of an object are hidden in some frames but visible in others. The proposed method leverages the YOLOv5 architecture, initialized with pre-trained weights and subsequently fine-tuned on a custom dataset. It introduces a novel frame-scoring algorithm that calculates a score for each object based on its location and features across all frames. This scoring approach identifies changes by determining the best-associated frame for each object and comparing the results in each scene, overcoming limitations of other methods while keeping the design simple. The experimental results demonstrate an accuracy of 97.72%, a precision of 95.83%, and a recall of 96.84%, which highlights the efficacy of the model in detecting spatial changes. © 2025, CC BY.

Keywords: Optical flows

Magic, Superpowers, or Empowerment? A Conceptual Framework for Magic Interaction Techniques

Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), IEEE Conference on

Authors: Bastian Dewitz; Sukran Karaosmanoglu; Robert W. Lindeman; Frank Steinicke. Affiliations: Human-Computer Interaction, Universität Hamburg; HIT Lab NZ, University of Canterbury

This poster presents an approach to systematically distinguish between interaction techniques (ITs) in the context of magic ITs in immersive virtual environments. Currently, heterogeneous terms are used in research to describe the concept of enhancing the abilities of users beyond the limits of the real world, such as magic, super-natural, hyper-natural, superhuman abilities, superpowers, augmentation or empowerment. As a first step towards clarifying and systematically defining the terminology, we propose using the orthogonal concepts of internalizability, congruence, and enhancement (or ICE-cube) as a simple yet expressive conceptual framework.

Keywords: Solid modeling; Three-dimensional displays; Terminology; Conferences; Computational modeling; Virtual environments; User interfaces

DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Huang, Enbo; Zhang, Yuan; Huang, Faliang; Zhang, Guangyu; Liu, Yang. Affiliations: Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China; School of Computer and Information Engineering, Nanning Normal University, Nanning, China; College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

ISBN (print): 9798350368741

Person image synthesis with controllable body poses and appearances is an essential task owing to practical needs in virtual try-on, image editing and video production. However, existing methods face significant challenges with missing details, limb distortion, and garment style deviation. To address these issues, we propose a Disentangled Representations Diffusion Model (DRDM) to generate photo-realistic images from source portraits in specific desired poses and appearances. First, a pose encoder encodes pose features into a high-dimensional space to guide the generation of person images. Second, a body-part subspace decoupling block (BSDB) disentangles features from the different body parts of a source figure and feeds them to the various layers of the noise prediction block, thereby supplying the network with rich disentangled features for generating a realistic target image. Moreover, during inference, we develop a parsing map-based disentangled classifier-free guided sampling method, which amplifies the conditional signals of texture and pose. Extensive experimental results on the Deepfashion dataset demonstrate the effectiveness of our approach in achieving pose transfer and appearance control. The associated project can be found at https://***/lovemusiceb/DRDM. © 2025 IEEE.

Keywords: Diffusion model; Disentangled representation; Human parsing; Person image synthesis

AI-Driven Relocation Tracking in Dynamic Kitchen Environments

International eConference on Computer and Knowledge Engineering (ICCKE)

Authors: Arash Nasr Esfahani; Hamed Hosseini; Mehdi Tale Masouleh; Ahmad Kalhor; Hedieh Sajedi. Affiliations: Human and Robot Interaction Lab, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran; School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran

ISBN (electronic): 9798331511272

ISBN (print): 9798331511289

Keywords: YOLO; Accuracy; Three-dimensional displays; Computational modeling; Heuristic algorithms; Training data; Smart homes; Data models; Object recognition; Testing

Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach

11th RSI International Conference on Robotics and Mechatronics, ICRoM 2023

Authors: Rayati, Keyhan; Feizi, Amirhossein; Beigy, Alireza; Shahverdi, Pourya; Masouleh, Mehdi Tale; Kalhor, Ahmad; Louie, Wing-Yue Geoffrey. Affiliations: University of Tehran, Human and Robot Interaction Lab, Electrical and Computer Engineering, Tehran, Iran; Shahrood University of Technology, Department of Electrical Engineering, Shahrood, Iran; Oakland University, Intelligent Robotics Laboratory, Michigan, United States

ISBN (print): 9798350308105

This paper introduces a novel approach for enabling real-time imitation of human head motion by a Nao robot, with a primary focus on improving human-robot interactions. Using MediaPipe as a computer vision library and DeepFace as an emotion recognition library, this research aims to capture the subtleties of human head motion, including blink actions and emotional expressions, and to incorporate these indicators into the robot's responses. The result is a comprehensive framework that facilitates precise head imitation within human-robot interactions, using a closed-loop approach that gathers real-time feedback from the robot's imitation performance. This feedback loop ensures a high degree of accuracy in modeling head motion, as evidenced by an R² score of 96.3 for pitch and 98.9 for yaw. Notably, the proposed approach holds promise in improving communication for children with autism, offering them a valuable tool for more effective interaction. In essence, the proposed work explores the integration of real-time head imitation and real-time emotion recognition to enhance human-robot interactions, with potential benefits for individuals with unique communication needs. © 2023 IEEE.

Keywords: Emotion recognition
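The abstract above reports its imitation accuracy as an R2 score. Assuming this is the standard coefficient of determination (the record does not say how it was computed), a minimal sketch of the calculation looks like this. The function name and the angle values are hypothetical and purely illustrative; they are not data from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far the imitation is from the target.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how much the target varies around its mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical head pitch angles in degrees (illustrative values only).
human_pitch = [0.0, 10.0, 20.0, 30.0]
robot_pitch = [1.0, 9.0, 21.0, 29.0]
pitch_r2 = r2_score(human_pitch, robot_pitch)
print(round(pitch_r2, 3))
```

A value reported as 96.3 would correspond to 0.963 on this 0-to-1 scale if the abstract expresses the score as a percentage, which is itself an assumption.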

Exploring Auditory Hand Guidance for Eyes-free 3D Path Tracing

2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025

Authors: Abe, Yuki; Hara, Kotaro; Sakamoto, Daisuke; Ono, Tetsuo. Affiliations: Human-Computer Interaction Lab., Hokkaido University, Hokkaido, Sapporo, Japan; School of Computing and Information Systems, Singapore Management University, Singapore, Singapore; Division of Computer Science and Information Technology, Hokkaido University, Hokkaido, Sapporo, Japan; Faculty of Engineering, Kyoto Tachibana University, Kyoto, Japan

ISBN (print): 9798400713958

Guiding a user’s hand along a 3D path can help individuals avoid obstacles and manipulate everyday items eyes-free. While prior work focused on haptic approaches using robots, auditory approaches for 3D path guidance remain underexplored. We prototyped and assessed two auditory hand guidance techniques for 3D path tracing: (1) VERBAL, which repeats spoken directional prompts; and (2) Follow-Your-Finger (FYF), which uses the index finger of the user’s moving hand as an embodied reference and sonifies whether the user should follow the index finger direction. In a controlled pilot study with 12 sighted participants under an eyes-free condition, VERBAL yielded about 1.5 times faster performance than FYF, while FYF achieved an error rate of less than half VERBAL’s. We thus recommend using VERBAL for wider paths with fewer obstacles and FYF for narrower paths with higher collision risks. We also discuss the research direction for applying these findings to real-world tasks. © 2025 Copyright held by the owner/author(s).

Keywords: Three dimensional computer graphics

Bringing Instant Neural Graphics Primitives to Immersive Virtual Reality

Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), IEEE Conference on

Authors: Ke Li; Tim Rolff; Susanne Schmidt; Reinhard Bacher; Simone Frintrop; Wim Leemans; Frank Steinicke. Affiliations: Department of Informatics, Human-Computer Interaction Group, Universität Hamburg; Deutsches Elektronen-Synchrotron DESY, Germany; Department of Informatics, Computer Vision Group, Universität Hamburg

Neural radiance field (NeRF), in particular its extension by instant neural graphics primitives, is a novel rendering method for view synthesis that uses real-world images to build photo-realistic immersive virtual scenes. Despite its enormous potential for virtual reality (VR) applications, there is currently little robust integration of NeRF into typical VR systems available for research and benchmarking in the VR community. In this poster paper, we present an extension to instant neural graphics primitives and bring stereoscopic, high-resolution, low-latency, 6-DoF NeRF rendering to the Unity game engine for immersive VR applications. Link to the repository: https://***/uhhhci/immersive-ngp

Keywords: Three-dimensional displays; Stereo image processing; Conferences; Virtual reality; Games; User interfaces; Rendering (computer graphics)

Free handwritten string recognition using CNN and DP matching

International Symposium on Soft Computing and Intelligent Systems (SCIS)

Authors: Yuta Okajima; Kento Morita; Tetsushi Wakabayashi. Affiliation: Human Computer Interaction Lab., Mie University

While handwritten zip code recognition and ledger sheet recognition are in practical use, character recognition technology for free handwritten documents is still in the process of commercialization. If character recognition for free handwritten documents is realized, scanned images of notes and memos written on paper can be converted into text data on a computer, which will be useful for keyword searches and natural language processing. One reason why character recognition for free handwritten documents is still in the research stage is that it is difficult to detect lines and extract characters from a document. To improve the accuracy of character recognition for free handwritten documents, it is first necessary to improve the accuracy of line detection and character segmentation. In this study, we propose a method to segment characters from words or strings using a CNN and dynamic programming so that the sum of character similarities is optimal, with the aim of improving the accuracy of character segmentation.

Keywords: Handwriting recognition; Codes; Keyword search; Natural language processing; Dynamic programming; Character recognition; Commercialization
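The abstract above describes choosing character boundaries so that the sum of character similarities is optimal. A minimal sketch of that idea as a dynamic program over candidate cut positions follows. It is not the authors' implementation: the function name, the `max_span` limit, and the toy score table (standing in for CNN similarity outputs) are all assumptions made for illustration.

```python
def best_segmentation(n_cuts, score, max_span=3):
    """Pick cuts 0 = c0 < c1 < ... = n_cuts - 1 maximizing the summed score.

    score(i, j) returns the similarity of the segment between cut i and
    cut j to its best-matching character, or None if it is not a candidate.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * n_cuts   # best[j]: best total for segments ending at cut j
    back = [-1] * n_cuts        # back[j]: previous cut on that best path
    best[0] = 0.0
    for j in range(1, n_cuts):
        for i in range(max(0, j - max_span), j):
            s = score(i, j)
            if best[i] == neg_inf or s is None:
                continue
            if best[i] + s > best[j]:
                best[j] = best[i] + s
                back[j] = i
    if best[-1] == neg_inf:
        raise ValueError("no valid segmentation found")
    # Walk the back-pointers from the last cut to recover the segments.
    segments = []
    j = n_cuts - 1
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    segments.reverse()
    return best[-1], segments


# Hypothetical similarity scores for segments between 4 candidate cuts.
toy_scores = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.3,
              (0, 2): 0.4, (1, 3): 0.8, (0, 3): 0.1}
total, segments = best_segmentation(4, lambda i, j: toy_scores.get((i, j)))
print(segments)  # [(0, 1), (1, 3)]
```

In this toy case the program prefers two characters, (0, 1) and (1, 3), over three narrow ones because their combined similarity is higher.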

Post-Training Quantization in Brain-Computer Interfaces Based on Event-Related Potential Detection

IEEE International Conference on Systems, Man and Cybernetics

Authors: Hubert Cecotti; Dalvir Dhaliwal; Hardip Singh; Yogesh Kumar Meena. Affiliations: Department of Computer Science, California State University, Fresno, USA; Human-AI Interaction (HAIx) Lab, IIT Gandhinagar, India

ISBN (electronic): 9781665410205

ISBN (print): 9781665410212

Post-training quantization (PTQ) is a technique used to optimize and reduce the memory footprint and computational requirements of machine learning models. It has been used primarily for neural networks. For brain-computer interfaces (BCIs) that are fully portable and usable in various situations, it is necessary to provide approaches that are lightweight for storage and computation. In this paper, we propose the evaluation of post-training quantization on state-of-the-art approaches in brain-computer interfaces and assess their impact on accuracy. We evaluate the performance of the single-trial detection of event-related potentials representing one major BCI paradigm. The area under the receiver operating characteristic curve drops from 0.861 to 0.825 with PTQ when applied on both spatial filters and the classifier, while reducing the size of the model by a factor of about 15. The results support the conclusion that PTQ can substantially reduce the memory footprint of the models while keeping roughly the same level of accuracy.

Keywords: Quantization (signal); Accuracy; Memory management; Deep architecture; Receivers; Machine learning; Brain modeling; Spatial filters; Brain-computer interfaces; Hardware
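As a rough illustration of the post-training quantization idea in the abstract above, the sketch below applies symmetric 8-bit quantization to a list of already-trained weights. The record does not state which bit width or scheme the authors used, so the int8 choice, the function names, and the weight values are assumptions for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in quantized]


# Hypothetical trained weights, e.g. coefficients of a spatial filter.
weights = [0.50, -1.27, 0.03, 0.90]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)  # [50, -127, 3, 90]
```

No retraining is involved: only the stored values change, and the rounding error per weight is bounded by half the scale. That bound is why accuracy can stay close to the original while storage shrinks.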

Quantifying Spatial Domain Explanations in BCI using Earth Mover’s Distance

International Joint Conference on Neural Networks (IJCNN)

Authors: Param Rajpura; Hubert Cecotti; Yogesh Kumar Meena. Affiliations: Human-AI Interaction (HAIx) Lab, IIT Gandhinagar, India; Department of Computer Science, California State University, Fresno, USA

ISBN (electronic): 9798350359312

ISBN (print): 9798350359329

Brain-computer interface (BCI) systems facilitate unique communication between humans and computers, benefiting severely disabled individuals. Despite decades of research, BCIs are not fully integrated into clinical and commercial settings. It is crucial to assess and explain BCI performance, offering clear explanations so that potential users avoid frustration when a system does not work as expected. This work investigates the efficacy of different deep learning and Riemannian geometry-based classification models in the context of motor imagery (MI) based BCI using electroencephalography (EEG). We then propose an optimal transport theory-based approach using earth mover’s distance (EMD) to quantify the comparison of the feature relevance map with the domain knowledge of neuroscience. For this, we utilized explainable AI (XAI) techniques for generating feature relevance in the spatial domain to identify important channels for model outcomes. Three state-of-the-art models are implemented: 1) a Riemannian geometry-based classifier, 2) EEGNet, and 3) EEG Conformer. The observed trend in model accuracy across the different architectures on the dataset correlates with the proposed feature relevance metrics. The models with diverse architectures perform significantly better when trained on channels relevant to motor imagery than with data-driven channel selection. This work draws attention to the necessity for interpretability and for incorporating metrics beyond accuracy, and it underscores the value of combining domain knowledge and quantifying model interpretations with data-driven approaches in creating reliable and robust brain-computer interfaces (BCIs).

Keywords: Measurement; Earth; Accuracy; Computer architecture; Brain modeling; Motors; Feature extraction
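To make the earth mover's distance comparison in the abstract above concrete, here is a minimal sketch for the simplest case: two relevance distributions over the same bins laid out on a line with unit spacing. This is a deliberate simplification and an assumption, since EEG channels sit on a scalp layout rather than a line and the record does not give the authors' ground distance. The function name and the example values are hypothetical.

```python
def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same 1D bins.

    Both inputs are normalized to sum to 1. With unit spacing between
    neighboring bins, the EMD equals the summed absolute difference of
    the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("histograms must have positive total mass")
    carried = 0.0  # mass that still has to move past the current bin
    total = 0.0
    for a, b in zip(p, q):
        carried += a / sum_p - b / sum_q
        total += abs(carried)
    return total


# Hypothetical values: model relevance per channel vs. a reference
# distribution encoding where motor imagery activity is expected.
model_relevance = [0.1, 0.6, 0.3]
expected_relevance = [0.0, 0.5, 0.5]
distance = emd_1d(model_relevance, expected_relevance)
print(distance)  # about 0.3
```

A smaller distance means the model's relevance map sits closer to the expected one, which is the sense in which such a metric can complement accuracy.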
