Handwritten digit recognition remains a pivotal area in machine learning and computer vision, essential for applications like license plate identification, form processing, and historical document reading. Addressing ...
ISBN: 9781665445092 (print)
The task of motion transfer between a source dancer and a target person is a special case of the pose transfer problem, in which the target person changes their pose in accordance with the motions of the dancer. In this work, we propose a novel method that can reanimate a single image using arbitrary video sequences unseen during training. The method combines three networks: (i) a segmentation-mapping network, (ii) a realistic frame-rendering network, and (iii) a face refinement network. By separating the task into these three stages, we are able to produce a novel sequence of realistic frames that captures natural motion and appearance. Our method obtains significantly better visual quality than previous methods and is able to animate diverse body types and appearances captured in challenging poses.
Human activity recognition is important for a wide range of applications such as surveillance systems and human-computer interaction. Computer-vision-based human activity recognition suffers from performance degradati...
ISBN: 9781665448994 (print)
Video conference solutions are now widely adopted in business, education, and government. People segmentation is crucial for supporting virtual backgrounds, an essential video conference function for protecting users' privacy. This paper presents a people segmentation framework called CE-PeopleSeg, which employs an efficient segmentation method, structural pruning, and dynamic frame skipping to achieve fast inference on CPU. Our extensive experiments show that CE-PeopleSeg achieves a prediction mIoU of 87.9% on Supervised People Dataset while reaching a real-time inference speed of 32.40 fps on CPU with a very low usage of 10%. Our code will be released at https://***/geekJZY/***.
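The mIoU figure quoted above is a standard segmentation metric: the intersection-over-union between predicted and ground-truth pixels, averaged over classes. The following is a minimal sketch of that general definition for a two-class (person versus background) mask, not the CE-PeopleSeg evaluation code; the function name `mean_iou` and the flattened-list mask representation are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union, averaged over the classes that
    appear in either the prediction or the ground truth.

    pred, target: flat sequences of integer class ids, one per pixel
    (hypothetical representation chosen for this sketch).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 6-pixel masks: 1 = person, 0 = background.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = mean_iou(pred, target)
```

In this toy case each class has two pixels of overlap out of a union of four, so both per-class IoU values are 0.5 and the mean is 0.5.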
In an era where communication is key, the gap in accessible tools for those with hearing impairments or speech disabilities is significant. These individuals often face obstacles in education and social interaction due to a heavy reliance on spoken language and a lack of sign language resources. The Interactive Sign Language Learning System (ISLLS) addresses this gap by providing an innovative platform for learning sign language, enhanced with voice output to assist individuals with speech disabilities. This feature allows for auditory feedback alongside visual sign learning, enriching the educational experience. The ISLLS employs advanced technologies like computer vision and deep learning to facilitate sign recognition and text-to-sign conversion. With the new voice output, it further aids those with speech impairments, expanding its inclusivity. This system offers a comprehensive learning tool that caters to a diverse user base, enabling people with speech difficulties to engage more fully with the world. The ISLLS is a significant step towards a more inclusive society, offering a user-friendly platform that not only improves the learning of sign language but also empowers people with speech disabilities to connect and thrive, representing progress in both technology and social inclusivity.
Recent research using deep learning has been actively conducted in various fields, including computer vision, reinforcement learning, classifiers, and more. AlphaGo, which learned to play Go and beat professional play...
Digit recognition is foundational in pattern recognition and machine learning, with applications in document processing and optical character recognition. Current research often targets English digits, overlooking la...
ISBN: 9781665445092 (print)
Multi-label activity recognition is designed for recognizing multiple activities that are performed simultaneously or sequentially in each video. Most recent activity recognition networks focus on single activities and assume only one activity in each video. These networks extract features shared across all activities, which are not designed for multi-label activities. We introduce an approach to multi-label activity recognition that extracts independent feature descriptors for each activity and learns activity correlations. This structure can be trained end-to-end and plugged into any existing network structure for video classification. Our method outperformed state-of-the-art approaches on four multi-label activity recognition datasets. To better understand the activity-specific features that the system generated, we visualized these features on the Charades dataset. The code will be released later.
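The distinction between single-activity and multi-label prediction described above can be illustrated at the output layer. The sketch below is a generic contrast between an argmax decision (exactly one activity per video) and independent per-activity sigmoid scores with a threshold (any number of activities per video). It is not the authors' architecture; the function names, the label names, the logit values, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math


def predict_single(logits, labels):
    """Single-activity decision: return the one label with the highest score."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


def predict_multi(logits, labels, threshold=0.5):
    """Multi-label decision: score each activity independently with a
    sigmoid and keep every label whose probability reaches the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(labels, probs) if p >= threshold]


# Hypothetical per-activity scores for one video.
labels = ["sitting", "eating", "running"]
logits = [2.0, 1.2, -3.0]

single = predict_single(logits, labels)  # one activity only
multi = predict_multi(logits, labels)    # every activity above threshold
```

With these example scores the single-label decision keeps only "sitting", while the multi-label decision keeps both "sitting" and "eating", since each activity is judged on its own score.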
ISBN: 9781665445092 (print)
Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. FORWARD, ROTATE) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
The ubiquitous physical mouse, despite its years of service, presents challenges in accessibility, hygiene, and user experience. This paper explores the transformative potential of AI Virtual Mouse (AI VM) systems, po...