When entering text precisely by voice, users may encounter colloquial insertions, inappropriate wording, and recognition errors, which make voice editing difficult. Users need to locate the errors and then correct t...
ISBN: 9798400713316 (print)
Graphical user interface (GUI) agents are autonomous systems that interpret and generate actions, enabling intelligent user assistance and automation. Effective training of these agents presents unique challenges, such as sparsity in supervision signals, scalability for large datasets, and the need for nuanced user understanding. We propose stateful screen schema, an efficient representation of GUI interactions that captures key user actions and intentions over time. Building on this foundation, we introduce ScreenLLM, a set of multimodal large language models (MLLMs) tailored for advanced UI understanding and action prediction. Extensive experiments on both open-source and proprietary models show that ScreenLLM accurately models user behavior and predicts actions. Our work lays the foundation for scalable, robust, and intelligent GUI agents that enhance user interaction in diverse software environments.
Digital fabrication machines for makers have expanded access to manufacturing processes such as 3D printing, laser cutting, and milling. While digital models encode the data necessary for a machine to manufacture an o...
ISBN: 9781450390927 (print)
Motion tracking systems with viewpoint concerns or whose marker data include unreliable states have proven difficult to use despite many impactful benefits. We propose a technique inspired by active vision that uses a customized hill-climbing approach to control a robot-sensor setup, and we apply it to a magnetic induction system capable of occlusion-free motion tracking. Our solution reduces the impact of displacement and orientation issues for markers that inherently present a dead-angle range, which disturbs usability and accuracy. The resulting interface succeeds in stabilizing previously unexploitable data while preventing sub-optimal states for up to hundreds of occurrences per recording, and it features an approximately 40% decrease in tracking error.
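As a rough illustration of the hill-climbing idea mentioned in this abstract, the following is a minimal generic sketch, not the authors' customized method. It assumes a hypothetical scalar quality score over a sensor pose (here a toy two-coordinate function, `toy_quality`); the names `hill_climb`, `toy_quality`, and all parameters are illustrative assumptions.

```python
def hill_climb(score, start, step=1.0, min_step=1e-3, max_iters=1000):
    """Greedy coordinate hill climbing.

    Tries moving each coordinate by +/- step, keeps any move that raises
    the score, and halves the step when no move helps.
    """
    current = list(start)
    best = score(current)
    for _ in range(max_iters):
        improved = False
        for i in range(len(current)):
            for delta in (step, -step):
                candidate = list(current)
                candidate[i] += delta
                value = score(candidate)
                if value > best:
                    current, best = candidate, value
                    improved = True
        if not improved:
            step /= 2.0  # refine the search around the current optimum
            if step < min_step:
                break
    return current, best


def toy_quality(pose):
    """Hypothetical tracking-quality score, highest at pose (3, -1)."""
    x, y = pose
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)


best_pose, best_quality = hill_climb(toy_quality, start=[0.0, 0.0])
```

In a real setup the score would come from live marker data rather than a closed-form function, and plain hill climbing can stall in local optima, which is presumably one reason the abstract describes a customized variant.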
In this paper, we propose EIT-kit, an electrical impedance tomography toolkit for designing and fabricating health and motion sensing devices. EIT-kit contains (1) an extension to a 3D editor for personalizing the for...
We present a tool that allows developers to debug hardware and software and their interaction in an early design stage. We combine a SystemC virtual prototype (VP) with an easily configurable and interactive graphical use...
Tibetan culture is an indispensable part of world culture, and the Tibetan language is the primary channel for understanding it. There are many types of Tibetan learning software, but most of the interaction between the software and ...
ISBN: 9798400700965 (print)
We present Melody Slot Machine on iPhone, an iPhone application that uses the melodic morphing method based on the Generative Theory of Tonal Music (GTTM). We previously developed a demonstration system called Melody Slot Machine to introduce the melodic morphing method and presented it at international conferences and exhibitions. Because in-person demonstrations were reduced due to the COVID-19 pandemic, we implemented an application with the same functionality and made it available for download so that users can experience it. Melody Slot Machine on iPhone currently offers two pieces of content for download, and we plan to add more in the future.
ISBN: 9781450384612 (print)
This paper designs a real-time, video stream-oriented action recognition platform named SmartCamera. It aims to detect target actions in real time in a resource-constrained edge environment. SmartCamera adopts a 3D convolutional network-based sliding window as the action classification model and introduces a flexible sliding window localization method to reduce the computational complexity of action recognition without harming behavioral consistency. The SmartCamera prototype comprises five main modules: Realtime Video Stream Collection, Video Stream Preprocessing, Action Recognition, Sliding Window Localization, and User Interface. The preliminary evaluation shows that the system can detect target actions with acceptable accuracy and latency.
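To make the sliding-window idea in this abstract concrete, here is a minimal sketch of a plain fixed-stride sliding window over a frame sequence. It is not SmartCamera's flexible localization method, whose details the abstract does not give. The classifier is a hypothetical stand-in (`toy_classify`) for a 3D convolutional model, and the window size, stride, and threshold are assumed values.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_window_detect(
    frames: Sequence[int],
    classify: Callable[[Sequence[int]], Tuple[str, float]],
    window: int = 16,
    stride: int = 8,
    threshold: float = 0.75,
) -> List[Tuple[int, int, str]]:
    """Classify each fixed-length clip and keep the confident ones.

    Returns (start_frame, end_frame_exclusive, label) tuples.
    """
    detections = []
    for start in range(0, len(frames) - window + 1, stride):
        clip = frames[start:start + window]
        label, confidence = classify(clip)
        if confidence >= threshold:
            detections.append((start, start + window, label))
    return detections


def toy_classify(clip: Sequence[int]) -> Tuple[str, float]:
    """Hypothetical classifier: confidence is the share of 'active' frames."""
    return "action", sum(clip) / len(clip)


# Toy stream: 16 idle frames, 16 active frames, 16 idle frames.
stream = [0] * 16 + [1] * 16 + [0] * 16
detections = sliding_window_detect(stream, toy_classify)
```

A larger stride lowers the number of classifier calls at the cost of coarser localization, which is the kind of trade-off a flexible window placement would aim to manage.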
ISBN: 9781450390927 (print)
We evaluate how realistically ground inclination can be perceived with our simple VR walking platform. First, we prepared seven maps with ground inclinations from -30 to 30 degrees in 10-degree steps. We then conducted a perception experiment on the sensation of inclination using both a treadmill and our proposed platform, together with a questionnaire evaluating presence, fatigue, and exhilaration. The results show that our proposed platform provides a sense of presence equivalent to that of the treadmill and also lets users perceive upward and downward ground inclination.