Search Results - Inner Mongolia University Library

arXiv, 2025

Authors: Wang, Haoyang; Xu, Jingao; Luo, Xinyu; Chen, Xuecheng; Zhang, Ting; Duan, Ruiyang; Liu, Yunhao; Chen, Xinlei. Affiliations: Shenzhen International Graduate School, Tsinghua University, China; Carnegie Mellon University, United States; Meituan Academy of Robotics Shenzhen, China; School of Software, Tsinghua University, China; Pengcheng Laboratory, Shenzhen, China; RISC-V International Open Source Laboratory, Shenzhen, China

For precise, efficient, and safe drone landings, ground platforms should locate descending drones accurately and in real time and guide them to designated spots. While mmWave sensing combined with cameras improves localization accuracy, traditional frame cameras sample at a lower frequency than mmWave radar, which creates a bottleneck in system throughput. In this work, we replace the traditional frame camera with an event camera, a novel sensor whose sampling frequency is well matched to that of mmWave radar within the ground platform setup, and introduce mmE-Loc, a high-precision, low-latency ground localization system designed for drone landings. To fully leverage the temporal consistency and spatial complementarity between these modalities, we propose two innovative modules, consistency-instructed collaborative tracking and graph-informed adaptive joint optimization, for accurate drone measurement extraction and efficient sensor fusion. Extensive real-world experiments in landing scenarios from a leading drone delivery company demonstrate that mmE-Loc outperforms state-of-the-art methods in both localization accuracy and latency. Copyright © 2025, The Authors. All rights reserved.
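
The abstract does not detail the graph-informed adaptive joint optimization, so the following is only a minimal sketch of the underlying idea of fusing two complementary sensors, under stated assumptions: each sensor is assumed to report an independent 3D position with known per-axis Gaussian variance, in which case the weighted least-squares solution is the inverse-variance weighted mean. The function `fuse_positions` and all numbers are hypothetical, and this is not mmE-Loc's algorithm.

```python
from typing import List, Sequence, Tuple


def fuse_positions(
    radar_xyz: Sequence[float],
    radar_var: Sequence[float],
    camera_xyz: Sequence[float],
    camera_var: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse two independent per-axis position estimates by inverse-variance weighting.

    This is the closed-form minimizer of
        sum_k (x_k - r_k)^2 / var_r_k + (x_k - c_k)^2 / var_c_k,
    i.e. weighted least squares with one unknown position and two measurements.
    Hypothetical stand-in only; not mmE-Loc's graph-based optimizer.
    """
    fused, fused_var = [], []
    for r, vr, c, vc in zip(radar_xyz, radar_var, camera_xyz, camera_var):
        w_r, w_c = 1.0 / vr, 1.0 / vc  # the less noisy sensor gets the larger weight
        fused.append((w_r * r + w_c * c) / (w_r + w_c))
        fused_var.append(1.0 / (w_r + w_c))  # never larger than either input variance
    return fused, fused_var


# Hypothetical readings (metres) and variances (m^2), for illustration only.
pos, var = fuse_positions([1.0, 2.0, 3.0], [0.04, 0.04, 0.01],
                          [1.1, 2.1, 2.9], [0.01, 0.01, 0.04])
print(pos, var)
```

Per axis, the fused estimate leans toward whichever sensor is assumed to be more precise on that axis, which is the intuition behind exploiting spatial complementarity.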

Keywords: Drones

Training wheels for the robot: Learning from demonstration using simulation

2012 AAAI Fall Symposium

Authors: Koenig, Nathan; Matarić, Maja. Affiliations: Open Source Robotics Foundation, Mountain View, CA, United States; University of Southern California, Los Angeles, CA, United States

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Liang, Chengdong; Zhang, Xiao-Lei; Zhang, BinBin; Wu, Di; Li, Shengqiang; Song, Xingchen; Peng, Zhendong; Pan, Fuping. Affiliations: Northwestern Polytechnical University, School of Marine Science and Technology, Xi'an, China; Horizon Robotics, Beijing, China; WeNet Open Source Community

ISBN (print): 9781728163277

Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown strong performance in terms of streaming capability, accuracy, and latency. In this paper, we present fast-U2++, an enhanced version of U2++ that further reduces partial latency. The core idea of fast-U2++ is to output partial results from the bottom layers of its encoder using a small chunk, while using a large chunk in the top layers of its encoder to compensate for the performance degradation caused by the small chunk. Moreover, we use knowledge distillation to reduce the token emission latency. We present extensive experiments on the Aishell-1 dataset. Experiments and ablation studies show that, compared to U2++, fast-U2++ reduces model latency from 320 ms to 80 ms and achieves a character error rate (CER) of 5.06% in a streaming setup. © 2023 IEEE.
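
Chunk-based streaming in U2-style encoders is typically implemented with an attention mask that lets each frame see all past frames plus the rest of its own chunk. The minimal sketch below builds such a mask in plain Python and shows why a smaller chunk lets a frame produce output sooner. The function names and the 2-frame / 4-frame chunk sizes are hypothetical, not fast-U2++'s configuration, and real implementations may also limit how many past chunks are visible.

```python
from typing import List


def chunk_attention_mask(num_frames: int, chunk_size: int) -> List[List[bool]]:
    """Return mask[i][j] == True when frame i may attend to frame j.

    Frames are grouped into consecutive chunks of `chunk_size`; each frame sees
    every earlier frame plus the rest of its own chunk, but nothing beyond it.
    """
    mask = []
    for i in range(num_frames):
        chunk_end = min(num_frames, (i // chunk_size + 1) * chunk_size)
        mask.append([j < chunk_end for j in range(num_frames)])
    return mask


def lookahead_frames(chunk_size: int, i: int) -> int:
    """Number of future frames frame i must wait for before its chunk is complete."""
    return (i // chunk_size + 1) * chunk_size - 1 - i


# Hypothetical split: small chunks in the bottom layers (early partial results),
# large chunks in the top layers (more right context for accuracy).
bottom = chunk_attention_mask(8, chunk_size=2)
top = chunk_attention_mask(8, chunk_size=4)
for row in bottom[:3]:
    print("".join("x" if visible else "." for visible in row))
# xx......
# xx......
# xxxx....
```

With these illustrative sizes, the first frame of a chunk waits 1 future frame in the bottom layers but 3 in the top layers, which is the trade-off the abstract describes between partial-result latency and accuracy.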

Keywords: fast-U2++; model latency; streaming speech recognition; token emission latency

tf: The Transform Library

IEEE Conference on Technologies for Practical Robot Applications

Author: Tully Foote. Affiliation: Open Source Robotics Foundation, Mountain View

ISBN (print): 9781467362238

The tf library was designed to provide a standard way to keep track of coordinate frames and transform data within an entire system, so that individual component users can be confident that the data is in the coordinate frame they want without requiring knowledge of all the coordinate frames in the system. During early development of the Robot Operating System (ROS), keeping track of coordinate frames was identified as a common pain point for developers. The complexity of this task made it a common source of bugs when developers improperly applied transforms to data. The problem is also challenging because information about the transformations between different sets of coordinate frames often comes from distributed sources. This paper explains the complexity of the problem and distills the requirements. It then discusses the design of the tf library in relation to those requirements. A few use cases are presented to demonstrate successful deployment of the library, along with powerful extensions to the core capabilities, such as the ability to transform data in time as well as in space.
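
The bookkeeping the abstract describes can be illustrated with a tree of frames in which each edge stores a child-to-parent transform; the transform between any two frames is obtained by composing edges up to the shared root. The sketch below is a 2D, timestamp-free simplification under that assumption. The class and method names (`FrameTree`, `set_transform`, `lookup`) are hypothetical and are not tf's actual interface, which works in 3D and also keeps transforms over time.

```python
import math
from typing import Dict, Tuple

# A 2D rigid transform (x, y, theta) mapping points from a child frame into its parent.
Transform = Tuple[float, float, float]
IDENTITY: Transform = (0.0, 0.0, 0.0)


def compose(a: Transform, b: Transform) -> Transform:
    """Return a after b: apply b first, then a."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)


def inverse(a: Transform) -> Transform:
    x, y, th = a
    c, s = math.cos(th), math.sin(th)
    return (-c * x - s * y, s * x - c * y, -th)


def apply(t: Transform, point: Tuple[float, float]) -> Tuple[float, float]:
    x, y, th = t
    c, s = math.cos(th), math.sin(th)
    px, py = point
    return (x + c * px - s * py, y + s * px + c * py)


class FrameTree:
    """Hypothetical minimal frame tree; assumes all frames share one root."""

    def __init__(self) -> None:
        self.edges: Dict[str, Tuple[str, Transform]] = {}  # child -> (parent, child-to-parent)

    def set_transform(self, parent: str, child: str, t: Transform) -> None:
        self.edges[child] = (parent, t)

    def to_root(self, frame: str) -> Transform:
        # Walk up the tree, accumulating root <- ... <- frame.
        total = IDENTITY
        while frame in self.edges:
            parent, t = self.edges[frame]
            total = compose(t, total)
            frame = parent
        return total

    def lookup(self, target: str, source: str) -> Transform:
        # target <- root <- source
        return compose(inverse(self.to_root(target)), self.to_root(source))


tree = FrameTree()
tree.set_transform("map", "base_link", (1.0, 0.0, math.pi / 2))  # robot rotated 90 degrees
tree.set_transform("base_link", "laser", (0.2, 0.0, 0.0))        # sensor mounted 0.2 m ahead
print(apply(tree.lookup("map", "laser"), (1.0, 0.0)))  # approximately (1.0, 1.2)
```

The component that owns the laser only needs to publish its own edge; a consumer asking for the point in `map` never has to know the intermediate `base_link` frame exists, which is the decoupling the abstract motivates.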

Keywords: confident; information; extensions; Coordinates; FRAMES; Frame; Libraries; Transform; bugs

The HumanoidLab: Involving students in a research centre through an educational initiative

6th International Conference on Computer Supported Education, CSEDU 2014

Authors: Alenyà, Guillem; Rivero, José Luis; Rull, Aleix; Grosch, Patrick; Hernández, Sergi. Affiliations: Llorens i Artigas 4-6, 08028 Barcelona, Spain; Open Source Robotics Foundation, 419 N Shoreline Blvd, Mountain View, CA 94043, United States

ISBN (print): 9789897580215

The HumanoidLab is an activity, running for more than five years, that uses educational robots to bring students closer to our Research Centre. Different commercial educational humanoid platforms have been used to introduce students to various aspects of robotics through projects, with guidance and assistance provided along the way. About 40 students have carried out small mechanics, electronics, or programming projects that improve the robots by adding features. Robotics competitions are used as a motivational tool. A two-week course was started that has been attended by 80 undergraduate students, and a shorter version by more than 100 secondary school students. The experience has been very positive for both the students and the institution: some of these students have carried out their academic projects and research in robotics and remain in the robotics field, and some of them are currently in research groups at IRI.

Keywords: Students

Wekws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Wang, Jie; Xu, Menglong; Hou, Jingyong; Zhang, Binbin; Zhang, Xiao-Lei; Xie, Lei; Pan, Fuping. Affiliations: Northwestern Polytechnical University, School of Marine Science and Technology, Xi'an, China; WeNet Open Source Community, China; Horizon Robotics, Beijing, China; School of Computer Science, Xi'an, China

ISBN (print): 9781728163277

Keyword spotting (KWS) enables speech-based user interaction and is gradually becoming an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and easy-to-apply E2E KWS toolkit. WeKws contains implementations of several state-of-the-art backbone networks, with which it achieves highly competitive results on three publicly available datasets. To make WeKws a pure E2E toolkit, we use a refined max-pooling loss that lets the model learn the ending position of the keyword by itself, which significantly simplifies the training pipeline and makes WeKws very efficient to apply in real-world scenarios. The toolkit is publicly available at https://***/wenet-e2e/wekws. © 2023 IEEE.
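
The basic max-pooling loss that a refined variant would build on can be sketched for a single keyword and a single utterance as follows. WeKws's refinements are not reproduced here; the function name and the `eps` clamp are assumptions for illustration.

```python
import math
from typing import Sequence


def max_pooling_loss(frame_probs: Sequence[float], is_keyword: bool, eps: float = 1e-8) -> float:
    """Utterance-level loss computed from per-frame keyword posteriors.

    Positive utterance: only the single most confident frame is pushed toward 1,
    so no frame-level alignment is needed and the model chooses where to fire.
    Negative utterance: the most confident frame is pushed toward 0, so the loss
    stays high until every frame's posterior is low.
    """
    peak = max(frame_probs)
    if is_keyword:
        return -math.log(max(peak, eps))
    return -math.log(max(1.0 - peak, eps))


probs = [0.1, 0.9, 0.2]  # hypothetical per-frame posteriors for one utterance
print(max_pooling_loss(probs, is_keyword=True))   # equals -log(0.9)
print(max_pooling_loss(probs, is_keyword=False))  # equals -log(0.1)
```

Because only the peak frame receives a training signal in positive utterances, no frame-level labels of where the keyword ends are needed, which is how this family of losses simplifies the training pipeline.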

Keywords: Computer vision

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Chen, Jie; Song, Xingchen; Peng, Zhendong; Zhang, Binbin; Pan, Fuping; Wu, Zhiyong. Affiliations: Tsinghua University, Shenzhen International Graduate School, Shenzhen, China; Horizon Robotics, Beijing, China; WeNet Open Source Community; The Chinese University of Hong Kong, Hong Kong

ISBN (print): 9781728163277

Recent advances in neural text-to-speech (TTS) models have brought thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be trained stably and are more parameter-efficient than other generative models. Because transmitting data between customers and the cloud introduces high latency and the risk of exposing private data, deploying TTS models on edge devices is preferred. Implementing DPMs on edge devices raises two practical problems. First, current DPMs are not lightweight enough for resource-constrained devices. Second, DPMs require many denoising steps during inference, which increases latency. In this work, we present LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight U-Net diffusion decoder and a training-free fast sampling technique, reducing both model parameters and inference latency. Streaming inference is also implemented in LightGrad to reduce latency further. Compared with Grad-TTS, LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency, while preserving comparable speech quality on both Mandarin Chinese and English with 4 denoising steps. © 2023 IEEE.
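
The latency argument rests on the fact that each denoising step costs one forward pass of the decoder network. The sketch below illustrates this with a generic Euler integrator for the probability-flow ODE in the sigma(t) = t parameterization, dx/dsigma = (x - D(x, sigma)) / sigma; it is not LightGrad's fast sampler. The toy denoiser, the noise schedule, and all numbers are hypothetical: the toy "data" is a single point, so the ODE happens to be linear and four steps land on it exactly, which a real network does not guarantee.

```python
from typing import Callable, List, Tuple

Denoiser = Callable[[float, float], float]


def euler_sample(x_start: float, sigmas: List[float], denoiser: Denoiser) -> Tuple[float, int]:
    """Integrate dx/dsigma = (x - D(x, sigma)) / sigma from sigmas[0] down to sigmas[-1].

    Returns the final sample and the number of denoiser (network) calls.
    """
    x, calls = x_start, 0
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoiser(x, s_cur)) / s_cur  # one network evaluation per step
        calls += 1
        x = x + (s_next - s_cur) * slope  # Euler step toward lower noise
    return x, calls


def toy_denoiser(x: float, sigma: float) -> float:
    """Hypothetical ideal denoiser for data concentrated at the single point 0.7."""
    return 0.7


sigmas = [80.0, 20.0, 5.0, 1.0, 0.0]  # hypothetical 4-step noise schedule
noise = 1.3                           # a fixed draw of standard Gaussian noise
x_start = 0.7 + sigmas[0] * noise     # sample of the noised toy data at the largest sigma
sample, calls = euler_sample(x_start, sigmas, toy_denoiser)
print(sample, calls)
```

Since `calls` equals the number of steps, cutting a sampler from dozens or hundreds of steps to a handful cuts decoder compute roughly proportionally, provided the solver stays accurate at that step count.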

Keywords: Diffusion

Extending open dynamics engine for the DARPA virtual robotics challenge

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, vol. 8810, pp. 37-48

Authors: Hsu, John M.; Peters, Steven C. Affiliation: Open Source Robotics Foundation, 419 N. Shoreline Blvd, Mountain View, CA 94041, United States

The DARPA Virtual Robotics Challenge (VRC) [1] was a cloud-based robotic simulation competition. Teams competed by writing control software for a humanoid robot to perform disaster response tasks in real-time simulation. Simulating the physics and sensors of a humanoid robot in real time presented challenges related to the trade-off between simulation accuracy and computational time. The Projected Gauss-Seidel (PGS) iterative solver was chosen for its performance and robustness, but it lacks the accuracy and fidelity required for reliable simulation of task-level behaviors. This paper presents the modeling decisions and algorithmic improvements made to the Open Dynamics Engine (ODE) physics solver that improved PGS accuracy and fidelity without sacrificing its real-time simulation performance in the VRC. These improvements allowed stable simulation regardless of user input during the VRC and supported reliable contact dynamics during VRC tasks without violating the near-real-time requirement. © Springer International Publishing Switzerland 2014.
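
For readers unfamiliar with PGS, the sketch below shows the textbook iteration for a linear complementarity problem (LCP), a common way to pose frictionless contact: contact impulses lambda and separating velocities w must both be non-negative, and they cannot both be non-zero at the same contact. It is not ODE's implementation, which also handles friction bounds and other constraint types; the matrix, vector, and function name are hypothetical.

```python
from typing import List, Sequence


def projected_gauss_seidel(
    A: Sequence[Sequence[float]], b: Sequence[float], iterations: int = 50
) -> List[float]:
    """Approximately solve the LCP: w = A*lam + b, lam >= 0, w >= 0, lam . w = 0.

    Sweep the rows in order, solve row i for lam_i using the most recent values
    of the other entries, then project onto lam_i >= 0. Assumes A[i][i] > 0.
    """
    n = len(b)
    lam = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            w_i = b[i] + sum(A[i][j] * lam[j] for j in range(n))  # current residual of row i
            lam[i] = max(0.0, lam[i] - w_i / A[i][i])  # Gauss-Seidel update, then projection
    return lam


# Hypothetical two-contact system.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [-1.0, 1.0]
lam = projected_gauss_seidel(A, b)
w = [b[i] + sum(A[i][j] * lam[j] for j in range(2)) for i in range(2)]
print(lam, w)  # [0.5, 0.0] [0.0, 1.5]
```

The fixed iteration budget is what makes PGS attractive under a real-time deadline: each sweep is cheap and the loop can stop at any time, but a small budget leaves residual constraint error, which is the accuracy and fidelity gap the abstract refers to.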

Keywords: Anthropomorphic robots

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

arXiv, 2022

Authors: Zhang, Binbin; Wu, Di; Peng, Zhendong; Song, Xingchen; Yao, Zhuoyuan; Lv, Hang; Xie, Lei; Yang, Chao; Pan, Fuping; Niu, Jianwei. Affiliations: Horizon Robotics, Beijing, China; School of Computer Science, Northwestern Polytechnical University, Xi'an, China; WeNet Open Source Community, China

Recently, we made available WeNet [1], a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and meet various production requirements, in this paper we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which incorporates future contextual information through a right-to-left attention decoder to improve the representational ability of the shared encoder and the performance during the rescoring stage. (2) We introduce an n-gram-based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation for production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training. In summary, the new WeNet 2.0 achieves up to 10% relative improvement in recognition performance over the original WeNet on various corpora and makes available several important production-oriented features. © 2022, CC BY.
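
In a two-pass system, the first pass produces an n-best list and the attention decoders rescore it. The sketch below shows one plausible way to combine a left-to-right decoder score, a right-to-left decoder score, and the first-pass CTC score; the weighted-sum form, the weight values, the `Hypothesis` fields, and all scores are assumptions for illustration, not WeNet 2.0's exact rescoring rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    ctc: float  # log-probability from the CTC first pass
    l2r: float  # log-probability from the left-to-right attention decoder
    r2l: float  # log-probability from the right-to-left attention decoder


def rescore(hyps: List[Hypothesis], ctc_weight: float, reverse_weight: float) -> Hypothesis:
    """Return the n-best hypothesis with the highest combined score."""

    def score(h: Hypothesis) -> float:
        # Interpolate the two decoder directions, then add the weighted CTC score.
        attention = (1.0 - reverse_weight) * h.l2r + reverse_weight * h.r2l
        return attention + ctc_weight * h.ctc

    return max(hyps, key=score)


nbest = [
    Hypothesis("hypothesis A", ctc=-2.0, l2r=-3.0, r2l=-6.0),
    Hypothesis("hypothesis B", ctc=-2.5, l2r=-3.2, r2l=-3.5),
]
print(rescore(nbest, ctc_weight=0.5, reverse_weight=0.0).text)  # hypothesis A
print(rescore(nbest, ctc_weight=0.5, reverse_weight=0.3).text)  # hypothesis B
```

With these made-up scores, hypothesis A wins on left-to-right context alone, but the right-to-left decoder, which sees future context, assigns it a much lower score and flips the decision to B. That is the kind of correction a bidirectional rescoring stage is meant to provide.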

Keywords: Speech recognition

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Chengdong Liang; Xiao-Lei Zhang; BinBin Zhang; Di Wu; Shengqiang Li; Xingchen Song; Zhendong Peng; Fuping Pan. Affiliations: School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, China; Horizon Robotics, Beijing, China; WeNet Open Source Community

Keywords: Degradation; Error analysis; Speech recognition; Signal processing; Acoustics; Speech processing
