In aquaponics, where fish and plants coexist in a symbiotic environment, closely monitoring nitrate levels in the water is crucial because of their profound impact on both fish and plant health. Traditional nitrate measurement methods are often time-consuming and costly. Various approaches, including first-principles models, IoT-based sensors, and machine learning-based soft sensors, have been attempted to address this challenge. However, these efforts face obstacles such as expensive sensors, infrequent data collection, multistage data processing with a limited range of sensor types, and the need for regular maintenance such as cleaning and calibration. In addition, varied environmental conditions affect sensor suitability across different water environments, and some existing machine learning-based soft sensors have proven inaccurate. In response, soft sensors, especially deep learning-based ones, have gained prominence in industrial applications for their adaptability and accuracy, providing real-time insight into complex processes without requiring expensive hardware. This study introduces a solution based on Long Short-Term Memory (LSTM) networks, a deep learning architecture known for capturing complex temporal patterns and therefore well suited to modeling and predicting changes in nitrate concentration in aquaponics. The model was trained on extensive data collected from various aquaponic ponds. In evaluation, it achieved an MSE of 0.00074 and an R-squared score of 0.98, showing potential for scaling up to commercial applications, benefiting aquaponics operations, supporting researchers, and enhancing the sustainability and productivity of aquaponic systems.
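As a rough illustration of the modeling setup described in this abstract, the sketch below shows an LSTM regressor that maps a sliding window of past water-quality readings to the next nitrate concentration, trained with MSE (the metric reported above). The window length, number of sensor features, and layer sizes are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code): an LSTM regressor mapping a window
# of past water-quality readings to the next nitrate value. Window length,
# feature count, and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class NitrateLSTM(nn.Module):
    def __init__(self, n_features: int = 4, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)      # predict one nitrate value

    def forward(self, x):                          # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])            # use the last time step

model = NitrateLSTM()
criterion = nn.MSELoss()                           # matches the MSE metric reported above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data
x = torch.randn(32, 24, 4)                         # 32 windows of 24 readings x 4 sensors
y = torch.randn(32, 1)                             # next-step nitrate (normalized)
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```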
There exist three approaches to multilingual and crosslingual automatic speech recognition (MCL-ASR): supervised pretraining with phonetic transcription, supervised pretraining with graphemic transcription, and self-supervised pretraining. We find that pretraining with phonetic supervision has so far been underappreciated for MCL-ASR, although conceptually it is more advantageous for information sharing between different languages. This paper explores pretraining with weak phonetic supervision for data-efficient MCL-ASR, an approach we call Whistle. We relax the requirement of gold-standard, human-validated phonetic transcripts and obtain International Phonetic Alphabet (IPA) based transcriptions by leveraging the LanguageNet grapheme-to-phoneme (G2P) models. We construct a common experimental setup based on the CommonVoice dataset, called CV-Lang10, with 10 seen languages and 2 unseen languages. A set of experiments is conducted on CV-Lang10 to compare, as fairly as possible, the three approaches for MCL-ASR under this common setup. The experiments demonstrate the advantages of phoneme-based models (Whistle) for MCL-ASR in terms of speech recognition for seen languages, crosslingual performance for unseen languages with different amounts of few-shot data, overcoming catastrophic forgetting, and training efficiency. We find that when training data is more limited, phoneme supervision achieves better results than subword supervision and self-supervision, thereby providing higher data efficiency.
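To make the idea of weak phonetic supervision concrete, the sketch below converts grapheme transcripts into IPA phoneme targets via a grapheme-to-phoneme lookup. The tiny pronunciation table and the function name are hypothetical stand-ins for the LanguageNet G2P models the paper actually uses; only the general idea of a shared IPA output space is taken from the abstract.

```python
# Illustrative sketch only: turning grapheme transcripts into IPA phoneme
# targets for supervised pretraining. The toy pronunciation table below is a
# hypothetical stand-in for the LanguageNet G2P models used in the paper.
from typing import Dict, List

TOY_G2P: Dict[str, List[str]] = {
    "hello": ["h", "ə", "l", "oʊ"],
    "world": ["w", "ɜ", "l", "d"],
}

def graphemes_to_ipa(sentence: str, g2p: Dict[str, List[str]]) -> List[str]:
    """Map a whitespace-tokenized sentence to a flat IPA phoneme sequence."""
    phonemes: List[str] = []
    for word in sentence.lower().split():
        phonemes.extend(g2p.get(word, ["<unk>"]))  # unknown words get a placeholder
    return phonemes

# Phoneme labels like these would then supervise the acoustic encoder,
# letting different languages share a common IPA output space.
print(graphemes_to_ipa("Hello world", TOY_G2P))
# ['h', 'ə', 'l', 'oʊ', 'w', 'ɜ', 'l', 'd']
```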
The multimodal task of Visual Question Answering (VQA), encompassing elements of computer vision (CV) and Natural Language Processing (NLP), aims to generate answers to questions on any visual input. Over time, the sco...
Continuous subgraph matching (CSM) is a critical task for analyzing dynamic graphs and has a wide range of applications, such as merchant fraud detection, cyber-attack hunting, and rumor detection. Although many effic...
In recent times, following the paradigm of DETR (DEtection TRansformer), query-based end-to-end instance segmentation (QEIS) methods have exhibited superior performance compared to CNN-based models, particularly when ...
In classic reinforcement learning algorithms, agents make decisions at discrete, fixed time intervals. The duration between decisions becomes a crucial hyperparameter: setting it too short may increase the problem's difficulty by requiring the agent to make numerous decisions to achieve its goal, while setting it too long can result in the agent losing control over the system. However, physical systems do not necessarily require a constant control frequency, and for learning agents it is often preferable to operate at a low frequency when possible and a high frequency when necessary. We propose a framework called Continuous-Time Continuous-Options (CTCO), in which the agent chooses options as sub-policies of variable duration. These options are time-continuous and can interact with the system at any desired frequency, providing a smooth change of actions. We demonstrate the effectiveness of CTCO by comparing its performance to classical RL and temporal-abstraction RL methods on simulated continuous control tasks with various action-cycle times. We show that our algorithm's performance is not affected by the choice of environment interaction frequency. Furthermore, we demonstrate the efficacy of CTCO in facilitating exploration in a real-world visual reaching task with sparse rewards on a 7-DOF robotic arm.
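The sketch below illustrates the control-loop idea behind variable-duration options as described in this abstract: a high-level policy picks an option (here reduced to a target action) together with a duration, and a low-level loop interpolates smoothly toward it at whatever interaction period the system runs at. This is an interpretation of the abstract under simplifying assumptions, not the CTCO implementation; the policy, dynamics, and interpolation scheme are stand-ins.

```python
# Sketch of variable-duration options: the high-level policy picks an option
# (a target action) and a duration; the low-level loop interpolates smoothly
# toward it at an arbitrary interaction period dt. Not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def high_level_policy(state):
    """Hypothetical policy head: returns (target_action, duration_seconds)."""
    target = rng.uniform(-1.0, 1.0, size=state.shape)
    duration = rng.uniform(0.1, 1.0)               # variable option length
    return target, duration

def rollout(env_step, state, action, episode_seconds=5.0, dt=0.01):
    """Run options back to back; dt is the (arbitrary) interaction period."""
    t = 0.0
    while t < episode_seconds:
        target, duration = high_level_policy(state)
        start, elapsed = action.copy(), 0.0
        while elapsed < duration and t < episode_seconds:
            alpha = min(elapsed / duration, 1.0)
            action = (1 - alpha) * start + alpha * target   # smooth action change
            state = env_step(state, action, dt)
            elapsed += dt
            t += dt
    return state

# Stand-in environment: simple integrator dynamics over a 7-dimensional action.
final_state = rollout(lambda s, a, dt: s + dt * a,
                      state=np.zeros(7), action=np.zeros(7))
```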
In skeleton-based action recognition, Graph Convolutional Networks model human skeletal joints as vertices and connect them through an adjacency matrix, which can be seen as a local attention mask. However, in most ex...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
In-context learning provides a new perspective for multi-task modeling for vision and NLP. Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning. However, skeleton sequence modeling via in-context learning remains unexplored. Directly applying existing in-context models from other areas onto skeleton sequences fails due to the similarity between inter-frame and cross-task poses, which makes it exceptionally hard to perceive the task correctly from a subtle context. To address this challenge, we propose Skeleton-in-Context (SiC), an effective framework for in-context skeleton sequence modeling. Our SiC is able to handle multiple skeleton-based tasks simultaneously after a single training process and accomplish each task from context according to the given prompt. It can further generalize to new, unseen tasks according to customized prompts. To facilitate context perception, we additionally propose a task-unified prompt, which adaptively learns tasks of different natures, such as partial joint-level generation, sequence-level prediction, or 2D-to-3D motion prediction. We conduct extensive experiments to evaluate the effectiveness of our SiC on multiple tasks, including motion prediction, pose estimation, joint completion, and future pose estimation. We also evaluate its generalization capability on unseen tasks such as motion-in-between. These experiments show that our model achieves state-of-the-art multi-task performance and even outperforms single-task methods on certain tasks.
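As a very rough illustration of in-context skeleton modeling, the sketch below assembles a prompt (an example input paired with its target) together with a query sequence into a single input, so that a model could infer the task from context. The tensor shapes and the concatenation scheme are illustrative assumptions, not the Skeleton-in-Context (SiC) architecture or its task-unified prompt.

```python
# Hypothetical sketch of in-context input assembly for skeleton sequences:
# a prompt (example input paired with its target) is concatenated with the
# query so a single model can infer the task from context. Shapes and the
# pairing scheme are illustrative, not the SiC implementation.
import torch

T, J, C = 16, 17, 3                      # frames, joints, coordinates (assumed)

prompt_input  = torch.randn(T, J, C)     # e.g. a partial or noisy pose sequence
prompt_target = torch.randn(T, J, C)     # its ground-truth counterpart
query_input   = torch.randn(T, J, C)     # the sequence we actually want solved

def build_in_context_input(p_in, p_tgt, q_in):
    """Stack prompt and query along time so the task is implied by context."""
    prompt = torch.cat([p_in, p_tgt], dim=0)       # (2T, J, C)
    return torch.cat([prompt, q_in], dim=0)        # (3T, J, C)

tokens = build_in_context_input(prompt_input, prompt_target, query_input)
print(tokens.shape)                      # torch.Size([48, 17, 3])
```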
Advanced Air Mobility (AAM) is a growing field that demands accurate modeling of legal concepts and restrictions in navigating intelligent vehicles. In addition, any implementation of AAM needs to face the challenges ...
ISBN (digital): 9798350370249
ISBN (print): 9798350370270
Image inpainting is a domain in which researchers have shown considerable interest, and with deep learning techniques, realistic problems become interesting and challenging. In image inpainting, a corrupted facial image with missing or significant holes can be restored and compared to the original image to judge whether it looks real or fake. Beyond repairing image texture and capturing high-level abstract properties, inpainting can also recover semantic content such as human faces. Among image-inpainting models, attention-based models with features learned through semantic approaches and progressive networks have become particularly popular. The proposed model introduces (i) attention blocks in each decoder layer of a U-Net architecture and (ii) a hybrid loss function combining Mean Square Error (MSE) and Mean Absolute Error (MAE). The proposed attention-based U-Net improved SSIM and PSNR by 0.1067 and 13.63, respectively, compared to previous approaches.
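The hybrid loss mentioned above lends itself to a short sketch: a weighted sum of MSE and MAE between the restored and original images. The equal 0.5/0.5 weighting below is an assumption for illustration; the abstract does not state the mixing coefficients.

```python
# Minimal sketch of a hybrid reconstruction loss combining MSE and MAE.
# The 0.5/0.5 weighting is an assumption, not a value from the paper.
import torch
import torch.nn as nn

class HybridLoss(nn.Module):
    def __init__(self, w_mse: float = 0.5, w_mae: float = 0.5):
        super().__init__()
        self.mse, self.mae = nn.MSELoss(), nn.L1Loss()
        self.w_mse, self.w_mae = w_mse, w_mae

    def forward(self, restored, original):
        return (self.w_mse * self.mse(restored, original)
                + self.w_mae * self.mae(restored, original))

criterion = HybridLoss()
restored = torch.rand(1, 3, 256, 256)    # inpainted output (stand-in)
original = torch.rand(1, 3, 256, 256)    # ground-truth face image (stand-in)
print(criterion(restored, original).item())
```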