Search Results - Inner Mongolia University Library

24th International Microwave and Radar Conference, MIKON 2022

Authors: Tran, Thanh Nam; Bogdan, Grzegorz. Affiliations: Hanoi University, Department of Software Engineering, Hanoi, Viet Nam; Institute of Radioelectronics and Multimedia Technology, Warsaw University of Technology, Warsaw, Poland

ISBN: (print) 9788395602030

The convolutional neural network (CNN) is a machine learning methodology that has been successfully applied in many domains, including electromagnetics and wireless communications. This paper investigates the use of CNNs in modulation and pulse shaping filter classification, which can be used in future cognitive radios to facilitate the reception of unknown signals. The focus of this paper is on the conversion of raw baseband samples to a format that complies with the CNN input. The AlexNet CNN architecture was selected and trained on six different datasets. The results show that a CNN can be used to classify modulation schemes and the raised cosine filter roll-off factor in noisy samples even without carrier synchronization. © 2022 Warsaw University of Technology.

Keywords: Convolution

GoT: Effective Graph-of-Thought Reasoning in Language Models

2024 Findings of the Association for Computational Linguistics: NAACL 2024

Authors: Yao, Yao; Li, Zuchao; Zhao, Hai. Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China; National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China

ISBN: (print) 9798891761193

With the widespread use of language models (LMs) in NLP tasks, researchers have discovered the potential of Chain-of-thought (CoT) to assist LMs in accomplishing complex reasoning tasks by generating intermediate steps. However, human thought processes are often non-linear, rather than simply sequential chains of thoughts. Therefore, we propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph. By representing thought units as nodes and connections between them as edges, our approach captures the non-sequential nature of human thinking and allows for a more realistic modeling of thought processes. GoT adopts a two-stage framework with an additional GoT encoder for thought graph representation and fuses the graph representation with the original input representation through a gated fusion mechanism. We evaluate GoT's performance on a text-only reasoning task (AQUA-RAT) and a multimodal reasoning task (ScienceQA). Our model achieves significant improvement over the strong CoT baseline on the AQUA-RAT test set and boosts accuracy from 85.19% to 87.59% using the T5-base model over the state-of-the-art Multimodal-CoT (Zhang et al., 2023) on the ScienceQA test set. Our code is publicly available at https://***/Zoeyyao27/Graphof-Thought. © 2024 Association for Computational Linguistics.

Keywords: Rats
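
The abstract above mentions a gated fusion of the graph representation with the original input representation but does not give its exact form. The sketch below shows one common per-dimension sigmoid gate, purely as an illustration. The function name `gated_fusion`, the weight vectors `w_text` and `w_graph`, and the scalar `bias` are assumptions made for this example and are not taken from the paper or its code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, graph_vec, w_text, w_graph, bias=0.0):
    """Blend two equal-length vectors with a per-dimension sigmoid gate.

    Hypothetical form: gate = sigmoid(w_text * t + w_graph * g + bias),
    fused = (1 - gate) * t + gate * g, computed independently per dimension.
    """
    if not (len(text_vec) == len(graph_vec) == len(w_text) == len(w_graph)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for t, g, wt, wg in zip(text_vec, graph_vec, w_text, w_graph):
        gate = sigmoid(wt * t + wg * g + bias)  # how much graph signal to admit
        fused.append((1.0 - gate) * t + gate * g)
    return fused


# With all-zero weights the gate is exactly 0.5, so the result is the average.
print(gated_fusion([1.0, 3.0], [3.0, 5.0], [0.0, 0.0], [0.0, 0.0]))
```

In a trained model the gate weights would be learned, so each dimension could lean toward the text or the graph representation as needed.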

A Deep Understanding Video Q&A System for Film Education in Acting Department

Intelligent Education and Intelligent Research (IEIR), International Conference on

Authors: Zhengqian Wu; Ruizhe Li; Jiahao Guo; Zhongyuan Wang; Chao Liang. Affiliations: Hubei Key Laboratory of Multimedia and Network Communication Engineering; National Engineering Research Center for Multimedia Software; School of Computer Science, Wuhan University

Recently, advancements in artificial intelligence technology have greatly influenced the field of education, particularly in the area of intelligent homework assistance. However, current approaches are primarily designed for procedural and logical tasks and often lack comprehension abilities. This limitation is particularly evident when it comes to multi-hop and continuous tasks. To address this challenge, the integration of Large Language Models (LLMs) has significantly enhanced the capability of AI systems to handle multi-hop and highly interconnected inputs. In this study, we focus on the learning needs of students in the Acting Department, specifically their study of movies and the significance of classic movie videos in their learning process. However, assessing deep comprehension of classic movies poses its own challenges. To overcome these challenges, we develop a quiz system utilizing Knowledge Graphs (KGs) and LLMs to facilitate a deeper understanding of classic films. Video quiz pairs are generated using Automatic Speech Recognition (ASR) technology, which leverages movie subtitles for question generation. To answer these questions, we employ KG and LLM techniques to process the questions and retrieve the corresponding answers. The proposed method achieves good performance in the Deep Video Understanding (DVU) task of NIST TRECVID, demonstrating its effectiveness.

Imitation Learning from Purified Demonstrations

41st International Conference on Machine Learning, ICML 2024

Authors: Wang, Yunke; Dong, Minjing; Zhao, Yukun; Du, Bo; Xu, Chang. Affiliations: School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, Wuhan Institute of Data Intelligence, Wuhan University, China; Department of Computer Science, City University of Hong Kong, Hong Kong; School of Computer Science, Faculty of Engineering, The University of Sydney, Australia

Imitation learning has emerged as a promising approach for addressing sequential decision-making problems, with the assumption that expert demonstrations are optimal. However, in real-world scenarios, most demonstrations are often imperfect, leading to challenges in the effectiveness of imitation learning. While existing research has focused on optimizing with imperfect demonstrations, the training typically requires a certain proportion of optimal demonstrations to guarantee performance. To tackle these problems, we propose to purify the potential noises in imperfect demonstrations first, and subsequently conduct imitation learning from these purified demonstrations. Motivated by the success of diffusion model, we introduce a two-step purification via diffusion process. In the first step, we apply a forward diffusion process to smooth potential noises in imperfect demonstrations by introducing additional noise. Subsequently, a reverse generative process is utilized to recover the optimal demonstration from the diffused ones. We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded. Empirical results on MuJoCo and RoboSuite demonstrate the effectiveness of our method from different aspects. Copyright 2024 by the author(s)

Keywords: Contrastive Learning
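
The first purification step described above adds noise to a demonstration through a forward diffusion process. As a rough illustration, the sketch below uses the standard closed-form DDPM forward step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The abstract does not state which noise schedule or parameterization the authors use, so the schedule `betas`, the helper names, and the sample values are all assumptions made for this example.

```python
import math
import random


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta) over the first t diffusion steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from x_0 in closed form (standard DDPM forward process).

    x0 is a flat list of floats, e.g. one state-action vector from a
    demonstration. Larger t keeps less of the original signal.
    """
    a = alpha_bar(t, betas)
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(1.0 - a)
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


# Hypothetical linear schedule with 10 steps and a made-up demonstration vector.
betas = [0.01 + 0.02 * i for i in range(10)]
rng = random.Random(0)
demo = [0.5, -1.0, 0.25]
noisy = forward_diffuse(demo, 5, betas, rng)
print(len(noisy), alpha_bar(5, betas) < 1.0)
```

The reverse generative process mentioned in the abstract requires a trained denoising model and is not sketched here.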

An Experimental Study of Unsupervised Rank Aggregation Methods in World University Rankings

Intelligent Education and Intelligent Research (IEIR), International Conference on

Authors: Shiwei Feng; Qi Deng; Siyi Wang; Lin Song; Chao Liang. Affiliations: Hubei Key Laboratory of Multimedia and Network Communication Engineering; National Engineering Research Center for Multimedia Software (NERCMS); School of Computer Science, Wuhan University

Recently, college ranking systems have received increasing attention due to the demand for higher education, and college ranking is a key topic in the field of social choice. However, these rankings use different evaluation criteria, which confuses decision-makers. To address this issue, a simple and practical approach is to aggregate the ranking systems from different sources. In this paper, we conduct an experimental study on the aggregation of world university rankings. Specifically, we first classify unsupervised rank aggregation (RA) methods. Then, we compare the aggregation effects of 28 unsupervised RA methods on five public university rankings.
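
To make the idea of unsupervised rank aggregation concrete, the sketch below implements the Borda count, a classic positional method. It is shown only as an example of the technique. The abstract does not list which 28 methods were compared, and the function name `borda_aggregate` and the university labels are hypothetical.

```python
def borda_aggregate(rankings):
    """Aggregate several ranked lists (best first) with the Borda count.

    Each list awards m - position points to an item, where m is the total
    number of distinct items across all lists. Items missing from a list
    receive no points from it. Ties are broken alphabetically.
    """
    items = sorted({item for ranking in rankings for item in ranking})
    m = len(items)
    scores = {item: 0 for item in items}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += m - position
    return sorted(items, key=lambda item: (-scores[item], item))


# Three hypothetical source rankings of the same three universities.
sources = [
    ["Univ A", "Univ B", "Univ C"],
    ["Univ B", "Univ A", "Univ C"],
    ["Univ A", "Univ C", "Univ B"],
]
print(borda_aggregate(sources))
```

The method needs no labels or training data, only the source rankings themselves, which is what makes it unsupervised.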

MFLCP: Personalized Multimodal Federated Learning via Collaborative Prompting with Missing Modalities

Proceedings of the 2025 International Conference on Multimedia Retrieval

Authors: Wenli Li; Meiyu Liang; Ruoyu Fan; Yuxuan Li. Affiliations: Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, China

ISBN: (print) 9798400718779

Multimodal federated learning (FL) is a collaborative, privacy-preserving machine learning paradigm for multimodal data. With the impressive performance of large-scale pre-trained models, an increasing number of these models are being applied to FL. However, multimodal data in the real world is usually incomplete in modalities. Additionally, directly applying these large-scale pre-trained models in the federated learning framework leads to high computational and communication costs. To address these problems, we propose a novel Personalized Multimodal Federated Learning method via Collaborative Prompting with Missing Modalities (MFLCP). Specifically, we propose an efficient large-scale pre-trained personalized multimodal federated learning framework. To address the issue of incomplete modalities in multimodal data, we propose a modal projection-aware collaborative prompting strategy for incomplete multimodal federated learning. Different categories of prompts are designed for the missing categories, and a modality mapping component that leverages complementary semantic information from different modalities is designed to guide prompt learning, promoting better interaction between modalities. In addition, we propose a communication optimization method for efficient multimodal federated learning, which reduces the parameters of multimodal pre-trained models during federated communication, enhances the speed of local training, and significantly improves convergence speed by integrating large-scale pre-trained models in a lightweight manner. Meanwhile, we establish a personalized adaptive update mechanism for the federated local model, which adaptively updates the local model according to the characteristics of local data, effectively reducing the impact of data heterogeneity. Extensive experimental results on several benchmark datasets demonstrate that the proposed method outperforms the state-of-the-art baselines.

Keywords: communication optimization

WePerson: Generalizable Re-Identification From Synthetic Data With Single Query Adaptation

IEEE Transactions on Biometrics, Behavior, and Identity Science, 2025, vol. 7, no. 3, pp. 458-470

Authors: Li, He; Ye, Mang; Su, Kehua; Du, Bo. Affiliations: Wuhan University, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan 430072, China; Hubei Luojia Laboratory, Wuhan, China

Person re-identification (ReID) aims to retrieve a target person across non-overlapping cameras. Due to the uncontrollable environment and the privacy concerns, the diversity and scale of real-world training data are usually limited, resulting in poor testing generalizability. To overcome these problems, we introduce a large-scale Weather Person dataset that generates synthetic images with different weather conditions, complex scenes, natural lighting changes, and various pedestrian accessories in a simulated camera network. The environment is fully controllable, supporting factor-by-factor analysis. To narrow the gap between synthetic data and real-world scenarios, this paper introduces a simple yet efficient domain generalization method via Single Query Adaptation (SQA), calibrating the statistics and transformation parameters in BatchNorm layers with only a single query image in the target domain. This significantly improves performance through a single adaptation epoch, greatly boosting the applicability of the ReID technique for intelligent surveillance systems. Abundant experiment results demonstrate that the WePerson dataset achieves superior performance under direct transfer setting without any real-world data training. In addition, the proposed SQA method shows amazing robustness in real-to-real, synthetic-to-real ReID, and various corruption settings. © 2019 IEEE.

Keywords: Domain Generalization; Image Retrieval; Person Re-Identification; Synthetic Data; Test Time Adaptation
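
The Single Query Adaptation idea above calibrates BatchNorm statistics from a single target-domain image. The sketch below illustrates only the statistics part, using a simple momentum update of the running mean and variance from one feature map. It is an assumed simplification: the abstract also mentions adapting transformation parameters, which is not covered here, and the function names and the `momentum` value are hypothetical.

```python
def channel_stats(feature_map):
    """Per-channel mean and (biased) variance of one image's feature map.

    feature_map is a list of channels, each a flat list of activations.
    """
    stats = []
    for channel in feature_map:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        stats.append((mean, var))
    return stats


def calibrate_bn(running_mean, running_var, feature_map, momentum=0.1):
    """Move stored BatchNorm statistics toward those of a single query image."""
    new_mean, new_var = [], []
    for (mean, var), rm, rv in zip(channel_stats(feature_map), running_mean, running_var):
        new_mean.append((1.0 - momentum) * rm + momentum * mean)
        new_var.append((1.0 - momentum) * rv + momentum * var)
    return new_mean, new_var


# Two channels with two activations each, standing in for one query image.
query_features = [[1.0, 3.0], [2.0, 2.0]]
print(calibrate_bn([0.0, 0.0], [1.0, 1.0], query_features, momentum=0.5))
```

A larger momentum trusts the single query image more, while a smaller one stays closer to the source-domain statistics.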

Who, What and Where: Composite-semantic Instance Search for Story Videos

IEEE International Conference on Multimedia and Expo (ICME)

Authors: Jiahao Guo; Chao Liang; Zhongyuan Wang. Affiliations: National Engineering Research Center for Multimedia Software (NERCMS); Hubei Key Laboratory of Multimedia and Network Communication Engineering; School of Computer Science, Wuhan University

This paper studies the Who-What-Where (3W) composite-semantic video instance search (INS) problem, which aims to find a specific person doing a queried action in a particular place. Mainstream approaches adopt a complete decomposition strategy, which divides a composite-semantic query into multiple single-semantic queries. However, due to the lack of necessary correlation analysis among constituent semantics, these methods cannot always generate identity-matching and semantics-consistent 3W INS results. To address the above challenges, we propose a partial decomposition scheme with action as the link. Specifically, we selectively split the 3W INS into person-action INS and action-location INS. The former ensures the retrieved person and action share the same identity by modeling their relative spatial positions at the frame level, while the latter improves the semantic consistency between action and location with a cross-semantic attention mechanism at the shot level. In addition, we build a large-scale 3W INS dataset, containing over 470k video shots, on the basis of the NIST TRECVID 2016-2021 INS tasks and verify the effectiveness of the proposed method with both quantitative and qualitative experiments.

NNVISR: Bring Neural Network Video Interpolation and Super Resolution into Video Processing Framework

arXiv, 2023

Authors: Tong, Yuan; Hu, Mengshun; Wang, Zheng. Affiliations: National Engineering Research Center for Multimedia Software, Hubei Key Laboratory of Multimedia and Network Communication Engineering, School of Computer Science, Wuhan University, China

We present NNVISR, an open-source filter plugin for the VapourSynth video processing framework, which facilitates the application of neural networks to various kinds of video enhancement tasks, including denoising, super resolution, interpolation, and spatio-temporal super-resolution. NNVISR fills the gap between video enhancement neural networks and video processing pipelines by accepting any network that enhances a group of frames and handling all other network-agnostic details during video processing. NNVISR is publicly released at https://***/tongyuantongyu/vs-NNVISR. © 2023, CC BY-SA.

Keywords: Interpolation

Self-Guided Network for Fine-Grained Object Localization Using Weakly Supervised Learning

2022 IEEE International Conference on Multimedia and Expo, ICME 2022

Authors: Qu, Xiangyu; Wang, Zengmao. Affiliations: School of Computer Science, Wuhan University, China; National Engineering Research Center for Multimedia Software, Artificial Intelligence Institute of Wuhan University, China

ISBN: (digital) 9781665485630

ISBN: (print) 9781665485630

Weakly supervised object localization (WSOL) aims at predicting the location of objects with image-level labels. The fine-grained WSOL task has its own characteristic challenge compared with generic object localization: the structural information of the objects in fine-grained benchmarks has little relevance to class. Previous WSOL works mainly focus on learning the most class-discriminative parts recursively, leading to a serious loss of structural features. In this paper, we propose a self-guided network (SGN) which consists of two-branch deep classification networks. It adopts a coarse-to-fine strategy to detect the structural information of the object. First, we devise a self-adaptive method (SAM) to detect most of the body structure of the object by directly leveraging the feature recognition ability of the first classifier. Then, an object structure generation (OSG) method is proposed in the fine localization phase. OSG helps the second classifier to learn the boundary features of the object with less background noise. Extensive experiments on four well-known fine-grained benchmarks, including CUB, FGVC Aircraft, Stanford Dogs, and Stanford Cars, show that the proposed SGN outperforms the state-of-the-art WSOL methods. © 2022 IEEE.

Keywords: Computer vision
