Search Results - Inner Mongolia University Library

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu. Affiliations: MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Shanghai Jiao Tong University; Shanghai AI Lab; School of Data Science, Shenzhen Research Institute of Big Data, CUHK-Shenzhen, China

ISBN (electronic): 9798350368741

ISBN (print): 9798350368758

Recently, audio generation tasks have attracted considerable research interest. Despite rapid advancements in generating high-fidelity audio that is coarsely aligned with the text description, precise temporal controllability, which is essential for integrating audio generation into real applications, remains a challenge. In this work, we propose PicoAudio, a temporally controlled audio generation framework. It leverages data crawling, segmentation, and filtering to simulate fine-grained, temporally aligned audio-text data. Furthermore, PicoAudio integrates temporal information to guide audio generation through tailored model design. Using the text-processing capabilities of large language models, PicoAudio can take natural-language input and generate audio that aligns well with the temporal description in the input. Both subjective and objective evaluations demonstrate that PicoAudio dramatically surpasses current state-of-the-art generation models in timestamp and occurrence-frequency controllability. Generation samples are available on the PicoAudio demo page.

Keywords: Time-frequency analysis; Filtering; Annotations; Process control; Signal processing; Controllability; Solids; Speech processing; Frequency control; Text processing
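The abstract says PicoAudio feeds temporal information into the generator but does not spell out the encoding. The following is a minimal sketch, assuming each sound event is described as a (label, onset, offset) tuple in seconds and encoded as a frame-level 0/1 activity matrix; the function name `timestamps_to_matrix`, the 10 s clip length, and the 20 frames-per-second rate are hypothetical choices for illustration, not PicoAudio's published design.

```python
from typing import Dict, List, Tuple

# Hypothetical event format: (label, onset_seconds, offset_seconds).
Event = Tuple[str, float, float]


def timestamps_to_matrix(
    events: List[Event],
    labels: List[str],
    duration: float = 10.0,
    frame_rate: int = 20,
) -> List[List[int]]:
    """Encode event timestamps as a (len(labels) x num_frames) 0/1 matrix.

    Row i marks the frames in which labels[i] is active. Illustrative only;
    not PicoAudio's published encoding.
    """
    num_frames = int(round(duration * frame_rate))
    row_of: Dict[str, int] = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * num_frames for _ in labels]
    for label, onset, offset in events:
        start = max(0, int(onset * frame_rate))
        end = min(num_frames, int(offset * frame_rate))
        for frame in range(start, end):
            matrix[row_of[label]][frame] = 1
    return matrix


# "A dog barks at 1-2 s and again at 5-6 s; a car horn sounds from 3 s to 4.5 s."
events = [("dog_bark", 1.0, 2.0), ("dog_bark", 5.0, 6.0), ("car_horn", 3.0, 4.5)]
m = timestamps_to_matrix(events, ["dog_bark", "car_horn"])
print(sum(m[0]) / 20, sum(m[1]) / 20)  # prints 2.0 1.5 (active seconds per label)
```

Because each occurrence becomes a separate run of 1s in its row, the same matrix also carries occurrence frequency: counting the runs in the `dog_bark` row gives two barks.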

Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

arXiv, 2024

Authors: Cheng, Mingyue; Zhang, Hao; Yang, Jiqian; Liu, Qi; Li, Li; Huang, Xin; Song, Liwei; Li, Zhi; Huang, Zhenya; Chen, Enhong. Affiliations: Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence, Hefei, China; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China

Large language model evaluation plays a pivotal role in improving the capabilities of large language models. Numerous methods for evaluating large language models have been proposed. Despite their effectiveness, these existing works mainly focus on assessing objective questions and overlook subjective questions, which are extremely common for large language models. Additionally, these methods predominantly use centralized datasets for evaluation, with question banks concentrated within the evaluation platforms themselves. Moreover, the evaluation processes of these platforms often overlook personalized factors, neglecting the individual characteristics of both the evaluators and the models being evaluated. To address these limitations, we propose BingJian, a novel anonymous crowd-sourcing evaluation platform for large language models that employs a competitive scoring mechanism in which users rank models based on their performance. The platform not only supports centralized evaluations that assess the general capabilities of models but also offers an open evaluation gateway, through which users can submit their own questions to test models on a personalized and potentially broader range of capabilities. Furthermore, the platform introduces personalized evaluation scenarios, leveraging various forms of human-computer interaction to assess large language models in a manner that accounts for individual user preferences and contexts. The demonstration of BingJian can be accessed at https://***/Mingyue-Cheng/Bingjian. Copyright © 2024, The Authors. All rights reserved.

Keywords: Crowdsourcing
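The abstract describes a competitive scoring mechanism in which users rank models, but it does not name the rating rule. Purely as an illustration of how pairwise user votes can be turned into a leaderboard, the sketch below uses the standard Elo update; Elo itself, the K-factor of 32, the starting rating of 1000, and the model names are all assumptions, not BingJian's documented method.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# One user vote: (model_a, model_b, score_a), where score_a is 1.0 if the user
# preferred model_a, 0.0 if they preferred model_b, and 0.5 for a tie.
Vote = Tuple[str, str, float]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(votes: Iterable[Vote], k: float = 32.0, start: float = 1000.0) -> Dict[str, float]:
    """Apply sequential Elo updates over a stream of pairwise votes."""
    ratings: DefaultDict[str, float] = defaultdict(lambda: start)
    for model_a, model_b, score_a in votes:
        delta = k * (score_a - expected_score(ratings[model_a], ratings[model_b]))
        ratings[model_a] += delta
        ratings[model_b] -= delta  # zero-sum: B moves by the opposite amount
    return dict(ratings)


# Hypothetical votes collected from anonymous users.
votes = [("model_x", "model_y", 1.0), ("model_x", "model_z", 1.0), ("model_y", "model_z", 0.5)]
board = elo_ratings(votes)
for name, rating in sorted(board.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Sequential Elo is order-dependent; a platform could instead refit ratings over all votes at once (for example with a Bradley-Terry model), which trades simplicity for stability.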

A bounded ability estimation for computerized adaptive testing

Proceedings of the 37th International Conference on Neural Information Processing Systems

Authors: Yan Zhuang, Qi Liu, GuanHao Zhao, Zhenya Huang, Weizhe Huang, Zachary A. Pardos, Enhong Chen, Jinze Wu, Xin Li. Affiliations: Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China and State Key Laboratory of Cognitive Intelligence; University of California, Berkeley; State Key Laboratory of Cognitive Intelligence and iFLYTEK Co. Ltd

Computerized adaptive testing (CAT), a tool that can efficiently measure students' abilities, has been widely used in standardized tests (e.g., GMAT and GRE). The adaptivity of CAT refers to selecting the most informative questions for each student, which reduces test length. Existing CAT methods do not explicitly target ability estimation accuracy because a student's true ability is not available as ground truth; therefore, these methods cannot guarantee that the estimate converges to the true ability given such limited responses. In this paper, we analyze the statistical properties of estimation and find a theoretical approximation of the true ability: the ability estimated from full responses to the question bank. Based on this, we propose a Bounded Ability Estimation framework for CAT (BECAT) in a data-summary manner, which selects a question subset that closely matches the gradient of the full responses. We develop an expected gradient difference approximation to design a simple greedy selection algorithm and provide rigorous theoretical guarantees, including an error upper bound, for its ability estimate. Experiments on both real-world and synthetic datasets show that it reaches the same estimation accuracy using 15% fewer questions on average, significantly reducing test length.
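BECAT's core idea is to pick a question subset whose gradient closely matches that of the full response set, using a greedy algorithm. The sketch below illustrates only that generic greedy gradient-matching idea under simplifying assumptions: per-question gradient vectors are already known, a subset is scored by the Euclidean distance between its mean gradient and the full-bank mean gradient, and `greedy_gradient_match` is a hypothetical helper. It does not implement the paper's expected gradient difference approximation, which is needed because a student's responses to unasked questions are unknown.

```python
import math
from typing import List, Sequence


def _mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_gradient_match(grads: List[List[float]], budget: int) -> List[int]:
    """Greedily pick question indices whose mean gradient best approximates
    the mean gradient over the whole question bank."""
    target = _mean(grads)
    budget = min(budget, len(grads))
    chosen: List[int] = []
    for _ in range(budget):
        best_idx, best_err = -1, math.inf
        for i in range(len(grads)):
            if i in chosen:
                continue
            err = _dist(_mean([grads[j] for j in chosen + [i]]), target)
            if err < best_err:  # strict: ties keep the lower index
                best_idx, best_err = i, err
        chosen.append(best_idx)
    return chosen


# Toy per-question gradients with respect to a 2-D ability vector.
grads = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
print(greedy_gradient_match(grads, budget=2))  # prints [2, 3]
```

Each greedy step costs one pass over the remaining bank, so selecting k questions from n costs O(k·n) subset evaluations, which is what makes the data-summary view cheap compared with exhaustive subset search.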

Differentiable and Scalable Generative Adversarial Models for Data Imputation (Extended Abstract)

40th IEEE International Conference on Data Engineering, ICDE 2024

Authors: Wu, Yangyang; Wang, Jun; Miao, Xiaoye; Wang, Wenjia; Yin, Jianwei. Affiliations: Software College, Zhejiang University, Ningbo, China; Academy of Interdisciplinary Studies, The Hong Kong University of Science and Technology, Hong Kong, Hong Kong; Center for Data Science, Zhejiang University, Hangzhou, China; The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; Guangzhou, China; College of Computer Science, Zhejiang University, Hangzhou, China

ISBN (print): 9798350317152

The dramatically increasing volume of incomplete data makes imputation models computationally infeasible in many real-life applications. In this paper, we propose an effective, scalable imputation system named SCIS that significantly speeds up the training of differentiable generative adversarial imputation models on large-scale incomplete data while guaranteeing accuracy. SCIS consists of two modules: differentiable imputation modeling (DIM) and sample size estimation (SSE). DIM leverages a new masking Sinkhorn divergence function to make an arbitrary generative adversarial imputation model differentiable, and, for such a differentiable imputation model, SSE estimates an appropriate sample size to ensure the user-specified imputation accuracy of the final model. Moreover, SCIS can also accelerate autoencoder-based imputation models. Extensive experiments on several real-life large-scale datasets demonstrate that our proposed system can accelerate generative adversarial model training by 6.23x. Using only about 1.27% of the samples, SCIS achieves accuracy competitive with state-of-the-art imputation methods in much shorter computation time. © 2024 IEEE.

Keywords: Large datasets
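SCIS's DIM module relies on a masking Sinkhorn divergence. The paper's exact formulation is not reproduced here, so the following is a minimal sketch under stated assumptions: uniform weights over rows, a squared-Euclidean cost computed only over coordinates observed in both rows (our reading of "masking"), a fixed number of plain Sinkhorn iterations, and the standard debiased form S(x, y) = OT(x, y) - (OT(x, x) + OT(y, y)) / 2. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing (unobserved) entry


def masked_cost(x: Row, y: Row) -> float:
    """Squared Euclidean distance restricted to coordinates observed in both rows."""
    return sum((p - q) ** 2 for p, q in zip(x, y) if p is not None and q is not None)


def entropic_ot(xs: List[Row], ys: List[Row], eps: float = 0.5, iters: int = 200) -> float:
    """Entropy-regularised OT between uniform empirical measures (plain Sinkhorn).

    Returns <P, C> + eps * KL(P || a x b) for the Sinkhorn plan P.
    """
    n, m = len(xs), len(ys)
    a, b = 1.0 / n, 1.0 / m
    cost = [[masked_cost(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scalings so the plan's rows sum to a and columns to b.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    total = 0.0
    for i in range(n):
        for j in range(m):
            p = u[i] * kernel[i][j] * v[j]  # transport plan entry P_ij
            if p > 0.0:
                total += p * (cost[i][j] + eps * math.log(p / (a * b)))
    return total


def sinkhorn_divergence(xs: List[Row], ys: List[Row], eps: float = 0.5) -> float:
    """Debiased Sinkhorn divergence: OT(x, y) - (OT(x, x) + OT(y, y)) / 2."""
    return entropic_ot(xs, ys, eps) - 0.5 * (entropic_ot(xs, xs, eps) + entropic_ot(ys, ys, eps))


real = [[0.0, None], [1.0, 1.0]]  # observed rows with a missing entry
imputed = [[3.0, 3.0], [4.0, None]]  # hypothetical generator output
print(round(sinkhorn_divergence(real, imputed), 3))
```

Plain (non-log-domain) Sinkhorn can underflow when costs are large relative to `eps`; a production version would work with log-domain potentials and a differentiable tensor library so the divergence can be backpropagated through.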

Hierarchical multi-granularity classification based on bidirectional knowledge transfer

Multimedia Systems, 2024, Vol. 30, No. 4, p. 207

Authors: Jiang, Juan; Yang, Jingmin; Zhang, Wenjie; Zhang, Hongbin. Affiliations: School of Computer Science, Minnan Normal University, Fujian, Zhangzhou 363000, China; Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Fujian, Zhangzhou 363000, China; Department of Electronic Engineering, National Taipei University of Technology, Taipei, China

Hierarchical multi-granularity classification is the task of classifying objects at multiple levels or granularities. The class hierarchy is vital side information for this task. Existing research on hierarchical multi-granularity classification uses the class hierarchy to classify from coarse to fine or from fine to coarse. Although these methods are effective in many cases, two issues remain: (1) multi-task learning for hierarchical multi-granularity classification decreases classification performance; (2) transfer learning along the class hierarchy is prone to error propagation. In this paper, we propose a bidirectional knowledge transfer framework to address these issues. First, we improve classification performance through data augmentation: by learning the similarity between the original image and the augmented image, the model learns more discriminative features, which benefits subsequent classification. Second, using class hierarchy trees, we propose reverse hierarchical knowledge transfer to correct some errors in forward hierarchical propagation and improve hierarchical consistency. In addition, we construct a hierarchical network that adds features from coarse-grained levels to fine-grained levels. Experimental results on six datasets with different class hierarchies demonstrate the effectiveness and superiority of the proposed model. In particular, on the CUB-200-2011 and Cifar-100 datasets, our model improves classification accuracy by 3.61% and 4.17%, respectively, over the second-best model. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.

Keywords: Classification (of information)

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

ISBN (electronic): 9798350368741

ISBN (print): 9798350368758

Recent advances in audio generation have enabled the creation of high-fidelity audio clips from free-form textual descriptions. However, temporal relations, a critical feature of audio content, are currently underrepresented in mainstream models, resulting in imprecise temporal controllability. Specifically, users cannot accurately control the timestamps of sound events using free-form text. One significant challenge is the absence of a high-quality, temporally aligned audio-text dataset, which is essential for training models with temporal control. The more temporally aligned the annotations, the better models can understand the precise relationship between audio outputs and temporal textual prompts. We therefore propose AudioTime, a temporally aligned audio-text dataset. It provides text annotations rich in temporal information, such as timestamps, duration, frequency, and ordering, covering almost all aspects of temporal control. Additionally, we offer a comprehensive test set and evaluation metric to assess the temporal control performance of text-to-audio generation models. Examples are available on the AudioTime demo page.

Keywords: Measurement; Training; Time-frequency analysis; Analytical models; Annotations; Signal processing; Benchmark testing; Controllability; Speech processing; Frequency control
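AudioTime's annotations cover four kinds of temporal information: timestamps, duration, frequency, and ordering. A minimal sketch of how the last three can be derived from the first, assuming each clip's annotation is stored as a list of (label, onset, offset) segments; the `Segment` type and helper functions are hypothetical and do not reflect AudioTime's released file format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """One annotated occurrence of a sound event (hypothetical schema)."""
    label: str
    onset: float   # seconds
    offset: float  # seconds

    @property
    def duration(self) -> float:
        return self.offset - self.onset


def occurrence_frequency(segments: List[Segment]) -> Dict[str, int]:
    """How many times each event occurs in the clip."""
    counts: Dict[str, int] = {}
    for seg in segments:
        counts[seg.label] = counts.get(seg.label, 0) + 1
    return counts


def event_order(segments: List[Segment]) -> List[str]:
    """Event labels ordered by their first onset (each label listed once)."""
    first_onset: Dict[str, float] = {}
    for seg in segments:
        first_onset[seg.label] = min(seg.onset, first_onset.get(seg.label, seg.onset))
    return sorted(first_onset, key=lambda label: first_onset[label])


clip = [
    Segment("dog_bark", 0.5, 1.5),
    Segment("speech", 2.0, 6.0),
    Segment("dog_bark", 7.0, 7.8),
]
print(occurrence_frequency(clip))            # frequency
print(event_order(clip))                     # ordering
print([round(s.duration, 2) for s in clip])  # duration
```

Keeping timestamps as the single source of truth and deriving the other attributes avoids annotations that contradict each other, for example a stated frequency that does not match the number of listed segments.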

Learning by Applying: A General Framework for Mathematical Reasoning via Enhancing Explicit Knowledge Learning

arXiv, 2023

Authors: Liu, Jiayu; Huang, Zhenya; Zhai, Chengxiang; Liu, Qi. Affiliations: Anhui Province Key Laboratory of Big Data Analysis and Application, School of Data Science, School of Computer Science and Technology, University of Science and Technology of China, China; State Key Laboratory of Cognitive Intelligence, China; University of Illinois at Urbana-Champaign, United States

Mathematical reasoning is one of the crucial abilities of general artificial intelligence; it requires machines to master mathematical logic and knowledge through problem solving. However, existing approaches are not transparent (and thus not interpretable) about what knowledge is learned and applied during reasoning. In this paper, we propose a general Learning by Applying (LeAp) framework that enhances existing models (backbones) in a principled way through explicit knowledge learning. In LeAp, we perform knowledge learning in a novel problem-knowledge-expression paradigm, with a Knowledge Encoder that acquires knowledge from problem data and a Knowledge Decoder that applies knowledge for expression reasoning. The learned mathematical knowledge, including word-word and word-operator relations, forms an explicit knowledge graph that organically bridges knowledge "learning" and "applying". Moreover, for problem solving, we design a semantics-enhanced module and a reasoning-enhanced module that apply this knowledge to improve, respectively, the problem comprehension and symbolic reasoning abilities of any backbone. We theoretically prove the superiority of LeAp's autonomous learning mechanism. Experiments on three real-world datasets show that LeAp improves the performance of all backbones, learns accurate knowledge, and achieves a more interpretable reasoning process. Copyright © 2023, The Authors. All rights reserved.

Keywords: Semantics

Hierarchical Feature Selection Based on Neighborhood Interclass Spacing from Fine to Coarse

SSRN, 2023

Authors: Lin, Zilong; Lin, Yaojin. Affiliations: School of Computer Science, Minnan Normal University, Zhangzhou 363000, China; Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Fujian, Zhangzhou 363000, China

In hierarchical classification learning, hierarchical feature selection algorithms play an important role in addressing the curse of dimensionality. Existing hierarchical feature selection algorithms based on the granular computing framework all use three basic search strategies to find similar and dissimilar samples, from which they calculate the importance of features to the global label for feature selection. However, these three original search strategies, especially the most commonly used sibling search strategy, can only select features within the fine-grained hierarchy. This paper therefore proposes Hierarchical Feature Selection Based on Neighborhood Interclass Spacing From Fine to Coarse (HFSNIS), an algorithm that moves feature selection from the fine-grained hierarchy to the coarse-grained hierarchy. HFSNIS works as follows. First, each fine-grained leaf node is coarsened, from fine to coarse, to the coarsest granularity level at which it still has a non-root ancestor. Next, similar and dissimilar nearest-neighbor samples are searched for at this coarsest granularity level. Finally, features are filtered with the neighborhood interclass spacing model to obtain a feature subset. Because it is based on this Coarsest Search Strategy (CSS), HFSNIS can reselect features that were previously ignored at the fine-grained hierarchy and include them in a better final feature subset. In experiments on six datasets, the proposed algorithm outperforms two fuzzy rough set feature selection algorithms and five hierarchical optimisation feature selection algorithms. © 2023, The Authors. All rights reserved.

Keywords: Feature Selection
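The first HFSNIS step coarsens each fine-grained leaf to the coarsest level at which it still has a non-root ancestor, that is, the ancestor sitting directly below the root. A minimal sketch, assuming the class hierarchy is stored as a child-to-parent map; the map, the class names, and the `coarsest_ancestor` and `coarsen_labels` helpers are hypothetical.

```python
from typing import Dict, List


def coarsest_ancestor(leaf: str, parent: Dict[str, str], root: str) -> str:
    """Walk up from `leaf` and return the ancestor that sits directly under `root`."""
    node = leaf
    while parent[node] != root:
        node = parent[node]
    return node


def coarsen_labels(labels: List[str], parent: Dict[str, str], root: str) -> List[str]:
    """Replace each fine-grained label by its coarsest non-root ancestor."""
    return [coarsest_ancestor(y, parent, root) for y in labels]


# Toy three-level hierarchy: root -> {animal, vehicle} -> ... -> leaves.
parent = {
    "animal": "root", "vehicle": "root",
    "mammal": "animal", "bird": "animal",
    "cat": "mammal", "dog": "mammal", "sparrow": "bird",
    "car": "vehicle", "truck": "vehicle",
}
y_fine = ["cat", "sparrow", "truck", "dog"]
print(coarsen_labels(y_fine, parent, "root"))
```

With labels coarsened this way, a "similar" neighbour of a `cat` sample can now be any `animal` sample, such as a `sparrow`, which is how the coarsest-level search can surface features that a sibling search confined to `mammal` would miss.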

DARE: disentanglement-augmented rationale extraction

Proceedings of the 36th International Conference on Neural Information Processing Systems

Authors: Linan Yue, Qi Liu, Yichao Du, Yanqing An, Li Wang, Enhong Chen. Affiliations: Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China and State Key Laboratory of Cognitive Intelligence; ByteDance

ISBN (print): 9781713871088

Rationale extraction is a straightforward way to improve model explainability: rationales are a subsequence of the original input that can be extracted to support the prediction. Existing methods mainly cascade a selector, which extracts the rationale tokens, with a predictor, which makes the prediction based on the selected tokens. Because previous works do not fully exploit the original input and ignore the information in non-selected tokens, in this paper we propose Disentanglement-Augmented Rationale Extraction (DARE), a method that encapsulates more information from the input to extract rationales. Specifically, DARE first disentangles the input into rationale and non-rationale representations, and then learns more comprehensive rationale representations for extraction by minimizing the mutual information (MI) between the two disentangled representations. In addition, to improve MI minimization, we develop a new MI estimator by exploring existing MI estimation methods. Extensive experimental results on three real-world datasets and simulation studies clearly validate the effectiveness of the proposed method.

A Multi-Signal Perception Network for Textile Composition Identification

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Peng, Bo; He, Liren; Wu, Dong; Chi, Mingmin; Chen, Jintao. Affiliations: Shanghai Key Laboratory of Data Science, Fudan University School of Computer Science, Shanghai, China; Zhongshan PoolNet Technology Co. Ltd, Zhongshan Fudan Joint Innovation Center, Zhongshan, China; Shanghai Fabric Eyes Artificial Intelligence Technology Co. Ltd, Shanghai, China; Zhengzhou Zhongke Institute of Integrated Circuit and System Application, China

ISBN (print): 9781728163277

Textile composition identification (TCI) is an essential step in the textile industry. Methods based on computer vision or near-infrared (NIR) signal processing have shown potential for nondestructive TCI. However, these methods ignore that integrating NIR signals with visual information may help a model learn better representations through information complementarity. This paper proposes a Multi-Signal Perception Network (MSPNet) for nondestructive textile composition identification that lets the model benefit from multimodal data. First, a two-way feature extraction network obtains multimodal features. We then propose a multimodal signal fusion module to control the aggregation granularity among multimodal data. Specifically, target areas of the image are perceived by a target area perception (TAP) module, and a bi-gated aggregation (Bi-GFA) module is designed to capture consistent semantic information from signal to image and from image to signal. Quantitative and qualitative results show that MSPNet significantly improves on both single-modal and multimodal approaches. © 2023 IEEE.

Keywords: Computer vision
