Search results - Inner Mongolia University Library

Crowdsourcing aggregation with deep Bayesian learning

Science China (Information Sciences), 2021, Vol. 64, No. 3, pp. 46-56

Authors: Shao-Yuan LI; Sheng-Jun HUANG; Songcan CHEN. Affiliations: College of Computer Science and Technology, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics; State Key Laboratory for Novel Software Technology, Nanjing University

In this study, we consider a crowdsourcing classification problem in which labeling information from crowds is aggregated to infer latent true labels. We propose a fully Bayesian deep generative crowdsourcing model (Bayes DGC), which combines the strength of deep neural networks (DNNs) in automatic representation learning with the interpretable probabilistic structure encoding of probabilistic graphical models. The model comprises a DNN classifier as a prior for the true labels and a probabilistic model for the annotation generation process. The DNN classifier and the annotation generation process share the latent true label variables. To address the inference challenge, we develop a natural-gradient stochastic variational inference method, which combines variational message passing for conjugate parameters with stochastic gradient descent for the DNN and learns the distribution of latent true labels and workers' confusion matrices via end-to-end training. We illustrate the effectiveness of the proposed model using empirical results on 22 real-world datasets.
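The abstract describes a classifier output acting as a prior over the true label, combined with per-worker confusion matrices that model how annotations are generated. The following is a minimal sketch of that combination for a single item with fixed, known parameters. It is not the paper's inference procedure (which learns these quantities with variational inference); the function name, array shapes, and numbers are illustrative assumptions.

```python
import numpy as np

def posterior_true_label(prior, confusions, annotations):
    """Posterior over the true label of one item (hypothetical helper).

    prior:       shape (K,), e.g. a classifier's softmax output for the item.
    confusions:  shape (W, K, K); confusions[w, k, j] = P(worker w reports j | true label k).
    annotations: dict {worker index: reported label}; workers not listed gave no label.
    """
    log_post = np.log(prior)
    for w, label in annotations.items():
        # Likelihood of this worker's report under each candidate true label.
        log_post += np.log(confusions[w, :, label])
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Made-up example: two classes, one reliable worker and one noisy worker.
prior = np.array([0.5, 0.5])
confusions = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # worker 0: mostly correct
    [[0.6, 0.4], [0.4, 0.6]],   # worker 1: close to random
])
post = posterior_true_label(prior, confusions, {0: 1, 1: 0})
```

Here the two workers disagree, and the posterior favours the label reported by the worker whose confusion matrix is closer to the identity.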

Keywords: crowdsourcing classification fully Bayesian deep generative models natural gradient stochastic variational inference

THE BLESSING OF RANDOMNESS: SDE BEATS ODE IN GENERAL DIFFUSION-BASED IMAGE EDITING

12th International Conference on Learning Representations, ICLR 2024

Authors: Nie, Shen; Guo, Hanzhong Allan; Lu, Cheng; Zhou, Yuhao; Zheng, Chenyu; Li, Chongxuan. Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China

We present a unified probabilistic formulation for diffusion-based image editing, where a latent variable is edited in a task-specific manner and generally deviates from the corresponding marginal distribution induced by the original stochastic or ordinary differential equation (SDE or ODE). Instead, it defines a corresponding SDE or ODE for editing. In the formulation, we prove that the Kullback-Leibler divergence between the marginal distributions of the two SDEs gradually decreases while that for the ODEs remains as the time approaches zero, which shows the promise of SDE in image editing. Inspired by it, we provide the SDE counterparts for widely used ODE baselines in various tasks including inpainting and image-to-image translation, where SDE shows a consistent and substantial improvement. Moreover, we propose SDE-Drag, a simple yet effective method built upon the SDE formulation for point-based content dragging. We build a challenging benchmark (termed DragBench) with open-set natural, art, and AI-generated images for evaluation. A user study on DragBench indicates that SDE-Drag significantly outperforms our ODE baseline, existing diffusion-based methods, and the renowned DragGAN. Our results demonstrate the superiority and versatility of SDE in image editing and push the boundary of diffusion-based editing methods. See the project page https://***/SDE-Drag-demo/ for the code and DragBench dataset. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Keywords: Diffusion

A UNIFIED FRAMEWORK FOR SOFT THRESHOLD PRUNING

11th International Conference on Learning Representations, ICLR 2023

Authors: Chen, Yanqi; Ma, Zhengyu; Fang, Wei; Zheng, Xiawu; Yu, Zhaofei; Tian, Yonghong. Affiliations: National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, China; Institute for Artificial Intelligence, Peking University, China; Peng Cheng Laboratory, China

Soft threshold pruning is among the cutting-edge pruning methods with state-of-the-art performance. However, previous methods either search aimlessly over the threshold scheduler or simply make the threshold trainable, lacking a theoretical explanation from a unified perspective. In this work, we reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA), a classic method from the fields of sparse recovery and compressed sensing. Under this theoretical framework, all threshold tuning strategies proposed in previous studies of soft threshold pruning can be viewed as different ways of tuning the L1-regularization term. We further derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework. This scheduler keeps the L1-regularization coefficient stable, implying a time-invariant objective function from the perspective of optimization. In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD. We conduct extensive experiments and verify its state-of-the-art performance on both Artificial Neural Networks (ResNet-50 and MobileNet-V1) and Spiking Neural Networks (SEW ResNet-18) on ImageNet datasets. On the basis of this framework, we derive a family of pruning methods, including sparsify-during-training, early pruning, and pruning at initialization. The code is available at https://***/Yanqi-Chen/LATS. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
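For readers unfamiliar with ISTA, the sketch below shows the textbook soft-thresholding operator and one ISTA iteration for an L1-regularized objective. It illustrates the classic algorithm the abstract refers to, not the paper's pruning code; the function names and sample weights are assumptions made for this example.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink every entry toward zero by t; entries with |w| <= t become zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ista_step(w, grad, lr, lam):
    """One ISTA iteration for loss(w) + lam * ||w||_1.

    A plain gradient step on the smooth loss is followed by shrinkage
    with threshold lr * lam.
    """
    return soft_threshold(w - lr * grad, lr * lam)

# Made-up weights: small-magnitude entries are pruned to zero.
w = np.array([0.8, -0.05, 0.3, -0.6])
pruned = soft_threshold(w, 0.1)

# With a zero gradient, an ISTA step reduces to pure shrinkage.
same = ista_step(w, np.zeros_like(w), lr=0.5, lam=0.2)
```

In standard ISTA the threshold equals the step size times the L1 coefficient, so holding the coefficient fixed means the threshold follows the learning-rate schedule. This is one way to read the abstract's remark about a stable L1-regularization coefficient.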

Keywords: Neural networks

RIGID PROTEIN-PROTEIN DOCKING VIA EQUIVARIANT ELLIPTIC-PARABOLOID INTERFACE PREDICTION

12th International Conference on Learning Representations, ICLR 2024

Authors: Yu, Ziyang; Huang, Wenbing; Liu, Yang. Affiliations: Department of Computer Science, Tsinghua University, China; Tsinghua University, China; Gaoling School of Artificial Intelligence, Renmin University of China, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China

The study of rigid protein-protein docking plays an essential role in a variety of tasks such as drug design and protein engineering. Recently, several learning-based methods have been proposed, exhibiting much faster docking speed than those computational methods. In this paper, we propose a novel learning-based method called ElliDock, which predicts an elliptic paraboloid to represent the protein-protein docking interface. To be specific, our model estimates elliptic paraboloid interfaces for the two input proteins respectively, and obtains the roto-translation transformation for docking by making two interfaces coincide. By its design, ElliDock is independently equivariant with respect to arbitrary rotations/translations of the proteins, which is an indispensable property to ensure the generalization of the docking process. Experimental evaluations show that ElliDock achieves the fastest inference time among all compared methods and is strongly competitive with current state-of-the-art learning-based models such as DiffDock-PP and Multimer particularly for antibody-antigen docking. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Keywords: Proteins

EdgeFormer: Edge-Assisted Transformer for Thermal Images Semantic Segmentation

2nd International Conference on Electronic Information Engineering, Big Data, and Computer Technology, EIBDCT 2023

Authors: Wang, Futian; Ding, Zhongfeng; Shi, Tao; Tang, Jin. Affiliations: Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China

ISBN: (print) 9781510664968

Most problems in power systems are related to the temperature of electrical equipment. Abnormally high temperatures not only damage the equipment itself but also threaten people's lives and property. Patrol inspectors therefore carry out routine inspections of electrical equipment to ensure safety. Semantic segmentation of electrical equipment plays an important role in these inspections: with the segmentation result, inspectors can quickly judge whether the temperature of the equipment is normal and then take the corresponding action. For this reason, we propose EdgeFormer, an end-to-end network that segments electrical equipment in thermal images. In our method, we employ an edge information extraction network to obtain rich edge features that improve segmentation performance at the edges and interiors of electrical equipment, and we design a global enhancing module (GEM) to obtain rich semantic information. We also propose a feature inserting module (FIM) to fuse the feature maps from different stages. Extensive experiments on the LS-ETS dataset show that EdgeFormer achieves the best performance. © 2023 SPIE.

Keywords: Semantic Segmentation

MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing

Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024

Authors: Li, Jiaqi; Du, Miaozeng; Zhang, Chuanyi; Chen, Yongrui; Hu, Nan; Qi, Guilin; Jiang, Haiyun; Cheng, Siyuan; Tian, Bozhong. Affiliations: School of Cyber Science and Engineering, Southeast University, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China; College of Artificial Intelligence and Automation, Hohai University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China; Tencent AI Lab, China; Zhejiang University, China

ISBN: (print) 9798891760998

Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs). Despite its potential, current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored. This gap presents a notable challenge, as FG entity recognition is pivotal for the practical deployment and effectiveness of MLLMs in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for the FG multimodal entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess different perspectives, including Vanilla Name Answering, Entity-Level Caption, and Complex-Scenario Recognition. In addition, a new form of knowledge editing, Multi-Step Editing, is introduced to evaluate the editing efficiency. Through our extensive evaluations, we demonstrate that the current state-of-the-art methods face significant challenges in tackling our proposed benchmark, underscoring the complexity of FG knowledge editing in MLLMs. Our findings spotlight the urgent need for novel approaches in this domain, setting a clear agenda for future research and development efforts within the community. © 2024 Association for Computational Linguistics.

Keywords: Computational linguistics

Unlocking the Potential of Model Merging for Low-Resource Languages

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Tao, Mingxu; Zhang, Chen; Huang, Quzhe; Ma, Tianyao; Huang, Songfang; Zhao, Dongyan; Feng, Yansong. Affiliations: Wangxuan Institute of Computer Technology, Peking University, China; State Key Laboratory of General Artificial Intelligence, Peking University, China; Center for Data Science, Peking University, China; College of Engineering, Peking University, China

ISBN: (print) 9798891761681

Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised finetuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose a new model merging solution as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing performance saturation in model merging with increasingly more training tokens, we further analyze the merging process and introduce a slack variable to the model merging algorithm to mitigate the loss of important parameters, thereby enhancing model performance. We hope that model merging can benefit more human languages suffering from data scarcity with its higher data efficiency. © 2024 Association for Computational Linguistics.

Keywords: Modeling languages

The Reliability of OKRidge Method in Solving Sparse Ridge Regression Problems

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Li, Xiyuan; Wang, Youjun; Liu, Weiwei. Affiliations: School of Computer Science, Wuhan University; National Engineering Research Center for Multimedia Software, Wuhan University; Institute of Artificial Intelligence, Wuhan University; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University

Sparse ridge regression problems play a significant role across various domains. To solve sparse ridge regression, [1] recently proposes an advanced algorithm, Scalable Optimal K-Sparse Ridge Regression (OKRidge), which is both faster and more accurate than existing approaches. However, the absence of theoretical analysis on the error of OKRidge impedes its large-scale applications. In this paper, we reframe the estimation error of OKRidge as a Primary Optimization (PO) problem and employ the Convex Gaussian min-max theorem (CGMT) to simplify the PO problem into an Auxiliary Optimization (AO) problem. Subsequently, we provide a theoretical error analysis for OKRidge based on the AO problem. This error analysis improves the theoretical reliability of OKRidge. We also conduct experiments to verify our theorems and the results are in excellent agreement with our theoretical findings. © 2024 Neural information processing systems foundation. All rights reserved.

Feature Mutual Reinforcement Learning and Resampling for RGB-T Tracking

2nd International Conference on Electronic Information Engineering, Big Data, and Computer Technology, EIBDCT 2023

Authors: Wang, Futian; Zhou, Xuan; Wang, Wenqi. Affiliations: Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China

ISBN: (print) 9781510664968

Visible-thermal infrared (RGB-T) tracking has attracted growing attention from researchers. A key issue is how to fully exploit and integrate the complementary features of visible and thermal infrared images. After extracting image features, many researchers simply fuse them by addition, concatenation, or dedicated fusion modules. However, these methods ignore the effects of fusion features at different levels on target modeling and modality-specific feature extraction. In this work, we propose an RGB-T tracking network (MRLRNet) based on feature mutual reinforcement learning and resampling. Specifically, we design a feature mutual reinforcement learning module, which combines features from different layers to achieve progressive fusion. After the features of each layer are extracted, the aggregated features are used to enhance the modality-specific features, yielding better specific feature representations and reducing noisy and redundant features. We also design a resampling module, which calculates the offset between two adjacent frames by phase correlation and recalculates the Gaussian sample points to address target loss caused by sudden camera movement. Extensive experiments on three RGB-T tracking datasets, GTOT, RGBT234 and LasHeR, demonstrate the effectiveness of this method. © 2023 SPIE.
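The resampling module is described as estimating the offset between adjacent frames by phase correlation. The sketch below shows generic phase correlation on two single-channel arrays using NumPy's FFT. It is an illustration of the general technique, not the authors' module; the function name, the synthetic frames, and the restriction to integer circular shifts are assumptions of this example.

```python
import numpy as np

def phase_correlation_shift(prev, curr):
    """Estimate the integer (dy, dx) circular shift that maps prev onto curr."""
    f_prev = np.fft.fft2(prev)
    f_curr = np.fft.fft2(curr)
    cross = f_curr * np.conj(f_prev)
    cross /= np.abs(cross) + 1e-12       # discard magnitude, keep phase only
    corr = np.fft.ifft2(cross).real      # peaks at the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    # Peaks past the midpoint of an axis correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Synthetic check: shift a random frame by a known amount and recover it.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
moved = np.roll(frame, shift=(5, -3), axis=(0, 1))
shift = phase_correlation_shift(frame, moved)
```

A tracker could add the recovered offset to the previous target position before drawing candidate samples, which is one plausible reading of how such an offset helps under sudden camera motion.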

Keywords: Reinforcement learning

Speaker diarization based on multi-timescale feature fusion

2nd International Conference on Electronic Information Engineering, Big Data, and Computer Technology, EIBDCT 2023

Authors: Wang, Futian; Chen, Lailong. Affiliations: Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China

ISBN: (print) 9781510664968

Deep and convolutional neural networks have performed well in capturing speaker characteristics, and the ECAPA-TDNN model has demonstrated outstanding performance in both speaker verification and speaker diarization. In this paper, during the speech segmentation stage, we uniformly re-divide the speech at multiple time scales based on oracle voice activity detection. We also modify the ECAPA-TDNN architecture by adding a RepVGG module to extract richer features, and then aggregate all of the outputs. Finally, we use DOVER-Lap to combine the clustering results of the multiple schemes into the final temporal labeling. The best result achieves a diarization error rate of 1.91% on the classical AMI meeting corpus. © 2023 SPIE.
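For context on the reported metric, diarization error rate is conventionally computed as the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. The sketch below encodes that standard definition. The function name and the durations are made up for illustration and are not taken from the paper.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """Standard DER: erroneous time divided by total reference speech time.

    All arguments are durations in the same unit (for example seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech

# Made-up durations in seconds, chosen only to show the arithmetic.
der = diarization_error_rate(false_alarm=3.0, missed=6.0, confusion=9.0,
                             total_speech=600.0)
```

With these illustrative durations the error time is 18 s out of 600 s of reference speech, that is, a DER of 3%.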

Keywords: Speech recognition
