Transductive semi-supervised learning methods aim to automatically label large datasets by leveraging information provided by a few manually labeled data points and the intrinsic structure of the dataset. Many such methods based on a graph signal representation of a dataset have been proposed, in which the nodes correspond to the data points, the edges connect similar points, and the graph signal is the mapping between the nodes and the labels. Most of the existing methods use deterministic signal models and try to recover the graph signal using a regularized or constrained convex optimization approach, where the regularization/constraint term enforces some sort of smoothness of the graph signal. This thesis takes a different route and investigates a probabilistic graphical modeling approach in which the graph signal is considered a Markov random field defined over the underlying network structure. The measurement process, modeling the initial manually obtained labels, and smoothness assumptions are imposed by a probability distribution defined over the Markov network corresponding to the data graph. Various approximate inference methods, such as loopy belief propagation and mean field methods, are studied by means of numerical experiments involving both synthetic and real-world datasets.
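To make the mean field idea concrete, the following is a minimal sketch (not the thesis's exact model): binary labels on a small path graph, an Ising-style pairwise coupling for smoothness, and a strong local field at the manually labeled nodes. All parameter values are illustrative.

```python
import math

# Mean-field inference for binary labels {-1, +1} on a small graph MRF.
# Edges couple neighbouring labels; two observed nodes act as evidence.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # a path graph on 5 nodes
observed = {0: +1, 4: +1}                  # manually labeled nodes
beta = 1.0                                 # smoothness coupling strength
obs_strength = 4.0                         # confidence in observed labels

neighbors = {i: [] for i in range(5)}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

# m[i] = E[label_i] under the fully factorized variational distribution
m = [0.0] * 5
for _ in range(50):                        # fixed-point iterations
    for i in range(5):
        field = beta * sum(m[j] for j in neighbors[i])
        field += obs_strength * observed.get(i, 0)
        m[i] = math.tanh(field)

labels = [1 if mi > 0 else -1 for mi in m]
print(labels)  # the observed +1 labels propagate to the interior nodes
```

The same fixed-point loop generalizes to multi-class labels by replacing tanh with a softmax over per-class fields.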
ISBN (Print): 9781479934003
Classification of multilabel documents is essential to information retrieval and text mining. Most existing approaches to multilabel text classification pay no attention to the relationship between class labels and input documents, and rely on labeled data for classification at all times. In fact, unlabeled data is readily available, whereas generating labeled data is expensive and error prone because it requires human annotation. In this paper, we propose a novel multilabel document classification approach based on a semi-supervised mixture model of Watson distributions on the document manifold, which explicitly considers the manifold structure of the document space to exploit both labeled and unlabeled data efficiently for classification. Our proposed approach models all labels within a dataset simultaneously, which lends itself well to considering the relationships between these labels. The experimental results show that the proposed method outperforms state-of-the-art methods for multilabel text classification.
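For intuition, here is a sketch of the scoring step with Watson distributions, which are axially symmetric densities on the unit sphere with log-density proportional to kappa * (mu . x)^2. Documents are L2-normalized term vectors; the class directions and concentrations below are invented for illustration, not fitted as in the paper.

```python
import math

def normalize(v):
    """L2-normalize a vector so it lies on the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def watson_log_density(x, mu, kappa):
    # log p(x) = kappa * (mu . x)^2 + const; the normalizing constant
    # cancels when comparing labels, so it is omitted here.
    dot = sum(a * b for a, b in zip(mu, x))
    return kappa * dot * dot

# Hypothetical class parameters (mean direction, concentration)
classes = {
    "sports":   (normalize([1.0, 0.2, 0.0]), 10.0),
    "politics": (normalize([0.1, 1.0, 0.3]), 10.0),
}

doc = normalize([0.9, 0.3, 0.1])   # a new document's term vector
scores = {c: watson_log_density(doc, mu, k) for c, (mu, k) in classes.items()}
best = max(scores, key=scores.get)
print(best)
```

In the semi-supervised mixture, the same per-class scores become posterior responsibilities that are re-estimated from labeled and unlabeled documents jointly.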
Multi-label classification consists of learning a function capable of mapping an object to a set of relevant labels. It has applications such as the association of genes with biological functions, semantic classification of scenes, and text categorization. Traditional (i.e., single-label) classification is therefore a particular case of multi-label classification in which each object is associated with exactly one label. A successful approach to constructing classifiers is to obtain a probabilistic model of the relation between object attributes and labels. This model can then be used to classify objects, finding the most likely prediction by computing the marginal probability or the most probable explanation (MPE) of the labels given the attributes. Depending on the family of probabilistic models chosen, such inferences may be intractable when the number of labels is large. Sum-Product Networks (SPNs) are deep probabilistic models that allow tractable marginal inference. Nevertheless, as with many other probabilistic models, performing MPE inference in SPNs is NP-hard. Although SPNs have already been used successfully for traditional (i.e., single-label) classification tasks, there has been no in-depth investigation of their use for multi-label classification. In this work we investigate the use of SPNs for multi-label classification. We compare several algorithms for learning SPNs combined with different proposed approaches for classification. We show that SPN-based multi-label classifiers are competitive against state-of-the-art classifiers, such as Random k-Labelsets with Support Vector Machines and MPE inference on CutNets, on a collection of benchmark datasets.
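The tractability of marginal inference in SPNs can be seen in a toy example: marginalizing a variable amounts to setting both of its leaf indicators to 1, after which a single bottom-up pass computes the marginal exactly. The two-variable network below is a made-up illustration, not one learned by the paper's algorithms.

```python
# A toy sum-product network over two binary variables X1, X2.
def leaf(var, val):
    return ("leaf", var, val)

def evaluate(node, evidence):
    kind = node[0]
    if kind == "leaf":
        _, var, val = node
        if var not in evidence:          # marginalized: indicator = 1
            return 1.0
        return 1.0 if evidence[var] == val else 0.0
    if kind == "prod":
        r = 1.0
        for child in node[1]:
            r *= evaluate(child, evidence)
        return r
    # sum node: weighted mixture of children
    return sum(w * evaluate(c, evidence) for w, c in node[1])

# Mixture of two independent components: 0.6 * P1(X1)P1(X2) + 0.4 * P2(X1)P2(X2)
spn = ("sum", [
    (0.6, ("prod", [("sum", [(0.8, leaf("X1", 1)), (0.2, leaf("X1", 0))]),
                    ("sum", [(0.3, leaf("X2", 1)), (0.7, leaf("X2", 0))])])),
    (0.4, ("prod", [("sum", [(0.1, leaf("X1", 1)), (0.9, leaf("X1", 0))]),
                    ("sum", [(0.5, leaf("X2", 1)), (0.5, leaf("X2", 0))])])),
])

p_x1 = evaluate(spn, {"X1": 1})          # P(X1=1), X2 marginalized out
print(round(p_x1, 3))                    # 0.6*0.8 + 0.4*0.1 = 0.52
```

MPE inference, by contrast, would replace sum nodes with max nodes, and recovering an exact maximizer is what makes the multi-label case hard in general.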
ISBN (Print): 9798400710940
It is now widely acknowledged that machine learning models, trained on data without due care, often exhibit discriminatory behavior. Traditional fairness research has mainly focused on supervised learning tasks, particularly classification. While fairness in unsupervised learning has received some attention, the literature has primarily addressed fair representation learning of continuous embeddings. This paper, however, takes a different approach by investigating fairness in unsupervised learning using graphical models with discrete latent variables. We develop a fair stochastic variational inference method for discrete latent variables. Our approach uses a fairness penalty on the variational distribution that reflects the principles of intersectionality, a comprehensive perspective on fairness from the fields of law, social sciences, and humanities. Intersectional fairness brings the challenge of data sparsity in minibatches, which we address via a stochastic approximation approach. We first show the utility of our method in improving equity and fairness for clustering using naive Bayes and Gaussian mixture models on benchmark datasets. To demonstrate the generality of our approach and its potential for real-world impact, we then develop a specialized graphical model for criminal justice risk assessments, and use our fairness approach to prevent the inferences from encoding unfair societal biases.
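One way to picture a fairness penalty on a variational distribution: q[i][k] is the variational probability that point i takes discrete latent value k, and the penalty grows as a subgroup's assignment rates deviate from the population's. The squared-deviation form and all numbers below are illustrative stand-ins, not the paper's exact penalty.

```python
# Sketch: penalize per-group deviations in expected latent assignments.
def assignment_rates(q, members):
    """Average variational assignment probabilities over a set of points."""
    K = len(q[0])
    rates = [0.0] * K
    for i in members:
        for k in range(K):
            rates[k] += q[i][k]
    return [r / len(members) for r in rates]

def fairness_penalty(q, groups):
    overall = assignment_rates(q, list(range(len(q))))
    penalty = 0.0
    for members in groups.values():       # each (intersectional) subgroup
        rates = assignment_rates(q, members)
        penalty += sum((r - o) ** 2 for r, o in zip(rates, overall))
    return penalty

# Two points per group; the "unfair" q sorts groups into separate clusters.
q_unfair = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
q_fair   = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
groups = {"a": [0, 1], "b": [2, 3]}

print(fairness_penalty(q_unfair, groups), fairness_penalty(q_fair, groups))
```

In stochastic variational inference this term would be estimated per minibatch, which is where the paper's sparsity problem (few members of a small intersectional group per batch) arises.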
Machine learning methods often face a tradeoff between the accuracy of discriminative models and the lower sample complexity of their generative counterparts. This inspires a need for hybrid methods. In this paper we present the graphical ensemble classifier (GEC), a novel combination of logistic regression and naive Bayes. By partitioning the feature space based on known independence structure, GEC is able to handle datasets with a diverse set of features and achieve higher accuracy than a purely discriminative model from less training data. In addition to describing the theoretical basis of our model, we demonstrate its practical effectiveness on artificial data and on the 20-newsgroups, MNIST, and MediFor datasets.
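The hybrid idea can be sketched in a few lines: split the features into two blocks assumed independent given the class, score one block discriminatively (logistic regression) and the other generatively (naive Bayes), and add the log-odds. The weights and likelihoods below are made up for illustration; GEC would fit both parts from data.

```python
import math

def logistic_log_odds(x, w, b):
    # For logistic regression, log P(y=1|x) - log P(y=0|x) = w.x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def naive_bayes_log_odds(x, like1, like0, prior1=0.5):
    # Log-odds from Bernoulli class-conditional likelihoods per feature
    lo = math.log(prior1 / (1 - prior1))
    for xi, p1, p0 in zip(x, like1, like0):
        lo += math.log(p1 if xi else 1 - p1) - math.log(p0 if xi else 1 - p0)
    return lo

def gec_predict(x_disc, x_gen):
    """Combine the two blocks' log-odds; valid if the blocks are
    conditionally independent given the class."""
    total = logistic_log_odds(x_disc, w=[1.5, -0.5], b=0.0)
    total += naive_bayes_log_odds(x_gen, like1=[0.9, 0.7], like0=[0.2, 0.4])
    return 1 if total > 0 else 0

pred = gec_predict(x_disc=[1.0, 0.2], x_gen=[1, 1])
print(pred)
```

Adding log-odds is exactly a product of the two models' likelihood ratios, which is where the known independence structure enters.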
Programmers strive to design programs that are flexible, updateable, and maintainable. However, factors such as lack of time, high costs, and workload lead to the creation of software with inadequacies known as anti-patterns. To identify and refactor software anti-patterns, many research studies have been conducted using machine learning. Even though some previous works were very accurate in identifying anti-patterns, a method that takes into account the relationships between different structures is still needed, as is a practical method trained according to the characteristics of each program. Such a method should be able to identify anti-patterns and perform the necessary refactorings. This paper proposes a framework based on probabilistic graphical models for identifying and refactoring anti-patterns. A graphical model is created by extracting class properties from the source code. As a final step, a Bayesian network is trained, which determines whether anti-patterns are present based on the characteristics of neighboring classes. To evaluate the proposed approach, the model is trained on six different anti-patterns and six different Java programs. The proposed model identified these anti-patterns with a mean accuracy of 85.16% and a mean recall of 79%. Additionally, the model has been used to introduce several refactoring methods, which are shown to ultimately create a system with less coupling and higher cohesion.
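A minimal sketch of the inference step: a toy Bayesian network in which two hypothetical binary class metrics ("large", "coupled") bear on whether a class is a God Class. All probabilities are invented; the paper learns them from each program's extracted class properties.

```python
# Toy Bayesian network: god_class -> large, god_class -> coupled,
# with the two metrics conditionally independent given god_class.
prior = {True: 0.1, False: 0.9}            # P(god_class)
p_large   = {True: 0.9, False: 0.3}        # P(large=1 | god_class)
p_coupled = {True: 0.8, False: 0.2}        # P(coupled=1 | god_class)

def posterior_god_class(large, coupled):
    """P(god_class=True | metrics) via Bayes' rule."""
    def joint(g):
        p = p_large[g] if large else 1 - p_large[g]
        p *= p_coupled[g] if coupled else 1 - p_coupled[g]
        return p * prior[g]
    num = joint(True)
    return num / (num + joint(False))

p = posterior_god_class(large=True, coupled=True)
print(round(p, 3))   # both warning signs present flips the low prior
```

The paper's network additionally conditions on the characteristics of neighboring classes, so the evidence set per class is larger than this two-metric toy.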
Several algorithms have been proposed for discovering the graphical structure of Bayesian networks. Most of these algorithms are restricted to observational data, and some enable us to incorporate knowledge as constraints on what can and cannot be discovered by an algorithm. A common type of such knowledge involves the temporal order of the variables in the data: for example, knowledge that event B occurs after observing A, and hence the constraint that B cannot cause A. This paper investigates real-world case studies that incorporate interesting properties of objective temporal variable order, and the impact these temporal constraints have on the learnt graph. The results show that most of the learnt graphs are subject to major modifications after incorporating incomplete objective temporal information. Because temporal information is widely viewed as a form of knowledge that is subjective, rather than as a form of data that tends to be objective, it is generally disregarded and reduced to an optional piece of information that only a few structure learning algorithms may consider. The paper argues that objective temporal information should form part of observational data, to reduce the risk of disregarding such information when available and to encourage its reusability across related studies.
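The constraint mechanism itself is simple to sketch: assign variables to temporal tiers and forbid, before search begins, any candidate edge that points backwards in time. The tier assignments below are illustrative; they play the role of the "B occurs after A" knowledge in the abstract.

```python
# Encode objective temporal order as an edge whitelist for structure search.
tiers = {"A": 0, "B": 1, "C": 1, "D": 2}   # A observed first, D last

def allowed_edges(tiers):
    """All directed edges that do not point backwards in time.
    Edges within the same tier remain unconstrained (either direction)."""
    vs = list(tiers)
    return {(x, y) for x in vs for y in vs
            if x != y and tiers[x] <= tiers[y]}

candidates = allowed_edges(tiers)
print(("A", "B") in candidates)   # forward in time: allowed
print(("B", "A") in candidates)   # backwards in time: forbidden
```

Any score- or constraint-based learner that accepts a blacklist/whitelist can consume this set, which is how partial (incomplete) temporal information still prunes the search space.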
Data reconciliation is a widely utilised technique in the process industries for obtaining consistent estimates of process variables from measurements corrupted by random and gross errors, taking process models as constraints. Existing formulations of data reconciliation assume the process models to be error free. In practice, however, process models can suffer from inaccuracies, leading to uncertainties in the states. This paper introduces a new method for data reconciliation developed in the framework of Bayesian networks that accounts for these state uncertainties. The solution is obtained by using a Bayesian network model translated from the process model and applying statistical inference techniques to estimate the reconciled values of the states. A novel method for constructing an acyclic Bayesian network for process networks with recycle streams is proposed. The method is also extended to data reconciliation of partially measured systems. The proposed data reconciliation schemes are demonstrated on two case studies.
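For context, the classical error-free-model baseline that this paper relaxes has a closed form. With one linear constraint a.x = 0 (e.g. a mass balance x1 + x2 = x3) and diagonal measurement variances, the weighted-least-squares reconciliation is x_hat = y - V a (a'Va)^-1 (a'y). The flows and variances below are illustrative.

```python
# Classical data reconciliation for three flows with x1 + x2 = x3.
a = [1.0, 1.0, -1.0]           # constraint coefficients: x1 + x2 - x3 = 0
y = [10.2, 5.1, 14.8]          # measured flows (violate the balance by 0.5)
v = [0.04, 0.01, 0.09]         # measurement variances (diagonal V)

residual = sum(ai * yi for ai, yi in zip(a, y))    # a'y: balance violation
denom = sum(ai * ai * vi for ai, vi in zip(a, v))  # a'Va (scalar here)
x_hat = [yi - vi * ai * residual / denom
         for yi, vi, ai in zip(y, v, a)]           # closed-form adjustment

print([round(x, 4) for x in x_hat])
print(round(x_hat[0] + x_hat[1] - x_hat[2], 9))    # balance now holds
```

Note the adjustment each measurement receives is proportional to its variance: less trusted sensors absorb more of the imbalance. The paper's Bayesian network formulation generalizes this by letting the constraint itself (the process model) be uncertain.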
Background: De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence difficult. A key step in this process is the inference of the multiplicities of nodes and arcs in the graph. These multiplicities correspond to the number of times each k-mer (resp. (k+1)-mer) implied by a node (resp. arc) is present in the genomic sequence. Determining multiplicities thus reveals the repeat structure and the presence of sequencing errors. Multiplicities of nodes/arcs in the de Bruijn graph are reflected in their coverage; however, coverage variability and coverage biases render their determination ambiguous. Current methods to determine node/arc multiplicities base their decisions solely on the information in nodes and arcs individually, under-utilising the information present in the sequencing data. Results: To improve the accuracy with which node and arc multiplicities in a de Bruijn graph are inferred, we developed a conditional random field (CRF) model that efficiently combines the coverage information within each node/arc individually with the information of surrounding nodes and arcs. Multiplicities are thus collectively assigned in a more consistent manner. Conclusions: We demonstrate that the CRF model yields significant improvements in accuracy and a more robust expectation-maximisation parameter estimation. True k-mers can be distinguished from erroneous k-mers with a higher F1 score than with existing methods. A C++11 implementation is available under the GNU AGPL v3.0 license.
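The gain from combining per-node coverage evidence with neighbour information can be sketched on two adjacent nodes: each node contributes a Poisson-style log-likelihood of its coverage given "lambda per genomic copy", and a pairwise factor rewards agreement between neighbours. Exhaustive search over a tiny state space stands in for proper CRF inference, and every number below is illustrative (the paper estimates such parameters via EM).

```python
import math

lam = 20.0                       # expected coverage per genomic copy
coverage = [22.0, 41.0]          # observed coverages of two adjacent nodes
agree_bonus = 1.0                # pairwise log-reward when |m1 - m2| <= 1

def log_poisson(x, mean):
    """Log Poisson likelihood of coverage x under the given mean."""
    return x * math.log(mean) - mean - math.lgamma(x + 1)

best, best_score = None, -math.inf
for m1 in range(1, 5):           # candidate multiplicities per node
    for m2 in range(1, 5):
        score = log_poisson(coverage[0], lam * m1)   # unary factor, node 1
        score += log_poisson(coverage[1], lam * m2)  # unary factor, node 2
        if abs(m1 - m2) <= 1:                        # pairwise factor
            score += agree_bonus
        if score > best_score:
            best, best_score = (m1, m2), score

print(best)   # coverage 22 -> one copy, 41 -> two copies
```

In the real model this joint assignment runs over the whole graph, which is why ambiguous nodes (e.g. coverage near 1.5 * lambda) get resolved by their neighbours rather than in isolation.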
We present a novel approach to inverse problems in imaging based on a hierarchical Bayesian-MAP (HB-MAP) formulation. In this paper we focus specifically on the difficult and basic inverse problem of multi-sensor (tomographic) imaging, wherein the source object of interest is viewed from multiple directions by independent sensors. Given the measurements recorded by these sensors, the problem is to reconstruct the image (of the object) with a high degree of fidelity. We employ a probabilistic graphical modeling extension of the compound Gaussian distribution as a global image prior within a hierarchical Bayesian inference procedure. Since the prior employed by our HB-MAP algorithm is general enough to subsume a wide class of priors, including those typically employed in compressive sensing (CS) algorithms, the HB-MAP algorithm offers a vehicle for extending the capabilities of current CS algorithms to truly global priors. After rigorously deriving the regression algorithm for solving our inverse problem from first principles, we demonstrate the performance of the HB-MAP algorithm on Monte Carlo trials and on real empirical data (natural scenes). In all cases we find that our algorithm outperforms previous approaches in the literature, including filtered back-projection and a variety of state-of-the-art CS algorithms. We conclude with directions for future research emanating from this work.
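To ground the MAP step: for a linear measurement model y = Ax + n with Gaussian noise and a plain Gaussian image prior, the MAP estimate reduces to ridge regression, x_hat = (A'A + tau I)^-1 A'y. The compound Gaussian prior in the paper replaces this simple prior; the toy 2-pixel, 3-measurement system below is purely illustrative.

```python
# MAP reconstruction for a tiny linear inverse problem (Gaussian prior).
def map_estimate(A, y, tau):
    """Solve (A'A + tau I) x = A'y in closed form for 2 unknowns."""
    n = len(A)
    ata = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(2)]
           for i in range(2)]
    aty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(2)]
    m = [[ata[0][0] + tau, ata[0][1]],
         [ata[1][0], ata[1][1] + tau]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]     # Cramer's rule
    return [(m[1][1] * aty[0] - m[0][1] * aty[1]) / det,
            (m[0][0] * aty[1] - m[1][0] * aty[0]) / det]

# Three "projections" of a 2-pixel object x = [2, 1]: each pixel alone,
# then their sum (a crude stand-in for a tomographic ray).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 1.0, 3.0]                      # noiseless measurements
x_hat = map_estimate(A, y, tau=1e-6)
print([round(x, 3) for x in x_hat])
```

The hierarchical part of HB-MAP alternates estimates like this with updates of the prior's latent scale variables, which is what lets a global, non-Gaussian prior be used while each inner step stays tractable.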