Search Results - Inner Mongolia University Library

International Joint Conference on Computer Science and Software Engineering (JCSSE)

Authors: Sapa Chanyachatchawan, Krich Nasingkun, Patipat Tumsangthong, Porntiwa Chata, Marut Buranarach, Monsak Socharoentum. Affiliations: Leveraging Technology Solutions Section National Electronics and Computer Technology Center Bangkok Thailand Strategic Analytics Networks with Machine Learning and AI Research National Electronics and Computer Technology Center Bangkok Thailand Data Science and Analytics Research Group National Electronics and Computer Technology Center Bangkok Thailand Digital Government Development Agency Bangkok Thailand

In the current era of extensive data usage across industries, data collection, preservation, utilization, and organization have become more challenging and nuanced because, apart from efficiency, it is necessary to consider critical concerns such as data security, privacy, and legal issues. As a result, the Thai government initiated an effort to implement data governance across government agencies. This paper showcases the implementation of data governance in a governmental research organization with highly diverse structured and unstructured data. The implementation follows international standards and the guidelines of the Digital Government Development Agency (DGA). The executives set up the working body, including the Data Governance Council and Data Stewards, responsible for setting up and deploying policies and regulations. Creating awareness and the necessary infrastructure are the main focuses of the first-year phase. The metadata was designed to extend DGA's version and match the organization's unique requirements. A data catalog platform was developed accordingly. We organized activities to boost employee awareness and participation, including advertising and data catalog platform training. By the end of the first year of implementation, every organizational unit had registered at least one data record in the data catalog.


Reproducibility and Geometric Intrinsic Dimensionality: An Investigation on Graph Neural Network Research


arXiv, 2024

Authors: Hille, Tobias; Stubbemann, Maximilian; Hanika, Tom. Affiliations: Knowledge & Data Engineering Group University of Kassel Kassel Germany Interdisciplinary Research Center for Information System Design University of Kassel Kassel Germany Information Systems and Machine Learning Lab University of Hildesheim Hildesheim Germany Institute of Computer Science University of Hildesheim Hildesheim Germany

Difficulties in replication and reproducibility of empirical evidence in machine learning research have become a prominent topic in recent years. Ensuring that machine learning research results are sound and reliable requires reproducibility, which verifies the reliability of research findings using the same code and data. This promotes open and accessible research, robust experimental workflows, and the rapid integration of new findings. Evaluating the degree to which research publications support these different aspects of reproducibility is one goal of the present work. For this we introduce an ontology of reproducibility in machine learning and apply it to methods for graph neural networks. Building on these efforts we turn towards another critical challenge in machine learning, namely the curse of dimensionality, which poses challenges in data collection, representation, and analysis, making it harder to find representative data and impeding the training and inference processes. Using the closely linked concept of geometric intrinsic dimension we investigate to what extent the machine learning models used are influenced by the intrinsic dimension of the data sets they are trained *** Codes 68T01 68T07 68T09 51F99 Copyright © 2024, The Authors. All rights reserved.

Keywords: Graph neural networks


NC-ALG: Graph-Based Active Learning Under Noisy Crowd


International Conference on Data Engineering

Authors: Wentao Zhang, Yexin Wang, Zhenbang You, Yang Li, Gang Cao, Zhi Yang, Bin Cui. Affiliations: Center for Machine Learning Research Peking University Institute of Advanced Algorithms Research Shanghai National Engineering Laboratory for Big Data Analytics and Applications Key Lab of High Confidence Software Technologies Peking University Department of Data Platform TEG Tencent Inc. Beijing Academy of Artificial Intelligence Institute of Computational Social Science Peking University Qingdao

ISBN: (digital) 9798350317152

ISBN: (print) 9798350317169

Graph Neural Networks (GNNs) have achieved great success in various data mining tasks, but they rely heavily on a large number of annotated nodes, requiring considerable human effort. Despite the effectiveness of existing GNN-based Active Learning (AL) methods, they assume that the annotated labels are always correct, which contradicts the error-prone labeling process in a practical crowdsourcing environment. Besides, due to this impractical assumption, existing works only focus on optimizing the node selection in AL but neglect optimizing the labeling process. Therefore, we present NC-ALG, the first GNN-based AL framework that optimizes both the node selection and node labeling process under a noisy crowd. For node selection, NC-ALG introduces a new measurement to model influence reliability and an effective influence maximization objective to select nodes. For node labeling, NC-ALG significantly reduces the labeling cost by considering the model-predicted labels and the labels of mirror nodes. To the best of our knowledge, this is the first attempt to consider GNN-based AL under a practical noisy crowd. Empirical studies on public datasets demonstrate that NC-ALG significantly outperforms existing methods in terms of labeling efficiency. Notably, NC-ALG needs only one-third of the labeling budget that the competitive baseline GRAIN needs to achieve an accuracy of 70.7% on PubMed.

Keywords: Crowdsourcing; Costs; Noise; Predictive models; Graph neural networks; Labeling; Noise measurement


A Privacy-Preserving Framework for Collaborative Machine Learning with Kernel Methods


IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA)

Authors: Anika Hannemann, Ali Burak Ünal, Arjhun Swaminathan, Erik Buchmann, Mete Akgün. Affiliations: Dept. of Computer Science Leipzig University Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig Germany Medical Data Privacy and Privacy-preserving Machine Learning (MDPPML) University of Tübingen Institute for Bioinformatics and Medical Informatics (IBMI) University of Tübingen Germany

It is challenging to implement kernel methods if the data sources are distributed and cannot be joined at a trusted third party for privacy reasons. It is even more challenging if the use case rules out privacy-preserving approaches that introduce noise or entail significant computational overhead. An example of such a use case is machine learning on clinical data. To realize exact and efficient privacy-preserving computation of kernel methods, we propose FLAKE, a Framework for Learning with Anonymized KErnels on horizontally distributed data. With our method, the data sources mask their data so that a Gram matrix can be computed without compromising privacy or utility. The Gram matrix allows the calculation of many kernel matrices, which can be used to train kernel-based machine learning algorithms such as Support Vector Machines. We prove that our framework prevents an adversary from learning the input data or the number of input features under a semi-honest threat model. The conducted experiments on clinical, genomic, and image data confirm that our approach is applicable across a wide range of settings. Additionally, our method outperforms comparable approaches in both computational efficiency and accuracy. Thus, FLAKE is a lightweight, applicable approach suitable for various use cases.
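The step from a Gram matrix to kernel matrices can be illustrated with a small sketch. It assumes only that the pairwise inner products G[i][j] = <x_i, x_j> are available. The function names, the kernel parameters (`degree`, `coef0`, `gamma`), and the toy samples are illustrative assumptions, and the masking protocol of FLAKE itself is not reproduced here.

```python
import math

def gram_matrix(rows):
    """Plain inner-product Gram matrix G[i][j] = <x_i, x_j> (toy, unmasked data)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]

def polynomial_kernel(gram, degree=2, coef0=1.0):
    """K[i][j] = (G[i][j] + coef0) ** degree, computed from the Gram matrix only."""
    return [[(g + coef0) ** degree for g in row] for row in gram]

def rbf_kernel(gram, gamma=0.5):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2), using the identity
    ||x_i - x_j||^2 = G[i][i] + G[j][j] - 2 * G[i][j]."""
    n = len(gram)
    return [
        [math.exp(-gamma * (gram[i][i] + gram[j][j] - 2.0 * gram[i][j]))
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy samples; in the distributed setting they would live at different sources.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = gram_matrix(samples)
K_poly = polynomial_kernel(G)
K_rbf = rbf_kernel(G)
```

Neither kernel function touches the raw samples after `G` is built, which is why access to the Gram matrix alone is enough to train kernel-based models such as SVMs with these kernels.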


BIM: Improving Graph Neural Networks with Balanced Influence Maximization


International Conference on Data Engineering

Authors: Wentao Zhang, Xinyi Gao, Ling Yang, Meng Cao, Ping Huang, Jiulong Shan, Hongzhi Yin, Bin Cui. Affiliations: Center for Machine Learning Research Peking University Institute of Advanced Algorithms Research Shanghai National Engineering Laboratory for Big Data Analytics and Applications The University of Queensland Australia Key Lab of High Confidence Software Technologies Peking University Apple Inc. Institute of Computational Social Science Peking University Qingdao

ISBN: (digital) 9798350317152

ISBN: (print) 9798350317169

The imbalanced data classification problem has attracted considerable attention from both academia and industry, since data imbalance is a widespread phenomenon in many real-world scenarios. Although this problem has been well researched from the view of imbalanced class samples, we further argue that graph neural networks (GNNs) expose a unique source of imbalance from the influenced nodes of different classes of labeled nodes, i.e., labeled nodes are imbalanced in terms of the number of nodes they influence during the influence propagation in GNNs. To tackle this previously unexplored influence-imbalance issue, we connect social influence maximization with the imbalanced node classification problem and propose balanced influence maximization (BIM). Specifically, BIM greedily assigns the pseudo label to the node that maximizes the number of influenced nodes in GNN training while making the influence of each class more balanced. Experimental results on five public datasets demonstrate the effectiveness of our method in relieving the influence-imbalance issue. For example, when training a GCN with an imbalance ratio of 0.1, BIM significantly outperforms the most competitive baseline by 0.6%-9.8% on five public datasets in terms of the F1 score.

Keywords: Training; Industries; Data engineering; Graph neural networks


Thai Conversational Chatbot Classification Using BiLSTM and Data Augmentation


1st International Conference on Data Science and Artificial Intelligence, DSAI 2023

Authors: Lhasiw, Nunthawat; Tanantong, Tanatorn; Sanglerdsinlapachai, Nuttapong. Affiliations: Thammasat Research Unit in Data Innovation and Artificial Intelligence Department of Computer Science Faculty of Science and Technology Thammasat University Pathum Thani Thailand Strategic Analytics Networks with Machine Learning and AI Research Team National Electronics and Computer Technology Center Pathum Thani Thailand

ISBN: (digital) 9789819979691

ISBN: (print) 9789819979684

Chatbot platforms, e.g., Facebook and Line, have revolutionized human interaction in the digital age. Developing automatic chatbot classification poses several challenges, especially for Thai chat messages. Conversational messages are usually short and ambiguous. Therefore, it is difficult to find a dataset for constructing an effective classification model. To address the limited size of the dataset, data augmentation techniques can be applied. Data augmentation involves generating synthetic messages by applying various transformations to existing data samples while preserving their original meaning. In this study, the size and diversity of the dataset are increased by two methods, i.e., text augmentation using word2vec from Thai2Fit and English-Thai machine translation models proposed by VISTEC. Based on the augmented messages, a deep learning technique, BiLSTM, is used to construct a chatbot classification model. The experimental results demonstrate that data augmentation can help to increase the classification performance. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2023.

Keywords: Classification (of information)
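The embedding-based augmentation mentioned in the abstract can be sketched as replacing words with near neighbors while keeping the class label. This is a minimal sketch under stated assumptions: the `NEIGHBORS` table is a hypothetical stand-in for "most similar word" lookups from a word2vec model, English tokens are used for readability, and neither Thai2Fit nor the study's actual pipeline is reproduced.

```python
import random

# Hypothetical stand-in for nearest-neighbor lookups in a word2vec embedding space.
NEIGHBORS = {
    "price": ["cost", "fee"],
    "order": ["purchase"],
    "cancel": ["stop", "revoke"],
}

def augment(tokens, replace_prob=0.5, rng=None):
    """Return a copy of tokens in which some words are swapped for an embedding neighbor."""
    if rng is None:
        rng = random.Random(0)
    out = []
    for tok in tokens:
        candidates = NEIGHBORS.get(tok)
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)  # words without known neighbors are kept unchanged
    return out

def augment_dataset(examples, copies=2, seed=0):
    """Keep every (tokens, label) pair and add `copies` augmented variants with the same label."""
    rng = random.Random(seed)
    augmented = list(examples)
    for tokens, label in examples:
        for _ in range(copies):
            augmented.append((augment(tokens, rng=rng), label))
    return augmented

# Toy labeled messages (hypothetical intents).
data = [
    (["cancel", "my", "order"], "cancel_order"),
    (["what", "is", "the", "price"], "ask_price"),
]
bigger = augment_dataset(data)
```

The augmented set would then be fed to the classifier in place of the original one. The replacement probability and the number of copies are arbitrary choices here.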


Language-Agnostic Bias Detection in Language Models with Bias Probing


arXiv, 2023

Authors: Koksal, Abdullatif; Yalcin, Omer Faruk; Akbiyik, Ahmet; Kilavuz, M. Tahir; Korhonen, Anna; Schutze, Hinrich. Affiliations: Center for Information and Language Processing LMU Munich Germany Munich Center for Machine Learning Germany Data Analytics and Computational Social Science University of Massachusetts Amherst United States Harvard Kennedy School United States Middle East Initiative Harvard Kennedy School United States Marmara University Turkey Language Technology Lab University of Cambridge United Kingdom

Pretrained language models (PLMs) are key components in NLP, but they contain strong social biases. Quantifying these biases is challenging because current methods focusing on fill-the-mask objectives are sensitive to slight changes in input. To address this, we propose a bias probing technique called LABDet, for evaluating social bias in PLMs with a robust and language-agnostic method. For nationality as a case study, we show that LABDet "surfaces" nationality bias by training a classifier on top of a frozen PLM on non-nationality sentiment detection. We find consistent patterns of nationality bias across monolingual PLMs in six languages that align with historical and political context. We also show for English BERT that bias surfaced by LABDet correlates well with bias in the pretraining data; thus, our work is one of the few studies that directly links pretraining data to PLM behavior. Finally, we verify LABDet's reliability and applicability to different templates and languages through an extensive set of robustness checks. We publicly share our code and dataset at https://***/akoksal/LABDet. Copyright © 2023, The Authors. All rights reserved.

Keywords: Machine learning


Achieving Linear Speedup with Network-Independent Learning Rates in Decentralized Stochastic Optimization


IEEE Conference on Decision and Control

Authors: Hao Yuan, Sulaiman A. Alghunaim, Kun Yuan. Affiliations: Center for Machine Learning Research Peking University Beijing P. R. China Dept. Electrical Engr. Kuwait University Safat Kuwait AI for Science Institute Beijing P. R. China National Engineering Laboratory for Big Data Analytics and Applications Beijing P. R. China

Decentralized stochastic optimization has become a crucial tool for addressing large-scale machine learning and control problems. In decentralized algorithms, all computing nodes are connected through a network topology, and each node communicates only with its direct neighbors. Decentralized algorithms can significantly reduce communication overhead by eliminating the need for global communication. However, existing research on the linear speedup analysis of decentralized stochastic algorithms is limited to the condition of network-dependent learning rates, which rarely holds in practice since the network connectivity is typically unknown to each node. As a result, it remains an open question whether a linear speedup bound can be achieved using network-independent learning rates. This paper provides an affirmative answer. By utilizing a new analysis framework, we prove that D-SGD and Exact-Diffusion, two representative decentralized stochastic algorithms, can achieve linear speedup with network-independent learning rates. Simulations are provided to validate our theories.
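The neighbor-only communication pattern described above can be sketched with one common form of the D-SGD update, x_i <- sum_j W[i][j] * x_j - lr * g_i, on a ring network. Everything concrete here is an assumption made for illustration: the ring topology, the 1/3 mixing weights, the scalar quadratic local losses, and the fixed learning rate. The sketch does not reproduce the paper's analysis or its linear speedup result.

```python
import random

def ring_mixing_matrix(n):
    """Doubly stochastic weights for a ring: each node averages itself and its two neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W

def dsgd(targets, lr=0.1, steps=200, noise_std=0.0, seed=0):
    """Run D-SGD on local losses f_i(x) = 0.5 * (x - targets[i]) ** 2.

    Each node mixes the iterates of its ring neighbors, then takes a step along its own
    (optionally noisy) local gradient. No node ever sees a global average.
    """
    rng = random.Random(seed)
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n
    for _ in range(steps):
        grads = [x[i] - targets[i] + rng.gauss(0.0, noise_std) for i in range(n)]
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [mixed[i] - lr * grads[i] for i in range(n)]
    return x

# Six nodes with different local minimizers; the global minimizer is their mean, 3.5.
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
final = dsgd(targets)
network_average = sum(final) / len(final)
```

In this noise-free toy run the network average approaches the global minimizer, while individual nodes may retain a small offset because a constant learning rate is used with heterogeneous local losses.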


System Architecture for Reading and Interpreting Physical Printouts of Medical Forms


Annual Siberian Russian Workshop on Electron Devices and Materials (EDM)

Authors: Ekaterina Snegireva, Grigory R. Khazankin, Igor Mikheenko. Affiliations: Stream Data Analytics and Machine Learning Laboratory Novosibirsk State University Novosibirsk Russia Novosibirsk State University Novosibirsk Russia Meshalkin National Medical Research Center Novosibirsk Russia

This article describes the architecture developed for a system module that processes and interprets analog medical data. Patients often undergo examinations in various medical institutions, and since their results are often handed out to the patient in printed form, the receiving institution transfers them to its database manually. There is also a tendency to abandon analog media entirely and use only digital ones. In that case, however, another problem appears: the accumulated analog records are either lost or must be converted into digital format. These days, automatic document management systems for medical institutions, known as health information systems (HIS), are actively developing. The software module developed in accordance with the architecture described in the article can be used by developers of various HIS to automate work with analog data. If necessary, it can also be freely extended by adding new modules for working with various analog data. In this article, we take ECG scans and medical test results as examples of such data. As a result of this work, a prototype of the designed system was developed and tested.

Keywords: Waste materials; Databases; Prototypes; Systems architecture; Computer architecture; Electrocardiography; Medical tests


Deep Learning Based Prediction of Sun-Induced Fluorescence from Hyplant Imagery


IEEE International Symposium on Geoscience and Remote Sensing (IGARSS)

Authors: Jim Buffat, Miguel Pato, Kevin Alonso, Stefan Auer, Emiliano Carmona, Stefan Maier, Rupert Müller, Patrick Rademske, Uwe Rascher, Hanno Scharr. Affiliations: Forschungszentrum Jülich GmbH Institute of Bio- and Geosciences IBG-2: Plant Sciences Jülich Germany German Aerospace Center (DLR) Earth Observation Center Remote Sensing Technology Institute Oberpfaffenhofen Germany RHEA Group c/o European Space Agency (ESA) Frascati Italy Forschungszentrum Jülich GmbH Institute of Advanced Simulations IAS-8: Data Analytics and Machine Learning Jülich Germany

The retrieval of sun-induced fluorescence (SIF) from hyper-spectral imagery is an ill-posed problem that has been tackled in different ways. We present a novel retrieval method combining semi-supervised deep learning with an existing spectral fitting method. A validation study with in-situ SIF measurements shows high sensitivity of the deep learning method to SIF changes even though systematic shifts deteriorate its absolute prediction accuracy. A detailed analysis of diurnal SIF dynamics and SIF prediction in topographically variable terrain highlights the benefits of this deep learning approach.
