Engineering outreach and introductory courses are essential for motivating and training the next generation of capable engineers. Accessibility and portability of the infrastructure for a STEM course is critical for s...
Forests play a vital role in environmental change, and the protection of forests is of the utmost importance. This study attempts to suggest a solution for a smart forest (Internet of Forest Things). Nowadays, the ...
Learners with a limited budget can use supervised data subset selection and active learning techniques to select a smaller training set and reduce the cost of acquiring data and training machine learning (ML) models. However, the resulting high model performance, measured by a data utility function, may not be preserved when some data owners, enabled by the GDPR's right to erasure, request their data to be deleted from the ML model. This raises an important question for learners who are temporarily unable or unwilling to acquire data again: During the initial data acquisition of a training set of size k, can we proactively maximize the data utility after future unknown deletions? We propose that the learner anticipates/estimates the probability that (i) each data owner in the feasible set will independently delete its data or (ii) a number of deletions occur out of k, and justify our proposal with concrete real-world use cases. Then, instead of directly maximizing the data utility function, the learner can maximize the expected or risk-averse post-deletion utility based on the anticipated probabilities. We further propose how to construct these deletion-anticipative data selection (DADS) maximization objectives to preserve monotone submodularity and near-optimality of greedy solutions, how to optimize the objectives and empirically evaluate DADS' performance on real-world datasets. Copyright 2024 by the author(s)
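The greedy construction described in this abstract can be illustrated on a toy coverage utility, a standard monotone submodular function. Everything below — the candidate points, the labels they cover, and the deletion probabilities — is a hypothetical example, not the paper's setup; for a coverage utility, the expected post-deletion utility under independent deletions happens to have a closed form, which keeps the sketch short.

```python
# Hypothetical toy instance: each candidate data owner "covers" a set of labels,
# and the (monotone submodular) data utility is the number of labels covered.
coverage = {
    "a": {1, 2},
    "b": {2, 3},
    "c": {3, 4},
    "d": {1, 4, 5},
}
# Anticipated independent deletion probability for each data owner.
p_delete = {"a": 0.5, "b": 0.1, "c": 0.1, "d": 0.9}

def expected_post_deletion_utility(selected):
    """E[f(S \\ D)] under independent deletions, in closed form for a
    coverage utility: a label stays covered unless every selected point
    covering it is deleted."""
    labels = set().union(*(coverage[s] for s in selected)) if selected else set()
    total = 0.0
    for lab in labels:
        p_all_deleted = 1.0
        for s in selected:
            if lab in coverage[s]:
                p_all_deleted *= p_delete[s]
        total += 1.0 - p_all_deleted
    return total

def greedy_dads(k):
    """Greedy maximization of the expected post-deletion utility."""
    selected = []
    for _ in range(k):
        best = max(
            (x for x in coverage if x not in selected),
            key=lambda x: expected_post_deletion_utility(selected + [x]),
        )
        selected.append(best)
    return selected

print(greedy_dads(2))  # selects ['b', 'c'] for this toy instance
```

Note how the unreliable owner "d", despite covering the most labels, is passed over in favour of owners whose data is likely to survive deletion requests — the behaviour the DADS objective is designed to produce.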
Authors:
Mohan, Ramya; Rajinikanth, Venkatesan
Department of Computer Science and Engineering, Chennai 602105, TN, India
Division of Research and Innovation, Department of Computer Science and Engineering, Chennai 602105, TN, India
Oral cancer (OC) is a severe disease that demands timely detection and intervention. Histopathological image-assisted OC detection involves the assessment of tissue specimens under a microscope to detect abnormal cellu...
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging the estimator can cause heavy bias in the second stage, especially when regularisation bias is present in the first stage estimator. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator has strong convergence rate and O(N^(-1/2)) suboptimality guarantees that match those when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments. Copyright 2024 by the author(s)
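The two-stage structure with cross-fitting can be sketched on a linear toy problem. This is not the DML-IV algorithm itself (which uses DNN estimators and the full DML framework); it is a minimal sketch under made-up coefficients, showing how a cross-fitted first stage removes the confounding bias that naive regression suffers from.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Toy confounded data: hidden confounder u affects both action a and outcome y;
# z is a valid instrument (affects a, independent of u). True causal effect: 2.0.
u = rng.normal(size=n)                  # hidden confounder
z = rng.normal(size=n)                  # instrument
a = 1.5 * z + u + 0.1 * rng.normal(size=n)
y = 2.0 * a - 3.0 * u + 0.1 * rng.normal(size=n)

def two_stage_iv_crossfit(z, a, y, n_folds=2):
    """Linear two-stage IV with DML-style cross-fitting: the first-stage
    model predicting a from z is always fit on held-out folds, so its
    errors are not reused when estimating the effect in the second stage."""
    folds = np.array_split(rng.permutation(len(z)), n_folds)
    a_hat = np.empty_like(a)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Stage 1 on the training folds: least-squares fit of a on z.
        beta = np.polyfit(z[train_idx], a[train_idx], 1)
        a_hat[test_idx] = np.polyval(beta, z[test_idx])
    # Stage 2: regress y on the cross-fitted prediction of a.
    return np.polyfit(a_hat, y, 1)[0]

naive = np.polyfit(a, y, 1)[0]          # biased by the confounder u
causal = two_stage_iv_crossfit(z, a, y)
print(f"naive OLS slope = {naive:.2f}, IV estimate = {causal:.2f}")
```

The naive slope lands well below the true effect of 2.0 because u pushes a up while pushing y down, whereas the instrumented estimate recovers the causal coefficient.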
Authors:
Huang, Po-Hsun; Hsiao, Tzu-Chien
NYCU, Hsinchu 300, Taiwan
Department of Computer Science, College of CS, and Institute of Biomedical Engineering, College of Electrical and Computer Engineering, Hsinchu 300, Taiwan
The determination of appropriate parameters and an appropriate window size in most entropy-based measurements of time-series complexity is a challenging problem. Inappropriate settings can lead to the loss of intrinsi...
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data. Both Ambient Diffusion and alternative SURE-based approaches for learning diffusion models from corrupted data resort to approximations which deteriorate performance. We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in Ambient diffusion. Our key technical contribution is a method that uses a double application of Tweedie's formula and a consistency loss function that allows us to extend sampling at noise levels below the observed data noise. We also provide further evidence that diffusion models memorize from their training sets by identifying extremely corrupted images that are almost perfectly reconstructed, raising copyright and privacy concerns. Our method for training using corrupted samples can be used to mitigate this problem. We demonstrate this by fine-tuning Stable Diffusion XL to generate samples from a distribution using only noisy samples. Our framework reduces the amount of memorization of the fine-tuning dataset, while maintaining competitive performance. Copyright 2024 by the author(s)
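Tweedie's formula, the key ingredient named in this abstract, can be checked in closed form on a 1-D Gaussian toy. The numbers below are arbitrary illustrations, and the sketch does not reproduce the paper's double application of the formula or its consistency loss — it only verifies the single-application identity the method builds on.

```python
# 1-D Gaussian toy where Tweedie's formula can be verified exactly.
# Clean data: x ~ N(mu, s2).  Observation: y = x + noise, noise ~ N(0, sigma2).
mu, s2, sigma2 = 1.0, 4.0, 0.25

def score_y(y):
    """Score (d/dy) log p(y) of the noisy marginal p(y) = N(mu, s2 + sigma2)."""
    return -(y - mu) / (s2 + sigma2)

def tweedie_denoise(y):
    """Tweedie's formula: E[x | y] = y + sigma2 * score_y(y)."""
    return y + sigma2 * score_y(y)

# The formula recovers the exact Gaussian posterior mean E[x | y].
y = 3.0
posterior_mean = (s2 * y + sigma2 * mu) / (s2 + sigma2)
assert abs(tweedie_denoise(y) - posterior_mean) < 1e-12
```

In the Gaussian case the posterior mean is known analytically, which is what makes this identity easy to sanity-check; diffusion models use the same formula with a learnt score network in place of `score_y`.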
Modern technological advancements have made social media an essential component of daily life. Social media allow individuals to share thoughts, emotions, and experiences. Sentiment analysis serves to evaluate whether the sentiment of a text is positive, negative, neutral, or any other personal emotion, in order to understand the sentiment context of the text. Sentiment analysis is essential in business and society because it impacts strategic decision-making. Sentiment analysis involves challenges due to lexical variation, unlabeled datasets, and long-distance relationships in text. Execution time increases due to the sequential processing of sequence models; in contrast, the calculation times for Transformer models are reduced because of their parallel processing. This study uses a hybrid deep learning strategy to combine the strengths of Transformer and sequence models while avoiding their weaknesses. In particular, the proposed model integrates Decoding-enhanced BERT with disentangled attention (DeBERTa), where BERT stands for Bidirectional Encoder Representations from Transformers, and the Gated Recurrent Unit (GRU) for sentiment analysis. With the Decoding-enhanced BERT technique, words are mapped into a compact, semantic word embedding space, and the Gated Recurrent Unit model can capture long-distance contextual semantics. The proposed hybrid model achieves F1-scores of 97% on the Twitter Large Language Model (LLM) dataset, which is much higher than the performance of recent techniques.
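The sequence half of such a hybrid can be sketched as a single GRU cell run over token embeddings. The embeddings below are random stand-ins for DeBERTa outputs, and all dimensions and weights are hypothetical; the sketch only shows the gating arithmetic that lets a GRU carry contextual information across a sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_hid = 8, 4   # hypothetical sizes; real DeBERTa embeddings are larger

# Stand-in for contextual token embeddings from a DeBERTa-style encoder.
tokens = rng.normal(size=(5, d_emb))   # 5 tokens, each a d_emb-dim embedding

# Randomly initialised GRU parameters (update gate z, reset gate r, candidate h~).
Wz, Uz = rng.normal(size=(d_hid, d_emb)), rng.normal(size=(d_hid, d_hid))
Wr, Ur = rng.normal(size=(d_hid, d_emb)), rng.normal(size=(d_hid, d_hid))
Wh, Uh = rng.normal(size=(d_hid, d_emb)), rng.normal(size=(d_hid, d_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward(xs):
    """Run a single-layer GRU over the embedding sequence; the final hidden
    state summarises the sequence for a downstream sentiment classifier."""
    h = np.zeros(d_hid)
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h)            # update gate
        r = sigmoid(Wr @ x + Ur @ h)            # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
        h = (1 - z) * h + z * h_tilde           # gated interpolation
    return h

h_final = gru_forward(tokens)
print(h_final.shape)   # a d_hid-dim summary, fed to a softmax sentiment head
```

Each step interpolates between the previous hidden state and a fresh candidate, so the cell can either preserve or overwrite context token by token — the "distance contextual semantics" role the abstract assigns to the GRU.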
Given the severity of waste pollution as a major environmental concern, intelligent and sustainable waste management is becoming increasingly crucial in both developed and developing countries. The material compositio...
With the explosive growth of mobile data traffic, roadside-unit (RSU) caching is considered an effective way to offload download traffic in vehicular ad hoc networks (VANETs). Many existing works investigate the conte...