Uplift modeling, also known as individual treatment effect (ITE) estimation, is an important approach for data-driven decision making that aims to identify the causal impact of an intervention on individuals. This pap...
Nowadays, swarm intelligence algorithms are used to solve various problems in IoT environments because of their excellent performance, and particle swarm optimization (PSO) is a superior algorithm in swarm intelligence...
The growth of social media, characterized by its multimodal nature, has led to the emergence of diverse phenomena and challenges, which calls for an effective approach to uniformly solve automated tasks. The powerful ...
ISBN: (print) 9798400704734
Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters---such as Huawei's PanGu-Σ. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
Credit-lending organizations have resorted to the use of machine learning (ML) algorithms in the recent past to predict the probability of default of a business. Explainability of the decisions made by traditional statistical algorithms like Logit models brings transparency to every stakeholder involved in the process. On the other hand, machine learning models like XGBoost and Neural Nets have achieved better accuracy scores, but their decisions are not easily comprehensible. In this paper, we propose a graph-based variable clustering (GVC) method as a filter-based approach to select prominent features while retaining as much variance as possible. Our experiments show that our GVC approach is not only almost 40 times faster than existing variable clustering methods but also retains 5% more variance than existing methods. The feature set from the GVC approach has performed better, with an average increase of 6% in accuracy. The predictions on the feature set from GVC were 98% accurate using the XGBoost algorithm.
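The abstract above does not spell out the GVC algorithm, but a filter-based variable clustering step of this kind can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes clusters are formed from a correlation graph (features linked when absolute correlation exceeds a threshold, connected components as clusters) and that the highest-variance feature represents each cluster. The function name and threshold are illustrative.

```python
# Hypothetical sketch of graph-based variable clustering for feature
# selection; the paper's exact construction is not reproduced here.
import numpy as np

def gvc_select(X, threshold=0.7):
    """Select one representative feature per correlation cluster.

    X: (n_samples, n_features) array. Features with |corr| > threshold
    are linked; each connected component forms a cluster, represented
    by its highest-variance member.
    """
    n = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    visited, clusters = set(), []
    for start in range(n):
        if start in visited:
            continue
        stack, comp = [start], []
        while stack:                      # DFS over the correlation graph
            v = stack.pop()
            if v in visited:
                continue
            visited.add(v)
            comp.append(v)
            stack.extend(u for u in range(n)
                         if u != v and corr[v, u] > threshold)
        clusters.append(comp)
    variances = X.var(axis=0)
    return [max(c, key=lambda j: variances[j]) for c in clusters]

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
# Append a near-duplicate of feature 0; it should merge into one cluster.
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=200)])
selected = gvc_select(X)
print(len(selected))  # 3 clusters -> 3 representative features
```

As a filter method, this runs once before model fitting, which is consistent with the speed advantage the abstract claims over wrapper-style selection.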
This article presents a free and open source toolkit that supports the semi-automated checking of research outputs (SACRO) for privacy disclosure within secure data environments. SACRO is a framework that applies best...
The past decade has seen significant growth in the automobile industry, which has come with some serious challenges and threats. Modern vehicles are now made up of complex mechanical systems, as well as sophisticated electronic devices and connections to the outside world. Various electronic devices utilize standard communication protocols, including the Controller Area Network (CAN), to establish communication with each other. Unfortunately, CAN lacks some fundamental security features, such as encryption and authentication, which makes it vulnerable to security attacks. This can lead to accidents and financial losses for the users of these vehicles. To address this issue, researchers have proposed a number of security measures, such as cryptography and Intrusion Detection Systems (IDS). This paper addresses the security vulnerabilities associated with CAN and proposes potential solutions to overcome its limitations.
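One common IDS design for CAN, hinted at by the abstract's mention of intrusion detection, exploits the fact that most CAN frames are periodic. The sketch below is a generic frequency-based detector, not the paper's proposal; the class name, tolerance factor, and traffic values are all illustrative assumptions.

```python
# Hedged sketch: a minimal frequency-based IDS for CAN traffic. A frame
# arriving much sooner than its learned inter-arrival period for that
# CAN ID is a common signature of message-injection attacks.
from collections import defaultdict

class FrequencyIDS:
    def __init__(self, tolerance=0.5):
        self.period = {}      # learned mean inter-arrival time per CAN ID
        self.last_seen = {}
        self.tolerance = tolerance

    def train(self, frames):
        """frames: iterable of (timestamp, can_id) from benign traffic."""
        gaps, last = defaultdict(list), {}
        for t, cid in frames:
            if cid in last:
                gaps[cid].append(t - last[cid])
            last[cid] = t
        self.period = {cid: sum(g) / len(g) for cid, g in gaps.items()}

    def check(self, t, cid):
        """Return True if this frame looks injected (arrived too early)."""
        suspicious = False
        if cid in self.period and cid in self.last_seen:
            gap = t - self.last_seen[cid]
            suspicious = gap < self.tolerance * self.period[cid]
        self.last_seen[cid] = t
        return suspicious

ids = FrequencyIDS()
# Benign traffic: ID 0x100 arrives every 10 ms.
ids.train([(i * 0.010, 0x100) for i in range(100)])
print(ids.check(1.000, 0x100))  # first observation -> False
print(ids.check(1.001, 0x100))  # 1 ms gap, far below 10 ms period -> True
```

A detector like this needs no changes to the CAN protocol itself, which is why IDS approaches are attractive given CAN's lack of built-in authentication.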
Edge storage presents a viable data storage alternative for application vendors (AV), offering benefits such as reduced bandwidth overhead and latency compared to cloud storage. However, data cached in edge computing ...
The spread of Coronavirus Disease 2019 (COVID-19) in Indonesia is still relatively high and has not shown a significant decrease. One of the main reasons is the lack of supervision on the implementation of healt...
ISBN: (digital) 9798350359312
ISBN: (print) 9798350359329
Machine learning (ML) research relies heavily on benchmarks to determine the relative effectiveness of newly proposed models. Recently, a number of prominent research efforts have argued that models which improve the state of the art by a small margin tend to do so by winning what they call a "benchmark lottery". An important benchmark in machine learning and computer vision is ImageNet, where newly proposed models are often showcased based on their performance on this dataset. Given the large number of self-supervised learning (SSL) frameworks that have been proposed in the past couple of years, each coming with marginal improvements on the ImageNet dataset, in this work we evaluate whether those marginal improvements on ImageNet translate to improvements on similar datasets. To do so, we investigate twelve popular SSL frameworks on five ImageNet variants and discover that models that seem to perform well on ImageNet may experience significant performance declines on similar datasets. Specifically, state-of-the-art frameworks such as DINO and SwAV, which are praised for their performance, exhibit substantial drops in performance, while MoCo and Barlow Twins display comparatively good results. As a result, we argue that otherwise good and desirable properties of models remain hidden when benchmarking is performed only on the ImageNet validation set, leading us to call for more adequate benchmarking. To avoid the "benchmark lottery" on ImageNet and to ensure a fair benchmarking process, we investigate the use of a unified metric that takes into account the performance of models on other ImageNet variant datasets.
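The abstract does not define its unified metric, but one simple family of such metrics aggregates per-variant accuracies while penalizing inconsistency, so a model that shines only on the ImageNet validation set does not win. The sketch below is one illustrative choice (mean minus population standard deviation); the function name and all accuracy numbers are made up for demonstration and are not results from the paper.

```python
# Hedged sketch of a "unified" cross-variant benchmark score: reward
# models that are consistently good across ImageNet variants rather
# than strong on a single validation set.
from statistics import mean, pstdev

def unified_score(accuracies):
    """Mean accuracy across variants, penalized by its spread."""
    return mean(accuracies) - pstdev(accuracies)

# Illustrative numbers only (not measurements from the paper):
peaky      = [0.79, 0.55, 0.48, 0.51, 0.44]  # strong on ImageNet, drops elsewhere
consistent = [0.74, 0.62, 0.58, 0.60, 0.57]  # lower peak, steadier overall
print(unified_score(peaky) < unified_score(consistent))  # True
```

Under such a metric, a framework with a slightly lower ImageNet score but smaller drops on the variants can rank ahead of one that wins the "benchmark lottery" on the validation set alone.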