ISBN: (digital) 9798400712487
ISBN: (print) 9798400712487
In the rapidly evolving field of machine learning, training models on datasets from various locations and organizations presents significant challenges due to privacy and legal concerns. The exploration of effective collaborative training settings, which can leverage valuable knowledge from distributed and isolated datasets, is increasingly [...]. This study investigates key factors that affect the effectiveness of collaborative training methods for code next-token prediction, as well as the correctness and utility of the generated code, showing the promise of such methods. Additionally, we evaluate the memorization of different participants' training data across various collaborative training settings, including centralized, federated, and incremental training, showing their potential risks of leaking data. Our findings indicate that the size and diversity of code datasets are pivotal factors in the success of collaboratively trained code models. We demonstrate that federated learning achieves performance competitive with centralized training while offering better data protection, as evidenced by lower memorization ratios in the generated code. However, federated learning can still produce verbatim code snippets from hidden training data, potentially violating data privacy or copyright. Our study further explores the patterns of effectiveness and memorization in incremental learning, emphasizing the importance of the order in which individual participant datasets are introduced. We also identify the memorization of cross-organizational clones as a prevalent challenge in both centralized and federated learning scenarios. Our findings highlight the persistent risk of data leakage during inference, even when training data remains unseen. We conclude with strategic recommendations for practitioners and researchers to optimize the use of multisource datasets, thereby advancing cross-organizational collaboration.
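The abstract reports lower "memorization ratios" for federated learning but does not define the metric. A minimal sketch of one plausible measure, assuming that a window of k consecutive non-blank, whitespace-normalized lines of generated code counts as memorized when the same window occurs in a participant's training files; k=3, the window definition and the function names are hypothetical choices, not the study's definition:

```python
from typing import Dict, List, Set, Tuple

Window = Tuple[str, ...]


def line_windows(code: str, k: int) -> List[Window]:
    """Split code into windows of k consecutive non-blank lines.

    Whitespace inside each line is collapsed so that re-indented copies
    still count as matches (an assumption of this sketch).
    """
    lines = [" ".join(line.split()) for line in code.splitlines() if line.strip()]
    return [tuple(lines[i:i + k]) for i in range(len(lines) - k + 1)]


def memorization_by_participant(
    generated: str, participants: Dict[str, List[str]], k: int = 3
) -> Dict[str, float]:
    """Fraction of generated k-line windows found in each participant's data."""
    gen_windows = line_windows(generated, k)
    ratios: Dict[str, float] = {}
    for name, files in participants.items():
        # Index every k-line window of this participant's training files.
        index: Set[Window] = set()
        for source in files:
            index.update(line_windows(source, k))
        hits = sum(1 for window in gen_windows if window in index)
        ratios[name] = hits / len(gen_windows) if gen_windows else 0.0
    return ratios


if __name__ == "__main__":
    participants = {
        "org_a": ["def add(a, b):\n    c = a + b\n    return c\n\nprint(add(1, 2))\n"],
        "org_b": ["x = 0\ny = 1\nz = 2\n"],
    }
    generated = "def add(a, b):\n  c = a + b\n  return c\nx = 0\n"
    print(memorization_by_participant(generated, participants))
    # {'org_a': 0.5, 'org_b': 0.0}
```

Reporting the ratio per participant mirrors the abstract's framing of memorization of *different participants'* data; a stricter variant would skip the whitespace normalization and compare raw lines.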
As the homogenization of Web services becomes increasingly common, service recommendation is becoming steadily more difficult. How to predict Quality of Service (QoS) more efficiently and accurately becomes an ...
CodeSearchNet is a widely used dataset of comment-code pairs for training code search models. However, code search models trained on the datasets of comment-code pairs usually have lower performance in real-world appl...
Code search aims to retrieve relevant code snippets from large code repositories based on a query, promoting code reuse and enhancing software development efficiency. Deep learning is a powerful approach for code search...
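Many deep-learning code search models embed the query and each code snippet into a shared vector space and rank snippets by cosine similarity. A minimal sketch of that retrieval step, assuming a bag-of-tokens vector as a stand-in for a trained neural encoder; the tokenizer, the function names and the example snippets are hypothetical:

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-cased tokens, splitting snake_case and camelCase identifiers."""
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*|[0-9]+", text)]


def embed(text: str) -> Counter:
    """Bag-of-tokens vector; a stand-in for a learned neural encoder."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, snippets: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank snippets by similarity to the query; ties keep corpus order."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    corpus = [
        "def read_file(path): return open(path).read()",
        "def sort_list(xs): return sorted(xs)",
        "def http_get(url): return requests.get(url)",
    ]
    for score, snippet in search("read a file from path", corpus, top_k=2):
        print(f"{score:.2f}  {snippet}")
```

Swapping `embed` for a trained query/code encoder leaves the ranking logic unchanged, which is why the sketch keeps the two concerns in separate functions.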
ISBN: (print) 9789897586477
Security Operation Center (SOC) teams manually analyze the API documentation of numerous tools to find appropriate APIs for defining, updating and executing incident response plans for security incidents. Manually identifying security tools' APIs is time-consuming and can slow down security incident response. To mitigate the negative effects of this manual process, automated API recommendation support is desired. The state-of-the-art automated security tool API recommendation uses a Deep Learning (DL) model. However, DL models are environmentally unfriendly and prohibitively expensive, requiring substantial time and resources (denoted as "Red AI"). Hence, "Green AI", which considers both efficiency and effectiveness, is encouraged. Given that SOCs' incident response is hindered by cost, time and resource constraints, we assert that Machine Learning (ML) models are likely to be more suitable for recommending APIs with fewer resources. Hence, we investigate the applicability of ML models for effective and efficient security tool API recommendation. We used the API documentation of 7 real-world security tools, 5 ML models, 5 feature representations and 19 augmentation techniques. Compared to the state-of-the-art DL-based approach, our Logistic Regression model with word- and character-level features reduces CPU core hours by 95.91%, model size by 97.65% and time by 291.50%, and achieves 0.38% better accuracy, which provides cost-cutting opportunities for industrial SOC adoption.
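The abstract names Logistic Regression over word- and character-level features but not the exact pipeline. A minimal, dependency-free sketch under stated assumptions: word unigrams plus character trigrams, L2-normalized, feeding a multinomial (softmax) logistic regression trained by per-example SGD. The hyperparameters, task descriptions and API labels such as `firewall.block_ip` are hypothetical, not taken from the study:

```python
import math
import re
from collections import defaultdict
from typing import Dict, List

Features = Dict[str, float]


def extract_features(text: str, char_n: int = 3) -> Features:
    """Word unigrams plus character n-grams, L2-normalized."""
    text = text.lower()
    feats: Dict[str, float] = defaultdict(float)
    for word in re.findall(r"[a-z0-9_]+", text):
        feats["w:" + word] += 1.0
    padded = f" {text} "
    for i in range(len(padded) - char_n + 1):
        feats["c:" + padded[i:i + char_n]] += 1.0
    norm = math.sqrt(sum(v * v for v in feats.values()))
    return {f: v / norm for f, v in feats.items()} if norm else {}


class SoftmaxLogisticRegression:
    """Multinomial logistic regression trained with per-example SGD."""

    def __init__(self, lr: float = 0.5, epochs: int = 100) -> None:
        self.lr = lr
        self.epochs = epochs
        self.weights: Dict[str, Dict[str, float]] = {}

    def _proba(self, x: Features) -> Dict[str, float]:
        scores = {
            label: sum(w.get(f, 0.0) * v for f, v in x.items())
            for label, w in self.weights.items()
        }
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}

    def fit(self, texts: List[str], labels: List[str]) -> "SoftmaxLogisticRegression":
        self.weights = {label: defaultdict(float) for label in sorted(set(labels))}
        data = [(extract_features(t), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for x, y_true in data:
                probs = self._proba(x)
                for label, w in self.weights.items():
                    # Cross-entropy gradient w.r.t. this class's score.
                    err = probs[label] - (1.0 if label == y_true else 0.0)
                    for f, v in x.items():
                        w[f] -= self.lr * err * v
        return self

    def predict_proba(self, text: str) -> Dict[str, float]:
        return self._proba(extract_features(text))

    def predict(self, text: str) -> str:
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


if __name__ == "__main__":
    # Hypothetical task descriptions mapped to hypothetical API names.
    texts = [
        "block the malicious ip address on the firewall",
        "block ip at the perimeter firewall",
        "isolate the infected host from the network",
        "quarantine endpoint host and isolate it",
        "scan the file hash for malware",
        "lookup file hash reputation for malware",
    ]
    labels = ["firewall.block_ip", "firewall.block_ip", "edr.isolate_host",
              "edr.isolate_host", "av.scan_hash", "av.scan_hash"]
    model = SoftmaxLogisticRegression().fit(texts, labels)
    print(model.predict("block this ip address on the firewall"))
```

Character n-grams let the model match partial or inflected API vocabulary that word tokens miss. In practice a library implementation such as scikit-learn's `LogisticRegression` over `TfidfVectorizer` features would be the more usual choice; the hand-written version only makes the mechanics visible.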
ISBN: (print) 9783031709456; 9783031712463
Microservice architecture has become the mainstream for cloud-native systems. While many microservice system benchmarks have been introduced to the scientific community, a notable gap remains: the benchmarks do not offer architectural variants of the same system with identical functionality. For instance, research has engaged with mono-to-micro decomposition, but the proposed methods use textbook or tutorial systems. Moreover, different microservice granularities in a system provide different architectural trade-offs. This paper extends an established microservice benchmark with two new variants: a monolithic version of the system and a version with 20 microservices. Equivalent functionality is validated across the three benchmark variants.
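The abstract states that equivalent functionality is validated across the three variants but not how. One common way to check this is differential testing: replay one request trace against every variant, each starting from the same state, and flag any response that differs. A minimal sketch, assuming each variant can be wrapped as a function from (operation, payload) to a response; the toy shop with a two-copy `book` stock and all function names are hypothetical stand-ins, and the in-process call to the inventory service replaces what would be a network call in a real benchmark:

```python
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[[str, Dict[str, Any]], Any]
Request = Tuple[str, Dict[str, Any]]


def make_monolith() -> Handler:
    """Hypothetical monolith: one process owns orders and inventory."""
    stock = {"book": 2}

    def handle(op: str, payload: Dict[str, Any]) -> Any:
        item = payload["item"]
        if op == "buy":
            if stock.get(item, 0) > 0:
                stock[item] -= 1
                return "ok"
            return "out_of_stock"
        if op == "stock":
            return stock.get(item, 0)
        return "unknown_op"

    return handle


def make_microservices() -> Handler:
    """Hypothetical split: an order service calls a separate inventory service."""
    inventory = {"book": 2}

    def inventory_service(op: str, item: str) -> Any:
        # In a real deployment this would be a remote call to another service.
        if op == "reserve":
            if inventory.get(item, 0) > 0:
                inventory[item] -= 1
                return True
            return False
        return inventory.get(item, 0)

    def order_service(op: str, payload: Dict[str, Any]) -> Any:
        item = payload["item"]
        if op == "buy":
            return "ok" if inventory_service("reserve", item) else "out_of_stock"
        if op == "stock":
            return inventory_service("count", item)
        return "unknown_op"

    return order_service


def find_mismatches(
    variants: Dict[str, Handler], trace: List[Request]
) -> List[Tuple[str, Dict[str, Any], Dict[str, Any]]]:
    """Replay the same trace against every variant and report disagreements."""
    mismatches = []
    for op, payload in trace:
        responses = {name: handler(op, payload) for name, handler in variants.items()}
        if len({repr(r) for r in responses.values()}) > 1:
            mismatches.append((op, payload, responses))
    return mismatches


if __name__ == "__main__":
    trace = [("buy", {"item": "book"})] * 3 + [("stock", {"item": "book"})]
    variants = {"monolith": make_monolith(), "microservices": make_microservices()}
    print(find_mismatches(variants, trace))  # [] when the variants agree
```

The trace deliberately exhausts the stock so that the error path, not only the happy path, is compared across architectures.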
This research evaluates three air distribution methods in terms of cross-infection minimization through CFD simulation using interacting virtual manikins and the LES approach. Displacement and stratum presented less p...
Increased ventilation rates and enhanced air filtration are proven ways to decrease indoor airborne transmission of SARS-CoV-2. This study uses sensors to monitor carbon dioxide (CO2) and fine particulate matter (PM2....
ISBN: (print) 9798350322637
Ethical guidelines are an asset for artificial intelligence (AI) development, and conforming to them will soon be a procedural requirement once the EU AI Act is ratified by the European Parliament. However, developers often lack explicit knowledge of how to apply these guidelines during the system development process. A literature review of ethical guidelines from various countries and organizations has revealed inconsistencies in the principles presented and in the terminology used to describe them. This research begins by identifying the limitations of existing ethical AI development frameworks in performing requirements engineering (RE) processes during the development of trustworthy AI. Recommendations to address those limitations will be proposed to make the frameworks more applicable in the RE process and to foster the development of trustworthy AI. This could lead to wider adoption, greater productivity of AI systems, and a reduced workload on humans for non-cognitive tasks. Considering the impact of newer foundation models such as GitHub Copilot and ChatGPT, the vision for this research project is to work towards holistic, operationalisable RE guidelines for the development and implementation of trustworthy AI, not only at the product level but also at the process level.
ISBN: (digital) 9781665490429
ISBN: (print) 9781665490429
To improve assembly quality and efficiency, a method based on deep learning and object matching is proposed to detect missing and wrong parts. An improved YOLO V3 neural network is designed to solve the problem of missing assembly: a small-target detection scale and an attention module are added to the network, and the sizes of the prior anchor boxes are optimized with the K-means++ clustering algorithm. For the problem of wrong assembly, a standard assembly state detection template is constructed from the virtual assembly scene in CAD software, and the 2D detection box of the current assembly object in the scene image is matched with the 2D box in the standard state template based on IoU (Intersection over Union) calculation. The assembly model MONA (a 3D model for the evaluation of manual assembly tasks) is used to test the proposed method. Experimental results show that this method can accurately locate and identify assembly parts and effectively detect missing and wrong parts in the assembly process.
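The abstract pairs an improved YOLO V3 detector (for missing parts) with IoU matching against a CAD-derived template (for wrong parts). The sketch below covers only an IoU matching step, assuming the detector has already produced labelled boxes in (x1, y1, x2, y2) form; the 0.5 threshold, the part names and the simplified "missing / wrong / ok" rule are hypothetical, not the paper's exact procedure:

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def check_assembly(
    template: List[Tuple[str, Box]],
    detections: List[Tuple[str, Box]],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Match each template slot to the detection that overlaps it most."""
    report = []
    for part, slot_box in template:
        best_label: Optional[str] = None
        best_iou = 0.0
        for label, det_box in detections:
            score = iou(slot_box, det_box)
            if score > best_iou:
                best_label, best_iou = label, score
        if best_iou < threshold:
            status = "missing"  # nothing detected where the part should be
        elif best_label != part:
            status = "wrong"  # a part is there, but not the expected one
        else:
            status = "ok"
        report.append((part, status))
    return report


if __name__ == "__main__":
    template = [("bolt", (0, 0, 10, 10)), ("gear", (20, 20, 40, 40)),
                ("shaft", (50, 0, 60, 30))]
    detections = [("bolt", (1, 1, 10, 10)), ("washer", (20, 20, 40, 39))]
    print(check_assembly(template, detections))
    # [('bolt', 'ok'), ('gear', 'wrong'), ('shaft', 'missing')]
```

Matching by position first and comparing labels second is what lets the same overlap test distinguish an empty slot from a slot holding the wrong part.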