This paper proposes a novel supervised multi-view feature selection method via maximum margin criterion (MMC) joint distributed optimization. Firstly, the proposed method integrates the common loss and the local loss ...
In the modern digital landscape, cyber-attacks have become highly advanced and difficult to detect, especially in distributed systems. These Denial of Service (DoS) or Distributed Denial of Service (DDoS) attacks lead...
Cattle detection and stock monitoring in open fields continue to pose a challenge in smart agriculture due to the dynamic motion of the cattle and the complex background scene. With the advancement of sensor technolo...
ISBN (print): 9781665454445
Multilinear algebra kernel performance on modern massively-parallel systems is determined mainly by data movement. However, deriving data movement-optimal distributed schedules for programs with many high-dimensional inputs is a notoriously hard problem. State-of-the-art libraries rely on heuristics and often fall back to suboptimal tensor folding and BLAS calls. We present Deinsum, an automated framework for distributed multilinear algebra computations expressed in Einstein notation, based on rigorous mathematical tools to address this problem. Our framework automatically derives data movement-optimal tiling and generates corresponding distributed schedules, further optimizing the performance of local computations by increasing their arithmetic intensity. To show the benefits of our approach, we test it on two important tensor kernel classes: Matricized Tensor Times Khatri-Rao Products and Tensor Times Matrix chains. We show performance results and scaling on the Piz Daint supercomputer, with up to 19x speedup over state-of-the-art solutions on 512 nodes.
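To make the kernel classes above concrete, here is a minimal single-node sketch of a mode-1 MTTKRP written in Einstein notation with plain NumPy. The tensor sizes are illustrative, and the distributed tiling and scheduling is exactly what Deinsum automates on top of such a specification; this sketch only evaluates the kernel locally and checks it against the textbook matricized formulation.

```python
import numpy as np

# Mode-1 MTTKRP for a third-order tensor X (I x J x K) with factor
# matrices B (J x R) and C (K x R):
#   M[i, r] = sum_{j, k} X[i, j, k] * B[j, r] * C[k, r]
I, J, K, R = 30, 40, 50, 8
rng = np.random.default_rng(0)
X = rng.random((I, J, K))
B = rng.random((J, R))
C = rng.random((K, R))

# The whole kernel as a single Einstein-notation expression; a framework
# like Deinsum starts from such a specification and derives data
# movement-optimal distributed schedules from it.
M = np.einsum('ijk,jr,kr->ir', X, B, C)

# Cross-check against the matricized formulation: the mode-1 unfolding of X
# times the Khatri-Rao product of B and C (column-wise Kronecker products).
kr = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)
M_ref = X.reshape(I, J * K) @ kr
assert np.allclose(M, M_ref)
```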
Optimization in remote areas has long been a problem that could not be fully solved and adapted to field conditions. This research focuses on optimizing distributed generator rescheduling in Sangihe Island, a remote region w...
ISBN (digital): 9781665480185
ISBN (print): 9781665480185
As a new computing paradigm for solving large-scale group collaboration problems, crowdsourcing has attracted more and more attention. However, malicious users participating in crowdsourcing tasks can disrupt task completion or produce malicious evaluations that are inconsistent with the facts, which reduces ordinary users' satisfaction and may even cost the system their trust. In addition, most existing crowdsourcing systems rely on a central server and are vulnerable to a single point of failure, further affecting users' trust in the system. To solve these problems, this paper proposes a trusted distributed crowdsourcing framework based on user preferences. Firstly, we propose a reputation-based trust model for identifying malicious users (IMU), which can quickly identify various kinds of malicious users. Secondly, the framework is built on an open, transparent, and tamper-proof consortium blockchain to ensure the security and reliability of transaction information, and a complete service process is developed on top of it. Finally, the framework also takes users' differing preferences into account and gives priority to the tasks that best match each user's preferences to improve satisfaction. The proposed framework is deployed on IBM Hyperledger Fabric. The average transaction confirmation time is 1.4424 s and the average system throughput is 186 tps. The experimental results show that the framework can quickly identify malicious users.
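The abstract does not spell out the IMU update rule, so the following is only a hypothetical sketch of a reputation-based filter in that spirit: reputations drift toward 1 for behavior consistent with the facts and toward 0 otherwise, and users falling below a threshold are flagged. The initial value, learning rate, and threshold are all assumptions.

```python
# Hypothetical reputation-based malicious-user filter (not the paper's
# exact IMU model; parameters below are illustrative assumptions).

class ReputationTracker:
    def __init__(self, initial=0.5, lr=0.1, threshold=0.2):
        self.rep = {}              # user id -> reputation in [0, 1]
        self.initial = initial     # reputation assigned to unseen users
        self.lr = lr               # weight of the newest observation
        self.threshold = threshold # below this, a user is flagged

    def update(self, user_id, consistent):
        """Exponentially move reputation toward 1 if the user's evaluation
        was consistent with the facts, toward 0 otherwise."""
        r = self.rep.get(user_id, self.initial)
        target = 1.0 if consistent else 0.0
        self.rep[user_id] = (1 - self.lr) * r + self.lr * target

    def is_malicious(self, user_id):
        return self.rep.get(user_id, self.initial) < self.threshold

tracker = ReputationTracker()
for _ in range(20):
    tracker.update("worker-7", consistent=False)  # repeated bad evaluations
print(tracker.is_malicious("worker-7"))  # True once reputation decays below 0.2
```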
ISBN (print): 9798350310764
Developing digital biomarkers requires handling unprecedented quantities of digital data generated from digital health technologies that utilize a combination of computing platforms, connectivity, software, and sensors. These collected data need to be transformed and transported into a meaningful and useful format before being further derived into health indicators for understanding disease state and quality of life. The unique challenges of this class of data engineering tasks stem from the complexity and volume of the digital data being handled, the data quality and fidelity required to enable subsequent analysis, and the repeated cycles of trial and error needed to achieve the desired results. This paper presents a family of systems, pipelines, and methods we have designed and built to facilitate these tasks in a typical digital data engineering lifecycle in the context of digital biomarker development.
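As a hypothetical illustration of the transform step described above (the column names, plausibility range, and the toy daily heart-rate indicator are assumptions, not the paper's design), a cleaning-and-derivation pass over raw wearable samples might look like this in pandas:

```python
import pandas as pd

# Raw wearable samples: irregular timestamps, a gap, and one artifact value.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:03", "2024-01-01 00:01:58",
        "2024-01-01 00:02:10", "2024-01-01 00:03:01",
    ]),
    "heart_rate": [62.0, 63.5, 210.0, 64.0],
})

clean = (
    raw.set_index("timestamp")["heart_rate"]
       .loc[lambda s: s.between(30, 180)]   # drop physiologically implausible readings
       .resample("1min").mean()             # regularize the sampling grid
       .interpolate(limit=5)                # fill short gaps only, leave long ones
)

daily_indicator = clean.resample("1D").mean()  # toy derived "health indicator"
print(daily_indicator)
```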
With the widespread use of fiber optic-based distributed sensing technology, many application areas such as temperature detection, leak detection, conveyor temperature monitoring, fire detection in tunnels and passage...
Distributed Machine Learning (DML) at the edge of the network involves model learning and inference across networking nodes over distributed data. One type of model learning could be the delivery of predictive analyti...
As deep learning grows rapidly, model training heavily relies on parallel methods, and numerous cluster configurations exist. However, current preferences for parallel training focus on data centers, overlooking the financial constraints faced by most researchers. To attain the best performance within a cost limit, we introduce a throughput-cost metric to accurately characterize a cluster's cost-effectiveness. Based on this metric, we design a cost-effective cluster built around RTX 3090 GPUs with NVLink. The experimental results demonstrate that our cluster achieves remarkable cost-effectiveness across various distributed model training schemes.
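The snippet below sketches how such a throughput-cost comparison might be computed. The exact metric definition is not given in the abstract, so training samples per second per dollar of hardware cost, along with every number below, is an assumption for illustration only:

```python
# Hypothetical throughput-cost comparison; all figures are made-up
# placeholders, not measurements from the paper.

clusters = {
    # name: (training throughput in samples/s, hardware cost in USD)
    "8x RTX 3090 + NVLink": (2400.0, 16000.0),
    "8x A100":              (5200.0, 80000.0),
}

def throughput_cost(throughput, cost):
    """Higher is better: training samples per second per dollar spent."""
    return throughput / cost

for name, (tput, cost) in clusters.items():
    print(f"{name}: {throughput_cost(tput, cost):.4f} samples/s per USD")
```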