In light of recent advancements in the Internet of Multimedia Things (IoMT) and 5G technology, both the variety and quantity of data have been rapidly increasing. Consequently, handling zero-shot cross-modal retrieval (ZS...
With the application of blockchain light nodes in embedded devices, how to alleviate the computing pressure that complex operations such as transaction SPV verification place on the CPUs of embedded devices and improve ...
Android malware detection has become a research hotspot in mobile security. When security service providers obtain feature information from target samples, the process may involve private user information such as identity and ...
ISBN: 9798350337662 (print)
This paper presents GPTuneCrowd, a crowd-based autotuning framework for tuning high-performance computing applications. GPTuneCrowd collects performance data from various users through a user-friendly tuner interface. It then applies novel autotuning techniques, based on transfer learning and parameter sensitivity analysis, to maximize tuning quality using the data collected from the crowd. The paper presents several real-world case studies of GPTuneCrowd. Our evaluation shows that GPTuneCrowd's transfer learning improves the tuned performance of ScaLAPACK's PDGEQRF by 1.57x and of the plasma fusion code NIMROD by 2.97x over a non-transfer-learning autotuner. We use GPTuneCrowd's sensitivity analysis to reduce the search space of SuperLU_DIST and Hypre. Tuning on the reduced search space achieves 1.17x and 1.35x better tuned performance for SuperLU_DIST and Hypre, respectively, compared to tuning on the original search space.
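The idea of using sensitivity analysis to shrink a tuning search space can be illustrated with a minimal sketch. This is not GPTuneCrowd's API or its actual analysis method: a simple one-at-a-time sensitivity measure is swapped in for illustration, and the objective `toy_runtime`, the parameter names `block` and `threads`, and the threshold are all hypothetical.

```python
from typing import Callable, Dict, List

Point = Dict[str, float]
Space = Dict[str, List[float]]


def one_at_a_time_sensitivity(
    objective: Callable[[Point], float], space: Space, baseline: Point
) -> Dict[str, float]:
    """Score each parameter by the spread of the objective when only
    that parameter varies and all others stay at the baseline."""
    scores: Dict[str, float] = {}
    for name, values in space.items():
        results = []
        for value in values:
            point = dict(baseline)
            point[name] = value
            results.append(objective(point))
        scores[name] = max(results) - min(results)
    return scores


def reduce_space(
    space: Space, baseline: Point, scores: Dict[str, float], threshold: float
) -> Space:
    """Keep the full range of sensitive parameters; pin the rest to baseline."""
    return {
        name: (values if scores[name] >= threshold else [baseline[name]])
        for name, values in space.items()
    }


def toy_runtime(p: Point) -> float:
    # Hypothetical runtime model: strongly affected by block size,
    # barely affected by thread count.
    return (p["block"] - 64) ** 2 / 100.0 + 0.001 * p["threads"] + 5.0


SPACE: Space = {"block": [16, 32, 64, 128], "threads": [1, 2, 4, 8]}
BASELINE: Point = {"block": 32, "threads": 2}

SCORES = one_at_a_time_sensitivity(toy_runtime, SPACE, BASELINE)
REDUCED = reduce_space(SPACE, BASELINE, SCORES, threshold=1.0)
```

In this toy setting the tuner would then search only over `block`, since `threads` is pinned to its baseline value. A real autotuner would estimate sensitivity from a surrogate model fitted to collected performance data rather than from direct evaluations.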
Many of Taobao's important daily data mining tasks, such as anomaly attack detection and interest group detection, require efficient algorithmic solutions for mining specific graph patterns. The most common graph patte...
ISBN: 9781665466189 (digital)
ISBN: 9781665466189 (print)
The development of processing capability has enabled the implementation of advanced control systems for power electronics converters. In this framework, the modular multilevel converter has attracted the attention of industry and academia thanks to its good performance in terms of power quality, wide voltage capability, and fault tolerance, which are key requirements for high-power applications. This paper proposes a quasi-distributed control architecture based on a FemtoCore platform, an optimized soft core designed for modular power electronics applications. Latency estimates and simulation results show the potential of this solution for the control of modular multilevel converters.
ISBN: 9798350337662 (print)
High-fidelity flow simulations are indispensable when analyzing systems exhibiting multiphase flow phenomena. The accuracy of multiphase flow simulations is strongly contingent upon the finest mesh resolution used to represent the fluid-fluid interfaces. However, the increased resolution comes at a higher computational cost. In this work, we propose algorithmic advances that aim to reduce the computational cost without compromising on the physics by selectively detecting key regions of interest (droplets/filaments) that require significantly higher resolution. The framework uses adaptive octree-based meshing integrated with PETSc's linear algebra solvers. We demonstrate scaling of the framework up to 114,688 processes on TACC's Frontera. Finally, we deploy the framework to perform one of the most highly resolved simulations of primary jet atomization. This simulation, equivalent to 35 trillion grid points on a uniform grid, is 64x larger than current state-of-the-art simulations and provides unprecedented insights into an important flow physics problem with a diverse array of engineering applications.
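The saving from refining only near fluid-fluid interfaces can be illustrated with a small sketch. It is not the authors' framework: it uses a 2D quadtree in place of a 3D octree, a circle as a stand-in interface, and a hypothetical refinement criterion (`crosses_circle`) that splits a cell only if the interface passes through it.

```python
import math
from typing import List, Tuple

Cell = Tuple[float, float, float]  # (x, y) of lower-left corner, side length


def crosses_circle(cell: Cell, cx: float, cy: float, r: float) -> bool:
    """True if the circle boundary passes through the (closed) square cell."""
    x, y, s = cell
    # Nearest point of the cell to the circle centre.
    nx = min(max(cx, x), x + s)
    ny = min(max(cy, y), y + s)
    d_min = math.hypot(nx - cx, ny - cy)
    # Farthest point of the cell is always one of its corners.
    d_max = max(
        math.hypot(px - cx, py - cy) for px in (x, x + s) for py in (y, y + s)
    )
    return d_min <= r <= d_max


def refine(
    cell: Cell, level: int, max_level: int, cx: float, cy: float, r: float
) -> List[Cell]:
    """Recursively split cells the interface crosses; return the leaf cells."""
    if level == max_level or not crosses_circle(cell, cx, cy, r):
        return [cell]
    x, y, s = cell
    half = s / 2
    leaves: List[Cell] = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            child = (x + dx, y + dy, half)
            leaves.extend(refine(child, level + 1, max_level, cx, cy, r))
    return leaves


MAX_LEVEL = 6
LEAVES = refine((0.0, 0.0, 1.0), 0, MAX_LEVEL, cx=0.5, cy=0.5, r=0.3)
UNIFORM_CELLS = 4 ** MAX_LEVEL  # cells needed at the finest level everywhere
```

Comparing `len(LEAVES)` with `UNIFORM_CELLS` shows how many fewer cells the adaptive mesh needs to reach the same finest resolution at the interface. The "equivalent uniform grid" figure quoted in the abstract is the analogous count at the finest level over the whole domain.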
ISBN: 9798350311990 (print)
A program's architecture (how it organizes the invocation of application-specific logic) influences important program characteristics, including its scalability and security. Architecture details are usually expressed in the same programming language as the rest of a program and can be difficult to distinguish from non-architecture code. Once defined, architecture is difficult and risky to change because it couples tightly with application logic over time. We introduce C-Saw, an approach to expressing software architecture using a new embedded domain-specific language (EDSL) designed for that purpose. It decouples application-specific logic from architecture, making it easier to identify the architectural details of software. C-Saw leverages three ideas: (i) introducing a new, formally specified EDSL to separate an application's architecture description from its programming language; (ii) reducing architecture implementation to the definition and management of distributed key-value tables; and (iii) introducing an expressive state-management abstraction for distributed applications. We describe a prototype implementation of C-Saw for C programs and use it to build end-to-end examples of expressing and changing the architecture of widely used third-party software. We evaluate this on Redis, cURL, and Suricata and find that C-Saw provides expressiveness and reusability, requires fewer lines of code than directly using C to express architectural patterns, and imposes low performance overhead on typical workloads.
Lattice cryptography, as a recognized cryptosystem that can resist quantum computation, has great potential for development. Lattice-based signature schemes are currently a research focus. In this paper, the traceable r...
ISBN: 9798350311990 (print)
Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks running on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach is our use of cloud-hosted management services to handle challenging aspects of cross-resource authentication and authorization, function-as-a-service (FaaS) function invocation, and data transfer. We show that these methods can achieve performance parity with systems that rely on direct connections between resources. We achieve parity by integrating the FaaS system and data transfer capabilities with a system that passes data by reference among managers and workers, and with a user-configurable steering algorithm that hides data transfer latencies. We anticipate that this ease of use can enable routine use of heterogeneous resources in computational science.
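Passing data by reference between a manager and its workers can be sketched as follows. This is an illustrative, in-process toy rather than the system described in the abstract: `ObjectStore`, `Ref`, and `worker_task` are hypothetical names, and a plain dictionary stands in for whatever remote transfer or storage service a real deployment would use.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Ref:
    """Lightweight handle that travels in task messages instead of the data."""
    key: str


class ObjectStore:
    """Stand-in for a shared store reachable by both manager and workers."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}
        self.fetches = 0  # counts how often payloads are actually pulled

    def put(self, value: Any) -> Ref:
        key = uuid.uuid4().hex
        self._objects[key] = value
        return Ref(key)

    def get(self, ref: Ref) -> Any:
        self.fetches += 1
        return self._objects[ref.key]


def worker_task(store: ObjectStore, ref: Ref) -> float:
    # The worker resolves the reference only when it needs the payload.
    data: List[float] = store.get(ref)
    return sum(data) / len(data)


STORE = ObjectStore()
# Manager side: store the payload once, then ship only the small reference.
REF = STORE.put([1.0, 2.0, 3.0, 4.0])
RESULT = worker_task(STORE, REF)
```

Because the task message carries only `REF`, the payload moves once, directly to the worker that needs it. A steering policy could additionally start the fetch before the task is scheduled so that transfer time overlaps with other work.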