ISBN: 9783981080155 (print)
Modern processors are becoming more complex, and as features and application sizes grow, evaluating them is becoming more time-consuming. To date, design space exploration has relied on extensive software simulation, which is slow when highly accurate. In this paper we propose ReSim, a parameterizable ILP processor simulation acceleration engine based on reconfigurable hardware. We describe ReSim's trace-driven microarchitecture, which simulates the operation of a complex ILP processor in a cycle-serial fashion, aiming to reduce implementation complexity and boost operating frequency. Being trace-driven, ReSim can simulate timing in an almost ISA-independent fashion and supports all SimpleScalar ISAs, such as PISA and Alpha. We implemented ReSim for the latest Xilinx devices. In our experiments with a 4-way superscalar processor, ReSim achieves a simulation throughput of up to 28 MIPS and offers more than a 5x improvement over the best reported ILP processor hardware simulators.
ISBN: 9798350349320 (digital)
ISBN: 9798350349337 (print)
System-Level Test (SLT) has been an integral part of integrated circuit test flows for over a decade and remains significant. Nevertheless, systematic approaches for generating test programs are lacking, particularly ones that target the non-functional aspects of the Device under Test (DUT). Currently, test engineers manually create test suites using commercially available software to simulate the end-user environment of the DUT. This process is challenging and laborious, and it does not assure adequate control over non-functional properties. This paper proposes using Large Language Models (LLMs) for SLT program generation. We fine-tune a pre-trained LLM to generate test programs that optimize non-functional properties of the DUT, e.g., instructions per cycle. To this end, we use Gem5, a microarchitectural simulator, in conjunction with Reinforcement Learning-based training. Finally, we write a prompt to generate C code snippets that maximize the instructions per cycle of the given architecture. In addition, we apply hyperparameter optimization to achieve the best possible results at inference time.
System-Level Test (SLT) has been part of the test flow for integrated circuits for over a decade and is still gaining importance. However, no systematic approaches exist for test program generation, especially targeting ...
ISBN: 9798350366884 (digital)
ISBN: 9798350366891 (print)
System-Level Test (SLT) is essential for testing integrated circuits, focusing on functional and non-functional properties of the Device under Test (DUT). Traditionally, test engineers manually create tests with commercial software to simulate the DUT's end-user environment. This process is time-consuming and offers limited control over non-functional properties. This paper proposes using Large Language Models (LLMs), enhanced by Structural Chain of Thought (SCoT) prompting, a temperature schedule, and a pool of previously generated snippets, to generate high-quality code snippets for SLT. We repeatedly query the LLM for a better snippet using previously generated snippets as examples, thus creating an iterative optimization loop. This approach can automatically generate snippets for SLT that target specific non-functional properties, reducing time and effort. Our findings show that this approach improves the quality of the generated snippets compared to unstructured prompts containing only a task description.
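The iterative loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate` and `score` are hypothetical callables standing in for the LLM query and the measurement of the non-functional property, and the linear temperature decay is an assumed schedule, since the abstract does not specify one.

```python
def linear_temperature(step, total, t_start=1.0, t_end=0.2):
    """Assumed schedule: decay linearly from t_start to t_end over the run."""
    if total <= 1:
        return t_end
    frac = step / (total - 1)
    return t_start + frac * (t_end - t_start)


def optimize_snippets(generate, score, iterations=10, pool_size=3):
    """Repeatedly ask for a better snippet, feeding back the best ones so far.

    generate(examples, temperature) -> str   (hypothetical LLM wrapper)
    score(snippet) -> float                  (hypothetical quality metric)
    Returns the pool as a list of (score, snippet), best first.
    """
    pool = []
    for step in range(iterations):
        temperature = linear_temperature(step, iterations)
        # Previously generated snippets serve as in-prompt examples.
        examples = [snippet for _, snippet in pool]
        candidate = generate(examples, temperature)
        pool.append((score(candidate), candidate))
        # Keep only the pool_size best snippets for the next query.
        pool.sort(key=lambda entry: entry[0], reverse=True)
        del pool[pool_size:]
    return pool


if __name__ == "__main__":
    # Toy stand-ins: "generation" extends the best example, "score" is length.
    def toy_generate(examples, temperature):
        return (examples[0] if examples else "") + "a"

    print(optimize_snippets(toy_generate, len, iterations=5))
```

In a real setup, `score` would presumably run the snippet on a simulator or on the device and return the measured property; the pool size and schedule endpoints would then be tuning parameters.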
Spike detection plays a central role in neural data processing and brain-machine interfaces (BMIs). A challenge for future-generation implantable BMIs is to build a spike detector that features both low hardware cost ...
ISBN: 9798331541378 (digital)
ISBN: 9798331541385 (print)
Quantum computing is a promising technology that requires a sophisticated software stack to connect end users to the wide range of possible quantum backends. However, current software tools are usually hard-coded for single platforms and lack a dynamic interface that can automatically retrieve and adapt to the changing physical characteristics and constraints of different platforms. With new hardware platforms frequently introduced and their performance changing on a daily basis, this constitutes a serious limitation. In this paper, we showcase a concept and a prototypical realization of an interface, called the Quantum Device Management Interface (QDMI), that addresses this problem by explicitly connecting software and hardware developers and mediating between their competing interests. QDMI allows hardware platforms to provide their physical characteristics in a standardized way, and software tools to query that data to guide the compilation process accordingly. This enables software tools to automatically adapt to different platforms and to optimize the compilation process for the specific hardware constraints. QDMI is a central part of the Munich Quantum Software Stack (MQSS), a sophisticated software stack that connects end users to the wide range of possible quantum backends. QDMI is publicly available as open source at https://***/Munich-Quantum-Software-Stack/QDMI.
The modeling of atmospheric processes in the context of weather and climate simulations is an important and computationally expensive challenge. The temporal integration of the underlying PDEs requires a very large nu...
Efficient time integration schemes are necessary to capture the complex processes involved in atmospheric flows over long periods of time. In this work, we propose a high-order, implicit-explicit numerical scheme that co...
ISBN: 9798350355543 (digital)
ISBN: 9798350355550 (print)
Load imbalance is a challenge for parallel applications in High Performance Computing (HPC). It is caused by processes having different execution times or load values, leading to idle or wait times at synchronization points, where faster processes must wait for the slowest process to catch up. To mitigate this issue, applications can employ load balancing (LB) strategies, which migrate load between processes to even it out. This is often referred to as the Load Rebalancing Problem (LRP). While many approaches to solving the LRP exist, they are necessarily heuristics, so further optimization potential remains. In our work, we take a novel route using hybrid classical-quantum approaches and present two versions of the constrained quadratic model for solving the LRP; the two differ in how they trade off the number of qubits required against the types of constraints applied. We compare the quantum-based methods with classical methods using the heuristic algorithms Greedy, Karmarkar–Karp (KK), and ProactLB. We evaluate our approaches using imbalance ratio and speedup as metrics, as well as the number of migrated tasks as an indicator of the overhead caused by migrations. Our results show that the quantum-based methods outperform the classical methods. For example, in a realistic use case we need only 1/4 of the number of migrated tasks to balance the load compared with classical methods, particularly Greedy and KK.
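To make the classical baseline and the metrics concrete, here is a small sketch of a greedy rebalancer together with an imbalance ratio and a migration count. It rests on assumptions that the abstract does not confirm: the imbalance ratio is taken as (max - mean) / mean, "Greedy" is taken to be the common largest-task-first heuristic, and all function names are hypothetical.

```python
import heapq


def imbalance_ratio(loads):
    """Assumed definition: (max - mean) / mean; 0.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return (max(loads) - mean) / mean if mean > 0 else 0.0


def greedy_rebalance(tasks, num_procs):
    """tasks: list of (load, current_process). Returns the new process per task.

    Largest task first, each placed on the currently least-loaded process.
    The current placement is ignored, so many tasks may end up migrating.
    """
    heap = [(0.0, p) for p in range(num_procs)]  # (accumulated load, process)
    heapq.heapify(heap)
    new_proc = [None] * len(tasks)
    for i in sorted(range(len(tasks)), key=lambda i: -tasks[i][0]):
        load, p = heapq.heappop(heap)
        new_proc[i] = p
        heapq.heappush(heap, (load + tasks[i][0], p))
    return new_proc


def process_loads(tasks, procs, num_procs):
    """Total load per process for a given task-to-process mapping."""
    loads = [0] * num_procs
    for (load, _), p in zip(tasks, procs):
        loads[p] += load
    return loads


def count_migrations(tasks, new_proc):
    """Number of tasks whose process changed (a proxy for migration overhead)."""
    return sum(1 for (_, old), new in zip(tasks, new_proc) if old != new)


if __name__ == "__main__":
    tasks = [(8, 0), (7, 0), (6, 0), (5, 0), (4, 1), (3, 1)]
    before = process_loads(tasks, [p for _, p in tasks], 2)
    new_proc = greedy_rebalance(tasks, 2)
    after = process_loads(tasks, new_proc, 2)
    print(before, imbalance_ratio(before))
    print(after, imbalance_ratio(after), count_migrations(tasks, new_proc))
```

In this toy case the loads go from [26, 7] to [17, 16] at the cost of 3 migrations out of 6 tasks, which illustrates why the number of migrated tasks is worth tracking alongside the imbalance ratio.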
Authors: Miyazaki, J.; Yokota, H. (Member)
School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa-ken, Japan 923
Received his B.E. degree from Tokyo Institute of Technology, Tokyo, Japan, in 1992 and his M.S. degree from Japan Advanced Institute of Science and Technology (JAIST), Ishikawa, Japan, in 1994. He is a Ph.D. student at the School of Information Science, JAIST. His research interests include parallel rule base systems, active database systems, and high performance I/O systems. He is a member of the Institute of Electronics, Information and Communication Engineers of Japan and the Information Processing Society of Japan.
Received his B.E., M.E., and Dr. of Eng. degrees from Tokyo Institute of Technology in 1980, 1982, and 1991, respectively. He joined Fujitsu Ltd. in 1982 and was a researcher at the Institute of New Generation Computer Technology (ICOT) from 1982 to 1986 and at Fujitsu Laboratories Ltd. from 1986 to 1992. He is an Associate Professor at the School of Information Science, Japan Advanced Institute of Science and Technology (JAIST). His research interests include parallel computer architecture for databases and data engineering. He is a member of IPS, I.E.I.C.E., JSAI, IEEE, and ACM.
An implementation method is proposed for parallel production systems on multicomputers, or message-passing computers, to reduce execution time. Parallel production systems using a hash mechanism have been proposed before, but they suffer from a skewed load distribution. To balance the load more efficiently, the method presented here, named clustered parallel production systems (CPPS), adopts two load balancing strategies: hash and demand-driven. Taking the cost of termination detection into account, the execution times of CPPS and the simple hash method are estimated, and the methods are implemented on an nCube2. The estimates agree with the measured execution results. CPPS provides much better load balance, which improves scalability.