ISBN (digital): 9798331524937
ISBN (print): 9798331524944
Synthetic Aperture Radar (SAR) tomography is an advanced technique for monitoring deformations of the Earth’s surface. However, the computational complexity of SAR tomography algorithms often restricts their application to large-scale datasets. To address this issue, we introduce a multi-level parallel implementation of a single scatterer detection algorithm specifically designed to exploit the capabilities of modern heterogeneous High-Performance Computing (HPC) systems. By efficiently distributing the computational workload at different levels across multiple processing units, our parallel approach significantly reduces processing time, facilitating the analysis of extensive SAR datasets. We assess the performance of our parallel implementation using real-world SAR data, showcasing its effectiveness in enhancing both the efficiency and scalability of SAR tomography. Our work contributes to advancing remote sensing techniques and offers valuable insights into the application of HPC for large-scale environmental monitoring.
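As a rough, hedged illustration of the multi-level idea (coarse-grained distribution across compute nodes plus fine-grained threading within each node), the C sketch below splits a pixel grid across MPI ranks and parallelizes the per-pixel detection loop with OpenMP. The grid size and the `detect_single_scatterer` test are placeholders, not the paper's actual detector.

```c
/* Hedged sketch: multi-level parallelism for per-pixel scatterer detection.
 * Level 1: MPI ranks each take a contiguous block of rows (coarse grain).
 * Level 2: OpenMP threads share the pixels inside that block (fine grain).
 * detect_single_scatterer() is a placeholder for the real detection test. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define ROWS 4096
#define COLS 4096

/* Placeholder detector: returns 1 if the pixel passes the detection test. */
static int detect_single_scatterer(int row, int col) {
    return ((row * 31 + col * 17) % 97) == 0;   /* stand-in for the real test */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Coarse-grained split: each rank owns a contiguous stripe of rows. */
    int rows_per_rank = (ROWS + size - 1) / size;
    int row_begin = rank * rows_per_rank;
    int row_end = row_begin + rows_per_rank;
    if (row_end > ROWS) row_end = ROWS;

    long local_hits = 0;
    /* Fine-grained split: threads divide the pixels of the stripe. */
    #pragma omp parallel for reduction(+:local_hits) schedule(static)
    for (int r = row_begin; r < row_end; ++r)
        for (int c = 0; c < COLS; ++c)
            local_hits += detect_single_scatterer(r, c);

    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("detected scatterers: %ld\n", total_hits);

    MPI_Finalize();
    return 0;
}
```

A program of this shape would be built with an MPI compiler wrapper and OpenMP enabled, for example `mpicc -fopenmp`.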
Real-time applications are increasing in their complexity of control and computational demands. Parallel and distributed systems provide cost-efficient computing power and a higher degree of fault tolerance, which make them attractive platforms for the next generation of real-time systems. As real-time systems move from uniprocessor systems to parallel and distributed systems, their design becomes more complex and new techniques are required. This dissertation provides new approaches for solving two closely related problems in designing parallel and distributed real-time systems: dynamic scheduling of tasks with precedence relations and communication support for wormhole-routed networks. The task scheduling algorithm combines task graph partitioning, least-laxity-first scheduling and branch-and-bound task allocation techniques to provide the required real-time performance. Performance analysis shows that the algorithm can efficiently schedule precedence-constrained tasks with low scheduling overhead. The parameters that affect performance and hardware cost are studied to provide system designers with the means for fine-tuning the algorithm for different system configurations. The flow control scheme of a direct network manages the network resources and directly relates to system performance. Several flow control schemes are developed to support real-time communication on wormhole networks. The schemes differ in their priority mapping, priority adjustment, arbitration and message dropping strategies. A priority mapping scheme encodes the timing property of a message into a priority, which can be represented in a small number of digits. As the timing property of a message changes, a priority adjustment method modifies the priority to reflect the current status of the message. An arbitration function decides how to allocate bandwidth. Messages that miss their deadlines and lose their value are removed from the network by a message dropping mechanism.
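As a hedged, stand-alone illustration of the least-laxity-first rule named above (not the dissertation's implementation), the C sketch below dispatches the ready task with the smallest laxity, computed as deadline minus current time minus remaining execution time. The task parameters are made up for the example.

```c
/* Hedged sketch of least-laxity-first (LLF) task selection.
 * laxity = deadline - now - remaining execution time; the ready task
 * with the smallest laxity is dispatched next. Values are illustrative. */
#include <stdio.h>

typedef struct {
    const char *name;
    double deadline;   /* absolute deadline */
    double remaining;  /* remaining execution time */
} Task;

/* Return the index of the ready task with minimum laxity, or -1 if none. */
static int pick_llf(const Task *tasks, int n, double now) {
    int best = -1;
    double best_laxity = 0.0;
    for (int i = 0; i < n; ++i) {
        if (tasks[i].remaining <= 0.0) continue;           /* already finished */
        double laxity = tasks[i].deadline - now - tasks[i].remaining;
        if (best == -1 || laxity < best_laxity) {
            best = i;
            best_laxity = laxity;
        }
    }
    return best;
}

int main(void) {
    Task tasks[] = {
        {"T1", 10.0, 4.0},   /* laxity 6 at t = 0 */
        {"T2",  8.0, 3.0},   /* laxity 5 at t = 0: dispatched first */
        {"T3", 15.0, 6.0},   /* laxity 9 at t = 0 */
    };
    double now = 0.0;
    int next = pick_llf(tasks, 3, now);
    if (next >= 0)
        printf("dispatch %s (laxity %.1f)\n", tasks[next].name,
               tasks[next].deadline - now - tasks[next].remaining);
    return 0;
}
```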
The integration of the Internet of Things (IoT) into various industries has led to an exponential increase in the volume of data generated, posing significant challenges for compliance monitoring. Traditional complian...
ISBN (print): 9780769547909
Embedded real-time applications increasingly present high computation requirements, which need to be completed within specific deadlines. However, these applications exhibit highly variable load patterns, depending on the data set being processed at a given instant. The current trend toward parallel processing in the embedded domain provides higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization and leaves processing capacity in the overall system available but unusable. A solution to this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload to neighbour nodes in peak situations. In this context, this paper proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and the Message Passing Interface (MPI), within a programming model based on OpenMP. The paper also devises an integrated timing model, which enables structured reasoning about the timing behaviour of these hybrid architectures.
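A minimal C sketch of the hybrid pattern described above, assuming a simple threshold-based offload decision: work is processed locally with an OpenMP loop, and when the local backlog exceeds a threshold the excess is shipped to a neighbour rank over MPI. The threshold, message tag, and work function are illustrative placeholders, not the proposed framework.

```c
/* Hedged sketch: OpenMP for local parallelism, MPI to offload excess work
 * to a neighbour node during a load peak. Constants are illustrative. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 100000
#define OFFLOAD_THRESHOLD 60000   /* backlog above which we offload */
#define TAG_WORK 1

static double process_item(int i) { return i * 0.5; }   /* placeholder work */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int local_count = N;
        int offload = 0;
        if (size > 1 && local_count > OFFLOAD_THRESHOLD)
            offload = local_count - OFFLOAD_THRESHOLD;   /* peak: ship excess */
        local_count -= offload;

        /* Tell the neighbour how many items it should take (0 if none). */
        if (size > 1)
            MPI_Send(&offload, 1, MPI_INT, 1, TAG_WORK, MPI_COMM_WORLD);

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < local_count; ++i)
            sum += process_item(i);
        printf("rank 0 processed %d items locally (checksum %.1f)\n",
               local_count, sum);
    } else if (rank == 1) {
        int count = 0;
        MPI_Recv(&count, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < count; ++i)
            sum += process_item(i);
        printf("rank 1 processed %d offloaded items (checksum %.1f)\n",
               count, sum);
    }

    MPI_Finalize();
    return 0;
}
```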
ISBN (digital): 9791188428137
ISBN (print): 9798331507602
This investigation proposes a unique Dynamic Bio-Information Image Recognition System (DBIRS) built for the real-time monitoring of cancer cell apoptosis induced by available medical therapies. Utilizing advances in microscopy, AI-based image recognition, and parallel computing across distributed edge devices, the proposed system substantially boosts the precision and efficiency of cellular investigation. By deploying multiple AI modules adapted to distinct cellular states, the system provides automatic, high-accuracy recognition of morphological changes in cancer cells, thereby reducing the reliance on manual procedures. The approach offers remote monitoring and fast data processing via cloud integration, permitting continuous observation and timely intervention. The system's effectiveness is demonstrated through trials on K-562 cells, where it attained a recognition accuracy of 97.41%. These outcomes underline the potential of the proposed method to refine cytotoxicity assays, delivering a robust tool for increasing the efficacy and safety of immune cell treatments in clinical and research settings.
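As a hedged sketch of the "multiple AI modules adapted to distinct cellular states" idea, the C snippet below scores one frame with a placeholder recognizer per state and reports the highest-scoring state. The scoring function and state names stand in for the actual trained models and do not reflect the system's real pipeline.

```c
/* Hedged sketch: one specialized recognizer per cellular state scores the
 * frame; the highest-confidence state wins. Scores are placeholders. */
#include <stdio.h>

#define NUM_STATES 3
#define FRAME_LEN  64

static const char *state_names[NUM_STATES] = {
    "viable", "early-apoptotic", "late-apoptotic"
};

/* Placeholder per-state recognizer; a real module would run a trained model. */
static double score_state(int state, const unsigned char *frame, int len) {
    double s = 0.0;
    for (int i = 0; i < len; ++i)
        s += frame[i] * (state + 1);
    return s / (len * 255.0 * NUM_STATES);
}

int main(void) {
    unsigned char frame[FRAME_LEN];
    for (int i = 0; i < FRAME_LEN; ++i)
        frame[i] = (unsigned char)(i * 4);   /* synthetic frame data */

    int best = 0;
    double best_score = score_state(0, frame, FRAME_LEN);
    for (int s = 1; s < NUM_STATES; ++s) {
        double v = score_state(s, frame, FRAME_LEN);
        if (v > best_score) { best_score = v; best = s; }
    }
    printf("predicted state: %s (score %.2f)\n", state_names[best], best_score);
    return 0;
}
```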
Today's large-scale parallel workflows are often processed on heterogeneous distributed computing platforms. From an economic perspective, computing resource providers should minimize cost while offering high service quality. It is well recognized that energy consumption accounts for a large part of a computing system's total cost, and that timeliness and reliability are two important service indicators. This work studies the problem of scheduling a parallel workflow so as to minimize system energy consumption under response time and reliability constraints. We first formulate this problem mathematically as a non-linear mixed-integer programming problem. Since this problem is hard to solve directly, we present several highly efficient heuristic solutions. Specifically, we first develop an algorithm that minimizes the schedule length while meeting the reliability requirement, on top of which we propose a processor-merging algorithm and a slack-time reclamation algorithm using a dynamic voltage and frequency scaling (DVFS) technique to reduce energy consumption. The processor-merging algorithm tries to turn off energy-inefficient processors so that energy consumption can be minimized. The DVFS technique is applied to scale down the processor frequency at both the processor and task levels to reduce energy consumption. Experimental results on two real-life workflows and extensive synthetic parallel workflows demonstrate the effectiveness of these algorithms.
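A minimal sketch of the slack-time reclamation idea, assuming a cubic dynamic-power model: when a task has slack before its latest allowed finish time, DVFS stretches its execution into that slack by lowering the frequency. The numbers and the power model are illustrative assumptions, not the paper's.

```c
/* Hedged sketch of slack-time reclamation with DVFS. Execution is stretched
 * over (exec + slack) by lowering the frequency; with dynamic power roughly
 * proportional to f^3 and time proportional to 1/f, dynamic energy scales
 * roughly as f^2. Numbers are illustrative. */
#include <stdio.h>

int main(void) {
    double f_max = 2.0e9;       /* maximum frequency (Hz) */
    double exec_at_fmax = 4.0;  /* execution time at f_max (s) */
    double slack = 2.0;         /* slack before latest finish time (s) */

    /* Stretch execution over exec + slack by lowering the frequency. */
    double f_new = f_max * exec_at_fmax / (exec_at_fmax + slack);

    /* Rough dynamic-energy comparison against running flat out at f_max. */
    double energy_ratio = (f_new / f_max) * (f_new / f_max);

    printf("scaled frequency: %.2f GHz\n", f_new / 1e9);
    printf("dynamic energy vs. running at f_max: %.1f%%\n",
           energy_ratio * 100.0);
    return 0;
}
```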
Most computer-based systems have hard real-time constraints. Schedulers in complex systems must be designed to manage a set of applications developed and deployed independently. In this paper, we study an open real-time environment architecture for distributed systems in which real-time applications may run concurrently with non-real-time applications. The architecture uses a two-level scheduling scheme: each application is assigned a sporadic server to schedule the processes in the application, and all sporadic servers are then scheduled by a system-wide fixed-priority scheduler. Under the proposed open environment architecture, every hard real-time application is guaranteed its reserved CPU utilization and can therefore meet all of its deadlines, and this guarantee is independent of the behavior of all other applications in the same system. We present schedulability analysis methods for systems with and without shared memory.
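As a hedged illustration of the two-level scheme, the C sketch below performs one plausible server-level admission check: each application reserves a sporadic server with a budget and period, and a new server is admitted only if the total reserved utilization stays within the Liu-Layland bound for fixed-priority scheduling. This is an assumption made for illustration, not necessarily the paper's exact schedulability analysis.

```c
/* Hedged sketch: admission check for application-level sporadic servers
 * under a system-wide fixed-priority scheduler, using the Liu-Layland
 * utilization bound as one plausible test. Link with -lm for pow(). */
#include <math.h>
#include <stdio.h>

typedef struct {
    double budget;  /* server budget Q (reserved execution per period) */
    double period;  /* server replenishment period P */
} Server;

/* Admit the candidate only if total utilization fits the n-server bound. */
static int admit(const Server *existing, int n, Server candidate) {
    double u = candidate.budget / candidate.period;
    for (int i = 0; i < n; ++i)
        u += existing[i].budget / existing[i].period;
    double bound = (n + 1) * (pow(2.0, 1.0 / (n + 1)) - 1.0);  /* Liu-Layland */
    return u <= bound;
}

int main(void) {
    Server servers[] = { {2.0, 10.0}, {5.0, 20.0} };   /* reserved U = 0.45 */
    Server candidate = {3.0, 15.0};                    /* requests U = 0.20 */
    printf("admit new application server: %s\n",
           admit(servers, 2, candidate) ? "yes" : "no");
    return 0;
}
```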
ISBN (digital): 9798331509859
ISBN (print): 9798331509866
Over the past few years, large language models (LLMs) have evolved to enable a wide range of applications, from natural language understanding to real-time conversational agents. However, deploying LLMs in production presents significant challenges, especially with regard to the low-latency responses required for real-time interaction. This work investigates multi-node inference architectures for optimized deployment using open-source frameworks, with a focus on scalability, flexibility, and cost-effectiveness. We investigate methods such as microbatching, tensor and pipeline parallelism, and sophisticated load balancing that effectively distribute inference workloads across multiple nodes. We conduct extensive evaluations using popular open-source tools such as Kubernetes, Ray, and Envoy to benchmark the performance of these architectures in terms of latency, throughput, and resource utilization under diverse workloads. We also analyze the trade-offs between model replication and model partitioning, offering guidance on the most appropriate configuration for various deployment scenarios. Our results show that a well-orchestrated multi-node setup can greatly reduce inference latency while preserving high throughput, enabling the deployment of sophisticated LLMs in latency-sensitive applications. The paper provides a detailed analysis of multi-node inference strategies and their integration into open-source ecosystems, serving as a guide for practitioners seeking to deploy LLMs at scale. In summary, this work underlines how distributed architectures can overcome inherent limitations of single-node deployments and are crucial for achieving more efficient and responsive AI-driven services.
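As a hedged sketch of one load-balancing strategy in the family discussed above, the C snippet below routes each incoming inference request to the replica with the fewest outstanding requests. Replica names and counters are illustrative; a production deployment would typically delegate this decision to a proxy or serving layer such as Envoy or Ray Serve.

```c
/* Hedged sketch: least-outstanding-requests routing across LLM replicas.
 * Each request goes to the replica with the fewest in-flight requests. */
#include <stdio.h>

#define NUM_REPLICAS 3

typedef struct {
    const char *name;
    int outstanding;   /* requests currently in flight on this replica */
} Replica;

/* Pick the replica with the fewest outstanding requests. */
static int pick_replica(const Replica *r, int n) {
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (r[i].outstanding < r[best].outstanding)
            best = i;
    return best;
}

int main(void) {
    Replica replicas[NUM_REPLICAS] = {
        {"llm-node-0", 4}, {"llm-node-1", 2}, {"llm-node-2", 7}
    };
    for (int req = 0; req < 5; ++req) {
        int i = pick_replica(replicas, NUM_REPLICAS);
        replicas[i].outstanding++;   /* dispatch the request to that replica */
        printf("request %d -> %s\n", req, replicas[i].name);
    }
    return 0;
}
```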
With the rapid expansion of the Internet of Things (IoT), the shift from cloud computing to Mobile Edge Computing (MEC) has become necessary to address the low-latency requirements of real-time applications. Verifiabl...