ISBN (Digital): 9798331527211
ISBN (Print): 9798331527228
In this article we present PARSIR (parallel SImulation Runner), a package that enables the effective exploitation of shared-memory multi-processor machines for running discrete event simulation models. PARSIR is a compile/run-time environment for discrete event simulation models developed with the C programming language. The architecture of PARSIR has been designed to keep the number of CPU cycles required for running models low. This is achieved by combining a set of techniques: 1) causally consistent batch-processing of simulation events at an individual simulation object, for caching effectiveness; 2) a high likelihood of disjoint-access parallelism; 3) favoring memory accesses on local NUMA (Non-Uniform Memory Access) nodes in the architecture, while still enabling well-balanced workload distribution via work-stealing from remote nodes; and 4) the use of RMW (Read-Modify-Write) machine instructions for fast access to the simulation engine data required by the worker threads for managing the concurrent simulation objects and distributing the workload. Furthermore, any architectural solution embedded in the PARSIR engine is fully transparent to the application-level code implementing the simulation model. We also provide experimental results showing the effectiveness of PARSIR when running the reference PHOLD benchmark on a NUMA shared-memory multi-processor machine equipped with 40 CPUs.
ISBN (Digital): 9798350371284
ISBN (Print): 9798350371291
Data monitoring is becoming mandatory for several IoT applications, including smart greenhouses. It aims to increase the quality of information by identifying existing errors and anomalies, in particular through an outlier detection process based on machine-learning data classification. Existing approaches require knowledge of the data characteristics in advance. However, this requirement cannot always be met in IoT due to the heterogeneity of devices. Therefore, this paper proposes VoteIoT, a new monitoring method based on data analytics and voting over clustering outcomes. In this way, we increase the probability of making a good decision and guaranteeing a good greenhouse harvest. To evaluate the proposed solution, we used a real database extended with augmented data. The results show a good response time, below 0.01 seconds, as well as a good detection accuracy of 97%. Additionally, false alarms remain below 3%, and the loss of useful data is therefore low. Moreover, we managed to increase the probability of a good decision compared to existing solutions.
ISBN (Digital): 9798331509712
ISBN (Print): 9798331509729
High fan-out requests are prevalent in systems employing multi-tier architectures. These requests are divided into several sub-requests for parallel processing. However, a high fan-out request must wait for all its sub-requests to complete before returning, while the processing times of sub-requests are unpredictable due to differences in their characteristics, such as data volume and data popularity. Meanwhile, existing SSD-based caches struggle to adjust request processing speeds to ensure timely handling. As a result, some sub-requests are delayed, affecting the overall latency and causing long tail latency issues. This paper proposes RSCache, a tail-latency-friendly cache based on NVMe SSDs. RSCache combines the NVMe Weighted Round-Robin (WRR) arbitration mechanism with a priority-based scheduling mechanism to enable differentiated request processing. We propose a fan-out size-based priority assignment strategy along with a latency-aware sub-request sorting method. Together, they prioritize sub-requests according to their impact on tail latency and schedule them to NVMe priority queues with different processing speeds, effectively reducing the variation in processing time among sub-requests. In addition, we balance the load across queues to avoid congestion with a novel feedback mechanism. We implemented a prototype of RSCache based on SPDK. Our experiments demonstrate that RSCache reduces tail latency by up to 40% and improves throughput twofold compared to state-of-the-art SSD-based cache designs.
ISBN (Digital): 9798331542856
ISBN (Print): 9798331542863
The safeguarding of drug security is an important prerequisite to guaranteeing public health and life safety. A pre-warning mechanism is an important means of monitoring drug risks quickly and effectively. Based on data mining technology and oriented to the massive data from each procedure of the drug industry chain, this paper mainly discusses how to discover in advance the potential risks of the whole drug industry chain, and provides early-warning guidance via risk indicators. From the viewpoint of data modeling and technical processing, the key is how to integrate and analyze multi-source heterogeneous data (across space and time) and apply key-indicator analysis to streaming data. Specifically, we first construct a distributed system and perform standardization of complex time-series data from different objects. Secondly, the potential risk of drug data is evaluated using key-indicator analysis. Finally, we obtain the abnormal data objects and their fluctuating indicators across different periods. The application of the proposed method can provide hints and guidance for the sampling inspection of drugs, helping to improve the efficiency of the drug safety supervision department.
Locally Checkable Labeling (LCL) problems are graph problems in which a solution is correct if it satisfies some given constraints in the local neighborhood of each node. Example problems in this class include maximal...
ISBN (Digital): 9798350382846
ISBN (Print): 9798350382853
The distributed linearly separable computation problem finds extensive applications across domains such as distributed gradient coding, distributed linear transform, real-time rendering, etc. In this paper, we investigate this problem in a fully decentralized scenario, where $N$ workers collaboratively perform the computation task without a central master. Each worker aims to compute a linearly separable computation that can be manifested as $K_c$ linear combinations of $K$ messages, where each message is a function of a distinct dataset. We require that each worker successfully fulfill the task based on the transmissions from any $N_r$ workers, such that the system can tolerate any $N - N_r$ stragglers. We focus on the scenario where the computation cost (the number of uncoded datasets assigned to each worker) is minimum, and aim to minimize the communication cost (the number of symbols the fastest $N_r$ workers transmit). We propose a novel distributed computing scheme that is optimal under the widely used cyclic data assignment. Interestingly, we demonstrate that the side information at each worker is ineffective in reducing the communication cost when $K_c \leq K N_r / N$, while it helps reduce the communication cost as $K_c$ increases.
ISBN (Digital): 9798350364606
ISBN (Print): 9798350364613
For parallel solvers susceptible to hardware-related failures, localizing recovery to the processes directly affected by the failure preserves asynchronous progress and exhibits “failure masking” due to the limited propagation of recovery delays. This results in improved scalability compared to global recovery, which is a disproportionate response. However, localizing recovery from hard failures is challenging because such failures are not transparent to the MPI runtime, requiring reconstruction of the communication layers and of a consistent application state. In this work we present the process- and data-recovery concepts that enable the performance and scalability of localized recovery despite the inherently non-local nature of some recovery steps. We present design enhancements to existing resilience middleware (the Fenix library and MPI User-Level Failure Mitigation) to robustly support larger-scale execution and “pseudo-local” checkpointing and recovery from many process failures. Using an example stencil solver with emulated hard failures, we present an experimental evaluation, with runs on up to ~1000 ranks subject to ~100 process failures, which confirms that pseudo-local recovery has significantly improved weak scaling compared to the roughly exponential slowdown of global recovery. Our work shows how fault-tolerance infrastructure originally designed for global checkpoint/restart can be repurposed to enable greater efficiency in a resilience-aware application.