The article describes a new approach to the processing of long series of surface-atmosphere ozone monitoring data, called data parallel processing in block streams. The proposed method splits a sequential series of initial data into blocks, each filled with one day of surface ozone monitoring data. A chain is then formed from such blocks, whose length is determined by the total length of the monitoring period. Along this chain of blocks, parallel processing of the initial data is carried out, aimed at smoothing out fast fluctuations. The smoothed data are then used to determine the daily production of ozone due to photochemical reactions, its minimum nighttime levels, and the magnitude of nighttime maxima. The capabilities of the proposed approach are demonstrated through the analysis of surface ozone monitoring data in Moscow, Russia, in 2020. Prospects for further application of the developed method are discussed.
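As an illustration of the block-stream idea in the abstract above, the sketch below splits a long ozone series into one-day blocks, smooths each block independently (the step that can run in parallel along the chain), and reads per-day summaries off the smoothed blocks. The sampling rate, smoothing window, and thread pool are our assumptions, not the authors' implementation.

```python
# Illustrative sketch of block-stream processing; hourly sampling and a
# 3-sample moving average are assumptions, not the paper's exact method.
from concurrent.futures import ThreadPoolExecutor

SAMPLES_PER_DAY = 24  # assumed hourly sampling: one block = one day

def smooth_block(block):
    """Moving-average smoothing of one daily block (assumed window of 3)."""
    w = 3
    out = []
    for i in range(len(block)):
        lo, hi = max(0, i - w // 2), min(len(block), i + w // 2 + 1)
        out.append(sum(block[lo:hi]) / (hi - lo))
    return out

def block_stream_process(series):
    # Form the chain of daily blocks along the monitoring period.
    blocks = [series[i:i + SAMPLES_PER_DAY]
              for i in range(0, len(series), SAMPLES_PER_DAY)]
    # Each block is independent, so the smoothing can run in parallel.
    with ThreadPoolExecutor() as pool:
        smoothed = list(pool.map(smooth_block, blocks))
    # Per-day summaries: daytime maximum and nighttime minimum proxies.
    summaries = [(max(b), min(b)) for b in smoothed]
    return smoothed, summaries
```

The per-day summaries stand in for the quantities the article extracts (daily photochemical production, nighttime minima and maxima); real monitoring data would need timestamps to separate day from night.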
ISBN:
(print) 9783030410322; 9783030410315
The Generalized Nets (GN) approach is an advanced way of modeling parallel processes and analyzing complex systems such as Large-scale Wireless Sensor Networks (LWSN). LWSN such as meteorological and air quality monitoring systems can generate large amounts of data, reaching petabytes per year. Data-parallel processing at the sensor nodes is one possible way to reduce inter-node communication and so save energy. At the same time, on-site parallel processing requires additional energy for the computational data processing itself. Therefore, a realistic model of the process is critical for the optimization analysis of any large-scale sensor network. In this paper, a newly developed GN-based model of sensor-node data-parallel processing in an LWSN with cluster topology is presented. The proposed model covers all aspects of inter-node sensor data integration and the cluster-based parallel processes specific to large-scale sensor data operations.
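The energy trade-off in the abstract above can be made concrete with a toy calculation (our simplification, not the paper's GN formalism): in a cluster topology, aggregating readings at each cluster head replaces one transmission per reading with one transmission per cluster, at the cost of local computation.

```python
# Illustrative sketch of cluster-head aggregation in an LWSN; the cluster
# layout and the mean aggregate are assumptions for demonstration only.

def messages_without_aggregation(clusters):
    # Every raw reading is forwarded individually toward the sink.
    return sum(len(readings) for readings in clusters.values())

def messages_with_aggregation(clusters):
    # Each cluster head sends a single aggregate toward the sink.
    return len(clusters)

def cluster_aggregates(clusters):
    # On-site computation at each cluster head (here, the mean reading).
    return {head: sum(r) / len(r) for head, r in clusters.items()}
```

A realistic model, as the paper argues, must weigh the radio energy saved by the reduced message count against the processing energy spent computing the aggregates.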
Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. The performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is a research challenge because the allocation of preemptable system resources among parallel jobs may vary dynamically during execution. This variation in resource allocation makes it difficult to accurately estimate the execution time. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), which estimates the allocation of preemptable resources by identifying the bottleneck, so as to accurately predict task execution time. For a DAG workflow, we propose a state-based approach that iteratively uses the resource allocation property among stages to estimate the overall execution plan. Furthermore, to handle the skewness of various jobs, we refine the model with order statistics theory to improve estimation accuracy. Extensive experiments were performed to validate these cost models with HiBench and TPC-H workloads. The BOE model outperforms the state-of-the-art models by a factor of five for task execution time estimation. For the refined skew-aware model, the average prediction error is under 3% when estimating the execution time of 51 hybrid analytics (HiBench) and query (TPC-H) DAG workflows.
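The bottleneck idea at the core of BOE can be sketched as follows; the function names and the wave-based stage formula are our simplifications for illustration, not the paper's exact model.

```python
# Hedged sketch: a task's time is governed by its most contended resource,
# and a stage of identical parallel tasks runs in "waves" over task slots.
import math

def task_time(demand, allocation):
    """Time is set by the bottleneck: the largest demand/allocation ratio."""
    return max(demand[r] / allocation[r] for r in demand)

def stage_time(num_tasks, slots, demand, allocation):
    """Estimate a stage as ceil(tasks/slots) sequential waves of tasks."""
    waves = math.ceil(num_tasks / slots)
    return waves * task_time(demand, allocation)
```

The paper's state-based approach then chains such stage estimates along the DAG, updating the resource allocation state as stages complete.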
Author:
Guan, Xuefeng (Wuhan Univ)
State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Peoples R China
Massive spatial data requires considerable computing power for real-time processing. With the development of multicore technology and the reduction of computer component costs in recent years, high-performance clusters have become the only economically viable solution for this requirement. Massive spatial data processing demands heavy I/O operations, however, and should be characterized as a data-intensive application. Parallelization strategies for data-intensive applications are incompatible with currently available processing frameworks, which are basically designed for traditional compute-intensive applications. In this paper we introduce a Split-and-Merge paradigm for spatial data processing and also propose a robust parallel framework in a cluster environment to support this paradigm. The Split-and-Merge paradigm efficiently exploits data parallelism for massive data processing. The proposed framework is based on the open-source TORQUE project and hosted on a multicore-enabled Linux cluster. A common LiDAR point cloud algorithm, Delaunay triangulation, was implemented on the proposed framework to evaluate its efficiency and scalability. Experimental results demonstrate that the system provides efficient performance speedup.
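A minimal sketch of the Split-and-Merge paradigm described above: partition a point set into spatial tiles, process the tiles in parallel, and merge the partial results. The one-dimensional tiling, the trivial per-tile operation, and the thread pool are illustrative assumptions; the paper's framework runs real algorithms such as Delaunay triangulation on a TORQUE-managed cluster.

```python
# Illustrative Split-and-Merge over a 2D point set; all parameters are
# assumptions for demonstration, not the paper's framework.
from concurrent.futures import ThreadPoolExecutor

def split_by_x(points, n_tiles, x_min, x_max):
    """Split: bin points into n_tiles vertical strips by x-coordinate."""
    width = (x_max - x_min) / n_tiles
    tiles = [[] for _ in range(n_tiles)]
    for x, y in points:
        idx = min(int((x - x_min) / width), n_tiles - 1)
        tiles[idx].append((x, y))
    return tiles

def process_tile(tile):
    # Stand-in for real per-tile work (e.g., a local triangulation).
    return len(tile)

def split_and_merge(points, n_tiles=4, x_min=0.0, x_max=1.0):
    tiles = split_by_x(points, n_tiles, x_min, x_max)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(process_tile, tiles))  # parallel step
    return sum(partials)  # merge step
```

For an algorithm like Delaunay triangulation, the merge step is the hard part (stitching triangulations along tile boundaries); the count used here only illustrates the control flow.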
Large datasets, such as the pixels and voxels in 2D and 3D images, can usually be reduced during processing to smaller subsets with fewer data points. Such subsets can be the objects in the image, features (edges or corners) or, more generally, regions of interest. For instance, the transformation from a set of data points representing an image to one or more subsets of data points representing objects in the image is performed by a segmentation algorithm and may involve both the selection of data points and a change in data structure. The massive number of pixels in the original image points to a data-parallel approach, whereas the processing of the various objects in the image is more suitable for task parallelism. In this paper we introduce a framework for parallel image processing, focusing on an array of buckets that can be distributed over a number of processors and that contains pointers to the data in the dataset. The benefit of this approach is that processor activity remains focused on the data points that need processing and, moreover, that the load can be distributed over many processors, even in a heterogeneous computer architecture. Although the method is generally applicable to the processing of sets, in this paper we draw our examples from the domain of image processing. As this method yields speedups that are data dependent, we derived a run-time evaluation that is able to determine whether the use of distributed buckets is beneficial. (C) 2008 Elsevier B.V. All rights reserved.
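The bucket array described above can be sketched as follows: rather than scanning the whole dataset on every pass, keep buckets of indices (pointers) to the data points that still need processing, distributed over the processors. The round-robin distribution and the selection predicate are our assumptions for illustration.

```python
# Illustrative bucket-of-pointers scheme; the predicate and the round-robin
# distribution are assumptions, not the paper's exact framework.

def build_buckets(data, needs_processing, n_buckets):
    """Collect indices of active data points and spread them over buckets."""
    buckets = [[] for _ in range(n_buckets)]
    active = [i for i, v in enumerate(data) if needs_processing(v)]
    for j, i in enumerate(active):
        buckets[j % n_buckets].append(i)  # round-robin for load balance
    return buckets

def process_buckets(data, buckets, op):
    """Apply op to the pointed-to data points; each bucket could run on
    its own processor, but here we loop sequentially for clarity."""
    out = list(data)
    for bucket in buckets:
        for i in bucket:
            out[i] = op(out[i])
    return out
```

Because only the active indices are touched, work stays proportional to the number of interesting data points (e.g., object pixels after segmentation) rather than to the full image size, which is the speedup the paper's run-time evaluation weighs against the distribution overhead.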
In this paper, for linear, slowly time-varying large-scale systems, under weak assumptions (only step changes of the system set points as the exciting signal, unknown structure and dynamic parameters, an approximate linear dynamic model, and the least-squares estimation method), the steady-state gain estimate of the system is formed from the parameter estimates, and a parallel identification algorithm is put forward. The consistency of the estimate and the convergence of the parallel iteration are analyzed. Based on this consistency theorem, a pragmatic method to obtain a strongly consistent estimate of the steady-state model of the large-scale system is presented. A simulation study confirms these results.
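The step from parameter estimates to a steady-state gain can be illustrated on a first-order single-input case (our simplification, not the paper's parallel large-scale algorithm): fit y[k] = a*y[k-1] + b*u[k-1] by least squares from step-response data, then read off the gain as K = b / (1 - a).

```python
# Hedged sketch: least-squares fit of a first-order ARX model, with the
# steady-state gain formed from the parameter estimates K = b / (1 - a).
# The model order and single-loop setting are illustrative assumptions.

def estimate_gain(u, y):
    """Solve the 2x2 normal equations for (a, b), return b / (1 - a)."""
    s_yy = s_yu = s_uu = s_y1y = s_u1y = 0.0
    for k in range(1, len(y)):
        y1, u1 = y[k - 1], u[k - 1]
        s_yy += y1 * y1          # sum y[k-1]^2
        s_yu += y1 * u1          # sum y[k-1]*u[k-1]
        s_uu += u1 * u1          # sum u[k-1]^2
        s_y1y += y1 * y[k]       # sum y[k-1]*y[k]
        s_u1y += u1 * y[k]       # sum u[k-1]*y[k]
    det = s_yy * s_uu - s_yu * s_yu
    a = (s_uu * s_y1y - s_yu * s_u1y) / det
    b = (s_yy * s_u1y - s_yu * s_y1y) / det
    return b / (1.0 - a)
```

With noise-free data generated from a = 0.5, b = 1.0 under a unit step, the fit recovers the parameters exactly and the gain evaluates to 1 / (1 - 0.5) = 2; the paper's contribution lies in doing this consistently, in parallel, across the subsystems of a large-scale system.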