Meteorology Grid Computing aims to provide scientists with seamless, reliable, secure, and inexpensive access to meteorological resources. In this paper, we present a semantic-based meteorology grid service registry, ...
ISBN (Print): 9781509065431
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This paper presents new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment. The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix processing and optimization techniques in Spark, a distributed in-memory computing platform. Experiments on both real and synthetic data demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement over state-of-the-art distributed matrix computation systems on a wide range of applications.
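The abstract does not spell out the sparsity estimator, but a common choice under an independence assumption is sketched below in Python (function names are hypothetical): the expected density of a matrix product is derived from the densities of its inputs, which a plan optimizer can use to decide between sparse and dense intermediate formats.

```python
def estimate_product_density(density_a: float, density_b: float, k: int) -> float:
    """Estimate the nonzero density of C = A @ B, where A is m x k and B is k x n,
    assuming nonzeros are placed independently and uniformly at random.

    Entry c_ij is zero only if all k partial products a_il * b_lj are zero,
    which happens with probability (1 - dA * dB) ** k.
    """
    return 1.0 - (1.0 - density_a * density_b) ** k


# Two 10%-dense matrices with inner dimension 1000 give an almost dense product,
# so an optimizer would pick a dense format for the intermediate result.
print(estimate_product_density(0.1, 0.1, 1000))  # ~0.99995
```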
The authors present the results of a simulation study carried out to compare the effects of different restart techniques on the overall throughput, number of restarts, average response time, and average communication delay in representative distributed database environments. The performances of the following transaction restart methods are compared: (1) restart with random increase of timestamp, (2) restart with random delay, (3) the data-marking method, (4) data marking with random delay, and (5) restart with a substitute transaction. The substitute-transaction method is shown to perform well under all loads, except for the case in which all transactions are update-only; in that case, restart with a random delay performs better. The data-marking method introduces very high communication overhead because transactions keep sending messages requesting operations when an item is not available, which results in high response times and erratic system behavior under higher loads.
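As a rough illustration of method (2), restart with random delay, the following Python sketch (the `execute_txn` callable and `ConflictError` are hypothetical stand-ins for a real transaction manager) retries an aborted transaction after sleeping for a random interval, which spreads out re-executions of conflicting transactions.

```python
import random
import time


class ConflictError(Exception):
    """Raised by the (hypothetical) transaction manager when a transaction must abort."""


def run_with_random_delay_restart(execute_txn, max_delay=0.05, max_attempts=20):
    """Execute a transaction, restarting it after a random delay on each conflict."""
    for _ in range(max_attempts):
        try:
            return execute_txn()
        except ConflictError:
            # The random delay spreads restarts of conflicting transactions over
            # time, lowering the chance that they collide again immediately.
            time.sleep(random.uniform(0.0, max_delay))
    raise RuntimeError(f"transaction gave up after {max_attempts} attempts")
```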
ISBN (Print): 9781509067817
In this paper, we propose a novel data transformation scheme over a big data platform, aiming at ingesting production data from the local database on the factory side and transforming them into the workpiece-centric form that many manufacturing analytics systems need. The key idea is to blend big data processing techniques, including table composition with external distributed files, columnar storage, partitioning, and massively parallel processing, into the data transformation scheme to minimize the data processing time. Our proposed scheme has two main impacts on smart manufacturing. First, it serves as the key component for developing data-driven manufacturing decision systems, since large-volume production data sources can be efficiently transformed into the workpiece-centric form that other smart manufacturing services require. Second, it provides a development exemplar to assist a manufacturing factory toward the Industry 4.0 realm, since big data techniques are ingeniously blended into building data-intensive manufacturing services. Finally, we implement a prototype of the proposed scheme on the Hadoop platform and apply it to a semiconductor factory for integrated tests. The results of this case study, in which the scheme was physically deployed in a semiconductor factory, demonstrate the success of our work.
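The paper's exact pipeline is not reproduced here; the following PySpark sketch (paths and column names are hypothetical) illustrates the core transformation idea: per-step production rows are regrouped into one workpiece-centric record per workpiece and written back in columnar, partitioned form.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("workpiece-centric-transform").getOrCreate()

# Machine/step-centric production rows, stored in a columnar format.
steps = spark.read.parquet("hdfs:///factory/production_steps")

# Regroup the per-step rows into one record per workpiece, keeping the
# process history ordered by step sequence.
workpiece_centric = (
    steps
    .groupBy("workpiece_id")
    .agg(F.sort_array(F.collect_list(
        F.struct("step_seq", "machine_id", "recipe", "measured_value")
    )).alias("process_history"))
)

# Hash-partition by workpiece_id so the massively parallel write stays
# balanced and downstream services can read one workpiece in a single pass.
(workpiece_centric
    .repartition("workpiece_id")
    .write.mode("overwrite")
    .parquet("hdfs:///factory/workpiece_centric"))
```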
ISBN (Print): 9783030576752; 9783030576745
Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming model and present a novel toolchain able to detect parallelization errors coming from non-compliant OmpSs-2 applications. Our toolchain verifies the compliance with the OmpSs-2 programming model using local task analysis to deal with each task separately, and structural induction to extend the analysis to the whole program. To improve the effectiveness of our tools, we also introduce some ad-hoc verification annotations, which can be used manually or automatically to disable the analysis of specific code regions. Experiments run on a sample of representative kernels and applications show that our toolchain can be successfully used to verify the parallelization of complex real-world applications.
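The toolchain itself is not shown in the abstract; as a heavily simplified illustration of what a local task analysis might check, the Python sketch below (task descriptions are hypothetical dictionaries, not real OmpSs-2 pragmas) verifies that every variable a task accesses is covered by its declared dependencies, and then extends the check over the whole list of tasks.

```python
def check_task(task):
    """Return accesses in the task body that its declared dependencies do not cover."""
    declared = set(task["depend_in"]) | set(task["depend_out"])
    accessed = set(task["reads"]) | set(task["writes"])
    return sorted(accessed - declared)


def check_program(tasks):
    """Extend the per-task check to the whole program, one task at a time."""
    return {t["name"]: missing for t in tasks if (missing := check_task(t))}


example = [
    {"name": "init",  "depend_in": [],    "depend_out": ["a"], "reads": [],         "writes": ["a"]},
    {"name": "scale", "depend_in": ["a"], "depend_out": [],    "reads": ["a", "b"], "writes": ["b"]},
]
print(check_program(example))  # {'scale': ['b']}: 'b' is accessed but never declared
```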
ISBN (Print): 9783540859291
Sequence similarity and alignment are among the most important operations in computational biology. However, analyzing large sets of DNA sequences is impractical on a regular PC. Using multiple threads with the JavaParty mechanism, this project successfully extends the capabilities of regular Java to a distributed environment for the simulation of DNA computation. With the aid of JavaParty and a multi-threaded design, the results of this study demonstrate that the modified regular Java program can perform parallel computing without using RMI or socket communication. In this paper, an efficient method for modeling and comparing DNA sequences with dynamic programming and JavaParty is first proposed. Additionally, the results of this method in a distributed environment are discussed.
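For reference, the dynamic-programming core that such a system parallelizes can be sketched in a few lines; the Python version below (single-threaded, illustrative only, with made-up scoring parameters) computes a Needleman-Wunsch global alignment score for two DNA sequences.

```python
def nw_score(s, t, match=1, mismatch=-1, gap=-2):
    """Return the optimal Needleman-Wunsch global alignment score of s and t."""
    cols = len(t) + 1
    prev = [j * gap for j in range(cols)]              # row 0 of the DP matrix
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


print(nw_score("GATTACA", "GCATGCT"))  # score for two short example sequences
```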
Lattice cryptography, as a recognized cryptosystem that can resist quantum computation, has great potential for development. Lattice-based signature schemes are currently a research focus. In this paper, the traceable r...
ISBN (Print): 9781479950638
Collecting observations from all international news coverage and using TABARI software to code events, the Global Database of Events, Language, and Tone (GDELT) is the only global political georeferenced event dataset with 250+ million observations covering all countries in the world from January 1, 1979 to the present, with daily updates. The purpose of this widely used dataset is to help understand and uncover spatial, temporal, and perceptual trends and behaviors of the social and international system. To query such big geospatial data, traditional RDBMSs can no longer be used, and parallel distributed solutions have become a necessity. The MapReduce paradigm has proved to be a scalable platform for processing and analyzing Big Data in the cloud. Hadoop, as an implementation of MapReduce, is an open-source application that has been widely used and accepted in academia and industry. However, when dealing with spatial data, Hadoop is not well equipped and falls short, as it does not perform efficiently in terms of running time. SpatialHadoop is an extension of Hadoop with support for spatial data. In this paper, we present the Geographic Information System Querying Framework (GISQF) to process massive spatial data. This framework has been built on top of the open-source SpatialHadoop system, which exploits two-layer spatial indexing techniques to speed up query processing. We show how this solution outperforms Hadoop query processing by orders of magnitude when applying queries to the GDELT dataset with a size of 60 GB. We show the results for three types of queries: Longitude-Latitude Point queries, Circle-Area queries, and Aggregation queries.
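Independently of SpatialHadoop/GISQF internals, the refinement predicate behind a circle-area query can be sketched as follows in Python (field names are hypothetical); a spatial index would first prune candidate blocks, and this distance test is then applied to the surviving events.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def circle_area_query(events, center_lat, center_lon, radius_km):
    """Yield events (dicts with 'lat' and 'lon' keys) that fall inside the circle."""
    for event in events:
        if haversine_km(event["lat"], event["lon"], center_lat, center_lon) <= radius_km:
            yield event
```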
ISBN (Print): 9798350377217; 9798350377200
As computing technology advances, manycore systems have become increasingly used due to their performance and parallel processing capabilities. However, these systems also present significant security risks, including threats from hardware Trojans, malicious applications, and peripherals capable of executing various attacks, such as denial-of-service, spoofing, and eavesdropping. Researchers have proposed several security enhancements to address these challenges, including establishing secure zones, authentication protocols, and security-aware routing algorithms. This paper introduces a distributed monitoring mechanism that can detect suspicious activities within NoC links, peripheral interfaces, and packet reception protocols, explicitly focusing on detecting DoS, spoofing, and eavesdropping attacks. We conducted a comprehensive attack campaign to evaluate the effectiveness of our monitoring mechanisms and test the platform's resilience. The results were promising, with the system successfully detecting all attacks and continuing to execute applications correctly. Although there was an average execution time penalty of 8.83%, which included the time taken to generate attack warnings from the monitoring mechanisms and apply countermeasures, this mechanism significantly enhanced the security of manycore systems.
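The paper's monitors are hardware mechanisms; as a toy software illustration of one detection idea, the Python sketch below (thresholds and class names are made up) flags a possible denial-of-service flood when the number of flits seen on a link within a sliding window exceeds a limit.

```python
from collections import deque


class LinkMonitor:
    """Per-link sliding-window flit counter; window and threshold are arbitrary."""

    def __init__(self, window_cycles=1000, max_flits_per_window=800):
        self.window_cycles = window_cycles
        self.max_flits = max_flits_per_window
        self.events = deque()  # cycle numbers of recently observed flits

    def observe(self, cycle):
        """Record a flit seen at `cycle`; return True if the link looks flooded."""
        self.events.append(cycle)
        while self.events and self.events[0] <= cycle - self.window_cycles:
            self.events.popleft()
        return len(self.events) > self.max_flits
```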
ISBN (Print): 9788132227571; 9788132227564
High-speed data processing is a very important factor in supercomputers, and parallel computing is one of the key elements used in fast computers. Different methods are actively employed to realize parallel computing. In the present paper, the pipeline method is discussed along with its flaws and different clock schemes. Data propagation delay is examined for different existing techniques, and a new method to optimize the data propagation delay is presented. The new method is compared with existing methods, and the design of the new technique together with simulation results is presented. The simulation results are obtained from Virtual Digital Integrator.
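As a small numeric illustration of the quantities involved (stage delays and overheads below are made up, not taken from the paper), the clock period of a pipeline is set by its slowest stage plus the register overhead, and the propagation delay of one data item is that period times the number of stages:

```python
def pipeline_timing(stage_delays_ns, register_overhead_ns):
    """Return (clock period, per-item latency, ideal speedup bound) for a pipeline."""
    period = max(stage_delays_ns) + register_overhead_ns   # slowest stage sets the clock
    latency = period * len(stage_delays_ns)                # delay for one data item
    speedup_bound = sum(stage_delays_ns) / period          # vs. the unpipelined datapath
    return period, latency, speedup_bound


print(pipeline_timing([2.0, 3.5, 2.5, 3.0], 0.5))  # -> (4.0, 16.0, 2.75)
```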