High performance computing (HPC), large-scale instruments, and ever-growing simulation tools generate data at rates that are difficult to manage and analyze effectively. Implementation of Ma...
ISBN (print): 9781479971039
This work deals with the abstract and structural design of distributed network applications based on a logic-algebraic approach and methods of artificial intelligence. The formal methods used allow developers to build distributed data-processing applications that correspond to a chosen paradigm. Here, a paradigm means the concept chosen for the software solution, for example, the principle of organizing communication between processes, the method of naming and synchronizing processes, or the approach to distributing objects and coordinating their functioning. To ensure that the application conforms to the paradigm selected by the developer, it is suggested to use conceptual models and logical models of artificial intelligence at the initial stages of the project.
ISBN (print): 9781509011483
In this work a multi-frame interferometric system based on digital image processing and digital interferometry is presented. An experimental optical setup is implemented that combines the virtues of the digital interferometry technique with the capacity to capture two interferograms, at two different times, in a single digital recording system. The digital reconstruction of the interferograms requires the capture of three interferometric patterns of very thin parallel fringes (15-20 lines/mm): one containing the information of the phase object, and the other two serving as reference patterns. One reference pattern is captured under the same conditions as the one with plasma, and the other with the frequency of the pattern fringes slightly changed. The interferogram associated with each instant of time is separated by means of spatial filtering techniques. Thus, the technique allows obtaining digital interferograms with fringes of infinite and finite width at two different times. The interferometric system was tested on a laser-induced spark plasma in air.
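The spatial filtering step mentioned above can be illustrated with a minimal sketch: isolate one carrier sideband of a fringe pattern in the Fourier domain, transform back, and remove the carrier to recover the wrapped phase. The function and its parameters (`carrier_freq` in cycles/pixel, `half_width` in frequency bins) are hypothetical names, not the paper's implementation.

```python
import numpy as np

def extract_phase(interferogram, carrier_freq, half_width):
    """Isolate the +carrier sideband by spatial (Fourier) filtering
    and recover the wrapped phase of a fringe pattern.

    carrier_freq: fringe carrier frequency along x, in cycles/pixel.
    half_width:   band-pass half-width, in frequency bins.
    """
    n = interferogram.shape[1]
    spectrum = np.fft.fft2(interferogram)
    fx = np.fft.fftfreq(n)
    # Band-pass mask that keeps only the +carrier sideband.
    mask = (np.abs(fx - carrier_freq) < half_width / n)[None, :]
    analytic = np.fft.ifft2(spectrum * mask)
    # Multiply by the conjugate carrier to strip the linear fringe
    # term, leaving the object phase wrapped to [-pi, pi].
    x = np.arange(n)
    carrier = np.exp(2j * np.pi * carrier_freq * x)[None, :]
    return np.angle(analytic * np.conj(carrier))
```

With two time-separated fringe patterns recorded at slightly different carrier frequencies, running this filter twice with the two carriers separates the interferogram of each instant, which is the essence of the multiplexing scheme described in the abstract.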
ISBN (print): 9781467376846
Computing platforms consume more and more energy due to the increasing number of nodes composing them. To minimize the operating costs of these platforms, many techniques have been used; dynamic voltage and frequency scaling (DVFS) is one of them. It reduces the frequency of a CPU to lower its energy consumption. However, lowering the frequency of a CPU may increase the execution time of an application running on that processor. Therefore, the frequency that gives the best trade-off between energy consumption and performance must be selected. In this paper, a new online frequency-selecting algorithm for heterogeneous platforms (heterogeneous CPUs) is presented. It selects, for each node computing the message-passing iterative application, the frequency that gives the best trade-off between energy saving and performance degradation. The algorithm has a small overhead and works without training or profiling. It uses a new energy model for message-passing iterative applications running on a heterogeneous platform. The proposed algorithm is evaluated on the SimGrid simulator while running the NAS parallel benchmarks. The experiments show that it reduces energy consumption by up to 34% while limiting the performance degradation as much as possible. Finally, the algorithm is compared to an existing method; it outperforms the latter, saving on average 4% more energy while keeping the same performance.
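The energy/performance trade-off behind such DVFS algorithms can be sketched for a single node: normalize both metrics to their values at the maximum frequency and pick the frequency that maximizes the gap between normalized performance and normalized energy. This is a simplified, single-node illustration under textbook assumptions (compute time inversely proportional to frequency, a caller-supplied power model), not the paper's multi-node algorithm or energy model.

```python
def best_frequency(freqs, comp_time, power):
    """Pick the frequency maximizing the distance between normalized
    performance and normalized energy.

    freqs:     available CPU frequencies (GHz).
    comp_time: computation time at the maximum frequency.
    power:     function f -> dynamic power at frequency f (assumed model).
    """
    f_max = max(freqs)
    t_max = comp_time                      # time at the top frequency
    e_max = power(f_max) * t_max           # energy at the top frequency
    best, best_gap = f_max, float("-inf")
    for f in freqs:
        t = comp_time * f_max / f          # scaled execution time
        e = power(f) * t                   # energy = power x time
        perf = t_max / t                   # normalized performance (<= 1)
        energy = e / e_max                 # normalized energy (<= 1)
        gap = perf - energy                # trade-off distance
        if gap > best_gap:
            best, best_gap = f, gap
    return best
```

With a cubic dynamic-power model (power proportional to f^3), the gap becomes (f/f_max) - (f/f_max)^2, which peaks at half the maximum frequency; a heterogeneous-platform version would solve this per node while accounting for synchronization between nodes.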
Cloud infrastructures offer a wide variety of resources to choose from. However, most cloud users ignore the potential benefits of dynamically choosing cloud resources among a wide variety of VM instance types with di...
The integration of multiple database technologies, including both SQL and NoSQL, allows using the best tool for each aspect of a complex problem and is increasingly sought in practice. Unfortunately, this makes it dif...
ISBN (print): 9781450333177
As a fundamental tool for modeling and analyzing social and information networks, large-scale graph mining is an important component of any tool set for big data analysis. Processing graphs with hundreds of billions of edges is only possible by developing distributed algorithms under distributed graph mining frameworks such as MapReduce, Pregel, Giraph, and the like. For these distributed algorithms to work well in practice, we need to take into account several metrics, such as the number of rounds of computation and the communication complexity of each round. For example, given the popularity and ease of use of the MapReduce framework, developing practical algorithms with good theoretical guarantees for basic graph problems is of great importance. In this tutorial, we first discuss how to design and implement algorithms based on the traditional MapReduce architecture. In this regard, we discuss various basic graph-theoretic problems such as computing connected components, maximum matching, MST, triangle counting, and overlapping or balanced clustering. We present a computation model for MapReduce and describe the sampling, filtering, local random walk, and core-set techniques for developing efficient algorithms in this framework. At the end, we explore the possibility of employing other distributed graph processing frameworks. In particular, we study the effect of augmenting MapReduce with a distributed hash table (DHT) service and also discuss a new graph processing framework called ASYMP based on asynchronous message passing. We show that using ASYMP, one can improve CPU usage and achieve significantly better running times.
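One of the basic problems listed above, connected components, can be modeled in a few lines as rounds of map (each node sends its current label to its neighbors) and reduce (each node keeps the minimum label it has seen). This is a toy, single-process model of the round structure only; the tutorial's actual MapReduce algorithms are engineered to minimize the number of such rounds.

```python
from collections import defaultdict

def connected_components(edges):
    """Min-label propagation over rounds of map/reduce, as a toy
    model of distributed connected-components algorithms."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = {node: node for node in adj}   # start with own id as label
    changed = True
    while changed:
        changed = False
        # Map phase: every node emits its label to each neighbor.
        inbox = defaultdict(list)
        for u in adj:
            for v in adj[u]:
                inbox[v].append(labels[u])
        # Reduce phase: each node keeps the minimum label received.
        for v, msgs in inbox.items():
            m = min(min(msgs), labels[v])
            if m < labels[v]:
                labels[v] = m
                changed = True
    return labels
```

In this naive form the number of rounds grows with the graph diameter; techniques like hash-to-min or core-sets, mentioned in the abstract, exist precisely to cut that round count down.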
ISBN (print): 9781479988709
The Hadoop framework has been developed to effectively process data-intensive MapReduce applications. Hadoop users specify the application computation logic in terms of a map and a reduce function; such programs are often termed MapReduce applications. The Hadoop Distributed File System stores the MapReduce application data on the Hadoop cluster nodes, called Datanodes, while the Namenode is the control point for all Datanodes. While this increases resilience, the current data-distribution methodologies are not necessarily efficient for heterogeneous distributed environments such as public clouds. This work contends that existing data distribution techniques are not necessarily suitable, since the performance of Hadoop typically degrades in heterogeneous environments whenever data distribution is not determined according to the computing capability of the nodes. Data locality and its impact on the performance of Hadoop are key factors, since they affect performance in the Map phase when scheduling tasks; task scheduling techniques in Hadoop should therefore consider data locality to enhance performance. Various task scheduling techniques have been analysed to understand their data-locality awareness when scheduling applications. Other system factors also play a major role in achieving high performance in Hadoop data processing. The main contribution of this work is a novel data placement methodology for Hadoop Datanodes based on their computing ratio. Two standard MapReduce applications, WordCount and Grep, have been executed, and a significant performance improvement has been observed with the proposed data distribution technique.
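The core idea of placement by computing ratio can be sketched as proportional block assignment: each Datanode receives a share of the blocks proportional to its measured compute capability, with largest-remainder rounding so every block is placed. The function name and the ratio values are illustrative; the abstract does not specify how the ratios are measured or how replication is handled.

```python
def place_blocks(num_blocks, compute_ratios):
    """Assign file blocks to Datanodes in proportion to a per-node
    computing ratio (sketch of proportional data placement).

    compute_ratios: dict mapping node name -> relative compute power.
    Returns a dict mapping node name -> number of blocks to place.
    """
    total = sum(compute_ratios.values())
    # Ideal fractional share per node.
    shares = {n: num_blocks * r / total for n, r in compute_ratios.items()}
    placement = {n: int(s) for n, s in shares.items()}
    # Distribute leftover blocks by largest fractional remainder.
    leftover = num_blocks - sum(placement.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - placement[n],
                          reverse=True)
    for n in by_remainder[:leftover]:
        placement[n] += 1
    return placement
```

Skewing placement this way means faster nodes hold (and therefore locally process) more blocks, which is exactly the data-locality effect in the Map phase that the abstract argues is decisive in heterogeneous clusters.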
ISBN (print): 9781509002252
Understanding the characteristics of MapReduce workloads in a Hadoop cluster is key to making optimal and efficient configuration decisions and improving system efficiency. MapReduce is a very popular parallel processing framework for large-scale data analytics that has become an effective method for processing massive data on clusters of computers. In the last decade, the numbers of customers, services, and pieces of information have increased rapidly, yielding big data analysis problems for service systems. Keeping up with the increasing volume of datasets requires efficient analytical capability to process and analyze data in two phases: mapping and reducing. Between the mapping and reducing phases, MapReduce requires a shuffle to globally exchange the intermediate data generated by the mapping. In this paper, a novel shuffling strategy is proposed to enable efficient data movement and reduction for MapReduce shuffling, based on the number of consecutive words and their counts in the word processor. To improve the scalability and efficiency of word processing in a big data environment, counting repeated consecutive words with shuffling is implemented on Hadoop. It can be applied on a widely adopted distributed computing platform, and also to big documents in a single word processor, using the MapReduce parallel processing paradigm.
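The map/shuffle/reduce pipeline for counting consecutive word pairs can be sketched in a few lines. This is a single-process illustration of the three phases only; the paper's contribution concerns how the shuffle itself is organized, which this sketch does not model.

```python
from collections import defaultdict

def map_phase(text):
    """Emit ((word_i, word_{i+1}), 1) for each consecutive word pair."""
    words = text.split()
    return [((words[i], words[i + 1]), 1) for i in range(len(words) - 1)]

def shuffle_phase(pairs):
    """Group intermediate (key, value) pairs by key: the global
    exchange step whose cost the proposed strategy targets."""
    grouped = defaultdict(list)
    for key, val in pairs:
        grouped[key].append(val)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each consecutive word pair."""
    return {key: sum(vals) for key, vals in grouped.items()}
```

The volume of data crossing the shuffle is proportional to the number of emitted pairs, which is why pre-aggregating repeated consecutive words before the exchange, as the abstract proposes, reduces network traffic.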
ISBN (print): 9781479986767
Many-core systems are designed primarily for applications with large data parallelism. Strassen matrix multiply (MM) can be formulated as a depth-first (DFS) traversal of a recursion tree in which all cores work in parallel on computing each of the NxN sub-matrices, which reduces storage at the cost of large data motion to gather and aggregate the results. We propose Strassen and Winograd algorithms (S-MM and W-MM) based on three optimizations: a set of basic algebra functions to reduce overhead, invoking an efficient library (CUBLAS 5.5), and parameter tuning of a parametric kernel to improve resource occupancy. On GPUs, W-MM and S-MM with one recursion level outperform the CUBLAS 5.5 library, running up to twice as fast for large arrays satisfying N>=2048 and N>=3072, respectively. Compared to the NVIDIA SDK library, S-MM and W-MM achieve speedups between 20x and 80x for the same arrays. The proposed approach can be used to enhance the performance of the CUBLAS and MKL libraries.
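One recursion level of Strassen, as used above, replaces eight sub-matrix products with seven at the cost of extra additions; the seven sub-products then fall back to the library multiply. A minimal CPU sketch with NumPy's matmul standing in for the tuned GPU BLAS (the paper uses CUBLAS kernels, not NumPy):

```python
import numpy as np

def strassen_one_level(A, B):
    """One recursion level of Strassen matrix multiply for square
    matrices with even N; the seven sub-products use the library
    multiply (np.matmul here, a stand-in for a tuned BLAS)."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
    # The seven Strassen products.
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    # Recombine into the four quadrants of C = A @ B.
    C = np.empty_like(A)
    C[:n, :n] = M1 + M4 - M5 + M7
    C[:n, n:] = M3 + M5
    C[n:, :n] = M2 + M4
    C[n:, n:] = M1 - M2 + M3 + M6
    return C
```

The pattern in the abstract follows from this structure: the seven half-size products dominate for large N, so the win over a plain library call appears only once N is big enough for the saved multiply to outweigh the extra matrix additions and data motion.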