ISBN:
(Print) 9783319969831; 9783319969824
Large computing systems such as data centers are becoming the mainstream infrastructure for big data processing. As one of the key data operators in such scenarios, the distributed join still challenges current techniques, since it always incurs a significant network communication cost. Various advanced approaches have been proposed to improve performance; however, most of them focus only on data skew handling, and algorithms designed specifically for communication reduction have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it computes fine-grained optimal schedules for all individual join keys, which can incur significant overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which targets reducing network communication for large distributed joins in an efficient and effective manner. We present the detailed design and implementation of LAS and conduct an experimental evaluation using large data joins. Our results show that LAS can effectively reduce scheduling overhead and achieve performance on network reduction comparable to the state of the art.
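The abstract does not spell out LAS's algorithm, so the following Python fragment is only a generic illustration of locality-aware join scheduling under one assumption: rather than computing an optimal schedule per join key, each hash partition is assigned to the node that already stores the largest share of its input. All names and the cost model are hypothetical.

    # Illustrative locality-aware scheduling (not the paper's LAS algorithm):
    # place each join partition on the node that already holds most of it,
    # so only the remotely stored share must cross the network.

    def schedule_partitions(partition_sizes):
        """partition_sizes: dict partition_id -> {node: locally stored bytes}."""
        return {pid: max(per_node, key=per_node.get)
                for pid, per_node in partition_sizes.items()}

    def network_cost(partition_sizes, schedule):
        """Bytes that must be shipped between nodes under a given schedule."""
        return sum(size
                   for pid, per_node in partition_sizes.items()
                   for node, size in per_node.items()
                   if node != schedule[pid])

    sizes = {0: {"n1": 900, "n2": 100}, 1: {"n1": 50, "n2": 700, "n3": 250}}
    plan = schedule_partitions(sizes)       # {0: 'n1', 1: 'n2'}
    print(network_cost(sizes, plan))        # 100 + 300 = 400 bytes shipped

Scheduling whole partitions instead of individual keys is what keeps such an approach lightweight; a per-key optimal schedule would replace the single max above with a global optimization over every join key.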
ISBN:
(Print) 9783030856656; 9783030856649
Although smart-device markets are increasing their sales figures, the computing capabilities of these devices are not sufficient to provide services of good enough quality. This paper proposes a solution that organizes the devices within the Cloud-Edge Continuum in such a way that each one, as an autonomous individual (agent), processes events/data on its embedded compute resources while offering its computing capacity to the rest of the infrastructure in a Function-as-a-Service manner. Unlike other FaaS solutions, the described approach transparently converts the logic of such functions into task-based workflows, building on task-based programming models; thus, the agent hosting the execution of a method generates the corresponding workflow and offloads part of the workload onto other agents to improve overall service performance. In our prototype, the function-to-workflow transformation is performed by COMPSs; thus, developers can efficiently code applications for any of the three envisaged computing scenarios (sense-process-actuate, streaming, and batch processing) throughout the whole Cloud-Edge Continuum without struggling with different frameworks specifically designed for each of them.
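To make the function-to-workflow idea concrete, here is a minimal, hypothetical task runtime in Python (deliberately not the COMPSs/PyCOMPSs API): decorating a function turns each call into an asynchronous task, so sequential-looking code implicitly builds a workflow whose tasks a runtime could offload to other agents.

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical mini task runtime, illustrative only: each decorated call
    # becomes an asynchronous task, and passing one task's future into another
    # call is what chains the calls into a workflow (a task dependency graph).

    pool = ThreadPoolExecutor()

    def task(func):
        def submit(*args):
            # Wait for upstream tasks first, then run this one asynchronously.
            resolved = [a.result() if hasattr(a, "result") else a for a in args]
            return pool.submit(func, *resolved)
        return submit

    @task
    def preprocess(reading):
        return reading * 0.5

    @task
    def aggregate(*values):
        return sum(values)

    futures = [preprocess(r) for r in (10, 20, 30)]  # three independent tasks
    total = aggregate(*futures)                      # depends on all three
    print(total.result())                            # 30.0

A real runtime such as COMPSs builds the dependency graph without blocking and can ship each task to another agent instead of a local thread pool.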
Distributed online social networks (DOSN) have emerged recently. Nevertheless, recommending friends in distributed social networks has not been fully explored. We propose BCE (Bloom Filter based Common-Friend Est...
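The abstract is cut off here, but the underlying idea of estimating common friends from Bloom filters can be sketched. Assuming the standard Swamidass-Baldi cardinality estimate n ≈ -(m/k) ln(1 - X/m), where X is the number of set bits, the overlap of two friend lists follows by inclusion-exclusion, because the filter of a union is the bitwise OR of the two filters. This generic construction is not necessarily the paper's BCE protocol.

    import math

    M, K = 1024, 3  # filter size in bits, number of hash functions

    def bloom(items):
        """Build a Bloom filter as an integer bitmask."""
        bits = 0
        for item in items:
            for seed in range(K):
                bits |= 1 << (hash((seed, item)) % M)
        return bits

    def est_cardinality(bits):
        """Swamidass-Baldi estimate of set size from the set-bit count."""
        x = bin(bits).count("1")
        return -(M / K) * math.log(1 - x / M)

    def est_common_friends(fa, fb):
        """|A & B| ~ |A| + |B| - |A union B|; the union filter is fa | fb."""
        return est_cardinality(fa) + est_cardinality(fb) - est_cardinality(fa | fb)

    alice = bloom(f"user{i}" for i in range(0, 60))    # friends 0..59
    bob = bloom(f"user{i}" for i in range(40, 100))    # friends 40..99
    print(round(est_common_friends(alice, bob)))       # close to the true 20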
ISBN:
(Print) 9781467323703
The paper presents an important aspect of cloud computing technology, namely migrating enterprise-level workloads to a cloud environment without re-architecting or re-engineering the existing applications. How readily an application can be lifted and shifted onto a cloud platform depends on factors such as the nature of the application and the type of cloud. In this respect, the paper explores the methodology of such migrations, along with the challenges and issues that usually act as barriers for organizations pursuing this goal. An effort is also made to see how the cloud migration framework maps onto the Cloud Computing Reference Architecture model. Finally, a set of migration patterns spanning the continuum from a legacy IT environment to the cloud is included as a common framework for aligning the various migration approaches developed in support of using the cloud as a delivery paradigm.
ISBN:
(Print) 0387393870
Quality assurance of multitier applications is still a challenge. Especially difficult is testing big, distributed applications written by several programmers with the use of components from different sources. Due to their multithreaded and distributed architecture, observing and profiling them is extremely difficult. J2eeprof is a new tool developed for testing and profiling multitier applications that run in the J2EE environment. The tool is based on the paradigm of aspect insertion. The main goal of j2eeprof is to help fix integration errors and efficiency errors. This paper presents the concept of j2eeprof and gives some insight into its development. We begin with an introduction to software profiling methods and a brief characterization of existing profilers, namely JFluid, IronTrack SQL, OptimizeIt ServerTrace, and JXInsight. Next we present the architecture of j2eeprof and describe how it collects data, what protocols it uses, and what kinds of analysis it supports. Finally, we demonstrate how j2eeprof works in practice. In the conclusions we list the strong and weak points of the tool, which is still in a beta version. J2eeprof is planned to be offered as open source to the programmer community.
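j2eeprof itself targets Java/J2EE; for consistency with the other sketches in this listing, the Python fragment below shows only the interception idea behind aspect insertion: a wrapper is woven around a method so that every call is observed and timed without touching the business logic. All names are made up.

    import collections
    import functools
    import time

    # Aspect-style interception, illustrative only: the decorator "advises"
    # a method so each call is timed, mimicking what a profiler weaves in.

    call_stats = collections.defaultdict(list)

    def profiled(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                call_stats[func.__qualname__].append(time.perf_counter() - start)
        return wrapper

    class OrderService:
        @profiled
        def place_order(self, n):
            time.sleep(0.01 * n)  # stands in for real business logic

    svc = OrderService()
    for n in (1, 2, 3):
        svc.place_order(n)
    for name, times in call_stats.items():
        print(f"{name}: {len(times)} calls, {sum(times):.3f}s total")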
ISBN:
(Print) 081869209X
In this paper we propose a protocol synthesis method based on a partial-order model (called event structures) for the class of context-free processes. First, we assign a unique name called an event ID to every event executable by a given service specification. An event ID is a finite sequence of symbols derived from the context-free process specification. Then we show that some interesting sets of events are expressible by regular expressions on these symbols, and that the event structure can be finitely represented by a set of relations among the regular expressions. Finally, we present a method to derive a protocol specification which implements a given service specification on distributed nodes, using the obtained finite representation of event structures. The derived protocol specification contains the minimum message exchanges necessary to ensure the partial order of events of the service specification.
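As a toy illustration of the naming step only (this is not the paper's construction), an action occurrence in a context-free process can be identified by the finite sequence of derivation steps that reaches it, which yields unique and finitely described event IDs. The process definition and ID format below are hypothetical.

    # Hypothetical process P ::= a.P.b | c, written as a context-free grammar.
    SPEC = {"P": [["a", "P", "b"], ["c"]]}

    def event_ids(proc, prefix="", depth=3):
        """Enumerate (event_id, action) pairs for derivations up to `depth`."""
        if depth == 0:
            return
        for alt, body in enumerate(SPEC[proc]):
            for pos, item in enumerate(body):
                step = f"{prefix}{alt}.{pos};"  # one symbol per derivation step
                if item in SPEC:                # nonterminal: unfold recursively
                    yield from event_ids(item, step, depth - 1)
                else:                           # terminal: an observable action
                    yield step, item

    for eid, action in event_ids("P"):
        print(eid, action)  # "0.0;" a, "0.1;0.0;" a, ..., "1.0;" c

Because the ID alphabet is finite, sets of such IDs can be captured by regular expressions, which is what makes a finite representation of the event structure possible.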
ISBN:
(Print) 9781509020287
Extraction of useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction-based datasets (finding frequent patterns) is the most important part of association rule mining. Many algorithms exist to find frequent patterns, but the Apriori algorithm remains a preferred choice due to its ease of implementation and its natural tendency to be parallelized. Many single-machine Apriori variants exist, but the massive amount of data available these days is beyond the capacity of a single machine. Therefore, to meet the demands of this ever-growing data, there is a need for an Apriori algorithm based on multiple machines. For this type of distributed application, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with a MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. But the heavy disk I/O at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms have been developed for parallel computing in recent years. Among them, two platforms, namely Spark and Flink, have attracted a lot of attention because of their built-in support for distributed computation. Earlier we proposed a reduced-Apriori algorithm on the Spark platform which outperforms parallel Apriori, firstly because of the use of Spark and secondly because of the improvement we proposed to standard Apriori. The present work is therefore a natural sequel to our earlier work and targets implementing, testing, and benchmarking Apriori on Apache Flink, comparing it with the Spark implementation. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of the Apriori algorithm on Flink. We also use community detection graph mi...
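To ground the discussion, here is a textbook Apriori pass in Python (not the authors' reduced variant): each level joins frequent (k-1)-itemsets into k-candidates, prunes them, and rescans the transactions to count support. On Hadoop, that rescan becomes a disk-heavy MapReduce job per iteration, which is exactly what in-memory engines such as Spark and Flink avoid.

    from collections import Counter
    from itertools import combinations

    def apriori(transactions, min_support):
        """Return all itemsets appearing in at least `min_support` transactions."""
        transactions = [frozenset(t) for t in transactions]
        counts = Counter(item for t in transactions for item in t)
        frequent = {frozenset([i]) for i, c in counts.items() if c >= min_support}
        all_frequent, k = set(frequent), 2
        while frequent:
            # Join step: unions of frequent (k-1)-itemsets give the k-candidates.
            candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            candidates = {c for c in candidates
                          if all(frozenset(s) in frequent
                                 for s in combinations(c, k - 1))}
            # Support counting: the rescan of the data repeated at every level.
            support = Counter(c for t in transactions for c in candidates if c <= t)
            frequent = {c for c, n in support.items() if n >= min_support}
            all_frequent |= frequent
            k += 1
        return all_frequent

    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(sorted(map(sorted, apriori(data, min_support=3))))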
The recent development of semiconductor process and design technologies enables multi-core processors to become a dominant market trend in desktop PCs as well as high-end mobile devices. At the same time, the increas...
The User-Item (U-I) matrix has been the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption at runtime and in storage, caused by data sparsity and the growing need to accommodate sid...
ISBN:
(Print) 9789811052729; 9789811052712
Nowadays data is digitized, and more and more data is generated every day about everything. Data sets that are large or complex are commonly used today. Managing this data, ensuring that data from varied sources is processed error-free and is of good quality, and performing the analysis, processing, and sharing of such large data are difficult with traditional methods. So we need systems that are more flexible, scalable, fault-tolerant, compatible, and cheap in order to process large amounts of data. Hadoop is designed to handle extremely high volumes of data in any structure. There are various ways for Hadoop to run a job; three programming approaches are used: MapReduce, Hive, and Pig. In this paper, we present a comparison of MapReduce, Hive, and Pig. These three techniques are useful under different constraints.
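The practical difference among the three approaches is the level of abstraction: with MapReduce the programmer hand-writes the map and reduce functions that Hadoop distributes, whereas Hive (SQL-like) and Pig (Pig Latin) generate equivalent jobs from a declarative script. Below is a minimal local simulation of the MapReduce model in Python, illustrative only; on Hadoop these phases run across a cluster.

    from collections import defaultdict

    def map_phase(record):
        for word in record.split():
            yield word.lower(), 1             # emit (key, value) pairs

    def reduce_phase(key, values):
        return key, sum(values)               # aggregate all values per key

    def run_job(records):
        shuffled = defaultdict(list)          # shuffle: group values by key
        for record in records:
            for key, value in map_phase(record):
                shuffled[key].append(value)
        return dict(reduce_phase(k, vs) for k, vs in shuffled.items())

    docs = ["big data needs big systems", "hadoop handles big data"]
    print(run_job(docs))  # {'big': 3, 'data': 2, 'needs': 1, ...}

In Hive the same aggregation would be roughly a single SELECT ... GROUP BY query, and in Pig a few lines of Pig Latin, with the framework generating the map and reduce phases shown above.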