MapReduce is one of the leading programming frameworks for implementing data-intensive applications by distributing map and reduce tasks across servers. Although there has been a substantial amount of work on map task scheduling and optimization in the literature, work on reduce task scheduling is very limited. Effective scheduling of reduce tasks onto resources becomes especially important for the performance of data-intensive applications in which large amounts of data are moved between the map and reduce tasks. In this paper, we propose a new algorithm (LoNARS) for reduce task scheduling, which takes both data locality and network traffic into consideration. Data locality awareness aims to schedule the reduce tasks closer to the map tasks to decrease the delay in data access as well as the amount of traffic pushed to the network. Network traffic awareness intends to distribute the traffic over the whole network and to minimize hotspots, reducing the effect of network congestion on data transfers. We have integrated LoNARS into Hadoop-1.2.1. Using our LoNARS algorithm, we achieved up to a 15% gain in data shuffling time and up to a 3-4% improvement in total job completion time compared to other reduce task scheduling algorithms. Moreover, we reduced the amount of traffic on network switches by 15%, which helps to reduce energy consumption considerably.
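The core idea of a locality- and traffic-aware placement can be sketched as a weighted cost function. The following is a minimal illustrative sketch, not the paper's actual formulation: the hop table, the load model, and the weight `alpha` are all hypothetical placeholders.

```python
# Hypothetical topology: switch hops between racks (same rack = 1 hop).
HOPS = {
    ("r1", "r1"): 1, ("r1", "r2"): 3,
    ("r2", "r1"): 3, ("r2", "r2"): 1,
}

def placement_score(node, map_outputs, link_load, alpha=0.5):
    """Lower is better: weighs data-movement cost against network hotspot cost."""
    # Locality term: bytes each map output must travel to reach `node`.
    locality = sum(size * HOPS[(src, node)] for src, size in map_outputs.items())
    # Traffic term: current load on the links toward `node`.
    return alpha * locality + (1 - alpha) * link_load[node]

def schedule_reduce(nodes, map_outputs, link_load):
    """Pick the candidate node with the lowest combined cost."""
    return min(nodes, key=lambda n: placement_score(n, map_outputs, link_load))
```

With all links idle, the reduce task lands on the rack holding the map output; once that rack's links become congested, the traffic term pushes the task to a less loaded rack, which is the trade-off the abstract describes.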
ISBN:
(Print) 9783319202334; 9783319202327
Cloud computing is a new paradigm for using ICT services: only when needed, for as long as needed, and paying only for the service actually consumed. Benchmarking the growing number of cloud services is crucial for market growth and perceived fairness, and for service design and tuning. In this work, we propose a generic architecture for benchmarking cloud services. Motivated by recent demand for data-intensive ICT services, and in particular by the processing of large graphs, we adapt the generic architecture to Graphalytics, a benchmark for distributed and GPU-based graph analytics platforms. Graphalytics focuses on the dependence of performance on the input dataset, the analytics algorithm, and the provisioned infrastructure. The benchmark provides components for platform configuration, deployment, and monitoring, and has been tested on a variety of platforms. We also propose a new challenge for the process of benchmarking data-intensive services, namely the inclusion of the data-processing algorithm in the system under test; this significantly increases the relevance of benchmarking results, albeit at the cost of increased benchmarking duration.
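The three performance dimensions the abstract names (dataset, algorithm, infrastructure) amount to sweeping a run matrix. This sketch shows that structure only; the function names and the `execute` callback are assumptions, not the Graphalytics API.

```python
import itertools
import time

def run_benchmark(platforms, algorithms, datasets, execute):
    """Time one run per (platform, algorithm, dataset) combination.

    `execute` is a hypothetical callback that deploys and runs one job;
    in a real harness it would also handle configuration and monitoring.
    """
    results = {}
    for plat, algo, data in itertools.product(platforms, algorithms, datasets):
        start = time.perf_counter()
        execute(plat, algo, data)
        results[(plat, algo, data)] = time.perf_counter() - start
    return results
```

Including the data-processing algorithm in the system under test, as the abstract proposes, corresponds to letting `execute` vary the algorithm implementation per platform rather than fixing a reference implementation.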
ISBN:
(Print) 9781509061594
Large, heterogeneous, and complex data collections can be difficult to analyze and manage manually. There is a need for scalable and user-friendly approaches that can automate the analysis and management of such collections in a timely and efficient manner. To meet this need, we are developing a system named Pecos, which combines (1) an Android application, (2) cloud computing middleware and resources, and (3) a High Performance Computing (HPC) platform and software for performing large-scale data analysis through an easy-to-use interface. Currently, Pecos can be used to analyze and manage data collections residing on Google Drive or on remote Linux systems accessible via an SSH connection. Some of the steps in the data analysis and management process already enabled through Pecos are (1) content-based classification and clustering of documents, (2) filtering or searching documents on the basis of a text pattern, (3) performing checksum analysis and metadata extraction, (4) supporting format conversion of documents, and (5) developing visualizations for analyzing the collection. Work is in progress to (1) optimize the algorithms in Pecos for document analysis, (2) develop a recommendation system, and (3) integrate functionality for analyzing data from social media.
ISBN:
(Print) 9783319706245
The proceedings contain 22 papers. The special focus in this conference is on Conceptual Modeling. The topics include: towards an ontology for strategic decision making: the case of quality in rapid software development projects; detecting bad smells of refinement in goal-oriented requirements analysis; requirements engineering for data warehouses (RE4DW): from strategic goals to multidimensional model; towards formal strategy analysis with goal models and semantic web technologies; assisting process modeling by identifying business process elements in natural language texts; using multidimensional concepts for detecting problematic sub-KPIs in analysis systems; automatically annotating business process models with ontology concepts at design-time; OPL-ML: a modeling language for representing ontology pattern languages; evaluating quality issues in BPMN models by extending a technical debt software platform; data modelling for dynamic monitoring of vital signs: challenges and perspectives; utility-driven data management for data-intensive applications in fog environments; assessing the positional planimetric accuracy of DBpedia georeferenced resources; assessing the completeness evolution of DBpedia: a case study; towards care systems using model-driven adaptation and monitoring of autonomous multi-clouds; clustering event traces by behavioral similarity; goal-based selection of visual representations for big data analytics; a four V's design approach of NoSQL graph databases; towards efficient and informative omni-channel customer relationship management; stream clustering of chat messages with applications to Twitch streams; towards consistent demarcation of enterprise design domains.
ISBN:
(纸本)9780769546766
Parallelization of time-dependent partial differential equations (PDEs) can be accomplished by time decomposition using the parareal algorithm. While the parareal algorithm was designed to enable real-time simulations, it holds particular promise for long-time simulations on computational grids and clouds, due to its low communication overhead and potential for adaptation to heterogeneous processors. This contribution extends previous work on the scheduling of tasks of the parareal algorithm to resources with heterogeneous CPU performance. Experiments on Amazon's EC2 show the suitability of this algorithm for execution on a heterogeneous cloud platform and its insensitivity to network latency.
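The parareal iteration itself can be shown on a toy problem. This is a minimal sketch for the scalar ODE dy/dt = -y, assuming a cheap coarse propagator G (one Euler step per time slice) and an expensive fine propagator F (many Euler sub-steps); the expensive F calls are the independent per-slice tasks that the paper schedules onto heterogeneous CPUs. The solvers and parameters here are illustrative, not the paper's setup.

```python
import math

def G(y, dt):
    """Coarse propagator: a single explicit Euler step."""
    return y * (1 - dt)

def F(y, dt, m=100):
    """Fine propagator: m Euler sub-steps; in parareal these run in parallel per slice."""
    h = dt / m
    for _ in range(m):
        y *= (1 - h)
    return y

def parareal(y0, T, n_slices=10, iters=5):
    """Parareal correction: U_{n+1} = G(U_n^new) + F(U_n^old) - G(U_n^old)."""
    dt = T / n_slices
    # Serial coarse sweep produces the initial guess at each slice boundary.
    U = [y0]
    for n in range(n_slices):
        U.append(G(U[n], dt))
    for _ in range(iters):
        fine = [F(U[n], dt) for n in range(n_slices)]    # embarrassingly parallel
        coarse_old = [G(U[n], dt) for n in range(n_slices)]
        U_new = [y0]
        for n in range(n_slices):
            U_new.append(G(U_new[n], dt) + fine[n] - coarse_old[n])
        U = U_new
    return U[-1]
```

The communication pattern explains the abstract's latency claim: each iteration exchanges only one scalar state per slice boundary, while the dominant cost is the independent fine solves.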