A MapReduce scheduling algorithm plays a critical role in managing large clusters of hardware nodes and meeting multiple quality requirements by controlling the order and distribution in which users, jobs, and tasks execute. A comprehensive and structured survey of the scheduling algorithms proposed so far is presented here using a novel multidimensional classification framework. These dimensions are (i) meeting quality requirements, (ii) scheduling entities, and (iii) adapting to dynamic environments; each dimension has its own taxonomy. An empirical evaluation framework for these algorithms is recommended. This survey identifies various open issues and directions for future research.
In this paper, the design of a finite-memory partition system for the detection of a constant signal in psi-mixing noise is investigated. It is found that the new detector converges to the locally optimal finite-memory detector, which is practically intractable and characterized by a multidimensional Fredholm integral equation of the second kind. The new detector encompasses many classes of known detectors. Numerical calculations demonstrate that the finite-memory detector compares favorably, using asymptotic relative efficiency as a fidelity criterion, to other classes of detectors even if extremes of dependent noise distributions are considered. The same calculations also suggest that a dependent process may be treated as an M-dependent process in finite-memory detectors without causing significant detrimental effects, provided M is sufficiently large. To reduce excessive computational complexity, a priori knowledge regarding properties of system parameters (such as matrix symmetry) as well as noise distributions (especially Gaussian and its independently nonlinear transformations) is exploited. Generalizations and extensions of the proposed detectors are also discussed. The operation of the detector may be easily extended to include adaptability and/or sequential operation.
This paper presents an effort to mitigate overheads, latencies, and limitations observed in message-driven runtime frameworks by utilizing lightweight threads tightly integrated with message passing. It also introduces new abstractions and features for group communication as well as fine-grained concurrency on top of remote method invocations to improve workload balancing in shared and distributed memory. We observe up to a 100% difference in performance for task creation and message handling. Evaluations on 1000 cores (25 nodes) of a distributed memory machine showed that the integration of fine-grained concurrency with the runtime achieves performance improvements of 12% on a seismic wave simulation benchmark, as opposed to a 50% degradation with OpenMP. Moreover, a 3D mesh refinement application showed a 50% improvement, exploiting multi-grain parallelism at the data and task levels.
A predominant ('base') scheme currently used in many circuit-switched LAN-WAN gateway applications is based on device-driver LAN redirections. Various functional and performance deficiencies associated with the LAN redirection (base) scheme and an RPC programming interface are described. This RPC definition supports the base scheme, as well as another ('enhanced') scheme. The enhanced scheme overcomes the various outlined functional deficiencies associated with the cited implementations of the base scheme. Implementations of these two client-server application schemes on top of the IEEE portable operating-system interface and the Open Software Foundation RPC are described. The performance of the two schemes is compared, based both on a simplified Markov chain model and on actual implementations and long-duration experiments. Some of the similarities between the predicted simple-model behaviors and the actual experimental results for the applications are encouraging. This experience shows that the described RPC programming interface is a reasonable candidate for immediate practical implementation on heterogeneous systems.
Bitcoin is the leading example of a blockchain application that facilitates peer-to-peer transactions without the need for a trusted third party. This paper considers possible attacks related to the decentralized network architecture of Bitcoin. We perform a data-driven study of Bitcoin and present possible attacks based on spatial and temporal characteristics of its network. To that end, we revisit prior work on the centralization of Bitcoin nodes over the Internet through a fine-grained analysis of network distribution, and highlight the increasing centralization of the Bitcoin network over time. As a result, we show that Bitcoin is vulnerable to spatial, temporal, spatio-temporal, and logical partitioning attacks with an increased attack feasibility due to the network dynamics. We verify our observations through data-driven analyses and simulations, and discuss the implications of each attack on the Bitcoin network. We conclude with suggested countermeasures.
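One way to quantify the spatial centralization this abstract refers to is to resolve reachable nodes to their hosting networks and measure how many nodes the largest networks account for. The sketch below illustrates that metric only; the node-to-AS mapping is entirely hypothetical, and a real study would resolve crawled node IPs to autonomous systems using BGP routing data.

```python
# Hedged sketch of a simple spatial-centralization metric (our own
# illustration, not the paper's methodology). The mapping below is
# made-up data; a real measurement maps reachable-node IPs to ASes.
from collections import Counter

node_as = {  # hypothetical: node id -> autonomous system number
    "n1": 16509, "n2": 16509, "n3": 16509, "n4": 24940,
    "n5": 24940, "n6": 14061, "n7": 3356, "n8": 16509,
}

def top_k_share(mapping, k):
    """Fraction of nodes hosted in the k largest ASes -- a crude
    centralization score; higher values suggest that partitioning
    the network spatially (by AS) is easier."""
    counts = Counter(mapping.values())
    top = sum(c for _, c in counts.most_common(k))
    return top / len(mapping)

print(round(top_k_share(node_as, 2), 3))  # 6 of 8 nodes -> 0.75
```

Tracking such a share over crawl snapshots would expose the temporal trend toward centralization that the paper reports.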
The World Wide Web has become the largest single potential source of processing power. By coupling CPU time donated by volunteers, researchers and industry have the ability to execute applications that traditionally were in the domain of supercomputer users. This paper presents one such attempt at creating a system capable of exploiting this abundance of processing power. It is based on an inherently parallel model of computing. The concepts behind the computational model are explained and the implementation details are illustrated. The paper presents results obtained from various tests of this implementation.
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers. When a fixed 1/m fraction of each matrix can be stored at each worker node, Polynomial codes require m^2 successful workers, while our MatDot codes only require 2m - 1 successful workers. However, MatDot codes have higher computation cost per worker and higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Furthermore, we propose "PolyDot" coding that interpolates between Polynomial codes and MatDot codes to trade off computation/communication costs and recovery thresholds. Finally, we demonstrate a novel coding technique for multiplying n matrices (n >= 3) using ideas from MatDot and PolyDot codes.
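The 2m - 1 recovery threshold can be illustrated for m = 2: split A by columns and B by rows so that A@B = A1@B1 + A2@B2, encode the blocks as matrix polynomials, and have each worker evaluate the product at one point; any 2m - 1 = 3 workers then let the fusion node interpolate the coefficient of x^(m-1), which is the full product. The sketch below is our own toy instance (matrix sizes and evaluation points x = 1, 2, 3 are arbitrary choices), not the paper's systematic construction.

```python
# MatDot sketch for m=2 (illustrative, pure Python; the data and the
# evaluation points are our own choices, not from the paper).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lin(a, X, b, Y):  # a*X + b*Y, elementwise
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 1], [0, 1, 1, 2], [3, 1, 0, 0], [1, 2, 1, 3]]

# Column-split A = [A1 | A2] and row-split B = [B1 ; B2],
# so A@B = A1@B1 + A2@B2.
A1 = [row[:2] for row in A]; A2 = [row[2:] for row in A]
B1 = B[:2]; B2 = B[2:]

# Worker at point x evaluates pA(x) @ pB(x), where
# pA(x) = A1 + x*A2 and pB(x) = x*B1 + B2.
# The product is A1@B2 + (A1@B1 + A2@B2)*x + A2@B1*x^2,
# so the x^1 coefficient is exactly A@B.
def worker(x):
    return matmul(lin(1, A1, x, A2), lin(x, B1, 1, B2))

# Any 2m-1 = 3 workers suffice; fusion interpolates the x^1 coefficient
# from evaluations at x = 1, 2, 3: c1 = (-5*P1 + 8*P2 - 3*P3) / 2.
P1, P2, P3 = worker(1), worker(2), worker(3)
C = [[(-5 * a + 8 * b - 3 * c) // 2 for a, b, c in zip(r1, r2, r3)]
     for r1, r2, r3 in zip(P1, P2, P3)]

assert C == matmul(A, B)
```

Each worker here multiplies n x (n/2) by (n/2) x n blocks and ships an n x n result, which reflects the higher per-worker computation and communication costs the abstract notes relative to Polynomial codes.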
Air quality regulations in the United States require thousands of planners and technicians in state and local agencies, and in corporations, to model various pollution scenarios. This is a first step towards devising optimal ways of limiting emissions to achieve statutory goals for protecting health. These end users are not necessarily computer experts or scientists: they need a sophisticated but ready-to-go computational system for simulating pollution and its control. The system must incorporate menus, mice, and the other features of readily understood GUIs for ease of use, but must also be able to take advantage of distributed data sets, high-bandwidth communication networks, and supercomputer power for state-of-the-art modeling. Many people need these tools and they are all trying to follow similar laws and rules, implying that, while flexibility is needed, a somewhat standardized system is in order so that apples can be compared with apples. While consumer and business software has advanced in ease of use by leaps and bounds over the last decade, the interfaces for many scientific and engineering applications are still stuck in the pregraphical, user-unfriendly age. Moreover, tools for organizing the proliferating input and output data that go along with these applications are often most diplomatically described as quaint. GEMS, the Geographic Environmental Modeling System, is part of an effort to turn that tide. GEMS itself is not a computational model for predicting air pollution but rather an object-oriented working environment with a graphical interface, designed to make the scientific models easier to use. Software engineers initiated GEMS, but quickly realized that to not only design the system right but to design the right system, end users had to be integrally involved at every stage. The authors, two software experts and two noted pollution modeling experts, have something to teach anyone who wants to develop a computational system for science and engineering.
The notion of computational resiliency refers to the ability of a distributed application to tolerate intrusion when under information warfare (IW) attack. This technology seeks an active strengthening of a military mission, rather than protecting its network infrastructure using static defensive measures such as network security, intrusion sensors, and firewalls. Computational resiliency involves the dynamic use of replication, guided by mission policy, to achieve intrusion tolerance so that even undetected attacks do not cause mission failure; however, it goes further to dynamically regenerate replication in response to an IW attack, allowing the level of system assurance to be restored and maintained. Replicated structures are protected through several techniques such as camouflage, dispersion, and layered security policy. This paper describes a prototype concurrent programming technology that we have developed to support computational resiliency and describes how the library has been applied in two prototypical applications. Copyright (C) 2002 John Wiley & Sons, Ltd.
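The replicate-and-regenerate idea can be sketched in a few lines: a mission policy fixes a replication level, and when an attack removes a replica, the system spawns replacements until the policy level is restored. This is our own toy model of the concept, not the paper's library or its API; the policy level and replica naming are invented for illustration.

```python
# Minimal sketch of policy-driven replication with regeneration
# (our own toy model, not the paper's concurrent programming library).
POLICY_LEVEL = 3  # hypothetical mission-policy replication level

def regenerate(replicas, spawn):
    """Restore the replica set to POLICY_LEVEL after losses, so the
    level of assurance is maintained despite an attack."""
    while len(replicas) < POLICY_LEVEL:
        replicas.add(spawn())
    return replicas

counter = iter(range(100))
spawn = lambda: f"replica-{next(counter)}"

replicas = {spawn() for _ in range(POLICY_LEVEL)}  # initial replication
replicas.discard(next(iter(replicas)))             # simulated intrusion removes one
replicas = regenerate(replicas, spawn)             # dynamic regeneration
assert len(replicas) == POLICY_LEVEL
```

A real system would additionally disperse and camouflage the replicas across hosts, as the abstract describes, rather than track them in a single set.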
This paper examines the fundamental problems and goals associated with test, verification, and flight-certification of man-rated distributed data systems. First, a summary of the characteristics of modern computer systems that affect the testing process is provided. Then, verification requirements are expressed in terms of an overall test philosophy for distributed computer systems. This test philosophy stems from previous experience that was gained with centralized systems (Apollo and the Space Shuttle), and deals directly with the new problems that verification of distributed systems may present. Finally, a description of potential hardware and software tools to help solve these problems is provided.