Green computing is a recent trend in computer science that aims to reduce the energy consumption and carbon footprint of computers on distributed platforms such as clusters, grids, and clouds. Traditional scheduling solutions attempt to minimize processing times without taking the energy cost into account. One method for reducing energy consumption is to provide scheduling policies that allocate tasks to specific resources, a choice that affects both processing time and energy consumption. In this paper, we propose a real-time dynamic scheduling system that efficiently executes task-based applications on distributed computing platforms in order to minimize energy consumption. Scheduling tasks on multiprocessors is a well-known NP-hard problem, and computing an optimal solution is not feasible; we therefore present a polynomial-time algorithm that combines a set of heuristic rules with a resource allocation technique to obtain good solutions on an affordable time scale. The proposed algorithm minimizes a multi-objective function that combines energy consumption and execution time according to an energy-performance importance factor provided by the resource provider or user, while also taking into account sequence-dependent setup times between tasks, setup times and down times for virtual machines (VMs), and energy profiles for different architectures. A prototype implementation of the scheduler has been tested with different kinds of randomly generated DAGs as well as with real task-based COMPSs applications. We have tested the system with instances of different sizes and with different importance factors, and we have evaluated which combination provides better solutions and energy savings. Moreover, we have evaluated the overhead introduced by measuring the time needed to obtain scheduling solutions for different numbers of tasks, kinds of DAG, and resources, concluding that our method is suitable for run-time scheduling. (C) 2016 Elsevier B.V. All rights rese
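The abstract does not give the exact form of the multi-objective function, so the following is only a minimal sketch of one common way to combine energy and time with an importance factor. The names `Candidate`, `weighted_cost`, `pick_resource`, and `alpha`, the max-based normalization, and the sample numbers are all assumptions made for illustration; setup times, VM down times, and task dependencies are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical estimate for running one task on one resource."""
    resource: str
    time_s: float    # estimated execution time in seconds
    energy_j: float  # estimated energy in joules


def weighted_cost(c: Candidate, alpha: float, t_max: float, e_max: float) -> float:
    """Weighted sum of normalized energy and time.

    alpha = 1.0 considers energy only, alpha = 0.0 considers time only.
    """
    return alpha * (c.energy_j / e_max) + (1.0 - alpha) * (c.time_s / t_max)


def pick_resource(candidates: List[Candidate], alpha: float) -> Candidate:
    """Return the candidate with the lowest weighted cost."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Normalize by the largest observed value so both terms are unitless.
    t_max = max(c.time_s for c in candidates)
    e_max = max(c.energy_j for c in candidates)
    return min(candidates, key=lambda c: weighted_cost(c, alpha, t_max, e_max))


# Illustrative (made-up) estimates for a single task.
candidates = [
    Candidate("fast_node", time_s=10.0, energy_j=500.0),
    Candidate("efficient_node", time_s=25.0, energy_j=200.0),
]

for alpha in (0.0, 0.8, 1.0):
    print(alpha, pick_resource(candidates, alpha).resource)
```

With these made-up estimates, a time-only factor selects the faster node, while factors that weight energy heavily select the more efficient one.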
Coded distributed computing has been considered as a promising technique which makes large-scale systems robust to the "straggler" workers. Yet, practical system models for distributed computing have not bee...
ISBN: 9781538655559 (print)
We present an automated analysis technique that leverages artificial neural networks to identify possible causes of sub-optimal execution of task-parallel programs. Performance anomalies in task-parallel programs are often extremely difficult to analyze due to the complexity of the interactions between dynamic runtime systems and hardware. While hardware performance monitoring is a common technique for capturing hardware behavior, understanding how the resulting hardware event profiling data relates to task performance is often non-trivial and time-consuming. In this work, we present an automated technique for task-parallel performance analysis that identifies the hardware behaviors that have the greatest impact on task performance. Our technique uses artificial neural networks to model these relationships, allowing the specific hardware events that contribute most to slowing down task execution to be isolated. We show that our technique provides new insights into task-parallel execution behavior and accelerates the performance optimization process.
ISBN: 9781450365239 (print)
In this work, we introduce slot selection and co-allocation algorithms for parallel jobs in distributed computing with non-dedicated and heterogeneous resources (clusters, CPU nodes equipped with multicore processors, networks, etc.). A single slot is a time span that can be assigned to a task, which is a part of a parallel job. Launching a job requires the co-allocation of a specified number of slots that start and finish synchronously. The challenge is that slots associated with different heterogeneous resources of distributed computing environments may have arbitrary start and finish points, as well as different performance, latency, and pricing policies. Some existing algorithms assign a job to the first set of slots matching the resource request without any optimization (the first-fit type), while other algorithms are based on an exhaustive search. In this paper, algorithms for efficient slot selection are studied and compared with known approaches. The novelty of the proposed approach lies in a general algorithm that efficiently selects a set of slots according to a specified criterion.
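The difference between first-fit selection and selection by a criterion can be illustrated with a small sketch. This is not the algorithm proposed in the paper: it assumes every task needs the same duration on every resource, uses total price as the only criterion, and the names `Slot`, `feasible`, `first_fit`, and `min_cost` as well as the sample slots are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Slot:
    """A free time span on one resource (hypothetical model)."""
    resource: str
    start: float
    end: float
    price: float  # cost per time unit


def feasible(slots: List[Slot], t: float, duration: float) -> List[Slot]:
    """Slots that fully contain the window [t, t + duration]."""
    return [s for s in slots if s.start <= t and t + duration <= s.end]


def first_fit(slots: List[Slot], n: int, duration: float) -> Optional[Tuple[float, List[Slot]]]:
    """Earliest start time at which any n slots can run synchronously."""
    # Only slot start times need checking: any feasible window can be
    # shifted left to the latest start among its chosen slots.
    for t in sorted({s.start for s in slots}):
        fit = feasible(slots, t, duration)
        if len(fit) >= n:
            return t, fit[:n]
    return None


def min_cost(slots: List[Slot], n: int, duration: float) -> Optional[Tuple[float, float, List[Slot]]]:
    """Window with the lowest total price, as (cost, start, chosen slots)."""
    best = None
    for t in sorted({s.start for s in slots}):
        fit = sorted(feasible(slots, t, duration), key=lambda s: s.price)
        if len(fit) >= n:
            cost = sum(s.price for s in fit[:n]) * duration
            if best is None or cost < best[0]:
                best = (cost, t, fit[:n])
    return best


# Made-up slots: two expensive early ones, two cheap later ones.
slots = [
    Slot("a", 0.0, 10.0, 5.0),
    Slot("b", 0.0, 10.0, 4.0),
    Slot("c", 2.0, 12.0, 1.0),
    Slot("d", 3.0, 12.0, 1.0),
]

print(first_fit(slots, n=2, duration=4.0))
print(min_cost(slots, n=2, duration=4.0))
```

On this sample, first fit starts the job at time 0 on the two expensive slots, whereas the cost criterion delays the start to time 3 and picks the two cheap ones.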
Betweenness centrality quantifies the importance of nodes in a graph in many applications, including network analysis, community detection, and identification of influential users. Typically, graphs in such applications evolve over time. Thus, the computation of betweenness centrality should be performed incrementally. This is challenging because updating even a single edge may trigger the computation of all-pairs shortest paths in the entire graph. Existing approaches cannot scale to large graphs: they either require excessive memory (i.e., quadratic in the size of the input graph) or perform unnecessary computations, rendering them prohibitively slow. We propose iCENTRAL, a novel incremental algorithm for computing betweenness centrality in evolving graphs. We decompose the graph into biconnected components and prove that processing can be localized within the affected components. iCENTRAL is the first algorithm to support incremental betweenness centrality computation within a graph component. This is done efficiently, in linear space; consequently, iCENTRAL scales to large graphs. We demonstrate with real datasets that the serial implementation of iCENTRAL is up to 3.7 times faster than existing serial methods. Our parallel implementation, which scales to large graphs, is an order of magnitude faster than the state-of-the-art parallel algorithm while using an order of magnitude less computational resources.
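For reference, the quantity being maintained can be computed from scratch with Brandes' algorithm. The sketch below is that static baseline for unweighted, undirected graphs, not iCENTRAL itself; the function name `betweenness` and the two small example graphs are chosen only for illustration. It shows how adding one edge changes the score of every node, which is the recomputation cost an incremental method tries to avoid.

```python
from collections import deque
from typing import Dict, Hashable, List


def betweenness(adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, float]:
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to its neighbor list; every edge must appear
    in both directions.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}


# Path a-b-c-d, then the same graph with the edge (a, d) added.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}

print(betweenness(path))
print(betweenness(cycle))
```

Recomputing like this costs one breadth-first search per node after every update, which is why localizing the work to the affected biconnected component matters on large graphs.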
ISBN: 9781538646403 (print)
Although the principles of real-time collaborative editing have been explored since the 1980s, team collaboration software that facilitates the completion of tasks as a group continues to be a very active research topic. A series of theoretical and practical results obtained by the research and industrial communities originated in the theory of distributed computing. They were devised for managing the concurrent nature of user actions and for maintaining the consistency of data as changes are introduced randomly, by multiple users, and in real time. As such, centralized collaborative editing servers were designed to allow users to work in parallel on a document from a typical web browser. In order to maintain the consistency of content that is modified at different sites in different orders, Operational Transformation (OT) mechanisms are at the core of the collaboration servers that enable web-based co-editing. However, as expected of modern web application deployments, a centralized OT algorithm must also exhibit properties such as scalability and reliability. In this paper, the processes involved in the client-server interactions of OT are modeled as real-time systems using Finite State Machine (FSM) theory. The consistency of the data is controlled by formal groups of FSMs. Hierarchical FSMs are used to define and simulate the real-time behavior of client and server components when processing and transforming changes initiated by users. The FSM-based OT implementation is tested using random inputs, and the approach is shown to be helpful for organizing and managing the complex distributed aspects of such algorithms.
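The core idea of OT, rewriting a concurrent operation so that sites applying changes in different orders still converge, can be shown for the simplest case of two concurrent insertions. This is a generic textbook-style sketch and not the FSM-based design of the paper; the `Insert` type, the `site` tie-breaker, and the function names are assumptions made for illustration, and deletions are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Insert `text` at index `pos`; `site` breaks ties between equal positions."""
    pos: int
    text: str
    site: int


def apply_op(doc: str, op: Insert) -> str:
    """Apply an insertion to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has been applied.

    If `other` inserted at an earlier position (or at the same position
    with a lower site id), `op` shifts right by the inserted length.
    """
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


def converge(doc: str, a: Insert, b: Insert) -> tuple:
    """Results at two sites that apply a and b in opposite orders."""
    at_site_a = apply_op(apply_op(doc, a), transform(b, a))
    at_site_b = apply_op(apply_op(doc, b), transform(a, b))
    return at_site_a, at_site_b


# Two users insert concurrently into "abc".
print(converge("abc", Insert(1, "X", site=1), Insert(2, "Y", site=2)))
# Same position: the site id decides the order.
print(converge("abc", Insert(1, "X", site=1), Insert(1, "Y", site=2)))
```

Both sites end with the same string in each case; the tie-breaker is what keeps same-position inserts from diverging.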
Unmanned aerial systems (UAS) are becoming a common tool for aerial sensing applications. Nevertheless, sensed data need further processing before becoming useful information. This processing requires large computing power and time before delivery. In this paper, we present a parallel architecture that includes an unmanned aerial vehicle (UAV), a small embedded computer on board, a communication link to the Internet, and a cloud service, with the aim of providing useful real-time information directly to end users. The potential of parallelism as a solution in remote sensing has not been addressed for a distributed architecture that includes the UAV processors. The architecture is demonstrated for a specific problem: counting olive trees in a crop field where the trees are regularly spaced. During the flight, the embedded computer is able to process individual images on board the UAV and provide the total count. The tree counting algorithm obtains an F1 score of 99.09% for a sequence of ten images with 332 olive trees. The detected trees are geolocated and can be visualized on the Internet seconds after take-off, with no further processing required. This is a use case that demonstrates near real-time results obtained from UAS usage. Other, more complex UAS applications, such as tree inventories, search and rescue, fire detection, or stock breeding, can potentially benefit from this architecture and obtain faster outcomes that are accessible while the UAV is still in flight.
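As a reminder of how an F1 score is derived from detection counts, here is a minimal sketch. The abstract reports only the final score, so the true-positive, false-positive, and false-negative counts below are hypothetical and are not the paper's data; `f1_score` is an illustrative helper, not part of the described system.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        # No detections and no ground-truth objects: define the score as 0.
        return 0.0
    return 2 * tp / denom


# Hypothetical counts: 95 trees found, 5 false detections, 5 trees missed.
print(f"{f1_score(tp=95, fp=5, fn=5):.2%}")
```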
The need for fast and efficient data communication has progressively introduced novel and agile design requirements for the systems developed today. As the scope and capability of a system increase, the need arises for rapid, accurate, and real-time data sharing among a large number of hardware and software components in the environment. Data Distribution Service (DDS) is a middleware standard that has been used in the development of distributed systems and has become popular in many sectors in recent years. In this paper, the test approach applied to a DDS-based system is discussed. This study, which we call "Data Distribution Based Distributed System Test Approach", presents a flexible and expandable test infrastructure that uses the dynamic environment recognition capability provided by DDS. In addition, the life cycle stages of the related system are designed in a model-based manner, and the software test analysis and test design phases are also model-based. The study presents a hybrid analysis that supports reliable data communication while reducing effort, especially in distributed and real-time systems.
ISBN: 9781510616936 (digital)
ISBN: 9781510616936 (print)
Distributed optical fiber sensors are an increasingly utilized method of gathering distributed strain and temperature data. However, the large amount of data they generate presents a challenge that limits their use in real-time, in-situ applications. This letter describes a parallel and pipelined computing architecture that accelerates the signal-processing speed of sub-terahertz fiber sensor (sub-THz-fs) arrays, maintaining high spatial resolution while allowing expanded use in real-time sensing and control applications. The computing architecture described was successfully implemented in a field-programmable gate array (FPGA) chip. The signal processing for the entire array takes only 12 system clock cycles. In addition, this design removes the need to store any raw or intermediate data.
ISBN: 9781538643341 (print)
Most current automatic speech recognition is performed on a remote server. However, the demand for speech recognition on personal devices is increasing, owing to the need for shorter recognition latency and increased privacy. End-to-end speech recognition that employs recurrent neural networks (RNNs) shows good accuracy, but the execution of conventional RNNs, such as long short-term memory (LSTM) or gated recurrent unit (GRU) networks, demands many memory accesses, which hinders real-time execution on smartphones or embedded systems. To solve this problem, we built an end-to-end acoustic model (AM) using linear recurrent units instead of LSTM or GRU and employed a multi-step parallel approach to reduce the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and decoding is conducted using weighted finite-state transducers (WFSTs). The proposed system runs at 4.8 times real-time speed when executed on a single core of an ARM CPU-based system.
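To make the role of CTC concrete, the sketch below shows the standard CTC collapsing rule that maps per-frame labels to an output sequence: merge consecutive repeats, then drop blanks. Note that this is simple best-path collapsing, a swapped-in simplification, and not the WFST-based decoding the abstract describes; the function name `ctc_collapse`, the `_` blank symbol, and the sample frame string are assumptions for illustration.

```python
from typing import Iterable


def ctc_collapse(frame_labels: Iterable[str], blank: str = "_") -> str:
    """Collapse per-frame CTC labels: merge repeats, then remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


# A blank between the two runs of "l" keeps the double letter.
print(ctc_collapse("hh_e_ll_llo"))
```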