ISBN: (Print) 9780769549392; 9781467353212
In this paper, data-intensive computing methods are studied for the large data volumes produced by computed tomography (CT). An automatic workflow is built to connect the tomography beamline at ANKA with the Large Scale Data Facility (LSDF), improving data storage and analysis efficiency. Within this workflow, the paper focuses on parallel 3D CT reconstruction. In contrast to the existing reconstruction system based on filtered back-projection, an algebraic reconstruction technique based on compressive sampling theory is presented to reconstruct ultrafast CT data from fewer projections. The computing resources attached to the LSDF are then used to perform the 3D reconstruction by splitting the whole job into multiple tasks executed in parallel. Promising reconstruction images and high computing performance are reported: a full 3D X-ray CT reconstruction takes less than six minutes. The LSDF is thus not only able to organize data efficiently, but can also deliver reconstructed results to users in near real time. Once integrated into the workflow, this data-intensive computing method will substantially improve data processing for ultrafast computed tomography at ANKA.
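A minimal sketch of the approach described above, not the ANKA/LSDF code: the 3D volume is reconstructed slice by slice with a simultaneous algebraic update, and the independent slice jobs are distributed across worker processes. All names and the particular update scheme are illustrative assumptions.

```python
# Illustrative sketch: slice-parallel algebraic reconstruction (not the ANKA/LSDF code).
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def reconstruct_slice(args):
    """Algebraic reconstruction of one 2D slice from a few projections."""
    sinogram, system_matrix = args
    n_iter, relax = 10, 0.5
    x = np.zeros(system_matrix.shape[1])
    row_norms = (system_matrix ** 2).sum(axis=1) + 1e-12
    b = sinogram.ravel()
    for _ in range(n_iter):
        residual = b - system_matrix @ x          # mismatch with measured rays
        x += relax * system_matrix.T @ (residual / row_norms)
        np.clip(x, 0.0, None, out=x)              # non-negativity constraint
    return x

def reconstruct_volume(sinograms, system_matrix, workers=8):
    """Distribute the independent slice reconstructions over worker processes."""
    jobs = [(s, system_matrix) for s in sinograms]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        slices = list(pool.map(reconstruct_slice, jobs))
    return np.stack(slices)
```

Because the slices are independent, the same split maps directly onto a cluster job scheduler rather than local processes.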
Energy awareness is an important aspect of modern network and computing system design and management, especially for internet-scale networks and data-intensive, large-scale distributed computing systems. The main challenge is to design and develop novel technologies, architectures and methods that reduce energy consumption in such infrastructures, which is also a principal way of reducing the total cost of running a network. Energy-aware network components, together with new control and optimization strategies, can save energy across the whole system by adapting network capacity and resources to the actual traffic load and demands, while preserving end-to-end quality of service. In this paper, we design and develop a two-level control framework for reducing power consumption in computer networks. Its implementation provides local control mechanisms at the network device level and network-wide control strategies at the central control level. We also develop network-wide optimization algorithms that compute the power settings of energy-consuming network components and energy-aware routing for the recommended network configuration. The utility and efficiency of the framework have been verified by simulation and by laboratory tests. The test cases were carried out on a number of synthetic as well as real network topologies, with encouraging results, leading to well-justified recommendations for energy-aware computer network design. (C) 2013 Elsevier B.V. All rights reserved.
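The abstract does not reproduce the paper's algorithms; the following is a minimal, illustrative sketch of one network-wide strategy of this kind: greedily switching off lightly loaded links as long as every traffic demand can still be routed within capacity. Function names, the cost model and the single shared capacity are hypothetical simplifications.

```python
# Illustrative greedy link-sleeping heuristic (not the paper's algorithms).
import heapq

def dijkstra(adj, src, dst):
    """Shortest path over the active links; adj maps node -> {neighbor: weight}."""
    dist, prev, seen = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

def routable(links, demands, capacity):
    """Route all demands on the active links; return per-link loads or None if infeasible."""
    adj, load = {}, {l: 0.0 for l in links}
    for (u, v) in links:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    for (src, dst, vol) in demands:
        path = dijkstra(adj, src, dst)
        if path is None:
            return None
        for a, b in zip(path, path[1:]):
            link = (a, b) if (a, b) in load else (b, a)
            load[link] += vol
            if load[link] > capacity:
                return None
    return load

def sleep_links(links, demands, capacity):
    """Greedily power off links, least-loaded first, while all demands stay routable."""
    active = set(links)
    load = routable(active, demands, capacity)
    if load is None:
        return active
    while True:
        for link in sorted(active, key=lambda l: load[l]):
            trial = routable(active - {link}, demands, capacity)
            if trial is not None:
                active.discard(link)
                load = trial
                break
        else:
            return active  # no further link can be switched off
```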
Centralized cloud infrastructures have become the popular platforms for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources and are therefore poorly suited to geo-distributed data-intensive applications, where the data may be spread across multiple geographical locations. In this paper, we present Nebula: a dispersed edge cloud infrastructure that explores the use of voluntary resources for both computation and data storage. We describe the lightweight Nebula architecture that enables distributed data-intensive computing through a number of optimization techniques, including location-aware data and computation placement, replication, and recovery. We evaluate Nebula's performance on an emulated volunteer platform spanning over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show that Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated existing systems.
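A minimal sketch of the location-aware placement idea mentioned above, not the actual Nebula scheduler: each task is placed on the volunteer node that best balances proximity to its input data against available compute slots. All names, the bandwidth table and the weighting are illustrative assumptions.

```python
# Illustrative location-aware task placement (not the Nebula implementation).
def place_tasks(tasks, nodes, bandwidth, alpha=0.7):
    """
    tasks: list of (task_id, data_node, input_size_mb)
    nodes: dict node -> free compute slots
    bandwidth: dict (src_node, dst_node) -> MB/s between nodes
    Returns a mapping task_id -> chosen node.
    """
    placement = {}
    free = dict(nodes)
    for task_id, data_node, size_mb in sorted(tasks, key=lambda t: -t[2]):
        best, best_cost = None, float("inf")
        for node, slots in free.items():
            if slots <= 0:
                continue
            # Local placement costs nothing to move data; otherwise estimate transfer time.
            bw = float("inf") if node == data_node else bandwidth.get((data_node, node), 1.0)
            transfer = size_mb / bw
            queueing = 1.0 / slots            # crude proxy for node load
            cost = alpha * transfer + (1 - alpha) * queueing
            if cost < best_cost:
                best, best_cost = node, cost
        if best is not None:
            placement[task_id] = best
            free[best] -= 1
    return placement
```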
Authors: Wang, Wei; Zeng, Guosun
Tongji Univ, Dept Comp Sci & Engn, Shanghai 200092, Peoples R China
Tongji Branch, Natl Engn & Technol Ctr High Performance Comp, Shanghai 200092, Peoples R China
Science is becoming increasingly data-driven. The ability of a geographically distributed community of scientists to access and analyze large amounts of data has emerged as a significant requirement for furthering science. In a data-intensive computing environment with very large numbers of computing nodes, resources are inevitably unreliable, which strongly affects task execution and scheduling. Novel algorithms are needed to schedule jobs onto trustworthy nodes, maintain high communication speed, reduce job execution time, lower the rate of failed executions, and improve the security of the execution environment for important data. In this paper, a trust mechanism-based task scheduling model is presented. Following the trust relationships between people in society, a trust relationship is built among computing nodes, and the trustworthiness of nodes is evaluated using a Bayesian cognitive method. By integrating node trustworthiness into the Dynamic Level Scheduling (DLS) algorithm, the Trust-Dynamic Level Scheduling (Trust-DLS) algorithm is proposed. Moreover, a benchmark is constructed to span a range of data-intensive computing characteristics for evaluating the proposed method. Theoretical analysis and simulations show that Trust-DLS can efficiently meet the trust requirements of data-intensive workloads at only a small additional time cost, while ensuring that tasks execute securely in large-scale data-intensive computing environments.
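A minimal sketch of the two ingredients described above, with illustrative names rather than the paper's exact model: a Bayesian trust estimate per node based on a Beta distribution over observed successes and failures, and a DLS-style dynamic level in which a node's expected execution time is inflated by its untrustworthiness before ranking (task, node) pairs.

```python
# Illustrative Beta-distribution trust plus trust-weighted DLS level (not the paper's model).
from dataclasses import dataclass

@dataclass
class NodeTrust:
    successes: int = 1   # Beta prior: alpha
    failures: int = 1    # Beta prior: beta

    def observe(self, ok: bool) -> None:
        """Update the posterior after a task on this node succeeds or fails."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def trust(self) -> float:
        """Posterior mean of the Beta distribution."""
        return self.successes / (self.successes + self.failures)

def trust_dls_level(static_level, earliest_start, exec_time, node):
    """
    Plain DLS ranks (task, node) pairs by static level minus earliest finish
    time; here the execution time is divided by the node's trust, so
    unreliable nodes (with likely re-executions) are penalised.
    """
    return static_level - (earliest_start + exec_time / node.trust)

# Usage: schedule the pair with the highest trust-weighted dynamic level.
n = NodeTrust()
n.observe(True)
n.observe(False)
print(trust_dls_level(static_level=30.0, earliest_start=2.0, exec_time=5.0, node=n))
```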
ISBN: (Print) 9781509004546
The architecture of a healthcare information system (HIS) is an important framework. The proposed architecture applies the cutting-edge technologies of cloud computing and big data in a parallel and distributed computing manner. The architecture scheme is formulated within an SoSE-based systems engineering paradigm, and its implementation for healthcare data processing is described through multiple layers of big data components. This study is preliminary work that provides basic insight into future intelligent HIS systems.
ISBN: (Print) 9781509018932
Data-intensive computing (DIC) offers an attractive option for businesses to execute applications remotely and load computing resources from the cloud in a streaming fashion. A key challenge in such environments is to increase the utilization of the cloud cluster for high-throughput processing. One way of achieving this goal is to optimize the execution of computing jobs on the cluster. We observe that the order in which these jobs are executed can have a significant impact on their overall completion time (makespan). Our goal is to design a job scheduler that minimizes the makespan. In this study, a new formalization is introduced that represents each job as a pair of durations, one for disk processing and one for network transmission. Due to the streaming nature of the processing, the two stages overlap in execution, which gives rise to both one-stage and two-stage scheduling situations. A novel heuristic scheduling strategy is proposed for this hybrid scheduling problem, and its performance is confirmed by experimental evaluation.
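For the purely two-stage case sketched above, ordering jobs whose disk stage feeds an overlapping network stage is the classic two-machine flow-shop problem, for which Johnson's rule gives a makespan-minimising order. The sketch below illustrates that baseline only; the paper's heuristic for the hybrid one-stage/two-stage situation is not reproduced, and all names are illustrative.

```python
# Illustrative Johnson's-rule ordering for overlapped disk/network stages.
def johnson_order(jobs):
    """jobs: list of (job_id, disk_time, net_time); returns job ids in execution order."""
    front = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    back = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: -j[2])
    return [j[0] for j in front + back]

def makespan(order, times):
    """Simulate overlap: the network stage of one job runs while the next job's disk stage runs."""
    disk_done = net_done = 0.0
    for jid in order:
        disk_t, net_t = times[jid]
        disk_done += disk_t                          # disk stages are serial
        net_done = max(net_done, disk_done) + net_t  # network starts after its own disk stage
    return net_done

jobs = [("a", 3, 5), ("b", 6, 2), ("c", 4, 4)]
times = {jid: (d, n) for jid, d, n in jobs}
order = johnson_order(jobs)
print(order, makespan(order, times))   # -> ['a', 'c', 'b'] 15.0
```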
ISBN: (Print) 9781450391993
Exciting changes are coming to the fore in the world of storage and I/O for high-performance and data-intensive computing. Existing data access and retrieval methods are being re-assessed against the needs of the new generation of data-intensive applications in HPC and AI. These new applications need to support both evolutionary and revolutionary approaches to data access and storage, and open source is enabling the adoption of these new techniques. This workshop looks at cutting-edge trends in storage systems and solutions for data-intensive HPC and AI applications, with a specific focus on community-driven initiatives and commercial products that may inspire the community. The EMOSS'22 workshop proceedings are available at: https://***/***?id=3526061
More and more parallel applications are running in distributed environments to take advantage of easily available and inexpensive commodity resources. For data-intensive applications, employing multiple distributed storage resources has many advantages. In this paper, we present a Multi-Storage I/O System (MS-I/O) that can not only effectively manage the various distributed storage resources in the system, but also provide novel high-performance storage access schemes. MS-I/O employs many state-of-the-art I/O optimizations, such as collective I/O and asynchronous I/O, as well as a number of new techniques such as data location, data replication, subfiles, superfiles and data access history. In addition, many MS-I/O optimization schemes can work simultaneously within a single data access session, greatly improving performance. Although I/O optimization techniques help improve performance, they also complicate the I/O system, and most have their limitations. Selecting the right optimization policies therefore requires expert knowledge, which cannot be expected of end users who may know little about I/O techniques, so the choice of optimizations should be left to the I/O system itself, i.e. it should be automatic from the user's point of view. We present a User Access Pattern data structure, associated with each dataset, that helps MS-I/O easily make accurate I/O optimization decisions. (C) 2003 Elsevier B.V. All rights reserved.
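A minimal sketch of the idea that a per-dataset access-pattern record can drive automatic optimization choices. The fields, thresholds and policy names are illustrative assumptions, not the MS-I/O implementation.

```python
# Illustrative access-pattern record and policy selection (not the MS-I/O code).
from dataclasses import dataclass, field

@dataclass
class UserAccessPattern:
    reads: int = 0
    writes: int = 0
    sequential_hits: int = 0
    readers: set = field(default_factory=set)

    def record(self, op: str, contiguous: bool, client: str) -> None:
        """Update the pattern after each access to the dataset."""
        if op == "read":
            self.reads += 1
            self.readers.add(client)
        else:
            self.writes += 1
        if contiguous:
            self.sequential_hits += 1

def choose_policy(p: UserAccessPattern) -> str:
    """Map an observed pattern onto an I/O optimization (illustrative rules)."""
    total = p.reads + p.writes
    if total == 0:
        return "default"
    if len(p.readers) > 4 and p.writes == 0:
        return "replicate"                   # many readers, read-only: replicate the dataset
    if p.sequential_hits / total > 0.8:
        return "prefetch+collective-io"      # mostly sequential: prefetch and collective I/O
    return "subfile"                         # scattered small accesses: split into subfiles
```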
We discuss the extended parallel pattern set identified within the EU-funded project RePhrase as a candidate pattern set to support data-intensive applications targeting heterogeneous architectures. The set has been designed to include three classes of pattern, namely (1) core patterns, modelling common, not necessarily data-intensive, parallelism exploitation patterns, usually to be used in composition; (2) high-level patterns, modelling common, complex and complete parallelism exploitation patterns; and (3) building block patterns, modelling the single components of data-intensive applications, suitable for use in composition to implement patterns not covered by the core and high-level patterns. We discuss the expressive power of the RePhrase extended pattern set and present results illustrating the performance that may be achieved with the FastFlow implementation of the high-level patterns.
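To illustrate what "patterns used in composition" means, here is a minimal sketch in Python rather than FastFlow/C++: two core patterns (a parallel map and a pipeline) are composed into a small, complete computation. The names are hypothetical and unrelated to the RePhrase API.

```python
# Illustrative composition of core parallel patterns (not the RePhrase/FastFlow API).
from concurrent.futures import ThreadPoolExecutor

def par_map(fn, workers=4):
    """Core pattern: apply fn to every item of a collection in parallel."""
    def stage(items):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fn, items))
    return stage

def pipeline(*stages):
    """Core pattern: feed the output of one stage into the next."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run

# A small "high-level" computation built by composition: normalise, then lowercase.
process = pipeline(par_map(str.strip), par_map(str.lower))
print(process(["  Foo ", " BAR"]))   # -> ['foo', 'bar']
```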
Today, campus grids provide users with easy access to thousands of CPUs. However, it is not always easy for non-expert users to harness these systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we argue that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of such an abstraction, All-Pairs, which fits the needs of several applications in biometrics, bioinformatics, and data mining. We demonstrate that an optimized All-Pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and is both faster and more efficient than a tuned conventional approach. This abstraction has been in production use for one year on a 500-CPU campus grid at the University of Notre Dame and has been used to carry out a groundbreaking analysis of biometric data.
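A minimal sketch of the All-Pairs idea described above: evaluate F(a, b) for every a in A and b in B by cutting the pair grid into tiles and handing each tile to a worker, rather than submitting one job per pair as a naive user might. The tiling scheme and names are illustrative, not the Notre Dame implementation.

```python
# Illustrative All-Pairs abstraction via blocked parallel evaluation.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def compare_tile(args):
    """Worker: evaluate fn over one rectangular tile of the A x B pair grid."""
    a_block, b_block, fn = args
    return [((a, b), fn(a, b)) for a, b in product(a_block, b_block)]

def all_pairs(A, B, fn, block=2, workers=4):
    """Cut the |A| x |B| grid into block x block tiles and farm them out."""
    tiles = [(A[i:i + block], B[j:j + block], fn)
             for i in range(0, len(A), block)
             for j in range(0, len(B), block)]
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(compare_tile, tiles):
            results.update(part)
    return results

def abs_diff(a, b):
    return abs(a - b)

if __name__ == "__main__":
    dist = all_pairs([1, 2, 3], [10, 20], abs_diff)
    print(dist[(3, 20)])   # -> 17
```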