ISBN (print): 9781538649756
Multicast communication constrained by end-to-end delay and inter-destination delay variation is known as Delay and Delay Variation Bounded Multicast (DVBM). In this paper, we propose a dynamic multi-core multicast approach to solve the DVBM problem. The proposed three-phase algorithm, Multi-core DVBM Trees (MCDVBMT), semi-matches group members to core nodes. The message is disseminated to group members using trees rooted at the designated core nodes. MCDVBMT dynamically reorganizes the rooted trees in response to changes in multicast group membership. On average, only 5.2% of all requests trigger re-execution, and 53.6% of the graphs generated by MCDVBMT undergo re-execution before all dynamic requests have been received.
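The two DVBM constraints can be made concrete with a small check. The following is a minimal Python sketch, assuming a multicast tree is given as a parent map with per-link delays; the helper names `path_delays` and `satisfies_dvbm` are hypothetical and not from the paper. It only tests whether one core-rooted tree meets a delay bound and a delay-variation bound for its members; MCDVBMT's semi-matching of members to cores and its dynamic tree reorganization are not shown.

```python
from typing import Dict, Iterable, Tuple

# parent maps node -> (parent_node, link_delay); the root has no entry.
ParentMap = Dict[str, Tuple[str, float]]


def path_delays(parent: ParentMap, root: str) -> Dict[str, float]:
    """End-to-end delay from the root (a core node) to every node of the tree."""
    delays: Dict[str, float] = {root: 0.0}

    def delay_of(node: str) -> float:
        if node not in delays:
            par, link_delay = parent[node]
            delays[node] = delay_of(par) + link_delay
        return delays[node]

    for node in parent:
        delay_of(node)
    return delays


def satisfies_dvbm(delays: Dict[str, float], members: Iterable[str],
                   delay_bound: float, variation_bound: float) -> bool:
    """True if every member is within delay_bound and the spread of
    member delays (max - min) is within variation_bound."""
    member_delays = [delays[m] for m in members]
    if max(member_delays) > delay_bound:
        return False
    return max(member_delays) - min(member_delays) <= variation_bound
```

For example, with link delays c→a = 2.0, c→b = 3.0 and a→d = 1.5, members b and d see delays 3.0 and 3.5, so the tree satisfies a delay bound of 4.0 with a variation bound of 1.0 but fails a delay bound of 3.0.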
ISBN (print): 9781479927289
We present a performance analysis of a parallel implementation of both preconditioned Conjugate Gradient and preconditioned Bi-conjugate Gradient solvers on graphics processing units (GPUs) using the CUDA programming model. The solvers were optimized for sparse systems of equations arising from Finite Element Analysis of the electromagnetic phenomena involved in the diffusion of underground currents under time-harmonic current excitation. We used a shifted Incomplete Cholesky factorization as the preconditioner. Results show a significant speedup on the GPU compared with a serial CPU implementation.
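To show how a shifted Incomplete Cholesky preconditioner plugs into Conjugate Gradient, here is a minimal serial sketch in pure Python. It is a CPU reference for readability, not the CUDA implementation described above: matrices are dense lists of lists, the shift is assumed to be applied as A + shift·diag(A) (one common form of shifted IC(0)), the names are hypothetical, and the Bi-conjugate Gradient variant is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def shifted_ic0(A: Matrix, shift: float = 0.0) -> Matrix:
    """IC(0) of A + shift * diag(A); L keeps the sparsity pattern of A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            if i != j and A[i][j] == 0.0:
                continue  # drop fill-in outside the pattern of A
            s = A[i][j] * (1.0 + shift) if i == j else A[i][j]
            s -= sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("IC(0) broke down; try a larger shift")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def apply_preconditioner(L: Matrix, r: Vector) -> Vector:
    """Solve (L L^T) z = r by forward, then backward substitution."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def pcg(A: Matrix, b: Vector, L: Matrix,
        tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Preconditioned Conjugate Gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    if math.sqrt(dot(r, r)) < tol:
        return x
    z = apply_preconditioner(L, r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        step = rz / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            break
        z = apply_preconditioner(L, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

The matrix-vector product, the dot products, and the two triangular solves are the kernels a GPU version would have to parallelize; the triangular solves are the least parallel of these.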
ISBN (print): 9781467387767
The increasing need for physical security in critical environments has led to the widespread adoption of video surveillance systems. Effective video surveillance systems should be able to detect the presence of unauthorized people in the monitored environments while preserving the privacy of authorized ones. To this end, our paper proposes adopting the usage control model in the video surveillance scenario to enforce security policies that continuously check whether a person holds the right to stay in a given space (e.g., a room) from the moment this person enters that space. In some scenarios, a person is allowed to stay in the room only under certain circumstances, which are described by the usage control policy. When the policy is violated, an action is taken; e.g., the video camera placed in the room starts recording. This paper presents the architecture of the proposed framework, provides an example of a usage control policy in a real scenario, and describes the main details of our prototype implementation.
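The key difference from one-shot access control is that the right to stay is re-evaluated for as long as the person is in the room. A minimal Python sketch of that idea follows; `UsageSession`, its attribute names, and the violation callback are hypothetical, and real usage control systems express policies in a dedicated policy language rather than as Python callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Attributes = Dict[str, object]


@dataclass
class UsageSession:
    """Hypothetical ongoing authorization for one person in one room."""
    person: str
    room: str
    condition: Callable[[Attributes], bool]   # the usage control policy
    on_violation: Callable[[str, str], None]  # e.g. start camera recording
    active: bool = True

    def reevaluate(self, attrs: Attributes) -> None:
        # Called on entry and again whenever an attribute changes,
        # not just once at the door: this is the "continuous" part.
        if self.active and not self.condition(attrs):
            self.active = False
            self.on_violation(self.person, self.room)
```

For instance, a condition such as `lambda a: a["badge_valid"] and 8 <= a["hour"] < 18` lets a person stay during office hours; the first re-evaluation after 18:00 fires `on_violation` exactly once, which in a deployment would be wired to the camera in that room.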
ISBN (print): 9780769549392; 9781467353212
Cloud computing is the dominant paradigm in distributed computing. The most popular open source cloud solutions support different types of storage subsystems because the deployed services have different needs (in terms of performance, flexibility, and cost-effectiveness). In this paper, we investigate the supported standard and open source storage types and create a classification. We point out that block-level storage based on the Internet Small Computer System Interface (iSCSI) can currently be used for I/O-intensive services. However, the ATA-over-Ethernet (AoE) protocol uses fewer layers and operates at a lower level, which makes it more lightweight and faster than iSCSI. Therefore, we propose an architecture for AoE-based storage support in the OpenNebula cloud. The novel storage solution was implemented, and the performance evaluation shows that the AoE-based storage achieves 32.5-61.5% higher I/O throughput than the prior iSCSI-based storage and needs 41.37% less CPU time to provide the same services.
ISBN (print): 9781728116440
Software applications for biological network analysis rely on graphs to model structural interactions. Many of them require searching for subgraphs in a target graph or in collections of graphs. Even though very efficient algorithms have been defined to solve this subgraph isomorphism problem, the complexity of current real biological networks makes their sequential execution time prohibitive. On the other hand, parallel architectures, from multi-core to many-core, have become pervasive for dealing with large data sizes. Nevertheless, the sequential nature of graph searching algorithms makes implementing them on parallel architectures very challenging. This paper presents three different parallel solutions for the graph searching problem. The first two target exact search on multi-core CPUs and many-core GPUs, respectively. The third targets approximate search on GPUs and handles node, edge, and node label mismatches. The paper shows how different techniques have been developed in all the solutions to reduce the complexity of the search space, and it evaluates the performance of the proposed solutions on representative biological networks containing antiviral chemical compounds and on protein interaction networks.
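The sequential search that these parallel solutions start from can be sketched as backtracking with pruning. The Python below is a minimal, single-threaded illustration of exact, label-preserving, non-induced subgraph matching; `find_subgraph_matches` is a hypothetical name, and the label, degree, and ordering heuristics are generic search-space reductions, not necessarily the techniques used in the paper. Approximate matching and the CPU/GPU parallelization are not shown.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def find_subgraph_matches(pattern: Graph, target: Graph,
                          p_label: Dict[int, str],
                          t_label: Dict[int, str]) -> List[Dict[int, int]]:
    """Enumerate label-preserving (non-induced) embeddings of pattern in target."""
    # Visit high-degree pattern nodes first: the degree filter leaves them
    # fewer candidates, so the search tree is pruned earlier.
    order = sorted(pattern, key=lambda u: -len(pattern[u]))
    matches: List[Dict[int, int]] = []

    def extend(mapping: Dict[int, int], used: Set[int]) -> None:
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        u = order[len(mapping)]
        for v in target:
            if v in used or t_label[v] != p_label[u] or len(target[v]) < len(pattern[u]):
                continue  # label and degree pruning
            # every already-mapped neighbour of u must map to a neighbour of v
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(mapping, used)
                del mapping[u]
                used.discard(v)

    extend({}, set())
    return matches
```

The loop over candidate target nodes at each level is the natural unit of parallel work: different branches of the search tree are independent and can be explored by different cores or GPU threads.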
ISBN (print): 9781665414555
Three-dimensional stacked memory, which provides both high-bandwidth access and large capacity, is a promising technology for next-generation computer systems. While a large number of memory cubes increases the aggregate memory capacity, the communication latency and power consumption would become significant due to the low-radix, large-diameter packet network. In this context, we propose a memory-cube network called Diagonal Memory network (DMN). A diagonal network topology, its floor layout, and its lightweight router are designed for low-latency and low-voltage memory-read communication. Our evaluation results show that a DMN router uses 31% fewer hardware resources than a conventional virtual-channel router. The DMN router reduces the energy consumed to transfer a packet by 137 and 6714 along the original datapath and the bypass datapath, respectively.
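Packet latency in such a network grows with hop count, so the network diameter is a quick proxy for worst-case latency. As a generic illustration only (this is not the DMN topology, whose exact link pattern the abstract does not give), the Python sketch below compares the diameter of a k-by-k mesh with and without diagonal links using breadth-first search; `grid_links` and `diameter` are hypothetical helpers.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def grid_links(k: int, diagonal: bool) -> Dict[Node, List[Node]]:
    """Adjacency of a k-by-k mesh, optionally with diagonal links added."""
    steps = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    if diagonal:
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    adj: Dict[Node, List[Node]] = {}
    for x, y in product(range(k), repeat=2):
        adj[(x, y)] = [(x + dx, y + dy) for dx, dy in steps
                       if 0 <= x + dx < k and 0 <= y + dy < k]
    return adj


def diameter(adj: Dict[Node, List[Node]]) -> int:
    """Largest shortest-path hop count, via one BFS per source node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best
```

For k = 4, the plain mesh has diameter 6, while adding diagonal links brings it down to 3; the price is a higher router radix, which is the trade-off a lightweight router design has to manage.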
ISBN (print): 9781728116440
GPUs have been used to accelerate various data-parallel applications. The challenge lies in using GPUs to accelerate stream processing applications. Our goal is to investigate and evaluate whether stream-parallel applications can benefit from parallel execution on both CPU and GPU cores. In this paper, we introduce new parallel algorithms for the Lempel-Ziv-Storer-Szymanski (LZSS) data compression application. We implemented the algorithms targeting both CPUs and GPUs. On the GPUs, CUDA and OpenCL were used to exploit the data parallelism within the algorithm, while the outer stream parallelism was exploited on CPU cores through SPar. The parallel implementation of LZSS achieved a 135-fold speedup using a multi-core CPU and two GPUs. We also observed speedups in applications where we did not expect them when using the same combined data- and stream-parallel exploitation techniques.
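For readers unfamiliar with LZSS, here is a minimal sequential Python sketch of the encoder and decoder. It uses a brute-force window search and emits tokens (a literal byte, or a `(distance, length)` back-reference) without bit-packing; the window and match-length defaults are typical values assumed here, and all names are hypothetical. It is not the paper's CUDA, OpenCL, or SPar implementation, but it marks where the two levels of parallelism would apply.

```python
from typing import List, Tuple, Union

Token = Union[int, Tuple[int, int]]  # literal byte, or (distance, length)


def lzss_compress(data: bytes, window: int = 4096,
                  min_match: int = 3, max_match: int = 18) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        # Inner, data-parallel part: every candidate start j in the window
        # can be compared independently (a natural fit for GPU threads).
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])  # emit a literal byte
            i += 1
    return tokens


def lzss_decompress(tokens: List[Token]) -> bytes:
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):  # byte-wise copy handles overlapping matches
                out.append(out[-dist])
        else:
            out.append(t)
    return bytes(out)


def compress_blocks(data: bytes, block_size: int = 1 << 16) -> List[List[Token]]:
    # Outer, stream-parallel part: blocks are compressed independently,
    # so a farm of workers could process them concurrently.
    return [lzss_compress(data[k:k + block_size])
            for k in range(0, len(data), block_size)]
```

Compressing blocks independently slightly lowers the compression ratio (matches cannot cross block boundaries) in exchange for the independence that stream parallelism needs.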
ISBN (print): 9798350363074; 9798350363081
The growing heterogeneity and decentralization of the edge-cloud continuum, the modern computing paradigm, introduce new constraints on storage systems, such as storage type, associated processors, privacy, scarce resources, compliance, GDPR, and geographical restrictions. While existing distributed data and object stores can ensure data availability and fault tolerance, they are not flexible or dynamic enough to address this diverse set of constraints. In this paper, we introduce CATER, a modular policy-driven data placement framework designed to integrate seamlessly with existing storage systems and overcome these limitations. CATER formulates the data placement problem as an optimization model that incorporates data collocation and hardware constraints. We integrated a prototype of CATER with Apache Ozone and conducted experiments and simulations. Results show a 23% improvement in data placement while respecting 100% of the constraints.
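The split between hard constraints (which must hold) and preferences (which rank the feasible choices) can be illustrated with a much simpler placement rule. The Python below is a greedy filter-then-rank heuristic, not CATER's optimization model; `Node`, `Policy`, and `place` are hypothetical, the constraint set is reduced to region, storage type, capacity, and collocation, and no Apache Ozone API is used.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    region: str
    storage_type: str
    free_gb: float
    objects: Set[str]


@dataclass
class Policy:
    allowed_regions: Set[str]                 # e.g. GDPR: EU regions only
    storage_type: Optional[str] = None        # e.g. "ssd"; None = any
    collocate_with: Set[str] = field(default_factory=set)


def place(obj: str, size_gb: float, policy: Policy,
          nodes: List[Node]) -> Optional[Node]:
    # Hard constraints: a node that violates any of them is never chosen.
    feasible = [n for n in nodes
                if n.region in policy.allowed_regions
                and (policy.storage_type is None or n.storage_type == policy.storage_type)
                and n.free_gb >= size_gb]
    if not feasible:
        return None  # no placement satisfies every constraint
    # Soft preference: most related objects already present, then most free space.
    best = max(feasible,
               key=lambda n: (len(n.objects & policy.collocate_with), n.free_gb))
    best.objects.add(obj)
    best.free_gb -= size_gb
    return best
```

A greedy rule like this decides one object at a time; an optimization model such as the one the abstract describes can instead trade off placements across many objects at once.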
ISBN (print): 9781467387767
Our goal in this work is to study ways of exploiting the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implemented two kernels derived from the Wilson Dslash operator and investigated several data layout techniques for increasing the scalability of lattice QCD scientific kernels on the Intel Xeon Phi. In parts of the application where computation uses real numbers, we see a 6.6x increase in bandwidth compared with scalar code, thanks to auto-vectorization by the compiler. In other kernels, where arithmetic operations on complex numbers dominate, our hand-vectorized code outperforms the compiler's auto-vectorization. In this paper, we find that our proposed Hopping Vector-friendly Ordering allows more efficient vectorization of complex floating-point arithmetic. Using this data layout, we increase the sustained bandwidth by approximately 1.8x.
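Why data layout matters for complex arithmetic can be seen from the classic interleaved (array-of-structures) versus split (structure-of-arrays) choice. Python cannot show SIMD itself, so the sketch below only shows the layout transform and the lane-independent arithmetic that a compiler or intrinsics would vectorize; it is a generic example, not the paper's Hopping Vector-friendly Ordering, and the function names are hypothetical.

```python
from typing import List, Tuple


def aos_to_soa(interleaved: List[float]) -> Tuple[List[float], List[float]]:
    """Split [re0, im0, re1, im1, ...] into separate real and imaginary arrays."""
    return interleaved[0::2], interleaved[1::2]


def soa_complex_mul(ar: List[float], ai: List[float],
                    br: List[float], bi: List[float]) -> Tuple[List[float], List[float]]:
    # (x + iy)(u + iv) = (xu - yv) + i(xv + yu).
    # Each output element uses only same-index elements of the four arrays,
    # so in a SIMD loop every lane works on its own element and no
    # real/imaginary shuffles across lanes are required.
    cr = [x * u - y * v for x, y, u, v in zip(ar, ai, br, bi)]
    ci = [x * v + y * u for x, y, u, v in zip(ar, ai, br, bi)]
    return cr, ci
```

With the interleaved layout, the real and imaginary parts of one number share a vector register, so a vectorized complex multiply needs extra permutes; the split layout avoids them at the cost of a one-time conversion.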
ISBN (print): 9780769535449
The software development process used to implement enterprise systems with a component architecture in a distributed environment is well understood. The service-oriented architecture enables the joint development of global and enterprise-wide solutions by several developers following enterprise-wide IT strategies as coordinated by the customer. However, the necessary level of collaboration between parties from the private or public sector requires well-defined interoperability specifications. The existing standardization processes are usually too complex and inflexible for widely available interoperability specifications in rather specialized application domains. The Semantic Interoperability Centre Europe (***) defines an open clearing process that enables the collaborative development of interoperability specifications for exchanging data, especially between public administrations of the Member States of the European Community. Based on this scenario, the publishing process and the tools required for interoperability specifications that support data exchange in service- or web-oriented architectures are compared with the traditional software development process.