ISBN:
(print) 159593717X
The Common Instrument Middleware Architecture (CIMA) defines a web services interface to scientific instruments. We have been experimenting with the use of CIMA web services for remote monitoring of synchrotron experiments and real-time data download, processing and storage. Here we discuss some performance issues with data transfer using CIMA web services, particularly for long-distance, high-latency transfers. We explore alternative approaches for improving the performance and robustness of data transfer with CIMA, and provide some experimental results. Copyright 2007 ACM.
ISBN:
(print) 9781424449224
The proceedings contain 49 papers. The topics discussed include: the end of denial architecture and the rise of throughput computing; ERfair scheduler with processor shutdown; service-oriented architecture for job submission and management on grid computing resources; improved opportunistic scheduling algorithms for WiMAX mobile multihop relay networks; automatic data placement and replication in grids; spanning tree routing strategies for divisible load scheduling on arbitrary graphs - a comparative performance analysis; distance-aware round-robin mapping for large NUCA caches; fast checkpointing by write aggregation with dynamic buffer and interleaving on multicore architecture; a framework for routing and resource allocation in network virtualization; cache streamization for high-performance stream processors; and on providing event reliability and maximizing network lifetime using mobile data-collectors in wireless sensor networks.
ISBN:
(print) 9780769534237
This work presents an implementation of the Neocognitron neural network using a high-performance computing architecture based on the GPU (Graphics Processing Unit). The Neocognitron is an artificial neural network, proposed by Fukushima and collaborators, composed of several hierarchical stages of neuron layers organized in two-dimensional matrices called cellular planes. For the high-performance computation of a face recognition application using the Neocognitron, CUDA (Compute Unified Device Architecture) was used as the API (Application Programming Interface) between the CPU and the GPU, an NVIDIA GeForce 8800 GTX with 128 ALUs. As face image databases, a database created at UFS-Car and the CMU-PIE (Carnegie Mellon University Pose, Illumination and Expression) database were used. Load balancing was achieved by mapping cellular connections to threads organized in blocks, following the CUDA development philosophy. The results showed the feasibility of this type of device as a massively parallel data-processing tool, and that the smaller the granularity and data dependency of the parallel processing, the better its performance.
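The property the abstract relies on is that every cell in a cellular plane is an independent unit of work. A plain-Python sketch of one simplified S-cell plane makes this visible (the function name, kernel shape, and thresholded activation here are illustrative assumptions, not the paper's exact model; in the CUDA version each output cell would map to one GPU thread):

```python
def s_cell_plane(plane, kernel, theta=0.5):
    """Compute one simplified Neocognitron S-cell plane: each output
    cell applies the shared kernel to its local receptive field and
    thresholds the result. Cells are mutually independent, which is
    the property exploited when mapping each cell to a GPU thread.
    (Illustrative sketch, not the paper's exact model.)"""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(plane) - kh + 1
    cols = len(plane[0]) - kw + 1
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):        # each (i, j) is an independent
        for j in range(cols):    # unit of work (one thread on GPU)
            act = sum(plane[i + di][j + dj] * kernel[di][dj]
                      for di in range(kh) for dj in range(kw))
            out[i][j] = max(0.0, act - theta)
    return out
```

On a GPU the two outer loops disappear: each (i, j) pair becomes a thread index within a block, which is the mapping the abstract describes.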
ISBN:
(print) 9781424456598
Technology scaling has led to the integration of many cores into a single chip. As a result, on-chip interconnection networks play an increasingly important role in determining the performance and power of the entire chip. The packet-switched network-on-chip (NoC) provides a scalable solution to the communication needs of tiled multicore processors. However, the virtual-channel (VC) buffers in the NoC consume a significant fraction of the system's dynamic and leakage power. To improve the energy efficiency of the router design, it is advantageous to use small buffer sizes while still maintaining network throughput. This paper proposes two new virtual channel allocation (VA) mechanisms, termed Fixed VC Assignment with Dynamic VC Allocation (FVADA) and Adjustable VC Assignment with Dynamic VC Allocation (AVADA). The idea is that VCs are assigned based on the designated output port of a packet to reduce Head-of-Line (HoL) blocking. Also, the number of VCs allocated for each output port can be adjusted dynamically. Unlike previous buffer-pool based designs, we use only a small number of VCs to keep the arbitration latency low. Simulation results show that FVADA and AVADA improve network throughput by 41% on average, compared to a baseline design with the same buffer size. AVADA still outperforms the baseline even when the buffer size is halved. Moreover, we achieve comparable or better throughput than a previous dynamic VC allocator while reducing its critical path delay by 60%. These results indicate that the proposed VA mechanisms are suitable for low-power, high-throughput, high-frequency on-chip network designs.
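The core idea of assigning VCs by designated output port can be sketched as a toy allocator (in the spirit of FVADA only; the class name, pool layout, and stall behavior are assumptions for illustration, not the paper's mechanism). Because each output port owns a disjoint VC subset, a packet blocked at one port cannot occupy a VC needed by traffic for another port, which is what reduces HoL blocking:

```python
from collections import deque

class FixedVCAssigner:
    """Toy fixed-VC-assignment-by-output-port model (illustrative,
    not the paper's FVADA implementation). Each output port owns a
    disjoint pool of virtual channels."""
    def __init__(self, num_ports, vcs_per_port):
        self.free = {p: deque(range(p * vcs_per_port,
                                    (p + 1) * vcs_per_port))
                     for p in range(num_ports)}

    def allocate(self, out_port):
        """Grant a VC from the pool of the packet's output port;
        None means the packet must stall, without touching other
        ports' VCs."""
        pool = self.free[out_port]
        return pool.popleft() if pool else None

    def release(self, out_port, vc):
        self.free[out_port].append(vc)
```

AVADA's refinement, per the abstract, would then let the per-port pool sizes change at run time instead of staying fixed.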
ISBN:
(print) 0769520464
One of the main challenges to the wide use of the Internet is the scalability of its servers, that is, their ability to handle the increasing demand. Scalability in stateful servers, which comprise e-commerce and other transaction-oriented servers, is even more difficult, since it is necessary to keep transaction data across requests from the same user. One common strategy for achieving scalability is to employ clustered servers, where the load is distributed among the various servers. However, as a consequence of the workload characteristics and of the need to keep data coherent among the servers that compose the cluster, load imbalance arises among the servers, reducing the efficiency of the server as a whole. In this paper we propose and evaluate a strategy for load balancing in stateful clustered servers. Our strategy is based on control theory and achieved significant gains over configurations that do not employ load balancing, reducing response time by up to 50% and increasing throughput by up to 16%.
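The abstract does not specify the controller, but the general shape of a control-theoretic balancer can be sketched as a single proportional-feedback step (the function, gain, and error signal below are assumptions for illustration): servers with above-average response time lose a share of the new-request weight, faster servers gain it.

```python
def rebalance(weights, response_times, gain=0.1):
    """One proportional-control step (illustrative sketch, not the
    paper's controller): adjust each server's request weight against
    its response-time deviation from the cluster mean, then
    renormalize so the weights remain a distribution."""
    mean_rt = sum(response_times) / len(response_times)
    adjusted = [max(0.0, w - gain * (rt - mean_rt) / mean_rt)
                for w, rt in zip(weights, response_times)]
    total = sum(adjusted)
    return [w / total for w in adjusted]
```

Repeating this step on fresh measurements drives the cluster toward equal response times, which is the feedback behavior such a strategy relies on.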
ISBN:
(print) 9781509012336
In this paper we present VCube-PS, a topic-based publish/subscribe system built on top of a virtual hypercube-like topology. Membership information and messages published to subscribers (members) of a topic group are broadcast over dynamically built spanning trees rooted at the message's source. For a given topic, delivery of published messages respects causal order. Performance results of experiments conducted on the PeerSim simulator confirm the efficiency of VCube-PS in terms of scalability, latency, and number and size of messages, when compared to an approach based on a single, statically built rooted tree.
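A spanning tree rooted at an arbitrary source of a hypercube can be built on the fly with the classic binomial-tree broadcast rule, sketched below (this is the textbook construction for the kind of dynamically built, source-rooted tree the abstract describes, not VCube-PS's exact algorithm): relabel nodes relative to the root, and let each node expand only into dimensions below its lowest set bit.

```python
def hypercube_children(node, root, dim):
    """Children of `node` in the spanning tree of a `dim`-dimensional
    hypercube rooted at `root` (classic binomial broadcast tree;
    illustrative of VCube-PS-style dynamic trees, not its algorithm)."""
    rel = node ^ root                  # relabel so the root becomes 0
    # the lowest set bit of `rel` bounds the dimensions this node
    # may expand into; the root (rel == 0) may use all of them
    limit = dim if rel == 0 else (rel & -rel).bit_length() - 1
    return [(rel | (1 << i)) ^ root for i in range(limit)]
```

Each publisher derives its own tree locally from node labels alone, with no global coordination, which is why per-source trees can be built dynamically.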
ISBN:
(print) 0769516262
In this article our goal is to apply mobile agent technology to provide better scheduling for MPI applications executing in a cluster configuration. In a distributed cluster environment, this approach can enhance the load balancing of the parallel processes. Running MPI on a cluster of heterogeneous machines can lead parallel programmers to frustrating results, mainly because of the lack of an even distribution of the workload across the cluster. Therefore, before submitting an MPI application to a cluster, we use our JOTA mobile agent approach to acquire more precise information about each machine's workload; with this more precise knowledge of the load and characteristics of each machine, we can gather lightly loaded workstations to form a cluster. Our empirical results indicate that it is possible to spend less elapsed time executing a parallel application with the agent approach than in an ordinary MPI environment.
ISBN:
(print) 1595936734
Grid applications often need to distribute large amounts of data efficiently from one cluster to multiple others (multicast). Existing methods usually arrange nodes in optimized tree structures, based on external network monitoring data. This dependence on monitoring data, however, severely impacts both ease of deployment and adaptivity to dynamically changing network conditions. In this paper, we present Multicast Optimizing Bandwidth (MOB), a high-throughput multicast approach inspired by the BitTorrent protocol. With MOB, data transfers are initiated by the receivers, which try to steal data from peer clusters. Instead of using potentially outdated monitoring data, MOB automatically adapts to the currently achievable bandwidth ratios. Our experimental evaluation compares MOB to both the BitTorrent protocol and to our previous approach, Balanced Multicasting, which optimizes multicast trees based on external monitoring data. We show that MOB outperforms the BitTorrent protocol. MOB is competitive with Balanced Multicasting as long as the network bandwidth remains stable. With dynamically changing bandwidth, MOB outperforms Balanced Multicasting by wide margins. Copyright 2007 ACM.
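The receiver-initiated "stealing" idea can be sketched as a piece-selection step (illustrative only; the function, data shapes, and random owner choice are assumptions, and real MOB steals across clusters while adapting to achieved bandwidth rather than choosing uniformly): each receiver looks at which pieces it is missing and picks a peer that already holds each one.

```python
import random

def steal_schedule(have, peers_have, rng=random.Random(0)):
    """Receiver-driven piece selection in the spirit of MOB
    (illustrative sketch, not the protocol itself): map each missing
    piece to a peer that holds it. `have` is this node's piece set;
    `peers_have` maps peer name -> set of pieces that peer holds."""
    all_pieces = set().union(*peers_have.values())
    plan = {}
    for piece in sorted(all_pieces - have):
        owners = [p for p, pieces in peers_have.items()
                  if piece in pieces]
        if owners:                     # unreachable pieces stay absent
            plan[piece] = rng.choice(owners)
    return plan
```

Because the schedule is rebuilt from what peers currently hold, no external monitoring data is needed: whichever peer delivers fastest simply becomes the source of more subsequent steals.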
ISBN:
(print) 9781538677698
The increasing performance needs of critical real-time embedded systems (CRTES), for instance in the automotive domain, push for the adoption of high-performance hardware from the consumer electronics domain. However, the time-predictability features of such hardware are largely unexplored. The ARM *** architecture is a good candidate for adoption in the CRTES market (it has already started being used in the automotive market). In this paper we study ARM ***'s capabilities to meet CRTES requirements. In particular, we perform a qualitative and quantitative assessment of its timing characteristics, focusing on shared multicore resources, and of how this architecture can be reliably used in CRTES.
ISBN:
(print) 9781509012336
The development of new technologies is setting a new era characterized, among other factors, by the rise of sophisticated mobile devices containing CPUs and GPUs. This emerging scenario of heterogeneous mobile architectures brings challenging issues regarding the use of the available computing resources, mainly related to the intrinsic complexity of coordinating these processors to increase application performance. In this sense, this paper presents a high-level programming model to implement parallel patterns that can be executed in a coordinated way by heterogeneous mobile architectures. A comparative analysis of performance and programming complexity is presented, contrasting code generated automatically from the proposed programming model with low-level manually optimized implementations.
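What a "parallel pattern executed in a coordinated way" across heterogeneous processors looks like can be sketched with a map pattern that splits work between two worker pools standing in for a CPU and a GPU (a hypothetical sketch of the pattern only; the function, the fixed split ratio, and the use of threads are assumptions, not the paper's programming model or generated code):

```python
from concurrent.futures import ThreadPoolExecutor

def hetero_map(fn, data, split=0.5):
    """Toy 'map' parallel pattern over two coordinated workers
    (illustrative; real heterogeneous runtimes dispatch one partition
    to the GPU and tune `split` to the devices' relative speeds)."""
    cut = int(len(data) * split)
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(lambda xs: [fn(x) for x in xs], data[:cut])
        right = pool.submit(lambda xs: [fn(x) for x in xs], data[cut:])
        # results are recombined in order, hiding the split from callers
        return left.result() + right.result()
```

The point of a high-level model like the one the paper proposes is that the programmer writes only `fn` and the pattern, while partitioning and coordination code of this kind is generated automatically.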