The increasing complexity of algorithms and the growing size of datasets necessitate the use of networked machines for many graph-parallel algorithms, and the resulting scale of machines makes fault tolerance a must. Unfortunately, existing large-scale graph-parallel systems usually adopt a distributed checkpoint mechanism for fault tolerance, which incurs not only notable performance overhead but also lengthy recovery time. This paper observes that the vertex replicas created for distributed graph computation can be naturally extended for fast in-memory recovery of graph states. This paper proposes Imitator, a new fault tolerance mechanism that cheaply maintains vertex states by propagating them to their replicas during normal message exchanges, and provides fast in-memory reconstruction of failed vertices from replicas on other machines. Imitator has been implemented by extending Hama, a popular open-source clone of Pregel. Evaluation shows that Imitator incurs negligible performance overhead (less than 5% in all cases) and can recover from the failure of more than one million vertices in less than 3.4 seconds.
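To make the mechanism concrete, the following toy sketch (with invented names such as sync_replicas and recover; it is not Imitator's code) shows how vertex states piggybacked on normal messages keep replicas fresh, so a failed machine's vertices can be rebuilt in memory from surviving replicas instead of from an on-disk checkpoint.

```python
class Machine:
    def __init__(self, mid):
        self.mid = mid
        self.masters = {}    # vid -> state of vertices owned by this machine
        self.mirrors = {}    # vid -> last state received for remote masters

def sync_replicas(machines, replica_map):
    """Piggyback master states onto the messages sent to replica holders."""
    for m in machines:
        for vid, state in m.masters.items():
            for holder in replica_map[vid]:
                if holder is not m:
                    holder.mirrors[vid] = state   # "free" update during messaging

def recover(machines, owner_map, failed, standby):
    """Rebuild the failed machine's vertices from mirrors on surviving machines."""
    for m in machines:
        if m is failed:
            continue
        for vid, state in m.mirrors.items():
            if owner_map[vid] is failed:          # only vertices the failed node owned
                standby.masters[vid] = state
    return standby

# toy usage
m0, m1 = Machine(0), Machine(1)
m0.masters = {'v1': 3}           # graph state owned by machine 0
replica_map = {'v1': [m0, m1]}   # v1 has a replica on machine 1
owner_map = {'v1': m0}
sync_replicas([m0, m1], replica_map)
print(recover([m0, m1], owner_map, failed=m0, standby=Machine(2)).masters)  # {'v1': 3}
```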
An arbitrary sampling rate conversion algorithm based on frequency domain filtering (FDF-ASRC) is proposed. The number of Fourier coefficients in FDF-ASRC is independent of the symbol rate of the signal of interest. By applying the resampling algorithm to non-uniform channelization, a few Fourier transform (FT) and inverse Fourier transform (IFT) modules can in principle support an unlimited number of channels. Moreover, the sampling rate of the output signal can be adjusted to an integer multiple of the symbol rate, so the output can be passed to the demodulator directly. In comparison with seven architectures of non-uniform channelizers, the computational complexity of the proposed channelizer is equal to the best of them.
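The core frequency-domain resampling step can be illustrated with a generic sketch: take the Fourier coefficients of a block, zero-pad or truncate the spectrum to the desired output length, and invert. This is a textbook FFT-based resampler, not the paper's specific FDF-ASRC channelizer architecture, and all parameter values below are made up.

```python
import numpy as np

def fft_resample(x, out_len):
    """Resample a block to out_len points by zero-padding or truncating
    its Fourier coefficients (frequency-domain filtering)."""
    n = len(x)
    X = np.fft.fft(x)
    Y = np.zeros(out_len, dtype=complex)
    k = min(n, out_len) // 2
    Y[:k] = X[:k]                                 # low positive frequencies
    Y[-k:] = X[-k:]                               # matching negative frequencies
    return np.fft.ifft(Y).real * (out_len / n)    # rescale amplitude

# e.g. convert a channel to exactly 4 samples per symbol (illustrative rates):
fs_in, symbol_rate = 10e6, 1.92e6
block = np.random.randn(4096)
out_len = int(round(len(block) * 4 * symbol_rate / fs_in))
y = fft_resample(block, out_len)
```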
The number and diversity of cores in on-chip systems are increasing rapidly. However, due to the Thermal Design Power (TDP) constraint, it is not possible to continuously operate all cores at the same time. Exceeding the TDP constraint may activate Dynamic Thermal Management (DTM) to ensure thermal stability. Such hardware-based closed-loop safeguards pose a big challenge for using many-core chips for real-time tasks. Managing the worst-case peak power usage of a chip can help resolve this issue. We present a scheme that minimizes the peak power usage of frame-based and periodic real-time tasks on many-core processors by scheduling the sleep cycles of each active core, and we introduce the concept of a sufficient test for peak power consumption for task feasibility. We consider both inter-task and inter-core diversity in power usage and present computationally efficient algorithms for peak power minimization, ranging from the special case of “homogeneous tasks on homogeneous cores” to the general case of “heterogeneous tasks on heterogeneous cores”. We evaluate our solution through extensive simulations using the 48-core SCC platform and the gem5 architecture simulator. Our simulation results show the efficacy of our scheme.
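A toy illustration of the staggering idea follows: each active core owes some sleep time per frame, and placing each core's sleep slots over the currently hottest part of the frame flattens the worst-case power profile. This greedy sketch allows non-contiguous sleep slots and is not the paper's algorithm; the power numbers are invented.

```python
def stagger_sleep(cores, frame_len):
    """cores: list of dicts with 'active_power' and 'slack' (sleep slots owed)."""
    profile = [0.0] * frame_len                      # worst-case power per time slot
    schedule = []
    for c in cores:
        # put this core's sleep slots over the currently hottest slots
        order = sorted(range(frame_len), key=lambda t: profile[t], reverse=True)
        sleep_slots = set(order[:c['slack']])
        for t in range(frame_len):
            if t not in sleep_slots:
                profile[t] += c['active_power']
        schedule.append(sleep_slots)
    return schedule, max(profile)                    # sleep schedule and resulting peak

cores = [{'active_power': 2.0, 'slack': 3},
         {'active_power': 1.5, 'slack': 2},
         {'active_power': 2.5, 'slack': 4}]
sched, peak = stagger_sleep(cores, frame_len=8)
print(peak)
```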
ISBN (print): 9781479959716
Software defined radio (SDR) is characterized by its flexibility and ease of modification, because significant amounts of signal processing are handed over to the general-purpose processor (GPP) rather than being done in special-purpose hardware. For a long time, software-defined radios have worked well only in systems with low data transmission rates, such as 2G/3G cellular systems and other wireless radio systems. With the rapid development of GPP power, more complex algorithms and higher-data-rate wireless communication systems can be implemented with a GPP and a general-purpose RF front end. In this paper we propose an LTE physical layer parallel system architecture based on a multicore GPP to meet the high capacity requirements of the LTE air interface. First, we analyzed the complexity of each LTE physical layer module and collected statistics on their processing times. We then used Linux CPU affinity to carefully allocate each LTE module to a particular CPU core based on these statistics. CPU utilization measurements show that using multiple cores gives the system sufficient capacity. Our system can only keep up with a 2 MHz bandwidth when the LTE physical layer procedures run on a single CPU core, but it keeps up with a 5 MHz bandwidth when running on multiple cores.
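The core-pinning step can be sketched with standard Linux facilities; the module names and core mapping below are placeholders, not the actual decomposition used in the paper (Python's os.sched_setaffinity is Linux-only).

```python
import os
from multiprocessing import Process

# Hypothetical module-to-core mapping derived from per-module processing-time
# statistics (module names and core ids are placeholders).
PINNING = {'fft_demod': {0}, 'channel_est': {1}, 'turbo_decode': {2, 3}}

def run_module(name, cores):
    os.sched_setaffinity(0, cores)     # restrict this process to the given cores (Linux)
    print(name, 'running on cores', os.sched_getaffinity(0))
    # ... the module's processing loop would go here ...

if __name__ == '__main__':
    procs = [Process(target=run_module, args=(n, c)) for n, c in PINNING.items()]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```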
ISBN (print): 9781479961245
Advanced SSDs employ a RAM-based write buffer to improve their write performance. The buffer intentionally delays write requests in order to reduce flash write traffic and reorders them to minimize the cost of garbage collection. This work presents a novel buffer algorithm for page-mapping multichannel SSDs. We propose grouping temporally or spatially correlated buffer pages and writing these grouped buffer pages to the same flash block. This strategy dramatically increases the probability of bulk data invalidations in flash blocks. In multichannel architectures, channels are assigned their own groups of buffer pages for writing, so channel striping does not divide a group of correlated buffer pages into small pieces. We have conducted simulations and experiments using an SSD simulator and a real SSD platform, respectively. Our results show that our design greatly outperforms existing buffer algorithms.
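A rough sketch of the grouping policy is given below, with invented structures and parameters (group span, pages per block); it is not the simulator or firmware used in the paper. Pages with neighbouring logical addresses are kept in one group, and a whole group is evicted to a single flash block on a single channel so the pages tend to be invalidated together later.

```python
from collections import defaultdict

BLOCK_PAGES = 64          # pages per flash block (assumed)
GROUP_SPAN = 64           # LBAs mapped to the same group (assumed)

class WriteBuffer:
    def __init__(self, capacity, channels):
        self.capacity, self.channels = capacity, channels
        self.groups = defaultdict(dict)       # group_id -> {lba: data}

    def write(self, lba, data):
        self.groups[lba // GROUP_SPAN][lba] = data
        if sum(len(g) for g in self.groups.values()) > self.capacity:
            self.evict()

    def evict(self):
        # evict the largest group as one unit to a single block of one channel
        gid = max(self.groups, key=lambda g: len(self.groups[g]))
        pages = self.groups.pop(gid)
        channel = gid % self.channels         # each group sticks to one channel
        flush_to_block(channel, list(pages.items())[:BLOCK_PAGES])

def flush_to_block(channel, pages):
    print(f'channel {channel}: writing {len(pages)} correlated pages to one block')

buf = WriteBuffer(capacity=8, channels=4)
for lba in [0, 1, 2, 64, 65, 3, 4, 5, 6]:
    buf.write(lba, b'x')
```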
An extension to the classic von Neumann paradigm is suggested which, from the point of view of chip designers, accommodates modern many-core processors and, from the point of view of programmers, still preserves the classic von Neumann programming model. The work is based on the following ideas: 1) the order in which instructions (and/or code blocks) are executed does not matter, as long as no constraints force a particular order of execution; 2) high-level parallelism for code blocks (similar to instruction-level parallelism for instructions) can be introduced, allowing high-level out-of-order execution; 3) the opportunities for out-of-order execution can be discovered at compile time rather than at runtime; 4) the optimization opportunities discovered by the compiler toolchain can be communicated to the processor in the form of meta-information; 5) the many computing resources (cores) can be assigned dynamically to machine instructions. It is shown that multicore architectures could thus be transformed into a strongly enhanced single-core processor. The key blocks of the proposal are a toolchain that prepares the program code to run on many cores, a dispatch unit within the processor that makes effective use of the parallelized code, and a much smarter communication method between these two key blocks.
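A conceptual sketch of ideas 3)-5) follows: the compiler-produced meta-information is modeled as a per-block dependency set, and a dispatcher issues any block whose dependencies are satisfied to a free core. All names are hypothetical and threads stand in for cores.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# block name -> (set of blocks it depends on, work function); the dependency
# sets play the role of compile-time meta-information.
blocks = {
    'load_a':  (set(),                 lambda: 'A'),
    'load_b':  (set(),                 lambda: 'B'),
    'combine': ({'load_a', 'load_b'},  lambda: 'A+B'),
    'report':  ({'combine'},           lambda: 'done'),
}

def dispatch(blocks, cores=4):
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=cores) as pool:
        while len(done) < len(blocks):
            for name, (deps, fn) in blocks.items():
                if name not in done and name not in running and deps <= done:
                    running[name] = pool.submit(fn)      # issue out of order
            finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
            for name in [n for n, f in running.items() if f in finished]:
                done.add(name)
                running.pop(name)
    return done

print(dispatch(blocks))
```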
ISBN (print): 9781450323055
The desire to build a computer that operates in the same manner as our brains is as old as the computer itself. Although computer engineering has made great strides in hardware performance as a result of Dennard scaling, and even great advances in 'brain-like' computation, the field still struggles to move beyond sequential, analytical computing architectures. Neuromorphic systems are being developed to transcend the barriers imposed by silicon power consumption, to develop new algorithms that help machines achieve cognitive behaviors, and to both exploit and enable further research in neuroscience. In this talk I will discuss a system implementing spiking neural networks. These systems hold the promise of an architecture that is event based, broad and shallow, and thus more power efficient than conventional computing solutions. This new approach to computation, based on modeling the brain and its simple but highly connected units, presents a host of new challenges. Hardware faces tradeoffs such as density or lower power at the cost of high interconnection overhead. Consequently, software systems must face choices about new language design. Highly distributed hardware systems require complex place-and-route algorithms to distribute the execution of the neural network across a large number of highly interconnected processing units. Finally, the overall design, simulation, and testing process has to be entirely reimagined. We discuss these issues in the context of the Zeroth processor and how this approach compares to other neuromorphic systems that are becoming available.
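As a minimal illustration of the event-based style mentioned above, the following toy leaky integrate-and-fire neuron accumulates incoming spike events and emits a spike when a threshold is crossed; it has no relation to the Zeroth processor's actual design.

```python
import numpy as np

def lif(spike_times, weights, steps=100, tau=20.0, v_th=1.0):
    """Toy leaky integrate-and-fire neuron driven by input spike events."""
    v, out_spikes = 0.0, []
    for t in range(steps):
        v *= np.exp(-1.0 / tau)                      # membrane leak
        for i, times in enumerate(spike_times):      # integrate incoming events
            if t in times:
                v += weights[i]
        if v >= v_th:                                # fire and reset
            out_spikes.append(t)
            v = 0.0
    return out_spikes

print(lif([{5, 10, 12}, {11, 13}], weights=[0.6, 0.5]))
```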
ISBN (print): 9780769549323; 9781467356527
Hierarchical clustering plays a very important role in image processing, intrusion detection, and bioinformatics applications, and is one of the most extensively studied branches of data mining. At present, parallel hierarchical algorithms are not very good at processing large data sets. To overcome this shortcoming, a new parallel data preprocessing algorithm based on hierarchical clustering is proposed in this paper. The algorithm reduces the data scale and the runtime, shrinking them to as little as one tenth of the original in the best case. Experiments demonstrate the performance of our algorithm.
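One generic way to realize such preprocessing, shown below purely as an illustration (not the paper's algorithm), is to reduce each data partition in parallel to a small set of representative points before hierarchical clustering is applied.

```python
import numpy as np
from multiprocessing import Pool

def reduce_partition(args):
    """Leader/canopy-style reduction: keep only points farther than `radius`
    from every representative chosen so far."""
    points, radius = args
    leaders = []
    for p in points:
        if all(np.linalg.norm(p - l) > radius for l in leaders):
            leaders.append(p)
    return leaders

if __name__ == '__main__':
    data = np.random.rand(10000, 2)
    parts = np.array_split(data, 4)
    with Pool(4) as pool:
        reduced = [l for part in pool.map(reduce_partition, [(p, 0.05) for p in parts])
                   for l in part]
    # `reduced` (much smaller than `data`) then feeds the hierarchical clustering step
    print(len(data), '->', len(reduced))
```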
The growth in compute speed has outpaced the growth in network bandwidth over the last decades. This has led to an increasing performance gap between local and distributed processing. A parallel database cluster thus has to maximize the locality of query processing. A common technique to this end is to co-partition relations to avoid expensive data shuffling across the network. However, this is limited to one attribute per relation and is expensive to maintain in the face of updates. Other attributes often exhibit a fuzzy co-location due to correlations with the distribution key, but current approaches do not leverage this. In this paper, we introduce locality-sensitive data shuffling, which can dramatically reduce the amount of network communication for distributed operators such as join and aggregation. We present four novel techniques: (i) optimal partition assignment exploits locality to reduce the duration of the network phase; (ii) communication scheduling avoids bandwidth underutilization due to cross traffic; (iii) adaptive radix partitioning retains locality during data repartitioning and handles value skew gracefully; and (iv) selective broadcast reduces network communication in the presence of extreme value skew or large numbers of duplicates. We present comprehensive experimental results, which show that our techniques can improve performance by up to a factor of 5 for fuzzy co-location and a factor of 3 for inputs with value skew.
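Technique (i) can be sketched as follows, under the simplifying assumption that we only minimize the number of tuples shipped (the paper's optimal assignment also considers bandwidth and scheduling); the histogram and node names are illustrative.

```python
def assign_partitions(histogram):
    """histogram[p][n] = number of tuples of partition p already stored on node n.
    Assign each partition to the node holding most of its tuples, so only the
    remainder has to cross the network."""
    assignment, shipped = {}, 0
    for p, per_node in histogram.items():
        best = max(per_node, key=per_node.get)       # exploit fuzzy co-location
        assignment[p] = best
        shipped += sum(c for n, c in per_node.items() if n != best)
    return assignment, shipped

hist = {0: {'n0': 900, 'n1': 50, 'n2': 50},
        1: {'n0': 100, 'n1': 800, 'n2': 100},
        2: {'n0': 300, 'n1': 300, 'n2': 400}}
assignment, network_tuples = assign_partitions(hist)
print(assignment, network_tuples)
```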
Pervasive software should be able to adapt itself to changing environments and user requirements. However, this brings great challenges to the software engineering process. This paper proposes AUModel, a conceptual model for adaptive software, which takes adaptability as an inherent feature and can act as the foundation of the engineering process. By introducing AUModel, the reuse of software adaptation infrastructure as well as the separation of adaptation concerns are enabled, which can facilitate both the development and maintenance of adaptive software. This paper also presents our initial attempts to realize this model, including a middleware prototype that supports the model and an application that validates its effectiveness.
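The separation of adaptation concerns can be illustrated with a generic rule-based adaptation loop, sketched below; it does not reproduce AUModel's actual abstractions, and all class and rule names are invented.

```python
class AdaptationRule:
    def __init__(self, condition, action):
        self.condition, self.action = condition, action

class AdaptiveRuntime:
    """Reusable adaptation infrastructure: the application only registers rules."""
    def __init__(self, app):
        self.app, self.rules = app, []

    def register(self, rule):
        self.rules.append(rule)

    def tick(self, context):
        for r in self.rules:                 # monitor + analyze the context
            if r.condition(context):
                r.action(self.app)           # adapt the application

class VideoApp:
    quality = 'high'

app = VideoApp()
runtime = AdaptiveRuntime(app)
runtime.register(AdaptationRule(lambda ctx: ctx['bandwidth_kbps'] < 500,
                                lambda a: setattr(a, 'quality', 'low')))
runtime.tick({'bandwidth_kbps': 300})
print(app.quality)   # -> 'low'
```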