Video encoding, due to its high processing requirements, has traditionally been done using special-purpose hardware. Software solutions have been explored but are considered feasible only for nonreal-time applications requiring low encoding rates. However, a software solution using a general-purpose computing system has numerous advantages: it is more available and flexible, and it allows experimenting with, and hence improving, various components of the encoder. In this paper, we present the performance of a software video encoder with MPEG-2 quality on various parallel and distributed platforms. The platforms include an Intel Paragon XP/S and an Intel iPSC/860 hypercube parallel computer as well as various networked clusters of workstations. Our encoder is portable across these platforms and uses a data-parallel approach in which parallelism is achieved by distributing each frame across the processors. The encoder is useful for both real-time and nonreal-time applications, and its performance scales according to the available number of processors. In addition, the encoder provides control over various parameters such as the size of the motion search window, buffer management, and bit rate. The performance results include comparisons of execution times, speedups, and frame encoding rates on various systems.
For this special issue we have selected five papers that address, from several points of view, the problem of efficient utilization of resources in parallel and distributed systems. These papers were among the best papers presented at the IASTED PDCS 2001 conference. The topics covered include: efficient cache strategies for simultaneous execution of threads as well as for the distribution of video-on-demand, efficient communication and failure recovery, and run-time support for the automatic parallelization of dynamic structures.
ISBN:
(Print) 9781450348379
Applications employing the actor model of concurrent computation are becoming popular nowadays. On the one hand, the foundational characteristics of the actor model make it attractive in parallel and distributed settings. On the other hand, effective investigation of poor performance in actor based applications requires dedicated metrics and profiling methods. Unfortunately, little research has been conducted on this topic to date, and developers are forced to investigate suboptimal performance with general-purpose profilers that fall short in locating scalability bottlenecks and performance inefficiencies. This position paper advocates the need for dedicated profiling techniques and tools for actor-based applications, focusing specifically on inter-actor communication and actor utilization. Our preliminary results support the importance of dedicated actor profiling and motivate further research on this topic.
ISBN:
(Print) 9781424427024
The null message algorithm (NMA) is one of the efficient conservative time management algorithms that use null messages to provide synchronization between the logical processes (LPs) in a parallel discrete event simulation (PDES) system. However, the performance of a PDES system can be severely degraded if a large number of null messages must be generated by LPs to avoid deadlock. In this paper, we present a mathematical model based on the quantitative criteria specified in [12] to optimize the performance of NMA by reducing the null message traffic. Moreover, the proposed mathematical model can be used to approximate the optimal values of some critical parameters such as frequency of transmission, Lookahead (L) values, and the variance of null message elimination. In addition, the performance analysis of the proposed mathematical model incorporates both uniform and non-uniform distribution of L values across the multiple output lines of an LP. Our simulation and numerical analysis suggest that an optimal NMA offers better scalability in a PDES system if it is used with a proper selection of the critical parameters.
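The lookahead-based promise at the core of the null message algorithm can be sketched in a few lines (the class and method names below are illustrative assumptions, not the paper's model):

```python
# Minimal sketch of null-message synchronization between two LPs.
# An LP that cannot safely advance sends a null message stamped
# local_clock + lookahead, promising to emit no event earlier than that;
# a larger lookahead means fewer null messages are needed.

class LP:
    def __init__(self, name, lookahead):
        self.name = name
        self.lookahead = lookahead
        self.clock = 0.0
        self.inbound_bound = {}   # last timestamp promised by each neighbor

    def null_message(self):
        # Null message timestamp = local clock + lookahead.
        return (self.name, self.clock + self.lookahead)

    def receive_null(self, sender, ts):
        self.inbound_bound[sender] = ts

    def safe_time(self):
        # An LP may process events up to the minimum of its neighbors' promises.
        return min(self.inbound_bound.values(), default=float("inf"))

a, b = LP("A", lookahead=5.0), LP("B", lookahead=3.0)
b.receive_null(*a.null_message())    # A promises nothing before t = 5.0
a.receive_null(*b.null_message())    # B promises nothing before t = 3.0
print(a.safe_time(), b.safe_time())  # prints 3.0 5.0
```

The toy example already shows why lookahead is one of the critical parameters the model optimizes: B's smaller lookahead is exactly what limits A's safe advance.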
Static analysis, based on scheduling techniques, provides the most typical approach for validation of real-time systems. However, in the case of complex real-time systems such as parallel and distributed systems, many simplifications are made in order to make the analysis tractable. This means that even if the system can be statically validated, the real behaviour of the system in execution may be different enough from its theoretical behaviour to make it invalid. In these cases, an analysis based on measurement of the system in execution constitutes an invaluable aid to the static analysis. This article describes a methodology for the analysis of the temporal behaviour of parallel and distributed real-time systems with end-to-end constraints. The analysis is based on the measurement of a prototype of the system in execution and is supported by a behavioural model. The main components of the model are the sequences of activities throughout the system tasks (transactions), which are carried out in response to input events, causing the corresponding output events. Thus, the temporal behaviour of the system is viewed as a set of real-time transactions competing for the available resources. This article also includes experimental results of applying the methodology to the analysis of a well-known case study. (C) 2000 Elsevier Science B.V. All rights reserved.
ISBN:
(Print) 9781467317146
Prediction of the translation initiation site is of vital importance in bioinformatics, since through this process it is possible to understand the organic formation and metabolic behavior of living organisms. Sequential algorithms are not always a viable solution because mRNA databases are normally very large, resulting in long processing times. Applying parallel and distributed computing resources to such databases can help reduce this time. The objective of this article is to present a class balancing solution for the translation initiation site process using parallel and distributed computing resources in a hybrid model. The results reveal a speedup of up to 23 times compared to sequential methods and performance rates for accuracy, precision, sensitivity, specificity and adjusted accuracy of 91.15%, 39.83%, 89.11%, 88.93% and 89.02%, respectively, for the Homo sapiens database. For the Drosophila melanogaster database, the speedup was 18.33 times and accuracy, precision, sensitivity, specificity and adjusted accuracy were 95.22%, 43.01%, 90.83%, 90.47% and 90.64%, respectively. Both sets of results are significant, and the solution presented in this article thus proved viable for the problem in question.
Scientific datasets of large volumes generated by next-generation computational sciences need to be transferred and processed for remote visualization and distributed collaboration among a geographically dispersed team of scientists. Parallel visualization using high-performance computing facilities is a typical approach to processing such increasingly large datasets. We propose an optimized image compositing scheme with a linear pipeline and adaptive transport to support efficient image delivery to a remote client. The proposed scheme arranges an arbitrary number of parallel processors within a cluster in a linear order and divides the image into a carefully selected number of segments, which flow through the linear in-cluster pipeline and wide-area networks to the remote client consecutively. We analytically determine the segment size that minimizes the final image display time and derive the conditions under which the proposed image compositing and delivery scheme outperforms traditional schemes, including the binary swap algorithm. In order to match the transport throughput for image delivery over wide-area networks to the pipelining rate for image compositing within the cluster, we design a class of transport protocols using stochastic approximation methods that are able to stabilize the data flow at a target rate. The experimental results from remote visualization of large-scale scientific datasets justify the correctness of our theoretical analysis and illustrate the superior performance of the proposed method. (C) 2008 Elsevier Inc. All rights reserved.
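The tradeoff behind the "carefully selected number of segments" can be illustrated with a toy pipeline timing model (the symbols and the per-segment overhead term `c` are assumptions for illustration, not the paper's exact analysis):

```python
# Toy model: an image of S bytes split into m segments flowing through a
# p-stage linear pipeline at bandwidth bw, with a fixed per-segment cost c.
# More segments amortize the pipeline fill, but each adds overhead c, so an
# interior optimum m* exists.
import math

def pipeline_time(S, p, m, bw, c):
    seg = S / m / bw + c          # per-segment transfer time plus overhead
    return (p + m - 1) * seg      # fill the pipeline, then drain m segments

S, p, bw, c = 64e6, 8, 1e9, 1e-4  # 64 MB image, 8 stages, 1 GB/s, 0.1 ms/segment
best_m = min(range(1, 5000), key=lambda m: pipeline_time(S, p, m, bw, c))
# Setting the derivative in m to zero gives m* = sqrt((p - 1) * S / (bw * c)).
analytic_m = math.sqrt((p - 1) * S / (bw * c))
```

In this toy model the numeric minimum lands next to the closed-form `m*`, mirroring the abstract's claim that the optimal segmentation can be determined analytically.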
Many stochastic models and analysis techniques have been proposed in the literature during the last two decades for the performance evaluation of parallel and distributed systems. However, few of them are directly applicable to practical systems, which are generally too complex. In this paper, we analyze the performance of the master-slave computational model, one of the most commonly used models in parallel and distributed computations. We propose a hybrid analytical approach using techniques from the theories of both stochastic task graphs and queueing networks. We apply this method to the analysis of a computational chemistry application running on a Transputer-based system. The proposed method turns out to be not only very efficient in time for large systems but also very accurate compared with measurements. (C) 1998 Elsevier Science B.V. All rights reserved.
The sentry of a concurrent program P is a program that observes the execution of P and issues a warning if P does not behave correctly with respect to a given set of logical properties (owing to a programming error or a failure). The synchronization between the program and the sentry is such that the program never waits for the sentry, the shared storage between them is very small (in fact linear in the number of program variables being observed), and the snapshots read by the sentry are consistent. To satisfy these three requirements, some snapshots may be overwritten by the program before being read by the sentry. We develop a family of algorithms that preserve these requirements for properties involving scalar variables, then extend the algorithms to permit the observation of large data structures without additional overhead. We describe in detail the annotation language with which the properties can be expressed, and a prototype system that we have implemented to generate the sentry automatically for any given concurrent C program. Finally, we present experimental results showing that the overhead incurred by the sentry is on average no worse than ten per cent for snapshots of up to six variables, and that the loss of snapshots prevents the sentry's detection of a single violation in less than four per cent of the cases. Recurring errors are detected at a rate of 100 per cent.
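The wait-free handoff the abstract describes can be sketched with a sequence-counter slot (an assumed illustration in the style of a seqlock, not the authors' implementation): the observed program never blocks, and the sentry accepts a snapshot only if the counter is even and unchanged across its read, so overwritten snapshots are dropped rather than misread.

```python
# Sketch: single slot shared between the observed program (writer) and the
# sentry (reader). The writer never waits; a torn or overwritten snapshot is
# detected by the sequence counter and simply lost, matching the abstract.

class SnapshotSlot:
    def __init__(self):
        self.seq = 0        # odd while a write is in progress
        self.data = None

    def write(self, snapshot):          # program side: never waits
        self.seq += 1                   # mark write in progress
        self.data = tuple(snapshot)     # storage linear in #observed variables
        self.seq += 1                   # mark snapshot consistent

    def read(self):                     # sentry side
        before = self.seq
        data = self.data
        if before % 2 == 0 and self.seq == before:
            return data                 # consistent snapshot
        return None                     # overwritten mid-read: skip it

slot = SnapshotSlot()
slot.write([1, 2])                      # snapshot of two observed variables
assert slot.read() == (1, 2)
```

In a real multi-threaded setting the counter updates and data write would need memory barriers or atomics; the sketch only shows the protocol's validity check.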
The topological structures of the interconnection networks of some parallel and distributed systems are designed as the n-dimensional hypercube Q_n or the n-dimensional folded hypercube FQ_n with N = 2^n processors. For integers 0 <= k <= n - 1, let P_1^k, P_2^k and P_3^k be the properties of having at least k neighbors for each processor, containing at least 2^k processors, and having average degree at least k, respectively. The P-conditional edge-connectivity of G, lambda(P, G), is the minimum cardinality of a faulty edge-cut whose malfunction divides the network into several components, each component satisfying the property P. For each integer 0 <= k <= n - 1 and 1 <= i <= 3, this paper offers a unified method to investigate the P_i^k-conditional edge-connectivity of Q_n and FQ_n. The exact value of the P_i^k-conditional edge-connectivity of Q_n, lambda(P_i^k, Q_n), is (n - k)2^k, and that of FQ_n, lambda(P_i^k, FQ_n), is (n - k + 1)2^k. Our method generalizes the result of Guo and Guo in [The Journal of Supercomputing, 2014, 68:1235-1240] and other previous results.
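The value (n - k)2^k for the hypercube can be made concrete with a small brute-force check (an illustrative sketch, not the paper's proof): the 2^k vertices agreeing on n - k fixed coordinates form a k-subcube, and each of them has exactly n - k neighbors outside it, so the cut isolating the subcube has (n - k)2^k edges.

```python
# Count the edges leaving the k-subcube of Q_n whose last n-k bits are 0.
# Vertices of Q_n are the integers 0..2^n - 1; neighbors differ in one bit.

def subcube_cut_size(n: int, k: int) -> int:
    subcube = {v for v in range(2 ** n) if v >> k == 0}  # last n-k coords = 0
    cut = 0
    for v in subcube:
        for d in range(n):
            if v ^ (1 << d) not in subcube:
                cut += 1
    return cut

for n in range(1, 8):
    for k in range(n):
        assert subcube_cut_size(n, k) == (n - k) * 2 ** k
```

This only exhibits one cut achieving the bound; establishing that no smaller cut leaves all components with the property P_i^k is the substance of the paper.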