ISBN:
(Print) 9780819476371
In this paper we propose a method for runtime profiling of applications at the instruction level by analysis of loops. Instead of looking for coarse-grain blocks, we concentrate on fine-grain blocks that are nevertheless costly in terms of execution time. Most code profiling is done in software by instrumenting the application under profile, which incurs a time overhead; in this work, the position of a loop, its body, its size, and its number of executions are stored and analysed by a small non-intrusive hardware block. The paper describes the mapping of the system onto runtime reconfigurable systems. Synthesis results for the fine-grain code detector block and the verification of its functionality are also presented. To demonstrate the concept, the MediaBench multimedia benchmark running on the chosen development platform is used.
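The bookkeeping such a loop detector performs can be sketched in software. The following is a minimal, hypothetical model (not the paper's hardware design): it scans an instruction-address trace and treats every backward control transfer as closing a loop body, recording the loop's start address, end address, and execution count.

```python
from collections import defaultdict

def detect_loops(trace):
    """Detect fine-grain loops in an instruction-address trace.

    A loop iteration is flagged whenever control transfers backwards,
    i.e. the next address is lower than the current one. We record the
    loop body as (branch target, branch source) plus an iteration count;
    the body size follows from the two addresses.
    """
    loops = defaultdict(int)  # (start_addr, end_addr) -> iteration count
    for cur, nxt in zip(trace, trace[1:]):
        if nxt < cur:  # backward branch closes a loop body
            loops[(nxt, cur)] += 1
    return dict(loops)

# Toy trace: the body 0x10..0x13 executes three times (two back-branches).
trace = [0x00, 0x10, 0x11, 0x12, 0x13,
               0x10, 0x11, 0x12, 0x13,
               0x10, 0x11, 0x12, 0x13, 0x20]
print(detect_loops(trace))  # {(16, 19): 2}
```

A hardware block would do the same comparison on the program counter stream, which is why it adds no instrumentation overhead to the profiled application.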
In attribute reduction algorithms, discernibility matrix-based methods and heuristic-based methods are two highly effective approaches. While the prevailing view is that heuristic algorithms are faster than discernibility matrix-based methods, the rise of GPUs and other matrix-based parallel computing devices has enabled discernibility matrix reduction methods to achieve faster computation speeds by leveraging their matrix characteristics. However, few discernibility matrix-based methods can directly adapt to GPU devices, and for unlabeled data, existing discernibility matrix-based methods fail to fully utilize the fuzzy information, resulting in unsatisfactory outcomes. In this paper, we propose a parallel attribute reduction algorithm based on fuzzy discernibility matrices and soft deletion behavior. To achieve parallel computing, we transform the traditional 3-dimensional discernibility matrix into a 2-dimensional matrix. To maximize the use of fuzzy discernibility information, we introduce a fuzzy deletion function, which can effectively update the discernibility matrix by incorporating fuzzy discernibility information. Finally, we propose a stopping mechanism for the algorithm, enabling it to select fewer attributes under appropriate conditions. Experiments demonstrate that our algorithm significantly increases computation speed compared to traditional heuristic algorithms and reduces the number of attributes while maintaining and enhancing the effectiveness of downstream tasks.
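The 3-D-to-2-D flattening can be illustrated with a crisp (non-fuzzy) toy example: instead of an n-by-n matrix of attribute sets, each discernible object pair becomes one boolean row over the attributes, giving a single 2-D array that maps directly onto GPU-style operations. This is a sketch of the idea only, not the paper's fuzzy construction.

```python
import numpy as np

def discernibility_matrix_2d(X, y):
    """Flattened 2-D discernibility matrix for a crisp decision table.

    Rows index object pairs with different decision labels; columns
    index attributes. Entry (p, a) is True iff the pair's two objects
    differ on attribute a, so the whole structure is one boolean
    matrix amenable to vectorized / GPU processing.
    """
    n, _ = X.shape
    rows = []
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] != y[j]:          # only discernible pairs matter
                rows.append(X[i] != X[j])
    return np.array(rows, dtype=bool)

X = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])
y = np.array([0, 1, 0])
M = discernibility_matrix_2d(X, y)
print(M.astype(int))
```

In the fuzzy setting the boolean entries become membership degrees in [0, 1], but the 2-D pair-by-attribute layout is the same.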
When preparing an article on image restoration in astronomy, it is obvious that some topics have to be dropped to keep the work at reasonable length. We have decided to concentrate on image and noise models and on the algorithms to find the restoration. Topics like parameter estimation and stopping rules are also commented on. We start by describing the Bayesian paradigm and then proceed to study the noise and blur models used by the astronomical community. Then the prior models used to restore astronomical images are examined. We describe the algorithms used to find the restoration for the most common combinations of degradation and image models. Then we comment on important issues such as acceleration of algorithms, stopping rules, and parameter estimation. We also comment on the huge amount of information available to, and made available by, the astronomical community.
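For the common astronomical combination of Poisson noise and a known blur, the classical Bayesian restoration iteration is Richardson-Lucy. A minimal 1-D sketch (the toy signal and PSF are illustrative, not from the article):

```python
import numpy as np

def richardson_lucy(y, psf, iters=100, eps=1e-12):
    """Richardson-Lucy deconvolution, 1-D sketch, for Poisson noise:
        x <- x * corr( y / conv(x, psf), psf )
    i.e. multiply by the correlation of the data/model ratio with the
    (normalized) point spread function. x stays nonnegative throughout.
    """
    psf = np.asarray(psf, dtype=float)
    psf_mirror = psf[::-1]
    y = np.asarray(y, dtype=float)
    x = np.full_like(y, y.mean())          # flat nonnegative start
    for _ in range(iters):
        blurred = np.convolve(x, psf, mode="same")
        ratio = y / (blurred + eps)         # eps guards divide-by-zero
        x = x * np.convolve(ratio, psf_mirror, mode="same")
    return x

# Blur a point source (a "star") and recover its position.
x_true = np.zeros(32)
x_true[16] = 10.0
psf = np.array([0.25, 0.5, 0.25])
y = np.convolve(x_true, psf, mode="same")
est = richardson_lucy(y, psf)
print(int(np.argmax(est)))  # 16
```

On noiseless data the iterates sharpen back toward the point source; on real noisy frames, a stopping rule of the kind the article discusses is what prevents noise amplification.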
Convergence speed is one of the most important features of an equilibrium-seeking algorithm. In this article, we address the problem of algorithm acceleration for distributed Nash equilibrium (NE) learning in networked average aggregative games with partial decision information. Harnessing the smoothness of cost functions, we propose a novel accelerated NE learning algorithm by integrating a momentum term into a gradient descent step. We prove that the distributed algorithm converges to the exact Nash equilibrium with constant stepsize by bounding four key consensus error terms. When the cost functions are strongly convex and the interaction graph is undirected and connected, the proposed algorithm enjoys a linear convergence rate O(ρ(M_θ)^k), where ρ(M_θ) is the spectral radius of a parameterized matrix M_θ (θ is the stepsize and k the iteration count). Simulation results under two different communication graphs show that the momentum term does accelerate the algorithm: the number of iterations needed to reach a given relative error is significantly reduced, by up to 80% when the stepsize is small and the graph connectivity is high.
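The core acceleration device, a momentum term added to a gradient step, can be shown in its simplest centralized form (the distributed, game-theoretic machinery of the article is omitted; stepsize and momentum values below are illustrative):

```python
import numpy as np

def heavy_ball(grad, x0, alpha=0.1, beta=0.5, iters=200):
    """Gradient descent with a momentum (heavy-ball) term:
        x_{k+1} = x_k - alpha * grad(x_k) + beta * (x_k - x_{k-1})
    alpha is the stepsize, beta weights the momentum.
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    for _ in range(iters):
        # Update x using the old x, then shift the history.
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x

# Strongly convex quadratic f(x) = 0.5 * x^T A x with minimizer at 0.
A = np.diag([1.0, 10.0])
x_star = heavy_ball(lambda x: A @ x, np.array([5.0, 5.0]))
print(np.linalg.norm(x_star))  # near 0: linear (geometric) convergence
```

For this quadratic the iterates contract geometrically, which is the centralized analogue of the O(ρ(M_θ)^k) rate in the article.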
Cascading failures have become a severe threat to interconnected modern power systems. The ultrahigh complexity of the interconnected networks is the main challenge toward the understanding and management of cascading failures. In addition, high penetration of wind power integration introduces large uncertainties and further complicates the problem into a massive scenario simulation problem. This article proposes a framework that enables a fast cascading path searching under high penetration of wind power. In addition, we ease the computational burden by formulating the cascading path searching problem into a Markov chain searching problem and further use a dictionary-based technique to accelerate the calculations. In detail, we first generate massive wind generation and load scenarios. Then, we utilize the Markov search strategy to decouple the problem into a large number of DC power flow (DCPF) and DC optimal power flow (DCOPF) problems. The major time-consuming part, the DCOPF and the DCPF problems, is accelerated by the dynamic construction of a line status dictionary (LSD). The information in the LSD can significantly ease the computation burden of the following DCPF and DCOPF problems. The proposed method is proven to be effective by a case study of the IEEE RTS-79 test system and an empirical study of China's Henan Province power system.
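The line status dictionary amounts to memoizing power flow solves on the network's topology: scenarios whose cascades reach the same set of tripped lines reuse the stored solution. A minimal sketch with a dummy solver standing in for the real DCPF/DCOPF (the function names here are hypothetical):

```python
def make_cached_solver(solve_dcpf):
    """Line status dictionary (LSD) in miniature: cache solutions keyed
    by the set of tripped lines, so repeated cascade scenarios that
    reach the same network state skip the expensive re-solve."""
    lsd = {}

    def solve(line_status):
        # Key on which lines are out of service, order-independently.
        key = frozenset(i for i, up in enumerate(line_status) if not up)
        if key not in lsd:
            lsd[key] = solve_dcpf(line_status)  # expensive solve happens once
        return lsd[key]

    solve.cache = lsd
    return solve

calls = []

def dummy_dcpf(status):          # stand-in for a real DCPF/DCOPF solve
    calls.append(1)
    return sum(status)           # placeholder "result"

solve = make_cached_solver(dummy_dcpf)
results = [solve([1, 0, 1]), solve([1, 0, 1]), solve([1, 1, 1])]
print(len(calls))  # 2: the repeated scenario hit the dictionary
```

Under massive wind/load scenario sampling, many cascades revisit the same line-status states, which is what makes this dictionary pay off.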
Data security is the focus of information security. As a primary method, file encryption is adopted for ensuring data security. Encryption algorithms created to meet the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are widely used in a variety of systems. These algorithms are computationally highly complex, so the efficiency of encrypting or decrypting large files can drop drastically. To this end, we propose an optimized algorithm that efficiently encrypts and decrypts large files by parallelizing processing tasks on a single heterogeneous many-core processor in the Sunway TaihuLight computer system. First, we port the serial DES and AES programs to our experimental platform. Then we implement a task assignment strategy to test the ported algorithms. Finally, in order to optimize the parallelized algorithms and improve data transmission performance, we apply master-slave communication optimization, a three-stage parallel pipeline, and vectorization. Extensive experiments demonstrate that our optimized algorithm is faster than the state-of-the-art open-source implementations of DES and AES. Compared with the serial processing algorithms, our parallelized DES and AES perform nearly 40 times and 72 times faster, respectively. The work described in this paper leverages existing methods and provides a sound basis for the direction of future research in data encryption.
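The task-assignment idea, split a large file into independent chunks, have workers encrypt them, and reassemble in order, can be sketched generically. The byte-wise XOR below is a toy stand-in for DES/AES, and standard-library threads stand in for the Sunway's heterogeneous compute elements; this illustrates the chunking pattern only, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

KEY = 0x5A  # toy key; a stand-in for a real DES/AES key schedule

def encrypt_chunk(chunk: bytes) -> bytes:
    """Placeholder cipher (byte-wise XOR) standing in for DES/AES."""
    return bytes(b ^ KEY for b in chunk)

def parallel_encrypt(data: bytes, workers: int = 4, chunk: int = 1024) -> bytes:
    """Master-slave task assignment in miniature: the master splits the
    buffer into independent chunks, workers encrypt them in parallel,
    and the master reassembles the results in their original order."""
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(encrypt_chunk, parts))

data = bytes(range(256)) * 16
ciphertext = parallel_encrypt(data)
print(parallel_encrypt(ciphertext) == data)  # True: XOR is an involution
```

Real block-cipher modes constrain this pattern: ECB and CTR parallelize per block, while chained modes like CBC only parallelize decryption, which is part of why pipeline and communication tuning matter on real hardware.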
We develop a novel technique to accelerate the minorization-maximization (MM) procedure for the non-orthogonal multiple access (NOMA) weighted sum rate maximization problem. Specifically, we exploit the Lipschitz continuity of the gradient of the objective function to adaptively update the MM algorithm. With fewer auxiliary variables and a low-complexity second-order cone program (SOCP) to solve in each iteration of the MM algorithm, the proposed approach converges quickly at a small computational cost. Numerical simulation results show that our algorithm greatly outperforms known solutions in terms of achieved sum rates and computational complexity.
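How Lipschitz continuity of the gradient yields an MM scheme can be seen on a toy concave problem: an L-smooth objective admits the quadratic minorizer g(x|y) = f(y) + grad(y)·(x-y) - (L/2)·||x-y||², and maximizing that surrogate in closed form gives a gradient step of size 1/L. This is a generic sketch, not the paper's SOCP-based NOMA formulation.

```python
import numpy as np

def mm_ascent(grad, x0, L, iters=100):
    """MM ascent from the Lipschitz quadratic minorizer of an L-smooth
    concave objective f:
        g(x | y) = f(y) + grad(y).T @ (x - y) - (L / 2) * ||x - y||**2
    The surrogate's maximizer is x = y + grad(y) / L, and f is
    nondecreasing along the iterates (the MM guarantee)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x + grad(x) / L
    return x

# Toy concave objective f(x) = -||x - 1||^2; its gradient is 2-Lipschitz.
grad = lambda x: -2.0 * (x - 1.0)
x = mm_ascent(grad, np.zeros(3), L=2.0)
print(x)  # converges to the maximizer [1, 1, 1]
```

Adapting the curvature estimate L along the run, rather than using a fixed worst-case constant, is what makes such updates "adaptive" and fast in practice.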
This work applies the methodology of the Universal Evolutionary Global Optimization, UEGO, to solve the protein structure optimization problem based on the HP model. The UEGO algorithm was initially designed to solve problems whose solutions were codified as real vectors. However, in this work the HP protein folding solutions are defined as conformations encoded by relative coordinates. Consequently, several main concepts in UEGO have been re-defined, e.g. the representation of a solution, the distance concept, the computation of a middle point, etc. In addition, a new efficient local optimizer has been designed based on the characteristics of the protein model. This work develops the adaptation and implementation of UEGO to the HP model and analyzes the UEGO solutions of HP protein folding for different 3D problems. Finally, the obtained HP solutions are converted into all-atom models so that comparison with real proteins can be carried out, and good agreement is obtained for small-size proteins.
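The relative-coordinate encoding and the HP contact energy can be made concrete with a small 2-D sketch (the work above uses 3-D lattices; the move alphabet here, F forward / L left / R right, is a hypothetical 2-D analogue):

```python
def hp_energy(sequence, moves):
    """HP-model energy of a 2-D conformation given in relative moves:
    'F' keeps direction, 'L'/'R' turn left/right before stepping.
    Energy is -1 per topological H-H contact (lattice neighbours that
    are not consecutive in the chain); returns None if the walk
    self-intersects (an invalid conformation)."""
    DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S
    d = 0
    pos = [(0, 0)]
    for m in moves:
        if m == 'L':
            d = (d + 1) % 4
        elif m == 'R':
            d = (d - 1) % 4
        x, y = pos[-1]
        pos.append((x + DIRS[d][0], y + DIRS[d][1]))
    if len(set(pos)) != len(pos):
        return None
    coord = {p: i for i, p in enumerate(pos)}
    e = 0
    for i, (x, y) in enumerate(pos):
        if sequence[i] != 'H':
            continue
        for dx, dy in DIRS:
            j = coord.get((x + dx, y + dy))
            # j > i + 1 counts each non-consecutive contact exactly once.
            if j is not None and j > i + 1 and sequence[j] == 'H':
                e -= 1
    return e

print(hp_energy("HPPH", "LLL"))  # -1: the chain folds H next to H
```

Defining distance and "middle point" between two such move strings, rather than between real vectors, is exactly the kind of redefinition the adaptation above requires.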
As most of the developed empirical mode decomposition (EMD)-based R-peak detection algorithms consume considerable computation time owing to the large length of the input ECG signal, a new technique that accelerates such methods is needed. Accordingly, a new variant of an EMD-based strategy for R-peak localization is presented. The accelerated variant consists of three essential parts. The first step is length reduction of the input signal by truncation in the Fast Fourier Transform (FFT) domain, followed by application of the inverse FFT, which guarantees a suitable time-domain down-sampling. Consequently, the new input signal of reduced length preserves all the medical information contained in the original lengthy signal. The second part identifies the QRS complex using EMD-based R-peak detection, which comprises a low-pass filter, EMD, and the Hilbert transform. Finally, the third phase is time-domain up-sampling using the FFT, zero-padding, and the inverse FFT (IFFT) to obtain a processed signal with the same length as the original. As a post-processing step, a final refined localization of the R-peaks is performed. The new variant yields the same results, in terms of accuracy, as the standard method, while a significant speed-up ratio of 6.95:1 is reported. Additionally, to further demonstrate the effectiveness of the suggested strategy, it has been applied to accelerate two other efficient algorithms, and satisfactory speed-up ratios of 7.20:1 and 4.23:1, respectively, have been reached.
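The FFT-domain down-sampling of the first step and the zero-padding up-sampling of the third can be sketched in a few lines. A synthetic band-limited sine stands in for an ECG record; the round trip is exact precisely because the signal's energy lies below the truncation cutoff, which is the assumption behind "preserves all the medical information".

```python
import numpy as np

def fft_downsample(x, factor):
    """Length reduction by truncation in the FFT domain: keep only the
    lowest frequency bins, then inverse-transform to a shorter signal."""
    X = np.fft.rfft(x)
    n_out = len(x) // factor
    x_short = np.fft.irfft(X[:n_out // 2 + 1], n=n_out)
    return x_short / factor          # rescale for the shorter transform

def fft_upsample(x, factor):
    """Time-domain up-sampling: zero-pad the spectrum and inverse-FFT
    back to the original length (irfft pads the missing high bins)."""
    X = np.fft.rfft(x)
    return np.fft.irfft(X, n=len(x) * factor) * factor

fs, n = 360, 720                     # a typical ECG sampling rate (Hz)
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 5 * t)        # band-limited test signal
y = fft_upsample(fft_downsample(x, 4), 4)
print(np.max(np.abs(x - y)))         # tiny: round trip is lossless here
```

For real ECG, the QRS energy is concentrated well below the usual sampling rates, so a modest truncation factor shortens the costly EMD stage without moving the R-peaks.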
Theoretical results are reviewed that concern the construction of speed-optimal parallel-pipeline algorithms for mass calculations in solving filtering problems. Optimality is proved within the corresponding classes of algorithms that are equivalent in terms of their information graphs. The effectiveness of using the developed algorithmic constructions for filtering problems is investigated.