Process-network synthesis is the determination of the optimal network structure of a process system together with the optimal configurations and capacities of the operating units incorporated into the system. The aim of developing ever more sophisticated solver algorithms is to find the optimum as fast as possible and to widen the range of practically solvable process synthesis problems. The P-graph framework can effectively reduce the number of structures to be examined and accelerate the search for the optimum by exploiting the combinatorial characteristics of candidate solution structures. A cooperative parallel implementation of P-graph algorithms has been published recently to exploit the capabilities of multi-core and multiprocessor systems (Bartos and Bertok in De Gruyter Ser Logic Appl 1:303-313, 2015). The parallel implementation has increased performance significantly, but this can be further improved by fine-tuning the parameters of the parallel algorithm. Outcomes of experiments on parameter optimization are presented herein.
Minimal functional dependency is an important relationship in relational databases. It can describe special relationships between complex and irregular attributes. Extracting minimal functional dependencies (MFDs) from relational databases is an important database analysis technique. However, as data volumes grow, discovery becomes costly: even the most efficient stand-alone algorithms are exponential in the number of attributes of the relations. Discovering MFDs on a single computer is hard and slow, and it can only be applied to small centralized datasets. It is challenging to discover MFDs from big data, especially large-scale distributed data. Apache Spark is a unified analytics engine for big data processing; we present a new Spark-based algorithm, FastMFDs, for discovering all MFDs from large-scale distributed data in parallel. FastMFDs uses both the RDD framework and the DataFrame framework to store and process distributed data. FastMFDs removes equivalent attributes. FastMFDs also provides a two-way search algorithm for searching and pruning. We evaluated our algorithm on real-life datasets, and it is more efficient and faster than existing discovery methods.
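To make the notion of a minimal functional dependency concrete, the following is a minimal single-machine sketch in plain Python. It is not FastMFDs and does not use Spark: it is a brute-force search that is exponential in the number of attributes, which is exactly the cost the abstract describes. The function names `holds` and `minimal_fds`, the row-as-dict representation, and the choice to ignore empty left-hand sides are all assumptions made for illustration.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # The first rhs value seen for a key is remembered; any later
        # row with the same key but a different rhs value violates the FD.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_fds(rows, attributes):
    """Brute-force search for minimal FDs (hypothetical helper).

    Left-hand sides are tried smallest first, so any candidate that
    contains an already accepted left-hand side is not minimal and is
    skipped. Empty left-hand sides (constant columns) are ignored.
    """
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        minimal_lhs = []
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                if any(set(m) <= set(lhs) for m in minimal_lhs):
                    continue
                if holds(rows, lhs, rhs):
                    minimal_lhs.append(lhs)
                    found.append((lhs, rhs))
    return found


rows = [
    {"id": 1, "zip": "10001", "city": "NY"},
    {"id": 2, "zip": "10001", "city": "NY"},
    {"id": 3, "zip": "94105", "city": "SF"},
]
fds = minimal_fds(rows, ["id", "zip", "city"])
```

On this toy relation, `zip` determines `city` and `city` determines `zip`, while neither determines `id`. A distributed algorithm would need to partition both the rows and the candidate search space, which this sketch does not attempt.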
Statistical fisheries models are frequently used by researchers and agencies to understand the behavior of marine ecosystems or to estimate the maximum acceptable catch of different species of commercial interest. The parameters of these models are usually adjusted through the use of optimization algorithms. Unfortunately, the choice of the best optimization method is far from trivial. This work proposes the use of population-based algorithms to improve the optimization process of the Globally applicable Area Disaggregated General Ecosystem Toolbox (Gadget), a flexible framework that allows the development of complex statistical marine ecosystem models. Specifically, parallel versions of the Differential Evolution (DE) and the Particle Swarm Optimization (PSO) methods are proposed. The proposals include an automatic selection of the internal parameters to reduce the complexity of their usage, and a restart mechanism to avoid local minima. The resulting optimization algorithms are called PMA (Parallel Multirestart Adaptive) DE and PMA PSO, respectively. Experimental results show that the new algorithms are faster and produce more accurate solutions than the other parallel optimization methods already included in Gadget. Although the new proposals have been evaluated on fisheries models, nothing in them is specific to the tested models, and thus they can also be applied to other optimization problems. Moreover, the proposed PMA scheme can be seen as a template that can be easily applied to other population-based heuristics. (C) 2019 Elsevier B.V. All rights reserved.
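As a rough illustration of the two ingredients named above, a population-based optimizer and a restart mechanism, here is a minimal serial sketch in plain Python. It is not PMA DE: it uses the textbook DE/rand/1/bin scheme with fixed parameters (no automatic parameter selection, no parallelism), and the restart strategy is simply a set of independent runs whose best result is kept. All function names, parameter names, and default values are assumptions made for this example.

```python
import random


def differential_evolution(f, bounds, pop_size=20, weight=0.8, cr=0.9,
                           generations=200, rng=None):
    """Textbook DE/rand/1/bin minimizer (illustrative, fixed parameters)."""
    rng = rng or random.Random()
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clamp to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            score = f(trial)
            if score <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def multirestart(f, bounds, restarts=3, seed=0):
    """Run DE from several fresh random populations and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        candidate = differential_evolution(f, bounds, rng=rng)
        if best is None or candidate[1] < best[1]:
            best = candidate
    return best


def sphere(x):
    return sum(v * v for v in x)


point, score = multirestart(sphere, [(-5.0, 5.0), (-5.0, 5.0)], restarts=2, seed=1)
```

Because each restart and each objective evaluation within a generation is independent, both levels are natural candidates for parallel execution, which is presumably where a parallel variant would gain its speed.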
Bulk Synchronous Parallel (BSP) is a model for parallel computing with predictable scalability. BSP has a cost model: programs can be assigned a cost which describes their resource usage on any parallel machine. However, the programmer has to derive this cost manually. This paper describes an automatic method for the derivation of BSP program costs, based on classic cost analysis and approximation of polyhedral integer volumes. Our method requires and analyzes programs with textually aligned synchronization and textually aligned, polyhedral communication. We have implemented the analysis, and our prototype obtains cost formulas that are parametric in the input parameters of the program and the parameters of the BSP computer and thus bound the cost of running the program with any input on any number of cores. We evaluate the cost formulas and find that they are indeed upper bounds, and tight for data-oblivious programs. Additionally, we evaluate their capacity to predict concrete run times in two parallel settings: a multi-core computer and a cluster. We find that when exact upper bounds can be found, they accurately predict run times. In networks with full bisection bandwidth, as the BSP model supposes, results are promising, with errors below 50%.
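For readers unfamiliar with the BSP cost model, the sketch below evaluates the standard textbook cost of a program by hand: each superstep costs w + h·g + l, where w is the maximum local work over all processors, h is the maximum number of words any processor sends or receives, g is the per-word communication gap, and l is the barrier latency. This is the manual calculation, not the automatic derivation the paper describes; the function names and the tuple layout of a superstep are assumptions made for illustration.

```python
def superstep_cost(work, sent, received, g, l):
    """Cost of one BSP superstep: w + h * g + l.

    work, sent and received are per-processor lists for this superstep.
    """
    w = max(work)
    # h-relation: the largest fan-out or fan-in of any single processor.
    h = max(max(s, r) for s, r in zip(sent, received))
    return w + h * g + l


def program_cost(supersteps, g, l):
    """Total cost is the sum of the costs of all supersteps."""
    return sum(superstep_cost(work, sent, received, g, l)
               for work, sent, received in supersteps)


# Two processors, two supersteps (made-up numbers).
supersteps = [
    ([10, 12], [3, 1], [1, 3]),  # w = 12, h = 3
    ([4, 4], [0, 0], [0, 0]),    # w = 4,  h = 0
]
total = program_cost(supersteps, g=2, l=5)
```

With g = 2 and l = 5, the first superstep costs 12 + 3·2 + 5 = 23 and the second costs 4 + 0 + 5 = 9, for a total of 32. A parametric cost formula replaces these concrete numbers with expressions in the program's input size and the number of processors.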
This paper presents a number of optimisations for improving the performance of unstructured computational fluid dynamics codes on multicore and manycore architectures such as the Intel Sandy Bridge, Broadwell and Skylake CPUs and the Intel Xeon Phi Knights Corner and Knights Landing manycore processors. We discuss and demonstrate their implementation in two distinct classes of computational kernels: face-based loops, represented by the computation of fluxes, and cell-based loops, representing updates to state vectors. We show the importance of making efficient use of the underlying vector units in both classes of computational kernels, with special emphasis on the changes required for vectorising face-based loops and their intrinsic indirect and irregular access patterns. We demonstrate the advantage of different data layouts for cell-centred as well as face data structures, and architecture-specific optimisations for improving the performance of gather and scatter operations, which are prevalent in unstructured mesh applications. The implementation of a software prefetching strategy based on auto-tuning is also shown, along with an empirical evaluation of the importance of multithreading for in-order architectures such as Knights Corner. We explore the various memory modes available on the Intel Xeon Phi Knights Landing architecture and present an approach whereby both traditional DRAM and MCDRAM interfaces are exploited for maximum performance. We obtain significant full-application speed-ups of between 2.8X and 3X across the multicore CPUs in two-socket node configurations, 8.6X on the Intel Xeon Phi Knights Corner coprocessor and 5.6X on the Intel Xeon Phi Knights Landing processor in an unstructured finite volume CFD code representative in size and complexity of an industrial application. Program summary. Program Title: some_opt_for_unstructured_cfd. Program Files doi: http://***/10.17632/zyh2zkf3jw.1. Licensing provisions: GNU General Public License 3 (GPL).
Due to the growth of biological databases and biomedical instruments, high-performance active (real-time) signal processing has become a challenge for medical scientists and engineers. Medical applications require a high-performance signal processor that can run scientific and engineering biomedical applications and is easy to program. In this article, we propose a biomedical application processing system (BAPS), based on a biomedical sensor interface and a heterogeneous multi-core processing architecture, together with a biomedical applications toolkit. The biomedical sensor interface supports multiple regular and complex medical signals and provides digital data to the processing system. The BAPS uses a heterogeneous multi-core architecture that processes biomedical applications with a performance of up to 10 billion operations per second and an accuracy of 1 μs. The biomedical application toolkit provides programmability by supporting hardware-level, scientific and artificial intelligence programming. The BAPS provides a single embedded platform solution to process a wide range of biomedical signal and image processing applications. To demonstrate the value of the proposed system, we developed the BAPS hardware architecture and tested it with different biomedical applications. Compared with the baseline system, BAPS improves the performance of active (real-time) applications by up to 12.8 times, processes passive (non-real-time) applications 7.4 times faster, and improves the performance of artificial intelligence applications by 4.84 times. In terms of power and energy, the BAPS draws 1.56 times less dynamic power and consumes 21.85 times less energy.
Many libraries in the HPC field use sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm and performance engineers have already been advocating the systematic combination of analytical performance models with practical measurements for a very long time, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer from the burden of having to specify and maintain very fine-grained and potentially non-portable expectations. In this way, scalability validation can be continuously applied throughout the whole development cycle with very little effort. Using MPI and parallel sorting algorithms as examples, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
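The idea of asserting an asymptotic trend rather than an exact formula can be sketched as follows. This is not the authors' tool or method: it is a minimal illustration that assumes the measured run time follows a power law in the problem size, estimates the exponent with a least-squares fit in log-log space, and compares it with the expected exponent within a tolerance. The function names, the tolerance default, and the sample data are all hypothetical.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def scales_as(sizes, times, expected_exponent, tolerance=0.2):
    """True if the fitted exponent is within tolerance of the expectation."""
    return abs(scaling_exponent(sizes, times) - expected_exponent) <= tolerance


sizes = [1000, 2000, 4000, 8000]
linear_times = [n * 1e-6 for n in sizes]         # synthetic O(n) timings
quadratic_times = [n * n * 1e-9 for n in sizes]  # synthetic O(n^2) timings

ok = scales_as(sizes, linear_times, expected_exponent=1.0)
regression = not scales_as(sizes, quadratic_times, expected_exponent=1.0)
```

In an automated test suite, a check like `scales_as` would run on real measurements; a quadratic trend where linear behavior is expected would fail the test without the developer having to specify constants or platform-specific timings. Logarithmic factors such as n log n are not captured by a pure power-law fit and would need a different model.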
Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. By maintaining two types of cores (fast and slow), AMCs are able to provide high performance within the facility power budget. This paper performs the first extensive evaluation of how portable current HPC applications are to such supercomputing systems. Specifically, we evaluate several execution models on an ARM *** AMC using the PARSEC benchmark suite, which includes representative highly parallel applications. We compare schedulers at the user, OS and runtime levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is more effective when it takes place at the runtime system level, where it improves the baseline by 23%, while the heterogeneous-aware OS scheduling solution improves the baseline by 10%. (C) 2019 Published by Elsevier Inc.
Scalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media, sensor networks, smartphones, and the Web. Scalable big data analysis today can be achieved by parallel implementations that are able to exploit the computing and storage facilities of high performance computing (HPC) systems and clouds, whereas in the near future Exascale systems will be used to implement extreme-scale data analysis. This paper discusses how clouds currently support the development of scalable data mining solutions, and outlines and examines the main challenges to be addressed and solved for implementing innovative data analysis applications on Exascale systems.
In this work, novel memristor-based circuits for implementing electronic synapses and artificial neurons are designed. First, two simple synaptic circuits for implementing weighting calculations in voltage and current modes using twin memristors are proposed. A synaptic weighting operation is defined as a difference function between the twin memristors, which can be adjusted in reverse by applying programmed signals to yield positive, zero, and negative synaptic weights. Second, two neuron circuits using the proposed memristor synapses, in which parallel computing and programming can be achieved, are designed. Finally, the performance of the proposed memristor synapses and neuron circuits, including weight programming, neuron computing, and parallel operation, is analyzed through PSpice simulations. (C) 2018 Elsevier B.V. All rights reserved.