Low power Systems-on-Chip (SoCs), originally developed in the context of mobile and embedded technologies, are becoming attractive to the scientific community given their increasing computing performance, coupled with relatively low cost and power demand. In this work, we investigate the potential of SoCs for realistic scientific workloads, in particular from the bioinformatics and astrophysics domains. We selected a series of parallel, computationally intensive scientific applications and ported them to a cluster of development boards based on low power SoCs. The performance results obtained for the different applications are reported and compared with those obtained on a typical x86 HPC node.
ISBN: (Print) 9781509012886
The flexibility of Field Programmable Gate Arrays (FPGAs), as well as their parallel processing capabilities, makes them a good choice for digital signal processing in communication systems. However, further improvements in performance are now stalled by the frequency wall: FPGA-based devices are clocked below 1 GHz. New methodologies that can deliver performance optimization within this frequency limit are therefore highly essential. In this context, efficient modulation techniques such as Quadrature Amplitude Modulation (QAM) and a mixed time- and frequency-domain approach are used in this paper to implement a generic, scalable FPGA-based QAM transmitter, with the filter parallelization executed in the mixed domain. The system developed in this paper achieves a throughput of 4 Gb/s for the QAM-16 format with a clock frequency as low as 62.5 MHz, thereby offering a promising methodology for applications where higher clock frequencies are a hard limit.
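For orientation, the reported numbers are consistent with straightforward parallelism arithmetic. A sketch, assuming one QAM symbol is produced per parallel lane per clock cycle (a detail the abstract does not state explicitly):

```latex
% Throughput of a P-lane QAM transmitter, one symbol per lane per clock:
%   R = P * f_clk * log2(M)
% For QAM-16 (log2 M = 4 bits/symbol), f_clk = 62.5 MHz, R = 4 Gb/s:
\[
  R = P \, f_{\mathrm{clk}} \log_2 M
  \quad\Rightarrow\quad
  P = \frac{R}{f_{\mathrm{clk}} \log_2 M}
    = \frac{4\ \text{Gb/s}}{62.5\ \text{MHz} \times 4} = 16 ,
\]
% i.e. the reported figures imply roughly 16 parallel symbol lanes.
```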
ISBN: (Print) 9783038355250
The design of durable structure components requires durability analysis in CAE systems. Such analysis requires batch processing of multiple same-type calculation routines, and cloud computation is a favorable choice for this task. To develop CAE durability modules, an analytical approach to estimating the efficiency of different batch processing systems is necessary. Such modules are intended for durability analysis of pre-hydrogenated and statically loaded structure components with initial defects. The durability estimate is defined as the crack growth time elapsed from the initial defect state to fracture of the structure component. A crack kinetics model was used to simulate the fracture process, as required for safe operation of a structure; crack length curves were obtained and analyzed. The results were verified against published experimental data on the subject. The empirically specified crack kinetics model assumes a dependence between its parameters and the durability of structure components, so many similar but separate computational tasks have to be performed; to save design time, these can be executed in parallel. An approach to organizing parallel batch processing is described: a distributed cloud application, built on top of Microsoft Azure services, which engages multiple computational resources from a remote cloud server to perform parallel execution of simulations. Additionally, an efficiency criterion for batch processing of parallel simulation tasks is suggested. The authors' criterion can be applied to differentiate between various implementations of batch processing of similar computational tasks, thus enabling sufficient cost-effectiveness of computational resource utilization. This approach is important for design budget planning, because prolonged use of cloud resources gradually increases the cost of the designed products, i.e., the structure components. Using the criterion, the most efficient source of computational power for any given task can be selected both automatically and manually.
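As a rough illustration of one such batch, the sketch below integrates a generic Paris-Erdogan-type crack kinetics law from an initial defect size to a critical crack length and runs several such tasks in parallel. The paper's actual hydrogen-dependent model, its parameters, and the Azure-based dispatch are not given in the abstract; the constants, the ProcessPoolExecutor stand-in, and the cycles_to_fracture helper below are all illustrative assumptions.

```python
# Hypothetical sketch of one durability task: integrate a Paris-Erdogan-type
# crack kinetics law da/dN = C * (dK)^m from an initial defect a0 to a
# critical crack length a_crit. C, m, stress, and geometry are placeholders,
# not the paper's empirically specified model.
import math
from concurrent.futures import ProcessPoolExecutor

def cycles_to_fracture(a0, a_crit, stress, C=1e-12, m=3.0, da=1e-5):
    """Load cycles for a crack to grow from a0 to a_crit (lengths in meters,
    stress in MPa, so dK is in MPa*sqrt(m) and da/dN in m/cycle)."""
    a, n = a0, 0.0
    while a < a_crit:
        dK = stress * math.sqrt(math.pi * a)   # stress intensity range, Y = 1
        n += da / (C * dK ** m)                # cycles spent growing by da
        a += da
    return n

if __name__ == "__main__":
    # Batch of same-type tasks over different initial defect sizes, executed
    # in parallel (locally here; the paper dispatches the batch to Azure).
    defects = [5e-4 * (1 + i) for i in range(8)]
    with ProcessPoolExecutor() as pool:
        durations = list(pool.map(cycles_to_fracture, defects,
                                  [5e-3] * len(defects), [120.0] * len(defects)))
    for a0, n in zip(defects, durations):
        print(f"a0 = {a0:.1e} m -> {n:.3e} cycles to fracture")
```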
Avionics applications need to be certified for the highest criticality standard. This certification includes schedulability analysis and worst-case execution time (WCET) analysis. WCET analysis is only possible when the software is written to be WCET analyzable and when the platform is time-predictable. In this paper we present prototype avionics applications that have been ported to the time-predictable T-CREST platform. The applications are WCET analyzable, and T-CREST is supported by the aiT WCET analyzer. This combination allows us to provide WCET bounds of avionic tasks, even when executing on a multicore processor.
The MapReduce programming model is a popular model that simplifies and speeds up data-parallel applications. However, it is not efficient for iterative applications because of repeated data transmission through HDFS (Hadoop Distributed File System). Conch, a cyclic MapReduce model, is designed for efficient processing of iterative applications. To minimize network overhead, shared data is cached locally and a "map-shuffle" phase is introduced with a combined transmission mechanism. Meanwhile, a prediction scheduler for iterative applications is proposed to achieve better data locality based on runtime information. Experiments show that Conch supports iterative applications transparently and efficiently. Compared with Hadoop and HaLoop in a single-job environment, Conch achieves 13%-17% improvements on K-Means and fuzzy C-Means. In a multi-job environment, improvements of 63.6% and 28.6% are obtained over Hadoop and HaLoop, respectively.
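To make the iteration pattern concrete, here is a minimal local sketch of the kind of workload Conch targets: K-Means written as repeated map and reduce rounds over a point set that is invariant across iterations, so caching it once (rather than re-reading it from HDFS every round) removes the repeated transmission. The kmeans_mapreduce function is illustrative only; it does not reproduce Conch's map-shuffle phase or prediction scheduler.

```python
# Minimal local sketch of the iterative map/reduce pattern Conch targets:
# K-Means where the input points (invariant across iterations) are cached
# once instead of being re-read from distributed storage every round.
import random
from collections import defaultdict

def kmeans_mapreduce(points, k, iters=10):
    centers = random.sample(points, k)           # initial cluster centers
    for _ in range(iters):
        # map: assign each cached point to its nearest center
        groups = defaultdict(list)
        for x, y in points:
            c = min(range(k), key=lambda i: (x - centers[i][0]) ** 2
                                            + (y - centers[i][1]) ** 2)
            groups[c].append((x, y))
        # reduce: recompute each center as the mean of its group
        for c, pts in groups.items():
            centers[c] = (sum(p[0] for p in pts) / len(pts),
                          sum(p[1] for p in pts) / len(pts))
    return centers

points = [(random.gauss(cx, 0.3), random.gauss(cy, 0.3))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
print(kmeans_mapreduce(points, k=3))
```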
ISBN: (Print) 9783319508610
The proceedings contain 14 papers. The special focus in this conference is on Brain-Inspired Computing. The topics include: Human Brainnetome Atlas and its potential applications in brain-inspired computing; workflows for ultra-high resolution 3D models of the human brain on massively parallel supercomputers; including gap junctions into distributed neuronal network simulations; finite-difference time-domain simulation for three-dimensional polarized light imaging; visual processing in cortical architecture from neuroscience to neuromorphic computing; bio-inspired filters for audio analysis; sophisticated LVQ classification models - beyond accuracy optimization; classification of FDG-PET brain data by generalized matrix relevance LVQ; a cephalomorph real-time computer; towards the ultimate display for neuroscientific data analysis; sentiment analysis and affective computing: methods and applications; and deep representations for collaborative robotics.
ISBN: (Print) 9781509016235
Computer vision has played a key role in developing object detection and tracking techniques for surveillance systems. Most implementations currently employed are based on serial execution on general-purpose processors, but the high cost and complexity of such implementations do not make them a viable option for real-time surveillance. The system proposed here is implemented on a Field Programmable Gate Array (FPGA), the Zynq XC7Z020 board, using a modified background subtraction algorithm for real-time object detection and tracking. The presence of numerous configurable logic blocks, distributed memory, and hard Digital Signal Processing (DSP) modules offers great flexibility in achieving temporal and spatial parallelism. The implementation uses the Xilinx ISE software and is programmed in VHDL. The OV7670 camera used in this work has a resolution of 0.3 megapixels and captures video at 30 fps. The reference frame and the subsequent incoming frames are stored in different memory modules before the modified background subtraction algorithm is applied to these frames to obtain the difference image. After comparison with a threshold, the resultant image is displayed and its addresses are stored in order to track the object. The system works in real time with minimal lag between capture and display. Moreover, the entire system is optimized in terms of speed, memory requirements, and the number of logic elements used, which makes it suitable for real-time surveillance applications.
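For reference, here is a NumPy sketch of the baseline operation the paper builds on: a stored reference frame is differenced against each incoming frame, thresholded into a binary mask, and reduced to a bounding box for tracking. The paper's "modified" variant and its VHDL pipeline are not detailed in the abstract, so the detect helper and its threshold value below are illustrative assumptions.

```python
# Baseline frame-differencing sketch of the detection step described above:
# subtract a stored reference frame from each incoming frame, threshold the
# difference image, and report a bounding box for tracking.
import numpy as np

def detect(reference, frame, threshold=30):
    """Return a binary foreground mask and its bounding box (or None)."""
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    mask = diff > threshold                      # binary difference image
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return mask, None
    return mask, (xs.min(), ys.min(), xs.max(), ys.max())  # x0, y0, x1, y1

# Synthetic grayscale frames standing in for the OV7670's 30 fps stream.
ref = np.zeros((480, 640), dtype=np.uint8)
cur = ref.copy()
cur[100:150, 200:260] = 200                      # a bright moving object
mask, box = detect(ref, cur)
print("object bounding box:", box)
```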
ISBN: (Print) 9781450346177
The proceedings contain 39 papers. The topics discussed include: H2F: a hierarchical Hadoop framework for big data processing in geo-distributed environments; performance characterization of Hadoop workloads on SR-IOV-enabled virtualized InfiniBand clusters; a visual analytics approach to author name disambiguation; applying big data warehousing and visualization techniques on PingER data; efficient service discovery in decentralized online social networks; towards longitudinal analysis of a population's electronic health records using factor graphs; disease gene discovery of single-gene disorders based on complex network; identifying patient experience from online resources via sentiment analysis and topic modelling; a study of factuality, objectivity and relevance: three desiderata in large-scale information retrieval; on exploiting data locality for iterative MapReduce applications in hybrid clouds; spatial and temporal analysis of urban space utilization with renewable wireless sensor network; spatial big data for designing large scale infrastructure: a case-study of electrical road systems; a real-time big data analysis framework on a CPU/GPU heterogeneous cluster: a meteorological application case study; a benchmarking platform for analyzing corpora of traces: the recognition of the users' involvement in fields of competencies; and not too late to identify potential churners: early churn prediction in telecommunication industry.
ISBN: (Print) 9783319298160
The proceedings contain 13 papers. The special focus in this conference is on Mathematical and Engineering Methods in Computer Science. The topics include: Programming support for future parallel architectures; flexible interpolation for efficient model checking; understanding transparent and complicated users as instances of preference learning for recommender systems; span-program-based quantum algorithms for graph bipartiteness and connectivity; fitting aggregation operators; practical exhaustive generation of small multiway cuts in sparse graphs; self-adaptive architecture for multi-sensor embedded vision system; exceptional configurations of quantum walks with Grover's coin; performance analysis of distributed stream processing applications through colored Petri nets; GPU-accelerated real-time mesh simplification using parallel half edge collapses; classifier ensemble by semi-supervised learning; the challenge of increasing safe response of antivirus software users; and weak memory models as LLVM-to-LLVM transformations.
Computational systems are nowadays composed of basic computational components that combine multiprocessors with coprocessors of different types, typically several graphics processing units (GPUs) or many integrated cores (MICs) devices, and these components are combined in heterogeneous clusters of nodes with different characteristics, including coprocessors of different types and varying numbers of nodes at different speeds. Software previously developed and optimized for simpler systems needs to be redesigned and reoptimized for these new, more complex systems. The adaptation of autotuning techniques for basic linear algebra routines to hybrid multicore+multiGPU and multicore+multiMIC systems is analyzed. The matrix-matrix multiplication kernel, which is optimized for the different components of a computational system through guided experimentation, is studied. The routine is installed on each node of the cluster, and the information generated by the individual installations may be used for a hierarchical installation across the cluster. The basic matrix-matrix multiplication may, in turn, be used inside higher-level routines, which delegate their efficient execution to the optimization of the lower-level routine. Experimental results are satisfactory on different multicore+multiGPU and multicore+multiMIC systems, so the guided search for execution configurations with satisfactory execution times proves to be a useful tool for heterogeneous systems, where the complexity of the system makes the correct use of highly efficient routines and libraries difficult.
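A minimal sketch of the guided-experimentation idea follows, with the tile size of a blocked matrix-matrix multiplication standing in for the richer parameters the paper tunes (work splits across multicore CPUs, GPUs, and MICs). The per-node and hierarchical cluster installation steps are omitted, and the blocked_gemm and autotune helpers are illustrative assumptions, not the paper's routines.

```python
# Sketch of guided experimentation: empirically time a blocked matrix-matrix
# multiplication for a few candidate block sizes and keep the fastest per
# node. The resulting table is what an "installation" would record for reuse
# by higher-level routines.
import time
import numpy as np

def blocked_gemm(a, b, bs):
    """C = A @ B computed tile by tile with square tiles of side bs."""
    n = a.shape[0]
    c = np.zeros_like(a)
    for i in range(0, n, bs):
        for k in range(0, n, bs):
            for j in range(0, n, bs):
                c[i:i + bs, j:j + bs] += a[i:i + bs, k:k + bs] @ b[k:k + bs, j:j + bs]
    return c

def autotune(n=1024, candidates=(64, 128, 256, 512)):
    """Return {block_size: seconds} measured on this node's hardware."""
    a, b = np.random.rand(n, n), np.random.rand(n, n)
    timings = {}
    for bs in candidates:
        t0 = time.perf_counter()
        blocked_gemm(a, b, bs)
        timings[bs] = time.perf_counter() - t0
    return timings

timings = autotune()
print("best block size:", min(timings, key=timings.get), timings)
```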