ISBN (print): 9781538619964
This vision paper reviews the current state of the art and lays out emerging research challenges in parallel processing of large spatio-temporal datasets relevant to a variety of scientific communities. Spatio-temporal data, whether captured through remote sensors (global Earth observations), ground and ocean sensors (e.g., soil-moisture sensors, buoys), social media, hand-held devices, traffic-related sensors and cameras, medical imaging (e.g., MRI), or large-scale simulations (e.g., climate), have always been "big." A common thread among all these collections of datasets is that they are spatial and temporal. Processing and analyzing these datasets requires high-performance computing (HPC) infrastructures. Various agencies, scientific communities, and increasingly society at large rely on spatial data management, analysis, and spatial data mining to gain insights and produce actionable plans. Therefore, an ecosystem of integrated and reliable software infrastructure is required for spatio-temporal big data management and analysis, serving as a crucial tool for solving a wide range of research problems from different scientific and engineering areas and for empowering users with next-generation tools. Realizing this vision requires a multidisciplinary effort to significantly advance domain research and have a broad impact on society. The areas of research discussed in this paper include (i) spatial data mining, (ii) data analytics over remote sensing data, (iii) processing of medical images, (iv) spatial econometrics analyses, (v) MapReduce-based systems for spatial computation and visualization, (vi) CyberGIS systems, and (vii) foundational parallel algorithms and data structures for polygonal datasets, along with the reasons why HPC infrastructures, including graphics accelerators, are needed for time-critical applications.
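As a small illustration of item (v), MapReduce-style spatial computation typically rests on partitioning space into grid cells so that records can be shuffled and aggregated by cell key. The sketch below shows that idea in miniature; the cell size, point values, and all names are illustrative assumptions, not taken from the paper.

```cpp
// Minimal sketch of grid-based spatial aggregation, the core idea behind
// MapReduce-style spatial computation. All names and the 1-degree cell
// size are illustrative assumptions, not taken from the paper.
#include <cmath>
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

struct Point { double lon, lat; };

// "Map" step: assign each point to a grid cell key.
std::pair<int, int> cellOf(const Point& p, double cellDeg) {
    return { static_cast<int>(std::floor(p.lon / cellDeg)),
             static_cast<int>(std::floor(p.lat / cellDeg)) };
}

int main() {
    std::vector<Point> points = { {-122.4, 37.8}, {-122.3, 37.7}, {2.35, 48.85} };
    double cellDeg = 1.0;  // 1-degree grid cells (illustrative)

    // "Reduce" step: aggregate point counts per cell. In a real MapReduce
    // system the shuffle would route keys to workers; here a map suffices.
    std::map<std::pair<int, int>, int> counts;
    for (const Point& p : points) counts[cellOf(p, cellDeg)]++;

    for (const auto& [cell, n] : counts)
        std::printf("cell (%d, %d): %d points\n", cell.first, cell.second, n);
}
```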
ISBN (print): 9781538634035
Designing complex system architectures involves analysing trade-offs between multiple conflicting decision criteria to find a solution which best matches the preferences of the customer. This is usually done in the engineering-characteristic (decision criteria) space, but the customer is generally more interested in higher-level characteristics. For example, the engineering characteristic "modularity" is not of direct interest to a customer, but it is related to their concern "through-life costs", since modular systems can be upgraded more easily. The relationships between customer and engineering concerns are many-to-many, making it difficult to relate the two sets of priorities. This paper proposes an integrated system architecture synthesis framework which aims to maximise customer satisfaction by using customer preferences directly to refine a set of candidate architectures. The novelty of the research lies in the translation from customer preferences to decision-criteria limits on a parallel coordinates plot. This automated flow facilitates rapid re-synthesis of "best" architectures following a change in customer preferences. The time saved allows customers to investigate a wider range of concerns and gain a better understanding of how their priorities influence the solution set. The approach is demonstrated on a case study of a control system for a pressurized water reactor.
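The many-to-many translation the paper describes can be pictured as propagating a customer preference vector through a relationship matrix to obtain weights (and, from those, limits) on the engineering criteria. The sketch below is a hypothetical rendering of that step; the concern names, criteria, and matrix values are invented for illustration and are not the paper's method.

```cpp
// Hypothetical sketch of propagating customer preferences through a
// many-to-many relationship matrix to engineering-criteria weights, as one
// might do before setting axis limits on a parallel coordinates plot.
// The matrix values and names are invented for illustration.
#include <cstdio>
#include <vector>

int main() {
    // Customer concerns: {through-life cost, availability} (weights sum to 1).
    std::vector<double> customer = {0.7, 0.3};

    // rel[i][j]: strength with which engineering criterion j supports
    // customer concern i (e.g., modularity strongly affects through-life cost).
    // Criteria: {modularity, mass, redundancy}.
    std::vector<std::vector<double>> rel = {
        {0.9, 0.2, 0.1},   // through-life cost
        {0.1, 0.0, 0.8},   // availability
    };

    // Derived criterion weight = preference-weighted column sum.
    std::vector<double> criterion(rel[0].size(), 0.0);
    for (size_t i = 0; i < rel.size(); ++i)
        for (size_t j = 0; j < rel[i].size(); ++j)
            criterion[j] += customer[i] * rel[i][j];

    const char* names[] = {"modularity", "mass", "redundancy"};
    for (size_t j = 0; j < criterion.size(); ++j)
        std::printf("%s weight: %.2f\n", names[j], criterion[j]);
}
```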
ISBN (print): 9781538610138
Deep Learning over Big Data (DLoBD) is becoming one of the most important research paradigms for mining value from the massive amounts of gathered data. Many emerging deep learning frameworks run over Big Data stacks such as Hadoop and Spark. With the convergence of HPC, Big Data, and deep learning, these DLoBD stacks are taking advantage of RDMA and multi-/many-core CPUs and GPUs. Even though there is a lot of activity in the field, systematic studies analyzing the impact of RDMA-capable networks and CPUs/GPUs on DLoBD stacks are lacking. To fill this gap, we propose a systematic characterization methodology and conduct extensive performance evaluations on three representative DLoBD stacks (CaffeOnSpark, TensorFlowOnSpark, and BigDL) to expose interesting trends regarding performance, scalability, accuracy, and resource utilization. Our observations show that an RDMA-based design for DLoBD stacks can achieve up to a 2.7x speedup compared to the IPoIB-based scheme. The RDMA scheme also scales better and utilizes resources more efficiently than the IPoIB scheme over InfiniBand clusters. In most cases GPU-based deep learning outperforms CPU-based designs, but not always: for LeNet on MNIST, CPU + MKL achieves better performance than GPU and GPU + cuDNN on 16 nodes. Through our evaluation, we see that there is still considerable room to improve the designs of current-generation DLoBD stacks.
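The characterization methodology amounts to sweeping a configuration space of stacks, interconnects, and processor variants and recording a common metric for each combination. The sketch below shows the shape of such a sweep; the runBenchmark stub is a placeholder (a real harness would launch distributed training), though the configuration names are ones the paper compares.

```cpp
// Sketch of the kind of characterization sweep the paper's methodology
// implies: run each DLoBD stack under each interconnect and processor
// configuration and record throughput. runBenchmark is a placeholder stub;
// only the configuration names come from the paper.
#include <cstdio>
#include <string>
#include <vector>

double runBenchmark(const std::string&, const std::string&, const std::string&) {
    return 0.0;  // placeholder: would launch training and return images/sec
}

int main() {
    std::vector<std::string> stacks = {"CaffeOnSpark", "TensorFlowOnSpark", "BigDL"};
    std::vector<std::string> nets   = {"IPoIB", "RDMA"};
    std::vector<std::string> procs  = {"CPU", "CPU+MKL", "GPU", "GPU+cuDNN"};

    for (const auto& s : stacks)
        for (const auto& n : nets)
            for (const auto& p : procs)
                std::printf("%s / %s / %s -> %.1f images/sec\n",
                            s.c_str(), n.c_str(), p.c_str(), runBenchmark(s, n, p));
}
```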
ISBN (print): 9781509056590
The benefits of applying advanced illumination models to volume visualization have been demonstrated by many researchers. For a parallel, distributed GPU computing environment, however, there is no efficient algorithm for scalable global-illumination calculations. This paper presents a parallel, data-distributed, GPU-accelerated algorithm for volume rendering with advanced lighting. Our approach features tunable soft shadows for enhancing perception of complex spatial structures and relationships. For lighting calculations, our design effectively avoids data exchange among GPUs. Performance evaluation on a GPU cluster using up to 128 GPUs shows rendering performance that scales with both the number of GPUs and the volume data size.
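One way to avoid inter-GPU exchange during lighting, consistent with the abstract's description, is to restrict each shadow march to the data brick resident on the local GPU. The sketch below illustrates that pattern only; the opacity field, the softness parameter, and all constants are stand-ins, not the paper's actual algorithm.

```cpp
// Sketch of the key idea as read from the abstract: each GPU marches shadow
// rays only within its local data brick, so no inter-GPU exchange is needed
// during lighting. The opacity field and all constants are illustrative
// stand-ins, not the paper's actual algorithm.
#include <algorithm>
#include <cstdio>

struct Vec3 { double x, y, z; };

// Placeholder: opacity of the local brick at a point (a thin slab here).
double sampleOpacity(const Vec3& p) { return (p.z > 0.5 && p.z < 0.6) ? 0.4 : 0.0; }

bool insideLocalBrick(const Vec3& p) {
    return p.x >= 0 && p.x <= 1 && p.y >= 0 && p.y <= 1 && p.z >= 0 && p.z <= 1;
}

// Accumulate occlusion toward the light, stopping at the brick boundary.
double softShadow(Vec3 p, const Vec3& lightDir, double step, double softness) {
    double transmittance = 1.0;
    while (insideLocalBrick(p) && transmittance > 0.01) {
        // "softness" scales down each sample's contribution, weakening the
        // shadow (a crude stand-in for a tunable soft-shadow model).
        transmittance *= 1.0 - softness * sampleOpacity(p) * step;
        p = {p.x + lightDir.x * step, p.y + lightDir.y * step, p.z + lightDir.z * step};
    }
    return std::max(transmittance, 0.0);
}

int main() {
    double t = softShadow({0.5, 0.5, 0.2}, {0, 0, 1}, 0.01, 0.8);
    std::printf("light reaching sample: %.2f\n", t);
}
```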
ISBN (print): 9781509056590
Sort-last parallel rendering can be improved by considering the rendering of multiple images at a time. Most parallel rendering algorithms consider the generation of only a single image. This makes sense for interactive rendering, where the parameters of each rendering are not known until the previous rendering completes. However, in situ visualization often generates multiple images that do not need to be created sequentially. In this paper we present a simple and effective approach to improving parallel image-generation throughput by amortizing the load and overhead across multiple image renders. Additionally, we validate our approach with a performance study exploring the achievable speedups in a variety of image-based in situ use cases and rendering workloads. On average, our approach shows a 1.5- to 3.7-fold improvement in performance, and in some cases a 10-fold improvement.
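The amortization idea can be read as batching: each rank renders all requested views locally, and the partial images are then composited together so that setup and communication overhead are shared across the batch. The sketch below shows that control flow with stub render and composite functions; none of the names reflect the paper's actual implementation.

```cpp
// Sketch of the amortization idea: instead of rendering and compositing one
// image per pass, each rank renders all requested camera views locally and
// the partial images go through a single batched sort-last reduction,
// sharing setup and communication overhead. Stubs stand in for a real
// renderer and an MPI-style composite; nothing here is the paper's API.
#include <cstdio>
#include <vector>

struct Camera { double azimuth; };
struct Image  { int id; };

Image renderLocal(const Camera&, int view)    { return Image{view}; }  // stub
void  compositeBatch(std::vector<Image>&)     { /* one batched reduction
                                                    for all views */ }

int main() {
    std::vector<Camera> views;
    for (int i = 0; i < 8; ++i) views.push_back(Camera{i * 45.0});  // 8 views

    std::vector<Image> partials;
    partials.reserve(views.size());
    for (size_t v = 0; v < views.size(); ++v)
        partials.push_back(renderLocal(views[v], static_cast<int>(v)));

    compositeBatch(partials);  // amortized: one reduction instead of eight
    std::printf("composited %zu views in one batch\n", partials.size());
}
```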
Visualizations for computational biology have been developing for over 50 years. With recent advances in both computational biology and computer graphics techniques, these fields have witnessed rapid technological adv...
Cloud computing is an essential technology for Big Data analytics and services. A cloud computing system is often composed of a large number of parallel computing and storage devices. Monitoring the usage and performance of such a system is important for efficient operation, maintenance, and security. Tracing every application on a large cloud system is untenable due to scale and privacy issues, but profile data can be collected relatively efficiently by regularly sampling the state of the system, including properties such as CPU load, memory usage, and network usage, creating a set of multivariate time series for each system. Adequate tools for studying such large-scale, multidimensional data are lacking. In this paper, we present a visual analysis approach to understanding and analyzing the performance and behavior of cloud computing systems. Our design is based on similarity measures and a layout method that portray the behavior of each compute node over time. When a large number of behavioral lines are visualized together, distinct patterns often appear, suggesting particular types of performance bottlenecks. The resulting system provides multiple linked views, which allow the user to interactively explore the data, or a selected subset of it, at different levels of detail. Our case studies, which use datasets collected from two different cloud systems, show that this visual approach is effective in identifying trends and anomalies in the systems.
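A behavioral-line layout of this kind presupposes some similarity measure between per-node metric traces. The sketch below shows one plausible choice, Euclidean distance between z-normalized time series; the paper's actual measures may differ, and the traces here are hypothetical.

```cpp
// Minimal sketch of one plausible similarity measure for the per-node time
// series the paper describes: Euclidean distance between z-normalized
// metric traces. This only illustrates the idea of comparing node behaviors;
// the paper's actual measures may differ.
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<double> zNormalize(std::vector<double> s) {
    double mean = 0, var = 0;
    for (double v : s) mean += v;
    mean /= s.size();
    for (double v : s) var += (v - mean) * (v - mean);
    double sd = std::sqrt(var / s.size());
    for (double& v : s) v = sd > 0 ? (v - mean) / sd : 0.0;
    return s;
}

double distance(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0;
    for (size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(d);
}

int main() {
    // Hypothetical CPU-load traces for two compute nodes.
    std::vector<double> nodeA = {0.2, 0.8, 0.9, 0.3, 0.2};
    std::vector<double> nodeB = {0.1, 0.7, 0.8, 0.2, 0.1};
    std::printf("behavioral distance: %.3f\n",
                distance(zNormalize(nodeA), zNormalize(nodeB)));
}
```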
The generation of short pulses of ion beams through the interaction of an intense laser with a plasma sheath offers the possibility of compact and cheaper ion sources for many applications, from fast ignition and radiography of dense targets to hadron therapy and injection into conventional accelerators. To enable the efficient analysis of large-scale, high-fidelity particle accelerator simulations using the Warp simulation suite, the authors introduce the Warp In situ Visualization Toolkit (WarpIV). WarpIV integrates state-of-the-art in situ visualization and analysis using VisIt with Warp, supports management and control of complex in situ visualization and analysis workflows, and implements integrated analytics to facilitate query- and feature-based data analytics and efficient large-scale data analysis. WarpIV enables, for the first time, distributed-parallel in situ visualization of the full simulation data on high-performance compute resources as the data is being generated by Warp. The authors describe the application of WarpIV to study and compare large 2D and 3D ion accelerator simulations, demonstrating significant differences in the acceleration process between 2D and 3D simulations. WarpIV is available to the public via https://***/berkeleylab/warpiv. Supplemental material (https://***/extra/***) provides more details regarding the memory profiling and optimization and the Yee grid recentering optimization results discussed in the main article.
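Query-based in situ analytics of the kind WarpIV facilitates boils down to filtering simulation output by a predicate while it is still in memory, so that only particles of interest reach the visualization stage. The sketch below is a hypothetical miniature of that pattern; the Particle fields and the 1 MeV threshold are illustrative and are not WarpIV's API.

```cpp
// Hypothetical sketch of the query-based in situ analytics pattern: filter
// the particle population by a predicate as the simulation runs, keeping
// only particles of interest for visualization. The Particle fields and the
// 1 MeV threshold are illustrative, not WarpIV's API.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

struct Particle { double x, z, energyMeV; };

int main() {
    std::vector<Particle> particles = {
        {0.1, 0.0, 0.3}, {0.2, 0.1, 1.8}, {0.4, 0.2, 2.5}, {0.5, 0.3, 0.7}};

    // Query: select the accelerated tail of the beam (energy > 1 MeV).
    std::vector<Particle> selected;
    std::copy_if(particles.begin(), particles.end(), std::back_inserter(selected),
                 [](const Particle& p) { return p.energyMeV > 1.0; });

    std::printf("selected %zu of %zu particles\n",
                selected.size(), particles.size());
}
```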
ISBN (print): 9781509036820
Large-scale molecular dynamics simulations produce terabytes of data that are impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high-performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers is the operating-system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-System Graphics Library (EGL) to eliminate the requirement for windowing-system software on compute nodes, thereby eliminating a significant obstacle to broader use of high-performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and on supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline the benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications.
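The core of the technique is creating an OpenGL context through EGL with no X server or other windowing system present. The sketch below shows a minimal off-screen setup along those lines, using a pbuffer surface; error handling is trimmed, and a production deployment would also consider the EGL device-platform extensions to select a specific GPU. This is a generic EGL sketch, not VMD's code.

```cpp
// Minimal off-screen EGL setup of the kind the paper describes: an OpenGL
// context with no windowing system, rendering to a pbuffer surface.
// Build (Linux): g++ egl_headless.cpp -lEGL -lGL
#include <EGL/egl.h>
#include <cstdio>

int main() {
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, nullptr, nullptr)) {
        std::fprintf(stderr, "no EGL display\n");
        return 1;
    }

    // Ask for a config that supports off-screen (pbuffer) desktop OpenGL.
    const EGLint cfgAttribs[] = {
        EGL_SURFACE_TYPE,    EGL_PBUFFER_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
        EGL_NONE};
    EGLConfig cfg;
    EGLint numCfg = 0;
    eglChooseConfig(dpy, cfgAttribs, &cfg, 1, &numCfg);

    const EGLint pbAttribs[] = {EGL_WIDTH, 1024, EGL_HEIGHT, 1024, EGL_NONE};
    EGLSurface surf = eglCreatePbufferSurface(dpy, cfg, pbAttribs);

    eglBindAPI(EGL_OPENGL_API);  // desktop OpenGL rather than OpenGL ES
    EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, nullptr);
    eglMakeCurrent(dpy, surf, surf, ctx);

    std::printf("headless GL context current; ready to render off-screen\n");
    eglTerminate(dpy);
    return 0;
}
```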
The 2017 Visualization Career Award goes to Charles (Chuck) Hansen in recognition of his contributions to large-scale data visualization, including advances in parallel and volume rendering, novel interaction techniques, and techniques for exploiting hardware; for his leadership in the community as an educator, program chair, and editor; and for providing vision for the development and support of the field. The IEEE Visualization and Graphics Technical Committee (VGTC) is pleased to award Charles Hansen the 2017 Visualization Career Award.