ISBN: (Print) 9789380544199
Big Data is an umbrella term for working with large-volume and complex data sets. When a data set is large in volume and traditional processing applications are inadequate, distributed databases are needed. Big data came into existence because earlier technologies were not able to handle such large data from autonomous sources. Finding meaningful and accurate data in large unstructured data is a tedious task for any user. This is the reason why classification techniques came into the picture for big data. With the help of classification methods, unstructured data can be turned into an organized form so that a user can access the required data easily. These classification techniques can be applied over big transactional databases to provide data services to users from large-volume data sets. Classification is an aspect of machine learning, and there are basically two broad categories: supervised and unsupervised classification. In this paper, we study variants of supervised classification methods. A comparison is also made on the basis of their advantages and limitations.
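As a minimal illustration of supervised classification (not one of the specific variants surveyed in the paper), a 1-nearest-neighbour classifier assigns a new point the label of its closest labelled training example; the toy data set below is hypothetical:

```python
import math

# Toy labelled data set (hypothetical): feature vectors with class labels.
train = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
         ((5.0, 5.1), "B"), ((4.8, 5.3), "B")]

def classify_1nn(point, examples):
    """Assign the label of the nearest training example (1-NN)."""
    nearest = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]

print(classify_1nn((1.1, 1.0), train))  # nearest neighbours belong to class "A"
```

The same supervised pattern generalizes: any classifier is trained on labelled examples and then predicts labels for unseen points.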
Improving the memory access behavior of parallel applications is one of the most important challenges in high-performance computing. Non-Uniform Memory Access (NUMA) architectures pose particular challenges in this context: they contain multiple memory controllers, and the selection of a controller to serve a page request influences the overall locality and balance of memory accesses, which in turn affect performance. In this paper, we analyze and improve the memory access pattern and overall memory usage of large-scale irregular applications on NUMA machines. We selected HashSieve, a very important algorithm in the context of lattice-based cryptography, as a representative example, due to (1) its extremely irregular memory pattern, (2) its large memory requirements and (3) its unsuitability to other computer architectures, such as GPUs. We optimize HashSieve with a variety of techniques, focusing both on the algorithm itself and on the mapping of memory pages to NUMA nodes, achieving a speedup of over 2x.
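A locality-driven page-placement heuristic of the kind discussed above can be sketched as follows; the access trace and the most-frequent-accessor policy are illustrative assumptions, not the exact mapping policy used for HashSieve:

```python
from collections import Counter

# Hypothetical access trace: (page_id, numa_node_id) pairs observed at runtime.
accesses = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (2, 0), (2, 1), (2, 1)]

def place_pages(trace):
    """Map each page to the NUMA node that accesses it most often,
    a locality heuristic in the spirit of first-touch/next-touch policies."""
    per_page = {}
    for page, node in trace:
        per_page.setdefault(page, Counter())[node] += 1
    return {page: counts.most_common(1)[0][0]
            for page, counts in per_page.items()}

print(place_pages(accesses))  # {0: 0, 1: 1, 2: 1}
```

A real implementation would apply such a mapping with OS facilities such as `move_pages` on Linux; balance across controllers would also need to be weighed against pure locality.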
Accumulated cost surfaces (ACSs) are a tool for spatial modelling used in a number of fields. Some relevant applications, especially in the areas of multi-criteria evaluation and spatial optimization, require the availability of several ACSs on the same raster, which may result in a significant computational cost. In this paper, we discuss some techniques available in the literature for accelerating the ACS computation using graphics processing units (GPUs) and CUDA. We also describe in detail a new CUDA algorithm suitable for the computation of multiple ACSs. Finally, we present some preliminary results on a test case, including an experimental comparison against a fast sequential implementation running on a CPU.
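For readers unfamiliar with ACSs: an accumulated cost surface stores, for every raster cell, the minimum accumulated traversal cost from a set of source cells. A sequential Dijkstra-based sketch (4-connected, edge cost averaged between adjacent cells; this is a baseline illustration, not the paper's CUDA algorithm) looks like this:

```python
import heapq

def accumulated_cost_surface(cost, sources):
    """Minimum accumulated traversal cost from any source cell,
    via Dijkstra over a 4-connected raster."""
    rows, cols = len(cost), len(cost[0])
    acs = [[float("inf")] * cols for _ in range(rows)]
    heap = [(0.0, r, c) for r, c in sources]
    for _, r, c in heap:
        acs[r][c] = 0.0
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acs[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Edge cost: average of the two cells' friction values.
                nd = d + (cost[r][c] + cost[nr][nc]) / 2
                if nd < acs[nr][nc]:
                    acs[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acs

raster = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
print(accumulated_cost_surface(raster, [(0, 0)])[2][2])  # 4.0 (path avoids the 9)
```

Computing several ACSs on the same raster repeats this whole pass once per source set, which is what makes GPU acceleration attractive.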
In recent years, advances in digital sensors, communication, computation and storage have created huge collections of data. Many real applications in various fields require efficient and effective management of these large-scale, graph-structured data, demanding the design of new techniques and platforms for analyzing, processing and mining large-scale graphs. There are distributed graph-processing platforms running on a cluster of machines as well as non-distributed platforms working on a single machine. Most platforms use homogeneous processors such as multi-core CPUs, while several utilize both multi-core CPUs and many-core GPUs. The diversity of the available graphs, the processing algorithms and the graph-processing platforms makes the selection of a platform a difficult task. In this paper, we provide a comparative study of a selection of open-source graph-processing platforms. We evaluate their performance, scalability and energy efficiency, and discuss the reasons behind the observed differences for designers and users of graph-processing platforms.
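As a concrete example of the kind of workload such comparisons typically run, here is a minimal single-machine PageRank kernel (the graph and parameters are illustrative; the surveyed platforms distribute exactly this style of iterative computation):

```python
def pagerank(edges, iters=20, d=0.85):
    """Minimal PageRank over an edge list (single-machine sketch)."""
    nodes = {n for edge in edges for n in edge}
    out_deg = {n: 0 for n in nodes}
    for src, _ in edges:
        out_deg[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1.0 - d) / len(nodes) for n in nodes}
        for src, dst in edges:
            nxt[dst] += d * rank[src] / out_deg[src]
        rank = nxt
    return rank

# Illustrative graph: "a" is linked to by both other nodes.
graph = [("b", "a"), ("c", "a"), ("a", "b"), ("a", "c")]
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # "a" accumulates the most rank
```

How efficiently a platform executes the inner edge loop, and how it partitions the edge list across machines or GPU threads, is precisely what such benchmark studies measure.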
ISBN: (Print) 9781509052639
Cloud computing has brought many features and advantages to organizations and computer users, and Cloud providers distribute many applications and services in an economical way. Even though companies and clients have started using Cloud computing, they are still concerned about their data's security because the data are stored and controlled by the Cloud providers [9]. In this paper, a technique is explored to improve query-processing performance while protecting database tables on a Cloud by encrypting them so that they remain secure. In addition, four techniques are designed to index and partition the data; the indexed data are stored together with the encrypted table on the Cloud or server. The indexes and partitions are used to select only the required part of the data from the Cloud or outsourced data. The indexed data can be used to increase performance when data are requested from the encrypted table. To compare the efficiency of the proposed methods, results are presented in the form of graphs. In addition, the paper explains how to improve performance by combining two of the methods to partition the data.
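The general idea of indexing and partitioning encrypted outsourced data can be sketched with range buckets; the scheme below is illustrative only (it is not one of the paper's four techniques, and the hash is a stand-in for real encryption):

```python
import hashlib

def encrypt(value, key=b"demo-key"):
    """Stand-in for real encryption (never use a hash as encryption)."""
    return hashlib.sha256(key + str(value).encode()).hexdigest()

def bucket(salary, width=1000):
    """Partition the plaintext domain into fixed-width range buckets."""
    return salary // width

# Client uploads (bucket_id, ciphertext) pairs; plaintext never leaves.
rows = [1500, 1700, 2200, 9800]
server = [(bucket(s), encrypt(s)) for s in rows]

# Query "salary between 1000 and 1999": the server returns only bucket 1;
# the client decrypts and filters exactly. Here we count the candidates.
candidates = [c for b, c in server if b == 1]
print(len(candidates))  # the two salaries 1500 and 1700 fall in bucket 1
```

The server never sees plaintext values, yet queries touch only a fraction of the encrypted table, which is the performance benefit such indexes target.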
ISBN: (Print) 9783319436593; 9783319436586
Silent data corruption (SDC) poses a great challenge for high-performance computing (HPC) applications as we move to extreme-scale systems. If not dealt with properly, SDC has the potential to influence important scientific results, leading scientists to wrong conclusions. In previous work, our detector was able to detect SDC in HPC applications to a certain level by using the peculiarities of the data (more specifically, its "smoothness" in time and space) to make predictions. Accurate predictions allow us to detect corruptions when data values are far "enough" from them. However, these data-analytic solutions are still far from fully protecting applications to a level comparable with more expensive solutions such as full replication. In this work, we propose partial replication to overcome this limitation. More specifically, we have observed that not all processes of an MPI application experience the same level of data variability at exactly the same time. Thus, we can smartly choose and replicate only those processes for which our lightweight data-analytic detectors would perform poorly. Our results indicate that our new approach can protect the MPI applications analyzed with 49-53% less overhead than that of full duplication with similar detection recall.
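The process-selection idea can be illustrated as follows; the variability metric (standard deviation of step-to-step deltas) and the threshold are illustrative stand-ins for the paper's actual criterion:

```python
import statistics

# Hypothetical per-rank time series of a monitored field value.
process_data = {
    0: [1.00, 1.01, 1.02, 1.01],   # smooth: data-analytic detector suffices
    1: [1.0, 9.5, 0.2, 7.8],       # highly variable: predictions unreliable
    2: [2.00, 2.02, 2.01, 2.03],
}

def choose_replicas(series_by_rank, threshold=0.5):
    """Replicate only ranks whose data variability exceeds a threshold;
    the remaining ranks keep the cheap prediction-based SDC detector."""
    picked = []
    for rank, series in series_by_rank.items():
        deltas = [b - a for a, b in zip(series, series[1:])]
        if statistics.pstdev(deltas) > threshold:
            picked.append(rank)
    return picked

print(choose_replicas(process_data))  # only the noisy rank gets duplicated
```

Duplicating one rank of three instead of all three is what drives the reported overhead reduction relative to full replication.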
ISBN: (Print) 9781509033157
The data volume of many scientific applications has substantially increased in the past decade and continues to increase due to the rising need for high-resolution, fine-granularity scientific discovery. The data movement between storage and compute nodes has become a critical performance factor and has attracted intense research and development attention in recent years. In this paper, we propose a novel solution, named Active burst-buffer, to reduce unnecessary data movement and to speed up scientific workflows. Active burst-buffer enhances the existing burst-buffer concept with analysis capabilities by reconstructing the cached data into a logical file and providing a MapReduce-like computing framework for programming and executing analysis codes. An extensive set of experiments was conducted to evaluate the performance of Active burst-buffer against existing mainstream schemes, and improvements of more than 30% were observed. The evaluations confirm that Active burst-buffer enables efficient in-transit data analysis on burst-buffer nodes and is a promising solution for scientific discovery with large-scale data sets.
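A MapReduce-like in-transit analysis pass over buffered chunks can be sketched as below; the chunk contents, function names and the word-count analysis are illustrative, since the abstract does not describe Active burst-buffer's actual API:

```python
from collections import defaultdict

# Simulated data chunks cached on burst-buffer nodes before reaching storage.
cached_chunks = ["error warn info", "info info error"]

def run_analysis(chunks, map_fn, reduce_fn):
    """Apply map over each cached chunk, group values by key,
    then reduce per key — the classic MapReduce shape."""
    groups = defaultdict(list)
    for chunk in chunks:
        for key, val in map_fn(chunk):
            groups[key].append(val)
    return {k: reduce_fn(vs) for k, vs in groups.items()}

counts = run_analysis(cached_chunks,
                      map_fn=lambda chunk: [(w, 1) for w in chunk.split()],
                      reduce_fn=sum)
print(counts["info"])  # "info" occurs three times across the chunks
```

Running such a pass on the buffer nodes means only the (small) analysis result, not the raw data, needs to move onward, which is the data-movement saving the paper targets.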
Emerging nanoscale silicon photonics, with its advances in the fabrication and integration of on-chip CMOS-compatible optical elements, is good news for system designers. Optical Networks-on-Chip (ONoCs) could be the next generation of NoCs. Moreover, hybrid opto-electrical networks may provide higher bandwidth, lower latency and better power dissipation by exploiting both optical and electrical characteristics on multicore platforms. The cluster-based technique connects processing cores locally through electrical interconnects, while the clusters themselves are connected together through an optical waveguide. The experimental results show that for most benchmark applications, a cluster size of 4 is an appropriate choice for optimizing the energy-delay product (EDP).
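The selection criterion itself is simple: for each candidate cluster size, multiply measured energy by measured delay and take the minimum. The figures below are made-up placeholders, not measurements from the paper; only the EDP = E × D selection logic is standard:

```python
# Hypothetical per-configuration measurements (placeholder numbers).
measurements = {
    2: {"energy_j": 5.0, "delay_s": 1.30},
    4: {"energy_j": 4.2, "delay_s": 1.10},
    8: {"energy_j": 4.0, "delay_s": 1.40},
}

def best_cluster_size(data):
    """Pick the cluster size minimizing the energy-delay product (EDP)."""
    edp = {size: m["energy_j"] * m["delay_s"] for size, m in data.items()}
    return min(edp, key=edp.get)

print(best_cluster_size(measurements))  # 4 minimizes E*D in this toy data
```

EDP is favoured over raw energy or raw delay because it penalizes configurations that save power only by running slower, or vice versa.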
ISBN: (Print) 9783319440651
The proceedings contain 24 papers. The special focus in this conference is on Short Papers, Big Data Applications and Principles, Data Centered Smart Applications and the ADBIS Doctoral Consortium. The topics include: towards automated performance optimization of BPMN business processes; pixel-based analysis of information dashboard attributes; towards adaptive distributed top-k query processing; basis functions as pivots in space of users preferences; towards semi-structured JSON big data; skyline algorithms on streams of multidimensional data; canonical data model for data warehouse; a quality-based query rewriting algorithm for data integration; towards spatial crowdsourcing in vehicular networks using mobile agents; shift of image processing technologies to column-oriented databases; influence of parallelism property of streaming engines on their performance; reducing big data by means of context-aware tailoring; feature ranking and selection for big data sets; a bagged associative classifier for big data frameworks; a new parallel approximate subspace clustering algorithm; smart modeling for lightweight mobile application development methods; an implementation method of an information credibility calculation system for emergencies such as natural disasters; model capsules for research and engineering networks; usage of aspect-oriented programming in adaptive application structure; short-term user behaviour changes modelling; and a framework for managing distinct versions of data in relational databases.
ISBN: (Print) 9781509046508
The proceedings contain 174 papers. The topics discussed include: distribution network reconfiguration for control of the demand contract with transmission system;offline transmission system analysis with reduced distribution networks;overview of on-line and off-line ampacity identification techniques of bare overhead transmission line;grid frequency support by single-phase electric vehicles employing an innovative virtual inertia controller;the role of residential HVAC units in demand side flexibility considering end-user comfort;asymmetries of earthing arrangements and equipotential bonding systems in buildings and the effects on EMC;optical wavelength ratiometric monitoring system for data centre CWDM applications;implementing a distributed firewall using a DHT network applied to smart grids;study of missing meter data impact on domestic load profiles clustering and characterisation;an innovative information and communication technology architecture to the V2G concept implementation;Portugal as a producer of biomass fuels for power production: an analysis of logistic costs associated to wood pellets exportation;diversification of Brazilian energy matrix by connecting distributed generation sources fuelled by biogas from swine manure;and hybrid series-parallel PWM dimming technique for integrated-converter-based HPF led drivers.