ISBN: 9781728165820 (print)
Sequential and parallel applications use most of their data privately in a multicore system. Recent proposals exploit this observation to reduce the area of coherence directories or the memory access latency. These proposals are driven by the classification of memory data as private or shared, and their effectiveness depends on the amount of data detected as private. Existing proposals perform the private/shared classification at page granularity, which leaves a noticeable number of memory blocks misclassified. We propose a mechanism that works at block granularity and uses the translation lookaside buffer (TLB) to detect private data accurately, which increases the effectiveness of proposals that rely on a private/shared classification. Simulation results show that the block-grain approach obtains 17.0% more accessed private miss data than the page-grain approach, which translates to a 6.02% improvement in system performance over the page-grain approach.
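The gap between page-grain and block-grain classification can be illustrated with a small sketch. It assumes 4 KiB pages, 64-byte blocks, and a hypothetical access trace of (core, address) pairs; it treats a region as private when only one core touches it and does not model the TLB-based detection mechanism the abstract describes.

```python
from collections import defaultdict

PAGE_SIZE = 4096   # bytes per page (assumed for illustration)
BLOCK_SIZE = 64    # bytes per cache block (assumed for illustration)


def classify(accesses, granule):
    """Label each region of `granule` bytes as 'private' or 'shared'.

    A region is private when exactly one core accessed it.
    """
    cores_by_region = defaultdict(set)
    for core, addr in accesses:
        cores_by_region[addr // granule].add(core)
    return {region: "private" if len(cores) == 1 else "shared"
            for region, cores in cores_by_region.items()}


def private_block_count(accesses, granule):
    """Count accessed blocks that a classifier working at `granule` treats as private."""
    labels = classify(accesses, granule)
    blocks = {addr // BLOCK_SIZE * BLOCK_SIZE for _, addr in accesses}
    return sum(1 for b in blocks if labels[b // granule] == "private")


# Hypothetical trace of (core id, byte address) pairs.
# Cores 0 and 1 share a single block of page 0; everything else is private.
trace = [(0, 0), (0, 64), (0, 128), (1, 128), (1, 4096)]
page_private = private_block_count(trace, PAGE_SIZE)
block_private = private_block_count(trace, BLOCK_SIZE)
```

In this trace one shared block makes the whole of page 0 shared under page-grain classification, so two genuinely private blocks in that page are misclassified; block-grain classification recovers them.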
ISBN: 9780769543284 (print)
The paper proposes a new approach to solving the role mining problem in role-based access control systems. The approach applies genetic algorithms, heuristic optimization methods that are effective when the search space is too large to be explored fully. To realize the genetic algorithms, we propose several novelties: multiple chromosomes per individual, representation of genes as complex objects, division of selection and mutation into several phases, and accounting for data confidentiality and availability in the fitness functions, among others. The proposed genetic algorithms were tested on randomly generated data sets for the "basic" and "edge" role mining problems. The test results indicate that genetic algorithms can be applied successfully to solve the main kinds of role mining problems efficiently.
ISBN: 9781509060580 (print)
In this paper, we explore the application of bio-inspired approaches to the association rule mining (ARM) problem, with the aim of accelerating the extraction of correlations between items in sizeable data instances. We propose a new bio-inspired GPU-based model that benefits from massive GPU threading by evaluating multiple rules in parallel on the GPU. To validate the proposed model, the most widely used bio-inspired approaches (GA, PSO, and BSO) were executed on a GPU to solve well-known large ARM instances. Experiments were carried out on a 64-bit quad-core Intel Xeon E5520 processor coupled to an Nvidia Tesla C2075 GPU. The results show that the genetic algorithm outperforms PSO and BSO. Moreover, it outperforms the state-of-the-art GPU-based ARM approaches on the challenging Webdocs instance.
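Evaluating a rule means scoring it against the transaction database, and each rule's score is independent of the others, which is what makes the work easy to spread across GPU threads. The sequential sketch below computes the two standard ARM measures, support and confidence. The function name, the toy transactions, and the candidate rules are illustrative assumptions, and the abstract does not state which fitness function the authors use.

```python
def evaluate_rule(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    both = antecedent | consequent
    # Count transactions containing the antecedent, and those containing the whole rule.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy transaction database (hypothetical data).
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Candidate rules as (antecedent, consequent) pairs; each is scored independently,
# so on a GPU every rule (or every rule/transaction pair) could map to its own thread.
rules = [({"bread"}, {"milk"}), ({"milk"}, {"butter"})]
scores = [evaluate_rule(transactions, a, c) for a, c in rules]
```

A population-based search such as GA, PSO, or BSO would call an evaluation of this kind for every candidate rule in every generation, so it dominates the running time on large instances.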
ISBN: 9780769543284 (print)
Infrastructure as a Service providers use virtualization to abstract their hardware and to create a dynamic data center. Virtualization enables the consolidation of virtual machines (VMs) as well as their migration to other hosts during runtime. Each provider has its own strategy for operating a data center efficiently. We present a rule-based mapping algorithm for VMs that automatically adapts the mapping between VMs and physical hosts. It offers an interface through which policies can be defined and combined in a generic way. The algorithm performs the initial mapping at request time as well as remapping during runtime, and it handles policy and infrastructure changes. We extended the open source IaaS solution Eucalyptus and evaluated it with typical policies: maximizing compute performance and VM locality to achieve high performance, and minimizing energy consumption. The evaluation was done on state-of-the-art servers in our own data center and by simulations using a workload from the Parallel Workloads Archive. The results show that our algorithm performs well in dynamic data center environments.
ISBN: 9781665469586 (print)
We take a step forward in developing high-performance codes for the convolution, based on the Winograd transformation, that are easy to customize for different processor architectures. In our approach, the portability of the solution is improved by introducing vector intrinsics to exploit the SIMD (single-instruction multiple-data) capabilities of current processors, as well as OpenMP pragmas to exploit multi-threaded parallelism. While this sacrifices a fraction of the computational performance, our experimental results on two distinct processors, with Intel Xeon Skylake and ARM Cortex A57 architectures, show that the impact is affordable and still yields a Winograd-based solution that is competitive with the general method for the convolution based on the so-called im2col transform followed by a matrix-matrix multiplication.
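For readers unfamiliar with the Winograd transformation, the textbook one-dimensional case F(2,3) shows the idea: two outputs of a 3-tap filter are computed with four multiplications instead of six. The sketch below is plain scalar Python for illustration only; it is not the authors' vectorized or multi-threaded code, and the function names are assumptions.

```python
def winograd_f23(d, g):
    """Compute two outputs of a 3-tap filter g over a 4-element input tile d.

    Uses the standard Winograd F(2,3) minimal filtering algorithm, which needs
    4 element-wise multiplications instead of the 6 used by the direct method.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform followed by the 4 multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference result: direct sliding-window computation of the same two outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


tile = [1.0, 2.0, 3.0, 4.0]
taps = [2.0, 4.0, 6.0]
fast = winograd_f23(tile, taps)
reference = direct_f23(tile, taps)
```

In a convolutional layer the 2-D variant applies the same three transforms to small tiles, and the multiplication stage is the part that maps naturally onto SIMD lanes.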
ISBN: 9781728165820 (print)
As recent neural networks are improved to be more accurate, their model sizes are growing exponentially. A huge number of parameters must therefore be loaded from and stored in the memory hierarchy, and computed in processors, to perform the training or inference phase of neural network processing. The growing number of parameters poses a major challenge for real-time deployment, since improvements in memory bandwidth cannot keep up with the growth in model complexity. Although some operations in neural network processing, such as computing convolutional layers, are computationally intensive, computing dense layers faces a memory bandwidth bottleneck. To address this issue, the paper proposes Partition Pruning for dense layers, which reduces the required parameters while taking parallelization into consideration. We evaluated the performance and energy consumption of parallel inference of partitioned models, which showed a 7.72x speedup and a 2.73x reduction in the energy used for computing the pruned fully connected layers of the TinyVGG16 model, compared with running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.
ISBN: 9781665414555 (print)
Data movement between the memory subsystem and the processing unit is a crippling performance and energy bottleneck for data-intensive applications. Near-memory processing (NMP) is a promising solution for alleviating this bottleneck. The introduction of 3D-stacked memories and, more importantly, hybrid memory systems enables the long-desired NMP capability. This work explores the feasibility and efficacy of NMP on a hybrid memory system for a given set of applications. We first redefine a set of NMP-centric performance metrics in order to analyze the efficacy of a given processing unit. Leveraging the proposed metrics, we characterize various sets of applications to assess the suitability of a processing unit in terms of performance. Specifically, we motivate the efficiency of NMP subsystems for processing memory-intensive applications when 3D-NVM technologies are employed.
ISBN: 9781665414555 (print)
Stream processing applications compute streams of data and provide insightful results in a timely manner, and parallel computing is necessary to accelerate their execution. Because these applications are becoming increasingly dynamic and long-running, a potential solution is to apply dynamic changes at runtime. However, it is challenging for humans to monitor executions continuously and optimize them manually. In this paper, we propose self-adaptiveness of the parallel patterns used, enabling flexible on-the-fly adaptations. The proposed solution is evaluated with an existing programming framework in experiments with a synthetic and a real-world application. The results show that the proposed solution is able to self-adapt dynamically to the most suitable parallel pattern configuration and to achieve performance competitive with the best static cases. The feasibility of the proposed solution encourages future optimizations and further applications.
ISBN: 9781509060580 (print)
Particle tracking plays an important role in numerous fields of science. In this paper, we present TraCCA, an algorithm for detecting and tracking particles that reconstructs trajectories based on geometrical difference evaluation and centroid displacement analysis. The method works for n-dimensional input data, provided that each particle is represented by at least a centroid space coordinate and a geometrical entity that describes its shape. Since 2-D images are a common source of such data, we also present a framework for image manipulation based on Extended Cellular Automata (XCA). We have applied and validated TraCCA in investigating the motility of B. subtilis injected in a microfluidic device, using 4100 images taken at 100 frames per second. Results show that the framework is able to reconstruct the trajectories, as the computed motion parameters are in accordance with those reported in the literature.
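The role of centroid displacement in linking detections across frames can be sketched with a greedy nearest-centroid matcher. This is a simplified stand-in, not TraCCA itself: it ignores the geometrical difference evaluation, and the function name, the displacement threshold, and the sample coordinates are assumptions made for illustration.

```python
import math


def link_centroids(prev, curr, max_disp):
    """Greedily match each previous centroid to the nearest unused current centroid.

    Returns a dict mapping an index in `prev` to an index in `curr`. Particles with
    no candidate within `max_disp` stay unmatched. Works for any dimensionality,
    since math.dist accepts points of any (equal) length.
    """
    links, used = {}, set()
    for i, p in enumerate(prev):
        best, best_dist = None, max_disp
        for j, c in enumerate(curr):
            dist = math.dist(p, c)
            if j not in used and dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            links[i] = best
            used.add(best)
    return links


# Hypothetical centroids from two consecutive frames.
frame0 = [(0.0, 0.0), (10.0, 10.0)]
frame1 = [(10.5, 9.5), (0.4, 0.3), (50.0, 50.0)]
links = link_centroids(frame0, frame1, max_disp=2.0)
```

Chaining these per-frame links over a whole image sequence yields one trajectory per particle; the unmatched centroid in the second frame would start a new trajectory.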
ISBN: 9781467387767 (print)
From an information security point of view, embedded devices are elements of complex systems operating in a potentially hostile environment. Developing embedded devices is therefore a complex task that often requires expert solutions. Developing secure embedded devices is complex for two reasons: many types of threats and attacks may affect the device, and in practice security is usually considered only at the final stage of the development process, by adding extra security features. The paper proposes a design technique, and an application of it, that facilitates the development of secure and energy-efficient embedded devices. The technique organizes the search for the best combinations of security components by solving an optimization problem. The efficiency of the proposed technique is demonstrated through the development of a room perimeter protection system.