Search results - Inner Mongolia University Library

Joint Meeting of the International Symposium on Distributed Computing and Applications to Business, Engineering and Science / International Conference on Parallel Algorithms and Computing Environments

Authors: Ong, Ghim-Hwee; Fan, Lixin (Natl Univ Singapore, Sch Comp, Dept Comp Sci, Singapore 117548, Singapore)

A two-level fast search algorithm to reduce the encoding time for hexagonal-based fractal image compression is presented. The design of the sequential algorithm is based on the distribution of matched domains in a given image. The first search level previews various portions of the image and identifies promising domains among all possible domains. The second search level picks out domain blocks in the image portion where the first level gives positive results, and compares them with a given range block for encoding. The algorithm is parallelized by a dynamic range distribution scheme to achieve load balancing. Experimental results show that by running the parallelized encoding algorithm on multiple processors, the encoding time is drastically reduced while the quality of image reconstruction is retained. A speed-up of about 9 can be obtained by using 13 processors.

Keywords: image compression fractals parallel processing hexagonal partitioning

GPU-Based Parallel Implementation of k-means Clustering Algorithm for Image Segmentation

IEEE International Conference on Electro/Information Technology (EIT)

Authors: Karbhari, Shruti; Alawneh, Shadi (Oakland Univ, Dept Elect & Comp Engn, Rochester, MI 48063, USA)

ISBN: (print) 9781538653982

Clustering algorithms group a dataset into clusters that have common features. Clustering has applications in computer vision, data mining, market segmentation, etc. The k-means clustering algorithm is one of the most popular algorithms, in which the mean is used as the prototype of the cluster. In this paper, we explore accelerating the performance of k-means clustering using NVIDIA Graphics Processing Units (GPUs) programmed with CUDA C. Different optimization techniques are applied, such as the use of shared memory for image data and the use of constant memory for cluster data. The performance results are evaluated on a range of images from small (256x256 pixels) to large (1024x1024 pixels), with the number of clusters ranging from 4 to 256. We find that, on average, the parallel implementation has a 9x speedup compared to the sequential version for 4 clusters. The speedup increases to 57x as the number of clusters increases to 256. This implementation also performs better than a reference implementation from Northwestern University/UC Berkeley.

Keywords: k-means Clustering GPU Computing CUDA Image Segmentation
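
As a rough illustration of the k-means iteration that the abstract above accelerates, the following is a minimal sequential sketch in plain Python, not the paper's CUDA implementation. The function name `kmeans_1d`, the use of 1-D grayscale intensities, and the fixed iteration count are assumptions made for this example only.

```python
def kmeans_1d(pixels, centroids, iterations=10):
    """Cluster grayscale pixel values around k centroids (hypothetical helper)."""
    centroids = list(centroids)
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: every pixel is handled independently of the others,
        # which is why this loop maps naturally onto one GPU thread per pixel.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in pixels
        ]
        # Update step: each centroid becomes the mean of its assigned pixels.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return labels, centroids


labels, centroids = kmeans_1d([10, 12, 11, 200, 205, 198], [0, 255])
```

The assignment step dominates the cost as the number of clusters grows, since each pixel is compared against every centroid; this is consistent with the abstract's observation that the speedup rises with the cluster count.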

Efficient processing of distributed set queries

PARBASE-90: International Conference on Databases, Parallel Architectures, and Their Applications

Authors: El-Sharkawi, Mohamed E.; Kambayashi, Yahiko (Kyushu Univ, Dep of Comput Sci & Commun Eng, Hakozaki, Jpn)

The problem of efficiently processing queries that manipulate sets is considered with the objective of minimizing the processing cost by reducing the size of transmitted data as much as possible. The semantics of set operations is used to achieve this goal. A set query has the general form SET 1 op SET 2. For two sets to be related by a set operation, their sizes should satisfy a necessary condition. For the two sets to be equal, they should have the same size. For SET 1 to be a subset of SET 2, its size should be less than or equal to the size of SET 2. In the relational model, given two attributes, the size of a set of values from one attribute that is associated with a value from the other attribute can be determined using functional dependency between the two attributes. Using these semantics, a distributed set query can be converted into a distributed nonset query. When the two sets are of size greater than one, however, the query cannot be converted into a nonset query. It is converted into another distributed set query. The size of data transmitted to answer the new query is reduced as much as possible. This is done by sending sets that satisfy the necessary condition of the set operation.

Keywords: Database Systems
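
The size-based necessary conditions described in the abstract above can be sketched as a pre-transmission filter. This is only an illustration under stated assumptions: the function `candidates_to_send`, the dictionary layout of `local_sets`, and the operator names `"equals"` and `"subset"` are hypothetical and do not come from the paper.

```python
def candidates_to_send(local_sets, remote_size, op):
    """Keep only the local sets that could satisfy `local_set op remote_set`.

    Only the size of the remote set is needed, so sets that fail the
    necessary condition are never transmitted.
    """
    if op == "equals":
        # Equal sets must have the same size.
        passes = lambda s: len(s) == remote_size
    elif op == "subset":
        # A subset can be no larger than its superset.
        passes = lambda s: len(s) <= remote_size
    else:
        raise ValueError(f"unsupported set operation: {op}")
    return {key: s for key, s in local_sets.items() if passes(s)}


local_sets = {"a": {1, 2}, "b": {1, 2, 3}, "c": {4}}
to_send_eq = candidates_to_send(local_sets, remote_size=2, op="equals")
to_send_sub = candidates_to_send(local_sets, remote_size=2, op="subset")
```

The filter is necessary but not sufficient: a set that passes still has to be compared element by element at the receiving site.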

DOAF - Replica Localization Module

36th International Conference on Telecommunications and Signal Processing (TSP)

Authors: Cirstea, Calin; Prostean, Octavian; Cirstea, Cosmin (Politehn Univ Timisoara, Automat & Comp Sci Fac, Automat & Appl Informat Dept, Timisoara, Romania)

ISBN: (print) 9781479904020; 9781479904037

We present in this paper the detailed architecture of the replica localization module implemented inside the Distributed Operation Application Framework (DOAF). DOAF is a development framework designed to speed up the development and deployment of distributed applications. The replica localization module (PathFinder) provides standardized interfaces and a default implementation for replica management and localization operations. Besides the localization and management of replicas, the module offers support for topics such as secured connections and data transmission, and malicious replica substitution. The PathFinder management and localization operations are designed to support system-wide optimization techniques by deeply coupling them with the Optimizer module.

Keywords: Bind operation development framework distributed application localization module optimization replica management

Parallel implementation of rotation visual cryptography on GPU using CUDA

2017 International Conference on Advances in Image Processing, ICAIP 2017

Authors: Mayya, Veena; Nayak, Aparna (Manipal University, Department of I and CT, Manipal Institute of Technology, India)

ISBN: (print) 9781450352956

A visual cryptography scheme (VCS) is a cryptography technique in which visual information is encrypted in such a way that decryption can be performed by the human visual system through direct stacking of the encrypted shares. Rotation visual cryptography is an advanced VCS technique in which direct stacking of the shares does not reveal the original image. In the existing literature, VCS encryption techniques take a long time, which can be substantially reduced by using CUDA. The proposed method uses CUDA for the computationally intensive, independent tasks performed on pixels during rotation VCS. A parallel implementation of VCS will be faster and can be adopted in many real-time applications that require image security. Results show that the proposed method is 40-50 times faster than the existing encryption techniques. © 2017 Association for Computing Machinery.

Keywords: Rotation

2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012

2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012

ISBN: (print) 9781467308069

The proceedings contain 105 papers. The topics discussed include: NUMA-aware graph mining techniques for performance and energy efficiency; classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool; containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems; critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications; code generation for parallel execution of a class of irregular loops on distributed memory systems; data-intensive spatial filtering in large numerical simulation datasets; parallel particle advection and FTLE computation for time-varying flow fields; parallel I/O, analysis, and visualization of a trillion particle simulation; forward and adjoint simulations of seismic wave propagation on emerging large-scale GPU architectures; and a divide and conquer strategy for scaling weather simulations with multiple regions of interest.

Efficient implementation of portable C*-like data-parallel library in C++

International Conference on Advances in Parallel and Distributed Computing

Authors: Matsuda, M; Sato, M; Ishikawa, Y (Tsukuba Research Cent, Ibaraki, Japan)

ISBN: (print) 0818678763

The C* language is a data-parallel extension of the C language which incorporates parallel data types. Since the C++ language provides operator overloading, a C++ library can implement the C* parallel extensions with a similar syntax. Although library implementations are highly portable, some overheads make them impractical. The two major overheads incurred are temporaries in each operator application, and the inability to detect regular communication patterns. The C++ overloading mechanism forces a temporary for each operator application. Also, regular communications in C* are syntactically indistinguishable from general point-to-point communications. We tackled these problems extensively in a library. The template mechanism, a type parameterization in C++, is used to eliminate temporaries by delaying operator application and evaluating the entire expression at once. The polymorphic type dispatch mechanism is used to detect regular communications by assigning particular types to potentially regular communications. We have implemented the library on the CM-5, and compared its performance with the C* compiler using three simple examples. The techniques presented offer performance comparable to the C* compiler: close to it or 1.5 times slower in two examples, and even faster in one.

Keywords: parallel processing systems
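
The idea of delaying operator application and evaluating the whole expression at once, described in the abstract above, can be illustrated with a small sketch. The paper does this at compile time with C++ templates; the Python version below only mimics the evaluation order with runtime objects, and the class names `Expr`, `BinOp`, and `Vec` are hypothetical.

```python
class Expr:
    """Base class: operators build an expression tree instead of computing."""

    def __add__(self, other):
        return BinOp(self, other, lambda a, b: a + b)

    def __mul__(self, other):
        return BinOp(self, other, lambda a, b: a * b)


class BinOp(Expr):
    """A delayed binary operation; no temporary vector is created here."""

    def __init__(self, left, right, fn):
        self.left, self.right, self.fn = left, right, fn

    def at(self, i):
        # Evaluate the subtree for a single element on demand.
        return self.fn(self.left.at(i), self.right.at(i))


class Vec(Expr):
    """A data-parallel vector holding concrete values."""

    def __init__(self, data):
        self.data = list(data)

    def at(self, i):
        return self.data[i]

    def assign(self, expr):
        # One pass over the elements evaluates the entire expression,
        # so intermediate results such as b * c are never stored as vectors.
        self.data = [expr.at(i) for i in range(len(self.data))]


a, b, c = Vec([1, 2, 3]), Vec([10, 20, 30]), Vec([2, 2, 2])
out = Vec([0, 0, 0])
out.assign(a + b * c)
```

In this sketch the tree is walked at run time, so it shows the structure of the technique rather than its performance benefit; the C++ template approach resolves the same tree during compilation.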

Low-cost tuning of two-step algorithms for scheduling mixed-parallel applications onto homogeneous clusters

10th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2010

Author: Hunold, Sascha (International Computer Science Institute, Berkeley, CA, United States)

ISBN: (print) 9781424469871

Due to the strong increase in processing units available to the end user, expressing the parallelism of an algorithm is a major challenge for many researchers. Parallel applications are often expressed using a task-parallel model (task graphs), in which tasks can be executed concurrently unless they share a dependency. If these tasks can also be executed in a data-parallel fashion, e.g., by using MPI or OpenMP, then we call it a mixed-parallel programming model. Mixed-parallel applications are often modeled as directed acyclic graphs (DAGs), where nodes represent the tasks and edges represent data dependencies. To execute a mixed-parallel application efficiently, a good scheduling strategy is required to map the tasks to the available processors. Several algorithms for the scheduling of mixed-parallel applications onto a homogeneous cluster have been proposed. MCPA (Modified CPA) has been shown to lead to efficient schedules. In the allocation phase, MCPA considers the total number of processors allocated to all potentially concurrently running tasks as well as the number of processors in the cluster. In this article, it is shown how MCPA can be extended to obtain a more balanced workload in situations where concurrently running tasks differ significantly in the number of operations. We also show how the allocation procedure can be tuned in order to deal not only with regular DAGs (FFT), but also with irregular ones. We also investigate whether additional optimizations of the mapping procedure, such as packing of allocations or backfilling, can reduce the makespan of the schedules. © 2010 IEEE.

Keywords: Scheduling
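
To make the DAG model in the abstract above concrete, the sketch below computes the critical-path length of a small task graph, which is a lower bound on the makespan for fixed task durations. It is not MCPA or any algorithm from the paper; the function name `critical_path_length`, the task names, and the durations are assumptions for illustration. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def critical_path_length(durations, deps):
    """Return the length of the longest dependency chain in a task DAG.

    durations: task -> execution time (hypothetical fixed values)
    deps:      task -> set of tasks that must finish first
    """
    # Make sure every task appears in the graph, even those without predecessors.
    graph = {task: deps.get(task, ()) for task in durations}
    finish = {}
    # static_order() yields each task only after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values())


durations = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
length = critical_path_length(durations, deps)
```

In a mixed-parallel setting the durations themselves depend on how many processors each task is given, which is the trade-off that allocation procedures such as the one discussed in the abstract have to balance.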

Advanced high-performance computer system architectures

NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2007, Vol. 571, No. 1-2, pp. 429-432

Author: Vinogradov, V. I. (Russian Acad Sci, Inst Nucl Res, Moscow 117312, Russia)

The convergence of computer systems and communication technologies is moving toward switched high-performance modular system architectures based on high-speed switched interconnections. Multi-core processors are becoming a more promising route to high-performance systems, and traditional parallel bus system architectures (VME/VXI, cPCI/PXI) are moving to new higher-speed serial switched interconnections. The fundamentals of system architecture development are a compact modular component strategy, low-power processors, new serial high-speed interface chips on the board, and high-speed switched fabric for SAN architectures. An overview is given of advanced modular concepts and new international standards for developing high-performance embedded and compact modular systems for real-time applications. (c) 2006 Elsevier B.V. All rights reserved.

Keywords: microprocessor interface system nodule network communications multiprocessor RT-system cluster interconnect electronics information technology switches telecommunications image processing data acquisition control distributed system coherent router compact node protocol convergence supercomputer data center HPC terminal

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016

ISBN: (print) 9781467388153

The proceedings contain 87 papers. The topics discussed include: the vectorization of the Tersoff multi-body potential: an exercise in performance portability; increasing molecular dynamics simulation rates with an 8-fold increase in electrical power efficiency; TrueNorth ecosystem for brain-inspired computing: scalable systems, software, and applications; evaluating HPC networks via simulation of parallel workloads; PFEAST: a high performance sparse eigenvalue solver using distributed-memory linear solvers; block iterative methods and recycling for improved scalability of linear solvers; pinpointing scale-dependent integer overflow bugs in large-scale parallel applications; compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery; simulation and performance analysis of the ECMWF tape library system; real-time synthesis of compression algorithms for scientific data; serf: efficient scheduling for fast deep neural network serving via judicious parallelism; graph colouring as a challenge problem for dynamic graph processing on distributed systems; an exploration of optimization algorithms for high performance tensor completion; and designing MPI library with on-demand paging (ODP) of InfiniBand: challenges and benefits.
