ISBN: 9783642030949 (print)
Cross-matching multi-wavelength data among multiple catalogs is a basic and unavoidable step in making distributed digital archives accessible and interoperable. Because current catalogs often contain millions or billions of objects, cross-matching is a typical data-intensive computation problem. In this paper, a highly efficient parallel approach to astronomical cross-matching is introduced. We present our partitioning and parallelization approach, then address several problems introduced by task partitioning and give the corresponding solutions. These include the sky-splitting function we selected, HEALPix, which plays a key role in both task partitioning and database indexing, and a fast bit-operation algorithm we developed to resolve the block-edge problem. Our experiments show that the function has a marked performance advantage over previous functions and is fully applicable to large-scale cross-matching.
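The partition-then-match idea can be sketched in Python under stated assumptions. This sketch does not use HEALPix or the paper's bit-operation algorithm. As a stand-in, it bins sources into cubic cells around the unit sphere and handles the block-edge problem by also searching the 26 adjacent cells. The names `unit_vector` and `cross_match`, the (RA, Dec) tuples in degrees, and the cell size are illustrative choices, not taken from the paper.

```python
import math
from collections import defaultdict
from itertools import product


def unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def cross_match(cat_a, cat_b, radius_arcsec):
    """Return (i, j) index pairs whose angular separation is within the radius.

    Assumption: a cubic grid replaces HEALPix as the partitioning function.
    """
    radius = math.radians(radius_arcsec / 3600.0)
    # Straight-line (chord) distance equivalent to the angular radius.
    chord = 2.0 * math.sin(radius / 2.0)
    cell = chord  # cell side >= chord, so matches lie in adjacent cells

    def key(v):
        return tuple(math.floor(c / cell) for c in v)

    # Partition catalog B into cells (the "blocks").
    index = defaultdict(list)
    for j, (ra, dec) in enumerate(cat_b):
        v = unit_vector(ra, dec)
        index[key(v)].append((j, v))

    matches = []
    for i, (ra, dec) in enumerate(cat_a):
        v = unit_vector(ra, dec)
        kx, ky, kz = key(v)
        # Block-edge handling: search the home cell and its 26 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j, w in index.get((kx + dx, ky + dy, kz + dz), ()):
                if math.dist(v, w) <= chord:
                    matches.append((i, j))
    return matches


cat_a = [(10.0, 20.0), (359.9999, 0.0), (100.0, -45.0)]
cat_b = [(10.0001, 20.0), (0.0001, 0.0), (200.0, 10.0)]
pairs = sorted(cross_match(cat_a, cat_b, radius_arcsec=2.0))
```

Because a source only needs its own cell and the immediate neighbours, each cell's bucket could in principle be assigned to a separate worker. The second test pair straddles the RA = 0/360 boundary, which the 3D representation handles without special cases.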
ISBN: 9780769535449 (print)
Physically-based illumination is an essential factor for realistic rendering. In this context, hierarchical radiosity is one of the most accurate global illumination methods. One of the key features of the radiosity approach is that it obtains view-independent global illumination results. Unfortunately, global illumination has high computational and memory requirements, and hierarchical radiosity, though more efficient than other radiosity solutions, is no exception. The progressive popularization of multiprocessor and multi-core processor systems makes the design and implementation of efficient parallel algorithms an appealing alternative in this field. In this paper we present a novel parallel radiosity method addressing the hierarchical radiosity computation on current homogeneous multi-core environments. One of the main contributions of our work is the use of different tasks to exploit the independent interactions among the geometric elements in the scene. Our parallel solution leads to a versatile radiosity implementation that takes advantage of the multiple computational resources in the system, such as multi-core processors and SMT (Simultaneous Multithreading) capabilities. Good results in terms of performance have been achieved.
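To illustrate why interactions among elements can be treated as independent tasks, the sketch below runs a plain, non-hierarchical Jacobi gathering iteration, B_i = E_i + rho_i * sum_j F_ij * B_j, and submits one gather task per element to a thread pool. It is not the method of the paper: the function names, the flat patch list, and the two-patch scene are hypothetical. In CPython the threads mainly show the task structure rather than deliver a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def gather(i, B, E, rho, F):
    """Radiosity of patch i from the previous iterate B (reads only, no writes)."""
    return E[i] + rho[i] * sum(F[i][j] * B[j] for j in range(len(B)))


def solve_radiosity(E, rho, F, iterations=60, workers=4):
    """Jacobi iteration; each patch's gather is an independent task."""
    B = list(E)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # All tasks read the same previous iterate, so they never conflict.
            futures = [pool.submit(gather, i, B, E, rho, F) for i in range(len(E))]
            B = [f.result() for f in futures]
    return B


# Hypothetical scene: two facing patches that see only each other.
E = [1.0, 0.0]            # emission
rho = [0.5, 0.5]          # reflectance
F = [[0.0, 1.0],          # form factors F[i][j]
     [1.0, 0.0]]
B = solve_radiosity(E, rho, F)
```

For this scene the fixed point can be checked by hand: B_0 = 1 + 0.5 * B_1 and B_1 = 0.5 * B_0 give B_0 = 4/3 and B_1 = 2/3.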
ISBN: 9783642020315 (print)
Free/Libre/Open Source Software (FLOSS) practitioners and developers are typically also users of their own systems. As a result, traditional software engineering (SE) processes (e.g., the requirements and design phases) take less time to articulate and negotiate among FLOSS developers. Design and requirements are kept more as informal knowledge, rather than formally described and assessed. This paper attempts to recover the SE concepts of software design and architecture from three FLOSS case studies that share the same application domain (i.e., Instant Messaging). Its first objective is to determine whether a common architecture emerges from the three systems, which can be used as shared knowledge for future applications. The second objective is to determine whether these architectures evolve or decay during the evolution of these systems. The results of this study are encouraging: although FLOSS developers made no explicit effort to define a high-level view of the architecture, a common shared architecture could be distilled for the Instant Messaging application domain. It was also found that, for two of the three systems, the architecture becomes better organised, and the components better specified, as the system evolves over time.
We present a new scheduling solution for the press shop and a new scheduling model based on constrained parallel machines for the press shop. In this paper, the scheduling for press shop is divided into scheduling of one sing...
ISBN: 9781424437320 (print)
Turbo codes experience a significant decoding delay because of the iterative nature of the decoding algorithms, the high number of metric computations, and the complexity added by the (de)interleaver. The extrinsic information is exchanged sequentially between two Soft-Input Soft-Output (SISO) decoders. Instead of this sequential process, a received frame can be divided into smaller windows to be processed in parallel. In this paper, a novel parallel processing methodology is proposed based on the previous parallel decoding techniques. A novel Contention-Free (CF) interleaver is proposed as part of the decoding architecture, which allows extrinsic Log-Likelihood Ratios (LLRs) to be used immediately as a-priori LLRs to start the second half of the iterative turbo decoding. The simulation case studies performed in this paper show that our parallel decoding method can provide an 80% time saving compared to standard decoding and a 30% time saving compared to the previous parallel decoding methods, at the expense of a 0.3 dB degradation in Bit Error Rate (BER) performance.
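The contention-free property mentioned above can be made concrete with a small checker. When a frame of length K is split into M = K / W windows decoded in parallel, the entries accessed at the same offset j in every window must fall into M different memory banks, both through the interleaver and through its inverse. The sketch below tests that condition. The paper's own CF interleaver is not reproduced here; a quadratic permutation polynomial (QPP) interleaver with illustrative parameters (K = 40, f1 = 3, f2 = 10) serves as a known contention-free example, and a simple block interleaver as a counter-example. All function names are hypothetical.

```python
def qpp(K, f1, f2):
    """Quadratic permutation polynomial: pi(i) = (f1*i + f2*i^2) mod K."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def windows_collide(perm, W):
    """True if two parallel windows ever address the same memory bank."""
    M = len(perm) // W
    for j in range(W):
        # Bank hit by window t when every window is at offset j.
        banks = {perm[j + t * W] // W for t in range(M)}
        if len(banks) < M:
            return True
    return False


def is_contention_free(perm, W):
    """Check the contention-free condition for perm and its inverse."""
    K = len(perm)
    if K % W != 0 or sorted(perm) != list(range(K)):
        raise ValueError("perm must be a permutation whose length is a multiple of W")
    inverse = [0] * K
    for i, p in enumerate(perm):
        inverse[p] = i
    return not windows_collide(perm, W) and not windows_collide(inverse, W)


qpp_perm = qpp(40, 3, 10)
# Block interleaver: write column-wise, read row-wise (4 x 10 layout).
block_perm = [(i % 4) * 10 + i // 4 for i in range(40)]

qpp_ok = is_contention_free(qpp_perm, W=10)
block_ok = is_contention_free(block_perm, W=10)
```

With four windows of length 10, the QPP example passes the check while the block interleaver does not, since its windows 0 and 2 both address bank 0 at offset 0.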
Low power multipliers with high clock frequencies play an important role in today's digital signal processing. In this work, the performance analysis of Wallace-tree, Array and Baugh-Wooley multiplier architecture...
Because parallel data processing and fast carry-free algorithms can be achieved by using the Residue Number System (RNS) in VLSI (very large scale integration) design, RNS shows the high per...
One of the most widely used schemes to extract feature points suitable for tracking in computer vision is "Good Features to Track". In this paper, we propose a parallel implementation of the good feature extra...
Modeling and simulation are necessary for nano-material science, as experiments and appliances are costly. In this work we introduce a typical tool flow used for computational nano-material science and report our comp...
ISBN: 9781424438914 (print)
Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE Model-Evaluation. Our Verilog-AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations, where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture-specific parallelization strategies (OpenMP for multi-core, Pthreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g., unroll factor, vector length) using our performance tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3-182x for a Xilinx Virtex5 LX 330T, 1.3-33x for an IBM Cell, and 3-131x for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models.