The design of microprocessor chips for high-end computing systems is moving towards many-core architectures with 10s or 100+ processing units. An important class of the target applications for such architectures are sc...
Canonical Polyadic Decomposition (CPD) is one of the most popular methods in tensor decomposition and plays an important role in big data analysis. For sparse tensors, the major computation procedure in CPD, known as m...
ISBN: 9781665443012 (print)
Scheduling independent tasks on a parallel platform is a widely studied problem, in particular when the goal is to minimize the total execution time, or makespan (the P||C_max problem in Graham's notation). Many applications, however, do not consist of sequential tasks but of parallel moldable tasks, whose degree of parallelism (i.e., the number of processors they run on) is decided at execution. Furthermore, since the energy consumption of data centers is a growing concern, from both an environmental and an economic point of view, minimizing the energy consumption of a schedule is a major challenge. One can then decide, for each task, how many processors it runs on and at which speed those processors operate, with the goal of minimizing the total energy consumption. We further focus on co-schedules, where tasks are partitioned into shelves, and we prove that the problem of minimizing the energy consumption remains NP-complete when static energy is consumed during the whole duration of the application. We are, however, able to provide an optimal algorithm for the schedule within one shelf, i.e., for a set of tasks that start at the same time. Several approximation results are derived, and simulations are performed to show the performance of the proposed algorithms.
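To make the trade-off inside one shelf concrete, the following is a minimal sketch and not the paper's optimal algorithm. It assumes an Amdahl-style speedup model, a cubic dynamic power law per processor, static power drawn by every processor for the whole shelf, and a fixed shelf deadline; none of these modelling choices comes from the abstract. The function names (`exec_time`, `shelf_energy`, `best_allocation`) are hypothetical, and the search is exhaustive, so it only suits tiny instances.

```python
from itertools import product

def exec_time(work, seq_fraction, procs):
    # Assumed Amdahl-style model: time at unit speed on `procs` processors.
    return work * (seq_fraction + (1.0 - seq_fraction) / procs)

def shelf_energy(tasks, alloc, deadline, total_procs, p_static):
    # Static part: assumed to be drawn by every processor for the whole shelf.
    energy = p_static * total_procs * deadline
    for (work, seq_fraction), procs in zip(tasks, alloc):
        # Slowest speed that still finishes the task exactly at the deadline.
        speed = exec_time(work, seq_fraction, procs) / deadline
        # Assumed cubic dynamic power per processor, drawn for `deadline` time units.
        energy += procs * speed ** 3 * deadline
    return energy

def best_allocation(tasks, total_procs, deadline, p_static):
    # Exhaustive search over processor counts (at least one per task).
    best = None
    for alloc in product(range(1, total_procs + 1), repeat=len(tasks)):
        if sum(alloc) > total_procs:
            continue
        energy = shelf_energy(tasks, alloc, deadline, total_procs, p_static)
        if best is None or energy < best[0]:
            best = (energy, alloc)
    return best

if __name__ == "__main__":
    # Two perfectly parallel tasks (work 4 and 2), 3 processors, deadline 2.
    print(best_allocation([(4.0, 0.0), (2.0, 0.0)], 3, 2.0, 0.0))
```

Under these assumptions the larger task receives two processors and the smaller one a single processor, because spreading work over more processors lets each of them run slower.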
As a strategy to obtain dense depth maps from a single image, sparse depths, or the fusion of both (RGBd), depth estimation receives much attention. Usually relying on high-performance workstations, existing depth pre...
ISBN: 9780769551029 (print)
In recent years, cloud storage systems have emerged as the primary solution for online storage and information sharing. Because they use storage and bandwidth efficiently, erasure codes and network coding have been shown to provide fault tolerance and fast content retrieval in cloud storage systems. In a nutshell, coded blocks are distributed among storage nodes, and a file is retrieved by downloading sufficiently many coded blocks from any group of storage nodes. However, because the coded blocks are highly correlated with the original file, even a single-byte update invalidates all coded blocks in the system. In this paper, we introduce DeltaNC, a new differential update algorithm that keeps all coded blocks in a network-coding-based cloud storage system synchronized by transmitting only the changes in the file. Our experimental results, from a trace-driven simulator, show that DeltaNC significantly reduces bandwidth and CPU usage, and that its performance is comparable to that of the Diff program, the common tool for updating files.
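The reason a differential update is possible at all is that linear codes commute with addition: patching a coded block with the encoded difference gives the same result as re-encoding the new file. The sketch below illustrates only that property and is not the DeltaNC algorithm itself. It assumes arithmetic modulo the small prime 257 for readability (deployed systems typically work in a finite field such as GF(2^8)), and the helper names `encode`, `block_delta` and `apply_delta` are hypothetical.

```python
P = 257  # assumed small prime field, chosen only to keep the arithmetic readable

def encode(blocks, coeffs):
    # A coded block is a linear combination of the source blocks, symbol by symbol.
    length = len(blocks[0])
    return [sum(c * b[k] for c, b in zip(coeffs, blocks)) % P for k in range(length)]

def block_delta(old, new):
    # Difference between the new and old version of one source block.
    return [(n - o) % P for o, n in zip(old, new)]

def apply_delta(coded, coeff, delta):
    # Patch a coded block with coeff * delta instead of re-encoding the whole file.
    return [(x + coeff * d) % P for x, d in zip(coded, delta)]

if __name__ == "__main__":
    blocks = [[1, 2, 3], [4, 5, 6]]
    coeffs = [3, 7]
    coded = encode(blocks, coeffs)

    # Change one symbol of the second source block.
    new_block = [4, 9, 6]
    delta = block_delta(blocks[1], new_block)

    patched = apply_delta(coded, coeffs[1], delta)
    print(patched == encode([blocks[0], new_block], coeffs))
```

Only `delta`, which is zero wherever the block is unchanged, has to reach the storage node; the node already knows its own coefficient for the changed block.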
Modern grids have become very complex in both size and heterogeneity. This makes the deployment and maintenance of systems a difficult task requiring considerable effort from administrators and programmers. Our goal ...
ISBN: 0769516599 (print)
This paper describes the design philosophy for the Grid system being developed by the Japan Committee on High-Performance Computing for Bioinformatics and the Initiative for Parallel Bioinformatics (IPAB). The Grid is an attractive solution for building a distributed bioinformatics environment that combines high-performance parallel computers, large genomic databases, and computation-intensive applications such as homology search and molecular simulation. However, much remains to be done in Grid system design, especially for wide-area network environments. OBIGrid emphasizes the virtual organization aspect of the Grid system and gives higher priority to security and scalability than to performance.
ISBN: 9781538657546 (print)
Superpixel segmentation is a very popular image segmentation technique used in various computer vision tasks. Recently, a number of superpixel algorithms have been proposed in the literature. One of them, Simple Linear Iterative Clustering (SLIC), is considered the state of the art in superpixel segmentation. However, its original implementation has a long execution time on high-performance processors designed for common mobile and enterprise applications, as well as on high-end processors such as Intel Xeon. Overall, the execution time of the single-threaded implementation is considered critical for real-time or near-real-time applications. In this paper, we explore the possibility of accelerating the performance-critical parts of SLIC image segmentation by designing an image segmentation accelerator for Intel's Arria 10 SoC. We propose a novel architecture that enables hardware acceleration by addressing the problem of hardware/software partitioning to minimize the overall program latency.
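For readers unfamiliar with SLIC, the sketch below shows the kind of per-pixel work that dominates its running time: a combined color and position distance, evaluated only against cluster centers whose window contains the pixel. It follows the commonly published formulation of SLIC as far as the distance measure goes and says nothing about how the accelerator described here partitions the work. Pixels and centers are assumed to be `(l, a, b, x, y)` tuples, and the names `slic_distance`, `assign`, `grid_step` and `compactness` are illustrative.

```python
import math

def slic_distance(pixel, center, grid_step, compactness):
    # Color distance in (l, a, b) and spatial distance in (x, y).
    d_color = math.dist(pixel[:3], center[:3])
    d_space = math.dist(pixel[3:], center[3:])
    # Spatial term is normalized by the grid step and weighted by compactness.
    return math.sqrt(d_color ** 2 + (d_space / grid_step) ** 2 * compactness ** 2)

def assign(pixels, centers, grid_step, compactness):
    # Label each pixel with the nearest center among those within one grid step
    # in x and y (a 2S x 2S window around each center); -1 if no center is in range.
    labels = []
    for p in pixels:
        candidates = [
            k for k, c in enumerate(centers)
            if abs(p[3] - c[3]) <= grid_step and abs(p[4] - c[4]) <= grid_step
        ]
        labels.append(min(
            candidates,
            key=lambda k: slic_distance(p, centers[k], grid_step, compactness),
            default=-1,
        ))
    return labels

if __name__ == "__main__":
    centers = [(0.0, 0.0, 0.0, 0.0, 0.0), (100.0, 0.0, 0.0, 10.0, 0.0)]
    pixels = [(0.0, 0.0, 0.0, 2.0, 0.0), (100.0, 0.0, 0.0, 4.0, 0.0)]
    print(assign(pixels, centers, grid_step=10.0, compactness=1.0))
```

Restricting the search to nearby centers is what keeps the assignment step linear in the number of pixels, and it is also the kind of regular, data-parallel loop that is a natural candidate for hardware offload.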
Monitoring and controlling a large number of geographically distributed scientific instruments is a challenging task. Some operations on these instruments require a real-time (or quasi-real-time) response, which makes it ...
In this paper, we propose an efficient concurrent wait-free algorithm to construct an unbounded directed graph for shared-memory architectures. To the best of our knowledge, this is the first wait-free algorithm fo...