ISBN: 9781467388450 (print)
In an application that estimates the movement of pedestrians in urban areas, using an advancing person re-identification technique as the video analysis scheme, a massive number of simultaneous similarity searches is required over feature data, which represent a person's characteristics as numerical values. The system should be able to process over 10,000 people per minute if a large-scale urban facility is assumed. However, the computation cost of similarity searches is high, and the feature data extracted from a video becomes rather large. These properties are obstacles to large-scale estimation using live video. We propose a novel design for a live video analysis system that executes feature data extraction and similarity searches as parallel computations on distributed server nodes connected via a peer-to-peer network. We implemented the system on a testbed and evaluated its performance using a real dataset from a large-scale facility, applying an existing face recognition technique as the person re-identification scheme, and confirmed that the processes can be completed within a minute.
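The idea of spreading similarity searches over several nodes can be illustrated with a minimal sketch. It is not the system described in the abstract: the shards here are in-process dictionaries rather than peer-to-peer server nodes, cosine similarity is an assumed metric, and the names `cosine`, `search_shard`, and `parallel_search` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_shard(shard, query):
    """Best (score, person_id) within one shard, or None for an empty shard."""
    return max(((cosine(vec, query), pid) for pid, vec in shard.items()),
               default=None)


def parallel_search(shards, query):
    """Search every shard concurrently, then merge the per-shard winners."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        results = list(pool.map(search_shard, shards, [query] * len(shards)))
    best = [r for r in results if r is not None]
    return max(best) if best else None


# Hypothetical feature data: each shard stands in for one server node.
shards = [
    {"p1": [1.0, 0.0], "p2": [0.0, 1.0]},
    {"p3": [0.6, 0.8]},
]
score, person = parallel_search(shards, [0.6, 0.8])
```

Each shard is searched independently and only the per-shard winner is returned, so the merge step handles one candidate per node rather than the full feature set.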
We describe the design and the implementation of the CFS (Cluster File System) storage system which is dedicated to video streams. Our goal is to provide a system with the following features: 1) High number of support...
In this work we describe two sequential algorithms and their parallel counterparts for solving nonlinear systems, when the Jacobian matrix is symmetric and positive definite. This case appears frequently in unconstrai...
In this study, we have successfully developed a grid-enabled software distributed shared memory called Teamster-G. This system provides users with not only a shared memory programming interface but also a transparent ...
A key emerging and popular communication paradigm, primarily employed for information dissemination, is peer-to-peer (P2P) networking. In this paper, we model the spread of malware in decentralized, Gnutella type of p...
ISBN: 076950728X (print)
In some scenarios involving on-line transaction processing within a distributed database, it is desirable to synchronize transactions in a manner that guarantees conflict equivalence with a serial schedule ordered by original transaction start times while providing each transaction with an anomaly serializable isolation. Few theoretical concurrency control algorithms guarantee such a conflict equivalence, and we are unaware of any protocol that accomplishes this while supporting real-world issues such as out-of-order transaction messages, out-of-order operation executions, and out-of-order transaction committals without the burden of explicit readset and writeset declarations. We describe an algorithm that provides this guarantee and supports these issues while requiring only table-level writeset declarations.
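For orientation, the textbook basic timestamp-ordering rule is one well-known way to obtain conflict equivalence with a serial schedule ordered by start times. The sketch below shows that rule only; it is not the algorithm proposed in the abstract, it does not address out-of-order messages or commits, and the `Item`, `read`, and `write` names are hypothetical.

```python
class Item:
    """A data item tracking the largest timestamps that have read and written it."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0
        self.value = None


def read(item, ts):
    """Allow a read by the transaction with start timestamp ts, or reject it."""
    if ts < item.write_ts:
        return False  # a younger transaction already wrote: reader must abort
    item.read_ts = max(item.read_ts, ts)
    return True


def write(item, ts, value):
    """Allow a write by the transaction with start timestamp ts, or reject it."""
    if ts < item.read_ts or ts < item.write_ts:
        return False  # a younger transaction already read or wrote: abort
    item.write_ts = ts
    item.value = value
    return True


x = Item()
ok_write = write(x, 2, "a")       # transaction started at time 2 writes
late_read = read(x, 1)            # older transaction arrives late: rejected
ok_read = read(x, 3)              # younger transaction reads
late_write = write(x, 2, "b")     # would invalidate the read at time 3: rejected
```

Rejecting any operation that arrives "too late" relative to a younger transaction's access is what keeps the accepted operations consistent with start-time order.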
ISBN: 9781665473156 (print)
Most supercomputers adopt a data forwarding architecture to achieve storage scalability. However, this results in a significant reduction in single-process bandwidth compared to direct file system access. Moreover, because a majority of applications use only a single process for writing and reading data, the low single-process performance also adds time overhead for these applications. This paper proposes a user-space forwarding mechanism, DFBUFFER, with two performance optimization methods: user-space multi-threaded request processing and a data write buffer organized per file. The DFBUFFER client is embedded in the application as a library, reducing software overhead, and the server implements multi-threaded I/O request processing to improve bandwidth efficiency. The data write buffer handles write requests asynchronously, which increases the write bandwidth of compute nodes. We evaluate DFBUFFER on the Sunway exascale prototype system. The results indicate that in the regular mode of DFBUFFER, both write and read latency are reduced, and the single-process write bandwidth and large-block read bandwidth are increased by 1.8 times and 2.8 times respectively. The DFBUFFER buffer mode increases the write bandwidth of a single process by 0.8 times over the regular mode. Although the performance advantage of the regular mode gradually weakens as the number of concurrent processes increases, the buffer mode still improves write bandwidth: for an application with 64 I/O processes, it is increased by 0.2 times.
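The general idea of a per-file write buffer that is flushed asynchronously can be sketched as follows. This is an illustration under stated assumptions, not DFBUFFER's implementation: the class name `PerFileWriteBuffer`, the `sink` callback, and the `flush_bytes` threshold are hypothetical, and the sketch assumes a single writer thread calling `write`.

```python
import queue
import threading


class PerFileWriteBuffer:
    """Accumulate writes per file and hand full buffers to a background thread."""

    def __init__(self, sink, flush_bytes=8):
        self._sink = sink                # callable(path, chunk): the slow write
        self._flush_bytes = flush_bytes  # hypothetical flush threshold
        self._buffers = {}               # one bytearray per file path
        self._pending = queue.Queue()    # chunks waiting for the worker
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, path, data):
        """Append to the file's buffer; returns without waiting for the sink."""
        buf = self._buffers.setdefault(path, bytearray())
        buf += data
        if len(buf) >= self._flush_bytes:
            self._pending.put((path, bytes(buf)))
            buf.clear()

    def _drain(self):
        """Worker loop: perform the slow writes in FIFO order."""
        while True:
            item = self._pending.get()
            if item is None:  # shutdown marker queued by close()
                return
            self._sink(*item)

    def close(self):
        """Flush leftover data for every file and wait for the worker to finish."""
        for path, buf in self._buffers.items():
            if buf:
                self._pending.put((path, bytes(buf)))
                buf.clear()
        self._pending.put(None)
        self._worker.join()


stored = {}


def sink(path, chunk):
    # Stand-in for the real storage write: append the chunk in memory.
    stored[path] = stored.get(path, b"") + chunk


wb = PerFileWriteBuffer(sink, flush_bytes=8)
wb.write("a.dat", b"hello ")
wb.write("a.dat", b"world")
wb.write("b.dat", b"xy")
wb.close()
```

Because `write` only appends to memory and enqueues full buffers, the caller is decoupled from the latency of the underlying write, which is the effect the abstract attributes to asynchronous handling.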
ISBN: 9783540747413 (print)
Application Specific Instruction Processors (ASIPs) have the potential to meet the high-performance demands of multimedia applications, such as image processing, audio and video encoding, speech processing, and digital signal processing. To achieve lower cost and better energy efficiency in high-performance embedded systems built from ASIPs, subword parallelism optimization will become an important way to accelerate multimedia applications. One major problem, however, is how to exploit subword parallelism for ASIPs with limited resources. This paper shows that loop transformations such as loop unrolling and variable expansion can be used to create opportunities for subword parallelism, and presents a novel approach to recognizing and extracting subword parallelism based on a Cost Subgraph (CSG). This approach is evaluated on Transport Triggered Architecture (TTA), a customizable processor architecture that is particularly suitable for tailoring hardware resources to the requirements of the application. In our experiment, 63.58% of loops and 85.64% of instructions in these loops can exploit subword parallelism. The results indicate that significant subword parallelism can be attained using our method.
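What subword parallelism means in practice can be shown with a small sketch: four 8-bit additions, as would result from unrolling a byte-wise loop four times, are carried out with a single 32-bit word operation. This uses the standard SWAR (SIMD within a register) masking trick and is not the CSG-based extraction method from the abstract; the helper names `pack4`, `unpack4`, and `add_bytes_swar` are hypothetical.

```python
LOW7 = 0x7F7F7F7F  # low 7 bits of every 8-bit lane
HIGH = 0x80808080  # top bit of every 8-bit lane


def pack4(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    return values[0] | values[1] << 8 | values[2] << 16 | values[3] << 24


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def add_bytes_swar(x, y):
    """Lane-wise addition modulo 256 of two packed words."""
    # Adding only the low 7 bits of each lane cannot carry into the next lane.
    low = (x & LOW7) + (y & LOW7)
    # The top bit of each lane is then restored with XOR, discarding its carry.
    return low ^ ((x ^ y) & HIGH)


a = [250, 1, 127, 200]
b = [10, 2, 1, 100]

# Scalar loop: four separate additions.
scalar = [(p + q) % 256 for p, q in zip(a, b)]

# Packed form: one word-level addition covers all four lanes.
packed = unpack4(add_bytes_swar(pack4(a), pack4(b)))
```

Here both `scalar` and `packed` evaluate to `[4, 3, 128, 44]`; the packed version replaces four element-wise additions with one word-level operation plus masking.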
The paper studies the problem of restructuring two dimensional wafers in the presence of faults. It constructs arrays of size as much as the size of the original faulty wafer. Obviously this is done at the cost of inc...