This paper aims to introduce my PhD research project, which is focused on scalable deep learning and its applications in the medical context. This project aims to design new DL algorithms or to adapt existing ones, to...
ISBN (digital): 9798331531881
ISBN (print): 9798331531898
The structure of the rocket-borne model is inherently complex: the processed images are high-resolution and generate substantial amounts of data and computation. Achieving robust real-time computing on an embedded platform poses significant challenges due to strictly limited resources, power consumption constraints, and size limitations. Our review of rocket-borne applications reveals considerable variability in the design resources of different devices, indicating a need for expanded design approaches. Upon evaluating existing methods, we identified two primary drawbacks. First, certain operators within the high-resolution target detection model are difficult to parallelize, resulting in significant inference delays that hinder the ability to meet task requirements. Second, although existing methods have been extended, their core scheduling accelerates poorly, leaving significant room for performance improvement. To address these issues, this paper proposes an optimized accelerator architecture for target detection on high-resolution images, along with a novel highly parallel data pre-processing and post-processing module implemented on FPGA. Compared to the ARM implementation, this architecture achieves a 24.64x performance improvement. Furthermore, to ensure flexible application across various rocket launch scenarios, we introduce an optimization structure for convolution, pooling, and fusion operators and a multi-core expansion optimization method. This approach yields a 1.29x improvement in computing unit utilization compared to state-of-the-art multi-core scaling efforts. Finally, we assessed the accelerator architecture across multiple FPGA platforms, achieving a peak processing element utilization rate of 99.71% for a single core and layer. The overall computing efficiency, excluding the first layer, exceeded 90%. The peak computing power for the four cores reached 1638.4 GOPS, and the end-to-end computation time for
This special issue of the International Journal of High Performance Computing Applications collects extended versions of the three best papers presented at the International Workshop on GPUs and Scientific Applications (GPUScA 2010), held in Vienna in September 2010 in conjunction with PACT 2010, the annual International Conference on Parallel Architectures and Compilation Techniques.
ISBN (print): 0852965192
VLSI microprocessors have become central components in parallel computer architectures. Parallel computers have reached a level of commercial viability for applications including databases and scientific modeling. This commercial imperative has also inspired straightforward designs that can make extensive use of existing components, both hardware and software. Multiprocessor architectures are converging. For the present, it is imperative to adopt common standards for message-passing programming. For the future, it is expected that scalable virtual shared memory machines will dominate.
Throttling of parallelism is of importance to dynamic parallelism models in which the number and sizes of possible parallel code segments (tasks, processes, threads, etc.) are unknown at compile time, and which, if left uncontrolled, may lead to explosive parallelism with resulting slowdown rather than speedup. The primary goal of throttling is preventing a dynamically unfolding parallel execution from creating so many parallel code segments that either (1) the system runs out of resources with which to manage the parallel segments, (2) the system begins to thrash as a result of the increased process management and communication requirements, or (3) the system simply wastes CPU time creating extra code segments which end up being executed sequentially anyway. On the other hand, speculative parallelism involves the parallel execution of code segments which are not yet known to be required. It can be exploited in two basic forms: parallel search and parallel test. Speculative parallelism can be used when there are idle resources and the currently executing set of codes is not expected to yield any more parallel threads of execution.
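As a rough illustration of the throttling idea described above, the following Python sketch caps the number of live parallel tasks in a divide-and-conquer sum and runs the work inline when the cap is reached. The `Throttle` class, the `parallel_sum` function, and the `max_tasks` and `cutoff` parameters are hypothetical names chosen for this example and do not come from the abstract. It shows one possible throttling policy under these assumptions, not the method the authors describe.

```python
import threading


class Throttle:
    """Hypothetical helper: caps the number of concurrently live tasks."""

    def __init__(self, max_tasks):
        self._slots = threading.BoundedSemaphore(max_tasks)
        self._lock = threading.Lock()
        self.spawned = 0  # number of tasks that actually ran in parallel

    def try_spawn(self, fn, *args):
        """Start fn in a new thread if a slot is free, else return None."""
        if not self._slots.acquire(blocking=False):
            return None  # throttled: the caller runs the work inline
        with self._lock:
            self.spawned += 1

        def run():
            try:
                fn(*args)
            finally:
                self._slots.release()  # free the slot when the task ends

        thread = threading.Thread(target=run)
        thread.start()
        return thread


def parallel_sum(data, throttle, cutoff=1000):
    """Sum data by recursive splitting, spawning only when allowed."""
    if len(data) <= cutoff:
        return sum(data)  # small segment: not worth a parallel task
    mid = len(data) // 2
    left_result = []

    def do_left():
        left_result.append(parallel_sum(data[:mid], throttle, cutoff))

    thread = throttle.try_spawn(do_left)
    right = parallel_sum(data[mid:], throttle, cutoff)
    if thread is None:
        do_left()  # no free slot: execute sequentially instead
    else:
        thread.join()
    return left_result[0] + right


if __name__ == "__main__":
    throttle = Throttle(max_tasks=4)
    print(parallel_sum(list(range(100000)), throttle), throttle.spawned)
```

Because a slot is requested without blocking, a task that cannot spawn never waits for one, which avoids deadlock when parent tasks hold slots while joining their children.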
Today's applications are both object-oriented and based on a new type of three-tiered client-server architecture with clients, processing servers, and data servers as cornerstones. Recognising these trends, industry and researchers have been defining standards and technologies for communication among the components of Distributed Information Systems and for providing compatible mechanisms to access databases, but a key problem with these complex architectures is still their performance. This paper presents a tool for predicting the performance of systems based on CORBA and DCOM as distributed-object architectures, and OLE-DB and PL/SQL as data-access architectures. The tool is an extension of SMART, a workbench that exploits analytical and simulation performance models to predict the performance of database applications. © 2000 Elsevier Science B.V. All rights reserved.
ISBN (print): 3540593934
One of the most important topics in parallel application development is portability. No standard parallel computational model exists, and the multitude of different parallel architectures, programming paradigms and methodologies makes it significantly harder to develop multi-system parallel applications. In this paper we present a platform-based architecture for developing portable parallel applications. A platform is built on top of the native operating systems and offers a common application programming interface (API). The API consists of a set of primitives through which portable applications can be developed, since only the platform, not the applications, requires re-programming for porting to different architectures. Furthermore, we define a set of key primitives specifically designed for implementing database applications. Our work in this direction has been triggered by our recent implementation experience on parallel platforms and our interest in developing a parallel database system.
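The platform approach described above can be sketched roughly as follows in Python. The abstract does not list the actual primitives, so the `Platform` interface with `spawn`, `send`, `receive`, and `join_all`, the thread-based `ThreadPlatform` backend, and the `portable_sum` application are all assumptions made for illustration. The point is only that the application calls the common API, so porting would mean rewriting the backend rather than the application.

```python
import queue
import threading
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical common API: the only layer that is re-programmed per system."""

    @abstractmethod
    def spawn(self, fn, *args): ...

    @abstractmethod
    def send(self, dest, msg): ...

    @abstractmethod
    def receive(self, me): ...

    @abstractmethod
    def join_all(self): ...


class ThreadPlatform(Platform):
    """One possible backend, built on threads and in-memory mailboxes."""

    def __init__(self):
        self._mailboxes = {}
        self._threads = []
        self._lock = threading.Lock()

    def _box(self, name):
        with self._lock:
            return self._mailboxes.setdefault(name, queue.Queue())

    def spawn(self, fn, *args):
        thread = threading.Thread(target=fn, args=(self, *args))
        self._threads.append(thread)
        thread.start()

    def send(self, dest, msg):
        self._box(dest).put(msg)

    def receive(self, me):
        return self._box(me).get()  # blocks until a message arrives

    def join_all(self):
        for thread in self._threads:
            thread.join()


def worker(platform, name, chunk):
    """Application code: uses only the Platform primitives."""
    platform.send("master", (name, sum(chunk)))


def portable_sum(platform, data, n_workers=4):
    """Split data across workers and combine their partial sums."""
    size = max(1, (len(data) + n_workers - 1) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    for i, chunk in enumerate(chunks):
        platform.spawn(worker, f"w{i}", chunk)
    total = sum(platform.receive("master")[1] for _ in chunks)
    platform.join_all()
    return total


if __name__ == "__main__":
    print(portable_sum(ThreadPlatform(), list(range(1000))))
```

A backend for a different system, for example one built on processes or a message-passing library, would implement the same four methods, leaving `worker` and `portable_sum` unchanged.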
ISBN (print): 354061656X
This paper summarizes the integration and use of parallelism in relational databases on MIMD parallel architecture models. More precisely, after presenting the main goals of parallel relational databases, we highlight that it is essential to exploit recent parallel architectures to obtain high performance. Parallelization of database programs requires data placement approaches and data partitioning strategies, from which the levels, forms and types of parallelism can be extracted. As for the inter-operation parallelization phase, the key optimization problem, we describe one-phase and two-phase inter-operation parallelization strategies. This leads to unsolved problems which constitute a challenge for future parallel relational database systems.
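To make the link between data partitioning and parallelism more concrete, here is a minimal Python sketch of hash partitioning followed by a partition-wise join. The abstract does not specify any particular strategy, so the choice of hash partitioning, the function names `hash_partition`, `local_join`, and `parallel_join`, and the convention that the join key is the first column are all assumptions for illustration. Threads stand in for the processors of an MIMD machine.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key_index, n_parts):
    """Place each row in a partition chosen by hashing its key column."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key_index]) % n_parts].append(row)
    return parts


def local_join(r_part, s_part):
    """Hash join of two co-located partitions on their first column."""
    index = {}
    for r in r_part:
        index.setdefault(r[0], []).append(r)
    return [r + s[1:] for s in s_part for r in index.get(s[0], [])]


def parallel_join(r_rows, s_rows, n_parts=4):
    """Partition both relations the same way, then join partition pairs."""
    r_parts = hash_partition(r_rows, 0, n_parts)
    s_parts = hash_partition(s_rows, 0, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(local_join, r_parts, s_parts))
    return [row for part in results for row in part]


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (3, "c")]
    s = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(parallel_join(r, s)))
```

Since rows with equal keys always land in the same partition index, each pair of partitions can be joined independently, which is where the parallelism within a single operation comes from in this sketch.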
ISBN (digital): 9783319069326
ISBN (print): 9783319069319; 9783319069326
This book constitutes the refereed proceedings of the 10th IEEE International Conference Beyond Databases, Architectures, and Structures, BDAS 2014, held in Ustron, Poland, in May 2014. It consists of 56 carefully revised selected papers assigned to 11 thematic groups: query languages, transactions and query optimization; data warehousing and big data; ontologies and semantic web; computational intelligence and data mining; collective intelligence, scheduling, and parallel processing; bioinformatics and biological data analysis; image analysis and multimedia mining; security of database systems; spatial data analysis; applications of database systems; Web and XML in database systems.