Quantitative Finance (QF) uses increasingly sophisticated mathematical models and advanced computing techniques to predict the movement of global markets and to price derivatives and other assets. Being able to react quickly and intelligently to fast-changing markets is a decisive success factor for trading companies. The rise of QF now requires an integrated toolchain of enabling technologies to carry out complex event processing on explosively growing and increasingly diverse market metadata, in pursuit of microsecond latency on an exabyte-scale dataset. Inspired by this, we present a data-driven execution paradigm that untangles the dependencies of complex processing events, and we integrate the paradigm with a big data infrastructure that streams time series data. This integrated platform is termed the QuantCloud platform. Essentially, QuantCloud executes complex event processing in a data-driven mode and manages large amounts of diversified market data in a data-parallel mode. To show its practicability and performance, we develop a prototype and benchmark it by applying real-world QF research models to New York Stock Exchange (NYSE) data. Using this prototype, we demonstrate the platform with an application to: (i) data cleaning and aggregation (including computing logarithmic returns from tick data and finding the medians of grouped data) and (ii) data modeling with the autoregressive-moving average (ARMA) model. The performance results show that (a) the platform obtains a high throughput (usually on the order of millions of tick messages per second) and a sub-microsecond latency; (b) it fully executes data-dependent tasks through data-driven execution; and (c) it implements a modular design approach for rapidly developing these data-crunching methods and QF research models. This platform, resulting from the combined effort of data-driven execution and big data infrastructure, offers financial engineers new insights and en
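The two data-cleaning steps named in the abstract above (logarithmic returns from tick data and medians of grouped data) can be illustrated with a minimal Python sketch. This is not QuantCloud code: the function names, the grouping key (a ticker symbol), and the sample ticks are all assumptions made for illustration, and the sketch runs sequentially rather than in a data-parallel mode.

```python
import math
from collections import defaultdict
from statistics import median


def log_returns(prices):
    """Return ln(p[t] / p[t-1]) for each consecutive pair of prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def grouped_medians(ticks):
    """Map each group key (here, a ticker symbol) to the median of its values."""
    groups = defaultdict(list)
    for key, value in ticks:
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical tick data as (symbol, trade price) pairs; not real NYSE data.
ticks = [("AAA", 100.0), ("BBB", 50.0), ("AAA", 101.0), ("BBB", 49.0), ("AAA", 99.0)]

medians = grouped_medians(ticks)
aaa_returns = log_returns([price for symbol, price in ticks if symbol == "AAA"])
```

Because log returns are additive, the entries of `aaa_returns` sum to the log return over the whole window, which is one reason they are a common input to time series models such as ARMA.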
Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo's performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic-level optimization against a well-known library. (C) 2017 Elsevier Inc. All rights reserved.
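The idea of a partially ordered task dependency graph executed in a data-driven fashion can be sketched in a few lines of Python. This is a toy illustration, not Bamboo's translator or runtime: `run_data_driven`, the task names, and the dependency table are hypothetical, and the tasks run one at a time, so the sketch shows only the ordering constraint that makes overlap possible, not actual overlap.

```python
from collections import deque


def run_data_driven(tasks, deps):
    """Run each task once all of its prerequisites have finished.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> list of prerequisite task names
    Returns the order in which tasks fired.
    """
    pending = {name: len(deps.get(name, [])) for name in tasks}
    successors = {name: [] for name in tasks}
    for name, prereqs in deps.items():
        for prereq in prereqs:
            successors[prereq].append(name)

    # Tasks with no unmet prerequisites can fire immediately.
    ready = deque(name for name in tasks if pending[name] == 0)
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()
        order.append(name)
        for succ in successors[name]:
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical stencil-style tasks: only the boundary update needs the
# received halo data, so the interior update is free to run meanwhile.
log = []
tasks = {
    "receive_halo": lambda: log.append("receive_halo"),
    "compute_interior": lambda: log.append("compute_interior"),
    "compute_boundary": lambda: log.append("compute_boundary"),
}
deps = {"compute_boundary": ["receive_halo"]}
order = run_data_driven(tasks, deps)
```

In this sketch `compute_interior` has no edge to `receive_halo`, which is exactly the freedom a runtime could exploit to schedule interior work while a message is still in flight.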
ISBN: (print) 9780819476371
This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support for cache coherence, cache prefetching, and cache-aware thread scheduling. A NoC that supports these cache-related mechanisms can help improve system performance by reducing the cache miss ratio. The presented multi-core system employs the Data-Driven Multithreading (DDM) model of execution. In DDM, thread scheduling is done according to data availability, so the system knows which threads will be executed in the near future. This characteristic of the DDM model allows for cache-aware thread scheduling and cache prefetching. The NoC prototype is a crossbar switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.
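Why scheduling by data availability exposes the threads "to be executed in the near future" can be shown with a small Python sketch. It is a software toy under stated assumptions, not the paper's hardware: the class `ReadyCountScheduler`, its `upcoming` lookahead, and the thread names are hypothetical, and the per-thread counter of missing inputs is an illustrative way to track data availability.

```python
from collections import deque


class ReadyCountScheduler:
    """Toy scheduler: a thread becomes ready when all its inputs have arrived."""

    def __init__(self, ready_counts, consumers):
        # ready_counts: thread id -> number of inputs still missing
        # consumers:    thread id -> thread ids that consume its output
        self.ready_counts = dict(ready_counts)
        self.consumers = consumers
        self.queue = deque(t for t, n in self.ready_counts.items() if n == 0)

    def upcoming(self, lookahead=2):
        """Threads due to run soon; candidates for cache prefetching."""
        return list(self.queue)[:lookahead]

    def run_next(self):
        """Run the next ready thread and notify its consumers."""
        thread = self.queue.popleft()
        for consumer in self.consumers.get(thread, []):
            self.ready_counts[consumer] -= 1
            if self.ready_counts[consumer] == 0:
                self.queue.append(consumer)
        return thread


# Hypothetical four-thread program: two loads feed an add, which feeds a store.
sched = ReadyCountScheduler(
    ready_counts={"load_a": 0, "load_b": 0, "add": 2, "store": 1},
    consumers={"load_a": ["add"], "load_b": ["add"], "add": ["store"]},
)
trace = []
while sched.queue:
    trace.append((sched.run_next(), sched.upcoming()))
```

Each entry of `trace` pairs the thread that just ran with the threads already known to be next, which is the information a prefetcher or cache-aware scheduler would need ahead of time.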
Current high-end microprocessors achieve high performance by adding more features and therefore increasing complexity. This paper makes the case for a Chip-Multiprocessor based on the Data-Driven Multithreading execution model (DDM-CMP) in order to overcome the limitations of current design trends. Data-Driven Multithreading (DDM) is a multithreading model that effectively hides communication delay and synchronization overheads. DDM-CMP avoids the complexity of other designs by combining simple commodity microprocessors with a small hardware overhead for thread scheduling and an interconnection network. Preliminary experimental results show that a DDM-CMP chip with the same hardware budget as a high-end commercial microprocessor, clocked at the same frequency, achieves a speedup of up to 18.5 while consuming 78-81% of the power of the commercial chip. Overall, the estimated results for the proposed DDM-CMP architecture show a significant benefit in terms of both speedup and power consumption, making it an attractive architecture for future processors.
A number of data-driven execution models have been proposed for parallel execution of logic programs [3, 8, 9, 12]. LogDf is an abstract data-driven execution model for pure logic programs [3], which has shown promising performance during simulations. However, the original model lacks support for extra-logical features such as cut and side-effects, which are needed to execute Prolog programs. This paper describes a scheme that has been incorporated into the LogDf model to support cut and side-effects. The main component of the scheme is a data structure, called a flat, non-strict S-Stream, which maintains strict ordering of multiple solutions and, at the same time, allows simultaneous modification by several actors. This ordering corresponds to the order in which solutions would be produced in a sequential system and is necessary to implement cut and side-effects. The correct synchronization and ordering of operations on the cells of an S-Stream is ensured by the use of I-structure memory [1]. The descriptor-based token coloring mechanism in LogDf provides convenient support for maintaining the scope information associated with cuts. An efficient garbage collection strategy is also proposed.
A collection of parallel processors is said to be coordinated if each write from one processing element (PE) to another is answered by a read. We report on an efficient algorithm to test coordination for parallel programs in which the code for each PE is a loop. We also test a weaker predicate for parallel algorithms with oblivious PE codes and we show that the general problem is PSPACE-hard.