ISBN: 9781479999255 (print)
In this paper we present SciSpark, a Big Data framework that extends Apache (TM) Spark for scaling scientific computations. The paper details the initial architecture and design of SciSpark. We demonstrate how SciSpark achieves parallel ingesting and partitioning of earth science satellite and model datasets. We also illustrate the usability and extensibility of SciSpark by implementing aspects of the Grab 'em Tag 'em Graph 'em (GTG) algorithm using SciSpark and its Map Reduce capabilities. GTG is a topical automated method for identifying and tracking Mesoscale Convective Complexes in satellite infrared datasets.
ISBN: 9781538653302 (print)
With the rapid growth of the world economy, user activity in capital markets has increased, resulting in a large number of transactional activities in trading systems. Hence, a trade surveillance system with low latency and high throughput is needed to monitor this volume of data and to improve user experience by reducing discrepancies and fraud. In-memory technology reduces latency by processing and caching data in main memory, thereby removing the overhead of disk access. Currently, open-source frameworks such as Apache Ignite, Apache Flink, and Kafka Streams provide in-memory streaming and caching functionality along with scalability and fault tolerance. This paper describes a Trade Surveillance System (TSS) that includes Complex Event Processing (CEP). We discuss the design, implementation, and tuning of three high-performance architectures for trade surveillance using Ignite, Flink, and Kafka Streams as in-memory streaming technologies. The paper also compares system throughput, support for fault tolerance, and the effect of caching on streaming throughput across all three architectures. Our experiments show that Ignite outperforms Flink and Kafka Streams in CEP-based streaming. Flink is more reliable than Ignite with respect to fault tolerance and event-time processing at the streaming layer. Although Kafka Streams also provides fault tolerance and event-time processing out of the box, it shows high latency due to disk-based processing.
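To make the idea of in-memory Complex Event Processing concrete, here is a minimal sketch in plain Python. It does not use the Ignite, Flink, or Kafka Streams APIs, and the rule itself (flag a trader who submits more than a given number of orders within a time window) is a hypothetical example, not one taken from the paper. The names `BurstDetector`, `max_orders`, and `window_s` are illustrative assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Hypothetical CEP rule: alert when a trader submits more than
    max_orders orders within window_s seconds (event time)."""

    def __init__(self, max_orders, window_s):
        self.max_orders = max_orders
        self.window_s = window_s
        # All state lives in main memory: trader id -> recent timestamps.
        self._recent = defaultdict(deque)

    def on_event(self, trader, ts):
        """Process one order event; return True if the rule fires."""
        window = self._recent[trader]
        window.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > self.window_s:
            window.popleft()
        return len(window) > self.max_orders


detector = BurstDetector(max_orders=2, window_s=10)
events = [("t1", 0), ("t1", 3), ("t2", 4), ("t1", 5), ("t1", 20)]
alerts = [(trader, ts) for trader, ts in events if detector.on_event(trader, ts)]
```

Because the per-trader windows are kept in memory and updated once per event, no disk access is needed on the hot path, which is the latency argument the abstract makes. A production system would additionally need partitioning across nodes, out-of-order event handling, and state recovery.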
ISBN: 9781467390057 (print)
We present further work on SciSpark, a Big Data framework that extends Apache Spark's in-memory parallel computing to scale scientific computations. SciSpark's current architecture and design include: time and space partitioning of high-resolution geo-grids from NetCDF3/4; a sciDataset class providing N-dimensional array operations in Scala/Java and CF-style variable attributes (an update of our prior sciTensor class); parallel computation of time-series statistical metrics; and an interactive front-end using science (code & visualization) Notebooks. We demonstrate how SciSpark achieves parallel ingest and time/space partitioning of Earth science satellite and model datasets. We illustrate the usability, extensibility, and early performance of SciSpark using several Earth science use cases, here presenting benchmarks for sciDataset Readers and parallel time-series analytics. A three-hour SciSpark tutorial was taught at an ESIP Federation meeting using a dozen "live" Notebooks.
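The combination of time partitioning and parallel time-series metrics can be illustrated with a small map/reduce sketch in plain Python. This is not SciSpark code and does not use the sciDataset class or Spark. The function names (`partition_by_time`, `partial_stats`, `merge_stats`, `time_mean`) and the toy data are assumptions made for illustration. Each time chunk produces a partial per-cell sum (the map step), and partial results are merged pairwise (the reduce step), which is the property that lets such a metric be computed in parallel.

```python
from functools import reduce


def partition_by_time(frames, chunk_size):
    """Split a time-ordered list of 2-D grids into consecutive chunks."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def partial_stats(chunk):
    """Map step: per-cell sum and frame count for one time chunk."""
    rows, cols = len(chunk[0]), len(chunk[0][0])
    sums = [[0.0] * cols for _ in range(rows)]
    for grid in chunk:
        for r in range(rows):
            for c in range(cols):
                sums[r][c] += grid[r][c]
    return sums, len(chunk)


def merge_stats(a, b):
    """Reduce step: combine two partial results cell by cell."""
    sums_a, n_a = a
    sums_b, n_b = b
    merged = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sums_a, sums_b)]
    return merged, n_a + n_b


def time_mean(frames, chunk_size=2):
    """Per-cell mean over time, computed from independent chunk results."""
    partials = map(partial_stats, partition_by_time(frames, chunk_size))
    sums, n = reduce(merge_stats, partials)
    return [[s / n for s in row] for row in sums]


# Toy data: 4 time steps of a 1x2 grid.
frames = [[[float(t), float(t) * 2]] for t in range(4)]
mean_grid = time_mean(frames)
```

Since `merge_stats` is associative, the result does not depend on how the time axis is chunked; in a distributed setting each chunk could be processed on a different worker before the merge.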
ISBN: 9781450357821 (print)
Trade surveillance is an important concern in modern trading engines, which must detect and prevent fraudulent trades as early as possible. In traditional trading platforms, developers have met high-throughput and low-latency requirements by focusing on high-performance languages such as C and C++ and on FPGA-based systems. These systems are limited in scalability and fault tolerance. With the arrival of in-memory technology, these requirements can be met with Java-based frameworks such as Ignite, Flink, and Spark. In this paper, we propose a novel way of implementing a trade surveillance architecture using the Apache Ignite In-Memory Data Grid (IMDG). The paper discusses the engineering approach to tuning the system architecture on a single node for high throughput and low latency, and then scaling out to multiple nodes.