To control measurements and to acquire and pre-process the measurement data, the BS 487 C/TESLA NMR spectrometer has been connected on-line to an AMCA 80 microcomputer. This experimental equipment has been integrated into the local computer network ATLAS. The CAMAC system is used for data transmission and computer coupling. The storage- and time-intensive programs for spectra evaluation run on the EG-1040 computer. Data transfer is controlled by a two-party, cooperative communication protocol. Data acquisition and processing in dialogue mode run under real-time conditions.
ISBN: 9781509027729 (print)
With the increasing amount of available data, distributed data processing systems such as Apache Flink and Apache Spark have emerged that make it possible to analyze large-scale datasets. However, such engines introduce significant computational overhead compared to non-distributed implementations. The question therefore arises of when a distributed processing approach is actually beneficial. This paper helps to answer this question by evaluating the performance of the distributed data processing framework Apache Flink. In particular, we compare Apache Flink executed on up to 50 cluster nodes with single-threaded implementations executed on a typical laptop for three different benchmarks: TPC-H Query 10, Connected Components, and Gradient Descent. The evaluation shows that the performance of Apache Flink is highly problem-dependent, ranging from early outperformance for TPC-H Query 10 to slower runtimes for Connected Components. The reported results indicate for which problems, input sizes, and cluster resources a distributed data processing system such as Apache Flink or Apache Spark is a sensible choice.
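To make the Connected Components comparison more concrete, the following is a minimal sketch of what a single-threaded baseline for that benchmark could look like. It is an illustration only, not the implementation evaluated in the paper: the union-find approach, the function name `connected_components`, and the edge-list input format are all assumptions chosen for brevity.

```python
def connected_components(num_vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Hypothetical single-threaded baseline using union-find; the abstract
    does not say which algorithm the authors' baseline uses.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Walk up to the root, shortening the path along the way (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so every component ends up
            # labelled by its minimum vertex id.
            if root_a < root_b:
                parent[root_b] = root_a
            else:
                parent[root_a] = root_b

    return [find(v) for v in range(num_vertices)]


if __name__ == "__main__":
    # Two components of size 3 and 2, plus one isolated vertex.
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```

A baseline of this kind makes one pass over the edge list in memory and needs no network communication, which is one plausible reason a distributed engine can be slower on this benchmark at moderate input sizes.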
ISBN: 9781467356107 (print)
Hadoop has become the de facto standard framework for big data analysis due to its scalability. Despite the importance of Hadoop's scalability, few works have studied it in multi-rack clusters. In real-world multi-rack clusters, network topology becomes a major scalability bottleneck because of limited network switch capacity. Adding servers to a Hadoop cluster in such a situation wastes resources. Users can therefore save cost by efficiently measuring the network's influence on Hadoop before they add a new server to their clusters. In this paper, we describe a Hadoop performance model for multi-rack clusters. We modeled the network's influence on Hadoop and achieved about 95% accuracy relative to real measurements. Furthermore, we used our model to predict Hadoop scalability in large clusters and show that Hadoop scales well enough even in multi-rack clusters.