ISBN (Print): 9781728108582
Anomalies during system execution can be detected by automated analysis of logs generated by the system. However, large-scale systems can generate tens of millions of lines of logs within days. Centralized implementations of traditional machine learning algorithms are not scalable for such data. Therefore, we recently introduced a distributed log analysis framework for anomaly detection. In this paper, we introduce an extension of this framework, which can detect anomalies earlier via incremental analysis instead of the existing offline analysis approach. In the extended version, we periodically process the log data that has accumulated so far. We conducted controlled experiments based on a benchmark dataset to evaluate the effectiveness of this approach. We repeated our experiments with various periods that determine the frequency of analysis as well as the size of the data processed each time. Results showed that our online analysis can improve anomaly detection time significantly while keeping the accuracy level the same as that obtained with the offline approach. The only exceptional case, in which accuracy is compromised, occurs rarely: when the analysis is triggered before all the log data associated with a particular session of events has been collected.
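The periodic-trigger idea behind the online analysis can be illustrated with a minimal single-process sketch. The paper's framework is distributed; here detect_anomalies is only a hypothetical placeholder for the learned model, and the event schema and period are assumptions:

```python
import time
from collections import Counter

def detect_anomalies(events):
    """Toy stand-in for the framework's learned model: flag event types
    whose frequency is far above the average for the data seen so far."""
    counts = Counter(e["type"] for e in events)
    mean = sum(counts.values()) / len(counts)
    return {t for t, c in counts.items() if c > 3 * mean}

def online_analysis(log_stream, period_s=60):
    """Periodically re-analyse all log data accumulated so far, instead of
    running a single offline pass after the logs are complete."""
    accumulated = []
    last_run = time.monotonic()
    for event in log_stream:  # event: {"session": ..., "type": ...} (assumed schema)
        accumulated.append(event)
        if time.monotonic() - last_run >= period_s:
            yield detect_anomalies(accumulated)  # anomalies surface earlier
            last_run = time.monotonic()
```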
ISBN (Digital): 9780738123943
ISBN (Print): 9781665415637
The COVID-19 global pandemic is an unprecedented health crisis. Many researchers around the world have produced an extensive collection of literature since the outbreak. Analysing this information to extract knowledge and provide meaningful insights in a timely manner requires a considerable amount of computational power. Cloud platforms are designed to provide this computational power in an on-demand and elastic manner. Specifically, hybrid clouds, composed of private and public data centers, are particularly well suited to deploying computationally intensive workloads in a cost-efficient, yet scalable manner. In this paper, we developed a system utilising the Aneka Platform as a Service middleware with parallel processing and multi-cloud capability to accelerate the data processing pipeline and article categorisation process using machine learning on a hybrid cloud. The results are then persisted for further referencing, searching, and visualisation. The performance evaluation shows that the system helps reduce processing time and achieves linear scalability. Beyond COVID-19, the application might be used directly for broader scholarly article indexing and analysis.
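Aneka itself is a .NET-based middleware, so the following is only a rough single-machine analogue of the embarrassingly parallel categorisation step; the keyword classifier and worker count are purely hypothetical:

```python
from concurrent.futures import ProcessPoolExecutor

def categorise(article_text: str) -> str:
    """Hypothetical classifier; the paper trains an ML model instead of
    using this keyword lookup."""
    keywords = {"vaccine": "immunology", "ventilator": "clinical care"}
    for kw, label in keywords.items():
        if kw in article_text.lower():
            return label
    return "other"

def categorise_corpus(articles, workers=8):
    # Each article is an independent task, the same embarrassingly parallel
    # shape that Aneka farms out to private and public cloud nodes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(categorise, articles))

if __name__ == "__main__":
    print(categorise_corpus(["New vaccine trial results...", "GDP impact study"]))
    # -> ['immunology', 'other']
```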
ISBN (Digital): 9781728168517
ISBN (Print): 9781728168524
Neutron flux distribution inside the core of large-size nuclear reactors is a function of space and time. An online Flux Mapping System (FMS) is needed to monitor the core during reactor operation. The FMS estimates the core flux distribution from the measurements of a few in-core detectors using an appropriate algorithm. Here, a distributed Artificial Neural Network (D-ANN) model is developed using a parallel-forward multi-layer perceptron architecture to capture the spatial core flux variation in a nuclear reactor. The proposed D-ANN model is tested with simulated test-case data of the Advanced Heavy Water Reactor (AHWR) for multiple operating conditions of the reactor. The model estimates the neutron flux at all horizontal mesh locations (2-D) from the multiple networks distributed spatially across the AHWR core. The estimation error of the proposed D-ANN model is found to be significantly lower than that of a lumped ANN model. Validation exercises establish that the D-ANN model can effectively capture the spatial variations in the reactor core and can therefore be utilized for efficient flux mapping. A real-time implementation of the D-ANN-based flux mapping method is also proposed.
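The one-network-per-zone idea can be sketched as below; the detector count, mesh size, zone count, and random training data are illustrative assumptions, not AHWR values:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_detectors, n_meshes, n_zones = 40, 200, 8  # assumed sizes

# Synthetic training pairs: detector readings -> flux at every 2-D mesh point.
X = rng.random((500, n_detectors))
Y = rng.random((500, n_meshes))

# Distributed model: one small MLP per spatial zone of the core,
# each predicting only the mesh points inside its zone.
zone_slices = np.array_split(np.arange(n_meshes), n_zones)
zone_nets = []
for idx in zone_slices:
    net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
    net.fit(X, Y[:, idx])
    zone_nets.append(net)

def estimate_flux(readings):
    # Concatenate the per-zone estimates back into the full 2-D flux map.
    return np.concatenate([net.predict(readings.reshape(1, -1))[0]
                           for net in zone_nets])
```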
ISBN (Digital): 9781728190747
ISBN (Print): 9781728183824
Apache Kafka, a high-throughput distributed message processing system, has been adopted by many enterprises for its outstanding performance. Unlike common cloud-based access control architectures, Kafka service providers often need to build their systems on other enterprises' high-performance cloud platforms. However, since the cloud platform belongs to a third party, it is not necessarily reliable. Indeed, it has been demonstrated that Kafka's data is stored in the cloud in plaintext form, which poses a serious risk of user privacy leakage. In this paper, we propose a secure fine-grained data transmission scheme called Secure Door on Cloud (SDoC) to protect data from being leaked in Kafka. SDoC is not only more secure than Kafka's built-in security mechanism, but can also effectively prevent a third-party cloud from stealing plaintext data. To evaluate the performance of SDoC, we simulate normal inter-entity communication and show that Kafka with SDoC integration has a lower data-transfer time overhead than Kafka with its built-in security mechanism enabled.
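The paper's exact protocol is not reproduced here, but its core premise, that payloads are encrypted before they ever reach the third-party-hosted brokers, can be sketched with client-side symmetric encryption; the key handling below is deliberately simplified:

```python
from cryptography.fernet import Fernet

# Shared symmetric key; SDoC's actual scheme is fine-grained, but the core
# idea is that encryption and decryption happen at the producer and consumer,
# so the cloud hosting the Kafka brokers never sees plaintext.
key = Fernet.generate_key()
cipher = Fernet(key)

def encrypt_value(value: bytes) -> bytes:   # plugs in as a producer serializer
    return cipher.encrypt(value)

def decrypt_value(token: bytes) -> bytes:   # plugs in as a consumer deserializer
    return cipher.decrypt(token)

# With the kafka-python client these hooks would attach as, e.g.:
#   KafkaProducer(value_serializer=encrypt_value, ...)
#   KafkaConsumer(value_deserializer=decrypt_value, ...)
assert decrypt_value(encrypt_value(b"order#42")) == b"order#42"
```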
ISBN (Print): 9781728159850
The proceedings contain 8 papers. The topics discussed include: scalable hyperparameter optimization with lazy Gaussian processes; understanding scalability and fine-grain parallelism of synchronous data parallel training; DisCo: physics-based unsupervised discovery of coherent structures in spatiotemporal systems; GradVis: visualization and second order analysis of optimization surfaces during the training of deep neural networks; metaoptimization on a distributed system for deep reinforcement learning; scheduling optimization of parallel linear algebra algorithms using supervised learning; parallel data-local training for optimizing Word2Vec embeddings for word and graph embeddings; and fine-grained exploitation of mixed precision for faster CNN training.
This paper deals with the development of a 3D printing machine using a parallel-mechanism-based robot. Discussed topics are the high motion resolution, high rigidity, high precision and workspace constraints of the para...
ISBN (Print): 9781728133638
The blockchain uses a decentralized consensus mechanism to maintain the ledger in an immutable way, which makes the blockchain smart contract system highly secure. In existing blockchain systems, all user information is disclosed on the blockchain. However, users are paying more and more attention to personal privacy, so future blockchain smart contract systems need not only to preserve immutability but also to protect user privacy. To achieve this goal, in this paper we propose a Privacy-Protected Blockchain System, in which all data is encrypted within a controllable period of time. Although the data remains visible from a historical perspective, our design can effectively protect user privacy and guard against deceivers, making the system more secure and healthy.
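The abstract does not spell out the mechanism, but one plausible reading of encryption "within a controllable period of time" is per-epoch keys whose later disclosure controls visibility; a minimal sketch under that assumption:

```python
import time
from cryptography.fernet import Fernet

# One key per time epoch: records written on-chain stay encrypted, and
# disclosing (or withholding) an epoch's key controls how long they
# remain private. Epoch length is an arbitrary choice here.
EPOCH_S = 3600
epoch_keys = {}

def current_epoch() -> int:
    return int(time.time() // EPOCH_S)

def encrypt_record(payload: bytes) -> tuple[int, bytes]:
    epoch = current_epoch()
    key = epoch_keys.setdefault(epoch, Fernet.generate_key())
    return epoch, Fernet(key).encrypt(payload)

def decrypt_record(epoch: int, token: bytes) -> bytes:
    # Succeeds only if the key for that epoch has been disclosed to us.
    return Fernet(epoch_keys[epoch]).decrypt(token)
```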
ISBN (Print): 9781538694831
Hadoop is a Java-based open-source programming model that has become a pillar of recent distributed computing by providing massive storage and multiprocessing of data. Hadoop is a simple and easy-to-implement programming model thanks to its use of commodity hardware. The Hadoop open-source project became a foundation for massively parallel processing of big data, including scientific analytics, e-commerce data and sales planning, and enormous volumes of data from various sensors. Since Hadoop plays a vital role in distributed parallel processing of big data, it is useful to know the technical details behind the Hadoop framework. This paper focuses on detailing the necessary steps for the successful implementation of a Hadoop single-node cluster on an individual computer, which provides a foundation for new Hadoop users.
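For reference, the canonical single-node (pseudo-distributed) setup described in the Apache Hadoop documentation reduces to two small XML edits; the port and replication value are the documented defaults, and individual installations may differ:

```xml
<!-- etc/hadoop/core-site.xml: point clients at the local HDFS namenode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml: a single node cannot replicate blocks 3x -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After formatting the filesystem with bin/hdfs namenode -format, the HDFS daemons are started with sbin/start-dfs.sh.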
The advent of the Internet of Things (IoT) significantly stimulates the development of context-aware applications. Complex event processing (CEP) is a technology for real-time data processing. However, a single node o...
Many problems in scientific and engineering applications contain sparse matrices or graphs as main input objects, e.g., numerical simulations on meshes. Large inputs are abundant these days and require parallel processing for memory size and speed. To optimize the execution of such simulations on cluster systems, the input problem needs to be distributed suitably onto the processing units (PUs). More and more frequently, such clusters contain different CPUs or a combination of CPUs and GPUs. This heterogeneity makes the load distribution problem quite challenging. Our study is motivated by the observation that established partitioning tools do not handle such heterogeneous distribution problems as well as homogeneous ones. In this paper, we first formulate the problem of balanced load distribution for heterogeneous architectures as a multiobjective, single-constraint optimization problem. We then split the problem into two phases and propose a greedy approach to determine optimal block sizes for each PU. These block sizes are then fed into numerous existing graph partitioners, so that we can examine how well they handle the above problem. One of the tools we consider is an extension of our own previous work (von Looz et al., ICPP'18) called Geographer. Our experiments on well-known benchmark meshes indicate that only two of the tools under consideration are able to yield good quality. These two are ParMetis (both the geometric and the combinatorial variant) and Geographer. While ParMetis is faster, Geographer yields better quality on average.
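The first phase, determining per-PU block sizes, can be illustrated with a proportional split plus a greedy remainder pass; this is a deliberate simplification of the paper's multiobjective formulation, and the speeds and sizes below are invented:

```python
def greedy_block_sizes(total_work: int, pu_speeds: list[float]) -> list[int]:
    """Split total_work mesh nodes into one block per processing unit,
    proportional to each PU's relative speed."""
    speed_sum = sum(pu_speeds)
    blocks = [int(total_work * s / speed_sum) for s in pu_speeds]
    # Greedily hand leftover units (from integer truncation) to the
    # fastest PUs first, so the total exactly matches the input size.
    leftover = total_work - sum(blocks)
    for i in sorted(range(len(pu_speeds)), key=lambda i: -pu_speeds[i]):
        if leftover == 0:
            break
        blocks[i] += 1
        leftover -= 1
    return blocks

# e.g. 1,000,000 mesh nodes over two CPUs and two GPUs (GPUs ~4x faster):
print(greedy_block_sizes(1_000_000, [1.0, 1.0, 4.0, 4.0]))
# -> [100000, 100000, 400000, 400000]
```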