ISBN (print): 9781728146232
There are two main approaches to graph databases: those based on the RDF model and those based on the labeled property graph model. RDF is well known and well studied, but modern graph databases built on the labeled property graph model have received much less attention. In this paper we evaluated several possible solutions for storing and querying graph data using Gremlin, the general-purpose graph query language from Apache TinkerPop. We used the LDBC Graphalytics framework and compared NoSQL-based setups with SQL-based setups. We evaluated JanusGraph on HBase, both on a single machine and on a cluster, and SQLG on top of PostgreSQL and H2. We used datasets from different domains and of different sizes, up to tens of millions of vertices and edges. The evaluation results show that, for the chosen workload, SQLG with PostgreSQL is about ten times faster than JanusGraph on HBase, and the performance of SQLG with H2 lies in between.
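The abstract mentions Gremlin queries but does not reproduce any, so the sketch below is only an illustration of the kind of traversal such a workload issues, written with the gremlinpython client; the endpoint URL, labels, and property names ('person', 'name', 'knows') are assumptions, not details from the paper.

```python
# Minimal sketch: running a Gremlin traversal from Python with gremlinpython.
# The endpoint, graph contents, and property names are assumptions for illustration.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a Gremlin Server (e.g., one fronting JanusGraph or SQLG).
conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# Count vertices and edges -- a typical sanity check before running a workload.
print('vertices:', g.V().count().next())
print('edges:', g.E().count().next())

# A simple pattern query: names of the neighbours of a given vertex.
names = g.V().has('person', 'name', 'alice').out('knows').values('name').toList()
print(names)

conn.close()
```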
An incremental approach is proposed for deriving an adaptive Distinguishing Test Case (DTC) for a subset of states of an observable non-deterministic Finite-State Machine (FSM). The approach considers the states of the subset incrementally while checking the existence of a DTC. Experiments were conducted to assess and compare various versions of the incremental approach against a non-incremental counterpart. In addition, two implementations of an efficient heuristic approach for the considered problem are proposed. The implementations are based on a special traversal of a successor tree up to a certain height using established construction rules. Experiments were conducted to assess the execution time and the quality of the obtained solutions for large FSMs. Moreover, we determine how often a DTC exists while varying the number of states, the number of outputs, and the degree of non-determinism of a given FSM. A complete summary of the obtained results is included.
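As a rough illustration of the underlying idea of distinguishing states by exploring a successor tree, here is a simplified Python sketch that searches for a preset separating sequence for a pair of states of an observable nondeterministic FSM; the adaptive, subset-of-states case treated in the paper is more involved, and the FSM encoding and the example machine below are assumptions.

```python
from itertools import product

# Observable nondeterministic FSM: transitions[(state, inp)] = {(output, next_state), ...}
# Observability means that for a fixed state, input and output, the next state is unique.

def output_sequences(fsm, state, inputs):
    """All output sequences the FSM may produce from `state` on the input word `inputs`."""
    results = set()
    def walk(s, i, outs):
        if i == len(inputs):
            results.add(tuple(outs))
            return
        for out, nxt in fsm.get((s, inputs[i]), ()):  # undefined input: no continuation
            walk(nxt, i + 1, outs + [out])
    walk(state, 0, [])
    return results

def find_separating_sequence(fsm, s1, s2, alphabet, max_len):
    """Explore the successor tree by increasing depth, looking for a preset input
    sequence whose possible output sequences from s1 and s2 are disjoint."""
    for length in range(1, max_len + 1):
        for word in product(alphabet, repeat=length):
            o1 = output_sequences(fsm, s1, word)
            o2 = output_sequences(fsm, s2, word)
            if o1 and o2 and o1.isdisjoint(o2):
                return word
    return None

# Tiny illustrative FSM (all names are made up).
fsm = {
    ('A', 'a'): {(0, 'A'), (1, 'B')},
    ('B', 'a'): {(1, 'B')},
    ('A', 'b'): {(0, 'A')},
    ('B', 'b'): {(1, 'A')},
}
print(find_separating_sequence(fsm, 'A', 'B', ['a', 'b'], max_len=3))
```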
ISBN (print): 9781728146232
The requirements for the speed and capacity of data storage systems are constantly increasing. As a result, NoSQL and NewSQL database management systems have become more popular. The CAP theorem implies that for a distributed data storage, one usually has to choose a tradeoff between availability and consistency. However, the developers of distributed data storages may use these terms outside their original meanings, leading end users to misunderstand the limitations of the systems. Moreover, even if the limitations are described in detail in the documentation, the software may contain errors. Therefore, there is a need to test data storage systems not only from the performance point of view, but also for consistency and other properties. In this paper we present the results of a consistency analysis of Apache Ignite using the Jepsen framework.
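Jepsen itself is a Clojure framework, so the sketch below is only a language-agnostic illustration of its approach rendered in Python: concurrent clients record an operation history against a store (here a trivial in-memory stand-in rather than Apache Ignite), and a checker inspects the history afterwards. All names and the checking rule are assumptions.

```python
# Conceptual sketch (not the Jepsen framework, which is written in Clojure):
# concurrent clients issue reads and increments against a shared store and record
# an invoke/ok history that a checker can later analyze for violations such as
# lost updates. The in-memory store stands in for a real distributed system.
import random
import threading
import time

store = {'x': 0}
store_lock = threading.Lock()
history = []
history_lock = threading.Lock()

def log(event):
    with history_lock:
        history.append(event)

def client(worker_id, ops=20):
    for _ in range(ops):
        if random.random() < 0.5:
            log((time.monotonic(), worker_id, 'invoke', 'read', None))
            with store_lock:
                value = store['x']
            log((time.monotonic(), worker_id, 'ok', 'read', value))
        else:
            delta = random.randint(1, 5)
            log((time.monotonic(), worker_id, 'invoke', 'add', delta))
            with store_lock:
                store['x'] += delta
            log((time.monotonic(), worker_id, 'ok', 'add', delta))

threads = [threading.Thread(target=client, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# A trivial checker: the final value must equal the sum of acknowledged increments.
acknowledged = sum(e[4] for e in history if e[2] == 'ok' and e[3] == 'add')
print('consistent' if store['x'] == acknowledged else 'lost updates detected')
```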
ISBN (print): 9781728146232
The text recognition problem has been studied for many years. A few OCR engines exist that successfully solve the problem for many languages, but these engines work well only with high-quality scanned images. Social networks nowadays contain a large number of images whose text needs to be analyzed and recognized, and these images vary in quality: text mixed with graphics, poor-quality photos taken with a smartphone camera, etc. In this paper a text extraction pipeline is proposed for extracting text from images of varying quality collected from social media. Input images are categorized into different classes, and then class-specific preprocessing is applied to them (illumination improvement, text localization, etc.). Then an OCR engine is used to recognize the text. We present the results of our experiments on a dataset collected from social media.
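As a minimal sketch of the class-specific preprocessing plus OCR idea, the following snippet uses OpenCV and pytesseract; the two assumed classes ('scan' vs. 'photo'), the thresholding parameters, and the file name are illustrative assumptions rather than the pipeline from the paper.

```python
# Minimal sketch of a two-stage pipeline: class-specific preprocessing followed by OCR.
# The image classifier is left out; the class names and thresholds are assumptions.
import cv2
import pytesseract

def preprocess(image_path, image_class):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if image_class == 'photo':
        # Uneven illumination from a smartphone camera: use adaptive thresholding.
        return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 31, 15)
    # Clean scans: a global Otsu threshold is usually enough.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

def extract_text(image_path, image_class):
    prepared = preprocess(image_path, image_class)
    return pytesseract.image_to_string(prepared)

# Example usage (assumes the file exists):
# print(extract_text('social_media_image.png', 'photo'))
```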
ISBN (print): 9783030410056; 9783030410049
The paper addresses practical challenges related to the development and application of distributed software packages of the Orlando Tools framework to solve real problems. Such packages form a special class of scientific applications characterized by a wide class of problem solvers, a modular software structure, algorithmic knowledge implemented by modules, scalability of computations, execution on heterogeneous resources, etc. The framework is adapted for various categories of users: developers, administrators, and end users. Unlike other tools for developing scientific applications, Orlando Tools provides support for the intensive evolution of algorithmic knowledge, the adaptation of existing knowledge, and the design of new knowledge. It has the capability to extend the class of solved problems. We implement and automate a non-trivial technological sequence for the collaborative development and use of packages, including the continuous integration, delivery, deployment, and execution of package modules in a heterogeneous distributed environment that integrates grid and cloud computing. This approach reduces the complexity of the collaborative development and use of packages and increases the predictability of software operation through the preliminary detection and elimination of errors, significantly reducing the cost of corrections.
ISBN (print): 9781728195865
The protection of data processing is emerging as an essential aspect of data analytics, machine learning, delegation of computation, the Internet of Things, medical and financial analysis, smart cities, genomics, and non-disclosure searching, among others. These applications often use sensitive information that cannot be protected by traditional cryptosystems. Homomorphic Encryption (HE) schemes and secure Multi-Party Computation (MPC) are considered suitable solutions for privacy protection. In this paper, we propose and analyze the performance of three homomorphic Logistic Regression (LR) models with Gradient Descent (GD) algorithms based on the Residue Number System (RNS). We compare their performance with four traditional non-homomorphic versions, one homomorphic algorithm based on RNS with Batch GD, and two state-of-the-art homomorphic algorithms. To validate our approach, we consider six public datasets from different medical domains (diabetes, cancer, drugs, etc.) and genomics. We use 5-fold cross-validation for a fair comparison in terms of solution quality and training time. The results show that the proposed homomorphic solutions have accuracy similar to the non-homomorphic algorithms, improved classification performance, and reduced training time compared with the state-of-the-art HE algorithms.
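For reference, the plaintext model that such schemes evaluate is ordinary logistic regression trained with batch gradient descent. The NumPy sketch below shows only this non-homomorphic baseline on toy data and omits the RNS and HE machinery entirely; none of its names or numbers come from the paper.

```python
# Plaintext baseline only: logistic regression trained with batch gradient descent.
# The homomorphic/RNS machinery evaluated in the paper is not shown here.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_gd(X, y, lr=0.1, epochs=500):
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # predicted probabilities
        grad_w = X.T @ (p - y) / n      # gradient of the log-loss w.r.t. weights
        grad_b = np.mean(p - y)         # gradient w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (assumed, not one of the six datasets from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.3 > 0).astype(float)
w, b = train_logistic_gd(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {accuracy:.3f}")
```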
CERN experiments are preparing for the HL-LHC era, which will bring an unprecedented volume of scientific data. These data will need to be stored and processed by thousands of physicists, but the expected resource growth is nowhere near the extrapolated requirements of existing models, in terms of both storage volume and compute power. Opportunistic CPU resources such as HPCs and university clusters can provide extra CPU cycles, but there is no opportunistic storage. In this article, we present the main architectural ideas, deployment details, and test results, with emphasis on our prototype of a distributed data processing and storage system focused on improving resource efficiency by reducing the overhead of accessing the data. The described prototype was built using geographically distributed WLCG sites and university clusters in Russia.
ISBN (print): 9781728146232
In this paper we introduce ISP-Fuzzer, an extensible fuzzing framework. The framework supports plugins, which makes it possible to tune it for any fuzzing task. ISP-Fuzzer is capable of fuzzing files, standard input, network interfaces, and network protocols. It can also generate BNF-structured data for fuzzing compilers and interpreters. The framework provides a number of plugins for static code analysis, dynamic symbolic execution, directed fuzzing, etc. ISP-Fuzzer is designed to run on multiprocessor and distributed systems. In our experiments, the tool detected a number of defects in binaries from different Linux distributions.
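The abstract does not describe ISP-Fuzzer's interfaces, so the following is only a generic Python sketch of the mutation-based stdin fuzzing loop that such a framework automates; the seed input, the target command, and the crash criterion are assumptions.

```python
# Generic mutation-based fuzzing loop for a program that reads standard input.
# This illustrates the file/stdin fuzzing idea only; it is not ISP-Fuzzer's API,
# and the seed and target command are assumptions.
import random
import subprocess

SEED = b"GIF89a" + b"\x00" * 64          # assumed seed input
TARGET = ["./target_binary"]             # assumed target reading from stdin

def mutate(data, max_flips=8):
    buf = bytearray(data)
    for _ in range(random.randint(1, max_flips)):
        pos = random.randrange(len(buf))
        buf[pos] ^= 1 << random.randrange(8)  # flip a random bit
    return bytes(buf)

def run_once(data):
    proc = subprocess.run(TARGET, input=data, capture_output=True, timeout=5)
    # A negative return code means the process died from a signal (e.g. SIGSEGV).
    return proc.returncode

crashes = []
for _ in range(1000):
    case = mutate(SEED)
    try:
        if run_once(case) < 0:
            crashes.append(case)
    except subprocess.TimeoutExpired:
        crashes.append(case)             # hangs are interesting too
print(f"{len(crashes)} crashing/hanging inputs found")
```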