ISBN: 9781728151731 (print)
Big Data is a massive volume of both unstructured and structured data. Representing big data efficiently as knowledge is crucial for data management. Ontologies provide knowledge as a formal description of a domain of interest. We therefore propose an ontology learning approach for Apache Cassandra. It consists of six mapping rules and derives an OWL ontology from data in Cassandra by applying them. The NorthWind dataset is used to demonstrate how an ontology is learned from data in Cassandra. The evaluation results indicate that our approach can learn an ontology that terminologically covers the modeled domain, since the adequacy of the extracted ontology is greater than 15%.
ISBN: 9781509057733 (print)
Modern trends in the agriculture domain have made people realize the importance of big data. The key challenge of big data in agriculture is to identify the effectiveness of big data analytics. A further challenge is to determine how big data analytics can be used to improve productivity in agricultural practices. The purpose of the proposed research is to reduce the technological gap between rural communities and information through recommendations and a decision support system. The main contribution of this paper is an open-source, cost-effective and scalable big data analytics architecture for an agro-advisory system. As part of the implementation, an analytic framework for big data application development is built. In addition, a prototype application for crop yield prediction is implemented for the cotton crop in Ahmedabad district, Gujarat, India.
ISBN: 9781728111544 (print)
In this paper, we introduce an alternative to the many existing IoT data acquisition and storage systems. We present a self-designed prototype electronic circuit extension for the Raspberry Pi development board, used for collecting sensor data. We also present a Java application, based on the Pi4Java API, for sensor data collection and storage. We set up an Apache Cassandra database cluster to store large amounts of sensor data on low-cost servers while providing high availability. In addition, we present a web application that allows different data visualization operations to be performed on the stored data. The presented system is a full IoT data acquisition, storage and visualization solution.
ISBN: 9781450372466 (print)
The importance of Big Data is being realised worldwide with the advancement of information technologies, leveraging the capabilities of virtualization and cloud computing. Big Data infrastructure and the use of its tools and applications will significantly transform the data centers of businesses in the next decade. Data analytics is evolving with the new real-time capability of Big Data solutions to provide business intelligence for timely and effective decision making. However, Big Data poses various challenges related to infrastructure and resource constraints, as well as other issues including security and privacy. This paper takes an initial step in recognizing the value of creating Big Data infrastructure for delivering high-performance and scalable business intelligence in an organization. It presents the state-of-the-art tools and technologies for Big Data infrastructure and the NIST framework. The advantages of data visualisation are illustrated through industry case scenarios. Big Data trends and challenges are also discussed. Overall, this paper provides valuable insights into the Big Data journey of an organization, enabling a scalable infrastructure for mission-critical decision-making through data visualisation.
ISBN: 9781479999255 (print)
Considering the wide usage of databases and their ever-growing size, it is crucial to improve query processing performance. Selecting an appropriate set of indexes for the workload processed by the database system is an important part of physical design and performance tuning. This selection is a non-trivial task, especially considering the possible number of native indexes in modern databases. We introduce a new approach to the index selection problem using data mining. The method recommends which indexes to create as well as the type of each index. This results in more precise index recommendations that cover not only ascending and descending indexes, but also special indexes supported by the database system. Mining the queries yields candidate indexes, for which virtual indexes are created. As the approach does not require modifications of the database system, it is generically applicable. Scalability evaluations are given for different workloads on the document-based NoSQL database MongoDB.
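As a rough illustration of the query-mining step described in this abstract, the sketch below counts how often each field appears in the filters and sort clauses of a workload and proposes the frequent ones as candidate indexes. It is a minimal sketch under stated assumptions, not the authors' method: the workload format, the function `mine_candidate_indexes`, and the `min_support` threshold are all hypothetical, and no virtual indexes are created.

```python
from collections import Counter


def mine_candidate_indexes(workload, min_support=2):
    """Return (field, direction) pairs used by at least min_support queries.

    Each query is a dict with a MongoDB-style "filter" document and an
    optional "sort" list of (field, direction) pairs. This workload format
    is an assumption made for the example.
    """
    counts = Counter()
    for query in workload:
        seen = set()
        # Fields named in the sort clause keep their requested direction.
        for field_name, direction in query.get("sort", []):
            seen.add((field_name, direction))
        sorted_fields = {name for name, _ in seen}
        # Filter fields default to ascending (1) unless already sorted on.
        for field_name in query.get("filter", {}):
            if field_name not in sorted_fields:
                seen.add((field_name, 1))
        # Count each (field, direction) pair at most once per query.
        counts.update(seen)
    frequent = [(key, n) for key, n in counts.items() if n >= min_support]
    # Most frequent first; ties broken by field name, then direction.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [key for key, _ in frequent]


workload = [
    {"filter": {"customer_id": 7}, "sort": [("created", -1)]},
    {"filter": {"customer_id": 9, "status": "open"}},
    {"filter": {"status": "open"}, "sort": [("created", -1)]},
]
candidates = mine_candidate_indexes(workload)
```

Each resulting pair has the shape of a single-field index key specification, so a real tool could evaluate it before committing to creating the index.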
ISBN: 9781643681177; 9781643681160 (print)
Nowadays, there are many options for corpus linguistic analysis that use different approaches to corpus storage. There are tools based on SQL databases, dedicated implementations such as CQP/CWB, and others that employ plain-text corpora. NoSQL databases have been widely used for big data, data mining and even sentiment analysis. However, as far as we can see, there is no widespread concordancer or consolidated framework that uses the MongoDB architecture for the purposes of corpus linguistics. This paper describes the architecture of a software tool that allows users to analyse monolingual and bilingual parallel corpora with grammatical annotation using MongoDB technology. Our premises are that MongoDB is ideal for non-structured data and provides high flexibility and scalability, so it may also be useful for corpus linguistic research. We analyse MongoDB functionalities such as text search indexes and the query format in order to examine its suitability.
A Tokamak device consists of numerous control systems, which need to be integrated. The CODAC (Control, Data Access and Communication) system requires the configuration settings of these control systems to carry out the integration smoothly. SDD (Self-Description Data) is designed to describe the static configuration of control systems. The ITER CODAC group has released an SDD software package for control system designers to manage the static configuration, but it is specific to ITER plant control systems. Following the idea of ITER SDD, we developed a flexible and scalable SDD framework for developing SDD software for J-TEXT and other sophisticated devices. The SDD framework describes the configuration settings of various control systems, including physical and logical elements and their relation information, in SDD models, which are classified into Components and Connections. The framework is composed of three layers: the MongoDB database, an open-source, dynamic-schema NoSQL (Not Only SQL) database; the SDD service, which maps SDD models to MongoDB and handles the transaction and business logic; and the SDD applications, which can be used to create and maintain SDD information and to generate various kinds of output from the stored SDD information. (C) 2015 Elsevier B.V. All rights reserved.
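The split of SDD models into Components and Connections can be pictured with a small data-structure sketch. The abstract does not give the framework's actual schema, so the class fields, the function `to_documents`, and the sample names below are all hypothetical; the sketch only shows how such models might be checked and flattened into plain documents of the kind a document database stores.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Component:
    """A physical or logical element of a control system (hypothetical fields)."""
    name: str
    kind: str
    settings: dict = field(default_factory=dict)


@dataclass
class Connection:
    """A relation between two components, referenced by name (hypothetical fields)."""
    source: str
    target: str
    relation: str


def to_documents(components, connections):
    """Validate connection endpoints, then flatten the models into plain dicts."""
    names = {component.name for component in components}
    for connection in connections:
        if connection.source not in names or connection.target not in names:
            raise ValueError(f"unknown endpoint in {connection}")
    return {
        "components": [asdict(component) for component in components],
        "connections": [asdict(connection) for connection in connections],
    }


components = [
    Component("plc-1", "controller", {"ip": "192.0.2.10"}),
    Component("valve-3", "actuator"),
]
connections = [Connection("plc-1", "valve-3", "controls")]
documents = to_documents(components, connections)
```

Keeping connections as references by name, rather than nesting components inside each other, lets each model be stored and updated as its own document.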
ISBN: 9781467385718 (print)
Big Data refers to a data set of large and complex data that is hard to process using traditional applications [6]. With the increasing usage of RFID technology and location-based services, logistics warehouse control is facing Big Data. The volume of information generated by each product can be huge as it moves through different locations. Focused on traceability, the system proposed in this paper uses different modules to follow the movement of products, manage inventories and optimize stock bearings for materials destined for production and/or distribution. The proposed system collects product locations using RFID technologies, and models and warehouses product trajectories to support logistics management and decision-making, such as the logistics planning and scheduling of a warehouse.
ISBN: 9781728188768 (print)
This paper focuses mainly on query optimization for databases in cloud computing environments. We also consider the implementation of such cloud databases on both public and private cloud systems. Having reviewed the available literature, we found that resource allocation on private and public clouds is a real challenge, and that optimizing NoSQL cloud databases across various clusters is a new implementation concept. We wish to observe its real-time results for various operations.
ISBN: 9781728128887 (print)
Estimating the parameters of performance models of cloud systems presents several challenges due to the distributed nature of the applications, the chains of interactions of requests with architectural nodes, and the parallelism and coordination mechanisms implemented within these systems. In this work, we present a new inference algorithm for model parameters, called the state divergence (SD) algorithm, to accurately estimate resource demands in a complex cloud application. Unlike existing approaches, SD attempts to minimize the divergence between observed and modeled marginal state probabilities for individual nodes within an application, and therefore requires probabilistic measures from both the system and the underpinning model. Validation against a case study using the Apache Cassandra NoSQL database, together with random experiments, shows that SD can accurately predict demands and improve system behavior modeling and prediction.
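The idea of fitting a demand by matching state probabilities can be shown on a toy case. The sketch below is not the published SD algorithm: it assumes a single node modeled as an M/M/1 queue with a known arrival rate, uses the Kullback-Leibler divergence as the divergence measure, and searches a fixed grid of candidate demands. The function names and all numeric values are hypothetical.

```python
import math


def mm1_state_probs(arrival_rate, demand, max_state):
    """Marginal queue-length probabilities of an M/M/1 queue.

    Returns P(N = 0), ..., P(N = max_state - 1), followed by the tail
    probability P(N >= max_state), so the list sums to 1.
    """
    rho = arrival_rate * demand  # utilization, must lie in (0, 1)
    probs = [(1 - rho) * rho**n for n in range(max_state)]
    probs.append(rho**max_state)
    return probs


def kl_divergence(observed, modeled):
    """Kullback-Leibler divergence of the model from the observations."""
    return sum(p * math.log(p / q) for p, q in zip(observed, modeled) if p > 0)


def estimate_demand(observed, arrival_rate, steps=1000):
    """Grid search for the demand whose modeled state probabilities
    diverge least from the observed ones."""
    max_state = len(observed) - 1
    best_demand, best_divergence = None, math.inf
    for i in range(1, steps):
        # Candidate demands keep the utilization i / steps inside (0, 1).
        demand = i / (steps * arrival_rate)
        modeled = mm1_state_probs(arrival_rate, demand, max_state)
        divergence = kl_divergence(observed, modeled)
        if divergence < best_divergence:
            best_demand, best_divergence = demand, divergence
    return best_demand


# Synthetic "observations" generated from a known demand of 0.3.
observed = mm1_state_probs(arrival_rate=2.0, demand=0.3, max_state=5)
estimate = estimate_demand(observed, arrival_rate=2.0)
```

On these synthetic observations the search recovers the generating demand. Real measurements are noisy and span several interacting nodes, which is the setting the abstract addresses.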