The Shipyard Management Information System is a computer-based system that processes data on virtually every element of operations within a United States Naval Shipyard. This thesis examines the characteristics and objectives of a general management information system and the philosophies of the Shipyard Management Information System as envisioned by the Naval Sea Systems Command. The motivations for, and the characteristics of, distributed processing as they apply to management information systems are presented. It is shown how an effective distributed data processing system can be created through selective addition of equipment to, and purposeful personnel reorganization of, the data processing configuration at Mare Island Naval Shipyard.
We have proposed a Web-based sensor network constructed of Web-based sensor nodes and a remote management system. The Web-based sensor nodes consist of communication units and measurement devices with Web servers. The management system provides intelligent processing and rule-based functions to manage the nodes flexibly via the Internet, and performs various image analyses easily with Web application services. By distributing the image analyses to Web application services, the proposed system provides versatile and scalable data processing. We demonstrated that it can realize the desired image analyses effectively and perform complicated management by changing its operations depending on the results of analysis. (C) 2011 Elsevier B.V. All rights reserved.
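The abstract does not specify the rule format; as a hedged illustration of how such rule-based management might branch on image-analysis results, consider the sketch below. All rule names and result fields here are invented for illustration, not taken from the paper:

```python
# Each rule pairs a condition on an analysis result with an action the
# management system should take; rules are tried in order, first match wins.
RULES = [
    (lambda r: r["motion_score"] > 0.8, "raise_alert"),
    (lambda r: r["brightness"] < 0.2, "switch_to_night_mode"),
    (lambda r: True, "log_only"),  # default fallback rule
]

def decide(result):
    """Return the action of the first rule whose condition matches."""
    for condition, action in RULES:
        if condition(result):
            return action
```

Changing the system's operations depending on analysis results then amounts to editing the rule table rather than the node firmware.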
With the development of synchrotron radiation sources and high-frame-rate detectors, the amount of experimental data collected at synchrotron radiation beamlines has increased exponentially. As a result, data processing for synchrotron radiation experiments has entered the era of big data. It is becoming increasingly important for beamlines to be able to process large-scale data in parallel to keep up with the rapid growth of data. Currently, beamlines lack a data processing solution built on a big data technology framework. Apache Hadoop is a widely used distributed system architecture for massive data storage and computation. This paper presents a set of distributed data processing schemes for beamline experimental data using Hadoop. The Hadoop Distributed File System is utilized as the distributed file storage system, and Hadoop YARN serves as the resource scheduler for the distributed computing cluster. A distributed data processing pipeline that can carry out massively parallel computation is designed and developed using Spark. The entire data processing platform adopts a distributed microservice architecture, which makes the system easy to expand, reduces module coupling and improves reliability.
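The pipeline itself runs on Spark over YARN; the abstract gives no code, but the underlying pattern it relies on (partition the detector frames, map an analysis over each partition in parallel, then merge the partial results) can be sketched with the standard library alone. This is an illustration of the pattern, not the paper's implementation; a thread pool stands in for the cluster's worker nodes, and `analyze_frame` is a placeholder analysis:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def analyze_frame(frame):
    # Stand-in for a per-frame beamline analysis: here we just sum
    # the pixel values of one toy detector frame.
    return sum(frame)

def process_partition(frames):
    # Each worker handles one partition of frames independently,
    # mirroring a Spark per-partition map stage.
    return [analyze_frame(f) for f in frames]

def run_pipeline(frames, n_partitions=4):
    # Split the frame list into partitions, fan them out to workers,
    # then concatenate the partial results (the merge/reduce stage).
    parts = [frames[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = pool.map(process_partition, parts)
    return reduce(lambda a, b: a + b, partials, [])
```

In the real system each partition would live in HDFS and the map stage would run on YARN-scheduled executors; the shape of the computation is the same.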
The Systolic Hypertension in the Elderly Program (SHEP) Pilot was a collaborative clinical trial that distributed to the clinics all data processing tasks except randomization assignment codes and morbidity and mortality data. The clinics used customized programs to enter and verify data interactively, to maintain their own local master files, and to transmit the data electronically to the Coordinating Center. We measured quality control against criteria from centralized as well as distributed models: the error rate for baseline forms was 0.5 per 1000 items, ninety-eight percent of the forms were query-free, and central reentry of a 5% sample of the data yielded a miskey rate of 2 per 1000 items. The potential problems of distributed data processing are the vulnerability of the local master files and the time demands on Coordinating Center programmers for maintaining clinic computer systems. The advantages are the active involvement of clinical staff in their own quality control, the functional accessibility of the clinics to the Coordinating Center in controlling protocol decisions and data monitoring, and the level of accuracy, completeness, and timeliness of the data that can be achieved.
ISBN:
(Print) 9783319243061; 9783319243054
In recent years, research on big data, data storage and other topics that represent innovations in the analytics field has become very popular. This paper describes a proposal for a big web data application and archive for distributed data processing with Apache Hadoop, including a framework with selected methods that can be used with this platform. It proposes a workflow to create a web content mining application and a big data archive using modern technologies such as Python, PHP, JavaScript, MySQL and cloud services. It also gives an overview of the architecture, methods and data structures used in the context of web mining, distributed processing and big data analytics.
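No code accompanies the abstract; as a minimal stand-alone sketch of the first step of such a web content mining workflow, in Python (one of the technologies the paper lists), link extraction from fetched pages could look like this. The class and function names are illustrative, not from the paper:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href targets from anchor tags -- the crawl-frontier step
    of a web content mining pipeline, run before distributed analysis."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs parsed from the tag
            self.links.extend(v for k, v in attrs if k == "href")

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links
```

The extracted links and page bodies would then be written to the big data archive for Hadoop-side processing.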
ISBN:
(Print) 9798350370621
In recent years, distributed data-processing frameworks have become popular for processing big data. However, in an HPC system, where the computation and storage nodes are separated, the bandwidth between the computation and storage components is small, reducing data processing throughput. Therefore, in this paper, data are stored on the computation nodes to mitigate this throughput degradation. We propose an I/O acceleration method that integrates Apache Arrow and CHFS. It leverages non-volatile memory, a state-of-the-art storage device, via CHFS, and exposes CHFS to a distributed data processing framework via Apache Arrow's abstract file system API. The evaluation results showed that the system achieved up to 11.60 times higher bandwidth than reading data from the parallel file system Lustre. This study also compared Apache Arrow with BeeOND and UnifyFS, other ad hoc filesystems. The proposed Apache Arrow CHFS showed up to 1.67x/1.23x better write performance. The implementation is published at https://***/tsukuba-hpcs/arrow-chfs
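Apache Arrow exposes storage backends through an abstract file-system interface, which is what allows a backend like CHFS to be slotted in beneath an otherwise unmodified framework. The hedged sketch below imitates that pattern with the standard library only; `InMemoryFS` plays the role of an ad hoc store such as CHFS, and none of these class or method names come from Arrow or CHFS themselves:

```python
from abc import ABC, abstractmethod

class AbstractFileSystem(ABC):
    """Minimal stand-in for an abstract FS API like Apache Arrow's."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

class InMemoryFS(AbstractFileSystem):
    """Toy backend playing the role of an ad hoc store such as CHFS."""

    def __init__(self):
        self._blobs = {}

    def write(self, path, data):
        self._blobs[path] = data

    def read(self, path):
        return self._blobs[path]

def copy_dataset(src: AbstractFileSystem, dst: AbstractFileSystem, path: str):
    # Framework code is written once against the abstract API and works
    # with any backend -- Lustre, BeeOND, UnifyFS, or a CHFS-like store.
    dst.write(path, src.read(path))
```

The point of the design is that swapping Lustre for node-local non-volatile memory changes only which backend class is instantiated, not the processing code.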
ISBN:
(Print) 9789638841001
This paper presents a concept for a measuring instrument with distributed data processing that cooperates with a PC. A functional configuration of the instrument, which performs additional functions connected with assessing the accuracy of measurement results, is shown together with the respective algorithms. The principles of cooperation with a PC are discussed. A laboratory implementation of the designed instrument and the scope of its use in teaching are proposed.
ISBN:
(Print) 0780364864
The development of control systems often requires organizing parallel computing processes or implementing software for distributed data processing. Solving these problems involves stages such as constructing the system structure, developing the scheme of process communication, and choosing special language tools for parallel programming. Formal simulation tools such as Petri nets and their extensions can ease these tasks. The communication of parallel processes is described by the net structure, which can easily be obtained from the process state graph and the algorithm implemented by the process. It is important to note that Petri nets can describe asynchronous and parallel independent events: transition firings act as events and result in creating, moving and deleting tokens. In this case, processes are represented by tokens, and the tuple of token attributes acts as a process's data segment. To this end, an object-oriented modification of hierarchical E-nets has been developed. In developing this E-net modification, we concentrated on creating an open program interface, which achieves two important goals. First, one can bind net events to any calculations or input/output operations through any available code library. Second, the kernel of a distributed system constructed this way is independent and can be reused in the resulting software product. Although development of the corresponding design tools is not finished, the effectiveness of the approach is confirmed by the results of the authors' scientific work.
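As a concrete illustration of the token semantics described above (not the authors' E-net implementation), a minimal place/transition Petri net can be simulated in a few lines: a transition is enabled when every input place holds a token, and firing it consumes one token per input place and produces one per output place:

```python
class PetriNet:
    def __init__(self, marking):
        # marking: place name -> token count (the net's current state)
        self.marking = dict(marking)
        self.transitions = {}  # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        # A transition is enabled when every input place has a token.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1  # consume one token per input place
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1  # produce tokens
```

Two processes synchronizing on a shared resource can then be modeled as two transitions competing for a single token in a "resource" place, which is exactly the kind of asynchronous, independent behavior the abstract attributes to the net formalism.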
ISBN:
(Print) 9781728117393
Current trends, such as the growing number of Internet-connected devices, the exponential growth of data volumes, and the development of cloud technologies, are changing all spheres of the economy and business. Global network traffic growth dictates the business need to configure large-scale systems and networks. At the same time, the explosive growth of data transmission forces state and commercial enterprises to look for ways to quickly reduce current costs while continuously implementing new services and technologies. The paper proposes the development of a web-oriented digital platform for distributed data processing that renders digital educational services for people with disabilities. For the software implementation of the developed digital platform, the PHP framework Pandora is proposed. The architecture of the digital platform is represented as UML class diagrams, plug-ins and widgets. Thanks to a wide range of software tools and open APIs, the developed digital platform makes it possible to quickly connect and deploy new digital services.
ISBN:
(Print) 9780819469540
Spatial data partitioning strategy plays an important role in GIS distributed spatial data storage and processing; its key problem is how to partition spatial data across distributed nodes in a network environment. Existing spatial data partitioning methods do not consider the spatial locality and unstructured, variable-length characteristics of spatial data; they simply partition spatial data based on one or more attribute values, which can result in storage capacity imbalance between distributed processing nodes. Addressing this, we state two basic principles that spatial data partitioning should meet. We propose a new spatial data partitioning method based on hierarchical decomposition along a low-order Hilbert space-filling curve, which avoids excessively intensive space partitioning by hierarchically decomposing subspaces. The proposed method uses the Hilbert curve to impose a linear ordering on the multidimensional spatial objects and partitions the spatial objects according to this ordering. Experimental results show that the proposed spatial data partitioning method not only achieves better storage load balance between distributed nodes but also preserves the spatial locality of data objects after partitioning.
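The abstract does not include the decomposition algorithm itself; the sketch below shows only the core idea, assuming nothing beyond the standard Hilbert xy-to-distance conversion: map each 2-D cell to its position along a low-order Hilbert curve, then cut that linear order into contiguous, size-balanced partitions. Function names are illustrative, not the paper's:

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) on a 2**order x 2**order grid to its position
    along the Hilbert curve (standard iterative xy-to-d conversion)."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Keep only the offset within the current quadrant, then rotate
        # the quadrant so the curve stays continuous at the next level.
        x &= s - 1
        y &= s - 1
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def partition(objects, order, n_nodes):
    """Assign spatial objects (x, y) to n_nodes contiguous ranges of the
    Hilbert order, so each node's share stays spatially clustered."""
    ranked = sorted(objects, key=lambda p: hilbert_index(order, *p))
    size = -(-len(ranked) // n_nodes)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

Because consecutive Hilbert indices are grid-adjacent, cutting the sorted sequence into equal ranges yields partitions that are balanced in size yet spatially compact, which is the locality-plus-balance property the abstract claims.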