ISBN (print): 9780769546766
Many transient simulations spend a significant portion of the overall runtime solving a linear system. A wide variety of preconditioned linear solvers have been developed to quickly and accurately solve different types of linear systems, each having options to customize the preconditioned solver for a given linear system. As a transient simulation progresses, it may produce significantly different linear systems: special events can make the linear systems more difficult to solve, while the model moving closer to a state of equilibrium can make them easier to solve. Machine learning algorithms provide the ability to dynamically select the preconditioned linear solver for each linear system produced by a simulation. We can generate databases by computing attributes for each linear system, physical attributes for the transient simulation, computational attributes, and running times for a set of preconditioned solvers on each linear system. Machine learning algorithms can then use these databases to generate classifiers capable of dynamically selecting a preconditioned solver for each linear system given a set of attributes. This allows each transient simulation to be computed quickly and accurately while switching preconditioned solvers throughout the simulation. It also has the potential to produce speedups compared with using a single preconditioned solver for an entire transient simulation.
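A minimal sketch of the selection step, assuming a database in which each linear system is described by a numeric attribute vector plus the measured runtime of every candidate solver. The abstract does not name a learning algorithm, so k-nearest-neighbour voting stands in here; the solver names, attribute values, and the functions `label_fastest` and `predict_solver` are hypothetical.

```python
from math import dist


def label_fastest(records):
    """Label each linear system with the preconditioned solver that ran fastest on it.

    records: list of (attribute_vector, {solver_name: runtime_seconds}).
    """
    return [(attrs, min(times, key=times.get)) for attrs, times in records]


def predict_solver(labeled, attrs, k=3):
    """Pick a solver for a new linear system by majority vote of its k nearest neighbours."""
    nearest = sorted(labeled, key=lambda item: dist(item[0], attrs))[:k]
    votes = {}
    for _, solver in nearest:
        votes[solver] = votes.get(solver, 0) + 1
    # Ties go to the solver that appears first among the nearest neighbours
    return max(votes, key=votes.get)


# Hypothetical database: two attributes per system and runtimes for two solvers
records = [
    ((1.0, 0.1), {"gmres+ilu": 2.0, "cg+jacobi": 5.0}),
    ((1.1, 0.2), {"gmres+ilu": 2.5, "cg+jacobi": 4.0}),
    ((5.0, 3.0), {"gmres+ilu": 6.0, "cg+jacobi": 1.5}),
    ((5.2, 2.8), {"gmres+ilu": 7.0, "cg+jacobi": 1.0}),
]
labeled = label_fastest(records)
```

In a transient run, the attributes of each new linear system would be computed and passed to `predict_solver` before solving, so the chosen solver can change from one time step to the next.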
ISBN (print): 0818684038
On-Line Analytical Processing (OLAP) techniques are used for data analysis and decision support systems. The multidimensionality of the underlying data is well represented by multidimensional databases. OLAP calculations can be used effectively for data mining in knowledge discovery, and they require high-performance parallel systems to support interactive analysis. Precomputed aggregates in a data cube enable efficient query processing for OLAP applications. In this article, we present parallel data cube construction from a relational database on distributed-memory parallel computers. The data cube is then used to mine associations using Attribute Focusing. Results on the IBM-SP2 show that our algorithms and techniques scale to a large number of processors, providing a high-performance platform for such applications.
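To illustrate what a data cube holds, the sketch below computes every group-by (cuboid) over each subset of the dimensions, using SUM as the aggregate. Processors are simulated by splitting the rows into lists, and the partial cubes are merged by adding cell values, which works because SUM is distributive. The column names and the `sales` measure are hypothetical, and this naive approach is not the parallel algorithm from the paper.

```python
from collections import defaultdict
from itertools import combinations


def local_cube(rows, dims, measure="sales"):
    """Compute every cuboid (group-by over each subset of dims) for one partition of rows."""
    cube = defaultdict(lambda: defaultdict(float))
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            for row in rows:
                key = tuple(row[d] for d in group)
                cube[group][key] += row[measure]
    return cube


def merge_cubes(partials):
    """Combine per-processor partial cubes; SUM is distributive, so partial sums simply add."""
    total = defaultdict(lambda: defaultdict(float))
    for cube in partials:
        for group, cells in cube.items():
            for key, value in cells.items():
                total[group][key] += value
    return total


rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
]
partitions = [rows[:2], rows[2:]]  # stand-in for rows spread over two processors
cube = merge_cubes(local_cube(p, ["region", "product"]) for p in partitions)
```

Each cuboid is recomputed from the raw rows here; practical construction algorithms derive smaller cuboids from their parents and must also decide how to distribute data across processors.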
ISBN (print): 0818682272
In distributed object-oriented databases (DOODB), objects are distributed across different sites on a communication network. In a DOODB, class fragmentation, which divides a class into fragments, is needed to improve performance and reduce data transfer. Class fragmentation differs from fragmentation in conventional relational databases. We have proposed vertical class fragmentation to reflect the characteristics of object-oriented databases. In this paper, we define the attribute partitioning algorithm and describe the results of its implementation and comparison.
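The abstract does not describe its attribute partitioning algorithm. As a stand-in, the sketch below uses the classic affinity idea from relational vertical fragmentation: two attributes have high affinity when frequently executed methods access them together, and attributes whose pairwise affinity meets a threshold are grouped into the same vertical fragment. The method workload, frequencies, and threshold are hypothetical.

```python
from itertools import combinations


def affinity(methods):
    """methods: {name: (frequency, set_of_attributes_used)}.
    Returns {(a, b): total frequency of methods that access both a and b}."""
    aff = {}
    for freq, attrs in methods.values():
        for a, b in combinations(sorted(attrs), 2):
            aff[(a, b)] = aff.get((a, b), 0) + freq
    return aff


def fragment(attributes, aff, threshold):
    """Union-find grouping: attributes linked by affinity >= threshold share a fragment."""
    parent = {a: a for a in attributes}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (a, b), value in aff.items():
        if value >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for a in sorted(attributes):
        groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


# Hypothetical workload for an Employee class
methods = {
    "print_label": (50, {"name", "age"}),
    "payroll": (30, {"salary", "dept"}),
    "audit": (2, {"age", "salary"}),
}
fragments = fragment(["name", "age", "salary", "dept"], affinity(methods), threshold=10)
```

Raising or lowering the threshold trades fragment count against how often a method must touch more than one fragment.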
Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper, we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper, we describe experiments with practical prefetching policies that base decisions only on on-line reference history and that can be implemented efficiently. We also test how well those policies perform across a range of architectural parameters.
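One simple policy that decides only from on-line reference history is sequential lookahead: once the last two references are consecutive blocks, prefetch the next block. The sketch replays a block reference string through an LRU cache with and without that policy and counts hits. It assumes prefetches complete instantly and only illustrates this class of policy; it is not one of the paper's policies.

```python
from collections import OrderedDict


def simulate(refs, cache_size, prefetch=True):
    """Replay block references through an LRU cache and return the number of hits.

    With prefetch=True, seeing block b right after b-1 triggers a prefetch of b+1.
    Prefetched blocks are assumed to arrive instantly.
    """
    cache = OrderedDict()
    hits = 0
    prev = None

    def insert(block):
        cache[block] = True
        cache.move_to_end(block)
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used block

    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            insert(block)
        # Decision uses only the reference history seen so far
        if prefetch and prev is not None and block == prev + 1 and block + 1 not in cache:
            insert(block + 1)
        prev = block
    return hits
```

For a purely sequential scan such as `list(range(10))` with a 4-block cache, the lookahead turns every reference after the second into a hit, while plain LRU gets none.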
ISBN (print): 0818620528
Integrity constraint handling is considered an important issue in relational database management systems, and many studies have already been conducted in this area. However, little attention has been paid to the influence of relation fragmentation and parallelism on constraint handling. This paper shows how relation fragmentation complicates matters on the one hand, and how parallelism can improve the efficiency of constraint enforcement on the other. The ideas presented in this paper are used in the context of the PRISMA database machine, but they are more generally applicable.
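The sketch below illustrates both sides of this argument for a key (uniqueness) constraint, using hypothetical data and function names. When a relation is hash-fragmented on the key, duplicate key values can only meet inside one fragment, so every fragment can be checked independently and in parallel; when it is fragmented on another attribute, the fragments have to be combined first. Threads stand in for the processors of a parallel database machine, and nothing here reflects PRISMA's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fragment_by(rows, attr, n):
    """Hash-fragment a relation (list of dicts) on attribute `attr` into n fragments."""
    frags = [[] for _ in range(n)]
    for row in rows:
        frags[hash(row[attr]) % n].append(row)
    return frags


def local_unique(frag, key):
    """True if no key value repeats within this one fragment."""
    seen = set()
    for row in frag:
        if row[key] in seen:
            return False
        seen.add(row[key])
    return True


def check_unique(frags, key, fragmented_on):
    """Enforce a key constraint over a fragmented relation."""
    if fragmented_on == key:
        # Equal keys always hash to the same fragment: check fragments in parallel
        with ThreadPoolExecutor() as pool:
            return all(pool.map(lambda f: local_unique(f, key), frags))
    # Duplicates may span fragments: fall back to one global check
    return local_unique([row for f in frags for row in f], key)


rows = [{"id": 1, "dept": "a"}, {"id": 2, "dept": "b"}, {"id": 1, "dept": "b"}]
```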
ISBN (print): 0818620528
This paper considers text retrieval systems that store extremely large amounts of text while providing a multi-user retrieval service to a large customer base. Because of the severe I/O demands of such a system, it is usually beneficial, if not necessary, to use a multi-processor system with multiple I/O facilities to increase parallel I/O activity, the objective being to lower search response *** defining the problem, we model a solution and show that the application can be handled very effectively by a multi-processor system with a simple LAN-based topology. The final discussion describes a type of functional splitting which, if done carefully, helps improve search response time.
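A common way to spread the I/O load of text search over several processors is to partition the documents, give each node its own inverted index, scatter every query to all nodes, and merge the answers. The sketch below shows that scatter-gather pattern for conjunctive (AND) queries; it assumes whitespace tokenization and hypothetical documents, and it does not show the functional splitting mentioned in the abstract, which is not detailed there.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(docs):
    """Build one node's inverted index: term -> set of local document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search_node(index, terms):
    """Answer a conjunctive (AND) query against a single node's index."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search(indexes, query):
    """Scatter the query to every node in parallel, then gather and merge the hits."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        results = pool.map(lambda idx: search_node(idx, terms), indexes)
        return sorted(set().union(*results))


# Hypothetical collection, partitioned across two nodes by document id
indexes = [
    build_index({1: "parallel disk io", 2: "text retrieval"}),
    build_index({3: "parallel text retrieval", 4: "disk cache"}),
]
```

Because each node scans only its own partition, adding nodes (and their disks) increases the I/O that can proceed in parallel for every query.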
ISBN (print): 9781665432818
Detecting malware during execution using machine learning models presents some hard-to-solve problems, relating to data set construction, that are seldom discussed in the literature. We identify and name these problems and present our solutions to them in the form of Curator, a specialized distributed system for detonating potentially malicious programs, extracting behavior information, and correctly labeling that behavior to construct an accurate, consistent, and reliable data set. We demonstrate the need for Curator by using the generated data sets to train machine learning models based on Naive Bayes, Logistic Regression, and Random Forests. Our work currently focuses on the Windows operating system.
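Once behavior has been extracted and labeled, training a model such as Naive Bayes is the comparatively easy part. The sketch below is a minimal Bernoulli Naive Bayes over sets of observed behaviors with Laplace smoothing; the behavior names (Windows API calls) and labels are hypothetical, and Curator itself is not shown.

```python
from math import log


def train(samples):
    """samples: list of (set_of_behaviours, label).
    Returns {label: (log_prior, {behaviour: P(present | label)})} with Laplace smoothing."""
    vocab = set().union(*(feats for feats, _ in samples))
    model = {}
    for label in {lab for _, lab in samples}:
        rows = [feats for feats, lab in samples if lab == label]
        log_prior = log(len(rows) / len(samples))
        probs = {f: (sum(f in r for r in rows) + 1) / (len(rows) + 2) for f in vocab}
        model[label] = (log_prior, probs)
    return model


def predict(model, feats):
    """Label maximizing log P(label) + sum over the vocabulary of log P(present/absent | label)."""
    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(log(p if f in feats else 1 - p) for f, p in probs.items())
    return max(model, key=score)


# Hypothetical labeled behaviour sets (observed Windows API calls)
samples = [
    ({"CreateRemoteThread", "WriteProcessMemory"}, "malicious"),
    ({"WriteProcessMemory", "RegSetValueEx"}, "malicious"),
    ({"ReadFile"}, "benign"),
    ({"ReadFile", "RegSetValueEx"}, "benign"),
]
model = train(samples)
```

A mislabeled or inconsistently labeled sample shifts these per-behavior probabilities directly, which is why the quality of the data set matters as much as the choice of model.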
ISBN (print): 9781450306270
The need to manage massive attributed graphs is becoming common in many areas, such as recommendation systems, proteomics analysis, social network analysis, and bibliographic analysis. This makes it necessary to move toward parallel systems that can manage graph databases containing millions of vertices and edges. Previous work on distributed graph databases has focused on finding ways to partition the graph to reduce network traffic and improve execution time. However, partitioning a graph and keeping track of the location of every vertex might be unrealistic for massive graphs. In this paper, we propose ParallelGDB, a new system based on specializing the local cache of each node in the system, which provides a better cache hit ratio. ParallelGDB uses random graph partitioning, avoiding complex partitioning methods based on the graph topology, which usually require managing extra data structures. The proposed system provides an efficient environment for distributed graph databases.
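One advantage of random partitioning is that a vertex's location can be computed from its id instead of being looked up in a table. The sketch below routes each vertex request to the node given by a multiplicative hash of the id, so node caches hold disjoint vertex sets, and compares that with round-robin routing, where the same vertex may occupy several caches. This is one illustrative reading of cache specialization under assumed parameters, not the system's actual design.

```python
from collections import OrderedDict


def owner(vertex, n_nodes):
    """Pseudo-random placement computed from the id (Knuth multiplicative hash),
    so no per-vertex location table is needed."""
    return (vertex * 2654435761) % 2**32 % n_nodes


def replay(requests, n_nodes, cache_size, specialized=True):
    """Count cache hits when each request is served from one node's LRU cache.

    specialized=True routes vertex v to owner(v), so each cache covers a disjoint
    slice of vertices; False spreads requests round-robin, so caches overlap.
    """
    caches = [OrderedDict() for _ in range(n_nodes)]
    hits = 0
    for i, v in enumerate(requests):
        cache = caches[owner(v, n_nodes) if specialized else i % n_nodes]
        if v in cache:
            hits += 1
            cache.move_to_end(v)
        else:
            cache[v] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used vertex
    return hits
```

With requests `[0, 1, 2] * 4`, two nodes, and two cache slots each, hash routing keeps every vertex resident on exactly one node and yields 9 hits, while round-robin routing makes each node cycle through all three vertices and yields none.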
ISBN (print): 9781509036820
Graph databases are becoming a critical tool for the analysis of graph-structured data in multiple scientific and technical domains, including cybersecurity and computational biology. In particular, the storage, analysis, and querying of attributed graphs is a very important capability. Attributed graphs contain properties attached to the vertices and edges of the graph structure. Queries over attributed graphs include not only structural pattern matching but also conditions over the values of the attributes. In this work, we present GraQL, a query language designed for high-performance attributed graph databases hosted on a high-memory-capacity cluster. GraQL is designed to be the front-end language for the attributed graph data model of the GEMS database system.
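GraQL's syntax is not given in the abstract, so the sketch below uses plain Python to show what such a query evaluates: a structural pattern (a source vertex, an edge, a destination vertex) combined with conditions over vertex and edge attributes. The graph, attribute names, and the `match` helper are hypothetical.

```python
# Hypothetical attributed graph: properties attached to vertices and to edges
vertices = {
    1: {"type": "host", "os": "linux"},
    2: {"type": "host", "os": "windows"},
    3: {"type": "ip", "addr": "10.0.0.5"},
}
edges = [
    (1, 3, {"label": "connects", "bytes": 5000}),
    (2, 3, {"label": "connects", "bytes": 120}),
]


def match(vertices, edges, src_pred, edge_pred, dst_pred):
    """Return (src, dst) pairs where an edge exists (structure) and all attribute predicates hold."""
    return [
        (s, d)
        for s, d, attrs in edges
        if src_pred(vertices[s]) and edge_pred(attrs) and dst_pred(vertices[d])
    ]


# Pattern: (host) -[connects, bytes > 1000]-> (ip)
heavy = match(
    vertices,
    edges,
    lambda v: v["type"] == "host",
    lambda e: e["label"] == "connects" and e["bytes"] > 1000,
    lambda v: v["type"] == "ip",
)
```

Both edges match the structure alone; only the attribute condition on `bytes` separates them, which is exactly the extra dimension attributed-graph queries add.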
ISBN (print): 9781728174457
While large-scale scientific experiments and simulations produce massive amounts of data, only a small fraction of the data contains useful information. Efficient querying over such volumes of data to extract that information increases the productivity of the scientific discovery process. Although querying has been explored extensively in relational databases, research on and adoption of querying tools for scientific data stored in parallel file systems on high performance computing (HPC) systems are still in their infancy. In this paper, we introduce a parallel query service, called PDC-Query, for an object data management system (ODMS) on HPC systems. It operates on partitioned objects in parallel and provides several optimization strategies for fast query evaluation. The ODMS paradigm for HPC systems is promising for reducing the data management burden on users and for moving data transparently across the deep memory hierarchy of modern HPC systems. We propose a global-histogram-based approach that accelerates query evaluation through selectivity estimation and by reducing the amount of data that needs to be loaded from storage and processed. We compare querying performance and demonstrate the efficiency and scalability of the different approaches that PDC-Query supports, including global histograms, bitmap indexes, sorting, and full scan, on various queries over a plasma physics dataset with 125 billion particles and an astronomy dataset with 25 million objects.
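A minimal sketch of the global-histogram idea, assuming equal-width bins shared by all partitions and values spread uniformly within each bin: per-partition histograms are summed into a global one that estimates how many values fall in a range query, and a partition is scanned only if its own histogram suggests it may hold matches. The bin layout, data, and function names are assumptions rather than PDC-Query's interface.

```python
def histogram(values, lo, hi, bins):
    """Equal-width histogram over [lo, hi]; out-of-range values are clamped to the end bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return counts


def merge(histograms):
    """Global histogram: element-wise sum of per-partition histograms with identical bins."""
    return [sum(col) for col in zip(*histograms)]


def estimate(counts, lo, hi, q_lo, q_hi):
    """Estimated number of values in [q_lo, q_hi), assuming uniform spread inside each bin."""
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, c in enumerate(counts):
        b_lo, b_hi = lo + i * width, lo + (i + 1) * width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        total += c * overlap / width
    return total


def partitions_to_scan(per_partition, lo, hi, q_lo, q_hi):
    """Indices of partitions whose own histogram suggests possible matches."""
    return [i for i, h in enumerate(per_partition) if estimate(h, lo, hi, q_lo, q_hi) > 0]


# Hypothetical data: one attribute split over two partitions
parts = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
per_partition = [histogram(p, 0, 10, 5) for p in parts]
global_hist = merge(per_partition)
```

Dividing the estimate by the total count gives the query's selectivity, which can guide the choice between, for example, an index lookup and a full scan.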