We present scalable and parallel versions of Lipmaa's computationally-private information retrieval (CPIR) scheme [20], which provides log-squared communication complexity. In the proposed schemes, instead of the binary decision diagrams used in the original CPIR, we employ an octal-tree-based approach in which non-sink nodes have eight child nodes. Using octal trees offers two advantages: i) a serial software implementation of the proposed scheme is faster than the original scheme, and ii) its bandwidth usage is lower than that of the original scheme when the number of items in the data set is moderately high (e.g., 4,096 at the 80-bit security level using the Damgård-Jurik cryptosystem). In addition, we present a highly optimized parallel algorithm for shared-memory multi-core/processor architectures that minimizes the number of synchronization points between the cores. We show that the parallel implementation is about 50 times faster than the serial implementation for a data set with 4,096 items on an eight-core machine. Finally, we propose a hybrid algorithm that scales the CPIR scheme to larger data sets with a small overhead in bandwidth complexity. We demonstrate that the hybrid scheme based on octal trees can yield parallel implementations more than two orders of magnitude faster than serial implementations based on binary trees. Comparison with the original scheme as well as the other schemes in the literature reveals that ours requires the least bandwidth.
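The homomorphic layer of the protocol (Damgård-Jurik ciphertexts and oblivious selection) is beyond a short example, but the octal-tree indexing at the scheme's core is easy to illustrate. The following is a minimal Python sketch under our own naming (octal_digits, select_leaf, and build are illustrative, not from the paper): a database index decomposes into base-8 digits, and each tree level consumes one digit to select one of eight children.

```python
def octal_digits(index: int, depth: int) -> list:
    """Decompose a database index into base-8 digits, one per tree level."""
    digits = []
    for _ in range(depth):
        digits.append(index % 8)
        index //= 8
    return digits[::-1]  # most significant level first

def select_leaf(tree, index: int, depth: int):
    """Walk an octal tree: each level consumes one base-8 digit of the index.

    In the CPIR protocol the digits are never sent in the clear; each one is
    encoded as ciphertexts of a homomorphic scheme such as Damgard-Jurik and
    the server applies the selection obliviously.
    """
    node = tree
    for d in octal_digits(index, depth):
        node = node[d]  # one of eight children
    return node

def build(level_items):
    """Recursively pack a flat item list into an octal tree."""
    if len(level_items) == 1:
        return level_items[0]
    step = len(level_items) // 8
    return [build(level_items[i * step:(i + 1) * step]) for i in range(8)]

# toy example: 4,096 items = 8^4, so the tree has depth 4
tree = build(list(range(4096)))
assert select_leaf(tree, 1234, 4) == 1234
```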
Can we learn from the unknown? Logical data sets of the ternary kind are often found in information systems. They contain unknown as well as true/false values. An unknown value may represent a missing entry (lost or indeterminable) or carry meaning, like a "Don't Know" response in a questionnaire. In this paper, we introduce algorithms for reducing the dimensionality of logical data (categorical data in general) in the context of a new data mining challenge: Ternary Matrix Factorization (TMF). For a ternary data matrix, TMF exploits ternary logic to produce a basis matrix (which holds the major patterns in the data) and a usage matrix (which maps patterns to the original observations). Both matrices are interpretable, and their ternary matrix product approximates the original matrix. TMF has applications in (1) finding targeted structure in ternary data, (2) imputing values through pattern discovery in highly incomplete categorical data sets, and (3) solving instances of its encapsulated Binary Matrix Factorization problem. Our elegant algorithm FasTer (FASt TERnary Matrix Factorization) has linear run-time complexity with respect to the dimensions of the data set and is parameter-robust. A variant of FasTer that exploits useful results from combinatorics provides accuracy bounds for a core part of the algorithm in certain situations. Experiments on synthetic and real-world data sets show that our algorithms outperform state-of-the-art techniques in all three TMF applications with respect to run-time and effectiveness. Finally, convincing speedup and efficiency results for a parallel version of FasTer demonstrate its suitability for weak- and strong-scaling scenarios.
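The paper's exact ternary algebra is not spelled out in the abstract; the sketch below assumes Kleene three-valued logic (false < unknown < true, so AND is min and OR is max) purely to illustrate what a ternary matrix product and its reconstruction error look like. All variable names are illustrative.

```python
import numpy as np

# Kleene order: false = 0.0 < unknown = 0.5 < true = 1.0,
# under which AND is elementwise min and OR is elementwise max.
F, UNK, T = 0.0, 0.5, 1.0

def ternary_product(U, B):
    """(U o B)[i, j] = OR_k (U[i, k] AND B[k, j]) in three-valued logic."""
    return np.minimum(U[:, :, None], B[None, :, :]).max(axis=1)

def mismatch(X, U, B):
    """Fraction of entries where the factorization disagrees with the data."""
    return float(np.mean(ternary_product(U, B) != X))

B = np.array([[T, UNK, F, F],     # basis: 2 patterns over 4 attributes
              [F, F,   T, T]])
U = np.array([[T, F],             # usage: maps 3 observations to patterns
              [F, T],
              [T, T]])
X = ternary_product(U, B)         # the data this factorization reconstructs
print(X)
print(mismatch(X, U, B))          # 0.0 by construction
```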
We introduce a new strategy for coupling the parallel-in-time (parareal) iterative methodology with multiscale integrators. Following the parareal framework, the algorithm computes a low-cost approximation of all slow variables in the system using an appropriate multiscale integrator, which is then refined using parallel fine-scale integrations. Convergence is obtained using an alignment algorithm for fast phase-like variables. The method may be used either to enhance the accuracy and range of applicability of the multiscale method in approximating only the slow variables, or to resolve all the state variables. The numerical scheme does not require that the system be split into slow and fast coordinates. Moreover, the dynamics may involve hidden slow variables, for example, due to resonances. We propose an alignment algorithm for almost-periodic solutions, in which case convergence of the parareal iterations is proved. The applicability of the method is demonstrated in numerical examples.
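The abstract follows the standard parareal structure, which is compact enough to sketch. Below is a generic parareal loop (coarse predictor plus parallelizable fine corrections); the paper's multiscale coarse integrator and phase-alignment step are elided, and f_coarse/f_fine are user-supplied placeholders.

```python
import numpy as np

def parareal(f_coarse, f_fine, u0, t0, t1, n_windows, n_iters):
    """Generic parareal iteration (a sketch; the paper couples this with
    multiscale coarse integrators and an alignment step for fast phases).

    f_coarse(u, ta, tb): cheap propagator over [ta, tb]
    f_fine(u, ta, tb):   expensive, accurate propagator over [ta, tb]
    """
    ts = np.linspace(t0, t1, n_windows + 1)
    U = [u0]
    for n in range(n_windows):          # initial coarse-only sweep
        U.append(f_coarse(U[-1], ts[n], ts[n + 1]))
    for _ in range(n_iters):
        # fine propagations are independent: this is the parallel part
        F = [f_fine(U[n], ts[n], ts[n + 1]) for n in range(n_windows)]
        G_old = [f_coarse(U[n], ts[n], ts[n + 1]) for n in range(n_windows)]
        U_new = [u0]
        for n in range(n_windows):
            # predictor-corrector: U_{n+1} = G(new) + F(old) - G(old)
            U_new.append(f_coarse(U_new[-1], ts[n], ts[n + 1]) + F[n] - G_old[n])
        U = U_new
    return ts, U

# toy usage: u' = -u on [0, 10], Euler coarse vs. many-step Euler "fine"
coarse = lambda u, ta, tb: u + (tb - ta) * (-u)
def fine(u, ta, tb, m=100):
    dt = (tb - ta) / m
    for _ in range(m):
        u = u + dt * (-u)
    return u

ts, U = parareal(coarse, fine, 1.0, 0.0, 10.0, n_windows=10, n_iters=4)
print(U[-1], np.exp(-10.0))             # iterate vs. exact solution
```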
Background: Metagenomics is a genomics research discipline devoted to the study of microbial communities in environmental samples and in human and animal organs and tissues. Sequenced metagenomic samples usually comprise reads from a large number of different bacterial communities and hence tend to result in large file sizes, typically ranging between 1 and 10 GB. This leads to challenges in analyzing, transferring and storing metagenomic data. To overcome these data processing issues, we introduce MetaCRAM, the first de novo, parallelized software suite specialized for FASTA- and FASTQ-format metagenomic read processing and lossless compression. Results: MetaCRAM integrates algorithms for taxonomy identification and assembly, and introduces parallel execution methods; furthermore, it enables genome reference selection and CRAM-based compression. MetaCRAM also uses novel reference-based compression methods designed through extensive studies of integer compression techniques and through fitting of empirical distributions of metagenomic read-reference positions. MetaCRAM is a lossless method compatible with standard CRAM formats, and it allows for fast selection of relevant files in the compressed domain via maintenance of taxonomy information. The performance of MetaCRAM as a stand-alone compression platform was evaluated on various metagenomic samples from the NCBI Sequence Read Archive, showing 2- to 4-fold compression ratio improvements compared to gzip. On average, the compressed file sizes were 2 to 13 percent of the original raw metagenomic file sizes. Conclusions: We described the first architecture for reference-based, lossless compression of metagenomic data. The proposed compression scheme offers significantly improved compression ratios compared to off-the-shelf methods such as zip programs. Furthermore, it enables running different components in parallel, and it provides the user with taxonomic and assembly information generated during execution of the compression pipeline.
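As an illustration of the reference-based idea (not MetaCRAM's actual codes, which are chosen by fitting empirical read-position distributions), the sketch below delta-encodes sorted alignment positions against a reference and packs the resulting small gaps with a variable-length integer code.

```python
def delta_encode(positions):
    """Sorted alignment positions -> small gaps (first gap measured from 0)."""
    prev, gaps = 0, []
    for p in positions:
        gaps.append(p - prev)
        prev = p
    return gaps

def varint(n: int) -> bytes:
    """LEB128-style variable-length encoding: 7 payload bits per byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

positions = [105, 212, 240, 1031, 1032]   # read start positions on a reference
gaps = delta_encode(positions)            # [105, 107, 28, 791, 1]
blob = b"".join(varint(g) for g in gaps)
print(len(blob), "bytes vs", len(positions) * 4, "bytes for raw 32-bit ints")
```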
The objective of this paper is to develop a robust maximum likelihood estimation (MLE) for the stochastic state space model via the expectation maximisation algorithm, to cope with observation outliers. Two types of outliers and their influence are studied: namely, the additive outlier (AO) and the innovative outlier (IO). Due to the sensitivity of the MLE to AO and IO, we propose two techniques for robustifying the MLE: the weighted maximum likelihood estimation (WMLE) and the trimmed maximum likelihood estimation (TMLE). The WMLE is easy to implement, with weights estimated from the data; however, it remains sensitive to IO and to patches of AO outliers. The TMLE, on the other hand, reduces to a combinatorial optimisation problem that is hard to solve, but it is effective against both types of outliers considered here. To overcome this difficulty, we apply a parallel randomised algorithm that has a low computational cost. Monte Carlo simulation results show the efficiency of the proposed algorithms.
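The weighting idea behind a WMLE can be illustrated on a toy location problem; the sketch below uses Huber-type weights in an iteratively reweighted estimate of a Gaussian mean. The paper applies data-driven weights inside EM for state-space models, so this conveys only the flavour of the technique, with illustrative names and tuning constants.

```python
import numpy as np

def weighted_gaussian_mle(y, k=1.345, n_iters=10):
    """Toy weighted MLE for a Gaussian mean: Huber-type weights shrink the
    influence of observations with large standardized residuals. The scale
    is fixed at a robust initial estimate (MAD) for simplicity.
    """
    mu = np.median(y)
    sigma = np.median(np.abs(y - mu)) / 0.6745          # MAD scale estimate
    for _ in range(n_iters):
        r = (y - mu) / sigma                            # standardized residuals
        w = np.where(np.abs(r) <= k, 1.0, k / np.abs(r))  # Huber weights
        mu = np.sum(w * y) / np.sum(w)                  # weighted-likelihood max
    return mu

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(5.0, 1.0, 100), [50.0, 60.0]])  # two AO-style outliers
print(np.mean(y), weighted_gaussian_mle(y))             # plain MLE vs. WMLE
```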
In this paper, we present single- and multi-node optimizations of SU2, a widely used, open-source Computational Fluid Dynamics application, aimed at improving performance and scalability for implicit Reynolds-averaged Navier-Stokes calculations on unstructured grids. Typical industry-standard implementations are currently limited by unstructured memory accesses, variable degrees of parallelism, and the global synchronizations inherent in the traditionally used Krylov linear solvers. We therefore rely on aggressive single-node optimizations, such as hierarchical parallelism, dynamic threading, a compacted memory layout, and vectorization, along with a communication-friendly agglomeration (geometric) linear multigrid solver. Based on results with the well-known ONERA M6 geometry, our single-core and shared-memory optimizations yield a speedup of 2.6X on the latest 14-core Intel(R) Xeon(TM) E5-2697 v3 processor compared to the baseline SU2 implementation with 14 MPI ranks. In multi-node settings, the hybrid OpenMP+MPI multigrid implementation achieves 2X higher parallel efficiency on 256 nodes than conventional Krylov-based (GMRES) methods.
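SU2's agglomeration multigrid operates on unstructured grids, but the V-cycle it relies on has the same shape as the textbook geometric version. The sketch below runs a weighted-Jacobi V-cycle on a model 1-D Poisson problem; it is a generic illustration of the solver structure, not SU2 code.

```python
import numpy as np

def jacobi(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted-Jacobi smoothing for -u'' = f with zero Dirichlet ends."""
    for _ in range(sweeps):
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h):
    """One V-cycle on a grid of size 2^k + 1."""
    if len(u) == 3:
        u[1] = 0.5 * h * h * f[1]        # coarsest level: solve exactly
        return u
    u = jacobi(u, f, h)                  # pre-smooth
    r = residual(u, f, h)
    nc = (len(u) + 1) // 2
    r_c = np.zeros(nc)                   # restrict by full weighting
    r_c[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    e_c = v_cycle(np.zeros(nc), r_c, 2 * h)
    e = np.zeros_like(u)
    e[::2] = e_c                         # prolong: inject coarse points...
    e[1::2] = 0.5 * (e_c[:-1] + e_c[1:]) # ...and interpolate midpoints
    return jacobi(u + e, f, h)           # correct, then post-smooth

n = 129
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)       # exact solution: sin(pi x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
print(np.max(np.abs(u - np.sin(np.pi * x))))  # down to discretization error
```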
Due to the increasing complexity of software systems, there is a growing need for automated and scalable software synthesis and analysis. In the last decade, active research in the formal methods community has brought interesting results and valuable tools. However, there are still challenges to face and hard problems that need to be solved. We briefly outline some recent trends, and review some of the latest achievements, introducing six papers selected from the 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2014).
Observing the interactions between the neurons of a network can reveal important information about how that network processes information. Such observation can be established by analysing the causality between the activities of the different neurons in the network, an analysis known as effective connectivity analysis. However, existing methods for such analysis are either too computationally heavy for daily use or too inaccurate to yield reliable analyses. The Cox method produces reliable analyses, but the computation takes hours on CPUs, making it slow for research use. In this paper, two algorithms are presented that speed up the Cox method by parallelizing the computation on a graphics processing unit (GPU) using the Compute Unified Device Architecture (CUDA) platform. Both algorithms are evaluated with respect to network size and recording duration. The main benefit of the GPU implementations is the reduction in computation time, but another important benefit is that such an implementation requires rethinking the algorithm in ways that differ from the sequential implementation. This rethinking itself opens new optimization possibilities, e.g. by employing OpenCL. Using this accelerated implementation, the Cox method is then applied to an experimental dataset from CRCNS on a personal computer. This should facilitate observations of biological neural network organization that can provide new insights to improve our understanding of memory, learning and intelligence.
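One reason this analysis parallelizes well is that the per-pair influence computations are independent. The sketch below illustrates only that structure: influence_score is a hypothetical placeholder (a windowed coincidence count, not the actual Cox statistic), and CPU worker processes stand in for the CUDA thread blocks used in the paper.

```python
import numpy as np
from itertools import permutations
from multiprocessing import Pool

def influence_score(args):
    """Placeholder for the per-pair statistic (hypothetical; the real method
    fits a Cox-type point-process model to the spike trains)."""
    (i, j), trains = args
    # toy proxy: how often neuron j fires within 5 ms after neuron i
    diffs = trains[j][None, :] - trains[i][:, None]
    return i, j, int(np.sum((diffs > 0) & (diffs < 0.005)))

def all_pairs_parallel(trains, workers=8):
    """Ordered pairs are independent, so they map cleanly onto parallel
    workers (CPU processes here; GPU thread blocks in the paper)."""
    pairs = [((i, j), trains) for i, j in permutations(range(len(trains)), 2)]
    with Pool(workers) as pool:
        return pool.map(influence_score, pairs)

rng = np.random.default_rng(1)
trains = [np.sort(rng.uniform(0, 10, 200)) for _ in range(16)]  # 16 neurons, 10 s

if __name__ == "__main__":
    print(all_pairs_parallel(trains)[:3])
```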
Finite state automata (FSA) are used by many network processing applications to match complex sets of regular expressions in network packets. To make FSA-based matching possible even at the ever-increasing speed of modern networks, multi-striding has been introduced. This technique increases input parallelism by transforming the classical FSA, which consumes input byte by byte, into an equivalent one that consumes input in larger units. However, the algorithms used today for this transformation are so complex that they often prove infeasible for large and complex rule sets. This paper presents a set of new algorithms that extend the applicability of multi-striding to complex rule sets. These algorithms can transform nondeterministic finite automata (NFA) into their multi-stride form with reduced memory and time requirements. Moreover, they exploit the massive parallelism of graphics processing units for NFA-based matching. The final result is a boost in the overall processing speed of typical regex-based packet processing applications, with a speedup of almost one order of magnitude over the current state-of-the-art algorithms.
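Multi-striding is easiest to see on a DFA (the paper works with NFAs and adds techniques to keep the transformed automata small): the 2-stride transition function simply composes two steps of the original one, squaring the effective alphabet. Odd-length inputs need a padding convention, which this sketch omits.

```python
def two_stride(delta, states, alphabet):
    """Turn a DFA transition function that consumes one symbol at a time
    into an equivalent one that consumes symbol pairs (2-striding).

    delta: dict mapping (state, symbol) -> state
    """
    delta2 = {}
    for s in states:
        for a in alphabet:
            for b in alphabet:
                # one 2-stride step = two 1-stride steps
                delta2[(s, (a, b))] = delta[(delta[(s, a)], b)]
    return delta2

# toy DFA over {0, 1} that tracks input-length parity
states, alphabet = [0, 1], [0, 1]
delta = {(s, c): 1 - s for s in states for c in alphabet}
delta2 = two_stride(delta, states, alphabet)
assert delta2[(0, (1, 0))] == 0   # two symbols flip parity twice
```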
In this paper, we analyze and extend mesh-free algorithms for three-dimensional data transfer problems in partitioned multiphysics simulations. We first provide a direct comparison between a mesh-based weighted residual method using the common-refinement scheme and two mesh-free algorithms leveraging compactly supported radial basis functions: one using spline interpolation and one using a moving least square reconstruction. Through this comparison we assess both the conservation and the accuracy of the data transfer obtained from each of the methods. We do so for a range of geometries, with and without curvature and sharp features, and for functions of varying smoothness and gradient. Our results show that the mesh-based and mesh-free algorithms are complementary, with cases where each performs better than the other. We then focus on the mesh-free methods by developing a set of algorithms to parallelize them based on sparse linear algebra techniques. This includes a discussion of fast parallel radius searching in point clouds and a restructuring of the interpolation algorithms to leverage data structures and linear algebra services designed for large distributed computing environments. The scalability of our new algorithms is demonstrated on a leadership-class computing facility using a set of basic scaling studies. These studies show that, for problems with reasonable load balance, our new algorithms for both spline interpolation and moving least square reconstruction achieve both strong and weak scalability using more than 100,000 MPI processes, with billions of degrees of freedom in the data transfer operation.
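The spline-interpolation transfer can be sketched serially with a compactly supported Wendland kernel; the paper's contribution is distributing the radius search and the solve with sparse linear algebra, which this dense toy version deliberately ignores. Names and the radius value here are illustrative.

```python
import numpy as np

def wendland_c2(r):
    """Wendland C2 compactly supported kernel: (1 - r)^4 (4r + 1) for r < 1."""
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def rbf_transfer(src_pts, src_vals, dst_pts, radius):
    """Spline-interpolation data transfer between point clouds (serial sketch;
    the compact support makes the matrices sparse in a scalable version).
    """
    d_ss = np.linalg.norm(src_pts[:, None, :] - src_pts[None, :, :], axis=2)
    A = wendland_c2(d_ss / radius)            # SPD interpolation matrix
    w = np.linalg.solve(A, src_vals)          # fit RBF weights on the source
    d_ds = np.linalg.norm(dst_pts[:, None, :] - src_pts[None, :, :], axis=2)
    return wendland_c2(d_ds / radius) @ w     # evaluate at the targets

rng = np.random.default_rng(2)
src = rng.uniform(0, 1, (200, 3))             # source cloud (e.g., fluid side)
dst = rng.uniform(0, 1, (50, 3))              # target cloud (e.g., solid side)
f = lambda p: np.sin(np.pi * p[:, 0]) * p[:, 1]
vals = rbf_transfer(src, f(src), dst, radius=0.4)
print(np.max(np.abs(vals - f(dst))))          # pointwise transfer error
```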