Finding motifs in biological networks has recently attracted much attention. Motifs can be considered as subgraphs that occur in a particular network at significantly higher frequencies than in random networks. Th...
This dissertation demonstrates the possibility of obtaining strong speedups for a variety of parallel applications versus the best serial and parallel implementations on commodity platforms. These results were obtained using the PRAM-inspired Explicit Multi-Threading (XMT) many-core computing platform, which is designed to efficiently support execution of both serial and parallel code and switching between the two. Biconnectivity: For finding the biconnected components of a graph, we demonstrate speedups of 9x to 33x on XMT relative to the best serial algorithm using a relatively modest silicon budget. Further evidence suggests that speedups of 21x to 48x are possible. For graph connectivity, we demonstrate that XMT outperforms two contemporary NVIDIA GPUs of similar or greater silicon area. Prior studies of parallel biconnectivity algorithms achieved at most a 4x speedup, but we could not find biconnectivity code for GPUs to compare against. Triconnectivity: We present a parallel solution to the problem of determining the triconnected components of an undirected graph. We obtain significant speedups on XMT over the only published optimal (linear-time) serial implementation of a triconnected components algorithm running on a modern CPU. To our knowledge, no other parallel implementation of a triconnected components algorithm has been published for any platform. Burrows-Wheeler compression: We present novel work-optimal parallel algorithms for Burrows-Wheeler compression and decompression of strings over a constant alphabet, together with their empirical evaluation. To validate these theoretical algorithms, we implement them on XMT and show speedups of up to 25x for compression, and 13x for decompression, versus bzip2, the de facto standard implementation of Burrows-Wheeler compression. Fast Fourier transform (FFT): Using FFT as an example, we examine the impact that adoption of some enabling technologies, including silicon photonics, would have on the perfo
Read alignment (sequence alignment) is one of the most basic and time-consuming problems in bioinformatics. In this paper, a CPU-GPU parallel long-read alignment method is studied to solve this problem. A lightwei...
The residue number system (RNS) provides parallel, carry-free, and high-speed arithmetic and is therefore a good tool for high-performance computing. However, operations such as magnitude comparison, sign computation, overflow detection, scaling, and division are difficult to perform in RNS, since it is problematic to determine the magnitude of an RNS number. In order to resolve this problem, we propose to compute the interval evaluation of the fractional representation of an RNS number in floating-point arithmetic of limited precision. No matter what the size of the moduli set and dynamic range, only small arithmetic operations are required, and most of the computations are performed in parallel with threads, which allows for efficient implementation of our method on many general-purpose computing platforms. Using this method, we propose new algorithms for magnitude comparison and general division in RNS and implement them for GPUs using the CUDA platform. We evaluate the performance of our algorithms on an NVIDIA GTX 1080 GPU using sets of 4 to 256 RNS moduli that provide dynamic ranges from 64 to 4096 bits. Experimental results show that the proposed new algorithms are efficient for large moduli sets and clearly outperform the existing RNS magnitude comparison and division algorithms in terms of execution time.
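The idea of bounding the fractional representation X/M with limited-precision floating point can be sketched as follows. This is a minimal illustration, not the authors' CUDA implementation: the moduli set, the function names, and the exact big-integer fallback used when the intervals overlap are all assumptions made for the example. It relies on the Chinese Remainder Theorem identity X/M = frac(sum_i |x_i * w_i|_{m_i} / m_i), where w_i is the inverse of M/m_i modulo m_i.

```python
import math

# Hypothetical, pairwise-coprime moduli set chosen only for illustration.
MODULI = (7, 11, 13, 15)
M = math.prod(MODULI)
# CRT weights w_i = (M / m_i)^(-1) mod m_i.
WEIGHTS = tuple(pow(M // m, -1, m) for m in MODULI)


def to_rns(x):
    """Convert a non-negative integer below M to its residues."""
    return tuple(x % m for m in MODULI)


def from_rns(residues):
    """Exact CRT reconstruction; used here only as a slow fallback."""
    return sum((M // m) * ((x * w) % m)
               for x, w, m in zip(residues, WEIGHTS, MODULI)) % M


def fractional_interval(residues):
    """Return floats (lo, hi) such that lo <= X/M <= hi."""
    lo = hi = 0.0
    for x, w, m in zip(residues, WEIGHTS, MODULI):
        k = (x * w) % m          # small-integer arithmetic per modulus
        if k == 0:
            continue             # an exact zero term adds nothing
        q = k / m                # correctly rounded quotient
        # Widen by one ulp after every operation to keep the bounds safe.
        lo = math.nextafter(lo + math.nextafter(q, -math.inf), -math.inf)
        hi = math.nextafter(hi + math.nextafter(q, math.inf), math.inf)
    whole = math.floor(lo)
    if whole != math.floor(hi):
        return 0.0, 1.0          # straddles an integer: no information
    return lo - whole, hi - whole


def compare(a, b):
    """Return -1, 0 or 1 according to the magnitudes of two RNS numbers."""
    a_lo, a_hi = fractional_interval(a)
    b_lo, b_hi = fractional_interval(b)
    if a_hi < b_lo:
        return -1
    if a_lo > b_hi:
        return 1
    # Intervals overlap: fall back to exact reconstruction (assumption).
    xa, xb = from_rns(a), from_rns(b)
    return (xa > xb) - (xa < xb)
```

Each loop iteration touches only one modulus, which is why this kind of computation maps naturally onto one thread per residue; the sequential loop here stands in for that parallel reduction.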
In an ever more data-centric economy, machine learning models have risen in importance. With the large amounts of data companies collect, they are able to develop highly accurate models to predict the behaviours of their customers. It is thus important to safeguard the data used to build these models to prevent competitors from mimicking their services. In addition, as techniques of this type find their way into areas that need to deal with more sensitive information, like the medical industry, the privacy of the data that needs to be classified also has to be ensured. Herein, this topic is addressed by homomorphically evaluating Support Vector Machine (SVM) models, in a way that guarantees that a client learns nothing about the model except for the classification of his data, and that the service provider learns nothing about the data. Whereas, previously, Fully Homomorphic Encryption (FHE) has mostly focused on either bit-wise or value-wise computations, SVMs present an additional challenge since they combine both: during an initial phase a kernel function is evaluated that makes use of real arithmetic, and during a second phase the sign bit has to be extracted. Novel techniques are herein proposed that allow for speedups of up to 2.7 and 6.6 for the evaluation of polynomials and the determination of sign, respectively, in comparison to the state of the art. Finally, it is shown that the proposed techniques do not deteriorate the classification accuracy of the SVM models.
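The two phases of SVM classification mentioned above can be made concrete with a plaintext sketch. Nothing here is encrypted, and the support vectors, coefficients, and polynomial kernel parameters are hypothetical values chosen for the example; the point is only to show where the real-valued polynomial arithmetic ends and the sign extraction begins.

```python
# Hypothetical toy model: two support vectors with a degree-2 polynomial kernel.
SUPPORT_VECTORS = [(1.0, 1.0), (-1.0, -1.0)]
DUAL_COEFS = [0.5, -0.5]   # alpha_i * y_i for each support vector
BIAS = 0.0
DEGREE = 2
COEF0 = 1.0


def poly_kernel(u, v):
    """Polynomial kernel (u . v + COEF0) ** DEGREE, using real arithmetic."""
    dot = sum(a * b for a, b in zip(u, v))
    return (dot + COEF0) ** DEGREE


def decision_value(x):
    """Phase 1: value-wise computation, a polynomial in the input."""
    return BIAS + sum(c * poly_kernel(sv, x)
                      for c, sv in zip(DUAL_COEFS, SUPPORT_VECTORS))


def classify(x):
    """Phase 2: bit-wise step, keep only the sign of the decision value."""
    return 1 if decision_value(x) >= 0 else -1


print(classify((2.0, 1.0)), classify((-1.0, -0.5)))
```

In a homomorphic setting the first phase suits arithmetic over encrypted values, while the second needs a comparison, which is why the abstract treats the two as separate problems.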
A treasure is placed in one of M boxes according to a known distribution and k searchers are searching for it in parallel during T rounds. How can one incentivize selfish players so that the probability that at least one player finds the treasure is maximized? We focus on congestion policies C(l) specifying the reward a player receives being one of the l players that (simultaneously) find the treasure first. We prove that the exclusive policy, in which C(1) = 1 and C(l) = 0 for l > 1, yields a price of anarchy of $(1 - (1 - 1/k)^k)^{-1}$, which is the best among all symmetric reward policies. We advocate the use of symmetric equilibria, and show that besides being fair, they are highly robust to crashes of players. Indeed, in many cases, if some small fraction of players crash, symmetric equilibria remain efficient in terms of their group performance while also serving as approximate equilibria. © 2020 Elsevier Inc. All rights reserved.
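To get a feel for the price-of-anarchy bound of the exclusive policy, the expression (1 - (1 - 1/k)^k)^(-1) can be evaluated for a few values of k. The function name below is an assumption made for this sketch; the limit e/(e - 1) follows from the standard fact that (1 - 1/k)^k tends to 1/e.

```python
import math


def exclusive_policy_poa(k: int) -> float:
    """Evaluate (1 - (1 - 1/k)**k) ** -1 for k >= 1 searchers."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 / (1.0 - (1.0 - 1.0 / k) ** k)


# As k grows, (1 - 1/k)**k approaches 1/e, so the bound approaches e / (e - 1).
LIMIT = math.e / (math.e - 1.0)

for k in (1, 2, 5, 10, 100):
    print(k, round(exclusive_policy_poa(k), 4))
print("limit", round(LIMIT, 4))
```

The bound equals 1 for a single searcher and increases with k while staying below the limit.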
We present a numerical scheme for solving an inverse problem for parameter estimation in tumor growth models for glioblastomas, a form of aggressive primary brain tumor. The growth model is a reaction-diffusion partial differential equation (PDE) for the tumor concentration. We use a PDE-constrained optimization formulation for the inverse problem. The unknown parameters are the reaction coefficient (proliferation), the diffusion coefficient (infiltration), and the initial condition field for the tumor PDE. Segmentations of magnetic resonance imaging (MRI) scans drive the inverse problem, where segmented tumor regions serve as partial observations of the tumor concentration. As in most cases in clinical practice, we use data from a single time snapshot. Moreover, the precise time relative to the initiation of the tumor is unknown, which poses an additional difficulty for inversion. We perform a frozen-coefficient spectral analysis and show that the inverse problem is severely ill-posed. We introduce a biophysically motivated regularization on the structure and magnitude of the tumor initial condition. In particular, we assume that the tumor starts at a few locations (enforced with a sparsity constraint on the initial condition of the tumor) and that the initial condition magnitude in the maximum norm is equal to one. We solve the resulting optimization problem using an inexact quasi-Newton method combined with a compressive sampling algorithm for the sparsity constraint. Our implementation uses PETSc and AccFFT libraries. We conduct numerical experiments on synthetic and clinical images to highlight the improved performance of our solver over a previously existing solver that uses standard two-norm regularization for the calibration parameters. The existing solver is unable to localize the initial condition. Our new solver can localize the initial condition and recover infiltration and proliferation. In clinical datasets (for which the ground truth is unknown), our sol
The Quality of Service (QoS) in Mobile Edge Computing (MEC) systems depends significantly on the application offloading and placement decisions. Due to the movement of users in MEC networks, an optimal application placement might turn into the least efficient placement in a few minutes. Thus, it is crucial to take the dynamics of the system into account when designing application placement mechanisms. On the other hand, energy consumption of servers is a significant component of the cost of services in MEC systems and must also be considered in the design of the mechanisms. In this article, we model the problem of energy-aware application placement in edge computing systems as a multi-stage stochastic program. The objective is to maximize the QoS of the system while taking into account the limited energy budget of the edge servers. To solve the problem, we design a novel parallel Sample Average Approximation (SAA) algorithm. We conduct an extensive experimental analysis to evaluate the performance of the proposed algorithm using real-world trace data.
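The general idea behind Sample Average Approximation can be illustrated with a toy single-stage problem: replace the expected cost by its average over sampled scenarios and pick the decision that minimizes that average. This sketch is not the article's multi-stage algorithm; the cost model, the penalty weights, the candidate capacities, and the thread-based parallel evaluation are all assumptions made for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def cost(capacity, demand):
    """Hypothetical cost: idle capacity wastes energy, unmet demand hurts QoS."""
    unused = max(capacity - demand, 0)
    unmet = max(demand - capacity, 0)
    return 1.0 * unused + 3.0 * unmet


def sample_average(capacity, scenarios):
    """Approximate the expected cost by the mean over sampled scenarios."""
    return fmean(cost(capacity, d) for d in scenarios)


def saa_solve(candidates, scenarios, workers=4):
    """Evaluate every candidate in parallel and return the best (value, cost)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        averages = list(pool.map(lambda c: sample_average(c, scenarios),
                                 candidates))
    best = min(range(len(candidates)), key=averages.__getitem__)
    return candidates[best], averages[best]


rng = random.Random(0)                                   # fixed seed for repeatability
scenarios = [rng.randint(0, 10) for _ in range(1000)]    # sampled demands
best_capacity, best_cost = saa_solve(list(range(11)), scenarios)
print(best_capacity, best_cost)
```

Because each candidate is scored independently against the same scenario set, the evaluations are trivially parallel; threads are used here only for brevity, and process- or GPU-level parallelism would be the natural choice for heavier workloads.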
The convex hull problem has practical applications in mesh generation, file searching, cluster analysis, collision detection, image processing, statistics, etc. In this paper, we present a novel pruning-based approach...
Multi-join queries are important operations in data management systems and data integration systems, and their efficiency has attracted the attention of researchers. In recent years, graphics processing units (GPUs) have developed rapidly and become a powerful tool for parallel computing, opening a new avenue for multi-join query optimization. This paper studies the use of GPU technology to optimize multi-join queries and focuses on two points: 1) a multi-phase optimization strategy and 2) optimization methods for each phase. For the first point, we discuss a two-phase optimization strategy on the GPU and prove the effectiveness of this strategy. For the second point, we provide a method for constructing a minimum-cost join tree on the GPU, parallel execution methods for intra-join and inter-join operations on the GPU, and a strategy for scheduling multiple joins to execute in parallel on the GPU. Experimental results show that the multi-join query optimization proposed in this paper improves the efficiency of multi-join queries, especially in the case of high load and complex join queries, achieving higher throughput than previous optimization algorithms.
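What a minimum-cost join tree means can be shown with a small CPU-side sketch that uses classic dynamic programming over subsets of relations. This is a swapped-in textbook technique, not the paper's GPU method; the relation names, cardinalities, selectivities, and the cost model (sum of intermediate result sizes) are hypothetical.

```python
from itertools import combinations

# Hypothetical statistics: relation cardinalities and pairwise join selectivities.
CARD = {"A": 1000, "B": 100, "C": 10}
SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.1}  # missing pair = cross product


def result_size(rels):
    """Estimated size of joining all relations in rels."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for a, b in combinations(sorted(rels), 2):
        size *= SEL.get(frozenset((a, b)), 1.0)
    return size


def best_join_tree(relations):
    """Return (cost, tree) minimizing the sum of intermediate result sizes."""
    best = {frozenset([r]): (0.0, r) for r in relations}
    for n in range(2, len(relations) + 1):
        for subset in combinations(sorted(relations), n):
            s = frozenset(subset)
            out = result_size(s)
            candidates = []
            # Try every split of the subset into a left and a right subtree.
            for k in range(1, n):
                for left in combinations(subset, k):
                    l = frozenset(left)
                    r = s - l
                    total = best[l][0] + best[r][0] + out
                    candidates.append((total, (best[l][1], best[r][1])))
            best[s] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]


cost, tree = best_join_tree(["A", "B", "C"])
print(cost, tree)
```

With these numbers the search joins B and C first, because that intermediate result is the smallest. The subsets of a given size are independent of one another, which is the kind of structure a parallel implementation could exploit.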