A mathematical model is constructed that describes the dynamics of a fractional-differential convection-diffusion process, locally nonequilibrium in time, for soluble substances under steady plane-vertical filtration with a free boundary. The corresponding boundary-value problem is formulated and a technique for deriving its approximate solution is outlined. Parallel algorithms for computation on cluster systems are developed, and the results of testing the parallel algorithms on GPUs and of numerical experiments simulating the dynamics of the migration process under study are presented.
ISBN: 9781509061839 (print)
Considerable effort is currently being spent designing neuromorphic hardware for addressing challenging problems in a variety of pattern-matching applications. These neuromorphic systems offer low-power architectures with intrinsically parallel and simple spiking neuron processing elements. Unfortunately, these new hardware architectures have been largely developed without a clear justification for using spiking neurons to compute quantities for problems of interest. Specifically, the use of spiking for encoding information in time has not been explored theoretically with complexity analysis to examine the operating conditions under which neuromorphic computing provides a computational advantage (time, space, power, etc.). In this paper, we present and formally analyze the use of temporal coding in a neural-inspired algorithm for optimization-based computation in neural spiking architectures.
ISBN: 9781510838222 (print)
An assortative edge switch is an operation on a labeled network, where two edges are randomly selected and the end vertices are swapped with each other if the labels of the end vertices of the edges remain invariant. Assortative edge switching has important applications in studying the mixing pattern and dynamic behavior of social networks, modeling and analyzing dynamic networks, and generating random networks. In this paper, we present an efficient sequential algorithm and a distributed-memory parallel algorithm for assortative edge switching. To our knowledge, they are the first efficient algorithms for this problem. The dependencies among successive assortative edge switch operations, the requirements of keeping the assortative coefficient invariant and the network simple, and the need to balance the computation load among the processors pose significant challenges in designing a parallel algorithm. Our parallel algorithm achieves a speedup of 68 to 772 with 1024 processors for a wide variety of networks.
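A single switch operation as defined in this abstract can be sketched as follows. This is a minimal sequential illustration, not the authors' algorithm: the edge-set representation and the names `norm`, `try_switch`, and `label_pair_counts` are assumptions made for the example, as is the specific acceptance rule (the two swapped endpoints must carry the same label).

```python
from collections import Counter


def norm(u, v):
    """Store an undirected edge as an ordered tuple (hypothetical helper)."""
    return (u, v) if u <= v else (v, u)


def label_pair_counts(edges, label):
    """Count edges per unordered pair of endpoint labels."""
    return Counter(tuple(sorted((label[u], label[v]))) for u, v in edges)


def try_switch(edges, label, e1, e2):
    """Try to replace (u1, v1), (u2, v2) with (u1, v2), (u2, v1).

    The switch is accepted only if the swapped endpoints share a label
    (so the label pairs on the edges are unchanged) and the network
    stays simple (no self-loops, no parallel edges).
    Returns True if the switch was performed; `edges` is updated in place.
    """
    (u1, v1), (u2, v2) = e1, e2
    if label[v1] != label[v2]:
        return False                      # label pairs would change
    if u1 == v2 or u2 == v1:
        return False                      # would create a self-loop
    new1, new2 = norm(u1, v2), norm(u2, v1)
    if new1 in edges or new2 in edges:
        return False                      # would create a parallel edge
    edges.discard(norm(u1, v1))
    edges.discard(norm(u2, v2))
    edges.add(new1)
    edges.add(new2)
    return True


# Small example: two edges between an 'a'-labeled and a 'b'-labeled vertex.
labels = {0: "a", 1: "b", 2: "a", 3: "b"}
network = {(0, 1), (2, 3)}
before = label_pair_counts(network, labels)
switched = try_switch(network, labels, (0, 1), (2, 3))
after = label_pair_counts(network, labels)
```

In this sketch every vertex keeps its degree and the count of edges per label pair is unchanged, which is what keeps the mixing pattern fixed. The dependencies between successive switches mentioned in the abstract arise because each accepted switch changes the edge set that later validity checks read.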
We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine-tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive experiments show that we obtain speedups of up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared-memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads.
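The idea of scaling the thread count with node degree can be illustrated with a small host-side sketch. The abstract does not give the actual mapping, so the power-of-two rule, the cap of 128 threads, and the function name `threads_for_degree` are all assumptions chosen for illustration.

```python
def threads_for_degree(degree, max_threads=128):
    """Pick a thread-group size for a node (hypothetical rule).

    Returns the smallest power of two that is >= degree,
    capped at max_threads, and at least 1.
    """
    threads = 1
    while threads < degree and threads < max_threads:
        threads *= 2
    return threads


# Group nodes by the thread-group size they would receive, so that
# each group could be processed by one kernel launch configuration.
degrees = {"n0": 1, "n1": 3, "n2": 5, "n3": 1000}
groups = {}
for node, deg in degrees.items():
    groups.setdefault(threads_for_degree(deg), []).append(node)
```

Under this assumed rule, low-degree nodes do not leave most of a thread group idle, while high-degree nodes have their edges spread over many threads.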
To improve the efficiency of sparse matrix multiplication on a traditional cluster supercomputer, different levels of parallelism must be taken into account when programming. To work around this difficulty, the dataflow computing model with a dynamically formed context, together with the architecture of the parallel dataflow computing system, can be used. The article describes the implementation of a parallel sparse matrix multiplication algorithm on the parallel dataflow computing system. Experiments performed on an emulator of the system demonstrate the promise of the dataflow computing model for this class of tasks.
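For orientation, the underlying operation can be written as a short row-wise sparse product. This is a plain sequential sketch, not the dataflow implementation the article describes: the dict-of-dicts storage and the name `spgemm` are assumptions made for the example.

```python
def spgemm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}.

    Each output row depends only on one row of `a` and the rows of `b`
    that it references, so output rows are independent of one another.
    """
    c = {}
    for i, a_row in a.items():
        acc = {}                                  # accumulator for row i of C
        for k, a_ik in a_row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            c[i] = acc
    return c


# A is 2x3 and B is 3x2; only nonzero entries are stored.
A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
C = spgemm(A, B)
```

The independence of output rows is one level of parallelism; the individual multiply-accumulate steps inside a row form a finer level.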
Data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize because they use dictionaries that are updated sequentially. The main contribution of this paper is a new lossless data compression method that we call adaptive loss-less (ALL) data compression. It is designed so that the compression ratio is moderate but decompression can be performed very efficiently on a graphics processing unit (GPU). This suits applications such as deep learning training, in which compressed archived data are decompressed many times. To show the potential of the ALL data compression method, we evaluated its running time on five images and five text data sets and compared ALL with previously published lossless data compression methods implemented on the GPU: Gompresso, CULZSS, and LZW. The compression ratio of ALL is better than that of the others for eight of these 10 data sets. Also, our GPU implementation of ALL decompression on a GeForce GTX 1080 GPU runs 84.0 to 231 times faster than the CPU implementation on a Core i7-4790 CPU. Further, it runs 1.22 to 23.5 times faster than Gompresso, CULZSS, and LZW running on the same GPU.
Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clusteri...
Information diffusion and dissemination define critical dynamics observed in large complex networks. The underlying information propagation topology, however, is often hidden or incomplete because sources are not explicitly cited. We propose a scalable parallel algorithm that derives node embeddings in order to better understand information dissemination patterns and to predict emergent cascades of viral events in online media. Unlike previous works, which concentrate on modeling the links of information propagation, our algorithm infers the topic-specific output influence and the input selectivity of nodes. The parallel algorithm iteratively merges local node embeddings in particular communities to obtain the globally optimal results, so that the processing of cascades can be significantly accelerated. Based on the obtained latent representation of nodes, the emergent cascades of viral news events in online media can be predicted with 80% accuracy at an early stage. Experimental results show that our parallel inference algorithm achieves a 10-fold acceleration and requires low communication overhead, while the accuracy of the cascade size prediction is preserved.
We present the SIBIA (Scalable Integrated Biophysics-based Image Analysis) framework for joint image registration and biophysical inversion and we apply it to analyse MR images of glioblastomas (primary brain tumors). Given the segmentation of a normal brain MRI and the segmentation of a cancer patient MRI, we wish to determine tumor growth parameters and a registration map so that if we "grow a tumor" (using our tumor model) in the normal segmented image and then register it to the segmented patient image, the registration mismatch is as small as possible. We call this "the coupled problem" because it two-way couples the biophysical inversion and registration problems. In the image registration step we solve a large-deformation diffeomorphic registration problem parameterized by an Eulerian velocity field. In the biophysical inversion step we estimate parameters in a reaction-diffusion tumor growth model that is formulated as a partial differential equation (PDE). In SIBIA, we couple these two steps in an iterative manner. We first presented the components of SIBIA in "Gholami et al., Framework for Scalable Biophysics-based Image Analysis, IEEE/ACM Proceedings of the SC2017", in which we derived parallel distributed-memory algorithms and software modules for the decoupled registration and biophysical inverse problems. In this paper, our contributions are the introduction of a PDE-constrained optimization formulation of the coupled problem, the derivation of the optimality conditions, and the derivation of a Picard iterative scheme for the solution of the coupled problem. In addition, we perform several tests to experimentally assess the performance of our method on synthetic and clinical datasets. We demonstrate the convergence of the SIBIA optimization solver in different usage scenarios. We demonstrate that using SIBIA, we can accurately solve the coupled problem in three dimensions (256^3 resolution) in a few minutes using 11 dual-x86 *** Codes 49K20, 49
A distributed algorithm is described for finding a common fixed point of a family of m > 1 nonlinear maps M_i: R^n → R^n, assuming that each map is a paracontraction and that such a common fixed point exists. The comm...