With the development of intelligent traffic systems, high-definition cameras have been deployed along urban roads. These devices transmit real-time captured images to data centers for multi-purpose use, but these brin...
ISBN (digital): 9781728133201
ISBN (print): 9781728133218
Detection of strongly connected components (SCC) on the GPU has become a fundamental operation for accelerating graph computing. Existing SCC detection methods on multiple GPUs introduce massive unnecessary data transfers between the GPUs. In this paper, we propose a novel distributed SCC detection approach using multiple GPUs plus a CPU. Our approach includes three key ideas: (1) segmentation and labeling over large-scale datasets; (2) collecting and merging the segmented SCCs; and (3) assigning running tasks over multiple GPUs and the CPU. We implement our approach under a hybrid distributed architecture with multiple GPUs plus a CPU. Our approach achieves device-level optimization and is compatible with state-of-the-art algorithms. We conduct extensive theoretical and experimental analysis to demonstrate the efficiency and accuracy of our approach. The experimental results show that our approach achieves 11.2×, 1.2×, and 1.2× speedups for SCC detection on an NVIDIA K80 compared with Tarjan's, FB-Trim, and FB-Hybrid algorithms, respectively.
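To make the detection scheme concrete, here is a minimal sequential Python sketch of the forward-backward (FB) partitioning idea that GPU detectors such as FB-Trim build on; the example graph, the pivot choice, and the sequential work list are simplifying assumptions, not the paper's multi-GPU implementation.

```python
# A minimal sequential sketch of forward-backward (FB) SCC detection;
# GPU methods parallelize the reachability sweeps and the partitions.
from collections import defaultdict

def reachable(adj, start, nodes):
    """BFS restricted to `nodes`; returns the set reachable from `start`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def fb_scc(edges):
    adj, radj, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
        nodes.update((u, v))
    sccs, work = [], [nodes]
    while work:
        sub = work.pop()
        if not sub:
            continue
        pivot = next(iter(sub))
        fwd = reachable(adj, pivot, sub)    # forward closure of the pivot
        bwd = reachable(radj, pivot, sub)   # backward closure of the pivot
        sccs.append(fwd & bwd)              # their intersection is one SCC
        # The three remainders are independent; a multi-GPU implementation
        # processes them in parallel instead of via a work list.
        work.extend([fwd - bwd, bwd - fwd, sub - fwd - bwd])
    return sccs

print(fb_scc([(0, 1), (1, 2), (2, 0), (2, 3)]))  # e.g. [{0, 1, 2}, {3}]
```

The FB-Trim variant named in the abstract additionally peels off trivial single-node SCCs (nodes with no in- or out-edges within the current subgraph) before each pivot step, which removes much of the work up front.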
Background: Bioinformatics research has entered an era of big data. Mining the potential value in biological big data is of vital significance for scientific research and health care. Deep learning, a new class of machine learning algorithms built on big data and high-performance distributed parallel computing, shows excellent performance in biological big data processing. Objective: To provide a valuable reference for researchers who want to use deep learning in their studies of large biological data processing. Methods: This paper introduces the new models of data storage and computational facilities for big data analysis. Then, the application of deep learning in three areas, namely biological omics data processing, biological image processing, and biomedical diagnosis, is summarized. Aiming at the problem of large biological data processing, accelerated methods for deep learning models are described. Conclusion: The paper summarizes the new storage modes, the existing methods and platforms for biological big data processing, and the progress and challenges of applying deep learning to biological big data processing.
ISBN (digital): 9781728129037
ISBN (print): 9781728129044
Relationships in online social networks often imply social connections in real life. An accurate understanding of relationship types benefits many applications, e.g., social advertising and recommendation. Some recent attempts have been made to classify user relationships into predefined types with the help of pre-labeled relationships or abundant interaction features on relationships. Unfortunately, both relationship feature data and label data are very sparse in real social platforms like WeChat, rendering existing methods inapplicable. In this paper, we present an in-depth analysis of WeChat relationships to identify the major challenges for the relationship classification task. To tackle these challenges, we propose a Local Community-based Edge Classification (LoCEC) framework that classifies user relationships in a social network into real-world social connection types. LoCEC enforces three-phase processing, namely local community detection, community classification, and relationship classification, to address the sparsity issue of relationship features and relationship labels. Moreover, LoCEC is designed to handle large-scale networks by allowing parallel and distributed processing. We conduct extensive experiments on the real-world WeChat network with hundreds of billions of edges to validate the effectiveness and efficiency of LoCEC.
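As a rough illustration of the three-phase flow, the toy Python sketch below detects local communities as connected components of each user's ego network, classifies a community by majority vote over its sparse edge labels, and propagates the community type to unlabeled edges; all three components are placeholder assumptions, not LoCEC's actual models.

```python
# A toy sketch of the three-phase LoCEC flow; the community detector and
# the majority-vote classifier are simplifying stand-ins.
from collections import defaultdict
from itertools import combinations

def ego_communities(adj, u):
    """Phase 1: connected components of u's neighborhood graph."""
    left, comms = set(adj[u]), []
    while left:
        seed = left.pop()
        comp, stack = {seed}, [seed]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y in left:
                    left.discard(y)
                    comp.add(y)
                    stack.append(y)
        comms.append(comp | {u})
    return comms

def classify_edges(edges, labeled):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    predicted = dict(labeled)
    for u in list(adj):
        for comm in ego_communities(adj, u):
            # Phase 2: vote with whatever sparse labels the community has.
            votes = [labeled[e] for e in map(frozenset, combinations(comm, 2))
                     if e in labeled]
            if not votes:
                continue
            label = max(set(votes), key=votes.count)
            # Phase 3: push the community's type onto its unlabeled edges.
            for a, b in combinations(comm, 2):
                if b in adj[a]:
                    predicted.setdefault(frozenset((a, b)), label)
    return predicted

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
labeled = {frozenset((1, 2)): "family"}
print(classify_edges(edges, labeled))  # edges (1,3) and (2,3) inherit "family"
```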
We propose an approach to image segmentation that views it as a pixel classification problem using simple features defined over the local neighborhood. We use a support vector machine for pixel classification, making the approach automatically adaptable to a large number of image segmentation applications. Since our approach utilizes only local information for classification, both training and application of the image segmentor can be done on a distributed computing platform. This makes our approach scalable to larger images than the ones tested. This article describes the methodology in detail and tests its efficacy against 5 other comparable segmentation methods on 2 well-known image segmentation databases. We present the results together with the analysis that supports the following conclusions: (i) the approach is as effective as, and often better than, its studied competitors; (ii) the approach suffers from very little overfitting and hence generalizes well to unseen images; (iii) the trained image segmentation program can be run on a distributed computing environment, resulting in linear scalability characteristics. The overall message of this paper is that using a strong classifier with simple pixel-centered features gives as good or better segmentation results than some sophisticated competitors and does so in a computationally scalable fashion.
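The following short sketch illustrates the segmentation-as-pixel-classification recipe, assuming scikit-learn's SVC, a synthetic two-region image, and 3×3 neighborhood patches as features; the paper's feature set and datasets are not reproduced here.

```python
# Segmentation as per-pixel classification with an SVM over local
# neighborhood features (a sketch on synthetic data).
import numpy as np
from sklearn.svm import SVC

def patch_features(img, r=1):
    """Stack each pixel's (2r+1)x(2r+1) neighborhood into a feature row."""
    padded = np.pad(img, r, mode="reflect")
    h, w = img.shape
    return np.array([padded[i:i + 2 * r + 1, j:j + 2 * r + 1].ravel()
                     for i in range(h) for j in range(w)])

rng = np.random.default_rng(0)
truth = np.zeros((32, 32))
truth[:, 16:] = 1.0                        # two-region ground truth
noisy = truth + rng.normal(0, 0.2, truth.shape)

clf = SVC(kernel="rbf").fit(patch_features(noisy), truth.ravel())

# Segmenting a new image is just per-pixel prediction, which is why the
# method distributes trivially: disjoint pixel blocks can be classified
# on separate workers.
test = truth + rng.normal(0, 0.2, truth.shape)
seg = clf.predict(patch_features(test)).reshape(truth.shape)
print((seg == truth).mean())               # fraction of correct pixels
```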
ISBN (print): 9781538643686
The European Extremely Large Telescope (E-ELT) is one of today's most challenging projects in ground-based astronomy. Addressing one of the key science cases for the E-ELT, the study of the early Universe, requires the implementation of multi-object adaptive optics (MOAO), a dedicated concept relying on turbulence tomography. We use a novel pseudo-analytical approach to simulate the performance of tomographic reconstruction of atmospheric turbulence in a MOAO system on real datasets. We simultaneously simulate 4K galaxies in a common field of view on massively parallel supercomputers during a single night of observations. We are able to generate a first-ever high-resolution galaxy map at almost real-time throughput. This simulation scale opens new research horizons in numerical methods for experimental astronomy, with some core components of the pipeline standing as pathfinders toward actual operations and future astronomical discoveries on the E-ELT.
MapReduce is a powerful distributed data analysis programming model. It runs on big data storage systems and processes data in a parallel way. An appropriate way to ensure the correctness of MapReduce programs is formal method analysis, which first requires a formal model of MapReduce. In this paper we propose a modeling language to establish a formal model of the MapReduce framework. Unlike other approaches, our language describes the processing of data in MapReduce programs from the perspective of underlying files and blocks, so that the details of data processing can be clearly demonstrated. The language is based on our previous work, a language describing the management of massive data storage systems, extended in two aspects: block content data refinement and concurrency support. Based on our language, the features of the MapReduce programming model can be discussed.
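For readers unfamiliar with the programming model being formalized, here is a minimal in-process word-count sketch of the map, shuffle, and reduce phases in Python; it deliberately ignores the file-and-block level that the proposed modeling language captures.

```python
# A minimal in-process MapReduce sketch (word count).
from collections import defaultdict

def map_phase(record):
    for word in record.split():
        yield word, 1                      # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)             # group values by key
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(key, values):
    return key, sum(values)                # aggregate each key's values

records = ["big data big compute", "map reduce map"]
pairs = [kv for r in records for kv in map_phase(r)]
print(dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items()))
# {'big': 2, 'data': 1, 'compute': 1, 'map': 2, 'reduce': 1}
```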
ISBN (print): 9781728111414
Traditional phishing detection methods are mostly based on computer platforms and cannot be directly applied to mobile devices. This paper proposes a new two-dimensional code phishing detection method called LogoPhish. We use the logo to determine whether a two-dimensional code's actual identity matches its described identity. The method includes two processes: logo extraction and identity detection. The first process uses a mobile phone to scan the two-dimensional code, extract the logo, and perform image processing. The second process uses the Google image search engine to determine the identity of the logo. Since the relationship between the logo and the domain name is exclusive, it is reasonable to use the domain name as an identifier. The experimental results show that LogoPhish performs well and is superior to traditional detection methods for two-dimensional code phishing attacks.
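A toy version of the identity check might look like the Python sketch below, where a hypothetical logo-to-domain lookup table stands in for the Google image search step and the decoded URL's host is compared against the domain implied by the logo.

```python
# A toy sketch of the LogoPhish identity check: does the domain implied
# by the extracted logo match the URL decoded from the 2D code?
from urllib.parse import urlparse

LOGO_TO_DOMAIN = {            # hypothetical result of logo identification
    "paypal_logo": "paypal.com",
    "bank_logo": "examplebank.com",
}

def is_phishing(logo_id: str, decoded_url: str) -> bool:
    expected = LOGO_TO_DOMAIN.get(logo_id)
    actual = urlparse(decoded_url).hostname or ""
    # The logo-to-domain relationship is exclusive, so any mismatch between
    # claimed identity (logo) and actual identity (URL host) is flagged.
    return expected is None or not (
        actual == expected or actual.endswith("." + expected))

print(is_phishing("paypal_logo", "https://paypal.com/login"))      # False
print(is_phishing("paypal_logo", "https://paypa1-login.example"))  # True
```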
Illegal excavations in archaeological heritage sites (namely "looting") are a global phenomenon. Satellite images are nowadays massively used by archaeologists to systematically document sites affected by looting. In parallel, remote sensing scientists are increasingly developing processing methods with a certain degree of automation to quantify looting using satellite imagery. To capture the state of the art of this growing field of remote sensing, this work reviews 47 peer-reviewed research publications and grey literature, accounting for: (i) the type of satellite data used, i.e., optical and synthetic aperture radar (SAR); (ii) properties of looting features utilized as proxies for damage assessment (e.g., shape, morphology, spectral signature); (iii) image processing workflows; and (iv) rationale for validation. Several scholars studied looting even prior to the conflicts recently affecting the Middle East and North Africa (MENA) region. Regardless of the method used for looting feature identification (either visual/manual, or with the aid of image processing), they preferred very high resolution (VHR) optical imagery, mainly black-and-white panchromatic or pansharpened multispectral, whereas SAR is being used more recently by specialist image analysts only. Yet the full potential of VHR and high resolution (HR) multispectral information in optical imagery remains to be exploited, with limited research studies testing spectral indices. To fill this gap, a range of looted sites across the MENA region are presented in this work, i.e., Lisht, Dashur, and Abusir el Malik (Egypt), and Tell Qarqur, Tell Jifar, Sergiopolis, Apamea, Dura Europos, and Tell Hizareen (Syria). The aim is to highlight: (i) the complementarity of HR multispectral data and VHR SAR with VHR optical imagery, (ii) the usefulness of spectral profiles in the visible and near-infrared bands, and (iii) the applicability of methods for multi-temporal change detection. Satellite data used for the demonstrati...
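As a pointer to the kind of spectral index the review finds underexploited, the brief sketch below computes NDVI from red and near-infrared reflectance with NumPy; the two tiny band arrays are synthetic stand-ins for real multispectral data.

```python
# NDVI = (NIR - red) / (NIR + red); bare, freshly disturbed soil such as
# looting pits tends to score lower than surrounding vegetated ground.
import numpy as np

red = np.array([[0.10, 0.30], [0.25, 0.05]])   # red-band reflectance
nir = np.array([[0.60, 0.35], [0.30, 0.55]])   # near-infrared reflectance

ndvi = (nir - red) / (nir + red)
print(ndvi.round(2))
```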
ISBN (print): 9781479970612
In this paper, we present a new distributed algorithm for minimizing a sum of not necessarily differentiable convex functions composed with arbitrary linear operators. The overall cost function is assumed strongly convex. Each involved function is associated with a node of a hypergraph and has the ability to communicate with neighboring nodes sharing the same hyperedge. Our algorithm relies on a primal-dual splitting strategy with established convergence guarantees. We show how it can be efficiently implemented to take full advantage of a multicore architecture. The good numerical performance of the proposed approach is illustrated on a problem of video sequence denoising, where a significant speedup is achieved.
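For reference, a generic member of this problem class and a standard primal-dual splitting iteration of Chambolle-Pock type are sketched below; the symbols f_i, L_i, and the step sizes tau and sigma are generic, and the paper's distributed hypergraph-based variant refines rather than equals this scheme.

```latex
% Generic problem: a strongly convex sum of nonsmooth convex terms
% composed with linear operators (a sketch, not the paper's algorithm).
\[
  \min_{x} \; \sum_{i=1}^{m} f_i(L_i x)
\]
% One primal-dual splitting iteration of Chambolle-Pock type, where
% f_i^* is the convex conjugate of f_i:
\begin{align*}
  x^{k+1}   &= x^{k} - \tau \sum_{i=1}^{m} L_i^{\top} y_i^{k} \\
  y_i^{k+1} &= \operatorname{prox}_{\sigma f_i^{*}}\!\big(y_i^{k}
               + \sigma L_i (2x^{k+1} - x^{k})\big),
               \quad i = 1, \dots, m
\end{align*}
```

In a distributed setting, each dual update touches only the variables on one node's hyperedge, which is what allows the per-function updates to run on separate cores.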