Semi-supervised learning (SSL) utilizes plenty of unlabeled examples to boost the performance of learning from limited labeled examples. Due to its great discriminant power, SSL has been widely applied to various real...
ISBN: 9781467397759 (print)
The proceedings contain 22 papers. The topics discussed include: extending manual GUI testing beyond defects by building mental models of software behavior; software development analytics: experiences and the way forward; a conceptual framework for the comparison of fully automated GUI testing techniques; testing approach for mobile applications through reverse engineering of UI patterns; data mining methods and cost estimation models; empirical analysis on parallel tasks in crowdsourcing software development; analytics for software project management - where are we and where do we go?; a method to evaluate estimates produced by the capture-recapture model; using collective intelligence to support multi-objective decisions: collaborative and online preferences; RepMine: a system for transferrable analyses of collaboration activities in software engineering; an automated contextual collaboration approach for distributed agile delivery; and comparing model coverage and code coverage in model driven testing: an exploratory study.
Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. However, as the volume and dimensionality of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions with the MapReduce programming model, because it is suitable for large-scale data processing and can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has particular constraints and properties, and there is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. First, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Second, we describe precisely the different techniques published so far. Finally, we provide a set of objective criteria that can be used to make informed decisions.
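To make the kNN operation and its split-then-merge structure concrete, here is a minimal single-process sketch in Python. It is not any of the surveyed implementations: the function names (`local_knn`, `knn`), the `n_chunks` parameter, and the use of Euclidean distance are assumptions made for illustration, and the "map" and "reduce" steps are ordinary function calls rather than a real MapReduce job.

```python
import heapq
import math


def local_knn(p, chunk, k):
    """Map step (illustrative): the k points of one chunk closest to p,
    returned as (distance, point) pairs. Euclidean distance is an assumption."""
    return heapq.nsmallest(k, ((math.dist(p, s), s) for s in chunk))


def knn(p, points, k, n_chunks=4):
    """Split S into chunks, run the map step on each, then merge the candidates."""
    # Round-robin partitioning; a distributed system would place each chunk on a node.
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    candidates = [pair for chunk in chunks for pair in local_knn(p, chunk, k)]
    # Reduce step: the global k nearest are always among the per-chunk k nearest.
    return [s for _, s in heapq.nsmallest(k, candidates)]
```

Calling `knn((0.0, 0.0), points, 2)` on a list of coordinate tuples returns the two points nearest the origin. The merge is exact because any point in the global top k is also in the top k of its own chunk.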
ISBN: 9781510808041 (print)
The Particle and Heavy Ion Transport code System (PHITS) is a general-purpose Monte Carlo code used by many users in various fields of research and development. PHITS offers two parallel computing functions to reduce computation time: distributed-memory parallelization using the Message Passing Interface (MPI), and shared-memory parallelization using Open Multi-Processing (OpenMP) directives. Each function has advantages and disadvantages, and performance depends on simulation details such as geometry, materials, and the choice of physics models. With both MPI and OpenMP available, parallel computing in PHITS can be adjusted flexibly to suit users' needs. On supercomputer systems, hybrid parallelization can also be performed, with MPI parallelization between nodes and OpenMP parallelization within each node. Both functions are explained, and the performance of PHITS was tested with several applications on a typical workstation. The performance of the hybrid parallelization was also tested on the K computer at RIKEN, where a good strong-scaling parallel efficiency (96.2%) was confirmed up to 2,048 nodes (×8 intra-node cores).
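For readers unfamiliar with the metric, strong-scaling efficiency compares the measured speedup on a fixed-size problem with the ideal speedup implied by the increase in node count. The sketch below uses the common textbook definition. The abstract does not state which baseline run the 96.2% figure is relative to, so the function name and its parameters are assumptions for illustration only.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Parallel efficiency of a fixed-size problem on n nodes relative to a
    baseline run on n_base nodes (hypothetical helper, textbook definition).

    t_base: wall-clock time of the baseline run
    t_n:    wall-clock time on n nodes, same problem size
    """
    speedup = t_base / t_n        # how much faster the larger run actually is
    ideal_speedup = n / n_base    # how much faster it would be with perfect scaling
    return speedup / ideal_speedup
```

An efficiency of 1.0 means perfect scaling. A value of 0.962 would mean the larger run reaches 96.2% of the ideal speedup over the chosen baseline.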
ISBN: 9781479986989 (print)
With the explosive growth of data, the computational complexity of training a Support Vector Machine (SVM) places a heavy burden on real-world applications. To address this issue, parallel computing techniques such as Graphics Processing Units (GPUs) and the MapReduce model can be employed. As is well known, GPUs are multi-core processors that deliver high performance in massively data-parallel computing, while MapReduce allows a computational task to be divided into multiple parts, distributed to various computing nodes, and combined on a single node. In this paper, we propose a GPU-based MapReduce framework that accelerates SVM learning by jointly exploiting the parallel computing power of GPUs and MapReduce. Extensive experimental results verify the effectiveness and efficiency of the proposed approach.
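The divide, distribute, and combine pattern described above can be illustrated with a small CPU-only sketch. This is not the paper's GPU framework: it trains a linear SVM without a bias term by batch subgradient descent on the regularized hinge loss, with the per-chunk subgradient as the map step and a vector sum as the reduce step. All names and hyperparameters (`partial_subgradient`, `train_linear_svm`, `lam`, `lr`, `epochs`) are assumptions chosen for illustration.

```python
from functools import reduce


def partial_subgradient(w, chunk):
    """Map step: hinge-loss subgradient contribution of one data chunk.
    Each sample is (x, y) with x a tuple of floats and y in {-1, +1}."""
    g = [0.0] * len(w)
    for x, y in chunk:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1:  # only samples inside the margin contribute
            for j, xj in enumerate(x):
                g[j] -= y * xj
    return g


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial subgradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def train_linear_svm(data, dim, n_chunks=2, lam=0.01, lr=0.1, epochs=100):
    """Batch subgradient descent; chunks stand in for separate compute nodes."""
    w = [0.0] * dim
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    for _ in range(epochs):
        partials = [partial_subgradient(w, c) for c in chunks]  # map
        g = reduce(add_vectors, partials)                       # reduce
        w = [wi - lr * (lam * wi + gi / len(data)) for wi, gi in zip(w, g)]
    return w
```

Because the hinge-loss subgradient is a sum over samples, the per-chunk results can be computed independently and added afterwards, which is what makes this kind of training a natural fit for a map and reduce split.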
Approximate pattern discovery is one of the fundamental and challenging problems in computer science. Fast, high-performance algorithms are in high demand in many applications in bioinformatics and computational molecular biology, the domains that benefit most directly from any advance in the theory and practice of pattern matching. This paper proposes an efficient GPU implementation of a fuzzified Aho-Corasick algorithm that uses the Levenshtein method and the N-gram technique as a solution to the approximate pattern matching problem.
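The two building blocks named in the abstract, Levenshtein distance and N-grams, can be sketched on the CPU as follows. This is not the paper's fuzzified Aho-Corasick algorithm or its GPU implementation. It is a naive sliding-window matcher, and the function names and the `max_dist` threshold are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute),
    computed row by row with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ngrams(s, n):
    """All contiguous substrings of s with length n, in order."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def approximate_matches(text, pattern, max_dist):
    """Start offsets of pattern-length windows within max_dist edits of pattern."""
    m = len(pattern)
    return [i for i, w in enumerate(ngrams(text, m))
            if levenshtein(w, pattern) <= max_dist]
```

Each window is compared independently of the others, which hints at why this kind of workload maps well onto data-parallel hardware. An automaton-based approach such as Aho-Corasick avoids rescanning the text once per pattern.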
ISBN: 9783319244648 (print)
The proceedings contain 29 papers. The special focus in this conference is on Intelligent Data Analysis. The topics include: data analytics and optimisation for assessing a ride sharing system; constraint-based querying for Bayesian network exploration; efficient model selection for regularized classification by exploiting unlabeled data; segregation discovery in a social network of companies; a first-order-logic based model for grounded language learning; a parallel distributed processing algorithm for image feature extraction; a probabilistic graphical model based approach; diversity-driven widening of hierarchical agglomerative clustering; batch steepest-descent-mildest-ascent for interactive maximum margin clustering; time series classification with representation ensembles; simultaneous clustering and model selection for multinomial distribution; on binary reduction of large-scale multiclass classification problems; probabilistic active learning in datastreams; implicitly constrained semi-supervised least squares classification; diagonal co-clustering algorithm for document-word partitioning; an attributed graph clustering method; class-based outlier detection; using metalearning for prediction of taxi trip duration using different granularity levels; using entropy as a measure of acceptance for multi-label classification; investigation of node deletion techniques for clustering applications of growing self organizing maps; exploratory topic modeling with distributional semantics; assigning geo-relevance of sentiments mined from location-based social media posts; and continuous and discrete deep classifiers for data integration.
ISBN: 9783319216676 (print)
The proceedings contain 31 papers. The special focus in this conference is on SMT techniques, applications and HW Verification. The topics include: SMT aided linearizability proofs; finding bounded path in graph using SMT for automatic clock routing; cutting the mix; the Inez mathematical programming modulo theories framework; using minimal correction sets to more efficiently compute minimal unsatisfiable sets; deciding local theory extensions via E-matching; modular deductive verification of multiprocessor hardware designs; word-level symbolic trajectory evaluation; synthesis through unification; from non-preemptive to preemptive scheduling using synchronization synthesis; counterexample-guided quantifier instantiation for synthesis in SMT; deductive program repair; quantifying conformance using the Skorokhod metric; Pareto curves of multidimensional mean-payoff games; conflict-driven conditional termination; predicate abstraction and CEGAR for disproving termination of higher-order functional programs; complexity of Bradley-Manna-Sipma lexicographic ranking functions; measuring with timed patterns; automatic verification of stability and safety for delay differential equations; time robustness in MTL and expressivity in hybrid system falsification; adaptive concretization for parallel program synthesis; automatic completion of distributed protocols with symmetry; an axiomatic specification for sequential memory models; an abstraction for distributed almost-synchronous systems; and automated and modular refinement reasoning for concurrent programs.
To achieve reliability, distributed storage systems adopt fault-tolerance techniques such as replication. With the rapid growth of data, distributed storage systems have been transitioning replication str...
ISBN: 9781467392013 (print)
Real-time synthetic aperture radar (SAR) imaging has recently become a research hotspot in remote sensing and military applications. Because the SAR imaging algorithm is both data-intensive and computation-intensive, it is well suited to acceleration on hybrid storage systems, e.g., a cluster. To design a high-performance SAR algorithm, a prerequisite is to maximize the parallelizability of the algorithm, given the multi-level parallelization features of the cluster platform. Focusing on large-scale data, we explore the concurrency characteristics of the SAR imaging algorithm on a hybrid storage system and propose several parallel optimization techniques to accelerate it. Based on this study, we implement a parallel SAR imaging algorithm and evaluate its performance. Experimental results show that the optimized SAR imaging program makes good use of the high-speed network and clearly improves performance.