ISBN: (print) 9789811136719
With the rapid development of open source software, a large quantity of software is produced in the open source community (OSC) [1]. Much research has been launched to study the internal patterns of OSC [2], [3]. GitHub is one of the most famous open source communities and hosts thousands of software projects. As a result, GitHub holds massive and abundant data on software development activities. To offer an accurate and efficient dataset of GitHub, this paper proposes Kraken, a continuous incremental data acquisition system for GitHub. Kraken contains three main modules that are independent of each other. Kraken gets GitHub data in two ways: from git repositories and through the REST API. The final result shows that Kraken can extract the commit information of git repositories and get pull requests (PRs) and issues through the REST API. The commit information contains the detailed development history of the software, while the feedback and wisdom of software engineers are reflected in PRs and issues.
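As a rough illustration of what incremental acquisition through the REST API can look like, the sketch below builds the request URL for one page of issues updated since a saved checkpoint and then advances that checkpoint. It is not Kraken's implementation: the helper names `issues_url` and `next_checkpoint` are hypothetical, and the only API facts assumed are that GitHub's list-repository-issues endpoint accepts `state`, `since`, `per_page` and `page` query parameters and returns items with an `updated_at` timestamp.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, since=None, page=1, per_page=100):
    """Build the URL for one page of issues (hypothetical helper).

    `since` is an ISO 8601 timestamp such as "2018-01-01T00:00:00Z";
    when given, only items updated at or after it are requested.
    """
    params = {"state": "all", "per_page": per_page, "page": page}
    if since is not None:
        params["since"] = since
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{urlencode(params)}"


def next_checkpoint(items, previous=None):
    """Return the newest `updated_at` seen so far (hypothetical helper).

    UTC timestamps in the same ISO 8601 layout sort correctly as strings,
    so a plain max() is enough to pick the latest one.
    """
    stamps = [item["updated_at"] for item in items]
    if previous is not None:
        stamps.append(previous)
    return max(stamps) if stamps else None
```

Storing the value returned by `next_checkpoint` and passing it as `since` on the next run means each run only requests items that changed after the previous one, which is the general idea behind incremental collection.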
We consider a wide range of non-convex regularized minimization problems, where the non-convex regularization term is composite with a linear function engaged in sparse learning. Recent theoretical investigations have demonstrated their superiority over their convex counterparts. The computational challenge lies in the fact that the proximal mapping associated with non-convex regularization is not easily obtained due to the imposed linear composition. Fortunately, the problem structure allows one to introduce an auxiliary variable and reformulate it as an optimization problem with linear constraints, which can be solved using the Linearized Alternating Direction Method of Multipliers (LADMM). Despite the success of LADMM in practice, it remains unknown whether LADMM is convergent in solving such non-convex compositely regularized optimizations. In this research, we first present a detailed convergence analysis of the LADMM algorithm for solving a non-convex compositely regularized optimization problem with a large class of non-convex penalties. Furthermore, we propose an Adaptive LADMM (AdaLADMM) algorithm with a line-search criterion. Experimental results on different genres of datasets validate the efficacy of the proposed algorithm.
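The auxiliary-variable reformulation and the linearized update can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the paper's algorithm or its AdaLADMM variant: it takes the loss 0.5 * ||x - b||^2, uses the first-difference operator D as the linear map, splits z = Dx, and linearizes only the quadratic penalty term in the x-step. All function names, the scaled dual variable `u`, and the parameters `rho` and `mu` are choices made for this sketch. Two proximal maps are supplied: soft thresholding (a convex L1 stand-in) and hard thresholding (the non-convex L0 penalty).

```python
def diff(x):
    """Apply the first-difference operator D: (Dx)[i] = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def diff_t(v):
    """Apply the transpose D^T to a vector of length n - 1."""
    out = [0.0] * (len(v) + 1)
    for i, vi in enumerate(v):
        out[i] -= vi
        out[i + 1] += vi
    return out


def soft_threshold(v, t):
    """Prox of t * |v| (convex L1 penalty, used here as a stand-in)."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)


def hard_threshold(v, t):
    """Prox of t * [v != 0] (non-convex L0 penalty): keep v if v^2 > 2t."""
    return v if v * v > 2.0 * t else 0.0


def ladmm(b, lam, prox, rho=1.0, iters=500):
    """Toy linearized ADMM for min 0.5*||x-b||^2 + lam*r(z) s.t. Dx = z."""
    x = list(b)
    z = diff(x)
    u = [0.0] * (len(b) - 1)   # scaled dual variable (assumption of this sketch)
    mu = 4.0 * rho             # proximal weight; ||D||^2 < 4 for first differences
    for _ in range(iters):
        # x-step: linearize (rho/2)*||Dx - z + u||^2 at the current x
        resid = [d - zi + ui for d, zi, ui in zip(diff(x), z, u)]
        grad = [rho * g for g in diff_t(resid)]
        x = [(bi + mu * xi - gi) / (1.0 + mu) for bi, xi, gi in zip(b, x, grad)]
        # z-step: proximal map of the penalty, which now acts on z alone
        dx = diff(x)
        z = [prox(d + ui, lam / rho) for d, ui in zip(dx, u)]
        # dual step: accumulate the constraint violation Dx - z
        u = [ui + d - zi for ui, d, zi in zip(u, dx, z)]
    return x, z
```

The point of the split is visible in the z-step: once z replaces Dx, the penalty's proximal map is applied coordinate-wise, which is exactly what the linear composition prevented. Calling `ladmm(b, lam, hard_threshold)` runs the same loop with a non-convex penalty; no convergence claim is made for that case here.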
One performance-intensive part of automatic speech recognition is weighted finite-state transducer (WFST) decoding. To address this bottleneck, we extend parallel graphics processing unit (GPU) computing to the decoding stage. We describe extension work based on the Kaldi toolkit for speech recognition research. Our work supports WFST decoding on Kaldi neural nets with the CUDA toolkit. Our paper also extends an efficient parallel Viterbi beam decoding algorithm to decrease the speech recognition real-time factor (RTF). Together with our optimization algorithm, we reach a 2.3x speedup on AISHELL corpus decoding. We also implement an nnet3 decoder that improves real-time speed with no increase in word error rate.
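For readers unfamiliar with Viterbi beam decoding, the following is a minimal sequential sketch of the idea: tokens are propagated along graph arcs frame by frame, only the cheapest token per state survives, and tokens whose cost exceeds the best cost plus a beam width are pruned. It is neither Kaldi's decoder nor the parallel GPU algorithm described above. The graph layout (`arcs` as a dict of `(next_state, label, graph_cost)` tuples), the per-frame cost dictionaries and the function name are assumptions for illustration, and epsilon arcs and final-state weights are left out.

```python
def viterbi_beam(arcs, frame_costs, start, beam):
    """Toy Viterbi beam search over a weighted graph (hypothetical helper).

    arcs:        state -> list of (next_state, label, graph_cost)
    frame_costs: one dict per frame, mapping label -> acoustic cost
    Returns (best_cost, label_path), or None if every token dies out.
    """
    tokens = {start: (0.0, [])}  # state -> (cost so far, labels so far)
    for costs in frame_costs:
        new_tokens = {}
        for state, (cost, path) in tokens.items():
            for nxt, label, graph_cost in arcs.get(state, []):
                total = cost + graph_cost + costs[label]
                # Viterbi step: keep only the cheapest token per state
                if nxt not in new_tokens or total < new_tokens[nxt][0]:
                    new_tokens[nxt] = (total, path + [label])
        if not new_tokens:
            return None
        # Beam pruning: drop tokens too far behind the current best
        best = min(c for c, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    return min(tokens.values(), key=lambda t: t[0])
```

The real-time factor mentioned in the abstract is the ratio of decoding time to audio duration, so a narrower beam lowers RTF by expanding fewer tokens per frame, at the risk of pruning the path that would have been best.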
Emerging blockchain systems have been widely adopted in the sharing economy, for example in e-commerce, to allow mutually distrustful parties to transact fairly without trusted parties. Most blockchain systems, however, lack transactional privacy protection. All transactions, including the trading relationships between pseudonyms and the content transacted, are exposed on the blockchain. Although many privacy protection methods for the blockchain have been proposed, it is difficult to find a trade-off between maintaining speed and protecting the privacy of transactions. To address this limitation, we propose a novel privacy-preserving method, RZKPB, that does not store financial transactions in the clear on the blockchain, thus retaining transactional privacy from the public's view. Meanwhile, these transactions serve as proofs to resolve disputes between trading partners. RZKPB ensures fairness and privacy of transactions between participants without adding a new trusted party or breaking the verification protocol on the blockchain. We take e-commerce as an example of the sharing economy to introduce RZKPB in our paper. Our experimental results show that, compared with existing blockchain-based privacy-preserving methods, RZKPB is more efficient under different settings.
Blockchain is a distributed system with efficient transaction recording and has been widely adopted in the sharing economy. Although many privacy-preserving methods for the blockchain have been proposed, finding a trade-off between maintaining speed and preserving the privacy of transactions remains challenging. To address this limitation, we propose a novel Fast and Privacy-preserving method based on the Permissioned Blockchain (FPPB) for fair transactions in the sharing economy. FPPB protects the privacy and fairness of transactions without breaking the verification protocol or introducing additional off-blockchain interactive communication. Additionally, experiments are implemented in EthereumJ (a Java implementation of the Ethereum protocol) to measure the performance of FPPB. Compared with normal transactions without cryptographic primitives, FPPB slows down transactions only slightly.
Image clustering is one of the challenging tasks in machine learning and has been used extensively in various applications. Recently, various deep clustering methods have been proposed. These methods take a two-stage approach, feature learning and clustering, performed sequentially or jointly. We observe that these works usually focus on the combination of reconstruction loss and clustering loss, while relatively little work has focused on improving the representation learned by the neural network for clustering. In this paper, we propose a deep convolutional embedded clustering algorithm with inception-like block (DCECI). Specifically, an inception-like block with different types of convolution filters is introduced in the symmetric deep convolutional network to preserve the local structure of convolution layers. We simultaneously minimize the reconstruction loss of the convolutional autoencoders with inception-like block and the clustering loss. Experimental results on multiple image datasets show the promising performance of our proposed algorithm compared with other competitive methods.
Uncertainty is a great challenge for environment perception of autonomous robots. For instance, while building semantic maps (i.e., maps with semantic labels such as object names), the robot may encounter unexpected o...
The widespread use of pull-requests boosts the development and evolution for many open source software projects. However, due to the parallel and uncoordinated nature of development process in GitHub, duplicate pull-r...
ISBN: (print) 9781538657393; 9781538657386
In recent years, the rapidly growing use of graphs has spurred the development of parallel graph analytics frameworks that leverage massive hardware resources, specifically graphics processing units (GPUs). However, unpredictable control flow, memory divergence, and programming complexity have restricted high-level GPU graph libraries. In this work, we present HPGA, a high-performance parallel graph analytics framework targeting the GPU. HPGA implements an abstraction that maps vertex programs to generalized sparse matrix operations on GPUs to deliver high performance. HPGA combines high-performance GPU computing primitives and optimization strategies with a high-level programming model. We evaluate the performance of HPGA on three graph primitives (BFS, SSSP, PageRank) with large-scale datasets. The experimental results show that HPGA matches or even exceeds the performance of MapGraph and nvGRAPH, two state-of-the-art GPU graph libraries.
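To make the idea of mapping a vertex program to a sparse matrix operation concrete, the sketch below expresses BFS as repeated products of a sparse frontier vector with the adjacency matrix stored in CSR form, masked by the set of unvisited vertices. This is a sequential CPU illustration of the general technique, not HPGA's API or its GPU kernels. The function name and the CSR argument names `indptr` and `indices` are assumptions for this example.

```python
def bfs_spmv(indptr, indices, source):
    """BFS depths via masked sparse vector-matrix products (toy sketch).

    The graph is given in CSR form: the out-neighbours of vertex u are
    indices[indptr[u]:indptr[u + 1]]. Unreachable vertices keep depth -1.
    """
    n = len(indptr) - 1
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]          # sparse vector: list of active vertices
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # One generalized product over the boolean (OR, AND) semiring:
        # a vertex becomes active if any frontier vertex points to it.
        for u in frontier:
            for k in range(indptr[u], indptr[u + 1]):
                v = indices[k]
                if depth[v] == -1:       # mask: skip visited vertices
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth
```

Swapping the semiring changes the algorithm while the traversal structure stays the same: using (min, +) with edge weights in place of (OR, AND) gives the relaxation step of SSSP, which is the kind of generalization such an abstraction relies on.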