B-mode ultrasound tongue imaging is widely used in the speech production field. However, efficient interpretation is in a great need for the tongue image sequences. Inspired by the recent success of unsupervised deep ...
详细信息
In recent years, the rapid-growing scales of graphs have sparked a lot of parallel graph analysis frameworks to leverage the massive hardware resources on CPUs or GPUs. Existing CPU implementations are time-consuming,...
详细信息
In recent years, the rapid-growing scales of graphs have sparked a lot of parallel graph analysis frameworks to leverage the massive hardware resources on CPUs or GPUs. Existing CPU implementations are time-consuming, while GPU implementations are restricted by the memory space and the complexity of programming. In this paper, we present a high performance hybrid CPU-GPU parallel graph analytics framework with good productivity based on GraphMat. We map vertex programs to generalized sparse matrix vector multiplication on GPUs to deliver high performance, and propose a high-level abstraction for developers to implement various graph algorithms with relatively little efforts. Meanwhile, several optimizations have been adopted for reducing the communication cost and leveraging hardware resources, especially the memory hierarchy. We evaluate the proposed framework on three graph primitives(PageRank, BFS and SSSP) with large-scale graphs. The experimental results show that, our implementation achieves an average speedup of 7.0 X than GraphMat on two 6-core Intel Xeon CPUs. It also has the capability to process larger datasets but achieves comparable performance than MapGraph, a state-of-theart GPU-based framework.
One performance-intensive part of automatic speech recognition is the weighted finite-state transducer (WFST) decoding. To solve the problem, we expand parallel Graphics processing Units (GPU) computing to the decodin...
One performance-intensive part of automatic speech recognition is the weighted finite-state transducer (WFST) decoding. To solve the problem, we expand parallel Graphics processing Units (GPU) computing to the decoding period. We describe extension work based on Kaldi toolkit for speech recognition research. Our work can support weighted finite-state transducer decoding on Kaldi neural nets with CUDA toolkit. Our paper also expands an efficient parallel Viterbi beam decoding algorithm to decrease the speech recognition Real Time Factor (RTF) value. Together with our optimization algorithm, we have reached 2.3x speed up on the AISHELL corpus decoding. We also implement nnet3 decoder that improves real-time speed up with no word error rate raise.
Emerging blockchain systems have been widely adopted in sharing economy, such as e-commerce, to allow mutually distrustful parties to transact fairly without trusted parties. Most blockchain systems, however, lack tra...
详细信息
Emerging blockchain systems have been widely adopted in sharing economy, such as e-commerce, to allow mutually distrustful parties to transact fairly without trusted parties. Most blockchain systems, however, lack transactional privacy protection. All transactions, including trading relationship between pseudonyms and content transacted, are exposed on the blockchain. Although many existing privacy protection methods on the blockchain have been proposed, it is difficult to find a trade-off between keeping speed and protecting privacy of transactions. To address this limitation, we propose a novel privacy-preserving method RZKPB that does not store financial transactions in clear on the blockchain, thus retaining transactional privacy from the public's view. Meanwhile, these transactions are as proofs to solve disputes between trading partners. RZKPB ensures fairness and privacy of transactions between participants without adding a new trusted party and breaking the verifying protocol on the blockchain. We take the e-commerce as an example in sharing economy to introduce RZKPB in our paper. Our experimental results show that compared with existing privacy-preserving methods based on the blockchain, RZKPB is more efficient under different settings.
Blockchain is a distributed system with efficient transaction recording and has been widely adopted in sharing economy. Although many existing privacy-preserving methods on the blockchain have been proposed, finding a...
详细信息
Blockchain is a distributed system with efficient transaction recording and has been widely adopted in sharing economy. Although many existing privacy-preserving methods on the blockchain have been proposed, finding a trade-off between keeping speed and preserving privacy of transactions remain challenging. To address this limitation, we propose a novel Fast and Privacy-preserving method based on the Permissioned Blockchain (FPPB) for fair transactions in sharing economy. Without breaking the verifying protocol and bringing additional off-blockchain interactive communication, FPPB protects the privacy and fairness of transactions. Additionally, experiments are implemented in EthereumJ (a Java implementation of the Ethereum protocol) to measure the performance of FPPB. Compared with normal transactions without cryptographic primitives, FPPB only slows down transactions slightly.
In recent years, the rapidly growing use of graphs has sparked parallel graph analytics frameworks for leveraging the massive hardware resources, specifically graphics processing units (GPUs). However, the issues of t...
详细信息
ISBN:
(纸本)9781538657393;9781538657386
In recent years, the rapidly growing use of graphs has sparked parallel graph analytics frameworks for leveraging the massive hardware resources, specifically graphics processing units (GPUs). However, the issues of the unpredictable control flows, memory divergence, and the complexity of programming have restricted high-level GPU graph libraries. In this work, we present HPGA, a high performance parallel graph analytics framework targeting the GPU. HPGA implements an abstraction which maps vertex programs to generalized sparse matrix operations on GPUs for delivering high performance. HPGA incorporates high-performance GPU computing primitives and optimization strategies with a high-level programming model. We evaluate the performance of HPGA for three graph primitives (BFS, SSSP, PageRank) with large-scale datasets. The experimental results show that HPGA matches or even exceeds the performance of MapGraph and nvGRAPH, two state-of-the-art GPU graph libraries.
Ontologies have proven to be useful for capturing and organizing knowledge as a hierarchical set of terms and their relationships. However, curating gene ontology data by hand requires specialized knowledge of certain...
详细信息
Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require to centralize the multi-datacenter data for performance purpose. In practi...
详细信息
New non-volatile memory (e.g., phase-change memory) provides fast access, large capacity, byteaddressability, and non-volatility features. These features, fast-byte-persistency, will bring new opportunities to fault...
详细信息
New non-volatile memory (e.g., phase-change memory) provides fast access, large capacity, byteaddressability, and non-volatility features. These features, fast-byte-persistency, will bring new opportunities to fault tolerance. We propose a fine-grained checkpoint based on non-volatile memory. We extend the current virtual memory manager to manage non-volatile memory, and design a persistent heap with support for fast allocation and checkpointing of persistent objects. To achieve a fine-grained checkpoint, we scatter objects across virtual pages and rely on hardware page-protection to monitor the modifications. In our system, two objects in different virtual pages may reside on the same physical page. Modifying one object would not interfere with the other object. This allows us to monitor and checkpoint objects smaller than 4096 bytes in a fine-grained way. Compared with previous page-grained based checkpoint mechanisms, our new checkpoint method can greatly reduce the data copied at checkpoint time and better leverage the limited bandwidth of non-volatile memory.
暂无评论