One performance-intensive part of automatic speech recognition is the weighted finite-state transducer (WFST) decoding. To solve the problem, we expand parallel Graphics processing Units (GPU) computing to the decodin...
One performance-intensive part of automatic speech recognition is the weighted finite-state transducer (WFST) decoding. To solve the problem, we expand parallel Graphics processing Units (GPU) computing to the decoding period. We describe extension work based on Kaldi toolkit for speech recognition research. Our work can support weighted finite-state transducer decoding on Kaldi neural nets with CUDA toolkit. Our paper also expands an efficient parallel Viterbi beam decoding algorithm to decrease the speech recognition Real Time Factor (RTF) value. Together with our optimization algorithm, we have reached 2.3x speed up on the AISHELL corpus decoding. We also implement nnet3 decoder that improves real-time speed up with no word error rate raise.
Ontologies have proven to be useful for capturing and organizing knowledge as a hierarchical set of terms and their relationships. However, curating gene ontology data by hand requires specialized knowledge of certain...
详细信息
The Internet based cyber-physical world has profoundly changed the information environment for the development of artificial intelligence(AI), bringing a new wave of AI research and promoting it into the new era of AI...
详细信息
The Internet based cyber-physical world has profoundly changed the information environment for the development of artificial intelligence(AI), bringing a new wave of AI research and promoting it into the new era of AI 2.0. As one of the most prominent characteristics of research in AI 2.0 era, crowd intelligence has attracted much attention from both industry and research communities. Specifically, crowd intelligence provides a novel problem-solving paradigm through gathering the intelligence of crowds to address challenges. In particular, due to the rapid development of the sharing economy, crowd intelligence not only becomes a new approach to solving scientific challenges, but has also been integrated into all kinds of application scenarios in daily life, e.g., online-tooffline(O2O) application, real-time traffic monitoring, and logistics management. In this paper, we survey existing studies of crowd intelligence. First, we describe the concept of crowd intelligence, and explain its relationship to the existing related concepts, e.g., crowdsourcing and human computation. Then, we introduce four categories of representative crowd intelligence platforms. We summarize three core research problems and the state-of-the-art techniques of crowd intelligence. Finally, we discuss promising future research directions of crowd intelligence.
New non-volatile memory (e.g., phase-change memory) provides fast access, large capacity, byteaddressability, and non-volatility features. These features, fast-byte-persistency, will bring new opportunities to fault...
详细信息
New non-volatile memory (e.g., phase-change memory) provides fast access, large capacity, byteaddressability, and non-volatility features. These features, fast-byte-persistency, will bring new opportunities to fault tolerance. We propose a fine-grained checkpoint based on non-volatile memory. We extend the current virtual memory manager to manage non-volatile memory, and design a persistent heap with support for fast allocation and checkpointing of persistent objects. To achieve a fine-grained checkpoint, we scatter objects across virtual pages and rely on hardware page-protection to monitor the modifications. In our system, two objects in different virtual pages may reside on the same physical page. Modifying one object would not interfere with the other object. This allows us to monitor and checkpoint objects smaller than 4096 bytes in a fine-grained way. Compared with previous page-grained based checkpoint mechanisms, our new checkpoint method can greatly reduce the data copied at checkpoint time and better leverage the limited bandwidth of non-volatile memory.
Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require to centralize the multi-datacenter data for performance purpose. In practi...
详细信息
Mankind's demand for more powerful computing capabilities is never met, which has led to the continuous improvement of supercomputers' performance. A more powerful supercomputer tends to have a larger system s...
详细信息
Mankind's demand for more powerful computing capabilities is never met, which has led to the continuous improvement of supercomputers' performance. A more powerful supercomputer tends to have a larger system scale, which brings serious challenges to the system management, within which how to monitor the system's state is a critical problem. To address this problem, a scalable and flexible monitoring system framework for supercomputers is brought forward in this paper which can monitor supercomputers with tens of thousands of nodes effectively and efficiently. In this paper, we firstly give an overview of the framework and then focus on the Super Computer System Description Language(SCSDL) which is key to the framework. In the end, we explain some techniques about implementing the framework, and the client GUIs of a job monitoring system and an error monitoring system for Tianhe-2 based on this framework are given, from which we can see that the framework is well scalable and flexible to monitor Tianhe-2 which has 16,000 nodes effectively and efficiently.
Image annotation generates a set of semantic labels that describe the contents of an input *** deep learning techniques have achieved significant success in many areas of image *** this paper,we present a multi-label ...
详细信息
Image annotation generates a set of semantic labels that describe the contents of an input *** deep learning techniques have achieved significant success in many areas of image *** this paper,we present a multi-label image annotation method that combines unsupervised object hypotheses generation and deep neural *** an image,object hypotheses are generated in an unsupervised *** we extract the image features for each hypothesis with a deep neural network *** combining the features of all hypotheses,we get the features of the entire ***,we calculate for each label the probability of that the label is correlated with the given *** can be trained in an end-to-end way using the standard backward propagation *** results on multiple benchmark datasets show that our method is better than the state-of-the-art ones.
Performance and energy consumption of high performance computing (HPC) interconnection networks have a great significance in the whole supercomputer, and building up HPC interconnection network simulation plat- form...
详细信息
Performance and energy consumption of high performance computing (HPC) interconnection networks have a great significance in the whole supercomputer, and building up HPC interconnection network simulation plat- form is very important for the research on HPC software and hardware technologies. To effectively evaluate the per- formance and energy consumption of HPC interconnection networks, this article designs and implements a detailed and clock-driven HPC interconnection network simulation plat- form, called HPC-NetSim. HPC-NetSim uses application- driven workloads and inherits the characteristics of the de- tailed and flexible cycle-accurate network simulator. Besides, it offers a large set of configurable network parameters in terms of topology and routing, and supports router's on/off states. We compare the simulated execution time with the real execution time of Tianhe-2 subsystem and the mean error is only 2.7%. In addition, we simulate the network behaviors with different network structures and low-power modes. The results are also consistent with the theoretical analyses.
On virtualization platforms, peak memory de- mand caused by hotspot applications often triggers page swapping in guest OS, causing performance degradation in- side and outside of this virtual machine (VM). Even thou...
详细信息
On virtualization platforms, peak memory de- mand caused by hotspot applications often triggers page swapping in guest OS, causing performance degradation in- side and outside of this virtual machine (VM). Even though host holds sufficient memory pages, guest OS is unable to utilize free pages in host directly due to the semantic gap between virtual machine monitor (MM) and guest operat- ing system (OS). Our work aims at utilizing the free memory scattered in multiple hosts in a virtualization environment to improve the performance of guest swapping in a transparent and implicit way. Based on the insightful analysis of behav- ioral characteristics of guest swapping, we design and im- plement a distributed and scalable framework HybridSwap. It dynamically constructs virtual swap pools using various policies, and builds up a synthetic swapping mechanism in a peer-to-peer way, which can adaptively choose different vir- tual swap pools. We implement the prototype of HybridSwap and evaluate it with some benchmarks in different scenar- ios. The evaluation results demonstrate that our solution has the ability to promote the guest swapping efficiency indeed and shows a double performance promotion in some cases. Even in the worst case, the system overhead brought by Hy- bridSwap is acceptable.
暂无评论