Author name disambiguation (AND) is an important task in the field of scientific data mining. It has become a great challenge with the rapid growth of academic digital libraries. The task of AND for a large number of ...
This paper proposes a heterogeneous processor design for CNN-based AI applications on IoT devices. The processor combines an embedded RISC-V CPU, which serves as a general-purpose processor, with an efficient CNN accelerator that supports a variety of CNN models through a list of macro instructions. For demonstration, we implement a prototype on an FPGA platform with the RISC-V CPU running at 20 MHz and the CNN accelerator running at 100 MHz. As a case study, we run a CNN-based face detection and recognition application on this prototype. The prototype processes one image in 0.72 seconds, and we estimate that an ASIC implementation running at 400 MHz would process one image in less than 0.15 seconds, which satisfies the needs of many IoT scenarios such as access control systems and check-in systems.
Image restoration problems are typical ill-posed problems where the regularization term plays an important role. The regularization term learned via generative approaches is easy to transfer to various image restorati...
Information Bottleneck (IB) based multi-view learning provides an information theoretic principle for seeking shared information contained in heterogeneous data descriptions. However, its great success is generally at...
ISBN (digital): 9781728109459
ISBN (print): 9781728109466
The open-source community hosts a large number of software resources. These resources are distributed across different communities and repositories, which require different software characteristics. As a result, it is difficult to evaluate software quality using traditional methods. In this case, a novel open-source software sorting algorithm may be an effective solution. Considering both the subjective and objective levels, we propose a new method for sorting and retrieving software. At the subjective level, metrics are selected from the corresponding collaborative development community based on software topic. At the objective level, metrics are obtained from the group emotional evaluation value of the knowledge sharing community. We demonstrate the effectiveness of the method through comparison experiments. Combined with the SolrCloud tool, this method has been integrated into the OSSEAN platform.
ISBN (digital): 9781665403986
ISBN (print): 9781665403993
This paper presents a load balancing method for a multi-block grid-based CFD (Computational Fluid Dynamics) application on a heterogeneous platform. The method includes an asymmetric task scheduling scheme and a load balancing model. The idea is to balance the computing speed between the CPU and the coprocessor by adjusting the workload and the number of threads on both sides. Optimal load balance parameters are selected empirically, guided by a performance model. Performance evaluation is conducted on a computer server consisting of two Intel Xeon E5-2670 v3 CPUs and two MIC coprocessors (Xeon Phi 5110P and Xeon Phi 7120P) for the simulation of turbulent combustion in a supersonic combustor. The results show that the performance is highly sensitive to the load balance parameters. With the optimal parameters, heterogeneous computing achieves a maximum speedup of 2.30× for a 6-block mesh and 2.66× for an 8-block mesh over CPU-only computing.
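To make the idea of balancing computing speed between the two sides concrete, here is a minimal sketch in Python. It is not the paper's scheduling scheme or performance model: the function names, the rate parameters (cells processed per second on each side), and the proportional split rule are all assumptions chosen for illustration.

```python
def split_workload(total_cells, cpu_rate, coproc_rate):
    """Split grid cells so both sides finish at about the same time.

    cpu_rate and coproc_rate are hypothetical measured throughputs
    (cells per second); they are illustrative inputs, not values
    taken from the paper.
    """
    if cpu_rate <= 0 or coproc_rate <= 0:
        raise ValueError("rates must be positive")
    # Give each side a share proportional to its throughput.
    cpu_share = cpu_rate / (cpu_rate + coproc_rate)
    cpu_cells = round(total_cells * cpu_share)
    return cpu_cells, total_cells - cpu_cells


def estimated_time(cpu_cells, coproc_cells, cpu_rate, coproc_rate):
    """The step finishes when the slower side finishes."""
    return max(cpu_cells / cpu_rate, coproc_cells / coproc_rate)


if __name__ == "__main__":
    cells, cpu_rate, coproc_rate = 1000, 3.0, 1.0
    cpu_cells, coproc_cells = split_workload(cells, cpu_rate, coproc_rate)
    balanced = estimated_time(cpu_cells, coproc_cells, cpu_rate, coproc_rate)
    cpu_only = cells / cpu_rate
    print(cpu_cells, coproc_cells, cpu_only / balanced)
```

In practice the throughput of each side also depends on the thread count, which is why the abstract describes tuning workload and thread numbers together; this sketch fixes the rates and varies only the split.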
ISBN (print): 9781728140773
Data detection is among the most crucial processing tasks for massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems. In this letter, we propose a novel, efficient, high-precision soft-output data detection algorithm that iteratively generates a signal vector while reducing its complexity. The algorithm achieves its error-rate performance and reduced complexity by combining an optimization method, ADMM, with a modified neighborhood search algorithm. Simulation results demonstrate that the proposed detection achieves superior performance over existing methods at low computational complexity.
Polar codes have attracted increasing attention from researchers in recent years owing to their capacity-achieving property. However, their error-correction performance under successive cancellation (SC) decoding is inferior to...
Non-volatile random-access memory (NVRAM) technology is maturing rapidly, and its byte-persistence feature allows the design of new and efficient fault tolerance mechanisms. In this paper we propose the versionized process (VerP), a new process model based on NVRAM that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that allows us to run a process directly on NVRAM and to put all the process states into NVRAM, and then propose a mechanism to versionize all the process data. Each piece of the process data is given a special version number, which increases with each modification of that piece of data. The version number effectively helps us trace the modification of any data and recover it to a consistent state after a system crash. Compared with traditional checkpoint methods, our work can achieve fine-grained fault tolerance at very little cost.
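The per-item version numbering described above can be illustrated with a small sketch. This is not VerP itself: it uses an ordinary in-memory dictionary rather than NVRAM, and the `VersionedStore` class, its `commit` marker, and the rollback rule in `recover` are assumptions introduced only to show how version numbers can identify the last consistent state.

```python
class VersionedStore:
    """Toy key-value store in which every write bumps a per-key version.

    Hypothetical illustration only: a real NVRAM design would persist
    this state and handle torn writes, which this sketch ignores.
    """

    def __init__(self):
        self._history = {}    # key -> list of (version, value), oldest first
        self._committed = {}  # key -> version at the last consistent point

    def write(self, key, value):
        versions = self._history.setdefault(key, [])
        version = versions[-1][0] + 1 if versions else 1
        versions.append((version, value))
        return version

    def read(self, key):
        return self._history[key][-1][1]

    def version(self, key):
        return self._history[key][-1][0]

    def commit(self):
        # Record the current version of every key as consistent.
        self._committed = {k: v[-1][0] for k, v in self._history.items()}

    def recover(self):
        # Discard every write newer than the last consistent point.
        for key in list(self._history):
            keep = self._committed.get(key, 0)
            kept = [(ver, val) for ver, val in self._history[key] if ver <= keep]
            if kept:
                self._history[key] = kept
            else:
                del self._history[key]


if __name__ == "__main__":
    store = VersionedStore()
    store.write("x", 1)
    store.write("x", 2)
    store.commit()
    store.write("x", 3)   # not committed; lost on recovery
    store.recover()
    print(store.read("x"), store.version("x"))
```

Because each item carries its own version, recovery only touches items modified since the last consistent point, which is the sense in which such a scheme is finer-grained than a whole-process checkpoint.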
Sparse bundle adjustment (SBA) is a key but time- and memory-consuming step in three-dimensional (3D) reconstruction. In this paper, we propose a 3D point-based distributed SBA algorithm (DSBA) to improve the speed and scalability of SBA. The algorithm uses an asynchronously distributed sparse bundle adjustment (A-DSBA) to overlap data communication with equation computation. Compared with the synchronous DSBA mechanism (SDSBA), A-DSBA reduces the running time by 46%. The experimental results on several 3D reconstruction datasets reveal that our distributed algorithm running on eight nodes is up to five times faster than the stand-alone parallel SBA. Furthermore, the speedup of the proposed algorithm (running on eight nodes with 48 cores) is up to 41 times that of the serial SBA (running on a single node).
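The general pattern of overlapping communication with computation can be sketched as follows. This is not the A-DSBA algorithm: `fetch_block` is a hypothetical stand-in for receiving data from another node, `process_block` is a placeholder for the equation computation, and the single-worker prefetch is just one simple way to let the two proceed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(index):
    """Hypothetical stand-in for receiving one data block over the network."""
    return list(range(index * 4, index * 4 + 4))


def process_block(block):
    """Placeholder computation on one block (sum of squares)."""
    return sum(x * x for x in block)


def run_overlapped(num_blocks):
    """Compute on block i while block i + 1 is being fetched."""
    results = []
    if num_blocks <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, 0)
        for i in range(num_blocks):
            block = pending.result()  # wait for the block requested earlier
            if i + 1 < num_blocks:
                # Start the next transfer before computing on this block.
                pending = pool.submit(fetch_block, i + 1)
            results.append(process_block(block))
    return results


if __name__ == "__main__":
    print(run_overlapped(3))
```

A synchronous variant would fetch and then compute in strict sequence; the overlapped loop hides transfer time behind computation whenever the two take comparable time.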