The basic algorithm of HPL was introduced, and two communication optimization methods, namely an advanced look-ahead scheme and a dynamic broadcasting algorithm, were proposed. The performance of the two optimization methods was e...
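As a rough illustration of the look-ahead idea behind such communication optimizations (the abstract is truncated, so this is only the generic technique, not the paper's advanced-lookahead or dynamic broadcasting algorithm), the sketch below overlaps the non-blocking broadcast of the freshly factored panel with the bulk of the previous trailing-matrix update. panel_factor, update_panel_columns, trailing_update_rest, panel_buf, panel_len and panel_owner are hypothetical placeholders, declared only so the communication structure is explicit.

#include <mpi.h>

/* Hypothetical helpers standing in for HPL's panel factorization, trailing
 * update and data layout; declared but intentionally left undefined so only
 * the communication structure is shown. */
void    panel_factor(double *A, int k);
void    update_panel_columns(double *A, int k, int prev);
void    trailing_update_rest(double *A, int k);
double *panel_buf(double *A, int k);
int     panel_len(int k);
int     panel_owner(int k);

/* Right-looking LU loop with one step of look-ahead: the columns of panel k
 * are updated and factored first, the panel broadcast is posted with
 * MPI_Ibcast, and the rest of the previous step's trailing update runs while
 * that broadcast is in flight. */
void lu_lookahead(double *A, int nblocks, int myrank, MPI_Comm comm)
{
    MPI_Request req = MPI_REQUEST_NULL;

    for (int k = 0; k < nblocks; ++k) {
        if (k > 0)
            update_panel_columns(A, k, k - 1);   /* priority update of panel k   */

        if (myrank == panel_owner(k))
            panel_factor(A, k);                  /* factor panel k early          */

        MPI_Ibcast(panel_buf(A, k), panel_len(k), MPI_DOUBLE,
                   panel_owner(k), comm, &req);  /* non-blocking panel broadcast  */

        if (k > 0)
            trailing_update_rest(A, k - 1);      /* overlap computation and comm. */

        MPI_Wait(&req, MPI_STATUS_IGNORE);       /* panel k ready for step k + 1  */
    }
}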
Pervasive software should be able to adapt itself to changing environments and user needs, which brings great challenges to software engineering practice. This paper proposes AUModel, a conceptual model for adaptive software that takes adaptability as an inherent feature and can act as the foundation of the engineering process. By introducing AUModel, the reuse of the software adaptation infrastructure as well as the separation of adaptation concerns are enabled, which can facilitate both the development and maintenance of adaptive software. The paper also presents our initial attempts to realize this model, including a middleware prototype that supports the model and an application that validates its effectiveness.
ISBN (print): 9781479944156
In this paper, a new implementation of a 3GPP LTE standards-compliant turbo decoder based on GPGPU is proposed. It uses the newest GPU, the Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability, and we use it to fully exploit the parallelism in the turbo decoding algorithm in novel ways. Meanwhile, we use the various memory hierarchies to meet different data demands on speed and capacity. Simulation shows that our implementation is practical and achieves a 76% throughput improvement over the latest GPU implementation. The result demonstrates that the newest Kepler architecture is suitable for turbo decoding and can serve as a promising reconfigurable platform for communication systems.
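One common way to expose the decoder-level parallelism the paper refers to is to split each code block into sub-blocks that are decoded concurrently, extending each sub-block with a small guard window so the forward/backward state metrics can warm up at the boundaries. The plain-C sketch below shows only that index partitioning; the sub-block count, guard length and names are illustrative assumptions, not the authors' Kepler implementation.

#include <stdio.h>

/* Partition a codeword of 'len' trellis steps into 'nsub' sub-blocks for
 * parallel MAP decoding.  Each sub-block is extended by 'guard' steps on
 * both sides so the forward/backward recursions can warm up before the
 * region whose LLRs are actually kept. */
typedef struct { int keep_begin, keep_end, warm_begin, warm_end; } subblock_t;

static subblock_t make_subblock(int len, int nsub, int guard, int i)
{
    subblock_t s;
    int step     = (len + nsub - 1) / nsub;        /* ceil(len / nsub)        */
    s.keep_begin = i * step;
    s.keep_end   = (i + 1) * step < len ? (i + 1) * step : len;
    s.warm_begin = s.keep_begin - guard > 0   ? s.keep_begin - guard : 0;
    s.warm_end   = s.keep_end   + guard < len ? s.keep_end   + guard : len;
    return s;
}

int main(void)
{
    int len = 6144, nsub = 8, guard = 32;          /* 6144 = LTE max block size */
    for (int i = 0; i < nsub; ++i) {
        subblock_t s = make_subblock(len, nsub, guard, i);
        printf("sub-block %d: warm-up [%d,%d)  keep [%d,%d)\n",
               i, s.warm_begin, s.warm_end, s.keep_begin, s.keep_end);
    }
    return 0;
}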
Traditional wireless relay networks suffer from large end-to-end delay and low throughput because a relay cannot receive and forward at the same time. In this paper, we propose IWFR (Immediate Wireless Full-Duplex Relay), which exploits the advantages of full-duplex operation to shorten the end-to-end delay and improve the throughput. At the same time, we design a new implicit acknowledgement mechanism, which eliminates the ACK overheads and evidently improves the throughput of the network. To implement IWFR, we also modify the full-duplex node architecture so that it supports immediate forwarding. Evaluation shows that IWFR shortens the end-to-end delay by 60% on average and improves the throughput to 240% of that of the original relay.
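The abstract does not detail the implicit acknowledgement mechanism; a common way to realize it with a full-duplex relay is for the source to keep listening while the relay forwards and to treat an overheard forwarded copy of its own frame as the acknowledgement, retransmitting only on a timeout. The sketch below illustrates that assumed behaviour; the structure and names are not taken from IWFR.

#include <stdbool.h>
#include <stdio.h>

/* State kept by the source for the frame it is currently sending. */
typedef struct {
    unsigned seq;            /* sequence number of the outstanding frame      */
    bool     acked;          /* set once the relay's forward is overheard     */
    int      timeout_ms;     /* retransmit if no forward is heard in time     */
} pending_t;

/* Called whenever the source overhears a frame on the channel.  If it is the
 * relay forwarding our own sequence number, count it as an implicit ACK
 * instead of waiting for an explicit ACK frame. */
static void on_overheard(pending_t *p, unsigned overheard_seq)
{
    if (!p->acked && overheard_seq == p->seq) {
        p->acked = true;
        printf("seq %u implicitly acknowledged by relay's forward\n", p->seq);
    }
}

int main(void)
{
    pending_t p = { .seq = 42, .acked = false, .timeout_ms = 5 };
    on_overheard(&p, 41);      /* unrelated traffic: ignored                  */
    on_overheard(&p, 42);      /* relay forwards our frame: implicit ACK      */
    if (!p.acked)
        printf("timeout after %d ms, retransmit seq %u\n", p.timeout_ms, p.seq);
    return 0;
}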
Fingerprint matching is a key procedure in fingerprint identification applications. The minutia-based fingerprint-matching algorithm is one of the most typical algorithms and can achieve a reasonably high recognition rate. Performance and cost are two critical factors when implementing minutia-based matching algorithms in most embedded applications. A low-cost, fully pipelined architecture for minutia-based fingerprint matching is proposed in this paper. A regular matching unit with a 13-stage pipeline is designed as the core of the architecture, interfacing with a two-port RAM and a DDR3 controller. We implemented the whole architecture on a Xilinx FPGA board with the Virtex-7 XC7VX485T chip. The matching unit runs at 330 MHz on the chip, which allows the system to achieve a throughput of about 430,000 fingerprints per second on typical datasets. The unit occupies only 568 slices, less than 1% of the available chip resources, and the board consumes only 16 W of power when running. The architecture achieves about twice the throughput of a 2.93 GHz Intel Xeon 5670 CPU at a low logic cost and power consumption.
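For reference, what a minutia-based matcher computes, whether in software or in a hardware pipeline, is essentially a count of minutiae pairs whose positions and orientations agree within fixed tolerances after alignment. The plain-C sketch below shows that counting step only, with the alignment search omitted; the thresholds and data layout are illustrative assumptions rather than the paper's 13-stage pipeline.

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

typedef struct { float x, y, theta; } minutia_t;   /* position (pixels), angle (rad) */

/* Smallest absolute difference between two angles, result in [0, pi]. */
static float angle_diff(float a, float b)
{
    float d = fabsf(a - b);
    return d > (float)M_PI ? 2.0f * (float)M_PI - d : d;
}

/* Count minutiae pairs that agree within the given tolerances, assuming the
 * two templates are already aligned.  The ratio of matched pairs to template
 * size can then be thresholded to accept or reject the fingerprint. */
static int count_matches(const minutia_t *a, int na, const minutia_t *b, int nb,
                         float dist_tol, float ang_tol)
{
    int matched = 0;
    for (int i = 0; i < na; ++i) {
        for (int j = 0; j < nb; ++j) {
            float dx = a[i].x - b[j].x, dy = a[i].y - b[j].y;
            if (dx * dx + dy * dy <= dist_tol * dist_tol &&
                angle_diff(a[i].theta, b[j].theta) <= ang_tol) {
                ++matched;
                break;               /* each query minutia matches at most once */
            }
        }
    }
    return matched;
}

int main(void)
{
    minutia_t t1[] = { {10, 12, 0.30f}, {40, 55, 1.60f}, {70, 20, 2.90f} };
    minutia_t t2[] = { {11, 13, 0.28f}, {41, 54, 1.65f}, {90, 90, 0.10f} };
    printf("matched minutiae: %d of 3\n",
           count_matches(t1, 3, t2, 3, 8.0f, 0.26f));
    return 0;
}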
ISBN (print): 9781479920327
The Embarrassingly Parallel (EP) algorithm, which is typical of many Monte Carlo applications, provides an estimate of the upper achievable limit of double-precision performance of parallel supercomputers. Recently, Intel released the Many Integrated Core (MIC) architecture as a many-core co-processor. A MIC co-processor offers more than 50 cores, each of which can run four hardware threads and supports 512-bit vector instructions. In this paper, we describe how the EP algorithm is accelerated effectively on platforms containing MIC using the offload execution model. The results show that an efficient implementation of the EP algorithm on MIC can take full advantage of MIC's computational resources and achieves a speedup of 3.06 over the Intel Xeon E5-2670 CPU. Based on the EP algorithm on MIC and an effective task-distribution model, the implementation of the EP algorithm on a CPU-MIC heterogeneous platform achieves a performance of up to 2134.86 Mop/s, a 4.04x speedup over the Intel Xeon E5-2670 CPU.
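For context, the core of the EP kernel draws pairs of uniform random numbers and applies the Marsaglia polar acceptance test, which makes it trivially parallel and a good fit for the offload model. The sketch below is a simplified, self-contained C/OpenMP version (it uses rand_r instead of the NAS benchmark's linear congruential generator and shows the Intel offload pragma only as a comment); it is not the authors' implementation.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Simplified EP kernel: count the uniform pairs accepted by the Marsaglia
 * polar test.  With the Intel compiler, the parallel region below could be
 * preceded by, e.g.:
 *     #pragma offload target(mic) inout(count)
 * so that the same loop runs on the co-processor (offload execution model). */
int main(void)
{
    const long n = 1L << 24;         /* number of uniform pairs to draw        */
    long count = 0;

    #pragma omp parallel reduction(+:count)
    {
        unsigned seed = 1234u + 7u * (unsigned)omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            double x = 2.0 * rand_r(&seed) / RAND_MAX - 1.0;  /* uniform in (-1,1) */
            double y = 2.0 * rand_r(&seed) / RAND_MAX - 1.0;
            double t = x * x + y * y;
            if (t <= 1.0 && t != 0.0)        /* acceptance test of the polar method */
                ++count;
        }
    }
    printf("accepted pairs: %ld of %ld, fraction %.4f (expect about pi/4)\n",
           count, n, (double)count / (double)n);
    return 0;
}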
The double-precision matrix-matrix multiplication (DGEMM) was implemented and optimized on the ARMv8 64-bit multi-core processor architecture, and an optimal model for maximizing the compute-to-memory access ...
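The truncated abstract refers to a model for maximizing the compute-to-memory-access ratio. As a hedged illustration of what such a model quantifies: an mr x nr register block performs 2*mr*nr floating-point operations per step of the inner K loop while loading mr + nr elements of A and B, so larger blocks raise arithmetic intensity until register capacity runs out. The short C program below evaluates only this textbook estimate for a few block shapes; it is not the paper's model or its ARMv8 kernel.

#include <stdio.h>

/* Rough compute-to-memory-access ratio of an mr x nr register block in a
 * blocked DGEMM micro-kernel: each step of the K loop performs 2*mr*nr flops
 * (multiply-add) while loading mr + nr elements of A and B, so the ratio is
 * 2*mr*nr / (mr + nr) flops per element loaded. */
static double flops_per_load(int mr, int nr)
{
    return 2.0 * mr * nr / (double)(mr + nr);
}

int main(void)
{
    int shapes[][2] = { {1, 1}, {4, 4}, {8, 4}, {8, 8} };
    for (int i = 0; i < 4; ++i) {
        int mr = shapes[i][0], nr = shapes[i][1];
        printf("mr=%d nr=%d : %.2f flops per loaded element "
               "(needs ~%d FP registers for C)\n",
               mr, nr, flops_per_load(mr, nr), mr * nr);
    }
    return 0;
}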
Many big data applications receive and process data in real time. These data, also known as data streams, are generated continuously and processed online in a low-latency manner. A data stream is prone to dramatic changes in volume, since its workload may vary by several orders of magnitude between peak and valley periods. Provisioning resources for stream processing to handle the peak load is costly, and such over-provisioning is wasteful when the workload is light. Cloud computing emphasizes that resources should be utilized economically and elastically. An open question is how to allocate query tasks adaptively so as to keep up with the input rate of the data stream. Previous work focuses on using either local or global capacity information to improve the cluster's CPU utilization, while bandwidth utilization, which is also critical to system throughput, is ignored or simplified. In this paper, we formalize the operator placement problem considering both CPU and bandwidth usage, and introduce the Elastic Allocator. The Elastic Allocator uses a quantitative method to evaluate a node's capacity and bandwidth usage, and exploits both local and global resource information to allocate query tasks in a graceful manner and achieve high resource utilization. Experimental results and a simple prototype built on top of Storm demonstrate that the Elastic Allocator is adaptive and feasible in a cloud computing environment, and that it improves and balances system resource utilization.
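The abstract states that the Elastic Allocator scores a node quantitatively from both its CPU capacity and its bandwidth usage before placing a query task. One simple way to express such a score, shown purely as an assumed illustration (the weights, fields and rule are not from the paper), is a weighted sum of the CPU and bandwidth headroom fractions, with the task placed on the highest-scoring node.

#include <stdio.h>

typedef struct {
    const char *name;
    double cpu_used, cpu_total;     /* e.g. normalized cores          */
    double bw_used,  bw_total;      /* e.g. Mbit/s on the NIC         */
} node_t;

/* Assumed score: weighted sum of CPU and bandwidth headroom fractions.  A
 * task is placed on the feasible node with the highest score, which tends to
 * balance both resources instead of saturating one of them. */
static double score(const node_t *n, double w_cpu, double w_bw)
{
    double cpu_free = 1.0 - n->cpu_used / n->cpu_total;
    double bw_free  = 1.0 - n->bw_used  / n->bw_total;
    return w_cpu * cpu_free + w_bw * bw_free;
}

int main(void)
{
    node_t cluster[] = {
        { "node-1", 6.0, 8.0, 400.0, 1000.0 },
        { "node-2", 2.0, 8.0, 900.0, 1000.0 },
        { "node-3", 4.0, 8.0, 300.0, 1000.0 },
    };
    int best = 0;
    for (int i = 1; i < 3; ++i)
        if (score(&cluster[i], 0.5, 0.5) > score(&cluster[best], 0.5, 0.5))
            best = i;
    printf("place next operator on %s\n", cluster[best].name);
    return 0;
}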
ISBN (print): 9781479979820
In view of the continuously increasing demand for wireless data transmission, ultra-wideband spectrum sensing is crucial to support cognitive communication over an ultra-wide frequency band. However, it is challenging to design ADCs that fulfill the Nyquist-rate requirement for an ultra-wide band. Spectrum sensing based on sub-Nyquist sampling may be the answer. We propose UWBSS: ultra-wideband spectrum sensing. Using multiple sub-Nyquist sampling rates, UWBSS can reconstruct the occupied frequencies from the undersampled data directly, without reconstructing the complex amplitudes. We also conduct an extensive study to characterize how the sampling rate, bandwidth resolution, and SNR of the original signal affect the accuracy of sub-Nyquist spectrum sensing. The performance of UWBSS is verified by simulations.
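The principle behind using multiple sub-Nyquist sampling rates is that a tone at frequency f folds to a different aliased position under each rate, so a candidate frequency is kept only if it is consistent with the aliases observed at every rate. The small C program below demonstrates that folding and consistency check on a toy candidate grid; the rates, grid and tolerance are illustrative assumptions, not UWBSS itself.

#include <math.h>
#include <stdio.h>

/* Frequency (in MHz) to which a real tone at f aliases when sampled at rate
 * fs: fold into the first Nyquist zone [0, fs/2]. */
static double alias(double f, double fs)
{
    double r = fmod(f, fs);
    return r > fs / 2.0 ? fs - r : r;
}

int main(void)
{
    const double fs[3]  = { 95.0, 107.0, 113.0 };  /* sub-Nyquist rates (MHz) */
    const double f_true = 731.0;                   /* occupied frequency       */
    const double tol    = 0.25;                    /* bin tolerance (MHz)      */
    double observed[3];

    /* "Measurements": aliased locations actually seen at each sampling rate. */
    for (int i = 0; i < 3; ++i)
        observed[i] = alias(f_true, fs[i]);

    /* Accept a candidate only if it explains the alias seen at EVERY rate;
     * each additional rate prunes the ambiguities left by the others. */
    for (double f = 0.0; f <= 1000.0; f += 0.5) {
        int ok = 1;
        for (int i = 0; i < 3; ++i)
            if (fabs(alias(f, fs[i]) - observed[i]) >= tol)
                ok = 0;
        if (ok)
            printf("candidate %.1f MHz is consistent with all rates\n", f);
    }
    return 0;
}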
ISBN (print): 9781479942831
Distributed key-value databases are widely used in Web 2.0 applications and cloud computing environments. They overcome the weak performance and poor scalability of traditional relational databases. However, faults in a distributed system lead to errors, rendering the high performance useless, so a fault-tolerance mechanism must be built. On the other hand, transactional operations are inevitable in many application scenarios. Some existing key-value databases use the two-phase commit protocol or optimistic concurrency control for transaction processing, but these approaches suffer from single-node failure and high protocol-processing overhead, and they make users' programming more error-prone. This paper designs a fault-tolerance and recovery mechanism for DStageDB, a distributed key-value database, together with an agent-based transaction processing mechanism. The transaction processing speed is improved and less user intervention is needed.