Search Results - Inner Mongolia University Library

Design for Testability Features of Godson-3 Multicore Microprocessor

Journal of Computer Science & Technology, 2011, Vol. 26, No. 2, pp. 302-313

Authors: 齐子初, 刘慧, 李向库, 胡伟武. Affiliations: Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences; Loongson Technologies Corporation Limited

This paper describes the design for testability (DFT) challenges and techniques of the Godson-3 microprocessor, a scalable multicore processor that is based on the scalable mesh of crossbar (SMOC) on-chip network and targets high-end applications. Advanced techniques are adopted to make the DFT design scalable and to achieve low-power, low-cost test with limited IO resources. To achieve scalable and flexible test access, a highly elaborate test access mechanism (TAM) is implemented to support multiple test instructions and test modes. Taking advantage of the multiple identical cores embedded in the processor, scan partition and on-chip comparisons are employed to reduce test power and test time. A test compression technique is also utilized to decrease test time. To further reduce test power, clock control logic is designed with the ability to turn off the clocks of non-testing partitions. In addition, scan collars of CACHEs are designed to perform functional test with low-speed ATE for speed-binning purposes, which has low complexity and good correlation results.

Keywords: DFT (design for testability); TAM (test access mechanism); multicore processor; low power test
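
The on-chip comparison idea mentioned in the abstract can be pictured with a toy model: the same scan pattern is broadcast to several identical cores, each response is compared on chip, and only one pass/fail flag per core has to leave the chip. This is a minimal sketch under assumptions, not the Godson-3 logic: the function names, the bit-inverting "core", and the injected stuck-at fault are all hypothetical.

```python
from typing import Callable, List, Sequence

Bits = List[int]

def on_chip_compare(cores: Sequence[Callable[[Bits], Bits]],
                    pattern: Bits,
                    expected: Bits) -> List[bool]:
    """Broadcast one scan pattern to every core and compare each response
    with the expected response; return one pass/fail flag per core."""
    return [core(pattern) == expected for core in cores]

def good_core(pattern: Bits) -> Bits:
    # Hypothetical toy logic: invert every scanned-in bit.
    return [b ^ 1 for b in pattern]

def faulty_core(pattern: Bits) -> Bits:
    # Same toy logic, but the last output bit is stuck at 0.
    out = good_core(pattern)
    out[-1] = 0
    return out

pattern = [1, 0, 1, 0]
expected = good_core(pattern)
flags = on_chip_compare([good_core, good_core, faulty_core, good_core],
                        pattern, expected)
print(flags)
```

Because the cores are identical, one pattern and one expected response serve all of them, which is why this style of comparison can save tester bandwidth when IO pins are scarce.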

Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure

Journal of Computer Science & Technology, 2011, Vol. 26, No. 3, pp. 352-362

Authors: 孙凝辉, 邢晶, 霍志刚, 谭光明, 熊劲, 李波, 马灿. Affiliations: Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences

Dawning Nebulae is a heterogeneous system composed of 9280 multi-core x86 CPUs and 4640 NVIDIA Fermi GPUs. With a Linpack performance of 1.271 petaFLOPS, it was ranked second in the TOP500 List released in June 2010. In this paper, key issues in the system design of Dawning Nebulae are introduced. System tuning methodologies aiming at a petaFLOPS Linpack result are presented, including algorithmic optimization and communication improvement. The design of its file I/O subsystem, including HVFS and the underlying DCFS3, is also described. Performance evaluations show that the Linpack efficiency of each node reaches 69.89%, and that 1024-node aggregate read and write bandwidths exceed 100 GB/s and 70 GB/s respectively. The success of Dawning Nebulae has demonstrated the viability of the CPU/GPU heterogeneous structure for future supercomputer designs.

Keywords: supercomputer; heterogeneous systems; performance evaluation

YOLO-LLTS: Real-Time Low-Light Traffic Sign Detection via Prior-Guided Enhancement and Multi-Branch Feature Interaction

arXiv, 2025

Authors: Lin, Ziyu; Wu, Yunfan; Ma, Yuhang; Chen, Junzhou; Zhang, Ronghui; Wu, Jiaming; Yin, Guodong; Lin, Liang. Affiliations: Guangdong Key Laboratory of Intelligent Transportation System, School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510275, China; Department of Architecture and Civil Engineering, Chalmers University of Technology, Sven Hultins gata 6, Gothenburg SE-412 96, Sweden; School of Mechanical Engineering, Southeast University, Nanjing 211189, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China

Detecting traffic signs effectively under low-light conditions remains a significant challenge. To address this issue, we propose YOLO-LLTS, an end-to-end real-time traffic sign detection algorithm specifically designed for low-light environments. Firstly, we introduce the High-Resolution Feature Map for Small Object Detection (HRFM-TOD) module to address indistinct small-object features in low-light scenarios. By leveraging high-resolution feature maps, HRFM-TOD effectively mitigates the feature dilution problem encountered in conventional PANet frameworks, thereby enhancing both detection accuracy and inference speed. Secondly, we develop the Multi-branch Feature Interaction Attention (MFIA) module, which facilitates deep feature interaction across multiple receptive fields in both channel and spatial dimensions, significantly improving the model’s information extraction capabilities. Finally, we propose the Prior-Guided Enhancement Module (PGFE) to tackle common image quality challenges in low-light environments, such as noise, low contrast, and blurriness. This module employs prior knowledge to enrich image details and enhance visibility, substantially boosting detection performance. To support this research, we construct a novel dataset, the Chinese Nighttime Traffic Sign Sample Set (CNTSSS), covering diverse nighttime scenarios, including urban, highway, and rural environments under varying weather conditions. Experimental evaluations demonstrate that YOLO-LLTS achieves state-of-the-art performance, outperforming the previous best methods by 2.7% mAP50 and 1.6% mAP50:95 on TT100K-night, 1.3% mAP50 and 1.9% mAP50:95 on CNTSSS, and achieving superior results on the CCTSDB2021 dataset. Moreover, deployment experiments on edge devices confirm the real-time applicability and effectiveness of our proposed approach. Copyright © 2025, The Authors. All rights reserved.

Keywords: Traffic signs
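
The mAP50 and mAP50:95 figures quoted in the abstract rest on intersection over union (IoU): by the usual convention, a detection counts as a match at mAP50 when its IoU with the ground-truth box is at least 0.5, and mAP50:95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that matching criterion, not the full mAP computation or anything specific to YOLO-LLTS; the box coordinates are made-up values.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds used by mAP50:95 (0.50, 0.55, ..., 0.95).
thresholds = [0.50 + 0.05 * i for i in range(10)]

# Hypothetical predicted and ground-truth boxes for one small sign.
pred, truth = (10, 10, 30, 30), (12, 12, 32, 32)
score = iou(pred, truth)
matches = [score >= t for t in thresholds]
print(round(score, 3), sum(matches))
```

A two-pixel shift on a 20-pixel box already costs about a third of the IoU, which is one reason small objects such as distant traffic signs are hard to score well at the stricter thresholds.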

Physical Implementation of the 1GHz Godson-3 Quad-Core Microprocessor

Journal of Computer Science & Technology, 2010, Vol. 25, No. 2, pp. 192-199

Authors: 范宝峡, 杨梁, 王江嵋, 王茹, 肖斌, 徐英, 刘动, 赵继业. Affiliations: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences; Loongson Technology Corporation Limited

The Godson-3A microprocessor is a quad-core version of the scalable Godson-3 multi-core series. It is physically implemented in a 65 nm CMOS process. This 174 mm² chip consists of 425 million transistors. The maximum frequency is 1 GHz with a maximum power consumption of 15 W. The main challenges of Godson-3A physical implementation include very large scale, the high frequency requirement, sub-micron technology effects, and an aggressive time schedule. This paper describes the design methodology of the physical implementation of Godson-3A, with particular emphasis on design methods for high frequency, clock tree design, power management, and on-chip variation (OCV) issues.

Keywords: physical implementation; design methodology; on-chip variation (OCV); low power; clock tree

Helix Scan: A Scan Design for Diagnosis

Tsinghua Science and Technology, 2007, Vol. 12, No. S1, pp. 83-88

Authors: 王飞, 胡瑜, 李晓维. Affiliations: Graduate School of Chinese Academy of Sciences; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

Scan design is a widely used design-for-testability technique to improve test quality and efficiency. For a scan-designed circuit, test and diagnosis of the scan chain and the circuit is an important process for silicon debug and yield learning. However, conventional scan designs and diagnosis methods abort the subsequent diagnosis process after diagnosing the scan chain if the scan chain is faulty. In this work, we propose a design-for-diagnosis scan strategy called helix scan and a diagnosis algorithm to address this issue. Unlike previously proposed methods, helix scan can carry on the diagnosis process without losing information when the scan chain is faulty. Moreover, it simplifies scan chain diagnosis and achieves high diagnostic resolution as well as accuracy. Experimental results demonstrate the effectiveness of our design.

Keywords: test diagnosis scan chain diagnosis design for diagnosis (DFD)

Design and implementation of communication system of the Dawning 6000 supercomputer

Frontiers of Computer Science in China (中国计算机科学前沿), 2010, Vol. 4, No. 4, pp. 466-474

Authors: Qiang LI, Bo LI, Zhigang HUO, Ninghui SUN. Affiliations: National Research Center for Intelligent Computing Systems, Beijing 100190, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100190, China

An increasing number of supercomputers adopt a heterogeneous architecture, consisting of both general-purpose CPUs and specialized accelerators. Such a design is beneficial for scalability and power, but heterogeneity brings new challenges for communication systems, which must connect heterogeneous components and provide support for programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intranode layer between heterogeneous components. To efficiently connect heterogeneous components, the system forms a global address space and provides a mechanism for message transmission via an in-node global store; employing an InfiniBand network, it also provides an OS-bypassing virtualization method to share an InfiniBand card between nodes. To facilitate programming on heterogeneous processors, it supports unified parallel C (UPC), with a modified compiler based on the global address space. In addition, a special collective network is implemented for collective operations. Results obtained from a prototype system show these features to be both feasible and efficient.

Keywords: hyper parallel processing (HPP); global address space (GAS); virtualization; Dawning 6000; unified parallel C (UPC)

Reliable and Energy Efficient Protocol for Wireless Sensor Network

Tsinghua Science and Technology, 2007, Vol. 12, No. S1, pp. 95-100

Authors: 阚保强, 蔡理, 徐勇军. Affiliations: College of Science, Air Force Engineering University; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

Low-power design is one of the most important issues in wireless sensor networks (WSNs), while reliable information transmission must be ensured as well. Transmitting power (TP) control is a simple method to reduce power consumption, but excessive interference from potential adjacent operating links and communication reliability between nodes should be considered. In this paper, a reliable and energy efficient protocol is presented, which adopts adaptive rate control based on an optimal TP. A mathematical model considering average interference and network connectivity was used to predict the optimal TP. Then, for the optimal TP, active nodes adaptively chose the data rate according to changes in bit-error-rate (BER) performance. The efficiency of the new strategy was validated by mathematical analysis and simulations. Compared with 802.11 DCF, which uses a maximum unified TP, and with the BASIC protocol, the results show that a higher average throughput can be achieved while the energy consumption per useful bit is reduced.

Keywords: wireless sensor networks; reliable communication; energy efficient; optimal transmitting power; adaptive link control
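
The trade-off behind BER-driven rate adaptation can be sketched numerically: at a fixed transmitting power, a faster rate spends less time per bit but loses more packets. The code below is an illustration under assumptions, not the protocol from the paper: it assumes independent bit errors, charges only transmit power, and uses invented BER measurements; all function names are hypothetical.

```python
def packet_success(ber: float, packet_bits: int) -> float:
    """Probability that a packet arrives with no bit errors
    (assumes independent bit errors)."""
    return (1.0 - ber) ** packet_bits

def energy_per_useful_bit(tx_power_w: float, rate_bps: float,
                          ber: float, packet_bits: int) -> float:
    """Joules of transmit energy per successfully delivered bit."""
    return tx_power_w / (rate_bps * packet_success(ber, packet_bits))

def choose_rate(ber_by_rate: dict, tx_power_w: float,
                packet_bits: int = 1000) -> float:
    """Pick the data rate with the lowest energy per useful bit
    at a fixed transmit power."""
    return min(
        ber_by_rate,
        key=lambda r: energy_per_useful_bit(tx_power_w, r,
                                            ber_by_rate[r], packet_bits),
    )

# Hypothetical link measurements: higher rates see higher BER.
ber_by_rate = {1e6: 1e-6, 2e6: 1e-5, 11e6: 1e-2}
best = choose_rate(ber_by_rate, tx_power_w=0.05)
print(best)
```

With these made-up numbers the middle rate wins: the fastest rate loses nearly every 1000-bit packet, and the slowest rate spends twice as long on air for almost no gain in reliability.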

Fusing Bluetooth with Pedestrian Dead Reckoning: A Floor Plan-Assisted Positioning Approach

arXiv, 2025

Authors: Pan, Wenxuan; Yang, Yang; Chen, Mingzhe; Wei, Dong; Guo, Caili; Mao, Shiwen. Affiliations: Beijing Key Laboratory of Network System Architecture and Convergence, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; Department of Electrical and Computer Engineering, the Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, United States; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; Beijing Laboratory of Advanced Information Networks, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; Wireless Engineering Research and Education Center, Auburn University, Auburn, AL 36849, United States

Floor plans can provide valuable prior information that helps enhance the accuracy of indoor positioning systems. However, existing research typically faces challenges in efficiently leveraging floor plan information and applying it to complex indoor layouts. To fully exploit information from floor plans for positioning, we propose a floor plan-assisted fusion positioning algorithm (FP-BP) using Bluetooth low energy (BLE) and pedestrian dead reckoning (PDR). In the considered system, a user holding a smartphone walks through a positioning area with BLE beacons installed on the ceiling, and can locate himself in real time. In particular, FP-BP consists of two phases. In the offline phase, FP-BP programmatically extracts map features from a stylized floor plan based on their binary masks, and constructs a mapping function to identify the corresponding map feature of any given position on the map. In the online phase, FP-BP continuously computes BLE positions and PDR results from BLE signals and smartphone sensors, where a novel grid-based maximum likelihood estimation (GML) algorithm is introduced to enhance BLE positioning. Then, a particle filter is used to fuse them and obtain an initial estimate. Finally, FP-BP performs post-position correction to obtain the final position based on its specific map feature. Experimental results show that FP-BP can achieve a real-time mean positioning accuracy of 1.19 m, representing an improvement of over 28% compared to existing floor plan-fused baseline algorithms. Copyright © 2025, The Authors. All rights reserved.

Keywords: Photomapping
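
A grid-based maximum likelihood position estimate from BLE signal strengths can be sketched as follows. This is not the GML algorithm from the paper, whose details the abstract does not give. It assumes a log-distance path-loss model with Gaussian noise, under which the most likely grid point is the one with the smallest sum of squared RSSI residuals. It also works in 2D rather than with ceiling-mounted beacons, and the parameters `p0` and `n`, the beacon layout, and the function names are all hypothetical.

```python
import math

def expected_rssi(beacon, point, p0=-59.0, n=2.0):
    """Log-distance path loss: RSSI at `point` from `beacon`.
    p0 is the assumed RSSI at 1 m, n the assumed path-loss exponent."""
    d = max(math.dist(beacon, point), 0.1)  # avoid log of zero
    return p0 - 10.0 * n * math.log10(d)

def grid_ml_position(beacons, rssi, width, height, step=0.5):
    """Return the grid point that maximizes the Gaussian likelihood of
    the readings, i.e. minimizes the sum of squared residuals."""
    best, best_cost = None, float("inf")
    nx, ny = int(width / step) + 1, int(height / step) + 1
    for i in range(nx):
        for j in range(ny):
            p = (i * step, j * step)
            cost = sum((r - expected_rssi(b, p)) ** 2
                       for b, r in zip(beacons, rssi))
            if cost < best_cost:
                best, best_cost = p, cost
    return best

# Hypothetical 10 m x 10 m room with one beacon in each corner.
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_pos = (3.0, 4.0)
readings = [expected_rssi(b, true_pos) for b in beacons]  # noise-free demo
estimate = grid_ml_position(beacons, readings, width=10.0, height=10.0)
print(estimate)
```

An exhaustive grid search is easy to combine with a floor plan, since grid cells that fall inside walls or other unreachable areas can simply be skipped; in a fused system the resulting BLE estimate would then be one input to the particle filter alongside the PDR result.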

Using index in the MapReduce framework

12th International Asia Pacific Web Conference, APWeb 2010

Authors: An, Mingyuan; Wang, Yang; Wang, Weiping. Affiliations: Key Laboratory of Computer System and Architecture, Graduate University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China

ISBN (print): 9780769540122

MapReduce is a programming framework introduced by Google for large-scale data processing. It is usually used in a scan-centric fashion where all the data are split into blocks and Maps are generated for each block to scan and process the data in the block, then Reduces merge outputs from all the Maps. When a query intends to process only a subset of the data selected by a predicate, this brute-force method may cause extra I/O overhead spent on irrelevant data, and the overhead for initiating so many Maps may be nontrivial given that the data actually of interest to the query is comparatively small in volume. We propose an approach to integrate the index into the MapReduce execution in which only an appropriate number of Maps are generated, each of which accesses the data using an index. This approach incurs random I/O and remote access to data, so the overall performance depends on both system parameters and the query characteristics. We build a cost model for both this index access execution and the traditional full scan execution. This cost model can be used to choose between the two execution modes before executing a query. Experiments show that the index access execution can greatly outperform full scan execution when the selectivity of the predicate is low, and that the cost model predicts the actual execution cost very well, so it can be used to determine the execution plan for a query. © 2010 IEEE.

Keywords: MapReduce
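
The choice between full scan and index access described in the abstract can be illustrated with a deliberately simple cost model. The formulas and every parameter value below are assumptions made for illustration and are not the model from the paper: full scan is charged one Map startup plus one sequential block read per wave of Maps, and index access is charged one Map startup plus one random I/O per matching record.

```python
def full_scan_cost(total_bytes, block_bytes, seq_bw, map_startup_s, parallelism):
    """Estimated seconds: one Map per block, run in waves of `parallelism`."""
    n_maps = -(-total_bytes // block_bytes)   # ceiling division
    waves = -(-n_maps // parallelism)
    return waves * (map_startup_s + block_bytes / seq_bw)

def index_access_cost(total_records, selectivity, random_io_s,
                      map_startup_s, n_maps):
    """Estimated seconds: a few Maps, each fetching its share of the
    matching records with one random I/O per record."""
    matching = total_records * selectivity
    return map_startup_s + (matching / n_maps) * random_io_s

def choose_plan(selectivity, **p):
    """Return 'index' or 'scan', whichever the model estimates is cheaper."""
    scan = full_scan_cost(p["total_bytes"], p["block_bytes"], p["seq_bw"],
                          p["map_startup_s"], p["parallelism"])
    index = index_access_cost(p["total_records"], selectivity,
                              p["random_io_s"], p["map_startup_s"],
                              p["index_maps"])
    return "index" if index < scan else "scan"

# Hypothetical cluster and data set: 1 TB of 100-byte records.
params = dict(total_bytes=10**12, block_bytes=64 * 2**20, seq_bw=100e6,
              map_startup_s=1.0, parallelism=100,
              total_records=10**10, random_io_s=0.005, index_maps=100)

print(choose_plan(1e-6, **params))  # few matching records
print(choose_plan(1e-2, **params))  # many matching records
```

Even this crude model reproduces the qualitative behaviour the abstract reports: when the predicate selects a tiny fraction of the records, random I/O on the matches is far cheaper than scanning every block, and the advantage disappears as the selected fraction grows.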

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization

Journal of Computer Science & Technology, 2009, Vol. 24, No. 6, pp. 1086-1097

Authors: 米伟, 冯晓兵, 贾耀仓, 陈莉, 薛京灵. Affiliations: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences; Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales

DRAM row buffer conflicts can increase memory access latency significantly. This paper presents a new page-allocation-based optimization that works seamlessly together with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts. Validation in simulation, using a set of selected scientific and engineering benchmarks against a few representative memory controller optimizations, shows that our method can reduce row buffer miss rates by up to 76% (with an average of 37.4%). This reduction in row buffer miss rates translates into performance speedups of up to 15% (with an average of 5%).

Keywords: DRAM row buffer page allocation locality optimization
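
Why page allocation affects row buffer conflicts can be seen in a small simulation. This sketch is not the PARBLO algorithm; it assumes an open-page policy with one open row per bank and a toy mapping in which consecutive physical pages rotate across banks. The function names and page numbers are hypothetical.

```python
def row_buffer_miss_rate(accesses, n_banks=4):
    """Open-page policy: an access hits if its row is the one currently
    open in its bank. `accesses` is a sequence of physical page numbers;
    the page-to-bank/row mapping is a toy interleaving."""
    open_row = {}
    misses = 0
    for page in accesses:
        bank = page % n_banks
        row = page // n_banks
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
    return misses / len(accesses)

def interleave(page_a, page_b, n):
    """Two data streams alternately touching page_a and page_b, n times each."""
    return [page_a, page_b] * n

# Pages 0 and 4 share bank 0 but sit in different rows: every access conflicts.
conflicting = interleave(0, 4, 50)
# Pages 0 and 5 land in banks 0 and 1: each row stays open after its first use.
separated = interleave(0, 5, 50)

print(row_buffer_miss_rate(conflicting))
print(row_buffer_miss_rate(separated))
```

The access pattern is identical in both runs; only the physical page given to the second stream differs. That is the lever a page-allocation-based optimization can pull without changing the program or the memory controller.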
