Edge learning (EL) is an end-to-edge collaborative learning paradigm enabling devices to participate in model training and data analysis, opening countless opportunities for edge intelligence. As a promising EL framew...
With the increasing amount of data, there is an urgent need for efficient sorting algorithms to process large data sets. Hardware sorting algorithms have attracted much attention because they can take advantage of the characteristics of different hardware. However, traditional hardware sort accelerators suffer from the "memory wall" problem because of multiple rounds of data transmission between the memory and the processor. In this paper, we utilize the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can process the matrix-vector multiplication operation and the vector-scalar comparison in the same array. Using this ReCAM array, we present ReCSA, the first dedicated ReCAM-based sort accelerator. Besides the hardware designs, we also develop algorithms that maximize memory utilization and minimize memory exchanges to improve sorting performance. The sorting algorithm in ReCSA can process various data types, such as integers, floats, doubles, and strings. We also present experiments that evaluate the performance and energy efficiency of ReCSA against state-of-the-art sort accelerators. The experimental results show that ReCSA achieves 90.92×, 46.13×, 27.38×, 84.57×, and 3.36× speedups over CPU-, GPU-, FPGA-, NDP-, and PIM-based platforms, respectively, when processing numeric data sets. ReCSA also achieves 24.82×, 32.94×, and 18.22× performance improvements over CPU-, GPU-, and FPGA-based platforms, respectively, when processing string data sets.
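The abstract pairs matrix-vector multiplication with a vector-scalar comparison inside one array. One general way a comparison primitive can drive sorting is rank (counting) sort: compare all stored keys against one key at a time, count how many are smaller, and use that count as the key's output position. The sketch below emulates this idea in plain Python. It is an illustrative assumption, not ReCSA's actual algorithm, and `vector_scalar_less` and `rank_sort` are hypothetical names.

```python
from typing import List, Sequence


def vector_scalar_less(keys: Sequence[float], pivot: float) -> List[bool]:
    """Emulate one vector-scalar comparison: one result bit per stored key."""
    return [k < pivot for k in keys]


def rank_sort(keys: Sequence[float]) -> List[float]:
    """Place every key at its rank, computed from one comparison per key.

    rank = (#keys strictly smaller) + (#equal keys at earlier indices),
    so equal keys keep their input order and each slot is filled exactly once.
    """
    out: List[float] = list(keys)  # placeholder contents, all overwritten
    for i, pivot in enumerate(keys):
        rank = sum(vector_scalar_less(keys, pivot))  # popcount of result bits
        rank += sum(1 for j in range(i) if keys[j] == pivot)  # stable tie-break
        out[rank] = pivot
    return out


if __name__ == "__main__":
    print(rank_sort([3.5, -1.0, 2.0, -1.0]))  # [-1.0, -1.0, 2.0, 3.5]
```

Run sequentially, this costs O(n²) comparisons. The appeal of performing the comparison inside the memory array is that, in principle, each of the n full-array comparisons becomes one parallel array operation instead of n separate reads over the memory bus.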
Transactional stream processing engines (TSPEs) are central to modern stream applications handling shared mutable states. However, their full potential, particularly in adaptive scheduling, remains largely unexplored....
1 Introduction
In recent years, the Massively Parallel Computation (MPC) model has gained significant attention. However, most distributed and parallel graph algorithms in the MPC model are designed for static graphs [1]. In fact, real-world graphs are constantly changing. The real-time changes to these graphs are smaller and more […]. Dynamic graph algorithms [2,3] can handle graph changes more efficiently [4] than the corresponding static graph algorithms. However, most studies on dynamic graph algorithms are limited to the single-machine setting. Recently, a few parallel dynamic graph algorithms in the MPC model (for example, for graph connectivity) [5] have been proposed and have shown superiority over their parallel static counterparts.
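Why a dynamic algorithm can beat rerunning a static one is easy to see on a single machine for connectivity under edge insertions: a union-find structure absorbs each new edge in near-constant amortized time, while a static approach relabels every vertex from scratch. This sequential sketch is for intuition only. It does not model the MPC setting or the algorithms in [5], it supports insertions but not deletions, and the class and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


class IncrementalConnectivity:
    """Union-find (union by size, path halving); supports edge insertions only."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def insert_edge(self, u: int, v: int) -> None:
        """Handle one edge insertion without touching unaffected components."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru  # attach the smaller tree under the larger one
        self.size[ru] += self.size[rv]

    def connected(self, u: int, v: int) -> bool:
        return self.find(u) == self.find(v)


def static_components(n: int, edges: Sequence[Tuple[int, int]]) -> List[int]:
    """Static baseline: relabel every vertex with an iterative DFS from scratch."""
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = [-1] * n
    for s in range(n):
        if label[s] != -1:
            continue
        label[s] = s
        stack = [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if label[y] == -1:
                    label[y] = s
                    stack.append(y)
    return label


if __name__ == "__main__":
    edges = [(0, 1), (2, 3)]
    dyn = IncrementalConnectivity(5)
    for u, v in edges:
        dyn.insert_edge(u, v)
    print(dyn.connected(0, 3))  # False
    dyn.insert_edge(1, 2)  # one small change, near-constant amortized cost
    edges.append((1, 2))  # the static baseline must rescan every edge
    labels = static_components(5, edges)
    print(dyn.connected(0, 3), labels[0] == labels[3])  # True True
```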
Media power, the impact that media have on public opinion and perspectives, plays a significant role in maintaining internal stability, exerting external influence, and shaping international dynamics for nations/regions. However, prior research has primarily concentrated on news content and reporting time, which limits how well media power can be evaluated. To assess media power more accurately, we use news content, news reporting time, and news emotion simultaneously to explore emotional contagion between media. We use emotional contagion to measure the mutual influence between media and regard media with greater impact as having stronger media power. We propose a framework called Measuring Media Power via Emotional Contagion (MMPEC) to capture emotional contagion among media, enabling a more accurate assessment of media power at both the media level and the national/regional level. MMPEC also interprets the experimental results through correlation and causality analyses, ensuring interpretability. The analyses confirm that MMPEC is more accurate than other baseline models, as demonstrated on COVID-19-related news, and yield compelling and interesting insights.
Computer vision (CV) algorithms have been extensively used in a myriad of applications. As multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performance of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, which perform the same operation on multiple data items in a single instruction, are extensively employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of the RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. In our examinations, we show that, compared with the baseline OpenCV implementation using scalar instructions, the equivalent implementations using RV-V (version 0.8) can reduce the instruction count of the same CV algorithm by up to 24x when processing the same input. However, the actual performance improvement measured by cycle counts depends heavily on the specific implementation of the underlying RV-V hardware. In our evaluation, using the vector co-processor (with eight execution lanes) of the Xuantie C906, the vector versions of the CV algorithms exhibit average speedups of up to 2.98x over their scalar counterparts.
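The instruction-count reduction comes from processing many pixels per vector instruction. The sketch below contrasts a scalar Gray Scale loop with an emulation of the strip-mining pattern commonly used with RVV, where each pass sets a vector length `vl` of at most `vlmax` elements and handles that many pixels at once. It only emulates the loop structure in Python. The fixed-point weights (an approximation of OpenCV's documented Y = 0.299R + 0.587G + 0.114B) and `vlmax = 8` are assumptions rather than the Xuantie C906's actual parameters, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)

# Assumed fixed-point luma weights scaled by 256 (77 + 150 + 29 == 256),
# approximating Y = 0.299 R + 0.587 G + 0.114 B.
WR, WG, WB = 77, 150, 29


def gray_scalar(pixels: Sequence[Pixel]) -> List[int]:
    """Scalar baseline: one pixel per loop iteration."""
    return [(WR * r + WG * g + WB * b + 128) >> 8 for r, g, b in pixels]


def gray_strip_mined(pixels: Sequence[Pixel], vlmax: int = 8) -> Tuple[List[int], int]:
    """Emulate an RVV-style strip-mined loop over the pixel array.

    Each pass stands in for one short sequence of vector instructions acting
    on vl = min(remaining, vlmax) pixels at once. Returns the gray values and
    the number of passes.
    """
    out: List[int] = []
    i = passes = 0
    while i < len(pixels):
        vl = min(len(pixels) - i, vlmax)  # the role played by vsetvli
        chunk = pixels[i:i + vl]
        r, g, b = zip(*chunk)  # de-interleave the channels of vl pixels
        out.extend((WR * x + WG * y + WB * z + 128) >> 8 for x, y, z in zip(r, g, b))
        i += vl
        passes += 1
    return out, passes


if __name__ == "__main__":
    pixels = [(255, 255, 255), (0, 0, 0), (255, 0, 0)] * 7  # 21 synthetic pixels
    gray, passes = gray_strip_mined(pixels)
    print(gray == gray_scalar(pixels), passes)  # True 3
```

The number of passes shrinks by roughly a factor of `vlmax`, which mirrors the instruction-count argument; as the abstract notes, the cycle-count gain on real hardware still depends on the vector unit, such as how many execution lanes it has.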
Memory disaggregation, facilitated by Smart Network Interface Cards (SmartNICs), has emerged as a cost-effective approach for sharing memory resources in data centers. However, current SoC-based SmartNICs face several...
Graph processing has been widely used in many scenarios, from scientific computing to artificial intelligence. Unlike traditional workloads, graph processing exhibits irregular computational parallelism and random memory accesses. Consequently, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often yields a significantly low compute-memory ratio and few performance benefits; in many cases, it can be even slower than a specialized single-threaded graph implementation. While domain-specific hardware designs are essential for graph processing, it is still challenging to translate hardware capability into performance gains without coupled software codesign. This article presents a graph processing ecosystem spanning hardware to software. We start by introducing a series of hardware accelerators as the foundation of this ecosystem. Next, we present the codesigned parallel graph systems and their distributed techniques that support graph applications. Finally, we introduce our efforts on novel graph applications and hardware architectures. The results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
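The combination of irregular parallelism and random memory accesses can be seen in a minimal level-synchronous breadth-first search over a CSR (compressed sparse row) graph. This generic sketch is not one of the ecosystem's accelerators or systems; `to_csr` and `bfs_levels` are hypothetical names. Scanning a vertex's neighbor list is a contiguous read, but each neighbor ID then indexes `level` at a data-dependent position, and the frontier size (the available parallelism) varies from one level to the next.

```python
from typing import List, Sequence, Tuple


def to_csr(n: int, edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR (offsets, neighbors) representation of a directed graph."""
    offsets = [0] * (n + 1)
    for u, _ in edges:
        offsets[u + 1] += 1
    for i in range(n):
        offsets[i + 1] += offsets[i]  # prefix sum: start of each neighbor list
    neighbors = [0] * len(edges)
    fill = offsets[:-1]  # slice copy: next write position for each vertex
    for u, v in edges:
        neighbors[fill[u]] = v
        fill[u] += 1
    return offsets, neighbors


def bfs_levels(offsets: Sequence[int], neighbors: Sequence[int],
               source: int) -> Tuple[List[int], List[int]]:
    """Level-synchronous BFS that also records the order of reads of `level`."""
    n = len(offsets) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    trace: List[int] = []
    while frontier:
        nxt: List[int] = []
        for u in frontier:
            for k in range(offsets[u], offsets[u + 1]):  # contiguous scan
                v = neighbors[k]
                trace.append(v)  # data-dependent read of level[v]
                if level[v] == -1:
                    level[v] = level[u] + 1
                    nxt.append(v)
        frontier = nxt  # frontier size changes from level to level
    return level, trace


if __name__ == "__main__":
    off, nb = to_csr(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
    print(bfs_levels(off, nb, 0))  # ([0, 1, 1, 2, 3], [1, 2, 3, 3, 4])
```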
Data races are among the most important concurrency anomalies in multi-threaded programs. Recently, constraint-based techniques have been leveraged for race detection; they can find all the races that can be found by any other sound race detector. However, this constraint-based approach has serious limitations in helping programmers analyze and understand data races. First, it may report a large number of false positives because it does not recognize the dataflow propagation of the program. Second, it recommends a wide range of thread context switches to schedule each reported race (including false ones) whenever the race is exposed during the constraint-solving process. This ad hoc recommendation imposes too many context switches, which complicates data race analysis. To address these two limitations of state-of-the-art constraint-based race detection, this paper proposes DFTracker, an improved constraint-based race detector that recommends each data race with minimal thread context switches. Specifically, we reduce false positives by analyzing and tracking the dataflow in the program. By this means, DFTracker reduces unnecessary analysis of false race schedules. We further propose a novel algorithm that recommends an effective race schedule with minimal thread context switches for each data race. Our experimental results on real applications demonstrate that 1) without removing any true data race, DFTracker reduces false positives by 68% compared with the state-of-the-art constraint-based race detector; and 2) DFTracker recommends as few as 2.6-8.3 (4.7 on average) thread context switches per data race in real-world programs, 81.6% fewer context switches per data race than the state-of-the-art constraint-based race detector. Therefore, DFTracker can serve as an effective tool to help programmers understand data races.
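The abstract evaluates race recommendations by the number of thread context switches per race. A minimal sketch of that metric, assuming a schedule is written as a sequence of (thread, operation) events, counts thread changes between consecutive events and picks the cheapest among candidate schedules that expose the same race. DFTracker's own recommendation algorithm is not described here; the event format and the names `context_switches` and `pick_minimal` are hypothetical.

```python
from typing import Sequence, Tuple

Event = Tuple[str, str]  # (thread id, operation label)


def context_switches(schedule: Sequence[Event]) -> int:
    """Count adjacent events that run on different threads."""
    return sum(1 for (t1, _), (t2, _) in zip(schedule, schedule[1:]) if t1 != t2)


def pick_minimal(candidates: Sequence[Sequence[Event]]) -> Sequence[Event]:
    """Return the candidate schedule with the fewest context switches."""
    if not candidates:
        raise ValueError("no candidate schedules")
    return min(candidates, key=context_switches)


if __name__ == "__main__":
    # Two synthetic schedules that respect each thread's program order and
    # both place T1's write of x immediately before T2's read of x.
    interleaved = [("T1", "a=1"), ("T2", "b=1"), ("T1", "c=1"),
                   ("T2", "d=2"), ("T1", "write x"), ("T2", "read x")]
    grouped = [("T2", "b=1"), ("T2", "d=2"), ("T1", "a=1"),
               ("T1", "c=1"), ("T1", "write x"), ("T2", "read x")]
    print(context_switches(interleaved), context_switches(grouped))  # 5 2
    print(pick_minimal([interleaved, grouped]) is grouped)  # True
```

Fewer switches mean a shorter, easier-to-follow interleaving for the programmer to replay, which is the motivation the abstract gives for minimizing them.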
In the realm of recommendation systems, achieving real-time performance in embedding similarity tasks is often hindered by the limitations of traditional Top-K sparse matrix-vector multiplication (SpMV) methods, which...