Coordination among users is an indispensable part of wireless networks for efficient access control. Along with the rapid increase of the data transmission rate, however, coordination time becomes intolerable, even several times longer than the data transmission time. We present SIF, a signature-based frequency-domain contention mechanism that achieves high coordination efficiency with low overhead. In SIF, each user is assigned a distinct PN sequence as its signature. A contending user issues its signature on some specific OFDM subcarriers and uses the binary sequence of the ON/OFF states of all OFDM subcarriers to deliver the contention information. A signature-based detection method is proposed to detect the CVs of other nodes quickly and reliably. It is shown that the collision probability of SIF is very low even in a large wireless network, e.g., less than 0.2% with 100 users. Moreover, as SIF can complete the coordination within one slot in most cases, the throughput gain is up to 200% in comparison with 802.11.
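The idea of carrying contention information in the ON/OFF states of subcarriers can be illustrated with a toy sketch. It is not the SIF design: the PN signatures and the signature-based detection are left out, and the names `encode_cv`, `decode_cv`, and `contend`, the uniformly random contention values, and the "largest value wins" rule are all assumptions made only for illustration.

```python
import random

def encode_cv(value, n_subcarriers):
    """Map an integer to ON/OFF subcarrier states, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(n_subcarriers))]

def decode_cv(states):
    """Recover the integer from a list of ON/OFF subcarrier states."""
    value = 0
    for bit in states:
        value = (value << 1) | bit
    return value

def contend(n_users, n_subcarriers, rng):
    """One contention round; returns True if two or more users tie for the win.

    Assumption: each user draws a uniform random value and the largest wins.
    """
    values = [rng.randrange(2 ** n_subcarriers) for _ in range(n_users)]
    return values.count(max(values)) > 1

rng = random.Random(0)
rounds = 2000
collisions = sum(contend(100, 16, rng) for _ in range(rounds))
print(f"toy collision rate: {collisions / rounds:.4f}")
```

The printed rate describes only this toy model and should not be compared with the 0.2% figure reported in the abstract.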
GitHub facilitates the pull-request mechanism as an outstanding social coding paradigm by integrating with social media. The review process of pull-requests is a typical crowdsourcing job which needs to solicit opinio...
The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.
The basic algorithm of HPL was introduced. Two optimization methods of communication, i.e., advanced-lookahead and dynamic broadcasting algorithm, were proposed. The performances of the two optimization methods were e...
ISBN: (print) 9781479944156
In this paper, a new implementation of a 3GPP LTE standards-compliant turbo decoder based on GPGPU is proposed. It uses the newest GPU, the Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability, and we use it to fully exploit the parallelism in the turbo decoding algorithm in novel ways. Meanwhile, we use the various levels of the memory hierarchy to meet different data demands on speed and capacity. Simulation shows that our implementation is practical and achieves a 76% throughput improvement over the latest GPU implementation. The result demonstrates that the newest Kepler architecture is suitable for turbo decoding and can be a promising reconfigurable platform for communication systems.
Fingerprint matching is a key procedure in fingerprint identification applications. The minutiae-based fingerprint-matching algorithm is one of the most typical algorithms that can achieve a reasonably correct recognition rate. Performance and cost are two critical factors when implementing minutia-based matching algorithms in most embedded applications. A low-cost, fully pipelined architecture for minutia-based fingerprint matching is proposed in this paper. A regular matching unit with a 13-stage pipeline is designed as the core of the architecture, interfacing with a two-port RAM and a DDR3 controller. We implemented the whole architecture on a Xilinx FPGA board with the Virtex-7 XC7VX485T chip. The matching unit can run at a frequency of 330 MHz on the chip, which allows the system to achieve a throughput of about 430,000 fingerprints per second when processing typical datasets. The unit occupies only 568 slices, which is less than 1% of the available chip resources. The board consumes only 16 W of power when running. The architecture achieves about twice the throughput of the 2.93 GHz Intel Xeon X5670 CPU at a low logic cost and power.
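As a back-of-the-envelope check on the figures quoted in this abstract, the short sketch below derives two numbers the abstract does not state: the average clock cycles spent per fingerprint and the fingerprints matched per joule. The function names are illustrative, and the calculation assumes that the 16 W board power and the roughly 430,000 fingerprints per second refer to the same operating point.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 330e6           # matching unit clock frequency
THROUGHPUT_FPS = 430_000   # fingerprints matched per second (approximate)
BOARD_POWER_W = 16         # board power while running

def cycles_per_fingerprint(clock_hz, fingerprints_per_s):
    """Average clock cycles the pipeline spends on one fingerprint."""
    return clock_hz / fingerprints_per_s

def fingerprints_per_joule(fingerprints_per_s, power_w):
    """Energy efficiency: fingerprints matched per joule of board energy."""
    return fingerprints_per_s / power_w

print(round(cycles_per_fingerprint(CLOCK_HZ, THROUGHPUT_FPS)))  # about 767
print(fingerprints_per_joule(THROUGHPUT_FPS, BOARD_POWER_W))    # 26875.0
```

Roughly 767 cycles per fingerprint would be consistent with a pipeline that consumes on the order of one minutia comparison per cycle, but that reading is an inference, not something the abstract claims.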
ISBN: (print) 9781479920327
The embarrassingly parallel (EP) algorithm, which is typical of many Monte Carlo applications, provides an estimate of the upper achievable limits for double-precision performance of parallel supercomputers. Recently, Intel released the Many Integrated Core (MIC) architecture as a many-core co-processor. MIC often offers more than 50 cores, each of which can run four hardware threads as well as 512-bit vector instructions. In this paper, we describe how the EP algorithm is accelerated effectively on platforms containing MIC using the offload execution model. The results show that an efficient implementation of the EP algorithm on MIC can take full advantage of MIC's computational resources and achieves a speedup of 3.06 compared with that on the Intel Xeon E5-2670 CPU. Based on the EP algorithm on MIC and an effective task distribution model, the implementation of the EP algorithm on a CPU-MIC heterogeneous platform achieves a performance of up to 2134.86 Mop/s and a 4.04 times speedup compared with that on the Intel Xeon E5-2670 CPU.
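To make the structure of an EP workload concrete, here is a minimal serial sketch loosely modelled on the NAS Parallel Benchmarks EP kernel, which the abstract does not name explicitly. It draws Gaussian pairs with the polar method and tallies them in square annuli. The names `ep_chunk` and `combine`, the use of Python's `random` module instead of the benchmark's own generator, and the per-chunk seeding are assumptions; nothing here reflects the paper's MIC offload code.

```python
import math
import random

NQ = 10  # number of square annuli to tally (assumed, following NAS EP)

def ep_chunk(n_pairs, seed):
    """Run one independent chunk; returns (annulus_counts, sum_x, sum_y)."""
    rng = random.Random(seed)
    counts = [0] * NQ
    sum_x = sum_y = 0.0
    for _ in range(n_pairs):
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        t = x * x + y * y
        if 0.0 < t <= 1.0:  # accept points inside the unit circle
            factor = math.sqrt(-2.0 * math.log(t) / t)
            gx, gy = x * factor, y * factor  # independent Gaussian deviates
            ring = int(max(abs(gx), abs(gy)))
            if ring < NQ:
                counts[ring] += 1
            sum_x += gx
            sum_y += gy
    return counts, sum_x, sum_y

def combine(results):
    """Reduce per-chunk results; the only step that needs communication."""
    total = [0] * NQ
    sum_x = sum_y = 0.0
    for counts, sx, sy in results:
        total = [a + b for a, b in zip(total, counts)]
        sum_x += sx
        sum_y += sy
    return total, sum_x, sum_y

chunks = [ep_chunk(10_000, seed) for seed in range(4)]
totals, sx, sy = combine(chunks)
print(totals, sx, sy)
```

Because the chunks share no state, the list comprehension could be replaced by any parallel map, which is what makes the workload a natural fit for splitting between a CPU and a co-processor.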
The double-precision matrix-matrix multiplication (DGEMM) on ARMv8 64-bit multi-core processor architecture was realized and optimized, and the optimal model for the purpose of maximizing the compute-to-memory access ...
To reduce the access latencies of end hosts, latency-sensitive applications need to choose suitably close service machines to answer the access requests from end hosts. *** K nearest neighbor search locates the K service machines closest to end hosts, which can efficiently optimize the access latencies for end hosts. *** work has weaknesses in terms of accuracy and *** to the scalable and accurate K nearest neighbor search problem, we propose a distributed K nearest neighbor search method called DKNNS in this *** machines are organized into a locality-aware multilevel *** first locates a service machine that starts the search process based on a farthest neighbor search scheme, then discovers the K nearest service machines based on a backtracking approach within the proximity region containing the target in the latency *** analysis, simulation results, and deployment experiments on PlanetLab show that DKNNS can determine K approximately optimal service machines with modest completion time and query ***, DKNNS is also quite stable, so it can be used to reduce frequent searches by caching found nearest neighbors.
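For orientation, the problem that DKNNS addresses can be stated as a centralized brute-force baseline: given a measured latency from the target to every service machine, return the K smallest. The sketch below is not DKNNS (it involves no distribution, farthest neighbor search, or backtracking), and the function name and sample latencies are made up for illustration.

```python
import heapq

def k_nearest(latency_ms, k):
    """Return the k machine names with the lowest latency, closest first.

    latency_ms maps a service machine name to its measured latency in ms.
    """
    return heapq.nsmallest(k, latency_ms, key=latency_ms.get)

# Hypothetical measurements from one end host.
measured = {"sm-a": 30.0, "sm-b": 5.0, "sm-c": 12.0, "sm-d": 48.0}
print(k_nearest(measured, 2))  # ['sm-b', 'sm-c']
```

This baseline requires probing every machine from every target, which is the cost a distributed search is meant to avoid.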
Many big data applications receive and process data in real time. These data, also known as data streams, are generated continuously and processed online with low latency. A data stream is prone to dramatic changes in volume, since its workload may vary by several orders of magnitude between peak and valley periods. Fully provisioning resources for stream processing to handle the peak load is costly, while over-provisioning is wasteful when dealing with lightweight workloads. Cloud computing emphasizes that resources should be utilized economically and elastically. An open question is how to allocate query tasks adaptively to keep up with the input rate of the data stream. Previous work focuses on using either local or global capacity information to improve the cluster CPU resource utilization, while the bandwidth utilization, which is also critical to the system throughput, is ignored or simplified. In this paper, we formalize the operator placement problem considering both CPU and bandwidth usage, and introduce the Elastic Allocator. The Elastic Allocator uses a quantitative method to evaluate a node's capacity and bandwidth usage, and exploits both local and global resource information to allocate query tasks in a graceful manner to achieve high resource utilization. The experimental results and a simple prototype built on top of Storm demonstrate that the Elastic Allocator is adaptive and feasible in a cloud computing environment, and has the advantage of improving and balancing system resource utilization.
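What it means to place an operator while weighing both CPU and bandwidth can be shown with a small greedy sketch. This is not the Elastic Allocator's method, which the abstract does not detail. The `Node` class, the normalized 0 to 1 resource fractions, and the rule of choosing the node whose tighter resource has the most headroom after placement are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_free: float  # fraction of CPU capacity still available (0..1)
    bw_free: float   # fraction of network bandwidth still available (0..1)

def place(task_cpu, task_bw, nodes):
    """Place one task greedily; returns the chosen node name or None.

    Assumed rule: among nodes that can fit the task, pick the one whose
    scarcer resource keeps the most headroom afterwards.
    """
    feasible = [n for n in nodes
                if n.cpu_free >= task_cpu and n.bw_free >= task_bw]
    if not feasible:
        return None
    best = max(feasible,
               key=lambda n: min(n.cpu_free - task_cpu, n.bw_free - task_bw))
    best.cpu_free -= task_cpu
    best.bw_free -= task_bw
    return best.name

cluster = [Node("n1", cpu_free=0.9, bw_free=0.2),
           Node("n2", cpu_free=0.6, bw_free=0.6)]
print(place(0.3, 0.1, cluster))  # n2: n1 has more CPU but little bandwidth
```

A CPU-only policy would have chosen `n1` here, which illustrates why ignoring bandwidth can leave a node network-bound.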