While in-memory databases have largely removed I/O as a bottleneck for database operations, loading data from storage into memory remains a significant limit on end-to-end performance. Snappy is a widely used compression algorithm in the Hadoop ecosystem and in database systems, and is an option in often-used file formats such as Parquet and ORC. Compression reduces the amount of data that must be transferred to and from storage, saving both storage space and storage bandwidth. While a CPU Snappy decompressor can easily keep up with the bandwidth of a hard disk drive, CPU decompression speed is insufficient for NVMe devices attached over high-bandwidth connections such as PCIe Gen4 or OpenCAPI. We propose an FPGA-based Snappy decompressor that can process multiple tokens in parallel and operates on each FPGA block RAM independently. Read commands are recycled until the read data is valid, dramatically reducing control complexity. One instance of our decompression engine takes 9% of the LUTs in the XCKU15P FPGA and achieves a decompression rate of up to 3 GB/s measured at the input side (5 GB/s at the output side), about an order of magnitude faster than a single CPU thread. Parquet allows independent decompression of multiple pages, and instantiating eight of these units on an XCKU15P FPGA can keep up with the highest-performance interface bandwidths.
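To make the "tokens" in the abstract above concrete, here is a minimal sequential sketch of decoding the raw Snappy block format in Python. It is a software reference only, not the FPGA design described in the abstract; the function name `snappy_decompress` is hypothetical, and the sketch assumes an unframed raw block with no checksums. It shows the two token kinds (literals and back-reference copies) and why a copy depends on bytes that earlier tokens produced, which is what makes parallel token processing hard.

```python
def snappy_decompress(data: bytes) -> bytes:
    """Decode one raw Snappy block (illustrative, sequential, unoptimized)."""
    # Preamble: uncompressed length as a little-endian base-128 varint.
    pos, expected, shift = 0, 0, 0
    while True:
        byte = data[pos]
        pos += 1
        expected |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7

    out = bytearray()
    while pos < len(data):
        tag = data[pos]
        pos += 1
        kind = tag & 0x03  # low two bits select the token type

        if kind == 0:
            # Literal: upper six bits hold length-1, or 60..63 meaning the
            # length-1 value follows in 1..4 little-endian bytes.
            length = (tag >> 2) + 1
            if length > 60:
                nbytes = length - 60
                length = int.from_bytes(data[pos:pos + nbytes], "little") + 1
                pos += nbytes
            out += data[pos:pos + length]
            pos += length
            continue

        if kind == 1:
            # Copy with 11-bit offset: 3 length bits, 3 high offset bits in tag.
            length = ((tag >> 2) & 0x07) + 4
            offset = ((tag >> 5) << 8) | data[pos]
            pos += 1
        elif kind == 2:
            # Copy with 16-bit little-endian offset.
            length = (tag >> 2) + 1
            offset = int.from_bytes(data[pos:pos + 2], "little")
            pos += 2
        else:
            # Copy with 32-bit little-endian offset.
            length = (tag >> 2) + 1
            offset = int.from_bytes(data[pos:pos + 4], "little")
            pos += 4

        if offset == 0 or offset > len(out):
            raise ValueError("copy offset points outside decoded data")
        # Byte-by-byte copy so overlapping copies (offset < length) work.
        for _ in range(length):
            out.append(out[-offset])

    if len(out) != expected:
        raise ValueError("decoded length does not match the preamble")
    return bytes(out)


# Hand-built stream: length 10, literal "ab", then copy(length=8, offset=2).
stream = bytes([0x0A, 0x04, 0x61, 0x62, 0x11, 0x02])
print(snappy_decompress(stream))
```

The copy loop reads from the output being written, so each copy token must wait for the data it references; a hardware design has to resolve that dependency explicitly.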
ISBN: 9782839918442 (print)
Deploying advanced Simultaneous Localisation and Mapping (SLAM) algorithms in autonomous low-power robotics will enable emerging applications that require an accurate and information-rich reconstruction of the environment. This has not been achieved so far because accuracy and dense 3D reconstruction come with high computational complexity. This paper discusses custom hardware design on a novel platform for embedded SLAM: an FPGA-SoC, which combines an embedded CPU and programmable logic on the same chip. Programmable logic tightly integrated with an efficient multicore embedded CPU stands to provide an effective solution to this problem. In this work, an average frame rate of more than 4 frames/second at a resolution of 320x240 has been achieved, with an estimated power of less than 1 W for the custom hardware. Compared with the software-only version running on a dual-core ARM processor, an acceleration of 2x has been achieved for LSD-SLAM without any compromise in the quality of the result.
ISBN: 9781450353168 (print)
FPGAs have been demonstrated as promising platforms to accelerate graph processing applications at scale with superior energy efficiency. However, programming FPGAs is significantly more challenging than developing comparable software solutions. To address this productivity challenge, several graph processing frameworks for FPGAs have been proposed in recent years. These frameworks aim to lower the programmer's burden by requiring users to provide only the logic specific to the target graph algorithm, leaving the automatic generation of the rest of the hardware design to the framework. In this work, we extend the capability of the GraVF framework and increase the scale of its supported input graphs by a) making the synchronization method independent of the network structure and b) adding support for off-chip memory. The improved system accepts graphs an order of magnitude larger than previously reported and provides throughput on the order of 100 MTEPS per processing element.
Thanks to their excellent performance on typical artificial intelligence problems, deep neural networks have drawn a lot of interest lately. However, this comes at the cost of large computational needs and high power consumption. Achieving high precision at acceptable hardware cost on these difficult problems is a challenge. To address it, we advocate the use of ternary neural networks (TNNs) which, when properly trained, can reach results close to the state of the art obtained with floating-point arithmetic. We present a highly versatile, FPGA-friendly architecture for TNNs in which both the number of bits of the input data and the level of parallelism can be varied at synthesis time, allowing throughput to be traded for hardware resources and power consumption. To demonstrate the efficiency of our proposal, we implement high-complexity convolutional neural networks on the Xilinx Virtex-7 VC709 FPGA board. While reaching better accuracy than comparable designs, we can target either high throughput or low power. We measure a throughput of up to 27 000 fps at ≈7 W or up to 8.36 TMAC/s at ≈13 W.
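As a rough illustration of why ternary weights suit FPGA logic, the sketch below restricts weights to -1, 0 and +1 so that a neuron's weighted sum needs only additions and subtractions. The fixed-threshold rule and the names `ternarize` and `ternary_dot` are assumptions made for illustration; the abstract does not describe how its networks are trained or quantized, and this is not the architecture it presents.

```python
def ternarize(weights, threshold):
    """Map real-valued weights to {-1, 0, +1} with a fixed threshold.

    The thresholding rule is an illustrative assumption, not the
    training procedure used in the work summarized above.
    """
    return [1 if w > threshold else -1 if w < -threshold else 0
            for w in weights]


def ternary_dot(inputs, ternary_weights):
    """Weighted sum with ternary weights: no multiplications needed."""
    acc = 0
    for x, w in zip(inputs, ternary_weights):
        if w == 1:
            acc += x      # +1 weight: add the input
        elif w == -1:
            acc -= x      # -1 weight: subtract the input
        # 0 weight: the input is skipped entirely
    return acc


# Hypothetical values chosen only to exercise the functions.
tw = ternarize([0.8, -0.05, -0.6, 0.1], threshold=0.3)
print(tw, ternary_dot([3, 7, 2, 5], tw))
```

Because every product collapses to an add, a subtract, or nothing, the multipliers of a conventional design can be replaced by small adder trees, which is one reason such networks can be cheap in hardware.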
In recent years, the RapidSmith CAD tool [1] has been used with ISE to create custom CAD tools targeting Xilinx FPGAs. This tool flow was based on the Xilinx Design Language (XDL), a human-readable representation of a...
ISBN: 9781509053230 (print)
In this work it is shown that compressively strained Ge1-xSnx/Ge quantum wells (QWs) grown on a Ge virtual substrate are a very promising TE-mode gain medium. Moreover, we show how emission wavelength and polarization can be controlled in Ge1-wSnw/SiyGe1-x-ySnx QWs. The capabilities of the presented QW systems are demonstrated through an analysis of the transverse electric (TE) and transverse magnetic (TM) modes of the material gain, which is calculated from the 8-band k·p electronic band structure.
ISBN: 9781538626795 (print)
Modelling for renewable energy system (RES) performance evaluation and impact analysis can be a challenging task because it involves multiple criteria, both quantitative and qualitative, under uncertainty. Multi-criteria decision analysis (MCDA) methods have become increasingly popular in decision-making for renewable energy systems because of the multi-dimensionality of the sustainability goal and the complexity of the technical, environmental, economic and social perspectives. This paper reviews the applications of MCDA methods to the performance modelling and impact analysis of RESs, primarily in four areas: renewable energy planning and policy, renewable energy evaluation and assessment, renewable energy project selection and allocation, and RES environmental impact assessment. Further research can be conducted to study the feasibility of different RESs and the selection of an appropriate MCDA methodology in alternative energy decision-making.
In this paper, we propose a new double-gate (DG) PN-type tunneling field-effect transistor that exploits an induced channel layer (iTFET) for line tunneling and low-power applications (V_D = 0.1 V and 0.05 V). Unlike a conventional TFET, using the same type of doping in the source and underneath the gate changes the topology from a PIN to a PN TFET. We replace point tunneling with line tunneling, which improves the current as well as the average and minimum subthreshold swing. With the PN structure, the ambipolar effect can be effectively suppressed. Furthermore, the proposed topology improves the double- and vertical-gate results. These devices achieve an ON current I_ON (I_d at V_G = V_D) of 4.1 × 10^-7 A/μm, an OFF current I_OFF (I_d at V_G = 0 V) of 2.1 × 10^-17 A/μm, and an ON/OFF current ratio (I_ON/I_OFF) of 1.95 × 10^10. An average subthreshold swing SS_avg = 12.2 mV/dec and a minimum subthreshold swing SS_min = 5 mV/dec are obtained.
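The reported ON/OFF ratio can be checked directly from the two currents quoted in the abstract. The short Python sketch below does only that arithmetic; the variable names are illustrative, and expressing the ratio in decades is an added convenience, not a figure from the source.

```python
import math

# Currents as quoted in the abstract, in A/um.
i_on = 4.1e-7    # ON current, I_d at V_G = V_D
i_off = 2.1e-17  # OFF current, I_d at V_G = 0 V

# ON/OFF ratio is dimensionless because the units cancel.
ratio = i_on / i_off

# The same ratio expressed as a number of decades of current swing.
decades = math.log10(ratio)

print(f"I_ON/I_OFF = {ratio:.3g} ({decades:.1f} decades)")
```

The computed ratio agrees with the stated 1.95 × 10^10 to three significant figures.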
ISBN: 9781509046034 (print)
The P300-based brain-computer interface (BCI) is one of the most common BCIs. Because the characteristics of P300 responses vary from person to person, a large amount of labeled data must be collected from each user, which is time-consuming in many applications. In this work, a transfer learning method that dynamically adjusts the weights of instances is applied to improve the P300-based BCI. Offline experiments on the BCI Competition III dataset and a P300 speller dataset from ALS patients demonstrate the method's robustness across subjects and its validity when sufficient data from different individuals are available. Online experiments on our P300-based robot control system demonstrate that classification performance can be improved by up to 13.02% compared with traditional classifiers.
ISBN: 9781509060344 (print)
Human computation (HC) is an active research field in which people play a notable role as computational elements in an automated system, with the aim of arriving at a truly symbiotic human-computer interaction. Situational awareness (SA) and decision support systems (DSSs) are two domains where human computation is rapidly advancing, with the latter arising as an invaluable vehicle to achieve the former. Fuzzy systems and fuzzy logic are two commonly employed tools in these domains because of their inherent capability to represent and process vague and imprecise information while conveying the analysis results in an interpretable fashion. In this paper, we elaborate on the human computation aspects of risk analysis within SA and DSSs conducted with the aid of fuzzy sets. The study makes the following contributions: (1) we argue that risk analysis must be a highly automated yet still human-centric endeavour and highlight four ways in which the human component provides value to the underlying data/information fusion processes; (2) we illustrate this fuzzy/human risk analysis methodology through a multimodular Risk Management Framework (RMF) architecture and its application to the maritime domain, particularly in hard-soft data fusion, automated response generation to maritime incidents, port anomaly filtering and dynamic risk management triggered by contextual knowledge; and (3) we argue that the framework under discussion can be extrapolated to other domains with negligible effort.