Pedestrian detection is a significant field within computer vision, closely related to object detection, with applications spanning surveillance, autonomous driving, optical character recognition (OCR), and face recog...
The increasing prevalence of DeepFake technology poses significant threats to various industries and public trust, making the development of robust detection methods crucial. In this study, we propose a novel approach...
Medical image processing has revolutionized the way in which healthcare professionals diagnose and treat various diseases and conditions. GI cancer is the fastest-growing cancer in recent times with an estimated 5 mil...
Accurately predicting loan repayment behavior is a critical challenge for financial institutions, which often face high default rates and financial instability due to inaccurate credit assessments. In order to address...
Content Delivery Networks (CDNs) are fundamental to modern Internet content distribution, enabling high-speed delivery of web pages, videos, and other online resources. Optimizing caching strategies within CDNs is cru...
This paper introduces a personalized memory aid system that combines a high-performance cloud-based large language model (LLM) with a low-power edge device running a small language model (sLM) to enhance user producti...
ISBN: 9781665476522 (print)
Recent research shows that artificial intelligence (AI) algorithms can dramatically improve the profitability of high-frequency trading (HFT) through accurate market prediction, overcoming the limitations of conventional latency-oriented approaches. However, it is challenging to integrate computationally intensive AI algorithms into the existing trading pipeline because of their excessively long latency and insufficient throughput, which necessitates a breakthrough in hardware. Furthermore, harsh HFT conditions such as bursty data traffic and stringent power constraints make it even more difficult to achieve system-level performance without missing crucial market signals. In this paper, we present LightTrader, the world's first AI-enabled HFT system that incorporates an FPGA and custom AI accelerators for short-latency, high-throughput trading. Leveraging the computing power of brand-new AI accelerators fabricated in TSMC's 7nm FinFET technology, LightTrader optimizes the tick-to-trade latency and response rate for stock market data. The AI accelerators adopt a Coarse-Grained Reconfigurable Array (CGRA) architecture, which maximizes hardware utilization through its flexible dataflow, and achieve a throughput of 16 TFLOPS and 64 TOPS. In addition, we propose both workload scheduling and dynamic voltage and frequency scaling (DVFS) scheduling algorithms to find an optimal offloading strategy under bursty market data traffic and limited power conditions. Finally, we build a reliable and rerunnable simulation framework that can back-test historical market data, such as data from the Chicago Mercantile Exchange (CME), to evaluate the LightTrader system. We thoroughly explore the performance of LightTrader as the number of AI accelerators, the power conditions, and the complexity of the deep neural network models change. As a result, LightTrader achieves 13.92x and 7.28x speed-ups in AI algorithm processing compared to existing GPU-based and FPGA-based systems, respectively. LightTrader wit
Phishing emails remain a significant cybersecurity threat, bypassing traditional rule-based detection methods. This paper proposes a novel Machine Learning (ML) and Natural Language Processing (NLP) based approach for...
ISBN: 9781665451550 (digital)
ISBN: 9781665451550 (print)
The convolution operator is a crucial kernel for many computer vision and signal processing applications that rely on deep learning (DL) technologies. As such, the efficient implementation of this operator has received considerable attention in the past few years for a fair range of processor architectures. In this paper, we follow the technology trend toward integrating long SIMD (single instruction, multiple data) arithmetic units into high-performance multicore processors to analyse the benefits of this type of hardware acceleration for latency-constrained DL workloads. For this purpose, we implement and optimise three distinct methods for calculating the convolution on the Fujitsu A64FX processor: the lowering approach, a blocked variant of the direct convolution algorithm, and the Winograd minimal filtering algorithm. Our experimental results include an extensive evaluation of the parallel scalability of these three methods and a comparison of their global performance using three popular DL models and a representative dataset.
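To make the difference between two of the methods named above concrete, the following is a minimal sketch, not the authors' A64FX implementation. It assumes a single-channel input, stride 1, no padding, and the DL convention of convolution as cross-correlation. The function names (`conv2d_direct`, `im2col`, `conv2d_lowering`) are hypothetical and chosen only for illustration. The direct version loops over every output pixel and filter tap; the lowering version first copies each input patch into a row of a matrix, so that the convolution becomes a matrix-vector product.

```python
def conv2d_direct(image, kernel):
    """Direct convolution: explicit loops over output pixels and filter taps."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out


def im2col(image, kh, kw):
    """Lowering step: one row per output pixel, holding its flattened kh x kw patch."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(oh)
        for j in range(ow)
    ]


def conv2d_lowering(image, kernel):
    """Lowering approach: im2col followed by a matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    patches = im2col(image, kh, kw)
    flat_out = [sum(p * w for p, w in zip(patch, flat_kernel)) for patch in patches]
    # Reshape the flat result back into rows of the output image.
    return [flat_out[r:r + ow] for r in range(0, len(flat_out), ow)]


# Illustrative data only (not taken from the paper).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0],
    [0, -1],
]

assert conv2d_direct(image, kernel) == conv2d_lowering(image, kernel)
```

The lowering step duplicates overlapping input values (here a 4 x 4 image becomes a 9 x 4 patch matrix), trading extra memory for a computation that maps onto a single dense matrix product.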
The progression in quantum computing and the rapid development of quantum computation hardware have raised expectations for its application to commercially relevant use cases in the future. However, the need for high-l...