ISBN (digital): 9798331515966
ISBN (print): 9798331515973
Point clouds are crucial for 3D geometry representation and vital in applications such as autonomous driving and augmented reality. Despite advances in deep-learning-based analytics, their high computational cost limits deployment on edge devices with constrained resources. To this end, we analyze PointNet++, a leading point cloud analytics framework, and identify two major bottlenecks: 1) the GPU is underutilized due to limited parallelism and excessive kernel launches in the sampling and voting stages, and 2) irregular memory accesses dominate the grouping stage. To address these, we propose parallel sampling and voting to enhance GPU utilization, and fuse subroutines in grouping to improve memory efficiency. Experimental results demonstrate that our optimizations yield significant speedups (up to $5.0\times$, $3.2\times$ on average) across various point cloud workloads on edge devices.
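PointNet++'s sampling stage is farthest point sampling (FPS). A minimal NumPy sketch of FPS with a vectorized distance update — the kind of data-parallel work that the parallel-sampling optimization above targets — might look like this (array sizes and the seed are illustrative, not from the paper):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Select k well-spread points from an (n, 3) array."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(k, dtype=np.int64)
    chosen[0] = rng.integers(n)
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, k):
        chosen[i] = int(np.argmax(dist))  # farthest remaining point
        new_d = np.linalg.norm(points - points[chosen[i]], axis=1)
        dist = np.minimum(dist, new_d)    # one batched, data-parallel update
    return chosen

pts = np.random.default_rng(1).random((1024, 3))
idx = farthest_point_sampling(pts, 16)
```

Each iteration is still sequential, but the per-point distance update is a single batched operation — exactly the kind of work that maps onto many GPU threads instead of many small kernel launches.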
ISBN (print): 9783030953881; 9783030953874
Nowadays, GPUs are becoming popular across a broad range of domains. To provide virtual memory support for most applications, GPUs introduce an address translation process. However, many applications exhibit irregular memory access patterns, i.e., accesses that are poorly structured and often data dependent, which degrades performance, especially with virtual-to-physical address translation. The GPU memory management unit (MMU) adopts caching units, e.g., the page walk buffer (PWB) and page walk cache (PWC), together with scheduling strategies, to accelerate address translation after TLB misses. However, limited by the linear table structure of the traditional PWB and PWC, they hold much redundant information, which further limits the performance of irregular applications. Although a nonlinear structure can eliminate the redundancy, it requires sequential look-ups on the PWB and PWC, which brings a greater performance loss. In this paper, we propose a multilevel PWB and PWC structure, which features a multi-level organization that eliminates the redundancy of the traditional structure and a co-design of the PWB and PWC that enables parallel look-up. Besides, we design four corresponding address translation processes to ensure the efficiency of the new structure. We evaluate our design with real-world benchmarks on the GPGPU-Sim simulator. Results show that our design achieves a 42.6% IPC improvement with 35.1% lower space overhead.
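A rough software model of the multilevel idea — per-level caches keyed by index-path prefixes, so an upper-level page-table entry shared by many translations is stored once — might look like this. This is our own sketch, not the paper's hardware design; in hardware the per-level tables would be probed in parallel, while here we simply report the deepest hit:

```python
PAGE_BITS = 12
LEVEL_BITS = 9  # 4-level radix page table, x86-64-style

def split_vpn(vaddr):
    """Split a virtual address into the four per-level table indices."""
    vpn = vaddr >> PAGE_BITS
    return [(vpn >> (LEVEL_BITS * i)) & ((1 << LEVEL_BITS) - 1)
            for i in reversed(range(4))]

class MultiLevelPWC:
    def __init__(self):
        # One small table per cached level (leaf entries come from memory),
        # keyed by the index-path prefix down to that level.
        self.levels = [dict() for _ in range(3)]

    def fill(self, vaddr):
        idx = split_vpn(vaddr)
        for depth in range(1, 4):
            self.levels[depth - 1][tuple(idx[:depth])] = True

    def lookup(self, vaddr):
        """Return how many page-table levels the walk can skip."""
        idx = split_vpn(vaddr)
        for depth in range(3, 0, -1):
            if tuple(idx[:depth]) in self.levels[depth - 1]:
                return depth
        return 0

pwc = MultiLevelPWC()
pwc.fill(0x7F12_3456_7000)
```

Because upper-level prefixes are shared, translating a neighboring page reuses the cached levels without duplicating them per full path — the redundancy the paper attributes to the traditional linear structure.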
This paper presents an intelligent system to manage risks in smart cities. The system mines big social media perception data, using natural language processing to identify and assess risks, and will help smart-city authorities in their risk management processes. In addition, a survey of the most relevant research articles in the area of risk management in smart cities is included. We also provide a framework for the proposed system and describe it mathematically, deriving explicit expressions for tasks such as identifying risks in tweets. Moreover, real data is collected from the Twitter profiles of London citizens, and risk analysis and assessment are performed. Finally, performance measures are included to show the validity of the proposed system.
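As an illustration only (the abstract does not give the paper's NLP pipeline), the simplest form of risk identification in tweets is weighted keyword matching. All terms and weights below are hypothetical:

```python
import re

# Hypothetical risk lexicon: term -> severity weight.
RISK_TERMS = {"flood": 3, "fire": 3, "outage": 2, "accident": 2, "delay": 1}

def assess_risk(tweet):
    """Return (total risk score, matched risk terms) for one tweet."""
    words = re.findall(r"[a-z]+", tweet.lower())
    hits = [w for w in words if w in RISK_TERMS]
    return sum(RISK_TERMS[w] for w in hits), hits

score, hits = assess_risk("Power outage and fire reported near the station")
```

A real system would replace the lexicon with a trained classifier, but the interface — tweet in, risk score and evidence out — stays the same.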
ISBN (digital): 9798350372977
ISBN (print): 9798350372984
The projection of LiDAR 3D point cloud data is one of the crucial steps in computer vision applications and involves several stages to achieve accurate final results. Many current studies leverage the computing capability of GPUs. This paper presents a comparative study of the implementation and testing of this process on single-core CPU, multi-core CPU, and GPU architectures. The computational efficiency of each platform is evaluated through a series of benchmarks, including data extraction, segmentation, and transformation tasks. Our analysis reveals the inherent parallelization benefits of GPUs in handling large-scale point cloud data, while also considering the accessibility of multi-core CPUs. A comparison between the NVIDIA RTX 3070 and NVIDIA RTX 4060 is also provided: the RTX 3070 showed roughly an 8-times speedup over the RTX 4060, and the multi-core implementation outperforms the single-core one by up to 10 times. Overall, these results show the benefits of multi-core and GPU acceleration for this application, with room for further improvement.
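The projection step itself can be sketched with a pinhole camera model in NumPy. The intrinsics matrix `K` below is hypothetical, and the single batched matrix multiply is precisely the part that parallelizes naturally on a GPU:

```python
import numpy as np

# Hypothetical camera intrinsics (focal length 700 px, center 320x240).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(points_xyz):
    """Project Nx3 points (camera frame, z forward) to pixel coordinates."""
    z = points_xyz[:, 2]
    front = z > 0.1                      # drop points behind the camera
    p = (K @ points_xyz[front].T).T      # one batched matrix multiply
    return p[:, :2] / p[:, 2:3], front   # perspective divide

pts = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0], [0.0, 0.0, -5.0]])
uv, mask = project(pts)
```

On a GPU, each point's multiply-and-divide is an independent thread; on a multi-core CPU, the same loop splits into chunks, which matches the parallelization benefits the benchmarks measure.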
We develop a novel multi-cloud container orchestration architecture for high-performance Real-Time Online Interactive applications (ROIA), with use cases including product configurators, multiplayer online gaming, e-learning, and e-training. Running the core components of ROIA, e.g., real-time 3D rendering, on a multi-cloud enables access to high-performance resources and prevents proprietary ‘vendor lock-in’. Our container orchestration addresses: (1) strict Quality of Service (QoS) requirements, (2) secure communication between cluster nodes from different clouds, (3) automatic scalability, and (4) resource usage optimization. We improve on previous work by using session slots, which limit the number of concurrent user sessions per service instance without loss of QoS. Our implementation provides a vendor-independent, OpenVPN-based interconnection between cloud nodes, both Linux and Windows, possibly located in different LANs of a multi-cloud. We experimentally evaluate our orchestration approach on a Kubernetes-based cluster using a prototype of an interactive car configurator.
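The session-slot idea can be sketched as a small placement routine. Class and function names here are our own illustration, not the paper's implementation:

```python
# Each service instance advertises a fixed number of concurrent-session
# slots; a new user session goes to an instance with a free slot, and the
# orchestrator scales out when no instance has one.
class Instance:
    def __init__(self, name, slots):
        self.name, self.slots, self.used = name, slots, 0

    def free(self):
        return self.slots - self.used

def place_session(instances):
    """Return the instance to host a new session, or None to trigger scale-out."""
    candidates = [i for i in instances if i.free() > 0]
    if not candidates:
        return None  # signal the orchestrator to start a new instance
    best = max(candidates, key=Instance.free)  # least-loaded instance first
    best.used += 1
    return best

cluster = [Instance("a", 2), Instance("b", 2)]
first = place_session(cluster)
```

The fixed per-instance limit is what makes QoS predictable: an instance never accepts more concurrent sessions than it can serve at the agreed quality.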
ISBN (print): 9783031061561; 9783031061554
Even though a large-scale graph structure is a powerful model for solving several challenging problems across application domains today, it can also preserve raw essences of user behavior, especially in the e-commerce domain. Information extraction from large-scale graph data is a promising research area for deep learning algorithms. This study focuses on understanding users' implicit navigational behavior on an e-commerce site, which we represent as large-scale graph data. We propose a GAN-based e-business workflow that leverages the large-scale browsing graph and the footprints of users' navigational behavior on the e-commerce site. With this method, we discovered various frequently repeated click-stream sequences that do not appear in the training data at all. We also developed a prototype application to run performance tests on the proposed workflow. Our experimental studies show that the proposed methodology produces noticeable and reasonable outcomes for the prototype application.
A traditional computing system is limited in its functionality. Cloud computing provides ubiquitous access to network resources and the Internet of Things (IoT) devices growing rapidly. IoT devices can't perform c...
Blockchain technology has demonstrated that it can process distributed transactions securely and efficiently, and it supports a wide range of applications, e.g., handling smart contracts and Bitcoin cash. Blockchain systems may allow automated data exchange and reasoning, resulting in increased effectiveness and lower costs, made possible by the adoption of smart-contract technology and decentralized ledgers. Additionally, blockchain technologies can enhance data integrity and security, enabling more precise and reliable analysis of information. Integrating Ethereum with data science may make it easier to create decentralized apps, opening up novel industries and income sources. However, there are still issues with scalability, interoperability, and complexity in combining blockchain technology with data science. For this technology to achieve its full potential, research and development in this field must continue. Blockchain applications for data science are currently being investigated. This essay examines how blockchain technology is used in cybersecurity and data science.
Logs, which record execution information and system status, are useful for anomaly detection. Traditionally, developers checked logs manually through keyword search or rule matching; however, as the number of logs increases, this approach becomes unrealistic, and more and more log anomaly detection methods based on machine learning and deep learning have been proposed. To detect log anomalies accurately and effectively, this paper proposes a comprehensive framework, TCN-Log2Vec. Building on Drain, a prevalent and effective method for extracting log templates from raw logs, TCN-Log2Vec optimizes the log parsing step and captures sequence, quantitative, and semantic information from the original logs. It also designs an anomaly detection module based on a TCN (Temporal Convolutional Network) to achieve parallel processing. Our log parsing method achieves 99% accuracy on the HDFS dataset and 94% on the BGL dataset. We compare our anomaly detection framework with other advanced methods, including DeepLog, LogAnomaly, and LogRobust; the experimental results show that our model performs better.
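Template extraction in the spirit of Drain can be illustrated by masking variable tokens so that structurally identical log lines collapse into one template. The real Drain uses a fixed-depth parse tree; this regex-only version is a deliberate simplification, and the sample lines are illustrative HDFS-style logs:

```python
import re
from collections import Counter

def to_template(line):
    """Mask hex ids and numeric runs so variable fields become <*>."""
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<*>", line)
    line = re.sub(r"\d+", "<*>", line)
    return line

logs = [
    "Received block blk_123 of size 67108864 from 10.250.19.102",
    "Received block blk_456 of size 67108864 from 10.251.43.21",
    "Verification succeeded for blk_123",
]
# Group raw lines by their extracted template.
templates = Counter(to_template(l) for l in logs)
```

Once raw lines map to templates, the sequence of template ids is what a detector like a TCN consumes, alongside the masked-out quantitative values.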
ISBN (print): 9781665435772
In a heterogeneous computing system, different kinds of processors might need to be involved in the execution of a file I/O operation. Since NEC SX-Aurora TSUBASA is one such system, two I/O acceleration mechanisms are offered to reduce the data transfer overheads among the processors during a file I/O operation. This paper first investigates the effects of the two mechanisms on the I/O performance of SX-Aurora TSUBASA. Based on the results, the proper use of the two mechanisms is discussed via a real-world application for flood damage estimation. The results clearly demonstrate the need for auto-tuning, i.e., adaptively selecting either of the two mechanisms while considering application behavior and system configuration.
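The auto-tuning called for above can be sketched generically: time each mechanism on a short probe workload and keep using the faster one. The mechanism names and stand-in workloads below are illustrative; in the paper's setting the choice would be between SX-Aurora TSUBASA's two accelerated I/O paths:

```python
import time

def autotune(mechanisms, probe):
    """mechanisms: dict of name -> callable(workload). Returns fastest name."""
    timings = {}
    for name, fn in mechanisms.items():
        t0 = time.perf_counter()
        fn(probe)
        timings[name] = time.perf_counter() - t0
    return min(timings, key=timings.get)

def slow_io(data):
    for b in data:       # byte-at-a-time: stands in for a transfer-heavy path
        _ = b + 1

def fast_io(data):
    _ = bytes(data)      # one bulk operation: stands in for an offloaded path

best = autotune({"slow": slow_io, "fast": fast_io}, bytearray(200_000))
```

A real tuner would also account for application behavior (access sizes, sequential vs. random) and system configuration rather than a single probe measurement.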