Influence maximization plays a pivotal role in areas such as cybersecurity, public opinion management, and viral marketing. However, conventional methods for influence maximization grapple with challenges such as seed...
With the rapid growth of the on-demand economy, logistics companies and merchants increasingly struggle to meet customer demands in dynamic and uncertain conditions. This paper studies the coordinated delivery of parc...
ISBN (print): 9798350302080
With the accelerating growth of Big Data, real-world graph processing applications now need to tackle graphs with billions of vertices and trillions of edges, thereby increasing the demand for effective solutions to application scalability. Unfortunately, current approaches to implementing these applications on modern HPC systems exhibit poor scale-out performance with increasing numbers of nodes. The scalability challenges for these applications are driven by large data sizes, synchronization overheads, and fine-grained communications with irregular data accesses and poor locality. This paper studies the scalability of a novel Actor-based programming system, which provides a lightweight runtime that supports fine-grained asynchronous execution and automatic message aggregation atop a Partitioned Global Address Space (PGAS) communication layer. Evaluations of the Jaccard Index and PageRank applications on the NERSC Perlmutter system demonstrate nearly perfect scaling up to 1,000 nodes and 64K cores (one-third of Perlmutter's 3,000-node target). In addition, our Actor-based implementations of Jaccard Index and PageRank achieved parallel efficiencies of 85.7% and 63.4%, respectively, at the largest run of 64K cores. This performance represents a 29.6x speedup relative to UPC and OpenSHMEM versions of PageRank.
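To make the mechanism concrete, below is a minimal sketch of the automatic message aggregation idea the abstract attributes to the Actor runtime: fine-grained sends are buffered per destination and flushed as one bulk transfer. The `Mailbox` class, `AGG_SIZE` threshold, and `send_bulk` hook are illustrative assumptions, not the system's actual API.

```python
from collections import defaultdict

# Sketch of automatic message aggregation: buffer fine-grained actor
# messages per destination and flush them as one bulk PGAS transfer.
# Names (Mailbox, AGG_SIZE, send_bulk) are assumptions for illustration.

AGG_SIZE = 1024  # messages buffered per destination before a flush

class Mailbox:
    def __init__(self, send_bulk):
        self.buffers = defaultdict(list)  # destination rank -> pending messages
        self.send_bulk = send_bulk        # underlying bulk transport (e.g. a PGAS put)

    def send(self, dest, msg):
        buf = self.buffers[dest]
        buf.append(msg)
        if len(buf) >= AGG_SIZE:          # aggregate fine-grained sends into one transfer
            self.flush(dest)

    def flush(self, dest):
        if self.buffers[dest]:
            self.send_bulk(dest, self.buffers[dest])
            self.buffers[dest] = []

    def flush_all(self):                  # called at a termination or sync point
        for dest in list(self.buffers):
            self.flush(dest)
```

Aggregation trades a little per-message latency for far fewer network transfers, which is what lets fine-grained, irregular communication scale out on PGAS systems.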
Big data is an important product of the information age. Integrating big data into smart grid applications and correctly grasping its key technologies can effectively promote the sustainable development of the power industry and the construction of a strong smart grid. For the modern smart grid, this is both an opportunity and a challenge. 3D point cloud data processing is the core content of reverse engineering technology. As an important step in the preprocessing stage, point cloud registration plays an essential role in obtaining the complete 3D coordinates of the measured target surface; however, the speed, accuracy, and reliability of existing registration algorithms still need improvement. Cloud computing technology integrates many inexpensive commodity PCs into a cluster, enabling the safe storage and efficient processing of massive data. We therefore combine cloud computing with data mining algorithms to solve the problem of massive data conversion in the smart grid. This paper introduces cloud computing technology into the field of smart grid condition monitoring: by adopting a distributed file system, improving a traditional density clustering algorithm, and parallelizing its design, the storage and clustering of condition-monitoring big data are handled effectively, providing a feasible method for applying cloud computing to condition monitoring.
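For reference, the serial core of classic density clustering (DBSCAN), which the abstract says is improved and parallelized, looks roughly as follows. This is the textbook baseline, not the paper's improved algorithm, and the `eps`/`min_pts` values are placeholders.

```python
import numpy as np

# Textbook serial DBSCAN baseline (not the paper's improved parallel version).
# eps and min_pts are placeholder parameters.

def region_query(points, i, eps):
    """Indices of all points within eps of points[i]."""
    return np.flatnonzero(np.linalg.norm(points - points[i], axis=1) <= eps)

def dbscan(points, eps=0.5, min_pts=5):
    labels = np.full(len(points), -1)   # -1 = noise / unassigned
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue                    # already assigned to a cluster
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            continue                    # not a core point; stays noise for now
        labels[i] = cluster
        queue = list(neighbors)
        while queue:                    # expand the cluster from core points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                nbrs = region_query(points, j, eps)
                if len(nbrs) >= min_pts:
                    queue.extend(nbrs)
        cluster += 1
    return labels
```

The expensive part is the repeated neighborhood query, which is exactly what a distributed file system plus a parallelized design can spread across a cluster.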
ISBN (digital): 9798350364606
ISBN (print): 9798350364613
As high-performance computing technologies advance, the significance of parallel programming in various domains is becoming increasingly evident, since it allows us to harness the power of heterogeneous computing and solve complex problems more efficiently. However, for students to master this type of computation and apply it in different contexts, they must understand how measuring and optimizing parallel code affects its performance. This paper presents an approach to enhancing students' comprehension of parallel performance metrics through an interactive exercise that complements lectures on parallel performance and improves assessment.
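As a worked illustration of the metrics such an exercise typically targets: speedup is S = T1/Tp and parallel efficiency is E = S/p, where T1 is serial runtime, Tp is runtime on p cores. The sketch below computes both from hypothetical timings (the numbers are made up).

```python
# Standard parallel performance metrics: speedup S = T1 / Tp and
# parallel efficiency E = S / p. The timings below are invented.

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    return speedup(t_serial, t_parallel) / p

t1, t16 = 120.0, 9.6             # hypothetical runtimes in seconds on 1 and 16 cores
s = speedup(t1, t16)             # 12.5x
e = efficiency(t1, t16, 16)      # 0.78 -> 78% parallel efficiency
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")
```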
ISBN (print): 9798350329223
Modern enterprises are facing a massive threat from Advanced Persistent Threats (APTs), which have risen to be one of the most dangerous challenges in recent years. Since system logs capture the complex causal dependencies between system entities, they have become the primary data source for countering APTs. However, as modern computer systems grow more complicated, system logs pile up in large quantities. Moreover, APTs are sophisticated, persistent cyber attacks that can remain hidden in the target for a long time while constantly stealing private data, so system logs must be collected and stored for a long duration to enable a complete analysis. Such a vast amount of log data is challenging for enterprises to store and manage. There are two mainstream solutions for reducing storage overhead. Data compression methods provide an intuitive option, but they are designed for general text and lack optimization for system logs. The other solution, log reduction, removes redundant system events recorded in system logs according to predefined rules; unfortunately, these rules are tailored to specific kinds of redundant information, resulting in limited applicability. Because the two solutions reduce storage overhead from distinct perspectives, they are complementary: data compression shrinks log data in its binary form, while log reduction starts from the semantic information of system logs and removes redundant content. Combining both maximizes storage efficiency. In this paper, we propose a distributed storage system based on a hybrid compression scheme. To address the above deficiencies, we first identify and merge redundant system events by analyzing and tracing the information flow rather than relying on rules. Then, we apply log parsing to preprocess log entries for further storage efficiency. Besides, we design a distributed architecture to optimize compression and eliminate repeated
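As a rough illustration of the event-merging idea (not the paper's actual information-flow tracing), the sketch below folds consecutive events that share the same subject-action-object flow triple into a single counted record; the `Event` type and the merge rule are assumptions.

```python
from dataclasses import dataclass

# Illustrative redundancy merging: consecutive events with an identical
# (subject, action, object) flow triple add no new causality, so they are
# collapsed into one counted record. Event and the merge rule are assumed,
# not taken from the paper.

@dataclass
class Event:
    ts: float        # timestamp
    subject: str     # e.g. process name
    action: str      # e.g. "read", "write"
    obj: str         # e.g. file path or socket
    count: int = 1   # how many raw events this record stands for

def merge_redundant(events):
    """Merge runs of events whose flow triple is identical."""
    merged = []
    for ev in events:
        last = merged[-1] if merged else None
        if last and (last.subject, last.action, last.obj) == (ev.subject, ev.action, ev.obj):
            last.count += 1          # same flow edge: fold it into the previous record
            last.ts = ev.ts          # keep the latest timestamp
        else:
            merged.append(ev)
    return merged
```

The merged stream can then be handed to a log parser and a general-purpose compressor, which is the hybrid scheme the abstract describes.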
High fan-out requests are prevalent in systems employing multi-tier architectures. These requests are divided into several sub-requests for parallel processing. However, a high fan-out request must await all sub-reque...
The P-RECS workshop focuses heavily on practical, actionable aspects of reproducibility in broad areas of computational science and data exploration, with special emphasis on issues in which community collaboration ca...
In this paper, we study the minimum dominating set (MDS) problem and the minimum total dominating set (MTDS) problem. We propose a new idea to compute approximate MDS and MTDS. This new approach can be implemented in ...
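The truncated abstract does not spell out the new approach; for context, the classic greedy ln(n)-approximation for minimum dominating set, the usual baseline such methods are compared against, is sketched below.

```python
# Classic greedy approximation for minimum dominating set (MDS), shown as
# a baseline for context; this is NOT the paper's new approach.

def greedy_mds(adj):
    """adj: dict mapping each vertex to the set of its neighbors."""
    uncovered = set(adj)
    dominating = set()
    while uncovered:
        # pick the vertex that dominates the most still-uncovered vertices
        v = max(adj, key=lambda u: len(({u} | adj[u]) & uncovered))
        dominating.add(v)
        uncovered -= {v} | adj[v]
    return dominating
```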