Caching intermediate results in memory, instead of flushing them to disk, shortens the completion time of big data analytics, because the results do not need to be reloaded for follow-up computations. Constrained by limited memory, traditional approaches cache only part of the results, based on explicit user triggers or simply on their access patterns, and fail to capture instantaneous system dynamics, including the execution order of parallel stages and the current uncompleted dependencies of each stage. The choice of data to cache must capture these system dynamics in order to minimize job completion time. We therefore design a dynamic caching mechanism (SAC) for big data analytics that combines a static sketch of the stage dependencies, built while a job is being prepared, with dynamic adjustment of that sketch during the job's execution. The static sketch determines a minimal subset of stages in each job that maximizes the caching benefit, while the dynamic adjustment updates the caching priority among pending stages. We implement SAC in Spark, and extensive experiments on real-world workloads show that SAC reduces the average job completion time by at least 24.6% compared with state-of-the-art alternatives.
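The idea of a static candidate set plus a dynamically re-ranked priority can be illustrated with a small sketch. This is not SAC's actual algorithm: the `Stage` fields, the "two or more consumers" candidate rule, and the priority formula (pending consumers × reload cost ÷ output size) are all hypothetical assumptions chosen only to show how finishing a stage can change which output is worth keeping in a fixed memory budget.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    output_mb: int        # size of the stage's output if cached (hypothetical)
    reload_cost_s: float  # time saved per reuse when the output stays in memory (hypothetical)
    consumers: list = field(default_factory=list)  # downstream stages reading this output


def static_sketch(stages):
    """Hypothetical static rule: only outputs read by two or more stages are cache candidates."""
    return {s.name for s in stages.values() if len(s.consumers) >= 2}


def priority(stage, completed):
    """Benefit per MB, counting only consumers that have not finished yet."""
    pending = [c for c in stage.consumers if c not in completed]
    return len(pending) * stage.reload_cost_s / stage.output_mb


def choose_cache(stages, candidates, completed, budget_mb):
    """Greedily fill the memory budget with the highest-priority candidates."""
    ranked = sorted(candidates, key=lambda n: priority(stages[n], completed), reverse=True)
    chosen, used = [], 0
    for name in ranked:
        stage = stages[name]
        if priority(stage, completed) > 0 and used + stage.output_mb <= budget_mb:
            chosen.append(name)
            used += stage.output_mb
    return chosen


stages = {
    "A": Stage("A", 400, 30.0, ["C", "D"]),
    "B": Stage("B", 300, 20.0, ["D", "E"]),
    "C": Stage("C", 200, 10.0, ["F"]),
}
candidates = static_sketch(stages)
print(choose_cache(stages, candidates, set(), 500))  # -> ['A']
print(choose_cache(stages, candidates, {"C"}, 500))  # -> ['B']
```

Once stage C finishes, A has only one pending reader left, so the same 500 MB budget is better spent on B; this is the kind of runtime re-prioritization that a purely static policy would miss.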
With the growing data volume of large-scale virtual screening, the associated data processing and management face challenges. We have developed UCAPF, a unified platform for large-scale virtual screening. The platform provides a parallel processing framework for large-scale virtual screening data, and it also supports scheduling on heterogeneous parallel architectures and hierarchical storage of massive data. The processing framework improves data quality: on the CASF-2016 dataset, the standardized molecules processed by UCAPF showed a 9.5% to 14.62% improvement in scoring performance and a 7.9% to 34.6% improvement in ranking performance compared to the raw molecules. For massive data processing, the framework achieves a parallel efficiency of 81.20% for molecule standardization and 79.51% for docking result processing on a 72-unit Hadoop cluster. In addition, the distributed database used for data management speeds up the retrieval of 10,094 molecules from seventy million docking results by a factor of 2.40 compared to the single-node storage model. Finally, we analyze how input/output (I/O) varies over time across the phases of virtual screening to show the effectiveness of the scheduling strategy and tiered storage for the heterogeneous parallel architecture.
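The abstract does not define "parallel efficiency"; assuming the usual definition E = S / p, where the speedup is S = T1 / Tp and p is the number of processing units, the reported figures can be turned back into implied speedups. The helper names below are illustrative, not part of UCAPF.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup S = T1 / Tp."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, units: int) -> float:
    """Parallel efficiency E = S / p, with p processing units."""
    return speedup(t_serial, t_parallel) / units


def implied_speedup(efficiency: float, units: int) -> float:
    """Invert E = S / p to recover the speedup implied by a reported efficiency."""
    return efficiency * units


for task, eff in [("molecule standardization", 0.8120), ("docking result processing", 0.7951)]:
    print(f"{task}: ~{implied_speedup(eff, 72):.1f}x on 72 units")
```

Under that assumption, 81.20% and 79.51% on 72 units correspond to speedups of roughly 58.5× and 57.2× over a single unit.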
Automatic synthesis of efficient scientific parallel programs for supercomputers is in general a complex problem of system parallel programming. Therefore various specialized synthesis algorithms and heuristics are of...
ISBN: 9781728173863 (print)
Digital image processing is a topical task in digital communication systems, IP telephony and video conferencing, digital television, and video surveillance. Processing large video images takes a lot of time, especially in real-time systems, and processing speed plays an important role in recognizing objects in video images received from IP cameras in real time. This requires modern technologies and fast algorithms that accelerate digital image processing. These acceleration problems have not yet been fully resolved. In practice, developing accelerated image processing programs requires a good knowledge of parallel and distributed computing. The two areas share a common trait: both parallel and distributed software consist of several processes that together solve one common problem. This article proposes an accelerated method for recognizing objects in video images received from IP cameras using parallel and distributed computing technologies.
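A common way to speed up per-frame recognition is to run the detector on many frames concurrently while keeping results in frame order. The sketch below is not the article's method: `detect_bright_region` is a hypothetical toy detector (a bounding box of bright pixels) standing in for a real recognizer. It uses a thread pool; for CPU-bound pure-Python detectors, `ProcessPoolExecutor` is the usual drop-in replacement (it needs picklable top-level functions and an `if __name__ == "__main__":` guard), and spreading work across machines would need a cluster framework not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_bright_region(frame, threshold=200):
    """Toy stand-in for an object detector: bounding box (top, left, bottom, right)
    of all pixels at or above `threshold`, or None if there are none."""
    hits = [(r, c) for r, row in enumerate(frame) for c, v in enumerate(row) if v >= threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


def process_frames(frames, detector=detect_bright_region, workers=4,
                   executor_cls=ThreadPoolExecutor):
    """Run `detector` on every frame concurrently; map() keeps results in frame order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(detector, frames))


blank = [[0] * 4 for _ in range(4)]
spot = [[0] * 4 for _ in range(4)]
spot[1][2] = 255
spot[2][3] = 230
print(process_frames([blank, spot]))  # -> [None, (1, 2, 2, 3)]
```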
String sequence indexing is the basis of many applications, including compression, route prediction, bioinformatics, text mining, and string matching, where the goal is to index huge sequences. Moreover, prediction applications require a probabilistic tree. Probabilistic Lempel-Ziv-Welch (LZW) is one of the widely used techniques for text compression as well as string sequence indexing, and in many applications it serves as a model for prediction. Here, we demonstrate distributed computation for the case of route prediction. Constructing an LZW model from a large corpus of historical data by sequential processing is a challenge for efficient implementation. Most current implementations focus on time-space complexity with vertical scalability on a single machine; extending them to distributed parallel execution remains challenging and has received very little research. This work implements the computation on distributed computing clusters using the Hadoop Distributed File System (HDFS), achieving parallelism without sacrificing accuracy. The objective is twofold: first, to apply LZW to route prediction, and second, to address the challenges of distributed computation.
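For readers unfamiliar with the probabilistic LZ trie, here is a minimal sequential sketch of the kind of model whose construction the paper distributes. It assumes an LZ78/LZW-style incremental parse with per-node counts and a simple next-symbol estimate from child counts; the segment IDs are hypothetical, and the paper's HDFS-based distributed construction is not shown.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # symbol -> TrieNode
        self.count = 0      # times this phrase prefix was created or traversed


def build_lz_trie(symbols):
    """LZ78/LZW-style incremental parsing: extend the current phrase while it is
    already in the trie; on the first unseen symbol, add it as a new phrase and
    restart from the root."""
    root = TrieNode()
    node = root
    for s in symbols:
        child = node.children.get(s)
        if child is None:
            child = TrieNode()
            child.count = 1
            node.children[s] = child
            node = root   # phrase complete: start a new one
        else:
            child.count += 1
            node = child  # keep extending the current phrase
    return root


def next_symbol_probs(root, context):
    """Estimate P(next symbol | context) from the child counts of the context node."""
    node = root
    for s in context:
        node = node.children.get(s)
        if node is None:
            return {}  # unseen context: no estimate (a real model would back off)
    total = sum(c.count for c in node.children.values())
    if total == 0:
        return {}
    return {s: c.count / total for s, c in node.children.items()}


# Hypothetical road-segment IDs from historical trips, in time order.
history = ["A", "B", "C", "A", "B", "D", "A", "B", "C"]
trie = build_lz_trie(history)
print(next_symbol_probs(trie, ["A"]))       # -> {'B': 1.0}
print(next_symbol_probs(trie, ["A", "B"]))  # -> {'C': 1.0}
```

Because each phrase boundary depends on the dictionary built so far, this loop is inherently sequential, which is the bottleneck the abstract describes.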
ISBN: 9798350374315 (digital)
ISBN: 9798350374322 (print)
Tower solar thermal power generation is a new type of low-carbon, environmentally friendly clean energy technology. In this paper, a single-objective optimization model is established whose objective function is the maximum annual average thermal output power per unit mirror area of the heliostat field. By optimizing the heliostat width, heliostat height, and heliostat coordinates, the annual average output thermal power can be maximized while still meeting the rated power. The heliostat field is laid out in a circular pattern. Because the light intensity received by the field varies unevenly as the seasons change, the optical efficiency also differs from heliostat to heliostat. Based on this optical efficiency distribution, we simplify the heliostat field into an S area (higher optical efficiency) and a W area (lower optical efficiency). For different locations, the heliostat width, height, and coordinates are generated by random functions and substituted into the efficiency model in this paper, and a random search algorithm traverses these candidates to find the maximum annual output thermal power and the corresponding heliostat parameters. In the shadow occlusion loss model, we divide each heliostat into a grid and use coordinate transformations to compute the shadow occlusion loss efficiency.
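The random search step can be sketched as follows: sample heliostat parameters uniformly within bounds, discard samples that miss the rated-power constraint, and keep the best objective value. Everything physical here is a hypothetical placeholder: the linear `toy_efficiency` curve, the irradiance, the heliostat count, the rated power, and the use of a single `radius` (distance from the tower) in place of full heliostat coordinates. The paper's real efficiency model, including shadow occlusion losses, is far more detailed.

```python
import random

# Hypothetical constants for illustration only.
DNI_KW_PER_M2 = 1.0   # direct normal irradiance
N_HELIOSTATS = 2000
RATED_KW = 60_000


def toy_efficiency(radius_m):
    """Hypothetical optical efficiency that decays with distance from the tower."""
    return max(0.0, 0.75 - 0.0004 * radius_m)


def output_per_mirror_area(p):
    """Objective: thermal output per unit mirror area (kW/m^2)."""
    return DNI_KW_PER_M2 * toy_efficiency(p["radius"])


def meets_rated_power(p):
    """Constraint: total field output must reach the rated power."""
    area = p["width"] * p["height"]
    return N_HELIOSTATS * area * output_per_mirror_area(p) >= RATED_KW


def random_search(objective, bounds, constraint, n_samples=5000, seed=0):
    """Sample parameters uniformly within bounds; keep the best feasible sample."""
    rng = random.Random(seed)
    best, best_value = None, float("-inf")
    for _ in range(n_samples):
        p = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        if not constraint(p):
            continue
        value = objective(p)
        if value > best_value:
            best, best_value = p, value
    return best, best_value


bounds = {"width": (2.0, 8.0), "height": (2.0, 8.0), "radius": (100.0, 700.0)}
best, value = random_search(output_per_mirror_area, bounds, meets_rated_power)
print(best, round(value, 3))
```

Random search is simple and derivative-free, which suits an objective built from lookup-heavy efficiency models, at the cost of needing many samples to get close to the optimum.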
Nowadays, the microgrid is one of the most widely used approaches in power networks for reducing system losses and improving reliability in electrical systems. Integration of power projects typically involv...
ISBN: 9798331511425 (digital)
ISBN: 9798331511432 (print)
Modern law enforcement faces significant challenges in managing and analyzing the exponential growth of crime data, which often includes unstructured, high-volume, and real-time information. Traditional relational database systems struggle to handle these complexities, limiting the ability of agencies to derive actionable insights for proactive measures. This research addresses these challenges by proposing a scalable distributed framework utilizing Hadoop and MapReduce for real-time crime data analysis. The objectives include enabling efficient ETL (Extract, Transform, Load) processes, implementing a robust star schema for structured data storage, and providing actionable insights through an integrated real-time dashboard. The methodology employs a multi-node Hadoop cluster for parallel processing, optimizing data integration and analysis capabilities. Results demonstrate significant improvements in processing speed, fault tolerance, and scalability, validated through the framework's application in Sri Lanka's crime data analysis. Findings reveal enhanced resource allocation, crime pattern identification, and operational efficiency for law enforcement. This research establishes a cost-effective, high-performance solution to modern criminological data challenges, with future potential for predictive analytics and machine learning integration.
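The two core steps named in the objectives, an ETL transform into a star schema and MapReduce-style aggregation, can be illustrated in plain Python. This is a sketch under stated assumptions, not the paper's pipeline: the record layout (`date,district,crime_type`), the sample records, and the surrogate-key scheme are hypothetical, and the in-memory sort merely stands in for Hadoop's shuffle-and-sort phase.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw incident records: date,district,crime_type
RAW = [
    "2023-01-04,Colombo,Theft",
    "2023-01-04,Kandy,Assault",
    "2023-01-05,Colombo,Theft",
    "2023-01-05,Colombo,Burglary",
]
FIELDS = ("date", "district", "crime_type")


def to_star_schema(lines):
    """Transform: build dimension tables (value -> surrogate key) and a fact
    table holding foreign keys plus a measure."""
    dims = {f: {} for f in FIELDS}
    facts = []
    for line in lines:
        values = dict(zip(FIELDS, line.strip().split(",")))
        row = {f + "_key": dims[f].setdefault(v, len(dims[f]) + 1) for f, v in values.items()}
        row["incident_count"] = 1
        facts.append(row)
    return dims, facts


def mapper(line):
    """Map: emit ((district, crime_type), 1) for each incident."""
    _date, district, crime_type = line.strip().split(",")
    yield (district, crime_type), 1


def reducer(key, counts):
    """Reduce: sum the counts for one key."""
    return key, sum(counts)


def run_mapreduce(lines):
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle-and-sort phase
    return dict(reducer(k, (c for _, c in group)) for k, group in groupby(pairs, key=itemgetter(0)))


dims, facts = to_star_schema(RAW)
print(dims["district"])  # -> {'Colombo': 1, 'Kandy': 2}
print(run_mapreduce(RAW))
```

On a real cluster the mapper and reducer would run as separate tasks over HDFS blocks; the logic per record and per key stays the same.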
ISBN: 9798331516024 (digital)
ISBN: 9798331516031 (print)
3D Gaussian Splatting (3DGS) has recently emerged as a prominent technique in novel view synthesis. The superior performance of 3DGS has catalyzed an increasing number of 3DGS-based applications in edge scenarios, where 3DGS is used for purposes such as scene representation, comprehension, and generation. Meanwhile, these edge applications also serve as primary sources of scene observations for producing 3DGS models. However, the intensive computation involved in 3DGS training and the massive number of 3D Gaussian primitives required for high-resolution scene representation hinder in-situ 3DGS training on off-the-shelf edge devices, whether using standalone training or distributed data-parallel (DDP) training. To address this issue, this work proposes MIX3D, a novel mixed representation for communication-efficient distributed 3DGS training in edge scenarios. MIX3D features a global sparse sub-model and various local dense sub-models: the sparse sub-model encodes coarse-grained appearance for the entire scene, and each dense sub-model targets fine-grained details for a specific region of the scene. Extensive evaluations on a four-device edge cluster demonstrate the effectiveness of our distributed 3DGS training workflow based on MIX3D, which reduces training time by up to 86.6% compared to vanilla DDP training and achieves an average speedup of 3.767× over standalone training.
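To make the "one global sparse sub-model plus per-region dense sub-models" structure concrete, here is a toy partitioning sketch. It is not MIX3D's procedure: selecting the sparse set as the most opaque primitives and splitting the rest into equal-width slabs along x are both hypothetical rules chosen for illustration, and the primitives are reduced to an x coordinate and an opacity.

```python
import random


def partition_scene(xs, opacities, n_devices, sparse_fraction=0.1):
    """Split Gaussian primitives into one global sparse set plus per-device dense sets.

    Hypothetical rules: the sparse sub-model keeps the most opaque primitives,
    and the remaining primitives are assigned to equal-width slabs along x.
    """
    n = len(xs)
    k = max(1, int(sparse_fraction * n))
    sparse = sorted(range(n), key=lambda i: opacities[i], reverse=True)[:k]
    sparse_set = set(sparse)
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_devices or 1.0  # guard against a degenerate scene
    dense = [[] for _ in range(n_devices)]
    for i, x in enumerate(xs):
        if i not in sparse_set:
            region = min(int((x - lo) / width), n_devices - 1)
            dense[region].append(i)
    return sparse, dense


rng = random.Random(0)
n = 1000
xs = [rng.uniform(-10.0, 10.0) for _ in range(n)]
opacities = [rng.random() for _ in range(n)]
sparse, dense = partition_scene(xs, opacities, n_devices=4)

# Under this sketch, only the sparse sub-model is shared by all devices, so a
# synchronization step exchanges k primitives instead of all n as in vanilla DDP.
print(len(sparse), [len(d) for d in dense])
```

The communication saving in this toy setting comes purely from the ratio k / n; how MIX3D actually chooses and synchronizes its sub-models is described in the paper, not here.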
ISBN: 9781728182780 (digital)
ISBN: 9781728182780 (print)
In today's society, with the full arrival of the Internet era, rapid technological development has made people's work and daily lives more convenient, but it is also a "double-edged sword" that exposes personal information and other data to a greater threat of abuse. The distinctive features of big data technology, such as massive storage, parallel computing, and efficient querying, have created a breakthrough opportunity for the key technologies of large-scale network security situational awareness. Building on big data acquisition, preprocessing, distributed computing, and mining and analysis, a big data analysis platform provides information security assurance services to information systems. This paper discusses security situational awareness in large-scale network environments and how big data technology can advance security perception.