Big data applications play a significant role in diverse fields. Distributed Stream Processing Engines (DSPEs) are widely used to support real-time applications efficiently. Partitioning algorithms split data streams across multiple nodes so that they can be processed in parallel. Aggregation cost is an important factor when stateful streaming applications are processed with such partitioning algorithms, because it affects performance at the point where the final result is produced. However, the impact of aggregation cost on stream processing has not been discussed comprehensively in the existing literature. We use performance modeling to identify the importance of aggregation cost when the workload is high. We implement the performance model on a multi-node cluster to predict the same behavior as the single-resource performance model. Experimental results show that a stateful streaming application needs more resources than a stateless streaming application when both run on the same DSPE under a high workload. Further experimental results show that performance modeling may help predict the maximum workload that can be processed on a DSPE, and that increasing the parallelism level is not guaranteed to improve the performance of streaming applications.
Triangle counting is one of the most basic graph applications and is used to solve many real-world problems in a wide variety of domains. Exploiting the massive parallelism of the Graphics Processing Unit (GPU) to accelerate triangle counting is prevalent. We identify that state-of-the-art GPU-based studies, which focus on improving load balancing, still inherently incur a large number of random accesses that degrade performance. In this paper, we design a prefetching scheme that buffers the neighbor list of the vertex being processed in fast shared memory in advance, avoiding the high latency of random global memory accesses. We also adopt a degree-based graph reordering technique and design a simple heuristic to distribute the workload evenly. Compared to last year's state-of-the-art HPEC Graph Challenge Champion, we improve the performance of triangle counting by up to 5.9×, reaching more than 10^9 TEPS on a single GPU for many large real graphs from the Graph Challenge datasets.
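To make the degree-based reordering idea concrete, the following is a minimal CPU-side sketch in Python. It is not the GPU kernel or the prefetching scheme described in the abstract; the function name `count_triangles` and the (degree, id) ranking rule are assumptions chosen for illustration. Each edge is oriented from its lower-ranked to its higher-ranked endpoint, so every triangle is found exactly once by intersecting two oriented neighbor lists.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs.

    Illustrative sketch only: vertex ids are assumed to be mutually
    comparable (e.g. all ints), and the ranking rule is an assumption.
    """
    # Build undirected adjacency sets, ignoring self-loops and duplicates.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Degree-based ordering: rank vertices by (degree, id).
    ordered = sorted(adj, key=lambda x: (len(adj[x]), x))
    rank = {v: i for i, v in enumerate(ordered)}

    # Keep only neighbors with a higher rank (orients every edge one way).
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    # For each oriented edge (u, v), common higher-ranked neighbors
    # close a triangle; each triangle is counted exactly once.
    total = 0
    for u in out:
        for v in out[u]:
            total += len(out[u] & out[v])
    return total
```

Orienting edges toward higher-degree vertices keeps the oriented neighbor lists of high-degree vertices short, which is the usual motivation for degree-based reordering in triangle counting.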
Graph is a well known data structure to represent the associated relationships in a variety of applications, e.g., data science and machine learning. Despite a wealth of existing efforts on developing graph processing...
Wireless local area network (WLAN) based indoor localization is expanding its fast-paced adoption to facilitate a variety of indoor location-based services (ILBS). Unfortunately, the performance of current WLAN locali...
The growing demand for and dependence on cloud services have brought an increasing level of threat to user data and security. Some critical web and cloud platforms have become constant targets of persistent malicious attacks that attempt to breach security protocols and gain unauthorized access to user data and information. While some of these security compromises may result from insider data and access leaks, a substantial proportion is still attributed to security flaws in the core web technologies with which such critical infrastructure and services are developed. This paper explores the direct impact and significance of security in the Software Development Life Cycle (SDLC) through a case study that covers some 70 public-domain web and cloud platforms within Saudi Arabia. Additionally, the major sources of security vulnerabilities within the target platforms, as well as the major factors that drive and influence them, are presented and discussed through experimental evaluation. The paper reports some of the core sources of security flaws within such critical infrastructure, identified through automated security auditing and manual static code analysis. The work also proposes some effective approaches, both automated and manual, through which security can be ensured throughout the SDLC and user data integrity can be safeguarded within the cloud.
Search over encrypted data is a hot topic. In this paper, we propose a secure scheme for searching the encrypted servers. Such scheme enables the authorised user to search multiple servers with multi-keyword queries a...
Malware scanning of an app market is expected to be scalable and effective. However, existing approaches use either syntax-based features, which can be evaded by transformation attacks, or semantic-based features, which are usually extracted by performing expensive program analysis. Therefore, in this paper, we propose a lightweight graph-based approach to Android malware detection. Instead of traditional heavyweight static analysis, we treat the function call graphs of apps as social networks and perform social-network-based centrality analysis to represent the semantic features of the graphs. Our key insight is that centrality provides a succinct and fault-tolerant representation of graph semantics, especially for graphs with a certain amount of inaccurate information (e.g., inaccurate call graphs). We implement a prototype system, MalScan, and evaluate it on datasets of 15,285 benign samples and 15,430 malicious samples. Experimental results show that MalScan is capable of detecting Android malware with up to 98% accuracy in under one second, which is more than 100 times faster than two state-of-the-art approaches, namely MaMaDroid and Drebin. We also demonstrate the feasibility of MalScan for market-wide malware scanning by performing a statistical study on over 3 million apps. Finally, in a corpus collected from the Google Play app market, MalScan is able to identify 18 zero-day malware samples, including samples that evade detection by existing tools.
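The idea of using centrality as a feature can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MalScan's implementation: the abstract does not say which centrality measure or which graph nodes are used, so the choice of degree centrality, the helper names `degree_centrality` and `feature_vector`, and the `tracked_apis` list are all hypothetical.

```python
def degree_centrality(call_edges):
    """Return (in-degree + out-degree) / (n - 1) for each function.

    `call_edges` is an iterable of (caller, callee) pairs, assumed to be
    deduplicated. Degree centrality is an assumed choice for illustration.
    """
    nodes = set()
    degree = {}
    for caller, callee in call_edges:
        nodes.update((caller, callee))
        degree[caller] = degree.get(caller, 0) + 1
        degree[callee] = degree.get(callee, 0) + 1
    n = len(nodes)
    if n <= 1:
        # Centrality is undefined for a single node; report 0.0.
        return {v: 0.0 for v in nodes}
    return {v: degree[v] / (n - 1) for v in nodes}


def feature_vector(call_edges, tracked_apis):
    """Build a fixed-length vector: one centrality value per tracked API.

    `tracked_apis` is a hypothetical, caller-supplied list of function
    names; APIs absent from the call graph get 0.0.
    """
    centrality = degree_centrality(call_edges)
    return [centrality.get(api, 0.0) for api in tracked_apis]
```

A vector built this way has the same length for every app, so it could be passed to any standard classifier; a missing or spurious call edge shifts a centrality value slightly rather than removing the feature, which is one way to read the fault-tolerance claim in the abstract.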
Modern graph processing is widely used for solving a vast variety of real-world problems, e.g., website ranking [1] and community detection [2]. To better adapt and express the procedure of graph iteration, a wide spectrum of research has proposed highly concurrent programming models and smart graph partitioning strategies [1, 3].
Growing accuracy and robustness of Deep Neural Networks (DNN) models are accompanied by growing model capacity (going deeper or wider). However, high memory requirements of those models make it difficult to execute th...
Followee recommendation plays an important role in information sharing over microblogging platforms. Existing followee recommendation schemes rank followees by either content relevance or social information alone, and therefore perform poorly. Based on the observation that microblogging systems play the dual roles of social network and news media platform, we propose a novel followee recommendation scheme that takes into account both tweet contents and social structures as information sources. We set up a linear weighted model to combine the two factors and further design a simulated annealing algorithm that automatically assigns the weights of both factors in order to achieve an optimized combination. We conduct comprehensive experiments on real-world datasets collected from Sina Weibo, the largest microblogging system in China. The results demonstrate that our scheme provides much more accurate followee recommendations for a user than existing schemes.
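A linear weighted model tuned by simulated annealing can be sketched as follows. This is a minimal illustration, not the scheme from the paper: the single-weight form `w * content + (1 - w) * social`, the function names, the step size, and the cooling schedule are all assumptions, and the `objective` callback stands in for whatever recommendation-quality measure is being maximized.

```python
import math
import random


def combine(w, content_score, social_score):
    """Linear weighted model: w on content relevance, (1 - w) on social score."""
    return w * content_score + (1.0 - w) * social_score


def anneal_weight(objective, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search w in [0, 1] that maximizes objective(w) by simulated annealing.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = 0.5
    current = objective(w)
    best_w, best = w, current
    temperature = t0
    for _ in range(steps):
        # Propose a nearby weight, clamped to the valid range.
        candidate = min(1.0, max(0.0, w + rng.uniform(-0.1, 0.1)))
        value = objective(candidate)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if value >= current or rng.random() < math.exp((value - current) / temperature):
            w, current = candidate, value
            if current > best:
                best_w, best = w, current
        temperature *= cooling
    return best_w
```

In use, `objective` might compute ranking accuracy on a validation set after scoring each candidate followee with `combine`; the occasional acceptance of worse weights is what lets the search escape local optima early on.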