ISBN (digital): 9781728143286
ISBN (print): 9781728143293
Many local outlier detection algorithms have been proposed, inspired by the idea of the local outlier factor (LOF). However, they often have low detection performance and are sensitive to neighborhood size, because there is a major defect in their formulas for the outlier degree and because the kNN (k-nearest neighbors) method is widely used to quantify the neighborhood of an instance. To address these issues, we define a novel nearest neighbors tree (NNT) to measure the neighborhood of an instance. We also propose a local structure outlier factor (LSOF), which scores each local structure instead of each data point and reports the top-scored local structures as anomalous local structures, where outliers and groups of outliers are easily separated according to the characteristics of the NNT. Our experimental results demonstrate the competitive performance of our method on both synthetic and real-world datasets.
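For context, the kNN-based LOF baseline that this abstract builds on can be sketched as follows. This is a minimal illustration of the classical LOF score, not the NNT or LSOF method proposed in the paper, whose definitions are not given in the abstract. The function names `knn` and `lof_scores` are hypothetical, and the sketch assumes no duplicate points (so reachability distances are nonzero).

```python
import math

def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return dists[:k]

def lof_scores(points, k):
    """Classical LOF: ratio of the neighbours' average density to a point's own density."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neigh]
    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))  # assumes sum(reach) > 0
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

# Four clustered points and one distant point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(points, k=2)
```

A score near 1 means a point is about as dense as its neighbours; a much larger score marks a candidate outlier. The result depends directly on `k`, which is the sensitivity to neighborhood size that the abstract describes.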
ISBN (print): 9781728125848
In-memory data processing frameworks (e.g., Spark) make big data analysis much simpler and more efficient. However, stragglers, tasks that take much longer to finish than others, significantly degrade performance. Multiple factors cause stragglers, in either the hardware resource layer or the application layer, e.g. hardware heterogeneity, interference, data locality and data skew. While state-of-the-art straggler mitigation techniques offer partial solutions for data skew and data locality, we find that the other factors can also cause serious problems. We present Clio, a cross-layer interference-aware optimization system that can effectively mitigate stragglers for data processing frameworks. Clio supports the scheduling of both map and reduce tasks. It heuristically dispatches intermediate data in proportion to the actual computing ability of each worker node, which is estimated from various straggler factors, to balance task completion times at a much finer granularity. We implement Clio in Apache Spark and evaluate its performance using both synthetic and real datasets. Experimental results show that Clio can speed up the execution of applications by up to 67% compared with existing algorithms.
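The idea of dispatching data in proportion to each worker's computing ability can be illustrated with a small sketch. This is not Clio's actual heuristic, which the abstract does not specify; the function `proportional_split` and the capacity values are hypothetical, and the sketch simply splits a record count by capacity using largest-remainder rounding.

```python
def proportional_split(total_records, capacities):
    """Split total_records into integer shares proportional to capacities."""
    total_cap = sum(capacities)
    if total_cap <= 0:
        raise ValueError("capacities must sum to a positive value")
    # Ideal (fractional) share for each worker
    exact = [total_records * c / total_cap for c in capacities]
    shares = [int(x) for x in exact]
    # Hand the records lost to truncation to the largest fractional remainders
    leftover = total_records - sum(shares)
    order = sorted(
        range(len(capacities)), key=lambda i: exact[i] - shares[i], reverse=True
    )
    for i in order[:leftover]:
        shares[i] += 1
    return shares

# Hypothetical estimated capacities for three workers
even_case = proportional_split(100, [1, 1, 2])
uneven_case = proportional_split(10, [1, 2])
```

If every worker processes records at a rate proportional to its capacity, shares sized this way finish at roughly the same time, which is the balancing goal the abstract describes.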
ISBN (print): 9781538648704
Offering a paradigm for retrieving and aggregating multiple data items from multiple sources is a crucial requirement in a large content-centric network. However, the major hindrances to this paradigm are the network's dynamic nature, traffic balance, wired forwarding and the absence of cooperation between communication and computation. In this paper, we present a scalable top-k Concast service on Named Data Networking (NDN). The service enables cooperation between top-k tiny data discovery and aggregation among multiple routers and paths for a user's Interest that contains a hierarchical name and other constraints. Specifically, multiple types and strategies of tiny data aggregation, for merging and processing positive data and suppressing negative, futile data, as well as a determination of response completeness, are introduced to enhance the recall and sharing of relevant results. Experiments demonstrate that the top-k Concast service can effectively improve service quality, reduce network traffic and shorten response time.
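The general idea of aggregating partial results from several sources into one top-k answer can be sketched as below. This is an illustrative assumption, not the Concast service's aggregation strategy, which the abstract does not detail; the function `merge_top_k`, the content names, and the relevance scores are all hypothetical.

```python
import heapq

def merge_top_k(partial_results, k):
    """Merge (name, score) lists from several sources and keep the k best names."""
    best = {}
    for source in partial_results:
        for name, score in source:
            # Keep only the highest score seen for each name (deduplication)
            if name not in best or score > best[name]:
                best[name] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])

# Hypothetical partial results arriving over two paths
path_a = [("/video/a", 3), ("/video/b", 1)]
path_b = [("/video/a", 5), ("/video/c", 4)]
top2 = merge_top_k([path_a, path_b], k=2)
```

Merging at an intermediate node in this way means only k items travel upstream instead of every partial result, which is one way aggregation can reduce network traffic.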
ISBN (print): 9789811033735; 9789811033728
Big Data is a huge, hard-to-manage body of data that must be stored, processed, and analyzed. It grows enormously every day, and so does the size of its metadata. It is time to bring in the distributed Metadata Server (dMDS) to help data scientists overcome the difficulty of storing, processing, and analyzing this fast-growing data. The dMDS makes strides toward solving the intricate research problems of file systems for Big Data. In this study, we investigate the Metadata Server (MDS) in depth.
As storage systems grow to petascale, the demand for object storage increases. In large-scale heterogeneous object storage systems, efficient selection of storage targets for placing objects is critically important ...
In designing digital filters, a Multiply-Accumulate (MAC) unit is used. A MAC comprises a multiplier, an adder and an accumulator. Faster adder and multiplier circuits are required for a high-speed MAC unit. But MAC based str...
ISBN (print): 9781538656570
The advent of the internet has led to a fully digitized environment, where everything is constantly connected and can be easily accessed from anywhere. Traditional IP networks are difficult to understand and manage. Configuring networks according to predefined policies, and changing them in response to issues, has become difficult. The vertical integration of current networks adds to the difficulties. Software-Defined Networking (SDN) is a new paradigm that promises to solve this problem. SDN breaks the network into tractable pieces, making it easier to introduce changes and simplifying network management and network evolution. In SDN, load balancing is a valuable technique, as it can save power and improve resource utilization. This research focuses on analyzing the load and data in SDN for various distributed network topologies. Various network emulator and packet tracker tools have been used in the paper for generating statistics.
ISBN (print): 9789811068720; 9789811068713
Localization is required to maintain real-time position for processing in indoor and outdoor conditions. It can be used to support mobile computing and networking among nodes in wireless sensor networks. In this paper, a dynamic algorithm for location estimation is proposed for wireless sensor networks. The algorithm is tested on a TinyOS-based emulation test bed to validate the approach in a real-time environment.
ISBN (print): 9781538643013
Social influence plays an essential role in spreading information within online social networks, and can be modeled or measured by analyzing various social networking data, such as published content, users' attributes or interactions among them. Because of the massive volume of social data, researchers often fail to quantify user influence accurately and efficiently. Big data techniques can be adopted to alleviate this problem. In this paper, we introduce a classical individual influence algorithm and implement two parallel versions of it based on different big data processing frameworks. Experimental results on a large-scale real dataset demonstrate that the computational efficiency of the influence algorithm on massive data sets can be improved significantly by using big data processing frameworks.
ISBN (digital): 9781728143286
ISBN (print): 9781728143293
For the development of agricultural modernization, an automatic recognition network for crop diseases with the advantages of high efficiency, non-destructiveness and continuity is indispensable. In this paper, we construct a 13-layer convolutional neural network for recognizing common citrus diseases, which improves efficiency and reduces overfitting by using a stack of small kernels, dropout, and other techniques. Finally, we conduct comparative experiments with other networks. The experimental results show that our network performs better than CNN and AlexNet. This indicates that the network we constructed is an effective method for citrus disease recognition, which can provide technical support for the identification and prevention of citrus diseases.