Have you ever wondered how data can be both a weapon and a source of useful information? Welcome to 2018, where large datasets can be manipulated through a Hadoop cluster interface, letting users store, read, and transfer any type of data, including very large datasets. Imagine a lawyer who handles enormous volumes of case files. It is not humanly possible to recall at once the dates, file numbers, and status of a case that was closed several years ago; this is where big data techniques allow such information to be retrieved within seconds. Organizing case data in this way helps not only in high-profile cases, where records ordered down to the most basic level can surface the detail that becomes the turning point needed to win, but also in corroborating verbal statements given during a session. Records entered in a conventional registry can be lost over time, so these important data can instead be stored in a Hadoop cluster, which holds large datasets that can be consulted at will; lawyers as well as lawmakers can then retrieve facts for an argument without any data unavailability. Apache Ambari is a Hadoop management interface used to manipulate and analyze data distributed across a number of clusters, and Apache Spark can be used to reduce, separate, and filter data according to the user's requirements.
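As a small illustration of the retrieval described above, the sketch below filters a made-up set of case records in plain Python; Spark's RDD API exposes analogous `filter` and `map` transformations for doing the same over a distributed dataset. The record fields (`file_no`, `status`, `closed`) are hypothetical.

```python
# Toy case registry; in practice these records would live in a Hadoop cluster.
cases = [
    {"file_no": "C-1021", "status": "closed", "closed": "2009-04-12"},
    {"file_no": "C-1187", "status": "open",   "closed": None},
    {"file_no": "C-0994", "status": "closed", "closed": "2009-11-30"},
    {"file_no": "C-1302", "status": "closed", "closed": "2015-06-01"},
]

def closed_in_year(records, year):
    """Return file numbers of cases closed in the given year."""
    prefix = f"{year}-"
    return [r["file_no"] for r in records
            if r["status"] == "closed" and (r["closed"] or "").startswith(prefix)]

print(closed_in_year(cases, 2009))  # -> ['C-1021', 'C-0994']
```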
ISBN:
(digital) 9798350371314
ISBN:
(print) 9798350371321
This study aims to make it simpler to detect unevenly distributed, DNS-heavy attacks. Detecting DNS-heavy attacks can be problematic when balanced datasets are unavailable. Using Recurrent Neural Network (RNN) and Support Vector Machine (SVM) designs together can improve detection accuracy. In this model, the RNN and the SVM complement each other in handling the complexity that comes from unevenly distributed data. The model recognizes the order of events in DNS query packets by exploiting the temporal understanding of RNNs and their Long Short-Term Memory (LSTM) cells; this helps uncover trends connected to larger threats. The SVM is also used to transform how features are represented, because it navigates high-dimensional feature spaces well, which improves the classification stage. An intriguing aspect of the approach is that the model can be applied to unevenly distributed samples while still producing comparable learning results. Overall, the approach has the potential to make it far simpler to identify DNS-heavy attacks. The results analysis reports loss, accuracy, val_loss, and val_accuracy at various epochs, which clarifies how effective the method is at different stages of training and how well it detects skewed DNS attacks.
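The abstract does not give implementation details, but one standard way to counter the uneven class distribution it describes is inverse-frequency class weighting, which both LSTM training and SVM fitting commonly accept as per-class weights. A minimal sketch, assuming string class labels (the label names are illustrative, not from the paper):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count_c), so rare classes count more.

    n is the total number of samples and k the number of classes; a class
    appearing with average frequency gets weight 1.0.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

labels = ["benign"] * 9 + ["dns_attack"]     # 9:1 imbalance
print(inverse_frequency_weights(labels))     # rare class gets weight 5.0
```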
Extracting minimal functional dependencies (MFDs) from relational databases is an important database analysis technique. With the advent of the big data era, it is challenging to discover MFDs from big data, especially large-scale distributed data stored at many different sites. The key to discovering MFDs as fast as possible is pruning useless candidate MFDs, and most existing algorithms prune candidates either from top to bottom or from bottom to top. We present a new algorithm, FastMFDs, for discovering all MFDs from large-scale distributed data both top-down and bottom-up in parallel. We evaluated our algorithm on real-life datasets, and it is more efficient and faster than existing discovery algorithms.
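For intuition about what is being pruned, the core test behind any FD-discovery algorithm is whether a candidate dependency X -> A holds on a relation: no two tuples may agree on X yet differ on A. A minimal single-site sketch in Python (the paper's distributed FastMFDs algorithm itself is not reproduced here):

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds on rows.

    rows is a list of dicts; lhs is a tuple of attribute names; rhs is one
    attribute. The FD holds iff each lhs-value maps to a single rhs-value.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True

r = [
    {"city": "Oslo",   "zip": "0150", "country": "NO"},
    {"city": "Bergen", "zip": "5003", "country": "NO"},
    {"city": "Oslo",   "zip": "0151", "country": "NO"},
]
print(fd_holds(r, ("city",), "country"))  # -> True
print(fd_holds(r, ("city",), "zip"))      # -> False
```

A candidate's left-hand side can then be pruned from above (supersets of a holding X are non-minimal) or from below (subsets of a failing X also fail to determine A); running both directions in parallel is the idea the abstract describes.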
DuDE, a P2P-based system for the distributed computing of statistics, is presented. The high scalability and failure resilience of P2P are exploited to achieve a high-performance distributed system that avoids the bottlenecks of a centralized computing system. To ensure high data availability, a sophisticated algorithm for distributed data storage is integrated. Furthermore, an algorithm for global peer discovery is presented, which allows all data assigned to peers to be found without the need for a central instance. For the realization of DuDE, common working stages of distributed computing are extended to enable a highly scalable computing system based on P2P technology. Results from a test system show a nearly perfect linear speedup for distributed computing as well as substantial processor and memory relief compared to a centralized solution.
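The distributed-statistics idea can be illustrated with mergeable partial aggregates: each peer reduces its local data to (count, sum, sum of squares), and these triples can be combined in any order, by any peer, without a central instance ever collecting raw data. A sketch under that assumption (DuDE's actual protocol is not shown):

```python
def local_summary(xs):
    """Per-peer aggregate: (count, sum, sum of squares)."""
    return (len(xs), sum(xs), sum(x * x for x in xs))

def merge(a, b):
    """Combine two partial aggregates; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def mean_and_variance(summary):
    n, s, sq = summary
    mean = s / n
    return mean, sq / n - mean * mean   # population variance

peers = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]  # data held at three peers
total = local_summary([])                      # (0, 0, 0) identity
for data in peers:
    total = merge(total, local_summary(data))
print(mean_and_variance(total))  # same result as computing on all data at once
```

Because `merge` is associative and commutative, partial results can flow along any overlay topology, which is what makes the approach fit a P2P network.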
A knowledge-based architecture designed to connect and correlate autonomous disparate information sources is presented. The information sources being integrated come equipped with logical front-ends that build up only those parts of a virtual global schema which are needed to process local or global requests. The schema building and translation processes are driven by respective knowledge bases. The core of the architecture is a connecting tissue in the form of a distributed knowledge modeling platform, also referred to as distributed cooperative object management. The proposed architecture is intended for a broad spectrum of use, ranging from heterogeneous business applications to sophisticated design systems. It is expected that the architecture can be implemented on top of open distributed processing environments.
ISBN:
(digital) 9798350374889
ISBN:
(print) 9798350374896
Federated learning (FL), a distributed learning strategy, improves security and privacy by eliminating the need for clients to share their local data; however, FL struggles with non-IID (non-independent and identically distributed) data. Clustered FL aims to remedy this by grouping similar clients and training a model per group; nevertheless, it faces difficulties in determining clusters without sharing local data and in conducting model evaluation. Clustered FL evaluation on unseen clients typically applies all models and selects the best performer for each client, an approach known as best-fit cluster evaluation. This paper challenges that evaluation process, arguing that it violates a fundamental machine learning principle: test dataset labels should be used only for performance calculation, not for model selection. We show that best-fit cluster evaluation results in significant accuracy overestimates. Moreover, we present an evaluation approach that maintains the separation between model selection and evaluation by reserving a portion of the target client's data for model selection, while the remaining data is used for accuracy estimation. Experiments on four datasets, encompassing various IID and non-IID scenarios, demonstrate that best-fit cluster evaluation produces overestimates that are statistically different from our evaluation.
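The evaluation approach the abstract advocates can be sketched as follows: reserve part of the target client's labeled data for picking a cluster model, and report accuracy only on the remainder. The toy models here are plain callables, and the 50/50 split fraction is an illustrative assumption, not a detail from the paper.

```python
def accuracy(model, xs, ys):
    """Fraction of inputs the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)

def split_evaluate(models, xs, ys, selection_frac=0.5):
    """Pick the best model on a held-out selection split, then report
    accuracy on the remaining evaluation split only."""
    k = int(len(xs) * selection_frac)
    sel_x, sel_y, ev_x, ev_y = xs[:k], ys[:k], xs[k:], ys[k:]
    best = max(models, key=lambda m: accuracy(m, sel_x, sel_y))
    return accuracy(best, ev_x, ev_y)

# Two toy "cluster models" on a parity task the first model solves exactly.
models = [lambda x: x % 2, lambda x: 0]
xs = list(range(8))
ys = [x % 2 for x in xs]
print(split_evaluate(models, xs, ys))  # -> 1.0
```

Best-fit cluster evaluation, by contrast, would score every model directly on the full labeled test set and report the maximum, using the same labels for selection and evaluation.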
ISBN:
(print) 9781538655566; 9781538655559
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
ISBN:
(print) 9781467397414
Since the 1950s, nonoverlapping batch means (NBM) has been a basis for confidence-interval procedures (CIPs) for the mean of a steady-state time series. In 1985, overlapping batch means (OBM) was introduced as an alternative to NBM for estimating the standard error of the sample mean. Despite OBM's inherent efficiency, because the OBM statistic does not approach normality via the chi-squared distribution, no OBM CIP was introduced. We define two fixed-sample-size OBM CIPs. OBM1 is based on the result that asymptotically OBM has half again as many degrees of freedom as NBM. OBM2 does the same, but increases degrees of freedom. We argue that OBM's sampling distribution has skewness and kurtosis closer to normal than the chi-squared distribution. We show experimentally that for AR(1) processes the OBM CIPs perform better than NBM CIPs in terms of classic criteria and the VAMP1RE criterion. Finally, we introduce the concept of VAMP1RE-optimal batch sizes.
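For reference, the classical OBM point estimator of the variance of the sample mean that underlies such CIPs can be sketched as follows; the batch size m is a user-supplied assumption, and the OBM1/OBM2 interval constructions themselves are not reproduced here.

```python
def obm_variance(x, m):
    """Overlapping-batch-means estimator of Var(sample mean).

    Averages each of the n-m+1 overlapping batches of size m and scales the
    squared deviations from the grand mean by n*m / ((n-m+1)*(n-m)).
    """
    n = len(x)
    grand = sum(x) / n
    batch_means = [sum(x[j:j + m]) / m for j in range(n - m + 1)]
    ss = sum((b - grand) ** 2 for b in batch_means)
    return n * m * ss / ((n - m + 1) * (n - m))

print(obm_variance([1.0, 2.0, 3.0, 4.0], 2))  # -> 2.666...
```

Asymptotically this estimator carries roughly 1.5 * (n/m - 1) degrees of freedom, half again as many as the n/m - 1 of nonoverlapping batch means, which is the efficiency gain the abstract refers to.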
Grid computing has received growing attention from scientists and scholars in recent years, and grid information service is both its highlight and one of its most difficult subjects. Grid information service is characterized by wide distribution, high fault tolerance, and dynamic functions as well as diversified forms. As the foundation and core of the grid information service, the distributed database of LDAP (Lightweight Directory Access Protocol) has been widely used. To make the system more effective and efficient, this paper puts forward ring replication and thread replication strategies, both of which raise the efficiency of the grid information service system compared with many other replication strategies.
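The abstract gives no algorithmic details, so the sketch below shows only the generic successor-replication idea on a ring of directory servers: an entry placed on a home node is also copied to the next nodes clockwise, so reads survive individual node failures. Node names and the replication factor are illustrative assumptions, not necessarily the paper's ring-replication strategy.

```python
def ring_replicas(nodes, home_index, r):
    """Return r consecutive nodes on the ring, starting at the home node."""
    n = len(nodes)
    return [nodes[(home_index + i) % n] for i in range(min(r, n))]

ring = ["ldap1", "ldap2", "ldap3", "ldap4"]
print(ring_replicas(ring, 3, 2))  # wraps around -> ['ldap4', 'ldap1']
```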