ISBN (digital): 9798331518882
ISBN (print): 9798331518899
Credit card fraud has grown increasingly common, and with the rise of cybercrime numerous cases have been recorded. Distributed search plays a pivotal role in enhancing the performance of fraud detection systems: by enabling the aggregation and retrieval of data from multiple decentralized sources without compromising data privacy, it facilitates efficient training of models on large, diverse datasets. Another technique for controlling fraud losses is applying Federated Learning (FL) to detect fraudulent transactions; in this way, models benefit from dispersed data without the data itself being shared. This paper focuses on the implementation of a CNN with FL to improve the security and accuracy of fraud detection in financial transactions. The proposed model employs the Kaggle credit card fraud dataset and uses enhanced techniques such as SMOTE to address the class imbalance problem and one-hot encoding to handle categorical features. The proposed CNN-FL model surpassed other classifiers, yielding better accuracy, precision, and recall than traditional ML classifiers such as Naive Bayes (NB), Logistic Regression (LR), and Gaussian Naive Bayes: accuracy 99.86%, precision 99.83%, recall 99.85%, and an F1-score of 99.84%. The effectiveness of the suggested CNN-based federated learning approach for enriching fraud detection systems is thus demonstrated, with good generalisation and high accuracy across various types of transactions.
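The SMOTE step mentioned above synthesizes new minority-class samples by interpolating between a real sample and one of its nearest minority-class neighbours. The following is a minimal numpy sketch of that interpolation idea, not the paper's implementation; the function name and toy data are invented for illustration.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating between each chosen sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # ignore self-distance
    neigh = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per row
    base = rng.integers(0, n, size=n_new)       # pick a base sample
    nb = neigh[base, rng.integers(0, k, size=n_new)]  # and one of its neighbours
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# toy minority class: 6 points in 2-D
X_min = np.array([[0., 0.], [1., 0.], [0., 1.],
                  [1., 1.], [2., 1.], [1., 2.]])
X_syn = smote_oversample(X_min, n_new=10, k=3, rng=0)
print(X_syn.shape)  # (10, 2)
```

Because each synthetic point is a convex combination of two real minority samples, the new points always lie inside the minority class's convex hull.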
Cloud data warehouse systems lower the barrier to access data analytics. These applications often lack a database administrator and integrate data from various sources, potentially leading to data not satisfying strict constraints. Automatic schema optimization in self-managing databases is difficult in these environments without prior data cleaning steps. In this paper, we focus on constraint discovery as a subtask of schema optimization. Perfect constraints might not exist in these unclean datasets due to a small set of values violating the constraints. Therefore, we introduce the concept of a generic PatchIndex structure, which handles exceptions to given constraints and enables database systems to define these approximate constraints. We apply the concept to the environment of distributed databases, providing parallel index creation approaches and optimization techniques for parallel queries using PatchIndexes. Furthermore, we describe heuristics for automatic discovery of PatchIndex candidate columns and prove the performance benefit of using PatchIndexes in our evaluation.
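The core of the PatchIndex concept is that a column can be treated as satisfying a constraint except for a small recorded set of exception rows. The sketch below illustrates that idea for a simple domain constraint; all names are invented and this is not the paper's actual implementation.

```python
# Sketch of the PatchIndex idea: record the rows that violate an
# approximate constraint, so queries can treat the rest as "clean".

def build_patch_index(column, constraint):
    """Row ids that violate the (approximate) constraint."""
    return {i for i, v in enumerate(column) if not constraint(v)}

col = [3, 7, 1000, 5, 4096, 9]
patches = build_patch_index(col, lambda v: v < 256)  # "fits in one byte"

# A query whose predicate contradicts the constraint (v >= 256) can be
# answered by scanning only the patch rows, not the whole column.
hits = [col[i] for i in sorted(patches) if col[i] >= 256]
print(patches, hits)  # {2, 4} [1000, 4096]
```

The query-optimization benefit comes from exactly this split: predicates implied by the constraint skip the patch set, while contradicting predicates touch only the patch set.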
An approach is presented for managing distributed database systems in the face of communication failures and network partitions. The approach is based on the idea of dividing the database into fragments and assigning each fragment a controlling entity called an agent. The goals achieved by this approach include high data availability and the ability to operate without promptly and correctly detecting partitions. A correctness criterion for transaction execution, called fragmentwise serializability, is introduced. It is less strict than the conventional serializability, but provides a valuable alternative for some applications.
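The fragment/agent idea can be illustrated with a toy availability check: during a partition, a side of the network can execute a transaction only if every fragment it touches is controlled by an agent reachable on that side. Names and the rule shown are a simplified sketch, not the paper's full protocol.

```python
# fragment -> site of its controlling agent (invented assignment)
fragments = {"F1": "site_a", "F2": "site_b", "F3": "site_a"}
reachable = {"site_a"}  # sites on our side of the partition

def can_execute(txn_fragments):
    """A transaction runs on this side iff every fragment it touches
    is controlled by a reachable agent."""
    return all(fragments[f] in reachable for f in txn_fragments)

print(can_execute({"F1", "F3"}))  # True: both agents on site_a
print(can_execute({"F1", "F2"}))  # False: F2's agent is partitioned away
```

Note how this yields availability without detecting the partition explicitly: the check depends only on which agents currently respond, which is the intuition behind relaxing serializability to a per-fragment criterion.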
Aggregate views are commonly used for summarizing information held in very large databases such as those encountered in data warehousing, large scale transaction management, and statistical databases. Such applications often involve distributed databases that have developed independently and therefore may exhibit incompatibility, heterogeneity, and data inconsistency. We are here concerned with the integration of aggregates that have heterogeneous classification schemes, where local ontologies, in the form of such classification schemes, may be mapped onto a common ontology. In previous work, we developed a method for the integration of such aggregates; that method is efficient, but cannot handle the innate data inconsistencies that are likely to arise when a large number of databases are being integrated. In this paper, we develop an approach that can handle data inconsistencies and is thus inherently much more scalable. In our new approach, we first construct a dynamic shared ontology by analyzing the correspondence graph that relates the heterogeneous classification schemes; the aggregates are then derived by minimization of the Kullback-Leibler information divergence using the EM (Expectation-Maximization) algorithm. Thus, we may assess whether global queries on such aggregates are answerable, partially answerable, or unanswerable in advance of computing the aggregates themselves.
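The first step described, mapping heterogeneous local classification schemes onto a shared ontology before aggregating, can be sketched with plain dictionaries. The category names and the correspondence mapping below are invented for illustration; the EM/KL refinement step is not reproduced here.

```python
# Local aggregates with incompatible category schemes (invented data).
site_a = {"car": 120, "truck": 30}
site_b = {"passenger": 100, "commercial": 50}

# Correspondence of each (site, local category) to a common-ontology category.
to_common = {
    ("A", "car"): "light_vehicle",
    ("A", "truck"): "heavy_vehicle",
    ("B", "passenger"): "light_vehicle",
    ("B", "commercial"): "heavy_vehicle",
}

# Aggregate counts in the shared ontology.
common = {}
for site, counts in (("A", site_a), ("B", site_b)):
    for cat, n in counts.items():
        key = to_common[(site, cat)]
        common[key] = common.get(key, 0) + n

print(common)  # {'light_vehicle': 220, 'heavy_vehicle': 80}
```

When a local category maps to several common categories (a many-to-many correspondence graph), a simple sum like this no longer suffices; that is where the paper's EM-based divergence minimization takes over.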
Data integration in a distributed database refers to the production of union-compatible views for similar data that are expressed dissimilarly in different nodes. Such a facility is required for location transparency and for easier formulation of global queries over the apparently incompatible data that are aggregated from different nodes. The issues in data integration within a relational context are examined, and a solution is proposed that is based on special relational constructs, which produce union-compatible relations. The advantages of this approach over others are discussed. These constructs were developed for the PRECI distributed database system, and some of them are being put into operation.
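The notion of producing union-compatible views from dissimilarly expressed data can be sketched as projecting each node's rows onto a shared attribute set via a per-node rename mapping. Attribute names and data below are invented, and this is only a toy analogue of the PRECI constructs.

```python
# Two nodes store similar data under different schemas (invented).
node_a = [{"emp_name": "Liu", "salary": 50000, "dept": "R&D"}]
node_b = [{"name": "Chen", "pay": 48000}]

def to_common(row, mapping):
    """Project a row onto the common schema via a rename mapping."""
    return {common: row.get(local) for common, local in mapping.items()}

common_a = [to_common(r, {"name": "emp_name", "salary": "salary"}) for r in node_a]
common_b = [to_common(r, {"name": "name", "salary": "pay"}) for r in node_b]

# The two views are now union-compatible and can be merged for global queries.
global_view = common_a + common_b
print(global_view)
```

Once both views share the same attributes, a global query need not know which node each row came from, which is precisely the location transparency the abstract refers to.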
The file allocation problem for distributed databases has been extensively studied in the literature and the objective is to minimize total costs consisting of storage, query and update communication costs. Current modeling of update communication costs is simplistic and does not capture the working of most of the protocols that have been proposed. This paper shows that more accurate modeling of update costs can be achieved fairly easily without an undue increase in the complexity of the formulation. In particular, formulations for two classes of update protocols are shown. Existing heuristics can be used on these formulations to obtain good solutions.
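The point that different update protocols yield genuinely different communication costs can be seen in a toy comparison of direct propagation to all copies versus routing through a primary copy. All numbers, the distance matrix, and the two protocol variants shown are invented for illustration.

```python
# Toy update-cost comparison under two protocol shapes (invented numbers).
copies = [0, 1, 2]                      # sites holding a copy
dist = [[0, 1, 2],                      # inter-site communication cost
        [1, 0, 1],
        [2, 1, 0]]
origin = 1                              # site issuing the update

# Variant 1: origin sends the update directly to every copy.
direct = sum(dist[origin][t] for t in copies)

# Variant 2: origin forwards to a primary copy (site 0), which propagates.
primary = dist[origin][0] + sum(dist[0][t] for t in copies if t != 0)

print(direct, primary)  # 2 4
```

Here the direct scheme is cheaper from this origin, but with a different origin or distance matrix the primary-copy scheme can win, which is why a cost model that fixes one propagation pattern misestimates the others.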
Purpose - The purpose of this paper is to provide a data framework to support the incremental aggregation of, and an effective data refresh model to maintain data consistency in, an aggregated centralized database. Design/methodology/approach - It is based on a case study of enterprise distributed database aggregation for Taiwan's National Immunization Information System (NIIS). Selective data replication aggregated the distributed databases into the central database. The data refresh model assumed heterogeneous aggregation activity within the distributed database systems. The algorithm of the data refresh model followed a lazy replication scheme, but update transactions were only allowed on the distributed databases. Findings - It was found that data refreshment for the aggregation of heterogeneous distributed databases can be achieved more effectively through the design of a refresh algorithm and the standardization of message exchange between distributed and central databases. Research limitations/implications - Transaction records are stored and transferred in a standardized XML format. Record transformation and interpretation are more time-consuming, but the format offers higher transportability and compatibility across platforms while maintaining equal refresh performance. The distributed database designer should manage these issues as well as assure quality. Originality/value - The data system model presented in this paper may be applied to other similar implementations, because its approach is not restricted to a specific database management system and it uses standardized XML messages for transaction exchange.
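A standardized XML transaction record of the kind described can be built with Python's standard library; the element names and values below are invented, since the actual NIIS message schema is not given in the abstract.

```python
import xml.etree.ElementTree as ET

# Sketch of a standardized XML transaction record for lazy replication
# (element names invented; not the NIIS schema).
txn = ET.Element("transaction", id="42", site="clinic-07")
ET.SubElement(txn, "table").text = "immunization"
ET.SubElement(txn, "operation").text = "insert"
ET.SubElement(txn, "timestamp").text = "2008-01-15T09:30:00"

print(ET.tostring(txn, encoding="unicode"))
```

The transformation cost mentioned in the limitations comes from exactly this serialization/parsing round trip, traded against the format's portability across heterogeneous database platforms.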
A mathematical model for the allocation of data to sites in a distributed computer network is described in this paper. The model allows for the allocation of multiple copies of data to sites. The model considers tradeoffs between costs, transactions' requirements and systems' characteristics. Optimal and near optimal solutions are presented. The data allocation configuration can be determined in polynomial time.
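The trade-off the model captures, storage cost per copy versus query and update communication cost, can be made concrete with a brute-force search over copy placements. All rates, costs, and the single-origin update assumption below are invented for illustration; the paper's model admits polynomial-time solutions rather than enumeration.

```python
from itertools import combinations

sites = [0, 1, 2]
storage_cost = 2                 # cost per stored copy (invented)
query_rate = [5, 1, 3]           # query traffic per site (invented)
update_rate = 2                  # updates must reach every copy
dist = [[0, 1, 2],               # inter-site communication cost
        [1, 0, 1],
        [2, 1, 0]]

def total_cost(copies):
    c = storage_cost * len(copies)                       # storage
    c += sum(q * min(dist[s][t] for t in copies)         # query: nearest copy
             for s, q in zip(sites, query_rate))
    c += update_rate * sum(dist[0][t] for t in copies)   # update from site 0
    return c

# enumerate all non-empty copy sets and keep the cheapest
best = min((c for r in range(1, len(sites) + 1)
            for c in combinations(sites, r)), key=total_cost)
print(best, total_cost(best))  # (0,) 9
```

Adding a copy lowers query cost for nearby sites but raises storage and update cost, so the optimum depends on the query/update mix, which is the trade-off the formulation optimizes.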
The paper presents a structured approach to the problem of minimizing the join cost in a relational distributed environment. A tree model is used to present a query and a set of tree equivalence classes for query representation is identified corresponding to the space of all the feasible strategies to execute the query. The optimal strategy is then chosen by a dynamic programming approach which exploits the properties of the tree model, although the computational complexity remains exponential in the size of the problem.
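Dynamic programming over the space of equivalent query trees can be sketched in the classic subset-DP style: the best plan for each set of relations is built from the best plans for its subsets. The cardinalities, uniform selectivity, and cost-as-sum-of-intermediate-sizes model below are invented simplifications, not the paper's cost model.

```python
from itertools import combinations

card = {"R": 1000, "S": 100, "T": 10}   # relation cardinalities (invented)
sel = 0.01                               # uniform join selectivity (invented)

def join_card(c1, c2):
    return c1 * c2 * sel

rels = list(card)
# subset of relations -> (result cardinality, cheapest cost so far)
best = {frozenset([r]): (card[r], 0) for r in rels}

for size in range(2, len(rels) + 1):
    for subset in map(frozenset, combinations(rels, size)):
        for r in subset:                 # extend each left-deep subplan by r
            lc, lcost = best[subset - {r}]
            out = join_card(lc, card[r])
            cost = lcost + out           # cost = sum of intermediate sizes
            if subset not in best or cost < best[subset][1]:
                best[subset] = (out, cost)

print(best[frozenset(rels)])  # (100, 110): join S and T first, then R
```

The DP visits each subset once, so equivalent trees are compared without enumerating every permutation, though the state space is still exponential in the number of relations, matching the complexity noted in the abstract.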