Cloud storage is essential for managing user data to store and retrieve from the distributed data *** storage service is distributed as pay a service for accessing the size to collect the *** to the massive amount of data stored in the data centre containing similar information and file structures remaining in multiple copies, duplication leads to increased storage *** potential deduplication system doesn't achieve efficient data reduction because of inaccuracy in finding similar data *** creates a complex nature to increase the storage consumption under *** resolve this problem, this paper proposes an efficient storage reduction method called Hash-Indexing Block-based Deduplication (HIBD) based on Segmented Bind Linkage (SBL) methods for reducing storage in a cloud ***, preprocessing is done using the sparse augmentation ***, the preprocessed files are segmented into blocks to make *** block of the contents is compared with other files through Semantic Content Source Deduplication (SCSD), which identifies the presence of similar content between the *** on the content presence count, the Distance Vector Weightage Correlation (DVWC) estimates the document similarity weight, and related files are grouped into a ***, the segmented bind linkage compares the documents to find duplicate content in the cluster using the similarity weight based on the coefficient match *** implementation helps identify data redundancy efficiently and reduces the service cost in distributed cloud storage.
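The general idea of hash-indexed, block-level deduplication mentioned in this abstract can be illustrated with a minimal sketch. It is not the HIBD method itself: it assumes fixed-size blocks and exact matching by SHA-256 digest, whereas the abstract describes similarity-based matching (SCSD, DVWC, SBL) that is not reproduced here. The names `deduplicate`, `restore`, and the sample file contents are hypothetical.

```python
import hashlib

def deduplicate(files, block_size=8):
    """Store each distinct fixed-size block once, keyed by its SHA-256 digest."""
    index = {}    # hash index: digest -> block bytes
    recipes = {}  # file name -> ordered list of digests needed to rebuild it
    for name, data in files.items():
        digests = []
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).hexdigest()
            index.setdefault(digest, block)  # keep only the first copy
            digests.append(digest)
        recipes[name] = digests
    return index, recipes

def restore(index, recipe):
    """Rebuild a file from its list of block digests."""
    return b"".join(index[d] for d in recipe)

# Hypothetical files that share their first two 8-byte blocks.
files = {
    "a.txt": b"AAAAAAAABBBBBBBBCCCCCCCC",
    "b.txt": b"AAAAAAAABBBBBBBBDDDDDDDD",
}
index, recipes = deduplicate(files)
raw_bytes = sum(len(d) for d in files.values())
stored_bytes = sum(len(b) for b in index.values())
print(raw_bytes, stored_bytes)
```

In this toy input the two files share two of their three blocks, so the index holds four distinct blocks instead of six.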
Free speech is essential, but it can conflict with protecting marginalized groups from harm caused by hate speech. Social media platforms have become breeding grounds for this harmful content. While studies exist to detect hate speech, there are significant research gaps. First, most studies used text data instead of other modalities such as video or audio. Second, most studies explored traditional machine learning algorithms. However, as computational tasks grow more complex, there is a need to employ more complex techniques and methodologies. Third, the majority of research studies have either been evaluated using very few evaluation metrics or not statistically evaluated at all. Lastly, due to the opaque, black-box nature of complex classifiers, there is a need to use explainability techniques. This research aims to address these gaps by detecting hate speech in English and Kiswahili using videos manually collected from YouTube. The videos were converted to text and used to train various classifiers. The performance of these classifiers was evaluated using various evaluation and statistical measurements. The experimental results suggest that the random forest classifier achieved the highest results for both languages across all evaluation measurements compared to all classifiers used. The results for English were: accuracy 98%, AUC 96%, precision 99%, recall 97%, F1 98%, specificity 98% and MCC 96%, while the results for Kiswahili were: accuracy 90%, AUC 94%, precision 93%, recall 92%, F1 94%, specificity 87% and MCC 75%. These results suggest that the random forest classifier is robust, effective and efficient in detecting hate speech in any language. This also implies that the classifier is reliable in detecting hate speech and other related problems in social media. However, to understand the classifiers’ decision-making process, we used the Local Interpretable Model-agnostic Explanations (LIME) technique to explain the
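Most of the evaluation measurements listed in this abstract (accuracy, precision, recall, F1, specificity, MCC) can be derived from a binary confusion matrix. The sketch below uses the standard definitions; the counts passed to it are hypothetical and are not taken from the study. AUC is omitted because it requires per-sample scores rather than counts.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which makes it a quick sanity check when reading reported results.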
In the realm of smart healthcare, vast amounts of valuable patient data are generated worldwide. However, healthcare providers face challenges in data sharing due to privacy concerns. Federated learning (FL) offers a ...
Biosignal representation learning (BRL) plays a crucial role in emotion recognition for game users (ERGU). Unsupervised BRL has garnered attention because ground truth emotion labels are difficult to obtain from game users. However, unsupervised BRL in ERGU faces challenges, including overfitting caused by limited data and performance degradation due to unbalanced sample distributions. To address these challenges, we propose a novel method of biosignal contrastive representation learning (BCRL) for ERGU, which not only serves as a unified representation learning approach applicable to various modalities of biosignals but also derives generalized biosignal representations suitable for different downstream tasks. Specifically, we address overfitting by introducing perturbations at the embedding layer based on projected gradient descent (PGD) adversarial attacks, and we develop a sample balancing strategy (SBS) to mitigate the negative impact of unbalanced samples on performance. Further, we have conducted comprehensive validation experiments on a public dataset, yielding the following key observations: 1) BCRL outperforms all other methods, achieving average accuracies of 76.67%, 71.83%, and 63.58% in 1D-2C Valence, 1D-2C Arousal and 2D-4C Valence/Arousal, respectively; 2) the ablation study shows that both the PGD module (+7.58% in accuracy on average) and the SBS module (+14.60% in accuracy on average) have a positive effect on the performance of different classifications; 3) the BCRL model exhibits a certain generalization ability across different games, subjects and classifiers. IEEE
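The PGD-style embedding perturbation mentioned in this abstract can be sketched in a few lines. This is only an illustration of the generic PGD update under an L-infinity bound: it assumes a linear scorer with a logistic loss as a stand-in, not the contrastive loss or encoder used in BCRL, and the function names, weights, and step sizes are hypothetical.

```python
import math

def logistic_loss(w, e, y):
    """Logistic loss of a linear scorer w on embedding e with label y in {-1, +1}."""
    margin = y * sum(wi * ei for wi, ei in zip(w, e))
    return math.log1p(math.exp(-margin))

def loss_grad(w, e, y):
    """Gradient of the logistic loss with respect to the embedding e."""
    margin = y * sum(wi * ei for wi, ei in zip(w, e))
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def sign(x):
    return (x > 0) - (x < 0)

def pgd_perturb(w, e, y, eps=0.1, alpha=0.04, steps=5):
    """Ascend the loss with signed gradient steps, projecting onto |delta_i| <= eps."""
    delta = [0.0] * len(e)
    for _ in range(steps):
        grad = loss_grad(w, [ei + di for ei, di in zip(e, delta)], y)
        delta = [
            max(-eps, min(eps, di + alpha * sign(gi)))  # step, then project
            for di, gi in zip(delta, grad)
        ]
    return [ei + di for ei, di in zip(e, delta)]

# Hypothetical weights and embedding, for illustration only.
w = [1.0, -2.0, 0.5]
e = [0.3, -0.1, 0.8]
eps = 0.1
adv = pgd_perturb(w, e, y=1, eps=eps)
clean_loss = logistic_loss(w, e, 1)
adv_loss = logistic_loss(w, adv, 1)
print(clean_loss, adv_loss)
```

Training on such perturbed embeddings alongside the clean ones is the usual way this kind of perturbation acts as a regularizer.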
As a result of its aggressive nature and late identification at advanced stages, lung cancer is one of the leading causes of cancer-related deaths. Early diagnosis of lung cancer is a serious and difficult challenge that...
Parkinson's disease (PD) diagnosis involves the assessment of a variety of motor and non-motor symptoms. To accurately diagnose PD, it is necessary to differentiate its symptoms from those of other conditions. Dur...
Today's world is fully dependent on data. Data are individual packets or units of information which, when processed, lead to useful information that in turn helps in decision making. So these data are to be shared am...
In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AI×DB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, and self-driving capabilities for improved system performance. In this paper, we explore the evolution of data systems with a focus on deepening the fusion of AI and DB. We present NeurDB, an AI-powered autonomous data system designed to fully embrace AI design in each major system component and provide in-database AI-powered analytics. We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plans.
Most of the search-based software remodularization (SBSR) approaches designed to address the software remodularization problem (SRP) utilize only structural information-based coupling and cohesion quality ***, in practice, apart from these quality criteria, other aspects of coupling and cohesion quality criteria, such as lexical and change history, are required in designing the modules of the software ***, consideration of limited aspects of software information in the SBSR may generate a sub-optimal modularization ***, such modularization can be good from the quality metrics perspective but may not be acceptable to the *** produce a remodularization solution acceptable from both quality metrics and developers' perspectives, this paper exploited more dimensions of software information to define the quality criteria as modularization ***, these objectives are simultaneously optimized using a tailored many-objective artificial bee colony (MaABC) to produce a remodularization *** assess the effectiveness of the proposed approach, we applied it over five software *** obtained remodularization solutions are evaluated with the software quality metrics and developers' view of *** demonstrate that the proposed software remodularization is an effective approach for generating good quality modularization solutions.
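Structural coupling and cohesion, the baseline criteria this abstract refers to, can be illustrated by counting dependency edges inside and across modules. The sketch below is a simplified assumption, not the paper's objective functions: it ignores the lexical and change-history dimensions and the MaABC optimizer, and the dependency graph and module assignments are hypothetical.

```python
def coupling_and_cohesion(edges, module_of):
    """Count intra-module edges (cohesion) and inter-module edges (coupling)."""
    intra = sum(1 for a, b in edges if module_of[a] == module_of[b])
    inter = len(edges) - intra
    return intra, inter

# Hypothetical dependency edges between five classes.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "C")]

# Two candidate modularizations of the same classes.
good = {"A": "m1", "B": "m1", "C": "m1", "D": "m2", "E": "m2"}
poor = {"A": "m1", "D": "m1", "B": "m2", "C": "m2", "E": "m2"}

good_scores = coupling_and_cohesion(edges, good)
poor_scores = coupling_and_cohesion(edges, poor)
print(good_scores, poor_scores)
```

A search-based approach would treat such counts as objectives to maximize (cohesion) and minimize (coupling) while exploring candidate assignments.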
Handwritten documents generated in day-to-day office work, classrooms and other sectors of society carry vital information. Automatic processing of these documents is a pipeline of many challenging steps. The very...