Imbalanced datasets are a common challenge in machine learning, particularly for classification methods. Underrepresented minority classes can result in biased and inaccurate models. The Synthetic Minority Over-Sampling Technique (SMOTE) was developed to address the problem of imbalanced data. Over time, several weaknesses have been identified in the way SMOTE generates synthetic minority class data, such as overlapping, noise, and small disjuncts. However, existing studies generally address only one of these weaknesses, either noise or overlapping. This study therefore addresses both issues simultaneously in SMOTE-generated data. It proposes a combined approach of filtering, clustering, and distance modification to reduce the noise and overlapping produced by SMOTE. Filtering uses the k-NN method to remove minority class data (noise) located in majority class regions. This Noise Reduction (NR) step, which removes data considered to be noise before SMOTE is applied, has a positive impact in overcoming data imbalance. Clustering establishes decision boundaries by partitioning the data into clusters, allowing SMOTE with a modified distance metric to generate minority class data within each cluster. Combining clustering and distance modification with SMOTE aims to minimize overlap in the synthetic minority data, which could otherwise introduce noise. The proposed method, called “NR-Clustering SMOTE,” balances data in several stages: (1) filtering, which uses the k-NN method to remove minority class data that lie close to the majority class (noise); (2) clustering with K-means, which establishes decision boundaries by partitioning the data into several clusters; (3) SMOTE oversampling with Manhattan distance within each cluster. Test results indicate that the proposed NR-Clustering SMOTE method achieves the best performance across all evaluation metrics for classification methods such as Random Forest, SVM, and Naïve Bayes, compared t
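The three stages described in the abstract above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`knn_filter`, `kmeans`, `smote_in_cluster`, `nr_clustering_smote`) are hypothetical, the filter treats a minority point as noise only when all of its k nearest neighbours belong to the majority class, Manhattan distance is also used in the filter, and K-means is run on the filtered minority points alone. The abstract does not specify any of these details.

```python
import random


def manhattan(a, b):
    """Manhattan (L1) distance between two points given as tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_filter(minority, majority, k=3):
    """Stage 1 (assumed rule): drop minority points whose k nearest
    neighbours are all majority points."""
    kept = []
    for i, p in enumerate(minority):
        # (distance, is_majority) pairs for every other point in the data
        cand = [(manhattan(p, q), 0) for j, q in enumerate(minority) if j != i]
        cand += [(manhattan(p, q), 1) for q in majority]
        nearest = sorted(cand)[:k]
        if not all(is_major for _, is_major in nearest):
            kept.append(p)
    return kept


def kmeans(points, n_clusters, rng, iters=20):
    """Stage 2: plain K-means (squared Euclidean); returns lists of points."""
    centers = rng.sample(points, n_clusters)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            best = min(range(len(centers)),
                       key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            groups[best].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups


def smote_in_cluster(cluster, n_new, k, rng):
    """Stage 3: SMOTE interpolation restricted to one cluster, with
    neighbours ranked by Manhattan distance."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(cluster))
        p = cluster[i]
        others = [q for j, q in enumerate(cluster) if j != i]
        q = rng.choice(sorted(others, key=lambda o: manhattan(p, o))[:k])
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


def nr_clustering_smote(minority, majority, n_clusters=2, k=3, seed=0):
    """Return the filtered minority points plus enough synthetic points
    to match the size of the majority class."""
    rng = random.Random(seed)
    clean = knn_filter(minority, majority, k)
    groups = kmeans(clean, min(n_clusters, len(clean)), rng)
    groups = [g for g in groups if len(g) >= 2]  # need two points to interpolate
    need = max(len(majority) - len(clean), 0)
    synthetic = []
    for i, g in enumerate(groups):
        share = need // len(groups) + (1 if i < need % len(groups) else 0)
        synthetic += smote_in_cluster(g, share, k, rng)
    return clean + synthetic
```

Because interpolation never crosses a cluster boundary, a synthetic point cannot fall on the line between two distant minority groups, which is the overlap that the clustering stage is meant to prevent.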
A robust system for backlit keyboard inspection is presented. A backlit keyboard offers a range of changeable colors and has laser-marked keys. The keys can be divided into regions of function keys, normal keys, and number keys. Several types of defects may occur: an incorrect illuminated area, non-uniform illumination of a specified inspection region (IR), and incorrect luminance and intensity of an individual key. Because the illumination features of a backlit keyboard are too complex for a human inspector to check on the production line, this paper proposes an automated inspection system. The system is divided into an operation module and an inspection module. A set of image processing methods was developed to detect these defects. Experimental results demonstrate the robustness and effectiveness of the proposed system.
In this paper, we propose a lossless data hiding algorithm for grayscale images. Specifically, our technique is based on the cluster-based difference expansion transform. The main scenario behind our technique is that...
As more and more undergraduate students act as voluntary tutors to rural pupils after school, there is a growing need for a resource repository to support tutors during their tutoring process. However, when tutoring r...
In the field of image-based drug discovery, capturing the phenotypic response of cells to various drug treatments and perturbations is a crucial step. This process involves transforming high-throughput cellular images into quantitative representations for downstream analysis. However, existing methods require computationally extensive and complex multi-step procedures, which can introduce inefficiencies, limit generalizability, and increase potential errors. To address these challenges, we present PhenoProfiler, an innovative model designed to efficiently and effectively extract morphological representations, enabling the elucidation of phenotypic changes induced by treatments. PhenoProfiler is designed as an end-to-end tool that processes whole-slide multi-channel images directly into low-dimensional quantitative representations, eliminating the extensive computational steps required by existing methods. It also includes a multi-objective learning module to enhance robustness, accuracy, and generalization in morphological representation learning. PhenoProfiler is rigorously evaluated on large-scale publicly available datasets, including over 230,000 whole-slide multi-channel images in end-to-end scenarios and more than 8.42 million single-cell images in non-end-to-end settings. Across these benchmarks, PhenoProfiler consistently outperforms state-of-the-art methods by up to 20%, demonstrating substantial improvements in both accuracy and robustness. Furthermore, PhenoProfiler uses a tailored phenotype correction strategy to emphasize relative phenotypic changes under treatments, facilitating the detection of biologically meaningful signals. UMAP visualizations of treatment profiles demonstrate PhenoProfiler’s ability to effectively cluster treatments with similar biological annotations, thereby enhancing interpretability. These findings establish PhenoProfiler as a scalable, generalizable, and robust tool for phenotypic learning, offering transformative advancement
The paper presents a method and a collection of techniques for conducting virtual excavations in online social networking services. YouTube and its Data API are used as a case study of a virtual settlement. The object...
Khodaei and Faez proposed a new adaptive data hiding technique based on LSB substitution and pixel-value differencing. Their algorithm can embed a large amount of secret data while maintaining acceptable image quality. However, it has only a fixed embedding capacity. In addition, the derivation for three consecutive pixels in the boundary region is handled poorly under raster scan order, resulting in inaccurate pixel differences. Finally, an overflow problem may occur in some embedding cases. In this study, we adopt non-overlapping blocks of m-by-n pixels to address these problems. The cover image is first partitioned into non-overlapping blocks. LSB substitution and the optimal pixel adjustment process are then employed to embed the secret message into the central pixel of each block. The message is embedded into the remaining pixels of the same block using a pixel-value differencing scheme. The experimental results show that our proposed algorithm can achieve an embedding capacity that is adjustable according to the block size. The proposed technique is feasible in adaptive data hiding.
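The central-pixel step mentioned above, LSB substitution followed by the optimal pixel adjustment process (OPAP), can be illustrated for a single 8-bit pixel. The sketch below shows the general OPAP idea as it is commonly described, not the authors' exact procedure: the function names `embed_lsb_opap` and `extract_lsb` are hypothetical, the number of embedded bits `k` is a free parameter here, and block partitioning and the pixel-value differencing stage are omitted.

```python
def embed_lsb_opap(pixel, bits, k):
    """Embed the k-bit integer `bits` into an 8-bit `pixel`.

    Step 1: replace the k least significant bits with `bits`.
    Step 2 (OPAP): if adding or subtracting 2**k brings the stego pixel
    closer to the original without leaving [0, 255], do so. The k LSBs
    are unchanged by that adjustment, so extraction is unaffected.
    """
    full = 1 << k          # 2**k
    half = 1 << (k - 1)    # 2**(k-1)
    stego = ((pixel >> k) << k) | bits
    diff = stego - pixel
    if half < diff < full and stego >= full:
        stego -= full
    elif -full < diff < -half and stego < 256 - full:
        stego += full
    return stego


def extract_lsb(pixel, k):
    """Recover the k embedded bits from a stego pixel."""
    return pixel & ((1 << k) - 1)


# Plain substitution would turn 96 into 103 (error 7); OPAP yields 95 (error 1).
stego = embed_lsb_opap(96, 0b111, 3)
recovered = extract_lsb(stego, 3)
```

The range checks in the two branches ensure that the adjusted value stays within 0 to 255, which is one way to keep this step free of the overflow cases mentioned in the abstract.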
We present an efficient information hiding algorithm for polygonal models. The referencing neighbors of each embeddable vertex are chosen by a modified breadth-first search, starting from an initial polygon determined by principal component analysis. The surface complexity is then estimated from the distance between the embedding vertex and the center of its referencing neighbors. Different amounts of secret message are adaptively embedded according to the surface properties of each vertex. A constant threshold is employed to control the maximum embedding capacity of each vertex while also reducing model distortion. The experimental results show that the proposed algorithm is efficient and provides higher robustness, higher embedding capacity, and lower model distortion than previous work, with acceptable estimation accuracy. The proposed technique is feasible in 3D adaptive information hiding.
Multi-processor systems-on-chip (MPSoCs) require high-performance, scalable, and power-efficient communication infrastructures. Recent research considers on-chip serial links for communication fabrics as a solution to ...
The paper elaborates on the concept of transformable boundary artifacts and their role in fostering knowledge-based work in cross-organization virtual communities of practice. The domain of investigation is clinical p...