K-means is a widely used clustering algorithm that is efficient and simple to implement. This study takes K-means as its starting point, explains improvements to the algorithm, and discusses the application of K-means to fruit image segmentation.
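As a rough illustration of how K-means could group pixels by colour for segmentation, the sketch below runs plain Lloyd-style K-means on RGB tuples. It is not the improved variant the abstract refers to, which is not described here; the function name `kmeans`, its parameters, and the sample pixel values are all assumptions made for the example.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration only; not the paper's improved method.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k initial centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point is labelled with its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Made-up RGB pixels: three reddish (fruit-like) and three greenish (leaf-like).
pixels = [
    (250, 10, 10), (240, 20, 15), (255, 5, 0),
    (10, 200, 20), (20, 210, 30), (15, 190, 25),
]
labels, centers = kmeans(pixels, k=2)
```

In an actual segmentation pipeline, the label of each pixel would be written back to its image position to form the segment mask; that step is omitted here.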
Understanding software modelers' difficulties and evaluating their performance is crucial to Model-Driven Engineering (MDE) education. The software modeling process contains fine-grained information about the modelers' analysis and thought processes. However, existing research primarily focuses on identifying obvious issues in the software modeling process, such as incorrect connections or misunderstandings, and neglects the behavioral patterns that can reveal underlying, unaddressed modeling problems. As a result, it misses deeper problems that do not manifest as obvious issues but still represent significant potential problems in the software modeling process. Our research concentrates on detecting and classifying problematic modeling behaviors from software modeling process data, revealing the potential problems hidden in the process for MDE education. Specifically, we first construct problematic modeling behavior patterns along three dimensions (anomalous time intervals, repetitions, and frequencies) to identify the characteristics and priorities relevant to problematic modeling behaviors. Then, we design rules based on these characteristics and priorities to detect and classify problematic modeling behaviors from the problematic patterns. To evaluate the effectiveness of our proposal, we apply it to a data-flow diagram modeling platform. This platform can record modelers' processes and has been used in practice in software engineering courses for five years. We conducted a case study with 12 participants. The macro F1 score for detecting and classifying problematic modeling behaviors is 82.3%, which shows the effectiveness of our approach. Then, to evaluate the usefulness of our proposal for assisting modeling instructors in MDE education, we conducted another case study with 5 modeling instructors. The results show that our approach can help instructors uncover problems hidden in the software modeling process. The results of two case studies demonstr
This study proposes a novel equivalent modelling method for wind farms (WFs), which can be used for small-signal stability analysis and research on damping control of low-frequency oscillation. First, a complete WF model described by a set of differential algebraic equations is established, and its linearised model is obtained at one of the steady-state operating points. Next, eigenanalysis and modal participation factor (MPF) analysis are adopted to evaluate the low-frequency oscillation modes and the corresponding modal participation of each wind turbine generator (WTG). A feature vector for each WTG, describing its oscillation characteristics, is extracted based on the MPFs. Afterwards, all WTGs are clustered into groups via a clustering algorithm, and the WTGs in the same group are aggregated into a single WTG using the weighted summation method. Furthermore, the criterion for the validity of the equivalent model is studied. The proposed modelling and validation methods are verified by simulations.
Clustering is an important research topic and core technology in data mining, and clustering algorithms have been studied in depth. Many different clustering algorithms now exist, each used in particular fields and by particular users. To use these algorithms more effectively, some researchers have derived standards for evaluating them. This paper evaluates clustering algorithms from another angle: using the overlap rate between clusters to compare them. Based on the concept of overlap rate, we generate data sets with a controlled overlap rate between clusters and controlled geometrical characteristics. We then use these data sets to evaluate clustering algorithms and determine their applicability.
We compare three common types of clustering algorithms for use with community data. TWINSPAN is divisive hierarchical, flexible-UPGMA is agglomerative and hierarchical, and ALOC is non-hierarchical. A balanced design six-factor model was used to generate 480 data sets of known characteristics. Recovery of the embedded clusters suggests that both flexible UPGMA and ALOC are significantly better than TWINSPAN. No significant difference existed between flexible UPGMA and ALOC.
The rapid development of online social networks has allowed users to obtain information, communicate with each other and express different opinions. Generally, within the same social network, users tend to be influenced by each other and hold similar views. However, on another social network, users may hold opposite views on the same event. Therefore, research undertaken on a single social network cannot meet the needs of hot topic community discovery. "Cross social network" refers to multiple social networks: information from multiple social network platforms is integrated to form a new unified dataset. In this dataset, information from different platforms about the same event may contain similar or unique topics. This paper proposes a hot topic discovery method for cross social networks. First, text data from different social networks are fused to build a unified model. Then, we obtain latent topic distributions from the unified model using the Labeled Biterm Latent Dirichlet Allocation (LB-LDA) model. Based on these distributions, similar topics are clustered to form several topic communities. Finally, we choose hot topic communities based on their scores. Experimental results on data from three social networks show that our model is effective and has some practical application value.
This article presents a dataset containing messages from the Digital Teaching Assistant (DTA) system, which records the results from the automatic verification of students' solutions to unique programming exercises of 11 different types. These results are automatically generated by the system, which automates a massive Python programming course at MIREA-Russian Technological University (RTU MIREA). The DTA system is trained to distinguish between approaches to solving programming exercises, as well as to identify correct and incorrect solutions. Its source-code analysis relies on intelligent algorithms that build vector representations of programs based on Markov chains, calculate pairwise Jensen-Shannon distances between programs, and use a hierarchical clustering algorithm to detect the high-level approaches used by students in solving unique programming exercises. In the process of learning, each student must correctly solve 11 unique exercises in order to receive admission to the intermediate certification in the form of a test. In addition, a motivated student may try to find additional approaches to solve exercises they have already solved. At the same time, not all students are able or willing to solve the 11 unique exercises proposed to them; some will resort to outside help in solving all or part of the exercises. Since all information about the interactions of the students with the DTA system is recorded, it is possible to identify different types of students. First of all, the students can be classified into 2 classes: those who failed to solve 11 exercises and those who received admission to the intermediate certification in the form of a test, having solved the 11 unique exercises correctly. However, it is possible to identify classes of typical, motivated and suspicious students among the latter group based on the proposed dataset. The proposed dataset can be used to develop regression models that will predict outbursts of student activ
Methods
Ethics Approval
The Icahn School of Medicine at Mount Sinai’s Program for the Protection of Human Subjects approved and granted a waiver of consent for this secondary data analysis study (STUDY-21-00157), which was conducted in accordance with the Helsinki Declaration.
Using the Google Maps Geocoding application programming interface (API), we obtained latitude/longitude coordinates of patient residences, which served as the input data to our algorithm.
                                                                  Value
Patient demographics
  Patients who are homebound                                      428
  Average age (years)                                             83.9
  Average Elixhauser comorbidity score [6] (a)                    3.8
  Sex, n (%)
    Female                                                        323 (75.5)
    Male                                                          105 (24.5)
  Racial/ethnic identity, n (%)
    White                                                         148 (34.6)
    Black or African American                                     63 (14.7)
    Asian                                                         15 (3.5)
    Hispanic                                                      108 (25.2)
    Other                                                         86 (20.1)
    Unknown                                                       8 (1.9)
  Patient’s family members and caregivers, n                      92
Vaccination campaign statistics
  Providers vaccinating per day (n), range                        3-6
  Average number of patients vaccinated per day                   22.1
  Average duration of provider time spent vaccinating (hours)     4.6
  Average duration of individual vaccination (including transit
  time, vaccine administration, and 15-minute postvaccination
  observation time; minutes)                                      52
(a) Elixhauser scores were available for 372 of the patients who are homebound.
ISBN: (print) 9781479937066
Agglomerative clustering is an algorithm that merges the most similar pair of samples into the same cluster at every iteration. The similarity evaluation function is traditionally designed by hand, but recent interest focuses on supervised or semi-supervised learning, where ground-truth clustered data are available for training. This paper first describes how to train a similarity function by treating it as the action-value function in reinforcement learning. Then, the agglomerative clustering algorithm with superpixels is applied to segment a challenging dataset of brain images. The experimental results demonstrate that the proposed method remarkably improves segmentation accuracy.
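The merge-the-most-similar-pair loop described in this abstract can be sketched as follows. The similarity function here is a hand-designed placeholder (negative mean absolute difference); the paper instead learns it as an action-value function, which this sketch does not attempt. The names `agglomerate` and `avg_link_similarity` and the sample values are assumptions made for the example.

```python
def avg_link_similarity(a, b):
    """Hypothetical hand-designed similarity: negative mean absolute difference."""
    return -sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def agglomerate(items, similarity, n_clusters):
    """Repeatedly merge the most similar pair of clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[x] for x in items]  # start with one cluster per sample
    while len(clusters) > n_clusters:
        best = None
        # Scan every pair of clusters and remember the most similar one.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = similarity(clusters[i], clusters[j])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Made-up 1-D samples standing in for superpixel features.
result = agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], avg_link_similarity, 3)
```

Passing the similarity function as an argument keeps the merge loop unchanged if the hand-designed placeholder is later swapped for a trained one.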
Wireless sensor networks (WSNs) are a special type of ad hoc network and an emerging technology with growing success in the scientific, logistical, and military areas. They not only deliver benefits to the customer in a technologically sophisticated way but also offer high flexibility. However, the size of the sensors is an important limitation, mainly in terms of energy autonomy and lifetime, because the battery must be very small. For this reason, many studies currently focus on managing the energy consumed by the sensors in the network. With this in mind, we propose an algorithm that improves quality of service using a clustering approach. To confirm the improvements provided by our algorithm, we run a simulation in MATLAB, in which its performance is evaluated and compared with existing clustering protocols (LEACH and SEP).