Networks are the basis of information exchange. With the rapid growth of information, the network environment has become complex, and network data has become increasingly heterogeneous. Managing these heterogeneous data effectively is an urgent problem. To address the long processing times and poor efficiency of current approaches to processing large heterogeneous data, a parallel clustering algorithm is proposed to achieve effective data processing.
ISBN: 9781728185262 (print)
Clustering is one of the most important methods for discovering the intrinsic grouping in a set of unlabeled data. As data become easier to obtain from more varied sources, the amount of data to be processed is increasing exponentially, and the data are more likely to be located at different clients. Traditional clustering methods cannot process a large dataset in one pass because of memory limits. In this paper, an Image Scaling Density-based Clustering (ISDC) algorithm is proposed. ISDC can run on a single client, or in parallel across several clients to handle data located at different clients. The ISDC algorithm does not require any parameters to be set manually; they are determined by the algorithm from the statistical features of the dataset. In parallel ISDC (PISDC), the data block at each client is clustered independently to form intermediate clusters. A border detection algorithm then forms representative clusters from the points at the edges of the intermediate clusters. Next, in global clustering, the server merges the representative clusters from all clients. The border detection algorithm reduces the communication cost between the clients and the server and increases the efficiency of global clustering. Finally, the server feeds the clustering information back to the clients to complete the clustering. Our experimental results verify the effectiveness and efficiency of PISDC and ISDC.
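The border detection step described above can be illustrated with a minimal sketch. This is not the ISDC border detection algorithm, whose details the abstract does not give. It assumes, purely for illustration, that an intermediate cluster is a set of integer grid cells and that a cell lies on the border when at least one of its four neighbours is outside the cluster. The names `Cell` and `border_cells` are hypothetical.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # hypothetical grid-cell coordinate (x, y)


def border_cells(cluster: Set[Cell]) -> Set[Cell]:
    """Return the cells of `cluster` with at least one 4-neighbour outside it.

    Assumption: clusters are sets of grid cells; this is an illustrative
    stand-in, not the border detection used by ISDC.
    """
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (x, y)
        for (x, y) in cluster
        if any((x + dx, y + dy) not in cluster for dx, dy in offsets)
    }


# A filled 4x4 block: only the outer ring is kept as the representative set.
block = {(x, y) for x in range(4) for y in range(4)}
representatives = border_cells(block)
```

Under these assumptions a client would send only `representatives` to the server instead of the full cluster, which is the source of the reduced communication cost the abstract mentions.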
Automatic segmentation of images, which is now feasible through an increase in available computing power, has become an important challenge in many fields. A key technology for obtaining such images is optical coherence tomography (OCT), which is already widely applied in ophthalmology and, more recently, in the pharmaceutical industry as a method for real-time monitoring of solid oral dosage form coating processes. Accurately detecting the boundaries of objects in OCT images is required for a meaningful automatic evaluation. During in-line monitoring, the evaluation time for each image is a crucial factor in enabling the real-time analysis of large amounts of data. The segmentation of images has previously been achieved via machine learning methods, which generally require a large number of training examples. This work aims to overcome this limitation by employing unsupervised machine learning for the segmentation of OCT images of coated pharmaceutical tablets. An adapted clustering method was specifically developed to achieve fast real-time detection of the coating layer's boundaries in OCT-generated images. A newly developed parallel implementation of DBSCAN, which is well suited for image evaluation, makes it possible to use this novel method for real-time process analytical technology (PAT) applications. This approach has been shown to be significantly faster than previously established methods for segmenting similar OCT images. Furthermore, the image-specific parallelized DBSCAN algorithm has been shown to be around three times faster than other parallel implementations.
With the rapid development of the Internet and the mobile Internet, the amount of data and information generated by people has increased dramatically, and the demand for rapid processing of data by computers has become increasingly urgent. Clustering analysis is one of the most important data processing methods. Existing clustering algorithms have a high time complexity when calculating the center point, and they consume large amounts of resources and execute inefficiently when processing massive data serially. Therefore, efficient and accurate parallel clustering algorithms need to be studied. This paper introduces the parallel mechanism and its computing platforms, summarizes the existing parallel clustering algorithms, classifies them according to clustering method, and discusses the key technologies and platforms proposed in existing work.
ISBN: 9781538666142 (print)
Multiple emerging technologies, such as social networks and IoT, generate huge amounts of data on a daily basis. This leads us to analyze and cluster these data so that we can uncover hidden values and patterns. DBSCAN is a powerful clustering algorithm that detects patterns by clustering data based on density; it classifies each point as a core point, a border point, or noise. DBSCAN is already used in many applications, such as retail business, medical imaging, and text mining. However, the existence of advanced networks and sophisticated machines has increased the need to move traditional clustering algorithms from a single-node environment to a parallel multi-node environment. In our paper, we present a solution that parallelizes DBSCAN by using a Quadtree data structure. Our solution divides the dataset into smaller chunks and then uses parallel programming frameworks such as Map-Reduce to provide an infrastructure for storing and processing these chunks. We use various training sets to evaluate the performance of both traditional DBSCAN and our Map-Reduce DBSCAN prototype. We analyze our solution in terms of time complexity, efficiency, scalability, value, and accuracy. Our analysis illustrates the benefits of parallelized DBSCAN clustering and shows the usefulness of managing subsets of data with a Quadtree data structure.
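The idea of dividing a dataset into smaller chunks with a Quadtree can be sketched as follows. This is a minimal illustration, not the authors' prototype. It assumes 2D points and a leaf capacity chosen by the caller, and it leaves out the handling of clusters that cross chunk boundaries and the Map-Reduce stage. The names `quadtree_chunks`, `capacity`, and `max_depth` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def quadtree_chunks(points: List[Point], box: Box, capacity: int,
                    max_depth: int = 16) -> List[List[Point]]:
    """Recursively split `box` into quadrants until each leaf holds at most
    `capacity` points (or `max_depth` is reached); return the non-empty leaves.
    """
    if len(points) <= capacity or max_depth == 0:
        return [points] if points else []

    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2

    # Key: (is_right_half, is_top_half) -> points falling in that quadrant.
    quadrants: Dict[Tuple[bool, bool], List[Point]] = {
        (False, False): [], (True, False): [],
        (False, True): [], (True, True): [],
    }
    for p in points:
        quadrants[(p[0] >= xmid, p[1] >= ymid)].append(p)

    chunks: List[List[Point]] = []
    for (right, top), subset in quadrants.items():
        sub_box = (xmid if right else xmin, ymid if top else ymin,
                   xmax if right else xmid, ymax if top else ymid)
        chunks.extend(quadtree_chunks(subset, sub_box, capacity, max_depth - 1))
    return chunks


points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.8, 0.1), (0.1, 0.8)]
chunks = quadtree_chunks(points, (0.0, 0.0, 1.0, 1.0), capacity=2)
```

Each returned chunk could then be clustered independently, for example by one map task per chunk. The `max_depth` guard is an added safety measure so that many identical points cannot cause unbounded recursion.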