ISBN (digital): 9798331528348
ISBN (print): 9798331528355
When faced with large language datasets, machine translation language models are difficult to train efficiently, resulting in low translation accuracy and throughput. Therefore, this study constructs a large-scale training framework based on Apache Spark. Firstly, by installing Apache Spark and configuring its integration with the Hadoop Distributed File System (HDFS), a distributed computing environment capable of parallel data processing is established to support rapid processing of large-scale data. Then, the distributed storage system HDFS is adopted to optimize data access efficiency and reduce I/O (input/output) bottlenecks. Finally, a new iterative training strategy is implemented to gradually improve the translation accuracy of the model through incremental learning. Under the optimal configuration, combining a learning rate of 0.1 with a batch size of 2048 achieves a BLEU score of up to 30.8%, with training lasting only 4 hours. This study demonstrates the effectiveness of a Spark-based large-scale machine translation language model training system in improving translation efficiency and accuracy.
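The partition-and-aggregate pattern this abstract relies on can be illustrated in miniature, without Spark itself. The helpers `partition` and `map_partition` and the toy corpus below are hypothetical stand-ins for HDFS blocks and per-executor work, not the paper's actual pipeline:

```python
from functools import reduce

def partition(data, n):
    """Split data into n roughly equal partitions (an HDFS-block analogue)."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_partition(part):
    """Per-partition work: count tokens across each sentence pair."""
    return sum(len(src.split()) + len(tgt.split()) for src, tgt in part)

# Toy parallel corpus (source, target) standing in for a large dataset.
corpus = [("hello world", "bonjour le monde"), ("good day", "bonne journee"),
          ("thank you", "merci"), ("see you", "a bientot")]

parts = partition(corpus, 2)
# Spark would run map_partition on executors; here map() plays that role.
total_tokens = reduce(lambda a, b: a + b, map(map_partition, parts))
print(total_tokens)  # → 16
```

In Spark proper, the same shape appears as `sc.parallelize(corpus).mapPartitions(...).reduce(...)`, with HDFS providing the block-level storage the abstract describes.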
The identification of network devices is of vital importance for strengthening network identity management and maintaining cyberspace security. However, traditional device identification technologies based on the MAC address, IP address, or other explicit identifiers can be defeated if the identifier is hidden or tampered with. Meanwhile, existing device fingerprinting technology is also restricted by its limited performance and excessive latency. In order to realize device identification in a high-speed network environment, the PFQ kernel module and Storm are used for high-speed packet capture and online traffic analysis, respectively. On this basis, a novel device fingerprinting technology based on runtime environment analysis is proposed, which employs logistic regression to implement online identification with a sliding window mechanism, reaching a recognition accuracy of 77.03% over a 60-minute period. Moreover, performance test results show that the proposed technology can support over 10 Gbps traffic capture and online analysis, and the system architecture proved practical and extensible in testing.
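The combination of logistic regression with a sliding window can be sketched as follows. The feature names, weights, and threshold are illustrative assumptions, not the paper's trained model:

```python
import math
from collections import deque

# Hypothetical, hand-set model: in the paper these weights would come
# from training logistic regression on runtime-environment features.
WEIGHTS = {"pkt_rate": 0.8, "mean_len": -0.3, "ttl_var": 0.5}
BIAS = -0.2

def score(features):
    """Logistic-regression probability that the traffic matches the device."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

window = deque(maxlen=3)  # sliding window of recent per-interval scores
for obs in [{"pkt_rate": 1.0, "mean_len": 0.5, "ttl_var": 0.1},
            {"pkt_rate": 0.9, "mean_len": 0.4, "ttl_var": 0.2},
            {"pkt_rate": 1.1, "mean_len": 0.6, "ttl_var": 0.1}]:
    window.append(score(obs))

# Online decision: average probability over the window vs. a threshold.
avg = sum(window) / len(window)
print(avg > 0.5)  # → True
```

Averaging over the window smooths per-interval noise, which is why a windowed decision can stay stable over a 60-minute observation period.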
ISBN (print): 9781665410540
Cardiovascular disease is among the leading causes of death worldwide. It is the umbrella term for all diseases that affect the heart or blood vessels; heart disease is one type of cardiovascular disease. It can be detected early by examining the presence of arrhythmia. Arrhythmia is an abnormal heart rhythm that is commonly diagnosed and evaluated by analyzing electrocardiogram (ECG) signals. In classical practice, a cardiologist or clinician uses an ECG to monitor a patient's heart rate and rhythm and then reads the patient's activity journal to diagnose the presence of arrhythmias and develop an appropriate treatment plan. However, these classical techniques take time and effort. The diagnosis of arrhythmias has therefore moved toward computational processes, such as arrhythmia detection and classification using machine learning and deep learning. A convolutional neural network (CNN) is a popular method for classifying arrhythmia, and dataset pre-processing is also important for achieving the best-performing models. The MIT-BIH Arrhythmia Database was used as our dataset. Our study used EfficientNet-V2, a type of convolutional neural network, to classify five types of arrhythmias. In pre-processing, the ECG signal was cut into 1-second segments (360 samples each), signal augmentation was applied to balance the amount of data in each class, and the Continuous Wavelet Transform (CWT) was employed to transform each ECG segment into a scalogram. The dataset was then distributed into subsets using a modulo operation to obtain varied data in each subset, and a colormap was applied to convert the scalograms into RGB images. With this scheme, our study achieved higher accuracy than existing methods, with an accuracy rate of 99.97%.
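The segmentation and modulo subset-distribution steps can be sketched as follows; the synthetic `signal` is a placeholder for real MIT-BIH samples, and the CWT/colormap stages are omitted:

```python
# MIT-BIH records are sampled at 360 Hz, so a 1-second segment is 360 samples.
FS = 360
N_SUBSETS = 5

signal = list(range(FS * 4))  # stand-in for 4 seconds of ECG samples

# Cut the trace into non-overlapping 1-second windows.
segments = [signal[i:i + FS] for i in range(0, len(signal), FS)]

# Assign segments round-robin to subsets via modulo, so each subset
# sees a spread of positions from across the record.
subsets = {k: [] for k in range(N_SUBSETS)}
for idx, seg in enumerate(segments):
    subsets[idx % N_SUBSETS].append(seg)

print(len(segments), [len(v) for v in subsets.values()])  # → 4 [1, 1, 1, 1, 0]
```

In the full pipeline each 360-sample segment would then pass through the CWT to produce a scalogram before colormapping to RGB.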
Recently, many researchers have been studying the feasibility of supporting medical diagnosis through the World Wide Web, and there have been many attempts to construct distributed databases of medical images connected by computer networks. In such systems, efficient and easy presentation of three-dimensional (3-D) medical images is a quite important tool. The authors built a 3-D presentation system for the World Wide Web with a practical example image. The target 3-D object is a human head reconstructed from 177 MRI cross-section slices, and the data for this display are described in the VRML 2.0 format. The observer can view the 3-D head model from any viewpoint and, furthermore, can manipulate the model over the network in a simple manner. VRML is now a popular, standard language for describing 3-D objects, so no special software or instruments need to be prepared, though some procedure is needed to use it efficiently. Here, the authors present an interactive 3-D presentation procedure for medical images on the network using VRML 2.0, and note that the procedure is quite useful for remote medical service, teaching medical students, and giving explanations to patients.
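A VRML 2.0 scene of the kind served by such a system can be generated as a plain text file. The placeholder `Sphere` below stands in for the 177-slice MRI head geometry, and all values are illustrative:

```python
# Build a minimal, valid VRML 2.0 scene as a string and save it as a
# .wrl file, which VRML-capable browsers/plug-ins can load over the Web.
vrml_scene = "\n".join([
    "#VRML V2.0 utf8",                 # mandatory VRML 2.0 header line
    "Shape {",
    "  appearance Appearance {",
    "    material Material { diffuseColor 0.8 0.7 0.6 }",
    "  }",
    "  geometry Sphere { radius 1.0 }",  # placeholder for the head mesh
    "}",
])
with open("head_model.wrl", "w") as f:
    f.write(vrml_scene)
print(vrml_scene.splitlines()[0])  # → #VRML V2.0 utf8
```

A real reconstruction would replace the `Sphere` node with an `IndexedFaceSet` holding the vertices and faces extracted from the MRI slices.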
ISBN (digital): 9781665472159
ISBN (print): 9781665472166
With the accelerated growth in the volume of big spatial data produced by various devices, a plethora of research on handling big spatial data has been done in the past decade. This paper reviews the fundamental components and characteristics of analytic systems for effectively managing big spatial data. An overview of recent research on big spatial data is then given in terms of four main components: source, storage, processing, and visualization. The components are described in detail, with examples of how they are used in existing work. Afterward, several big spatial data systems are discussed, showing how they support these four components, and the works are compared in terms of important performance metrics. Finally, the paper addresses future research directions.
The authors argue that immutability is a suitable base on which to build distributed software engineering environments. They discuss the various approaches to maintaining consistency in immutable object systems and compare D.P. Reed's model of time domain addressing (ACM Trans. Comput. Syst., vol.1, no.1, p.3-23, Feb. 1983) with their own model of domain relative addressing. They demonstrate the suitability of domain relative addressing for use in distributed software engineering environments.
ISBN (print): 9781665414401
With the change in energy consumption structure, integrated energy systems coupling multiple energy flows such as electricity, natural gas, and heat are developing rapidly. At present, the decision-making center of an integrated energy park is the park dispatching control center. The dispatching process involves the collection of distributed resource data, joint centralized scheduling over those data, and the allocation and distribution of dispatching tasks. However, as the integrated energy system grows in scale, the centralized control center must collect and process a large amount of data, which depends on strong computing power at the park control center and reliable communication with distributed resources. For future large-scale integrated energy parks, centralized computing has many deficiencies. In this paper, a distributed computing method is used to decompose the joint scheduling problem into electricity and gas subproblems while accounting for the strong coupling among electricity, natural gas, and heat. Considering source-storage interaction, a day-ahead scheduling model accounting for economic cost and tie-line fluctuation cost is established. Finally, an example is given to analyze the feasibility of the distributed optimization model.
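The decompose-and-coordinate idea can be sketched with a toy consensus iteration between two subproblems coupled through one shared variable. The quadratic costs, penalty weight, and averaging rule below are illustrative assumptions, not the paper's actual scheduling model:

```python
# Each subproblem minimizes a local quadratic cost plus a penalty that
# pulls its decision x toward the shared coordination value z
# (a simplified ADMM-like scheme with closed-form local solutions).

def solve_electric(z, rho=1.0):
    # argmin_x (x - 2)^2 + (rho/2) * (x - z)^2
    return (2 * 2 + rho * z) / (2 + rho)

def solve_gas(z, rho=1.0):
    # argmin_x (x - 4)^2 + (rho/2) * (x - z)^2
    return (2 * 4 + rho * z) / (2 + rho)

z = 0.0
for _ in range(50):
    # Coordination step: average the two local solutions.
    z = 0.5 * (solve_electric(z) + solve_gas(z))

print(round(z, 3))  # → 3.0 (midpoint of the two local optima 2 and 4)
```

Only the shared value `z` is exchanged at each iteration, which mirrors the abstract's point: the park center no longer needs to gather and solve over all raw distributed-resource data centrally.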
Author: M. Nikravesh, BISC Program, Computer Sciences Division, EECS Department, University of California, Berkeley, CA, USA
Retrieving relevant information is a crucial component of case-based reasoning systems for Internet applications such as search engines. The task is to use user-defined queries to retrieve useful information according to certain measures. Even though techniques exist for locating exact matches, finding relevant partial matches can be a problem. It may also not be easy to specify query requests precisely and completely, a situation known as fuzzy querying. This is usually not a problem for small domains, but for large repositories such as the World Wide Web, request specification becomes a bottleneck. Thus, a flexible retrieval algorithm is required, allowing for imprecise specification or search. Therefore, we envision that non-classical techniques are required: fuzzy-logic-based clustering grounded in perception, fuzzy similarity, fuzzy aggregation, and FLSI for automatic information retrieval and search with partial matches.
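Partial matching under a graded similarity can be sketched with a simple token-overlap membership degree. This Jaccard-style measure is an illustrative stand-in for the fuzzy similarity machinery the abstract names, and the documents are invented examples:

```python
def fuzzy_similarity(query, doc):
    """Degree of match in [0, 1]: token overlap relative to token union."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

docs = ["distributed database of medical images",
        "fuzzy logic based clustering for retrieval",
        "support vector machines tutorial"]

query = "fuzzy clustering retrieval"
# Rank all documents by membership degree instead of demanding exact matches.
ranked = sorted(docs, key=lambda d: fuzzy_similarity(query, d), reverse=True)
print(ranked[0])  # → fuzzy logic based clustering for retrieval
```

No document matches the query exactly, yet the graded score still ranks the relevant one first, which is precisely the behavior exact-match retrieval cannot provide.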
In cloud computing environments, computing power, storage, platforms, and software are abstracted and virtualized as all kinds of cloud services. Meanwhile, complex applications can be described as processes that invoke services selected at runtime. As a non-functional requirement, quality of service (QoS) is an important basis for selecting cloud services. Nevertheless, the complex network environment and illegal operations often produce untrustworthy QoS data in practice, which then undermines the precision and trustworthiness of cloud service scheduling. This paper proposes a trustworthy management approach for cloud service QoS. Experimental results demonstrate the effectiveness of the proposed approach.
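One way untrustworthy QoS data can be screened before service selection is median-based outlier filtering. The service names, response-time reports, and tolerance below are hypothetical, and this is only one plausible filtering rule, not the paper's method:

```python
import statistics

def trusted_mean(reports, tol=0.5):
    """Mean of reports after discarding values far from the median
    (more than tol * median away), treating those as untrustworthy."""
    med = statistics.median(reports)
    kept = [r for r in reports if abs(r - med) <= tol * med]
    return sum(kept) / len(kept)

qos = {  # hypothetical response-time reports (ms) per candidate service
    "svc-a": [100, 102, 98, 900],   # 900 ms looks tampered or faulty
    "svc-b": [120, 118, 122, 121],
}
# Select the service with the best (lowest) trusted response time.
best = min(qos, key=lambda s: trusted_mean(qos[s]))
print(best)  # → svc-a
```

Without the filter, the single 900 ms report would drag svc-a's mean above svc-b's and flip the selection, illustrating how bad QoS data corrupts scheduling decisions.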