ISBN: 9789819794423; 9789819794430 (print)
In the digital age, the widespread preference for instructional videos as an educational aid demands technologies that can swiftly pinpoint video segments in response to queries. Because learners from different cultural backgrounds may pose questions in different languages, the multilingual temporal answer grounding in single video (mTAGSV) challenge has been introduced. mTAGSV requires models to precisely identify the time segment within a video that aligns with a query posed in Chinese or English. To address the limitations of existing monolingual approaches and their ineffectiveness on silent videos, we use optical character recognition (OCR) to enrich video information and leverage large language models (LLMs) to bridge the linguistic gap. For videos that contain audio, subtitles are extracted with an automatic speech recognition (ASR) tool. For silent videos or those with insufficient audio-based subtitles, we use an OCR tool to extract textual content from video frames and refine the OCR-generated text to act as a surrogate for subtitles. Furthermore, we use LLMs to translate English queries into their Chinese equivalents, bridging the linguistic divide between queries and video content. The MutualSL model is employed as the backbone network for extracting features from textual subtitles and visual frames. Through extensive experiments, we demonstrate that the proposed techniques improve task performance, securing first place in Track 1 of NLPCC 2024 Shared Task 7.
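The subtitle fallback and query translation steps described in this abstract might be sketched as follows. This is only an illustration under stated assumptions: the function names, the `min_chars` threshold for "insufficient" ASR subtitles, the de-duplication used as OCR refinement, and the CJK-based language check are all hypothetical and are not taken from the paper. The ASR, OCR, and LLM tools are represented by plain inputs and a caller-supplied `translate` callable.

```python
from typing import Callable, List


def choose_subtitles(asr_lines: List[str], ocr_lines: List[str],
                     min_chars: int = 20) -> List[str]:
    """Prefer ASR subtitles; fall back to refined OCR text when ASR is too sparse.

    `min_chars` is an assumed threshold, not a value from the paper.
    """
    asr_chars = sum(len(line.strip()) for line in asr_lines)
    if asr_chars >= min_chars:
        return asr_lines
    # Assumed refinement: drop empty lines and consecutive repeats, since the
    # same on-screen text is typically recognized in many adjacent frames.
    refined: List[str] = []
    for line in ocr_lines:
        text = line.strip()
        if text and (not refined or refined[-1] != text):
            refined.append(text)
    return refined


def contains_chinese(text: str) -> bool:
    """Heuristic language check: any character in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def prepare_query(query: str, translate: Callable[[str], str]) -> str:
    """Translate non-Chinese (assumed English) queries; keep Chinese queries as-is.

    `translate` stands in for an LLM call supplied by the caller.
    """
    return query if contains_chinese(query) else translate(query)
```

Passing the translator in as a callable keeps the sketch independent of any particular LLM service.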
Remote photoplethysmography (rPPG) is a technology that extracts blood volume pulse (BVP) signals by analyzing the wavelength differences of light reflected from a person’s skin using devices such as RGB and infrared...
Reinforcement learning, deep learning, and deep reinforcement learning are known to be effective at acquiring action rules for the autonomous motion of objects. However, it is known that these learning processes requi...
In recent years, the growing influence of machine learning across industries has inspired many traders to apply it in finance, where stock trading is one of the most important activities. Predic...
Partial differential equations (PDEs) are among the most widely used tools for modeling problems in nature. In recent years, machine learning techniques have been adopted to solve PDEs. However, the prediction errors of existing ma...
Prerequisite relationships between concepts play an important role in education. Previously, prerequisites were specified by experts, which is very costly. With the development of the Internet, many new concepts hav...
ELM (Extreme Learning Machine) is a randomized method for constructing single-hidden-layer feedforward neural networks, and MFCC (Mel-Frequency Cepstral Coefficient) is a type of feature parameter for speech recognition. B...
Penetration testing is an important means of testing the security of web systems. It has mainly been carried out manually by testers. The main reason is that it is difficult to generate test paths and code automatically be...
ISBN: 9784907626488 (print)
Advances in hardware capability have opened up the possibility of performing ML inference tasks at the edge using the large volume of sensory data generated by IoT devices such as cameras. As cameras become more pervasive, edge systems need to process streams from multiple sources with overlapping fields of view. In this position paper, we describe a collaborative sensing mechanism at the edge for such cases. We introduce a View Mapping Database (DB) that maps regions in one camera's field of view to regions in other cameras' views. We analyze the characteristics of 5 video streams that capture an intersection from multiple angles, prototype a View Mapping DB, and present our preliminary results.
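One way to picture a View Mapping DB is as a lookup from a (camera, region) pair to the corresponding regions in other cameras. The sketch below is a minimal illustration only: the class name, the grid-cell representation of regions, and the symmetric storage of mappings are assumptions made here, not details from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Region = Tuple[int, int]      # assumed: (row, col) cell of a grid over the frame
View = Tuple[str, Region]     # (camera id, region)


class ViewMappingDB:
    """Hypothetical in-memory store of cross-camera region correspondences."""

    def __init__(self) -> None:
        self._links: Dict[View, Set[View]] = defaultdict(set)

    def add_mapping(self, cam_a: str, region_a: Region,
                    cam_b: str, region_b: Region) -> None:
        # Assumed symmetric: if A's region shows what B's region shows,
        # the reverse lookup should work as well.
        self._links[(cam_a, region_a)].add((cam_b, region_b))
        self._links[(cam_b, region_b)].add((cam_a, region_a))

    def lookup(self, cam: str, region: Region) -> List[View]:
        """Return the regions in other cameras mapped to this region, sorted."""
        return sorted(self._links.get((cam, region), ()))
```

With such a lookup, an edge node that detects an object in one camera's region could check which other streams cover the same physical area before deciding whether to process them.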
ISBN: 9781665466080 (print)
In this paper, we define the decision rule in multi-decision data from the perspective of multi-level cognitive concept learning. Comparing it with the decision rule based on rough set theory, we prove that the proposed decision rule is consistent with the definition of a decision rule in rough set theory. To show the feasibility and effectiveness of the proposed decision rule, we carry out an experimental evaluation of rule acquisition on multi-decision data about cognitive real-world animals and the corresponding classification.