In this paper, a methodology for real-time image classification on multimedia platforms is developed. For this purpose, six feedforward neural network models were trained with images from two databases, preprocessed by three texture extraction methods: local binary pattern-uniform (LBP-U), gray level co-occurrence matrix (GLCM), and wavelet image scattering (WIS). The databases consist of 157,448 images of sections with thumbnails of the platform content (mosaics), representing 14 classes, and 38,214 images with descriptions of the available content (descriptors), representing 11 classes; all images have a resolution of 1280 x 720 pixels. The six models (three for mosaics and three for descriptors) were validated with images from the databases that were not part of the training process to obtain their performance metrics. The training and validation process was repeated 30 times, and the average results were compared. The best-performing models for each database were the neural networks trained with the wavelet image scattering method: for mosaics, 99.97 +/- 0.01 % accuracy, 99.99 +/- 0.01 % specificity, 99.84 +/- 0.06 % sensitivity, 99.59 +/- 0.13 % precision, and 99.71 +/- 0.08 % F1 score with a response time of 0.7349 seconds; for descriptors, 99.90 +/- 0.03 % accuracy, 99.94 +/- 0.02 % specificity, 99.58 +/- 0.15 % sensitivity, 98.63 +/- 0.55 % precision, and 99.09 +/- 0.30 % F1 score with a response time of 0.6227 seconds. These results confirm the effectiveness of WIS-based models for classifying multimedia platform images with the characteristics of the databases used. Tuning the remaining methods is suggested to improve their performance.
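As a rough illustration of the kind of pipeline described, the sketch below extracts LBP-uniform texture histograms and feeds them to a small feedforward classifier. The network sizes, preprocessing details, and databases here are assumptions for illustration; the paper's actual architectures are not reproduced.

```python
# Illustrative sketch of an LBP-U texture pipeline feeding a feedforward
# classifier; hidden layer size and feature parameters are assumptions.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.neural_network import MLPClassifier

def lbp_u_features(gray_image, points=8, radius=1):
    """Histogram of uniform LBP codes as a fixed-length texture descriptor."""
    codes = local_binary_pattern(gray_image, points, radius, method="uniform")
    # 'uniform' LBP with P sampling points yields P + 2 distinct code values.
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2))
    return hist / hist.sum()  # normalize so descriptors are comparable

def train_classifier(images, labels):
    """images: grayscale frames (e.g., 720p screenshots); labels: classes."""
    features = np.stack([lbp_u_features(img) for img in images])
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    model.fit(features, labels)
    return model
```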
Image encryption is a fundamental component of modern data security that guarantees the integrity, privacy, and confidentiality of sensitive visual content. This paper provides a thorough examination of image encrypti...
ISBN: (Print) 9798400704123
The rapid advancement of technology has been revolutionizing the field of sports media, where there is a growing need for sophisticated data processing methods. Current methodologies for extracting information from soccer broadcast videos to generate game highlights and summaries for social media are predominantly manual and rely heavily on text-based NLP techniques, overlooking the rich visual and auditory information available. In response to this challenge, our research introduces SoccerSum, a tool that integrates computer vision and audio analysis with advanced language models such as GPT-4. This multimodal approach enables automated, enriched content summarization, including detection of players and key field elements, thereby enhancing the metadata used in summarization algorithms. SoccerSum uniquely combines textual and visual data, offering a comprehensive solution for generating accurate, platform-specific content. This development represents a significant advancement in automated, data-driven sports media dissemination and sets a new benchmark in soccer information extraction. A video of the demo can be found here: https://***/za4VIi2ARXY.
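To make the multimodal idea concrete, the sketch below folds detector output into an LLM summarization prompt. The event schema, prompt wording, and function names are assumptions for illustration, not SoccerSum's actual interface; only the OpenAI client calls are real API.

```python
# Illustrative sketch: merging visual detections into a GPT-4 prompt, loosely
# in the spirit of SoccerSum's multimodal summarization. The event schema and
# prompt below are hypothetical, not the tool's actual pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_highlight(events, platform="X"):
    """events: dicts from upstream detectors, e.g.
    {"minute": 63, "type": "goal", "player": "No. 9", "zone": "penalty box"}"""
    event_lines = "\n".join(
        f"- {e['minute']}' {e['type']} by {e['player']} ({e['zone']})"
        for e in events
    )
    prompt = (
        f"Write a short {platform} post summarizing these detected "
        f"soccer events:\n{event_lines}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```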
ISBN: (Print) 9798400704123
HTTP Adaptive Streaming (HAS) has emerged as the predominant solution for delivering video content on the Internet. The urgency of the climate crisis has accentuated the demand for investigations into the environmental impact of HAS techniques. In HAS, clients rely on adaptive bitrate (ABR) algorithms to drive the quality selection for video segments. These algorithms typically prioritize maximizing video quality under favorable network conditions, disregarding the impact on energy consumption. Further research is needed to thoroughly investigate the effects on energy consumption of bitrate and other video parameters such as resolution and codec. In this paper, we propose COCONUT, a content COnsumption eNergy measUrement daTaset for adaptive video streaming, collected through a digital multimeter on various types of client devices, such as a laptop and a smartphone, streaming MPEG-DASH segments. Furthermore, we analyze the dataset and derive insights into the influence on energy consumption of multiple codecs, various video encoding parameters (segment length, framerate, bitrate, and resolution), and decoding type, i.e., hardware or software. We gather and categorize these measurements based on segment retrieval through the network interface card (NIC), decoding, and rendering. Additionally, we compare the impact of different HAS players on energy consumption. This research offers valuable perspectives on the energy usage of streaming devices, which could contribute to a media consumption experience that is both more sustainable and resource-efficient.
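A minimal analysis sketch for a COCONUT-style dataset follows, grouping energy by codec and decoding type and by measurement phase. The file name and column names ("codec", "decode_type", "energy_j", "device", "phase") are assumptions; the released dataset may use a different schema.

```python
# Hypothetical analysis sketch for a COCONUT-style energy dataset.
import pandas as pd

df = pd.read_csv("coconut_measurements.csv")  # assumed file and schema

# Mean energy per segment, split by codec and hardware vs. software decoding.
by_codec = (
    df.groupby(["codec", "decode_type"])["energy_j"]
      .mean()
      .unstack("decode_type")
)
print(by_codec)

# Share of energy spent in each measurement phase (NIC retrieval, decoding,
# rendering), normalized per device.
phase_share = (
    df.groupby(["device", "phase"])["energy_j"].sum()
      .groupby(level="device")
      .transform(lambda s: s / s.sum())
)
print(phase_share)
```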
The way we create, consume and interact with multimedia content has changed significantly in recent years with the advent of affordable recording devices and easy sharing and access in the form of mobile phones. With ...
ISBN: (Print) 9798400701481
To assess the quality of multimedia content, create datasets, and train objective quality metrics, one needs to collect subjective opinions from annotators. Different subjective methodologies exist, from direct rating with single or double stimuli to indirect rating with pairwise comparisons. Triplet- and quadruplet-based comparisons are a type of indirect rating. From these comparisons and preferences on stimuli, we can place the assessed stimuli on a perceptual scale (e.g., from low to high quality). The Maximum Likelihood Difference Scaling (MLDS) solver is one such algorithm, working with triplets and quadruplets. A participant is asked to compare intervals inside pairs of stimuli: (a,b) and (c,d), where a, b, c, d are stimuli forming a quadruplet. However, one limitation is that the perceptual scales retrieved from stimuli of different contents are usually not comparable. We previously offered a solution to measure the inter-content scale of multiple contents. This paper presents an open-source Python implementation of the method and demonstrates its use on three datasets collected in an in-lab environment. We compared the accuracy and effectiveness of the method using pairwise, triplet, and quadruplet comparisons for intra-content annotations. The code is available here: https://***/andreaspastor/MLDS_inter_content_scaling.
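For intuition, the sketch below fits an MLDS-style scale from quadruplet judgments under the standard Gaussian observer model. It is a minimal sketch of the underlying idea, not the authors' implementation linked above; the noise level and optimizer settings are assumptions.

```python
# Minimal MLDS-style sketch for quadruplet judgments. For quadruplet
# (a, b, c, d), the annotator reports whether the perceived difference
# within (c, d) exceeds that within (a, b).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(psi, quadruplets, responses, sigma=0.1):
    """psi: perceptual scale value per stimulus.
    responses[i] = 1 if interval (c, d) was judged larger than (a, b)."""
    a, b, c, d = quadruplets.T
    delta = np.abs(psi[d] - psi[c]) - np.abs(psi[b] - psi[a])
    p = norm.cdf(delta / sigma)          # Gaussian observer model
    p = np.clip(p, 1e-9, 1 - 1e-9)       # avoid log(0)
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

def fit_scale(quadruplets, responses, n_stimuli):
    x0 = np.linspace(0, 1, n_stimuli)    # initial guess: evenly spaced scale
    result = minimize(
        lambda psi: neg_log_likelihood(psi, quadruplets, responses),
        x0, method="L-BFGS-B",
    )
    return result.x
```

In practice, implementations anchor the scale endpoints (e.g., fix the first and last psi values) for identifiability and estimate or fix the noise level sigma.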
ISBN: (Digital) 9781665484220
ISBN: (Print) 9781665484220
Subscribers seeking to access multimedia content rely significantly on wireless networks. Multimedia access is influenced and impaired by variability in the availability of wireless access links. The impairment arises from rain-induced attenuation, resulting in a multimedia content viewing gap (MCVG). The proposed research addresses the challenge of the MCVG and proposes incorporating artificial intelligence with computing and networking entities to enable the creation of multimedia content. This is done aboard entities located in the subscriber residence instead of entities outside the subscriber residence. Performance analysis shows that the proposed mechanism outperforms the existing mechanism, reducing access costs by at least 22.6% and by up to 71% on average. The proposed mechanism also enhances the content access duration by 35.6%.
ISBN: (Print) 9798350387919; 9798350387902
Multimedia content rendering is an important compute-intensive task required for multimedia content generation. Content rendering is executed aboard ground-based data centres. The operation of terrestrial data centres is inefficient when a significant amount of power is expended on cooling, alongside a high water footprint (WF). This results in a high power usage effectiveness (PUE). The high PUE and WF can be reduced via alternative, freely cooled data centres such as the underwater data centre. This research proposes a render workload migration strategy enabling an underwater data centre to execute the render workload. Furthermore, the underwater data centre hosts mini nuclear reactors that enhance its power accessibility instead of relying only on highly variable renewable energy resources in the underwater environment. Performance evaluation shows that the proposed approach improves PUE and accessible power by an average of 26.5% and 63.8%, respectively.
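For reference, PUE is the standard ratio of total facility energy to IT equipment energy; a freely cooled site lowers the numerator. The worked sketch below uses illustrative numbers, not figures from the paper.

```python
# PUE = total facility energy / IT equipment energy.
# The input values below are illustrative assumptions only.
def pue(total_facility_kwh, it_equipment_kwh):
    return total_facility_kwh / it_equipment_kwh

terrestrial = pue(total_facility_kwh=170.0, it_equipment_kwh=100.0)  # 1.70
underwater = pue(total_facility_kwh=110.0, it_equipment_kwh=100.0)   # 1.10

improvement = (terrestrial - underwater) / terrestrial
print(f"PUE improvement: {improvement:.1%}")  # ~35.3% with these inputs
```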
ISBN: (Print) 9798400714672
Video data can be slow to process due to the size of video streams and the computational complexity needed to decode, transform, and encode them. These challenges are particularly significant in interactive applications, such as quickly generating compilation videos from a user search. We look at optimizing access to source video segments in multimedia systems where multiple separately encoded copies of video sources are available, such as proxy/optimized media in conventional non-linear video editors or VOD streams in content distribution networks. Rather than selecting a single source to use (e.g., "use the lowest-bitrate 720p source"), we specify a minimum visual quality (e.g., "use any frames with VMAF >= 85"). This quality constraint and the needed segment bounds are used to find the lowest-latency operations to decode a segment from multiple available sources with diverse bitrates, resolutions, and codecs. This approach uses higher-quality, slower-to-decode sources if their encoding is better aligned with the specific segment bounds, which can provide faster access than using just one lower-quality source. We provide a general solution to this Quality-Aware Multi-Source Selection problem with optimal computational complexity. We create a dataset using adaptive-bitrate streaming Video on Demand sources from YouTube's CDN. We evaluate our algorithm on simple segment decoding as well as embedded into a larger editing system, a declarative video editor. Our evaluation shows up to 23% lower-latency access, depending on segment length, at identical visual quality levels.
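The greedy sketch below captures the core idea: for a requested segment, pick the source that meets a VMAF floor with the lowest estimated decode latency, accounting for keyframe lead-in. It is a simplification under assumed cost and keyframe models; the paper's optimal algorithm is more general.

```python
# Simplified sketch of quality-constrained multi-source selection.
# Cost model and field names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    vmaf: float               # measured quality of this encoding
    decode_cost_per_s: float  # estimated decode time per second of video
    keyframes: list           # keyframe timestamps, in seconds

def decode_latency(src, start, end):
    # Decoding must begin at the last keyframe at or before `start`, so
    # poorly aligned keyframes inflate the span that must be decoded.
    lead_in = max((k for k in src.keyframes if k <= start), default=0.0)
    return (end - lead_in) * src.decode_cost_per_s

def select_source(sources, start, end, min_vmaf=85.0):
    eligible = [s for s in sources if s.vmaf >= min_vmaf]
    if not eligible:
        raise ValueError("no source satisfies the quality constraint")
    return min(eligible, key=lambda s: decode_latency(s, start, end))
```

This illustrates why a higher-quality, slower-to-decode source can still win: if its keyframe sits right at the segment start, its decoded span is shorter than that of a cheaper but misaligned source.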
ISBN: (Print) 9798350362169
This scientific paper addresses the pervasive issue of spoofing in biometric user identification, focusing on its manifestation in facial recognition systems. Spoofing involves deceptive communication originating from an untrusted source, aiming to gain unauthorized access to sensitive information. The study delves into the vulnerabilities of facial biometrics, particularly in scenarios where malicious actors attempt to imitate a user's face using masks, photos, or digital means.