ISBN: 9781538618578 (print)
Multimedia information retrieval has been a challenging problem due to the diversity and size of multimedia data, along with the difficulty of expressing desired queries. This paper highlights key points of multimedia retrieval approaches that work. After discussing the success of multimedia information retrieval, the paper analyzes the problem of retrieval challenge (i.e., the capability of retrieving every multimedia object) and proposes page-oriented precision as an alternative evaluation measure for the performance of multimedia information systems.
ISBN: 9781665495486 (digital); 9781665495486 (print)
In this paper we survey the methods for control and creative interaction with pre-trained generative models for audio and music. By using reduced (lossy) encoding and symbolization steps, we are able to examine the level of information that passes between the environment (the musician) and the agent (machine improvisation). We further use the concept of music information dynamics to find an optimal symbolization in terms of a predictive information measure. Methods and strategies for generative models are surveyed, and their implications for creative interaction with the machine are discussed in the musical improvisation framework.
ISBN: 9798350351439; 9798350351422 (print)
Music is a collection of measures expertly crafted and placed together by composers. However, most composer classification models aim to recognize entire songs and may not be suited for thematic analysis, which examines each collection of measures within a song. Additionally, composer classification techniques rely on large numbers of songs, and previously presented transformer neural network iterations cannot accommodate samples outside of the training dataset due to their tokenization process. In this paper, we propose a lightweight retrieval technique that matches the single-composer benchmark accuracies presented by previous classification models at a fraction of the computing resources. This solution can be applied to variable-length MIDI songs from composers both included in and excluded from the training dataset, and it achieves 100% accuracy in our performance study.
The crosslingual voice conversion problem refers to the replacement of a speaker's timbre or vocal identity in a recorded sentence, assuming that the source speaker and target speaker use different languages. This...
ISBN: 9781665495486 (digital); 9781665495486 (print)
Real-time and accurate position estimation is critical for various multi-robot applications and serves as a prerequisite for location-based multi-sensor data analysis. However, it is often impeded by energy, sensing, and processing limitations. In this work, we study the problem of information-seeking in localization and navigation in multi-agent systems, which aims to navigate mobile agents while reducing position errors. We formalize information-seeking as reducing spatial uncertainty and introduce an efficient motion controller based on artificial potential fields superimposing attractive, repulsive, and information-seeking forces. We evaluate the effect of information-seeking on localization and mission planning in a simulation study with non-collaborative and collaborative localization approaches.
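The superposition of forces described above can be illustrated with a minimal sketch. This is not the authors' controller: the abstract does not say how the information-seeking force is computed, so the sketch assumes it is a unit pull toward hypothetical "beacon" positions where position uncertainty is expected to shrink. The function name `apf_step`, the gains `k_att`, `k_rep`, `k_info`, and the parameters `rep_radius` and `step` are all illustrative assumptions.

```python
import math

def apf_step(pos, goal, obstacles, beacons,
             k_att=1.0, k_rep=1.0, k_info=0.5,
             rep_radius=2.0, step=0.1):
    """Move one fixed-length step along the sum of three potential-field forces.

    All names and gains are hypothetical; `beacons` stands in for whatever
    locations reduce localization uncertainty in a real system.
    """
    x, y = pos

    # Attractive force: linear pull toward the goal.
    fx = k_att * (goal[0] - x)
    fy = k_att * (goal[1] - y)

    # Repulsive force: classic potential-field push away from each
    # obstacle that lies within rep_radius.
    for ox, oy in obstacles:
        dx, dy = x - ox, y - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < rep_radius:
            mag = k_rep * (1.0 / d - 1.0 / rep_radius) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d

    # Information-seeking force (assumed form): unit pull toward each beacon.
    for bx, by in beacons:
        dx, dy = bx - x, by - y
        d = math.hypot(dx, dy)
        if d > 0.0:
            fx += k_info * dx / d
            fy += k_info * dy / d

    # Normalize the summed force so every step has length `step`.
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos
    return (x + step * fx / norm, y + step * fy / norm)
```

Calling `apf_step((0.0, 0.0), (10.0, 0.0), [], [(0.0, 5.0)])` bends the step slightly toward the beacon instead of heading straight for the goal, which is the qualitative trade-off between mission progress and localization quality that the abstract describes.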
ISBN: 9781665495486 (digital); 9781665495486 (print)
Dichromats recognize colors using two of the three types of cone cells: L, M, and S. For example, people with red-green color blindness cannot distinguish between red, yellow, and green. To extend the ability of dichromats to recognize color differences, we propose a method that expands the color difference as observed by dichromats. We analyze the color difference between neighboring pixels in chromaticity space. In addition, we employ multiresolution analysis to form the Poisson equation. Our multiresolution analysis is non-linear, depending on the saturation of each pixel's color. Solving the multiresolution Poisson equation yields the color-enhanced image.
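For orientation, the generic gradient-domain formulation that leads to a Poisson equation can be written as follows. This is the standard textbook form, not necessarily the authors' multiresolution version: the symbols $u$ (the enhanced image channel), $\mathbf{g}$ (a target gradient field encoding the enlarged color differences between neighboring pixels), and $\Omega$ (the image domain) are assumptions introduced here.

```latex
% Find the image u whose gradient best matches the target field g.
\min_{u} \int_{\Omega} \lVert \nabla u - \mathbf{g} \rVert^{2} \, d\Omega
% The Euler-Lagrange equation of this functional is the Poisson equation:
\quad \Longrightarrow \quad
\Delta u = \nabla \cdot \mathbf{g} \quad \text{in } \Omega
```

Under this reading, enlarging the color differences amounts to choosing $\mathbf{g}$, and solving the Poisson equation reconstructs an image consistent with it.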
ISBN: 9781665495486 (digital); 9781665495486 (print)
Deep Learning (DL) has provided powerful tools for various visual information analysis and retrieval tasks, outperforming previously used methods. However, despite their potential, applying such approaches in video stream applications, such as media monitoring or surveillance, where a large number of streams must be processed in parallel, is not trivial and comes with several challenges. This paper aims to provide a brief overview of the current state of the art in DL tools that can be used for deep video stream information analysis and retrieval. Apart from a review of the current literature, we also include experimental results discussing deployment challenges, ranging from speed to energy consumption, and demonstrating the capabilities of readily available commodity hardware in processing video streams for selected DL models.
ISBN: 9781538618578 (print)
To retrieve information in general, and multimedia content in particular, efficiently, new network architectures are required that are centered around information and content distribution. Most information-centric architectures are built upon an asymmetry: the client nodes request information, such as multimedia data or a video stream, based upon that information's name. That request is therefore content-routed. However, the clients themselves are either not addressed or addressed by a host name. We propose to address the nodes by the information they contain, so that both requests for information and responses with information can be routed based upon a related, if not similar, content-routing mechanism.