The integrated sensing and communication (ISAC) waveform with a low sidelobe level on all delay indices is important for probing targets in the ISAC scenario. In this article, we consider the problem of jointly design...
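The abstract is cut off before it states its design objective, so the following is only a rough illustration of the metric it names. It is not the authors' design method. A common way to quantify "sidelobe level on all delay indices" is through the aperiodic autocorrelation $r_k = \sum_n x_n x_{n+k}^*$ and its peak sidelobe level (PSL). This sketch assumes the aperiodic, unnormalized definition; the paper may instead use a periodic or weighted variant.

```python
def aperiodic_autocorrelation(x):
    """r_k = sum_n x[n] * conj(x[n + k]) for delay indices k = 0..N-1."""
    n = len(x)
    return [sum(x[i] * x[i + k].conjugate() for i in range(n - k)) for k in range(n)]


def peak_sidelobe_level(x):
    """Largest sidelobe magnitude over all nonzero delays (lower is better)."""
    r = aperiodic_autocorrelation(x)
    return max(abs(v) for v in r[1:])


def integrated_sidelobe_level(x):
    """Total sidelobe energy over all nonzero delays."""
    r = aperiodic_autocorrelation(x)
    return sum(abs(v) ** 2 for v in r[1:])


# Barker-13 is a classic binary sequence whose sidelobes never exceed 1 in magnitude.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
mainlobe = aperiodic_autocorrelation(barker13)[0]
psl = peak_sidelobe_level(barker13)
```

A jointly designed ISAC waveform would trade such sidelobe metrics against communication constraints. That trade-off is not shown here.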
Ridge regression (RR)-based methods aim to obtain a low-dimensional subspace for feature extraction. However, the subspace's dimensionality does not exceed the number of data categories, hence compromising its cap...
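The dimensionality cap follows from the standard closed-form ridge regression solution. Assume a centered data matrix $X \in \mathbb{R}^{n \times d}$ and a one-hot label matrix $Y \in \{0,1\}^{n \times c}$ for $c$ categories; this notation is introduced here and is not taken from the abstract:

```latex
W^\star = \arg\min_{W \in \mathbb{R}^{d \times c}} \|XW - Y\|_F^2 + \lambda \|W\|_F^2
        = (X^\top X + \lambda I_d)^{-1} X^\top Y,
\qquad
\operatorname{rank}(W^\star) \le \operatorname{rank}(Y) \le c .
```

Projecting a sample as $W^{\star\top} x$ therefore yields at most $c$ informative dimensions. This is the limitation the abstract refers to.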
Image steganography is the art and science of secure communication by concealing information within digital images. In recent years, the techniques of steganographic cost learning have developed rapidly. Although the existing methods can learn satisfactory additive costs, the interplay of different pixels' embedding impacts has not been considered, so the potential of learning may not be fully exploited. To overcome this limitation, in this paper, a reinforcement learning paradigm called JoPoL (joint policy learning) is proposed to extend the idea of additive cost learning to a non-additive situation. JoPoL aims to capture the interactions within pixel blocks by defining embedding policies and evaluating the contributions of embedding impacts at the block level rather than the pixel level. Then, a policy network is utilized to learn optimal joint embedding policies for pixel blocks through interactions with the environment. Afterwards, these policies can be converted into joint embedding costs for practical message embedding. The policy network is built around an effective attention mechanism and incorporates domain knowledge derived from traditional non-additive steganographic methods. The environment is responsible for assigning rewards according to the impacts of the sampled joint embedding actions, which are evaluated by the gradient information of a neural network-based steganalyzer. Experimental results show that the proposed non-additive method JoPoL significantly outperforms the existing additive methods against both feature-based and CNN-based steganalyzers over different payloads.
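The abstract does not say how policies become costs. Below is a minimal sketch under one common assumption: the policy over joint actions follows a Gibbs form $\pi(a) \propto e^{-\rho(a)}$ with $\rho(\text{no change}) = 0$, so that $\rho(a) = \ln(\pi(0)/\pi(a))$. The block size, the ternary $\{-1, 0, +1\}$ action set and the probabilities are all hypothetical.

```python
import itertools
import math


def joint_actions(block_size):
    """All joint ternary modifications (-1, 0, +1 per pixel) for one pixel block."""
    return list(itertools.product((-1, 0, 1), repeat=block_size))


def policy_to_costs(policy, eps=1e-12):
    """Map a joint embedding policy {action: prob} to joint costs.

    Assumes pi(a) is proportional to exp(-rho(a)) with rho(no change) = 0,
    hence rho(a) = ln(pi(no change) / pi(a)); eps guards against log(0).
    """
    no_change = tuple(0 for _ in next(iter(policy)))
    p0 = policy[no_change]
    return {a: math.log((p0 + eps) / (p + eps)) for a, p in policy.items()}


def costs_to_policy(costs):
    """Inverse map: Gibbs distribution over joint actions given their costs."""
    z = sum(math.exp(-c) for c in costs.values())
    return {a: math.exp(-c) / z for a, c in costs.items()}


# Hypothetical policy output for a 2-pixel block: keep both pixels with prob 0.6.
actions = joint_actions(2)
policy = {a: (0.6 if a == (0, 0) else 0.05) for a in actions}
costs = policy_to_costs(policy)
```

Because the costs are defined per joint action rather than per pixel, they remain non-additive across the block. This is the property the abstract emphasizes.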
Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information of lip motion patterns to supplement acoustic speech to improve overall detection ***, most audio-visual wake word spotting models are only suitable for simple single-speaker scenarios and require high computational *** development is hindered by complex multi-person scenarios and computational limitations in mobile *** this paper, a novel audio-visual model is proposed for on-device multi-person wake word ***, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive active speaker ***, the knowledge distillation method is introduced to transfer knowledge from the large model to the on-device model to control the size of our ***, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word *** results on the PKU-KWS dataset show that this approach outperforms the previous state-of-the-art methods.
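The abstract names knowledge distillation but not its exact form. As an illustration, here is the standard Hinton-style distillation loss for a single sample. The temperature, weighting and two-class setup are assumptions, not the paper's settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the on-device model is from the large model.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-target term: ordinary cross-entropy against the ground-truth label.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical two-class logits: index 1 = "wake word present".
teacher = [-1.0, 3.0]  # large model
student = [0.2, 1.1]   # on-device model
loss = distillation_loss(student, teacher, label=1)
```

The `T ** 2` factor keeps the soft-target gradients on a comparable scale as the temperature changes.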
ISBN: (Electronic) 9798350377705
ISBN: (Print) 9798350377712
Object search is a fundamental skill for household robots, yet the core problem lies in the robot’s ability to locate the target object accurately. The dynamic nature of household environments, characterized by the arbitrary placement of daily objects by users, makes target localization challenging. To locate the target object efficiently, the robot needs to be equipped with knowledge at both the object and room level. However, existing approaches rely solely on one type of knowledge, leading to unsatisfactory object localization performance and, consequently, inefficient object search. To address this problem, we propose a commonsense scene graph-based target localization method, CSG-TL, to enhance target object search in the household environment. Given a pre-built map with stationary items, the robot combines room-level spatial knowledge with object-level commonsense knowledge generated by a large language model (LLM) into a commonsense scene graph (CSG), which provides both types of knowledge to CSG-TL. To demonstrate the superiority of CSG-TL in target localization, extensive experiments are performed on the real-world ScanNet dataset and in the AI2THOR simulator. Moreover, we have extended CSG-TL to an object search framework, CSG-OS, validated in both simulated and real-world environments. Code and videos are available at https://***/view/csg-os.
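Below is a toy sketch of how room-level layout and object-level commonsense can be combined to rank candidate locations. It assumes a dictionary-based scene graph and a simple product of scores. All names and numbers are hypothetical placeholders for the LLM-generated knowledge, not CSG-TL's actual scoring rule.

```python
# Hypothetical room-level layout from a pre-built map: room -> stationary items.
SCENE_GRAPH = {
    "kitchen": ["counter", "fridge"],
    "living_room": ["sofa", "coffee_table", "tv_stand"],
    "bedroom": ["bed", "nightstand"],
}

# Placeholder commonsense scores; in CSG-TL these would come from an LLM.
ROOM_PRIOR = {
    ("remote_control", "living_room"): 0.7,
    ("remote_control", "bedroom"): 0.2,
    ("remote_control", "kitchen"): 0.1,
}
NEAR_PRIOR = {
    ("remote_control", "coffee_table"): 0.5,
    ("remote_control", "sofa"): 0.3,
    ("remote_control", "tv_stand"): 0.15,
    ("remote_control", "nightstand"): 0.05,
}


def rank_locations(target, graph, room_prior, near_prior):
    """Rank (room, item, score) by room prior times object-item affinity."""
    candidates = []
    for room, items in graph.items():
        p_room = room_prior.get((target, room), 0.0)
        for item in items:
            score = p_room * near_prior.get((target, item), 0.0)
            candidates.append((room, item, score))
    # Highest score first: the order in which a robot might visit locations.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


ranking = rank_locations("remote_control", SCENE_GRAPH, ROOM_PRIOR, NEAR_PRIOR)
```

Keeping both priors in one structure is what lets the ranking use room-level and object-level knowledge together, rather than only one of them.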
Diabetic retinopathy (DR), the main cause of irreversible blindness, is one of the most common complications of *** present, deep convolutional neural networks have achieved promising performance in automatic DR detection *** convolution operation in these methods is a local cross-correlation operation, whose receptive field determines the size of the local neighbourhood for ***, for retinal fundus photographs, there is not only local information but also long-distance dependence between the lesion features (*** and exudates) scattered throughout the whole *** proposed method incorporates correlations between long-range patches into the deep learning framework to improve DR ***-wise relationships are used to enhance the local patch features since lesions of DR usually appear as *** Long-Range unit in the proposed network, which has a residual structure, can be flexibly embedded into other trained *** experimental results demonstrate that the proposed approach achieves higher accuracy than existing state-of-the-art models on the Messidor and EyePACS datasets.
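The abstract describes patch-wise relationships aggregated through a residual structure. The sketch below is in the style of a non-local (self-attention) block and is not the authors' exact Long-Range unit. It assumes identity embeddings for the query, key and value projections and a scalar `gamma` standing in for the learned output projection.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def long_range_unit(patches, gamma=1.0):
    """Enhance each patch feature with a weighted sum over ALL patches.

    patches: list of equal-length feature vectors (one per image patch).
    gamma: hypothetical scalar replacing the output projection; gamma = 0
    makes the unit an exact identity, so it can be dropped into a trained
    network without changing its initial behaviour.
    """
    d = len(patches[0])
    out = []
    for xi in patches:
        # Scaled dot-product affinity between this patch and every patch.
        weights = softmax([dot(xi, xj) / math.sqrt(d) for xj in patches])
        context = [sum(w * xj[k] for w, xj in zip(weights, patches)) for k in range(d)]
        # Residual connection: local feature plus long-range context.
        out.append([xi[k] + gamma * context[k] for k in range(d)])
    return out


patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced = long_range_unit(patches)
```

The residual form explains the abstract's claim that the unit can be embedded into other trained networks: with a zero-initialized output weight, inserting it leaves the host network's outputs unchanged at the start of fine-tuning.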
Can we construct an explainable face recognition network able to learn a facial part-based feature like eyes, nose, mouth and so forth, without any manual annotation or additionalsion datasets? In this paper, we propo...
Local feature descriptors play a crucial role in computer vision problems, especially robot motion. Existing descriptors are highly accurate, but their performance depends on the influence of distracting factors, suc...
Hair editing is a critical image synthesis task that aims to edit hair color and hairstyle using text descriptions or reference images, while preserving irrelevant attributes (e.g., identity, background, cloth). Many ...