Computer vision has been used in many areas such as medicine, transportation, the military, and geography. The fast development of sensor devices inside cameras and satellites provides not only red-green-blue (RGB) images b...
Industry 4.0 has driven the need for information technology (IT) and operational technology (OT) convergence to modernize OT by leveraging IT. The challenges for manufacturing operations are utilizing and converting...
As the quantities of data recorded by embedded edge sensors grow, so too does the need for intelligent local processing. Such data often comes in the form of time-series signals, based on which real-time predictions c...
ISBN (digital): 9798350383454
ISBN (print): 9798350383461
In recent years, field-programmable gate arrays (FPGAs) have been gaining attention as computational acceleration devices in high-performance computing. By implementing specialized circuits customized to specific problems, FPGAs can achieve efficient parallelization with low latency even for complex tasks.
This paper investigates the effect of bitrate control methods on the QoE of multi-view video and audio (MVV-A) streaming with MPEG-DASH. We apply three bitrate control methods designed for conventional single-view video streaming to an MVV-A system with MPEG-DASH. We then conduct a subjective experiment under varying available network bandwidth and investigate the effect of the methods on QoE.
The wireless capsule endoscope can comprehensively inspect the digestive tract. It has the advantages of safety, painlessness, and no postoperative reaction, but it has some disadvantages. When the most critical issue...
ISBN (digital): 9798350389654
ISBN (print): 9798350389661
This study evaluates the performance of the REAL-ESRGAN [1] model on images with varying levels of degradation, using the Wild, Mild, Difficult, and x8 subsets of the DIV2K dataset [2]. REAL-ESRGAN was created to solve super-resolution problems and aims to produce high-resolution images from low-resolution inputs. Experiments were conducted at scales of x2 and x4, and performance was measured using full-reference metrics (LPIPS, PSNR, SSIM) and no-reference metrics (NIQE, MANIQA, CLIPIQA, and PI). The results were good, especially at the x2 scale, which yielded higher PSNR and SSIM scores, lower LPIPS and NIQE values, and better visual and perceptual quality. The model faced greater challenges on the Wild and Difficult subsets because they contain more complex degradations and compression artifacts, as reflected in unstable full-reference and no-reference results. In contrast, the Mild and x8 subsets yielded better results on both kinds of metrics, and their computational cost was also more favorable than that of the other subsets. This study shows the strengths and limitations of REAL-ESRGAN in handling different levels of image degradation. Future research should enhance the model to handle the degradations found in the Wild and Difficult subsets, ideally without increasing its computational cost.
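As a rough illustration of one of the full-reference metrics named above, PSNR can be computed from the mean squared error between a reference image and a restored image using the standard definition PSNR = 10 * log10(MAX^2 / MSE). The sketch below is not the study's evaluation code: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `reference` and `restored` are flat sequences of pixel intensities
    (a hypothetical representation chosen to keep the sketch dependency-free).
    `max_value` is the peak intensity, assumed here to be 255 for 8-bit images.
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    out = [101, 99, 101, 99]
    print(f"PSNR: {psnr(ref, out):.2f} dB")
```

Higher PSNR means the restored image is closer to the reference in a pixel-wise sense, which is why it is typically reported alongside perceptual metrics such as LPIPS rather than on its own.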
Because the current Internet has accessibility issues due to Network Address Port Translation (NAPT) and differing IP protocol versions, recent services such as video conferencing also use a client-server model even if ...
Generating 3D human models directly from text helps reduce the cost and time of character modeling. However, achieving multi-attribute controllable and realistic 3D human avatar generation is still challenging due to feature coupling and the scarcity of realistic 3D human avatar datasets. To address these issues, we propose Text2Avatar, which can generate realistic-style 3D avatars from coupled text prompts. Text2Avatar leverages a discrete codebook as an intermediate feature to establish a connection between text and avatars, enabling the disentanglement of features. Furthermore, to alleviate the scarcity of realistic-style 3D human avatar data, we use a pre-trained unconditional 3D human avatar generation model to obtain a large amount of 3D avatar pseudo data, which allows Text2Avatar to achieve realistic-style generation. Experimental results demonstrate that our method can generate realistic 3D avatars from coupled textual data, which is challenging for other existing methods in this field.
This research paper focuses on the development and evaluation of Automatic Speech Recognition (ASR) technology using the XLS-R 300m model. The study aims to improve ASR performance in converting spoken language into w...