In this paper, we introduce a new class of score-based generative models (SGMs) designed to handle high-cardinality data distributions by leveraging concepts from mean-field theory. We present mean-field chaos diffusi...
Query-based similarity search is a useful exploratory tool that has been used in many areas such as music, economics, and biology to find common patterns and behaviors. Existing query-based search systems allow users ...
We characterize the linearity of a lateral junction microring modulator in a monolithic 45 nm CMOS platform versus modulator bias, optical wavelength, and input power, and achieve peak third-order SFDR of 96.4dB·...
ISBN (digital): 9798350317640
ISBN (print): 9798350317657
This position paper explores the gap between the current state of the art in spectrum management and the objective of data-driven spectrum policy. We explore four issues underlying successful data-driven policy: data requirements to support policy decisions; data acquisition and storage; robust, extensible metadata; and tools for analysis and visualization. For each issue, we discuss the state of the art and describe the ultimate objective. We conclude the paper with a call to action to the spectrum community and list a number of efforts that should be undertaken to support true data-driven spectrum policy.
Teams must be formed for all kinds of projects and purposes. Team formation is a key activity for innovation, entrepreneurship, class projects, and industry initiatives. Our experience with entrepreneurship and innova...
Small-world networks, known for their high local clustering and short average path lengths, are a fundamental structure in many real-world systems, including social, biological, and technological networks. We apply th...
Teams must be formed for all kinds of projects and purposes. Team formation is a key activity for innovation, entrepreneurship, class projects, and industry initiatives. In parallel work, we have proposed a generalize...
The log-rank conjecture, a longstanding problem in communication complexity, has persistently eluded resolution for decades. Consequently, some recent efforts have focused on potential approaches for establishing the ...
We present a data structure to randomly sample rows from the Khatri-Rao product of several matrices according to the exact distribution of its leverage scores. Our proposed sampler draws each row in time logarithmic in the height of the Khatri-Rao product and quadratic in its column count, with persistent space overhead at most the size of the input matrices. As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each. When used to sketch the linear least squares problems arising in CANDECOMP/PARAFAC tensor decomposition, our method achieves lower asymptotic complexity per solve than recent state-of-the-art methods. Experiments on billion-scale sparse tensors validate our claims, with our algorithm achieving higher accuracy than competing methods as the decomposition rank grows.
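The abstract does not describe the data structure itself. Purely as a reference point, the following NumPy sketch shows the distribution such a sampler targets: it materializes the full Khatri-Rao product, computes its exact leverage scores via a reduced QR factorization, and uses them to sample and rescale rows of a least squares problem. This naive approach costs O(M R²) time and O(M R) memory for an M × R product, which is exactly what the proposed sampler avoids. All function names are hypothetical, and the sketch assumes the product has full column rank.

```python
import numpy as np


def khatri_rao(matrices):
    """Column-wise Kronecker (Khatri-Rao) product of matrices that share a column count R."""
    result = matrices[0]
    for m in matrices[1:]:
        # Entry (i, j, r) is result[i, r] * m[j, r]; flattening (i, j) gives row i * J + j.
        result = np.einsum("ir,jr->ijr", result, m).reshape(-1, result.shape[1])
    return result


def leverage_scores(a):
    """Exact leverage scores: squared row norms of an orthonormal basis for range(a)."""
    q, _ = np.linalg.qr(a)  # reduced QR; assumes a has full column rank
    return np.einsum("ij,ij->i", q, q)


def sample_rows(a, num_samples, rng):
    """Draw row indices i.i.d. with probability proportional to the leverage scores."""
    scores = leverage_scores(a)
    probs = scores / scores.sum()  # the scores sum to rank(a)
    idx = rng.choice(a.shape[0], size=num_samples, p=probs)
    return idx, probs


def sketched_lstsq(matrices, b, num_samples, rng):
    """Approximately solve min ||A x - b||, A = Khatri-Rao product, on a row sample."""
    a = khatri_rao(matrices)
    idx, probs = sample_rows(a, num_samples, rng)
    # Rescale row i by 1 / sqrt(s * p_i) so that (SA)^T (SA) is unbiased for A^T A.
    w = 1.0 / np.sqrt(num_samples * probs[idx])
    x, *_ = np.linalg.lstsq(w[:, None] * a[idx], w * b[idx], rcond=None)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mats = [rng.standard_normal((20, 4)) for _ in range(3)]  # product is 8000 x 4
    a = khatri_rao(mats)
    b = a @ np.ones(4) + 0.01 * rng.standard_normal(a.shape[0])
    x_sketch = sketched_lstsq(mats, b, num_samples=400, rng=rng)
    x_exact, *_ = np.linalg.lstsq(a, b, rcond=None)
    print(np.max(np.abs(x_sketch - x_exact)))
```

The rescaling by 1/sqrt(s p_i) is the standard choice for importance-sampled sketches; it keeps the sketched normal equations unbiased regardless of how the rows were drawn.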
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e., beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews that report lower confidence, that were submitted close to the deadline, and that come from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends for peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices. Copyright 2024 by the author(s).
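The abstract describes the estimator only at a high level. One common way to formalize corpus-level estimation of this kind, assumed here rather than taken from the paper, is a two-component mixture: each document x is drawn from (1 − α)P(x) + αQ(x), where P and Q are distributions fitted to the expert-written and AI-generated reference texts and α is the fraction to be estimated. The minimal NumPy sketch below takes P(x_i) and Q(x_i) as already computed and maximizes the corpus log-likelihood over a grid of α values. The function names and the toy four-category distributions are hypothetical.

```python
import numpy as np


def mixture_log_likelihood(alpha, p_human, q_ai):
    """Corpus log-likelihood: sum_i log((1 - alpha) * P(x_i) + alpha * Q(x_i))."""
    return float(np.sum(np.log((1.0 - alpha) * p_human + alpha * q_ai)))


def estimate_ai_fraction(p_human, q_ai, grid_size=1001):
    """Maximum likelihood estimate of the mixture weight alpha on a grid over [0, 1].

    p_human[i] and q_ai[i] are the probabilities of document i under the
    human-written and AI-generated reference distributions (assumed precomputed
    and strictly positive).
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    lls = np.array([mixture_log_likelihood(a, p_human, q_ai) for a in grid])
    return float(grid[np.argmax(lls)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical reference distributions over four discrete document "types".
    P = np.array([0.50, 0.30, 0.15, 0.05])  # human-written
    Q = np.array([0.10, 0.20, 0.30, 0.40])  # AI-generated
    n, alpha_true = 20_000, 0.12
    from_ai = rng.random(n) < alpha_true
    x = np.where(from_ai, rng.choice(4, size=n, p=Q), rng.choice(4, size=n, p=P))
    print(estimate_ai_fraction(P[x], Q[x]))
```

Because each term log((1 − α)P + αQ) is the logarithm of a positive affine function of α, the log-likelihood is concave, so a grid search (or a one-dimensional root find on its derivative) locates the global maximum.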