Refine Search Results

Document Type

  • 295 journal articles
  • 158 conference papers
  • 6 books

Collection Scope

  • 459 electronic documents
  • 0 print holdings

Subject Classification

  • 344 Engineering
    • 272 Computer Science and Technology...
    • 190 Software Engineering
    • 45 Control Science and Engineering
    • 44 Information and Communication Engineering
    • 35 Optical Engineering
    • 30 Bioengineering
    • 21 Biomedical Engineering (may confer...
    • 18 Electrical Engineering
    • 18 Electronic Science and Technology (may...
    • 15 Mechanical Engineering
    • 11 Chemical Engineering and Technology
    • 9 Materials Science and Engineering (may...
    • 8 Civil Engineering
    • 7 Mechanics (may confer engineering, sci...
    • 6 Instrument Science and Technology
    • 6 Architecture
    • 6 Safety Science and Engineering
  • 180 Science
    • 100 Mathematics
    • 60 Physics
    • 44 Statistics (may confer science...
    • 34 Biology
    • 20 Systems Science
    • 17 Chemistry
    • 7 Geophysics
  • 45 Management
    • 28 Management Science and Engineering (may...
    • 22 Library, Information and Archives Manage...
    • 17 Business Administration
  • 16 Law
    • 16 Sociology
  • 7 Economics
    • 7 Applied Economics
  • 6 Agriculture
  • 6 Medicine
  • 3 Education
  • 2 Literature
  • 1 Philosophy

Topic

  • 15 篇 reinforcement le...
  • 10 篇 semantics
  • 9 篇 deep learning
  • 8 篇 approximation al...
  • 7 篇 decoding
  • 7 篇 machine learning
  • 7 篇 stochastic syste...
  • 6 篇 computer science
  • 6 篇 bayesian inferen...
  • 5 篇 adversarial mach...
  • 5 篇 speech recogniti...
  • 5 篇 complexity theor...
  • 5 篇 artificial intel...
  • 5 篇 accuracy
  • 4 篇 quantum control
  • 4 篇 deep neural netw...
  • 4 篇 quantum algorith...
  • 4 篇 neural networks
  • 4 篇 optimization
  • 4 篇 computational li...

Institution

  • 71 篇 google deepmind ...
  • 48 篇 google
  • 28 篇 google deepmind
  • 26 篇 google research ...
  • 25 篇 mpi for intellig...
  • 21 篇 google research
  • 16 篇 google united st...
  • 13 篇 google inc.
  • 13 篇 deepmind united ...
  • 10 篇 department of co...
  • 10 篇 google inc. unit...
  • 9 篇 department of co...
  • 9 篇 google research ...
  • 8 篇 department of el...
  • 8 篇 department of co...
  • 8 篇 department of co...
  • 7 篇 department of co...
  • 7 篇 deepmind
  • 7 篇 heidelberg
  • 6 篇 department of el...

Author

  • 36 篇 bernhard schölko...
  • 35 篇 kevin murphy
  • 8 篇 müller klaus-rob...
  • 7 篇 farhi edward
  • 6 篇 jiang zhang
  • 6 篇 bakas spyridon
  • 6 篇 leibo joel z.
  • 6 篇 søgaard anders
  • 6 篇 menze bjoern
  • 6 篇 montavon grégoir...
  • 5 篇 summers ronald m...
  • 5 篇 baumgartner mich...
  • 5 篇 veličković petar
  • 5 篇 antonelli michel...
  • 5 篇 kopp-schneider a...
  • 5 篇 sadigh dorsa
  • 5 篇 isensee fabian
  • 5 篇 xia fei
  • 5 篇 demaine erik d.
  • 4 篇 kreshuk anna

Language

  • 395 English
  • 63 Other
  • 1 Chinese
Search criteria: Institution = "Google DeepMind and Department of Computer Science and Technology"
459 records; showing results 11-20
BAGEL: bootstrapping agents by guiding exploration with language
Proceedings of the 41st International Conference on Machine Learning
Authors: Shikhar Murty Christopher D. Manning Peter Shaw Mandar Joshi Kenton Lee Google Deepmind and Department of Computer Science Stanford University Department of Computer Science Stanford University Google Deepmind
Following natural language instructions by executing actions in digital environments (e.g. web-browsers and REST APIs) is a challenging task for language model (LM) agents. Unfortunately, LM agents often fail to gener...
Interpretability illusions in the generalization of simplified models
Proceedings of the 41st International Conference on Machine Learning
Authors: Dan Friedman Andrew Lampinen Lucas Dixon Danqi Chen Asma Ghandeharioun Department of Computer Science Princeton University Google DeepMind Google Research Department of Computer Science Princeton University and Google DeepMind
A common method to study deep learning systems is to use simplified model representations--for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space. This ...
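As background for the simplification technique this abstract mentions, the snippet below is a minimal, illustrative sketch (not code from the paper): it projects a matrix of hidden states onto its top singular directions with a truncated SVD; the array names and sizes are hypothetical.

    import numpy as np

    # Minimal sketch (assumed setup, not the paper's code): visualize hidden
    # states in a lower-dimensional space via truncated SVD.
    rng = np.random.default_rng(0)
    hidden_states = rng.normal(size=(1000, 512))               # hypothetical [tokens x hidden_dim]

    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)    # singular values sorted largest first

    projection_2d = centered @ Vt[:2].T                        # coordinates along the top-2 directions
    print(projection_2d.shape)                                 # (1000, 2)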
Enhancing Reinforcement Learning with Dense Rewards from Language Model Critic
2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Authors: Cao, Meng Shu, Lei Yu, Lei Zhu, Yun Wichers, Nevan Liu, Yinxiao Meng, Lei School of Computer Science McGill University Canada Department of Computer Science University of Toronto Canada Mila - Québec AI Institute Canada Google Deepmind United Kingdom
Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there i...
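To make the sparsity problem concrete, the sketch below is illustrative only (not the paper's implementation; the critic is a toy stand-in for a language-model critic): it contrasts a single sequence-level reward with dense per-token rewards derived from prefix scores.

    from typing import Callable, List

    def sparse_rewards(tokens: List[str], final_reward: float) -> List[float]:
        # All of the reward arrives at the final token; earlier tokens get no signal.
        return [0.0] * (len(tokens) - 1) + [final_reward]

    def dense_rewards(tokens: List[str], critic: Callable[[List[str]], float]) -> List[float]:
        # A (hypothetical) critic scores every prefix; each token is credited with
        # the change in score it causes, giving a dense per-token reward.
        scores = [critic(tokens[: i + 1]) for i in range(len(tokens))]
        return [scores[0]] + [scores[i] - scores[i - 1] for i in range(1, len(tokens))]

    toy_critic = lambda prefix: float(prefix.count("helpful"))            # toy stand-in critic
    print(sparse_rewards(["be", "helpful", "please"], final_reward=1.0))  # [0.0, 0.0, 1.0]
    print(dense_rewards(["be", "helpful", "please"], toy_critic))         # [0.0, 1.0, 0.0]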
The Poisson midpoint method for Langevin dynamics: provably efficient discretization for diffusion models
Proceedings of the 38th International Conference on Neural Information Processing Systems
Authors: Saravanan Kandasamy Dheeraj Nagaraj Department of Computer Science Cornell University Google DeepMind
Langevin Dynamics is a Stochastic Differential Equation (SDE) central to sampling and generative modeling and is implemented via time discretization. Langevin Monte Carlo (LMC), based on the Euler-Maruyama discretizat...
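For readers unfamiliar with the baseline named here, the following is a toy sketch of plain Langevin Monte Carlo, i.e. the Euler-Maruyama discretization of the Langevin SDE, applied to a standard Gaussian target U(x) = x^2/2; it illustrates the baseline the abstract refers to, not the paper's Poisson midpoint method.

    import numpy as np

    # Plain Langevin Monte Carlo: Euler-Maruyama discretization of
    # dX_t = -grad U(X_t) dt + sqrt(2) dB_t, shown for U(x) = x^2 / 2
    # (baseline only; not the paper's Poisson midpoint method).
    def grad_U(x):
        return x                                   # gradient of U(x) = x^2 / 2

    rng = np.random.default_rng(0)
    step_size, n_steps = 1e-2, 10_000
    x = np.zeros(1)
    samples = []
    for _ in range(n_steps):
        x = x - step_size * grad_U(x) + np.sqrt(2.0 * step_size) * rng.normal(size=x.shape)
        samples.append(x.copy())

    print(np.mean(samples), np.var(samples))       # should approach 0 and 1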
REDUCR: Robust Data Downsampling using Class Priority Reweighting
38th Conference on Neural Information Processing Systems, NeurIPS 2024
Authors: Bankes, William Hughes, George Bogunovic, Ilija Wang, Zi Department of Computer Science University College London United Kingdom Department of Electrical Engineering University College London United Kingdom Google DeepMind United Kingdom
Modern machine learning models are becoming increasingly expensive to train for real-world image and text classification tasks, where massive web-scale data is collected in a streaming fashion. To reduce the training ...
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
41st International Conference on Machine Learning, ICML 2024
Authors: Liu, Ryan Sumers, Theodore R. Dasgupta, Ishita Griffiths, Thomas L. Department of Computer Science Princeton University United States Anthropic United States Google DeepMind United Kingdom Department of Psychology Princeton University United States
In day-to-day communication, people often approximate the truth - for example, rounding the time or omitting details - in order to be maximally helpful to the listener. How do large language models (LLMs) handle such ...
Online Bidding under RoS Constraints without Knowing the Value
34th ACM Web Conference, WWW 2025
Authors: Vijayan, Sushant Feng, Zhe Padmanabhan, Swati Shanmugam, Karthikeyan Suggala, Arun Wang, Di School of Technology and Computer Science Tata Institute of Fundamental Research Mumbai India Google Research Mountain View United States Massachusetts Institute of Technology Cambridge United States Google DeepMind Bengaluru India
We consider the problem of bidding in online advertising, where an advertiser aims to maximize value while adhering to budget and Return-on-Spend (RoS) constraints. Unlike prior work that assumes knowledge of the valu...
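For orientation, the LaTeX formulation below is one standard way to write a budget- and RoS-constrained bidding objective; the notation (per-query value v_t, spend c_t, allocation x_t, budget B, RoS target gamma) is ours and not necessarily the paper's.

    % Illustrative formulation (our notation, not necessarily the paper's):
    % maximize acquired value subject to a total budget and a Return-on-Spend floor.
    \max_{x_t \in [0,1]} \; \sum_{t=1}^{T} v_t x_t
    \quad \text{subject to} \quad
    \sum_{t=1}^{T} c_t x_t \le B \;\; (\text{budget}), \qquad
    \sum_{t=1}^{T} v_t x_t \ge \gamma \sum_{t=1}^{T} c_t x_t \;\; (\text{RoS}).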
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
41st International Conference on Machine Learning, ICML 2024
Authors: Li, Chengshu Liang, Jacky Zeng, Andy Chen, Xinyun Hausman, Karol Sadigh, Dorsa Levine, Sergey Fei-Fei, Li Xia, Fei Ichter, Brian Department of Computer Science Stanford University CA United States Google DeepMind CA United States Department of Electrical Engineering and Computer Sciences University of California Berkeley CA United States
Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to impro...
A Computationally Efficient Sparsified Online Newton Method
37th Conference on Neural Information Processing Systems, NeurIPS 2023
Authors: Devvrit Duvvuri, Sai Surya Anil, Rohan Gupta, Vineet Hsieh, Cho-Jui Dhillon, Inderjit Department of Computer Science The University of Texas Austin United States Google DeepMind United Kingdom CS Department UCLA United States
Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need f...
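As background on what "online Newton" refers to, the LaTeX below is the classical online Newton step; the memory bottleneck is the dense d-by-d curvature matrix H_t, which sparsified variants approximate. This is standard background, not the paper's specific algorithm.

    % Classical online Newton step (standard background, not the paper's method):
    % accumulate gradient outer products and precondition the gradient update.
    H_t = H_{t-1} + g_t g_t^{\top}, \qquad
    w_{t+1} = w_t - \eta \, H_t^{-1} g_t .
    % Storing and inverting the dense d-by-d matrix H_t is what drives the memory
    % and compute cost; sparsifying or structuring H_t is the lever for practicality.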
FRAPPÉ: a group fairness framework for post-processing everything
Proceedings of the 41st International Conference on Machine Learning
Authors: Alexandru Ţifrea Preethi Lahoti Ben Packer Yoni Halpern Ahmad Beirami Flavien Prost Department of Computer Science ETH Zurich Google DeepMind
Despite achieving promising fairness-error trade-offs, in-processing mitigation techniques for group fairness cannot be employed in numerous practical applications with limited computation resources or no access to th...