Paper management

yspm / 2024-09-13 / original post

Once these papers get dumped here I probably won't read them. The main reason is that my Google Chrome bookmarks are overflowing, so I'm clearing them out.

The categories are based purely on paper titles. Experience says that the titles I found coolest usually turn out to be doing something entirely different from what I imagined.

Big lists

From my group

agent survey list

open-source framework survey list

tool learning survey paper list

efficient transformer

transformer circuits

Infra-type papers

https://arxiv.org/pdf/2310.01377 ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback. This paper provides the infra for DPO.

https://arxiv.org/pdf/2305.14233 Enhancing Chat Language Models by Scaling High-quality Instructional Conversations. This is UltraChat, which constructs multi-turn SFT dialogue data. The data is carefully organized into a hierarchical taxonomy; I have used those category labels before.

benchmarks

Reasoning benchmarks

https://arxiv.org/pdf/2402.17644 Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

In this paper, we focus on two main areas of advanced quantitative reasoning: statistical reasoning and causal reasoning, with examples shown in Figure 1. Given a data table from a sample survey, statistical reasoning aims to infer the underlying probability distribution, answering questions such as "What is the 95% confidence interval for the population mean of y?"; causal reasoning aims to understand the causal relationships between variables, answering questions such as "What is the average treatment effect of t on y?"

LLMs are clearly going head-to-head with math and statistics undergrads now.
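To make the two question types concrete, here is a toy sketch (my own illustration on synthetic data, not code from the paper) answering one question of each kind:

```python
# Toy examples of the two question types: a 95% confidence interval for a
# population mean (statistical reasoning) and a naive difference-in-means
# estimate of the average treatment effect (causal reasoning).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=200)   # outcome column from a survey table
t = rng.integers(0, 2, size=200)               # binary treatment column

# "What is the 95% confidence interval for the population mean of y?"
ci = stats.t.interval(0.95, df=len(y) - 1, loc=y.mean(), scale=stats.sem(y))
print(f"95% CI for E[y]: ({ci[0]:.3f}, {ci[1]:.3f})")

# "What is the average treatment effect of t on y?" (assuming t is randomized)
ate = y[t == 1].mean() - y[t == 0].mean()
print(f"naive ATE estimate: {ate:.3f}")
```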

Knowledge Graph

https://arxiv.org/pdf/2409.03155 One way to improve long-reasoning ability is to pull information from a knowledge graph, and QA is one way to interact with a knowledge graph. This paper improves LLM reasoning by improving knowledge graph QA.

Catastrophic Forgetting

https://arxiv.org/pdf/2404.10306 Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model. On balancing versatility and speciality during fine-tuning: if you stuff in a lot of domain-specific data, the model may lose its general abilities. The paper first locates the modules that most strongly express the specialized ability, freezes the remaining parameters, and then fine-tunes on the task data (a rough sketch of the freezing step is below). As for how they find those modules, I couldn't even follow the pseudocode.
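A minimal sketch of the "freeze everything except the selected modules" step, assuming the module names have already been chosen by some importance criterion (the part I didn't follow); the model name and module names below are hypothetical placeholders:

```python
# Freeze all parameters except those under a few selected "speciality" modules,
# then fine-tune as usual. Module names here are made up for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

trainable_modules = ["model.layers.10.mlp", "model.layers.11.mlp"]  # hypothetical

for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(m) for m in trainable_modules)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_trainable}")
# ...then run ordinary SFT on the domain data with this partially frozen model.
```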

Representation Learning

https://arxiv.org/pdf/2409.03662 The Representation Landscape of Few-Shot Learning and Fine-Tuning in Large Language Models. The title already says it all. The content is a bit too representation-learning-heavy for me to get through.

https://arxiv.org/pdf/2212.07677 Transformers Learn In-Context by Gradient Descent. The impressive paper that zhang zhong mentioned in an earlier discussion, but I haven't read it carefully.

https://arxiv.org/pdf/2409.04318 Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs. A paper on the mechanism of in-context learning, recommended on a WeChat public account.

LLM sys

I can't quite define what this category is about; roughly, it is using an LLM as the base model to automate things that previously needed humans. By that definition the following two papers are not really about agents; or rather, the pipeline is only a preliminary one, and the authors did not test whether it works on complex tasks.

https://arxiv.org/pdf/2308.12261 PROMPT2MODEL: Generating Deployable Models from Natural Language Instructions. I happened to get to know the author, who is now a PhD student at UCLA. When I first looked him up, what shocked me most was that he was in Tsinghua CS and still had a 3.9 GPA. Then I learned that maybe 50% of Tsinghua CS is at 3.8, so never mind.

https://arxiv.org/pdf/2407.12874 SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. Also by chenyang zhao. This paper is about automatically synthesizing fine-tuning data: the conventional approach needs an external signal, i.e. a stronger teacher model, whereas this paper wants the student model to guide itself. Roughly that, I think?

X of thoughts

This part mainly collects methods for extending thoughts. Note that this is about thought generation, not the ReAct pipeline, although some of the papers analyzing thoughts carry over directly to how ReAct is used. The experimental datasets in these papers are very fixed: Game of 24, the pocket cube, GSM8K, and so on.

https://arxiv.org/pdf/2208.14271 Faithful Reasoning Using Large Language Models. This architecture seems usable only for multiple-choice questions?

https://arxiv.org/pdf/2205.10625 LEAST-TO-MOST PROMPTING ENABLES COMPLEX REASONING IN LARGE LANGUAGE MODELS. Mainly task decomposition; every stage comes with few-shot examples.

https://arxiv.org/pdf/2211.12588 Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. Yet another paper that serves as baseline number n for later work?

DeepMind / Google Brain were already paying attention to reasoning back then; true pioneers.

https://arxiv.org/pdf/2403.05313v1 RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

A complaint first: the gains reported in the experiments are relative percentages. The opening makes it look like a nearly 20% improvement, then you flip to the actual table and it is a letdown.

The method is to obtain thoughts via RAG. zhangzhong had mentioned to me some problems with using RAG to get the next action (for example, with crude RAG the retrieved results are semantically related but completely unrelated along every other dimension).

https://arxiv.org/pdf/2311.04254 EVERYTHING OF THOUGHTS : DEFYING THE LAW OF PENROSE TRIANGLE FOR THOUGHT GENERATION

https://arxiv.org/pdf/2406.04271 Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models. You can tell from the date that the techniques are a bit more modern. This one also seems to use the retrieve-a-thought trick.

Distillation

https://arxiv.org/pdf/2407.05682 Retrieved In-Context Principles from Previous Mistakes. Somehow this feels like a low-effort paper.

LLM Agents

Automated theorem proving

I really don't know this area. Maybe I should first pick up the basics from classmates before reading these carefully.

https://arxiv.org/pdf/2404.07382 Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving. Work from the group of jingbo shang, a former ICPC World Finals runner-up from SJTU and now an associate professor at UCSD. It considers the learning value of negative examples when training an agent for automated theorem proving. A 30%+ improvement, which is quite strong. I didn't look closely at whether they use RL or SFT; a lot of the trial-and-error work I have seen uses RL.

Tool use

https://arxiv.org/pdf/2406.11200 AVATAR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval

https://arxiv.org/pdf/2406.12045 τ -bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Mobile agents

https://arxiv.org/pdf/2406.11896 DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7479556 MobiGoal: Flexible Achievement of Personal Goals for Mobile Users

agents as operating systems

https://arxiv.org/pdf/2403.16971 AIOS: LLM Agent Operating System

agent evaluation

https://arxiv.org/pdf/2308.04026 An open-source sandbox for large language model evaluation

Reasoning & decision making?

https://arxiv.org/pdf/2212.10403 Towards Reasoning in Large Language Models: A Survey

https://arxiv.org/pdf/2305.14992 Reasoning with Language Model is Planning with World Model

https://arxiv.org/pdf/2405.16376 STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making

https://arxiv.org/pdf/2305.14325 Improving Factuality and Reasoning in Language Models through Multiagent Debate. Mostly selling a new concept, I'd say.

https://arxiv.org/pdf/2307.13692 ARB: Advanced Reasoning Benchmark for Large Language Models. This one is a benchmark.

multiagent?

https://arxiv.org/pdf/2405.15677 SMART: Scalable Multi-agent Real-time Simulation via Next-token Prediction

https://arxiv.org/pdf/2408.11416 Subgoal-based Hierarchical Reinforcement Learning for Multi-Agent Collaboration

https://arxiv.org/pdf/2405.09935 DEBATE: Devil's Advocate-Based Assessment and Text Evaluation. Multiple agents can be brought in at evaluation time as well.

agent training?

https://arxiv.org/pdf/2407.03502 AgentInstruct: Toward Generative Teaching with Agentic Flows

https://arxiv.org/pdf/2310.12823 AGENTTUNING: ENABLING GENERALIZED AGENT ABILITIES FOR LLMS

The idea here is that fine-tuning on a specific dataset makes the LLM lose general ability, so they concatenate the distilled agent dataset with a general-purpose dataset and train on the mixture. Since this is an early paper, they could claim to be the first to do it.

https://arxiv.org/pdf/2312.08468 On Diagnostics for Understanding Agent Training Behaviour in Cooperative MARL. Work from a Tunisian university?

https://arxiv.org/pdf/2406.01495 Re-ReST: Reflection-Reinforced Self-Training for Language Agents

https://arxiv.org/pdf/2402.15506 AGENTOHANA: DESIGN UNIFIED DATA AND TRAINING PIPELINE FOR EFFECTIVE AGENT LEARNING

https://arxiv.org/pdf/2403.14589 ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy

memory

https://arxiv.org/pdf/2404.13501 A Survey on the Memory Mechanism of Large Language Model based Agents

Behavior imitation

This was probably still a fresh concept back then, but by now everyone seems to have played it to death.

https://arxiv.org/pdf/2306.02552 User Behavior Simulation with Large Language Model based Agents

Various applications of LLM agents

Code generation

https://arxiv.org/pdf/2401.07339 CODEAGENT: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges

https://arxiv.org/pdf/2312.13010 AgentCoder: Multi-Agent Code Generation with Effective Testing and Self-optimisation

https://arxiv.org/pdf/2405.17057 ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation. I think I actually read this one, but its reflection is used for fine-tuning, so it was not much help to me back when I was writing rubbish prompts.

Generalization

https://arxiv.org/pdf/2103.02503 Domain Generalization: A Survey (putting this one here shows how coarse-grained my categories are)

Research on LLM base models

https://arxiv.org/pdf/2405.14860 Not All Language Model Features Are Linear

lost in the middle

https://arxiv.org/pdf/2307.03172 Lost in the Middle: How Language Models Use Long Contexts

The paper above is the seminal one. Lost in the middle roughly describes the following:

  1. When the relevant information sits in the middle of the input context, model performance drops significantly, showing that current language models do not always use information in long contexts robustly.
  2. There is a clear U-shaped performance curve: performance is highest when the relevant information appears at the beginning (primacy effect) or end (recency effect) of the input context, and drops markedly in the middle.
  3. Even models designed specifically for long contexts suffer from this degradation.

I feel this is similar to attention sink, or at least attention sink makes it easy to think of this problem. As for why lost in the middle happens, I think the training data matters most: pretraining data is scraped from the internet, internet text is written by humans, and your primary-school teacher taught you the intro-body-conclusion structure, so the two ends naturally carry more information (more critical, more important for comprehension) than the middle. Hence lost in the middle.
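A minimal sketch of how one could probe this effect (my own toy harness, not the paper's code; `ask_llm` is a placeholder for whatever completion call you actually use): sweep the position of the gold passage among distractors and record accuracy per position. A U-shaped curve over positions reproduces the effect.

```python
# Toy position-sweep harness: move the gold document through k+1 positions
# among k distractor documents and measure answer accuracy at each position.
import random

def build_prompt(gold, distractors, position, question):
    docs = distractors[:position] + [gold] + distractors[position:]
    numbered = "\n\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(docs))
    return f"{numbered}\n\nQuestion: {question}\nAnswer:"

def position_sweep(samples, k_distractors, ask_llm):
    accuracy = {}
    for pos in range(k_distractors + 1):  # 0 = start of context, k = end
        correct = 0
        for gold, distractor_pool, question, answer in samples:
            distractors = random.sample(distractor_pool, k_distractors)
            reply = ask_llm(build_prompt(gold, distractors, pos, question))
            correct += int(answer.lower() in reply.lower())
        accuracy[pos] = correct / len(samples)
    return accuracy
```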

https://arxiv.org/pdf/2403.04797 Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding. Found this one while searching around; the title alone is fun.

Leveraging LLMs' world knowledge

Actually one of the papers filed under text games should belong to this category too.

https://arxiv.org/pdf/2407.13578v1 Large Language Models as Reliable Knowledge Bases?

Language models are compression models

https://arxiv.org/pdf/2305.14788 Adapting Language Models to Compress Contexts. The author list even includes Danqi Chen.

https://arxiv.org/pdf/2309.10668 LANGUAGE MODELING IS COMPRESSION

LLM generalization

https://aclanthology.org/2023.findings-emnlp.768.pdf Improving generalization in large language models by learning prefix subspaces. I most likely won't have time to figure this one out.

Model collapse

The Nature cover paper that Prof. Shen criticized for having a very intuitive conclusion.

https://arxiv.org/pdf/2404.01413 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Reducing hallucination

https://arxiv.org/pdf/2405.20974 SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

RAG

https://arxiv.org/pdf/2408.10497 QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention. A paper by competitive programmer y_dove.

https://arxiv.org/pdf/2305.17331 Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

https://arxiv.org/pdf/2409.03708 RAG based Question-Answering for Contextual Response Prediction System

Model training

https://arxiv.org/pdf/2407.06654 SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training. Data reweighting seems quite important; I should read this one carefully soon?

https://arxiv.org/pdf/2407.05013 Progress or Regress? Self-Improvement Reversal in Post-training

https://arxiv.org/pdf/2407.04787 Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

mixture of experts / fusion

https://arxiv.org/pdf/2407.04153 Mixture of A Million Experts. Written by a Google DeepMind guy.

https://arxiv.org/pdf/2401.10491 KNOWLEDGE FUSION OF LARGE LANGUAGE MODELS

https://arxiv.org/abs/2407.19985 Mixture of Nested Experts: Adaptive Processing of Visual Tokens

reflection tuning

https://arxiv.org/pdf/2402.10110 Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

https://arxiv.org/pdf/2310.11716 Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

These two seem to be a series, but I forget the details.

Your "model training" and my "model training" are not the same thing

https://zhuanlan.zhihu.com/p/694263912 the BAdam optimizer?

https://arxiv.org/pdf/2409.03137 THE ADEMAMIX OPTIMIZER: BETTER, FASTER, OLDER. "So this is TCS?"

Knowledge distillation

https://www.zhihu.com/question/309808462/answer/3365782354

Fancy prompt engineering

https://arxiv.org/pdf/2406.06608 The Prompt Report: A Systematic Survey of Prompting Techniques

https://arxiv.org/pdf/2407.04118 MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization. Yet another model-adaptive prompt work, and it already comes with RL.

text based simulators

https://arxiv.org/pdf/2406.06485 Can Language Models Serve as Text-Based World Simulators? This paper argues that LLMs fail to realize that objects other than the one currently being manipulated still change over time in the environment, so the authors built some data for that.

https://arxiv.org/pdf/2107.04132 A Systematic Survey of Text Worlds as Embodied Natural Language Environments (a paper by Ruoyao Wang)

https://arxiv.org/pdf/1909.05398 Interactive Fiction Games: A Colossal Adventure. Several of the authors of this paper are the people behind TextWorld.

https://arxiv.org/pdf/2312.11970v1 Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives

multimodal

https://arxiv.org/pdf/2402.15116 Large Multimodal Agents: A Survey

https://arxiv.org/pdf/2405.10292 Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Voice interaction

https://arxiv.org/pdf/2407.04051 FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Robotics? Embodied?

https://arxiv.org/pdf/2405.13035v1 SIGMA: AN OPEN-SOURCE INTERACTIVE SYSTEM FOR MIXED-REALITY TASK ASSISTANCE RESEARCH

https://arxiv.org/pdf/2304.13705 Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

https://arxiv.org/pdf/2204.01691 Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. I think I collected this one at a talk.

https://arxiv.org/pdf/2212.06817 RT-1: ROBOTICS TRANSFORMER FOR REAL-WORLD CONTROL AT SCALE. I think this is the one sergey levine promoted in CS285.

https://arxiv.org/pdf/2407.02220 Embodied AI in Mobile Robots: Coverage Path Planning with Large Language Models

https://arxiv.org/pdf/2210.03370 GNM: A General Navigation Model to Drive Any Robot

structured exploration

https://arxiv.org/pdf/1802.07245 Meta-Reinforcement Learning of Structured Exploration Strategies. An ancient sergey levine RL paper; I didn't quite follow the intro, so I have shelved it for now.

alignment

https://arxiv.org/pdf/2401.05566 SLEEPER AGENTS: TRAINING DECEPTIVE LLMS THAT PERSIST THROUGH SAFETY TRAINING

https://arxiv.org/pdf/2308.06259 SELF-ALIGNMENT WITH INSTRUCTION BACKTRANSLATION

https://arxiv.org/pdf/2407.13692 PROVER-VERIFIER GAMES IMPROVE LEGIBILITY OF LLM OUTPUTS. Saw this while browsing the OpenAI Research page.

https://arxiv.org/pdf/2004.07213 Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. An old paper by now.

https://arxiv.org/pdf/2202.03286 Red Teaming Language Models with Language Models

https://www.zhihu.com/column/c_1725235995694276608 A guy on Zhihu who writes weekly summaries of some sci-fi novels.

https://arxiv.org/pdf/2408.12163 Preference-Guided Reflective Sampling for Aligning Language Models

unlearning

https://arxiv.org/pdf/2406.11614 Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces. Seems to be about damaging word embeddings. The author wrote an explanation at https://zhuanlan.zhihu.com/p/708685124

https://arxiv.org/pdf/2402.16835 EIGHT METHODS TO EVALUATE ROBUST UNLEARNING IN LLMS. So this is where the experiments get run, I suppose.

Preference Learning

https://arxiv.org/pdf/2406.00888 Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback

DPO

https://arxiv.org/pdf/2404.10719 Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

https://arxiv.org/pdf/2406.18629 STEP-DPO: STEP-WISE PREFERENCE OPTIMIZATION FOR LONG-CHAIN REASONING OF LLMS

DPO is contrastive learning over positive and negative sample pairs. This paper argues that in long-reasoning settings, directly contrasting entire trajectories loses information, so the objective is applied per step, widening the gap between the model's prediction of the correct step \(y_+\) and the incorrect step \(y_-\) at each step.
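My rough reading of the objective, written out as a sketch in the standard DPO form (my reconstruction, not copied from the paper): condition on the prompt \(x\) and a prefix of correct steps \(s_{1:k-1}\), then contrast the preferred next step against the dispreferred one,

\[
\mathcal{L}_{\text{step}}(\theta) = -\,\mathbb{E}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_+ \mid x, s_{1:k-1})}{\pi_{\mathrm{ref}}(y_+ \mid x, s_{1:k-1})} - \beta\log\frac{\pi_\theta(y_- \mid x, s_{1:k-1})}{\pi_{\mathrm{ref}}(y_- \mid x, s_{1:k-1})}\right)\right]
\]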

This looks like a method that got cooked up, lightly tested, and shipped; the gains in the figures aren't that large either.

https://arxiv.org/pdf/2406.11176 Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

In one sentence: SFT loss + step-DPO loss + outcome-DPO loss (DPO without the step-wise part). A very fine piece of patchwork. But I don't think it solves the problem at the method level.
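So roughly, in my own shorthand (the weights \(\lambda_1, \lambda_2\) are my placeholders, not the paper's notation):

\[
\mathcal{L} = \mathcal{L}_{\text{SFT}} + \lambda_1\,\mathcal{L}_{\text{step-DPO}} + \lambda_2\,\mathcal{L}_{\text{outcome-DPO}}
\]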


https://arxiv.org/pdf/2409.03650 ON THE LIMITED GENERALIZATION CAPABILITY OF THE IMPLICIT REWARD MODEL INDUCED BY DIRECT PREFERENCE OPTIMIZATION. This one analyzes the difference between the DPO implicit reward model and the RLHF reward model. The conclusion is that training with the DPO reward model leaves the resulting model weaker on OOD problems: across five test sets it drops three points on average and up to seven points at worst. So the DPO reward model has limited generalization, and those iterative DPO methods are, in some sense, an ensemble of RLHF reward models?

https://arxiv.org/pdf/2406.09760 Bootstrapping Language Models with DPO Implicit Rewards

Iterative DPO needs to construct a preference dataset every round. This paper uses the previous round's reward model to score several responses generated by the current model; the highest- and lowest-scoring responses become \(y_{win}, y_{lose}\), so no external supervision signal is needed.
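A minimal sketch of that pair-construction step (my own illustration; `policy_logprob` and `ref_logprob` are hypothetical helpers returning the summed token log-probability of a response under the current policy and the reference model):

```python
# Build a DPO preference pair by ranking candidates with the DPO implicit reward
# r(x, y) = beta * (log pi_theta(y | x) - log pi_ref(y | x)); no external judge needed.

def implicit_reward(x, y, policy_logprob, ref_logprob, beta=0.1):
    return beta * (policy_logprob(x, y) - ref_logprob(x, y))

def build_preference_pair(x, candidates, policy_logprob, ref_logprob):
    ranked = sorted(candidates, key=lambda y: implicit_reward(x, y, policy_logprob, ref_logprob))
    y_lose, y_win = ranked[0], ranked[-1]  # lowest vs. highest implicit reward
    return {"prompt": x, "chosen": y_win, "rejected": y_lose}
```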

Diffusion models are everywhere

https://arxiv.org/pdf/2407.06938 RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

https://arxiv.org/abs/2402.03570 Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning

https://arxiv.org/abs/2405.12399 Diffusion for World Modeling: Visual Details Matter in Atari

diffusion policy

https://arxiv.org/pdf/2303.04137 Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

transfusion

Searching this keyword on arXiv turns up a pile of results.

https://arxiv.org/pdf/2203.11496 TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

https://arxiv.org/pdf/2403.18681 TRANSFUSION: CONTRASTIVE LEARNING WITH TRANSFORMERS

https://arxiv.org/abs/2311.09999 TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection

https://arxiv.org/abs/2210.07677 TransFusion: Transcribing Speech with Multinomial Diffusion

https://arxiv.org/pdf/2307.12667 TRANSFUSION: GENERATING LONG, HIGH FIDELITY TIME SERIES USING DIFFUSION MODELS WITH TRANSFORMERS

https://www.arxiv.org/pdf/2408.11039 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Reinforcement learning

Subgoals in RL

https://arxiv.org/pdf/2107.00541 Goal-Conditioned Reinforcement Learning with Imagined Subgoals

Trajectory Exploration

I really don't get you RL people.

https://arxiv.org/pdf/2403.02502 Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

https://arxiv.org/pdf/2406.11176 Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

Step reward design

https://arxiv.org/pdf/2310.10080 LET’S REWARD STEP BY STEP: STEP-LEVEL REWARD MODEL AS THE NAVIGATORS FOR REASONING

https://arxiv.org/pdf/2305.20050 Let’s Verify Step by Step (from OpenAI)

I don't know how to categorize these RL algorithms either.

https://arxiv.org/pdf/1812.02690 Provably Efficient Maximum Entropy Exploration. Seems to be an impressive TCS paper.

https://github.com/WindyLab/LLM-RL-Papers A collection of LLM + RL papers maintained by Westlake University.

These papers all come from the Zhihu feeds of people I follow. They look quite professional, but I don't really know what these papers actually do.

https://www.arxiv.org/pdf/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search (from DeepSeek; it seems to have been hyped up)

https://arxiv.org/pdf/2409.01369v1 Imitating Language via Scalable Inverse Reinforcement Learning. Also a novel one.

https://arxiv.org/pdf/2107.10390 Reinforcement Learning Agent Training with Goals for Real World Tasks

https://arxiv.org/pdf/2406.14324 Revealing the learning process in reinforcement learning agents through attention-oriented metrics

Ancient foundational RL papers

These are all things sergey levine mentioned in CS285.

https://arxiv.org/pdf/1707.01495 Hindsight Experience Replay

https://arxiv.org/pdf/1706.03741 Deep Reinforcement Learning from Human Preferences

https://arxiv.org/pdf/1912.06088 Learning to Reach Goals via Iterated Supervised Learning

https://arxiv.org/pdf/1903.01973 Learning Latent Plans from Play

quiet q* — can that just be the heading?

https://arxiv.org/pdf/2403.09629v1 Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

https://arxiv.org/pdf/2408.02666 Self-Taught Evaluators (by Meta)

Knowledge transfer

https://arxiv.org/pdf/2408.10858 Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning

https://arxiv.org/pdf/2408.12525v1 Scaling, Control and Generalization in Reinforcement Learning Level Generators

Little essays taking stock of the field and reflecting on the trends

https://arxiv.org/pdf/2407.01502 AI Agents That Matter. Criticizes most current work for doing evaluation poorly.

https://arxiv.org/pdf/2211.16327 ON THE POWER OF FOUNDATION MODELS. A little category-theory paper.

Uncategorized

Generative agents: Interactive simulacra of human behavior

https://ysymyth.github.io/papers/Dissertation-finalized.pdf The famous shunyu yao's PhD dissertation.

https://proceedings.neurips.cc/paper_files/paper/2011/file/e19347e1c3ca0c0b97de5fb3b690855a-Paper.pdf Unsupervised learning models of primary cortical receptive fields and receptive field plasticity. A bit too old; not sure it is still meaningful.

https://arxiv.org/pdf/2405.16137 Comparison between Behavior Trees and Finite State Machines

https://proceedings.neurips.cc/paper_files/paper/2020/file/1f89885d556929e98d3ef9b86448f951-Paper.pdf Learning to summarize from human feedback. A major OpenAI work.

https://arxiv.org/pdf/2008.02217 HOPFIELD NETWORKS IS ALL YOU NEED