Agentic RL: Reward, Behavior, and the Long Shadow of Feedback
A field note on agentic RL: reward design, behavior shaping, credit assignment, online and offline evaluation, and the feedback loops behind agents.