Reinforcement Learning

Reinforcement Learning from Rich Feedback with Distributional DAgger (arxiv.org)

0 points, 1 source, 1 minute ago

Researchers propose a new algorithm, Distributional DAgger, to improve reinforcement learning from rich feedback. This approach aims to reduce uncertainty in decision-making by leveraging distributional information.

machine-learning reinforcement-learning

QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards (arxiv.org)

0 points, 1 source, 1 minute ago

Researchers propose QUBRIC, a framework that co-designs queries and rubrics so that reinforcement learning (RL) can be applied to tasks whose rewards are not automatically verifiable.

reinforcement-learning

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation (arxiv.org)

0 points, 1 source, 1 minute ago

Researchers propose a self-refining agentic reinforcement learning approach for vision-conditioned UAV navigation, which improves navigation performance and adaptability.

agents reinforcement-learning uav-navigation

Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning (arxiv.org)

0 points, 1 source, 1 minute ago

Researchers propose a method to induce diverse behavior in reinforcement learning by incorporating reward uncertainty. This approach aims to improve exploration and decision-making in complex environments.

machine-learning reinforcement-learning

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases (arxiv.org)

0 points, 1 source, 1 minute ago

Researchers report that reinforcement learning from human feedback can be exploited to optimize misaligned biases in AI systems, according to a study posted on arXiv.

artificial-intelligence reinforcement-learning