Trust Region Policy Optimization
Introduces Trust Region Policy Optimization (TRPO), a novel iterative algorithm designed for optimizing policies with guaranteed monotonic improvement...
Find all the Top AIGames papers. Links to pdf, code repos and demos are provided.
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a me...
Presents a novel approach to enabling intelligent agents to learn and simulate physical dynamics through visual predictive models, specifically in the...
Every living organism struggles against disruptive environmental forces to carve out and maintain an orderly niche. We propose that such a struggle to...
Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challengin...
Introduces the Normalized Actor-Critic (NAC) algorithm, designed to facilitate reinforcement learning from both imperfect demonstrations and environme...
Presents a novel model called SCOFF that separates declarative knowledge from procedural knowledge to enhance the modeling of dynamic environments, pa...
Presents Simulated Policy Learning (SimPLe), a model-based reinforcement learning algorithm that significantly improves sample efficiency in learning ...
Presents PokéChamp, an expert-level minimax language agent utilizing Large Language Models (LLMs) for Pokémon battles, achieving superior performance ...
The Transformer, originally devised for natural language processing, has also achieved significant success in computer vision. Thanks to its super express...
Introduces CHASE, a framework for generating challenging evaluation problems for Large Language Models (LLMs) using synthetic methods, bypassing the n...
We present Readout Guidance, a method for controlling text-to-image diffusion models with learned signals. Readout Guidance uses readout heads, lightw...