Difference between revisions of "Reinforcement learning"

From Lesswrongwiki
== Further Reading & References==
*[ Reinforcement learning: An introduction] by Sutton & Barto
*[http://arxiv.org/pdf/cs/9605103v1.pdf Reinforcement learning: a survey] by Kaelbling, Littman & Moore
==See Also==

Revision as of 01:04, 14 September 2012


Within the field of machine learning, reinforcement learning refers to the study of methods for maximizing the reward obtained through interaction with an environment, with no a priori knowledge of the environment's properties. Strongly inspired by work in behavioral psychology, it is essentially a trial-and-error approach to finding the best strategy.

Consider an agent that receives an input vector, I, from a complex environment about which it knows nothing, informing it of the environment's state, S. Based only on that information, the agent must decide which action to take from a set of actions, A. The chosen action changes the state of the environment, which produces a new input vector, and so on. At each step the agent is also presented with the reward, r, for its action in the environment. The agent's goal is to find, based on previous experience, the strategy that gives the highest expected reward over time.
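The loop described above can be sketched in a few lines of Python. This is only an illustration: the environment `CoinFlipEnv`, its reward rule, and the two policies are hypothetical and are not taken from the article.

```python
import random


class CoinFlipEnv:
    """Hypothetical two-state environment, used only for illustration."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0  # current state s, either 0 or 1

    def step(self, action):
        # Assumed reward rule: 1 if the action matches the current state.
        reward = 1.0 if action == self.state else 0.0
        # The environment then moves to a new random state.
        self.state = self.rng.randint(0, 1)
        return self.state, reward


def run_episode(env, policy, steps=100):
    """Run the observe, act, reward loop and return the total reward."""
    observation = env.state  # input I describing the state
    total_reward = 0.0
    for _ in range(steps):
        action = policy(observation)            # choose an action from A
        observation, reward = env.step(action)  # new input and reward r
        total_reward += reward
    return total_reward


def matching_policy(observation):
    """Always answer with the observed state."""
    return observation


def constant_policy(observation):
    """Ignore the observation and always pick action 0."""
    return 0
```

In this toy setting `matching_policy` collects the reward on every step, while `constant_policy` only collects it when the state happens to be 0. A learning agent would have to discover the better strategy from the rewards alone.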

Exploration and Optimization

Selecting actions at random performs poorly, but always repeating the action that has looked best so far can leave the agent stuck with a sub-optimal choice. One of the biggest problems in reinforcement learning is therefore how to explore the available set of actions so that the agent can move on to better ones. This is the problem of exploration, which is best illustrated by the most studied reinforcement learning problem: the k-armed bandit.
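As a rough illustration of the exploration problem, the sketch below plays a k-armed bandit with an epsilon-greedy rule, one common strategy among several. The article does not prescribe it. The Bernoulli reward model, the function name, and the parameter values are assumptions made for the example.

```python
import random


def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli k-armed bandit with an epsilon-greedy rule.

    arm_probs[a] is the (hidden) probability that arm a pays reward 1.
    Returns the estimated arm values, the pull counts and the total reward.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k    # number of times each arm was pulled
    values = [0.0] * k  # running average reward of each arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit best estimate
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward
```

With `epsilon=0` the agent never explores and can lock onto whichever arm paid off first. With `epsilon=1` it behaves randomly. Values in between trade off exploration against exploiting the current estimates.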

Alongside an exploration scheme, it is still necessary to choose the criterion that makes one action optimal compared to another. The study of this question has led to several methods, from brute force to methods that take into account temporal differences in the received reward. Despite these methods and the great results they have obtained on small problems, reinforcement learning suffers from a lack of scalability and has difficulty solving larger, close-to-human scenarios.
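One well-known temporal-difference method is tabular Q-learning, sketched below on a hypothetical five-state corridor where only reaching the rightmost state is rewarded. The corridor, the uniformly random behaviour policy, and the values of `alpha` and `gamma` are assumptions chosen for the example, not something the article specifies.

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    Actions: 0 = left, 1 = right. Reaching the last cell gives reward 1
    and ends the episode; every other move gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Behave randomly; Q-learning still learns about the greedy policy.
            action = rng.randrange(2)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Temporal-difference error: new estimate minus old estimate.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            td_error = reward + gamma * best_next - q[state][action]
            q[state][action] += alpha * td_error
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the table."""
    return [max((0, 1), key=lambda a: row[a]) for row in q[:-1]]
```

The table holds one entry per state and action pair, so it grows with the size of the state space. That growth is one way to see the scalability problem mentioned above.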

Further Reading & References

See Also