Qualia  0.2
QLearningEDecreasingPolicy Class Reference

#include <QLearningEDecreasingPolicy.h>

Inheritance: QLearningEDecreasingPolicy → QLearningEGreedyPolicy → Policy

Public Member Functions

 QLearningEDecreasingPolicy (float epsilon, float decreaseConstant)
 
virtual void init ()
 
virtual void chooseAction (Action *action, const Observation *observation)
 
virtual float getCurrentEpsilon () const
 Returns the current epsilon value, i.e. epsilon / (1 + t * decreaseConstant). More...
 
- Public Member Functions inherited from QLearningEGreedyPolicy
 QLearningEGreedyPolicy (float epsilon)
 
virtual ~QLearningEGreedyPolicy ()
 
- Public Member Functions inherited from Policy
 Policy ()
 
virtual ~Policy ()
 
virtual void setAgent (Agent *agent_)
 

Public Attributes

float decreaseConstant
 
float _epsilonDiv
 
- Public Attributes inherited from QLearningEGreedyPolicy
float epsilon
 The value (should be in [0,1]). More...
 
- Public Attributes inherited from Policy
Agent * agent
 

Constructor & Destructor Documentation

QLearningEDecreasingPolicy::QLearningEDecreasingPolicy (float epsilon, float decreaseConstant)

Member Function Documentation

void QLearningEDecreasingPolicy::chooseAction (Action * action, const Observation * observation)
virtual

Chooses an action based on the given observation and stores it in action.

Reimplemented from QLearningEGreedyPolicy.

float QLearningEDecreasingPolicy::getCurrentEpsilon ( ) const
virtual

Returns the current epsilon value, i.e. epsilon / (1 + t * decreaseConstant).

void QLearningEDecreasingPolicy::init ( )
virtual

Reimplemented from Policy.

Member Data Documentation

float QLearningEDecreasingPolicy::_epsilonDiv

float QLearningEDecreasingPolicy::decreaseConstant

The decrease constant. Value should be >= 0, usually in [0, 1]. The decrease constant is applied in a similar fashion to the one for the stochastic gradient (see NeuralNetwork.h). Here, it is used to slowly decrease the epsilon value, thus allowing the agent to adapt its policy over time from being more exploratory to being more greedy.


The documentation for this class was generated from the following files: