Qualia 0.2
#include <QLearningSoftmaxPolicy.h>
Public Member Functions

  QLearningSoftmaxPolicy (float temperature=1.0, float epsilon=0.0)
  virtual ~QLearningSoftmaxPolicy ()
  virtual void chooseAction (Action *action, const Observation *observation)

Public Member Functions inherited from Policy

  Policy ()
  virtual ~Policy ()
  virtual void init ()
  virtual void setAgent (Agent *agent_)

Public Attributes

  float temperature
  float epsilon

Public Attributes inherited from Policy

  Agent *agent
Implements the softmax policy. The class contains an optional epsilon parameter that behaves in a similar fashion to the epsilon-greedy policy: with probability epsilon the action is chosen uniformly at random across the action space, and with probability (1-epsilon) the policy resorts to softmax selection, i.e. it still picks an action at random, but this time according to the softmax distribution.
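The selection rule described above can be sketched as a standalone function. This is an illustrative sketch only, not the Qualia implementation: the names `chooseSoftmax` and `qValues` are hypothetical, and the actual class reads Q-values from its agent rather than taking them as an argument.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical sketch of epsilon + softmax action selection.
// Returns the index of the chosen action.
int chooseSoftmax(const std::vector<float>& qValues,
                  float temperature, float epsilon) {
  // With probability epsilon, choose uniformly at random.
  float r = static_cast<float>(std::rand()) / RAND_MAX;
  if (r < epsilon)
    return static_cast<int>(std::rand() % qValues.size());

  // Otherwise sample from the softmax (Boltzmann) distribution
  // P(a) = exp(Q(a)/T) / sum_b exp(Q(b)/T).
  std::vector<float> weights(qValues.size());
  float sum = 0.0f;
  for (size_t i = 0; i < qValues.size(); ++i) {
    weights[i] = std::exp(qValues[i] / temperature);
    sum += weights[i];
  }
  float u = (static_cast<float>(std::rand()) / RAND_MAX) * sum;
  float cumulative = 0.0f;
  for (size_t i = 0; i < qValues.size(); ++i) {
    cumulative += weights[i];
    if (u <= cumulative)
      return static_cast<int>(i);
  }
  return static_cast<int>(qValues.size()) - 1;  // numerical safety fallback
}
```

Setting epsilon to 0 (the default) gives pure softmax behavior; setting it above 0 mixes in occasional fully random exploration on top of the softmax distribution.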
QLearningSoftmaxPolicy::QLearningSoftmaxPolicy (float temperature = 1.0, float epsilon = 0.0)

virtual QLearningSoftmaxPolicy::~QLearningSoftmaxPolicy () [virtual]

virtual void QLearningSoftmaxPolicy::chooseAction (Action *action, const Observation *observation) [virtual]
This method is implemented by subclasses. It chooses an action based on the given observation and puts the result in action.
Implements Policy.
float QLearningSoftmaxPolicy::epsilon |
An optional parameter giving the probability of choosing an action uniformly at random instead of sampling from the softmax distribution. The default value of 0.0 yields pure softmax behavior.
float QLearningSoftmaxPolicy::temperature |
The temperature controls the "peakiness" (or "greediness") of the policy. Lower temperatures yield a more peaky/greedy distribution concentrated on the highest-valued actions, whereas higher temperatures result in flatter, more uniformly distributed choices.
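The effect of the temperature can be seen by computing the softmax probabilities for a fixed pair of Q-values at different temperatures. This is an illustrative sketch, not part of the Qualia API; `softmax2` is a hypothetical helper for two actions.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: softmax probabilities for two actions with
// Q-values q0 and q1 at the given temperature.
std::vector<float> softmax2(float q0, float q1, float temperature) {
  float e0 = std::exp(q0 / temperature);
  float e1 = std::exp(q1 / temperature);
  return { e0 / (e0 + e1), e1 / (e0 + e1) };
}
```

For Q-values (1, 2), a low temperature such as 0.1 puts almost all probability on the higher-valued action, while a high temperature such as 10 gives a nearly uniform split.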