Qualia 0.2
#include <QLearningSoftmaxPolicy.h>


Public Member Functions
  QLearningSoftmaxPolicy (float temperature=1.0, float epsilon=0.0)
  virtual ~QLearningSoftmaxPolicy ()
  virtual void chooseAction (Action *action, const Observation *observation)

Public Member Functions inherited from Policy
  Policy ()
  virtual ~Policy ()
  virtual void init ()
  virtual void setAgent (Agent *agent_)

Public Attributes
  float temperature
  float epsilon

Public Attributes inherited from Policy
  Agent *agent
Implements the softmax policy. The class contains an optional epsilon parameter that behaves similarly to the ε-greedy policy: with probability ε the action is chosen uniformly at random across the action space, and with probability (1 - ε) the policy resorts to softmax selection, i.e. it still picks an action at random, but according to the softmax distribution over action values.
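The selection rule above can be sketched as follows. This is a minimal, self-contained illustration, not Qualia's actual implementation: the function name `chooseSoftmaxAction` and the `qValues` vector are assumptions standing in for the agent's action-value estimates.

```cpp
// Sketch of the epsilon-softmax rule: with probability epsilon pick uniformly,
// otherwise sample from the Boltzmann distribution P(a) ∝ exp(Q(a) / T).
#include <cmath>
#include <cstdlib>
#include <vector>

// Returns the index of the chosen action.
int chooseSoftmaxAction(const std::vector<float>& qValues,
                        float temperature, float epsilon) {
  int n = (int)qValues.size();
  // With probability epsilon, choose uniformly at random (epsilon-greedy-like).
  if ((float)std::rand() / RAND_MAX < epsilon)
    return std::rand() % n;
  // Otherwise sample according to the softmax distribution.
  std::vector<double> weights(n);
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    weights[i] = std::exp(qValues[i] / temperature);
    sum += weights[i];
  }
  double r = (double)std::rand() / RAND_MAX * sum;
  double acc = 0.0;
  for (int i = 0; i < n; ++i) {
    acc += weights[i];
    if (r <= acc) return i;
  }
  return n - 1;  // numerical fallback for rounding at the boundary
}
```

With epsilon = 0.0 (the constructor default) the rule reduces to pure softmax selection; with epsilon = 1.0 it reduces to uniform random selection.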
QLearningSoftmaxPolicy::QLearningSoftmaxPolicy (float temperature = 1.0, float epsilon = 0.0)

virtual QLearningSoftmaxPolicy::~QLearningSoftmaxPolicy () [virtual]

virtual void QLearningSoftmaxPolicy::chooseAction (Action *action, const Observation *observation) [virtual]
Chooses an action based on the given observation #observation# and puts it in #action#.
Implements Policy.
float QLearningSoftmaxPolicy::epsilon

An optional parameter: the probability of choosing an action uniformly at random instead of sampling from the softmax distribution. With epsilon = 0.0 (the default), the policy is pure softmax.
float QLearningSoftmaxPolicy::temperature

The temperature controls the "peakiness" (or "greediness") of the policy. Lower temperatures result in a peakier / greedier distribution, whereas higher temperatures result in flatter, more uniformly distributed choices.
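The temperature effect can be seen by computing the softmax probabilities directly, assuming the conventional Boltzmann form P(a) ∝ exp(Q(a) / T) (the helper `softmax` below is illustrative, not part of the Qualia API):

```cpp
// Illustrates how temperature reshapes the softmax distribution
// P(a) ∝ exp(Q(a) / T) over a vector of action values.
#include <cmath>
#include <vector>

std::vector<double> softmax(const std::vector<float>& q, float temperature) {
  std::vector<double> p(q.size());
  double sum = 0.0;
  for (size_t i = 0; i < q.size(); ++i) {
    p[i] = std::exp(q[i] / temperature);  // unnormalized Boltzmann weight
    sum += p[i];
  }
  for (double& x : p) x /= sum;  // normalize to a probability distribution
  return p;
}
```

For example, with action values {1.0, 2.0}, a cold temperature of 0.1 puts almost all probability on the higher-valued action (greedy), while a hot temperature of 10.0 yields probabilities close to 50/50 (exploratory).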