Reward Kernel Formulation - incorrect in docs #737

@hungto3112

Description

This was discussed and confirmed in [https://discord.com/channels/698080905209577513/702060196222205962/1468594043851374604]

Assume the grid has only one redispatchable generator, with a maximum ramp up/down of +5/-5. According to the docs, the action space is then [-5, 5]: [https://grid2op.readthedocs.io/en/latest/mdp.html#modeling-sequential-decisions]

If the agent outputs an action such as [6.0] (which can happen when the agent is a neural network), the way the reward kernel behaves, in MDP terms, doesn't match the notation.

The out-of-range action is processed and the "do-nothing action" is performed instead.
However, some reward functions available in Grid2Op can still give a "-1.0" reward signal because the agent asked for an illegal action.

In the notation of the reward kernel, is the action "a":

  • Case A: the illegal, out-of-action-space [6.0],
  • or Case B: the "do-nothing action" (which replaces the [6.0])?

I see a contradiction either way:
If it's Case A, the reward kernel is processing an action (or action vector) that doesn't belong to the action space of the environment (it is outside [-5, 5]). Is that acceptable?
If it's Case B, it doesn't make sense either, because the kernel now returns -1.0 for a do-nothing action. That would not happen if the agent had sent the real do-nothing action in the first place (it would not be treated as illegal, hence no -1.0). So does that mean something outside the reward kernel defines the -1.0 penalty for illegal actions?
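
To make the mismatch concrete, here is a minimal toy sketch. It is not Grid2Op code: `toy_step`, `in_action_space`, and the 0.0 reward for a legal action are assumptions made up for illustration. Only the [-5, 5] range, the do-nothing replacement, and the -1.0 penalty come from the description above.

```python
import numpy as np

RAMP = 5.0                # max ramp up/down from the example above
DO_NOTHING = np.zeros(1)  # redispatch of 0.0 on the single generator


def in_action_space(action):
    """True if every component lies in [-RAMP, RAMP]."""
    return bool(np.all(np.abs(action) <= RAMP))


def toy_step(agent_action):
    """Hypothetical stand-in for one environment step (not the Grid2Op API).

    Returns the action that is actually executed and the reward.
    """
    agent_action = np.asarray(agent_action, dtype=float)
    is_illegal = not in_action_space(agent_action)
    # An illegal action is replaced by do-nothing before being applied.
    executed = DO_NOTHING if is_illegal else agent_action
    # Assumption: 0.0 for a legal action, -1.0 when the request was illegal.
    reward = -1.0 if is_illegal else 0.0
    return executed, reward


executed_a, reward_a = toy_step([6.0])  # illegal request
executed_b, reward_b = toy_step([0.0])  # genuine do-nothing
```

In this toy, `executed_a` and `executed_b` are the same do-nothing action, yet `reward_a` and `reward_b` differ. A kernel written as a function of the state and the executed action alone cannot produce both values.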

Possible solution

The reward kernel should be a function that also takes some flags from the environment (such as "is_ambiguous", "is_illegal", etc.), which are not in the current formulation.

So we might have:

final_action, is_ambiguous, is_illegal = translate_action(action_vector_from_agent)

RewardKernel(s, final_action, is_ambiguous, is_illegal)

When the agent's action is illegal or ambiguous, final_action is usually the do-nothing action.
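
A rough Python sketch of that proposal follows. Everything in it is hypothetical and not part of Grid2Op: the `ActionFlags` container, the bodies of `translate_action` and `reward_kernel`, the `base_reward` callable, and the use of NaN as a stand-in for an "ambiguous" action. It only shows the flags being passed to the kernel explicitly.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ActionFlags:
    """Hypothetical container for the flags named in the proposal."""
    is_ambiguous: bool
    is_illegal: bool


def translate_action(action_vector, low=-5.0, high=5.0):
    """Map the raw agent output to the action that is actually applied."""
    raw = np.asarray(action_vector, dtype=float)
    # Assumption for this sketch only: NaN entries count as "ambiguous".
    is_ambiguous = bool(np.any(np.isnan(raw)))
    is_illegal = (not is_ambiguous) and bool(np.any((raw < low) | (raw > high)))
    if is_ambiguous or is_illegal:
        final_action = np.zeros_like(raw)  # fall back to do-nothing
    else:
        final_action = raw
    return final_action, ActionFlags(is_ambiguous, is_illegal)


def reward_kernel(state, final_action, flags, base_reward):
    """Reward depends on the state, the applied action, and the flags."""
    if flags.is_ambiguous or flags.is_illegal:
        return -1.0  # penalty from the issue description
    return base_reward(state, final_action)


# Usage: an out-of-range request versus a genuine do-nothing.
final_bad, flags_bad = translate_action([6.0])
final_ok, flags_ok = translate_action([0.0])
r_bad = reward_kernel(None, final_bad, flags_bad, lambda s, a: 0.5)
r_ok = reward_kernel(None, final_ok, flags_ok, lambda s, a: 0.5)
```

With the flags as explicit arguments, the two calls apply the same do-nothing action but receive different rewards, and that difference is visible in the kernel's signature rather than hidden outside it.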
