❓ Question
Hi, I am working with both PPO and A2C, and PPO is working great for me. I have dialed in the PPO hyperparameters well for my case, but when I use A2C with the same (or as close as I can get) hyperparameters, training falls well short of PPO's results.
I know they are different algorithms, so I have also explored the A2C hyperparameters using guidance from the documentation, but I just can't get close to the same performance. I've come across this paper, which shows that it is possible to go from A2C to PPO (i.e., to make PPO behave like A2C) and the steps needed, but is it possible to go the other way? If so, how?
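For context, my (possibly incomplete) reading of that direction in stable-baselines3 terms is sketched below: PPO configured so that its update matches A2C. The environment and `n_envs` here are placeholders, and the exact values are my assumptions based on the paper and the A2C defaults.

```python
import torch as th
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

n_envs = 32
env = make_vec_env("CartPole-v1", n_envs=n_envs)  # placeholder environment

# PPO set up to mimic A2C, as I understand the mapping:
ppo_as_a2c = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(
        optimizer_class=th.optim.RMSprop,  # A2C uses RMSprop by default
        optimizer_kwargs=dict(alpha=0.99, eps=1e-5, weight_decay=0),
    ),
    n_steps=5,                  # A2C's default rollout length
    batch_size=5 * n_envs,      # one full-batch gradient step per rollout
    n_epochs=1,                 # a single pass over the data, like A2C
    gae_lambda=1.0,             # A2C's default (no GAE smoothing)
    normalize_advantage=False,  # A2C does not normalize advantages by default
    clip_range=10.0,            # large enough that clipping never triggers
    learning_rate=7e-4,         # A2C's default learning rate
)
```

What I am after is essentially the inverse of this mapping.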
My best performance was (PPO in orange, A2C in blue):

With other attempts not being close (PPO in orange, A2C with various hyperparameters as other colours):
This is the network structure I use for both, with MlpPolicy, and the PPO parameters:
```python
import torch as th

policy_kwargs = {
    "activation_fn": th.nn.ReLU,
    "net_arch": {
        "pi": [size, size],  # "size" is set earlier in my script
        "vf": [size, size],
    },
    "ortho_init": True,
}

ppo_params = {
    "n_steps": 6144,
    "batch_size": 512,
    "n_epochs": 6,
    "clip_range": 0.25,
    "ent_coef": 0.01,
    "max_grad_norm": 0.5,
    "gamma": 0.995,
    "gae_lambda": 0.95,
    "target_kl": 0.03,
}
```

I always use 32 envs for training.
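To make the mapping concrete, this is roughly how I would translate those values to A2C (placeholder environment and a placeholder `size`; I am not sure the optimizer and `n_steps` choices are right, which is part of my question):

```python
import torch as th
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

size = 64  # placeholder; the real width is set elsewhere in my script
env = make_vec_env("CartPole-v1", n_envs=32)  # placeholder for my actual env

model = A2C(
    "MlpPolicy",
    env,
    policy_kwargs={
        "activation_fn": th.nn.ReLU,
        "net_arch": {"pi": [size, size], "vf": [size, size]},
        "ortho_init": True,
    },
    use_rms_prop=False,        # fall back to Adam, which PPO uses
    learning_rate=3e-4,        # PPO's default (A2C defaults to 7e-4)
    n_steps=6144,              # same per-env rollout length as my PPO config
    gamma=0.995,
    gae_lambda=0.95,
    ent_coef=0.01,
    max_grad_norm=0.5,
    normalize_advantage=True,  # PPO normalizes advantages by default, A2C does not
)
```

As far as I can tell, `batch_size`, `n_epochs`, `clip_range`, and `target_kl` have no direct A2C equivalents, which is where I get stuck.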
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.