-
Notifications
You must be signed in to change notification settings - Fork 23
CLOUDP-328217: Automation agent password secret #566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
CLOUDP-328217: Automation agent password secret #566
Conversation
MCK 1.6.0 Release NotesNew Features
Bug Fixes
Other Changes
|
| // Configure will configure all the specified authentication Mechanisms. We need to ensure we wait for | ||
| // the agents to reach ready state after each operation as prematurely updating the automation config can cause the agents to get stuck. | ||
| func Configure(conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error { | ||
| func Configure(client kubernetesClient.Client, ctx context.Context, mdbNamespacedName *types.NamespacedName, conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ctx should always be first arg
| // Disable disables all authentication mechanisms, and waits for the agents to reach goal state. It is still required to provide | ||
| // automation agent username, password and keyfile contents to ensure a valid Automation Config. | ||
| func Disable(conn om.Connection, opts Options, deleteUsers bool, log *zap.SugaredLogger) error { | ||
| func Disable(client kubernetesClient.Client, ctx context.Context, mdbNamespacedName *types.NamespacedName, conn om.Connection, opts Options, deleteUsers bool, log *zap.SugaredLogger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
types.NamespacedName is usually not passed by pointer. Is there a reason it's a pointer here? Is passing nil here a valid case?
| authentication: | ||
| agents: | ||
| # This may look weird, but without it we'll get this from OpsManager: | ||
| # Cannot configure SCRAM-SHA-1 without using MONGODB-CR in te Agent Mode","reason":"Cannot configure SCRAM-SHA-1 without using MONGODB-CR in te Agent Mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you pls fix the typo in "te" in the mentioned validation btw.?
| security: | ||
| authentication: | ||
| agents: | ||
| # This may look weird, but without it we'll get this from OpsManager: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove "This may look weird" - let's state the why objectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. SCRAM-SHA-1 is deprecated and it requires some additional legacy enablement with that MONGODB-CR IIRC
| server_certs: str, | ||
| namespace: str, | ||
| ) -> MongoDB: | ||
| resource = MongoDB.from_yaml(find_fixture(f"switch-project/{MDB_FIXTURE_NAME}.yaml"), namespace=namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use some basic fixture to not create redundant yamls? I see you're configuring all the security and auth in the test anyway
| Ensures test isolation in a multi-namespace test environment. | ||
| """ | ||
| return random_k8s_name(f"{namespace}-project-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to randomize it - namespace is randomized in evg anyway. And randomizing it is just making local runs difficult and not possible to re-run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to have different project names for different deployments then just add a resource name to it: {namespace}-{mdb.name} - it will suffice for making them unique for the test
|
|
||
|
|
||
| @pytest.fixture(scope="module") | ||
| def replica_set(namespace: str) -> MongoDB: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resource fixture should be scoped to function and have if try_load(resource) applied (look for it in other tests).
We don't have this pattern applied across the board, but we try to use it for newer tests.
| tester.assert_expected_users(0) | ||
|
|
||
| def test_create_secret(self): | ||
| print(f"creating password for MongoDBUser {self.USER_NAME} in secret/{self.PASSWORD_SECRET_NAME} ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary? normally we have the progress visible when running the test, i.e. which test step is currently executing. There is nothing more than that so I think it's redundant
| }, | ||
| ) | ||
|
|
||
| replica_set.load() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you use the pattern with function scope and try_load, you don't need to load it manually in the tests avoiding flakiness errors due to stale object
|
|
||
| replica_set.load() | ||
| replica_set["spec"]["opsManager"]["configMapRef"]["name"] = new_project_configmap | ||
| replica_set.set_version(custom_mdb_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the version should be only set in the resource's fixture unless the changing version is part of the test.
|
|
||
| def test_moved_replica_set_connectivity(self): | ||
| """ | ||
| Verify connectivity to the replica set after switching projects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment is redundant, just name the function name so it's self describing (it's already good)
|
|
||
| def test_ops_manager_state_correctly_updated_in_moved_replica_set(self, replica_set: MongoDB): | ||
| """ | ||
| Ensure Ops Manager state is correctly updated in the moved replica set after the project switch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment also is not adding much over the already good function name
| MDB_RESOURCE_NAME = "replica-set-scram-sha-1-switch-project" | ||
| MDB_FIXTURE_NAME = MDB_RESOURCE_NAME | ||
|
|
||
| CONFIG_MAP_KEYS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary? Those keys won't be ever changed so we can just inline them.
|
|
||
|
|
||
| @pytest.mark.e2e_replica_set_x509_switch_project | ||
| class TestReplicaSetCreationAndProjectSwitch(KubernetesTester): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the three or four test classes here any different? Do you think we could extract it as a reusable test class parametrized with resource and auth mechanism?
The way to do this is to have a generic test helper (important: without the pytest.mark annotation!)
class ReplicaSetCreationAndProjectSwitchTestHelper:
def test_create_replica_set
def test_ops_manager_state_correctly_updated_in_initial_replica_set
[...]
And in the actual test files we could have only test functions that simply delegate to the common code:
@pytest.fixture
def test_helper() -> ReplicaSetCreationAndProjectSwitchTestHelper:
... configure and return test helper
@pytest.mark.e2e_replica_set_x509_switch_project
def test_create_replica_set(test_helper: ReplicaSetCreationAndProjectSwitchTestHelper):
test_helper.test_create_replica_set()
... etc.
I'm a bit worried that we've added just too much of duplicated code. We've already have a code duplication problem - let's try to not exacerbate it further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example of using test helper to reduce duplication:
mongodb-kubernetes/docker/mongodb-kubernetes-tests/tests/search/search_enterprise_tls.py
Line 216 in 78e9a13
| def test_search_restore_sample_database(mdb: MongoDB): |
unfortunately for now we cannot reuse easily whole test classes with the testing steps. The test functions/classes must be defined in each file separately, but we can organize the code to minimize the duplication
| # def test_create_secret(self): | ||
| # print(f"creating password for MongoDBUser {self.USER_NAME} in secret/{self.PASSWORD_SECRET_NAME} ") | ||
|
|
||
| # create_or_update_secret( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either remove or uncomment commented code; if necessary to leave it - explain why is commented
lsierant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome you've added so many e2e tests! But let's try to think how we could minimize the code duplication in there. It looks like all the tests are almost identical.
Summary
If a deployment is moved to a different project, the automation agent password will be re-generated, triggering a password change in the automation plan.
This will cause a deadlock in a sharded cluster due to the multiple components requiring automation. However, it will not cause issues in replicasets.
This is a blocker for migrating projects in sharded deployments.
Proof of Work
For SCRAM (the only auth mechanism who re-generates a pwd), we now save the automation agent's password in a secret. During migration, the stored secret is utilized to preserve the password ,ensuring project migration possible.
Observed problems
For LDAP (Sharded + Replica) and SCRAM (Sharded), the following tests are failing, even though the only modification made is updating the MongoDB resource's project reference, with no other changes applied. To help further investigation, I have commented out certain code in the tests (which can make them fail) so the issue can be consistently reproduced.
I’ve observed that when users are previously defined and the project is changed, there is a risk that the deployment does not automatically return to the "running" state. This appears to occur because one or more pods fail to receive the updated automation configuration.
Furthermore, for LDAP deployments, while the deployment might return to the "running" state, the users are missing from the automation configuration.
Checklist
skip-changeloglabel if not needed