
Conversation


@VITYANA VITYANA commented Nov 3, 2025

No description provided.

@VITYANA VITYANA requested a review from iraedeus November 3, 2025 18:32
@VITYANA VITYANA self-assigned this Nov 3, 2025
@VITYANA VITYANA added the enhancement New feature or request label Nov 3, 2025
@VITYANA VITYANA changed the title from "Refactor: initializers refactor" to "refactor: initializers module" Nov 3, 2025

In the examples we import from modules, not from files inside a module. That's why there are __init__.py files:

from rework_pysatl_mpest.optimizers import Optimizer, ScipyNelderMead
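For reference, a minimal sketch of the re-exports that make this module-level import work; the submodule file names here are assumptions, only the pattern matters:

# rework_pysatl_mpest/optimizers/__init__.py (hypothetical file layout)
from .abstract_optimizer import Optimizer
from .scipy_nelder_mead import ScipyNelderMead

__all__ = ["Optimizer", "ScipyNelderMead"]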


def _validate_clusters_distributions(

Missing docstrings.

return valid_clusters, cluster_weights


def _calculate_cluster_fit(

Missing docstrings.

H_k: np.ndarray,
optimizer: Optimizer,
) -> tuple[dict[str, float], float]:
new_params = estimation_func(temp_model, X, H_k, optimizer)

Suggested change
- new_params = estimation_func(temp_model, X, H_k, optimizer)
+ param_names, param_values = zip(*estimation_func(temp_model, X, H_k, optimizer).items())
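(A bare .items() unpacks to (key, value) pairs, so zip(*...) is needed to split the names from the values before the later set_params_from_vector(param_names, param_values) call.)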

X = X.reshape(-1, 1)
self.models = dists
self.n_components = len(dists)
H = self._clusterize(X, self.clusterizer)

The clusterizer is already available from the object's internal state, so why is it also passed in the signature of self._clusterize?
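A minimal sketch of that cleanup, assuming a sklearn-style fit_predict method on the clusterizer:

def _clusterize(self, X: np.ndarray) -> np.ndarray:
    # Read the clusterizer from the instance instead of the argument list.
    return self.clusterizer.fit_predict(X)  # fit_predict is an assumption

# The call site then becomes:
H = self._clusterize(X)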

temp_model.set_params_from_vector(param_names, param_values)

log_probs = np.clip(temp_model.lpdf(X), -1e9, -1e-9)
weighted_log_likelihood = np.sum(H_k * log_probs)

Suggested change
- weighted_log_likelihood = np.sum(H_k * log_probs)
+ weighted_log_likelihood = np.dot(H_k, log_probs)
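For 1-D arrays the two are equivalent; np.dot just computes the same weighted sum without materializing the intermediate H_k * log_probs array.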

5. Returns the initialized mixture model
"""
X = np.asarray(X, dtype=np.float64)
if X.ndim == 1:

Move this inside self._clusterize(...).
Otherwise the two-dimensional array X will be passed to the rest of the class's internal methods, which do not seem suited for it.
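A sketch of the suggested move, again assuming a fit_predict-style clusterizer; the reshape stays local to clustering while every other method keeps receiving the original 1-D sample:

def _clusterize(self, X: np.ndarray) -> np.ndarray:
    # Clustering needs (n_samples, n_features); reshape only here.
    X_2d = X.reshape(-1, 1) if X.ndim == 1 else X
    return self.clusterizer.fit_predict(X_2d)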

initializer = ClusterizeInitializer(is_accurate=True, is_soft=True, clusterizer=self.mock_clusterizer)

X = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0], [2.0], [3.0]])

What is this for? This test is not intended to check various X formats.

Besides, why this particular test? Tests should be written in the format the user will actually work with, i.e. assuming a one-dimensional array is passed.
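Something along these lines instead, exercising only the documented 1-D path (the fixture names are stand-ins):

X = np.array([1.0, 2.0, 3.0])
result = initializer.perform(X, dists)  # dists: the test's list of distributions
assert isinstance(result, MixtureModel)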

)

effective_n = cluster_weights[k]
score = weighted_log_likelihood / effective_n

Why is there an additional division here?


Refactor the constructor and the perform signature:
The user should pass the default optimizer and clusterizer to the constructor, and pass them to perform only when they want to override them.
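One possible shape of that refactor; the parameter names and fallback logic are illustrative, not prescriptive:

class ClusterizeInitializer(Initializer):
    def __init__(self, clusterizer: Any, optimizer: Optimizer = ScipyNelderMead()):
        # Defaults live on the instance ...
        self.clusterizer = clusterizer
        self.optimizer = optimizer

    def perform(self, X, dists, clusterizer=None, optimizer=None):
        # ... and perform falls back to them unless overrides are passed.
        clusterizer = clusterizer if clusterizer is not None else self.clusterizer
        optimizer = optimizer if optimizer is not None else self.optimizer
        ...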


Since the clusterizer's declared type is Any, it is necessary to check, both in the constructor and in perform, that it actually provides the required methods.
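A sketch of such a duck-typing guard; the required method name is an assumption and should match the real clusterizer protocol:

def _check_clusterizer(clusterizer: Any) -> None:
    for name in ("fit_predict",):  # assumed protocol
        if not callable(getattr(clusterizer, name, None)):
            raise TypeError(f"clusterizer must implement {name}()")

Calling it from both __init__ and perform keeps the failure early and explicit.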

…eparated into

greedy, hungarian, permutation methods, also for scoring functions aic and likelihood
feat: Enums for matching methods, scoring methods
tests: tests that include cluster_match_strategy commented/deleted
X: ArrayLike,
dists: list[ContinuousDistribution],
cluster_match_info: ClusterMatchStrategy,
cluster_match_info: MatchingMethod,

Forgot to update the type hint in the docs.

for model, est_func in zip(models, estimation_strategies):
row: list[FitResult] = []
for k in valid_clusters:
cache_key = (model.__class__, est_func, k)

What happens if models contains multiple instances of the same class? It seems they would share the same cache key.
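One way to avoid the collision is to key on instance identity instead of the class, e.g.:

# id(model) distinguishes two instances of the same class;
# an index from enumerate(models) would work just as well.
cache_key = (id(model), est_func, k)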

score_func: ScoringMethod,
estimation_strategies: list[EstimationStrategy],
optimizer: Optimizer = ScipyNelderMead(),
) -> MixtureModel:

The argument names here do not match the abstract base class definition in Initializer.

model_weights: list[float] = []
used_clusters = set()

for model, estimation_func in zip(context["models"], context["estimation_strategies"]):

The greedy strategy currently iterates sequentially. Was this order-dependency intended?
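If it is not, a globally ranked assignment would remove the order dependence. A sketch, where scores[i][k] stands for the fit score of model i on cluster k:

# Rank every (model, cluster) pair by score and assign best-first.
ranked = sorted(
    ((scores[i][k], i, k) for i in range(len(models)) for k in valid_clusters),
    key=lambda t: t[0],
    reverse=True,
)
assigned: dict[int, int] = {}
for score, i, k in ranked:
    if i not in assigned and k not in used_clusters:
        assigned[i] = k
        used_clusters.add(k)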

