refactor: initializers module #91
base: refactor/rework-arch
Conversation
…ter_fit and _validate_clusters_distributions
In the examples we import from modules, not from files inside a module. That's why there are `__init__.py` files.
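A minimal sketch of the re-export pattern this comment refers to, built in a temporary directory so it runs standalone (the file name `nelder_mead.py` is illustrative, not the PR's actual layout):

```python
import os
import sys
import tempfile

# Build a tiny package on the fly: one implementation file plus an
# __init__.py that re-exports its public name.
pkg_root = tempfile.mkdtemp()
os.makedirs(os.path.join(pkg_root, "optimizers"))
with open(os.path.join(pkg_root, "optimizers", "nelder_mead.py"), "w") as f:
    f.write("class ScipyNelderMead:\n    pass\n")
with open(os.path.join(pkg_root, "optimizers", "__init__.py"), "w") as f:
    f.write("from .nelder_mead import ScipyNelderMead\n")

sys.path.insert(0, pkg_root)
# Users import from the package, not from the file that defines the class:
from optimizers import ScipyNelderMead

print(ScipyNelderMead.__name__)  # ScipyNelderMead
```

This is what makes `from rework_pysatl_mpest.optimizers import Optimizer, ScipyNelderMead` work without callers knowing which file defines each class.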
```python
from rework_pysatl_mpest.optimizers import Optimizer, ScipyNelderMead

def _validate_clusters_distributions(
```
Missing docstrings.
```python
return valid_clusters, cluster_weights

def _calculate_cluster_fit(
```
Missing docstrings.
```python
H_k: np.ndarray,
optimizer: Optimizer,
) -> tuple[dict[str, float], float]:
new_params = estimation_func(temp_model, X, H_k, optimizer)
```
```diff
- new_params = estimation_func(temp_model, X, H_k, optimizer)
+ param_names, param_values = zip(*estimation_func(temp_model, X, H_k, optimizer).items())
```
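For context, a runnable sketch of the dict-unpacking involved (the parameter dict contents are made up): unpacking `dict.items()` directly yields `(key, value)` pairs, while `zip(*items)` transposes them into a tuple of names and a tuple of values.

```python
# Hypothetical parameters as estimation_func might return them
new_params = {"mu": 0.0, "sigma": 1.0}

# Direct unpacking of .items() gives (key, value) pairs, not names/values,
# and only works when the dict has exactly two entries:
first, second = new_params.items()
assert first == ("mu", 0.0) and second == ("sigma", 1.0)

# zip(*items) transposes the pairs into a names tuple and a values tuple:
param_names, param_values = zip(*new_params.items())
assert param_names == ("mu", "sigma")
assert param_values == (0.0, 1.0)
```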
```python
X = X.reshape(-1, 1)
self.models = dists
self.n_components = len(dists)
H = self._clusterize(X, self.clusterizer)
```
The clusterizer is available from the object's internal state, so why is it also in the signature of `self._clusterize`?
```python
temp_model.set_params_from_vector(param_names, param_values)

log_probs = np.clip(temp_model.lpdf(X), -1e9, -1e-9)
weighted_log_likelihood = np.sum(H_k * log_probs)
```
```diff
- weighted_log_likelihood = np.sum(H_k * log_probs)
+ weighted_log_likelihood = np.dot(H_k, log_probs)
```
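The two forms are numerically equivalent for 1-D arrays; `np.dot` just skips materializing the intermediate elementwise product. A quick check (random data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
H_k = rng.random(5)                                   # cluster responsibilities
log_probs = np.clip(rng.normal(size=5), -1e9, -1e-9)  # clipped log-densities

# Elementwise product + sum vs. dot product: same value for 1-D inputs
a = np.sum(H_k * log_probs)
b = np.dot(H_k, log_probs)
assert np.isclose(a, b)
```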
```python
5. Returns the initialized mixture model
"""
X = np.asarray(X, dtype=np.float64)
if X.ndim == 1:
```
Move this inside `self._clusterize(...)`. Otherwise your two-dimensional array X will be passed to the rest of the class's internal methods, which don't seem suited for it.
```diff
  initializer = ClusterizeInitializer(is_accurate=True, is_soft=True, clusterizer=self.mock_clusterizer)

- X = np.array([1.0, 2.0, 3.0])
+ X = np.array([[1.0], [2.0], [3.0]])
```
For what? This test is not meant to check various X formats. Besides, why this particular change? It seems to me that tests should be written in the format the user will work with, assuming a one-dimensional array is passed.
```python
)

effective_n = cluster_weights[k]
score = weighted_log_likelihood / effective_n
```
Why is there an additional division here?
Refactor the constructor and the `perform` signature: the user should pass the default optimizer and clusterizer to the constructor, and pass them to `perform` only when they want to use different ones.
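The proposed pattern, sketched with illustrative names (not the PR's actual signatures): defaults live on the instance, and `perform` accepts per-call overrides.

```python
class ClusterizeInitializerSketch:
    def __init__(self, optimizer, clusterizer):
        self.optimizer = optimizer        # defaults chosen once, at construction
        self.clusterizer = clusterizer

    def perform(self, X, optimizer=None, clusterizer=None):
        # Per-call arguments override the constructor defaults
        optimizer = optimizer if optimizer is not None else self.optimizer
        clusterizer = clusterizer if clusterizer is not None else self.clusterizer
        return optimizer, clusterizer     # stand-in for the real initialization

init = ClusterizeInitializerSketch("default-opt", "default-clu")
assert init.perform(X=None) == ("default-opt", "default-clu")
assert init.perform(X=None, optimizer="nelder-mead") == ("nelder-mead", "default-clu")
```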
Since the object's type is `Any`, it is necessary to add a check, both in the constructor and in `perform`, that the clusterizer has the required methods.
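A duck-typing check along these lines would do (the required method names `fit`/`predict` are assumptions here; use whatever interface the project actually relies on):

```python
def check_clusterizer(clusterizer):
    # Fail fast when an Any-typed object lacks the clusterizer interface
    for name in ("fit", "predict"):
        if not callable(getattr(clusterizer, name, None)):
            raise TypeError(f"clusterizer must implement {name}()")

class GoodClusterizer:
    def fit(self, X): ...
    def predict(self, X): ...

check_clusterizer(GoodClusterizer())  # passes silently
try:
    check_clusterizer(object())       # lacks fit/predict
except TypeError as e:
    print(e)
```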
…eparated into greedy, hungarian, permutation methods, also for scoring functions aic and likelihood
feat: Enums for matching methods, scoring methods
tests: tests that include cluster_match_strategy commented/deleted
```diff
  X: ArrayLike,
  dists: list[ContinuousDistribution],
- cluster_match_info: ClusterMatchStrategy,
+ cluster_match_info: MatchingMethod,
```
You forgot to update the type hint in the docstring.
```python
for model, est_func in zip(models, estimation_strategies):
    row: list[FitResult] = []
    for k in valid_clusters:
        cache_key = (model.__class__, est_func, k)
```
What happens if `models` contains multiple instances of the same class? It seems they would share the same cache key.
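A quick demonstration of the collision (toy class, invented here): keying on `model.__class__` makes two differently-parameterized instances indistinguishable, while keying on the instance's identity does not.

```python
class Gaussian:  # toy stand-in for a distribution model
    def __init__(self, mu):
        self.mu = mu

m1, m2 = Gaussian(0.0), Gaussian(5.0)
est_func, k = "mle", 0

# Class-based keys collide across distinct instances:
assert (m1.__class__, est_func, k) == (m2.__class__, est_func, k)

# Identity-based keys (e.g. id(model)) keep them apart:
assert (id(m1), est_func, k) != (id(m2), est_func, k)
```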
```python
score_func: ScoringMethod,
estimation_strategies: list[EstimationStrategy],
optimizer: Optimizer = ScipyNelderMead(),
) -> MixtureModel:
```
The argument names here do not match the abstract base class definition in Initializer
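Why the mismatch matters: Python doesn't enforce parameter names on overrides, but callers passing keywords against the base-class signature will break. A minimal illustration (all names invented):

```python
from abc import ABC, abstractmethod

class InitializerBase(ABC):
    @abstractmethod
    def perform(self, X, dists):
        ...

class Renamed(InitializerBase):
    # Parameter renamed from 'dists' to 'models' -- legal, but fragile
    def perform(self, X, models):
        return models

# Positional calls still work:
assert Renamed().perform(None, [1, 2]) == [1, 2]

# Keyword calls written against the base-class signature raise TypeError:
try:
    Renamed().perform(X=None, dists=[1, 2])
except TypeError as e:
    print("keyword call broke:", e)
```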
```python
model_weights: list[float] = []
used_clusters = set()

for model, estimation_func in zip(context["models"], context["estimation_strategies"]):
```
The greedy strategy currently iterates sequentially. Was this order-dependency intended?
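A toy illustration of the order dependence (the scores are made up): each model greedily takes its best remaining cluster, so iteration order changes the assignment.

```python
# Made-up fit scores: scores[(model, cluster)]
scores = {("A", 0): 10, ("A", 1): 9, ("B", 0): 9, ("B", 1): 1}

def greedy_assign(order, clusters=(0, 1)):
    used, picks = set(), {}
    for model in order:
        # Each model takes its best cluster among those still unused
        best = max((c for c in clusters if c not in used),
                   key=lambda c: scores[(model, c)])
        used.add(best)
        picks[model] = best
    return picks

print(greedy_assign(["A", "B"]))  # {'A': 0, 'B': 1} -> total score 10 + 1 = 11
print(greedy_assign(["B", "A"]))  # {'B': 0, 'A': 1} -> total score 9 + 9 = 18
```

Processing B first happens to yield a better total here, which is why order-independent matching (e.g. the hungarian method mentioned in the commit message) can differ from the greedy result.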