refactor: initializers module #91
base: refactor/rework-arch
Changes from 2 commits
b7d2a82 · 17bb5eb · f5c6cc2 · 32a2535 · 64cc60a
@@ -16,6 +16,43 @@
from rework_pysatl_mpest.optimizers import Optimizer, ScipyNelderMead


def _validate_clusters_distributions(
Review comment: Missed docstrings
    H: np.ndarray, models_count: int, estimation_strategies_count: int, min_samples: int
) -> tuple[list[int], list[float]]:
    if not np.allclose(np.sum(H, axis=1), 1, atol=1e-10):
        raise ValueError("Sum of H matrix weights must be equal to 1")

    n_clusters = H.shape[1]

    if estimation_strategies_count != models_count:
        raise ValueError("Number of estimation functions must match number of models")

    cluster_weights: list[float] = np.sum(H, axis=0)

    valid_clusters = [k for k in range(n_clusters) if cluster_weights[k] >= min_samples]
    if len(valid_clusters) != models_count:
        return [], []
    return valid_clusters, cluster_weights
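The "Missed docstrings" note above could be addressed with something along these lines; this is only a sketch, with wording inferred from the code rather than taken from the author:

```python
import numpy as np


def _validate_clusters_distributions(
    H: np.ndarray, models_count: int, estimation_strategies_count: int, min_samples: int
) -> tuple[list[int], list[float]]:
    """Validate the responsibility matrix H against the requested mixture setup.

    Checks that every row of H sums to 1 and that the number of estimation
    strategies matches the number of models. A cluster is considered valid
    when its total weight (the column sum of H) is at least ``min_samples``.

    Returns ``(valid_clusters, cluster_weights)``; both are empty when the
    number of valid clusters does not match ``models_count``.
    """
```

The diff then continues with the _calculate_cluster_fit helper: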
def _calculate_cluster_fit(
    temp_model: ContinuousDistribution,
    estimation_func: Callable,
    X: np.ndarray,
    H_k: np.ndarray,
    optimizer: Optimizer,
) -> tuple[dict[str, float], float]:
    new_params = estimation_func(temp_model, X, H_k, optimizer)
Suggested change (marked outdated):
-    new_params = estimation_func(temp_model, X, H_k, optimizer)
+    param_names, param_values = estimation_func(temp_model, X, H_k, optimizer).items()
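If the intent of the suggestion is to split the returned parameter dictionary into names and values, that normally requires unpacking the items rather than assigning .items() directly; a sketch, assuming estimation_func returns the dict[str, float] implied by the return annotation above:

```python
new_params = estimation_func(temp_model, X, H_k, optimizer)
param_names, param_values = zip(*new_params.items())  # tuple of names, tuple of values
```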
Suggested change (marked outdated):
-    weighted_log_likelihood = np.sum(H_k * log_probs)
+    weighted_log_likelihood = np.dot(H_k, log_probs)
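For one-dimensional arrays the two expressions compute the same weighted sum; np.dot simply avoids materialising the intermediate element-wise product. A minimal check (the example values are made up):

```python
import numpy as np

H_k = np.array([0.8, 0.7, 0.9])
log_probs = np.array([-1.2, -0.5, -2.0])

assert np.isclose(np.sum(H_k * log_probs), np.dot(H_k, log_probs))
```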
Review comment: Why is there an additional division here?
Review comment: Refactor the constructor and the […]

Review comment: It is necessary to add a check that the clusterer has the required methods in the constructor and in […]
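The requested guard could be a simple capability check; this is only a sketch, and the method names the initializer actually needs are an assumption, since the comment text is cut off:

```python
def _check_clusterizer(clusterizer, required: tuple[str, ...] = ("fit_predict",)) -> None:
    """Raise early if the clusterer lacks the methods the initializer will call."""
    missing = [name for name in required if not callable(getattr(clusterizer, name, None))]
    if missing:
        raise TypeError(f"Clusterizer is missing required methods: {missing}")
```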
@@ -320,6 +320,8 @@ def perform(
        5. Returns the initialized mixture model
        """
        X = np.asarray(X, dtype=np.float64)
        if X.ndim == 1:
Review comment: Move this inside […]
            X = X.reshape(-1, 1)
        self.models = dists
        self.n_components = len(dists)
        H = self._clusterize(X, self.clusterizer)
Review comment: The clusterizer is available from the internal state of the object; why is it in the signature of the method?
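A sketch of the change this comment points at, dropping the parameter and reading the clusterer from the instance; the method is shown outside its class and its body is elided, because only the signature is at issue:

```python
import numpy as np

# before:  H = self._clusterize(X, self.clusterizer)
# after:   H = self._clusterize(X)

def _clusterize(self, X: np.ndarray) -> np.ndarray:
    clusterizer = self.clusterizer  # read from internal state instead of the signature
    ...  # existing clustering logic unchanged
```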
@@ -183,7 +183,7 @@ def test_perform_accurate_init_normal_path(self):
    def test_perform_accurate_init_fallback_to_fast_init(self):
        initializer = ClusterizeInitializer(is_accurate=True, is_soft=True, clusterizer=self.mock_clusterizer)

-        X = np.array([1.0, 2.0, 3.0])
+        X = np.array([[1.0], [2.0], [3.0]])

        H = np.array([[0.8, 0.2], [0.7, 0.3], [0.9, 0.1]])
        dists = [self.mock_distributions[0], self.mock_distributions[1]]
Review comment: In examples we use imports from modules, not from files inside the module. That's why there are __init__.py files.
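A sketch of the kind of re-export this comment refers to; the submodule and symbol names below are assumptions based on identifiers appearing elsewhere in this diff:

```python
# rework_pysatl_mpest/initializers/__init__.py  (hypothetical layout)
from .clusterize import ClusterizeInitializer  # submodule name is an assumption

__all__ = ["ClusterizeInitializer"]
```

Examples can then import from the package, e.g. `from rework_pysatl_mpest.initializers import ClusterizeInitializer`, rather than reaching into a specific file.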