Features/search #143
Conversation
Pull request overview
This PR refactors the AutoML search strategy configuration and scoring system to improve flexibility and maintainability. The changes introduce YAML-based configuration management with fallback support, enable dynamic metric handling with both macro and weighted averaging methods, and translate comments to Vietnamese for better clarity.
Key Changes:
- Implemented centralized YAML configuration loading with a two-tier fallback system (primary config → default config) across all search strategies
- Modified scoring engine to generate both macro and weighted variants for precision, recall, and f1 metrics, providing comprehensive evaluation for imbalanced datasets
- Extracted and centralized numpy type conversion and cache cleanup logic into reusable helper methods
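The two-tier fallback described above could look roughly like the following sketch. The function name and config file names are assumed from the file list below; the PR's actual `_load_yaml_config()` implementation is not shown in this excerpt.

```python
from pathlib import Path

import yaml  # PyYAML


def load_yaml_config(primary_path, fallback_path):
    """Two-tier config loading: try the primary file first, then the default.

    Hypothetical standalone version of the strategy-level helper; the real
    method presumably lives on the base search-strategy class.
    """
    for path in (primary_path, fallback_path):
        p = Path(path)
        if p.is_file():
            with p.open("r", encoding="utf-8") as fh:
                data = yaml.safe_load(fh)
            if data:  # skip empty or comment-only files
                return data
    raise FileNotFoundError(
        f"Neither {primary_path} nor {fallback_path} could be loaded"
    )
```

With this shape, each strategy can call the helper with its own pair of files, e.g. `grid_search_config.yml` and `grid_search_default_config.yml`.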
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| base.py | Added `_load_yaml_config()` method for centralized config loading with fallback; refactored `get_default_config()` to read from YAML files |
| base_config.yml | Primary base configuration file for all search strategies with CV, scoring, logging, and parallel processing settings |
| base_default_config.yml | Fallback base configuration with identical structure to base_config.yml |
| grid_search.py | Refactored to use centralized YAML config loading; enhanced `_get_params_hash()` to include model base parameters for better cache uniqueness |
| grid_search_config.yml | Grid-search-specific configuration for dispatch, training, parallel evaluation, and caching |
| grid_search_default_config.yml | Fallback grid search configuration with identical structure |
| genetic_algorithm.py | Replaced custom config loading with the base class method; moved numpy type conversion to a centralized helper; extracted generation creation logic into a `_create_next_generation()` method; simplified metric handling to work with the pre-configured scoring dict from engine.py |
| genetic_algorithm_config.yml | GA configuration covering population, generation, genetic operators, performance optimization, and adaptive features |
| genetic_algorithm_default_config.yml | Fallback GA configuration with identical structure |
| bayesian_search.py | Refactored to use centralized YAML config loading; simplified metric handling to accept pre-configured scoring from engine.py; uses `_finalize_results()` for consistent cleanup |
| bayesian_search_config.yml | Bayesian search configuration for optimization parameters, acquisition function, and early stopping |
| bayesian_search_default_config.yml | Fallback Bayesian search configuration |
| engine.py | Modified to create both macro and weighted scorers for precision, recall, and f1 metrics; updated score extraction logic to retrieve both averaging variants; translated comments to Vietnamese |
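Based on the engine.py description, the macro/weighted expansion could be sketched with scikit-learn as follows. The function and dict names here are hypothetical; the PR's actual implementation is not shown in this review.

```python
from sklearn.metrics import f1_score, make_scorer, precision_score, recall_score

# Metrics that need an averaging strategy on multiclass/imbalanced data
AVERAGED_METRICS = {
    "precision": precision_score,
    "recall": recall_score,
    "f1": f1_score,
}


def build_scoring(metric_names):
    """Expand base metric names into macro and weighted scorer variants.

    'precision' becomes 'precision_macro' and 'precision_weighted';
    other metrics (e.g. 'accuracy') resolve by their sklearn string name.
    """
    scoring = {}
    for name in metric_names:
        if name in AVERAGED_METRICS:
            for avg in ("macro", "weighted"):
                scoring[f"{name}_{avg}"] = make_scorer(
                    AVERAGED_METRICS[name], average=avg, zero_division=0
                )
        else:
            scoring[name] = name
    return scoring
```

A scoring dict built this way is what produces the `rank_test_precision_macro`-style keys in `cv_results` that the review comments below are concerned with.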
```python
# Temporarily use the adaptive crossover rate
original_rate = self.config['crossover_rate']
self.config['crossover_rate'] = adaptive_crossover_rate
child1, child2 = self._crossover(parent1, parent2)
self.config['crossover_rate'] = original_rate
```
**Copilot AI** · Nov 30, 2025
[nitpick] The code temporarily modifies self.config['crossover_rate'] at lines 388-391 to use an adaptive rate, which is a side effect that could cause issues if the code is interrupted or if other code reads this config value during execution. Consider passing the crossover rate as a parameter to _crossover instead of modifying the shared config state.
For example:

```python
child1, child2 = self._crossover(parent1, parent2, crossover_rate=adaptive_crossover_rate)
```

This would make the code more maintainable and avoid potential race conditions in concurrent scenarios.
```python
rank_key = f'rank_test_{metric_sort}'
if rank_key in cv_results:
    rank_array = np.array(cv_results[rank_key])
else:
```
**Copilot AI** · Nov 30, 2025
The metric_sort parameter is used to look up rank_test_{metric_sort} in cv_results at line 185. However, when metrics like 'precision', 'recall', or 'f1' are processed, they are split into '_macro' and '_weighted' variants (lines 129-137). This means if metric_sort is 'precision', the lookup for 'rank_test_precision' will fail since only 'rank_test_precision_macro' and 'rank_test_precision_weighted' exist in cv_results.
Consider either:
- Transforming metric_sort to include the suffix (e.g., 'precision' -> 'precision_macro') before passing to the search strategy, or
- Updating the lookup logic to handle both base names and suffixed names with a fallback mechanism.
Suggested change:

```python
# Handle suffixed metric names for precision, recall, f1
rank_array = None
if metric_sort in ['precision', 'recall', 'f1']:
    # Prefer macro, fall back to weighted
    for avg_type in ['macro', 'weighted']:
        rank_key = f'rank_test_{metric_sort}_{avg_type}'
        if rank_key in cv_results:
            rank_array = np.array(cv_results[rank_key])
            break
else:
    rank_key = f'rank_test_{metric_sort}'
    if rank_key in cv_results:
        rank_array = np.array(cv_results[rank_key])
if rank_array is None:
```
```python
# Set the primary metric to best_score
primary_metric = self.config.get('metric_sort', 'accuracy')
if primary_metric in best_all_scores:
    best_all_scores[primary_metric] = best_score
```
**Copilot AI** · Nov 30, 2025
In the fallback case where best_all_scores is created from scratch (line 751), the code attempts to set the primary metric score at line 754: if primary_metric in best_all_scores. However, if metric_sort is a base metric name like 'precision' but the scoring config creates suffixed metrics like 'precision_macro' and 'precision_weighted', this condition will always be False. The best_score will not be assigned to the correct metric.
Consider checking for the presence of the primary metric with suffixes (e.g., if primary_metric is 'precision', check for 'precision_macro' or 'precision_weighted') or ensuring metric_sort is transformed to match the actual scoring keys.
Suggested change:

```python
if primary_metric in best_all_scores:
    best_all_scores[primary_metric] = best_score
else:
    # Find a key that starts with primary_metric (e.g. 'precision' -> 'precision_macro')
    matching_keys = [k for k in best_all_scores if k.startswith(primary_metric)]
    if matching_keys:
        best_all_scores[matching_keys[0]] = best_score
```
```python
# Determine the primary metric for fitness evaluation
scoring_config = self.config.get('scoring', 'f1')
if isinstance(scoring_config, dict):
    primary_metric = self.config.get('metric_sort', 'accuracy')
elif isinstance(scoring_config, str):
    primary_metric = scoring_config.replace('_macro', '').replace('_weighted', '')
else:
    primary_metric = 'f1'

primary_metric = self.config.get('metric_sort', 'accuracy')

# Extract fitness scores and track the best individual
fitness_scores = np.zeros(len(population))  # Use a numpy array for better performance
generation_best_score = float('-inf')
generation_improved = False

for idx, (individual, scores) in enumerate(zip(population, all_individual_scores)):
    score = scores.get(primary_metric, 0.0)
    fitness_scores[idx] = score
```
**Copilot AI** · Nov 30, 2025
The primary_metric retrieved from config may not match the actual metric keys in the scores dictionary. For example, if metric_sort is 'precision', but engine.py creates scoring with keys 'precision_macro' and 'precision_weighted', the lookup scores.get(primary_metric, 0.0) will return the default value 0.0 instead of the actual score. This would cause incorrect fitness evaluation and optimization.
Consider either:
- Ensuring metric_sort is transformed to include the suffix before being passed to the search strategy, or
- Implementing a fallback lookup that tries the base name first, then checks for '_macro' and '_weighted' suffixes.
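One way to implement the suggested fallback lookup is a small helper like the sketch below. The helper name is invented here for illustration; it is not part of the PR.

```python
def resolve_metric_key(primary_metric, available_keys,
                       preferred_suffixes=("macro", "weighted")):
    """Map a base metric name ('precision') onto the key actually present
    in the scoring results ('precision_macro'), preferring macro over
    weighted. Returns None when no variant of the metric is available."""
    if primary_metric in available_keys:
        return primary_metric
    for suffix in preferred_suffixes:
        candidate = f"{primary_metric}_{suffix}"
        if candidate in available_keys:
            return candidate
    return None
```

With such a helper, the fitness loop could do `key = resolve_metric_key(primary_metric, scores)` once and then `scores.get(key, 0.0)`, so GA, Bayesian, and grid search all resolve suffixed metrics consistently.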
```python
            objective.last_metrics[key] = float(np.mean(cv_results[test_key]))

# Choose the score to optimize
score = objective.last_metrics.get(primary_metric, objective.last_metrics.get('accuracy', 0.0))
```
**Copilot AI** · Nov 30, 2025
Similar to the issue in genetic_algorithm.py and engine.py, the primary_metric from metric_sort config may not match the actual keys in the scoring results. If metric_sort is 'precision' but the scoring dict has 'precision_macro' and 'precision_weighted', the lookup at line 204 objective.last_metrics.get(primary_metric, ...) will return the default value instead of the actual metric score. This would cause incorrect optimization.
Consider implementing a fallback mechanism that handles both base metric names and their suffixed variants, or ensuring metric_sort is transformed before being passed to the search strategy.
Suggested change:

```python
score = objective.last_metrics.get(primary_metric)
if score is None:
    # Fallback: try to find a key that starts with the primary_metric
    for k in objective.last_metrics:
        if k == primary_metric or k.startswith(primary_metric + "_"):
            score = objective.last_metrics[k]
            break
if score is None:
    score = objective.last_metrics.get('accuracy', 0.0)
```
**optivisionlab** left a comment

OK

Refactor of config and metrics:
- Config: apply a YAML-based configuration loading mechanism for the search strategies (with fallback support).
- Scoring: modify the engine to handle custom metrics dynamically and support multiple averaging methods.
- Code quality: translate comments into Vietnamese for readability, standardize the utility functions, and add a logger.