
Commit e84df28

Copilot and jgbradley1 authored
Improve internal logging functionality by using Python's standard logging module (#1956)
* Initial plan for issue
* Implement standard logging module and integrate with existing loggers
* Add test cases and improve documentation for standard logging
* Apply ruff formatting and add semversioner file for logging improvements
* Remove custom logger classes and refactor to use standard logging only
* Apply ruff formatting to resolve CI/CD test failures
* Add semversioner file and fix linting issues
* ruff fixes
* fix spelling error
* Remove StandardProgressLogger and refactor to use standard logging
* Remove LoggerFactory and custom loggers, refactor to use standard logging
* Fix pyright error: use logger.info() instead of calling logger as function in cosmosdb_pipeline_storage.py
* ruff fixes
* Remove deprecated logger files that were marked as deprecated placeholders
* Replace custom get_logger with standard Python logging
* Fix linting issues found by ruff check --fix
* apply ruff check fixes
* add word to dictionary
* Fix type checker error in ModelManager.__new__ method
* Refactor multiple logging.getLogger() calls to use single logger per file
* Remove progress_logger parameter from build_index() and logger parameter from generate_indexing_prompts()
* Remove logger parameter from run_pipeline and standardize logger naming
* Replace logger parameter with log_level parameter in CLI commands
* Fix import ordering in notebook files to pass poetry poe check
* Remove --logger parameter from smoke test command
* Fix Windows CI/CD issue with log file cleanup in tests
* Add StreamHandler to root logger in __main__.py for CLI logging
* Only add StreamHandler if root logger doesn't have existing StreamHandler
* Fix import ordering in notebook files to pass ruff checks
* Replace logging.StreamHandler with colorlog.StreamHandler for colorized log output
* Regenerate poetry.lock file after adding colorlog dependency
* Fix import ordering in notebook files to pass ruff checks
* move printing of dataframes to debug level
* remove colorlog for now
* Refactor workflow callbacks to inherit from logging.Handler
* Fix linting issues in workflow callback handlers
* Fix pyright type errors in blob and file workflow callbacks
* Refactor pipeline logging to use pure logging.Handler subclasses
* Rename workflow callback classes to workflow logger classes and move to logger directory
* update dictionary
* apply ruff fixes
* fix function name
* simplify logger code
* update
* Remove error, warning, and log methods from WorkflowCallbacks and replace with standard logging
* ruff fixes
* Fix pyright errors by removing WorkflowCallbacks from strategy type signatures
* Remove ConsoleWorkflowLogger and apply consistent formatter to all handlers
* apply ruff fixes
* Refactor pipeline_logger.py to use standard FileHandler and remove FileWorkflowLogger
* Remove conditional azure import checks from blob_workflow_logger.py
* Fix pyright type checking errors in mock_provider.py and utils.py
* Run ruff check --fix to fix import ordering in notebooks
* Merge configure_logging and create_pipeline_logger into init_loggers function
* Remove configure_logging and create_pipeline_logger functions, replace all usage with init_loggers
* apply ruff fixes
* cleanup unused code
* Update init_loggers to accept GraphRagConfig instead of ReportingConfig
* apply ruff check fixes
* Fix test failures by providing valid GraphRagConfig with required model configurations
* apply ruff fixes
* remove logging_workflow_callback
* cleanup logging messages
* Add logging to track progress of pandas DataFrame apply operation in create_base_text_units
* cleanup logger logic throughout codebase
* update
* more cleanup of old loggers
* small logger cleanup
* final code cleanup and added loggers to query
* add verbose logging to query
* minor code cleanup
* Fix broken unit tests for chunk_text and standard_logging
* apply ruff fixes
* Fix test_chunk_text by mocking progress_ticker function instead of ProgressTicker class
* remove unnecessary logger
* remove rich and fix type annotation
* revert test formatting changes made by copilot
* promote graphrag logs to root logger
* add correct semversioner file
* revert change to file
* revert formatting changes that have no effect
* fix changes after merge with main
* revert unnecessary copilot changes
* remove whitespace
* cleanup docstring
* simplify some logic with less code
* update poetry lock file
* ruff fixes

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
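Several bullets above describe replacing the custom workflow loggers with `logging.Handler` subclasses (for example the blob logger). The following is a minimal sketch of that general pattern using only the standard library. `ListHandler` and the `example_pkg` logger names are hypothetical, and the in-memory list stands in for a remote sink such as Azure Blob Storage; this is not GraphRAG's implementation.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted records in memory.

    A real remote sink (e.g. a blob logger) would upload these lines instead.
    """

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.lines.append(self.format(record))
        except Exception:
            # Stdlib convention: a failing handler must not crash the caller.
            self.handleError(record)


# One handler on the package-level logger...
pkg_logger = logging.getLogger("example_pkg")
pkg_logger.setLevel(logging.INFO)
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
pkg_logger.addHandler(handler)

# ...receives records from any module logger created beneath it.
module_logger = logging.getLogger("example_pkg.index")
module_logger.info("Workflow %s completed successfully", "create_base_text_units")
module_logger.debug("Dropped: DEBUG is below the package's INFO level")
```

Because module loggers propagate records to their ancestors, attaching one handler at the package level is enough to capture output from every `logging.getLogger(__name__)` logger beneath it, which keeps the "single logger per file" convention cheap.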
1 parent 27c6de8 commit e84df28


128 files changed (+2134 -2024 lines changed)

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+{
+    "type": "patch",
+    "description": "cleaned up logging to follow python standards."
+}

dictionary.txt

Lines changed: 2 additions & 0 deletions
@@ -102,6 +102,7 @@ itertuples
 isin
 nocache
 nbconvert
+levelno

 # HTML
 nbsp
@@ -186,6 +187,7 @@ Verdantis's
 # English
 skippable
 upvote
+unconfigured

 # Misc
 Arxiv

docs/config/env_vars.md

Lines changed: 2 additions & 2 deletions
@@ -178,11 +178,11 @@ This section controls the cache mechanism used by the pipeline. This is used to

 ### Reporting

-This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to the console or to an Azure Blob Storage container.
+This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to an Azure Blob Storage container.

 | Parameter | Description | Type | Required or Optional | Default |
 | --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | -------------------- | ------- |
-| `GRAPHRAG_REPORTING_TYPE` | The type of reporter to use. Options are `file`, `console`, or `blob` | `str` | optional | `file` |
+| `GRAPHRAG_REPORTING_TYPE` | The type of reporter to use. Options are `file` or `blob` | `str` | optional | `file` |
 | `GRAPHRAG_REPORTING_STORAGE_ACCOUNT_BLOB_URL` | The Azure Storage blob endpoint to use when in `blob` mode and using managed identity. Will have the format `https://<storage_account_name>.blob.core.windows.net` | `str` | optional | None |
 | `GRAPHRAG_REPORTING_CONNECTION_STRING` | The Azure Storage connection string to use when in `blob` mode. | `str` | optional | None |
 | `GRAPHRAG_REPORTING_CONTAINER_NAME` | The Azure Storage container name to use when in `blob` mode. | `str` | optional | None |

docs/config/yaml.md

Lines changed: 2 additions & 2 deletions
@@ -149,11 +149,11 @@ This section controls the cache mechanism used by the pipeline. This is used to

 ### reporting

-This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to the console or to an Azure Blob Storage container.
+This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to an Azure Blob Storage container.

 #### Fields

-- `type` **file|console|blob** - The reporting type to use. Default=`file`
+- `type` **file|blob** - The reporting type to use. Default=`file`
 - `base_dir` **str** - The base directory to write reports to, relative to the root.
 - `connection_string` **str** - (blob only) The Azure Storage connection string.
 - `container_name` **str** - (blob only) The Azure Storage container name.
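A sketch of how the `type` and `base_dir` fields could map onto a standard library handler, in the spirit of the commit's move to a plain `FileHandler` for file reporting. The function name `make_reporting_handler` and the `indexing.log` file name are assumptions, and the blob branch is left unimplemented because the Azure handler is not part of this excerpt.

```python
import logging
from pathlib import Path


def make_reporting_handler(
    reporting_type: str, root_dir: str, base_dir: str
) -> logging.Handler:
    """Hypothetical mapping from reporting.type to a stdlib logging handler."""
    if reporting_type == "file":
        # base_dir is resolved relative to the project root, as the docs describe.
        log_dir = Path(root_dir) / base_dir
        log_dir.mkdir(parents=True, exist_ok=True)
        return logging.FileHandler(log_dir / "indexing.log", encoding="utf-8")
    if reporting_type == "blob":
        # Would need connection_string and container_name; not sketched here.
        raise NotImplementedError("blob reporting is outside this sketch")
    raise ValueError(f"Unsupported reporting type: {reporting_type!r}")
```

Closing the handler when a run ends also releases the file, which matters on Windows, where an open file cannot be deleted.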

graphrag/__init__.py

Lines changed: 7 additions & 0 deletions
@@ -2,3 +2,10 @@
 # Licensed under the MIT License

 """The GraphRAG package."""
+
+import logging
+
+from graphrag.logger.standard_logging import init_console_logger
+
+logger = logging.getLogger(__name__)
+init_console_logger()
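`init_console_logger` is called at import time, but its body is not shown in this excerpt. The commit history mentions only adding a `StreamHandler` when one is not already present, so a minimal sketch of that idempotent behavior follows. The function and logger names are hypothetical, and the sketch targets a named logger rather than the root logger so it has no side effects on other libraries.

```python
import logging
import sys


def init_console_logger_sketch(level: int = logging.INFO) -> logging.Logger:
    """Hypothetical stand-in for init_console_logger; safe to call repeatedly."""
    pkg_logger = logging.getLogger("graphrag_sketch")
    pkg_logger.setLevel(level)
    # FileHandler subclasses StreamHandler, so exclude it explicitly.
    has_console = any(
        isinstance(h, logging.StreamHandler) and not isinstance(h, logging.FileHandler)
        for h in pkg_logger.handlers
    )
    if not has_console:
        console = logging.StreamHandler(sys.stderr)
        console.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(name)s - %(message)s")
        )
        pkg_logger.addHandler(console)
    return pkg_logger


first = init_console_logger_sketch()
second = init_console_logger_sketch()  # e.g. a second entry point calling it again
```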

graphrag/api/index.py

Lines changed: 13 additions & 17 deletions
@@ -10,7 +10,7 @@

 import logging

-from graphrag.callbacks.reporting import create_pipeline_reporter
+from graphrag.callbacks.noop_workflow_callbacks import NoopWorkflowCallbacks
 from graphrag.callbacks.workflow_callbacks import WorkflowCallbacks
 from graphrag.config.enums import IndexingMethod
 from graphrag.config.models.graph_rag_config import GraphRagConfig
@@ -19,10 +19,9 @@
 from graphrag.index.typing.pipeline_run_result import PipelineRunResult
 from graphrag.index.typing.workflow import WorkflowFunction
 from graphrag.index.workflows.factory import PipelineFactory
-from graphrag.logger.base import ProgressLogger
-from graphrag.logger.null_progress import NullProgressLogger
+from graphrag.logger.standard_logging import init_loggers

-log = logging.getLogger(__name__)
+logger = logging.getLogger(__name__)


 async def build_index(
@@ -31,7 +30,6 @@ async def build_index(
     is_update_run: bool = False,
     memory_profile: bool = False,
     callbacks: list[WorkflowCallbacks] | None = None,
-    progress_logger: ProgressLogger | None = None,
 ) -> list[PipelineRunResult]:
     """Run the pipeline with the given configuration.
@@ -45,26 +43,25 @@ async def build_index(
         Whether to enable memory profiling.
     callbacks : list[WorkflowCallbacks] | None default=None
         A list of callbacks to register.
-    progress_logger : ProgressLogger | None default=None
-        The progress logger.

     Returns
     -------
     list[PipelineRunResult]
         The list of pipeline run results
     """
-    logger = progress_logger or NullProgressLogger()
-    # create a pipeline reporter and add to any additional callbacks
-    callbacks = callbacks or []
-    callbacks.append(create_pipeline_reporter(config.reporting, None))
+    init_loggers(config=config)

-    workflow_callbacks = create_callback_chain(callbacks, logger)
+    # Create callbacks for pipeline lifecycle events if provided
+    workflow_callbacks = (
+        create_callback_chain(callbacks) if callbacks else NoopWorkflowCallbacks()
+    )

     outputs: list[PipelineRunResult] = []

     if memory_profile:
-        log.warning("New pipeline does not yet support memory profiling.")
+        logger.warning("New pipeline does not yet support memory profiling.")

+    logger.info("Initializing indexing pipeline...")
     # todo: this could propagate out to the cli for better clarity, but will be a breaking api change
     method = _get_method(method, is_update_run)
     pipeline = PipelineFactory.create_pipeline(config, method)
@@ -75,15 +72,14 @@ async def build_index(
         pipeline,
         config,
         callbacks=workflow_callbacks,
-        logger=logger,
         is_update_run=is_update_run,
     ):
         outputs.append(output)
         if output.errors and len(output.errors) > 0:
-            logger.error(output.workflow)
+            logger.error("Workflow %s completed with errors", output.workflow)
         else:
-            logger.success(output.workflow)
-            logger.info(str(output.result))
+            logger.info("Workflow %s completed successfully", output.workflow)
+            logger.debug(str(output.result))

     workflow_callbacks.pipeline_end(outputs)
     return outputs

graphrag/api/prompt_tune.py

Lines changed: 9 additions & 7 deletions
@@ -11,6 +11,7 @@
 Backwards compatibility is not guaranteed at this time.
 """

+import logging
 from typing import Annotated

 import annotated_types
@@ -20,7 +21,7 @@
 from graphrag.config.defaults import graphrag_config_defaults
 from graphrag.config.models.graph_rag_config import GraphRagConfig
 from graphrag.language_model.manager import ModelManager
-from graphrag.logger.base import ProgressLogger
+from graphrag.logger.standard_logging import init_loggers
 from graphrag.prompt_tune.defaults import MAX_TOKEN_COUNT, PROMPT_TUNING_MODEL_ID
 from graphrag.prompt_tune.generator.community_report_rating import (
     generate_community_report_rating,
@@ -47,11 +48,12 @@
 from graphrag.prompt_tune.loader.input import load_docs_in_chunks
 from graphrag.prompt_tune.types import DocSelectionType

+logger = logging.getLogger(__name__)
+

 @validate_call(config={"arbitrary_types_allowed": True})
 async def generate_indexing_prompts(
     config: GraphRagConfig,
-    logger: ProgressLogger,
     chunk_size: PositiveInt = graphrag_config_defaults.chunks.size,
     overlap: Annotated[
         int, annotated_types.Gt(-1)
@@ -71,8 +73,6 @@ async def generate_indexing_prompts(
     Parameters
     ----------
     - config: The GraphRag configuration.
-    - logger: The logger to use for progress updates.
-    - root: The root directory.
     - output_path: The path to store the prompts.
     - chunk_size: The chunk token size to use for input text units.
     - limit: The limit of chunks to load.
@@ -89,6 +89,8 @@ async def generate_indexing_prompts(
     -------
     tuple[str, str, str]: entity extraction prompt, entity summarization prompt, community summarization prompt
     """
+    init_loggers(config=config)
+
     # Retrieve documents
     logger.info("Chunking documents...")
     doc_list = await load_docs_in_chunks(
@@ -187,9 +189,9 @@ async def generate_indexing_prompts(
         language=language,
     )

-    logger.info(f"\nGenerated domain: {domain}")  # noqa: G004
-    logger.info(f"\nDetected language: {language}")  # noqa: G004
-    logger.info(f"\nGenerated persona: {persona}")  # noqa: G004
+    logger.debug("Generated domain: %s", domain)
+    logger.debug("Detected language: %s", language)
+    logger.debug("Generated persona: %s", persona)

     return (
         extract_graph_prompt,
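The prompt-tuning messages switch from f-strings (which needed `# noqa: G004` to silence ruff's logging-f-string rule) to `%`-style arguments, and move to DEBUG. The sketch below illustrates the practical difference: with `%`-style arguments, a disabled DEBUG call never converts its argument to a string, whereas an f-string is always built before the logging call runs. `CountingValue` and the logger name are hypothetical.

```python
import logging


class CountingValue:
    """Counts how many times it is rendered as a string."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return self.text


logger = logging.getLogger("prompt_tune_sketch")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled

domain = CountingValue("scientific literature")

# %-style: the argument is stored on the LogRecord and only formatted if a
# record is actually emitted, so a disabled DEBUG call never renders it.
logger.debug("Generated domain: %s", domain)
lazy_renders = domain.renders

# f-string: the message is built eagerly, even though it is then discarded.
logger.debug(f"Generated domain: {domain}")
eager_renders = domain.renders - lazy_renders
```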
