Skip to content

Conversation

@CiSong10
Copy link
Contributor

Rewrite some functions using tqdm process bar and pathlib. Make other small code improvements as well.

@PatBall1
Copy link
Owner

@CiSong10 thanks for this. Would you please realign you branch with the new python-ci adjustments I made to master?

Also please see this autogenerated review:

I reviewed the diff and the intent is generally good — modernizing path handling (pathlib), adding progress bars (tqdm), cleaning up string formatting and JSON/file handling. I found several places that may cause bugs or break other code; below I list the issues grouped by file with suggested fixes.

General / cross-cutting

  • Many edits switch from str paths to pathlib.Path. Make sure each module imports Path (from pathlib import Path) at the top. I did not see new imports for Path in the diffs you showed — add them where missing.
  • Similarly, tqdm is used in multiple files but I only see it imported in predict.py. Add "from tqdm import tqdm" in outputs.py and any other modules using tqdm.
  • Changing functions to accept Path objects is fine, but ensure all call sites across the codebase pass Path or str consistently. Either accept both (Path(file)) or explicitly convert at function entry (as you did in a few places).

detectree2/models/evaluation.py

  • initialise_feats2: you renamed the parameter file -> filename. That is fine for positional calls, but any external callers passing keyword file=... will break. Also you changed the GeoFeature constructor call: GeoFeature(filename, ...) (previously GeoFeature(file,...)). Confirm GeoFeature expects the filename string (not full path) — or convert to str earlier.
  • site_f1_score2: tile_width appears to be referenced when calling initialise_feats2, but I see the earlier tile_width computation commented out:
    • You do: for file in test_directory.iterdir(): ... tile_origin = get_tile_origin(file) epsg = get_epsg(file) then call initialise_feats2(..., tile_width, tile_origin, epsg) — but tile_width is not defined in this scope (the commented line "# tile_width = get_tile_width(file) * scaling[0]" suggests it used to be set). This will raise a NameError. Restore or recompute tile_width before use.
  • site_f1_score2: you converted test_directory to Path and iterate over Path objects; you then pass file.name to initialise_feats2 which expects the filename (you changed that), OK — but ensure initialise_feats2 uses that name consistently (it now converts to Path inside).
  • site_f1_score2: printing inside the try/except — if a ZeroDivisionError is raised, the except prints but then the function still returns prec, rec, f1_score. Those variables may be undefined if the exception happened before they were set -> this can raise an UnboundLocalError. Ensure you return sensible defaults (e.g., 0, 0, 0) from the except block or re-raise.
  • get_tile_width / get_epsg / get_tile_origin: you now call file = Path(file) and file.stem.split("") which is fine for both str and Path. Confirm callers passing things like "Prediction*_eval.geojson" still work (they pass file.name or file.stem in some places).

detectree2/models/outputs.py

  • Missing imports: this file uses Path, tqdm, and may rely on CRS/rasterio already imported — ensure Path and tqdm are imported at top. I did not see tqdm import added here.
  • to_eval_geojson: you set directory = Path(directory) but the function signature still allows directory=None; calling with None will raise. Either keep signature non-optional or add a guard.
  • to_eval_geojson: you assign img_dict["filename"] = file.name (string) then later use file.open() — you still use the Path, so that's fine. Ensure later code that expects filename to be a path uses file (Path) rather than img_dict["filename"] (string).
  • to_eval_geojson: you use f-strings for EPSG ("f'urn:ogc:...'") — good.
  • project_to_geojson:
    • You use tqdm but outputs.py doesn't show importing it. Add import.
    • The new pattern opens the raster with "with rasterio.open(tifpath) as data:" and then sets raster_transform = data.transform. That variable is used later in transforming coords; OK.
    • entries = [file for file in Path(pred_fold).iterdir() if file.suffix == ".json"] — make sure pred_fold exists or handle it.
    • You changed output file writing to use Path.open/write_text (good). Confirm permission/encoding expectations.
  • filename_geoinfo: now uses Path(filename).stem and maps last 5 parts to int — OK but will raise ValueError if filename doesn't include 5 underscore-separated parts or parts are non-int. This mirrors previous behavior but keep in mind.
  • stitch_crowns:
    • You use tqdm here too; ensure tqdm imported.
    • You switched glob("geojson") -> glob(".geojson") (good).
    • You changed the FileNotFoundError message to include folder path — good.
    • box_filter(file, shift) now receives a Path. Confirm box_filter accepts Path or convert to str.

detectree2/models/predict.py

  • Imports: you added "from tqdm import tqdm" which is good. Ensure Path is imported (I don't see it in the snippet) if you're using Path in annotations and in code.
  • predict_on_data signature now allows Path for directory/out_folder — OK.
  • You replaced the outer loop with a tqdm loop. I would confirm the loop body correctly handles both png and tif cases (I saw the png branch; ensure other branches (tif/raster) remain intact).
  • When saving JSON you switched to output_file.write_text(json.dumps(evaluations)). That works, but be aware this writes compact JSON with no indentation; if you prefer pretty JSON, use json.dump with indent or write_text(json.dumps(..., indent=2)).
  • instances_to_coco_json is called with str(file_name) — correct because some functions expect path strings.
  • Ensure predictor outputs["instances"] exists (unchanged from before).

detectree2/preprocessing/tiling.py

  • You replaced os.makedirs(out_path, exist_ok=True) with out_path.mkdir(parents=True, exist_ok=True) — good. Ensure out_path is a Path.
  • record_classes: same mkdir change, OK.
  • to_traintest_folders: you replaced os.path.exists(tiles_dir) with tiles_dir.exists(), but tiles_dir is a Path and that is fine. However you then raise IOError without message; consider raising FileNotFoundError with message for clarity.

Other stylistic/robustness notes

  • Mixed use of Path and str: be careful passing Path objects into functions that expect strings (e.g., some third-party libs or old helper functions). If you move to Path everywhere, prefer converting at the function entrance (file = Path(file)) — which you did in a few places; do that consistently.
  • Logging: you added prints and tqdm; consider using logging so users can control verbosity.
  • Tests: run the test suite or a quick integration smoke test for these flows: predict_on_data, project_to_geojson, to_eval_geojson, site_f1_score2 and stitch_crowns. The undefined tile_width and missing imports are the highest-risk problems to surface quickly.
  • Type hints: you added some type unions (str | Path). Consider importing from typing for Python <3.10 compatibility, or ensure project runs on 3.10+.

Summary of highest-priority fixes

  1. Define tile_width before use in site_f1_score2 (currently NameError).
  2. Add missing imports:
    • from pathlib import Path in modules where Path is used but not imported.
    • from tqdm import tqdm in outputs.py and any other file using tqdm.
  3. Ensure initialise_feats2 signature/parameter rename is compatible with all callers or revert to original name or accept both.
  4. Fix returning potentially undefined variables in site_f1_score2 except block (return defaults or re-raise).
  5. Run integration tests to catch Path vs str mismatches (e.g., GeoFeature(), box_filter(), gpd.read_file and any other function that may expect strings).

ChristopherKotthoff and others added 26 commits November 20, 2025 09:51
This pull request enhances the image preprocessing capabilities in detectree2/preprocessing/tiling.py by introducing an optional RGB contrast enhancement feature and fixing some inconsistencies with return values across the tiling functions.
Tutorials updated and extended and a few other fixes

* rgb contrast and unsafe access fix

* experimental numpy fix

* fixes a lot of accumulated mypy problem

* rgb contrast and unsafe access fix

* experimental numpy fix

* fixes a lot of accumulated mypy problem

* first small adjustments

* class imbalance,  finetuning, & advanced tips

* refactoring tutorials

* unrelated small fix

* removing "step X" strings

* Zenodo release version update
Updated tutorial links in the README file to point to the correct index page. Issue PatBall1#210
`Path(file_roots[num[i]]` is a string of tile stem, its `.suffix` will return an empty string.
Simply tile name stem + .geojson is good.
This pull request enhances the image preprocessing capabilities in detectree2/preprocessing/tiling.py by introducing an optional RGB contrast enhancement feature and fixing some inconsistencies with return values across the tiling functions.
Updated tutorial links in the README file to point to the correct index page. Issue PatBall1#210
@CiSong10
Copy link
Contributor Author

Hi, I have merged up-to-date changes from master into this PR. I humanly reviewed the AI-generated reviews and there's not much to change. Conflicts resolved, etc. etc. Please check this PR for merging. Thanks.

@PatBall1
Copy link
Owner

The failing job is due to a mypy type-checking error at line 224 in detectree2/models/outputs.py:

error: Unsupported target for indexed assignment ("Collection[str]") [index]

Line 224 is attempting an indexed assignment on a type that does not support it—likely a type annotated as Collection rather than a mutable sequence like list. The problematic code is:

geofile["features"][i] = updated_feature

Or, in your context, you might do something like:

some_collection[index] = value

Solution:
If you need the collection to be writable by index, declare it as a list, not as a generic Collection. Update the type hint for features in GeoFile from:

"features": List[Feature]

and be sure that everywhere features is created and used, you keep it as a list (not as a generic collection or other immutable sequence).

If a variable is annotated as Collection[str], you cannot assign to indices (e.g., collection[0] = 'foo'). Change the variable's annotation to list[str] or List[str]:

from typing import List

features: List[str] = []
features[0] = "foo" # now valid

Go to the definition or location at line 224 and adjust the types as shown above. For more context, refer to the file at this ref: detectree2/models/outputs.py@24d614f06f4a8dff35d829bcee8b0945dcadb1bd.

Summary:

  • Change type annotations from Collection[...] to list[...] or List[...].
  • Ensure the actual variable is a list at runtime.
  • Do not index-assign into Collection types.

After making these changes, the mypy error will be resolved.

@PatBall1
Copy link
Owner

PatBall1 commented Nov 20, 2025

@CiSong10 please see guidance on programming style and pre-commit hooks: https://patball1.github.io/detectree2/contributing.html#programming-style

Code Formatting (isort/flake8) Failures:

  • The following files have "imports incorrectly sorted and/or formatted":

  • Solution: Run the following commands locally to fix import ordering:

    isort detectree2/models/train.py
    isort detectree2/models/predict.py

    Add and commit the changes.

  • Additionally, you can auto-fix many flake8 issues using:

    autoflake --in-place --remove-all-unused-imports detectree2/models/train.py
    autoflake --in-place --remove-all-unused-imports detectree2/models/predict.py

Once you’ve:

  • Corrected the import order and formatting in the code

Your CI action should pass both dependency installation and linting checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants