Skip to content

Conversation

@andreineculau
Copy link
Collaborator

I have some repos with many many files.
I haven't checked the git history for a reason on using a for loop and subshell and bash pattern checking instead, but going with piping and grep and sed makes a night and day difference.

@andreineculau andreineculau requested a review from jmurty January 30, 2025 17:12
@andreineculau
Copy link
Collaborator Author

andreineculau commented Jan 30, 2025

Git blame took me to #173 6e557df which fixed support for double-quoted filenames...

EDIT: closed the issue (I was author). Thank you for the fix @jmurty 🙌 🙇

@andreineculau
Copy link
Collaborator Author

andreineculau commented Jan 30, 2025

TDD FTW 🚀 Failing on

not ok 33 crypt: handle challenging file names when 'core.quotePath=true'

Will have a look and see what could be the alternative

EDIT: no cigar. Pausing efforts

@jmurty
Copy link
Collaborator

jmurty commented Feb 24, 2025

Thanks for looking at this @andreineculau, there's some dark magic in those pipe commands but if it runs much faster it's well worth the change. What kind of performance improvements are you seeing, out of interest?

I found the test failures blocking you were because of a latent bug in my older code: the text content checks in your changes were more correct than mine, and highlighted a problem elsewhere. I have pushed a fix for the code that calls the _list_encrypted_files() function to only do a "strict" listing of files within a given context, when a context is actually given.

If you can add some comments, and maybe squash the PR down to a cleaner history, I should be able to merge it to main soon.

@andreineculau
Copy link
Collaborator Author

andreineculau commented Feb 24, 2025

Fantastic! I didn't have time to look into the failures and I assumed we won't have full coverage. But if we do... 🚀

I will get back with some time stats and content the magic

@andreineculau andreineculau force-pushed the andreineculau-patch-3 branch from 7405209 to f18c4a4 Compare March 15, 2025 06:31
@andreineculau
Copy link
Collaborator Author

and these are the performance metrics in a repo with 25 encrypted files, >40 000 files.

PR version, running git commit and thus calling transcrypt to list files

[master fd56d9955] foo
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 foo
0.37user 0.60system 0:00.79elapsed 123%CPU (0avgtext+0avgdata 24880maxresident)k
0inputs+0outputs (408major+66302minor)pagefaults 0swaps

current version, running git commit and thus calling transcrypt to list files

[master f6a57465b] bar
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 bar
178.11user 198.21system 7:47.76elapsed 80%CPU (0avgtext+0avgdata 33472maxresident)k
0inputs+0outputs (381275major+74427218minor)pagefaults 0swaps

AND no, there is no error. It is that slow :)

jmurty added a commit that referenced this pull request Mar 17, 2025
jmurty added a commit that referenced this pull request Mar 17, 2025
Greatly improve performance in a repository with many files for
pre-commit safety check, encrypted file listing, and showing raw file.

Performance wins are achieved by avoiding creation of subshells in a
tight loop when processing Git's `ls-files` and instead using a smart
piping alternative approach.

Thanks Andrei!
@jmurty jmurty merged commit f18c4a4 into main Mar 17, 2025
4 checks passed
@jmurty
Copy link
Collaborator

jmurty commented Mar 17, 2025

Excellent, thanks for perfecting this huge performance improvement @andreineculau. I've merged it to main.

Those performance numbers from before your fix are horrifying 😱

jmurty added a commit that referenced this pull request Mar 17, 2025
* main: (27 commits)
  Add or update references to me (James Murty) as project maintainer
  Update long-outdated copyright notices
  Note performance improvements from #193 in changelog
  Remove repeated lines in comments
  Add tests for feature: detect incorrect password on init
  Improve name of function that counts dirty files, and do all the needed work in the function
  handle --force when checking for incorrect password
  performance: don't create subshells
  Document in changelog improvements by weichunglee in PR #197
  fix logic for version checking in pre_commit, fix empty files always failing validate_file
  Update ongoing dev version to 2.3.2-pre
  Release 2.3.1
  Warn when password is probably incorrect with error message and return code #182
  Fix --export-gpg command to properly include cipher in exported .asc file
  Fail with error when an empty password is provided to the -p or --password options #188
  Make --upgrade safer by failing fast if transcrypt config cannot be read #189
  Remove hard dependency on `xxd` which can be onerous, fall back to `printf` or `perl` instead #181
  Don't set global Git config when running tests
  Make ongoing dev version 2.3.1-pre
  Release 2.3.0
  ...
jmurty added a commit that referenced this pull request Sep 17, 2025
…unrecognised names #204

Tweak the file listing to fix two issues found to affect files with
problematic names like `"Terrible file""")))(((][][].secret`.

This change may undo performance improvements achieved in #193.

- Ensure the `git check-attr filter` command does not quote file name
  characters by using `-z` arg and some extra trickery to keep
  space and newline delimiter characters for filter metadata and each
  file name respectively.

  This fixes a file named `"Terrible file""")))(((][][].secret`
  being listed as just `Terrible file`.

- Adjust checking for `filter` and `crypt` strings to work with
  changed output from `check-attr` command.

- Tweak `eval 'echo $filename'` portion of command to avoid script
  crash on files with special characters with error message like
  `eval: line 192: syntax error near unexpected token`
@jeancornic
Copy link

Thanks a ton for the fix @andreineculau , you really made my day.

While I am running the latest version of transcrypt (2.3.1), it took me a while to figure out that there was a fix available for my performance issue. Would it be possible to release a new version with this fix?

@jmurty
Copy link
Collaborator

jmurty commented Sep 18, 2025

Hi @jeancornic thanks for the feedback. I have been slow to release a new version because the pending changes haven't been widely tested, and in fact an issue with the performance improvement for repos with many files surfaced just yesterday in #204.

Could you and @andreineculau try the tweaked version on branch issue-204(PR #205) and let me know whether or not it still performs okay in large repos after the bug fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants