
Conversation

@majocha (Contributor) commented Aug 29, 2025

implicitYieldExpressions, argInfoCache, typeDefinitelySubsumesNoCoerceCache, typeFeasibleEquivCache

github-actions bot commented Aug 29, 2025

❗ Release notes required

@majocha,

Caution

No release notes found for the changed paths (see table below).

Please make sure to add an entry with an informative description of the change, as well as links to this pull request, the issue, and the language suggestion if applicable. Release notes for this repository are based on the Keep a Changelog format.

The following format is recommended for this repository:

* <Informative description>. ([PR #XXXXX](https://github.com/dotnet/fsharp/pull/XXXXX))

See examples in the files listed in the table below, or in the full documentation at https://fsharp.github.io/fsharp-compiler-docs/release-notes/About.html.

If you believe that release notes are not necessary for this PR, please add the NO_RELEASE_NOTES label to the pull request.

You can open this PR in github.dev to add release notes.

| Change path  | Release notes path                                    | Description                                                    |
|--------------|-------------------------------------------------------|----------------------------------------------------------------|
| src/Compiler | docs/release-notes/.FSharp.Compiler.Service/11.0.0.md | No release notes found or release notes format is not correct |

// Is it quadratic or quasi-quadratic?
if ValueIsUsedOrHasEffect cenv (fun () -> (freeInExpr (CollectLocalsWithStackGuard()) bodyR).FreeLocals) (bindR, bindingInfo) then
let collect expr = (freeInExpr (CollectLocalsWithStackGuard()) expr).FreeLocals
if ValueIsUsedOrHasEffect cenv (fun () -> (getFreeLocalsCache cenv).GetOrAdd(bodyR, collect)) (bindR, bindingInfo) then
@majocha (Contributor, Author) commented:

This gives a modest 20% hit ratio in tests. It may or may not help somewhat in real-world use; it needs benchmarking.

@majocha (Contributor, Author) commented:

Looking at Optimization times when building FCS, I don't see any speedup.

/// Usage:
/// let getValueFor = WeakMap.getOrCreate (fun () -> expensiveInit())
/// let v = getValueFor someKey
let getOrCreate valueFactory =
@majocha (Contributor, Author) commented:

This allows quickly and ad hoc attaching a cache instance in various places. Currently we thread various caches along with context objects / records, which is not that convenient when testing things out.
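
A minimal sketch of what such a helper could look like, assuming a ConditionalWeakTable backing so the attached value is collected together with its key; the shape follows the usage comment above, but the body is illustrative, not the PR's actual code:

open System.Runtime.CompilerServices

module WeakMap =

    /// Returns a function that attaches a lazily created value to any
    /// reference-typed key. The value is created once per key and is
    /// collected together with the key, so no explicit clearing is needed.
    let getOrCreate<'Key, 'Value when 'Key: not struct and 'Value: not struct>
            (valueFactory: unit -> 'Value) : 'Key -> 'Value =
        let table = ConditionalWeakTable<'Key, 'Value>()
        fun key -> table.GetValue(key, fun _ -> valueFactory ())

A ConditionalWeakTable holds its keys weakly, which matches the observation below that the caches simply get GC'd along with the objects they are attached to.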

Comment on lines 958 to 964
let getArgInfoCache =
let options = Caches.CacheOptions.getDefault() |> Caches.CacheOptions.withNoEviction
let factory _ = new Caches.Cache<_, ArgReprInfo>(options, "argInfoCache")
WeakMap.getOrCreate factory
Member commented:

Are these caches going to be cleared when asking FCS to clear its caches?

@majocha (Contributor, Author) commented Aug 29, 2025:

Currently not. I think that since they're weakly attached to higher-level stuff, they'll just get GC'd along with it. I captured some telemetry, and the caches get disposed quite often during the test run.

@majocha (Contributor, Author) commented Aug 30, 2025

Ahh, there's that memory leak in the VS tests again 😔

env.eCachedImplicitYieldExpressions.Add(synExpr1.Range, (synExpr1, expr1Ty, cachedExpr))
try TcExpr cenv overallTy env tpenv otherExpr
finally env.eCachedImplicitYieldExpressions.Remove synExpr1.Range
(getImplicitYieldExpressionsCache cenv).AddOrUpdate(synExpr1, (expr1Ty, cachedExpr))
Member commented:

The semantics w.r.t. error handling are different now, aren't they?

Also, it feels like the previous version had a race condition (FindAll and later Add); possibly the multimap was chosen to work around it, by storing a list rather than a single value?
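
To make the first point concrete, here is the difference in miniature, with a plain Dictionary standing in for the real cache (hypothetical helpers, not the compiler's code):

open System.Collections.Generic

let cache = Dictionary<string, int>()

// Old shape: the entry exists only while the nested work runs, and it is
// removed afterwards even if the work raises.
let checkWithScopedEntry key value work =
    cache.Add(key, value)
    try work ()
    finally cache.Remove key |> ignore

// New shape: the entry is added (or overwritten) up front and simply stays
// until the cache evicts it; a raised exception no longer removes it.
let checkWithPersistentEntry key value work =
    cache.[key] <- value
    work ()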

@majocha (Contributor, Author) commented Sep 4, 2025:

The previous version had some concurrency problems; at some point Vlad added a ConcurrentDictionary as the HashMultiMap backing store to fix them.

The semantics are different. Originally this always removed the value in the finally block. I have to revisit this; I remember the change made sense to me, but I forgot why, oh lol. I guess my thinking was that eviction would be sufficient here. Now I also notice this was attached to env, not cenv.

This is not really covered by the test suite, but there is a benchmark. I'll run it later to see if this still works.

@majocha (Contributor, Author) commented Sep 11, 2025

I'm trying to identify places that could benefit from memoization or better caching.

Here are some suggestions from Copilot:

InfoReader: immediate members for a type

  • What: Cache “immediate” metadata enumerations per type:
    • GetImmediateIntrinsicMethInfosOfTypeAux, GetImmediateIntrinsicPropInfosOfTypeAux
    • GetImmediateIntrinsicILFieldsOfType, ComputeImmediateIntrinsicEventsOfType
  • Why: We repeatedly re-enumerate IL metadata and F# members by name for the same types inside hierarchy folds and during overload resolution.
  • Where: src/Compiler/Checking/InfoReader.fs
  • Key:
    • (TypeStructure, optional name filter) for the raw unfiltered enumeration.
    • Do NOT key by AccessorDomain; cache the raw list and filter per call (Is*Accessible).
  • Cache result:
    • MethInfo list (per name or “all”); PropInfo list; ILFieldInfo list; EventInfo list.
  • Notes:
    • Factor the existing “…Uncached” helpers to call cached “get all by name” then filter.
    • Works well for IL types; for F# types, MemberInfo is stable after typecheck.

Method/Property sets along hierarchy

  • What: Cache folded results:
    • GetIntrinsicMethodSetsUncached / GetIntrinsicPropertySetsUncached (and named variants)
  • Why: Same fold gets repeated across overload/resolution passes and “most specific” filtering.
  • Where: src/Compiler/Checking/InfoReader.fs
  • Key:
    • (TypeStructure, optional name, AllowMultiIntfInstantiations) → list<list>
    • Keep raw sets; filter by AccessorDomain and “most specific” per call.
  • Notes:
    • Avoid duplicating per‑call sorting by signature; cache canonicalized lists and reuse.

“Most specific” and equivalence checks for methods

  • What: Memoize expensive equivalence/specificity predicates:
    • MethInfosEquivByPartialSig, PropInfosEquivByNameAndPartialSig, GetMostSpecificItemsByType
  • Why: These perform lots of structural type/equivalence checks in nested loops.
  • Where: src/Compiler/Checking/InfoReader.fs
  • Key:
    • Pairwise (minfo1, minfo2) plus Erasure mode → bool
  • Notes:
    • Use a scoped LRU per compilation to avoid unbounded growth (see the memoization sketch after this list).

NameResolution: type lookup tables

  • What: Cache per‑type “named items” aggregation:
    • GetIntrinsicNamedItemsUncached
  • Why: It collects methods, properties, fields, events, traits by name and folds the hierarchy.
  • Where: src/Compiler/Checking/InfoReader.fs / NameResolution.fs
  • Key:
    • (TypeStructure, name) → HierarchyItem
  • Notes:
    • Again, filter accessibility at call sites.
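
To make the "most specific" suggestion concrete, here is a minimal memoization sketch for a pairwise predicate. MethInfoKey, Erasure, and the cache shape are placeholders: real keys would have to be derived from MethInfo much like the type subsumption cache derives its keys, and a real version would use the PR's evicting Cache rather than an unbounded dictionary.

open System.Collections.Concurrent

// Placeholder key: some stable, hashable identity extracted from a MethInfo
// (e.g. declaring-type stamp, logical name, arity).
type MethInfoKey =
    { TypeStamp: int64
      LogicalName: string
      Arity: int }

type Erasure =
    | EraseAll
    | EraseMeasures
    | EraseNone

// Unbounded for simplicity; a struct-tuple key keeps lookups cheap.
let equivCache =
    ConcurrentDictionary<struct (MethInfoKey * MethInfoKey * Erasure), bool>()

/// Memoizes an expensive pairwise equivalence check: `equiv` is the real
/// structural comparison, paid once per (pair, erasure mode).
let methInfosEquivCached (equiv: Erasure -> MethInfoKey -> MethInfoKey -> bool) erasure k1 k2 =
    equivCache.GetOrAdd(struct (k1, k2, erasure), fun (struct (a, b, e)) -> equiv e a b)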

I also notice the existing caching in InfoReader seems to exclude generic types. This can't be good. I wonder if it is possible to improve it with an approach to keys / hashing similar to the type subsumption cache.

@majocha (Contributor, Author) commented Sep 11, 2025

With additional type-relation caches there seems to be a slight speedup against main:

DecentlySizedStandAloneFileBenchmark

cache-2

| Method | Mean     | Error    | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 184.8 ms | 10.26 ms | 29.76 ms | 2000.0000 | 1000.0000 | 1000.0000 | 227.98 MB |

| Method | Mean     | Error    | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 192.5 ms | 10.54 ms | 31.07 ms | 2000.0000 | 1000.0000 | 1000.0000 | 227.95 MB |

| Method | Mean     | Error    | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 189.5 ms | 10.63 ms | 30.84 ms | 2000.0000 | 1000.0000 | 1000.0000 |    228 MB |

main

| Method | Mean     | Error   | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|--------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 203.5 ms | 8.23 ms | 24.01 ms | 3000.0000 | 2000.0000 | 1000.0000 | 227.19 MB |

| Method | Mean     | Error   | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|--------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 204.7 ms | 8.61 ms | 24.99 ms | 3000.0000 | 2000.0000 | 1000.0000 |    227 MB |

majocha force-pushed the cache-2 branch 2 times, most recently from 1b4f0e9 to c308343 on September 19, 2025 at 07:51
@majocha (Contributor, Author) commented Sep 19, 2025

I added printing of cache metrics when the --times option is set, to see what's going on during a one-off compilation.

@T-Gro (Member) commented Sep 19, 2025

> With additional type-relation caches there seems to be a slight speedup against main: […]

Seeing that it's for just a 1 MB increase in memory, this is a very nice tradeoff 👍.

(I assume the cache size counts into the allocated bucket)

@majocha (Contributor, Author) commented Sep 20, 2025

> (I assume the cache size counts into the allocated bucket)

Yes, it allocates the few caches that are there for each run, but they are probably not filled up very much. I guess HistoricalBenchmark is not great for testing cache improvements: it just compiles a single, not very large file and does not use the Transparent Compiler.

But I did run it for some older commits from around April, to check that things are not regressing unnoticed:

detached at f487d45 

| Method | Mean     | Error    | StdDev   | Median   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 269.1 ms | 18.50 ms | 53.98 ms | 241.1 ms | 5000.0000 | 3000.0000 | 2000.0000 | 293.12 MB |

detached at c549ebf

| Method | Mean     | Error    | StdDev   | Median   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 254.0 ms | 14.47 ms | 37.87 ms | 242.6 ms | 5000.0000 | 3000.0000 | 2000.0000 | 292.85 MB |

Thankfully, it's all good. We are better now than we were, both in allocations and speed.

ETA: this is not a benchmark that can assess performance reliably. We could have seriously degraded typecheck times in a real build that would not show up here at all.

@majocha (Contributor, Author) commented Sep 20, 2025

MemoizationTable uses a ConcurrentDictionary backing store; we can swap this for a non-evicting Cache to see stats on how the various memoization tables perform in the compiler.
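
A rough sketch of the idea, assuming nothing about MemoizationTable's actual API: keep the ConcurrentDictionary backing but count hits and misses, which is essentially what the non-evicting Cache provides (the names below are illustrative):

open System.Collections.Concurrent
open System.Threading

/// A memoizer with a ConcurrentDictionary backing store, plus the hit/miss
/// counters needed to report a hit ratio like in the table below.
type CountingMemoizer<'Key, 'Value when 'Key: equality>(name: string, compute: 'Key -> 'Value) =
    let table = ConcurrentDictionary<'Key, 'Value>(HashIdentity.Structural)
    let mutable hits = 0L
    let mutable misses = 0L

    member _.Apply(key: 'Key) : 'Value =
        match table.TryGetValue key with
        | true, value ->
            Interlocked.Increment(&hits) |> ignore
            value
        | _ ->
            Interlocked.Increment(&misses) |> ignore
            // GetOrAdd tolerates a racing add; compute may run more than once.
            table.GetOrAdd(key, fun k -> compute k)

    member _.Report() =
        let total = hits + misses
        let ratio = if total = 0L then 0.0 else 100.0 * float hits / float total
        printfn "%s: hit-ratio %.2f%%, hits %d, misses %d" name ratio hits misses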

For example, compiling FSharp.Compiler.Service\Release\net10.0:

    ----------------------------------------------------------------------------------------------------------------------------
    | Cache name                          | hit-ratio |     adds |  updates |     hits |   misses | evictions | eviction-fails |
    ----------------------------------------------------------------------------------------------------------------------------
    |                     methodInfoCache |    76.99% |     5624 |        0 |    18820 |     5624 |         0 |              0 |
    | mostSpecificOverrideMethodInfoCache |    73.24% |      365 |        0 |      999 |      365 |         0 |              0 |
    |                      eventInfoCache |     0.00% |        1 |        0 |        0 |        1 |         0 |              0 |
    |                   propertyInfoCache |    38.66% |     1764 |        0 |     1112 |     1764 |         0 |              0 |
    |            implicitYieldExpressions |     0.00% |        0 |        0 |        0 |   717864 |         0 |              0 |
    |             implicitConversionCache |    92.72% |     1368 |        0 |    17436 |     1368 |         0 |              0 |
    |                    ilFieldInfoCache |    75.19% |       32 |        0 |       97 |       32 |         0 |              0 |
    |            entireTypeHierarchyCache |    90.04% |     3428 |        0 |    30983 |     3428 |         0 |              0 |
    |                      v_memoize_file |    99.90% |      380 |        0 |   383211 |      380 |         0 |              0 |
    |           recdOrClassFieldInfoCache |    46.21% |      795 |        0 |      683 |      795 |         0 |              0 |
    |              typeFeasibleEquivCache |    26.66% |    53985 |        0 |    19625 |    53985 |         0 |              0 |
    |                typeSubsumptionCache |    50.88% |   252652 |        0 |   261756 |   252652 |         0 |              0 |
    | typeDefinitelySubsumesNoCoerceCache |    63.60% |    46590 |        0 |    81421 |    46590 |         0 |              0 |
    |                     namedItemsCache |    56.63% |    15838 |        0 |    20681 |    15838 |         0 |              0 |
    |           primaryTypeHierarchyCache |    50.00% |       14 |        0 |       14 |       14 |         0 |              0 |
    |           rawDataValueTypeGenerator |    96.40% |       39 |        0 |     1045 |       39 |         0 |              0 |
    |                        argInfoCache |    26.43% |    42013 |        0 |    15096 |    42013 |         0 |              0 |
    ----------------------------------------------------------------------------------------------------------------------------

@majocha (Contributor, Author) commented Sep 20, 2025

I'll document here how I estimate the performance, because the existing benchmarks are not suitable.
The steps are:

  • make some changes
  • rebuild the bootstrap compiler
  • build FCS for net10.0, single-threaded
  • search for |Typecheck in the binlog to view the result generated with the --times switch:
dotnet publish .\proto.proj -c Proto

dotnet build  .\src\Compiler\ -c Release -f net10.0 -m:1 --no-incremental -bl

.\msbuild.binlog

@T-Gro (Member) commented Sep 25, 2025

> For example, compiling FSharp.Compiler.Service\Release\net10.0:

I like how this brings clarity into the various caches.
Even if some of them end up performing roughly the same, without an improvement, I would propose keeping them (with a non-evicting config) simply for the added insight 👍

@T-Gro (Member) commented Sep 25, 2025

Also, globalizing them could help tooling scenarios with multi-project solutions, especially if there are many shared dependencies 👍.

@majocha (Contributor, Author) commented Sep 25, 2025

Globalizing is something to try; currently the stats that show up are cumulative, simply by cache name.
There can be many instances: some caches are one per TcGlobals, and some MemoizationTables are recreated per checked file, IIRC. This PR is a testing ground for various things, but I could extract the MemoizationTable stuff. It should be very safe and negligible for performance, because a no-eviction cache is just a ConcurrentDictionary with some metrics.
