Import and extend PosteriorStats #431

sethaxen · 2023-08-20T14:55:26Z

This PR makes the following replacements everywhere:

- `hpd` -> `PosteriorStats.hdi` (`hpd` retains its old behavior and is now deprecated)
- `summarize` -> `PosteriorStats.summarize`
- `ChainDataFrame` -> `PosteriorStats.SummaryStats`

Other changes:

- overloads and exports `PosteriorStats.eti`
- changes the default interval probability to 0.89 (PosteriorStats's default)
- changes plots to plot the ETI by default instead of the HDI (consistent with `summarize`'s defaults), adding a new API for setting the interval probability and CI type
- bumps the Julia lower bound to v1.10 (PosteriorStats v3's Julia lower bound)
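For readers unfamiliar with the two interval types, here is a minimal plain-Julia sketch of what they measure. It is not PosteriorStats's implementation: `eti_sketch` and `hdi_sketch` are hypothetical names, the ETI uses `Statistics.quantile` with its default interpolation, and the HDI is estimated as the narrowest window of sorted draws, which assumes a unimodal sample.

```julia
using Statistics

# Equal-tailed interval (ETI): drop (1 - prob)/2 of the mass from each tail.
function eti_sketch(x::AbstractVector{<:Real}; prob::Real = 0.89)
    α = (1 - prob) / 2
    q = quantile(x, [α, 1 - α])
    return (q[1], q[2])
end

# Highest density interval (HDI), estimated as the narrowest window of
# sorted draws containing at least ceil(prob * n) of them. This assumes a
# unimodal sample; a multimodal posterior can need several intervals.
function hdi_sketch(x::AbstractVector{<:Real}; prob::Real = 0.89)
    s = sort(x)
    n = length(s)
    k = clamp(ceil(Int, prob * n), 1, n)  # number of draws the window must contain
    widths = [s[i + k - 1] - s[i] for i in 1:(n - k + 1)]
    i = argmin(widths)
    return (s[i], s[i + k - 1])
end

# A right-skewed sample, where the two intervals differ.
x = [0.0, 0.0, 0.0, 0.0, 10.0]
eti_sketch(x; prob = 0.8)  # ≈ (0.0, 6.0)
hdi_sketch(x; prob = 0.8)  # (0.0, 0.0)
```

On this skewed sample the HDI sits on the mode, while the ETI extends toward the long tail.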

The replacement of ChainDataFrame and the ~~slight~~ change in API and behavior of the methods makes this a breaking change.

Implements #430

e.g.

```julia
julia> val = rand(500, 2, 3);

julia> chn
Chains MCMC chain (4000×4×2 Array{Float64, 3}):

Iterations        = 1:2:7999
Number of chains  = 2
Samples per chain = 4000
parameters        = param_1, param_2, param_3, param_4


Use `describe(chains)` for summary statistics and quantiles.


julia> describe(chn)
Chains MCMC chain (1000×8×4 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 4
Samples per chain = 1000
parameters        = a, b
internals         = c, d, e, f, g, h

Summary Statistics
     mean    std  eti89            ess_tail  ess_bulk  rhat  mcse_mean  mcse_std
 a  0.496  0.285  0.0550 .. 0.946      3970      4088  1.00     0.0045    0.0021
 b  0.504  0.290  0.0551 .. 0.946      3972      4073  1.00     0.0045    0.0021

Quantiles
      2.5%  25.0%  50.0%  75.0%  97.5%
 a  0.0223  0.258  0.492  0.732  0.976
 b  0.0240  0.253  0.504  0.759  0.976

julia> hdi(chn)
HDI
    hdi89
 a  0.00178 .. 0.887
 b    0.112 .. 0.997

julia> eti(chn)
ETI
    eti89
 a  0.0550 .. 0.946
 b  0.0551 .. 0.946
```

Closes #491

sethaxen · 2023-08-20T14:57:21Z

src/stats.jl

```diff
     changerates, mvchangerate = changerate(chains)
 
     # Summarize the results in a named tuple.
-    nt = (; zip(names_of_params, changerates)..., multivariate = mvchangerate)
```


Lacking a parameter column, this named tuple broke the `show` method. But since there is a change rate for every parameter, it makes more sense to do what `gelmandiag_multivariate` does: return a SummaryStats for the marginal values and return the multivariate change rate separately.

torfjelde · 2024-02-10

Genuine question: what is the change rate in this context?

sethaxen · 2024-02-10

From inspecting the code, for each parameter and chain it's the fraction of draws that differ from the previous draw. I suppose it's similar to an "acceptance rate."
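The description above can be sketched in plain Julia. This assumes draws are stored as an `ndraws × nparams × nchains` array and pools transitions across chains (for a 3-D array, where every chain has the same length, this equals averaging the per-chain rates). It also assumes the multivariate rate counts a transition as a change when any parameter changed; that is an assumption about `mvchangerate`, not a reading of MCMCChains' source. `changerate_sketch` is a hypothetical name.

```julia
# x is an ndraws × nparams × nchains array of draws.
# Returns per-parameter change rates and a multivariate change rate
# (assumed: a transition counts if any parameter changed), pooled over chains.
function changerate_sketch(x::AbstractArray{<:Real,3})
    ndraws, nparams, nchains = size(x)
    changes = zeros(nparams)  # per-parameter count of changed transitions
    mvchanges = 0             # transitions where any parameter changed
    for c in 1:nchains, i in 2:ndraws
        changed = view(x, i, :, c) .!= view(x, i - 1, :, c)
        changes .+= changed
        mvchanges += any(changed)
    end
    ntransitions = (ndraws - 1) * nchains
    return changes ./ ntransitions, mvchanges / ntransitions
end

# Parameter 1 moves on 2 of its 3 transitions; parameter 2 never moves.
x = zeros(4, 2, 1)
x[:, 1, 1] = [1.0, 2.0, 2.0, 3.0]
x[:, 2, 1] .= 5.0
rates, mvrate = changerate_sketch(x)
```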

src/stats.jl


github-actions

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter v1.0.62

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

MCMCChains.jl/test/summarize_tests.jl

Lines 67 to 71 in e31f4f9

```julia
    three_parms_df = DataFrame(summarize(
        chns[[:a, :b, :c]],
        mean, std;
        sections = [:parameters, :internals],
    ))
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

MCMCChains.jl/test/summarize_tests.jl

Lines 75 to 80 in e31f4f9

```julia
    three_parms_df_2 = DataFrame(summarize(
        chns[[:a, :b, :g]],
        :mymean => mean,
        :mystd => std;
        sections = [:parameters, :internals],
    ))
```

github-actions · 2025-09-15T12:42:27Z

test/diagnostic_tests.jl

```julia
    @test chn3.info == chn.info

    @test all(MCMCChains.indiscretesupport(chn) .== [false, false, false, true])
    @test setinfo(chn, NamedTuple{(:A, :B)}((1,2))).info == NamedTuple{(:A, :B)}((1,2))
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change

```diff
-    @test setinfo(chn, NamedTuple{(:A, :B)}((1,2))).info == NamedTuple{(:A, :B)}((1,2))
+    @test setinfo(chn, NamedTuple{(:A, :B)}((1, 2))).info == NamedTuple{(:A, :B)}((1, 2))
```

github-actions · 2025-09-15T12:42:27Z

test/diagnostic_tests.jl

```julia
end

@testset "function tests" begin
    tchain = Chains(rand(niter, nparams, nchains), ["a", "b", "c"], Dict(:internals => ["c"]))
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change

```diff
-    tchain = Chains(rand(niter, nparams, nchains), ["a", "b", "c"], Dict(:internals => ["c"]))
+    tchain =
+        Chains(rand(niter, nparams, nchains), ["a", "b", "c"], Dict(:internals => ["c"]))
```

github-actions · 2025-09-15T12:42:27Z

test/diagnostic_tests.jl

```julia
            lags = MCMCChains._default_lags(c, append_chains)
            @test lags == filter!(x -> x < n, [1, 5, 10, 50])

            acor = autocor(c; append_chains=append_chains)
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change

```diff
-            acor = autocor(c; append_chains=append_chains)
+            acor = autocor(c; append_chains = append_chains)
```

github-actions · 2025-09-15T12:42:27Z

test/ess_rhat_tests.jl

```julia
    x = rand(10_000, 40, 10)
    chain = Chains(x)

    for autocov_method in (AutocovMethod(), FFTAutocovMethod(), BDAAutocovMethod()), kind in (:bulk, :basic), f in (ess, ess_rhat, rhat)
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change

```diff
-    for autocov_method in (AutocovMethod(), FFTAutocovMethod(), BDAAutocovMethod()), kind in (:bulk, :basic), f in (ess, ess_rhat, rhat)
+    for autocov_method in (AutocovMethod(), FFTAutocovMethod(), BDAAutocovMethod()),
+        kind in (:bulk, :basic),
+        f in (ess, ess_rhat, rhat)
```

github-actions · 2025-09-15T12:42:28Z

test/ess_rhat_tests.jl


```julia
        # analyze array
        ess_array, rhat_array = ess_rhat(
            permutedims(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change

```diff
-            permutedims(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
+            permutedims(x, (1, 3, 2));
+            autocov_method = autocov_method,
+            kind = kind,
```

github-actions · 2025-09-15T12:42:28Z

test/mcse_tests.jl


```julia
                # analyze array
                mcse_array = mcse(
                    PermutedDimsArray(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change

```diff
-                    PermutedDimsArray(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
+                    PermutedDimsArray(x, (1, 3, 2));
+                    autocov_method = autocov_method,
+                    kind = kind,
```

github-actions · 2025-09-15T12:42:28Z

test/missing_tests.jl

```julia
    rf_1 = rafterydiag(chn)
    rf_2 = rafterydiag(chn_m)

    @testset "diagnostics missing tests" for i in 1:nchains
```

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change

```diff
-    @testset "diagnostics missing tests" for i in 1:nchains
+    @testset "diagnostics missing tests" for i = 1:nchains
```


sethaxen · 2025-09-22T22:42:45Z

src/plot.jl

```diff
+- `ci_fun` (default: `eti`): The function used to compute the credible intervals.
+  (Can be [`eti`](@ref) or [`hdi`](@ref))
+
+- `ci_probs` (default: `[$DEFAULT_CI_PROB, 0.8]`): The probability mass(es) of the credible
```


These two probabilities (0.89 and 0.8) are now quite close; perhaps the 0.8 should be decreased. For reference, ArviZ uses [0.89, 0.5] as its default probs.

That makes sense, happy to change it to 0.5.

sethaxen · 2025-09-22T22:50:26Z

This is now up to date with PosteriorStats and ready for review. The remaining formatting suggestions all seem to be on lines not touched by this PR, though close to lines that it does touch.

Note that the only currently documented interface of SummaryStats is its Tables interface. Instead of the old DataFrames-inspired behavior, though, SummaryStats is both a Table and a Table sink (like DataFrame), so one can easily work with it by converting it to a DataFrame, manipulating it as desired, and then converting back to a SummaryStats.
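A minimal sketch of the "manipulate as a table" step, written in plain Julia rather than with DataFrames so it depends on neither package's API. It assumes the statistics have a `label` column (mirroring `ss[:label]` used later in this thread) and uses a NamedTuple of equal-length vectors, which is itself a Tables.jl column table, as a stand-in; `stats`, `keep`, and `subset_stats` are hypothetical names. With DataFrames, the same row selection would be done on the converted DataFrame before converting back.

```julia
# Stand-in for a converted SummaryStats: a NamedTuple of equal-length
# vectors, which Tables.jl treats as a column table.
stats = (label = [:a, :b, :c], mean = [0.50, 0.60, 0.70], std = [0.10, 0.20, 0.30])

# Sub-select rows by parameter label, keeping every column aligned.
keep = findall(in((:a, :c)), stats.label)
subset_stats = map(col -> col[keep], stats)
```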

penelopeysm

Thanks for this @sethaxen!

I'm probably not going to review line by line, but I can tell that the tests have largely been preserved, so I'm going to trust that all the minutiae have been ironed out as part of making CI pass and focus on broader things.

My first question (not sure if there are more) is to do with the SummaryStats struct. What's the best way to index into it? Is there a way to use the label :a here?

```julia
julia> ss = mean(Chains(ones(3, 2), [:a, :b]))
Mean
    mean
 a  1.00
 b  1.00

julia> ss[:mean][1]
1.0
```

src/summarize.jl

sethaxen · 2025-09-23T17:12:28Z

> My first question (not sure if there are more) is to do with the SummaryStats struct. What's the best way to index into it? Is there a way to use the label `:a` here?

As documented here, it can currently be treated like an OrderedDict, but this is not part of the official API; only the Tables interface is. That's because I haven't yet decided which API I want to support. For that I need user feedback on how they'd most like to use it (e.g. do they most want to select columns first, rows first, or both simultaneously? Will they want to select a subset of labels/columns by name?). I don't want to recreate DataFrame's entire indexing API if I don't need to. In the meantime, there's the OrderedDict-like interface for interactive use only, as well as the ability to interconvert between SummaryStats and a DataFrame for more complicated sub-selection.

When I eventually add an official indexing API, it won't require a breaking release here or downstream.

penelopeysm · 2025-09-26T01:00:27Z

Thanks for the explanation!

Just a disclaimer to begin: I have absolutely no intention of reviewing this unfairly, but I'm also cognisant that there's some level of conflict of interest because I'm separately working on FlexiChains (and trying to solve all these same tricky API decisions!), and I can't prove that I'm fully disinterested. Feel free to say if you'd rather someone else review; I won't be offended in the least.

Whilst I'm very happy with the general aim of centralising functionality and reusing code, I think as it stands it's a loss in terms of usability if it's not possible to use the parameter name somehow. IMO, I should be able to combine the three things `ss`, `:mean`, and `:a` in order to get the mean of `:a` stored in `ss`. ChainDataFrame does this:

```julia
julia> using MCMCChains; ss = mean(Chains(rand(3, 2), [:a, :b]))
Mean
  parameters      mean
      Symbol   Float64

           a    0.6386
           b    0.5754


julia> ss[:a, :mean]
0.6385968125591274
```

For SummaryStats right now I've only come up with this, which is a bit verbose:

```julia
julia> using MCMCChains; ss = mean(Chains(rand(3, 2), [:a, :b]))
Mean
     mean
 a  0.460
 b  0.647

julia> ss[:mean][findfirst(isequal(:a), ss[:label])]
0.46036692554959524
```

Once there is an interface like that, I'd be very happy to approve this PR. I'm not particularly wedded to the interface being `getindex` or `ss[:a, :mean]`, or to row-based indexing in general. For example, if you had `ss[:mean]` return a NamedTuple of `(a = 0.460, b = 0.647)`, then we could do `ss[:mean].a` or `ss[:mean][:a]`, and I'd be fine with that too (as long as it's documented -- which brings us to the next paragraph).
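Both access patterns discussed above can be sketched against a stand-in table. This assumes, as a simplification, that the statistics are a NamedTuple of columns with a `label` column (mirroring `ss[:label]` and `ss[:mean]` in the examples above); `getstat` and `statcolumn` are hypothetical helpers, not part of PosteriorStats or MCMCChains.

```julia
# Stand-in for a SummaryStats: a NamedTuple of columns with a `label` column.
ss = (label = [:a, :b], mean = [0.460, 0.647])

# Hypothetical helper: fetch one statistic for one parameter by name.
function getstat(table, label::Symbol, stat::Symbol)
    i = findfirst(isequal(label), table.label)
    i === nothing && throw(KeyError(label))
    return getproperty(table, stat)[i]
end

# Hypothetical helper: one statistic as a NamedTuple keyed by parameter label.
function statcolumn(table, stat::Symbol)
    return NamedTuple{Tuple(table.label)}(Tuple(getproperty(table, stat)))
end

getstat(ss, :a, :mean)    # 0.46
statcolumn(ss, :mean)     # (a = 0.46, b = 0.647)
statcolumn(ss, :mean).b   # 0.647
```

The first helper mirrors ChainDataFrame's `ss[:a, :mean]`; the second mirrors the `ss[:mean].a` style, at the cost of building a new NamedTuple on each call.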

Regarding documentation, I just wanted to explain my general stance. I do indeed see that you're documenting things in PosteriorStats.jl (which is great!), but if we bring PosteriorStats into MCMCChains, we also need to worry about it in MCMCChains and (perhaps more importantly) the main Turing docs, because this is very user-facing. Currently the interface of MCMCChains is quite underdocumented (it essentially amounts to "learn by reading the examples in the docs"). This existing situation is of course not your fault at all, but I'd like to make sure that when we add new functionality, someone also ensures that it's explained in the main Turing docs.

yebai · 2025-09-28T15:12:16Z

> as it stands it's a loss in terms of usability if it's not possible to use the parameter name somehow. IMO, I should be able to combine the three things `ss`, `:mean`, and `:a` in order to get the mean of `:a` stored in `ss`.

Can we consider this as part 1 of a larger change, and then have part 2 catch up with the interface?

@sethaxen, do you think you will be happy with that? Or would you rather have another Turing.jl team member (perhaps @shravanngoswamii) work on part 2?

shravanngoswamii · 2025-09-28T15:16:57Z

I would be happy to take on part 2 of this PR. @sethaxen, please let me know any comments or thoughts I should keep in mind!

sethaxen · 2025-09-30T09:38:16Z

To clarify, no changes at all would be needed in MCMCChains to support a new indexing syntax for SummaryStats; all changes would be in PosteriorStats, and they would be non-breaking. When prioritizing tasks, I had given swapping PosteriorStats into MCMCChains higher priority than finalizing SummaryStats's interface, but I can see @penelopeysm's reasonable point that the priorities could be reversed, and I'm fine with holding off on merging this PR until that interface is finalized. I'll draft a design issue and link it here for feedback.

yebai · 2025-12-01T09:46:50Z

@sethaxen, a gentle reminder on this: it would be great to complete this before 2026!

sethaxen added 21 commits August 20, 2023 15:45

- Add PosteriorStats as dependency (43a5630)
- Import and reexport PosteriorStats functions (2994a56)
- Forward to PosteriorStats.summarize (9ffb60b)
- Update docstring (a6d16e5)
- Update docstring (4e99fe8)
- Forward summarystats to summarize (38dbdac)
- Simplify mean implementation (395a3d8)
- Simplify quantile implementation (4fa204e)
- Replace hpd with hdi (5c0f35d)
- Deprecate hpd (3660d0e)
- Simplify autocor implementation (e3d2d16)
- Remove unused keyword etype (49af2d9)
- Explicitly build list of stats (bdde660)
- Simultaneously compute all quantiles (d851307)
- Print an extra newline (1717483)
- Use and export SummaryStats (bf06653)
- Use SummaryStats in place of ChainsDataFrame (147b56b)
- Update and repair changerate (706f29c)
- Remove ChainDataFrame (2c07794)
- Update docs (58ae52a)
- Increment major version (340a694)

sethaxen commented Aug 20, 2023


sethaxen added 8 commits August 20, 2023 19:23

- Increment MCMCChains compat for docs (a67ac11)
- Refer to processed chains (1af83c8)
- Fix doctest (4aed409)
- Add back append_chains keyword (eef9393)
- Compute all lags simultaneously (6a1b482)
- Vectorize before autocor (c7d2021)
- Correctly insert chain id into name (bfe8deb)
- Update diagnostic tests (646e108)

github-actions bot reviewed Sep 15, 2025

src/stats.jl

Accept formatting suggestions

e31f4f9

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions bot reviewed Sep 15, 2025


Accept more formatting suggestions

994c0c5

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

sethaxen marked this pull request as draft September 15, 2025 12:55

Accept even more formatting suggestions

1009e9c

penelopeysm mentioned this pull request Sep 16, 2025

more stats penelopeysm/FlexiChains.jl#15

Closed

4 tasks

sethaxen added 4 commits September 22, 2025 23:26

- Bump PosteriorStats compat to 0.4 (f54b8c3)
- Set CI prob to default used by PosteriorStats (ccd9f5b)
- Update doctests (f89a4fb)
- Update tests (5025d32)

sethaxen commented Sep 22, 2025


Bump major version number

cffd7eb

sethaxen marked this pull request as ready for review September 22, 2025 22:47

yebai requested review from penelopeysm and removed request for cpfiffer September 23, 2025 07:43

penelopeysm requested changes Sep 23, 2025

src/summarize.jl

Improve docstring of summarize

75721b9

Merge branch 'main' into posteriorstats

160b722

gragusa mentioned this pull request Oct 15, 2025

Add compatibility with PrettyTable v3.0 #498

Merged

shravanngoswamii assigned sethaxen and unassigned shravanngoswamii Nov 28, 2025

shravanngoswamii linked an issue Dec 1, 2025 that may be closed by this pull request

No documentation for ChainDataFrame? #482

Open

