Skip to content

use posterior::gpdfit and posterior::qgeneralized_pareto()#305

Merged
jgabry merged 12 commits into
masterfrom
use-posterior-gpdfit
Apr 17, 2026
Merged

use posterior::gpdfit and posterior::qgeneralized_pareto()#305
jgabry merged 12 commits into
masterfrom
use-posterior-gpdfit

Conversation

@avehtari

Copy link
Copy Markdown
Member

Since we do use posterior package elsewhere,
replace gpdfit() and qgpd() with posterior::gpdfit and posterior::qgeneralized_pareto()

This will make maintenance easier. For example, there is a PR for posterior::gpdfit() to make it more robust and posterior::qgeneralized_pareto() already is more robust than qgpd()

This is related to #249, but timing is due to the fix PR for posterior::gpdfit()

@avehtari avehtari requested a review from jgabry October 17, 2025 17:10

@jgabry jgabry left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the R cmd check error is because the NAMESPACE file is still trying to export gpdfit from loo. If you run devtools::document() it should fix this and then we can see if all the tests pass.

@avehtari

Copy link
Copy Markdown
Member Author

If it has been exported, then someone might be using it? Should I put it back and define it with posterior::qgeneralized_pareto()?

@avehtari

Copy link
Copy Markdown
Member Author

At least in github only I and Juho Timonen have used loo::gpdfit, so I guess it can be removed

@jgabry

jgabry commented Oct 17, 2025

Copy link
Copy Markdown
Member

Good point. Yeah you can just replace the bodies of the functions with calls to the posterior versions.

@jgabry

jgabry commented Oct 17, 2025

Copy link
Copy Markdown
Member

At least in github only I and Juho Timonen have used loo::gpdfit, so I guess it can be removed

OK in that case, we can try removing it and we will find out in the reverse dependency checks whether any packages are using it

@jgabry

jgabry commented Oct 17, 2025

Copy link
Copy Markdown
Member

The error now is because apparently these functions aren't exported from posterior, they're just internal functions. That would be easy to change but it would require doing a CRAN release of posterior.

@avehtari

Copy link
Copy Markdown
Member Author

my bad, I guess this needs to wait for the next posterior release

@jgabry

jgabry commented Oct 17, 2025

Copy link
Copy Markdown
Member

There is also #290 from @VisruthSK that is waiting to be merged until there is a new release of posterior

@VisruthSK VisruthSK marked this pull request as draft February 21, 2026 21:30
@codecov-commenter

codecov-commenter commented Apr 7, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.75%. Comparing base (df56db4) to head (8d83fc2).
⚠️ Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #305      +/-   ##
==========================================
+ Coverage   92.73%   92.75%   +0.02%     
==========================================
  Files          31       31              
  Lines        3057     2983      -74     
==========================================
- Hits         2835     2767      -68     
+ Misses        222      216       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@VisruthSK VisruthSK marked this pull request as ready for review April 7, 2026 17:40
@VisruthSK VisruthSK requested a review from jgabry April 7, 2026 17:41
@jgabry

jgabry commented Apr 7, 2026

Copy link
Copy Markdown
Member

I think this needs to bump the posterior version in DESCRIPTION (unless it's merged after the other one that does that, but not sure which will be merged first)

@VisruthSK

Copy link
Copy Markdown
Member

Right, in my head it was being merged after the previous one. Will fix that now.

@jgabry

jgabry commented Apr 7, 2026

Copy link
Copy Markdown
Member

Right, in my head it was being merged after the previous one. Will fix that now.

We need to figure out what to do about the warnings in the other PR, so will probably do this one first. But for this one, it occurs to me that we need to reexport gpdfit from posterior in order to avoid breaking backwards compatibility (it's currently exported from loo so could be called as loo::gpfit in someone's code.

@VisruthSK

Copy link
Copy Markdown
Member

Will restore tests too then

@jgabry

jgabry commented Apr 7, 2026

Copy link
Copy Markdown
Member

Do you think we need to test it in addition to exporting it? I assume it's tested in posterior. Maybe good to test here too, but I'm not sure

@VisruthSK

Copy link
Copy Markdown
Member

If we are rexporting it I think it makes to test, mainly so that the API is the same no? The tests were super minimal anyway.

@jgabry

jgabry commented Apr 7, 2026

Copy link
Copy Markdown
Member

Ok yeah good call

@jgabry

jgabry commented Apr 8, 2026

Copy link
Copy Markdown
Member

Aside from needing to reexport and add the tests back, this looks good.

@jgabry

jgabry commented Apr 8, 2026

Copy link
Copy Markdown
Member

Since we had a snapshot test, adding the tests back should tell us whether the posterior version returns the exact same values as our previous loo version right? Or was the snapshot generated with this new version? We should be sure we're not getting meaningfully different results.

@VisruthSK

Copy link
Copy Markdown
Member

Snaps aren't modified in this run, so if tests pass they're being checked against the originally generated ones.

@jgabry jgabry left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks. Although I wish we had #348 set up to make sure the posterior versions aren't less efficient. For example, maybe they add overhead due to doing more type and argument checking that we skipped in the previous loo implementation?

@VisruthSK

Copy link
Copy Markdown
Member

I mean this isn't an urgent PR right? I can setup the mechanics of touchstone if you'd like, but the real work is deciding what benchmarks to run, which will require more careful thought and discussion.

@jgabry

jgabry commented Apr 8, 2026

Copy link
Copy Markdown
Member

Thanks, yeah not super urgent. I don't think the benchmarks need to be too complicated. We probably don't need to benchmark individual functions other than loo(), at least not right away (or maybe ever). Since we're just worried about efficiency here it doesn't matter if it's a realistic example that we use for the benchmark. Maybe codex and claude are good enough to suggest some that don't take too long to run but are long enough that we notice performance regressions? That's probably the tricky part here, right? We don't want CI to take forever but we want to notice real regressions. Or are there other considerations you can think of?

@VisruthSK

Copy link
Copy Markdown
Member

Hm fair enough I guess we could just run the toplevel function. I suppose that's what I've been doing too--but it would be nice to get a little but of a profile by breaking down the call into some cogent sub-calls--i.e., if there's a PR which speeds up PSIS, it'd be nice to see that most things don't change but PSIS takes half the time or whatever.

@jgabry

jgabry commented Apr 8, 2026

Copy link
Copy Markdown
Member

Yeah good point

@VisruthSK

VisruthSK commented Apr 8, 2026

Copy link
Copy Markdown
Member

I guess we could shelve the finer touchstone tests for now and revist once we already have working pipeline--writing in #352.

@github-actions

Copy link
Copy Markdown

This is how benchmark results would change (along with a 95% confidence interval in relative change) if cddb7cc is merged into master:

  • 🚀loo_function: 1.99s -> 1.74s [-12.9%, -12.07%]
  • 🚀loo_matrix: 1.53s -> 1.28s [-17.25%, -16.27%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@VisruthSK VisruthSK mentioned this pull request Apr 14, 2026
@jgabry

jgabry commented Apr 14, 2026

Copy link
Copy Markdown
Member

This is how benchmark results would change (along with a 95% confidence interval in relative change) if cddb7cc is merged into master:

  • 🚀loo_function: 1.99s -> 1.74s [-12.9%, -12.07%]
  • 🚀loo_matrix: 1.53s -> 1.28s [-17.25%, -16.27%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Ok so using the posterior versions makes it faster?

@VisruthSK

Copy link
Copy Markdown
Member

Yup looks like it. We should probably increase the number of repetitions if we care about the actual reduction amount. Kinda odd to have confidence intervals in a Stan project...

@jgabry

jgabry commented Apr 15, 2026

Copy link
Copy Markdown
Member

I agree we should increase the repetitions. We’ll forgive the author of the touchstone package for the lack of a Bayesian analysis ;)

@VisruthSK

Copy link
Copy Markdown
Member

New package, touchstan?

Do you want to bump up the repeats and see how the numbers shake out, or do you think this is good to merge?

@avehtari

Copy link
Copy Markdown
Member Author

Confidence interval is acceptable, but definitely there is not enough accuracy to justify more than two significant digits

1.53s -> 1.28s [-17.25%, -16.27%]

@jgabry

jgabry commented Apr 15, 2026

Copy link
Copy Markdown
Member

Do you want to bump up the repeats and see how the numbers shake out, or do you think this is good to merge?

I would increase the repetitions if the code is pretty fast. But it's still taking ~1 hr (https://github.com/stan-dev/loo/actions/runs/24422791263/job/71349231024?pr=305). Is that because it's not caching the packages?

@VisruthSK

Copy link
Copy Markdown
Member

Yes I think it's building a lot of packages from source. I think I know a way around that if reruns are slow. If it's back to the times we saw before, about 5 minutes for a whole run at this repetition count, what do you think would be a good number of reps? How slow is okay, 30 min?

@jgabry

jgabry commented Apr 15, 2026

Copy link
Copy Markdown
Member

Yeah, I think 20-30 minutes is fine

Update posterior gpdfit branch to get newer touchstone functionality
@github-actions

Copy link
Copy Markdown

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 8d83fc2 is merged into master:

  • 🚀loo_function: 2.07s -> 1.74s [-16.4%, -15.61%]
  • 🚀loo_matrix: 1.57s -> 1.28s [-18.73%, -18.17%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@VisruthSK

Copy link
Copy Markdown
Member

@jgabry good to merge? This is with n=60 reps per branch. Slow touchstone action because I moved to new ubuntu and R version.

@jgabry

jgabry commented Apr 17, 2026

Copy link
Copy Markdown
Member

Thanks, yeah I think this one is good to go. Merging now.

@jgabry jgabry merged commit 85db294 into master Apr 17, 2026
8 checks passed
@jgabry jgabry deleted the use-posterior-gpdfit branch April 17, 2026 14:35
vinniott added a commit to vinniott/loo that referenced this pull request May 10, 2026
…pdfit"

This reverts commit 85db294, reversing
changes made to df56db4.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants