[ENH] V1 -> V2 Migration : Runs#1616

Open
Omswastik-11 wants to merge 318 commits into openml:main from Omswastik-11:runs-migration-stacked

Conversation

Contributor

@Omswastik-11 Omswastik-11 commented Jan 15, 2026

Metadata

  • Reference Issue:
  • New Tests Added:
  • Documentation Updated:
  • Change Log Entry:

Details

Fixes #1624

@geetu040 geetu040 mentioned this pull request Jan 15, 2026
18 tasks

codecov-commenter commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 72.35772% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.89%. Comparing base (1f6fed4) to head (8c8426a).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| openml/_api/resources/run.py | 75.94% | 19 Missing ⚠️ |
| openml/_api/clients/http.py | 59.09% | 9 Missing ⚠️ |
| openml/runs/run.py | 68.42% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1616      +/-   ##
==========================================
+ Coverage   81.45%   81.89%   +0.43%     
==========================================
  Files          63       63              
  Lines        5124     5170      +46     
==========================================
+ Hits         4174     4234      +60     
+ Misses        950      936      -14     

Collaborator

@geetu040 geetu040 left a comment

Please sync with the base PR.
The SDK code looks good so far; please take a look at #1575 (comment) and make changes accordingly where needed.
All tests (existing and new) should pass to make sure we retain the original functionality of the SDK.

Comment thread openml/_api/resources/runs.py Outdated
Comment thread openml/_api/resources/runs.py Outdated
Comment thread openml/_api/resources/runs.py Outdated
Comment thread openml/runs/functions.py Outdated
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
@Omswastik-11 Omswastik-11 requested a review from geetu040 January 30, 2026 09:50
@Omswastik-11 Omswastik-11 marked this pull request as ready for review January 30, 2026 09:50
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Comment thread openml/runs/functions.py Outdated
Comment on lines +822 to +828
    use_cache = not ignore_cache
    reset_cache = ignore_cache
    return api_context.backend.runs.get(
        run_id,
        use_cache=use_cache,
        reset_cache=reset_cache,
    )
Collaborator

use_cache should always be True, since the method always supports caching.
Only reset_cache should depend on ignore_cache.
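
The suggestion above might look roughly like the sketch below. FakeRunsBackend and get_run are hypothetical stand-ins used only for illustration, not the SDK's actual classes; only the use_cache, reset_cache, and ignore_cache names come from the diff under review.

```python
class FakeRunsBackend:
    """Hypothetical stand-in for the runs backend; it records the flags it receives."""

    def __init__(self):
        self.calls = []

    def get(self, run_id, *, use_cache, reset_cache):
        self.calls.append(
            {"run_id": run_id, "use_cache": use_cache, "reset_cache": reset_cache}
        )
        return {"run_id": run_id}


def get_run(backend, run_id, ignore_cache=False):
    # Caching stays enabled on every call; ignore_cache only decides whether
    # the cached entry is discarded and fetched again.
    return backend.get(run_id, use_cache=True, reset_cache=ignore_cache)
```

With this shape, ignore_cache=True still writes the fresh result back to the cache, rather than bypassing the cache entirely.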

Copilot AI review requested due to automatic review settings May 11, 2026 12:35
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Comment thread openml/_api/clients/http.py
Comment thread openml/_api/clients/http.py Outdated
Comment thread openml/_api/resources/run.py Outdated
Comment thread openml/_api/resources/base/resources.py
Comment thread tests/test_api/test_run.py Outdated
@Omswastik-11 Omswastik-11 requested a review from geetu040 May 11, 2026 12:47
Collaborator

@geetu040 geetu040 left a comment

Nicely done @Omswastik-11.

@PGijsbers, could you please review/merge this PR when you get a chance?

There is currently one issue caused by differences between the test-server and local-server database entities, which is temporarily patched here: #1616 (comment).

I also mentioned this earlier on Slack (here); we can continue the discussion there.

Collaborator

@PGijsbers PGijsbers left a comment

Small changes or clarifications requested; please see the comments.

path_parts = parsed_url.path.strip("/").split("/")

filtered_params = {k: v for k, v in params.items() if k != "api_key"}
params_part = [urlencode(filtered_params)] if filtered_params else []
Collaborator

This is a good remark, but seeing as this code isn't touched by this PR, I would advocate fixing this in a separate PR.

Comment thread openml/_api/clients/http.py Outdated
Comment on lines +102 to +109
    if response.content.startswith(b"PK\x03\x04"):
        return "body.zip"

    try:
        arff.loads(response.text)
        return "body.arff"
    except arff.ArffException:
        pass
Collaborator

Is there no HTTP header data that would allow us to tell what the content (and file name) should be?
Otherwise, at least for ARFF, the spec states that the first non-comment line of the file should be (not case sensitive): @relation <relation name>. So we could look for that instead of parsing the entire file content.
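
A minimal sketch of the lighter-weight check described above, assuming the response body is available as bytes. The function names looks_like_arff and guess_body_name are hypothetical and are not taken from the PR; the ZIP signature and the "body.zip" / "body.arff" names come from the diff under review.

```python
from typing import Optional


def looks_like_arff(text: str) -> bool:
    """Return True if the first non-blank, non-comment line declares @relation."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and ARFF comment lines
        # Keyword matching is case-insensitive per the ARFF format.
        return stripped.lower().startswith("@relation")
    return False


def guess_body_name(content: bytes) -> Optional[str]:
    """Guess a cache file name from the payload; None means 'unknown'."""
    if content.startswith(b"PK\x03\x04"):  # ZIP local file header signature
        return "body.zip"
    if looks_like_arff(content.decode("utf-8", errors="replace")):
        return "body.arff"
    return None
```

This inspects only the leading lines of the payload instead of parsing the whole file, which is the cost the comment above is trying to avoid.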

Collaborator

Is there no HTTP header data that would allow us to tell what the content (and file name) should be?

I tried, but didn't find anything.

Otherwise, at least for ARFF, the spec states that the first non-comment line of the file should be (not case sensitive): @relation <relation name>.

Sounds good, I can give this a try.

Collaborator

fixed in 5c20f22

OpenMLHashException
If checksum verification fails.
"""
url = urljoin(self.server, path)
Collaborator

Ignore: if this isn't the case already, this should be normalized when openml.config.server is set, not at each site that uses it.

Comment thread openml/_api/clients/http.py Outdated
Comment on lines +598 to +602
    if use_api_key:
        params["api_key"] = self.api_key

    if method.upper() in {"POST", "PUT", "PATCH"}:
        data = {**params, **data}
Collaborator

Ignore: it raises an exception if api_key is None; that is the statement preceding this line.

Comment on lines +102 to +106
self,
limit: int,
offset: int,
*,
ids: builtins.list[int] | None = None,
Collaborator

Please address or explain; I see you have dismissed previous comments about this, so presumably there is a reason?

Comment thread openml/_api/resources/run.py Outdated

    # Fall back to generic oml:id (used by other resources)
    if "oml:id" in root_value:
        return int(root_value["oml:id"])
Collaborator

If run responses always return oml:run_id, when do we expect this code path to be correct to run?

Collaborator

@Omswastik-11 since this method is overridden for runs, we shouldn't expect to handle other resources here. Logically, this path should therefore be unreachable, as Pieter has said.

Contributor Author

Yeah got it. I removed it.

Comment thread tests/test_api/test_run.py Outdated
Comment on lines +32 to +41
def test_run_v1_get(run_v1, with_test_cache):
    try:
        run = run_v1.get(run_id=1)
    except OpenMLServerException as e:
        if e.code == 236 or "Run not found" in str(e):
            run = run_v1.get(run_id=25)
        else:
            raise
    _assert_run_shape(run)

Collaborator

Didn't we have a way to check whether a local or a non-local server is configured?
If so, I would prefer to use that, e.g.,

run_id = 25 if openml config is local else 1

That embeds this knowledge into the code so it's clear for future maintainers.
We probably do not have the time to address this on our end for a while longer :(
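
One way to embed that knowledge explicitly is sketched below. The helper pick_test_run_id is hypothetical, and the OPENML_USE_LOCAL_SERVICES environment variable is assumed to be the signal for the local setup (that name is the one used in a later revision of this test in this thread); the run ids 1 and 25 are the ones discussed above.

```python
import os


def pick_test_run_id(environ=None):
    """Pick a run id that is expected to exist on the configured server."""
    env = os.environ if environ is None else environ
    # Assumption: the local docker setup is signalled by this variable being "true".
    use_local = env.get("OPENML_USE_LOCAL_SERVICES", "").lower() == "true"
    # Run 25 is the one seeded locally; run 1 exists on the remote test server.
    return 25 if use_local else 1
```

Passing the environment as an argument keeps the helper easy to test without modifying the real process environment.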

@geetu040
Collaborator

@Omswastik-11 could you go through the comments above? We need to close these discussions.

Copilot AI review requested due to automatic review settings May 22, 2026 12:58
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

openml/runs/run.py:373

  • The ObjectNotPublishedError message here diverges from the established tagging error message used by OpenMLBase.remove_tag (via openml.utils._tag_openml_base), and it also drops the object context. Consider reusing the same wording/format for consistency across entity types.
        if self.run_id is None:
            raise openml.exceptions.ObjectNotPublishedError(
                "Cannot untag a run that has not been published yet."
                " Please publish the run first before being able to untag it.",
            )

Comment thread openml/_api/resources/base/resources.py
Comment thread tests/test_api/test_run.py Outdated
Comment thread openml/runs/run.py
Comment thread openml/_api/clients/http.py Outdated
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>
Copilot AI review requested due to automatic review settings May 22, 2026 13:06
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Comment thread openml/_api/resources/base/resources.py
Comment thread openml/_api/resources/run.py Outdated
Comment thread openml/_api/clients/http.py Outdated
Comment thread openml/_api/clients/http.py
Comment thread tests/test_api/test_run.py Outdated
Copilot AI review requested due to automatic review settings May 22, 2026 13:32
@Omswastik-11 Omswastik-11 requested a review from PGijsbers May 22, 2026 13:33
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Comment on lines +119 to 125
    if len(candidates) > 1:
        raise FileNotFoundError(
            f"Multiple body files found in path: {path} ({[p.name for p in candidates]})"
        )

    return candidates[0].name

Collaborator

I don't think this case can be expected given the v1/v2 endpoints

Comment thread tests/test_api/test_run.py Outdated
Comment on lines +32 to +38
def test_run_v1_get(run_v1, with_test_cache):
    import os

    # Run 1 exists on the remote test server; the local docker server only seeds run 25.
    run_id = 25 if os.getenv("OPENML_USE_LOCAL_SERVICES") == "true" else 1
    run = run_v1.get(run_id=run_id)
    _assert_run_shape(run)
Collaborator

We'll use a cached run instead, so this can be ignored.

# Run 1 exists on the remote test server; the local docker server only seeds run 25.
run_id = 25 if os.getenv("OPENML_USE_LOCAL_SERVICES") == "true" else 1
run = run_v1.get(run_id=run_id)
_assert_run_shape(run)
Collaborator

I believe you were trying to do this at the start. This would be the right way to use a run from the cache, which avoids depending on entities that differ between the two servers.

- def test_run_v1_get(run_v1, with_test_cache):
-     import os
- 
-     # Run 1 exists on the remote test server; the local docker server only seeds run 25.
-     run_id = 25 if os.getenv("OPENML_USE_LOCAL_SERVICES") == "true" else 1
-     run = run_v1.get(run_id=run_id)
-     _assert_run_shape(run)

+ def test_run_v1_get(run_v1, test_files_directory):
+     openml.config.set_root_cache_directory(test_files_directory)
+     run = run_v1.get(run_id=1)
+     _assert_run_shape(run)

Collaborator

updated in 7e57779

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Development

Successfully merging this pull request may close these issues.

[ENH] V1 → V2 API Migration - runs

9 participants