[ENH] V1 -> V2 Migration : Runs#1616
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1616 +/- ##
==========================================
+ Coverage 81.45% 81.89% +0.43%
==========================================
Files 63 63
Lines 5124 5170 +46
==========================================
+ Hits 4174 4234 +60
+ Misses 950 936 -14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
geetu040
left a comment
There was a problem hiding this comment.
sync with base pr
sdk code look good so far, please take a look at #1575 (comment) and make changes accordingly where needed.
all tests (existing and new) should pass to make sure we are retaining the original functionality of the sdk
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
…into runs-migration-stacked
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
…-11/openml-python into runs-migration-stacked
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
| use_cache = not ignore_cache | ||
| reset_cache = ignore_cache | ||
| return api_context.backend.runs.get( | ||
| run_id, | ||
| use_cache=use_cache, | ||
| reset_cache=reset_cache, | ||
| ) |
There was a problem hiding this comment.
use_cache should be true since the method always supports caching
reset_cache should rely on ignore_cache
for more information, see https://pre-commit.ci
geetu040
left a comment
There was a problem hiding this comment.
Nicely done @Omswastik-11.
@PGijsbers, could you please review/merge this PR when you get a chance?
There is currently one issue caused by differences between the test-server and local-server database entities, which is temporarily patched here: #1616 (comment).
I had mentioned this earlier on Slack as well here, we can continue discussion there
PGijsbers
left a comment
There was a problem hiding this comment.
small changes or clarifications requested, please see comments.
| path_parts = parsed_url.path.strip("/").split("/") | ||
|
|
||
| filtered_params = {k: v for k, v in params.items() if k != "api_key"} | ||
| params_part = [urlencode(filtered_params)] if filtered_params else [] |
There was a problem hiding this comment.
This is a good remark, but seeing as this code isn't touched by this PR, I would advocate fixing this in a separate PR.
| if response.content.startswith(b"PK\x03\x04"): | ||
| return "body.zip" | ||
|
|
||
| try: | ||
| arff.loads(response.text) | ||
| return "body.arff" | ||
| except arff.ArffException: | ||
| pass |
There was a problem hiding this comment.
Is there no HTTP header data that would allow us to tell what the content (and file name) should be?
Otherwise, at least for ARFF, the spec states that the first non-comment line of the file should be (not case sensitive): @relation <relation name>. So we could look for that instead of parsing the entire file content.
There was a problem hiding this comment.
Is there no HTTP header data that would allow us to tell what the content (and file name) should be?
I tried but didn't find anything
Otherwise, at least for ARFF, the spec states that the first non-comment line of the file should be (not case sensitive):
@relation <relation name>.
sounds good I could give this a try
| OpenMLHashException | ||
| If checksum verification fails. | ||
| """ | ||
| url = urljoin(self.server, path) |
There was a problem hiding this comment.
ignore: If this isn't the case already, this should be normalized when openml.config.server is set, not each site which uses it.
| if use_api_key: | ||
| params["api_key"] = self.api_key | ||
|
|
||
| if method.upper() in {"POST", "PUT", "PATCH"}: | ||
| data = {**params, **data} |
There was a problem hiding this comment.
ignore: It raises an exception if api_key is None, it's the statement preceding this line..
| self, | ||
| limit: int, | ||
| offset: int, | ||
| *, | ||
| ids: builtins.list[int] | None = None, |
There was a problem hiding this comment.
please address or explain; i see you have dismissed previous comments about this so presumably there is a reason?
|
|
||
| # Fall back to generic oml:id (used by other resources) | ||
| if "oml:id" in root_value: | ||
| return int(root_value["oml:id"]) |
There was a problem hiding this comment.
If run responses always return oml:run_id, when do we expect this code path to be correct to run?
There was a problem hiding this comment.
@Omswastik-11 since this method is overriden for runs, we shouldn't expect to handle other resources here, therefore logically this path should be unreachable as Pieter has said
There was a problem hiding this comment.
Yeah got it. I removed it.
| def test_run_v1_get(run_v1, with_test_cache): | ||
| try: | ||
| run = run_v1.get(run_id=1) | ||
| except OpenMLServerException as e: | ||
| if e.code == 236 or "Run not found" in str(e): | ||
| run = run_v1.get(run_id=25) | ||
| else: | ||
| raise | ||
| _assert_run_shape(run) | ||
|
|
There was a problem hiding this comment.
Didn't we have a way to check whether a local or non-local server configured is being used?
Then I would prefer to use that e.g.,
run_id = 25 if openml config is local else 1
That embeds this knowledge into the code so it's clear for future maintainers.
We probably do not have the time to address this on our end for a while longer :(
|
@Omswastik-11 could you go through the above comments, we'd need to close these discussions. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
openml/runs/run.py:373
- The
ObjectNotPublishedErrormessage here diverges from the established tagging error message used byOpenMLBase.remove_tag(viaopenml.utils._tag_openml_base), and it also drops the object context. Consider reusing the same wording/format for consistency across entity types.
if self.run_id is None:
raise openml.exceptions.ObjectNotPublishedError(
"Cannot untag a run that has not been published yet."
" Please publish the run first before being able to untag it.",
)
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>
…-11/openml-python into runs-migration-stacked
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| if len(candidates) > 1: | ||
| raise FileNotFoundError( | ||
| f"Multiple body files found in path: {path} ({[p.name for p in candidates]})" | ||
| ) | ||
|
|
||
| return candidates[0].name | ||
|
|
There was a problem hiding this comment.
I don't think this case can be expected given the v1/v2 endpoints
| def test_run_v1_get(run_v1, with_test_cache): | ||
| import os | ||
|
|
||
| # Run 1 exists on the remote test server; the local docker server only seeds run 25. | ||
| run_id = 25 if os.getenv("OPENML_USE_LOCAL_SERVICES") == "true" else 1 | ||
| run = run_v1.get(run_id=run_id) | ||
| _assert_run_shape(run) |
There was a problem hiding this comment.
we'll use cached run instead, so this can be ignored
| # Run 1 exists on the remote test server; the local docker server only seeds run 25. | ||
| run_id = 25 if os.getenv("OPENML_USE_LOCAL_SERVICES") == "true" else 1 | ||
| run = run_v1.get(run_id=run_id) | ||
| _assert_run_shape(run) |
There was a problem hiding this comment.
I believe you were trying to do this in the start. this would be the right way to use run from cache to avoid different entities on both servers.
- def test_run_v1_get(run_v1, with_test_cache):
- import os
-
- # Run 1 exists on the remote test server; the local docker server only seeds run 25.
- run_id = 25 if os.getenv("OPENML_USE_LOCAL_SERVICES") == "true" else 1
- run = run_v1.get(run_id=run_id)
- _assert_run_shape(run)
+ def test_run_v1_get(run_v1, test_files_directory):
+ openml.config.set_root_cache_directory(test_files_directory)
+ run = run_v1.get(run_id=1)
+ _assert_run_shape(run)
Metadata
Details
fixes #1624