Skip to content

ENH+BF: Format species names consistently in species_map, make matching more stringent for common names, improved testing#1866

Open
Copilot wants to merge 9 commits into
enh-macaca-radiatafrom
copilot/wip-address-feedback-format-species-names-consiste
Open

ENH+BF: Format species names consistently in species_map, make matching more stringent for common names, improved testing#1866
Copilot wants to merge 9 commits into
enh-macaca-radiatafrom
copilot/wip-address-feedback-format-species-names-consiste

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 28, 2026

Thanks for the feedback on #1865. I've created this new PR, which merges into #1865, to address your comment. I will work on the changes and keep this PR's description up to date as I make progress.

Original PR: #1865
Triggering review: #1865 (comment)

I think this version is better than just

        "Macaca radiata",

as we did for "Macaca nemestrina" above . I think we should expand all others into similar

{current name} - {Genbank common name} form which seems they all follow.

@copilot -- prepare a PR where all other entries follow the same format for their "name" and also expand description above on the expected format of the "name" field

yarikoptic and others added 2 commits May 28, 2026 17:42
Adds NCBI taxon 9548 (Macaca radiata, common name "bonnet macaque") to
the species lookup table so the species can be recognized by common
name, "radiata", or its NCBITaxon URL.

Requested by a user via email.

Co-Authored-By: Claude Code 2.1.154 / Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI requested a review from yarikoptic May 28, 2026 23:00
Comment thread dandi/metadata/util.py Outdated
Copilot AI requested a review from yarikoptic May 29, 2026 03:41
Comment thread dandi/metadata/util.py Outdated
…fall back + more thorough testing of mapping itself

Added by copilot test revealed a problem of matching 'rat' potentially to multiple (thus order dependent) entries
since we have multiple 'rat's but only one of them is commonly named 'rat'.  So I decided to split out
that other test for common names out. Also extended with a dedicated test for that.

Made it potentially gain multiple matches and then explicitly fail/raise exception so we do not proceed with some
oddly matched item
…taining

Otherwise we would be matching partial names etc. Better fail than to do wrong match!
@yarikoptic yarikoptic force-pushed the copilot/wip-address-feedback-format-species-names-consiste branch from 4f2e7f4 to db81fa7 Compare May 29, 2026 13:13
@yarikoptic yarikoptic added the patch Increment the patch version when merged label May 29, 2026
@yarikoptic yarikoptic changed the title [WIP] [WIP] Address feedback: format species names consistently in species_map ENH+BF: Format species names consistently in species_map, make matching more stringent for common names, improved testing May 29, 2026
@yarikoptic
Copy link
Copy Markdown
Member

@copilot I force pushed my changes over yours, improved matching and testing. Now

  • Address type-check fails
  • Adjust overall PR description to reflect carried out changes fuller (I already adjusted the title)

@yarikoptic
Copy link
Copy Markdown
Member

@copilot also please review my changes as well!

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 94.28571% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.73%. Comparing base (813fb8e) to head (e85ef07).

Files with missing lines Patch % Lines
dandi/metadata/util.py 85.71% 2 Missing ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           enh-macaca-radiata    #1866      +/-   ##
======================================================
- Coverage               76.74%   76.73%   -0.01%     
======================================================
  Files                      87       87              
  Lines                   12769    12788      +19     
======================================================
+ Hits                     9799     9813      +14     
- Misses                   2970     2975       +5     
Flag Coverage Δ
unittests 76.73% <94.28%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 29, 2026

@copilot I force pushed my changes over yours, improved matching and testing. Now

  • Address type-check fails
  • Adjust overall PR description to reflect carried out changes fuller (I already adjuste...

Addressed in a8834c2. The type-check failure was the value_matches annotation not allowing the URI fallback's optional name; widening it to str | None makes tox -e typing pass again. I also updated the summary text to reflect the current scope more fully.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 29, 2026

@copilot also please review my changes as well!

Reviewed the current force-pushed changes and I don’t see a blocking follow-up from my side. I also rechecked the relevant validations locally: python -m pytest -q dandi/tests/test_metadata.py passed, and python -m tox -e typing passed. Recent completed Linters and Type-check runs on this branch are successful as well.

@yarikoptic yarikoptic marked this pull request as ready for review May 29, 2026 14:25
Comment thread dandi/metadata/util.py

# common_names, prefix, uri, name
# common_names, prefix, uri, name ({current name} - {GenBank common name})
species_map = [
Copy link
Copy Markdown
Contributor

@CodyCBakerPhD CodyCBakerPhD May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think, as this list grows and becomes more complicated, it might be better as a formal model, or a data class, or at the very least a namedtuple? Instead of just loosely structured a list of tuples

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, filed an issue , feel welcome to put your claude to it after we are done with this PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sure; out of credits for this month so maybe when back from break

Comment thread dandi/metadata/util.py
value_matches.append((uri, name))
# only if no matches -- try to match by common names within common names
if not value_matches:
for common_names, prefix, uri, name in species_map:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Towards consistency (+/- future class-based representation) the line above marked such fields as private in the scope, (not must sense to me) so might want to adjust that to match; https://github.com/dandi/dandi-cli/pull/1866/changes#diff-f1fffc52a57ec0f6b12d707f48d19c51f4dd81a4606d3535935d38e10e710bbdL470

Interesting though, https://github.com/dandi/dandi-cli/pull/1866/changes#diff-f1fffc52a57ec0f6b12d707f48d19c51f4dd81a4606d3535935d38e10e710bbdR498 didn't

Comment thread dandi/metadata/util.py
Comment on lines +508 to +509
if any(key == lower_value for key in common_names):
value_matches.append((uri, name))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus if this were a class unto itself you could define such equivalency logic as a method within it to make a little more sense from the outside looking in, such as .match(...), possible arguments exact or loose

Comment thread dandi/metadata/util.py
f"Got multiple ({len(value_matches)}) species matched for "
f"{value_orig}: {value_matches}. Should not happen. "
"Either you need to be more specific or our code needs "
"fixing - contact help@dandiarchive.org."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like including such info in error since it gives the user something Actionable~(A) at least

@CodyCBakerPhD
Copy link
Copy Markdown
Contributor

Minor comments, nothing blocking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch Increment the patch version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants