ENH+BF: Format species names consistently in species_map, make matching more stringent for common names, improved testing#1866
Conversation
Adds NCBI taxon 9548 (Macaca radiata, common name "bonnet macaque") to the species lookup table so the species can be recognized by common name, "radiata", or its NCBITaxon URL. Requested by a user via email. Co-Authored-By: Claude Code 2.1.154 / Claude Opus 4.7 <noreply@anthropic.com>
1a2d7a9 to
813fb8e
Compare
…fall back + more thorough testing of mapping itself Added by copilot test revealed a problem of matching 'rat' potentially to multiple (thus order dependent) entries since we have multiple 'rat's but only one of them is commonly named 'rat'. So I decided to split out that other test for common names out. Also extended with a dedicated test for that. Made it potentially gain multiple matches and then explicitly fail/raise exception so we do not proceed with some oddly matched item
…taining Otherwise we would be matching partial names etc. Better fail than to do wrong match!
4f2e7f4 to
db81fa7
Compare
|
@copilot I force pushed my changes over yours, improved matching and testing. Now
|
|
@copilot also please review my changes as well! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## enh-macaca-radiata #1866 +/- ##
======================================================
- Coverage 76.74% 76.73% -0.01%
======================================================
Files 87 87
Lines 12769 12788 +19
======================================================
+ Hits 9799 9813 +14
- Misses 2970 2975 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Addressed in a8834c2. The type-check failure was the |
Reviewed the current force-pushed changes and I don’t see a blocking follow-up from my side. I also rechecked the relevant validations locally: |
…ormat-species-names-consiste
|
|
||
| # common_names, prefix, uri, name | ||
| # common_names, prefix, uri, name ({current name} - {GenBank common name}) | ||
| species_map = [ |
There was a problem hiding this comment.
Do you think, as this list grows and becomes more complicated, it might be better as a formal model, or a data class, or at the very least a namedtuple? Instead of just loosely structured a list of tuples
There was a problem hiding this comment.
good idea, filed an issue , feel welcome to put your claude to it after we are done with this PR
There was a problem hiding this comment.
Sure, sure; out of credits for this month so maybe when back from break
| value_matches.append((uri, name)) | ||
| # only if no matches -- try to match by common names within common names | ||
| if not value_matches: | ||
| for common_names, prefix, uri, name in species_map: |
There was a problem hiding this comment.
Towards consistency (+/- future class-based representation) the line above marked such fields as private in the scope, (not must sense to me) so might want to adjust that to match; https://github.com/dandi/dandi-cli/pull/1866/changes#diff-f1fffc52a57ec0f6b12d707f48d19c51f4dd81a4606d3535935d38e10e710bbdL470
Interesting though, https://github.com/dandi/dandi-cli/pull/1866/changes#diff-f1fffc52a57ec0f6b12d707f48d19c51f4dd81a4606d3535935d38e10e710bbdR498 didn't
| if any(key == lower_value for key in common_names): | ||
| value_matches.append((uri, name)) |
There was a problem hiding this comment.
Plus if this were a class unto itself you could define such equivalency logic as a method within it to make a little more sense from the outside looking in, such as .match(...), possible arguments exact or loose
| f"Got multiple ({len(value_matches)}) species matched for " | ||
| f"{value_orig}: {value_matches}. Should not happen. " | ||
| "Either you need to be more specific or our code needs " | ||
| "fixing - contact help@dandiarchive.org." |
There was a problem hiding this comment.
I do like including such info in error since it gives the user something Actionable~(A) at least
|
Minor comments, nothing blocking |
Thanks for the feedback on #1865. I've created this new PR, which merges into #1865, to address your comment. I will work on the changes and keep this PR's description up to date as I make progress.
Original PR: #1865
Triggering review: #1865 (comment)