prohibit duplicate key columns by ben-schwen · Pull Request #7760 · Rdatatable/data.table

ben-schwen · 2026-05-26T14:28:48Z

Closes #4888
Closes #4891

github-actions · 2026-05-26T14:54:19Z

HEAD=duplicated_key_columns slower P<0.001 for memrecycle regression fixed in #5463

Generated via commit 7253f69

Download link for the artifact containing the test results: ↓ atime-results.zip

Task	Duration
R setup and installing dependencies	2 minutes and 41 seconds
Installing different package versions	47 seconds
Running and plotting the test cases	5 minutes and 23 seconds

MichaelChirico · 2026-05-26T20:05:22Z

Gemini identified some remaining ways for ambiguity to creep in:

dt = data.table(a=1:2, a.1=3:4, val=10:11)
dt[, .(a.1, sum(val)), keyby=.(a, a)]
# Key: <a, a.1>
#        a   a.1   a.1    V2
#    <int> <int> <int> <int>
# 1:     1     1     1    10
# 2:     2     2     2    11

dt = data.table(a=1:2, b=3:4, key="a")
dt[, .(a, a)]
# Key: <a>
#        a     a
#    <int> <int>
# 1:     1     1
# 2:     2     2
subset(dt, select=c(a, a))
# Key: <a>
#        a     a
#    <int> <int>
# 1:     1     1
# 2:     2     2
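A minimal base-R sketch of how to spot this ambiguity in an existing result. It assumes only that the key lives in the "sorted" attribute, which is where data.table keeps it and what key() reads. `ambiguous_key_cols()` is a hypothetical helper for illustration, not a data.table function, and a plain list stands in for the data.table.

```r
# Hypothetical helper, not part of data.table: returns the key columns
# whose names occur more than once in names(x). data.table stores its
# key in the "sorted" attribute, which is what key(x) reads.
ambiguous_key_cols <- function(x) {
  key_cols <- attr(x, "sorted")
  nms <- names(x)
  # Names that appear more than once anywhere in the result
  dup_names <- unique(nms[duplicated(nms)])
  key_cols[key_cols %in% dup_names]
}

# Plain-list stand-in for the dt[, .(a, a)] result above:
# two columns both named "a", keyed by "a"
res <- structure(list(1:2, 1:2), names = c("a", "a"), sorted = "a")
ambiguous_key_cols(res)  # "a"
```

An empty result means every key column name is unique, so the key is unambiguous.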

codecov · 2026-05-27T09:17:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (d4974e9) to head (7253f69).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7760   +/-   ##
=======================================
  Coverage   99.04%   99.04%           
=======================================
  Files          87       87           
  Lines       17064    17087   +23     
=======================================
+ Hits        16901    16924   +23     
  Misses        163      163

ben-schwen · 2026-05-27T13:01:05Z

I've added these cases, but I'm sure we will run into variants of them again.

MichaelChirico · 2026-05-27T16:55:38Z

Yep, sharing because those look pretty easy to encounter in practice. Remaining ones will be more baroque. A fresh session finds nothing, so I think this is good now.

MichaelChirico · 2026-05-27T16:57:48Z

+id1  = sample(letters, 10)   # reduced from 20 to 10
+id2  = id1
+date = 1:10                  #   and 40 to 10 to save ram, #5517
+dt = setkey(data.table(CJ(date, id1, id2)), NULL)


This test is somewhat confusing. Is CJ(..., sorted=FALSE) not enough?

MichaelChirico · 2026-05-27T16:59:07Z

    if (verbose) {cat(timetaken(last.started.at),"\n"); flush.console()}
  } else if (.by_result_is_keyable(x, keyby, bysameorder, byjoin, allbyvars, bysub)) {
-    setattr(ans, "sorted", names(ans)[seq_along(grpcols)])
+    if (!any(names(ans)[seq_along(grpcols)] %chin% duplicated_values(names(ans))))


Save names(ans)[seq_along(grpcols)] to a variable.
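For illustration, the guard in the diff with this suggestion applied might look like the standalone sketch below. `set_sorted_if_unique()` and `n_grpcols` are hypothetical names; base R's `%in%` and `duplicated()` stand in for data.table's `%chin%` and the `duplicated_values()` helper used in the diff, and a plain list stands in for `ans`.

```r
# Hypothetical helper (not data.table's code): mark the first n_grpcols
# columns of `ans` as the key only when none of those column names also
# appears more than once in names(ans).
set_sorted_if_unique <- function(ans, n_grpcols) {
  nms <- names(ans)
  # Reviewer's suggestion: compute the candidate key names once
  key_cols <- nms[seq_len(n_grpcols)]
  # Names that occur more than once anywhere in the result
  dup_names <- unique(nms[duplicated(nms)])
  if (!any(key_cols %in% dup_names)) {
    attr(ans, "sorted") <- key_cols
  }
  ans
}

# Unambiguous group columns: the key is set
ok <- set_sorted_if_unique(list(a = 1:2, b = 3:4, V2 = 10:11), 2L)
# Mirrors the keyby=.(a, a) case: "a.1" is both a group column and a j column
bad <- set_sorted_if_unique(
  structure(list(1:2, 1:2, 3:4, 10:11), names = c("a", "a.1", "a.1", "V2")), 2L)
attr(ok, "sorted")   # "a" "b"
attr(bad, "sorted")  # NULL: no key is set
```

Storing the names in `key_cols` means the duplicate check and the attribute assignment use the same vector, so the two cannot drift apart.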

prohibit duplicate key columns

b55e13c

ben-schwen requested a review from MichaelChirico as a code owner May 26, 2026 14:28

ben-schwen and others added 4 commits May 27, 2026 09:28

Merge branch 'master' into duplicated_key_columns

7eb7397

update test setup

404d241

Merge remote-tracking branch 'refs/remotes/origin/duplicated_key_columns' into duplicated_key_columns

499661f

adjust test setups

997b5fc

add Mikes found cases

7253f69

MichaelChirico reviewed May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

prohibit duplicate key columns#7760

prohibit duplicate key columns#7760
ben-schwen wants to merge 6 commits into master from duplicated_key_columns
2 participants

Conversation

ben-schwen commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MichaelChirico commented May 26, 2026

Uh oh!

codecov Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ben-schwen commented May 27, 2026

Uh oh!

MichaelChirico commented May 27, 2026

Uh oh!

MichaelChirico May 27, 2026

Choose a reason for hiding this comment

Uh oh!

MichaelChirico May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 26, 2026 •

edited

Loading

codecov Bot commented May 27, 2026 •

edited

Loading