-
Notifications
You must be signed in to change notification settings - Fork 1k
delete rows by reference #7536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
delete rows by reference #7536
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7536 +/- ##
==========================================
- Coverage 98.97% 98.91% -0.06%
==========================================
Files 87 88 +1
Lines 16733 16853 +120
==========================================
+ Hits 16561 16670 +109
- Misses 172 183 +11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Generated via commit 57a4bdc Download link for the artifact containing the test results: ↓ atime-results.zip
|
This comment was marked as resolved.
This comment was marked as resolved.
Yes. I have added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- setallocrow needs to return invisibly.
- I was expecting following example to show difference, it does not
x1 = data.table(a = 1:2)
x2 = x1
y1 = data.table(a = 1:2)
invisible(setallocrow(y1))
y2 = y1
delete_first_row = function(x) x[1L, .ROW := NULL]
delete_first_row(x1)
delete_first_row(y1)
x2
y2could verbose=T tell when object is re-allocated (as resizable) rather than truly updated in place?
we need example that shows why setallocrow() might be desired.
- does it make sense to mark resizable already in
as.data.table/fread/rbindlist?
|
I have added verbose messages to library(data.table)
dt = data.table(a = 1:5)
truelength(dt$a)
#> [1] 0
setallocrow(dt)
truelength(dt$a)
#> [1] 5
dt[3:5, .ROW := NULL]
truelength(dt$a)
#> [1] 5
setallocrow(dt)
truelength(dt$a)
#> [1] 2Created on 2025-12-29 with reprex v2.1.1 Thinking about the bigger picture of fast |
|
|
||
| Note that for efficiency no check is performed for duplicate assignments, i.e. if multiple values are passed for assignment to the same index, assignment to this index will occur repeatedly and sequentially; for a given use case, consider whether it makes sense to create your own test for duplicates, e.g. in production code. | ||
|
|
||
| Note that \code{.ROW := NULL} is a special case used to delete rows by reference. Unlike column assignment, this requires an \code{i} expression to specify which rows to delete, and does not support \code{by} or \code{keyby}. See \code{\link{.ROW}} or \code{\link{special-symbols}} for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside for future consideration -- another use case would be DT[<optional i>, by=<grp>, having=<condition>, .ROW := NULL]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, be sure to point to ?setallocrow as the functional equivalent
Oh, I misunderstood
| \description{ | ||
| \code{.SD}, \code{.BY}, \code{.N}, \code{.I}, \code{.GRP}, and \code{.NGRP} are \emph{read-only} symbols for use in \code{j}. \code{.N} can be used in \code{i} as well. \code{.I} can be used in \code{by} as well. See the vignettes, Details and Examples here and in \code{\link{data.table}}. | ||
| \code{.EACHI} is a symbol passed to \code{by}; i.e. \code{by=.EACHI}, \code{.NATURAL} is a symbol passed to \code{on}; i.e. \code{on=.NATURAL} | ||
| \code{.ROW} is a symbol used with \code{:= NULL} to delete rows by reference; i.e. \code{DT[i, .ROW := NULL]} deletes the rows selected by \code{i}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just point to ?assign, the info here is a bit repetitive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add a section to the Reference semantics vignette about this.
E.g., illustrating any issues that might obtain from shared references to data.tables, what happens here?
DT1 = data.table(i = 1:10)
DT2 = DT1
DT2[i > 5, .ROW := NULL]
nrow(DT1)| if (is.null(jsub) || identical(jsub, quote(NULL))) { | ||
| if (is.null(irows)) | ||
| stopf(".ROW := NULL requires i= condition to specify rows to delete") | ||
| if (!missingby) | ||
| stopf(".ROW := NULL with 'by' or 'keyby' is not supported yet") | ||
| .Call(CdeleteRows, x, irows) | ||
| return(suppPrint(x)) | ||
| } else { | ||
| stopf(".ROW can only be used with := NULL to delete rows") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest unnesting:
| if (is.null(jsub) || identical(jsub, quote(NULL))) { | |
| if (is.null(irows)) | |
| stopf(".ROW := NULL requires i= condition to specify rows to delete") | |
| if (!missingby) | |
| stopf(".ROW := NULL with 'by' or 'keyby' is not supported yet") | |
| .Call(CdeleteRows, x, irows) | |
| return(suppPrint(x)) | |
| } else { | |
| stopf(".ROW can only be used with := NULL to delete rows") | |
| } | |
| if (!is.null(jsub) && !identical(jsub, quote(NULL))) { | |
| stopf(".ROW can only be used with := NULL to delete rows") | |
| if (is.null(irows)) | |
| stopf(".ROW := NULL requires i= condition to specify rows to delete") | |
| if (!missingby) | |
| stopf(".ROW := NULL with 'by' or 'keyby' is not supported yet") | |
| .Call(CdeleteRows, x, irows) | |
| return(suppPrint(x)) |
| # multirow | ||
| dt = data.table(a=1:5, b=letters[1:5]) | ||
| test(2356.09, copy(dt)[c(1L, 3L), .ROW := NULL], dt[c(2,4,5)]) | ||
| test(2356.10, copy(dt)[c(TRUE, FALSE, TRUE, FALSE, TRUE), .ROW := NULL], dt[c(2,4)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks the same as 2356.03?
| Please note: over-allocation of the column pointer vector is not for efficiency \emph{per se}; it is so that \code{:=} can add columns by reference without a shallow copy. | ||
| \code{setallocrow} is a utility function that prepares columns for fast row operations (delete or insert (not implemented yet)) by reference and manages row capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention the I misunderstood.ROW:=NULL alternative here by linking to ?assign
| Please note: over-allocation of the column pointer vector is not for efficiency \emph{per se}; it is so that \code{:=} can add columns by reference without a shallow copy. | ||
| \code{setallocrow} is a utility function that prepares columns for fast row operations (delete or insert (not implemented yet)) by reference and manages row capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current wording doesn't make clear that delete is implemented, insert is not.
| \code{setallocrow} is a utility function that prepares columns for fast row operations (delete or insert (not implemented yet)) by reference and manages row capacity. | |
| \code{setallocrow} is a utility function that prepares columns for fast row operations (delete or insert) by reference and manages row capacity. (Note that 'insert' by reference is not yet implemented) |
| \code{setallocrow} is a utility function that prepares columns for fast row operations (delete or insert (not implemented yet)) by reference and manages row capacity. | ||
| Before deleting or inserting rows by reference, columns must be resizable. | ||
| \code{setallocrow} ensures all columns are in the appropriate state by converting ALTREP columns to materialized form and reallocating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a bit of clarification -- when I read it, my thought was "oh, we can't do .ROW:=NULL on a table with ALTREP columns unless we first setallocrow()", but I don't see any examples/tests to that effect.
OTOH, I think (?) we strive to expand all ALTREP columns, so I haven't been able to come up with such an example
|
|
||
| SEXP allocrowwrapper(SEXP dt, SEXP n) { | ||
| if (!isInteger(n) || length(n)!=1 || INTEGER(n)[0]<0 || INTEGER(n)[0]==NA_INTEGER) | ||
| error("n must be a single non-negative non-NA integer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| error("n must be a single non-negative non-NA integer"); | |
| error(_("n must be a single non-negative non-NA integer")); |
| for (R_xlen_t i = 0; i < length(dt); i++) { | ||
| SEXP col = VECTOR_ELT(dt, i); | ||
| if (!isVector(col)) | ||
| error("Cannot make non-vector column %lld resizable", (long long)(i + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be sure to mark strings for translation as appropriate
| error("Cannot make non-vector column %lld resizable", (long long)(i + 1)); | |
| error(_("Cannot make non-vector column %lld resizable"), (long long)(i + 1)); |
MichaelChirico
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is kind of 2 PRs -- one about setallocrow() and .ROW := NULL, are they totally independent?
Or perhaps, we need setallocrow() first in order to enable .ROW := NULL?
It might be easier to review in that sequence -- PR 1 adds setallocrow() (with no way to actually use it), PR 2 enables DELETE operations by reference. WDYT?
d = data.table(a = 4:1)
.Internal(inspect(d[["a"]]))
#@55b90c4c1c28 13 INTSXP g0c2 [REF(2)] (len=4, tl=0) 4,3,2,1
setallocrow(d)
.Internal(inspect(d[["a"]]))
#@55b90c4c1e68 13 INTSXP g0c2 [REF(1),gp=0x20] (len=4, tl=4) 4,3,2,1
d[, "x" := c(1,2,3,4)]
.Internal(inspect(d[["a"]]))
#@55b90c4c1e68 13 INTSXP g0c2 [REF(1),gp=0x20] (len=4, tl=4) 4,3,2,1
.Internal(inspect(d[["x"]]))
#@55b910172c58 14 REALSXP g0c3 [REF(2)] (len=4, tl=0) 1,2,3,4
I think it might be explicitly requested from user each time
|

Closes #635
dt[i, .ROW := NULL]byclausesdt[i, .ROW := NULL, by]Currently only supports deletion of
i, but could also be integrated intodogroups.Do we need/want a functional form like
delete(x, irows). Depending on what we allow forirowsthis would need a rewrite/duplicate of the internals of[ievaluation.