Skip to content

Conversation

@thisisnic
Copy link
Member

@thisisnic thisisnic commented Dec 23, 2025

Rationale for this change

CRAN complains about non-API calls

What changes are included in this PR?

Update those items to use alternative approaches

Are these changes tested?

Well, if the tests pass I think we're good?

Are there any user-facing changes?

Nope

Summary of AI changes

🤖 Generated with Claude Code

Have added comments inline regarding assumptions behind changes, and where I questioned those to try to verify it.

Sources referenced in approach:

Comment on lines +578 to +581
// copy attributes from the altrep object
DUPLICATE_ATTRIB(dup, alt);

UNPROTECT(2);
UNPROTECT(1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Note: DUPLICATE_ATTRIB also copies the OBJECT bit and S4 status, which the old code didn't do.

Was unsure if this matters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know altrep well enough to know the answer, but copying the whole object sounds like it might be detrimental to performance — any idea if that's true?

Copy link
Member Author

@thisisnic thisisnic Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Performance impact: Negligible. The OBJECT bit and S4 flag are just bit operations - essentially free compared to the actual attribute list copying that both approaches do.

I queried these a little more and it looks like these are just metadata - the OBJECT bit determines the result of calling is.object() and the S4 bit indicates if the object is an S4 object.

I took a look at https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Attributes-1 and also queried what this bit of the codebase does (answer: wrapper around ChunkedArray to store data when we call e.g. collect() as an ALTREP vector until R actually needs it), and whether we definitely need S4 and OBJECT: apparently we need OBJECT but not S4:

🤖 OBJECT bit: Yes, needed. Factors have a class attribute, and the OBJECT bit tells R "this is a classed object, use S3 method dispatch". Without it, things like print(), [, etc. might not dispatch to the factor methods correctly.

Either way, sounds like no impact.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Dec 23, 2025
Comment on lines +1104 to +1108
return R_altrep_inherits(x, AltrepVectorPrimitive<REALSXP>::class_t) ||
R_altrep_inherits(x, AltrepVectorPrimitive<INTSXP>::class_t) ||
R_altrep_inherits(x, AltrepFactor::class_t) ||
R_altrep_inherits(x, AltrepVectorString<StringType>::class_t) ||
R_altrep_inherits(x, AltrepVectorString<LargeStringType>::class_t);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Trade-off: Old code automatically worked for any class registered under "arrow" package. New code requires updating if new ALTREP types are added.

Doesn't sound like it'll be an issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but this check is definitely more fragile — do we have to do this here? I don't see any of the ATTRIB stuff here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code I deleted from r/src/arrow_cpp11.h, ALTREP_CLASS_SERIALIZED_CLASS calls ATTRIB

Comment on lines +1172 to 1186
if (R_altrep_inherits(x, arrow::r::altrep::AltrepVectorPrimitive<REALSXP>::class_t)) {
result = arrow::r::altrep::AltrepVectorPrimitive<REALSXP>::IsMaterialized(x);
} else if (class_name == "arrow::array_int_vector") {
} else if (R_altrep_inherits(
x, arrow::r::altrep::AltrepVectorPrimitive<INTSXP>::class_t)) {
result = arrow::r::altrep::AltrepVectorPrimitive<INTSXP>::IsMaterialized(x);
} else if (class_name == "arrow::array_string_vector") {
} else if (R_altrep_inherits(
x, arrow::r::altrep::AltrepVectorString<arrow::StringType>::class_t)) {
result = arrow::r::altrep::AltrepVectorString<arrow::StringType>::IsMaterialized(x);
} else if (class_name == "arrow::array_large_string_vector") {
} else if (R_altrep_inherits(
x,
arrow::r::altrep::AltrepVectorString<arrow::LargeStringType>::class_t)) {
result =
arrow::r::altrep::AltrepVectorString<arrow::LargeStringType>::IsMaterialized(x);
} else if (class_name == "arrow::array_factor") {
} else if (R_altrep_inherits(x, arrow::r::altrep::AltrepFactor::class_t)) {
result = arrow::r::altrep::AltrepFactor::IsMaterialized(x);
Copy link
Member Author

@thisisnic thisisnic Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked we are checking the same thing before/after; appears that we are but just accessing them differently (i.e. via C++ class name not registered name)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new way sounds better to me, the class is a truer test than what we labelled it 😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't pulled the code locally — but do we use class_name elsewhere or was this the one place? Maybe we could get rid of class_name entirely if we always do something like ^^^^?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have it here + also where we have Rprintf("materialized %s len=%ld\n", Impl::class_name,. Unsure if it's important - Claude thinks it's just for altrep debugging, and looks like that to me?

Comment on lines +1203 to +1215
if (R_altrep_inherits(x, arrow::r::altrep::AltrepVectorPrimitive<REALSXP>::class_t)) {
arrow::r::altrep::AltrepVectorPrimitive<REALSXP>::Materialize(x);
} else if (class_name == "arrow::array_int_vector") {
} else if (R_altrep_inherits(
x, arrow::r::altrep::AltrepVectorPrimitive<INTSXP>::class_t)) {
arrow::r::altrep::AltrepVectorPrimitive<INTSXP>::Materialize(x);
} else if (class_name == "arrow::array_string_vector") {
} else if (R_altrep_inherits(
x, arrow::r::altrep::AltrepVectorString<arrow::StringType>::class_t)) {
arrow::r::altrep::AltrepVectorString<arrow::StringType>::Materialize(x);
} else if (class_name == "arrow::array_large_string_vector") {
} else if (R_altrep_inherits(
x,
arrow::r::altrep::AltrepVectorString<arrow::LargeStringType>::class_t)) {
arrow::r::altrep::AltrepVectorString<arrow::LargeStringType>::Materialize(x);
} else if (class_name == "arrow::array_factor") {
} else if (R_altrep_inherits(x, arrow::r::altrep::AltrepFactor::class_t)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// singleton altrep class description
static R_altrep_class_t class_t;
static const char* class_name;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This felt like slightly weird/redundant weird way to get this information but didn't see any alternative approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be mostly my lack-of-C++ fluency but do you know why we need to have lines like this on a few lines (here and below)? What is it doing?

Copy link
Member Author

@thisisnic thisisnic Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this to extract the class name a different way - before we were doing it via code which used ATTRIB - so having it declared here means that we can set it when we initialise the class (where we do AltrepClass::class_name = name;).

@thisisnic
Copy link
Member Author

CI failure R / AMD64 Ubuntu 24.04 R 4.4 Force-Tests true (pull_request) due to unrelated issue (downloading Parquet file from our repo)

@thisisnic thisisnic marked this pull request as ready for review December 23, 2025 08:43
@thisisnic thisisnic requested a review from jonkeane as a code owner December 23, 2025 08:43
@thisisnic
Copy link
Member Author

thisisnic commented Dec 24, 2025

Alternative approach to try based on conversation with CRAN folks:

The other one is tricky (L152 , 1163, 1194) - you are trying to get the class name of the altrep object for which there is no API AFAICT. ...A workaround is to use something like:

const char *altrep_class_name(SEXP x) {
   SEXP l = PROTECT(Rf_lang2(Rf_install(".Internal"),
			  PROTECT(Rf_lang2(Rf_install("altrep_class"), x))));
   SEXP r = Rf_eval(l, R_BaseEnv);
   UNPROTECT(2);
   return CHAR(STRING_ELT(r, 0));
}

@jonkeane - I'm leaning towards not implementing this approach due to advice from 🤖

🤖 The above suggestion calls into R at runtime to query the class name, while the current PR simply stores the name as a static string when the class is registered. Since Arrow controls its own altrep classes and already has the name at registration time, storing it is simpler, faster, and avoids the overhead of R evaluation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants