Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add try_*_get() for each vector type, and fail more elegantly in Vector element extraction #503

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DavisVaughan
Copy link
Contributor

@DavisVaughan DavisVaughan commented Sep 4, 2024

Closes #499 (alternative approach)
Addresses posit-dev/positron#4537

If we try and extract an element with get_unchecked() that causes an error in an ALTREP Elt method (including but not limited to an OOM error), we now log an error and return error_elt() for that type, which is typically the NA value for that type, but for raw vectors it is 0 as an Rbyte.

I considered propagating the error up through the return value of get_unchecked() but that was painful. We use it in so many places that don't expect it to fail that propagating an error through would be a huge ordeal that doesn't seem worth it for this somewhat rare case. I think logging an error + showing NA or something like it is possibly the best we can do.

I confirmed that

x <- 1:1e10
names(x) <- x

doesn't crash immediately anymore, but expanding x in the Variables pane hangs and causes a 5 second timeout. Something is definitely not right here because even x <- 1:1e5 hangs for a long time on main when you try and expand x in the Variables pane.

…ing shown to the user

`top_level_exec()` shows errors to the user if they occur at the top level, even if our calling code recovers from the error
@DavisVaughan
Copy link
Contributor Author

DavisVaughan commented Sep 4, 2024

In general things like this should no longer immediately crash the R session. Note that printing z in the console will immediately trigger the R error.

Also note that using STRING_ELT() on z results in the allocation error, which is why you see NA values in the Variables pane. Using STRING_ELT() is the best we can do in the general case for ALTREP objects (because it gives them a chance to do low memory operations if they can) but it just happens to trigger allocations for the deferred string ALTREP class in use here. I think trying to optimize any more for the deferred string case would be overoptimizing vs having a decent solution that works in the general case for ALTREP objects.

x <- 1:1e10
names(x) <- x
z <- names(x)
Screenshot 2024-09-04 at 11 33 37 AM

Copy link
Contributor

@dfalbel dfalbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Is it worth copying the tests from #499 ?

@DavisVaughan DavisVaughan requested a review from lionel- September 9, 2024 17:00
Copy link
Contributor

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I'm slightly worried about the performance overhead. This could be improved by:

  • Detecting whether a vector is altrep or not on construction.
  • If not altrep, get full vector slice and use that to index an element
  • If altrep, use protected get, and this could be improved later on to retrieve regions so the protection would only happen once per region.

@@ -1264,3 +1264,86 @@ pub fn plain_binding_force_with_rollback(binding: &Binding) -> anyhow::Result<RO
_ => Err(anyhow!("Unexpected binding type")),
}
}

#[cfg(test)]
mod tests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


// Small, should always pass with `success`
let display = get_display_value("local({x = 1:1e3; names(x) = x; names(x)})");
assert!(success.is_match(display.as_str()) || failure.is_match(display.as_str()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert!(success.is_match(display.as_str()) || failure.is_match(display.as_str()));
assert!(success.is_match(display.as_str()));

@lionel-
Copy link
Contributor

lionel- commented Sep 14, 2024

and this could be improved later on to retrieve regions so the protection would only happen once per region.

I'm not familiar with altrep but would this behave better than the _ELT() methods? I.e. would this avoid a full materialisation? It seems like that's the purpose of retrieving a region slice?

@lionel-
Copy link
Contributor

lionel- commented Sep 14, 2024

Do we need to be worried about the variable and data viewer panes being confusing with altrep objects? Would it be possible for the formatting paths to use the Result getter and display, say, <error> instead of NA, so that the output is at least googlable?

@DavisVaughan
Copy link
Contributor Author

Note to self to wait for #508 and rework this on top of that after it is merged

@DavisVaughan
Copy link
Contributor Author

would this avoid a full materialisation

It is completely up to the ALTREP class. The ALTREP author could decide that INTEGER_GET_REGION() can be implemented without materializing (by providing an INTEGER_OR_NULL() method), but INTEGER_ELT() can't. The important thing for us is that we don't have any control over that, and just have to try and use "the friendliest thing we can".


Using INTEGER_GET_REGION() combined with having a small buffer vector of the underlying C type is not unreasonable. cpp11 does exactly this actually. It may avoid repeated INTEGER_ELT() calls if the ALTREP type can provide a const DATAPTR without materializing:
https://github.com/wch/r-source/blob/64873e17ff8be3c305b0a7487e01b266bc4660c9/src/main/altrep.c#L404-L417

Otherwise it falls back to this:
https://github.com/wch/r-source/blob/64873e17ff8be3c305b0a7487e01b266bc4660c9/src/main/altrep.c#L755-L763

And FWIW, deferred strings can't provide a dataptr before it has fully materialized, so it would always fall back to the ELT loop fallback approach
https://github.com/wch/r-source/blob/64873e17ff8be3c305b0a7487e01b266bc4660c9/src/main/altclasses.c#L764-L768

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants