isOrdered results differ for NA_integer_ and NA_real_ #221

Open
joshuaulrich opened this issue Dec 20, 2017 · 3 comments

@joshuaulrich
Owner

When the input contains an NA, the isOrdered() function returns different results depending on whether the input is integer or double:

run.isOrdered <- function(x) {
  c(isOrdered(x, increasing =  TRUE, strictly =  TRUE),
    isOrdered(x, increasing =  TRUE, strictly = FALSE),
    isOrdered(x, increasing = FALSE, strictly = FALSE),
    isOrdered(x, increasing = FALSE, strictly =  TRUE))
}
### Why are these different?
# Integers are always FALSE
run.isOrdered(c(1L, NA_integer_, 1L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(0L, NA_integer_, 1L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(1L, NA_integer_, 0L))
# [1] FALSE FALSE FALSE FALSE

run.isOrdered(c(0L, 1L, NA_integer_, 2L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(0L, 1L, NA_integer_, 1L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(0L, 0L, NA_integer_, 1L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(0L, 0L, NA_integer_, 0L))
# [1] FALSE FALSE FALSE FALSE

run.isOrdered(c(2L, NA_integer_, 1L, 0L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(1L, NA_integer_, 1L, 0L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(1L, NA_integer_, 0L, 0L))
# [1] FALSE FALSE FALSE FALSE
run.isOrdered(c(0L, NA_integer_, 0L, 0L))
# [1] FALSE FALSE FALSE FALSE


# Doubles are all over the place
run.isOrdered(c(1.0, NA_real_, 1.0))
# [1] TRUE TRUE TRUE TRUE
run.isOrdered(c(0.0, NA_real_, 1.0))
# [1] TRUE TRUE TRUE TRUE
run.isOrdered(c(1.0, NA_real_, 0.0))
# [1] TRUE TRUE TRUE TRUE

run.isOrdered(c(0.0, 1.0, NA_real_, 2.0))
# [1]  TRUE  TRUE FALSE FALSE
run.isOrdered(c(0.0, 1.0, NA_real_, 1.0))
# [1]  TRUE  TRUE FALSE FALSE
run.isOrdered(c(0.0, 0.0, NA_real_, 1.0))
# [1] FALSE  TRUE  TRUE FALSE
run.isOrdered(c(0.0, 0.0, NA_real_, 0.0))
# [1] FALSE  TRUE  TRUE FALSE

run.isOrdered(c(2.0, NA_real_, 1.0, 0.0))
# [1] FALSE FALSE  TRUE  TRUE
run.isOrdered(c(1.0, NA_real_, 1.0, 0.0))
# [1] FALSE FALSE  TRUE  TRUE
run.isOrdered(c(1.0, NA_real_, 0.0, 0.0))
# [1] FALSE  TRUE  TRUE FALSE
run.isOrdered(c(0.0, NA_real_, 0.0, 0.0))
# [1] FALSE  TRUE  TRUE FALSE
@joshuaulrich
Owner Author

Places where this behavior may affect user code: Subsetting in [.xts calls isOrdered() on i and/or on the output from calls to binsearch(). It's also called on the INDEX argument to period.apply(). These calls are in files xts.methods.R and periodicity.R, respectively.

The other calls in the grep output below are on the index attribute (or a vector that will be used as an index), so they shouldn't be allowed to contain NA anyway.

> grep isOrdered *
align.time.R:  isOrdered(.index(x), strictly=TRUE)
align.time.R:  isOrdered(index(x), strictly=TRUE)
index.R:  if(!isOrdered(.index(x), strictly=FALSE))
isOrdered.R:`isOrdered` <- function(x, increasing=TRUE, strictly=TRUE) {
periodicity.R:    if(!isOrdered(INDEX)) {
periodicity.R:      # isOrdered returns FALSE if there are duplicates
xts.methods.R:          if(isOrdered(firstlast, strictly=FALSE)) # fixed non-match within range bug
xts.methods.R:    if(!isOrdered(i,strictly=FALSE)) {
xts.R:  if(!isOrdered(order.by, strictly=!unique)) {
xts.R:    if( !isOrdered(index, increasing=TRUE, strictly=unique) )
xts.R:    if( !isOrdered(index, increasing=TRUE, strictly=unique) )

@TomAndrews
Contributor

I think the reason that all the integer cases are FALSE is that NA_INTEGER = INT_MIN:
https://github.com/wch/r-source/blob/8a55192af9a65291afffb64c22b29801ea9151a6/src/include/R_ext/Arith.h#L49
So all those cases must be false since there's a very large negative number in the middle of all of them.
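
To make the integer case concrete, here is a minimal C sketch. It is not the xts source: `int_is_ordered` and `SKETCH_NA_INTEGER` are hypothetical names, and the loop assumes the check simply returns false at the first adjacent pair that violates the requested order. With INT_MIN standing in for NA in the middle of the vector, one of the two pairs that touch it always violates the order, whichever direction is requested.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for R's NA_INTEGER, which is INT_MIN. */
#define SKETCH_NA_INTEGER INT_MIN

/* Assumed shape of the check: return false at the first adjacent
 * pair that violates the requested order, otherwise return true. */
static bool int_is_ordered(const int *x, size_t n,
                           bool increasing, bool strictly)
{
    for (size_t i = 1; i < n; i++) {
        int a = x[i - 1], b = x[i];
        bool violation;
        if (increasing)
            violation = strictly ? (a >= b) : (a > b);
        else
            violation = strictly ? (a <= b) : (a < b);
        if (violation)
            return false;
    }
    return true;
}
```

For an increasing check the pair (value, NA) fails because nothing is smaller than INT_MIN; for a decreasing check the pair (NA, value) fails for the same reason.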

For doubles R is using IEEE NaN values:
https://github.com/wch/r-source/blob/8a55192af9a65291afffb64c22b29801ea9151a6/src/main/arithmetic.c#L112

According to the standard, any ordering comparison with NaN returns false:
https://en.wikipedia.org/wiki/NaN

This explains the double cases:

  • With three values, every comparison will include an NA so it falls through to the final return TRUE
  • With four values, the 2 comparisons including an NA don't trigger a return so the results are the same as if you just use the two adjacent real values:
run.isOrdered(c(0.0, 1.0))
# [1]  TRUE  TRUE FALSE FALSE
run.isOrdered(c(0.0, 1.0))
# [1]  TRUE  TRUE FALSE FALSE
run.isOrdered(c(0.0, 0.0))
# [1] FALSE  TRUE  TRUE FALSE
run.isOrdered(c(0.0, 0.0))
# [1] FALSE  TRUE  TRUE FALSE

run.isOrdered(c(1.0, 0.0))
# [1] FALSE FALSE  TRUE  TRUE
run.isOrdered(c(1.0, 0.0))
# [1] FALSE FALSE  TRUE  TRUE
run.isOrdered(c(0.0, 0.0))
# [1] FALSE  TRUE  TRUE FALSE
run.isOrdered(c(0.0, 0.0))
# [1] FALSE  TRUE  TRUE FALSE

Therefore you get results like this:

isOrdered(c(0.0, 1.0, NA_real_, 0.0, 1.0))
# [1] TRUE
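
The same idea for doubles, again as a hedged C sketch rather than the actual xts code: `dbl_is_ordered` is a hypothetical name, the loop shape is an assumption, and the C `NAN` macro stands in for NA_real_ (which is a NaN with a particular payload). Because every ordering comparison involving NaN is false, a pair that touches NaN can never be flagged as a violation, so only the pairs of adjacent real values decide the result.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed shape of the check: return false at the first adjacent
 * pair that violates the requested order, otherwise fall through
 * to the final return true. */
static bool dbl_is_ordered(const double *x, size_t n,
                           bool increasing, bool strictly)
{
    for (size_t i = 1; i < n; i++) {
        double a = x[i - 1], b = x[i];
        bool violation;
        /* If a or b is NaN, each of these comparisons is false,
         * so the pair is silently skipped. */
        if (increasing)
            violation = strictly ? (a >= b) : (a > b);
        else
            violation = strictly ? (a <= b) : (a < b);
        if (violation)
            return false;
    }
    return true;
}
```

Under these assumptions, {0.0, 1.0, NAN, 0.0, 1.0} passes the strictly increasing check because the only pairs actually compared are (0.0, 1.0) twice.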

@joshuaulrich
Owner Author

Note that zoo orders the index using its ORDER() function, where the default na.last = TRUE means that NAs are always at the end of the index for zoo objects, regardless of the atomic type of the index.
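
For illustration only, the effect of sorting with missing values last can be sketched in C with a qsort comparator. This is not zoo's code: `cmp_nan_last` is a hypothetical name, and NaN stands in for NA in a double index.

```c
#include <math.h>
#include <stdlib.h>

/* qsort comparator: ascending order, with NaN placed after every
 * number (analogous in effect to na.last = TRUE). */
static int cmp_nan_last(const void *pa, const void *pb)
{
    double a = *(const double *)pa;
    double b = *(const double *)pb;
    int a_nan = isnan(a) != 0;
    int b_nan = isnan(b) != 0;

    if (a_nan || b_nan)
        return a_nan - b_nan;   /* NaN sorts after a number; two NaNs tie */
    return (a > b) - (a < b);   /* ordinary three-way comparison */
}
```

Calling `qsort(v, n, sizeof v[0], cmp_nan_last)` on a double array then leaves any NaN entries at the tail, after the sorted numbers.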
