Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #37 #38

Merged
merged 2 commits into from
Jan 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: minty
Title: Minimal Type Guesser
Version: 0.0.4
Version: 0.0.5
Authors@R: c(
person("Chung-hong", "Chan", role = c("aut", "cre"), email = "[email protected]", comment = c(ORCID = "0000-0002-6232-7530")),
person("Hadley", "Wickham", , "[email protected]", role = "aut", comment = "author of the ported code from readr"),
Expand Down
39 changes: 19 additions & 20 deletions misc/benchmark.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ suppressPackageStartupMessages(library(readr))
Sys.time()
```

[1] "2024-11-08 11:32:50 CET"
[1] "2025-01-05 12:42:59 CET"

Under 200 rows, simple

Expand All @@ -19,10 +19,10 @@ bench::mark(minty::type_convert(iris_chr),
```

# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 minty::type_convert(iris_chr) 417.2µs 483.39µs 2020. 702.53KB 0
2 suppressMessages(readr::type_c… 2.23ms 2.34ms 345. 1.97MB 0
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl>
1 minty::type_convert(iris_chr) 393.95µs 416.7µs 2302. 702.53KB 0
2 suppressMessages(readr::type_co… 2.14ms 2.2ms 368. 1.97MB 0

Many rows

Expand All @@ -39,8 +39,8 @@ bench::mark(minty::type_convert(flights_chr, guess_integer = TRUE),
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl>
1 minty::type_convert(flights_chr, gu… 1.14s 1.19s 0.842 189MB 15.5
2 suppressMessages(readr::type_conver… 1.06s 1.06s 0.918 153MB 16.9
1 minty::type_convert(flights_chr, gu… 1.12s 1.17s 0.859 189MB 15.8
2 suppressMessages(readr::type_conver… 1.03s 1.03s 0.946 153MB 17.4

Many row, guess_max

Expand All @@ -56,20 +56,20 @@ bench::mark(minty::type_convert(flights_chr, guess_integer = TRUE, guess_max = 5
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 minty::type_convert(flights_ch… 552.31ms 559.35ms 1.73 153MB 19.0
2 suppressMessages(readr::type_c… 1.07s 1.12s 0.909 153MB 16.5
1 minty::type_convert(flights_ch… 543.66ms 551.12ms 1.76 153MB 19.4
2 suppressMessages(readr::type_c… 1.04s 1.04s 0.940 153MB 17.3

``` r
sessionInfo()
```

R version 4.4.1 (2024-06-14)
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
Expand All @@ -86,14 +86,13 @@ sessionInfo()
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] readr_2.1.5 minty_0.0.4
[1] readr_2.1.5 minty_0.0.5

loaded via a namespace (and not attached):
[1] crayon_1.5.3 vctrs_0.6.5 cli_3.6.3 knitr_1.48
[1] crayon_1.5.3 vctrs_0.6.5 cli_3.6.3 knitr_1.49
[5] rlang_1.1.4 xfun_0.49 bench_1.1.3 jsonlite_1.8.9
[9] glue_1.8.0 htmltools_0.5.8.1 hms_1.1.3 fansi_1.0.6
[13] rmarkdown_2.28 evaluate_1.0.1 tibble_3.2.1 tzdb_0.4.0
[17] fastmap_1.2.0 yaml_2.3.10 profmem_0.6.0 lifecycle_1.0.4
[21] compiler_4.4.1 pkgconfig_2.0.3 nycflights13_1.0.2 digest_0.6.37
[25] R6_2.5.1 utf8_1.2.4 pillar_1.9.0 magrittr_2.0.3
[29] tools_4.4.1
[9] glue_1.8.0 htmltools_0.5.8.1 hms_1.1.3 rmarkdown_2.29
[13] evaluate_1.0.1 tibble_3.2.1 tzdb_0.4.0 fastmap_1.2.0
[17] yaml_2.3.10 profmem_0.6.0 lifecycle_1.0.4 compiler_4.4.2
[21] pkgconfig_2.0.3 nycflights13_1.0.2 digest_0.6.37 R6_2.5.1
[25] utf8_1.2.4 pillar_1.10.0 magrittr_2.0.3 tools_4.4.2
12 changes: 9 additions & 3 deletions src/CollectorGuess.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,11 +33,17 @@ bool canParse(const cpp11::strings& x, const canParseFun& canParseF, LocaleInfo*
return true;
}

bool allMissing(const cpp11::strings& x) {
bool allMissing(const cpp11::strings& x, bool trim_ws) {
for (const auto & i : x) {
if (i != NA_STRING && i.size() > 0) {
if (!trim_ws && i != NA_STRING && i.size() > 0) {
return false;
}
if (trim_ws) {
auto istr = trimString(std::string(i));
if (i != NA_STRING && istr != "") {
return false;
}
}
}
return true;
}
Expand Down Expand Up @@ -136,7 +142,7 @@ static bool isDateTime(const std::string& x, LocaleInfo* pLocale) {
return "character";
}

if (allMissing(input)) {
if (allMissing(input, trim_ws)) {
return "logical";
}

Expand Down
7 changes: 5 additions & 2 deletions src/utils.h
Original file line number Diff line number Diff line change
Expand Up @@ -84,8 +84,11 @@ inline bool starts_with_comment(
inline std::string trimString(std::string const &str, std::string const &whitespace=" \r\n\t\v\f") {
auto start = str.find_first_not_of(whitespace);
auto end = str.find_last_not_of(whitespace);

return str.substr(start, end - start + 1);
if (start != std::string::npos) {
return str.substr(start, end - start + 1);
} else {
return "";
}
}

#endif
10 changes: 10 additions & 0 deletions tests/testthat/test-parsing.R
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,13 @@ test_that("parse_guess() trim_ws #32 or tidyverse/readr#1536", {
expect_equal(parse_guess(c(" 1", " 2", " 3"), trim_ws = FALSE), c(" 1", " 2", " 3"))
expect_equal(parse_guess(c(" TRUE", "FALSE ", " T "), trim_ws = TRUE), c(TRUE, FALSE, TRUE))
})

test_that("all empty with trim_ws won't crash, #37 and ropensci/readODS#211", {
expect_error(minty:::guess_parser(c(" ", " "), trim_ws = TRUE), NA)
expect_error(minty:::guess_parser(c(" "), trim_ws = TRUE), NA)
expect_equal(minty:::guess_parser(c(" "), trim_ws = TRUE), "logical")
expect_equal(minty:::guess_parser(c(" ", " "), trim_ws = TRUE), "logical")
expect_error(minty:::guess_parser(c(" "), trim_ws = FALSE), NA)
expect_error(minty:::guess_parser(c(" ", "a"), trim_ws = FALSE), NA)
expect_equal(minty:::guess_parser(c(" ", "a")), "character")
})
12 changes: 12 additions & 0 deletions tests/testthat/test-type-convert.R
Original file line number Diff line number Diff line change
Expand Up @@ -98,3 +98,15 @@ test_that("r_is_string_cpp11", {
expect_false(r_is_string_cpp11(NA_character_))
expect_false(r_is_string_cpp11(NULL))
})

test_that("integration tests for #37, also ropensci/readODS#211", {
input <- structure(list(picture_archive_url = c("", "", "", "", ""),
video_url = c(" ", " ", " ", " ", " ")),
row.names = c(NA, -5L), class = "data.frame")

expect_error(type_convert(input, trim_ws = TRUE), NA)
output <- type_convert(input, trim_ws = TRUE)
expect_true(is.logical(output$video_url))
output <- type_convert(input, trim_ws = FALSE)
expect_false(is.logical(output$video_url))
})