Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

signal_state() is quite slow #167

Closed
mgirlich opened this issue Jul 12, 2023 · 14 comments
Closed

signal_state() is quite slow #167

mgirlich opened this issue Jul 12, 2023 · 14 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@mgirlich
Copy link

As it is used for superseded functions this isn't very nice.

f <- function() {
  lifecycle::signal_stage("superseded", "recode()", "case_match()")
}

for (i in 1:1000) {
  f()
}
image

E.g. this made dplyr::recode() so much slower than before.

@DavisVaughan
Copy link
Member

Not really sure the readCitationFile part is "real" here. signaling is definitely slow though, and has been a reoccurring issue

library(lifecycle)
xx <- profvis::profvis(for (i in 1:1000) signal_stage("superseded", "recode()", "case_match()"))
print(xx, aggregate = T)
Screenshot 2023-07-12 at 8 57 13 AM

@DavisVaughan
Copy link
Member

See also r-lib/rlang#1609 for .rlang_cli_compat() being slow

@DavisVaughan
Copy link
Member

I think I've looked at isatty() before too and I couldn't ever figure out why it was slow. It is so fast on its own

bench::mark(isatty(stdout()))
#> # A tibble: 1 × 6
#>   expression            min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 isatty(stdout())    246ns    328ns  2223575.        0B        0

Created on 2023-07-12 with reprex v2.0.2

@DavisVaughan
Copy link
Member

Oh but the get_data() call is particularly slow when called from RStudio. i.e. if you run this in a reprex in a separate session it is very fast, but interactively in RStudio it is slow:

> bench::mark(cli:::rstudio$.internal$get_data())
# A tibble: 1 × 13
  expression           min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result      
  <bch:expr>         <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>      
1 cli:::rstudio$.in787µs  809µs     1161.    41.1KB     17.3   538     8      463ms <named list>
# ℹ 3 more variables: memory <list>, time <list>, gc <list>

800us * 100 = 800ms which roughly aligns with my profvis result

It looks like most of the time is in the rstudioapi function calls

> bench::mark(asNamespace("rstudioapi")$isAvailable())
# A tibble: 1 × 13
  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory    
  <bch:expr>   <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>    
1 "asNamespac… 1.11µs 1.31µs   700181.        0B     70.0  9999     1     14.3ms <lgl>  <Rprofmem>
# ℹ 2 more variables: time <list>, gc <list>

> bench::mark(asNamespace("rstudioapi")$getVersion())
# A tibble: 1 × 13
  expression             min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result    
  <bch:expr>           <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>    
1 "asNamespace(\"rstu… 384µs  403µs     2414.    20.6KB     17.2  1125     8      466ms <pckg_vrs>
# ℹ 3 more variables: memory <list>, time <list>, gc <list>

> bench::mark(asNamespace("rstudioapi")$versionInfo()$mode)
# A tibble: 1 × 13
  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory    
  <bch:expr>    <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>    
1 "asNamespace382µs  396µs     2460.    20.6KB     17.2  1147     8      466ms <chr>  <Rprofmem>
# ℹ 2 more variables: time <list>, gc <list>

CC @gaborcsardi

@DavisVaughan
Copy link
Member

Another option is to make rlang:::has_ansi() cache its result so it doesn't have to keep calling cli

@DavisVaughan
Copy link
Member

DavisVaughan commented Jul 12, 2023

It looks like rstudioapi's versionInfo() is actually the one calling readCitationFile()!

xx <- profvis::profvis(for (i in 1:1000) asNamespace("rstudioapi")$callFun("versionInfo"))
print(xx, aggregate = TRUE)
Screenshot 2023-07-12 at 9 17 28 AM
> asNamespace("rstudioapi")$versionInfo()
$citation
To cite RStudio in publications use:

  Posit team (2023). RStudio: Integrated Development Environment for R. Posit Software,
  PBC, Boston, MA. URL http://www.posit.co/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {RStudio: Integrated Development Environment for R},
    author = {{Posit team}},
    organization = {Posit Software, PBC},
    address = {Boston, MA},
    year = {2023},
    url = {http://www.posit.co/},
  }

$mode
[1] "desktop"

$version
[1] ‘2023.6.0.421$long_version
[1] "2023.06.0+421"

$release_name
[1] "Mountain Hydrangea"

This is used by cli when getting the version number and the mode
https://github.com/r-lib/cli/blob/7eb56ebc4c315ecfef74148881b15a522c8946bf/R/aab-rstudio-detect.R#L35-L36

One step cli could take is to just make one call to versionInfo(), because that has the version in it as versionInfo()$version, so there is no need to also call $getVersion().

But I think a much better solution would be:

Then have cli switch to using getMode()

@DavisVaughan
Copy link
Member

I've broken this down to 4 individual fixes

@gaborcsardi
Copy link
Member

@DavisVaughan Why not cache the number of colors, so rstudio$detect() is only called once and you don't need to wait for any other packages to get updated?

@DavisVaughan
Copy link
Member

DavisVaughan commented Jul 12, 2023

Yea I quickly mentioned that here #167 (comment). I do think that would be a good quick solution

It won't be dynamic to any options like "crayon.enabled" anymore, but maybe that is ok, not sure

@gaborcsardi
Copy link
Member

It won't be dynamic to any options like "crayon.enabled" anymore, but maybe that is ok, not sure

I an not sure what you mean.

@DavisVaughan
Copy link
Member

DavisVaughan commented Jul 12, 2023

  • Call has_ansi(), it reports TRUE because cli::num_ansi_colors() > 1 is TRUE
  • Set options(crayon.enabled = FALSE)
  • Call has_ansi() again. It's cached so it reports TRUE again even though in theory if it wasn't cached it would report FALSE

i.e. here is what happens today without caching, which would no longer be true with caching

> rlang:::has_ansi()
[1] TRUE
> options(crayon.enabled = F)
> rlang:::has_ansi()
[1] FALSE

I just don't know how important that is for the places has_ansi() is used

@gaborcsardi
Copy link
Member

You would cache in the function that make a lot of calls downstream. E.g. if one function calls cli many times then there. If another function calls the first function many times, then there.

If there is such a function, and this is not about speeding up a single call...

@hadley
Copy link
Member

hadley commented Nov 1, 2023

I think we can get a big improvement here from switching from expr_deparse() to deparse(), because then we avoid the colour checking path all together.

hadley added a commit that referenced this issue Nov 1, 2023
@hadley hadley added the bug an unexpected problem or unintended behavior label Nov 2, 2023
hadley added a commit that referenced this issue Nov 2, 2023
@hadley
Copy link
Member

hadley commented Nov 2, 2023

There was a ~6x performance improvement in #172, by removing the most obvious bottleneck. We'll make it faster in the future by doing more work lazily, which is tracked in #176.

@hadley hadley closed this as completed Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants