
Clarification On data.table::getDTthreads(verbose = TRUE) #6721

Open
drag05 opened this issue Jan 13, 2025 · 2 comments


@drag05

drag05 commented Jan 13, 2025

The argument definitions in the setDTthreads() documentation make the output of getDTthreads(verbose = TRUE) confusing to read:

> data.table::getDTthreads(verbose = TRUE)
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            16
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          16
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 8 threads with throttle==1024. See ?setDTthreads.
[1] 8

Here, the "threads" argument refers to logical CPU cores. Because the "percent" default is 50 and omp_get_num_procs() reports 16, the value is 8, as stated in the next-to-last row of the output above. However, the environment variable R_DATATABLE_NUM_THREADS is reported as "unset", which is confusing. The documentation for this function should be clarified further. Thank you!
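The arithmetic behind the reported 8 can be sketched in R as follows. This is only an illustration under assumptions: `requested_threads` is a hypothetical helper, not part of data.table, and it assumes that a set R_DATATABLE_NUM_THREADS overrides the percentage, while an unset one falls back to R_DATATABLE_NUM_PROCS_PERCENT (default 50) of omp_get_num_procs().

```r
# Hypothetical helper, not a data.table function: models how the
# R_DATATABLE_* settings might translate into a requested thread count.
requested_threads <- function(num_procs, num_threads = NA_integer_, percent = 50L) {
  # Assumption: an explicit thread count takes precedence over the percentage.
  if (!is.na(num_threads)) {
    return(as.integer(num_threads))
  }
  # Otherwise take the given percentage of logical cores (integer division),
  # but never fewer than one thread.
  max(1L, as.integer((num_procs * percent) %/% 100L))
}

# Values from the verbose output above: 16 logical cores, nothing else set.
requested_threads(16L)
```

With 16 logical cores and both variables unset, the sketch yields 8, matching the last line of the output. Under this reading, "unset" for R_DATATABLE_NUM_THREADS simply means the percentage default is the one in effect.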

@ben-schwen
Member

I don't see how the documentation is wrong.

Sys.getenv("R_DATATABLE_NUM_PROCS_PERCENT") will return "", same as Sys.getenv("R_DATATABLE_NUM_THREADS") when not set.
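A short base R sketch of that point, for readers who want to check it themselves. The variable name below is hypothetical and chosen only so that it is almost certainly not set; the `unset` argument of Sys.getenv() is what lets you tell "not set" apart from "set to an empty string".

```r
# Hypothetical variable name, used only for illustration.
var <- "R_DATATABLE_DEMO_UNSET_VARIABLE"
Sys.unsetenv(var)  # make sure it really is unset in this session

# By default an unset variable comes back as the empty string.
default_lookup <- Sys.getenv(var)

# Passing unset = NA distinguishes "unset" from "set to an empty string".
explicit_lookup <- Sys.getenv(var, unset = NA)
is_set <- !is.na(explicit_lookup)
```

The same check can be applied to R_DATATABLE_NUM_THREADS or R_DATATABLE_NUM_PROCS_PERCENT to see why the verbose output reports them as "unset".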

@drag05
Author

drag05 commented Jan 14, 2025

@ben-schwen The documentation is not wrong. Here is why I think it only needs to be expanded:

  1. The "threads" argument in setDTthreads refers to logical CPU cores (hardware), while "threads" in the generic sense refers to software processes. Typically, hundreds or thousands of threads are involved when running programs on a CPU with 18, 24, etc. logical cores. Hence the confusion.
  2. There are two groups of environment variables in the example above: the "R_DATATABLE" group and the "OMP" group. The documentation should expound on the advantages of, and possible conflicts from, setting values in both groups simultaneously. Hence the need for clarification.
    Thank you!
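One way to picture how the two groups might interact is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that the "OMP" values act only as upper bounds on whatever the "R_DATATABLE" group requests, so the smallest value wins. `effective_threads` is a hypothetical helper, not a data.table function.

```r
# Hypothetical helper, not a data.table function.
# Assumption: OMP settings only cap the requested count; they never raise it.
effective_threads <- function(requested, omp_limits) {
  # Ignore limits that are unset (NA), then take the smallest remaining value.
  limits <- omp_limits[!is.na(omp_limits)]
  as.integer(min(c(requested, limits)))
}

# Values taken from the verbose output above; unset variables are NA.
omp_limits <- c(
  omp_get_thread_limit = 2147483647,
  omp_get_max_threads  = 16,
  OMP_THREAD_LIMIT     = NA,
  OMP_NUM_THREADS      = NA
)

# 8 requested (50% of 16 cores), no OMP limit below that.
effective_threads(8L, omp_limits)
```

Under this assumption, setting OMP_NUM_THREADS to 4 would bring the result down to 4 even though the "R_DATATABLE" group still asks for 8, which is the kind of conflict the documentation could spell out.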
