From 156cd51fd779fbff5a9e5da928c5b3624114b185 Mon Sep 17 00:00:00 2001
From: Arun Jose <40291569+arunjose696@users.noreply.github.com>
Date: Fri, 6 Sep 2024 15:33:08 +0200
Subject: [PATCH] DOCS-#7382: Add documentation on how to use Modin Native query compiler (#7386)

Co-authored-by: Iaroslav Igoshev
Signed-off-by: arunjose696
---
 docs/usage_guide/optimization_notes/index.rst | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/docs/usage_guide/optimization_notes/index.rst b/docs/usage_guide/optimization_notes/index.rst
index 0dcbe5a25d7..6e9d1ca7d63 100644
--- a/docs/usage_guide/optimization_notes/index.rst
+++ b/docs/usage_guide/optimization_notes/index.rst
@@ -314,6 +314,37 @@ Copy-pastable example, showing how mixing pandas and Modin DataFrames in a singl
 
   # Possible output: TypeError
 
+Execute DataFrame operations using NativeQueryCompiler
+""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+By default, Modin distributes data across partitions and performs operations
+using the ``PandasQueryCompiler``. However, for certain scenarios such as handling small or empty DataFrames,
+distributing them may introduce unnecessary overhead. In such cases, it's more efficient to default
+to pandas at the query compiler layer. This can be achieved by setting the ``cfg.NativeDataframeMode``
+:doc:`configuration variable: ` to ``Pandas``. When set to ``Pandas``, all operations in Modin default to pandas, and the DataFrames are not distributed,
+avoiding additional overhead. This configuration can be toggled on or off depending on whether
+DataFrame distribution is required.
+
+DataFrames created while the ``NativeDataframeMode`` is active will continue to use the ``NativeQueryCompiler``
+even after the config is disabled. Modin supports interoperability between distributed Modin DataFrames and
+those using the ``NativeQueryCompiler``.
+
+.. code-block:: python
+
+  import modin.pandas as pd
+  import modin.config as cfg
+
+  # This dataframe will be distributed and use `PandasQueryCompiler` by default
+  df_distributed = pd.DataFrame(...)
+
+  # Set mode to "Pandas" to avoid distribution and use `NativeQueryCompiler`
+  cfg.NativeDataframeMode.put("Pandas")
+  df_native_qc = pd.DataFrame(...)
+
+  # Revert to default settings for distributed dataframes
+  cfg.NativeDataframeMode.put("Default")
+  df_distributed = pd.DataFrame(...)
+
 Operation-specific optimizations
 """"""""""""""""""""""""""""""""