
Commit

Built site for gh-pages
kevinheavey committed Dec 26, 2023
1 parent 1b64977 commit 0d824f1
Showing 13 changed files with 119 additions and 125 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
b32e6d27
5d1e1864
36 changes: 18 additions & 18 deletions indexing.html
@@ -322,7 +322,7 @@ <h2 data-number="1.2" class="anchored" data-anchor-id="read-the-data"><span clas
</div>
</div>
<div class="callout-body-container callout-body">
<p>The examples in this book use the <a href="https://pola-rs.github.io/polars-book/user-guide/coming_from_pandas.html?highlight=lazy#be-lazy">lazy evaluation</a> feature of Polars less than you should. It’s just inconvenient to use the lazy API when displaying dozens of intermediate results for educational purposes.</p>
<p>The examples in this book use the <a href="https://pola-rs.github.io/polars/user-guide/migration/pandas/#be-lazy">lazy evaluation</a> feature of Polars less than you should. It’s just inconvenient to use the lazy API when displaying dozens of intermediate results for educational purposes.</p>
</div>
</div>
<div class="panel-tabset">
@@ -337,8 +337,8 @@ <h2 data-number="1.2" class="anchored" data-anchor-id="read-the-data"><span clas
<div class="cell-output cell-output-display" data-execution_count="2">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -354,7 +354,7 @@ <h2 data-number="1.2" class="anchored" data-anchor-id="read-the-data"><span clas
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>df_pd <span class="op">=</span> pd.read_csv(extracted)</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>df_pd</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/tmp/ipykernel_10491/2805799744.py:3: DtypeWarning: Columns (76,77,84) have mixed types. Specify dtype option on import or set low_memory=False.
<pre><code>/tmp/ipykernel_13993/2805799744.py:3: DtypeWarning: Columns (76,77,84) have mixed types. Specify dtype option on import or set low_memory=False.
df_pd = pd.read_csv(extracted)</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="3">
@@ -545,8 +545,8 @@ <h3 data-number="1.4.1" class="anchored" data-anchor-id="rows-by-number-columns-
<div class="cell-output cell-output-display" data-execution_count="4">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -556,12 +556,12 @@ <h3 data-number="1.4.1" class="anchored" data-anchor-id="rows-by-number-columns-
</div>
<p>Or using <code>take</code>:</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>df_pl.select(pl.col([<span class="st">"Dest"</span>, <span class="st">"Tail_Number"</span>]).take(<span class="bu">list</span>(<span class="bu">range</span>(<span class="dv">12</span>, <span class="dv">16</span>))))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>df_pl.select(pl.col([<span class="st">"Dest"</span>, <span class="st">"Tail_Number"</span>]).gather(<span class="bu">list</span>(<span class="bu">range</span>(<span class="dv">12</span>, <span class="dv">16</span>))))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -576,8 +576,8 @@ <h3 data-number="1.4.1" class="anchored" data-anchor-id="rows-by-number-columns-
<div class="cell-output cell-output-display" data-execution_count="6">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -647,8 +647,8 @@ <h3 data-number="1.4.2" class="anchored" data-anchor-id="rows-by-string-index-co
<div class="cell-output cell-output-display" data-execution_count="8">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -729,8 +729,8 @@ <h3 data-number="1.4.3" class="anchored" data-anchor-id="rows-by-number-columns-
<div class="cell-output cell-output-display" data-execution_count="10">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -798,8 +798,8 @@ <h2 data-number="1.5" class="anchored" data-anchor-id="settingwithcopy"><span cl
<div class="cell-output cell-output-display" data-execution_count="12">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -814,7 +814,7 @@ <h2 data-number="1.5" class="anchored" data-anchor-id="settingwithcopy"><span cl
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>f[f[<span class="st">'a'</span>] <span class="op">&lt;=</span> <span class="dv">3</span>][<span class="st">'b'</span>] <span class="op">=</span> f[<span class="st">'b'</span>] <span class="op">//</span> <span class="dv">10</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>f</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/tmp/ipykernel_10491/1317853993.py:2: SettingWithCopyWarning:
<pre><code>/tmp/ipykernel_13993/1317853993.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

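The `settingwithcopy` hunk above runs the chained assignment `f[f['a'] <= 3]['b'] = f['b'] // 10`, and the resulting `SettingWithCopyWarning` recommends `.loc[row_indexer,col_indexer] = value` instead. A minimal sketch of that recommended form, assuming a small hypothetical frame `f` with integer columns `a` and `b` (the book's actual `f` is not visible in this diff):

```python
import pandas as pd

# Hypothetical toy frame; only the column names a and b come from the diff.
f = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# f[mask]["b"] = ... writes into the temporary copy returned by f[mask],
# which is what triggers SettingWithCopyWarning; f itself is not updated.
# A single .loc call with a row mask and a column label writes into f.
mask = f["a"] <= 3
f.loc[mask, "b"] = f["b"] // 10  # right-hand side is aligned on the index

print(f)
```

Only the rows where `a <= 3` change, so `b` ends up as `[1, 2, 3, 40, 50]`.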
8 changes: 4 additions & 4 deletions method_chaining.html
@@ -435,8 +435,8 @@ <h2 data-number="2.3" class="anchored" data-anchor-id="bringing-it-all-back-home
<div class="cell-output cell-output-display" data-execution_count="7">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -690,7 +690,7 @@ <h3 data-number="2.4.1" class="anchored" data-anchor-id="daily-flights"><span cl
<span id="cb11-27"><a href="#cb11-27" aria-hidden="true" tabindex="-1"></a> .plot()</span>
<span id="cb11-28"><a href="#cb11-28" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/tmp/ipykernel_10556/4223446110.py:17: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
<pre><code>/tmp/ipykernel_14072/4223446110.py:17: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
.groupby(["IATA_CODE_Reporting_Airline", pd.Grouper(freq="H")])[</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="10">
@@ -773,7 +773,7 @@ <h3 data-number="2.4.2" class="anchored" data-anchor-id="planes-with-multiple-da
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a>sns.boxplot(x<span class="op">=</span><span class="st">"turn"</span>, y<span class="op">=</span><span class="st">"DepDelay"</span>, data<span class="op">=</span>flights_pd, ax<span class="op">=</span>ax)</span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a>ax.set_ylim(<span class="op">-</span><span class="dv">50</span>, <span class="dv">50</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/tmp/ipykernel_10556/2848021590.py:12: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
<pre><code>/tmp/ipykernel_14072/2848021590.py:12: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
x.groupby(["FlightDate", "Tail_Number"])</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="12">
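Both `FutureWarning`s in the `method_chaining.html` hunks are raised because `groupby` is called on a categorical key without an explicit `observed` argument. Passing it explicitly silences the warning and states which behavior you want. A small sketch, assuming a hypothetical frame with a categorical `airline` column (not the book's flight data):

```python
import pandas as pd

# Hypothetical data: "DL" is a declared category that never occurs.
df = pd.DataFrame(
    {
        "airline": pd.Categorical(["AA", "AA", "UA"], categories=["AA", "DL", "UA"]),
        "delay": [5, 15, 30],
    }
)

# observed=True keeps only category values that actually occur in the data
# (the future default named in the warning).
seen = df.groupby("airline", observed=True)["delay"].mean()

# observed=False keeps every declared category; the empty "DL" group gets NaN.
everything = df.groupby("airline", observed=False)["delay"].mean()

print(seen)
print(everything)
```

With many unused categories (or several categorical keys), `observed=False` can produce a large number of empty groups, which is one reason pandas is moving the default to `True`.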
32 changes: 12 additions & 20 deletions performance.html
@@ -739,8 +739,8 @@ <h3 data-number="3.2.3" class="anchored" data-anchor-id="performance-comparison"
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a> .collect()</span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>CPU times: user 82.6 ms, sys: 29.6 ms, total: 112 ms
Wall time: 34.8 ms</code></pre>
<pre><code>CPU times: user 94.4 ms, sys: 28.6 ms, total: 123 ms
Wall time: 39.1 ms</code></pre>
</div>
</div>
</div>
@@ -762,8 +762,8 @@ <h3 data-number="3.2.3" class="anchored" data-anchor-id="performance-comparison"
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> .rename(columns<span class="op">=</span>{<span class="st">"↓OVA"</span>: <span class="st">"OVA"</span>})</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>CPU times: user 6.53 s, sys: 402 ms, total: 6.93 s
Wall time: 6.93 s</code></pre>
<pre><code>CPU times: user 6.12 s, sys: 327 ms, total: 6.45 s
Wall time: 6.44 s</code></pre>
</div>
</div>
</div>
@@ -779,8 +779,8 @@ <h3 data-number="3.2.3" class="anchored" data-anchor-id="performance-comparison"
<div class="cell-output cell-output-display" data-execution_count="9">

<div><style>
.dataframe > thead > tr > th,
.dataframe > tbody > tr > td {
.dataframe > thead > tr,
.dataframe > tbody > tr {
text-align: right;
white-space: pre-wrap;
}
@@ -1027,7 +1027,7 @@ <h3 data-number="3.3.2" class="anchored" data-anchor-id="calculate-great-circle-
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a>).collect()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>4.42 s ± 261 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
<pre><code>5.98 s ± 33.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<p>On my machine the NumPy version tends to be 5-20% faster than the pure Polars version:</p>
@@ -1042,7 +1042,7 @@ <h3 data-number="3.3.2" class="anchored" data-anchor-id="calculate-great-circle-
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a>).collect()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>5.21 s ± 75.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
<pre><code>4.97 s ± 141 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<p>This may not be a huge performance difference, but it at least means you don’t sacrifice speed when relying on NumPy. There are some <a href="https://pola-rs.github.io/polars-book/user-guide/howcani/interop/numpy.html">gotchas</a> though so watch out for those.</p>
@@ -1057,7 +1057,7 @@ <h3 data-number="3.3.2" class="anchored" data-anchor-id="calculate-great-circle-
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a> collected[<span class="st">"LONGITUDE_right"</span>].to_numpy()</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>5.63 s ± 33.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
<pre><code>5.52 s ± 68.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
</section>
@@ -1074,7 +1074,7 @@ <h2 data-number="3.4" class="anchored" data-anchor-id="polars-can-be-slower-than
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> pandas_transform(df: pd.DataFrame) <span class="op">-&gt;</span> pd.DataFrame:</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> g <span class="op">=</span> df.groupby(<span class="st">"name"</span>)[<span class="st">"value2"</span>]</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a> v <span class="op">=</span> df[<span class="st">"value2"</span>]</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> (v <span class="op">-</span> g.transform(np.mean)) <span class="op">/</span> g.transform(np.std)</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> (v <span class="op">-</span> g.transform(<span class="st">"mean"</span>)) <span class="op">/</span> g.transform(<span class="st">"std"</span>)</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> polars_transform() <span class="op">-&gt;</span> pl.Expr:</span>
@@ -1092,23 +1092,15 @@ <h2 data-number="3.4" class="anchored" data-anchor-id="polars-can-be-slower-than
<div class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb23"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="op">%</span>timeit rand_df_pl.select(polars_transform())</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2.2 s ± 153 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
<pre><code>2.25 s ± 153 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
</div>
<div id="tabset-6-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-6-2-tab">
<div class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb25"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="op">%</span>timeit pandas_transform(rand_df_pd)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/tmp/ipykernel_10730/1003544873.py:9: FutureWarning: The provided callable &lt;function mean at 0x7f99d02a0a40&gt; is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
return (v - g.transform(np.mean)) / g.transform(np.std)</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>/tmp/ipykernel_10730/1003544873.py:9: FutureWarning: The provided callable &lt;function std at 0x7f99d02a0c20&gt; is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
return (v - g.transform(np.mean)) / g.transform(np.std)</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>1.67 s ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
<pre><code>1.65 s ± 26.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
</div>
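The last `performance.html` hunks replace `g.transform(np.mean)` and `g.transform(np.std)` with the string aliases `"mean"` and `"std"`, which removes the two `FutureWarning`s from the output. According to the removed warning text, pandas was already dispatching those callables to `SeriesGroupBy.mean` and `SeriesGroupBy.std`, so the strings keep the same numbers. Note that `SeriesGroupBy.std` computes the sample standard deviation (`ddof=1`), unlike a direct `np.std` call (`ddof=0`). A minimal sketch on a tiny hypothetical frame with the same `name` and `value2` columns:

```python
import warnings

import pandas as pd


def pandas_transform(df: pd.DataFrame) -> pd.Series:
    # Group-wise z-score of value2 within each name.
    g = df.groupby("name")["value2"]
    v = df["value2"]
    # String aliases dispatch to the built-in groupby reductions.
    return (v - g.transform("mean")) / g.transform("std")


# Hypothetical toy data; the book uses a much larger random frame.
toy = pd.DataFrame({"name": ["a", "a", "b", "b"], "value2": [1.0, 3.0, 10.0, 20.0]})

# Turn any warning into an error to confirm the string form is warning-free.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    z = pandas_transform(toy)

print(z)
```

Each group has two points, so with `ddof=1` every value lands at plus or minus `1 / sqrt(2)` (about 0.7071).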