Commit

merge with main
dwreeves committed Aug 31, 2024
2 parents 8cecd98 + 854edbb commit acf5d5c
Showing 16 changed files with 1,436 additions and 982 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
integration_tests/package-lock.yml
dbt.duckdb
dbt.duckdb.wal
.user.yml
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -8,7 +8,7 @@ repos:
- id: trailing-whitespace

- repo: https://github.com/charliermarsh/ruff-pre-commit
-rev: v0.0.284
+rev: v0.6.2
hooks:
- id: ruff

11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,16 @@
# Changelog

+### `0.2.5`
+
+- Fix bug where `exog` and `group_by` did not handle `str` inputs e.g. `exog="x"`.
+- Fix bug where `group_by` for `method='fwl'` with exactly 1 exog variable did not work. (Explanation: `method='fwl'` dispatches to a different macro for the special case of 1 exog variable, and `group_by` was not implemented correctly here.)
+- Fix bug where `safe` mode did not work for `method='chol'`.
+- Improved docs by hiding everything except `ols()`, improved description of `ols()` macro, and added missing arg.
+
+### `0.2.4`
+
+- Fix minor incompatibility with Redshift; contributed by [@steelcd](https://github.com/steelcd).
+
### `0.2.3`

- Added Postgres support in integration tests + fixed bugs that prevented Postgres from working.
12 changes: 7 additions & 5 deletions README.md
@@ -1,7 +1,7 @@
<p align="center">
<picture>
-<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/dwreeves/dbt_linreg/main/docs/src/img/dbt-linreg-banner-dark.png">
-<img src="https://raw.githubusercontent.com/dwreeves/dbt_linreg/main/docs/src/img/dbt-linreg-banner-light.png" alt="dbt_linreg logo">
+<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/dwreeves/dbt_linreg/main/docs/src/img/dbt-linreg-banner-dark.png#readme-logo">
+<img src="https://raw.githubusercontent.com/dwreeves/dbt_linreg/main/docs/src/img/dbt-linreg-banner-light.png#readme-logo" alt="dbt_linreg logo">
</picture>
</p>
<p align="center">
@@ -32,7 +32,7 @@ Add this the `packages:` list your dbt project's `packages.yml`:

```yaml
- package: "dwreeves/dbt_linreg"
-version: "0.2.3"
+version: "0.2.5"
```
The full file will look something like this:
@@ -43,7 +43,7 @@ packages:
# Other packages here
# ...
- package: "dwreeves/dbt_linreg"
-version: "0.2.3"
+version: "0.2.5"
```
# Examples
@@ -193,7 +193,8 @@ def ols(
format_options: Optional[dict[str, Any]] = None,
group_by: Optional[Union[str, list[str]]] = None,
alpha: Optional[Union[float, list[float]]] = None,
-method: Literal['chol', 'fwl'] = 'chol'
+method: Literal['chol', 'fwl'] = 'chol',
+method_options: Optional[dict[str, Any]] = None
):
...
```
@@ -277,6 +278,7 @@ There are a few reasons why this method is discouraged over the `chol` method:
- 🐌 It tends to be much slower in OLAP systems, and struggles to efficiently calculate large number of columns.
- 📊 It does not calculate standard errors.
- 😕 For ridge regression, coefficients are not accurate; they tend to be off by a magnitude of ~0.01%.
+- ⚠️ It does not work in all databases because it relies on `COVAR_POP`.

So when should you use `fwl`? The main use case is in OLTP systems (e.g. Postgres) for unregularized coefficient estimation. Long story short, the `chol` method relies on subquery optimization to be more performant than `fwl`; however, OLTP systems do not benefit at all from subquery optimization. This means that `fwl` is slightly more performant in this context.

2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: "dbt_linreg"
-version: "0.2.3"
+version: "0.2.5"

# 1.2 is required because of modules.itertools.
require-dbt-version: [">=1.2.0", "<2.0.0"]
10 changes: 5 additions & 5 deletions docs/mkdocs.yml
@@ -12,16 +12,16 @@ theme:
name: material
palette:
- media: "(prefers-color-scheme: dark)"
-scheme: default
+scheme: slate
primary: black
-accent: light-blue
+accent: light blue
toggle:
icon: material/lightbulb-outline
name: Switch to light mode
- media: "(prefers-color-scheme: light)"
scheme: default
primary: white
-accent: light-blue
+accent: light blue
toggle:
icon: material/lightbulb
name: Switch to dark mode
@@ -42,11 +42,11 @@ markdown_extensions:
- markdown_include.include:
base_path: docs
- sane_lists
extra_css:
- css/extra.css
extra:
social:
- icon: fontawesome/brands/github
link: https://github.com/dwreeves/dbt_linreg
- icon: fontawesome/brands/linkedin
link: https://www.linkedin.com/in/daniel-reeves-27700545/
- icon: fontawesome/brands/twitter
link: https://twitter.com/mueblesfeos
13 changes: 13 additions & 0 deletions docs/src/css/extra.css
@@ -0,0 +1,13 @@
[data-md-color-scheme="light"] img[src$="#only-dark"],
[data-md-color-scheme="light"] img[src$="#gh-dark-mode-only"] {
display: none; /* Hide dark images in light mode */
}

[data-md-color-scheme="dark"] img[src$="#only-light"],
[data-md-color-scheme="dark"] img[src$="#gh-light-mode-only"] {
display: none; /* Hide light images in dark mode */
}

img[src$="#readme-logo"] {
display: none;
}
12 changes: 12 additions & 0 deletions docs/src/index.md
@@ -1 +1,13 @@
+---
+hide:
+- title
+- toc
+- navigation
+---
+
+<p align="center">
+<img src="img/dbt-linreg-banner-light.png#only-light" align="center">
+<img src="img/dbt-linreg-banner-dark.png#only-dark" align="center">
+</p>
+
{!../README.md!}
7 changes: 5 additions & 2 deletions macros/linear_regression/ols.sql
@@ -21,8 +21,7 @@
2. Dispatches the appropriate call.

The actual calculations occur elsewhere in the code, depending on the
-implementation chosen. (At the moment, the only implementation method
-supported is method='fwl'.)
+implementation chosen.

#############################################################################}

@@ -61,6 +60,8 @@
{% else %}
{% set exog = [exog] %}
{% endif %}
+{% elif exog is string %}
+{% set exog = [exog] %}
{% endif %}

{% if group_by is not iterable %}
@@ -69,6 +70,8 @@
{% else %}
{% set group_by = [group_by] %}
{% endif %}
+{% elif group_by is string %}
+{% set group_by = [group_by] %}
{% endif %}

{% if alpha is not iterable and alpha is not none %}
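Note on the `ols.sql` change above: the new `is string` branches matter because Jinja's `iterable` test is true for strings, so an input such as `exog="x"` previously skipped the wrapping step. The Python sketch below mirrors that normalization outside of Jinja. `as_list` is a hypothetical helper, not part of the package, and its handling of `None` is an assumption, since that branch of the macro is not visible in this diff.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalize a scalar-or-sequence argument into a list (hypothetical helper)."""
    if value is None:
        # Assumption: a missing argument becomes an empty list.
        return []
    if isinstance(value, str):
        # Strings are iterable, so they need an explicit check;
        # otherwise "x" would be treated as a sequence of characters.
        return [value]
    try:
        # Lists, tuples and other iterables are copied into a list.
        return list(value)
    except TypeError:
        # Non-iterable scalars (e.g. a number) get wrapped.
        return [value]
```

With this shape, `as_list("x")` and `as_list(["x"])` produce the same result, which is the behavior the `0.2.5` changelog entry describes for `exog` and `group_by`.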
10 changes: 5 additions & 5 deletions macros/linear_regression/ols_impl_chol/_ols_impl_chol.sql
@@ -73,7 +73,7 @@
{{ return(d) }}
{% endmacro %}
-{% macro _forward_substitution(li) %}
+{% macro _forward_substitution(li, safe=true) %}
{% set d = {} %}
{% for i, j in modules.itertools.combinations_with_replacement(li, 2) %}
{% set ns = namespace() %}
@@ -86,7 +86,7 @@
{% endfor %}
{% set ns.numerator = ns.numerator~')' %}
{% endif %}
-{% if adapter.type() == "postgres" %}
+{% if safe %}
{% do d.update({(i, j): '('~ns.numerator~'/nullif(i'~j~'j'~j~', 0))'}) %}
{% else %}
{% do d.update({(i, j): '('~ns.numerator~'/i'~j~'j'~j~')'}) %}
@@ -119,7 +119,7 @@
)) }}
{%- endif %}
{%- set subquery_optimization = method_options.get('subquery_optimization', True) %}
-{%- set safe_sqrt = method_options.get('safe', True) %}
+{%- set safe_mode = method_options.get('safe', True) %}
{%- set calculate_standard_error = format_options.get('calculate_standard_error', (not alpha)) and format == 'long' %}
{%- if alpha and calculate_standard_error %}
{% do log(
@@ -172,7 +172,7 @@ _dbt_linreg_xtx as (
),
_dbt_linreg_chol as (
-{%- set d = dbt_linreg._cholesky_decomposition(li=xcols, subquery_optimization=subquery_optimization, safe=safe_sqrt) %}
+{%- set d = dbt_linreg._cholesky_decomposition(li=xcols, subquery_optimization=subquery_optimization, safe=safe_mode) %}
{%- if subquery_optimization %}
{%- for i in (xcols | reverse) %}
select
@@ -203,7 +203,7 @@ _dbt_linreg_chol as (
),
_dbt_linreg_inverse_chol as (
{#- The optimal way to calculate is to do each diagonal at a time. #}
-{%- set d = dbt_linreg._forward_substitution(li=xcols) %}
+{%- set d = dbt_linreg._forward_substitution(li=xcols, safe=safe_mode) %}
{%- if subquery_optimization %}
{%- for gap in (range(0, upto) | reverse) %}
select *,
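Note on the `_forward_substitution` change above: the division guard was previously applied only when `adapter.type() == "postgres"`, and it is now controlled by the `safe` option. Wrapping the denominator in `nullif(..., 0)` turns a zero denominator into `NULL`, so the division yields `NULL` instead of raising a division-by-zero error. The Python sketch below reproduces only the string construction of the two branches. `divide_expr` is a hypothetical name used for illustration, not a function in the package.

```python
def divide_expr(numerator: str, denominator: str, safe: bool = True) -> str:
    """Build a SQL division expression, optionally guarded against zero.

    Hypothetical helper mirroring the two branches of the macro.
    """
    if safe:
        # nullif(d, 0) is NULL when d = 0, so the quotient becomes NULL.
        return f"({numerator}/nullif({denominator}, 0))"
    # Unguarded division: the database may raise an error when d = 0.
    return f"({numerator}/{denominator})"
```

For example, `divide_expr("1", "i2j2")` produces `(1/nullif(i2j2, 0))`, while passing `safe=False` produces `(1/i2j2)`.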
2 changes: 1 addition & 1 deletion macros/linear_regression/ols_impl_special/_ols_1var.sql
@@ -26,7 +26,7 @@ _dbt_linreg_cmeans as (
{%- endif %}
_dbt_linreg_base as (
select
-{{ dbt_linreg._gb_cols(group_by, trailing_comma=True) | indent(4) }}
+{{ dbt_linreg._alias_gb_cols(group_by) | indent(4) }}
{%- if alpha and add_constant %}
b.{{ endog }} - _dbt_linreg_cmeans.y as y,
b.{{ exog[0] }} - _dbt_linreg_cmeans.x1 as x1,
Original file line number Diff line number Diff line change
@@ -177,6 +177,14 @@ gb{{ loop.index }} as {{ gb }},
{% endif %}
{% endmacro %}

+{% macro redshift___maybe_round(x, round_) %}
+{% if round_ is not none %}
+{{ return('round(' ~ x ~ ', ' ~ round_ ~ ')') }}
+{% else %}
+{{ return(x) }}
+{% endif %}
+{% endmacro %}
+
{# Alias and write group by columns in a standard way. #}
{% macro _gb_cols(group_by, trailing_comma=False, prefix=None) -%}
{%- if group_by %}
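Note on the `redshift___maybe_round` macro added above: it returns a SQL expression as a string, either unchanged or wrapped in `round(...)` when a precision is supplied. A minimal Python sketch of the same string construction follows. `maybe_round` is a hypothetical stand-in for illustration, not the macro itself.

```python
from typing import Optional


def maybe_round(expr: str, round_: Optional[int] = None) -> str:
    """Wrap a SQL expression in round(...) only when a precision is given."""
    if round_ is not None:
        # Concatenate the pieces the same way the macro does.
        return f"round({expr}, {round_})"
    # No precision requested: return the expression untouched.
    return expr
```

Note that the check is `is not None` rather than truthiness, so a precision of `0` still produces a `round(...)` call.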