-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathREADME.Rmd
113 lines (91 loc) · 3.97 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
<!-- rmarkdown v1 -->
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r set-options, echo=FALSE, cache=FALSE}
library(sqlscore)
library(knitr)
options(width=80)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-",
warning = FALSE,
message = FALSE,
echo = TRUE,
tidy = TRUE,
size="small",
linewidth=80
)
# From https://github.com/yihui/knitr-examples/blob/master/077-wrap-output.Rmd
hook_output = knit_hooks$get('output')
knit_hooks$set(output = function(x, options) {
# this hook is used only when the linewidth option is not NULL
if (!is.null(n <- options$linewidth)) {
x = knitr:::split_lines(x)
# any lines wider than n should be wrapped
if (any(nchar(x) > n)) x = strwrap(x, width = n)
x = paste(x, collapse = '\n')
}
hook_output(x, options)
})
```
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/sqlscore)](https://cran.r-project.org/package=sqlscore)
[![Build Status](https://img.shields.io/travis/wwbrannon/sqlscore.svg?style=flat)](https://travis-ci.org/wwbrannon/sqlscore)
[![Coverage Status](https://codecov.io/gh/wwbrannon/sqlscore/branch/master/graph/badge.svg)](https://codecov.io/github/wwbrannon/sqlscore?branch=master)
[![Downloads](https://cranlogs.r-pkg.org/badges/sqlscore)](https://cran.r-project.org/package=sqlscore)
[![License](https://img.shields.io/:license-mit-blue.svg?style=flat)](https://wwbrannon.mit-license.org/)
# sqlscore
Utilities for scoring GLMs and related models in SQL. Use the create_statement
and select_statement functions to generate scoring queries from model objects.
The most important use case is for very large scoring datasets, especially those
which can't fit into memory or would require too much network or storage I/O if
scored the usual way in R.
The SQL-generating functions in sqlscore handle various formula operators, and
also take care of wrapping the model's linear predictor in the appropriate link
function. If needed, you can also specify a custom link function.
## Usage:
The SQL-generating functions create\_statement and select\_statement do what their
names suggest and generate CREATE TABLE and SELECT statements for model scoring.
If, for example, you have a database table of iris measurements, you can model the
sepal length and generate predictions as follows:
```
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris)
create_statement(mod, src_table="iris", dest_table="iris_scores", pk=c("id"))
```
```{r, echo = FALSE}
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris)
create_statement(mod, src_table="iris", dest_table="iris_scores")
```
To get a SELECT statement that's not wrapped in a CREATE TABLE (so that, e.g.,
you can add your own database-specific pieces of SQL), use select_statement:
```
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris)
select_statement(mod, src_table="iris", pk=c("id"))
```
```{r, echo = FALSE}
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris)
select_statement(mod, src_table="iris")
```
Helper functions include linpred(), which generates an R call object representing
the linear predictor, and score_expression, an S3 generic that handles wrapping
the linear predictor in the response function.
## Supported model types:
Specific packages and models that are known to work include: glm and lm from
package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost,
and bayesglm from package:arm.
Default S3 methods are for objects structured like those of class "glm", so
models not listed here may work if they resemble those objects, but are not
guaranteed to.
## Installation:
Install the released version from CRAN:
```
install.packages('sqlscore')
```
Install the dev version from github:
```
install.packages("devtools")
devtools::install_github("wwbrannon/sqlscore")
```