datap in 2 minutes
Christoph Glur edited this page Aug 15, 2016
Within your datap context (i.e. the datap YAML file), you can:
- define data processing pipes:
  - Where is your data coming from? (internet, file, db, memory, etc.)
  - How should your data be pre-processed? (cleaning, plausibility checking, missing value handling, etc.)
- define data taps by
  - giving short IDs to your datasets
  - organising your datasets into hierarchies
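The split between pipes and taps can be illustrated in a few lines of base R. This is a hypothetical sketch of the idea, not datap's implementation: `make_pipe`, `from_memory`, `drop_missing` and `drop_implausible` are made-up names, and a pipe is modelled as a source function whose output is passed through a list of processor functions in order.

```r
# Hypothetical sketch of the pipe/tap idea in base R; not datap's implementation.

# A pipe: a source function followed by processors applied in order.
# Calling the returned function plays the role of the tap.
make_pipe <- function(source, processors = list()) {
  function() {
    Reduce(function(data, step) step(data), processors, init = source())
  }
}

# Where does the data come from? (here: memory)
from_memory <- function() {
  data.frame(day = 1:4, close = c(23.89, NA, 21.96, -1))
}

# How should it be pre-processed? (missing values, plausibility)
drop_missing     <- function(df) df[!is.na(df$close), , drop = FALSE]
drop_implausible <- function(df) df[df$close > 0, , drop = FALSE]

tap <- make_pipe(from_memory, list(drop_missing, drop_implausible))
prices <- tap()  # keeps only days 1 and 3
prices
```

Modelling each processor as a plain function keeps every pre-processing step testable on its own.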
For example, a simple datap context might look like this:
```yaml
stocks:
  type: structure
  Apple:
    type: tap
    download:
      type: processor
      function: Quandl::Quandl(code = 'YAHOO/AAPL', type = 'xts')
  Tesla:
    type: tap
    download:
      type: processor
      function: Quandl::Quandl(code = 'YAHOO/TSLA', type = 'xts')
indices:
  type: structure
  S&P500:
    type: tap
    download:
      type: processor
      function: quantmod::getSymbols(Symbols = '^GSPC', auto.assign = FALSE)
```
It defines three taps (Apple, Tesla and S&P500) and organises them neatly into a hierarchy of stocks and indices.
Save this file in your project.
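The YAML above is a tree: `structure` nodes group other nodes, and `tap` nodes hold the processors. As a rough illustration, the sketch below writes the same hierarchy as a nested named R list (roughly the shape a YAML parser such as the `yaml` package returns for nested mappings) and walks it to list every tap. `ctx` and `list_taps` are hypothetical names; this is not how datap itself loads a context.

```r
# Hand-written nested list mirroring the YAML context (hypothetical).
ctx <- list(
  stocks = list(
    type = "structure",
    Apple = list(type = "tap", download = list(type = "processor",
      `function` = "Quandl::Quandl(code = 'YAHOO/AAPL', type = 'xts')")),
    Tesla = list(type = "tap", download = list(type = "processor",
      `function` = "Quandl::Quandl(code = 'YAHOO/TSLA', type = 'xts')"))
  ),
  indices = list(
    type = "structure",
    `S&P500` = list(type = "tap", download = list(type = "processor",
      `function` = "quantmod::getSymbols(Symbols = '^GSPC', auto.assign = FALSE)"))
  )
)

# Hypothetical helper: return the path of every tap in the hierarchy.
# Tap nodes end the recursion; other nodes recurse into their list children.
list_taps <- function(node, path = character(0)) {
  if (identical(node[["type"]], "tap")) return(paste(path, collapse = "/"))
  children <- Filter(is.list, node)
  unlist(lapply(names(children),
                function(nm) list_taps(children[[nm]], c(path, nm))),
         use.names = FALSE)
}

taps <- list_taps(ctx)
# returns c("stocks/Apple", "stocks/Tesla", "indices/S&P500")
```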
Once you have defined your datap context, you can call your taps from R and load your data into an R session through a single, consistent API:
```r
library(datap)

# Use forward slashes (or escaped backslashes "\\") in R file paths on Windows
filePath <- "C:/projects/datap/context.yaml"
context <- Load(filePath)
```
The context looks like this:
```r
context
##         levelName
## 1 context
## 2  ¦--stocks
## 3  ¦   ¦--Apple
## 4  ¦   °--Tesla
## 5  °--indices
## 6      °--S&P500
```
You can then navigate directly to a tap and fetch its data:
```r
teslaBars <- context$stocks$Tesla$tap()
head(teslaBars)
##             Open  High   Low Close   Volume Adjusted Close
## 2010-06-29 19.00 25.00 17.54 23.89 18766300          23.89
## 2010-06-30 25.79 30.42 23.30 23.83 17187100          23.83
## 2010-07-01 25.00 25.92 20.27 21.96  8218800          21.96
## 2010-07-02 23.00 23.10 18.71 19.20  5139800          19.20
## 2010-07-06 20.00 20.00 15.83 16.11  6866900          16.11
## 2010-07-07 16.40 16.63 14.98 15.80  6921700          15.80
```
If, at a later stage, you change your data source or your pre-processing steps, the consuming code does not need to know: it still calls the same tap.
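To make this concrete, here is a hypothetical base R sketch (not datap's API): a mock `context` built from nested lists, a consumer function `average_close` that only calls `context$stocks$Tesla$tap()`, and a later switch of the data source from memory to a CSV file. The closing prices are the first three Tesla closes shown above.

```r
# Hypothetical mock of a context as nested lists; not datap's API.
closes <- c(23.89, 23.83, 21.96)
context <- list(stocks = list(Tesla = list(
  tap = function() data.frame(Close = closes)  # source: memory
)))

# Consuming code: depends only on the tap, not on where the data comes from
average_close <- function(ctx) mean(ctx$stocks$Tesla$tap()$Close)
before <- average_close(context)

# Later, the source changes to a CSV file; only the tap definition is updated
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Close = closes), csv_path, row.names = FALSE)
context$stocks$Tesla$tap <- function() read.csv(csv_path)

after <- average_close(context)  # same consuming code, same result
unlink(csv_path)
```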
datap - da data tap & da data pipes