datap in 2 minutes

Christoph Glur edited this page Aug 15, 2016 · 1 revision

Within your datap context (i.e. the datap YAML file), you can:

  1. define data processing pipes:
    1. Where is your data coming from? (internet, file, db, memory, etc.)
    2. How should your data be pre-processed? (cleaning, plausibility checking, missing value handling, etc.)
  2. define data taps by
    1. giving short IDs to your datasets
    2. organising your datasets into hierarchies

For example, a (simple) datap context might look like this:

stocks:
  type: structure
  Apple:
    type: tap
    download:
      type: processor
      function: Quandl::Quandl(code = 'YAHOO/AAPL', type = 'xts')
  Tesla:
    type: tap
    download:
      type: processor
      function: Quandl::Quandl(code = 'YAHOO/TSLA', type = 'xts')
indices:
  type: structure
  S&P500:
    type: tap
    download:
      type: processor
      function: quantmod::getSymbols(Symbols = '^GSPC', auto.assign = FALSE)

It defines three taps (Apple, Tesla, and S&P500), and organises stocks and indices neatly in a hierarchical structure.

Save this file in your project.

Once you have defined your datap context, you can call your taps from R to load your data into an R session through a single, consistent API.

library(datap)
# Forward slashes work on Windows and avoid invalid backslash escapes in R strings
filePath <- "C:/projects/datap/context.yaml"
context <- Load(filePath)

The context looks like this:

context
##        levelName
## 1 context       
## 2  ¦--stocks    
## 3  ¦   ¦--Apple 
## 4  ¦   °--Tesla 
## 5  °--indices   
## 6      °--S&P500

And you can directly navigate to a tap to fetch the data:

teslaBars <- context$stocks$Tesla$tap()
head(teslaBars)
##             Open  High   Low Close   Volume Adjusted Close
## 2010-06-29 19.00 25.00 17.54 23.89 18766300          23.89
## 2010-06-30 25.79 30.42 23.30 23.83 17187100          23.83
## 2010-07-01 25.00 25.92 20.27 21.96  8218800          21.96
## 2010-07-02 23.00 23.10 18.71 19.20  5139800          19.20
## 2010-07-06 20.00 20.00 15.83 16.11  6866900          16.11
## 2010-07-07 16.40 16.63 14.98 15.80  6921700          15.80

If you later change your data source or your pre-processing steps, the consuming code does not need to change: it keeps calling the same tap.
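To make this decoupling concrete, here is a minimal sketch in plain R that mimics the idea with nested lists and closures. It does not use datap and is not how datap is implemented; `makeTap`, `fromMemory`, `fromCsvText`, and `meanClose` are hypothetical names introduced for illustration, and the numbers are toy values rather than real quotes.

```r
# Plain-R illustration of the tap idea; this is NOT datap's implementation.
# makeTap is a hypothetical helper: it hides a source function and an
# optional pre-processing function behind a zero-argument tap().
makeTap <- function(source, preprocess = identity) {
  list(tap = function() preprocess(source()))
}

# Two interchangeable sources for the same dataset (toy values).
fromMemory  <- function() data.frame(Close = c(23.89, NA, 21.96))
fromCsvText <- function() read.csv(text = "Close\n23.89\nNA\n21.96")

# A hierarchy shaped like the context in this page.
context <- list(
  stocks = list(
    Tesla = makeTap(fromMemory, preprocess = na.omit)
  ),
  indices = list(
    `S&P500` = makeTap(function() data.frame(Close = c(100, 101)))
  )
)

# Consuming code: it only knows the path to the tap, not the data source.
meanClose <- function(ctx) mean(ctx$stocks$Tesla$tap()$Close)

before <- meanClose(context)

# Swap the data source; meanClose is left untouched.
context$stocks$Tesla <- makeTap(fromCsvText, preprocess = na.omit)
after <- meanClose(context)

# Names that are not syntactic in R, such as S&P500, need backticks.
indexBars <- context$indices$`S&P500`$tap()
```

Here `before` and `after` come from different sources, yet `meanClose` is identical in both calls. The backtick quoting on the last line is ordinary R syntax for non-syntactic names, so the same quoting would presumably be needed when navigating to the S&P500 tap in a real context.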
