Adds from_redis importer (#18)
* Adds from_redis importer - initial draft

Follows the modular approach from the proposal, and hasn't yet been updated to use the IO class inheritance. A couple of spec tweaks are yet to be done, and the YARD docs are still pending.

* Restructures redis importer for lazy-calling feature

* Minor fix for default argument

* Adds YARD documentation and makes helper methods private

* Adds .to_json to docs for clarity

* Makes a couple of changes

Fixes specs as per rubocop specifications. Adds Importers::Base class.

* Moves Importers::Base to importers/base.rb

* Cleans specs, changes Redis#keys -> Redis#scan

* Removes fakeredis from gemspec

* Adds test and docs for Redis#scan

* Removes unnecessary dump.rdb file

If this happens one more time while running redis-server, dump.rdb should probably be gitignore-d in the future.

* Moves Importers::Base to Importers::Importer

Meta-programming added to Redis Importer 🎉

* Adds meta-programming for linkage of redis importer

* Adds support for selecting particular page & traversing pagination of Redis#scan

* Fixes rubocop issue, reverts autoinit in redis importer

The RSpec test for a specific page of the paginated result passes, but it's quite a weak test.

* Fixes pagination to support offset and count properly

* Initializes offset to 0, DRYs up the redis importer

* Tries fixing mongodb command issue on Travis

* Removes offset from Redis Importer
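
The move from Redis#keys to Redis#scan and the count handling mentioned in the bullets above can be sketched without a running server. This is a minimal sketch under stated assumptions: `FakeScanClient` is a hypothetical stand-in that only mimics the `[next_cursor, keys]` return shape of `Redis#scan`, and `collect_keys` is an illustrative helper, not the `choose_keys` method committed here.

```ruby
# Hypothetical stand-in for a Redis client (not part of redis-rb); it only
# mimics the [next_cursor, keys] return shape of Redis#scan.
class FakeScanClient
  def initialize(keys, page_size)
    @keys = keys
    @page_size = page_size
  end

  # Returns the next cursor as a String ('0' once iteration is complete)
  # and one page of keys. match and count are accepted but ignored here.
  def scan(cursor, match: nil, count: nil)
    start = cursor.to_i
    chunk = @keys[start, @page_size] || []
    nxt = start + @page_size
    [(nxt >= @keys.size ? '0' : nxt.to_s), chunk]
  end
end

# Illustrative helper (an assumption, not the committed code): follow the
# cursor until it returns to '0' or enough keys are gathered, then trim.
def collect_keys(client, match: nil, count: nil)
  keys = []
  cursor = '0'
  loop do
    cursor, chunk = client.scan(cursor, match: match, count: count)
    keys.concat(chunk).uniq!
    break if cursor == '0' || (count && keys.size >= count)
  end
  count ? keys.first(count) : keys
end

client = FakeScanClient.new((1..10).map { |i| "key:#{i}" }, 4)
all_keys = collect_keys(client)            # walks every page
some_keys = collect_keys(client, count: 6) # stops early and trims to 6 keys
```

Unlike a single Redis#keys call, each scan call returns only one page, so the caller has to keep following the cursor and trim the result itself when a count is requested.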
athityakumar authored Jun 30, 2017
1 parent fa3f62f commit 08f2fd8
Showing 9 changed files with 410 additions and 3 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -6,5 +6,6 @@ Gemfile.lock
*.DS_store
.rspec_status
coverage/
dump.rdb
doc/
docs/
docs/
1 change: 1 addition & 0 deletions .rubocop.yml
@@ -36,6 +36,7 @@ Style/EmptyElse:
Metrics/BlockLength:
  Exclude:
    - 'spec/**/*'

Metrics/LineLength:
  Max: 120

4 changes: 4 additions & 0 deletions .travis.yml
@@ -12,8 +12,12 @@ script:
- bundle exec rubocop

services:
- redis-server
- mongodb

before_install:
- redis-server --daemonize yes

install:
- gem install bundler
- gem install rainbow -v '2.2.1'
1 change: 1 addition & 0 deletions daru-io.gemspec
@@ -24,6 +24,7 @@ Gem::Specification.new do |spec|

spec.add_development_dependency 'bundler', '~> 1.15'
spec.add_development_dependency 'rake', '~> 10.0'
spec.add_development_dependency 'redis'
spec.add_development_dependency 'rspec', '~> 3.0'
spec.add_development_dependency 'rspec-its'
spec.add_development_dependency 'rubocop', '>= 0.40.0'
177 changes: 177 additions & 0 deletions lib/daru/io/importers/redis.rb
@@ -0,0 +1,177 @@
require 'daru'

require 'daru/io/importers/util'
require 'json'
require 'redis'

module Daru
module IO
module Importers
class Redis
# Imports a *Daru::DataFrame* from *Redis* connection and keys.
#
# @note In Redis, the requested count and the number of keys actually
# returned do not always match exactly. This carries over to this module,
# as it is built on top of the redis Ruby gem. Hence, if a query
# for 100 keys doesn't return exactly 100 keys, it is not a bug in
# this module. It is just how Redis works.
#
# @param connection [Hash or Redis Instance] Either a Hash of *Redis* configurations,
# or an existing *Redis* instance. For the hash configurations, have a
# look at {http://www.rubydoc.info/github/redis/redis-rb/Redis:initialize
# Redis#initialize}.
# @param keys [Array] Redis key(s) from which the *Daru::DataFrame*
# should be constructed. If no keys are given, all keys in the *Redis*
# connection will be used.
# @param match [String] A pattern to get matching keys.
# @param count [Integer] Number of matching keys to be obtained.
#
# @return A *Daru::DataFrame* imported from the given Redis connection
# and matching keys
#
# @example Importing with Redis configuration without specifying keys
# # Say, the Redis connection has this setup
# # Key "10001" => { "name" => "Tyrion", "age" => 32 }.to_json
# # Key "10002" => { "name" => "Jamie", "age" => 37 }.to_json
# # Key "10003" => { "name" => "Cersei", "age" => 37 }.to_json
# # Key "10004" => { "name" => "Joffrey", "age" => 19 }.to_json
#
# connection = {url: "redis://:[password]@[hostname]:[port]/[db]"}
# df = Daru::DataFrame.from_redis(connection)
#
# df
#
# #=> <Daru::DataFrame(4x2)>
# # name age
# # 10001 Tyrion 32
# # 10002 Jamie 37
# # 10003 Cersei 37
# # 10004 Joffrey 19
#
# @example Importing with Redis configuration by specifying keys
# # Say, the Redis connection has this setup
# # Key "10001" => { "name" => "Tyrion", "age" => 32 }.to_json
# # Key "10002" => { "name" => "Jamie", "age" => 37 }.to_json
# # Key "10003" => { "name" => "Cersei", "age" => 37 }.to_json
# # Key "10004" => { "name" => "Joffrey", "age" => 19 }.to_json
#
# connection = {url: "redis://:[password]@[hostname]:[port]/[db]"}
# df = Daru::DataFrame.from_redis(connection, "10001", "10002")
#
# df
#
# #=> <Daru::DataFrame(2x2)>
# # name age
# # 10001 Tyrion 32
# # 10002 Jamie 37
#
# @example Importing with Redis instance without specifying keys
# # Say, the Redis connection has this setup
# # Key "name" => ["Tyrion", "Jamie", "Cersei", "Joffrey"]
# # Key "age" => [32, 37, 37, 19]
# # Key "living" => [true, true, true, false]
#
# connection = Redis.new({url: "redis://:[password]@[hostname]:[port]/[db]"})
# df = Daru::DataFrame.from_redis(connection)
#
# df
#
# #=> <Daru::DataFrame(4x3)>
# # name age living
# # 0 Tyrion 32 true
# # 1 Jamie 37 true
# # 2 Cersei 37 true
# # 3 Joffrey 19 false
#
# @example Importing with Redis instance by specifying keys
# # Say, the Redis connection has this setup
# # Key "name" => ["Tyrion", "Jamie", "Cersei", "Joffrey"]
# # Key "age" => [32, 37, 37, 19]
# # Key "living" => [true, true, true, false]
#
# connection = Redis.new({url: "redis://:[password]@[hostname]:[port]/[db]"})
# df = Daru::DataFrame.from_redis(connection, "name", "age")
#
# df
#
# #=> <Daru::DataFrame(4x2)>
# # name age
# # 0 Tyrion 32
# # 1 Jamie 37
# # 2 Cersei 37
# # 3 Joffrey 19
#
# @example Querying for matching keys with count
# # Say, the Redis connection has this setup
# # Key "key:1" => { "name" => "name1", "age" => "age1" }.to_json
# # Key "key:2" => { "name" => "name2", "age" => "age2" }.to_json
# # Key "key:3" => { "name" => "name3", "age" => "age3" }.to_json
# # ...
# # Key "key:2000" => { "name" => "name2000", "age" => "age2000" }.to_json
#
# connection = {url: "redis://:[password]@[hostname]:[port]/[db]"}
# Daru::DataFrame.from_redis(connection, match: "key:1*")
#
# #=> #<Daru::DataFrame(1111x2)>
# # name age
# # key:1045 name1045 age1045
# # key:1919 name1919 age1919
# # key:1155 name1155 age1155
# # key:1649 name1649 age1649
# # ... ... ...
#
# Daru::DataFrame.from_redis({}, match: "key:1*", count: 200)
#
# #=> #<Daru::DataFrame(200x2)>
# # name age
# # key:1927 name1927 age1927
# # key:1759 name1759 age1759
# # key:1703 name1703 age1703
# # key:1640 name1640 age1640
# # ... ... ...
def initialize(connection={}, *keys, match: nil, count: nil)
  @match  = match
  @count  = count
  @client = get_client(connection)
  @keys   = choose_keys(*keys).map(&:to_sym)
end

def call
  vals = @keys.map { |key| ::JSON.parse(@client.get(key), symbolize_names: true) }
  Util.guess_parse(@keys, vals)
end

private

def choose_keys(*keys)
  return keys.to_a unless keys.empty?

  cursor = nil
  # Loop to iterate through paginated results of Redis#scan.
  until cursor == '0' || (!@count.nil? && keys.count > (@count-1))
    cursor, chunk = @client.scan(cursor, match: @match, count: @count)
    keys.concat(chunk).uniq!
  end
  return keys[0..-1] if @count.nil?
  keys[0..@count-1]
end

def get_client(connection)
  case connection
  when ::Redis
    connection
  when Hash
    ::Redis.new connection
  else
    raise ArgumentError, "Expected '#{connection}' to be either "\
                         'a Hash or an initialized Redis instance, '\
                         "but received #{connection.class} instead."
  end
end
end
end
end
end

require 'daru/io/link'
Daru::DataFrame.register_io_module :from_redis, Daru::IO::Importers::Redis
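
The `call` method above reads each key's string value and parses it as JSON, which is why the doc examples store values with `.to_json`. A minimal sketch of that step, assuming a plain Hash as a stand-in for the Redis store (no server or redis gem involved):

```ruby
require 'json'

# A plain Hash stands in for the Redis store (an assumption for illustration);
# values are JSON strings, as they would be after storing `hash.to_json`.
store = {
  '10001' => { 'name' => 'Tyrion', 'age' => 32 }.to_json,
  '10002' => { 'name' => 'Jamie',  'age' => 37 }.to_json
}

# Mirrors the importer: keys are symbolized, values are parsed with symbol keys.
keys = store.keys.map(&:to_sym)
vals = keys.map { |key| JSON.parse(store[key.to_s], symbolize_names: true) }
```

Here `keys` and `vals` have the form that is then handed to `Util.guess_parse`.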
38 changes: 38 additions & 0 deletions lib/daru/io/importers/util.rb
@@ -0,0 +1,38 @@
module Daru
  module IO
    module Importers
      module Util
        def self.guess_parse(keys, vals)
          case vals.first
          when Array
            case vals.first.first
            when Hash
              # Array of hashes
              # key a :
              # [
              #   { x: 1, y: 2 },
              #   { x: 3, y: 4 }
              # ]
              Daru::DataFrame.new vals.flatten
            else
              # Hash containing Array
              # key a :
              # {
              #   x: [1,2,3,4]
              #   y: [5,6,7,8]
              # }
              Daru::DataFrame.rows vals.transpose, order: keys
            end
          when Hash
            # Array containing Hash
            # [
            #   key a: { x: 1, y: 2 }
            #   key b: { x: 3, y: 4 }
            # ]
            Daru::DataFrame.new vals.flatten, index: keys
          end
        end
      end
    end
  end
end
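
The three shapes that `guess_parse` distinguishes can be illustrated without daru. A minimal sketch: `describe_shape` is a hypothetical helper that only reports which branch of the `case` expression a list of parsed values would take, and it adds an explicit `:unsupported` result where the committed method would fall through and return `nil`.

```ruby
# Illustrative helper (hypothetical name): reports which branch of the
# case expression in guess_parse a list of parsed values would take.
def describe_shape(vals)
  case vals.first
  when Array
    vals.first.first.is_a?(Hash) ? :arrays_of_hashes : :arrays_of_scalars
  when Hash
    :hashes
  else
    :unsupported
  end
end

# Each key holds an array of row hashes.
shape_a = describe_shape([[{ x: 1, y: 2 }, { x: 3, y: 4 }]])

# Each key holds one column of values; transposing turns columns into rows.
columns = [%w[Tyrion Jamie], [32, 37]]
shape_b = describe_shape(columns)
rows = columns.transpose

# Each key holds a single row hash; the keys become the index.
shape_c = describe_shape([{ name: 'Tyrion', age: 32 }, { name: 'Jamie', age: 37 }])
```

The column-per-key case is the one that needs `transpose`, since `Daru::DataFrame.rows` expects one array per row rather than one per column.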