Excel Exporter - support for options #37

Merged
merged 6 commits into from
Jul 30, 2017
Changes from 5 commits
119 changes: 99 additions & 20 deletions lib/daru/io/exporters/excel.rb
@@ -8,10 +8,28 @@ class Excel < Base

# Exports +Daru::DataFrame+ to an Excel Spreadsheet.
#
# @note For giving formatting options as hashes to the +:data+, +:index+ or +:header+
# keyword argument(s), please have a look at the
# {http://www.rubydoc.info/gems/ruby-spreadsheet/Spreadsheet/Font Spreadsheet::Font}
# and
# {http://www.rubydoc.info/gems/ruby-spreadsheet/Spreadsheet/Format Spreadsheet::Format}
# pages.
#
# @param dataframe [Daru::DataFrame] A dataframe to export
# @param path [String] Path of the file where the +Daru::DataFrame+
# should be written.
- # @param options [Hash] A set of options containing user-preferences
# @param header [Hash or Boolean] Defaults to true. When set to false or nil,
# headers are not written. When given a hash of formatting options,
# headers are written with the specific formatting. When set to true,
# headers are written without any formatting.
# @param data [Hash or Boolean] Defaults to true. When set to false or nil,
# data values are not written. When given a hash of formatting options,
# data values are written with the specific formatting. When set to true,
# data values are written without any formatting.
# @param index [Hash or Boolean] Defaults to true. When set to false or nil,
# index values are not written. When given a hash of formatting options,
# index values are written with the specific formatting. When set to true,
# index values are written without any formatting.
#
# @example Writing to an Excel file without options
# df = Daru::DataFrame.new([[1,2],[3,4]], order: [:a, :b])
@@ -23,34 +41,95 @@ class Excel < Base
#
# Daru::IO::Exporters::Excel.new(df, "dataframe_df.xls").call
#
- # @todo The +opts+ parameter isn't used while creating the Excel Spreadsheet
- # yet. Implementing this feature will greatly allow the user to generate a
- # Spreadsheet of their choice.
- def initialize(dataframe, path, **options)
# @example Writing to an Excel file with formatting options
# df = Daru::DataFrame.new([[1,2],[3,4]], order: [:a, :b])
#
# #=> #<Daru::DataFrame(2x2)>
# # a b
# # 0 1 3
# # 1 2 4
#
# Daru::IO::Exporters::Excel.new(df,
# "dataframe_df.xls",
# header: { color: :red, weight: :bold },
# index: false,
# data: { color: :blue }
# ).call
#
# @example Writing a DataFrame with Multi-Index to an Excel file
# df = Daru::DataFrame.new [[1,2],[3,4]], order: [:x, :y], index: [[:a, :b, :c], [:d, :e, :f]]
#
# #=> #<Daru::DataFrame(2x2)>
# # x y
# # a b c 1 3
# # d e f 2 4
#
# Daru::IO::Exporters::Excel.new(df,
# "dataframe_df.xls",
# header: { color: :red, weight: :bold },
# index: { color: :green },
# data: { color: :blue }
# ).call
Collaborator

I'm totally not sure, but don't you think it is something else? Like "format of ...":

Daru::IO::Exporters::Excel.new(formatting: {header: blah, data: blah}).call(path)
# or even...
Daru::IO::Exporters::Excel.new(formatting: {order: blah, index: blah, data: blah}).call(path)

(BTW, as you can see by my example, I am totally not sure where path should be passed... We can discuss it later, though, leave your version)

Member Author

Currently, all arguments are given only in the initialize() method and none to call().

Daru::IO::Exporters::Excel.new(
  df,
  'path/file.xls',
  formatting: {header: {...}, data: {...}, index: {...}},
  display: {header: true, index: false, data: true}
)

Would it be better to rename :header to :order?

Collaborator

> Currently, all arguments are given only in the initialize() method and none to call().

OK, let's leave it this way, at least for now.

> Would it be better to rename :header to :order?

¯\_(ツ)_/¯ I can imagine a lot of pro and contra arguments. But we should probably stick to the "dataframe point of view" (data, index, order). But² maybe "index & order" could be called "headers" in this context.
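
For reference, the signature in this diff folds both ideas from the thread into one keyword per section: false or nil hides the section, true writes it unformatted, and a Hash both writes and formats it. Below is a minimal plain-Ruby sketch of how that tri-state value maps onto the display/formatting split suggested above. split_option is a hypothetical helper for illustration, not part of daru-io:

```ruby
# Hypothetical helper, not part of daru-io: splits one tri-state option
# (true, false/nil, or a Hash of format settings) into the two separate
# pieces of information discussed in this thread.
def split_option(value)
  case value
  when Hash       then { display: true,  formatting: value }
  when nil, false then { display: false, formatting: nil }
  else                 { display: true,  formatting: nil }
  end
end

opts = { header: { color: :red, weight: :bold }, index: false, data: true }
normalized = opts.transform_values { |value| split_option(value) }

normalized.each { |name, setting| puts "#{name}: #{setting}" }
```

The single-keyword form keeps call sites short. The cost is that each value carries two meanings, which is why the exporter checks both truthiness and is_a?(Hash).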

def initialize(dataframe, path, header: true, data: true, index: true)
optional_gem 'spreadsheet', '~> 1.1.1'

super(dataframe)
- @path = path
- @options = options
@path = path
@data = data
@index = index
@header = header
end

- # @note
- #
- # The +format+ variable used in this method, has to be given
- # as options by the user via the +options+ hash input.
- #
- # Signed off by @athityakumar on 03/06/2017 at 7:00PM
def call
- book = Spreadsheet::Workbook.new
- sheet = book.create_worksheet
@book = Spreadsheet::Workbook.new
@sheet = @book.create_worksheet

process_offsets
write_headers

@dataframe.each_row_with_index.with_index do |(row, idx), i|
Collaborator

Suggest , r| or , dr| (delta row) instead of , i|

write_index(idx, i+@row_offset)
write_data(row, i+@row_offset)
end

@book.write(@path)
end

private

def process_offsets
@row_offset = @header ? 1 : 0
@col_offset = 0 unless @index
@col_offset ||= @dataframe.index.is_a?(Daru::MultiIndex) ? @dataframe.index.levels.size : 1
Collaborator

In fact, @dataframe.index.width for MultiIndex
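
The offset logic in process_offsets can be sketched without Daru. A formatting Hash is truthy, so it enables a section just as true does. The header then takes one row, and the index takes as many columns as it has levels. In the sketch below, offsets and its index_width argument are hypothetical stand-ins for the values the exporter reads from the dataframe:

```ruby
# Stand-in for the offset logic in process_offsets, using plain values
# instead of a Daru index. `index_width` is an assumed input: 1 for a flat
# index, the number of levels for a multi-index.
def offsets(header:, index:, index_width:)
  row_offset = header ? 1 : 0
  col_offset = index ? index_width : 0
  [row_offset, col_offset]
end

p offsets(header: true, index: true, index_width: 3)               # => [1, 3]
p offsets(header: { color: :red }, index: false, index_width: 3)   # => [1, 0]
p offsets(header: false, index: { color: :green }, index_width: 1) # => [0, 1]
```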

end

def write_headers
return unless @header

@sheet.row(0).concat([' '] * @col_offset + @dataframe.vectors.map(&:to_s))
return unless @header.is_a?(Hash)

@sheet.row(0).default_format = Spreadsheet::Format.new(@header)
end

def write_index(idx, row)
return unless @index

@sheet.row(row).concat(idx.is_a?(Array) ? idx.to_a.map(&:to_s) : [idx.to_s])
return unless @index.is_a?(Hash)

@col_offset.times { |col| @sheet.row(row).set_format(col, Spreadsheet::Format.new(@index)) }
Collaborator

For slightly cleaner code, I'd define a helper method:

def format(col_range, row, format)
  return unless format.is_a?(Hash)
  ....
end

Then here you can just

format(0...@col_offset, row, @index)

and below

format(@col_offset...@col_offset + @dataframe.ncols, idx, @data)

WDYT?..

Member Author

Yes, this looks much better & DRYier. I've added this as a formatting method; write_data and write_index now delegate to it with different arguments. 👍
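
A rough sketch of what such a helper could look like, assuming it applies one format to a range of columns in a single row. The body is a guess, since the proposal above elides it. FakeRow is a hypothetical stand-in that only records set_format calls; the exporter in this diff wraps the Hash in Spreadsheet::Format.new before passing it to the row:

```ruby
# Stand-in for a spreadsheet row: records which format was set on which
# column. The real exporter calls set_format on a Spreadsheet row instead.
class FakeRow
  attr_reader :formats

  def initialize
    @formats = {}
  end

  def set_format(col, format)
    @formats[col] = format
  end
end

# Hypothetical version of the helper proposed above: does nothing unless
# the option is a Hash, otherwise applies it to every column in col_range.
def formatting(sheet_row, col_range, format)
  return unless format.is_a?(Hash)

  col_range.each { |col| sheet_row.set_format(col, format) }
end

col_offset = 2
ncols = 3
row = FakeRow.new
untouched = FakeRow.new

formatting(row, 0...col_offset, { color: :green })                   # index cells
formatting(row, col_offset...(col_offset + ncols), { color: :blue }) # data cells
formatting(untouched, 0...col_offset, true)                          # no-op: not a Hash

p row.formats.keys
```

Putting the Hash check inside the helper lets write_index and write_data drop their own `return unless ... is_a?(Hash)` guards.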

end

- format = Spreadsheet::Format.new color: :blue, weight: :bold
def write_data(row, idx)
return unless @data

- sheet.row(0).concat(@dataframe.vectors.to_a.map(&:to_s)) # Unfreeze strings
- sheet.row(0).default_format = format
- @dataframe.each_row_with_index { |row, i| sheet.row(i+1).concat(row.to_a) }
@sheet.row(idx).concat(row.to_a)
return unless @data.is_a?(Hash)

- book.write(@path)
@dataframe.ncols.times do |col|
@sheet.row(idx).set_format(@col_offset + col, Spreadsheet::Format.new(@data))
end
end
end
end
51 changes: 43 additions & 8 deletions spec/daru/io/exporters/excel_spec.rb
@@ -1,19 +1,22 @@
RSpec.describe Daru::IO::Exporters::Excel do
include_context 'exporter setup'

- subject do
- Daru::DataFrame.rows(
- Spreadsheet.open(tempfile.path).worksheet(0).rows[1..-1].map(&:to_a),
- order: Spreadsheet.open(tempfile.path).worksheet(0).rows[0].to_a
- )
- end

let(:filename) { 'test_write.xls' }
let(:content) { Spreadsheet.open tempfile.path }
let(:opts) { {header: {color: :blue}, data: {color: :red}, index: {color: :green}} }

- before { described_class.new(df, tempfile.path, opts).call }
before { described_class.new(df, tempfile.path, **opts).call }

context 'writes to excel spreadsheet' do
subject do
Daru::DataFrame.rows(
Spreadsheet.open(tempfile.path).worksheet(0).rows[1..-1].map(&:to_a),
order: Spreadsheet.open(tempfile.path).worksheet(0).rows[0].to_a
)
end

let(:opts) { {index: false} }

it_behaves_like 'exact daru dataframe',
ncols: 4,
nrows: 5,
@@ -25,4 +28,36 @@
[nil, 23, 4,'a','ff']
]
end

context 'writes to excel spreadsheet with header formatting' do
subject { Spreadsheet.open(tempfile.path).worksheet(0).rows[0].format(0).font.color }

it { is_expected.to eq(:blue) }
end

context 'writes to excel spreadsheet with index formatting' do
subject { Spreadsheet.open(tempfile.path).worksheet(0).rows[1].format(0).font.color }

it { is_expected.to eq(:green) }
end

context 'writes to excel spreadsheet with data formatting' do
subject { Spreadsheet.open(tempfile.path).worksheet(0).rows[1].format(1).font.color }

it { is_expected.to eq(:red) }
end

context 'writes to excel spreadsheet with multi-index' do
subject { Spreadsheet.open(tempfile.path).worksheet(0).rows }

let(:df) do
Daru::DataFrame.new(
[[1,2],[3,4]],
order: %i[x y],
index: [%i[a b c], %i[d e f]]
)
end

it { is_expected.to eq([[' ', ' ', ' ', 'x', 'y'], ['a', 'b', 'c', 1, 3], ['d', 'e', 'f', 2, 4]]) }
end
end
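
The sheet layout asserted by the multi-index spec can be reproduced with plain arrays: a header row padded with one blank cell per index level, then each index tuple followed by its data row. sheet_rows below is a hypothetical helper that mirrors the expected layout. It does not call the exporter or the spreadsheet gem:

```ruby
# Builds the expected sheet rows from plain Ruby values. This mirrors the
# layout the spec asserts, not the exporter's actual implementation.
def sheet_rows(vectors, index, data_rows)
  width = index.first.is_a?(Array) ? index.first.size : 1
  header = [' '] * width + vectors.map(&:to_s)
  body = index.zip(data_rows).map do |idx, row|
    Array(idx).map(&:to_s) + row
  end
  [header] + body
end

rows = sheet_rows(%i[x y], [%i[a b c], %i[d e f]], [[1, 3], [2, 4]])
rows.each { |row| p row }
```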