Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excel Exporter - support for options #37

Merged
merged 6 commits into from
Jul 30, 2017

Conversation

athityakumar
Copy link
Member

No description provided.

@athityakumar athityakumar force-pushed the excel-exporter-improvement branch from f46cff2 to 365ee8e Compare July 11, 2017 08:46
test.xls was produced while trying out in pry, and was mistakenly added into git history
@zverok
Copy link
Collaborator

zverok commented Jul 19, 2017

(Before the review) Please-please, try to do the descriptive descriptions (sorry for tautology) for your PRs. To write "This enchances this and that, while leaving that and this intact" will take just 5 minutes of your time, but all reviewers would be really grateful.

# "dataframe_df.xls",
# data_options: normal,
# header_options: heading
# ).call
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally not sure, but don't you think it is something else? Like "format of ...":

Daru::IO::Exporters::Excel.new(formatting: {header: blah, data: blah}).call(path)
# or even...
Daru::IO::Exporters::Excel.new(formatting: {order: blah, index: blah, data: blah}).call(path)

(BTW, as you can see by my example, I am totally not sure where path should be passed... We can discuss it later, though, leave your version)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, all arguments are given only in the initialize() method and none to call().

Daru::IO::Exporters::Excel.new(
  df,
  'path/file.xls',
  formatting: {header: {...}, data: {...}, index: {...}},
  display: {header: true, index: false, data: true}
)

Would it be better to rename :header to :order?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, all arguments are given only in the initialize() method and none to call().

OK, let's leave it this way, at least for now.

Would it be better to rename :header to :order?

¯\_(ツ)_/¯ I can imagine a lot of pro and contra arguments. But probably we should stick to "dataframe point of view" (data, index, order). But² maybe "index & orders" could be called "headers" in this context.

Copy link
Collaborator

@zverok zverok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is EXTREMELY bad time to ask, but why you've choose spreadsheet gem?.. It doesn't looks reasonable for me, TBH.

#
# @option options formatting [Hash<Hash>] A Hash of formatting options. Supported keys
# are +:header+, +:data+ and +:index+. For example, if +:header+ is set to a Hash
# containing keys such as +:color+, the headers are written with the specified color.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two hashes with same keys looks suspicious to me... Maybe we can join them, this way:

to_excel(dataframe, 'test.xlsx',
  header: false, # or nil, don't show
  index: {color: :red, bold: true}, # show, custom formatting
  data: true # show, default formatting, can be omitted

WDYT?..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged, this does seem much more intuitive to use. Made this and the below changes with commit 6fe329a.

# index: { color: :green },
# data: { color: :blue }
# },
# display: { index: false }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...for example, this example is weird, first set formatting of index, then tell we don't want to show it ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the use-case I was considering is when a daru-io user decides to fix their :formatting and just decide to play-around with the :display hash.

BTW, after having gone through the daru-view repository during the review discussion, I came across some frequently-used default options being declared as a class variable like Daru::View.adapter = :highcharts. Thinking about it, could it be used to define some default options? (Maybe, coupled with meta-programming?) That is, something like Daru::IO::Exporters::Excel.index = {color: :green}? Or does this seem complex?

@options = options
@path = path
@display = options[:display] || {}
@formatting = options[:formatting] || {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch

sheet = book.create_worksheet
@book = Spreadsheet::Workbook.new
@sheet = @book.create_worksheet
@row_offset = @display[:header] ? 1 : 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say write_headers should shift the @row_offset.

def process_col_offset
return 0 unless @display[:index]
return 1 unless @dataframe.index.first.is_a?(Array)
@dataframe.index.first.count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have more idiomatic way to do it?.. Like

return 1 unless @dataframe.index.is_a?(MultiIndex)
@dataframe.index.levels

format = Spreadsheet::Format.new color: :blue, weight: :bold
if @display[:index] || @display[:data]
@dataframe.each_row_with_index.with_index do |row_idx, i|
row, idx = row_idx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just each_row_with_index.with_index do |(row, idx), i|, probably

@dataframe.each_row_with_index.with_index do |row_idx, i|
row, idx = row_idx

write_index(idx, i+@row_offset) if @display[:index]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe those ifs place is inside write_index/write_data

@sheet.row(0).concat([' '] * @col_offset + @dataframe.vectors.to_a.map(&:to_s))
end

def write_index(idx, i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why , i if it should be row or at least r?..


book.write(@path)
def write_data(row, i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same


book.write(@path)
def write_data(row, i)
(@col_offset..(@col_offset + @dataframe.vectors.to_a.size-1)).each do |x|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. vectors.to_a.size, seriously? More idiomatic way?..
  2. Instead of -1, you should use just excluding range (...). Or you can also do @dataframe.ncols.times. Or even row.size.times
  3. x is extremely bad name

@athityakumar
Copy link
Member Author

@zverok - The existing Excel Exporter was already based on 'Spreadsheet' gem, and I didn't want to break any backward dependencies by changing it. Also, I didn't find any compelling reason to change the Excel Exporter to depend on a different gem. Hence, I chose to stick with the 'Spreadsheet' gem itself. If I'm missing out on any issue with using 'Spreadsheet' gem / if there are better alternatives that you'd like to suggest, please feel free to suggest. 😄

Regarding rest of the review comments, acknowledged. I'll soon make the changes.

YARD Docs have been updated. Using the :data, :index, :header keyword arguments seems to be much more easier and intutive than using something like options[:display][:data]. Code quality has been improved, as per review comments.
@athityakumar
Copy link
Member Author

@zverok - I've made the above changes.

Copy link
Collaborator

@zverok zverok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good, don't forget about last comments.

process_offsets
write_headers

@dataframe.each_row_with_index.with_index do |(row, idx), i|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, r| or , dr| (delta row)

def process_offsets
@row_offset = @header ? 1 : 0
@col_offset = 0 unless @index
@col_offset ||= @dataframe.index.is_a?(Daru::MultiIndex) ? @dataframe.index.levels.size : 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, @dataframe.index.width for MultiIndex

@sheet.row(row).concat(idx.is_a?(Array) ? idx.to_a.map(&:to_s) : [idx.to_s])
return unless @index.is_a?(Hash)

@col_offset.times { |col| @sheet.row(row).set_format(col, Spreadsheet::Format.new(@index)) }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a bit cleaner code, I'd define helper method:

def format(col_range, row, format)
  return unless format.is_a?(Hash)
  ....
end

Then here you can just

format(0...@col_offset, row, @index)

and below

format(@col_offset...@col_offset + @dataframe.ncols, idx, @data)

WDYT?..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks much better & DRYier. I've added this with formatting method, and other methods like write_data, write_index redirect to formatting with different arguments. 👍

@zverok
Copy link
Collaborator

zverok commented Jul 30, 2017

OK, let's merge it.

But please add issue about underlying gem. The problem is, there are basically 2 (absolutely different) formats of Excel files: old binary one (.xls) and new XML-based one, standartized and modern (.xlsx). If I am not missing something, spreadsheet gem works with the former (which also known as "Excel-2003 format", e.g. 14 years outdated, though still in use), for working with latter you'll need axlsx or rubyxl.

@athityakumar
Copy link
Member Author

@zverok - Acknowledged. I've opened #43 for future discussions on the gem dependency. But, can't XLSX Exporter simply have a different dependency like how the XLSX Importer (PR #28) depends on Roo? (Ofcourse, lesser dependencies would be better.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants