Anton Le edited this page Nov 3, 2021 · 6 revisions

Refinery

Refinery is a tool for extracting data from Excel spreadsheets (both .xls and .xlsx formats) in a declarative way. The idea is that you define what you want to extract, but you don't write the algorithm for how the data should be extracted.

At Vortexa, we receive tons of Excel spreadsheets with important information. Unfortunately, these spreadsheets come in different formats and are populated in many different ways depending on the data source.

The library lets you extract data and validate or transform the extracted values along the way. This documentation is a step-by-step guide to all the features, starting with the simplest case and then covering progressively trickier ones.
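To make the declarative idea concrete, here is a minimal, library-independent sketch in Python. It is not Refinery's API: `Column`, `extract_table`, `positive_float`, and the sample `grid` are all hypothetical names invented for this illustration, and the sheet is modelled as a plain list of rows instead of a real Excel file. The point is only that the caller lists the columns it wants and how each value should be parsed, while a generic engine locates the header row and walks the data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Column:
    """Hypothetical declarative description of one column to extract."""
    header: str                         # header text to look for in the sheet
    parse: Callable[[Any], Any] = str   # validation/transformation per cell


def positive_float(value: Any) -> float:
    """Example validator: convert to float and reject non-positive values."""
    number = float(value)
    if number <= 0:
        raise ValueError(f"expected a positive number, got {value!r}")
    return number


def extract_table(grid: List[List[Any]], columns: List[Column]) -> List[Dict[str, Any]]:
    """Generic engine: find the header row, then parse rows until a blank row."""
    wanted = {column.header for column in columns}
    for row_index, row in enumerate(grid):
        if wanted.issubset(set(row)):
            break
    else:
        raise ValueError("header row not found")

    # Map each requested header to its position in the header row.
    positions = {column.header: row.index(column.header) for column in columns}

    records = []
    for data_row in grid[row_index + 1:]:
        if all(cell in (None, "") for cell in data_row):
            break  # a fully blank row marks the end of the table
        records.append(
            {column.header: column.parse(data_row[positions[column.header]])
             for column in columns}
        )
    return records


# Sample sheet (made-up data): a title line, a blank line, the table, a footer.
grid = [
    ["Report generated by a hypothetical source", None, None],
    [None, None, None],
    ["Vessel", "Cargo", "Quantity"],
    ["Alpha", "Crude", "1200.5"],
    ["Beta", "Diesel", 800],
    [None, None, None],
    ["footer note", None, None],
]

# The declarative part: say what to extract, not how to find it.
definition = [Column("Vessel"), Column("Quantity", positive_float)]
records = extract_table(grid, definition)
print(records)
```

Only `definition` changes from one spreadsheet layout to another in this sketch; the search and iteration logic stays in one place, which is the property the declarative approach is after.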

Extraction

  • Basic table extraction
  • Multiple tables extraction
  • Multiple tables with the anchor extraction