flesh out docs
ryepup authored Apr 23, 2021
1 parent 24def10 commit d9e7184
Showing 1 changed file with 71 additions and 19 deletions.
90 changes: 71 additions & 19 deletions README.md
@@ -15,14 +15,16 @@ compare the two rows with configurable matching rules.
The comparison results are then written to xlsx for further analysis, one row
for each cell that didn't match.

## Example
## Usage

You can download pre-built binaries from the GitHub releases page.

```console
$ dotnet run --project src/XlsxCompare -- config.json left.xlsx right.xlsx
$ ./XlsxCompare config.json left.xlsx right.xlsx
[20:54:44] info: XlsxCompare.Driver[0]
Starting
[20:54:44] info: XlsxCompare.Driver[0]
Reading config from configt.json
Reading config from config.json
[20:54:44] info: XlsxCompare.XlsxComparer[0]
Comparing left.xlsx to right.xlsx
[20:54:48] info: XlsxCompare.XlsxComparer[0]
@@ -40,21 +42,7 @@ $ dotnet run --project src/XlsxCompare -- config.json left.xlsx right.xlsx

## Configuration

The config file controls how the two files will be compared. This gets
deserialized to the `CompareOptions` type, see that code for full details.

See `MatchBy` for the different matching rules supported.

### Sample config

This config will:

* join the two xlsx files on `Id == OLD_ID`
* include `Id` and `NEW_ID` for each row in results file
* checks for `Name == CUSTOMER_NAME`, ignoring case and normalizing nulls and
whitespace (e.g. `"Ryan "` will match `"RYAN"` and `null` will match `""`)
* checks for `DateAdded == DT_CREATE`, parsing both into dates before comparing
(e.g. `"2021-04-02"` will match `"4/2/2021 3:45PM"`)
A JSON config file controls how the two files are compared. Here's a small example:

```json
{
@@ -64,7 +52,7 @@ This config will:
"leftValueHeader": "my value",
"rightValueHeader": "your value",
"leftColumnNames": [
"Id"
"Batch #"
],
"rightColumnNames": [
"NEW_ID"
@@ -83,3 +71,67 @@ This config will:
]
}
```

This config will:

* join the two xlsx files on `Id == OLD_ID`
* check for `Name == CUSTOMER_NAME`, ignoring case and normalizing nulls and
  whitespace (e.g. `"Ryan "` will match `"RYAN"` and `null` will match `""`)
* check for `DateAdded == DT_CREATE`, parsing both into dates before comparing
  (e.g. `"2021-04-02"` will match `"4/2/2021 3:45PM"`)
* write a row to the output `results.xlsx` for each mismatch, with these columns:
  * `Batch #` - value from the left file's `Batch #` column
  * `NEW_ID` - value from the right file's `NEW_ID` column
  * `Mismatched field` - either `Name` or `DateAdded`, depending on which assertion failed
  * `my value` - mismatched value from the left file
  * `your value` - mismatched value from the right file
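
The join-and-report behaviour described in this list can be sketched in a few lines of Python. This is only an illustration, not the tool's actual C# implementation: the `compare` and `normalize` helpers are hypothetical, rows are plain dictionaries instead of xlsx rows, only the name check is modelled, and rows without a partner are skipped.

```python
def normalize(value):
    """Treat None as empty, trim whitespace, and ignore case."""
    return ("" if value is None else str(value)).strip().casefold()


def compare(left_rows, right_rows, left_key, right_key, assertions,
            left_context, right_context):
    """Return one result row per mismatched cell (hypothetical helper)."""
    # Index the right-hand rows by key so each left row can find its partner.
    right_by_key = {row[right_key]: row for row in right_rows}
    results = []
    for left in left_rows:
        right = right_by_key.get(left[left_key])
        if right is None:
            continue  # unmatched rows are out of scope for this sketch
        for left_col, right_col in assertions:
            if normalize(left[left_col]) != normalize(right[right_col]):
                results.append({
                    left_context: left[left_context],
                    right_context: right[right_context],
                    "Mismatched field": left_col,
                    "my value": left[left_col],
                    "your value": right[right_col],
                })
    return results


# Made-up sample data standing in for left.xlsx and right.xlsx.
left_rows = [
    {"Id": 1, "Batch #": "B-7", "Name": "Ryan "},
    {"Id": 2, "Batch #": "B-8", "Name": "Alice"},
]
right_rows = [
    {"OLD_ID": 1, "NEW_ID": 101, "CUSTOMER_NAME": "RYAN"},
    {"OLD_ID": 2, "NEW_ID": 102, "CUSTOMER_NAME": "Alicia"},
]
mismatches = compare(left_rows, right_rows, "Id", "OLD_ID",
                     [("Name", "CUSTOMER_NAME")], "Batch #", "NEW_ID")
for row in mismatches:
    print(row)
```

With this sample data only the second row is reported, because `"Ryan "` and `"RYAN"` are equal once case and whitespace are ignored.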

### Top-level configuration

|key|meaning|default|
|-|-|-|
|`leftKeyColumn`|column name in the "left" file that contains a primary key||
|`rightKeyColumn`|column name in the "right" file that matches the primary key||
|`resultOptions`|object configuring the result file, see below||
|`assertions`|an array of assertion objects configuring how we want columns to match, see below||
|`ignoreMissingRows`|allow rows to exist in the "left" file without a match in the "right" file, useful for checking partial output|`false`|

### `resultOptions` configuration

|key|meaning|default|
|-|-|-|
|`path`|name of the xlsx file to write|`results.xlsx`|
|`leftValueHeader`|header for the column of mismatched values from the left file|`left value`|
|`rightValueHeader`|header for the column of mismatched values from the right file|`right value`|
|`leftColumnNames`|additional data to include from the left file. This is useful for adding context to help analyse the mismatches.|`null`|
|`rightColumnNames`|additional data to include from the right file|`null`|

### `assertion` configuration

|key|meaning|default|
|-|-|-|
|`leftColumnName`|column to compare from the left file||
|`rightColumnName`|column to compare from the right file||
|`matchBy`|how to compare the two values, see `matchBy` below|`string`|
|`remove`|if present: before comparison, remove this string from both values|`null`|
|`zeroRepresentsEmpty`|if true: before comparison, convert any zero values (e.g. `0`, `0.0`) to empty string|`false`|
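
To make the `remove` and `zeroRepresentsEmpty` options concrete, here is a rough Python sketch of the clean-up they describe. The `preprocess` function is hypothetical, and the order of the two steps and the exact definition of a "zero value" are assumptions; the real tool may differ.

```python
from decimal import Decimal, InvalidOperation


def preprocess(value, remove=None, zero_represents_empty=False):
    """Clean one cell value before comparison (hypothetical helper)."""
    text = "" if value is None else str(value)
    if remove:
        # `remove`: drop every occurrence of the configured string.
        text = text.replace(remove, "")
    if zero_represents_empty:
        # `zeroRepresentsEmpty`: numeric zeros such as "0" or "0.0" become "".
        try:
            if Decimal(text.strip()) == 0:
                text = ""
        except InvalidOperation:
            pass  # not a number, leave the text unchanged
    return text


print(repr(preprocess("ACC-123", remove="ACC-")))
print(repr(preprocess("0.0", zero_represents_empty=True)))
print(repr(preprocess("0.5", zero_represents_empty=True)))
```

The same clean-up would be applied to both the left and the right value, after which the chosen `matchBy` rule compares the results.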

### `matchBy` options

|`matchBy`|rule|examples|
|-|-|-|
|`string`|strings must match, ignoring case and leading/trailing whitespace|`test` matches `test`, `TEST` and ` Test `, but not `testing`|
|`stringIgnoreMissingLeft`|same as `string`, but treat a missing "left" value as a match|same as `string`, but an empty string matches `test`|
|`integer`|parse to integers before comparison|`0123` matches `123`|
|`decimal`|parse to decimals before comparison|`0.123` matches `.123000`|
|`date`|parse as dates before comparison|`2021-04-02` matches `20210402` and `4/2/2021 3:45PM`, but not `2021-04-03`|
|`stringLeftStartsWithRight`|the left value must start with the right value|`testing` matches `test`, but not `testing with suffix`|
|`stringRightStartsWithLeft`|the right value must start with the left value|`test` matches `testing`|
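
The rules in this table can be approximated with the Python sketch below, which reproduces the examples from the table. It is an illustration rather than the tool's C# code: the `matches` function is hypothetical, the list of accepted date formats is an assumption chosen to cover the examples, and applying the `string` normalization to the two "starts with" rules is also an assumption.

```python
from datetime import datetime
from decimal import Decimal

# Assumed formats, just enough to cover the examples in the table above.
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y %I:%M%p", "%m/%d/%Y")


def _text(value):
    """Treat None as empty, trim whitespace, and ignore case."""
    return ("" if value is None else str(value)).strip().casefold()


def _date(value):
    """Parse a value to a date, discarding any time of day."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def matches(match_by, left, right):
    """Return True when left and right agree under the given rule."""
    if match_by == "string":
        return _text(left) == _text(right)
    if match_by == "stringIgnoreMissingLeft":
        return _text(left) == "" or _text(left) == _text(right)
    if match_by == "integer":
        return int(str(left).strip()) == int(str(right).strip())
    if match_by == "decimal":
        return Decimal(str(left).strip()) == Decimal(str(right).strip())
    if match_by == "date":
        return _date(left) == _date(right)
    if match_by == "stringLeftStartsWithRight":
        return _text(left).startswith(_text(right))
    if match_by == "stringRightStartsWithLeft":
        return _text(right).startswith(_text(left))
    raise ValueError(f"unknown matchBy: {match_by!r}")


print(matches("date", "2021-04-02", "4/2/2021 3:45PM"))
print(matches("stringLeftStartsWithRight", "testing", "testing with suffix"))
```

The first call prints `True` because only the date part is compared; the second prints `False` because the left value is shorter than the right one.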

## Developing

* uses `net5`
* recommend using [Visual Studio Code](https://code.visualstudio.com/), or
  another IDE that supports [EditorConfig](https://editorconfig.org/)
* releases are based on git tags: make a tag like `v${SEMVER}` and CI will
  create a GitHub release with standalone binaries
