CSV-SQLite3 Using Node

Problem Statement

Instead of using Intuit Quicken or Microsoft Money, or even GnuCash, I would like to use Ledger3 to manage my finances. Two big problems immediately arise:
  1. What do I do about the years of prior data that I have accumulated?
  2. How do I efficiently process new data as it comes in from my bank?

The prior data comes in many different sizes and styles. For the immediate future, I will focus solely on the years 2016-2019. I can add additional years after getting things to work for 2016-2019.

New data (transactions) comes in from my bank through CSV and QFX downloads. I will download and save both the CSV and QFX files, but work exclusively with the CSV files at this point.

There is a Download Transactions button that loads a modal window with settings for

  • Account
  • From Date
  • To Date
  • Download To (i.e., format)

{{{noindent}}} So I can easily pick the account, start date, end date, and format, and download new transactions into a separate file. The problem is that, because this is a manual operation, it becomes difficult to keep track of which dates have already been downloaded and processed, or to reprocess some or all dates, without accumulating duplicate data. An acceptable solution is to concatenate newly downloaded data into files according to account and year, then remove duplicate items. This is accomplished with a Perl script that concatenates the files, sorts the lines, unique’ifies them, and then saves and backs up the new and original files.

The Workflow

A potential workflow would therefore be:

  1. Download raw data from a bank by account, start date, and end date, which will be stored in the ~/Downloads directory as export.csv. A script can be executed to review the data and determine what the year is. However, there is no indication in either the filename or the data of which account the data was downloaded from. Therefore, any script that is run must be told which account this data is associated with, as for example here:
    catuniq [--dest <path>] <acct>
        

    Invoking this command will look for a file named export.csv in the ~/Downloads directory, determine what year its transactions are from, catenate the file onto the proper destination file according to year and account, and then sort and unique’ify the new file. It will create the necessary destination directory and file, if needed, to allow the command to run.

    The file onto which the export is catenated can either be indicated on the command line by means of the --dest option, or located through the environment variable $WORKUSB if the --dest option is omitted. Backups are stored in $WORKBAK with the Unix time appended to the end.

    Perl script catuniq.pl

  2. Process the raw data to remove extraneous and unnecessary data while reformatting it into the proper form and columns for working efficiently with the financial data, e.g., producing accounting and tax reports, budgets, etc. This processed data will be stored in an SQLite3 database, avoiding duplicated data at all times.
  3. Export data from the SQLite3 database, and convert it into a form usable by the Ledger3 program, again always avoiding duplicating transactions.
  4. Do as much as possible with the Ledger3 data automatically to create proper double entry transactions to avoid time-consuming manual work.

This workflow requires a way to download any amount of data and process it with the click of a button without worrying about duplicating transactions. Ledger3 is supposed to be able to detect duplicates and avoid reloading them with its convert command. Therefore it would be convenient to be able to download any amount of new data from a bank and have it combined with old data without duplicating transactions as well.

Then, the workflow would process the downloaded data into a form usable in both an SQLite3 database and Ledger3, again without having to worry about duplicating data. Ledger3 uses a hashing function to produce a unique hash for each entry, similar to how git works. This may be a possible solution for the raw bank data as well.
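
As a minimal sketch of that idea (an assumption inspired by the paragraph above, not how Ledger or the catuniq.pl script below actually work), raw CSV rows could be deduplicated by a content hash:

// Minimal sketch: dedupe raw CSV rows by a SHA-1 content hash.
// This is an assumed approach, not part of catuniq.pl or csv-sqlite3.
const crypto = require('crypto');

function dedupeRows(rows) {
    const seen = new Set();
    return rows.filter(row => {
        const hash = crypto.createHash('sha1').update(row.trim()).digest('hex');
        if (seen.has(hash)) return false; // an identical transaction line was already kept
        seen.add(hash);
        return true;
    });
}

// e.g., dedupeRows(['1/4/2016,DEBIT,...', '1/4/2016,DEBIT,...']) keeps only one copy.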

Source Code for catuniq.pl

package CATUNIQ;
our $VERSION = "0.1.2";
eval $VERSION;
# 2019-07-31T12:30

use strict;
use warnings; no warnings 'experimental';
use v5.16;

use File::Spec;			# Portably perform operations on file names
use File::Path;			# Create or remove directory trees
use File::Slurp;		# Simple and Efficient Reading/Writing/Modifying of Complete Files
use File::Copy;			# Copy files or filehandles
use List::Util 'uniqstr';	# A selection of general-utility list subroutines
use OptArgs2;			# Integrated argument and option processing
use Data::Printer;		# colored pretty-print of Perl data structures and objects


my @valid_accts = (qw'
    6815
    6831
    6151'
);

my @valid_years = (qw'
    2016
    2017
    2018
    2019'
);


# ESTABLISH THE CL OPTIONS AND ARGUMENTS
opt help => (
    isa => 'Flag',
    comment => 'Help',
    alias => 'h',
    ishelp => 1,
);

arg acct => (
    isa     => 'Str',
    comment => 'The name of the account to which the file is related; e.g. "usb_6815" or "usb_6831"',
    required=> 1,
);

opt dest => (
    isa     => 'Str',
    alias   => 'd',
    comment => 'path to the destination file upon which the new data will be catenated.',
    default => exists $ENV{WORKUSB} ? $ENV{WORKUSB} : undef,
);

opt restore => (
    comment => 'Restore a backed-up file related to <ACCT>',
    isa     => 'Flag',
    alias   => 'r',
);

# PROCESS THE CL OPTIONS AND ARGUMENTS
my $opts = optargs;




# VERIFY FILES

# Verify $WORKBAK exists
exists $ENV{WORKBAK} || die("STOP: \$WORKBAK is not defined");

# Verify correct form of 'acct', e.g., 'usb_6815'
my ($acct) = $opts->{acct} =~ /usb_(\d{4})/ or die("STOP: incorrect acct form: $opts->{acct}");
$acct ~~ @valid_accts or die("STOP: acct $acct is not a member of @valid_accts");

# verify a $dest has been supplied
die ("STOP: you did not supply a '--dest' option and \$WORKUSB is not defined.") unless defined $opts->{dest};
my $dest = File::Spec->catdir($opts->{dest}, $opts->{acct}); # e.g., $WORKUSB/usb_6815

if ($opts->{restore}) {
    say "Running restore.";
    restore();
    exit;
}

# Find and verify 'export.csv'
my $export = File::Spec->catfile($ENV{HOME}, 'Downloads', 'export.csv');
-e -r -w $export or die("STOP: $export must exist, be readable, and be writable.");

# Find year from within export.csv
my @lines = read_file($export);
chomp(my $header = shift @lines); # remove the header line from $export
my ($year) = $lines[1] =~ /([[:digit:]]{4})/
    or die("Cannot obtain a year from $export\n");

# verify $dest_year dir exists or create it, including subdirectories
my $dest_year = File::Spec->catdir($dest, $year); # e.g., $WORKUSB/usb_6815/2019
File::Path::make_path($dest_year, {verbose => 1}) unless ( -d $dest_year );

my $acct_year = "$opts->{acct}--${year}.csv";     # e.g., usb_6815--2019.csv
my $dest_file = File::Spec->catfile($dest, $year, $acct_year); # e.g., $WORKUSB/usb_6815/2019/usb_6815--2019.csv




# Backup original $dest_file to $WORKBAK before appending to
my $dest_bak = File::Spec->catfile($ENV{WORKBAK}, "$acct_year." . time());
copy($dest_file, $dest_bak);

# APPEND $export onto $dest_file
append_file($dest_file, @lines)
    or die("STOP: append of $dest_file and \@lines failed.\n");

# UNIQUE new $dest_file
@lines = uniqstr sort map { # first change date to year-mm-dd for proper sorting
    if (/([[:digit:]]{1,2})\/([[:digit:]]{1,2})\/([[:digit:]]{4})/) {
        my $year = sprintf("%4d-%02d-%02d",$3,$1,$2);
        s/$&/$year/;
    }
    $_;
} read_file($dest_file);

unshift @lines, pop @lines; # header ends up last after the sort; put it back to beginning

# Save new $dest_file
write_file($dest_file, @lines);

# Backup export.csv to $WORKBAK
move($export, File::Spec->catfile($ENV{WORKBAK}, "export.${acct_year}." . time()));



say "SUCCESS: $export catuniq'ed onto $dest_file.";

sub restore {
    use POSIX qw(strftime);
    my $acct = $opts->{acct}; # e.g. usb_6815
    my $dt = qr/^(\d{4}-\d{2}-\d{2})/;

    chdir $ENV{WORKBAK};
    opendir (my $dh, '.') || die "Can't open $ENV{WORKBAK}: $!";

    my @baks =
        sort { # sort by most recent backup first
            my ($at) = $a->{t} =~ $dt; # just sort by datetime
            my ($bt) = $b->{t} =~ $dt;
            $bt cmp $at;
        }
        map { # change Unix time to POSIX ISO datetime %Y-%m-%dT%H:%M:%S
            my ($acct, $time) = /^(.*.csv).(\d+)$/;
            $time = substr $time, 0, 10; # remove milliseconds from those times that have them
            my $t = (strftime "%F T %T", localtime($time)) . sprintf(" --- %s", $acct);
            {t => $t, o => $_}; # map to POSIX time, and original filename as a hashref
        }
        grep {/$acct.*.csv/ }
        readdir($dh);

    foreach (@baks) {
        state $c = 0;
        printf("[%2d] %s\n", $c++, $_->{t});
    }

    print "Pick a number: ";
    chomp (my $num = <STDIN>);
    say "You chose $baks[$num]->{t} ($baks[$num]->{o})";
    print "restore (y/n)? ";
    exit unless <STDIN> =~ /y/i;

    my ($file) = $baks[$num]->{t} =~ /--- (.*)$/; # i.e., 'usb_6815--2019.csv'
    my ($year) = $file =~ /(\d{4})\.csv$/; # i.e., '2019'
    my $restore_path = File::Spec->catfile($dest,$year,$file); # full path to file to be restored
    my $bak_path = File::Spec->catfile($ENV{WORKBAK}, $baks[$num]->{o}); # full path to backed-up file

    # back up the file to be restored, just in case; use same directory
    move( ${restore_path}, ${restore_path}.'.bak') or die "Backup of ${restore_path} failed: $!";
    # note that the backed-up file will be deleted
    move( ${bak_path}, ${restore_path}) or die "Restore of $baks[$num]->{o} failed: $!";

    say "Successfully restored $baks[$num]->{o} to $restore_path";
}
ln -f $PWD/scripts/catuniq.pl $WORKBIN/catuniq

Introduction

US Bank has the facility to download bank records in CSV form. This program is designed to convert those downloaded CSV files into a form usable by SQLite, and then to use SQLite to process the data.

{{{heading(Header Lines)}}}

The header lines are:

"Date","Transaction","Name","Memo","Amount"

{{{heading(Data Columns)}}}

{{{subheading(Date)}}}

A sample date is:

1/4/2016

This should be transformed into:

2016-01-04

{{{subheading(Transaction)}}}

A Transaction is one of

  • DEBIT
  • CREDIT

This should be transformed into:

debit | credit

{{{subheading(Name)}}}

A sample name entry is:

DEBIT PURCHASE -VISA USPS PO BOXES 66800-3447779 DC

This should be transformed into:

visa purchase usps po boxes

{{{subheading(Memo)}}}

A sample memo entry is:

Download from usbank.com. USPS PO BOXES 66800-3447779 DC

The Download from usbank.com should be removed in all cases. The extraneous numbers should be removed whenever possible.

{{{subheading(Amount)}}}

  • -66.0000

This should be transformed into:

$-66.00
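
The following is a minimal Node.js sketch of these per-column transformations. It assumes the accounting module introduced later in this document; it is not the project’s actual transform function.

// Minimal sketch of the per-column transformations described above.
const accounting = require('accounting');

function transformRow(row) {
    // "1/4/2016" -> "2016-01-04"
    const [m, d, y] = row.Date.split('/').map(Number);
    const date = `${y}-${String(m).padStart(2, '0')}-${String(d).padStart(2, '0')}`;

    const trans = row.Transaction.toLowerCase();                     // "DEBIT" -> "debit"
    const name  = row.Name.toLowerCase();                            // rough payee cleanup
    const memo  = row.Memo.replace(/Download from usbank\.com\.\s*/, '');
    const amount = accounting.formatMoney(row.Amount);               // "-66.0000" -> "$-66.00"

    return { date, trans, name, memo, amount };
}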

SQLite Tables

The minimum SQLite tables that should be created are:

usb
includes business (6815), trust (6831), personal (6151) data
cases
190301, etc.
people
John Doe, Mary Jane, etc.
businesses
Law Office of …, etc.
checks
holds check information parsed from worklog.<year>.otl files

More can be created as needed.

SQLite USB Table Columns

The columns that should be created for the usb bank tables are:

  • rowid (implicit creation)
  • acct in the form of usb_6815|usb_6831|usb_6151
  • date in the form of yyyy-mm-dd
  • trans containing either CREDIT | DEBIT
  • checkno containing a check number, if present (check is a reserved word and throws an error)
  • txfr containing a direction arrow (< or >) and a bank account (usb_6151)
  • payee
  • category
  • memo
  • desc1
  • desc2
  • caseno containing a related case number (case is apparently a reserved word and throws an error)
  • amount in the form ±##,###.##
| rowid       | acct     | date       | trans  | checkno | txfr       | payee    | category | memo | desc1 | desc2 | caseno  | amount    |
|-------------+----------+------------+--------+---------+------------+----------+----------+------+-------+-------+---------+-----------|
| primary key | usb_6815 | yyyy-mm-dd | credit | ####    | < usb 6151 | text     | text     | text | text  | text  | integer | ##,###.## |
| implicit    | usb_6831 | not null   | debit  | null    | > usb 6831 | not null | null     | null | null  | null  | null    | not null  |
| creation    | usb_6151 |            |        |         | null       |          |          |      |       |       |         |           |
usb (
       rowid     INTEGER PRIMARY KEY NOT NULL,
       acct      TEXT NOT NULL,
       date      TEXT NOT NULL,
       trans     TEXT NOT NULL,
       checkno   TEXT,
       txfr      TEXT,
       payee     TEXT NOT NULL,
       category  TEXT,
       note      TEXT,
       desc1     TEXT,
       desc2     TEXT,
       caseno    TEXT,
       amount    REAL NOT NULL,
       OrigPayee TEXT NOT NULL,
       OrigMemo  TEXT NOT NULL )

The database column names are identical to the SQL column names except for rowid, which is missing since it is auto-generated by SQLite3.

const DB_TABLES = {
    usb: 'usb',
    checks: 'checks',
};

const DB_ACCTS = {
    '6815': 'Business',
    '6831': 'Trust',
    '6151': 'Personal',
};

const DB_YEARS = [
    '2016',
    '2017',
    '2018',
]

const DB_COLS = [
    'acct',
    'date',
    'trans',
    'checkno',
    'txfr',
    'payee',
    'category',
    'note',
    'desc1',
    'desc2',
    'caseno',
    'amount',
    'OrigPayee',
    'OrigMemo',
];

const EXPORT_DB_COLS = [
    'rowid',
    'acct',
    'date',
    'trans',
    'checkno',
    'txfr',
    'payee',
    'category',
    'note',
    'caseno',
    'amount',
];

SQLite Checks Table Columns

The column names for the checks table columns are:

  • rowid
  • acct e.g., 6815, 6831, 6151
  • checkno, e.g., 1001, 1002
  • date
  • payee
  • subject
  • purpose
  • caseno, e.g., 190301, 190205
  • amount
| rowid       | acct     | checkno  | date     | payee    | subject  | purpose | caseno   | amount |
|-------------+----------+----------+----------+----------+----------+---------+----------+--------|
| INTEGER     | TEXT     | INTEGER  | TEXT     | TEXT     | TEXT     | TEXT    | TEXT     | REAL   |
| PRIMARY KEY | NOT NULL | NOT NULL | NOT NULL | NOT NULL | NOT NULL |         | NOT NULL |        |
checks (
	acct		TEXT NOT NULL,
	checkno		TEXT NOT NULL,
	date		TEXT NOT NULL,
	payee		TEXT NOT NULL,
	subject		TEXT NOT NULL,
	purpose		TEXT,
 	caseno		TEXT NOT NULL,
	amount		REAL NOT NULL
)
const CHECKS_COLS = [
    'acct',
    'checkno',
    'date',
    'payee',
    'subject',
    'purpose',
    'caseno',
    'amount'
];

CSV-SQLite3 Usage

  • csv-sqlite3 --csv <usb-acct> <year> [--attach <db>] [--checks]
    • <usb-acct> is one of: 6815|6831|6151
    • <year> is one of 2016|2017|2018
    • --attach <db> is optional; the default is workfin.sqlite, otherwise <db>.sqlite
    • --checks parses worklog.<year>.otl files for check data and places it into <db> in a checks table
    • Transform downloaded data and save in a new CSV file and a new sqlite3 database. The source CSV files are found at:

      $WORKFIN/src/usb/usb_{6815|6831|6151}/{2016|2017|2018}/usb_6815--2016.csv

    • The transformed CSV files are found at:

      $WORKFIN/csv/usb_{6815|6831|6151}__{2016|2017|2018}.csv

    • The new SQLite3 database is found at:

      $WORKFIN/db/workfin.sqlite

    • The exported database file in CSV format will be found at:

      $WORKFIN/csv/workfin.csv

csv-sqlite3 --delete [<db>]

Delete the SQLite3 database file <db>.sqlite, or workfin.sqlite if left blank; also delete the CSV files in csv/.

  • csv-sqlite3 --log-level <level>

    Set a log level (default is warn)

Create the Project

{{{heading(External Dependencies)}}}

This project’s external dependencies are the following Node.js modules:

command-line-args
https://github.com/75lb/command-line-args#readme
command-line-usage
https://github.com/75lb/command-line-usage
csv
https://csv.js.org/
sqlite3
https://github.com/mapbox/node-sqlite3/wiki
accounting
http://openexchangerates.github.io/accounting.js/

{{{heading(Node.js Dependencies)}}}

This project’s Node.js internal dependencies are the following:

File System (fs)
File System for working with files
Path (path)
Utilities for working with file and directory paths
Utilities (util)
Utilities for inspecting objects
Assert (assert)
Assertion functions for verifying invariants

Install the CSV-SQLite3 Package and Dependencies

Create the CSV-SQLite3 project and install the Node.js package dependencies:

  • command-line-args
  • command-line-usage
  • csv
  • sqlite3
  • accounting
yarn --yes --private init
sed -i '' -e 's/\"version\": \"1.0.0\",/\"version\": \"0.0.73\",/' package.json
yarn add command-line-args command-line-usage csv sqlite3 accounting
yarn add ssh://[email protected]:wlharvey4/wlparser.git#prod
cat package.json

Import the Node.js Dependencies

Create the base file index.js and import the project’s dependencies.

/* index.js */

const fs     = require('fs');
const path   = require('path');
const util   = require('util');
const assert = require('assert').strict;

const cl_args  = require('command-line-args');
const cl_usage = require('command-line-usage');
const csv      = require('csv');
const sqlite3  = require('sqlite3').verbose();      // remove 'verbose' in production
const accounting = require('accounting');

Establish Database Table Name and Columns

<<define-database-tables-and-columns>>

Working with the Command Line

Here is the implementation of command-line argument parsing and the generation of a usage message, triggered by requesting the option --help.

Command Line Usage

This section generates a usage message activated by the --help option. It uses the ~option_defs~ object created in the code in the next section.

const sections = [
    {
        header: 'CSV-SQLite3',
        content: 'Processes raw usb csv files into a form usable by SQLite3'
    },
    {
        header: 'Options',
        optionList: option_defs,
    },
    {
        content: `Project directory: {underline ${process.env.WORKNODE}/CSV-SQLite3}`
    }
];
const usage = cl_usage(sections);
console.log(usage);

Command Line Argument Processing

Options include giving the name of a database to attach to using --attach <db>. In the absence of this option, a default database will be used. A database can be deleted here as well using the option --delete <db>, with a backup being saved in the WORKBAK directory with the Unix time suffixed to the end. If the option {{{option(–checks)}}} is included, parse the worklog.<year>.otl worklog file for check information and save it in a checks table in <db>.

{{{subheading(Usage)}}}

Obtain usage information using the {{{option(–help)}}} option:

csv-sqlite3 --help | -h

{{{subheading(Choose the CSV File)}}}

For non-delete commands, identify the CSV file to transform via the --csv option:

csv-sqlite3 --csv | -c 6815|6831 2004...2019

{{{subheading(Attach and Delete a Database)}}}

The database is attachable (it will be created automatically if it does not exist) and deletable:

csv-sqlite3 --attach <db> | -a <db-name>
csv-sqlite3 --delete <db> | -d <db-name>

{{{subheading(Export the SQLite3 data to CSV)}}}

Export SQLite3 data into a CSV file of the same name:

csv-sqlite3 --export <db> | -e <db-name>

{{{subheading(Set a Log Level)}}}

csv-sqlite3 --log-level <value:0..10> | -l <value:0..10>

<<option-defs-variable>>

const option_defs = [
    { name: 'help',   alias: 'h', type: Boolean, description: 'Prints this usage message.' },
    { name: 'attach', alias: 'a', type: String,  description: 'Attach to an existing or new database file.' },
    { name: 'delete', alias: 'd', type: String,  description: 'Delete an existing database file and related CSV files.' },
    { name: 'csv',    alias: 'c', type: String,  description: 'Process a CSV file [6815|6831] yyyy', multiple: true  },
    { name: 'export', alias: 'e', type: String,  description: 'Export sqlite3 data into a csv file (default \'workfin.csv\')' },
    { name: 'checks',             type: Boolean, description: 'Find checks in worklog.<year>.otl and save to the sqlite3 database checks table.' },
    { name: 'log-level', alias: 'l', type: Number, description: 'Set a log level 0..10' },
];
const options = cl_args(option_defs);
console.log(options);

if (options.help) {
    <<csv-sqlite3-usage>>
    process.exit(0);
}

let LOG_LEVEL = process.env.LOG_LEVEL || 1;
if (options['log-level'] >= 0) {
    if (typeof options['log-level'] === 'number' && options['log-level'] <= 10)
        LOG_LEVEL = options['log-level'];
    else {
        console.error(`Incorrect log-level: ${options['log-level']}; must be between 0 and 10`);
    }
}
console.log(`Log-level set at: ${LOG_LEVEL}`);

Attach To or Delete a Database

SQLite3 can have any number of databases. Only one is initially attached, but more can be attached subsequent to the first attachment. If the database does not exist, it will be created. If the user requests that a database file be deleted, it will be backed up first, then deleted.

The user can attach to a database file (either a specified file or the default file, defined as $WORKFIN/db/workfin.sqlite), or delete a specified database file and the associated CSV file exported from the database data. All deleted files are backed up to a backup directory that must be defined as a shell environment variable: WORKBAK. The backed-up files have the current Unix time (in milliseconds, via Date.now()) appended to their names.

The attached database will be referenced as db.

{{{heading(Verbose Mode)}}}

During development, call the verbose() method on the sqlite3 object to enable better stack traces. In production, remove this call for improved performance.

if ( !process.env.WORKDB ) { // $WORKFIN/db
    console.error('You must define a shell variable named WORKDB as a base directory for the .sqlite database file.');
    process.exit(1);
}
if ( !process.env.WORKCSV ) { // $WORKFIN/csv
    console.error('You must define a shell variable named WORKCSV as a base directory for the .csv files.');
    process.exit(1);
}
if ( !process.env.WORKLEDGER ) { // $WORKFIN/ledger
    console.error('You must define a shell variable named WORKLEDGER as a base directory for the .ledger files.');
    process.exit(1);
}
if ( !process.env.WORKBAK) { // $WORK/workbak
    console.error('You must define a shell variable named WORKBAK as a backup directory for storing deleted files.');
    process.exit(1);
}

const WORKDB     = process.env.WORKDB;     // base directory for .sqlite db files
if (!fs.existsSync(WORKDB)) { fs.mkdirSync(WORKDB); }
const WORKCSV    = process.env.WORKCSV;    // base directory for .csv files
if (!fs.existsSync(WORKCSV)) { fs.mkdirSync(WORKCSV); }
const WORKLEDGER = process.env.WORKLEDGER; // base directory for .ledger files
const WORKBAK    = process.env.WORKBAK;    // base directory for storing deleted files

const DB_DEFAULT = 'workfin';	       // default sqlite db name
const db_file = options.attach ? options.attach :    // use provided option for attaching
                options.delete ? options.delete :    // use provided option for deletion
                DB_DEFAULT;  	       	         // if no provided option, use the default name

const db_path = path.format({
    dir: WORKDB,
    name: db_file,
    ext: '.sqlite'
});
console.log(`db_path: ${db_path}`);

const csv_path = path.format({
    dir: WORKCSV,
    name: db_file,
    ext: '.csv'
});
console.log(`csv_path: ${csv_path}`);

/*---DELETE--*/
if (options.hasOwnProperty('delete')) {

    const WORKBAK_DB = path.format({
        dir: WORKBAK,
        name: 'db'
    });
    if (!fs.existsSync(WORKBAK_DB)) {
        fs.mkdirSync(WORKBAK_DB, {recursive: true});
    }

    const WORKBAK_CSV= path.format({
        dir:   WORKBAK,
        name: 'csv'
    });
    if (!fs.existsSync(WORKBAK_CSV)) {
        fs.mkdirSync(WORKBAK_CSV, {recursive: true});
    }

    const WORKBAK_LEDGER = path.format({
        dir:   WORKBAK,
        name: 'ledger'
    });
    if (!fs.existsSync(WORKBAK_LEDGER)) {
        fs.mkdirSync(WORKBAK_LEDGER, {recursive: true});
    }

    // Backup workfin.sqlite, workfin.csv
    const db_path_bak = path.format({
        dir: WORKBAK_DB,
        name: db_file,
        ext: `.sqlite.${Date.now()}`
    });

    const csv_path_bak = path.format({
        dir: WORKBAK_CSV,
        name: db_file,
        ext: `.csv.${Date.now()}`
    });

    try {
        fs.renameSync(db_path, db_path_bak);
        console.error(`Renamed ${db_path} to ${db_path_bak}`);
        fs.renameSync(csv_path, csv_path_bak);
        console.error(`Renamed ${csv_path} to ${csv_path_bak}`);
    } catch (err) {
        if (err.code === 'ENOENT')
            console.log(`file ${db_path} and/or ${csv_path} did not exist; ignoring.`);
        else {
            throw err;
        }
    }

    // Backup all .csv files
    try {
        const files = fs.readdirSync(WORKCSV);
        files.forEach(file => {
            const db_csv_path_file = path.format({
                dir: WORKCSV,
                name: file
            });
            const db_csv_path_bak  = path.format({
                dir: WORKBAK_CSV,
                name: file,
                ext: `.${Date.now()}`
            });
            fs.renameSync(db_csv_path_file, db_csv_path_bak);
            console.log(`Renamed ${db_csv_path_file} to ${db_csv_path_bak}`);
        });

    } catch (err) {
        if (err.code === 'ENOENT') {
            console.log(`${WORKCSV} probably does not exist`);
        } else {
            throw err;
        }
    }

    /* Ledger */
    try {
        const files = fs.readdirSync(WORKLEDGER);
        files.forEach(file => {
            if (!/zero/.test(file)) { // don't backup the zero ledger file
                const ledger_file = path.format({
                    dir: WORKLEDGER,
                    name: file
                });
                const ledger_file_bak = path.format({
                    dir: WORKBAK_LEDGER,
                    name: file,
                    ext: `.${Date.now()}`
                });
                fs.renameSync(ledger_file, ledger_file_bak);
                console.log(`Renamed ${ledger_file} to ${ledger_file_bak}`);
            }
        });

    } catch (err) {
        if (err.code === 'ENOENT') {
            console.log(`${WORKLEDGER} probably does not exist`);
        } else {
            throw err;
        }
    }

    process.exit(0);
}

/*--ATTACH--*/
// attach in all situations except --delete
console.log('attaching...');
const db = new sqlite3.Database(db_path, err => {
    if (err) {
        return console.error(`Error opening database file ${db_path}: ${err.message})`);
    }
    console.log(`Successfully attached to database file ${db_path}`);
});
db.serialize();
db.run(`CREATE TABLE IF NOT EXISTS
    <<define-usb-table-schema>>`);

// year is needed for checks, so define it here
const [acct, year] = options.csv || []; // tolerate a missing --csv option; checked below
if (!year) {
  console.error('ERROR: year is not defined.');
  process.exit(1);
}
const wl_year = parseInt(year, 10);

Find and Store Checks

After the Sqlite3 database is exported and converted by Ledger, there are numerous individual entries that need to be converted but for which there is no real data available to help. An example would be checks. The check information is located in the worklog, so one solution is to parse the worklog to obtain check information, then parse the Ledger file to update it.

I created a package called wlparser that can be used to find particular elements in the worklog.<year>.otl files. Among other things, this package is set up to find and return all checks for a particular year. At this point, the findChecks program requires a <year> argument, and will then gather all checks and place them into the SQLite3 database in a table called checks. It can be run any number of times for any year and will not add duplicate entries.

When this csv-sqlite3 program converts the CSV files into the ledger format, it can do a search for a particular check number in this SQLite3 checks table and incorporate that data into the ledger file.
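
A minimal sketch of such a lookup (assuming the open sqlite3 db handle created elsewhere in this document; the findCheck() helper is hypothetical):

// Minimal sketch: look up a check by number in the 'checks' table so its
// payee/purpose/caseno can be folded into the generated ledger entry.
// 'db' is assumed to be an open sqlite3 Database; findCheck() is hypothetical.
function findCheck(db, checkno, callback) {
    const sql = 'SELECT payee, purpose, caseno, amount FROM checks WHERE checkno = ?';
    db.get(sql, [checkno], (err, row) => {
        if (err) return callback(err);
        callback(null, row); // 'row' is undefined when no such check exists
    });
}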

Use the command-line program find-checks <year> to find checks in the yearly worklog file identified by the <year> argument (i.e., worklog.<year>.otl) and save them into the SQLite3 database <db> in a table called checks.

/* find-checks.js */

/* USAGE:
 * find-checks <year> [db]
 */

const DEFAULT_DB = 'workfin';
const TABLE      = 'checks';
const EXT        = '.sqlite';

// make sure WORKDB is defined
if (typeof process.env.WORKDB === 'undefined') {
  console.error('Must define the environment variable WORKDB');
  process.exit(1);
}
const WORKDB = process.env.WORKDB;

// make sure a <year> argument is included
if (process.argv.length < 3) {
  console.error('Must include a <year> argument: "find-checks <year>"');
  process.exit(1);
}

// make sure the <year> argument is a number
const wlyear = parseInt(process.argv[2],10);
if (isNaN(wlyear)) {
  console.error(`The <year> argument: "${process.argv[2]}" must be a year, e.g., "2016"`);
  process.exit(1);
}

// second optional argument is the name of the database, without extension
// if no second argument, use default db of $WORKDB/workfin.sqlite
const path   = require('path');
const db_path = path.format({
  dir: WORKDB,
  name: `${process.argv[3] || DEFAULT_DB}`,
  ext: EXT
});

// Everything is a go; load the wlparser, wlchecks, sqlite3 modules
const {WLChecks} = require('wlparser');
const wlchecks   = new WLChecks(wlyear);
const sqlite3    = require('sqlite3').verbose(); // remove verbose() for production code
const CHECKS_COLS = [
    'acct',
    'checkno',
    'date',
    'payee',
    'subject',
    'purpose',
    'caseno',
    'amount'
];
let statement;

// Load the sqlite3 database
const db = new sqlite3.Database(db_path, err => {
  if (err) {
    console.error(`Database Error: ${err}`);
    process.exit(1);
  }
  console.log(`Successfully opened database at ${db_path}`);
});

db.serialize();

statement = `CREATE TABLE IF NOT EXISTS
<<define-checks-table-schema>>`;
db.run(statement);

let cols = CHECKS_COLS.join(','); // create string of column names for INSERT statement
let values = CHECKS_COLS.map(col => `$${col}`).join(', '); // create string of placeholders for INSERT statement
statement = `INSERT INTO ${TABLE} (${cols}) VALUES (${values})`;

let all_checks = []; // used to filter out already-entered checks

wlchecks.on('check', data => {
  delete data.type; // simply don't need this property
  if (!all_checks.includes(data.checkno)) { // filter out already-entered checks
      const new_data = {};
      for (const k in data) { // create the named parameters of form 'new_data = {$checkno: 1234}'
          new_data[`$${k}`] = data[k];
      }
      db.run(statement, new_data, (err) => { // add the check data to the sqlite database
          if (err) console.error(`ERROR: ${err}`);
      });
  };

}).on('checked', () => {
  db.close();

}).on('error', err => {
  console.error(`wlparser error: ${err}`); // report the error before exiting
  process.exit(1);

});

// load all of the previously-entered checks into the array 'all_checks'
db.all(`SELECT checkno FROM ${TABLE} WHERE date LIKE '${wlyear}%'`, [], (err, check_data) => {
    if (check_data) {
        all_checks = check_data.map(row => row.checkno);
    }

    wlchecks.findChecks(); // start the stream running
});

{{{heading(Create Symbolic Link into WORKBIN)}}}

This code creates a symbolic link into the WORKBIN directory for the command line program find-checks so that it can be run as a script. Note that the link must be symbolic in order for the node_modules/wlparser/lib/* files to be found when find-checks is run from the WORKBIN directory.

ln -sf $PWD/find-checks.js $WORKBIN/find-checks

Export SQLite DB Data to CSV File

Once data has been downloaded from the bank’s web site and imported into the sqlite3 database, it must be converted into a Ledger file. This is accomplished using the {{{option(–export)}}} option.

--export Option

To export the SQLite3 database data to CSV and Ledger files, use the {{{option(–export)}}} option:

--export [<csv-file>=workfin.csv] --csv <acct> <year>

The csv filename is optional, with the default being the same as the db file (i.e., workfin), with the extension .csv, (i.e., workfin.csv) in the WORKCSV directory. In order to facilitate matching an account to reconcile against, the --export option must be accompanied by the designation of a bank account, e.g., 6815 and a year, e.g., 2016. These will filter the data that is exported from the sqlite data file and will become the account to reconcile against.

The export is accomplished by executing in a child process the command line program:

sqlite3 -header -csv db sql-statements

The child process runs the {{{command(sqlite3)}}} command and connects its STDOUT stream to the target CSV file. The target file is opened in append mode, and a header row is emitted only when the file is still empty. The program will halt after the export.

Ledger convert Command

Upon an export of the SQLite3 data to a CSV file, the program will also send the exported data through the Ledger {{{command(convert)}}} command and into the workfin.ledger data file.[fn::Refer to the Ledger manual at Sec. 7.2.1.2 for the {{{command(convert)}}} command.] This file is located in the ledger/ directory below the workfin directory.

The {{{command(convert)}}} command uses a ledger file filled with ledger directives that associate payees with accounts. If this directives file does not exist, it will be created.

Export Code

/*--EXPORT--*/
if (options.hasOwnProperty('export')) {
    const { spawnSync } = require('child_process');
    const export_csv = options['export'] || db_file;
    console.log(`exporting to ${export_csv}...`);

    const export_csv_dir = WORKCSV;
    if (!fs.existsSync(export_csv_dir)) {
        fs.mkdirSync(export_csv_dir);
        console.log(`Created ${export_csv_dir}`);
    }
    const export_csv_path = path.format({
        dir: export_csv_dir,
        name: export_csv,
        ext: '.csv'
    });

    // --export must be accompanied by --csv <acct> <year> of the proper values
    if (!options.hasOwnProperty('csv')) {
        console.error('Export must be accompanied by a bank account (e.g., 6815), and a year (e.g., 2016)');
        process.exit(1);
    }
    const _acct = options.csv[0],
          _year = options.csv[1];

    if (!(Object.keys(DB_ACCTS).includes(_acct) && DB_YEARS.includes(_year))) {
        console.error(`Invalid values for acct: ${_acct} or year: ${_year}`);
        process.exit(1);
    }

    const usb_acct = `usb_${_acct}`;

    //'as' - Open file for appending in synchronous mode. The file is created if it does not exist.
    let fd = fs.openSync(export_csv_path,'as');
    const size = fs.statSync(export_csv_path).size;
    const header = size === 0 ? 'header' : 'noheader';
    console.log(`export_csv_path: ${export_csv_path}`);

    const sql = `
SELECT ${EXPORT_DB_COLS.join(',')}
FROM   usb
WHERE  acct = '${usb_acct}' and date like '${_year}%';`;

    console.log(`sql: ${sql}`);

    let ret = spawnSync(
        'sqlite3',
        [
            db_path,
            '-csv',
            `-${header}`,
            sql,
        ],
        {
            encoding: 'utf-8',
            stdio: [0,fd,2]
        }
    );

    if (ret.error) {
        console.log(`status: ${ret.status}\tsignal: ${ret.signal}`);
        console.log(`error: ${ret.error}`);
    }

    console.log('done exporting');
    fs.closeSync(fd);


    /* CONVERT CSV TO LEDGER */
    const ledger_dir = WORKLEDGER;
    const ledger_path = path.format({
        dir: ledger_dir,
        name: export_csv,
        ext: '.exported.ledger'
    });
    const zero_file = path.format({
        dir: ledger_dir,
        name: 'zero',
        ext: '.ledger'
    });
    if (!fs.existsSync(ledger_dir)) {
        fs.mkdirSync(ledger_dir);
    }

    //const l_file = fs.existsSync(ledger_path) ? ledger_path : zero_file;
    const l_file = zero_file;

    console.log(`converting: ${export_csv_path} to ledger_path: ${ledger_path}`);

    fd = fs.openSync(ledger_path, 'as');	// 'as' - Open file for appending in synchronous mode.
                                                // The file is created if it does not exist.
    ret = spawnSync(
        'ledger',
        [
            'convert',
            `${export_csv_path}`,
            '--invert',
            '--input-date-format=%Y-%m-%d',
            `--account=Assets:${DB_ACCTS[_acct]}`,
            '--rich-data',
            `--file=${l_file}`,
            `--now=${(new Date()).toISOString().split('T')[0]}`,
        ],
        {
            encoding: 'utf-8',
            stdio: [0,fd,2],
        }
    );

    if (ret.error) {
        console.log(`status: ${ret.status}\tsignal: ${ret.signal}`);
        console.log(`error: ${ret.error}`);
    }

    fs.closeSync(fd);
    process.exit(0);
}

/*--DON'T CONTINUE UNLESS --csv OPTION USED--*/
if (!options.hasOwnProperty('csv'))
    process.exit(0);

The Zero Ledger File

The Zero Ledger File is a ledger file with an opening balance, a list of accounts, and directives that associate payees with accounts. It is used by the ledger {{{command(convert)}}} command to prepare a ledger file, create an initial set of accounts, and parse a CSV file into the ledger format.

{{{heading(List of Accounts)}}}

  1. Expenses—where money goes
  2. Assets—where money sits
  3. Income—where money comes from
  4. Liabilities—where money is owed
  5. Equity—where value is

Beneath these top levels, there can be any level of detail required.

{{{heading(Allowable Accounts)}}}

Here are defined some allowable accounts:

account Expenses
account Assets
account Income
account Liabilities
account Equity

Use the {{{option(–strict)}}} option to show incompatible accounts.

{{{heading(Opening Balances)}}}

The first entry is a set of opening balances. It will look like this:

2016/01/01 * Opening Balance
    Expenses                          $0
    Assets:USB:Personal 6151          $0
    Assets:USB:Business 6815          $0
    Assets:USB:Trust 6831             $0
    Assets:USB:Savings                $0
    Income                            $0
    Liabilities                       $0
    Equity:Opening Balance

{{{heading(Directives and Subdirectives)}}}

The Zero file uses two directives, each of which uses a sub-directive, of the form:

payee <PAYEE>
  alias </PAYEE_REGEX/>
account <FULL:ACCOUNT>
  payee </PAYEE_REGEX/>

In the above, the first line rewrites the payee field to establish a legitimate payee. The alias is a regex; anything that matches this directive will be turned into the associated payee. The second line uses an account and a payee directive to specify the proper account. Anything that matches the payee regex will be assigned the account.

payee USPS
 alias usps
account Expenses:Office:Postage
 payee ^(USPS)$

payee Staples
 alias staples
payee Ikea
 alias ikea
payee Portland Art Museum
 alias portland art
payee The Energy Bar
 alias energy bar
account Expenses:Office:Supplies
 payee ^(Staples|Portland Art Museum|Ikea|The Energy Bar)$

payee City of Portland
 alias city of portland
account Expenses:Business:Travel
 payee ^(City of Portland)$

payee RingCentral
 alias ringcentral
payee AT&T
 alias (at&?t)
payee CenturyLink
 alias (centurylink|ctl)
payee NameBright
 alias namebright
account Expenses:Business:Communication
 payee ^(RingCentral|AT\&T|CenturyLink|NameBright)$

payee AVVO
 alias avvo
account Expenses:Business:Advertising
 payee AVVO

payee National Law Foundation
 alias national law fou
payee Coursera
 alias coursera
payee EdX Inc.
 alias edx
account Expenses:Professional:CLE
 payee ^(National Law Foundation|Coursera|EdX Inc.)$

payee State Bar of CA
 alias state bar of ca
account Expenses:Professional:License
 payee ^(State Bar of CA)$

payee Costco Gas
 alias costco gas
account Expenses:Office:Transportation
 payee ^(Costco Gas)$

payee American Express
 alias amex
payee Citi
 alias citi
account CC:Payment
 payee ^(American Express|Citi)$

payee Apple Store
 alias apple
payee Radio Shack
 alias radioshack
account Expenses:Office:Supplies
 payee ^(Apple Store|Radio Shack)$

payee State Bar WA
 alias interest paid this period
account Trust:LTAB
 payee State Bar WA

payee State Bar WA
 alias ltab
account Trust:LTAB
 payee State Bar WA

account Assets:Personal
 payee usb_6151

{{{heading(An include File)}}}

Finally, include a file with an Accounts:Payable Category:

include accounts_payable.ledger

The Accounts Payable File

The accounts_payable.ledger file contains any outstanding accounts that should be included to make the inputted data correct, such as a set of outstanding invoices:

2016/01/18 * Clark County Indigent Defense ; Invoice No.s 092--099
        Assets:Accounts Receivable	$1852.50
        Income:120703			-$  82.50 ; Invoice No 092
        Income:140707			-$ 525.00 ; Invoice No 093
        Income:140709			-$ 397.50 ; Invoice No 094
        Income:150701			-$  15.00 ; Invoice No 095
        Income:150704			-$ 742.50 ; Invoice No 096
        Income:150705			-$   7.50 ; Invoice No 097
        Income:150706			-$   7.50 ; Invoice No 098
        Income:150707			-$  75.00 ; Invoice No.099

Process CSV Files

The Node.js module ~csv~ contains the csv-parse, stream-transform, and csv-stringify sub-modules, all of which will be used in this project. The pattern is to open a CSV file, parse a CSV string into records, and pipe those records through the transformer to be massaged into shape. From there the new data is saved in another CSV file and also in an SQLite3 database using the node-sqlite3 module. A compact sketch of this pattern follows the list below.

The processing of a CSV file, therefore, involves the following steps and Node.js modules:

  1. Find the correct CSV file (using FileSystem) and open it as a Readable Stream;
  2. Open a new CSV file to hold the new transformed data as a Writable Stream
  3. Open an SQLite3 database to hold the new transformed data
  4. Read the CSV records from the file as a string (using StreamReader)
  5. Parse the string into JS records (using CSV-Parse)
  6. Transform the JS records into usable data (using CSV-Transform)
  7. Save the new data in the new CSV file (using StreamWriter)
  8. Save the new data in an SQLite3 database (using SQLite3)
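
Here is that compact sketch of the pattern using pipe(); the project code below wires the same pieces together with explicit 'readable' handlers instead, and also inserts each record into SQLite3. File names are illustrative only.

// Minimal sketch of the overall CSV processing pattern.
const fs  = require('fs');
const csv = require('csv');

fs.createReadStream('usb_6815--2016.csv', { encoding: 'utf8' }) // 1. read the raw CSV
    .pipe(csv.parse({ columns: true }))                         // 2. string -> records
    .pipe(csv.transform(record => record))                      // 3. massage each record here
    .pipe(csv.stringify({ header: true }))                      // 4. records -> CSV text
    .pipe(fs.createWriteStream('usb_6815__2016.csv'));          // 5. write the transformed CSV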

Set Up CSV-Stringify

This section receives the transformed records from the Transform function and writes them to new csv files. The new csv files will be located in the WORKCSV directory, e.g., WORK/workfin/csv. A file will be called, for example, usb_6815__2016.csv. Notice that this file name uses two underscores, whereas the source files use two dashes; in all other respects, they are the same.

const stringifier = csv.stringify({
    header: true,
    columns: DB_COLS,
});

const usb_acct = `usb_${acct}`;
const usb_acct_year = `${usb_acct}__${year}.csv`;

const csv_path_file = path.join(WORKCSV, usb_acct_year);
console.log(`CSV PATH FILE: ${csv_path_file}`);

let csv_stringifier;
try {
    csv_stringifier = fs.createWriteStream(csv_path_file);
    console.log(`WRITE STREAM: ${csv_path_file} has been successfully opened.`);
} catch (err) {
    console.error(err.message);
    process.exit(1);
}

stringifier.on('readable', function() {
    console.log('stringifier is now readable');
    let row;
    while (row = this.read()) {
        console.log(`stringifier row: ${row}`);
        csv_stringifier.write(row);
    }
});

stringifier.on('error', function(err) {
    console.error(err.message);
});

stringifier.on('finish', function() {
    console.log('stringifier is done writing to csv_stringifier');
    csv_stringifier.end(); // close the write stream; passing a string here would append it to the CSV file
});

stringifier.on('close', function() {
    console.log('stringifier is now closed');
});

csv_stringifier.on('close', function() {
    console.log('csv_stringifier is now closed');
});

Set Up Stream-Transform and Transform Function

This code implements the stream transformer functionality, which is at the heart of this project.

The Transformer is a Node.js Transform Stream. This means it is capable of both reading and writing data. In this project, the CSV Parser writes records into it, and the records are read back here via its transformer.read() method. The transformer is created by passing a callback, the transform function, to csv.transform(); the callback's purpose is to transform each record that is read. This is the heart of this project. The transform function is implemented in the following section, and returns completely transformed CSV bank records. These transformed records are then written to both a new CSV file and the SQLite3 database.

After it transforms the data, the transformer receives the new data via a readable event, where it can process the data.

The transformed data will also be saved into the SQLite3 database via an INSERT statement executed by the db.run() method.

Set Up the Transform Function

The Transform Function receives a record and massages it into shape. The following regular expressions were created based upon inspection of the raw data as it came from the bank for years 2016, 2017, and 2018. It does a decent job of creating readable payees and memos, as well as txfrs (transfers), but it has not been set up to do anything for check payees, categories or related records, for example.

const transform_function = function (record) {
    const DEBIT   = 'debit';
    const CREDIT  = 'credit';
    const CHECK   = 'check';
    const CASH    = 'cash';
    const DEPOSIT = 'deposit';
    const UNKNOWN = 'unknown';
    const TRANS   = 'transfer';
    const USBANK  = 'usbank';
    let   tofrom  = '';		// "to"/"from" connector used when building notes

    // Add new columns: acct, checkno, txfr, caseno, desc1, desc2, category
    record.acct    = usb_acct;
    record.checkno = null; // check no.
    record.txfr    = null; // direction and acct #
    record.caseno  = null; // related case foreign key
    record.desc1   = null; // noun
    record.desc2   = null; // adjective
    record.category= null; // categorization of the transaction

    // Format date as yyyy-mm-dd; delete original Date
    record.date = new Date(record['Date']).toISOString().split('T')[0];
    delete record['Date'];

    // Change Transaction to trans; delete original Transaction
    record.trans = record['Transaction'].toLowerCase();
    delete record['Transaction'];

    // Change Amount to amount as Currency type; delete original Amount
    record.amount = accounting.formatMoney(record['Amount']);
    delete record['Amount'];

    // Change Name to payee; keep original Name as OrigName; delete Name
    record.payee = record['Name'].toLowerCase().trimRight();
    record.OrigPayee = record['Name'];
    delete record['Name'];

    // Clean up Memo by removing Download message; return as note; keep Memo as OrigMemo
    let re = new RegExp('Download from usbank.com.\\s*');
    record.note = record['Memo'].replace(re,'').toLowerCase();
    record.OrigMemo = record['Memo'];
    delete record['Memo'];

    // Add check no. to checkno column
    if (record.payee === CHECK) {
        const checkno = record.trans.replace(/^0*/,'');
        record.checkno  = checkno;
        record.trans   = DEBIT;
        record.payee  = `(${record.checkno}) check`;
        record.note  += `Purchase by check no. ${checkno}`;
        record.desc1  = 'purchase';
        record.desc2  = 'check';
    }

    if (record.payee.match(/(returned) (item)/)) {
        record.desc1 = RegExp.$2;
        record.desc2 = RegExp.$1;
        record.payee = USBANK;
        record.note = `${record.desc2} ${record.desc1}`;
    }

    if (record.payee.match(/(internet|mobile) (banking) transfer (deposit|withdrawal) (\d{4})\s*$/)) {
        record.desc1 = RegExp.$3;
        record.desc2 = RegExp.$1;
        record.txfr = `${(RegExp.$3 === 'deposit') ? '<' : '>'} usb_${RegExp.$4}`;
        tofrom = (record.trans === 'debit') ? 'to' : 'from';
        record.payee = (record.trans === 'debit') ? `usb_${RegExp.$4}` : `usb_${options.csv[0]}`;
        record.note = `${record.desc2} ${record.desc1}: ${TRANS} ${tofrom} ${record.note}`;
        if (/>/.test(record.txfr)) {
            record.payee = `Transfer to ${record.payee} from ${record.acct}`;
        } else {
            record.payee = `Transfer to ${record.payee} from usb_${RegExp.$4}`;
        }
    }

    if (record.payee.match(/debit (purchase)\s*-?\s*(visa)? /)) {
        record.desc1 = RegExp.$1;
        record.desc2 = RegExp.$2;
        record.payee = record.payee.replace(RegExp.lastMatch,'');
        record.note = `${record.desc2} ${record.desc1} ${record.note}`.trimLeft();
    }

    // Removed ELECTRONIC WITHDRAWAL for payment to State Bar of CA
    if (record.payee.match(/^.*(state bar of ca)/)) {
        record.payee = RegExp.$1;
    }

    // web authorized payment
    // atm|electronic|mobile check|rdc deposit|withdrawal <name>
    if (record.payee.match(/(web authorized) (pmt) |(atm|electronic|mobile)?\s*(check|rdc)?\s*(deposit|withdrawal)\s*(.*)?/)) {
        tofrom = '';
        record.desc1 = RegExp.$2 ? RegExp.$2 : RegExp.$4 ? RegExp.$4 : RegExp.$5 ? RegExp.$5 : 'undefined';
        record.desc2 = RegExp.$1 ? RegExp.$1 : RegExp.$3 ? RegExp.$3 : 'undefined';
        if (RegExp.$3 === 'atm' || RegExp.$3 === 'electronic' || RegExp.$3 === 'mobile' || RegExp.$5 === DEPOSIT) {
            record.payee = (RegExp.$5 === 'deposit') ? `usb_${options.csv[0]}` : CASH;
        } else {
            record.payee = record.payee.replace(RegExp.lastMatch,'');
        }
        if (record.note.match(/paypal/) && record.trans === CREDIT) {
            record.txfr = `< ${RegExp.lastMatch}`;
            tofrom = ' from';
        }
        record.note = `${record.desc2} ${record.desc1}${tofrom} ${record.note}`.trimRight();
    }

    if (record.payee.match(/(zelle instant) (pmt) (from (\w+\s\w+))\s(.*)$/)) {
        record.desc1 = RegExp.$2;
        record.desc2 = RegExp.$1;
        record.note = `${record.desc2} ${record.desc1} ${RegExp.$3}`;
        record.payee = `usb_${options.csv[0]}`;
    }

    if (record.payee.match(/(overdraft|international) (paid|processing) (fee)/)) {
        record.desc1 = RegExp.$3;
        record.desc2 = `${RegExp.$1} ${RegExp.$2}`;
        record.payee = USBANK;
        record.note  = `${record.desc2} ${record.desc1} to ${record.payee}`;
    }

    record.payee = record.payee.replace(/\s*portland\s{2,}or$|\s*vancouver\s{2,}wa.*$/,'');
    record.note  = record.note.replace(/\s*portland\s{2,}or$|\s*vancouver\s{2,}wa.*$/,'');
    record.payee = record.payee.replace(/\s\d{3}\w+\s{2,}or$/,''); // Nike Company 019Beaverton   OR
    record.note  = record.note.replace(/\s\d{3}\w+\s{2,}or$/,'');
    record.payee = record.payee.replace(/\s*[-\d]{5,}\s*\w{2}$/,''); // '650-4724100 CA' & '        855-576-4493WA' & '  800-3333330 MA'
    record.note  = record.note.replace(/\s*[-\d]{5,}\s*\w{2}$/,'');
    record.payee = record.payee.replace(/(\s\w*https)?www.*$/,''); // WWW.ATT.COM TX; UDEMY ONLINE COUHTTPSWWW.UDECA
    record.note  = record.note.replace(/(\s\w*https)?www.*$/,'');
    record.payee = record.payee.replace(/\s*\w+\.com\s+\w{2}$/, '');
    record.note  = record.note.replace( /\s*\w+\.com\s+\w{2}$/, '');
    record.payee = record.payee.replace(/aws.amazon.cWA/i,''); // serviaws.amazon.cWA
    record.note  = record.note.replace(/aws.amazon.cWA/i,'');
    if (record.payee.match(/(bostype \/ wes bo)(hamilton\s+on)/)) { // WES BOHAMILTON    ON
        record.payee = 'Wes Bos';
        record.note  = record.note.replace(RegExp.$1,'Wes Bos');
        record.note  = record.note.replace(RegExp.$2, '');
    }
    record.payee = record.payee.replace(/\s{2,}/g,' ');
    record.note  = record.note.replace(/\s{2,}/g,' ');

    /*
      'DEBIT PURCHASE -VISA SQ *PHIL        877-417-4551WA'

      You paid Phil $159 for Atreus keyboard kit and shipping

      It is for a credit card processor that goes by the brand name
      Square Up. Merchants can run credit card transactions through
      their iPhone or iPads using the Square Up services. Mine was for
      a taxi ride. https://800notes.com/Phone.aspx/1-877-417-4551
    */

    record.payee = record.payee.replace(/sq/, 'square');
    record.note  = record.note.replace(/sq/, 'square');

    return record;
}

Set Up the Stream Transform

const transformer = csv.transform(transform_function);

/* TRANSFORMER reads records through its TRANSFORM_FUNCTION */
/* -------------------------------------------------------- */
transformer.on('readable', function() {
    let record;
    while ((record = transformer.read())) {
        console.log(`Transformer record:\n${util.inspect(record)}`);

        /* STRINGIFIER WRITE Records */
        /* ------------------------- */
        stringifier.write(record);



        /* DB RUN---INSERT RECORDS */
        /* ----------------------- */
        const tab_name  = DB_TABLES['usb'];
        const col_names = DB_COLS.join(',');
        const col_phs   = DB_COLS.map(c => '?').join(',');
        const col_values= DB_COLS.map(c => record[c]);

        let sql = `INSERT INTO ${ tab_name }( ${ col_names } )
                   VALUES ( ${ col_phs } )`;

        console.log(`sql: ${ sql }`);
        console.log(`col_values: ${ col_values }`);

        db.run(sql, col_values, (err) => {
           if (err) {
               console.error(err.message);
               console.error(`ERROR sql: ${ sql }`);
               console.error(`ERROR values: ${ col_values }`);
               process.exit(1);
           }
       });
   }
});

transformer.on('error', function(err) {
    console.error(err.message);
});

transformer.on('finish', function() {
    console.log('Transformer finished writing records.');
});

transformer.on('end', function() {
    console.log('Transformer end reached.');
    stringifier.end();
});

Set Up CSV-Parse

This section implements the csv parser. By default, it does little other than read a large string of data and parse it into an array of records. By giving it the option columns = true, however, the parser will use the first line as a list of column headings, and produce an array of objects where the keys are column names, and the values are column entries. Each record is written to the stream transformer using its WRITE method.
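
For example, a minimal illustration of the columns: true behavior (not project code):

// With columns: true, the first line supplies the keys for each record object.
const csv = require('csv');

csv.parse('Date,Amount\n1/4/2016,-66.0000\n', { columns: true }, (err, records) => {
    if (err) throw err;
    console.log(records); // [ { Date: '1/4/2016', Amount: '-66.0000' } ]
});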

<<csv-transformer-write-method>>

const parser = csv.parse({columns: true});
const records = [];

parser.on('readable', function() {
    console.log('Parser beginning to read records.');
    let record;

    /* PARSE A RECORD AND WRITE TO THE TRANSFORMER */
    while ((record = parser.read())) {
        console.log(`parser record:\n${util.inspect(record)}`);
        transformer.write(record);
    }

});

parser.on('error', function(err) {
    console.error(err.message);
});

parser.on('end', function() {
    console.log('Parser finished reading records.');
});

parser.on('finish', function () {
    console.log('Parser finished writing records.');
    console.log('Parser calling transformer end');
    transformer.end();
});

Set Up StreamReader

This section implements the Stream Reader that reads the CSV file in as a large string of data and sends it to the csv parser via the parser’s write method.

CSV financial files are found in the directories $WORKUSB/usb_[6815|6831]/yyyy, where yyyy can be 2004–2019, and on. Given [6815|6831] and a year [2004|2005...2019], the file path will be $WORKUSB/usb_6815/yyyy/usb_6815--yyyy.csv. This code makes sure the file exists and the user has proper permissions to read it before proceeding.

if (options.csv) {
    // const acct = options.csv[0],
    //       year = options.csv[1];
    // const usb_acct = `usb_${acct}`;

    if (!process.env.WORKUSB) {
        console.error('You must assign a path to the shell variable WORKUSB');
        process.exit(1);
    }

    const acct_year_path = `${process.env.WORKUSB}/${usb_acct}/${year}`;
    const acct_year_csv_file = `${usb_acct}--${year}.csv`;
    const acct_year_csv_file_path = `${acct_year_path}/${acct_year_csv_file}`;
    if (!fs.existsSync(acct_year_csv_file_path) || !(fs.accessSync(acct_year_csv_file_path, fs.constants.R_OK) === undefined)) {
        console.error(`Cannot find or access the CSV file at '${acct_year_csv_file_path}'.`);
        process.exit(1);
    }
    console.log(`Successfully found the CSV file: '${acct_year_csv_file_path}'`);

    /* CREATE THE STREAM HERE */
    const csv_file_stream = fs.createReadStream(acct_year_csv_file_path, {encoding: 'utf8'});

    /* Set up streaming events 'READABLE', 'ERROR', and 'END' */
    csv_file_stream.on('readable', function () {
        let record;

        /* READ THE RECORDS */
        while ((record = this.read())) {
            console.log(`readable record: ${record}`);

            /* WRITE A RECORD TO THE PARSER */
            parser.write(record);

        }

    });

    csv_file_stream.on('error', function(err) {
        console.error(err.message);
    });

    csv_file_stream.on('end', function () {
        console.log('Reader finished reading data.');
        parser.end(); // all chunks have been written to the parser; signal it to finish
    });
}

Node-SQLite3 Module

Asynchronous, non-blocking SQLite3 bindings for Node.js.

Node-SQLite3 Module Usage

var sqlite3 = require('sqlite3').verbose();
var db = new sqlite3.Database(':memory:');

db.serialize(function() {
  db.run("CREATE TABLE lorem (info TEXT)");

  var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
  for (var i = 0; i < 10; i++) {
      stmt.run("Ipsum " + i);
  }
  stmt.finalize();

  db.each("SELECT rowid AS id, info FROM lorem", function(err, row) {
      console.log(row.id + ": " + row.info);
  });
});

db.close();

Features

  • Straightforward query and parameter binding interface
  • Full Buffer/Blob support
  • Extensive debugging support
  • Query serialization API
  • Extension support
  • Big test suite
  • Written in modern C++ and tested for memory leaks
  • Bundles Sqlite3 3.26.0 as a fallback if the installing system doesn’t include SQLite

Node-SQLite3 API

node-sqlite3 has built-in function call serialization and automatically waits before executing a blocking action until no other action is pending. This means that it’s safe to start calling functions on the database object even if it is not yet fully opened. The Database#close() function will wait until all pending queries are completed before closing the database.
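
A minimal sketch of what that guarantee means in practice (the database file name and table are hypothetical):

const sqlite3 = require('sqlite3').verbose();

/* Opening is asynchronous, but statements issued immediately are queued
   until the database has actually opened. */
const db = new sqlite3.Database('finance.db');
db.run("CREATE TABLE IF NOT EXISTS txns (date TEXT, amount REAL)");
db.run("INSERT INTO txns VALUES (?, ?)", ['2019-01-15', -42.17]);

/* close() waits for the two queries above to finish before closing. */
db.close();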

Node-SQLite3 Control Flow—Two Execution Modes

node-sqlite3 provides two functions to help control the execution flow of statements. The default mode is to execute statements in parallel. However, the Database#close method will always run in exclusive mode, meaning it waits until all previous queries have completed and node-sqlite3 will not run any other queries while a close is pending.

Serialize Execution Mode

Calling Database#serialize() within nested functions is safe:

// Queries scheduled here will run in parallel.

db.serialize(function() {

    // Queries scheduled here will be serialized.
    db.serialize(function() {
        // Queries scheduled here will still be serialized.
    });
    // Queries scheduled here will still be serialized.
});

// Queries scheduled here will run in parallel again.

Note that queries that are not scheduled directly in the callback function are not necessarily serialized:

db.serialize(function() {

    // These two queries will run sequentially.
    db.run("CREATE TABLE foo (num)");
    db.run("INSERT INTO foo VALUES (?)", 1, function() {

        // These queries will run in parallel and the second query will probably
        // fail because the table might not exist yet.
        db.run("CREATE TABLE bar (num)");
        db.run("INSERT INTO bar VALUES (?)", 1);
    });
});

If you call it without a function parameter, the execution mode setting is sticky and won’t change until the next call to Database#parallelize.
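
For example (a sketch, reusing the db handle from the examples above):

db.serialize();                            // sticky: serialized from here on
db.run("CREATE TABLE baz (num)");          // runs first
db.run("INSERT INTO baz VALUES (?)", 1);   // runs only after the CREATE completes
db.parallelize();                          // back to the default parallel mode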

Parallelize Execution Mode

Calling Database#parallelize() within nested functions is safe:

db.serialize(function() {

    // Queries scheduled here will be serialized.
    db.parallelize(function() {

        // Queries scheduled here will run in parallel.
    });

    // Queries scheduled here will be serialized again.
});

If you call it without a function parameter, the execution mode setting is sticky and won’t change until the next call to Database#serialize.

Converting CSV Files

Ledger Convert Command

  • Ledger manual sec. 7.2.1.2

The {{{command(convert)}}} command parses a comma separated value (csv) file and prints Ledger transactions. Importing csv files is a lot of work, but is very amenable to scripting.

$ ledger convert download.csv --input-date-format "%m/%d/%Y"

{{{heading(Fields Descriptors)}}}

Different banks order the fields in their csv files differently, so there must be a way to tell Ledger what to expect. Ledger expects the first line of the file to contain a description of the fields on each subsequent line (an illustrative example follows the list below).

  • Insert a line at the beginning of the csv file that describes the fields to Ledger.
  • The fields ledger can recognize contain these case-insensitive strings:
    • date
    • posted
    • code
    • payee
    • desc / description
    • amount
    • cost
    • total
    • note
  • the Input Date Format option tells ledger how to interpret the dates:
    • --input-date-format DATE_FORMAT
    • e.g., "%m/%d/%Y"
  • Metadata
    • If there are columns in the bank data you would like to keep in your ledger data, besides the primary fields described above, you can name them in the field descriptor list and Ledger will include them in the transaction as meta data if it doesn’t recognize the field name.
    CSV
    original line from the csv file
    Imported
    the date the data was imported
    UUID
    unique checksum; if an entry with the same ‘UUID’ tag is already included in the normal ledger file (specified via {{{option(--file FILE (-f))}}} or via the environment variable LEDGER_FILE), this entry will not be printed again.
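
As an illustration only (the column layout is hypothetical, not my bank’s actual export format), a downloaded file with a field-descriptor line prepended might look like this:

date,payee,amount
01/15/2019,GROCERY STORE,-42.17
01/16/2019,PAYCHECK,1500.00

{{{noindent}}} It could then be converted with, for example:

$ ledger convert grocery.csv --input-date-format "%m/%d/%Y" --account Assets:Checking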

{{{heading(Convert Command Options)}}}

The convert command accepts the following options:

--invert
inverts the amount field
--auto-match
automatically matches an account from the Ledger journal for every CSV line
--account STR
use to specify the account to balance against
--rich-data
stores additional tag/value pairs
--file FILE (-f)
the normal ledger file (could also be specified via the environment variable LEDGER_FILE)
--input-date-format DATE_FORMAT
tells ledger how to interpret the dates

{{{heading(Command Directives)}}}

You can also use {{{command(convert)}}} with payee and account directives.

  1. Directives

    a. directive :: Command directives must occur at the beginning of a line.

    b. account directive :: Pre-declare valid account names. This only has an effect if {{{option(--strict)}}} or {{{option(--pedantic)}}} is used. The account directive supports several optional sub-directives, provided they immediately follow the account directive and begin with whitespace:

    account Expenses:Food
                  note This account is all about the chicken!
                  alias food
                  payee ^(KFC|Popeyes)$
                  check commodity == "$"
                  assert commodity == "$"
                  eval print("Hello!")
                  default
        

    c. payee sub-directive :: The payee sub-directive of the account directive, which can occur multiple times, provides regexes; when a payee matching one of them is encountered and an account within the transaction ends in the name “Unknown”, that account is read as this account.

    2012-02-27 KFC
                  Expenses:Unknown      $10.00  ; Read now as "Expenses:Food"
                  Assets:Cash
        

    d. alias sub-directive :: The alias sub-directive of the account directive, which can occur multiple times, allows the alias to be used in place of the full account name anywhere that account names are allowed.

    e. payee directive :: The payee directive supports two optional sub-directives, provided they immediately follow the payee directive and, if placed on a successive line, begin with whitespace:

    payee KFC
      alias KENTUCKY FRIED CHICKEN
      uuid 2a2e21d434356f886c84371eebac6e44f1337fda
        

    The alias sub-directive of the payee directive provides a regex; if it matches a parsed payee, the declared payee name is substituted.

    f. alias directive :: Define an alias for an account name. The aliases are only in effect for transactions read in after the alias is defined and are affected by account directives that precede them. The alias sub-directive, which can occur multiple times, allows the alias to be used in place of the full account name anywhere that account names are allowed.

    alias Dining=Expenses:Entertainment:Dining
    alias Checking=Assets:Credit Union:Joint Checking Account
    
    2011/11/28 YummyPalace
        Dining        $10.00
        Checking
        
  2. First, you can use the payee directive and its alias sub-directive to rewrite the payee field based on some rules.
    payee Aldi
          alias ^ALDI SUED SAGT DANKE
        
  3. Then you can use the account directive and its payee sub-directive to specify the account.
    account Aufwand:Einkauf:Lebensmittel
           payee ^(Aldi|Alnatura|Kaufland|REWE)$
        

{{{subheading(Directive Example)}}}

payee Aldi
          alias ^ALDI SUED SAGT DANKE
account Aufwand:Einkauf:Lebensmittel
          payee ^(Aldi|Alnatura|Kaufland|REWE)$

Note that it may be necessary for the output of {{{command(ledger convert)}}} to be passed through {{{command(ledger print)}}} a second time if you want to match on the new payee field. During the {{{command(ledger convert)}}} run, only the original payee name as specified in the csv data seems to be used.

Update package.json Version

In order to keep the package.json version number of create-CSV-SQLite3-project in sync with this document’s version number, I have created a little script to update it based upon the current value of the version macro. This macro is defined at the very top of this Org source file, just below the title and date. I update this version number after every modification to this source file and before committing the change. This little script will then be run whenever an installation occurs.

This script can be run by invoking the Makefile target update-version. This will check out the dev branch, tangle the update-version.sh script into the scripts/ directory, run it, amend the most recent commit to include the updated version number, push the amended commit to Github, and finally create a prod branch (for production), install all of the files and documentation, and commit and push the prod branch to Github. At this point, the package is ready to be cloned from Github; it contains the most recent version number and the installed dependencies, and can be run.

This program is a little {{{command(sed)}}} script that modifies this Org source file in-place (after creating and storing a backup of the source) by copying the version number found in the version macro into the package.json version field. It runs very quickly.

sed -i .bak -E -e '
# Dispatch: branch to :x on the version-macro line, to :y on the line
# holding the package.json version string; print everything else as-is.
/\#\+macro: version Version/bx
/,\/\\"version\\":/by
b
# :x -- stash the macro version number in the hold space and print the
# macro line itself unchanged.
:x
h
s/^(.*macro: version Version )(.*)$/\2/
x
b
# :y -- join the stashed version number with the package.json line, then
# rewrite that line so the old version is replaced by the macro version.
:y
H
x
s/\n//
s/^([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+)(.*)([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+)/\2\1/
' CSV-SQLite3.org

Makefile

SOURCE = CSV-SQLite3
ORG    = $(SOURCE).org
TEXI   = $(SOURCE).texi
INFO   = $(SOURCE).info
PDF    = $(SOURCE).pdf
DOCS   = docs
SCRIPTS=scripts

.PHONY: clean clean-world clean-prod
.PHONY: tangle weave texi info pdf
.PHONY: install install-docs install-info install-pdf open-pdf docs-dir
.PHONY: update-dev update-prod checkout-dev checkout-prod
.PHONY: update-version tangle-update-version run-update-version

texi: $(TEXI)
$(TEXI): $(ORG)
	emacs -Q --batch $(ORG) \
	--eval '(setq org-export-use-babel nil)' \
	--eval '(org-texinfo-export-to-texinfo)'

tangle: $(ORG)
	emacs -Q --batch $(ORG) \
	--eval '(org-babel-tangle-file "$(ORG)")'

info weave install-info: $(DOCS)/$(INFO)
$(DOCS)/$(INFO): $(TEXI) | docs-dir
	makeinfo --output=$(DOCS)/ $(TEXI)

install: package.json
package.json:	$(ORG) | docs-dir
	emacs -Q --batch $(ORG) \
	--eval '(require '\''ob-shell)' \
	--eval '(require '\''ob-js)' \
	--eval '(setq org-confirm-babel-evaluate nil)' \
	--eval '(org-texinfo-export-to-info)'
	mv $(INFO) $(DOCS)/
	make install-pdf

install-docs: install-info install-pdf

pdf install-pdf: $(DOCS)/$(PDF)
$(DOCS)/$(PDF): $(TEXI) | docs-dir
	pdftexi2dvi -q -c $(TEXI)
	mv $(PDF) $(DOCS)/

open-pdf: $(DOCS)/$(PDF)
	open $(DOCS)/$(PDF)

docs-dir: $(DOCS)
$(DOCS):
	mkdir -vp docs


update-version: update-dev update-prod

checkout-dev:
	git checkout dev

update-dev: checkout-dev run-update-version
	git add -u
	git commit --amend -C HEAD
	git push origin +dev

checkout-prod: clean-world checkout-dev
	git checkout -B prod

update-prod: checkout-prod install clean-prod
	git add -A .
	git commit -m "Branch:prod"
	git push origin +prod

run-update-version: tangle-update-version
	./$(SCRIPTS)/update-version.sh
	mv -v $(ORG).bak $(WORKBAK)/$(ORG).$(shell date "+%s")

tangle-update-version: $(SCRIPTS)/update-version.sh
$(SCRIPTS)/update-version.sh: $(ORG)
	emacs -Q --batch $(ORG) \
	--eval '(search-forward ":tangle scripts/update-version.sh")' \
	--eval '(org-babel-tangle '\''(4))'


clean:
	-rm *~

clean-world: clean
	-rm *.{texi,info,pdf,js,json,lock,log,bak}
	-rm -rf LogReader
	-rm -rf node_modules $(SCRIPTS) $(DOCS)

clean-prod: clean
	-rm *.{texi,org} Makefile LogReader
	-rm -rf node_modules

Index

Function Index

Listings

Copying

{{{title}}} {{{version}}} ({{{date(%a %02m-%02d-%Y)}}})

© {{{date(%Y)}}} {{{author}}}

Macro Definitions

Options

Export Settings

Local Variables