diff --git a/README.md b/README.md index bdc18de6..062c5dde 100644 --- a/README.md +++ b/README.md @@ -24,6 +24,7 @@ A short description of each tool follows. There is more detail in the [tool refe * [tsv-uniq](#tsv-uniq) - Filter out duplicate lines using fields as a key. * [tsv-select](#tsv-select) - Keep a subset of the columns in the input. * [tsv-summarize](#tsv-summarize) - Aggregate field values, summarizing across the entire file or grouped by key. +* [tsv-append](#tsv-append) - Concatenate TSV files. Header aware; supports source file tracking. * [csv2tsv](#csv2tsv) - Convert CSV files to TSV. * [number-lines](#number-lines) - Number the input lines. * [Useful bash aliases](#useful-bash-aliases) @@ -84,7 +85,7 @@ See the [tsv-select reference](#tsv-select-reference) for details. ### tsv-summarize -tsv-summarize runs aggregation operations on fields. For example, generating the sum or median of a field's values. Summarization calculations can be run across the entire input or can be grouped by key fields. As an example, consider the file `data.tsv`: +`tsv-summarize` runs aggregation operations on fields. For example, generating the sum or median of a field's values. Summarization calculations can be run across the entire input or can be grouped by key fields. As an example, consider the file `data.tsv`: ``` color weight red 6 @@ -109,6 +110,18 @@ Multiple fields can be used as the `--group-by` key. The file's sort order does See the [tsv-summarize reference](#tsv-summarize-reference) for the list of statistical and other aggregation operations available. +### tsv-append + +`tsv-append` concatenates multiple TSV files, similar to the Unix `cat` utility. It is header aware, writing the header from only the first file. It also supports source tracking, adding a column indicating the original file to each row. 
+ +Concatenation with header support is useful when preparing data for traditional Unix utilities like `sort` and `sed` or applications that read a single file. + +Source tracking is useful when creating long/narrow form tabular data. This format is used by many statistics and data mining packages. (See [Wide & Long Data - Stanford University](http://stanford.edu/~ejdemyr/r-tutorials/wide-and-long/) or Hadley Wickham's [Tidy data](http://vita.had.co.nz/papers/tidy-data.html) for more info.) + +In this scenario, files have been used to capture related data sets, the difference between data sets being a condition represented by the file. For example, results from different variants of an experiment might each be recorded in their own files. Retaining the source file as an output column preserves the condition represented by the file. The source values default to the file names, but this can be customized. + +See the [tsv-append reference](#tsv-append-reference) for the complete list of options available. + ### csv2tsv Sometimes you have a CSV file. This program does what you expect: convert CSV data to TSV. Example: @@ -205,7 +218,9 @@ There is directory for each tool, plus one directory for shared code (`common`). Documentation for each tool is found near the top of the main file, both in the help text and the option documentation. -The simplest tool is `number-lines`. It is useful as an illustration of the code outline followed by the other tools. `tsv-select` and `tsv-uniq` also have straightforward functionality, but employ a few more D programming concepts. `tsv-select` uses templates and compile-time programming in a somewhat less common way, it may be clearer after gaining some familiarity with D templates. A non-templatized version of the source code is included for comparison. +The simplest tool is `number-lines`. It is useful as an illustration of the code outline followed by the other tools. 
`tsv-select` and `tsv-uniq` also have straightforward functionality, but employ a few more D programming concepts. `tsv-select` uses templates and compile-time programming in a somewhat less common way, it may be clearer after gaining some familiarity with D templates. A non-templatized version of the source code is included for comparison. + +`tsv-append` has a simple code structure. It's one of the newer tools. Its only additional complexity is that it writes to an 'output range' rather than directly to standard output. This enables better encapsulation for unit testing. `tsv-join` and `tsv-filter` also have relatively straightforward functionality, but support more use cases resulting in more code. `tsv-filter` in particular has more elaborate setup steps that take a bit more time to understand. `tsv-filter` uses several features like delegates (closures) and regular expressions not used in the other tools. @@ -258,7 +273,7 @@ $ make test-nobuild ### Unit tests -D has an excellent facility for adding unit tests right with the code. The `common` utility functions in this package take advantage of built-in unit tests. However, most of the command line executables do not, and instead use more traditional invocation of the command line executables and diffs the output against a "gold" result set. The exceptions are `csv2tsv` and `tsv-summarize`. These use both built-in unit tests and tests against the executable. The built-in unit tests are much nicer, and also the advantage of being naturally cross-platform. The command line executable tests assume a Unix shell. +D has an excellent facility for adding unit tests right with the code. The `common` utility functions in this package take advantage of built-in unit tests. However, most of the command line executables do not, and instead use more traditional invocation of the command line executables and diffs the output against a "gold" result set. The exceptions are `csv2tsv`, `tsv-summarize` and `tsv-append`. 
These use both built-in unit tests and tests against the executable. The built-in unit tests are much nicer, and also have the advantage of being naturally cross-platform. The command line executable tests assume a Unix shell. Tests for the command line executables are in the `tests` directory of each tool. Overall the tests cover a fair number of cases and are quite useful checks when modifying the code. They may also be helpful as an examples of command line tool invocations. See the `tests.sh` file in each `test` directory, and the `test` makefile target in `makeapp.mk`. @@ -364,6 +379,8 @@ This section provides more detailed documentation about the different tools as w * [tsv-join reference](#tsv-join-reference) * [tsv-uniq reference](#tsv-uniq-reference) * [tsv-select reference](#tsv-select-reference) +* [tsv-summarize reference](#tsv-summarize-reference) +* [tsv-append reference](#tsv-append-reference) * [csv2tsv reference](#csv2tsv-reference) * [number-lines reference](#number-lines-reference) @@ -722,6 +739,39 @@ Calculations hold onto the minimum data needed while reading data. A few operati * `--mode n[,n...][:STR]` - Mode. The most frequent value. (Reads all values into memory.) * `--values n[,n...][:STR]` - All the values, separated by --v|values-delimiter. (Reads all values into memory.) +### tsv-append reference + +**Synopsis:** tsv-append [options] [file...] + +tsv-append concatenates multiple TSV files, similar to the Unix 'cat' utility. Unlike 'cat', it is header aware ('--H|header'), writing the header from only the first file. It also supports source tracking, adding a column indicating the original file to each row. Results are written to standard output. + +Concatenation with header support is useful when preparing data for traditional Unix utilities like 'sort' and 'sed' or applications that read a single file. + +Source tracking is useful when creating long/narrow form tabular data, a format used by many statistics and data mining packages. 
In this scenario, files have been used to capture related data sets, the difference between data sets being a condition represented by the file. For example, results from different variants of an experiment might each be recorded in their own files. Retaining the source file as an output column preserves the condition represented by the file. + +The file-name (without extension) is used as the source value. This can be customized using the --f|file option. + +Example: Header processing: + + $ tsv-append -H file1.tsv file2.tsv file3.tsv + +Example: Header processing and source tracking: + + $ tsv-append -H -t file1.tsv file2.tsv file3.tsv + +Example: Source tracking with custom values: + + $ tsv-append -H -s test_id -f test1=file1.tsv -f test2=file2.tsv + +**Options:** +* `--h|help` - Print help. +* `--help-verbose` - Print detailed help. +* `--H|header` - Treat the first line of each file as a header. +* `--t|track-source` - Track the source file. Adds a column with the source name. +* `--s|source-header STR` - Use STR as the header for the source column. Implies --H|header and --t|track-source. Default: 'file' +* `--f|file STR=FILE` - Read file FILE, using STR as the 'source' value. Implies --t|track-source. +* `--d|delimiter CHR` - Field delimiter. Default: TAB. (Single byte UTF-8 characters only.) + ### csv2tsv reference **Synopsis:** csv2tsv [options] [file...] 
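Note on the behavior documented in the README hunk above: the header and source-tracking rules can be illustrated with a minimal in-memory sketch. This is not the code added by this patch. `appendTracked` is a hypothetical helper that takes already-loaded file contents instead of file paths, and it assumes both header processing and source tracking are enabled (the equivalent of `-H -t`).

```d
import std.array : appender;
import std.stdio : write;
import std.string : lineSplitter;
import std.typecons : Yes;

/* Hypothetical helper (not part of tsv-append): concatenates in-memory TSV texts.
 * Only the first text's header line is kept, and every data row is prefixed with
 * the source name that corresponds to its text.
 */
string appendTracked(string[] sourceNames, string[] texts,
                     string sourceHeader = "file", char delim = '\t')
{
    assert(sourceNames.length == texts.length, "One source name per text is required.");

    auto output = appender!string();
    bool headerWritten = false;

    foreach (i, text; texts)
    {
        size_t lineNum = 0;
        foreach (line; text.lineSplitter!(Yes.keepTerminator))
        {
            lineNum++;
            if (lineNum == 1)
            {
                /* Header line: written once, prefixed with the source column header. */
                if (headerWritten) continue;
                output.put(sourceHeader);
                output.put(delim);
                output.put(line);
                headerWritten = true;
            }
            else
            {
                /* Data line: prefixed with the source value for this text. */
                output.put(sourceNames[i]);
                output.put(delim);
                output.put(line);
            }
        }
    }
    return output.data;
}

void main()
{
    write(appendTracked(
        ["input3x2", "input3x5"],
        ["field1\tfield2\tfield3\nabc\tdef\tghi\n",
         "field1\tfield2\tfield3\njkl\tmno\tpqr\n"]));
}
```

The sketch builds a string rather than writing to standard output, which loosely mirrors the patch's choice of writing to an output range so the logic can be checked in unit tests.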
diff --git a/common/dub.json b/common/dub.json index 004b952a..f12f6dcc 100644 --- a/common/dub.json +++ b/common/dub.json @@ -3,7 +3,7 @@ "description": "Routines used by applications in the tsv-utils-dlang package.", "homepage": "https://github.com/eBay/tsv-utils-dlang", "authors": ["Jon Degenhardt"], - "copyright": "Copyright (c) 2015-2016, eBay Software Foundation", + "copyright": "Copyright (c) 2015-2017, eBay Software Foundation", "license": "BSL-1.0", "targetType": "sourceLibrary" } diff --git a/common/src/unittest_utils.d b/common/src/unittest_utils.d new file mode 100644 index 00000000..704121d7 --- /dev/null +++ b/common/src/unittest_utils.d @@ -0,0 +1,115 @@ +/** +Helper functions for tsv-utils-dlang unit tests. + +Copyright (c) 2017, eBay Software Foundation +Initially written by Jon Degenhardt + +License: Boost License 1.0 (http://boost.org/LICENSE_1_0.txt) +*/ + +version(unittest) +{ + /* Creates a temporary directory for writing unit test files. The path of the created + * directory is returned. The 'toolDirName' argument will be included in the directory + * name, and should consist of generic filename characters. e.g. "tsv_append". This + * name will also be used in assert error messages. + * + * The caller should delete the temporary directory and all its contents when tests + * are finished. This can be done using std.file.rmdirRecurse. For example: + * + * unittest + * { + * import std.file : rmdirRecurse; + * auto testDir = makeUnittestTempDir("tsv_append"); + * scope(exit) testDir.rmdirRecurse; + * ... test code + * } + * + * An assert is triggered if the directory cannot be created. There are two typical + * reasons: + * - Unable to find an available directory name. A number of unique names are tried + * (currently 1000). If they are all taken, it will normally be because the directories + * haven't been properly cleaned up from previous unit test runs. + * - Directory creation failed. e.g. Permission denied. 
+ * + * This routine is intended to be run in 'unittest' mode, so that an assert is triggered + * on failure. However, if run with asserts disabled, the returned path will be empty in + * event of a failure. + */ + string makeUnittestTempDir(string toolDirName) + { + import std.conv; + import std.file : exists, mkdir, tempDir; + import std.format; + import std.path : buildPath; + import std.range; + + string dirNamePrefix = "tsv_utils_dlang__" ~ toolDirName ~ "_unittest_"; + string systemTempDirPath = tempDir(); + string newTempDirPath = ""; + + for (auto i = 0; i < 1000 && newTempDirPath.empty; i++) + { + string path = buildPath(systemTempDirPath, dirNamePrefix ~ i.to!string); + if (!path.exists) newTempDirPath = path; + } + assert (!newTempDirPath.empty, + format("Unable to obtain a new temp directory, paths tried already exist.\nPath prefix: %s", + buildPath(systemTempDirPath, dirNamePrefix))); + + if (!newTempDirPath.empty) + { + try mkdir(newTempDirPath); + catch (Exception exc) + { + assert(false, format("Failed to create temp directory: %s\n Error: %s", + newTempDirPath, exc.msg)); + } + } + + return newTempDirPath; + } + + /* Write a TSV file. The 'tsvData' argument is a 2-dimensional array of rows and + * columns. Asserts if the file cannot be written. + * + * This routine is intended to be run in 'unittest' mode, so that it will assert + * if the write fails. However, if run in a mode with asserts disabled, it will + * return false if the write failed. 
+ */ + bool writeUnittestTsvFile(string filepath, string[][] tsvData, char delimiter = '\t') + { + import std.algorithm : each, joiner, map; + import std.conv; + import std.format: format; + import std.stdio : File; + + try + { + auto file = File(filepath, "w"); + tsvData + .map!(row => row.joiner(delimiter.to!string)) + .each!(str => file.writeln(str)); + } + catch (Exception exc) + { + assert(false, format("Failed to write TSV file: %s.\n Error: %s", + filepath, exc.msg)); + return false; + } + + return true; + } + + /* Convert a 2-dimensional array of values to an in-memory string. */ + string tsvDataToString(string[][] tsvData, char delimiter = '\t') + { + import std.algorithm : joiner, map; + import std.conv; + + return tsvData + .map!(row => row.joiner(delimiter.to!string).to!string ~ "\n") + .joiner + .to!string; + } + } diff --git a/dub.json b/dub.json index bf2dc17c..5433902f 100644 --- a/dub.json +++ b/dub.json @@ -11,9 +11,10 @@ "tsv-utils-dlang:common": "*", "tsv-utils-dlang:csv2tsv": "*", "tsv-utils-dlang:number-lines": "*", - "tsv-utils-dlang:tsv-select": "*", + "tsv-utils-dlang:tsv-append": "*", "tsv-utils-dlang:tsv-filter": "*", "tsv-utils-dlang:tsv-join": "*", + "tsv-utils-dlang:tsv-select": "*", "tsv-utils-dlang:tsv-summarize": "*", "tsv-utils-dlang:tsv-uniq": "*" }, @@ -21,9 +22,10 @@ "./common/", "./csv2tsv/", "./number-lines/", - "./tsv-select/", + "./tsv-append/", "./tsv-filter/", "./tsv-join/", + "./tsv-select/", "./tsv-summarize/", "./tsv-uniq/" ], diff --git a/dub_build.d b/dub_build.d index d3e42170..40cce77b 100644 --- a/dub_build.d +++ b/dub_build.d @@ -13,7 +13,7 @@ Another use-case: dub fetch --local cd - dub build + dub run This executable is intended to handle these cases. It also has one additional function: inform the user where the binaries are stored so they can be added to the path. @@ -56,7 +56,7 @@ int main(string[] args) { // Note: At present 'common' is a source library and does not need a standalone compilation step. 
auto packageName = "tsv-utils-dlang"; - auto subPackages = ["csv2tsv", "number-lines", "tsv-filter", "tsv-join", "tsv-select", "tsv-summarize", "tsv-uniq"]; + auto subPackages = ["csv2tsv", "number-lines", "tsv-append", "tsv-filter", "tsv-join", "tsv-select", "tsv-summarize", "tsv-uniq"]; auto buildCmdArgs = ["dub", "build", "", "--force", "-b"]; buildCmdArgs ~= debugBuild ? "debug" : "release"; if (compiler.length > 0) { diff --git a/makeapp.mk b/makeapp.mk index 850275b7..757c1d15 100644 --- a/makeapp.mk +++ b/makeapp.mk @@ -1,5 +1,5 @@ app ?= $(notdir $(basename $(CURDIR))) -common_srcs ?= $(common_srcdir)/tsvutil.d $(common_srcdir)/getopt_inorder.d +common_srcs ?= $(common_srcdir)/tsvutil.d $(common_srcdir)/getopt_inorder.d $(common_srcdir)/unittest_utils.d app_src ?= src/$(app).d srcs ?= $(app_src) $(common_srcs) imports ?= -I$(common_srcdir) diff --git a/makefile b/makefile index 31021867..5bb5598c 100644 --- a/makefile +++ b/makefile @@ -1,4 +1,4 @@ -appdirs = csv2tsv number-lines tsv-filter tsv-join tsv-select tsv-uniq tsv-summarize +appdirs = csv2tsv number-lines tsv-filter tsv-join tsv-select tsv-uniq tsv-summarize tsv-append subdirs = common $(appdirs) release: make_subdirs diff --git a/tsv-append/dub.json b/tsv-append/dub.json new file mode 100644 index 00000000..f8ed1338 --- /dev/null +++ b/tsv-append/dub.json @@ -0,0 +1,28 @@ +{ + "name": "tsv-append", + "description": "Concatenate TSV files. 
Header aware, with support for source file tracking.", + "homepage": "https://github.com/eBay/tsv-utils-dlang", + "authors": ["Jon Degenhardt"], + "copyright": "Copyright (c) 2017, eBay Software Foundation", + "license": "BSL-1.0", + "targetType": "executable", + "configurations": [ + { + "name" : "executable", + "targetName": "tsv-append", + "targetPath": "../bin/", + "mainSourceFile": "src/tsv-append.d", + "dependencies": { + "tsv-utils-dlang:common": "*" + } + }, + { + "name": "unittest", + "targetType": "none" + } + ], + "buildTypes": { + "debug": { "buildOptions": ["debugMode", "optimize"] }, + "release": { "buildOptions": ["releaseMode", "optimize", "inline"], "dflags": ["-boundscheck=off"] } + } +} diff --git a/tsv-append/makefile b/tsv-append/makefile new file mode 100644 index 00000000..ee1889c9 --- /dev/null +++ b/tsv-append/makefile @@ -0,0 +1,2 @@ +include ../makedefs.mk +include ../makeapp.mk diff --git a/tsv-append/src/tsv-append.d b/tsv-append/src/tsv-append.d new file mode 100644 index 00000000..65fb6dff --- /dev/null +++ b/tsv-append/src/tsv-append.d @@ -0,0 +1,360 @@ +/** +Command line tool that appends multiple TSV files. It is header aware and supports +tracking the original source file of each row. + +Copyright (c) 2017, eBay Software Foundation +Initially written by Jon Degenhardt + +License: Boost License 1.0 (http://boost.org/LICENSE_1_0.txt) +*/ +module tsv_append; + +import std.conv; +import std.range; +import std.stdio; +import std.typecons : tuple; + +version(unittest) +{ + // When running unit tests, use main from -main compiler switch. +} +else +{ + int main(string[] cmdArgs) { + TsvAppendOptions cmdopt; + auto r = cmdopt.processArgs(cmdArgs); + if (!r[0]) return r[1]; + try tsvAppend(cmdopt, stdout.lockingTextWriter); + catch (Exception exc) + { + stderr.writefln("Error [%s]: %s", cmdopt.programName, exc.msg); + return 1; + } + return 0; + } +} + +auto helpTextVerbose = q"EOS +Synopsis: tsv-append [options] [file...] 
+ +tsv-append concatenates multiple TSV files, similar to the Unix 'cat' utility. +Unlike 'cat', it is header aware ('--H|header'), writing the header from only +the first file. It also supports source tracking, adding a column indicating +the original file to each row. Results are written to standard output. + +Concatenation with header support is useful when preparing data for traditional +Unix utilities like 'sort' and 'sed' or applications that read a single file. + +Source tracking is useful when creating long/narrow form tabular data, a format +used by many statistics and data mining packages. In this scenario, files have +been used to capture related data sets, the difference between data sets being a +condition represented by the file. For example, results from different variants +of an experiment might each be recorded in their own files. Retaining the source +file as an output column preserves the condition represented by the file. + +The file-name (without extension) is used as the source value. This can be +customized using the --f|file option. + +Example: Header processing: + + $ tsv-append -H file1.tsv file2.tsv file3.tsv + +Example: Header processing and source tracking: + + $ tsv-append -H -t file1.tsv file2.tsv file3.tsv + +Example: Source tracking with custom values: + + $ tsv-append -H -s test_id -f test1=file1.tsv -f test2=file2.tsv + +Options: +EOS"; + +auto helpText = q"EOS +Synopsis: tsv-append [options] [file...] + +tsv-append concatenates multiple TSV files, reading from files or standard input +and writing to standard output. It is header aware ('--H|header'), writing the +header from only the first file. It also supports source tracking, adding an +indicator of the original file to each row of input. 
+ +Options: +EOS"; + +struct TsvAppendOptions { + string programName; + string[] files; // Input files + string[string] fileSourceNames; // Maps file path to the 'source' value + bool helpVerbose = false; // --help-verbose + string sourceHeader; // --s|source-header + bool trackSource = false; // --t|track-source + bool hasHeader = false; // --H|header + char delim = '\t'; // --d|delimiter + + /* fileOptionHandler processes the '--f|file source=file' option. */ + private void fileOptionHandler(string option, string optionVal) + { + import std.algorithm : findSplit; + import std.format : format; + + auto valSplit = findSplit(optionVal, "="); + if (valSplit[0].empty || valSplit[2].empty) + throw new Exception( + format("Invalid option value: '--%s %s'. Expected: '--%s ='.", + option, optionVal, option)); + + auto source = valSplit[0]; + auto filepath = valSplit[2]; + files ~= filepath; + fileSourceNames[filepath] = source; + } + + /* Returns a tuple. First value is true if command line arguments were successfully + * processed and execution should continue, or false if an error occurred or the user + * asked for help. If false, the second value is the appropriate exit code (0 or 1). + * + * Returning true (execution continues) means args have been validated and derived + * values calculated: the header and source-tracking flags implied by other options + * are set, and each input file has been assigned a source name. + */ + auto processArgs (ref string[] cmdArgs) + { + import std.algorithm : any, each; + import std.getopt; + import std.path : baseName, stripExtension; + + programName = (cmdArgs.length > 0) ? 
cmdArgs[0].stripExtension.baseName : "Unknown_program_name"; + + try + { + arraySep = ","; // Use comma to separate values in command line options + auto r = getopt( + cmdArgs, + "help-verbose", " Print full help.", &helpVerbose, + std.getopt.config.caseSensitive, + "H|header", " Treat the first line of each file as a header.", &hasHeader, + std.getopt.config.caseInsensitive, + "t|track-source", " Track the source file. Adds a column with the source name.", &trackSource, + "s|source-header", "STR Use STR as the header for the source column. Implies --H|header and --t|track-source. Default: 'file'", &sourceHeader, + "f|file", "STR=FILE Read file FILE, using STR as the 'source' value. Implies --t|track-source.", &fileOptionHandler, + "d|delimiter", "CHR Field delimiter. Default: TAB. (Single byte UTF-8 characters only.)", &delim, + ); + + if (r.helpWanted) + { + defaultGetoptPrinter(helpText, r.options); + return tuple(false, 0); + } + else if (helpVerbose) + { + defaultGetoptPrinter(helpTextVerbose, r.options); + return tuple(false, 0); + } + + /* Derivations and consistency checks. */ + if (files.length > 0 || !sourceHeader.empty) trackSource = true; + if (!sourceHeader.empty) hasHeader = true; + if (hasHeader && sourceHeader.empty) sourceHeader = "file"; + + /* Assume the remaining arguments are filepaths. */ + foreach (fp; cmdArgs[1 .. $]) + { + import std.path : baseName, stripExtension; + files ~= fp; + fileSourceNames[fp] = fp.stripExtension.baseName; + } + + /* Add a name mapping for dash ('-') unless it was included in the --file option. 
*/ + if ("-" !in fileSourceNames) fileSourceNames["-"] = "stdin"; + } + catch (Exception exc) + { + stderr.writefln("[%s] Error processing command line arguments: %s", programName, exc.msg); + return tuple(false, 1); + } + return tuple(true, 0); + } +} + +void tsvAppend(OutputRange)(TsvAppendOptions cmdopt, OutputRange outputStream) + if (isOutputRange!(OutputRange, char)) +{ + bool headerWritten = false; + foreach (filename; (cmdopt.files.length > 0) ? cmdopt.files : ["-"]) + { + auto inputStream = (filename == "-") ? stdin : filename.File(); + auto sourceName = cmdopt.fileSourceNames[filename]; + foreach (fileLineNum, line; inputStream.byLine(KeepTerminator.yes).enumerate(1)) + { + if (cmdopt.hasHeader && fileLineNum == 1) + { + if (!headerWritten) + { + if (cmdopt.trackSource) + { + outputStream.put(cmdopt.sourceHeader); + outputStream.put(cmdopt.delim); + } + outputStream.put(line); + headerWritten = true; + } + } + else { + if (cmdopt.trackSource) + { + outputStream.put(sourceName); + outputStream.put(cmdopt.delim); + } + outputStream.put(line); + } + } + } +} + +version(unittest) +{ + /* Unit test helper functions. */ + + import unittest_utils; // tsv unit test helpers, from common/src/. 
+ + void testTsvAppend(string[] cmdArgs, string[][] expected) + { + import std.array : appender; + import std.format : format; + + assert(cmdArgs.length > 0, "[testTsvAppend] cmdArgs must not be empty."); + + auto formatAssertMessage(T...)(string msg, T formatArgs) + { + auto formatString = "[testTsvAppend] %s: " ~ msg; + return format(formatString, cmdArgs[0], formatArgs); + } + + TsvAppendOptions cmdopt; + auto savedCmdArgs = cmdArgs.to!string; + auto r = cmdopt.processArgs(cmdArgs); + assert(r[0], formatAssertMessage("Invalid command line args: '%s'.", savedCmdArgs)); + + auto output = appender!(char[])(); + tsvAppend(cmdopt, output); + auto expectedOutput = expected.tsvDataToString; + + assert(output.data == expectedOutput, + formatAssertMessage( + "Result != expected:\n=====Expected=====\n%s=====Actual=======\n%s==================", + expectedOutput.to!string, output.data.to!string)); + } + } + +unittest +{ + import std.path : buildPath; + import std.file : rmdirRecurse; + import std.format : format; + + auto testDir = makeUnittestTempDir("tsv_append"); + scope(exit) testDir.rmdirRecurse; + + string[][] data1 = + [["field_a", "field_b", "field_c"], + ["red", "17", "κόκκινος"], + ["blue", "12", "άσπρο"]]; + + string[][] data2 = + [["field_a", "field_b", "field_c"], + ["green", "13.5", "κόκκινος"], + ["blue", "15", "πράσινος"]]; + + string[][] data3 = + [["field_a", "field_b", "field_c"], + ["yellow", "9", "κίτρινος"]]; + + string[][] dataHeaderRowOnly = + [["field_a", "field_b", "field_c"]]; + + string[][] dataEmpty = [[]]; + + string filepath1 = buildPath(testDir, "file1.tsv"); + string filepath2 = buildPath(testDir, "file2.tsv"); + string filepath3 = buildPath(testDir, "file3.tsv"); + string filepathHeaderRowOnly = buildPath(testDir, "fileHeaderRowOnly.tsv"); + string filepathEmpty = buildPath(testDir, "fileEmpty.tsv"); + + writeUnittestTsvFile(filepath1, data1); + writeUnittestTsvFile(filepath2, data2); + writeUnittestTsvFile(filepath3, data3); + 
writeUnittestTsvFile(filepathHeaderRowOnly, dataHeaderRowOnly); + writeUnittestTsvFile(filepathEmpty, dataEmpty); + + testTsvAppend(["test-1", filepath1], data1); + testTsvAppend(["test-2", "--header", filepath1], data1); + testTsvAppend(["test-3", filepath1, filepath2], data1 ~ data2); + + testTsvAppend(["test-4", "--header", filepath1, filepath2], + [["field_a", "field_b", "field_c"], + ["red", "17", "κόκκινος"], + ["blue", "12", "άσπρο"], + ["green", "13.5", "κόκκινος"], + ["blue", "15", "πράσινος"]]); + + testTsvAppend(["test-5", "--header", filepath1, filepath2, filepath3], + [["field_a", "field_b", "field_c"], + ["red", "17", "κόκκινος"], + ["blue", "12", "άσπρο"], + ["green", "13.5", "κόκκινος"], + ["blue", "15", "πράσινος"], + ["yellow", "9", "κίτρινος"]]); + + testTsvAppend(["test-6", filepath1, filepathEmpty, filepath2, filepathHeaderRowOnly, filepath3], + data1 ~ dataEmpty ~ data2 ~ dataHeaderRowOnly ~ data3); + + testTsvAppend(["test-7", "--header", filepath1, filepathEmpty, filepath2, filepathHeaderRowOnly, filepath3], + [["field_a", "field_b", "field_c"], + ["red", "17", "κόκκινος"], + ["blue", "12", "άσπρο"], + ["green", "13.5", "κόκκινος"], + ["blue", "15", "πράσινος"], + ["yellow", "9", "κίτρινος"]]); + + testTsvAppend(["test-8", "--track-source", filepath1, filepath2], + [["file1", "field_a", "field_b", "field_c"], + ["file1", "red", "17", "κόκκινος"], + ["file1", "blue", "12", "άσπρο"], + ["file2", "field_a", "field_b", "field_c"], + ["file2", "green", "13.5", "κόκκινος"], + ["file2", "blue", "15", "πράσινος"]]); + + testTsvAppend(["test-9", "--header", "--track-source", filepath1, filepath2], + [["file", "field_a", "field_b", "field_c"], + ["file1", "red", "17", "κόκκινος"], + ["file1", "blue", "12", "άσπρο"], + ["file2", "green", "13.5", "κόκκινος"], + ["file2", "blue", "15", "πράσινος"]]); + + testTsvAppend(["test-10", "-H", "-t", "--source-header", "source", + filepath1, filepathEmpty, filepath2, filepathHeaderRowOnly, filepath3], + 
[["source", "field_a", "field_b", "field_c"], + ["file1", "red", "17", "κόκκινος"], + ["file1", "blue", "12", "άσπρο"], + ["file2", "green", "13.5", "κόκκινος"], + ["file2", "blue", "15", "πράσινος"], + ["file3", "yellow", "9", "κίτρινος"]]); + + testTsvAppend(["test-11", "-H", "-t", "-s", "id", "--file", format("1a=%s", filepath1), + "--file", format("1b=%s", filepath2), "--file", format("1c=%s", filepath3)], + [["id", "field_a", "field_b", "field_c"], + ["1a", "red", "17", "κόκκινος"], + ["1a", "blue", "12", "άσπρο"], + ["1b", "green", "13.5", "κόκκινος"], + ["1b", "blue", "15", "πράσινος"], + ["1c", "yellow", "9", "κίτρινος"]]); + + testTsvAppend(["test-12", "-s", "id", "-f", format("1a=%s", filepath1), + "-f", format("1b=%s", filepath2), filepath3], + [["id", "field_a", "field_b", "field_c"], + ["1a", "red", "17", "κόκκινος"], + ["1a", "blue", "12", "άσπρο"], + ["1b", "green", "13.5", "κόκκινος"], + ["1b", "blue", "15", "πράσινος"], + ["file3", "yellow", "9", "κίτρινος"]]); +} diff --git a/tsv-append/tests/gold/basic_tests_1.txt b/tsv-append/tests/gold/basic_tests_1.txt new file mode 100644 index 00000000..58aa903b --- /dev/null +++ b/tsv-append/tests/gold/basic_tests_1.txt @@ -0,0 +1,270 @@ +Basic tests set 1 +----------------- + +====[tsv-append input3x2.tsv input3x5.tsv]==== +field1 field2 field3 +abc def ghi +field1 field2 field3 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz + +====[tsv-append input1x3.tsv input1x4.tsv]==== +field1 +row 1 +row 2 +field1 +next-empty + +last-line + +====[tsv-append input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv]==== +field1 field2 field3 +abc def ghi +field1 +row 1 +row 2 +field1 field2 field3 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz +field1 +next-empty + +last-line + +====[tsv-append input3x5.tsv]==== +field1 field2 field3 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz + +====[tsv-append --header input3x2.tsv input3x5.tsv]==== +field1 field2 field3 +abc def ghi +jkl mno pqr +123 456 789 +xy1 xy2 
xy3 +pqx pqy pqz + +====[tsv-append -H input1x3.tsv input1x4.tsv]==== +field1 +row 1 +row 2 +next-empty + +last-line + +====[tsv-append -H input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv]==== +field1 field2 field3 +abc def ghi +row 1 +row 2 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz +next-empty + +last-line + +====[tsv-append -H input3x5.tsv]==== +field1 field2 field3 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz + +====[tsv-append --track-source input3x2.tsv input3x5.tsv]==== +input3x2 field1 field2 field3 +input3x2 abc def ghi +input3x5 field1 field2 field3 +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append -t input1x3.tsv input1x4.tsv]==== +input1x3 field1 +input1x3 row 1 +input1x3 row 2 +input1x4 field1 +input1x4 next-empty +input1x4 +input1x4 last-line + +====[tsv-append -t input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv]==== +input3x2 field1 field2 field3 +input3x2 abc def ghi +input1x3 field1 +input1x3 row 1 +input1x3 row 2 +input3x5 field1 field2 field3 +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz +input1x4 field1 +input1x4 next-empty +input1x4 +input1x4 last-line + +====[tsv-append -t input3x5.tsv]==== +input3x5 field1 field2 field3 +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append --header --track-source input3x2.tsv input3x5.tsv]==== +file field1 field2 field3 +input3x2 abc def ghi +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append -H -t input1x3.tsv input1x4.tsv]==== +file field1 +input1x3 row 1 +input1x3 row 2 +input1x4 next-empty +input1x4 +input1x4 last-line + +====[tsv-append -H -t input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv]==== +file field1 field2 field3 +input3x2 abc def ghi +input1x3 row 1 +input1x3 row 2 +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz +input1x4 
next-empty +input1x4 +input1x4 last-line + +====[tsv-append -H -t input3x5.tsv]==== +file field1 field2 field3 +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append --source-header source input3x2.tsv input3x5.tsv]==== +source field1 field2 field3 +input3x2 abc def ghi +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append --s source input1x3.tsv input1x4.tsv]==== +source field1 +input1x3 row 1 +input1x3 row 2 +input1x4 next-empty +input1x4 +input1x4 last-line + +====[tsv-append -s source input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv]==== +source field1 field2 field3 +input3x2 abc def ghi +input1x3 row 1 +input1x3 row 2 +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz +input1x4 next-empty +input1x4 +input1x4 last-line + +====[tsv-append -s source input3x5.tsv]==== +source field1 field2 field3 +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append -H -s source input3x2.tsv input3x5.tsv]==== +source field1 field2 field3 +input3x2 abc def ghi +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append -H -t -s source input3x2.tsv input3x5.tsv]==== +source field1 field2 field3 +input3x2 abc def ghi +input3x5 jkl mno pqr +input3x5 123 456 789 +input3x5 xy1 xy2 xy3 +input3x5 pqx pqy pqz + +====[tsv-append -t --file Input-A=input1x3.tsv --file Input-B=input1x4.tsv]==== +Input-A field1 +Input-A row 1 +Input-A row 2 +Input-B field1 +Input-B next-empty +Input-B +Input-B last-line + +====[tsv-append -H -t -f Input-A=input1x3.tsv -f Input-B=input1x4.tsv]==== +file field1 +Input-A row 1 +Input-A row 2 +Input-B next-empty +Input-B +Input-B last-line + +====[tsv-append -H -t -s πηγή -f κόκκινος=input1x3.tsv -f άσπρο=input1x4.tsv]==== +πηγή field1 +κόκκινος row 1 +κόκκινος row 2 +άσπρο next-empty +άσπρο +άσπρο last-line + +====[cat 
input3x2.tsv | tsv-append]==== +field1 field2 field3 +abc def ghi + +====[cat input3x2.tsv | tsv-append -- - input3x5.tsv]==== +field1 field2 field3 +abc def ghi +field1 field2 field3 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz + +====[cat input3x2.tsv | tsv-append -H -- - input3x5.tsv]==== +field1 field2 field3 +abc def ghi +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz + +====[cat input3x2.tsv | tsv-append -H input3x5.tsv -- -]==== +field1 field2 field3 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz +abc def ghi + +====[cat input3x2.tsv | tsv-append -H -f standard-input=- -f 3x5=input3x5.tsv ]==== +file field1 field2 field3 +standard-input abc def ghi +3x5 jkl mno pqr +3x5 123 456 789 +3x5 xy1 xy2 xy3 +3x5 pqx pqy pqz diff --git a/tsv-append/tests/gold/error_tests_1.txt b/tsv-append/tests/gold/error_tests_1.txt new file mode 100644 index 00000000..171cab6d --- /dev/null +++ b/tsv-append/tests/gold/error_tests_1.txt @@ -0,0 +1,14 @@ +Error test set 1 +---------------- + +====[tsv-append no_such_file.tsv]==== +Error [tsv-append]: Cannot open file `no_such_file.tsv' in mode `rb' (No such file or directory) + +====[tsv-append -f none=no_such_file.tsv]==== +Error [tsv-append]: Cannot open file `no_such_file.tsv' in mode `rb' (No such file or directory) + +====[tsv-append --no-such-param input1x3.tsv]==== +[tsv-append] Error processing command line arguments: Unrecognized option --no-such-param + +====[tsv-append -d ß input1x3.tsv]==== +[tsv-append] Error processing command line arguments: Invalid UTF-8 sequence (at index 1) diff --git a/tsv-append/tests/input1x3.tsv b/tsv-append/tests/input1x3.tsv new file mode 100644 index 00000000..06ddcd7e --- /dev/null +++ b/tsv-append/tests/input1x3.tsv @@ -0,0 +1,3 @@ +field1 +row 1 +row 2 diff --git a/tsv-append/tests/input1x4.tsv b/tsv-append/tests/input1x4.tsv new file mode 100644 index 00000000..9ee6d244 --- /dev/null +++ b/tsv-append/tests/input1x4.tsv @@ -0,0 +1,4 @@ +field1 +next-empty + +last-line diff 
--git a/tsv-append/tests/input3x2.tsv b/tsv-append/tests/input3x2.tsv new file mode 100644 index 00000000..ab4c9352 --- /dev/null +++ b/tsv-append/tests/input3x2.tsv @@ -0,0 +1,2 @@ +field1 field2 field3 +abc def ghi diff --git a/tsv-append/tests/input3x5.tsv b/tsv-append/tests/input3x5.tsv new file mode 100644 index 00000000..42ef972f --- /dev/null +++ b/tsv-append/tests/input3x5.tsv @@ -0,0 +1,5 @@ +field1 field2 field3 +jkl mno pqr +123 456 789 +xy1 xy2 xy3 +pqx pqy pqz diff --git a/tsv-append/tests/tests.sh b/tsv-append/tests/tests.sh new file mode 100755 index 00000000..e27cf327 --- /dev/null +++ b/tsv-append/tests/tests.sh @@ -0,0 +1,89 @@ +#! /bin/sh + +## Most tsv-append testing is done as unit tests. Tests executed by this script are +## run against the final executable. This provides a sanity check that the +## final executable is good. Tests are easy to run in this format, so there is +## overlap. However, these tests do not test edge cases as rigorously as unit tests. +## Instead, these tests focus on areas that are hard to test in unit tests. + +if [ $# -le 1 ]; then + echo "Insufficient arguments. A program name and output directory are required."
+ exit 1 +fi + +prog=$1 +shift +odir=$1 +echo "Testing ${prog}, output to ${odir}" + +## Three args: program, args, output file +runtest () { + echo "" >> $3 + echo "====[tsv-append $2]====" >> $3 + $1 $2 >> $3 2>&1 + return 0 +} + +basic_tests_1=${odir}/basic_tests_1.txt + +echo "Basic tests set 1" > ${basic_tests_1} +echo "-----------------" >> ${basic_tests_1} + +runtest ${prog} "input3x2.tsv input3x5.tsv" ${basic_tests_1} +runtest ${prog} "input1x3.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "input3x5.tsv" ${basic_tests_1} + +runtest ${prog} "--header input3x2.tsv input3x5.tsv" ${basic_tests_1} +runtest ${prog} "-H input1x3.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-H input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-H input3x5.tsv" ${basic_tests_1} + +runtest ${prog} "--track-source input3x2.tsv input3x5.tsv" ${basic_tests_1} +runtest ${prog} "-t input1x3.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-t input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-t input3x5.tsv" ${basic_tests_1} + +runtest ${prog} "--header --track-source input3x2.tsv input3x5.tsv" ${basic_tests_1} +runtest ${prog} "-H -t input1x3.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-H -t input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-H -t input3x5.tsv" ${basic_tests_1} + +runtest ${prog} "--source-header source input3x2.tsv input3x5.tsv" ${basic_tests_1} +runtest ${prog} "--s source input1x3.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-s source input3x2.tsv input1x3.tsv input3x5.tsv input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-s source input3x5.tsv" ${basic_tests_1} +runtest ${prog} "-H -s source input3x2.tsv input3x5.tsv" ${basic_tests_1} +runtest ${prog} "-H -t -s source input3x2.tsv input3x5.tsv" ${basic_tests_1} + +runtest ${prog} "-t 
--file Input-A=input1x3.tsv --file Input-B=input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-H -t -f Input-A=input1x3.tsv -f Input-B=input1x4.tsv" ${basic_tests_1} +runtest ${prog} "-H -t -s πηγή -f κόκκινος=input1x3.tsv -f άσπρο=input1x4.tsv" ${basic_tests_1} + +## runtest can't create command lines with standard input. Write them out. +echo "" >> ${basic_tests_1}; echo "====[cat input3x2.tsv | tsv-append]====" >> ${basic_tests_1} +cat input3x2.tsv | ${prog} >> ${basic_tests_1} 2>&1 + +echo "" >> ${basic_tests_1}; echo "====[cat input3x2.tsv | tsv-append -- - input3x5.tsv]====" >> ${basic_tests_1} +cat input3x2.tsv | ${prog} -- - input3x5.tsv >> ${basic_tests_1} 2>&1 + +echo "" >> ${basic_tests_1}; echo "====[cat input3x2.tsv | tsv-append -H -- - input3x5.tsv]====" >> ${basic_tests_1} +cat input3x2.tsv | ${prog} -H -- - input3x5.tsv >> ${basic_tests_1} 2>&1 + +echo "" >> ${basic_tests_1}; echo "====[cat input3x2.tsv | tsv-append -H input3x5.tsv -- -]====" >> ${basic_tests_1} +cat input3x2.tsv | ${prog} -H input3x5.tsv -- - >> ${basic_tests_1} 2>&1 + +echo "" >> ${basic_tests_1}; echo "====[cat input3x2.tsv | tsv-append -H -f standard-input=- -f 3x5=input3x5.tsv ]====" >> ${basic_tests_1} +cat input3x2.tsv | ${prog} -H -f standard-input=- -f 3x5=input3x5.tsv >> ${basic_tests_1} 2>&1 + +## Error cases + +error_tests_1=${odir}/error_tests_1.txt + +echo "Error test set 1" > ${error_tests_1} +echo "----------------" >> ${error_tests_1} + +runtest ${prog} "no_such_file.tsv" ${error_tests_1} +runtest ${prog} "-f none=no_such_file.tsv" ${error_tests_1} +runtest ${prog} "--no-such-param input1x3.tsv" ${error_tests_1} +runtest ${prog} "-d ß input1x3.tsv" ${error_tests_1} diff --git a/tsv-summarize/makefile b/tsv-summarize/makefile index eb196c30..ee1889c9 100644 --- a/tsv-summarize/makefile +++ b/tsv-summarize/makefile @@ -1,6 +1,2 @@ include ../makedefs.mk include ../makeapp.mk - -# No external call tests yet -#test-debug: ; -#test-release: ;