Improved CSV export - feedback welcome #900
Comments
I would vote for using tab as separator, but changing the file extension to |
Yes, this is the idea mentioned in the original post, but it has the disadvantage that it would make the column names really long and complex to use. For example, they would need to include a timestamp, and thus after importing in some third-party software, you would have to use these long and unique column names instead of for example just
Note that we already use tabs in our current "CSV" format. Notification of users is not difficult, we can likely add the new format as an option in addition to the existing format and then delete the previous format in a new major version.
Hm, I am not sure I follow this argument. How would it make it easy to identify the relevant columns, if all columns are shown?
Note that we do have this feature already (cf. documentation).
How about keeping a raw/master TSV around and some postprocessing scripts based on CLI tools like sed/head/tail/grep and csvkit?
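A minimal sketch of such a postprocessing step, written in Python rather than with the CLI tools named above. It assumes a tab-separated input with a single header row; the function name `select_columns` and the sample column names are hypothetical and not part of BenchExec.

```python
import csv
import io


def select_columns(tsv_text, wanted):
    """Return a TSV string that contains only the columns named in `wanted`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=wanted,
        delimiter="\t",
        extrasaction="ignore",  # silently drop columns that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Illustrative data only, not real benchmark results.
sample = "task\tstatus\tcputime\nfoo.yml\ttrue\t1.5\nbar.yml\tfalse\t2.0\n"
print(select_columns(sample, ["task", "status"]))
```

Selecting by column name rather than by position keeps such a script working when the column order changes, which is harder to guarantee with position-based tools like `cut`.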
The raw/master files that BenchExec uses are the result XML files. We cannot use CSV/TSV for this, because these files contain important meta information about the whole benchmark run, which we need to keep together with the measurement data. This is important for example for creating the HTML tables (which contain both) and also makes archiving results easier.
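To illustrate how measurement data and meta information could travel together in a flat export, here is a rough sketch that flattens a simplified result XML into TSV and keeps the run-level attributes as `#` comment lines. The XML layout (`result`, `run`, and `column` elements with `name`, `title`, and `value` attributes) is a simplification assumed for this example, and the real BenchExec schema may differ. The helpers `xml_to_rows` and `to_tsv` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative, simplified input; not an actual BenchExec result file.
SAMPLE_XML = """<result tool="example-tool" date="2023-01-01">
  <run name="foo.yml">
    <column title="status" value="true"/>
    <column title="cputime" value="1.5s"/>
  </run>
  <run name="bar.yml">
    <column title="status" value="false"/>
  </run>
</result>"""


def xml_to_rows(xml_text):
    """Split the XML into run-level metadata, a header, and one dict per run."""
    root = ET.fromstring(xml_text)
    meta = dict(root.attrib)
    titles = []
    rows = []
    for run in root.iter("run"):
        row = {"task": run.get("name")}
        for col in run.iter("column"):
            title = col.get("title")
            row[title] = col.get("value")
            if title not in titles:
                titles.append(title)  # keep first-seen column order
        rows.append(row)
    return meta, ["task"] + titles, rows


def to_tsv(meta, header, rows):
    """Write metadata as '#' comment lines, then a header row, then the data."""
    lines = [f"# {key}={value}" for key, value in meta.items()]
    lines.append("\t".join(header))
    for row in rows:
        # Missing measurements become empty cells.
        lines.append("\t".join(row.get(name, "") for name in header))
    return "\n".join(lines) + "\n"


meta, header, rows = xml_to_rows(SAMPLE_XML)
print(to_tsv(meta, header, rows))
```

Comment lines keep the metadata in the same file, but many TSV readers need to be told explicitly to skip them, which is part of the trade-off discussed in this issue.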
The CSV tables exported by table-generator have a layout that is inspired by the HTML tables, but this sometimes makes them hard to use programmatically in other tools. We should improve this.
Open points:
- [...] `status`), it is no longer unique. Some concatenation of run set name, timestamp, and column name? Having something like the timestamp there would be highly inconvenient for those who do not need it. Maybe keep only the column name as long as it is unique?
- [...] `#` in front of the line?
- [...] `cut`. Should we change the name and extension to TSV instead? Will people understand that abbreviation?
- [...] `expectedVerdict` even if it is always empty?

In general, there is a trade-off between having tables that always have exactly the same format (all task-id columns, header content with full information), even if parts are redundant or not applicable, and tables that are tailored to the specific use case (keeping column names short and easy to handle when they are unique anyway, hiding the expected verdict if it is empty, etc.). The latter can be much more convenient in many use cases, but is more difficult to use in use cases where data from lots of different scenarios are combined.
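The "keep only the column name as long as it is unique" idea could be sketched as follows. This is only one possible scheme under stated assumptions: the function `unique_column_names`, the `.` separator, and the run set names `toolA` and `toolB` are hypothetical, and a real implementation would additionally need a fallback (for example the timestamp) when run set names themselves collide.

```python
from collections import Counter


def unique_column_names(columns):
    """Map (run_set, title) pairs to export column names.

    A title that occurs only once is kept as-is; a title that occurs
    several times is prefixed with the name of its run set.
    """
    counts = Counter(title for _, title in columns)
    return [
        title if counts[title] == 1 else f"{run_set}.{title}"
        for run_set, title in columns
    ]


columns = [
    ("toolA", "status"),
    ("toolA", "cputime"),
    ("toolB", "status"),
    ("toolB", "walltime"),
]
print(unique_column_names(columns))
# ['toolA.status', 'cputime', 'toolB.status', 'walltime']
```

The downside matches the trade-off described above: the resulting names depend on which columns happen to be present, so the same measurement can be called `status` in one table and `toolA.status` in another.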
Maybe we also need to add some options to the table definitions to make it possible for users to choose among them (e.g., which columns should be shown for the task id).
Any feedback and ideas, whether about the general goal or about concrete details, are highly welcome!
@s-winter ping