Nice use case. My first thought is to wonder whether there is enough commonality in these patterns to build a tool around. More examples would shed light on this. But if it turned out that the flexibility of awk or sed is needed, it might be best to leave these tasks to those tools and custom scripts.
That's a good point, and I'm not unsympathetic to it at all. If I hit more examples, I'll try to remember to outline them here.
I'll note up front that I really don't like sed/awk for this sort of thing, because they're general-purpose, line-oriented tools. They're fine if there's something like "cores" to anchor on for extracting the numbers and splitting them (and I think you rightly surmise that I wasn't necessarily looking to extract the column name in the same operation), but for the more general case they're clunky. Awareness of columns is extremely powerful and useful.
Just doodling here, but something like: tsv-filter --split 1:_:cores,threads
...could be helpful. Or maybe something like regex substitution via capture groups: tsv-filter --split 1:'([0-9]+)cores_([0-9]+)threads':cores,threads
...if we continue looking at my original example. (The column selector is necessary for the more general case that you have multiple columns with the delimiter of interest -- colon, for example -- but you only want to split one of them and the other is something like a timestamp.)
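To make the two proposed forms concrete, here is a minimal Python sketch of the semantics they seem to imply. It assumes a header line, a 1-based field number, and values shaped like `4cores_8threads` (hypothetical sample data). `split_column` is a hypothetical helper, not part of tsv-utils: with `sep` it splits the field on a literal delimiter, and with `pattern` it keeps only the regex capture groups. Only the selected field is split, so other columns containing the same delimiter pass through unchanged.

```python
import re
from typing import List, Optional


def split_column(tsv_lines: List[str], field: int, names: List[str],
                 sep: Optional[str] = None,
                 pattern: Optional[str] = None) -> List[str]:
    """Hypothetical sketch: replace 1-based `field` with len(names) new columns.

    Exactly one of `sep` (literal delimiter) or `pattern` (regex whose
    capture groups become the new values) must be given.
    """
    if (sep is None) == (pattern is None):
        raise ValueError("give exactly one of sep or pattern")
    idx = field - 1
    header, *body = [line.split("\t") for line in tsv_lines]
    # The header entry for the split field is replaced by the new names.
    out = ["\t".join(header[:idx] + names + header[idx + 1:])]
    for row in body:
        value = row[idx]
        if pattern is not None:
            m = re.fullmatch(pattern, value)
            if m is None:
                raise ValueError(f"pattern did not match {value!r}")
            parts = list(m.groups())
        else:
            parts = value.split(sep)
        if len(parts) != len(names):
            raise ValueError(f"{value!r}: expected {len(names)} parts, got {len(parts)}")
        # Columns other than `field` are copied through untouched.
        out.append("\t".join(row[:idx] + parts + row[idx + 1:]))
    return out


lines = ["config\ttime_ms", "4cores_8threads\t120", "8cores_16threads\t75"]

# Like the proposed `--split 1:_:cores,threads`: split on "_" only.
print(split_column(lines, 1, ["cores", "threads"], sep="_"))
# → ['cores\tthreads\ttime_ms', '4cores\t8threads\t120', '8cores\t16threads\t75']

# Like the capture-group form: keep just the numbers.
print(split_column(lines, 1, ["cores", "threads"],
                   pattern=r"([0-9]+)cores_([0-9]+)threads"))
# → ['cores\tthreads\ttime_ms', '4\t8\t120', '8\t16\t75']
```

The plain-delimiter form leaves the unit suffixes in place, which is why the capture-group form is attractive for this example. Raising on a non-matching row is just one choice; a real tool might prefer to pass such rows through or report them.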
Broadly, I'd characterise this class of problem as "normalisation", which also covers other transformations on columns. (For example, some existing tools report measurements in whole seconds, so I want to multiply those by 1000, or divide the millisecond metrics by 1000, so that they can be compared properly. ...This might be a separate ER?)
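The unit-conversion case boils down to scaling one numeric field by a constant. A minimal sketch, again as a hypothetical Python helper (`scale_column` is not a tsv-utils feature), assuming a header line and a 1-based field number. It optionally renames the column so the new unit is visible in the header:

```python
from typing import List, Optional


def scale_column(tsv_lines: List[str], field: int, factor: float,
                 new_name: Optional[str] = None) -> List[str]:
    """Hypothetical sketch: multiply 1-based `field` by `factor` on every data row."""
    idx = field - 1
    header, *body = [line.split("\t") for line in tsv_lines]
    if new_name is not None:
        header[idx] = new_name  # e.g. "elapsed_s" -> "elapsed_ms"
    out = ["\t".join(header)]
    for row in body:
        scaled = float(row[idx]) * factor
        # ".15g" drops float noise and trailing ".0" (2000.0 -> "2000").
        row[idx] = format(scaled, ".15g")
        out.append("\t".join(row))
    return out


secs = ["run\telapsed_s", "a\t2", "b\t1.5"]
print(scale_column(secs, 2, 1000, "elapsed_ms"))
# → ['run\telapsed_ms', 'a\t2000', 'b\t1500']

msecs = ["run\telapsed_ms", "a\t120"]
print(scale_column(msecs, 2, 1 / 1000, "elapsed_s"))
# → ['run\telapsed_s', 'a\t0.12']
```

Division is expressed as multiplication by the reciprocal, so a single scaling operation covers both directions of the seconds/milliseconds conversion.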
Another feature request that came to mind as I was working. Consider the following single column of data:
I ended up doing it in post-processing, but I think it'd be handy to have some way to split fields so that it comes out like this: