Allow processing pre-parsed slice of fields (via interp.Config) #248

Open
TLINDEN opened this issue Jan 11, 2025 · 4 comments

Comments

@TLINDEN

TLINDEN commented Jan 11, 2025

Howdy,

Currently, as far as I understand the docs, if you use the Go API you have to feed the data to be processed as a string (or from whatever input it comes from, but it will be a string anyway), which is then parsed according to the settings. I suspect that this input ultimately ends up in some data structure like [][]any{} or similar.

What if I already have such a data structure? It would be cool to be able to hand the data directly to interp.Config{} somehow, skipping the CSV parsing and passing the fields straight to the AWK program.

Something like:

package main

import (
	"fmt"

	"github.com/benhoyt/goawk/interp"
	"github.com/benhoyt/goawk/parser"
)

func main() {
	src := `{ total += @"amount" } END { print total }`
	// Pre-parsed records; the first row is the header.
	input := [][]string{
		{"name", "amount"},
		{"Bob", "17.50"},
		{"Jill", "20"},
		{"Boba Fett", "100.00"},
	}
	prog, err := parser.ParseProgram([]byte(src), nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	config := &interp.Config{
		Data:      input, // proposed field: does not exist in interp.Config today
		InputMode: interp.CSVMode,
		CSVInput:  interp.CSVInputConfig{Comment: '#', Header: true},
	}
	_, err = interp.ExecProgram(prog, config)
	if err != nil {
		fmt.Println(err)
		return
	}
}

That way I could for instance parse JSON tabular data and run some AWK code on it :)

Would that be possible?

@benhoyt
Owner

benhoyt commented Jan 12, 2025

This is an interesting idea. I don't have any immediate plans to do this, but I could see it being done with another value for Config.InputMode, maybe FieldsMode, where you specify a FieldsFunc func() []string which yields the next slice of fields (there'd need to be a way to turn Header: true on too). I'd want to do it iterator-style like that, to avoid the caller needing the entire slice of slices of fields in memory at once.
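As a rough illustration of the iterator-style idea, here is a minimal standalone sketch. It does not use GoAWK at all: the FieldsFunc type, the nil-means-end-of-input convention, and the helpers sliceFields and sumColumn are all assumptions made up for this example, and the header handling only imitates what Header: true might do.

```go
package main

import (
	"fmt"
	"strconv"
)

// FieldsFunc is a HYPOTHETICAL iterator type (not part of GoAWK today).
// Each call returns the next record's fields, or nil once input is exhausted.
type FieldsFunc func() []string

// sliceFields adapts an in-memory [][]string to a FieldsFunc. A streaming
// source (for example a JSON decoder) could satisfy the same signature
// without holding every record in memory at once.
func sliceFields(rows [][]string) FieldsFunc {
	i := 0
	return func() []string {
		if i >= len(rows) {
			return nil
		}
		row := rows[i]
		i++
		return row
	}
}

// sumColumn imitates `{ total += @"amount" } END { print total }` with a
// header row: the first record names the columns, the rest are data.
func sumColumn(next FieldsFunc, name string) (float64, error) {
	header := next()
	col := -1
	for i, h := range header {
		if h == name {
			col = i
			break
		}
	}
	if col < 0 {
		return 0, fmt.Errorf("column %q not found", name)
	}
	var total float64
	for fields := next(); fields != nil; fields = next() {
		if col >= len(fields) {
			continue // short record: treat the missing field as empty
		}
		// Rough approximation of AWK's string-to-number coercion:
		// values that fail to parse contribute 0.
		n, _ := strconv.ParseFloat(fields[col], 64)
		total += n
	}
	return total, nil
}

func main() {
	next := sliceFields([][]string{
		{"name", "amount"},
		{"Bob", "17.50"},
		{"Jill", "20"},
		{"Boba Fett", "100.00"},
	})
	total, err := sumColumn(next, "amount")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(total) // prints 137.5
}
```

The consumer pulls one record at a time, so only the producer decides how much data is resident in memory.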

I'll probably not work on this issue now, but I'll leave it open to come back to later, or in case someone wants to try.

In the meantime, I recommend writing your parsed data fields to CSV using encoding/csv.Writer, and then using GoAWK in CSV input mode on that. It's a bit roundabout and hence less efficient, but it'd get the job done.
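A minimal sketch of the encoding half of that workaround, using only the standard library. The helper name rowsToCSV is made up for this example; the GoAWK hand-off is shown only as a comment and assumes the interp.Config fields Stdin, InputMode, and CSVInput.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// rowsToCSV (hypothetical helper) encodes pre-parsed records as CSV text.
func rowsToCSV(rows [][]string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	// WriteAll writes every record and then flushes the writer.
	if err := w.WriteAll(rows); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	rows := [][]string{
		{"name", "amount"},
		{"Bob", "17.50"},
		{"Jill", "20"},
		{"Boba Fett", "100.00"},
	}
	buf, err := rowsToCSV(rows)
	if err != nil {
		fmt.Println(err)
		return
	}
	// buf is an io.Reader, so (assuming the GoAWK API described in this
	// thread) it could be passed on roughly like this:
	//   interp.Config{Stdin: buf, InputMode: interp.CSVMode,
	//       CSVInput: interp.CSVInputConfig{Header: true}}
	fmt.Print(buf.String())
}
```

For large inputs, an io.Pipe between the csv.Writer and the interpreter would avoid buffering the whole CSV text, at the cost of an extra goroutine.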

@benhoyt benhoyt changed the title Would it be possible to use a readily available slice of fields as input for interp.Config{}? Allow processing pre-parsed slice of fields (via interp.Config) Jan 12, 2025
@TLINDEN
Author

TLINDEN commented Jan 13, 2025

> I'll probably not work on this issue now, but I'll leave it open to come back to later, or in case someone wants to try.

I could try my luck, if you could show some pointers where to start in your code?

> In the meantime, I recommend writing your parsed data fields to CSV using encoding/csv.Writer, and then using GoAWK in CSV input mode on that.

Ah, very nice. I'll try this.

@benhoyt
Owner

benhoyt commented Jan 13, 2025

> I could try my luck, if you could show some pointers where to start in your code?

It'll be in the interp package (and directory). The interp.ensureFields function is a good place to start -- that's what parses the line into fields depending on the configured input mode and the value of FS (AWK field separator). I imagine you'll have to add a few new fields to the interp struct to keep track of things, and then return the next bunch of fields from the FieldsFunc.

Just note that I'm not 100% sure of this yet, but it'd be good to see a proof of concept (don't worry about tests) and then I'll make a decision about whether the concept is a goer or not.

@adsr303

adsr303 commented Jan 13, 2025

@TLINDEN: For the original use case in this feature request, please also look into the yq tool – it can convert tabular JSON to CSV, among other stuff.
