Example code for a simple node stream + buffer #88

Open: tobinbc wants to merge 1 commit into base: master

Conversation

@tobinbc tobinbc commented Sep 4, 2019

I couldn't see anything obvious in the README about writing to streams, so I thought this would be useful for others too :)

@kessler
Contributor

kessler commented Sep 23, 2019

@tobinbc Thank you very much for the example. We're working on a new version of parquetjs which will support streams out of the box. This rewrite will take several months though, given the amount of time we have for this project atm, so I will gladly add your contribution to the docs in the meantime. Could you please sign the CLA here: https://github.com/ironSource/opensource-contributor-license-agreement

Thanks

@dobesv
Contributor

dobesv commented Nov 29, 2019

Is there any way I could help with that rewrite? Is it in a branch? Is there a process to contribute to it?

@kessler
Contributor

kessler commented Dec 8, 2019

@dobesv We can certainly discuss it. Can you email me?

@dgendill

dgendill commented Jun 7, 2022

For those wanting to read parquet files from outside the file system, I've found that this fork provides a good example of extending the ParquetEnvelopeReader to read from different sources, namely from a Buffer, from S3, or from a URL.

https://github.com/LibertyDSNP/parquetjs/blob/v1.2.0/lib/reader.ts#L378

That code has deviated slightly from the original ParquetEnvelopeReader, which can be found here:

https://github.com/ironSource/parquetjs/blob/v0.8.0/lib/reader.js#L191

But the big idea is mostly the same. If you provide implementations of these functions you can create your own custom ParquetEnvelopeReader.

/*
readFn: (offset: number, length: number) => Promise<Buffer>
closeFn: () => void
fileSize: number
*/
const myReader = new ParquetEnvelopeReader(readFn, closeFn, fileStat.size);
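
As a minimal sketch of what those three pieces could look like for an in-memory Buffer: the helper name `makeBufferSource` is hypothetical and not part of parquetjs or the fork, and the wiring in the trailing comment simply assumes the constructor call shown above.

```javascript
// Hypothetical helper (not part of parquetjs): builds readFn, closeFn and
// fileSize from a Buffer that already holds the whole parquet file.
function makeBufferSource(buffer) {
  return {
    // readFn: resolve with the `length` bytes starting at `offset`.
    readFn: async (offset, length) => {
      if (offset < 0 || offset + length > buffer.length) {
        throw new RangeError(
          `read of ${length} bytes at offset ${offset} is out of bounds`
        );
      }
      // subarray returns a view on the same memory, so nothing is copied.
      return buffer.subarray(offset, offset + length);
    },
    // closeFn: an in-memory source has nothing to release.
    closeFn: () => {},
    fileSize: buffer.length,
  };
}

// Assumed wiring, following the constructor call shown above:
// const { readFn, closeFn, fileSize } = makeBufferSource(fileBuffer);
// const myReader = new ParquetEnvelopeReader(readFn, closeFn, fileSize);
```

The same shape should carry over to other sources: only the body of `readFn` changes, for example to a ranged HTTP request or a ranged object-store read.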

I have yet to implement this myself, but it seems reasonable that this could be extended to support a generic Node.js Readable stream, such as the one provided by BlobDownloadResponseParsed.readableStreamBody in @azure/storage-blob.
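
One possible approach, untested against parquetjs itself: because `readFn` is called with arbitrary offsets and parquet keeps its metadata in a footer at the end of the file, a forward-only stream has to be drained into memory (or a temp file) before it can serve reads. The helper names `streamToBuffer` and `sourceFromStream` below are hypothetical, and the sketch assumes the file fits in memory.

```javascript
const { Readable } = require('stream');

// Drain a Readable into a single Buffer. Chunks may arrive as Buffers or
// strings depending on how the stream was created.
async function streamToBuffer(readable) {
  const chunks = [];
  for await (const chunk of readable) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Hypothetical helper: turn a fully buffered stream into the readFn,
// closeFn and fileSize values described above.
async function sourceFromStream(readable) {
  const buffer = await streamToBuffer(readable);
  return {
    readFn: async (offset, length) => buffer.subarray(offset, offset + length),
    closeFn: () => {},
    fileSize: buffer.length,
  };
}

// Assumed usage with any Readable, e.g. one built from in-memory chunks:
// const src = await sourceFromStream(Readable.from([chunkA, chunkB]));
// const reader = new ParquetEnvelopeReader(src.readFn, src.closeFn, src.fileSize);
```

The trade-off is memory: the whole file is held at once, so for large files a source that supports ranged reads is likely a better fit than a plain stream.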

Being able to use a generic Readable stream would also resolve this issue and open up the possibility of interfacing with other cloud services: #110

4 participants