Add vroom blogpost #308

jimhester · 2019-05-06T18:25:51Z

No description provided.

jimhester · 2019-05-06T18:33:32Z

Looking at the output, I think I should do something about how much output is generated for each code block. Either:

- Always assign the results to a variable rather than letting them auto-print, or
- Provide a specification or use `message = FALSE` to suppress the column type message.

Or maybe even both... thoughts?

I think it is too cluttered
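A minimal sketch of both options, assuming vroom's bundled example file `vroom_example("mtcars.csv")` as a stand-in for the post's own data. It also assumes that an explicit `col_types` quiets the column type message, which is how the comment above describes it. `suppressMessages()` is used here as the plain-R equivalent of the `message = FALSE` chunk option.

```r
library(vroom)

# Stand-in data (an assumption): a small example CSV that ships with vroom.
path <- vroom_example("mtcars.csv")

# Option 1: assign the result so the tibble is not auto-printed, and
# silence the column type message (like `message = FALSE` on the chunk).
df1 <- suppressMessages(vroom(path))

# Option 2: give vroom an explicit specification, so there are no column
# types to guess and report.
df2 <- vroom(
  path,
  col_types = cols(model = col_character(), .default = col_double())
)
```

Doing both (assigning and specifying types) keeps each code block down to just the code.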

jimhester · 2019-05-07T12:31:20Z

@hadley, @jennybc or @batpigandme it would be great if one or more of you could review this!

batpigandme · 2019-05-07T12:50:44Z

On it!

hadley

Overall looks good — I don't need to re-review before you publish.

hadley · 2019-05-07T13:09:02Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+I'm excited to announce that [vroom 1.0.0](http://vroom.r-lib.org) is now
+available on CRAN!
+
+vroom reads rectangular data, such as comma separated


I think you can combine these sentences into a paragraph.

hadley · 2019-05-07T13:11:09Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+file lazily; you only pay for the data you use. This lazy access is done
+automatically, so no changes to your R data manipulation code are needed.
+
+vroom also provides efficient multi-threaded writing that is multiple times


I think we need a couple of sentences about vroom vs readr somewhere, i.e. we're not entirely sure yet, but we'll probably let them evolve separately for a little while and plan to unite them in the future. The major downside of vroom is that the laziness means you can't get all problems up front, so unification will require some thought.

Yeah, you are right; I forgot to include something like this.

hadley · 2019-05-07T13:11:27Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+Compared to readr, the first difference you may note is you use only one
+function to read the files, `vroom()`. This is because `vroom()` guesses the
+delimiter of the file automatically (based on the first few lines). This works


Mention this is inspired by data.table

hadley · 2019-05-07T13:12:21Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+vroom.
+
+```{r}
+# Split the flights data by carrier


I'd hide this code, and instead say something like: "Imagine we have a directory containing ..."

hadley · 2019-05-07T13:13:45Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+It can even read gzipped files from the internet (although currently not the other compressed formats).
+
+## Reading and writing from pipe connections


Personally, I don't think this is important enough to include in the announcement, and including live code in data import makes me nervous.

hadley · 2019-05-07T13:15:28Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+## Column types
+
+Like readr, vroom guesses the data types of columns as they are read, however sometimes it


Mention the improved heuristic (i.e. it looks at data throughout the file, not just the first n rows).

hadley · 2019-05-07T13:17:26Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+vroom is fast, but how fast?
+We benchmarked vroom using a real world dataset of taxi trip data, with
+14.7 million rows, 11 columns. It contains a mix of numeric and textual data and has a


Suggested change

14.7 million rows, 11 columns. It contains a mix of numeric and textual data and has a

14.7 million rows, 11 columns. It contains a mix of numeric and text data and has a

hadley · 2019-05-07T13:18:38Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+  - Filtering for "UNK" payment, this is 6434 rows (0.0435% of total).
+  - Aggregation of mean fare amount per payment type.
+
+<style>


Add this to #307?

hadley · 2019-05-07T13:19:21Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+Some things to note in the results. The initial reading is much faster in vroom
+than any other method, and most of the manipulations, such as `print()`,
+`head()`, `tail()` and `sample()` are equally fast. However because the


So fast you can't see them in the plot

hadley · 2019-05-07T13:21:07Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+`head()`, `tail()` and `sample()` are equally fast. However because the
+character data is read lazily operations such as `filter()` and `aggregrate()`
+which need character values require additional time.
+However this cost will only occur once, after the values have been read they


So why are both "aggregate" and "filter" quite wide?

(You might also rename "aggregate" to "summarise" because I keep thinking you mean the base function)


batpigandme

Nicely done! I made a few suggestions, most of which are minor (commas, etc). Let me know if you have any questions, or if you're cool with these changes and want me to just go ahead and make them.

batpigandme · 2019-05-07T12:54:43Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+(csv), tab separated (tsv) or fixed width files (fwf) into R.
+
+It performs the
+same function as packages like [readr](http://readr.r-lib.org),


It should either be "functions" (with an s) and packages, or "function like `readr::read_csv()`, `data.table::fread()`..." (i.e. the list should be either all packages or all functions). Another option would be to swap "function" for "role" in this first instance, since you specifically state that `read.csv()` is a function.

Also, maybe "similar to" instead of "same as"? (I'm just thinking about the scope of data.table.)

yeah totally right, will change

batpigandme · 2019-05-07T13:02:41Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+The main reason vroom can be faster is because character data is read from the
+file lazily; you only pay for the data you use. This lazy access is done
+automatically, so no changes to your R data manipulation code are needed.


Hyphenate "data-manipulation" here (technically "R-data-manipulation" too, but I think that looks weird).

batpigandme · 2019-05-07T13:03:12Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+file lazily; you only pay for the data you use. This lazy access is done
+automatically, so no changes to your R data manipulation code are needed.
+
+vroom also provides efficient multi-threaded writing that is multiple times


comma between efficient and multi-threaded

batpigandme · 2019-05-07T13:03:46Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+automatically, so no changes to your R data manipulation code are needed.
+
+vroom also provides efficient multi-threaded writing that is multiple times
+faster on most inputs than the readr writer.


Maybe `readr::write_*()` functions?

batpigandme · 2019-05-07T13:06:16Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+```
+
+The summary message after reading also differs from readr. We hope this output
+gives a more informative indication if the types of your columns are being guessed


"indication as to whether the types"?

batpigandme · 2019-05-07T13:33:53Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+## Speed
+
+vroom is fast, but how fast?
+We benchmarked vroom using a real world dataset of taxi trip data, with


real-world gets hyphenated here, since it's modifying dataset (technically taxi-trip, too, but dealer's choice there)

batpigandme · 2019-05-07T13:34:38Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+vroom is fast, but how fast?
+We benchmarked vroom using a real world dataset of taxi trip data, with
+14.7 million rows, 11 columns. It contains a mix of numeric and textual data and has a


I'd put a comma after "a mix of numeric and textual data" (before "and has..."), since you've got a list-within-a-list situation.

batpigandme · 2019-05-07T13:35:24Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+Some things to note in the results. The initial reading is much faster in vroom
+than any other method, and most of the manipulations, such as `print()`,
+`head()`, `tail()` and `sample()` are equally fast. However because the
+character data is read lazily operations such as `filter()` and `aggregrate()`


offset "which need character values" here with commas

Also, comma after lazily

batpigandme · 2019-05-07T13:36:52Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+`head()`, `tail()` and `sample()` are equally fast. However because the
+character data is read lazily operations such as `filter()` and `aggregrate()`
+which need character values require additional time.
+However this cost will only occur once, after the values have been read they


For this sentence I'd suggest:
However, this cost will only occur once. After the values have been read, they will be stored in memory, and subsequent accesses will be equivalent to other packages.


batpigandme · 2019-05-07T15:12:31Z

content/articles/2019-05-vroom-1-0-0.Rmarkdown

+
+vroom reads rectangular data, such as comma separated
+(csv), tab separated (tsv) or fixed width files (fwf) into R. It performs
+similar roles to functions like [readr::read_csv()](http://readr.r-lib.org),


Whoops — need backticks around readr::read_csv() and data.table::fread()

Add vroom blogpost

b20f3f2

jimhester force-pushed the vroom-1.0.0 branch from dcf4765 to b20f3f2 May 6, 2019 18:28

jimhester added 2 commits May 6, 2019 15:40

Do not auto-print data

6d3ce80

I think it is too cluttered

Add name repair section

e33cda0

hadley approved these changes May 7, 2019


jimhester added 3 commits May 7, 2019 09:32

Need to escape html

9401a6e

Disable the spec message for the name repair block

a217ffb

They obscure the point I think

Ditch color

894c58b

The colored output is kind of beside the point for this, and it makes some things worse, like the separate blocks and comment highlighting.

batpigandme requested changes May 7, 2019


jimhester added 3 commits May 7, 2019 10:58

Changes based on Hadley's comments

1c9c217

Address Mara's comments

eb8b1b2

Bump date

da81ec7

jimhester force-pushed the vroom-1.0.0 branch from 698cb68 to da81ec7 May 7, 2019 15:10

batpigandme approved these changes May 7, 2019


batpigandme reviewed May 7, 2019


jimhester added 2 commits May 7, 2019 11:16

Add missing backticks

de9ea0d

Tweak wording slightly

dbe41c7

jimhester merged commit 114f179 into tidyverse:master May 7, 2019
