factors.Rmd clarification (hadley#577)
bbrewington authored and hadley committed May 4, 2017
1 parent 2afd79c commit 0f956d6
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions factors.Rmd
@@ -144,15 +144,15 @@ When working with factors, the two most common operations are changing the order
It's often useful to change the order of the factor levels in a visualisation. For example, imagine you want to explore the average number of hours spent watching TV per day across religions:

```{r}
-relig <- gss_cat %>%
+relig_summary <- gss_cat %>%
group_by(relig) %>%
summarise(
age = mean(age, na.rm = TRUE),
tvhours = mean(tvhours, na.rm = TRUE),
n = n()
)
-ggplot(relig, aes(tvhours, relig)) + geom_point()
+ggplot(relig_summary, aes(tvhours, relig)) + geom_point()
```

It is difficult to interpret this plot because there's no overall pattern. We can improve it by reordering the levels of `relig` using `fct_reorder()`. `fct_reorder()` takes three arguments:
@@ -163,7 +163,7 @@ It is difficult to interpret this plot because there's no overall pattern. We ca
`x` for each value of `f`. The default value is `median`.

```{r}
-ggplot(relig, aes(tvhours, fct_reorder(relig, tvhours))) +
+ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
geom_point()
```
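
To see what `fct_reorder()` does to the levels themselves, outside of a plot, here is a minimal sketch on toy data. The `toy` data frame and its `group` and `value` columns are hypothetical and not part of `gss_cat`; the sketch assumes the forcats package is installed. Each level has exactly one value, so the default summary function (`median`) returns that value unchanged.

```{r}
library(forcats)

# Hypothetical toy data: three groups with one summary value each.
toy <- data.frame(
  group = factor(c("a", "b", "c")),
  value = c(3, 1, 2)
)

# Reorder the levels of `group` by increasing `value`.
reordered <- fct_reorder(toy$group, toy$value)

levels(toy$group)  # original order: "a" "b" "c"
levels(reordered)  # reordered by value: "b" "c" "a"
```

Only the order of the levels changes; the values stored in the factor stay the same, which is why the plot moves rows around without changing any points.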

@@ -172,31 +172,31 @@ Reordering religion makes it much easier to see that people in the "Don't know"
As you start making more complicated transformations, I'd recommend moving them out of `aes()` and into a separate `mutate()` step. For example, you could rewrite the plot above as:

```{r, eval = FALSE}
-relig %>%
+relig_summary %>%
mutate(relig = fct_reorder(relig, tvhours)) %>%
ggplot(aes(tvhours, relig)) +
geom_point()
```
What if we create a similar plot looking at how average age varies across reported income level?

```{r}
-rincome <- gss_cat %>%
+rincome_summary <- gss_cat %>%
group_by(rincome) %>%
summarise(
age = mean(age, na.rm = TRUE),
tvhours = mean(tvhours, na.rm = TRUE),
n = n()
)
-ggplot(rincome, aes(age, fct_reorder(rincome, age))) + geom_point()
+ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) + geom_point()
```

Here, arbitrarily reordering the levels isn't a good idea! That's because `rincome` already has a principled order that we shouldn't mess with. Reserve `fct_reorder()` for factors whose levels are arbitrarily ordered.

However, it does make sense to pull "Not applicable" to the front with the other special levels. You can use `fct_relevel()`. It takes a factor, `f`, and then any number of levels that you want to move to the front of the line.

```{r}
-ggplot(rincome, aes(age, fct_relevel(rincome, "Not applicable"))) +
+ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) +
geom_point()
```
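
The same idea can be checked on a small factor without plotting. This is a sketch under stated assumptions: the `income` factor and its level names are made up for illustration and are not the real `rincome` levels, and the forcats package is assumed to be installed.

```{r}
library(forcats)

# Hypothetical income-like factor whose levels already have a principled
# order, with a special level at the end.
income <- factor(
  c("Low", "Medium", "High", "Not applicable"),
  levels = c("Low", "Medium", "High", "Not applicable")
)

# Move "Not applicable" to the front; the remaining levels keep their order.
releveled <- fct_relevel(income, "Not applicable")

levels(releveled)  # "Not applicable" "Low" "Medium" "High"
```

Unlike `fct_reorder()`, this leaves the principled order of the other levels untouched and moves only the levels you name.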
