Commit

differences for PR #263
actions-user committed Apr 3, 2024
1 parent 51e4599 commit 5957c59
Showing 2 changed files with 21 additions and 11 deletions.
30 changes: 20 additions & 10 deletions 06-free-text.md
@@ -116,7 +116,8 @@ We now use the `tr` command, used for translating or
deleting characters. Type and run:

```bash
-$ tr -d '[:punct:]\r' < gulliver-noheadfoot.txt > gulliver-noheadfootpunct.txt
+$ tr -d '[:punct:]\r' < gulliver-noheadfoot.txt > \
+> gulliver-noheadfootpunct.txt
```

This uses the translate command and a special syntax to remove all punctuation
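
As a rough sketch of what this hunk changes, the wrapped command should behave exactly like the one-line form, because a trailing `\` continues the command on the next line. The leading `$` and `>` shown in the lesson are shell prompts and are not typed. The sample string and the variable names below are hypothetical stand-ins for the lesson's file.

```bash
# Hypothetical sample text standing in for gulliver-noheadfoot.txt.
sample='Hello, World! It is "fine".'

# One-line form: delete punctuation and carriage returns.
one_line=$(printf '%s\r\n' "$sample" | tr -d '[:punct:]\r')

# Wrapped form: the trailing backslash continues the command on the next line.
wrapped=$(printf '%s\r\n' "$sample" | tr -d \
  '[:punct:]\r')

echo "$one_line"   # prints: Hello World It is fine
```

Both variables should hold the same text, which suggests the commit only changes how the command is laid out on the page, not what it does.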
@@ -126,7 +127,8 @@ It also requires the use of both the output redirect `>` we have seen and the in
Finally regularise the text by removing all the uppercase lettering.

```bash
-$ tr '[:upper:]' '[:lower:]' < gulliver-noheadfootpunct.txt > gulliver-clean.txt
+$ tr '[:upper:]' '[:lower:]' < gulliver-noheadfootpunct.txt > \
+> gulliver-clean.txt
```

Open the `gulliver-clean.txt` in a text editor. Note how the text has been transformed ready for analysis.
@@ -136,7 +138,8 @@ Open the `gulliver-clean.txt` in a text editor. Note how the text has been trans
We are now ready to pull the text apart.

```bash
-$ tr ' ' '\n' < gulliver-clean.txt | sort | uniq -c | sort -nr > gulliver-final.txt
+$ tr ' ' '\n' < gulliver-clean.txt | sort | \
+> uniq -c | sort -nr > gulliver-final.txt
```

Here we've made extended use of the pipes we saw in [Counting and mining with the shell](05-counting-mining.md). The first part of this script uses the translate command again, this time to translate every blank space into `\n` which renders as a new line. Every word in the file will at this stage have its own line.
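
To make the pipeline described above concrete, here is a minimal sketch run on a one-line sample instead of `gulliver-clean.txt`. The sample sentence and the variable names `counts` and `top` are hypothetical; `awk` is used only to pick fields out of the result and is not part of the lesson's command.

```bash
# Hypothetical one-line sample standing in for gulliver-clean.txt.
counts=$(printf 'the cat saw the dog and the cat\n' | \
  tr ' ' '\n' | sort | uniq -c | sort -nr)

# Each line is a count followed by a word, most frequent first.
echo "$counts"

# Pull out the most frequent word from the first line.
top=$(echo "$counts" | head -n 1 | awk '{print $2}')
echo "$top"   # prints: the
```

The first `sort` is needed because `uniq -c` only collapses adjacent duplicate lines; the final `sort -nr` then orders the counted words numerically, largest first.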
@@ -164,7 +167,8 @@ As a reminder, use the text editor of your choice to write a file that looks lik

```bash
#!/bin/bash
-# This script removes quote marks from gulliver-clean.txt and saves the result as gulliver-noquotes.txt
+# This script removes quote marks from gulliver-clean.txt
+# and saves the result as gulliver-noquotes.txt
(replace this line with your solution)
```

@@ -180,8 +184,10 @@ bash remove-quotes.sh

```bash
#!/bin/bash
-# This script removes quote marks from gulliver-clean.txt and saves the result as gulliver-noquotes.txt
-sed -Ee 's/[""‘']//g' gulliver-clean.txt > gulliver-noquotes.txt
+# This script removes quote marks from gulliver-clean.txt
+# and saves the result as gulliver-noquotes.txt
+sed -Ee 's/[""‘']//g' gulliver-clean.txt > \
+gulliver-noquotes.txt
```
If this doesn't work for you, you might need to check whether your text editor can
@@ -242,7 +248,8 @@ We're going to start by using the `tr` command, used for translating or
deleting characters. Type and run:
```bash
-$ tr -d '[:punct:]' < 201403160_01_text.json > 201403160_01_text-nopunct.txt
+$ tr -d '[:punct:]' < 201403160_01_text.json > \
+> 201403160_01_text-nopunct.txt
```
This uses the translate command and a special syntax to remove all punctuation.
@@ -251,7 +258,8 @@ It also requires the use of both the output redirect `>` we have seen and the in
Finally regularise the text by removing all the uppercase lettering.
```bash
-$ tr '[:upper:]' '[:lower:]' < 201403160_01_text-nopunct.txt > 201403160_01_text-clean.txt
+$ tr '[:upper:]' '[:lower:]' < 201403160_01_text-nopunct.txt > \
+> 201403160_01_text-clean.txt
```
Open the `201403160_01_text-clean.txt` in a text editor. Note how the text has been transformed ready for analysis.
@@ -261,7 +269,8 @@ Open the `201403160_01_text-clean.txt` in a text editor. Note how the text has b
We are now ready to pull the text apart.
```bash
-$ tr ' ' '\n' < 201403160_01_text-clean.txt | sort | uniq -c | sort -nr > 201403160_01_text-final.txt
+$ tr ' ' '\n' < 201403160_01_text-clean.txt | sort | \
+> uniq -c | sort -nr > 201403160_01_text-final.txt
```
Here we've made extended use of the pipes we saw in [Counting and mining with the shell](05-counting-mining.md). The first part of this script uses the translate command again, this time to translate every blank space into `\n` which renders as a new line. Every word in the file will at this stage have its own line.
@@ -373,7 +382,8 @@ Open the `diary-clean.txt` in a text editor. Note how the text has been transfor
We are now ready to pull the text apart.
```bash
-$ tr ' ' '\n' < diary-clean.txt | sort | uniq -c | sort -nr > diary-final.txt
+$ tr ' ' '\n' < diary-clean.txt | sort | \
+> uniq -c | sort -nr > diary-final.txt
```
Here we've made extended use of the pipes we saw in [Counting and mining with the shell](05-counting-mining.md). The first part of this script uses the translate command again, this time to translate every blank space into `\n` which renders as a new line. Every word in the file will at this stage have its own line.
2 changes: 1 addition & 1 deletion md5sum.txt
@@ -9,7 +9,7 @@
"episodes/03-working-with-files-and-folders.md" "f28760ce8c1c6e3b96c8b6ccacc55772" "site/built/03-working-with-files-and-folders.md" "2024-02-23"
"episodes/04-loops.md" "ef86d9f8b71733dea97b44a886391bdd" "site/built/04-loops.md" "2024-02-23"
"episodes/05-counting-mining.md" "f61fde3e769614d41d8b22e09f38d1e7" "site/built/05-counting-mining.md" "2024-02-23"
-"episodes/06-free-text.md" "143b9518631bcf5b114d1f432e5a9c25" "site/built/06-free-text.md" "2023-05-08"
+"episodes/06-free-text.md" "34bfa6285f7425a17955460e6f9cffbe" "site/built/06-free-text.md" "2024-04-03"
"instructors/instructor-notes.md" "c317e03b34390725b50f49df1bf943b1" "site/built/instructor-notes.md" "2024-02-23"
"learners/discuss.md" "498cf8840b7e5bb0897f7c15af83c052" "site/built/discuss.md" "2023-08-29"
"learners/reference.md" "d4c4195030dad8f532e210812c9f90f2" "site/built/reference.md" "2023-09-08"