Skip to content
This repository has been archived by the owner on Jun 5, 2024. It is now read-only.

Latest commit

 

History

History
3250 lines (2588 loc) · 82.8 KB

gnu_sed.md

File metadata and controls

3250 lines (2588 loc) · 82.8 KB





ℹ️ ℹ️ This chapter has been converted into a better formatted ebook: https://learnbyexample.github.io/learn_gnused/. The ebook also has content updated for newer version of the commands, includes exercises, solutions, etc.

For markdown source and links to buy pdf/epub versions, see: https://github.com/learnbyexample/learn_gnused





GNU sed

Table of Contents


$ sed --version | head -n1
sed (GNU sed) 4.2.2

$ man sed
SED(1)                           User Commands                          SED(1)

NAME
       sed - stream editor for filtering and transforming text

SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...

DESCRIPTION
       Sed  is a stream editor.  A stream editor is used to perform basic text
       transformations on an input stream (a file or input from  a  pipeline).
       While  in  some  ways similar to an editor which permits scripted edits
       (such as ed), sed works by making only one pass over the input(s),  and
       is consequently more efficient.  But it is sed's ability to filter text
       in a pipeline which particularly distinguishes it from other  types  of
       editors.
...

Note: Multiline and manipulating pattern space with h,x,D,G,H,P etc is not covered in this chapter and examples/information is based on ASCII encoded text input only


Simple search and replace

Detailed examples for substitute command will be covered in later sections, syntax is

s/REGEXP/REPLACEMENT/FLAGS

The / character is idiomatically used as delimiter character. See also Using different delimiter for REGEXP


editing stdin

$ # sample command output to be edited
$ seq 10 | paste -sd,
1,2,3,4,5,6,7,8,9,10

$ # change only first ',' to ' : '
$ seq 10 | paste -sd, | sed 's/,/ : /'
1 : 2,3,4,5,6,7,8,9,10

$ # change all ',' to ' : ' by using 'g' modifier
$ seq 10 | paste -sd, | sed 's/,/ : /g'
1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10

Note: As a good practice, all examples use single quotes around arguments to prevent shell interpretation. See Shell substitutions section on use of double quotes


editing file input

  • By default newline character is the line separator
  • See Regular Expressions section for qualifying search terms, for ex
    • word boundaries to distinguish between 'hi', 'this', 'his', 'history', etc
    • multiple search terms, specific set of character, etc
$ cat greeting.txt
Hi there
Have a nice day

$ # change first 'e' in each line to 'E'
$ sed 's/e/E/' greeting.txt
Hi thEre
HavE a nice day

$ # change first 'nice day' in each line to 'safe journey'
$ sed 's/nice day/safe journey/' greeting.txt
Hi there
Have a safe journey

$ # change all 'e' to 'E' and save changed text to another file
$ sed 's/e/E/g' greeting.txt > out.txt
$ cat out.txt
Hi thErE
HavE a nicE day

Inplace file editing

  • In previous section, the output from sed was displayed on stdout or saved to another file
  • To write the changes back to original file, use -i option

Note:

  • Refer to man sed for details of how to use the -i option. It varies with different sed implementations. As mentioned at start of this chapter, sed (GNU sed) 4.2.2 is being used here
  • See also unix.stackexchange - working with symlinks

With backup

  • When extension is given, the original input file is preserved with name changed according to extension provided
$ # '.bkp' is extension provided
$ sed -i.bkp 's/Hi/Hello/' greeting.txt
$ # output from sed is written back to 'greeting.txt'
$ cat greeting.txt
Hello there
Have a nice day

$ # original file is preserved in 'greeting.txt.bkp'
$ cat greeting.txt.bkp
Hi there
Have a nice day

Without backup

  • Use this option with caution, changes made cannot be undone
$ sed -i 's/nice day/safe journey/' greeting.txt

$ # note, 'Hi' was already changed to 'Hello' in previous example
$ cat greeting.txt
Hello there
Have a safe journey

Multiple files

  • Multiple input files are treated individually and changes are written back to respective files
$ cat f1
I ate 3 apples
$ cat f2
I bought two bananas and 3 mangoes

$ # -i can be used with or without backup
$ sed -i 's/3/three/' f1 f2
$ cat f1
I ate three apples
$ cat f2
I bought two bananas and three mangoes

Prefix backup name

  • A * in argument given to -i will get expanded to input filename
  • This way, one can add prefix instead of suffix for backup
$ cat var.txt
foo
bar
baz

$ sed -i'bkp.*' 's/foo/hello/' var.txt
$ cat var.txt
hello
bar
baz

$ cat bkp.var.txt
foo
bar
baz

Place backups in directory

  • * also allows to specify an existing directory to place the backups instead of current working directory
$ mkdir bkp_dir
$ sed -i'bkp_dir/*' 's/bar/hi/' var.txt
$ cat var.txt
hello
hi
baz

$ cat bkp_dir/var.txt
hello
bar
baz

$ # extensions can be added as well
$ # bkp_dir/*.bkp for suffix
$ # bkp_dir/bkp.* for prefix
$ # bkp_dir/bkp.*.2017 for both and so on

Line filtering options

  • By default, sed acts on entire file. Often, one needs to extract or change only specific lines based on text search, line numbers, lines between two patterns, etc
  • This filtering is much like using grep, head and tail commands in many ways and there are even more features
    • Use sed for inplace editing, the filtered lines to be transformed etc. Not as substitute for those commands

Print command

  • It is usually used in conjunction with -n option
  • By default, sed prints every input line, including any changes made by commands like substitution
    • printing here refers to line being part of sed output which may be shown on terminal, redirected to file, etc
  • Using -n option and p command together, only specific lines needed can be filtered
  • Examples below use the /REGEXP/ addressing, other forms will be seen in sections to follow
$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

$ # all lines containing the string 'are'
$ # same as: grep 'are' poem.txt
$ sed -n '/are/p' poem.txt
Roses are red,
Violets are blue,
And so are you.

$ # all lines containing the string 'so are'
$ # same as: grep 'so are' poem.txt
$ sed -n '/so are/p' poem.txt
And so are you.
  • Using print and substitution together
$ # print only lines on which substitution happens
$ sed -n 's/are/ARE/p' poem.txt
Roses ARE red,
Violets ARE blue,
And so ARE you.

$ # if line contains 'are', perform given command
$ # print only if substitution succeeds
$ sed -n '/are/ s/so/SO/p' poem.txt
And SO are you.
  • Duplicating every input line
$ # note, -n is not used and no filtering applied
$ seq 3 | sed 'p'
1
1
2
2
3
3

Delete command

  • By default, sed prints every input line, including any changes like substitution
  • Using the d command, those specific lines will NOT be printed
$ # same as: grep -v 'are' poem.txt
$ sed '/are/d' poem.txt
Sugar is sweet,

$ # same as: seq 5 | grep -v '3'
$ seq 5 | sed '/3/d'
1
2
4
5
  • Modifier I allows to filter lines in case-insensitive way
  • See Regular Expressions section for more details
$ # /rose/I means match the string 'rose' irrespective of case
$ sed '/rose/Id' poem.txt
Violets are blue,
Sugar is sweet,
And so are you.

Quit commands

  • Exit sed without processing further input
$ # same as: seq 23 45 | head -n5
$ # remember that printing is default action if -n is not used
$ # here, 5 is line number based addressing
$ seq 23 45 | sed '5q'
23
24
25
26
27
  • Q is similar to q but won't print the matching line
$ seq 23 45 | sed '5Q'
23
24
25
26

$ # useful to print from beginning of file up to but not including line matching REGEXP
$ sed '/is/Q' poem.txt
Roses are red,
Violets are blue,
  • Use tac to get all lines starting from last occurrence of search string
$ # all lines from last occurrence of '7'
$ seq 50 | tac | sed '/7/q' | tac
47
48
49
50

$ # all lines from last occurrence of '7' excluding line with '7'
$ seq 50 | tac | sed '/7/Q' | tac
48
49
50

Note


Negating REGEXP address

  • Use ! to invert the specified address
$ # same as: sed -n '/so are/p' poem.txt
$ sed '/so are/!d' poem.txt
And so are you.

$ # same as: sed '/are/d' poem.txt
$ sed -n '/are/!p' poem.txt
Sugar is sweet,

Combining multiple REGEXP

$ # each command as argument to -e option
$ sed -n -e '/blue/p' -e '/you/p' poem.txt
Violets are blue,
And so are you.

$ # each command separated by ;
$ # not all commands can be specified so
$ sed -n '/blue/p; /you/p' poem.txt
Violets are blue,
And so are you.

$ # each command separated by literal newline character
$ # might depend on whether the shell allows such multiline command
$ sed -n '
/blue/p
/you/p
' poem.txt
Violets are blue,
And so are you.
  • Use {} command grouping for logical AND
$ # same as: grep 'are' poem.txt | grep 'And'
$ # space between /REGEXP/ and {} is optional
$ sed -n '/are/ {/And/p}' poem.txt
And so are you.

$ # same as: grep 'are' poem.txt | grep -v 'so'
$ sed -n '/are/ {/so/!p}' poem.txt
Roses are red,
Violets are blue,

$ # same as: grep -v 'red' poem.txt | grep -v 'blue'
$ sed -n '/red/!{/blue/!p}' poem.txt
Sugar is sweet,
And so are you.
$ # many ways to do it, use whatever feels easier to construct
$ # sed -e '/red/d' -e '/blue/d' poem.txt
$ # grep -v -e 'red' -e 'blue' poem.txt
$ # multiple commands can lead to duplicatation
$ sed -n '/blue/p; /t/p' poem.txt
Violets are blue,
Violets are blue,
Sugar is sweet,
$ # in such cases, use regular expressions instead
$ sed -nE '/blue|t/p;' poem.txt
Violets are blue,
Sugar is sweet,

$ sed -nE '/red|blue/!p' poem.txt
Sugar is sweet,
And so are you.

$ sed -n '/so/b; /are/p' poem.txt
Roses are red,
Violets are blue,

Filtering by line number

$ # here, 2 represents the address for print command, similar to /REGEXP/p
$ # same as: head -n2 poem.txt | tail -n1
$ sed -n '2p' poem.txt
Violets are blue,

$ # print 2nd and 4th line
$ sed -n '2p; 4p' poem.txt
Violets are blue,
And so are you.

$ # same as: tail -n1 poem.txt
$ sed -n '$p' poem.txt
And so are you.

$ # delete except 3rd line
$ sed '3!d' poem.txt
Sugar is sweet,

$ # substitution only on 2nd line
$ sed '2 s/are/ARE/' poem.txt
Roses are red,
Violets ARE blue,
Sugar is sweet,
And so are you.
  • For large input files, combine p with q for speedy exit
  • sed would immediately quit without processing further input lines when q is used
$ seq 3542 4623452 | sed -n '2452{p;q}'
5993

$ seq 3542 4623452 | sed -n '250p; 2452{p;q}'
3791
5993

$ # here is a sample time comparison
$ time seq 3542 4623452 | sed -n '2452{p;q}' > /dev/null

real    0m0.003s
user    0m0.000s
sys     0m0.000s
$ time seq 3542 4623452 | sed -n '2452p' > /dev/null

real    0m0.334s
user    0m0.396s
sys     0m0.024s
  • mimicking head command using q
$ # same as: seq 23 45 | head -n5
$ # remember that printing is default action if -n is not used
$ seq 23 45 | sed '5q'
23
24
25
26
27

Print only line number

$ # gives both line number and matching line
$ grep -n 'blue' poem.txt
2:Violets are blue,

$ # gives only line number of matching line
$ sed -n '/blue/=' poem.txt
2

$ sed -n '/are/=' poem.txt
1
2
4
  • If needed, matching line can also be printed. But there will be newline separation
$ sed -n '/blue/{=;p}' poem.txt
2
Violets are blue,

$ # or
$ sed -n '/blue/{p;=}' poem.txt
Violets are blue,
2

Address range

  • So far, we've seen how to filter specific line based on REGEXP and line numbers
  • sed also allows to combine them to enable selecting a range of lines
  • Consider the sample input file for this section
$ cat addr_range.txt
Hello World

Good day
How are you

Just do-it
Believe it

Today is sunny
Not a bit funny
No doubt you like it too

Much ado about nothing
He he he
  • Range defined by start and end REGEXP
  • For other cases like getting lines without the line matching start and/or end, unbalanced start/end, when end REGEXP doesn't match, etc see Lines between two REGEXPs section
$ sed -n '/is/,/like/p' addr_range.txt
Today is sunny
Not a bit funny
No doubt you like it too

$ sed -n '/just/I,/believe/Ip' addr_range.txt
Just do-it
Believe it

$ # the second REGEXP will always be checked after the line matching first address
$ sed -n '/No/,/No/p' addr_range.txt
Not a bit funny
No doubt you like it too

$ # all the matching ranges will be printed
$ sed -n '/you/,/do/p' addr_range.txt
How are you

Just do-it
No doubt you like it too

Much ado about nothing
  • Range defined by start and end line numbers
$ # print lines numbered 3 to 7
$ sed -n '3,7p' addr_range.txt
Good day
How are you

Just do-it
Believe it

$ # print lines from line number 13 to last line
$ sed -n '13,$p' addr_range.txt
Much ado about nothing
He he he

$ # delete lines numbered 2 to 13
$ sed '2,13d' addr_range.txt
Hello World
He he he
  • Range defined by mix of line number and REGEXP
$ sed -n '3,/do/p' addr_range.txt
Good day
How are you

Just do-it

$ sed -n '/Today/,$p' addr_range.txt
Today is sunny
Not a bit funny
No doubt you like it too

Much ado about nothing
He he he
  • Negating address range, just add ! to end of address range
$ # same as: seq 10 | sed '3,7d'
$ seq 10 | sed -n '3,7!p'
1
2
8
9
10

$ # same as: sed '/Today/,$d' addr_range.txt
$ sed -n '/Today/,$!p' addr_range.txt
Hello World

Good day
How are you

Just do-it
Believe it

Relative addressing

  • Prefixing + to a number for second address gives relative filtering
  • Similar to using grep -A<num> --no-group-separator 'REGEXP' but grep merges adjacent groups while sed does not
$ # line matching 'is' and 2 lines after
$ sed -n '/is/,+2p' addr_range.txt
Today is sunny
Not a bit funny
No doubt you like it too

$ # note that all matching ranges will be filtered
$ sed -n '/do/,+2p' addr_range.txt
Just do-it
Believe it

No doubt you like it too

Much ado about nothing
$ sed -n '3,+4p' addr_range.txt
Good day
How are you

Just do-it
Believe it
  • Another relative format is i~j which acts on ith line and i+j, i+2j, i+3j, etc
    • 1~2 means 1st, 3rd, 5th, 7th, etc (i.e odd numbered lines)
    • 5~3 means 5th, 8th, 11th, etc
$ # match odd numbered lines
$ # for even, use 2~2
$ seq 10 | sed -n '1~2p'
1
3
5
7
9

$ # match line numbers: 2, 2+1*4, 2+1*4, etc
$ seq 10 | sed -n '2~4p'
2
6
10
  • If ~j is specified after , then meaning changes completely
  • After the matching line based on number or REGEXP of start address, the closest line number multiple of j will mark end address
$ # 2nd line is start address
$ # closest multiple of 4 is 4th line
$ seq 10 | sed -n '2,~4p'
2
3
4
$ # closest multiple of 4 is 8th line
$ seq 10 | sed -n '5,~4p'
5
6
7
8

$ # line matching on `Just` is 6th line, so ending is 10th line
$ sed -n '/Just/,~5p' addr_range.txt
Just do-it
Believe it

Today is sunny
Not a bit funny

Using different delimiter for REGEXP

$ # instead of this
$ echo '/home/learnbyexample/reports' | sed 's/\/home\/learnbyexample\//~\//'
~/reports

$ # use a different delimiter
$ echo '/home/learnbyexample/reports' | sed 's#/home/learnbyexample/#~/#'
~/reports
  • For REGEXP used in address matching, syntax is a bit different \<char>REGEXP<char>
$ printf '/foo/bar/1\n/foo/baz/1\n'
/foo/bar/1
/foo/baz/1

$ printf '/foo/bar/1\n/foo/baz/1\n' | sed -n '\;/foo/bar/;p'
/foo/bar/1

Regular Expressions

  • By default, sed treats REGEXP as BRE (Basic Regular Expression)
  • The -E option enables ERE (Extended Regular Expression) which in GNU sed's case only differs in how meta characters are used, no difference in functionalities
    • Initially GNU sed only had -r option to enable ERE and man sed doesn't even mention -E
    • Other sed versions use -E and grep uses -E as well. So -r won't be used in examples in this tutorial
    • See also sed manual - BRE-vs-ERE
  • See sed manual - Regular Expressions for more details

Line Anchors

  • Often, search must match from beginning of line or towards end of line
  • For example, an integer variable declaration in C will start with optional white-space, the keyword int, white-space and then variable(s)
    • This way one can avoid matching declarations inside single line comments as well
  • Similarly, one might want to match a variable at end of statement

Consider the input file and sample substitution without using any anchoring

$ cat anchors.txt
cat and dog
too many cats around here
to concatenate, use the cmd cat
catapults laid waste to the village
just scat and quit bothering me
that is quite a fabricated tale
try the grape variety muscat

$ # without anchors, substitution will replace wherever the string is found
$ sed 's/cat/XXX/g' anchors.txt
XXX and dog
too many XXXs around here
to conXXXenate, use the cmd XXX
XXXapults laid waste to the village
just sXXX and quit bothering me
that is quite a fabriXXXed tale
try the grape variety musXXX
  • The meta character ^ forces REGEXP to match only at start of line
$ # filtering lines starting with 'cat'
$ sed -n '/^cat/p' anchors.txt
cat and dog
catapults laid waste to the village

$ # replace only at start of line
$ # g modifier not needed as there can only be single match at start of line
$ sed 's/^cat/XXX/' anchors.txt
XXX and dog
too many cats around here
to concatenate, use the cmd cat
XXXapults laid waste to the village
just scat and quit bothering me
that is quite a fabricated tale
try the grape variety muscat

$ # add something to start of line
$ echo 'Have a good day' | sed 's/^/Hi! /'
Hi! Have a good day
  • The meta character $ forces REGEXP to match only at end of line
$ # filtering lines ending with 'cat'
$ sed -n '/cat$/p' anchors.txt
to concatenate, use the cmd cat
try the grape variety muscat

$ # replace only at end of line
$ sed 's/cat$/YYY/' anchors.txt
cat and dog
too many cats around here
to concatenate, use the cmd YYY
catapults laid waste to the village
just scat and quit bothering me
that is quite a fabricated tale
try the grape variety musYYY

$ # add something to end of line
$ echo 'Have a good day' | sed 's/$/. Cya later/'
Have a good day. Cya later

Word Anchors

  • A word character is any alphabet (irrespective of case) or any digit or the underscore character
  • The word anchors help in matching or not matching boundaries of a word
    • For example, to distinguish between par, spar and apparent
  • \b matches word boundary
    • \ is meta character and certain combinations like \b and \B have special meaning
$ # words ending with 'cat'
$ sed -n 's/cat\b/XXX/p' anchors.txt
XXX and dog
to concatenate, use the cmd XXX
just sXXX and quit bothering me
try the grape variety musXXX

$ # words starting with 'cat'
$ sed -n 's/\bcat/YYY/p' anchors.txt
YYY and dog
too many YYYs around here
to concatenate, use the cmd YYY
YYYapults laid waste to the village

$ # only whole words
$ sed -n 's/\bcat\b/ZZZ/p' anchors.txt
ZZZ and dog
to concatenate, use the cmd ZZZ

$ # word is made up of alphabets, numbers and _
$ echo 'foo, foo_bar and foo1' | sed 's/\bfoo\b/baz/g'
baz, foo_bar and foo1
  • \B is opposite of \b, i.e it doesn't match word boundaries
$ # substitute only if 'cat' is surrounded by word characters
$ sed -n 's/\Bcat\B/QQQ/p' anchors.txt
to conQQQenate, use the cmd cat
that is quite a fabriQQQed tale

$ # substitute only if 'cat' is not start of word
$ sed -n 's/\Bcat/RRR/p' anchors.txt
to conRRRenate, use the cmd cat
just sRRR and quit bothering me
that is quite a fabriRRRed tale
try the grape variety musRRR

$ # substitute only if 'cat' is not end of word
$ sed -n 's/cat\B/SSS/p' anchors.txt
too many SSSs around here
to conSSSenate, use the cmd cat
SSSapults laid waste to the village
that is quite a fabriSSSed tale
  • One can also use these alternatives for \b
    • \< for start of word
    • \> for end of word
$ # same as: sed 's/\bcat\b/X/g'
$ echo 'concatenate cat scat cater' | sed 's/\<cat\>/X/g'
concatenate X scat cater

$ # add something to both start/end of word
$ echo 'hi foo_baz 3b' | sed 's/\b/:/g'
:hi: :foo_baz: :3b:

$ # add something only at start of word
$ echo 'hi foo_baz 3b' | sed 's/\</:/g'
:hi :foo_baz :3b

$ # add something only at end of word
$ echo 'hi foo_baz 3b' | sed 's/\>/:/g'
hi: foo_baz: 3b:

Matching the meta characters

  • Since meta characters like ^, $, \ etc have special meaning in REGEXP, they have to be escaped using \ to match them literally
$ # here, '^' will match only start of line
$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | sed 's/^/**/g'
**(a+b)^2 = a^2 + b^2 + 2ab

$ # '\` before '^' will match '^' literally
$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | sed 's/\^/**/g'
(a+b)**2 = a**2 + b**2 + 2ab

$ # to match '\' use '\\'
$ echo 'foo\bar' | sed 's/\\/ /'
foo bar

$ echo 'pa$$' | sed 's/$/s/g'
pa$$s
$ echo 'pa$$' | sed 's/\$/s/g'
pass

$ # '^' has special meaning only at start of REGEXP
$ # similarly, '$' has special meaning only at end of REGEXP
$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | sed 's/a^2/A^2/g'
(a+b)^2 = A^2 + b^2 + 2ab
  • Certain characters like & and \ have special meaning in REPLACEMENT section of substitute as well. They too have to be escaped using \
  • And the delimiter character has to be escaped of course
  • See back reference section for use of & in REPLACEMENT section
$ # & will refer to entire matched string of REGEXP section
$ echo 'foo and bar' | sed 's/and/"&"/'
foo "and" bar
$ echo 'foo and bar' | sed 's/and/"\&"/'
foo "&" bar

$ # use different delimiter where required
$ echo 'a b' | sed 's/ /\//'
a/b
$ echo 'a b' | sed 's# #/#'
a/b

$ # use \\ to represent literal \
$ echo '/foo/bar/baz' | sed 's#/#\\#g'
\foo\bar\baz

Alternation

  • Two or more REGEXP can be combined as logical OR using the | meta character
    • syntax is \| for BRE and | for ERE
  • Each side of | is complete regular expression with their own start/end anchors
  • How each part of alternation is handled and order of evaluation/output is beyond the scope of this tutorial
    • See this for more info on this topic.
$ # BRE
$ sed -n '/red\|blue/p' poem.txt
Roses are red,
Violets are blue,

$ # ERE
$ sed -nE '/red|blue/p' poem.txt
Roses are red,
Violets are blue,

$ # filter lines starting or ending with 'cat'
$ sed -nE '/^cat|cat$/p' anchors.txt
cat and dog
to concatenate, use the cmd cat
catapults laid waste to the village
try the grape variety muscat

$ # g modifier is needed for more than one replacement
$ echo 'foo and temp and baz' | sed -E 's/foo|temp|baz/XYZ/'
XYZ and temp and baz
$ echo 'foo and temp and baz' | sed -E 's/foo|temp|baz/XYZ/g'
XYZ and XYZ and XYZ

The dot meta character

  • The . meta character matches any character once, including newline
$ # replace all sequence of 3 characters starting with 'c' and ending with 't'
$ echo 'coat cut fit c#t' | sed 's/c.t/XYZ/g'
coat XYZ fit XYZ

$ # replace all sequence of 4 characters starting with 'c' and ending with 't'
$ echo 'coat cut fit c#t' | sed 's/c..t/ABCD/g'
ABCD cut fit c#t

$ # space, tab etc are also characters which will be matched by '.'
$ echo 'coat cut fit c#t' | sed 's/t.f/IJK/g'
coat cuIJKit c#t

Quantifiers

All quantifiers in sed are greedy, i.e longest match wins as long as overall REGEXP is satisfied and precedence is left to right. In this section, we'll cover usage of quantifiers on characters

  • ? will try to match 0 or 1 time
  • For BRE, use \?
$ printf 'late\npale\nfactor\nrare\nact\n'
late
pale
factor
rare
act

$ # same as using: sed -nE '/at|act/p'
$ printf 'late\npale\nfactor\nrare\nact\n' | sed -nE '/ac?t/p'
late
factor
act

$ # greediness comes in handy in some cases
$ # problem: '<' has to be replaced with '\<' only if not preceded by '\'
$ echo 'blah \< foo bar < blah baz <'
blah \< foo bar < blah baz <
$ # this won't work as '\<' gets replaced with '\\<'
$ echo 'blah \< foo bar < blah baz <' | sed -E 's/</\\</g'
blah \\< foo bar \< blah baz \<
$ # by using '\\?<' both '\<' and '<' gets replaced by '\<'
$ echo 'blah \< foo bar < blah baz <' | sed -E 's/\\?</\\</g'
blah \< foo bar \< blah baz \<
  • * will try to match 0 or more times
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n'
abc
ac
adc
abbc
bbb
bc
abbbbbc

$ # match 'a' and 'c' with any number of 'b' in between
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -n '/ab*c/p'
abc
ac
abbc
abbbbbc

$ # delete from start of line to 'te'
$ echo 'that is quite a fabricated tale' | sed 's/.*te//'
d tale
$ # delete from start of line to 'te '
$ echo 'that is quite a fabricated tale' | sed 's/.*te //'
a fabricated tale
$ # delete from first 'f' in the line to end of line
$ echo 'that is quite a fabricated tale' | sed 's/f.*//'
that is quite a 
  • + will try to match 1 or more times
  • For BRE, use \+
$ # match 'a' and 'c' with at least one 'b' in between
$ # BRE
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -n '/ab\+c/p'
abc
abbc
abbbbbc

$ # ERE
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -nE '/ab+c/p'
abc
abbc
abbbbbc
  • For more precise control on number of times to match, use {}
$ # exactly 5 times
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -nE '/ab{5}c/p'
abbbbbc

$ # between 1 to 3 times, inclusive of 1 and 3
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -nE '/ab{1,3}c/p'
abc
abbc

$ # maximum of 2 times, including 0 times
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -nE '/ab{,2}c/p'
abc
ac
abbc

$ # minimum of 2 times
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -nE '/ab{2,}c/p'
abbc
abbbbbc

$ # BRE
$ printf 'abc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n' | sed -n '/ab\{2,\}c/p'
abbc
abbbbbc

Character classes

  • The . meta character provides a way to match any character
  • Character class provides a way to match any character among a specified set of characters enclosed within []
$ # same as: sed -nE '/lane|late/p'
$ printf 'late\nlane\nfate\nfete\n' | sed -n '/la[nt]e/p'
late
lane

$ printf 'late\nlane\nfate\nfete\n' | sed -n '/[fl]a[nt]e/p'
late
lane
fate

$ # quantifiers can be added similar to using for any other character
$ # filter lines made up entirely of digits, containing at least one digit
$ printf 'cat5\nfoo\n123\n42\n' | sed -nE '/^[0123456789]+$/p'
123
42
$ # filter lines made up entirely of digits, containing at least three digits
$ printf 'cat5\nfoo\n123\n42\n' | sed -nE '/^[0123456789]{3,}$/p'
123

Character ranges

$ # filter lines made up entirely of digits, at least one
$ printf 'cat5\nfoo\n123\n42\n' | sed -nE '/^[0-9]+$/p'
123
42

$ # filter lines made up entirely of lower case alphabets, at least one
$ printf 'cat5\nfoo\n123\n42\n' | sed -nE '/^[a-z]+$/p'
foo

$ # filter lines made up entirely of lower case alphabets and digits, at least one
$ printf 'cat5\nfoo\n123\n42\n' | sed -nE '/^[a-z0-9]+$/p'
cat5
foo
123
42
$ # numbers between 10 to 29
$ printf '23\n154\n12\n26\n98234\n' | sed -n '/^[12][0-9]$/p'
23
12
26

$ # numbers >= 100
$ printf '23\n154\n12\n26\n98234\n' | sed -nE '/^[0-9]{3,}$/p'
154
98234

$ # numbers >= 100 if there are leading zeros
$ printf '0501\n035\n154\n12\n26\n98234\n' | sed -nE '/^0*[1-9][0-9]{2,}$/p'
0501
154
98234

Negating character class

  • Meta characters inside and outside of [] are completely different
  • For example, ^ as first character inside [] matches characters other than those specified inside character class
$ # delete zero or more characters before first =
$ echo 'foo=bar; baz=123' | sed 's/^[^=]*//'
=bar; baz=123

$ # delete zero or more characters after last =
$ echo 'foo=bar; baz=123' | sed 's/[^=]*$//'
foo=bar; baz=

$ # same as: sed -n '/[aeiou]/!p'
$ printf 'tryst\nglyph\npity\nwhy\n' | sed -n '/^[^aeiou]*$/p'
tryst
glyph
why

Matching meta characters inside []

$ # to match - it should be first or last character within []
$ printf 'Foo-bar\nabc-456\n42\nCo-operate\n' | sed -nE '/^[a-z-]+$/Ip'
Foo-bar
Co-operate

$ # to match ] it should be first character within []
$ printf 'int foo\nint a[5]\nfoo=bar\n' | sed -n '/[]=]/p'
int a[5]
foo=bar

$ # to match [ use [ anywhere in the character list
$ # [][] will match both [ and ]
$ printf 'int foo\nint a[5]\nfoo=bar\n' | sed -n '/[[]/p'
int a[5]

$ # to match ^ it should be other than first in the list
$ printf 'c=a^b\nd=f*h+e\nz=x-y\n' | sed -n '/[*^]/p'
c=a^b
d=f*h+e

Named character classes

Character classes Description
[:digit:] Same as [0-9]
[:lower:] Same as [a-z]
[:upper:] Same as [A-Z]
[:alpha:] Same as [a-zA-Z]
[:alnum:] Same as [0-9a-zA-Z]
[:xdigit:] Same as [0-9a-fA-F]
[:cntrl:] Control characters - first 32 ASCII characters and 127th (DEL)
[:punct:] All the punctuation characters
[:graph:] [:alnum:] and [:punct:]
[:print:] [:alnum:], [:punct:] and space
[:blank:] Space and tab characters
[:space:] white-space characters: tab, newline, vertical tab, form feed, carriage return and space
$ # lines containing only hexadecimal characters
$ printf '128\n34\nfe32\nfoo1\nbar\n' | sed -nE '/^[[:xdigit:]]+$/p'
128
34
fe32

$ # lines containing at least one non-hexadecimal character
$ printf '128\n34\nfe32\nfoo1\nbar\n' | sed -n '/[^[:xdigit:]]/p'
foo1
bar

$ # same as: sed -nE '/^[a-z-]+$/Ip'
$ printf 'Foo-bar\nabc-456\n42\nCo-operate\n' | sed -nE '/^[[:alpha:]-]+$/p'
Foo-bar
Co-operate

$ # remove all punctuation characters
$ sed 's/[[:punct:]]//g' poem.txt
Roses are red
Violets are blue
Sugar is sweet
And so are you

Backslash character classes

Character classes Description
\w Same as [0-9a-zA-Z_] or [[:alnum:]_]
\W Same as [^0-9a-zA-Z_] or [^[:alnum:]_]
\s Same as [[:space:]]
\S Same as [^[:space:]]
$ # lines containing only word characters
$ printf '123\na=b+c\ncmp_str\nFoo_bar\n' | sed -nE '/^\w+$/p'
123
cmp_str
Foo_bar

$ # backslash character classes cannot be used inside [] unlike perl
$ # \w would simply match w
$ echo 'w=y-x+9*3' | sed 's/[\w=]//g'
y-x+9*3
$ echo 'w=y-x+9*3' | perl -pe 's/[\w=]//g'
-+*

Escape sequences

  • Certain ASCII characters like tab, carriage return, newline, etc have escape sequence to represent them
    • Unlike backslash character classes, these can be used within [] as well
  • Any ASCII character can be also represented using their decimal or octal or hexadecimal value
  • See sed manual - Escapes for more details
$ # example for representing tab character
$ printf 'foo\tbar\tbaz\n'
foo     bar     baz
$ printf 'foo\tbar\tbaz\n' | sed 's/\t/ /g'
foo bar baz
$ echo 'a b c' | sed 's/ /\t/g'
a       b       c

$ # using escape sequence inside character class
$ printf 'a\tb\vc\n'
a       b
         c
$ printf 'a\tb\vc\n' | cat -vT
a^Ib^Kc
$ printf 'a\tb\vc\n' | sed 's/[\t\v]/ /g'
a b c

$ # most common use case for hex escape sequence is to represent single quotes
$ # equivalent is '\d039' and '\o047' for decimal and octal respectively
$ echo "foo: '34'"
foo: '34'
$ echo "foo: '34'" | sed 's/\x27/"/g'
foo: "34"
$ echo 'foo: "34"' | sed 's/"/\x27/g'
foo: '34'

Grouping

  • Character classes allow matching against a choice of multiple character list and then quantifier added if needed
  • One of the uses of grouping is analogous to character classes for whole regular expressions, instead of just list of characters
  • The meta characters () are used for grouping
    • requires \(\) for BRE
  • Similar to maths ab + ac = a(b+c), think of regular expression a(b|c) = ab|ac
$ # four letter words with 'on' or 'no' in middle
$ printf 'known\nmood\nknow\npony\ninns\n' | sed -nE '/\b[a-z](on|no)[a-z]\b/p'
know
pony
$ # common mistake to use character class, will match 'oo' and 'nn' as well
$ printf 'known\nmood\nknow\npony\ninns\n' | sed -nE '/\b[a-z][on]{2}[a-z]\b/p'
mood
know
pony
inns

$ # quantifier example
$ printf 'handed\nhand\nhandy\nhands\nhandle\n' | sed -nE '/^hand([sy]|le)?$/p'
hand
handy
hands
handle

$ # remove first two columns where : is delimiter
$ echo 'foo:123:bar:baz' | sed -E 's/^([^:]+:){2}//'
bar:baz

$ # can be nested as required
$ printf 'spade\nscore\nscare\nspare\nsphere\n' | sed -nE '/^s([cp](he|a)[rd])e$/p'
spade
scare
spare
sphere

Back reference

  • The matched string within () can also be used to be matched again by back referencing the captured groups
  • \1 denotes the first matched group, \2 the second one and so on
    • Order is leftmost ( is \1, next one is \2 and so on
    • Can be used both in REGEXP as well as in REPLACEMENT sections
  • & or \0 represents entire matched string in REPLACEMENT section
  • Note that the matched string, not the regular expression itself is referenced
    • for ex: if ([0-9][a-f]) matches 3b, then back referencing will be 3b not any other valid match of the regular expression like 8f, 0a etc
  • As \ and & are special characters in REPLACEMENT section, use \\ and \& respectively for literal representation
$ # filter lines with consecutive repeated alphabets
$ printf 'eel\nflee\nall\npat\nilk\nseen\n' | sed -nE '/([a-z])\1/p'
eel
flee
all
seen

$ # reduce \\ to single \ and delete if only single \
$ echo '\[\] and \\w and \[a-zA-Z0-9\_\]' | sed -E 's/(\\?)\\/\1/g'
[] and \w and [a-zA-Z0-9_]

$ # remove two or more duplicate words separated by space
$ # word boundaries prevent false matches like 'the theatre' 'sand and stone' etc
$ echo 'a a a walking for for a cause' | sed -E 's/\b(\w+)( \1)+\b/\1/g'
a walking for a cause

$ # surround only third column with double quotes
$ # note the nested capture groups and numbers used in REPLACEMENT section
$ echo 'foo:123:bar:baz' | sed -E 's/^(([^:]+:){2})([^:]+)/\1"\3"/'
foo:123:"bar":baz

$ # add first column data to end of line as well
$ echo 'foo:123:bar:baz' | sed -E 's/^([^:]+).*/& \1/'
foo:123:bar:baz foo

$ # surround entire line with double quotes
$ echo 'hello world' | sed 's/.*/"&"/'
"hello world"
$ # add something at start as well as end of line
$ echo 'hello world' | sed 's/.*/Hi. &. Have a nice day/'
Hi. hello world. Have a nice day

Changing case

  • Applies only to REPLACEMENT section, unlike perl where these can be used in REGEXP portion as well
  • See sed manual - The s Command for more details and corner cases
$ # UPPERCASE all alphabets, will be stopped on \L or \E
$ echo 'HeLlO WoRLD' | sed 's/.*/\U&/'
HELLO WORLD

$ # lowercase all alphabets, will be stopped on \U or \E
$ echo 'HeLlO WoRLD' | sed 's/.*/\L&/'
hello world

$ # Uppercase only next character
$ echo 'foo bar' | sed 's/\w*/\u&/g'
Foo Bar
$ echo 'foo_bar next_line' | sed -E 's/_([a-z])/\u\1/g'
fooBar nextLine

$ # lowercase only next character
$ echo 'FOO BAR' | sed 's/\w*/\l&/g'
fOO bAR
$ echo 'fooBar nextLine Baz' | sed -E 's/([a-z])([A-Z])/\1_\l\2/g'
foo_bar next_line Baz

$ # titlecase if input has mixed case
$ echo 'HeLlO WoRLD' | sed 's/.*/\L&/; s/\w*/\u&/g'
Hello World
$ # sed 's/.*/\L\u&/' also works, but not sure if it is defined behavior
$ echo 'HeLlO WoRLD' | sed 's/.*/\L&/; s/./\u&/'
Hello world

$ # \E will stop conversion started by \U or \L
$ echo 'foo_bar next_line baz' | sed -E 's/([a-z]+)(_[a-z]+)/\U\1\E\2/g'
FOO_bar NEXT_line baz

Substitute command modifiers

The s command syntax:

s/REGEXP/REPLACEMENT/FLAGS
  • Modifiers (or FLAGS) like g, p and I have been already seen. For completeness, they will be discussed again along with rest of the modifiers
  • See sed manual - The s Command for more details and corner cases

g modifier

By default, substitute command will replace only first occurrence of match. g modifier is needed to replace all occurrences

$ # replace only first : with -
$ echo 'foo:123:bar:baz' | sed 's/:/-/'
foo-123:bar:baz

$ # replace all : with -
$ echo 'foo:123:bar:baz' | sed 's/:/-/g'
foo-123-bar-baz

Replace specific occurrence

  • A number can be used to specify Nth match to be replaced
$ # replace first occurrence
$ echo 'foo:123:bar:baz' | sed 's/:/-/'
foo-123:bar:baz
$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/XYZ/'
XYZ:123:bar:baz

$ # replace second occurrence
$ echo 'foo:123:bar:baz' | sed 's/:/-/2'
foo:123-bar:baz
$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/XYZ/2'
foo:XYZ:bar:baz

$ # replace third occurrence
$ echo 'foo:123:bar:baz' | sed 's/:/-/3'
foo:123:bar-baz
$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/XYZ/3'
foo:123:XYZ:baz

$ # choice of quantifier depends on knowing input
$ echo ':123:bar:baz' | sed 's/[^:]*/XYZ/2'
:XYZ:bar:baz
$ echo ':123:bar:baz' | sed -E 's/[^:]+/XYZ/2'
:123:XYZ:baz
  • Replacing Nth match from end of line when number of matches is unknown
  • Makes use of greediness of quantifiers
$ # replacing last occurrence
$ # can also use sed -E 's/:([^:]*)$/-\1/'
$ echo 'foo:123:bar:baz' | sed -E 's/(.*):/\1-/'
foo:123:bar-baz
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):/\1-/'
456:foo:123:bar:789-baz
$ echo 'foo and bar and baz land good' | sed -E 's/(.*)and/\1XYZ/'
foo and bar and baz lXYZ good
$ # use word boundaries as necessary
$ echo 'foo and bar and baz land good' | sed -E 's/(.*)\band\b/\1XYZ/'
foo and bar XYZ baz land good

$ # replacing last but one
$ echo 'foo:123:bar:baz' | sed -E 's/(.*):(.*:)/\1-\2/'
foo:123-bar:baz
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):(.*:)/\1-\2/'
456:foo:123:bar-789:baz

$ # replacing last but two
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){2})/\1-\2/'
456:foo:123-bar:789:baz
$ # replacing last but three
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){3})/\1-\2/'
456:foo-123:bar:789:baz
  • Replacing all but first N occurrences by combining with g modifier
$ # replace all : with - except first two
$ echo '456:foo:123:bar:789:baz' | sed -E 's/:/-/3g'
456:foo:123-bar-789-baz

$ # replace all : with - except first three
$ echo '456:foo:123:bar:789:baz' | sed -E 's/:/-/4g'
456:foo:123:bar-789-baz
  • Replacing multiple Nth occurrences
$ # replace first two occurrences of : with -
$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/; s/:/-/'
456-foo-123:bar:789:baz

$ # replace second and third occurrences of : with -
$ # note the changes in number to be used for subsequent replacement
$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/2; s/:/-/2'
456:foo-123-bar:789:baz

$ # better way is to use descending order
$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/3; s/:/-/2'
456:foo-123-bar:789:baz
$ # replace second, third and fifth occurrences of : with -
$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/5; s/:/-/3; s/:/-/2'
456:foo-123-bar:789-baz

Ignoring case

  • Either i or I can be used for replacing in case-insensitive manner
  • Since only I can be used for address filtering (for ex: sed '/rose/Id' poem.txt), use I for substitute command as well for consistency
$ echo 'hello Hello HELLO HeLlO' | sed 's/hello/hi/g'
hi Hello HELLO HeLlO

$ echo 'hello Hello HELLO HeLlO' | sed 's/hello/hi/Ig'
hi hi hi hi

p modifier

  • Usually used in conjunction with -n option to output only modified lines
$ # no output if no substitution
$ echo 'hi there. have a nice day' | sed -n 's/xyz/XYZ/p'
$ # modified line if there is substitution
$ echo 'hi there. have a nice day' | sed -n 's/\bh/H/pg'
Hi there. Have a nice day

$ # only lines containing 'are'
$ sed -n 's/are/ARE/p' poem.txt
Roses ARE red,
Violets ARE blue,
And so ARE you.

$ # only lines containing 'are' as well as 'so'
$ sed -n '/are/ s/so/SO/p' poem.txt
And SO are you.

w modifier

  • Allows to write only the changes to specified file name instead of default stdout
$ # space between w and filename is optional
$ # same as: sed -n 's/3/three/p' > 3.txt
$ seq 20 | sed -n 's/3/three/w 3.txt'
$ cat 3.txt
three
1three

$ # do not use -n if output should be displayed as well as written to file
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(:[^:]*){2}$//w col.txt'
456:foo:123:bar
$ cat col.txt
456:foo:123:bar
  • For multiple output files, use -e for each file
$ seq 20 | sed -n -e 's/5/five/w 5.txt' -e 's/7/seven/w 7.txt'
$ cat 5.txt
five
1five
$ cat 7.txt
seven
1seven
  • There are two predefined filenames
    • /dev/stdout to write to stdout
    • /dev/stderr to write to stderr
$ # inplace editing as well as display changes on terminal
$ sed -i 's/three/3/w /dev/stdout' 3.txt
3
13
$ cat 3.txt
3
13

e modifier

  • Allows to use shell command output in REPLACEMENT section
  • Trailing newline from command output is suppressed
$ # replacing a line with output of shell command
$ printf 'Date:\nreplace this line\n'
Date:
replace this line
$ printf 'Date:\nreplace this line\n' | sed 's/^replace.*/date/e'
Date:
Thu May 25 10:19:46 IST 2017

$ # when using p modifier with e, order is important
$ printf 'Date:\nreplace this line\n' | sed -n 's/^replace.*/date/ep'
Thu May 25 10:19:46 IST 2017
$ printf 'Date:\nreplace this line\n' | sed -n 's/^replace.*/date/pe'
date

$ # entire modified line is executed as shell command
$ echo 'xyz 5' | sed 's/xyz/seq/e'
1
2
3
4
5

m modifier

Before seeing example with m modifier, let's see a simple example to get two lines in pattern space

$ # line matching 'blue' and next line in pattern space
$ sed -n '/blue/{N;p}' poem.txt
Violets are blue,
Sugar is sweet,

$ # applying substitution, remember that . matches newline as well
$ sed -n '/blue/{N;s/are.*is//p}' poem.txt
Violets  sweet,
  • When m modifier is used, it affects the behavior of ^, $ and . meta characters
$ # without m modifier, ^ will anchor only beginning of entire pattern space
$ sed -n '/blue/{N;s/^/:: /pg}' poem.txt
:: Violets are blue,
Sugar is sweet,
$ # with m modifier, ^ will anchor each individual line within pattern space
$ sed -n '/blue/{N;s/^/:: /pgm}' poem.txt
:: Violets are blue,
:: Sugar is sweet,

$ # same applies to $ as well
$ sed -n '/blue/{N;s/$/ ::/pg}' poem.txt
Violets are blue,
Sugar is sweet, ::
$ sed -n '/blue/{N;s/$/ ::/pgm}' poem.txt
Violets are blue, ::
Sugar is sweet, ::

$ # with m modifier, . will not match newline character
$ sed -n '/blue/{N;s/are.*//p}' poem.txt
Violets 
$ sed -n '/blue/{N;s/are.*//pm}' poem.txt
Violets 
Sugar is sweet,

Shell substitutions


Variable substitution

  • Entire command in double quotes can be used for simple use cases
$ word='are'
$ sed -n "/$word/p" poem.txt
Roses are red,
Violets are blue,
And so are you.

$ replace='ARE'
$ sed "s/$word/$replace/g" poem.txt
Roses ARE red,
Violets ARE blue,
Sugar is sweet,
And so ARE you.

$ # need to use delimiter as suitable
$ echo 'home path is:' | sed "s/$/ $HOME/"
sed: -e expression #1, char 7: unknown option to `s'
$ echo 'home path is:' | sed "s|$| $HOME|"
home path is: /home/learnbyexample
  • If command has characters like \, backtick, ! etc, double quote only the variable
$ # if history expansion is enabled, ! is special
$ word='are'
$ sed "/$word/!d" poem.txt
sed "/$word/date +%A" poem.txt
sed: -e expression #1, char 7: extra characters after command

$ # so double quote only the variable
$ # the command is concatenation of '/' and "$word" and '/!d'
$ sed '/'"$word"'/!d' poem.txt
Roses are red,
Violets are blue,
And so are you.

Command substitution

  • Much more flexible than using e modifier as part of line can be modified as well
$ echo 'today is date' | sed 's/date/'"$(date +%A)"'/'
today is Tuesday

$ # need to use delimiter as suitable
$ echo 'current working dir is: ' | sed 's/$/'"$(pwd)"'/'
sed: -e expression #1, char 6: unknown option to `s'
$ echo 'current working dir is: ' | sed 's|$|'"$(pwd)"'|'
current working dir is: /home/learnbyexample/command_line_text_processing

$ # multiline output cannot be substituted in this manner
$ echo 'foo' | sed 's/foo/'"$(seq 5)"'/'
sed: -e expression #1, char 7: unterminated `s' command

z and s command line options

  • We have already seen a few options like -n, -e, -i and -E
  • This section will cover -z and -s options
  • See sed manual - Command line options for other options and more details

The -z option will cause sed to separate input based on ASCII NUL character instead of newlines

$ # useful to process null separated data
$ # for ex: output of grep -Z, find -print0, etc
$ printf 'teal\0red\nblue\n\0green\n' | sed -nz '/red/p' | cat -A
red$
blue$
^@

$ # also useful to process whole file(not having NUL characters) as a single string
$ # adds ; to previous line if current line starts with c
$ printf 'cat\ndog\ncoat\ncut\nmat\n' | sed -z 's/\nc/;&/g'
cat
dog;
coat;
cut
mat

The -s option will cause sed to treat multiple input files separately instead of treating them as single concatenated input. If -i is being used, -s is implied

$ # without -s, there is only one first line
$ # F command prints file name of current file
$ sed '1F' f1 f2
f1
I ate three apples
I bought two bananas and three mangoes

$ # with -s, each file has its own address
$ sed -s '1F' f1 f2
f1
I ate three apples
f2
I bought two bananas and three mangoes


change command

The change command c will delete line(s) represented by address or address range and replace it with given string

Note the string used cannot have literal newline character, use escape sequence instead

$ # white-space between c and replacement string is ignored
$ seq 3 | sed '2c foo bar'
1
foo bar
3

$ # note how all lines in address range are replaced
$ seq 8 | sed '3,7cfoo bar'
1
2
foo bar
8

$ # escape sequences are allowed in string to be replaced
$ sed '/red/,/is/chello\nhi there' poem.txt
hello
hi there
And so are you.
  • command will apply for all matching addresses
$ seq 5 | sed '/[24]/cfoo'
1
foo
3
foo
5
  • \ is special immediately after c, see sed manual - other commands for details
  • If escape sequence is needed at beginning of replacement string, use an additional \
$ # \ helps to add leading spaces
$ seq 3 | sed '2c  a'
1
a
3
$ seq 3 | sed '2c\ a'
1
 a
3

$ seq 3 | sed '2c\tgood day'
1
tgood day
3
$ seq 3 | sed '2c\\tgood day'
1
        good day
3
  • Since ; cannot be used to distinguish between string and end of command, use -e for multiple commands
$ sed -e '/are/cHi;s/is/IS/' poem.txt
Hi;s/is/IS/
Hi;s/is/IS/
Sugar is sweet,
Hi;s/is/IS/

$ sed -e '/are/cHi' -e 's/is/IS/' poem.txt
Hi
Hi
Sugar IS sweet,
Hi
  • Using shell substitution
$ text='good day'
$ seq 3 | sed '2c'"$text"
1
good day
3

$ text='good day\nfoo bar'
$ seq 3 | sed '2c'"$text"
1
good day
foo bar
3

$ seq 3 | sed '2c'"$(date +%A)"
1
Thursday
3

$ # multiline command output will lead to error
$ seq 3 | sed '2c'"$(seq 2)"
sed: -e expression #1, char 5: missing command

insert command

The insert command allows to add string before a line matching given address

Note the string used cannot have literal newline character, use escape sequence instead

$ # white-space between i and string is ignored
$ # same as: sed '2s/^/hello\n/'
$ seq 3 | sed '2i hello'
1
hello
2
3

$ # escape sequences can be used
$ seq 3 | sed '2ihello\nhi'
1
hello
hi
2
3
  • command will apply for all matching addresses
$ seq 5 | sed '/[24]/ifoo'
1
foo
2
3
foo
4
5
  • \ is special immediately after i, see sed manual - other commands for details
  • If escape sequence is needed at beginning of replacement string, use an additional \
$ seq 3 | sed '2i  foo'
1
foo
2
3
$ seq 3 | sed '2i\ foo'
1
 foo
2
3

$ seq 3 | sed '2i\tbar'
1
tbar
2
3
$ seq 3 | sed '2i\\tbar'
1
        bar
2
3
  • Since ; cannot be used to distinguish between string and end of command, use -e for multiple commands
$ sed -e '/is/ifoobar;s/are/ARE/' poem.txt
Roses are red,
Violets are blue,
foobar;s/are/ARE/
Sugar is sweet,
And so are you.

$ sed -e '/is/ifoobar' -e 's/are/ARE/' poem.txt
Roses ARE red,
Violets ARE blue,
foobar
Sugar is sweet,
And so ARE you.
  • Using shell substitution
$ text='good day'
$ seq 3 | sed '2i'"$text"
1
good day
2
3

$ text='good day\nfoo bar'
$ seq 3 | sed '2i'"$text"
1
good day
foo bar
2
3

$ seq 3 | sed '2iToday is '"$(date +%A)"
1
Today is Thursday
2
3

$ # multiline command output will lead to error
$ seq 3 | sed '2i'"$(seq 2)"
sed: -e expression #1, char 5: missing command

append command

The append command allows to add string after a line matching given address

Note the string used cannot have literal newline character, use escape sequence instead

$ # white-space between a and string is ignored
$ # same as: sed '2s/$/\nhello/'
$ seq 3 | sed '2a hello'
1
2
hello
3

$ # escape sequences can be used
$ seq 3 | sed '2ahello\nhi'
1
2
hello
hi
3
  • command will apply for all matching addresses
$ seq 5 | sed '/[24]/afoo'
1
2
foo
3
4
foo
5
  • \ is special immediately after a, see sed manual - other commands for details
  • If escape sequence is needed at beginning of replacement string, use an additional \
$ seq 3 | sed '2a  foo'
1
2
foo
3
$ seq 3 | sed '2a\ foo'
1
2
 foo
3

$ seq 3 | sed '2a\tbar'
1
2
tbar
3
$ seq 3 | sed '2a\\tbar'
1
2
        bar
3
  • Since ; cannot be used to distinguish between string and end of command, use -e for multiple commands
$ sed -e '/is/afoobar;s/are/ARE/' poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
foobar;s/are/ARE/
And so are you.

$ sed -e '/is/afoobar' -e 's/are/ARE/' poem.txt
Roses ARE red,
Violets ARE blue,
Sugar is sweet,
foobar
And so ARE you.
  • Using shell substitution
$ text='good day'
$ seq 3 | sed '2a'"$text"
1
2
good day
3

$ text='good day\nfoo bar'
$ seq 3 | sed '2a'"$text"
1
2
good day
foo bar
3

$ seq 3 | sed '2aToday is '"$(date +%A)"
1
2
Today is Thursday
3

$ # multiline command output will lead to error
$ seq 3 | sed '2a'"$(seq 2)"
sed: -e expression #1, char 5: missing command

adding contents of file


r for entire file

  • The r command allows to add contents of file after a line matching given address
  • It is a robust way to add multiline content or if content can have characters that may be interpreted
  • Special name /dev/stdin allows to read from stdin instead of file input
  • First, a simple example to add contents of one file into another at specified address
$ cat 5.txt
five
1five

$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

$ # space between r and filename is optional
$ sed '2r 5.txt' poem.txt
Roses are red,
Violets are blue,
five
1five
Sugar is sweet,
And so are you.

$ # content cannot be added before first line
$ sed '0r 5.txt' poem.txt
sed: -e expression #1, char 2: invalid usage of line address 0
$ # but that is trivial to solve: cat 5.txt poem.txt
  • command will apply for all matching addresses
$ seq 5 | sed '/[24]/r 5.txt'
1
2
five
1five
3
4
five
1five
5
  • adding content of variable as it is without any interpretation
  • also shows example for using /dev/stdin
$ text='Good day\nfoo bar baz\n'
$ # escape sequence like \n will be interpreted when 'a' command is used
$ sed '/is/a'"$text" poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
Good day
foo bar baz

And so are you.

$ # \ is just another character, won't be treated as special with 'r' command
$ echo "$text" | sed '/is/r /dev/stdin' poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
Good day\nfoo bar baz\n
And so are you.
  • adding multiline command output is simple as well
$ seq 3 | sed '/is/r /dev/stdin' poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
1
2
3
And so are you.
$ # replacing range of lines
$ # order is important, first 'r' and then 'd'
$ sed -e '/is/r 5.txt' -e '1,/is/d' poem.txt
five
1five
And so are you.

$ # replacing a line
$ seq 3 | sed -e '3r /dev/stdin' -e '3d' poem.txt
Roses are red,
Violets are blue,
1
2
3
And so are you.

$ # can also use {} grouping to avoid repeating the address
$ seq 3 | sed -e '/blue/{r /dev/stdin' -e 'd}' poem.txt
Roses are red,
1
2
3
Sugar is sweet,
And so are you.

R for line by line

  • add a line for every address match
  • Special name /dev/stdin allows to read from stdin instead of file input
$ # space between R and filename is optional
$ seq 3 | sed '/are/R /dev/stdin' poem.txt
Roses are red,
1
Violets are blue,
2
Sugar is sweet,
And so are you.
3
$ # to replace matching line
$ seq 3 | sed -e '/are/{R /dev/stdin' -e 'd}' poem.txt
1
2
Sugar is sweet,
3

$ sed '2,3R 5.txt' poem.txt
Roses are red,
Violets are blue,
five
Sugar is sweet,
1five
And so are you.
  • number of lines from file to be read different from number of matching address lines
$ # file has more lines than matching address
$ # 2 lines in 5.txt but only 1 line matching 'is'
$ sed '/is/R 5.txt' poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
five
And so are you.

$ # lines matching address is more than file to be read
$ # 3 lines matching 'are' but only 2 lines from stdin
$ seq 2 | sed '/are/R /dev/stdin' poem.txt
Roses are red,
1
Violets are blue,
2
Sugar is sweet,
And so are you.

n and N commands

  • These two commands will fetch next line (newline or NUL character separated, depending on options)

Quoting from sed manual - common commands for n command

If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.

$ # if line contains 'blue', replace 'e' with 'E' only for following line
$ sed '/blue/{n;s/e/E/g}' poem.txt
Roses are red,
Violets are blue,
Sugar is swEEt,
And so are you.

$ # better illustrated with -n option
$ sed -n '/blue/{n;s/e/E/pg}' poem.txt
Sugar is swEEt,

$ # if line contains 'blue', replace 'e' with 'E' only for next to next line
$ sed -n '/blue/{n;n;s/e/E/pg}' poem.txt
And so arE you.

Quoting from sed manual - other commands for N command

Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands

When -z is used, a zero byte (the ascii ‘NUL’ character) is added between the lines (instead of a new line)

$ # if line contains 'blue', replace 'e' with 'E' both in current line and next
$ sed '/blue/{N;s/e/E/g}' poem.txt
Roses are red,
ViolEts arE bluE,
Sugar is swEEt,
And so are you.

$ # better illustrated with -n option
$ sed -n '/blue/{N;s/e/E/pg}' poem.txt
ViolEts arE bluE,
Sugar is swEEt,

$ sed -n '/blue/{N;N;s/e/E/pg}' poem.txt
ViolEts arE bluE,
Sugar is swEEt,
And so arE you.
  • Combination
$ # n will fetch next line, current line is out of pattern space
$ # N will then add another line
$ sed -n '/blue/{n;N;s/e/E/pg}' poem.txt
Sugar is swEEt,
And so arE you.
  • not necessary to qualify with an address
$ seq 6 | sed 'n;cXYZ'
1
XYZ
3
XYZ
5
XYZ

$ seq 6 | sed 'N;s/\n/ /'
1 2
3 4
5 6

Control structures


if then else

$ # changing -ve to +ve and vice versa
$ cat nums.txt
42
-2
10101
-3.14
-75
$ # same as: perl -pe '/^-/ ? s/// : s/^/-/'
$ # empty REGEXP section will reuse previous REGEXP, in this case /^-/
$ sed '/^-/{s///;b}; s/^/-/' nums.txt
-42
2
-10101
3.14
75

$ # same as: perl -pe '/are/ ? s/e/*/g : s/e/#/g'
$ # if line contains 'are' replace 'e' with '*' else replace 'e' with '#'
$ sed '/are/{s/e/*/g;b}; s/e/#/g' poem.txt
Ros*s ar* r*d,
Viol*ts ar* blu*,
Sugar is sw##t,
And so ar* you.

replacing in specific column

$ # replace space with underscore only in 3rd column
$ # ^(([^|]+\|){2} captures first two columns
$ # [^|]* zero or more non-column separator characters
$ # as long as match is found, command will be repeated on same input line
$ echo 'foo bar|a b c|1 2 3|xyz abc' | sed -E ':a s/^(([^|]+\|){2}[^|]*) /\1_/; ta'
foo bar|a b c|1_2_3|xyz abc

$ # use awk/perl for simpler syntax
$ # for ex: awk 'BEGIN{FS=OFS="|"} {gsub(/ /,"_",$3); print}'
  • example to show difference between b and t
$ # whether or not 'R' is found on lines containing 'are', branch will happen
$ sed '/are/{s/R/*/g;b}; s/e/#/g' poem.txt
*oses are red,
Violets are blue,
Sugar is sw##t,
And so are you.

$ # branch only if line contains 'are' and substitution of 'R' succeeds
$ sed '/are/{s/R/*/g;t}; s/e/#/g' poem.txt
*oses are red,
Viol#ts ar# blu#,
Sugar is sw##t,
And so ar# you.

overlapping substitutions

  • t command looping with label comes in handy for overlapping substitutions as well
  • Note that in general this method will work recursively, see stackoverflow - substitute recursively for example
$ # consider the problem of replacing empty columns with something
$ # case1: no consecutive empty columns - no problem
$ echo 'foo::bar::baz' | sed 's/::/:0:/g'
foo:0:bar:0:baz
$ # case2: consecutive empty columns are present - problematic
$ echo 'foo:::bar::baz' | sed 's/::/:0:/g'
foo:0::bar:0:baz

$ # t command looping will handle both cases
$ echo 'foo::bar::baz' | sed ':a s/::/:0:/; ta'
foo:0:bar:0:baz
$ echo 'foo:::bar::baz' | sed ':a s/::/:0:/; ta'
foo:0:0:bar:0:baz

Lines between two REGEXPs

  • Simple cases were seen in address range section
  • This section will deal with more cases and some corner cases

Include or Exclude matching REGEXPs

Consider the sample input file, for simplicity the two REGEXPs are BEGIN and END strings instead of regular expressions

$ cat range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
END
baz

First, lines between the two REGEXPs are to be printed

  • Case 1: both starting and ending REGEXP part of output
$ sed -n '/BEGIN/,/END/p' range.txt
BEGIN
1234
6789
END
BEGIN
a
b
c
END
  • Case 2: both starting and ending REGEXP not part of ouput
$ # remember that empty REGEXP section will reuse previously matched REGEXP
$ sed -n '/BEGIN/,/END/{//!p}' range.txt
1234
6789
a
b
c
  • Case 3: only starting REGEXP part of output
$ sed -n '/BEGIN/,/END/{/END/!p}' range.txt
BEGIN
1234
6789
BEGIN
a
b
c
  • Case 4: only ending REGEXP part of output
$ sed -n '/BEGIN/,/END/{/BEGIN/!p}' range.txt
1234
6789
END
a
b
c
END

Second, lines between the two REGEXPs are to be deleted

  • Case 5: both starting and ending REGEXP not part of output
$ sed '/BEGIN/,/END/d' range.txt
foo
bar
baz
  • Case 6: both starting and ending REGEXP part of output
$ # remember that empty REGEXP section will reuse previously matched REGEXP
$ sed '/BEGIN/,/END/{//!d}' range.txt
foo
BEGIN
END
bar
BEGIN
END
baz
  • Case 7: only starting REGEXP part of output
$ sed '/BEGIN/,/END/{/BEGIN/!d}' range.txt
foo
BEGIN
bar
BEGIN
baz
  • Case 8: only ending REGEXP part of output
$ sed '/BEGIN/,/END/{/END/!d}' range.txt
foo
END
bar
END
baz

First or Last block

  • Getting first block is very simple by using q command
$ sed -n '/BEGIN/,/END/{p;/END/q}' range.txt
BEGIN
1234
6789
END

$ # use other tricks discussed in previous section as needed
$ sed -n '/BEGIN/,/END/{//!p;/END/q}' range.txt
1234
6789
  • To get last block, reverse the input linewise, the order of REGEXPs and finally reverse again
$ tac range.txt | sed -n '/END/,/BEGIN/{p;/BEGIN/q}' | tac
BEGIN
a
b
c
END

$ # use other tricks discussed in previous section as needed
$ tac range.txt | sed -n '/END/,/BEGIN/{//!p;/BEGIN/q}' | tac
a
b
c
  • To get a specific block, say 3rd one, awk or perl would be a better choice

Broken blocks

  • If there are blocks with ending REGEXP but without corresponding starting REGEXP, sed -n '/BEGIN/,/END/p' will suffice
  • Consider the modified input file where final starting REGEXP doesn't have corresponding ending
$ cat broken_range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
baz
  • All lines till end of file gets printed with simple use of sed -n '/BEGIN/,/END/p'
  • The file reversing trick comes in handy here as well
  • But if both kinds of broken blocks are present, further processing will be required. Better to use awk or perl in such cases
$ sed -n '/BEGIN/,/END/p' broken_range.txt
BEGIN
1234
6789
END
BEGIN
a
b
c
baz

$ tac broken_range.txt | sed -n '/END/,/BEGIN/p' | tac
BEGIN
1234
6789
END
  • If there are multiple starting REGEXP but single ending REGEXP, the reversing trick comes handy again
$ cat uneven_range.txt
foo
BEGIN
1234
BEGIN
42
6789
END
bar
BEGIN
a
BEGIN
b
BEGIN
c
BEGIN
d
BEGIN
e
END
baz

$ tac uneven_range.txt | sed -n '/END/,/BEGIN/p' | tac
BEGIN
42
6789
END
BEGIN
e
END

sed scripts

$ cat script.sed
# each line is a command
/is/cfoo bar
/you/r 3.txt
/you/d
# single quotes can be used freely
s/are/'are'/g

$ sed -f script.sed poem.txt
Roses 'are' red,
Violets 'are' blue,
foo bar
3
13

$ # command line options are specified as usual
$ sed -nf script.sed poem.txt
foo bar
3
13
$ type sed
sed is /bin/sed

$ cat executable.sed
#!/bin/sed -f
/is/cfoo bar
/you/r 3.txt
/you/d
s/are/'are'/g

$ chmod +x executable.sed

$ ./executable.sed poem.txt
Roses 'are' red,
Violets 'are' blue,
foo bar
3
13

$ ./executable.sed -n poem.txt
foo bar
3
13

Gotchas and Tips

  • dos style line endings
$ # no issue with unix style line ending
$ printf 'foo bar\n123 789\n' | sed -E 's/\w+$/xyz/'
foo xyz
123 xyz

$ # dos style line ending causes trouble
$ printf 'foo bar\r\n123 789\r\n' | sed -E 's/\w+$/xyz/'
foo bar
123 789

$ # can be corrected by adding \r as well to match
$ # if needed, add \r in replacement section as well
$ printf 'foo bar\r\n123 789\r\n' | sed -E 's/\w+\r$/xyz/'
foo xyz
123 xyz
  • changing dos to unix style line ending and vice versa
$ # bash functions
$ unix2dos() { sed -i 's/$/\r/' "$@" ; }
$ dos2unix() { sed -i 's/\r$//' "$@" ; }

$ cat -A 5.txt
five$
1five$

$ unix2dos 5.txt
$ cat -A 5.txt
five^M$
1five^M$

$ dos2unix 5.txt
$ cat -A 5.txt
five$
1five$
$ # variables don't get expanded within single quotes
$ printf 'user\nhome\n' | sed '/user/ s/$/: $USER/'
user: $USER
home
$ printf 'user\nhome\n' | sed '/user/ s/$/: '"$USER"'/'
user: learnbyexample
home

$ # variable being substituted cannot have the delimiter character
$ printf 'user\nhome\n' | sed '/home/ s/$/: '"$HOME"'/'
sed: -e expression #1, char 15: unknown option to `s'
$ printf 'user\nhome\n' | sed '/home/ s#$#: '"$HOME"'#'
user
home: /home/learnbyexample

$ # use r command for robust insertion from file/command-output
$ sed '1a'"$(seq 2)" 5.txt
sed: -e expression #1, char 5: missing command
$ seq 2 | sed '1r /dev/stdin' 5.txt
five
1
2
1five
  • common regular expression mistakes #1 - greediness
$ s='foo and bar and baz land good'
$ echo "$s" | sed 's/foo.*ba/123 789/'
123 789z land good

$ # use a more restrictive version
$ echo "$s" | sed -E 's/foo \w+ ba/123 789/'
123 789r and baz land good

$ # or use a tool with non-greedy feature available
$ echo "$s" | perl -pe 's/foo.*?ba/123 789/'
123 789r and baz land good

$ # for single characters, use negated character class
$ echo 'foo=123,baz=789,xyz=42' | sed 's/foo=.*,//'
xyz=42
$ echo 'foo=123,baz=789,xyz=42' | sed 's/foo=[^,]*,//'
baz=789,xyz=42
  • common regular expression mistakes #2 - BRE vs ERE syntax
$ # + needs to be escaped with BRE or enable ERE
$ echo 'like 42 and 37' | sed 's/[0-9]+/xxx/g'
like 42 and 37
$ echo 'like 42 and 37' | sed -E 's/[0-9]+/xxx/g'
like xxx and xxx

$ # or escaping when not required
$ echo 'get {} and let' | sed 's/\{\}/[]/'
sed: -e expression #1, char 10: Invalid preceding regular expression
$ echo 'get {} and let' | sed 's/{}/[]/'
get [] and let
  • common regular expression mistakes #3 - using PCRE syntax/features
    • especially by trying out solution on online sites like regex101 and expecting it to work with sed as well
$ # \d is not available as backslash character class, will match 'd' instead
$ echo 'like 42 and 37' | sed -E 's/\d+/xxx/g'
like 42 anxxx 37
$ echo 'like 42 and 37' | sed -E 's/[0-9]+/xxx/g'
like xxx and xxx

$ # features like lookarounds/non-greedy/etc not available
$ echo 'foo,baz,,xyz,,,123' | sed -E 's/,\K(?=,)/NaN/g'
sed: -e expression #1, char 16: Invalid preceding regular expression
$ echo 'foo,baz,,xyz,,,123' | perl -pe 's/,\K(?=,)/NaN/g'
foo,baz,NaN,xyz,NaN,NaN,123
  • common regular expression mistakes #4 - end of line white-space
$ printf 'foo bar \n123 789\t\n' | sed -E 's/\w+$/xyz/'
foo bar 
123 789 

$ printf 'foo bar \n123 789\t\n' | sed -E 's/\w+\s*$/xyz/'
foo xyz
123 xyz
$ time sed -nE '/^([a-d][r-z]){3}$/p' /usr/share/dict/words
avatar
awards
cravat

real    0m0.058s
$ time LC_ALL=C sed -nE '/^([a-d][r-z]){3}$/p' /usr/share/dict/words
avatar
awards
cravat

real    0m0.038s

$ time sed -nE '/^([a-z]..)\1$/p' /usr/share/dict/words > /dev/null

real    0m0.111s
$ time LC_ALL=C sed -nE '/^([a-z]..)\1$/p' /usr/share/dict/words > /dev/null

real    0m0.073s

Further Reading