12 Useful Commands For Filtering Text for Effective File Operations in Linux

In this article, we will review a number of command line tools that act as filters in Linux. A filter is a program that reads standard input, performs an operation upon it and writes the results to standard output.

For this reason, it can be used to process information in powerful ways such as restructuring output to generate useful reports, modifying text in files and many other system administration tasks.

With that said, below are some of the useful file or text filters in Linux.

1. Awk Command

Awk is a remarkable pattern scanning and processing language, it can be used to build useful filters in Linux. You can start using it by reading through our Awk series Part 1 to Part 13.

Additionally, also read through the awk man page for more info and usage options:

$ man awk

2. Sed Command

sed is a powerful stream editor for filtering and transforming text. We’ve already written a two useful articles on sed, that you can go through it here:

  1. How to use GNU ‘sed’ Command to Create, Edit, and Manipulate files in Linux
  2. 15 Useful ‘sed’ Command Tips and Tricks for Daily Linux System Administration Tasks

The sed man page has added control options and instructions:

$ man sed

3. Grep, Egrep, Fgrep, Rgrep Commands

These filters output lines matching a given pattern. They read lines from a file or standard input, and print all matching lines by default to standard output.

Note: The main program is grep, the variations are simply the same as using specific grep options as below (and they are still being used for backward compatibility):

$ egrep = grep -E
$ fgrep = grep -F
$ rgrep = grep -r  

Below are some basic grep commands:

[email protected] ~ $ grep "aaronkilik" /etc/passwd
aaronkilik:x:1001:1001::/home/aaronkilik:

[email protected] ~ $ cat /etc/passwd | grep "aronkilik"
aaronkilik:x:1001:1001::/home/aaronkilik:

You can read more about What’s Difference Between Grep, Egrep and Fgrep in Linux?.

4. head Command

head is used to display the first parts of a file, it outputs the first 10 lines by default. You can use the -n num flag to specify the number of lines to be displayed:

[email protected] ~ $ head /var/log/auth.log  
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session closed for user root
Jan  2 10:51:34 TecMint sudo:  tecmint : TTY=unknown ; PWD=/home/tecmint ; USER=root ; COMMAND=/usr/lib/linuxmint/mintUpdate/checkAPT.py
Jan  2 10:51:34 TecMint sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan  2 10:51:39 TecMint sudo: pam_unix(sudo:session): session closed for user root
Jan  2 10:55:01 TecMint CRON[4099]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 10:55:01 TecMint CRON[4099]: pam_unix(cron:session): session closed for user root
Jan  2 11:05:01 TecMint CRON[4138]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 11:05:01 TecMint CRON[4138]: pam_unix(cron:session): session closed for user root
Jan  2 11:09:01 TecMint CRON[4146]: pam_unix(cron:session): session opened for user root by (uid=0)

[email protected] ~ $ head  -n 5 /var/log/auth.log  
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan  2 10:45:01 TecMint CRON[3383]: pam_unix(cron:session): session closed for user root
Jan  2 10:51:34 TecMint sudo:  tecmint : TTY=unknown ; PWD=/home/tecmint ; USER=root ; COMMAND=/usr/lib/linuxmint/mintUpdate/checkAPT.py
Jan  2 10:51:34 TecMint sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan  2 10:51:39 TecMint sudo: pam_unix(sudo:session): session closed for user root

Learn how to use head command with tail and cat commands for effective usage in Linux.

5. tail Command

tail outputs the last parts (10 lines by default) of a file. Use the -n num switch to specify the number of lines to be displayed.

The command below will output the last 5 lines of the specified file:

[email protected] ~ $ tail -n 5 /var/log/auth.log
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.

Additionally, tail has a special option -f for watching changes in a file in real-time (especially log files).

The following command will enable you monitor changes in the specified file:

[email protected] ~ $ tail -f /var/log/auth.log
Jan  6 12:58:01 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 12:58:11 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 12:58:12 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 12:58:12 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Received SIGHUP; restarting.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on 0.0.0.0 port 22.
Jan  6 13:01:27 TecMint sshd[1269]: Server listening on :: port 22.

Read through the tail man page for a complete list of usage options and instructions:

$ man tail

6. sort Command

sort is used to sort lines of a text file or from standard input.

Below is the content of a file named domains.list:

[email protected] ~ $ cat domains.list
tecmint.com
tecmint.com
news.tecmint.com
news.tecmint.com
linuxsay.com
linuxsay.com
windowsmint.com
windowsmint.com

You can run a simple sort command to sort the file content like so:

[email protected] ~ $ sort domains.list
linuxsay.com
linuxsay.com
news.tecmint.com
news.tecmint.com
tecmint.com
tecmint.com
windowsmint.com
windowsmint.com

You can use sort command in many ways, go through some of the useful articles on sort command as follows:

  1. 14 Useful Examples of Linux ‘sort’ Command – Part 1
  2. 7 Interesting Linux ‘sort’ Command Examples – Part 2
  3. How to Find and Sort Files Based on Modification Date and Time
  4. How to Sort Output of ‘ls’ Command By Last Modified Date and Time

7. uniq Command

uniq command is used to report or omit repeated lines, it filters lines from standard input and writes the outcome to standard output.

After running sort on an input stream, you can remove repeated lines with uniq as in the example below.

To indicate the number of occurrences of a line, use the -c option and ignore differences in case while comparing by including the -i option:

[email protected] ~ $ cat domains.list
tecmint.com
tecmint.com
news.tecmint.com
news.tecmint.com
linuxsay.com
linuxsay.com
windowsmint.com

[email protected] ~ $ sort domains.list | uniq -c 
2 linuxsay.com
2 news.tecmint.com
2 tecmint.com
1 windowsmint.com 

Read through the uniq man page for further usage info and flags:

$ man uniq

8. fmt Command

fmt simple optimal text formatter, it reformats paragraphs in specified file and prints results to the standard output.

The following is the content extracted from the file domain-list.txt:

1.tecmint.com 2.news.tecmint.com 3.linuxsay.com 4.windowsmint.com

To reformat the above content to a standard list, run the following command with -w switch is used to define the maximum line width:

[email protected] ~ $ cat domain-list.txt 
1.tecmint.com 2.news.tecmint.com 3.linuxsay.com 4.windowsmint.com

[email protected] ~ $ fmt -w 1 domain-list.txt
1.tecmint.com 
2.news.tecmint.com 
3.linuxsay.com 
4.windowsmint.com

9. pr Command

pr command converts text files or standard input for printing. For instance on Debian systems, you can list all installed packages as follows:

$ dpkg -l

To organize the list in pages and columns ready for printing, issue the following command.

[email protected] ~ $ dpkg -l | pr --columns 3 -l 20  

2017-01-06 13:19                                                  Page 1


Desired=Unknown/Install ii  adduser		ii  apg
| Status=Not/Inst/Conf- ii  adwaita-icon-theme	ii  app-install-data
|/ Err?=(none)/Reinst-r ii  adwaita-icon-theme- ii  apparmor
||/ Name		ii  alsa-base		ii  apt
+++-=================== ii  alsa-utils		ii  apt-clone
ii  accountsservice	ii  anacron		ii  apt-transport-https
ii  acl			ii  apache2		ii  apt-utils
ii  acpi-support	ii  apache2-bin		ii  apt-xapian-index
ii  acpid		ii  apache2-data	ii  aptdaemon
ii  add-apt-key		ii  apache2-utils	ii  aptdaemon-data


2017-01-06 13:19                                                  Page 2


ii  aptitude		ii  avahi-daemon	ii  bind9-host
ii  aptitude-common	ii  avahi-utils		ii  binfmt-support
ii  apturl		ii  aview		ii  binutils
ii  apturl-common	ii  banshee		ii  bison
ii  archdetect-deb	ii  baobab		ii  blt
ii  aspell		ii  base-files		ii  blueberry
ii  aspell-en		ii  base-passwd		ii  bluetooth
ii  at-spi2-core	ii  bash		ii  bluez
ii  attr		ii  bash-completion	ii  bluez-cups
ii  avahi-autoipd	ii  bc			ii  bluez-obexd

.....

The flags used here are:

  1. --column defines number of columns created in the output.
  2. -l specifies page length (default is 66 lines).

10. tr Command

This tool translates or deletes characters from standard input and writes results to standard output.

The syntax for using tr is as follows:

$ tr options set1 set2

Take a look at the examples below, in the first command, set1( [:upper:] ) represents the case of input characters (all upper case).

Then set2([:lower:]) represents the case in which the resultant characters will be. It’s same thing in the second example and the escape sequence \n means print output on a new line:

[email protected] ~ $ echo "WWW.TECMINT.COM" | tr [:upper:] [:lower:]
www.tecmint.com

[email protected] ~ $ echo "news.tecmint.com" | tr [:lower:] [:upper:]
NEWS.TECMINT.COM

11. more Command

more command is a useful file perusal filter created basically for certificate viewing. It shows file content in a page like format, where users can press [Enter] to view more information.

You can use it to view large files like so:

[email protected] ~ $ dmesg | more
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.0-21-generic ([email protected]) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic
 4.4.6)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=bb29dda3-bdaa-4b39-86cf-4a6dc9634a1b ro quiet splash vt.handoff=7
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d3ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d400-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000a56affff] usable
[    0.000000] BIOS-e820: [mem 0x00000000a56b0000-0x00000000a5eaffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000a5eb0000-0x00000000aaabefff] usable
--More--

12. less Command

less is the opposite of more command above but it offers extra features and it’s a little faster with large files.

Use it in the same way as more:

[email protected] ~ $ dmesg | less
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.0-21-generic ([email protected]) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic
 4.4.6)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=bb29dda3-bdaa-4b39-86cf-4a6dc9634a1b ro quiet splash vt.handoff=7
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d3ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d400-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000a56affff] usable
[    0.000000] BIOS-e820: [mem 0x00000000a56b0000-0x00000000a5eaffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000a5eb0000-0x00000000aaabefff] usable
:

Learn Why ‘less’ is Faster Than ‘more’ Command for effective file navigation in Linux.

That’s all for now, do let us know of any useful command line tools not mentioned here, that act as a text filters in Linux via the comment section below.

Best Affordable Linux and WordPress Services For Your Business
Outsource Your Linux and WordPress Project and Get it Promptly Completed Remotely and Delivered Online.

If You Appreciate What We Do Here On TecMint, You Should Consider:

  1. Stay Connected to: Twitter | Facebook | Google Plus
  2. Subscribe to our email updates: Sign Up Now
  3. Get your own self-hosted blog with a Free Domain at ($3.45/month).
  4. Become a Supporter - Make a contribution via PayPal
  5. Support us by purchasing our premium books in PDF format.
  6. Support us by taking our online Linux courses

We are thankful for your never ending support.

Aaron Kili

Aaron Kili is a Linux and F.O.S.S enthusiast, an upcoming Linux SysAdmin, web developer, and currently a content creator for TecMint who loves working with computers and strongly believes in sharing knowledge.

Your name can also be listed here. Got a tip? Submit it here to become an TecMint author.

RedHat RHCE and RHCSA Certification Book
Linux Foundation LFCS and LFCE Certification Preparation Guide

You may also like...

Got something to say? Join the discussion.

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.