How to Use Awk to Filter Text or Strings Using Pattern Specific Actions

In the third part of the Awk command series, we shall take a look at filtering text or strings based on specific patterns that a user can define.

Sometimes, when filtering text, you want to flag certain lines of an input file, or of a set of strings, based on a given condition or a pattern that they match. Awk makes this easy, and it is one of the features of Awk that you will find most helpful.

Let us take a look at an example. Say you have a shopping list of food items that you want to buy, called food_prices.list. It contains the following food items and their prices:

$ cat food_prices.list 
No	Item_Name		Quantity	Price
1	Mangoes			   10		$2.45
2	Apples			   20		$1.50
3	Bananas			   5		$0.90
4	Pineapples		   10		$3.46
5	Oranges			   10		$0.78
6	Tomatoes		   5		$0.55
7	Onions			   5            $0.45

Now suppose you want to mark the food items whose price is greater than $2 with a (*) sign. This can be done by running the following command:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { print $1, $2, $3, $4, "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }' food_prices.list
Print Items Whose Price is Greater Than $2

From the output above, you can see that there is a (*) sign at the end of the lines for mangoes and pineapples. If you check their prices, they are above $2. The header line is not printed, because it contains no price and so matches neither pattern.

In this example, we have used two patterns:

  1. the first, / *\$[2-9]\.[0-9][0-9] */, matches lines with a food item price of $2 or more (from $2.00 to $9.99), and
  2. the second, / *\$[0-1]\.[0-9][0-9] */, matches lines with a food item price below $2.

This is what happens: there are four fields in the file. When the first pattern matches a line with a food item price of $2 or more, its action prints all four fields followed by a (*) sign at the end of the line as a flag.

The second pattern simply prints the remaining lines, those with a food price below $2, exactly as they appear in the input file, food_prices.list.
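The two rules can be read piece by piece. The sketch below is a minimal illustration, not the article's exact session: the file name sample_prices.list and the function name flag_pricey are made up for this example, and only three rows of the list are recreated.

```shell
# Recreate the header and three rows (sample_prices.list is a hypothetical name).
printf '%s\t%s\t%s\t%s\n' \
    No Item_Name Quantity Price \
    1 Mangoes 10 '$2.45' \
    2 Apples 20 '$1.50' \
    4 Pineapples 10 '$3.46' > sample_prices.list

# flag_pricey is an illustrative wrapper around the two pattern-action rules.
flag_pricey() {
    awk '
        # " *"     zero or more spaces, so the match does not depend on padding
        # "\$"     a literal dollar sign (unescaped, $ means end of line)
        # "[2-9]"  one digit from 2 to 9: prices from $2.00 to $9.99
        # "\."     a literal dot, then "[0-9][0-9]" for the two cent digits
        / *\$[2-9]\.[0-9][0-9] */ { print $1, $2, $3, $4, "*" }
        # Same shape, but the dollar digit is 0 or 1: print the line unchanged.
        / *\$[0-1]\.[0-9][0-9] */ { print }
    ' "$1"
}

flag_pricey sample_prices.list
```

The semicolon in the original command is only a statement separator. It is optional before the closing brace, which is why it is left out here.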

This way you can use pattern-specific actions to flag food items that are priced above $2. There is a problem with the output, though: the lines that carry the (*) sign are not formatted like the rest of the lines, which makes the output less clear.
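The misalignment comes from how print joins its arguments. A comma between fields inserts the output field separator, which is a single space by default, so the original tabs are discarded. A bare print emits the line untouched. A minimal sketch with one made-up, tab-separated row (the variable names are illustrative):

```shell
# One hypothetical row, tab-separated like the shopping list.
row=$(printf '1\tMangoes\t\t10\t$2.45')

# Listing the fields: awk rejoins them with single spaces, so the tabs are lost.
rebuilt=$(printf '%s\n' "$row" | awk '{ print $1, $2, $3, $4, "*" }')

# A bare print: the line comes out exactly as it went in.
intact=$(printf '%s\n' "$row" | awk '{ print }')

printf '%s\n%s\n' "$rebuilt" "$intact"
```

The first output line reads `1 Mangoes 10 $2.45 *`, while the second keeps its tabs, which is why flagged and unflagged rows no longer line up.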

We saw the same problem in Part 2 of the awk series, but we can solve it in two ways:

1. Using the printf command, which is a long and tedious way, as in the command below:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4 "*" ; } / *\$[0-1]\.[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4; }' food_prices.list 
Filter and Print Items Using Awk and Printf

2. Using the $0 field. Awk uses the variable $0 to store the whole input line. This is handy for solving the problem above, and it is simple and fast, as follows:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { print $0 "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }' food_prices.list 
Filter and Print Items Using Awk and Variable
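For readers new to printf: in %-10s, the s means a string, 10 is the minimum field width, and the minus sign left-justifies the value by padding with spaces on the right. The sketch below uses made-up values and illustrative variable names to show that padding, and then how appending to $0 keeps a line's original spacing.

```shell
# %-10s pads on the right to 10 columns; %10s pads on the left instead.
padded=$(awk 'BEGIN { printf "[%-10s][%10s]\n", "Apples", "$1.50" }')
printf '%s\n' "$padded"     # [Apples    ][     $1.50]

# $0 holds the entire input line. Writing $0 "*" with no comma concatenates
# the two strings, so the flag lands directly after the original text.
flagged=$(printf '4\tPineapples\t10\t$3.46\n' | awk '{ print $0 "*" }')
printf '%s\n' "$flagged"
```

With printf every column width has to be chosen by hand, whereas $0 reuses whatever spacing the input already had, which is why the second approach is shorter.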

Conclusion

That’s it for now. These are simple ways of filtering text using pattern-specific actions, which can help in flagging lines of text or strings in a file with the Awk command.

We hope you find this article helpful. Remember to read the next part of the series, which will focus on using comparison operators in Awk.

Aaron Kili

Aaron Kili is a Linux and F.O.S.S enthusiast, an upcoming Linux SysAdmin, web developer, and currently a content creator for TecMint who loves working with computers and strongly believes in sharing knowledge.

4 Responses

  1. Aaron Kili K says:

    Good suggestion, we shall look more into conditional statements in AWK in one of the next parts of the series. Thanks for reading.

  2. Gurpreet Singh says:

    Much simpler: awk '{w=$4;gsub(/\$/, "", w);if(w+0>2){print $0, "*"}else{print $0}}' food_prices.list

    • Aaron Kili K says:

      That is a great suggestion, but it only works for experienced users. In one of the upcoming parts of the Awk series, we shall look at how to use the control statements in Awk in detail.

      • lethargos says:

        I’m quite inexperienced, and your solution is really difficult to follow, because you give so few details. For instance, you don’t explain how this ('/ *\$[2-9]\.[0-9][0-9] */ { print $1, $2, $3, $4, "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }') actually works.

        You don’t say why there’s a space and then a *, given that in a previous post you said that . means any character and * should mean 0 or however many of the preceding character.

        Then there’s a ; after print, which again you don’t explain – might be meaningless after all, but when you explain to inexperienced users, you shouldn’t leave out so many things. Normally the ; is not necessary, but I suppose you’re writing it for consistency. You don’t explain what %-10s is and so on, and so forth.

        I’ve been following tecmint for quite a long time and I like it, but these types of posts seem to work only as solutions to problems users had thought of beforehand. They’re not really tutorials.

        In other contexts being so pragmatic should work (such as setting up a web server or a mail server, where you simply want it to work), but here people who want to learn need much more detail. In my opinion, the article should have been double in size.

        Moreover, the gif image is really hard to follow. When you try to concentrate on how awk filters the text, you need to see the output permanently, so as to compare it to the original and understand how awk syntax works. It’s quite frustrating, to be honest.

        At first glance, Gurpreet Singh’s solution actually seems simpler, as his syntax is more self-explanatory in a way than yours.
