How to Use Awk and Regular Expressions to Filter Text or String in Files


Aaron Kili

Aaron Kili is a Linux and F.O.S.S enthusiast, an upcoming Linux SysAdmin, web developer, and currently a content creator for TecMint who loves working with computers and strongly believes in sharing knowledge.


18 Responses

  1. Sh says:

    There is no file named localhost in the /etc/ directory:
    # awk '/l*c/{print}' /etc/localhost

    The GIF is fine, but the command is wrong.

  2. Erik Persson says:

    Nice tutorial.

    There are, however, some things that are not correct. "*" is NOT short for "any number of characters". In regular expressions, "*" means zero or more of the *preceding* character.

    For example, /l*c/ matches any string containing any number of l's followed by a c. Thus it matches lc, llc, lllc and llllllc, but since the number of l's can be zero, it also matches just c.

    Thus /l*c/ matches exactly the strings that contain a c, and the l* is totally superfluous: every string matched by /l*c/ is also matched by /c/, and vice versa. Likewise, /t*t/ matches any string containing at least one t.

    Again, in this situation the t* does nothing. To match any character, use a period: /t.*t/ matches any string containing at least two t's (a t, followed by any number of any characters, and then another t).

    If you want to match a string that begins and ends with a t, you need "anchors". The regular expression /^t.*t$/ matches a string of at least two characters that starts and ends with a t.

    If you want to match a string containing a word that begins and ends with a t, you need word boundaries. POSIX awk regular expressions have no word-boundary operator (GNU awk adds \<, \> and \y), while Perl regular expressions have \b. You can use whitespace ("\s" in GNU awk, or the portable [[:space:]]) to compensate for the lack of word boundaries. In that case the start and end of a string are not whitespace, so you must allow for them separately with ^ and $. Thus, to match all strings containing words that begin and end with a t, you need something like /(^|\s)t\S*t(\s|$)/

    /Erik
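A quick way to check Erik's points is to run each pattern over a few sample words. The sketch below assumes a POSIX shell with any awk. The sample words, sentences and variable names are made up for illustration. For the word match it uses a different technique from Erik's regex. awk already splits each line into whitespace-separated fields, so the script tests each field against the anchored pattern /^t.*t$/. That avoids both \s (a GNU awk extension) and word-boundary operators.

```shell
#!/bin/sh
# Hypothetical sample data, one word per line, to check the claims above.
words='c
lc
llc
test
tot
tt
at
start
apple'

# l* may match zero l's, so /l*c/ and /c/ select exactly the same lines.
star_l=$(printf '%s\n' "$words" | awk '/l*c/')
just_c=$(printf '%s\n' "$words" | awk '/c/')

# /t*t/ needs only one t; /t.*t/ needs two t's with anything in between.
one_t=$(printf '%s\n' "$words" | awk '/t*t/')
two_t=$(printf '%s\n' "$words" | awk '/t.*t/')

# Anchors: the whole line must start and end with a t.
t_to_t=$(printf '%s\n' "$words" | awk '/^t.*t$/')

# Word match without word-boundary operators: awk splits each line into
# whitespace-separated fields, so test each field with the anchored regex
# and print the line once as soon as any field matches.
sentences='a tart treat
the cat sat
tot
start here'
t_words=$(printf '%s\n' "$sentences" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^t.*t$/) { print; next } }')

printf '/l*c/:\n%s\n/c/:\n%s\n' "$star_l" "$just_c"
printf '/t*t/:\n%s\n/t.*t/:\n%s\n' "$one_t" "$two_t"
printf '/^t.*t$/:\n%s\n' "$t_to_t"
printf 'lines with a t...t word:\n%s\n' "$t_words"
```

Run it with sh. /l*c/ and /c/ print the same three lines. The word "at" appears for /t*t/ but not for /t.*t/, and "start" passes /t.*t/ but not the anchored /^t.*t$/.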

  3. Suresh Ravanam says:

    This is the best awk tutorial ever. I never understood awk so thoroughly before. A simple explanation of the awk syntax (i.e., awk pattern action file) is enough to understand the awk command, and I have not read that anywhere else. Thanks a lot.

  4. erramah says:

    Thank you, guys. It is a really useful website, and I appreciate your effort.

  5. Kostyanius says:

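    Hi,
    There is a little typo in the first awk command of the tutorial:
    awk '//{print}'/etc/hosts
    It should be split into: awk '//{print}' /etc/hosts. The space before the file name is missing, so awk receives the program and the path as a single argument. With no file name given, awk reads standard input, so the command doesn't do anything and just sits there waiting.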
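The missing space matters because the shell joins adjacent quoted and unquoted text into a single word. This small sketch prints the arguments the shell would hand to awk in each case, without running awk at all. It uses a made-up helper, show_args.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received and show each one.
show_args() {
  printf '%s argument(s):\n' "$#"
  for a in "$@"; do printf '  [%s]\n' "$a"; done
}

# Without the space, the quoted program and the path are glued into ONE word.
show_args '//{print}'/etc/hosts

# With the space, the program and the file name are two separate arguments.
show_args '//{print}' /etc/hosts
```

With only one argument, awk treats the whole string as its program. It has no file name, so it reads standard input, which is why the broken command appears to hang.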

  6. Shashank says:

    This is the most informative page; keep posting.

  7. kcdtv says:

    Thanks for this tutorial!
    I have always used "grep", but awk seems to do the job very well, and its syntax is a bit friendlier to my taste.
    I wonder whether it is also more efficient than "grep".
    Correct me if I am wrong, but with grep we always need to use a pipe.
    I assume that awk is a bit more efficient (for crawling files) and that we should try to use it instead of grep when we can.
    Am I wrong?
    awk is so powerful and amazing; I can't wait to read the next chapter :)

    • Aaron Kili K says:

      Thanks for sharing your experience. Both grep and awk are great tools. As you mentioned, grep sometimes needs a pipe to filter text or strings.

      In the end, use whichever tool is more convenient for the task.

    • me says:

      If you just want a simple search, use grep; it is faster. AWK is a full scripting language with C-like syntax and can do tricks you cannot do with plain grep.

      AWK has good support for associative arrays, which are a powerful "tool" when you know how to use them (a simple key-value database). AWK is a great tool for processing text files. It makes it easy to do calculations or reformat a file into the form you want: extract just the important information, compute statistics, find what is missing in a file, etc.

      Is AWK slow? In my experience it is a little faster than Python, depending on the task. AWK is a great tool for writing pipe filters, but it can do more. This article is a nice way to introduce AWK as a tool to "grep" text.

      • Aaron Kili says:

        @me

        Well written. This is a good summary of the comparison between grep and awk, and it uncovers some powerful features of awk as a text-processing language. Thanks for stopping by.
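On the question about pipes: grep and awk both read files named on the command line, so neither needs a pipe for a plain search. The sketch below uses an invented sample file. It runs the equivalent search with each tool and then builds the kind of associative-array tally described above, which grep alone cannot produce.

```shell
#!/bin/sh
# Hypothetical sample file; any text file works the same way.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
127.0.0.1 localhost
127.0.1.1 myhost
::1 localhost ip6-localhost
EOF

# Neither tool needs a pipe: both read the file given as an argument.
grep_out=$(grep 'localhost' "$tmp")
awk_out=$(awk '/localhost/' "$tmp")

# Associative array (key -> count): tally how often each name in the
# second column appears. sort only makes the output order predictable.
counts=$(awk '{ seen[$2]++ } END { for (name in seen) print name, seen[name] }' "$tmp" | sort)

printf '%s\n---\n%s\n---\n%s\n' "$grep_out" "$awk_out" "$counts"
rm -f "$tmp"
```

The two searches print the same two lines. The tally reports localhost twice and myhost once. awk's for (name in seen) loop visits keys in an unspecified order, which is why the result is piped through sort.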

  8. Tomas says:

    Awk is very useful for printing columns, but everything else can be achieved with grep.
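As a counterpoint, this minimal sketch uses invented sample data. A single awk command filters on one column and prints other columns in a different order. grep can select the same rows, but it prints whole lines (or, with -o, only the matched text) and cannot reorder fields.

```shell
#!/bin/sh
# Hypothetical whitespace-separated data: user, login shell, home directory.
data='alice /bin/bash /home/alice
bob /bin/zsh /home/bob
carol /bin/bash /home/carol'

# Filter and reshape in one step: keep rows whose 2nd field ends in "bash"
# and print only the 3rd and 1st fields, in that order.
bash_homes=$(printf '%s\n' "$data" | awk '$2 ~ /bash$/ { print $3, $1 }')

# grep selects the same rows, but prints each one unchanged.
bash_rows=$(printf '%s\n' "$data" | grep '/bin/bash')

printf '%s\n---\n%s\n' "$bash_homes" "$bash_rows"
```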
