Comments on: How to Count Word Occurrences in a Text File

Ravi Saive — Mon, 15 May 2023 03:50:16 +0000

In reply to Technotron.

@Technotron,

We’ve used the Fira Code font for our Linux terminal.

Technotron — Sun, 14 May 2023 14:41:12 +0000

What is your console font?

rob — Sat, 15 May 2021 23:51:14 +0000

It's not perfect, but I adapted the 'tr' approach to print a count of each word read from standard input:

tr -c "'[:alnum:]" "\n" | grep "[[:alnum:]]" | sort | uniq -c | sort -n

grep -c counts matching lines, so a word that appears twice on one line is counted only once. This pipeline instead puts every word or number on its own line: tr -c replaces each character that is not an apostrophe or alphanumeric with a newline. The first sort groups identical words together so that uniq -c, which only collapses adjacent duplicates, can reduce each group to a single line prefixed with its number of occurrences. grep is there only to remove blank lines: consecutive non-word characters (such as ", ") each become a newline, and without the filter uniq would also print the number of blank lines. I haven't yet come up with a better way to do that. The final sort is optional; it lists the words by frequency of appearance (least frequent first) instead of alphanumerically.

Note the apostrophe in the first set given to tr: it keeps possessives and contractions as whole words, while parentheses, quotation marks, and other punctuation are stripped off. Also note that this breaks on long, comma-separated numbers, turning each group into a (probably meaningless) lone 1-, 2-, or 3-digit number. As long as you don't try to treat those as words, there is no problem.
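A minimal sketch of the same pipeline wrapped in a reusable shell function. The name word_freq is hypothetical, and the -s (squeeze) flag on tr is an addition that is not part of rob's command: it collapses each run of non-word characters into a single newline, so far fewer blank lines are produced. Input that begins with punctuation can still yield one leading blank line, so the grep filter is kept.

```shell
# word_freq (hypothetical name): count how often each word occurs on stdin.
# Words are runs of letters, digits, and apostrophes, as in rob's tr set.
word_freq() {
  # -c: act on every character NOT in the set; -s: squeeze repeated newlines
  tr -cs "'[:alnum:]" '\n' |
    grep '[[:alnum:]]' |  # drop blank or apostrophe-only lines
    sort |                # group identical words so uniq sees them as adjacent
    uniq -c |             # collapse each group into "count word"
    sort -n               # least frequent first, most frequent last
}
```

For example, printf "the cat, the hat; the cat's end.\n" | word_freq ends with the line for "the" and a count of 3, while "cat's" is kept as its own word. The count is case-sensitive; prepending tr '[:upper:]' '[:lower:]' | to the input folds "The" and "the" together.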

Martins Okoi — Thu, 30 May 2019 08:45:16 +0000

In reply to Denis.

Awesome!

Denis — Wed, 29 May 2019 09:34:03 +0000

I use the Silver Searcher (https://geoff.greer.fm/ag/), which can search a ~1 TB file in less than a second.

To print all matching entries (case-insensitive): ag -i mauris example.txt
To only count the matching entries: ag -c mauris example.txt
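Where ag is not installed, a rough equivalent with plain grep is possible, sketched here under the assumption of GNU or BSD grep. The function names are hypothetical. One subtlety: grep -c counts matching lines, not individual matches, so counting every occurrence needs -o (supported by GNU and BSD grep, though not required by POSIX) piped to wc -l. Both helpers ignore case, as in the ag -i example.

```shell
# Hypothetical helpers approximating the ag commands with plain grep.

# count_matching_lines PATTERN FILE: number of lines containing PATTERN (any case)
count_matching_lines() {
  grep -ci -- "$1" "$2"
}

# count_occurrences PATTERN FILE: every match, even several on one line.
# -o prints each match on its own line; wc -l counts them; tr strips the
# padding some wc implementations add.
count_occurrences() {
  grep -oi -- "$1" "$2" | wc -l | tr -d ' '
}
```

On a line such as "Mauris mauris", count_matching_lines contributes 1 while count_occurrences contributes 2. Adding -w to either grep call restricts matches to whole words.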

Give it a try…