4 Useful Tools to Find and Delete Duplicate Files in Linux

Organizing your home directory or even system can be particularly hard if you have the habit of downloading all kinds of stuff from the internet.

Often you may find you have downloaded the same mp3, pdf, epub (and all kind of other file extensions) and copied it to different directories. This may cause your directories to become cluttered with all kinds of useless duplicated stuff.

In this tutorial, you are going to learn how to find and delete duplicate files in Linux using rdfind and fdupes command-line tools, as well as using GUI tools called DupeGuru and FSlint.

A note of caution – always be careful what you delete on your system as this may lead to unwanted data loss. If you are using a new tool, first try it in a test directory where deleting files will not be a problem.
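
For example, you could build a throwaway sandbox with one deliberate duplicate to practice on. The paths and file names below are only an illustration; any scratch directory will do:

```shell
# Build a disposable sandbox with one known duplicate to practice on.
# (Paths are illustrative; any scratch directory will do.)
mkdir -p /tmp/dupes-test/a /tmp/dupes-test/b
echo "same content" > /tmp/dupes-test/a/song.mp3
cp /tmp/dupes-test/a/song.mp3 /tmp/dupes-test/b/song-copy.mp3   # exact duplicate
echo "different content" > /tmp/dupes-test/b/other.mp3          # not a duplicate
```

Any of the tools below should then flag song.mp3 and song-copy.mp3 as a duplicate pair while leaving other.mp3 alone.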

1. Rdfind – Finds Duplicate Files in Linux

Rdfind comes from "redundant data find". It is a free tool used to find duplicate files across or within multiple directories. It uses checksums and finds duplicates based on file contents, not just file names.

Rdfind uses an algorithm to rank the files and decide which copy is the original; the rest are considered duplicates. The ranking rules are:

  • If A was found while scanning an input argument earlier than B, A is higher ranked.
  • If A was found at a depth lower than B, A is higher ranked.
  • If A was found earlier than B, A is higher ranked.

The last rule is used particularly when two files are found in the same directory.

To install rdfind in Linux, use the following command as per your Linux distribution.

$ sudo apt-get install rdfind         [On Debian/Ubuntu]
$ sudo yum install epel-release && sudo yum install rdfind   [On CentOS/RHEL]
$ sudo dnf install rdfind             [On Fedora 22+]
$ sudo pacman -S rdfind               [On Arch Linux]

To run rdfind on a directory, simply type rdfind followed by the target directory. Here is an example:

$ rdfind /home/user
Find Duplicate Files in Linux

As you can see, rdfind saves the results in a file called results.txt located in the directory from which you ran the program. The file lists all the duplicate files that rdfind found. You can review it and remove the duplicate files manually if you want to.

Another thing you can do is to use the -dryrun option, which lists the duplicates without taking any action:

$ rdfind -dryrun true /home/user

Once you have found the duplicates, you can choose to replace them with hard links:

$ rdfind -makehardlinks true /home/user
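
To see what this does under the hood, here is a minimal sketch using plain coreutils (the file names are hypothetical): replacing a duplicate with a hard link makes both names point at the same inode, so the data is stored only once:

```shell
# Demonstrate what replacing a duplicate with a hard link means,
# using plain coreutils (file names here are hypothetical).
mkdir -p /tmp/hardlink-demo
cd /tmp/hardlink-demo
echo "shared data" > original.txt
cp original.txt duplicate.txt      # two copies: two inodes, data stored twice
ln -f original.txt duplicate.txt   # replace the copy with a hard link
stat -c '%i %n' original.txt duplicate.txt   # both names now share one inode
```

This is also why the hard-link option frees disk space without removing any file names.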

And if you wish to delete the duplicates, you can run:

$ rdfind -deleteduplicates true /home/user

To check other useful options of rdfind, consult its manual page:

$ man rdfind 

2. Fdupes – Scan for Duplicate Files in Linux

Fdupes is another program that lets you identify duplicate files on your system. It is free, open-source, and written in C. It uses the following methods to determine whether files are duplicates:

  • Comparing partial md5sum signatures
  • Comparing full md5sum signatures
  • Byte-by-byte comparison verification
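
The checksum-grouping idea can be sketched with plain coreutils (a rough illustration of the principle, not fdupes' actual implementation): hash every file, then keep only the hashes that occur more than once:

```shell
# Set up three small files: two identical, one different.
mkdir -p /tmp/fdupes-demo
printf 'aaaa' > /tmp/fdupes-demo/one.txt
printf 'aaaa' > /tmp/fdupes-demo/two.txt      # duplicate of one.txt
printf 'bbbbbb' > /tmp/fdupes-demo/three.txt  # different content and size
cd /tmp/fdupes-demo
# Checksum the files, then print only lines whose first 32 characters
# (the md5 hex digest) repeat, i.e. the duplicate groups.
md5sum *.txt | sort | uniq -w32 -D
```

A real tool first weeds out files with unique sizes so it never has to hash them, which is exactly the optimization the partial/full signature steps above build on.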

Like rdfind, it offers options to:

  • Search recursively
  • Exclude empty files
  • Show the size of duplicate files
  • Delete duplicates immediately
  • Exclude files with a different owner

To install fdupes in Linux, use the following command as per your Linux distribution.

$ sudo apt-get install fdupes         [On Debian/Ubuntu]
$ sudo yum install epel-release && sudo yum install fdupes   [On CentOS/RHEL]
$ sudo dnf install fdupes             [On Fedora 22+]
$ sudo pacman -S fdupes               [On Arch Linux]

Fdupes syntax is similar to rdfind. Simply type the command followed by the directory you wish to scan.

$ fdupes <dir>

To search files recursively, specify the -r option like this:

$ fdupes -r <dir>

You can also pass multiple directories and flag one of them to be searched recursively:

$ fdupes <dir1> -r <dir2>

To have fdupes calculate the size of the duplicate files, use the -S option:

$ fdupes -S <dir>

To gather summarized information about the found files, use the -m option:

$ fdupes -m <dir>
Scan Duplicate Files in Linux

Finally, if you want to delete all duplicates, use the -d option like this:

$ fdupes -d <dir>

Fdupes will ask which of the found files to preserve; enter the number of the file you want to keep and the rest of the set will be deleted:

Delete Duplicate Files in Linux

An approach that is definitely not recommended is the -N option which, combined with -d, preserves only the first file in each set of duplicates and deletes the rest without prompting:

$ fdupes -dN <dir>

To get a list of available options to use with fdupes, review the help page by running:

$ fdupes --help

3. dupeGuru – Find Duplicate Files in Linux

dupeGuru is an open-source, cross-platform tool that can be used to find duplicate files on a Linux system. The tool can scan either filenames or content in one or more folders. It can also find filenames that are similar to the ones you are searching for.

dupeGuru comes in versions for Windows, Mac, and Linux. Its quick fuzzy matching algorithm helps you find duplicate files within minutes. It is customizable: you can pick exactly the duplicate files you want to remove and wipe unwanted files from the system.

To install dupeGuru in Linux, use the following command as per your Linux distribution.

--------------- On Debian/Ubuntu/Mint --------------- 
$ sudo add-apt-repository ppa:dupeguru/ppa
$ sudo apt-get update
$ sudo apt-get install dupeguru
--------------- On Arch Linux --------------- 
$ sudo pacman -S dupeguru
dupeGuru – Find Duplicate Files in Linux

4. FSlint – Duplicate File Finder for Linux

FSlint is a free utility that is used to find and clean various forms of lint on a filesystem. It also reports duplicate files, empty directories, temporary files, duplicate/conflicting (binary) names, bad symbolic links, and more. It has both command-line and GUI modes.
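
A couple of the lint categories FSlint reports can be approximated with plain find if you just want a quick look (a minimal sketch; the paths are illustrative):

```shell
# Create a scratch tree with one empty directory and one dangling symlink.
mkdir -p /tmp/fslint-demo/empty-dir /tmp/fslint-demo/full-dir
touch /tmp/fslint-demo/full-dir/file.txt
ln -sf /tmp/fslint-demo/missing-target /tmp/fslint-demo/bad-link
# Empty directories:
find /tmp/fslint-demo -mindepth 1 -type d -empty
# Broken symbolic links (-xtype l is a GNU find extension):
find /tmp/fslint-demo -xtype l
```

FSlint bundles these checks (and the duplicate-file scan) behind one interface instead of separate find invocations.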

To install FSlint in Linux, use the following command as per your Linux distribution.

$ sudo apt-get install fslint         [On Debian/Ubuntu]
$ sudo yum install epel-release && sudo yum install fslint   [On CentOS/RHEL]
$ sudo dnf install fslint             [On Fedora 22+]
$ sudo pacman -S fslint               [On Arch Linux]
FSlint – Duplicate File Finder for Linux
Conclusion

These are very useful tools for finding duplicate files on your Linux system, but you should be very careful when deleting such files.

If you are unsure whether you still need a file, it is better to create a backup of it and note its directory before deleting it. If you have any questions or comments, please submit them in the comment section below.


11 thoughts on “4 Useful Tools to Find and Delete Duplicate Files in Linux”

  1. Hi! I have many duplicates, but I would like to keep the keywords of all of them when merging them into one book. My hard drive is full of duplicate files. Is there a way so that I don’t have to copy them by hand? Many people recommended Duplicate File Deleter. Thanks!

  2. Interesting tools, but I have read somewhere that using the “ls” command with the appropriate options is a simple way to find duplicate files. Am I wrong?

    • “ls” can show you file sizes, but two files being the same size is not the same as them being duplicates.

      Also, consider how you would use ls for this in a directory tree with thousands of directories and millions of files.

      Ideally, a tool will generate a list of files, optionally excluding e.g. those with 0 size. Then it will weed out files with unique sizes, since we know they don’t have dups. The rest will need checksums computed and compared; a really sophisticated utility might only fingerprint the first MB or so of each file, and then only checksum the whole files if the initial fingerprints aren’t unique. I.e., if I have 10 GB files, each of which has a unique first MB, why read through the whole files?

      I came here looking for something to find duplicate files on my Synology; pulling files over a GE network to checksum them would take *forever*. To my delight the rdfind binary works just fine as-is.

  3. Using rdfind helped me solve an issue with the number of inodes I had left on my drive. Running the command using the `-makehardlinks` option reduced my inode usage by 70% and my data usage by 23 GB!

    Thanks for the useful tutorial!

    • Sure it does.

      sh-4.3# /root/rdfind .
      Now scanning ".", found 7883 files.
      Now have 7883 files in total.
      Removed 0 files due to nonunique device and inode.
      Total size is 3866881470 bytes or 4 GiB
      Removed 4229 files due to unique sizes from list.3654 files left.
      Now eliminating candidates based on first bytes:removed 10 files from list.3644 files left.
      Now eliminating candidates based on last bytes:removed 3050 files from list.594 files left.
      Now eliminating candidates based on sha1 checksum:removed 396 files from list.198 files left.
      It seems like you have 198 files that are not unique
      Totally, 22 MiB can be reduced.
      Now making results file results.txt
      
      sh-4.3# find . -type d |wc
           47      47    1226
      
      sh-4.3# find . -type f |wc
         7884    8002  337870
      
