How to Split a Large ‘tar’ Archive into Multiple Files of a Certain Size

Aaron Kili

Aaron Kili is a Linux and F.O.S.S enthusiast, an upcoming Linux SysAdmin, web developer, and currently a content creator for TecMint who loves working with computers and strongly believes in sharing knowledge.

27 Responses

  1. Reza says:

    Wow, after three years no one noticed that the join command had an error in the wildcard: it should have been part* instead of parta*. There are also a few more errors in the article, such as “begin with .php” instead of “end with .php”, plus several truncated commands and mish-mashed commands and switches that might misbehave on some distributions. For example, the .flv extraction is nowhere specified on the command line; you might mean “video directory”, not *.flv. I count eight errors in total, so this article seriously needs polish. I might point out the rest if this comment is approved, or someone else can.

  2. Reza says:

    Wow, I can’t believe that no one noticed after all these years that the join command has an error in the wildcard. You’ll get only a fraction of the file if you do:

    # cat home.tar.bz2.parta* >backup.tar.gz.joined

    should have been:

    # cat home.tar.bz2.part* >backup.tar.gz.joined
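The difference between the two wildcards can be checked with a small experiment. This is a minimal sketch, assuming the parts were produced by split with its default two-letter suffixes (aa, ab, ... az, ba, ...); the names sample.bin and sample.part are made up for the illustration. With more than 26 parts, parta* silently misses everything from ba onward.

```shell
#!/bin/sh
# Sketch only: sample.bin and the sample.part prefix are hypothetical names.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# 30 KiB of random data, cut into 1 KiB pieces, gives 30 parts.
dd if=/dev/urandom of=sample.bin bs=1024 count=30 2>/dev/null
split -b 1024 sample.bin sample.part

# Default suffixes run aa..az (26 parts), then continue with ba, bb, ...
cat sample.parta* > joined_a.bin     # picks up only the parts aa..az
cat sample.part*  > joined_all.bin   # picks up every part, in sorted order

wc -c < joined_a.bin
wc -c < joined_all.bin
cmp sample.bin joined_all.bin && echo "part* restores the original"
```

For archives with 26 parts or fewer, both wildcards happen to match the same files, which may be why the problem went unnoticed.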
  3. Greg Palmine says:

    There is a tool called tarsplitter for permanently splitting a tar into roughly equal-sized parts. It is slightly different from what is described here.

  4. Chris Kennedy says:

    Aaron, well done. Terse and accurate. No mention of the tar -p option, though. Re I/O performance: the compression option means only compressed bits are written, and gzip -9 minimizes the bits to write.

    The most I/O-efficient pipe is something like:

    # tar cf - src_dir/ | gzip -9 > tarball.tar.gz
  5. sunny says:

    It might be a very silly question.

    I have more than 1,00,000 files in a directory (60 GB of data) and I want to tar them into parts of 128 MB.

    What will happen to a file on the tape boundary?

    I mean, if the 500th file is reached at around 116 MB and its size is 15 MB, will the file get corrupted, or does the system understand that the file cannot be accommodated in that part and put it in the next part?

    • Aaron Kili says:


      Tar will create many tarballs of 128 MB and a few smaller than that. It is intelligent enough to allocate all your 100,000 files so that no files are corrupted.
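What happens at a part boundary can be tried out on a small scale. This is a minimal sketch, assuming the parts are produced by piping the archive through split, as the join command quoted earlier in this thread suggests; all file and directory names are hypothetical. In that setup the stream is cut at fixed byte offsets, so a file may span two parts, and it comes back intact once all parts are concatenated in order.

```shell
#!/bin/sh
# Sketch only: all file and directory names here are hypothetical.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Five 3 KiB files; with 4 KiB parts, some files must cross a part boundary.
mkdir data
for i in 1 2 3 4 5; do
  dd if=/dev/urandom of="data/file$i.bin" bs=1024 count=3 2>/dev/null
done

# Write the archive to stdout and let split cut the stream every 4096 bytes.
tar -cf - data | split -b 4096 - data.tar.part

# Rejoin all parts in order and extract into a separate directory.
mkdir restored
cat data.tar.part* | tar -xf - -C restored

# diff -r prints nothing and exits 0 when both trees are identical.
diff -r data restored/data && echo "all files intact"
```

A single part on its own is generally not a usable archive here; the safety comes from rejoining every part before extracting.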

      • sunny says:

        Thanks Aaron.

        I am also experiencing very slow speeds while tarring so many files and such huge data.

        For 7,00,000 files (50 GB) the system is taking 20 hours. Is there a way I can speed this up?

        • Aaron Kili says:


          If you are using tar without compression, the operation will be a little faster. Enabling compression slows down the whole process; if you are using gzip, for instance, include the --fast flag to speed it up at the cost of a larger archive (--best does the opposite; read the man page for more info).

          Secondly, ensure that there are not a lot of I/O operations running on the system, nor any other CPU-intensive processes.
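The trade-off between the two gzip flags can be seen on a small sample. This is a minimal sketch with hypothetical file names; it only compares output sizes and confirms that both archives decompress to the same tar stream. To compare speed on real data, each gzip line can be prefixed with time.

```shell
#!/bin/sh
# Sketch only: the logs directory and archive names are hypothetical.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Some compressible sample data.
mkdir logs
for i in 1 2 3 4 5; do
  seq 1 20000 > "logs/log$i.txt"
done

# Build the tar once, then compress it at both extremes.
tar -cf logs.tar logs
gzip --fast -c logs.tar > logs.fast.tar.gz   # same as -1: quickest, larger output
gzip --best -c logs.tar > logs.best.tar.gz   # same as -9: slowest, smaller output
ls -l logs.fast.tar.gz logs.best.tar.gz

# Whatever the level, decompression gives back the identical tar stream.
gzip -dc logs.fast.tar.gz | cmp - logs.tar && echo "fast: round trip ok"
gzip -dc logs.best.tar.gz | cmp - logs.tar && echo "best: round trip ok"
```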

  6. Jundi says:

    Is this applicable to binary files, like Oracle RMAN backups?

  7. Andrew Nyago says:

    How do we combine the “Parallel Implementation of GZip” here…

  8. Bill says:

    It would be better to use gzip compression when planning on splitting. The reason is that the parts can still be used without the need to rejoin the files.

    e.g. cat filea fileb filec filed | tar xzf -

    The same procedure does not work with bzip2 files, as bunzip2 requires a memory mapping of the files.

    You can also gzip after splitting if you want: when gzip files are concatenated together, the procedure still works exactly as described above.
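Both gzip variants described above can be tried on a throwaway directory. This is a minimal sketch with hypothetical names, assuming a tar that accepts -z when reading from standard input (GNU tar does); it has not been checked against every tar implementation.

```shell
#!/bin/sh
# Sketch only: docs, out1, out2 and the part prefixes are hypothetical names.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

mkdir docs
for i in 1 2 3; do
  dd if=/dev/urandom of="docs/doc$i.bin" bs=1024 count=8 2>/dev/null
done

# Variant 1: compress first, then split the compressed stream into 10 KiB parts.
tar -czf - docs | split -b 10240 - docs.tar.gz.part

# Stream the parts straight into tar; no joined archive is written to disk.
mkdir out1
cat docs.tar.gz.part* | tar -xzf - -C out1
diff -r docs out1/docs && echo "streamed extract matches"

# Variant 2: split the uncompressed tar, then gzip each part separately.
tar -cf - docs | split -b 10240 - docs.tar.part
gzip docs.tar.part*

# Concatenated gzip members decompress as one continuous stream.
mkdir out2
cat docs.tar.part*.gz | tar -xzf - -C out2
diff -r docs out2/docs && echo "per-part gzip also matches"
```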

  9. Dan St.Andre says:

    This is an effective explanation of a solution to a common problem. There is another side to these issues that involves initial creation of the large tar archive. The processing to create the archive is likely to take a long time (wall clock). Various events might interrupt that processing before it is complete. Example interruptions might include loss of power, battery depletion, loss of connection to the target disk(s), and so on.

    The tar command does not have the ability to remember which input files and folders have already been processed and resume with the remaining to-be-done files and folders. Instead, the list of files submitted to tar gets determined by the input selection GLOB at the tar command line. I’m certain that readers would be interested in any technique that will enable (1) creation of a list of files, (2) gathering of checkpoint details while files are processed, (3) restart from the most recent checkpoint following an interruption.
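The three steps listed above can be approximated around tar with ordinary shell tools. This is a rough sketch of one possible workaround, not a feature of tar: it writes one archive per batch of files instead of a single archive, and every name in it (files.list, the batch. prefix, done.log, archive_batches) is hypothetical. It also assumes file names contain no newlines.

```shell
#!/bin/sh
# Sketch only, not a tar feature. All names here are hypothetical.
set -eu
src=$(mktemp -d)
work=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "data $i" > "$src/file$i.txt"
done

# (1) Build the file list once, so a restart works from the same input.
( cd "$src" && find . -type f | sort ) > "$work/files.list"

# Cut the list into batches of 4 names each: batch.aa, batch.ab, batch.ac
split -l 4 "$work/files.list" "$work/batch."
touch "$work/done.log"

archive_batches() {
  for b in "$work"/batch.??; do
    name=$(basename "$b")
    # (3) On restart, skip batches already logged as complete.
    if grep -qxF "$name" "$work/done.log"; then
      continue
    fi
    # Write under a temporary name so a half-written archive is never
    # mistaken for a finished one.
    tar -cf "$work/$name.tar.tmp" -C "$src" -T "$b"
    mv "$work/$name.tar.tmp" "$work/$name.tar"
    # (2) Checkpoint: log the batch only after its archive is complete.
    echo "$name" >> "$work/done.log"
  done
}

archive_batches
cat "$work/done.log"
```

In real use the list, the batches, and the log would be kept between runs and only archive_batches would be called again after an interruption; at most the batch that was in progress is redone.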

    • Ravi Saive says:


      I totally agree with your point, but in fact it is not possible to resume a broken or interrupted process with the tar command, and I don’t think there is any tool that does so.

    • Aaron Kili K says:

      @Dan St.Andre

      Thanks for the appreciation. As you expressed your concern so well, I set out to search for any other powerful Unix/Linux archive-creating utilities apart from TAR that can handle the issues you are trying to bring to light.

      TAR is a Unix/Linux standard for this particular purpose, and as far as I know (and I stand to be corrected), there is no other utility that solves the shortcomings of TAR you have mentioned above. Probably someone will develop a utility that resolves these issues in the future, but at the moment that is just the way it works.

      I am still on the lookout for a technique or method like the one you have explained above, and if you find one, please let us know. Many thanks for your feedback.

    • Lewis says:

      The only workaround that might work (I have not tested it personally) is snapshot recovery of a virtual machine running the task. This will not recover completely to the point of the blackout, though, only to the last successful snapshot; furthermore, it might involve more resource overhead (disk space, hosting, etc.) than initially desired.

      • Aaron Kili K says:


        Many thanks for sharing your thoughts on the matter. As you have mentioned, it can be a long process and requires many more resources, but if necessary a user can give it a try.

  10. Lars says:

    Nice explanation, but please mention how to puzzle them together again …

    • Aaron Kili K says:


      Thanks for the feedback, we shall update the article to include that.

    • Leonardo Bertolo says:

      Hi, you could just issue

      cat filea fileb filec filed > file.tar

      And there you have your original file back.

      • Ravi Saive says:


        Thanks for the tip, I didn’t know about this handy trick. We’ve just added instructions to the writeup for joining the files back together after splitting a large tar archive. Hope you like it.

    • Ravi Saive says:


      Thanks for finding it useful. As per your request, we’ve added a section to the writeup on joining tar files back together after splitting.
