Moving Retain Archives to Another Disk Efficiently Using Linux Rsync Command

  • 7019348
  • 10-Nov-2014
  • 07-Aug-2017

Environment


Retain (all versions up to date)
Linux

Situation


I want to move my Retain archives to another disk (everything under the "archive" directory in the Retain storage area). Copying the whole archive directory in one operation can take weeks or longer. Is there a way to speed this up?

NOTE: If you are moving Retain from one server to another, copy not only the "archive" directory but also all of the other directories in the same storage path: index, export, xml, ebdb, and license. You do not need to copy the backup directory. This article focuses on the archive directory because it takes the longest to copy: days or weeks, depending on the number of files, their sizes, and so on.


Resolution

Yes, by running several Linux rsync sessions in parallel. Support is developing a tool to automate and simplify tasks such as this one.

Here is the concept. Run one rsync command per top-level directory, using this general form (your destination path will likely differ from the /storage/archive path used in the examples):
rsync -ravP [top-level directory name] [ipaddress/hostname]:/[destination directory path]
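-r = recursive (already implied by -a, but still needed if you omit -a)
-a = archive mode (preserves permissions, ownership, timestamps, and symbolic links; ownership is preserved only when rsync runs as root). NOTE: omit this option if copying to a Windows server; it is not necessary there and produces a non-fatal error.
-v = verbose (lists each file as it is copied)
-P = shows the progress of each file and keeps partially transferred files if the copy is interrupted (equivalent to --partial --progress)
[top-level directory name] = one of the 256 directories directly under the "archive" directory
[ipaddress/hostname] = the IP address or DNS hostname of the destination server to which the files are being copied (rsync connects over SSH by default; leave out the host and the colon when the destination disk is on the same server)
[destination directory path] = the path to the new archive directory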
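The following example assumes that both the old disk and the new (destination) disk are mounted on the same server, and that the command is run from inside the old "archive" directory (the name 00 is a relative path):
rsync -ravP 00 /storage/archive/
Because 00 has no trailing slash, rsync creates /storage/archive/00 on the new disk and copies everything under 00 into it.

The "archive" directory contains 256 top-level directories named with two-digit hexadecimal numbers (00 through FF). Each of those directories has 256 subdirectories, and each of those subdirectories in turn has 256 subdirectories of its own, so rsync has a very large number of directories to work through.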
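As a quick check of the numbers, the following sketch prints the 256 top-level names and the directory count at each level. The function name top_level_dirs is hypothetical, and it assumes upper-case hex names as in the examples in this article (0A, 3F, and so on).

```shell
#!/bin/bash
# Hypothetical helper: print the 256 top-level directory names, 00 through FF,
# as two-digit upper-case hex (matching names such as 0A and 3F in this article).
top_level_dirs() {
    local i
    for (( i = 0; i < 256; i++ )); do
        printf '%02X\n' "$i"
    done
}

# Number of directories at each level of the tree described above.
echo "Level 1: $(( 256 ))"
echo "Level 2: $(( 256 * 256 ))"
echo "Level 3: $(( 256 * 256 * 256 ))"
```

The third level alone works out to 16,777,216 directories. Running `find . -mindepth 1 -maxdepth 1 -type d | wc -l` inside your own archive directory should report 256 if your layout matches.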

To find out how many parallel rsync sessions work best, run rsync on directory 00 and time it. Then run it on 01 and 02 at the same time (for example, in two separate terminal windows) and time that. If 00 alone took 1 hour, the 01 and 02 runs together should finish in less than two hours of wall-clock time; otherwise running them in parallel gains you nothing. This comparison assumes the directories hold similar amounts of data.
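The same timing test can also be run from a single shell. In this sketch, time_parallel is a hypothetical helper: it starts one rsync per directory in the background, waits for all of them, and prints the elapsed wall-clock seconds. Each session writes to its own log file (LOG_DIR defaults to /tmp, an assumption) so the verbose output does not interleave. The RSYNC variable exists only so the function can be dry-tested; leave it unset for real runs.

```shell
#!/bin/bash
# Hypothetical helper for the timing test described above.
# Run it from inside the source "archive" directory, because the
# directory names (00, 01, ...) are relative paths.
#
# Usage: time_parallel DEST DIR [DIR...]
time_parallel() {
    local dest=$1
    shift
    local log_dir=${LOG_DIR:-/tmp}    # assumed log location
    local start=$SECONDS
    local dir
    for dir in "$@"; do
        # One background rsync per directory, each with its own log file.
        "${RSYNC:-rsync}" -ravP "$dir" "$dest" > "$log_dir/rsync-$dir.log" 2>&1 &
    done
    wait                              # block until every session has finished
    echo $(( SECONDS - start ))       # elapsed wall-clock seconds
}
```

For example, `time_parallel /storage/archive/ 00` gives the baseline, and `time_parallel /storage/archive/ 01 02` gives the two-session figure to compare against it.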

Keep adding sessions (three, four, and so on) until you find your sweet spot. At some point, running more rsync sessions in parallel will stop increasing overall throughput or will even reduce it, typically because the disks or the network are saturated. Use the largest number of sessions that still improves throughput.
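To compare trials fairly, convert each one into a throughput figure: directories copied divided by elapsed time. The hypothetical helper below reads one line per trial in the form SESSIONS SECONDS and prints the session count with the highest throughput. Like the manual test, it assumes each trial's directories hold similar amounts of data.

```shell
#!/bin/bash
# Hypothetical helper: pick the session count with the best throughput.
# Input: one line per trial, "SESSIONS SECONDS", where SESSIONS is how many
# directories were copied in parallel and SECONDS is the wall-clock time.
find_sweet_spot() {
    awk '$2 > 0 {
            rate = $1 / $2                  # directories per second
            if (n == "" || rate > best) { best = rate; n = $1 }
         }
         END { print n }'
}
```

With made-up timings, `printf '1 3600\n2 5400\n3 7800\n4 12000\n' | find_sweet_spot` prints 3: in that example, three sessions finish a directory every 2,600 seconds on average, versus 2,700 seconds for two sessions and 3,000 for four.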

Next, divide the 256 top-level directories under "archive" into as many batches as the number of parallel sessions you found most efficient. Assign each batch to its own bash script, which runs rsync on one directory and, when that finishes, runs rsync on the next directory, and so on until it has worked through its whole list.

For example, if 4 parallel rsync sessions turn out to be your sweet spot, 256 divided by 4 is 64, so you would create 4 bash scripts, each responsible for 64 directories.

BashScript1 then takes directories 00 through 3F, BashScript2 takes 40 through 7F, BashScript3 takes 80 through BF, and BashScript4 takes C0 through FF. All four scripts run at the same time.
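Rather than typing each batch script by hand, you can generate them. In this sketch, make_batches is a hypothetical helper that writes BashScript1.sh through BashScriptN.sh into an output directory, using the same rsync -ravP options as this article. If 256 does not divide evenly by the session count, the last script also takes the leftover directories.

```shell
#!/bin/bash
# Hypothetical helper: write one batch script per parallel rsync session.
# Usage: make_batches SESSIONS DEST OUT_DIR
make_batches() {
    local sessions=$1 dest=$2 out_dir=$3
    if (( sessions < 1 || sessions > 256 )); then
        echo "SESSIONS must be between 1 and 256" >&2
        return 1
    fi
    local per=$(( 256 / sessions ))
    local s i start end script
    for (( s = 0; s < sessions; s++ )); do
        start=$(( s * per ))
        # The last script also takes any remainder when 256 does not divide evenly.
        end=$(( s == sessions - 1 ? 256 : (s + 1) * per ))
        script="$out_dir/BashScript$(( s + 1 )).sh"
        {
            echo '#!/bin/bash'
            for (( i = start; i < end; i++ )); do
                # %02X gives two-digit upper-case hex names (00 ... FF);
                # %q quotes the destination safely for the shell.
                printf 'rsync -ravP %02X %q\n' "$i" "$dest"
            done
        } > "$script"
        chmod +x "$script"
    done
}
```

For example, `make_batches 4 /storage/archive/ ~/batches` would produce four scripts covering the ranges listed above. Writing them outside the archive directory keeps the source tree untouched.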

If I look inside BashScript1, I might see something like this:
#!/bin/bash
# Run from inside the source archive directory (00, 01, ... are relative paths).
rsync -ravP 00 /storage/archive/
rsync -ravP 01 /storage/archive/
rsync -ravP 02 /storage/archive/
rsync -ravP 03 /storage/archive/
rsync -ravP 04 /storage/archive/
rsync -ravP 05 /storage/archive/
rsync -ravP 06 /storage/archive/
rsync -ravP 07 /storage/archive/
rsync -ravP 08 /storage/archive/
rsync -ravP 09 /storage/archive/
rsync -ravP 0A /storage/archive/
rsync -ravP 0B /storage/archive/
rsync -ravP 0C /storage/archive/
rsync -ravP 0D /storage/archive/
rsync -ravP 0E /storage/archive/
rsync -ravP 0F /storage/archive/
rsync -ravP 10 /storage/archive/
rsync -ravP 11 /storage/archive/
rsync -ravP 12 /storage/archive/
rsync -ravP 13 /storage/archive/
rsync -ravP 14 /storage/archive/
rsync -ravP 15 /storage/archive/
rsync -ravP 16 /storage/archive/
rsync -ravP 17 /storage/archive/
rsync -ravP 18 /storage/archive/
rsync -ravP 19 /storage/archive/
rsync -ravP 1A /storage/archive/
rsync -ravP 1B /storage/archive/
rsync -ravP 1C /storage/archive/
rsync -ravP 1D /storage/archive/
rsync -ravP 1E /storage/archive/
rsync -ravP 1F /storage/archive/
rsync -ravP 20 /storage/archive/
rsync -ravP 21 /storage/archive/
rsync -ravP 22 /storage/archive/
rsync -ravP 23 /storage/archive/
rsync -ravP 24 /storage/archive/
rsync -ravP 25 /storage/archive/
rsync -ravP 26 /storage/archive/
rsync -ravP 27 /storage/archive/
rsync -ravP 28 /storage/archive/
rsync -ravP 29 /storage/archive/
rsync -ravP 2A /storage/archive/
rsync -ravP 2B /storage/archive/
rsync -ravP 2C /storage/archive/
rsync -ravP 2D /storage/archive/
rsync -ravP 2E /storage/archive/
rsync -ravP 2F /storage/archive/
rsync -ravP 30 /storage/archive/
rsync -ravP 31 /storage/archive/
rsync -ravP 32 /storage/archive/
rsync -ravP 33 /storage/archive/
rsync -ravP 34 /storage/archive/
rsync -ravP 35 /storage/archive/
rsync -ravP 36 /storage/archive/
rsync -ravP 37 /storage/archive/
rsync -ravP 38 /storage/archive/
rsync -ravP 39 /storage/archive/
rsync -ravP 3A /storage/archive/
rsync -ravP 3B /storage/archive/
rsync -ravP 3C /storage/archive/
rsync -ravP 3D /storage/archive/
rsync -ravP 3E /storage/archive/
rsync -ravP 3F /storage/archive/
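To start all of the batch scripts at once, something like the following sketch can be used. run_batches is a hypothetical helper: it runs each script in the background with the source archive directory as its working directory (so the relative names 00, 01, ... resolve), writes one log per script, and waits until every batch has finished. Because the copy can take days, consider running it inside screen or tmux, or under nohup, so it survives a logout.

```shell
#!/bin/bash
# Hypothetical helper: run every batch script in parallel and wait for all of them.
# Usage: run_batches SRC_ARCHIVE_DIR LOG_DIR SCRIPT [SCRIPT...]
run_batches() {
    local src=$1 log_dir=$2
    shift 2
    local script abs name
    for script in "$@"; do
        abs=$(readlink -f -- "$script")    # absolute path, since we cd below
        name=$(basename "$script" .sh)
        # Each batch runs in its own subshell whose working directory is the
        # source archive, so relative names such as 00 resolve correctly.
        ( cd -- "$src" && exec bash "$abs" ) > "$log_dir/$name.log" 2>&1 &
    done
    wait
    echo "All batches finished."
}
```

A typical call would pass your old archive directory, a log directory elsewhere, and the generated scripts, for example `run_batches /path/to/old/archive ~/batch-logs ~/batches/BashScript*.sh` (all three paths are placeholders).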

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 2404.