Products | Versions
---|---
TIBCO LogLogic Enterprise Virtual Appliance | all versions
This article includes a moderate amount of background information explaining how the commands used in this procedure work. If you want to skip directly to the commands themselves, refer to the Resolution section.
The initial data migration process occurs as the first part of an HA configuration that has been newly enabled, or when a node has left the cluster and then eventually rejoins. This process is a one-way copy from master to slave, initiated by the slave, which pulls the necessary data from the master so that both nodes hold identical information. The initial data migration brings a slave node up to date with the data already on the master node; once the existing data has been copied, the initial data migration ends and regular minutely replication runs as an ongoing process until HA is disabled or a node drops out of the cluster. The same applies to the Data Migration function when migrating data from one appliance to another.
There are multiple phases to the initial data migration. This article won't go into the details of every phase but rather focuses on just a couple of them. Replication occurs both at the file system level and at the transaction level within the database; this article focuses on the file system. At the file system level there are three types of data that must be replicated: raw data, index data, and full database tables that are no longer undergoing transaction activity.
A binary called myisam_rsync manages the database table replication. A binary called volume_rsync manages the raw and index data replication.
When these two processes are executing they spawn a child rsync process which tunnels through ssh to copy data from master to slave. We have to do a little digging to get more insight into the synchronization process. During phase 2 of the initial data migration, two activities must occur: because the data migration is a one-way copy, any raw and index data files that exist only on the slave are deleted, and after that action completes the rsync process builds a list of files to transfer to the slave.
While the file deletion occurs nothing is logged to any log file or displayed in the GUI, so the only way to know the progress is to attach to the rsync process using strace. To do that, first run the following ps command to see the list of rsync-related processes:
Logapp root:/loglogic/tmp/volrsync_::ffff:192.168.72.182 0$ ps -ef | grep rsync
root 19153 18949 0 May14 ? 00:00:02 /loglogic/bin/volume_rsync --verbose --all-files --delete
root 19155 19153 0 May14 ? 00:00:00 /loglogic/bin/volume_rsync --verbose --all-files --delete
root 19156 19155 0 May14 ? 00:01:10 /loglogic/bin/rsync -6 --rsync-path=/loglogic/bin/rsync --verbose --stats --progress --timeout=86400 --rsh=ssh -t -o PasswordAuthentication=no -o StrictHostKeyChecking=no --whole-file --delete --delete-during --sparse --check-file-blocks --exclude=????/??/??/????index/* --links --owner --group --times --perms --temp-dir=/loglogic/tmp//volrsync_::ffff:192.168.72.182 --delay-updates --recursive toor@[::ffff:192.168.72.181]:/loglogic/data/vol1/ /loglogic/data/vol1/
root 19158 19156 2 May14 ? 00:20:01 ssh -t -o PasswordAuthentication=no -o StrictHostKeyChecking=no -l toor ::ffff:192.168.72.181 /loglogic/bin/rsync --server --sender -vlWogtprSe.Lsf --timeout=86400 --check-file-blocks . /loglogic/data/vol1/
root 19406 19156 3 May14 ? 00:22:03 /loglogic/bin/rsync -6 --rsync-path=/loglogic/bin/rsync --verbose --stats --progress --timeout=86400 --rsh=ssh -t -o PasswordAuthentication=no -o StrictHostKeyChecking=no --whole-file --delete --delete-during --sparse --check-file-blocks --exclude=????/??/??/????index/* --links --owner --group --times --perms --temp-dir=/loglogic/tmp//volrsync_::ffff:192.168.72.182 --delay-updates --recursive toor@[::ffff:192.168.72.181]:/loglogic/data/vol1/ /loglogic/data/vol1/
The processes listed by the ps command are shown with their own process id and their parent process id. The process to focus on is the first /loglogic/bin/rsync process in the listing above (PID 19156 in this example). If you try attaching to the ssh process or one of the other rsync processes you won't see anything recognizable because the data will be encrypted. The process id of that first rsync process is the value you need to use with the strace command.
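If you would rather not pick the process id out of the ps output by eye, a one-liner such as the following can help isolate it. This is only a sketch: it assumes the local, unencrypted rsync child is the one launched with the -6 switch as shown above, so verify the result against the full ps listing before attaching to it.

ps -ef | grep '/loglogic/bin/rsync -6' | grep -v grep | awk '{print $2}' | head -n1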
Now execute strace with the -p switch, which attaches to an existing process; in this case that will be the first rsync process listed above. Strace is a system call trace program included with many Linux distributions, including the one used by LogLogic LMI. Because it can see system calls, a lot of information will be displayed on the screen. Below is just a small snippet but, as you can see, the information is recognizable because it contains the names and BFQ paths of the files being deleted. Although it isn't obvious from this small amount of output, rsync deletes these files in sequential directory order, which means we can use that order to monitor progress. Rsync will continue deleting files through the directories that exist on the slave but not on the master. You can run the du command on the master and compare against the same directories on the slave to see in which directories this activity is occurring (a small example follows the strace output below).
Logapp root:/loglogic/tmp/volrsync_::ffff:192.168.72.182 0$ strace -p 19156
unlink("2018/04/30/2100/rawdata_10057_1525125420_180-365.txt.gz") = 0
write(1, "deleting 2018/04/30/2100/rawdata"..., 65) = 65
unlink("2018/04/30/2100/rawdata_10054_1525125240_180-365.txt.gz") = 0
write(1, "deleting 2018/04/30/2100/rawdata"..., 65) = 65
unlink("2018/04/30/2100/rawdata_10051_1525125060_180-365.txt.gz") = 0
write(1, "deleting 2018/04/30/2100/rawdata"..., 65) = 65
unlink("2018/04/30/2100/rawdata_10048_1525124880_180-365.txt.gz") = 0
write(1, "deleting 2018/04/30/2100/rawdata"..., 65) = 65
unlink("2018/04/30/2100/rawdata_10045_1525124700_180-365.txt.gz") = 0
Once the deletions are complete we must use a different procedure to monitor the raw data copy. Unfortunately this procedure does not work as well for index data. The volume_rsync program is designed to use a directory with a name like /loglogic/tmp/volrsync_::ffff:192.168.72.182 (where 192.168.72.182 is the IP of the slave node) to store each file rsync'ed from master to slave before the slave node moves the file from that location to its final location in the BFQ. We can use this to our advantage.
When you perform a directory listing on /loglogic/tmp/volrsync_::ffff:192.168.72.182 while the raw data copy is ongoing you'll see a listing like this:
Logapp root:/loglogic/tmp/volrsync_::ffff:192.168.72.182 0$ ls -la
total 100112
drwx------ 2 root root 4096 May 17 13:45 .
drwxrwxrwx 9 root root 4096 May 17 13:44 ..
-rw------- 1 root root 1024925810 May 17 13:45 .rawdata_10045_1521134700_180-365.txt.gz.ufjWLw
The filename should change every second or so; larger files take longer to copy, so they may remain in the directory a little longer before the transfer finishes. The following long command parses the filename to monitor progress through the BFQ structure: it pulls the epoch timestamp out of each file and runs it through the date command to convert it to a user-friendly format. Note that the date command displays the time in the system's local timezone, while BFQ data is stored in directories named using UTC, so be sure to account for that if your appliance is not already using UTC (a UTC-based variant is shown after the sample output below). Wrapping this in the watch command provides a convenient way to monitor the progress. The command is written to be executed from /loglogic/tmp/volrsync_::ffff:192.168.72.182, but that can be changed.
Logapp root:/loglogic/tmp/volrsync_::ffff:192.168.72.182 0$ watch -n2 "ls -1a | grep data | cut -d. -f2 | cut -d_ -f3 | awk '{print \"@\"\$1}' | xargs date -d"
The output will look like this and the dates will update every 2 seconds:
Every 2.0s: ls -1a | grep data | cut -d. -f2 | cut -d_ -f3 | awk '{print "@"$1}' | xargs date -d Thu May 17 13:51:33 2018
Thu Mar 15 17:25:00 UTC 2018
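If your appliance is not set to UTC, you can build the timezone conversion into the command itself by printing each epoch as a UTC-based BFQ directory name instead of a local date. This is a sketch of a variant, assuming the hourly year/month/day/HHMM directory layout visible in the strace output above:

Logapp root:/loglogic/tmp/volrsync_::ffff:192.168.72.182 0$ watch -n2 "ls -1a | grep data | cut -d. -f2 | cut -d_ -f3 | xargs -I{} date -u -d @{} +%Y/%m/%d/%H00"

With the example file shown earlier this prints 2018/03/15/1700, which can be compared directly against the directory names on the master.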
Because rsync builds its list of files to transfer before it begins copying, files created on the master appliance after that point must also be copied to the slave; those will be processed by another phase of the initial data migration process. So don't expect the date above to necessarily reach the current date before the rsync command finishes. If there isn't much data to process the rsync will be fast, which minimizes how much time passes and how much data must be copied in a subsequent pass; but if the rsync takes days to copy a large data set, there may be a discrepancy of multiple days between the current date and the point at which the rsync stops copying data as part of phase 2.
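If you want a rough sense of how far behind the copy is, you can subtract the epoch embedded in the current filename from the current time. A minimal sketch, using the epoch value from the example filename above:

echo $(( ($(date +%s) - 1521134700) / 86400 )) days behind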
After the raw data copy completes, index data is copied as part of phase 3 (out of 8) of the initial data migration. The index files are also placed in the same temporary directory, but their filenames do not contain any timestamps we can leverage; the only data point we can monitor is the file size. When the sys.log (/var/log/sys.log) shows that the migration has reached phase 3 of 8, switch to the monitoring command below to follow the progress of the individual index file transfers. Note that each hourly index set contains the same filenames, so it won't be possible to tell where in the BFQ structure the rsync process is. However, this command will at least show that progress is still being made, as long as the filenames and file sizes keep changing:
Logapp root:/loglogic/tmp/volrsync_::ffff:192.168.72.182 0$ watch -n2 "ls -la | tail -n1 | awk '{print \"size: \" \$5\", name: \" \$9}'"
The output will look like this and the filename and size will update every 2 seconds:
Every 2.0s: ls -la | tail -n1 | awk '{print "size: " $5", name: " $9}' Thu May 17 14:07:36 2018
size: 24955, name: _4.prx.VTiyj0
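As noted above, the cue for switching to this last command is the phase transition recorded in /var/log/sys.log. One way to watch for it is shown below; the exact message wording is not guaranteed, so treat the grep pattern as an assumption and adjust it to match the entries you actually see in your sys.log:

tail -f /var/log/sys.log | grep -i phase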