restic option to configure S3 region

If you have been relying on restic via rclone to talk to non-default regions, you may want to check out the just-released restic version 0.9.6. To me the region issue appears to be fixed when working with Oracle Cloud Infrastructure (OCI) object storage. Below is a test accessing the Phoenix endpoint with the new -o option.

# restic -r s3:<tenancy_name>.compat.objectstorage.us-phoenix-1.oraclecloud.com/restic-backups snapshots -o s3.region="us-phoenix-1"
repository <....> opened successfully, password is correct
ID        Time                 Host                          Tags        Paths
----------------------------------------------------------------------------------------
f23784fd  2019-10-27 05:10:02  host01.domain.com  mytag     /etc
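
For completeness, the s3 backend also expects the usual credentials and repository password in the environment; roughly what gets exported before running a command like the one above (placeholder values; for OCI these are customer secret keys):

# export AWS_ACCESS_KEY_ID=<access_key>
# export AWS_SECRET_ACCESS_KEY=<secret_key>
# export RESTIC_PASSWORD=<repository_password>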

Restic: create backup and set tag with date logic

Also see my previous post https://blog.ls-al.com/bash-date-usage-for-naming if you are interested; this post is similar but more specific to restic tagging.

Below is a test script and a test run. At restic backup time I create a tag so that I can later run snapshot forget based on tags.

root@pop-os:/tmp# cat backup-tags.sh 
#!/bin/bash

create_tag () {
  tag="daily"
  if [ $(date +%a) == "Sun" ]; then tag="weekly" ; fi
  if [ $(date +%d) == "01" ]; then 
   tag="monthly"
   if [ $(date +%b) == "Jan" ]; then
     tag="yearly"
   fi
  fi
}
create_tag
echo "backup policy: " $tag

create_tag_unit_test () {
  for i in {1..95}
  do 
      tdate=$(date -d "+$i day")
      tag="daily"
      if [ $(date -d "+$i day" +%a) == "Sun" ]; then tag="weekly" ; fi
      if [ $(date -d "+$i day" +%d) == "01" ]; then
      tag="monthly"
        if [ $(date -d "+$i day" +%b) == "Jan" ]; then
          tag="yearly"
        fi
      fi
  printf "%s - %s - %s | " "$(date -d "+$i day" +%d)" "$(date -d "+$i day" +%a)" "$tag" 
  if [ $(( $i %5 )) -eq 0 ]; then printf "\n"; fi
  done
}
create_tag_unit_test

root@pop-os:/tmp# ./backup-tags.sh 
backup policy:  daily
22 - Fri - daily      | 23 - Sat - daily      | 24 - Sun - weekly     | 25 - Mon - daily      | 26 - Tue - daily      | 
27 - Wed - daily      | 28 - Thu - daily      | 29 - Fri - daily      | 30 - Sat - daily      | 01 - Sun - monthly    | 
02 - Mon - daily      | 03 - Tue - daily      | 04 - Wed - daily      | 05 - Thu - daily      | 06 - Fri - daily      | 
07 - Sat - daily      | 08 - Sun - weekly     | 09 - Mon - daily      | 10 - Tue - daily      | 11 - Wed - daily      | 
12 - Thu - daily      | 13 - Fri - daily      | 14 - Sat - daily      | 15 - Sun - weekly     | 16 - Mon - daily      | 
17 - Tue - daily      | 18 - Wed - daily      | 19 - Thu - daily      | 20 - Fri - daily      | 21 - Sat - daily      | 
22 - Sun - weekly     | 23 - Mon - daily      | 24 - Tue - daily      | 25 - Wed - daily      | 26 - Thu - daily      | 
27 - Fri - daily      | 28 - Sat - daily      | 29 - Sun - weekly     | 30 - Mon - daily      | 31 - Tue - daily      | 
01 - Wed - yearly     | 02 - Thu - daily      | 03 - Fri - daily      | 04 - Sat - daily      | 05 - Sun - weekly     | 
06 - Mon - daily      | 07 - Tue - daily      | 08 - Wed - daily      | 09 - Thu - daily      | 10 - Fri - daily      | 
11 - Sat - daily      | 12 - Sun - weekly     | 13 - Mon - daily      | 14 - Tue - daily      | 15 - Wed - daily      | 
16 - Thu - daily      | 17 - Fri - daily      | 18 - Sat - daily      | 19 - Sun - weekly     | 20 - Mon - daily      | 

Below is the restic backup script that sets a tag and then runs snapshot forget based on the tag.

As always, this is NOT tested; use at your own risk.

My “policy” is:

  • weekly on Sunday
  • the 01st of every month is a monthly, except when it is also January 1st, which makes it a yearly
  • everything else is a daily

root@pop-os:~/scripts# cat desktop-restic.sh 
#!/bin/bash
### wake up backup server and restic backup to 3TB ZFS mirror
cd /root/scripts
./wake-backup-server.sh

source /root/.restic.env

## Quick and dirty logic for snapshot tagging
create_tag () {
  tag="daily"
  if [ $(date +%a) == "Sun" ]; then tag="weekly" ; fi
  if [ $(date +%d) == "01" ]; then
   tag="monthly"
   if [ $(date +%b) == "Jan" ]; then
     tag="yearly"
   fi
  fi
}

create_tag
restic backup -q /DATA /ARCHIVE --tag "$tag" --exclude "*.vdi" --exclude "*.iso" --exclude "*.ova" --exclude "*.img" --exclude "*.vmdk"

restic forget -q --tag daily --keep-last 7
restic forget -q --tag weekly --keep-last 4
restic forget -q --tag monthly --keep-last 12

if [ "$tag" == "weekly" ]; then
  restic -q prune
fi

sleep 1m
ssh user@192.168.1.250 sudo shutdown now
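
As an aside, restic's built-in retention flags could express a similar policy without the custom tags; a rough equivalent of the three forget calls above, shown only for comparison:

restic forget -q --keep-daily 7 --keep-weekly 4 --keep-monthly 12

Yearly snapshots would need their own --keep-yearly value; with the tag approach above they are simply never forgotten.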

AWS CloudWatch Cron

I was trying to schedule a once-a-week snapshot of an EBS volume and kept getting “Parameter ScheduleExpression is not valid“. It turns out I missed something small. If you schedule using a cron expression, note this important requirement: one of the day-of-month or day-of-week values must be a question mark (?).

I was trying:

0 1 * * SUN *

What worked was:

0 1 ? * SUN *
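
If you create the schedule from the AWS CLI rather than the console, the same expression goes into put-rule; a quick sketch (the rule name is made up):

aws events put-rule --name weekly-ebs-snapshot --schedule-expression "cron(0 1 ? * SUN *)"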

Oracle OCI CLI Query

Some bash snippets using --query and jq, and capturing the output into Bash variables.

Collect boot volume’s id

SRCBOOTVOLID=$(oci --profile $profile bv boot-volume list --compartment-id "$source_compartment" --availability-domain "$source_ad" --query "data [?\"display-name\" == '$instance_name (Boot Volume)'].{id:id}" | jq -r '.[] | .id')

Collect instance ocid

INSTANCEID=$(oci --profile $profile compute instance launch --availability-domain $target_ad --compartment-id $sandbox_compartment --shape VM.Standard1.1 --display-name "burner-$instance_name-instance-for-custom-image" --source-boot-volume-id $BOOTVOLID --wait-for-state RUNNING --subnet-id $sandbox_subnetid --query "data .{id:id}" | jq -r '. | .id')

Stop instance and collect the id (or whatever you need from the json)

STOPPEDID=$(oci --profile $profile compute instance action --action STOP --instance-id $INSTANCEID --wait-for-state STOPPED --query "data .{id:id}" | jq -r '. | .id')

Collect the work-request-id to monitor in a loop after I export a custom image to object storage. Note that in this query the field I need is NOT in the data section.

WORKREQUESTID=$(oci --profile $profile compute image export to-object --image-id $IMAGEID --namespace faketenancy --bucket-name DR-Images --name $today-$instance_name-custom-image-object --query '"opc-work-request-id"' --raw-output)

while [ "$RESULT" != "SUCCEEDED" ]
do
  RESULT=$(oci --profile myprofile work-requests work-request get --work-request-id $WORKREQUESTID --query "data .{status:status}" | jq -r '. | .status')
  echo "running export job and $RESULT checking every 2 mins"
  sleep 2m
done
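
The jq step can often be skipped by letting the CLI's JMESPath --query return the bare value; a sketch against the same work request (an alternative, not what the loop above uses):

RESULT=$(oci --profile myprofile work-requests work-request get --work-request-id $WORKREQUESTID --query 'data.status' --raw-output)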

Restic snapshot detail json to csv

Restic shows details of a snapshot. Sometimes you want that to be CSV but the json output for paths, excludes and tags are lists which will choke the @csv jq filter. Furthermore not all snapshots have the excludes key. Here are some snippets on solving above. Use join to collapse the lists and use if to test if key exists.

# restic -r $REPO snapshots --last --json | jq -r '.[] | [.hostname,.short_id,.time,(.paths|join(",")),if (.excludes) then (.excludes|join(",")) else empty end]'
[
  "bkupserver.domain.com",
  "c56d3e2e",
  "2019-10-25T00:10:01.767408581-05:00",
  "/etc,/home,/root,/u01/backuplogs,/var/log,/var/spool/cron",
  "**/diag/**,/var/spool/lastlog"
]

And using CSV filter

# restic -r $REPO snapshots --last --json | jq -r '.[] | [.hostname,.short_id,.time,(.paths|join(",")),if (.excludes) then (.excludes|join(",")) else empty end] | @csv'
"bkupserver.domain.com","c56d3e2e","2019-10-25T00:10:01.767408581-05:00","/etc,/home,/root,/u01/backuplogs,/var/log,/var/spool/cron","**/diag/**,/var/spool/lastlog"

SHIPS Password Rotation

As explained on the website, SHIPS provides “unique and rotated local super user or administrator passwords for environments where it is not possible or not appropriate to disable these local accounts“.

As a proof of concept, I tested how to:

  • set up a SHIPS server on CentOS 7
  • configure SHIPS folders and ACLs for devices
  • run SetAdminPass.sh on a Linux client for password rotation

Note that I simplified this test so the following was true:

  • no LDAP enabled for user logins to the web interface (no identLDAP.rb)
  • devices not tested as belonging to an LDAP OU (only using lib devicevalidatorany.rb)
  • used Ansible as much as possible to prepare the SHIPS server
  • a self-signed certificate means the client SetAdminPass.sh needs --insecure with curl to even work
  • did not try to autostart the SHIPS code on server reboot

Suffice to say, you were warned: this is not the secure and correct way to run SHIPS; it is a way to test the basics!

The final run, after the Ansible playbook was ironed out, went like this:

Download and unzip my file containing the ansible playbook ships.yml plus the conf, ships.cert and ships.key files into /usr/src/ships-playbook. Update the conf file with the correct IP address.

# yum install ansible -y
# cd /usr/src/ships-playbook/
[root@ships ships-playbook]# rm -rf /opt/SHIPS ; ansible-playbook ships.yml

# cd /opt/SHIPS
[root@ships SHIPS]# ruby -r ./lib/identsqlite -r ./lib/identdevice -r ./lib/devicevalidatorany SHIPS.rb
  • from the above ansible output, capture the password for the SHIPS administrator user named root. Visit https://ip.addr.ess and log in with the root user and that password.
  • for folder and ACL configuration watch the section in the video located here https://www.trustedsec.com/2016/03/ships-version-2-released-major-release/
  • I made some changes to the client SetAdminPass.sh script, as shown below.

URL='https://192.168.1.98/password'
#URL_OPTS=""

#RESPONSE=$( curl $CURL_OPTS -s "$URL?$URL_OPTSname=$HOST&nonce=$NONCE" )
RESPONSE=$( curl $CURL_OPTS -s "$URL?name=$HOST&nonce=$NONCE" )

#CURL_OPTS=''
CURL_OPTS='--insecure ' #DON'T DO THIS!

HISTORY='/var/run/SHIPS.HIST'

LINKS:

  • https://github.com/trustedsec/SHIPS/
  • https://www.trustedsec.com/2016/03/ships-version-2-released-major-release/
  • https://github.com/trustedsec/SHIPS/blob/master/doc/SHIPS_Installation_v2.pdf

POC of DRBD replication

My notes from a quick DRBD test…

The Distributed Replicated Block Device (DRBD) provides a networked version of data mirroring, classified under the redundant array of independent disks (RAID) taxonomy as RAID-1.

Show the status, add a file, and check the target…

After initial sync:

[root@drdb01 ~]# drbdadm status test
test role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate

[root@drdb02 ~]# drbdadm status test
test role:Secondary
  disk:UpToDate
  peer role:Primary
    replication:Established peer-disk:UpToDate

Create filesystem and some data:

[root@drdb01 ~]# mkfs -t ext4 /dev/drbd0

[root@drdb01 ~]# mkdir -p /mnt/DRDB_PRI
[root@drdb01 ~]# mount /dev/drbd0 /mnt/DRDB_PRI
[root@drdb01 ~]# cd /mnt/DRDB_PRI
[root@drdb01 DRDB_PRI]# ls -l
total 16
drwx------. 2 root root 16384 Aug  6 10:12 lost+found

[root@drdb01 DRDB_PRI]# yum install wget

[root@drdb01 DRDB_PRI]# wget https://osdn.net/projects/systemrescuecd/storage/releases/6.0.3/systemrescuecd-6.0.3.iso
2019-08-06 10:18:13 (4.61 MB/s) - ‘systemrescuecd-6.0.3.iso’ saved [881852416/881852416]

[root@drdb01 DRDB_PRI]# ls -lh
total 842M
drwx------. 2 root root  16K Aug  6 10:12 lost+found
-rw-r--r--. 1 root root 841M Apr 14 08:52 systemrescuecd-6.0.3.iso

Switch roles and check SECONDARY:

[root@drdb01 ~]# umount /mnt/DRDB_PRI
[root@drdb01 ~]# drbdadm secondary test

[root@drdb02 ~]# drbdadm primary test
[root@drdb02 ~]# mkdir -p /mnt/DRDB_SEC
[root@drdb02 ~]# mount /dev/drbd0 /mnt/DRDB_SEC
[root@drdb02 ~]# cd /mnt/DRDB_SEC
[root@drdb02 DRDB_SEC]# ls -lh
total 842M
drwx------. 2 root root  16K Aug  6 10:12 lost+found
-rw-r--r--. 1 root root 841M Apr 14 08:52 systemrescuecd-6.0.3.iso

Switch roles back:

[root@drdb02 DRDB_SEC]# cd
[root@drdb02 ~]# umount /mnt/DRDB_SEC
[root@drdb02 ~]# drbdadm secondary test

[root@drdb01 ~]# drbdadm primary test
[root@drdb01 ~]# mount /dev/drbd0 /mnt/DRDB_PRI

Detailed steps below of how we got to above test…

Node 1 Setup and start initial sync:

rrosso  ~  ssh root@192.168.1.95
[root@drdb01 ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

[root@drdb01 ~]# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

[root@drdb01 ~]# yum install -y kmod-drbd84 drbd84-utils

[root@drdb01 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4"  source address="192.168.1.96" port port="7789" protocol="tcp" accept'
success

[root@drdb01 ~]# firewall-cmd --reload
success

[root@drdb01 ~]# yum install policycoreutils-python

[root@drdb01 ~]# semanage permissive -a drbd_t

[root@drdb01 ~]# df -h
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cl-root  6.2G  1.2G  5.1G  20% /
devtmpfs             990M     0  990M   0% /dev
tmpfs               1001M     0 1001M   0% /dev/shm
tmpfs               1001M  8.4M  992M   1% /run
tmpfs               1001M     0 1001M   0% /sys/fs/cgroup
/dev/sda1           1014M  151M  864M  15% /boot
tmpfs                201M     0  201M   0% /run/user/0

[root@drdb01 ~]# init 0

rrosso  ~  255  ssh root@192.168.1.95

[root@drdb01 ~]# fdisk -l | grep sd
Disk /dev/sda: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    16777215     7339008   8e  Linux LVM
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors

[root@drdb01 ~]# vi /etc/drbd.d/global_common.conf 
[root@drdb01 ~]# vi /etc/drbd.d/test.res

[root@drdb01 ~]# uname -n
drdb01.localdomain

** partition disk
[root@drdb01 ~]# fdisk /dev/sdb

[root@drdb01 ~]# vi /etc/drbd.d/test.res
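
For reference, a typical DRBD 8.4 two-node resource definition along the lines of what these vi edits are building (a sketch using the hostnames, device and addresses from this setup, not the exact file from this test):

resource test {
  protocol C;
  on drdb01.localdomain {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.1.95:7789;
    meta-disk internal;
  }
  on drdb02.localdomain {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 192.168.1.96:7789;
    meta-disk internal;
  }
}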

[root@drdb01 ~]# drbdadm create-md test
initializing activity log
initializing bitmap (640 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.

[root@drdb01 ~]# drbdadm up test
The server's response is:
you are the 18305th user to install this version

[root@drdb01 ~]# vi /etc/drbd.d/test.res

[root@drdb01 ~]# drbdadm down test

[root@drdb01 ~]# drbdadm up test

[root@drdb01 ~]# drbdadm status test
test role:Secondary
  disk:Inconsistent
  peer role:Secondary
    replication:Established peer-disk:Inconsistent

[root@drdb01 ~]# drbdadm primary --force test
[root@drdb01 ~]# drbdadm status test
test role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:0.01

[root@drdb01 ~]# drbdadm status test
test role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:3.80

[root@drdb01 ~]# drbdadm status test
test role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:85.14

Node 2 setup and start initial sync:

 rrosso  ~  ssh root@192.168.1.96

[root@drdb01 ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

[root@drdb01 ~]# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

[root@drdb01 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4"  source address="192.168.1.95" port port="7789" protocol="tcp" accept'
success

[root@drdb01 ~]# firewall-cmd --reload
success

[root@drdb01 ~]# yum install policycoreutils-python

[root@drdb01 ~]# semanage permissive -a drbd_t
[root@drdb01 ~]# init 0

 rrosso  ~  255  ssh root@192.168.1.96
root@192.168.1.96's password: 
Last login: Tue Aug  6 09:46:34 2019

[root@drdb01 ~]# fdisk -l | grep sd
Disk /dev/sda: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    16777215     7339008   8e  Linux LVM
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors

[root@drdb01 ~]# vi /etc/drbd.d/global_common.conf 
[root@drdb01 ~]# vi /etc/drbd.d/test.res

** partition disk
[root@drdb01 ~]# fdisk /dev/sdb

[root@drdb01 ~]# fdisk -l | grep sd
Disk /dev/sda: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    16777215     7339008   8e  Linux LVM
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
/dev/sdb1            2048    41943039    20970496   83  Linux

[root@drdb01 ~]# vi /etc/drbd.d/test.res

[root@drdb01 ~]# drbdadm create-md test
initializing activity log
initializing bitmap (640 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.

** note I had the wrong hostname because I cloned the 2nd VM
[root@drdb01 ~]# drbdadm up test
you are the 18306th user to install this version
drbd.d/test.res:6: in resource test, on drdb01.localdomain:
	IP 192.168.1.95 not found on this host.

[root@drdb01 ~]# vi /etc/drbd.d/test.res

[root@drdb01 ~]# uname -a
Linux drdb01.localdomain 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[root@drdb01 ~]# vi /etc/hostname 
[root@drdb01 ~]# vi /etc/hosts
[root@drdb01 ~]# reboot

 rrosso  ~  255  ssh root@192.168.1.96

[root@drdb02 ~]# uname -a
Linux drdb02.localdomain 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[root@drdb02 ~]# drbdadm up test

[root@drdb02 ~]# drbdadm status test
test role:Secondary
  disk:Inconsistent
  peer role:Secondary
    replication:Established peer-disk:Inconsistent

[root@drdb02 ~]# drbdadm status test
test role:Secondary
  disk:Inconsistent
  peer role:Primary
    replication:SyncTarget peer-disk:UpToDate done:0.21

[root@drdb02 ~]# drbdadm status test
test role:Secondary
  disk:Inconsistent
  peer role:Primary
    replication:SyncTarget peer-disk:UpToDate done:82.66

LINKS:

https://www.linbit.com/en/disaster-recovery/
https://www.tecmint.com/setup-drbd-storage-replication-on-centos-7/

zfsbackup-go test with minio server

Recording my test with zfsbackup-go. While playing around with backup/DR/object storage, I also compared the concept here with a previous test around restic/rclone/object storage.

In general, ZFS snapshot and replication should work much better for file systems containing huge numbers of files. Most solutions struggle with millions of files, whether rsync at the file level or restic/rclone at the object storage level; walking the tree is just never efficient. So this test works well but has not been scaled yet. I plan to work on that, as well as on seeing how well the bucket can be synced to different regions.

Minio server

Tip: minio server has a nice browser interface

# docker run -p 9000:9000 --name minio1 -e "MINIO_ACCESS_KEY=AKIAIOSFODNN7EXAMPLE" -e "MINIO_SECRET_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" -v /DATA/minio-repos/:/minio-repos minio/minio server /minio-repos

 You are running an older version of MinIO released 1 week ago 
 Update: docker pull minio/minio:RELEASE.2019-07-17T22-54-12Z 


Endpoint:  http://172.17.0.2:9000  http://127.0.0.1:9000

Browser Access:
   http://172.17.0.2:9000  http://127.0.0.1:9000

Object API (Amazon S3 compatible):
   Go:         https://docs.min.io/docs/golang-client-quickstart-guide
   Java:       https://docs.min.io/docs/java-client-quickstart-guide
   Python:     https://docs.min.io/docs/python-client-quickstart-guide
   JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
   .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide
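
To point the minio client (mc) at this server from another machine, something like this should work; the alias name is mine, the keys are the example keys from the docker run above, and 192.168.1.112 is the host address used by the test servers later on:

# mc config host add minio1 http://192.168.1.112:9000 AKIAIOSFODNN7EXAMPLE wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# mc ls minio1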

server 1:

This server simulates our “prod” server. We create an initial data set in /DATA, take a snapshot, and back it up to object storage.

# rsync -a /media/sf_DATA/MyWorkDocs /DATA/

# du -sh /DATA/MyWorkDocs/
1.5G	/DATA/MyWorkDocs/

# export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
# export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# export AWS_S3_CUSTOM_ENDPOINT=http://192.168.1.112:9000
# export AWS_REGION=us-east-1

# zfs snapshot DATA@20190721-0752

# /usr/local/bin/zfsbackup-go send --full DATA s3://zfs-poc
2019/07/21 07:53:12 Ignoring user provided number of cores (2) and using the number of detected cores (1).
Done.
	Total ZFS Stream Bytes: 1514016976 (1.4 GiB)
	Total Bytes Written: 1176757570 (1.1 GiB)
	Elapsed Time: 1m17.522630438s
	Total Files Uploaded: 7

# /usr/local/bin/zfsbackup-go list s3://zfs-poc
2019/07/21 07:56:57 Ignoring user provided number of cores (2) and using the number of detected cores (1).
Found 1 backup sets:

Volume: DATA
	Snapshot: 20190721-0752 (2019-07-21 07:52:31 -0500 CDT)
	Replication: false
	Archives: 6 - 1176757570 bytes (1.1 GiB)
	Volume Size (Raw): 1514016976 bytes (1.4 GiB)
	Uploaded: 2019-07-21 07:53:12.42972167 -0500 CDT (took 1m16.313538867s)


There are 4 manifests found locally that are not on the target destination.

server 2:

This server is a possible DR or new server, but the idea is that it sits somewhere else, preferably another cloud region or data center.

# export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
# export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# export AWS_S3_CUSTOM_ENDPOINT=http://192.168.1.112:9000
# export AWS_REGION=us-east-1
# /usr/local/bin/zfsbackup-go list s3://zfs-poc
2019/07/21 07:59:16 Ignoring user provided number of cores (2) and using the number of detected cores (1).
Found 1 backup sets:

Volume: DATA
	Snapshot: 20190721-0752 (2019-07-21 07:52:31 -0500 CDT)
	Replication: false
	Archives: 6 - 1176757570 bytes (1.1 GiB)
	Volume Size (Raw): 1514016976 bytes (1.4 GiB)
	Uploaded: 2019-07-21 07:53:12.42972167 -0500 CDT (took 1m16.313538867s)

# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
DATA  2.70M  96.4G    26K  /DATA
# zfs list -t snapshot
no datasets available
# ls /DATA/

** using -F. This is a COLD DR style test with no existing infrastructure/ZFS sets on target systems
# /usr/local/bin/zfsbackup-go receive --auto DATA s3://zfs-poc DATA -F
2019/07/21 08:05:28 Ignoring user provided number of cores (2) and using the number of detected cores (1).
2019/07/21 08:06:42 Done. Elapsed Time: 1m13.968871681s
2019/07/21 08:06:42 Done.
# ls /DATA/
MyWorkDocs
# du -sh /DATA/MyWorkDocs/
1.5G	/DATA/MyWorkDocs/
# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
DATA  1.41G  95.0G  1.40G  /DATA
# zfs list -t snapshot
NAME                 USED  AVAIL  REFER  MOUNTPOINT
DATA@20190721-0752   247K      -  1.40G  -

That concludes one test. In effect this is a cold DR situation where you have nothing ready until you need it: think build a server and recover /DATA from the ZFS backup in object storage. The initial restore will therefore be very long depending on your data size.

Read on if you are thinking you want to go more towards pilot light or warm DR: we can run incremental backups and then have the target server keep receiving snapshots periodically into our target ZFS file system DATA. You may ask why not just do real ZFS send/receive with no object storage in between. There is no good answer, except that there are many ways you could solve DR and this is one of them. In this case I could argue that object storage is cheap and has some very good redundancy/availability features, and your replication between regions may use a very fast and cheap backhaul channel, whereas your VPN or FastConnect WAN between regions may be slow and/or expensive.

You could also be thinking that something between cold and warm DR is where you want to be, and therefore only apply the full DATA receive when you are ready. That could mean a lot of snapshots to apply afterwards. Or maybe not; I have not checked on that aspect of a recovery process.

Regardless, I like the idea of leveraging ZFS with object storage, so you may not have a use for this but I definitely will.

Incremental snapshots:

server 1:

Add more data to source, snapshot and backup to object storage.

# rsync -a /media/sf_DATA/MySrc /DATA/
# du -sh /DATA/MySrc/
1.1M	/DATA/MySrc/

# zfs snapshot DATA@20190721-0809
# zfs list -t snapshot
NAME                 USED  AVAIL  REFER  MOUNTPOINT
DATA@20190721-0752    31K      -  1.40G  -
DATA@20190721-0809     0B      -  1.41G  -

# /usr/local/bin/zfsbackup-go send --increment DATA s3://zfs-poc
2019/07/21 08:10:49 Ignoring user provided number of cores (2) and using the number of detected cores (1).
Done.
	Total ZFS Stream Bytes: 1202792 (1.1 MiB)
	Total Bytes Written: 254909 (249 KiB)
	Elapsed Time: 228.123591ms
	Total Files Uploaded: 2

# /usr/local/bin/zfsbackup-go list s3://zfs-poc
2019/07/21 08:11:17 Ignoring user provided number of cores (2) and using the number of detected cores (1).
Found 2 backup sets:

Volume: DATA
	Snapshot: 20190721-0752 (2019-07-21 07:52:31 -0500 CDT)
	Replication: false
	Archives: 6 - 1176757570 bytes (1.1 GiB)
	Volume Size (Raw): 1514016976 bytes (1.4 GiB)
	Uploaded: 2019-07-21 07:53:12.42972167 -0500 CDT (took 1m16.313538867s)


Volume: DATA
	Snapshot: 20190721-0809 (2019-07-21 08:09:47 -0500 CDT)
	Incremental From Snapshot: 20190721-0752 (2019-07-21 07:52:31 -0500 CDT)
	Intermediary: false
	Replication: false
	Archives: 1 - 254909 bytes (249 KiB)
	Volume Size (Raw): 1202792 bytes (1.1 MiB)
	Uploaded: 2019-07-21 08:10:49.3280703 -0500 CDT (took 214.139056ms)

There are 4 manifests found locally that are not on the target destination.

server 2:

# /usr/local/bin/zfsbackup-go list s3://zfs-poc
2019/07/21 08:11:44 Ignoring user provided number of cores (2) and using the number of detected cores (1).
Found 2 backup sets:

Volume: DATA
	Snapshot: 20190721-0752 (2019-07-21 07:52:31 -0500 CDT)
	Replication: false
	Archives: 6 - 1176757570 bytes (1.1 GiB)
	Volume Size (Raw): 1514016976 bytes (1.4 GiB)
	Uploaded: 2019-07-21 07:53:12.42972167 -0500 CDT (took 1m16.313538867s)


Volume: DATA
	Snapshot: 20190721-0809 (2019-07-21 08:09:47 -0500 CDT)
	Incremental From Snapshot: 20190721-0752 (2019-07-21 07:52:31 -0500 CDT)
	Intermediary: false
	Replication: false
	Archives: 1 - 254909 bytes (249 KiB)
	Volume Size (Raw): 1202792 bytes (1.1 MiB)
	Uploaded: 2019-07-21 08:10:49.3280703 -0500 CDT (took 214.139056ms)

** not sure why I need to force (-F); maybe because the dataset is mounted? The messages look like this:
** cannot receive incremental stream: destination DATA has been modified since most recent snapshot
*** 2019/07/21 08:12:25 Error while trying to read from volume DATA|20190721-0752|to|20190721-0809.zstream.gz.vol1 - io: read/write on closed pipe

# /usr/local/bin/zfsbackup-go receive --auto DATA s3://zfs-poc DATA -F
2019/07/21 08:12:53 Ignoring user provided number of cores (2) and using the number of detected cores (1).
2019/07/21 08:12:54 Done. Elapsed Time: 379.712693ms
2019/07/21 08:12:54 Done.

# ls /DATA/
MySrc  MyWorkDocs
# du -sh /DATA/MySrc/
1.1M	/DATA/MySrc/
# zfs list -t snapshot
NAME                 USED  AVAIL  REFER  MOUNTPOINT
DATA@20190721-0752    30K      -  1.40G  -
DATA@20190721-0809    34K      -  1.41G  -
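
To push this toward the warm DR idea above, both sides could simply be put on a schedule. A minimal sketch assuming root crontabs and an env file with the AWS_* exports (the file name and timing are assumptions, and note the escaped % required inside cron):

# source server crontab: hourly snapshot plus incremental send
0 * * * * . /root/.zfsbackup.env && zfs snapshot DATA@$(date +\%Y\%m\%d-\%H\%M) && /usr/local/bin/zfsbackup-go send --increment DATA s3://zfs-poc
# target server crontab: receive whatever landed in the bucket a little later
30 * * * * . /root/.zfsbackup.env && /usr/local/bin/zfsbackup-go receive --auto DATA s3://zfs-poc DATA -F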

LINK: https://github.com/someone1/zfsbackup-go

OCI Bucket Delete Fail

If you have trouble deleting an object storage bucket in Oracle Cloud Infrastructure you may have to clear old multipart uploads. The message may look something like this: Bucket named ‘DR-Validation’ has pending multipart uploads. Stop all multipart uploads first.

At the time the only way I could do this was through the API; it did not appear that the CLI or Console could clear out the uploads. Below is a little Python that may help, just an example to show the idea. And of course, if you have thousands of multipart uploads (yes, it's possible) you will need to change it; this was written for only one or two.

#!/usr/bin/python
#: Script Name  : lobjectparts.py
#: Author       : Riaan Rossouw
#: Date Created : June 13, 2019
#: Date Updated : July 18, 2019
#: Description  : Python Script to list multipart uploads
#: Examples     : lobjectparts.py -t tenancy -r region -b bucket
#:              : lobjectparts.py --tenancy <ocid> --region  <region> --bucket <bucket>

## Will need the api modules
## new: https://oracle-cloud-infrastructure-python-sdk.readthedocs.io/en/latest/
## old: https://oracle-bare-metal-cloud-services-python-sdk.readthedocs.io/en/latest/installation.html#install
## https://oracle-cloud-infrastructure-python-sdk.readthedocs.io/en/latest/api/object_storage/client/oci.object_storage.ObjectStorageClient.html

from __future__ import print_function
import os, optparse, sys, time, datetime
import oci

__version__ = '0.9.1'
optdesc = 'This script is used to list multipart uploads in a bucket'

parser = optparse.OptionParser(version='%prog version ' + __version__)
parser.formatter.max_help_position = 50
parser.add_option('-t', '--tenancy', help='Specify Tenancy ocid', dest='tenancy', action='append')
parser.add_option('-r', '--region', help='region', dest='region', action='append')
parser.add_option('-b', '--bucket', help='bucket', dest='bucket', action='append')

opts, args = parser.parse_args()

def showMultipartUploads(identity, bucket_name):
  object_storage = oci.object_storage.ObjectStorageClient(config)
  namespace_name = object_storage.get_namespace().data
  uploads = object_storage.list_multipart_uploads(namespace_name, bucket_name, limit = 1000).data
  print(' {:35}  | {:15} | {:30} | {:35} | {:20}'.format('bucket','namespace','object','time_created','upload_id'))
  for o in uploads:
    print(' {:35}  | {:15} | {:30} | {:35} | {:20}'.format(o.bucket, o.namespace, o.object, str(o.time_created), o.upload_id))
    confirm = input("Confirm if you want to abort this multipart upload (Y/N): ")
    if confirm == "Y":
      response = object_storage.abort_multipart_upload(o.namespace, o.bucket, o.object, o.upload_id).data
    else:
      print ("Chose to not do the abort action on this multipart upload at this time...")

def main():
  mandatories = ['tenancy','region','bucket']
  for m in mandatories:
    if not opts.__dict__[m]:
      print ("mandatory option is missing\n")
      parser.print_help()
      exit(-1)

  print ('Multipart Uploads')
  config['region'] = opts.region[0]
  identity = oci.identity.IdentityClient(config)
  showMultipartUploads(identity, opts.bucket[0])  # options use action='append', so take the first value

if __name__ == '__main__':
  config = oci.config.from_file("/root/.oci/config","oci.api")
  main()

Restic scripting plus jq and minio client

I am jotting down some recent work on scripting restic and also using restic’s json output with jq and mc (minio client).

NOTE: this is not production code, just examples. Use at your own risk. These are edited by hand from real working scripts, so they will probably have typos and such in them. Again, just examples!

Example backup script, plus uploading the json output to an object storage bucket for later analysis.

# cat restic-backup.sh
#!/bin/bash
source /root/.restic-keys
resticprog=/usr/local/bin/restic-custom
#rcloneargs="serve restic --stdio --b2-hard-delete --cache-workers 64 --transfers 64 --retries 21"
region="s3_phx"
rundate=$(date +"%Y-%m-%d-%H%M")
logtop=/reports
logyear=$(date +"%Y")
logmonth=$(date +"%m")
logname=$logtop/$logyear/$logmonth/restic/$rundate-restic-backup
jsonspool=/tmp/restic-fss-jobs

## Backing up some OCI FSS (same as AWS EFS) NFS folders
FSS=(
"fs-oracle-apps|fs-oracle-apps|.snapshot"           ## backup all exclude .snapshot tree
"fs-app1|fs-app1|.snapshot"                         ## backup all exclude .snapshot tree
"fs-sw|fs-sw/oracle_sw,fs-sw/restic_pkg|.snapshot"  ## backup two folders exclude .snapshot tree
"fs-tifs|fs-tifs|.snapshot,.tif"                  ## backup all exclude .snapshot tree and *.tif files
)

## test commands especially before kicking off large backups
function verify_cmds
{
  f=$1
  restic_cmd=$2
  printf "\n$rundate and cmd: $restic_cmd\n"
}

function backup
{
 f=$1
 restic_cmd=$2

 jobstart=$(date +"%Y-%m-%d-%H%M")

 mkdir $jsonspool/$f
 jsonfile=$jsonspool/$f/$jobstart-restic-backup.json
 printf "$jobstart with cmd: $restic_cmd\n"

 mkdir /mnt/$f
 mount -o ro xx.xx.xx.xx:/$f /mnt/$f

 ## TODO: shell issue with passing exclude from variable. verify exclude .snapshot is working
 ## TODO: not passing *.tif exclude fail?  howto pass *?
 $restic_cmd > $jsonfile

 #cat $jsonfile >> $logname-$f.log
 umount /mnt/$f
 rmdir /mnt/$f

## Using rclone to copy to OCI object storage bucket.
## Note the extra level folder so rclone can simulate 
## a server/20190711-restic.log style.
## Very useful with using minio client to analyze logs.
 rclone copy $jsonspool s3_ash:restic-backup-logs

 rm $jsonfile
 rmdir $jsonspool/$f

 jobfinish=$(date +"%Y-%m-%d-%H%M")
 printf "jobfinish $jobfinish\n"
}

for fss in "${FSS[@]}"; do
 arrFSS=(${fss//|/ })

 folders=""
 f=${arrFSS[0]}
 IFS=',' read -ra folderarr <<< ${arrFSS[1]}
 for folder in ${folderarr[@]};do folders+="/mnt/${folder} "; done

 excludearg=""
 IFS=',' read -ra excludearr <<< ${arrFSS[2]}
 for exclude in ${excludearr[@]};do excludearg+=" --exclude ${exclude}"; done

 backup_cmd="$resticprog -r rclone:$region:restic-$f backup ${folders} $excludearg --json"

## play with verify_cmds first before actual backups
 verify_cmds "$f" "$backup_cmd"
 #backup "$f" "$backup_cmd"
done
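
One way to address the exclude/glob TODOs in the backup function would be to build the restic command as a bash array rather than a single string, so patterns like *.tif are passed through literally. A sketch reusing the variable names from the script above (untested):

 # build the argument list as an array so quoting and globs survive intact
 backup_args=(-r "rclone:$region:restic-$f" backup)
 for folder in "${folderarr[@]}"; do backup_args+=("/mnt/$folder"); done
 for exclude in "${excludearr[@]}"; do backup_args+=(--exclude "$exclude"); done
 backup_args+=(--json)
 "$resticprog" "${backup_args[@]}" > "$jsonfile"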

Since we have json logs in object storage, let's check some of them with the minio client.

# cat restic-check-logs.sh
#!/bin/bash

fss=(
 fs-oracle-apps
)

#checkdate="2019-07-11"
checkdate=$(date +"%Y-%m-%d")

for f in ${fss[@]}; do
  echo
  echo
  printf "$f:  "
  name=$(mc find s3-ash/restic-backup-logs/$f -name "*$checkdate*" | head -1)
  if [ -n "$name" ]
  then
    echo $name
    # play with sql --query later
    #mc sql --query "select * from S3Object"  --json-input .message_type=summary s3-ash/restic-backup-logs/$f/2019-07-09-1827-restic-backup.json
    mc cat $name  | jq -r 'select(.message_type=="summary")'
  else
    echo "Fail - no file found"
  fi
done

Example run of minio client against json

# ./restic-check-logs.sh

fs-oracle-apps:  s3-ash/restic-backup-logs/fs-oracle-apps/2019-07-12-0928-restic-backup.json
{
  "message_type": "summary",
  "files_new": 291,
  "files_changed": 1,
  "files_unmodified": 678976,
  "dirs_new": 0,
  "dirs_changed": 1,
  "dirs_unmodified": 0,
  "data_blobs": 171,
  "tree_blobs": 2,
  "data_added": 2244824,
  "total_files_processed": 679268,
  "total_bytes_processed": 38808398197,
  "total_duration": 1708.162522559,
  "snapshot_id": "f3e4dc06"
}

Note all of this was done with Oracle Cloud Infrastructure (OCI) object storage. Here are some observations about the OCI S3-compatible object storage.

  1. restic cannot reach both us-ashburn-1 and us-phoenix-1 regions natively: s3:<tenant>.compat.objectstorage.us-ashburn-1.oraclecloud.com works but s3:<tenant>.compat.objectstorage.us-phoenix-1.oraclecloud.com does NOT work. Since restic can use rclone, I am using rclone to access OCI object storage, and rclone can reach both regions (see the rclone remote sketch after this list).
  2. rclone can reach both regions.
  3. the minio command line client (mc) has the same issue as restic: it can reach us-ashburn-1 but not us-phoenix-1.
  4. the minio python API can connect to us-ashburn-1 but shows an empty bucket list.
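
For reference, the rclone remote for the Phoenix region would look roughly like this in rclone.conf; the remote name matches the s3_phx used in the backup script earlier, and the keys/tenant are placeholders:

[s3_phx]
type = s3
provider = Other
env_auth = false
access_key_id = <customer_secret_key_id>
secret_access_key = <customer_secret_key>
region = us-phoenix-1
endpoint = https://<tenant>.compat.objectstorage.us-phoenix-1.oraclecloud.com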