Backup-data: multiple schedule and backends

We can’t remove it: we must preserve backward compatibility with thousands of servers.
But we can hide the single backup when we implement the new Cockpit-based UI.

I agree, but this is the most flexible implementation. We will also hide this complexity.

Already removed in this commit: https://github.com/NethServer/nethserver-rsync/commit/2da9d79eff92f5dc0ee92a4e11b6b85a74487a8e
(Upstream is implementing it, but there are still some bugs.)

I forgot to remove it from the README, just done.

No one is. It’s a bad hack for a problem without a simple solution.

No, it’s not: the index is incremental, but it’s still way too slow!

This is true from the command line, but not from the UI which currently needs to create a tree.

This is an error inside the cron.d template, already fixed. Thank you for reporting!

question time…
Restic seems the most interesting engine. Is there any plan to integrate a REST server into NethServer?

Not officially for now, but you can find the instructions here: https://github.com/NethServer/nethserver-restic/blob/master/README.rst
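For anyone trying it before official support lands, here is a minimal sketch of pointing restic at a REST server. The host, port, and repository name below are placeholders, not NethServer defaults; the commands are printed rather than executed so the sketch has no side effects.

```shell
# Placeholder endpoint; adjust to wherever your rest-server instance listens.
REST_HOST="backup.example.com"
REST_PORT="8000"
REPO="rest:http://${REST_HOST}:${REST_PORT}/myrepo"

# restic takes the repository via -r (or the RESTIC_REPOSITORY variable);
# echoed here instead of run, since this is only an illustration.
echo "restic -r ${REPO} init"
echo "restic -r ${REPO} backup /var/lib/nethserver"
```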

Hi @giacomo :slight_smile:

Okay. I still don’t quite see the backward compatibility issue (it looks easier to migrate old settings to fit the new backup system), but I trust you.

Don’t forget the documentation.

Remember we already share some thoughts about this :

We could also use some on-the-fly Lucene indexing if we really want the UI to be super responsive, instead of building a file list when loading the UI page. I’m not sure it’s worth the trouble: someone wanting to restore some files will easily accept waiting a few seconds while the list of files is being built.

My pleasure !

1 Like

I just merged nethserver-restic and nethserver-rsync inside nethserver-backup-data package.
The restic binary has been moved to a dedicated restic RPM, which can eventually be swapped out for non-x86_64 architectures.

If you already installed the packages from testing, you need to execute the following:

yum --enablerepo=nethserver-testing clean all
rpm -e nethserver-restic nethserver-rsync --nodeps
yum --enablerepo=nethserver-testing update \*backup\*
1 Like

Ok, this is a job for my new virtualization environment :smiley: (an old Compaq CQ45 with 8 GB of RAM), so hands on!!!

1 Like

Components

  • nethserver-backup-data-1.3.4-1.53.g7dc6bc3.ns7.noarch
  • nethserver-restore-data-1.2.4-1.2.g5c84689.ns7.noarch

TESTS
test case 1: OK (but last backup info not shown on dashboard, for this and for any other Single Backup test case)
test case 2(a-c) (restic; cifs, nfs, webdav): OK
test case 3(a-c) (rsync; nfs): OK (chown failures on previous test were due to bad nfs config on destination)
test case 3(a-c) (rsync; cifs, webdav): FAILED

  • Backup is done
  • Can list backup files (backup-data-list)
  • Cannot restore files: the server-manager says the folders were restored, but they were not. The CLI reports a symlink problem.
results of restore-data command
# restore-data 
Restore started at 2018-07-21 20:16:03
Event pre-restore-data: SUCCESS
rsync: change_dir "/mnt/backup/server/latest" failed: No such file or directory (2)

Number of files: 0
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 0
Total file size: 0 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 20
Total bytes received: 12

sent 20 bytes  received 12 bytes  64.00 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
Action '/etc/e-smith/events/actions/restore-data-rsync': SUCCESS
Result of test case 3 with rsync and webdav
rsync: failed to set permissions on "/mnt/backup/server/2018-07-25-151336/var/lib/nethserver/ibay/sharedfolder": Invalid argument (22)
rsync: mkstemp "/mnt/backup/server/2018-07-25-151336/var/lib/nethserver/nextcloud/.htaccess.EMiL21" failed: Invalid argument (22)
ln: failed to create symbolic link ‘/mnt/backup/server/latest’: Function not implemented
Backup failed
Action 'backup-data-rsync ': FAIL
Backup status: FAIL

Cause of failure:
As reported and explained before, a symlink error:

rsync_tmbackup: Backup completed without errors.
ln: failed to create symbolic link ‘/mnt/backup/server/latest’: Operation not supported
Action 'backup-data-rsync ': SUCCESS

General problems
When using single backup with any engine, /var/log/last-backup.log is not created.
Log also shows Requested path not found:

esmith::event[4719]: Action: /etc/e-smith/events/pre-backup-data/S20nethserver-backup-config-predatabackup SUCCESS [1.203213]
esmith::event[4719]: Requested path not found

Many problems using webdav in single backup mode with any engine; I suspect the webdav server. Reports of open files exceeding the max cache size, problems removing locks, or duplicity’s remote manifest not matching the local one. (EDIT: using a different local webdav destination server, no problems.)

2 Likes

MULTIPLE BACKUPS TEST

Questions

  • Multiple Backups + Duplicity: Are these properties supported: Type, FullDay, VolSize? Yes
  • Is IncludeLogs prop supported by Multiple Backups?

test case 4 (duplicity; cifs): FAILED

Details of test case 4 (duplicity; cifs)
# db backups show duplicifs 
duplicifs=duplicity
    BackupTime=0 23 * * *
    CleanupOlderThan=10D
    FullDay=6
    Notify=error
    NotifyFrom=
    NotifyTo=root@localhost
    SMBHost=192.168.1.140
    SMBLogin=nethserver
    SMBPassword=********
    SMBShare=samba
    Type=incremental
    VFSType=cifs
    VolSize=250
    status=enabled

# backup-data -b duplicifs
Backup: duplicifs
Backup started at 2018-07-25 22:48:14
Pre backup scripts status: SUCCESS
umount: /mnt/backup-duplicifs: not mounted
Backup directory is not mounted
Can't mount /mnt/backup-duplicifs
Action 'backup-data-duplicity duplicifs': FAIL
Backup status: FAIL

# /etc/e-smith/events/actions/mount-cifs duplicifs
# touch /mnt/backup-duplicifs/file.txt
# ls /mnt/backup-duplicifs/
file.txt

Log:

Jul 25 23:01:39 server esmith::event[17604]: Event: pre-backup-data
Jul 25 23:01:39 server esmith::event[17604]: ===== Report for configuration backup =====
Jul 25 23:01:39 server esmith::event[17604]: 
Jul 25 23:01:39 server esmith::event[17604]: Backup started at 2018-07-25 23:01:39
Jul 25 23:01:39 server esmith::event[17607]: Event: pre-backup-config
Jul 25 23:01:39 server esmith::event[17607]: expanding /etc/backup-config.d/nethserver-sssd.include
Jul 25 23:01:39 server esmith::event[17607]: Action: /etc/e-smith/events/actions/generic_template_expand SUCCESS [0.191435]
Jul 25 23:01:39 server esmith::event[17607]: Action: /etc/e-smith/events/pre-backup-config/S20nethserver-directory-dump-ldap SUCCESS [0.08888]
Jul 25 23:01:39 server esmith::event[17607]: Action: /etc/e-smith/events/pre-backup-config/S40nethserver-sssd-backup-tdb SUCCESS [0.00395]
Jul 25 23:01:40 server esmith::event[17607]: Action: /etc/e-smith/events/pre-backup-config/S50nethserver-backup-config-list-packages SUCCESS [0.693899]
Jul 25 23:01:40 server esmith::event[17607]: Event: pre-backup-config SUCCESS
Jul 25 23:01:40 server esmith::event[17604]: Event pre-backup-config: SUCCESS
Jul 25 23:01:40 server esmith::event[17604]: Action backup-config-execute: SUCCESS
Jul 25 23:01:40 server esmith::event[17708]: Event: post-backup-config
Jul 25 23:01:40 server esmith::event[17708]: Event: post-backup-config SUCCESS
Jul 25 23:01:40 server esmith::event[17604]: Event post-backup-config: SUCCESS
Jul 25 23:01:40 server esmith::event[17604]: Backup status: SUCCESS
Jul 25 23:01:40 server esmith::event[17604]: Backup ended at 2018-07-25 23:01:40
Jul 25 23:01:40 server esmith::event[17604]: Time elapsed: 0 hours, 0 minutes, 1 seconds
Jul 25 23:01:40 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S20nethserver-backup-config-predatabackup SUCCESS [1.305231]
Jul 25 23:01:41 server esmith::event[17604]: Requested path not found
Jul 25 23:01:41 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S20nethserver-restore-data-duc-index SUCCESS [0.661085]
Jul 25 23:01:41 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S50mysql-dump-tables SUCCESS [0.191907]
Jul 25 23:01:41 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S50nethserver-ibays-dump-acls SUCCESS [0.003299]
Jul 25 23:01:41 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S70mount-cifs SUCCESS [0.094699]
Jul 25 23:01:41 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S70mount-nfs SUCCESS [0.094283]
Jul 25 23:01:41 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S70mount-usb SUCCESS [0.091106]
Jul 25 23:01:42 server esmith::event[17604]: Action: /etc/e-smith/events/pre-backup-data/S70mount-webdav SUCCESS [0.094998]
Jul 25 23:01:42 server esmith::event[17604]: Event: pre-backup-data SUCCESS

Cause of failure: /mnt/backup is mounted instead of the specific mount point for the backup task, even though the Mount property for the backup job is set.

Yes, multiple backups honor the main backup settings, unless customized.

Restic retention

Restic deletes snapshots according to the retention policy, but disk space is not reclaimed automatically: you have to run a prune operation.
See https://restic.net/blog/2016-08-22/removing-snapshots for details.
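As a sketch of this two-step flow, built as command strings so nothing touches a real repository. The retention value is only an example (loosely mirroring the CleanupOlderThan=10D setting from the test configuration above).

```shell
# Built as strings so the sketch is side-effect free; policy values are examples.
FORGET_CMD="restic forget --keep-within 10d"  # fast: drops snapshot references
PRUNE_CMD="restic prune"                      # slow: repacks data, reclaims space

# The two steps run in sequence; prune is the expensive one.
echo "${FORGET_CMD} && ${PRUNE_CMD}"
```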

NethServer forces pruning of snapshots to reclaim space on every backup, to mimic duplicity.
But prune is a really expensive operation. Real-life example: a backup of about 250 GB with 2.5 million files and about 1 GB of changes every day completes in less than 10 minutes on a 30 Gbit/s link.
The same backup takes about 2.5 hours to prune.

And, due to deduplication, the prune operation frees only a little space. I think we are wasting resources (CPU and time) for very little benefit.

I propose to run prune only once in a while (maybe every week) as a separate cron job.
We could add an option to select the prune frequency, but I’d rather not add yet another option.
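As a sketch, the separate job could be a plain cron.d fragment like the one below; the script name and schedule are purely hypothetical, not an existing NethServer path:

```
# Hypothetical /etc/cron.d fragment: prune every Sunday at 02:30.
# "backup-data-prune" is an invented name, not an actual NethServer script.
30 2 * * 0 root /sbin/e-smith/backup-data-prune
```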

What do you think?

2 Likes

Agreed. I can’t say how often (are you thinking of the weekend?), but I hope the time it takes doesn’t clash with working hours (I guess the task can consume a lot of server resources).

An option for prune frequency and schedule would be good.

1 Like

I think it would be good to let the user set the time and frequency of the prune operation, because every business is different. Perhaps some people need to run the job every 2 days to have enough storage space for the next backup, while others work on Sunday but not on Tuesday, for example.

1 Like

Improved the doc: Backup-data: multiple schedule and backends - #47 by dnutan

Yes, this log has been removed and everything is now logged inside /var/log/backup.
Did you find any incorrect documentation about it? Maybe I missed something.

This is quite strange, I was quite sure I had already tested it :frowning: I will work on it.

I agree, but the implementation could be hard because no other restic backup must be running while the prune is in progress.
What about executing the prune operation using a hook script (GitHub - NethServer/nethserver-backup-data)?

Not really, it can be executed even during working hours on a normal server.

Agree, a new option is necessary.

Following my proposal, I’d like to add a Prune option for each backup with the following values:

  • never: do not prune
  • always: always prune after the backup
  • day of the week: a value from 0 to 6, which executes the prune once a week on the selected day
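The proposed values map to a very small decision. A sketch, where `should_prune` and its arguments are my own naming for illustration, not the actual implementation:

```shell
# Sketch of the proposed Prune property semantics; names are invented.
# $1: Prune value ("never", "always", or a weekday 0-6)
# $2: today's weekday as 0-6 (e.g. from `date +%w`)
should_prune() {
    case "$1" in
        never)  return 1 ;;
        always) return 0 ;;
        [0-6])  [ "$1" = "$2" ] ;;
        *)      return 1 ;;  # unknown value: play it safe, do not prune
    esac
}

should_prune always "$(date +%w)" && echo "prune now"   # prints "prune now"
```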

The prune might (or might not) overlap in time with a backup run on the restic backend, which could generate an error.
I’m asking whether it would be possible (or logical) to have a “prune before backup” option, or to add a field/option for when the restic backup should be pruned…
@giacomo is week the only timespan? Could pruning once a month be more efficient in terms of time versus space gained?

It’s almost what I was proposing. The only difference is that the prune will take place after the backup (not before).

I guess yes, on machines where only a few files are deleted. I proposed a weekly schedule because the implementation is very, very easy :wink:

IIRC /var/log/backup was empty when I tried, and /var/log/last-backup.log was referenced on stdout/stderr after the backup failed or completed. This log was also used to read the last-backup data shown on the dashboard for the duplicity engine.

1 Like

The log is generated only if the backup is invoked using backup-data-wrapper; otherwise the output is sent to stdout.
I also removed the wrong reference to the old, unused /var/log/last-backup.log file.
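A sketch of that wrapper behavior as I understand it; the function and log-naming scheme below are assumptions for illustration, not the actual backup-data-wrapper code.

```shell
# Invented sketch: run a command and tee its output into a per-run log
# under a /var/log/backup-style directory (parametrized here for safety).
run_logged() {
    logdir="$1"; name="$2"; shift 2
    mkdir -p "$logdir"
    logfile="$logdir/$name-$(date +%Y%m%d-%H%M%S).log"
    # Output goes both to the log file and to stdout.
    "$@" 2>&1 | tee "$logfile"
}
```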

This part has been already implemented using hooks (nethserver-backup-data/root/etc/backup-data.hooks/backup-dashboard-status at master · NethServer/nethserver-backup-data · GitHub).

I also tried to fix test case 4, but it’s a very big refactor and the packages need a full QA pass again :weary:
If you still have the machine, could you please verify the bug is gone, at least for test case 4?
Use the latest package from this PR: Refactor mount by gsanchietti · Pull Request #22 · NethServer/nethserver-backup-data · GitHub

As a side note, this refactor also needs a new PR for nethserver-duc, which should now exclude all backup mount directories like /mnt/backup-* (I hope @edoardo_spadoni can help here).

Components
nethserver-backup-data-1.3.4-1.61.pr22.ge55f992.ns7.noarch

Now mount is working as expected. :wink:

Single Backup Mode: the last-backup end time on the dashboard is off by 2 hours (e.g. 20:07). The log filename and content have the correct time (e.g. 22:07, same as the date command). Backup called from backup-data-wrapper.

MULTIPLE BACKUPS
test case 4 (duplicity, restic; cifs, nfs):

  • if a custom inclusion is set: the global backup inclusion is not respected; only files in the custom inclusion are backed up
  • backup-data-list worked
  • concurrent backups not tested

This probably depends on different time zones being configured for PHP and the system.

This is the expected behavior (GitHub - NethServer/nethserver-backup-data); maybe we should improve the doc.
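A sketch of that documented behavior; the helper and its file arguments are my own parametrization for illustration (the real paths discussed in the thread are the /etc/backup-data.d/*.include custom lists and the global include lists).

```shell
# Invented helper: if a per-job custom include list exists and is non-empty,
# it fully REPLACES the global include list (nothing else is backed up).
effective_includes() {
    custom="$1"   # per-job custom include file
    global="$2"   # global include file
    if [ -s "$custom" ]; then
        cat "$custom"
    else
        cat "$global" 2>/dev/null
    fi
}
```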

Thank you Marc, as always your testing is really appreciated and extremely useful!

multiple backups read the same configuration of the single backup.
(…) file will override the list on included and excluded files from the single backup

OK, so /etc/backup-data/*.include takes precedence over /etc/backup-data.d/custom.include

Sorry, by using the “global inclusion” wording I didn’t make myself clear. If a custom inclusion is set for the multiple-backup job, then NOTHING else is backed up (no /var/lib/nethserver/* …)