I’m using MrMarkuz’s packaged Zabbix - so far working excellently.
Since one of the last updates, none of the systems report the backup correctly. The backup itself, however, completes correctly.
Any ideas?
Most likely it has something to do with changed reporting in the update of the backup module…
Yes, it seems like the backup-data log file that’s used by the script has changed. It’s now located in /var/log/backup/backup-backup-data-TIMESTAMP.log and has a slightly different format. The backup-script now takes the latest file to check if the backup time is ok and if it contains SUCCESS.
For now I adapted the script of @syntaxerrormmm with my weak python skills, it should work but please check and improve when needed. It now checks in the latest log file if the backup time is ok and if it contains SUCCESS.
Change /usr/local/bin/nethbackup_check.py to this:
#!/usr/bin/env python
# vim:sts=4:sw=4
# encoding: utf-8
import datetime, re, sys
import glob
import os

# Pick the most recently created backup-data log
list_of_files = glob.glob('/var/log/backup/backup-backup-data-*.log')
latest_file = max(list_of_files, key=os.path.getctime)

BACKUPTYPE = {
    'Data': latest_file,
    'Config': '/var/log/backup-config.log'
}

def backup_check(backuptype, validity):
    # Read the log once; the time and status lines sit near the end
    with open(BACKUPTYPE[backuptype]) as f:
        lines = f.readlines()
    timeline = lines[-6]     # line with time
    successline = lines[-7]  # line with status, hopefully success

    # Splitting the lines once read
    timeline_arr = timeline.split()
    successline_arr = successline.split()

    # Extract the date
    check = datetime.datetime.strptime(timeline_arr[3], '%Y-%m-%d').date()
    end = datetime.date.today()
    start = end - datetime.timedelta(days = int(validity))

    # Verifies the status of the last backup
    if start <= check <= end and re.match(r'SUCCESS', successline_arr[2]):
        return 1
    return 0

if __name__ == '__main__':
    print(backup_check(sys.argv[1], sys.argv[2]))
Please test and adapt, if it works I’ll add it to the module.
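One thing worth guarding while testing: if no backup-data log matches the glob, max() raises a ValueError on the empty list and the agent gets a traceback instead of 0. A minimal sketch of a safer lookup, assuming the same file pattern as in the script; the helper names latest_log and contains_success are made up for illustration and are not part of the module:

```python
import glob
import os


def latest_log(pattern):
    """Return the most recently changed file matching pattern, or None."""
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # Same selection rule as the script: newest by ctime
    return max(candidates, key=os.path.getctime)


def contains_success(path):
    """Return True if any line of the file contains the word SUCCESS."""
    with open(path) as f:
        return any('SUCCESS' in line for line in f)


if __name__ == '__main__':
    log = latest_log('/var/log/backup/backup-backup-data-*.log')
    # Report 0 instead of crashing when there is no log at all
    print(1 if log and contains_success(log) else 0)
```

This only looks for the word SUCCESS anywhere in the file, so it does not replace the date check in the script; it just shows how the missing-file case could be handled.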
Ooops, it seems the backup-config logfile has changed too, so for now only the backup-data check is working…
Thanks for testing!
EDIT:
I adapted the script /usr/local/bin/nethbackup_check.py for testing, now with a working backup-config check. I am afraid I will have to rewrite it, because the original logic assumes log files with the same format, and config is now checked via /var/log/messages.
#!/usr/bin/env python
# vim:sts=4:sw=4
# encoding: utf-8
import datetime, re, sys
import glob
import os
import subprocess

# Pick the most recently created backup-data log
list_of_files = glob.glob('/var/log/backup/backup-backup-data-*.log')
latest_file = max(list_of_files, key=os.path.getctime)

BACKUPTYPE = {
    'Data': latest_file,
    'Config': '/var/log/messages'
}

def backup_check(backuptype, validity):
    end = datetime.date.today()
    start = end - datetime.timedelta(days = int(validity))

    if backuptype == 'Data':
        # Read the log once; the time and status lines sit near the end
        with open(BACKUPTYPE[backuptype]) as f:
            lines = f.readlines()
        timeline = lines[-6]     # line with time
        successline = lines[-7]  # line with status, hopefully success
        # Splitting the lines once read
        timeline_arr = timeline.split()
        successline_arr = successline.split()
        # Extract the date
        check = datetime.datetime.strptime(timeline_arr[3], '%Y-%m-%d').date()
        # Verifies the status of the last backup
        if start <= check <= end and re.match(r'SUCCESS', successline_arr[2]):
            return 1

    if backuptype == 'Config':
        # Last successful config backup recorded in the system log
        cmd = "grep 'post-backup-config SUCCESS' %s | tail -1" % BACKUPTYPE[backuptype]
        output = subprocess.check_output(cmd, shell=True)
        # Splitting the line once read (decode so it also works on Python 3)
        line_arr = output.decode('utf-8', 'replace').split()
        if len(line_arr) < 2:
            # No matching line found
            return 0
        # From the last line I also extract the date; syslog lines carry
        # no year, so the current one is assumed
        check = datetime.datetime.strptime(line_arr[0] + " " + line_arr[1] + " " + str(datetime.datetime.now().year), '%b %d %Y').date()
        # Verifies the status of the last backup
        if start <= check <= end:
            return 1

    return 0

if __name__ == '__main__':
    print(backup_check(sys.argv[1], sys.argv[2]))
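A caveat on the Config branch: syslog timestamps in /var/log/messages carry no year, so the script appends the current one. An entry written on Dec 31 and checked on Jan 1 then parses as a date almost a year in the future, and the check returns 0. A minimal sketch of one way around that, assuming the usual "Mon DD" syslog prefix; syslog_date is a hypothetical helper, not part of the script:

```python
import datetime


def syslog_date(month_abbr, day, today=None):
    """Build a date from a year-less syslog stamp such as 'Oct', '6'.

    Tries the current year first, then the previous one, and returns the
    first candidate that is not in the future. Returns None if neither fits.
    """
    today = today or datetime.date.today()
    for year in (today.year, today.year - 1):
        try:
            parsed = datetime.datetime.strptime(
                '%s %s %d' % (month_abbr, day, year), '%b %d %Y').date()
        except ValueError:
            # e.g. Feb 29 in a non-leap year
            continue
        if parsed <= today:
            return parsed
    return None
```

In the script this would stand in for the strptime call, e.g. check = syslog_date(line_arr[0], line_arr[1]), with a return 0 when it gives None.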
Hi
Seems your Python capabilities are understated…
The check seems to work - at least for the Data Backup part, as stated.
The Config Backup still needs to be adapted.
Maybe this evening I’ll find time to go over the code…
Just had an emergency call from my client (Hotel). The conference room with over 50 people for the subject “Challenges of Digitization” - and Internet does not work! (Murphy does such things!)
In the end it was a half year old High Speed USB Disk, acting as the firewall HD. On ANY System, formatting would work - till ca. 50%. Then dead! Sandisk usually have good quality stuff, and the firewall wasn’t doing much writing in there…
Sh"t happens, as the saying goes… It’s working again, had to buy a new USB Stick…
Guys, thanks for all the efforts trying to cope with updates.
Obviously, I have a lot of NSs failing the backups (which is obviously not the case) in our monitoring system, so I am affected by the update too.
I was working on a new version of the script last Friday, so stay tuned. Just hope our customers don’t ask for the moon in the meantime.
This is the sort of thing when “Upstream” changes something in the middle of the game…
It’s ok and fine with me if such changes happen in Major Upgrades - not in minor updates…
I do remember when still using SME-Server - and RH decided to change the encoding for the Samba part from ISO8859-1 to UTF-8 or something like that a few years back…
OK, so what happens to all users who had valid passwords in the Database (LDAP/AD)? No one can log in any more!
Great!
I think, on that particular day, someone “Upstream” didn’t turn on their brain in the morning!
I worked on a complete rewrite and:
1 - I am not completely satisfied with the result (much spaghetti code, a lot of repetitions);
2 - It does not support the multiple backup jobs yet (only ‘Config’ and ‘Data’ can be passed for check);
3 - Should be less change-prone, since it checks /var/log/messages instead of the single backup file; it now depends only on the syntax of the SUCCESS/FAILURE line;
4 - Because it needs access to /var/log/messages, it now has to be run with sudo in userparameters (at least if you run your zabbix system as a user other than root).
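To illustrate point 3, a check that depends only on the status line could look roughly like this. It is a sketch, not the rewritten script: the marker post-backup-config comes from the grep in the earlier script, while the FAILURE wording and the helper name last_status are assumptions.

```python
import re

# The SUCCESS wording is taken from the earlier grep; FAILURE is assumed
# to appear in the same kind of line.
STATUS_RE = re.compile(r'(SUCCESS|FAILURE)')


def last_status(lines, marker):
    """Return (month, day, status) of the last line mentioning marker.

    lines is any iterable of syslog-style lines ("Oct  6 14:00:12 host ...").
    Returns None when no matching line is found.
    """
    result = None
    for line in lines:
        if marker not in line:
            continue
        match = STATUS_RE.search(line)
        fields = line.split()
        if match and len(fields) >= 2:
            # Keep overwriting so the last matching line wins
            result = (fields[0], fields[1], match.group(1))
    return result
```

With sudo in place (point 4), the open file object for /var/log/messages can be passed straight in as lines, together with 'post-backup-config' as the marker.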
great!
just a quick test… i’ve updated an almost clean install of 3.4 (only a discovery rule was set). it worked and i didn’t see any errors in the logs.
tnx!
[root@nethmon01 ~]# tail -n 20 /var/log/zabbix/zabbix_agentd.log
949:20181006:140012.176 **** Enabled features ****
949:20181006:140012.176 IPv6 support: YES
949:20181006:140012.176 TLS support: YES
949:20181006:140012.176 **************************
949:20181006:140012.176 using configuration file: /etc/zabbix/zabbix_agentd.conf
949:20181006:140012.176 agent #0 started [main process]
951:20181006:140012.180 agent #1 started [collector]
952:20181006:140012.180 agent #2 started [listener #1]
953:20181006:140012.182 agent #3 started [listener #2]
954:20181006:140012.182 agent #4 started [listener #3]
955:20181006:140012.191 agent #5 started [active checks #1]
955:20181006:140012.195 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
955:20181006:140112.229 active check configuration update from [127.0.0.1:10051] is working again
955:20181006:140112.229 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
955:20181006:140312.249 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
955:20181006:140512.267 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
955:20181006:140712.283 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
955:20181006:140912.300 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
955:20181006:141112.318 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
955:20181006:141312.336 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
[root@nethmon01 ~]# tail -n 20 /var/log/zabbix/zabbix_agentd.log-20180923
953:20180923:215059.714 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:215259.734 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:215459.753 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:215659.772 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:215859.791 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:220059.810 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:220259.829 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:220459.846 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:220659.861 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:220859.879 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:221059.895 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:221259.911 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:221459.929 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:221659.946 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:221859.962 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:222059.978 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:222259.994 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:222459.014 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:222659.029 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
953:20180923:222859.044 no active checks on server [127.0.0.1:10051]: host [Zabbix] not found
[root@nethmon01 ~]# tail -n 20 /var/log/zabbix/zabbix_server.log
1562:20181006:140023.214 server #23 started [trapper #1]
1563:20181006:140023.215 server #24 started [trapper #2]
1564:20181006:140023.218 server #25 started [trapper #3]
1568:20181006:140023.218 server #29 started [alert manager #1]
1549:20181006:140023.219 server #13 started [escalator #1]
1567:20181006:140023.222 server #28 started [icmp pinger #1]
1550:20181006:140023.222 server #14 started [proxy poller #1]
1569:20181006:140023.223 server #30 started [preprocessing manager #1]
1571:20181006:140023.223 server #32 started [preprocessing worker #2]
1566:20181006:140023.225 server #27 started [trapper #5]
1570:20181006:140023.300 server #31 started [preprocessing worker #1]
1572:20181006:140023.300 server #33 started [preprocessing worker #3]
1565:20181006:140112.229 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1562:20181006:140312.249 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1564:20181006:140512.266 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1563:20181006:140712.283 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1565:20181006:140912.300 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1565:20181006:141112.318 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1564:20181006:141312.336 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1566:20181006:141512.353 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
[root@nethmon01 ~]# tail -n 20 /var/log/zabbix/zabbix_server.log-20180923
1290:20180923:215059.713 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1290:20180923:215259.734 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1291:20180923:215459.753 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1295:20180923:215659.772 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1290:20180923:215859.791 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1290:20180923:220059.810 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1292:20180923:220259.829 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1291:20180923:220459.845 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1292:20180923:220659.861 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1292:20180923:220859.879 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1292:20180923:221059.895 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1293:20180923:221259.910 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1292:20180923:221459.929 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1291:20180923:221659.946 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1292:20180923:221859.962 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1293:20180923:222059.978 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1293:20180923:222259.994 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1293:20180923:222459.014 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1293:20180923:222659.029 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
1293:20180923:222859.044 cannot send list of active checks to "127.0.0.1": host [Zabbix] not found
Did an update of Zabbix from 3.4.x to 4.0.4 LTS on a “productive” Home-Server.
No errors in the Terminal during update, however the Web-Interface of Zabbix still shows 3.4.xx…
Any ideas? (Done the upgrade on two Zabbix Servers, both no problems but still showing the older Versions…)
tnx for the link…
upgrading to 4.0.8 also gives me an error about the cache directory not being writable. changing the owner of the asset folder solves that error, but i still have problems with missing text on graphs. i have rolled back to 4.0.7 for now