We performed housekeeping on our Zabbix instance and by the morning the whole system became unresponsive and even though multiple restarts the database kept hanging. Using the following command: journalctl -u mariadb.service we managed to pull this information.
Feb 05 04:09:07 bezabbix001 mysqld[4042]: 2020-02-05 4:09:07 139744107072640 [Note] /usr/sbin/mysqld (mysqld 10.1.43-MariaDB-0ubuntu0.18.04.1) starting as process 4042 …
Feb 05 04:09:07 bezabbix001 mysqld[4042]: 2020-02-05 4:09:07 139744107072640 [Warning] Could not increase number of max_open_files to more than 16364 (request: 20115)
Feb 05 04:10:37 bezabbix001 systemd[1]: mariadb.service: Start operation timed out. Terminating.
Feb 05 04:12:07 bezabbix001 systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Feb 05 04:13:37 bezabbix001 systemd[1]: mariadb.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Feb 05 04:13:37 bezabbix001 systemd[1]: mariadb.service: Failed with result 'timeout'.
Feb 05 04:13:37 bezabbix001 systemd[1]: Failed to start MariaDB 10.1.43 database server.
The first issue indicated insufficient number of open files. The solution to this issue was to increase those numbers by setting the following in the /lib/systemd/system/mariadb.service file.
LimitNOFILE=200000
LimitMEMLOCK=200000
Once we set the setting above and run reload with the following command the Warning message went away.
sudo systemctl daemon-reload
The server however still wasn’t willing to start showing the following in the log:
Feb 05 04:28:19 bezabbix001 mysqld[4932]: 2020-02-05 4:28:19 140431509490816 [Note] /usr/sbin/mysqld (mysqld 10.1.43-MariaDB-0ubuntu0.18.04.1) starting as process 4932 …
Feb 05 04:29:49 bezabbix001 systemd[1]: mariadb.service: Start operation timed out. Terminating.
Feb 05 04:31:19 bezabbix001 systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Feb 05 04:32:49 bezabbix001 systemd[1]: mariadb.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Feb 05 04:32:49 bezabbix001 systemd[1]: mariadb.service: Failed with result 'timeout'.
Feb 05 04:32:49 bezabbix001 systemd[1]: Failed to start MariaDB 10.1.43 database server.
Feb 05 04:34:20 bezabbix001 systemd[1]: mariadb.service: Got notification message from PID 4932, but reception only permitted for main PID which is currently not known
Feb 05 04:34:22 bezabbix001 systemd[1]: mariadb.service: Got notification message from PID 4932, but reception only permitted for main PID which is currently not known
Feb 05 04:34:22 bezabbix001 systemd[1]: mariadb.service: Got notification message from PID 4932, but reception only permitted for main PID which is currently not known
This indicated that the service didn’t start within the default 90 seconds however it came up later. Since it has been timed out by service control already once the server came up it was immediately shut down. Looking at the /var/www/mysql/error.log which is the error log for maridb/mysql, we found that the server was just busy processing data and the startup was taking much longer than 90 seconds.
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: The InnoDB memory heap is disabled
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Using Linux native AIO
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Using SSE crc32 instructions
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Initializing buffer pool, size = 3.0G
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Completed initialization of buffer pool
2020-02-05 4:39:55 140524339281024 [Note] InnoDB: Highest supported file format is Barracuda.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 86545366 row operations to undo
InnoDB: Trx id counter is 311544320
2020-02-05 4:54:54 140524339281024 [Note] InnoDB: 128 rollback segment(s) are active.
2020-02-05 4:54:54 140520315680512 [Note] InnoDB: Starting in background the rollback of recovered transactions
2020-02-05 4:54:54 140520315680512 [Note] InnoDB: To roll back: 1 transactions, 86545366 rows
2020-02-05 4:54:54 140524339281024 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.45-86.1 started; log sequence number 376642022837
2020-02-05 4:54:54 140520315680512 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 0 failed attempts to flush a page!
2020-02-05 4:54:54 140520315680512 [Note] InnoDB: Consider increasing the buffer pool size.
2020-02-05 4:54:54 140520315680512 [Note] InnoDB: Pending flushes (fsync) log: 0 buffer pool: 1 OS file reads: 341095 OS file writes: 3 OS fsyncs: 2
2020-02-05 4:54:55 140520240146176 [Note] InnoDB: Dumping buffer pool(s) not yet started
2020-02-05 4:54:55 140524339281024 [Note] Plugin 'FEEDBACK' is disabled.
2020-02-05 4:54:55 140524339281024 [Note] Server socket created on IP: '0.0.0.0'.
2020-02-05 4:54:55 140524338853632 [Note] /usr/sbin/mysqld: Normal shutdown
2020-02-05 4:54:56 140524339281024 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.43-MariaDB-0ubuntu0.18.04.1' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Ubuntu 18.04
2020-02-05 4:54:56 140524338853632 [Note] Event Scheduler: Purging the queue. 0 events
2020-02-05 4:54:56 140520273716992 [Note] InnoDB: FTS optimize thread exiting.
2020-02-05 4:54:56 140524338853632 [Note] InnoDB: Starting shutdown…
2020-02-05 4:54:56 140524338853632 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2020-02-05 4:54:58 140524338853632 [Note] InnoDB: Shutdown completed; log sequence number 376642023203
2020-02-05 4:54:58 140524338853632 [Note] /usr/sbin/mysqld: Shutdown complete
The following settings were added to the /lib/systemd/system/mariadb.service file to increase service start and stop timeout.
TimeoutStartSec=infinity
TimeoutStopSec=infinity
Run reload to apply these settings:
sudo systemctl daemon-reload
The mariadb server slowly but finally started after increasing the timeout.
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: The InnoDB memory heap is disabled
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Using Linux native AIO
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Using SSE crc32 instructions
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Initializing buffer pool, size = 5.0G
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Completed initialization of buffer pool
2020-02-05 5:12:06 139656180632704 [Note] InnoDB: Highest supported file format is Barracuda.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 86545366 row operations to undo
InnoDB: Trx id counter is 311544832
2020-02-05 5:25:38 139656180632704 [Note] InnoDB: 128 rollback segment(s) are active.
2020-02-05 5:25:38 139649825634048 [Note] InnoDB: Starting in background the rollback of recovered transactions
2020-02-05 5:25:38 139649825634048 [Note] InnoDB: To roll back: 1 transactions, 86545366 rows
2020-02-05 5:25:38 139656180632704 [Note] InnoDB: Waiting for purge to start
2020-02-05 5:25:38 139656180632704 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.45-86.1 started; log sequence number 376642023203
2020-02-05 5:25:38 139649825634048 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 0 failed attempts to flush a page!
2020-02-05 5:25:38 139649825634048 [Note] InnoDB: Consider increasing the buffer pool size.
2020-02-05 5:25:38 139649825634048 [Note] InnoDB: Pending flushes (fsync) log: 1 buffer pool: 0 OS file reads: 341115 OS file writes: 5 OS fsyncs: 4
2020-02-05 5:25:39 139649750099712 [Note] InnoDB: Dumping buffer pool(s) not yet started
2020-02-05 5:25:39 139656180632704 [Note] Plugin 'FEEDBACK' is disabled.
2020-02-05 5:25:39 139656180632704 [Note] Server socket created on IP: '0.0.0.0'.
2020-02-05 5:25:40 139656180632704 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.43-MariaDB-0ubuntu0.18.04.1' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Ubuntu 18.04
2020-02-05 5:25:53 139649825634048 [Note] InnoDB: To roll back: 1 transactions, 86544705 rows
2020-02-05 5:26:08 139649825634048 [Note] InnoDB: To roll back: 1 transactions, 86543690 rows
2020-02-05 5:26:23 139649825634048 [Note] InnoDB: To roll back: 1 transactions, 86464992 rows
2020-02-05 5:26:38 139649825634048 [Note] InnoDB: To roll back: 1 transactions, 85968784 rows
2020-02-05 5:26:53 139649825634048 [Note] InnoDB: To roll back: 1 transactions, 85674936 rows
2020-02-05 5:27:08 139649825634048 [Note] InnoDB: To roll back: 1 transactions, 85274020 rows
Thanks, very helpful!