
So I woke up late today, around 1300, to find that my Nextcloud instance was down.  I'm hosting it on Debian Bullseye via the regular old tarball manually set up with Apache, MariaDB/MySQL, PHP, etc.  It's been running great for literally years across multiple in-place upgrades to both Nextcloud and Debian.

 

After doing some tinkering it came to my attention that MySQL was complaining it couldn't connect to the database.  Easy enough, I figured; I'll just log into MySQL and see what's wrong.  When I tried to launch the MySQL shell, though, it would ask for the password and then error out saying it couldn't connect to the server.

"ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/run/mysqld/mysqld.sock' (111)"

So I thought maybe the .sock file got messed with during an update or something and wasn't being removed properly, so I verified the location of the correct file by looking at the configs, all of which pointed to the same file, and I then deleted that mysqld.sock file and tried restarting MySQL, but still no dice.  I tried rebooting the whole server just for kicks, no luck.

 

I tried reinstalling MariaDB/MySQL but that apparently doesn't get rid of the existing configuration files, so what I ended up doing was apt purge --autoremove on mariadb-server, deleting /var/run/mysql, then reinstalling it and re-importing my most recent database backup (yesterday).  It's just a personal instance with myself, my wife and kids on it, and I've got it scheduled to do daily backups of the database, so it wasn't a huge issue.  What I'm curious about is why it crapped out in the first place.

 

While poking around in syslog I found the following line:

mariadbd[1115]: 2022-01-01 11:55:02 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption.


That timestamp is hours after any kind of automatic update/reboot would have taken place.
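
For anyone who wants to pull lines like that out of a large syslog, here is a minimal Python sketch. It assumes the MariaDB message format shown in the line above (`mariadbd[PID]: date time thread [LEVEL] message`); the function name `find_mariadb_errors` is made up for illustration.

```python
import re

# Matches the MariaDB portion of a syslog line, e.g.
# "mariadbd[1115]: 2022-01-01 11:55:02 0 [ERROR] [FATAL] InnoDB: ..."
MARIADB_LINE = re.compile(
    r"mariadbd\[(?P<pid>\d+)\]: "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<thread>\d+) "
    r"\[(?P<level>[A-Za-z]+)\] "
    r"(?P<message>.*)"
)


def find_mariadb_errors(lines):
    """Return (timestamp, message) pairs for every [ERROR] line logged by mariadbd."""
    hits = []
    for line in lines:
        match = MARIADB_LINE.search(line)
        if match and match.group("level") == "ERROR":
            hits.append((match.group("timestamp"), match.group("message").rstrip()))
    return hits
```

To use it, open the log (on Debian that is usually `/var/log/syslog`, but treat the path as an assumption) and pass the file object straight in, since the function accepts any iterable of lines.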

 

So something crazy happened that corrupted the actual database, but why would that have broken my ability to log into the MySQL shell to try and correct it?  It's saying I should dump and reimport the database, but I couldn't do that without having access to the MySQL shell.

 

I've checked the logs for apt and I don't see any kind of updates that would have been applied by unattended-upgrades; my last automatic update was December 18th.

 

Did anybody else have anything happen today with their database?  I guess it's definitely possible that Nextcloud encountered some kind of bug and corrupted its database.  I've done short SMART tests on all the drives in the system and found no issues, and the server is running on a UPS so there shouldn't have been any kind of power fluctuation or outage to cause any issues.  My UPS is reporting no events since the 17th either.

 

I guess I'm posting all this just to try and fish for thoughts from any of you who may have encountered this kind of thing in the past, or who may have some idea as to what happened.  I've restored a backup and everything is fine, but if there's something I can do to prevent the issue in the future, I'd like to do so.


Well, I have a bunch of MySQL / MariaDB 5.x and MariaDB 10.x instances which are all running without issue right now.

 

I've had things like that happen before though. One cause is the filesystem temporarily running out of disk space, which can leave a table in need of repair. I also have a suspicion that MySQL doesn't behave well if the data filesystem is briefly marked as read-only, but it's just a hunch.
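
Both conditions are easy to check from a script. The following is a rough Python sketch, not a monitoring solution; the function names and the 10% threshold are arbitrary choices for illustration, and `/var/lib/mysql` in the note below is only the usual Debian default for the MariaDB data directory, so check your own config.

```python
import os
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Return (free_bytes, total_bytes, is_low) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    # Flag the filesystem as low when less than the given fraction is free.
    is_low = usage.free < usage.total * min_free_fraction
    return usage.free, usage.total, is_low


def is_read_only(path):
    """True if the filesystem holding `path` is currently mounted read-only (Unix only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

Calling `free_space_report("/var/lib/mysql")` and `is_read_only("/var/lib/mysql")` from a cron job would at least leave a record if either condition shows up, though a brief dip between two runs would still be missed.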

 

Table corruption can stop MySQL from starting though. That's a thing unfortunately.

 

Personally I'd recommend enabling the binlog and adding "--master-data=2" to your mysqldump line so that you can recover the database right up to the point where corruption occurred. If you back up both the database dump file and the associated binlog files then you're pretty well sorted in terms of data recovery, I think.
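
As a sketch of what that backup line could look like when driven from Python: the option names below are real mysqldump options, but the choice of `--all-databases` and `--single-transaction`, the function names, and the output path are assumptions for illustration. It also assumes binary logging is already enabled on the server (the `log_bin` setting) and that credentials come from an option file rather than the command line.

```python
import subprocess


def build_dump_command(output_path):
    """Build a mysqldump invocation that records the binlog position in the dump."""
    return [
        "mysqldump",
        "--all-databases",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables for the whole dump
        "--master-data=2",       # write the binlog file name and position into the dump as a comment
        f"--result-file={output_path}",
    ]


def run_dump(output_path):
    """Run the dump and raise CalledProcessError if mysqldump exits non-zero."""
    subprocess.run(build_dump_command(output_path), check=True)
```

With the position recorded in the dump, recovery is roughly: restore the dump, then replay the binlog files from that position onward with `mysqlbinlog` piped into the `mysql` client, stopping short of the point where things went wrong.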

I just checked my install of MySQL running on Raspbian and all is well. After a complete wipe and restore, the last entries that would show what was most recently modified, added, or removed are more than likely gone. The last time I had any corruption on my setup was while testing new additions, and it was completely my own doing. Have you checked any connection logging to see if any weird connections showed up around or after the last time you knew it was working?

 

@DonC has a great point, as the missing data between the last backup and the crash could be vital to seeing what happened.

The log entry immediately prior to the error messages is Nextcloud invoking its cron.php script, so I'm guessing it has something to do with that.  I've made copies of syslog from that timeframe so I may dig into it some more later, but I'm tired of reading logs, and since everything is back up and working I'll save it for another day.

Good luck, and I am happy that at least everything is back up and going. Keep us posted if you do dig into this. I am interested to see what you find.

On 01/01/2022 at 21:05, Gerowen said:

 

Did anybody else have anything happen today with their database? 

 

chinese hackers

On 01/01/2022 at 22:44, Marujan said:

chinese hackers

I thought about hackers of some sort, but there were no indications of any files missing or modified, no suspicious entries in auth.log, nothing banned by Fail2Ban, etc.  On top of that, the various services hosted by the server all run under their own non-root user accounts/groups, and SSH is not open to the world and enforces public key authentication.  I'm fairly certain it was just some weirdness with the database during the execution of Nextcloud's cron script.

 

Besides, with only 4 users, outside of some script kiddie who happened across a public share link I've posted somewhere, there's not really any incentive to try and bother my personal server.

Edited by Gerowen

So here's a piece of syslog.  You can see that at 11:50 the cron.php script executes and there are no problems.  5 minutes later it runs again (this is scheduled/expected), and this is where the problems begin.  So in the block of time between 11:50 and 11:55, something screwy happened.  I was asleep at the time, so I personally wasn't doing anything on the server directly, but we've all got cell phones and PCs connected to it all the time, plus I've shared several public links for photo albums and such with family members over Facebook, so even if I wasn't logged in, Nextcloud is constantly doing "something" in the background.

[Image: syslog excerpt covering the 11:50 and 11:55 cron.php runs]
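
For narrowing a log down to a window like 11:50 to 11:55, a small Python filter is enough. This sketch assumes the traditional syslog timestamp prefix (`Jan  1 11:55:02 ...`), which has no year, so the year is passed in separately; the function name `lines_in_window` is made up for illustration.

```python
from datetime import datetime


def lines_in_window(lines, start, end, year):
    """Yield syslog lines whose timestamp falls within [start, end] inclusive."""
    for line in lines:
        try:
            # The first 15 characters hold "Mon DD HH:MM:SS" in traditional syslog format.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip anything that doesn't start with a syslog timestamp
        if start <= stamp <= end:
            yield line
```

Calling it with `datetime(2022, 1, 1, 11, 50)` and `datetime(2022, 1, 1, 11, 56)` on the saved syslog copy would return everything logged between the two cron runs, from every service rather than just MariaDB.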

 

Here's the contents of auth.log for that particular block of time.  Nothing suspicious, root running cron and www-data running Nextcloud's cron.php script.

[Image: auth.log excerpt for the same block of time]

 

The database and the Nextcloud server files are stored on the main system drive, which is a Western Digital Blue 2.5" SSD.  The actual data directory (user files), however, is stored on a separate, encrypted RAID 5 "storage" partition.  Both drives have plenty of free space available.

image.png.f3fed2caa212019ffd5d112db6aa65d9.png

 

image.png.da1083f5449bd822d948673846af89b9.png

 

I never thought to keep a copy of the corrupted database for further inspection; once I got the backup up and running I deleted it.  I even had a copy of /var/run/mysql as a backup in the event that purging/re-installing MariaDB didn't fix the issue, but once it was clear everything was working again I deleted that too.  As far as I can tell, everything looks fine.  All I can figure is that I encountered some kind of weird bug/edge case.

I am running an older system.  The "server" originally started out as an old HP Pavilion P6803W tower PC that I bought ages ago.  Since then it has received an upgrade to a 6 core AMD Phenom II processor, 16GB of RAM, a new power supply, new case, etc.  The only original part is the motherboard.  However, all of the hardware in it is old and used, and the RAM isn't ECC, so it's totally possible that there was some sort of a bit flip or some other hardware issue.  I haven't had any issues in the past, but that doesn't mean they can't start, especially since the system has been running basically 24/7/365 for going on a decade now.  The temps have always been in great shape because I put an over-sized 125 watt cooler on a 95 watt chip.

image.png.2f78d8657437b73c344eb786920159d6.png

 

There are no indications of this being any kind of an attack either.  No changes made to my firewall rules, no new packages installed or removed, no modifications to any of my systemd service files, no files apparently tampered with or bothered, nobody banned by Fail2Ban, no unexpected auth attempts or blocked traffic on my firewall, and no weird entries in syslog/kern.log (at least that I've noticed).

 

On the hardware front all the drives check out after running some short SMART tests, but I will see about doing a memtest scan on it at some point just to verify whether there are any issues with the RAM.  I'm gonna hope that it was just a software bug and I don't encounter it again because even though I don't mind replacing the server, I'm kind of attached to the old girl, :p  I will also verify that I don't have any other services hogging up my RAM as well just to be safe.

Edited by Gerowen
added screenshot as evidence of free space

At least from a quick look, nothing seems out of place. Given how old the hardware is, it could just be a really unfortunately timed hiccup. If the drives check out and no bad sectors are found, my next check would be the RAM.

 

Keep up on those backups to be safe and I hope it does not happen again.🤞
