Before I begin, I should note that this is a very technical and in-depth post. Good luck.
FreeForums.org launched in February of 2007, and has grown at a rapid pace ever since. With that comes growing pains.
For the first several months, a very simple setup was enough to keep the forums moving along. Within a few months, the need for more power came, but still nothing a single Intel Xeon couldn’t handle. That was bound to change. At the time that I came to FreeForums.org, the setup was as follows:
Web Server
Single Intel Xeon L5310 (quad-core, 1.60GHz, 8MB L2 cache)
2GB DDR2-667 FBDIMM RAM
250GB SATA2 Hard Drive
Overall CPU Usage: 30%
Database Server
Quad AMD Opteron 8212HE (dual-core, 2.00GHz, 1MB L2 cache)
16GB DDR2-667 FBDIMM RAM
250GB SATA2 Hard Drive
Overall CPU Usage: 20%
At the time, there were several major design flaws that had not originally posed a problem.
Non-RAID Data Storage
In the world of servers and corporations, data redundancy and protection are an absolute must. RAID takes multiple hard drives and configures them in such a way that the loss of a number of hard drives (due to failure, for example) will not result in data loss (this is true for all levels of RAID except RAID0).
The importance of RAID became very obvious the night I began consulting for FreeForums.org. That night, I was called in because all attempts to synchronize the new MySQL server with the existing server’s data had failed. The InnoDB database (see below) could not be read by the new server no matter what the staff tried.
My solution was that if we couldn’t move the data file, then we would export the data ourselves and send it over row-by-row to the new server. But therein lay yet another problem: all of the tables were in a single database. MySQL’s SQL export utility “mysqldump” was unable to dump SQL data as fast as we needed it to (45 minutes per forum at 44,000 forums or so works out to a total time of 3.77 years). So my solution was to write a script that would read the data from MySQL, create the proper schema for it, and send it to the new server.
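To make the 3.77-year figure concrete, here is the arithmetic as a quick Python sketch. It uses only the two numbers quoted above and assumes the forums would be dumped one after another with no parallelism.

```python
# Back-of-the-envelope estimate for dumping every forum with mysqldump,
# using the figures quoted in the post and assuming strictly serial dumps.
MINUTES_PER_FORUM = 45
FORUM_COUNT = 44_000

total_minutes = MINUTES_PER_FORUM * FORUM_COUNT
total_hours = total_minutes / 60
total_days = total_hours / 24
total_years = total_days / 365

print(f"{total_hours:,.0f} hours = {total_days:,.0f} days = {total_years:.2f} years")
# prints "33,000 hours = 1,375 days = 3.77 years"
```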
On the night of July 21, while the script was running and moving forums (it had moved roughly 4000 forums), the old MySQL server crashed and would never come back up. The single hard drive housing the InnoDB data file (containing all the forum data) corrupted and failed. After trying for 14 hours to recover the data, it became obvious that the data was lost.
The solution to prevent this in the future was simple: utilize RAID. The first hardware change I made with FreeForums.org was to institute the usage of RAID10 on all MySQL servers. RAID10 is a level of RAID which utilizes the speed benefits offered by RAID0 but provides data protection at the same time. In RAID0, each hard drive is set up as a segment of one linear volume, and data is written in blocks striped across all of the hard drives. This gives you a very fast virtual hard drive, but it offers no data protection. Four 80GB hard drives set up in RAID0 will offer you 320GB of storage space, but if even one hard drive fails you will lose all your data. RAID10 allows you to gain the benefits of RAID0, but it gives you data protection by making each segment a RAID1 mirror of two drives. In this case, four 80GB hard drives will offer you 160GB of storage space, but if you lose a drive in any segment, you’re safe. You can lose one drive in every segment without data loss, but should both drives in the same segment fail, you lose all of your data. The chances of that happening before you have had time to replace the first failed drive are slim to none.
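The capacity and failure rules described above can be sketched as a small Python model. This is only an illustration, not how any RAID controller works internally: the function names are made up for this example, and the pairing of drives 0 and 1, 2 and 3, and so on into mirrored segments is an assumption.

```python
def raid0_capacity(drive_count, drive_size_gb):
    """RAID0 stripes across every drive, so all of the space is usable."""
    return drive_count * drive_size_gb


def raid10_capacity(drive_count, drive_size_gb):
    """RAID10 mirrors drives in pairs, so only half of the space is usable."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("this model expects an even number of drives, at least 4")
    return (drive_count // 2) * drive_size_gb


def raid10_survives(drive_count, failed_drives):
    """Return True if no mirrored pair has lost both of its drives.

    Assumption for this sketch: drives 2k and 2k+1 form mirrored segment k.
    """
    failed = set(failed_drives)
    for segment in range(drive_count // 2):
        if {2 * segment, 2 * segment + 1} <= failed:
            return False  # both halves of one mirror are gone
    return True


print(raid0_capacity(4, 80))         # 320
print(raid10_capacity(4, 80))        # 160
print(raid10_survives(4, {0, 2}))    # True: one drive lost in each segment
print(raid10_survives(4, {0, 1}))    # False: both drives of segment 0 lost
```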
Another benefit of RAID is that it gets you added disk bandwidth. The more drives you have, the more bandwidth is available to you. For us, that is one definite plus.
Thanks to RAID, we will never have to worry about what happened on July 21 happening again.
InnoDB Storage Engine in MySQL
InnoDB is a great storage engine when you have very large and active tables, but when you have an insane number of tiny tables, InnoDB becomes a huge problem. Mainly, InnoDB uses a lot of system resources to read data from its main file (ibdata1). Back in July, we needed, at an absolute minimum, 16GB of RAM on the MySQL server to stay online for even 8 hours. But the truth of the matter was that we needed 32GB to remain stable, and that number would only grow as FreeForums.org grew.
The solution to this was a simple yet time-consuming task: convert from the InnoDB storage engine to MyISAM, which is more suited to our needs. In August, I wrote a script which would alter our database tables to MyISAM one table at a time. Once launched, the script took 181 hours (7 days, 13 hours) to finish. The conversion was done with the forums online, as 7 days offline was unacceptable. The good thing was that not a single person noticed we were doing this, as it caused zero downtime.
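A minimal sketch of that kind of one-table-at-a-time conversion, not the actual script that was used, might look like the following. It only builds the ALTER TABLE statements. The schema and table names are hypothetical, and in practice the list of tables could be read from information_schema.TABLES (filtering on ENGINE = 'InnoDB') and each statement executed through whatever MySQL client you use.

```python
def quote_identifier(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def myisam_conversion_statements(tables):
    """Yield one ALTER TABLE statement per (schema, table) pair.

    Emitting one statement per table mirrors the one-table-at-a-time
    approach described above: only the table being rebuilt is affected.
    """
    for schema, table in tables:
        yield (
            f"ALTER TABLE {quote_identifier(schema)}.{quote_identifier(table)} "
            "ENGINE=MyISAM;"
        )


# Hypothetical table list, for illustration only.
example_tables = [
    ("forums", "forum1_posts"),
    ("forums", "forum1_users"),
]

for statement in myisam_conversion_statements(example_tables):
    print(statement)
```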
The outcome of the conversion was noticed immediately. Our database server went from needing at least 16GB of RAM to needing a mere 4.50GB. Furthermore, database restarts went from taking 45 minutes or longer to only seconds, as MySQL no longer needed to read through an InnoDB data file that was hundreds of gigabytes in size.
Ultimately, this change was absolutely necessary. Had we not made the change, then today we would need at least 128GB of RAM on our MySQL servers to stay online. A few months from now, we’d be up to 192GB, which is currently the maximum that any Intel server can address (AMD is limited to 128GB).
Single Database for All Forums
Originally, a single database to house all forums was viewed as a decent choice, as there weren’t many forums, there weren’t many tables, and this appeared to be the simplest system to work with. At the time, this was true. But with growth comes new needs, and things change.
As of July 21, there were just over 44,000 forums being hosted by FreeForums.org, and they were all hosted within a single database. This introduced a handful of problems. First, MySQL had to be able to address and stat every single file within the database (over 5 million files) every time it started up. The problem there was that the Linux kernel can only handle so many files in a single directory before it runs into problems, and we were well past that point. On top of that, there were no inodes left on the hard drive.
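Running out of inodes is easy to miss because the disk can still show free space. As a rough sketch of how you might keep an eye on it, the following Python snippet reads the inode counters for a filesystem with os.statvfs. It is Unix-only, the inode_usage helper is just a name chosen for this example, and some filesystems report a total of zero, which the sketch treats as "unknown".

```python
import os


def inode_usage(path="/"):
    """Return (total_inodes, free_inodes, percent_used) for the filesystem at path.

    percent_used is None when the filesystem does not report an inode total.
    """
    stats = os.statvfs(path)
    total = stats.f_files   # total inodes on the filesystem
    free = stats.f_ffree    # inodes still available
    if total == 0:
        return total, free, None
    percent_used = (total - free) / total * 100
    return total, free, percent_used


total, free, percent_used = inode_usage("/")
if percent_used is None:
    print("This filesystem does not report inode counts")
else:
    print(f"{free:,} of {total:,} inodes free ({percent_used:.1f}% used)")
```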
The solution to the single-database problem was implemented while migrating forums from the old database server over to the new one: store 2,000 forums in a single database, then make a new database and start over. This fixed many of our issues, including the slow MySQL dumps and the kernel issues, and it helped reduce start time as well.
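The 2,000-forums-per-database rule can be sketched as a simple mapping from a forum's position to a database name. This is an illustration only: the "forums_" prefix, the zero-based forum index, and the function name are all assumptions made for this example, not the naming actually used.

```python
import math

FORUMS_PER_DATABASE = 2000  # figure quoted in the post


def database_for_forum(forum_index, prefix="forums_"):
    """Map a zero-based forum index to the database that would hold it.

    Forums 0-1999 land in the first database, 2000-3999 in the second,
    and so on. The prefix is a hypothetical naming scheme.
    """
    if forum_index < 0:
        raise ValueError("forum_index must be non-negative")
    return f"{prefix}{forum_index // FORUMS_PER_DATABASE}"


print(database_for_forum(0))       # forums_0
print(database_for_forum(1999))    # forums_0
print(database_for_forum(2000))    # forums_1

# Number of databases needed for 44,000 forums at 2,000 per database.
print(math.ceil(44_000 / FORUMS_PER_DATABASE))  # 22
```

Splitting this way also keeps the number of table files in any one database directory bounded, no matter how many forums are added later.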
The Future
These changes were the first of many that were needed, and still are needed, to bring FreeForums.org into a new day: a day free of problems.
If you have any questions, please feel free to leave them in a comment. I’d be glad to answer them.
In the next blog entry, I’ll detail what we have done since July 21 through today.