
FreeForums.org utilizes several advanced technologies to bring its forum hosting platform to its viewers.

The Web Front

FreeForums utilizes the Zeus ZXTM load balancing software to distribute all incoming requests for forums across its Intel Xeon 5400 web servers. Zeus gives us a seamless way to distribute requests and to cope with web servers that go down, without interrupting the viewers.

All incoming requests land at the Zeus ZXTM server. Zeus then selects a backend web node to serve the request based on the current load of each node and historical data to suggest which node has the most available processing power. In the event that a node is unreachable, it is removed from the pool and all requests are redirected to another node. This happens without the viewer ever knowing there was a problem.
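As a rough illustration of the selection step described above, here is a minimal Python sketch. It is not Zeus ZXTM's actual algorithm: it looks only at a current load figure and ignores historical data, and the Node class, the pick_node function, and the node names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float            # current load, 0.0 (idle) to 1.0 (saturated)
    reachable: bool = True


def pick_node(pool):
    """Return the reachable node with the lowest current load."""
    candidates = [node for node in pool if node.reachable]
    if not candidates:
        raise RuntimeError("no reachable web nodes in the pool")
    return min(candidates, key=lambda node: node.load)


# Hypothetical pool of three web nodes.
pool = [Node("web1", 0.72), Node("web2", 0.35), Node("web3", 0.50)]
first = pick_node(pool)        # web2 currently has the lowest load

pool[1].reachable = False      # simulate web2 becoming unreachable
second = pick_node(pool)       # requests now go to web3 instead
```

The point of the sketch is that an unreachable node simply drops out of the candidate list, so the viewer's request is served by another node rather than failing.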

All customer data is distributed across several MySQL storage servers, each utilizing Intel Xeon 5400 CPUs and an eight-disk, 15,000 RPM SAS RAID array. This distribution ensures that a problem with any one server will not affect the others, and it spreads the load so that the network as a whole has adequate capacity.

Usage of Memcached

Reducing the read/write load on our MySQL servers has been a continuing struggle for us. To help us achieve our goals, we utilized Memcached.

Using Memcached, we are able to store stats, configuration data, and more in the memory of our web servers. Doing this saves us from hitting the MySQL disks as often and results in pages loading faster. A perfect example: the Ajax chat system runs entirely within Memcached. Without Memcached, the Ajax chat system could not work, as it would put too much load on the MySQL servers.

Backup System

FreeForums.org has a lot of data (and I mean a lot of data). Each MySQL server has well over 20 million files on its filesystem. Every customer also has their own attachments folder full of files. Backing up all this data has never been easy. At first we used rsync to send manual backups to our backup server, but that proved to be time-consuming, slow, and unreliable.

Enter our backup system. It is an automated solution which provides us with point-in-time backups of all our data and easy point-and-click file restores. It runs continuously and has helped us many times in the past to recover lost data.

Our global MySQL server, support board, and fileserver get backed up every 15 minutes and maintain over two weeks of checkpoints. This means that we can choose which version of a file we wish to restore from over the past two weeks. Our MySQL data servers are backed up once per day and also maintain two weeks of checkpoints.
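To put those schedules in perspective, the short Python calculation below counts how many restore points each one yields. It assumes a retention window of exactly two weeks (the text says "over two weeks", so the real numbers are at least this large) and one checkpoint per backup run; the function name is made up for the example.

```python
from datetime import timedelta

RETENTION = timedelta(weeks=2)  # assumed lower bound on the retention window


def checkpoints_kept(interval, retention=RETENTION):
    """Number of whole backup intervals that fit in the retention window."""
    return retention // interval


frequent = checkpoints_kept(timedelta(minutes=15))  # global MySQL, support board, fileserver
daily = checkpoints_kept(timedelta(days=1))         # MySQL data servers
```

Under those assumptions, the 15-minute schedule keeps 1,344 restore points per two-week window, against 14 for the daily schedule.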

Since July 21, the way we think about and plan for our future at FreeForums has shifted. We wanted to make sure that what happened that night could never happen again. At the same time, we could see new problems brewing.

Database Clusters
Due to the rate at which FreeForums was growing and the limitations of MySQL replication, we saw ourselves running into a very severe roadblock. At the time, we had one master MySQL server and several slave servers. All writes went to the master and were then replicated to the slaves, which served all reads.
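A read/write split of this kind can be sketched in a few lines of Python. This is only an illustration of the routing idea, not the code FreeForums ran: the class name and server names are hypothetical, and it treats every statement that is not a SELECT as a write.

```python
import itertools


class ReplicatedDatabase:
    """Route writes to the master and spread reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple rotation over slaves

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._slaves)
        return self.master


db = ReplicatedDatabase("master", ["slave1", "slave2"])
read_a = db.route("SELECT * FROM posts WHERE topic_id = 7")
write = db.route("UPDATE posts SET views = views + 1 WHERE id = 3")
read_b = db.route("select name FROM users")
```

Note that every write still has to be applied on every slave, which is why adding slaves helps with reads but does nothing for write volume.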

The problem we saw was that MySQL replication on the slaves could use only a single CPU core to process updates coming from the master. The master could use all of its cores to handle writes, yet each slave could apply them with only one. Eventually a slave would hit a wall at 100% usage of that core and be unable to process updates any faster. The result: slaves would fall out of sync.
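The effect is easy to see in a toy model. The Python sketch below tracks a slave's backlog of unapplied updates when incoming writes exceed what one core can apply per interval. All of the numbers are invented for illustration; they are not measurements from our servers.

```python
def slave_backlog(write_rates, slave_capacity):
    """Return the slave's pending-update backlog after each interval.

    write_rates: updates arriving from the master in each interval.
    slave_capacity: updates one replication thread can apply per interval.
    """
    backlog = 0
    history = []
    for incoming in write_rates:
        # The backlog grows when writes outpace the slave, and drains otherwise.
        backlog = max(0, backlog + incoming - slave_capacity)
        history.append(backlog)
    return history


# Hypothetical traffic: quiet morning, busy midday, quieter evening.
rates = [800, 1200, 1500, 1500, 900, 600]
history = slave_backlog(rates, slave_capacity=1000)
# history == [0, 200, 700, 1200, 1100, 700]
```

Once the backlog is non-zero, the slave is serving stale reads, and it only catches up when traffic drops below its capacity for long enough.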

Our solution to this was to break FreeForums up into multiple independent MySQL clusters. This would distribute writes over several master servers and allow us to add even more slave servers. But writing a system for this would not be easy, and it would take extensive development and testing.
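One simple way to picture such a system is a directory that records which cluster each forum database lives on. The Python sketch below is an assumption about the general shape, not our actual implementation: the cluster layout, database names, and function names are all hypothetical, and the work of copying data between clusters is not shown.

```python
# Hypothetical cluster layout: cluster id -> its master and slaves.
CLUSTERS = {
    0: {"master": "c0-master", "slaves": ["c0-slave1", "c0-slave2"]},
    1: {"master": "c1-master", "slaves": ["c1-slave1"]},
}

# Hypothetical directory: forum database name -> cluster id.
directory = {"forum_a": 0, "forum_b": 0, "forum_c": 0}


def master_for(db_name):
    """Return the master server that takes writes for this database."""
    return CLUSTERS[directory[db_name]]["master"]


def move_to_cluster(db_names, cluster_id):
    """Point databases at another cluster (copying the data is not shown)."""
    if cluster_id not in CLUSTERS:
        raise KeyError(f"unknown cluster {cluster_id}")
    for name in db_names:
        directory[name] = cluster_id


before = master_for("forum_b")      # "c0-master"
move_to_cluster(["forum_b"], 1)
after = master_for("forum_b")       # "c1-master"
```

Because each cluster has its own master, moving busy databases to a new cluster takes their writes off the original master and its slaves.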

While we were developing this new system, we began hitting the write bottleneck on our slaves. Throughout December, slaves fell out of sync whenever high traffic hit our web servers; however, thanks to the holidays, we were given two weeks of lower traffic, which gave us more time to test and develop the system. Come January, that breathing room went away, and for the entire first week and a half, slaves were out of sync every day from 11AM through 7PM.

With it obvious that we were out of time, we halted testing of the second cluster system and launched it. The tests had been going well, and we wanted to test a few more things to make certain that the system would work as expected, but there was no time left. On the morning of January 15, while watching Cluster0 slaves fall out of sync at 6AM, I decided that we had to launch the system immediately. At that time, we picked 39 databases accounting for 20% of our daily traffic and sent them to the new cluster.

Lo and behold, the Cluster0 slaves synced back up, and since that date we haven’t had a single slave fall out of sync. At this time, we have finished testing and development of the clustering system and will be using it from now on. The addition of clustering was a huge step forward for us, as it will help ensure the speed and stability of FreeForums in the coming years.

Web Servers
The second problem we were already facing was that we needed a more stable environment to serve the websites that we were hosting. We had multiple web servers, but we were load balancing them with DNS round robin. Round robin is a method in which your DNS records, which the Internet uses to look up where your website is located, list multiple IP addresses. When you type in a web address, your computer randomly selects from that pool of IP addresses, and that is the server you land on. The problem is that there is no real load distribution and no failover protection.

Those two issues would spell chaos as FreeForums grew. Even though we had three web servers, one of them was receiving 50% of all traffic, with the other two receiving 25% each. Next, what would happen if, say, the web server receiving 50% of traffic went down? The answer is simple: 50% of viewers wouldn’t be able to access the forums they were trying to view.
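The arithmetic behind that answer can be written as a small Python sketch. It compares DNS round robin, where a dead server keeps receiving its share of traffic, with a balancer that removes dead servers from the pool. The traffic shares are the figures quoted above; the server names and the function are made up for the example, and the model ignores DNS caching and client retries.

```python
# Share of traffic each web server received (server names are hypothetical).
traffic_share = {"web1": 0.50, "web2": 0.25, "web3": 0.25}


def failed_fraction(shares, down, health_checked):
    """Fraction of requests that fail when the servers in `down` are offline."""
    if health_checked:
        # A balancer that drops dead nodes only fails if every node is down.
        return 1.0 if set(shares) <= set(down) else 0.0
    # Plain DNS round robin keeps sending traffic to dead servers.
    return sum(share for name, share in shares.items() if name in down)


dns_only = failed_fraction(traffic_share, {"web1"}, health_checked=False)
balanced = failed_fraction(traffic_share, {"web1"}, health_checked=True)
```

With web1 offline, dns_only comes out at 0.5 and balanced at 0.0, which is the gap that a health-checking load balancer is meant to close.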

The solution to this was simple in theory; however, it would take time to test and develop. Consulting with SoftLayer network engineers, we were directed to look at the Zeus ZXTM Load Balancer. Zeus ZXTM is a software suite that distributes requests for various TCP and UDP services based on rules that you set for it.

What Zeus ZXTM gives us is a level of stability and redundancy on our web servers that FreeForums.org and its users have never seen before. By sending all requests to the Zeus ZXTM load balancing server, we are able to evenly distribute all requests over all web servers as well as redirect requests when a web server goes offline. This will make it impossible for a failed web server to ever be noticed by our viewers again.

As of January 26, all viewers are accessing FreeForums.org by way of the Zeus ZXTM load balancer, and we couldn’t be happier.

The Future
FreeForums.org is well on its way to making all points of service HA (high availability) by making every single service redundant with backup servers, but we still have a ways to go before we achieve this.

First, the global MySQL servers, which house the data shared between all clusters, are still set up in a master/slave arrangement. A slave keeps a mirrored copy of the data, but should the master go down, a human will need to promote the slave to master. The solution to this is to utilize MySQL Cluster, which allows all servers in the mix to act as masters. Once this is done, if the primary server ever goes down, the others can take over immediately without anyone needing to intervene.

Next, the file storage server is currently standalone. There is no failover, there is no mirror. Just backups being made every fifteen minutes. But, aha!, yet another easy fix. We will be looking into using the Lustre File System as a means to create several file servers which act as one unified server. Should a server ever fail, go offline, or do anything that makes it unavailable, all requests to that server will be directed to the others. No one would ever know the difference.

Lustre is maintained by Sun Microsystems, the company behind ZFS, a very powerful and robust filesystem. Lustre is currently in use by several of the world’s largest supercomputers, hosting hundreds of millions of files totaling petabytes of space. So we’re confident that it can handle our humble hundreds of thousands of files.

Many more changes are coming, and things are sure to be very exciting as we move into the future.