Judging from the experience of our partners and clients, you may face scalability issues for very different reasons, some of which are:
- good marketing results (finally got traction!),
- launching your services in another country,
- finalizing a deal with a huge company,
- being mentioned on Product Hunt, etc.
No matter what the source of a growing customer base is, it's usually a good sign. But it can also cause you a lot of pain and suffering, e.g. when you hit the limits of the application and it becomes too slow to keep customers interested, or even goes down.
Find out more in the section "Why new users can kill your web application?". If you assume this issue won’t affect you, trust me and read the piece mentioned. Now, let's focus on the situation where you have hit, or are close to reaching, your scalability limits and need immediate help with your codebase.
How to (properly) scale a web application
In this article, I will consider two methods of scaling a web application:
- Scaling a web application by using proper design patterns - well thought out, with good application architecture. Done before you have issues with performance.
- ASAP scalability when your system is struggling with the number of users to handle, and you need to act now.
Implementing good architecture and proper design patterns takes time, and might be problematic when you already have a web app that is not well-designed. This approach can take months to implement. I will highlight some of its features in this article, but in general, it is a topic for another discussion. ASAP scalability is what I would like to focus on in this post. My goal is to help you scale within hours.
Quick ways to scale a web app
(when there is no time for good application architecture)
Let's list five of the easiest and most efficient ways to scale and optimize a web application. Maybe you have already tried some of them, but it's always a good idea to double-check that the changes were introduced properly. I'm going to focus only on quick wins you can apply within hours or even minutes. I'm assuming you are hosting your web application on AWS or a similar cloud provider, but if not, most of the hacks can still be applied to bare-metal servers.
1. Adding more CPU/RAM
I suggest skipping this step if you are not in real trouble, as the other tips are better. But if you went offline because of the load, or the application is very slow, you can take it into consideration. It is a typical trade-off: instead of making the application or infrastructure more efficient, you throw additional money at it to make it run better.
Before increasing any parameters, a bit of theory. There are two approaches to infrastructure scalability:
- Vertical scalability - you make your server bigger - add CPU, RAM etc. But be aware - there is a limit to how much CPU and RAM one server can have, so you will sooner or later hit the ceiling anyway.
- Horizontal scalability - that means you add more servers. This is the proper way to scale a web app. It allows scaling even if vertical scalability does not work anymore.
Horizontal scalability is what you would love to have, and if you can do that - go for it! But in 99% of the projects we have worked on, it was not possible because of the lack of a proper architecture.
An easy step you can take to make your app work better is to upgrade your server with more CPU and RAM. In most cases, it will give you a bit more time to adjust your application and work on other steps described below.
Determine the correct server size
Before you use your cloud console "upgrade" button, it would be wise to check what the biggest issue with your server is. Is it CPU? Lack of RAM? Or maybe the disks are slow?
It's better not to guess the reason. There is no crystal ball to look into for answers. Ask a web developer or server administrator for help and check whether CPU and RAM are the issue or not.
You don't want to skip this step. I've seen people upgrading to 32GB RAM when only 4GB was needed! I’ve also seen applications running slow even though the server was working at 5% of its power. So please, do not skip the verification step.
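As a rough illustration of that verification step, here is a minimal Python sketch that reads CPU load and memory usage on a Linux host and reports which of the two looks saturated. The function names, the 90% memory limit, and the "load average above core count" rule of thumb are my assumptions for this example, not a standard; treat it as a starting point rather than a replacement for real monitoring.

```python
import os


def memory_used_fraction(meminfo_text):
    """Compute the used-memory fraction from the text of Linux /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # values are reported in kB
    return 1 - values["MemAvailable"] / values["MemTotal"]


def find_bottlenecks(load_1min, cpu_count, mem_fraction, mem_limit=0.9):
    """Return the resources that look saturated (hypothetical thresholds)."""
    findings = []
    # A 1-minute load average above the core count suggests work is queueing for CPU.
    if load_1min > cpu_count:
        findings.append("cpu")
    if mem_fraction >= mem_limit:
        findings.append("memory")
    return findings


def check_this_server():
    """Collect readings from the current Linux host using only the standard library."""
    with open("/proc/meminfo") as handle:
        mem_fraction = memory_used_fraction(handle.read())
    return find_bottlenecks(os.getloadavg()[0], os.cpu_count() or 1, mem_fraction)
```

If `check_this_server()` returns an empty list while the application is still slow, a bigger server is unlikely to help, and the cause is probably elsewhere (slow queries, for example).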
When you have a bit more time (after you resolve the current issue), you should definitely make sure you are able to monitor your infrastructure properly. What does that mean? That you have a dashboard with information about CPU, memory, and disk usage across all your servers, and that it will notify you when the load goes up. Take a look at tools like Zabbix or Datadog. The latter works great in our projects.
I’ve seen production systems go down because of a lack of free disk space. This is very easy to fix, and it should never happen. By implementing infrastructure alerts, you would be notified before the issue occurs!
You can start with a very simple dashboard with only basic information. This alone will be a huge improvement!
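To show how little is needed for a first disk-space alert, here is a minimal Python sketch. The 20% and 10% thresholds and the function names are assumptions I picked for illustration; a tool like Zabbix or Datadog gives you the same check with notifications built in.

```python
import shutil


def disk_alert(total_bytes, free_bytes, warn_below=0.20, critical_below=0.10):
    """Classify free disk space as 'ok', 'warning', or 'critical'."""
    free_fraction = free_bytes / total_bytes
    if free_fraction < critical_below:
        return "critical"
    if free_fraction < warn_below:
        return "warning"
    return "ok"


def check_path(path="/"):
    """Check the filesystem that contains `path` on the current machine."""
    usage = shutil.disk_usage(path)
    return disk_alert(usage.total, usage.free)
```

Run from cron every few minutes and wired to an e-mail or chat message, even a check this small would warn you before the disk fills up.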
2. Add cache
Cache is a mechanism that saves calculated data and returns it instead of calculating it again the next time it is needed.
The first thing you want to do with cache to improve your web application scalability is to add a caching layer (e.g. CloudFront or Cloudflare) in front of it. The mentioned services are rather cheap (or even free), installation is quite easy and fast, and it might help you in many cases.
But adding the above services probably won’t fix all your problems, and there are other ways to implement a better caching mechanism.
There are multiple levels of cache that you can introduce:
- Cache database query results - very often data you get from the DB does not change too often or is not required to be "refreshed" on every request. You can achieve massive improvements by aiming at the most time-consuming queries.
- Cache in HTTP headers - this will help the browsers to reuse downloaded content and skip querying for images, CSS, JS on every page load.
- Reverse proxy cache - combined with HTTP cache headers, it can significantly reduce the load on your web server.
When you need to scale fast, you should probably consider only quick and simple solutions: add proper cache headers to your static resources (images, CSS, JS, but also static HTML pages) and put a proxy in front of your website - this means Cloudflare or CloudFront. Thanks to that, most requests will not even reach your server, and the traffic will be significantly lower.
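As a sketch of what "proper cache headers" could look like, the Python function below picks a `Cache-Control` value based on the requested path. The extension list and the lifetimes are assumptions made for this example: the one-year value is only safe if your asset file names change whenever their content changes, and the conservative `no-store` default for everything else is my choice, not a rule.

```python
from pathlib import PurePosixPath

# Hypothetical list of static asset types; adjust it to your application.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff2"}


def cache_control_for(url_path):
    """Pick a Cache-Control header value for a URL path (without query string)."""
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Long-lived: assumes file names are versioned, e.g. app.3f9a1c.css
        return "public, max-age=31536000, immutable"
    if suffix == ".html":
        # Static pages: short lifetime so content updates show up within minutes
        return "public, max-age=300"
    # Dynamic responses: do not cache unless you know it is safe
    return "no-store"
```

In practice you would set the same values in your web server or framework configuration; the function only makes the decision rule explicit.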
As a second step, you can add cache for heavy DB queries. I've seen shops that render a homepage with products, categories, news, promotions, etc. on every single page load. Adding cache is, in most cases, tricky. But it is often possible to cache even a small, but computationally heavy, part of the website. Adding such a cache takes minutes and improves performance significantly. In the case of the already mentioned e-commerce platform, keeping a cache of categories usually does wonders! Just do the math: say you calculate the category tree on each request, and calculating it takes ~200ms. With 10 000 views daily, you waste 10 000 × 0.2s = 2 000s, or about 33 minutes of computing power each day, and you also risk a ~2% conversion decrease.
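Here is a minimal sketch of such a cache in Python: a decorator that keeps the result of an expensive, argument-free function for a fixed number of seconds. The names `ttl_cache` and `load_category_tree`, the sample data, and the five-minute lifetime are all hypothetical; the real function would run your category query.

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.monotonic):
    """Cache the result of a zero-argument function for `seconds`."""
    def decorator(func):
        state = {"expires": None, "value": None}

        @wraps(func)
        def wrapper():
            now = clock()
            if state["expires"] is None or now >= state["expires"]:
                # Cache is empty or stale: recompute and remember the result.
                state["value"] = func()
                state["expires"] = now + seconds
            return state["value"]
        return wrapper
    return decorator


calls = {"count": 0}  # counts how often the expensive work actually runs


@ttl_cache(seconds=300)
def load_category_tree():
    # Stand-in for the expensive query that builds the category tree.
    calls["count"] += 1
    return {"Books": ["Fiction", "Science"], "Music": []}
```

With a five-minute lifetime, the tree is rebuilt at most 288 times a day, no matter how many page views you get. Note that this cache lives in the memory of a single process; with several servers or workers you would typically move it to a shared store such as Redis or Memcached.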
3. Optimizing DB queries
My favorite “hack”. Optimizing the DB queries quite often takes a bit more time than the other solutions described, but it can also help you to achieve exceptional results. Non-optimal queries are the most common cause of issues with performance.
You know that the database is the issue if it is causing huge CPU usage or high disk load (iowait) on the server. Or just check how slowly it responds.
To verify what the real reason for the DB performance issues is, you will need tools like Blackfire, or just the database slow log. It is always a good idea to have Blackfire installed, as it allows you to quickly debug performance issues as they occur and fix them. But a slow log will also do the job.
If you are running your web application on AWS, you can access this data in Performance Insights for RDS.
Based on the slow-log result view, you can spot which queries are causing the biggest load and focus on them first.
Keep in mind that the Pareto rule works here: optimize only the top query first, and see how much it improves your situation.
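To make the "focus on the top query" idea concrete, here is a small Python sketch that ranks queries by the total time they consumed. The input format, a list of `(query, seconds)` pairs, is an assumption for this example; you would first have to extract such pairs from your database's slow log, whose exact format depends on the engine.

```python
from collections import defaultdict


def rank_queries(entries):
    """Rank queries by total time spent.

    `entries` is an iterable of (query_text, seconds) pairs.
    Returns a list of (query_text, total_seconds, call_count),
    with the most expensive query first.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for query, seconds in entries:
        totals[query][0] += seconds
        totals[query][1] += 1
    ranked = [(query, total, count) for query, (total, count) in totals.items()]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```

Ranking by total time rather than by the single slowest run matters: a 50ms query executed thousands of times can cost more than one 5-second report.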
By adding a missing index to a column, you can cut a query's execution time by over 90%. Often a simple index change has allowed us to go from 10 seconds down to 100ms for a query.
Don’t listen to people telling you that the query has to be slow because it runs on 2 000 000 rows. All popular database engines can run queries on much bigger tables in milliseconds. For example, we have a table with over 30 million rows, and we run heavy calculations on it in 100ms (and the simpler ones in 5-20ms).
As mentioned before, Blackfire is our go-to tool when we have performance issues in different parts of an application. Instead of looking blindly for a problem or a fix, you can run Blackfire and verify what went wrong.
Here is an example: after a new release, a page takes very long to load (4s). Now, instead of doing all the research by hand, we just run Blackfire against it, and we know that one query is the reason for all our troubles. We quickly switch to coding, test a fix, and release it 10-15 minutes later (tested and confirmed).
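The effect of a missing index is easy to see for yourself. The sketch below uses SQLite (bundled with Python) purely as a stand-in for your production database, with a hypothetical `orders` table, and asks the engine for its query plan before and after adding an index. MySQL and PostgreSQL offer a similar `EXPLAIN` command, although the output looks different.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(50_000)],
)

SQL = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index, the engine has to scan the whole table.
plan_before = query_plan(conn, SQL, (42,))

conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

# With the index, it can jump straight to the matching rows.
plan_after = query_plan(conn, SQL, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should report a full scan of `orders`, and the second a search using `idx_orders_customer_id`. The same before-and-after check on your real slow query tells you whether an index is actually being used.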
Here is how it looks in Blackfire:
Detailed SQL query execution times:
Looking at the data, it's easy to see which queries need to be optimized. We decided to optimize the part of the page that was executing the slowest queries. The effect? 67% faster load times!
Here is a view comparing the before and after loading times, queries etc.:
But the results can be even better! Recently, one of the developers at Accesto managed to improve the page load time by 97% in just 4 hours of development - here is his story: "My homepage is slowing down".
4. Extract the database to a different server
Very often, the same machine runs both the web server and the database server. This becomes a problem when the database causes the bigger part of the server load. What you can do in most cases is move the database to a separate server and split the load between two machines.
This step will also be required when you plan to scale horizontally, and it is worth taking if you intend to introduce a better web app infrastructure to your business.
If you decide to migrate, take a look at managed services like AWS RDS or Google Cloud SQL. They offer an optimized, easy to scale solution to host a database, and also manage backups, upgrades, etc.
It is also possible to split a database across multiple servers and divide the load even more. In such a scenario, you usually have one primary server that handles updates and inserts of new data, and multiple secondary servers that handle "reads" - select queries, etc.
Before you do this to your production, please, please, please test it on your UAT, staging, or whatever server except for production. We have seen projects with 4000+ queries to a database to render one view, and in such cases, moving to a separate DB server may slow down the website!
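The routing idea behind a primary with secondary servers can be sketched in a few lines of Python. The class below is a deliberately naive illustration with hypothetical names: it sends statements starting with `SELECT` to the secondaries in round-robin order and everything else to the primary. Many frameworks and database proxies offer this kind of splitting, so check what yours provides before writing your own.

```python
import itertools


class ReadWriteRouter:
    """Send reads to secondary servers (round-robin) and writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no secondaries are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def target_for(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
```

Keep two limitations in mind: secondaries may lag slightly behind the primary, so a read issued right after a write can return stale data, and statements such as `SELECT ... FOR UPDATE` or reads inside a transaction belong on the primary, which this simple check does not detect.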
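Why can a separate database server make things slower? Every query now pays a network round trip. The back-of-the-envelope Python helper below estimates that cost; the 0.5ms round-trip time is purely an assumed figure for illustration, so measure the latency between your own servers.

```python
def added_latency_seconds(query_count, round_trip_ms):
    """Estimate the extra time per page spent on network round trips,
    assuming the queries run one after another."""
    return query_count * round_trip_ms / 1000


# 4000 sequential queries at an assumed 0.5 ms round trip each
heavy_view = added_latency_seconds(4000, 0.5)

# 40 queries under the same assumption
light_view = added_latency_seconds(40, 0.5)
```

Under this assumption, the 4000-query view gains about 2 seconds per page load, while a view with 40 queries gains only 0.02 seconds. That is why the number of queries per view is worth reducing before you move the database.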
5. Use proper configuration settings
Each framework or language you use usually has documentation describing suggested configuration settings. Developers often skip it, but introducing the optimizations described there can be low-hanging fruit when it comes to performance.
Let’s take Symfony as an example (https://symfony.com/doc/current/performance.html): by following the rules described in this article, your developers will be able to improve the performance a lot.
If you have more projects, you can introduce a company-wide release checklist to make sure your team covers all requirements.
Let me give you an example of what I meant by "a lot". By introducing only two of these good practices, we managed to go down from 1.2 seconds to 150ms per request. That’s 8 times faster! And it took us 5 minutes to fix!
There are more hacks you can implement to improve your scalability, e.g. extracting the marketing page to a different server, so that even if your web app is down, you will still be able to communicate with customers and leads.
Having read this post, you might ask: if I can scale within hours, why even bother with the first approach and its architectural improvements? Because quick scalability hacks are only a temporary solution. You will always hit another ceiling sooner or later.
I've presented a few hacks that, in my opinion, will help you overcome the most important performance issues with your website. What I suggest next is taking some time to rethink your strategy to prevent future problems. Keep in mind that bad code can derail your project at the least expected moment.
What improvements should be made to be better prepared next time? And a more important question: is your development team skilled enough to make the situation better?
If you are interested in answers to these questions, subscribe to our newsletter. We plan to release more posts on web app architecture that will help both you and your team. If you have any questions or doubts, feel free to contact me. I'll be more than happy to learn about your case and suggest some changes.