3 Coding Mistakes That Lead to Unscalability

Every project starts small; ours did too. Before we knew it, it became a hit. I am talking about our Product Kits app for Shopify. You can read more about how we scaled from 100 to 100,000 hits in our Shopify app. It was a great learning experience: we realized how trivial things can lead to a big mess, but luckily almost everything was caught on the first incident because we had proper logging.

Unfamiliarity with Race Conditions

If you have a lot of traffic, the state of your system is unknown to two simultaneous operations unless you code for it. This can cause duplicate data (and all the fallout that comes with it), or operations being silently ignored. For example, Laravel’s Eloquent has firstOrCreate: it looks for a record matching a given condition and creates one if none exists. Shopify, however, was sending the same webhook multiple times, and in between the SELECT and the INSERT of one request, the SELECT of the other webhook ran and found no result, so both inserted. Imagine the agony of duplicate data: if a query uses GROUP BY and then ORDER BY, ASC returns the first record of a duplicate and DESC returns the last, so the same operation can behave differently on each run, leading to a mess. To avoid it, we used a locking SELECT: FOR UPDATE.
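
Here is a minimal sketch of that fix in Laravel, assuming a hypothetical webhooks table keyed by Shopify’s webhook id; the table and column names are illustrative:

    use Illuminate\Support\Facades\DB;

    DB::transaction(function () use ($webhookId, $payload) {
        // lockForUpdate() issues SELECT ... FOR UPDATE, so a concurrent
        // request for the same webhook blocks here until we commit.
        $existing = DB::table('webhooks')
            ->where('webhook_id', $webhookId)
            ->lockForUpdate()
            ->first();

        if ($existing === null) {
            DB::table('webhooks')->insert([
                'webhook_id' => $webhookId,
                'payload'    => $payload,
            ]);
        }
        // A non-null $existing means the webhook was already handled; skip it.
    });

The duplicate request now waits for the first transaction instead of racing it, so only one row is ever inserted.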

Unoptimized Migrations

If you are upgrading your app, you probably need migrations. Sometimes these migrations require you to operate on data already present in your database, or on values derived from other data in the table. For example, we were changing a 1:N relation to an N:M relation, so a new table had to be created to hold the relations, and so on. In a local environment everything runs in less than a second: you don’t have 200GB of data locally (usually), and staging rarely keeps the exact same data either. Now imagine what truly unoptimized code can do to 200GB of data. For starters, it will choke the RAM, or simply return an error if you load all the data in one go. With row-by-row iteration, it might take hours. We wrote a procedure to do it inside MySQL, without needing to pull the data out, process it in PHP, and write it back.
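
As a minimal sketch of the idea, assuming hypothetical products and product_kit tables (the names are illustrative), the backfill can be a single set-based statement that never moves rows through PHP:

    use Illuminate\Support\Facades\DB;

    // Copy the old 1:N foreign key into the new N:M pivot table,
    // entirely inside MySQL, in one set-based statement.
    DB::statement('
        INSERT INTO product_kit (product_id, kit_id)
        SELECT id, kit_id
        FROM products
        WHERE kit_id IS NOT NULL
    ');

One INSERT ... SELECT keeps the data inside the database, where the server can stream it, instead of buffering 200GB in PHP.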

Choosing the Hammer

Just like not all languages work the same way, not all frameworks are the same. Yii comes with a fantastic built-in cache with invalidation rules, but Laravel, the most used framework, is missing it. PHP doesn’t have threads or a native bigint; Ruby and Golang do. So don’t just pick the hammer, have a toolbox instead, and use Docker to combine all of your tools for scalability.

Conclusion

Unfortunately, we take a lot of things for granted when we write the first line of code. The worst thing that can happen to you is to become successful and not handle it well. I hope these mistakes will not be made in your next project. I wish you good luck!


Nginx Proxy Caching for Scalability

Since our servers are spread across multiple locations, we had a lot of issues regarding speed. If a resource is served from a server in a different location, outside the local network, there is a latency of about 500ms to 750ms. That is a lot, and it is unavoidable if you are running maintenance on local machines and have configured load balancing using Nginx.

By default, caching is off, so every request for a resource goes to the proxied server, causing a lot of latency. The Nginx cache is so advanced that you can tweak it for almost every use case.

Generic configuration of any proxy cache

Storage, validity, invalidation, and conditions are the basic requirements of any proxy cache.

Imagine the following configuration:

http {
    proxy_cache_path  /data/nginx/cache  levels=1:2    keys_zone=SCALE:10m inactive=1h  max_size=1g manager_files=20 manager_sleep=1000;
    server {
        location / {
            proxy_cache            SCALE;
            proxy_pass             http://1.2.3.4;
            proxy_set_header       Host $host;
            proxy_cache_min_uses   10;
            proxy_cache_valid      200  20m;
            proxy_cache_valid      401  1m;
            proxy_cache_revalidate on;
            proxy_cache_use_stale  error timeout invalid_header updating
                                   http_500 http_502 http_503 http_504;
        }
    }
}

Configuring proxy_cache_path for scalability

The cache directory is defined as a ‘zone’ with proxy_cache_path. Cache entries are first written to temporary files and then renamed into place, which avoids recurring ‘partial’ responses. A special cache manager process deletes cached files that have not been accessed for one hour, as specified by inactive=1h. To be less CPU intensive, manager_files is set to 20, so each pass over inactive files deletes only 20 files instead of the default 100. Similarly, manager_sleep is raised to 1000 from the default 50, giving a one-second pause before the next cycle of handling inactive files. Tweaking loader_files, loader_threshold, and loader_sleep is generally not necessary; the defaults are good enough.

Please note that using proxy_pass with a bare IP as above isn’t recommended; for more detail, please see the guide on using an Nginx Reverse Proxy for Scalability.

Configuring proxy_cache_min_uses for scalability

proxy_cache_min_uses sets the minimum number of times a resource must be requested before it is cached. Obviously, you don’t want rarely requested resources to take up cache space, hence it has been increased to 10 in our case. This can be different for you; you might want a lower or a higher value.

Configuring proxy_cache_revalidate for scalability

By default, proxy_cache_revalidate is off. Turning it on makes Nginx refresh expired cache entries with conditional requests to the upstream (using the If-Modified-Since and If-None-Match/ETag headers), much like a browser does, so an unchanged resource costs a 304 instead of a full response.

Conclusion

Nginx is extremely powerful, but in order to use it as a caching reverse proxy, not only must a cache zone be configured, some of the default values must be tweaked as well.


Nginx Reverse Proxy for Scalability

Nginx comes with a wonderful reverse proxy with tons of options. But the usual way of proxying is flawed in the sense that it doesn’t allow load balancing. For example, consider this one:

Usual way of Reverse Proxy

    location / {
        try_files $uri @app;
    }
    location @app {
        proxy_pass http://127.0.0.1:8081;
        ...
    }

All the requests in location / will go to http://127.0.0.1:8081, but once you have outgrown the local server and need an additional one, you have to make a lot of changes. Nginx, however, comes with ‘upstream’, which makes the setup more manageable and less prone to changes as more servers are added, as shown below.

Better Reverse Proxy

http {
    upstream app {
        server 127.0.0.1:8081;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ...
        }
    }
}

With this approach, you have a proxy running just like before, but if you want to add a server, it is as easy as the following:

http {
    upstream app {
        server 127.0.0.1:8081;
        server 192.168.0.2:8081;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ...
        }
    }
}

Weighting Servers

Since the local server, the one at 127.0.0.1:8081, might have a lot going on (each application has many services, and at least in the beginning they all run on a single server), it is important that this server gets less traffic than the others. To do that, you just need to add a ‘weight’:

http {
    upstream app {
        server 127.0.0.1:8081;
        server 192.168.0.2:8081 weight=5;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ...
        }
    }
}

Making the Initial Server a “Backup”

As stated above, you probably have a lot of things going on on the initial server. Hence, it makes a lot of sense to add one more server and simply turn the local server into a backup server, probably along with another one. For example, look at the following block:

http {
    upstream app {
        server 192.168.0.5:8081 weight=2;
        server 192.168.0.4:8081;

        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ...
        }
    }
}

Setting up “resolve” for Movable Servers

So far we have dealt with IP addresses, which makes for a rigid setup. As you scale, you tend to move your servers a lot, and it is impossible to keep the same IP address in every location. Only a few service providers allow it; honestly, UpCloud is the only one I know of that does. With any other cloud provider, you have to use a block like the following, and even that isn’t enough: you have to wait at least 48 hours before you burn down the old server, or use a local DNS server that updates the domain quickly.

http {
    # Google DNS, but a local DNS server gives quicker updates
    resolver 8.8.8.8;
    upstream app {
        server us1.webapplicationconsultant.com:8081 weight=2 resolve;
        server us2.webapplicationconsultant.com:8081 resolve;

        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ...
        }
    }
}

Session Affinity

The first rule of scalability is to have a common session store; you can do it using Redis in a master-master configuration. However, if for some reason you are not doing that, session affinity becomes very important.
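
For the shared-store route, here is a minimal sketch in PHP, assuming the phpredis extension is installed (the host and port are illustrative):

    <?php
    // Every app server points its session handler at the same Redis,
    // so any server behind the proxy can resume any user's session.
    ini_set('session.save_handler', 'redis');
    ini_set('session.save_path', 'tcp://127.0.0.1:6379');

    session_start();
    $_SESSION['user_id'] = 42;  // now visible from every app server

If you cannot share sessions, Nginx’s sticky directive (a commercial NGINX Plus feature) pins each client to one upstream instead: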

http {
    # Google DNS, but a local DNS server gives quicker updates
    resolver 8.8.8.8;
    upstream app {
        server us1.webapplicationconsultant.com:8081 weight=2 resolve route=us1;
        server us2.webapplicationconsultant.com:8081 resolve route=us2;
        sticky cookie srv_id expires=1h domain=.webapplicationconsultant.com path=/;
        # srv_id = us1 or us2
        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ...
        }
    }
}

Other methods are “learn” and “route”, which will be discussed in a dedicated post about Session Affinity.


How to write a high-performance application in PHP?

We coded the Product Kits app and it worked pretty well; peak traffic was 5,000 hits per second. Read the story of Product Kits going from 100 to 100,000 hits per minute. We had a lot of issues, but scalability was not a PHP problem. We were able to handle everything; the problem only came when one worker had to know what the other workers were doing.

The first question that might come to your mind is whether we chose the right tech stack, since we all know PHP doesn’t support multi-threading. However, there is a trade-off between the speed of development and performance. PHP is not very fast; in fact, it is slow, but it is fast enough. As long as you don’t need individual PHP scripts to know each other’s state, you are in pretty good shape most of the time.

Trade-Off: Scalability vs. Speed

While using PHP, our major concern was RAM; it was much easier to run up RAM usage than CPU. We had to deal with a lot of data, and that data either stays in RAM or, if we kept it outside, cost us an extra hit to fetch. If your PHP code uses a lot of RAM, you will have a scalability problem to solve. However, if your app doesn’t require a lot of RAM, it is better to optimize it for speed.

Writing the Right Code:

  1. Rely on always-running PHP code – if a worker is written in PHP, tie it into an infinite loop that waits for an event (a queue message, a MySQL entry) instead of invoking PHP every second or so (see the sketch after this list).
  2. Cache sooner – there are a couple of cache options in PHP, such as OPcache and Memcached, but Redis is our favourite; it can further help you scale with multiple masters or other topologies. A combination of OPcache and Redis is best.
  3. Load fewer classes – ensure that you are not loading a lot of classes; rely on dynamic autoloading. This increases speed and reduces memory.
  4. Keep overwriting variables – this is a pretty bad practice in general, but it keeps your memory footprint bounded.
  5. Make smaller blocks – heavy code or multiple function calls inside a loop are your sworn enemy. It is better to write multiple loops of a few smaller blocks than one large block.
  6. Use JSON instead of XML – JSON is the newer standard and takes less memory.
  7. Use classes – obvious, but keeping functions inside classes makes them less of a memory hog, as long as you load classes only when needed.
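
A minimal sketch of point 1, assuming the phpredis extension and a hypothetical ‘jobs’ queue (the queue name and job format are illustrative):

    <?php
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    while (true) {
        // blPop blocks for up to 5 seconds waiting for a job, so the worker
        // idles cheaply instead of being re-invoked by cron every second.
        $job = $redis->blPop(['jobs'], 5);
        if (empty($job)) {
            continue;                // timed out with no work; wait again
        }
        [, $payload] = $job;         // blPop returns [queueName, value]
        $data = json_decode($payload, true);
        // ... process $data ...
    }

The loop pays PHP’s startup cost (autoloading, connections) once, instead of on every invocation.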

Micro-Optimization of Your Code:

These optimizations are not something you should retrofit after development, as they have very little effect then. However, it is good practice to follow them right from the beginning (a short illustration follows the list).

  1. Promote ‘static’ – this alone can make such calls up to 3x faster.
  2. Use single quotes (') as long as there is no variable inside – double-quoted strings are scanned for interpolation.
  3. Use str_replace instead of preg_replace when a plain substitution will do.
  4. Use ‘===’ instead of ‘==’ to avoid type juggling.
  5. Use ‘isset’ instead of count/sizeof to check for existence.
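
A few of these points in a quick sketch (the exact speed-ups will vary by workload):

    <?php
    // 2. Single quotes: no interpolation scan is needed.
    $name         = 'kits';
    $interpolated = "app-$name";   // double quotes only when interpolating
    $plain        = 'app-kits';    // plain literals belong in single quotes

    // 3. str_replace beats the regex engine for plain substitutions.
    $slug = str_replace(' ', '-', 'product kits app');

    // 4. Strict comparison avoids type juggling.
    var_dump('0' == false);        // bool(true)  - juggled
    var_dump('0' === false);       // bool(false) - strict

    // 5. isset() is a language construct and short-circuits cheaply.
    $items = [];
    if (isset($items[0])) { /* ... */ }   // cheaper than count($items) > 0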

How to make your app ready for scalability?

Most apps are built to fail, meaning that the app was developed half-heartedly and not architected well enough to scale. Ask yourself: did you build your app to fail? The problem with success is scalability; if you can’t scale, you are bound to fail.

1. Divide Everything

Here is a list of things you can divide. If you have more, please put them in the comments:

  1. Multiple users can have separate databases assigned (see the sketch after this list).
  2. Credentials are authenticated by a separate service.
  3. Outsource the background jobs to a different server.
  4. Have multiple queues.
  5. Have at least two masters.
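
A minimal sketch of point 1, per-user database routing with a hypothetical modulo shard map (hosts and credentials are illustrative):

    <?php
    // Pick the database server for a user; a real setup would use a
    // lookup table or consistent hashing instead of plain modulo.
    function connectionFor(int $userId): PDO
    {
        $shards = ['db1.internal', 'db2.internal'];
        $host   = $shards[$userId % count($shards)];
        return new PDO("mysql:host=$host;dbname=app", 'app_user', 'secret');
    }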

2. Isolate and back up every service.

Putting small services on their own small servers can help you prevent death by hardware failure. Consider an email-sending service: you can easily have two or three providers, and if one is down, you immediately switch to the next, as sketched below. Similarly, if your backups rely on one slave, make them work with another one when it goes down.
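
A minimal sketch of such a switch, with hypothetical provider callables (real code would use your providers’ actual SDKs):

    <?php
    // Try each provider in order; return as soon as one succeeds.
    function sendWithFailover(string $to, string $body, array $providers): bool
    {
        foreach ($providers as $send) {
            try {
                $send($to, $body);   // each provider is a callable
                return true;
            } catch (Exception $e) {
                error_log('mail provider failed: ' . $e->getMessage());
            }
        }
        return false;                // every provider is down
    }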

3. Don’t just switch; resurrect the services.

We had three geolocation services on three separate servers. One day, none of them was working. In the logs, we found that two had crashed weeks earlier because of RAM usage: one of our developers had run a nasty bulk check on 20M records. All they needed was a service start command. So ensuring that a service stays started (for example, with a process supervisor that restarts it) can actually solve the “fools’ development” problem.

4. Proxy is the new God

There are a lot of proxy solutions for every kind of service; place them in front of everything you are running. These proxies serve two purposes:

  1. Switching the services when dead!
  2. Limiting the number of connections

Proxies have their own pool of connections; hence, even if your application hits the database with 200 connections, going through a proxy service can bring that down to as few as 20. Some of the proxy solutions we have used:

5. Monitoring Services

We are big fans of Prometheus and Grafana. While Prometheus exporters expose all kinds of data, Grafana can be used to visualize it beautifully and send alerts.

6. (Bonus) – Attitude

Every app must be developed with a TDD/BDD approach, and attempts must be made to tune everything you have. It is far better to run an optimized query than to throw hardware at a database. The attitude of your development team matters most. So the first step of scalability is, in fact, to make sure you hire the Good and fire the Bad.
