3 Coding Mistakes That Lead to Unscalability

Every project starts small, and ours did too. Before we knew it, it became a hit. I am talking about the Product Kits app for Shopify. You can read more about how we scaled from 100 to 100,000 hits in our Shopify App. It was a great learning experience: we realized how trivial things can lead to a big mess. Luckily, almost everything was caught on the first incident because we had proper logging.

Unfamiliarity with Race Conditions

If you have a lot of traffic, two simultaneous operations know nothing about each other’s state unless you code for it. This can cause either duplicate data (and the fallout of duplicate data) or operations that are simply ignored. For example, Laravel’s Eloquent has firstOrCreate: it looks for a record matching a specific condition and creates one if none is found. Meanwhile, Shopify was sending the same webhook multiple times. Imagine the agony we faced once we had duplicate data. If a query uses ‘group by’ and then ‘sort’, ASC picks the first of the duplicates and DESC picks the last. Hence, an operation might behave differently for each duplicate, leading to a mess. This happened because, between one webhook’s SELECT and its INSERT, the SELECT of another webhook ran and found no result. To avoid it, we used a ‘locking’ SELECT: SELECT ... FOR UPDATE.
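The idea behind the locking read can be sketched in a few lines of Python. This is only an in-process analogy, not the Laravel or MySQL code we ran: the WebhookStore class, its first_or_create method, and the webhook id are hypothetical names, and a threading.Lock stands in for the lock that SELECT ... FOR UPDATE takes.

```python
import threading


class WebhookStore:
    """In-memory stand-in for a table (hypothetical, for illustration only)."""

    def __init__(self):
        self.rows = []                 # plays the role of the table
        self._lock = threading.Lock()  # plays the role of the database lock

    def first_or_create(self, webhook_id):
        # Holding the lock makes "look up, then insert" a single atomic step,
        # so no other handler can run its lookup between our lookup and insert.
        with self._lock:
            for row in self.rows:
                if row["webhook_id"] == webhook_id:
                    return row, False
            row = {"webhook_id": webhook_id}
            self.rows.append(row)
            return row, True


def deliver_duplicates(store, webhook_id, copies):
    """Simulate the same webhook arriving `copies` times at once."""
    created_flags = []
    flags_lock = threading.Lock()

    def handler():
        _, created = store.first_or_create(webhook_id)
        with flags_lock:
            created_flags.append(created)

    threads = [threading.Thread(target=handler) for _ in range(copies)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return created_flags


store = WebhookStore()
flags = deliver_duplicates(store, "order-1001", copies=8)
print(len(store.rows), flags.count(True))
```

However many copies of the webhook arrive, only one handler creates the row; the rest find it. Without the lock, two handlers could both see "no row" and both insert.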

Unoptimized Migrations

If you are upgrading your app, you probably need migrations. Sometimes these migrations require you to operate on data already present in your database, or to derive values from other data in the table. For example, we were changing 1:N relations to N:M relations, so a new table had to be created to hold the relations, and so on. In a local environment everything runs in less than a second, right? You (usually) don’t have 200GB of data locally, and in staging you don’t have to keep the exact same data either. Now imagine what unoptimized code can do to 200GB of data. For starters, it will choke the RAM, or simply return an error, if you load all the data in one go. If you iterate over the rows, it might take hours. We wrote a procedure to do it inside MySQL, without needing to pull the data into PHP and write it back.
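To illustrate the "do it inside the database" idea, here is a minimal sketch using SQLite from Python’s standard library as a stand-in for MySQL. Our actual procedure is not reproduced here; the table names (kits, products, kit_product) and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old 1:N shape: each product points at exactly one kit.
    CREATE TABLE kits (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, kit_id INTEGER);
    INSERT INTO kits VALUES (1, 'starter'), (2, 'pro');
    INSERT INTO products VALUES (10, 'cable', 1), (11, 'case', 1),
                                (12, 'dock', 2), (13, 'loose item', NULL);

    -- New N:M shape: a join table holds the relations.
    CREATE TABLE kit_product (
        kit_id     INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        PRIMARY KEY (kit_id, product_id)
    );
""")

# One set-based statement: the rows are copied inside the database engine
# and are never loaded into application memory.
with conn:
    conn.execute("""
        INSERT INTO kit_product (kit_id, product_id)
        SELECT kit_id, id FROM products WHERE kit_id IS NOT NULL
    """)

pairs = conn.execute(
    "SELECT kit_id, product_id FROM kit_product ORDER BY kit_id, product_id"
).fetchall()
print(pairs)  # [(1, 10), (1, 11), (2, 12)]
```

The application only sends one statement and reads back a summary, so memory use on the application side does not grow with the size of the table.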

Choosing the Hammer

Just as not all languages work the same way, not all frameworks are the same. Yii comes with a fantastic built-in cache with invalidation rules, but Laravel, the most used framework, is missing it. PHP doesn’t have native threads or bigint; Ruby and Golang do. So don’t just pick the hammer; have a toolbox instead. Use Docker to combine all of your tools for scalability.

Conclusion

Unfortunately, we take a lot of things for granted when we write the first line of code. The worst thing that can happen to you is that you become successful and can’t handle it well. I hope you will not make these mistakes in your next project. I wish you good luck!


Nginx Proxy Caching for Scalability

Since our servers are spread across multiple locations, we had a lot of speed issues. If a request is served from a server in a different location, outside the local network, there is a latency of about 500ms to 750ms. This is a lot, and it is unavoidable if you are running maintenance on the local servers and have configured load balancing using Nginx.

By default, caching is off, so every request for a resource goes to the proxied server, which causes a lot of latency. Nginx’s cache is so advanced that you can tweak it for almost every use case.

Generic configuration for any proxy cache

Storage, validity, invalidation, and conditions are the basic requirements of any proxy cache.

Consider the following configuration:

http {
    proxy_cache_path  /data/nginx/cache  levels=1:2    keys_zone=SCALE:10m inactive=1h  max_size=1g manager_files=20 manager_sleep=1000;
    server {
        location / {
            proxy_cache            SCALE;
            proxy_pass             http://1.2.3.4;
            proxy_set_header       Host $host;
            proxy_cache_min_uses   10;
            proxy_cache_valid      200  20m;
            proxy_cache_valid      401  1m;
            proxy_cache_revalidate on;
            proxy_cache_use_stale  error timeout invalid_header updating
                                   http_500 http_502 http_503 http_504;
        }
    }
}

Configuration of proxy_cache_path for scalability

The cache directory is defined as a ‘zone’ with proxy_cache_path. A response is first written to a temporary file and then renamed, which avoids serving a ‘partial’ response. A special cache manager process deletes cached files that have not been accessed for one hour, as specified by inactive=1h. To be less CPU intensive, manager_files is set to 20, so that each iteration deletes at most 20 files instead of the default 100. Similarly, manager_sleep is increased to 1000 milliseconds from the default of 50 milliseconds, so the manager pauses for 1 second before the next cycle of handling inactive files. Tweaking loader_files, loader_threshold, and loader_sleep is generally not necessary; the defaults are good enough.

Please note that using proxy_pass with an IP address as above isn’t recommended. For more detail, please see the guide on using Nginx Reverse Proxy for Scalability.
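The levels=1:2 parameter in the configuration decides how cached files are spread over subdirectories. As documented for proxy_cache_path, the file name is the MD5 of the cache key, and the directory names are taken from the end of that hash. A small sketch of the layout follows; cache_file_path is a helper written for this post, and the example key assumes the default proxy_cache_key of $scheme$proxy_host$request_uri.

```python
import hashlib


def cache_file_path(root, cache_key, levels=(1, 2)):
    """Return where a response with this cache key would be stored on disk."""
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    parts = []
    end = len(digest)
    # Each level takes its characters from the end of the hash, moving left.
    for width in levels:
        parts.append(digest[end - width:end])
        end -= width
    return "/".join([root.rstrip("/"), *parts, digest])


# Assumed key: scheme "http", proxied host "1.2.3.4", URI "/index.html".
path = cache_file_path("/data/nginx/cache", "http1.2.3.4/index.html")
print(path)
```

With levels=1:2, the first directory is the last character of the hash and the second directory is the two characters before it, which keeps any single directory from growing too large.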

Configuring proxy_cache_min_uses for scalability

proxy_cache_min_uses sets the minimum number of times a resource must be requested before it is cached. Obviously, you don’t want rarely requested resources to be cached, so we increased it to 10 in our case. This can be different for you; you might want a lower or higher value.
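The effect of the threshold can be modeled with a toy cache. This is a simplified sketch, not how Nginx is implemented: MinUsesCache and its fetch callback are hypothetical, and the threshold is lowered to 3 to keep the example short.

```python
from collections import Counter


class MinUsesCache:
    """Toy cache that stores a response only after `min_uses` requests."""

    def __init__(self, min_uses, fetch):
        self.min_uses = min_uses
        self.fetch = fetch      # callable that asks the proxied server
        self.seen = Counter()   # how often each key has been requested
        self.store = {}         # cached responses

    def get(self, key):
        if key in self.store:
            return self.store[key], "HIT"
        self.seen[key] += 1
        body = self.fetch(key)
        # Cache the response once the key has been requested often enough.
        if self.seen[key] >= self.min_uses:
            self.store[key] = body
        return body, "MISS"


upstream_calls = []


def fetch(key):
    upstream_calls.append(key)
    return "body of " + key


cache = MinUsesCache(min_uses=3, fetch=fetch)
statuses = [cache.get("/popular.css")[1] for _ in range(4)]
print(statuses, len(upstream_calls))  # ['MISS', 'MISS', 'MISS', 'HIT'] 3
```

A resource requested only once or twice never occupies cache space, while a popular one stops hitting the proxied server after the threshold is reached.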

Configuring proxy_cache_revalidate for scalability

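By default, proxy_cache_revalidate is off. Turning it on makes Nginx revalidate expired cache items with conditional requests, using the If-Modified-Since and If-None-Match (ETag) headers just like a browser does, so an unchanged resource is not downloaded again from the proxied server.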

Conclusion

Nginx is extremely powerful, but to use it as a caching reverse proxy, not only must a cache zone be configured, some of the default values must be tweaked as well.


Nginx Reverse Proxy for Scalability

Nginx comes with a wonderful reverse proxy with tons of options. But the usual way of proxying is flawed in the sense that it doesn’t allow load balancing. For example, consider this one:

Usual way of Reverse Proxy

    location / {
        try_files $uri @app;
    }
    location @app {
        proxy_pass http://127.0.0.1:8081;
        ...
    }

All requests in location / go to http://127.0.0.1:8081, but once you have outgrown the local server and need an additional server, you have to make a lot of changes. However, Nginx comes with ‘upstream’, which makes the setup more manageable and less change-prone as you add servers, as shown below.

Better Reverse Proxy

http {
    upstream app{
        server 127.0.0.1:8081;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

With this approach, you have a proxy running just like before, but if you want to add a server, it is as easy as the following:

http {
    upstream app{
        server 127.0.0.1:8081;
        server 192.168.0.2:8081;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Weighing Server

The local server, the one at 127.0.0.1:8081, might have a lot going on. For example, each application has many services, and they all run on a single server, at least in the beginning. It is therefore important that this server receives less traffic than the others. To do that, you just need to add ‘weight’. The default weight is 1, so with weight=5 on the second server, it receives five out of every six requests.

http {
    upstream app{
        server 127.0.0.1:8081;
        server 192.168.0.2:8081 weight=5;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}
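To see how the weights translate into traffic, here is a small simulation of smooth weighted round-robin, the kind of weighted selection Nginx’s default balancing method is based on. It is a simplified sketch that ignores failures and connection limits; the function name is made up for the example.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, requests):
    """Return the server picked for each of `requests` consecutive requests."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(requests):
        # Every server gains its weight, the leader is picked, and the
        # leader then pays back the total so the others can catch up.
        for server, weight in weights.items():
            current[server] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Same servers and weights as in the block above (default weight is 1).
weights = {"127.0.0.1:8081": 1, "192.168.0.2:8081": 5}
counts = Counter(smooth_weighted_round_robin(weights, 600))
print(counts["127.0.0.1:8081"], counts["192.168.0.2:8081"])  # 100 500
```

The local server ends up with one sixth of the requests, and its turns are spread out rather than bunched together.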

Making the initial server a “Backup”

As stated above, you probably have a lot going on in the initial server. Hence, it makes a lot of sense to add one more server and simply turn the local server into a backup server, probably along with another server. Backup servers receive requests only when the primary servers are unavailable. For example, look at the following block:

http {
    upstream app{
        server 192.168.0.5:8081 weight=2;
        server 192.168.0.4:8081;

        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Set up “Resolve” for Movable Servers

So far we have dealt with IP addresses, which makes for a rigid setup. As you scale, you tend to move your servers a lot, and it is impossible to keep the same IP address in every location. Only a few service providers allow it; honestly, I only know of “upcloud” that does. With any other cloud provider, you have to come up with the following block, and even that isn’t enough: you have to wait at least 48 hours before you burn down the old server, or use a local DNS server that updates the domain quickly. Note that the resolve parameter was for a long time available only in the commercial NGINX Plus build, so check that your Nginx version supports it.

http {
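    # Google's DNS; a local DNS server can be used for quicker updates
    resolver 8.8.8.8;
    upstream app{
        # "resolve" needs the group in shared memory; name and size are assumptions
        zone app 64k;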
        server us1.webapplicationconsultant.com:8081 weight=2 resolve;
        server us2.webapplicationconsultant.com:8081 resolve;

        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Session Affinity

The first rule of scalability is to have a common session handler; you can do that using Redis in a master-master configuration. However, if for some reason you are not using one, session affinity becomes very important.

http {
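    # Google's DNS; a local DNS server can be used for quicker updates
    resolver 8.8.8.8;
    upstream app{
        # "resolve" needs the group in shared memory; name and size are assumptions
        zone app 64k;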
        server us1.webapplicationconsultant.com:8081 weight=2 resolve route=us1;
        server us2.webapplicationconsultant.com:8081 resolve route=us2;
        sticky cookie srv_id expires=1h domain=.webapplicationconsultant.com path=/;
        # srv_id = us1 or us2 
        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Other methods are “learn” and “route”, which will be discussed in a dedicated post about session affinity. Keep in mind that the sticky directive is part of the commercial NGINX Plus subscription.
