Containerize your first Go app using Docker

Does size matter?

It does, especially in production. Most orchestration solutions involve rolling over from one Docker image to another, and a simple up command that only has to download ~10MB will obviously save a few seconds. But it is not just about speed: disk space on production servers is limited, and large images lead to regular cleanups, which are usually manual and tedious tasks. So, from a scalability point of view, a multi-stage build is preferred over the usual single-stage approach.

The Multi-Stage Build Approach

The usual approach is quite poor because it ends up with a large Docker image. The multi-stage approach splits the build into two steps, as follows:

The Build:

In this step you use a full Go image to compile your application. On its own, this step would result in an insanely sized Docker image of roughly 200MB to 300MB.

Copy Build

In this step you copy only the compiled binary from the first step, which keeps the final Docker image small. I bet you will end up with an image of no more than 10MB.

Example:

# The Build
FROM golang:alpine AS the-build
ADD . /src
RUN cd /src && go build -o app

# Copy Build
FROM alpine
WORKDIR /app
COPY --from=the-build /src/app /app/
ENTRYPOINT ./app

Here the first step of the build is named ‘the-build’. Docker supports partial builds in a multi-stage process, so if you would like to build only up to ‘the-build’, you can do it with:

docker build --target the-build -t github/go-app:latest .

If you omit --target, you can build the whole image in a single command, as follows:

docker build -t github/go-app:latest .
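
Once it is built, a quick sanity check might look like this (assuming the image tag used above):

# Run the final image and confirm it is only a few MB in size
docker run --rm github/go-app:latest
docker images github/go-app:latest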


The Stateless Development Machine

Are you spending 2-3 hours, and sometimes a whole day, getting a new laptop ready for development? I used to do it with Ansible and be happy about it. With Docker, it has become even easier. I have the following in my .zshrc and it works pretty well.

The .zshrc aliases

alias npm='docker run -ti --rm -v $(pwd):/src:rw -e "PUID=$UID" -e "PGID=$GID" mkenney/npm:latest npm'
alias gulp='docker run -ti --rm -v $(pwd):/src:rw -e "PUID=$UID" -e "PGID=$GID" mkenney/npm:latest gulp'
alias node='docker run -ti --rm -v $(pwd):/src:rw -e "PUID=$UID" -e "PGID=$GID" mkenney/npm:latest node'
alias bower='docker run -ti --rm -v $(pwd):/src:rw -e "PUID=$UID" -e "PGID=$GID" mkenney/npm:latest bower'
alias bundle='docker run -ti --rm -v $(pwd):/src:rw -e "PUID=$UID" -e "PGID=$GID" rails:latest bundle'
alias composer='docker run -ti --rm -v $(pwd):/app:rw -e "PUID=$UID" -e "PGID=$GID" composer:latest composer'
alias go='docker run -ti --rm  -w /usr/src/myapp -v $(pwd):/usr/src/myapp:rw -e "PUID=$UID" -e "PGID=$GID" golang:latest go'
alias nghttp='docker run --rm -it dajobe/nghttpx nghttp'
# Open a shell inside a running container: dbash <container-name-or-id>
function dbash() {
    docker exec -it "$1" /bin/bash
}
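
With these in place, the tools run inside throwaway containers while operating on the files in your current directory. For example (the project path and container name below are hypothetical):

cd ~/projects/my-frontend    # any project directory; it is mounted into the container
node --version               # runs node inside the mkenney/npm image
npm install                  # node_modules end up in the mounted $(pwd)
dbash my-running-container   # open a shell in an already running container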

Understanding

Obviously, the list can go on, but the idea is to keep your system stateless. With just a backup of your dotfiles, you can move to a new machine with no trouble. Nowadays everything is in Git already!


Meet SVIM – Dockerized VIM

Retired

SVIM is retired in favor of SpaceBox.

The Frustration

It is very annoying to find that you have to install VIM on every system. VIM is very popular, but it becomes really messy to keep the same setup across all your systems: you keep building plugins and setting and resetting shortcuts.

The SVIM – Pronounced as “swim”

SVIM is designed to be portable and is based on amix/vimrc, which is already a standard for more than 80% of VIM enthusiasts. SVIM understands Git as well as grep (via FlyGrep).

Shortcuts:

All the shortcuts are derived from the amix/vimrc extended version, other than the few mentioned below.

To use Git

The project base directory must be the mount point, which happens by default if you run SVIM from that directory.

Git Shortcuts

nmap ]h <Plug>GitGutterNextHunk
nmap [h <Plug>GitGutterPrevHunk
nmap ]s <Plug>GitGutterStageHunk
nmap ]u <Plug>GitGutterUndoHunk

FlyGrep Shortcuts

nnoremap <Space>s/ :FlyGrep<cr>

How to use SVIM?

alias svim='docker run -ti -e TERM=xterm -e GIT_USERNAME="You True" -e GIT_EMAIL="you@getyourdatasold"  --rm -v $(pwd):/home/developer/workspace varunbatrait/svim'
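
With the alias in place, using it feels just like a local vim (the project path below is hypothetical):

cd ~/projects/my-service    # $(pwd) is mounted as the workspace inside the container
svim main.go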

Takeaways:

  1. Portable
  2. Git enabled
  3. Visible hidden characters


Lossless Image Compression using Docker

I tried a couple of existing Docker images for compressing images, without success. Both suffered from severe security issues because they were ‘very old’. The only complete tool was zevilz/zImageOptimizer, and it did not have a Docker image either (I have sent a pull request), meaning you had to install all of its dependencies yourself.

I turned it into a Docker image, available as varunbatrait/zimageoptimizer.

My primary requirement was to shrink images every week or fortnight on a few blogs, as well as photos shot by my camera. This Docker image is ideal for that.

It supports cron, and compressing the images you serve on the web saves bandwidth and thus contributes to scalability.

Supported Formats

  1. JPEG
  2. PNG
  3. GIF

How to use?

There are two ways to do it:

Maintaining the marker

The marker is just a file holding the timestamp of the last run. If new images have been added since then, zImageOptimizer will consider only the new images.

docker run -it -u "$UID:$GID" -d --volume /mnt/ImagesHundred/marker:/work/marker --volume /mnt/ImagesHundred/images/:/work/images/ -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro varunbatrait/zimageoptimizer ./zImageOptimizer.sh -p /work/images/ -q -n -m /work/marker/marker

Not Maintaining the marker

docker run -u "$UID:$GID" --volume /path/to/images:/work/images -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro  varunbatrait/zimageoptimizer

Takeaways:

  1. Images are losslessly compressed – no quality loss.
  2. You don’t have to install dependencies on every server. It is in docker.
  3. You can use it with cron (see the sample crontab entry below).
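
A weekly crontab entry might look like this (the schedule and paths are only an illustration, reusing the marker setup shown above):

# Every Sunday at 03:00, compress any images added since the last run
0 3 * * 0 docker run --rm -u "$(id -u):$(id -g)" --volume /mnt/ImagesHundred/marker:/work/marker --volume /mnt/ImagesHundred/images/:/work/images/ -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro varunbatrait/zimageoptimizer ./zImageOptimizer.sh -p /work/images/ -q -n -m /work/marker/marker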

Pain with PNGs

Please note that PNG images can take significant time (15-25 seconds per image) and CPU (almost 100%). Just stay calm! 🙂


3 Coding Mistakes That Lead to Unscalability

Every project starts small; ours did too. Before we knew it, it became a hit. I am talking about our Product Kits app for Shopify. You can read more about how we scaled from 100 to 100,000 hits in our Shopify app. It was a great learning experience: we realized how trivial things can lead to a big mess, but luckily everything was caught almost on the first incident because we had proper logging.

Unfamiliarity with Race Conditions

If you have a lot of traffic, two simultaneous operations know nothing about each other's state unless you code for it. This can cause duplicate data (and the fallout that comes with it), or operations that are simply ignored. For example, Laravel's Eloquent has firstOrCreate: it checks whether a record matching a condition exists and creates it if not, and Shopify was sending the same webhook multiple times. Imagine the agony we faced when we ended up with duplicate data. If you have ‘group by’ in the query and then use ‘sort’, ASC will pick the first record of a duplicate and DESC will pick the last, so an operation may behave differently depending on which duplicate it sees, leading to a mess. This was happening because between the SELECT and the INSERT of one webhook, the SELECT of another webhook ran and found no result. To avoid it, we used a locking SELECT ... FOR UPDATE.
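
In plain SQL the idea looks roughly like this (the table and column names are made up for illustration; in Laravel the rough equivalent is wrapping the check in a transaction with a lockForUpdate() query):

-- sketch only: 'kits' and its columns are hypothetical
START TRANSACTION;

-- lock the matching row (or the gap where it would be) so a
-- concurrent webhook handler blocks here instead of racing us
SELECT id FROM kits WHERE shopify_product_id = 12345 FOR UPDATE;

-- the application inserts only if the SELECT above returned no row
INSERT INTO kits (shopify_product_id, title) VALUES (12345, 'Sample Kit');

COMMIT;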

Unoptimized Migrations

If you are upgrading your app, you probably need migrations. Sometimes these migrations require you to operate on data already present in your database, or the new values are derived from other data in the table. For example, we were changing a 1:N relation to an N:M relation, so a new table had to be created to hold the relations, and so on. In a local environment everything runs in less than a second, because you don't have 200GB of data locally (usually), and in staging you don't keep exactly the same data either. Now imagine what unoptimized code can do to 200GB of data. For starters, it will choke the RAM, or simply return an error if you try to load all the data in one go. If you use iteration instead, it might take hours. We wrote a stored procedure to do it inside MySQL, without needing to pull the data into PHP and write it back.
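
As a rough sketch (with hypothetical table names, not our actual schema), such a 1:N to N:M backfill can stay entirely inside MySQL as one set-based statement instead of looping over rows in PHP:

-- books previously referenced a single author directly (1:N);
-- book_author is the new pivot table for the N:M relation
INSERT INTO book_author (book_id, author_id)
SELECT id, author_id
  FROM books
 WHERE author_id IS NOT NULL;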

Choosing the Hammer

Just like not all languages work the same way, not all frameworks are the same. Yii comes with a fantastic built-in cache with invalidation rules, but Laravel, the most used framework, is missing it. PHP doesn't have threads or bigint; Ruby and Golang do. So don't just pick the hammer, have a toolbox instead. Use Docker to combine all of your tools and scale them.

Conclusion

Unfortunately, we take a lot of things for granted when we write the first line of code. The worst thing that can happen to you is that you become successful and can't handle it well. I hope you will not make these mistakes in your next project. I wish you good luck!


Passing Real IP in WordPress behind Proxy or in Docker

If you have followed the tutorial on How to run a WordPress Blog behind an Nginx Secure (https) Proxy, you might be in a situation where WordPress shows all IPs as the proxy IP. In the case of Docker it will look like 172.X.X.X; otherwise, it is the IP of your server.

Is this a problem?

You might be wondering whether this is worth solving. Well, yes! Most of the real comments were being categorized as spam.

Adding Real-IP to WordPress

Step 1 – Editing WordPress config

In the wp-config.php file, add the following lines just above /* That's all, stop editing! Happy blogging. */

// Use X-Forwarded-For HTTP Header to Get Visitor's Real IP Address
if ( isset( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) {
  $http_x_headers = explode( ',', $_SERVER['HTTP_X_FORWARDED_FOR'] );
  $_SERVER['REMOTE_ADDR'] = $http_x_headers[0];
}
/* That's all, stop editing! Happy blogging. */

Step 2 – Editing Nginx

Inside your proxy settings in Nginx, simply add this (PHP exposes the X-Forwarded-For header as $_SERVER['HTTP_X_FORWARDED_FOR'], so the header must be named X-Forwarded-For):

proxy_set_header        X-Forwarded-For       $remote_addr;

In case of WordPress Behind Docker

If you are running WordPress in Docker, you will need to copy wp-config.php out of the container, edit it, and copy it back. This can be done as follows.

#Copy from docker container
docker cp project_wordpress_1:/var/www/html/wp-config.php .

#Copy to docker container
docker cp wp-config.php project_wordpress_1:/var/www/html/wp-config.php

Easy-peasy, right?


Nginx Proxy Caching for Scalability

Since our servers are spread across multiple locations, we had a lot of issues with speed. If a request is served from a server in a different location, outside the local network, there is a latency of about 500ms to 750ms. That is a lot, and it is unavoidable if you are running maintenance on the local servers and have load balancing configured in Nginx.

By default caching is off, so every requested resource goes to the proxied server, which causes a lot of latency. The Nginx cache is advanced enough that you can tweak it for almost every use case.

Generic configuration of any proxy cache

Storage, validity, invalidation, and conditions are the basic requirements of any proxy cache.

Imagine the following configuration:

http {
    proxy_cache_path  /data/nginx/cache  levels=1:2    keys_zone=SCALE:10m inactive=1h  max_size=1g manager_files=20 manager_sleep=1000;
    server {
        location / {
            proxy_cache            SCALE;
            proxy_pass             http://1.2.3.4;
            proxy_set_header       Host $host;
            proxy_cache_min_uses   10;
            proxy_cache_valid      200  20m;
            proxy_cache_valid      401  1m;
            proxy_cache_revalidate on;
            proxy_cache_use_stale  error timeout invalid_header updating
                                   http_500 http_502 http_503 http_504;
        }
    }
}

Configuring proxy_cache_path for scalability

The cache directory is defined as a ‘zone’ with proxy_cache_path. Cached responses are written to temporary files and then renamed into place, which avoids serving partially written entries. A cache manager process deletes cached files that have not been accessed for one hour, as specified by inactive=1h. To be less CPU intensive, manager_files is set to 20 so that each iteration removes at most 20 inactive files instead of the default 100, and manager_sleep is increased to 1000 so the manager sleeps for 1 second between iterations, longer than the default. Tweaking loader_files, loader_threshold and loader_sleep is generally not necessary; the defaults are good enough.

Please note that using proxy_pass with a bare IP as above isn't recommended; for more detail, please see the guide on using an Nginx Reverse Proxy for Scalability.

Configuring proxy_cache_min_uses for scalability

proxy_cache_min_uses sets the minimum number of times a resource must be requested before it is cached. Obviously, you don't want rarely requested resources filling the cache, so we increased it to 10 in our case. The right value can be different for you; you might want it lower or higher.

Configuring proxy_cache_revalidate for scalability

By default proxy_cache_revalidate is off; turning it on makes Nginx refresh expired cache entries with conditional requests (If-Modified-Since / ETag) to the proxied server, much like a browser does.

Conclusion

Nginx is extremely powerful, but to use it as a caching reverse proxy you must not only configure a cache zone, you must also tweak some of the default values.

Worth Sharing?

Writing ‘straight’ code

The most prominent mistake made while coding is nesting function calls. For instance, look at the following code:

// bad code
function getBooks(){
  return getAuthors(books);
}
function getAuthors(books){
  books.authors = SomeQuery;
  return books;
}
function main(){
  let booksWithAuthor = getBooks();
}

The problem with the above code is that the calls are nested: when you call getBooks(), you are not aware that it will bring the authors as well. Let us try again by renaming the function.

// still bad code
function getBooksWithAuthors(){
 return getAuthors(books);
}
function getAuthors(books = []){
  books.authors = SomeQuery;
  return books
}
function main(){
 let booksWithAuthor = getBooksWithAuthors();
}

After the change, there is no longer a function that returns only the books (i.e. books without authors), and it is still sort of nested, right? Let's modify the code further and make the calls sequential.

// good code
function getBooks(){
  return books;
}
function getAuthors(books = []){
  return authors;
}
function getBooksWithAuthors(){
  books = getBooks();
  books.authors = getAuthors(books);
  return books;
}

The calls in getBooksWithAuthors are no longer nested; they are sequential calls that combine both sets of data, i.e. books and authors.

Advantages

  1. The code is more readable – you know that getBooksWithAuthors will get you both.
  2. The code leads to isolated functions – getBooks and getAuthors are isolated and can be called from any other function.
  3. The code has no side effects – the sequential calls ensure each function is called only where it is intended. For instance, if you want only books, you call getBooks().
  4. Improved unit test coverage – isolated functions are easier to cover with tests.


Nginx Reverse Proxy for Scalability

Nginx comes with a wonderful reverse proxy with tons of options. But the usual way of proxying is flawed in the sense that it doesn't allow load balancing. For example, consider this one:

Usual way of Reverse Proxy

    location / {
        try_files $uri @app;
    }
    location @app {
        proxy_pass http://127.0.0.1:8081;
        ...
    }

All requests to location / will go to http://127.0.0.1:8081, but once you have outgrown the local server and need an additional server, you have to make a lot of changes. However, Nginx provides an ‘upstream’ block which makes the setup more manageable and less change-prone as you add more servers, as shown below.

Better Reverse Proxy

http {
    upstream app{
        server 127.0.0.1:8081;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

With this approach, you have a proxy running just like before, but if you want to add a server, it is super easy:

http {
    upstream app{
        server 127.0.0.1:8081;
        server 192.168.0.2:8081;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Weighting Servers

Since the local server (the one at 127.0.0.1:8081) might have a lot going on, for example when every application service runs on that single server, at least in the beginning, it is important that this server receives less traffic than the others. To do that, you just need to add a ‘weight’:

http {
    upstream app{
        server 127.0.0.1:8081;
        server 192.168.0.2:8081 weight=5;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Making the Initial Server a “Backup”

As stated above, you probably have a lot going on on the initial server. Hence, it makes a lot of sense to add one more server and simply turn the local server into a backup server, probably along with another one. For example, look at the following block:

http {
    upstream app{
        server 192.168.0.5:8081 weight=2;
        server 192.168.0.4:8081;

        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Set up “resolve” for Movable Servers

So far we have dealt with IP addresses, which makes for a rigid setup. As you scale, you tend to move your servers around, and it is impossible to keep the same IP address in every location. Only a few service providers allow it; honestly, I only know of UpCloud. With any other cloud provider you end up with a block like the following, and even that isn't enough: you have to wait at least 48 hours before you burn down the old server, or use a local DNS server that picks up domain updates quickly.

http {
    #Google but can use local dns for quicker updates
    resolver 8.8.8.8; 
    upstream app{
        server us1.webapplicationconsultant.com:8081 weight=2 resolve;
        server us2.webapplicationconsultant.com:8081 resolve;

        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Session Affinity

The first rule of scalability is to have a shared session store; you can do it with Redis in a master-master configuration. If for some reason you are not doing that, session affinity becomes very important.

http {
    #Google but can use local dns for quicker updates
    resolver 8.8.8.8; 
    upstream app{
        server us1.webapplicationconsultant.com:8081 weight=2 resolve route=us1;
        server us2.webapplicationconsultant.com:8081 resolve route=us2;
        sticky cookie srv_id expires=1h domain=.webapplicationconsultant.com path=/;
        # srv_id = us1 or us2 
        server 127.0.0.1:8081 backup;
        server 192.168.0.2:8081 weight=5 backup;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app;
            ....
        }
    }
}

Other methods are “learn” and “route”, which will be discussed in a dedicated post about session affinity.


Choosing MySQL Memory Engine for Session/Caching

The MySQL MEMORY engine is one of the least popular yet most effective solutions for a performance-first application. Most Node application developers spin up Redis for session handling, while the same can be achieved with the MySQL MEMORY engine without the overhead of running a different technology (Redis, in this case).

More overhead means more ways to break

Using Redis for sessions/caching is often overkill if you already have MySQL in place.

Configuration:

The max_heap_table_size system variable sets the maximum size of MEMORY tables. As it is a dynamic variable, you can set it at runtime as follows:

SET max_heap_table_size = 1024*1024;
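
To make the value persist across restarts, you can also set it in the MySQL configuration file (the 64M below is just an illustrative value):

# my.cnf (or a file under conf.d/)
[mysqld]
max_heap_table_size = 64M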

Use cases:

  1. Non-Critical Read-Only and Read-Mostly Data
  2. Caching
  3. Session

An example of non-critical read-mostly data where we used the MySQL MEMORY engine, other than sessions and caching, was computation: storing intermediate results. To be more specific, we used it for pattern recognition on stock prices.
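
As a minimal sketch of the session use case (the columns and sizes are illustrative, not our actual schema):

-- data lives only in RAM and is lost on restart, which is acceptable for sessions
CREATE TABLE sessions (
    session_id VARCHAR(64)  NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    payload    VARCHAR(2048),               -- MEMORY does not support TEXT/BLOB columns
    updated_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (session_id),
    KEY idx_updated (updated_at) USING BTREE -- BTREE so stale sessions can be range-scanned and purged
) ENGINE=MEMORY;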

Limitations:

There are many limitations to the MEMORY engine; most of them are acceptable for the use cases listed above.

  1. No row-level locking
  2. No foreign keys
  3. No transactions
  4. No clustering (which limits scalability)
  5. No geospatial data or geospatial indexes
