3 Coding Mistakes to Unscalability

Every project starts small, and ours did too. Before we knew it, it became a hit. I am talking about our Product Kits app for Shopify. You can read more about how we scaled from 100 to 100,000 hits in our Shopify app. It was a great learning experience: we realized how trivial things can lead to a big mess, but luckily everything was caught almost on the first incident because we had proper logging.

Unfamiliarity with Race Conditions

If you have a lot of traffic, the state of your system is unknown between two simultaneous operations unless you explicitly code for it. This can cause duplicate data (and all its fallout), or operations can simply be ignored. For example, Laravel’s Eloquent has firstOrCreate: it checks whether a record matching a condition exists, and creates one if it does not. Meanwhile, Shopify was delivering the same webhook multiple times. Imagine the agony we faced with duplicate data: if a query uses GROUP BY and then ORDER BY, ASC returns the first of the duplicates and DESC returns the last, so the same operation can behave differently depending on which duplicate it sees, leading to a mess. The duplicates appeared because between one webhook’s SELECT and INSERT, the other webhook’s SELECT ran and found no result. To avoid it, we switched to a locking SELECT: SELECT ... FOR UPDATE.
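The same race can be reproduced outside Laravel. Here is a minimal sketch in Python with SQLite (table and column names are hypothetical; SQLite has no SELECT ... FOR UPDATE, so instead of a locking read this uses a unique constraint plus an atomic INSERT OR IGNORE, another common way to make get-or-create safe):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE webhooks (
        shop_id INTEGER NOT NULL,
        topic   TEXT NOT NULL,
        UNIQUE (shop_id, topic)  -- duplicates become impossible at the schema level
    )
""")

def first_or_create(conn, shop_id, topic):
    # A naive firstOrCreate does SELECT, then INSERT if nothing was found.
    # Two concurrent calls can both see "no row" and both insert.
    # INSERT OR IGNORE collapses the check and the write into one atomic
    # statement, so a duplicate delivery becomes a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO webhooks (shop_id, topic) VALUES (?, ?)",
        (shop_id, topic),
    )
    return conn.execute(
        "SELECT shop_id, topic FROM webhooks WHERE shop_id = ? AND topic = ?",
        (shop_id, topic),
    ).fetchone()

# Shopify-style duplicate delivery: the same webhook arrives twice.
first_or_create(conn, 42, "orders/create")
first_or_create(conn, 42, "orders/create")
count = conn.execute("SELECT COUNT(*) FROM webhooks").fetchone()[0]
print(count)  # 1 -- only one row despite the duplicate delivery
```

On MySQL with InnoDB, the equivalent is to run the SELECT ... FOR UPDATE and the INSERT inside one transaction, so the second webhook blocks until the first commits and then sees the row.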

Unoptimized Migrations

If you are upgrading your app, you probably need migrations. Sometimes these migrations require you to operate on data already present in your database, or the new values are derived from other data in the table. For example, we were changing 1:N relations to N:M relations, which meant creating a new table to hold the relations and populating it. In a local environment everything runs in under a second, right? You don’t have 200GB of data locally (usually), and in staging you don’t keep the exact same data either. Now imagine what truly unoptimized code does to 200GB of data. For starters, it will choke the RAM, or error out immediately if you load all the data in one go. If you iterate instead, it can take hours. We wrote a stored procedure that did the work inside MySQL, without pulling the data out into PHP and writing it back.
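The idea of pushing the work into the database can be sketched as follows (Python with SQLite, hypothetical table names; the post’s actual fix was a MySQL stored procedure, but the principle is the same: one set-based statement instead of a fetch-modify-write loop in application code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old 1:N shape: each product row points at a single kit.
    CREATE TABLE products (id INTEGER PRIMARY KEY, kit_id INTEGER);
    -- New N:M shape: a join table between kits and products.
    CREATE TABLE kit_product (kit_id INTEGER, product_id INTEGER);
""")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, i % 10) for i in range(1000)])

# Slow approach (not shown): pull every row into application memory and
# write it back one INSERT at a time -- fine locally, hours or an OOM on 200GB.
#
# Fast approach: a single set-based statement; the data never leaves the database.
conn.execute("""
    INSERT INTO kit_product (kit_id, product_id)
    SELECT kit_id, id FROM products WHERE kit_id IS NOT NULL
""")
rows = conn.execute("SELECT COUNT(*) FROM kit_product").fetchone()[0]
print(rows)  # 1000
```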

Choosing the Hammer

Just like not all languages work the same way, not all frameworks are the same. Yii comes with a fantastic built-in cache with invalidation rules, but Laravel, the most-used framework, is missing it. PHP doesn’t have threads or bigint; Ruby and Golang do. So don’t just pick the hammer, keep a toolbox instead, and use Docker to combine all of your tools for scalability.

Conclusion

Unfortunately, we take a lot of things for granted when we write the first line of code. The worst thing that can happen to you is to become successful and be unable to handle it. I hope you won’t make these mistakes in your next project. I wish you good luck!


13 thoughts on “3 Coding Mistakes to Unscalability”

  1. One mistake is the minor optimization we tend to skip. If you learn to do it from the start, it will make you look like a scientist, or a crazy person, after 5 years 🙂

  2. There was this client with 10K customers. We wanted to add a currency symbol to a table and provide settings for it. We migrated and created a new symbol column with NO DEFAULT value, so all prices showed up without a currency sign. IMAGINE THAT! Later we found that whoever tested the app had simply added a currency code and didn’t consider that ALL 10K customers would NOT add one. Luckily, as soon as it was spotted we rolled the migration back and then set the currency code in the migration itself.

  3. The most important thing here is to mind the memory. There is a sweet spot between the number of queries and the number of records you insert per query. Mind that!

  4. Dear Mr. Batra, by pairing the race condition with such a scary problem in Laravel, you have earned my respect, but I had no idea and have used it in all the places I can imagine. Should I be worried? I should rather switch off my mobile lol 😛

  5. You can save yourself plenty of pain simply by using a multiple-databases approach. If you have TBs of data, spread them across different databases, and release versions to separate databases.
