Hey Everyone,
Today we’ll be talking about
Questions? Please contact me at [email protected].
Quastor is a free Software Engineering newsletter that sends out summaries of technical blog posts, deep dives on interesting tech and FAANG interview questions and solutions.
Here’s a summary of Clean Code’s advice on writing readable functions.
This advice is geared towards functions written in an OOP language, although many of the concepts carry over to other programming paradigms.
The majority of your functions should be less than 15 lines long, and they should hardly ever be more than 20 lines long.
You should always be able to fit the entire function on your monitor. If you can’t then you probably need to break the function up into separate functions.
If you’re following this principle, it means that your function will not be large enough to hold several nested structures. You won’t be able to fit a loop, and then a nested if/else statement inside that loop, etc.
Instead, you should make those nested structures separate function calls. This also adds documentary value since those functions will have nice descriptive names that explain what they do.
Another way of stating Principle 1 is that your function should only do one thing.
A good heuristic to use is to see if you can extract some code from your function and turn that code into a new function with a name that is not just a restatement of its implementation.
If you can do that, then that suggests that your function is doing multiple things and you should break it down further.
Your programs can frequently be divided into layers of abstraction, where each layer represents a different model of the same information and processes, but in a different manner.
An example is if you’re developing a program to scrape a website, parse some data and then put that data in a database.
One layer of abstraction is the HTML string you get after sending an HTTP request to the website’s server.
You take that HTML string and send it to your second layer of abstraction, where an HTML parser converts that string into an object where the HTML page is represented as a nested data structure (read up on BeautifulSoup if you’d like a more concrete example). This is the second layer of abstraction.
The third layer of abstraction could be where you extract the specific data you need from the HTML page object and store that data in an array.
The fourth layer of abstraction would be where you write the data in that array to a database.
The statements in a function should all be at the same level of abstraction.
You should not be manipulating the HTML string object and writing to the database in the same function.
Instead, Clean Code suggests The Stepdown Rule, where your code can be read as a top-down narrative.
Every function should be followed by a function at the next layer of abstraction. This way, as you read the program, you’ll be descending down one layer of abstraction at a time as you go through the list of functions.
If you’re following the first three principles, then coming up with a name shouldn’t be too difficult.
The smaller and more focused your functions are, the easier it is to come up with a name.
Don’t be afraid to make your names long. A long, descriptive name is much better than a short, ambiguous name. Your editor’s autocomplete function should make it easy to write code with long function names.
Also, use a naming convention that allows multiple words to be easily read in function names. CamelCase is one example.
The ideal number of arguments for a function is zero. Having more than three arguments for a function should be avoided.
Having lots of function arguments makes it harder to read and understand the function since it forces you to know extra details (and those details are usually from a different layer of abstraction).
Additionally, having lots of function arguments makes it more difficult for you to write tests. You have to write all the test cases to ensure that all the various combinations of arguments work properly!
A Side Effect is where your function modifies some state variable outside of its local environment, in addition to returning a value (the intended effect).
An example might be a function that reads a certain value from a database and then returns that value.
If that same function also modifies state in some way (and that modification lasts after the function terminates), then that modification is a side effect.
Clean Code recommends you avoid side effects.
Your function should either change the state of an object(s), or it should return some information. A single function should not do both!
Your functions should rarely return error codes. Instead, if something goes wrong, you should throw an exception with a well-written error message.
One issue with error codes is that it leads to deeply nested structures. When you return an error code, the caller must deal with the error code immediately.
On the other hand, a caller can separate the exception handler logic from the other code (a try, catch statement with separate try logic and catch logic).
Quastor is a free Software Engineering newsletter that sends out summaries of technical blog posts, deep dives on interesting tech and FAANG interview questions and solutions.
Grab is the largest transportation and food delivery company in Southeast Asia with more than 25 million monthly users completing ~2 billion transactions per year.
One of the marketing features in the Grab app is to offer real-time rewards whenever a user takes a certain action (or series of actions).
For example, if a user uses the Grab app to get a ride to work in the morning, the app might immediately reward her with a 50% off ride reward that she can use in the evening for the ride back home.
Jie Zhang and Abdullah Al Mamum are two senior software engineers at Grab and they wrote a great blog post on how they process thousands of events every second to send out hundreds of millions of rewards monthly.
Here’s a summary
Grab runs growth campaigns where they’ll reward a user with discounts and perks if the user completes a certain set of actions. Over a typical month, they’ll send out ~500 million rewards and over 2.5 billion messages to their end-users.
Trident is the engine Grab engineers built to handle this workload. It’s an If This, Then That engine which allows Grab’s growth managers to create new promotional campaigns. If a user does this, then award that user with that.
Trident’s architecture was designed with the following goals
Whenever a customer uses one of Grab’s products the backend service associated with that product will publish an event to a specific Kafka stream.
Trident subscribes to all the events from these multiple Kafka streams and processes them. By utilizing Kafka streams, Trident is decoupled from the upstream backend services.
Kafka guarantees at-least-once message delivery and then Trident makes sure any duplicate events are filtered out. This gives Trident exactly-once event processing, fulfilling the robustness criteria.
After filtering out duplicates, Trident will process each event and check if it results in any messages/rewards that have to be sent to the user. Trident does this by taking the event and running a rule evaluation process where it checks if the event satisfies any of the pre-defined rules set by the growth campaigns.
All processed events are stored in Redis (for 24 hours) and events that trigger an action are persisted in MySQL as well.
If an action is triggered, Trident will then call the backend service associated with that action. These calls are rate-limited (with tighter limits during peak hours) so that Trident doesn’t accidently DoS attack any of Grab’s downstream backend services.
The number of events that Trident has to process can vary widely based on the time of day, day of week and time of year. During the peak of 2020, Trident was processing more than 2,000 events per second.
Grab uses quite a few strategies to make sure Trident can scale properly. The strategies are illustrated in this diagram.
It boils down to two things: scaling the server level and scaling the data store level.
The source of events for Trident are Kafka streams. Upstream backend services that are handling delivery/taxi orders will publish events to these streams after they handle a user’s request.
Trident can handle increased load (more events coming down the Kafka streams) by
If any actions are triggered, Trident will call downstream backend services to handle them. For example, the GrabRewards service could be called to give a user a free ride.
There are strict rate-limits built in to stop Trident from overwhelming these downstream services during a time of high load.
Trident uses two types of storage: cache storage (Redis) and persistent storage (MySQL and S3).
Scaling cache storage isn’t too complicated since Redis Cluster makes it pretty simple. Engineers can add new shards at any time to spread out the load and data replication prevents any data loss from shard failures.
In terms of persistent storage, Trident has two types of data in terms of access pattern: online data and offline data.
The online data is frequently accessed (so it has to be relatively quick) and medium size (a couple of terabytes). Grab uses MySQL for this data.
The offline data is infrequently accessed but very large in size (hundreds of terabytes generated per day). Grab uses AWS S3 for this.
For the MySQL database, Grab added read replicas that can handle some of the load from the read queries. This relieved more than 30% of the load from the master instance and allows MySQL queries to be performant.
In the future, they plan to vertically partition (split the database up by tables) the single MySQL database into multiple databases based on table usage.
Given an integer array nums and an integer k, return the number of good subarrays of nums.
A good array is an array where the number of different integers in that array is exactly k.
A subarray is a contiguous part of an array.
As a reminder, here’s our last question
Given an integer, convert it to a Roman Numeral.
Example
Input: 3
Output: “III”
Input: 4
Output: “IV”
Input: 1994
Output: “MCMXCIV”
Solution
This question can actually be solved by a pretty elegant solution.
We first put all the possible combinations of roman numeral symbols and their integer values in an ordered dictionary (dictionaries are ordered in Python 3.6+ by default).
We put them in order from largest to smallest.
Then, we iterate through our dictionary and for each integer value, we divide the number by that integer value.
That tells us how many of those roman numerals we will need.
For example, 4000 will be represented as MMMM.
4000 / 1000 = 4, which means 4 Ms.
Then, we can add that number of roman numeral symbols to our character array (that represents the final roman numeral).
After, we can set our number to the remainder that’s left.
Then, we continue on to the next roman numeral, integer value pair in our dictionary.
Here’s the Python 3 code
Quastor is a free Software Engineering newsletter that sends out summaries of technical blog posts, deep dives on interesting tech and FAANG interview questions and solutions.