Khan Academy's migration from Python to Go
Hope you’re all having a fantastic day!
Today we’ll be talking about
- A step-by-step overview of how SHA-256 works
- Why Khan Academy switched from Python to Go as their backend server language and why they went from a Monolith to a Services-oriented architecture.
The previous interview question is on backtracking and today’s interview question is from Google.
- Front-End Checklist - An exhaustive list of all the things you have to have/test before you launch your website.
- SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the NSA.
- SHA-256 is one cryptographic hash function in the SHA-2 family where the output of the hash function (the message digest) is 256 bits.
- Step-by-step guide on how SHA-256 works
- Pre-process the input (the message) - Convert the string to binary and apply several other transformations (detailed in the link).
- Initialize Hash Values - Create 8 hash values. These are hard-coded constants that represent the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers.
- Initialize Round Constants - Create 64 constants. Each value is the first 32 bits of the fractional parts of the cube roots of the first 64 primes.
- Chunk Loop - Split the message into 512 bit chunks. Run steps 5-7 on each chunk.
- Create Message Schedule - Expand the input message into a 64 word array called the message schedule.
- Compression - Initialize 8 variables and set them equal to the 8 hash values created earlier. Run the compression function main loop that loops over the 64 words in the message schedule and runs a series of steps that mutate the 8 variables based on each of the 64 words.
- Modify Final Values - After the compression loop, modify the 8 variables by adding each of them with their respective hash values from the original 8 hash values.
- Concatenate Final Hash - Concatenate the 8 variables with a simple string concatenation.
- In December of 2019, Khan Academy (KA) was looking to update their backend.
- KA has been using Python 2 as their backend server language for over 10 years and had to update since Python 2 was about to reach official end of life (January 1rst, 2020)
- Khan Academy had several options
- Migrate from Python 2 to Python 3 - They would get a 10-15% boost in backend server code performance plus Python 3’s features.
- Migrate from Python 2 to Kotlin - KA started using Kotlin for compute-intensive backend tasks since it was more performant than Python 2 and had good support on Google App Engine (KA uses Google Cloud).
- Migrate from Python 2 to Go - Go has very quick compile times, first-class support on Google App Engine and used a lot less memory than Kotlin (based on KA’s tests).
- KA previously used a monolith architecture, but they also decided to switch to a services-oriented architecture. Here’s why…
- Deployment and tests are now faster since they can move more quickly for a single service (as opposed to upgrading the entire backend).
- A problem with deployment will have limited impact on other parts of the website.
- KA can choose the right kind of instances and hosting configuration needed for each individual service, as opposed to having one for the entire backend. This optimizes performance and cost.
- KA will continue inside the Google Cloud Platform ecosystem, using Google App Engine and Google Cloud Datastore for their databases. They’ve found GCP to be perform well and scale with their needs. (Hopefully they won’t have to deal with GCP customer support anytime soon. Or maybe Sal Khan has Sundar’s phone number?)
You are given an array which consists of non-negative integers. You are also given an integer m.
Split the array into m non-empty continuous subarrays such that the largest sum amongst all the subarrays is minimized.
We’ll send the solution in our next email, so make sure you move our emails to primary, so you don’t miss them!
Gmail users—move us to your primary inbox
- On your phone? Hit the 3 dots at the top right corner, click "Move to" then "Primary"
- On desktop? Back out of this email then drag and drop this email into the "Primary" tab near the top left of your screen
- A pop-up will ask you “Do you want to do this for future messages from [email protected]” - please select yes
Apple mail users—tap on our email address at the top of this email (next to "From:" on mobile) and click “Add to VIPs”
As a refresher, here’s the last question
Given an integer array nums that may contain duplicates, return all possible subsets.
The solution set must not contain duplicate subsets. Return the solution in any order.
This question is asking us to return the power set given a set of integers (nums).
The power set is defined as the set of all subsets (including the empty set and the set itself).
In the worst case (where nums is not empty and consists of all unique integers), there will be 2^N subsets in our power set (where N is the number of integers in nums).
Therefore, our worst case time complexity will be at least O(2^N).
Finding the power set will require a backtracking approach.
With backtracking, you incrementally build up your solution until you reach some block (no more solutions are possible). Then, you recurse back up the stack and incrementally build solutions in a different direction.
Let’s apply backtracking to the given question. Let’s say nums is [1,2,3].
We’ll use the array powerset to be an array of arrays to keep track of all the subsets. After we finish backtracking, our function can just return powerset.
We’ll use curSol to be our current solution. We’ll start with curSol being equal to an empty array.
The empty set is one valid subset of nums, so we’ll add curSol to powerset.
Now, we’ll add 1 to curSol, making it .
This is a valid subset, so we add it to powerset.
We’ll continue down this path and add 2 to curSol, making it [1,2].
Another valid subset, so we add it to powerset.
We continue down and add 3 to curSol, making it [1,2,3].
Again, a valid subset, so we add it to powerset.
Now, we’ve reached a block. There are no more integers in nums to add to curSol.
Therefore, we’ll back out of the current solution by popping 3 off curSol, making curSol equal to [1,2].
However, since we’ve already gone down the 3 path, there still aren’t any numbers in nums that we can add to curSol.
Therefore, we’ll back out of the current solution by popping 2 off, making curSol equal to .
Now, we have the 3 path to explore. Therefore, we’ll add 3 to curSol, making it [1,3].
This is a new valid subset, so we’ll add it to our powerset.
At this point, we’ve reached another block. There are no new integers in nums that we can add to curSol.
Therefore, we’ll pop off 3 and go back to where curSol is .
We’re still at a block, since there are no new integers in nums.
So, we’ll pop off 1, and go back to the empty set.
Now, we’ll add 2 to curSol, making curSol equal to . Again, a valid subset, so we add it to powerset. Then, we continue down the [2,] solution.
Hopefully, you’re getting the gist of how the backtracking approach works now.
Here’s the Python 3 code.
What’s the time/space complexity of our code?
Reply back with your best guess and we’ll tell you if you’re right or wrong!
If you want to practice more questions like this with top engineers at Google, Facebook, etc. then check out Interviewing.io.
You can book realistic mock interviews with senior FAANG engineers who will give you detailed and actionable feedback on exactly what you need to work on.
You don’t pay anything until you’re hired.
Check them out here.