Software Architecture Principles
Hey Everyone,
Today we’ll be talking about
Creating a Static Analysis tool at Slack
Slack’s codebase is largely written in the Hack programming language (a typed dialect of PHP that was developed at Facebook)
There are no widely-used static analysis tools for Hack, so two interns set out to build one
They did it by adding support for Hack to Semgrep (a popular open source static analysis tool)
10 Principles for Architecture at Salesforce
Wesley Beary is a Software Architect at Salesforce and he wrote an article on 10 principles that the company uses
We’ll share 5 of our favorite principles (check the article for the rest)
Plus, a couple awesome tech snippets on
Good C++ codebases to read if you want to become a better C++ programmer
A brief introduction to Operating Systems (by brief I mean 100 pages)
Intel explains Quantum Computing
We have a solution to our last Google interview question and a new question from Apple.
Quastor Daily is a free Software Engineering newsletter sends out FAANG Interview questions (with detailed solutions), Technical Deep Dives and summaries of Engineering Blog Posts.
Nicholas Lin and David Frankel were two interns on Slack’s Product Security team, and they worked on building a static code analysis tool for Slack’s codebase.
In case you’re unaware, Slack is a chat tool catering towards businesses.
Here’s a summary
Static Code Analysis tools inspect your source code (without executing it) and identify potential errors and security vulnerabilities.
Giving engineers static analysis tools can immensely help developer productivity and make the codebase much more secure.
Slack’s codebase is largely written in the Hack programming language.
Hack was developed at Facebook and is a typed dialect of PHP (Hack allows both dynamic and static typing, so it’s type system is classified as gradually typed).
There were no static analysis tools available for Hack, so Nicholas and David set out to build one.
Building a static analysis tool from scratch would be too complex, so they decided to extend an existing open source static analysis tool, Semgrep.
Semgrep was already in use at Slack to scan code in 6 different languages, and there’s already infrastructure in place to integrate Semgrep into the CI/CD pipeline.
In order to add Hack functionality to Semgrep, they needed to answer two questions
What are the grammar rules for the Hack language?
How can Semgrep understand these grammar rules?
Developing a grammar for Hack
Programming languages have a structure to them that is known as the grammar.
The main notation used to represent grammars is the Backus-Naur Form (BNF).
Here’s an example of a very simple grammar that can recognize arithmetic expressions.
A programming language’s grammar will let you transform the program from a series of ASCII characters (words, spaces, etc.) into a concrete syntax tree (also known as a parse tree).
The concrete syntax tree (CST) is an exact visual representation of the parsed source code based on the grammar.
So, Nicholas and David incrementally wrote the grammar rules for the Hack programming language.
They tested their grammar by using a library called Tree-sitter.
Tree-sitter can take your grammar rules and then generate a language parser from them.
Nicholas and David used Tree-sitter to create a Hack parser using their grammar.
Then, they tested that parser by using it to convert some of Slack’s source code into a CST.
From this conversion, they can measure the parse rate, which is the proportion of the source code that could be properly parsed to construct a CST.
The higher the parse rate, the more code your parser was able to understand to construct the CST.
Since the parser is based on the grammar rules, having a very high parse rate means that your grammar is very expressive and covers the programming language well.
Nicholas and Dave were eventually able to develop a grammar that achieved a parse rate of greater than 99.999%.
Of the 5 million lines of Hack code they tested, there were less than 15 lines of unparsable code.
The Hack grammar they developed is open source. You can view it here.
Teaching Semgrep the Grammar
Semgrep (the static analysis tool) uses an abstract syntax tree to understand the source code and find bugs/vulnerabilities.
While the CST is an exact representation of your code, the abstract syntax tree (AST) focuses on the essential information.
An AST focuses on the structure of your code, and represents it in a hierarchical data structure useful for analysis.
You can read about the differences between an AST and a CST here.
Semgrep converts your source code (in Go, Java, JavaScript, Python, Ruby, etc.) into a language-agnostic AST.
Then, it looks through a list of rules that check the Semgrep AST for bugs and vulnerabilities.
This makes Semgrep highly extensible, as it’s loosely coupled with the programming language.
In order to map the tree-sitter CST to the Semgrep AST, Nicholas and David wrote a custom parser file in OCaml (Semgrep-core is written in the OCaml programming language).
Then, they plugged this file into Semgrep and used the static analysis capabilities with Hack code.
For more details, read the full article here.
Tech Snippets
Good C++ codebases to read - Reading good code is one of the best ways to pick up a programming language.This is an interesting HN discussion where someone asks for good C++ codebases to read for someone new to the language.A couple of expertly-written C++ codebases that were recommended were
A Brief Introduction to Operating Systems - This book is an awesome introduction to Operating Systems. It does not assume you know C and it does not assume you’ve studied Computer Architecture, so the book is great for beginners.
Intel explains Quantum Computing - Architecture All Access is a fantastic series by Intel that goes into depth on various topics in computer architecture.Their past videos were onTheir video on Quantum Computing goes into
Quantum Superposition and Quantum Entanglement
How Quantum Computing can help with the Protein Folding problem
The engineering behind Quantum Computers
Quastor Daily is a free Software Engineering newsletter sends out FAANG Interview questions (with detailed solutions), Technical Deep Dives and summaries of Engineering Blog Posts.
Wesley Beary is a Software Architect at Salesforce and spends his time on Engineering Practices and Culture.
He wrote a great article for the Salesforce Engineering blog where he discussed 10 principles that the company uses for software architecture.
Here are 5 of the most interesting points
Everything Evolves - Don’t just think about end states when building a system. Have a plan for decomposability, flexibility, and extensibility. Your system will evolve over time.
Overcome Silos - Big companies don’t always have the right incentive structure for building shared value. It’s important to push against this. This may mean doing what’s better for the global good instead of what’s locally optimal.
Complexity is Debt - Treat every bit of complexity added to your system as highly suspect. Viewing additional complexity as similar to technical debt can be a helpful framework.
Communicate the Why - Be clear on the Why behind the decisions and write it down! Leave good records of your thinking for posterity.
Technology serves Customers - There are usually many layers between the engineer/architect and the end customer but you should always be aware of customer needs and wants. Have a clear picture of how your work helps the customer.
Read all 10 of the principles here.
Interview Question
You are given an n x n 2D matrix.
Rotate the matrix by 90 degrees clockwise.
We’ll send a detailed solution in our next email, so make sure you move our emails to primary, so you don’t miss them!
Gmail users—move us to your primary inbox
On your phone? Hit the 3 dots at the top right corner, click "Move to" then "Primary"
On desktop? Back out of this email then drag and drop this email into the "Primary" tab near the top left of your screen
Apple mail users—tap on our email address at the top of this email (next to "From:" on mobile) and click “Add to VIPs”
Previous Solution
As a reminder, here’s our last question
Write a function to randomly generate a set of m integers from an array of size n.
Each element must have an equal probability of being chosen.
Solution
The key to solving questions of this type is a technique called Reservoir Sampling.
Reservoir Sampling is a family of algorithms for choosing a simple random sample of k items from a population of unknown size n in a single pass over the items.
Here’s a fantastic video on Reservoir Sampling. I’d highly recommend you watch it if you’re unfamiliar with the concept.
So, the video is for picking 1 item from n items where n is unknown (and you want each of the n items to have an equally likely probability of being picked).
For our problem, we want to pick m items out of n.
We can do this by selecting the first m items and placing them in our sample.
Then, for the m+1th item, we will randomly select it with a probability of (m/m+1).
If the m+1th item is selected, then we will randomly select one of our m items (each has a (1/m) probability of being selected) to be replaced by the m+1th item.
For the m+2th item, that will have a probability of (m/m+2) of being selected, and if selected will randomly replace one of the items in our current sample (chosen with probability 1/m).
For the m+ith item, that will have a probability of (m/m+i) of being selected.
And so on until we reach the nth integer.
At the end, we’ll have m integers out of the n possible choices where each integer had an equally likely chance of being selected.