How LinkedIn Reduced Latency with Protobuf

An overview of Protobuf and how it's used at LinkedIn. Plus, how SMS Fraud works and how to guard against it, things DBs don't do and more.

Hey Everyone!

Today we'll be talking about

  • Why LinkedIn switched from JSON to Protobuf

    • A brief intro to Rest.li, LinkedIn’s framework for creating web servers and clients

    • Issues they were facing with JSON and alternatives they considered

    • An overview of Protobuf

    • Results from switching to Protobuf

  • Tech Snippets

    • How SMS Fraud Works and How to Guard Against It

    • Things DBs Don’t Do - But Should

    • Structural Lessons in Engineering Management

How Beacons Connects Millions of Accounts Across Various Social Platforms with a Single API

Whenever you click an Instagram creator’s link-in-bio, there’s a good chance you’ll be directed to their Beacons page. They’re one of the largest link-in-bio tools with millions of creators signed up.

To power their platform, Beacons relies on Phyllo, the most powerful API platform in the creator space.

Here’s why Beacons (and hundreds of other companies) rely on Phyllo.

  • Real-Time Webhook Systems - You don’t have to bother polling their API for updates and dealing with rate limits or quota constraints. Phyllo sends real-time updates via webhooks whenever metrics change.

  • Streamlined Integrations - Integrating an app with Instagram, YouTube, Twitch, TikTok and other apps can require a months-long, multi-step process. Phyllo has a pre-approved API so you can instantly pull data from all of these platforms.

  • Maintenance Made Easy - Phyllo handles ongoing integration issues, debugging, support requirements and more. They offer guaranteed uptime and support SLAs.

  • SOC 2 Compliant - Phyllo is SOC 2 compliant and prioritizes safe and efficient data retrieval. Their system is engineered for massive scale.

Join Beacons and hundreds of other startups in accessing creator data instantly!

sponsored

Why LinkedIn switched from JSON to Protobuf

LinkedIn has over 900 million members in 200 countries. To serve this traffic, they use a microservices architecture with thousands of backend services. Together, these microservices expose tens of thousands of individual API endpoints across their system.

As you might imagine, this can lead to quite a few headaches if not managed properly (although, properly managing the system will also lead to headaches, so I guess there’s just no winning).

To simplify the process of creating and interacting with these services, LinkedIn built (and open-sourced) Rest.li, a Java framework for writing RESTful clients and servers.

To create a web server with Rest.li, all you have to do is define your data schema and write the business logic for how the data should be manipulated/sent with the different HTTP requests (GET, POST, etc.).

Rest.li will create Java classes that represent your data model with the appropriate getters, setters, etc. It will also use the code you wrote for handling the different HTTP endpoints and spin up a highly scalable web server.
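
As a rough, hypothetical sketch (the Post entity, the "posts" resource name and the lookup logic are made up, and exact annotations can vary by Rest.li version), a resource might look something like this:

import com.linkedin.restli.server.annotations.RestLiCollection;
import com.linkedin.restli.server.resources.CollectionResourceTemplate;

// Hypothetical resource exposing GET /posts/{id}.
// The Post class would be generated by Rest.li from your data schema.
@RestLiCollection(name = "posts", namespace = "com.example.restli")
public class PostsResource extends CollectionResourceTemplate<Long, Post> {

  @Override
  public Post get(Long id) {
    // Only the business logic lives here; Rest.li handles routing,
    // serialization and the server scaffolding around it.
    return new Post().setId(id).setTitle("Hello from Rest.li");
  }
}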

For creating a client, Rest.li handles things like

  • Service Discovery - Translates a URI like d2:// to the proper address - http://myD2service.something.com:9520/.

  • Type Safety - Uses the schema created when building the server to check types for requests/responses.

  • Load Balancing - Balancing request load between the different servers that are running a certain backend service.

  • Common Request Patterns - You can do things like make parallel Scatter-Gather requests, where you get data from all the nodes in a cluster.

and more.
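
To give a sense of the client side, here’s a hedged sketch based on the Rest.li tutorials. The PostsRequestBuilders class and the Post type would be generated from your schema (they’re hypothetical here), and the exact setup varies by version:

import java.util.Collections;
import com.linkedin.r2.transport.common.Client;
import com.linkedin.r2.transport.common.bridge.client.TransportClientAdapter;
import com.linkedin.r2.transport.http.client.HttpClientFactory;
import com.linkedin.restli.client.Request;
import com.linkedin.restli.client.ResponseFuture;
import com.linkedin.restli.client.RestClient;

// Set up the underlying HTTP client and point the RestClient at a URI prefix.
// With D2 service discovery you'd use "d2://" instead of a hard-coded host.
Client r2Client = new TransportClientAdapter(
    new HttpClientFactory().getClient(Collections.<String, String>emptyMap()));
RestClient restClient = new RestClient(r2Client, "http://localhost:8080/");

// Request builders are generated from the server's schema, so the key type
// and the Post response type are checked at compile time.
Request<Post> request = new PostsRequestBuilders().get().id(43234L).build();
ResponseFuture<Post> future = restClient.sendRequest(request);
Post post = future.getResponse().getEntity();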

To learn more about Rest.li, you can check out the docs here.

JSON

Since it was created, Rest.li has used JSON as the default serialization format for sending data between clients and servers.

{
  "id": 43234,
  "type": "post",
  "authors": [
    "jerry",
    "tom"
  ]
}

JSON has tons of benefits

  • Human Readable - JSON is much easier to work with than a binary-encoded message like 08 96 01. If something’s not working, you can just log the JSON message and read it directly.

  • Broad Support - Every programming language has libraries for working with JSON. (I actually tried looking for a language that didn’t and couldn’t find one. Here’s a JSON library for Fortran.)

  • Flexible Schema - The format of your data doesn’t have to be defined in advance and you can dynamically add/remove fields as necessary. However, this flexibility can also be a downside since you don’t have type safety.

  • Extensive Tooling - There’s a huge ecosystem of tooling built around JSON: linters, formatters/beautifiers, logging infrastructure and more.

However, the downside LinkedIn kept running into was performance.

With JSON, they faced

  • Increased Network Bandwidth Usage - Plaintext JSON is verbose, which resulted in large payload sizes. The increased network bandwidth usage hurt latency and placed excess load on LinkedIn’s backend.

  • Serialization and Deserialization Latency - Serializing and deserializing objects to and from JSON is slower because of how verbose the messages are. This isn’t an issue for the majority of applications, but at LinkedIn’s volume it was becoming a problem.

To reduce network usage, engineers tried integrating compression algorithms like gzip to shrink payload sizes. However, the CPU cost of compressing and decompressing every message just made the serialization/deserialization latency worse.
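
As a generic illustration (not LinkedIn’s actual code), gzipping a JSON payload in Java looks like this; the catch is that every request and response now pays the compression/decompression CPU cost on top of JSON parsing:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipJson {
  // Compress a JSON string: saves bytes on the wire, but costs CPU per message.
  static byte[] gzip(String json) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(json.getBytes(StandardCharsets.UTF_8));
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    String json = "{\"id\":43234,\"type\":\"post\",\"authors\":[\"jerry\",\"tom\"]}";
    // Note: for tiny payloads the ~20-byte gzip header can make the output
    // larger; the bandwidth savings only show up on bigger responses.
    System.out.println(json.length() + " bytes raw, " + gzip(json).length + " bytes gzipped");
  }
}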

Instead, LinkedIn looked at several formats as an alternative to JSON.

They considered

  • Protocol Buffers (Protobuf) - Protobuf is a widely used message-serialization format that encodes your message in binary. It’s very efficient, supported by a wide range of languages and also strongly typed (requires a predefined schema). We’ll talk more about this below.

  • FlatBuffers - A serialization format that was also open-sourced by Google. It’s similar to Protobuf but offers “zero-copy deserialization”, meaning you don’t need to parse/unpack the message before you access data.

  • MessagePack - Another binary serialization format with wide language support. However, MessagePack doesn’t require a predefined schema so this can cause it to be less safe and less performant than Protobuf.

  • CBOR - A binary serialization format inspired by MessagePack. It builds on MessagePack’s design and adds features like distinguishing text strings from byte strings. Like MessagePack, it does not require a predefined schema.

And a couple other formats.

They ran some benchmarks and also looked at factors like language support, community and tooling. Based on this evaluation, they went with Protobuf.

Overview of Protobuf

Protocol Buffers (Protobuf) is a language-agnostic, binary serialization format created at Google in 2001. Google needed an efficient way to serialize structured data so it could be sent across the network, stored on disk, etc.

Protobuf is strongly typed. You start by defining how you want your data to be structured in a .proto file.

The proto file for serializing a user object might look like…

syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  repeated string email = 3;
}

Protobuf supports a huge variety of field types including bools, strings, repeated fields (arrays), maps and more. You can also update your schema later without breaking deployed programs that were compiled against the older format.
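
For example, a later version of the Person schema could add a field without breaking anything, as long as existing field numbers are never changed or reused (the new field here is purely illustrative):

syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  repeated string email = 3;

  // Added later: old readers skip this unknown field, and new readers
  // parsing old messages just see the default value (an empty string).
  string headline = 4;
}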

Once you define your schema in a .proto file, you use the protobuf compiler (protoc) to compile this to data access classes in your chosen language. You can use these classes to read/write protobuf messages.
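
Here’s roughly what that flow looks like in Java with the Person message above (the protoc flags and package layout depend on your build setup, so treat this as a sketch):

// Generate the Java classes from the schema:
//   protoc --java_out=src/main/java person.proto

import com.google.protobuf.InvalidProtocolBufferException;

public class PersonDemo {
  public static void main(String[] args) throws InvalidProtocolBufferException {
    // Build a message with the generated builder API.
    Person person = Person.newBuilder()
        .setName("jerry")
        .setId(43234)
        .addEmail("jerry@example.com")
        .build();

    // Serialize to the compact binary wire format...
    byte[] bytes = person.toByteArray();

    // ...and parse it back on the receiving side.
    Person parsed = Person.parseFrom(bytes);
    System.out.println(parsed.getName() + " has " + parsed.getEmailCount() + " email(s)");
  }
}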

Some of the benefits of Protobuf are

  • Smaller Payload - Encoding is much more efficient. If you have {"id":59} in JSON, that takes around 10 bytes assuming no whitespace and UTF-8 encoding. In Protobuf, the same message encodes to just 2 bytes, e.g. 0x08 0x3b in hexadecimal (there’s a small worked example right after this list).

  • Fast Serialization/Deserialization - Because the payload is much more compact, serializing and deserializing it is also faster. Additionally, knowing what format to expect for each message allows for more optimizations when deserializing.

  • Type Safety - As we discussed, having a schema means that any deviations from this schema are caught at compile time. This leads to a better experience for users and (hopefully) fewer 3 am calls.

  • Language Support - There’s wide language support with tooling for Python, Java, Objective-C, C++, Kotlin, Dart, and more.
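
To make the payload-size point concrete, here’s a small sketch using the Person message from above. Since id is field number 2 in that schema, the two encoded bytes come out as 0x10 0x3b (the 0x08 tag byte in the bullet above assumes id is field number 1), but the two-byte result is the same:

import java.nio.charset.StandardCharsets;

public class SizeDemo {
  public static void main(String[] args) {
    // JSON: {"id":59} is 9 bytes of UTF-8 text.
    String json = "{\"id\":59}";
    System.out.println("JSON:     " + json.getBytes(StandardCharsets.UTF_8).length + " bytes");

    // Protobuf: one tag byte ((field 2 << 3) | wire type 0 = 0x10)
    // plus one varint byte for the value (59 = 0x3b).
    Person p = Person.newBuilder().setId(59).build();
    System.out.println("Protobuf: " + p.getSerializedSize() + " bytes");  // 2
  }
}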

Results

Using Protobuf resulted in an increase in throughput for response and request payloads. For large payloads, LinkedIn saw improvements in latency of up to 60%.

They didn’t see any statistically significant degradations when compared to JSON for any of their services.

The full blog post includes a P99 latency comparison chart from benchmarking Protobuf against JSON for servers under heavy load.

For more details, read the full blog post here.

How did you like this summary?

Your feedback really helps me improve curation for future emails.


Here’s Why You Shouldn’t Connect With Social Platforms Directly

When you're building an app that connects to social platforms like YouTube, Facebook, Twitter, Substack, etc., your first instinct might be to just use the platform’s API.

This can be a massive headache because of

  • Normalizing data from different platforms

  • Long app approval processes

  • Rate and quota limits

  • API changes & maintenance

Phyllo is an abstraction layer that handles all of this for you. You can use their API to connect to over 25 social platforms including TikTok, YouTube, Facebook, Spotify, Twitter, GitHub and more.

They have a webhook system that pushes updates to your app in real time and they take care of normalizing data from all the different platforms. This way, you can focus on building features your users will love.

Tech Snippets