How Quora integrated a Service Mesh in their Backend

We'll delve into what a Service Mesh is, why you'd use one and how it's used at Quora. Plus, why new hires often get paid more than existing employees, lessons from successful one-person startups and more.

Hey Everyone!

Today we’ll be talking about

  • How Quora integrated a Service Mesh in their Backend

    • What is a Service Mesh and why would you use one

    • Service Mesh Concepts Explained (Control Plane vs. Data Plane)

    • Why Quora picked Istio and the other choices they considered

    • Design challenges they faced with Istio

    • Final Results

  • Tech Snippets

    • 10 Lessons from Successful One Person Startups

    • Why New Hires often get Paid more than Existing Employees

    • Detecting Traffic Anomalies at Scale

    • Going from Developer to CEO

You probably know Brave as the ad-blocking, privacy-focused web browser. But, did you know Brave also has one of the fastest growing, independent search engines out there?

Now, they’ve released the Brave Search API, a fantastic way to incorporate this search engine in your app, and connect your AI to the Web.

Maybe you want to build a bot that looks at the latest headlines on stocks in your portfolio and texts you an alert when there’s something important. Or find all the latest news headlines and use GPT-4 to send you summaries of the articles without clickbait.

You can build both of these—and hundreds more ideas—in a weekend with the Brave Search API!

The Brave Search API is

  • Affordable - It’s much cheaper and easier to set up than the other big tech options. So it’s perfect for everything from small projects to large apps.

  • High Quality - Brave’s index is populated with sites that real people actually visit. No junk or clickbait farms, no SEO spam, and a much more human dataset.

  • Easy to Use - It’s quick and easy to set up. Data is structured for simple implementation across a wide range of apps, from NLP to complex analysis.

  • It’s Fast - The API is optimized for low latency, so it’s ideal for real-time apps like responsive search or chatbots.

You can use the API for free for up to 2,000 queries per month.

sponsored

How Quora integrated a Service Mesh into their Backend

Quora is a question-answering website with over 400 million monthly active users. You can post questions about anything on the site and other users will respond with long-form answers.

For their infrastructure, Quora uses both Kubernetes clusters for container orchestration and separate EC2 instances for particular services.

Since late 2021, one of their major projects has been building a service mesh to handle communication between all their machines and improve observability, reliability and developer productivity.

The Quora engineering team published a fantastic blog post delving into the background, technical evaluations, implementation and results of the service mesh migration.

We’ll first explain what a service mesh is and what purpose it serves. Then, we’ll delve into how Quora implemented theirs.

We talk about a lot of different technical concepts in Quastor. If you’d like long-form deep dives on specific concepts (like DynamoDB, Redis, Spark, Caching Strategies and more) then check out Quastor Pro.

What is a Service Mesh

A service mesh is an infrastructure layer that handles communication between the microservices (or machines) in your backend.

As you might imagine, communication between these services can be extremely complicated, so the service mesh will handle tasks like

  • Service Discovery - For each microservice, new instances are constantly being spun up and down. The service mesh keeps track of the IP addresses and port numbers of these instances and routes requests to and from them.

  • Load Balancing - When one microservice calls another, you want to send that request to an instance that’s not busy (using round robin, least connections, consistent hashing, etc.). The service mesh can handle this for you.

  • Observability - As all communications get routed through the service mesh, it can keep track of metrics, logs and traces.

  • Resiliency - The service mesh can handle things like retrying requests, rate limiting, timeouts, etc. to make the backend more resilient.

  • Security - The mesh layer can encrypt and authenticate service-to-service communications. You can also configure access control policies that limit which microservices can talk to each other.

  • Deployments - You might be rolling out a new version of a microservice and want to run an A/B test on it. You can set the service mesh to route a certain % of requests to the old version and the rest to the new version (or use some other deployment pattern).
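
To make the load balancing bullet concrete, here is a minimal Python sketch of two of the strategies it names, round robin and least connections. The `Instance` class, the addresses and the connection counts are all made up for illustration. A real mesh proxy tracks this state itself, so treat this as a toy model rather than how any particular proxy implements it.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Instance:
    """One running copy of a microservice (addresses here are hypothetical)."""
    address: str
    active_connections: int = 0


class RoundRobinBalancer:
    """Cycles through instances in order, ignoring their current load."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks whichever instance currently has the fewest open connections."""

    def __init__(self, instances):
        self._instances = list(instances)

    def pick(self):
        return min(self._instances, key=lambda inst: inst.active_connections)


instances = [
    Instance("10.0.0.1:8080", active_connections=5),
    Instance("10.0.0.2:8080", active_connections=1),
    Instance("10.0.0.3:8080", active_connections=3),
]

# Round robin wraps back to the first instance after visiting all three.
rr = RoundRobinBalancer(instances)
rr_picks = [rr.pick().address for _ in range(4)]

# Least connections chooses the instance with only one open connection.
lc = LeastConnectionsBalancer(instances)
lc_pick = lc.pick().address

print(rr_picks)
print(lc_pick)
```

Round robin is simple and works well when requests cost roughly the same. Least connections adapts better when some requests are much slower than others.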

Architecture of Service Mesh

In practice, a service mesh typically consists of two components

  • Data Plane

  • Control Plane

Data Plane

The data plane consists of lightweight proxies that are deployed alongside every instance of each of your microservices (i.e. the sidecar pattern). Each proxy handles all inbound and outbound communication for its instance.

So, with Istio (a popular service mesh), you could install the Envoy Proxy on all the instances of all your microservices.

Control Plane

The control plane manages and configures all the data plane proxies. So you can configure things like retries, rate limiting policies, health checks, etc. in the control plane.

The control plane will also handle service discovery (keeping track of all the IP addresses for all the instances), deployments, and more.
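
The split between the two planes can be sketched in a few lines of Python. This is a simplified model and not Istio's actual API: the class names (`ControlPlane`, `SidecarProxy`, `RoutePolicy`), the `answers` service name and the addresses are all hypothetical. The idea it illustrates is that an operator declares a policy once, and the control plane copies it to every proxy in the data plane.

```python
from dataclasses import dataclass, field


@dataclass
class RoutePolicy:
    """Settings an operator declares once in the control plane."""
    max_retries: int = 0
    timeout_seconds: float = 1.0


@dataclass
class SidecarProxy:
    """Data plane: one proxy per service instance, holding its own policy copy."""
    instance_address: str
    policy: RoutePolicy = field(default_factory=RoutePolicy)

    def apply(self, policy):
        self.policy = policy


class ControlPlane:
    """Tracks registered proxies and pushes configuration to each of them."""

    def __init__(self):
        self._proxies = {}  # service name -> list of SidecarProxy

    def register(self, service, proxy):
        self._proxies.setdefault(service, []).append(proxy)

    def endpoints(self, service):
        # Service discovery: the address of every known instance.
        return [p.instance_address for p in self._proxies.get(service, [])]

    def set_policy(self, service, policy):
        # Push the same policy to every proxy that fronts this service.
        for proxy in self._proxies.get(service, []):
            proxy.apply(policy)


control_plane = ControlPlane()
proxies = [SidecarProxy("10.0.0.1:8080"), SidecarProxy("10.0.0.2:8080")]
for proxy in proxies:
    control_plane.register("answers", proxy)

control_plane.set_policy("answers", RoutePolicy(max_retries=3, timeout_seconds=0.5))
print(control_plane.endpoints("answers"))
```

Request traffic never passes through the control plane in this model. It only distributes configuration, while the proxies do the per-request work.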

Integrating a Service Mesh at Quora

The Quora team looked at several options for the data plane and the control plane. For the data plane, they looked at Envoy, Linkerd and Nginx. For the control plane, they looked at Istio, Linkerd, Kuma, AWS App Mesh and a potential in-house solution.

They decided to go with Istio because of its large community and ecosystem. One of the downsides is Istio’s reputation for complexity, but the Quora team found that it had become simpler after it deprecated Mixer and unified its control plane components.

Design

When implementing the service mesh in Quora’s hybrid environment, they had several design problems they needed to address.

  1. Connecting EC2 VMs and Kubernetes - Istio was built with a focus on Kubernetes, and the Quora team found that integrating it with EC2 VMs was a bit bumpy. They ended up forking the Istio codebase and making some changes to the agent code that runs on their VMs.

  2. Handling Metrics Collection - For historical/legacy reasons, Quora stored Kubernetes metrics in Prometheus and VM application metrics in Graphite. They ended up migrating to VictoriaMetrics for easier integration.

  3. Configuration and Deployment - Istio configurations are verbose due to its rich feature set. This can make it hard for engineers to ramp up on all the Istio concepts. To improve developer productivity, Quora created high-level abstractions defined in YAML that engineers could use instead.
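
The third point, a short high-level spec that expands into verbose Istio configuration, might look something like the Python sketch below. Quora's actual abstraction format isn't described here, so the input keys (`service`, `timeout`, `retries`, `per_try_timeout`) and the `to_virtual_service` helper are hypothetical. The output follows the shape of an Istio `VirtualService` resource, and it is printed as JSON to keep the example free of third-party dependencies.

```python
import json


def to_virtual_service(spec):
    """Expand a short, hypothetical service spec into an Istio VirtualService dict."""
    http_rule = {
        "route": [{"destination": {"host": spec["service"]}}],
        "timeout": spec.get("timeout", "1s"),
    }
    # Only emit a retries block when the short spec asks for retries.
    if "retries" in spec:
        http_rule["retries"] = {
            "attempts": spec["retries"],
            "perTryTimeout": spec.get("per_try_timeout", "500ms"),
        }
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": spec["service"]},
        "spec": {"hosts": [spec["service"]], "http": [http_rule]},
    }


# Three fields from the engineer expand into the full nested resource.
short_spec = {"service": "answers", "timeout": "2s", "retries": 3}
virtual_service = to_virtual_service(short_spec)
print(json.dumps(virtual_service, indent=2))
```

The benefit of a layer like this is that most engineers only need to learn the handful of fields in the short spec, while the platform team owns the mapping to Istio concepts.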

Results

Quora first deployed the service mesh in late 2021 and has since integrated hundreds of services (using thousands of proxies).

Some features they were able to spin up with the service mesh include

  • Canary deployments with precise traffic controls

  • Load Balancing/Rate limiting/Retries

  • Generic service warm up infrastructure so that new pods can warm up their local cache from live traffic
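
As a rough illustration of the canary deployments in the list above, the Python sketch below splits simulated requests between two versions by weight. The version names and the 95/5 split are hypothetical and not Quora's actual settings. In a real mesh the proxies apply weights like these from routing configuration.

```python
import random


def pick_version(weights, rng):
    """Pick a version name with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


# Hypothetical rollout: 95% of requests stay on v1, 5% go to the canary.
weights = {"v1": 95, "v2-canary": 5}
rng = random.Random(42)  # seeded so repeated runs give the same split

counts = {version: 0 for version in weights}
for _ in range(10_000):
    counts[pick_version(weights, rng)] += 1

print(counts)
```

Raising the canary's weight in small steps while watching error rates and latency lets a bad release be caught while it only affects a small slice of traffic.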

For more details, read the full blog post here.

Tech Snippets