6 min read

Backend code review guidelines

Code Review Guidelines for Backend Developers
Backend code review guidelines
Photo by Hansjörg Keller / Unsplash

A lot of this uses Ruby on Rails as the Framework and ActiveRecord as the ORM, but a lot of what is written here applies to a wide variety of scenarios regardless of your framework and or ORM of choice.

General

  • Be thorough in your reviews to reduce the number of iterations.
  • Communicate the ideas that you feel strongly about and those that you don’t feel as strongly e.g., blocking comments vs not blocking comments when referring to the conventional comments. Also, code-style preferences over logic and potential performance concerns
  • Seek to understand the author’s perspective.
  • If you don’t understand a piece of the code, say so. There’s a good chance that others will be confused by it as well.
  • Offer alternative solutions and or implementation; consider that those might have been taken into consideration by the author as well when writing the review comment.
  • Ensure that the review comments that you leave are clear and actionable for the author to follow up on them, e.g., “Please use this method instead of a deprecated one” vs. “This is not performant” and not elaborating on the latter.
  • For code that you own, especially in areas of high throughput of your application, think about when others contribute code to that area: “Are they doing something that they don’t realize is super dangerous?”

Performance

  • Every new piece of code should be performant by default, and every new Merge Request/Pull Request should adhere to this.
  • Think about the impact that new code may have on the application you’re currently working on; this also affects the people who maintain it, both on a code-level and on the production level (keeping it up and running)
  • Think outside the box. Someone with experience from a power user or engineering standpoint might have better set expectations on how a feature would be used, but users can be unpredictable. Expect users to test features in unconventional ways, like brute force or taking advantage of edge cases not taken into account.

Timeouts

A reasonable timeout should be set when the system invokes HTTP requests to external services; these should be executed in Sidekiq or a background job.

Memory usage

Any modifications to the current code should not increase memory unless absolutely necessary. New code should add only what’s required.

For example, when parsing a large document (e.g., an HTML document), it’s best to parse it as a stream whenever possible instead of loading the entire input into memory.

Caching

Cache data in memory or an in-memory data structure store like Redis when it’s needed multiple times in a transaction or has to be kept around for a certain time

For example, processing snippets of text containing username mentions (e.g., Hello @alice or How are you doing @alice). By caching the user objects for every username, the need to run the same query for every mention of Alice

RequestStore can be used to cache data per transaction.

Caching data can be done using Rails’ caching system.

Pagination

Features that render a list of items as a table need to include pagination

Main styles of pagination:

  • Offset-based pagination: User goes to a page, e.g., page 1. User sees the next page number and the total number of pages
  • Offset-based pagination without a number of pages. A user can only see the next page number, but not the total count of available pages.
  • Next page using keyset-based pagination: The User can only go to the next page, and the number of pages available is not known
  • Infinite scroll pagination: User scrolls the page, and the following items, to the ones currently displayed, are loaded asynchronously. Similar to keyset-based pagination in terms of implementation results

Offset pagination

Offset pagination is easier to implement, and it offers a few features that users might appreciate, such as:

  • Users need page numbers.
  • Users need to jump to specific pages.
  • Total count is important.

Unfortunately, the performance degrades with large datasets (O(n) execution time), indices are needed to ensure performance doesn’t degrade even below <10k rows.

Keyset pagination

Keyset pagination is harder to implement, but it offers significant performance benefits (O(1) execution time), making this method better suited for large datasets. Unfortunately, some features are lost, such as:

  • Page numbers
  • The ability to jump between pages
  • No count, but this could be fixed using trigger counts.

Database

Queries

  • Features should not increase the number of queries executed unless absolutely necessary. It is possible to require new queries, but those should be kept to a minimum.

An example of a common N+1 problem

Suppose we have an authors table and a books table

Controller (app/controllers/authors_controller.rb)

class AuthorsController < ApplicationController
  def index
    @authors = Author.all
  end
end

View (app/views/authors/index.html.erb)

<h1>Authors and Their Books</h1>
<ul>
  <% @authors.each do |author| %>
    <li>
      <strong><%= author.name %></strong>:
      <%= author.books.map(&:title).join(', ') %> <%# This line triggers a new query for EACH author! %>
    </li>
  <% end %>
</ul>

This will cause the following queries to be executed

SELECT "authors".* FROM "authors";
SELECT "books".* FROM "books" WHERE "books"."author_id" = 1;
SELECT "books".* FROM "books" WHERE "books"."author_id" = 2;
SELECT "books".* FROM "books" WHERE "books"."author_id" = 3;
SELECT "books".* FROM "books" WHERE "books"."author_id" = 4;
SELECT "books".* FROM "books" WHERE "books"."author_id" = 5;

This means for 5 authors we've got 5 queries! This will grow at O(n) rates

A simple fix would be

class AuthorsController < ApplicationController
  def index
    @authors = Author.includes(:books)
  end
end
  • Use read replicas when necessary.
  • A Database can be a cluster of servers that can have many read replicas and one primary database server. A classic use to scale a Database cluster is to have the read replicas perform read-only actions.
  • By default, queries use the read-only replicas. Primary sticking needs to be factored in as this process will have a cluster stick to the primary for a set amount of time before moving to the secondaries after the time has passed, or if they catch up before the time has passed. Load balancers are the ones responsible for handling the load between replicas and the primary.
  • Use CTEs wisely (Common Table Expressions): CTEs can become problematic in some situations, similar to N+1 problems

Bad example

WITH EngineeringEmployees AS 
( 
	SELECT e.first_name, e.last_name, e.salary 
	FROM employees AS e 
	JOIN departments AS d 
	ON e.department_id = d.id 
	WHERE d.department_name = 'Engineering' 
)

SELECT * FROM EngineeringEmployees;

This is just a fancy way of writing

SELECT e.first_name, e.last_name, e.salary 
FROM employees AS e 
JOIN departments AS d 
ON e.department_id = d.id 
WHERE d.department_name = 'Engineering';

Good example

/*
  This query first calculates the average salary for each department (cte_dept_avg).
  Then, it joins this result with the employee and department tables
  to list only the departments where the average salary is above the
  company-wide average salary (which is calculated in a subquery).
*/
WITH DepartmentAvgSalary AS (
  SELECT
    department_id,
    AVG(salary) AS avg_dept_salary
  FROM employees
  GROUP BY
    department_id
)
SELECT
  d.department_name,
  das.avg_dept_salary
FROM DepartmentAvgSalary AS das
JOIN departments AS d
  ON das.department_id = d.id
WHERE
  das.avg_dept_salary > (
    SELECT
      AVG(salary)
    FROM employees
  );
  • Cached queries: Rails provides an SQL Query Cache, used to cache the results of database queries for the duration of a request
  • Any new code should not execute multiple duplicated cached queries, which goes back to the first point of this section. The number of queries executed, including cached ones, should not increase unless absolutely necessary.
  • Cached queries are considered bad, even if among the benefits they provide, such as reducing the load to the database, this is because:
    • They consume memory.
    • Require Rails to re-instantiate each ActiveRecord object.
    • Require Rails to re-instantiate each relation of the object.
    • They spend additional CPU cycles to look into a list of cached queries.
    • They can potentially mask N+1 query problems.
  • SQL queries must not be executed in a loop unless absolutely necessary. In a development environment, this could be fine, but in a large production environment, this could spiral out of control.

Eager Loading

Always eager-load associations when retrieving more than one row

When retrieving multiple records for which you need to use any associations, you must eager load these associations.

In other words, instead of this:

Post.all.each do |post|
  puts post.author.name
end

You should use this:

Post.all.includes(:author).each do |post|
  puts post.author.name
end

Glossary

Should: From the RFC 2119

“This word, or the adjective "RECOMMENDED", means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.”