20 points

2 weeks ago

How we improved the leaderboard load times by 8x

HackerStash update by Avatar Lewis Monteith in Dev

HackerStash has a rather complicated system for calculating the score for the leaderboard as it includes the following:

  • sum of the votes for the project for this month (x10)
  • sum of the votes for the posts for this month (x5)
  • sum of the projects team members comments for this month (x1)
  • sum of the completed challenges for this month (xN)

In order to build the leaderboard, all this information must be known for every project. As you can imagine, this is a very expensive operation! Until recently we used to calculate this on the fly, then cache it in Redis for a few minutes so that subsequent queries were faster.

This approach was a total hack and was not scalable. We were already seeing 1000ms response times with only ~30 projects on the leaderboard which is far too slow.

After some research I decided on a Redis SortedSet as it seemed to fit the bill. I created a new class to handle the leaderboard that would be used for fetching the order of projects, as well as updating the project's score.

The class looks a little something like this (simplified of course):

class Leaderboard:    
  key: str = key()    

  def __init__(self, project):        
    self.project = project    

  @classmethod    
  def order(cls, reverse=False) -> list[int]:        
    order = redis.zrevrange(cls.key, 0, -1)        
    return [int(x.decode('utf-8')) for x in order]    

  @classmethod    
  def remaining_days(cls) -> str:        
    return arrow.utcnow().ceil('month').humanize(only_distance=True)    

  @property    
  def position(self) -> int:        
    rank = redis.zrevrank(self.key, self.project.id)        
    return rank + 1 if rank is not None else -1    

  @property    
  def score(self) -> int:        
    score = redis.zscore(self.key, self.project.id) or 0.0        
    return int(score)    

  def update(self, amount: int) -> int:        
    score = redis.zincrby(self.key, amount, self.project.id)        
    return int(score)

Let's break this down!

Leaderboard#order

This uses the Redis ZREVRANGE operation to get the project ids in reverse order using this months key. Redis stores the ids in a byte format, so they must be decoded into strings.

Leaderboard#remaining_days

We use a package called arrow (similar to moment.js for you Node folk) to tell us how long is left for this current tournament.

Leaderboard#position

We use the Redis ZREVRANK method to get an individual projects position. As the Set is 0 index based, we need to increment by 1.

Leaderboard#score

This uses the Redis ZSCORE method to get the current score for the project. It returns the score as a Float if it exists, or None if not. We cast to an Int as it would look a bit weird otherwise.

Leaderboard#update

This uses the Redis ZINCRBY method to increment the projects score. We call this whenever an action occurs like a vote, a simplified example looks like this:

def vote(self, direction: str) -> None:    
  score = 5 if direction == 'up' else -5    
  vote = Vote(type='post', score=score)    
  Leaderboard(self.project).update(score)    
  db.session.commit()

When fetching the leaderboard, we can first get the order and use it to work out which projects to fetch from the database, like so:

# Fetch the page from the query string, default to the first page
page = request.args.get('page', 1, type=int)

# Get a list of ids in their correct order from the above class
order = Leaderboard.order()

# Create an SQL expression using those orders
order_expr = func.array_position(order, Project.id)

# Get only the projects that qualify for the leaderboard, and order
# them based on the order expression
projects = Project.query\    
  .filter(Project.id.in_(order))\    
  .filter(Project.published == True)\    
  .order_by(order_expr)\    
  .paginate(page, 25, False)

Overall, some nice gains in performance and we're left with a more future proof solution