25 points

10 months ago

How we improved the leaderboard load times by 8x

HackerStash update by Avatar Lewis Monteith in Dev

HackerStash has a rather complicated system for calculating the score for the leaderboard as it includes the following:

  • sum of the votes for the project for this month (x10)
  • sum of the votes for the posts for this month (x5)
  • sum of the projects team members comments for this month (x1)
  • sum of the completed challenges for this month (xN)

In order to build the leaderboard, all this information must be known for every project. As you can imagine, this is a very expensive operation! Until recently we used to calculate this on the fly, then cache it in Redis for a few minutes so that subsequent queries were faster.

This approach was a total hack and was not scalable. We were already seeing 1000ms response times with only ~30 projects on the leaderboard which is far too slow.

After some research I decided on a Redis SortedSet as it seemed to fit the bill. I created a new class to handle the leaderboard that would be used for fetching the order of projects, as well as updating the project's score.

The class looks a little something like this (simplified of course):

class Leaderboard:    
  key: str = key()    

  def __init__(self, project):        
    self.project = project    

  def order(cls, reverse=False) -> list[int]:        
    order = redis.zrevrange(cls.key, 0, -1)        
    return [int(x.decode('utf-8')) for x in order]    

  def remaining_days(cls) -> str:        
    return arrow.utcnow().ceil('month').humanize(only_distance=True)    

  def position(self) -> int:        
    rank = redis.zrevrank(self.key, self.project.id)        
    return rank + 1 if rank is not None else -1    

  def score(self) -> int:        
    score = redis.zscore(self.key, self.project.id) or 0.0        
    return int(score)    

  def update(self, amount: int) -> int:        
    score = redis.zincrby(self.key, amount, self.project.id)        
    return int(score)

Let's break this down!


This uses the Redis ZREVRANGE operation to get the project ids in reverse order using this months key. Redis stores the ids in a byte format, so they must be decoded into strings.


We use a package called arrow (similar to moment.js for you Node folk) to tell us how long is left for this current tournament.


We use the Redis ZREVRANK method to get an individual projects position. As the Set is 0 index based, we need to increment by 1.


This uses the Redis ZSCORE method to get the current score for the project. It returns the score as a Float if it exists, or None if not. We cast to an Int as it would look a bit weird otherwise.


This uses the Redis ZINCRBY method to increment the projects score. We call this whenever an action occurs like a vote, a simplified example looks like this:

def vote(self, direction: str) -> None:    
  score = 5 if direction == 'up' else -5    
  vote = Vote(type='post', score=score)    

When fetching the leaderboard, we can first get the order and use it to work out which projects to fetch from the database, like so:

# Fetch the page from the query string, default to the first page
page = request.args.get('page', 1, type=int)

# Get a list of ids in their correct order from the above class
order = Leaderboard.order()

# Create an SQL expression using those orders
order_expr = func.array_position(order, Project.id)

# Get only the projects that qualify for the leaderboard, and order
# them based on the order expression
projects = Project.query\    
  .filter(Project.published == True)\    
  .paginate(page, 25, False)

Overall, some nice gains in performance and we're left with a more future proof solution