HackerStash has a rather complicated system for calculating the score for the leaderboard as it includes the following:
- sum of the votes for the project for this month (x10)
- sum of the votes for the posts for this month (x5)
- sum of the projects team members comments for this month (x1)
- sum of the completed challenges for this month (xN)
In order to build the leaderboard, all this information must be known for every project. As you can imagine, this is a very expensive operation! Until recently we used to calculate this on the fly, then cache it in Redis for a few minutes so that subsequent queries were faster.
This approach was a total hack and was not scalable. We were already seeing 1000ms response times with only ~30 projects on the leaderboard which is far too slow.
After some research I decided on a Redis SortedSet as it seemed to fit the bill. I created a new class to handle the leaderboard that would be used for fetching the order of projects, as well as updating the project's score.
The class looks a little something like this (simplified of course):
class Leaderboard: key: str = key() def __init__(self, project): self.project = project @classmethod def order(cls, reverse=False) -> list[int]: order = redis.zrevrange(cls.key, 0, -1) return [int(x.decode('utf-8')) for x in order] @classmethod def remaining_days(cls) -> str: return arrow.utcnow().ceil('month').humanize(only_distance=True) @property def position(self) -> int: rank = redis.zrevrank(self.key, self.project.id) return rank + 1 if rank is not None else -1 @property def score(self) -> int: score = redis.zscore(self.key, self.project.id) or 0.0 return int(score) def update(self, amount: int) -> int: score = redis.zincrby(self.key, amount, self.project.id) return int(score)
Let's break this down!
Leaderboard#order
This uses the Redis ZREVRANGE operation to get the project ids in reverse order using this months key. Redis stores the ids in a byte format, so they must be decoded into strings.
Leaderboard#remaining_days
We use a package called arrow (similar to moment.js for you Node folk) to tell us how long is left for this current tournament.
Leaderboard#position
We use the Redis ZREVRANK method to get an individual projects position. As the Set is 0 index based, we need to increment by 1.
Leaderboard#score
This uses the Redis ZSCORE method to get the current score for the project. It returns the score as a Float if it exists, or None if not. We cast to an Int as it would look a bit weird otherwise.
Leaderboard#update
This uses the Redis ZINCRBY method to increment the projects score. We call this whenever an action occurs like a vote, a simplified example looks like this:
def vote(self, direction: str) -> None: score = 5 if direction == 'up' else -5 vote = Vote(type='post', score=score) Leaderboard(self.project).update(score) db.session.commit()
When fetching the leaderboard, we can first get the order and use it to work out which projects to fetch from the database, like so:
# Fetch the page from the query string, default to the first page page = request.args.get('page', 1, type=int) # Get a list of ids in their correct order from the above class order = Leaderboard.order() # Create an SQL expression using those orders order_expr = func.array_position(order, Project.id) # Get only the projects that qualify for the leaderboard, and order # them based on the order expression projects = Project.query\ .filter(Project.id.in_(order))\ .filter(Project.published == True)\ .order_by(order_expr)\ .paginate(page, 25, False)
Overall, some nice gains in performance and we're left with a more future proof solution
0 points
I've never played with Ruby on Rails before but it's interesting to read and sort of looks fun.
Which flavor of SQL are you using?
1 points
Sweet! Thanks for sharing this insightful behind the scenes view. I honestly 'felt' that the page was loading faster, but didn't really think about the change until I saw your article 👍🏽