How We're Crushing Our Day 0 Infrastructure Costs

When starting a new business, it's important to keep costs down.  Even relatively small costs can add up, so here's what we've done with #HackerStash to reduce our bill, whilst still balancing savings against maintainability and reliability.

Reserved Instances

In a "here's one I made earlier" Blue Peter moment, we already had a spare Reserved Instance that we had previously been using for another project.  As we paid for a year up front, we're only paying for the storage on the server.  At time of purchase, the host cost $137, which works out at roughly $12 per month.  If we choose to renew with the same size instance when the reservation ends, we could opt for a standard reservation instead of a convertible, reducing this cost to $8.75 per month.

To allow quick replacement of our EC2 host, we have a small second disk attached, which is used for storing configurations and some local service storage.  The smallest size Amazon will allow is 4GB.  I had hoped I could offset the cost of this disk by reducing the size of the root volume, but we use an Amazon-provided AMI for ECS (Elastic Container Service), which requires a 30GB drive as part of its spec.  34GB of GP2 disk space costs roughly $4 per month.

This disk is set up to attach to our ECS host via Terraform.  This means that in any failure scenario which results in a terminated host, anyone can run terraform apply and a new instance will be created with the disk mounted and ready to go, with no juggling of old root volumes that might bring old problems with them.
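
As a rough sketch, assuming placeholder resource names like aws_instance.ecs_host and an example device name, the volume and its attachment look something like this:

```hcl
# A minimal sketch of the persistent data volume and its attachment.
# Resource names and the device name are placeholders, not our exact config.
resource "aws_ebs_volume" "config" {
  availability_zone = aws_instance.ecs_host.availability_zone
  size              = 4      # the small second disk for configs and local service storage
  type              = "gp2"
}

resource "aws_volume_attachment" "config" {
  device_name = "/dev/xvdf"
  volume_id   = aws_ebs_volume.config.id
  instance_id = aws_instance.ecs_host.id
}
```

Because the volume is its own resource rather than part of the instance, recreating the host leaves the data intact and the next apply simply re-attaches it.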

Could we go serverless?  It's certainly an option we're considering.  By the time our reserved host runs out, we should've made a decision, but if we need to defer it, instead of reserving we could purchase an AWS Savings Plan.  This applies a discount where it makes the most difference and can also be used to offset a Lambda bill.

Load Balancing

At my day job, I've come to rely on AWS's Load Balancer to take care of continuous deployments and to maintain a list of healthy app hosts.  But for all its benefits, running a load balancer with no targets at all costs you $18.40 a month, and more once you tack on usage.  Definitely worth the money at scale, but difficult to justify when you want to prove your side project works.

ECS can dynamically allocate a free port for each running application.  AWS's target groups can route these for you with an ALB, but an alternative approach is to use service discovery.  This means that ECS will automatically create a DNS record which includes the port and protocol for each running container.  It requires a dedicated Route 53 zone, so expect an additional $0.50 on the books, and you'll want the records to have a short TTL of about 10 seconds, so add a couple of cents for regular DNS lookups.
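
For illustration, the Terraform wiring looks roughly like this; the namespace, service name and container port are examples rather than our exact configuration:

```hcl
# Cloud Map namespace backed by a private Route 53 zone (the extra $0.50)
resource "aws_service_discovery_private_dns_namespace" "internal" {
  name = "hackerstash.local"          # example namespace
  vpc  = aws_vpc.main.id
}

# SRV records with a 10 second TTL, one per running container
resource "aws_service_discovery_service" "app" {
  name = "app"

  dns_config {
    namespace_id = aws_service_discovery_private_dns_namespace.internal.id

    dns_records {
      type = "SRV"
      ttl  = 10
    }
  }

  health_check_custom_config {
    failure_threshold = 1
  }
}

# The ECS service registers each task, including its dynamically allocated host port
resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2

  service_registries {
    registry_arn   = aws_service_discovery_service.app.arn
    container_name = "app"
    container_port = 5000             # the container port; the host port is dynamic
  }
}
```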

My original plan was to use Nginx to route traffic to these containers, but the free version of Nginx doesn't support SRV records on its own; that's an Nginx Plus feature.  They don't list a price on the Nginx site, which probably means it's outside our budget.  There are some third-party plugins which add SRV support for free, but using them would mean baking and maintaining a custom Docker container, which I wanted to avoid.

Enter HAProxy.  I had heard that this was the software behind AWS's ALB, but I couldn't find any decent sources to back up that claim.  The HAProxy team does, however, have a blog post which walks you through routing traffic to an SRV upstream.

HAProxy doesn't ship with a base config, so it was a little frustrating to work with initially, but once you get to grips with it, the syntax is quite simple.  The game changer was discovering that there is a stats page which can be enabled, which makes verifying your config so much easier!  One thing I still have to revisit is draining servers nicely during a deployment; the short TTL on our SRV records means the window is small, but ideally we should eliminate it altogether.  I suspect a Lambda will come in handy here.
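
To give a flavour of the syntax, here's a cut-down sketch of an SRV-backed backend plus the stats page; the resolver address, domain and ports are examples rather than our exact config:

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

# Resolve SRV records via the VPC DNS resolver
resolvers awsdns
    nameserver vpc 10.0.0.2:53
    accepted_payload_size 8192
    hold valid 10s

frontend http_in
    bind *:80    # the real config also binds :443 with the Let's Encrypt certificate
    default_backend app

backend app
    balance roundrobin
    # Creates up to 10 server slots, filled in from the SRV record
    # that ECS service discovery maintains for the app service
    server-template app 10 app.hackerstash.local resolvers awsdns check

# The stats page that makes verifying the config so much easier
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
```

The stats page is handy here precisely because you can watch the server slots fill and empty as ECS tasks come and go.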

CDN

Amazon's Cloudfront integrates very easily with S3 buckets for hosting static content, but any URL can be made an origin too.  Its pay-as-you-go model means that a failure to launch is cheap.  By default, content is cached for 24 hours, but this can be customised using the Cache-Control header, reducing S3 fetch costs and strain on the backend for common requests.  Look at the lovely performance score we get from WebPageTest.

You can use multiple behaviour rules in the same distribution to serve different origins under the same domain.  Initially, I thought this would completely replace the need for a load balancer, but origins don't support dynamic port allocation.  Using static ports for our services would reduce our redundancy, and would mean minor outages when deploying changes.  We want to release often, so HAProxy is a key component.
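
As a sketch of how the behaviour rules hang together; the domain, bucket, path pattern and certificate references are placeholders, and the real distribution has a few more knobs:

```hcl
resource "aws_cloudfront_distribution" "main" {
  enabled = true
  aliases = ["hackerstash.com"]            # example domain

  # Static assets served straight from S3
  origin {
    domain_name = aws_s3_bucket.assets.bucket_regional_domain_name
    origin_id   = "s3-assets"

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.assets.cloudfront_access_identity_path
    }
  }

  # Everything else goes to HAProxy on the ECS host
  origin {
    domain_name = "origin.hackerstash.com" # example DNS name for the HAProxy host
    origin_id   = "haproxy"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  # Behaviour rule: anything under /static/* is served from the bucket
  ordered_cache_behavior {
    path_pattern           = "/static/*"
    target_origin_id       = "s3-assets"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = false
      cookies { forward = "none" }
    }
  }

  # Default behaviour: the application itself
  default_cache_behavior {
    target_origin_id       = "haproxy"
    allowed_methods        = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
    cached_methods         = ["GET", "HEAD"]
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = true
      cookies { forward = "all" }
    }
  }

  restrictions {
    geo_restriction { restriction_type = "none" }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.main.arn   # the free ACM certificate
    ssl_support_method  = "sni-only"
  }
}
```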

Cloudfront -> HAProxy -> ECS Service

Upstream

Using Cloudfront means we can take advantage of ACM for free certificates.  Currently Cloudfront talks to HAProxy over HTTPS for most requests (some blog requests refuse to route nicely).  The tricky bit here is that we can't use ACM for this hop, as ACM certificates can only be associated with certain first-party Amazon services.  Right now we use a Let's Encrypt certificate within HAProxy.  This works reasonably well, but it needs renewing every 90 days, and HAProxy doesn't seem to recognise changes to the certificate on the file system without a restart.  There may be an API call that solves this.

For now, we use a slightly less elegant approach: a scheduled ECS task puts the new certificate in a bucket, which triggers a Lambda that forces a new deployment of HAProxy.  Assuming everything works as expected, this means only a few seconds of outage six times a year, and that can be mitigated by adjusting the schedule to run in the small hours.

The scheduled Certbot task stores its output in S3, then a Lambda adjusts the format and places it on our EBS volume, ready to be picked up when the ECS HAProxy service restarts.
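
The wiring between the bucket and the redeploy can be sketched in Terraform roughly as follows; the bucket and function names are placeholders, and the Lambda itself would call ECS's UpdateService with the force-new-deployment flag:

```hcl
# When Certbot drops a new certificate into the bucket, invoke the Lambda
resource "aws_s3_bucket_notification" "certs" {
  bucket = aws_s3_bucket.certs.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.redeploy_haproxy.arn
    events              = ["s3:ObjectCreated:*"]
  }
}

# Allow S3 to invoke the function
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowExecutionFromS3"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.redeploy_haproxy.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.certs.arn
}
```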

An alternative approach would be to talk to our upstream over HTTP.  This avoids the need to sign certificates or restart HAProxy, and we could limit access to the HTTP version of the site via a firewall rule.  Amazon publishes the public IP addresses of the servers it uses for Cloudfront, and these can be queried from Terraform.
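
A hypothetical sketch of that rule using Terraform's aws_ip_ranges data source; the security group name is made up, and the CloudFront range list is long enough that in practice it may need splitting across several security groups:

```hcl
# Amazon's published CloudFront address ranges
data "aws_ip_ranges" "cloudfront" {
  services = ["cloudfront"]
}

# Only allow plain HTTP in from CloudFront's edge ranges
resource "aws_security_group" "from_cloudfront" {
  name   = "haproxy-from-cloudfront"   # example name
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = data.aws_ip_ranges.cloudfront.cidr_blocks
  }
}
```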

Database

For a while, we ran our Postgres database from a single Docker container on the same host.  This works fine in principle, but it would mean rolling our own backup procedures and managing failover, and any failure would leave us on our own.  The same setup on Amazon Aurora (RDS) manages these features via a few small configuration settings, and you don't need to worry about provisioning storage.  In case of a serious problem, you can also turn to Amazon support, for either $30 or 3% of your monthly spend.

Terraform makes adding a replica even easier
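
As a rough sketch; the identifiers, credentials and instance class here are illustrative rather than our exact values:

```hcl
resource "aws_rds_cluster" "main" {
  cluster_identifier      = "hackerstash"
  engine                  = "aurora-postgresql"
  master_username         = var.db_username
  master_password         = var.db_password
  backup_retention_period = 7          # managed backups, no rolling our own
  skip_final_snapshot     = true
}

# count = 2 gives a writer plus a replica; Aurora promotes the replica on failover
resource "aws_rds_cluster_instance" "main" {
  count              = 2
  identifier         = "hackerstash-${count.index}"
  cluster_identifier = aws_rds_cluster.main.id
  engine             = aws_rds_cluster.main.engine
  instance_class     = "db.t3.medium"  # assumption: a small Aurora-compatible class
}
```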

All this convenience comes with a price tag to match.  Running each instance on demand costs us roughly $65 per month, with additional pay-as-you-go costs.  A one-year reservation reduces that to about $43.

Running this database is by far our largest monthly cost, so we'll be strongly considering alternatives like DynamoDB.

Blog

This blog runs on the Ghost platform.  Ghost Pro starts at $29, but as Ghost is an open-source project, there's also an official Docker image.  95% of this worked out of the box, and I mounted a volume to ensure blog content persists after a restart, since the content lives in a SQLite database.
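
For reference, a minimal sketch of the task definition; the image tag, memory and host path are assumptions rather than our exact values:

```hcl
resource "aws_ecs_task_definition" "ghost" {
  family = "ghost"

  # Blog content (including the SQLite database) lives on the second EBS disk
  volume {
    name      = "ghost-content"
    host_path = "/mnt/data/ghost"
  }

  container_definitions = jsonencode([
    {
      name      = "ghost"
      image     = "ghost:4"
      memory    = 512
      essential = true

      # Host port 0 lets ECS pick a free port, published via service discovery
      portMappings = [{ containerPort = 2368, hostPort = 0 }]

      mountPoints = [{
        sourceVolume  = "ghost-content"
        containerPath = "/var/lib/ghost/content"
      }]
    }
  ])
}
```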

Adding Cloudfront support was largely a matter of following the guide they provide for Pro users, but setting the base URL to https has so far eluded me.  This only has a minor impact, the worst being that sitemap.xml is served over http only.

By running on our existing ECS host, we're effectively running our blog for no additional cost, which is excellent value.  If the rest of the site goes serverless, we could follow suit with a serverless blog engine, perhaps Hugo would be well suited.  

The Final Bill

Scenario                            | Services                                                                           | Cost
Full price                          | 2x RDS on demand, EC2 on demand, Application Load Balancer, EBS storage, Ghost Pro | ~$198 per month
Today                               | 2x RDS on demand, EBS storage                                                      | ~$135 per month
Six months from now, all reserved   | EC2 Reserved Instance, 2x RDS reserved                                             | ~$102 per month
Six months from now, but serverless | No fixed costs                                                                     | ~$0 per month

Ignoring any pay-as-you-go usage, the decisions we've made thus far have saved us just over $60 a month, which isn't bad.  Reserving for a year will bring us to roughly a 50% saving, assuming caching and demand remain balanced.  If we can reach serverless, then we would only be charged for usage, so whilst not free, it could represent a massive reduction in our monthly costs, but we would only make the leap if the change is performant.  Expect a follow-up here in a few months' time!