AWS → Bare-metal migration

This post covers how Kallax(.io) was migrated from Amazon Web Services to a dedicated server hosted by Scaleway.
For people unfamiliar with Kallax, it's a non-commercial website that helps a few thousand board game enthusiasts organize their collections across a catalogue of 400,000+ board games.
I will start by introducing the architecture it was running on. The project is 4 years old and has already managed to accumulate a legacy version ("app") and a full rewrite; the two live side by side until we reach full feature parity (we are close).

Kallax — Web Server
At its core, Kallax ran a fairly standard small cloud web-server setup: an EC2 instance behind an application load balancer, with a managed RDS instance for data storage. Static files (65,000+ images) were served from an S3 bucket through CloudFront.
The application load balancer is overkill for a project like this, as we rarely scaled beyond one instance. It served two purposes: it allowed for blue/green deployments and handled SSL termination, so we did not have to manage SSL certificates ourselves.
Even at one request per hour, an ALB will set you back US$21.96/month.
Note: We were on the free tier. If you want AWS to handle SSL for you but don’t need load balancing, consider CloudFront or API Gateway instead.
The web server itself ran on a t3.micro. That is a tiny instance to run a production environment on — but it has actually handled going viral on /r/boardgames pretty well.
A t3.micro gives you access to 2 vCPUs and 1 GiB of RAM on 1st or 2nd generation Intel Xeon Platinum 8000-series processors (North Virginia). Those CPUs run 2 threads per physical core, so you get access to the equivalent* of a single 2.3 GHz (3 GHz turbo) core for US$7.49/month.
*An oversimplification; vCPU-to-physical-core conversion is a complicated topic.
We were (and are) able to support many users with few resources. The server only serves the initial render together with the web app, and then hands rendering responsibility over to the client. This means our instance only has to handle lightweight gRPC calls after that initial ‘hit’.
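To make those lightweight calls concrete, here is a minimal sketch of what a gRPC-Web call from a Blazor client can look like. The channel setup follows the standard Grpc.Net.Client.Web pattern; the service and message types (BoardGames, SearchRequest) are hypothetical stand-ins for clients generated from .proto contracts.

```csharp
// Sketch of a lightweight gRPC-Web call made by the client after the initial server-rendered hit.
// The BoardGames service and SearchRequest message are hypothetical, generated from a .proto file.
using Grpc.Net.Client;
using Grpc.Net.Client.Web;

var channel = GrpcChannel.ForAddress("https://kallax.io", new GrpcChannelOptions
{
    // gRPC-Web lets the browser-based Blazor client reach the gRPC backend.
    HttpHandler = new GrpcWebHandler(new HttpClientHandler())
});

var client = new BoardGames.BoardGamesClient(channel);
var reply = await client.SearchAsync(new SearchRequest { Query = "catan" });
Console.WriteLine($"Received {reply.Games.Count} results");
```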
The database was a managed instance on a db.t4g.micro at US$11.52/month. Again, not usually a production instance — but it handled it. Our codebase is written with the server's limited resources in mind, and complex queries are extremely sparse.
A t4g.micro gives you 2 vCPUs and 1 GiB of RAM on AWS's Graviton2 processors. You pay roughly a 90% premium for having your instance managed (+US$0.0076/hr on top of the ~US$0.0084/hr the equivalent EC2 instance costs), but it does come with some neat features and peace of mind.
Note: You do not want to manage your production database yourself unless you are absolutely sure you know what you are doing. That premium is worth it in most scenarios, as it comes with automatic updates and backups.
Route53 was used for DNS; keeping everything within the AWS ecosystem makes things easier. Elastic Beanstalk was used as the orchestration service, and the cost here is negligible. Beanstalk has all the features most web services need; consider CloudFormation for complex setups.
Kallax — App (legacy)
Kallax used to be a Blazor WebAssembly app (until August 2024). It was similar to our current setup, except that the server/backend did not perform pre-rendering. That is great if you have limited resources, and it lets you serve the entire site as a static website. The side effect is that the initial render takes a long time, as the web app has to be downloaded and started in the user's browser. Search bots (Google, Bing, DuckDuckGo) don't like performing this work and will skip indexing in most cases.
The legacy client uses the proto. subdomain for its API requests (gRPC). In the setup we ran, it routed to the same instance as the main domain (allowing us to run a single instance), but having a dedicated subdomain means we can redirect the legacy API calls easily.
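As a rough illustration (not our exact configuration), a single ASP.NET Core instance can answer gRPC traffic on both hostnames, which is what lets the legacy API traffic be peeled off later. The service name is hypothetical; the hosting calls are the standard Grpc.AspNetCore ones.

```csharp
// Sketch: one backend instance serving gRPC(-Web) for both the main domain and the
// legacy proto. subdomain (service name hypothetical).
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddGrpc();

var app = builder.Build();
app.UseGrpcWeb(); // the Blazor WebAssembly client calls the API through gRPC-Web

app.MapGrpcService<BoardGamesService>()
   .EnableGrpcWeb()
   .RequireHost("kallax.io", "proto.kallax.io"); // same instance, two hostnames

app.Run();
```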
We previously ran a complicated pre-rendering pipeline that would store public pages on S3 — but generating them and handling cache busting became cumbersome.
The new Kallax runs as Blazor Hybrid (dynamic pre-rendering). It is a bit more work for our server on the initial load, but it is much nicer for us to work with.
We are still not at full feature parity with the new Blazor Hybrid project (very close), so the ‘legacy’ app is kept as a fallback and served on the app.kallax.io subdomain.
The cost of this is negligible: S3 + CloudFront, billed by usage.
Kallax — Content Delivery
Nothing complicated here. We don’t want to waste CPU cycles pushing 65,000+ images through our tiny web backend — so they are served from an S3 bucket through CloudFront.
The images are highly compressed and we only support the .webp format, so the cost here is negligible (less than US$0.20 most months).
Kallax — Cloud
That wraps up our cloud architecture*. At idle the price starts at US$41/month; on top of that come data transfer, storage, and small fees like the IPv4 charge.
*Continuous Integration and Continuous Delivery/Deployment excluded.
It’s worth noting that there are far more cost-effective ways to host a similar site in the cloud. The ALB is responsible for most of the cost here and is unnecessary at this scale; consider API Gateway or CloudFront for SSL.

Kallax — Bare metal
We have rented a dedicated server in a European-owned and operated data center (Scaleway). For a fixed price of €24.99 / ~C$38.56 / ~US$26.93 per month we get:
an Intel Xeon D-1531 @ 2.2 GHz, 32 GB DDR4, 2x 250 GB SSD, access to 100 GB of onsite storage and unmetered 1 Gbps bandwidth, including a free static IPv4 address (and an IPv6 /48 subnet).
The diagram above shows the setup we have installed to support an equivalent* service. The individual services run virtualized. The diagram hides a major factor: locality.
The cloud setup spread services over multiple physical machines. The load balancer, the EC2 instance and the RDS instance ran virtualized on (at minimum) three distinct physical devices.
ALBs require a minimum of two availability zones; we limited ours to us-east-az5 and us-east-az, and our RDS instance ran in us-east-az5. This meant that the backend and the database were sometimes in the same data center, sometimes not.
In our dedicated setup, this is now a single physical device. As a result, latency between the database and the server drops from milliseconds to microseconds (~1000x faster).
Instead of a combined compute of 4 vCPUs and 2 GiB of RAM, we now have 12 vCPUs and 32 GiB of RAM (albeit on a slightly slower CPU). And since the services run on the same device, each can temporarily utilize the full available capacity when needed.
There are too many factors to provide a proper performance comparison without extensive benchmarking, so you will have to make do with: trust me, it is faster.
Disclaimer: I do not necessarily recommend this.
Installing, configuring and maintaining dedicated servers requires a massive time investment. I have enjoyed it, but most web services will end up paying more in the developer salaries required than they save on their cloud bill. Cloud is trendy for a reason.
There is also an absolute boatload of configuration that needs to be correct to keep your server secure. Consider starting with a VPS if you want to play around or self-host.
Data Security
If our server is off or misconfigured, Kallax is down. Completely. As a free hobby project we can accept being down for a limited period, but we cannot accept losing our users' data.
People trust us to store the board game collections they have spent time setting up on our site (multiple users have 1,000+ board games; most have around 100). We take that responsibility very seriously.
The server runs its two SSDs in RAID 1, so we can handle a single storage drive failure, and it also has a redundant power supply. This gives us some protection against hardware failure, but you can never fully prevent it.
You can prepare for it though.
We have configured hourly nearline backups: all data is copied to another machine within the same data center. In the case of complete hardware failure, we never lose more than an hour's worth of data.
AWS RDS does a backup once per day by default, which was the setting we used.
In addition to the hourly backups, we now also do pre-deployment backups. The entire database is copied to the nearline backup before the web server is updated. This allows us to do a clean restore of the database from the nearline backup, which takes less than a minute.
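For illustration, a pre-deployment backup step can be as small as the sketch below. It assumes a PostgreSQL database; the hostnames, paths and database name are placeholders rather than our actual setup.

```csharp
// Pre-deployment backup sketch (assumes PostgreSQL; hostnames, paths and database name are placeholders).
// Credentials are expected to come from .pgpass / the environment, not from the script.
using System.Diagnostics;

var stamp = DateTime.UtcNow.ToString("yyyyMMdd-HHmmss");
var dumpFile = $"/backups/pre-deploy-{stamp}.dump";

// Dump in pg_dump's custom format so pg_restore can rebuild the database quickly.
RunOrThrow("pg_dump", $"--format=custom --host=db-dedicated --file={dumpFile} kallax");

// Copy the dump to the nearline backup machine before the web server is updated.
RunOrThrow("scp", $"{dumpFile} backup-host:/backups/nearline/");

// Restoring from the nearline dump is the reverse operation:
//   pg_restore --clean --host=db-dedicated --dbname=kallax /backups/nearline/<dump>

static void RunOrThrow(string command, string arguments)
{
    using var process = Process.Start(new ProcessStartInfo(command, arguments))!;
    process.WaitForExit();
    if (process.ExitCode != 0)
        throw new InvalidOperationException($"{command} exited with code {process.ExitCode}");
}
```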
We also do a daily offsite backup. In case something happens to our server AND the nearline backup (or the entire data center, god forbid), we will be able to restore the database from our offsite backup, which is located in another country and not owned by the same provider.
Note: Kallax never stores your password. Our server never even receives your password — the client hashes it before sending it.
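As a conceptual sketch only (not our actual scheme, and with illustrative parameters), client-side hashing means the browser derives a key from the password and only ever transmits that derived value:

```csharp
// Conceptual sketch of client-side password hashing; parameters and salt choice are illustrative.
using System.Security.Cryptography;
using System.Text;

Console.WriteLine(HashForTransport("correct horse battery staple", "user@example.com"));

static string HashForTransport(string password, string email)
{
    // A deterministic per-user salt so the same user always produces the same value;
    // the normalized e-mail stands in for whatever the real salt source would be.
    byte[] salt = SHA256.HashData(Encoding.UTF8.GetBytes(email.Trim().ToLowerInvariant()));

    // Derive the value that is actually transmitted; the plaintext never leaves the client.
    byte[] key = Rfc2898DeriveBytes.Pbkdf2(password, salt, iterations: 100_000,
                                           HashAlgorithmName.SHA256, outputLength: 32);
    return Convert.ToBase64String(key);
}
// The server then stores its own salted hash of this value, never the password itself.
```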
Kallax — Migration
Back to the point: we migrated! Now, to be fair — we are a small hobby website — we could just have had some downtime and no one would have batted an eye.
However, we are also nerds by profession. It is rare that you get to migrate an entire production environment (luckily?), so we jumped at the chance for a low-risk learning experience.
Before migrating anything we performed restoration drills, tested the backup system and hardened the server. The first things we moved over were the static files. These are fairly easy to migrate, as we have full control over when something is uploaded.
We disabled anything that adds static files (in our case, image downloads) and copied everything over. With full parity between the environments, we could switch the domain over to our new dedicated server without any outage: both destinations serve identical responses, so the DNS propagation period is not an issue.
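For illustration, checking parity before the switch can be as simple as fetching the same object from both origins and comparing hashes; the hostnames and sample path below are placeholders:

```csharp
// Parity check sketch: the same image path should be byte-identical on the old CDN and on
// the new dedicated server before DNS is switched. Hostnames and paths are placeholders.
using System.Security.Cryptography;

string[] paths = { "/images/games/sample.webp" }; // in practice: the full object listing
using var http = new HttpClient();

foreach (var path in paths)
{
    byte[] oldBytes = await http.GetByteArrayAsync("https://old-cdn.example.net" + path);
    byte[] newBytes = await http.GetByteArrayAsync("https://new-origin.example.net" + path);

    bool identical = Convert.ToHexString(SHA256.HashData(oldBytes))
                  == Convert.ToHexString(SHA256.HashData(newBytes));
    Console.WriteLine($"{path}: {(identical ? "identical" : "MISMATCH")}");
}
```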

Our backend server is written to support horizontal scaling, so running multiple instances against a single database is no issue. Switching over the database would be the key step.
Having full control over the dedicated server, we temporarily opened up the firewall and the virtual instance so that our EC2 instance's IP could connect to db-dedicated, and then switched the connection strings on the AWS-hosted web server (see diagram).
During this switch there was a 26-second window in which users could write to db-cloud and temporarily lose data, as those writes would not be included in the dump that db-dedicated received.
We performed the switch during our quiet hours (weekdays, US night). After switching over, we downloaded the data from db-cloud again and compared it to db-dedicated. Our traffic at the time was low enough to fix any inconsistencies manually.
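A sketch of the kind of spot check we mean, assuming PostgreSQL accessed through Npgsql; the table, column and connection strings are hypothetical placeholders:

```csharp
// Post-cutover spot check sketch (assumes PostgreSQL/Npgsql; table, column and connection
// strings are hypothetical placeholders).
using Npgsql;

var cloudDb     = "Host=db-cloud;Database=kallax;Username=readonly";      // placeholder
var dedicatedDb = "Host=db-dedicated;Database=kallax;Username=readonly";  // placeholder
var since = DateTime.UtcNow.AddHours(-2);

Console.WriteLine($"Rows written in the last 2h: cloud {await CountRecentRows(cloudDb, since)}, " +
                  $"dedicated {await CountRecentRows(dedicatedDb, since)}");

static async Task<long> CountRecentRows(string connectionString, DateTime since)
{
    await using var connection = new NpgsqlConnection(connectionString);
    await connection.OpenAsync();

    await using var command = new NpgsqlCommand(
        "SELECT count(*) FROM collection_entries WHERE updated_at >= @since", connection);
    command.Parameters.AddWithValue("since", since);

    return (long)(await command.ExecuteScalarAsync())!;
}
```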
Note: I would not recommend this method for commercial production environments. Opt for a double-write pattern (sketched below), or preferably a short maintenance window if that is acceptable.
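A minimal sketch of the double-write idea, under hypothetical store and entity names: during the migration window every write goes to both databases, so no write can fall outside the dump.

```csharp
// Double-write sketch: during migration, writes go to both the old and the new database
// so nothing is lost between dump and cutover. Store and entity names are hypothetical.
public interface ICollectionStore
{
    Task SaveAsync(CollectionEntry entry);
}

public sealed class DoubleWriteCollectionStore : ICollectionStore
{
    private readonly ICollectionStore _primary;   // db-cloud, still serving reads
    private readonly ICollectionStore _secondary; // db-dedicated, kept in sync until cutover

    public DoubleWriteCollectionStore(ICollectionStore primary, ICollectionStore secondary)
        => (_primary, _secondary) = (primary, secondary);

    public async Task SaveAsync(CollectionEntry entry)
    {
        await _primary.SaveAsync(entry);
        await _secondary.SaveAsync(entry);
    }
}

public sealed record CollectionEntry(Guid Id, string GameName);
```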
At this point the migration was in a stable state, as both backends would answer with identical responses. This allowed us to move and terminate services sequentially without having to worry about DNS propagation issues.
After less than an hour we could remove our firewall exception again.
Conclusion
We have been terminating resources left and right and are now almost completely off AWS. The remaining services are the ones supporting the legacy application (app.), and they will not be migrated to the dedicated server, as we are retiring it soon. ~~It will remain hosted on AWS until we have full feature parity.~~ The rest of the services are now running on our dedicated server.
Update: We moved the static legacy app over to Bunny.net; it took all of 15 minutes.

Users might notice a considerable difference in response time (both positive and negative). The majority of this comes from moving the service from North Virginia to Paris, not from the hardware/architecture changes.
CloudFront will be replaced with another global CDN provider in the near future, ensuring that users in North America continue to receive a fast and reliable service.
By coincidence, I committed to a dedicated server on Kallax’s 4th anniversary. I had completely forgotten about the date, but consider it a gift 🎁
Thanks for reading, and if you are a Kallax user — thanks for playing!