Slug Generation

High level explanation of how slug generation works on

This article was written to celebrate reaching one thousand users and to explain the changes to URL generation released in v1.3.0-alpha.

First, let's clarify the term. In web development, the part of a URL that uniquely identifies a resource is referred to as a slug. I do apologize if you were looking for articles about molluscs.

There are many strategies for slug generation - most try to optimize for search engine ranking. The primary goal of slug generation on KallaX has been to create short URLs that appear unpredictable.

One common strategy for creating short slugs is to generate a random string and check whether it has already been used. This requires storing the slugs somewhere (often a database) and doing a look-up after each generation; if the slug has been used before, a new one is generated and checked again. The problem with this method is that the chance of a collision keeps increasing as the database fills up. This is often mitigated by increasing the slug length after a fixed number of tries, to prevent the algorithm from retrying indefinitely.
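As a rough illustration of that random-and-retry strategy (not what KallaX uses), here is a minimal Python sketch. The names random_slug, used and max_tries are made up for this example, and a plain set stands in for the database look-up.

```python
import random
import string

# Assumed alphabet for this sketch: letters and digits.
ALPHABET = string.ascii_letters + string.digits


def random_slug(used, length=4, max_tries=5, rng=random):
    """Draw random slugs until an unused one turns up.

    After max_tries collisions at the current length, the length grows by
    one so the loop cannot spin forever on a saturated slug space.
    """
    while True:
        for _ in range(max_tries):
            slug = "".join(rng.choice(ALPHABET) for _ in range(length))
            if slug not in used:  # stands in for the database look-up
                used.add(slug)
                return slug
        length += 1


used_slugs = set()
first = random_slug(used_slugs)
second = random_slug(used_slugs)
```

Every call pays for at least one look-up, and the number of retries grows as more slugs are taken, which is the cost the next approach avoids.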

The solution we chose for KallaX is based on integer encoding (heavily inspired by hashids) with a custom alphabet. The Indo-Arabic numeral system - or what most people would refer to as regular numbers - is based on the alphabet 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.

Hexadecimal numbers expand this alphabet to 16 characters (0-9 + A-F). This means the first 10 numbers (0-9) are represented in the same way. However, the number 10 is represented as A in hexadecimal. A larger alphabet needs fewer characters to represent the same number, e.g. 42000 in decimal is A410 in hexadecimal.
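The hexadecimal example can be checked with Python's built-in conversions; the variable names are only for illustration.

```python
number = 42000

as_hex = format(number, "X")  # upper-case hexadecimal digits
back = int(as_hex, 16)        # parse the text as a base-16 number

print(as_hex, back)  # A410 42000
```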

One can easily create a new numeral system by choosing another alphabet - and then you have your own little custom solution for predictable slug generation. An alphabet such as (0-9, a-z, A-Z) would produce:
0, 1, 2, 3, 4 ... a, b, c ... Z, 10, 11, 12, 13 ... ZZ
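A minimal sketch of such a custom numeral system in Python, assuming the 62-character alphabet mentioned above. The function names encode and decode are chosen for this example and are not taken from the KallaX code.

```python
import string

# Assumed alphabet from the text: 0-9, a-z, A-Z (62 characters).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n, alphabet=ALPHABET):
    """Write a non-negative integer using the given alphabet as digits."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    base = len(alphabet)
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))  # most significant digit first


def decode(text, alphabet=ALPHABET):
    """Reverse encode(): turn the digit string back into an integer."""
    n = 0
    for ch in text:
        n = n * len(alphabet) + alphabet.index(ch)
    return n


print(encode(61), encode(62), encode(3843))  # Z 10 ZZ
```

Because the mapping is a plain change of base, no storage or look-up is needed: any id can be turned into a slug and back on the fly.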

The problem is that these slugs look very predictable - which might be fine for your solution, but we want to hide the fact that they are sequential. We could shuffle our alphabet - which would make it a bit harder to figure out what is going on, but you would ultimately still end up with very similar-looking slugs.

To pseudo-randomize even further, we pick one character based on the key we are trying to encode - and then use this single character to shuffle our alphabet. Our slugs now use as many different alphabets as we have characters in our alphabet! That makes them look very unpredictable!

... but how do we figure out which alphabet to use? We simply append the character we chose to the message. This way we can re-create the alphabet used for encoding when we are decoding the slug.
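The per-key shuffle can be sketched roughly as below. This is not the KallaX implementation: the alphabet, the way the selector character is picked (the id modulo the alphabet size) and the use of a seeded random.Random for shuffling are all assumptions made for illustration.

```python
import random

# Hypothetical alphabet: digits and upper-case letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def shuffled(alphabet, selector):
    """Shuffle the alphabet deterministically, seeded by one character."""
    chars = list(alphabet)
    random.Random(selector).shuffle(chars)
    return "".join(chars)


def encode(n):
    """Encode n with an alphabet chosen by n, then append the selector."""
    selector = ALPHABET[n % len(ALPHABET)]  # assumed way of picking it
    alphabet = shuffled(ALPHABET, selector)
    digits = []
    while True:
        n, remainder = divmod(n, len(alphabet))
        digits.append(alphabet[remainder])
        if n == 0:
            break
    return "".join(reversed(digits)) + selector


def decode(slug):
    """Rebuild the shuffled alphabet from the last character, then decode."""
    body, selector = slug[:-1], slug[-1]
    alphabet = shuffled(ALPHABET, selector)
    n = 0
    for ch in body:
        n = n * len(alphabet) + alphabet.index(ch)
    return n


print([encode(i) for i in range(5)])
```

Neighbouring ids pick different selector characters and therefore different shuffled alphabets, which is what makes consecutive slugs look unrelated while staying reversible.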

As the last step, we add padding to ensure the slugs always have a minimum length, just to look pretty. This is done by reserving characters that are not part of our alphabet and using them to mark where our slug ends. Imagine the letter E is NOT part of our alphabet. To make the slug JF a minimum of 5 characters long, we simply append the character E and then random characters that will not be considered part of the slug when decoding. Selecting these random characters is easy, as we do not care about reversing the process.

JFEQA <= Everything after E does not matter when decoding.
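A small sketch of this padding step in Python, following the JFEQA example. The separator E comes from the example; the filler characters and the function names pad and unpad are assumptions for illustration.

```python
import random

SEPARATOR = "E"  # reserved: assumed never to appear in the slug alphabet
FILLER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hypothetical padding characters


def pad(slug, min_length=5, rng=random):
    """Append the separator plus random filler until min_length is reached."""
    if len(slug) >= min_length:
        return slug
    filler_count = min_length - len(slug) - 1  # one slot goes to the separator
    filler = "".join(rng.choice(FILLER) for _ in range(filler_count))
    return slug + SEPARATOR + filler


def unpad(padded):
    """Keep everything before the first separator; the rest is padding."""
    return padded.split(SEPARATOR, 1)[0]


print(unpad("JFEQA"))  # JF
```

The filler can be truly random because unpad throws it away; only the part before the separator has to be reversible.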

The result is slugs that appear random but are fully deterministic and can be generated on the fly from our internal integer ids. They can encode large numbers in relatively short strings - which makes them perfect for shareable URLs. The slugs in the image (except the highlighted one) are all in order and represent numbers between 0 and 100.

Anywho, this was a really - really complicated way to say that we have updated the alphabet used in slug generation to only include upper case characters and this means your profile slug is now different - sorry 🤷

Please do not feel obligated to tip!
We have been asked a few times if we accept donations and we are very flattered! The hosting cost is currently very manageable and we have no issue paying it out of our own pockets. This is our hobby project - hobbies are allowed to cost money. However, after some consideration, we decided to accept donations from those who might feel like it. Donations will solely be used to cover hosting costs and potentially allow us to upscale services to accommodate user growth in the future.