Feb 27, 2024
Design your own TikTok
I'm going to keep this introduction short and simple. We're going to architect our own TikTok application. This post covers brief, surface-level content aimed at beginners looking into systems design. First, we'll go over a quick introduction to the features, such as:
- Basic authentication
- User content
- Profiles
- Videos (video uploads will be handled in a message queue)
- Comments
- Likes
- Shares
The Database
We'll be using Postgres for our database. It's an RDBMS known for being durable and performant, and it's used by many tech companies.
Here's a (very bare) schema for reference:
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
password VARCHAR(100) NOT NULL, -- stores a password hash, never the plaintext
full_name VARCHAR(100),
bio TEXT,
profile_picture_url VARCHAR(255),
date_joined DATE NOT NULL DEFAULT CURRENT_DATE
);
CREATE TABLE videos (
video_id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
video_url VARCHAR(255) NOT NULL,
description TEXT,
uploaded BOOL DEFAULT false,
likes INT DEFAULT 0,
views INT DEFAULT 0,
upload_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
CREATE TABLE comments (
comment_id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
video_id INT NOT NULL,
comment_text TEXT NOT NULL,
comment_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id),
FOREIGN KEY (video_id) REFERENCES videos(video_id)
);
CREATE TABLE likes (
like_id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
video_id INT NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(user_id),
FOREIGN KEY (video_id) REFERENCES videos(video_id),
UNIQUE (user_id, video_id) -- a user can like a given video only once
);
CREATE TABLE follows (
follow_id SERIAL PRIMARY KEY,
follower_id INT NOT NULL,
following_id INT NOT NULL,
FOREIGN KEY (follower_id) REFERENCES users(user_id),
FOREIGN KEY (following_id) REFERENCES users(user_id),
UNIQUE (follower_id, following_id) -- a user can follow another user only once
);
Our users will have an id, a unique username and email, a hashed password, a full name, a bio, a join date, and a URL linking to their profile photo (we'll go into media uploads later on).
The other tables (follows, likes, comments, and videos) point back to these users through foreign keys. These reference tables hold only the necessary data, primarily ids.
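Since the schema stores a hashed password, here's a minimal sketch of what could go into that column. It uses PBKDF2 from Python's standard library. The function names, the iteration count, and the salt$hash storage format are my own assumptions for illustration, not a recommendation over a dedicated password hashing library.

```python
import hashlib
import hmac
import os

# Assumed parameters for this sketch; tune them for a real deployment.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex', the value we'd store in users.password."""
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, hash_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), hash_hex)
```

With a 16-byte salt and SHA-256, the stored string is 97 characters long, so it fits in the VARCHAR(100) column from the schema.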
The REST API
Next, we need to expose a REST API that our UI can use, whether that be a mobile or web app. This API will handle most of the data transactions. I won't spend too much time on this, as REST APIs are pretty straightforward.
Message Queues
One of the primary processes of our system is to handle videos. Mainly, this system will handle the upload and compression of our videos. Compressing the videos is important to keep our storage usage as efficient as possible.
When a user uploads a video, we should not make them wait for the whole upload and compression to finish before we respond with a status code. That would result in a poor user experience, and it can also make it difficult to retry failed requests. To get around this, we'll be using a message queue.
The endpoint will put our message (the video and its metadata) into a queue. Then, worker processes will pick up these messages as they're able to do so and process them.
The workers can also upload the final video to a bucket such as S3, and update the Postgres database with the corresponding data (video_url, uploaded, etc.).
Meanwhile, the endpoint will simply respond with a 200 OK and the video_id.
On the frontend, we can simply poll the API for the uploaded status.
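To make the flow concrete, here's a rough sketch of the endpoint, the worker, and the status check. It's a toy: Python's in-process queue.Queue stands in for a real message queue, a dict stands in for Postgres, and the compression and bucket upload are placeholders. Every name here (upload_endpoint, status_endpoint, worker, the bucket URL) is made up for illustration.

```python
import queue
import threading
from dataclasses import dataclass
from itertools import count


@dataclass
class VideoRow:
    """Stand-in for a row in the videos table."""
    video_id: int
    video_url: str = ""
    uploaded: bool = False


videos: dict[int, VideoRow] = {}    # stand-in for Postgres
jobs: queue.Queue = queue.Queue()   # stand-in for the message queue
_next_id = count(1)


def upload_endpoint(raw_video: bytes) -> tuple[int, int]:
    """Enqueue the video and respond right away with (status code, video_id)."""
    video_id = next(_next_id)
    videos[video_id] = VideoRow(video_id)
    jobs.put((video_id, raw_video))
    return 200, video_id


def status_endpoint(video_id: int) -> bool:
    """What the frontend polls: has a worker finished this video yet?"""
    return videos[video_id].uploaded


def compress_and_store(video_id: int, raw_video: bytes) -> str:
    """Placeholder for real compression plus the bucket upload."""
    return f"https://bucket.example.com/videos/{video_id}.mp4"  # hypothetical URL


def worker() -> None:
    """Pull jobs off the queue, process them, then update the video row."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        video_id, raw_video = job
        row = videos[video_id]
        row.video_url = compress_and_store(video_id, raw_video)
        row.uploaded = True
        jobs.task_done()
```

To try it out, start the worker with threading.Thread(target=worker, daemon=True).start(), call upload_endpoint(b"..."), and then call status_endpoint with the returned video_id until it comes back True. In a real system the queue would live outside the API process so that jobs survive restarts and can be retried.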
Performance
Caching
A very important component of our API is caching. If you don't know what caching is, here's a quick primer:
Imagine you're a librarian and someone asks you to find a book. You could search the library shelf by shelf, going by the author's last name, until you find what you're looking for. This is very time-consuming and inefficient.
Now let's say this book also happens to be super popular, and everybody is asking for it. You could do the same thing you always do, OR you could simply get all copies of the book and keep them near your desk for quick retrieval. This is very similar to caching in systems. We keep a copy of certain pieces of data/resources in memory, as reading from memory is much faster than a query to a database.
Obviously this is a very simplified example.
So, back to our system. We'll be using Redis as our caching layer, allowing us to store our data in memory as key-value pairs. Redis is a very popular tool, used by high-traffic companies such as Twitter and Snapchat.
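Here's a small sketch of the usual read pattern (often called cache-aside): check the cache first, and only hit the database on a miss. To keep it runnable without a Redis server, a tiny in-memory class stands in for Redis. TTLCache, get_video_description, the key format, and the 60-second expiry are all assumptions of mine.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-memory key-value store with expiry, standing in for Redis."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # entry expired, treat it as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)


def get_video_description(
    video_id: int,
    cache: TTLCache,
    query_db: Callable[[int], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Return the description from the cache, falling back to the database."""
    key = f"video:{video_id}:description"
    cached = cache.get(key)
    if cached is not None:
        return cached              # cache hit: no database query
    value = query_db(video_id)     # cache miss: ask the database
    cache.set(key, value, ttl_seconds)
    return value
```

The expiry matters because cached data can go stale. If a user edits their description, the cache will keep serving the old one until the entry expires or we delete it ourselves.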
Content Delivery Networks
Content Delivery Networks (CDNs) are a very similar idea to caching! A CDN acts as an intermediary between users and our bucket storage, with servers placed at locations around the world. It aims to optimize the delivery of content, such as images or videos, by serving it from a server close to the user, which gives faster access regardless of geographical region. (Ironically, this website you're on doesn't use a CDN. I probably should get to it though!)
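In practice, one common way to put a CDN in front of a bucket is to hand clients a URL on the CDN's hostname instead of the bucket's. Here's a tiny sketch of that rewrite. The hostname cdn.example.com and the function name to_cdn_url are placeholders I made up, and it assumes the CDN is configured to fetch the same path from the bucket on a miss.

```python
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "cdn.example.com"  # hypothetical CDN hostname


def to_cdn_url(bucket_url: str, cdn_host: str = CDN_HOST) -> str:
    """Swap the bucket hostname for the CDN hostname, keeping the path and query."""
    parts = urlsplit(bucket_url)
    return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))
```

For example, to_cdn_url("https://my-bucket.example.com/videos/42.mp4") gives back "https://cdn.example.com/videos/42.mp4".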
Now, this blog post is already getting a bit long. We've gone over some of the foundational design choices, such as caches and message queues. In the next part, I'll be talking about load balancing and analytics. From there, we'll tie everything together and dive deeper into specific technologies and tools.
-Caleb