Why is System Design so important?
One thing that big tech companies do more than smaller ones is place an emphasis on system design. In fact, it is an integral part of their interview process. But system design is not just something for interviews; it is a way of thinking that is critical for modern software development. It is a shame, then, that this core skill is often neglected in training plans and left to be picked up on the job. The closest most engineers get is either cloud certifications or code-level techniques such as design pattern courses. It is also often not taught or tested at university. As a result, many engineers make common mistakes when putting a system together.
Note: see the end of the article for some great training resources in system design which you can use with your own team.
Common Mistakes
- Sticking with what we know or using a generic tool for everything.
- Thinking that one tool is better than any of the others.
- Using something simply because it's part of the same stack.
- Not considering running/maintenance costs (infrastructure and people).
What is the best tool?
It is very easy to get caught up in a ‘one true way’ mentality where you default to the latest trend, such as microservices or event-driven architectures. But rather than picking the architecture and making it fit the problem, we should be doing it the other way round. First, we need to understand and work out the details of the problem, and then pick the best solution for that job. It is also important to consider multiple approaches rather than just one, and to validate them. This allows us to compare them against each other and gives us a greater chance of picking the optimal solution.
A classic example of the above is choosing NoSQL databases for everything, just because we are told they are more scalable than relational databases. But a SQL database is simple, handles transactions really well and provides ACID guarantees. If we do some calculations and it turns out that we are going to have billions of records that need to be sharded across multiple instances, then it probably makes sense to move towards something like DynamoDB or Cassandra (especially if the data is not relational).
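The "calculations" mentioned above are usually simple arithmetic. Below is a minimal sketch of one, assuming purely hypothetical figures: 10 million new records a day, about 1 KB per record, five years of retention and roughly 2 TB of usable storage per database instance. The function names are illustrative, and none of the numbers come from a real system.

```python
def estimate_storage(records_per_day: int, bytes_per_record: int, years: int) -> tuple[int, int]:
    """Return (total records, total bytes) accumulated over the retention period."""
    total_records = records_per_day * 365 * years
    return total_records, total_records * bytes_per_record


def instances_needed(total_bytes: int, bytes_per_instance: int) -> int:
    """Ceiling division: instances needed to hold the data, before any replicas."""
    return -(-total_bytes // bytes_per_instance)


# Hypothetical figures; replace them with your own estimates.
records, total_bytes = estimate_storage(records_per_day=10_000_000, bytes_per_record=1_000, years=5)
instances = instances_needed(total_bytes, bytes_per_instance=2 * 10**12)  # ~2 TB per instance
print(f"{records:,} records, {total_bytes / 10**12:.2f} TB, {instances} instances")
# 18,250,000,000 records, 18.25 TB, 10 instances
```

Under these assumptions the data outgrows a single instance many times over, which is the point where sharding, and possibly a different kind of store, becomes a real design question. With a hundredth of the traffic, a single relational database would comfortably do the job.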
Evolutionary
Systems should be designed to be evolutionary, rather than perfect from the get-go. This means using a lean mentality: do the minimum amount of work necessary in order to deliver fast. You do, however, want to keep in mind how the company/system is likely to scale over time, and build things in a way that lets you scale them out later on.
The Dropbox video listed in the resources at the end shows how Dropbox scaled over time and is a great example of an evolving system. We can apply the same approach to our own system design: start with the simplest system we can get away with, then build on it, removing bottlenecks and scaling where we need to in order to handle the amount of data/requests we have.
Outline Process
There are multiple ways to do system design, but many of them share a similar approach, whose steps I have listed below. You would usually do this with your team around a whiteboard when first planning a rough solution to the problem. Unlike in a system design interview, you are likely to repeat this process over multiple iterations.
- Ask Questions: Understand the problem better and make sure you are all trying to solve the same problem.
- Define Scope: What will your system cover and not cover?
- Data/API Details: Work out what data you might need in order to do the next step.
- Back of Envelope Numbers: Work out the scale of the problem and the amount of data/requests/etc you are dealing with. (see separate article).
- High-Level Design: Design something simple.
- Detailed Design: Break it down further and scale it out where needed.
- Identify/Resolve Bottlenecks: Think about possible problems.
- Expand on areas: Keep going!
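The "Back of Envelope Numbers" step is mostly simple arithmetic. Here is a minimal sketch of a request-rate estimate, assuming hypothetical inputs (10 million daily active users, 20 requests per user per day, and a peak of three times the average); the function name and all figures are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_qps(daily_active_users: int, requests_per_user: float, peak_multiplier: float) -> tuple[float, float]:
    """Return (average, peak) requests per second, assuming load spread over a day."""
    average = daily_active_users * requests_per_user / SECONDS_PER_DAY
    return average, average * peak_multiplier


# Hypothetical inputs; the peak multiplier is a guess you should validate.
average, peak = estimate_qps(daily_active_users=10_000_000, requests_per_user=20, peak_multiplier=3)
print(f"average ~{average:,.0f} req/s, peak ~{peak:,.0f} req/s")
# average ~2,315 req/s, peak ~6,944 req/s
```

The aim is the order of magnitude rather than precision: a peak of around 7,000 requests per second leads to a very different high-level design than a peak of 70.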
Trade-Offs
Every choice we make within a design will have trade-offs. For example, we might use an in-memory data store to reduce read latency at the expense of increased costs. We might choose to make a distributed system always available, even if that means our data can become inconsistent. Or we might use a more efficient tool that requires more maintenance/support time because it is not a managed service. Whatever you decide, you should relate it back to the problem you are trying to solve and the current circumstances of your team/company.
Main Considerations
When designing your system, keep the following considerations in mind for every part of it. They will almost certainly be linked to the trade-offs you make, and it can be useful to scan through this list before and during design for ideas.
- Extensibility: How easy is it to add new parts to the system?
- Scalability: Does the system scale and does it even need to?
- Consistency: Does everyone always need to see the most up-to-date data?
- Availability: How available does your system need to be? This is often a trade-off with consistency (see CAP theorem).
- Reliability: Your system might be highly available, but how reliable does it need to be?
- Redundancy: How important is it that data is never lost? Do you need to replicate the data or have backups?
- Efficiency: How efficient is the solution?
- Manageability: How easy will it be to maintain after it's deployed? Does your team have the knowledge for the chosen tools?
- Security: Think about possible security risks.
- Read/Write Heavy: Is the problem read or write-heavy and how does this affect the design?
- Cost: Less important for interviews, but very important in the real world.
Costs
Depending on the tools, there are a number of different aspects of the cost you may need to consider. Here are some of the common ones:
- Compute time
- Storage size/type
- Memory
- Network traffic
- Read/write costs
- Logging/metrics
Data Structures & Algorithms
You might not immediately think about data structures when it comes to designing a system. However, a good knowledge of data structures and algorithms, kept fresh, can come in very handy. To illustrate why, I have given a couple of examples below:
Web Crawling
The web can be thought of as a Tree data structure, where one page (node) links off to many others. Better still, we can represent the web as a Graph data structure, since nodes (pages) can reference each other and create circular references. If we know about these data structures, we can tackle the problem of creating a web crawler using the same techniques.
For example, crawling is just like traversing a Tree/Graph. There are two main ways to do this: Breadth-First Search (BFS) and Depth-First Search (DFS). We can then design our system in a way that reflects how we would code the problem. To implement BFS in code you use a Queue data structure, and in a larger-scale system you can use a queue component in the same role:
- Add seed URLs to the queue.
- Take a link off the queue.
- Scrape the link and extract the URLs.
- Add the URLs to the queue.
- Take the next item off and repeat.
One thing to watch out for in any traversal of a Graph is circular references. Therefore, before adding a URL to the queue, we should check that we have not already seen it.
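The steps above can be sketched in Python with `collections.deque` as the queue and a set of seen URLs to guard against circular references. To keep the example self-contained, it assumes a hypothetical in-memory `LINKS` graph and a hypothetical `extract_urls` helper standing in for real fetching and HTML parsing, so it illustrates the traversal rather than a production crawler.

```python
from collections import deque

# Hypothetical link graph standing in for the real web; note the cycles back to a.com.
LINKS = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["a.com", "d.com"],
    "c.com": ["d.com"],
    "d.com": ["a.com"],
}


def extract_urls(url: str) -> list[str]:
    """Stand-in for 'scrape the link and extract the URLs'."""
    return LINKS.get(url, [])


def crawl_bfs(seeds: list[str], max_pages: int = 1000) -> list[str]:
    """Breadth-first crawl; returns URLs in the order they were processed."""
    queue = deque()
    seen = set()
    for url in seeds:  # 1. add seed URLs to the queue
        if url not in seen:
            seen.add(url)
            queue.append(url)
    processed = []
    while queue and len(processed) < max_pages:
        url = queue.popleft()  # 2. take a link off the queue (FIFO gives BFS order)
        processed.append(url)
        for link in extract_urls(url):  # 3. scrape and extract URLs
            if link not in seen:  # skip anything already queued or processed
                seen.add(link)
                queue.append(link)  # 4. add new URLs to the queue
    return processed


print(crawl_bfs(["a.com"]))  # ['a.com', 'b.com', 'c.com', 'd.com']
```

Marking a URL as seen when it is enqueued, rather than when it is processed, stops the same page being queued twice by two different parents. Swapping `popleft()` for `pop()` turns the queue into a stack and gives a DFS-style traversal instead.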
Checking a Record Exists
A common system design problem is the need to check whether something already exists. For example, this might be checking whether a username has already been taken or a record has already been processed. Where latency matters, we need to make this check fast. One way to do this is to store the values in a fast in-memory cache like Redis, so you don’t have to keep hitting a database. But what happens if the number of keys is so big that this is not cost-effective? Enter the Bloom Filter, a probabilistic data structure that needs only a fraction of the memory required to store the values themselves. It can tell you that a value is definitely not present, or that it is possibly present: false positives are possible, but false negatives are not.
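Below is a minimal Bloom Filter sketch using only the Python standard library. It sizes the bit array with the standard formulas m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions, and derives the k bit positions from a single SHA-256 digest via double hashing. The class and method names are hypothetical; in practice you would more likely reach for an existing, tested implementation.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set membership: no false negatives, tunable false positives."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
        self.size = math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)  # bit array packed into bytes

    def _positions(self, item: str):
        # Double hashing: derive k indexes from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


taken = BloomFilter(expected_items=1_000_000, false_positive_rate=0.01)
taken.add("alice")
print(taken.might_contain("alice"))  # True
```

For a username check, a `False` from `might_contain` means the name is definitely free and the database lookup can be skipped; a `True` still needs confirming against the cache or database. With one million expected items and a 1% false-positive target, this sizing comes to about 9.6 million bits (roughly 1.2 MB) and seven hash functions.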
Resources
The following resources are some of my favourite for learning more about system design.
Real Company Design Videos
Dropbox: Learn how Dropbox scaled their systems over time.
System Design Videos
Google Search — AutoComplete (Using the Trie data structure)
Other Resources
Grokking the System Design Interview: This course takes you through many different scenarios of designing systems and has an excellent reference section.
System Design Primer: This is an open-source repository of lots of system design information.
Cracking the Coding Interview: One of the best books for improving your data structures and algorithms knowledge.
Distributed Systems: Video of core concepts for distributed systems around compute, messaging and data stores.
About
Matthew Bill is a passionate technology leader and agile enthusiast from the UK. He enjoys disrupting the status quo to bring about transformational change and technical excellence. With a strong technical background in Polyglot Software Engineering, he solves complex problems by using innovative solutions and excels in implementing strong DevOps cultures.
He is an active member of the tech community, writing articles, presenting talks and contributing to open source. If you would like him to speak at one of your conferences or write a piece for your publication, then please get in touch.
Find out more about Matthew and his projects at www.matthewbill.com.
Thanks for reading!