Don’t miss the first post in the series: An architecture for scaling dev organizations.
Let’s start simple. We have a TODO app where you can add a todo, toggle it between complete and incomplete, and filter your todos to show all, active, or completed. Simple.
Our data model might look like:
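As a minimal sketch, assuming a SQLite table named `todo` with hypothetical `text` and `completed` columns (these names are illustrative, not part of any original design), the model and the three operations could be written in Python like this:

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE todo (
        id        INTEGER PRIMARY KEY,
        text      TEXT    NOT NULL,
        completed INTEGER NOT NULL DEFAULT 0  -- 0 = active, 1 = completed
    )
    """
)


def add_todo(text):
    """Insert a new, active todo and return its id."""
    cur = conn.execute("INSERT INTO todo (text) VALUES (?)", (text,))
    return cur.lastrowid


def toggle_todo(todo_id):
    """Flip a todo between active and completed."""
    conn.execute("UPDATE todo SET completed = 1 - completed WHERE id = ?", (todo_id,))


def list_todos(show="all"):
    """Return todo texts filtered by 'all', 'active', or 'completed'."""
    where = {
        "all": "",
        "active": "WHERE completed = 0",
        "completed": "WHERE completed = 1",
    }[show]
    rows = conn.execute(f"SELECT text FROM todo {where} ORDER BY id")
    return [text for (text,) in rows]


milk = add_todo("buy milk")
add_todo("write blog post")
toggle_todo(milk)
```

Calling `list_todos("active")` at this point returns only the todos that have not been toggled to completed.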
Now let’s scope creep it a bit. Let’s say we want to jump on the personalization bandwagon and suggest todo items for our users. First we probably want to make our webapp multi-user so we can increase the amount of data available for our classification algorithm.
We can just add a user to our TODO table:
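One way this could look, again as a sketch with assumed names (`user_name` is a hypothetical column holding the owner of each row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE todo (
        id        INTEGER PRIMARY KEY,
        user_name TEXT    NOT NULL,  -- new: who owns this todo (assumed column name)
        text      TEXT    NOT NULL,
        completed INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO todo (user_name, text) VALUES (?, ?)",
    [("alice", "buy milk"), ("bob", "buy milk"), ("bob", "walk the dog")],
)


def todos_for(user_name):
    """Return the todo texts that belong to one user."""
    rows = conn.execute(
        "SELECT text FROM todo WHERE user_name = ? ORDER BY id", (user_name,)
    )
    return [text for (text,) in rows]
```

Every query now carries a `WHERE user_name = ?` clause, and the same todo text can appear once per user.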
Now every night we run a classification algorithm over our todos which groups our users based on some threshold of the sameness of their todos. After some more processing we can find the todos that people similar to our user have, and from those build a list of potential todos for that user. We can now integrate that data into our TODO webapp.
Our original table isn’t a fit for this data because it is a list of unique todos. Here we are associating users with already existing todos written by other people. Let’s create another table called RECOMMENDATION that has a user and a foreign key to a row in the TODO table.
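A possible shape for that table, sketched in SQLite with assumed column names (`user_name`, `todo_id`); the sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE todo (
        id        INTEGER PRIMARY KEY,
        user_name TEXT    NOT NULL,
        text      TEXT    NOT NULL,
        completed INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical table: links a user to a todo that someone else wrote.
    CREATE TABLE recommendation (
        id        INTEGER PRIMARY KEY,
        user_name TEXT    NOT NULL,
        todo_id   INTEGER NOT NULL REFERENCES todo(id)
    );
    """
)

# Bob wrote the todo; the nightly job (not shown) recommends it to Alice.
dog = conn.execute(
    "INSERT INTO todo (user_name, text) VALUES ('bob', 'walk the dog')"
).lastrowid
conn.execute(
    "INSERT INTO recommendation (user_name, todo_id) VALUES ('alice', ?)", (dog,)
)


def recommendations_for(user_name):
    """Return the texts of todos recommended to a user."""
    rows = conn.execute(
        """
        SELECT t.text
        FROM recommendation r
        JOIN todo t ON t.id = r.todo_id
        WHERE r.user_name = ?
        ORDER BY r.id
        """,
        (user_name,),
    )
    return [text for (text,) in rows]
```

Note that reading a recommendation already requires a join back to `todo`, which is the first hint of the query cost discussed later.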
Now we have two tables that both reference a user so we may want to refactor out a USER table that we can then use to create a relationship to the RECOMMENDATION and TODO tables.
Ok, now that we have designed the data, let’s go back and design the screen. Let’s put the recommendations on the same screen as the todo entries so the user can click on a recommendation and have it appear on their list of todos. Pretty simple: we will just have our backend query the database twice, combine the entries on the server side, and present a JSON document for our frontend to consume and display.
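The backend for this screen might be sketched as follows. The schema uses the refactored three-table layout with assumed names (`app_user` stands in for the USER table), and the JSON field names are hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE todo (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES app_user(id),
        text      TEXT    NOT NULL,
        completed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE recommendation (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES app_user(id),
        todo_id INTEGER NOT NULL REFERENCES todo(id)
    );
    INSERT INTO app_user (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO todo (id, user_id, text, completed) VALUES
        (1, 1, 'buy milk', 1),
        (2, 2, 'walk the dog', 0);
    INSERT INTO recommendation (user_id, todo_id) VALUES (1, 2);
    """
)


def todo_screen(user_id):
    """Query twice, merge on the server, and return one JSON document."""
    todos = conn.execute(
        "SELECT id, text, completed FROM todo WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    recs = conn.execute(
        """
        SELECT t.id, t.text
        FROM recommendation r
        JOIN todo t ON t.id = r.todo_id
        WHERE r.user_id = ?
        ORDER BY r.id
        """,
        (user_id,),
    ).fetchall()
    return json.dumps(
        {
            "todos": [{"id": i, "text": t, "completed": bool(c)} for i, t, c in todos],
            "recommendations": [{"todoId": i, "text": t} for i, t in recs],
        }
    )


screen = json.loads(todo_screen(1))
```

Each request pays for two queries, a join, and a serialization step, all shaped by the tables rather than by the screen.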
Fast-forward a few months. We have now added several more “simple” features and the data model has grown. We have marketing querying our database to support personalized ads for potential clients of our software. We have data analysts querying our database to get business insights. We have new screens querying our data in ways we never anticipated as our tables grew organically. We spend devops time tuning queries, adding indexes, and generally trying to keep everything fast.
Soon afterwards, our product owners come to us and ask us to add a single block of text to one of our screens. It will be so simple; it is just one extra piece of data. They confidently estimate a day of development work without even consulting the devs. It is just one piece of data, right?
The difference this time is that the data isn’t easily accessible given how we have structured our database. So we have a couple of options: slow down the screen by adding one or more complex queries, or push back and keep that new piece of data off the screen. Neither is a good place to be. If you choose door #1, you are giving your users a slow, frustrating experience. Door #2 means you are withholding functionality your users may really want or need. All this because earlier decisions created tech debt that is killing your ability to evolve the system. But there is a third option.
A third option is to design the UI first. The UI is the gateway that the user has into your system, and from their point of view it may actually be the system. Your users are why you are writing your system in the first place, so it makes sense to think of them first. So strategy #3 is to change the data to fit the UI.
How does this work? Aren’t we going to run into the same issues we had? Isn’t changing the data going to break other screens?
No. That is because we are going to do something that runs counter to everything we have been taught over the last several years. We are going to duplicate our data.
The relational model is really good when you want to present a single model of the system, normalize your data, and prevent data duplication. For this example we are going to choose to structure our data as a document. Documents are great when you have slowly changing, structured data. Let’s see how this will work.
In the basic use case we are going to create one document per screen. A powerful way to do this is to store your data as a JSON object. When your todo screen calls your backend to fetch all the data (and only the data) it needs to display, you are literally just making a call to your database and returning whatever comes back to the frontend. There is no marshalling, unmarshalling, or converting to JSON, just a super fast read, and then your HTTP server streams the results. Can you imagine how you can transform your user’s life by speeding up the way they interact with your system? This strategy will work with almost any screen whose query is based on a single key.
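To make the idea concrete, here is a minimal sketch that uses SQLite as a stand-in document store. The `screen_document` table, its composite key, and the document fields are all assumptions for illustration; any key-value or document database could play the same role:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE screen_document (
        screen  TEXT    NOT NULL,  -- which screen the document feeds
        user_id INTEGER NOT NULL,  -- the single key the screen queries by
        body    TEXT    NOT NULL,  -- the JSON exactly as the frontend wants it
        PRIMARY KEY (screen, user_id)
    )
    """
)


def write_screen(screen, user_id, document):
    """Write path: serialize once, when the underlying data changes."""
    conn.execute(
        "INSERT OR REPLACE INTO screen_document (screen, user_id, body) VALUES (?, ?, ?)",
        (screen, user_id, json.dumps(document)),
    )


def read_screen(screen, user_id):
    """Read path: one key lookup; the stored text is returned untouched."""
    row = conn.execute(
        "SELECT body FROM screen_document WHERE screen = ? AND user_id = ?",
        (screen, user_id),
    ).fetchone()
    return row[0] if row else None


write_screen(
    "todos",
    1,
    {
        "todos": [{"id": 1, "text": "buy milk", "completed": True}],
        "recommendations": [{"todoId": 2, "text": "walk the dog"}],
    },
)
body = read_screen("todos", 1)  # a JSON string, ready to send as the HTTP response
```

The cost of joining and serializing moves to the write path, where it is paid once per change instead of once per page view. The same todo text now lives in more than one place, which is the duplication being embraced here.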
Another common type of screen has search filters. If the number of search filters that a user can interact with is low (maybe fewer than 30), or if your filter is insanely simple (like our example of toggling between active and completed), you may still want to use the document storage strategy and just filter in the browser. For more complicated queries there is no reason you can’t create a new relational table or set of tables that you have specifically designed for this screen.
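For the simple case, the filtering logic is small enough to live entirely on the client. The sketch below shows that logic in Python for illustration only; in a real app it would run in the browser against the fetched document, and the document shape here is an assumption:

```python
def filter_todos(document, show="all"):
    """Filter an already-fetched screen document without another backend call."""
    todos = document["todos"]
    if show == "active":
        return [t for t in todos if not t["completed"]]
    if show == "completed":
        return [t for t in todos if t["completed"]]
    return list(todos)


# Hypothetical document, fetched once when the screen loads.
document = {
    "todos": [
        {"id": 1, "text": "buy milk", "completed": True},
        {"id": 2, "text": "write blog post", "completed": False},
    ]
}
active = filter_todos(document, "active")
```

Switching between all, active, and completed then never touches the server.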
The next article in this series will go over some strategies for embracing data duplication, how to think about writing data, and how to reason about data SLAs.
By developing the UI before the data model you can provide a first-class experience to your end user. Embracing data duplication makes UI-first development possible and can greatly speed up your user experience.
Continue Reading in Part 3 — Data Replication
Sign up for the beta of committeddb, a simple-to-use commit log database perfect for implementing the ideas in this series.