Semantic Versioning of Data Models

Hi everyone, this is Daniel with the Structure.rest team. We make analyzing your data easier with a graph-based editor that organizes your queries into pipelines. Here's a quick YouTube video to give you a better idea of what we do: https://www.youtube.com/watch?v=8uaov4xm764&feature=reddit

We’ve been doing customer interviews for the past couple of weeks, and the one feature that every data engineer we interviewed described as “table stakes”, a “must-have”, or a “basic need” was version control. I made this video - https://youtu.be/gVx4JhugCUc - showing how we implemented it. I built a simple version control menu that connects to the GitHub REST API (v3). At first I thought this would be enough, but as I have talked to more people, it has become clear that this is not a simple problem. If any of you have run into similar problems, please reach out. We’d love to learn more about them so we can offer better solutions in the future.

In data engineering, version control is useful when data sources change, ETL automation services change, schemas change, or business goals change. The big problem is that when something changes upstream of the models you are currently working on, you don’t want to rebuild your models from scratch or refresh all of your tables. I think semantic versioning is an excellent solution to this problem, because the version number itself can tell you whether an upstream change is breaking or not.
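To make that a bit more concrete, here is a rough Python sketch of one way semantic versioning could be applied to a table schema. This is not how Structure.rest implements it; the `Version` class, `classify_change`, `needs_full_refresh`, and the rule that only a major bump forces a full downstream refresh are all assumptions made up for illustration. Schemas are modeled as plain dicts mapping column name to type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """A MAJOR.MINOR.PATCH version attached to a data model (hypothetical)."""
    major: int
    minor: int
    patch: int

    def bump(self, level: str) -> Version:
        # Standard semver rule: bumping a part resets the parts to its right.
        if level == "major":
            return Version(self.major + 1, 0, 0)
        if level == "minor":
            return Version(self.major, self.minor + 1, 0)
        if level == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown bump level: {level!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def classify_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Map a schema diff to a bump level (an assumed policy, not a standard).

    - removed or retyped column -> "major" (can break downstream queries)
    - added column only         -> "minor" (backward compatible)
    - no schema difference      -> "patch" (e.g. a logic fix, same columns)
    """
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    retyped = {col for col in old.keys() & new.keys() if old[col] != new[col]}
    if removed or retyped:
        return "major"
    if added:
        return "minor"
    return "patch"


def needs_full_refresh(level: str) -> bool:
    # Assumption: downstream tables only need a rebuild on a breaking change.
    return level == "major"


if __name__ == "__main__":
    current = Version(1, 2, 3)
    old_schema = {"id": "int", "email": "text"}
    new_schema = {"id": "int", "email": "text", "signup_date": "date"}

    level = classify_change(old_schema, new_schema)
    print(level, current.bump(level), needs_full_refresh(level))
```

With a policy like this, adding a column upstream only bumps the minor version and downstream models can keep running, while dropping or retyping a column bumps the major version and flags the dependent tables for a refresh.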

Here's a blog article - https://www.structure.rest/blog/semantic-versioning-of-data-models - that goes into the problem in a bit more detail. If this kind of stuff excites you, please feel free to check us out at https://structure.rest or join our Slack: https://join.slack.com/t/structuresupport/shared_invite/zt-ddx04ho4-_q43i5o3zQ9jv00qx~dx8A