Data Infrastructure: an Introduction to the Semantic Layer

What is a data infrastructure?

Data infrastructure is how an organization extracts, processes, and integrates data from various sources in order to make it usable throughout the company. A modern data infrastructure often looks like this:

Data Infrastructure

The telephone game

See those little grey people off on the right? That’s the ‘business side’ of a business: the people who actually make use of the data, as opposed to the technical people in charge of maintaining the data itself. That doesn’t make the business side non-technical, though. Most positions these days require training and expertise in Business Intelligence (BI) tools, which are often extremely complicated.

You can see the potential for problems in the image above: the business side (and their tools) will essentially always be getting their information second-, third-, or even fourth-hand. This can make it hard to make informed decisions, even if you have a BI tool that you trust and are extremely proficient in using.

Consider a factory: you put truckloads of plastic, metal, and computer chips in one end (yes, my factory has ends. I’m making an analogy, not drawing up a blueprint) and finished goods come out the other. Depending on what happens inside the factory, those goods might be blenders or anti-aircraft missiles. Without an understanding of the processes taking place within the factory, Margarita Mondays become a lot more dangerous.

The semantic layer is a factory

Those multiple steps between the database and the users (data lakes, warehouses, and marts) are collectively known as the semantic layer. The semantic layer translates the structure of the database into a form that the business side can understand and easily manipulate.

For example, let’s take what should be a relatively straightforward concept: sales. “Sales” in the database will be a floating value formed by adding columns from multiple sources: new customer sales might be in one place, yearly sums for different sales teams in another, perhaps renewals in yet another.

What can be a little difficult to grasp is that, in the database, “Sales” is not a number; it’s a formula. It doesn’t really exist as a quantity.

The semantic layer translates that database representation of Column A + B + C (etc.) into the usable value of “Sales,” meaning the business side has a concrete figure to use in their analyses, PowerPoints, and graphs.
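
To make the "formula, not a number" idea concrete, here is a minimal Python sketch. The three source tables, their column names, and the figures are all invented for illustration; a real semantic layer would read from the warehouse rather than from in-memory lists.

```python
# Hypothetical source data: each list stands in for a table in a different system.
new_customer_sales = [{"amount": 1200.0}, {"amount": 800.0}]
team_yearly_sums = [
    {"team": "east", "total": 5000.0},
    {"team": "west", "total": 4000.0},
]
renewals = [{"amount": 1500.0}, {"amount": 500.0}]


def sales() -> float:
    """'Sales' as a formula over several sources, not a stored number."""
    column_a = sum(row["amount"] for row in new_customer_sales)
    column_b = sum(row["total"] for row in team_yearly_sums)
    column_c = sum(row["amount"] for row in renewals)
    return column_a + column_b + column_c


# The business side only ever sees the resulting figure.
print(sales())
```

Nothing called "Sales" is stored anywhere in this sketch. The figure only exists once the formula is evaluated, and that evaluation is the job the semantic layer does on behalf of the business side.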

Why has the semantic layer become more important in recent years?

The idea of a ‘semantic layer’ was patented back in 1991, so it has been around for a while. For a long time, a semantic layer was a technical function of BI tools: one piece of software would plug in to the company database and create its own semantic layer to interact with raw data.

A look at search trends for semantic layers over the last decade, however, shows that interest has grown quite dramatically in the last few years.

Data Infrastructure

Why? There are three main reasons for this:

1: The proliferation of BI tools

With the spread of BI tools, it has become common for multiple tools to be used within a single company. This leads to problems: for example, an indicator like “Sales” is defined separately within each tool, and differences between those definitions can produce slightly different figures.
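
Here is a small Python sketch of how this drift can happen. Both "tools" below are hypothetical functions and the order rows are made up; the only point is that two reasonable-looking definitions of "Sales" give different totals over the same data.

```python
# Made-up order rows that both tools read.
orders = [
    {"amount": 100.0, "refunded": False},
    {"amount": 250.0, "refunded": True},
    {"amount": 400.0, "refunded": False},
]


def sales_in_tool_a(rows):
    """Hypothetical BI tool A: 'Sales' counts every order."""
    return sum(r["amount"] for r in rows)


def sales_in_tool_b(rows):
    """Hypothetical BI tool B: 'Sales' excludes refunded orders."""
    return sum(r["amount"] for r in rows if not r["refunded"])


print(sales_in_tool_a(orders))  # 750.0
print(sales_in_tool_b(orders))  # 500.0
```

Neither definition is wrong on its own terms, which is exactly why the mismatch is hard to spot. Defining the metric once, in a layer that every tool reads from, removes the disagreement.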

2: The rise of SaaS

As the number of SaaS products deployed within the enterprise continues to grow, data inevitably ends up distributed across many sources rather than held in one single database. As a result, data integration has become increasingly complex, and it is no longer possible to fully understand the integrated database from the business side.

3: Black boxes

In recent years, the improvement and spread of AI technology has tied data assets directly to competitive advantage. However, even if data is available, it cannot be used if the user does not accurately understand its contents. We’ve all worked for a company with massive shared folders filled with spreadsheets with gnomic titles like “sales20final” and “sales20real.”

This has increased the need to manage data in a way that is understandable (and mistake-proof) for users, which is why building a semantic layer is considered increasingly important.

Wouldn’t it be nice to have a single SaaS that could perform all of your data integration, build your semantic layer, and also take the place of most BI tools? All without needing coding skills or an engineer?

Morph does all of these things, and it’s completely free to try. Give it a try, and see how it feels to be in full control of your company's semantic layer!