Real-Time Data Science and BI with Azure Synapse Analytics Part 1: Overview

Dawid Borycki


Jul 19, 2021

CPOL



This article discusses ways to analyze real-time data without affecting application performance.

Modern application architectures typically use hot and cold data stores. Hot data stores provide rapid responses to users. Cold data stores hold data for analysis, offering insights into how to improve business processes or adjust to changing markets. These insights are useful for data scientists, data engineers, developers, and business analysts.

Data science and business intelligence (BI) typically work from non-real-time warehoused data, periodically updated by extract, transform, and load (ETL) jobs. While that is useful, sometimes we want to query, understand, and visualize business data immediately as the underlying data changes. This way, we can adjust to change much faster and gain an advantage over our competitors.

This data analysis approach is called hybrid transactional and analytical processing (HTAP). The usual barrier to HTAP is that we do not want analytical queries to impact the performance of the transactional databases used by our deployed applications.

Azure Synapse Link enables us to perform data science and business intelligence on live data without hurting transactional database performance. Specifically, Azure Synapse Link for Azure Cosmos DB provides HTAP capabilities to run near real-time analytics based on hot data stored in Azure Cosmos DB.

The Azure Cosmos DB analytical store makes this possible. It is an isolated, column-based store that accelerates analysis. It takes data from the row-based transactional store and writes it to a separate column-based store, so analytical queries do not compete with transactional workloads.

When creating reports or processing data, we usually want to aggregate values from individual or selected fields. If the data is stored in column order, the values of a given field are serialized together, so an aggregation reads only the fields it needs. This reduces the required input/output operations per second (IOPS). Here is where the Azure Cosmos DB analytical store comes into play. It automatically synchronizes our operational data into a separate column-based store.
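To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. It is only an illustration of the layout idea, not how the Azure Cosmos DB analytical store is implemented. The records, the field names, and the helper functions are all hypothetical, and the sketch assumes every record has the same fields.

```python
# Hypothetical retail sales records; field names are illustrative only.
rows = [
    {"id": "1", "store": "A", "product": "tea", "quantity": 2, "price": 3.5},
    {"id": "2", "store": "B", "product": "coffee", "quantity": 1, "price": 4.0},
    {"id": "3", "store": "A", "product": "tea", "quantity": 5, "price": 3.5},
]


def to_columns(records):
    """Pivot a list of row dictionaries into one list of values per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_quantity_row_store(records):
    """Row layout: every whole record is visited to read a single field."""
    return sum(record["quantity"] for record in records)


def total_quantity_column_store(columns):
    """Column layout: only the values of the 'quantity' field are read."""
    return sum(columns["quantity"])


columns = to_columns(rows)
print(total_quantity_row_store(rows))
print(total_quantity_column_store(columns))
```

Both functions return the same total. The difference is that the column version touches a single list of values, while the row version has to walk through every record, including fields the aggregation never uses.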

The typical use cases for this approach include supply chain analytics, forecasting, reporting, real-time personalization, anomaly detection, and predictive maintenance. Almost all of these architectures involve additional components like Spark and Power BI. Spark enables running your analysis in parallel, while Power BI accelerates BI dashboard creation.

In this series of hands-on articles, we will explore how to use Azure Synapse Link over data stored in Azure Cosmos DB. We will start by importing sample retail sales data into an Azure Cosmos DB container using a Python notebook.
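As a rough preview of what such an import involves, the sketch below turns CSV text into JSON-style documents using only the Python standard library. It does not call Azure Cosmos DB or any Azure SDK. The sample data, the column names, and the `csv_to_documents` helper are assumptions made for illustration and may differ from the data set used later in the series. The one Cosmos DB detail it relies on is that each item carries a string `id` property.

```python
import csv
import io
import json
import uuid

# Hypothetical sample data; the column names are placeholders.
SAMPLE_CSV = """storeId,productCode,quantity,price
1,P-100,2,9.99
2,P-200,1,24.50
"""


def csv_to_documents(csv_text):
    """Convert CSV rows into dictionaries shaped like JSON documents."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        documents.append({
            "id": str(uuid.uuid4()),  # unique string identifier per document
            "storeId": int(row["storeId"]),
            "productCode": row["productCode"],
            "quantity": int(row["quantity"]),
            "price": float(row["price"]),
        })
    return documents


documents = csv_to_documents(SAMPLE_CSV)
print(json.dumps(documents[0], indent=2))
```

In a real notebook, each of these dictionaries would then be written to the container. That upload step is left out here because it depends on the account and container set up later in the series.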

Then, we will learn how to access this data from Azure Synapse Studio to perform the analysis and gain insight into our retail sales.

In the last step, we will create a BI dashboard, publish it, and access it in Azure Synapse Studio. Here we will gain deeper insight into our retail sales data.

Summary

Continue to the second article in this series to start setting up your environment. We will be using the Azure Portal, Azure Synapse Studio, and Python. Later in the series, we will create charts and gain insight into business data.

For even more information about using Azure Synapse to drive business intelligence and machine learning, check out Microsoft’s Hands-on Training Series for Azure Synapse Analytics.