
Real-Time Data Science and BI with Azure Synapse Analytics Part 1: Overview

19 Jul 2021, CPOL, 2 min read
This article discusses ways to analyze real-time data without affecting application performance.

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.

Modern application architectures typically use hot and cold data stores. Hot data stores provide rapid responses to users. Cold data stores hold historical data that helps data scientists, data engineers, developers, and business analysts find ways to improve business processes or adjust to changing markets.

Data science and business intelligence (BI) typically work from warehoused data that is not real-time, because extract, transform, and load (ETL) jobs refresh it only periodically. While that is useful, sometimes we want to query, understand, and visualize business data as soon as the underlying data changes. This way, we can adapt much faster and gain an advantage over our competitors.

This approach to data analysis is called hybrid transactional and analytical processing (HTAP). The usual barrier to HTAP is that we do not want analytical queries to degrade the performance of the transactional databases behind our deployed applications.

Azure Synapse Link enables us to perform data science and business intelligence on live data without hurting transactional database performance. Specifically, Azure Synapse Link for Azure Cosmos DB provides HTAP capabilities to run near real-time analytics based on hot data stored in Azure Cosmos DB.

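The Azure Cosmos DB analytical store makes this possible. It is a column-based store, isolated from the row-based transactional store, that accelerates analysis. Azure Cosmos DB takes data from the transactional store and writes it to this separate column-based store, so analytical queries do not compete with the application for transactional resources.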

When creating reports or processing data, we usually aggregate individual or selected fields across many items. If the data is stored in column order, the values of each field sit next to each other, so an aggregation reads only the columns it needs in fewer input/output operations, reducing the required input/output operations per second (IOPS). This is where the Azure Cosmos DB analytical store comes into play: it automatically synchronizes our operational data into its column-based layout.
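To see why column order reduces I/O, here is a toy Python model. It is an illustration under simplified assumptions, not how the analytical store is implemented: storage is split into hypothetical fixed-size pages of `PAGE_SIZE` values, and the sales fields (`store`, `product`, `quantity`, `price`) and their values are made up.

```python
# Toy model (not how Azure Cosmos DB is implemented): count the storage
# pages a single-field aggregation must read under two layouts.

PAGE_SIZE = 4  # hypothetical: number of values that fit in one page


def row_layout(docs, fields):
    """Row order: every field of doc 0, then every field of doc 1, ..."""
    return [(f, doc[f]) for doc in docs for f in fields]


def column_layout(docs, fields):
    """Column order: every value of field 0, then every value of field 1, ..."""
    return [(f, doc[f]) for f in fields for doc in docs]


def pages_touched(layout, field):
    """Indices of the pages holding at least one value of `field`."""
    return {i // PAGE_SIZE for i, (f, _) in enumerate(layout) if f == field}


def sum_field(layout, field):
    """Aggregate one field; the result is the same for either layout."""
    return sum(v for f, v in layout if f == field)


fields = ["store", "product", "quantity", "price"]
docs = [
    {"store": f"S{i % 2}", "product": f"P{i}", "quantity": i + 1, "price": 10.0}
    for i in range(8)
]

rows = row_layout(docs, fields)
cols = column_layout(docs, fields)

print(sum_field(rows, "price"), sum_field(cols, "price"))  # 80.0 80.0
print(len(pages_touched(rows, "price")))  # 8: every page holds a price
print(len(pages_touched(cols, "price")))  # 2: prices are contiguous
```

With four fields per document and four values per page, the row layout forces the aggregation to read all eight pages, while the column layout reads only the two pages that hold prices. Both layouts produce the same sum; only the amount of data read differs.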

Typical use cases for this approach include supply chain analytics, forecasting, reporting, real-time personalization, anomaly detection, and predictive maintenance. Almost all of these architectures involve additional components like Apache Spark and Power BI. Spark runs the analysis in parallel, while Power BI speeds up building BI dashboards.

In this series of hands-on articles, we will explore how to use Azure Synapse Link over data stored in Azure Cosmos DB. We will start by importing sample retail sales data into an Azure Cosmos DB container using a Python notebook:
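The import step could look roughly like the following sketch. It assumes the `azure-cosmos` Python SDK (version 4.x), where `ContainerProxy.upsert_item` writes one JSON document. The document schema (`storeId`, `productCode`, `quantity`, `price`, `orderDate`), the use of `storeId` as the partition key, and the helper names `make_sales` and `load` are hypothetical; they are not the series' actual sample dataset or notebook code.

```python
# Sketch: generate hypothetical retail-sales documents and upsert them into
# an Azure Cosmos DB container. The schema below is an assumption.

import random
from datetime import date, timedelta

PRODUCTS = ["P-100", "P-200", "P-300"]  # hypothetical product codes


def make_sales(n, seed=0):
    """Return `n` hypothetical sales documents (seeded, so reproducible)."""
    rng = random.Random(seed)
    start = date(2021, 1, 1)
    return [
        {
            "id": str(i),  # Cosmos DB items need a string "id" property
            "storeId": f"store-{rng.randint(1, 5)}",  # assumed partition key
            "productCode": rng.choice(PRODUCTS),
            "quantity": rng.randint(1, 10),
            "price": round(rng.uniform(1.0, 50.0), 2),
            "orderDate": (start + timedelta(days=rng.randint(0, 180))).isoformat(),
        }
        for i in range(n)
    ]


def load(container, docs):
    """Upsert each document via `container.upsert_item` and return the count.

    `container` is expected to be an azure.cosmos ContainerProxy, but any
    object with an `upsert_item(body)` method works (useful for testing).
    """
    for doc in docs:
        container.upsert_item(doc)
    return len(docs)


# Wiring against a real account (placeholders; requires `pip install azure-cosmos`):
# from azure.cosmos import CosmosClient
# client = CosmosClient("<account-uri>", credential="<account-key>")
# container = client.get_database_client("<database>").get_container_client("<container>")
# load(container, make_sales(100))
```

Upserting (rather than creating) keeps the sketch idempotent: rerunning the notebook cell overwrites items with the same `id` and partition key instead of failing on conflicts.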

Image 1

Then, we will learn how to access this data from Azure Synapse Studio to perform the analysis and gain insight into our retail sales:

Image 2

In the last step, we will create a BI dashboard, publish it, and access it in Azure Synapse Studio to gain deeper insight into our retail sales data.

Image 3

Summary

As shown above, we will be using the Azure portal, Azure Synapse Studio, and Python. Continue to the second article in this series to start setting up your environment and to learn how to create charts and gain insight into business data.

For even more information about using Azure Synapse to drive business intelligence and machine learning, check out Microsoft’s Hands-on Training Series for Azure Synapse Analytics.

This article is part of the series 'Real-Time Data Science and BI with Azure Synapse Analytics'.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States
Dawid Borycki is a software engineer and biomedical researcher with extensive experience in Microsoft technologies. He has completed a broad range of challenging projects involving the development of software for device prototypes (mostly medical equipment), embedded device interfacing, and desktop and mobile programming. Borycki is an author of two Microsoft Press books: “Programming for Mixed Reality (2018)” and “Programming for the Internet of Things (2017).”
