Click here to Skip to main content
15,881,561 members
Articles / Containers

Utilizing Azure Blob and WebJob to Convert Excel Files to Flat File Format

Rate me:
Please Sign up or sign in to vote.
0.00/5 (No votes)
24 Feb 2020CPOL4 min read 11.1K   103   3   1
How to perform Excel to comma separated file conversion without storing them on local hard drive
This article shows how to perform Excel to CSV file conversion without storing them on local hard drive. Though there are many articles on this topic, this one is different in the way files are read and saved back. Using Azure storage and WebJob, we convert files from one format to another without utilizing the local disk space for saving file during this entire conversion process.

Introduction

I believe there are many articles or blogs already available which show how to convert an Excel file to a comma separated file using C# and in all the cases (which I referred to), Excel is read from a hard drive of a local machine and CSV file is saved back to the same hard drive. But in spite of knowing this, again, I’m going to draft another post.

Wondering why?

Well, this post is going to be slightly different in the way files are being read and saved back. Below are the major offerings of this post:

  • What if we have many Excel files to convert but the disk does not have enough space to save all of those? Same is the case for conversion output too.
  • What if we don’t have permission to save our converted files on to the local machine?
  • How can we run this conversion utility using web jobs?

In order to address the above challenges, we can utilize Azure capabilities wherein we will do everything on the fly without utilizing disk space as a storage for our files. Let’s see everything in action by going step by step.

Problem Statement

Read Excel files from Azure blob storage, convert them to CSV format and upload them back to Azure blob storage. This entire process has to run via triggered web job, so that it can be repeated as and when Excel to CSV conversion is required.

Setting Up Environment

I’m using Visual Studio 2019 v16.4.0 and have an active Azure subscription.

High Level Tasks

Below is the list of high level tasks which we will be performing:

  • Creating containers in Azure storage
  • Reading Excel from Azure storage
  • Converting Excel to CSV format
  • Uploading CSV to Azure storage
  • Creating Azure WebJob
  • Triggering Azure WebJob

Creating Containers in Azure Storage

A container must be created under blob service to store all the Excel files which need to be converted to CSV format. Now there are two ways, one can create a container – one is through the Azure portal and another one is by using C#. As both these are easily available on MSDN, I’m not going to repeat the entire procedure. For detailed steps on how to create a container, please refer to the references section.

For our exercise, I’ve created two containers named excelcontainer and csvcontainer under one storage account, where:

  • excelcontainer – holds Excel files which are to be converted to CSV
  • csvcontainer – holds the converted CSV files

Below is the screenshot of my excelcontainer, which holds three Excel workbooks:

Image 1

Reading Excel from Azure Storage

Now we have excelcontainer ready with uploaded files, it’s time to read data from all those files and here is the code to do that:

C++
public async Task<List<BlobOutput>> Download(string containerName)
        {
            var downloadedData = new List<BlobOutput>();
            try
            {
                // Create service and container client for blob
                BlobContainerClient blobContainerClient = 
                        _blobServiceClient.GetBlobContainerClient(containerName);

                // List all blobs in the container
                await foreach (BlobItem item in blobContainerClient.GetBlobsAsync())
                {
                    // Download the blob's contents and save it to a file
                    BlobClient blobClient = blobContainerClient.GetBlobClient(item.Name);
                    BlobDownloadInfo downloadedInfo = await blobClient.DownloadAsync();
                    downloadedData.Add(new BlobOutput 
                       { BlobName = item.Name, BlobContent = downloadedInfo.Content });
                }
            }
            catch (Exception ex)
            {
                throw ex;
            }
            return downloadedData;
        }

where BlobOutput is the DTO with below members:

C++
public class BlobOutput
{        
      public string BlobName { get; set; }
      public Stream BlobContent { get; set; }
}

Converting Excel to CSV Format

In the above step, we have collected the data from each blob object into a stream. So, in this step, we will convert the streamed data into CSV format and here is the code for that:

C++
public static List<BlobInput> Convert(List<BlobOutput> inputs)
        {
            var dataForBlobInput = new List<BlobInput>();
            try
            {
                foreach (BlobOutput item in inputs)
                {
                    using (SpreadsheetDocument document = 
                           SpreadsheetDocument.Open(item.BlobContent, false))
                    {
                        foreach (Sheet _Sheet in 
                                 document.WorkbookPart.Workbook.Descendants<Sheet>())
                        {
                            WorksheetPart _WorksheetPart = 
                               (WorksheetPart)document.WorkbookPart.GetPartById(_Sheet.Id);
                            Worksheet _Worksheet = _WorksheetPart.Worksheet;

                            SharedStringTablePart _SharedStringTablePart = 
                                     document.WorkbookPart.GetPartsOfType
                                               <SharedStringTablePart>().First();
                            SharedStringItem[] _SharedStringItem = 
                                     _SharedStringTablePart.SharedStringTable.Elements
                                               <SharedStringItem>().ToArray();

                            StringBuilder stringBuilder = new StringBuilder();
                            foreach (var row in _Worksheet.Descendants<Row>())
                            {
                                foreach (Cell _Cell in row)
                                {
                                    string Value = string.Empty;
                                    if (_Cell.CellValue != null)
                                    {
                                        if (_Cell.DataType != null && 
                                            _Cell.DataType.Value == CellValues.SharedString)
                                            Value = _SharedStringItem[int.Parse
                                                    (_Cell.CellValue.Text)].InnerText;
                                        else
                                            Value = _Cell.CellValue.Text;
                                    }
                                    stringBuilder.Append(string.Format("{0},", Value.Trim()));
                                }
                                stringBuilder.Append("\n");
                            }

                            byte[] data = Encoding.UTF8.GetBytes
                                               (stringBuilder.ToString().Trim());
                            string fileNameWithoutExtn = item.BlobName.ToString().Substring
                                             (0, item.BlobName.ToString().IndexOf("."));
                            string newFilename = $"{fileNameWithoutExtn}_{_Sheet.Name}.csv";
                            dataForBlobInput.Add(new BlobInput { BlobName = newFilename, 
                                                                 BlobContent = data });
                        }
                    }
                }
            }
            catch (Exception Ex)
            {
                throw Ex;
            }
            return dataForBlobInput;
        }   

where BlobInput is the DTO with below members:

C++
public class BlobInput
 {
        public string BlobName { get; set; }
        public byte[] BlobContent { get; set; }
 }

If a workbook contains multiple sheets, then a separate csv will be created for each sheet with the file name format as <ExcelFileName>_<SheetName>. csv.

Uploading CSV to Azure Storage

Once the data is converted to CSV, we are good to go for uploading the CSV files back to container and here is the code to perform this:

C++
public async Task Upload(string containerName, List<BlobInput> inputs)
        {
            try
            {
                // Create service and container client for blob
                BlobContainerClient blobContainerClient = 
                         _blobServiceClient.GetBlobContainerClient(containerName);

                foreach (BlobInput item in inputs)
                {
                    // Get a reference to a blob and upload
                    BlobClient blobClient = 
                        blobContainerClient.GetBlobClient(item.BlobName.ToString());

                    using(var ms=new MemoryStream(item.BlobContent))
                    {
                        await blobClient.UploadAsync(ms, overwrite: true);
                    }                    
                }                
            }
            catch (Exception ex)
            {
                throw ex;
            }
        }

So far, we have read the Excel file from container, convert it to CSV format and uploaded back to another container. All good. The next task is to automate this using triggered WebJob.

Creating Azure WebJob

WebJob can be created using Visual Studio by right clicking on the project and selecting Publish

Apart from this, there are many ways to create a triggered WebJob and all are mentioned over here on MSDN.

Triggering Azure WebJob

If everything is setup correctly, you will be able to see the below screen on your Azure portal:

Image 2

As this is triggered WebJob, clicking on Run button will trigger this job and will create output as shown below:

Image 3

Takeaway

Using Azure storage and WebJob, we have converted files from one format to another without utilizing the local disk space for saving file during this entire conversion process.

References

History

  • 24th February, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Team Leader
United States United States
I am 5 times C# Corner MVP, a blogger and technical contributor at various forums like Microsoft TechNet, C# Corner, Code Project ,etc. I received several awards my community contributions. I have also presented technical contents as an speaker.

Comments and Discussions

 
QuestionBlob data to function Pin
Member 1029087222-Jul-20 4:10
Member 1029087222-Jul-20 4:10 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.