Introducing SQL Clone

Redgate Software

0/5 (0 vote)

Mar 10, 2017

CPOL

8 min read

16927

SQL Clone exploits the existing virtualization technology within Windows to take much of the sting out of the traditional problems surrounding database provisioning.

Introduction

It can be a time-consuming and often frustrating task to provision all development, test and delivery environments, such as UAT, pre-production and so on, with the latest copy of the production database, or with "production-like" data. Moving large databases around takes time, and there are inevitably painful issues to resolve around disk space, database configuration, permissions and more.

I'll describe how a new tool from Redgate, SQL Clone, can remove much of this pain. I'll demonstrate a nightly provisioning job, in PowerShell, that will make multiple copies of a database available for non-production environments, with data as of the nightly backup. SQL Clone allows us to create a 'data image' from a database or backup, and then reuse that data image many times, delivering database clones to multiple SQL Server instances in a matter of seconds, and using only a few megabytes of disk space for each clone. From SQL Server's perspective, a clone is just a normal database, and we can query and update the clone just as any other database, with any changes isolated in a local differencing file.

For more information and to download a trial, head over to the SQL Clone product page.

The pains of Database Provisioning

Increasingly, developers expect to be able to run scripts to make available, reliably and predictably, all required database and application resources. This includes the ability to develop and test against production-like data, in databases installed in a production-like environment. It means they can perform realistic load, integration and acceptance testing as early as possible, and see the true impact their changes are likely to have, and can reproduce quickly the production conditions required to diagnose bugs and implement hotfixes.

However, databases are unlike other resources; they are by definition mutable, so only really exist as of a given point in time; they are large, averaging around 100GB according to our latest survey; they have to be mounted or attached to a database management system; the commands required to make them available in the application context are complex and specific.

The pains of database provisioning are all too familiar, the disk space juggling, the endless ad hoc requests for an environment 'refresh'. I've experienced first-hand the frustration of waiting for backups to restore to a shared development database, and then getting in each other’s way when making changes, and having to request a new restore in order to revert to the baseline.

And yet, just how hard can it be to automate this task? In fact, creating a solution that will copy down a large database to any environment, and make it available in correct configuration, is trickier than you might think.

Let’s have a look behind the curtain. As a DBA working with a software team, I'll start from a familiar place, my existing database backup maintenance plan in SSMS:

Surely all I need to do is reverse the process, restore the backup and run an integrity check, but this time do it in a way that can be scheduled, or run on demand?

Back to the maintenance wizard, but there doesn’t seem to be an option for restoring the database:

No problem, I’m pretty sure I’ll need to write a script anyway. I’ll just use the Restore Database wizard, then hit the Script button to get my starting point.

This gives me the following block of T-SQL:

USE [master]
RESTORE DATABASE [Forex-Copy] FROM  DISK = N'E:\SQL\MSSQL12.MSSQLSERVER\MSSQL\Backup\Forex_backup_2017_02_20_092655_6408207.bak' WITH  FILE = 1, 
 MOVE N'Forex' TO N'E:\SQL\MSSQL12.MSSQLSERVER\MSSQL\DATA\Forex-Copy.mdf', 
 MOVE N'Forex_log' TO N'E:\SQL\MSSQL12.MSSQLSERVER\MSSQL\DATA\Forex-Copy_log.ldf'
 ,  NOUNLOAD,  STATS = 5
GO

So, RESTORE DATABASE x FROM DISK= ' … '. Ah, my files have timestamps. No matter, I’ll put together some PowerShell to query the backup folder and find out what my filenames are:

WITH MOVE … hmm, but the drives are different for each of the servers to which I need to restore. OK, so now I need to create a PowerShell function that takes an environment name, and resolves it to a file path. Shouldn't be too tricky. What? Some have two files, some have more? That’s OK, I'm sure my scripting skills can handle it…although I hope the destination paths have enough disk space!

Of course, on my production server, I have various service accounts for my application (Orders service, Invoices service and so on). So, right after the restore I’ll need to drop those Production logins, and then create environment-specific logins (UAT_Orders, UAT_Invoices etc.), and map them to the users and roles in that database. I’m sure the application developers will let me know in plenty of time every time they need a new service account?

Hopefully, by now, you get the picture. While it may sound simple to script a database restore and then adapt it, in order to automate database provisioning to various environments, it's not. In fact, many of the scripted solution I've encountered personally have tended to be rather brittle and unreliable.

SQL Clone is a dedicated database-provisioning tool that can remove some of these problems entirely. For example, it uses the VM technology built into Windows to provide storage virtualization; you no longer need to think about disks. Its PowerShell interface makes it very easy to automate provisioning, updates and maintenance of the clone databases.

Another Way – Using SQL Clone

SQL Clone uses the Virtual Disk Service in x64 Windows to allow the same bytes (the 'data Image') to be reused many times, and on multiple SQL Server instances, as 'Clone' databases. The changes made on each clone are stored in a differencing disk on the local clone machine, so the data Image is immutable. In other words, the actual database image is held only once, in the network, and never updated: all changes are held locally with the clone, using existing virtualization technologies within Windows.

The provisioning time for each clone is just the time it takes to set up the .vhd and mount the database (typically seconds), and the initial clone requires only about 40MB of disk space.

You can find out more about how SQL Clone works here, but let me show you it in action.

Worked example of database provisioning

You'll need a database to play with. In this example, I use a Foreign Exchange rate (Forex) database that is about 48GB. However, a great, publicly-available alternative is to use Brent Ozar’s copy of the StackOverflow database, which weighs in at 95GB; not huge, but big enough to be a pain to copy around. The following screen shows a few of my existing data images, in the SQL Clone UI:

A common approach for many users, in the first instance, will be to set up SQL Clone using the UI, and then start thinking about automation with PowerShell. That's the approach I'll use here; I'll create an initial data image through the UI, and then automate image creation, and provisioning of clones from that image, using PowerShell.

SQL Clone's user interface provides a simple Create Image workflow. I can select a backup or a direct SQL Server connection as the source. In this example, I'm creating the data image from a backup, so I provide the path to the backup and identify a SQL Server instance (SQL Server has to initialize a restore operation on this instance, which creates the data image).

After that, just follow the instructions and hit Create Image.

Scripting data image creation

The developers will want to be using the latest data, so I need to automate this task so that it runs nightly, in a batch during my maintenance window. Time to turn to the PowerShell interface for SQL Clone.

# Script to create a new SQL Clone data image from a backup file

$SQLCloneServer= "http://mysqlcloneserver.red-gate.com:14145"
$SQLCloneAgent = "rm-iclone1"

Connect-SqlClone -ServerUrl $myUrl

$SourceDatabase = 'Forex'
$BackupFolder = 'E:\SQL\MSSQL12.MSSQLSERVER\MSSQL\Backup'
 
if (!(Test-Path ($BackupFolder)))
  {
    write-host 'Backup folder not found. Exiting.'
    break
  }
 
# Get the latest backup file for our database (striped backups would be more complex)
$BackupFiles = Get-ChildItem -Path $BackupFolder  |
    Where-Object -FilterScript { $_.Name.Substring(0,$SourceDatabase.Length) -eq $SourceDatabase}   # My backup files always start with the database name
 
# Now we have a filtered list, sort to get latest
$BackupFile = $BackupFiles |
    Sort-Object -Property LastWriteTime  |   
    Select-Object -Last 1 # I only want the most recent file for this database to be used
 
$BackupFileName = $BackupFile.Name

#Start a timer
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()
 
"Started at {0}, creating data image for database ""{1}"" from backup file ""{2}""" -f $(get-date) , $SourceDatabase , $BackupFileName
 
$DataImageName = $SourceDatabase + "_" + (Get-Date -Format "yyyyMMdd") # Prepare a name for the data image, with a timestamp
$ImageDestination = Get-SqlCloneImageLocation -Path '\\is-filestore.red-gate.com\rm-iclone\RM\SQL Clone Beta Images' # Point to the file share we want to use to store the image
$CloneBackupLocation = Get-SqlCloneBackupLocation -Path $BackupFolder # Point to the backup folder we want to work with (this was 'registered' with SQL Clone when I used the UI above)

$NewImage = New-SqlCloneImage -Name $DataImageName -BackupLocation $CloneBackupLocation -BackupFileName $BackupFileName -Destination $ImageDestination | Wait-SqlCloneOperation # Create the data image and wait for completion

"Total Elapsed Time: {0}" -f $($elapsed.Elapsed.ToString())

Listing 1: Automating data image creation

If I set the script running and then return to the SQL Clone UI, I can see the image creation in progress. This took 20 minutes for my 48GB database, but remember that I can create multiple clone databases from this image, at little further cost in terms of time or disk space.

Scripting clone deployment to multiple environments

Having automated creation of a nightly data image, developers can self-serve the creation of clone databases from that image, on-demand, and requiring only a handful of megabytes per copy.

In this case, I'm just going to deliver a database from that nightly image to a few environments and adjust some permissions following delivery.

# Script to create a new SQL Clone database on each of my connected machines

$SQLCloneServer= "http://rm-win10-sql201.testnet.red-gate.com:14145"

Connect-SqlClone -ServerUrl $myUrl

$SourceDataImage = Get-SqlCloneImage -Name  'StackOverflow 20170117'

$CloneName = 'StackOverflow_Latest'

# I have 3 SQL Server instances registered on my SQL Clone Server - I want to deliver a copy to all of them
$Destinations = Get-SqlCloneSqlServerInstance

# I'm only going to make a small adjustment to permissions in this example
$Query = "CREATE USER StackOverflowUser FROM LOGIN [RED-GATE\Richard.Macaskill];ALTER ROLE db_datareader ADD member [StackOverflowUser];"

# Start a timer
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()

"Started at {0}, creating clone databases for image ""{1}""" -f $(get-date) , $SourceDataImage.Name

foreach ($Destination in $Destinations)
{
    $SourceDataImage | New-SqlClone -Name $CloneName -Location $Destination | Wait-SqlCloneOperation
    $ServerInstance = $Destination.Server + '\' +$Destination.Instance
    Start-Sleep -s 10
    Invoke-Sqlcmd -Query $Query -ServerInstance $ServerInstance -Database $CloneName
    "Created clone in instance {0}" -f $Destination.Server + '\' + $Destination.Instance;  
}

"Total Elapsed Time: {0}" -f $($elapsed.Elapsed.ToString())

Listing 2: Scripting Clone delivery

If I look in the UI of SQL Clone, I can see that my databases have been delivered, taking up a handful of megabytes on each instance.

For a really smooth workflow, there is a lot more I could do with the script in Listing 2. For example, I might want to perform some trivial data masking. In this example, we're cloning a production database, so we're assuming there are no data sensitivity issues, but there still might be data that you don't want the application to use (such as real email addresses). We can do this quite simply by running the invoke-sqlcmd cmdlet on the clone database. We could also build in automated database configuration, and possibly also some clone maintenance. For example, some early users set up '‘Cinderella’ database clones; basically SQL Clone scripts to remove any SQL Clone databases (and images) older than 7 days.

Having configured a clone as required for a particular environment, you have the option of creating a fresh data image from that clone, as part of the overnight batch (though of course, the job would take longer).

Conclusion

SQL Clone exploits the existing virtualization technology within Windows to take much of the sting out of the traditional problems surrounding database provisioning.

Redgate will be working on additional capabilities in the coming months. For now, get your trial copy here, and see what it can do for your database provisioning workflows.