Backing up Azure Storage Accounts…

New year, new challenges. And I was confronted with quite a nice one.

One of my customers uses Azure Storage quite intensively. While Azure Storage Accounts provide some protection by means of replication, there’s no real protection against corruption or deletion inside the Storage Account itself: data that is deleted is deleted on the replicas as well. Unfortunately, there is no mechanism comparable to DFS Replication for replicating data between Storage Accounts. Governance constraints may also prevent the use of Geo-redundant storage, and even Geo-redundant storage cannot guarantee that the secondary location still holds the data as it was before it became corrupted or deleted.

So a mechanism must be developed to protect the data from a potential disaster. Only Blob, File and Table Storage are valid candidates for protection. Microsoft has released a tool that can copy content to a different Storage Account (including from and to local disk): AzCopy.

The relevant information regarding AzCopy is available at https://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/

AzCopy is a command-line tool that copies content, but it is very static by nature. That may be fine for a handful of Blob containers or Table URIs, but a more versatile approach is required when hundreds of Blob containers and Table URIs need to be copied, especially when they change frequently.
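
To give an idea of the syntax, a single static AzCopy call for one known container could look roughly like this (account names, container names and keys are placeholders, and this assumes the classic AzCopy command-line syntax that is also used further down):

"C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe" /Source:https://sourceaccount.blob.core.windows.net/mycontainer /Dest:https://destinationaccount.blob.core.windows.net/backupcontainer /SourceKey:<SourceKey> /DestKey:<DestinationKey> /S /Y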

Fortunately, I was able to use PowerShell to ‘feed’ AzCopy the required parameters and get the job done. The resulting script focuses on Blob and Table Storage only, but it can easily be extended to cover File Storage as well. For the sake of this blog post, Access Keys are used (the secondary ones); switching to SAS tokens requires only minor modifications.

Before proceeding, the workflow needs to be defined:

  • Provide the source and destination Storage Accounts; the destination Storage Account will store everything as Blobs, meaning that Tables are exported to a .MANIFEST and a .JSON file (so the table can be recreated later);
  • Retrieve the list of Blob containers and ‘convert’ them to URLs;
  • Copy each Blob container to the destination Storage Account; all containers are stored as virtual directories in a single container to maintain the structure;
  • Retrieve the list of Table URIs;
  • Export each URI to a .MANIFEST and a .JSON file and store them as blobs in the destination Storage Account;
  • Log all copy actions (a high-level sketch of this flow follows below).
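
Put together, the script that implements this workflow has the following overall shape (a sketch only; the full script with all parameters appears further down):

#Sketch of the overall backup flow; the numbered comments map to the workflow above
Import-Csv F:\Input\Storage_Accounts.csv | ForEach-Object {
    # 1. Build the source and destination URLs and pick up both Access Keys for this row
    # 2. List all Blob containers and run AzCopy once per container (stored as a virtual directory)
    # 3. List all Table URIs and run AzCopy once per table (exported as .MANIFEST/.JSON blobs)
    # 4. Let every AzCopy call write its own verbose log file
}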

The benefit of AzCopy is that it tells the Azure platform to perform the copy, so AzCopy itself does no extensive processing. This allows the script to run on a modest Azure VM: a Standard_A1 VM is sufficient, although additional disks are highly recommended when large Tables are used.

Unfortunately, running multiple instances of AzCopy.exe in parallel is not possible, so everything must be done sequentially. Alternatively, multiple VMs can be spun up, each backing up its own set of Storage Accounts. This is highly recommended, because backing up Storage Accounts can be very time consuming, especially with large sets of small files or very large Tables. Additionally, some small text files are used to store the items retrieved; they also make it easy to process everything with ForEach loops.

To repeat the actions for multiple Storage Accounts, a .csv file is used as input. The .csv file may look like this:

SourceStorageName,SourceStorageKey,DestinationStorageName,DestinationStorageKey
source1,sourcekey1,destination1,destinationkey1
source2,sourcekey2,destination2,destinationkey2
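
Import-Csv turns every row into an object whose properties match the header names, which is exactly what the script relies on; a minimal sketch:

#Sketch: each CSV row becomes an object with the four columns as properties
Import-Csv -Path 'F:\Input\Storage_Accounts.csv' | ForEach-Object {
    Write-Output ('Backing up ' + $_.SourceStorageName + ' to ' + $_.DestinationStorageName)
}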

The actual script uses a lot of variables. AzCopy is called via the Start-Process cmdlet, and the parameters for AzCopy.exe are assembled into a single long string that is passed to the -ArgumentList parameter.
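
In isolation, that pattern looks roughly like this (the URLs and keys are placeholders; in the real script the argument string is rebuilt for every container and table):

#Sketch: assemble the AzCopy arguments as a single string and wait for the copy to finish
$FilePath = 'C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe'
$ArgumentList = '/Source:https://sourceaccount.blob.core.windows.net/mycontainer ' +
    '/Dest:https://destinationaccount.blob.core.windows.net/backupcontainer ' +
    '/SourceKey:<SourceKey> /DestKey:<DestinationKey> /S /Y'
Start-Process -FilePath $FilePath -ArgumentList $ArgumentList -Wait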

So here’s the script I used to achieve the goal of backing up Blob containers and Table URIs:

#
# Name: Copy_Storage_Account_AzCopy.ps1
#
# Author: Marc Westerink
#
# Version: 1.0
#
# Purpose: This script copies Blob and Table Storage from a source Storage Account to a Destination Storage Account.
# File F:\Input\Storage_Accounts.csv contains all Source and Destination Storage.
# All Blob Containers and Tables will be retrieved and processed sequentially.
# All content will be copied as blobs to a container named after the Source Storage Account. A virtual directory will be created for each blob container.
#
# Requirements:
# - Storage Accounts with Secondary Access Keys
# - AzCopy needs to be installed
# - Azure PowerShell needs to be installed
# - An additional disk to store Temporary Files (i.e. F:\ drive)
# - A Temp Folder (i.e. F:\Temp) with two Text Files 'temp.txt' and 'output.txt'. The Temp Folder will be used by AzCopy.
# - A Folder to store all log files (i.e. F:\Logs)
#

# First, let's create the required global variables

# Get the date the script is being run
$Date = Get-Date -Format "dd-MM-yyyy"

# AzCopy Path
$FilePath='C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe'

#Temp Files, let's make sure they're cleared before starting

$File1='F:\Temp\temp.txt'
Clear-Content -Path $File1

$File2='F:\Temp\output.txt'
Clear-Content -Path $File2

#Recursive Parameter: DO NOT use for Table Storage
$Recursive='/S'

#Suppress prompt popups
$Prompt='/Y'

#Temporary Directory for AzCopy
$TempDir='F:\Temp'

#SplitSize parameter for Tables, this will split a large table into separate .JSON files of 1024 MB
$SplitSize='/SplitSize:1024'

#DestType parameter, required for copying tables as blobs to the Destination
$DestType='/DestType:Blob'

#Temporary Directory for AzCopy Journal files
$Journal='/Z:F:\Temp'

#Blob path
$Blob='.blob.core.windows.net/'

#https prefix
$HTTPS='https://'

#Let's import the CSV and process all Storage Accounts
Import-Csv F:\Input\Storage_Accounts.csv | % {

#Creating the Full Path of the Source Storage Account Blob
$SourceStoragePath=$HTTPS+$_.SourceStorageName+$Blob

#Creating the Full Path of the Destination Storage Container; if it doesn't exist it will be created
$DestStorageContainer=$HTTPS+$_.DestinationStorageName+$Blob+$_.SourceStorageName+$Date

#Gather the Source Access Key
$SourceStorageKey=$_.SourceStorageKey

#Gather the Destination Access Key
$DestinationStorageKey=$_.DestinationStorageKey

#Defining the log file for verbose logging with the Source Storage Account Name and the date
$Verbose='/V:F:\Logs\'+$_.SourceStorageName+$Date+'.log'

#Create the Azure Storage Context to gather all Blobs and Tables
$Context = New-AzureStorageContext -StorageAccountName $_.SourceStorageName -StorageAccountKey $_.SourceStorageKey

#Copy blob containers first

#Get all containers
Get-AzureStorageContainer -context $context | Select Name | % {

Add-Content -Path $File1 -Value $_.Name
}

#Convert all Container Names to full paths and write them to the Output File
Get-Content $File1 | % {

Add-Content -Path $File2 -Value $SourceStoragePath$_
}

#Process all Containers using the Output File as input
Get-Content $File2 | % {

#Gather virtual directory name using the container name
$VirtualDirectory= $_ -replace $SourceStoragePath,''

$ArgumentList='/Source:'+$_+' '+'/Dest:'+$DestStorageContainer+'/'+$VirtualDirectory+' '+'/SourceKey:'+$SourceStorageKey+' '+'/DestKey:'+$DestinationStorageKey+' '+$Recursive+' '+$Verbose+' '+$Prompt+' '+$Journal
Start-Process -FilePath $FilePath -ArgumentList $ArgumentList -Wait
}

#Before proceeding, let's clean up all files used
Clear-Content -Path $File1
Clear-Content -Path $File2
#Get All Tables
Get-AzureStorageTable -context $context | Select Uri | % {

Add-Content -Path $File2 -Value $_.Uri
}

#Process all Tables using the Output File as input
Get-Content $File2 | % {

$ArgumentList='/Source:'+$_+' '+'/Dest:'+$DestStorageContainer+' '+'/SourceKey:'+$SourceStorageKey+' '+'/DestKey:'+$DestinationStorageKey+' '+$SplitSize+' '+$Verbose+' '+$Prompt+' '+$DestType+' '+$Journal
Start-Process -FilePath $FilePath -ArgumentList $ArgumentList -Wait

}

#Cleanup Output File
Clear-Content -Path $File2
}

To run this script on a schedule, a simple Scheduled Task can be created. The schedule itself depends on the environment’s needs and on the time it takes to copy everything. It’s not uncommon for a large Storage Account with countless small blobs and huge Tables to take a week or even longer to copy…
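
As an example, a weekly task could be registered with the built-in ScheduledTasks cmdlets (the script location, task name and schedule below are assumptions, not part of the original setup):

#Sketch: register a weekly task that runs the backup script every Sunday at 01:00
$Action = New-ScheduledTaskAction -Execute 'PowerShell.exe' -Argument '-ExecutionPolicy Bypass -File F:\Scripts\Copy_Storage_Account_AzCopy.ps1'
$Trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 1am
Register-ScheduledTask -TaskName 'Backup Storage Accounts' -Action $Action -Trigger $Trigger -RunLevel Highest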

Posted on 02/02/2016 in Azure, PowerShell, Public Cloud
