In my day to day role as an Azure Solution Architect I get involved in some pretty substantial and very complex deployments for customers which requires a lot of planning and design work. One thing I have found, especially in this new cloud world is that there are about a dozen ways to solve the customer’s problem and each one would be technically right. It typically comes down to what gives the customer the best solution without breaking the bank.
One of the more complex issues I’ve found when working on IaaS deployments with a large number of virtual machines is ensuring that the storage account design is sound. One of the accepted practices is to group common VM tiers into shared storage accounts and to of course place each VM into an availability set to ensure that they fall into Microsoft’s 99.95% SLA. Digging a bit deeper this practice isn’t as resilient as one might think. Sure having VMs in an availability set spreads the VMs across separate fault and update domains, but what about storage? If I have both of my VMs in the same storage account and the underlying storage is unavailable then what happens?
Is this right? Do I need to place each VM into its own storage account for greater resiliency?After doing a bit of research I found this great article: https://azure.microsoft.com/en-us/documentation/articles/resiliency-high-availability-checklist/
In particular this section caught my attention: (I have highlighted the key points)
Are you using premium storage and separate storage accounts for each of your virtual machines?
It is a best practice to use premium storage for your production virtual machines. In addition, you should make sure that you use a separate storage account for each virtual machine (this is true for small-scale deployments. For larger deployments you can re-use storage accounts for multiple machines but there is a balancing that needs to be done to ensure you are balanced across update domains and across tiers of your application).
So it seems premium storage and separate storage accounts are the way to go. Things get even more interesting. Read on…
Not only should you use premium storage and separate storage accounts for your VMs you need to name the storage accounts following a specific naming convention or you run the risk of the storage partitions being potentially co-located on the same partition server. That caught my attention. Luckily I was sent this article: https://azure.microsoft.com/en-us/documentation/articles/storage-performance-checklist/ and the section that really cleared everything up for me was this:
Partition Naming Convention
…naming conventions such as lexical ordering (e.g. msftpayroll, msftperformance, msftemployees, etc) or using time-stamps (log20160101, log20160102, log20160102, etc) will lend itself to the partitions being potentially co-located on the same partition server, until a load balancing operation splits them out into smaller ranges.
You can follow some best practices to reduce the frequency of such operations.
- Examine the naming convention you use for accounts, containers, blobs, tables and queues, closely. Consider prefixing account names with a 3-digit hash using a hashing function that best suits your needs.
- If you organize your data using timestamps or numerical identifiers, you have to ensure you are not using an append-only (or prepend-only) traffic patterns. These patterns are not suitable for a range -based partitioning system, and could lead to all the traffic going to a single partition and limiting the system from effectively load balancing. For instance, if you have daily operations that use a blob object with a timestamp such as yyyymmdd, then all the traffic for that daily operation is directed to a single object which is served by a single partition server.
So from the above information it seems that the following holds true:
1) Use Premium storage in conjunction with separate storage accounts. This gets around any IOPS limits per storage account as well but there is a limit of 200 storage accounts per subscription which is a hard limit.
2) Prefix your storage accounts with a random 3 digit hash per storage account to ensure that the storage accounts are properly spread across load balanced partition servers. For example naming your storage accounts storageaccount1, storageaccount2 isn’t sufficient. Go with something like fxwstorage1, bcdstorage2 etc. to ensure that the storage accounts are load balanced correctly. Luckily for us we can use ARM templates to provision storage accounts using the naming convention as mentioned above, but that’s for another post….