The Curious Case of the “Un-Enforced” Azure Key Vault RBAC Policy

I want to preface this post with a small but important comment: there is absolutely no vulnerability here. If you’re looking for some dirt, you won’t find any, and as you read through this, you’ll hopefully realize what’s going on and maybe learn a little about Azure RBAC policy and asymmetric crypto in general.

I guess the title is a little ‘click-baity’ 🙂

The ‘Issue’

I wrote some C# code that encrypts and decrypts data using Azure Key Vault APIs. It’s pretty simple, you can grab the source from https://github.com/x509cert/AzureKeyVault.

In the screenshot below, an exception is raised when calling cryptoClient.Decrypt() on line 60, but to get to this point in the code, cryptoClient.Encrypt() was called on line 55 and succeeded.

Figure 1: Exception raised on Decrypt(), but not on Encrypt()

So what?

If you take a look at the RBAC policy for my account, you can see I DO NOT HAVE the ability to Encrypt or Decrypt:

Figure 2: My Access Policy for this Key Vault; I should not be able to encrypt or decrypt

In short, my account:

  • Does not have the Key Encrypt right, but my code can encrypt (not expected)
  • Does not have the Key Decrypt right, and my code cannot decrypt (expected).

So what’s going on?

Digging In

Let’s start at the beginning with some pertinent details. If you know Azure RBAC and Azure Key Vault well, you can probably jump straight to the “Why Is the Key Encrypt Policy not enforced” section.

  1. Azure Key Vault stores and manages three kinds of items: Keys, Secrets and Certificates.
  2. Key Vault keys are only asymmetric RSA or Elliptic Curve keys.
  3. Keys are real keys used for signing/verifying, wrapping/unwrapping and encryption/decryption.
  4. RSA and Elliptic Curve keys are asymmetric, comprised of a public key and a private key.
  5. The public key is, well, public!
  6. Because the public key is public, it does not need to be securely stored.
  7. The private key is, you guessed it, private.
  8. It’s imperative that private keys be protected, preferably directly in hardware or encrypted using hardware-backed keys.
  9. Only RSA keys can encrypt, sign and wrap; Elliptic Curve keys can only sign (this is an algorithm issue and has nothing to do with Key Vault).
  10. Key Vault provides multiple layers of defense, including network isolation, IP restrictions, authentication and authorization.
  11. One form of authorization is role-based access control, or RBAC.
  12. RBAC policies can be defined at the data-plane (to control day-to-day use of the service) and at the control-plane (to control management of the service).
  13. Key Vault has fine-grained RBAC controls for Keys, Secrets and Certificates at both the data-plane and the control-plane.

Make sure you thoroughly understand the list above before you move on.

Let’s dig into RBAC policy a little. If you look at https://docs.microsoft.com/en-us/azure/role-based-access-control/resource-provider-operations#microsoftkeyvault you will notice there’re two possible Action Types:

  • Action
  • DataAction

‘Action’ includes RBAC policy options at the control-plane (ie; Delete a Key Vault) and ‘DataAction’ includes RBAC policy options at the data-plane (ie; Read a key.)

Here are some examples. First some control-plane policies:

  • Action | Microsoft.KeyVault/vaults/read | View the properties of a key vault
  • Action | Microsoft.KeyVault/vaults/write | Create a new key vault or update the properties of an existing key vault
  • Action | Microsoft.KeyVault/vaults/delete | Delete a key vault

Table 1: Sample Azure Key Vault Action policies

And some data-plane policies:

  • DataAction | Microsoft.KeyVault/vaults/keys/encrypt/action | Encrypt plaintext with a key. Note that if the key is asymmetric, this operation can be performed by principals with read access.
  • DataAction | Microsoft.KeyVault/vaults/keys/decrypt/action | Decrypt ciphertext with a key.
  • DataAction | Microsoft.KeyVault/vaults/keys/delete | Delete a key.

Table 2: Sample Azure Key Vault DataAction policies

Why Is the Key Encrypt Policy not enforced?

So let’s get back to the opening question. Why can my code encrypt data when my account does not have the right to do so?

The answer is:

The encryption is not performed by Key Vault,
it’s performed by the Key Vault SDK code.

Why?

There is no need to perform the encryption operation in Key Vault because:

Asymmetric encryption uses a public key.

The public key is public! Remember the list at the start of this post, items 5 and 6? In theory, anyone can use the public key to encrypt, but it takes the private key to decrypt.
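To make that concrete, here’s a toy, textbook RSA round trip in Python. This is my illustration, not Key Vault SDK code, and the numbers are absurdly small, so never use anything like this for real crypto. The point is that encrypting needs only the public values (e, n), while decrypting needs the private exponent d, which in the Key Vault scenario never leaves the vault:

```python
# Toy, textbook RSA with tiny numbers -- for illustration only.
p, q = 61, 53                          # real keys use primes ~1000x longer
n = p * q                              # public modulus
e = 17                                 # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

def encrypt(m: int) -> int:
    """Anyone can do this: it uses only the public values (e, n)."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the private-key holder (Key Vault, in this post) can do this."""
    return pow(c, d, n)

msg = 65
assert decrypt(encrypt(msg)) == msg    # round trip works
assert encrypt(msg) != msg             # ciphertext isn't the plaintext
```

This is exactly why the SDK can perform the encrypt locally after fetching the public key, while a decrypt must round-trip to Key Vault, where d lives.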

The decryption operation is performed by Key Vault and hence there is an RBAC policy check. In other words:

We must protect the decrypt operation because it uses a private key.

Remember, today Key Vault keys are always asymmetric. If you’re wondering whether Key Vault can store AES keys, it can, but they are stored as secrets, not keys, and because they are secrets, Key Vault cannot use them to perform encryption/decryption. You can, however, pull the secret from Key Vault into your code and perform the AES crypto there. Your code would then be subject to a different bucket of RBAC policies:

Figure 3: RBAC Policies that apply to Secrets in Key Vault

If you must block someone from reading a public key from Key Vault, you can remove the Get Key role from their access policy:

Figure 4: Denying the ability to read a key

On a side note, you can see that only the decrypt operations are performed in Key Vault by looking at the logs using Azure Monitor:

Table 3: Key Vault audit logs

Finally, this situation is called out in the description of the Microsoft.KeyVault/vaults/keys/encrypt/action RBAC policy: “Encrypt plaintext with a key. Note that if the key is asymmetric, this operation can be performed by principals with read access.”

In Summary

Technically, the Key Encrypt RBAC policy in Key Vault is not needed when using asymmetric encryption because anyone and everyone can access the public key, other Key Vault defenses aside.

If you can read the key, you can encrypt with it, and encryption performed outside of Key Vault is not within Key Vault’s purview.

The Key Decrypt RBAC policy is needed to decrypt because the decrypt operation requires access to the private key, and that’s a trusted operation performed by Key Vault.

TL;DR: This is working as intended!

Big thanks to Heath Stewart, Hervey Wilson and Scott Schaab on the Key Vault team for their valuable assistance and feedback.

Thoughts on Passing AZ-500

Well, I passed AZ-500 about 60mins ago. All I can say is I feel relieved to have it behind me, because it’s a beast.

IMPORTANT: I am neither confirming nor denying that any of the material below was in the exam. This is material I learned along the way, however.

AZ-500 is the current Azure Security exam; you can read more about it here.

Now, you’re probably thinking, “Aren’t you a security guy at Microsoft? Shouldn’t this be easy for you?” The answer is an emphatic, resounding, vigorous “nope.”

The reason it’s hard isn’t because I don’t know the subject matter; ok, sure, there’re some parts of Azure I know zip about. For example, I know what Privileged Identity Management (PIM) is, but until studying for AZ-500 that was the extent of my knowledge. The reason it’s hard is all the subtle nooks and crannies within Azure generally, and Azure Security specifically.

Here are some examples.

So you decide to study Key Vault in depth, which is a great idea. But do you understand the limitations of using Key Vault with various other Azure services? In different regions? In different resource groups?

Do you understand the specific RBAC requirements when pulling containers from Azure Container Registry? Like did you know that the AcrPull role can pull containers, but so can the AcrPush role? Don’t believe me? Take a look. And if you still don’t believe me, go take a look at all the RBAC scopes for AcrPush in the Azure Portal. And if you don’t know how to do that last point, you really ought to know!

Do you know the folder permissions required on a parent folder when using POSIX 1003.1 ACLs on Azure Data Lake Storage Gen 2 volumes?

Do you truly understand the relationships between ASGs, NSGs, subnets and VNets? Now throw VMs into the mix and consider virtual NICs. One thing I learned along the way is an NSG can be assigned to more than one VNIC.

Learning a service in isolation is only the starting point; you absolutely need to understand how the services work together.

Me Adding Some Value

So this is where I think I can add some value if you want to pass AZ-500.

There’re plenty of good classes out there you can take; here’re the two I used, one on Udemy and one on WhizLabs.

They are both good. I did not go over each from start-to-end, however; I started by focusing on the areas I knew little about. Like PIM! I looked at the PIM material in Udemy and then in WhizLabs. I also had the sessions on my phone so I could listen in the car. It may be boring, but it’s way more uplifting than listening to the news!

But the courses themselves are not enough; I also replicated the material in my own subscriptions. I learn by doing, not watching or reading. After I had looked at PIM in both classes, I jumped into the Azure Portal and did the work. The most important learning experience is when something didn’t work, because then you REALLY need to understand how it works as you figure out why it failed.

Sure, it takes longer doing it this way, but it cemented the service details in my head.

My other source of learning was good ol’ docs.microsoft.com; this repo has a list of the AZ-500 requirements and links to appropriate material at the docs site. Keep this repo handy!

I had a thick pile of printed material, especially on features I was not 100% familiar with. Like PIM! Right before the exam I read through all the printed material. By “right before” I mean all the way up to 30secs before logging on to take the exam!

Finally, I took sample tests, and if I failed a question (that happened a lot!), I printed out the correct answer and made sure I understood why I got it wrong. I mainly used http://www.measureup.com/ for this.

Here’s the TL;DR:

  • Watch AZ-500 videos
  • Read docs.microsoft.com
  • Focus on the parts of Azure Security you don’t know well
  • Make sure you understand the relationships BETWEEN services
  • Spend most of your time in the Azure portal doing stuff.

In summary, I am happy I have AZ-500 behind me. I probably spent around 60 hours studying for this, spread over a couple of months.

All the best if you take the exam.

I think I will do AZ-204 next 🙂

PS: We started an Azure Security Podcast! https://azsecuritypodcast.net/

Make sure you understand what Azure SQL Database Transparent Data Encryption (TDE) mitigates

Encryption of data at rest is one of the most important defenses we have in our architectural arsenal. When firewalls, authentication and authorization fail, correctly encrypted data gives the attacker nothing but a jumble of valueless bytes.

Leaked data is bad, but correctly encrypted data is substantially less bad.

I really want to stress ‘correctly encrypted’; by this I mean appropriate algorithms, keys, key derivation, key lengths, key storage etc.

This brings us to SQL Server Transparent Data Encryption (TDE).

Please note that TDE-type technology exists in other database engines, not just SQL Server.

What TDE Does

TDE was first introduced in SQL Server 2008 and allows an administrator to configure SQL Server so that it automatically encrypts and decrypts data at rest. The key word in the last sentence is ‘automatically.’ No code updates are needed whatsoever, no updates to client code, triggers, or stored procedures. It all ‘just works.’

But this is where knowing what TDE is designed to mitigate is critical. TDE mitigates a stolen drive or database file. If an attacker accesses a TDE-encrypted database (ie; tables etc), perhaps through a SQL injection attack, then the bad guy will get plaintext, not ciphertext. This is the ‘transparent’ aspect of TDE.

Let me say that last part again, because it’s important:

TDE + SQLi == Plaintext

The Value of TDE

If the attacker can read the database file directly (eg; database.mdf), for example via a directory traversal vulnerability at the server, then they will only get ciphertext back, because SQL Server does not decrypt the data automatically.

Note this last point is also true for a database backup. It’s encrypted using TDE even when held in another location away from the SQL Server database itself.
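The boundary can be sketched in a few lines of Python. This is a deliberately silly model of mine: a XOR “cipher” stands in for real AES, and a dict stands in for database.mdf. The point is where decryption happens, not the cipher:

```python
# Toy model of TDE: the engine encrypts on the way to "disk" and decrypts on
# the way out. Queries get plaintext; reading the raw file gets ciphertext.
KEY = b"database-encryption-key"

def toy_cipher(data: bytes) -> bytes:
    # XOR keystream stand-in; symmetric, so the same call encrypts and decrypts
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))

disk = {}  # stands in for the .mdf file (or a backup)

def engine_write(row_id: int, plaintext: bytes) -> None:
    disk[row_id] = toy_cipher(plaintext)    # engine encrypts on the way in

def engine_query(row_id: int) -> bytes:
    return toy_cipher(disk[row_id])         # engine decrypts on the way out

engine_write(1, b"alice:4111-1111-1111-1111")

# A query -- including one smuggled in via SQL injection -- goes through the
# engine, so it gets plaintext back:
assert engine_query(1) == b"alice:4111-1111-1111-1111"

# Stealing the file (or a backup) bypasses the engine: ciphertext only.
assert disk[1] != b"alice:4111-1111-1111-1111"
```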

I would argue that when using Azure SQL Database (the PaaS version, not SQL Server IaaS, that resides in a Windows or Linux VM), the value of TDE in a stolen hard drive scenario is considerably lower than on-prem. TDE provides protection against a lost or stolen disk, and Azure takes physical custody of the disks seriously; and while these scenarios are unlikely, mistakes happen, and TDE provides an extra defensive layer.

As a side note, you can read more about Azure data center physical security here.

If you use TDE, use it in conjunction with keys you manage; that way, if you learn of an attack, you can pull the keys or deny access to them. Once you pull the keys, SQL Server can no longer decrypt the data, so the attacker gets only ciphertext. It’s not perfect, but it’s better than the attacker getting everything.

For many customers, TDE (using customer managed keys) offers protection against an internal Azure attack. Again, the likelihood of this scenario is slim, but it’s never zero.

Now What?

So what are our options?

One option is to use SQL Server “Always Encrypted”, but there are implications you must be aware of.

First, you will probably need to make code changes to an existing system. I can’t go into what you have to do because it varies by app, but don’t think you can take an existing application, change SQL Server to use Always Encrypted and expect everything to work. It probably won’t.

In my opinion, if you’re designing a new system, you should seriously consider using Always Encrypted on top of TDE. You can certainly update an existing system to use Always Encrypted, but as noted, it’s not trivial.

Always Encrypted allows you to perform only equality operations over encrypted columns, and only over columns that use a variant of Always Encrypted called “Deterministic Encryption”, so you will need to change the way some of your queries work. SQL Server 2019 adds support for performing more complex queries in a secure enclave, but that is available only in the non-PaaS version of SQL Server.
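To see why equality still works over deterministic ciphertext, here’s a small sketch. I’m using HMAC purely as a stand-in for the deterministic property, because under one key it maps equal plaintexts to equal outputs; unlike real Always Encrypted ciphertext, it is not reversible, so this only illustrates the equality-search idea:

```python
import hashlib
import hmac
import os

column_key = b"column-encryption-key"   # made-up key for illustration

def deterministic(value: bytes) -> bytes:
    # Same plaintext + same key -> same output every time
    return hmac.new(column_key, value, hashlib.sha256).digest()

def randomized(value: bytes) -> bytes:
    # Toy: a fresh nonce makes every "encryption" of a value different
    return os.urandom(16) + value

ssn = b"555-12-3456"
stored = deterministic(ssn)

# WHERE ssn = @p works: the server compares ciphertexts without decrypting
assert deterministic(ssn) == stored
assert deterministic(b"555-99-0000") != stored

# Randomized encryption defeats equality search (but leaks less information)
assert randomized(ssn) != randomized(ssn)
```

The design trade-off is visible right in the sketch: determinism enables server-side matching, but it also reveals which rows share a value, which is why randomized encryption is preferred for columns that don’t need to be searched.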

One final feature to look at is Dynamic Data Masking; it’s not a true security boundary, but it does help mitigate casual snooping.

Wrap Up

For an existing Azure SQL Database system that uses TDE, continue to use TDE, but I would suggest you use keys held in Key Vault if you want 100% control of the keys.

See if there’s an opportunity to move some of the more sensitive fields to take advantage of Always Encrypted. The fewer the number of columns using Always Encrypted, the smaller the chance of regressions.

Big thanks to Andreas Wolter, Shawn Hernan and Jakub Szymaszek from the SQL Server Security and Azure Security Assurance teams for their review and valuable comments.

So you want to learn Azure Security?

A few weeks ago I spoke to a new Microsoft employee who is trying to find his spot in security within the company. What follows is some advice I gave him.

Before I get started I want to share something that serves as the cornerstone for the rest of this article.

Some years ago, I made a comment that if you’re a developer working in the cloud then you need to learn basic networking, and if you’re a networking geek, you need to learn basic programming.

This comment is, in my opinion, as true today as it was when I first made the comment. The worlds of development and networking are deeply intertwined in the cloud and if you want to excel, you really need to understand both.

Now onto my Azure security advice.

Embrace the complexity

First up, cloud infrastructure is complex, so don’t be too concerned if you don’t understand all of it at once. No-one I know understood all of it from the get-go, either. When you do finally understand it, something new or updated will come along anyway! So don’t be disheartened! Just roll with the punches and keep learning.

I set aside 2-3 hours a week in my calendar labeled ‘Learn’ and I use Microsoft ToDo to track “Stuff to Learn” as I run across items of interest where I feel I should know more.

Right now I have about 20 items on the list, and whenever I come across something of interest, I add it to the list.

Examples in the list include:

Set up an Azure account

If you don’t already have a free Azure account, sign up for one. There is absolutely nothing that can compare with getting your hands dirty. Head over here to get your free account.

Learn the basic network defenses and technologies

Azure has many network defenses; below is a list of some defenses/techs you MUST understand, and I would recommend you learn these before you progress:

  • Virtual Networks <link>
  • Network Security Groups <link>
  • Service End-points <link>
  • Azure Private Link <link>
  • Web Application Firewall <link>
  • Azure Bastion <link>
  • Azure Firewall <link>
  • Azure Security Center <link>
  • Azure Sentinel (at least understand what it is) <link>
  • DDoS Protections <link>

Learn the basic application defenses and technologies

Next, you need to understand various application-layer defenses/techs, examples include:

  • Azure Active Directory <link>
  • Setting up Multi-Factor Authentication <link>
  • Azure AD Privileged Identity Management <link>
  • Service Principals and Managed Identities <link>
  • Application Gateway <link>
  • Application Security Groups (they are associated with NSGs) <link>
  • Application-specific ‘firewalls’ (eg; SQL Server, CosmosDB etc) <link><link>
  • Key Vault <link>
  • RBAC <link>
  • Azure Policy and Blueprints <link> <link>
  • OAuth 2 and OpenID Connect <link>
  • Application-specific encryption of data at rest, such as for Storage accounts <link>

Compliance

Another important topic is compliance. Yes, I realize that security != compliance, but it’s a topic you must be versed in. Start here for your Azure compliance journey.

Build Something

Now that you have a basic idea of the core security-related tools and technologies available to you in Azure, it’s time to create something. When I want to learn something I build something.

Some years ago when PowerShell was still in its infancy, I asked Lee Holmes, “What’s the best way to learn PS?” He replied, “You know all those tools you wrote in C/C++/C#? Re-write them in PowerShell!” So I did, and I learned an incredible amount about PowerShell in a short time.

What you decide to create is up to you, but what I’d start with is:

  • Create two VMs in the same VNet, but different subnets – try pinging one VM from the other, does it work? Explain.
  • Using the same VMs, add an NSG to one subnet that blocks all traffic to/from the other VM’s IP address. Can you ping one VM from the other? Explain.
  • Create two VMs in different VNets – try pinging them, does it work? Explain.
  • Encrypt the hard drive of one of the VMs. You will need to create a Key Vault to do this.
  • Take a look at the NSG associated with a VM. Enable Just-in-Time (JIT) access to the VM in Azure Security Center. Now look at the NSG again. Now request JIT access and take another look at the NSG. Explain what happened.
  • Create a Key Vault, add a secret and pull out the secret from, say, an Azure Function. This is quite complex and requires you add a managed identity to the Function or run the function in the same VNet as the Key Vault.
  • If you used a managed identity in the example above, make sure you assign least privilege access to the Key Vault (ie; read access to secrets and nothing else)
  • Create a custom role with very specific actions.
  • Create a blob in a Storage Account. Experiment with the various authorization polices, most notably SAS tokens.
  • Create an Azure SQL Database and configure ‘Always Encrypted’
  • Use Azure Monitor to see who is doing what to your subscription.
  • Set an alert on one of the event types.
  • Open Azure Security Center – look at your issues (red). Look at the compliance issues (PCI etc)
  • Remediate something flagged by ASC.
  • Set a policy that only allows a hardware-backed Key Vault and create a non-HSM KV (ie; not Premium). Use this as a starting point https://github.com/x509cert/AzurePolicy. Remember, it can take 30mins after a policy is deployed for it to take effect. I previously wrote about Policy here.
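For the “create a custom role” exercise above, here’s a sketch of what a role definition looks like. The role name, description and scope are placeholders of my own; verify the operation strings against the Microsoft.KeyVault resource-provider operations list mentioned earlier before using anything like this:

```json
{
  "Name": "Key Vault Metadata Auditor (example)",
  "IsCustom": true,
  "Description": "Can read vault properties and secret metadata, nothing else.",
  "Actions": [ "Microsoft.KeyVault/vaults/read" ],
  "NotActions": [],
  "DataActions": [ "Microsoft.KeyVault/vaults/secrets/readMetadata/action" ],
  "NotDataActions": [],
  "AssignableScopes": [ "/subscriptions/<your-subscription-id>" ]
}
```

You can then create the role with az role definition create --role-definition @role.json and assign it at a vault’s scope. Note how Actions (control-plane) and DataActions (data-plane) are separate lists, which ties straight back to the Action/DataAction split discussed in the Key Vault post.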

I could keep going with more examples and I will update this list over time!

As a side-note, I often use a resource group named rg-dev-sandbox when experimenting; that way I can blow the resource group away when I am done, leaving nothing behind.

Go Deep

After you have learned and experimented, it’s time to go deep. Pick a product, say Azure SQL Database, and learn absolutely everything there is to know about security, reliability, compliance and privacy for that product. For a product like Azure SQL, this would include:

  • Access Policies for data at rest
  • Crypto for data at rest (TDE, Always Encrypted, Column Encryption)
  • Crypto for data on the wire (ie; TLS!)
  • Auditing
  • Disaster recovery
  • Secure access to connection strings
  • Azure AD vs SQL Authentication
  • Data masking (ok, not REAL security, but useful nonetheless)
  • Threat Protection
  • Azure SQL firewall (note a lower-case ‘f’ as it’s not a true ‘F’irewall)
  • SQL injection issues and remedies

Consider AZ-500 Certification

I know some people are cynical about certification, but the Azure certifications are not easy and from customers I have spoken to, they are welcome and required. I worked with a large financial organization for over a year and they required their staff working on Azure get certified in various Azure topics. You can get more information about certifications here.

AZ-500 measures Azure security knowledge, and the exam includes labs. I would highly recommend you read the skills outline. Even if you don’t take the exam and get certified, this is a broad set of security-related items you really ought to know.

Wrap Up

I hope this helps you on your journey through Azure security, even if this post only skims the surface!

But remember, as soon as you understand it, something will change, so stay abreast of new features and functionality by monitoring the Azure Heat Map.

Big thanks to my colleague Ravi Shetwal for his review and feedback.

Who is Authenticating Whom?

I have worked on too-many-to-count threat models over the last twenty or so years and invariably I have to explain, “Who is authenticating whom?”

Let me explain the issue with a simple example.

In this scenario, a user communicates through an API App on Azure which in turn communicates with a CosmosDB instance.

When I ask the question, “How do you authenticate the API App?”, the response is often, “We use a username and password”, and here’s where the confusion lies: the answer (uid and pwd) is the user’s credential.

What I do at this point is hop up to the whiteboard to explain that both communicating parties should authenticate one another.

First, when the user connects to the API App, how does the user know that the API App is the correct API App and not a rogue? The answer is to use server authentication and the correct answer most of the time is to use TLS.

That’s what I meant when I asked, “How do you authenticate the API App?”

Next, how does the API App know the user is a valid user of the system?

That depends on what the server, in this case API Apps, supports.

Different server technologies offer various ways to authenticate incoming connections from users. Examples include client TLS, username and password (with 2FA!) and delegating to a federated identity provider, such as Google, Facebook, Twitter or Microsoft ID.

A common way to authenticate a valid user is through the use of access keys, which are often implemented as a long, random series of bytes that are passed in the client request. Here’s an example from an Azure Storage Account:

Another method to authenticate and authorize a client is to use Shared Access Signatures. Here’s an example from the same Storage Account:

…and a resulting SAS token:
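Conceptually, a SAS token works because the permissions and expiry it grants are covered by an HMAC signature computed with the shared account key, and the service recomputes that signature to authenticate the caller. Here’s a simplified sketch of the idea; this is not Azure’s exact string-to-sign format, and the key and parameter names are made up:

```python
import base64
import hashlib
import hmac

account_key = b"pretend-this-is-a-storage-account-key"

def sign(permissions: str, expiry: str) -> str:
    # Sign the granted permissions and expiry with the shared key
    string_to_sign = f"{permissions}\n{expiry}"
    mac = hmac.new(account_key, string_to_sign.encode(), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode()

def make_sas(permissions: str, expiry: str) -> dict:
    return {"sp": permissions, "se": expiry,
            "sig": sign(permissions, expiry)}

def verify_sas(token: dict) -> bool:
    # The service recomputes the signature: only a key holder could mint it
    return hmac.compare_digest(token["sig"], sign(token["sp"], token["se"]))

token = make_sas("rl", "2030-01-01T00:00:00Z")   # read + list until 2030
assert verify_sas(token)

tampered = dict(token, sp="rwdl")                 # try to escalate privileges
assert not verify_sas(tampered)
```

This is also why leaking a SAS token is like leaking a scoped, time-limited credential: anyone holding it is “authenticated” until it expires or the account key is rotated.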

This page and this page explain the client authentication (not authorization) policies supported by Azure API Apps.

Let’s be super secure: in our solution we’re going to use TLS at both ends. TLS is used to authenticate the server, and TLS is used to authenticate the client.

Let me state this another way.

When the user connects to a server, the user’s software (the browser) verifies that the server is the correct server. This is SERVER authentication.

On accepting a user’s connection, the server verifies the user is who they claim to be. This is CLIENT authentication.

But it gets a little more complex. When the API App connects to CosmosDB, how does CosmosDB know the connection is from a valid source and how does the API App know it is talking to the correct CosmosDB?

In this scenario, the API App is now the client, and CosmosDB is the server. When building the threat model, you might decide that you don’t need to authenticate the two to each other. That’s fine, so long as it’s in the threat model along with the rationale for that decision.

The API App authenticating CosmosDB, the server, is easy: use TLS, which is enabled by default in CosmosDB anyway.

When the API App connects to CosmosDB, it can provide a key to do so. This key should be stored in Key Vault, and the API App should use a Managed Identity to access Key Vault.

Now that I think about this, I might do a Proof of Concept for the entire solution to show how it’s done and build the threat model while I am at it. There’re many more ways to restrict access at the backend, such as only allowing CosmosDB to accept traffic from specific IP addresses or ranges; in this scenario, we can use the backend IP address of API Apps so that CosmosDB will ONLY allow traffic from our API App. Let me know what you think!

Anyway, to wrap up. When working with a threat model, or looking at the security of any architecture, you must always consider:

“Who is authenticating whom”

or… how is the client authenticating the server (server authentication) and how is the server authenticating the client (client authentication)?

Azure Monitor Activity Log Change History (Preview)

There’s a new public preview feature named Change History in Azure Monitor Activity Logs that allows you to get a better feel for what caused events in the first place.

I don’t know about you, but wading through a JSON description of an event to determine what happened can be a little cumbersome at times, and this feature is a great time-saver.

To get to the feature, go to Monitor and then click Activity Log:

This is the list of all the management-plane activities across your subscription, as consumed by Azure Monitor.

NOTE: If you don’t see Monitor in the list on the left, type monitor in the Search resources, services, and docs textbox at the top of the portal:

Below is an expanded example of the Activity log from my personal subscription:

Now click on an Update event – in this example, I will click the first event. As you can see I have been experimenting with Azure Private DNS; I had just got done running this example.

Notice there’s a new option named Change history (preview):

There’s a list of changes to this resource shown and if I click any of these entries, the following pops up:

As you can see – right there is the diff – this change was me changing a tag named vmRelated from True to False.

More info is available here.

Simple, but effective!

Azure Logic Apps and SQL Injection Vulnerabilities

The Logic Apps code below takes a property named name from an untrusted HTTP POST and uses it as input to a SQL statement. If I use an HTTP POST tool, like Postman, I can test the code:

In this example, the name Mary is inserted into the database and the code is working as intended. But if the untrusted input is changed to something like this:

Well, I think you know what happens! Assuming the connection to SQL Server has enough privilege to delete the foo table, this code will go ahead and delete the foo table.

Remember, one good definition of a secure system is a system that does what it’s supposed to do and nothing else. In this example, the code can delete a table, and that is far above what was written in the spec!

Most common SQLi attacks exfiltrate data from SQL SELECT clauses using the classic or 1=1 -- string and its myriad of variants.

The Remedy

The remedy is to use another SQL Server action rather than the Execute a SQL Query action. The SQL Server connector has plenty of other, more constrained actions.

You can also use the Execute Stored Procedure action, as that will use SQL parameters under the hood.

If you must accept untrusted data in order to construct a SQL statement, make sure you use SQL parameters as shown below. Note that the advice to use parameterized queries when building SQL statements is as true today as it was well over a decade ago!

Notice how the SQL query has changed to use @name, which is a parameter declared using the formalParameters option. The key is the parameter name and the value is the SQL data type, which should match the table definition.

Now the value of this parameter can be set to reference the name dynamic content that came from the HTTP POST request, as shown in the actual parameter value below.
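If you want to see the whole attack-and-remedy cycle outside Logic Apps, here’s a small, self-contained demo using Python and sqlite3. The table names and payload are made up for illustration; the principle is identical to the Logic Apps case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("CREATE TABLE foo (x INT)")          # the attacker's target

payload = "Mary'); DROP TABLE foo; --"            # untrusted 'name' value

# Vulnerable: the input is concatenated straight into the statement
# (executescript permits the batch, mimicking an over-privileged connection)
conn.executescript(f"INSERT INTO users (name) VALUES ('{payload}')")

tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
assert "foo" not in tables                        # the foo table is gone!

# The remedy: a SQL parameter treats the payload as plain data
conn.execute("CREATE TABLE foo (x INT)")          # recreate the target
conn.execute("INSERT INTO users (name) VALUES (?)", (payload,))

tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
assert "foo" in tables                            # the foo table survives
assert payload in [r[0] for r in conn.execute("SELECT name FROM users")]
```

Note the parameterized INSERT stores the hostile string as a harmless literal, exactly what the @name parameter does for you in the Logic Apps action.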

So there you have it, use SQL parameters! Be careful out there!

I hope that helps, if you have any questions please leave a comment.