Who is Authenticating Whom?

I have worked on too-many-to-count threat models over the last twenty or so years and invariably I have to explain, “Who is authenticating whom?”

Let me explain the issue with a simple example.

In this scenario, a user communicates through an API App on Azure which in turn communicates with a CosmosDB instance.

When I ask the question, “How do you authenticate the API App?” The response is often, “We use a username and password” and here’s where the confusion lies: the answer (uid and pwd) is the user’s credential.

What I do at this point is hop up to the whiteboard to explain that both communicating parties should authenticate one another.

First, when the user connects to the API App, how does the user know that the API App is the correct API App and not a rogue? The answer is to use server authentication and the correct answer most of the time is to use TLS.

That’s what I meant when I asked, “How do you authenticate the API App?”

Next, how does the API App know the user is a valid user of the system?

That depends on what the server, in this case API Apps, supports.

Different server technologies offer various ways to authenticate incoming connections from users. Examples include client TLS, username and password (with 2FA!) and delegating to a federated identity provider, such as Google, Facebook, Twitter or Microsoft ID.

A common way to authenticate a valid user is through the use of access keys, which are often implemented as a long, random series of bytes that are passed in the client request. Here’s an example from an Azure Storage Account:

Another method to authenticate and authorize a client is to use Shared Access Signatures. Here’s an example from the same Storage Account:

…and a resulting SAS token:

This page and this page explain the client authentication (not authorization) policies supported by Azure API Apps.

Let’s be super secure and in our solution we’re going to use TLS at both ends: TLS is used to authenticate the server and TLS is used to authenticate the client.

Let me state this another way.

When the user connects to a server, the user’s software (browser) authenticates the server is the correct server. This is SERVER authentication.

On accepting a user’s connection, the server verifies the user is who they are. This is CLIENT authentication.

But it gets a little more complex. When the API App connects to CosmosDB, how does CosmosDB know the connection is from a valid source and how does the API App know it is talking to the correct CosmosDB?

In this scenario, the API App is now the client, and CosmosDB is the server. When building the threat model, you might decide that you don’t need to authenticate the two to each other. That’s fine, so long as is it’s the threat model and the rationale for that decision.

API App authenticating CosmosDB, the server, is easy: use TLS, which is enabled by default anyway in CosmosDB.

When the API App connects to ComsosDB, it can provide and key to do so. This key should be stored in Key Vault and use a Managed Identity for the API App to access Key Vault.

Now I think about this, I might do a Proof of Concept for the entire solution to show how it’s done and build the threat model while I am at it. There’re many more ways to restrict access at the backend, such as only allowing CosmosDB to listen on specific IP addresses or ranges, and in this scenario, we can use the backend IP address of API Apps. This way CosmosDB will ONLY allow traffic from our API App. Let me know what you think!

Anyway, to wrap up. When working with a threat model, or looking at the security of any architecture, you must always consider:

“Who is authenticating whom”

or… how is the client authenticating the server (server authentication) and how is the server authenticating the client (client authentication)?

Azure Monitor Activity Log Change History (Preview)

There’s a new public preview feature named Change History in Azure Monitor Activity Logs that allows you to get a better feel for what caused events in the first place.

I don’t know about you, but wading through a JSON description of an event to determine what happened can be a little cumbersome at times, and this feature is a great time-saver.

To get to the feature, go to Monitor and then click Activity Log:

This is the list of all your management plane activities across your subscription consumed by Azure Monitor.

NOTE: If you don’t see Monitor in the list on the left, type monitor in the Search resources, services, and docs textbox at the top of the portal:

Below is an expanded example of the Activity log from my personal subscription:

Now click on an Update event – in this example, I will click the first event. As you can see I have been experimenting with Azure Private DNS; I had just got done running this example.

Notice there’s a new option named Change history (preview):

There’s a list of changes to this resource shown and if I click any of these entries, the following pops up:

As you can see – right there is the diff – this change was me changing a tag named vmRelated from True to False.

More info is available here.

Simple, but effective!

Azure Logic Apps and SQL Injection Vulnerabilities

This code takes a property from an untrusted HTTP post named name and uses that as input to a SQL statement. If I use an HTTP POST tool, like Postman, I can test the code:

In this example, the name Mary is inserted into the database and the code is working as intended. But if the untrusted input is changed to something like this:

Well, I think you know what happens! Assuming the connection to SQL Server has enough privilege to delete the foo table this code will go ahead and delete the foo table.

Remember, one good definition of a secure system is a system that does what it’s supposed to do and nothing else. In this example, the code can delete a table, and that is far above what was written in the spec!

Most common SQLi attacks are used to exfiltrate data when using SQL select clauses using the classic or 1=1 -- string and the myriad of variants.

The Remedy

The remedy is to use another SQL Server action rather than the Execute a SQL Query action. The SQL Server connector has plenty of other, more constrained actions.

You can also use the Execute Stored Procedure action, as that will use SQL parameters under the hood.

If you must accept untrusted data in order to construct a SQL statement, make sure you use SQL parameters as shown below. Note that using parameterized queries to build SQL statements is as true today as it was well over a decade ago!

Notice how the SQL query has changed to use @name, which is a parameter declared using the formalParameters option.  The key is the parameter name and the value is the SQL data type which should match the table definition.

Now the value of this parameter can be set to reference the name dynamic content that came from the HTTP POST request in the actual parameter value below

So there you have it, use SQL parameters! Be careful out there!

I hope that helps, if you have any questions please leave a comment.

A Common Pattern – Using Insecure JavaScript Libraries

I will be up front and state that I am no JavaScript fan. I REALLY liked it back in the day when it was readable! I much prefer TypeScript as it fosters more robust, readable and maintainable code, especially for large projects. But that’s just my opinion.

With all that said, I understand JS’s importance in today’s modern toolset, and I am fine with that.

But there’s an alarming trend I have noticed with many customers who rely on common JS frameworks such as jQuery, Angular, Bootstrap, React and so on. Seems there’s a new framework every week!

The trend is devs using insecure JS libraries. 100% of the customers I have worked with over the last year or so have had at least one insecure JS library.

My old boss, Steve Lipner, coined a term about 20 years ago to describe these kinds of dependencies: he used the term ‘giblets’.

A giblet is code you depend on but you don’t control and JS frameworks are a prime giblet. If a JS framework you use has a security vulnerability, then you have a security vuln, too! Here’s a little more info on giblets.

So how do you remedy this issue of using insecure JS libraries?

First, you need to understand which libraries you use, and keep abreast of any security vulnerabilites therein. Someone in your org needs to keep track of this; it’s part of their job!

Next, you need to make a hard decision. Does your application host a copy of the libraries and possibly be behind on security fixes? Or, does your application pull the latest-n-greatest from the source and possibly run the risk of regressions? I cannot answer that question for you, you need to determine the policy and live with the tradeoffs offered by each scenario.

As a side note, there’s a great add-on for Chrome named Retire.js that will flag potentially vulnerable JS libraries from the browser. It produces output like the image below when you navigate to a web page that references a vulnerable library.

I want to wrap this up with a story.

One customer told me they had to use an old and insecure version of jQuery for compatibility with Internet Explorer on Windows XP! After looking at their analytics, they realized that some customers were still using Windows XP, but it was only three customers out of thousands!

So we added landing-page text that said, “We see you’re still using Windows XP! Windows XP is no longer supported by Microsoft, and we will retire support for Windows XP on September 30th 2019. After this date you might encounter issues using our system. Please upgrade before then.”

I think that’s reasonable!


Azure Policy – A Love Story

In a previous post I briefly mentioned the policy-enforcement capability built into Azure. If you have not seen Azure Policy, used it or simply not aware of it, you really need to know about it!

Before I start, this post is merely to make you aware Azure Policy exists, it’s not a complete tutorial by any stretch and it’s policy from my perspective.

So what is Azure Policy? It’s simply a way to make sure specific conditions exist and if the conditions don’t exist then either raise an audit event or block the action. There’re other options as well, but audit and deny are, by far, the two most important outcomes.

For example, some conditions might be:

  • Require TLS to blob store
  • Secrets in Key Vault must be in hardware
  • Don’t allow VMs to have a public end point
  • Require Threat Detection on Azure SQL Server
  • Require HTTP/2 for Azure Functions

I could keep going, but I am sure you get the picture. The beauty of Azure Policy is once it’s set, enforcement is performed automatically.

So let’s look at an example. Many customers want to, or need to control where their data resides. Data sovereignty is a big deal for some customers. For example, a customer might not want data in specific countries because of legal or regulatory reasons.

A customer wants their CosmosDB data available only in specific regions. One of the nice things about CosmosDB is there’s a simple UI that makes it easy to replicate data geographical. Of course, with that ease and power comes responsibility!

The customer wants to put a requirement in place that limits data replication to only East US, East US 2, South Central US and North Central US. That way ‘something’ comes to the rescue if someone fat-fingers the CosmosDB replication UI, or ‘accidently’ runs a PowerShell script that configures data replication all over the world.

Enter Azure Policy.

I won’t lie, Azure Policy files are a little obtuse. They are declared in JSON. The following file enforces a rule that only allows CosmosDB data in the four regions mentioned above.

     "properties": {
         "displayName": "Limit CosmosDB to specific locations",
         "policyRule": {
             "if": {
                 "not": {
                     "field": "Microsoft.DocumentDB/databaseAccounts/Locations[*].locationName",
                     "in": [
                         "East US",
                         "East US 2",
                         "South Central US",
                         "North Central US"
             "then": {
                 "effect": "deny"

Here’s how to read it. It all starts at the policyRule block. Before I start, remember that ComsosDB was known as DocumentDB.

“if the database account locations is not in East US, East US 2, South Central or North Central US, then deny the action. “

In other words, if someone attempts, by any means, to set a data replication site other than one of the four, the request is blocked.

If you change the “deny” to “audit” the action will go through, but an audit event is raised. This is useful if you want to roll out policy in an existing environment and you want to get a feel for how bad things are without stopping the business from running 🙂

Some customers think in terms of preventative and detective controls. “Deny” is preventatve and “Audit” is detective.

This is just the tip of Azure Policy. If you want to experiment, take a look at some of the sample policies available on GitHub.

One final thought. The title says “A Love Story” and clearly that’s a little extreme, maybe even clickbait, but Azure Policy is one of the most important security compliance features built in to Azure.

Security and Compiler Optimizations

A previous post I wrote about password comparison led to much discussion between some colleagues and I; most notably long-time friend and co-author David LeBlanc and Reid Borsuk in the security engineering team.

Reid mentioned an issue that I have not considered in a long time; and that is the potential for compiler optimizations to introduce security vulnerabilities.

The first time I learned of this issue was during the development of Windows Server 2003. Here is the link to my initial message to bugtraq in Nov 2002 on the topic.

Fast forward almost 17 years.

The issue with my initial code is the compiler might generate code that could short-circuit the string comparison. Here’s the code:

static bool TimeConstantCompare(string hashName, string s1, string s2)
      var h1 = HashAlgorithm.Create(hashName).ComputeHash(Encoding.UTF8.GetBytes(s1));
      var h2 = HashAlgorithm.Create(hashName).ComputeHash(Encoding.UTF8.GetBytes(s2));

      int accum = 0;
      for (int i = 0; i < h1.Length; i++)
          accum |= (h1[i] ^ h2[i]);

      return accum == 0;

The code walks over the bytes of each hash; bytes 0..N from hash 1 and bytes 0..N for hash 2. Each byte is XOR’d and if two bytes are different the result is non-zero (it’s a property of XOR) and this result is OR’d with an accumulator. This means, and this is the important part, that if the accumulator ever gets to non-zero it will always be non-zero from that point on.

Now here’s the fun part.

The code returns a bool (true or false) based on the accumulator being zero or not, but the compiler might emit optimized code that no longer performs the rest of the loop if at ANY point the accumulator is not zero.

For example, let’s say the hashes are 32-bytes in size. In theory, the loop will ALWAYS iterate from 0-31. However, let’s say when i==17, the two bytes are 0xDE and 0xAD, this will yield a value of 0xDE XOR 0xAD == 0x73 and if the accumulator is 0x00, then 0x00 OR 0x73 == 0x73. The compiler knows that from this point forward (because of how OR works), the accumulator will always be >= 0x73. The last line of the function returns a value based on the accumulator == 0, the compiler knows that the accumulator can NEVER be 0x00 from this point on, so skip the rest of the loop!

I hope that makes sense!

This isn’t a huge issue because we’re comparing a hash and not the actual password. But it’s still valuable to know compilers can do this.

There are two fixes, here’s the updated method:

[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]        
static int TimeConstantCompare(string hashName, string s1, string s2)
     var h1 = HashAlgorithm.Create(hashName).ComputeHash(GetRawBytes(s1));
     var h2 = HashAlgorithm.Create(hashName).ComputeHash(GetRawBytes(s2));

     int accum = 0;
     for (int i = 0; i < h1.Length; i++)
          accum |= (h1[i] ^ h2[i]);

     return accum;

First, the method returns an int rather than a bool, so the compiler has no clue what you’re doing with an int. In theory, the compiler could look at how the code calling this function uses the return value to optimize the code, but inter-procedural optimizations are harder and less common than intra-procedural optimizations. It would be very hard for the compiler to ascertain how to optimize this method based on every call in the code to the method.

The real fix is to add the NoOptimization directive before the method implementation. This prevents the optimizer from doing little tricks like this. The obvious downside is a potential performance hit. If this is code that’s called constantly, in a tight loop, then you may see degradation. But like everything in life, measure!

Reid provided me with some more in-depth detail about inter- vs intra-procedural optimizations. You probably don’t need to know this, but if you’re nosey, read on.

This statement is true for traditional compilers, but the .NET JIT is prohibited from inter-procedural optimizations only in the case where both NoInlining and NoOptimization is enforced, not just NoOptimization. That’s because the .NET runtime requires method signatures to be carried with the objects they represent. This means every instance of Program (including the static one) carries the pointer to the compiled method (or JIT stub pre-compile), so every caller needs to call the function in the same way. The caller would then be restricted to calling the “standard” implementation of the method without this cheat.
The way .NET implements the scenario you’re talking about is to force an inline to occur. After inlining occurs, the NoOptimization attribute gets stripped since the method disappeared. Then they force optimization on the new parent method with code already inlined. Preventing both inlining and optimization prevents the specific optimization you refer to in your blog post.

My updated code for string comparison is on GitHub

On a side note, you may notice the code to convert a string into a series of bytes is different. That’s a topic for another post 🙂

CosmosDB and SQL Injection

Over the last few months I have started digging deep into CosmosDB. I have to say I really like it, and coming from someone who has used SQL Server since v4.2 on OS/2, that’s saying something!

To be honest, I have only used the SQL API, but eventually I will take a look at the Gremlin graph API.

Being a security guy, my first area of interest is SQL Injection – is it possible to build SQL statements that are subject to SQLi when using CosmosDB?

The answer is YES! (but read on)

In fact, querying CosmosDB using the SQL API is as vulnerable to SQLi as, say, SQL Server or Oracle or MySQL.

IMPORTANT This is not a vulnerability in ComsosDB; it’s an issue with any system that uses strings to build programming constucts. A similar example is Cross-site scripting (XSS) in web sites using
JavaScript (or Java or C# or PHP or Perl or Ruby or Python!) at the back-end. Remember, CosmosDB will give you back whatever you ask for (permissions aside) and this may be more than you bargained for if the query is built incorrectly from untrusted data.

Even if you don’t have CosmosDB setup in your Azure subscription (remember, you can get a free subscription) you can test this out.

  • Go to the CosmosDB query test site at https://www.documentdb.com/sql/demo
  • Run the default query and note you only get one document returned.
  • Now add ‘or 1=1’ at the end of the query (without the quotes) and re-run the query. You get 100 documents returned.

The fact that the ‘or 1=1’ clause changed the query and made it return all documents indicates a potential SQLi issue. If any arguments in the query came from an attacker, we’d be in a world of hurt. For those not aware, the ‘or 1=1’ clause at the end of query is true for every document in the database, so the query returns all documents.

Remedy If you’re programmatically querying CosmosDB, and handle untrusted input that is used to build queries, you must use parameterized queries. This is the same guidance that’s been around in other SQL APIs.

There’s a nice write-up about CosmosDB and parameterized queries here.
The methods you’ll use the most are SqlParameter and SqlParameterCollection.

TL;DR CosmosDB queries are subject to SQLi if you build queries incorrectly from untrusted data. Make sure your queries use parameterized queries, especially in the face of untrusted input.