ASP.NET, Azure, Database, EntityFramework Performance Pointers

Introduction

Over the last few years, I’ve been involved in optimizing performance in a respectable number of projects built by many different developers. The quality of the codebases has varied from poster children for best-practice architecture to pure and utter insanity, involving SQL queries constructed inline in ASP.NET pages.

An interesting thing I’ve found is that architectural patterns aren’t really indicative of a project’s overall performance. Regardless of “good” or “bad” code, there are mistakes being made, methods needing optimization and databases working harder than they should.

In this post I want to describe the most common issues, and the steps one can take to fix them and make the world a better and faster place.

I’m mentioning Azure in the title as I’ve been involved in migrating and/or improving performance for quite a few Azure sites. The initial reaction after migration is generally that “Azure is slow!”, which usually indicates a bigger problem. The fact is that Azure works fine, but it highlights the performance bottlenecks in an application, especially when it comes to database access. An on-premises database server will most likely be a lot more forgiving than an Azure DTU plan.

These days there are generally two main areas that end up as performance bottlenecks: the web server or the database server. The web server can become the bottleneck if you have a lot of CPU-bound operations, such as repeated loops populating objects with various information, or simply preparing and padding large amounts of data. The database server can have similar CPU issues, usually as a result of complex queries, procedures/long-running jobs or simply an excessive number of queries. In addition there’s the possibility of huge amounts of data being requested, which might not be CPU-intensive, but will make the application wait longer while the data is being transferred.

I’d say that roughly 80% of the performance issues I’ve dealt with are related to interactions with the database server, so if you’re experiencing issues in your application, that’s usually the best place to start looking. Note that this doesn’t mean things can be fixed exclusively on the server; in most cases it’s the actual code that needs to be modified to improve performance. To troubleshoot these issues you need a good grasp of SQL, a sense of which queries are reasonable, and an understanding of how execution plans work.

Database Performance

The primary tool for locating most of these issues is an SQL profiler. I’d recommend using an actual SQL database profiler such as Microsoft’s SQL Server Profiler rather than the pseudo-profilers attached to ASP.NET applications, as the latter can’t measure database statistics like reads/writes/CPU, only duration. They’re also, in my experience, not 100% reliable in cases where you have threads/tasks or other web requests firing off database queries.

I’m not going to give pointers on fixing issues exclusive to the SQL server, as that requires more knowledge than simple pointers can convey. In general, though, these problems are usually related to missing indexes, complex procedures and inefficient views, and improving them (and most importantly, not making them worse) already requires a good understanding of databases.

Entity Framework (EF)

I’m generally in favor of using EF as an OR/M, and I’m describing specific EF scenarios here in detail. The concepts will most likely translate to other OR/Ms or data-access strategies as well. EF simplifies data access, but you need to be aware of how it does this in order to write code that translates well into SQL. On a side note, if you’re one of the people claiming that EF (or most mainstream OR/Ms) is “slow and horrible”, there’s a good chance you’re doing something in an inefficient way. Just because you can stab yourself with a pair of scissors doesn’t mean they aren’t extremely useful for cutting paper.

These pointers are based on actual real world issues I’ve come across.

Simplify complex LINQ EF queries – “Just because LINQ accepts it, doesn’t mean it’s a good idea!”

Some of the LINQ queries I’ve seen tend to produce over-the-top complex SQL when translated. When you’re trying to obtain data from the database, be as straightforward as you can; don’t try to do something extremely fancy. If you need to join in data from all over the place, group and summarize bits of it, filter parts of it and only retrieve a tiny bit of information, you should consider a dedicated Stored Procedure, or splitting the LINQ EF query into pieces to make it easier on the database. These issues are usually found by looking for high CPU/reads in SQL Profiler.
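As a hedged illustration (the entities and properties here are made up for the example), this is the kind of shape that often translates into unpleasant SQL, and one way of splitting it into simpler pieces; whether the split actually helps, and whether the results still match, should be confirmed with SQL Profiler:

    // One statement doing joins, nested aggregates and grouping all at once.
    var totals = context.Orders
        .Where(o => o.Customer.Region == region && o.Lines.Any(l => l.Product.Active))
        .GroupBy(o => o.CustomerId)
        .Select(g => new { CustomerId = g.Key, Total = g.Sum(o => o.Lines.Sum(l => l.Price * l.Quantity)) })
        .ToList();

    // Split variant: narrow down the orders first, then aggregate over the lines.
    var orderIds = context.Orders
        .Where(o => o.Customer.Region == region)
        .Select(o => o.Id)
        .ToList();

    var totalsSplit = context.OrderLines
        .Where(l => orderIds.Contains(l.OrderId) && l.Product.Active)
        .GroupBy(l => l.Order.CustomerId)
        .Select(g => new { CustomerId = g.Key, Total = g.Sum(l => l.Price * l.Quantity) })
        .ToList();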

Include related tables when relevant – “Lazy-loading means the database gets busy!”

When you’re querying a table and always use its related tables, consider adding .Include(relatedTable) to ensure the related entities are loaded up front rather than queried individually. If you query a list of products and always want to access their metadata in another table, including it prevents you from performing X+1 selects, where X is the number of products.
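A minimal sketch of the difference, assuming a hypothetical Product entity with a Metadata navigation property (the lambda form of Include is the EF6-style extension from System.Data.Entity):

    // Without Include: touching p.Metadata later lazy-loads it, one extra query per product.
    var products = context.Products
        .Where(p => p.Active)
        .ToList();

    // With Include: the related metadata comes back as part of the same query.
    var productsWithMetadata = context.Products
        .Include(p => p.Metadata)
        .Where(p => p.Active)
        .ToList();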

Don’t *always* include related tables – “Too much data makes everything slow!”

If you’re querying a table with a 1:VeryMany relation and include the many side, EF handles this with a standard join. This means the data on the 1-side of the relation is duplicated X times before it’s transmitted from the database server. If there are enough rows combined, especially if the duplicated table carries a lot of data, this will often cause delays. If you’re only retrieving a single row, simply removing the Include will let it lazy-load the needed data fairly efficiently. If a lot of rows are returned, you run into the issue from the previous paragraph, where lazy-loading creates a lot of queries. In that case it can be beneficial to eagerly load the related entities manually: first retrieve the rows from the primary table, then retrieve the rows from the joined table using the ids from the first table as a parameter, and finally connect the rows yourself.
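A rough sketch of that manual approach, using hypothetical Product and Review entities where Reviews is a settable collection property; the exact shape depends on your model:

    // 1. Load the parent rows without the join (no Include, no duplication).
    var products = context.Products
        .AsNoTracking()
        .Where(p => p.CategoryId == categoryId)
        .ToList();

    // 2. Load all the children for those parents in a single query.
    var productIds = products.Select(p => p.Id).ToList();
    var reviews = context.Reviews
        .AsNoTracking()
        .Where(r => productIds.Contains(r.ProductId))
        .ToList();

    // 3. Stitch them together in memory.
    var reviewsByProduct = reviews.ToLookup(r => r.ProductId);
    foreach (var product in products)
    {
        product.Reviews = reviewsByProduct[product.Id].ToList();
    }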

Only include the data you need for large queries – “YAGNI!”

For large queries where you only want some information and don’t need the full entities, try to create queries where you only select the properties relevant for your operation. This is generally done when you end up requesting so much data that you see a noticeable delay on the data transfer from the server. Populating a wrapper-object directly from the IQueryable can greatly increase performance in these scenarios.
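A hedged sketch of such a projection; OrderSummary is a made-up wrapper class, and the point is simply that only three columns travel over the wire instead of the full entities:

    public class OrderSummary
    {
        public int Id { get; set; }
        public string CustomerName { get; set; }
        public decimal Total { get; set; }
    }

    // The projection happens in the database; no full Order or Customer entities are materialized.
    var summaries = context.Orders
        .Where(o => o.Created >= from)
        .Select(o => new OrderSummary
        {
            Id = o.Id,
            CustomerName = o.Customer.Name,
            Total = o.Total
        })
        .ToList();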

Don’t post-filter the query in code – “Think of the network!”

I’ve seen countless examples of cases where a query is done to retrieve data, only to have the next line of code ignore most of the data retrieved. If you’re implementing a restrictive filter, try your best to restrict it in the actual query to ensure that only the relevant information comes back. The worst case here is when people perform a ToList() on the base query to retrieve the entire table, and then filter. This happens more often than people think.
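A small before/after sketch with a hypothetical Products table; both lines return the same result, but the first one drags the entire table across the network before throwing most of it away:

    // Post-filtering: ToList() selects the whole table, then the filter runs in memory.
    var expensive = context.Products.ToList().Where(p => p.Price > 1000).ToList();

    // Filtering in the query: only the matching rows ever leave the database.
    var expensiveOnly = context.Products.Where(p => p.Price > 1000).ToList();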

Group similar restrictive queries into one – “I know what I just said!”

Despite the previous paragraph, there are a fair few queries that should be combined if possible. As an example, I’ve seen a *lot* of instances where some information is selected by a given status, then subsequent queries do the same thing with a different status. In these instances, it’s beneficial to group them together and share the result so you only perform one trip to the database.
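For illustration, with a made-up Ticket entity and TicketStatus enum, the two separate trips and the combined variant might look like this:

    // Two round-trips for two statuses.
    var open    = context.Tickets.Where(t => t.Status == TicketStatus.Open).ToList();
    var pending = context.Tickets.Where(t => t.Status == TicketStatus.Pending).ToList();

    // One round-trip, split in memory afterwards.
    var relevant = context.Tickets
        .Where(t => t.Status == TicketStatus.Open || t.Status == TicketStatus.Pending)
        .ToList();
    var openOnly    = relevant.Where(t => t.Status == TicketStatus.Open).ToList();
    var pendingOnly = relevant.Where(t => t.Status == TicketStatus.Pending).ToList();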

Avoid the same queries in the same scope – “… but it’s so much easier!”

In more complex real-world systems, where you do more than open a connection and retrieve the hello-world text, there’s a good chance that the same query is being run several times by different controls during the same request. As an example, a web site could require bits of customer information in several places on the same page, which end up being located in different controls with no real knowledge of each other. Make your DAL able to share this information when it’s already been requested within the same page request, without requiring additional database trips.
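One simple way to do this in classic ASP.NET is a small request-scoped cache on top of HttpContext.Current.Items, which lives for exactly one request. The class name and key format below are illustrative, not part of any particular DAL:

    using System;
    using System.Web;

    public static class RequestCache
    {
        // Returns the cached value for this request, or loads and stores it on first use.
        public static T GetOrAdd<T>(string key, Func<T> load)
        {
            var items = HttpContext.Current.Items;
            if (!items.Contains(key))
            {
                items[key] = load();
            }
            return (T)items[key];
        }
    }

    // Every control on the page can ask for the customer; only the first call hits the database.
    var customer = RequestCache.GetOrAdd("customer:" + customerId,
        () => context.Customers.Find(customerId));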

Don’t convert database types in queries – “… but it looks good in LINQ!”

When you have types that don’t match, such as a string holding a number that you want to use to filter on ids, make sure you convert the value on the code side to the database type, ideally before the query. I’ve seen examples where these filters have been done the other way around, producing a query where the SQL server has to convert every id in the table to another format before it can perform the comparison and filter.
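A small sketch of the right-way-around conversion, with a hypothetical Orders table and a customer id arriving as text from the UI:

    // Convert once in code, so the generated SQL compares int to int and can use the index.
    int customerId = int.Parse(customerIdText);
    var orders = context.Orders
        .Where(o => o.CustomerId == customerId)
        .ToList();

    // The anti-pattern is writing the comparison the other way around, so the generated SQL
    // has to CAST or CONVERT the id column for every row in the table before it can filter.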

Stored Procedures can still be used – “… but it’s so boring to add to the model!”

Keep Stored Procedures as a tool in your toolbox even with an OR/M, as they’re still extremely useful in the right scenarios. Typically, if you have batch updates, cross-database joins, complicated reports or other large sets of data that need information from all over the database, it’s a good call to use a Stored Procedure rather than complicating matters with the world’s largest LINQ query. Keep in mind that it needs to be maintained independently of the solution and creates slightly more maintenance overhead as a result, so it’s not something you’d want for most things, but keep it in mind for the special scenarios.
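Calling one from EF6 is straightforward even without mapping it into the model; this sketch assumes a hypothetical dbo.GetMonthlySales procedure and a plain result class whose property names match the returned columns:

    using System.Data.SqlClient;

    public class MonthlySalesRow
    {
        public int Month { get; set; }
        public decimal Total { get; set; }
    }

    // EF6-style raw query mapped onto the plain class above.
    var rows = context.Database
        .SqlQuery<MonthlySalesRow>(
            "EXEC dbo.GetMonthlySales @Year",
            new SqlParameter("@Year", year))
        .ToList();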

Know the difference between a Queryable/DbSet and a List – “… but they look alike …”

Keep in mind when you design your DAL strategy how far up you want to pass the Queryables. Make sure that everyone working with them knows when an actual query is being performed against the database, and knows that a List is something that has already been populated from the database. This is quite essential when lazy-loading comes into play, and for making conscious decisions about when to filter data.
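The distinction in code, using a hypothetical Products set:

    // Still just a query: nothing has hit the database yet, and additional Where()
    // calls will be translated into the SQL that eventually runs.
    IQueryable<Product> query = context.Products.Where(p => p.Active);

    // Now it is a List: the query has executed, the data is in memory, and any
    // further filtering happens in the application, not in the database.
    List<Product> loaded = query.ToList();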

Cache static data – “… but we need changes NOW!”

The biggest resource-saver is implementing some sort of caching mechanism for frequently accessed data. In general, configuration-type data, type tables and other data that only changes during deployments can be cached indefinitely, either through normal MemoryCache means or by keeping the data in a static container. The tricky part comes when the frequently accessed data can change, at which point you need to determine on a case-by-case basis how long you can get away with caching it. From experience, no business wants caching; they just want the performance that comes from it.
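A minimal sketch using System.Runtime.Caching.MemoryCache for a type-table style entity; the Country entity, MyDbContext, the cache key and the 24-hour expiry are arbitrary choices for the example:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Runtime.Caching;

    public static List<Country> GetCountries(MyDbContext context)
    {
        var cached = MemoryCache.Default.Get("countries") as List<Country>;
        if (cached != null)
        {
            return cached;
        }

        // Load once, skip change tracking, and keep the list around for a day.
        var countries = context.Countries.AsNoTracking().ToList();
        MemoryCache.Default.Set("countries", countries,
            new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddHours(24) });
        return countries;
    }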

Web Server/Application

When the performance bottleneck is located in the web server, the main symptom is very high CPU utilization on the server, and there’s usually no generic suggestion for fixing it. You can diagnose issues by figuring out which areas are frequently accessed and which methods are frequently run, adding diagnostic/timing logging, and checking recently changed code if it’s a new issue.

There are, however, a few common scenarios that often cause these problems and are easy to fix.

IEnumerables filtering in other lists

With larger datasets there is an issue when you pass in an IEnumerable and filter on it inside the query of another list or enumerable, as this can cause the enumeration to happen again for every entity it’s being filtered against. A query that would have been relatively cheap with a List instead becomes a CPU nightmare, since the work multiplies with the size of both collections. I strongly recommend using ToList() instead of raw enumerables in these cases for performance reasons, and in just about every other case for similar reasons. For further optimization, filter out the entities you know won’t be a match before doing the multi-collection filtering; the fewer entities there are to choose from, the better.
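A compact example of the difference, using in-memory customers and orders lists (both hypothetical); the lazy enumerable can be re-evaluated for every order it is checked against, while the materialized set is built once:

    // Lazy: this IEnumerable may be re-enumerated for every single order below.
    IEnumerable<int> allowedIds = customers.Where(c => c.Active).Select(c => c.Id);
    var slow = orders.Where(o => allowedIds.Contains(o.CustomerId)).ToList();

    // Materialized once; a HashSet also makes each Contains() check effectively constant time.
    var allowedIdSet = new HashSet<int>(customers.Where(c => c.Active).Select(c => c.Id));
    var fast = orders.Where(o => allowedIdSet.Contains(o.CustomerId)).ToList();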

Preparing cached objects for presentation

When you have frequently accessed objects, be it cached entities from the database or simpler objects, make sure you cache them as close to “prepared for presentation” as they can be. The fact that they’re cached at all means they’re accessed fairly often. If you then need to do post-processing on these objects, for instance localizing them or grabbing information from other cached objects, that becomes a costly process which should be replaced by caching the object *after* you have performed these operations.

Looping through large lists

If you know you have large lists of objects (say 50,000+), finding an object might seem quick even if the list is cached, but when these collections are accessed excessively without thought, this becomes a major CPU drain. If you’re looking items up by id in a large list, I strongly recommend using a dictionary with the id as the key, as this will improve performance by several orders of magnitude in these scenarios.
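A short sketch, assuming a cached List<Product> with tens of thousands of entries and an integer Id:

    // Linear scan: every lookup walks the list until it finds a match.
    var product = cachedProducts.FirstOrDefault(p => p.Id == id);

    // Build the dictionary once when the cache is populated...
    var productsById = cachedProducts.ToDictionary(p => p.Id);

    // ...and every subsequent lookup is close to constant time.
    Product found;
    productsById.TryGetValue(id, out found);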

General loops

Whenever you loop there’s a chance that things don’t go as smoothly as you’d want. If you have performance issues, go through each line in the loop and make sure you’re not doing anything costly. A few simple pointers: exit the loop as early as you can, move as much code out of the loop as possible, be aware of anything performing database lookups, and always test with a worst-case number of items, since that’s how much it needs to be able to handle.

Summary

There are a lot of different pitfalls when it comes to performance, and even the best intentions can cause issues. Especially in high-performing/high-request applications, even the slightest change can have major effects on the performance and stability of an application – and it’s in everyone’s best interest to ensure that developers are informed and keep this in mind while writing code.

Azure makes this a very interesting problem, given that you’re now literally paying more in upkeep when the code performs badly. With this in mind it’s extremely important to keep your application from doing just that, as the price jump from one tier to the next can be quite noticeable on your monthly reports.

The good news is that it’s usually relatively straightforward to fix performance issues once you’re familiar with what’s causing them, and I do believe most software systems should be quite capable of running on the lower Azure tiers with optimized code.

To end with some self-promotion: if you *do* need help with performance in your EF or general .NET application, head over to http://ignitiondevelopment.co.nz/ and leave a message.

Mijan.Core.DataLayer - Quick Cache Samples

One of the things I've done quite frequently over the years to improve performance against databases is to cache items I consider relatively static. When I was writing the DataLayer wrapper I wanted to incorporate this in a very quick and easy way, so people could actually use it without the hassle normally involved in setting up caching.

With Entity Framework you'll often lazy-load items without thinking about it, thanks to the easy relations defined by the generated or code-first classes. I wanted an easy way to provide simple caching options for these as well, but the problem is that the model shouldn't really have access to data-retrieval operations in a standard tiered setup, so I've added some reflection magic that gets the job done.

The Quick Cache classes (Qc and Qsc, for Quick-Cache and Quick-Single-Cache respectively) are accessible in Mijan.Core.DataLayer. What they do is look up an implementation of IDbCache and use that implementation to access the database - if you've already defined a DalCache class as mentioned in the previous post, it's already done! You can start using Qc and Qsc as much as you want for all your caching needs!

A practical example of this would be a blog model, where you rarely add authors, or any model with enum/type foreign keys. Consider the following Code-First example in a Blog entity class.
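The original code listing isn't reproduced here, so the following is only a rough sketch of what such an entity might look like. It assumes Qsc exposes a generic Get method taking the primary key value; the exact member names in Mijan.Core.DataLayer may differ:

    public class Blog
    {
        public int Id { get; set; }
        public string Title { get; set; }

        public int AuthorId { get; set; }

        // Normal EF navigation property; accessing it lazy-loads the row from the database.
        public virtual Author Author { get; set; }

        // Cached variant, served from the in-memory Author table via Quick-Single-Cache.
        public Author AuthorCached
        {
            get { return Qsc.Get<Author>(AuthorId); }
        }
    }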


Here you can simply use blog.AuthorCached to access the cached variant, whereas blog.Author would actually load the row from the database.

What happens behind the scenes here is that the Quick Cache loads the entire Author table and maintains it in memory. The Qc class returns the List of those objects, while Qsc returns a single entry through a Dictionary approach, making it a lot more efficient for single-entry lookups. Note that Qsc will not work unless the table has a single Primary Key, as the initial reflection and the dictionary approach become excessively complex with composite keys, potentially needing tuples to maintain the key values. Simply use Qc instead in those scenarios.

An example of how to use Qc follows. In this case we want to cache all the posts for a given blog, as accessed through the code-first entity.
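As with the previous listing, the original sample is missing, so this is a hedged sketch assuming Qc exposes a Get method that returns the cached List of the whole Post table; the property then filters that list in memory instead of querying per blog:

    public partial class Blog
    {
        // All posts for this blog, taken from the cached Post table rather than the database.
        public List<Post> PostsCached
        {
            get { return Qc.Get<Post>().Where(p => p.BlogId == Id).ToList(); }
        }
    }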


It's important to note that by default everything grabbed through this approach is detached from its context. If you for some reason need the context attached, there is a GetWithContext for that as well, but I'd strongly recommend that you really know what you're doing in that case, and that you know how to handle potential errors coming from objects attached to "different contexts".

If you need to refresh data for whatever reason, Qc.Clear<T>(); and Qsc.Clear<T>(); will do the trick. Also, due to the nature of reflection and potential errors, an exception will be thrown if you try to use this when there's anything but *one* instance of IDbcContext defined.