Jan Heggernes

. . . systematic randomness . . .

ASP.NET, Azure, Database, EntityFramework Performance Pointers

Introduction

Over the last few years, I’ve been involved in optimizing performance in a respectable number of projects made by a lot of different developers. The quality of the codebase in each project has varied from the poster child for best-practice architecture to pure and utter insanity involving SQL queries being constructed inline in ASP.NET pages.

An interesting thing I’ve found is that the architectural patterns aren’t really indicative of the project’s overall performance. Regardless of “good” or “bad” code, there are mistakes being made, methods needing optimization and databases needing to work more than they should.

I want to describe the most common issues here, along with the steps one can take to fix them and make the world a better and faster place.

I’m mentioning Azure in this topic as I’ve been involved in migrating and/or improving performance for quite a few Azure sites. The initial reaction on migration is generally that “Azure is slow!”, which usually tends to be indicative of a bigger problem. The fact is that Azure works fine, but it highlights the performance bottlenecks in an application, especially when it comes to database access. An on-site local database server will most likely be a lot more forgiving than an Azure DTU plan.

In general there are two main areas which end up as performance bottlenecks nowadays: the web server or the database server. The web server can end up being the bottleneck if you have a lot of CPU-bound operations, such as repeated loops to populate objects with various information, or simply preparing and padding large amounts of data. The database server can have similar CPU issues, usually as a result of complex queries, procedures/long-running jobs or simply excessive amounts of queries. In addition, there’s the possibility of huge amounts of data being requested, which might not be CPU-intensive, but which will cause the application to wait for a longer period of time while the data is being transferred.

I’d say that roughly 80% of the performance issues I’ve dealt with are related to interactions with the database server, so if you’re experiencing issues in your application, that’s usually the best place to start looking. Note that this doesn’t mean that things can be fixed exclusively on the server; in most cases it’s the actual code that needs to be modified to improve performance. To troubleshoot these issues you need to have a good grasp of SQL, know what queries are reasonable, and understand how execution plans work.

Database Performance

The primary tool you want to use to locate most of these issues is an SQL Profiler. I’d recommend using an actual SQL database profiler such as Microsoft’s SQL Server Profiler instead of the pseudo-profilers that are attached to ASP.NET applications, as the latter can’t measure database statistics like reads/writes/CPU, only duration. They’re also in my experience not 100% reliable in cases where you have threads/tasks or other web-requests firing off database queries.

I’m not going to write any pointers about how to fix issues exclusive to the SQL server, as this requires more knowledge than simple pointers can provide. In general though, these problems are usually related to missing indexes, complex procedures and inefficient views, which already require a good understanding of databases in general to improve and, most importantly, not make worse.

Entity Framework (EF)

I’m generally in favor of using EF as an OR/M, and I’m describing specific scenarios with EF here in detail. The concepts will most likely translate to other OR/Ms or data-access strategies as well. EF simplifies data-access, but you need to be aware of how it does this to generate code which will translate well into SQL. On a side-note, if you’re one of the people claiming that EF (or most mainstream OR/Ms) is “slow and horrible”, there’s a good chance you’re doing something in an inefficient way. Just because you can stab yourself with a pair of scissors, doesn’t mean it’s not extremely useful for cutting paper.

These pointers are based on actual real world issues I’ve come across.

Simplify complex LINQ EF queries – “Just because LINQ accepts it, doesn’t mean it’s a good idea!”

Some of the LINQ queries I’ve seen tend to get more than a little over the top once they’re translated into SQL. When you’re trying to obtain data from the database, be as straight-forward as you can; don’t try to do something extremely fancy. If you need to join in data from all over the place, group and summarize bits of it, filter parts of it and only retrieve a tiny bit of information – you should consider a dedicated Stored Procedure, or split the LINQ EF query into pieces to make it easier on the database. These issues are usually found by looking for high CPU/reads in SQL Profiler.

Include related tables when relevant – “Lazy-loading means the database gets busy!”

When you’re querying a table and always use the related tables, consider adding .Include(relatedTable) to ensure the related entities are loaded up front rather than queried individually. If you query a list of products and always want to access their metadata in another table, including it prevents N+1 selects, where N is the number of products.
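A minimal sketch of the idea, assuming hypothetical Product/Metadata entities on a DbContext:

```csharp
// using System.Data.Entity;  // provides the lambda-based Include extension method

// Without the Include, accessing product.Metadata later lazy-loads one query per product.
// With it, the metadata rows come back in the same query as the products.
var products = context.Products
    .Include(p => p.Metadata)
    .Where(p => p.IsActive)
    .ToList();
```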

Don’t *always* include related tables – “Too much data makes everything slow!”

If you’re querying a table with a 1:VeryMany relation, EF handles an include by doing a standard join. This means that the data on the 1-side of the relation will be duplicated once per related row before it’s transmitted from the database server. If there are enough rows combined, especially if the table being duplicated has a ton of data, this will often cause delays. If you’re only retrieving a single row, simply removing the include statement will cause it to lazy-load the needed data fairly efficiently. If you have a lot of rows returned, you will instead run into the N+1 issue from the previous paragraph, which creates a lot of queries during lazy-loading. In this case it can be beneficial to eager-load the related entities manually: first retrieve the rows in the primary table, then retrieve the rows from the joined table using the ids from the first table as a parameter, and finally connect the rows manually.
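A rough sketch of the manual eager load, with made-up Order/OrderLine entities: two simple queries instead of one join that duplicates the order columns once per line.

```csharp
var orders = context.Orders
    .Where(o => o.CustomerId == customerId)
    .ToList();

var orderIds = orders.Select(o => o.Id).ToList();

var linesByOrder = context.OrderLines
    .Where(l => orderIds.Contains(l.OrderId))
    .ToList()
    .ToLookup(l => l.OrderId);

// Connect the rows manually instead of relying on a join or per-row lazy-loading.
foreach (var order in orders)
{
    var lines = linesByOrder[order.Id];
    // ... use order + lines ...
}
```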

Only include the data you need for large queries – “YAGNI!”

For large queries where you only want some information and don’t need the full entities, try to create queries where you only select the properties relevant for your operation. This is generally done when you end up requesting so much data that you see a noticeable delay on the data transfer from the server. Populating a wrapper-object directly from the IQueryable can greatly increase performance in these scenarios.
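As a rough sketch, assuming a hypothetical Product entity and a small ProductSummary wrapper class defined elsewhere:

```csharp
var summaries = context.Products
    .Where(p => p.IsActive)
    .Select(p => new ProductSummary
    {
        Id = p.Id,
        Name = p.Name,
        Price = p.Price
    })
    .ToList();
// Only Id, Name and Price appear in the generated SQL; the rest of the Product
// columns never leave the database server.
```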

Don’t post-filter the query in code – “Think of the network!”

I’ve seen countless examples of cases where a query is done to retrieve data, only to have the next line of code ignore most of the data retrieved. If you’re implementing a restrictive filter, try your best to restrict it in the actual query to ensure that only the relevant information comes back. The worst case here is when people perform a ToList() on the base query to retrieve the entire table, and then filter. This happens more often than people think.
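The pattern to avoid versus the fix, with a hypothetical Order entity:

```csharp
// Bad: ToList() pulls the whole table across the wire, then filters in memory.
var bad = context.Orders.ToList().Where(o => o.CustomerId == customerId).ToList();

// Good: the filter becomes part of the SQL WHERE clause, so only relevant rows come back.
var good = context.Orders.Where(o => o.CustomerId == customerId).ToList();
```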

Group similar restrictive queries into one – “I know what I just said!”

Despite the previous paragraph, there are a fair few queries that should be combined if possible. As an example, I’ve seen a *lot* of instances where some information is selected by a given status, then subsequent queries do the same thing with a different status. In these instances, it’s beneficial to group them together and share the result so you only perform one trip to the database.
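A sketch of the idea, assuming a hypothetical Order entity with an int StatusId:

```csharp
// Instead of one trip per status...
var open = context.Orders.Where(o => o.StatusId == 1).ToList();
var held = context.Orders.Where(o => o.StatusId == 2).ToList();

// ...fetch both statuses in a single round trip and split the result in memory:
var statuses = new[] { 1, 2 };
var orders = context.Orders.Where(o => statuses.Contains(o.StatusId)).ToList();
var openOrders = orders.Where(o => o.StatusId == 1).ToList();
var heldOrders = orders.Where(o => o.StatusId == 2).ToList();
```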

Avoid the same queries in the same scope – “… but it’s so much easier!”

In the more complex systems in the real world, where there’s more going on than opening a connection and retrieving the hello-world text, there’s often a good chance that the same query is being requested several times by different controls during the same request. As an example, a web site could require bits of customer information in several places on the same page, which end up being located in different controls with no real knowledge of each other. Make your DAL able to share this information within the same page request, if it has already been requested, without requiring additional database trips.
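One way to do this is a per-request memo in HttpContext.Items (System.Web), sketched here with hypothetical MyDbContext/Customer names, so the first control to ask for the customer pays for the query and the rest reuse the result:

```csharp
public static Customer GetCustomer(MyDbContext context, int customerId)
{
    var key = "Customer_" + customerId;
    var items = HttpContext.Current.Items;

    if (!items.Contains(key))
    {
        items[key] = context.Customers.Find(customerId);   // only the first caller hits the database
    }
    return (Customer)items[key];
}
```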

Don’t convert database types in queries – “… but it looks good in LINQ!”

When you have types that don’t match, such as a string value holding a number that you want to use to filter on ids, make sure that you convert the code-side value to the database type, ideally before the query. I’ve seen examples where these filters have been done the other way around, which creates a query where the SQL server needs to convert all the ids in the table to another format before it can perform the comparison and filter.
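A minimal sketch, assuming a hypothetical Products table with an int Id and an incoming string value:

```csharp
int id = int.Parse(idText);                                   // convert once on the code side
var product = context.Products.FirstOrDefault(p => p.Id == id);
// The comparison now happens against the int column directly, instead of SQL Server
// having to convert every Id in the table before it can filter.
```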

Stored Procedures can still be used – “… but it’s so boring to add to the model!”

Keep Stored Procedures as a tool in your toolbox even with an OR/M, as they’re still extremely useful in the right scenarios. Typically, if you have batch updates, cross-database joins, complicated reports or other larger sets of data that need information from all over the database, it’s a good call to utilize a Stored Procedure over trying to complicate matters with the world’s largest LINQ query. Keep in mind that it needs to be maintained independently of the solution and creates a slightly bigger maintenance overhead as a result, so it’s not something you’d want to do for most things, but keep it in mind for the special scenarios.
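A sketch of calling a stored procedure through a DbContext rather than building a giant LINQ query; the procedure name and result class here are hypothetical:

```csharp
var rows = context.Database.SqlQuery<MonthlySalesRow>(
        "EXEC dbo.GetMonthlySalesReport @Year, @Month",
        new System.Data.SqlClient.SqlParameter("@Year", 2016),
        new System.Data.SqlClient.SqlParameter("@Month", 3))
    .ToList();
```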

Know the difference between a Queryable/DbSet and a List – “… but they look alike …”

Keep in mind when you design your DAL-strategy how far up you want to pass the Queryables. Make sure that everyone working with them knows when an actual query is being performed against the database, and knows that a List is something that has already been populated from the database. This is quite essential when lazy-loading comes into play, and making conscious decisions about when to filter data.
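The distinction in a nutshell, with a hypothetical Order entity:

```csharp
IQueryable<Order> query = context.Orders.Where(o => o.CustomerId == customerId);
// Nothing has hit the database yet; further .Where/.Select calls still shape the SQL.

List<Order> list = query.ToList();
// The query executed on this line; any filtering from here on happens in memory
// on data that has already been transferred.
```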

Cache static data – “… but we need changes NOW!”

The biggest resource-saver is implementing some sort of caching mechanism for the frequently accessed data. In general, configuration-type data, type tables and other data that only changes during deployments can be cached indefinitely, either through normal MemoryCache means or by keeping the data in a static container. The issue comes when the frequently accessed data can change, at which point you need to determine on a case-by-case basis how long you can get away with caching it. From experience, no business wants caching; they just want the performance that comes from it.
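A minimal sketch using MemoryCache (System.Runtime.Caching) for a hypothetical type-table; tune the expiration to what the business can live with.

```csharp
public static List<Country> GetCountries(MyDbContext context)
{
    var cached = MemoryCache.Default.Get("Countries") as List<Country>;
    if (cached == null)
    {
        cached = context.Countries.ToList();
        MemoryCache.Default.Set("Countries", cached, DateTimeOffset.Now.AddHours(24));
    }
    return cached;
}
```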

Web Server/Application

When the performance bottleneck is located on the web server, the main symptom is very high CPU utilization on the server, and there’s usually no generic suggestion for fixing it. You can diagnose issues by figuring out which areas are frequently accessed and which methods are frequently run, adding diagnostic/timing logging, and checking recently changed code areas if it’s a new issue.

There are, however, a few common scenarios which are easy to fix that often will cause these problems.

IEnumerables filtering in other lists

In larger datasets, there is an issue where you pass in an IEnumerable and filter on it inside the query of another list or enumerable, as this can cause the enumeration to happen again for every entity it’s being filtered against. A query that would have been relatively cheap with a List instead becomes a CPU nightmare. I strongly recommend using ToList() instead of enumerables in these cases for performance reasons, and in just about every other case for similar reasons. For further optimization, make sure you filter out the entities you know won’t be a match before doing the multiple-query filtering; the fewer available entities to choose from, the better.
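A sketch of the problem and the fix (names here are hypothetical):

```csharp
IEnumerable<int> relevantIds = GetRelevantProductIds();        // possibly a lazy enumerable
var slow = orders.Where(o => relevantIds.Contains(o.ProductId)).ToList();
// ^ relevantIds can be re-enumerated once per order being checked

var idSet = new HashSet<int>(GetRelevantProductIds());         // materialized once, O(1) lookups
var fast = orders.Where(o => idSet.Contains(o.ProductId)).ToList();
```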

Preparing cached objects for presentation

When you have frequently accessed objects, be they cached entities from the database or simpler objects, make sure you cache them as “prepared” for presentation as they can be. The fact that they’re cached at all means they’re accessed fairly often. If you then need to do post-processing on these objects, for instance localize them, grab information from other cached objects and so on – that becomes a costly process which should be replaced by caching the object *after* you have performed these operations.

Looping through large lists

If you know you have large lists of objects (say 50,000+), finding an object might seem quick even if the list is cached, but when these collections get accessed excessively without thought, this becomes a major CPU drain. If you’re doing lookups based on an id in a large list, I strongly recommend using a dictionary with the id as the key, as this will improve performance by several orders of magnitude in these scenarios.
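A sketch with a hypothetical cached product list of 50,000+ items:

```csharp
var productsById = cachedProducts.ToDictionary(p => p.Id);     // build once, e.g. when caching

// Instead of cachedProducts.FirstOrDefault(p => p.Id == id) on every lookup:
Product product;
productsById.TryGetValue(id, out product);
```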

General loops

Whenever you loop there’s a chance that things don’t go as smoothly as you want. If you have performance issues, go through each line in a loop and make sure you don’t do anything costly. A few simple pointers: exit the loop as early as you can, move as much code out of the loop as possible, be aware of anything performing database lookups, and always test with a worst-case number of items, as that’s how much it needs to be able to handle.

Summary

There are a lot of different pitfalls when it comes to performance, and even the best intentions can cause issues. Especially in high-performing/high-request applications, even the slightest change can have major effects on the performance and stability of an application – and it’s in everyone’s best interest to ensure that developers are informed and keep this in mind while developing code.

Azure makes this a very interesting problem given that you’re now technically paying more upkeep (literally) when the code is performing badly. With this in mind it’s extremely important to keep your application from doing just that, as the price jump from one tier to the next can be quite noticeable on your monthly reports.

The good news is that it’s usually relatively straightforward to fix issues related to performance once you’re familiar with what’s causing it, and I do believe most software systems should be quite capable of running on the lower tiers of Azure with optimized code.

To end with some self-promotion: if you *do* need help with performance in your EF or general .NET application, head over to http://ignitiondevelopment.co.nz/ and leave a message.

Entity Framework - Code First or Database First?

One of the questions that tends to pop up in most new (and old) .NET projects using Entity Framework nowadays is which approach to go for: Code First or Database First.

The Code First approach is relatively new and gives developers the ability to essentially forget that there's a database involved. You simply create the classes you want represented as database tables and add properties and related classes, and the framework generates all the required tables and constraints; you never even have to know what SQL is to be up and running.

On the other hand, the Database First approach involves creating the data model in the database, normally using SQL Management Studio or some other tool: creating all tables, specifying columns and adding Primary Keys, Foreign Keys and various constraints as needed. The framework then extracts the database schema and creates the .NET classes for you.

Apart from this process, the development is pretty much exactly the same, and developers working at the higher tiers don't have to know which approach was used, as they would use the generated code in the exact same way regardless.

Why then does it matter? Where is the difference?

For developers, the ease of never having to use any database tools immediately gives Code First an advantage over DB First. It's quite simply much easier and quicker to add a property to a class in the code than it is to add a column to a table, as the latter generally also means generating a database script to be able to use the added column in production, as well as regenerating the model once the column is added, while all of this is covered automatically by Code First and its "Migrations" for database changes.

For DBAs, well, if you have DBAs in your organization or projects, as a developer you likely don't have much say in the matter, and projects will be DB first regardless. However, if you're a developer, you might want explicit control over how the data model looks and how it is constructed, and the best way to do this, is to do it at the database level.

What to choose then?

Personally, all my recent personal projects are Code First, and I don't see that changing any time soon. I might also endorse this approach for limited team projects or smaller data storage projects, but that's where it ends. For any major team effort, or major high-performing data storage project, I simply can't encourage this approach as opposed to creating the database first.

I will start off my reasoning by saying that in my career, I've generally been the go-to guy when something feels sluggish, or a page is slow. I'm just as comfortable looking at SQL as I am looking at code, and I've come to realize that I'm not the typical developer due to my high focus on performance which involves among other things always keeping SQL Profiler running on a second monitor while I'm developing and testing.

When I use Code First, I always verify the migration scripts thoroughly and ensure they're doing *exactly* what I want them to do. It's very easy, especially with relationships, to mess up and accidentally add unintended columns, join tables or missing foreign keys if you don't have the right attributes on your classes. I also verify that everything looks alright in SQL Management Studio once the changes have been made.

This is the bare minimum I'd expect *anyone* to do if they're serious about their development work. The problem here is that most developers won't do this. It's an unfortunate fact for those of us who "live and breathe" development, you know the type - we get twitchy eyes and boiling blood when we see someone using a capitalized variable name for a scope variable, messing up the code "feng shui" that's implicit when we open any solution - not everyone is like this. For some (most?) this is "just a job", where the goal is simply to be done with your tasks/backlogs/defects in a timely manner and cash your paycheck. This is a reality of life, all developers are different, and I'm not saying everyone should be the embodiment of Code Complete 2nd edition, just that you should expect and anticipate that some developers want to do exactly what they're tasked with in the least complicated manner, regardless of the implications.

As such, for general development in major projects and/or high performing solutions, I will strongly recommend database first. The developers who may not know how to do anything in SQL shouldn't be tasked with model changes that could potentially hugely affect database performance regardless. Leave that to the people who can and, most importantly, care about the overall health of the model and the performance of the database.

Mijan.Core - Service Overview

One of the more "common" tasks in computing in general is to have some sort of service running in the background doing all kinds of different things, and as a developer, I feel it's quite nice to just have a very basic framework I can plug into whenever I need something run as a service.

In Mijan.Core I have a Service namespace which contains the basics for your own Windows service, complete with an installer and the ability to debug as well. This is a very simplistic concept which enables people to build complex services and quickly plug them into a Windows service.

I've defined IService as the base interface for any service: a service needs to be able to start, stop and have a name for identification purposes. The Start method also takes an IServiceHandler as a parameter in case the service needs to interact with other services running in the same application; in general this parameter can be ignored.
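Based purely on that description, the contract looks something like the sketch below; the exact member signatures in Mijan.Core.Service may differ.

```csharp
public interface IService
{
    string Name { get; }                     // identification purposes
    void Start(IServiceHandler serviceHandler);
    void Stop();
}
```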

For the basic executable service, all you have to do is create a console application (not a service!) and have the following in the main entry point.


You can also add another optional parameter after args to specify the name of the service as it would appear in services, otherwise it'll default to MijanService.

I've made a basic TimerService implementation that resides in Mijan.Core.Service; it performs a given task every X minutes during a specified time interval, or simply every X minutes all the time. For a very simple implementation of this, look at the following.


This is essentially all that's needed for a service to execute something every minute, in this instance TimerTick() will write the current time to the Console.

For the service to pick up on this, however, it needs to be plugged into the app.config file. MijanService needs its own little configurationSection, and that's handled as such.


The HelloService line is the timer just implemented above, with its name and namespace as well as the assembly where it's located. The second example shows how to specify custom values in the configuration XML. In this case I've added a few properties in another service, and need a separate class for the XmlConfiguration, as well as a Service to handle it. As long as I have a service constructor with an XmlElement as its parameter, the framework will pass the service element into the constructor to handle. For this example, the classes look like this:

It's important to note the [XmlRoot("service")] attribute, as that's the XmlElement being passed into the constructor.

The service executable takes 3 different parameters when you launch it, either install, uninstall or debug. If you use install, it will be added to the computer's list of services, and will function as any other service in terms of start/stop/recovery. Uninstall obviously uninstalls it, while debug will launch the service in the console window, so you can essentially have it running in a controlled environment before throwing it into fully automated service-mode.

The result and output from running this service in debug mode will be ...


Both services specified in the configuration get to run in their own threads and have their fun!

Mijan.Core.DataLayer - Quick Cache Samples

One of the things I've done quite frequently over the years to improve performance against databases is to cache items which I consider relatively static. When I was writing the DataLayer wrapper I wanted to incorporate this in a very quick and easy way to ensure people could actually use this without the amount of hassle normally involved in setting up caching.

With Entity Framework you'll often lazy-load items without even thinking about it, due to the easy relations defined by the generated or code-first classes. I wanted an easy way to sort out caching options for these as well, but the problem is that the model shouldn't really have access to data-retrieval options in a standard tiered setup, so I've added some reflection magic that gets the job done.

The Quick Cache classes (Qc and Qsc, for Quick-Cache and Quick-Single-Cache respectively) are accessible in Mijan.Core.DataLayer. What they do is look up an implementation of IDbCache and use this implementation to access the database - if you've already defined a DalCache class as mentioned in the previous post, it's already done! You can start using Qc and Qsc as much as you want for all your caching needs!

A practical example of this would be a blog model, where you rarely add authors, or any model with enum/type foreign keys; consider the following Code-First example in a Blog entity class.
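The idea is roughly along these lines; the Qsc call shown here is an assumption for illustration, as the actual Mijan.Core.DataLayer API may differ.

```csharp
public class Blog
{
    public int Id { get; set; }
    public int AuthorId { get; set; }
    public virtual Author Author { get; set; }      // normal navigation property (hits the database)

    public Author AuthorCached
    {
        get { return Qsc.Get<Author>(AuthorId); }   // assumed single-entry lookup from the in-memory cache
    }
}
```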


Here you can simply use blog.AuthorCached to access a cached variant, where the blog.Author would actually load the row from the database.

What happens behind the scenes here is that the Quick Cache will load the entire Author table and maintain it in-memory. The Qc class will return the List of those objects, while Qsc will return a single entry through a Dictionary approach, making it a lot more efficient for single-entry lookups. A note here is that Qsc will not work unless there's a single Primary Key on the table, as the initial reflection and the dictionary approach would become excessively complex with composite keys, potentially needing tuples to maintain key values. Simply use Qc instead for these scenarios.

Example of how to use Qc as follows, in this case, we want to cache all the posts for a given blog, as accessed through the code-first entity.


It's important to note that by default everything grabbed through this approach will be detached from its context, if you for some reason need the context attached, there is a GetWithContext for that as well, but I'd strongly recommend that you really know what you're doing in that case, and that you know how to handle potential errors coming from objects attached to "different contexts".

If you need to refresh data for whatever reason, Qc.Clear<T>(); and Qsc.Clear<T>(); will do the trick. Also, due to the nature of reflection and potential errors, there will be an exception thrown if you try to use this and there's anything but *one* instance of IDbcContext defined.

Mijan.Core.DataLayer NuGet Package - Overview

I've worked on a fair few projects lately, and I always use my own little custom wrapper around Entity Framework for these projects. It abstracts away the storage of the context by defaulting to per-HttpContext storage, or per-thread storage in the case of non-web projects.

Other people have expressed interest in using the same approach, and as I'm constantly refining and upgrading what I'm using I've decided to take the step and actually create a NuGet package for this little tool.

This is the initial overview of the DataLayer package, and how to quickly get started using it.

The DataLayer package is named Mijan.Core.DataLayer, and can be installed through the NuGet Package Manager, or console by doing Install-Package Mijan.Core.DataLayer

There's technically support for both the legacy ObjectContext and the new DbContext, but I'm primarily focusing on developing the DbContext part as that's the "surviving" way forward.

The quickstart for getting up and running with this wrapper is to create your DbContext either through Code First or Db/Model First, then wrap the generated context with your own class name, as in the example here.


There are other variants which offer non-static options, but this is the quickest and easiest way to manage your data. Once this is done, you can easily retrieve and manipulate data through the Dal classes you've "created".


In addition to this, the DalCache provides built-in MemoryCache wrapping.


This is just a very quick introduction to what the primary features are in the DataLayer NuGet package. Feel free to comment, I'll most likely add some more information about this package and the Mijan.Core package soon.

Common Entity Framework Errors and Performance Tips

I wanted to write a quick entry about Entity Framework and just point out some of the errors I’ve been asked to resolve, and some of the worst uses of EF I’ve seen performance-wise. All of these are done with the default EF4.0 settings and default entities with foreign keys exposed.

 

I should note that there are probably a ton of errors more frequent than these in the world, but this is based on my personal experience supervising and helping others use Entity Framework. If you have an error which keeps happening with your EF-application, add it to the comments below and I’ll try to write something about it in a future post.

 

Entity Framework Errors

System.InvalidOperationException: The relationship between the two objects cannot be defined because they are attached to different ObjectContext objects.

This error appears when you use entity objects from different contexts when specifying relations and attempt to persist the changes to the database. Seeing how the state of the two objects is maintained by two different context instances, they can’t really know the exact state of the other object and, as such, refuse to go along with it. Typically this happens in real situations when you cache an entity or simply have a project which is designed to use more than one ObjectContext.

 

The best way to resolve this is to be consistent in terms of assigning related entities. For the most part I would recommend exposing the foreign key column and setting the id of the related entity directly rather than specifying the entire entity. When this isn’t an option, you either have to refresh the entity in the same context as the entity you want to add relations to, or explicitly detach the entity from its old context and attach it to the new one. The latter of these could be dangerous if it’s also referenced by its old context, so make sure you know what you’re doing if you go with that approach.
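A sketch of the foreign-key approach with hypothetical entities: assign the id, not the entity that belongs to another context.

```csharp
var cachedCustomer = GetCustomerFromCache(customerId);   // tracked by (or detached from) another context

using (var context = new MyEntities())
{
    var order = new Order();
    order.CustomerId = cachedCustomer.CustomerId;        // no cross-context relationship is created
    // order.Customer = cachedCustomer;                  // <- this is what triggers the exception
    context.Orders.AddObject(order);
    context.SaveChanges();
}
```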

 

System.ObjectDisposedException: The ObjectContext instance has been disposed and can no longer be used for operations that require a connection.

This appears most often when you try to lazy-load entities related to an ObjectContext instance which has been disposed one way or another. I’ve seen quite a few instances of code which is similar to this approach:
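A minimal sketch of the kind of code in question (entity names are hypothetical):

```csharp
public List<Product> GetProducts()
{
    using (var context = new ShopEntities())
    {
        return context.Products.ToList();
        // the context is disposed here, before anyone lazy-loads product.ProductModel
    }
}
```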

 

 

which will subsequently die horribly due to the disposed context as a result of the using block. The typical real-world scenario here is that you have a function which returns a list of products, that list is then databound to a ListView in a web application, and ProductModel.Name is evaluated there, resulting in much anger directed towards Entity Framework in general and a lot of creative workarounds as the deadline approaches.

 

To resolve this peacefully, a few different approaches should be considered. First off, if you maintain your ObjectContext in a way which lets garbage collection deal with it, you shouldn’t have this problem at all unless you explicitly detach objects, in which case you’d get a NullReferenceException instead of this exception.

 

If you explicitly dispose the context like it’s done here, you should always ensure that you have all the information you need by the time the context is disposed. If you work with more complex relations, you might want to consider creating some wrapper classes and populate them with all information needed, or simply specify include hints in the retrieval of the original object.

 

Entity Framework Common Performance Tips

First off, I would like to just mention that you really should run SQL Profiler or something to that extent while developing your EF-applications, especially if you’re new at this and not entirely sure what happens behind the scenes when you design your elegant LINQ queries. The ease of developing using Entity Framework as your data-layer comes with a certain responsibility to at least try to make efficient code. I also know some DBA types who really want to avoid the use of any O/RM due to the SQL it generates, so please think of the overall reputation of EF and try to keep things optimized. Again, these issues and suggestions I post here are based on my personal experiences with Entity Framework and optimizing actual code.

 

Do not ToList() before filtered iteration

Even though this one is simple, I’ve seen it slip through quite a few times. The standard way of messing up is to specify a query, add .ToList() to execute it, and then filter because we only want to see the first 500 entries anyway. When you’re developing on your own PC with a local SQL Server, you don’t really notice anything unless you’re profiling at the same time. In reality, you’re retrieving the entire table before applying the filter, resulting in a lot more data being transferred than needed. Simply put, make sure you filter/select before you .ToList() a collection for re-use.
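The mistake versus the fix, with an AdventureWorks-style table name assumed:

```csharp
var everything = context.SalesOrderHeaders.ToList().Take(500).ToList();
// ^ the entire table is transferred, then trimmed in memory

var firstFiveHundred = context.SalesOrderHeaders
    .OrderBy(s => s.SalesOrderID)
    .Take(500)
    .ToList();
// ^ SELECT TOP (500) ... is part of the generated SQL
```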

 

Related collections are *not* IQueryables

Something which is easy to forget, even if you laughed at the point above, is that the related collections in a one-to-many relationship are not IQueryables, but EntityCollections. As a result, if you have a customer object and want to display its order number 123, doing customer.Orders.Where(o => o.OrderId == 123) is the same as retrieving all the orders for that customer from the database, then looking through the collection to find the right one. Rewrite this to retrieve the order based on both customerId and orderId to ensure you don’t end up with a lot more data than you need.
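A sketch of the difference (entity and property names are assumptions):

```csharp
var slow = customer.Orders.Where(o => o.OrderId == 123).FirstOrDefault();
// ^ loads every order for the customer, then filters in memory

var fast = context.Orders
    .FirstOrDefault(o => o.CustomerId == customer.CustomerId && o.OrderId == 123);
// ^ the filter is part of the SQL, only one row comes back
```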

 

Be *very* careful when putting entities in ViewState

Avoid this as best you can; if you have to maintain objects in this fashion, try to make wrapper classes. Full entities in viewstate tend to be fairly huge compared to the relevant information you actually want to keep track of, not to mention that if you have relations included, it grows even bigger.

 

Avoid instantiating unnecessary ObjectContext-instances.

This is more of a CPU-performance issue, as the cost of creating the context is fairly high, so try to limit these and re-use a context as best you can. Please do not put these things inside loops or loop-like scenarios such as partial-class properties.

 

Includes

As EF4 rolled in and people could lazy-load without worrying about what was actually in the current context, it seemed to have the added effect of making developers a bit more lazy as well. The typical scenario where you really notice the improvement of a well placed include or two comes when you load up a decent set of data and databind it in an application, evaluating left and right across relations in the database, and EF simply serves up the data without any question at all.

 

It’s also very hard to give specific and definitive suggestions regarding when to use includes and when not to, as every situation tends to be different. Also, you need to consider whether or not this is for general use, or if it’s very specific for your one little area of code.

 

 

Here’s an example of 3 different ways to do the same thing; the only difference is the amount of data being transferred and the number of queries/amount of stress being put on the SQL Server. For this example I want a report of 1000 sales from the AdventureWorks database, and I simulate requiring information from two related tables, similar to how I’d do it for a report displayed nicely in a listview.
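A sketch of the three variants; the entity and navigation names are approximations of the AdventureWorks model.

```csharp
// 1) Lazy loading: 1 query for the details + up to 2 extra queries per row.
var lazy = context.SalesOrderDetails.Take(1000).ToList();
foreach (var d in lazy)
{
    Console.WriteLine(d.LineTotal + " " + d.Product.Name + " " + d.SalesOrderHeader.OrderDate);
}

// 2) Projection: one query returning only the columns the report needs.
var projected = context.SalesOrderDetails.Take(1000)
    .Select(d => new { d.LineTotal, ProductName = d.Product.Name, d.SalesOrderHeader.OrderDate })
    .ToList();

// 3) Includes: one query returning the full rows from all three tables.
var included = context.SalesOrderDetails
    .Include("Product")
    .Include("SalesOrderHeader")
    .Take(1000)
    .ToList();
```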

 

Now, the first method is slightly hard on my database with the amount of queries it’s performing, while the second one is the kindest, as it only requests the exact data it needs and returns it in an anonymous class, and the last one returns the same as the first one, only it does it in one query behind the scenes.

 

The obvious downside of the second option is that it’s an anonymous class and not very flexible. Creating a wrapper class to contain the specific information is okay for transferring it to other tiers of the application, but going down this road will most likely result in an insane amount of wrapper classes for a ton of different data subsets.

 

The third option provides the best option for flexibility when you need a lot of different data from all of these tables, but if you then re-use that method elsewhere and only want the specific information in the sales table, you’re all of a sudden getting a lot of extra information you’re not going to use.

 

So, as I’m unable to give one definitive answer here in terms of what to do, I can’t put enough emphasis on the importance of profiling your application while you’re developing. Go through the different database calls being done by your application on each page, and make sure each of them can be justified. You should also note that if you specify multiple includes, you really should ensure that it looks sane enough in a profiler and has a decent execution plan. Also, if you’re retrieving insane amounts of data, you really should avoid includes due to the sheer amount of data it will generate. If you’re loading a lot of metadata for caching, you should have a look at how the lazy loading works in this post, and keep messing around with SQL profiler to get the quickest loading time possible.

 

An additional note here on includes. I’ve seen a few generic data-layer extensions which work with EF, but I haven’t seen one which enables you to specify include paths – that doesn’t fit very well with the generic pattern in general, as it’s something which must be done directly on the ObjectContext table. Keep this in mind when you decide on such patterns and practices.

Entity Framework, Context and Lazy-Loading

I wanted to write up a short little entry about how the ObjectContext and lazy-loading work in situations where you typically lazy-load entities. In particular this is important to keep in mind when you maintain the context for longer periods of time, or cache the context/entities in one way or another.

 

First off, the EF object context maintains the information about the entities it has loaded as long as the tracking option is enabled – by default this is enabled for EF4 with standard settings. Now, the only time these entities are served from the context rather than by querying the database directly is when they are referred to from another related entity.

 

As an example of this, we can do (using the AdventureWorks DB)
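A sketch of the behaviour described below, using the AdventureWorks Product/ProductModel tables:

```csharp
var product = context.Products.First(p => p.ProductID == 680);      // query 1
var model   = product.ProductModel;                                 // query 2 (lazy load)

var sameProduct = context.Products.First(p => p.ProductID == 680);  // query 3 - hits the database again
var sameModel   = sameProduct.ProductModel;                         // no query - served from the context
```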

 

This will result in 3 database queries as noted in the comments. It’s important to note that even though we already have the product with ID 680 stored in the context as a result of the first query, it still performs an additional query to retrieve the same object, as opposed to the lazy loaded entity which EF will be content to retrieve from the context once it’s in there.

 

This holds true regardless of where the information has been loaded from. Take the following example
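Sketched roughly like this:

```csharp
// Load the whole ProductModel table into the context up front...
var allModels = context.ProductModels.ToList();                     // 1 query

// ...and lazy-loading a product's model never has to touch the database.
var product = context.Products.First(p => p.ProductID == 680);      // 1 query
var model = product.ProductModel;                                   // no query - already tracked by the context
```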

 

Here we simply select all the ProductModel entities up front, which results in them being added to the object context even if they’re not “used” for anything.

 

This is all good when lazy-loading relatively static data. However, if the entities being lazy-loaded are changed through another context or in the database directly, this will potentially result in out-of-date data being used. In this case, if someone were to change the ProductModel relevant for our product, it wouldn’t refresh in our application as long as the object context lived, even if you “refresh” the main product. This can result in some unpredictable results and weird errors if you’re not aware of how it works.

 

If you find yourself in a scenario where you maintain the object context for a longer period of time, and think this could be a problem, there are a few ways to work around this. One way is to explicitly include the related entities in the query, another is to turn off entity tracking for the related objects altogether and you can also explicitly refresh the data if you’re using something similar to the .ToList() approach in the second picture.
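The include variant, sketched:

```csharp
var product = context.Products
    .Include("ProductModel")
    .First(p => p.ProductID == 680);   // one joined query, the ProductModel comes back fresh
```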

 

 

This adds a bit of complexity to the SQL generated, as you tell EF to add a join clause to it, but it will ensure that the ProductModel entity is retrieved fresh from the database. If you have performance in mind for static and cached data, the option of simply loading all the relevant tables with .ToList() should be added as a contender if your includes are getting too complex.
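And the MergeOption variant, sketched; NoTracking is shown here, set on the set doing the queries (the enum lives in System.Data.Objects).

```csharp
context.Products.MergeOption = MergeOption.NoTracking;   // queries on Products no longer reuse tracked entities
var product = context.Products.First(p => p.ProductID == 680);
```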

 

 

Finally, the example above shows how the MergeOption can be used to ensure that we don’t get old data. It’s important to note that the option here is set on a table level, and it’s the table doing the queries which needs to have it set. If we set it on the ProductModel table, the Product table will still track its related ProductModel entities.

 

To summarize, make sure you know how lazy loading works when you’re using it, especially in non-disposed contexts. If you rely on lazy-loading in such a context, be aware that only explicit calls to the database will be fresh by default.

C# Coding Conventions

I’ve actually had a few people who are just getting into “the business” of programming ask me for advice recently, and after looking through some code samples, I really think that people who are just starting out underestimate the level of importance we place on correct casing and naming.

 

Sure, the code might still work, but for others who have grown accustomed to the “right” way of defining things in the C# world, it just feels wrong, and there’s really no reason to not use accepted conventions from day #1.

 

I found one of my old links related to coding standards, it’s definitely worth a look for any C# developer. I was looking through it earlier before passing it on, and I have to admit to completely forgetting about the string compare issue listed in point 4.32! :)

 

http://weblogs.asp.net/lhunt/pages/CSharp-Coding-Standards-document.aspx

Entity Framework with Caching and ViewState

I’ve been doing quite a few projects now using Entity Framework with .NET 4.0 and wanted to post a few hints and tips based on my experiences dealing with entities and caching.

 

First off I’d like to mention that my experience with Entity Framework has been very positive in general. I’ve been using it since its .NET 3.5 version both for commercial and private projects and found it to be very useful and easy to use. I also intend to keep using it as my DAL for future projects.

 

As far as performance goes, and to prevent relatively static data from being retrieved from the database, it’s usually preferred to cache or somehow temporarily store the data to minimize the load on the database as much as possible. For Entity Framework this usually means storing the individual entities or a collection of them.

 

A very quick and simple generic way to cache results in memory, regardless of whether they’re entities or not, could be as follows, using System.Runtime.Caching as implemented in .NET 4.0. (Previously found in System.Web.)
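A sketch along the lines described below; GetStuffFromDatabase stands in for the actual database call, and Stuff is a placeholder type.

```csharp
private static readonly object CacheLock = new object();

public static List<Stuff> GetStuff()
{
    var cached = MemoryCache.Default.Get("Stuff") as List<Stuff>;
    if (cached == null)
    {
        lock (CacheLock)
        {
            cached = MemoryCache.Default.Get("Stuff") as List<Stuff>;   // second check inside the lock
            if (cached == null)
            {
                cached = GetStuffFromDatabase();
                MemoryCache.Default.Set("Stuff", cached, DateTimeOffset.Now.AddHours(24));
            }
        }
    }
    return cached;
}
```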

 

 

This will simply store the results from GetStuffFromDatabase for 24 hours in a sane manner; the double cache check with the lock is there to prevent multiple simultaneous requests from hitting the database at the same time.

 

The important thing to note here when using this with regular Entity Framework entities is that, by default, the cached objects will include the ObjectContext they are attached to. This means that, for instance, the entities will be able to access linked related objects through this specific context – which also means additional queries will be performed against the database later, as long as you don’t explicitly dispose of that specific context.

 

It’s also important to note that if you intend to use the cached objects as references for other entities later on, you will most likely be using a separate context and can’t use the objects directly. The simplest workaround for this is either to use exposed foreign key ids on the entities, or to re-load the entity in the other context. An example demonstrating this using the GetStuff method above is as follows, assuming that AddAndSave will add the object to a fresh context and the GetStuff data has already been cached. Also, if you don’t expose foreign key id columns (a 4.0 feature), you won’t be able to do this, and will need to reload the object instead.
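A sketch of the idea; the Item entity and the property names are made up for illustration, and AddAndSave is assumed to add and persist the object in a fresh context.

```csharp
var stuff = GetStuff().First(s => s.Name == "Something");   // comes from the cache

var item = new Item
{
    Name = "New item",
    StuffId = stuff.Id      // exposed foreign key id - no cross-context entity reference needed
};
AddAndSave(item);
```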

 

When it comes to ViewState in ASP.NET applications, you want to be very careful about storing entities there. As with entities cached in memory, the ones you store in viewstate will also include the context, which might seem fine when you’re doing simple testing. However, a worse scenario is if you for some reason retrieve 1000 entities and decide to store one of them in the viewstate for a given page: the entire context will still be included, so the actual size will end up being that of all 1000 entities in the context.

 

There are a few relatively simple workarounds to this. The quickest one is to detach the entity from the context before you put it into the viewstate, meaning that only the actual entity will be stored there. It should be noted that you won’t be able to access any foreign-key-related entity properties once you do this, unless you explicitly re-attach the entity to a context – which typically is not something you want to explicitly keep track of in an ASP.NET application.

 

The other, and slightly smarter, way is to create a simple custom wrapper class including only the properties you deem relevant for the page being presented. This will effectively minimize the size of the viewstate and make it much clearer to developers that it’s *not* an actual entity which can use its relations to access more data, as a detached entity might suggest, but an object which should already hold the data needed for the specific page. For anyone using entities who really wants to optimize the page presented to the user and still needs viewstate, this is the way to go. It’s as lightweight as possible, and you weed out any extra properties and general overhead that you would still have in a POCO-entity/detached state. For more static data, you can combine this with the memory cache to store wrapped entities for the best performance.

 

In short, I’m quite happy with Entity Framework! It’s all incorporated nicely into the core of the .NET framework now, and it has in my opinion improved quite a bit in its 4.0 version. Getting it set up and working with the performance improvements you get from cached entities is also very simple and effective once you see the pitfalls related to how the contexts actually behave.

 

Setting up the data layer of any application in the most optimal way obviously depends on the nature of the different applications, but I hope this post has given a few pointers regarding how you can use Entity Framework with the standard .NET memory-caching.

ASP.NET Web Performance Trace Debugging with Log4net

I’ve recently worked a fair bit with checking out performance and pinpointing errors within web applications where performance is a priority, which, to be fair, really should be all web applications out there. Over the course of my performance adventures, I’ve returned to messing around with something I don’t see a lot of people use to the extent they could/should: the ASP.NET Trace.

 

For those not familiar with this, it’s information about the current page request, including specifics regarding everything conceivable about the current request plus server information. In particular, when it comes to performance, it provides specific load times and a view of the control hierarchy – including render size and viewstate size.

 

I’ve made a quick and very simple sample to illustrate what you can do with this, and in my little sample, the load time trace information on my default page looks a little something like this.

  

This table shows the load times for each of the steps, and by the looks of things, there’s something happening in the Page Load which takes an unusually long time!

 

To get detailed information about which specific portions of the web page spend the most time delaying my page from rendering, I start by adding a simple log4net trace-appender entry in my web.config file and configuring it on application start.
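A minimal sketch of that wiring, using log4net’s standard AspNetTraceAppender (the layout pattern is just an example):

```xml
<log4net>
  <appender name="TraceAppender" type="log4net.Appender.AspNetTraceAppender">
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%logger - %message" />
    </layout>
  </appender>
  <root>
    <level value="DEBUG" />
    <appender-ref ref="TraceAppender" />
  </root>
</log4net>
```

```csharp
// Global.asax
protected void Application_Start(object sender, EventArgs e)
{
    log4net.Config.XmlConfigurator.Configure();
}
```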

 

My Page Load consists of 3 simple calls; I want to figure out which calls are the most painful ones, so I wrap each of them with debug messages for testing, like this.
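A sketch of the wrapping, with a log4net logger on the page (the page class name here is hypothetical):

```csharp
private static readonly log4net.ILog log = log4net.LogManager.GetLogger(typeof(_Default));

protected void Page_Load(object sender, EventArgs e)
{
    log.Debug("Before QuickStuff");
    QuickStuff();

    log.Debug("Before SomeStuff");
    SomeStuff();

    log.Debug("Before LotsOfStuff");
    LotsOfStuff();

    log.Debug("Page_Load done");
}
```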

  

Keep in mind, this is just for a quick sample, in reality with complex systems you’d most likely want to place such messages inside the functions, and probably give them slightly more relevant information. Even better is if you already have a decent debug scenario set up across your solution, in which case you get a lot of free debugging here!

 

I recompile my solution, check the trace again, and find that the log4net messages have now joined the other trace messages, adding a few more lines in between the page-load steps, as shown here. It’s important to note that the log4net messages now “break up” the step between Begin Load and End Load, as the trace counter simply counts from the last event, which log4net is now hooked into.

 

This gives me an overview to work with stating that QuickStuff takes roughly 0.1 seconds to finish, SomeStuff roughly 2 seconds and LotsOfStuff is the winner with 5 seconds. Using these debug messages I can drill further down with more detailed debug messages to find the exact source of the "problem" and thus know exactly where I need to optimize code and/or queries to make the page load quicker.

 

By default, the ASP.NET Trace is disabled, and a lot of people tend to keep it that way, either by choice or by accident. To turn this feature on, you can add Trace=”true” to the page declaration of each individual page, but this does create some issues, as you really don’t want to accidentally put the trace out on a production site! I’ve found that a generic and more “on demand” way to enable the trace is to add a check for it through a querystring in global.asax. Although, don’t put this in production either! :)
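A sketch of that global.asax check:

```csharp
protected void Application_BeginRequest(object sender, EventArgs e)
{
    if (Request.QueryString["trace"] == "1")
    {
        HttpContext.Current.Trace.IsEnabled = true;
    }
}
```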

 

Doing something like this in global.asax will make the trace appear when you append ?trace=1 to whichever URL you are working on, provided that it’s not overridden in the page declaration. It’s worth noting that working with AJAX will sometimes mess up your trace; I tend to disable it temporarily if that’s the case. However, if I can’t work around this, I will add a FileAppender to the log4net configuration and check the delays between the entries there manually.

 

The last part I’d like to mention is the “page size” problem when it comes to performance. An issue which is more or less specific to ASP.NET WebForms is the added overhead of the viewstate. Using the Control Tree information in the trace lets you see how much of the page is viewstate, and how much is actual rendering.

 

I’ve adjusted my sample webpage to now have two labels, one of which has EnableViewState=”false”; both have had their Text properties set to Hello World! repeated 100,000 times for over-the-top emphasis.
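A sketch of the code-behind for this, with hypothetical control names (lblPlain has EnableViewState=”false” in the markup, lblWithViewState keeps the default):

```csharp
protected void Page_Load(object sender, EventArgs e)
{
    var text = string.Concat(Enumerable.Repeat("Hello World!", 100000));
    lblPlain.Text = text;
    lblWithViewState.Text = text;
}
```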

 

Here you can see the exact control hierarchy of all the controls and text on the page. The total render size is 4 MB, with each of the labels containing 1.2 MB of text, and the label with viewstate enabled carrying another 1.6 MB of viewstate on top of that.

 

Although this is a very far-out example, it’s important to note that the viewstate of the label here is about 33% larger than the actual label, and the client will have to download both, increasing the total download compared to a non-viewstate label by 133%. Also keep in mind that on postbacks the client has to post that viewstate back to the server, so if huge amounts of data are being posted for relatively simple operations, you might want to refactor or restructure your page if you’re experiencing performance issues.

 

A few obvious things worth pointing out as a result of this: if you have WebForm pages which don’t use postbacks at all, keep viewstate off! You can disable viewstate at the page level in the page declaration with EnableViewState=”false”, and the same goes for any controls you might have.

 

Furthermore, when binding complex objects, it’s ideal for performance to make sure you only bind objects with the properties you really need. If you’re using O/RMs in particular, try to make wrapper classes of your objects as simple as possible, to ensure that you don’t keep pushing data you don’t need to the client when the page is read, or back to the server on a potential postback.