I wanted to write a quick entry about Entity Framework and just point out some of the errors I’ve been asked to resolve, and some of the worst uses of EF I’ve seen performance-wise. All of these are done with the default EF4.0 settings and default entities with foreign keys exposed.
I should note that there are probably a ton of errors more frequent than these in the world, but this is based on my personal experience supervising and helping others use Entity Framework. If you have an error which keeps happening with your EF-application, add it to the comments below and I’ll try to write something about it in a future post.
Entity Framework Errors
System.InvalidOperationException: The relationship between the two objects cannot be defined because they are attached to different ObjectContext objects.
This error appears when you use entity objects from different contexts when specifying relations and attempt to persist the changes to the database, seeing how the state of the two objects are maintained by two different context instances, they can’t really know the exact state of the other object, and as such, refuse to go along with it. Typically this happens in real situations when you cache an entity or simply have a project which is designed to use more than one ObjectContext.
The best way to resolve this is to be consistent in terms of assigning related entities. For the most part I would recommend exposing the foreign key column and setting the id of the related entity directly rather than specifying the entire entity. When this isn’t an option, you either have to refresh the entity in the same context as the entity you want to add relations to, or explicitly detach the entity from its old context and attach it to the new one. The latter of these could be dangerous if it’s also referenced by its old context, so make sure you know what you’re doing if you go with that approach.
System.ObjectDisposedException: The ObjectContext instance has been disposed and can no longer be used for operations that require a connection.
This appears most often when you try to lazy-load entities related to an ObjectContext instance which has been disposed one way or another. I’ve seen quite a few instances of code which is similar to this approach:
which will subsequently die horribly due to the disposed context as a result of the using block. The typical real case scenario here is that you have a function which will return a list of products, then that list will be databound to a listview in a web application, and ProductModel.Name is evaluated there, resulting in much anger directed towards Entity Framework in general and a lot of creative workarounds as the deadline approaches.
To resolve this peacefully, a few different approaches should be considered. First off, if you maintain your ObjectContext in a way which lets the garbage collection deal with the entities, you shouldn’t have this problem at all unless you explicitly detach objects, resulting in you getting a nullreference exception instead of this exception.
If you explicitly dispose the context like it’s done here, you should always ensure that you have all the information you need by the time the context is disposed. If you work with more complex relations, you might want to consider creating some wrapper classes and populate them with all information needed, or simply specify include hints in the retrieval of the original object.
Entity Framework Common Performance Tips
First off, I would like to just mention that you really should run SQL Profiler or something to that extent while developing your EF-applications, especially if you’re new at this and not entirely sure what happens behind the scenes when you design your elegant LINQ queries. The ease of developing using Entity Framework as your data-layer comes with a certain responsibility to at least try to make efficient code. I also know some DBA types who really want to avoid the use of any O/RM due to the SQL it generates, so please think of the overall reputation of EF and try to keep things optimized. Again, these issues and suggestions I post here are based on my personal experiences with Entity Framework and optimizing actual code.
Do not ToList() before filtered iteration
Even though this one is a simple one, I’ve seen it slip through quite a few times. The standard way of messing up this way is to specify a query, add .ToList() to perform the query, and then filter it as we’d only want to see the first 500 entries regardless. When you’re developing on your own PC with a local SQL-server, you don’t really notice anything unless you’re profiling at the same time. In reality, you’re retrieving the entire table before you’re applying the filter, resulting in a lot more data transferred than what’s needed. Simply put, make sure you filter/select before you .ToList() a collection for re-use.
Related collections are *not* IQueryables
Something which is easy to forget, even if you laughed at the above point, is that the related collections when you have a one to many relationship are not IQueryables, but DataCollections. As a result of this, if you have a customer object and want to display his order number 123, doing customer.Order.Where(o=>o.OrderId == 123) is the same as retrieving all the orders for that customer from the database, then looking through the collection to find the right one. Rewrite this to retrieve the order based on both customerId and orderId to ensure you don’t end up with a lot more data than you need.
Be *very* careful when putting entities in ViewState
Avoid this as best you can, if you have to maintain objects in this fashion, try to make wrapper classes. Having the full entities in viewstate tend to be fairly huge compared to the relevant information that you want to keep track of, not to mention that if you have relations included, this will grow even bigger.
Avoid instantiating unnecessary ObjectContext-instances.
This is more of a CPU-performance issue, as the cost of creating the context is fairly high, so try to limit these and re-use a context as best you can, please do not put these things inside loops or loop-like scenarios such as partial class properties.
As EF4 rolled in and people could lazy-load without worrying about what was actually in the current context, it seemed to have the added effect of making developers a bit more lazy as well. The typical scenario where you really notice the improvement of a well placed include or two comes when you load up a decent set of data and databind it in an application, evaluating left and right across relations in the database, and EF simply serves up the data without any question at all.
It’s also very hard to give specific and definitive suggestions regarding when to use includes and when not to, as every situation tends to be different. Also, you need to consider whether or not this is for general use, or if it’s very specific for your one little area of code.
Here’s an example of 3 different ways to do the same thing, only difference is the amount of data being transferred and the amount of queries/stress being put on the SQL Server. For this example I want a report of 1000 sales from the AdventureWorks database, and I simulate requiring information from two related tables similar to how I’d do it with a report displayed nicely in a listview.
Now, the first method is slightly hard on my database with the amount of queries it’s performing, while the second one is the kindest, as it only requests the exact data it needs and returns it in an anonymous class, and the last one returns the same as the first one, only it does it in one query behind the scenes.
The obvious downside of the second option is that it’s an anonymous class and not very flexible. Creating a wrapper class to contain the specific information is okay to transfer it to other tiers of the application, but going down this road will most likely result in an insane amount of wrapperclasses for a ton of different data-subsets.
The third option provides the best option for flexibility when you need a lot of different data from all of these tables, but if you then re-use that method elsewhere and only want the specific information in the sales table, you’re all of a sudden getting a lot of extra information you’re not going to use.
So, as I’m unable to give one definitive answer here in terms of what to do, I can’t put enough emphasis on the importance of profiling your application while you’re developing. Go through the different database calls being done by your application on each page, and make sure each of them can be justified. You should also note that if you specify multiple includes, you really should ensure that it looks sane enough in a profiler and has a decent execution plan. Also, if you’re retrieving insane amounts of data, you really should avoid includes due to the sheer amount of data it will generate. If you’re loading a lot of metadata for caching, you should have a look at how the lazy loading works in this post, and keep messing around with SQL profiler to get the quickest loading time possible.
An additional note here on includes. I’ve seen a few of the generic data-layer/extensions which works with EF, but I haven’t seen one which enables you to specify include paths – as that’s not something that fits very well with the generic pattern in general, but something which must be done directly on the ObjectContext-table. Keep this in mind when you decide on such patterns and practices.