Should I Migrate to Entity Framework Core?

There are two upcoming releases that should pique the interest of most .NET developers. EF Core 3.0 is in preview and expected to be released soon, and EF6 has started taking steps toward supporting .NET Standard 2.0. If you're like me, you've probably experimented with EF Core a bit and kicked the can down the road. However, some recent project work has given me the opportunity to study the two frameworks in more depth and really compare their features. This has caused me to re-evaluate my decision to ignore EF Core for the time being, and to develop a long-term plan to use it in the future.

Long story short, EF Core really does perform better. My current project work has required me to do some pretty heavy lifting, working with both EF Core and EF6 DbContexts. The problem itself is kind of boring, but it involves enough data volume to make it hard even for an experienced developer. The work requires that an object graph of about 325,000 entities and their related entities be deleted from a dataset of about 10 million. Not surprisingly, the first few attempts at doing so were quite frustrating. The expected running time of the script was in the neighborhood of 30 hours, so clearly there was a lot of optimization work to do. I went the usual routes: maybe EF isn't cut out for this, and I should just use raw SQL. As it turned out, that didn't really improve matters, so I returned to working with entities instead of SQL. I tried parallelizing the operations and cut a few hours off the running time, but it simply wasn't close to fast enough. Ultimately, I was going to need the full object graph in memory before attempting to do anything with it.

And this was my first observation: EF6 would choke on the query used to pull the whole recordset once I added the necessary includes. I let the query run for two hours before giving up in frustration. Memory usage at that point was over 6GB, and it hadn't even pulled in all the related entities yet. But there was an opportunity for experimentation here: I was already using one EF Core DbContext, so why not quickly create the equivalent DbContext in EF Core and see what happened? And it worked! In 7 minutes. Total memory usage for the entire object graph: 2GB, with the ChangeTracker enabled. So EF Core showed me some strengths on the performance and efficiency side. Clearly the generated SQL was much more efficient, and the memory usage was much better.

Let's be fair to EF6 for a minute, though. Pretty much every .NET developer in the last five years has used it at one time or another. The performance is OK, if a bit resource-heavy, and with experience a developer can optimize EF6 code to work quite well. Now EF6 is moving to .NET Standard, eliminating the last hurdle to using it in .NET Core projects. Notwithstanding the performance observations, why change what you already know? EF6 on .NET Standard had the potential to bury the upstart EF Core project and give everybody what they were familiar with. A conservative, if unimaginative, approach.

But let's really consider the cost of scalability given those performance observations. If you end up scaling out because of memory usage rather than CPU usage, that's wasteful: you still pay for the extra CPU that comes with the memory, even if you're not using it. Our conservative approach is beginning to look a little fiscally irresponsible for a high-volume application. A three-fold reduction in memory usage has real potential to save money in the long run!

Maybe that's a little banal. It's not as if you can copy and paste your EF6 context into an EF Core project and call it a day. In the mission to achieve performance and efficiency, EF Core has sacrificed some features. So let's look at these missing features in a little more detail and see what they really cost. I'll start with a feature that has since been added in EF Core 2.1: lazy loading. On the surface, lazy loading seems like a great idea. You only pull what you need, when you need it. It's easier for developers to understand and requires less time to craft new queries. But have you looked at your database metrics when you use lazy loading? Did you see the batch-operations-per-second metric spike when you started iterating through your data? Did you know that every single access to an unloaded related entity requires a new database round trip? Lazy loading has taught us to be, well, lazy. In a high-volume dataset, you really can't afford those additional round trips. Experienced developers learned to disable the feature altogether and use the Include() method instead. One query, one database round trip. With a little effort, you can make sure all the related entities you need are loaded at once. There's a lot to be said for eager loading. EF Core requires that you explicitly opt in to lazy loading. Good; let's teach developers to write better code. Yes, it will require additional development hours, but code quality and performance improve considerably.
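
As a minimal sketch of the eager-loading approach (the Customer, Order, and OrderLine entities and the StoreContext are hypothetical, not from any real project), it boils down to one Include() chain, one query, one round trip:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities, for illustration only.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Order> Orders { get; set; } = new List<Order>();
}

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();
}

public class OrderLine
{
    public int Id { get; set; }
    public int OrderId { get; set; }
    public decimal Amount { get; set; }
}

public class StoreContext : DbContext
{
    public StoreContext(DbContextOptions<StoreContext> options) : base(options) { }

    // Lazy loading stays off unless you opt in (for example via
    // UseLazyLoadingProxies() from the Microsoft.EntityFrameworkCore.Proxies package).
    public DbSet<Customer> Customers { get; set; }
    public DbSet<Order> Orders { get; set; }
}

public static class CustomerQueries
{
    // Eager loading: one Include() chain, one query, one round trip,
    // and the whole graph is materialized up front.
    public static List<Customer> LoadCustomersWithOrders(StoreContext db)
    {
        return db.Customers
            .Include(c => c.Orders)
                .ThenInclude(o => o.Lines)
            .ToList();
    }
}
```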

Let's look at a second contentious feature: Table-per-Type (TPT) inheritance. This really came about because, once we taught developers to look at a relational database like an object graph, every single OOP feature imaginable had to be available. The storage model was the domain model. Hooray! Of course, if you actually look under the hood, TPT inheritance performs very poorly in LINQ queries. Those arrogant EF Core guys! They prescribe the TPH model instead, but now my tables are all wrong! My TPT model is beautiful and you guys won't support it! I have to have both a domain model and a storage model? That just sucks. Or maybe it doesn't. In ye olde days, we would transfer data from tables into a lightweight model that was serializable. Maybe you've heard of a DTO? Funnily enough, those two requirements -- lightweight, serializable -- describe a domain model quite well. Let's face it: relational storage will never look like the domain you are modelling unless it is trivial. Forcing us to map between the two is not a terrible design decision. It might even teach developers to write better code. Please say you weren't serializing the entities themselves to JSON. The same arguments apply to other missing features, such as many-to-many relationships. In the end, I am convinced that keeping the storage model looking like a storage model is not a bad design decision.
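
For illustration, here's roughly what a TPH mapping looks like in EF Core, using a made-up Payment hierarchy; the discriminator column name and values are my own invention, not a prescription:

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical payment hierarchy, for illustration only.
public abstract class Payment
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
}

public class CardPayment : Payment
{
    public string Last4Digits { get; set; }
}

public class BankTransferPayment : Payment
{
    public string Iban { get; set; }
}

public class BillingContext : DbContext
{
    public BillingContext(DbContextOptions<BillingContext> options) : base(options) { }

    public DbSet<Payment> Payments { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // TPH: one Payments table, one discriminator column, no joins at query time.
        modelBuilder.Entity<Payment>()
            .HasDiscriminator<string>("PaymentType")
            .HasValue<CardPayment>("card")
            .HasValue<BankTransferPayment>("bank_transfer");
    }
}
```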

If you're still reading, maybe I've convinced you of the value of going with EF Core, despite its so-called shortcomings. Now for the interesting bit. What needs to be done? I'll give a rundown, and even apply some architectural principles while I'm at it. Our roadmap looks something like this:

  1. Break down that monolithic EF6 DbContext into bite-sized pieces.
  2. Write your domain model first. You made beautiful object models when you thought the storage model was the same as the domain model. Now you can do it without making concessions for your ORM of choice.
  3. Write your storage model for your ORM of choice. Now you absolutely have to make the necessary concessions.
  4. Write your AutoMapper (or whatever) code so that you can turn the storage model into the domain model and vice versa.
  5. You can optionally add object repositories here, but I find that they are often of little value. They do, however, have the advantage of translating your storage model to the domain model on the fly.

That's pretty much the process. I'll add some notes about each one and talk about how they support a future architecture.

Let's talk a bit about the monolith, and why it got that way to begin with. You took a bunch of loosely-coupled domains and stuck them in a single database, then used some sort of wizard to create a DbContext out of that database. The first thing to ask yourself is: why are these loosely-coupled domains all stuck together like a giant ball of mud? Because they're part of the same application? So you can enforce foreign key constraints between domains? Those aren't bad answers, but it isn't good design. I'm not telling you to break the database apart; you can keep your foreign key constraints. But why does the DbContext have to mirror the entire gigantic database? Apply some DDD, find your logical domains, and create one DbContext per bounded context. It will help with your thinking, and it will help with your code organization. And the option to migrate a portion of the database somewhere else still exists. Maybe we'll find that different LOB applications have similar bounded contexts. Maybe it will be good to put those similar contexts into a master database that spans the organization. Ever get frustrated when you call one government department and they don't have the updates you gave to another one? Do you have a Person table in every single database for every single application? Exactly.
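
As a rough sketch, two slim, per-bounded-context DbContexts might look like the following; the Ordering and Shipping contexts and their stub entities are purely illustrative, and both can be registered against the same connection string:

```csharp
using Microsoft.EntityFrameworkCore;

// Minimal storage-model stubs, purely for illustration.
public class PurchaseOrder { public int Id { get; set; } }
public class Shipment { public int Id { get; set; } public int PurchaseOrderId { get; set; } }

// One slim DbContext per bounded context. Both can point at the same physical
// database; each exposes only the tables its own domain cares about.
public class OrderingContext : DbContext
{
    public OrderingContext(DbContextOptions<OrderingContext> options) : base(options) { }

    public DbSet<PurchaseOrder> PurchaseOrders { get; set; }
}

public class ShippingContext : DbContext
{
    public ShippingContext(DbContextOptions<ShippingContext> options) : base(options) { }

    public DbSet<Shipment> Shipments { get; set; }
}

// In ASP.NET Core, both contexts can be registered against the same connection string:
// services.AddDbContext<OrderingContext>(o => o.UseSqlServer(connectionString));
// services.AddDbContext<ShippingContext>(o => o.UseSqlServer(connectionString));
```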

Now that you've identified your bounded contexts, write the domain model for all of them. You should be an expert at this by now.
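
Something like this made-up Student/Course model is all it takes: plain classes, lightweight and serializable, with no concessions to the ORM. I'll reuse this hypothetical example in the sketches that follow.

```csharp
using System.Collections.Generic;

// A hypothetical domain model: plain, lightweight, serializable, and written
// without any concessions to the ORM.
public class Student
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Course> Courses { get; set; } = new List<Course>();
}

public class Course
{
    public int Id { get; set; }
    public string Title { get; set; }
}
```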

Most domain models are going to map 1:1 to a storage model. Unless you have a good reason not to, write the storage model now. Make sure you remember the following: in EF Core, inheritance works using TPH (which may require that you reorganize some data tables), and many-to-many relationships require an explicit join entity.
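
Continuing the made-up Student/Course example, a storage model with the join entity spelled out might look roughly like this; the class and property names are my own invention:

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// A hypothetical storage model for the Student/Course domain above.
// The many-to-many relationship is expressed through an explicit join entity.
public class StudentRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Enrollment> Enrollments { get; set; } = new List<Enrollment>();
}

public class CourseRow
{
    public int Id { get; set; }
    public string Title { get; set; }
    public List<Enrollment> Enrollments { get; set; } = new List<Enrollment>();
}

public class Enrollment
{
    public int StudentId { get; set; }
    public StudentRow Student { get; set; }
    public int CourseId { get; set; }
    public CourseRow Course { get; set; }
}

public class SchoolContext : DbContext
{
    public SchoolContext(DbContextOptions<SchoolContext> options) : base(options) { }

    public DbSet<StudentRow> Students { get; set; }
    public DbSet<CourseRow> Courses { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Composite key on the join entity.
        modelBuilder.Entity<Enrollment>()
            .HasKey(e => new { e.StudentId, e.CourseId });

        modelBuilder.Entity<Enrollment>()
            .HasOne(e => e.Student)
            .WithMany(s => s.Enrollments)
            .HasForeignKey(e => e.StudentId);

        modelBuilder.Entity<Enrollment>()
            .HasOne(e => e.Course)
            .WithMany(c => c.Enrollments)
            .HasForeignKey(e => e.CourseId);
    }
}
```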

AutoMapper will ease the pain of mapping between your storage model and your domain model, and you can nicely abstract away the annoying details like the join entity.
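
A possible AutoMapper profile for the Student/Course sketches above might look like this; it collapses the Enrollment join entity so the domain model never sees it. The reverse direction (domain back to storage, for writes) usually needs a bit more custom handling, which I've left out here.

```csharp
using System.Linq;
using AutoMapper;

// A possible AutoMapper profile for the Student/Course sketches above.
// The Enrollment join entity stays an implementation detail of the storage model.
public class SchoolMappingProfile : Profile
{
    public SchoolMappingProfile()
    {
        CreateMap<CourseRow, Course>();

        CreateMap<StudentRow, Student>()
            .ForMember(
                dest => dest.Courses,
                opt => opt.MapFrom(src => src.Enrollments.Select(e => e.Course)));
    }
}

// Usage:
// var config = new MapperConfiguration(cfg => cfg.AddProfile<SchoolMappingProfile>());
// var mapper = config.CreateMapper();
// Student domainStudent = mapper.Map<Student>(storageStudent);
```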

The debate about the value of object repositories will probably rage for years to come. I don't use them personally, preferring to inject the DbContext itself, but I can see cases where they come in handy. Having the repository return the domain model is useful: it abstracts away the use of AutoMapper. You can also create read-only versions, which is handy if you're employing CQRS. It's not a wasted effort, but I find that directly injecting the DbContext gives me more flexibility in my command/query handlers. Not having to update the repositories as you write code is good too.
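
For completeness, here's one way a read-only repository over the same made-up types could hand back domain models; this is a sketch, not a prescription. ProjectTo pushes the mapping into the SQL projection, which pairs nicely with a CQRS read side.

```csharp
using System.Collections.Generic;
using System.Linq;
using AutoMapper;
using AutoMapper.QueryableExtensions;
using Microsoft.EntityFrameworkCore;

// A hypothetical read-only repository over the sketches above: it returns
// domain models and keeps AutoMapper and the storage model away from callers.
public class StudentReadRepository
{
    private readonly SchoolContext _db;
    private readonly IConfigurationProvider _mapperConfig;

    public StudentReadRepository(SchoolContext db, IConfigurationProvider mapperConfig)
    {
        _db = db;
        _mapperConfig = mapperConfig;
    }

    public List<Student> GetAll()
    {
        // ProjectTo translates the mapping into the SQL projection itself,
        // so only the columns the domain model needs are pulled back.
        return _db.Students
            .AsNoTracking()
            .ProjectTo<Student>(_mapperConfig)
            .ToList();
    }
}
```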

To conclude, I would point back to the two initial observations that led me to write this piece: the performance of EF Core really is that much better, and the effort you put into moving to EF Core will pay dividends in code quality and future architecture. I hope you've learned something from all this, and I welcome any feedback or questions you may have.