Why You Shouldn’t Expose Your Entities Through Your Services

I sometimes still get questions from people who want to expose their entities through their WCF Services. Regardless of whether these are entities that are populated through NHibernate or any other ORM, this is just not a good thing to do. Many people prefer to accept and return entities through their services because they believe this is an easier programming model. They believe that it takes less work than mapping to DTO’s and that as a whole, this solution is much more manageable. Rest assured that this is a fallacy. Any perceived benefit that you’ll get from exposing entities outside of your service layer will only last a very short time and will quickly be dwarfed by added complexity, increased maintenance overhead and a performance overhead which must not be ignored.

In this post, I'd like to take the chance to explain the downsides to exposing entities through services. Though I'll probably miss quite a few of the downsides (feel free to add to the list through comments), the ones I will mention are IMO important enough to take note of.

Exposing entities to clients means your clients are very tightly coupled to your service(s)

Entities are a part of your domain. These entities in your domain can change for various reasons. Sometimes because functional changes are required, but quite often also for optimizations (whether they are for performance reasons or to improve the clarity and maintainability of your domain). Functional changes can impact your clients, though that is not necessarily the case. Optimizations hardly ever have an impact on your clients (other than possibly improved response times from your service calls obviously). If your service layer accepts and returns domain entities, each possible change is highly likely to have an impact on your clients. And this impact is not cheap. In the best case scenario, it means updating your service contracts, regenerating your service proxies and redeploying your clients. In the worst case scenario, it means making actual changes to the code of your clients. And for what? Because of changes that shouldn’t have impacted your clients in the first place?

Ideally, your clients are as dumb as they can be. They should know as little as possible about the actual implementation of the domain because that implementation is simply not relevant to them. They should present users with data and give them the option to modify that data, to trigger actions and to perform certain tasks. They should focus squarely on those tasks and pretty much everything else is typically better suited to be done behind your service layer. If you build your clients with no real knowledge of the actual domain model, but of DTO’s and possible actions to be performed then you can reduce the level of coupling between your clients and your services substantially.
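To make the coupling argument a bit more concrete, here is a minimal sketch, written in Python rather than C# purely for brevity. Every name in it (CustomerV1, CustomerV2, CustomerDto and the two mapping functions) is hypothetical and only serves the illustration. It shows an entity being refactored internally while the DTO that clients depend on stays exactly the same.

```python
from dataclasses import dataclass


# Hypothetical domain entity, version 1: a single name field.
@dataclass
class CustomerV1:
    id: int
    name: str


# The same entity after an internal refactoring: the name is split in two.
@dataclass
class CustomerV2:
    id: int
    first_name: str
    last_name: str


# The contract that clients see. It is identical for both entity versions.
@dataclass(frozen=True)
class CustomerDto:
    id: int
    display_name: str


def to_dto_v1(customer: CustomerV1) -> CustomerDto:
    return CustomerDto(id=customer.id, display_name=customer.name)


def to_dto_v2(customer: CustomerV2) -> CustomerDto:
    # Only the mapping absorbs the refactoring; the DTO itself is untouched.
    full_name = f"{customer.first_name} {customer.last_name}"
    return CustomerDto(id=customer.id, display_name=full_name)


before = to_dto_v1(CustomerV1(id=1, name="Ada Lovelace"))
after = to_dto_v2(CustomerV2(id=1, first_name="Ada", last_name="Lovelace"))
```

In this sketch the client would receive the same CustomerDto before and after the refactoring, so no proxy regeneration or redeployment would be needed for a change of this kind.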

People who prefer to expose entities often claim that the DTO approach introduces too much extra work and too many extra, seemingly unnecessary classes. For starters, they don’t want to write code that maps entities to DTO’s. First of all, the amount of code that this requires is in reality very small, not to mention very easy to write. Secondly, you can just as well use a library such as AutoMapper to take that pain away from you. And contrary to what you might think, there is a big performance gain to be had from returning DTO’s over entities, but I'll get to that in the next section.
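As a rough indication of how little mapping code is involved, here is a minimal sketch of a convention-based mapper in Python. It is not AutoMapper and does not mimic its API; it only illustrates the general idea of copying same-named properties. Product, ProductDto and map_to are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Any, Type, TypeVar

T = TypeVar("T")


@dataclass
class Product:  # hypothetical entity
    id: int
    name: str
    unit_price: float
    internal_cost: float  # not something a client should ever see


@dataclass(frozen=True)
class ProductDto:  # hypothetical DTO
    id: int
    name: str
    unit_price: float


def map_to(dto_type: Type[T], source: Any) -> T:
    """Build a DTO by copying every field the DTO declares from the source."""
    values = {f.name: getattr(source, f.name) for f in fields(dto_type)}
    return dto_type(**values)


product = Product(id=7, name="Widget", unit_price=9.5, internal_cost=4.0)
dto = map_to(ProductDto, product)
```

Because the mapper is driven by the fields of the DTO, anything the DTO does not declare (internal_cost here) simply never leaves the service layer.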

Entities are hardly ever the most optimal representation of data

I think we can safely say that most applications need to show data in the following 3 ways:

  • In a grid view, either as a total listing of all instances of a certain type of data or the result of a search query or some kind of filtering action
  • In dropdown controls or anything else that lets users select pieces of data
  • In edit screens where a piece of data needs to be displayed in its entirety, perhaps even to be modified by the user

There are undoubtedly more ways in which data can be presented to the user, but I think it’s safe to say that most business applications rely quite heavily on these 3.

In the case of a grid view, you’re frequently showing data that is related to more than one entity. You’ll often need to include the name or the description of some associated entities. So what exactly is it that you want to do in this situation? Do you want to return a list of the main entities of the grid view, which all have their required association properties filled in so you can display the columns that you need in the grid view? Do you actually need all of the properties of these entities (for both the main entities and the associated entities)? Odds are high that you’re going to be returning a lot more data to the client than you actually need. And that is what is realistically going to hurt the performance of your system. Any piece of unnecessary data that you transmit to your clients has a cost associated with it. The unnecessary data is retrieved from the database. The entities are then serialized at the service end. Then they are transmitted to the client. Then they are deserialized by your client. All of this is pretty costly, so the more unnecessary data that is included in this operation, the more the performance of your system and the responsiveness of your client (not to mention your database and your server) are impacted negatively.
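The grid view case can be sketched as follows, again in Python and with hypothetical names throughout (Customer, Order, OrderRowDto, to_row). JSON stands in for whatever serialization format your services actually use. The sketch serializes the same 50 orders twice: once as full entity graphs and once as flat row DTO's that carry only the columns the grid shows.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Customer:  # hypothetical associated entity
    id: int
    name: str
    address: str
    notes: str


@dataclass
class Order:  # hypothetical main entity of the grid
    id: int
    customer: Customer
    total: float
    internal_remarks: str


@dataclass(frozen=True)
class OrderRowDto:  # exactly the columns the grid displays
    order_id: int
    customer_name: str
    total: float


def to_row(order: Order) -> OrderRowDto:
    # Flatten the association: the grid only needs the customer's name.
    return OrderRowDto(order.id, order.customer.name, order.total)


customer = Customer(1, "Acme", "1 Main Street, Springfield", "Prefers email")
orders = [Order(i, customer, 100.0 + i, "checked by back office") for i in range(50)]

# asdict recurses into nested dataclasses, so the customer is repeated per order.
entity_payload = json.dumps([asdict(o) for o in orders])
row_payload = json.dumps([asdict(to_row(o)) for o in orders])
```

Comparing len(entity_payload) with len(row_payload) gives a rough feel for how much of the entity payload the grid would never use. The exact ratio obviously depends on your own entities.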

In the case of dropdown controls or anything else that lets users select pieces of data, you typically only need very few of the properties of that piece of data. In many cases, the primary key and a name or a description are sufficient. Do you really need to transmit the entire entity every time for usages like this? Again, keep in mind that all of that extra data that will never be used by your client needs to be retrieved, serialized, transmitted and deserialized again. Surely, this is an awful waste, no?
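For selection controls, a single generic DTO is often enough. The following is a minimal Python sketch under that assumption; Country, LookupItemDto and to_lookup_items are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Country:  # hypothetical entity with far more data than a dropdown needs
    id: int
    name: str
    iso_code: str
    population: int
    vat_rules: str


@dataclass(frozen=True)
class LookupItemDto:  # primary key plus the text to display
    id: int
    name: str


def to_lookup_items(countries: Iterable[Country]) -> List[LookupItemDto]:
    """Reduce entities to the two values a dropdown needs, sorted by name."""
    items = [LookupItemDto(c.id, c.name) for c in countries]
    return sorted(items, key=lambda item: item.name)
```

The same LookupItemDto could be reused for any entity that is only ever selected by name, which keeps the number of extra classes small.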

And then there’s the case where a piece of data needs to be displayed in its entirety. In these cases, you will almost always need all of the properties of the entity that is displayed, but you’ll most often also need to show other data (things that can be selected, or linked to the main entity). This other data will in most cases fall into the previous category where you’ll only need very little information about the actual entity. If you’re smart, you’ve chosen the DTO approach to retrieve this data for the data that can be selected, and in that case, you already have all of the infrastructural code in place to project entities or data into DTO’s. So you might as well reuse it for the main entity as well since you already have the capability to do this.

Always keep in mind that your entities will frequently either contain more data than needed, or less data than needed. As such, it just doesn’t make much sense to expose entities to your clients since they are hardly ever optimal for client-side usage. If you really want to think about performance, stop worrying about the supposed cost of mapping to DTO’s (which is truly negligible) and start focusing on what you’re actually sending to and from your service because this is far more costly than any kind of DTO-mapping really is.

Must your data really come from entities?

If you are displaying data to your user, does that data really need to come from your domain model? Does it really need to be retrieved by populating a collection of entities to then return them to the client? Again, keep the form of the data in mind when thinking about this. In many cases, as I mentioned above, an entity is not the most optimal form of the data that your client needs. So why even retrieve it through entities? Sure, asking your ORM to retrieve a set of entities based on a set of criteria is often the easiest thing to do, but if the easiest path were the best path, the overall quality of software projects wouldn’t be in the sad state that it’s in today. If the form of the required data is not identical to the structure of an entity, it’s often far more optimal to simply populate a DTO directly from the data. With NHibernate, you can easily do this by adding a list of projections to your query and then using a ResultTransformer to populate the DTO’s based on the direct output of the query. In this case, no entity instance ever needs to be created when you’re just retrieving data, and no extra mapping between the entity and the DTO’s needs to be performed. Your data access code simply retrieves the resulting data from a query, and puts that data directly in your DTO’s. There’s no reason why usage of an ORM should prevent you from doing this. Once again, this approach will offer far more performance benefits than avoiding DTO mapping at all costs ever can.
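The idea of populating DTO's straight from a query can be sketched without any ORM at all. The example below uses Python's built-in sqlite3 module purely as a stand-in; it is not NHibernate code and does not show the projection or ResultTransformer API. The table layout, OrderSummaryDto and both functions are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrderSummaryDto:
    order_id: int
    customer_name: str
    total: float


def create_sample_database() -> sqlite3.Connection:
    """Create an in-memory database with a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             total REAL, internal_remarks TEXT);
        INSERT INTO customer VALUES (1, 'Acme', '1 Main Street');
        INSERT INTO orders VALUES (10, 1, 250.0, 'rush job');
        INSERT INTO orders VALUES (11, 1, 75.5, 'none');
        """
    )
    return conn


def fetch_order_summaries(conn: sqlite3.Connection) -> List[OrderSummaryDto]:
    # Select only the columns the DTO needs; no entity is ever instantiated.
    rows = conn.execute(
        """
        SELECT o.id, c.name, o.total
        FROM orders o JOIN customer c ON c.id = o.customer_id
        ORDER BY o.id
        """
    )
    return [OrderSummaryDto(*row) for row in rows]


summaries = fetch_order_summaries(create_sample_database())
```

The address and internal_remarks columns are never read from the database in this sketch, and there is no separate entity-to-DTO mapping step either.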

What about the behavior of your entities?

Do your entities have any behavior in them? If not, they are already more of a DTO than a true entity. In fact, if your entities have no behavior at all, you could even wonder why you’re using an ORM in the first place. Now, behavior can mean many things. It could mean lazy loading of associations. It could mean actual business logic. Obviously, lazy-loading doesn’t (and shouldn’t!) work client-side, but what about your business logic? Do you have business logic that can be executed client-side? Or is it business logic that should only be executed behind the service layer? If so, how do you make that distinction, and how do you prevent clients from using the logic that should stay behind the service layer? Whatever you do, you’re pretty much opening up a can of worms that really is better avoided in the first place.

How are you going to deal with technical issues?

Accepting and returning entities from services introduces a host of technical issues that can be quite substantial. Serialization and deserialization specifically are issues that you need to be worried about. If you’re using an ORM which does lazy-loading of associations, this will certainly cause serialization issues that you need to work around. You can either disable lazy loading, or you can make sure that your entities are always fully initialized (as in: always have their associations fully loaded) before they are sent back to the client. Disabling lazy-loading will cause performance problems in your service layer, either in places where you don’t expect them to be or in places that you haven’t thought of before it’s too late. Fully loading your entities and their associations before returning them is another performance nightmare waiting to happen so that’s really not an ideal solution either. You can try to hook into the serialization process or even the lazy-loading features of your ORM but whatever you do in that case will be a hack that will cause issues sooner or later. And again, all of these problems can very easily be avoided with a solution which, I hope you realize by now, offers plenty more benefits than any solution where you accept/return entities in your service.
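The lazy-loading problem can be imitated with a toy model. The Python sketch below is not how any real ORM implements sessions or proxies; Session, LazyList, Order, OrderDto and SessionClosedError are all hypothetical. It only shows the shape of the problem: a serializer walks the entire object graph, touches a lazy association after the session is gone, and fails, while a DTO built from plain values serializes without trouble.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional


class SessionClosedError(RuntimeError):
    """Raised when a lazy association is touched after its session is closed."""


class Session:
    def __init__(self) -> None:
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


class LazyList:
    """Toy lazy association: loads on first access, only while the session is open."""

    def __init__(self, session: Session, loader: Callable[[], List[str]]) -> None:
        self._session = session
        self._loader = loader
        self._items: Optional[List[str]] = None

    def load(self) -> List[str]:
        if self._items is None:
            if not self._session.is_open:
                raise SessionClosedError("association accessed outside its session")
            self._items = self._loader()
        return self._items


class Order:  # hypothetical entity with a lazy association
    def __init__(self, order_id: int, reference: str, lines: LazyList) -> None:
        self.id = order_id
        self.reference = reference
        self.lines = lines


@dataclass(frozen=True)
class OrderDto:
    id: int
    reference: str


def serialize_entity(order: Order) -> str:
    # Serializing the entity walks the graph and triggers the lazy load.
    return json.dumps({"id": order.id, "reference": order.reference,
                       "lines": order.lines.load()})


session = Session()
order = Order(1, "PO-42", LazyList(session, lambda: ["2 x widget"]))
dto = OrderDto(order.id, order.reference)  # copies plain values only
session.close()  # the service call ends, and so does the session

try:
    serialize_entity(order)
    entity_failed = False
except SessionClosedError:
    entity_failed = True

dto_payload = json.dumps(asdict(dto))  # plain data, nothing left to load
```

In this toy model the entity can no longer be serialized once the session is closed, whereas the DTO has no link back to the session at all.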

Conclusion

Every single downside to exposing entities through services that I mentioned is an issue that I have encountered myself in past projects, either ones I've worked on or ones that I've seen other people work on. If that’s not enough for you, then maybe you’ll find it interesting to know that some of the brightest and most respected people (like Udi Dahan and Ayende for instance) in the .NET community also actively recommend against exposing entities through services because of the same downsides that I mentioned, though they could probably give you even more downsides that I forgot to cover in this post. These downsides are not figments of anyone’s imagination. They are very real, and you really, really ought to think twice before dismissing this advice.

Written by Davy Brion, published on 2010-05-17 16:18:43
Categories: architecture , code-quality , opinions , performance , wcf
