Don't POOP - The Partial/Optional Object Population Anti-Pattern

The Partial/Optional Object Population (POOP) anti-pattern occurs when we have a class with multiple properties, we re-use it in various parts of our application, and in different places we populate some properties and ignore others, leaving them with default values. We do this because it seems easier than creating a new class with only the properties we need. It appears to save a few seconds but leads to maintainability issues and defects.

Suppose we have these classes containing information about a product:

public class Product
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public ProductPrice Price { get; set; } 
}

public class ProductPrice
{
    public decimal Amount { get; set; }
    public string Currency { get; set; }
}

(You may see issues with this code that have nothing to do with whether objects are partially populated, but that’s realistic. Wherever we find one anti-pattern we’re likely to find others.)

We start with methods that return a Product or a collection:

Task<Product> FindProduct(Guid id);
Task<IEnumerable<Product>> FindProducts(SearchCriteria criteria);

Each Product returned from these methods has an ID, name, description, and a ProductPrice object representing the price of the product. So far, so good.

Then we encounter another scenario: Another part of our application needs the ID, name, and description, but not the price. Perhaps the price lookup consumes more resources and that method doesn’t use it, so why retrieve it?

We create a new method:

Task<Product> GetProductDisplayDetails(Guid productId);

This method returns a Product with a null Price property. What’s the harm? The method that calls GetProductDisplayDetails doesn’t use the Price property, and we know that Price isn’t populated when we get a Product from this method.

Then we have a requirement to provide promotional discounts for some products, so we modify our ProductPrice with some new properties:

public class ProductPrice
{
    public decimal Amount { get; set; }
    public string Currency { get; set; }
    public decimal DiscountPrice { get; set; }
    public string DiscountCode { get; set; }
}

…and we create a new method that returns discounted products for specific customers:

Task<Product> GetDiscountedProduct(Guid productId);

When we call this method we get a Product with a ProductPrice, and the DiscountPrice and DiscountCode properties are populated.

Now we’ve got three ways to get a Product, and which properties are or aren’t populated depends on which method we got it from.

The Problem

When we start out we know that some Product or ProductPrice properties are populated depending on which method returned them. We know that because we wrote the code five minutes ago.

Over time as our code becomes more complex we may pass these Product objects to more methods, and as we do so it becomes harder to keep track of where they came from. Developers working in other parts of the code may be surprised to find that when they perform an operation on one Product it works, but in another case they get a NullReferenceException because they expected a Price and there was none. Or there was a price and they expected a DiscountCode but there was none. Or, worse, they expected the DiscountPrice to be populated and introduced another defect: DiscountPrice is a decimal, so when nothing sets it the value silently defaults to 0 and the discounted price is sometimes $0.
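Here is a minimal sketch of both failures. Checkout.AmountDue, Demo.Run, and the sample values are hypothetical and exist only for illustration; the two Product instances stand in for what GetProductDisplayDetails and FindProduct might return.

```csharp
using System;

public class ProductPrice
{
    public decimal Amount { get; set; }
    public string Currency { get; set; }
    public decimal DiscountPrice { get; set; }
    public string DiscountCode { get; set; }
}

public class Product
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public ProductPrice Price { get; set; }
}

// Hypothetical consumer. Nothing in its signature says which method produced the Product.
public static class Checkout
{
    public static decimal AmountDue(Product product) => product.Price.DiscountPrice;
}

public static class Demo
{
    public static void Run()
    {
        // Shaped like a result of GetProductDisplayDetails: Price was never set.
        var displayOnly = new Product { Id = Guid.NewGuid(), Name = "Lamp", Description = "Desk lamp" };

        // Shaped like a result of FindProduct: Price is set, the discount properties are not.
        var priced = new Product
        {
            Id = Guid.NewGuid(),
            Name = "Lamp",
            Description = "Desk lamp",
            Price = new ProductPrice { Amount = 20m, Currency = "USD" }
        };

        try
        {
            Checkout.AmountDue(displayOnly);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("This Product has no Price.");
        }

        // No exception here, which is worse: the unset decimal defaults to zero.
        Console.WriteLine(Checkout.AmountDue(priced));
    }
}
```

The first call at least fails loudly. The second returns a plausible-looking number, so the defect can travel a long way before anyone notices it.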

Chances are that if we “solve” a few problems by adding a few properties to these classes we won’t stop. The more properties the class has, the more likely it is that some new consumer will have a use for it, and re-using it will seem more expedient than creating a new one. That consumer may need to add its own properties, and the problem grows. We may end up with dozens of properties, combinations of which are populated by some code paths and not others. I’ve seen this become so confusing that developers begin to duplicate properties. Clusters of properties appear on a class and on an object contained within that class. Imagine seeing multiple DiscountPrice properties when reading code and having to figure out which one has a value, or realizing that the answer is “It depends.”

POOP also violates the Interface Segregation Principle. The varied, unrelated consumers of the class become effectively coupled to each other because a change to the class made to meet the need of one consumer potentially impacts the other consumers. That wouldn’t be a concern if they all used different classes instead of using the same one for unrelated purposes.

This is the sort of problem we learn to work around, but doing so has consequences. Developers lose the ability to look at code in a smaller context. If a method does something with a Product, we can’t understand that method in isolation. We find ourselves tracing all the code paths leading to that method to figure out where that Product came from. We can figure it out if we’re careful, but having to be careful slows us down. When we carefully add new code we add to the complexity for the next developer. They must be as careful as we were and understand the new code we’ve added as well.

This cycle may be sustainable, but it costs us. Changes take longer and longer and the chances of introducing defects increase. Instead of modifying the code to add useful functionality we’re doing so to fix defects. Fixing them takes longer and is more likely to introduce even more defects. This churn can go on for months or years.

Good thing we saved a few seconds by re-using existing classes instead of creating new ones!

Solution: Invariants

The easiest solution is to define Product and ProductPrice to enforce that properties which should be populated are always populated. This is an invariant - something that is always true. An example of this would be an immutable class. Its properties are validated and set when an instance is created and never change.

It might look like this:

public class Product
{
    public Product(Guid id, string name, string description, ProductPrice price)
    {
        if (id == Guid.Empty)
            throw new ArgumentException($"'{nameof(id)}' cannot be an empty Guid.", nameof(id));
        if (string.IsNullOrEmpty(name))
            throw new ArgumentException($"'{nameof(name)}' cannot be null or empty.", nameof(name));
        if (string.IsNullOrEmpty(description))
            throw new ArgumentException($"'{nameof(description)}' cannot be null or empty.", nameof(description));
        Id = id;
        Name = name;
        Description = description;
        Price = price ?? throw new ArgumentNullException(nameof(price));
    }
    public Guid Id { get; }
    public string Name { get; }
    public string Description { get; }
    public ProductPrice Price { get; }
}

The object can’t be created without supplying all of its properties, and once it’s created those properties can’t be changed (assuming that we also make ProductPrice immutable). The properties aren’t optional. I had to work that extra “O” in there to make the “POOP” acronym.
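An immutable ProductPrice could follow the same shape. This is only a sketch: the rule that Amount can't be negative is my assumption, and the discount properties are left out on the assumption that they belong in a type of their own.

```csharp
using System;

public class ProductPrice
{
    public ProductPrice(decimal amount, string currency)
    {
        // Assumed rule for illustration: a price is never negative.
        if (amount < 0)
            throw new ArgumentOutOfRangeException(nameof(amount), "Amount cannot be negative.");
        if (string.IsNullOrEmpty(currency))
            throw new ArgumentException($"'{nameof(currency)}' cannot be null or empty.", nameof(currency));
        Amount = amount;
        Currency = currency;
    }

    // Get-only properties: set once in the constructor, never changed afterwards.
    public decimal Amount { get; }
    public string Currency { get; }
}
```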

It’s more code and takes longer to write, but that effort may pay for itself many times over. Why? Because whenever a future developer encounters Product anywhere they don’t need to consider where it came from. The class definition itself tells them everything they need to know. What they think they know about it will never be wrong. The extra minute or two it took to create an immutable type may in the long run save dozens of hours or more that might have been spent reading code over and over or fixing defects.

If we have lots of properties then applying the builder pattern may be easier than having a constructor with many arguments. C# 9 also introduced record types, which make defining immutable classes easier.
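As a rough sketch of the record approach (C# 9 or later), the same two types might look like this. A positional record doesn't validate anything on its own, so this version declares the properties explicitly and routes each constructor parameter through a property initializer; that is one way to keep the checks, not the only one.

```csharp
using System;

// The compiler generates init-only properties, value equality, and ToString.
public record ProductPrice(decimal Amount, string Currency);

public record Product(Guid Id, string Name, string Description, ProductPrice Price)
{
    // Declaring a property with the same name as a positional parameter replaces
    // the generated one. Get-only (no init) means a 'with' expression can't
    // bypass the validation either.
    public Guid Id { get; } = Id != Guid.Empty
        ? Id
        : throw new ArgumentException("Id cannot be an empty Guid.", nameof(Id));

    public string Name { get; } = !string.IsNullOrEmpty(Name)
        ? Name
        : throw new ArgumentException("Name cannot be null or empty.", nameof(Name));

    public string Description { get; } = !string.IsNullOrEmpty(Description)
        ? Description
        : throw new ArgumentException("Description cannot be null or empty.", nameof(Description));

    public ProductPrice Price { get; } = Price ?? throw new ArgumentNullException(nameof(Price));
}
```

Two records with the same values compare as equal, which is usually what we want from an object that only carries data.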

I use immutability as an example of invariance. What ultimately matters is not that an object is immutable but that it’s always in a valid state.
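For instance, a class can be mutable and still protect its invariants if every change goes through a method that validates it. In this sketch the ApplyDiscount method, the HasDiscount property, and the rule that a discounted price can't exceed the regular amount are all assumptions made for illustration.

```csharp
using System;

public class ProductPrice
{
    public ProductPrice(decimal amount, string currency)
    {
        if (amount < 0)
            throw new ArgumentOutOfRangeException(nameof(amount), "Amount cannot be negative.");
        Amount = amount;
        Currency = currency ?? throw new ArgumentNullException(nameof(currency));
    }

    public decimal Amount { get; }
    public string Currency { get; }

    // Nullable, so "no discount" can't be mistaken for "discounted to zero".
    public decimal? DiscountPrice { get; private set; }
    public string DiscountCode { get; private set; }
    public bool HasDiscount => DiscountCode != null;

    // The only way to change state. The code and the discounted price are always
    // set together, so a caller never sees one without the other.
    public void ApplyDiscount(string code, decimal discountPrice)
    {
        if (string.IsNullOrEmpty(code))
            throw new ArgumentException("Discount code cannot be null or empty.", nameof(code));
        if (discountPrice < 0 || discountPrice > Amount)
            throw new ArgumentOutOfRangeException(nameof(discountPrice), "Discount price must be between 0 and Amount.");
        DiscountCode = code;
        DiscountPrice = discountPrice;
    }
}
```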

Solution: Group Sets of Properties Into Classes

One of the reasons why this problem might occur in the first place is that Product has dozens of properties. If we need a class that has many of those properties and a few new ones, that’s when it seems easier to add to existing classes instead of creating new ones.

We can mitigate this by grouping properties into classes as we did with the ProductPrice class. This makes it easier to create new classes with combinations of the properties we need. Whenever we find ourselves duplicating sets of related properties across multiple classes we should consider encapsulating them within a single class.
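Applied to the running example, that might look like the sketch below. ProductDetails, Discount, PricedProduct, and DiscountedProduct are hypothetical names, and the method signatures in the closing comment show how the three methods from earlier could each return a type that says exactly what it contains.

```csharp
using System;

// Properties that always travel together get their own small, always-valid class.
public class ProductDetails
{
    public ProductDetails(Guid id, string name, string description)
    {
        if (id == Guid.Empty)
            throw new ArgumentException("Id cannot be an empty Guid.", nameof(id));
        if (string.IsNullOrEmpty(name))
            throw new ArgumentException("Name cannot be null or empty.", nameof(name));
        if (string.IsNullOrEmpty(description))
            throw new ArgumentException("Description cannot be null or empty.", nameof(description));
        Id = id;
        Name = name;
        Description = description;
    }

    public Guid Id { get; }
    public string Name { get; }
    public string Description { get; }
}

public class ProductPrice
{
    public ProductPrice(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency ?? throw new ArgumentNullException(nameof(currency));
    }

    public decimal Amount { get; }
    public string Currency { get; }
}

public class Discount
{
    public Discount(decimal price, string code)
    {
        Price = price;
        Code = code ?? throw new ArgumentNullException(nameof(code));
    }

    public decimal Price { get; }
    public string Code { get; }
}

// Each use case composes only the groups it needs.
public class PricedProduct
{
    public PricedProduct(ProductDetails details, ProductPrice price)
    {
        Details = details ?? throw new ArgumentNullException(nameof(details));
        Price = price ?? throw new ArgumentNullException(nameof(price));
    }

    public ProductDetails Details { get; }
    public ProductPrice Price { get; }
}

public class DiscountedProduct
{
    public DiscountedProduct(ProductDetails details, ProductPrice price, Discount discount)
    {
        Details = details ?? throw new ArgumentNullException(nameof(details));
        Price = price ?? throw new ArgumentNullException(nameof(price));
        Discount = discount ?? throw new ArgumentNullException(nameof(discount));
    }

    public ProductDetails Details { get; }
    public ProductPrice Price { get; }
    public Discount Discount { get; }
}

// The earlier methods could then be declared as:
//   Task<PricedProduct> FindProduct(Guid id);
//   Task<ProductDetails> GetProductDisplayDetails(Guid productId);
//   Task<DiscountedProduct> GetDiscountedProduct(Guid productId);
```

A method that receives a DiscountedProduct never has to wonder whether the discount is there, and a method that receives a ProductDetails can't reach for a price that was never loaded.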

Solution: Inheritance

I’m just kidding. Inheritance might seem like an easy solution. If we need a class with all the properties of Product and a few more, we can inherit from Product and the new class has more properties. But unless we’re careful it leads us back to the same place. We’ll need a class with some of the properties from Product, some of the properties from the new inherited class, and a few new ones. Then we end up with partially populated objects plus the confusion of inheritance. The previous solution is better. Prefer composition over inheritance.

Conclusion

We’ve all found ourselves working in code that’s difficult to understand and safely modify. When that happens, can we identify individual decisions that led to that difficulty? If we can identify them then we can avoid repeating them and even begin to fix them.

Big complexity is usually the accumulation of small decisions, such as:

1. We decided to create a class with multiple properties but didn’t think making it immutable was worth the effort.
2. We needed a new property some of the time, and adding it to an existing class and populating it only when needed would take less time than creating a new class.
3. Repeat step 2. We just need one more property.

We make these choices because individually they are small, having no apparent consequences. The effects come later. This, in my opinion, is one of the greatest challenges of software development. There are causes and effects, but we don’t connect them because the effects are deferred, and by the time we encounter them there are lots of causes mixed together. This is why our code gets out of control, we hate it, and yet given the chance to start over we make the same decisions with the same eventual results.

Once we begin to connect cause and effect we see that few decisions are truly insignificant. Realizing this could overwhelm us. How do we look into the future and see the outcome of each choice? We can’t. We can guide our decisions by applying principles and avoiding anti-patterns so that we make fewer decisions we regret and the rest are easier to change as the need arises.