Time for a new part of the ECS back and forth series. With this post I want to
go through the hybrid storage chimera for ECS libraries and explain why it’s
not as good as it seems at first glance.
EnTT
already comes with a hybrid storage and has offered it for a long time, even
though it isn’t fully recognized as such by many (apparently I did a good job
of hiding it). Briefly, it does so with its grouping functionality.
For those who don’t know it, this feature allows users to literally create
tables across different and otherwise independent pools.
If on one side this has some benefits, on the other it also suffers from the
same problems as other well known architectures. Be careful though: not
problems in an absolute sense, but only when framed in this perspective.
This is why I think I’m probably good enough to talk about hybrid storage and
especially to explain its cons. In a nutshell, because I’ve already hurt myself
with it.
Introduction
The idea behind the concept of hybrid storage is quite simple but can also be
easily misunderstood.
I’m not talking about the custom storage offered by EnTT
here. In that case, we have fully independent storage classes that simply offer
different functionalities. For example, a plain array in one case and a paged
one in the other, or a pool that also emits signals versus one that does not.
This is not a hybrid storage; these are just custom, independent pools.
A hybrid storage is something else. It doesn’t cover how a single type is laid
out. Instead, it covers how multiple types are laid out and how they affect
each other, if at all.
Let’s take the most common models: sparse set based solutions vs table based
ones (like those where archetypes are dynamically created at runtime, but also
where the number and the types of the tables are fixed at compile-time).
Can you spot the differences? Do you see why mixing them can lead to headaches
in the worst case, or just take away some nice-to-have features in the best
case? Do you know how groups fit into all of this?
If you don’t, then this post is for you.
Independent pools
A sparse set based model, like many others actually, is such that it has fully
independent pools. Whether it’s good or not is not something I want to discuss;
it’s primarily a matter of taste. Of course, it’s my preferred design, but this
doesn’t make it any worse or better than others.
What matters here are the details. Because the devil is in the details.
Let’s make an example: multithreading characteristics. In this case, users can
easily add and remove components concurrently, as long as they don’t insist on
the same pool from different threads. No command queues, no staging areas, none
of this: you can just do it when you like. You can even sort a pool and update
the data within other pools from another thread if you like. There is no limit
to what you can do as long as you don’t break its most basic rule: don’t write
(for all definitions of write, so for example don’t update, add or remove)
instances of components having the same type from different threads.
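To make this concrete, here is a minimal sketch of the idea with EnTT. Position and Velocity are made-up components, the entities and both pools are assumed to already exist, and nothing else touches these pools in the meantime; the point is only that each thread writes to a different pool.

```cpp
#include <thread>
#include <vector>
#include <entt/entt.hpp>

// Placeholder components, purely for illustration.
struct Position { float x{}; float y{}; };
struct Velocity { float dx{}; float dy{}; };

void concurrent_updates(entt::registry &registry, const std::vector<entt::entity> &entities) {
    // Each thread writes to a different pool, so no command queue or staging
    // area is required. This is a sketch of the rule above, not a full program.
    std::thread first{[&]() {
        for(auto entity: entities) { registry.emplace_or_replace<Position>(entity, 0.f, 0.f); }
    }};

    std::thread second{[&]() {
        for(auto entity: entities) { registry.emplace_or_replace<Velocity>(entity, 1.f, 1.f); }
    }};

    first.join();
    second.join();
}
```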
Otherwise, without bothering with multithreading, consider that an independent
pools approach allows users to add and delete components during iterations (with
some tricks unknown to most), without the need for any overlay to delay these
operations.
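As a rough sketch of what this means in practice (the Health component is made up for illustration), something like the following is fine with independent pools, no deferred operations required:

```cpp
#include <entt/entt.hpp>

// Hypothetical component, just to make the point.
struct Health { int value{}; };

void purge_dead(entt::registry &registry) {
    // Removing the component of the entity currently being visited is allowed
    // during the iteration itself: no command buffer, no staging area.
    for(auto entity: registry.view<Health>()) {
        if(registry.get<Health>(entity).value <= 0) {
            registry.remove<Health>(entity);
        }
    }
}
```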
Furthermore, it allows users to extend the pools dedicated to specific types or
to change their internal design to fit a given goal (duck typing at the end of
the day: as long as it behaves like a storage, it’s a storage).
This gives users a high degree of freedom in many respects. As anything else, it
has a price but let’s ignore it for the sake of the discussion and focus only on
the main topic.
Note also that this kind of freedom isn’t necessarily desired, so far be it
from me to imply that it’s an advantage. It is for me of course, because I like
to have it and exploit it, but I’ve known many developers who are even afraid
of it and, for example, prefer a more classic approach with sync points for
some problems.
Tables
Tables (either archetypes or fixed predefined tables and so on) undergo a
completely different set of rules.
Roughly speaking, this is due to the fact that they aren’t built on top of
independent pools. Component T for entity E is somewhere in a table together
with other components that you don’t know about and that get moved around when
the entity’s component set changes. This necessarily forces a different
approach in a lot of cases. Neither worse nor better, again, just different.
A trivial example, to tie back to the previous section, is that of adding and
removing components from a thread. One simply can’t do this directly, and
that’s all. You always need a support layer and a sync point.
Since users don’t know what other components live in a table besides the ones
they want to add or remove, the risk of a data race is high. So, delaying
operations to a sync point is necessary, no matter what. Yay or nay? Good or
bad? Not the focus of this post.
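A minimal sketch of what that support layer usually boils down to, with made-up names (World, CommandBuffer) standing in for whatever a real table based implementation provides:

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Placeholder for an archetype or table based world type.
struct World { /* ... */ };

// Worker threads record structural changes, a single sync point applies them.
class CommandBuffer {
    std::vector<std::function<void(World &)>> commands;
    std::mutex mutex;

public:
    // Safe to call from any worker thread.
    template<typename Fn>
    void enqueue(Fn &&fn) {
        std::lock_guard lock{mutex};
        commands.emplace_back(std::forward<Fn>(fn));
    }

    // Called from a single thread only, at the sync point.
    void flush(World &world) {
        for(auto &command: commands) { command(world); }
        commands.clear();
    }
};
```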
This is also true for a single threaded approach, unfortunately. Because of how
things are laid out and the fact that they get moved around after any change
(well, more or less), the risk is otherwise to process an element twice during
an iteration.
It’s also harder in general to specialize a type or give it special treatment,
or at least to make that flexible and easy to do on the user’s side. Types
don’t have their own pools; instead, they share tables with other types. There
is little else to add.
On the other side, these kinds of models (well, some flavors at least) are, for example, easier to use when balancing the load between different threads. A model with fully independent pools requires more work to split the workload properly (it’s possible, though not trivial, unless you’re doing it for a single type), while a chunked table for example (and only a chunked table actually, so not all table based models) translates easily into an almost perfectly balanced workload, as in the sketch below.
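A rough sketch of why chunks make this easy, with Chunk and process being placeholders rather than any real library’s API:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Made-up chunk type: a fixed-size slice of a table, processed as a unit.
struct Chunk { std::vector<float> values; };

void process(Chunk &chunk) {
    for(auto &value: chunk.values) { value += 1.f; }
}

int main() {
    std::vector<Chunk> chunks(64, Chunk{std::vector<float>(1024, 0.f)});

    // Every chunk is an equally sized, independent unit of work, so handing
    // them out to a pool of threads yields an almost perfectly balanced load.
    std::for_each(std::execution::par, chunks.begin(), chunks.end(), process);
}
```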
Grouping functionality and independent pools
How does EnTT’s grouping functionality fit into all of this?
It’s easy to explain, if you look at it for what it is. This feature is nothing
more than a way to create tables within an independent pools model. It
therefore introduces a hybrid model within a specific design.
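For reference, this is roughly what it looks like in practice. Position and Velocity are just placeholder components; the full-owning group below rearranges their pools so that entities having both components are laid out contiguously, much like a table spanning two otherwise independent pools:

```cpp
#include <entt/entt.hpp>

// Placeholder components, just for illustration.
struct Position { float x{}; float y{}; };
struct Velocity { float dx{}; float dy{}; };

void update(entt::registry &registry) {
    // The group owns both pools and keeps them sorted so that matching
    // entities sit at the top, tightly packed and iterable as a single table.
    auto group = registry.group<Position, Velocity>();

    group.each([](Position &pos, Velocity &vel) {
        pos.x += vel.dx;
        pos.y += vel.dy;
    });
}
```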
How well does it actually work? What are the advantages and disadvantages of it?
Let’s go a little further on the topic.
In terms of performance, I’ve already told you everything I could say in my
previous posts.
It’s fast, damn fast, even though the price for this speed is paid when
components (with data!) are added and removed. Optimizing the addition and
removal of empty components is easy, but it’s also limited in scope. So, if we
consider the most interesting cases, of course it has a price.
As for the impact on the way we write code, instead, the use of groups
shouldn’t be overlooked.
In fact, they suffer from the same problems that table based models suffer
from. It’s quite easy to understand: a group creates an interdependence between
independent pools to induce a sort of table within them. That’s it.
Users cannot make the same assumptions when doing multithreading as if they were
working with independent pools and cannot think of adding or removing instances
easily during an iteration without a support structure, because entities can
enter and exit the implicit table at any time.
On the one hand, the fact that groups are plug-and-play and that the types
involved are user-controllable limits their scope and allows us to benefit from
the multithreading features of the two models with a few precautions (perhaps,
even more than a few). On the other hand, when you begin to make extensive use
of groups and lose control over the number and types of components involved, you
find yourself giving up the benefits of an independent pool model, I dare say
out of fear.
This is especially true when working in a large team or when you don’t have the
full vision of the project, or even just if you’re a junior developer and don’t
want to take risks, which makes perfect sense.
Opaque API… really?
And then we come to the point: how convenient is it to hide the presence of
multiple storage models that may or may not influence each other behind a
transparent API?
I did it and went back, so my answer is probably obvious. Apparently, I’m not
the only one who
tried this either.
Now, let’s consider this code for a moment:
```cpp
auto view = registry.view<T, U, V>();
```
How are these types related to each other? How can I use them to get the most
out of my threads? A perfect workload? Upstream or downstream filtering? Can I
process them in parallel with other types? Can I add or remove easily during
iterations? And so on.
I don’t know, because there are important things that aren’t explicit here. The
only option is to take the more conservative approach. I could even decide to
add instances of type T during iterations because I know that…, but how safe
is it? Well, it is not if a colleague decides to change how T is laid out all
of a sudden. This would introduce subtle bugs that are hard to spot and require
me to go through the whole codebase to find all uses of T.
The fact is that there is a line between ease of use and control over a system,
and I personally think it’s crossed here.
The chimera of helping the user is sometimes a double-edged sword that risks
taking away the freedom of choice.
To what extent is all this true? As always, the answer is: it depends.
Looking at the specific implementations, my consideration is the following.
In an approach like that of EnTT
, where everything is designed as a container
that can be used at any time and things like scheduling are left to the user,
having an explicit model is certainly more convenient. The alternative is to
accept a compromise on some aspects and to find the common factor between the
different solutions, that is, to work as if there were only tables under the
hood.
Conversely, in a model that takes over your loop, schedules tasks for you, and
is greedier for information and more invasive in managing your data in general,
the problem moves to the other side of the border. Being explicit still helps,
but it matters internally and doesn’t leak to the user. Therefore a more
homogeneous API is possible, at the price of less control over the dynamics for
the user.
So, to sum up, I would agree to use a system that relies on different storage
models and makes them opaque to me, as long as it also takes care of the
scheduling, any command queues or the like, and the merging of my data where
necessary. I couldn’t accept it otherwise.
However, this means giving up control and (entirely personal opinion) isn’t
something I like very much, so I would opt for more API clarity which allows me
to get the most out of my skills (which are non-existent, but that’s another
point).
Conclusion
What you should take away from this super short and very intuitive analysis is
that the two models aren’t easily interchangeable or, at least, aren’t that
easy to run together at full capacity while making the most of both. Just as
other models are not, especially when they offer such different characteristics
in many respects. This is particularly true in a design like that of EnTT,
which looks like a simple container and doesn’t try to take over any aspect of
the application.
Personally (but of course everyone has their own opinion) I prefer to adopt a
model and exploit it to the full, knowing its pros and cons. I like less having
something hybrid in hand, which perhaps solves a problem I didn’t have on one
side but also introduces new and trickier ones on the other.
Groups are the only exception I’ve ever made to my rule. Because every rule has
its exception, right?
Let me open a personal parenthesis that maybe someone is interested in but
that others can skip. I use groups in very specific cases, where I know that
their pros will benefit me and their cons don’t conflict with the use I make of
certain types. However, I use them in a very controlled way, never letting the
number and type of components involved get out of hand. This is because I want
to continue to exploit the features (all features!) of a model that I
personally appreciate the most, without losing the benefits of a technique
that, in some cases, can give me something more.
Let me know that it helped
I hope you enjoyed what you’ve read so far.
If you liked this post and want to say thanks, consider starring the GitHub project that hosts this blog. It’s the only way you have to let me know that you appreciate my work.
Thanks.