Time for a new part of the ECS back and forth series. With this post I want to
go through the hybrid storage chimera for ECS libraries and explain why it’s
not as good as it seems at first glance.
EnTT
already comes with a hybrid storage and has offered it for a long time, even
though it isn’t fully recognized as such by many (apparently I did a good job
of hiding it). Briefly, it does so with its grouping functionality.
For those who don’t know it, this feature allows users to literally create
tables across different and otherwise independent pools.
If on one side this has some benefits, on the other it also suffers from the
same problems as other well known architectures. Be careful though: not
problems in an absolute sense, but only when framed in this perspective.
This is why I think I’m probably good enough to talk about hybrid storage and
especially to explain its cons. In a nutshell, because I’ve already hurt myself
with it.
Introduction
The idea behind the concept of hybrid storage is quite simple but can also be
easily misunderstood.
I’m not talking about the custom storage offered by EnTT
here. In that case, we have fully independent storage classes that simply offer
different functionalities. For example, a plain array in one case and a paged
one in the other, or a pool that also emits signals versus one that does not.
This is not a hybrid storage; these are just custom, independent pools.
A hybrid storage is something else. It doesn’t cover how a single type is laid
out. Instead, it covers how multiple types are laid out and how they affect
each other, if at all.
Let’s take the most common models: sparse set based solutions vs table based
ones (like those where archetypes are dynamically created at runtime, but also
where the number and the types of the tables are fixed at compile-time).
Can you spot the differences? Do you see why mixing them can lead to headaches
in the worst case, or just take away some nice-to-have features in the best
case? Do you know how groups fit into all of this?
If you don’t, then this post is for you.
Independent pools
A sparse set based model, like many others actually, is such that it has fully
independent pools. Whether it’s good or not is not something I want to discuss;
it’s primarily a matter of taste. Of course, it’s my preferred design, but this
doesn’t make it any worse or better than others.
What matters here are the details. Because the devil is in the details.
Let’s make an example: multithreading characteristics. In this case, users can
easily add and remove components concurrently, as long as they don’t insist on
the same pool from different threads. No command queues, no staging areas, none
of this: you can just do it when you like. You can even sort a pool and update
the data within other pools from another thread if you like. There is no limit
to what you can do as long as you don’t break its most basic rule: don’t write
(for all definitions of write, so for example don’t update, add or remove)
instances of components having the same type from different threads.
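To make this concrete, here is a minimal sketch of the idea with EnTT. Position and Velocity are made-up components, the entities and both pools are assumed to already exist, and nothing else touches these pools in the meantime; the point is only that each thread writes to a different pool.

```cpp
#include <thread>
#include <vector>
#include <entt/entt.hpp>

// Placeholder components, purely for illustration.
struct Position { float x{}; float y{}; };
struct Velocity { float dx{}; float dy{}; };

void concurrent_updates(entt::registry &registry, const std::vector<entt::entity> &entities) {
    // Each thread writes to a different pool, so no command queue or staging
    // area is required. This is a sketch of the rule above, not a full program.
    std::thread first{[&]() {
        for(auto entity: entities) { registry.emplace_or_replace<Position>(entity, 0.f, 0.f); }
    }};

    std::thread second{[&]() {
        for(auto entity: entities) { registry.emplace_or_replace<Velocity>(entity, 1.f, 1.f); }
    }};

    first.join();
    second.join();
}
```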
Otherwise, without bothering with multithreading, consider that an independent
pools approach allows users to add and delete components during iterations (with
some tricks unknown to most), without the need for any overlay to delay these
operations.
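As a rough sketch of what this means in practice (the Health component is made up for illustration), something like the following is fine with independent pools, no deferred operations required:

```cpp
#include <entt/entt.hpp>

// Hypothetical component, just to make the point.
struct Health { int value{}; };

void purge_dead(entt::registry &registry) {
    // Removing the component of the entity currently being visited is allowed
    // during the iteration itself: no command buffer, no staging area.
    for(auto entity: registry.view<Health>()) {
        if(registry.get<Health>(entity).value <= 0) {
            registry.remove<Health>(entity);
        }
    }
}
```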
Furthermore, it allows users to extend the pools dedicated to specific types or
to change their internal design to fit a given goal (duck typing at the end of
the day: as long as it behaves like a storage, it’s a storage).
This gives users a high degree of freedom in many respects. As anything else, it
has a price but let’s ignore it for the sake of the discussion and focus only on
the main topic.
Note also that this kind of freedom isn’t necessarily desired, so far be it
from me to imply that it’s an advantage. It is for me of course, because I like
to have it and exploit it, but I’ve known many developers who are even afraid
of it and, for example, prefer a more classic approach with sync points for
some problems.
Tables
Tables (either archetypes or fixed predefined tables and so on) undergo a
completely different set of rules.
Roughly speaking, this is due to the fact that they aren’t built on top of
independent pools. Component T for entity E is somewhere in a table together
with other components that you don’t know about and that get moved around when
the entity’s component set changes. This necessarily forces a different
approach in a lot of cases. Neither worse nor better, again, just different.
A trivial example, to tie back to the previous section, is that of adding and
removing components from a thread. One simply can’t do this directly, and
that’s all. You always need a support layer and a sync point.
Since users don’t know what other components live in a table besides the ones
they want to add or remove, the risk of a data race is high. So, delaying
operations to a sync point is necessary, no matter what. Yay or nay? Good or
bad? Not the focus of this post.
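A minimal sketch of what that support layer usually boils down to, with made-up names (World, CommandBuffer) standing in for whatever a real table based implementation provides:

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Placeholder for an archetype or table based world type.
struct World { /* ... */ };

// Worker threads record structural changes, a single sync point applies them.
class CommandBuffer {
    std::vector<std::function<void(World &)>> commands;
    std::mutex mutex;

public:
    // Safe to call from any worker thread.
    template<typename Fn>
    void enqueue(Fn &&fn) {
        std::lock_guard lock{mutex};
        commands.emplace_back(std::forward<Fn>(fn));
    }

    // Called from a single thread only, at the sync point.
    void flush(World &world) {
        for(auto &command: commands) { command(world); }
        commands.clear();
    }
};
```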
This is also true for a single threaded approach, unfortunately. Because of how
things are laid out and the fact that they get moved around after any change
(well, more or less), the risk is otherwise to process an element twice during
an iteration.
It’s also harder in general to specialize a type or give it special treatment,
or at least to make that flexible and easy to do on the user’s side. Types
don’t have their own pools; instead, they share tables with other types. There
is little else to add.
On the other side, these kinds of models (well, some flavors at least) are, for example, easier to use when balancing the load between different threads. A model with fully independent pools requires more work to split the workload properly (it’s possible, though not trivial, unless you’re doing it for a single type), while a chunked table for example (and only a chunked table actually, so not all table based models) translates easily into an almost perfectly balanced workload, as in the sketch below.
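A rough sketch of why chunks make this easy, with Chunk and process being placeholders rather than any real library’s API:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Made-up chunk type: a fixed-size slice of a table, processed as a unit.
struct Chunk { std::vector<float> values; };

void process(Chunk &chunk) {
    for(auto &value: chunk.values) { value += 1.f; }
}

int main() {
    std::vector<Chunk> chunks(64, Chunk{std::vector<float>(1024, 0.f)});

    // Every chunk is an equally sized, independent unit of work, so handing
    // them out to a pool of threads yields an almost perfectly balanced load.
    std::for_each(std::execution::par, chunks.begin(), chunks.end(), process);
}
```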
Grouping functionality and independent pools
How does EnTT’s grouping functionality fit into all of this?
It’s easy to explain, if you look at it for what it is. This feature is nothing
more than a way to create tables within an independent pools model. It
therefore introduces a hybrid model within a specific design.
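For reference, this is roughly what it looks like in practice. Position and Velocity are just placeholder components; the full-owning group below rearranges their pools so that entities having both components are laid out contiguously, much like a table spanning two otherwise independent pools:

```cpp
#include <entt/entt.hpp>

// Placeholder components, just for illustration.
struct Position { float x{}; float y{}; };
struct Velocity { float dx{}; float dy{}; };

void update(entt::registry &registry) {
    // The group owns both pools and keeps them sorted so that matching
    // entities sit at the top, tightly packed and iterable as a single table.
    auto group = registry.group<Position, Velocity>();

    group.each([](Position &pos, Velocity &vel) {
        pos.x += vel.dx;
        pos.y += vel.dy;
    });
}
```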
How well does it actually work? What are the advantages and disadvantages of it?
Let’s go a little further on the topic.
In terms of performance, I’ve already told you everything I could say in my
previous posts.
It’s fast, damn fast, even though the price for this speed is paid when
components (with data!) are added and removed. Optimizing the addition and
removal of empty components is easy, but it’s also limited in scope. So, if we
consider the most interesting cases, of course it has a price.
As for the impact on the way we write code, instead, the use of groups
shouldn’t be overlooked.
In fact, they suffer from the same problems that table based models suffer
from. It’s quite easy to understand: a group creates an interdependence between
independent pools to induce a sort of table within them. That’s it.
Users cannot make the same assumptions when doing multithreading as if they were
working with independent pools and cannot think of adding or removing instances
easily during an iteration without a support structure, because entities can
enter and exit the implicit table at any time.
On the one hand, the fact that groups are plug-and-play and that the types
involved are user-controllable limits their scope and allows us to benefit from
the multithreading features of the two models with a few precautions (perhaps,
even more than a few). On the other hand, when you begin to make extensive use
of groups and lose control over the number and types of components involved, you
find yourself giving up the benefits of an independent pool model, I dare say
out of fear.
This is especially true when working in a large team or when you don’t have the
full vision of the project, or even just if you’re a junior developer and don’t
want to take risks, which makes perfect sense.
Opaque API… really?
And then we come to the point: how convenient is it to hide the presence of
multiple storage models that may or may not influence each other behind a
transparent API?
I did it and went back, so my answer is probably obvious. Apparently, I’m not
the only one who
tried this either.
Now, let’s consider this code for a moment:
```cpp
auto view = registry.view<T, U, V>();
```
How are these types related to each other? How can I use them to get the most
out of my threads? A perfect workload? Upstream or downstream filtering? Can I
process them in parallel with other types? Can I add or remove easily during
iterations? And so on.
I don’t know, because there are important things that aren’t explicit here. The
only option is to take the more conservative approach. I could even decide to
add instances of type T during iterations because I know that…, but how safe
is it? Well, it is not if a colleague decides to change how T is laid out all
of a sudden. This would introduce subtle bugs that are hard to spot and require
me to go through the whole codebase to find all uses of T.
The fact is that there is a line between ease of use and control over a system,
and I personally think it’s crossed here.
The chimera of helping the user is sometimes a double-edged sword that risks
taking away the freedom of choice.
To what extent is all this true? As always, the answer is: it depends.
Looking at the specific implementations, my consideration is the following.
In an approach like that of EnTT
, where everything is designed as a container
that can be used at any time and things like scheduling are left to the user,
having an explicit model is certainly more convenient. The alternative is to
accept a compromise on some aspects and to find the common factor between the
different solutions, that is, to work as if there were only tables under the
hood.
Conversely, in a model that takes over your loop, schedules tasks for you, and
is greedier for information and more invasive in managing your data in general,
the problem moves to the other side of the border. Being explicit still helps,
but it matters internally and doesn’t leak to the user. Therefore a more
homogeneous API is possible, at the price of less control over the dynamics for
the user.
So, to sum up, I would agree to use a system that relies on different storage
models and makes them opaque to me, as long as it also takes care of the
scheduling, any command queues or the like, and the merging of my data where
necessary. I couldn’t accept it otherwise.
However, this means giving up control and (entirely personal opinion) isn’t
something I like very much, so I would opt for more API clarity which allows me
to get the most out of my skills (which are non-existent, but that’s another
point).
Conclusion
What you should take away from this super short and very intuitive analysis is
that the two models aren’t easily interchangeable or, at least, aren’t that
easy to run together at full capacity while making the most of both. Just as
other models are not, especially when they offer such different characteristics
in many respects. This is particularly true in a design like that of EnTT,
which looks like a simple container and doesn’t try to take over any aspect of
the application.
Personally (but of course everyone has their own opinion) I prefer to adopt a
model and exploit it to the full, knowing its pros and cons. I like less having
something hybrid in hand, which perhaps solves a problem I didn’t have on one
side but also introduces new and trickier ones on the other.
Groups are the only exception I’ve ever made to my rule. Because every rule has
its exception, right?
Let me open a personal parenthesis that maybe someone is interested in but
that others can skip. I use groups in very specific cases, where I know that
their pros will benefit me and their cons don’t conflict with the use I make of
certain types. However, I use them in a very controlled way, never letting the
number and type of components involved get out of hand. This is because I want
to continue to exploit the features (all features!) of a model that I
personally appreciate the most, without losing the benefits of a technique
that, in some cases, can give me something more.
Let me know that it helped
I hope you enjoyed what you’ve read so far.
If you liked this post and want to say thanks, consider starring the GitHub project that hosts this blog. It’s the only way you have to let me know that you appreciate my work.
Thanks.