Second part of the ECS back and forth series. After a half-way solution between OOP and component-based models and an approach where entities were also indexes, I want to introduce another couple of techniques that try to get rid of some common problems seen so far. In particular, we still have to find a way that gives us all the entities, and only the entities, that have a given set of components.
Later on in the series, I'll probably go more in depth into one of the models
presented, the one I know best and for which I can give more implementation
details.
Fortunately, more or less all the solutions share many aspects with each other.
It will therefore be possible to reuse most of the techniques and suggestions
when implementing your own tool.
If you haven't done it yet, read the first part of the series before continuing.
Introduction
In our journey from OOP towards component-based models we left out a problem
that is still unsolved. In particular, both the implementations discussed in the
previous part require systems to iterate all the entities to find the ones with
the desired set of components. There is apparently no way to avoid it.
However, as it happens often in computer science, an extra level of indirection
can solve the problem. This is mainly due to the fact that it helps to decouple
the client code from the data structures where entities and components are
stored, which allows us to lay out the objects so as to ease the search or to fill
the holes, if any.
As a side note, the extra level of indirection is more powerful than this. In fact, we can do much more than rearranging components and filling holes. I'll try to give more details in a future post, but this is beyond the purposes of this one. Because of that, in the next sections I'll concentrate only on how to get packed arrays for components and entities and how to leave holes out of the way.
Archetypes
A model that has become very popular lately, at least since Unity declared to have implemented a version of it, is that of archetypes. They were previously also known as prototypes, but it seems that archetypes is now the common term for this approach.
The basic concept is rather simple, even if implementing an efficient version of
it isn’t trivial at all.
The idea behind this approach can be summarized as follows: if an entity has a
particular set of components, take the pool (also known as archetype) for the
entities that have that same set (if it doesn't already exist, create it) and
assign the entity and all its components to that pool. Whenever you add/remove
a component to/from an entity, pick up everything again and move the entity and
all its components from one pool to the other, from one archetype to the other.
The implementation details for this aren't that easy, but there is a good
project on GitHub called decs that offers an experimental version of an
archetype based model in C++.
Another implementation of this design I'm aware of is reflecs, although this
time it's a C project and not a C++ one. I'm not a fan of C-ish interfaces (I
wrote uvw to avoid working directly with libuv after all). However, this
project is particularly interesting, since the author has tried to go beyond
the standard features usually offered by component based models and added
things such as the ability to make entities literally children of other
entities, as if they were components. I can't say whether features like these
are useful in everyday use, personally I've never felt the need, but they are
certainly interesting and make reflecs more intriguing than other projects.
So, how do archetypes solve the problem of finding all the entities that have
certain components?
An archetype based solution no longer requires you to iterate all the entities
and test them to know if they have the desired components. Instead, this time
we iterate all the archetypes (which are likely to be far fewer than the
entities), then we return all the entities from the archetypes that are built
for a set of components that contains at least the desired ones (but not
necessarily only the desired ones). Most likely there will be more than one
archetype to visit, but you are guaranteed that all the entities in an archetype
have at least the components of interest if the archetype matches the query, so
you don't have to test the entities as well. This results in a huge speed up at
the end of the day.
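To make the idea concrete, here is a minimal sketch of such a query. It's not how decs or Unity actually implement it: I'm assuming archetypes identified by a plain bitset of component types and a hypothetical query function that simply walks the list of archetypes.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Hypothetical sketch: each archetype stores the entities (and, not shown
// here, the packed component arrays) for one exact combination of components.
constexpr std::size_t max_components = 64;
using component_mask = std::bitset<max_components>;

struct archetype {
    component_mask mask{};                  // which components this archetype holds
    std::vector<std::uint32_t> entities{};  // all entities with exactly this set
};

// Returns the entities owning *at least* the requested components: the loop
// runs over the archetypes (few) rather than the entities (many) and entities
// are never tested one by one.
std::vector<std::uint32_t> query(const std::vector<archetype> &archetypes, component_mask requested) {
    std::vector<std::uint32_t> result{};

    for(const auto &arch: archetypes) {
        // an archetype matches if its set of components is a superset of the query
        if((arch.mask & requested) == requested) {
            result.insert(result.end(), arch.entities.begin(), arch.entities.end());
        }
    }

    return result;
}
```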
Another benefit is that the archetype contains both the entities and the
components and the latter will most likely be organized in packed arrays all
ordered in the same way. Therefore, visiting an archetype and returning what it
contains results in an ordered visit of some arrays.
One of the advantages of this approach that isn't obvious at all initially is
that it fits particularly well with multithreading. In general, component
based models are often designed to exploit this aspect as well. However,
archetypes allow you to do it in an even easier way, if possible.
If you consider that the entities that have at least some given components are
likely to be scattered around in different archetypes, you can imagine processing
the archetypes separately. If you then put objects in different blocks of the same
size within an archetype, you can get an even better balance by directly
distributing those blocks to the various threads. Consider that archetypes could
be highly unbalanced, therefore the first solution could result in suboptimal
performance and partially defeat the purpose of using multiple threads.
In both cases, multithreading becomes trivial to implement and to reason about:
what to process and how to distribute the load derives directly from how the
data are laid out under the hood. Moreover, the way instances are stored can
easily reduce false sharing to a minimum and improve overall performance.
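As a sketch of the second option, assuming an archetype already stores its entities in fixed size chunks (the chunk type and the round-robin distribution below are placeholders I made up for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for a fixed size block of entities and components belonging to
// an archetype: its contents are omitted because they don't matter here.
struct chunk { /* entities and component data for a slice of the archetype */ };

// Hands out whole chunks to threads: func must be safe to invoke concurrently.
template<typename Func>
void parallel_for_each_chunk(std::vector<chunk> &chunks, Func func) {
    const unsigned int num_threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers{};

    for(unsigned int t = 0u; t < num_threads; ++t) {
        workers.emplace_back([&, t]() {
            // round-robin assignment: thread t processes chunks t, t+n, t+2n, ...
            for(std::size_t i = t; i < chunks.size(); i += num_threads) {
                func(chunks[i]);
            }
        });
    }

    for(auto &worker: workers) { worker.join(); }
}
```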
Unfortunately, archetypes also have their own drawbacks, as with any other solution.
Intuitively, the first problem to face derives from the way they work. Every
time a component is added or removed, an entity and all its components are moved
from one archetype to another. This affects the construction and destruction of
components to an extent. The problem can be mitigated with a bulk add, but one
cannot really get rid of it if instances of components are assigned and removed
on the fly.
Because of that, this model works like a charm when the sets of components
assigned to the entities don't change much at runtime. This is for
example true in low level systems, while it isn't necessarily the case with high
level systems. A possible solution to reduce the impact of this problem is to
use this model to manage almost-fixed types of components (eg it's unlikely that
instances of the `physics` component are removed from their entities once
assigned, and this makes `physics` a good candidate among the others) while
relying on a secondary data structure to deal with those components that are
frequently assigned or removed at runtime.
However, the main issue you can encounter with archetypes is fragmentation.
In short, an aspect that is less evident but still present is the fact that
archetypes aren't primarily designed around usage patterns. This technique
simply tries to optimize everything as much as possible, since our usage
patterns are known to be somewhere within that everything. However, this has a
drawback: you lose the chance to squeeze even more out of the archetypes
(let me say) of interest and sacrifice something to optimize iterations you
won't ever perform. Even worse, the number of archetypes could explode if you
have a high number of possible combinations of components assigned to different
entities at runtime, and this will definitely affect the iterations to an extent
by adding more and more jumps to find all the entities.
Finally, there is something that may or may not be a problem. Whether it is
depends mainly on your point of view, but it's worth mentioning.
In particular, entities and components are all stored in separate blocks and
it's no longer possible to get a pair `(T *, size)` for a given type to
iterate, or to `memcpy` everything at once. For the same reasons, using custom
allocators could give us slightly lower benefits than with other solutions that
combine all the components into a single array.
That being said, archetypes play in another league when compared to the
solutions seen so far and can be safely considered one of the best possible
approaches to the problem.
Do not forget that the perfect solution doesn’t exist and each has pros and
cons. In this case, if we consider it as a whole, it’s definitely worth it.
Sparse sets
Sparse sets flip the problem on its head and approach it in a way that is completely different from that of archetypes. I've not seen many implementations of component based models that are based on this data structure, probably because it can be used in so many different ways that it's difficult to find a sort of guideline. Therefore, newcomers are probably discouraged.
Before describing how sparse sets can be used to address the problem of knowing what are all the entities, and only the entities, that own a specific set of components, let's see what sparse sets are and how they work.
They called them packed arrays
A sparse set is:
[…] a clever data structure for storing sparse sets of integers on the range 0 .. u−1 and performing initialization, lookup, and insertion in time O(1) and iteration in O(n), where n is the number of elements in the set.
At least according to this interesting article that clearly explains how they work.
You can find blog posts and examples of use of sparse sets/packed arrays all
around the web if you know what to search for. As an example, the BitSquid blog
contains one post about them. Similarly, Molecular Matters gives a few details
on the topic in a post from its adventures in data oriented design series.
Personally, I don't want to go into depth on the topic this time. I kindly
suggest reading the articles above to get a grasp of how sparse sets work
before continuing.
In short, this data structure contains two arrays: the former is a sparse
array (also known as external or reverse array) while the latter is a
packed array (also known as internal or direct array), hence the name
often used to refer to sparse sets. To know if the sparse set contains a
value, for example an entity identifier, you do the following test:
const bool match = (packed[sparse[entity]] == entity);
If `match` is true, then the sparse set contains the given value.
Fortunately, the indirection isn't required when you want to iterate all the
values contained in the sparse set. It suffices to walk through the packed
array for that, from the first element to the last one. This dual access mode is
what makes them both fast and flexible.
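To make this more concrete, here is a minimal sketch of the classical, textbook sparse set for entity identifiers. It's not how EnTT or any other library necessarily implements it, just the two-array version described in the articles above.

```cpp
#include <cstdint>
#include <vector>

// Textbook sparse set for entity identifiers: two arrays, a sparse one indexed
// by entity and a packed one with no holes in it.
class sparse_set {
    std::vector<std::uint32_t> sparse{};  // indexed by entity, points into packed
    std::vector<std::uint32_t> packed{};  // tightly packed entities, no holes

public:
    bool contains(std::uint32_t entity) const {
        return entity < sparse.size()
            && sparse[entity] < packed.size()
            && packed[sparse[entity]] == entity;
    }

    void insert(std::uint32_t entity) {
        if(contains(entity)) { return; }
        if(entity >= sparse.size()) { sparse.resize(entity + 1u); }
        sparse[entity] = static_cast<std::uint32_t>(packed.size());
        packed.push_back(entity);
    }

    void remove(std::uint32_t entity) {
        if(!contains(entity)) { return; }
        // swap-and-pop: move the last entity into the freed slot so that the
        // packed array stays dense
        const auto last = packed.back();
        packed[sparse[entity]] = last;
        sparse[last] = sparse[entity];
        packed.pop_back();
    }

    // iteration never goes through the sparse array
    const std::vector<std::uint32_t> &entities() const { return packed; }
};
```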
In general, all the implementations I've seen so far differ slightly from the theoretical version. Sparse sets allow for several optimizations if you know your data. Therefore, you can easily get rid of indirections most of the time or store augmented values in one or both of the arrays. As an example of this, the implementation proposed by EnTT is quite different from the classical one and hardly anyone would recognize it as a sparse set without knowing the details and understanding the differences.
Systems own sparse sets
A possible use of sparse sets is to match entities against systems. An open
source C++ project on GitHub that uses sparse sets for this purpose is ecst.
The basic idea is that a sparse set is instantiated for each and every system.
The purpose is to make it contain all and only the entities in which the system
itself is interested, and these depend strictly on the components it wants to
iterate at runtime. Systems are then registered with a sort of central manager
that executes them and manages entities and components. During the registration
phase, a system declares its components of interest. Its sparse set will be kept
up-to-date after each iteration.
When a system wants to iterate its entities, it reads its packed array and uses
the entities thus obtained to get the components from the manager. The way the
manager actually stores the components is out of scope in this case.
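A minimal sketch of the idea follows. It's not ecst's actual API, just an illustration where the system reuses the sparse_set sketched earlier and a made-up manager type stands in for whatever storage the central manager really uses.

```cpp
#include <cstdint>
#include <unordered_map>

struct position { float x{}, y{}; };
struct velocity { float dx{}, dy{}; };

// Hypothetical central manager: how components are actually stored is out of
// scope here, so a map per type stands in for whatever layout it really uses.
struct manager {
    std::unordered_map<std::uint32_t, position> positions{};
    std::unordered_map<std::uint32_t, velocity> velocities{};
};

// A system owns a sparse set (see the sketch in the previous section) that
// contains all and only the entities having both position and velocity.
struct movement_system {
    sparse_set matched{};  // kept up-to-date by the registration machinery

    void update(manager &mgr, float dt) {
        // the packed array is iterated directly: no per-entity test is needed,
        // the set already contains only matching entities
        for(const auto entity: matched.entities()) {
            auto &pos = mgr.positions[entity];
            const auto &vel = mgr.velocities[entity];
            pos.x += vel.dx * dt;
            pos.y += vel.dy * dt;
        }
    }
};
```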
As you can guess, iterating entities is incredibly fast. They are all contained
in a packed array and the number of cache misses is reduced to a minimum.
Similarly, keeping a system up-to-date also has a very low cost, thanks to the
properties of the sparse set.
On the other hand, getting components in this model could incur a performance
penalty. Whether it does mostly depends on how things are laid out by the central
manager. This is a double-edged sword: a clever organization can really increase
performance, while a less clever approach can definitely ruin everything.
However, unlike other solutions, data can be organized on a per-component basis,
choosing the best layout case by case according to the usage patterns.
A variation on the topic exists and relies on a different ownership model to
speed up the whole thing. I've read about this more than once, so it's worth
mentioning, even though I must admit that the first time I read about it I felt
like it didn't fit well with real world cases. Nowadays I've changed my mind
and I firmly believe that not only can it work, but it's also an excellent
approach, if you can develop it correctly.
The idea is quite simple in fact: define your systems so that most if not all of
them own one or more types of components and thus their instances. Owned types
don't necessarily have to be all those iterated by a system, of course. One can
easily understand that a type can be owned by at most one system and referred to
by the others in some way. However, it should be easy to define which system
intuitively owns which component: as a rule of thumb, the one that wants to
write it is the best candidate. As you can imagine, a central manager for
components is no longer required, because their instances are managed directly
by the systems.
How does this approach improve the one described above?
Consider that a sparse set now contains not only entities in a packed array, but
also instances of components. If a system wants to iterate the components it
owns, it does it by iterating one or more packed arrays all sorted the same.
Again, cache misses are reduced to a minimum, this time not for entities only
but for entities and components.
The problem (that isn't a problem at all in fact) arises when a system wants
to iterate also the components owned by one or more of the other systems. In
this case, the entity identifiers can be used to access those items through
indirection in the other systems, as usually happens with sparse sets. In
other words, the entity identifiers are used to access the external arrays of
the other systems and to retrieve the indexes at which the instances of the
components are stored in the internal arrays.
As long as the number of non-owned components is kept low, the advantages due to
the use of fully packed arrays for owned components should outweigh the time
spent on indirections, and the overall performance should be incredibly good.
Moreover, it has the advantage that data can be organized on a per-component or
per-system basis to exploit usage patterns if possible, which is something you
cannot really achieve with other models like archetypes.
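Here is a sketch of what this ownership variation could look like. Again, names and layout are made up for illustration: a pool owned by a system keeps entities and instances in two packed arrays sorted the same way, and other systems reach its instances through the usual indirection.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct position { float x{}, y{}; };
struct velocity { float dx{}, dy{}; };

// Hypothetical pool owned by a system: entities and instances are stored in
// two packed arrays kept in the same order.
template<typename Component>
struct owned_pool {
    std::vector<std::uint32_t> sparse{};    // entity -> index into the packed arrays
    std::vector<std::uint32_t> entities{};  // packed entities
    std::vector<Component> instances{};     // packed instances, same order as entities

    Component &get(std::uint32_t entity) {
        // the usual sparse set indirection, used by the *other* systems
        return instances[sparse[entity]];
    }
};

// The movement system owns position: it iterates its packed arrays directly
// and reaches the velocity instances owned by another system via indirection.
// For the sake of the sketch, every entity in owned is assumed to also have a
// velocity.
void update(owned_pool<position> &owned, owned_pool<velocity> &other, float dt) {
    for(std::size_t i = 0u; i < owned.entities.size(); ++i) {
        const auto entity = owned.entities[i];
        auto &pos = owned.instances[i];       // no indirection for owned data
        const auto &vel = other.get(entity);  // indirection for non-owned data
        pos.x += vel.dx * dt;
        pos.y += vel.dy * dt;
    }
}
```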
Something similar to the last approach described above is also used by EnTT for
its grouping functionality. Of course, the implementation differs quite a lot
from what I wrote, mainly because systems don't own components in its case.
However, the principle on which groups rely is similar and they exploit a
combination of packed arrays and optional indirection to improve performance
when needed.
The way groups work is beyond the purposes of this section. Some additional
details can be found in the next section, but I’ll leave a thorough analysis of
the topic for a future post.
Independent sparse sets
The other way to use sparse sets to implement a component based model is
to rely on them as the key data structure for pools of components. This time,
the systems aren't involved and the optimizations find room elsewhere.
Now it should be clear how sparse sets work, so I'll give perhaps fewer details
and go straight to the point where possible.
An open source C++ project on GitHub that uses sparse sets this way is my own
library, EnTT.
Using a sparse set as a pool is straightforward, at least if you don't aim to
optimize it. The basic implementation already works fine for that. You can just
add a third array ordered like the internal one and add or remove items from it
every time you do that with the packed array.
This gives us a very simple way to know if an entity has a certain component or
not: just use the entity to index the sparse array and the position thus
obtained will also be that of the instance in the packed array of components, if
a link exists. Moreover, iterating all the instances of one type of component is
the fastest thing you can imagine, because all your components are tightly
packed in a single array, with no holes in it. If you use entity identifiers
as indexes for the sparse array then, because of how sparse sets work, the packed
array will in fact contain identifiers, and therefore iterating the entities
assigned to a given component should be fast as hell too.
Note that entities and components will be ordered in the same way in the latter
case, so iterations are reduced to the ordered visit of a pair of arrays.
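A minimal sketch of such a pool follows. It isn't EnTT's actual implementation, just the basic sparse set from before with a third array of instances kept in sync with the packed one.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Sketch of a pool for a single type of component: the classical sparse set
// plus a third array of instances, kept ordered like the packed array.
template<typename Component>
class pool {
    std::vector<std::uint32_t> sparse{};  // entity -> index into the packed arrays
    std::vector<std::uint32_t> packed{};  // entities that have the component
    std::vector<Component> instances{};   // one instance per packed entity, same order

public:
    bool contains(std::uint32_t entity) const {
        return entity < sparse.size()
            && sparse[entity] < packed.size()
            && packed[sparse[entity]] == entity;
    }

    // assumes the entity doesn't already have the component
    void emplace(std::uint32_t entity, Component component) {
        if(entity >= sparse.size()) { sparse.resize(entity + 1u); }
        sparse[entity] = static_cast<std::uint32_t>(packed.size());
        packed.push_back(entity);
        instances.push_back(std::move(component));
    }

    // assumes the entity has the component: swap-and-pop on both packed arrays
    // to keep them dense and in sync
    void remove(std::uint32_t entity) {
        const auto pos = sparse[entity];
        const auto last = packed.back();
        packed[pos] = last;
        instances[pos] = std::move(instances.back());
        sparse[last] = pos;
        packed.pop_back();
        instances.pop_back();
    }

    Component &get(std::uint32_t entity) { return instances[sparse[entity]]; }

    // entities and instances are iterated in the same order, no holes anywhere
    const std::vector<std::uint32_t> &entities() const { return packed; }
    std::vector<Component> &components() { return instances; }
};
```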
The problems arise when you want to iterate multiple components, mainly because pools are completely independent of each other. There exist several approaches to the problem, some trivial and slightly slower, others more complex but faster:
- The most obvious one is probably more than enough for most of the games and applications out there (see the sketch after this list). To iterate components `A` and `B`, pick both of the pools required, then find the shortest one, that is the one that contains fewer entities. It's likely that it contains far fewer entities than the other pool and this will save a lot of time during iterations. Once the pool to be iterated is decided, visit its entities one at a time and check whether the other pool contains them. If so, the entity can be returned to the caller along with its components, otherwise skip it and check the next one. It goes without saying that with this method we don't know a priori all the entities that have the desired components. Moreover, the indirection required to verify whether an entity has all the components slightly degrades the performance, which however remains very good and definitely sufficient for the vast majority of cases.
- A less obvious and trickier solution is that of creating an additional sparse set to track the entities that have a given set of components. Whenever the component set of an entity changes so that it owns at least the components of interest, the entity is added to this additional sparse set. When one of the components is deleted, the entity that owns it is removed from the set if present. In general, at runtime, the sparse set will contain all and only the entities that have at least the required components. This approach obviously increases performance in a lot of cases. However, it also suffers from the indirection required to access the components when the entities contained in the sparse set are iterated. The performance hit due to this aspect can be mitigated in some ways but, from personal experience, I find that none of the possible methods is actually worth it. On the other hand, when you don't want to access all the required components despite them being part of the query, this approach gives you the freedom to choose where to spend your CPU cycles.
- The third and last alternative is perhaps the least obvious, but also the most appealing and, in theory, the best in terms of performance, although it has some limitations. It's also the one implemented to an extent by the grouping functionality of EnTT. I'll try to describe it briefly, without going into too many details. Consider you want to iterate components `position` and `velocity`. Because of the way sparse sets work, you know the components are all tightly packed in two arrays. Both of them contain some entities that have both components and some others that have only one of them. If you can arrange things so that all the entities that have both components are at the top of the arrays while all the others are at the end, the components will also be arranged accordingly. Iterations benefit immensely from how things are laid out in this case, because all the entities that have both components, and the components themselves, are tightly packed and sorted in the same way at the beginning of their arrays. Therefore, during iterations no test is required, nor any indirection. Performance is theoretically the best you can get, because the system now knows exactly what entities and components to return and has them all tightly packed in the same order. It's like having archetypes, but without the jumps between different blocks to pay for. The downside is that only one order can be imposed within an array. Therefore, if a type of component is involved in multiple iterations, you'll have to choose which one gets maximum performance and for which ones you are willing to pay the indirection.
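As a reference for the first approach in the list, here is a sketch that builds on the hypothetical pool above: it walks the packed array of the shortest pool and tests the other one entity by entity.

```cpp
#include <cstdint>

struct position { float x{}, y{}; };
struct velocity { float dx{}, dy{}; };

// First approach from the list above: iterate the packed array of the pool
// with fewer entities and test the other one on the fly (pool is the
// hypothetical class sketched in the previous section).
template<typename Func>
void each(pool<position> &positions, pool<velocity> &velocities, Func func) {
    const bool positions_shorter = positions.entities().size() <= velocities.entities().size();

    if(positions_shorter) {
        for(const auto entity: positions.entities()) {
            if(velocities.contains(entity)) {
                func(entity, positions.get(entity), velocities.get(entity));
            }
        }
    } else {
        for(const auto entity: velocities.entities()) {
            if(positions.contains(entity)) {
                func(entity, positions.get(entity), velocities.get(entity));
            }
        }
    }
}

// usage: each(positions, velocities, [](auto entity, auto &pos, auto &vel) { /* ... */ });
```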
There are probably other solutions that I'm not yet aware of, but these are the
ones I've seen so far and/or tried myself and for which I can vouch both in
terms of performance and functionalities.
Reach out to me if you know any other variation on the topic that could be interesting.
The variety of available solutions gives us a perhaps not obvious advantage.
Consider that software is usually composed of critical and non-critical
paths. In this case, each of the approaches has advantages and disadvantages and
improves the performance of one feature or another (eg iterations or
component construction/destruction). Therefore, one could choose to use
different methods for different paths, so as to maximize the benefits depending
on the case.
Obviously this requires that you know very well the software you are working on,
but this is another story.
Final notes on sparse sets
What can be seen as the main problem with sparse sets, at least when compared to
the previous solution, is that they are not automatic. While archetypes are
implicitly generated for the users, this time we need to explicitly decide what
to optimize. On the other hand, however, it's true that you have the possibility
to choose and shape things around your usage patterns, which could be seen as an
advantage instead.
Another consequence of this is that one can decide where to spend CPU cycles,
opting for solutions with slightly lower performance during iterations but with
less impact on other functionalities when needed.
One aspect that may seem interesting about this approach is the fact that it is
definitely easier to set up per-component allocators. Not that this is something
everyone needs, nor a widespread desire, but it is something to keep in mind if
it were among the requirements.
Of course, even with archetypes you can manage to do it, but it is slightly more
inconvenient and probably results in a suboptimal solution due to the
fragmentation.
Multithreading is easy to do even in this case, although less trivial than with archetypes. Intuitively, it's enough to break the arrays to be iterated into several parts, as many as the threads to run, then assign a portion of the data to each thread.
Finally, something that can be done easily with sparse sets and independent
pools in general, but that is hard to get with archetypes, is sorting the
components.
This is something that you may never need, but it's common for example when
working with the painter's algorithm. Of course, it doesn't change the merits of
one solution or the other, but it's certainly something to take into
consideration if you have this need.
So, archetypes or sparse sets?
There is no answer to this question, at least from my point of view. These
two solutions, in my experience, play in the same league and allow you to reach
performance well above expectations if implemented properly, each with their
pros and their cons.
There are advantages and disadvantages in both cases and each solution offers
features that are simply impossible to obtain with the other. Therefore, the
choice should probably be more a matter of taste than of the performance of the
individual approach.
I personally believe that, at these levels, any performance problems will be
difficult to ascribe to the time spent on iterations in both cases. Most likely,
bottlenecks will be elsewhere.
Therefore, I don't suggest either one or the other solution. I obviously made my
choice, but I don't think it is better or worse than the other. Rather, I think
that the features offered by one of the two approaches are closer to my way of
thinking and writing code and, therefore, I prefer that solution.
It's a matter of taste, in fact.
Let me know that it helped
I hope you enjoyed what you’ve read so far.
If you liked this post and want to say thanks, consider starring the GitHub project that hosts this blog. It's the only way you have to let me know that you appreciate my work.
Thanks.