With this post I described what is probably one of the most interesting features of the sparse set based model.
The grouping functionality makes possible what is called perfect SoA, that is something we can achieve only with this approach among the ones described by the series. As we have seen, that’s theoretically the best we can obtain in terms of performance during iterations because we have all the entities and their components tightly packed and ordered the same. No jumps, no branches, nothing on which to waste our cpu cycles.

If it sounds good, you’ve probably also heard me say that great powers come with some limitations. In particular, the fact that a type can belong only to one group and therefore we must define carefully our groups to get the best out from them.
This contraint can be relaxed in fact and we can have perfect SoA in many more cases than expected.

If you haven’t done it yet, read the previous parts of the series before to continue. It will help to fully understand this one.

Introduction

Long story short, groups are a way to organize the elements within the packed array(s) of two or more sparse sets so that the items that belong to all sets and match some given requirements (the most common of which is that they are assigned to the same entity) are tightly packed and ordered the same at the beginning of their respective arrays.
This is extremely powerful because iterations happen on a fixed number of items with no branches, no jumps, nothing at all. They are just plain iterations from 0 to N and they are performed on thightly packed pieces of memory. We know for sure that the i-th element of each array is assigned to the same entity, no matter what.

However, it goes without saying that we cannot force more than one order within a given array. Therefore groups are limited. A given type of component can belong only to one group and therefore can only be optimized for that group. In all other cases we’ll access it through indirection because of how sparse sets work.
It would be great if we could create both groups <sprite, renderable> and <sprite, renderable, position>. They are almost identical, it’s a shame we cannot reuse the first group to somehow construct the second.

What if I said you that a given type of component could belong to more than one group then? There are some rules to follow but it may work.
This would mean more and more usage patterns matched and therefore maximum performance on more and more paths, not only the most critical ones.

A quick recap

From the insights on the grouping functionality of the sparse sets, we know how groups work. They are a tool that trades more performance during iterations with the invocation of a callback or such during the creation and destruction of components.
Granted, you’ve not to implement it necessarily by means of some callbacks but this is how it works in EnTT. You know: don’t call us, we will call you. That’s it. Here is our trade-off for the grouping functionality. It’s not something one can easily observe in a real world case and it’s still faster than many of the other solutions when it comes to adding or removing components but it’s something to take in consideration anyway.

A running group is like the following:

A packed    : [e4|e7|e3|e8|e6]
A component : [A4|A1|A0|A2|A3]
     group _________^

B packed    : [e4|e7|e5]
B component : [B0|B2|B1]
     group _________^

In other terms, it behaves like a pointer to the first element after the group itself. By construction (read the article linked above for the details) we know then that all elements are tightly packed and ordered the same.

When a new entity enters the group, all the components assigned to it are moved to the position pointed by the group and the latter is then incremented. When an entity leaves the group, it’s swapped with the last item present in the group itself and the latter is then decremented.

So far, so good. In a sense, the grouping functionality induces an order in the packed arrays of the sparse sets for the types of components it owns. Moreover, it’s known that we cannot sort an array in two different ways at the same time.
Here we are thus with our limitations. One component, one group. Nothing more and nothing less.

Nested groups

The trick to overcome the one-component-belongs-to-one-group limit is in the fact that we aren’t inducing an order at all. Quite the opposite actually. The order is a nice-to-have consequence of the way we construct and maintain a group. However, roughly speaking, ordered items aren’t a requirement to be able to properly define a group.
Of course, I’m not saying we could renounce to the order because it would make groups pointless. However, for the purpose of this post, let’s ignore it for a while and look at what remains to us: a separation in two parts of the original array. This is what defines a group. After all, it’s only a pointer to the element immediately after the ones in the group itsef, right?

If you’re wondering how this can help, look at the two partitions as if they were two arrays. Nothing prevents us from splitting them in two parts and these parts in two parts and so on until necessary.

Consider now the example above. It describes a group <A, B> that contains already entities e4 and e7:

A packed    : [e4|e7|e3|e8|e6]
A component : [A4|A1|A0|A2|A3]
     group _________^

B packed    : [e4|e7|e5]
B component : [B0|B2|B1]
     group _________^

The parts left out don’t have much to share with each other, so we cannot really divide them for our purposes. However, they are also the entities in which we weren’t interested initially, so who cares anyway.
On the other side, the parts of the arrays that contain the groups are identical in terms of entities: e4 occupies position 0 and e7 occupies position 1 in both the arrays. We know also that all the elements that belong to an entity will move together here and there when needed by construction.

Imagine to add another group <A, B, C> that subsumes the one already created. It’s strictier in a sense, it sets more constraints to the entities that want to enter it. Furthermore, we can say already that <A, B> will contain at least the elements of <A, B, C>. So, <A, B, C> defines two partitions within <A, B>: the entities that have also C and the ones that have not it, if any:

A packed    : [e4|e7|e3|e8|e6]
A component : [A4|A1|A0|A2|A3]
  <A, B, C> _____^
  <A, B> ___________^

B packed    : [e4|e7|e5]
B component : [B0|B2|B1]
  <A, B, C> _____^
  <A, B> ___________^

C packed    : [e4|e8|e5]
C component : [C2|C0|C1]
  <A, B, C> _____^

The example is trivial but it should give you a grasp of what I mean. We can define as many nested groups as we want. The new condition is that two nested groups, that is two groups that own the same type(s) of component(s), must be such that one literally contains the other.
Contains here means that the list of types of components of one group extends the list of the other. The largest group (as in the group that owns the highest number of types of components) also contains at least all the elements of the smallest.

Once defined, nested groups are then identical to normal groups. In case we wanted to iterate <A, B>, we could get the packed arrays for the components A and B and iterate them from 0 to size of <A, B> (whatever it is). To iterate <A, B, C> instead, we just add a third packed array to the list and we use size of <A, B, C> as our N in this case. We don’t have to know anything about <A, B>.

An example of two conflicting groups that should be rejected is <A, B> and <A, C>. In this case, none of them extends the other. Therefore, it will be impossible to create multiple nested partitions within the arrays of entities, components or whatever your arrays contain.

The order matters

I bet it’s not immediately evident what the algorithm to make an entity enter a deep nested group is however. I made this error and what I’m to say is also the reason for which order of execution on callbacks is guaranteed now in EnTT.
Unfortunately, we cannot just say - if the entity has A, B and C, let’s swap it with the element pointed by this group.

Consider the following example:

A packed    : [e4|e7|e3|e8|e6]
A component : [A4|A1|A0|A2|A3]
  <A, B, C> __^
  <A, B> ___________^

B packed    : [e4|e7|e5]
B component : [B0|B2|B1]
  <A, B, C> __^
  <A, B> ___________^

C packed    : [e6|e8|e5]
C component : [C2|C0|C1]
  <A, B, C> __^

The entity e8 has both A and C, so it’s not part of any of our groups. However, it’s only far a B from both of them. Let’s add this component and just swap e8 with the element pointed to by <A, B, C>:

A packed    : [e8|e7|e3|e4|e6]
A component : [A2|A1|A0|A4|A3]
  <A, B, C> _____^
  <A, B> ___________^

B packed    : [e8|e7|e5|e4]
B component : [B3|B2|B1|B0]
  <A, B, C> _____^
  <A, B> ___________^

C packed    : [e8|e6|e5]
C component : [C0|C2|C1]
  <A, B, C> _____^

No good: e8 entered both groups as expected but e4 exited <A, B> even though it has both A and B. Our groups are broken now and there is no way we can easily recover.

This happened because order of execution matters in this case. Rules are pretty simple but we cannot escape them:

  • When we add a component to an entity, test-and-swap groups from the largest to the smallest.
  • When we remove a component from an entity, test-and-swap groups from the smallest to the largest.

In the example above, it would mean to make e8 enter firstly <A, B> and only then <A, B, C>.

Why should this solve the issue though?

In all cases, the element pointed by <A, B, C> is either the same pointed by <A, B> or one within this group. Because of this, swapping the element pointed by <A, B, C> with the one we want to move into the group means swapping two elements within <A, B> if it already contained the entity and thus no element will unexpectedly leave <A, B>.
The same is true in case of destruction of components but the order to follow is the opposite this time, for similar reasons.

Perfect SoA to the rescue… what else?

As it was for the basic grouping functionality, nested groups come with a price. The good news are that it’s exactly the same as before. The more groups own a given type of component, the more this will affect construction and destruction of its instances.
On the other side, the more you are willing to pay for a given type of component, the higher the number of critical paths on which it will have the best performance ever.

Note that I use the term pay to make it clear that nothing is for free in this field but I strongly suggest you to measure, measure, measure to have an exact idea of what the price is at the end of the day.
It’s unlikely you’ll ever note it unless you’re writing some crappy benchmarks that creates and destroys 1M entities at once or so repeatedly.

Furthermore, fortunately the benefits of the groups aren’t limited only to the fact that they are faster.
Also in this case you’ll observe the difference only when you decide to push your software beyond its limits. I even suggest users not to use groups with EnTT during the prototypying phase and for large part of the development, until they know what are their most critical paths and where it makes sense to use groups or not.
However, groups provide the users with a (T*/size) couple that fits with many uses other than iterations. Serialization? Networking? Raw copies? Just use your imagination and I’m pretty sure you’ll find many other examples. Because of this, having the possibility to define more groups as nested groups is a great plus

Let me know that it helped

I hope you enjoyed what you’ve read so far.

If you liked this post and want to say thanks, consider to star the GitHub project that hosts this blog. It’s the only way you have to let me know that you appreciate my work.

Thanks.