Thinking in Actors when querying data

Say I’m hosting an online service in which each user account is represented by a Persistent Actor that manages all of the user’s state and reacts to certain events.

Now I want to query certain users based on some secondary attribute, and send those actors a command. Say, for example, I want to upgrade all users located in Australia to a Premium Membership.

I could ask each actor in the cluster about its country, but that would presumably take too long to run on a frequent basis. What technique would one use to query persistent actors based on secondary attributes? Would you duplicate the state of each actor in a database (e.g. an RDBMS) to make it queryable? Can I integrate that mirroring mechanism into the journal plugin? Or is there another technique/pattern for querying a group of persistent actors based on their attributes?


Model your platform/app/system as a graph/tree. Your ‘country’ attribute is then managed by a ‘supervisor’. Rather than ‘ask’/query, just ‘tell’, based on the namespaced tree that is the model/system, so that all the actors managed by ‘australia’ receive the message.
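A minimal sketch of this idea in plain Python, assuming nothing about Akka's API: the classes below are hypothetical stand-ins for actors, and `tell` simply appends to an in-memory mailbox instead of sending an asynchronous message. Users are created as children of a per-country supervisor, so upgrading Australia becomes a single `tell` to the ‘australia’ node with no query step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserActor:
    """Hypothetical stand-in for a persistent user actor."""
    user_id: str
    mailbox: List[str] = field(default_factory=list)

    def tell(self, message: str) -> None:
        # Fire-and-forget: the sender never waits for a reply.
        self.mailbox.append(message)


@dataclass
class CountrySupervisor:
    """Hypothetical supervisor that owns every user of one country as a child."""
    country: str
    children: Dict[str, UserActor] = field(default_factory=dict)

    def spawn_child(self, user_id: str) -> UserActor:
        child = UserActor(user_id)
        self.children[user_id] = child
        return child

    def tell(self, message: str) -> None:
        # No "who among you is Australian?" query: forward to all children.
        for child in self.children.values():
            child.tell(message)


class Root:
    """Top of the tree; one supervisor per country, created on demand."""

    def __init__(self) -> None:
        self.countries: Dict[str, CountrySupervisor] = {}

    def supervisor_for(self, country: str) -> CountrySupervisor:
        if country not in self.countries:
            self.countries[country] = CountrySupervisor(country)
        return self.countries[country]


# Usage: only the users placed under 'australia' receive the command.
root = Root()
alice = root.supervisor_for("australia").spawn_child("alice")
bob = root.supervisor_for("new-zealand").spawn_child("bob")
root.supervisor_for("australia").tell("UpgradeToPremium")
print(alice.mailbox, bob.mailbox)  # ['UpgradeToPremium'] []
```

One trade-off to keep in mind: in Akka an actor's parent is fixed when it is created, so a user who moves to another country would have to be stopped under one supervisor and recreated under the other.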


Hey there,

If I understand your question correctly, you would like to know whether there are generic approaches to the scenario “How do I query for entities based on some secondary attribute, and then send those actors a command?” The short answer is yes: there are techniques/patterns that people have devised and that Akka supports out of the box, but your solution will invariably depend on your context. I can hardly cover them all, but I can give you some food for thought, and then I shall try to unpack the example problem.

  1. The generally accepted approach is to build materialized views of your entities in an engine suitable for querying (such as an SQL database), effectively creating what’s known as the read side. After all, CQRS and Event Sourcing are used together more often than not. Have a look at Lagom’s implementation; since it’s built on top of Akka, you can hardly find a better reference.

  2. Alternatively, if you do not require powerful querying capabilities, a consumer could listen in on the published entity events and maintain a collection of the entity information you’re interested in. Whenever you wish to send a command to all or a subset of those actors, you can instruct the consumer to broadcast the command in question. If you think about it, it’s not much different from the approach above, but you might want to save yourself the trouble of adding yet another database engine to the picture if it’s not really needed.

  3. Given that you will likely passivate your actors in order to free up memory, you could use the pre-start phase of the actor to request updates from another actor/register. Similarly, active actors could use timers to do the same.

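  4. Depending on the volume of your data, you could use the Distributed Data module, which keeps its data in memory on each node and is therefore best suited to relatively small data sets.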

And the list goes on. As I said, it all depends on your context. Akka is quite powerful and enables you to come up with ingenious ways of tackling all sorts of problems.
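Options 1 and 2 both come down to an event handler that folds entity events into a queryable index. The sketch below is plain Python and makes several assumptions: the event types `UserRegistered` and `UserRelocated` are hypothetical, the events are fed in by hand (in Akka they would typically arrive through an Akka Persistence Query stream such as `eventsByTag`), and `send` is a placeholder for telling the entity with that id. For option 1, `handle` would write rows to an SQL table instead of updating a dict, and the `offset` would be stored alongside them so the view can resume after a restart.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Set, Union


@dataclass(frozen=True)
class UserRegistered:  # hypothetical entity event
    user_id: str
    country: str


@dataclass(frozen=True)
class UserRelocated:  # hypothetical entity event
    user_id: str
    new_country: str


Event = Union[UserRegistered, UserRelocated]


class CountryIndexProjection:
    """Read-side view: country -> user ids, built only from entity events."""

    def __init__(self) -> None:
        self.by_country: Dict[str, Set[str]] = defaultdict(set)
        self.country_of: Dict[str, str] = {}
        self.offset = 0  # number of events applied; persist it with the view

    def handle(self, event: Event) -> None:
        if isinstance(event, UserRegistered):
            self._move(event.user_id, event.country)
        elif isinstance(event, UserRelocated):
            self._move(event.user_id, event.new_country)
        self.offset += 1

    def _move(self, user_id: str, country: str) -> None:
        # Remove the user from the old country's set before adding to the new one.
        old = self.country_of.get(user_id)
        if old is not None:
            self.by_country[old].discard(user_id)
        self.by_country[country].add(user_id)
        self.country_of[user_id] = country

    def users_in(self, country: str) -> Set[str]:
        return set(self.by_country.get(country, set()))

    def broadcast(self, country: str, command: object,
                  send: Callable[[str, object], None]) -> int:
        """Send the command to every indexed user of a country; return the count."""
        targets = sorted(self.users_in(country))
        for user_id in targets:
            send(user_id, command)
        return len(targets)


# Usage: carol moves from NZ to AU, so she is included in the AU broadcast.
view = CountryIndexProjection()
for e in [UserRegistered("alice", "AU"), UserRegistered("bob", "NZ"),
          UserRegistered("carol", "NZ"), UserRelocated("carol", "AU")]:
    view.handle(e)
sent = []
view.broadcast("AU", "UpgradeToPremium", lambda uid, cmd: sent.append((uid, cmd)))
print(sent)  # [('alice', 'UpgradeToPremium'), ('carol', 'UpgradeToPremium')]
```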

Now to unpack the example situation you’ve presented: upgrade all Australian residents to Premium Membership. Without any knowledge of the domain, I am going to assume that a full-blown read side is optimal. And since I’ve referred to Lagom’s as a starting point, I will consider the query problem solved (indeed, just like that!) and look at the next part of the problem: sending a command to all of the user actors.

Looking at the requirement, my immediate instinct is to ask myself a few important questions that will help determine the best way to handle the “broadcast” to the actors.

  1. How many users do we have in Australia? I most certainly don’t want to hammer my system to death, so I might want to run these upgrades in a controlled manner (say, in batches). After all, this is not a typical UPDATE … WHERE statement I am running here.

  2. What happens if the system crashes mid-flight, so to speak? Should I maintain some state that allows me to recover from failure? But then, how do I trigger this operation again?

  3. And so forth …

The point I am trying to make is this: while Event Sourcing is great for modelling complex systems, it does force you to think carefully about the consequences of the group operations you intend to run, especially at a large scale. Otherwise, you risk leaving your system in an undesirable state or, worse, battering it to death.
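The two questions above (controlled batches, and recovering after a crash) can be sketched as a job that walks a fixed snapshot of the read-side query result and records a checkpoint after each batch. This is plain Python under stated assumptions: `UpgradeJob` is a hypothetical name, `send` stands in for telling one user entity, and the checkpoint lives in memory here, whereas a real job would store it durably (for instance as the state of its own persistent actor) so that re-triggering the operation resumes instead of starting over.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UpgradeJob:
    """Hypothetical bulk job: sends one command per user, batch by batch."""
    user_ids: List[str]   # snapshot of the read-side query, in a fixed order
    batch_size: int
    checkpoint: int = 0   # index of the next user to process; persist this value

    def __post_init__(self) -> None:
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")

    @property
    def done(self) -> bool:
        return self.checkpoint >= len(self.user_ids)

    def run(self, send: Callable[[str], None],
            max_batches: Optional[int] = None) -> int:
        """Process batches until finished or max_batches is reached.

        Returns the number of batches processed in this call.
        """
        batches = 0
        while not self.done:
            if max_batches is not None and batches >= max_batches:
                break  # e.g. throttling window used up, or a simulated crash
            batch = self.user_ids[self.checkpoint:self.checkpoint + self.batch_size]
            for user_id in batch:
                send(user_id)
            # Advance (and, in a real system, durably store) the checkpoint
            # only after the whole batch has been sent.
            self.checkpoint += len(batch)
            batches += 1
        return batches


# Usage: stop after two batches, then resume from the stored checkpoint.
sent: List[str] = []
job = UpgradeJob([f"user-{i}" for i in range(7)], batch_size=3)
job.run(sent.append, max_batches=2)
print(job.checkpoint, job.done)  # 6 False
job.run(sent.append)
print(job.checkpoint, job.done)  # 7 True
```

Because the checkpoint is only advanced after a batch has been sent, a crash in the middle of a batch means that batch is sent again on resume, so the upgrade command needs to be idempotent on the entity side (upgrading a user who is already Premium should be a no-op).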

Hopefully you’ll find my ramblings useful.


Just to get a bit more granular: the ‘australia’ supervisor broadcasts the message, and each actor persists its state. Modeling with the actor paradigm helps with message flow in the direction of arrival.

Akka Persistence helps with the stateful nature of the experience (event sourcing). This structural linearity in the model removes the need to query/ask: “who among you belongs to ‘Australia’?”
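To illustrate what “each actor persists its state” means in an event-sourced entity, here is a minimal sketch that mirrors the command → event → state split Akka Persistence uses (in Akka Typed, the command handler and event handler of an `EventSourcedBehavior`). It is plain Python, not Akka: the `journal` list stands in for the real journal plugin, and `UserEntity` and `UpgradedToPremium` are hypothetical names. State only changes by applying persisted events, which is also how the entity is rebuilt after passivation or a crash.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class UpgradedToPremium:  # hypothetical event
    user_id: str


@dataclass
class UserEntity:
    """Minimal event-sourced user: state changes only via persisted events."""
    user_id: str
    journal: List[object] = field(default_factory=list)  # stand-in for the journal
    premium: bool = False

    def handle_command(self, command: str) -> None:
        if command == "UpgradeToPremium" and not self.premium:
            event = UpgradedToPremium(self.user_id)
            self.journal.append(event)  # persist the event first ...
            self.apply(event)           # ... then update in-memory state
        # Already premium: nothing is persisted, so re-delivery is harmless.

    def apply(self, event: object) -> None:
        if isinstance(event, UpgradedToPremium):
            self.premium = True

    @classmethod
    def recover(cls, user_id: str, journal: List[object]) -> "UserEntity":
        entity = cls(user_id, journal=list(journal))
        for event in journal:
            entity.apply(event)  # replaying the events rebuilds the state
        return entity


# Usage: a duplicate command adds no event, and replay restores the state.
alice = UserEntity("alice")
alice.handle_command("UpgradeToPremium")
alice.handle_command("UpgradeToPremium")
restored = UserEntity.recover("alice", alice.journal)
print(len(alice.journal), restored.premium)  # 1 True
```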

The pattern here is applying what we would apply in the ‘physical world’ around us. We don’t ask if someone belongs to an area before sending a message. We have the ‘addresses’ and we send the message.

Actors have a ‘physics of messages’ to what they do. And the nice thing is that Akka gives us all the tools to make it richer: Persistence, Sharding, etc.

Start with the graph/addresses and see the flow of the messages and then enhance using akka tools.

If we start with the tools and work backwards, the questions we ask will be old school.

Thank you very much for your helpful and extensive answers, @ravimadhu and @chmodas.

I’ve been looking into your approaches, and let me try to break down what I got from it.

What all the approaches have in common is that they create a “queryable” view of the actors. A view is a read-only representation of the actor states with added querying capabilities.

The difference between Ravi’s and Borislav’s approaches is how the queryable view is created and maintained:

  1. Ravi creates that view through the actor hierarchy. There are “supervisor” actors that manage a certain attribute of child actors, and child actors are registered with the corresponding supervisor. The views (supervisors) are specific to certain attributes of the child actors. This requires a deep understanding of the domain and its requirements. The views are created by communication between actors at the application level, according to those requirements.
  2. Borislav creates the views by globally listening to the events emitted by the persistent entities. The view is updated based on the received events and can have any level of complexity. The views can be actors holding only a subset of the data (similar to Ravi’s approach), or a full database representation of each actor. However the data is represented, whether in an external database or in in-memory data structures, the view can then be used to query for and obtain references to specific actors.

If I’ve misunderstood anything, please let me know.

At this stage I’m familiar with how to implement 1), but 2) is a new field to me that I have to learn about and explore further in practical terms.