How I would do it (wiser heads would likely have a better way):
The messages aren’t the events. The events are:
“WorkRequestArrived” – which holds the message and a unique identifier
“WorkRequestCompleted” – which holds just the unique identifier
What gets persisted are these two kinds of event – so you’d have 100 “WorkRequestArrived” events in the journal at 10:05am. As messages are completed, “WorkRequestCompleted” events are added to the journal. So at 10:10am, you’d have 100 “WorkRequestArrived” and 50 “WorkRequestCompleted” events in the journal.
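To make the idea concrete, here is a minimal Python model (not Akka code) of the two events and of how replaying them in journal order rebuilds the set of outstanding work. The class and field names (`request_id`, `message`) and the helpers `apply_event`/`replay` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical event types mirroring the two events described above.
@dataclass(frozen=True)
class WorkRequestArrived:
    request_id: str
    message: str  # the original message (or a reference to it)

@dataclass(frozen=True)
class WorkRequestCompleted:
    request_id: str

Event = Union[WorkRequestArrived, WorkRequestCompleted]

def apply_event(pending: dict, event: Event) -> None:
    """Update the outstanding-work map in place for one event."""
    if isinstance(event, WorkRequestArrived):
        pending[event.request_id] = event.message
    elif isinstance(event, WorkRequestCompleted):
        pending.pop(event.request_id, None)

def replay(events) -> dict:
    """Rebuild the queue state by applying events in journal order."""
    pending = {}
    for event in events:
        apply_event(pending, event)
    return pending

journal = [WorkRequestArrived(f"req-{i}", f"payload {i}") for i in range(100)]  # by 10:05am
journal += [WorkRequestCompleted(f"req-{i}") for i in range(50)]                # by 10:10am

pending = replay(journal)
print(len(journal), len(pending))  # 150 50
```

The queue itself is never persisted; it is always derived from the events, which is what lets it be rebuilt after a restart.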
(If the messages are large or hard to serialize, or you want to keep them anyway for auditing, I’d store them somewhere else and persist only a reference to them.)
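One way to persist only a reference is sketched below. The external store here is just an in-memory dict standing in for whatever blob store or database you would actually use, and the names `payload_store`, `store_payload`, `on_message` and `payload_ref` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical external store; in practice a blob store or database table.
payload_store: dict[str, bytes] = {}

def store_payload(payload: bytes) -> str:
    """Save the payload outside the journal and return a reference to it."""
    ref = hashlib.sha256(payload).hexdigest()  # content hash used as the key
    payload_store[ref] = payload
    return ref

@dataclass(frozen=True)
class WorkRequestArrived:
    request_id: str
    payload_ref: str  # only this small reference goes into the journal

def on_message(request_id: str, payload: bytes) -> WorkRequestArrived:
    """Store the message elsewhere, then build the event that would be persisted."""
    return WorkRequestArrived(request_id, store_payload(payload))

event = on_message("req-1", b"a large message body" * 1000)
original = payload_store[event.payload_ref]  # look the message up when it's needed
```

Using a content hash as the key means identical messages share one stored copy; a generated unique ID would work just as well if you'd rather keep them separate.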
Snapshots would reduce the number of events replayed during recovery, yes. Each persisted event gets a sequence number; when you take a snapshot, it is tagged with the sequence number of the most recently persisted event. During recovery, the system first offers you the snapshot (as a SnapshotOffer) and then replays each event with a higher sequence number.
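The recovery rule (start from the snapshot, then apply only events with a higher sequence number) can be modelled in a few lines. This is a plain-Python sketch of the behaviour, not the Akka API; `Snapshot`, `recover` and the tuple layout of the journal are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Snapshot:
    sequence_nr: int    # number of the most recent event when the snapshot was taken
    pending: frozenset  # ids of requests still waiting at that point

def recover(snapshot: Optional[Snapshot], journal):
    """Start from the snapshot (if any), then apply only events with a higher sequence number.

    journal is a list of (sequence_nr, kind, request_id), where kind is
    "arrived" or "completed". Returns (pending ids, number of events replayed).
    """
    pending = set(snapshot.pending) if snapshot else set()
    start = snapshot.sequence_nr if snapshot else 0
    replayed = 0
    for seq_nr, kind, request_id in journal:
        if seq_nr <= start:
            continue  # already reflected in the snapshot
        if kind == "arrived":
            pending.add(request_id)
        else:
            pending.discard(request_id)
        replayed += 1
    return pending, replayed

journal = [(1, "arrived", "a"), (2, "arrived", "b"), (3, "arrived", "c"),
           (4, "completed", "a"), (5, "arrived", "d")]
snap = Snapshot(sequence_nr=3, pending=frozenset({"a", "b", "c"}))

with_snapshot = recover(snap, journal)     # replays events 4 and 5 only
without_snapshot = recover(None, journal)  # replays all 5 events
```

Both paths end in the same state; the snapshot only changes how many events have to be replayed to get there.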
So, if you don’t have snapshots and need to restart after 10:10am, all 150 events would be replayed.
Let’s say you take a snapshot every 100 events (and don’t do anything else). At 10:05am, your journal has 100 events and your snapshot store holds a single serialized view of the queue, tagged with sequence number 100. At 10:10am, the journal has 150 events, and the snapshot still holds the same serialized view. If you restart at this point, you get the SnapshotOffer, then the 50 events persisted after the snapshot was taken. So, yes, the snapshot reduces the number of events replayed: 50 instead of 150.
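The 10:05am/10:10am arithmetic can be checked with a toy journal that snapshots every 100 events. This is a model of the bookkeeping only, not Akka Persistence; the `Journal` class and its methods are hypothetical.

```python
class Journal:
    """Toy journal + snapshot bookkeeping; a model, not the Akka Persistence API."""

    def __init__(self, snapshot_every: int):
        self.snapshot_every = snapshot_every
        self.events = []      # (sequence_nr, event_name), oldest first
        self.snapshot_nr = 0  # 0 means no snapshot has been taken yet

    def persist(self, event_name: str) -> None:
        seq_nr = len(self.events) + 1  # valid because nothing is deleted in this model
        self.events.append((seq_nr, event_name))
        if seq_nr % self.snapshot_every == 0:
            self.snapshot_nr = seq_nr  # snapshot gets the latest event's number

    def replay_count(self, use_snapshot: bool) -> int:
        """How many events recovery would replay, with or without the snapshot."""
        start = self.snapshot_nr if use_snapshot else 0
        return sum(1 for seq_nr, _ in self.events if seq_nr > start)

journal = Journal(snapshot_every=100)
for _ in range(100):   # by 10:05am
    journal.persist("WorkRequestArrived")
for _ in range(50):    # by 10:10am
    journal.persist("WorkRequestCompleted")

print(journal.replay_count(use_snapshot=False))  # 150
print(journal.replay_count(use_snapshot=True))   # 50
```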
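To save space in the journal, you can handle SaveSnapshotSuccess, which tells you it’s safe to delete the journal entries the snapshot already covers (for example by calling deleteMessages(metadata.sequenceNr), which removes events up to and including that sequence number). With this, at 10:10am the journal would contain only 50 events instead of 150.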
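Here is a sketch of the same timeline with journal trimming added. It models what handling SaveSnapshotSuccess and calling deleteMessages(metadata.sequenceNr) achieves, but it is plain Python, not Akka; `WorkQueueJournal` and its methods are hypothetical. For simplicity the snapshot "succeeds" synchronously here, whereas in Akka the success message arrives asynchronously.

```python
class WorkQueueJournal:
    """Toy journal + snapshot store; a model, not the Akka Persistence API."""

    def __init__(self, snapshot_every: int):
        self.snapshot_every = snapshot_every
        self.next_nr = 1      # sequence numbers keep increasing even after deletes
        self.events = []      # (sequence_nr, event_name) still held in the journal
        self.snapshot_nr = 0  # 0 means no snapshot yet

    def persist(self, event_name: str) -> None:
        self.events.append((self.next_nr, event_name))
        if self.next_nr % self.snapshot_every == 0:
            self.snapshot_nr = self.next_nr
            self.on_save_snapshot_success(self.snapshot_nr)  # synchronous in this model
        self.next_nr += 1

    def on_save_snapshot_success(self, snapshot_nr: int) -> None:
        # Mirrors deleteMessages(snapshot_nr): drop events with sequence_nr <= snapshot_nr.
        self.events = [e for e in self.events if e[0] > snapshot_nr]

j = WorkQueueJournal(snapshot_every=100)
for _ in range(100):   # by 10:05am: snapshot at #100, then events 1..100 are deleted
    j.persist("WorkRequestArrived")
for _ in range(50):    # by 10:10am: events 101..150
    j.persist("WorkRequestCompleted")

print(len(j.events), j.snapshot_nr)  # 50 100
```

Recovery still works after the deletion because every removed event is already reflected in the snapshot; only events after sequence number 100 remain to be replayed.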
Does this help? I hope I’m answering your questions.