Working with some customers it is often desirable for them to maintain an on-premise, operational system but have that system stream it's data to the cloud for new types of work such as analysis or business intelligence. In this instance streams of transactional information is pretty straight forward, where each new record is a new entity all of its own. Sometimes though you might want to take copies of relational data but frequent snapshots just aren't viable, this could be because there's too much data to do this frequently, or because the impact of running a large select on the table impedes the performance of the operational system. One solution to this is to stream the changes to the table, or change data capture.
Using this stream of insert, update and delete statements can be useful in itself to get some insight into how your data is changing over time, identifying when there are a large number of update statements for instance could identify some background process executing which you were previously unaware of. But to use the data itself you'll often want to turn it back into a reasonably current view of the on-premise data. A method of doing this is to "re-play" the records as they arrive which is okay if the data doesn't change too much, but if it does then this can be burdensome both in terms of performance and possibly financially as well.
A possible solution is to periodically (at a frequency of your choosing) take the latest view of the data. Often people can get distracted by making sure that Insert, Update and Delete records are done in the correct order because of edge cases that happened a while ago. But what you're actually concerned about is "what does the data look like now". In this fictional example we have the current view of data.
And the changes received since.
A replay of this would take each change record and play it again against the destination. As you can see for employee_id 1 this would result in 2 updates before a final delete. Employee 2 would be added and then updated and employee 3 (who was previously employee 1) would be added. But if we take just the most recent records for each then we get the near-current state of each employee record.
So we know that employee 1 needs removing and employees 2 and 3 need upserting. That can be done differently depending on your platform of choice, but applying the changes as batch deletes and upserts you end up with the following.
The end result should be a process which is easier to think about, easier to implement and should save a lot of processing effort.