Streaming First: Designing Data-Intensive Applications

Deb RoyChowdhury

VP Product, InfinyOn Inc.


Data-intensive applications, anyone?

It’s probably foolish to fill a car with jet fuel and expect it to fly.

Yet this is precisely what we have been trying to do with data. In fact, it is even worse: we may be taking a steam engine, trying to boil the ocean with it, and expecting it to fly!

As the title says, this blog is about streaming first. It’s a data engineering myth that no one needs real time and that everyone can wait for batch processing.

Data-intensive applications are better suited to the streaming pattern. Thankfully, there is a solid book on the topic, and that is what this article is about.

The Metamorphosis of Data Systems

The shift from transactional databases to streaming systems isn’t just a technological shift. A trend. A fad.

It’s an organic transformation that aligns with the natural order of data processing.

As Martin Kleppmann articulates in “Designing Data-Intensive Applications,” systems must be:

  • Reliable: Work correctly even in the face of failures
  • Scalable: Handle growth in data volume, traffic, and complexity
  • Maintainable: Make it possible for engineers to work on the system intuitively

The book frames these as properties of modern systems, but I think they hold, in theory and aspiration, for most if not all systems.

From ACID to Event Streams

“Wait, don’t we have databases?” you might say. Are they not reliable, scalable, and maintainable? Yes, and they also have some limitations.

Traditional databases solved these challenges through ACID transactions (a minimal sketch follows the list below):

  • Atomicity: All or nothing execution
  • Consistency: Data remains valid between transactions
  • Isolation: Concurrent transactions don’t interfere
  • Durability: Committed data isn’t lost
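
To make the all-or-nothing property concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The accounts table and the transfer amounts are illustrative.

    # Minimal sketch of ACID atomicity with Python's built-in sqlite3.
    # The schema and amounts are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()

    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = 1").fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # aborts the transaction
            conn.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 2")
    except ValueError:
        pass

    # Neither UPDATE is visible: all-or-nothing execution.
    print(conn.execute("SELECT * FROM accounts ORDER BY id").fetchall())  # [(1, 100), (2, 0)]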

However, as systems scaled across multiple machines, the limitations of this approach became apparent:

  1. Coordination Costs: Distributed transactions require expensive coordination
  2. Availability Tradeoffs: Strong consistency often compromises availability
  3. Operational Complexity: Managing distributed state becomes increasingly difficult

The Case for Event-Driven Architecture

So what is this event-driven architecture, and what about streaming? Most of us might not consciously reflect on it, but the majority of software is analytical now.

We don’t use software merely to insert or update records or to transact anymore. We spend a lot of our time in the digital universe interacting with information, and we operate in the paradigm of streams and events.

Why Events Matter

The challenge with events is that they require us to shift how we think about data. They require us to unlearn what we are used to. They require us to change.

And no matter what we want others to think, we are change-resistant, often for legitimate reasons.

The weird thing is that thinking in terms of events is more aligned with the organic flow of human-computer interaction than thinking in transactions.

  1. Natural Representation: Most real-world changes are events

    -- Traditional Database Update
    UPDATE users SET status = 'premium' WHERE id = 123;
    
    // Event-Based Approach
    {
      "type": "user_upgraded",
      "user_id": 123,
      "plan": "premium",
      "timestamp": "2025-02-17T10:30:00Z"
    }
    
  2. Audit Trail: Events provide a complete history

    • What changed
    • When it changed
    • Why it changed
    • Who changed it
  3. Time Travel: Events, if managed properly, give us the ability to reconstruct state at various points in time (a small sketch follows this list).
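
To illustrate the audit-trail and time-travel points, here is a hedged sketch that folds over an event history to derive a user’s state as of a chosen moment. The event types and fields are hypothetical, chosen to mirror the example above.

    # Hedged sketch: reconstruct state "as of" a point in time by replaying
    # the event history up to a cutoff. Event types and fields are hypothetical.
    from datetime import datetime, timezone

    events = [
        {"type": "user_registered", "user_id": 123, "actor": "signup-form",
         "timestamp": "2025-02-17T10:00:00Z"},
        {"type": "user_upgraded", "user_id": 123, "plan": "premium",
         "actor": "billing-service", "timestamp": "2025-02-17T10:30:00Z"},
    ]

    def state_as_of(events, cutoff):
        state = {}
        for event in events:
            ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
            if ts > cutoff:
                break  # ignore everything after the cutoff: time travel
            if event["type"] == "user_registered":
                state = {"user_id": event["user_id"], "status": "free"}
            elif event["type"] == "user_upgraded":
                state["status"] = event["plan"]
        return state

    # State at 10:15, before the upgrade event happened.
    print(state_as_of(events, datetime(2025, 2, 17, 10, 15, tzinfo=timezone.utc)))
    # {'user_id': 123, 'status': 'free'}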

Transactional & Event Driven

Kleppmann makes a crucial distinction between two approaches:

  1. Transactional (Traditional)

    • Databases store current state
    • Updates modify state in-place
    • Harder to debug and audit
    • History is lost
  2. Event-Oriented (Modern)

    • Store events that led to current state
    • State is derived from event log
    • Complete history preserved
    • Natural audit trail

Streaming First Architecture

If the phrase streaming first is shocking, let me reiterate: batch processing is a subset of event streaming. Transactions are a subset of events.

While you don’t necessarily need an event-driven architecture, building customer-facing intelligent applications with analytics and AI/ML on the patterns of batch processing and ETL comes with significant challenges and costs.

You may say that streaming is complicated to manage as well. But that complexity has been shrinking rapidly in recent years, and there are refined systems that make stream transport and stream processing far more accessible than they were 10 years ago.

The Log as a Source of Truth

The key insight is that an append-only log of events serves as a powerful abstraction:

Event Log:
[0] User registered     | timestamp: 2025-02-17T10:00:00Z
[1] Profile updated     | timestamp: 2025-02-17T10:05:00Z
[2] Subscription added  | timestamp: 2025-02-17T10:10:00Z
[3] Payment processed   | timestamp: 2025-02-17T10:11:00Z

Benefits:

  1. Immutability: Events once written cannot change
  2. Ordering: Clear sequence of what happened when
  3. Replayability: Can recreate state by replaying events

Components of Streaming Systems

Streaming architectures typically include:

  1. Event Brokers

    • Apache Kafka: distributed event streaming platform, the OG, written in Java.
    • Apache Pulsar: distributed messaging and streaming platform, written in Java.
    • RedPanda: Kafka-compatible Fast Streaming Platform, written in C++.
    • Fluvio: Lightweight next generation streaming platform, written in Rust.
  2. Stream Processors

    • Examples: Apache Flink, Kafka Streams, Spark Structured Streaming
  3. State Stores

    • Materialized views of event streams
    • Optimized for different query patterns
    • Automatically updated from streams
    • Examples: RocksDB

Handling Time and Order

Kleppmann emphasizes several crucial concepts about time in distributed systems:

  1. Event Time vs Processing Time Timestamps are an extremely important aspect of a distributed system. Granular management of timestamps is important for the reliability of the system and the accuracy of the data.

    Event occurs  →  Event detected  →  Event processed
    (event time)    (ingestion time)   (processing time)
    
  2. Ordering Guarantees Ordering guarantees, deduplication, and reliable distribution of data are critical to the performance of distributed systems.

    • Total Order: Within a partition
    • Partial Order: Across partitions
    • Causal Order: Related events
  3. Windows and Watermarks Windows are an amazing capability for bounding unbounded event streams (a small tumbling-window sketch follows this list). Window Types:

    • Tumbling (fixed, non-overlapping)
    • Sliding (fixed, overlapping)
    • Session (activity-based)
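
Here is a minimal sketch of the tumbling case: events are bucketed into fixed, non-overlapping intervals keyed on their event time rather than their arrival time. The field names and window size are illustrative.

    # Hedged sketch of a tumbling window: bucket events into fixed,
    # non-overlapping intervals keyed on *event time*, not processing time.
    from collections import defaultdict
    from datetime import datetime, timezone

    WINDOW_SECONDS = 60  # one-minute tumbling windows

    events = [
        {"user_id": 1, "event_time": "2025-02-17T10:00:10Z"},
        {"user_id": 2, "event_time": "2025-02-17T10:00:45Z"},
        {"user_id": 1, "event_time": "2025-02-17T10:01:05Z"},
    ]

    counts = defaultdict(int)
    for event in events:
        ts = datetime.fromisoformat(event["event_time"].replace("Z", "+00:00"))
        # floor the event time to the start of its window
        window_start = int(ts.timestamp()) // WINDOW_SECONDS * WINDOW_SECONDS
        counts[window_start] += 1

    for start, count in sorted(counts.items()):
        label = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
        print(f"window starting {label}: {count} event(s)")
    # window starting 2025-02-17T10:00:00+00:00: 2 event(s)
    # window starting 2025-02-17T10:01:00+00:00: 1 event(s)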

Practical Implications

Fault Tolerance

How to handle failures:

  1. Exactly-Once Processing Exactly-once processing has quickly become a common expectation of these systems (a small sketch follows this list). Exactly-once semantics is achieved by:

    • Idempotent operations
    • Unique event IDs
    • State checkpointing
  2. State Management

    • Local state with checkpoints
    • Distributed state with replication
    • Recovery from logs
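
As a concrete illustration of the first point, here is a hedged sketch of effectively-once processing: each event carries a unique ID, the processor skips IDs it has already applied, and the last applied offset is checkpointed alongside them. The event shape and checkpoint structure are illustrative.

    # Hedged sketch: idempotent processing with unique event IDs plus a
    # checkpoint of the last applied offset. In production both would be durable.
    processed_ids = set()
    checkpoint = {"offset": -1}

    def apply_side_effect(event):
        print("crediting account", event["account"], "by", event["amount"])

    def process(offset, event):
        if event["event_id"] in processed_ids:
            return  # duplicate delivery: applying it again would double-count
        apply_side_effect(event)
        processed_ids.add(event["event_id"])
        checkpoint["offset"] = offset  # persisted together with processed_ids

    stream = [
        (0, {"event_id": "a-1", "account": 42, "amount": 10}),
        (1, {"event_id": "a-2", "account": 42, "amount": 5}),
        (1, {"event_id": "a-2", "account": 42, "amount": 5}),  # redelivered duplicate
    ]
    for offset, event in stream:
        process(offset, event)

    print(checkpoint)  # {'offset': 1} -- the duplicate was skipped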

Consistency Models

Different consistency requirements need different approaches:

  1. Strong Consistency

    • Use synchronous processing
    • Accept higher latency
    • Limited to single partition
  2. Eventual Consistency

    • Process asynchronously
    • Optimize for availability
    • Handle conflicts explicitly

Composing Systems

One of the key insights is about system composition:

  1. Event Stream Processing The ideal pattern is to capture events from source applications and APIs, and process them to build enriched dataframes, materialized views, analytics, etc.

    Event Log → Stream Processing → Materialized Views
                                  → Search Indexes
                                  → Analytics
                                  → Caches
    
  2. Change Data Capture The alternative pattern is to tap the transactional database log and capture the changes as events (a simplified sketch follows this list). Change data capture is more of a workaround to go from Extract Transform Load style batch patterns to event-driven analytics.

    • Transform database changes into streams
    • Enable gradual system evolution
    • Bridge old and new systems
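
The sketch below shows the basic idea in simplified form: each row-level change pulled from a database changelog becomes an event on a stream. The change-row format here is invented for illustration; real CDC tooling emits richer, tool-specific payloads.

    # Hedged sketch of change data capture: turn row-level database changes
    # into events. The changelog format is invented for illustration only.
    changelog = [
        {"op": "INSERT", "table": "users", "row": {"id": 123, "status": "free"}},
        {"op": "UPDATE", "table": "users", "row": {"id": 123, "status": "premium"}},
    ]

    def change_to_event(change):
        return {
            "type": f"{change['table']}_{change['op'].lower()}",
            "data": change["row"],
        }

    for event in (change_to_event(c) for c in changelog):
        print(event)
    # {'type': 'users_insert', 'data': {'id': 123, 'status': 'free'}}
    # {'type': 'users_update', 'data': {'id': 123, 'status': 'premium'}}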

Real-World Implementation

Design Patterns

Command Query Responsibility Segregation (CQRS)

CQRS might sound complex, but it’s really about recognizing a simple truth: reading and writing data have different needs. Instead of forcing both operations through the same model, CQRS splits them up.

Here’s how it typically flows:

Write Path:
Command "Create Order" → Validation → Event Generation → State Updates
                                                       → View Updates

Read Path:
Query "Get Order History" → Read from Optimized View

Event Collaboration

Think of event collaboration as a sophisticated postal system for your services. Instead of services calling each other directly (like making a phone call), they drop messages in a shared mailbox (the event stream) that interested parties can check.

This creates some powerful possibilities:

  • Services can be added or removed without disrupting others
  • Each service can process events at its own pace
  • You get a complete audit trail by design
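
A small hedged sketch of the idea: every service tracks its own position in the shared stream, so each one processes events at its own pace and new services can join without touching the others. The service names and the in-memory stream are illustrative.

    # Hedged sketch of event collaboration: consumers share one stream but keep
    # independent offsets, so nobody calls anybody directly.
    stream = [
        {"type": "order_created", "order_id": "o-1"},
        {"type": "order_created", "order_id": "o-2"},
    ]

    offsets = {"billing": 0, "analytics": 0}  # independent consumer positions

    def poll(service, batch_size=1):
        start = offsets[service]
        batch = stream[start:start + batch_size]
        offsets[service] = start + len(batch)
        return batch

    print(poll("billing"))       # billing handles one event right away
    print(poll("analytics", 2))  # analytics catches up on both at its own pace
    # Adding a new consumer is just a new offset entry; no other service changes.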

Materialized Views

Materialized views solve a common problem in event-sourced systems: how do you efficiently query data that’s stored as a sequence of events? The answer is to create read-optimized views that are automatically updated as new events arrive (a small sketch follows the list below).

The beauty of this approach is that you can:

  • Optimize each view for specific query patterns
  • Rebuild views from scratch by replaying events
  • Add new views without changing your event structure
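
Here is a hedged sketch: two views, each optimized for a different query pattern, derived from the same event log and rebuildable from scratch by replaying it. Event types and fields are illustrative.

    # Hedged sketch of materialized views: two read-optimized views built by
    # replaying the same event log. Replaying from scratch rebuilds both.
    events = [
        {"type": "subscription_added", "user_id": 123, "plan": "premium"},
        {"type": "payment_processed", "user_id": 123, "amount": 20},
        {"type": "payment_processed", "user_id": 456, "amount": 15},
    ]

    def build_views(events):
        plan_by_user = {}   # view for "what plan is this user on?"
        revenue_total = 0   # view for "how much revenue so far?"
        for event in events:
            if event["type"] == "subscription_added":
                plan_by_user[event["user_id"]] = event["plan"]
            elif event["type"] == "payment_processed":
                revenue_total += event["amount"]
        return plan_by_user, revenue_total

    plans, revenue = build_views(events)
    print(plans, revenue)  # {123: 'premium'} 35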

Practical Considerations

Schema Evolution

As your system evolves, your event schemas will need to change. The trick is doing this without breaking existing processors. Here’s a robust approach:

  1. Start with versioning:

    • Work with API versions and package versions for schemas, types, and functions
    • Version DataFlows and Connectors, and target complete test coverage
  2. Use a schema registry (a small upcasting sketch follows this list):

    • Validate new events against the schema
    • Ensure backward compatibility
    • Document changes over time
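
To make backward compatibility concrete, here is a hedged sketch of upcasting: events carry a schema version, and older versions are converted to the current shape before processing, so existing processors keep working. The field names and versions are illustrative, not tied to any particular registry.

    # Hedged sketch of schema evolution via upcasting: old event versions are
    # rewritten into the current shape before downstream processing.
    def upcast(event):
        if event.get("schema_version", 1) == 1:
            # v1 had a single "name" field; v2 splits it into first/last
            first, _, last = event["name"].partition(" ")
            event = {"schema_version": 2, "user_id": event["user_id"],
                     "first_name": first, "last_name": last}
        return event

    v1_event = {"schema_version": 1, "user_id": 123, "name": "Ada Lovelace"}
    v2_event = {"schema_version": 2, "user_id": 456,
                "first_name": "Grace", "last_name": "Hopper"}

    for event in (v1_event, v2_event):
        print(upcast(event))  # both come out in the v2 shape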

Performance Optimization

Performance in streaming systems comes down to smart partitioning and efficient data management. Here’s what that looks like:

Partitioning Strategy: Choose your partition key based on your most common access patterns.

Data Lifecycle: Implement smart retention policies (a small sketch of both ideas follows the list):

  • Keep detailed data for 30 days
  • Aggregate data older than 30 days
  • Archive data older than 1 year
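
As a hedged sketch of both ideas: a stable hash of the key keeps related events on the same partition (preserving per-key ordering), and a simple age check decides which retention tier an event falls into. The partition count and thresholds are illustrative.

    # Hedged sketch: key-based partitioning plus tiered retention.
    import hashlib
    from datetime import datetime, timedelta, timezone

    NUM_PARTITIONS = 8

    def partition_for(key: str) -> int:
        # stable hash, so the same key always lands on the same partition
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

    def retention_tier(event_time: datetime, now: datetime) -> str:
        age = now - event_time
        if age <= timedelta(days=30):
            return "detailed"    # keep the raw event
        if age <= timedelta(days=365):
            return "aggregated"  # roll up into summaries
        return "archived"        # move to cold storage

    now = datetime(2025, 2, 17, tzinfo=timezone.utc)
    print(partition_for("user-123"))                      # a value in 0..7
    print(retention_tier(now - timedelta(days=90), now))  # aggregated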

The key is to mix and match these patterns based on your specific needs. Don’t be afraid to evolve your approach as you learn more about your usage patterns.


Conclusion: The Future is Streaming

The insights from the text make it clear that streaming isn’t just another architecture style—it’s a fundamental shift in the design and implementation of data systems.

Key takeaways

  1. Event-First Thinking

    • Model changes as events: Instead of thinking about data as current state, model it as a series of events that led to that state. For example, rather than storing a user’s current balance, store all the transactions that resulted in that balance. This approach provides richer context and enables powerful analysis.
    • Preserve history: By keeping the complete event log, you maintain the ability to understand how your system reached its current state. This is invaluable for debugging, audit compliance, and understanding user behavior patterns over time.
    • Enable auditing: Event logs provide natural answers to questions like “who changed this value?”, “when did this change happen?”, and “what was the sequence of actions that led to this state?” This is increasingly important for regulatory compliance and security investigations.
  2. Composition Over Monoliths

    • Specialized components: Build systems from specialized components that excel at specific tasks. For example, use a stream processor for real-time event handling, a search engine for text queries. Each component can be optimized for its specific use case.
    • Clear boundaries: Define explicit interfaces between components using defined event schemas and types. This makes it easier to understand system behavior, replace components, and evolve the system over time. Clear boundaries also help in maintaining team ownership and responsibility.
    • Flexible evolution: When systems are composed of independent components connected by event streams, you can upgrade or replace components without system-wide changes. For instance, you can add new analytics capabilities by simply consuming the existing event stream, without modifying the source systems.
  3. Embrace Asynchrony

    • Decouple processing: Allow different parts of your system to process events at their own pace. For example, while your order processing system might need to handle events immediately, your analytics system can deal with backpressure during high load and catch up.
    • Scale independently: Different components can scale according to their specific needs. Your event ingestion might need to scale based on incoming event rate, while your query layer scales based on the number of active users. Asynchronous processing makes this possible.
    • Handle failures gracefully: In an asynchronous system, the failure of one component doesn’t need to affect others. If your analytics pipeline fails, your core transaction processing can continue unaffected. Failed components can recover by replaying events from the log.

The future belongs to systems that:

  • Treat events as first-class citizens
  • Provide complete audit trails
  • Enable flexible derivation of state
  • Support gradual evolution

As Kleppmann demonstrates, the theoretical foundations support this direction, and the practical benefits make it inevitable.

Stay in Touch:

Thanks for checking out this article. Hope this gives you some insight into streaming-first architecture that you can build on in your context. If you’d like to see how InfinyOn Cloud could level up your data operations - Just Ask.