Why 87% of all data projects are doomed to fail, and how you can improve the odds of success

Deb RoyChowdhury

Contributor, InfinyOn

What the data processing tools on the market won’t tell you about data pipelines

As an engineering leader in an ambitious growing organization, there is a nine out of ten chance of you getting sucked into using data processing infrastructure and tools that will need 3x to 5x more time, developer capacity, and budget.

Maybe you are already on that slippery slope subscribing to the ease of managed cloud solutions or the temptation of open source tools. It would take 180 days or less for your weekends to be consumed by ‘incidents,’ and the rest of the ride is going to be bumpy until the crash landing! If you have been doing it for a while, you know this reality!

I have been in similar circumstances through the past 16 years of my career in building data intensive products. Whether it is the low entry barrier for open source big data tools, the latest fads claiming the death of entire categories, or managed services which are ‘easy to get started to use’… until they come to harvest your organs with their bills!

What’s up with the tooling bloat? Nearly 1500 companies are selling memes!

It is a pain to exist in this market. But hey, there is sunk cost fallacy, survivorship bias, fear of crashing out, and more. Yet there is a sliver of hope of a brighter future!

What you must do if you are trying to build profitable future-proof data solutions

I have seen way too many failed data projects because of over promising, over complicating, over engineering, under communicating and not empathizing with the customer. You likely have as well.

At the end of the day the data is important, but it is an input to better decisions and success!

That is what we are looking to actualize for our businesses in the age of entropy overload!

Less is indeed more! This is the hardest problem for me as a product leader with a hands-on data engineering and AI/ML background. It is an area in which I deliberately strive for 1% improvement each day!

The thing is that the real-time data game is lost before it begins if you are looking to keep on routing data through multiple data landings and manage a ludicrous amount of transformation logic all of which results in more copies of the same data.

More infrastructure to manage. More applications to manage. More costs to your business. And a dangling carrot of profits which keeps you on the hamster wheel as long as you can survive.

Now if you want to play that game, I would be in the stands rooting for you and hoping you win. Or, I can be real and tell you that 9 out of 10 times you are going to be on the losing side!

At least, that has been my experience. Not everyone is Netflix, Uber, LinkedIn and there are several reasons for that. Capital, talent pool, expectation vs reality of actual digital competence of infrastructure and the ecosystem.

You need to be real about your budget and capacity, and your build vs. buy decisions. It sounds obvious, yet this is where we suck big time.

How we are solving the problem at InfinyOn

At InfinyOn we are building simple primitives to build data flows which are currently helping small engineering teams at a few companies to orchestrate and build products based on machine generated data from sensors and server applications.

Now our website says that we are a real-time event streaming platform, which we are. However, "real-time" is used like slang these days!

Here is what we are not. We are not a Kafka reseller, a Flink reseller, a database, or yet another big data analytics solution! There are many real time databases out there which are awesome, but the real-time game is played as the data flows and not in database queries. This is a pattern that is pretty sparse in the entire software ecosystem.

The problem that our small team has found to be the hardest is the one of data orchestration and transformation.

Now you might be thinking, what is the big deal? There are so many ETL, ELT companies out there, and you are correct!

Just look at the number of customer complaints about their connectors, how inflated the costs are, and how suboptimal the workflow is…

We are not building yet another tool which tries to serve everything up on a platter, sets the expectation of serving a Michelin star restaurant quality dish and serves up fast food quality bloatware.

We have built the foundations of an end-to-end data streaming tool that combines a message queue, a transformation engine, and Serverless data flow runtime from 0 to 1.

No infrastructure to set up and no upfront cost: just build clients and configurations to connect your data sources, then add your business logic transformations. And voilà, the data collection, transformation, and delivery setup is done.

We empower companies who value their product and data, with a data orchestration and transformation layer that is green, future proof, cost effective as well as robust, blazing fast, and easy to maintain.

How can we elevate your data flows?

We are opening up limited spots to double our design partners in Q2, 2023 to power up your technology roadmap with simplicity. We will provide you with a ton of credits on InfinyOn Cloud to get started, and collaborate with your engineering team to build out your data collection, data transformation, and data orchestration layer. You will have your data pipeline built out and you will ship intelligent features in your product.

Our raving customer says that in 99% of their use cases, InfinyOn empowers them to build robust data orchestration in a single sprint with a team of 1 or 2 engineers! What are their use cases? They need to collect and shape machine-generated data to build their product, which includes telemetry, monitoring, analytics, automation, and machine learning features. Our platform allows them to build simple clients that orchestrate the shaped data to the different features and applications so they can focus on serving their customers.

A huge financial services customer benchmarked our product against the currently available messaging and data orchestration solutions in the market and found us to be 3x lower in latency, 5x faster in throughput, 7x lower CPU consumption, and 50x lower memory consumption. If you read between the lines of this benchmark, what they are saying is that the platform is ridiculously efficient and cost effective.

Let me balance out the positive feedback by tempering the expectations a bit. We believe that connectors of data sources are a ubiquitous pattern, yet there is no one size fits all approach to connectors. It is the reason for the subpar experience with endless lists of prebuilt connectors.

We have built development kits to empower and enable you to build robust connectors and smart modules. We are here to collaborate with you to build connectors and transformation smart modules which you will be able to reuse and repurpose for all your data collection, and transformation use cases.

There are several things that we can build together, but I don’t believe in ‘if you build it they will come’ or ‘fake it till you make it.’

We want to teach you how to fish rather than serving you stale canned fish!

We may have a fit if you are a technology or delivery leader at a technology company that works with data and understands the pains and gains of data and infrastructure, with a limited budget to build. It would be a perfect fit if you are working on remote monitoring or telemetry-type use cases and innovating in manufacturing, robotics, connected devices, gaming, avionics, spacetech, etc.

If the above description resonates, I would invite you to apply to our data flow design sprint to accelerate your product development and build for the future.

If you would like to be a part of the next exclusive cohort of data flow design sprint:

Set up a 1:1 call with me.

Technical fine-print

We are building the simplest and most performant data capture and transformation primitives to process data as it flows from the edge. Of course that means serverless, real-time, event streaming and all the buzz words that you can think of!

If you take away one thing and one thing only from our self description, then I hope it is this:

InfinyOn is a platform for developing simple yet ridiculously efficient data flows to build delightful software.

The product is a low-latency data orchestration and transformation runtime, written from scratch in Rust and built on first principles of software engineering! It’s like a kernel with endless possibilities. It’s compatible with WebAssembly, giving us an astronomical level of flexibility and usability. This is just the tip of the iceberg!

The primitives that we use in our platform are producers, consumers, topics, connectors, and smart modules aligned with the messaging paradigm.

  • Fluvio is the core of the platform, the equivalent of a kernel that packages the primitives into a runtime.
  • InfinyOn Cloud is a fully managed operating system that simplifies the deployment of the Fluvio kernel and runtime.
  • Topics, Producers, and Consumers are standard messaging system concepts.
  • Connectors are used to connect to data sources and destinations using connection protocol, authentication, and access patterns.
  • You can build connectors using the Connector Development Kit to connect to any protocol to orchestrate ingress and egress of data.
  • Smart modules are data transformation operators which filter, group, transform, count, map and more.
  • You can build smart modules using the Smart Module Development Kit to transform the data in topics.
  • InfinyOn Cloud Hub enables reusing connectors and smart modules to apply common patterns of data collection and data transformation to shape data for diverse use cases.
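
To make the primitives above a little more concrete, here is a minimal in-memory sketch in Python. It is a toy model and not Fluvio's API: the names `Topic`, `produce`, `consume`, `only_hot`, and `to_fahrenheit`, and the sample sensor records, are all hypothetical and exist only for illustration. It assumes a topic behaves like an append-only log and that smart modules behave like small functions applied to each record as it flows to a consumer.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional

# A transformation takes one record and returns a record, or None to drop it.
# (Hypothetical stand-in for a smart module; not Fluvio's real interface.)
Transform = Callable[[dict], Optional[dict]]


@dataclass
class Topic:
    """Append-only list of records, standing in for a streaming topic."""
    name: str
    records: List[dict] = field(default_factory=list)


def produce(topic: Topic, records: Iterable[dict]) -> None:
    """Producer: append records to the end of the topic."""
    topic.records.extend(records)


def consume(topic: Topic, transforms: Iterable[Transform] = (),
            offset: int = 0) -> Iterator[dict]:
    """Consumer: read from `offset`, applying each transform in order."""
    chain = list(transforms)
    for record in topic.records[offset:]:
        out: Optional[dict] = record
        for transform in chain:
            out = transform(out)
            if out is None:  # a filter dropped the record
                break
        if out is not None:
            yield out


def only_hot(record: dict) -> Optional[dict]:
    """Filter: keep readings above 30 degrees Celsius."""
    return record if record["temp_c"] > 30 else None


def to_fahrenheit(record: dict) -> dict:
    """Map: add a Fahrenheit field without mutating the stored record."""
    return {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}


# Made-up sensor readings, purely for illustration.
sensor_topic = Topic("sensor-readings")
produce(sensor_topic, [
    {"device": "a", "temp_c": 21},
    {"device": "b", "temp_c": 35},
    {"device": "c", "temp_c": 40},
])

# The data is shaped as it flows to the consumer; the topic keeps one copy.
shaped = list(consume(sensor_topic, [only_hot, to_fahrenheit]))
```

The point of the sketch is the shape of the flow: the transformation happens between the topic and the consumer, so the shaped view never has to be written back as yet another copy of the same data.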

Our current design partners and customers can’t rave enough about InfinyOn Cloud, which is the best part of my job as the Head of Product!

Thank you for reading through this piece.

I am looking to start collaborating with a cohort of 3 companies in Q2 and I want to identify the design partners by the end of April and get to building together.

If you are a technology or delivery leader building data-intensive applications for the future, I would invite you to apply to the upcoming data flow design sprint.

If you would like to be a part of the next exclusive cohort of data flow design sprint:

Set up a 1:1 call with me.

If you would like to learn more before making a decision, book a meeting on my calendar: Set up a 1:1 call with me.

Connect with us:

Please, be sure to join our Discord server if you want to talk to us or have any questions.

Subscribe to our YouTube channel

Follow us on Twitter

Follow us on LinkedIn

Try InfinyOn Cloud, a fully managed Fluvio service

Happy coding, and stay tuned!
