Lewis Kaneshiro & Karthik Ramasamy, Streamlio | Big Data SV 2018


 

(upbeat techno music)

>> Narrator: Live, from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners.

>> Welcome back to Big Data SV, everybody. My name is Dave Vellante and this is theCUBE, the leader in live tech coverage. You know, this is our 10th big data event. When we first started covering big data, back in 2010, it was Hadoop, and everything was a batch job. About four or five years ago, everybody started talking about real time and the ability to affect outcomes before you lose the customer. Lewis Kaneshiro is here. He's the CEO of Streamlio and he's joined by Karthik Ramasamy, who's the chief product officer. They're both co-founders. Gentlemen, welcome to theCUBE. My first question is, why did you start this company?

>> Sure, we came together around a vision that enterprises need to access the value around fast data. As you mentioned, enterprises are moving out of the slow data era and looking to deliver fast data value back to their users and their use cases. And coming together around that idea of real-time action, what we realized was that enterprises can't all access this data with projects that are not meant to work together and that are very difficult, perhaps, to stitch together. So what we did was create an intelligent platform for fast data that's really accessible to enterprises of all sizes. What we do is unify the core components needed to access fast data, which are messaging, compute, and stream storage, using best-of-breed technology that was open-sourced out of Twitter and Yahoo!

>> It's a good thing. I was going to ask why the world needs another, you know, streaming platform, but Lewis kind of touched on it, 'cause it's too hard. It's too complicated, so you guys are trying to simplify all that.

>> Yep, the main reason we wanted to simplify it is that, based on all our experiences at Twitter and Yahoo!, one of the key aspects was to simplify it so that it's consumable by a regular enterprise. Companies like Twitter and Yahoo! can afford the talent and the expertise to build these real-time platforms, but normal enterprises don't have access to that expertise, or to the costs that they might have to incur. So we wanted to take these open-source projects that Twitter and Yahoo! provided, combine them, and make sure that you have a simple, easy, drag-and-drop kind of interface, so that it's easily consumable for any enterprise. Essentially, what we are trying to do is reduce the (mumbles) for real time, for all enterprises.

>> Dave: Yeah, enterprises will pay up...

>> Yes.

>> For a solution. The companies that you used to work for, they all gladly throw engineering at the problem.

>> Yeah.

>> Sure.

>> To save time, but most organizations, they don't have the resources, and so... Okay, so how would it work prior to Streamlio? Maybe take us through how a company would attack this problem, the complexities of what they have to deal with, and what life is like with you guys.

>> So, the current state of the world is that it's a fragmented solution today. You take multiple pieces of different projects and you assemble them together so that you can do (mumbles) right?
The reason people end up doing this is that each of these big data projects was designed for a completely different purpose. Messaging is one, compute is another, and the third one is storage. So essentially what we have done as a company is to simplify this by integrating well-known, best-of-breed projects: for messaging we use Apache Pulsar, for compute we use Apache Heron, from Twitter, and for storage, for real-time storage, we use Apache BookKeeper, and we unify them, so that under the hood it may be three systems, but as a user it serves or functions as a single system. You install the system, ingest your data, express your computation, and get the results out, in one single system.

>> So you've unified or converged these functions. If I understand it correctly, and we were talking off camera a little bit, the team, Lewis, that you've assembled actually developed a lot of these, or are major committers to these open-source projects, right?

>> Absolutely, co-creators of each of the projects, and what that allows us to do is to really integrate each project at a deep level. For example, Pulsar is actually a pub/sub system that is built on BookKeeper, and BookKeeper, in our minds, is the purest best-of-breed stream storage solution. So, fast and durable storage. That storage is also used in Apache Heron to store state. So, as you can see, enterprises, rather than stitching together multiple different solutions for queuing, streaming, compute, and storage, now have one option that they can install in a very small cluster, and operationally it's very simple to scale up. We simply add nodes if you get data spikes. And what this allows is for enterprises to access new and exciting use cases that really weren't possible before. For example, machine learning model deployment in real time. I'm a data scientist, and what I found is that in data science you spend a lot of time training models in batch mode. It's a legacy type of approach, but once the model is trained, you want to put that model into production in real time so that you can deliver that value back to a user in real time. Let's call it an under-two-second SLA. So that has been a great use case for Streamlio, because we are a ready-made intelligent platform for fast data, for ML and AI deployment.

>> And the use cases are typically stateful and you're persisting data, is that right?

>> Yes, it can be used for stateless use cases also, but the key advantage that we bring to the table is stateful storage. And since we ship along with the storage (mumbles) stateful storage becomes much easier, because it can be used to store the intermediate state of the computation, or it can be used for staging (mumbles) data: when it spills over from memory, it's automatically stored to disk, or you can even keep the data for as long as you want, so that you can unlock the value later, after the data has been processed as fast data. You can access that data lazily, later in time.
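[Editor's note: To make the messaging layer described above concrete, here is a minimal, illustrative sketch using the Apache Pulsar Python client. The broker address, topic, and subscription names are placeholder assumptions for this example, not details from the interview.]

    # Minimal, illustrative Apache Pulsar produce/consume example (Python client).
    # Assumes a broker reachable at pulsar://localhost:6650; the topic and
    # subscription names are hypothetical.
    import pulsar

    client = pulsar.Client('pulsar://localhost:6650')

    # Subscribe first so the subscription exists before any messages are published.
    consumer = client.subscribe('persistent://public/default/fast-data-events',
                                subscription_name='example-sub')

    # Publish one event to the same topic.
    producer = client.create_producer('persistent://public/default/fast-data-events')
    producer.send(b'sensor-reading-42')

    # Read the event back; Pulsar persists messages in BookKeeper, which is what
    # allows data to be kept and read back later.
    msg = consumer.receive()
    print(msg.data())
    consumer.acknowledge(msg)

    client.close()

In a Streamlio-style deployment, the same BookKeeper layer that durably backs these topics is also what Heron uses to store computation state, which is the stateful-storage advantage described above.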
>> So give us the run-down on the company: funding, you know, VCs, head count. Give us the basics.

>> Sure, we raised a Series A from Lightspeed Venture Partners, led by John Vrionis and Sudip Chakrabarti. We've raised seven and a half million and emerged from stealth back in August. That allowed us to ramp up our team to 17 now, mainly engineers, in order to really have a very solid product. We launched post-revenue, pre-launch, and some of our customers are really looking at geo-replication across multiple data centers, and active-active geo-replication is an open-source feature in Apache Pulsar, so that's been a huge draw compared to some other solutions that are out there. As you can see, this theme of simplifying architecture is where Streamlio sits, so unifying queuing and streaming allows us to replace a number of different legacy systems. That's been one avenue to help growth. The other, obviously, is on the compute piece. As enterprises are finding new and exciting use cases to deliver back to their users, the compute piece needs to scale up and down. We also announced Pulsar Functions, which is stream-native compute that allows very simple function computation in native Python and Java, so you spin up an Apache Pulsar cluster or the Streamlio platform, and you simply have compute functionality. That allows us to access edge use cases, so IoT is a huge, kind of exciting set of POCs for us right now, where we have connected-car examples that don't need a heavyweight scheduler or deployment at the edge. It's Pulsar with Pulsar Functions. What that allows us to do are things like fraud detection, anomaly detection at the edge, model deployment at the edge, interpolation, observability, and alerts.

>> And so how do you charge for this? Is it usage-based?

>> Sure. What we found is enterprises are more comfortable on a per-node basis, simply because we have the ambition to really scale up and help enterprises use Streamlio as their fast data platform across the entire enterprise. We found that a per-data charge rate would actually limit that growth, so it's per node, with a shared architecture. We also made an early investment in optimizing around Kubernetes, and as enterprises are adopting Kubernetes, we are the simplest installation on Kubernetes: on-prem, multi-cloud, at the edge.

>> I love it. I mean, for years we've been talking about the complexity headwinds in this big data space. We certainly saw that with Hadoop. You know, Spark was designed to solve some of those problems, but... Sounds like you're doing some really good work to take that further. Lewis and Karthik, thank you so much for coming on theCUBE. I really appreciate it.

>> Thanks for having us, Dave.

>> All right, thank you for watching. We're here at Big Data SV, live from San Jose. We'll be right back. (techno music)
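[Editor's note: Pulsar Functions, mentioned above, attach lightweight per-message compute directly to topics. The sketch below is a minimal, hypothetical example using the Pulsar Functions Python API; the class name, topic name, and threshold are illustrative assumptions, not details from the interview.]

    # Minimal, illustrative Pulsar Function (Python API): flags out-of-range
    # readings, a toy stand-in for the edge anomaly-detection use case discussed
    # above. The alert topic and threshold are hypothetical.
    from pulsar import Function

    class AnomalyFlagger(Function):
        def process(self, input, context):
            # Each message from the configured input topic arrives here as a string.
            value = float(input)
            if value > 100.0:
                # Route suspicious readings to a dedicated alert topic.
                context.publish('persistent://public/default/anomalies', input)
            # Returning a value writes it to the function's configured output topic.
            return input

A function like this would typically be submitted with the pulsar-admin functions create command, pointing --inputs at the source topic and --output at a results topic, which is what lets it run without a separate heavyweight scheduler.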

Published Date: Mar 9, 2018
