Fangjin Yang, Imply.io | CUBE Conversation


 

(bright upbeat music)
>> Welcome, everyone, to this CUBE Conversation featuring Imply. I'm your host, Lisa Martin. Today, we are excited to be joined by FJ Yang, the co-founder and CEO of Imply. FJ, thanks so much for joining us today.
>> Lisa, thank you so much for having me.
>> Tell me a little bit about yourself and about Imply.
>> Yeah, absolutely. So, I started Imply a couple years ago and before starting the company, I was a technologist. So, I was a software engineer and software developer primarily specializing in distributed systems. And one of the projects I worked on ultimately became kind of the centerpiece behind Imply. Imply, as a company, is a database company. What we do is we provide developers a powerful tool in order to help them build various types of data analytic applications. We're also an open source company, where the company develops a popular open source project called Apache Druid.
>> Got it, so database as a service for modern analytics applications. You're also one of the original authors of Apache Druid. Talk to me, gimme a timeline, Druid's 10-year history or so. What's the big picture? What's been the market evolution that you've seen?
>> Yeah, absolutely. So, I moved out to Silicon Valley basically to try and work at a startup, 'cause I was enamored with startups and I thought they were the coolest thing ever. So, at one point, I basically joined the smallest startup I could find. It was a startup called Metamarkets, which actually doesn't exist anymore; it was ultimately acquired by Snapchat a couple years ago. But I was one of the first employees there. And what we were trying to do at the time was build an analytics application, a user-facing application where people could slice and dice various types of data. At the time, the data sets we were working with were online advertising, digital advertising data sets, which were very large and complex.
And, we really struggled to find a database that could basically power the kind of interactive end user experience that we knew we wanted to provide our end customers. So, what ended up happening was we decided to build our own database, and we were a three or five-person shop when we decided to build our own database, and that was Druid. And over time, we saw many other types of companies actually struggle with a similar set of problems, albeit with very different types of use cases and very different types of data sets. And the Druid community kind of grew and evolved from that. And in my work in engaging with the community, what I saw was a market opportunity and a market gap, and that's where Imply formed.
>> Let's double click on that. You talked about why you built Druid, the problem you were looking to solve. But talk to me about the role that Imply has.
>> Right. So, Imply is a commercial company. What we do is we build kind of an end-to-end enterprise product around Druid as the core engine. Imply provides deployment, it provides management, it provides security, and it also provides visualization and monitoring pieces around Druid as a core engine. What we aim to do at Imply is really enable developers to build various types of data applications with only the click of a few buttons and by interacting with a simple set of APIs. So, the goal is, if you're a developer, you don't have to think about managing the database yourself, you don't have to think about the operational complexity of the database, but instead, what you do is just work with APIs and build your application.
>> So, then what gives Druid its superpower? What makes Druid Druid?
>> Yeah, so, Druid, the easiest way to think about it is it's a really fast calculator, and it's a very fast calculator for a whole lot of data. So, when you have a whole lot of data and you want to crunch numbers very, very quickly, Druid is very good at doing that.
And, people always ask me this question, which is, what makes Druid special? And I always struggle with it, because it's never just one thing, it's actually layers, upon layers, upon layers of engineering. You start with fundamentals of how you maximally optimize the resources of any hardware. So, how do you maximize storage? How do you maximize compute? And then, there's a lot of optimizations around how do you store the data? How do you access that data in a very fast way once it's stored in order to run computations very quickly? So, unfortunately, there's no silver bullet about Druid, but maybe I can summarize in this way. Druid, it's like a search system, and a data warehouse, and a time series database all mixed together. And that architecture enables it to be very, very quick. And unfortunately, if you don't know what some of the components I'm talking about are, it's hard to describe where the secret sauce is (chuckling).
>> Sometimes you want to keep that secret sauce secret. Talk to me about the overall data space. As we see these days, every company is a data company, or if it's not, it needs to be to be successful. Where does Druid fit in the overall data space? Give us that picture of where it fits.
>> Yeah, absolutely. So, it's pretty interesting that you see now in the public markets as well as the private markets, some of the hottest unicorns out there are actually data companies. And I think what people are understanding now for the first time is just how vast and complex the data space is and also how large the market is as well. So for sure, there's many different components and pieces in the data space, and they oftentimes come together to form what's known as a data stack. So, a data stack is basically kind of an architecture that has various systems, and each of these systems is designed to do a certain set of things very, very well.
For example, a company that recently went public is a company called Confluent, which mostly caters to data transport, so getting data from one place to another. They're built around an open source engine called Apache Kafka. Databricks is another mega unicorn that's going to go public pretty soon. And they're built around an open source project called Spark, which is mainly used for data processing. Where we sit is on the data query side. So, what that means is we're a system in which people can store data and then access that data very, very quickly. And there's other systems that do that, but where our bread and butter is, is if you're building some sort of application, where you have end users that are clicking buttons in order to get access to data, we're a platform that enables the best end user experience. We return queries very, very quickly with a consistent SLA, we immediately visualize data as soon as it's made available, and then we can support many, many, many concurrent end users accessing the system at the same time.
>> So, real time. One of the things I think that we learned during the pandemic, one of the many things, is that access to real time data is no longer a nice to have, it is table stakes for, as I said, every company these days is a data company. So with how you describe it, how should people think of Druid versus a data warehouse?
>> Yeah. So, that's a great question. And obviously, data warehouses have been around since the 70s. In the B2B space, they're among the largest players that kind of exist in enterprise software. So, it's only natural that when you come up with sort of a new analytics database, people compare it with what they already know, which is the data warehouse. So, a lot of how we think about why we're different than a data warehouse goes back to how I answered the previous question, in that we're focused right now, really, on powering different types of data applications.
Data applications are UIs in which people are really accessing and getting insights from data by clicking buttons versus writing more complex SQL queries. And when you click buttons and you get access to data, what you want in terms of an end user experience is you want answers to questions to come back almost immediately. So you don't want to click a button and then see a spinning dial that goes on for minutes and minutes before an answer comes back. You basically want results to come back immediately. You want that experience no matter what types of queries you're issuing or how many people are issuing those queries. If you have thousands, if not tens of thousands of people that are trying to access data at the exact same time, you want to give a consistent user experience like Google, which is one of my favorite products. There're millions of people that use Google, and ask questions, and they get their answers back immediately. So we try to provide that same experience, but instead of a generic search engine, what we're doing is we're providing a system that basically answers questions on data, and users get a very interactive and fast experience when asking questions. And that's something that I think is very different than what data warehouses are primarily specialized in. Data warehouses are really designed to be systems in which people write very large complex SQL queries that might take minutes or hours sometimes to run. But the experience of using a data warehouse to power an application is not a great one.
>> So, I'm just curious, FJ, in the last couple of years, with, as I mentioned before, access to real time data no longer a nice to have, but something business critical for so many industries, did you see any industries in particular in recent years that were really prime candidates for what Druid can deliver?
>> Yeah, that's a great question.
And you can imagine that the industries that really heavily rely on fast decision making are the ones that are earliest to adopt technologies like this. So, in the security space, and the observability space, as well as working with networking and various forms of backend kind of metrics data, this system has been very popular, and it's been popular because people need to triage (indistinct) as they occur, they need to resolve problems, and they also need immediate visibility, as well as very fast queries on data. Another space is online advertising. Online advertising nowadays is almost entirely programmatic and digital. So, response times are critical in order to make decisions. And that's where Druid was actually born. It was born for advertising before it kind of went everywhere else. We're seeing it more in fraud protection, fraud prevention, as well as fraud diagnostics nowadays. We're seeing it in retail as well, which is pretty interesting. And, of course, I believe every industry and every vertical needs the capabilities that we provide. So hopefully, we see a whole lot more use cases in the near future.
>> Right, it's absolutely horizontal these days. So, 10-year history, you've got a community of thousands, what's the future of Druid? What do you see when you open the crystal ball and look down the road 12 months, 18 months?
>> Yeah. So, I think as a technologist, your goal, at least for me, is to try and create technology that has as much applicability as possible and solves problems for as many people as possible. That's always the way I think about it. So, I want to do good engineering and I want to build good systems. And I think the hallmark of a really good system is that you can solve all different types of problems and condense all these different problems, actually, into the same set of models and the same set of principles.
And, a thing that makes me most excited about Druid is the many, many different industries that it's found value in and the many different use cases it's found value in. So, if I were to give a 30,000-foot roadmap, that's what we're trying to do with the next generation of Druid. We're actually doing a pretty major engine upgrade right now, and a pretty major overhaul of the entire system. And the goal of that is to take all the learnings that we've had over the last decade and to create something new that can solve an expanded set of problems that we've heard from the community and from other places as well.
>> Excellent. FJ, exciting work that you've done the last 10 years. Congratulations on that. Looking forward to the roadmap that you talked about. Thanks for sharing what Druid is, the Imply connection, and all the different use cases where it applies. We appreciate your insights.
>> Appreciate you having me on the show. Thank you very much.
>> My pleasure. For FJ Yang, I'm Lisa Martin. You're watching this CUBE Conversation, the leader in live tech enterprise coverage.
(bright upbeat music)
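
Editor's illustration: in the conversation, FJ summarizes Druid as "a search system, and a data warehouse, and a time series database all mixed together." The toy Python sketch below is one way to picture how those three ideas can combine in a single store. It is not Druid's implementation or API; the class `ToySegmentStore`, its methods, and the one-hour chunk size are all hypothetical names and choices made up for this example. Rows are grouped into time chunks (the time series idea), each chunk keeps an inverted index from dimension values to row ids (the search idea), and the metric is kept as a plain column that is summed at query time (the warehouse idea).

```python
from collections import defaultdict


class ToySegmentStore:
    """Hypothetical toy store for illustration only; not Druid's design or API."""

    def __init__(self, chunk_seconds=3600):
        self.chunk_seconds = chunk_seconds
        # Maps chunk start time -> segment (columns plus inverted index).
        self.segments = {}

    def _segment_for(self, timestamp):
        # Time series idea: every row lands in the chunk covering its timestamp.
        start = timestamp - timestamp % self.chunk_seconds
        if start not in self.segments:
            self.segments[start] = {
                "timestamps": [],            # timestamp column
                "metric": [],                # metric column
                "index": defaultdict(set),   # (dimension, value) -> row ids
            }
        return self.segments[start]

    def insert(self, timestamp, dimensions, value):
        seg = self._segment_for(timestamp)
        row_id = len(seg["metric"])
        seg["timestamps"].append(timestamp)
        seg["metric"].append(value)
        # Search idea: index each dimension value so filters avoid full scans.
        for name, dim_value in dimensions.items():
            seg["index"][(name, dim_value)].add(row_id)

    def sum_where(self, start, end, filters):
        """Sum the metric for rows in [start, end) matching every filter."""
        total = 0
        for chunk_start, seg in self.segments.items():
            # Skip whole chunks that cannot overlap the requested interval.
            if chunk_start + self.chunk_seconds <= start or chunk_start >= end:
                continue
            rows = None
            for key in filters.items():
                matches = seg["index"].get(key, set())
                rows = matches if rows is None else rows & matches
            if rows is None:
                # No filters given: consider every row in the chunk.
                rows = range(len(seg["metric"]))
            # Warehouse idea: aggregate over the metric column.
            for row_id in rows:
                if start <= seg["timestamps"][row_id] < end:
                    total += seg["metric"][row_id]
        return total


store = ToySegmentStore()
store.insert(10, {"country": "US", "device": "mobile"}, 5)
store.insert(20, {"country": "US", "device": "desktop"}, 7)
store.insert(4000, {"country": "US", "device": "mobile"}, 11)
store.insert(30, {"country": "CA", "device": "mobile"}, 3)

print(store.sum_where(0, 3600, {"country": "US"}))                      # 12
print(store.sum_where(0, 7200, {"country": "US", "device": "mobile"}))  # 16
```

The first query only touches the first hourly chunk, and within it only the rows the index returns for `country = US`. A real system adds compression, distribution, and much more, so this sketch should be read as a mental model rather than a description of how Druid works internally.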

Published Date : Mar 23 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
thousands | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
Lisa | PERSON | 0.99+
Snapchat | ORGANIZATION | 0.99+
10-year | QUANTITY | 0.99+
18 months | QUANTITY | 0.99+
FJ Yang | PERSON | 0.99+
three | QUANTITY | 0.99+
Imply | ORGANIZATION | 0.99+
Confluent | ORGANIZATION | 0.99+
12 months | QUANTITY | 0.99+
30,000 foot | QUANTITY | 0.99+
Druid | TITLE | 0.99+
each | QUANTITY | 0.99+
one | QUANTITY | 0.99+
Fangjin Yang | PERSON | 0.99+
first time | QUANTITY | 0.98+
Today | DATE | 0.98+
Google | ORGANIZATION | 0.98+
today | DATE | 0.98+
millions of people | QUANTITY | 0.98+
One | QUANTITY | 0.98+
Imply.io | ORGANIZATION | 0.97+
Metamarkets | ORGANIZATION | 0.96+
five-person | QUANTITY | 0.96+
first employees | QUANTITY | 0.94+
tens of thousands of people | QUANTITY | 0.94+
pandemic | EVENT | 0.94+
last couple of years | DATE | 0.91+
FJ | PERSON | 0.91+
70s | DATE | 0.89+
one thing | QUANTITY | 0.89+
Databricks | ORGANIZATION | 0.88+
one point | QUANTITY | 0.87+
Druid | PERSON | 0.84+
couple years ago | DATE | 0.81+
last decade | DATE | 0.75+
Apache Druid | ORGANIZATION | 0.73+
Conversation | EVENT | 0.73+
Apache | ORGANIZATION | 0.72+
last 10 years | DATE | 0.72+
double | QUANTITY | 0.69+
Spark | TITLE | 0.66+
my favorite products | QUANTITY | 0.62+
CUBE Conversation | TITLE | 0.58+
minutes | QUANTITY | 0.54+
minute | QUANTITY | 0.51+
Kafka | TITLE | 0.41+
CUBE Conversation | EVENT | 0.31+