Gabriela de Queiroz, Microsoft | WiDS 2023

(upbeat music) >> Welcome back to theCUBE's coverage of Women in Data Science 2023 live from Stanford University. This is Lisa Martin. My co-host is Tracy Yuan. We've been having great conversations all day, but you know that, 'cause you've been watching. We've been interviewing some very inspiring women and some men as well, talking about all of the amazing applications of data science. You're not going to want to miss this next conversation. Our guest is Gabriela de Queiroz, Principal Cloud Advocate Manager at Microsoft. Welcome, Gabriela. We're excited to have you. >> Thank you very much. I'm so excited to be talking to you. >> Yeah, you're on theCUBE. >> Yeah, finally. (Lisa laughing) Like a dream come true. (laughs) >> I know and we love that. We're so thrilled to have you. So you have a ton of experience in the data space. I was doing some research on you. You've worked in software, finance, advertising, health. Talk to us a little bit about you. What's your background? >> So I was trained in statistics. So I'm a statistician and then I worked in epidemiology. I worked with air pollution and public health. So I was a researcher before moving into the industry. So as I was talking about today, the weekly paths, it's exactly who I am. I went back and forth and back and forth and stopped and tried something else until I figured out that I want to do data science and that I want to do different things because with data science we can... The beauty of data science is that you can move across domains. So I worked in healthcare, financial, and then different technology companies. >> Well the nice thing, one of the exciting things about data science that I geek out about, and Tracy knows 'cause we've been talking about this all day, it's just all the different, to your point, diverse, pun intended, applications of data science. You know, this morning we were talking about, we had the VP of data science from Meta as a keynote.
She came on theCUBE and really kind of explained it from a content perspective, from a monetization perspective, and of course so many people in the world are users of Facebook. It makes it tangible. But we also heard today conversations about the applications of data science in police violence, in climate change. We're in California, we're expecting a massive rainstorm and we don't know what to do when it rains or snows. But climate change is real. Everyone's talking about it, and there's data science at its foundation. That's one of the things that I love. But you also have a lot of experience building diverse teams. Talk a little bit about that. You've created some very sophisticated data science solutions. Talk about your recommendation to others to build diverse teams. What's in it for them? And maybe share a data science project or two that you really found inspirational. >> Yeah, absolutely. So I do love building teams. Every time I'm given the task of building teams, I feel like the luckiest person in the world because you have the option to pick like different backgrounds and all the diverse set of like people that you can find. I don't think it's easy; like people say, it's very hard. You have to be intentional. You have to go from the very first part when you are writing the job description through the interview process. So you have to be very intentional in every step. And you have to think it through when you are doing that. And I love, like my last team, we had like 10 people and we were so diverse. Like just talking about languages. We had like 15 languages inside the team. So how beautiful it is. Like all different backgrounds, like myself as a statistician, but we had people from engineering background, biology, languages, and so on. So it's, yeah, like every time thinking about building a team, if you want your team to be diverse, you need to be intentional.
>> I'm so glad you brought up that intention point because that really is the fundamental requirement, to build it with intention. >> Exactly, and I love to hear like how there's different languages. So like I'm assuming, or like different backgrounds, I'm assuming everybody just zigzags their way into the team and now you're all women in data science and I think that's so precious. >> Exactly. And not only women, right. >> Tracy: Not only women, you're right. >> The team was diverse not only in terms of like gender, but like background, ethnicity, and spoken languages, and the languages that they use to program, and backgrounds. Like as I mentioned, not everybody did statistics or computer science in school. And one of my best teams was when we had this combination, also like things that I'm good at, the other person is not as good at, and we had this knowledge sharing all the time. Every day I would feel like I'm learning something. In a small talk or if I was reviewing something, there was always something new because of like the richness of the diverse set of people that were in my team. >> Well what you've done is so impressive, because not only have you been intentional with it, but you sound like the hallmark of a great leader, someone who hires and builds teams to fill gaps. They don't have to know less than I do for me to be the leader. They have to have different skills, different areas of expertise. That is really, honestly Gabriela, that's the hallmark of a great leader. And that's not easy to come by. So tell me, who were some of your mentors and sponsors along the way that maybe influenced you in that direction? Or is that just who you are? >> That's a great question. And I joke that I want to be the role model that I never had, right. So growing up, I didn't have anyone that I could see other than my mom probably or my sister. But there was no one that I could see and think, I want to become that person one day.
And once I was tracing my path, I started to see people looking at me and like, you inspire me so much, and I'm like, oh wow, this is amazing and I want to do this over and over and over again. So I want to be that person to inspire others. And no matter if I'll be a VP, CEO, whoever, you know, I want to keep inspiring people because that's so valuable. >> Lisa: Oh, that's huge. >> And I feel like when we grow professionally and then go to the next level, we sometimes lose that, you know, thing that's essential. And I think also like, it's part of who I am; as I was building and going through all my experiences, I became, as I mentioned, a unique person, and I think we all are unique somehow. >> You're a rockstar. Isn't she a rockstar? >> You're dropping quotes out. >> I'm loving this. I'm like, I've inspired Gabriela. (Gabriela laughing) >> Oh my God. But yeah, 'cause we were asking our other guests the same question, like, who are your role models? And then we're talking about how like it's very important for women to see that there is representation, that there is someone they look up to and want to be. And so that like, it motivates them to stay in this field and to start in this field to begin with. So yeah, I think like you are definitely filling a void for all these women who dream to be in data science. And I think that's just amazing. >> And you're a founder too. In 2012, you founded R-Ladies. Talk a little bit about that. It's present in more than 200 cities in 55 plus countries. Talk about R-Ladies and maybe the catalyst to launch it. >> Yes, so you always start, so I'm from Brazil, I always talk about this because it's such, again, I grew up over there. So I was there my whole life and then I moved here, to Silicon Valley. And when I moved to San Francisco, like the doors opened. So many things happening in the city. That was back in 2012. Data science was exploding.
And I found out about Meetup.com, a website where you can join and go to all these events. And I was going to these events and I joke that it was kind of like going to Disneyland, where you don't know if you should go in one direction or the other. >> Yeah, yeah. >> And I was like, should I go and learn about data visualization? Should I go and learn about SQL or should I go and learn about Hadoop, right? So I would go every day to those meetups. And I was a student back then, so you know, the budget was very restricted as a student. So we didn't have much to spend. And then they would serve dinner and you would learn for free. And then I got to a point where I was like, hey, they are doing all of this as volunteers. Like they are running these meetups and events for free. And I felt like it's a cycle. I need to do something, right. I'm taking all this in. I'm having this huge opportunity to be here. I want to give back. So that's how everything started. I was like, no, I have to think about something. I need to think about something that I can give back. And I was using R back then and I'm like how about I do something with R. I love R, I'm so passionate about R, what if I create a community around R but not a regular community, because by going to these events, I felt that as a Latina and as a woman, I was always in the corner and I was not able to participate and to, you know, be myself and to network and ask questions. I would be in the corner. So I said to myself, what if I do something where everybody feels included, where everybody can participate, can share, can ask questions without judgment? So that's how R-Ladies all came together. >> That's awesome. >> Talk about intentions, like you had that goal in mind, but yeah, I wanted to dive a little bit into R.
So could you please talk more about where the passion for R came from, and how the special connection between you and R, the language, was born? >> It was not love at first sight. >> No. >> Not at all. Not at all. Because that was back in Brazil. So all the documentation was in English, all the tutorials, only two. We had like very few tutorials. It was not like nowadays that we have so many tutorials and courses. There were like two tutorials, and the documentation was in English. So it was hard for me, as someone who didn't know much English, to go through the language, and learning to program was not an easy task. But then as I was going through the language and learning and reading books and finding the people behind the language, I don't know how I fell in love. And then when I came to San Francisco, I saw some of like the main contributors who were speaking in person and I'm like, wow, they are like humans. I don't know, it was like, I have no idea why I had this love. But I think the people and then the community was the thing that kept me with the R language. >> Yeah, the community factor is so important. And it's so, at WIDS it's so palpable. I mean I literally walk in the door, every WIDS I've done, I think I've been doing them for theCUBE since 2017. theCUBE has been here since the beginning in 2015 with our co-founders. But you walk in, you get this sense of belonging. And this sense of I can do anything, why not? Why not me? Look at her up there, and now look at you speaking in the technical talk today and on theCUBE. So inspiring. One of the things that I always think is you can't be what you can't see. We need to be able to see more people that look like you and sound like you and like me and like you as well. And WIDS gives us that opportunity, which is fantastic, but it's also helping to move the needle, really. And I was looking at some of the AnitaB.org stats just yesterday about 2022.
And they're showing, you know, the percentage of females in technical roles has been hovering around 25% for a while. It's a little higher now. I think it's 27.6% according to AnitaB. We're seeing more women hired in roles. But one of the challenges, and I would love to get your advice on this for those who might be in this situation, is attrition, women who are leaving roles. What would your advice be to a woman who might be trying to navigate family and work and career ladder to stay in that role and keep pushing forward? >> I'll go back to the community. If you don't have a community around you, it's so hard to navigate. >> That's a great point. >> You are lonely. There is no one that you can bounce ideas off, that you can share what you are feeling with, or that you can learn from as well. So sometimes you feel like you are the only person that is going through that problem or like, you maybe have a family or you are planning to have a family and you have to make a decision. But you've never seen anyone going through this. So when you have a community, you see people like you, right. So that's where we were saying about having different people and people like you so they can share as well. And you feel like, oh yeah, so they went through this, they succeeded. I can also go through this and succeed. So I think the attrition problem is still a big problem. And I'm sure it will be worse now with everything that is happening in tech with layoffs. >> Yes and the great resignation. >> Yeah. >> We are going back, you know, a few steps on a lot of the advancements that we made. I feel like we are going back unfortunately, but I always say this, make sure that you have a community. Make sure that you have a mentor. Make sure that you have someone or some people, not only one mentor, different mentors, that can support you through this trajectory. Because it's not easy. But there are a lot of us out there. >> There really are. And that's a great point.
I love everything about the community. It's all about that network effect and feeling like you belong- >> That's all WIDS is about. >> Yeah. >> Yes. Absolutely. >> Like coming over here, it's like seeing the old friends again. It's like I'm so glad that I'm coming because I see all my old friends that I only see like maybe once a year. >> Tracy: Reunion. >> Yeah, exactly. And I feel like our tank gets, you know- >> Lisa: Replenished. >> Exactly. For the rest of the year. >> Yes. >> Oh, that's precious. >> I love that. >> I agree with that. I think one of the things, when I say, you know, you can't be what you can't see, I think, well, how many females in technology would I be able to recognize? And of course you can be a female technologist working in the healthcare sector or working in finance or manufacturing, but, you know, we need to be able to have more that we can see and identify. And one of the things that I recently found out, I was telling Tracy this earlier that I geeked out about, was finding out that the CTO of OpenAI, the company behind ChatGPT, is a female. I'm like, (gasps) why aren't we talking about this more? She was profiled in Fast Company. I've seen a few pieces on her, Mira Murati. But we're hearing so much about ChatJTP being... ChatGPT, I always get that wrong, about being like, likening it to the launch of the iPhone, which revolutionized mobile and connectivity. And here we have a female in the technical role. Let's put her on a pedestal because that is hugely inspiring. >> Exactly, like let's bring everybody to the front. >> Yes. >> Right. >> And let's have them talk to us because like, you didn't know. I didn't know probably about this, right. You didn't know. Like, we don't know about this. It's kind of like we are hidden. We need to give them the spotlight. Give every woman the spotlight, so they can keep inspiring the new generation. >> Or Susan Wojcicki who ran, how long did she run YouTube?
All the YouTube influencers, who are influential for whatever they're doing on YouTube and different social platforms, probably have no idea. Do you realize there was a female at the helm for a long time who turned it into what it is today? That's outstanding. Why aren't we talking about this more? >> How about Megan Smith, who was the first female U.S. CTO, in the Obama administration? >> That's right. I knew it had to do with Obama. Couldn't remember. Yes. Let's find more pedestals. But organizations like WIDS, your involvement as a speaker, showing more people you can be this because you can see it, >> Yeah, exactly. >> is the right direction that will hopefully help bring us back to some of the pre-pandemic levels, and keep moving forward because there's so much potential with data science that can impact everyone's lives. I always think, you know, we have this expectation that we have our mobile phone and we can get whatever we want wherever we are in the world and whatever time of day it is. And that's all data driven. The regular average person that's not in tech thinks about data as, well, I'm paying for it. What are all these data charges? But it's powering the world. It's powering those experiences that we all want as consumers or in our business lives, where we expect to be able to do a transaction, whether it's something in a CRM system or an Uber transaction like that, and have the app respond, maybe even know me a little bit better than I know myself. And that's all data. So I think we're just at the precipice of the massive impact that data science will make in our lives. And luckily we have leaders like you who can help navigate us along this path. >> Thank you. >> Last question for you: what advice do you have for those in the audience who might be nervous or maybe lack a little bit of confidence to say, I really like data science, or I really like engineering, but I don't see a lot of me out there? What would you say to them?
>> Especially for people who are coming from a non-linear track. >> Yeah, I would say keep going. Keep going. I don't think it's easy. It's not easy. But keep going because the more you go the more, again, you advance and there are opportunities out there. Sometimes it takes a little bit, but just keep going. Keep going and following your dreams, and you'll get there, right. So again, data science is such a broad field that doesn't require you to come from a specific background. And I think the beauty of data science is exactly this, the combination; the most successful data science teams are the teams that have all these different backgrounds. So if you think that we as data scientists started programming when we were nine, that's not true, right. You can be 30, 40, shifting careers, starting to program right now. It doesn't matter. Like you get there no matter how old you are. And no matter what your background is. >> There's no limit. >> There are no limits. >> I love that, Gabriela, >> Thank you so much. >> for inspiring. I know you inspired me. I'm pretty sure you probably inspired Tracy with your story. And sometimes like what you just said, you have to be your own mentor and that's okay. Because eventually you're going to turn into a mentor for many, many others and it sounds like you're already paving that path and we so appreciate it. You are now officially a CUBE alumni. >> Yes. Thank you. >> Yay. We've loved having you. Thank you so much for your time. >> Thank you. Thank you. >> For our guest and for Tracy Yuan, this is Lisa Martin. We are live at WIDS 23, the eighth annual Women in Data Science Conference at Stanford. Stick around. Our next guest joins us in just a few minutes. (upbeat music)

Published Date : Mar 8 2023

SUMMARY :

but you know, 'cause you've been watching. I'm so excited to be talking to you. Like a dream come true. So you have a ton of is that you can move across domains. But you also have a lot of like people that you can find. because that is the Exactly, and I love to hear And not only woman, right. that I'm good at the other Or is that just who you are? And I joke that I want And I feel like when You're a rockstar. I'm loving this. So yeah, I think like you the catalyst to launch it. And I was going to this event And I was like, and like how did the special I saw some of like the main more people that look like you If you don't have a community around you, There is no one that you Make sure that you have a mentor. and feeling like you belong- it's like seeing the old friends again. And I feel like that For the rest of the year. And of course you can be everybody to the front. you didn't know. do you realize there was on the Obama administration. because you can see it, I always think, you know, What would you say to them? are from like a non-linear track that doesn't require you to I know you inspired me. you so much for your time. Thank you. the eighth annual Women

ENTITIES

Entity	Category	Confidence
Tracy Yuan	PERSON	0.99+
Megan Smith	PERSON	0.99+
Gabriela de Queiroz	PERSON	0.99+
Susan Wojcicki	PERSON	0.99+
Gabriela	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Brazil	LOCATION	0.99+
2015	DATE	0.99+
2012	DATE	0.99+
San Francisco	LOCATION	0.99+
Tracy	PERSON	0.99+
Obama	PERSON	0.99+
Lisa	PERSON	0.99+
Mira Murati	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
California	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Uber	ORGANIZATION	0.99+
27.6	QUANTITY	0.99+
two	QUANTITY	0.99+
30	QUANTITY	0.99+
40	QUANTITY	0.99+
15 languages	QUANTITY	0.99+
R Ladies	ORGANIZATION	0.99+
two tutorials	QUANTITY	0.99+
Anitab	ORGANIZATION	0.99+
10 people	QUANTITY	0.99+
one	QUANTITY	0.99+
YouTube	ORGANIZATION	0.99+
today	DATE	0.99+
55 plus countries	QUANTITY	0.99+
first part	QUANTITY	0.99+
more than 200 cities	QUANTITY	0.99+
first	QUANTITY	0.98+
nine	QUANTITY	0.98+
SQL	TITLE	0.98+
theCUBE	ORGANIZATION	0.98+
WIDS 23	EVENT	0.98+
Stanford University	ORGANIZATION	0.98+
2017	DATE	0.98+
CUBE	ORGANIZATION	0.97+
Stanford	LOCATION	0.97+
Women in Data Science	TITLE	0.97+
around 25%	QUANTITY	0.96+
Disneyland	LOCATION	0.96+
English	OTHER	0.96+
one mentor	QUANTITY	0.96+
Women in Data Science Conference	EVENT	0.96+
once a year	QUANTITY	0.95+
WIDS	ORGANIZATION	0.92+
this morning	DATE	0.91+
Meetup.com	ORGANIZATION	0.91+
Facebook	ORGANIZATION	0.9+
Hadoop	TITLE	0.89+
WiDS 2023	EVENT	0.88+
Anitab.org	ORGANIZATION	0.87+
ChatJTP	TITLE	0.86+
One	QUANTITY	0.86+
one day	QUANTITY	0.85+
ChatGPT	TITLE	0.84+
pandemic	EVENT	0.81+
Fast Company	ORGANIZATION	0.78+
CTO	PERSON	0.76+
Open	ORGANIZATION	0.76+

Phil Kippen, Snowflake, Dave Whittington, AT&T & Roddy Tranum, AT&T | MWC Barcelona 2023

(gentle music) >> Narrator: "TheCUBE's" live coverage is made possible by funding from Dell Technologies, creating technologies that drive human progress. (upbeat music) >> Hello everybody, welcome back to day four of "theCUBE's" coverage of MWC '23. We're here live at the Fira in Barcelona. Wall-to-wall coverage, John Furrier is in our Palo Alto studio, banging out all the news. Really, the whole week we've been talking about the disaggregation of the telco network, the new opportunities in telco. We're really excited to have AT&T and Snowflake here. Dave Whittington is AVP at the Chief Data Office at AT&T. Roddy Tranum is the Assistant Vice President for Channel Performance Data and Tools at AT&T. And Phil Kippen, the Global Head Of Industry-Telecom at Snowflake, Snowflake's new telecom business. Snowflake just announced earnings last night. Typical Scarpelli: they beat earnings, very conservative guidance, stock's down today, but we like Snowflake long term, they're on that path to $10 billion. Guys, welcome to "theCUBE." Thanks so much >> Phil: Thank you. >> for coming on. >> Dave and Roddy: Thanks Dave. >> Dave, let's start with you. The data culture inside of telco. We've been talking all week about this monolithic system. Super reliable. You guys did a great job during the pandemic. Everything shifting to landlines. We didn't even notice, you guys didn't miss a beat. Saved us. But the data culture's changing inside telco. Explain that. >> Well, absolutely. So, first of all IoT and edge processing is bringing forth new and exciting opportunities all the time. So, we're bridging the world between a lot of the OSS stuff that we can do with edge processing. But bringing that back, and now we're talking about working, and I would say traditionally, we talk data warehouse. Data warehouse and big data are now becoming a single mesh, all right?
And the use cases and the way you can use those, especially I'm taking that edge data and bringing it back over, now I'm running AI and ML models on it, and I'm pushing back to the edge, and I'm combining that with my relational data. So that mesh there is making all the difference. We're getting new use cases that we can do with that. And it's just, and the volume of data is immense. >> Now, I love ChatGPT, but I'm hoping your data models are more accurate than ChatGPT. I never know. Sometimes it's really good, sometimes it's really bad. But in the enterprise, you've got to be clean with your AI, don't you? >> Not only do you have to be clean, you have to monitor it for bias and be ethical about it. We're really good about that. First of all with AT&T, our brand is platinum. We take care of that. So, we may not be as cutting-edge risk takers as others, but when we go to market with an AI or an ML or a product, it's solid. >> Well hey, as telcos go, you guys are leaning into the Cloud. So I mean, that's a good starting point. Roddy, explain your role. You got an interesting title, Channel Performance Data and Tools, what's that all about? >> So literally anything with our consumer, retail, call center channels, all of our channels, from a data perspective and metrics perspective, what it takes to run reps, agents, all the way to leadership levels, scorecards, how you rank in the business, how you're driving the business, from sales, service, customer experience, all that data infrastructure with our great partners on the CDO side, as well as Snowflake, that comes from my team. >> And that's traditionally been done in a, I don't mean it pejoratively, but we're talking about legacy, monolithic, sort of data warehouse technologies. >> Absolutely. >> We have a love-hate relationship with them. It's what we had. It's what we used, right? And now that's evolving. And you guys are leaning into the Cloud. >> Dramatic evolution. And what Snowflake's enabled for us is impeccable.
We've talked about having, people have dreamed of one data warehouse for the longest time and everything in one system. Really, this is the only way that becomes a reality. The more you get into Snowflake, we can have golden source data, and instead of duplicating that 50 times across AT&T, it's in one place, we just share it, everybody leverages it, and now it's not duplicated, and the process efficiency is just incredible. >> But it really hinges on that separation of storage and compute. And we talk about the monolithic warehouse, and one of the nightmares I've lived with is having a monolithic warehouse. And let's just go with some of my primary, traditional customers, sales, marketing and finance. They are leveraging BSS/OSS data all the time. For me to coordinate a deployment, I have to make sure that each one of these units can take an outage, if it's going to be a long deployment. With the separation of storage and compute, they own their own compute cluster. So I can move faster for these people. 'Cause now I can implement his code without impacting finance or marketing. This brings CI/CD closer to reality. It brings us faster to market with more features. So if he wants to implement a new comp plan for the field reps, or we're reacting to the marketplace, where one of our competitors has done something, we can do that in days, versus waiting weeks or months. >> And we've reported on this a lot. This is the brilliance of Snowflake's founders, that whole separation >> Yep. >> from compute and data. I like, Dave, that you're starting with sort of the business flexibility, 'cause there's a cost element of this too. You can dial down, you can turn off compute, and then of course the whole world said, "Hey, that's a good idea." And a VC started throwing money at Amazon, but Redshift said, "Oh, we can do that too, sort of, can't turn off the compute." But I want to ask you Phil, so, >> Sure.
>> it looks from my vantage point like you're taking your Data Cloud message, which was originally separating compute from storage, simplification, now data sharing, automated governance, security, ultimately the marketplace. >> Phil: Right. >> Taking that same model, break down the silos into telecom, right? It's that same, >> Mm-hmm. >> sorry to use the term playbook, Frank Slootman tells me he doesn't use playbooks, he's not a pattern matcher, he's a situational CEO, he says. But the situation in telco calls for that type of strategy. So explain what you guys are doing in telco. >> I think there's, so, what we're launching, we launched last week, and it really was three components, right? So we had our platform as you mentioned, >> Dave: Mm-hmm. >> and that platform is being utilized by a number of different companies today. We also are adding, for telecom very specifically, we're adding capabilities in marketplace, so that service providers can not only use some of the data and apps that are in marketplace, but as well service providers can go and sell applications or sell data that they have built. And then as well, we're adding our ecosystem, it's telecom-specific. So, we're bringing partners in, technology partners, and consulting and services partners, that are very much focused on telecoms and what they do internally, but also helping them monetize new services. >> Okay, so it's not just sort of generic Snowflake into telco? You have specific value there. >> We're purposing the platform specifically for- >> Are you a telco guy? >> I am. >> You are, okay. >> Total telco guy absolutely. >> So there you go. Snowflake actually has an interesting organizational structure, 'cause you're going after verticals, which is kind of rare for a company of your sort of inventory, I'll say, >> Absolutely. >> I don't mean that as a negative. (Dave laughs) So Dave, take us through the data journey at AT&T. It's a long history.
You don't have to go back to the 1800s, but- (Dave laughs) >> Thank you for pointing that out, we're a 149-year-old company. So, Jesse James was one of the original customers, (Dave laughs) and we no longer have his data. So, I'll go back. I've been 17 years with Cingular and AT&T, and I've watched it through the whole journey, where the monolithics were growing, through the consolidation of small wireless carriers, and we went through that boom. And then we've gone through mergers and acquisitions. But, Hadoop came out, and it was going to solve world hunger. And we had all the aspects of, we're going to monetize and do AI and ML, and some of the things we learned with Hadoop was, we had this monolithic warehouse, we had this file-based-structured Hadoop, but we really didn't know how to bring this all together. And we were bringing items over to the relational, and we were taking the relational and bringing it over to the warehouse, and trying to, and it was a struggle. Let's just go there. And I don't think we were the only company to struggle with that, but we learned a lot. And so now as tech is finally emerging, with the cloud, companies like Snowflake, and others that can handle that, where we can create, we were discussing earlier, but it becomes more of a conducive mesh that's interoperable. So now we're able to simplify that environment. And the cloud is a big thing on that. 'Cause you could not do this on-prem with on-prem technologies. It would be just too cost prohibitive, and too heavy a lift, going back and forth, and managing the data. The simplicity the cloud brings with a smaller set of tools, and I'll say in the data space specifically, really allows us, maybe not a single instance of data for all use cases, but a greatly reduced ecosystem. And when you simplify your ecosystem, you simplify speed to market and data management. >> So I'm going to ask you, I know it's kind of internal organizational plumbing, but it'll inform my next question.
So, Dave, you're with the Chief Data Office, and Roddy, you're kind of, you all serve the business, but you're really serving the, you're closer to those guys, they're banging on your door for- >> Absolutely. I try to keep the 130,000 users who may or may not have issues sometimes with our data and metrics away from Dave. And he just gets a call from me. >> And he only calls when he has a problem. He's never wished me happy birthday. (Dave and Phil laugh) >> So the reason I asked that is because, you described, Dave, some of the Hadoop days, and again love-hate with that, but we had hyper-specialized roles. We still do. You've got data engineers, data scientists, data analysts, and you've got this sort of pipeline, and it had to be this sequential pipeline. I know Snowflake and others have come along to simplify that. My question to you is, how are those roles changing? How is data getting closer to the business? Everybody talks about democratizing data. Are you doing that? What's a real use example? >> From our perspective, those roles, a lot of those roles on my team for years, because we're all about efficiency, >> Dave: Mm-hmm. >> we cut across those areas, and always have cut across those areas. So now we're into a space where things have been simplified, data processes and copying, we've gone from 40 data processes down to five steps now. We've gone from five steps to one step. We've gone from days to hours, hours to minutes, minutes to seconds. Literally we're seeing that time in and time out with Snowflake. So these resources that have spent all their time on data engineering and moving data around are now freed up to focus on what they have skills for and always have, the data analytics side of the business, and driving the business forward with new metrics and new analysis. That's some of the great operational value that we've seen here. As this simplification happens, it frees up brain power.
>> So, you're pumping data from the OSS, the BSS, the OKRs, everywhere >> Everywhere. >> into Snowflake? >> Scheduling systems, you name it. If you can think of what drives our retail and centers and online, all that data, scheduling system, chat data, call center data, call detail data, all of that enters into this common infrastructure to manage the business on a day in and day out basis. >> How are the roles and the skill sets changing? 'Cause you're doing a lot less ETL, you're doing a lot less moving of data around. There were guys that were probably really good at that. I used to joke, when I was in the storage world, like if your job is managing LUNs, you need to look for a new job, right? So, and they did, and people moved on. So, are you able to sort of redeploy those assets, and those people, those human resources? >> These folks are highly skilled. And as we were talking about earlier, SQL hasn't gone away. Relational databases are not going away. And that's one thing that's made this migration excellent, they're just transitioning their skills. Experts in legacy systems are now rapidly becoming experts on the Snowflake side. And it has not been that hard a transition. There are certainly nuances, things that don't operate as well in the cloud environment, that we have to learn and optimize. But we're making that transition. >> Dave: So just, >> Please. >> within the Chief Data Office we have a couple of missions, and Roddy is a great partner and an example of how it works. We try to bring the data for democratization, so that we have one interface, and now hopefully we just have a logical connection back to these Snowflake instances that we connect. But we're providing that governance and cleansing, and if there's a business rule at the enterprise level, we provide it. But the goal at CDO is to make sure that business units like Roddy's, or marketing, or finance, can come to a platform that's reliable, robust, and self-service.
I don't want to be in his way. So I feel like I'm providing a sub-level of platform that he can come to, and anybody can come to, and utilize, so that they're not having to go back and undo what's in Salesforce, or ServiceNow, or in our billers. So, I'm sort of that layer. And then making sure that that ecosystem is robust enough for him to use. >> And that self-service infrastructure is predominantly through the Azure Cloud, correct? >> Dave: Absolutely. >> And you work on other clouds, but it's predominantly through Azure? >> We're predominantly in Azure, yeah. >> Dave: That's the first-party citizen? >> Yeah. >> Okay, I like to think in terms sometimes of data products, and I know you mentioned upfront, you're Gold standard or Platinum standard, you're very careful about personal information. >> Dave: Yeah. >> So you're not trying to sell, I'm an AT&T customer, you're not trying to sell my data, and make money off of my data. So the value prop and the business case for Snowflake is it's simpler. You do things faster, you're in the cloud, lower cost, et cetera. But I presume you're also in the business, AT&T, of making offers and creating packages for customers. I look at those as data products, 'cause it's not a, I mean, yeah, there's a physical phone, but there's data products behind it. So- >> It ultimately is, but not everybody always sees it that way. Data reporting often can be an afterthought. And we're bringing it more to the forefront now. >> Yeah, so I like to think in terms of data products, I mean even in the financial services business, it's a data business. So, if we can think about that sort of metaphor, do you see yourselves as data product builders? Do you have that, do you think about building products in that regard? >> Within the Chief Data Office, we have a data product team, >> Mm-hmm.
>> and by the way, I would be disingenuous if I said, oh, we're very mature in this, but no, it's where we're going, and it's somewhat of a journey, but I've got a peer, and their whole job is to go from, especially as we migrate to cloud, if Roddy or some other group was using tables three, four and five and joining them together, it's like, "Well look, this is an offer data product, so let's combine these and put it up in the cloud, and here's the offer data set product, or here's the opportunity data product," and it's a journey. We're on the way, but we have dedicated staff and time to do this. >> I think one of the hardest parts about that is the organizational aspect of it. Like who owns the data now, right? It used to be owned by the techies, and increasingly the business lines want to have access, you're providing self-service. So there's a discussion about, "Okay, what is a data product? Who's responsible for that data product? Is it in my P&L or your P&L? Somebody's got to sign up for that number." So, it sounds like those discussions are taking place. >> They are. And, we feel like we're more the, and CDO at least, we feel more, we're like the guardians and the shepherds, but not the owners. I mean, we have a role in it all, but he owns his metrics. >> Yeah, and even from our perspective, we see ourselves as an enabler of whatever AT&T wants to make happen in terms of the key products and offers, trade-in offers, trade-in programs, all that requires this data infrastructure, and managing reps and agents, and what they do from a channel performance perspective. We still see ourselves as key enablers of that. And we've got to be flexible, and respond quickly to the business. >> I always had empathy for the data engineer, and he or she had to service all these different lines of business with no business context. >> Yeah.
>> Like the business knows good data from bad data, and then they just pound that poor individual, and they're like, "Okay, I'm doing my best. It's just ones and zeros to me." So, it sounds like that's, you're on that path. >> Yeah absolutely, and I think we do have refined, getting more and more refined owners of, since Snowflake enables this golden source data, everybody sees me and my organization for channel performance data, they go to Roddy's team, we have a great team, and we go to Dave in terms of making it all happen from a data infrastructure perspective. So we do have a lot more refined, "This is where you go for the golden source, this is where it is, this is who owns it. If you want to launch this product and services, and you want to manage reps with it, that's the place you-" >> It's a strong story. So the Chief Data Office doesn't own the data per se, but it's your responsibility to provide the self-service infrastructure, and make sure it's governed properly, and in as automated a way as possible. >> Well, yeah, absolutely. And let me tell you more, everybody talks about a single version of the truth, one instance of the data, but there's context to that, and what we are trying to take advantage of as we do data products is, what's the use case here? So we may have an entity of Roddy as a prospective customer, and we may have an entity of Roddy as a customer, a high-value customer over here, which may have a different mix of data and all, but as a data product, we can then create those for those specific use cases. Still point to the same data, but build it in different constructs. One for marketing, one for sales, one for finance. By the way, that's where your data engineers are struggling. >> Yeah, yeah, of course. So how do I serve all these folks, and really have the context? Is this a common story in telco, >> Absolutely. >> or are these guys ahead of the curve a little bit? Or where would you put them?
>> I think they're definitely moving a lot faster than the industry generally. I think the enabling technologies, like for instance, having that single copy of data that everybody sees, a single pane of glass, right, that's definitely something that everybody wants to get to. Not many people are there. I think what AT&T's doing is most definitely a little bit further ahead than the industry generally. And I think the successes that are coming out of that, and the learning experiences, are starting to generate momentum within AT&T. So I think it's not just about the product, and having a product now that gives you a single copy of data. It's about the experiences, right? And now, how the teams are getting trained, domains like network engineering for instance. They typically haven't been a part of data discussions, because they've got a lot of data, but they're focused on the infrastructure. >> Mm. >> So, deploying this platform for the platform's purpose, right, and the business value, that's one thing, but also to start getting that experience, and bringing new experience in to help other groups that traditionally hadn't been data-centric, that's also a huge step ahead, right? So you need to enable those groups. >> A big complaint of course we hear at MWC from carriers is, "The over-the-top guys are killing us. They're riding on our networks, et cetera, et cetera. They have all the data, they have all the client relationships." Do you see your client relationships changing as a result of sort of your data culture evolving? >> Yes, I'm not sure I can- >> It's a loaded question, I know. >> Yeah, and then I, so, we want to start embedding as much into our network on the proprietary value that we have, so we can start getting into that OTT play. Like any other carrier, we have distinct advantages in what we can do at the edge, and we just need to start exploiting those.
But you know, 'cause whether it's location or whatnot, so we've got to eat into that. Historically, the network is where we make our money, and we stack the services on top of it. It used to be *69. >> Dave: Yeah. >> If anybody remembers that. >> Dave: Yeah, of course. (Dave laughs) >> But you know, it was stacked on top of our network. Then we stack another product on top of it. It'll be at the edge where we start providing distinct value to other partners as we- >> I mean, it's a great business that you're in. I mean, they're really good at connectivity. >> Dave: Yeah. >> And so, it sounds like it's still to be determined >> Dave: Yeah. >> where you can go with this. You have to be super careful with private and personal information. >> Dave: Yep. >> Yeah, but the opportunities are enormous. >> There's a lot. >> Yeah, particularly at the edge, looking at, private networks are just an amazing opportunity. Factories, you name it, hospitals, remote hospitals, remote locations. I mean- >> Dave: Connected cars. >> Connected cars are really interesting, right? I mean, if you start communicating car to car, and actually drive that, (Dave laughs) I mean that's, now we're getting to Byzantine fault tolerance, people. This is it. >> Dave: That's not, let's hold the traffic. >> Doesn't scare me as much as we actually learn. (all laugh) >> So how's the show been for you guys? >> Dave: Awesome. >> What're your big takeaways from- >> Tremendous experience. I mean, as someone who doesn't go outside the United States much, I'm a homebody. The whole experience, the whole trip, the city, Mobile World Congress, the technologies that are out here, it's been a blast. >> Anything, top two things you learned, advice you'd give to others, your colleagues out there in general? >> In general, we talked a lot about technologies today, and we talked a lot about data, but I'm going to tell you what, the accelerator that you cannot change is the relationship that we have.
So when the tech and the business can work together toward a common goal, and it's a partnership, you get things done. So, I don't know how many CDOs or CIOs or CEOs are out there, but this connection is what accelerates and makes it work. >> And that is our audience, Dave. I mean, it's all about that alignment. So guys, I really appreciate you coming in and sharing your story on "theCUBE." Great stuff. >> Thank you. >> Thanks a lot. >> All right, thanks everybody. Thank you for watching. I'll be right back with Dave Nicholson. Day four of SiliconANGLE's coverage of MWC '23. You're watching "theCUBE." (gentle music)

Published Date : Mar 2 2023

SUMMARY :

that drive human progress. And Phil Kippen, the Global But the data culture's of the OSS stuff that we But enterprise, you got to be So, we may not be as cutting-edge Channel Performance Data and all the way to leadership I don't mean the pejorative, And you guys are leaning into the Cloud. and the process efficiency and one of the nightmares I've lived with, This is the brilliance of the business flexibility, like you're taking your Data Cloud message But the situation in telco and that platform is being utilized You have specific value there. I am. So there you go. I don't mean that as a negative. and some of the things we and Roddy, you're kind of, And he just gets a call from me. (Dave and Phil laugh) and it had to be this sequential pipeline. and always have, the data all of that enters into How are the roles and in the cloud environment that But the goal at CDO is to and I know you've mentioned upfront, So the value prop and the on the forefront now. I mean even if the and by the way, I wouldn't and increasingly the business and the shepherds, but not the owners. and respond quickly to the business. and he or she had to service Like the business knows and we go to Dave in terms doesn't own the data per se, and we may have a entity and really have the and having a product now that gives you and the business value, that's one thing, They have all the data, on the proprietary value that we have, Dave: Yeah, of course. It'll be in the edge business that you're in. You have to be super careful Yeah, but the particularly at the edge, and actually drive that, let's hold the traffic. much as we actually learn. the whole trip, city, is the relationship that we have. and sharing your story in "theCUBE." Thank you for watching.

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Dave Whittington	PERSON	0.99+
Frank Slootman	PERSON	0.99+
Roddy	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Phil	PERSON	0.99+
Phil Kippen	PERSON	0.99+
AT&T	ORGANIZATION	0.99+
Jesse James	PERSON	0.99+
AT&T.	ORGANIZATION	0.99+
five steps	QUANTITY	0.99+
Dave Nicholson	PERSON	0.99+
John Furrier	PERSON	0.99+
50 times	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
Roddy Tranum	PERSON	0.99+
10 billion	QUANTITY	0.99+
one step	QUANTITY	0.99+
17 years	QUANTITY	0.99+
130,000 users	QUANTITY	0.99+
United States	LOCATION	0.99+
1800s	DATE	0.99+
last week	DATE	0.99+
Barcelona	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
Dell Technologies	ORGANIZATION	0.99+
last night	DATE	0.99+
MWC '23	EVENT	0.98+
telco	ORGANIZATION	0.98+
one system	QUANTITY	0.98+
one	QUANTITY	0.98+
40 data processes	QUANTITY	0.98+
today	DATE	0.98+
one place	QUANTITY	0.97+
P&L	ORGANIZATION	0.97+
telcos	ORGANIZATION	0.97+
CDO	ORGANIZATION	0.97+
149-year-old	QUANTITY	0.97+
five	QUANTITY	0.97+
single	QUANTITY	0.96+
three components	QUANTITY	0.96+
One	QUANTITY	0.96+

SiliconANGLE News | Beyond the Buzz: A deep dive into the impact of AI

(upbeat music) >> Hello, everyone, welcome to theCUBE. I'm John Furrier, the host of theCUBE in Palo Alto, California. This is also SiliconANGLE News. Got two great guests here to talk about AI, the impact on the future of the internet, the applications, the people. Amr Awadallah, the founder and CEO, and Ed Albanese, the COO of Vectara, a new startup that emerged out of the original Cloudera, I would say, 'cause Amr's known, famous for the Cloudera founding, which was really the beginning of the big data movement. And now as AI goes mainstream, there's so much to talk about, so much going on. And plus the new company is one of the, now what I call the wave, this next big wave, I call it the fifth wave in the industry. You know, you had PCs, you had the internet, you had mobile. This generative AI thing is real. And you're starting to see startups come out in droves. Amr obviously was founder of Cloudera, Big Data, and now Vectara. And Ed Albanese, you guys have a new company. Welcome to the show. >> Thank you. It's great to be here. >> So great to see you. Now the story is theCUBE started in the Cloudera office. Thanks to you, and your friendly entrepreneurship views that you have. We got to know each other over the years. But Cloudera had Hadoop, which was the beginning of what I call the big data wave, which then became what we now call data lakes, data oceans, and data infrastructure that's developed from that. It's interesting to look back 12-plus years, and see that what AI is doing now, right now, is opening up the eyes of the mainstream, and the applications are almost mind-blowing. You know, Satya Nadella called it the Mosaic moment, didn't say Netscape, he built Netscape (laughing) but called it the Mosaic moment. You're seeing companies and startups, kind of the alpha geeks, running here, because this is the new frontier, and there's real meat on the bone in terms of things to do. Why? Why is this happening now?
What is the confluence of forces that is making this happen? >> Yeah, I mean if you go back to the Cloudera days, with big data, and so on, that was more about data processing. Like how can we process data, so we can extract numbers from it, and do reporting, and maybe take some actions, like this is a fraud transaction, or this is not. And in the meanwhile, many of the researchers working in the neural network and deep neural network space were trying to focus on data understanding, like how can I understand the data, and learn from it, so I can take actual actions based on the data directly, just like a human does. And we were only good at doing that at the level of somebody who was five years old, or seven years old, all the way until about 2013. And starting in 2013, which is only 10 years ago, a number of key innovations started taking place, and each one added on. There was no single major innovation that just took place. It was a couple of really incremental ones, but they added on top of each other, in a very exponentially additive way, that led to, by the end of 2019, models, deep neural network models, that can read and understand human text just like we do. Right? And they can reason about it, and argue with you, and explain it to you. And I think that's what is unlocking this whole new wave of innovation that we're seeing right now. So data understanding would be the essence of it. >> So it's not a Big Bang kind of theory, it's been evolving over time, and I think that the tipping point has been the advancements and other things. I mean look at cloud computing, and look how fast it just crept up on AWS. I mean, if you go back three, five years ago, I was talking to Swami yesterday about their big news on AI, expanding the Hugging Face relationship with AWS. And just three, five years ago, there wasn't a model training models out there.
But as compute comes out, and you've got more horsepower, these large language models, these foundational models, they're flexible, they're not monolithic silos, they're interacting. There's a whole new, almost fusion of data happening. Do you see that? I mean is that part of this? >> Of course, of course. I mean this wave is building on all the previous waves. We wouldn't be at this point if we did not have hardware that can scale in a very efficient way. We wouldn't be at this point if we didn't have data that we're collecting about everything we do, that we're able to process in this way. So this, this movement, this motion, this phase we're in, absolutely builds on the shoulders of all the previous phases. For some of the observers from the outside, when they saw ChatGPT for the first time, it was like, "Oh my god, this just happened overnight." Like it didn't happen overnight. (laughing) GPT itself, like GPT-3, which is what ChatGPT is based on, was released a year ahead of ChatGPT, and many of us were seeing the power it can provide, and what it can do. I don't know if Ed agrees with that. >> Yeah, Ed? >> I do. Although I would acknowledge that the possibilities now, because of what we've hit from a maturity standpoint, have just opened up in an incredible way that just wasn't tenable even three years ago. And that's what makes it, it's true that it developed incrementally, in the same way that, you know, the possibilities of a mobile handheld device, you know, in 2006 were there, but when the iPhone came out, the possibilities just exploded. And that's the moment we're in. >> Well, I've had many conversations over the past couple months around this area with ChatGPT.
John Markoff told me the other day that he calls it "the five dollar toy," because it's not that big of a deal in context to what AI's doing behind the scenes, and all the work that's been done on ethics over the years, but it has woken up the mainstream, so everyone immediately jumps to ethics. "Does it work? It's not factual." And everyone who's inside the industry is like, "This is amazing." 'Cause you have two schools of thought there. One's like, people that think this is now the beginning of next gen, this is now, we're here, "This ain't your grandfather's chatbot, okay?" With NLP, it's got reasoning, it's got other things. >> I'm in that camp for sure. >> Yeah. Well I mean, everyone who knows what's going on is in that camp. And as the naysayers start to get through this, they go, "Wow, it's not just plagiarizing homework, it's helping me be better. Like it could rewrite my memo, bring the lead to the top." So the format of the user interface is interesting, but it's still a data-driven app. >> Absolutely. >> So where does it go from here? 'Cause I'm not even calling this the first inning. This is like pregame, in my opinion. Where do you guys see this going, in terms of scratching the surface of what happens next? >> I mean, I'll start with, I just don't see how an application is going to look the same in the next three years. Who's going to want to input data manually in a form field? Who is going to want, or expect, to have to put in some text in a search box, and then read through 15 different possibilities, and try to figure out which one of them actually most closely resembles the question they asked? You know, I don't see that happening. Who's going to start with an absolute blank sheet of paper, and expect no help?
That is not how an application will work in the next three years, and it's going to fundamentally change how people interact and spend time with opening any element on their mobile phone, or on their computer, to get something done. >> Yes. I agree with that. Like every single application, over the next five years, will be rewritten to fit within this model. So imagine an HR application, I don't want to name companies, but imagine an HR application, and you go into the application and you're clicking on buttons, because you want to take two weeks of vacation, and menus, and clicking here and there, reasons and managers, versus just telling the system, "I'm taking two weeks of vacation, going to Las Vegas," book it, done. >> Yeah. >> And the system just does it for you. If you weren't complete in your input, in your description of what you want, then the system asks you back, "Did you mean this? Did you mean that? Were you trying to also do this as well?" >> Yeah. >> "What was the reason?" And then it will fill it in for you, and just do it for you. So I think the user interface that we have with apps is going to change to be very similar to the user interface that we have with each other. And that's why all these apps will need to evolve. >> I know we don't have a lot of time, 'cause you guys are very busy, but I definitely want to have multiple segments with you guys on this topic, because there's so much to talk about. There's a lot of parallels going on here. I was talking again with Swami, who runs all the AI and database at AWS, and I asked him, I go, "This feels a lot like the original AWS. You don't have to provision a data center." A lot of this heavy lifting on the back end is these large language models, these foundational models. So the bottleneck in the past was the energy and cost to actually do it. Now you're seeing it being stood up faster. So there's definitely going to be a tsunami of apps. I would see that clearly. What is it? We don't know yet.
But also people who are going to leverage the fact that I can get started building value. So I see a startup boom coming, and I see an application tsunami of refactoring things. >> Yes. >> So the replatforming is already kind of happening. >> Yes. >> OpenAI, ChatGPT, whatever. So that's going to be a developer environment. I mean if Amazon turns this into an API, or Microsoft, what you guys are doing. >> We're turning it into an API as well. That's part of what we're doing as well, yes. >> This is why this is exciting. Amr, you've lived the big data dream, and we used to talk, if you didn't have a big data problem, if you weren't full of data, you weren't really getting it. Now people have all the data, and they've got to stand this up. >> Yeah. >> So the analogy is, again, the mobile, I like the mobile movement, and using mobile as an analogy, most companies were not building for a mobile environment, right? They were just building for the web, and the legacy way of doing apps. And as soon as the user expectations shifted, that my expectation now is I need to be able to do my job on this small screen, on the mobile device with a touchscreen, everybody had to invest in re-architecting, and re-implementing every single app to fit within that model, and that model of interaction. And we are seeing the exact same thing happen now. And one of the core things we're focused on at Vectara is how to simplify that for organizations, because a lot of them are overwhelmed by large language models, and ML. >> They don't have the staff. >> Yeah, yeah, yeah. They're understaffed, they don't have the skills. >> But they've got developers, they've got DevOps, right? >> Yes. >> So they have the DevSecOps going on. >> Exactly, yes. >> So our goal is to simplify it enough for them that they can start leveraging this technology effectively within their applications. >> Ed, you're the COO of the company, obviously a startup. You guys are growing. You've got great backing, and a good team.
You've also done a lot of business development, and technical business development in this area. If you look at the landscape right now, and I agree the apps are coming, every company I talk to that has that ChatGPT, you know, epiphany, "Oh my God, look how cool this is. Like magic." Like okay, it's code, settle down. >> Mm hmm. >> But everyone I talk to is using it in a very horizontal way. I talk to a very senior person, very tech alpha geek, very senior person in the industry, technically. They're using it for log data, they're using it for configuration of routers. And in other areas, they're using it for, every vertical has a use case. So this is horizontally scalable from a use case standpoint. When you hear horizontally scalable, the first thing that comes to my mind is cloud, right? >> Mm hmm. >> So cloud, and scalability that way. And the data is very specialized. So now you have this vertical specialization, horizontally scalable, everyone will be refactoring. What do you see, and what are you seeing from customers that you talk to, and prospects? >> Yeah, I mean put yourself in the shoes of an application developer who is actually trying to make their application a bit more like magic, and to have that soon-to-be, honestly, expected experience. They've got to think about things like performance, and how efficiently they can actually execute a query, or a question. They've got to think about cost. Generative isn't cheap, like the inference of it. And so you've got to be thoughtful about how and when you take advantage of it; you can't use it as a, you know, everything looks like a nail, and I've got a hammer, and I'm going to hit everything with it, because that will be wasteful. Developers also need to think about how they're going to take advantage of it, but not lose their own data. So there have to be some controls around what they feed into the large language model, if anything.
Like, should they fine-tune a large language model with their own data? Can they keep it logically separated, but still take advantage of the powers of a large language model? And they've also got to take advantage, and be aware of the fact that when data is generated, it is a different class of data. It might not fully be their own. >> Yeah. >> And it may not even be fully verified. And so when the logical cycle starts, of someone making a request, the relationship between that request and the output, those things have to be stored safely, logically, and identified as such. >> Yeah. >> And taken advantage of in an ongoing fashion. So these are mega problems, each one of them independently, that, you know, you can think of it as middleware companies need to take advantage of, and think about, to help the next wave of application development be logical, sensible, and effective. It's not just calling some raw API on the cloud, like OpenAI, and then just, you know, you get your answer and you're done, because that is a very brute force approach. >> Well also I will point out, first of all, I agree with your statement about the apps experience that's going to be expected, form filling. Great point. The interesting thing about ChatGPT. >> Sorry, it's not just form filling, it's any action you would like to take. >> Yeah. >> Instead of clicking, and dragging, and dropping, and doing it on a menu, or on a touch screen, you just say it, and it happens perfectly. >> Yeah. It's a different interface. And that's why I love that UI/UX experience, that's the people-falling-out-of-their-chair moment with ChatGPT, right? But a lot of the things with ChatGPT, if you feed it right, it works great. If you feed it wrong and it goes off the rails, it goes off the rails big. >> Yes, yes. >> So the Bing catastrophes. >> Yeah. >> And that's an example of garbage in, garbage out, the classic old-school kind of comp-sci phrase that we all use. >> Yep. >> Yes.
>> This is about data injection, right? It reminds me of the old SQL days: if you could sling some SQL, you were a magician, you know, to get the right answer. It's pretty much there. So you've got to feed the AI. >> You do. The early word some people use to describe this is prompt engineering. You know, old school, you know, search, or engagement with data would be, I have a question or I have a query. New school is, I have to issue it a prompt, because I'm trying to get, you know, an action or a reaction from the system. And for the act of engineering it, there are a lot of different ways you could do it, all the way from, you know, raw, just I'm going to send you whatever I'm thinking. >> Yeah. >> And you get the unintended outcomes, to more constrained, where I'm going to just use my own data, and I'm going to constrain the initial inputs to the data I already know, that's first party and I trust, to, you know, hyper constrained, where the application is actually looking for certain elements to respond to. >> It's interesting, Amr, this is why I love this, because, one, we are in the media. We're recording this video now, we'll stream it. But we've got all your linguistics, we're talking. >> Yes. >> This is data. >> Yep. >> So data quality now becomes the new intellectual property, because if you have that prompt source data, it makes data, or content, in our case the original content, intellectual property. >> Absolutely. >> Because that's the value. And that's where you see chatGPT fall down, because they're trying to crawl the web, and people think it's search. It's not necessarily search, it's giving you something that you wanted. It is a lot of that. I remember at Cloudera, you said, "Ask the right questions." Remember that phrase you guys had, that slogan? >> Mm hmm. And that's prompt engineering.
So that's exactly it. Prompt engineering is the reinvention of "Ask the right question": if you don't give these models the question in the right way, and very few people know how to frame it in the right way with the right context, then you will get garbage out. Right? That is the garbage in, garbage out. But if you specify the question correctly, and you provide with it the metadata that constrains what that question is going to be acted upon or answered upon, then you'll get much better answers. And that's exactly what we solve at Vectara. >> Okay. So before we get into the last couple minutes we have left, I want to make sure we get a plug in for the opportunity, and the profile of Vectara, your new company. Can you guys both share with me what you think the current situation is? So for the folks who are now having those moments of, "Ah, AI's bullshit," or, "It's not real, it's a lot of stuff," from, "Oh my god, this is magic," to, "Okay, this is the future." >> Yes. >> What would you say to that person, if you're at a cocktail party, or in the elevator? Say, "Calm down, this is the first inning." How do you explain the dynamics going on right now to someone who's in the industry, but doesn't know the ropes? How would you explain what this wave's about? How would you describe it, and how would you prepare them to change their life around this? >> Yeah, so I'll go first and then I'll let Ed go. Efficiency. Efficiency is the description. So we figured out a way to be a lot more efficient, a way where you can write a lot more emails, create way more content, create way more presentations. Developers can develop 10 times faster than they normally would. And that is very similar to what happened during the Industrial Revolution. I always like to look at examples from the past to read what will happen now, and what will happen in the future. So during the Industrial Revolution, it was about efficiency with our hands, right?
So if I had to make a piece of cloth, like this piece of cloth for this shirt I'm wearing, our ancestors had to spend months taking the cotton, making it into threads, taking the threads, making them into pieces of cloth, and then cutting it. And now a machine makes it just like that, right? And the ancestors turned from the people that do the thing into the people that manage the machines that do the thing. And I think the same thing is going to happen now: our efficiency will be multiplied extremely, as human beings, and we'll be able to do a lot more. And many of us will be able to do things we couldn't do before. So another great example I always like to use is the example of Google Maps, and GPS. Very few of us knew how to drive a car from one location to another, and read a map, and get there correctly. But once we had that efficiency of an AI (and by the way, behind these things is very, very complex AI that figures out how to do that for us), all of us became amazing navigators that can go from any point to any point. So that's kind of how I look at the future. >> And that's a great real example of impact. Ed, your take on how you would talk to a friend, or colleague, or anyone who asks, "How do I make sense of the current situation? Is it real? What's in it for me, and what do I do?" I mean every company's rethinking their business right now around this. What would you say to them? >> You know, I usually like to show, rather than describe. And so, you know, the other day I just got access. I've been using an application for a long time called Notion, and it's super popular. There's like 30 or 40 million users. And the new version of Notion came out, which has AI embedded within it. And it's AI that allows you primarily to create.
So if you could break down the world of AI into find and create, for a minute, just kind of logically separate those two things, find is certainly going to be massively impacted in our experiences as consumers on, you know, Google and Bing, and I can't believe I just said the word Bing in the same sentence as Google, but that's what's happening now (all laughing), because it's a good example of change. >> Yes. >> But also inside the business. But on the create side, you know, Notion is a wiki product, where you try to, you know, note down things that you are thinking about, or you want to share and memorialize. But sometimes you do need help to get it down fast. And just in the first day of using this new product, my experience has really fundamentally changed. And I think that anybody, say for example somebody that is using an existing app, I would show them: open up the app. Now imagine the possibility of getting a starting point right off the bat, in five seconds, instead of having to draft this thing whole cloth. Imagine getting a starting point that you can then modify and edit, or just dispose of and retry again. And that's the potential for me. I can't imagine a scenario where, a few years from now, I'm going to be satisfied if I don't have a little bit of help, in the same way that I don't manually spell check every email that I send. I automatically spell check it. I love when I'm getting type-ahead support inside of Google, or anything. Doesn't mean I always take it, or when texting. >> That's efficiency too. I mean the cloud was about developers getting stuff up quick. >> Exactly. >> All that heavy lifting is there for you, so you don't have to do it. >> Right? >> And you get to the value faster. >> Exactly. I mean, if history taught us one thing, it's that you have to always embrace efficiency, and if you don't do it fast enough, you will fall behind.
Again, looking at the Industrial Revolution, the companies that embraced the Industrial Revolution became the leaders in the world, and the ones who did not, they all like. >> Well the AI thing that we've got to watch out for is how it goes off the rails if it doesn't have the right prompt engineering, or data architecture, infrastructure. >> Yes. >> It's a big part. So this comes back down to your startup, real quick, I know we've got a couple minutes left. Talk about the company, the motivation, and we'll do a deeper dive on the company. But what's the motivation? What are you targeting for the market, business model? The tech, let's go. >> Actually, I would like Ed to go first. Go ahead. >> Sure, I mean, we're a developer-first, API-first platform. So the product is oriented around developers who may not be superstars at being able to either leverage, or choose, or select their own large language models for appropriate use cases, but who want to be able to instantly add the power of large language models into their application set. We started with search, because we think it's going to be one of the first places that people try to take advantage of large language models, to help find information within an application context. And we've built our own large language models, focused on making it very efficient, and elegant, to find information more quickly. So what a developer can do is, within minutes, go up, register for an account, and get access to a set of APIs that allow them to send data, to be converted into a format that's easy to understand for large language models: vectors. And then secondarily, they can issue queries, ask questions. And the questions that can be asked are very natural language questions.
So we're talking about long form sentences, you know, drill down types of questions, and they can get answers that come back, depending upon the form factor of the user interface, either in list form or in summarized form, where summarized equals the opportunity to kind of see a condensed, singular answer. >> All right. I have a. >> Oh okay, go ahead, you go. >> I was just going to say, I'm going to be a customer for you, because my dream was to have a hologram of theCUBE host, me and Dave, and have questions be generated in the metaverse. So you know. (all laughing) >> There'll no longer be any guests here. They'll all be talking to you guys. >> Give a couple bullets, it'll spit out 10 good questions. Publish a story. This brings the automation. I'm sorry to interrupt you. >> No, no. No, no, I was just going to follow on the same. So another way to look at exactly what Ed described is, we want to offer you chatGPT for your own data, right? So imagine taking all of the recordings of all of the interviews you have done, and having all of the content of that ingested by a system, where you can now have a conversation with your own data and say, "Oh, last time when I met Amr, which video games did we talk about? Which movie or book did we use as an analogy for how we should be embracing data science and big data?" Which is Moneyball, I know you use Moneyball all the time. And you start having that conversation. So now the data isn't just a passive asset that you have in your organization. No. It's an active participant that's sitting with you at the table, helping you make decisions. >> One of my favorite things to do with customers is to go to their site or application, and show them me using it. So for example, one of the customers I talked to was one of the biggest property management companies in the world, that lets people go and rent homes, and houses, and things like that.
And you know, I went and I showed them me searching through reviews, looking for information, trying different words, and trying to find out, you know, is this place quiet? Is it comfortable? And then I put all the same data into our platform, and I showed them the world of difference you can have when you start asking that question wholeheartedly, and getting real information that doesn't have anything to do with the words you asked, but is really focused on the meaning. You know, when I asked, "Is it quiet?" answers would come back like, "The wind whispered through the trees peacefully," and you know, it's nothing to do with quiet in the literal word sense, but in the meaning sense, everything to do with it. And that was magical even for them to see. >> Well, you guys are the front end of this big wave. Congratulations on the startup, Amr. I know you guys have got great pedigree in big data, and you've got a great team, and congratulations. Vectara is the name of the company, check 'em out. Again, the startup boom is coming. This will be one of the major waves. Generative AI is here. I think we'll look back, and it will be pointed out as a major inflection point in the industry. >> Absolutely. >> There's not a lot of hype behind that. People are seeing it, experts are. So it's going to be fun. Thanks for watching. >> Thanks John. (soft music)
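The difference Ed describes between matching words and matching meaning, together with Amr's point about constraining a prompt to trusted first-party data, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Vectara's API or implementation: the three-number vectors are hand-picked stand-ins for what an embedding model would produce, and the names `keyword_search`, `semantic_search`, and `grounded_prompt` are hypothetical.

```python
import math

# Hypothetical, hand-picked 3-dimensional "embeddings". A real system would get
# these vectors from an embedding model; the numbers here are illustrative only.
PASSAGES = {
    "The wind whispered through the trees peacefully.": [0.9, 0.1, 0.0],
    "Great location, close to the train station.": [0.1, 0.9, 0.2],
    "The kitchen was spotless and well equipped.": [0.0, 0.2, 0.9],
}
QUERY_TEXT = "Is it quiet?"
QUERY_VEC = [0.8, 0.2, 0.1]  # hypothetical embedding of QUERY_TEXT


def tokens(text):
    """Lower-cased words with basic punctuation stripped."""
    return {w.strip("?.,!").lower() for w in text.split()}


def keyword_search(query, passages):
    """Literal matching: keep passages that share at least one word with the query."""
    q = tokens(query)
    return [p for p in passages if q & tokens(p)]


def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vec, passages, k=1):
    """Rank passages by vector similarity to the query and return the top k."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, passages[p]), reverse=True)
    return ranked[:k]


def grounded_prompt(question, context):
    """Build a prompt that constrains the model to the retrieved first-party passages."""
    lines = [f"{i}. {c}" for i, c in enumerate(context, start=1)]
    return (
        "Answer using only the passages below. If they do not answer the question, say so.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(keyword_search(QUERY_TEXT, PASSAGES))  # → []
    top = semantic_search(QUERY_VEC, PASSAGES, k=1)
    print(top)  # → ['The wind whispered through the trees peacefully.']
    print(grounded_prompt(QUERY_TEXT, top))
```

Keyword search finds nothing for "Is it quiet?" because no review uses those words, while ranking by cosine similarity surfaces the review about the wind in the trees. Passing only the retrieved passages into the prompt is one simple way to apply the "constrain the initial inputs" idea from the conversation; the controls the guests mention around storing requests and generated output would sit on top of this.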

Published Date : Feb 23 2023

SUMMARY :

I call it the fifth wave in the industry. It's great to be here. and the application's almost mind blowing. And in the meanwhile, and you got more horsepower, of all the previous phases. in the same way that, you know, and all the work that's done on ethics, "bring the lead to the top." in terms of scratching the surface and it's going to fundamentally change and you go into application And the system just does it for you. is going to change to be very So the bottleneck in the past, So the replatforming is So that's going to be a That's part of what and they got to stand this up. And one of the core things don't have the skills. So our goal is to simplify it and I agree the apps are coming, I talk to a very senior And the data is very specialized. and be aware of the fact that request, and the output, some raw API on the cloud, about the apps experience, it's any action you would like to take. you just say it, and it's But a lot of the things with chatGPT, comp-sci phrase that we all use. It reminds me the old all the way from, you know, raw, and I'm going to constrain But we got all your So the data quality And that's where you That is the garbage in, garbage out. So for the folks who are and how would you prepare them that do the thing, to manage the current situation? And the new version of Notion came out, But on the create side, you I mean the cloud was about developers so you don't have to do it. and the ones who did not, they all like. If it doesn't have the So this comes back down to Actually, I would like Ed to go first. factor of the user interface, I have a. generated in the metaverse. They'll all be talking to you guys. This brings the automation, of all of the interviews you have done, one of the customers I talked to Vectara is the name of the So it's going to be fun, Thanks John.

ENTITIES

Entity	Category	Confidence
John Markoff	PERSON	0.99+
2013	DATE	0.99+
AWS	ORGANIZATION	0.99+
Ed Alban	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
30	QUANTITY	0.99+
10 times	QUANTITY	0.99+
2006	DATE	0.99+
John Furrier	PERSON	0.99+
two weeks	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Ed Albanese	PERSON	0.99+
John	PERSON	0.99+
five seconds	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
Ed	PERSON	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
10 good questions	QUANTITY	0.99+
Swami	PERSON	0.99+
15 different possibilities	QUANTITY	0.99+
Palo Alto, California	LOCATION	0.99+
Vectara	ORGANIZATION	0.99+
Amr Awadallah	PERSON	0.99+
Google	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
first time	QUANTITY	0.99+
both	QUANTITY	0.99+
end of 2019	DATE	0.99+
yesterday	DATE	0.98+
Big Data	ORGANIZATION	0.98+
40 million users	QUANTITY	0.98+
two things	QUANTITY	0.98+
two great guests	QUANTITY	0.98+
12 plus years	QUANTITY	0.98+
one	QUANTITY	0.98+
five dollar	QUANTITY	0.98+
Netscape	ORGANIZATION	0.98+
five years ago	DATE	0.98+
SQL	TITLE	0.98+
first inning	QUANTITY	0.98+
Amr	PERSON	0.97+
two schools	QUANTITY	0.97+
first	QUANTITY	0.97+
10 years ago	DATE	0.97+
One	QUANTITY	0.96+
first day	QUANTITY	0.96+
three	DATE	0.96+
chatGPT	TITLE	0.96+
first places	QUANTITY	0.95+
Bing	ORGANIZATION	0.95+
Notion	TITLE	0.95+
first thing	QUANTITY	0.94+
theCUBE	ORGANIZATION	0.94+
Beyond the Buzz	TITLE	0.94+
Sati Natel	PERSON	0.94+
Industrial Revolution	EVENT	0.93+
one location	QUANTITY	0.93+
three years ago	DATE	0.93+
single application	QUANTITY	0.92+
one thing	QUANTITY	0.91+
first platform	QUANTITY	0.91+
five years old	QUANTITY	0.91+

Breaking Analysis: CIOs in a holding pattern but ready to strike at monetization

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Recent conversations with IT decision makers show a stark contrast between the mindset exiting 2022 versus the mindset when we were leaving 2021. CIOs are generally funding new initiatives by pushing off or cutting lower priority items, while security efforts are still being funded. Initiatives that enable the business and generate revenue are taking priority over cleaning up legacy technical debt. The bottom line is, for the moment at least, the mindset is not to cut everything; rather, it's to put a pause on cleaning up legacy hairballs and fund monetization. Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we tap recent discussions from two primary sources: year-end ETR roundtables with IT decision makers, and CUBE conversations with data, cloud, and IT architecture practitioners. The sources of data for this Breaking Analysis come from the following areas. Eric Bradley's recent ETR year-end panel featured a financial services DevOps and SRE manager, a CSO in a large hospitality firm, a director of IT for a big tech company, the head of IT infrastructure for a financial firm, and a CTO for a global travel enterprise. And for our upcoming Supercloud2 conference on January 17th, for which you can register free, by the way, at supercloud.world, we've had CUBE conversations with data and cloud practitioners, specifically heads of data in retail and financial services, a cloud architect at a biotech firm, the director of cloud and data at a large media firm, and the director of engineering at a financial services company. We've curated commentary from these sources and now we share it with you today as anecdotal evidence supporting what we've been reporting on in the marketplace for these last couple of quarters.
On this program, we've likened the economy to the slingshot effect when you're driving. You're cruising along at full speed on the highway, and suddenly you see red brake lights up ahead, so you tap your own brakes, and then you speed up again, and traffic is moving along at full speed, so you think nothing of it. And then, all of a sudden, the same thing happens. You slow down to a crawl and you start wondering, "What the heck is happening?" And you become a lot more cautious about the rate of acceleration when you start moving again. Well, that's the trend in IT spend right now. Back in June, we reported that despite the macro headwinds, CIOs were still expecting 6% to 7% spending growth for 2022. Now that was down from 8%, which we reported at the beginning of 2022. That was before Ukraine and Fed tightening, but given those two factors, you know, that seemed pretty robust. But throughout the fall, we began reporting consistently declining expectations, where CIOs are now saying Q4 will come in at around 3% growth relative to last year, and they're expecting, or should we say hoping, that it pops back up in 2023 to 4% to 5%. The recent ETR panelists, when they heard this, said that based on their businesses and discussions with their peers, they could see low single-digit growth for 2023, so 1%, 2%, 3%. So this sort of slingshotting, or what we sometimes call a seesaw economy, has caught everyone off guard. Amazon is a good example of this, and there are others. Amazon entered the pandemic with around 800,000 employees. It doubled that workforce during the pandemic. Now, right before Thanksgiving in 2022, Amazon announced that it was laying off 10,000 employees, and Jassy, the CEO of Amazon, just last week announced that number is now going to grow to 18,000. Now look, this is a rounding error at Amazon from a headcount standpoint, and their headcount remains far above 2019 levels.
Its stock price, however, does not, and it's back down to 2019 levels. The point is that visibility is very poor right now, and it's reflected in that uncertainty. We've seen a lot of layoffs, obviously, the stock market's choppy, et cetera. Now importantly, not everything is on hold, and this downturn is different from previous tech pullbacks in that the speed at which new initiatives can be rolled out is much greater thanks to the cloud, and if you can show a fast return, you're going to get funding. Organizations are pausing on the cleanup of technical debt, unless it's driving fast business value. They're holding off on modernization projects. Those business enablement initiatives are still getting funded. CIOs are finding the money by consolidating redundant vendors, and they're stealing from other pockets of budget. It's not surprising that cybersecurity remains the number one technology priority in 2023. We've been reporting that for quite some time now. It's specifically cloud and cloud-native security, container and API security. That's where all the action is, because there are still holes to plug from that forced march to digital that occurred during COVID. Cloud migration, kind of showing here at number two on this chart, is still a high priority, while optimizing cloud spend is definitely a strategy that organizations are taking to cut costs, but it's behind consolidating redundant vendors by a long shot. There's very little evidence that cloud repatriation, i.e., moving workloads back on prem, is a major cost-cutting trend. The data just doesn't show it. What is a trend is getting more real time with analytics, so companies can do faster and more accurate customer targeting, and they're really prioritizing that, obviously, in this down economy. Real time, we sometimes lose it, what's real time? We sometimes define real time as before you lose the customer.
Now on the hiring front, customers tell us they're still having a hard time finding qualified site reliability engineers (SREs), Kubernetes expertise, and deep analytics pros. These job markets remain very tight. Let's stay with security for just a moment. We've said many times that, prior to COVID, zero trust was this undefined buzzword, and the joke, of course, is, if you ask three people, "What is zero trust?" you're going to get three different answers. But the truth is that virtually every security company that was resisting taking a position on zero trust didn't want to get caught up in the buzzword vortex, and they're now really being forced to go there by CISOs. So there are some good quotes here on cyber that we want to share that came out of the recent conversations that we cited up front. The first one, "Zero trust is the highest ROI, because it enables business transformation." In other words, if I can have good security, I can move fast; it's not a blocker anymore. Second quote here, "ZTA," zero trust architecture, "is more than securing the perimeter. It encompasses strong authentication and multiple identity layers. It requires taking a software approach to security instead of a hardware focus." The next one, "I'd love to have a security data lake that I could apply to asset management, vulnerability management, incident management, incident response, and all aspects for my security team. I see huge promise in that space." And the last one, "I see NLP, natural language processing, as the foundation for email security, so instead of searching for IP addresses, you can now read emails at light speed and identify phishing threats." So look, this is a small snapshot of the mindset around security, but I'll add, when you talk to the likes of CrowdStrike, and Zscaler, and Okta, and Palo Alto Networks, and many other security firms, they're listening to these narratives around zero trust.
I'm confident they're working hard on skating to this puck, if you will. A good example is this idea of a security data lake and using analytics to improve security. We're hearing a lot about that. We're hearing about architectures, there are acquisitions in that regard, and so that's becoming real, and there are many other examples, because data is at the heart of digital business. This is the next area that we want to talk about. It's obvious that data, as a topic, gets a lot of mind share amongst practitioners, but getting data right is still really hard. It's a challenge for most organizations to get ROI and the expected return out of data. Most companies still put data at the periphery of their businesses. It's not at the core. Data lives within silos, in different business units, in different clouds, it's on-prem, and increasingly it's at the edge, and it seems like the problem is getting worse before it gets better. So here are some instructive comments from our recent conversations. The first one, "We're publishing events onto Kafka, having those events be processed by Dataproc." Dataproc is a Google managed service to run Hadoop, and Spark, and Flink, and Presto, and a bunch of other open source tools. "We're putting them into the appropriate storage models within Google, and then normalizing the data into BigQuery, and only then can you take advantage of tools like ThoughtSpot." So here's a company like ThoughtSpot, and they're all about simplifying data, democratizing data, but to get there, you have to go through some pretty complex processes, so this is a good example. All right, another comment. "In order to use Google's AI tools, we have to put the data into BigQuery. They haven't integrated in the way AWS and Snowflake have with SageMaker.
Moving the data is too expensive, time consuming, and risky." So I'll just say this: sharing data is a killer supercloud use case, and firms like Snowflake are on top of it, but it's still not pretty across clouds, and Google's posture seems to be, "We're going to let our database product competitiveness drive the strategy first, and the ecosystem is going to take a backseat." Now, in a way, I get it. Owning the database is critical, and Google doesn't want to capitulate on that front. Look, BigQuery is really good and competitive, but you can't help but roll your eyes when a CEO stands up (and look, I'm not calling out Thomas Kurian, every CEO does this) and talks about how important their customers are, and how they'll do whatever is right by the customer. So look, I'm telling you, I'm rolling my eyes on that. Now let me also comment: AWS has figured this out. They're killing it in database. If you take Redshift, for example, it's still growing, as is Aurora, really fast-growing services, and other data stores, but AWS realizes it can make more money in the long term partnering with the Snowflakes and Databricks of the world, and other ecosystem vendors, versus sub-optimizing their relationships with partners and customers in order to sell more of their own homegrown tools. I get it. It's hard not to feature your own product. IBM chose OS/2 over Windows, and tried for years to popularize it. It failed. Go way back to Lotus 1-2-3: Lotus refused to run on Windows when it first came out. They were running on DEC VAX. Many of you young people in the United States have never even heard of DEC VAX. IBM wanted to run everything only in its cloud, the same with Oracle, originally. VMware, as you might recall, tried to build its own cloud, but eventually, when the market speaks and reveals what seemed obvious to analysts years before, the vendors come around, they face reality, and they stop wasting money fighting a losing battle.
"The trend is your friend," as the saying goes. All right, last pull quote on data: "The hardest part is transformations, moving traditional Informatica, Teradata, or Oracle infrastructure to something more modern and real time, and that's why people still run apps in COBOL." In IT, we rarely get rid of stuff; rather, we add on another coat of paint until the wood rots out or the roof is going to cave in. All right, the last key finding we want to highlight is going to bring us back to the cloud repatriation myth. Followers of this program know it's a real sore spot with us. We've heard the stories about repatriation, we've read the thoughtful articles from VCs on the subject, we've been whispered to by vendors that you should investigate this trend, that "it's really happening," but the data simply doesn't support it. Here's the question that was posed to these practitioners: If you had unlimited budget and the economy miraculously flipped, what initiatives would you tackle first? Where would you really lean in? The first answer, "I'd rip out legacy on-prem infrastructure and move to the cloud even faster." So the thing here is, look, maybe renting infrastructure is more expensive than owning, maybe, but what if I can optimize my rental with better utilization, turn off compute, use things like serverless, get on a steeper, higher-performance-over-time, and lower-cost silicon curve with things like Graviton, and tap best-of-breed tools in AI and other areas that make my business more competitive? Move faster, fail faster, experiment more quickly and cheaply: what's that worth? Even the most hard-nosed CFOs understand the business benefits far outweigh the possible added cost per gigabyte, and, again, I stress "possible." Okay, other interesting comments from practitioners. "I'd hire 50 more data engineers and accelerate our real-time data capabilities to better target customers." Real-time is becoming a thing.
AI is being injected into data and apps to make faster decisions, perhaps with less or even no human involvement. That's on the rise. Next quote, "I'd like to focus on resolving the concerns around cloud data compliance." So again, despite the risks of data being spread out in different clouds, organizations realize cloud is a given, and they want to find ways to make it work better, not move away from it. The same thing in the next one, "I would automate the data analytics pipeline and focus on a safer way to share data across the states without moving it." And, finally, "The way I'm addressing complexity is to standardize on a single cloud." Monocloud is actually a thing. We're hearing this more and more: "Yes, my company has multiple clouds, but in my group, we've standardized on a single cloud to simplify things." And this is a somewhat dangerous trend, because it's creating even more silos, and it's an opportunity that needs to be addressed. That's why we've been talking so much about supercloud as a cross-cloud, unifying, architectural framework, or, perhaps, a platform. In fact, that's a question that we will be exploring later this month at Supercloud2, live from our Palo Alto Studios: Is supercloud an architecture or is it a platform? And in this program, we're featuring technologists, analysts, and practitioners to explore the intersection between data and cloud and the future of cloud computing, so you don't want to miss this opportunity. Go to supercloud.world. You can register for free and participate in the event directly. All right, thanks for listening. That's a wrap. I'd like to thank Alex Myerson, who's on production and manages our podcast, Ken Schiffman as well, Kristen Martin and Cheryl Knight, who help get the word out on social media and in our newsletters, and Rob Hof, our editor-in-chief over at siliconangle.com. He does some great editing. Thank you, all. Remember, all these episodes are available as podcasts wherever you listen.
All you've got to do is search "Breaking Analysis podcast." I publish each week on wikibon.com and siliconangle.com, where you can email me directly at david.vellante@siliconangle.com or DM me @dvellante, or comment on our LinkedIn posts. By all means, check out etr.ai. They get the best survey data in the enterprise tech business. We'll be doing our annual predictions post in a few weeks, once the data comes out from the January survey. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, everybody, and we'll see you next time on "Breaking Analysis." (upbeat music)

Published Date : Jan 7 2023

SUMMARY :

This is "Breaking Analysis" and the director of engineering

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Ken Schiffman	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Jassy	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Eric Bradley	PERSON	0.99+
Rob Hof	PERSON	0.99+
Okta	ORGANIZATION	0.99+
Kristen Martin	PERSON	0.99+
Zscaler	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Thomas Kurian	PERSON	0.99+
6%	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
2023	DATE	0.99+
18,000	QUANTITY	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
10,000 employees	QUANTITY	0.99+
CrowdStrike	ORGANIZATION	0.99+
January	DATE	0.99+
2022	DATE	0.99+
January 17th	DATE	0.99+
Boston	LOCATION	0.99+
Lotus 1	TITLE	0.99+
2019	DATE	0.99+
June	DATE	0.99+
8%	QUANTITY	0.99+
United States	LOCATION	0.99+
david.vellante@siliconangle.com	OTHER	0.99+
Snowflakes	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
Lotus	TITLE	0.99+
two factors	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Dataproc	ORGANIZATION	0.99+
three people	QUANTITY	0.99+
last week	DATE	0.99+
Supercloud2	EVENT	0.99+
Teradata	ORGANIZATION	0.99+
1%	QUANTITY	0.99+
3	TITLE	0.99+
Windows	TITLE	0.99+
5%	QUANTITY	0.99+
3%	QUANTITY	0.99+
BigQuery	TITLE	0.99+
Second quote	QUANTITY	0.99+
4%	QUANTITY	0.99+
DEC VAX	TITLE	0.99+
Thanksgiving	EVENT	0.98+
OS/2	TITLE	0.98+
7%	QUANTITY	0.98+
last year	DATE	0.98+
two primary sources	QUANTITY	0.98+
each week	QUANTITY	0.98+
Informatica	ORGANIZATION	0.98+
pandemic	EVENT	0.98+
first one	QUANTITY	0.98+
siliconangle.com	OTHER	0.97+
first answer	QUANTITY	0.97+
2%	QUANTITY	0.97+
around 800,000 employees	QUANTITY	0.97+
50 more data engineers	QUANTITY	0.97+
zero trust	QUANTITY	0.97+
Snowflake	ORGANIZATION	0.96+
single cloud	QUANTITY	0.96+
2	TITLE	0.96+
today	DATE	0.95+
ETR	ORGANIZATION	0.95+
single cloud	QUANTITY	0.95+
LinkedIn	ORGANIZATION	0.94+
later this month	DATE	0.94+

Ash Naseer, Warner Bros. Discovery | Busting Silos With Monocloud

(vibrant electronic music) >> Welcome back to SuperCloud2. You know, this event, and the supercloud initiative in general, it's an open, industry-wide collaboration. Last August at SuperCloud22, we really honed in on the definition, which of course we've published. And there's this shared doc, which folks are still adding to and refining. In fact, just recently, Dr. Nelu Mihai added some critical points that really advanced some of the community's initial principles. And today at SuperCloud2, we're digging further into the topic with input from real-world practitioners, and we're exploring that intersection of data, data mesh, and cloud, and importantly, the realities and challenges of deploying technology to drive new business capability. I'm pleased to welcome Ash Naseer to the program. He's a Senior Director of Data Engineering at Warner Bros. Discovery. Ash, great to see you again, thanks so much for taking time with us. >> It's great to be back, these conversations are always very fun. >> I was so excited when we met last spring, I guess. So before we get started, I wanted to play a clip from that conversation. It was June, at the Snowflake Summit in Las Vegas, and it's a comment that you made about your company but also data mesh. Guys, roll the clip. >> Yeah, so, when people think of Warner Bros., you always think of the movie studio. But we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN. We have 30-plus brands in our portfolio, and each has their own needs. So the idea of a data mesh really helps us, because what we can do is federate access across the company, so that CNN can work at their own pace. You know, when there's election season, they can ingest their own data, and they don't have to bump up against, as an example, HBO, if Game of Thrones is goin' on.
>> So-- Okay, so that's pretty interesting, so you've got these sort of different groups that have different data requirements inside of your organization. Now data mesh, it's a relatively new concept, so you're kind of ahead of the curve. So Ash, my question is, when you think about getting value from data, and how that's changed over the past decade, you've had pre-Hadoop, Hadoop, now you've got the cloud coming in. What's changed? What had to be sort of fixed? What's working now, and where do you see it going? >> Yeah, so I feel like in the last decade, we've gone through quite a maturity curve. I actually like to say that we're in the golden age of data, because the tools and technology in the data space particularly, and then broadly in the cloud, allow us to do things that we couldn't do way back when, like you suggested, back in the Hadoop era or even before that. So there's certainly a lot of maturity, and a lot of technology that has come about. So in terms of the good, bad, and ugly, let me kind of start with the good, right? In terms of bringing value from the data, I really feel like we're in this place where the folks that are charged with unlocking that value from the data, they're actually spending the majority of their time actually doing that. And what do I mean by that? If you think about it, 10 years ago, the data scientist was the person that was going to sort of solve all of the data problems in a company. But what happened was, companies asked these data scientists to come in and do a multitude of things. And what these data scientists found out was, they were spending most of their time on, really, data wrangling, and less on actually getting the value out of the data. And in the last decade or so, I feel like we've made the shift, and we realize that data engineering, data management, data governance, those practices are as important as data science, which is sort of getting the value out of the data.
And so what that has done is, it has freed up the data scientist and the business analyst and the data analyst, and the BI expert, to really focus on how to get value out of the data, and spend less time wrangling data. So I really think that that's the good. In terms of the bad, I feel like there's a lot of legacy data platforms out there, and I feel like there's going to be a time where we'll be in that hybrid mode. And then the ugly, I feel like all the data and all the technology creates another problem of its own, because most companies don't have their arms around their data: making sure that they know who's using the data, what they're using it for, and how the company can leverage the collective intelligence. That is a bigger problem to solve today than 10 years ago. And that's where technologies like the data mesh come in. >> Yeah, so when I think of data mesh, and I say you're an early practitioner of data mesh, you mentioned legacy technology, so the concept of data mesh is inclusive. In theory anyway, you're supposed to be including the legacy technologies, whether it's a data lake or data warehouse or Oracle or Snowflake or whatever it is. And when you think about Zhamak Dehghani's principles, it's domain-centric ownership, data as product. And that creates challenges around self-serve infrastructure and automated governance, and then when you start to combine these different technologies, you got legacy, you got cloud. Everything's different. And so you have to figure out how to deal with that. So my question is, how have you dealt with that, and what role has the cloud played in solving those problems, in particular, that self-serve infrastructure, and that automated governance, and where are we in terms of solving that problem from a practitioner's standpoint? >> Yeah, I always like to say that data is a team sport, and we should sort of think of it as such, and that's, I feel like, the key of the data mesh concept, is treating it as a team sport.
A lot of people ask me, they're like, "Oh hey, Ash, I've heard about this thing called data mesh. Where can I buy one?" or, "What's the technology that I use to get a data mesh?" And the reality is that there isn't one technology, you can't really buy a data mesh. It's really a way of life, it's how organizations decide to approach data. Like I said, back to the team sport analogy, making sure that everyone has a seat at the table, making sure that we embrace the fact that we have a lot of data, we have a lot of data problems to solve. And the way we'll be successful is to make it inclusive for everyone. You know, you think about the old days, data silos or shadow IT, some might call it. That's been around for decades. And what hasn't changed was this notion that, hey, everything needs to be sort of managed centrally. But with the cloud and with the technologies that we have today, we have the right technology and the tooling to democratize that data, and democratize not only just the access, but also sort of building blocks, taking building blocks which are relevant to your product or your business and adding to the overall data mesh. We've got all that technology. The challenge is for us to really embrace it, and make sure that we implement it from an organizational standpoint. >> So, thinking about supercloud, there's a layer that lives above the clouds and adds value. And you think about your brands, you got 30 brands, you mentioned shadow IT. If, let's say, one of those brands, HBO or TNT, whatever, they want to go, "Hey, we really like Google's analytics tools," and they maybe go off and build something, I don't know if that's even allowed, maybe it's not. But then you build this data mesh. My question is around multi-cloud, cross-cloud, supercloud if you will. Is that an advantage for you as a practitioner, or does that just make things more complicated? >> I really love the idea of a multi-cloud.
I think it's great, I think that it should have been the norm, not the exception. I feel like people talk about it as if it's the exception. That should have been the case. I will say, though, I feel like multi-cloud should evolve organically, so back to your point about some of these different brands, and, you know, different brands or different business units, or even in a merger and acquisition situation, where two different companies or multiple different companies come together with different technology stacks. You know, I feel like that's an organic evolution, and making sure that we use the concepts and the technologies around the multi-cloud to bring everyone together. That's where we need to be, and again, it speaks to the fact that each of those business units and each of those groups has their own unique needs, and we need to make sure that we embrace that and we enable that, rather than stifling everything. Now where I have a little bit of a challenge with the multi-cloud is when technology leaders try to build it by design. So there's a notion there that, "Hey, you need to sort of diversify and don't put all your eggs in one basket." And so we need to have this multi-cloud thing. I feel like that is just sort of creating more complexity where it doesn't need to be. We can all sort of simplify our lives, but where it evolves organically, absolutely, I think that's the right way to go. >> But, so Ash, if it evolves organically, don't you need some kind of cloud interpreter to create a common experience across clouds? Does that exist today? What are your thoughts on that? >> There is a lot of technology that exists today that helps go between these different clouds. A lot of these sort of cloud-agnostic technologies that you talked about, the Snowflakes and the Databricks and so forth of the world, they operate in multiple clouds, they operate in multiple regions, within a given cloud and across multiple clouds.
So they span all of that, and they have the tools and technology, so, I feel like the tooling is there. There does need to be more of an evolution around the tooling, and I think the market's needs are going to dictate that. I feel like the market is there, they're asking for it, so there's definitely going to be that evolution. But the technology is there; I think it's just making sure that we embrace that, and we sort of embrace that as a challenge, and not try to sort of shut all of that down and box everything into one. >> What's the biggest challenge, is it governance or security? Or is it more like you're saying, adoption, cultural? >> I think it's a combination of cultural as well as governance. And so, the cultural side I've talked about, right, just making sure that we give these different teams a seat at the table, and they actually bring that technology into the mix. And we use the modern tools and technologies to make sure that everybody sort of plays nice together. We definitely have a ways to go there. But then, in terms of governance, that is another big problem that most companies are just starting to wrestle with. Because like I said, I mean, the data silos and shadow IT, that's been around, right? The only difference is that we're now sort of bringing everything together in a cloud environment, and the collective organization has access to that. And now we just realized, oh, we have quite a data problem on our hands. So how do we sort of organize this data, make sure that the quality is there, the trust is there? When people look at that data, a lot of those questions are now coming to the forefront because everything is sort of so transparent with the cloud, right? And so I feel like, again, putting in the right processes and the right tooling to address that is going to be critical in the years to come. >> Is sharing data across clouds something that is valuable to you, or even within a single cloud, being able to share data?
And my question is, not just within your organization, but even outside your organization, is that something that has sort of hit your radar, or is it mature, or is that something that really would add value to your business? >> Data sharing is huge, and again, this is another one of those things which isn't new. You know, I remember back in the '90s, when we had to share data externally, with our partners or our vendors, they used to physically send us stacks of these tapes, or physical media on some truck. And we've evolved since then, right? I mean, it went from that to sharing files online and so forth. But data sharing as a concept, and as a concept which is now very frictionless through these different technologies that we have today, that is very new. And that is something, like I said, it's always been going on, but that needs to be really embraced more as well. We as a company heavily leverage data sharing between our own different brands and business units. That helps us make that data mesh, so that when CNN, as an example, builds their own data model based on election data and the kinds of data that they need, they can compare that with other data in the rest of the company, sports, entertainment, and so forth and so on. Everyone has their unique data, but that data sharing capability brings it together wherever there is a need. So you think about having a Tiger Woods documentary, as an example, on HBO Max and making sure that you reach the audiences that are interested in golf and interested in sports and so forth, right? That all comes through the magic of data sharing, so it's really critical, internally, for us. And then externally as well, because just understanding how our products are doing on our partners' networks and different distribution channels, that's important, and then just understanding how our consumers are consuming it off our properties, right? I mean, we have brands that transcend just the screen, right?
We have a lot of physical merchandise that you can buy in the store. So again, understanding who's buying the Batman action figures after the Batman movie was released, that's another critical insight. So it all gets enabled through data sharing, and it's something we rely heavily on. >> So I wanted to get your perspective on this. So I feel like the nirvana of data mesh is if I want to use Google BigQuery, an Oracle database, or a Microsoft database, or Snowflake, Databricks, Amazon, whatever, that that's a node on the mesh. And in the perfect world, you can share that data, it can be governed. I don't think we're quite there today. But within a platform, maybe it's within Google or within Amazon or within Snowflake or Databricks, if you're in that world, maybe even Oracle, you actually can do some levels of data sharing, maybe greater with some than others. Do you mandate as an organization that you have to use this particular data platform, or are you saying, "Hey, we are architecting a data mesh for the future where we believe the technology will support that," or maybe you've invented some technology that supports that today? Can you help us understand that? >> Yeah, I always feel like mandate is a strong word, and it breeds the shadow IT and the data silos. So we don't mandate, but we do make sure that there's a consistent set of governance rules, policies, and tooling that's there, so that everyone is on the same page. However, at the same time, our focus is really operating in a federated way. That's been our solution, right? It's to make sure that we work within a common set of tooling, which may be different technologies, which in some cases may be different clouds, although we're not that multi-cloud.
So what we're trying to do is make sure that everyone who has that technology already built, as long as it sort of follows certain standards, it's modern, it has the capabilities that will eventually allow us to be successful and eventually allow for that data sharing amongst those different nodes, as you put it. As long as that's the case, and as long as there's a governance layer, a master governance layer, where we know where all that data is and who has access to what, and we can sort of be really confident about the quality of the data, as long as that's the case, our approach to that is really that federated approach. >> Sorry, did I hear you correctly, you're not multi-cloud today? >> Yeah, that's correct. There are certain spots where we use that, but by and large, we rely on a particular cloud, and that's just been, like I said, it's been the evolution, it was our evolution. We decided early on to focus on a single cloud, and that's the direction we've been going in. >> So, do you want to go to a multi-cloud, or, you mentioned organic before, if a business unit wants to go there, as long as they're adhering to those standards that you put out, maybe recommendations, that that's okay? I guess my question is, does that bring benefit to your business that you'd like to tap, or do you feel like it's not necessary? >> I'll go back to the point of, if it happens organically, we're going to be open about it. Obviously we'll have to look at every situation; not all clouds are created equal, so there's a number of different considerations. But by and large, when it happens organically, the key is time to value, right? How do you quickly bring those technologies in? As long as you can share the data, they're interconnected, they're secured, they're governed, we are confident in the quality, as long as those principles are met, we could definitely go in that direction.
But by and large, we're sort of evolving in a singular direction, but even within a singular cloud, we're a global company. And we have audiences around the world, so making sure that even within a single cloud, those different regions interoperate as one, that's a bigger challenge that we're having to solve as well. >> Last question is kind of to the future of data and cloud and how it's going to evolve. Do you see a day when companies like yours are increasingly going to be offering data, their software, services, and becoming more of a technology company, sort of pointing your tooling and your proprietary knowledge at the external world as a business opportunity? >> That's a very interesting concept, and I know companies have done that, and some of them have been extremely successful. I mean, Amazon is the biggest example that comes to mind, right-- >> Yeah. >> When they launched AWS, they took that expertise they had internally, and they offered it to the world as a product. But by and large, I think it's going to be few and far between, especially, it's going to be focused on companies that have technology as their DNA, or are in the technology sector, building technology. Most other companies have different markets that they are addressing. And in my opinion, a lot of these companies, what they're trying to do is really focus on the problems that we can solve for ourselves. I think there are more problems than we have people and expertise. So my guess is that most large companies, they're going to focus on solving their own problems. A few, like I said, more tech-focused companies that would want to be in that business would probably branch out, but by and large, I think companies will continue to focus on serving their customers and serving their own business. >> Alright, Ash, we're going to leave it there, Ash Naseer.
Thank you so much for your perspectives, it was great to see you, I'm sure we'll see you face-to-face later on this year. >> This is great, thank you for having me. >> Ah, you're welcome, alright. Keep it right there for more great content from SuperCloud2. We'll be right back. (gentle percussive music)

Published Date : Dec 27 2022

SUMMARY :

and the Super Cloud initiative in general, It's great to be back, And it's a comment that So the idea of a data mesh really helps us and how that's changed and making sure that they and that automated governance, and make sure that we implement it And you think about your brands and making sure that we use the concepts and so forth of the world, make sure that the quality or is it mature or is that something and the kinds of data that they need, And in the perfect world, so that everyone is on the same page. and that's the direction the key is time to value, right? and they offered it to Thank you so much for your perspectives, Keep it right there

ENTITIES

Entity	Category	Confidence
CNN	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Warner Bros.	ORGANIZATION	0.99+
TNT	ORGANIZATION	0.99+
Ash Naseer	PERSON	0.99+
HBO	ORGANIZATION	0.99+
Ash	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Nelu Mihai	PERSON	0.99+
each	QUANTITY	0.99+
June	DATE	0.99+
Microsoft	ORGANIZATION	0.99+
Las Vegas	LOCATION	0.99+
Game of Thrones	TITLE	0.99+
Databricks	ORGANIZATION	0.99+
Last August	DATE	0.99+
30 brands	QUANTITY	0.99+
30 plus brands	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
last spring	DATE	0.99+
Batman	PERSON	0.99+
Zhamak Dehghani	PERSON	0.99+
AWS	ORGANIZATION	0.98+
one basket	QUANTITY	0.98+
10 years ago	DATE	0.98+
today	DATE	0.98+
last decade	DATE	0.97+
Snowflakes	EVENT	0.95+
single cloud	QUANTITY	0.95+
one	QUANTITY	0.95+
two different companies	QUANTITY	0.94+
SuperCloud2	ORGANIZATION	0.94+
Tiger Woods	PERSON	0.94+
Warner Bros. Discovery	ORGANIZATION	0.92+
decades	QUANTITY	0.88+
this year	DATE	0.85+
SuperCloud22	EVENT	0.84+
'90s	DATE	0.84+
SuperCloud2	EVENT	0.83+
Monocloud	ORGANIZATION	0.83+
Snowflake Summit	LOCATION	0.77+
Super Cloud	EVENT	0.77+
a day	QUANTITY	0.74+
Busting Silos With	TITLE	0.73+
Hadoop era	DATE	0.66+
past decade	DATE	0.63+
Databricks	EVENT	0.63+
Max	TITLE	0.49+
BigQuery	TITLE	0.46+
Discovery	ORGANIZATION	0.44+

Veronika Durgin, Saks | The Future of Cloud & Data

(upbeat music) >> Welcome back to Supercloud 2, an open collaborative where we explore the future of cloud and data. Now, you might recall last August at the inaugural Supercloud event we validated the technical feasibility and tried to further define the essential technical characteristics, and of course the deployment models of so-called supercloud. That is, sets of services that leverage the underlying primitives of hyperscale clouds, but are creating new value on top of those clouds for organizations at scale. So we're talking about capabilities that fundamentally weren't practical or even possible prior to the ascendancy of the public clouds. And so today at Supercloud 2, we're digging further into the topic with input from real-world practitioners. And we're exploring the intersection of data and cloud, and importantly, the realities and challenges of deploying technology for a new business capability. I'm pleased to have with me in our studios, west of Boston, Veronika Durgin, who's the head of data at Saks. Veronika, welcome. Great to see you. Thanks for coming on. >> Thank you so much. Thank you for having me. So excited to be here. >> And so we have to say upfront, you're here, these are your opinions. You're not representing Saks in any way. So we appreciate you sharing your depth of knowledge with us. >> Thank you, Dave. Yeah, I've been doing data for a while. I try not to say how long anymore. It's been a while. But yeah, thank you for having me. >> Yeah, you're welcome. I mean, one of the highlights of this past year for me was hanging out at the airport with you after the Snowflake Summit. And we were just chatting about sort of data mesh, and you were saying, "Yeah, but." There was a yeah, but. You were saying there's some practical realities of actually implementing these things. So I want to get into some of that. And I guess starting from a perspective of how data has changed, you've seen a lot of the waves.
I mean, even if we go back to pre-Hadoop, you know, we would shove everything into an Oracle database, or, you know, Hadoop was going to save our data lives. And the cloud came along and, you know, that was kind of a disruptive force. And, you know, now we see things like, whether it's Snowflake or Databricks or these other platforms on top of the clouds. How have you observed the change in data and the evolution over time? >> Yeah, so I started as a DBA in the data center, kind of like, you know, growing up trying to manage whatever, you know, physical limitations a server could give us. So we had to be very careful of what we put in our database because we were limited. We, you know, purchased that piece of hardware, and we had to use it for the next, I don't know, three to five years. So, you know, we focused on only the most important critical things. We couldn't keep too much data. We had to be super efficient. We couldn't add additional functionality. And then Hadoop came along, which is like, great, we can dump all the data there, but then we couldn't get data out of it. So it was like, okay, great. Doesn't help either. And then the cloud came along, which was incredible. I was probably the most excited person. I'm lying, but I was super excited because I no longer had to worry about what I can actually put in my database. Now I have that, you know, scalability and flexibility with the cloud. So okay, great, that data's there, and I can also easily get it out of it, which is really incredible. >> Well, but so, I'm inferring from what you're saying with Hadoop, it was like, okay, no schema on write. And then you got to try to make sense out of it. But so what changed with the cloud? What was different? >> So I'll tell a funny story. I actually successfully avoided Hadoop. The only time- >> Congratulations. >> (laughs) I know, I'm like super proud of it.
I don't know how that happened, but the only time I worked for a company that had Hadoop, all I remember is that they were running jobs that were taking over 24 hours to get data out of it. And they were realizing that, you know, dumping data without any structure into this massive thing that required, you know, really skilled engineers wasn't really helpful. So what changed, and I'm kind of thinking of like, kind of like how Snowflake started, right? They were marketing themselves as a data warehouse. For me, moving from SQL Server to Snowflake was a non-event. It was comfortable, I knew what it was, I knew how to get data out of it. And I think that's the important part, right? Cloud is this, like, kind of vague, high-level, magical thing, but the reality is cloud is the same as what we had on prem. So it's comfortable there. It's not scary. You don't need super new additional skills to use it. >> But you're saying what's different is the scale. So you can throw resources at it. You don't have to worry about depreciating your hardware over three to five years. Hey, I have an asset that I have to take advantage of. Is that the big difference? >> Absolutely. Actually, from kind of an operational perspective, which is funny. Like, I don't have to worry about it. I use what I need when I need it. And not to take this completely in the opposite direction, but people stop thinking about using things in a very smart way, right? You, like, scale and you walk away. And then, you know, the cool thing about cloud is it's scalable, but you also should not use it when you don't need it. >> So what about this idea of multicloud? You know, supercloud sort of tries to go beyond multicloud. It's like multicloud by accident. And now, you know, whether it's M&A or, you know, some skunkworks team says, hey, I like Google's tools, so I'm going to use Google. And then people like you are called on to, hey, how do we clean up this mess?
And you know, you and I, at the airport, we were talking about data mesh. And I love the concept. Like, doesn't matter if it's a data lake or a data warehouse or a data hub or an S3 bucket. It's just a node on the mesh. But then, of course, you've got to govern it. You've got to give people self-serve. But this multicloud is a reality. So from your perspective, from a practitioner's perspective, what are the advantages of multicloud? We talk about the disadvantages all the time. Kind of get that, but what are the advantages? >> So I think the first thing when I think multicloud, I actually think high availability and disaster recovery. And maybe it's just how I grew up in the data center, right? We were always worried that if something happened in one area, we want to make sure that we can bring business up very quickly. So to me that's kind of like where multicloud comes to mind because, you know, you put your data, your applications, let's pick on AWS for a second and, you know, US East in AWS, which is the busiest kind of like area that they have. If it goes down, for my business to continue, I would probably want to move it to, say, Azure, hypothetically speaking, again, or Google, whatever that is. So to me, and probably again based on my background, disaster recovery and high availability come to mind as multicloud first, but now the other part of it is that there are, you know, companies and tools and applications that are being built in, you know, pick your cloud. How do we talk to each other? And more importantly, how do we data share? You know, I work with data. You know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving. So that's where supercloud comes to mind for me, in practical applications. How do we create that mesh, that network where we can easily share data with each other?
>> So you kind of answered my next question, is do you see use cases going beyond HA/DR? I mean, HA/DR was, remember, the original cloud use case. That and bursting, you know, for, you know, Thanksgiving or, you know, for Black Friday. So you see an opportunity to go beyond that with practical use cases. >> Absolutely. I think, you know, we're getting to a world where every company is a data company. We all collect a lot of data. We want to use it for whatever that is. It doesn't necessarily mean sell it, but use it to our competitive advantage. So how do we do it in a very smooth, easy way, which opens additional opportunities for companies? >> You mentioned data sharing. And that's obviously, you know, I met you at Snowflake Summit. That's a big thing of Snowflake's. And of course, you've got Databricks trying to do similar things with open technology. What do you see as the trade-offs there? Because Snowflake, you got to come into their party, you're in their world, and you're kind of locked into that world. Now they're trying to open up. You know, and of course, Databricks, they say, "Our world is wide open." Well, we know what that means, you know. The governance. And so now you're seeing, you saw Amazon come out with data clean rooms, which was, you know, that was a good idea that Snowflake had several years before. It's good. It's good validation. So how do you think about the trade-offs between kind of openness and freedom versus control? Is the latter just far more important? >> I'll tell you it depends, right? It's kind of like- >> Could be insulting to that. >> Yeah, I know. It depends because I don't know the answer. It depends, I think, on the use case and application. Ultimately every company wants to make money. That's the beauty of our like, capitalistic economy, right? We're driven 'cause we want to make money. But from the use, you know, how do I sell a product to somebody who's in Google if I am in AWS, right?
It's like, we're limiting ourselves if we just do one cloud. But again, it's difficult because at the same time, every cloud provider wants you to be locked into their cloud, which is probably why, you know, whoever has data sharing now has it: they want you to stay within their ecosystem. But then again, companies are limited. You know, there are applications that are starting to be built on top of clouds. How do we ensure that I can use that application regardless of what cloud my company is using, or I just happen to like? >> You know, and it's true, they want you to stay in their ecosystem 'cause they'll make more money. But think about Apple, right? Does Apple do it 'cause they can make more money? Yes, but they also have more control, right? Am I correct that technically it's going to be easier to govern that data if it's all the same standard? >> Absolutely. 100%. I didn't answer that question. You have to govern and you have to control. And honestly, it's not a nice-to-have anymore. There are compliance requirements. There are legal compliance requirements around data. Everybody at some point wants to ensure that, you know, and as a person, quite honestly, I don't like it when my data's used and I don't know how. Like, it's a little creepy, right? So we have to come up with standards around that. But then I also think back in the day to EDI, right? Electronic data interchange. That was figured out. There were standards. Companies were sending data to each other. It was pretty standard. So I don't know. Like, we'll get there. >> Yeah, so I was going to ask you, do you see a day where open standards actually emerge to enable that? And then isn't that the great disruptor to the proprietary stack? >> I think so. I think for us to smoothly exchange data across various systems and various applications, we'll have to agree on standards.
>> From a developer perspective, you know, back to the supercloud concept, one of the essential characteristics is that you've got this PaaS layer that provides consistency across clouds, and it has unique attributes specific to the purpose of that supercloud. So in the instance of Snowflake, it's data sharing. In the case of, you know, VMware, it might be infrastructure, or self-serve infrastructure that's consistent. From a developer perspective, what do you hear from developers in terms of what they want? Are we close to getting that across clouds? >> I think developers always want freedom and the ability to engineer. And oftentimes, (laughs) you know, just as an engineer, I always want to build something, and it's not always for a specific use; it's something I want to do versus what is actually applicable. I think we'll land there, but not because we are, you know, acting out of the kindness of our own hearts. I think as a necessity we will have to agree to standards, and that'll move the needle. Yeah. >> What are the limitations that you see of cloud and this notion of, you know, even cross-cloud, right? I mean, one cloud can't do it all. You know, but what do you see as the limitations of clouds? >> I mean, it's funny, I always think, you know, again, probably because of my background, I grew up in the data center. We were physically limited by space, right? You can only put so many servers in the rack and so many racks in the data center, and then you run out of space. Earth has limited space, right? And we have so many data centers, and everybody's collecting a lot of data that we actually want to use. We're not just collecting for the sake of collecting it anymore. We truly can take advantage of it because servers have enough power, right, to crank through it. We will run out of space. So how do we balance that?
How do we balance that data across all the various data centers? And I know I'm maybe talking crazy, but until we figure out how to build a data center on the Moon, right, we will have to figure out how to take advantage of all the compute capacity that we have across the world. >> And where does latency fit in? I mean, is it as much of a problem as people think it is? Maybe it depends too. It depends on the use case. But do multiple clouds help solve that problem? Because, you know, even AWS, an $80 billion company, they're huge, but they're not everywhere. You know, they're doing Local Zones, they're doing Outposts, which are, you know, less functional than their full cloud. So maybe I would choose to go to another cloud. And if I could have that common experience, that's an advantage, isn't it? >> 100%, absolutely. And potentially there are some pricing tiers, right? So we're talking about latency. And again, it depends on your situation. You know, if you have some sort of medical equipment that is very latency sensitive, you want to make sure that data lives there. Versus, you know, if I browse a website and it takes one second versus two seconds to load, do I care? Not exactly. Like, I don't notice that. So we can reshuffle that in a smart way. And I keep thinking of Waze. What if we had Waze for data, where it kind of says, oh, you're stuck in traffic, go this way? You know, reshuffle you through that data center. You know, maybe your data will live there. So I think it's totally possible. I know, it's a little crazy. >> No, I like it, though. But remember when you first found Waze, you were like, "Oh, this is awesome." And then now it's like- >> And it's like crowdsourcing, right? Like, it's smart. Like, okay, maybe, you know, I'm going to pick on US East for Amazon for a little bit, their oldest but also busiest data center that, you know, periodically goes down.
>> But then you lose your competitive advantage 'cause now it's like traffic socialism. >> Yeah, I know. >> Right? It happened the other day where everybody's going this way. There's all the Wazers taking it. >> And also, again, compliance, right? Every country is going down the path where, you know, data needs to reside within that country. So it's not as socialist or democratic as we wish it to be. >> Well, that's a great point. I mean, when you just think about the clouds, the limitations, now you go out to the edge. I mean, everybody talks about the edge and IoT. Do you actually think that there's a whole new stovepipe that's going to get created? And does that concern you, or do you think it actually is going to be, you know, connective tissue with all these clouds? >> I honestly don't know. I live in a practical world of, like, how does it help me right now? How does it, you know, help me in the next five years? And mind you, in five years, things can change a lot. Because if you think back five years, things weren't as they are right now. I mean, I really hope that somebody out there challenges things 'cause, you know, the whole cloud promise was crazy. It was insane. Like, who came up with it? Why would I do that, right? And now I can't imagine the world without it. >> Yeah, I mean a lot of it is same wine, new bottle. You know, but a lot of it is different, right? I mean, technology keeps moving us forward, doesn't it? >> Absolutely. >> Veronika, it was great to have you. Thank you so much for your perspectives. If there was one thing that the industry could do for your data life that would make your world better, what would it be? >> I think standards for data sharing and data marketplaces. I would love, love, love nothing more than to have some agreed-upon standards. >> I had one other question for you, actually. I forgot to ask you this. 'Cause you were saying every company's a data company. Every company's a software company.
We're already seeing it, but how prevalent do you think it will be that companies, and you've seen some of it in financial services, begin to take their own data, their own tooling, their own software, which they've developed internally, and point that to the outside world? Kind of do what AWS did. You know, working backwards from the customer and saying, "Hey, we did this for ourselves. We can now do this for the rest of the world." Do you see that as a real trend, or is that Dave's pie in the sky? >> I think it's a real trend. Every company's trying to reinvent itself and come up with new products. And every company is a data company. Every company collects data, and they're trying to figure out what to do with it. And again, it's not necessarily to sell it. Like, you don't have to sell data to monetize it. You can use it with your partners. You can exchange data. You know, you can create products. Capital One, I think, created a product for Snowflake pricing. I don't recall exactly, but, you know, they built it for themselves, and they decided to monetize it. And I'm absolutely 100% on board with that. I think it's an amazing idea. >> Yeah, Goldman is another example. Nasdaq is basically taking their exchange stack and selling it around the world. And the cloud is available to do that. You don't have to build your own data center. >> Absolutely. Or for good, right? Like, we're talking about, again, we live in a capitalist country, but use data for good. We're collecting data. We're, you know, analyzing it, we're aggregating it. How can we use it for the greater good of the planet? >> Veronika, thanks so much for coming to our Marlborough studios. Always a pleasure talking to you. >> Thank you so much for having me. >> You're really welcome. All right, stay tuned for more great content. From Supercloud 2, this is Dave Vellante. We'll be right back. (upbeat music)

Published Date : Dec 27 2022


ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Veronika	PERSON	0.99+
Veronika Durgin	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Apple	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
two seconds	QUANTITY	0.99+
Saks	ORGANIZATION	0.99+
$80 billion	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
three	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
last August	DATE	0.99+
Capital One	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
M&A	ORGANIZATION	0.99+
Skunkworks	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
Nasdaq	ORGANIZATION	0.98+
Supercloud 2	EVENT	0.98+
Earth	LOCATION	0.98+
Databricks	ORGANIZATION	0.98+
Supercloud	EVENT	0.98+
today	DATE	0.98+
Snowflake Summit	EVENT	0.98+
US East	LOCATION	0.98+
five years ago	DATE	0.97+
SQL Server	TITLE	0.97+
first thing	QUANTITY	0.96+
Boston	LOCATION	0.95+
Black Friday	EVENT	0.95+
Hadoop	TITLE	0.95+
over 24 hours	QUANTITY	0.95+
one	QUANTITY	0.94+
first	QUANTITY	0.94+
supercloud	ORGANIZATION	0.94+
one thing	QUANTITY	0.93+
Moon	LOCATION	0.93+
Thanksgiving	EVENT	0.93+
over three	QUANTITY	0.92+
one other question	QUANTITY	0.91+
one cloud	QUANTITY	0.9+
one area	QUANTITY	0.9+
Snowflake	TITLE	0.89+
multicloud	ORGANIZATION	0.86+
Azure	ORGANIZATION	0.85+
Supercloud 2	ORGANIZATION	0.83+
> 100%	QUANTITY	0.82+
Goldman	ORGANIZATION	0.81+
Snowflake	EVENT	0.8+
a second	QUANTITY	0.73+
several years before	DATE	0.72+
this past year	DATE	0.71+
second	QUANTITY	0.7+
Marlborough	LOCATION	0.7+
supercloud	TITLE	0.66+
next five years	DATE	0.65+
multicloud	TITLE	0.59+
PaaS	TITLE	0.55+

Breaking Analysis: Grading our 2022 Enterprise Technology Predictions

>>From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >>Making technology predictions in 2022 was tricky business, especially if you were projecting the performance of markets, identifying IPO prospects, or making binary forecasts on data, AI, the macro spending climate, and other related topics in enterprise tech. 2022, of course, was characterized by a seesaw economy where central banks were restructuring their balance sheets, the war in Ukraine fueled inflation, supply chains were a mess, and the unintended consequences of the forced march to digital and its acceleration are still being sorted out. Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this Breaking Analysis, we continue our annual tradition of transparently grading last year's enterprise tech predictions. You may or may not agree with our self-grading system, but look, we're gonna give you the data, and you can draw your own conclusions and tell us what you think. >>All right, let's get right to it. So our first prediction was that tech spending would increase by 8% in 2022. As we exited 2021, CIOs were optimistic about their digital transformation plans. You know, they rushed to make changes to their business and were eager to sharpen their focus, continue to iterate on their digital business models, and plug the holes revealed by their learnings. And so we predicted that 8% rise in enterprise tech spending, which looked pretty good until Ukraine, and until the Fed decided that, you know, it had to rush and make up for lost time. We kind of nailed the momentum in the energy sector, but we can't give ourselves too much credit for that layup. And as of October, Gartner had spending growing at just over 5%. I think it was 5.1%. So we're gonna take a C-plus on this one and move on. >>Our next prediction was basically kind of a slow ground ball to second base, if I have to be honest, but we felt it was important to highlight that security would remain front and center as the number one priority for organizations in 2022. As is our tradition, you know, we try to up the degree of difficulty by specifically identifying companies that are gonna benefit from these trends. So we highlighted some possible IPO candidates, which of course didn't pan out. Snyk was on our radar. The company just had to do another raise, and they recently took a valuation hit; it was a down round. They raised $196 million, so a good chunk of cash, but not the IPO that we had predicted. Aqua Security, with its focus on containers and cloud native, was a trendy call, and we thought maybe an MSSP, a managed security service provider like Arctic Wolf, would IPO, but no way that was happening in this crummy market. >>Nonetheless, we think these types of companies are still faring well, as the talent shortage in security remains really acute, particularly in the mid-size and small businesses that often don't have a SOC. Lacework laid off 20% of its workforce in 2022, and co-CEO Dave Hatfield left the company, so that IPO didn't happen. It was probably too early for Lacework anyway. Meanwhile, you've got Netskope, which we've cited as strong in the ETR data, particularly in the Emerging Technology Survey. And then, you know, Illumio is holding its own. We never liked the $7 billion price tag that Okta paid for Auth0, but we loved the TAM expansion strategy of targeting developers beyond Okta's enterprise strength. But we've gotta take some points off for Okta's failure thus far to really nail the integration and the go-to-market model with Auth0 and bring that into core Okta.
>>So the focus on endpoint security was a winner in 2022, as CrowdStrike led that charge, with others holding their own, not the least of which was Palo Alto Networks, as it continued to expand beyond its core network security and firewall business, you know, through acquisition. So overall we're gonna give ourselves an A-minus for this relatively easy call, but again, we had some specifics associated with it to make it a little tougher. And of course we're watching the vendor consolidation trend very closely this coming year, in 2023. According to a recent Palo Alto Networks survey of 1,300 SecOps pros, organizations on average have more than 30 tools to manage security. So consolidating vendors, and redundant vendors in particular, is a logical way to optimize cost. The ETR data shows that's clearly a trend on the upswing. >>Now moving on, a big theme of 2020 and 2021, of course, was remote work and hybrid work and new ways to work and return to work. So we predicted in 2022 that hybrid work models would become the dominant protocol, which is clearly the case. We predicted that about 33% of the workforce would come back to the office in 2022. In September, the ETR data showed that figure was at 29%, but organizations expected that 32% would be in the office, you know, pretty much full-time by year end. That hasn't quite happened, but we were pretty close with the projection, so we're gonna take an A-minus on this one. Now, supply chain disruption was another big theme that we felt would carry through 2022. And sure, that sounds like another easy one, but as is our tradition, again, we try to put some binary metrics around our predictions to put some meat on the bone, so to speak, and allow us and you to say, okay, did it come true or not? >>So we had some data that we presented last year on supply chain issues impacting hardware spend.
We said at the time, and you can see this on the left-hand side of this chart, that PC and laptop demand would remain above pre-COVID levels, which would reverse a decade of year-on-year declines that I think started around 2011 or 2012. Now, while demand is down pretty substantially this year relative to 2021, IDC has worldwide unit shipments for PCs at just over 300 million for '22. If you go back to 2019, you're looking at, let's say, around 260 million units shipped globally, you know, roughly. So, pretty good call there. Definitely much higher than pre-COVID levels. So you might be asking, why the B? Well, we projected that 30% of customers would replace security appliances with cloud-based services and that more than a third would replace their internal data center server and storage hardware with cloud services, so 30% and 40%, respectively. >>And we don't have explicit survey data on exactly these metrics, but anecdotally we see this happening in earnest. And we do have some data that we're showing here on cloud adoption from ETR's October survey, where the midpoint of workloads running in the cloud is around 34% and is forecast, as you can see, to grow steadily over the next three years. Look, we understand it's not a one-to-one correlation with our prediction, but it's a pretty good bet that we were right. Still, we gotta take some points off, we think, for the lack of unequivocal proof. 'Cause again, we always strive to make our predictions in ways that can be measured as accurate or not. Is it binary? Did it happen, or did it not? Kind of like an OKR, and you know, we strive to provide data as proof, and in this case it's a bit fuzzy. >>We have to admit that, although we're pretty comfortable that the prediction was accurate. And look, when you make a hard forecast, sometimes you gotta pay the price.
All right, next, we said in 2022 that the big four cloud players would generate $167 billion in IaaS and PaaS revenue, combining for 38% market growth. Our current forecasts are shown here with a comparison to our January 2022 figures. So coming into this year, versus where we are today, we currently expect $162 billion in total revenue and a 33% growth rate. Still very healthy, but not on our mark. We think AWS is gonna miss our prediction by about a billion dollars, which is not bad for an $80 billion company. But they're not gonna hit that expectation of getting really close to a hundred-billion-dollar run rate. We thought they'd exit the year closer to $25 billion a quarter, and we don't think they're gonna get there. >>Look, we pretty much nailed Azure. And even though our prediction was correct about Google Cloud Platform surpassing Alibaba, we way overestimated the performance of both of those companies. So we're gonna give ourselves a C-plus here. You might think it's a little bit harsh, and we could argue for a B-minus to the professor, but the misses on GCP and Alibaba, we think, warrant a self-penalty on this one. All right, let's move on to our prediction about supercloud. We said it would become a thing in 2022, and we think by many accounts it has. Despite the naysayers, we're seeing clear evidence that the concept of a layer of value add that sits above and across clouds is taking shape. And on this slide we show just some of the pickup in the industry. I mean, one of the most interesting is Cloudflare. >>The biggest supercloud antagonist, Charles Fitzgerald, even predicted that no vendor would ever use the term in their marketing, and that if that happened, it would be proof that supercloud was a thing. He said it would never happen. Well, Cloudflare has, and they launched their version of supercloud at their Developer Week.
Chris Miller of The Register put out a supercloud block diagram, something else that Charles Fitzgerald was pushing us for, and rightly so; it was a good call on his part. Chris Miller actually came up with one that's pretty good, and David Linthicum has also produced a block diagram that's kind of similar. David uses the terms metacloud and supercloud kind of interchangeably to describe that trend, so we're aligned on that front. Brian Gracely has covered the concept on the popular Cloudcast podcast. Berkeley launched the Sky Computing initiative. >>You read through that white paper, and many of the concepts highlighted in the Supercloud 3.0 community-developed definition align with it. Walmart launched a platform with many of the supercloud's salient attributes. So did Goldman Sachs, so did Capital One, so did Nasdaq. So, you know, sorry, you can hate the term, but very clearly the evidence is gathering for the supercloud storm. We're gonna take an A-plus on this one. Sorry, haters. Alright, let's talk about data mesh. In our 2021 predictions post, we said that in the 2020s, 75% of large organizations are gonna re-architect their big data platforms. So, kind of a decade-long prediction. We don't always like to do that, but sometimes it's warranted. And because it was a longer-term prediction, when we were evaluating our 2021 predictions coming into '22, we took a grade of incomplete, because it was a prediction for the better part of the decade. >>So earlier this year, we said our number seven prediction was that data mesh would gain momentum in '22, but that it would be largely confined to narrow data problems with limited scope, as you can see here with some of the key bullets.
So there's a lot of discussion in the data community about data mesh, and while there are an increasing number of examples, JPMorgan Chase, Intuit, HSBC, HelloFresh, and others, that are rearchitecting parts of their data platforms, completely rearchitecting entire data platforms is non-trivial. There are organizational challenges, data ownership debates, technical considerations, and in particular, two of the four fundamental data mesh principles, the need for self-service infrastructure and federated computational governance, are challenging. Look, democratizing data and facilitating data sharing creates conflicts with regulatory requirements around data privacy. As such, many organizations are being really selective with their data mesh implementations, hence our prediction of narrowing the scope of data mesh initiatives. >>I think that was right on. JPMC is a good example of this, where you've got a single group within a division narrowly implementing the data mesh architecture. They're using AWS, they're using data lakes, they're using AWS Glue to create a catalog, and a variety of other techniques to meet their objectives. They're automating data quality, and it was a pretty well-thought-out and interesting approach. I think it's gonna be made easier by some of the announcements that Amazon made at the recent re:Invent, particularly trying to eliminate ETL, better connections between Aurora and Redshift, and better data sharing with the data clean room. So a lot of that is gonna help. Of course, Snowflake has been on this for a while now. Many other companies are facing, you know, limitations with their Hadoop data platforms, as we said here on this slide. They need to do some new thinking around that to scale. HelloFresh is a really good example of this.
Look, the bottom line is that organizations want to get more value from data, and having centralized, highly specialized teams that own the data problem has been a barrier and a blocker to success. Data mesh starts with organizational considerations, as described in great detail by Ash Nair of Warner Brothers. So take a listen to this clip. >>Yeah, so when people think of Warner Brothers, you always think of the movie studio, but we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN. We have 30-plus brands in our portfolio, and each has its own needs. So the idea of a data mesh really helps us, because we can federate access across the company so that, you know, CNN can work at its own pace. You know, when there's election season, they can ingest their own data, and they don't have to, you know, bump up against HBO, as an example, if Game of Thrones is going on. >>So it's often the case that data mesh is in the eyes of the implementer. A company's implementation may not strictly adhere to Zhamak Dehghani's vision of data mesh, and that's okay; the goal is to use data more effectively. And despite Gartner's attempts to de-position data mesh in favor of the somewhat confusing, or frankly far more confusing, data fabric concept that they stole from NetApp, data mesh is taking hold in organizations globally today. So we're gonna take a B on this one. The prediction is shaping up the way we envisioned, but as we previously reported, it's gonna take some time, the better part of a decade in our view. New standards have to emerge to make this vision a reality, and they'll come in the form of both open and de facto approaches. Okay, our eighth prediction last year focused on the face-off between Snowflake and Databricks.
>>And we realize this is a popular topic, and maybe one that's getting a little overplayed, but these are two companies that initially, you know, looked like they were shaping up as partners, and by the way, they are still partnering in the field. If you go back a couple of years, the idea of using AWS infrastructure and Databricks machine intelligence and applying that on top of Snowflake as a facile data warehouse was, and still is, very viable. But both of these companies have much larger ambitions. They've got big total available markets to chase and large valuations that they have to justify. So what's happening is, as we've previously reported, each of these companies is moving toward the other firm's core domain, and they're building out an ecosystem that'll be critical for their future. As part of that effort, we said each was gonna become an aggressive investor and maybe start doing some M&A, and they have invested in various companies. >>And on this chart that we produced last year, we studied some of the companies that were targets, and we've added some recent investments by both Snowflake and Databricks. As you can see, they've both, for example, invested in Alation. Snowflake's put money into Lacework, the security firm; ThoughtSpot, which is trying to democratize data with AI; and Collibra, a governance platform. And you can see Databricks' investments in data transformation with dbt Labs and Matillion, simplified business intelligence, and Hunters, which is, you know, their security investment, and so forth. So other than our thought that we'd see a Databricks IPO last year, this prediction has been pretty spot on. So we'll give ourselves an A on that one. Now, observability has been a hot topic, and we've been covering it for a while with our friends at ETR, particularly Eric Bradley. Our number nine prediction last year was basically that if you're not cloud native in observability, you are gonna be in big trouble. >>So everything's gotta go cloud native.
And that's clearly been the case. Splunk, the big player in the space, has been transitioning to the cloud, and it hasn't always been pretty, as we reported. Datadog has real momentum, and there's the ELK stack, that open source model. You've got new entrants that we've cited before, like Observe, Honeycomb, ChaosSearch, and others that we've reported on, and they're all born in the cloud. So we're gonna take another A on this one. Admittedly, yeah, it's a reasonably easy call, but you gotta have a few of those in the mix. Okay, our last prediction, number 10, was around events, something theCUBE knows a little bit about. We said that a new category of events would emerge as hybrid, and for the most part that has happened. So hybrid is gonna be the mainstay, is what we said: pure-play virtual events are gonna give way to hybrid. >>And the narrative is that virtual-only events are, you know, good for quick hits but lousy replacements for in-person events. That said, organizations of all shapes and sizes learned how to create better virtual content and support remote audiences during the pandemic. So when we said pure play is gonna give way to hybrid, we specified that the physical VIP experience is going to define the overall experience, that those VIP events would create a little FOMO, fear of missing out, and that a virtual component would overlay them and serve an audience 10x the size of the physical one. We saw two really good examples of that. Red Hat Summit in Boston was a small event, a couple thousand people, and served tens of thousands, you know, online. Second was the Google Cloud Next VIP event in New York City. >>Everything else was virtual. You know, even examples of our prediction of metaverse-like immersion have popped up, and other companies are doing roadshows as we predicted; a lot of companies are doing it.
You're seeing that as a major trend, where organizations are going out into the regions with their sales teams and doing a little belly-to-belly action as opposed to the big giant event. That's definitely a trend that we're seeing. So in reviewing this prediction, the grade we gave ourselves is, you know, maybe a bit unfair. You could argue for a higher grade, but organizations still haven't figured it out. They have hybrid experiences, but they generally do a really poor job of leveraging the afterglow of an event. It still tends to be one and done: let's move on to the next event or the next city. >>Let the sales team pick up the pieces if they were paying attention. So because of that, we're only taking a B-plus on this one. Okay, so that's the review of last year's predictions. You know, overall, if you average out our grades on the 10 predictions, that comes out to a B-plus. I dunno why we can't seem to get that elusive A, but we're gonna keep trying. Our friends at ETR and we are starting to look at the data for 2023 from the surveys and all the work that we've done on theCUBE and in our analysis, and we're gonna put together our predictions. We've had literally hundreds of inbounds from PR pros pitching us. We've got this huge thick folder that we've started to review with our yellow highlighter. Our plan is to review it this month, take a look at all the data, and get some ideas from the inbounds, and the ETR January survey is in the field. >>It's probably got a little over a thousand responses right now. You know, they'll get up to, you know, 1,400 or so. And once we've digested all that, we're gonna go back and publish our predictions for 2023 sometime in January. So stay tuned for that. All right, we're gonna leave it there for today. I want to thank Alex Myerson, who's on production and manages the podcast, and Ken Schiffman as well, out of our Boston studio.
I've got a really heartfelt thank you to Kristen Martin and Cheryl Knight and their team. They help get the word out on social and in our newsletters. Rob Hof is our editor in chief over at SiliconANGLE, who does some great editing for us. Thank you all. Remember, all these episodes are available as podcasts wherever you listen; all you do is search Breaking Analysis podcast. We're really getting some great traction there. Appreciate you guys subscribing. I publish each week on wikibon.com and siliconangle.com, or you can email me directly at david.vellante@siliconangle.com or DM me @dvellante, or you can comment on my LinkedIn posts. And please check out etr.ai for the very best survey data in the enterprise tech business. Some awesome stuff in there. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.

Published Date : Dec 18 2022

SUMMARY :

From the Cube Studios in Palo Alto in Boston, bringing you data-driven insights from self grading system, but look, we're gonna give you the data and you can draw your own conclusions and tell you what, We kind of nailed the momentum in the energy but not the i p O that we had predicted Aqua Securities focus on And then, you know, I lumia holding its own, you So the focus on endpoint security that was a winner in 2022 is CrowdStrike led that charge put some meat in the bone, so to speak, and and allow us than you to say, okay, We said at the time, you can see this on the left hand side of this chart, the PC laptop demand would remain Kind of like an O K R and you know, we strive to provide data We thought they'd exit the year, you know, closer to, you know, 25 billion a quarter and we don't think they're we think, yeah, you might think it's a little bit harsh, we could argue for a B minus to the professor, Chris Miller of the register put out a Supercloud block diagram, something else that So you know, sorry you can hate the term, but very clearly the evidence is gathering for the super cloud But it's largely confined and narrow data problems with limited scope as you can see here with some of the announcements that Amazon made at the recent, you know, reinvent, particularly trying to the company so that, you know, CNN can work at their own pace. So it's often the case that data mesh is in the eyes of the implementer. but these are two companies that initially, you know, looked like they were shaping up as partners and they, So that's, you know, they're security investment and so forth. So that's gonna be the mainstay is what we And the narrative is that virtual only events are, you know, they're good for quick hits, the grade we gave ourselves is, you know, maybe a bit unfair, it should be, you could argue for a higher grade, You know, overall if you average out our grade on the 10 predictions that come out to a b plus, You know, they'll get up to, you know,

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Chris Miller	PERSON	0.99+
CNN	ORGANIZATION	0.99+
Rob Hof	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
5.1%	QUANTITY	0.99+
2022	DATE	0.99+
Charles Fitzgerald	PERSON	0.99+
Dave Hatfield	PERSON	0.99+
Brian Gracely	PERSON	0.99+
2019	DATE	0.99+
Lacework	ORGANIZATION	0.99+
two	QUANTITY	0.99+
GCP	ORGANIZATION	0.99+
33%	QUANTITY	0.99+
Walmart	ORGANIZATION	0.99+
David	PERSON	0.99+
2021	DATE	0.99+
20%	QUANTITY	0.99+
Kristen Martin	PERSON	0.99+
Palo Alto	LOCATION	0.99+
2020	DATE	0.99+
Ash Nair	PERSON	0.99+
Goldman Sachs	ORGANIZATION	0.99+
162 billion	QUANTITY	0.99+
New York City	LOCATION	0.99+
Databricks	ORGANIZATION	0.99+
October	DATE	0.99+
last year	DATE	0.99+
Arctic Wolf	ORGANIZATION	0.99+
two companies	QUANTITY	0.99+
38%	QUANTITY	0.99+
September	DATE	0.99+
Fed	ORGANIZATION	0.99+
JP Morgan Chase	ORGANIZATION	0.99+
80 billion	QUANTITY	0.99+
29%	QUANTITY	0.99+
32%	QUANTITY	0.99+
21 predictions	QUANTITY	0.99+
30%	QUANTITY	0.99+
HBO	ORGANIZATION	0.99+
75%	QUANTITY	0.99+
Game of Thrones	TITLE	0.99+
January	DATE	0.99+
2023	DATE	0.99+
10 predictions	QUANTITY	0.99+
both	QUANTITY	0.99+
22	QUANTITY	0.99+
ThoughtSpot	ORGANIZATION	0.99+
196 million	QUANTITY	0.99+
30	QUANTITY	0.99+
each	QUANTITY	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
2020s	DATE	0.99+
167 billion	QUANTITY	0.99+
Okta	ORGANIZATION	0.99+
Second	QUANTITY	0.99+
Gartner	ORGANIZATION	0.99+
Eric Bradley	PERSON	0.99+
Aqua Securities	ORGANIZATION	0.99+
Dante	PERSON	0.99+
8%	QUANTITY	0.99+
Warner Brothers	ORGANIZATION	0.99+
Intuit	ORGANIZATION	0.99+
Cube Studios	ORGANIZATION	0.99+
each week	QUANTITY	0.99+
7 billion	QUANTITY	0.99+
40%	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+

Joshua Haslett, Google | Palo Alto Networks Ignite22

>> Narrator: TheCUBE presents Ignite '22, brought to you by Palo Alto Networks. >> Greetings from the MGM Grand Hotel in beautiful Las Vegas. It's theCUBE Live, day two of our coverage of Palo Alto Networks Ignite '22. Lisa Martin with Dave Vellante. Dave, what can I say? This has been a great couple of days. The amount of content we have created and shared with our viewers on theCUBE is second to none. >> Well, the cloud has completely changed the way that people think about security. >> Yeah. You know, at first it was like, oh, the cloud, how can that be secure? And they realized, wow, actually cloud is pretty secure if we do it right. And so the shared responsibility model and partnerships are critical. >> Partnerships are critical, especially as more and more organizations are multicloud by default. Right? Today we're going to bring Google into the conversation. Josh Haslett joins us, Strategic Partnership Manager at Google. Welcome. Great to have you, Josh. >> Hi Lisa, thanks for having me here. >> So you are a secret squirrel from Palo Alto Networks. Talk to me a little bit about your background and about your role at Google in terms of partnership management. >> Sure, I feel like we need to add that to my title. [Lisa] You should, secret squirrel. >> Great. Yeah, so as a matter of fact, I've been at Google for two and a half years. Prior to that, I was at Palo Alto Networks. I was managing the business development relationship with Google, and I was kind of there at the inception, when Nikesh came in and decided that we needed to think about how to do security in a new way from a platform standpoint, right? And so it was exciting, because when I started with the partnership, we were still focusing on securing, you know, workloads in the cloud with next generation firewall. And then, as we went through the acquisitions Palo Alto added, it expanded the capabilities of what we could do for cloud security.
And so it was very exciting, you know, to make sure that we could onboard with Google Cloud, and take a look at how Palo Alto was not only enhancing their solutions as they built and delivered them from Google Cloud, but also how we helped customers adopt cloud more easily by making things, you know, more tightly integrated. And so that's really been a lot of what I've been involved in, and it's been exciting to see the growth of both organizations as we see customers shifting to cloud transformation, and then how they deploy these new methodologies and tools from a security perspective to embrace this new way of working and this new way of, you know, creating applications and doing digital transformation. >> Important, since work is no longer a place, it's an activity. Organizations have to be able to cater to the distributed workforce. Of course, the workforce has to be able to access everything that they need to, but it has to be done in a secure way, regardless of what kind of company you are. >> Yeah, you're right, Lisa. It's interesting. I mean, the pandemic has really changed and accelerated that transformation. I think, you know, remote working had really started before that. And I think Nikesh called that out in the keynote too, right? He really said that this has been ongoing for a while, but I think, you know, organizations had to figure out how to scale, and that was something that they weren't as prepared for. And a lot of the technology that was deployed for VPN connectivity or supporting remote work was fixed hardware. And so cloud deployment and cloud architecture, specifically with Prisma Access, really enabled this transformation to happen in a much faster, you know, manner. And where we've come together is how we make sure that customers are secure, no matter what device, what user, what application they're accessing.
As we take a look at ZTNA, Zero Trust Network Access 2.0, how can we come together as partners to make sure customers have that wide range of coverage and capability? >> How would you describe, Josh, Google's partner strategy generally, and specifically, you know, in the world of cyber, and what makes it unique and different? >> Yeah, so that's a great question. I think, you know, from a Google Cloud perspective, we heard TK mention this in the keynote with Nikesh. You know, we focus on building a secure platform first and foremost, right? We want to be a trusted cloud for customers to deploy on. And so, you know, we find that customers do one of two things: they're looking at, you know, reducing cost as they move to cloud and consolidate workloads, or they're embracing innovation and looking at, you know, leveraging things like BigQuery for analytics and, you know, machine learning for the way that they want to innovate and stay ahead of the competition. Either way, they have to think about how they secure in a new way. And so not only do we work on securing our own platform, we work with trusted partners to make sure that customers have, as you mentioned earlier, Dave, the shared security model, right? How do they take a look at their applications and their workloads and this new way of working as they go to CI/CD pipelines and start thinking about DevSecOps? How do they integrate tooling that is frictionless and seamless for their teams to deploy, but allows them to quickly embrace that cloud transformation journey? And so, yes, partners are critical to that. The other thing is, you know, we find, as you mentioned earlier, Lisa, that customers are multicloud, right? That's kind of the new normal as we look at enterprises today.
And so Google Cloud's going to do a great job at securing our platform, but we need partners that can help customers deploy policy that embraces not only the things that they put in Google Cloud but also, as they're on their transformation journey, the estates that are in data centers, the things that are still on-prem. And really this is about making sure that the applications no matter where they are, the databases no matter where they are, and the users no matter where they are, are all secure in that new framework of deploying and embracing innovation on public cloud. >> One of the things that almost everybody from Palo Alto Networks talks about is their partnering strategy, their acquisition strategy, integrations. And I was doing some research. There are over 50 joint integrations between Google Cloud and Palo Alto Networks. You talked about Zero Trust Network Access 2.0, which was announced yesterday. >> Correct. >> Give us a flavor of what that is and what it delivers that 1.0 did not. >> Well, great. And what I'd like to do is touch a little bit on those 50 integrations, because it's been, you know, a building rolling thunder, shall we say, as far as how we've taken a look at customers embracing the cloud. The first thing was we took a look at how we make sure that Palo Alto solutions are easier for customers to deploy and to orchestrate in Google Cloud, making their journey to embracing cloud seamless and easy. The second thing was how we could make that deployment and the infrastructure even easier to adopt by doing first party integrations. So earlier this year we announced Cloud IDS, our intrusion detection system, where customers can, first party, directly in our console, simply select that they want to turn on inspection of the traffic that's running on Google Cloud, and it leverages the threat detection capability from Palo Alto Networks. So we've gone from third party integration alone to first party integration.
And that really takes us to, you know, the direction of what we're seeing customers need to embrace now, which is their Zero Trust strategy, and Zero Trust 2.0 helps customers do a number of things. The first is, you know, we don't want to just verify a user and their access into the environment once. It needs to be continuous inspection, right? 'Cause their state could change. I think, you know, the teams were talking about some really good ways of addressing that, you know, for instance, TSA checkpoints, right? And how does that experience look? We need to make sure that we're constantly evaluating that user's access into the environment, and then we need to make sure that the content that's being accessed or, you know, loaded into the environment is inspected. So we need continuous content inspection. And that's where our partnership really comes together very well. Not only can we take care of any app, any device, any user, but especially as we take a look at, you know, embracing contractor-like use cases, for instance, where we have managed devices and unmanaged devices, we bring together BeyondCorp and Prisma Access to make sure any device, any user, any application is secure throughout. And then we've got content inspection as part of what that ZTNA 2.0 experience looks like. >> Josh, that threat data that you just talked about. >> Yeah. >> Who has access to that? Is it available to any partner, any customer? How... it seems like there's gold in them thar hills, so. >> There is. But this could be gold going both ways. So how do you adjudicate, and how do you make sure, first of all, that that data's accessible for good, and how do you protect it against, you know, wrong use? >> Well, this is one of the great things about partnering with Palo Alto, because technically the threat intelligence is coming from their ingestion of malware, known threats, and unknown threats, right, into their technology.
WildFire, for instance, is a tremendous example of this, where Unit 42 does, you know, analysis on unknown threats. Based upon what Nikesh said on stage, they've taken, I think he said, 27 days to identification and remediation down to less than a minute, right? So they've been able to take the intelligence of what they ingest from all of their existing customers, the unknown vulnerabilities that are identified, quickly assess what those look like, and then push out information to the rest of their customers so that they can remediate and protect against those threats. So we get this shared intelligence from the way that Palo Alto leverages that capability, and we've brought that natively into Google Cloud with Cloud Intrusion Detection. >> So, okay, I dunno why I have high frequency trading in my mind, 'cause it used to be, you know, the norm was, oh, it's going to take a year to identify an intrusion. And then it was down to 27 days. Now it's down to a minute. And that's best practice. And again, I'm thinking high frequency trading: how do I beat the speed of light? And that's kind of where we're headed, right? >> Right. >> And so that's why he said one minute's not enough. We have to keep going. >> That's right. >> So you guys got your best people working on that? >> Well, as a matter of fact, so Palo Alto Networks, you know, when we take a look at what Nikesh said from stage, he talked about using machine learning and AI to get ahead of what they look at as far as predictability, not only about behaviors in the environment, so things that are not necessarily known threats but things that aren't behaving properly in the environment. And you can start to detect based on that. The second piece of it, then, is that a lot of that technology is built on Google Cloud.
So they're leveraging the capabilities that come together with, you know, aggregation of logs, the file stitching across the entire environment, from the endpoint through to cloud operations, the things that they detect for network content inspection, putting all those files together to understand, you know, where the threat vector entered, how it's gone lateral inside the environment, and then how you make sure that you remediate all of those points of intrusion. And so, yeah, it's been exciting to see how our product teams have worked together to continue to advance the capabilities for speed for customers. >> And secure speed is critical. We had the opportunity this morning to speak with Lee Klarich, the chief product officer, and, you know, one of the things that I had heard about Lee is that despite all of the challenges in cybersecurity, the amorphous expansion of the threat network, and the sophistication of the adversaries, he's really optimistic about what it's going to enable organizations to do. I see you smiling. Do you share that optimism? >> I do. I think, you know, when you bring leaders together to tackle big problems, I think, you know, we've got the right teams working on the right things, and we understand the problems that the customers are facing. And so, you know, from a Google Cloud perspective, we understand that partnering with Palo Alto Networks helps to make sure that that optimism continues. You know, we work on continuous innovation when it comes to the Google Cloud security framework, but then partnering with Palo Alto brings additional capabilities to the table. >> Vision for the partnership. Where do you want to see it go? What's... we're two to five years down the road, what's it look like? Maybe two to three years. Let's go. >> Well, it was interesting. I think Nir was the one that mentioned on stage, you know, how AI is going to start replacing us in our main jobs, right?
I think there's a lot of truth to that. I think as we look forward, we see that our teams are going to continue to help with automation and remediation, and we're going to have the humans working on things that are more interesting and important. And so that's an exciting place to go, because today the reality is that we are understaffed in cybersecurity across the industry, and we just can't hire enough people to make sure that we can detect, remediate and secure, you know, every user, endpoint and environment out there. So it's exciting to see that we've got a capability to move in a direction where we can make sure that we get ahead of the threat actors. >> Yeah. So he said within five years your SOC will be AI-based, and basically he elaborated, saying there's a lot of stuff that you're doing today that you're not going to be doing tomorrow. >> That's true. >> And that's going to continue to be a moving target. I would think Google is probably ahead in that game, and ahead of most, right? I mean, you guys were there early. I remember when Hadoop was all the rage, just at the beginning, and you guys were like, you know, Google's like, no, no, no, we're not doing Hadoop anymore, that's old news. So you tended to be, I don't know, at least five, maybe seven years ahead of the industry. So I imagine you're using a lot of those AI techniques in your own business today. >> Absolutely. I mean, I think you see it in our consumer products, and you certainly see it in the capabilities we make available to enterprises as far as how they can innovate on our cloud. And we want to make sure that we continue to provide those capabilities, you know, not only for the tools that we build but for the tools that customers use. >> As we kind of get towards the end of our conversation here, we talk about zero trust as a journey, as an approach. It's not a product, it's not a tool.
Who's involved in the zero trust journey from the customer's perspective? Is this solely with the CSOs and CIOs, or is this at the CEO level, going, we have to be a data company, but we have to be a secure data company, 24/7? >> It's interesting, as you've seen malware, phishing, ransomware attacks. >> Yeah. >> This is not only a CSO, CIO conversation; it's a board level conversation. And so, you know, the way to address this new way of working, where we have very distributed environments where you can't create a perimeter anymore, is that you need to strategize with zero trust. And so continuously, when we're talking to customers, we're hearing that as a main initiative, you know, from the CIO's office and from the board level. >> Got it. Last question. The upgrade path for existing customers from ZTNA 1.0 to 2.0. How simple is that? >> It's easy. You know, when we take- >> Is there an easy button? >> So here's the great thing. [Dave] If you're feeling lucky. [Lisa] Yeah. (group laughs) >> Well, Palo Alto, right? Building Prisma Access has really taken what was traditional security, an on-prem or data center deployed strategy, to cloud-based. And so we've worked with customers like Princeton University, who had to quickly transition from in-person learning to distance learning and find a way to ramp their staff, their faculty and their students. And we were able to, you know, have Palo Alto deploy that solution on Google Cloud's network in very quick order, and had, you know, everybody back up and running. So the deployment and upgrade path is simple when you look at cloud-deployed architectures to address zero trust networking. >> That's awesome. Some of those use cases that came out of the pandemic were mind-blowing, but they also really set the table for other organizations to go, yes, this can be done. And it doesn't have to take forever, because frankly, where security is concerned, we don't have time. >> That's right.
And it's so much faster than traditional architectures, where you had to procure hardware. >> Yeah. >> Deploy it, configure it, and then, you know, push agents out to all the endpoints and get your users provisioned. In this case, we're talking about cloud delivered, right? So I've seen, you know, with Palo Alto deploying for customers that run on Google Cloud, they've deployed tens of thousands of users in very short order. You know, it's not months anymore. It's not weeks anymore. It's days. >> Has to be days. Josh, it's been such a pleasure having you on the program. Thank you for stopping by and talking with Dave and me about Google Cloud and Palo Alto Networks, in addition to secret squirrel. I feel like when you were describing your background, you're like the love child of Palo Alto Networks and Google Cloud. You might put that on your card, too. >> That is a huge compliment. I really appreciate that, Lisa, thank you so much. >> Thanks so much, Josh. [Josh] It's been a pleasure being here with you. [Dave] Thank you. >> Oh, likewise. For Josh Haslett and Dave, I'm Lisa Martin. You're watching theCUBE, the leader in live coverage for emerging and enterprise tech. (upbeat outro music)

Published Date : Dec 15 2022

SUMMARY :

brought to you by Palo Alto Networks. The amount of content we have created completely changed the way how can that be secure? Great to have you Josh. So you are a secret squirrel to add that to my title. and decided that we needed to what kind of company you are. And a lot of the technology And so, you know, we find One of the things that almost everybody and what does it deliver that 1.0 did not? of addressing, you know that you just talked about. Is it available to any against, you know, wrong use? and remediation down to And, and, and now it's down to, you know We have to keep going. that you remediate all of that despite all of the And so, you know, from a Where do you want to see it go? And so that's an exciting place to go of stuff that you're doing today And that's going to not only for the tools that we build at the CEO level going, we It's interesting And so, you know from 1., ZTNA 1.0 to 2.0. You know, when we take- So here's the great thing And we were able to, you know And it doesn't have to take you had to procure hardware. So I've seen, you know, I feel like when you were Lisa, thank you so much. [Dave] Thank you For Josh Haslett and

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Josh	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Google	ORGANIZATION	0.99+
Joshua Haslett	PERSON	0.99+
Lisa	PERSON	0.99+
two	QUANTITY	0.99+
Josh Haslet	PERSON	0.99+
Josh Haslett	PERSON	0.99+
27 days	QUANTITY	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
Lee Klarich	PERSON	0.99+
Princeton University	ORGANIZATION	0.99+
50 integrations	QUANTITY	0.99+
Palo Alto	ORGANIZATION	0.99+
first	QUANTITY	0.99+
five years	QUANTITY	0.99+
three years	QUANTITY	0.99+
one minute	QUANTITY	0.99+
tomorrow	DATE	0.99+
less than a minute	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
yesterday	DATE	0.99+
two and a half years	QUANTITY	0.99+
one	QUANTITY	0.99+
today	DATE	0.99+
Hadoop	TITLE	0.99+
both ways	QUANTITY	0.99+
seven years	QUANTITY	0.99+
second thing	QUANTITY	0.98+
Prisma	ORGANIZATION	0.98+
second piece	QUANTITY	0.98+
Zero Trusts	ORGANIZATION	0.98+
TheCUBE	ORGANIZATION	0.98+
Lee	PERSON	0.98+
earlier this year	DATE	0.98+
both organizations	QUANTITY	0.98+
second	QUANTITY	0.97+
One	QUANTITY	0.97+
Day two	QUANTITY	0.97+
first thing	QUANTITY	0.97+
Google Cloud	TITLE	0.96+
first party	QUANTITY	0.96+
ZTNA 2.0	TITLE	0.96+
a year	QUANTITY	0.96+
Nikesh	PERSON	0.95+
over 50 joint integrations	QUANTITY	0.94+
tens of thousands of users	QUANTITY	0.94+
zero trust	QUANTITY	0.92+
two things	QUANTITY	0.92+

Jed Dougherty, Dataiku | AWS re:Invent 2022

(bright music) >> Welcome back to Vegas, guys and girls. We're pleased that you're watching theCUBE. We know you've been with us. This is our fourth day. We know you've been with us since day one. Why wouldn't you be? Lisa Martin here. As I mentioned, day four of theCUBE's coverage of AWS re:Invent. There are north of 55,000 people that have been at this event this week. We're hearing hundreds of thousands online. It really feels like old times, which is awesome. We're pleased to welcome a gentleman from Dataiku who's actually new to theCUBE, but Dataiku is not. Jed Dougherty is here, the VP of Platform Strategy. Thanks for joining me today, Jed. >> Oh, I'm so happy to be here. >> Talk a little bit, for anybody that isn't familiar with Dataiku, tell the audience a little bit about the technology, what you guys do. >> Dataiku is an end-to-end data science and machine learning platform. We take everything from data ingestion, pipelining of that data, bringing it all together into something that's useful for building models, deploying those models, and then managing your MLOps workflow. So, really all the way across. And we sit on top of, basically, tons of different parts of the AWS stack, as well as lots of the partners that are here today. >> Okay, got it. >> Snowflake, Databricks, all that. >> Got it. So one of the things that, it was funny, I think it was Adam's keynote Tuesday morning. I didn't time it, I watched it, but one of my guests said to me earlier this week that Adam spent exactly 52 minutes talking about data. >> Yeah. >> 52 minutes. Obviously, we can't come to an event like this without talking about data. Every company these days has to be a data company, whether it's my grocery store or a retailer, a hospital, and so- >> Jed: It is the lifeblood of every modern company. >> It is, but you have to be able to access it.
You have to be able to harness it, access it, derive insights from it, and be able to act on that faster than the competitors that are waiting, like, right back here. Adam Selipsky had a sit-down with our boss, John Furrier, who's the co-CEO of theCUBE, about a week before re:Invent. John always gets a preview of the show, and one of the things Adam said is that he thinks the role of data analyst is going to go away. Or at least the term, because of the data democratization that needs to happen: putting data in the hands of all the business users, so that every business user, whether you're in technology or marketing or ops or finance, is going to have to analyze data to do their jobs. >> Could not agree more. >> Are you hearing that from customers? >> 100%. >> Yeah. >> I was just at the Bank of America CTO Summit two weeks ago out in California, and their CTO had a statistic: 60,000 technologists in Bank of America, all asking data-type questions. You can have the best team of data scientists in the world, and they do. They have some of the best data scientists in the world there. And this team of data scientists could answer any one of the questions that those 60,000 people might have, but they can't answer all of them, right? You need those people to be able to answer their own questions. I don't know if the term data analyst is going away. I think, yeah, everybody's just going to have to become a bit more of one. Just like how Excel taught everybody how to use the spreadsheet, in the next five to 10 years the democratization of AI means that tools like Dataiku and other data science tools are going to teach everybody how to analyze data. >> Talk about Dataiku as a facilitator of that democratization, giving, like, the citizen technologist who might be in finance the ability to do that. >> So, a lot of data science tools are aimed at your hardcore coder, right?
Somebody who wants to be sitting at a notebook writing (indistinct) or something like that and running models on some big fancy Spark server. Dataiku is still going to be running models on some big fancy Spark server, but we're really abstracting the challenge of writing code away from the user. So we target low-code, no-code, and high-code users, all working together in a collaborative platform. We believe that there is always going to be a place for data scientists. That role is not going away. You will always need hardcore coders to take on those very challenging moonshot topics. But for everyday AI, anybody should be able to do this, and it should be open to anybody. >> Right. >> Jed: We really aim to facilitate that. >> I would love to hear some feedback. You know, this is day four of the show, as I was saying, and day four is packed. I mean, energy-level-wise, guys, it is the same as it was when we started here on Friday night. But I'd love to hear, Jed, from your perspective, some of the customer conversations that you've had. What are some of the challenges? They're coming to you saying, "Jed, Dataiku, help us eradicate these challenges so we can transform our business." >> What I'm hearing from customers and partners and AWS here, over and over, is: we don't want to buy tools anymore. We want to buy solutions. We want a vertical solution that's pre-built for our industry. And we want it to be, not necessarily click and run out of the box, but we want a template that we can build off of quickly. And I've heard that customers are also looking to understand how tools can be packaged together. You've got, how many booths are here? 1000 booths? >> Yes, easily. >> You have 1000 different products being talked about, right behind us.
Customers need to know which of these products are friends with each other and how they fit together, so that when they purchase a suite of tools to do their jobs, it's all going to work naturally together. I think this is a really vital concept for GSIs as well. GSIs need to understand how to package sets of tools together to deliver a full solution to clients. People don't want to be, you know... I think 10 years ago, five years ago, AWS was in the business of selling servers in the cloud. Basically what you did was buy an EC2 instance and install whatever software you wanted on it. I don't know that they're in that business still, but customers don't want to buy servers from AWS anymore. They want to buy solutions. >> Right. >> Rent, whatever. >> Yeah. (chuckles) >> That is the big repeated message that I've heard here. >> So you brought up a good point that there are probably 1000 booths here. You could be here every day and not get to see everything that's going on. Plus this show is going on across the Strip. We're only getting a fraction of the people that are here. But with that said, to your point, there are so many tools out there. Customers are looking for solutions. One of the things that we say about theCUBE is, we extract the signal from the noise. How does Dataiku get past the noise? How do you get up the stack to really impact customers, so they understand the value that you're delivering? >> I think that data science and ML sound like a very complicated topic, but our value prop is relatively simple. And we appeal both to your end users, who are excited to learn about how data science works and how they can leverage these tools in their day-to-day jobs, as well as to IT. IT, right now, at major organizations, wants to be able to build a full stack that makes sense. And the big choices they're making right now are around infrastructure. Where am I going to run my compute?
So they're choosing between Snowflake or Databricks or a native AWS compute solution, right? They make this big choice around compute, and then they realize, "Oh, how many users across our organization are actually able to leverage this big compute choice? Maybe 100, maybe 200." That's not incredibly useful for something we've just decided to completely stand behind. Dataiku, all of a sudden, opens that up to thousands of users across your organization. So it makes IT feel empowered by being able to help more people, and it makes users feel empowered by being able to use a great tool and start answering their own questions. >> And where are your customer conversations these days? As we look at AI and ML as emerging technologies, so many customers and companies know they have to go in this direction; they have to have AI to speed the business. Are more of the conversations still in IT, or are they actually going up the stack? >> (chuckles) It's a great question. When you're going into large organizations, there are two sales motions, right? There's convincing the business users that this is a great thing, and then convincing IT that it's not going to be too painful. You always have to go to both places. IT doesn't want to take on a boondoggle, or an albatross, I don't remember the word, but something that they're going to have to deal with for the next 10 years and then eventually dismantle and pull apart. I think a lot of IT got very scared about big data platforms and solutions because of Hadoop. To be honest, Hadoop was incredibly powerful, but maybe not as mature a technology as IT would've liked it to be from a maintenance and administration standpoint. So yes, you will always have to sell to IT and help IT feel comfortable with the platform. But the conversations that I want to have are the use case conversations with a Chief Data Officer, Chief Revenue Officer, Chief Marketing Officer.
That's who I really want to convince that this is going to be a worthwhile opportunity. >> And what are some of the key, sorry. What are some of the key use cases that Dataiku is tackling in the market these days? >> Two of the biggest verticals that I work with personally are finance and pharmaceuticals. In finance, we are closely embedded with wealth management organizations. A lot of that is around customer entertainment and churn: relatively obvious, simple concepts, but ones where it's worth a lot of money. In pharma, we work on the supply side, doing supply chain optimization and ensuring the right drugs get to the right places at the right time, as well as on the business and marketing side, ensuring that your ad spend is correctly distributed across different advertising platforms. >> So if you're working with a financial organization, I want to understand this from the consumer's, the end user's, perspective, because obviously this technology impacts the end user who's trying to do a transaction. What's in it for me? I don't know, as the end user, that Dataiku is under the hood. >> You'd never know. >> Which is good. I shouldn't have to worry about the technology. >> Jed: You shouldn't have to worry about that at all. >> What's in it for the end user customer? What are they gaining from this? >> So, from a very end-user perspective, think about when you logged onto your Bank of America or your Chase app five or 10 years ago; maybe you didn't even have it on your phone five years ago. Or when you logged into your account online. We do 95% of our banking online right now, right? I go into a physical location, what, once every six months or something? To get a cashier's check? I don't know.
The experience that you're getting and the amount of information you're getting back about your spending habits, where your money is going, what your credit score is: all of these things are being driven by these big data organizations inside the banks. Also, and this is a little creepier, any promotional emails, the feedback you get when you use your credit card, and the offers that you get through that are all being personalized to you through the information that these banks are collecting about your spending habits. >> Yeah, but we want that as consumers; we want it personalized. >> Yeah, of course. We want it to be magic, slash not creepy. (laughs) >> Right, I want them to recommend the best card for me. >> Right. >> The next best thing. >> It's good for me, it's good for them. >> Don't serve me up something that I've already bought. That always bugs me, when I'm like, I already bought that. >> I get that all the time. I'm like, yeah, I have that card already. It's in my wallet. Why are you telling me? >> We only have a couple of minutes left, Jed, but talk to me about platform strategy. What's next for Dataiku and AWS? >> So we are making a major transition right now, and it's core to our platform. For a long time, the way that we've installed Dataiku is that we help our customers install it in their AWS account, so it runs inside their tenant. This is very comfortable for, for example, large banking and pharma clients that have personally identifiable information and that kind of thing. They own everything. However, as we were talking about before, we're really moving from providing a tool to providing solutions, and part of that is obviously a move to SaaS. So two years ago we released a SaaS offering. We've been expanding it more and more, and this year we want to be pushing SaaS first. So Dataiku Online should be the first option when new customers come on board. And that is a huge platform shift.
It means making sure that we have the right security in place. It means making sure that we have the right scaling in place, that we have 24/7 support. All of this has been a big challenge. A big, fascinating challenge, actually, to put together. >> Awesome. Last question for you. Say you get a brand new DeLorean, I hear they're coming back, and you really, really want to put a bumper sticker on it, 'cause why not? And it's about Dataiku, like a sizzle reel kind of thing. >> A sizzle reel, alright. >> Yeah. What does it say? >> Extraordinary people, everyday AI. >> Wow. Drop the mic, Jed. That was awesome. Thank you so much for coming on the program. We really appreciate the update on Dataiku: what you guys are doing for customers, your specialization, and solutions for verticals. Awesome stuff, we'll have to have you back. >> Thank you so much. >> Alright, my pleasure. >> Bye-bye. >> For my guest, I'm Lisa Martin. You're watching theCUBE, the leader in live enterprise and emerging tech coverage. (bright music)

Published Date : Dec 1 2022

SUMMARY :

Jed Dougherty of Dataiku joins Lisa Martin on day four of the show to discuss how Dataiku brings low-code, no-code, and high-code users together on one collaborative platform. He explains that customers and GSIs now want pre-built vertical solutions rather than individual tools, that Dataiku extends an organization's compute investment in Snowflake, Databricks, or native AWS to thousands of users, and that selling requires convincing both IT and business leaders. He describes finance and pharmaceutical use cases, the personalization consumers see in banking apps, and Dataiku's shift to a SaaS-first model with Dataiku Online. His bumper sticker: "Extraordinary people, everyday AI."

ENTITIES

Entity	Category	Confidence
Adam	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Jed Dougherty	PERSON	0.99+
Adam Selipsky	PERSON	0.99+
John Furrier	PERSON	0.99+
AWS	ORGANIZATION	0.99+
95%	QUANTITY	0.99+
California	LOCATION	0.99+
Jed	PERSON	0.99+
1000 booths	QUANTITY	0.99+
Friday night	DATE	0.99+
John	PERSON	0.99+
100%	QUANTITY	0.99+
fourth day	QUANTITY	0.99+
Two	QUANTITY	0.99+
first option	QUANTITY	0.99+
Tuesday morning	DATE	0.99+
Excel	TITLE	0.99+
60,000 people	QUANTITY	0.99+
Bank of America	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
two years ago	DATE	0.99+
this year	DATE	0.99+
100	QUANTITY	0.99+
today	DATE	0.99+
52 minutes	QUANTITY	0.99+
60,000 technologists	QUANTITY	0.99+
10 years ago	DATE	0.99+
both	QUANTITY	0.99+
One	QUANTITY	0.99+
five	DATE	0.99+
Dataiku	ORGANIZATION	0.99+
52 minutes	QUANTITY	0.98+
five years ago	DATE	0.98+
200	QUANTITY	0.98+
two sales	QUANTITY	0.98+
one	QUANTITY	0.98+
earlier this week	DATE	0.98+
Snowflake	ORGANIZATION	0.98+
Vegas	LOCATION	0.98+
1000 different products	QUANTITY	0.97+
this week	DATE	0.97+
both places	QUANTITY	0.97+
Hadoop	TITLE	0.97+
CTO Summit	EVENT	0.97+
two weeks ago	DATE	0.96+
hundreds of thousands	QUANTITY	0.96+
theCUBE	ORGANIZATION	0.95+
Bank of America	LOCATION	0.94+
Bank of America	EVENT	0.93+
Dataiku	TITLE	0.92+
day one	QUANTITY	0.91+
Spark	TITLE	0.9+
day four	QUANTITY	0.89+
first	QUANTITY	0.88+
EC2	TITLE	0.88+
Dataiku	PERSON	0.86+
a week	DATE	0.83+
Chase	TITLE	0.83+
one of my guests	QUANTITY	0.83+
CTO	ORGANIZATION	0.81+

Evan Kaplan, InfluxData | AWS re:invent 2022

>>Hey everyone. Welcome to Las Vegas. theCUBE is here, live at the Venetian Expo Center for AWS re:Invent 2022. Amazing attendance. This is day one of our coverage. Lisa Martin here with Dave Vellante. Dave, it's great to see so many people back. We've been having great conversations already, and we have wall-to-wall coverage for the next three and a half days. When we talk to companies and customers, every company has to be a data company. And one of the things I think we learned in the pandemic is that access to real-time data and real-time analytics is no longer a nice-to-have; it is a differentiator and a competitive advantage. >>It's all about data. I mean, you know, I love the topic, and it's got so many dimensions and such texture. Can't get enough of data. >>I know. We have a great guest joining us. One of our alumni is back: Evan Kaplan, the CEO of InfluxData. Evan, thank you so much for joining us. Welcome back to theCUBE. >>Thanks for having me. It's great to be here. >>So here we are, day one. I was telling you before we went live, we're nice and fresh hosts. Talk to us about what's new at InfluxData since the last time we saw you at re:Invent. >>That's great. So first of all, we should acknowledge what's going on here. This is pretty exciting. It really does feel like, I know there was a show last year, but this feels like the first post-COVID show: a lot of energy, a lot of attention despite a difficult economy. You guys were commenting in the lead-in about big data. I think if we were to talk about big data five, six years ago, what would we be talking about? We'd be talking about Hadoop, we'd be talking about Cloudera, we'd be talking about Hortonworks, we'd be talking about big data lakes and data stores. I think what's happened is this interesting dynamic of, let's call it if you will, the secularization of data, in which it breaks into different fields, almost a taxonomy.
You've got search data, you've got observability data, you've got graph data, you've got document data, and now you have time series data. >>And what you're seeing in the market, driven mostly by an open source dynamic, is this incredible capability of developers to assemble data platforms that aren't unicellular, that aren't just built on Hadoop or Oracle or Postgres or MySQL, but in fact represent different data types. So for us, what we care about is time series. We care about anything that happens in time, where time can be the primary measurement, which, if you think about it, is a huge proportion of real data. Because when you think about what drives AI, you think about what happened, what happened, what happened, and what's going to happen. That's the functional thing. But what happened is always defined by a period, a measurement, a time. And so what's new for us is that we've developed this new open source engine called IOx. It's basically a refresh of the whole database: a columnar database that uses Apache Arrow, Parquet, and DataFusion, and turns it into a super powerful real-time analytics platform. It was already pretty real-time before, but it's increasingly so now, and it adds SQL capability and infinite cardinality. So it handles bigger data sets, but importantly, not just bigger but faster data. That's primarily what we're talking about at the show. >>So how does that affect where you can play in the marketplace? I mean, how does it affect your total available market, your customer opportunities? >>Great question. I think it's a really interesting market, in that you've got all of these different approaches to databases, whether you take data warehouses from Snowflake, or arguably Databricks also, or these individual database companies like Mongo, Influx, Neo4j, Elastic, and people like that.
I think the commonality you see across them is that many of them, if not all of them, are based on some sort of open source dynamic. So I think that is an intractable trend that will continue on. But in terms of the broader database market, our total available market, lots of these things are coming together in interesting ways. And so the wave that we want to ride, because it's all big data, it's all increasingly fast data, and it's all machine learning and AI, is really around that measurement issue, that instrumentation: the idea that if you're going to build any sophisticated system, it starts with instrumentation, and the journey is defined by instrumentation. So we view ourselves as the instrumentation tooling for understanding complex systems. And how... >>I have to ask a quick follow-up. Why did you say arguably Databricks? I mean, the open source ethos? >>Well, I was saying arguably Databricks because of Spark. It's a great company and it's based on Spark, but there's quite a gap between Spark and what Databricks is today. In some ways, from the outside looking in, Databricks looks a lot like Snowflake to me: a really sophisticated data warehouse with a lot of post-processing capabilities. >>And with less of an open source core database. >>Yeah. Right, right, right. Yeah, I totally agree. Okay, thank you for that. >>That part was not "arguably" as in they're not a good company, or... >>No, no. They've got great momentum, and I'm just curious. Absolutely. You know, so... >>So talk a little bit about IOx and what it is enabling you guys to achieve from a competitive advantage perspective. What are the key differentiators? Give us the scoop. >>So if you think about it, our old storage engine was called TSM, also open sourced, right?
And IOx is open sourced. The old storage engine was really built around time series measurements, particularly metrics, lots of metrics, handling those at scale and making it super easy for developers to use. But our old data engine only supported either a custom graphical UI that you'd build yourself on top of it, or a dashboarding tool like Grafana or Chronograf or things like that. With IOx, two or three innovations were important. One is that we now support, or will support, things like Tableau and Microsoft Power BI, so you're taking that same data that was available for instrumentation and now you're using it for business intelligence also. That became super important, and it kind of answers your question about the expanded market: it expands the market. The second thing is that when you're dealing with time series data, you're dealing with this concept of cardinality. I don't know if you're familiar with it, but it's the idea that it's a multiplication of measurements in a table. The more measurements you want over the more series you have, you get this really expanding, exponential set that can choke a database off. We've designed IOx to handle what we call infinite cardinality, where you don't even have to think about that from a design point of view. And then lastly, query performance is just dramatically better. So it's pretty exciting. >>So with unlimited cardinality, basically you could identify relationships between data in different databases. Is that right? >>Between the same database but different measurements, different tables, yeah. Right. So you could say, I want to look at the noise levels in this room according to 400 different locations on 25 different days, over seven months of the year. Each one of those is a measurement, and each one adds to cardinality.
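The "multiplication of measurements" behind cardinality can be illustrated with a short sketch. It treats location, day, and month as tags purely to mirror the noise-level example (in practice, day and month would normally come from the timestamp rather than from tags), and the tag names and helper functions are hypothetical, not part of InfluxDB or IOx. The worst-case number of distinct series is the product of the number of distinct values per tag, which is why cardinality grows multiplicatively rather than additively.

```python
from itertools import product
from math import prod


def max_series_cardinality(tag_values: dict[str, list[str]]) -> int:
    """Worst-case number of series: product of distinct values per tag."""
    return prod(len(set(values)) for values in tag_values.values())


def observed_series_cardinality(points: list[dict[str, str]]) -> int:
    """Number of distinct tag combinations that actually appear in the data."""
    return len({tuple(sorted(point.items())) for point in points})


# Hypothetical tags loosely modeled on the noise-level example.
tags = {
    "location": [f"loc-{i}" for i in range(400)],
    "day": [f"day-{d}" for d in range(25)],
    "month": [f"month-{m}" for m in range(7)],
}
print(max_series_cardinality(tags))  # 70000 (400 * 25 * 7)

# Writing every combination once makes observed cardinality reach the bound.
points = [dict(zip(tags, combo)) for combo in product(*tags.values())]
print(observed_series_cardinality(points))  # 70000
```

Adding one more tag with, say, 10 values would multiply the worst case by 10, which is the growth pattern Evan describes as able to "choke a database off."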
And you can say, I want to search on Tuesdays in December for what the noise level is at 2:21 PM, and you get a very quick response. That kind of instrumentation is critical to smarter systems. >>How are you able to process that data at a performance level that doesn't bring the database to its knees? What's the secret sauce behind that? >>It's a columnar database. It's built on Parquet and Apache Arrow. To put it simply, without a much longer conversation, it's an architecture that's really built for pulling that kind of data. If you know the data is time series and you're looking for a time measurement, you already have the ability to optimize pretty dramatically. >>So it's that purpose-built aspect of it. >>It's the purpose-built aspect. You couldn't take Postgres and do the same thing. >>Right. Because a lot of vendors say, oh yeah, we have time series now. >>Yeah. Right. >>And they do. >>Yeah. But it's not... The founding of the company came because Paul Dix was working on Wall Street building time series databases on HBase, on MySQL, on other platforms, and realized that every time we do it, we have to rewrite the code and build a bunch of application logic to handle all of this. We have customers that are adding hundreds of millions to billions of points a second. So you're talking about an ingest level that databases just aren't designed for, right? And it's not just us; our competitors also build good time series databases. So the category is really emergent. >>Yeah, sure. Talk about a favorite customer story that you think really articulates the value of what Influx is doing, especially with IOx. >>Yeah, sure.
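Evan's point that knowing the data is time-ordered lets an engine "optimize pretty dramatically" can be illustrated with a toy example. This is not how IOx works internally; the transcript says only that it is a columnar engine built on Parquet and Arrow. It is a minimal sketch, assuming readings are stored sorted by timestamp, that shows how a time-bounded query such as "Tuesdays in December at 2:21 PM" can use binary search instead of a full scan. The function name and the synthetic data are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def time_range(timestamps: list[datetime], values: list[float],
               start: datetime, end: datetime) -> list[float]:
    """Return values whose timestamps fall in [start, end).

    Assumes `timestamps` is sorted ascending, so two binary searches
    (O(log n)) locate the slice instead of scanning every reading.
    """
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    return values[lo:hi]


# Synthetic data: one stand-in noise reading per minute for December 2022.
origin = datetime(2022, 12, 1)
timestamps = [origin + timedelta(minutes=i) for i in range(31 * 24 * 60)]
values = [float(i) for i in range(len(timestamps))]

# "Tuesdays in December at 2:21 PM": weekday() == 1 means Tuesday.
tuesdays = [d for d in range(1, 32) if datetime(2022, 12, d).weekday() == 1]
for day in tuesdays:
    start = datetime(2022, 12, day, 14, 21)
    print(day, time_range(timestamps, values, start, start + timedelta(minutes=1)))
```

Each lookup costs two binary searches over about 44,640 readings rather than a pass over all of them; a real engine would add partitioning by time and columnar pruning on top of the same idea.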
And I love this story because, you know, Tesla may not be in favor because of Elon Musk's latest antics, but we've had about a four-year relationship with Tesla, where they built their Powerwall technology around recording that: seeing your device, seeing the charging on your car. It's all captured in Influx databases that are reporting from Powerwalls and Megapacks all over the world. They report to a central place at Tesla's headquarters, and it reports out to your phone so you can see it. What's really cool about this to me is that I've got two Tesla cars and Tesla solar roof tiles, so I watch this data all the time. So it's a great customer story. And actually, if you go on our website, you can see an hour-long interview I did with the engineer who designed the system, because the system is super impressive, and I just think it's really cool. Plus, it's all the good green stuff that we really appreciate, supporting sustainability, right? Yeah. >>Right, right. Talk about what's in it for me as a customer: what you guys have done with the change to IOx, what some of the key features are, and the key value for customers like Tesla and for other industry customers as well. >>Well, it's relatively new. It just arrived in our cloud product, so Tesla's not using it today. We have a first set of customers starting to use it. It's open source, and it's a very popular project in the open source world. But the key issues are really the stuff that we've covered here. It's a broad SQL environment, so all those SQL developers, the same people who code against Snowflake's data warehouse or Databricks or Postgres, can now code against that data in Influx, which opens up the BI market. It's the cardinality, it's the performance. It's really an architecture. It's the next gen.
We've been doing this for six years; it's the next generation of everything. We've seen how you make time series super performant. And that's only relevant because more and more things are becoming real-time as we develop smarter and smarter systems. The journey is pretty clear. You instrument the system, you let it run, you watch for anomalies, you correct those anomalies, you re-instrument the system. You do that 4 billion times, you have a self-driving car. You do that 55 times, you have a better podcast that is handling its audio better, right? So everything is on that journey of getting smarter and smarter. >>So you guys are the big committers to IOx, right? >>Yes. >>Talk about how you support and develop the surrounding developer community, how you get that flywheel effect going. >>I mean, it's actually more art than science. First of all, you come up with an architecture that really resonates for developers. Paul Dix, our founder, really is a developer's developer. He started talking in the community about an architecture that uses Apache Parquet, which is becoming the standard for file formats, uses Apache Arrow for directing queries and things like that, and uses DataFusion, and said what this thing needs is a columnar database that sits behind all of this stuff and integrates it. He started talking about it two years ago, and then he started publishing IOx commits on GitHub. And slowly, over time, on Hacker News and elsewhere, other people go, oh yeah, this is fundamentally right. It addresses the problems that people have with things like ClickHouse or plain databases or Coast, and they go, okay, this is the right architecture at the right time.
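The "instrument, let it run, watch for anomalies, correct" loop described above can be sketched with a simple rolling z-score detector. This is an illustrative assumption, not an InfluxData feature or API; the window size, threshold, function name, and sample metric are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(readings: list[float], window: int = 5,
                   threshold: float = 3.0) -> list[int]:
    """Return indices whose value deviates from the trailing window's mean
    by more than `threshold` standard deviations (a rolling z-score)."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat windows, where the z-score is undefined.
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical instrumented metric with one spike at index 7.
metric = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 25.0, 10.0, 10.2]
print(find_anomalies(metric))  # [7]
```

In the loop Evan describes, each flagged index would prompt a correction and possibly new instrumentation; the detector itself is the "watch for anomalies" step only.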
Not different than the original Influx, not different than what Elastic hit on, not different than what Confluent hit on with Kafka in their time. You build an audience of people who are committed to understanding this kind of stuff, they become committers, they become the core, and you build out from it. And so we chose to have an MIT open source license. It's not some secondary license. Competitors can use it, and competitors can use it against us. >>One of the things I know that InfluxData talks about is time to awesome, which I love, but what does that mean? What is time to awesome for a developer? >>It comes from that original story, where Paul would have to write six months of application logic and stuff to build a time series-based application. Paul's notion, and this was based on the original Mongo, which was very successful because it was very easy to use relative to most databases, was this commitment, this idea that I quickly joined on: it should be relatively quick for a developer to build something of import to solve a problem. So it's got a schemaless background; you don't have to know the schema beforehand. It does some things that make it really easy to feel powerful as a developer quickly. And if you think about that journey, if you feel powerful with a tool quickly, then you'll go deeper and deeper, and pretty soon you're taking that tool with you wherever you go. It becomes the tool of choice as you go to that next job or that next application. So that's a fundamental way we think about it. To be honest with you, we haven't always delivered perfectly on that. It's generally in our DNA, so we do pretty well, but I always feel like we can do better. >>So if you were to put a bumper sticker on one of your Teslas about InfluxData, what would it say?
>>By the way, I'm not rich. It just happens that we have two Teslas, and we have for a while; we just committed to that. So, ask the question again. Sorry. >>Bumper sticker on InfluxData. What would it say? How would I understand it? >>It would be time to awesome. It would be that phrase, time to awesome. >>Right. >>Love that. >>Yeah, I'd love it. >>Excellent. Time to awesome. Evan, thank you so much for joining Dave and me on the program. >>It's really fun. >>Great to have you back, talking about what you guys are doing and how you're helping organizations like Tesla and others really transform their businesses, which is what it's all about these days. We appreciate your insights. >>That's great. Thank you. >>For our guest and Dave Vellante, I'm Lisa Martin. You're watching theCUBE, the leader in emerging and enterprise tech coverage. We'll be right back with our next guest.
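The schemaless, "time to awesome" idea can be made concrete with InfluxDB line protocol, the text format for writing points: a measurement name, optional comma-separated tags, a space, comma-separated fields, a space, and a timestamp. The formatter below is a minimal sketch, not the official client library; it handles only float fields and the basic escaping of commas, equals signs, and spaces, and the measurement, tag, and field names are hypothetical. Note that the second point introduces a new tag and a new field without any schema being declared up front.

```python
def escape_key(text: str) -> str:
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def escape_measurement(text: str) -> str:
    # Measurement names escape commas and spaces.
    return text.replace(",", "\\,").replace(" ", "\\ ")


def to_line_protocol(measurement: str, tags: dict[str, str],
                     fields: dict[str, float], timestamp_ns: int) -> str:
    """Format one point as a line protocol string (float fields only)."""
    # Tags are sorted by key, a common recommendation for write performance.
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={float(v)!r}" for k, v in fields.items()
    )
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


first = to_line_protocol("noise", {"location": "loc-1", "room": "hall A"},
                         {"db": 61.5}, 1669735260000000000)
# A later point adds a new tag and a new field; no schema change is declared.
second = to_line_protocol("noise", {"location": "loc-2", "sensor": "mic-7"},
                          {"db": 58.0, "peak_db": 72.25}, 1669735320000000000)
print(first)
print(second)
```

Integer, string, and boolean field values have their own line protocol syntax, which this sketch deliberately leaves out.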

Published Date : Nov 29 2022

SUMMARY :

Lisa Martin and Dave Vellante open day one of theCUBE's AWS re:Invent 2022 coverage by noting that access to real-time data and analytics is now a competitive differentiator. InfluxData CEO Evan Kaplan describes IOx, the company's new open source columnar engine built on Apache Arrow, Parquet, and DataFusion, which adds SQL support, BI tool integration, infinite cardinality, and faster queries. He compares InfluxData with Snowflake and Databricks, explains why purpose-built time series databases outperform general-purpose ones such as Postgres, and describes Tesla's use of Influx for Powerwall telemetry. He also covers how the MIT-licensed IOx project built its developer community and the company's "time to awesome" philosophy, rooted in a schemaless design.

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Evan Kaplan	PERSON	0.99+
six months	QUANTITY	0.99+
Evan	PERSON	0.99+
Tesla	ORGANIZATION	0.99+
Influx Data	ORGANIZATION	0.99+
Paul	PERSON	0.99+
55 times	QUANTITY	0.99+
two	QUANTITY	0.99+
2:21 PM	DATE	0.99+
Las Vegas	LOCATION	0.99+
Dave Vellante	PERSON	0.99+
Paul Dix	PERSON	0.99+
six years	QUANTITY	0.99+
last year	DATE	0.99+
hundreds of millions	QUANTITY	0.99+
Mongo Influx	ORGANIZATION	0.99+
4 billion times	QUANTITY	0.99+
Two	QUANTITY	0.99+
December	DATE	0.99+
Microsoft	ORGANIZATION	0.99+
Influxed	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Influx	ORGANIZATION	0.99+
IOx	TITLE	0.99+
MySQL	TITLE	0.99+
three	QUANTITY	0.99+
Tuesdays	DATE	0.99+
each one	QUANTITY	0.98+
400 different locations	QUANTITY	0.98+
25 different days	QUANTITY	0.98+
first set	QUANTITY	0.98+
an hour	QUANTITY	0.98+
First	QUANTITY	0.98+
six years ago	DATE	0.98+
The Cube	TITLE	0.98+
One	QUANTITY	0.98+
Neo4j	ORGANIZATION	0.98+
second thing	QUANTITY	0.98+
Each one	QUANTITY	0.98+
Paul Dix	PERSON	0.97+
IOx	ORGANIZATION	0.97+
today	DATE	0.97+
Teslas	ORGANIZATION	0.97+
MIT	ORGANIZATION	0.96+
Postgres	ORGANIZATION	0.96+
over seven months	QUANTITY	0.96+
one	QUANTITY	0.96+
five	DATE	0.96+
Venetian Expo Center	LOCATION	0.95+
Big Data Lakes	ORGANIZATION	0.95+
Cloudera	ORGANIZATION	0.94+
Columbia	LOCATION	0.94+
InfluxData	ORGANIZATION	0.94+
Wall Street	LOCATION	0.93+
SQL	TITLE	0.92+
Elastic	TITLE	0.92+
Data Bricks	ORGANIZATION	0.92+
Hacker News	TITLE	0.92+
two years ago	DATE	0.91+
Oracle	ORGANIZATION	0.91+
AWS Reinvent 2022	EVENT	0.91+
Elon Musk	PERSON	0.9+
Snowflake	ORGANIZATION	0.9+
Reinvent	ORGANIZATION	0.89+
billions of points a second	QUANTITY	0.89+
four year	QUANTITY	0.88+
Chronograf	TITLE	0.88+
Confluent	TITLE	0.87+
Spark	TITLE	0.86+
Apache	ORGANIZATION	0.86+
Snowflake	TITLE	0.85+
Grafana	TITLE	0.85+
GitHub	ORGANIZATION	0.84+

Justin Emerson, Pure Storage | SuperComputing 22

(soft music) >> Hello, fellow hardware nerds, and welcome back to Dallas, Texas, where we're reporting live from Supercomputing 2022. My name is Savannah Peterson, joined by John Furrier on my left. >> Looking good today. >> Thank you, John, so are you. It's been a great show so far. >> We've had more hosts and more guests coming than ever before. >> I know. >> Amazing, super- >> We've got a whole thing going on. >> It's been a super computing performance. >> It, wow. And we'll see how many times we can say super on this segment. Speaking of super things, I am in a very unique position right now. I am flanked on both sides by people who have been doing content on theCUBE for 12 years. Yes, you heard me right: our next guest was on theCUBE 12 years ago, at the third event. Was that right, John? >> Man: First ever VMworld. >> Yeah, the first ever VMworld, the third event theCUBE ever did. We are about to have a lot of fun. Please join me in welcoming Justin Emerson of Pure Storage. Justin, welcome back. >> It's a pleasure to be here. It's been too long; you never call, you don't write. (Savannah laughs) >> Great to see you. >> Yeah, likewise. >> How fun is this? Has the set evolved? Is everything looking good? >> I mean, I can barely remember what happened last week, so. (everyone laughs) >> Well, I remember a lot's changed since that VMworld. You know, Paul Maritz was the CEO, if you remember, at that time. His vision actually happened, not so much for VMware, but for the industry: the cloud, what he called the software mainframe. We were kind of riffing- >> It was quite the decade. >> It's unbelievable where we are now and how we got here, but not where we're going to be. And you're with Pure Storage now, which, as you know, we've been covering as well. Where's the connection into supercomputing? Obviously storage performance is a big part of this show. >> Right, right. >> What's the take? >> Well, I think, first of all, it's great to be back at events in person.
We were talking before we went on, and it's been so great to be back at live events now. It's been such a drought over the last several years, but yeah, I'm very glad that we're doing in-person events again. For Pure, this is an incredibly important show. With FlashBlade, the product that I work with, one of our key areas is specifically this high performance computing, AI, and machine learning space. So we're really glad to be here. We've met a lot of customers, met a lot of other folks, and had a lot of really great conversations, so it's been a really great show for me. And it's great just seeing all the really amazing stuff around here. If you want to see the most cutting-edge data center stuff that's coming down the pipe, this is the place to do it. >> So one of the big themes of the show for us, and probably a big theme of your life, is balancing power and efficiency. You have a product in this category, DirectFlash. Can you tell us a little bit more about that? >> Yeah, so Pure as a storage company, right, what do we do differently from everybody else? And if I had to pick one thing, it's that, as the name implies, we're purely flash; we're an all-flash company. We've always been, and we don't plan to be anything else. Part of that innovation with DirectFlash is the idea that rather than treating a solid-state drive like a hard drive, you treat it as what it actually is, and that's a very different kind of thing. So DirectFlash is all about bringing native flash interfaces to our product portfolio. And what's really exciting for me as a FlashBlade person is that it's now also part of our FlashBlade//S portfolio, which just launched in June. The benefits of that are myriad.
But, you know, talking about efficiency, the biggest difference is that we can use like 90% less DRAM in our drives. Everything that you put in a drive uses power, it adds cost and all those things, and so that really gives us an efficiency edge over everybody else. And at a show like this, I mean, you walk the aisles and there are people doing liquid cooling and so much immersion stuff, and the reason they're doing that is because power is just increasing everywhere, right? So if you can figure out how to use less power in some areas, you can shift that budget to other places. So if you can talk to a customer and say, well, if I could shrink your power budget for storage by two-thirds, how many more accelerators, how many more CPUs, how much more work could you actually get done? So really exciting. >> I mean, less power consumption, more power and compute. >> Right. >> Kind of a power center. So talk about the AI implications, where the use cases are. What are you seeing here? A lot of simulations, a lot of students. Again, dorm room to the boardroom, we've been saying here on theCUBE. This is a great broad area. Where's the action in ML and AI for you guys? >> So I think, not necessarily storage related, but I think that right now there's this enormous explosion of custom silicon around AI and machine learning, which, you said welcome, hardware nerds, at the beginning, and I was like, ah, my people. >> We're all here, we're all here in Dallas. >> So wonderful. You know, as a hardware nerd, we're talking about conferences, right? Anyone who has ever attended Hot Chips knows there's so much really amazing engineering work going on in the silicon space. It's probably the most exciting time for CPU and accelerator innovation since the days before x86 was the de facto standard, right? And you could go out and buy a different workstation with 16 different ISAs.
That's really the most exciting thing. I walked past so many different places where, you know, our booth is right next to Habana Labs with their Gaudi accelerator, and they're doing this cute thing with one of the AI image generators in their booth, which is really cute. >> Woman: We're going to have to go check that out. >> Yeah, but that to me is one of the more exciting things around innovation, especially at a show like this, where it's all about how we move the state of the art forward. >> What's different now than just a few years ago in terms of what's opening up the creativity for people to look at things that they could do with the scale that's available now? >> Yeah, well, I mean, every time the state of the art moves forward, what it means is that the entry level gets better, right? So if the high end is going faster, that means that the mid-range is going faster, and that means the entry level is going faster. So every time it pushes the boundary forward, it's a rising tide that lifts all boats. And so now, the kind of stuff that's possible to do, whether you're a student in a dorm room or an enterprise, the world of the possible just keeps expanding dramatically, almost geometrically, like the amount of data that we have. As a storage guy, I have to come back to data, but the amount of data that we have and the amount of compute that we have, and it's not just about the raw compute, but also the advances in all sorts of other things in terms of algorithms and transfer learning and all these other things. There's so much amazing work going on in this area, and it's just kind of this Cambrian explosion of innovation in the area. >> I love that you touched on the user experience for the community, no matter the level that you're at. >> Yeah. >> And it's been something that's come up a lot here.
Everyone wants to do more, faster, always, but it's not just that, it's about making the experience and the point of entry into this industry more approachable and digestible for folks who may not be familiar. I mean, we have every end of the ecosystem here on the show floor. Where does Pure Storage sit in the whole game? >> Right, so as a storage company, right? AI is all about deriving insights from data, right? And so everyone remembers that magazine cover, data's the new oil, right? And it's kind of like, okay, so what do you do with it? Well, how do you derive value from all of that data? And AI, machine learning, and all of this supercomputing stuff is about how do we take all this data? How do we innovate with it? And so if you want data to innovate with, you need storage. And so, you know, our philosophy is: how do we make the best storage platforms that we can, using the best technology, for our customers, to enable them to do really amazing things with AI and machine learning? And we've got different products, but, you know, at the show here, what we're specifically showing off is our new FlashBlade//S product. I know we've had Pure folks on theCUBE before talking about FlashBlade, but for viewers out there, FlashBlade is our scale-out unstructured data platform, and AI and machine learning and supercomputing are all about unstructured data. It's about sensor data, it's about imaging, it's about, you know, photogrammetry, all these other kinds of amazing stuff. But you've got to land all that somewhere. You've got to process all that somewhere. And so really high performance, high throughput, highly scalable storage solutions are really essential. It's an enabler for all of the amazing other kinds of engineering work that goes on at a place like Supercomputing. >> It's interesting you mentioned data's oil.
Remember in 2010, our first year of theCUBE, Hadoop World? Hadoop was just starting to come on the scene, which, you know, kind of went away, but now you've got Spark and Databricks and Snowflake- >> Justin: And it didn't go away, it just changed, right? >> It just got refactored and right-sized, I think, for what people wanted: for it to be easy to use. But there's more data coming. As people clearly see that more data's coming, how is data driving innovation as you guys look at your products, your roadmap, and your customer base? How is data driving innovation for your customers? >> Well, I think every customer who has been, you know, collecting all of this data, right, is trying to figure out, now what do I do with it? And a lot of times people collect data and then it will end up on, you know, lower, slower tiers, and then suddenly they want to do something with it. And it's like, well, now what do I do, right? And so there are all these people that are reevaluating. You know, when we developed FlashBlade, we sort of made this bet that unstructured data was going to become the new tier one data. It used to be that we thought of unstructured data as emails and home directories and all that stuff, the kind of stuff that you didn't really need a really good DR plan for. It's like, ah, we could... now of course, as soon as email goes down, you realize how important email is. But the perspectives that people had on- >> Yeah, exactly. (all laughing) >> The perspectives that people had on unstructured data and its value to the business were very different, and so now- >> Good bet, by the way. >> Yeah, thank you. So now unstructured data is considered, you know, where companies are going to derive their value from. So whether they use the data that they have to build better products, or whether they use the data they have to develop, you know, improvements in processes, all those kinds of things are data driven.
And so all of the new big advancements in industry and in business are all about how do I derive insights from data? And so machine learning and AI have something to do with that, but also, you know, it all comes back to having data that's available. And so we're working very hard on building platforms that customers can use to enable all of this really- >> Yeah, it's interesting, Savannah, you know, the top three areas we're covering for re:Invent and all the hyperscale events: data, and how it drives innovation; then specialized solutions to make customers' lives easier. >> Yeah. >> It's become a big category. How do you compose stuff? And then obviously compute, more and more compute and services to make the performance go. So those seem to be the three hot areas. So, okay, data's the new oil, and you need refineries. You've got good solutions. What specialized solutions do you see coming out? Because once people have all this data, they might have large scale, maybe some edge use cases. Do you see specialized solutions emerging? I mean, obviously you've got DPUs emerging, which is great, but do you see anything else coming out that people are- >> Like from a hardware standpoint? >> Or from a customer standpoint, making the customer's life easier? So I've got a lot of data flowing in. >> Yeah. >> It's never stopping, it keeps pouring in. >> Yeah. >> Are there things coming out that make their life easier? Have you seen anything coming out? >> Yeah, I think where we are as an industry right now with all of this new technology is, we're really in this phase where the standards aren't quite there yet. Everybody is sort of figuring out what works and what doesn't. You know, there was this big revolution in software development, right, moving towards agile development and all that kind of stuff. The way people build software changed fundamentally, and this is kind of like another wave like that.
I like to tell people that AI and machine learning are just a different way of writing software. What is the output of a training scenario, right? It's a model, and a model is just code. And so I think that as all of these different parts of the business figure out how to leverage these technologies, what it is, is a different way of writing software. It's not necessarily going to replace traditional software development, but it's going to augment it, it's going to let you do other interesting things. And so, where are things going? I think we're going to continue to start coalescing around the right ways to do things. Right now we talk about, you know, MLOps and development and the frameworks and all of this innovation. There's so much innovation, which means that the industry is moving so quickly that it's hard to settle on things like standards, or at the very least best practices. And if the best practices are changing every three months, are they really best practices, right? So I think that as we progress and coalesce around what the right ways to do things are, that's really going to make customers' lives easier. Because, you know, today, if you're a software developer... we build a lot of software at Pure Storage, right? And if you have developers who are familiar with how the process, how the factory, functions, then their skills become portable and it becomes easier to onboard people. AI is still nothing like that right now. It's just so, so fast moving and it's so- >> Wild West kind of. >> It's not standardized. It's not industrialized, right? And so the next big frontier in all of this amazing stuff is how do we industrialize this and really make it easy to implement for organizations? >> Oil refineries, industrial revolution. I mean, it's on that same trajectory. >> Yeah. >> Yeah, absolutely. >> Or industrial revolution.
(John laughs) >> Well, we've talked a lot about the chaos, and we are very much at this early stage. Stepping way back, and this can be your personal, not Pure Storage, opinion if you want. >> Okay. >> What in HPC or AI/ML, I guess it all falls under the same umbrella, has you most excited? >> Ooh. >> So I feel like you're someone who sees a lot of different things. You've got a lot of customers, you're out talking to people. >> I think that there is a lot of advancement in the area of natural language processing, and I think that, you know, we're starting to take things like natural language processing and then turning them into vision processing and all these other things. You know, I think the most exciting thing for me about AI is that there are a lot of people who are looking to use these kinds of technologies to make technology more inclusive. And so- >> I love it. >> You know, the ability for us to do things like automate captioning, or automate audio descriptions of video streams, or things like that. I think those are really great in terms of bringing the benefits of technology to more people in an automated way, because the challenge has always been the bandwidth of how much a human can do, since these things were so difficult to automate. What AI's really allowing us to do is build systems, whether that's text to speech or translation or captioning or all these other things. I think the way that AI interfaces with humans is really the most interesting part, and the benefits that it can bring there. Because there's a lot of talk about all of the things that it does that people don't like or that people are concerned about. But I think it's important to think about all the really great things that maybe don't necessarily personally impact you, but matter to the person who's not sighted, or to the person who, you know, is hearing impaired.
You know, that's an enormously valuable thing. And the fact that those are becoming easier to do, they're becoming better, the quality is getting better, I think those are really important for everybody. >> I love that you brought that up. I think it's a really important note to close on, and you know, there's always the kind of Terminator dark side that we obsess over, but that's actually not the truth. I mean, when we think about even just captioning, it's a tool we use on theCUBE. You know, we see it on our Instagram stories and everything else; it opens the door for so many more people to be able to learn. >> Right? >> And the more we all learn, like you said, the water level rises together and everything is magical. Justin, it has been a pleasure to have you on board. Last question, any more bourbon tasting today? >> Not that I'm aware of, but if you want to come by, I'm sure we can find something somewhere. (all laughing) >> That's the spirit, that is the spirit of an innovator right there. Justin, thank you so much for joining us from Pure Storage. John Furrier, always a pleasure to interview with you. >> I'm glad I can contribute. >> Hey, hey, that's the understatement of the century. >> It's good to be back. >> Yeah. >> Hopefully I'll see you guys in, I'll see you guys in 2034. >> No. (all laughing) No, you've got the Pure Accelerate conference. We'll be there. >> That's right. >> We'll be there. >> Yeah, we have our Pure Accelerate conference next year and- >> Great. >> Yeah. >> I love that, I mean, feel free to, you know, hype that. That's awesome. >> Great company, great run, stayed true to the mission from day one, all flash, continuing to innovate. Congratulations. >> Yep, thank you so much, it's a pleasure being here. >> It's a fun ride. You are a joy to talk to, and it's clear you're just as excited as we are about hardware, so thanks a lot, Justin. >> My pleasure.
>> And thank all of you for tuning in to this wonderfully nerdy hardware edition of theCUBE live from Dallas, Texas, where we're at Supercomputing. My name's Savannah Peterson, and I hope you have a wonderful night. (soft music)

Published Date : Nov 16 2022


Tim Yocum, InfluxData | Evolving InfluxDB into the Smart Data Platform

(soft electronic music) >> Okay, we're back with Tim Yocum, who is the Director of Engineering at InfluxData. Tim, welcome, good to see you. >> Good to see you, thanks for having me. >> You're really welcome. Listen, we've been covering open source software on theCUBE for more than a decade, and we've kind of watched the innovation from the big data ecosystem, the cloud being built out on open source, mobile, social platforms, key databases, and of course, InfluxDB. And InfluxData has been a big consumer and contributor of open source software. So my question to you is, where have you seen the biggest bang for the buck from open source software? >> So yeah, you know, at Influx we really thrive at the intersection of commercial services and open source software, so OSS keeps us on the cutting edge. We benefit from OSS in delivering our own service, from our core storage engine technologies to web services and templating engines. Our team stays lean and focused because we build on proven tools. We really build on the shoulders of giants. And like you've mentioned, even better, we contribute a lot back to the projects that we use, as well as our own product, InfluxDB. >> But I've got to ask you, Tim, because one of the challenges that we've seen, and you saw this in particular in the heyday of Hadoop, is that the innovations come so fast and furious, and as a software company, you've got to place bets, you've got to commit people, and sometimes those bets can be risky and not pay off. So how have you managed this challenge? >> Oh, it moves fast, yeah. That's a benefit, though, because the community moves so quickly that today's hot technology can be tomorrow's dinosaur. And what we tend to do is fail fast and fail often; we try a lot of things. You know, look at Kubernetes, for example. That ecosystem is driven by thousands of intelligent developers, engineers, builders. They're adding value every day, so we have to really keep up with that.
And as the stack changes, we try different technologies, we try different methods. And at the end of the day, we come up with a better platform as a result of the constant change in the environment. It is a challenge for us, but it's something that we just do every day. >> So we have a survey partner down in New York City called Enterprise Technology Research, ETR, and they do these quarterly surveys of about 1,500 CIOs and IT practitioners, and they really have a good pulse on what's happening with spending. And the data shows that containers generally, but Kubernetes specifically, is one of the areas that has been off the charts and has seen the most significant adoption and velocity, particularly along with cloud. Kubernetes is just, you know, still up and to the right consistently, even with the macro headwinds and all of the other stuff that we're sick of talking about. So what do you do with Kubernetes in the platform? >> Yeah, it's really central to our ability to run the product. When we first started out, we were just on AWS, and the way we were running was a little bit like containers junior. Now we're running Kubernetes everywhere: AWS, Azure, Google Cloud. It allows us to have a consistent experience across three different cloud providers, and we can manage that in code. So our developers can focus on delivering services, not trying to learn the intricacies of Amazon, Azure, and Google and figure out how to deliver services on those three clouds with all of their differences. >> Just a follow-up on that: it sounds like there's a PaaS layer there that allows you guys to have a consistent experience across clouds and out to the edge, wherever. Is that correct? >> Yeah, so we've basically built, more or less, what's now called platform engineering; that's the new hot phrase.
Kubernetes has made a lot of things easy for us because we've built a platform that our developers can lean on, and they only have to learn one way of deploying their application and managing their application. And so that just gets all of the underlying infrastructure out of the way and lets them focus on delivering Influx cloud. >> And I know I'm going off on a bit of a tangent, but that, I'll call it a PaaS layer, if I can use that term: are there attributes specific to InfluxDB, or is it just generally off-the-shelf PaaS? Is there any purpose-built capability there that is value-add, or is it pretty much generic? >> So we really look at things through a build-versus-buy lens. Some things we want to leverage cloud provider services for, for instance, Postgres databases for metadata, perhaps. Get that off of our plate, let someone else run that. We're going to deploy a platform that our engineers can deliver on, that has consistency, that is all generated from code, that we as an SRE group, as an ops team, can manage with very few people, really, and we can stamp out clusters across multiple regions in no time. >> So sometimes you build, sometimes you buy. How do you make those decisions, and what does that mean for the platform and for customers? >> Yeah, so what we're doing is what everybody else will do. We're looking for trade-offs that make sense. We really want to protect our customers' data, so we look for services that support our own software with the most uptime, reliability, and durability we can get. Some things are just going to be easier to have a cloud provider take care of on our behalf. We make that transparent for our own team and, of course, for our customers; you don't even see that. But we don't want to try to reinvent the wheel, like I had mentioned with the SQL data source for metadata, perhaps.
Let's build on top of what these three large cloud providers have already perfected, and we can then focus on our platform engineering, and we can help our developers focus on the InfluxData software, the Influx cloud software. >> So take it to the customer level. What does it mean for them? What's the value that they're going to get out of all these innovations that we've been talking about today, and what can they expect in the future? >> So first of all, people who use the OSS product are really going to be at home on our cloud platform. You can run it on your desktop machine, on a single server, what have you, but then you want to scale up. We have some 270 terabytes of data across over four billion series keys that people have stored, so there's a proven ability to scale. Now, in terms of the open source software and how we've developed the platform, you're getting a highly available, high-cardinality time-series platform. We manage it and really, as I had mentioned earlier, we can keep up with the state of the art. We keep reinventing, we keep deploying things in real time. We deploy to our platform every day, repeatedly, all the time. And it's that continuous deployment that allows us to continue testing things in flight, rolling out changes, new features, better ways of doing deployments, safer ways of doing deployments. All of that happens behind the scenes and, like we had mentioned earlier, Kubernetes, I mean, that allows us to get that done. We couldn't do it without having that platform as a base layer for us to then put our software on. So we iterate quickly. When you're on the Influx cloud platform, you really are able to take advantage of new features immediately. We roll things out every day, and as those things go into production, you have the ability to use them. And so in the end, we want you to focus on getting actual insights from your data instead of running infrastructure; you know, let us do that for you. >> That makes sense.
Are the innovations that we're talking about in the evolution of InfluxDB, do you see that as sort of a natural evolution for existing customers? I'm sure the answer is both, but is it opening up new territory for customers? Can you add some color to that? >> Yeah, it really is. It's a little bit of both. Any engineer will say, "Well, it depends." So cloud-native technologies are really the hot thing, IoT, industrial IoT especially. People want to just shove tons of data out there and be able to do queries immediately, and they don't want to manage infrastructure. What we've started to see are people that use the cloud service as their datastore backbone, and then they use edge computing with our OSS product to ingest data from, say, multiple production lines, down-sample that data, and send the rest of that data off to Influx cloud, where the heavy processing takes place. So really, us being in all the different clouds and iterating on that, and being in all sorts of different regions, allows people to really get out of the business of trying to manage that big data and have us take care of that. And, of course, as we change the platform, end users benefit from that immediately. >> And so obviously you've taken away a lot of the heavy lifting for the infrastructure. Would you say the same thing about security, especially as you go out to IoT at the edge? How should we be thinking about the value that you bring from a security perspective? >> We take security super seriously. It's built into our DNA. We do a lot of work to ensure that our platform is secure and that the data that we store is kept private. It's, of course, always a concern; you see companies being compromised in the news all the time. That's something that you can have an entire team working on, which we do, to make sure that the data that you have, whether it's in transit or at rest, is always kept secure and is only viewable by you.
You look at things like software bill of materials: if you're running this yourself, you have to go vet all sorts of different pieces of software, and we do that, you know, as we use new tools. That's just part of our jobs, to make sure that the platform that we're running has fully vetted software. And you know, with open source especially, that's a lot of work, and so it's definitely new territory. Supply chain attacks are definitely happening at a higher clip than they used to, but that is really just part of a day in the life for folks like us that are building platforms. >> And that's key, especially when you start getting into, you know, what we talk about with IoT and the operations technologies, the engineers running that infrastructure. You know, historically, as you know, Tim, they would air gap everything; that's how they kept it safe. But that's not feasible anymore. Everything's-- >> Can't do that. >> connected now, right? And so you've got to have a partner that will, again, take away that heavy lifting and R&D so you can focus on some of the other activities. All right, give us the last word and the key takeaways from your perspective. >> Well, you know, from my perspective, I see it as a two-lane approach with Influx, with any time-series data. You've got a lot of stuff that you're going to run on-prem. What you had mentioned, air gapping? Sure, there's plenty of need for that. But at the end of the day, people that don't want to run big datacenters, people that want to entrust their data to a company that's got a full platform set up for them that they can build on, send that data over to the cloud. The cloud is not going away. I think a more hybrid approach is where the future lives, and that's what we're prepared for. >> Tim, really appreciate you coming on the program. Great stuff, good to see you. >> Thanks very much, appreciate it. >> Okay, in a moment, I'll be back to wrap up today's session. You're watching theCUBE.
(soft electronic music)

Published Date : Nov 8 2022


Anais Dotis Georgiou, InfluxData | Evolving InfluxDB into the Smart Data Platform

>>Okay, we're back. I'm Dave Vellante with theCUBE, and you're watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData. Anais Dotis-Georgiou is here. She's a developer advocate for InfluxData, and we're gonna dig into the rationale and value contribution behind several open source technologies that InfluxDB is leveraging to increase the granularity of time series analysis and bring the world of data into real-time analytics. Anais, welcome to the program. Thanks for coming on. >>Hi, thank you so much. It's a pleasure to be here. >>Oh, you're very welcome. Okay, so IOx is being touted as this next-gen open source core for InfluxDB. And my understanding is that it leverages in-memory, of course, for speed. It's a columnar store, so it gives you compression efficiency, it's gonna give you faster query speeds, and it's gonna store files in object storage. So you've got a very cost-effective approach. Are these the salient points on the platform? I know there are probably dozens of other features, but what are the high-level value points that people should understand? >>Sure, that's a great question. So some of the main requirements that IOx is trying to achieve, and some of the most impressive ones to me: the first one is that it aims to have no limits on cardinality and also allow you to write any kind of event data that you want, whether that's a tag or a field. It also wants to deliver best-in-class performance on analytics queries, in addition to our already well-served metrics queries. We also wanna have operator control over memory usage, so you should be able to define how much memory is used for buffering, caching, and query processing. Some other really important parts are the ability to have bulk data export and import, super useful.
Also, broader ecosystem compatibility: where possible, we aim to use and embrace emerging standards in the data analytics ecosystem and have compatibility with things like SQL, Python, and maybe even pandas in the future. >>Okay, so a lot there. Now, we talked to Brian about how you're using Rust, which is not a new programming language, and of course we had some drama around Rust during the pandemic with the Mozilla layoffs, but the formation of the Rust Foundation really addressed any of those concerns. You've got big guns like Amazon and Google and Microsoft throwing their collective weight behind it. Adoption is really starting to get steep on the S-curve. So lots of platforms, lots of adoption with Rust, but why Rust as an alternative to, say, C++, for example? >>Sure, that's a great question. So Rust was chosen because of its exceptional performance and reliability. Rust is syntactically similar to C++, it has similar performance, and it also compiles to native code like C++. But unlike C++, it also has much better memory safety. Memory safety is protection against bugs or security vulnerabilities that lead to excessive memory usage or memory leaks, and Rust achieves this memory safety through its innovative type system. Additionally, it doesn't allow for dangling pointers, and dangling pointers are one of the main classes of errors that lead to exploitable security vulnerabilities in languages like C++. So Rust helps meet that requirement of having no limits on cardinality, for example, because we're also using the Rust implementation of Apache Arrow, and because of this control over memory. And also Rust's packaging system, crates.io, offers everything that you need out of the box to have features like async and await, to fix race conditions, to protect against buffer overflows, and to ensure thread-safe async caching structures as well.
So essentially it has all the control, all the fine-grained control you need to take advantage of memory and all your resources as well as possible, so that you can handle those really, really high-cardinality use cases. >>Yeah, and the more I learn about the new engine and the platform, IOx, et cetera, you see things like, you know, in the old days, and even today, you do a lot of garbage collection in these systems, and there's an inverse impact relative to performance. So it looks like the community is really modernizing the platform. But I wanna talk about Apache Arrow for a moment. It's designed to address the constraints that are associated with analyzing large data sets. We know that, but please explain: what is Arrow, and what does it bring to InfluxDB? >>Sure, yeah. So Arrow is a framework for defining in-memory columnar data, and so much of the efficiency and performance of IOx comes from taking advantage of columnar data structures. And I will, if you don't mind, take a moment to illustrate why columnar data structures are so valuable. Let's pretend that we are gathering field data about the temperature in our room and also maybe the temperature of our stove. In our table we have those two temperature values, as well as maybe a measurement value, a timestamp value, and maybe some other tag values that describe what room and what house, et cetera, we're getting this data from. So you can picture this table where we have two rows with the two temperature values for both our room and the stove. Well, usually our room temperature is regulated, so those values don't change very often. >>So with column-oriented storage, essentially you take each column and group its values together.
And so if that's the case, and you're just taking temperature values from the room, and a lot of those temperature values are the same, then you might be able to imagine how equal values will neighbor each other in the storage format. This provides a really perfect opportunity for cheap compression, and this cheap compression enables high-cardinality use cases. It also enables faster scan rates. So if you wanna find the min and max value of the temperature in the room across a thousand different points, you only have to get those thousand points in order to answer that question, and you have those immediately available to you. But let's contrast this with a row-oriented storage solution instead, so that we can better understand the benefits of column-oriented storage. >>So if you had row-oriented storage, you'd first have to look at every field, like the temperature in the room and the temperature of the stove. You'd have to go across every tag value that maybe describes where the room is located or what model the stove is, and every timestamp. You'd then have to pluck out that one temperature value that you want at that one timestamp, and do that for every single row. So you're scanning across a ton more data, and that's why row-oriented storage doesn't provide the same efficiency as columnar. And Apache Arrow is an in-memory columnar data framework, so that's where a lot of the advantages come from. >>Okay. So you've basically described a traditional database, a row approach, but I've seen a lot of traditional databases say, okay, now we can handle columnar format. Versus what you're talking about, which is really, you know, kind of native: is the format not as effective because it's largely a bolt-on? Can you elucidate on that front?
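The row-versus-column contrast Anais walks through can be sketched in a few lines of Python. This is a toy illustration, not how IOx, Arrow, or Parquet actually store data: the readings and the helpers `to_columns` and `run_length_encode` are hypothetical, and run-length encoding stands in as one simple example of the cheap compression that neighboring equal values allow.

```python
from itertools import groupby

# Hypothetical readings, one tuple per row: (timestamp, room, room_temp, stove_temp).
rows = [
    (1, "kitchen", 21.0, 180.0),
    (2, "kitchen", 21.0, 182.5),
    (3, "kitchen", 21.0, 185.0),
    (4, "kitchen", 21.5, 183.0),
    (5, "kitchen", 21.5, 181.0),
]
names = ("time", "room", "room_temp", "stove_temp")


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal neighboring values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows, names)

# In the column layout the regulated room temperature repeats side by side,
# so five stored values shrink to two (value, count) pairs.
encoded_room_temp = run_length_encode(columns["room_temp"])
print(encoded_room_temp)  # [(21.0, 3), (21.5, 2)]

# Column scan: min/max reads only the room_temp column.
print(min(columns["room_temp"]), max(columns["room_temp"]))  # 21.0 21.5

# Row layout: the same question walks every full row and plucks out one field.
print(min(row[2] for row in rows), max(row[2] for row in rows))  # 21.0 21.5
```

The stove column, whose value changes on every reading, would not shrink at all under this encoding, so how much a columnar layout saves depends on how repetitive each column is.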
>>Yeah, it's not as effective because you have more expensive compression and because you can't scan across the values as quickly. So those are pretty much the main reasons why row-oriented storage isn't as efficient as column-oriented storage. >>Yeah. Got it. So let's talk about Arrow DataFusion. What is DataFusion? I know it's written in Rust, but what does it bring to the table here? >>Sure. So it's an extensible query execution framework, and it uses Arrow as its in-memory format. The way that it helps InfluxDB IOx is that, okay, it's great if you can write an unlimited amount of cardinality into InfluxDB, but if you don't have a query engine that can successfully query that data, then I don't know how much value it is for you. So DataFusion helps enable the query process and transformation of that data. It also has a pandas API so that you could take advantage of pandas DataFrames as well, and all of the machine learning tools associated with pandas. >>Okay. You're also leveraging Parquet in the platform, of course. We heard a lot about Parquet in the middle of the last decade as a storage format to improve on Hadoop column stores. What are you doing with Parquet, and why is it important? >>Sure. So Parquet is the column-oriented durable file format. It's important because it'll enable bulk import and bulk export. It has compatibility with Python and pandas, so it supports a broader ecosystem. Parquet files also take very little disk space, and they're faster to scan because, again, they're column oriented. In particular, I think Parquet files are like 16 times cheaper than CSV files, just as a point of reference. And so that's essentially a lot of the benefits of Parquet. >>Got it. Very popular. So what exactly is InfluxData focusing on as a committer to these projects? What is your focus? What's the value that you're bringing to the community? >>Sure.
So InfluxDB has contributed a lot of different things to the Apache ecosystem. For example, they contributed an implementation of Apache Arrow in Go, and that will support querying with Flux. Also, there have been quite a few contributions to DataFusion for things like memory optimization and support for additional SQL features, like support for timestamp arithmetic, support for EXISTS clauses, and support for memory control. So yeah, Influx has contributed a lot to the Apache ecosystem and continues to do so. And I think the idea here is that if you can improve these upstream projects, then the long-term strategy is that the more you contribute and build those up, the more you will perpetuate that cycle of improvement, and the more we will invest in our own project as well. So it's just that kind of symbiotic relationship and appreciation of the open source community. >>Yeah. Got it. You've got that virtuous cycle going; people call it the flywheel. Give us your last thoughts and kind of summarize, you know, what the big takeaways are from your perspective. >>So I think the big takeaway is that InfluxData is doing a lot of really exciting things with InfluxDB IOx. If you are interested in learning more about the technologies that Influx is leveraging to produce IOx, the challenges associated with it, and all of the hard work, and you just wanna learn more, then I would encourage you to go to the monthly tech talks and community office hours, which are on every second Wednesday of the month at 8:30 AM Pacific time. There's also a community forum and a community Slack channel. Look for the influxdb_iox channel specifically to learn more about how to join those office hours and those monthly tech talks, as well as to ask any questions you have about IOx, what to expect, and what you'd like to learn more about.
As a developer advocate, I wanna answer your questions. So if there's a particular technology or stack that you wanna dive deeper into and want more explanation about how InfluxDB leverages it to build IOx, I will be really excited to produce content on that topic for you. >>Yeah, that's awesome. You guys have a really rich community: collaborate with your peers, solve problems, and you guys are super responsive, so really appreciate that. All right, thank you so much, Anais, for explaining all this open source stuff to the audience and why it's important to the future of data. >>Thank you. I really appreciate it. >>All right, you're very welcome. Okay, stay right there, and in a moment I'll be back with Tim Yocum. He's the director of engineering for InfluxData, and we're gonna talk about how you update a SaaS engine while the plane is flying at 30,000 feet. You don't wanna miss this.
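Cardinality, the first IOx requirement Anais listed in this segment, grows multiplicatively with the number of distinct tag values, which is why "no limits on cardinality" is an ambitious goal. The Python sketch below counts series under the usual InfluxDB definition of a series as a unique combination of measurement, tag set, and field key; the data and the helper names `series_key` and `series_cardinality` are hypothetical, and a real engine indexes series very differently.

```python
from itertools import product


def series_key(measurement, tags, field_key):
    """Identify one series: measurement + sorted tag set + field key."""
    return (measurement, tuple(sorted(tags.items())), field_key)


def series_cardinality(points):
    """Count distinct series across (measurement, tags, fields) points."""
    keys = set()
    for measurement, tags, fields in points:
        for field_key in fields:
            keys.add(series_key(measurement, tags, field_key))
    return len(keys)


# Hypothetical data: 3 houses x 4 rooms, one "value" field per reading.
rooms = [
    ("temperature", {"house": f"h{h}", "room": f"r{r}"}, {"value": 21.0})
    for h, r in product(range(3), range(4))
]
print(series_cardinality(rooms))  # 12

# Writing the same series again adds points, not series.
print(series_cardinality(rooms + rooms))  # 12

# One extra tag with 10 distinct values multiplies the count: 3 x 4 x 10.
devices = [
    ("temperature", {"house": f"h{h}", "room": f"r{r}", "device": f"d{d}"}, {"value": 21.0})
    for h, r, d in product(range(3), range(4), range(10))
]
print(series_cardinality(devices))  # 120
```

A tag with many distinct values, such as a per-device ID, is the typical way series counts climb quickly.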

Published Date : Nov 8 2022

ENTITIES

Entity	Category	Confidence
Tim Yokum	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Brian	PERSON	0.99+
Anna	PERSON	0.99+
James Bellenger	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Dave Valante	PERSON	0.99+
James	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
three months	QUANTITY	0.99+
16 times	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Python	TITLE	0.99+
mobile.twitter.com	OTHER	0.99+
Influx Data	ORGANIZATION	0.99+
iOS	TITLE	0.99+
Twitter	ORGANIZATION	0.99+
30,000 feet	QUANTITY	0.99+
Russ Foundation	ORGANIZATION	0.99+
Scala	TITLE	0.99+
Twitter Lite	TITLE	0.99+
two rows	QUANTITY	0.99+
200 megabyte	QUANTITY	0.99+
Node	TITLE	0.99+
Three months ago	DATE	0.99+
one application	QUANTITY	0.99+
both places	QUANTITY	0.99+
each row	QUANTITY	0.99+
Par K	TITLE	0.99+
Anais Dotis Georgiou	PERSON	0.99+
one language	QUANTITY	0.98+
first one	QUANTITY	0.98+
15 engineers	QUANTITY	0.98+
Anna East Otis Georgio	PERSON	0.98+
both	QUANTITY	0.98+
one second	QUANTITY	0.98+
25 engineers	QUANTITY	0.98+
About 800 people	QUANTITY	0.98+
sql	TITLE	0.98+
Node Summit 2017	EVENT	0.98+
two temperature values	QUANTITY	0.98+
one times	QUANTITY	0.98+
c plus plus	TITLE	0.97+
Rust	TITLE	0.96+
SQL	TITLE	0.96+
today	DATE	0.96+
Influx	ORGANIZATION	0.95+
under 600 kilobytes	QUANTITY	0.95+
first	QUANTITY	0.95+
c plus plus	TITLE	0.95+
Apache	ORGANIZATION	0.95+
par K	TITLE	0.94+
React	TITLE	0.94+
Russ	ORGANIZATION	0.94+
About three months ago	DATE	0.93+
8:30 AM Pacific time	DATE	0.93+
twitter.com	OTHER	0.93+
last decade	DATE	0.93+
Node	ORGANIZATION	0.92+
Hadoop	TITLE	0.9+
InfluxData	ORGANIZATION	0.89+
c c plus plus	TITLE	0.89+
Cube	ORGANIZATION	0.89+
each column	QUANTITY	0.88+
InfluxDB	TITLE	0.86+
Influx DB	TITLE	0.86+
Mozilla	ORGANIZATION	0.86+
DB IOx	TITLE	0.85+

Evolving InfluxDB into the Smart Data Platform

>>This past May, theCUBE, in collaboration with InfluxData, shared with you the latest innovations in time series databases. We talked at length about why a purpose-built time series database, for many use cases, was a superior alternative to general-purpose databases trying to do the same thing. Now, you may remember that time series data is any data that's stamped in time, and if it's stamped, it can be analyzed historically. And when we introduced the concept to the community, we talked about how, in theory, those time slices could be taken, you know, every hour, every minute, every second, down to the millisecond, and how the world was moving toward real-time or near real-time data analysis to support physical infrastructure like sensors and other devices and IoT equipment. Time series databases have had to evolve to efficiently support real-time data in emerging use cases in IoT and other use cases. >>And to do that, new architectural innovations have to be brought to bear. As is often the case, open source software is the linchpin to those innovations. Hello and welcome to Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData and produced by theCUBE. My name is Dave Vellante, and I'll be your host today. Now, in this program we're going to dig pretty deep into what's happening with time series data generally, and specifically how InfluxDB is evolving to support new workloads and demands and data, specifically around data analytics use cases in real time. First, we're gonna hear from Brian Gilmore, who is the director of IoT and emerging technologies at InfluxData. And we're gonna talk about the continued evolution of InfluxDB and the new capabilities enabled by open source generally and specific tools. In this program you're gonna hear a lot about things like Rust, the implementation of Apache Arrow, the use of Parquet, and tooling such as DataFusion, which are powering a new engine for InfluxDB.
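The time slices Dave describes can be made concrete: the same time-stamped readings can be summarized at coarser or finer granularity by flooring each timestamp to a fixed window and aggregating per window. This is a minimal Python sketch, assuming integer millisecond timestamps and mean aggregation; `downsample` and the sample readings are hypothetical and are not an InfluxDB API.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_ms):
    """Average (timestamp_ms, value) points within fixed, aligned time windows.

    Returns (window_start_ms, mean_value) pairs sorted by window start.
    """
    buckets = defaultdict(list)
    for ts_ms, value in points:
        # Floor each timestamp to the start of the window that contains it.
        buckets[(ts_ms // window_ms) * window_ms].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical sensor: one reading every 250 ms for two seconds, rising slowly.
readings = [(ts, 20.0 + ts / 1000) for ts in range(0, 2000, 250)]

print(downsample(readings, window_ms=1000))  # [(0, 20.375), (1000, 21.375)]
print(len(downsample(readings, window_ms=500)))  # 4
print(len(downsample(readings, window_ms=250)))  # 8, i.e. every raw reading kept
```

Shrinking `window_ms` keeps more of the original detail at the cost of more output rows; once the window equals the sampling interval, nothing is aggregated away.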
>>Now, these innovations evolve the idea of time series analysis by dramatically increasing the granularity of time series data, compressing the historical time slices, if you will, from, for example, minutes down to milliseconds. And at the same time, they enable real-time analytics with an architecture that can process data much faster and much more efficiently. Now, after Brian, we're gonna hear from Anais Dotis-Georgiou, who is a developer advocate at InfluxData. And we're gonna get into the why of these open source capabilities and how they contribute to the evolution of the InfluxDB platform. And then we're gonna close the program with Tim Yocum. He's the director of engineering at InfluxData, and he's gonna explain how the InfluxDB community actually evolved the data engine in mid-flight and which decisions went into the innovations that are coming to the market. Thank you for being here. We hope you enjoy the program. Let's get started. Okay, we're kicking things off with Brian Gilmore. He's the director of IoT and emerging technology at InfluxData. Brian, welcome to the program. Thanks for coming on. >>Thanks, Dave. Great to be here. I appreciate the time. >>Hey, explain why InfluxDB, you know, needs a new engine. Was there something wrong with the current engine? What's going on there? >>No, no, not at all. I mean, for us, it's been about staying ahead of the market. I think, you know, if we think about what our customers are coming to us with now, you know, related to requests like SQL query support, things like that, we have to figure out a way to execute those for them in a way that will scale long term. And then we also wanna make sure we're innovating, that we're staying ahead of the market as well and anticipating those future needs. So, you know, this is really a transparent change for our customers.
I mean, I think we'll be adding new capabilities over time that leverage this new engine, but, you know, initially the customers who are using us are gonna see just great improvements in performance, especially those that are working at the top end of the workload scale, you know, the massive data volumes and things like that. >>Yeah, and we're gonna get into that today, and the architecture and the like, but what was the catalyst for the enhancements? I mean, when and how did this all come about? >>Well, I mean, like three years ago we were primarily on premises, right? We had our open source, we had an enterprise product, and shifting that technology, especially the open source code base, to a service basis where we were hosting it through multiple cloud providers, that was a long journey. Phase one was, you know, we wanted to host Enterprise for our customers, so we created a service where we just managed and ran our enterprise product for them. Phase two of this cloud effort was to optimize for multi-tenant, multi-cloud, to be able to host it in a truly SaaS manner where we could use, you know, some type of customer activity or consumption as the pricing vector. And that was the birth of the real first InfluxDB Cloud, which has been really successful.
>>We've seen, I think, like 60,000 people sign up, and we've got tons and tons of both enterprises as well as new companies, developers, and of course a lot of home hobbyists and enthusiasts who are using it on a daily basis. And having that big pool of very diverse customers to chat with as they're using the product, as they're giving us feedback, et cetera, has pointed us in a really good direction in terms of making sure we're continuously improving, and then also making these big leaps, as we're doing with this new engine. >>Right. So you've called it a transparent change for customers, so I'm presuming it's non-disruptive, but I really wanna understand how much of a pivot this is, and what does it take to make that shift from, you know, time series specialist to real-time analytics, and being able to support both? >>Yeah, I mean, it's much more of an evolution, I think, than a shift or a pivot. Time series data is always gonna be fundamental and the basis of the solutions that we offer our customers, and also of the ones that they're building on the raw APIs of our platform themselves. The time series market is one that we've worked diligently to lead, I think, when it comes to metrics, especially sensor data and app and infrastructure metrics. If we're being honest, though, I think our user base is well aware that the way we were architected was much more towards those backwards-looking, historical-type analytics, which are key for troubleshooting and making sure you don't, you know, run into the same problem twice.
But, you know, we had to ask ourselves: what can we do to better handle those queries from a performance and a time-to-response standpoint, and can we get that to the point where the result sets are coming back so quickly from the time of query that we can limit that window down to minutes and then seconds? >>And now with this new engine, we're really starting to talk about a query window that could be returning results in, you know, milliseconds of time since the data hit the ingest queue. And that's really getting to the point where, as your data is available, you can use it, you can query it, you can visualize it, and you can do all those magical things with it. And getting all of that to a place where we're saying yes to the customer on all of the real-time queries and the multiple-language query support was hard, but we're now at a spot where we can start introducing that to a limited number of customers, strategic customers, and strategic availability zones to start. But, you know, everybody over time. >>So you're basically going from what happened, and you can still do that obviously, to what's happening now, in the moment? >>Yeah, yeah. I mean, if you think about time, it's always sort of past, right? Like in the moment right now, whether you're talking about a millisecond ago or a minute ago, that's pretty much right now, I think, for most people, especially in these use cases where you have other components of latency induced by the underlying data collection, the architecture, the infrastructure, the devices, and the highly distributed nature of all of this. So yeah, getting a customer or a user to be able to use the data as soon as it is available is what we're after here.
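Brian's goal of making data queryable within milliseconds of hitting the ingest queue is often approached with micro-batching: rows are buffered briefly and flushed as small columnar batches once either a size limit or an age limit is reached. The Python sketch below illustrates that general technique only; the `MicroBatcher` class, its thresholds, and the caller-supplied clock (used so the example is deterministic) are hypothetical, and nothing here describes how the new InfluxDB engine actually handles ingest.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MicroBatcher:
    """Buffer incoming rows and flush them as column-oriented batches.

    A flush happens when the buffer holds `max_rows` rows or when the oldest
    buffered row has waited `max_wait_ms`, so no row waits much longer than
    `max_wait_ms` (plus the gap between calls) before it becomes queryable.
    """

    max_rows: int
    max_wait_ms: int
    buffer: list = field(default_factory=list)
    oldest_ms: Optional[int] = None
    batches: list = field(default_factory=list)

    def ingest(self, row: dict, now_ms: int) -> None:
        if not self.buffer:
            self.oldest_ms = now_ms
        self.buffer.append(row)
        self._maybe_flush(now_ms)

    def tick(self, now_ms: int) -> None:
        """Call periodically so a quiet stream still flushes on time."""
        self._maybe_flush(now_ms)

    def _maybe_flush(self, now_ms: int) -> None:
        if not self.buffer:
            return
        full = len(self.buffer) >= self.max_rows
        stale = now_ms - self.oldest_ms >= self.max_wait_ms
        if full or stale:
            # Pivot buffered rows into one list per column; assumes every row
            # in a batch has the same keys as the first one.
            keys = self.buffer[0].keys()
            self.batches.append({k: [row[k] for row in self.buffer] for k in keys})
            self.buffer = []
            self.oldest_ms = None


batcher = MicroBatcher(max_rows=3, max_wait_ms=50)
batcher.ingest({"time": 0, "temp": 21.0}, now_ms=0)
batcher.ingest({"time": 10, "temp": 21.5}, now_ms=10)
batcher.tick(now_ms=60)  # oldest row has waited 60 ms >= 50 ms: flush by age
for t in (100, 101, 102):
    batcher.ingest({"time": t, "temp": 22.0}, now_ms=t)  # third row: flush by size
print(batcher.batches[0])  # {'time': [0, 10], 'temp': [21.0, 21.5]}
print(len(batcher.batches))  # 2
```

The two limits pull in opposite directions: a smaller `max_wait_ms` lowers the delay before a row is visible, while a larger `max_rows` produces bigger batches that compress and scan better.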
>>I always thought, you know, of real time as before you lose the customer, but now, in this context, maybe it's before the machine blows up. >>Yeah, I mean, operational real time is different, you know, and that's one of the things that really showed us we were heading in the right direction: just how many operational customers we have. Everything from aerospace and defense, where we've got companies monitoring satellites, to tons of industrial users using us as a process historian on the plant floor. And if we can satisfy their demands for a real-time historical perspective, that's awesome. I think what we're gonna do here is start to edge into the real time that they're used to, in terms of the millisecond response times that they expect of their control systems, certainly not of their historians and databases. >>Are these innovations available only to InfluxDB Cloud customers? Who can access this capability? >>Yeah. I mean, commercially and today, yes. I think we want to emphasize that's for now; our goal is to get our latest and greatest and our best to everybody over time, of course. One of the things we had to do here was double down on our commitment to open source and availability. So anybody today can take a look at the libraries on our GitHub, can inspect them, and can even try to implement or execute some of it themselves in their own infrastructure. We're committed to bringing our latest and greatest to our cloud customers first for a couple of reasons. Number one, those are big workloads, and they have high expectations of us.
I think, number two, it also gives us the opportunity to monitor a little bit more closely how it's working, how they're using it, and how the system itself is performing. >>And so just, you know, being careful, maybe a little cautious in terms of how big we go with this right away, both limits the risk of any issues that can come with new software rollouts (we haven't seen anything so far) and gives us the opportunity to have meaningful conversations with a small group of users who are using the products. But once we get through that and they give us two thumbs up on it, it'll be like, open the gates and let everybody in. It's gonna be an exciting time for the whole ecosystem. >>Yeah, that makes a lot of sense. And you can do some experimentation, you know, using the cloud resources. Let's dig into some of the architectural and technical innovations that are gonna help deliver on this vision. What should we know there? >>Well, I mean, foundationally, we built the new core on Rust. This is a very popular new systems language. It's extremely efficient, and it's also built for speed and memory safety, which goes back to us being able to deliver it in a way that is something we can inspect very closely, but then also rely on the fact that it's going to behave well. And if it does find error conditions... I mean, we've loved working with Go, and a lot of our libraries will continue to be implemented in Go, but when it came to this particular new engine, for that power, performance, and stability, Rust was critical. On top of that, we've also integrated Apache Arrow, and Apache Parquet for persistence.
I think for anybody who's really familiar with the nuts and bolts of our backend, our TSI and our Time-Structured Merge Trees, this is a big break from that: Arrow on the in-memory side and Parquet on the on-disk side. >>It allows us to present, you know, a unified set of APIs for those really fast real-time queries that we talked about, as well as for very large historical bulk data archives in that Parquet format, which is also cool because there's an entire ecosystem popping up around Parquet in terms of the machine learning community. And to get that all to work, we had to glue it together with Arrow Flight. That's what we're using as our RPC component. It handles the orchestration and the transportation of the columnar data. Now we're moving to a true columnar database model for this version of the engine, and it removes a lot of overhead for us in terms of having to manage all that serialization and deserialization, and, again, it blurs that line between real-time and historical data. It's highly optimized for both streaming micro-batch and batches, but true streaming as well. >>Yeah. Again, I mean, it's funny you mentioned Rust. It's been around for a long time, but its popularity is, you know, really starting to hit that steep part of the S-curve. And we're gonna dig into more of that, but is there anything else that we should know about, Brian? Give us the last word. >>Well, I mean, first I'd like everybody watching to take a look at what we're offering in terms of early access and beta programs. If you wanna participate, or if you wanna work in terms of early access with the new engine, please reach out to the team.
I'm sure, you know, there's a lot of communication going out, and it'll be highly featured on our website, but reach out to the team. Believe it or not, we have a lot more going on than just the new engine. There are also other programs, things we're offering to customers in terms of the user interface, data collection, and things like that. And, you know, if you're a customer of ours and you have a sales team, a commercial team that you work with, you can reach out to them and see what you can get access to, because we can flip a lot of stuff on, especially in cloud, through feature flags. >>But if there's something new that you wanna try out, we'd just love to hear from you. And then, you know, our goal would be that as we give you access to all of these new cool features, you would give us continuous feedback on these products and services, not only on what you need today, but on what you'll need tomorrow to build the next versions of your business. Because, you know, the whole database ecosystem, as it expands out into this vertically oriented stack of cloud services and enterprise databases and edge databases, is gonna be what we all make it together, not just those of us who are employed by InfluxData. And then finally, I would just say please watch Anais's and Tim's sessions. These are two of our best and brightest. They're totally brilliant, completely pragmatic, and they are most of all customer obsessed, which is amazing. And there are no better takes, honestly, on the technical details of this, especially when it comes to the value that these investments will bring to our customers and our communities. So I encourage you to pay more attention to them than you did to me, for sure. >>Brian Gilmore, great stuff. Really appreciate your time. Thank you. >>Yeah, thanks, Dave. It was awesome.
Look forward to it. >>Yeah, me too. Looking forward to seeing how the community actually applies these new innovations and goes beyond just the historical into the real-time, really hot area. As Brian said, in a moment I'll be right back with Anaïs Dotis-Georgiou to dig into the critical aspects of key open source components of the InfluxDB engine, including Rust, Arrow, Parquet and DataFusion. Keep it right there. You don't wanna miss this. >>Time series data is everywhere. The number of sensors, systems and applications generating time series data increases every day. All these data sources producing so much data can cause analysis paralysis. InfluxDB is an entire platform designed with everything you need to quickly build applications that generate value from time series data. InfluxDB Cloud is a serverless solution, which means you don't need to buy or manage your own servers. There's no need to worry about provisioning because you only pay for what you use. InfluxDB Cloud is fully managed, so you get the newest features and enhancements as they're added to the platform's code base. It also means you can spend time building solutions and delivering value to your users instead of wasting time and effort managing something else. InfluxDB Cloud offers a range of security features to protect your data. Multiple layers of redundancy ensure you don't lose any data. Access controls ensure that only the people who should see your data can see it. >>And encryption protects your data at rest and in transit between any of our regions or cloud providers. InfluxDB uses a single API across the entire platform suite, so you can build on open source, deploy to the cloud and then easily query data in the cloud, at the edge or on prem using the same scripts. And InfluxDB is schemaless, automatically adjusting to changes in the shape of your data without requiring changes in your application logic. InfluxDB Cloud is production ready from day one.
All it needs is your data and your imagination. Get started today at influxdata.com/cloud. >>Okay, we're back. I'm Dave Vellante with theCUBE, and you're watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData. Anaïs Dotis-Georgiou is here. She's a developer advocate for InfluxData, and we're gonna dig into the rationale and value contribution behind several open source technologies that InfluxDB is leveraging to increase the granularity of time series analysis and bring the world of data into real-time analytics. Anaïs, welcome to the program. Thanks for coming on. >>Hi, thank you so much. It's a pleasure to be here. >>Oh, you're very welcome. Okay, so IOx is being touted as this next-gen open source core for InfluxDB. And my understanding is that it leverages in-memory, of course, for speed. It's a columnar store, so it gives you compression efficiency, it's gonna give you faster query speeds, and you store files in object storage, so you've got a very cost-effective approach. Are these the salient points on the platform? I know there are probably dozens of other features, but what are the high-level value points that people should understand? >>Sure, that's a great question. So some of the main requirements that IOx is trying to achieve, and some of the most impressive ones to me: the first one is that it aims to have no limits on cardinality and also to allow you to write any kind of event data that you want, whether that's a tag or a field. It also wants to deliver best-in-class performance on analytics queries, in addition to our already well-served metrics queries. We also wanna have operator control over memory usage, so you should be able to define how much memory is used for buffering, caching and query processing. Another really important part is the ability to do bulk data export and import, which is super useful.
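Cardinality here means the number of distinct series the database has to track. The minimal Python sketch below assumes a simplified model in which a series is a unique combination of measurement, tag set and field key; the function names and data are hypothetical, and this is not IOx code. It shows why a tag whose value is unique per point (such as a request ID) makes cardinality grow with every write, the kind of high-cardinality workload the requirement above is about.

```python
from itertools import product


def series_key(measurement, tags, field):
    """Build a hashable series identifier (simplified model, an assumption)."""
    # Sort tags so that {"room": "a", "house": "1"} and {"house": "1", "room": "a"}
    # map to the same series.
    return (measurement, tuple(sorted(tags.items())), field)


def series_cardinality(points):
    """Count unique series among (measurement, tags, field, value) points."""
    return len({series_key(m, t, f) for m, t, f, _ in points})


# Hypothetical data: 3 houses x 4 rooms, each reporting one field.
points = [
    ("temperature", {"house": h, "room": r}, "celsius", 21.0)
    for h, r in product(["h1", "h2", "h3"], ["kitchen", "bedroom", "office", "hall"])
]
print(series_cardinality(points))  # 12

# A tag whose value is unique per point (here a made-up request ID)
# turns every written point into its own series.
points_with_ids = [
    (m, {**tags, "request_id": str(i)}, f, v)
    for i, (m, tags, f, v) in enumerate(points * 10)
]
print(series_cardinality(points_with_ids))  # 120
```

Sorting the tags is the design choice that keeps the count honest: without it, the same series written with tags in a different order would be counted twice.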
There's also broader ecosystem compatibility: where possible, we aim to use and embrace emerging standards in the data analytics ecosystem and have compatibility with things like SQL, Python, and maybe even pandas in the future. >>Okay, so a lot there. Now, we talked to Brian about how you're using Rust, which is not a new programming language, and of course we had some drama around Rust during the pandemic with the Mozilla layoffs, but the formation of the Rust Foundation really addressed any of those concerns. You've got big guns like Amazon and Google and Microsoft throwing their collective weight behind it. The adoption is really starting to get steep on the S-curve. So lots of platforms, lots of adoption with Rust, but why Rust as an alternative to, say, C++, for example? >>Sure, that's a great question. So Rust was chosen because of its exceptional performance and reliability. Rust is syntactically similar to C++, it has similar performance, and it also compiles to native code like C++. But unlike C++, it has much better memory safety. Memory safety is protection against bugs or security vulnerabilities that lead to excessive memory usage or memory leaks. And Rust achieves this memory safety due to its innovative type system. Additionally, it doesn't allow for dangling pointers, and dangling pointers are one of the main classes of errors that lead to exploitable security vulnerabilities in languages like C++. So Rust helps meet that requirement of having no limits on cardinality, for example, because we're also using the Rust implementation of Apache Arrow and this control over memory. And Rust's packaging system, called crates.io, also offers everything that you need out of the box to have features like async and await to fix race conditions, protection against buffer overflows, and thread-safe async caching structures as well.
So essentially it has all the fine-grained control you need to take advantage of memory and all your resources as well as possible, so that you can handle those really, really high-cardinality use cases. >>Yeah, and the more I learn about the new engine and the platform, IOx et cetera, you see things like, in the old days and even today, you do a lot of garbage collection in these systems, and there's an inverse impact relative to performance. So it looks like the community is really modernizing the platform. But I wanna talk about Apache Arrow for a moment. It's designed to address the constraints that are associated with analyzing large data sets. We know that, but please explain: what is Arrow, and what does it bring to InfluxDB? >>Sure, yeah. So Arrow is a framework for defining in-memory columnar data. And so much of the efficiency and performance of IOx comes from taking advantage of columnar data structures. And if you don't mind, I'll take a moment to illustrate why columnar data structures are so valuable. Let's pretend that we are gathering field data about the temperature in our room and also maybe the temperature of our stove. And in our table we have those two temperature values, as well as maybe a measurement value, a timestamp value, and maybe some other tag values that describe what room and what house, et cetera, we're getting this data from. And so you can picture this table where we have two rows with the two temperature values for both our room and the stove. Well, usually our room temperature is regulated, so those values don't change very often. >>So when you have column-oriented storage, essentially you take each column and group it together.
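To make the row-versus-column comparison concrete, here is a minimal Python sketch using hypothetical readings modeled on the room-and-stove example. It illustrates the general idea only, not how Arrow or IOx actually lay out memory: `to_columns` pivots row records into one list per column, a min/max query then reads a single list, and `run_length_encode` (a simple stand-in for the compression schemes real columnar formats use) shows why runs of equal neighboring values compress cheaply.

```python
from itertools import groupby

# Hypothetical readings: the room temperature barely changes, the stove's does.
rows = [
    {"time": 0, "house": "h1", "room_temp": 21.0, "stove_temp": 180.0},
    {"time": 1, "house": "h1", "room_temp": 21.0, "stove_temp": 185.5},
    {"time": 2, "house": "h1", "room_temp": 21.0, "stove_temp": 190.2},
    {"time": 3, "house": "h1", "room_temp": 21.5, "stove_temp": 188.9},
    {"time": 4, "house": "h1", "room_temp": 21.5, "stove_temp": 150.0},
]


def to_columns(rows):
    """Pivot a row-oriented table into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal neighboring values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows)

# Column-oriented: min/max of the room temperature reads only that one list.
room = columns["room_temp"]
print(min(room), max(room))  # 21.0 21.5

# Row-oriented: every record is visited and one field plucked out of each;
# in an on-disk row layout, each record's other fields and tags come along too.
print(min(r["room_temp"] for r in rows), max(r["room_temp"] for r in rows))  # 21.0 21.5

# Equal neighbors in a column collapse into a few runs...
print(run_length_encode(room))  # [(21.0, 3), (21.5, 2)]
# ...while a fast-changing column gains nothing from this particular scheme.
print(len(run_length_encode(columns["stove_temp"])))  # 5
```

Run-length encoding is used here only because it is the easiest way to see the effect of neighboring equal values; production formats combine several encodings and general-purpose compression.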
And so if that's the case, and you're just taking temperature values from the room and a lot of those temperature values are the same, then you might be able to imagine how equal values will neighbor each other, and when they neighbor each other in the storage format, this provides a really perfect opportunity for cheap compression. And this cheap compression enables high-cardinality use cases. It also enables faster scan rates. So if you wanna find the min and max value of the temperature in the room across a thousand different points, you only have to get those thousand points in order to answer that question, and you have those immediately available to you. But let's contrast this with a row-oriented storage solution instead, so that we can better understand the benefits of column-oriented storage. >>So if you had row-oriented storage, you'd first have to look at every field, like the temperature in the room and the temperature of the stove. You'd have to go across every tag value that maybe describes where the room is located or what model the stove is, and every timestamp. You'd then have to pluck out that one temperature value that you want at that one timestamp, and do that for every single row. So you're scanning across a ton more data, and that's why row-oriented storage doesn't provide the same efficiency as columnar. And Apache Arrow is an in-memory columnar data framework, so that's where a lot of the advantages come from. >>Okay. So you basically described a traditional database, a row approach, but I've seen a lot of traditional databases say, okay, now we can handle columnar format, versus what you're talking about, which is really native. Is it not as effective? Is the format not as effective because it's largely a bolt-on? Can you elucidate on that front?
>>Yeah, it's not as effective because you have more expensive compression and because you can't scan across the values as quickly. Those are pretty much the main reasons why row-oriented storage isn't as efficient as column-oriented storage. Yeah. >>Got it. So let's talk about Arrow DataFusion. What is DataFusion? I know it's written in Rust, but what does it bring to the table here? >>Sure. So it's an extensible query execution framework, and it uses Arrow as its in-memory format. The way that it helps in InfluxDB IOx is this: okay, it's great if you can write an unlimited amount of cardinality into InfluxDB IOx, but if you don't have a query engine that can successfully query that data, then I don't know how much value it is for you. So DataFusion helps enable the query processing and transformation of that data. It also has a pandas API, so that you could take advantage of pandas DataFrames as well and all of the machine learning tools associated with pandas. >>Okay. You're also leveraging Parquet in the platform, 'cause we heard a lot about Parquet in the middle of the last decade as a storage format to improve on Hadoop column stores. What are you doing with Parquet, and why is it important? >>Sure. So Parquet is the column-oriented durable file format. It's important because it'll enable bulk import and bulk export, and it has compatibility with Python and pandas, so it supports a broader ecosystem. Parquet files also take very little disk space, and they're faster to scan because, again, they're column oriented. In particular, I think Parquet files are like 16 times cheaper than CSV files, just as a point of reference. And so that's essentially a lot of the benefits of Parquet. >>Got it. Very popular. So Anaïs, what exactly is InfluxData focusing on as a committer to these projects? What is your focus? What's the value that you're bringing to the community? >>Sure.
So InfluxData has contributed a lot of different things to the Apache ecosystem. For example, they contributed an implementation of Apache Arrow in Go, and that will support querying with Flux. Also, there have been quite a few contributions to DataFusion for things like memory optimization and support for additional SQL features, like support for timestamp arithmetic, support for EXISTS clauses, and support for memory control. So yeah, Influx has contributed a lot to the Apache ecosystem and continues to do so. And I think the idea here, the long-term strategy, is that the more you contribute to these upstream projects and build them up, the more you will perpetuate that cycle of improvement, and the more we will invest in our own project as well. So it's just that kind of symbiotic relationship and appreciation of the open source community. >>Yeah. Got it. You've got that virtuous cycle going, what people call the flywheel. Give us your last thoughts and kind of summarize what the big takeaways are from your perspective. >>So I think the big takeaway is that InfluxData is doing a lot of really exciting things with InfluxDB IOx. If you are interested in learning more about the technologies that Influx is leveraging to produce IOx, the challenges associated with it and all of the hard work, or you just have questions and wanna learn more, then I would encourage you to go to the monthly tech talks and community office hours, which are on every second Wednesday of the month at 8:30 AM Pacific time. There's also a community forum and a community Slack channel; look for the influxdb_iox channel specifically to learn more about how to join those office hours and those monthly tech talks, as well as to ask any questions you have about IOx, what to expect and what you'd like to learn more about. As a developer advocate, I wanna answer your questions.
So if there's a particular technology or stack that you wanna dive deeper into, and you want more explanation about how InfluxDB leverages it to build IOx, I will be really excited to produce content on that topic for you. >>Yeah, that's awesome. You guys have a really rich community where you can collaborate with your peers and solve problems, and you guys are super responsive, so we really appreciate that. All right, thank you so much, Anaïs, for explaining all this open source stuff to the audience and why it's important to the future of data. >>Thank you. I really appreciate it. >>All right, you're very welcome. Okay, stay right there, and in a moment I'll be back with Tim Yocum. He's the director of engineering for InfluxData, and we're gonna talk about how you update a SaaS engine while the plane is flying at 30,000 feet. You don't wanna miss this. >>I'm really glad that we went with InfluxDB Cloud for our hosting, because it has saved us a ton of time. It's helped us move faster, and it's saved us money. And InfluxDB also has good support. My name's Alex Nauda. I am CTO at Nobl9. Nobl9 is a platform to measure and manage service level objectives, which are a great way of measuring the reliability of your systems. You can essentially think of an SLO, the product we're providing to our customers, as a bunch of time series. So we need a way to store that data and the corresponding time series that are related to it. The main reason that we settled on InfluxDB as we were shopping around is that InfluxDB has a very flexible query language, and as a general-purpose time series database, it basically had the set of features we were looking for. >>As our platform has grown, we've found InfluxDB Cloud to be a really scalable solution. We can quickly iterate on new features and functionality because InfluxDB Cloud is entirely managed; it's probably saved us at least one full additional person on our team.
We also have the option of running InfluxDB Enterprise, which gives us the ability to host off the cloud or in a private cloud if that's preferred by a customer. InfluxData has been really flexible in adapting to the hosting requirements that we have. They listened to the challenges we were facing, and they helped us solve them. As we've continued to grow, I'm really happy we have InfluxData by our side. >>Okay, we're back with Tim Yocum, who is the director of engineering at InfluxData. Tim, welcome. Good to see you. >>Good to see you. Thanks for having me. >>You're really welcome. Listen, we've been covering open source software on theCUBE for more than a decade, and we've kind of watched the innovation from the big data ecosystem. The cloud has been built out on open source, as have mobile, social platforms and key databases, and of course InfluxDB, and InfluxData has been a big consumer and contributor of open source software. So my question to you is, where have you seen the biggest bang for the buck from open source software? >>So yeah, at Influx, we really thrive at the intersection of commercial services and open source software. OSS keeps us on the cutting edge. We benefit from OSS in delivering our own service, from our core storage engine technologies to web services and templating engines. Our team stays lean and focused because we build on proven tools. We really build on the shoulders of giants, and like you mentioned, even better, we contribute a lot back to the projects that we use, as well as to our own product, InfluxDB. >>You know, but I gotta ask you, Tim, because one of the challenges that we've seen, and in particular you saw this in the heyday of Hadoop, is that the innovations come so fast and furious, and as a software company you gotta place bets, you gotta commit people, and sometimes those bets can be risky and not pay off. How have you managed this challenge? >>Oh, it moves fast.
Yeah, that's a benefit, though, because the community moves so quickly that today's hot technology can be tomorrow's dinosaur. And what we tend to do is fail fast and fail often. We try a lot of things. You look at Kubernetes, for example: that ecosystem is driven by thousands of intelligent developers, engineers and builders, and they're adding value every day. So we have to really keep up with that. And as the stack changes, we try different technologies, we try different methods, and at the end of the day, we come up with a better platform as a result of just the constant change in the environment. It is a challenge for us, but it's something that we just do every day. >>So we have a survey partner down in New York City called Enterprise Technology Research, ETR, and they do these quarterly surveys of about 1,500 CIOs and IT practitioners, and they really have a good pulse on what's happening with spending. And the data shows that containers generally, but specifically Kubernetes, is one of the areas that has been off the charts and seen the most significant adoption and velocity, particularly along with cloud. Kubernetes is still consistently up and to the right, even with the macro headwinds and all of the stuff that we're sick of talking about. So what are you doing with Kubernetes in the platform? >>Yeah, it's really central to our ability to run the product. When we first started out, we were just on AWS, and the way we were running was a little bit like containers junior. Now we're running Kubernetes everywhere: AWS, Azure, Google Cloud.
It allows us to have a consistent experience across three different cloud providers, and we can manage that in code, so our developers can focus on delivering services, not on trying to learn the intricacies of Amazon, Azure and Google and figuring out how to deliver services on those three clouds with all of their differences. >>Just to follow up on that: it sounds like there's a PaaS layer there to allow you guys to have a consistent experience across clouds and out to the edge, wherever. Is that correct? >>Yeah, so we've basically built, more or less, platform engineering. This is the new hot phrase. Kubernetes has made a lot of things easy for us because we've built a platform that our developers can lean on, and they only have to learn one way of deploying their application and managing their application. And so that just gets all of the underlying infrastructure out of the way and lets them focus on delivering InfluxDB Cloud. >>Yeah, and I know I'm taking a little bit of a tangent, but that, I'll call it a PaaS layer if I can use that term: are there specific attributes to InfluxDB, or is it kind of just generally off-the-shelf PaaS? Is there any purpose-built capability there that is value-add, or is it pretty much generic? >>So we really look at things through a build-versus-buy lens. For some things, we want to leverage cloud provider services, for instance Postgres databases for metadata; we'll get that off of our plate and let someone else run it. We're going to deploy a platform that our engineers can deliver on, that has consistency, that is all generated from code, and that we, as an SRE group, as an ops team, can manage with very few people really, and we can stamp out clusters across multiple regions in no time.
>>So sometimes you build, sometimes you buy. How do you make those decisions, and what does that mean for the platform and for customers? >>Yeah, so what we're doing is what everybody else will do: we're looking for trade-offs that make sense. We really want to protect our customers' data, so we look for services that support our own software with the most uptime, reliability and durability we can get. Some things are just going to be easier to have a cloud provider take care of on our behalf. We make that transparent for our own team, and of course as a customer you don't even see that. But we don't want to try to reinvent the wheel; like I had mentioned with SQL data stores for metadata, let's build on top of what these three large cloud providers have already perfected. And we can then focus on our platform engineering, and we can have our developers focus on the InfluxData software, the InfluxDB Cloud software. >>So take it to the customer level: what does it mean for them? What's the value that they're gonna get out of all these innovations that we've been talking about today, and what can they expect in the future? >>So first of all, people who use the OSS product are really gonna be at home on our cloud platform. You can run it on your desktop machine, on a single server, what have you, but then you want to scale up. We have some 270 terabytes of data across over 4 billion series keys that people have stored, so there's a proven ability to scale. Now, in terms of the open source software and how we've developed the platform, you're getting a highly available, high-cardinality time series platform. We manage it, and really, as I mentioned earlier, we can keep up with the state of the art. We keep reinventing, we keep deploying things in real time. We deploy to our platform every day, repeatedly, all the time.
And it's that continuous deployment that allows us to continue testing things in flight and rolling out changes: new features, better ways of doing deployments, safer ways of doing deployments. >>All of that happens behind the scenes. And like we mentioned earlier, Kubernetes allows us to get that done. We couldn't do it without having that platform as a base layer for us to then put our software on. So we iterate quickly. When you're on the InfluxDB Cloud platform, you really are able to take advantage of new features immediately. We roll things out every day, and as those things go into production, you have the ability to use them. And so in the end we want you to focus on getting actual insights from your data instead of running infrastructure; let us do that for you. So, >>And that makes sense. But are the innovations that we're talking about in the evolution of InfluxDB a natural evolution for existing customers? I'm sure the answer is both, but is it opening up new territory for customers? Can you add some color to that? >>Yeah, it really is a little bit of both. Any engineer will say, well, it depends. So cloud native technologies are really the hot thing. IoT, industrial IoT especially: people want to just shove tons of data out there and be able to do queries immediately, and they don't wanna manage infrastructure. What we've started to see are people that use the cloud service as their data store backbone, and then they use edge computing with our OSS product to ingest data from, say, multiple production lines and downsample that data, then send the rest of that data off to InfluxDB Cloud, where the heavy processing takes place.
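The edge pattern Tim describes, downsampling locally before forwarding data to the cloud, can be sketched in a few lines of Python. This is a hypothetical illustration rather than InfluxDB's own downsampling feature or API: `downsample` groups `(timestamp_s, value)` pairs into fixed windows and keeps a mean, min, max and count per window, so only the summaries would need to be sent onward. The window length, the sample data and the function name are assumptions, and the actual forwarding step is left out.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_s):
    """Aggregate (timestamp_s, value) points into fixed-length windows.

    Returns (window_start, mean, min, max, count) tuples sorted by window.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Floor each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return [
        (start, mean(vals), min(vals), max(vals), len(vals))
        for start, vals in sorted(buckets.items())
    ]


# Hypothetical raw readings from one production line, one per second for two minutes.
raw = [(t, 20.0 + (t % 10)) for t in range(120)]

summary = downsample(raw, window_s=60)
print(summary)  # [(0, 24.5, 20.0, 29.0, 60), (60, 24.5, 20.0, 29.0, 60)]
```

Here 120 raw points shrink to two summary rows; keeping min and max alongside the mean preserves spikes that averaging alone would hide.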
So really, us being in all the different clouds, iterating on that, and being in all sorts of different regions allows people to really get out of the business of trying to manage that big data and have us take care of it. And of course, as we change the platform, end users benefit from that immediately. And, >>And so obviously you're taking away a lot of the heavy lifting for the infrastructure. Would you say the same thing about security, especially as you go out to IoT and the edge? How should we be thinking about the value that you bring from a security perspective? >>Yeah, we take security super seriously. It's built into our DNA. We do a lot of work to ensure that our platform is secure and that the data we store is kept private. It's of course always a concern. You see companies being compromised in the news all the time. That's something that you can have an entire team working on, which we do, to make sure that the data that you have, whether it's in transit or at rest, is always kept secure and is only viewable by you. You look at things like software bills of materials: if you're running this yourself, you have to go vet all sorts of different pieces of software. And we do that as we use new tools. It's just part of our jobs to make sure that the platform we're running has fully vetted software, and with open source especially, that's a lot of work. And so it's definitely new territory. Supply chain attacks are definitely happening at a higher clip than they used to, but that is really just part of a day in the life for folks like us that are building platforms. >>Yeah, and that's key. I mean, especially when you start getting into IoT and the operations technologies, the engineers running that infrastructure, historically, as you know, Tim, they would air gap everything.
That's how they kept it safe. But that's not feasible anymore. Everything's connected now, right? And so you've gotta have a partner that can, again, take away that heavy lifting on R&D so you can focus on some of the other activities. Right. Give us the last word and the key takeaways from your perspective. >>Well, from my perspective I see it as a two-lane approach with Influx, with any time series data. You've got a lot of stuff that you're gonna run on-prem, what you had mentioned, air gapping. Sure, there's plenty of need for that. But at the end of the day, there are people that don't want to run big data centers, people that want to trust their data to a company that's got a full platform set up for them that they can build on, and send that data over to the cloud. The cloud is not going away. I think a more hybrid approach is where the future lives, and that's what we're prepared for. >>Tim, really appreciate you coming to the program. Great stuff. Good to see you. >>Thanks very much. Appreciate it. >>Okay, in a moment I'll be back to wrap up today's session. You're watching theCUBE. >>Are you looking for some help getting started with InfluxDB, Telegraf or Flux? >>Check out InfluxDB University, >>where you can find our entire catalog of free training that will help you make the most of your time series data. >>Get started for free at influxdbu.com. >>We'll see you in class. >>Okay, so we heard today from three experts on time series and data about how the InfluxDB platform is evolving to support new ways of analyzing large data sets very efficiently and effectively in real time. And we learned that key open source components like Apache Arrow, the Rust programming environment, DataFusion and Parquet are being leveraged to support real-time data analytics at scale.
We also learned about the contributions and importance of open source software, and how the InfluxDB community is evolving the platform with minimal disruption to support new workloads, new use cases, and the future of real-time data analytics. Now remember, these sessions are all available on demand. You can go to thecube.net to find those. Don't forget to check out siliconangle.com for all the news related to things enterprise and emerging tech. And you should also check out influxdata.com. There you can learn about the company's products. You'll find developer resources like free courses. You can join the developer community and work with your peers to learn and solve problems. And there are plenty of other resources around use cases and customer stories on the website. This is Dave Vellante. Thank you for watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData and brought to you by theCUBE, your leader in enterprise and emerging tech coverage.

Published Date : Nov 2 2022

SUMMARY :

we talked about how in theory, those time slices could be taken, As is often the case, open source software is the linchpin to those innovations. We hope you enjoy the program. I appreciate the time. Hey, explain why InfluxDB needs a new engine. now, related to requests like SQL query support, things like that, of the real first InfluxDB Cloud, which has been really successful. as they're giving us feedback, et cetera, has pointed us in a really good direction shift from time series specialist to real-time analytics better handle those queries from a performance and a time-to-response on the queries, all of the real-time queries, the multiple language query support, the devices and the sort of highly distributed nature of all of this. I always thought of real time as before you lose the customer, and that's one of the things that really triggered us to know that we were heading in the right direction, a look at the libraries on our GitHub and can inspect it and even can try And so just being careful, maybe a little cautious in terms And you can do some experimentation using the cloud resources. This is a new, very sort of popular systems language, really fast real-time queries that we talked about, as well as for very large, but its popularity is really starting to hit that steep part of the S-curve. going out and it'll be highly featured on our website, the whole database ecosystem as it expands out into this vertically oriented Really appreciate your time. Look forward to it. goes beyond just the historical into the real-time, really hot area. There's no need to worry about provisioning because you only pay for what you use.
InfluxDB uses a single API across the entire platform suite so you can build on InfluxDB is leveraging to increase the granularity of time series analysis and bring the Hi, thank you so much. it's gonna give you faster query speeds, you store files in object storage, it aims to have no limits on cardinality and also allow you to write any kind of event data that The adoption is really starting to get steep on all the control, all the fine-grained control, you need to take the community is modernizing the platform, but I wanna talk about Apache And so you can answer that question and you have those immediately available to you. out that one temperature value that you want at that one timestamp and do that for every talking about is really kind of native. Is it not as effective? Yeah, it's not as effective because you have more expensive compression and So let's talk about Arrow DataFusion. It also has a pandas API so that you could take advantage of pandas What are you doing with and pandas, so it supports a broader ecosystem. What's the value that you're bringing to the community? And I think kind of the idea here is that if you can improve kind of summarize what the big takeaways are from your perspective. the hard work questions and you All right, thank you so much Anaïs for explaining I really appreciate it. Data and we're gonna talk about how you update a SaaS engine while I'm really glad that we went with InfluxDB Cloud for our hosting They listened to the challenges we were facing and they helped Good to see you. Good to see you. So my question to you is, So yeah, Influx really, we thrive at the intersection of commercial services and open, You look at Kubernetes for example, But really Kubernetes is just Azure, and Google and figure out how to deliver services on those three clouds with all of their differences.
to the edge, you know, wherever is that, is that correct? This is the new hot phrase, you know, it, it's, Kubernetes has made a lot of things easy for us Is that, are there specific attributes to Influx db as an SRE group, as an ops team, that we can manage with very few people So how, so sometimes you build, sometimes you buy it. And of course for customers you don't even see that, but we don't want to try to reinvent the wheel, and really as, as I mentioned earlier, we can keep up with the state of the art. the end we want you to focus on getting actual insights from your data instead of running infrastructure, So cloud native technologies are, are really the hot thing. You see in the news all the time, companies being compromised, you know, technologies, the engineers running the, that infrastructure, you know, historically, as you know, take away that heavy lifting to r and d so you can focus on some of the other activities. with influx, with Anytime series data, you know, you've got a lot of stuff that you're gonna run on-prem, Tim, really appreciate you coming to the program. Thanks very much. Okay, in a moment I'll be back to wrap up. brought to you by the Cube, your leader in enterprise and emerging tech coverage.

ENTITIES

Entity	Category	Confidence
Brian Gilmore	PERSON	0.99+
David Brown	PERSON	0.99+
Tim Yoakum	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Dave Volante	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Brian	PERSON	0.99+
Dave	PERSON	0.99+
Tim Yokum	PERSON	0.99+
Stu	PERSON	0.99+
Herain Oberoi	PERSON	0.99+
John	PERSON	0.99+
Dave Valante	PERSON	0.99+
Kamile Taouk	PERSON	0.99+
John Fourier	PERSON	0.99+
Rinesh Patel	PERSON	0.99+
Santana Dasgupta	PERSON	0.99+
Europe	LOCATION	0.99+
Canada	LOCATION	0.99+
BMW	ORGANIZATION	0.99+
Cisco	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
ICE	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Jack Berkowitz	PERSON	0.99+
Australia	LOCATION	0.99+
NVIDIA	ORGANIZATION	0.99+
Telco	ORGANIZATION	0.99+
Venkat	PERSON	0.99+
Michael	PERSON	0.99+
Camille	PERSON	0.99+
Andy Jassy	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Venkat Krishnamachari	PERSON	0.99+
Dell	ORGANIZATION	0.99+
Don Tapscott	PERSON	0.99+
thousands	QUANTITY	0.99+
Palo Alto	LOCATION	0.99+
Intercontinental Exchange	ORGANIZATION	0.99+
Children's Cancer Institute	ORGANIZATION	0.99+
Red Hat	ORGANIZATION	0.99+
telco	ORGANIZATION	0.99+
Sabrina Yan	PERSON	0.99+
Tim	PERSON	0.99+
Sabrina	PERSON	0.99+
John Furrier	PERSON	0.99+
Google	ORGANIZATION	0.99+
MontyCloud	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Leo	PERSON	0.99+
COVID-19	OTHER	0.99+
Santa Ana	LOCATION	0.99+
UK	LOCATION	0.99+
Tushar	PERSON	0.99+
Las Vegas	LOCATION	0.99+
Valente	PERSON	0.99+
JL Valente	PERSON	0.99+
1,000	QUANTITY	0.99+

Felix Van de Maele, Collibra, Data Citizens 22

(upbeat techno music) >> Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back-office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from primarily a compliance-driven issue to becoming a linchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills to develop data architectures and processes to serve the myriad data needs of organizations. And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum to rethinking monolithic data architectures. You hear about initiatives like Data Mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business users. You hear a lot about data democratization. So these decentralization efforts around data, they're great, but they create a new set of problems. Specifically, how do you deliver a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that, but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important.
In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is going to use data that is untrusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and to automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cybersecurity. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us, along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm also going to sit down with Laura Sellers, she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. He's the Vice President of Data Quality at Collibra.
He's an amazingly smart dude who founded Owl DQ, a company that he sold to Collibra last year. Now, many companies, they didn't make it through the Hadoop era, you know, they missed the industry waves and they became driftwood. Collibra, on the other hand, has evolved its business: it's leveraged the cloud, expanded its product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. (upbeat rock music) Last year theCUBE covered Data Citizens, Collibra's customer event, and the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement, we had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low-code, no-code, et cetera, businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation. Meaning, making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on theCUBE. We're excited to have you, Felix. Good to see you again. >> Likewise, Dave. Thanks for having me again. >> You bet.
All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021, and we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains and uncertain economic trends, and we're not just snapping back to the 2010s, that's clear, and that's really true as well in the world of data. So what's different in your mind in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? >> Yeah, absolutely, and I think you said it well, Dave, in the intro: that rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing. I think what is changing is that the trend has become much more acute. The other thing we've seen over the last couple of years is the level of scrutiny that organizations are under with respect to data. As data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, and regulatory compliance has only increased as well. Which, again, is really difficult in this environment of continuous innovation, continuous change, continuously growing complexity, and fragmentation. So it's become much more acute. And to your earlier point, we do live in a different world, and in the past couple of years we could probably just kind of brute force it, right? We could focus on the top line; there was enough investment to be had.
I think nowadays organizations are in a very different environment, where there's much more focus on cost control, productivity, efficiency: how do we truly get the value from that data? So again, I think it's just another incentive for organizations to now truly look at data and to scale with data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? You said it: the people and process. How do we do that at scale? And that's only becoming more important, and we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously, if you will, than they maybe have in the past. >> You know, I don't know when you guys founded Collibra if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008. So, in some sense, it was the last financial crisis that was really the start of Collibra, where we found product-market fit working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And kind of here we are again, in a very different environment of course, almost 15 years later, but with data only becoming more important. But our mission, to deliver trusted data for every user, every use case, and across every source, frankly, has only become more important.
So while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide that to everyone. That's why we call it Data Citizens: we truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we're still relatively early in that journey. >> Well, that's interesting, because, you know, in my observation it takes 7 to 10 years to actually build a company, and the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely. Again, there's a lot of tailwind: organizations are only maturing their data practices, and we've seen that transform or influence a lot of our business growth, with broader adoption of the platform. We work with some of the largest organizations in the world, like Adobe, Heineken, Bank of America, and many more. We now have over 600 enterprise customers, all industry leaders, in every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? Those new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. A great opportunity for us, our partners, and of course our customers, to help them transition to the cloud even faster. And so we see a lot of excitement and momentum there.
We did an acquisition about 18 months ago around data quality and data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there are a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again, to drive more automation. And a second is that those data pipelines that are now being created in the cloud, in these modern data architectures, have become mission critical. They've become real time. And so monitoring and observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in: federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations, and so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple of things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and his team. It's interesting, you know, the whole data quality discipline used to be this back-office function and really confined to highly regulated industries. It's come to the front office; it's top of mind for Chief Data Officers. Data mesh: you mentioned you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems and you're developing your own ecosystem. So let's chat a little bit about the products.
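To make "monitoring, observing those data pipelines continuously" a little more concrete, here is a minimal Python sketch of one common observability check: flagging a pipeline load whose row count drifts far from recent history. It is an illustration under stated assumptions, not Collibra's or OwlDQ's implementation; the function name `volume_alert`, the three-standard-deviation threshold, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def volume_alert(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` standard deviations from the mean of `history`.

    Hypothetical helper for illustration only; not a Collibra API.
    """
    if len(history) < 2:
        return False  # stdev needs at least two observations
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly flat history: any change is unusual
    return abs(latest - mu) / sigma > threshold


# Hypothetical row counts from the last five loads of one pipeline.
daily_row_counts = [1000, 1020, 980, 1010, 990]
```

With that history, a load of 1,005 rows passes quietly, while a load of 200 rows would be flagged. Real observability tools typically learn thresholds and seasonality automatically; the fixed z-score here just keeps the idea visible.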
We're going to go deeper into products later on at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's, you know, under the covers in security, sort of making data more accessible for people, or just dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited; a ton of innovation. And if we think about the big theme, like I said, we're still relatively early in this journey towards that mission of data intelligence, that really bold and compelling mission. A lot of customers are just starting on that journey. We want to make it as easy as possible for organizations to actually get started, because we know that's important. And for organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to expand the platform further, and again to make it easier to accomplish that mission and vision around the Data Citizen, that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation that we're driving: a lot of ease of adoption, ease of use, but also, how do we make sure that, as Collibra becomes this kind of mission-critical enterprise platform, from a security, performance, architecture, scale, and supportability perspective, we're truly able to deliver that kind of enterprise mission-critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, there's a lot of new innovation that we're really excited about. A couple of highlights. One is around the data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience?
So that anybody in the organization can, in a very easy, search-first way, find the right data product, find the right dataset, that they can then consume. Usage analytics: how do we help organizations drive adoption? Tell them where it's working really well and where they have opportunities. Homepages, again, to make things easy for anyone in your organization to get started with Collibra. You mentioned Workflow Designer: again, we have a very powerful enterprise platform, and one of our key differentiators is the ability to really drive a lot of automation through workflows. And now we've provided a new low-code, no-code kind of workflow designer experience, so customers can really take it to the next level. There's a lot more new product around Collibra Protect, which, in partnership with Snowflake, which has been a strategic investor in Collibra, is focused on how we make access governance easier. How are we able to make sure that as you move to the cloud, things like access management and masking of sensitive data, PII data, are managed in a much more effective way? Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily, quickly, and widely as we can? Moving that to the cloud has been a big part of our strategy. So we're launching our data quality cloud product, as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are delivering a capability that we call push down, so we're actually pushing down the compute for data quality monitoring into the underlying platform, which, again, from a scale, performance, and ease-of-use perspective, is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations: we talk about being able to connect to every source.
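The "push down" idea Felix describes, running the data quality computation inside the platform that holds the data instead of pulling the data out, can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not Collibra's implementation: the standard-library `sqlite3` module stands in for a cloud platform such as Snowflake or Databricks, and the table, column, and function names are hypothetical.

```python
import sqlite3


def null_rate_client_side(conn, table, column):
    """Pull every value over the connection, then count NULLs in Python."""
    # Identifiers are interpolated directly, so they must come from trusted configuration.
    values = [row[0] for row in conn.execute(f"SELECT {column} FROM {table}")]
    return sum(v is None for v in values) / len(values) if values else 0.0


def null_rate_pushed_down(conn, table, column):
    """Let the database compute the metric; only a single row crosses the connection."""
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    # SUM over zero rows returns NULL, so default it to 0.
    return (nulls or 0) / total if total else 0.0


# Hypothetical table with two missing email addresses out of four rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)
```

Both functions report the same null rate for `customers.email`; the difference is that the pushed-down version moves one result row instead of every value, which is what matters once tables reach warehouse scale.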
Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out; the team has been working really hard, and we are really, really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about, you know, the marketplace; you think about Data Mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, where just getting the technology to work has been so hard. So how do you see the future, and, you know, give us your closing thoughts, please? >> Yeah, absolutely. And I think we're really at a pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early; making it as easy as we can is kind of our mission. And so I'm really, really excited to see how the market is going to evolve over the next few quarters and years. I think the trend is clearly there. We talked about Data Mesh: this kind of federated approach, focused on data products, is just another signal, we believe, that a lot of organizations now understand the need to go beyond just the technology
to really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we are in, with much more focus on control, much more focus on productivity and efficiency; now is the time we need to look beyond just the technology and infrastructure to think about how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much. Good luck in San Diego. I know you're going to crush it out there. >> Thank you, Dave. >> Yeah, it's a great spot for an in-person event, and of course the content post-event is going to be available at collibra.com, and you can of course catch theCUBE coverage at theCUBE.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. (upbeat techno music)

Published Date : Nov 2 2022

SUMMARY :

and the premise that we put for having me again. in the data landscape of the 2020s, and to scale with data, and what are you doing to And kind of here we are again, still in the early days a lot of momentum in the org in the, And of course we see you at all the shows. is the ability to the technology to work and now is the time we need to look of data won't be like the and of course the content

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Heineken	ORGANIZATION	0.99+
Adobe	ORGANIZATION	0.99+
Felix Van de Maele	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Laura Sellers	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
2008	DATE	0.99+
Felix	PERSON	0.99+
San Diego	LOCATION	0.99+
Stan Christiaens	PERSON	0.99+
Dave	PERSON	0.99+
Bank of America	ORGANIZATION	0.99+
7	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
2020s	DATE	0.99+
last year	DATE	0.99+
2010s	DATE	0.99+
Data Breaks	ORGANIZATION	0.99+
Python	TITLE	0.99+
Last year	DATE	0.99+
12 months	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
one	QUANTITY	0.99+
Data Citizens	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
Owl DQ	ORGANIZATION	0.98+
10	DATE	0.98+
OwlDQ	ORGANIZATION	0.98+
Kirk Haslbeck	PERSON	0.98+
10 years	QUANTITY	0.98+
One	QUANTITY	0.98+
Spark	TITLE	0.98+
today	DATE	0.98+
first	QUANTITY	0.97+
Data Citizens	EVENT	0.97+
earlier this year	DATE	0.96+
Tensorflow	TITLE	0.96+
Data Citizens 22	ORGANIZATION	0.95+
both	QUANTITY	0.94+
theCUBE	ORGANIZATION	0.94+
15 years ago	DATE	0.93+
over 600 enterprise customers	QUANTITY	0.91+
past couple of years	DATE	0.91+
about 18 months ago	DATE	0.9+
collibra.com	OTHER	0.89+
Data citizens 2021	ORGANIZATION	0.88+
Data Citizens 2022	EVENT	0.86+
almost 15 years later	DATE	0.85+
West	LOCATION	0.85+
Azure	TITLE	0.84+
first way	QUANTITY	0.83+
Vice President	PERSON	0.83+
last couple of years	DATE	0.8+

Evolving InfluxDB into the Smart Data Platform Full Episode

>>This past May, theCUBE, in collaboration with InfluxData, shared with you the latest innovations in time series databases. We talked at length about why a purpose-built time series database for many use cases was a superior alternative to general-purpose databases trying to do the same thing. Now, you may remember that time series data is any data that's stamped in time, and if it's stamped, it can be analyzed historically. And when we introduced the concept to the community, we talked about how in theory, those time slices could be taken, you know, every hour, every minute, every second, you know, down to the millisecond, and how the world was moving toward real-time or near-real-time data analysis to support physical infrastructure like sensors and other devices and IoT equipment. Time series databases have had to evolve to efficiently support real-time data in emerging use cases in IoT and elsewhere. >>And to do that, new architectural innovations have to be brought to bear. As is often the case, open source software is the linchpin to those innovations. Hello and welcome to Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData and produced by theCUBE. My name is Dave Vellante and I'll be your host today. Now in this program we're going to dig pretty deep into what's happening with time series data generally, and specifically how InfluxDB is evolving to support new workloads and demands and data, specifically around data analytics use cases in real time. Now, first we're gonna hear from Brian Gilmore, who is the director of IoT and emerging technologies at InfluxData. And we're gonna talk about the continued evolution of InfluxDB and the new capabilities enabled by open source generally and specific tools. And in this program you're gonna hear a lot about things like Rust, the implementation of Apache Arrow, the use of Parquet, and tooling such as DataFusion, which is powering a new engine for InfluxDB.
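The idea of taking time slices "every hour, every minute, every second, down to the millisecond" can be shown with a short Python sketch that groups timestamped readings into fixed-width windows and averages each one. This is a generic illustration of windowed aggregation, not InfluxDB's engine or query API; the function name `time_slices` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_slices(points, width):
    """Average (timestamp, value) points into fixed-width slices aligned to the Unix epoch."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        # timedelta // timedelta yields an int: the index of the slice this point falls in.
        idx = (ts - EPOCH) // width
        sums[idx] += value
        counts[idx] += 1
    # Key each slice by its start time, in chronological order.
    return {EPOCH + idx * width: sums[idx] / counts[idx] for idx in sorted(sums)}


# Hypothetical sensor: one reading every 30 seconds for three minutes.
start = datetime(2022, 11, 2, 12, 0, tzinfo=timezone.utc)
readings = [(start + timedelta(seconds=s), float(s)) for s in range(0, 180, 30)]
per_minute = time_slices(readings, timedelta(minutes=1))  # 3 slices
per_millisecond = time_slices(readings, timedelta(milliseconds=1))  # 6 slices
```

Shrinking `width` from a minute to a millisecond is the "granularity" knob discussed in the program: finer slices preserve more detail but produce many more rows, which is why the storage and query engine underneath has to keep up.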
>>Now, these innovations evolve the idea of time series analysis by dramatically increasing the granularity of time series data, compressing the historical time slices, if you will, from, for example, minutes down to milliseconds. And at the same time, they enable real-time analytics with an architecture that can process data much faster and much more efficiently. Now, after Brian, we're gonna hear from Anais Dotis-Georgiou, who is a developer advocate at InfluxData. And we're gonna get into the why of these open source capabilities and how they contribute to the evolution of the InfluxDB platform. And then we're gonna close the program with Tim Yokum, he's the director of engineering at InfluxData, and he's gonna explain how the InfluxDB community actually evolved the data engine in mid-flight and which decisions went into the innovations that are coming to the market. Thank you for being here. We hope you enjoy the program. Let's get started. Okay, we're kicking things off with Brian Gilmore. He's the director of IoT and emerging technology at InfluxData. Brian, welcome to the program. Thanks for coming on. >>Thanks, Dave. Great to be here. I appreciate the time. >>Hey, explain why InfluxDB, you know, needs a new engine. Was there something wrong with the current engine? What's going on there? >>No, no, not at all. I mean, for us, it's been about staying ahead of the market. I think, you know, if we think about what our customers are coming to us with now, you know, related to requests like SQL query support, things like that, we have to figure out a way to execute those for them in a way that will scale long term. And then we also wanna make sure we're innovating, we're staying ahead of the market as well and anticipating those future needs. So, you know, this is really a transparent change for our customers.
I mean, I think we'll be adding new capabilities over time that leverage this new engine, but, you know, initially the customers who are using us are gonna see just great improvements in performance, you know, especially those that are working at the top end of the workload scale, you know, the massive data volumes and things like that. >>Yeah, and we're gonna get into that today, the architecture and the like, but what was the catalyst for the enhancements? I mean, when and how did this all come about? >>Well, I mean, like three years ago we were primarily on premises, right? I mean, we had our open source, we had an enterprise product, you know, and shifting that technology, especially the open source code base, to a service basis where we were hosting it through, you know, multiple cloud providers, that was a long journey. I guess, you know, phase one was, you know, we wanted to host enterprise for our customers, so we created a service where we just managed and ran our enterprise product for them. You know, phase two of this cloud effort was to optimize for multi-tenant, multi-cloud, to be able to host it in a truly SaaS manner where we could use, you know, some type of customer activity or consumption as the pricing vector. And that was sort of the birth of the first real InfluxDB Cloud, you know, which has been really successful.
>>We've seen, I think, like 60,000 people sign up, and we've got tons and tons of both enterprises as well as new companies, developers, and of course a lot of home hobbyists and enthusiasts who are using it on a daily basis. And having that big pool of very diverse customers to chat with as they're using the product, as they're giving us feedback, et cetera, has, you know, pointed us in a really good direction in terms of making sure we're continuously improving, and then also making these big leaps, as we're doing with this new engine. >>Right. So you've called it a transparent change for customers, so I'm presuming it's non-disruptive, but I really wanna understand how much of a pivot this is and what it takes to make that shift from, you know, time series specialist to real-time analytics and being able to support both? >>Yeah, I mean, it's much more of an evolution, I think, than a shift or a pivot. You know, time series data is always gonna be fundamental and the basis of the solutions that we offer our customers, and then also the ones that they're building on the raw APIs of our platform themselves. You know, the time series market is one that we've worked diligently to lead. I mean, I think when it comes to metrics, especially sensor data and app and infrastructure metrics... If we're being honest, though, I think our user base is well aware that the way we were architected was much more towards those backwards-looking, historical-type analytics, which are key for troubleshooting and making sure you don't, you know, run into the same problem twice.
But, you know, we had to ask ourselves, what can we do to better handle those queries from a performance and, you know, a time-to-response perspective, and can we get that to the point where the result sets are coming back so quickly from the time of query that we can limit that window down to minutes and then seconds? >>And now with this new engine, we're really starting to talk about a query window that could be returning results in, you know, milliseconds of time since the data hit the ingest queue. And that's really getting to the point where, as your data is available, you can use it and you can query it, you can visualize it, and you can do all those sort of magical things with it, you know? And getting all of that to a place where we're saying yes to the customer on, you know, all of the real-time queries, the multiple language query support... it was hard, but we're now at a spot where we can start introducing that to, you know, a limited number of customers, strategic customers, and strategic availability zones to start. But, you know, everybody over time. >>So you're basically going from what happened (you can still do that, obviously) to what's happening now, in the moment? >>Yeah, yeah. I mean, if you think about time, it's always sort of past, right? I mean, like in the moment right now, whether you're talking about a millisecond ago or a minute ago, you know, that's pretty much right now, I think, for most people, especially in these use cases where you have other components of latency induced by the underlying data collection, the architecture, the infrastructure, you know, the devices, and the highly distributed nature of all of this. So yeah, I mean, getting a customer or a user to be able to use the data as soon as it is available is what we're after here.
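Being able to "use the data as soon as it is available" means a point has to be queryable the moment it is ingested, not after a batch job runs. The Python sketch below shows that idea with a tiny in-memory sliding window. It is a hypothetical illustration, not how InfluxDB's new engine is built; the class name `RecentWindow`, the one-second span, and the assumption that points arrive in time order are all made up for the example.

```python
from collections import deque


class RecentWindow:
    """Hypothetical in-memory buffer that answers queries on points as soon as they are ingested."""

    def __init__(self, span_ms):
        self.span_ms = span_ms
        self.points = deque()  # (timestamp_ms, value); assumes points arrive in time order

    def ingest(self, ts_ms, value):
        self.points.append((ts_ms, value))
        # Drop points that fell out of the window relative to the newest timestamp.
        while self.points and self.points[0][0] < ts_ms - self.span_ms:
            self.points.popleft()

    def query_max(self):
        """Max value across the current window, or None if nothing has been ingested."""
        return max(v for _, v in self.points) if self.points else None


# Keep the last second of readings from a hypothetical sensor.
window = RecentWindow(span_ms=1000)
window.ingest(0, 5.0)
window.ingest(500, 9.0)
```

A query issued right after the second `ingest` call already sees both readings. The trade-off is that everything older than the window is gone from this buffer, which is why a real system pairs a hot, in-memory tier like this with durable storage for the historical queries Brian mentions.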
>>I always thought of real time as before you lose the customer, but now in this context, maybe it's before the machine blows up. >>Yeah, I mean, operational real time is different, you know, and that's one of the things that really told us we were heading in the right direction: just how many operational customers we have. Everything from aerospace and defense, where we've got companies monitoring satellites, to tons of industrial users using us as a process historian on the plant floor. If we can satisfy their demands for a real-time historical perspective, that's awesome. I think what we're gonna do here is start to edge into the real time that they're used to, in terms of the millisecond response times that they expect of their control systems, certainly not of their historians and databases. >>Are these innovations available to InfluxDB Cloud customers only? Who can access this capability? >>Yeah. I mean, commercially and today, yes. I think we want to emphasize that that's for now; our goal is to get our latest and greatest and our best to everybody over time, of course. One of the things we had to do here was double down on our commitment to open source and availability. So anybody today can take a look at the libraries on our GitHub, inspect them, and even try to implement or execute some of it themselves in their own infrastructure. We're committed to bringing our latest and greatest to our cloud customers first for a couple of reasons. Number one, they have big workloads and they have high expectations of us.
I think number two, it also gives us the opportunity to monitor a little bit more closely how it's working, how they're using it, and how the system itself is performing. >>And so being careful, maybe a little cautious, in terms of how big we go with this right away both limits the risk of any issues that can come with new software rollouts (we haven't seen anything so far) and gives us the opportunity to have meaningful conversations with a small group of users who are using the products. But once we get through that and they give us two thumbs up on it, it'll be like, open the gates and let everybody in. It's gonna be an exciting time for the whole ecosystem. >>Yeah, that makes a lot of sense. And you can do some experimentation using the cloud resources. Let's dig into some of the architectural and technical innovations that are gonna help deliver on this vision. What should we know there? >>Well, I think foundationally we built the new core on Rust. This is a relatively new, very popular systems language. It's extremely efficient, and it's built for speed and memory safety, which goes back to us being able to deliver it in a way that we can inspect very closely, but then also rely on it to behave well, even if it does find error conditions. We've loved working with Go, and a lot of our libraries will continue to be implemented in Go, but when it came to this particular new engine, Rust was critical for that power, performance, and stability. On top of that, we've also integrated Apache Arrow, and Apache Parquet for persistence.
I think for anybody who's really familiar with the nuts and bolts of our backend, our TSI, and our time-structured merge trees, this is a big break from that: Arrow on the in-memory side and Parquet on the on-disk side. >>It allows us to present a unified set of APIs for those really fast real-time queries that we talked about, as well as for very large historical bulk data archives in the Parquet format, which is also cool because there's an entire ecosystem popping up around Parquet in the machine learning community. To get that all to work, we had to glue it together with Arrow Flight. That's what we're using as our RPC component. It handles the orchestration and the transportation of the columnar data. We're moving to a true columnar database model for this version of the engine, and it removes a lot of overhead for us in terms of having to manage all that serialization and deserialization and, again, blurs that line between real-time and historical data. It's highly optimized for micro-batches and batches, but for true streaming as well. >>Yeah. It's funny you mentioned Rust. It's been around for a long time, but its popularity is really starting to hit that steep part of the S-curve, and we're gonna dig into more of that. But is there anything else that we should know about, Brian? Give us the last word. >>Well, I think first I'd like everybody watching to take a look at what we're offering in terms of early access and beta programs. If you wanna participate, or if you wanna work in terms of early access with the new engine, please reach out to the team.
I'm sure there's a lot of communications going out, and it'll be highly featured on our website, but reach out to the team. Believe it or not, we have a lot more going on than just the new engine. There are also other programs and things we're offering to customers in terms of the user interface, data collection, and things like that. And if you're a customer of ours and you have a sales team, a commercial team that you work with, you can reach out to them and see what you can get access to, because we can flip a lot of stuff on, especially in cloud, through feature flags. >>But if there's something new that you wanna try out, we'd just love to hear from you. Our goal would be that, as we give you access to all of these new cool features, you would give us continuous feedback on these products and services, not only on what you need today, but on what you'll need tomorrow to build the next versions of your business. Because the whole database ecosystem, as it expands out into this vertically oriented stack of cloud services and enterprise databases and edge databases, is gonna be what we all make it together, not just those of us who are employed by InfluxData. And then finally, I would just say please watch Anais's and Tim's sessions. These are two of our best and brightest. They're totally brilliant, completely pragmatic, and most of all customer obsessed, which is amazing. Honestly, there are no better takes on the technical details of this than theirs, especially when it comes to the value that these investments will bring to our customers and our communities. So I encourage you to pay more attention to them than you did to me, for sure. >>Brian Gilmore, great stuff. Really appreciate your time. Thank you. >>Yeah, thanks, Dave. It was awesome.
Look forward to it. >>Yeah, me too. Looking forward to seeing how the community actually applies these new innovations and goes beyond just the historical into real time, a really hot area, as Brian said. In a moment, I'll be right back with Anais Dotis-Georgiou to dig into the critical aspects of key open source components of the InfluxDB engine, including Rust, Arrow, Parquet, and DataFusion. Keep it right there. You don't wanna miss this. >>Time series data is everywhere. The number of sensors, systems, and applications generating time series data increases every day. All these data sources producing so much data can cause analysis paralysis. InfluxDB is an entire platform designed with everything you need to quickly build applications that generate value from time series data. InfluxDB Cloud is a serverless solution, which means you don't need to buy or manage your own servers. There's no need to worry about provisioning, because you only pay for what you use. InfluxDB Cloud is fully managed, so you get the newest features and enhancements as they're added to the platform's code base. It also means you can spend time building solutions and delivering value to your users instead of wasting time and effort managing something else. InfluxDB Cloud offers a range of security features to protect your data. Multiple layers of redundancy ensure you don't lose any data. Access controls ensure that only the people who should see your data can see it. >>And encryption protects your data at rest and in transit between any of our regions or cloud providers. InfluxDB uses a single API across the entire platform suite, so you can build on open source, deploy to the cloud, and then easily query data in the cloud, at the edge, or on prem using the same scripts. And InfluxDB is schemaless, automatically adjusting to changes in the shape of your data without requiring changes in your application logic. InfluxDB Cloud is production ready from day one.
All it needs is your data and your imagination. Get started today at influxdata.com/cloud. >>Okay, we're back. I'm Dave Vellante with theCUBE, and you're watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData. Anais Dotis-Georgiou is here. She's a developer advocate for InfluxData, and we're gonna dig into the rationale and value contribution behind several open source technologies that InfluxDB is leveraging to increase the granularity of time series analysis and bring the world of data into real-time analytics. Anais, welcome to the program. Thanks for coming on. >>Hi, thank you so much. It's a pleasure to be here. >>Oh, you're very welcome. Okay, so IOx is being touted as this next-gen open source core for InfluxDB. My understanding is that it leverages in-memory, of course, for speed. It's a columnar store, so it gives you compression efficiency and faster query speeds, and you store files in object storage, so you've got a very cost-effective approach. Are these the salient points of the platform? I know there are probably dozens of other features, but what are the high-level value points that people should understand? >>Sure, that's a great question. So there are some main requirements that IOx is trying to achieve, and some of the most impressive ones to me are these. The first one is that it aims to have no limits on cardinality and also to allow you to write any kind of event data that you want, whether that's a tag or a field. It also wants to deliver best-in-class performance on analytics queries, in addition to our already well-served metrics queries. We also wanna have operator control over memory usage, so you should be able to define how much memory is used for buffering, caching, and query processing. Some other really important parts are the ability to have bulk data export and import, which is super useful.
Also broader ecosystem compatibility: where possible, we aim to use and embrace emerging standards in the data analytics ecosystem and have compatibility with things like SQL, Python, and maybe even pandas in the future. >>Okay, so a lot there. Now, we talked to Brian about how you're using Rust, which is not a new programming language, and of course we had some drama around Rust during the pandemic with the Mozilla layoffs, but the formation of the Rust Foundation really addressed any of those concerns. You've got big guns like Amazon and Google and Microsoft throwing their collective weight behind it, and the adoption is really starting to get steep on the S-curve. So lots of platforms, lots of adoption with Rust, but why Rust as an alternative to, say, C++, for example? >>Sure, that's a great question. So Rust was chosen because of its exceptional performance and reliability. Rust is syntactically similar to C++, it has similar performance, and it also compiles to native code like C++. But unlike C++, it has much better memory safety. Memory safety is protection against bugs or security vulnerabilities caused by invalid memory access, and Rust achieves this memory safety through its innovative type system. Additionally, it doesn't allow for dangling pointers, and dangling pointers are one of the main classes of errors that lead to exploitable security vulnerabilities in languages like C++. So Rust helps meet that requirement of having no limits on cardinality, for example, because we're also using the Rust implementation of Apache Arrow, and because of this control over memory. Also, Rust's packaging system, crates.io, offers everything that you need out of the box to have features like async and await, protection against race conditions and buffer overflows, and thread-safe async caching structures as well.
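Cardinality comes up throughout this segment. As a rough illustration only, the Python sketch below counts series cardinality for a few points. It assumes that a series is identified by its measurement, tag set, and field key, which is how InfluxDB documentation defines a series. The `Point` type, the `make_point` helper, and the sample data are hypothetical. This is not how IOx tracks series internally.

```python
# Minimal sketch: counting series cardinality for a batch of points.
# Assumption: a series = measurement + tag set + field key. Point,
# make_point, and the sample data are hypothetical illustrations.
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    measurement: str
    tags: tuple  # sorted (key, value) pairs, so equal tag sets compare equal
    field_key: str
    value: float
    timestamp: int


def make_point(measurement, tags, field_key, value, timestamp):
    """Build a Point with its tag dict normalized into a sorted tuple."""
    return Point(measurement, tuple(sorted(tags.items())), field_key, value, timestamp)


def series_cardinality(points):
    """Count distinct (measurement, tag set, field key) combinations."""
    return len({(p.measurement, p.tags, p.field_key) for p in points})


points = [
    make_point("temperature", {"house": "a", "room": "kitchen"}, "celsius", 21.0, 1),
    make_point("temperature", {"house": "a", "room": "kitchen"}, "celsius", 21.0, 2),
    make_point("temperature", {"house": "a", "room": "stove"}, "celsius", 180.0, 1),
    make_point("temperature", {"room": "kitchen", "house": "b"}, "celsius", 20.5, 1),
]
print(series_cardinality(points))  # 3: the first two points share one series

# Tagging every point with a unique ID turns each point into its own series.
noisy = [
    make_point("temperature", {"house": "a", "request_id": str(i)}, "celsius", 20.0, i)
    for i in range(1000)
]
print(series_cardinality(noisy))  # 1000
```

The second count shows the classic way cardinality explodes. When a tag carries a unique value per point, as the hypothetical `request_id` does here, every point becomes its own series.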
So essentially, it has all the fine-grained control you need to take advantage of memory and all your resources as well as possible, so that you can handle those really, really high-cardinality use cases. >>Yeah, and the more I learn about the new engine and the platform, IOx, et cetera, you see things like, you know, in the old days, and even today, you do a lot of garbage collection in these systems, and that has an inverse impact on performance. So it looks like the community is really modernizing the platform. But I wanna talk about Apache Arrow for a moment. It's designed to address the constraints that are associated with analyzing large data sets. We know that, but please explain: what is Arrow, and what does it bring to InfluxDB? >>Sure, yeah. So Arrow is a framework for defining in-memory columnar data, and much of the efficiency and performance of IOx comes from taking advantage of columnar data structures. If you don't mind, I'll take a moment to illustrate why columnar data structures are so valuable. Let's pretend that we are gathering field data about the temperature in our room and also maybe the temperature of our stove. In our table we have those two temperature values, as well as maybe a measurement value, a timestamp value, and maybe some other tag values that describe what room and what house, et cetera, we're getting this data from. So you can picture this table where we have two rows with the two temperature values for both our room and the stove. Well, usually our room temperature is regulated, so those values don't change very often. >>So with column-oriented storage, you essentially take each column and group its values together.
And if that's the case, and you're just taking temperature values from the room and a lot of those temperature values are the same, then you might be able to imagine how equal values will neighbor each other in the storage format, and this provides a really perfect opportunity for cheap compression. This cheap compression enables high-cardinality use cases. It also enables faster scan rates. So if you wanna find the min and max value of the temperature in the room across a thousand different points, you only have to get those thousand points in order to answer that question, and you have them immediately available to you. But let's contrast this with a row-oriented storage solution instead, so that we can better understand the benefits of column-oriented storage. >>So if you had row-oriented storage, you'd first have to look at every field, like the temperature in the room and the temperature of the stove. You'd have to go across every tag value that maybe describes where the room is located or what model the stove is, and every timestamp. You'd then have to pluck out that one temperature value that you want at that one timestamp, and do that for every single row. So you're scanning across a ton more data, and that's why row-oriented storage doesn't provide the same efficiency as columnar. Apache Arrow is an in-memory columnar data framework, so that's where a lot of the advantages come from. >>Okay. So you basically described a traditional database, a row approach, but I've seen a lot of traditional databases say, okay, now we can handle columnar format, versus what you're talking about, which is really native. Is it not as effective? Is the format not as effective because it's largely a bolt-on? Can you elucidate on that front?
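Anais's room-and-stove example can be sketched in a few lines of Python. This is a toy model, not Arrow or IOx code, and the sample values are hypothetical. Run-length encoding stands in here as one simple example of the cheap compression that neighboring equal values allow. Real columnar formats use a range of encodings.

```python
# Toy sketch of row- vs column-oriented layouts for the room/stove example.
# Sample data is hypothetical; run-length encoding is one illustrative encoding.

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"time": t, "house": "a", "room_temp": 21.0 if t < 8 else 21.5, "stove_temp": 20.0 + 15 * t}
    for t in range(10)
]

# Column-oriented: each column's values are grouped contiguously.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal neighboring values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


# The regulated room temperature barely changes, so its column compresses well...
print(run_length_encode(columns["room_temp"]))  # [[21.0, 8], [21.5, 2]]
# ...while the stove temperature changes at every point and barely compresses.
print(len(run_length_encode(columns["stove_temp"])))  # 10

# Column store: min/max reads only the one column the question needs.
print(min(columns["room_temp"]), max(columns["room_temp"]))  # 21.0 21.5

# Row store: walk every record and pluck out the one field we want.
room_values = [row["room_temp"] for row in rows]
print(min(room_values), max(room_values))  # 21.0 21.5
```

In pure Python, both min/max paths walk a list, so the difference is only conceptual here. The real gains come from contiguous column buffers and vectorized scans in formats like Arrow.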
>>Yeah, it's not as effective because you have more expensive compression and because you can't scan across the values as quickly. So those are pretty much the main reasons why row-oriented storage isn't as efficient as column-oriented storage. >>Got it. So let's talk about Arrow DataFusion. What is DataFusion? I know it's written in Rust, but what does it bring to the table here? >>Sure. So it's an extensible query execution framework, and it uses Arrow as its in-memory format. The way that it helps in InfluxDB IOx is this: it's great if you can write an unlimited amount of cardinality into InfluxDB, but if you don't have a query engine that can successfully query that data, then I don't know how much value it is for you. So DataFusion helps enable the query processing and transformation of that data. It also has a pandas API, so that you could take advantage of pandas DataFrames as well and all of the machine learning tools associated with pandas. >>Okay. You're also leveraging Parquet in the platform. We heard a lot about Parquet in the middle of the last decade as a storage format to improve on Hadoop column stores. What are you doing with Parquet, and why is it important? >>Sure. So Parquet is the column-oriented durable file format. It's important because it'll enable bulk import and bulk export, and it has compatibility with Python and pandas, so it supports a broader ecosystem. Parquet files also take very little disk space, and they're faster to scan because, again, they're column oriented. In particular, I think Parquet files are like 16 times cheaper than CSV files, just as a point of reference. So that's essentially a lot of the benefits of Parquet. >>Got it. Very popular. So, Anais, what exactly is InfluxData focusing on as a committer to these projects? What is your focus? What's the value that you're bringing to the community? >>Sure.
So InfluxDB, first, has contributed a lot of different things to the Apache ecosystem. For example, they contributed an implementation of Apache Arrow in Go that will support querying with Flux. There have also been quite a few contributions to DataFusion, for things like memory optimization and memory control, and support for additional SQL features like timestamp arithmetic and EXISTS clauses. So yeah, Influx has contributed a lot to the Apache ecosystem and continues to do so. I think the long-term strategy here is that the more you contribute to these upstream projects and build them up, the more you will perpetuate that cycle of improvement, and the more we will invest in our own project as well. So it's that kind of symbiotic relationship and appreciation of the open source community. >>Yeah. Got it. You've got that virtuous cycle going, what people call the flywheel. Give us your last thoughts and kind of summarize what the big takeaways are from your perspective. >>So I think the big takeaway is that InfluxData is doing a lot of really exciting things with InfluxDB IOx. If you are interested in learning more about the technologies that Influx is leveraging to produce IOx, the challenges associated with it, and all of the hard work, or you just have questions and want to learn more, then I would encourage you to go to the monthly tech talks and community office hours, which are on every second Wednesday of the month at 8:30 AM Pacific time. There are also community forums and a community Slack channel. Look for the influxdb_iox channel specifically to learn more about how to join those office hours and monthly tech talks, and to ask any questions you have about IOx, what to expect, and what you'd like to learn more about. As a developer advocate, I wanna answer your questions.
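DataFusion is described in this conversation as an extensible query execution framework that uses Arrow as its in-memory format. The hypothetical pure-Python sketch below imitates only the general shape of such an engine: scan only the projected columns from record batches, filter, then aggregate. None of these names are DataFusion APIs, and plain lists stand in for Arrow arrays.

```python
# Hypothetical sketch of a tiny columnar query pipeline (not DataFusion's API):
# scan projected columns from record batches -> filter rows -> group and average.
from typing import Callable, Dict, Iterable, Iterator, List

Batch = Dict[str, List]  # column name -> column values, i.e. one "record batch"


def scan(batches: Iterable[Batch], projection: List[str]) -> Iterator[Batch]:
    """Yield only the requested columns from each batch (projection pushdown)."""
    for batch in batches:
        yield {name: batch[name] for name in projection}


def filter_rows(batches: Iterable[Batch], column: str, predicate: Callable) -> Iterator[Batch]:
    """Keep the rows whose value in `column` satisfies `predicate`."""
    for batch in batches:
        keep = [i for i, v in enumerate(batch[column]) if predicate(v)]
        yield {name: [values[i] for i in keep] for name, values in batch.items()}


def avg_by(batches: Iterable[Batch], key: str, value: str) -> Dict:
    """Group by `key` and average `value` across all batches."""
    sums: Dict = {}
    counts: Dict = {}
    for batch in batches:
        for k, v in zip(batch[key], batch[value]):
            sums[k] = sums.get(k, 0.0) + v
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


batches = [
    {"time": [1, 2, 3], "room": ["kitchen", "stove", "kitchen"],
     "temp": [21.0, 180.0, 22.0], "house": ["a"] * 3},
    {"time": [4, 5], "room": ["stove", "kitchen"],
     "temp": [200.0, 23.0], "house": ["a"] * 2},
]

# Roughly: SELECT room, AVG(temp) FROM batches WHERE time >= 2 GROUP BY room
# (the unused "house" column is never read past the scan stage)
result = avg_by(
    filter_rows(scan(batches, ["time", "room", "temp"]), "time", lambda t: t >= 2),
    "room",
    "temp",
)
print(result)  # {'stove': 190.0, 'kitchen': 22.5}
```

Each stage is a generator, so batches stream through the pipeline one at a time instead of being materialized all at once. This is loosely analogous to how streaming query engines process record batches.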
So if there's a particular technology or stack that you wanna dive deeper into and want more explanation about how InfluxDB leverages it to build IOx, I will be really excited to produce content on that topic for you. >>Yeah, that's awesome. You guys have a really rich community where people can collaborate with their peers and solve problems, and you guys are super responsive, so I really appreciate that. All right, thank you so much, Anais, for explaining all this open source stuff to the audience and why it's important to the future of data. >>Thank you. I really appreciate it. >>All right, you're very welcome. Okay, stay right there, and in a moment I'll be back with Tim Yocum. He's the director of engineering for InfluxData, and we're gonna talk about how you update a SaaS engine while the plane is flying at 30,000 feet. You don't wanna miss this. >>I'm really glad that we went with InfluxDB Cloud for our hosting, because it has saved us a ton of time. It's helped us move faster, and it's saved us money. And also, InfluxDB has good support. My name's Alex Nauda. I am CTO at Nobl9. Nobl9 is a platform to measure and manage service level objectives, which are a great way of measuring the reliability of your systems. You can essentially think of an SLO, the product we're providing to our customers, as a bunch of time series. So we need a way to store that data and the corresponding time series that are related to it. The main reason that we settled on InfluxDB as we were shopping around is that InfluxDB has a very flexible query language, and as a general-purpose time series database, it basically had the set of features we were looking for. >>As our platform has grown, we've found InfluxDB Cloud to be a really scalable solution. We can quickly iterate on new features and functionality, and because InfluxDB Cloud is entirely managed, it has probably saved us at least one full additional person on our team.
We also have the option of running InfluxDB Enterprise, which gives us the ability to even host off the cloud, or in a private cloud if that's preferred by a customer. InfluxData has been really flexible in adapting to the hosting requirements that we have. They listened to the challenges we were facing, and they helped us solve them. As we've continued to grow, I'm really happy we have InfluxData by our side. >>Okay, we're back with Tim Yocum, who is the director of engineering at InfluxData. Tim, welcome. Good to see you. >>Good to see you. Thanks for having me. >>You're really welcome. Listen, we've been covering open source software on theCUBE for more than a decade, and we've watched the innovation from the big data ecosystem. The cloud has been built out on open source, as have mobile and social platforms and key databases, and of course InfluxDB and InfluxData have been big consumers of and contributors to open source software. So my question to you is, where have you seen the biggest bang for the buck from open source software? >>So yeah, you know, at Influx we really thrive at the intersection of commercial services and open source software. OSS keeps us on the cutting edge. We benefit from OSS in delivering our own service, from our core storage engine technologies to web services and templating engines. Our team stays lean and focused because we build on proven tools. We really build on the shoulders of giants, and, like you've mentioned, even better, we contribute a lot back to the projects that we use, as well as to our own product, InfluxDB. >>You know, I gotta ask you, Tim, because one of the challenges that we've seen, and you saw this in the heyday of Hadoop, is that the innovations come so fast and furious. As a software company you gotta place bets and commit people, and sometimes those bets can be risky and not pay off. Well, how have you managed this challenge? >>Oh, it moves fast.
Yeah, that's a benefit, though, because the community moves so quickly that today's hot technology can be tomorrow's dinosaur. What we tend to do is fail fast and fail often. We try a lot of things. Look at Kubernetes, for example. That ecosystem is driven by thousands of intelligent developers, engineers, and builders who are adding value every day, so we have to really keep up with that. As the stack changes, we try different technologies and different methods, and at the end of the day we come up with a better platform as a result of the constant change in the environment. It is a challenge for us, but it's something that we just do every day. >>So we have a survey partner down in New York City called Enterprise Technology Research (ETR). They do these quarterly surveys of about 1,500 CIOs and IT practitioners, and they really have a good pulse on what's happening with spending. The data shows that containers generally, and specifically Kubernetes, is one of the areas that has been off the charts and seen the most significant adoption and velocity, particularly along with cloud. Kubernetes is still up and to the right, consistently, even with the macro headwinds and all of the stuff that we're sick of talking about. So what are you doing with Kubernetes in the platform? >>Yeah, it's really central to our ability to run the product. When we first started out, we were just on AWS, and the way we were running was a little bit like containers junior. Now we're running Kubernetes everywhere: AWS, Azure, and Google Cloud.
It allows us to have a consistent experience across three different cloud providers, and we can manage that in code, so our developers can focus on delivering services instead of learning the intricacies of Amazon, Azure, and Google and figuring out how to deliver services on those three clouds with all of their differences. >>Just to follow up on that: it sounds like there's a PaaS layer there that allows you guys to have a consistent experience across clouds and out to the edge, wherever. Is that correct? >>Yeah, so we've basically built, more or less, platform engineering. This is the new hot phrase. Kubernetes has made a lot of things easy for us because we've built a platform that our developers can lean on, and they only have to learn one way of deploying and managing their application. That just gets all of the underlying infrastructure out of the way and lets them focus on delivering InfluxDB Cloud. >>Yeah, and I know I'm taking a little bit of a tangent, but that, I'll call it a PaaS layer if I can use that term: are there specific attributes to InfluxDB, or is it just generally off-the-shelf PaaS? Is there any purpose-built capability there that is value-add, or is it pretty much generic? >>So we look at things through a build-versus-buy lens. For some things we want to leverage cloud provider services, for instance Postgres databases for metadata. Perhaps we'll get that off of our plate and let someone else run it. We're going to deploy a platform that our engineers can deliver on, that has consistency, that is all generated from code, and that we, as an SRE group, as an ops team, can manage with very few people, really. And we can stamp out clusters across multiple regions in no time.
>>So sometimes you build, sometimes you buy. How do you make those decisions, and what does that mean for the platform and for customers? >>Yeah, so like everybody else, we're looking for trade-offs that make sense. We really want to protect our customers' data, so we look for services that support our own software with the most uptime, reliability, and durability we can get. Some things are just going to be easier to have a cloud provider take care of on our behalf. We make that transparent for our own team, and of course customers don't even see it. But we don't want to try to reinvent the wheel. Like I had mentioned with SQL data stores for metadata, let's build on top of what these three large cloud providers have already perfected. Then we can focus on our platform engineering, and our developers can focus on the InfluxData software, the InfluxDB Cloud software. >>So take it to the customer level. What does it mean for them? What's the value that they're gonna get out of all these innovations that we've been talking about today, and what can they expect in the future? >>So first of all, people who use the OSS product are really gonna be at home on our cloud platform. You can run it on your desktop machine, on a single server, what have you, but then you want to scale up. We have some 270 terabytes of data across over 4 billion series keys that people have stored, so there's a proven ability to scale. Now, in terms of the open source software and how we've developed the platform, you're getting a highly available, high-cardinality time series platform. We manage it, and really, as I mentioned earlier, we can keep up with the state of the art. We keep reinventing, and we keep deploying things in real time. We deploy to our platform every day, repeatedly, all the time.
And it's that continuous deployment that allows us to continue testing things in flight and rolling out new features, better ways of doing deployments, and safer ways of doing deployments. >>All of that happens behind the scenes. And like we had mentioned earlier, Kubernetes allows us to get that done. We couldn't do it without having that platform as a base layer for us to then put our software on. So we iterate quickly. When you're on the InfluxDB Cloud platform, you really are able to take advantage of new features immediately. We roll things out every day, and as those things go into production, you have the ability to use them. In the end, we want you to focus on getting actual insights from your data instead of running infrastructure. Let us do that for you. So, >>And that makes sense. But are the innovations that we're talking about in the evolution of InfluxDB a natural evolution for existing customers, or are they opening up new territory for customers? I'm sure the answer is both. Can you add some color to that? >>Yeah, it really is a little bit of both. Any engineer will say, well, it depends. Cloud-native technologies are really the hot thing. In IoT, industrial IoT especially, people want to just shove tons of data out there and be able to do queries immediately, and they don't wanna manage infrastructure. What we've started to see are people that use the cloud service as their data store backbone, and then use edge computing with our OSS product to ingest data from, say, multiple production lines and downsample that data, then send the rest of that data off to InfluxDB Cloud, where the heavy processing takes place.
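Tim describes customers who use the OSS product at the edge to ingest and downsample data from production lines before sending it on to InfluxDB Cloud. The plain-Python sketch below shows the basic idea of fixed-window mean downsampling. The `downsample` function, the 60-second window, and the synthetic one-reading-per-second sensor data are all hypothetical. A real deployment would more likely rely on the database's own query and task features than on hand-written code like this.

```python
# Hypothetical sketch of edge-side downsampling: bucket raw readings into
# fixed time windows and keep one mean per window before forwarding.
from collections import defaultdict
from statistics import mean


def downsample(readings, window_seconds):
    """Group (timestamp, value) readings into windows; return (window_start, mean) pairs."""
    buckets = defaultdict(list)
    for ts, value in readings:
        window_start = ts - ts % window_seconds  # align to the window boundary
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# One reading per second for two minutes from a hypothetical production-line sensor.
raw = [(ts, 20.0 + (ts % 10)) for ts in range(0, 120)]
summary = downsample(raw, window_seconds=60)
print(len(raw), "->", len(summary))  # 120 -> 2
print(summary)  # [(0, 24.5), (60, 24.5)]
```

The trade-off is resolution for volume. Only the per-window summaries have to cross the network, while the raw points can stay at the edge for as long as they are needed there.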
So really, us being in all the different clouds and iterating on that, and being in all sorts of different regions, allows for people to really get out of the business of trying to manage that big data, and have us take care of that. And of course, as we change the platform, end users benefit from that immediately. >>And so obviously, taking away a lot of the heavy lifting for the infrastructure, would you say the same thing about security, especially as you go out to IoT and the edge? How should we be thinking about the value that you bring from a security perspective? >>Yeah, we take security super seriously. It's built into our DNA. We do a lot of work to ensure that our platform is secure, that the data we store is kept private. It's of course always a concern. You see in the news all the time companies being compromised. You know, that's something that you can have an entire team working on, which we do, to make sure that the data that you have, whether it's in transit, whether it's at rest, is always kept secure, is only viewable by you. You know, you look at things like a software bill of materials. If you're running this yourself, you have to go vet all sorts of different pieces of software. And we do that, you know, as we use new tools. That's just part of our jobs, to make sure that the platform that we're running has fully vetted software. And with open source especially, that's a lot of work. And so it's definitely new territory. Supply chain attacks are definitely happening at a higher clip than they used to, but that is really just part of a day in the life for folks like us that are building platforms. >>Yeah, and that's key. I mean, especially when you start getting into, you know, we talk about IoT and the operations technologies, the engineers running that infrastructure, you know, historically, as you know, Tim, they would air gap everything.
That's how they kept it safe. But that's not feasible anymore. Everything's >>That >>connected now, right? And so you've gotta have a partner that can, again, take away that heavy lifting to R&D so you can focus on some of the other activities. All right. Give us the last word and the key takeaways from your perspective. >>Well, you know, from my perspective, I see it as a two-lane approach. With Influx, with any time series data, you know, you've got a lot of stuff that you're gonna run on-prem, what you had mentioned, air gapping. Sure, there's plenty of need for that, but at the end of the day, people that don't want to run big data centers, people that want to entrust their data to a company that's got a full platform set up for them that they can build on, send that data over to the cloud. The cloud is not going away. I think a more hybrid approach is where the future lives, and that's what we're prepared for. >>Tim, really appreciate you coming to the program. Great stuff. Good to see you. >>Thanks very much. Appreciate it. >>Okay, in a moment I'll be back to wrap up today's session. You're watching The Cube. >>Are you looking for some help getting started with InfluxDB, Telegraf, or Flux? Check out InfluxDB University, where you can find our entire catalog of free training that will help you make the most of your time series data. Get started for free at influxdbu.com. We'll see you in class. >>Okay, so we heard today from three experts on time series and data how the InfluxDB platform is evolving to support new ways of analyzing large data sets very efficiently and effectively in real time. And we learned that key open source components like Apache Arrow, the Rust programming environment, DataFusion, and Parquet are being leveraged to support real-time data analytics at scale.
We also learned about the contributions and importance of open source software and how the InfluxDB community is evolving the platform with minimal disruption to support new workloads, new use cases, and the future of real-time data analytics. Now remember, these sessions are all available on demand. You can go to thecube.net to find those. Don't forget to check out siliconangle.com for all the news related to enterprise and emerging tech. And you should also check out influxdata.com. There you can learn about the company's products. You'll find developer resources like free courses. You can join the developer community and work with your peers to learn and solve problems. And there are plenty of other resources around use cases and customer stories on the website. This is Dave Vellante. Thank you for watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData and brought to you by The Cube, your leader in enterprise and emerging tech coverage.

Published Date : Oct 28 2022

SUMMARY :

we talked about how in theory, those time slices could be taken, you know, As is often the case, open source software is the linchpin to those innovations. We hope you enjoy the program. I appreciate the time. Hey, explain why Influx db, you know, needs a new engine. now, you know, related to requests like sql, you know, query support, things like that, of the real first influx DB cloud, you know, which has been really successful. as they're giving us feedback, et cetera, has has, you know, pointed us in a really good direction shift from, you know, time series, you know, specialist to real time analytics better handle those queries from a performance and a, and a, you know, a time to response on the queries, you know, all of the, the real time queries, the, the multiple language query support, the, the devices and you know, the sort of highly distributed nature of all of this. I always thought, you know, real, I always thought of real time as before you lose the customer, you know, and that's one of the things that really triggered us to know that we were, we were heading in the right direction, a look at the, the libraries in on our GitHub and, you know, can ex inspect it and even can try And so just, you know, being careful, maybe a little cautious in terms And you can do some experimentation and, you know, using the cloud resources. You know, this is a new very sort of popular systems language, you know, really fast real time inquiries that we talked about, as well as for very large, you know, but it's popularity is, is you know, really starting to hit that steep part of the S-curve. going out and you know, it'll be highly featured on our, our website, you know, the whole database, the ecosystem as it expands out into to, you know, this vertically oriented Really appreciate your time. Look forward to it. goes, goes beyond just the historical into the real time really hot area. There's no need to worry about provisioning because you only pay for what you use. 
InfluxDB uses a single API across the entire platform suite so you can build on Influx DB is leveraging to increase the granularity of time series analysis analysis and bring the Hi, thank you so much. it's gonna give you faster query speeds, you store files and object storage, it aims to have no limits on cardinality and also allow you to write any kind of event data that It's really, the adoption is really starting to get steep on all the control, all the fine grain control, you need to take you know, the community is modernizing the platform, but I wanna talk about Apache And so you can answer that question and you have those immediately available to you. out that one temperature value that you want at that one time stamp and do that for every talking about is really, you know, kind of native i, is it not as effective? Yeah, it's, it's not as effective because you have more expensive compression and So let's talk about Arrow Data Fusion. It also has a PANDAS API so that you could take advantage of PANDAS What are you doing with and Pandas, so it supports a broader ecosystem. What's the value that you're bringing to the community? And I think kind of the idea here is that if you can improve kind of summarize, you know, where what, what the big takeaways are from your perspective. the hard work questions and you All right, thank you so much Anise for explaining I really appreciate it. Data and we're gonna talk about how you update a SAS engine while I'm really glad that we went with InfluxDB Cloud for our hosting They listened to the challenges we were facing and they helped Good to see you. Good to see you. So my question to you is, So yeah, you know, influx really, we thrive at the intersection of commercial services and open, You know, you look at Kubernetes for example, But, but really Kubernetes is just, you know, Azure, and Google and figure out how to deliver services on those three clouds with all of their differences. 
to the edge, you know, wherever is that, is that correct? This is the new hot phrase, you know, it, it's, Kubernetes has made a lot of things easy for us Is that, are there specific attributes to Influx db as an SRE group, as an ops team, that we can manage with very few people So how, so sometimes you build, sometimes you buy it. And of course for customers you don't even see that, but we don't want to try to reinvent the wheel, and really as, as I mentioned earlier, we can keep up with the state of the art. the end we want you to focus on getting actual insights from your data instead of running infrastructure, So cloud native technologies are, are really the hot thing. You see in the news all the time, companies being compromised, you know, technologies, the engineers running the, that infrastructure, you know, historically, as you know, take away that heavy lifting to r and d so you can focus on some of the other activities. with influx, with Anytime series data, you know, you've got a lot of stuff that you're gonna run on-prem, Tim, really appreciate you coming to the program. Thanks very much. Okay, in a moment I'll be back to wrap up. brought to you by the Cube, your leader in enterprise and emerging tech coverage.

ENTITIES

Entity	Category	Confidence
Brian Gilmore	PERSON	0.99+
Tim Yoakum	PERSON	0.99+
Brian	PERSON	0.99+
Dave	PERSON	0.99+
Tim Yokum	PERSON	0.99+
Dave Valante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Tim	PERSON	0.99+
Google	ORGANIZATION	0.99+
16 times	QUANTITY	0.99+
two rows	QUANTITY	0.99+
New York City	LOCATION	0.99+
60,000 people	QUANTITY	0.99+
Rust	TITLE	0.99+
Influx	ORGANIZATION	0.99+
Influx Data	ORGANIZATION	0.99+
today	DATE	0.99+
Python	TITLE	0.99+
three experts	QUANTITY	0.99+
InfluxDB	TITLE	0.99+
both	QUANTITY	0.99+
each row	QUANTITY	0.99+
two lane	QUANTITY	0.99+
Today	DATE	0.99+
Noble nine	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Flux	ORGANIZATION	0.99+
Influx DB	TITLE	0.99+
each column	QUANTITY	0.99+
270 terabytes	QUANTITY	0.99+
cube.net	OTHER	0.99+
twice	QUANTITY	0.99+
Bryan	PERSON	0.99+
Pandas	TITLE	0.99+
c plus plus	TITLE	0.99+
three years ago	DATE	0.99+
two	QUANTITY	0.99+
more than a decade	QUANTITY	0.98+
Apache	ORGANIZATION	0.98+
dozens	QUANTITY	0.98+
free@influxdbu.com	OTHER	0.98+
30,000 feet	QUANTITY	0.98+
Rust Foundation	ORGANIZATION	0.98+
two temperature values	QUANTITY	0.98+
In Flux Data	ORGANIZATION	0.98+
one time stamp	QUANTITY	0.98+
tomorrow	DATE	0.98+
Russ	PERSON	0.98+
IOT	ORGANIZATION	0.98+
Evolving InfluxDB	TITLE	0.98+
first	QUANTITY	0.97+
Influx data	ORGANIZATION	0.97+
one	QUANTITY	0.97+
first one	QUANTITY	0.97+
Influx DB University	ORGANIZATION	0.97+
SQL	TITLE	0.97+
The Cube	TITLE	0.96+
Influx DB Cloud	TITLE	0.96+
single server	QUANTITY	0.96+
Kubernetes	TITLE	0.96+

Tim Yocum, Influx Data

(upbeat music) >> Okay, we're back with Tim Yocum, who is the Director of Engineering at InfluxData. Tim, welcome. Good to see you. >> Good to see you. Thanks for having me. >> You're really welcome. Listen, we've been covering open source software on the Cube for more than a decade, and we've kind of watched the innovation from the big data ecosystem, the cloud being built out on open source, mobile social platforms, key databases, and of course InfluxDB, and InfluxData has been a big consumer and contributor of open source software. So my question to you is, where have you seen the biggest bang for the buck from open source software? >> So, yeah, you know, Influx, really, we thrive at the intersection of commercial services and open source software. So OSS keeps us on the cutting edge. We benefit from OSS in delivering our own service, from our core storage engine technologies to web services, templating engines. Our team stays lean and focused because we build on proven tools. We really build on the shoulders of giants. And like you've mentioned, even better, we contribute a lot back to the projects that we use, as well as our own product, InfluxDB. >> You know, but I got to ask you, Tim, because one of the challenges that we've seen, in particular, you saw this in the heyday of Hadoop. The innovations come so fast and furious, and as a software company, you got to place bets, you got to, you know, commit people, and sometimes those bets can be risky and not pay off. How have you managed this challenge? >> Oh, it moves fast, yeah. That's a benefit, though, because the community moves so quickly that today's hot technology can be tomorrow's dinosaur. And what we tend to do is we fail fast and fail often. We try a lot of things. You know, you look at Kubernetes, for example. That ecosystem is driven by thousands of intelligent developers, engineers, builders. They're adding value every day. So we have to really keep up with that.
And as the stack changes, we try different technologies, we try different methods, and at the end of the day, we come up with a better platform as a result of just the constant change in the environment. It is a challenge for us, but it's something that we just do every day. >> So we have a survey partner down in New York City called Enterprise Technology Research, ETR, and they do these quarterly surveys of about 1,500 CIOs and IT practitioners, and they really have a good pulse on what's happening with spending. And the data shows that containers generally, but specifically Kubernetes, is one of the areas that has kind of, it's been off the charts and seen the most significant adoption and velocity, particularly, you know, along with cloud. But really Kubernetes is just, you know, still up and to the right consistently, even with, you know, the macro headwinds and all of the other stuff that we're sick of talking about. So what are you doing with Kubernetes in the platform? >> Yeah, it's really central to our ability to run the product. When we first started out, we were just on AWS, and the way we were running was a little bit like containers junior. Now we're running Kubernetes everywhere, at AWS, Azure, Google Cloud. It allows us to have a consistent experience across three different cloud providers, and we can manage that in code. So our developers can focus on delivering services, not trying to learn the intricacies of Amazon, Azure, and Google, and figure out how to deliver services on those three clouds with all of their differences. >> Just a follow-up on that, is it, now, so I presume it sounds like there's a PaaS layer there to allow you guys to have a consistent experience across clouds and up to the edge, you know, wherever. Is that correct? >> Yeah, so we've basically built, more or less, platform engineering. This is the new hot phrase.
You know, Kubernetes has made a lot of things easy for us because we've built a platform that our developers can lean on, and they only have to learn one way of deploying their application, managing their application. And so that just gets all of the underlying infrastructure out of the way and lets them focus on delivering Influx Cloud. >> Yeah, and I know I'm taking a little bit of a tangent, but is that, I'll call it a PaaS layer if I can use that term, are there specific attributes to InfluxDB, or is it kind of just generally off-the-shelf PaaS? You know, is there any purpose-built capability there that is value-add, or is it pretty much generic? >> So we really look at things through a build versus buy lens. Some things we want to leverage, cloud provider services for instance, Postgres databases for metadata perhaps, get that off of our plate, let someone else run that. We're going to deploy a platform that our engineers can deliver on, that has consistency, that is all generated from code, that we, as an SRE group, as an ops team, can manage with very few people really, and we can stamp out clusters across multiple regions in no time. >> So how, so sometimes you build, sometimes you buy it. How do you make those decisions, and what does that mean for the platform and for customers? >> Yeah, so what we're doing is, it's like everybody else will do. We're looking for trade-offs that make sense. You know, we really want to protect our customers' data. So we look for services that support our own software with the most uptime, reliability, and durability we can get. Some things are just going to be easier to have a cloud provider take care of on our behalf. We make that transparent for our own team. And of course for customers, you don't even see that, but we don't want to try to reinvent the wheel. Like I had mentioned with SQL data storage for metadata, perhaps.
Let's build on top of what these three large cloud providers have already perfected, and we can then focus on our platform engineering, and we can have our developers then focus on the InfluxData software, Influx Cloud software. >> So take it to the customer level. What does it mean for them? What's the value that they're going to get out of all these innovations that we've been talking about today? And what can they expect in the future? >> So first of all, people who use the OSS product are really going to be at home on our cloud platform. You can run it on your desktop machine, on a single server, what have you. But then you want to scale up. We have some 270 terabytes of data across over 4 billion series keys that people have stored. So there's a proven ability to scale. Now, in terms of the open source software, and how we've developed the platform, you're getting a highly available, high-cardinality time series platform. We manage it, and really, as I mentioned earlier, we can keep up with the state of the art. We keep reinventing. We keep deploying things in real time. We deploy to our platform every day, repeatedly, all the time. And it's that continuous deployment that allows us to continue testing things in flight, rolling things out that change, new features, better ways of doing deployments, safer ways of doing deployments. All of that happens behind the scenes. And, as we had mentioned earlier, Kubernetes, I mean, that allows us to get that done. We couldn't do it without having that platform as a base layer for us to then put our software on. So we iterate quickly. When you're on the Influx Cloud platform, you really are able to take advantage of new features immediately. We roll things out every day. And as those things go into production, you have the ability to use them. And so in the end, we want you to focus on getting actionable insights from your data instead of running infrastructure. You know, let us do that for you.
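To make "series keys" and "high cardinality" a little more concrete: in InfluxDB's data model a series is identified by its measurement, tag set, and field key, so every new combination of tag values creates another series. The Python sketch below counts distinct series for a made-up fleet of sensors; the string format of the key and the helper names `series_key` and `series_cardinality` are illustrative assumptions and only loosely mirror how InfluxDB identifies series internally.

```python
from itertools import product


def series_key(measurement, tags, field):
    """Build an illustrative series identifier: measurement, sorted tag set, field key."""
    # Sorting the tags makes {"line": ..., "sensor": ...} and
    # {"sensor": ..., "line": ...} map to the same key.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part} {field}"


def series_cardinality(points):
    """Count distinct series among (measurement, tags, field) points."""
    return len({series_key(m, tags, f) for m, tags, f in points})


# Hypothetical fleet: 10 production lines x 50 sensors, each reporting "temp".
lines = [f"line{i}" for i in range(10)]
sensors = [f"s{i}" for i in range(50)]
points = [("machine", {"line": ln, "sensor": s}, "temp") for ln, s in product(lines, sensors)]

# Writing more points to an existing series does not add cardinality.
points += points[:5]
print(series_cardinality(points))  # 500
```

Ten lines times fifty sensors already yields 500 series; adding one more tag with thousands of distinct values (a device ID, say) multiplies that count, which is why the number of series keys is the figure that stresses a time series platform.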
>> And that makes sense, but are the innovations that we're talking about in the evolution of InfluxDB, do you see that as sort of a natural evolution for existing customers? Is it, I'm sure the answer is both, but is it opening up new territory for customers? Can you add some color to that? >> Yeah, it really is. It's a little bit of both. Any engineer will say, well, it depends. So cloud native technologies are really the hot thing. IoT, industrial IoT especially, people want to just shove tons of data out there and be able to do queries immediately, and they don't want to manage infrastructure. What we've started to see are people that use the cloud service as their data store backbone, and then they use edge computing with our OSS product to ingest data from, say, multiple production lines and down-sample that data, send the rest of that data off to Influx Cloud, where the heavy processing takes place. So really, us being in all the different clouds and iterating on that, and being in all sorts of different regions, allows for people to really get out of the business of trying to manage that big data, and have us take care of that. And of course, as we change the platform, end users benefit from that immediately. >> And so obviously, taking away a lot of the heavy lifting for the infrastructure, would you say the same thing about security, especially as you go out to IoT and the edge? How should we be thinking about the value that you bring from a security perspective? >> Yeah, we take security super seriously. It's built into our DNA. We do a lot of work to ensure that our platform is secure, that the data we store is kept private. It's of course always a concern. You see in the news all the time companies being compromised. You know, that's something that you can have an entire team working on, which we do, to make sure that the data that you have, whether it's in transit, whether it's at rest, is always kept secure, is only viewable by you.
You look at things like a software bill of materials. If you're running this yourself, you have to go vet all sorts of different pieces of software. And we do that, you know, as we use new tools. That's something that's just part of our jobs, to make sure that the platform that we're running has fully vetted software. And with open source especially, that's a lot of work. And so it's definitely new territory. Supply chain attacks are definitely happening at a higher clip than they used to. But that is really just part of a day in the life for folks like us that are building platforms. >> Yeah, and that's key. I mean, especially when you start getting into, you know, we talk about IoT and the operations technologies, the engineers running that infrastructure. You know, historically, as you know, Tim, they would air gap everything. That's how they kept it safe. But that's not feasible anymore. Everything's >> Can't do that. >> connected now, right? And so you've got to have a partner that can, again, take away that heavy lifting to R&D so you can focus on some of the other activities. All right. Give us the last word and the key takeaways from your perspective. >> Well, you know, from my perspective, I see it as a two-lane approach. With Influx, with any time series data, you know, you've got a lot of stuff that you're going to run on-prem. What you mentioned, air gapping, sure, there's plenty of need for that, but at the end of the day, people that don't want to run big data centers, people that want to entrust their data to a company that's got a full platform set up for them that they can build on, send that data over to the cloud. The cloud is not going away. I think a more hybrid approach is where the future lives, and that's what we're prepared for. >> Tim, really appreciate you coming to the program. Great stuff. Good to see you. >> Thanks very much. Appreciate it. >> Okay, in a moment, I'll be back to wrap up today's session.
You're watching the Cube. (gentle music)

Published Date : Oct 18 2022

SUMMARY :

Good to see you. Good to see you. So my question to you is to the projects that we use in the heyday of Hadoop. And as the stack changes, we and all of the other stuff that and the way we were to allow you guys to have and they only have to learn one way that we can manage with So how, so sometimes you and we can have our developers then focus So take it to the customer level. And so in the end, we want you to focus And of course, as we change the platform, that the data we store is kept private. and the operations technologies, and the key takeaways that data over to the cloud. you coming to the program. Thanks very much. I'll be back to wrap up today's session.

ENTITIES

Entity	Category	Confidence
Tim Yoakum	PERSON	0.99+
Tim	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Influx Data	ORGANIZATION	0.99+
Tim Yocum	PERSON	0.99+
Google	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
today	DATE	0.99+
both	QUANTITY	0.99+
two lane	QUANTITY	0.99+
Influx	ORGANIZATION	0.98+
Azure	ORGANIZATION	0.98+
270 terabytes	QUANTITY	0.98+
about 1500 CIOs	QUANTITY	0.97+
tomorrow	DATE	0.97+
more than a decade	QUANTITY	0.97+
over 4 billion	QUANTITY	0.97+
one	QUANTITY	0.97+
tons of data	QUANTITY	0.95+
Influx DB	TITLE	0.95+
Kubernetes	TITLE	0.94+
Enterprise Technology Research	ORGANIZATION	0.93+
first	QUANTITY	0.93+
single server	QUANTITY	0.92+
SQL	TITLE	0.91+
three	QUANTITY	0.91+
Postgres	ORGANIZATION	0.91+
Influx Cloud	TITLE	0.9+
thousands of intelligent developers	QUANTITY	0.9+
ETR	ORGANIZATION	0.9+
Hadoop	TITLE	0.9+
three large cloud providers	QUANTITY	0.81+
three clouds	QUANTITY	0.79+
Influx DB	ORGANIZATION	0.74+
cloud	QUANTITY	0.62+
Google Cloud	ORGANIZATION	0.56+
Cube	PERSON	0.53+
Cube	COMMERCIAL_ITEM	0.52+
Cloud	TITLE	0.45+
Influx	TITLE	0.36+

Anais Dotis Georgiou, InfluxData

(upbeat music) >> Okay, we're back. I'm Dave Vellante with The Cube, and you're watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData. Anais Dotis-Georgiou is here. She's a developer advocate for InfluxData, and we're going to dig into the rationale and value contribution behind several open source technologies that InfluxDB is leveraging to increase the granularity of time series analysis and bring the world of data into real-time analytics. Anais, welcome to the program. Thanks for coming on. >> Hi, thank you so much. It's a pleasure to be here. >> Oh, you're very welcome. Okay, so IOx is being touted as this next-gen open source core for InfluxDB. And my understanding is that it leverages in-memory, of course, for speed. It's a columnar store, so it gives you compression efficiency, it's going to give you faster query speeds, and it's going to let you store files in object storage, so you've got a very cost-effective approach. Are these the salient points on the platform? I know there are probably dozens of other features, but what are the high-level value points that people should understand? >> Sure, that's a great question. So some of the main requirements that IOx is trying to achieve, and some of the most impressive ones to me: the first one is that it aims to have no limits on cardinality and also allow you to write any kind of event data that you want, whether that's a tag or a field. It also wants to deliver best-in-class performance on analytics queries, in addition to our already well-served metric queries. We also want to have operator control over memory usage, so you should be able to define how much memory is used for buffering, caching, and query processing. Some other really important parts are the ability to have bulk data export and import, super useful.
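The idea of operator control over memory can be illustrated with a toy write buffer whose size is capped by an operator-chosen byte budget. This is a hypothetical Python sketch, not how IOx implements memory limits: the class name `BoundedWriteBuffer`, the `max_bytes` parameter, and the flush-to-callback behavior are assumptions made for illustration, and the sample records merely resemble InfluxDB line protocol.

```python
class BoundedWriteBuffer:
    """Buffer incoming writes up to an operator-chosen byte budget (hypothetical).

    If adding a record would push a non-empty buffer past max_bytes, the
    buffered records are handed to flush_fn first. A single record larger
    than the budget is still accepted on its own.
    """

    def __init__(self, max_bytes, flush_fn):
        self.max_bytes = max_bytes
        self.flush_fn = flush_fn
        self._items = []
        self._size = 0

    def write(self, record: bytes):
        if self._items and self._size + len(record) > self.max_bytes:
            self.flush()
        self._items.append(record)
        self._size += len(record)

    def flush(self):
        if self._items:
            self.flush_fn(self._items)
        self._items = []
        self._size = 0


flushed = []
buf = BoundedWriteBuffer(max_bytes=64, flush_fn=flushed.append)
for i in range(10):
    # Each record is 27 bytes, so at most two fit within the 64-byte budget.
    buf.write(f"temp,room=living value={20 + i} {i}".encode())
buf.flush()
print([len(batch) for batch in flushed])  # [2, 2, 2, 2, 2]
```

Lowering `max_bytes` trades more frequent, smaller flushes for a smaller memory footprint; caching and query processing could be capped the same way with budgets of their own.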
Also, broader ecosystem compatibility: where possible, we aim to use and embrace emerging standards in the data analytics ecosystem and have compatibility with things like SQL, Python, and maybe even Pandas in the future. >> Okay, so a lot there. Now, we talked to Brian about how you're using Rust, which is not a new programming language, and of course we had some drama around Rust during the pandemic with the Mozilla layoffs, but the formation of the Rust Foundation really addressed any of those concerns, and you've got big guns like Amazon and Google and Microsoft throwing their collective weight behind it. Its adoption is really starting to get steep on the S-curve. So lots of platforms, lots of adoption with Rust, but why Rust as an alternative to, say, C++, for example? >> Sure, that's a great question. So Rust was chosen because of its exceptional performance and reliability. While Rust is syntactically similar to C++, has similar performance, and also compiles to native code like C++, unlike C++ it has much better memory safety. So memory safety is protection against bugs or security vulnerabilities that lead to excessive memory usage or memory leaks. And Rust achieves this memory safety due to its innovative type system. Additionally, it doesn't allow for dangling pointers, and dangling pointers are one of the main classes of errors that lead to exploitable security vulnerabilities in languages like C++. So Rust helps meet that requirement of having no limits on cardinality, for example, because we're also using the Rust implementation of Apache Arrow and this control over memory. And also Rust's packaging system, called crates.io, offers everything that you need out of the box to have features like async and await, to fix race conditions, to protect against buffer overflows, and to ensure thread-safe async caching structures as well.
So essentially it has all the fine-grained control you need to take advantage of memory and all your resources as well as possible, so that you can handle those really, really high cardinality use cases. >> Yeah, and the more I learn about the new engine and the platform, IOx, et cetera, you see things like, in the old days, and even today, you do a lot of garbage collection in these systems, and there's an inverse impact on performance. So it looks like the community is really modernizing the platform, but I want to talk about Apache Arrow for a moment. It's designed to address the constraints that are associated with analyzing large data sets. We know that, but please explain: what is Arrow, and what does it bring to InfluxDB? >> Sure. Yeah. So Arrow is a framework for defining in-memory columnar data. And so much of the efficiency and performance of IOx comes from taking advantage of columnar data structures. And I will, if you don't mind, take a moment to kind of illustrate why columnar data structures are so valuable. Let's pretend that we are gathering field data about the temperature in our room and also maybe the temperature of our stove. And in our table we have those two temperature values, as well as maybe a measurement value, a timestamp value, and maybe some other tag values that describe what room and what house, et cetera, we're getting this data from. And so you can picture this table where we have two rows with the two temperature values for both our room and the stove. Well, usually our room temperature is regulated, so those values don't change very often. So when you have column-oriented storage, essentially you take each column and group it together.
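The row-versus-column distinction can be shown with a few lines of Python: the same small table, first as a list of rows and then regrouped so that each column's values sit together. The column names and values are invented for illustration, and real Arrow arrays are typed, contiguous memory buffers rather than Python lists.

```python
def rows_to_columns(rows):
    """Regroup a list of row dicts into a dict of column lists.

    Assumes every row has the same keys; column order follows the first row.
    """
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


# Row-oriented: each record keeps its time, tag, and temperature together.
rows = [
    {"time": 1, "room": "living", "temp": 21.0},
    {"time": 2, "room": "living", "temp": 21.0},
    {"time": 3, "room": "living", "temp": 21.0},
    {"time": 4, "room": "living", "temp": 21.5},
]

# Column-oriented: each column's values are stored next to each other.
columns = rows_to_columns(rows)
print(columns["temp"])  # [21.0, 21.0, 21.0, 21.5]
```

After the regrouping, a question about temperature touches only `columns["temp"]`, and the repeated values are adjacent, which is what makes them cheap to compress.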
And so if that's the case, and you're just taking temperature values from the room and a lot of those temperature values are the same, then you might be able to imagine how equal values will then neighbor each other, and when they neighbor each other in the storage format, this provides a really perfect opportunity for cheap compression. And then this cheap compression enables high cardinality use cases. It also enables faster scan rates. So if you want to find the min and max value of the temperature in the room across a thousand different points, you only have to get those thousand points in order to answer that question, and you have those immediately available to you. But let's contrast this with a row-oriented storage solution instead, so that we can better understand the benefits of column-oriented storage. If you had row-oriented storage, you'd first have to look at every field, like the temperature in the room and the temperature of the stove. You'd have to go across every tag value that maybe describes where the room is located or what model the stove is, and every timestamp. You then have to pluck out that one temperature value that you want at that one timestamp, and do that for every single row. So you're scanning across a ton more data, and that's why row-oriented storage doesn't provide the same efficiency as columnar, and Apache Arrow is an in-memory columnar data framework. So that's where a lot of the advantages come from. >> Okay. So you've basically described a traditional database, a row approach, but I've seen a lot of traditional databases say, okay, now we can handle column format, versus what you're talking about, which is really kind of native. Is the former not as effective because it's largely a bolt-on? Can you elucidate on that front? >> Yeah, it's not as effective because you have more expensive compression and because you can't scan across the values as quickly.
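Run-length encoding is one simple way that neighboring equal values compress cheaply in a column; the Python sketch below uses it as an assumed, illustrative codec (Arrow, Parquet, and IOx may rely on other or additional encodings). It also answers a min/max question by reading only the temperature column.

```python
from itertools import groupby


def run_length_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# A regulated room temperature rarely changes, so the column is mostly repeats.
temp_column = [21.0] * 500 + [21.5] * 300 + [21.0] * 200
encoded = run_length_encode(temp_column)
print(len(temp_column), "values ->", len(encoded), "runs")  # 1000 values -> 3 runs

# "Min and max room temperature" reads only this one column; with runs,
# only the distinct run values need to be examined.
print(min(v for v, _ in encoded), max(v for v, _ in encoded))  # 21.0 21.5
assert run_length_decode(encoded) == temp_column
```

In a row layout the same 1,000 temperatures would be interleaved with timestamps and tag values, so neither the compression nor the single-column scan comes for free.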
And so those are pretty much the main reasons why row-oriented storage isn't as efficient as column-oriented storage. >> Yeah. Got it. So let's talk about Arrow DataFusion. What is DataFusion? I know it's written in Rust, but what does it bring to the table here? >> Sure. So it's an extensible query execution framework, and it uses Arrow as its in-memory format. So the way that it helps InfluxDB IOx is this: it's great if you can write an unlimited amount of cardinality into InfluxDB, but if you don't have a query engine that can successfully query that data, then I don't know how much value it is for you. So DataFusion helps enable query processing and transformation of that data. It also has a Pandas API so that you can take advantage of Pandas DataFrames as well, and all of the machine learning tools associated with Pandas. >> Okay. You're also leveraging Parquet in the platform, of course. We heard a lot about Parquet in the middle of the last decade as a storage format to improve on Hadoop column stores. What are you doing with Parquet and why is it important? >> Sure. So Parquet is the column-oriented durable file format. It's important because it'll enable bulk import and bulk export. It has compatibility with Python and Pandas, so it supports a broader ecosystem. Parquet files also take very little disk space, and they're faster to scan because, again, they're column oriented. In particular, I think Parquet files are like 16 times cheaper than CSV files, just as a point of reference. And so that's essentially a lot of the benefits of Parquet. >> Got it. Very popular. So what exactly is InfluxData focusing on as a committer to these projects? What is your focus? What's the value that you're bringing to the community? >> Sure. So first, InfluxDB has contributed a lot of different things to the Apache ecosystem. For example, they contributed an implementation of Apache Arrow in Go, and that will support querying Influx.
Also, there have been quite a few contributions to DataFusion for things like memory optimization and support for additional SQL features, like support for timestamp arithmetic, support for EXISTS clauses, and support for memory control. So yeah, Influx has contributed a lot to the Apache ecosystem and continues to do so. And I think the idea here is that if you can improve these upstream projects, then the long-term strategy is that the more you contribute and build those up, the more you will perpetuate that cycle of improvement and the more we will invest in our own project as well. So it's just that kind of symbiotic relationship and appreciation of the open source community. >> Yeah. Got it. You got that virtuous cycle going; people call it the flywheel. Give us your last thoughts and kind of summarize what the big takeaways are from your perspective. >> So I think the big takeaway is that InfluxData is doing a lot of really exciting things with InfluxDB IOx. If you are interested in learning more about the technologies that Influx is leveraging to produce IOx, the challenges associated with it, and all of the hard work, or if you have questions and just want to learn more, then I would encourage you to go to the monthly tech talks and community office hours, and they are on every second Wednesday of the month at 8:30 AM Pacific time. There are also community forums and a community Slack channel. Look for the InfluxDB underscore IOx channel specifically to learn more about how to join those office hours and those monthly tech talks, as well as to ask any questions you have about IOx, what to expect, and what you'd like to learn more about. As a developer advocate, I want to answer your questions. So if there's a particular technology or stack that you want to dive deeper into and want more explanation about how InfluxDB leverages it to build IOx, I will be really excited to produce content on that topic for you.
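DataFusion is described here as a query execution framework over Arrow's in-memory columns, and the contributions mentioned include timestamp arithmetic. The following hypothetical Python sketch (plain lists standing in for Arrow arrays; it does not use DataFusion's API) shows the general shape of that kind of work on a columnar batch: evaluate a time predicate on the time column alone to build a selection mask, then aggregate only the columns and rows the query needs. The batch contents and the function name `max_temp_by_room_since` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical column-oriented batch: each field is its own list, loosely
# mirroring an Arrow record batch (plain Python lists stand in for arrays).
batch = {
    "time": [datetime(2022, 10, 18, 12, 0) + timedelta(minutes=m) for m in range(6)],
    "room": ["kitchen", "living", "kitchen", "living", "kitchen", "living"],
    "temp": [22.0, 21.0, 23.5, 21.0, 24.0, 20.5],
}


def max_temp_by_room_since(batch, now, window):
    """Roughly: SELECT room, max(temp) FROM batch WHERE time >= now - window GROUP BY room."""
    cutoff = now - window  # timestamp arithmetic: datetime minus timedelta gives a datetime
    # Filter step: evaluate the predicate on the time column alone, producing a mask.
    mask = [t >= cutoff for t in batch["time"]]
    # Aggregate step: read only the room and temp columns, and only the selected rows.
    result = defaultdict(lambda: float("-inf"))
    for keep, room, temp in zip(mask, batch["room"], batch["temp"]):
        if keep:
            result[room] = max(result[room], temp)
    return dict(result)


now = datetime(2022, 10, 18, 12, 5)
print(max_temp_by_room_since(batch, now, timedelta(minutes=3)))  # {'kitchen': 24.0, 'living': 21.0}
```

A real engine would vectorize both steps over contiguous buffers rather than looping in Python; the sketch only illustrates why a columnar layout lets a query touch just the columns it names.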
>> Yeah, that's awesome. You have a really rich community where people can collaborate with their peers and solve problems, and you're super responsive, so I really appreciate that. All right, thank you so much, Anais, for explaining all this open source stuff to the audience and why it's important to the future of data. >> Thank you. I really appreciate it. >> All right, you're very welcome. Okay, stay right there, and in a moment I'll be back with Tim Yoakam. He's the director of engineering for InfluxData, and we're going to talk about how you update a SaaS engine while the plane is flying at 30,000 feet. You don't want to miss this. (upbeat music)

Published Date : Oct 18 2022

SUMMARY :

and bring the world of data It's a pleasure to be here. it's going to give you and some of the most impressive ones to me and you got big guns and dangling pointers are the main classes Yeah, and the more I and the temperature of the store. is it not as effective as the former not and because you can't scan to to the table here? So the way that it helps Par-K in the platform course. and they're faster to scan So and these, what exactly is Influx data and appreciation of the and kind of summarize, of the hard work questions and you guys super responsive, I really appreciate it. and we're going to talk about

ENTITIES

Entity	Category	Confidence
Tim Yoakam	PERSON	0.99+
Brian	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Anais	PERSON	0.99+
two rows	QUANTITY	0.99+
16 times	QUANTITY	0.99+
Influx Data	ORGANIZATION	0.99+
each row	QUANTITY	0.99+
Python	TITLE	0.99+
Rust	TITLE	0.99+
C++	TITLE	0.99+
SQL	TITLE	0.99+
Anais Dotis Georgiou	PERSON	0.99+
InfluxDB	TITLE	0.99+
both	QUANTITY	0.99+
Rust Foundation	ORGANIZATION	0.99+
30,000 feet	QUANTITY	0.99+
first one	QUANTITY	0.99+
Mozilla	ORGANIZATION	0.99+
Pandas	TITLE	0.98+
InfluxData	ORGANIZATION	0.98+
Influx	ORGANIZATION	0.98+
IOx	TITLE	0.98+
each column	QUANTITY	0.97+
one time stamp	QUANTITY	0.97+
first	QUANTITY	0.97+
Influx	TITLE	0.96+
Anais Dotis-Georgiou	PERSON	0.95+
Crates IO	TITLE	0.94+
IOx	ORGANIZATION	0.94+
two temperature values	QUANTITY	0.93+
Apache	ORGANIZATION	0.93+
today	DATE	0.93+
8:30 AM Pacific time	DATE	0.92+
Wednesday	DATE	0.91+
one temperature	QUANTITY	0.91+
two temperature values	QUANTITY	0.91+
InfluxDB IOx	TITLE	0.9+
influx	ORGANIZATION	0.89+
last decade	DATE	0.88+
single row	QUANTITY	0.83+
a ton more data	QUANTITY	0.81+
thousand	QUANTITY	0.8+
dozens of other features	QUANTITY	0.8+
a thousand different points	QUANTITY	0.79+
Hadoop	TITLE	0.77+
Par-K	TITLE	0.76+
points	QUANTITY	0.75+
each	QUANTITY	0.75+
Slack	TITLE	0.74+
Evolving InfluxDB	TITLE	0.68+
kilometer	QUANTITY	0.67+
Arrow	TITLE	0.62+
The Cube	ORGANIZATION	0.61+

Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022

(upbeat music) >> Welcome back, everyone, to theCUBE's coverage here in Seattle, Washington, for AWS's Marketplace Seller Conference. It's the big news within the Amazon Partner Network, combining with Marketplace, forming the Amazon Partner Organization. Part of a big reorg as they grow to the next level: NextGen cloud, mid-game on the chessboard. theCUBE's got it covered. I'm John Furrier, your host at theCUBE. Great guests here from Databricks. Both CUBE alumni. Jack Andersen, GM and VP of the Databricks partnership team for AWS. You handle that relationship. And Joel Minnick, vice president of product and partner marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >> Thanks for having us back. >> Yeah, John, great to be here. >> So I feel like we're at re:Invent 2013. Small event, no stage, but there's a real shift happening with procurement. Obviously it's a no-brainer on the micro, you know, people should be buying online. Self-service, cloud scale. But Amazon's got billions being sold through their marketplace. They've reorganized their partner network. You can see kind of what's going on. They've kind of figured it out. Like, let's put everything together and simplify and make it less of a website marketplace. Merge our partner organizations, have more synergy and frictionless experiences, so everyone can make more money and customers are going to be happier. >> Yeah, that's right. >> I mean, you're running the relationship. You're in the middle of it. >> Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and marketplace allow us to do, is to work with Amazon on these really, you know, unique use cases. >> You know, I interviewed Ali many times over the years. I remember many years ago, maybe six, seven years ago, we were talking. He's like, "we're all in on AWS."
Obviously now, with the success of Databricks, you've got multiple clouds, we see that. Customers have choice. But I remember the strategy early on. It was like, we're going to be deep. So this speaks volumes to the relationship you have. Years. Jack, take us through the relationship that Databricks has with AWS from a partner perspective, and Joel, from a product perspective. Because it's not like you guys are Johnny-come-latelies, new to the scene. >> Right. >> You've been there, almost present at the creation of this wave. What's the relationship and how does it relate to what's going on today? >> So most people may not know that Databricks was born on AWS. We actually did our first $100 million of revenue on Amazon. And today we're obviously available on multiple clouds. But we're very fond of our Amazon relationship. And when you look at what the APN allows us to do, you know, we're able to expand our reach and co-sell with Amazon, and marketplace broadens our reach. And so we think of marketplace in three different aspects. We've got the marketplace private offer business, which we've been doing for a number of years. Matter of fact, we were driving well over a hundred percent year-over-year growth in private offers. And we have a nine-figure business. So it's a very significant business. And when a customer uses a private offer, that private offer counts against their private pricing agreement with AWS. So they get pricing power against their private pricing. So it's really important that it goes on their Amazon bill. In May we launched our pay-as-you-go, on-demand offering. And in five short months, we have well over a thousand subscribers. And what this does is it really reduces the barriers to entry. It's low friction. So anybody in an enterprise or startup or public sector company can start to use Databricks on AWS in a consumption-based model, and have it go against their monthly bill. And so we see customers, you know, doing rapid experimentation, pilots, POCs.
They're really learning the value of that first use case. And then we see rapid use case expansion. And the third aspect is the consulting partner private offer, CPPO. Super important in how we involve our partner ecosystem of consulting partners and resellers that are able to work with Databricks on behalf of customers. >> So you've got the big contracts with the private offer. You've got the product-market fit, kind of people iterating with data, coming in with the buyers you get. And obviously the integration piece, all fitting in there. >> Exactly. >> Okay, so those are the offers, that's current, what's in marketplace today. Is that the products... What are people buying? >> Yeah. >> I mean, I guess what's the... Joel, what are people buying in the marketplace? And what does it mean for them? >> So fundamentally what they're buying is the ability to take silos out of their organization. And that is the problem that Databricks is out there to solve. Which is, when you look across your data landscape today, you've got unstructured data, you've got structured data, you've got real-time streaming data. And your teams are trying to use all of this data to solve really complicated problems. And as Databricks, as the Lakehouse company, what we're helping customers do is, how do they get into the new world? How do they move to a place where they can use all of that data across all of their teams? And so we allow them to begin to find, through the marketplace, those rapid adoption use cases where they can get rid of the data warehousing and data lake silos they've had in the past. Get their unstructured and structured data onto one data platform, an open data platform, that is no longer tied to any proprietary formats and standards, and something they can very easily integrate into the rest of their data environment. Apply one common data governance layer on top of that.
So that from the time they ingest that data, to the time they use that data, to the time they share that data, inside and outside of their organization, they know exactly how it's flowing. They know where it came from. They know who's using it. They know who has access to it. They know how it's changing. And then with that common data platform, with that common governance solution, they're able to bring all of those use cases together across their real-time streaming, their data engineering, their BI, their AI. All of their teams working on one set of data. And that lets them move really, really fast. And it also lets them solve challenges they just couldn't solve before. A good example of this, you know, one of the world's now largest data streaming platforms runs on Databricks with AWS. And if you think about what does it take to set that up? Well, they've got all this customer data that was historically inside of data warehouses, that they use to understand who their customers are. They have all this unstructured data; they've built their data science models so they can do the right kinds of recommendation engines and forecasting. And then they've got all this streaming data going back and forth, between clickstream data from what the customers are doing with their platform and the recommendations they want to push back out. And if those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex. But by building it on Databricks, they were able to release it in record time and have grown at a record pace to now be the number one platform. >> And this product, it's impacting product development. >> Absolutely. >> I mean, this is like the difference between lagging months of product development to, like, days. >> Yes. >> Pretty much what you're getting at. >> Yes. >> So total agility. >> Mm-hmm. >> I got that.
Okay, now, I'm a customer, I want to buy in the marketplace, but you've got a direct sales force out there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on the stage from AWS's leadership, Chris was up there speaking, and Mona, was that this is a CRO, chief revenue officer, conversation. Which means someone's getting compensated. So, if I'm the sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict? Or how do you handle it? >> Well, I'd add to what Joel just talked about with, you know, the solution, the value of the solution: our entire offering is available on AWS Marketplace. So it's not a subset, it's the entire Databricks offering. And- >> The flagship, all the top stuff. >> Everything, the flagship, the complete offering. So it's not segmented. It's not a sub-segment. >> Okay. >> It's, you know, you can use all of our different offerings. Now when it comes to seller compensation, we view this two different ways, right? One is that AWS is also incented, versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks: our sales force wants to do the right thing for the customer, if the customer wants to use marketplace as their procurement vehicle. And that really helps customers, because if you get five ISVs together, Databricks among them, and let's say you're spending a million dollars with each, you have $5 million of spend. You put that spend through the flywheel with AWS Marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we view it. >> So customers are driving, this sounds like. >> Correct. For sure. >> So they're looking at this as saying, hey, I'm going to just get purchasing power with all my relationships.
Because it's a solution architecture market, right? >> Yeah. It makes sense. Because most customers will have a primary and secondary cloud provider. If they can consolidate, you know, multiple ISV spend through that same primary provider, they get pricing power. >> Okay, Joel, we're going to date ourselves. At least I will. So back in the old days, (group laughter) it used to be, do a Barney deal with someone: hey, let's go to market together. You've got to get paper, you do a biz dev deal. And then you've got to say, okay, now let's coordinate our sales teams, a lot of moving parts. So what you're getting at here is that the alternative for Databricks, or any company, is to go find those partners and do deals, versus now Amazon is the center point for the customer. So you can still do those joint deals, but this seems to be flipping the script a little bit. >> Well, it is, but we still have VARs and consulting partners that are doing implementation work. Very valuable work, advisory work, that can actually work with marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >> So it doesn't change your business structure. It just makes it more efficient. >> That's correct. >> That's a great way to say it. >> Yeah, that's great. >> Okay. So, that's it. So it just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >> Yes. >> Absolutely. >> Economically. >> Economically, it's the right thing to do for the customer. It's the right thing to do for our relationship with Amazon. Especially when it comes back to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for, you know, an ISV solution. And our teams are working backwards from those use cases, you know, to collaborate and land them. >> Yeah. I want to get that out there. Go ahead, Joel.
>> So one of the other things I might add to that too, you know, is why this is advantageous for companies like Databricks to work through the marketplace. It makes it so much easier for customers to deploy a solution. It's, very literally, one click through the marketplace to get Databricks stood up inside of your environment. And so if you're looking at how to help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator to that. >> You know, it's interesting. I want to bring this up and get your reaction to it, because to me, I think this is the future of procurement. So from a procurement standpoint, I mean, again, dating myself, EDI back in the old days, you know, all that craziness. Now this is all the internet, basically through the console. I get the infrastructure side, you know, spin up and provision some servers, all been good. You guys have played well there in the marketplace. But now as we get into more of what I call the business apps, and they brought this up on stage. A little nuanced. Most enterprises aren't there yet in integrating tech on the business apps into the stack. This is where I think you guys are a use case of success, where you've been successful with data integration. It's an integrator's dilemma, not an innovator's dilemma. So, like, I want to integrate. So now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. It's not, you don't buy it. You build; you've got to build stuff. And this is the nuance. What's your reaction to that? Am I getting this right? Or am I off? Because no one's going to be buying software like they used to. They buy software to integrate it. >> Yeah, no- >> Because everything's integrated. >> I think AWS has done a great job at creating a partner ecosystem, right? To give customers the right tools for the right jobs. And those might be with third parties.
Databricks is doing the same thing with our Partner Connect program, right? We've got partners like Fivetran and dbt that, you know, augment and enhance our platform. And so you're looking at multi-ISV architectures, and all of that can be procured through the AWS Marketplace. >> Yeah. It's almost like, you know, bundling and unbundling. I was talking about this with Dave Vellante about Supercloud. Which is, why wouldn't a customer want the best solution in their architecture? Period. In its class. If someone's got API security or an API gateway. Well, you know, I don't want to be forced to buy something because it's part of a suite. And that's where you see things get suboptimized. Where someone dominates a category and they say, oh, you've got to buy my version of this. >> Joel and I were talking, we were actually saying, what's really important about Databricks is that customers control the data, right? You want to comment on that? >> Yeah. I was going to say, you know, what you're pushing on there, we think is extraordinarily, you know, the way the market is going to go. Customers want a lot of control over how they build their data stack. And everyone's unique in what tools are the right ones for them. And so one of the, you know, philosophically, I think, really strong places where Databricks and AWS have lined up is that we both take an approach that you should be able to have maximum flexibility on the platform. And as we think about the Lakehouse, one thing we've always been extremely committed to, as a company, is building the data platform on an open foundation. And we do that primarily through Delta Lake, and making sure that, to Jack's point, with Databricks, the data is always in your control. And then it's always stored in a completely open format. And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with all the other data tools out there.
Because you're not tied into any proprietary format, but instead are able to take advantage of all the innovation that's happening out there in the open source ecosystem. >> When you see other solutions out there that aren't as open as you guys, and you guys are very open, by the way, we love that too. We think that's a great strategy. But what am I foreclosing if I go with something else that's not as open? What's the customer's downside as you think about what's around the corner in the industry? Because if you believe it's going to be open, and I think open source software is the software industry, then integration is a big deal, because software's going to be plentiful. >> Sure. >> Let's face it. It's a good time to be in the software business. But cloud's booming. So what's the downside, from your Databricks perspective? You see a buyer clicking on Databricks versus that alternative. What should they potentially be nervous about down the road if they go with a more proprietary or locked-in approach? >> Yeah. >> Well, I think the challenge with proprietary ecosystems is that you become beholden to the ability of that provider to both build relationships and convince other vendors that they should invest in that format. But you're also then beholden to the pace at which that provider is able to innovate. >> Mm-hmm. >> And I think we've seen lots of times over history where, you know, a proprietary format may run ahead, for a while, on a lot of innovation. But as that market control begins to solidify, that desire to innovate begins to degrade. Whereas in the open formats- >> So extract rents versus innovation. (John laughs) >> Exactly. Yeah, exactly. >> I'll say it. >> But in the open world, you know, you have to continue to innovate. >> Yeah. >> And the open source world is always innovating. If you look at the last 10 to 15 years, I challenge you to find, you know, an example where the innovation in the data and AI world is not coming from open source.
And so by investing in open ecosystems, you are always going to be at the forefront of what is the latest. >> You know, again, not to date myself again, but you look back at the eighties and nineties, the protocol stacks were proprietary. >> Yeah. >> You know, SNA was IBM, DECnet was Digital. You know the rest. And then TCP/IP was part of the open systems interconnect. >> Mm-hmm. >> Revolutionary (indistinct) a big part of that, as well as my school did. And so, like, you know, that was, but it didn't standardize the whole stack. It stopped at IP and TCP. >> Yeah. >> But that helped things interoperate; that created a nice de facto standard. So this is a big part of this mid-game. I call it the chessboard, you know: you've got the opening game and mid-game, then you get the end game. You're not at the end game yet in cloud. But cloud- >> There's always some form of lock-in, right? Andy Jassy will address it, you know, when making a decision. But if you're going to make a decision, you want to reduce- You don't want to be limited, right? So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer's trying to become right now, it's an AI-driven business, right? And so it has to do with: can you get that data out of silos? Can you organize it and secure it? And then can you work with data scientists to feed those models? >> Yeah. >> In a very consistent manner. And so the tools of tomorrow will, to Joel's point, be open, and we want interoperability with those tools. >> And choice matters too. And I would say that, you know, the argument for why I think Amazon is not as locked in as maybe some other clouds is that they have to compete directly too. Redshift competes directly with a lot of other stuff. But they can't play the bundling game, because customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work great at all.
And they're going to be, they're onto it. This is the- >> To Amazon's credit, by having these solutions that may compete with native services in marketplace, they are providing customers with choice, low price- >> And access to the core value. Which is the hardware- >> Exactly. >> Which is their platform. Okay. So I want to get your thoughts on something else I see emerging. This is, again, kind of a CUBE rumination moment. So on stage, Chris unpacked a lot of stuff. I mean, with this marketplace, they're touching a lot of hot buttons here, you know: pricing, compensation, workflows, services behind the curtain. And one of those things he mentioned was, they talk about resellers or channel partners, depending upon what you call them. We believe, Dave and I believe on theCUBE, that the entire indirect sales channel of the industry is going to be disrupted radically. Because those players were selling hardware and software in the old days. That game is going to change. You mentioned you guys have a program; let me get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in. Which means that the old reseller channels are going to be rewritten. They're going to be refactored with this new kind of access. Because you've got scale, you've got money, and you've got product. And you've got customers coming into the marketplace. So if you're like a reseller that sold computers or software to data centers, you know, a value-added reseller, a VAR business. >> You've got to evolve. >> You've got to, you've got to be here. >> Yes. >> Yeah. >> How are you guys working with those partners? Because you say you have a product in your marketplace there. How do I make money if I'm a reseller with Databricks, with Amazon? Take me through that use case. >> Well, I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need know-how.
When we're seeing customers do mass migrations to the cloud, or Hadoop-specific migrations, or data transformation implementations, they need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well, well, that's another aspect of their business. But I really think it is the expertise that the partners bring to help customers get outcomes. >> Joel, the channel: a big opportunity for Amazon to reimagine this. >> For sure. Yeah. And I think, you know, to your comment about how resellers take advantage of that, I think what Jack was pushing on is spot on. Which is, it's becoming more and more about the expertise you bring to the table, and not just transacting the software, but now actually helping customers make the right choices. And we're seeing, you know, SIs begin to be able to resell solutions and find a lot of opportunity in that. >> Yeah. And I think we're seeing traditional resellers begin to move into that SI model as well. And that's going to be the evolution that this goes. >> At the end of the day, it's about services, right? >> For sure. Yeah. >> I mean... >> You've got a great service, you're going to have high gross profits. >> Yeah. >> The managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >> I think that's going to be a really hot, hot button for you guys. I think the way you guys are open, this channel partner services model coming into the fold really makes for kind of that Supercloud-like experience, where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on within Databricks. >> For sure. >> On top of this ecosystem. How does that work? This is kind of like, it hasn't been written up in business school and case studies yet. This is new. What is this?
>> I think, you know, what it comes down to is, you're seeing ecosystems begin to evolve around the data platforms. And that's going to be one of the big, kind of, new horizons for us as we think about what drives ecosystems. It's going to be around, well, what's the data platform that I'm using? And then all the tools that have to encircle that to get my business done. And so I think there are, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data analytics and AI. And then, to your point, you are seeing ecosystems now arise around Databricks and its Lakehouse platform as well, as customers are looking at it: well, if I'm standing these Lakehouses up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >> I mean, you think about ecosystem theory, we're living a whole nother dream. And I'm not kidding. What hasn't yet been written up in business school case studies is that we're now in a whole nother connective tissue, ecology thing happening. Where you have dependencies and value propositions. Economics, connectedness. So you have relationships in these ecosystems. >> And I think one of the great things about the relationships with these ecosystems is that there's a high degree of overlap. >> Yeah. >> So you're seeing that, you know, the way that the cloud business is evolving, the ecosystem partners of Databricks are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >> Joel, Jack, I love it, because you know what it means? The best ecosystem will win, if you keep it open. >> Sure, sure. >> You can see everything. If you're going to do it in the dark, you know, you don't know the outcome. I mean, this is really kind of what we're talking about.
>> And John, can I just add that when I was at Amazon, we had a theory that there are buyers and builders, right? There are very innovative companies that want to build things themselves. We're seeing now that those builders want to buy a platform. Right? >> Yeah. >> And so there's a platform decision being made, and that ecosystem is going to evolve around the platform. >> Yeah, and I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our Supercloud panel, it was called the innovator's dilemma, with a slash through it: the integrator's dilemma. Innovation is the digital transformation. >> Absolutely. >> That has become cliché in a way, but it really becomes more of: are you open? Are you integrating? If APIs are the connective tissue, what's automation, what do the service messages look like? I mean, a whole nother set of thinking goes on in these new ecosystems and these new products. >> And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks and actually share data between those two different clouds, that is the next layer on top of the native cloud solution. >> Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from, AWS. But you guys have done a great job taking that and building differentiation into the product. You guys have a great customer base, a great growing ecosystem. And again, I think you're a shining example of what every enterprise is going to do: build on top of something, get that operating model, drive revenue. >> Mm-hmm. >> Yeah. >> Whether you're Goldman Sachs or Capital One or XYZ Corporation. >> S&P Global, NASDAQ. >> Yeah. >> We've got, you know, the biggest verticals in the world solving tough problems with Databricks.
I think we'd be remiss, because if Ali were here, he would really want to thank Amazon for all of the investments across all of the different functions, whether it's the relationship we have with our engineering and service teams, our marketing teams, you know, product development. And we're going to be at re:Invent, with a big presence at re:Invent. We're looking forward to seeing you there again. >> Yeah. We'll see you guys there. Yeah. Again, good ecosystem. I love the ecosystem evolution that's happening. This next-gen cloud is here. We're seeing this evolve: new economics, new value propositions scaling up, producing more. So you guys are doing a great job. Thanks for coming on theCUBE and taking the time. Joel, Jack, great to see you. >> Thanks for having us, John. >> Okay. theCUBE coverage here. The world's changing as APN comes together with the marketplace to form a new partner organization at Amazon Web Services. theCUBE's got it covered. This should be a very big, growing ecosystem as this continues, with billions being sold through the marketplace. And of course the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)

Published Date : Oct 10 2022

SUMMARY :

You guys have the keys to the kingdom on the micro, you know, You're in the middle of it. you know, unique use cases. to the relationship you have. and how does it relate to And so we see customers, you know, And obviously the integration Is that the products... buying in the marketplace? And that is the problem that Databricks And this product, it's the difference between So how do you guys look at So it's not a subset, it's the Everything, the flagship, and then you can use So customers are driving. For sure. Hey, I'm going to just you know, multiple ISV spend here is that the alternative So the marketplace allows multiple ways So it doesn't change So you guys are actually incented It's the right thing to do for out there. the marketplace to get Databricks stood up I get the infrastructure side, you know, Databricks is doing the same thing And that's where you see And that is one of the things that aren't as open as you guys, down the road, if they go that provider is able to innovate. that desire to innovate begins to degrade. So extract rents versus innovation. Yeah, exactly. But in the open world, you know, And the open source the protocol stacked with proprietary. You know the rest. And so like, you know, that was, I call it the chessboard, you know, And if you look at what every customer's And so the tools of tomorrow And I would say that, you know, And access to the core value. to data centers or software, you know, How are you guys working that the partners bring to to reimagine this. And I think, you know, And that's going to be the Yeah. You're going to have high gross profits. that want that type of a service. I think being the way you guys are open, This is kind of like, And so I think there's, you know, So you have relationships And I think one of the great things And so as you build these because you know what it means? in the dark, you know, that want to build things themselves. to evolve around the platform. 
And the word innovation more of a, are you open? So the idea that you and the CapEx gift from AWS. Whether, you're Goldman for all of the investments across Joel, great to see you at the check. And of course the buyers

ENTITIES

Entity	Category	Confidence
David Nicholson	PERSON	0.99+
Chris	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Joel	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Peter	PERSON	0.99+
Mona	PERSON	0.99+
Dave Vellante	PERSON	0.99+
David Vellante	PERSON	0.99+
Keith	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Jeff	PERSON	0.99+
Kevin	PERSON	0.99+
Joel Minick	PERSON	0.99+
Andy	PERSON	0.99+
Ryan	PERSON	0.99+
Cathy Dally	PERSON	0.99+
Patrick	PERSON	0.99+
Greg	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Stephen	PERSON	0.99+
Kevin Miller	PERSON	0.99+
Marcus	PERSON	0.99+
Dave Alante	PERSON	0.99+
Eric	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Dan	PERSON	0.99+
Peter Burris	PERSON	0.99+
Greg Tinker	PERSON	0.99+
Utah	LOCATION	0.99+
IBM	ORGANIZATION	0.99+
John	PERSON	0.99+
Raleigh	LOCATION	0.99+
Brooklyn	LOCATION	0.99+
Carl Krupitzer	PERSON	0.99+
Lisa	PERSON	0.99+
Lenovo	ORGANIZATION	0.99+
JetBlue	ORGANIZATION	0.99+
2015	DATE	0.99+
Dave	PERSON	0.99+
Angie Embree	PERSON	0.99+
Kirk Skaugen	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
2014	DATE	0.99+
Simon	PERSON	0.99+
United	ORGANIZATION	0.99+
Stu Miniman	PERSON	0.99+
Southwest	ORGANIZATION	0.99+
Kirk	PERSON	0.99+
Frank	PERSON	0.99+
Patrick Osborne	PERSON	0.99+
1984	DATE	0.99+
China	LOCATION	0.99+
Boston	LOCATION	0.99+
California	LOCATION	0.99+
Singapore	LOCATION	0.99+

Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022

>> Welcome back, everyone, to theCUBE's coverage here in Seattle, Washington, of AWS's Marketplace Seller Conference. The big news is the Amazon Partner Network combining with the marketplace to form the Amazon Partner Organization, part of a big reorg as they grow to the next level: NextGen cloud, mid-game on the chessboard. theCUBE's got it covered. I'm John Furrier, host of theCUBE. Great guests here from Databricks, both CUBE alumni: Jack Andersen, GM and VP of the Databricks partnership team for AWS, you handle that relationship, and Joel Minnick, vice president of product and partner marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >> Thanks for having us back. >> Yeah, John, great to be here. >> So I feel like we're at re:Invent 2013: small event, no stage, but there's a real shift happening with procurement. Obviously it's a no-brainer on the micro, you know, people should be buying online, self-service, at cloud scale, but Amazon's got billions being sold through their marketplace. They've reorganized their partner network. You can see what's going on. They've figured it out: let's put everything together and simplify, make it less of a website marketplace, merge in the partner network to have more synergy and less friction, so everyone can make more money and customers are going to be happier. >> Yeah, that's right. >> I mean, you run the relationship. You're in the middle of it. >> Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and marketplace allow us to do: work with Amazon on these really, you know, unique use cases. >> You know, I've interviewed Ali many times over the years. I remember, I think six or seven years ago, we were talking. He's like, we're all in on AWS. Obviously.
Now, with the success of Databricks, you've got multiple clouds. You see that customers have choice, but I remember the strategy early on. It was like, we're going to be deep. So this speaks volumes to the relationship you've had for years. Jack, take us through the relationship that Databricks has with AWS from a partner perspective, and Joel, from a product perspective, because it's not like you're a Johnny-come-lately, new to the scene, right? You've been there almost since the creation of this wave. What's the relationship, and how does it relate to what's going on today? >> So most people may not know that Databricks was born on AWS. We actually did our first 100 million of revenue on Amazon. And today we're obviously available on multiple clouds, but we're very fond of our Amazon relationship. And when you look at what the APN allows us to do, you know, we're able to expand our reach and co-sell with Amazon, and marketplace broadens our reach. And so we think of marketplace in three different aspects. We've got the marketplace private offer business, which we've been doing for a number of years. Matter of fact, we're driving well over a hundred percent year-over-year growth in private offers, and we have a nine-figure business. So it's a very significant business. And when a customer uses a private offer, that private offer counts against their private pricing agreement with AWS. So they get pricing power against their private pricing. So it's really important. It goes on their Amazon bill. In May, we launched our pay-as-you-go, on-demand offering. And in five short months, we have well over a thousand subscribers. And what this does is it really reduces the barriers to entry; it's low friction. So anybody in an enterprise or startup or public sector company can start to use Databricks on AWS, pay on a consumption-based model, and have it go against their monthly bill.
And so we see customers, you know, doing rapid experimentation, pilots, POCs; they're really learning the value of that first use case. And then we see rapid use case expansion. And the third aspect is the consulting partner private offer, CPPO, which is super important in how we involve our partner ecosystem of consulting partners and resellers that are able to work with Databricks on behalf of customers.
>> So you've got the big contracts with the private offer. You've got the product-market fit, with people iterating with data, coming in as pay-as-you-go buyers. And obviously the integration piece, all fitting in there. >> Exactly. Exactly. >> Okay. So those are the offers that are current and in the marketplace today. Joel, what are people buying in the marketplace, and what does it mean for them? >> So fundamentally, what they're buying is the ability to take silos out of their organization. And that is the problem that Databricks is out there to solve. When you look across your data landscape today, you've got unstructured data, you've got structured data, you've got real-time streaming data, and your teams are trying to use all of this data to solve really complicated problems. And as Databricks, the lakehouse company, what we're helping customers do is get into the new world. How do they move to a place where they can use all of that data across all of their teams? And so we allow them to begin to find, through the marketplace, those rapid-adoption use cases where they can get rid of the data warehouse and data lake silos they've had in the past, and get their unstructured and structured data onto one data platform: an open data platform that is no longer tied to any proprietary formats and standards, and something they can very easily integrate into the rest of their data environment, with one common data governance layer on top of that. So from the time they ingest that data, to the time they use that data, to the time they share that data inside and outside of their organization, they know exactly how it's flowing. They know where it came from. They know who's using it. They know who has access to it. They know how it's changing. And then with that common data platform and that common governance solution, they're able to bring all of those use cases together across their real-time streaming, their data engineering, their BI, their AI: all of their teams working on one set of data. And that lets them move really, really fast. And it also lets them solve challenges they just couldn't solve before. A good example of this: you know, one of the world's largest data streaming platforms now runs on Databricks with AWS. And if you think about what it takes to set that up: well, they've got all this customer data that was historically inside of data warehouses, which they have to understand to know who their customers are. They have all this unstructured data; they've built their data science models so they can do the right kinds of recommendation engines and forecasting. And then they've got all this streaming data going back and forth: clickstream data from what the customers are doing with their platform, and the recommendations they want to push back out. And if those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex. But by building it on Databricks, they were able to release it in record time and have grown at record pace. >> So that's the platform impacting product development. >> Absolutely. >> I mean, this is the difference between lagging months of product development and, like, days. >> Yes. >> That's pretty much what you're getting at. >> Yeah. >> So total agility.
I got that. Okay. Now I'm a customer, I want to buy in the marketplace, but you've also got a direct sales force out there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on the stage from AWS's leadership, Chris was up there speaking, and I was like, hey, this is a CRO conference, a chief revenue officer conversation, which means someone's getting compensated. So if I'm a sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict, or is there a lift? >> Well, I'd add to what Joel just talked about, you know, the value of the solution: our entire offering is available on AWS Marketplace. So it's not a subset; it's the entire Databricks offering. >> The flagship, all the top... >> Everything, the flagship, the complete offering. So it's not segmented. It's not a sub-segment. You know, you can use all of our different offerings. Now, when it comes to seller compensation, we view this two different ways, right? One is that AWS is also incented, right? Versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks: our sales force wants to do the right thing for the customer, if the customer wants to use marketplace as their procurement vehicle. And that really helps customers, because if you get Databricks and five other ISVs together, and let's say you're spending a million dollars with each ISV, you have $5 million of spend. You put that spend through the flywheel with AWS Marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we do it. >> So customers are driving this, it sounds like. >> Correct. For sure.
>> So they're looking at this as saying, hey, I'm going to get purchasing power with all my relationships, because it's a solution-architecture market, right? >> Yeah. It makes sense, because most customers will have a primary and a secondary cloud provider. If they can consolidate, you know, multiple ISV spend through that same primary provider, they get pricing power. >> Okay, we're going to date ourselves. At least I will. So back in the old days, it used to be: do a Barney deal with someone, hey, let's go to market together. You've got to get paper, you do a biz dev deal. And then you've got to say, okay, now let's coordinate our sales teams: a lot of moving parts. So what you're getting at here is that the alternative for Databricks or any company was to go find those partners and do deals, versus now Amazon is the center point for the customer, so that you can still do those joint deals. But this seems to be flipping the script a little bit. >> Well, it is, but we still have VARs and consulting partners that are doing implementation work, very valuable work, advisory work, that can actually work with marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >> So it doesn't change your business structure. It just makes it more efficient. >> That's correct. >> That's a great way to say it. Yeah. >> That's great. So that's it. It just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >> Yes. >> Absolutely. >> Economically. >> Yeah. Economically, it's the right thing to do for the customer. It's the right thing to do for our relationship with Amazon, especially when it comes back to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for, you know, an ISV solution, and our teams are working backwards from those use cases, you know, to collaborate and land them. >> Yeah. I want to get that out there. Go ahead, Joel.
>> So one of the other things I might add to that, you know, and why this is advantageous for companies like Databricks to work through the marketplace, is that it makes it so much easier for customers to deploy a solution. It's literally one click through the marketplace to get Databricks stood up inside of your environment. And so if you're looking at how to help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator for that. >> You know, it's interesting. I want to bring this up and get your reaction to it, because to me, I think this is the future of procurement. From a procurement standpoint, I mean, again, dating myself, EDI back in the old days, you know, all that craziness. Now this is all internet, basically through the console. I get the infrastructure side, you know, spin up and provision some servers, all good. You guys have played well there in the marketplace. But now, as we get into more of what I call the business apps, and they brought this up on stage, a little nuance: most enterprises aren't yet there on integrating tech for the business apps into the stack. This is where I think you guys are a success story, where you guys have been successful with data integration. It's an integrator's dilemma, not an innovator's dilemma. So, like, I want to integrate, so now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. You don't buy it; you've got to build stuff. And this is the nuance. What's your reaction to that? Am I getting this right, or am I off? Because no one's going to be buying software like they used to; they buy software to integrate it. >> Yeah. >> Because everything's integrated. >> I think AWS has done a great job at creating a partner ecosystem, right, to give customers the right tools for the right jobs.
And those might be with third parties. Databricks is doing the same thing with our Partner Connect program, right? We've got partners like Fivetran and dbt that, you know, augment and enhance our platform. And so you're looking at multi-ISV architectures, and all of that can be procured through the AWS Marketplace. >> Yeah. It's almost like, you know, bundling and unbundling. I was talking about this with Dave Vellante about Supercloud, which is: why wouldn't a customer want the best solution in their architecture, period? Best in class. If someone's got API security or an API gateway... well, you know, I don't want to be forced to buy something because it's part of a suite, and that's where you see things get suboptimized, where someone dominates a category and they say, oh, you've got to buy my version of this. >> Yeah. >> Joel and I were talking, and we were actually saying what's really important about Databricks is that customers control the data. Right? You want to comment on that? >> Yeah. I'd say, you know, what you're pushing on there is, we think, extraordinarily the way the market is going to go: customers want a lot of control over how they build their data stack. And everyone's unique in what tools are the right ones for them. And so, you know, philosophically, I think one of the really strong places Databricks and AWS have lined up is that we both take the approach that you should be able to have maximum flexibility on the platform. And as we think about the lakehouse, one thing we've always been extremely committed to as a company is building the data platform on an open foundation. And we do that primarily through Delta Lake, making sure that, to Jack's point, with Databricks the data is always in your control, and it's always stored in a completely open format.
And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with all the other data tools out there, because you're not tied into any proprietary format, but instead are able to take advantage of all the innovation that's happening out there in the open source ecosystem. >> When you see other solutions out there that aren't as open as you guys (you guys are very open, by the way, we love that too; we think that's a great strategy), what am I foreclosing if I go with something else that's not as open? What's the customer's downside, as you think about what's around the corner in the industry? Because if you believe it's going to be open, open source, which I think is the software industry, then integration is a big deal, because software's going to be plentiful. Let's face it, it's a good time to be in the software business, and cloud's booming. So what's the downside, from your Databricks perspective? You see a buyer clicking on Databricks versus that alternative: what should they potentially be nervous about down the road if they go with a more proprietary or locked-in approach? >> Well, I think the challenge with proprietary ecosystems is you become beholden to the ability of that provider to both build relationships and convince other vendors that they should invest in that format. But you're also then beholden to the pace at which that provider is able to innovate. And I think we've seen lots of times over history where, you know, a proprietary format may run ahead for a while on a lot of innovation. But as that market control begins to solidify, that desire to innovate begins to degrade. Whereas in the open format... >> So, extract rents versus innovation. >> Exactly. Yeah, exactly. >> But... >> I'll say, in the open world, you know, you have to continue to innovate. >> Yeah. >> And the open source world is always innovating.
If you look at the last 10 to 15 years, I challenge you to find, you know, an example where the innovation in the data and AI world is not coming from open source. And so by investing in open ecosystems, you're always going to be at the forefront of what is the latest. >> You know, again, not to date myself again, but you look back at the eighties and nineties: the protocol stacks were proprietary. >> Yeah. >> You know, SNA at IBM, DECnet at Digital, you know the rest. And then TCP/IP was part of the open systems interconnect revolution; OSI was a big part of that as well. And so, you know, that was... but it didn't standardize the whole stack. It stopped at IP and TCP. >> Yeah. >> But that helped interoperability; that created a nice de facto standard. So this is a big part of this mid-game. I call it the chessboard: you know, you've got the opening game and the mid-game, then you've got the end game, and we're not at the end game yet in cloud. >> There's always some form of lock-in, right? Andy Jassy will address that, you know, when making a decision. But if you're going to make a decision, you don't want to be limited, right? So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer's trying to become right now, it's an AI-driven business, right? And so it has to do with: can you get that data out of silos? Can you organize it and secure it? And then can you work with data scientists to feed those models? >> Yeah. >> In a very consistent manner. And so the tools of tomorrow, to Joel's point, will be open, and we want interoperability with those tools. >> And choice matters too. And I would say that, you know, the argument for why I think Amazon is not as locked in as maybe some other clouds is that they have to compete directly too.
Redshift competes directly with a lot of other stuff, but they can't play the bundling game, because customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work well at all. And they're onto it. >> This is to Amazon's credit: by having these solutions that may compete with native services in the marketplace, they are providing customers with choice. >> Low price and access to the... >> And access to the core value. >> Exactly. >> Which is the hardware, which is their platform. Okay. So I want to get your thoughts on something else I see emerging. This is, again, kind of a CUBE rumination moment. On stage, Chris unpacked a lot of stuff. I mean, with this marketplace they're touching a lot of hot buttons here, you know: pricing, compensation, workflows, services behind the curtain. And one of the things he mentioned was resellers, or channel partners, depending upon what you call them. Dave and I believe, on theCUBE, that the entire indirect sales channel of the industry is going to be disrupted radically, because those players were selling hardware and software in the old days, and that game is going to change. You know, you mentioned you guys have a program; I want to get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in, which means the old reseller channels are going to be rewritten. They're going to be refactored with these new kinds of access, because you've got scale, you've got money, you've got product, and you've got customers coming into the marketplace. So if you're a reseller that sold computers to data centers, or software, you know, a value-added reseller or VAR business... >> You've got to evolve. >> You've got to be here. >> Yes. >> How are you guys working with those partners? Because you say you have a part in your marketplace there. How do I make money?
If I'm a reseller with Databricks and Amazon, take me through that use case. >> Well, I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need know-how. When we're seeing customers do mass migrations to the cloud, or Hadoop-specific migrations, or data transformation implementations, they need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well, well, that's another aspect of their business. But I really think it is the expertise that the partners bring to help customers get outcomes. >> Joel, the channel is a big opportunity for Amazon to reimagine this. >> For sure. Yeah. And I think, you know, to your comment about how resellers take advantage of that, I think what Jack was pushing on is spot on, which is: it's becoming more and more about the expertise you bring to the table, and not just transacting the software, but now actually helping customers make the right choices. And we're seeing, you know, SIs begin to be able to resell solutions and finding a lot of opportunity in that. >> Yeah. >> And I think we're seeing traditional resellers begin to move into that SI model as well. And that's going to be the evolution here. >> At the end of the day, it's about services. >> For sure, for sure. >> You've got a great service, you're going to have high gross profits. >> And I think the managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >> I think that's going to be a really hot, hot button for you guys. I think, the way you guys are open, this channel partner services model coming into the fold really kind of makes for that Supercloud-like experience, where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on within Databricks. >> For sure. >> On top of this ecosystem. How does that work?
This kind of thing hasn't been written up in business school case studies yet. This is new. What is this? >> I think, you know, what it comes down to is, you're seeing ecosystems begin to evolve around the data platforms. And that's going to be one of the big new horizons for us as we think about what drives ecosystems. It's going to be around: what's the data platform that I'm using, and then all the tools that have to encircle that to get my business done. And so I think there are, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data analytics and AI. And then to your point, you are seeing ecosystems now arise around Databricks and its Lakehouse platform as well, as customers are saying, well, if I'm standing these lakehouses up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >> I mean, you think about ecosystem theory, we're living a whole nother dream, and I'm not kidding. It hasn't yet been written up in business school case studies: we're now in a whole nother connective-tissue ecology, where you have dependencies and value propositions, economics, connectedness. So you have relationships in these ecosystems. >> And I think one of the great things about relationships within these ecosystems is that there's a high degree of overlap. >> Yeah. >> So you're seeing that, you know, the way the cloud business is evolving, the ecosystem partners of Databricks are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >> Joel, Jack, I love it, because you know what it means? The best ecosystem will win, if you keep it open. >> Sure. >> You can see everything. If you're going to do it in the dark, you know, you don't know the outcome. I mean, this is really what we're talking about.
>> And John, can I just add that when I was at Amazon, we had a theory that there are buyers and builders, right? There are very innovative companies that want to build things themselves. We're seeing now that those builders want to buy a platform. Right? >> Yeah. >> And so there's a platform decision being made, and that ecosystem is going to evolve around the platform. >> Yeah, and I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our Supercloud panel, it was called the innovator's dilemma with a slash through it: the integrator's dilemma. Innovation is the digital transformation. >> Absolutely. >> That has become cliché in a way, but it really becomes more of: are you open? Are you integrating? If APIs are the connective tissue, what's automation, what do the service messages look like? I mean, a whole nother set of thinking goes on in these new ecosystems and these new products. >> And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks and actually share data between those two different clouds, that is the next layer on top of the native cloud solution. >> Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from, AWS. But you guys have done a great job taking that and building differentiation into the product. You guys have a great customer base, a great growing ecosystem. And again, I think you're a shining example of what every enterprise is going to do: build on top of something, get that operating model, drive revenue. >> Yeah. >> Whether you're Goldman Sachs or Capital One or XYZ Corporation. >> S&P Global, NASDAQ, right. We've got, you know, the biggest verticals in the world solving tough problems with Databricks.
I think we'd be remiss, because if Ali were here, he would really want to thank Amazon for all of the investments across all of the different functions, whether it's the relationship we have with our engineering and service teams. Yeah. Our marketing teams, you know, product development. And we're gonna be at re:Invent with a big presence. We're looking forward to seeing you there again. >>Yeah. We'll see you guys there. Yeah. Again, good ecosystem. I love the ecosystem evolution that's happening; this next-gen cloud is here. We're seeing it evolve: new economics, new value propositions, scaling up, producing more. So you guys are doing a great job. Thanks for coming on theCUBE and taking the time. Joel, great to see you. And Jack. >>Thanks for having us. >>Thanks. >>Okay, CUBE coverage here. The world's changing as APN comes together with the Marketplace into a new partner organization at Amazon Web Services, and theCUBE's got it covered. This should be a very big, growing ecosystem as this continues, with billions being sold through the Marketplace. Of course, the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of theCUBE. Thanks for watching.

Published Date : Sep 21 2022

SUMMARY :

Thanks for good to see you again. Yeah, John, great to be here. Obviously it makes it's a no brainer on the micro, you know, You're in the middle of it. you know, unique use cases. So this speaks volumes to the, the relationship you have years. And when you look at what the APN allows us to do, And so we see customers, you know, doing rapid experimentation pilots, POCs, So you got the big contracts with the private offer. And that's, that is the problem that Databricks is out there to solve, They just couldn't solve before a good example of this, you know, And if you think about what does it take to set that up? So how do you guys look at this? Well, I I'd add what Joel just talked about with, with, you know, what the solution, the value of the solution our entire offering And that really helps customers because if you get Databricks So they're looking at this as saying, you know, multiple ISV spend through that same primary provider, you get pricing And then you gotta say, okay, now let's coordinate our sales teams, a lot of moving parts. So the marketplace allows multiple ways to procure your So it doesn't change your business structure. Yeah, So you guys are actually incented to Yeah. It's the right thing to do for our relationship with Amazon, So one of the other things I might add to that too, you know, and why this is advantageous for, I get the infrastructure side, you know, spin up and provision. you know, augment and enhance our platform. you know, I don't wanna be forced to buy something because it's part of a suite and the data. And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with When you see other solutions out there that aren't as open as you guys, you guys are very open by the I think the challenge with proprietary ecosystems is you become beholden to the Exactly. I'll say it in the open world, you know, you have to continue to innovate.
I call it the chessboard, you know, you got opening game and mid game. And so it has to do with, can you get that data outta silos? And I would say that, you know, the argument for why I think Amazon Price and access to the S and access to the core value. So I wanna get you guys thought on something else. You gotta, you gotta be here. If those consulting SI partners happen to resell the solution as well. And we're seeing, you know, both SI begin to be This gets at the end of the day. I think that the managed service provider business is alive and well, right? I think being the way you guys are open this channel I think, you know, what it comes down to is you're seeing ecosystems begin to evolve around So you have relationships in And so as you build these platforms out into the cloud, you're able to really take advantage you don't know the outcome. And John, can I just add that when I was at Amazon, we had a theory that there are buyers and builders, That's why, you know, when we had our Supercloud panel So the idea that you can have a multi-cloud implementation of Databricks, and actually share data But you guys have done a great job taking that building differentiation into the product. We're looking forward to seeing you there again. Great to see you at the check.

ENTITIES

Entity	Category	Confidence
Chris	PERSON	0.99+
Joel Minick	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
Joel	PERSON	0.99+
Ali	PERSON	0.99+
Jack Anderson	PERSON	0.99+
Dave	PERSON	0.99+
$5 million	QUANTITY	0.99+
Jack	PERSON	0.99+
two	QUANTITY	0.99+
Goldman Sachs	ORGANIZATION	0.99+
XYZ	ORGANIZATION	0.99+
Joel Minnick	PERSON	0.99+
Jack Andersen	PERSON	0.99+
Andy Jassy	PERSON	0.99+
third aspect	QUANTITY	0.99+
John Furrier	PERSON	0.99+
NASDAQ	ORGANIZATION	0.99+
Barney	ORGANIZATION	0.99+
both	QUANTITY	0.99+
five short months	QUANTITY	0.99+
One	QUANTITY	0.99+
APO	ORGANIZATION	0.99+
today	DATE	0.99+
IBM	ORGANIZATION	0.99+
first 100 million	QUANTITY	0.98+
tomorrow	DATE	0.98+
one	QUANTITY	0.98+
billions	QUANTITY	0.98+
Johnny	PERSON	0.97+
Davis	PERSON	0.97+
a million dollars	QUANTITY	0.96+
Salesforce	ORGANIZATION	0.96+
Databricks	ORGANIZATION	0.95+
each ISV	QUANTITY	0.95+
Seattle, Washington	LOCATION	0.95+
two different ways	QUANTITY	0.95+
one data platform	QUANTITY	0.95+
seven years ago	DATE	0.94+

Said Ouissal, Zededa | VMware Explore 2022

>>Hey, everyone. Welcome back to San Francisco. Lisa Martin and John Furrier live on the floor at VMware Explore 2022. This is our third day of wall-to-wall coverage on theCUBE. But you know that, 'cause you've been here the whole time. We're pleased to welcome a first-timer to theCUBE: Said Ouissal is here, the CEO and founder of ZEDEDA. Said, welcome to the program. >>Thank you for having me. >>Talk to me a little bit about what ZEDEDA does in edge. >>Sure. So ZEDEDA is a company purely focused on edge computing. I started the company about five years ago to go after edge. What we do is help customers orchestrate their edge, helping them to deploy, secure, and monitor applications, services, and devices at the edge. >>What's the business model for you guys? Let's get that out there. So you're targeting the edge, which is everything from telco to whatever. Yeah. What's the business model? >>Maybe before we go there, let's talk about edge itself, 'cause edge is complex. There are a lot of companies in it; nowadays, if you're not a cloud company, you're probably an edge company. So we are focusing on something called the distributed edge. Distributed edge is when you start putting tiny servers in environments like factory floors, solar farms, wind farms, even inside machines or at well sites, et cetera. And a question that people always ask me is, why would you want to put, you know, servers there? Aren't servers supposed to be in a data center or in the cloud? And the answer to the question actually is data gravity. Traditionally, wherever the data gets created is where your applications live. But as we're connecting more and more devices to the edge of the network, customers are now required to push the applications to the edge, 'cause they can't send all the data to the cloud. So basically that's where we focus; people call it the far edge as well. You know, that's a term we've heard in the past too.
And what we do in our business model is provide customers a software-as-a-service solution where they can deploy and monitor these applications in these highly distributed environments. >>Data gravity comes up a lot, and I want you to take a minute to explain the definition as it is today. People have used that term, you know, with big data, going back to 2010, when we were covering the Hadoop wave, which ended up becoming, you know, Databricks and Snowflake now. A lot's changed. But what does data gravity mean? Does it mean staying local? Specifically describe and define what data gravity is. >>Yeah. So for me, data gravity is where you need to process the data, right? It's where the data usually gets created. So if you think about a web app, where does the data get created? Where people click on buttons, interface with it, upload content to it, et cetera. That's where the data gravity is, and therefore that's where you do your analytics, your visualization, processing, machine learning, and all of those pieces. So in my view, data gravity is really wherever that data gets created. >>What are some of the challenges and opportunities that data gravity presents to customers? >>Well, obviously I think every enterprise these days is trying to take data and make it a competitive advantage, right? Faster decisions, better decisions, outcompeting your competition by, you know, being first with a product or first with a feature, et cetera. So I think, you know, if you're not a data-driven enterprise by now, the future may be a little bit bleak. >>Okay. So you're targeting the distributed edge market, the business model is SaaS. Technology, secret sauce: what's that piece? >>Yeah. So that's where the interesting part comes in.
I think, you know, if you look at the data center and the cloud, we've had these virtualization and orchestration stacks created; I mean, we're here at VMware Explore, as an example. What we saw is that the edge is so unique and so different from what we've seen in the data center and the cloud that we needed to build a completely new, purpose-built orchestration and virtualization solution. So that's really what we set out to do. There are two components to what we do. On one end, we built a purpose-built operating system for the edge, and we actually open sourced it. And the reason we open sourced it is we said, hey, you know, edge is so diverse. Depending on the environment, whether you're running in a machine or in a vehicle or at a well site, you have different hardware, different networks, different applications you need to enable. >>And we will never be able to support all of them ourselves. As a matter of fact, we actually think there's a need for standardization at the edge. We need to cut through all these silos that have been created traditionally by the embedded way of thinking. So we created an open source project in the Linux Foundation, in LF Edge, which is a sister organization to the CNCF. It's called Project EVE. And the idea is to create the Android of the edge: basically, what Android became for mobile computing, a common operating system. You build one app, and you can run it on any phone in the world that runs Android. Same with our architecture: you build one app, and you can run it on any EVE-powered node in the world. >>So distributed edge, and you've got the tech here, the secret sauce. We'll get more into that in a second, but I wanna tie in one quick point and get your clarification: edge is becoming much more about the physical side too. Absolutely. So when you talk about Android, you're making the reference to a phone.
I get that's a metaphor for what you're doing at the edge: wind farms, factories, alarms, light bulbs, buildings. I mean, that's what you're talking about, right? Yes. We're getting down to that very... >>Very physical, dark, distributed locations. >>We're gonna come back to the CISO versus CSO question, because is it the CISO or the CIO, or who runs that anyway? So that's true. What's the important thing that's happening? Because that sounds like the old OT world, operational technology, not IT, information technology. Is it a complete reset of those worlds, or is it a collision? >>It's a great question. So what we're seeing, first of all, is that there is already compute in these environments. Industrial PCs have existed for a long time, and industrial automation has been done for many, many decades. The point is that data has been collected, but never connected, right? So with edge computing, we're now connecting this data from an industrial machine and an industrial process to the cloud. And one of the problems is that the data coming off that industrial process is too much to upload to the cloud, so you've got to analyze it locally. One of the things we saw early on in edge is that there's a lot of brownfield. Most of our customers today actually have applications running on Windows, and they would love to move to Linux and containers and Kubernetes, but it took them 20, 30 years to build those apps, and those apps are basically the money makers of the enterprise. So they are in a transitionary phase, and they need something that can take them from brownfield to greenfield. So to your point, you've got to support all of these types of unique brownfield applications. >>So you're saying, as a customer, I don't really care how you get the data. If you wanna start new, start fresh, that's cool. But if you wanna take your old data, you'll take that. >>Yeah.
>>You don't wanna rebuild the whole machine. You're just... >>They can life-cycle it out on their own timetable. Yeah. >>So we had to learn, first of all, how to take a Windows-based industrial application, lift and shift it, and make it run at the edge on our architecture, right? And then the second step is, how do we then siphon off the data that this application is generating and fuse it with cloud-native capability? >>So your stack is your open source that you're giving to the Linux Foundation as part of that EVE project, and that's available to everybody. So they can look at the code, which is great, by the way. Yeah. So people wanna do that. Yeah. Your SaaS, I'm assuming, is your hardened version with support? >>Well, we looked at what the open source companies did. Open source companies have traditionally sold, you know, basically a support model around the open source. We actually saw another problem. Customers were like, okay, now I have this node running and I can, you know, do this data analytics, but what if I have 15 or 20,000 of these nodes, and they're all around the world in remote locations on satellite links or wireless connectivity? How do I orchestrate them? So we actually built an orchestration service for these nodes running this open source software. >>So that's a key secret sauce right there. >>That is the business model: taking open source and... >>And you're taking your own code that you have. Okay. Got it. Cool. And then the customer's customer piece is key. So that's the final piece, I guess: who's using it? >>Yeah. Well, and... >>And what are the business outcomes that they're achieving? >>Oh, yeah. Well, so maybe start with that first. I mean, we are deployed at customers in oil and gas, for instance, helping them with the transition to renewable energy, right?
So, for instance, we have customers that deploy us in how they drill wells; that's one use case, doing that better, faster, cheaper, and with less environmental impact. But we also have customers that use us in wind farms and solar farms. One of the leading solar energy companies in the world is using us to bring down the cost of power by predicting failures ahead of time, for instance. >>And when you're working with customers to create the optimal solution at the distributed edge, who are you working with within an organization? Yeah. >>It's usually a mix of OT and IT people. Okay. So the OT people typically... >>Arm wrestling? Or are they getting along, actually? >>I think they're getting along very well. Okay, good. But they also agree that they have to have swim lanes. The IT folks, obviously their job is to make sure, you know, everything is secure, everything is according to compliance, the best TCO on the infrastructure, those types of things. The OT guy or girl cares about the application. They care about the services. They care about supporting new business. So how can you create a model where the two can coexist? And if you do that, they get along really well. >>You know, we had an event called Supercloud, at the URL supercloud.world; if you're watching, check it out. It's our version of what we think multicloud will merge into, including edge, 'cause edge is just another node in the network as far as we're concerned. Hybrid is the steady state: that's distributed computing, on premises, private cloud, public cloud. We know what that looks like. People love it; things are happening. Edge is a whole new area that's blossoming with disruption. There's a lot of existing market and incumbents that need to be disrupted, and there are also new capabilities coming that we don't yet see.
So we're seeing it with the supercloud idea, that these new kinds of clouds are emerging. Like, there could be an edge cloud. Yeah. Why isn't there a security cloud? Where's the financial services cloud? Where's the insurance cloud? So these become superclouds, where the CapEx could be done by Amazon or whatnot. You've been following them: is there an edge cloud? Can you make that a cloud? Is that what you guys are trying to do? And if so, what does that look like? 'Cause we're adding a new track to our Supercloud site on edge specifically, and we're trying to figure it out; if you'd share your opinion, it'd be great. Can edge clouds exist and be run by companies? Yeah. Or is that what you guys are trying to do? >>I mean, I think, first of all, there is no edge without cloud, right? So when I meet any customer who says, hey, we're gonna do edge without cloud, then I'm like, you're probably not gonna do edge computing, right? And the way we built the company and the way we think about it, it's about extending the cloud experience all the way into these embedded, distributed environments. That's really, I think, what customers are looking for, 'cause customers love the simplicity of the cloud. They love the ease of use, the agility, all of that greatness. And they're like, hey, I want that, but not in, you know, an Amazon or Azure data center. I want that in my factories. I want that in my well sites, in my vehicles. And that's really what I think the future is gonna be. >>And how long have you guys been around? What's the history of the company? Because you might actually be that cloud. Yeah. And are you on AWS or Azure? Are you building your own? >>Yeah. Yeah. So... >>Take us through the architecture, because, yeah, you're a modern startup. I mean, with the edges you're going after, you've gotta be geared up to win that. Yeah. >>So the company's about five years old.
When we started focusing on edge, people didn't necessarily talk as much about edge. We kind of identified it; it's like, you know, how do you find a black hole in the universe? 'Cause you can't see it, but you sort of look at what's around it. And so we were looking at it like, there's something gonna happen here at the edge of the network, because everybody's saying we're connecting these devices and uploading the data to the cloud, and that's never gonna work. My background is networking. I worked at companies like Juniper and Ericsson and ran several products there, so I know how internet networks are built. And it was very evident to me that it's not gonna be possible. My co-founders come from open source companies like Pivotal and Cloudera. My other co-founder was an engineer at Sun Microsystems and built the first network stack in the Solaris operating system. So a lot of experience came together to build this. >>Yeah. Cloudera's a big deal. That's where theCUBE started, by the way. Yeah. >>Yeah. So we have, I think, a good view on the stack, the cloud stack, and therefore a good view of what the edge stack needs to look like. And then, you know, to answer your other question, our orchestration service runs in the cloud. We're actually a multi-cloud company, so we offer customers a choice of where they want to orchestrate the nodes from. The nodes themselves never sit in a data center; they're always highly embedded. We have customers putting them in machines or inside factory lines, et cetera. >>Are you running your SaaS on Amazon Web Services, or which cloud? >>We're running it on several clouds, including Amazon, pretty much all of the clouds. So some customers say, hey, I'd prefer to be on the Amazon side, and other customers say, I wanna be on the Azure side. >>And you leverage their CapEx on that side. Yes. On behalf of... yeah. >>Yeah. Yes.
But the majority of the customer data, all the data that the nodes process, the customers send to their own clouds. They don't send it to us. We don't get a copy of the camera feed analytics or the machine data; we actually decouple those. So basically the production data goes straight to the customer's cloud, and that's why they love us. >>And they choose that so they can control their own destiny. >>Yeah. So we separate the management plane from the data plane at the edge. >>That's a good call, actually. >>Yeah. That was another very important part of the architecture early on, 'cause customers don't want us to see their, you know, highly confidential production data, and we don't wanna have it either. >>We had a great chat with Chris Wolf, who works with Kit Colbert, about control plane and data plane. So that seems to be the trend: on the data plane, customers want full management of that. Control plane, maybe give multiple versions. >>Yeah. Yeah. So in our cloud, the data we store is about the apps, their behavior, the networking, the security, all of that. That's what we store in our cloud, and customers can access that and monitor. But the actual machine data goes somewhere else. >>Here we are at VMware Explore. Talk a little bit about the VMware relationship. You just had some big news the other day. >>Yeah. So two days ago we actually made a big announcement with VMware: we signed an OEM agreement with VMware, so we're now part of VMware's edge compute stack. As VMware customers start using the recently announced Edge Compute Stack 2.0 that was announced here, it's basically powered by ZEDEDA technology. So it's a really exciting partnership. As part of this, we're actually building integrations with VMware's orchestration products, so that's now extending to, you know, other groups inside VMware. >>So what's the value in it for VMware customers? >>Yeah.
So I think the benefit for VMware customers is that VMware customers want that multi-cloud, multi-edge orchestration experience. They wanna be able to deploy workloads in the cloud, deploy workloads in the data center, and of course also at the edge. So by us integrating into that vision, customers can now have that unified experience from cloud to edge and anywhere in between. >>What's the big vision that you see happening at the edge? I mean, a lot of the VMware customers here are classic IT that has evolved into ops, now DevOps. Now you've got SecOps and DataOps coming. The edge is right around the corner for them. They're dealing with it now, probably just kicking the tires, dipping a toe in the water. Where do you see the vision going? 'Cause no matter what happens with VMware and Broadcom, this wave is still here. You've got AWS, Azure, Google Cloud, Oracle, Alibaba internationally, and the cloud-native surge is here. How do you see that disrupting the existing edge? Because, let's face it, some of those OT players are a little bit old and antiquated, a little bit outdated. I mean, I was talking to a telco person; they puke at the word open source. These people are so dogmatic about their architecture. Yeah. They're gonna get disrupted; it's a matter of time. Yeah. Where does the new guard come in? How do you see the configuration of the landscape changing? Because some people will cross over to the right side of the street here. Yeah. Some won't. Yeah. Open source will dominate; cloud native will be key. Yeah.
Either they try to respond by adopting open source cloud, native technologies. Like the, these new entrants have done and really, you know, compete with them at that level, or they can become commodity. Right. So, and I think that's the customers we're seeing the smart customers go like, we need to compete with these guys. We need to figure out how to take this technology in. And they need partners like us and partners like VMware for them. >>Do you see customers becoming cloud super cloud players? If they continue to keep leveraging the CapEx of the clouds and focus all their operational capital on top line revenue, generating activities. >>Yeah. I, so I think the CapEx model of the cloud is a great benefit of the cloud, but I think that is not, what's the longer term future of the cloud. I think the op the cloud operating model is the future. Like the agility, the ability imagine embedded software that, you know, you do an over the year update to fix a bug, but it's very hard to make a, an embedded device smarter over time. And then imagine if you can run cloud native software, you can roll out every two weeks new features and make that thing smarter, intelligent, and continue to help you in your business. That I think is what cloud did ultimately. And I think that is what really these customers are gonna need at their edge. >>Well, we talked about the value within it for customers with the VMware partnership, but what are some of your expectations? Obviously, this is a pretty powerful partnership for you guys. Yeah. What are some of the things that you're expecting that this is gonna drive? Yeah, >>So we, we, we have always operated at the more OT layer, distributed organizations in retail, energy, industrial automotive. Those are the verticals we, so we've developed. I think a lot of experience there, what, what we're seeing as we talk to those customers is they obviously have it organizations and the it organizations, Hey, that's great. 
You're looking at edge computing, but how do we tie this into the existing investments we've made with VMware? And how do we take that into this new environment too? And I think the expectation I have is that we will be able to talk to the IT folks and say, hey, you can actually talk to the OT person, and both of you will speak the same language. You will probably both standardize on the same architecture, and you'll be deploying together and enabling this new agility at the edge. >>What are some of the next things coming up for ZEDEDA and the team? >>Well, we've had a really amazing few quarters. We just closed a Series B round, so the company has raised over $55 million so far, and we're growing very rapidly. We opened up new international offices. I would say the early customers that we started deploying a while back are now going into mass-scale deployment, so we now have deployments underway in, you know, the tens of thousands to hundreds of thousands of nodes at certain customers, in amazing environments. So for us, it's about continuing to prove the product in more and more verticals. Our product is really built for the largest of the largest. So, you know, for the size of company we are, we have a high concentration of Fortune 500 and Global 500 customers, and some of them even invested in our rounds recently. So we've been really, you know, honored with that support. Well, congratulations. Good stuff, edge is popping. All right. Thank you. >>Thank you so much for joining us and talking about what you're doing in the distributed edge, what's in it for customers, and the VMware partnership. And by the way, congratulations on that too. >>Thank you. Thank you so much. Nice to meet you. >>Thank you. All right. Nice to meet you as well. For our guest and John Furrier, I'm Lisa Martin. You're watching theCUBE live from VMware Explore 2022. John and I will be right back with our next guest.

Published Date : Sep 1 2022

SUMMARY :

But you know that cuz you've been here the whole time. So what we do is we help customers with orchestrating What's the business model for you guys. And the answer to the question actually And people have used that term, you know, with big data, going back to 2010 when we were covering the Hadoop So that's where the data gravity therefore is therefore that's where you do your analytics. so I think, you know, if you're not a data-driven enterprise by now, then I think the future may be a little bit bleak. What's that piece. And the reason we open sourced it, And the idea is to create the Android of the edge, basically what Android became for mobile computing, So when you talk about Android, you're making the reference to a phone. So that's true. So one of the, the things we saw early But if you wanna take your old data, you'll You're Just, they can life cycle it out on their own timetable. So we had to learn, first of all, how do we take and lift and shift Windows-based industrial application So they can, they can look at the code, which is great by the way. So we actually build an orchestration service for these nodes running this open source So that's a key secret sauce right there. So that's the final piece, I guess who's using it. And, and one of the business outcomes that they're achieving. I mean, we are deployed in customers in oil and gas, edge, who are you working with in, within an organization? So the OT people typically they're So how can you create a model where the two can coexist? Or is that what you guys are trying to do? And, and the way we built the company and And are you on AWS or Azure? I mean you gotta, and the edges you're going after you gotta be We kind of identified the it's like, you know, how do you find a black hole in, That's where theCUBE started by the way. And then I think, you know, to answer your other question, So some customers say, And you leverage their CapEx on that side.
the production data goes straight to the customer's cloud and that's why they love us. you know, highly confidential production data and we don't wanna have it either. So that seems to be the trend: on the data plane, customers want full management of that. But the actual machine data goes somewhere else You just had some big news the other day. So that's basically now extending to more, you know, other groups inside VMware. And of course also at the edge. What's the big vision that you see happening at the edge. Like the, these new entrants have done and really, you know, compete with them at that level, Do you see customers becoming cloud super cloud players? that thing smarter, intelligent, and continue to help you in your business. What are some of the things that you're expecting that this is gonna drive? And I think that's the expectation I have is that I think we will be able to, to talk to the IT folks and say, So we've been really, you know, honored with that support. Thank you so much for joining us, talking about what you're doing in distributed edge. Thank you so much. Nice to meet you as well for our guest and John Furrier.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Ericsson	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.99+
Juniper	ORGANIZATION	0.99+
San Francisco	LOCATION	0.99+
Chris Wolf	PERSON	0.99+
Tesla	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
2010	DATE	0.99+
Oracle	ORGANIZATION	0.99+
15	QUANTITY	0.99+
Android	TITLE	0.99+
20	QUANTITY	0.99+
First	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Zededa	PERSON	0.99+
John	PERSON	0.99+
both	QUANTITY	0.99+
two components	QUANTITY	0.99+
10	QUANTITY	0.99+
second step	QUANTITY	0.99+
third day	QUANTITY	0.99+
Sun Microsystems	ORGANIZATION	0.99+
one	QUANTITY	0.99+
CNCF	ORGANIZATION	0.99+
20,000	QUANTITY	0.99+
Linux	TITLE	0.99+
CapEx	ORGANIZATION	0.99+
Windows	TITLE	0.99+
Cloudera	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
two days ago	DATE	0.98+
telco	ORGANIZATION	0.98+
over 55 million	QUANTITY	0.98+
first	QUANTITY	0.98+
two options	QUANTITY	0.98+
one app	QUANTITY	0.98+
500 customers	QUANTITY	0.98+
today	DATE	0.98+
One end	QUANTITY	0.98+
Hadoop wave	EVENT	0.98+
Broadcom	ORGANIZATION	0.97+
Kubernetes	TITLE	0.97+
first network	QUANTITY	0.96+
LFS	ORGANIZATION	0.96+
multicloud	ORGANIZATION	0.95+
VMware Explorer	TITLE	0.95+
first cars	QUANTITY	0.93+
one use case	QUANTITY	0.91+
Ouissal	PERSON	0.9+
about five years old	QUANTITY	0.9+
2022	DATE	0.89+
ZDA	ORGANIZATION	0.88+
Pivotal	ORGANIZATION	0.87+
about five years ago	DATE	0.87+
series B round	OTHER	0.86+
hundred thousands	QUANTITY	0.85+
30 years	QUANTITY	0.81+

Super Data Cloud | Supercloud22

(electronic music) >> Welcome back to our studios in Palo Alto, California. My name is Dave Vellante, I'm here with John Furrier, who is taking a quick break. You know, one of the early examples that we used of so-called super cloud was Snowflake. We called it a super data cloud. We had, really, a lot of fun with that. And we've started to evolve our thinking. Years ago, we said that data was going to form in the cloud around industries and ecosystems. And Benoit Dageville is a many-time guest of theCube. He's the co-founder and president of products at Snowflake. Benoit, thanks for spending some time with us at Supercloud 22, good to see you. >> Thank you, thank you, Dave. >> So, you know, like I said, we've had some fun with this meme. But it really is, we heard on the previous panel, everybody's using Snowflake as an example. Somebody who builds on top of hyperscale infrastructure. You're not building your own data centers. And, so, are you building a super data cloud? >> We don't call it exactly that. We don't like the super word, it's a bit dismissive. >> That's our term. >> About our friends, cloud provider friends. But we call it a data cloud. And the vision, really, for the data cloud is, indeed, it's a cloud which overlays the hyperscaler cloud. But there is a big difference, right? There are several ways to do this super cloud, as you call it. The way we picked is to create one single system, and that's very important, right? There are several ways, right. You can instantiate your solution in every region of the cloud and, you know, potentially that region could be AWS, that region could be GCP. So, you are, indeed, a multi-cloud solution. But at Snowflake, we did it differently. We are really creating cloud regions, which are superimposed on top of the cloud providers' infrastructure regions. So, we are building our regions. But where it's very different is that each region of Snowflake is not one instantiation of our service.
Our service is global, by nature. We can move data from one region to the other. When you land in Snowflake, you land in one region. But you can grow from there and you can, you know, exist in multiple clouds at the same time. And that's very important, right? It's not different instantiations of a system; it's one single instantiation which covers many cloud regions and many cloud providers. >> So, we used Snowflake as an example. And we're trying to understand what the salient aspects are of your data cloud, what we call super cloud. In fact, you've used the word instantiate. Kit Colbert, just earlier today, laid out, he said, there's sort of three levels. You can run it on one cloud and communicate with the other cloud, you can instantiate on the clouds, or you can have the same service running 24/7 across clouds, that's the hardest example. >> Yeah. >> The most mature. You just described, essentially, doing that. How do you enable that? What are the technical enablers? >> Yeah, so, as I said, first we start by building, you know, Snowflake regions. We have today 30 regions that span the world, so it's a worldwide system with many regions. But all these regions are connected together. They are meshed together with our technology, we call it Snow Grid, and that means, you know, an Azure region can talk to an AWS region or a GCP region, and as a user of our cloud, you don't really see these regional differences, that regions are potentially in different clouds. When you use Snowflake, your presence as an organization can be in several regions, several clouds if you want, both geographically and across cloud providers. >> So, I can share data irrespective of the cloud. And I'm in the Snowflake data cloud, is that correct? I can do that today? >> Exactly, and that's very critical, right? What we wanted is to remove data silos.
And when you instantiate a system in one single region, and that system is locked in that region and cannot communicate with other parts of the world, you are locking data in one region. Right, and we didn't want to do that. We wanted data to be distributed the way customers want it to be distributed across the world, and potentially to share data at world scale. >> Does that mean if I'm in one region and I want to run a query, if I'm in AWS in one region, and I want to run a query on data that happens to be in an Azure cloud, I can actually execute that? >> So, yes and no. Doing it that way is very expensive. Because, generally, if you want to join data which is in a different region and a different cloud, it's going to be very expensive because you need to move data every time you join it. So, the way we do it is that you replicate the subset of data that you want to access in one region from another region. So, you can create this data mesh, but data is replicated to make it very cheap and very performant too. >> And is the Snow Grid, does that have the metadata intelligence to actually? >> Yes, yes. >> Can you describe that a little? >> Yeah, Snow Grid is, first, a way to exchange metadata. So, each region of Snowflake knows about all the other regions of Snowflake. Every time we create a new region, the metadata is distributed over our data cloud. Not only does each region know all the other regions, but it knows every organization that exists in our cloud, where this organization is, and where data can be replicated by this organization. And then, of course, it's also used as a way to exchange data, right? So, you can exchange data at very large scale. And I was just receiving an email from one of our customers who moved more than four petabytes of data across regions and across cloud providers in, you know, a few days. And it's a lot of data, so it takes some time to move.
But they were able to do that online, completely online, and switch over to the other region, which is very important also. >> So, one of the hardest parts about super cloud that I'm still struggling through is the security model. Because you've got the cloud as your sort of first line of defense. And now we've got multiple clouds, with multiple first lines of defense, I've got a shared responsibility model across those clouds, I've got different tools in each of those clouds. Do you take care of that? Where do you pick up from the cloud providers? Do you abstract that security layer? Do you bring in partners? It's very complicated. >> No, this is a great question. Security has always been the most important aspect of Snowflake since day one, right? This is the question that every customer of ours has. You know, how can you guarantee the security of my data? And, so, we secure data really tightly in region. We have several layers of security. It starts by encrypting all data at rest. And that's very important. A lot of customers are not doing that, right? You hear of these attacks, for example, on cloud, where someone left their buckets open. And then, you know, you can access the data because it's not encrypted. So, we are encrypting everything at rest. We are encrypting everything in transit. So, a region is very secure. Now, you know, from one region, you never access data from another region in Snowflake. That's why, also, we replicate data. Now, the replication of that data across regions, or of the metadata, for that matter, is also highly secure. Snow Grid ensures that everything is encrypted; we have multiple encryption keys, and they're stored in hardware security modules. So, we built Snow Grid such that it's secure and it allows very secure movement of data.
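Benoit's reasoning for replicating a subset of data, rather than joining across clouds on every query, comes down to how many bytes cross the wire. The sketch below is a back-of-the-envelope model under stated assumptions: the function names and every size, change rate, and query count are hypothetical illustrations, not Snowflake's actual replication mechanism, pricing, or measured numbers.

```python
# Hypothetical back-of-the-envelope model: all figures are illustrative
# assumptions, not Snowflake pricing or measured numbers.

def remote_join_bytes(subset_bytes: int, queries: int) -> int:
    """Bytes moved if every query pulls the remote subset across clouds."""
    return subset_bytes * queries


def replicated_bytes(subset_bytes: int, daily_change_bytes: int, days: int) -> int:
    """Bytes moved if the subset is copied once, then only changes are shipped."""
    return subset_bytes + daily_change_bytes * days


GB = 10**9
subset = 500 * GB        # assumed size of the subset another region needs
changes = 5 * GB         # assumed daily churn in that subset
queries_per_day = 200    # assumed join-heavy workload
days = 30

remote = remote_join_bytes(subset, queries_per_day * days)
replicated = replicated_bytes(subset, changes, days)
print(f"remote joins: {remote / GB:,.0f} GB moved")
print(f"replication:  {replicated / GB:,.0f} GB moved")
```

Under these assumed numbers, a one-time copy plus daily changes moves a small fraction of the bytes that repeated cross-cloud joins would; the gap narrows if the remote subset is queried rarely or changes almost entirely every day.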
>> Okay, so, I know we're getting into the technology a lot here today, but because super cloud is the future, we actually have to have an architectural foundation on which to build. So, you mentioned a bucket, like an S3 bucket. Okay, that's storage, but you're also, for instance, taking advantage of new semiconductor technology, like Graviton, as an example, that drives efficiency. You guys talk about how you pass that on to your customers, even if it means less revenue for you. So, awesome, we love that, you'll make it up in volume. And, so. >> Exactly. >> How do you deal with the lowest common denominator problem? I was talking to somebody the other day and this individual brought up what I thought was a really good point. What if, let's say, AWS has the best silicon, and you can run the fastest, the least expensive, and the lowest power there, but another cloud provider hasn't caught up yet? How do you deal with that delta? Do you just take the best of and try to respect that? >> No, it's a great question. I mean, of course, our software is abstracting all the cloud providers' infrastructure so that when you run in one region, let's say AWS or Azure, it doesn't make any difference as far as the applications are concerned. And this abstraction, of course, is a lot of work. I mean, really, a lot of work. Because it needs to be secure, it needs to be performant in every cloud, and it has to expose APIs which are uniform. And, you know, cloud providers, even though they have potentially the same concept, let's say block storage, their APIs are completely different. The way these systems are secured is completely different. The errors that you can get and the retry mechanisms are very different from one cloud to the other. The performance is also different. We discovered that when we started to port our software. And we had to completely rethink how to leverage block storage in that cloud versus that cloud, just for performance too.
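Benoit describes an abstraction layer that hides how each provider's block storage reports errors and retries. The following is a minimal sketch of that idea, assuming a hypothetical `BlobStore` interface and a simulated adapter; it calls no real cloud SDK, and the retry budget and backoff values are illustrative, not Snowflake's.

```python
import random
import time
from abc import ABC, abstractmethod


class TransientError(Exception):
    """Provider-neutral error that the abstraction layer treats as retryable."""


class BlobStore(ABC):
    """Hypothetical uniform interface: applications call get() and nothing else."""

    # Each provider adapter can tune its own retry budget and base backoff.
    max_attempts: int = 3
    base_delay_s: float = 0.05

    @abstractmethod
    def _raw_get(self, key: str) -> bytes:
        """Provider-specific read; must raise TransientError for retryable faults."""

    def get(self, key: str) -> bytes:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return self._raw_get(key)
            except TransientError:
                if attempt == self.max_attempts:
                    raise  # retry budget exhausted: surface the error
                # Exponential backoff with jitter before the next attempt.
                delay = self.base_delay_s * 2 ** (attempt - 1)
                time.sleep(delay * random.uniform(0.5, 1.0))
        raise ValueError("max_attempts must be at least 1")


class SimulatedProviderStore(BlobStore):
    """Stand-in adapter that fails a fixed number of times, then succeeds."""

    max_attempts = 5
    base_delay_s = 0.001

    def __init__(self, data: dict[str, bytes], failures: int) -> None:
        self._data = data
        self._failures_left = failures
        self.calls = 0

    def _raw_get(self, key: str) -> bytes:
        self.calls += 1
        if self._failures_left > 0:
            self._failures_left -= 1
            # A real adapter would map e.g. a throttling response to this error.
            raise TransientError("simulated throttling")
        return self._data[key]


store = SimulatedProviderStore({"part-0": b"hello"}, failures=2)
print(store.get("part-0"), store.calls)  # prints: b'hello' 3
```

Keeping the retry policy inside each adapter means the application-facing `get()` stays identical across providers, which is the property Benoit emphasizes; a production layer would also need to classify non-retryable errors, bound total latency, and handle writes.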
And, so, we had, for example, to stripe data. So, all this is work that you don't need to do as an application, because our vision, really, is that applications running in our data cloud can be abstracted from these differences. And we provide all the services, all the workloads that these applications need, whether it's transactional access to data, analytical access to data, managing logs, or managing metrics. All of this is abstracted too, so that they are not tied to one particular service of one cloud. And distributing these applications across many regions, many clouds, is very seamless. >> So, Snowflake has built, your team has built a true abstraction layer across those clouds that's available today? It's actually shipping? >> Yes, and we are still developing it. You know, transactional, Unistore, as we call it, was announced at our last Summit. So, it's still, you know, a work in progress. >> You're not done yet. >> But that's the vision, right? And that's important, because we talk about the infrastructure, right. You mention a lot about storage and compute. But it's not only that, right. When you think about applications, they need to use a transactional database. They need to use an analytical system. They need to use machine learning. So, you need to provide, also, all these services, which are consistent across all the cloud providers. >> So, let's talk developers. Because, you know, you think Snowpark, you guys announced a big application development push at the Snowflake Summit recently. And we have said that a criterion of super cloud is a super PaaS layer. People wince when I say that, but okay, we're just going to go with it. But the point is, it's a purpose-built application development layer, specific to your particular agenda, that supports your vision. >> Yes. >> Have you essentially built a purpose-built PaaS layer? Or do you just take standard PaaS off the shelf and cobble it together? >> No, we custom built it.
Because, as you said, what exists in one cloud might not exist in another cloud provider, right. So, we have to build all these components that an application needs. And that goes to machine learning, as I said, transactional and analytical systems, and the entire thing. So that it can run in isolation physically. >> And the objective is the developer experience will be identical across those clouds? >> Yes, developers don't need to worry about the cloud provider. And, actually, our system will have, we didn't talk about it, but a marketplace that we have, which allows, actually, to deliver. >> We're getting there. >> Yeah, okay. (both laughing) I won't divert. >> No, no, let's go there, because the other aspect of super cloud that we've talked about is the ecosystem. You have to enable an ecosystem to add incremental value; it's the power of many versus the capabilities of one. So, talk about the challenges of doing that. Not just the business challenges but, again, I'm interested in the technical and architectural challenges. >> Yeah, yeah, so, it's really about, I mean, the way we enable our ecosystem and our partners to create value on top of our data cloud is via the marketplace, where you can put shared data. You can provide listings on this marketplace, which are data sets. But it goes way beyond data. It goes all the way to applications. So, you can think of it as the iPhone, a little bit, all right. Your iPhone is great not so much because the hardware is great, or because of iOS, but because of all the applications that you have. And all these applications are not necessarily developed by Apple, basically. So, it's the same model with our marketplace. We foresee an environment where providers and partners are going to build these applications. We call them native applications. And we are going to help them distribute these applications across clouds, everywhere in the world, potentially.
And they don't need to worry about that. They don't need to worry about how these applications are going to be instantiated. We are going to help them monetize these applications. So, that unlocks, you know, really, all the partner ecosystem that you have seen, you know, with something like the iPhone, right? It has created so many new companies that have developed these applications. >> Your detractors have criticized you for being a walled garden. I've actually used that term. I've used terms like de facto standard, which are maybe less sensitive to you, but, nonetheless, we've seen de facto standards actually deliver value. I've talked to Frank Slootman about this, and he said, Dave, we deliver value, that's what we're all about. At the same time, he even said to me, and I want your thoughts on this, look, we have to embrace open source where it makes sense. You guys announced support for Apache Iceberg. So, what are your thoughts on that? Is that to enable a developer ecosystem? Why did you do Iceberg? >> Yeah, Iceberg is very important. So, just to give some context, Iceberg is an open table format. >> Right. >> Which was first developed by Netflix. And Netflix open sourced it in the Apache community. So, we embraced that open source standard because it's widely used by many companies. And, also, many companies have really invested a lot of effort in building big data, Hadoop solutions, or data lake solutions, and they want to use Snowflake. And they couldn't really use Snowflake, because all their data was in open formats. So, we are embracing Iceberg to help these companies move to the cloud. But we have been reluctant about direct access to data, because direct access to data is a little bit of a problem for us. And the reason is when you have direct access to data, now you have direct access to storage. Now you have to understand, for example, the specifics of one cloud versus the other.
So, as soon as you start to have direct access to data, you lose your cloud-agnostic layer. You don't access data through an API. When you have direct access to data, it's very hard to secure your data, because you need to grant direct access to tools which are not protected. And you see a lot of hacking of data because of that. So, direct access to data is not serving our customers well, and that's why we have been reluctant to do that. Because it is not cloud agnostic. You have to code for that and you need a lot of intelligence; with API access you don't, so we want open APIs. That's, I guess, the way we embrace openness: through open APIs versus direct access to data. >> iPhone. >> Yeah, yeah, iPhone, APIs, you know. We define a set of APIs because APIs, you know, the implementation of the APIs can change, can improve. You can improve compression of data, for example. If you open direct access to data now, you cannot evolve. >> My point is, you've made a promise of a governed, secure data sharing ecosystem. It works the same way, so that's the path that you've chosen. Benoit Dageville, thank you so much for coming on theCube and participating in Supercloud 22, really appreciate that. >> Thank you, Dave. It was a great pleasure. >> All right, keep it right there, we'll be right back with our next segment, right after this short break. (electronic music)
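Benoit's argument for open APIs over direct data access is that an API lets the provider change the physical format, compression for example, without breaking callers. A minimal sketch of that idea, assuming a hypothetical in-memory `TableService` whose codec names and storage layout are illustrative and not Snowflake's actual API:

```python
import json
import zlib


class TableService:
    """Hypothetical table service: callers use read()/write(), never stored bytes."""

    def __init__(self) -> None:
        # key -> (codec used at write time, stored payload)
        self._blobs: dict[str, tuple[str, bytes]] = {}
        self.codec = "json"  # the provider may switch codecs at any time

    def write(self, key: str, rows: list[dict]) -> None:
        raw = json.dumps(rows).encode("utf-8")
        payload = zlib.compress(raw) if self.codec == "json+zlib" else raw
        self._blobs[key] = (self.codec, payload)

    def read(self, key: str) -> list[dict]:
        codec, payload = self._blobs[key]
        raw = zlib.decompress(payload) if codec == "json+zlib" else payload
        return json.loads(raw)

    def stored_size(self, key: str) -> int:
        return len(self._blobs[key][1])


svc = TableService()
rows = [{"id": i, "region": "us-east"} for i in range(1000)]
svc.write("written_before", rows)
svc.codec = "json+zlib"  # provider-side upgrade; no caller changes needed
svc.write("written_after", rows)
print(svc.stored_size("written_before"), svc.stored_size("written_after"))
```

Because each blob records the codec it was written with, old and new data stay readable side by side through the same `read()` call. A client parsing the raw bytes directly would have broken the moment the codec changed, which is the evolution problem Benoit describes.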

Published Date : Aug 9 2022

SUMMARY :

You know, in one of the So, you know, like I said, We don't like the super and you can, you know, or you can have the same How do you enable that? we start by building, you know, And I'm in the Snowflake And when you instantiate a So, the way we do it is that you replicate So, you can exchange data So, one of the hardest And then, you know, So, you mentioned a and the least expensive, so that when you run in one So, they are still, you know, So, you need to provide, Because, you know, you think Snowpark, And that goes to machine a marketplace that we have, I won't divert. So, talk about the of all the applications that you have. At the same time, he even said to me, So, just to give some context, You have to code that, you because APIs, you know, so that's the path that you've chosen. It was a great pleasure. with our next segment, right

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Frank Slootman	PERSON	0.99+
Benoit	PERSON	0.99+
Dave	PERSON	0.99+
Apple	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Netflix	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Kit Colbert	PERSON	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Palo Alto, California	LOCATION	0.99+
Benoit Dageville	PERSON	0.99+
one region	QUANTITY	0.99+
iOS	TITLE	0.99+
30 regions	QUANTITY	0.99+
more than four petabytes	QUANTITY	0.99+
Snowflake	EVENT	0.99+
first line	QUANTITY	0.99+
Snowpark	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.98+
today	DATE	0.98+
Apache	ORGANIZATION	0.98+
both	QUANTITY	0.98+
each	QUANTITY	0.98+
one	QUANTITY	0.97+
Unistore	ORGANIZATION	0.97+
Supercloud 22	EVENT	0.97+
first lines	QUANTITY	0.97+
DataX Solution	ORGANIZATION	0.97+
each region	QUANTITY	0.96+
Snow Grid	TITLE	0.96+
one cloud	QUANTITY	0.96+
one single region	QUANTITY	0.96+
first	QUANTITY	0.96+
one region	QUANTITY	0.96+
Hadoop Solutions	ORGANIZATION	0.95+
WS	ORGANIZATION	0.94+
Supercloud 22	ORGANIZATION	0.93+
three levels	QUANTITY	0.93+
Snowflake	TITLE	0.93+
one single system	QUANTITY	0.92+
Iceberg	TITLE	0.89+
one single instantiation	QUANTITY	0.89+
theCube	ORGANIZATION	0.86+
Azure	ORGANIZATION	0.85+
Years ago	DATE	0.83+
earlier today	DATE	0.82+
one instantiation	QUANTITY	0.82+
Super Data Cloud	ORGANIZATION	0.81+
S3	COMMERCIAL_ITEM	0.8+
one cloud	QUANTITY	0.76+
delta	ORGANIZATION	0.76+
Azure	TITLE	0.75+
one of our customers	QUANTITY	0.72+
day one	QUANTITY	0.72+
Supercloud22	EVENT	0.66+

Ali Ghodsi, Databricks | Supercloud22

(light hearted music) >> Okay, welcome back to Supercloud '22. I'm John Furrier, host of theCUBE. We got Ali Ghodsi here, co-founder and CEO of Databricks. Ali, great to see you. Thanks for spending your valuable time to come on and talk about Supercloud and the future of all the structural change that's happening in cloud computing. >> My pleasure, thanks for having me. >> Well, first of all, congratulations. We've been talking for many, many years, and I still go back to the video that we have in the archive, you talking about cloud. And really, that was the beginning of the big reboot, what I called the post-Hadoop revitalization of data. Congratulations, you've been cloud-first, now on multiple clouds. Congratulations to you and your team for achieving what looks like a billion dollars in annualized revenue, as reported by the Wall Street Journal. >> Thank you so much, appreciate it. >> So I was talking to some young developers and I took a random poll: what do you think about Databricks? Oh, we love those guys, they're AI and ML-native, and that's their advantage over the competition. So I pressed why. I don't think they knew why, but that's an interesting perspective. This idea of cloud native, AI/ML-native, MLOps, this has been a big trend and it's continuing. This is a big part of how this structural change is happening. How do you react to that? And how do you see Databricks evolving into this new Supercloud-like multi-cloud environment? >> Yeah, look, I think it's a continuum. It starts with having data; then they want to clean it, you know, and they want to get insights out of it. But then, eventually, you'd like to start asking questions, doing reports, maybe ask questions about what was my revenue yesterday, last week, but soon you want to start using the crystal ball, predictive technology. Okay, but what will my revenue be next week? Next quarter? Who's going to churn?
And then, finally, you can automate that completely so that you can act on the predictions, right? So this credit card that got swiped, the AI thinks it's fraud, we're going to deny it. That's when you get real value. So we're trying to help all these organizations move through this data and AI maturity curve, all the way to the prescriptive, automated AI and machine learning. That's when you get real competitive advantage. And you know, we saw that with the FAANGs, right? I mean, Google wouldn't be here today if it wasn't for AI. You know, we'd be using AltaVista or something. We want to help all organizations be able to leverage data and AI the way that the FAANGs did. >> One of the things we're looking at with supercloud, and why we call it supercloud versus other things like multi-cloud, is that today a lot of the companies that started in the cloud have been successful, and even enterprises that got there by accident, and maybe have done nothing with cloud, have some cloud projects on multiple clouds. So, people have multiple cloud operational things going on, but it hasn't necessarily been a strategy per se. It's been more of a default reaction to things. But the ones that are innovating have been successful in one native cloud, because the use cases that drove that got scale and got value, and then they're making that super by bringing it on premises, putting in a modern data stack for modern application development, and dealing with the things that you guys at Databricks are in the middle of. That is where the action is, and they don't want to lose the trajectory and all the economies of scale. So we're seeing another structural change, where the evolutionary nature of the cloud has solved a bunch of use cases, but now other use cases are emerging on premises and at the edge, driven by applications because of the developer boom that's happening. You guys are in the middle of it.
What is happening with this structural change? Are people looking for the modern data stack? Are they looking for more AI? What's your perspective on this supercloud kind of position? >> Look, it started with, now we're on multiple clouds, right? So multi-cloud has been a thing. It became a thing: when you ask our customers, 70, 80% of them are on more than one cloud. But then they soon start realizing that, hey, you know, if I'm on multiple clouds, this data stuff is hard enough as it is. Do I want to redo it again and again with different proprietary technologies on each of the clouds? And that's when they started thinking, let's standardize this, let's figure out a way which just works across them. That's where I think open source comes in and becomes really important. Hey, can we leverage open standards? Because then we can make it work in these different environments, so that we can actually go super, as you said. That's one. The second thing is, can we simplify it? You know, I think today the data landscape is complicated. Conceptually it's simple. You have data, which is essentially customer data that you have, maybe employee data. And you want to get some kind of insights from that. But how you do that is very complicated. You have to buy a data warehouse and hire data analysts. You have to store stuff in the data lake, you know, and get your data engineers. If you want streaming, real-time things, that's another completely different set of technologies you have to buy. And then you have to stitch all these together, and you have to do it again and again on every cloud. So they just want simplification. So that's why we're big believers in this data lakehouse concept, which is an open standard to simplify this data stack and help people just get value out of their data in any environment. So they can do that in this sort of supercloud, as you call it.
>> You know, we've been talking about that in previous interviews: do the heavy lifting, let them get the value. I have to ask you about how you see that going forward, because if I'm a customer, I have a lot of operational challenges. 'Cause the developers are kicking butt right now. We see that clearly. Open source is growing and continues to be great. But ops and security teams really care about this stuff. And most companies don't want to spin up multiple ops teams to deal with different stacks. This is one big problem that I think is leading into multi-cloud viability. How do you guys deal with that? How do you talk to customers when they say, I want to have less complications on operations? >> Yeah, you're absolutely right. You know, it's easy for a developer to adopt all these technologies, and new things are coming out all the time. The ops teams are the ones that have to make sure this works. Doing that in multiple different environments is super hard, especially when there's a proprietary stack in each environment that's different. So they just want standardization. They want open source, that's super important. We hear that all the time from them. They want open source technologies. They believe in the communities around it. You know, they know that the source code is open, so they can also see if there are issues with it, if there are security breaches, those kinds of things, and they can have a community around it. So they can actually leverage that. So they're the ones that are really pushing this, and we're seeing it across the board. You know, it starts first with the digital natives, but slowly it's also now percolating to the other organizations; we're hearing it across the board. >> Where are we, Ali, on the innovation strategies for customers? Where are they on the trajectory around how they're building out their teams? How are they looking at open source?
How are they extending the value proposition of Databricks, and data at scale, as they start to build out their teams and operations? Because some are just starting, kind of a crawl, walk, run vibe. Some are big companies, they're dealing with data all the time. Where are they in their journey? What are the core issues that they're solving? What are some of the use cases that you see that are most pressing for customers? >> Yeah, what I've seen that's really exciting about this data lakehouse concept is that we're now seeing a lot of use cases around real time. So real-time fraud detection, real-time stock ticker pricing; anyone that's doing trading, they want that to work in real time. Lots of use cases around that. Lots of use cases around how do we, in real time, drive more engagement on our web assets if we're a media company, right? We have all these assets; how do we get people engaged? Stay on our sites. Continue engaging with the material we have. Those are real-time use cases. And the interesting thing is, A, they're real time. So, you know, it's really important: you don't want to recommend to someone, hey, you should go check out this restaurant, if they just came from that restaurant half an hour ago. So you want it to be real time, but B, it's also all based on machine learning. A lot of this is trying to predict what you want to see, what you want to do, is it fraudulent? And that's also interesting, because basically more and more machine learning is coming in. So that's super exciting to see, the combination of real time and machine learning on the lakehouse. And finally, I would say the lakehouse is really important for this because that's where the data is flowing in. If they have to take that data that's flowing into the lake and actually copy it into a separate warehouse, that delays the real-time use cases. And then it can't hit those real-time deadlines. So that's another catalyst for this lakehouse pattern.
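Ali's restaurant example shows why freshness matters: a recommendation built on last night's batch cannot know about a visit from half an hour ago. The sketch below assumes a hypothetical `fresh_recommendations` filter and a two-hour recency window, both illustrative rather than anything Databricks ships; the timestamps are made up.

```python
from datetime import datetime, timedelta


def fresh_recommendations(
    candidates: list[str],
    recent_visits: list[tuple[str, datetime]],
    now: datetime,
    window: timedelta = timedelta(hours=2),
) -> list[str]:
    """Drop candidates the user visited within `window` before `now`."""
    cutoff = now - window
    just_visited = {place for place, visited_at in recent_visits if visited_at >= cutoff}
    return [place for place in candidates if place not in just_visited]


now = datetime(2022, 8, 9, 19, 0)
candidates = ["Cafe A", "Bistro B"]

# Real-time feed: the visit 30 minutes ago has already landed.
realtime_visits = [("Cafe A", now - timedelta(minutes=30))]
# Nightly batch: today's visit has not been processed yet.
nightly_visits: list[tuple[str, datetime]] = []

print(fresh_recommendations(candidates, realtime_visits, now))  # ['Bistro B']
print(fresh_recommendations(candidates, nightly_visits, now))   # ['Cafe A', 'Bistro B']
```

The filtering logic is identical in both calls; only the freshness of the input differs, which is why Ali stresses not delaying data by copying it into a separate warehouse first.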
>> Would that be an example of how the metrics are changing? 'Cause I've seen some people say, well, you can tell if someone's doing well if there's a lot of data being transferred. And then I was saying, well, wait a minute. Data transfer costs money, right? And time. So this is an interesting dynamic; in a way, you don't want to have a lot of movement, right? >> Yeah, movement actually decreases for a lot of these real-time use cases. 'Cause what we saw in the past was that they would run batch processing to process all the data, all of it, every time. But if you look at what has actually changed since yesterday's data, it's not that much. So if you can incrementally process it in real time, you can actually reduce the cost of transfers and storage and processing. So that's actually a great point. That's also one of the main things that we're seeing with the use cases: the bill shrinks and the cost goes down, and they process less. >> Yeah, and it'll be interesting to see how those KPIs evolve into industry metrics down the road, around the supercloud evolution. I got to ask you about the open source concept of data platforms. You guys have been a pioneer there, doing great work, kind of picking up the baton where the Hadoop world left off, as Dave Vellante always points out. But working across clouds is super important. How are you guys looking at the ability to work across the different clouds with Databricks? Are you going to build that abstraction yourself? Do data sharing and model sharing kind of come into play there? How do you see this Databricks capability across the clouds? >> Yeah, I mean, let me start by saying, we're big fans of open source. We think that open source is a force in software that's going to continue for decades, hundreds of years, and it's going to slowly replace all proprietary code in its way.
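Ali's point that incremental processing cuts cost, compared with reprocessing everything in batch, can be illustrated by counting rows touched. A minimal sketch, assuming events carry integer ids and arrive in id order (a simplification; real streaming systems track offsets or checkpoints per source); the class, field, and customer names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalTotals:
    """Per-key running totals plus a watermark of the last event folded in."""

    totals: dict[str, float] = field(default_factory=dict)
    watermark: int = -1      # highest event id already processed
    rows_processed: int = 0  # work counter, for comparison with a full recompute

    def update(self, events: list[tuple[int, str, float]]) -> None:
        for event_id, key, amount in events:
            if event_id <= self.watermark:
                continue  # already counted (or late; see in-order assumption)
            self.totals[key] = self.totals.get(key, 0.0) + amount
            self.watermark = event_id
            self.rows_processed += 1


def full_recompute(events: list[tuple[int, str, float]]) -> dict[str, float]:
    """Batch approach: rescan every event from scratch."""
    totals: dict[str, float] = {}
    for _, key, amount in events:
        totals[key] = totals.get(key, 0.0) + amount
    return totals


day1 = [(i, f"customer-{i % 3}", 1.0) for i in range(1000)]
day2 = [(1000 + i, f"customer-{i % 3}", 1.0) for i in range(10)]

inc = IncrementalTotals()
inc.update(day1)
inc.update(day2)  # only the 10 new events are folded in

batch_rows = len(day1) + len(day1 + day2)  # batch rescans everything each day
print(inc.rows_processed, batch_rows)  # 1010 2010
```

Both approaches end with the same totals; in this toy setup the incremental one touches roughly half the rows over two days, and the gap widens as history grows while daily changes stay small.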
We saw it do that with the most advanced technology. Windows, you know, a proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the Delta Lakehouse is that the open source community is slowly building a replacement for the proprietary data warehouse: Delta Lake, machine learning, and a real-time stack, in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud. And it comes with a really important protocol called Delta Sharing that enables you, in an open way, for the first time ever, to share large data sets between organizations using an open protocol. So the great thing about that is you don't need to be a Databricks customer. You don't even need to like Databricks; you just need to use this open source project, and you can now securely share data sets between organizations across clouds. And it does so really efficiently, with just one copy of the data, so you don't have to copy it if you're within the same cloud. >> So you're playing the long game on open source. >> Absolutely. I mean, this is a force; it's going to be there. If you deny it, before you know it there's going to be something like Linux that is going to be a threat to your proprietary software. >> I totally agree, by the way. I was just talking to somebody the other day, and someone made the comment that the software industry is open source. There's no more software industry; it's called open source. It's integrations that become interesting. Integrations now are really where the action is. And we had a panel with the Clouderati, as we called them, the people who have been around for a long time. And it was called the innovator's dilemma. And one of the comments was, it's the integrator's dilemma, not the innovator's dilemma.
And this is a big part of supercloud. Can you share your thoughts on how cloud and integration need to be tightened up to really make it super? >> Actually, that's a great point. I think the beauty of this is, look, the ecosystem of data today is vast. There's this picture that someone puts together every year of all the different vendors and how they relate, and it gets bigger and bigger and messier and messier. So we see customers use all kinds of different aspects of what exists in the ecosystem, and they want it to be integrated with whatever you're selling them. And that's where I think the power of open source comes in. With open source, you get integrations that people will do without you having to push it. So for us, Databricks as a vendor, we don't have to go tell people, please integrate with Databricks. With the open source technology that we contribute to, people are automatically integrating with it. Delta Lake has integrations with lots of different software out there, and Databricks as a company doesn't have to push that. So I think open source is another thing that really helps with ecosystem integrations. Many of the companies in this data space actually have employees dedicated full-time to making sure their software works well with Spark, making sure their software works well with Delta, and they contribute back to that community. And that's how you get this sort of ecosystem to flourish. >> Well, I really appreciate your time. My final question for you is, as we unpack and shape and frame supercloud for the future, how would you see a roadmap, architecture, or outcome for companies that are clearly going to be in the cloud, where open source is going to dominate? Integrations have got to be seamless and frictionless. The abstraction layer makes things super easy and takes away the complexity. What is supercloud to them? What does the outcome look like?
How would you define a supercloud environment for an enterprise? >> Yeah, for me, it's the simplification you get when you standardize on open source. You get your data in one place, in one format, in one standardized way, and then you can get your insights from it without having to buy lots of different idiosyncratic proprietary software from different vendors that's different in each environment. So it's this slow standardization that's happening. And I think it's going to happen faster than we think. I think in a couple of years it's going to be a requirement: does your software work on all these different departments? Is it based on open source? Is it using this Delta Lakehouse pattern? And if it's not, I think they're going to demand it. >> Yeah, I feel like we're close to some sort of de facto standard coming, and you guys are a big part of it. Once that clicks in, it's going to accelerate rapidly in the open, and I think it's going to be super valuable. Ali, thank you so much for your time, and congratulations to you and your team. We've been following you guys since the beginning. I remember the early days, and look how far it's come. And again, you guys are really making a big difference and making a super cool environment out there. Thanks for coming on and sharing. >> Thank you so much, John. >> Okay, this is Supercloud22. I'm John Furrier. Stay with us for more coverage and more commentary after this break. (lighthearted music)

Published Date : Aug 7 2022

Evolution of Data Lakes

(light music) >> Kevin Miller joins us. He's the Vice President and General Manager of Amazon S3, and we're going to discuss the evolution of data lakes. Hey, Kevin. >> Hey, Dave. Great to be here. >> Yeah, let's riff on this a little bit. Why is S3 so popular for data lakes? How have data lakes on S3 changed and evolved? >> Well, I think a lot of the core benefits of S3 really play directly into what customers are looking for when they're building a data lake, right? They're looking for low-cost storage, someplace they can put shared data sets and make it very easy for other teams and businesses to access a set of data, as well as have all the management around it: knowing that the data's secure, it's durable, it's protected. And so all of the capability that S3 provides out of the box is just a really good fit for what customers need out of a data lake storage provider. >> And it's really the simple form. I remember when schema-on-read hit, and people were like, oh great, we can just shove all our stuff into a data lake. And then of course, the old bromide: it became a data swamp. But the industry has evolved, hasn't it? It has new tools; machine intelligence, AI, and machine learning have really helped a lot. Talk about how that's changed from the old days, if you will, when it was just kind of this mess and you really couldn't do much with it, and why today we're able to get so much more out of data lakes. >> Yeah. I think the original use of data lakes centered a lot around analytics and sort of Hadoop or Spark type applications, and that continues to be a big driver. But one thing is that we're continuing to expand the kinds of applications. Like you mentioned, machine learning and other kinds of intelligence are increasingly things that customers want to do around these shared data sets, along with being able to pretty easily and dynamically combine data sets together and use that to drive more insight.
I think you're absolutely right. You know, if it's left unstructured or left without any kind of governance, you can quickly develop a lot of unusable data. And so I think the evolution we're seeing is customers putting more of a governance structure in place around it, really trying to understand and catalog the data sets they have. And I think that's going to continue. That's something we're seeing develop pretty actively right now, in terms of knowing what data I have and knowing the essence of the metadata around it, such as how frequently this data is being updated, when it's updated, and what the rules are around when I can access it, and so forth. As well as around data lake access control: making it very easy to grant an end user, a specific end user, access to certain data sets, knowing that they can then audit and really know exactly who has access to what data in that data lake. So you're seeing a lot of that governance-type structure come around, while not taking away the essence of having a simple, low-cost, scalable way to store and then access data from a number of applications. So I see that all now starting to really come together. >> I think this is a really important point you're making, because I see organizations rethinking their data architecture and their data organizations to really put data in the hands of the lines of business, those with domain expertise, and self-service is becoming really important. I see a lot of organizations say, 'Hey, we're going to give the lines of business their own data lakes that they can spin up,' but they have to be governed in a federated fashion. I know you guys use this term lakehouse. How do these things fit together? >> Well, Dave, I think you're absolutely right. What I see a lot of organizations doing is evolving to a point where they want as few layers as possible in the way of someone who owns a business outcome.
Whether it's a top-line revenue generation line or a bottom-line cost line, they want to connect the people who are closest to the business problem with the applications and the technology they can use to solve it. And a big part of that, then, is the data and the data sets that are available. So I think where it needs to come together, and where it is coming together, is around making it very easy to federate: to know what data sources I have, to know what the rules are around accessing them, and to remove as much of the friction as we can around just the basics of provisioning access, knowing that this set of people is allowed to access it and how they access it. Just removing that as much as possible, so that it's not weeks between when I have an idea and when I can build an application to process that data. Ideally, within an hour of having an idea, I can spin up a notebook, pull in the data sets I need, train an ML algorithm or build some analytics function, and then start to see some results and see whether this is really working or not. And then, of course, scale it up from there in a seamless fashion. So I think a lot of the essence of AWS that we've built over the years is really starting to come together. And where we are continuing to make it simpler for customers is all around that federation and the simplicity of provisioning access to the data. >> And share that data across a massive global network. Kevin Miller, thanks so much for coming on theCUBE and talking about data lakes. >> Yeah. Thanks for having me, Dave. >> You're welcome. And thank you for watching. This is Dave Vellante for theCUBE. (light music)

Published Date : Aug 1 2022

Haseeb Budhani, Rafay & Kevin Coleman, AWS | AWS Summit New York 2022

(gentle music) (upbeat music) (crowd chattering) >> Welcome back to The City That Never Sleeps. Lisa Martin and John Furrier in New York City for AWS Summit '22 with about 10,000 to 12,000 of our friends. And we've got two more friends joining us here today. We're going to be talking with Haseeb Budhani, one of our alumni, co-founder and CEO of Rafay Systems, and Kevin Coleman, senior manager for go-to-market for EKS at AWS. Guys, thank you so much for joining us today. >> Thank you very much for having us. Excited to be here. >> Isn't it great to be back at an in-person event with 10,000, 12,000 people? >> Yes. There are a lot of people here. This is packed. >> A lot of energy here. So, Haseeb, we've got to start with you. Your T-shirt says it all. Don't hate k8s. (Kevin giggles) Talk to us about some of the trends that you're seeing from a Kubernetes perspective, and then, Kevin, you can follow up. >> Yeah. >> Yeah, absolutely. So, I think the biggest trend I'm seeing on the enterprise side is that enterprises are forming platform organizations to make Kubernetes a practice across the enterprise. It used to be that a BU would say, "I need Kubernetes. I have some DevOps engineers, let me just do this myself." And the next one would do the same, and then the next one would do the same. And that's not practical, long term, for an enterprise. Now this is becoming a consolidated effort, which I think is great. It speaks to the power of Kubernetes, because it's becoming so important to the enterprise. But that also puts pressure on the platform team, because what they have to solve for now is finding the fine line between automation and governance, right? I mean, the developers, you know, don't really care about governance. Just give me stuff, I need compute, I'm going to go. But then the platform organization has to think about how this is going to play for the enterprise across the board.
So that combination of automation and governance is where we are finding, frankly, a lot of success in making enterprise platform teams successful. I think that's a really new thing to me. It's something that's changed in the industry in the last six months, I would say. I don't know, Kevin, if you agree with that or not, but that's what I'm seeing. >> Yeah, definitely agree with that. We see a ton of customers in EKS who are building these new platforms using Kubernetes. The term we hear a lot of customers use is standardization. They've got various ways that they're deploying applications, whether it's on-prem or in the cloud, in region, and they're really trying to standardize the way they deploy applications. And Kubernetes is really the compute substrate that they're standardizing on. >> Kevin, talk about your relationship with Rafay Systems and why you're here together. And two, the second part of that question: why is EKS kicking ass so much? (Haseeb and Kevin laughing) All right, go ahead. First one, your relationship. Second one, EKS is doing pretty well. >> Yep, yep, yep. (Lisa laughing) So yeah, we work closely with Rafay, Rafay, excuse me. A lot of joint customer wins with Haseeb and co., so they're doing great work with EKS customers and, yeah, we love the partnership there. In terms of why EKS is doing so well, there are a number of reasons, I think. Number one, EKS is vanilla, upstream, open-source Kubernetes. Customers want to use that open-source technology, that open-source Kubernetes, and they come to AWS to get it in a managed offering, right? Kubernetes isn't the easiest thing to self-manage. And so customers, you know, back before EKS launched, were banging down the door at AWS for us to have a managed Kubernetes offering. And, you know, we launched EKS, and there's been a ton of customer adoption since then.
>> You know, Lisa, theCUBE's been around 12 years; everyone knows we started in 2010. We used to cover a show called OpenStack. >> I remember that. >> OpenStack Summit. >> What's that now? >> And at that time, Kubernetes wasn't there. So theCUBE was present at the creation. We've been to every KubeCon ever; CNCF then took it over. So we've been watching it from the beginning. >> Right. And it reminds me of the same trend we saw with MapReduce and Hadoop. Very big promise, everyone loved it, but it was hard, very difficult. In Hadoop's case, big data, it ended up becoming a data lake. Now you've got Spark, or Snowflake, and Databricks, and Redshift. Here, Kubernetes has not yet been taken over. Instead, it's being abstracted away, and/or managed services are emerging, 'cause general enterprises can't hire enough Kubernetes people. >> Yep. >> There aren't that many out there yet. So there's the training issue. But there's been the rise of managed services. >> Yep. >> Can you guys comment on your thoughts relative to that trend: hard to use, abstracting away the complexity, and, specifically, the managed services? >> Yeah, absolutely. You want to go? >> Yeah, absolutely. I think, look, it's important not to kid ourselves. It is hard. (John laughs) But that doesn't mean it's not practical, right? When Kubernetes is done well, it's a thing of beauty. I mean, we have enough customers at scale; forget a hockey stick, it's a straight line up, because they're just moving so fast when they have the right platform in place. I think the mistake that many of us make, and I made this mistake when we started this company, is trivializing the platform aspect of Kubernetes, right? And a lot of my customers, you know, when they start, kind of feel like, well, this is not that hard. I can get this up and running. I just need two people. It'll be fine.
And it's hard to hire, but then I need two, then I need two more, then I need two more; it's a lot, right? I think the one thing I keep telling analysts when I talk to them is, "Look, somebody needs to write a book that says, 'Yes, it's hard, but yes, it can be done, and here's how.'" Let's just be open about what it takes to get there, right? And, I mean, you mentioned OpenStack. I think the beauty of Kubernetes is that because it's such an open system, right, even with the managed offering, companies like Rafay can build really productive businesses on top of the Kubernetes platform. I think that is something that was not true with OpenStack. I spent time with OpenStack too; I remember how it was. >> Well, Amazon had a lot to do with stalling the momentum of OpenStack, but to your point about difficulty: Hadoop was always difficult to maintain and to hire for. There were no managed services, and no one saw the value of big data yet. Here with Kubernetes, people are living a problem called, I'm scaling up. >> Yep. >> And so it sounds like it's a foundational challenge. The ongoing stuff sounds easier, or manageable. >> Once you have the right tooling. >> Is that true? >> Yeah, no, I mean, once you have the right tooling, it's great. I think, look, you and I have talked about this before. The thesis behind Rafay is that, you know, there are like 8 to 12 things that need to be done right for Kubernetes to work well, right? And my whole thesis was, I don't want my customer to buy 10, 12, 15 products. I want them to buy one platform, right? And I truly believe that, in our market, similar to what VMware's vCenter did for VMs, I want to do that for Kubernetes, right? And the reason I say that is because, see, vCenter is not about hypervisors, right?
vCenter is about hypervisors, access, networking, storage, all of the things, like multitenancy, all the things that you need to run an enterprise-grade VM environment. What is the equivalent for the Kubernetes world, right? So what we are doing at Rafay is truly building a vCenter, but for Kubernetes, like a kCenter. I tried getting the domain. I couldn't get it. (Kevin laughs) >> Well, after the Broadcom deal, you don't know what's going to happen. >> Ehh. (John laughs) >> I won't go there! >> Yeah. Yeah, let's not go there today. >> Kevin, about EKS, I've heard people say to me, "Love EKS. Just add serverless, that's a home run." There's been a relationship between EKS and some of the other Amazon tools. Can you comment on what you're seeing as the most popular interactions among the services at AWS? >> Yeah, and was your comment there, add serverless? >> Add serverless with EKS at the edge- >> Yeah. >> and things get kind of interesting. >> I mean, so, one of the serverless offerings we have today is actually Fargate. So you can use Fargate, which is our serverless compute offering, or one of our serverless compute offerings, with EKS. And customers love that. Effectively, they get the beauty of EKS and the Kubernetes API, but they don't have to manage nodes. So there's, you know, a good amount of adoption with Fargate as well. But we also have other ways that they can manage their nodes. We have managed node groups as well, in addition to self-managed nodes. So there's a variety of options that customers can use from a compute perspective with EKS. And you'll continue to see us evolve the portfolio as well. >> Haseeb, can you share a customer example, a joint customer example that you think really articulates the value of what Rafay and AWS are doing together? >> Yeah, absolutely. In fact, we announced a customer very recently at this very show: MoneyGram, which is a joint AWS and Rafay customer.
Look, the thing about these massive customers is that, you know, not everybody's going to give us their logo to use. >> Right. >> But MoneyGram has been a Rafay plus EKS customer for a very, very long time. You know, at this point, I think we've earned their trust, and they've allowed us to say this publicly. But there are enough of these financial services companies who have, you know, standardized on EKS. So it's EKS first, Rafay second, right? They standardized on EKS. And then they looked around and said, "Who can help me build a platform on EKS across my enterprise?" And we've been very lucky. We have some very large financial services companies and some very large healthcare companies now that are A, EKS, B, Rafay. I'm not just saying that because my friend Kevin's here, (Lisa laughs) it's actually true. Look, EKS is a brilliant platform. It scales so well, right? I mean, people try it out relative to other platforms, and it's just a no-brainer; it just scales. You want to build a big enterprise on the back of a Kubernetes platform. And I'm not saying that because I'm biased. EKS is really, really good. There's a reason why so many companies are choosing it over many other options in the market. >> You're doing a great job of articulating why the theme (Kevin laughs) of the New York City Summit is scale anything. >> Oh, yeah. >> There you go. >> Oh, yeah. >> I did not even know that, but I'm speaking the language, right? >> You are. (John laughs) >> Yeah, absolutely. >> One of the things that we're also seeing, and I want to get your thoughts on this, guys, is the app modernization trend, right? >> Yep. >> Because unlike other standards that were hard, and that had no benefit downstream 'cause they were too hard to get to, here Kubernetes is feeding into real app developer pressure. They've got to get cloud-native apps out. It's fairly new in the mainstream enterprise, and a lot of hyperscalers have experience.
So I'm going to ask you guys, what is the key thing that you're enabling with Kubernetes for cloud-native apps? What is the key value? >> Yeah. >> I think there's a bifurcation happening in the market. One is the Kubernetes engine market, which is EKS, AKS, GKE, right? And then there's what, back in the day, we used to call operations and management, right? The OAM layer for Kubernetes is where there's need, right? People are learning, right? Because, as you said before, the skill isn't there; you know, there's not enough talent available in the market. And that's the opportunity we're seeing. Because to solve for the standardization, the governance, and the automation that we talked about earlier, you have to solve for: okay, how do I manage my network? How do I manage my service mesh? How do I do chargebacks? What's my, you know, policy around actual Kubernetes policies? What's my blueprinting strategy? How do I do add-on management? How do I do pipelines for updates of add-ons? How do I upgrade my clusters? And we're not done yet; there's a longer list, right? This is a lot, right? >> Yeah. >> And this is what happens, right? It's just a lot. And really, the companies who understand that plethora of problems that need to be solved, and build easy-to-use solutions that enterprises can consume with the right governance and automation, I think are going to be very, very successful here. >> Yeah. >> Because this is a train, right? I mean, this is happening whether it's us or not, right? Enterprises are going to keep doing this. >> And open source is a big driver in all of this. >> Absolutely. >> Absolutely. >> And I'll tag onto that. I mean, you talked about platform engineering earlier. Part of the point of building these platforms on top of Kubernetes is giving developers an easier way to get applications into the cloud.
So it's about building unique developer experiences that really make it easy for you, as a software developer, to take the code from your laptop and get it out to production as quickly as possible. The question is- >> So is that what you mean? Does that tie to your earlier point about that vertical, straight-up value once you've set it up, right? >> Yep. >> Because it's taking away the burden that's stopping the developers' productivity. >> Absolutely. >> Having to go check in: is it configured properly? Is the software supply chain going to be there? Who's managing the services? Who's orchestrating the nodes? >> Yep. >> Is that automated? Is that where you guys see the value? >> That's a lot of what we see, yeah. In terms of how these companies are building these platforms, it's taking all the component pieces that Haseeb was talking about and really putting them into a cohesive whole. And then you, as a software developer, don't have to worry about configuring all of those things. You don't have to worry about security policy, governance, how your app is going to be exposed to the internet. >> It sounds like infrastructure as code. >> (laughs) Yeah. >> Come on, like. >> (laughs) Infrastructure as code is a big piece of it, for sure, for sure. >> Yeah, look, infrastructure as code actually- >> Infrastructure security as code too. >> Yeah. >> Huge. >> Well, it all goes together. Like, we talk about developer self-service, right? The way we enable developer self-service is by teaching developers: here's a snippet of code that you write, you check it in, and your infrastructure will just magically be created. >> Yep. >> But not automatically. It's going to go through a check, like a check by the platform team. These are the workflows that, if you get them right, developers don't care about, right? All developers want is, I want compute. But then all these 20 things need to happen in the back.
That's what, if you nail it, right... I mean, I keep trying to kind of pitch the company; I don't want to do that today. But if you nail that, >> I'll give you a plug at the end. >> you have a good story. >> But I've got to, I just have a tangent question, 'cause you reminded me. There are two types of developers that have emerged, right? You have the software developer that wants infrastructure as code. I just want to write my code; I don't want to stop. I want to build in shift-left for security, shift-right for data. All that's in there. >> Right. >> I'm coding away; I love coding. Then you've got the under-the-hood person. >> Yes. >> I've been to the engines. >> Certainly. >> So that's more of an SRE or data engineer; I'm wiring services together. >> Yeah. >> A lot of people are like, they don't know who they are yet. They're in college, or they're transitioning from an IT job. They're trying to figure out who they are. So the question is, how do you tell a person that's watching, who am I? Like, should I just be coding? But I love the tech. Would you guys have any advice there? >> You know, I don't know if I have any guidance in terms of telling people who they are. (all laughing) I mean, I think about it in terms of a spectrum, and this is what we hear from customers: some customers want to shift as much responsibility as possible onto the software teams to manage their infrastructure as well. And then some want to shift it all the way over to a very centralized model. And, you know, we see everything in between as well with our EKS customer base. But, yeah, I'm not sure if I have any direct guidance for people. >> Haseeb, any wisdom? >> Aside from experiment. >> If you're coding more, you're a coder. If you like to play with the hardware, >> Yeah. >> or the gears. >> Look, I think it's really important for managers to understand that developers, yes, have a job; they have to write code, right? But they also want to learn new things. It's only fair, right? >> Oh, yeah.
>> So what we see is, developers want to learn. And we enable them to understand Kubernetes in small pieces, like small steps, right? And that is really, really important, because if we completely abstract things like Kubernetes away from them, it's not good for them, right? It's good for their careers, right? It's good for them to learn these things. This is going to be with us for the next 15, 20 years. Everybody should learn it. But I want to learn it because I want to learn, not because it's part of my job, and that's the distinction, right? I don't want this to become my job, because I want to write my code. >> Do what you love. If you're more attracted to understanding how automation works, and robotics, or making things scale, you might be under-the-hood. >> Yeah. >> Yeah, look under the hood all day long. But then, in terms of, like, who keeps the lights on for the cluster, for example. >> All right, see- >> That's the job. >> He makes a lot of value. Now you know who you are. Ask these guys. (Lisa laughing) Congratulations on your success with EKS too. >> Yeah, thank you. >> Quick, give a plug for the company. I know you guys are growing. I want to give you a minute to share a plug with the audience: what are you guys doing? Are you hiring? How many employees? Funding? New customer wins? Take a minute to give a plug. >> Absolutely. And look, John, I think every show you guys are doing, a Summit or a KubeCon, I'm here. (John laughing) And every time we come, we talk about new customers. Look, platform teams at enterprises seem to love Rafay because it helps them build that, well, Kubernetes platform that we've talked about on the show today.
I think many large enterprises on the financial service side, healthcare side, digital native side seem to have recognized that running Kubernetes at scale, or even starting with Kubernetes in the early days, getting it right with the right standards, that takes time, that takes effort. And that's where Rafay is a great partner. We provide a great SaaS offering, which you can have up and running very, very quickly. Of course, we love EKS. We work with our friends at AWS. But it also works with Azure; we have enough customers in Azure. It also runs in Google. We have enough customers at Google. And it runs on-premises with OpenShift or with EKS A, right, whichever option you want to take. But in terms of that standardization and governance and automation for your developers to move fast, there's no better product in the market right now when it comes to Kubernetes platforms than Rafay. >> Kevin, while we're here, why don't you plug EKS too, come on. >> Yeah, absolutely, why not? (group laughing) So yes, of course. EKS is AWS's managed Kubernetes offering. It's the largest managed Kubernetes service in the world. We help customers who want to adopt Kubernetes and adopt it wherever they want to run Kubernetes, whether it's in region or whether it's on the edge with EKS A or running Kubernetes on Outposts and the evolving portfolio of EKS services as well. We see customers running extremely high-scale Kubernetes clusters, excuse me, and we're here to support them as well. So yeah, that's the managed Kubernetes offering. >> And I'll give the plug for theCUBE, we'll be at KubeCon in Detroit this year. (Lisa laughing) Lisa, look, we're giving a plug to everybody. Come on. >> We're plugging everybody. Well, as we get to plugs, I think, Haseeb, you have a book to write, I think, on Kubernetes. And I think you're wearing the title. >> Well, I do have a book to write, but I'm one of those people who does everything at the very end, so I will never get it right.
(group laughing) So if you want to work on it with me, I have some great ideas. >> Ghostwriter. >> Sure! >> But I'm lazy. (Kevin chuckles) >> Ooh. >> So we got to figure something out. >> Somehow I doubt you're lazy. (group laughs) >> No entrepreneur's lazy, I know that. >> Right? >> You're being humble. >> He is. So Haseeb, Kevin, thank you so much for joining John and me today, >> Thank you. >> talking about what you guys are doing at Rafay with EKS, the power, why you shouldn't hate k8s. We appreciate your insights and your time. >> Thank you as well. >> Yeah, thank you very much for having us. >> Our pleasure. >> Thank you. >> We appreciate it. With John Furrier, I'm Lisa Martin. You're watching theCUBE live from New York City at the AWS NYC Summit. John and I will be right back with our next guest, so stick around. (upbeat music) (gentle music)

Published Date : Jul 14 2022

SUMMARY :

We're going to be talking Thank you very much for having us. This is packed. Talk to us about some of the trends, I mean, the developers, you know, in the cloud and region. that you have and why And so customers, you know, we used to cover a show called OpenStack. And at the time, And it reminds me of the same trend we saw They're not that many out there yet. You want to go? And, I mean, you mentioned OpenStack. Well, Amazon had a lot to do And so it sounds like it's And that the reason why Well, after the Broadcom view, (John laughs) Yeah, let's not go there today. and some of the other Amazon tools. I mean, so, one of the you know, the thing about these who have, you know, standardized on EKS. of the New York City (John laughs) So I'm going to ask you guys, And that's the opportunity we're seeing. I think they're going to be very, I mean, this is happening whether, big driver in all of this. I mean, you talked about Because it's taking the is taking all the component pieces code is a big piece of it, is code too, the security. here's a snippet of code that you write that if you get them right, at the end. I just want to write my I'm coding away, I love coding. So that's more of But I love the tech. And then some want to If you like to play with the hardware, for managers to understand This is going to be with us Do what you love. the cluster, for example. Now you know who you are. I want to give you a minute Kubernetes in the early days, why don't you plug EKS too, come on. and the evolving portfolio And I'll give the plug And I think you're wearing the title. So if you want to work on it with me, But I'm lazy. So we got to (group laughs) So Haseeb, Kevin, thank you so much the power, why you shouldn't hate k8s. Yeah, thank you very much at the AWS NYC Summit.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Kevin Coleman	PERSON	0.99+
Kevin	PERSON	0.99+
John	PERSON	0.99+
Rafay	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Haseeb	PERSON	0.99+
John Furrier	PERSON	0.99+
two	QUANTITY	0.99+
EKS	ORGANIZATION	0.99+
10	QUANTITY	0.99+
New York City	LOCATION	0.99+
Haseeb Budhani	PERSON	0.99+
2010	DATE	0.99+
Rafay Systems	ORGANIZATION	0.99+
20 things	QUANTITY	0.99+
12	QUANTITY	0.99+
Lisa	PERSON	0.99+
two people	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
one platform	QUANTITY	0.99+
two types	QUANTITY	0.99+
MoneyGram	ORGANIZATION	0.99+
15 products	QUANTITY	0.99+
one	QUANTITY	0.99+
OpenShift	TITLE	0.99+
Rafay	ORGANIZATION	0.99+
12 things	QUANTITY	0.98+
today	DATE	0.98+
Second one	QUANTITY	0.98+
8	QUANTITY	0.98+
10, 12,000 people	QUANTITY	0.98+
vCenter	TITLE	0.98+
Detroit	LOCATION	0.98+
12 years	QUANTITY	0.98+
New York City Summit	EVENT	0.97+
EKS A	TITLE	0.97+
Kubernetes	TITLE	0.97+
