
Search results for "data mart":

Starburst panel Q3


 

>>Okay. We're back with Justin Borgman, CEO of Starburst. Richard Jarvis is the CTO of EMIS Health, and Teresa Tung is the cloud-first technologist from Accenture. We're on to lie number three, and that is the claim that today's modern data stack is actually modern. So I guess the lie is that it's not modern. Justin, what do you say?

>>Yeah, I mean, I think new isn't modern, right? It's the new data stack, it's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components are actually exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it, and I think a lot of the challenges that plagued us for 40 years still remain.

>>So let me come back to you on that, Justin. But there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, at Snowflake and, as you mentioned, its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? What do you say to that?

>>Well, it is. It's certainly taking what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So they can scale up and down, but your data is still stored in a proprietary format, you're still locked in, and you still have to ingest the data just to get it prepared for analysis. So a lot of the same structural constraints that existed with the old on-prem enterprise data warehouse model still exist; yes, it's a little bit more elastic now, because the cloud offers that.

>>So Teresa, let me go to you, because you have "cloud first" in your title. What say you to this conversation?

>>Well, even the cloud providers are looking towards more of a cloud continuum, right? The centralized cloud as we know it, maybe a data lake or data warehouse in one central place, is not even how the cloud providers are looking at it. They have new query services; every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, it's going to very much follow the same path. There's going to be more edge, there's going to be more on-premise, because of data sovereignty and data gravity, and because you're working with different parts of the business that have already made major investments in different cloud providers. So there are a lot of reasons why the next modern generation of the data stack needs to be much more federated.

>>Okay. So Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure; it's on the books, and you don't want to just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is a microservices layer on top of legacy apps. How do you think about the modern data stack?

>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology.
But if you don't modernize how people use that technology, then you're not going to be able to scale, because just because you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh: how do you enable more people to consume the service, and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself.

>>Thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. You think about the past five, seven years: the cloud obviously has given us a different pricing model, de-risked experimentation, and we talked about the ability to scale up and scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that; I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about maybe putting some guardrails and definitions around the modern data stack? What does that look like? What are some of the attributes and principles there?

>>Of how it should look, or how--

>>Yeah, what it should be.

>>Yeah. Well, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh, and I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth, and I think that's the paradigm shift that needs to occur. I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate, and that does not reflect the vast majority of enterprises.

And even those companies, as they grow up and mature out of that ideal state, they go buy a business, and now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical: do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents.
So to answer your question directly, I think it's adding the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database, or what have you. So it's creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time.
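To make that federation idea concrete: below is a minimal sketch of what a single query spanning a data lake, an Oracle database, and a MongoDB collection can look like through a Trino-based engine (the open-source project Starburst builds on), using the Python `trino` client. The coordinator host and the catalog, schema, and table names are hypothetical placeholders, not a real deployment.

```python
# A minimal sketch of one federated query through a Trino-based engine
# (Starburst builds on open-source Trino). The host and the
# catalog/schema/table names below are hypothetical placeholders.
import trino  # pip install trino

conn = trino.dbapi.connect(
    host="trino.example.com",   # hypothetical coordinator endpoint
    port=8080,
    user="analyst",
)

# Each catalog points at a different live system: a data lake in open
# formats, an Oracle operational database, and a MongoDB document store.
# No ingestion into a warehouse happens first.
sql = """
SELECT c.region,
       o.status,
       SUM(p.amount) AS revenue
FROM   lake.sales.purchases      AS p    -- open files on the data lake
JOIN   oracle_crm.app.customers  AS c    -- operational Oracle system
       ON p.customer_id = c.customer_id
JOIN   mongo_ops.app.orders      AS o    -- MongoDB collection
       ON p.order_id = o.order_id
GROUP  BY c.region, o.status
"""

cur = conn.cursor()
cur.execute(sql)
for row in cur.fetchall():
    print(row)
```

The design point the panel is making is visible in the query itself: each source stays where it lives, and the engine, not an ingestion pipeline, resolves the join.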
>>So thank you. So my takeaway there, based on what Justin just said, is that it's inclusive: whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include, Teresa, on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen of data mesh frankly really aren't adhering to the philosophy; maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, or HelloFresh: a lot of stuff happening on the AWS cloud in that closed stack, if you will. What's the answer to that, Teresa?

>>I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still want to modernize and take the best of cloud. Cloud is still, like we mentioned, full of great reasons to adopt, around the economics and the ability to tap into the innovation that the cloud providers are delivering around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem, or in the existing systems that are working already and are meaningful for the business. At the same time, you can modernize the ones that make business sense, because they need better performance, or something that is cheaper, or maybe just to tap into better analytics to get better insights. So you're going to be able to stretch and really have the best of both worlds that, again, going back to Richard's point, is needed by the business. Not everything has to have that one-size-fits-all set of tools.

>>Okay, thank you. So Richard, you're talking about data as product. I wonder if you could give us your perspective here: what are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data; what are your thoughts on data products?

>>So for us, one of the most important data products that we've been creating is taking healthcare data from across a wide variety of different settings -- information about patients' demographics, about their treatment, about their medications and so on -- and taking that into a standards format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight, and in any business that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured and managed way, even if that data comes from a variety of different sources in the first place.

And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out, both internally and, through the right governance, externally to researchers.

>>So that data product, through whatever APIs, is accessible, it's discoverable, but it obviously has to be governed as well, and, as you mentioned, appropriately provided internally but also to external folks. So you've architected that capability today?

>>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and the value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case?

>>Yeah, it makes sense. It's got the context when the domains own the data; that cuts through a lot of it. The centralized teams, the technical teams, are data-agnostic; they don't really have that context. All right, let's end it. Justin, how does Starburst fit into this modern data stack? Bring us home.

>>Yeah. So I think for us, it's really providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them optionality. And optionality provides the ability to reduce costs -- store more in a data lake rather than a data warehouse -- and the ability for the fastest time to insight, accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make that an appropriate complement to the modern data stack that people have today.

>>Excellent. Hey, I want to thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are going to be available on thecube.net for on-demand viewing. You can also go to starburst.io; they have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations and really good stuff in the resource section. So check that out. Thanks for watching "The Data Doesn't Lie, Or Does It?", made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time.

Published Date : Aug 2 2022

SUMMARY :

Justin Borgman (Starburst), Teresa Tung (Accenture), and Richard Jarvis (EMIS Health) debate the claim that today's "modern data stack" is actually modern. They argue that cloud data warehouses repeat the lock-in and ingestion constraints of their on-prem predecessors, and that the next generation needs to be federated: the warehouse becomes one node in a data mesh spanning cloud, on-prem, and edge, with data treated as governed products that serve the business.


Julie Lockner, IBM | IBM DataOps 2020


 

>>From the Cube Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a Cube Conversation.

>>Hi, everybody. This is Dave Vellante with theCUBE. Welcome to this special digital presentation. We're really digging into how IBM is operationalizing and automating the AI and data pipeline, not only for its clients but also for itself. And with me is Julie Lockner, who looks after offering management in IBM's Data and AI portfolio. Really great to see you again.

>>Great, great to be here. Thank you.

>>Talk a little bit about the role you have here at IBM.

>>Sure. So my responsibility in offering management in the Data and AI organization is really twofold. One is I lead a team that implements all of the back-end processes, really the operations behind any time we deliver a product from the Data and AI team to the market -- so think about all of the release cycle management, the product management discipline, etcetera. The other role that I play is really making sure that we are working with our customers and that they have the best customer experience, and a big part of that is developing the data ops methodology. It's something that I needed internally for my own line-of-business execution, but it's now something that our customers are looking to implement in their shops as well.

>>Well, good. I really want to get into that. So let's start with data ops. I think a lot of people are familiar with DevOps; maybe not everybody is familiar with data ops. What do we need to know about data ops?

>>Well, you bring up the point that everyone knows DevOps, and in fact what data ops really does is bring a lot of the benefits that DevOps brought to application development to the data management organization. So when we look at what data ops is: it's a set of data management principles that helps organizations bring business-ready data to their consumers quickly. It borrows from DevOps in that you have a data pipeline associated with a business value requirement: I have this business initiative, it's going to drive this much revenue or this much cost savings, and this is the data I need to be able to deliver it. How do I develop that pipeline and map to the data sources, know what the data is, and know that I can trust it -- ensuring it has the right quality, that I'm actually using the data for what it was meant for, and then putting it to use? Historically, most data management practices deployed a waterfall-like implementation methodology, which meant all the data pipeline projects were implemented serially, based on potentially a first-in, first-out program management office. With a DevOps mental model, the idea is being able to slice through all of the different silos required to collect the data, organize it, integrate it, validate its quality, create those data integration pipelines, and then present it to the consumer -- whether that's a Cognos dashboard, an operational process, or even a data science team. That whole end-to-end process gets streamlined through what we're calling the data ops methodology.

>>So as you well know, we've been following this market since the early days of Hadoop. People struggle with their data pipelines.
It's complicated for them: there's a raft of tools, and they spend most of their time wrangling data, preparing data, moving data, working on data quality, across different roles within the organization. So it sounds like, to borrow from DevOps, data ops is all about streamlining that data pipeline, helping people really understand and communicate across it, end to end, as you're saying. But what's the ultimate business outcome that you're trying to drive?

>>So when you think about projects that require data to, again, cut costs, automate a business process, or drive new revenue initiatives: how long does it take to get from having access to the data to making it available? Every time delay spent trying to connect to data sources, or trying to find subject matter experts who understand what the data means and can verify its quality -- all of those steps, across different teams and different disciplines, introduce delay in delivering high-quality data fast. So the business value of data ops is always associated with something the business is trying to achieve, but with a time element. If it's "for every day we don't have this data to make a decision, we're either making money or losing money," that's the value proposition of data ops. It's about taking things that people are already doing today and figuring out the quickest way to do them, through automation and workflows, and just cutting through all the political barriers that often come up when this data crosses different organizational boundaries.

>>Yes, so speed, time to insight, is critical. But with DevOps, you're really bringing the skill sets together into, sort of, one super-dev or one super-ops. It sounds like with data ops it's really more about everybody understanding their role and having communication and line of sight across the entire organization. It's not trying to make everybody a superhuman data person; it's the group, it's the team effort. It's really a team game here, isn't it?

>>Well, that's a big part of it. Just like any type of practice, there are people aspects, process aspects, and technology aspects -- people, process, technology. And while you describe it as having that super-team that knows everything about the data, the only way that's possible is if you have a common foundation of metadata. We've seen a resurgence in the data catalog market in the last six, seven years, and the innovation in the data catalog market has actually enabled us to drive more data ops pipelines. Meaning: as you identify data assets, you capture the metadata, you capture their meaning, you capture information that can be shared with whoever the stakeholders are. It really then becomes an essential repository for people to really quickly know what data they have, really quickly understand what it means and its quality, and very quickly, with the proper authority -- privacy rules included -- put it to use for models, dashboards, and operational processes.

>>Okay. And we're going to talk about some examples, and one of them, of course, is IBM's own internal example. But help us understand where you advise clients to start. I want to get into it: where do I get started?

>>Yeah, so traditionally, what we've seen with these large data management and data governance programs is that sometimes our customers feel like this is a big pill to swallow.
And what we've said is: look, there's an opportunity here to quickly define a small project, align it to a high-value business initiative, and target something where you can quickly gain access to the data, map out these pipelines, and create a squad of skills. So it includes a person with DevOps-type programming skills to automate and instrument a lot of the technology; a subject matter expert who understands the data sources and their meaning; and the line-of-business executive who can translate, bringing that information to the business project and associating it with business value. So when we say "how do you get started," we've developed what I would call a pretty basic maturity model to help organizations figure out where they are in terms of the technology, and where they are organizationally in knowing who the right people are to involve in these projects. Then, from a process perspective, we've developed some pretty prescriptive project plans that help you nail down which data elements are critical for this business initiative, and we have, for each role, what their jobs are: to consolidate the data sets, map them together, and present them to the consumer. We find that six-week projects, typically three sprints, are the perfect timeline to create one of these very short, quick-win projects. Take that as an opportunity to figure out where the bottlenecks are in your own organization and where your skill shortages are, and then use the outcome of that six-week sprint to focus on filling the gaps and kick off the next project. Iterate: celebrate the success and promote the success, because it's typically tied to a business value, to help create momentum for the next one.

>>That's awesome. I want to get into some examples. I mean, we're both Massachusetts-based; normally you'd be in our studio and we'd be sitting here face to face. Obviously, with COVID-19, in this crisis world of sheltering in place, you're up somewhere in New England, and I happen to be in my studio, but I'm the only one here. So relate this to COVID. How has data ops helped -- or maybe you have a concrete example of how it's helped -- inform, or actually anticipate and keep up to date with, what's happening?

>>Yeah, well, I mean, we're all experiencing it; I don't think there's a person on the planet who hasn't been impacted by what's been going on with this COVID pandemic crisis. So: we started down this data ops journey a year ago. This isn't something that we just decided to implement a few weeks ago. We've been working on developing the methodology and getting our own organization in place so that we could respond the next time we needed to act upon a data-driven decision. Step one of our journey was really working with our global chief data officer, Inderpal, who I believe you have had an opportunity to meet and interview. So part of this year's journey has been working with our corporate organization -- I'm in a line-of-business organization -- where we've established the roles and responsibilities and established the technology stack, based on our Cloud Pak for Data and Watson Knowledge Catalog.

>>So I'll use that as the context. Now we're faced with a pandemic crisis, and I'm being asked in my business unit to respond very quickly: how can we prioritize the offerings that are going to help those in critical need, so that we can get those products out to market?
We can offer 90-day free use for governments and hospital agencies. So in order to do that as the operations lead for our team, I needed access to our financial data, I needed access to our product portfolio information, and I needed to understand our cloud capacity. For me to be able to respond with the offers that we recently announced -- you can take a look at some of the examples with our Watson Assistant for Citizens program -- I was able to provide the financial information required for us to make those products available to governments, hospitals, state agencies, etcetera.

>>That's a perfect example. Now, to set the stage back to the corporate global chief data office organization: they implemented some technology that allowed us to ingest data, automatically classify it, automatically assign metadata, and automatically associate data quality, so that when my team started using that data, we knew what the status of that information was when we started to build our own predictive models. And so that's a great example of how we partnered with a corporate central organization and took advantage of the automated set of capabilities, without having to invest in any additional resources or headcount, and were able to release products within a matter of a couple of weeks.

>>And that automation is a function of machine intelligence, is that right? And obviously some experience. But you and I, when we were consultants, doing this by hand -- we couldn't have done this, certainly not at scale. Is it machine intelligence and AI that allows us to do this?

>>That's exactly right. And you know, our organization is Data and AI, so we happen to have the research and innovation teams that are building a lot of this technology, so we have somewhat of an advantage there. But you're right: the alternative to what I've described is manual spreadsheets. It's querying databases. It's sending emails to subject matter experts asking them what this data means -- and if they're out sick or on vacation, you have to wait for them to come back. All of this was a manual process. In the last five years, we've seen the data catalog market really become this augmented data catalog, and the augmentation means automation through AI. So with years of experience and natural language understanding, we can comb through a lot of the metadata that's available electronically, we can comb through unstructured data and categorize it, and if you have a set of business terms with industry-standard definitions, then through machine learning we can automate what you and I did as consultants manually, in a matter of seconds. That's the impact AI is having in our organization, and now we're bringing this to the market. It's a big part of where I'm investing my time, both internally and externally: bringing these types of concepts and ideas to the market.
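As an illustration of the kind of automated term assignment Julie describes: the sketch below assigns a business term to a sampled column and reports a rough confidence score. A real augmented catalog would use trained models and natural language understanding; this stand-in uses simple regular expressions, and all names and sample values are made up.

```python
import re

# Illustrative stand-in for automated classification in a data catalog.
# Each business term has a pattern; the "confidence" is just the share
# of sampled values that match. All names and values are made up.
PATTERNS = {
    "Email Address": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "Phone Number":  re.compile(r"^\+?[0-9().\s-]{7,16}$"),
}

def classify_column(sample_values):
    """Return the best-matching business term and a 0..1 confidence."""
    best = (None, 0.0)
    for term, pattern in PATTERNS.items():
        hits = sum(1 for v in sample_values if pattern.match(str(v).strip()))
        confidence = hits / len(sample_values)
        if confidence > best[1]:
            best = (term, confidence)
    return best

print(classify_column(["julie@example.com", "dave@example.org", "n/a"]))
# -> ('Email Address', 0.666...)
```

The confidence score is what makes the next step -- deciding which assignments a human needs to see -- possible; Julie picks that thread up below.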
>>So I'm hearing, first of all, that one of the things that strikes me is you've got multiple data sources and data that lives everywhere. You might have your supply chain data in your ERP, and maybe that sits on-prem. You might have some sales data sitting in a SaaS application in a cloud somewhere. You might have weather data that you want to bring in. In theory, anyway, the more data you have, the better the insights you can gather, assuming you've got the right data quality. But let me start with where the data is, right? It's anywhere: you don't know where it's going to be, but you know you need it. So part of this is being able to get to the data quickly.

>>Yeah, it's funny you bring it up that way; I actually look at it a little differently. When you start these projects, the data was in one place, and then by the time you get to the end of the project, you find out it's moved to the cloud. So the data location actually changes while we're in the middle of projects -- and especially during this pandemic crisis, we have many organizations using this as an opportunity to move to SaaS. So what was on-prem is now cloud. But that shouldn't change the definition of the data, and it shouldn't change its meaning. It might change how you connect to it; it might also change your security policies or the privacy laws that apply -- now, all of a sudden, you have to worry about where that data is physically located and whether you're allowed to share it across national boundaries, whereas before you knew physically where it was. So when you think about data ops: data ops is a process that sits on top of where the data physically resides. Because we're mapping metadata and looking at these data pipelines and automated workflows, part of the design principle is to set it up so that it's independent of where the data resides. However, you have to have placeholders in your metadata, and in the toolchain where we're automating these workflows, so that you can accommodate it when the data moves -- because the corporate policy changed from on-prem to cloud. And that's a big part of what data ops offers. It's the same thing, by the way, for DevOps: they've had to accommodate building on platforms-as-a-service versus on-prem development environments. It's the same for data ops.

>>And you know, the other part that strikes me in listening to you is scale. It's not just about scale with the cloud operating model; it's also what you were talking about: the auto-classification, the automated metadata. You can't do that manually. You've got to be able to do that with automation in order to scale. That's another key part of data ops, is it not?

>>Well, it's a big part of the value proposition and a lot of the business case. Right: when you and I started in this business, big data became the thing, and people just moved all sorts of data sets onto these Hadoop clusters without capturing the metadata. As a result, over the last 10 years, that information is out there, but nobody knows what it means anymore. You can't go back with an army of people and have them curate these data sets, because a lot of the context was lost. But you can use automated technology -- automated machine learning with natural language understanding -- to do a lot of the heavy lifting for you. And a big part of data ops workflows, and of building these pipelines, is to do what we call management by exception. So if your algorithm says it's 80% confident that this is a phone number, and your organization has a low risk tolerance, that probably goes to an exception. But if a match algorithm comes back and says it's 99% sure this is an email address, and your threshold is 98%, it will automate much of the work that we used to have to do manually. That's an example of how you can automate, eliminate manual work, and still have some human interaction, based on your risk threshold.
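A minimal sketch of that "management by exception" routing, using the numbers from Julie's example: results at or above the organization's threshold are applied automatically, and everything else lands in a steward's review queue. The 98% bar and the records below are illustrative assumptions, not IBM's actual implementation.

```python
# Sketch of "management by exception": auto-apply high-confidence
# classifications, queue the rest for a human data steward.
AUTO_APPLY_THRESHOLD = 0.98   # low risk tolerance -> a high bar for automation

def route(classified_columns):
    applied, review_queue = [], []
    for column, term, confidence in classified_columns:
        if confidence >= AUTO_APPLY_THRESHOLD:
            applied.append((column, term))                    # fully automated
        else:
            review_queue.append((column, term, confidence))   # human exception
    return applied, review_queue

applied, review = route([
    ("contact_email", "Email Address", 0.99),   # auto-applied
    ("contact_no",    "Phone Number",  0.80),   # routed to a data steward
])
print("auto-applied:", applied)
print("needs review:", review)
```

The design choice worth noting is that the threshold encodes risk tolerance: a regulated organization can raise the bar and accept more manual review, while a lower-risk shop can automate more aggressively.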
>>That's awesome. I mean, you're right about the no-schema-on-write thing: I throw it all into a data lake, and the data lake becomes a data swamp -- we all know that joke. Okay, I want to understand a little bit, and maybe you have some other examples of the use cases here, about the maturity of where customers are. It seems like you've got to start by just understanding what data you have and cataloging it, getting your metadata act in order. But then you've got a data quality component before you can actually implement and get to insight. So where are customers on the maturity model? Do you have any other examples you can share?

>>Yeah. So when we look at our data ops maturity model, we tried to simplify it -- I mentioned this earlier -- so that really anybody can get started. They don't have to have a full governance framework implemented to take advantage of the benefits data ops delivers. What we said is that you can categorize your data ops programs into really three things. One: how well do you know your data -- do you even know what data you have? Two: can you trust it -- can you trust its quality and its meaning? And three: can you put it to use? So when you begin with what data you know, the first step is asking how you're determining that. If you're using spreadsheets, replace them with a data catalog. If you have a departmental, line-of-business catalog and you need to start sharing information across departments, then start expanding to an enterprise-level data catalog. Now, you mentioned data quality. The first step there is: do you even have a data quality program? Have you established what your criteria are for high-quality data? Have you considered what your data quality score is comprised of? Have you mapped out the critical data elements needed to run your business? Most companies have done that for their governed processes, but for these new initiatives -- like in my example with the COVID crisis, deciding which products we're going to bring to market quickly -- I need to be able to find out what the critical data elements are, and whether I can trust them. Have I even done a quality scan, and have teams commented on their trustworthiness to be used in this case? If you haven't done anything like that in your organization, that might be the first place to start: pick the critical data elements for this initiative, assess their quality, and then start to implement the workflows to remediate.
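For readers wondering what "what your data quality score is comprised of" might mean in practice, here is a hedged sketch of a composite score over one critical data element, blending completeness and validity. The weights, the validity rule, and the sample values are all assumptions for illustration.

```python
from datetime import date

# A sketch of a composite quality score for a critical data element:
# a weighted blend of completeness (how much is present) and validity
# (how much of what is present passes a rule). Weights and samples
# are illustrative assumptions.
def quality_score(values, is_valid, w_completeness=0.5, w_validity=0.5):
    present = [v for v in values if v not in (None, "")]
    completeness = len(present) / len(values) if values else 0.0
    validity = (sum(1 for v in present if is_valid(v)) / len(present)
                if present else 0.0)
    return w_completeness * completeness + w_validity * validity

def valid_iso_date(v):
    try:
        y, m, d = (int(x) for x in str(v).split("-"))
        date(y, m, d)            # raises ValueError for dates like 2019-13-45
        return True
    except ValueError:
        return False

treatment_dates = ["1970-04-02", "2019-13-45", None, "1988-11-30"]
print(round(quality_score(treatment_dates, valid_iso_date), 2))  # -> 0.71
```

Publishing a score like this against each critical data element is what lets consumers judge, at a glance, whether a data set is trustworthy enough for their use case.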
And then, when you get to putting it to use, there are several methods for making data available. One is simply making a data set available to a small set of users; that's what most people do. First they make a spreadsheet of the data available, but then, if multiple people need to access it, that's when something like a data mart might make sense. Technology like data virtualization eliminates the need to move data while you're in this prototyping phase, and that's a great way to get started: it doesn't cost a lot of money to set up a virtual query to see whether this is the right join, or the right combination of fields, for the use case. Eventually you'll get to the need for a high-performance ETL tool for data integration. But nirvana is when you really get to self-service data prep, where users can query a catalog and say "these are the data sets I need"; it presents a list of available data assets, they point and click at the columns they want as part of their data pipeline, hit go, and it automatically generates that output for data science use cases or a BI dashboard. That's the most mature model: being able to iterate so quickly that, as soon as you get feedback that a data element is wrong or something needs to be added, you can do it at the push of a button. And that's where data ops maturity should bring organizations to.

>>Well, Julie, I think there's no question that this COVID crisis has accentuated the importance of digital. We talk about digital transformation a lot, and it's certainly real, although I would say a lot of the people we talk to will say, "well, not on my watch," or "I'll be retired before that all happens." This crisis is accelerating that transformation, and data is at the heart of it. Digital means data, and if you don't have your data story together and your act together, then you're not going to be able to compete. And data ops really is a key aspect of that. So give us a parting word.

>>Yeah, I think this is a great opportunity for us to really assess how well we're leveraging data to make strategic decisions. And there hasn't been a more pressing time to do it than when our entire engagement becomes virtual -- this interview is virtual, right? Everything now creates a digital footprint that we can leverage to understand where our customers are having problems and where they're having successes. So let's use the data that's available: use data ops to make sure we can generate access to that data, know it, trust it, and put it to use, so that we can respond to those in need when they need it.

>>Julie Lockner, you're an incredible practitioner, really hands-on. I really appreciate you coming on theCUBE and sharing your knowledge with us. Thank you.

>>Thank you very much. It was a pleasure to be here.

>>Alright, and thank you for watching, everybody. This is Dave Vellante for theCUBE, and we will see you next time.

Published Date : May 28 2020

SUMMARY :

Julie Lockner of IBM explains data ops: a set of data management principles, borrowed from DevOps, for delivering business-ready data quickly. She describes IBM's own data ops journey -- a maturity model ("know your data, trust it, put it to use"), six-week three-sprint starter projects, and AI-augmented data catalogs that automate classification and quality so that only exceptions go to humans -- and how that foundation let her team stand up COVID-19 relief offers within a couple of weeks.

Raymie Stata, SAP - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: From San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017.

>> Welcome back everyone. We are at Big Data Silicon Valley, running in conjunction with Strata + Hadoop World in San Jose. I'm George Gilbert, and I'm joined by Raymie Stata. Raymie was most recently CEO and founder of Altiscale, a Hadoop-as-a-service vendor, one of the few out there not part of one of the public clouds. And in keeping with all of the great work they've done, they got snapped up by SAP. So, Raymie, since we haven't seen you on The Cube since then, why don't you catch us up with all the good work that's gone on between you and SAP since then.

>> Sure. So the acquisition closed back in September, so it's been about six months, and it's been a very busy six months. There's just a lot of blocking and tackling that needs to happen -- getting people on board, getting new laptops, all that good stuff. But certainly a huge effort for us was to open up a data center in Europe. We've long had demand for that European presence, both because there's a lot of interest in Europe itself, but also because for large, multinational companies based in the US, it's important to have that European presence as well. It was a natural thing to do as part of SAP, so kind of the first order of business was to expand into Europe. That was a big exercise. We've also had some good traction on the sales side, so we're getting new customers, larger customers, more demanding customers, which has been a good challenge too.

>> So let's pause for a minute and unpack for folks what Altiscale offered, the core services.

>> Sure.

>> That were here in the US, and that you've now extended to Europe.

>> Right. So our core platform is Hadoop, Hive, and Spark as a service in the cloud. We offer HDFS and YARN for Hadoop, with Spark and Hive well integrated, and we offer all of that as a cloud service. So you just get an account, log in, store stuff in HDFS, and run your Spark programs. The way we encourage people to think about it is this: very often, vendors have trained folks in the big data space to think about nodes. How many nodes am I going to get? What kind of nodes am I going to get? And the way we really force people to think twice about Hadoop, and about what Hadoop as a service means, is: why are you asking that? You don't need to know about nodes. Just store stuff, run your jobs; we worry about nodes. And once people understand just how much complexity that takes out of their lives, and how it enables them to truly focus on using these technologies to get business value rather than operating them, there's that aha moment in the sales cycle where people say: yeah, that's what I want. I want Hadoop as a service. So that's been our value proposition from the beginning, and it's remained quite constant; even coming into SAP, that's not changing one bit.
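Here is roughly what that user-facing model looks like in practice: a small PySpark job that reads raw logs from HDFS and writes back an aggregate, with no mention of nodes or cluster sizing anywhere. The paths, column names, and job details are hypothetical; the point is only that the service, not the user, worries about capacity.

```python
# Roughly the user-facing model of Hadoop as a service: point at data in
# HDFS, run the job, write results back. No node counts, instance types,
# or cluster sizing anywhere -- the service operates all of that.
# The paths and column names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-event-rollup").getOrCreate()

events = spark.read.json("hdfs:///logs/events/2017/03/")   # raw log data

daily = (events
         .groupBy("event_date", "event_type")
         .agg(F.count("*").alias("n_events")))

daily.write.mode("overwrite").parquet("hdfs:///marts/daily_event_counts/")
spark.stop()
```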
>> So, just to be clear then, a lot of the operational responsibilities you took control over. When you say "don't worry about nodes": the customer pours x amount of data into storage, which in your case would be HDFS, and then compute is independent of that. You spin up however much capacity they need, with Spark, for instance, to process it, or Hive. Okay, so--

>> And all on demand.

>> Yeah. So it sounds like -- how close is it to the BigQuery or Athena services, Athena on AWS or BigQuery on Google, where you're not aware of any servers, either for storage or for compute?

>> Yeah, I think that's a very good comparable. It's very much like Athena and BigQuery, where you just store stuff in tables, you issue queries, and you don't worry about how much compute there is or about managing it. I think that by throwing Spark into the equation, and YARN more generally, we can handle a broader range of use cases. For example, you don't have to store data in tables: you can store it in HDFS files, which is good for processing log data. And with Spark, you have access to a lot of machine learning algorithms that are a little bit harder to run in the context of, say, Athena. So I think it's the same model in terms of being fully operated for you, but a broader platform in terms of its capabilities.

>> Okay, so now let's talk about what SAP brought to the table and how that changed the use cases that are appropriate for Altiscale, starting at the data layer.

>> Yeah. So certainly, from the business perspective, SAP brings a large, very engaged customer base that is eager to embrace a data-driven mindset and culture and is looking for a partner to help them do that, and it's been great to be in that environment. SAP also has a number of additional technologies that we've been integrating into the Altiscale offering. One of them is Vora, which is an interactive SQL engine; it also has time series capabilities, graph capabilities, and search capabilities. So it has a lot of additive capabilities, if you will, to what we had at Altiscale, and it integrates very deeply into HANA itself. And so we now have Vora as a technology available as a service at Altiscale.

>> Let me make sure everyone understands, and that I understand too: you can issue queries from HANA, and they can -- beyond just simple SQL queries, they can handle the time series and predictive analytics -- and access data seamlessly that's in Hadoop. Or can it go the other way as well?

>> It's both ways. So from HANA you can essentially federate out into Vora, and through that, access data that's in a Hadoop cluster. But it's also the other way around. A lot of times there's an analyst who really lives in the big data world -- they're in the Hadoop world -- but they want to join in data that's sitting in a HANA database. It might be dimensions in a warehouse, or even customer details in a transactional system. And so that Hadoop-based analyst now has access to data that's out in those HANA databases.
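A sketch of that second direction -- the Hadoop-side analyst reaching into HANA -- follows. This version uses Spark's generic JDBC reader rather than Vora's own integration (which isn't shown in the interview), and the URL, driver class, credentials, and table and column names are hypothetical placeholders.

```python
# Sketch: a Hadoop-side analyst joins lake data with a customer dimension
# living in HANA, via Spark's generic JDBC reader. URL, driver, credentials,
# and table/column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hana-dimension-join").getOrCreate()

customers = (spark.read.format("jdbc")
             .option("url", "jdbc:sap://hana.example.com:30015")
             .option("driver", "com.sap.db.jdbc.Driver")  # HANA JDBC driver jar
             .option("dbtable", "SALES.CUSTOMERS")
             .option("user", "ANALYST")
             .option("password", "***")
             .load())

weblogs = spark.read.parquet("hdfs:///lake/weblogs/2017/")  # big-data side

# Spark resolves column names case-insensitively by default, so the HANA
# CUSTOMER_ID column joins against the lake's customer_id.
joined = weblogs.join(customers, on="customer_id")
joined.groupBy("region").count().show()
```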
>> Do you have some lighthouse accounts that are working with this already?

>> Yes, we do. (laughter)

>> Yes we do -- okay, I guess that was the diplomatic way of saying yes, but no comment. Alright, so tell us more about SAP's big data stack today and how that might evolve.

>> Yeah. So now we've got the Spark, Hadoop, and Hive offering that we have, and then Vora sitting on top of that. There's also an offering called Predictive Analytics, which is Spark-based predictive analytics.

>> Is that something that came from you, or is that--

>> That's an SAP thing. And this is what's been great about the acquisition: SAP does have a lot of technologies that we can now integrate, and it brings new capabilities to our customer base. So those three are pretty key. And then there's something called Data Services as well, which allows us to move data easily in and out of HANA and other data stores.

>> Is this ability to federate queries between Hadoop and HANA, and the migration of data between the stores -- has that changed the economics of how much data SAP customers maintain, and the types of apps they can build on it, now that it's economically feasible to store a lot more data?

>> Well, yes and no. In the context of Altiscale, both before and after the acquisition, very often there's what you might call a big data source. It could be your web logs, it could be IOT-generated log data, it could be social media streams. This is data that doesn't have a lot of structure coming in, it's fairly voluminous, and it doesn't go very naturally into a SQL database -- and that's the sweet spot for big data technologies like Hadoop and Spark. So that data comes into your big data environment, you can transform it, you can do some data quality on it, and then you can eventually stage it out into something like a HANA data mart to make it available for reporting. But obviously there's stuff you can do on the larger data set in Hadoop as well. So, in a way, yes: you can now tame, if you will, those huge data sources that weren't practical to put into a HANA database.

>> If you were to prioritize, in the context of the applications SAP focuses on, would the highest-priority use case be the IOT-related stuff, where it was just prohibitive to put it in HANA, since it's mostly in memory? SAP is exposed to tons of that type of data, which would seem to most naturally have an affinity to Altiscale.

>> Yeah, so IOT is a big initiative, and it's a great use case for big data. But the financial services industry, as another example, is fairly far down the path of using Hadoop technologies for many different use cases, and so that's also an opportunity for us.

>> So let me pop back up before we have to wrap. With Altiscale as part of the SAP portfolio, have the two companies gone to customers with more transformational options that you'll sell together?

>> Yeah, we have. In fact, Altiscale is actually no longer called Altiscale, right? We're part of a portfolio of products known as the SAP Cloud Platform; under the Cloud Platform, we're the Big Data Services. The SAP Cloud Platform is all about business transformation and business innovation, and we bring to that portfolio the ability to bring the types of data sources I've just discussed to bear on these transformative efforts. And so we fit into some momentum SAP already has to help companies drive change.

>> Okay. So along those lines -- we know financial services has done a lot of work with this, and I guess telcos as well -- what are some of the other verticals that look like they're primed to follow, with this type of transformational approach?

>> So you mentioned one, which I'd call manufacturing, and there tend to be two different use cases there. One of them I call the shop floor case,
where you're collecting a lot of sensor data out of a manufacturing facility, with the goal of increasing yield. So you've got the shop floor. And then you've got the, I think, more commonly discussed case of measuring stuff out in the field: you've got a product out in the field, you're bringing the telemetry back, and you're doing things like predictive maintenance. So I think manufacturing is a big sector ready to go for big data. And healthcare is another one: people pulling together electronic medical records and trying to combine them with clinical outcomes. The big focus there is to drive towards outcome-based models, even on the payment side, and big data is really valuable for driving and assessing outcomes in an aggregate way.

>> Okay. We're going to have to leave it on that note, but we will tune back in at, I guess, Sapphire or TechEd -- whichever of the SAP shows is coming up next -- to get an update.

>> Sapphire's next, then TechEd.

>> Okay. With that, this is George Gilbert and Raymie Stata. We will be back in a few moments with another segment. We're here at Big Data Silicon Valley, running in conjunction with Strata + Hadoop World. Stay tuned; we'll be right back.

Published Date : Mar 15 2017

SUMMARY :

Raymie Stata catches up with The Cube six months after SAP acquired Altiscale. He recaps the Hadoop-as-a-service value proposition ("store stuff, run your jobs; we worry about nodes"), the expansion into a European data center, and the integration of SAP technologies -- Vora for federated SQL between HANA and Hadoop, Predictive Analytics, and Data Services -- now offered as the Big Data Services within the SAP Cloud Platform, with manufacturing IOT and healthcare as ripe verticals.
