David Hsieh, Qubole - DataWorks Summit 2017
>> Announcer: Live from San Jose in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to theCUBE. We are live on day one of the DataWorks Summit in the heart of Silicon Valley. I'm Lisa Martin with my co-host Peter Burris. Just chatting with our next guest about the Warriors win yesterday — we're also pretty excited about that. David Hsieh, the SVP of Marketing from Qubole. Hi David. >> David: Hey, thanks for having me. >> Welcome to theCUBE, we're glad you still have a voice after no doubt cheering on the home team last night. >> It was a close call 'cause I was yelling pretty loud yesterday. >> So talk to us — you're the SVP of Marketing for Qubole. Big data platform in the cloud. You guys just had a big announcement a few weeks ago. >> David: Right. >> What are your thoughts, what's going on with Qubole? What's going on with big data? What are you seeing in the market? >> So you know, we're a cloud-native data platform, and when we talk to customers, they're really complaining about how they're just struggling with complexity and the barriers to entry, and you know, they're really crying out for help. And the good news, I suppose, is we're in an industry that has a very high pace of innovation. That's great, right. Spark has had eight versions now in two years, but that pace of innovation is, you know, making the complexity even harder. I was watching Cloudera bragging about how their new product is a combination of 24 open source projects. You know, that's tough stuff, right. So if you're a practitioner trying to get big data operationalized in your company, and trying to scale the use of data and analytics across the company — the nature of open source is it's designed for flexibility. Right, the source code's public, you have all these options, configuration settings et cetera. But moving those into production and then scaling them in a reliable way is just crushing practitioners. And so data teams are suffering, and I think frankly it's bad for our industry, because, you know, Gartner's talking about an 80% failure rate of big data projects by 2018. Think about that — what industry can survive when 70 or 80% of the projects fail? >> Well, let me push on that a little bit. Because I think that the concern is not that 70 to 80% of the efforts to reach an answer in a complex big data thing are going to fail. We can probably accommodate that, but what we can't accommodate is failure in the underlying infrastructure. >> David: Absolutely. >> So the research we've done suggests something similar — we are seeing an enormous amount of time spent on the underlying infrastructure. And there's a lot of failures there. People would say, I have a question, I want to know if there's an answer, and they try to get to that answer, and they don't get the answer they want, >> David: Yep. >> or they get a different answer. That kind of failure is still okay. >> David: Right. >> Because that's experience, you get more and more and more. >> David: Absolutely. >> So it's not the failure on the data science side or the application side. >> Actually I would say getting to an answer you don't like is a form of success. Like you have an idea, you try it out, that's all great. >> So what Gartner is really saying is it's failure in the implementation of the infrastructure. >> That's exactly right. >> So it's the administrative and operational sides. 
>> Correct, it's a project that didn't deliver the end result. If the end result was what you hoped, great. >> You couldn't even answer your question. >> Exactly, couldn't even answer the question. >> So let me test something on you Dave, David. We've been carrying a thesis at Wikibon for a while that it looks like open source is proving that it's very good at mimicking, and not quite as good at inventing. >> David: Right. >> So by that I mean if you put an operating, drop an operating system in front of Linus Torvalds, he can look at that and say, I can do that. >> David: Right. >> And do a great job of it. If you put a development tool, same kind of thing. But big data is very complex — a lot of it, an enormous number of use cases. >> David: Correct. >> And open source has done a good job at a tool level, and it looks as though the tools are being built to make other tools more valuable, >> David: Ha, right. >> As opposed to making it easy for a business to operationalize data science and the use of big data in their business. Would you agree or disagree with that? >> Yeah, I think that's sort of, like, fundamentally the philosophy of open source. You know, I'm going to do my work, something I need for me, but I'm going to share it with everybody else. And they can contribute. But at the end of the day, you know, unlike commercial software, there's sort of no one throat to choke. Right, and there's nobody who is going to guarantee the interoperability and the success of the piece of software that you're trying to deploy. >> There's not even a real coherent vision in many respects. >> David: No, absolutely not. >> Of what the final product's going to end up looking like. >> So what you have is a lot of really great cutting edge technology that a lot of really smart people sort of poured their hearts and souls into. But that's a little different than trying to get to an end result. And, you know, like it or not, commercial software packages are designed to deliver the result you pay for. Open source, being sort of philosophically very different, I think breeds, you know, inherent complexity. And that complexity right now is, I think, the root of the problem in our industry. >> So give us an example David — you know, you're a marketing guy, I'm a marketing gal. >> Sure. >> Give us an example of a customer, maybe one of your favorite examples. Where are you helping them? They're struggling here, they've made significant investments from an infrastructure perspective. They know there's value in the data, >> David: Yup. >> varying degrees as we've talked about before. How does Qubole get in there and start helping this use case customer start to optimize, and really start making this big data project successful? >> That's a great question. So there's really two things. Number one is that we are a SaaS-based platform in the cloud, and what we do basically is make big data into more of a turnkey service. So actually the other day, I was sort of surfing the internet, and we have a customer from Sonic Drive-In. You know, they do hamburgers and stuff. >> Lisa: Oh yeah. >> And they're doing a bunch of big data, and this guy was at a data science meetup, talking about it. We didn't put him up to this, he just volunteered. He was talking about how we made his life so much easier. Why? Because all of the configuration stuff, the settings, and you know, how to manage costs, was basically filling out a form and setting policy and parameters. And not having to write scripts and figure out all these configuration settings. 
If I set this one this way and that one that way, what happens? You know, we have a sort of more curated environment that makes that easy. But the thing that I'm really excited about is we think this is the time to really look at having data platforms that can, you know, build or run autonomously. Today companies have to hire really expensive, really highly skilled, super smart data engineers and data ops people to run their infrastructure. And you know, if you look at studies, we're about 180,000 people short of the number of data engineers and data ops people this industry needs. So trying to scale by adding more smart people is super hard. Right, but instead if you could start to get machines to do what people are doing — just faster, cheaper, more reliably — then you can scale your data platform. So we basically made an announcement a couple weeks ago about the industry's first autonomous data platform. And what we're building are software agents that can take over certain types of data management tasks so that data engineers don't have to do it. Or don't have to be up at three in the morning making sure everything is going right. >> And from a market segmentation perspective, where's your sweet spot for that? Enterprise, SMB, somewhere in the middle? >> The bigger you have to scale. It's not about company size, it's really about sort of the scope and scale of your big data efforts. So you know, the more people you have using it, the more data you have, the more you want automation to make things easier. It's sort of true of any industry, and it's certainly going to be true of the big data industry. >> Peter: Yeah, more complexity in the question set, >> Correct. >> The more complexity-- >> Or the more users you have, the more it gives you. Add more data sources. >> Which presumably is going to be correlated. >> Absolutely correct. >> Which is — we can use a big data project to ascertain that. >> Well in fact that's sort of what we're doing. Because we're a SaaS platform, we take in the metadata from what our customers are doing. What users, what clusters, what queries, which tables, all that stuff. We basically use machine learning and artificial intelligence to analyze how you're using your data platform. And tell you what you could do better, or automate stuff that you don't have to do anymore. >> So we've presumed that the industry at some point in time, the big data industry at some point in time, is going to start moving its attention to things like machine learning and A.I., you know, up into applications. >> David: Yep. >> Are we going to see the big data industry basically move pretty rapidly into more of a service or application conversation, or are we going to see a rebirth, as folks try to bring a more coherent approach to the many existing tools that are here right now? >> David: Right. >> What do you think? >> Well I think we're going to see some degree of industry consolidation, and you're going to see vendors — you know, and you're seeing it today — try to simplify and consolidate. Right, so some of that is moving up the stack towards applications, some of that is about repackaging their offerings and adding simplicity. It's about using artificial intelligence to make the operational platform itself easier. I think you'll see a variety of those things, because you know, companies have too many places where they can stumble in their deployment. 
And you know, it's going to be, you know — the vendor community has to step in and simplify those things to basically gain greater adoption. >> So if you think about it, what is — I mean, I have my own idea, but what do you think is the metric that businesses should be using as they conceive of how to source different tools and invest in different tools, put things together? I think increasingly we're going to talk about time to value. What do you think? >> I think time to value is one. I think another one you could look at is the number of people who have access to the data to create insights. Right, so you know, you can say 100% of my company has access to the data and analytics that they need to help their function run better. Whatever it is, that's a pretty awesome accomplishment. And you know, there's a bunch of people who may or may not have 100% but they're pretty close, right. And they've really become a data-driven enterprise. And then you have lots of companies that are sort of stuck with, okay, we have this use case running, thank goodness. Took us two years and a couple million bucks, and now they're trying to figure out how to get to the next step. And so they have five users who are able to use their data platform successfully. That's, you know — I think that's a big measure of success. >> So I want to talk quickly, if I may, about the cloud. >> David: Yeah. >> Because it's pretty clear that there are some very, very large shops >> David: Yep. >> that are starting to conceive of important parts of their overall approach to data >> David: Right. >> and putting things into the cloud. There's a lot of advantages of doing it that way. At the same time they're also thinking about how I'm going to integrate the models that I generate out of big data back into applications that might be running in a lot of different places. >> Right. >> That suggests there's going to be a new challenge on the horizon. Of how do we think about, end to end, bringing applications together with predictable data movement and control and other types of activities. >> David: Yeah. >> Do you agree that's on the horizon — how we think about end to end performance across multiple different clouds? >> I think that's coming, you know. I think I'm still surprised at how many people have not figured out that the economic and agility advantages of cloud are so great that you'd be honestly foolish not to, you know, consider cloud and have that proactive way to migrate there. And so there is just, you know, a shocking number of companies that are still plodding away, you know, building their own on-prem infrastructures et cetera. And they still have hesitancy and questions about the cloud. I do think that you're right, but I think what you're talking about is, you know, three to five years out for the mainstream in the industry. Certainly there are early adopters, you know, who have sort of gotten there. They're talking about that now. But as sort of a mainstream phenomenon I think that's a couple years out. >> Excuse me Peter — one of the things that just kind of made me think of was, you know, these companies, as what you're saying, that still had hesitancy regarding cloud. >> Right. >> And kind of vendor lock-in popped into my head. And that kind of brought me back to one of the things that you were mentioning in the beginning. Open source, complexity there. >> David: Yep. 
>> Are you seeing, or are you helping companies to go back to more of that commercialized proprietary software? Are you seeing a shift in enterprises being less concerned about lock-in because they want simplicity? >> You know, that's a great question. I think in the big data space it's hard to avoid, you know, sort of going down the open source path. I think what people are getting concerned about is getting locked into a single cloud vendor. So more and more of the conversations we have are about, what are your multi-cloud and eventually cross-cloud capabilities? >> Peter: That's the question I just asked, right. >> Exactly, so I think more and more of that's coming to the front. I was with a large, very large healthcare company a week ago, and I said, what's your cloud strategy? And they said, we have a no-vendor-left-behind policy. So, you know — we're standardized on Azure, we've got a bunch of pilots on AWS, and we're planning to move from a data warehousing vendor to Oracle in the cloud. Ha, so I think for large companies, a lot of them can't control the fact that different divisions, departments, whatever, will use different clouds. So architecturally, they're going to have to start to think about using these multi-cloud, cross-cloud, you know, scenarios. And you know, most large companies, given a choice, will not bet the farm on a single cloud provider. And you know, we're great partners and we love Amazon, but every time they have, you know, an S3 outage like they had a few months ago, it really makes people think carefully about what their infrastructure is and how they're dealing with reliability. >> Well, in fairness they don't have that many. >> They don't, it only takes one. >> That's right, that's right, and there's reasons to suspect that there will be increased specialization of services in the cloud. >> David: Correct. >> So I mean it's going to get more complex as we go as well. >> David: Oh, absolutely correct. >> Not less. >> Well, David Hsieh, SVP of Marketing at Qubole. Thank you so much for joining >> Thank you. >> and sharing your insights with Peter and myself. It's been very insightful. >> Right. >> So this is another great example of how we've been talking about the Warriors and food — Sonic was brought into play here. >> David: Exactly, go Sonic. >> Very exciting, you never know what's going to happen on theCUBE. So for David and Peter, I am Lisa Martin. You're watching day one of the DataWorks Summit, in the heart of Silicon Valley. But stick around because we've got more great content coming your way.
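
David's sketch of an autonomous data platform — software agents taking over the scaling and tuning decisions that would otherwise keep a data engineer up at three in the morning — can be made concrete with a small example. What follows is a hypothetical illustration, not Qubole's actual product or API; the class names, metrics, and thresholds are all invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class ClusterStats:
    """Usage metadata a hypothetical agent might collect for one cluster."""
    pending_tasks: int   # queued work not yet scheduled
    idle_nodes: int      # nodes with no running tasks
    total_nodes: int

def recommend_action(stats: ClusterStats,
                     min_nodes: int = 2,
                     max_nodes: int = 50) -> str:
    """Policy-driven decision: scale out under backlog, scale in when idle.

    This stands in for the 3 a.m. judgment call a data engineer
    would otherwise make by hand from scripts and dashboards.
    """
    if stats.pending_tasks > 0 and stats.total_nodes < max_nodes:
        return "scale_out"
    if stats.idle_nodes > stats.total_nodes // 2 and stats.total_nodes > min_nodes:
        return "scale_in"
    return "hold"

if __name__ == "__main__":
    # A backlog of work with no idle headroom -> the agent adds nodes.
    print(recommend_action(ClusterStats(pending_tasks=40, idle_nodes=0, total_nodes=10)))
    # A mostly idle cluster -> the agent shrinks it to save cost.
    print(recommend_action(ClusterStats(pending_tasks=0, idle_nodes=8, total_nodes=10)))
```

The point of the sketch is the shape of the loop: collect metadata, apply a policy, act. That is the same telemetry David describes Qubole mining with machine learning, just reduced to two if-statements.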
Kellyn Pot'Vin-Gorman, Delphix - Data Platforms 2017 - #DataPlatforms2017
>> Announcer: Live from the Wigwam in Phoenix, Arizona. It's theCUBE covering Data Platforms 2017. Brought to you by Qubole. >> Hey welcome back everybody. Jeff Frick here with theCUBE. We're at the historic Wigwam Resort — 99 years young — just outside of Phoenix, at Data Platforms 2017. I'm Jeff Frick here with George Gilbert from Wikibon, who's co-hosting with me all day. Getting to the end of the day. And we're excited to have our next guest. She is Kellyn Gorman, the technical intelligence manager in the office of the CTO at Delphix. Welcome. >> Yes, thank you, thank you so much. >> Absolutely — so what is Delphix, for people that aren't familiar with Delphix? >> Most of us realize that the database, and data in general, is the bottleneck, and Delphix completely revolutionizes that. We remove it from being the bottleneck by virtualizing data. >> So you must love this show. >> Oh I do, I do. I'm hearing all about all kinds of new terms that we can take advantage of. >> Right — cloud-native and separating compute from storage, you know, and I think just the whole concept of atomic computing. Breaking it down, removing storage from the server. Breaking it down into smaller parts. Sounds like it fits right into kind of your guys' wheelhouse. >> Yeah, I kind of want to containerize it all and be able to move it everywhere. But I love it. Yeah. >> So what do you think of this whole concept of Data Ops? We've been talking about Dev Ops for, I don't know how long... How long have we been talking about Dev Ops, George? Five years? Six years? A while? >> Yeah, a while. (small chuckle) >> But now... >> Actually maybe eight years. >> Jeff: You're dating yourself George. (all laugh) Now we're talking about Data Ops, right? And there's a lot of talk of Data Ops. So this is the first time I've really heard it coined in such a way where it really becomes the primary driver in the way that you basically deliver value inside your organization. >> Oh absolutely. You know, I come from the database realm. I was a DBA for over two decades, and Dev Ops was a hard sell to a lot of DBAs. They didn't want to hear about it. I tried to introduce it over and over. The idea of automating, and taking us kind of out of this manual intervention that introduced, many times, human error. So Dev Ops was a huge step forward, getting that out of there. But the database — and data in general — was still this bottleneck. So Data Ops is the idea that you automate all of this, and if you virtualize that data, we found with Delphix, that removed that last hurdle. And that was my — I guess my session was on virtualizing big data. The idea that I could take any kind of structured or unstructured file and virtualize that as well, and instead of deploying it to multiple environments, I was able to deploy it once and actually do IO on demand. >> So let's peel the onion on that a little bit. What does it mean to virtualize data? And how does that break the database's bottleneck on the application? >> Well right now, when you talk about relational data or any kind of legacy data store, people are duplicating that through archaic processes. So if we talk about Oracle, they're using things like Data Pump. They're using transportable tablespaces. These are very cumbersome, they take a very long time. Especially with the introduction of the cloud, there's a lot of room for failure. It's not made for that, especially as the network is our last bottleneck — which is what we're also seeing for many of these folks. 
When we introduce big data, many of these environments — many of these, I guess you'd say, projects — came out of open source. They were done as a need, as a necessity to fulfill. And they've got a lot of moving pieces. And to be able to containerize that, and then deploy it once and virtualize it — so instead of, let's say, having 16 gigs that you need to duplicate here and over and over again, especially if you're going on-prem or to the cloud, I'm able to do it once, and then do that IO on demand and go back to a gold copy, a central location. And it makes it look like it's there. I was able to deploy a 16 gig file to multiple environments in less than a minute. And then each of those developers has their own environment. Each tester has their own, and they actually have a read-write, full, robust copy. That's amazing to folks. All of a sudden, they're not held back by it. 
>> So our infrastructure analyst and Wikibon research CTO David Floyer, if I'm understanding this correctly, talks about this where it's almost like a snapshot. >> Absolutely. >> And it's a read-write snapshot, although you're probably not going to merge it back into the original. And this way dev, test, and whoever else wants to operate on live data can do that. >> Absolutely, it's full read-write — what we call data version control. We've always had version control at the code level. You may have had it at the actual server level. But you've rarely ever had it at the data level, for the database or with flat files. What I used was the cms.gov data. It's available to everyone, it's public data. And we realized that these files were quite large and cumbersome. And I was able to reproduce it and enhance what they were doing at TIME magazine. And create a use case that made sense to a lot of people. Things that they're seeing in their real world environments. >> So, tell us more, elaborate how dev ops expands on this — I'm sorry, not dev ops, data ops. How — take that as an example and generalize it some more, so that we see how, if DBAs were a bottleneck, they now can become an enabler? >> One, it's getting them to learn new skills. Many DBAs think that their value relies on those archaic processes. "It's going to take me three weeks to do this," so I have three weeks of value. Instead of saying "I am going to be able to do this in one day" — and those other resources are now also valuable because they're doing their jobs. We're also seeing that data was seen as the centralized point, and people were trying to come up with solutions to those pain points. We're able to take that out completely. And people are able to embrace agility. They have agile environments now. Dev Ops means that they're able to automate that very easily, instead of having that stopping point of constantly hitting the data and saying "I've got to take time to refresh this." "How am I going to refresh it?" "Can I do just certain..." We hear about this all the time with testing. When I go to testing summits, they are trying to create synchronized virtualized data. They're creating test data sets that they have to manage. It may not be the same as production, where I can actually create a container of the entire development or production environment, and refresh that back. And people are working on the full product. There's no room for the errors that you would have if you were just taking a piece of it, or if you were only able to grab just one tier of that environment because the data was too large before. 
>> So would the automation part be the generation of one or more snapshots, and then the sort of orchestration and distribution to get it to the intended audiences? >> Yes, and we would use >> Okay. >> things like Jenkins or Chef — normal dev ops tools work along with this. Along with command line utilities that are part of our product, to allow people to just create what they would create normally. But many times it's been siloed and, like I said, worked around that data. We've included the data as part of that, so they can deploy it just as fast. >> So a lot of the conversation here this morning was really about putting all the data in your — pick your favorite — public cloud, to enable access to all the applications, to the APIs, through all different types of things. How does that impact, conceptually, kind of what you guys do? >> If you're able to containerize that, it makes you capable of deploying to multiple clouds. Which is what we're finding. About 60% of our customers are in more than one cloud — two to five, exactly. As we're dealing with that and recognizing that, it's kind of like looking at your cloud environments like your phone providers. People see something shiny and new, a better price point, lesser dollar. We're able to provide that, one, by saving all that storage space. It's virtualized, it's not taking a lot of disc space. Second of all, we're seeing them say "You know, I'm going to go over to Google." Oh, guess what? This project says they need the data, and they need to actually take the data source over to Amazon now. We're able to do that very easily. And we do it from multi-tier: flat files, the data, legacy data sources, as well as our application tier. >> Now, when you're doing these snapshots — my understanding, if I'm getting it right, is it's not a full Xerox. It's more like the delta. Like if someone's doing test dev, they have some portion of the source of truth, and as they make changes to it, it grows to include the edits until they're done, in which case then the whole thing is blown away. >> It depends on the technology you're looking at. Ours is able to trap that. So when we're talking about a virtual database, we're using the native recovery mechanisms. You can kind of think of it as a perpetual recovery state inside our Delphix engine. So those changes are going on, and then you have your VDBs that are a snapshot in time that they're working on. >> Oh, so like you take a snapshot and then it's like a journal — >> the transactional data from the logs is continually applied. Of course it's different depending on each technology. So we do it differently for Sybase versus Oracle versus SQL Server and so on and so forth. Virtual files — when we talk about flat files — are different as well. Your parent, you take an exact snapshot of it. But it's really just projecting that NFS mount to another place. So that mount — if you replace those files, or update them of course, then you would be able to refresh and create a new snapshot of those files. So somebody said, "We refresh these files every single night." You would be able to then refresh and project them out to the new place. >> Oh, so it's almost like you're sub-classing them... >> Yes. >> Okay, interesting... When you go into a company that's got a big data initiative, where do you fit in the discussion, in the sequence? How do you position the value-add relative to the data platform that's sort of the center of the priority — getting a platform in place? 
Well, that's what's so interesting about this — we haven't really talked to a lot of big data companies. We've been very relational over a period of time. But our product is very much a Swiss Army knife. It will work on flat files. We've been doing it for multi-tier environments forever. It's that our customers are now going, "I have 96 petabytes in Oracle. I'm about to move over to big data." So I was able to go out and ask, how would I do this in a big data environment? And I found this use case being used by TIME magazine, and then created my environment. And did it off of Amazon. But it was just a use case. It was just a proof of concept that I built to show and demonstrate that. Yeah, my guys back at the office are going, "Kellyn, when you're done with it, you can just deliver it back to us." (laughing) >> Jeff: Alright Kellyn. Well thank you for taking a few minutes to stop by — a pretty interesting story. Everything's getting virtualized: machines, databases... >> Soon us! >> And our data. >> Soon, George! >> Right, not me George... (George laughs) Alright, thanks again Kellyn >> Thank you so much. >> for stopping by. Alright, I'm with George Gilbert. I'm Jeff Frick, you're watching theCUBE from Data Platforms 2017 in Phoenix, Arizona. Thanks for watching. (upbeat electronic music)
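
Kellyn's virtual databases boil down to copy-on-write: one shared gold copy, each environment holding only its own changes, with reads falling through to the parent. Here is a toy Python sketch of that idea — the names are invented and this bears no relation to Delphix's real implementation — just to show why many full read-write copies can cost almost no extra storage:

```python
class VirtualCopy:
    """Toy copy-on-write view over a shared 'gold copy'.

    Reads fall through to the parent until a key is written locally,
    so many dev/test copies share one set of source blocks -- the rough
    idea behind provisioning a virtual database in under a minute.
    """
    def __init__(self, parent):
        self._parent = parent   # shared, read-only source data
        self._delta = {}        # only this copy's changes live here

    def read(self, key):
        return self._delta.get(key, self._parent.get(key))

    def write(self, key, value):
        self._delta[key] = value   # copy-on-write: the parent is untouched

gold = {"customer_1": "original row", "customer_2": "original row"}
dev, test = VirtualCopy(gold), VirtualCopy(gold)

dev.write("customer_1", "dev's edit")
print(dev.read("customer_1"))    # dev's edit
print(test.read("customer_1"))   # original row -- isolation for free
print(gold["customer_1"])        # original row -- gold copy untouched
```

Storage grows only with the deltas, which is why each tester can hold a full read-write copy without duplicating the 16-gig source.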
Colin Riddell, Epic Games - Data Platforms 2017 - #DataPlatforms2017
>> Narrator: Live from The Wigwam in Phoenix, Arizona, it's theCUBE. Covering Data Platforms 2017. Brought to you by Qubole. (techno music) >> Hey, welcome back everybody. Jeff Frick here with theCUBE. We are in The Wigwam Resort, the historic Wigwam Resort, just outside of Phoenix, Arizona at Data Platforms 2017. It's a new big data event. You might say, god, there's already a lot of big data events, but Qubole's taken a different approach to big data. Cloud-first, cloud-native — they're integrated with all the big public clouds, and they all come from big data backgrounds, practitioner backgrounds. So it's a really cool thing, and we're really excited to have our next guest, Colin Riddell. He's a big data architect from Epic Games, was up on a panel earlier today. Colin, welcome. >> Thank you, thank you for having me. >> Absolutely — so, enjoyed your panel, a lot of topics that you guys covered. One of the ones we hear over and over again is: get early wins. How do you drive adoption, change people's behaviors? It's not really a technology story. It's a human factors and behaviors story. So I wonder if you can share some of your experience, some best practices, some stories. >> So I don't know if there's really a rule book on best practices for that. Every environment is different, every company is different. But one thing that seems to be constant is resistance to change in a lot of places, so... >> Jeff: That is consistent. >> We had some challenges when I came in. We were running a system that was on its last legs, basically, and we had to replace it. There was really no choice. There was no fixing it. And so I did actually encounter a fair bit of resistance with regards to that when I started at Epic. >> Now it's interesting, you said a fair amount of resistance. Another one of your lessons was start slow, find some early wins — but you said that you were thrown into a big project right off the bat. >> Colin: So we were, yeah. >> I'm curious how the big project went — but when you do start slow, how small does it need to be where you can start to get these wins to break down the resistance? >> I think the way we approached it was we looked at what was the most crucial process, or the most crucial set of processes. And that's where we started. So that was what we tried to convert first, and then make that data available to people via an alternative method, which was Hive. And once people started using it and learned how to interact with it properly, the barriers started to fall. >> What were some of the difficult change management issues? Where did you come from in terms of the technology platform, and what resistance did you hit? >> It was really the user interface that was the main factor of resistance. So we were running a Hadoop cluster. It was fixed size — it wasn't on-prem, but it was in a private cloud. It was basically, simply, being overloaded. We had to do constant maintenance on it. We had to prop it up. And the performance was degrading and degrading and degrading. The idea behind the replacement was really to give us something that was scalable, that would grow in the future, that wouldn't run into these performance blockers that we were having. But again, like I said, the hardest factor was the user interface differences. People were used to the tool set that they were working with, they liked the way it worked. >> What was the tool set? >> I would rather not actually say that on camera. >> Jeff: That's fine. >> Does it source itself in Redmond or something? 
>> No, no it doesn't. They're not from Redmond. I just don't want to cast aspersions. >> No, you don't need to cast aspersions. The conflict was really just around familiarity with the tool — it wasn't really about a wholesale change in behavior and becoming more data-centric. >> No, because the tool that we replaced was an effort to become more data-centric to begin with. There definitely was a corporate culture of: we want to be more data-informed. So that was not one of the factors that we had to overcome. It was really tool-based. >> But the games market is so competitive, right? You guys have to be on your game all the time, and you've got to keep an eye on what everybody else is doing in their games, and make course corrections, as I understand it, when something becomes hot or new — so you guys have to be super nimble on your feet. How does taking this approach help you be more nimble in the way that you guys get new code out, new functionality? >> It's really, really very easy for us now to inject new events into the game. We basically can break those events out and report on them, or analyze what's going on in the game, for free with the architecture that we have now. >> Does that mean it's the equivalent of — in IT operations, we instrument everything, from the applications, to the middleware, down to the hardware. Are you essentially doing the same to the game, so you can follow the pathway of a gamer, or the hotspots of all the gamers, that sort of thing? >> I'm not sure I fully understand your question. >> When you're running analytics on a massively multiplayer game, what questions are you seeking to answer? >> Really what we are seeking to answer at the moment is: what brings people back? What behaviors can we foster in-- >> Engagement. >> in our players. Yeah, engagement, exactly. >> And that's how you measure engagement — it's just as simple as, do they come back, or time on game? >> That's the simplest measure that we use for it, yeah. >> So Colin, we're short on time, want to give you the last word. When you come to a conference like this, there's a lot of peer interaction — there were some great questions coming out of the panel, around, specifically, how do you measure success? It wasn't technical at all. It's: what are the things that you're using to measure whether stuff is working? I wonder if you can talk to the power of being in an ecosystem of peers here. Any surprises or great insights that you've got? I know we've only been here for a couple days. >> I would say that one of the biggest values — obviously the sessions and the breakouts are great, but I think one of the greatest values here is simply the networking aspect of it. Being able to speak to people who are facing similar challenges, or doing similar things. Even though they're in a completely different domain, the problems are constant. Or common, at least. How do you do machine learning to categorize player behaviors, in our case — and in other cases it's categorization of feedback that people get from websites, stuff like that. I really think the networking aspect is the most valuable thing about conferences like this. >> Alright, awesome. Well, Colin Riddell, Epic Games, thanks for taking a few minutes to stop by theCUBE. >> You're welcome, more than welcome, thank you very much. >> Absolutely. Alright, George Gilbert, I'm Jeff Frick, you're watching theCUBE from Data Platforms 2017 at the historic Wigwam Resort. Thanks for watching. (upbeat techno music)
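
Colin's two technical points — injecting new events into the game cheaply, and measuring engagement as "do they come back" — are easy to sketch. The following is a hypothetical illustration; the event shape, field names, and metric are invented for the example and have no relation to Epic's actual telemetry:

```python
import json
from datetime import date

def make_event(player_id: str, event_type: str, day: date) -> str:
    """Serialize one gameplay event; new event types need no schema change."""
    return json.dumps({"player": player_id, "type": event_type,
                       "day": day.isoformat()})

# A toy event stream: which players showed up on which days.
events = [
    make_event("p1", "session_start", date(2017, 5, 30)),
    make_event("p2", "session_start", date(2017, 5, 30)),
    make_event("p1", "session_start", date(2017, 5, 31)),
]

def next_day_retention(raw_events, day0: date, day1: date) -> float:
    """The simplest 'do they come back?' measure of engagement."""
    seen = {}
    for line in raw_events:
        e = json.loads(line)
        seen.setdefault(e["day"], set()).add(e["player"])
    cohort = seen.get(day0.isoformat(), set())
    returned = cohort & seen.get(day1.isoformat(), set())
    return len(returned) / len(cohort) if cohort else 0.0

# Half of the day-one players came back the next day.
print(next_day_retention(events, date(2017, 5, 30), date(2017, 5, 31)))  # 0.5
```

Because each event is just a self-describing record, adding a new event type is a one-line change at the emitting end — which is the "for free" reporting Colin credits to the new architecture.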
Tripp Smith, Clarity - Data Platforms 2017 - #DataPlatforms2017
>> Narrator: Live from the Wigwam in Phoenix Arizona, it's theCUBE, covering Data Platforms 2017, brought to you by Qubole. >> Hey welcome back everybody, Jeff Frick here with theCUBE. I'm joined by George Gilbert from Wikibon, and we're at Data Platforms 2017. Small conference down at the historic Wigwam Resort, just outside of Phoenix, talking about kind of a new approach to big data, really — a cloud-native approach to big data, and really kind of flipping the old model on its head. We're really excited to be joined by Tripp Smith. He's the CTO of Clarity Insights, up on a panel earlier today. So first off, welcome Tripp. >> Thank you. >> For the folks that aren't familiar with Clarity Insights, give us a little background. >> So Clarity is a pure-play data analytics professional services company. That's all we do. We say we advise, build and enable for our clients. So what that means is data strategy, data engineering and data science, and making sure that we can action the insights that our customers get out of their data analytics platforms. >> Jeff: So not a real busy area these days. >> It's growing pretty well. >> Good for you. So a lot of interesting stuff came up on the panel. But one of the things that you reacted to — I reacted to as well — from the keynote, was this concept of, you know, before, you had kind of the data scientist with the data platform behind them being service providers to the business units. Really turning that model on its head. Giving access to the data to all the business units and people that want to consume it. Making the data team really enablers of kind of a platform play. Seemed to really resonate with you as well. >> Yeah, absolutely. So if you think about it, a lot of the focus on legacy platforms was driven by scarcity around the resources to deal with data. So you created this almost pyramid structure with IT and architecture at the top. They were the gatekeepers and kind of the single door where insights got out to the business. >> Jeff: Right. >> So in the big data world and with cloud, with elastic scale, we've been able to turn that around and actually create much more collaborative friction in parallel with the business. Putting the data engineers, data scientists and business-focused analysts together, and making them more partners than just customers of IT. >> Jeff: Right, very interesting — to think of it as a partner. It's a very different mindset. The other piece that came up over and over in the Q&A at the end was: how do people get started? How are they successful? So you deal with a lot of customers, right? That's your business. What are some stories, or one that you can share, of best practices — when people come and they say, we obviously hired you, we wrote a check, but how do we get started, where do we go first? How do you help people out? >> We focus on self-funding analytic programs. Getting those early wins tends to pay for more investment in analytics. So if you look at the ability to scale out as a starting point, then aligning that business value and the roadmap in a way that's going to both demonstrate the value along the way and contribute to that capability is important. I think we also recommend to our clients that they solve the hard problems around security and data governance and compliance first. Because that allows them to deal with more valuable data and put that to work for their business. >> So is there any kind of low-hanging fruit that you see time and time again? 
That just is like, ah, we can do this. We know it's got huge ROI. It's either neglected 'cause they don't think it's valuable, or it's neglected because it's in the back room. Or are there any easy steps — do you find some patterns? >> Yeah, absolutely. So we go to market by industry vertical. So within each vertical, we've defined the value maps and ROI levers within that business, and then align a lot of our analytic solutions to those ROI levers. In doing that, we focus on being able to build a small, multifunctional team that can work directly with the business, and then deliver that in real time in an interactive way. >> Right. Another thing — you just talked about security and governance. Are we past the security concerns about public cloud? Does that even come up as an issue anymore? >> You know, I think there was a great comment today: if you had money, you wouldn't put it in your safe at home. You'd put it in a bank. >> Jeff: I missed that one, that's a good one. 
So again, being able to shift that off and having that kind of support gives you the ability to focus back on what really makes a difference for you. >> So Tripp we're running out of time. We got a really tight schedule here. I'm just curious, it's a busy conference season. Big data's all over the place. How did you end up here? What is it about this conference and this technology that got you to come down to the, I think it's only a 106 today, weather to take it in. What do you see that's a special opportunity here? >> Yeah you know, this is Data Platforms 2017. It's been a really great conference, just in the focus on being able to look at Cloud and look at this differentiation. Outside of the realm of inventing new shiny objects and really putting it to work for new business cases and that sort of thing. >> Jeff: Well Tripp Smith, thanks for stopping by theCUBE. >> Excellent, Thank you guys for having me. >> All right, he's George Gilbert, I'm Jeff Frick. You're watching Data Platforms 2017 from the historic Wigwam Resort in Phoenix Arizona. Thanks for watching. (techno music)
Karthik Ramasamy, Streamlio - Data Platforms 2017 - #DataPlatforms2017
>> Narrator: Live from the Wigwam in Phoenix, Arizona, it is theCUBE, covering Data Platforms 2017. Brought to you by Qubole. >> Hey welcome back everybody. Jeff Frick with theCUBE. We are down at the historic Wigwam — 99 years young — just outside of Phoenix, Arizona, at Data Platforms 2017. It is really talking about a new approach to big data in the cloud, put on by Qubole — about 200 people, very interesting conversation this morning, and we're really interested to have Karthik Ramasamy. He is the co-founder of Streamlio, which is still in stealth mode according to his LinkedIn profile, so we won't talk about that — but a longtime Twitter guy, and he really shared some great lessons this morning about things that you guys learned while growing Twitter. So welcome. >> Thank you, thanks for having me. >> Absolutely. One of the key parts of your whole talk was this concept of real time. I always joke with people: real time is in time to do something about it. You went through a bunch of examples — real time is really a variable depending on what the application is — but at Twitter, real time was super, super important. >> Yes, it is indeed important because of the nature of the streaming data. The nature of the Twitter data is streaming data, because the tweets are coming at a high velocity. And Twitter positioned itself as more of a real time delivery company, because that way, whatever information we get within Twitter, we need to have a strong time budget before we can deliver it to people, so that when people consume the information, the information is live or real time. >> But real time is becoming — obviously for Twitter, but for a lot of big enterprises — more and more important, and the great analogy referred to before is: you used sample data, sample historic data, to make decisions. Now you want to keep all the data in real time to make decisions, so it's a very different way you drive your decision-making process. >> Very different way of thinking. Especially considering the fact, as you said, that enterprises are getting into understanding what real time means for them. If you look at some of the traditional enterprises like financial, they understand the value of real time. Similarly the upcoming new use cases like IoT — they understand the value of real time, like autonomous vehicles where they have to make quick decisions. Healthcare — you have to make quick decisions, because preventive and predictive maintenance is very important in those kinds of segments. So because of those segments it's getting really popular, and traditional enterprises like retail are also valuing real time, because it allows them to blend into the user behavior, so that they can recommend products and other things in real time, and people can react to that. So it's becoming more and more important, that's what I would say. >> So Hadoop started out as mostly batch infrastructure, and Twitter was a pioneer in the design pattern to accommodate both batch and real time. How has that big data infrastructure evolved so that, one, you don't have to split batch and real time — and what should we expect going forward to make that platform stronger in terms of real time analytics, and potentially so that it can inform decisions in systems of record? >> I think today, as of now, there are two different infrastructures. One in general is the Hadoop infrastructure. The other one is more of a real time infrastructure at this point. 
>> So Hadoop started out as mostly batch infrastructure, and Twitter was a pioneer in the design pattern to accommodate both batch and real time. How has that big data infrastructure evolved so that, one, you don't have to split batch and real time, and what should we expect going forward to make that platform stronger in terms of real time analytics, potentially so that it can inform decisions in systems of record?
>> I think today there are two different infrastructures. One, in general, is the Hadoop infrastructure. The other one is more of a real time infrastructure at this point. And Hadoop is kind of considered as this mega store where, similar to all the rivers reaching the sea, it becomes a storage sea where all the data comes and is stored. But before the data lands there, a lot of analytics and a lot of visibility about the data, from the point of its creation to where it ends up, is getting done on those rivers, whatever you call them, the data rivers. So you can get a lot of analytics done before the data ends up there, and those analytics are more live than the other analytics. Hadoop had its own limitations in terms of how much data it can handle and how real time the data can be. For example, you can dump the data in real time into Hadoop, but until you close the file you cannot see the data at all, so a time budget comes into play there. And you could write small, small files instead, but then the namenode will blow up, because if within a day you write a million files, the namenode is not going to sustain that. So those are the trade-offs. That's one of the reasons we ended up building new real time infrastructure like the distributed log, which makes the data immediately visible within a three to five millisecond timeframe the moment it comes in.
>> The distributed log you're talking about would be Kafka. The output of that would be to train a model, or just score a model, and then would that model essentially be carved off from this big data platform and be integrated within a system of record where it would inform decisions?
>> There are multiple things you could do. First of all, with the distributed log you can think of the data as landing in a staging environment, and once it lands there, there is a lot of sharing of that same data going on in real time. When several jobs are using some popular data source, the log provides a high fan-out, in the sense that 100 jobs can consume the same data, and they can each be at different parts of the data itself. So that provides a nice sharing environment. Now once the data is there, it is being used for different kinds of analytics, and one of them could be model enhancement, because typically in the batch segment you build the model, since you're looking at a lot of data and other things. Once the model is built, that model is pre-loaded into the real time compute environment, like Heron, and then you look up this model and serve data based on whatever that model tells you. For example, when you do ad serving, you look up that model for the most relevant ad for you to click. Then the next aspect is model enhancement, because user behavior is going to change over a period of time. Can you capture that and incrementally update the model, so those things are also partly done on the real time side, rather than recomputing the batch again and again and again?
>> Okay, so it's sort of like, what's the delta?
>> Karthik: Yes.
>> Let's train on the delta and let's score on the delta.
>> Yes, and once the delta gets updated, then when new user behavior comes in, they can look at that new model that's being continuously enhanced, and once that enhancement is captured, you know that user behavior is changing. And ads are served accordingly.
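The "train on the delta, score on the delta" loop Karthik describes can be sketched in a few lines. The interview doesn't name a modeling library, so the following uses scikit-learn's SGDClassifier purely as a stand-in: a model is built in batch, pre-loaded for serving, and then incrementally updated with partial_fit as fresh behavior arrives, instead of recomputing the whole batch. All data here is synthetic.

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Batch phase: build the initial model from a large pile of historical data.
X_hist = rng.normal(size=(1000, 4))        # stand-in for historical features
y_hist = (X_hist[:, 0] > 0).astype(int)    # stand-in for click / no-click labels
model = SGDClassifier()
model.fit(X_hist, y_hist)

# Real time phase: the pre-loaded model scores incoming events...
X_fresh = rng.normal(size=(50, 4))         # a micro-batch of fresh events
served = model.predict(X_fresh)            # serve based on the current model

# ...and observed outcomes flow back as a delta, updating the model
# incrementally instead of re-running the full batch job.
y_fresh = (X_fresh[:, 0] > 0).astype(int)  # observed user behavior
model.partial_fit(X_fresh, y_fresh)        # train on the delta

In a production topology like the one described, the fresh events would arrive off the distributed log and the update would run inside the real time compute layer; the sketch only shows the incremental-update step itself.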
>> Okay, so now that our customers are getting serious about moving to the cloud with their big data platforms and the applications on them, have you seen a change in the patterns of the apps they're looking to build, or a change in the makeup of the platform that they want to use?
>> That depends. One disclosure is that I've worked with Amazon and AWS, but within the companies I've worked for, everything is on-prem. Having said that, cloud is nice because it gives you machines on the fly whenever you need them, and it gives you a bunch of tools around that where you can bootstrap everything. That works ideally for small and medium companies, but for the big companies, one of the things we calculate is the cost we would have to pay versus doing it in-house, and there's still a huge gap, unless a cloud provider is going to offer a huge discount for the big companies to move in. That is always the challenge we run into, because think about it: I have 10 or 20,000 nodes of Hadoop. Can I move all of them into Amazon AWS, and how much am I going to pay, versus the cost of maintaining my own data centers and everything? I don't know the latest pricing, but approximately it comes to three x cost-wise.
>> If you're using...
>> Our own on-prem data center, with all of the staffing and everything, there's a difference of, I would say, three x.
>> With on-prem being higher.
>> On-prem being lower.
>> Lower?
>> Yes.
>> But that assumes then that you've got flat utilization.
>> Flat utilization, yes. Cloud of course gives you elasticity, the ability to expand out at scale, and it gives the illusion of unlimited resources. But in our case, if you're provisioning that many machines, at least 50 or 60% of them are used for production, and the rest are used for staging, development, and all the various other environments. Which means that even though those machines are only around 50% utilized, you still end up saving so much that you can operate at around one-third of the cost it would be in the cloud.
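Karthik's back-of-the-envelope comparison is easy to reproduce. The sketch below works through the arithmetic under his stated assumptions, a roughly three x per-node cloud premium and 50 to 60% of on-prem machines serving production; the unit cost is a normalized placeholder, not a quoted price.

# Hypothetical normalized costs; the 3x ratio is from the interview,
# the absolute unit is a placeholder.
ON_PREM_NODE_COST = 1.0
CLOUD_NODE_COST = 3.0 * ON_PREM_NODE_COST

nodes = 20_000            # "10 or 20,000 nodes of Hadoop"
production_share = 0.5    # 50-60% of machines run production workloads

on_prem_total = nodes * ON_PREM_NODE_COST
cloud_lift_and_shift = nodes * CLOUD_NODE_COST
# Even granting the cloud perfect efficiency, paying only for the
# production share, the gap at this scale remains large:
cloud_prod_only = nodes * production_share * CLOUD_NODE_COST

print(f"on-prem:              {on_prem_total:>8,.0f} units")
print(f"cloud, whole fleet:   {cloud_lift_and_shift:>8,.0f} units  ({cloud_lift_and_shift / on_prem_total:.1f}x)")
print(f"cloud, prod only:     {cloud_prod_only:>8,.0f} units  ({cloud_prod_only / on_prem_total:.1f}x)")

Moving the whole fleet costs three x on these assumptions, which matches the "on-prem at one-third of the cloud cost" claim; even moving only the production share still comes out at 1.5x, which is the gap Karthik says has to close before very large shops move.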
>> Alright, Karthik, that opens up a whole can of interesting conversations that we just don't have time to jump into, so I'll give you the last word. When can we expect you to come out of stealth, or is that stealthy too?
>> That is kind of stealthy too.
>> Okay, fair enough, I don't want to put you on the spot, but thanks for stopping by and sharing your story.
>> Karthik: Thanks, thanks for everything.
>> Alright, he is Karthik, he is George, I'm Jeff, you're watching theCUBE. We are at the Wigwam resort just outside of Phoenix at Data Platforms 2017. We will be back after this short break. Thanks for watching.