Alex Sadovsky, Oracle - Data Platforms 2017 - #DataPlatforms2017
>> Announcer: Live from the Wigwam in Phoenix, Arizona, it's the CUBE, covering Data Platforms 2017. Brought to you by Qubole.
>> Hey, welcome back everybody, Jeff Frick here with the CUBE, along with George Gilbert. We're at Data Platforms 2017, at the historic, 99-years-young Wigwam Resort outside of Phoenix, and we're excited to be joined by our next guest, Alex Sadovsky. He's the director of data science for Oracle Data Cloud. Welcome.
>> Thanks, thanks for having me.
>> Absolutely. So I know we've got a short time window, you're racing off to your next session. So, for the people that aren't here, what are you going to be talking about in your session?
>> So, the Oracle Data Cloud, what we do is online advertising, and essentially we have lots and lots of data. Customers come to us and they have some sort of question in mind. They want to say, I want to figure out who's going to buy a minivan in California next month, or who's going to get a hotel in Las Vegas, who's going to buy Kraft macaroni and cheese? All sorts of different questions. We have all of that data, and we have to turn it into actionable insights, into audiences for them, so they can advertise on Facebook, Twitter, all over the web. And so what this talk is really focusing on is, how do we take all of this data and use it efficiently? And it's going to cover the technologies that we've used, specifically Hive, and then moving that technology over to Spark, just so that we can use more data, get quicker processing, and essentially make our clients have a better experience and give 'em a better product.
>> And do the clients execute the results of this process inside their other Oracle apps, or is it something that they can use with any number of apps?
>> So, a lot of the ways that we work, we're actually interfaced with companies like Facebook and Twitter directly. And so essentially what we're doing is we're partnering with them, so that the client, all they really need to do is kind of come to us, either onboard some data through maybe other Oracle applications or onboard data directly through us, and then push it out. We help push it all the way through the process, all the way into Facebook, etc.
>> Yes, 'cause we were covering Oracle Modern Marketing, which is now Oracle Modern Customer Experience. I'm sure you guys must be tightly integrated with all that.
>> Yeah, and so for Oracle Data Cloud it's kind of interesting, we're a collaboration of five recently acquired start-ups, everything from two to three years ago, all of this coming together. So, for us, we're really excited, because we're just at the tip of the iceberg of getting into the whole Oracle ecosystem and having that help build up our product even better.
>> So, when you say partner with Facebook or Twitter, that would be for brand or direct response advertising that one of your B2B clients has signed up for? Or I should say, B2B, your B the client is B, and the end customer's C, so it's a B2B2C. And now, okay, so you help them in a consultative way. You have the data, you have a consultative sales approach. Are you building models for them? Or are you telling them, sort of running a model?
>> Alex: Yeah.
>> Sort of, which is it?
>> So, we run models based upon data. So, a customer could come to us with, here are a thousand people that the customer knows bought their product last month, and they say, we want to expand our business, we want to advertise to 20 million people who might be similar to those thousand.
And so that's where all of our data comes in. We can look at those thousand people and we can say, hey, did you guys know that most of your customers are millennials? Did you know that most of them tend to live in west coast or east coast populated cities? And we're not really consulting in the sense of, like, there's people looking at the data; it's all machine learning. And so computers are looking at all of our data to help get insights from what the customer is bringing to us.
>> So, would it be fair to say then that, let's say, the thousand examples that the customer brings in is the training data?
>> Yes.
>> And then you use your data in your databases, your consumer databases, to generate essentially scores for the audiences they're going to send out to.
>> That's 100% right. They come in with a thousand of their customers, and we see how those customers rank up against every single household in the entire United States.
>> I was going to say, we're going to be at Spark Summit in a couple weeks or a week, whenever it is. I can't keep track of all these shows. So, we can't do the whole thing here, but in three minutes or less, why Hive to Spark?
>> So, the number one reason for us, and the number one reason I think a lot of people are moving to Spark, is just speed. Without getting into a lot of technical details, there's just a lot better, more flexible engine underneath Spark than kind of traditional Hive.
>> And then machine learning models are, most of the libraries are built in, which Hive doesn't have.
>> Yeah, machine learning is really built into Spark. There's, you know, whole projects within Spark built around that. And so, for us, Spark considers machine learning kind of a first-class citizen. And since that's essentially what our business is, we go 100% into Spark as well.
>> So, let me ask you, what is the scope, now and potentially in the future, for these data-based predictive models, where a customer comes to you with essentially some labeled data (I guess that's the training data)? Right now you have data in what categories? And then what categories would you like to have?
>> So, we have data everything from what people are doing on the web, so what they're searching for, what websites they're going to. We have grocery store data, so what people are buying in the grocery store. We have retail data, so what people are buying in the malls. Because a lot of what happens is, even though consumers are spending a lot more time on the web, 80%-90% of purchases are still made in the store. So we have all of this actual real-world purchase data through partnerships with different retail partners, including automotive data, too. So that's really, like, the core of our data. So really what we try to do is have data sets strategically placed all around, and that's why the Oracle Data Cloud is made up of so many different start-ups; we're really getting expertise from different areas, for different data sets, to bring that together.
>> Do you need to buy those sources of data? Or can you license?
>> Data is everything from licensed, to purchased outright, to shared, revenue sharing with other companies. There's a huge data market right now. It's kind of the data gold rush, and we're trying to get in anywhere we can, figure out what's going to help us and what's going to help our customers make better models.
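(A quick illustration of the lookalike workflow Sadovsky describes: a seed list of roughly a thousand known buyers is used to score every household and keep the top of the ranking as the audience, and that maps naturally onto Spark, the engine the team moved to. The sketch below is hypothetical; the paths, column names, feature set, and the choice of logistic regression are assumptions for illustration, not Oracle Data Cloud's actual pipeline.)

```python
# Hypothetical lookalike scoring in PySpark: label a small seed of known
# buyers, fit a model over the household universe, score everyone, and
# keep the top of the ranking as the advertising audience.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.functions import vector_to_array  # Spark 3+ helper

spark = SparkSession.builder.appName("lookalike-sketch").getOrCreate()

# Universe of households with behavioral features (invented schema).
households = spark.read.parquet("s3://example-bucket/household_features/")

# Seed: ~1,000 household_ids the advertiser knows bought the product.
seed = spark.read.csv("s3://example-bucket/seed_buyers.csv", header=True)

# Label seed members 1.0, everyone else 0.0 (real pipelines sample the
# negative class far more carefully than this).
labeled = (households
           .join(seed.withColumn("label", F.lit(1.0)), "household_id", "left")
           .fillna({"label": 0.0}))

assembler = VectorAssembler(
    inputCols=["web_visits", "grocery_spend", "retail_spend"],
    outputCol="features")
model = LogisticRegression(labelCol="label").fit(assembler.transform(labeled))

# Score every household and keep the 20M most similar to the seed.
scored = model.transform(assembler.transform(households)).withColumn(
    "p_buy", vector_to_array(F.col("probability"))[1])
audience = scored.orderBy(F.desc("p_buy")).limit(20_000_000)
```

(Logistic regression here just stands in for whatever model family is actually used; the shape of the workflow, label the seed, score the universe, rank, cut, is the point.)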
>> What would you like to see in terms of, if you look out a couple years, where would you like to see your data assets sort of augment all your Oracle applications?
>> Yeah, so I think... So, augmenting Oracle, really, we have so many different data assets, everything from live streaming data of what people are searching for on the web, to historically what someone has bought in the last three years. And so, as we partner more and more with Oracle... Oracle has different things in healthcare, in retail, in all sorts of B2B applications, and our data really can fit almost everywhere. It's really like a data-driven sort of product. And so we've been partnering with Oracle left and right, many different groups, just trying to figure out where this data can help augment kind of your services.
>> Alright, Alex, well, we got to leave it there. That was a good summary. I know you got to race off to your thing. I'll let you take a breath and get a glass of water. So thanks for squeezing us into your busy day.
>> Alex: Thanks so much.
>> Alright, he's Alex, he's George, I'm Jeff, you're watching the CUBE from Data Platforms 2017. We'll be right back after this short break. Thanks for watching.
Kellyn Pot'Vin-Gorman, Delphix - Data Platforms 2017 - #DataPlatforms2017
>> Announcer: Live from the Wigwam in Phoenix, Arizona. It's theCUBE, covering Data Platforms 2017. Brought to you by Qubole.
>> Hey, welcome back everybody. Jeff Frick here with theCUBE. We're at the historic Wigwam Resort, 99 years young, just outside of Phoenix, at Data Platforms 2017. I'm Jeff Frick here with George Gilbert from Wikibon, who's co-hosting with me all day. Getting to the end of the day. And we're excited to have our next guest. She is Kellyn Gorman, the technical intelligence manager in the office of the CTO at Delphix. Welcome.
>> Yes, thank you, thank you so much.
>> Absolutely, so what is Delphix, for people that aren't familiar with Delphix?
>> Most of us realize that the database, and data in general, is the bottleneck, and Delphix completely revolutionizes that. We remove it from being the bottleneck by virtualizing data.
>> So you must love this show.
>> Oh I do, I do. I'm hearing all about all kinds of new terms that we can take advantage of.
>> Right, Cloud-Native and separate, you know, and I think just the whole concept of atomic computing. Breaking it down, separating storage from compute. Breaking it down into smaller parts. Sounds like it fits right into kind of your guys' wheelhouse.
>> Yeah, I kind of want to containerize it all and be able to move it everywhere. But I love it. Yeah.
>> So what do you think of this whole concept of Data Ops? We've been talking about Dev Ops for, I don't know how long... How long have we been talking about Dev Ops, George? Five years? Six years? A while?
>> Yeah, a while. (small chuckle)
>> But now...
>> Actually maybe eight years.
>> Jeff: You're dating yourself, George. (all laugh) Now we're talking about Data Ops, right? And there's a lot of talk of Data Ops. So this is the first time I've really heard it coined in such a way where it really becomes the primary driver in the way that you basically deliver value inside your organization.
>> Oh absolutely. You know, I come from the database realm. I was a DBA for over two decades, and Dev Ops was a hard sell to a lot of DBAs. They didn't want to hear about it. I tried to introduce it over and over: the idea of automating, and taking us kind of out of this manual intervention that introduced, many times, human error. So Dev Ops was a huge step forward, getting that out of there. But the database, and data in general, was still this bottleneck. So Data Ops is the idea that you automate all of this, and if you virtualize that data, we found with Delphix, that removed that last hurdle. And that was my, I guess, my session was on virtualizing big data. The idea that I could take any kind of structured or unstructured file and virtualize that as well, and instead of deploying it to multiple environments, I was able to deploy it once and actually do IO on demand.
>> So let's peel the onion on that a little bit. What does it mean to virtualize data? And how does that break databases' bottleneck on the application?
>> Well, right now, when you talk about relational data or any kind of legacy data store, people are duplicating that through archaic processes. So if we talk about Oracle, they're using things like Data Pump. They're using transportable tablespaces. These are very cumbersome, and they take a very long time. Especially with the introduction of the cloud, there's a lot of room for failure. It's not made for that, especially as the network is our last bottleneck. That's what we're also seeing for many of these folks.
When we introduce big data, many of these environments, many of these, I guess you'd say, projects came out of open source. They were done as a need, as a necessity to fulfill. And they've got a lot of moving pieces. And to be able to containerize that, and then deploy it once and then virtualize it, so instead of, let's say, you have 16 gigs that you need to duplicate here and there, over and over again, especially if you're going on-prem or to the cloud, I'm able to do it once and then do that IO on demand and go back to a gold copy, a central location. And it makes it look like it's there. I was able to deploy a 16-gig file to multiple environments in less than a minute. And then each of those developers each have their own environment. Each tester has their own, and they actually have a full, robust read-write copy. That's amazing to folks. All of a sudden, they're not held back by it.
>> So our infrastructure analyst and our Wikibon research CTO David Floyer, if I'm understanding this correctly, talks about this where it's almost like a snapshot.
>> Absolutely.
>> And it's a read-write snapshot, although you're probably not going to merge it back into the original. And this way dev, test, and whoever else wants to operate on live data can do that.
>> Absolutely, it's full read-write, what we call data version control. We've always had version control at the code level. You may have had it at the actual server level. But you've rarely ever had it at the data level, for the database or with flat files. What I used was the cms.gov data. It's available to everyone, it's public data. And we realized that these files were quite large and cumbersome. And I was able to reproduce it and enhance what they were doing at TIME magazine, and create a use case that made sense to a lot of people. Things that they're seeing in their real-world environments.
>> So, tell us more, elaborate how dev ops expands on this, I'm sorry, not dev ops, data ops. Take that as an example and generalize it some more, so that we see how, if DBAs were a bottleneck, they can now become an enabler?
>> One, it's getting them to raise new skills. Many DBAs think that their value relies on those archaic processes. "It's going to take me three weeks to do this," so I have three weeks of value, instead of saying, "I am going to be able to do this in one day," and those other resources are now also valuable because they're doing their jobs. We're also seeing that data was seen as the centralized point. People were trying to come up with solutions to these pain points. We're able to take that out completely, and people are able to embrace agility. They have agile environments now. Dev Ops means that they're able to automate that very easily, instead of having that stopping point of constantly hitting the data and saying, "I've got to take time to refresh this." "How am I going to refresh it?" "Can I do just certain..." We hear about this all the time with testing. When I go to testing summits, they are trying to create synchronized virtualized data. They're creating test data sets that they have to manage. It may not be the same as production, where I can actually create a container of the entire development or production environment, and refresh that back. And people are working on the full product. There's no room for error that you're seeing, where you would have that if you were just taking a piece of it, or if you were only able to grab just one tier of that environment because the data was too large before.
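(To make the virtualization idea concrete: one gold copy is shared, each "full" copy only stores the blocks it changes, and every other read falls through to the gold copy on demand. The toy sketch below shows the copy-on-write pattern only; it is a conceptual illustration, not Delphix's implementation.)

```python
# Conceptual sketch of copy-on-write data virtualization: one shared
# gold copy, with each virtual copy storing only its own changed blocks,
# so a "full" read-write copy costs almost no extra storage and can be
# provisioned in seconds.

class GoldCopy:
    """The single source copy, e.g. a 16-gig file or database image."""
    def __init__(self, blocks):
        self.blocks = blocks            # block_id -> bytes

class VirtualCopy:
    """A read-write view: reads fall through to gold, writes stay local."""
    def __init__(self, gold):
        self.gold = gold
        self.delta = {}                 # only changed blocks live here

    def read(self, block_id):
        # IO on demand: serve the local delta if present, else the gold copy.
        return self.delta.get(block_id, self.gold.blocks[block_id])

    def write(self, block_id, data):
        # Copy-on-write: the gold copy itself is never touched.
        self.delta[block_id] = data

gold = GoldCopy({0: b"orders", 1: b"customers"})
dev, test = VirtualCopy(gold), VirtualCopy(gold)   # two "full" copies, instantly
dev.write(1, b"masked-customers")
assert test.read(1) == b"customers"                # each copy is isolated
```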
>> So would the automation part be a generation of snapshots, one or more snapshots, and then the sort of orchestration, distribution, to get it to the intended audiences?
>> Yes, and we would use
>> Okay.
>> things like Jenkins and Chef, normal dev ops tools, work along with this, along with command line utilities that are part of our product, to allow people to just create what they would create normally. But many times it's been siloed and, like I said, worked around that data. We've included the data as part of that, so that they can deploy it just as fast.
>> So a lot of the conversation here this morning was really about, put the data all in, pick your favorite public cloud, to enable access to all the applications, to the APIs, through all different types of things. How does that impact kind of what you guys do, conceptually?
>> If you're able to containerize that, it makes you capable of deploying to multiple clouds, which is what we're finding. About 60% of our customers are in more than one cloud, two to five exactly. As we're dealing with that and recognizing that, it's kind of like looking at your cloud environments like your phone providers. People see something shiny and new, a better price point, a lesser dollar. We're able to provide that, one, by saving all that storage space. It's virtualized, it's not taking a lot of disk space. Second of all, we're seeing them say, "You know, I'm going to go over to Google." Oh, guess what? This project says they need the data, and they need to actually take the data source over to Amazon now. We're able to do that very easily. And we do it from multi-tier: flat files, the data, legacy data sources, as well as our application tier.
>> Now, when you're doing these snapshots, my understanding, if I'm getting it right, is it's like a, it's not a full Xerox. It's more like the delta. Like if someone's doing test dev, they have some portion of the source of truth, and as they make changes to it, it grows to include the edits until they're done, in which case then the whole thing is blown away.
>> It depends on the technology you're looking at. Ours is able to track that. So when we're talking about a virtual database, we're using the native recovery mechanisms. Kind of think of it as a perpetual recovery state inside our Delphix engine. So those changes are going on, and then you have your VDBs that are a snapshot in time that they're working on.
>> Oh, so like you take a snapshot and then it's like a journal...
>> The transactional data from the logs is continually applied. Of course it's different depending on each technology, so we do it differently for Sybase versus Oracle versus SQL Server, and so on and so forth. Virtual files, when we talk about flat files, are different as well. With the parent, you take an exact snapshot of it, but it's really just projecting that NFS mount to another place. So that mount, if you replace those files, or update them of course, then you would be able to refresh and create a new snapshot of those files. So somebody said, "We refresh these files every single night." You would be able to then refresh and project them out to the new place.
>> Oh, so it's almost like you're sub-classing them...
>> Yes.
>> Okay, interesting... When you go into a company that's got a big data initiative, where do you fit in the discussion, in the sequence? How do you position the value-add relative to the data platform, when that's sort of the center of the priority, of getting a platform in place?
>> Well, that's what's so interesting about this, is that we haven't really talked to a lot of big data companies. We've been very relational over a period of time. But our product is very much a Swiss Army knife. It will work on flat files. We've been doing it for multi-tier environments forever. It's that our customers are now going, "I have 96 petabytes in Oracle. I'm about to move over to big data." So I was able to go out and ask, how would I do this in a big data environment? And I found this use case being used by TIME magazine, and then created my environment, and did it off of Amazon. But it was just a use case, just a proof of concept that I built to show and demonstrate that. Yeah, my guys back at the office are going, "Kellyn, when you're done with it, you can just deliver it back to us." (laughing)
>> Jeff: Alright, Kellyn. Well, thank you for taking a few minutes to stop by, and pretty interesting story. Everything's getting virtualized: machines, databases...
>> Soon us!
>> And our data.
>> Soon, George!
>> Right, not me, George... (George laughs) Alright, thanks again, Kellyn
>> Thank you so much.
>> for stopping by. Alright, I'm with George Gilbert, I'm Jeff Frick, you're watching theCUBE from Data Platforms 2017 in Phoenix, Arizona. Thanks for watching. (upbeat electronic music)
Colin Riddell, Epic Games - Data Platforms 2017 - #DataPlatforms2017
>> Narrator: Live from The Wigwam in Phoenix, Arizona, it's theCUBE, covering Data Platforms 2017. Brought to you by Qubole. (techno music)
>> Hey, welcome back everybody. Jeff Frick here with theCUBE. We are at the historic Wigwam Resort, just outside of Phoenix, Arizona, at Data Platforms 2017. It's a new big data event. You might say, god, there's already a lot of big data events, but Qubole's taken a different approach to big data. Cloud-first, cloud-native, they're integrated with all the big public clouds, and they all come from big data backgrounds, practitioner backgrounds. So it's a really cool thing, and we're really excited to have our next guest, Colin Riddell. He's a big data architect from Epic Games, was up on a panel earlier today. Colin, welcome.
>> Thank you, thank you for having me.
>> Absolutely, so, enjoyed your panel, a lot of topics that you guys covered. One of the ones we hear over and over again is: get early wins. How do you drive adoption, change people's behaviors? It's not really a technology story. It's a human factors and behaviors story. So I wonder if you can share some of your experience, some best practices, some stories.
>> So I don't know if there's really a rule book on best practices for that. Every environment is different, every company is different. But one thing that seems to be constant is resistance to change in a lot of places, so...
>> Jeff: That is consistent.
>> We had some challenges when I came in. We were running a system that was on its last legs, basically, and we had to replace it. There was really no choice. There was no fixing it. And so, I did actually encounter a fair bit of resistance with regards to that when I started at Epic.
>> Now it's interesting, you said a fair amount of resistance. Another one of your lessons was start slow, find some early wins, but you said that you were thrown into a big project right off the bat.
>> Colin: So we were, yeah.
>> I'm curious, how did the big project go? But when you do start slow, how small does it need to be where you can start to get these wins to break down the resistance?
>> I think what we, the way we approached it was, we looked at what was the most crucial process, or the most crucial set of processes. And that's where we started. So that was what we tried to convert first, and then make that data available to people via an alternative method, which was Hive. And once people started using it and learned how to interact with it properly, the barriers start to fall.
>> What were some of the difficult change management issues? Where did you come from in terms of the technology platform, and what resistance did you hit?
>> So it was really the user interface that was the main factor of resistance. So we were running a Hadoop cluster. It was fixed-size; it wasn't on-prem, but it was in a private cloud. It was basically, simply, being overloaded. We had to do constant maintenance on it. We had to prop it up. And the performance was degrading and degrading and degrading. The idea behind the replacement was really to give us something that was scalable, that would grow in the future, that wouldn't run into these performance blockers that we were having. But again, like I said, the hardest factor was the user interface differences. People were used to the tool set that they were working with, they liked the way it worked.
>> What was the tool set?
>> I would rather not actually say that on camera.
>> Jeff: That's fine.
>> Does it source itself in Redmond or something?
>> No, no it doesn't, they're not from Redmond. I just don't want to cast aspersions.
>> No, you don't need to cast aspersions. The conflict was really just around familiarity with the tool; it wasn't really about a wholesale change in behavior and becoming more data-centric.
>> No, because the tool that we replaced was an effort to become more data-centric to begin with. There definitely was a corporate culture of: we want to be more data-informed. So that was not one of the factors that we had to overcome. It was really tool-based.
>> But the games market is so competitive, right? You guys have to be on your game all the time, and you've got to keep an eye on what everybody else is doing in their games, and make course corrections, as I understand, when something becomes hot, or new. So you guys have to be super nimble on your feet. How does taking this approach help you be more nimble in the way that you guys get new code out, new functionality?
>> It's really, really very easy for us now to inject new events into the game. We basically can break those events out and report on them, or analyze what's going on in the game, for free with the architecture that we have now.
>> Does that mean it's the equivalent of, in IT operations, we instrument everything from the applications, to the middleware, down to the hardware. Are you essentially doing the same to the game, so you can follow the pathway of a gamer, or the hotspots of all the gamers, that sort of thing?
>> I'm not sure I fully understand your question.
>> When you're running analytics on a massively multiplayer game, what questions are you seeking to answer?
>> Really, what we are seeking to answer at the moment is: what brings people back? What behaviors can we foster in--
>> Engagement.
>> in our players. Yeah, engagement, exactly.
>> And that's how you measure engagement, it's just as simple as, do they come back, or time on game?
>> That's the most simple measure that we use for it, yeah.
>> So Colin, we're short on time, want to give you the last word. When you come to a conference like this, there's a lot of peer interaction. There were some great questions coming out of the panel, around, specifically, how do you measure success? It wasn't technical at all. It's, what are the things that you're using to measure whether stuff is working. I wonder if you can talk to the power of being in an ecosystem of peers here. Any surprises or great insights that you've got? I know we've only been here for a couple days.
>> I would say that one of the biggest values, obviously the sessions and the breakouts are great, but I think one of the greatest values here is simply the networking aspect of it. Being able to speak to people who are facing similar challenges, or doing similar things. Even though they're in a completely different domain, the problems are constant. Or common, at least. How do you do machine learning to categorize player behaviors, in our case, and in other cases it's categorization of feedback that people get from websites, stuff like that. I really think the networking aspect is the most valuable thing about conferences like this.
>> Alright, awesome. Well, Colin Riddell, Epic Games, thanks for taking a few minutes to stop by the CUBE.
>> You're welcome, more than welcome, thank you very much.
>> Absolutely. Alright, George Gilbert, I'm Jeff Frick, you're watching the CUBE from Data Platforms 2017 at the historic Wigwam Resort. Thanks for watching. (upbeat techno music)
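(A side note on the engagement measure Riddell describes, "do they come back?": the simplest version is cohort retention computed straight from game event logs. A minimal sketch, with an invented event schema and toy data:)

```python
# Day-N retention: of the players active on a cohort day, what share
# show up again exactly n days later? Events are (player_id, date) pairs.
from datetime import date, timedelta

events = [
    ("p1", date(2017, 5, 1)), ("p1", date(2017, 5, 2)),
    ("p2", date(2017, 5, 1)),
    ("p3", date(2017, 5, 1)), ("p3", date(2017, 5, 8)),
]

def retention(events, cohort_day, n):
    cohort = {p for p, d in events if d == cohort_day}
    target = cohort_day + timedelta(days=n)
    returned = {p for p, d in events if d == target and p in cohort}
    return len(returned) / len(cohort) if cohort else 0.0

print(retention(events, date(2017, 5, 1), 1))  # day-1 retention -> 0.33...
print(retention(events, date(2017, 5, 1), 7))  # day-7 retention -> 0.33...
```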
Show Wrap - Data Platforms 2017 - #DataPlatforms2017
>> Announcer: Live from the Wigwam in Phoenix, Arizona. It's theCUBE, covering Data Platforms 2017. Brought to you by Qubole.
>> Hey, welcome back everybody. Jeff Frick here with theCUBE, along with George Gilbert from Wikibon. We've had a tremendous day here at Data Platforms 2017, at the historic Wigwam Resort, just outside of Phoenix, Arizona. George, you've been to a lot of big data shows. What's your impression?
>> I thought we're at the, we're sort of at the edge of what could be a real bridge to something new. Which is, we've built big data systems out of traditional software, for deployment on traditional infrastructure. Even if you were going to put it in a virtual machine, it's still not a cloud. You're still dealing with server abstractions. But what's happening with Qubole is, they're saying, once you go to the cloud, whether it's Amazon, Azure, Google or Oracle, you're going to be dealing with services. Services are very different. It greatly simplifies the administrative experience, the developer experience, and more than that, they're focused on turning Qubole the product into Qubole the service, so that they can automate the management of it. And we know that big data has been choking itself on complexity. Both admin and developer complexity. And they're doing something unique, both on sort of the big data platform management, but also data science operations. And their point, their contention, which we still have to do a little more homework on, is that the vendors who started with software on-prem can't really make that change very easily without breaking what they've done on-prem. 'Cause they have traditional perpetual-license physical software, as opposed to services, which is what is in the cloud.
>> The question is, are people going to wait for them to figure it out? I talked to somebody in the hallway earlier this morning, and we were talking about their move to put all their data into, it was S3, on their data lake. And he said, it's part of a much bigger transformational process that we're doing inside the company. And so this move, from "is public cloud viable?" to "tell me, give me a reason why it shouldn't go to the cloud," has really kicked in big time. And we hear over and over and over that speed and agility, not just in deploying applications, but in operating as a company, is the key to success. And we hear over and over how short the tenure is on the Fortune 500 now, compared to what it used to be. So if you're not speedy and agile, which you pretty much have to use cloud, and software-driven automated decision-making
>> Yeah.
>> that's powered by machine learning, to eat
>> Those two things.
>> a huge percentage of your transactions and decision-making, you're going to get smoked by the person that is.
>> Let's, let's sort of peel that back. I was talking to Monte Zweben, who is the co-founder of Splice Machine, one of the most advanced databases that has sort of come out of nowhere over the last couple of years. And it's now, I think, in closed beta on Amazon. He showed me, like, a couple of screens for spinning it up and configuring it on Amazon. And he said, if I were doing that on-prem, he goes, I'd need a Hadoop cluster with HBase. It would take me like four-plus months. And that's an example of software versus services.
>> Jeff: Right.
>> And when you said, when you pointed out that automated decision-making, powered by machine learning, that's the other part. Which is, these big data systems ultimately are in the service of creating machine learning models that will inform ever better decisions with ever greater speed, and the key then is to plug those models into existing systems of record.
>> Jeff: Right. Right.
>> Because we're not going to,
>> We're not going to rip those out and rebuild them from scratch.
>> Right. But as you just heard, you can pull the data out that you need, run it through a new-age application.
>> George: Yeah.
>> And then feed it back into the old system.
>> George: Yes.
>> The other thing that came up, it was Oskar, I have to look him up, Oskar Austegard from Gannett, was on one of the panels. We always talk about the flexibility to add capacity very easily in a cloud-based solution. But he talked about, with the separation of storage and compute, that they actually have times where they turn off all their compute. It's off. Off.
>> And that was, if you had to boil down the fundamental compatibility break between on-prem and in the cloud, the Qubole folks, both the CEO and CMO, said, look, you cannot reconcile what's essentially server-centric, where the storage is attached to the compute node, the server, with cloud, where you have storage separate from compute, allowing you to spin it down completely. He said those are just fundamentally incompatible.
>> Yeah, yeah. And also, Andretti, one of the founders, in his talk, he talked about the big three trends, which we just kind of talked about. He summarized them: right, one is serverless, this continual push towards smaller and smaller units
>> George: Yeah.
>> of storage and compute, and the increasing speed of networks. From virtual servers, to just no servers, to just compute. The second one is automation: you've got to move to automation.
>> George: Right.
>> If you're not, you're going to get passed by your competitor that is. Or the competitor that you don't even know exists, that's going to come out from over your shoulder. And the third one was the intelligence, right. There is a lot of intelligence that can be applied. And I think the other cusp that we're on is this continuing crazy increase in compute horsepower, which just keeps going. The speed and the intelligence of these machines is growing at an exponential curve, not a linear curve. It's going to be bananas in the not-too-distant future.
>> We're soaking up more and more of that intelligence with machine learning. The training part of machine learning, where the datasets to train a model are immense. Not only are the datasets large, but the amount of time to sort of chug through them, to come up with just the right mix of variables and values for those variables, or maybe even multiple models. So that we're going to see in the cloud. And that's going to chew up more and more cycles, even as we have
>> Jeff: Right. Right.
>> specialized processors.
>> Jeff: Right. But in the data ops world, in theory yes, but I don't have to wait to get it right. Right? I can get it 70% right.
>> George: Yeah.
>> Which is better than not right.
>> George: Yeah.
>> And I can continue to iterate over time. That, I think, was the genius of dev ops. To stop writing PRDs and MRDs.
>> George: Yeah.
>> And deliver something. And then listen and adjust.
>> George: Yeah.
>> And within the data ops world, it's the same thing. Don't try to figure it all out.
Take the data you know, have some hypothesis, build some models, and iterate. That's really tough to compete with.
>> George: Yeah.
>> Fast, fast, fast iteration.
>> We're actually doing a fair amount of research on that, on the Wikibon side. Which is, if you build an enterprise application that is reinforced or informed by models in many different parts, in other words, you're modeling more and more digital entities within the business.
>> Jeff: Right.
>> Each of those has feedback loops.
>> Jeff: Right. Right.
You're watching theCUBE. We're at the historic 99 years young, Wigwam Resort, just outside of Phoenix, Arizona. DataPlatforms 2017. Thanks for watching. It's been a busy season. It'll continue to be a busy season. So keep it tuned. SiliconAngle.TV or YouTube.com/SiliconAngle. Thanks for watching.
Mick Bass, 47Lining - Data Platforms 2017 - #DataPlatforms2017
>> Live from The Wigwam in Phoenix, Arizona, it's theCUBE, covering Data Platforms 2017. Brought to you by Qubole.
>> Hey, welcome back everybody. Jeff Frick here with theCUBE. Welcome back to Data Platforms 2017, at the historic Wigwam Resort, just outside of Phoenix, Arizona. I'm here all day with George Gilbert from Wikibon, and we're excited to be joined by our next guest. He's Mick Bass, the CEO of 47Lining. Mick, welcome.
>> Welcome, thanks for having me, yes.
>> Absolutely. So, what is 47Lining, for people that aren't familiar?
>> Well, you know, every cloud has a silver lining, and if you look at the periodic table, 47 is the atomic number for silver. So we are a consulting services company that helps customers build out data platforms, and ongoing data processes and data machines, in Amazon Web Services. And one of the primary use cases that we help customers with is to establish data lakes in Amazon Web Services, to help them answer some of their most valuable business questions.
>> So, there's always this question about own vs. buy, right, with cloud, and Amazon specifically.
>> Mm-hmm, mm-hmm.
>> And with a data lake, the perception, right... that's huge, this giant cost. Clearly there are benefits that come with putting your data lake in AWS vs. having it on-prem. What are some of the things you take customers through, in kind of the scenario planning and the value planning?
>> Well, just a couple of the really important aspects. One is this notion of elastic and on-demand pricing. In a cloud-based data lake, you can start out with actually a very small infrastructure footprint that's focused on maybe just one or two business use cases. You can pay only for the data that you need to get your data lake bootstrapped, and demonstrate the business benefit from one of those use cases. But then it's very easy to scale that up, in a pay-as-you-go kind of a way. The second really important benefit that customers experience in a platform that's built on AWS is the breadth of the tools and capabilities that they can bring to bear for their predictive analytics, and descriptive analytics, and streaming kinds of data problems. You need Spark, you can have it. You need Hive, you can have it. You need a high-performance, close-to-the-metal data warehouse on a clustered database, you can have it. So analysts are really empowered through this approach, because they can choose the right tool for the right job, and reduce the time to business benefit, based on what their business owners are asking them for.
>> You touched on something really interesting, which was... So, when a customer is on-prem, and let's say is evaluating Cloudera, MapR, Hortonworks, there's a finite set of services or software components within that distro. Once they're on the cloud, there's a thousand times more... As you were saying, you could have one of 27 different data warehouse products, you could have many different SQL products, some of which are really delivered as services.
>> Mm-hmm.
>> How does the consideration of the customer's choice change when they go to the cloud?
>> Well, I think that what they find is that it's much more tenable to take an agile, iterative process, where they're trying to align the outgoing cost of the data lake build, to keep that in alignment with the business benefits that come from it. And so if you recognize the need for a particular kind of analytics approach, but you're not going to need that until down the road, two or three quarters from now...
It's easy to get started with simple use cases, and then add those incremental services as the need manifests. One of the things that I mention in my talk, that I always encourage our customers to keep in mind, is that a data lake is more than just a technology construct. It's not just an analysis set of machinery, it's really a business construct. Your data lake has a profit and loss statement, and the way that you interact with your business owners, to identify the specific value sources that you're going to make pop for your company, can be made to align with the cost footprint as you build your data lake out.
>> So I'm curious, when you're taking customers through the journey to start kind of thinking of the data lake and AWS, are there any specific kind of application spaces, or vertical spaces, where you have pretty high confidence that you can secure an early, and relatively easy, win to help them kind of move down the road?
>> Absolutely. So, you know, for many of our customers, a very common business need is to enhance the set of information that they have available for a 360-degree view of the customer. In many cases, this information and data is available in different parts of the enterprise, but it might be siloed. And a data lake approach in AWS really helps you to pull it together in an agile fashion, based on particular, quarter-by-quarter objectives or capabilities that you're trying to respond to. Another very common example is predictive analytics for things like fraud detection, or mechanical failure. So, in eCommerce kinds of situations, being able to pull together semi-structured information that might be coming from web servers or logs, or like what cookies are associated with this particular user: it's very easy to pull together a fraud-oriented predictive analytic. And then the third area that is very common is internet of things use cases. Many enterprises are augmenting their existing data warehouse with sensor-oriented time series data, and there's really no place in the enterprise for that data currently to land.
>> So, when you say they are augmenting the data warehouse, are they putting it in the data warehouse, or are they putting it in a sort of adjunct time series database, from which they can sort of curate aggregates, and things like that, to put in the data warehouse?
>> It's very much the latter, right. And the time series data itself may come from multiple different vendors, and the input formats in which that information lands can be pretty diverse. And so it's not really a good fit for a typical kind of data warehouse ingest or intake process.
>> So, if you were to look at, sort of, maturity models for the different use cases, where would we be, you know, like IoT, customer 360, fraud, things like that?
>> I think, you know, many customers have pretty rich fraud analytics capabilities, but some of the pain points that we hear is that it's difficult for them to access the most recent technologies. In some cases the order management systems that those analytics are running on are quite old. We just finished some work with a customer where literally the order management system's running on a mainframe, even today. Those systems have the ability to accept steering from, like, a sidecar decision-support predictive analytics system.
And one of the things that's really cool about the cloud is you could build a custom API just for that fraud analytics use case, so that you can inject exactly the right information. That makes it super cheap and easy for the ops team that's running that mainframe to consume the fraud-improvement decision signal that you're offering.
>> Interesting. And so, this may be diving into the weeds a little bit, but if you've got an order management system that's decades old, and you're going to plug in something that has to meet some stringent performance requirements, how do you, sort of, test... It's not just the end-to-end performance once, but, you know, for the 99th percentile, that someone doesn't get locked out for five minutes while he's trying to finish his shopping cart.
>> Exactly. And I mean, I think this is what's important about the concept of building data machines in the cloud. This is not like a once-and-done kind of process. You're not building an analytic that produces a printout that an executive is going to look at (laughing) and make a decision. (laughing) You're really creating a process that runs at consumer scale, and you're going to apply all of the same kinds of metrics of percentile performance that you would apply to any kind of large-scale consumer delivery system.
>> Do you custom-build a fraud prevention application for each customer? Or is there a template, and then some additional capabilities that you'll learn by running through their training data?
>> Well, I think largely there are business-by-business distinctions in the approach that these customers take to fraud detection. There's also business-by-business distinction in their current state. But what we find is that there are commonalities in the kinds of patterns and approaches that you tend to apply. So, you know... We may have extra data about you based on your behavior on the web, and your behavior on a mobile app. The particulars of that data might be different for Enterprise A vs. Enterprise B, but this pattern of joining up mobile data, plus web data, plus, maybe, phone-in call center data, putting those all together to increase the signal that can be made available to a fraud prevention algorithm, that's very common across all enterprises. And so one of the roles that we play is to set up the platform so that it's really easy to mobilize each of these data sources. So in many cases, it's the customer's data scientist that's saying, I think I know how to do a better job for my business, I just need to be unleashed to be able to access this data. And if I'm blocked, the answer that I get back is, oh, you could have that, like, second quarter of 2019. Instead, you want to be able to say, oh, we can onboard that data in an agile fashion, pay an incremental little bit of money, because you've identified a specific benefit that could be made available by having that data.
>> Alright Mick, well thanks for stopping by. I'm going to send Andy Jassy a note that we found the silver lining to the cloud. (laughing) So, I'm excited for that, if nothing else, so that made the trip well worthwhile. So thanks for taking a few minutes.
>> You bet, thanks so much, guys.
>> Alright, Mick Bass, George Gilbert, Jeff Frick, you're watching theCUBE, from Data Platforms 2017. We'll be right back after this short break. Thanks for watching. (computer techno beat)
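(A rough sketch of the cross-channel join Bass describes: web, mobile, and call-center signals united per user before they feed a fraud model, with a small decision output a sidecar API could hand back to the legacy order-management system. The bucket paths, column names, and toy decision rule are assumptions for illustration, not 47Lining's design.)

```python
# Hypothetical cross-channel fraud features: each added channel widens
# the signal available to the fraud prevention algorithm.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fraud-signal-sketch").getOrCreate()

web    = spark.read.parquet("s3://example-lake/web_sessions/")   # user_id, failed_logins
mobile = spark.read.parquet("s3://example-lake/mobile_events/")  # user_id, new_device
calls  = spark.read.parquet("s3://example-lake/call_center/")    # user_id, address_change

# One row of features per user across all three channels.
signals = (web.join(mobile, "user_id", "outer")
              .join(calls, "user_id", "outer")
              .fillna(0))

# A toy rule standing in for a trained model; the output is the compact
# approve/review signal a sidecar API would expose to the mainframe.
decisions = signals.withColumn(
    "review",
    (F.col("failed_logins") > 3) |
    ((F.col("new_device") == 1) & (F.col("address_change") == 1)))
```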
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
Mick Bass | PERSON | 0.99+ |
Jeff Frick | PERSON | 0.99+ |
five minutes | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Mick | PERSON | 0.99+ |
360 degree | QUANTITY | 0.99+ |
Cue Ball | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
47Lining | ORGANIZATION | 0.99+ |
99th percentile | QUANTITY | 0.99+ |
Phoenix, Arizona | LOCATION | 0.99+ |
two | QUANTITY | 0.99+ |
second quarter of 2019 | DATE | 0.99+ |
One | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
second | QUANTITY | 0.98+ |
each | QUANTITY | 0.98+ |
Spark | TITLE | 0.96+ |
today | DATE | 0.96+ |
Cloud | TITLE | 0.95+ |
27 different data warehouse products | QUANTITY | 0.95+ |
Wikibon | ORGANIZATION | 0.95+ |
decades | QUANTITY | 0.94+ |
three quarters | QUANTITY | 0.9+ |
each customer | QUANTITY | 0.89+ |
MaPr | ORGANIZATION | 0.87+ |
third area | QUANTITY | 0.87+ |
two business use cases | QUANTITY | 0.81+ |
The Wigwam | ORGANIZATION | 0.8+ |
theCube | ORGANIZATION | 0.8+ |
Wigwam Resort | LOCATION | 0.78+ |
Cloud | ORGANIZATION | 0.77+ |
IOT | ORGANIZATION | 0.76+ |
47 | OTHER | 0.74+ |
a thousand times | QUANTITY | 0.73+ |
Customer | ORGANIZATION | 0.72+ |
Cloudera | ORGANIZATION | 0.7+ |
2017 | DATE | 0.7+ |
things | QUANTITY | 0.68+ |
#DataPlatforms2017 | EVENT | 0.62+ |
Platforms | TITLE | 0.61+ |
Primm | ORGANIZATION | 0.59+ |
Data | ORGANIZATION | 0.58+ |
Data Platforms | EVENT | 0.53+ |
Data Platforms 2017 | TITLE | 0.5+ |
lake | ORGANIZATION | 0.49+ |
2017 | EVENT | 0.46+ |
Data Platforms | ORGANIZATION | 0.38+ |
360 | OTHER | 0.24+ |
Tripp Smith, Clarity - Data Platforms 2017 - #DataPlatforms2017
>> Narrator: Live from the Wigwam in Phoenix, Arizona, it's theCUBE, covering Data Platforms 2017, brought to you by Qubole. >> Hey welcome back everybody, Jeff Frick here with theCUBE. I'm joined by George Gilbert from Wikibon and we're at Data Platforms 2017. Small conference down at the historic Wigwam Resort, just outside of Phoenix, talking about kind of a new approach to big data really. A Cloud native approach to big data, and really kind of flipping the old model on its head. We're really excited to be joined by Tripp Smith, he's the CTO of Clarity Insights, up on a panel earlier today. So first off, welcome Tripp. >> Thank you. >> For the folks that aren't familiar with Clarity Insights, give us a little background. >> So Clarity is a pure play data analytics professional services company. That's all we do. We say we advise, build and enable for our clients. So what that means is data strategy, data engineering and data science, and making sure that we can action the insights that our customers get out of their data analytics platforms. >> Jeff: So not a real busy area these days. >> It's growing pretty well. >> Good for you. So a lot of interesting stuff came up on the panel. But one of the things that you reacted to, I reacted to as well, from the keynote, was this concept of, you know, before you had kind of the data scientist with the data platform behind them, being service providers to the basic business units. Really turning that model on its head. Giving access to the data to all the business units, and people that want to consume that. Making the data team really enablers of kind of a platform play. Seemed to really resonate with you as well. >> Yeah absolutely, so if you think about it, a lot of the focus on legacy platforms was driven by scarcity around the resources to deal with data. So you created this almost pyramid structure with IT and architecture at the top. They were the gatekeepers and kind of the single door where insights got out to the business. >> Jeff: Right. >> So in the big data world and with Cloud, with elastic scale, we've been able to turn that around and actually create much more collaborative friction in parallel with the business. Putting the data engineers, data scientists and business-focused analysts together and making them more of partners, than just customers of IT. >> Jeff: Right, very interesting way to think of it, as a partner. It's a very different mindset. The other piece that came up over and over in the Q&A at the end was how do people get started? How are they successful? So you deal with a lot of customers, right? That's your business. What are some stories, or one that you can share, of best practices, when people come and they say, we obviously hired you, we wrote a check. But how do we get started, where do we go first? How do you help people out? >> We focus on self-funding analytic programs. Getting those early wins tends to pay for more investment in analytics. So if you look at the ability to scale out as a starting point, then aligning that business value and the roadmap in a way that's going to both demonstrate the value along the way and contribute to that capability is important. I think we also recommend to our clients that they solve the hard problems around security and data governance and compliance first. Because that allows them to deal with more valuable data and put that to work for their business. >> So is there any kind of low-hanging fruit that you see time and time and time again?
That just is like, ah, we can do this. We know it's got huge ROI. It's either neglected 'cause they don't think it's valuable, or it's neglected because it's in the backroom. Or are there any easy steps, any patterns that you find? >> Yeah, absolutely. So we go to market by industry vertical. So within each vertical, we've defined the value maps and ROI levers within that business, then align a lot of our analytic solutions to those ROI levers. In doing that, we focus on being able to build a small, multifunctional team that can work directly with the business, then deliver that in real time in an interactive way. >> Right, another thing, you just talked about security and governance, are we past the security concerns about public Cloud? Does that even come up as an issue anymore? >> You know, I think there was a great comment today that if you had money, you wouldn't put it in your safe at home. You'd put it in a bank. >> Jeff: I missed that one, that's a good one. >> The Cloud providers are really focused on security in a way that they can invest in it, that an individual enterprise really can't. So in a lot of cases, moving to the Cloud means letting the experts take on the area that they're really good at and letting you focus on your business. >> Jeff: Right, interesting, they had, Amazon is here, Google's here, Oracle's here and Azure is here. At AWS re:Invent, one of my favorite things is Tuesday night with James Hamilton. Which, I don't know if you've ever been, it's a can't-miss presentation. But he talks about the infrastructure investments that Amazon, AWS can make. Which again, compared to any individual enterprise, are tremendous in not only security, but networking and all these other things that they do. So it really seems that the scale that these huge Cloud providers have now reached gives them such an advantage over any individual enterprise, whether it's for security, or networking or anything else. So it's a very different kind of a model. >> Yeah, absolutely, or even the application platform, like Google now having Spanner, which has the scale advantage of Cassandra or HBase and the transactional capabilities of a traditional RDBMS. I guess my question is, once a customer is considering Qubole as a Cloud-first data platform, how do you help the customer evaluate it? Relative to the distros that started out on-prem, and then the other Cloud native ones that are from Azure and Google and Amazon. >> You know, I think that's a great question. It kind of focuses back on letting the experts do what they're really good at. My business may not be differentiated by my ability to operate and support Hadoop. But it's really putting Hadoop to work in order to solve the business problems that make me money. So when I look at something like Qubole, it's actually going to that expert and saying, "Hey, own this for me and deliver this in a reliable way." Rather than me having to solve those problems over and over again myself. >> Do you think that those problems are not solved to the same degree by the Cloud native services? >> So I think there's definitely an ability to leverage Cloud data services. But there's also this aspect of administration and management, and understanding how those integrate within an ecosystem, that I don't think necessarily every company is going to be able to approach in the same way that a company like Qubole can.
So again, being able to shift that off and having that kind of support gives you the ability to focus back on what really makes a difference for you. >> So Tripp, we're running out of time. We got a really tight schedule here. I'm just curious, it's a busy conference season. Big data's all over the place. How did you end up here? What is it about this conference and this technology that got you to come down to the, I think it's only 106 today, weather, to take it in? What do you see that's a special opportunity here? >> Yeah, you know, this is Data Platforms 2017. It's been a really great conference, just in the focus on being able to look at Cloud and look at this differentiation, outside of the realm of inventing new shiny objects, and really putting it to work for new business cases and that sort of thing. >> Jeff: Well Tripp Smith, thanks for stopping by theCUBE. >> Excellent, thank you guys for having me. >> All right, he's George Gilbert, I'm Jeff Frick. You're watching Data Platforms 2017 from the historic Wigwam Resort in Phoenix, Arizona. Thanks for watching. (techno music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Jeff | PERSON | 0.99+ |
Jeff Frick | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
James Hamilton | PERSON | 0.99+ |
Phoenix | LOCATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Tripp Smith | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Tripp | PERSON | 0.99+ |
Clarity Insights | ORGANIZATION | 0.99+ |
Tuesday night | DATE | 0.99+ |
today | DATE | 0.99+ |
Clarity | ORGANIZATION | 0.99+ |
Hadoop | TITLE | 0.99+ |
both | QUANTITY | 0.98+ |
Phoenix Arizona | LOCATION | 0.98+ |
one | QUANTITY | 0.97+ |
Qubole | ORGANIZATION | 0.97+ |
Wigwam Resort | LOCATION | 0.96+ |
first | QUANTITY | 0.96+ |
Data Platforms 2017 | EVENT | 0.95+ |
106 | QUANTITY | 0.93+ |
each vertical | QUANTITY | 0.93+ |
Wikibond | ORGANIZATION | 0.92+ |
2017 | DATE | 0.92+ |
#DataPlatforms2017 | EVENT | 0.91+ |
DataPlatforms 2017 | EVENT | 0.9+ |
single door | QUANTITY | 0.89+ |
first data platform | QUANTITY | 0.88+ |
Narrator: Live from the | TITLE | 0.86+ |
Azure | TITLE | 0.82+ |
theCUBE | ORGANIZATION | 0.8+ |
Data | TITLE | 0.8+ |
Spanner | TITLE | 0.79+ |
Cassandra | TITLE | 0.6+ |
Wigwam | LOCATION | 0.58+ |
Insights | ORGANIZATION | 0.58+ |
Platforms 2017 | EVENT | 0.57+ |
CTO | PERSON | 0.53+ |
Cloud | TITLE | 0.52+ |
Prim | ORGANIZATION | 0.44+ |
Based | OTHER | 0.41+ |
H | TITLE | 0.37+ |
Karthik Ramasamy, Streamlio - Data Platforms 2017 - #DataPlatforms2017
>> Narrator: Live from the Wigwam in Phoenix, Arizona, it is theCUBE, covering Data Platforms 2017. Brought to you by Qubole. >> Hey welcome back everybody. Jeff Frick with theCUBE. We are down at the historic Wigwam, 99 years young, just outside of Phoenix, Arizona, at Data Platforms 2017. It is really talking about a new approach to big data in the cloud, put on by Qubole, about 200 people, very interesting conversation this morning, and we're really interested to have Karthik Ramasamy. He is the co-founder of Streamlio, which is still in stealth mode according to his LinkedIn profile, so we won't talk about that, but a longtime Twitter guy, and he really shared some great lessons this morning about things that you guys learned while growing Twitter. So welcome. >> Thank you, thanks for having me. >> Absolutely. One of the key parts of your whole talk was this concept of real time. I always joke with people, real time is in time to do something about it. You went through a bunch of examples of how real time is really a variable depending on what the right application is, but at Twitter real time was super, super important. >> Yes, it is indeed important because of the nature of the streaming data; the nature of Twitter data is streaming data, because the tweets are coming at a high velocity. And Twitter positioned itself as more of a real time delivery company, so whatever information we get within Twitter, we have a strong time budget before we can deliver it to people, so that when people consume the information, the information is live, or real time. >> But real time, obviously important for Twitter, is becoming more and more important for a lot of big enterprises too, and the great analogy I referred to before is, you used to sample data, sample historic data, to make decisions. Now you want to keep all the data in real time to make decisions, so it's a very different way you drive your decision-making process. >> Very different way of thinking. Especially considering the fact, as you said, that enterprises are getting into understanding what real time means for them. If you look at some of the traditional enterprises, like financial, they understand the value of real time. Similarly, the upcoming new use cases like IoT, they understand the value of real time, like autonomous vehicles where they have to make quick decisions. Healthcare, you have to make quick decisions, because preventive and predictive maintenance is very important in those kinds of segments. So because of those segments it's getting really popular, and traditional enterprises like retail, they're also valuing real time, because it allows them to blend into the user behavior, so that they can recommend products and other things in real time that people can react to. So it's becoming more and more important. That's what I would say. >> So Hadoop started out as mostly batch infrastructure, and Twitter was a pioneer in the design pattern to accommodate both batch and real time. How has that big data infrastructure evolved so that, one, you don't have to split batch and real time, and what should we expect going forward to make that platform stronger in terms of real time analytics, and potentially so that it can inform decisions in systems of record? >> I think, like, today as of now there are two different infrastructures. One is, in general, the Hadoop infrastructure. The other one is more of a real time infrastructure at this point.
And Hadoop is kind of considered as this monolithic, not monolithic, it's kind of a mega store, where, similar to all the rivers reaching the sea, it kind of becomes a storage sea where all the data comes and is stored. But before the data comes and is stored there, a lot of analytics and a lot of visibility about the data, from the point of its creation before it ends up there, is getting done on those rivers, whatever you call them, the data rivers. So you could get a lot of analytics done during that time, before the data ends up there, so that it's more live than the other analytics. Hadoop has its own kind of limitations in terms of how much data it can handle and how real time the data can be. For example, you can kind of dump the data in real time into Hadoop, but until you close the file you cannot see the data at all. There is a time budget that gets into play there. And you could do smaller files, like small, small file writes, but the namenode will blow up, because within a day you write a million files, and the namenode is not going to sustain that. So those are the trade-offs. That's one of the reasons we had to end up doing new real time infrastructure, like the distributed log, which allows that the moment the data comes in, the data is immediately visible within a three to five millisecond timeframe. >> The distributed log you're talking about would be Kafka. The output of that would be to train the model, or just score a model, and then would that model essentially be carved off from this big data platform and be integrated within a system of record, where it would inform decisions? >> There are multiple things you could do. First of all, with the distributed log, essentially you can think about it as a data staging environment, where the data kind of lands up there, and once it lands up there, there's a lot of sharing of that same data going on in real time. When several jobs are using some popular data source, it provides a high fan-out, in the sense that 100 jobs can consume the same data, and they can be at different parts of the data itself. So that provides a nice sharing environment. Now once the data is around there, the data is being used for different kinds of analytics, and one of them could be model enhancement, because typically in the batch segment you build the model, because you're looking at a lot of data and other things. Then once the model is built, that model is pre-loaded into the real time compute environment, like Heron, then you look up this model and serve data based on that model, whatever it tells you. For example, when you do ad serving, you look up that model for what is a relevant ad for you to click. Then the next aspect is model enhancement. Because user behavior is going to change over a period of time. Now can you capture and incrementally update the model, so that those things are also partly done on the real time side, rather than recomputing the batch again and again and again. >> Okay so it's sort of like a, what's the delta? >> Karthik: Yes. >> Let's train on the delta and let's score on the delta. >> Yes, and once the delta gets updated, then when the new user behavior comes in, they can look at that new model, which is continuously being enhanced, and once that enhancement is captured you know that user behavior is changing. And ads are served accordingly.
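A rough illustration of the fan-out pattern described above: many independent jobs consuming the same staged stream, each at its own position. George names Kafka here, so this sketch uses the kafka-python client; the topic name, brokers, group ids, and handler are illustrative assumptions:

```python
from kafka import KafkaConsumer

def handle_event(value: str):
    print("scored:", value)  # stand-in for the real-time serving path

def make_consumer(group_id: str) -> KafkaConsumer:
    # Each group_id keeps its own committed offsets, so independent jobs
    # can sit at different parts of the same data without interfering.
    return KafkaConsumer(
        "events",                              # assumed topic name
        bootstrap_servers=["localhost:9092"],  # assumed brokers
        group_id=group_id,
        auto_offset_reset="earliest",          # new groups replay retained history
        value_deserializer=lambda b: b.decode("utf-8"),
    )

# Two of the "100 jobs" sharing one stream: one near the tail for serving,
# one that may lag behind while it rebuilds features for model refresh.
serving_job = make_consumer("ad-serving")
model_job   = make_consumer("model-refresh")

for message in serving_job:
    handle_event(message.value)
```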
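The "train on the delta, score on the delta" exchange maps to online learning. A minimal sketch using scikit-learn's partial_fit, with random arrays as stand-ins for a real feature pipeline; none of this is Twitter's actual stack, just the shape of the idea:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")  # "log" on older scikit-learn versions

# One-time batch pass over history (classes must be declared up front).
X_hist = np.random.rand(10_000, 8)
y_hist = np.random.randint(0, 2, size=10_000)
model.partial_fit(X_hist, y_hist, classes=np.array([0, 1]))

def apply_delta(model, X_delta, y_delta):
    """Fold one fresh slice of user behavior into the live model."""
    model.partial_fit(X_delta, y_delta)  # no full batch recompute
    return model

# Each time a new delta arrives off the stream, update in place rather
# than rerunning the whole batch job again and again.
X_new = np.random.rand(500, 8)
y_new = np.random.randint(0, 2, size=500)
model = apply_delta(model, X_new, y_new)
print(model.predict(X_new[:5]))
```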
>> Okay, so now that our customers are getting serious about moving to the cloud with their big data platforms and the applications on them, have you seen a change in the patterns of the apps they're looking to build, or a change in the makeup of the platform that they want to use? >> So that depends. Typically, one disclosure is that I've worked with Amazon and all, the AWS, but within the companies that I worked for, everything is on-prem. Having said that, cloud is nice because it gives you machines on the fly whenever you need to, and it gives a bunch of tools around it where you can bootstrap it and all the various stuff. This works ideally for smaller companies and medium companies, but for the big companies, one of the things that we calculate is, cost-wise, how much we have to pay versus doing it in-house, and there's still a huge gap, unless the cloud provider is going to provide a huge discount or whatever for the big companies to move in. So that is always a challenge that we get into, because think about it: I have 10 or 20,000 nodes of Hadoop, can I move all of them into Amazon AWS, and how much am I going to pay? Versus the cost of maintaining my own data centers and everything. I don't know the latest pricing and other things, but approximately it comes to three x, cost-wise. >> If you're using... >> Our own on-prem and the data center and all of the staffing and everything. There's a difference of, I would say, three x. >> For on-prem being higher. >> On-prem being lower. >> Lower? >> Yes. >> But that assumes then that you've got flat utilization. >> Flat utilization, but, I mean, with cloud of course you have the elastic scale and all the various things, it gives an illusion of unlimited resources. But in our case, if you're provisioning so many machines, at least 50 or 60% of the machines are used for production, and the rest of them are used for staging, development, and all the various other environments. Which means that the total cost of those machines, even though they're only 50% utilized, you still end up saving so much; it's like one-third of the cost of what it might be in the cloud. >> Alright Karthik, that opens up a whole can of interesting conversations. Again, we just don't have time to jump into it. So I'll give you the last word. When can we expect you to come out of stealth, or is that stealthy too? >> It is kind of, that is stealthy too. >> Okay, fair enough, I don't want to put you on the spot, but thanks for stopping by and sharing your story. >> Karthik: Thanks, thanks for everything. >> Alright, he is Karthik, he is George, I'm Jeff. You're watching theCUBE. We are in the Wigwam resort just outside of Phoenix at Data Platforms 2017. We will be back after this short break. Thanks for watching.
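A back-of-the-envelope sketch of the cost comparison Karthik outlines above. Every number here is a made-up assumption; only the shape of the calculation matters (his rough estimate was around three x for a large fleet):

```python
NODES           = 20_000     # the 10-20,000 node Hadoop fleet he mentions
ONPREM_PER_NODE = 4_000.0    # assumed all-in yearly cost per node (hw + dc + staff)
CLOUD_PER_NODE  = 12_000.0   # assumed yearly cost of a comparable cloud instance
UTILIZATION     = 0.50       # roughly half the fleet serves production at once

onprem_total  = NODES * ONPREM_PER_NODE    # you pay for idle staging/dev capacity
cloud_full    = NODES * CLOUD_PER_NODE     # like-for-like fleet replacement
cloud_elastic = cloud_full * UTILIZATION   # ideal elastic right-sizing

print(f"on-prem:         ${onprem_total:>13,.0f}/yr")
print(f"cloud (1:1):     ${cloud_full:>13,.0f}/yr  (~{cloud_full / onprem_total:.1f}x)")
print(f"cloud (elastic): ${cloud_elastic:>13,.0f}/yr  (~{cloud_elastic / onprem_total:.1f}x)")
```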
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Karthik | PERSON | 0.99+ |
Karthik Ramasamy | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Jeff Frick | PERSON | 0.99+ |
Jeff | PERSON | 0.99+ |
100 jobs | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
50% | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
AWS | ORGANIZATION | 0.99+ |
10 | QUANTITY | 0.99+ |
three | QUANTITY | 0.99+ |
Wigwam | LOCATION | 0.99+ |
ORGANIZATION | 0.98+ | |
20,000 notes | QUANTITY | 0.98+ |
both | QUANTITY | 0.98+ |
60% | QUANTITY | 0.98+ |
Streamlio | ORGANIZATION | 0.98+ |
Phoenix, Arizona | LOCATION | 0.98+ |
one | QUANTITY | 0.98+ |
one-third | QUANTITY | 0.98+ |
99 years | QUANTITY | 0.98+ |
Hadoop | TITLE | 0.97+ |
about 200 people | QUANTITY | 0.97+ |
today | DATE | 0.96+ |
One | QUANTITY | 0.96+ |
two different infrastructures | QUANTITY | 0.96+ |
Qubole | PERSON | 0.96+ |
First | QUANTITY | 0.89+ |
five millisecond | QUANTITY | 0.86+ |
theCUBE | ORGANIZATION | 0.86+ |
Data Platforms | ORGANIZATION | 0.85+ |
this morning | DATE | 0.82+ |
Amazon AWS | ORGANIZATION | 0.8+ |
a day | QUANTITY | 0.79+ |
HERON | ORGANIZATION | 0.7+ |
least 50 | QUANTITY | 0.62+ |
three x | QUANTITY | 0.6+ |
llion | QUANTITY | 0.49+ |
Kafka | TITLE | 0.49+ |
2017 | DATE | 0.47+ |
phoenix | ORGANIZATION | 0.46+ |
2017 | EVENT | 0.42+ |
Data Platforms | TITLE | 0.37+ |
Platforms | ORGANIZATION | 0.25+ |
Saket Saurabh, Nexla - Data Platforms 2017 - #DataPlatforms2017
(upbeat music) [Announcer] Live from the Wigwam in Phoenix, Arizona, it's the Cube. Covering Data Platforms 2017. Brought to you by Qubole. >> Hey welcome back everybody, Jeff Frick here with the Cube. We are coming down to the end of a great day here at the historic Wigwam at Data Platforms 2017, a lot of great big data practitioners talking about the new way to do things, really coining the term data ops, or maybe not coining it but really leveraging it, as a new way to think about data and using data in your business, to be a data-driven, software-defined, automated company. So we're excited to have Saket Saurabh, he is the, and I'm sorry I butchered that, Saurabh. >> Saurabh, yeah. >> Saurabh, thank you, sorry. He is the co-founder and CEO of Nexla, and welcome. >> Thank you. >> So what is Nexla, tell us about Nexla for those that aren't familiar with the company. >> Thank you so much. Yeah, so Nexla is a data operations platform. And the way we look at data is that data is increasingly moving between companies, and one of the things that is driving that is the growth in machine learning. So imagine you are an e-commerce company, or a healthcare provider. You need to get data from your different partners. You know, suppliers and point-of-sale systems, and brands and all that. And the companies, when they are getting this data from all these different places, it's so hard to manage. So we think of, you know, just like cloud computing made it easy to manage thousands of servers, we think of data ops as something that makes it easy to manage those thousands of data sources coming from so many partners. >> So you've jumped straight past the "it's a cool buzz term" way to think about things, into the actual platform. So how does that platform fit within the cloud, and on-prem? Is it part of the infrastructure, sits next to the infrastructure, is it a conduit? How does that work? >> Yeah, we think of it as, if you think of maybe machine learning or advanced analytics as the application, then data operations is sort of an underlying infrastructure for it. It's not really the hardware, the storage, but it's a layer on top. The job of data operations is to get the data from where it is to where you need it to be, and in the right form and shape, so now you can act on it. >> And do you find yourself replacing legacy stuff, or is this a brand new demand because of all the variety and so many types of datasets that are coming in that people want to leverage? >> Yeah, I mean to be honest, some of this has always been there, in the sense that the day you connected a database to a network, data started to move around. But if you think of the scale that has happened in the last six or seven years, none of those existing systems were ever designed for that. So when we talk about data growing at a Moore's Law rate, when we talk about everybody getting into machine learning, when we talk about thousands of data sets across so many different partners that you work with, and when we think that the reports you get from your partners are no longer sufficient, you need the underlying data; you basically cannot feed a report into an algo. So when you look at all of these things, we feel like it is a new thing in some ways. >> Right. Well, I want to unpack that a little bit, because you made an interesting comment before we turned on the cameras, which you just repeated: that you can't run an algorithm on a report.
And in a world where we've got all these shared data sets, and it's funny too, right, because you used to run a sample; now you want, you said, the raw. Not only all, but the raw data, so that you can do with it what you wish. Very different paradigm. >> Yeah. >> It sounds like there's a lot more, and you're not just parsing what's in the report, but you have to give it structure that can be combined with other data sources. And that sounds like a rather challenging task. Because the structure, all the metadata, the context that gives the data meaning that is relevant to other data sets, where does that come from? >> Yeah, so what happens, and this has been how technology companies have started to evolve: you want to focus on your core business. And therefore you will use a provider that processes your payments, you will use a provider that gives you search. You will use a provider that provides you the data, for example, for your e-commerce system. So there are different types of vendors you're working with, which means that there are different types of data involved. So when I look at, for example, a brand today, you could be, say, a Nike, and your products are being sold on so many websites. If you want to really analyze your business well, you want data from every single one of those places, where your data team can now access it. So yes, it is that raw data, it is that metadata, and it is the data coming from all the systems, that you can look at together and say, when I ran this ad this is how people reacted to it, this was the marketing lift from that, these are the purchases that happened across these different channels, this is how my top line or bottom line was affected. And to analyze everything together you need all the data in one place. >> I'm curious what you find on the change in the business relationship. Because I'm sure there were agreements structured in another time which weren't quite as detailed, where the expectations in terms of what was exchanged weren't quite this deep. Are you seeing people have to change their relationships to get this data? Is it out there that they're getting it, or is this really changing the way that people partner in data exchange, like the example that you just used between, say, Nike and Foot Locker, to pick a name? >> Yeah, so I think companies that have worked together have always had reports come in, so you would get a daily report of how much you sold. Now just a high-level report of how much you sold is not sufficient anymore. You want to understand where it was bought, in which city, under what weather conditions, by what kind of user, and all that stuff. So I think what companies are looking at is, again, they have built their data systems, they have the data teams; unless they give the data to those teams, the teams cannot be effective, and you cannot really take a daily sales report and feed that into your algorithm, right? So you need very fine-grained data for that. So I think companies are doing this where, hey, you were giving me a report before, I also need some underlying data. The report is for a business executive to look at and see how business is doing, and the underlying data is really for that algorithm to understand and maybe identify things that a report might not.
>> Wouldn't there have been, at least in the example of sell-through, structured data that's been exchanged between partners already, like vendor-managed inventory, or, you know, where a downstream retailer might make their sell-through data accessible to suppliers who actually take ownership of the inventory and are responsible for stocking it at optimal levels? >> Yeah, I think Walmart was the innovator in that, with the POS link system, back in the day, for retail. But the point is that this need for data to go from one company to their partners and back and forth is across every sector. So you need that in e-commerce, you need that in fintech; we see companies that manage your portfolio needing to connect with the different banks and brokerages you work with to get the data. We see that in healthcare, across different providers and pharmaceutical companies, you need that. We see that in automotive: if every car generates data, an insurance company needs to be able to understand that and look at it. >> This, it's a huge problem you're addressing, because this is the friction between inter-company applications. And we went through this with the B2B marketplaces, 15-plus years ago. But the reason we did these marketplace hubs was so that we could standardize the information exchange. If it's just Walgreens talking to Pfizer, and then doing another one-off deal with, I don't know, Lilly, I don't know if they both still exist, it won't work for connecting all of pharmacy with all of pharma. How do you ensure standards between downstream and upstream? >> Yeah. So you're right, this has happened. When we do a wire transfer from one person to another, some data goes from a bank to another bank; it still takes hours, and it's a very tiny amount of data. That has all exploded; we are talking about zettabytes of data now every year. So the challenge is significantly bigger. Now coming to standards, what we have found is that two companies sitting together and defining a standard almost never works. It never works because applications change, systems change; change is the only constant. So the way we've approached it at our company is, we monitor the data, we sit on top of the data and just learn the structure as we observe data flowing through. So we have tons of data flowing through and we're constantly learning the structure, and identifying how the structure will map to the destination. So again, applying machine learning to see how the structure is changing, how the data volume is changing. Say you are getting data from somewhere every hour, and then it doesn't show up for two hours. Traditionally, systems will go down and you may not even find out for five days that the data wasn't there. So we look at the data structure, the amount of data, the time when it comes, and everything, to instantly learn and be able to inform the downstream systems of what they should be expecting, and if there is a change that somebody needs to be alerted about. So a lot of innovation is going into doing this at scale, without necessarily having to predefine something in a tight box that cannot be changed, because it's extremely hard to control. >> All right, Saket, that's a great explanation. We're going to have to leave it there, we're out of time. And thank you for taking a few minutes out of your day to stop by. >> Thank you. >> All right. Jeff Frick with George Gilbert, we are at Data Platforms 2017, Phoenix, Arizona, thanks for watching. (electronic music)
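Saket's point about learning each feed's arrival cadence, rather than predefining it in a tight box, can be sketched as a small monitor. The feed names, thresholds, and alerting policy below are assumptions for illustration, not Nexla's implementation:

```python
import time
from collections import defaultdict

class FreshnessMonitor:
    """Learn each feed's normal cadence and flag feeds that go silent."""

    def __init__(self, max_silence_factor: float = 2.0):
        self.last_seen = {}                 # feed -> last arrival (epoch seconds)
        self.intervals = defaultdict(list)  # feed -> observed gaps between arrivals
        self.max_silence_factor = max_silence_factor

    def record_arrival(self, feed: str, now: float = None):
        if now is None:
            now = time.time()
        if feed in self.last_seen:
            self.intervals[feed].append(now - self.last_seen[feed])
        self.last_seen[feed] = now

    def overdue_feeds(self, now: float = None):
        """Feeds silent for longer than a multiple of their learned cadence."""
        if now is None:
            now = time.time()
        overdue = []
        for feed, last in self.last_seen.items():
            gaps = self.intervals[feed]
            if not gaps:
                continue
            typical = sum(gaps) / len(gaps)  # cadence learned from observation
            if now - last > self.max_silence_factor * typical:
                overdue.append(feed)
        return overdue

monitor = FreshnessMonitor()
monitor.record_arrival("pos-feed", now=0.0)
monitor.record_arrival("pos-feed", now=3600.0)   # feed normally arrives hourly
print(monitor.overdue_feeds(now=3600.0 * 4))     # ['pos-feed'] after ~3 silent hours
```

The same observation stream could feed schema tracking as well: recording which fields show up in each batch, so a structural change alerts downstream consumers instead of silently breaking them.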
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Walmart | ORGANIZATION | 0.99+ |
Walgreens | ORGANIZATION | 0.99+ |
Saurabh | PERSON | 0.99+ |
Jeff Frick | PERSON | 0.99+ |
Nike | ORGANIZATION | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Pfizer | ORGANIZATION | 0.99+ |
two hours | QUANTITY | 0.99+ |
five days | QUANTITY | 0.99+ |
Lily | PERSON | 0.99+ |
two companies | QUANTITY | 0.99+ |
Nexla | ORGANIZATION | 0.99+ |
Saket | PERSON | 0.99+ |
Foot Locker | ORGANIZATION | 0.99+ |
Saket Saurabh | PERSON | 0.99+ |
one person | QUANTITY | 0.98+ |
both | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
Pheonix, Arizona | LOCATION | 0.97+ |
Cube | ORGANIZATION | 0.97+ |
15 plus years ago | DATE | 0.97+ |
today | DATE | 0.97+ |
thousands of data sources | QUANTITY | 0.97+ |
Wigwam | LOCATION | 0.96+ |
Data Platforms 2017 | EVENT | 0.96+ |
thousands of servers | QUANTITY | 0.95+ |
one company | QUANTITY | 0.95+ |
#DataPlatforms2017 | EVENT | 0.92+ |
Cue Ball | PERSON | 0.9+ |
thousands of data sets | QUANTITY | 0.9+ |
Arizona | LOCATION | 0.75+ |
last six | DATE | 0.73+ |
hour | QUANTITY | 0.69+ |
single | QUANTITY | 0.67+ |
seven years | QUANTITY | 0.67+ |
Moore | ORGANIZATION | 0.66+ |
every | QUANTITY | 0.64+ |
Pheonix | LOCATION | 0.54+ |
Covering | EVENT | 0.51+ |