
Discussion about Walmart's Approach | Supercloud2


 

(upbeat electronic music) >> Okay, welcome back to Supercloud 2, live here in Palo Alto. I'm John Furrier, with Dave Vellante. Again, all day wall-to-wall coverage, just had a great interview with Walmart, we've got a next interview coming up, you're going to hear from Bob Muglia and Tristan Handy, two experts, both experienced entrepreneurs, executives in technology. We're here to break down what just happened with Walmart, and what's coming up with George Gilbert, former colleague, Wikibon analyst, Gartner analyst, and now independent investor and expert. George, great to see you, I know you're following this space. Like you read about it, remember the first days when Dataverse came out, we were talking about them coming out of Berkeley? >> Dave: Snowflake. >> John: Snowflake. >> Dave: Snowflake, in the early days. >> We, collectively, have been chronicling the data movement since 2010, you were part of our team, now you've got your nose to the grindstone, you're seeing the next wave. What's this all about? Walmart building their own super cloud, we got Bob Muglia talking about how this next wave of apps is coming. What are the super apps? What's the super cloud to you? >> Well, this keys off Dave's really interesting questions to Walmart, which was like, how are they building their supercloud? 'Cause it makes a concrete example. But what was most interesting about his description of the Walmart WCNP, I forgot what it stood for. >> Dave: Walmart Cloud Native Platform. >> Walmart, okay. He was describing where the logic could run in these stateless containers, and maybe eventually serverless functions. But that's just it, and that's the paradigm of microservices, where the logic is in this stateless thing, where you can shoot it, or it fails, and you can spin up another one, and you've lost nothing. >> That was their triplet model. >> Yeah, in fact, and that was what they were trying to move to, where these things move fluidly between data centers.
>> But there's a but, right? Which is they're all stateless apps in the cloud. >> George: Yeah. >> And all their stateful apps are on-prem and VMs. >> Or the stateful part of the apps are in VMs. >> Okay. >> And so if they really want to lift their super cloud layer off of these different providers' infrastructure, they're going to need a much more advanced software platform that manages data. And that goes to the -- >> Muglia and Handy, that you and I did, that's coming up next. So the big takeaway there, George, was, I'll set it up and you can chime in, a new breed of data apps is emerging, on this highly decentralized infrastructure. And Tristan Handy of DBT Labs has a sort of a solution to begin the journey today, Muglia is working on something that's way out there, describe what you learned from it. >> Okay. So to talk about what the new data apps are, and then the platform to run them, I go back to what will probably be seen as one of the first data app examples, which was Uber, where you're describing entities in the real world, riders, drivers, routes, city, like a city plan, these are all defined by data. And the data is described in a structure called a knowledge graph, for lack of a, no one's come up with a better term. But that means the stuff that Jack built, which was all stateless and sits above cloud vendors' infrastructure, it needs an entirely different type of software that's much, much harder to build. And the way Bob described it is, you're going to need an entirely new data management infrastructure to handle this.
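George's description of a knowledge graph, real-world entities like riders, drivers, and routes defined by data and linked by typed relationships, can be sketched minimally in code. This is a hypothetical illustration of the idea only; the entity kinds, predicate names, and identifiers are invented for the example and come from no actual Uber or Muglia design.

```python
# Minimal sketch of a knowledge graph: entities are typed nodes,
# relationships are (subject, predicate) -> objects edges.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    kind: str   # e.g. "rider", "driver", "route" (illustrative kinds)
    id: str

@dataclass
class KnowledgeGraph:
    # edges[(subject, predicate)] -> set of related entities
    edges: dict = field(default_factory=dict)

    def relate(self, subj, predicate, obj):
        self.edges.setdefault((subj, predicate), set()).add(obj)

    def neighbors(self, subj, predicate):
        return self.edges.get((subj, predicate), set())

kg = KnowledgeGraph()
rider = Entity("rider", "r42")
driver = Entity("driver", "d7")
route = Entity("route", "downtown-airport")
kg.relate(rider, "requested", route)
kg.relate(driver, "assigned_to", route)
```

The point of the structure is that the "app" is a traversal of the graph (which driver is assigned to the route this rider requested) rather than a form over tables.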
But where, you know, we had this really colorful interview where it was like Rock 'Em Sock 'Em, but they weren't really that much in opposition to each other, because Tristan is going to define this layer, starting with like business intelligence metrics, where you're defining things like bookings, billings, and revenue, in business terms, not in SQL terms -- >> Well, business terms, if I can interrupt, he said the one thing we haven't figured out how to APIify is KPIs that sit inside of a data warehouse, and that's essentially what he's doing. >> George: That's what he's doing, yes. >> Right. And so then you can now expose those APIs, those KPIs, that sit inside of a data warehouse, or a data lake, a data store, whatever, through APIs. >> George: And the difference -- >> So what does that do for you? >> Okay, so all of a sudden, instead of working in technical data terms, where you're dealing with tables and columns and rows, you're dealing instead with business entities, using the Uber example of drivers, riders, routes, you know, ETA prices. But you can define, DBT will be able to define those progressively in richer terms, today they're just doing things like bookings, billings, and revenue. But Bob's point was, today, the data warehouse that actually runs that stuff, whereas DBT defines it, the data warehouse that runs it, you can't do it with relational technology. >> Dave: Relational technology, caching architecture. >> SQL, you can't -- >> SQL caching architectures in memory, you can't do it, you've got to rethink down to the way the data lake is laid out on the disk or cache. Which by the way, Thomas Hazel, who's speaking later, he's the chief scientist and founder at Chaos Search, he says, "I've actually done this," basically leave it in an S3 bucket, and I'm going to query it, you know, with no caching.
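The "API-ified KPI" idea Tristan and Dave are circling can be sketched as a tiny metrics layer: a KPI like bookings is defined once in business terms, and callers ask for it by name instead of writing SQL. The in-memory "warehouse" table, the metric definitions, and the status values here are toy assumptions, not DBT's actual semantic-layer API.

```python
# A metric is a named business definition evaluated over raw rows.
orders = [  # stand-in for a warehouse table
    {"amount": 100.0, "status": "booked"},
    {"amount": 250.0, "status": "billed"},
    {"amount": 250.0, "status": "booked"},
]

METRICS = {
    "bookings": lambda rows: sum(r["amount"] for r in rows if r["status"] == "booked"),
    "billings": lambda rows: sum(r["amount"] for r in rows if r["status"] == "billed"),
}

def get_kpi(name, rows):
    """The exposed API: callers use a business term, not tables and columns."""
    return METRICS[name](rows)

print(get_kpi("bookings", orders))  # 350.0
```

Richer entities (drivers, riders, ETAs) would be defined the same way: progressively, on top of the same raw data, without the consumer ever touching SQL.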
>> All right, so what I hear you saying then, tell me if I got this right, there are some things that are inadequate in today's world, that aren't compatible with the Supercloud wave. >> Yeah. >> Specifically how you're using storage, and data, and stateful. >> Yes. >> And then the software that makes it run, is that what you're saying? >> George: Yeah. >> There's one other thing you mentioned to me, it's like, when you're using a CRM system, a human is inputting data. >> George: Nothing happens till the human does something. >> Right, nothing happens until that data entry occurs. What you're talking about is a world that self-forms, polling data from the transaction system, or the ERP system, and then builds a plan without human intervention. >> Yeah. Something in the real world happens, where the user says, "I want a ride." And then the software goes out and says, "Okay, we've got to match a driver to the rider, we've got to calculate how long it takes to get there, how long to deliver 'em." That's not driven by a form, other than the first person hitting a button and saying, "I want a ride." All the other stuff happens autonomously, driven by data and analytics. >> But my question was different, Dave, so I want to get specific, because this is where the startups are going to come in, this is the disruption. Snowflake is a data warehouse that's in the cloud, they call it a data cloud, they refactored it, they did it differently, and the success, we all know what it looks like. These areas where it's inadequate for the future are areas that'll probably be either disrupted, or refactored. What is that? >> That's Muglia's contention, that DBT can start adding that layer where you define these business entities, they're like mini digital twins, you can define them, but the data warehouse isn't strong enough to actually manage and run them.
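The "no form, everything after the button press is automatic" flow George describes can be sketched as a single event triggering matching and ETA calculation with no further human input. The matching rule (nearest available driver by a one-dimensional position) and all names here are deliberately toy assumptions for illustration.

```python
# One user event ("I want a ride") drives everything downstream.
def match_driver(ride_request, drivers):
    # Pick the nearest available driver; no human picks from a form.
    available = [d for d in drivers if d["available"]]
    return min(available, key=lambda d: abs(d["pos"] - ride_request["pos"]))

def eta_minutes(driver, ride_request, speed=1.0):
    # Distance over speed; a stand-in for real routing.
    return abs(driver["pos"] - ride_request["pos"]) / speed

drivers = [{"id": "d1", "pos": 5.0, "available": True},
           {"id": "d2", "pos": 1.0, "available": True},
           {"id": "d3", "pos": 0.5, "available": False}]
request = {"rider": "r42", "pos": 0.0}

chosen = match_driver(request, drivers)  # d2: nearest available driver
```

Contrast with the CRM case: here no row exists until the system itself derives the plan (driver, ETA) from data.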
And Muglia is behind a company that is rethinking the database, really in a fundamental way that hasn't been done in 40 or 50 years. It's the first, in his contention, the first real rethink of database technology in a fundamental way since the rise of the relational database 50 years ago. >> And I think you admit it's a real Hail Mary, I mean it's quite a long shot, right? >> George: Yes. >> Huge potential. >> But they're pretty far along. >> Well, we've been talking on theCUBE for 12 years, and what, 10 years going to AWS re:Invent, Dave, that no one database will rule the world, Amazon kind of showed that with them. What's different, is it databases are changing, or you can have multiple databases, or? >> It's a good question. And the reason we've had multiple different types of databases, each one specialized for a different type of workload, but actually what Muglia is behind is a new engine that would essentially, you'll never get rid of the data warehouse, or the equivalent engine in like a Databricks data lakehouse, but it's a new engine that manages the thing that describes all the data and holds it together, and that's the new application platform. >> George, we have one minute left, I want to get a real quick thought, you're an investor, and we know your history, and the folks watching, George's got a deep pedigree in investment data, and we can testify to that. If you're going to invest in a company right now, if you're a customer, I got to make a bet, what does success look like for me, what do I want walking through my door, and what do I want to send out? What companies do I want to look at? What kind of vendor do I want to evaluate? Which ones do I want to send home?
>> Well, the first thing a customer really has to do when they're thinking about next gen applications, all the people have told you guys, "we got to get our data in order," getting that data in order means building an integrated view of all your data landscape, which is data coming out of all your applications. It starts with the data model, so, today, you basically extract data from all your operational systems, put it in this one giant, central place, like a warehouse or lakehouse, but eventually you want this, whether you call it a fabric or a mesh, it's all the data that describes how everything hangs together, as one big knowledge graph. There's different ways to implement that. And that's the most critical thing, 'cause that describes your Uber landscape, your Uber platform. >> That's going to power the digital transformation, which will power the business transformation, which powers the business model, which allows the builders to build -- >> Yes. >> Coders to code. That's the Supercloud application. >> Yeah. >> George, great stuff. The next interview you're going to see right here is Bob Muglia and Tristan Handy, they're going to unpack this new wave. Great segment, really worth unpacking and reading between the lines with George, and Dave Vellante, and those two great guests. And then we'll come back here to the studio for more of the live coverage of Supercloud 2. Thanks for watching. (upbeat electronic music)
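George's "getting your data in order" step, extracting entities from separate operational systems and stitching them into one integrated view, can be sketched in a few lines. The system names (CRM, ERP), fields, and the shared account key are invented for the example; real fabrics and meshes do far more (identity resolution, lineage, governance).

```python
# Two operational systems, keyed by a shared account identifier.
crm = {"acct-1": {"name": "Acme", "owner": "alice"}}
erp = {"acct-1": {"open_invoices": 2}}

def integrate(*systems):
    """Merge per-system records into one integrated view per key."""
    landscape = {}
    for system in systems:
        for key, fields in system.items():
            landscape.setdefault(key, {}).update(fields)
    return landscape

view = integrate(crm, erp)
# view["acct-1"] now describes the account across both systems
```

The knowledge-graph version of this replaces the flat merge with typed relationships between the datasets themselves, so the landscape describes how everything hangs together, not just what each system holds.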

Published Date : Feb 17 2023



Aman Naimat, Demandbase, Chapter 2 | George Gilbert at HQ


 

>> And we're back, this is George Gilbert from Wikibon, and I'm here with Aman Naimat at Demandbase, the pioneers in the next gen AI generation of CRM. So Aman, let's continue where we left off. So we're talking about natural language processing, and I think most people are familiar with it more on the B to C technology, where the big internet providers have sort of accumulated a lot of voice data and have learned how to process it and convert it into text. So tell us how B to B NLP is different, to use a lot of acronyms. In other words, how you're using it to build up a map of relationships between businesses. >> Right, yeah, we call it the demand graph. So it's an interesting question, because firstly, it turns out that, while very different, B to B is also, the language is quite boring. It doesn't evolve as fast as consumer concepts. And so it makes the problem much more approachable from a language understanding point of view. So natural language processing or natural language understanding is all about how machines can understand and store and take action on language. So while we were working on this four or five years ago, and that's my background as well, it turned out the problem was simpler, because human language is very rich, and natural language processing converting voice to text is trivial compared to understanding meaning of things and words, which is much more difficult. Or even the sense of the word, apparently in English each word has six meanings, right? We call them word senses. So the problem was only simpler because B to B language doesn't tend to evolve as fast as regular language, because terms stick in an industry. The challenge with B to B and why it was different is that each industry or sub-industry has a very specific language and jargon and acronyms. So to really understand that industry, you need to come from that industry. 
So if you go back to the CRM example of what happened 10, 20 years ago, you would have a salesperson that would come from that industry if you wanted to sell into it. And that still happens in some traditional companies, right? So the idea was to be able to replicate the knowledge that they would have as if they came from that industry. So it's the language, the vocabularies, and then ultimately have a way of storing and taking action on it. It's very analogous to what Google had done with Knowledge Graph. >> Alright, so two questions I guess. First is, it sounds almost like a translation problem, in the sense that you have some base language primitives, like partner, supplier, competitor, customer. But that the language in each industry is different, and so you have to map those down to those sort of primitives. So tell us the process. You don't have on staff people who translate from every industry. >> I mean that was the whole problem, writing logical rules or expressions for language, which used conventional good old fashioned AI. >> You mean this was the rules-based knowledge engineering? >> That's right. And that clearly did not succeed, because it is impossible to do it. >> The old quip which was, one researcher said, "Every time I fired a rules engineer, my accuracy score would go up." (chuckles) >> That's right, and now the problem is because language is evolving, and the context is so different. So even pharmaceutical companies in the US or in the Bay Area would use different language than pharma in Europe or in Switzerland. And so it's just impossible to be able to quantify the variations. >> George: To do it manually. >> To do it manually, it's impossible. It's certainly not possible for a small startup. And we did try having it be generated. In the early days we used to have crowdsourced workers validate the machine. But it turned out that they couldn't do it either, because they didn't understand the pharmaceutical language either, right?
So in the end, the only way to do that was to have some sort of model and some seed data to be able to validate it, or to hire experts and to have small samples of data to validate. So going back to the graph, right, it turns out that when we have seen sophisticated AI work, you know, towards complex problems, so for example predicting your next connection on LinkedIn, or your next friend, or what ads should you see on Facebook, they have used network-based data, social graph data, or in the case of Google, it's the Knowledge Graph, of how things are connected. And somehow machine learning and AI systems based on network data tend to be more powerful and more intuitive than other types of models. >> So OK, when you say model, help us with an example of, you're representing a business and who it's connected to and its place in the world. >> So the demand graph is basically as Demandbase, who are our customers, who are their partners, who are their suppliers, who are their competitors. And utilizing that network of companies in a manner that we have network of friends on LinkedIn or Facebook. And it turns out that businesses are extremely social in nature. In fact, we found out that the connections between companies have more signal, and are more predictive of acquisition or predicting the next customer, than even the Facebook social graph. So it's much easier to utilize the business graph, the B to B business graph, to predict the next customer, than to say, predict your next friend on Facebook. >> OK, so that's a perfect analogy. So tell us about the raw material you churn through on the web, and then how you learn what that terminology might be. You've boot-strapped a little bit, now you have all this data, and you have to make sense out of new terms, and then you build this graph of who this business is related to. 
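Aman's point that network data makes prediction more powerful has a standard, simple form: link prediction by common neighbors, where a candidate company that shares more graph connections with an existing customer scores higher as the likely "next customer." The company names, edges, and the common-neighbor scoring rule are illustrative assumptions, not Demandbase's actual model.

```python
# Business graph: company -> set of connected companies (suppliers, partners...).
edges = {
    "acme":    {"supplierX", "partnerY"},
    "globex":  {"supplierX", "partnerY", "bankZ"},
    "initech": {"bankZ"},
}

def score(candidate, customer):
    # Common-neighbor count: a classic link-prediction signal.
    return len(edges[candidate] & edges[customer])

def next_customer(candidates, customer):
    return max(candidates, key=lambda c: score(c, customer))

best = next_customer(["globex", "initech"], "acme")  # globex shares 2 neighbors
```

Real systems layer learned models on top of such graph features, but the intuition is exactly this: connections between companies carry predictive signal.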
>> That's right, and the hardest part is to be able to handle rumors and to be able to handle jokes, like, "Isn't it time for Microsoft to just buy Salesforce?" Question mark, smiley face. You know, so it's a challenging problem. But we were lucky that business language and business press is definitely more boring than, you know, people talking about movies. >> George: Or Reddit. >> Or Reddit, right. So the way we work is we process the entire business internet, or the entire internet. And initially we used to crawl it ourselves, but soon realized that Common Crawl, which is an open source foundation that has crawled the internet and put out at least a large chunk of it, and that really enabled us to stop the crawling. And we read the entire internet and look at, ultimately we're interested in businesses, 'cause that's the world we're in, business, B to B marketing and B to B sales. We look at wherever there's a company mentioned or a business person or business title mentioned, and then ignore everything else. 'Cause if it doesn't have a company or a business person, we don't care. Right, so, or a business product. So we read the entire internet, and try to then infer, say Amazon is mentioned in it, then we figure out, is it Amazon the company, or is it Amazon the river? So that's problem number one. So we call it the entity linking problem. And then we try to understand and piece together the various expressions of relationships between companies expressed in text. It could be a press release, it could be a competitive analysis, it could be announcement of a new product. It could be a supply chain relationship. It could be a rumor. And then it also turns out the internet's very noisy, so we look at corroboration across multiple disparate sources--
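The two steps Aman names, entity linking ("Amazon the company or Amazon the river?") and corroboration across sources, can each be sketched in miniature. The cue-word lists and the two-source threshold are deliberately toy assumptions; production systems use learned disambiguation models, not keyword counts.

```python
# Step 1: entity linking -- resolve an ambiguous mention from its context.
COMPANY_CUES = {"earnings", "aws", "retail", "stock"}
RIVER_CUES = {"rainforest", "river", "brazil", "basin"}

def link_amazon(context):
    words = set(context.lower().split())
    company_hits = len(words & COMPANY_CUES)
    river_hits = len(words & RIVER_CUES)
    return "company" if company_hits >= river_hits else "river"

# Step 2: corroboration -- trust a claimed relationship only when
# enough independent sources repeat it (filters rumors and jokes).
def corroborated(sources_reporting, min_sources=2):
    return len(sources_reporting) >= min_sources

sense = link_amazon("Amazon reported strong AWS earnings this quarter")
```

A joke tweet is a single source and fails corroboration; a supply-chain relationship echoed by a press release and a filing passes.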
(George laughs) So we look at corroboration and the sources to be able to infer if we can have confidence in this. >> I can imagine this could be applied to-- >> A lot of other problems. >> Political issues. So OK, you've got all these sources, give us some specific examples of feeds, of sources, and then help us understand. 'Cause I don't think we've heard a lot about the notion of boot-strapping, and it sounds like you're generalizing, which is not something that most of us are familiar with who have a surface-level familiarity with machine learning. >> I think there was a lot of research like, not to credit Google too much, but... Boot-strapping methods were used by Sergey, I think he was the first person, and then he gave up 'cause they founded Google and they moved on. And since then in 2003, 2004, there was a lot of research around this topic. You know, and it's in the genre of unsupervised machine learning models. And in the real world, because there's less labeled data, we tend to find that to be an extremely effective method, to learn language and obviously now with deep learning, it's also being utilized more, unsupervised methods. But the idea is really to, and this was around five years ago when we started building this graph, and I obviously don't know how the Google Knowledge Graph is built, but I can assume it's a similar technique. We don't tend to talk about how commercial products work that much. But the idea is basically to generalize models or learn from a small seed, so let's say I put in a seed like Nike and Adidas, and say they compete, right? And then if you look at the entire internet and look at all the expressions of how Nike and Adidas are expressed together in language, it could be, you know, "I think Nike shoes are better than Adidas." >> Ah, so it's not just that you find an opinion that they're better than, but you find all the expressions that explain that they're different and they're competition. >> That's right.
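The seed-based boot-strapping Aman describes has a classic two-step shape: from one known pair (Nike, Adidas, "compete"), learn the textual pattern connecting them, then reuse that pattern to extract new pairs from fresh text. This toy version uses exact string patterns and invented sentences; real systems score many noisy patterns statistically.

```python
import re

corpus = [
    "I think Nike shoes are better than Adidas.",
    "Many say Pepsi shoes are better than Coke.",  # nonsense claim, same pattern
]

seed = ("Nike", "Adidas")

def learn_pattern(sentence, pair):
    """Generalize a seed pair into the text pattern between the mentions."""
    a, b = pair
    i, j = sentence.find(a), sentence.find(b)
    if i == -1 or j == -1 or i >= j:
        return None
    return sentence[i + len(a):j]          # e.g. " shoes are better than "

def apply_pattern(sentence, pattern):
    """Use the learned pattern to extract a new candidate pair."""
    m = re.search(r"(\w+)" + re.escape(pattern) + r"(\w+)", sentence)
    return (m.group(1), m.group(2)) if m else None

pattern = learn_pattern(corpus[0], seed)
new_pair = apply_pattern(corpus[1], pattern)
```

Each newly extracted pair can then seed another round, which is why the method is called boot-strapping, and why, as the next exchange notes, it also needs filters for non-competitive contexts.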
But we also find cases where somebody's saying, "I bought Nike and Adidas," or, "Nike and Adidas shoes are sold here." So we have to be able to be smart enough to discern when it's something else and not competition. >> OK, so you've told us how this graph gets built out. So the suppliers, the partners, the customers, the competitors, now you've got this foundation-- >> And people and products as well. >> OK, people, products. You've got this really rich foundation. Now you build an application on top of it. Tell us about CRM with that foundation. >> Yeah, I mean we have the demand graph, in which we also tie in things around basic data that you could find, firmographics and intent, that we've also built. But it also turns out that the knowledge graph itself, our initial intuition was that we'll just expose this to end users, and they'll be able to figure it out. But it was just too complicated. It really needed another level of machinery and AI on top to take advantage of the graph, and to be able to build prescriptive actions, or to solve a business problem. A problem could be, I'm an IoT startup, I'm looking for manufacturing companies who will buy my product. Or it could be, I am a venture capital firm, I want to understand what other venture capital firms are investing in. Or, hey, I'm Tesla, and I'm looking for a new supplier for the new Tesla screen. Or you know, things of that nature. So then we apply and build specific models, more machine learning, or layers of machine learning, to then solve specific business problems. Like the reinforcement learning to understand next best action.
>> Yeah, I mean we take our proprietary data sets that we've accumulated over the years and manufactured over the years, and then co-mingle with customer data, which we keep private, 'cause they own the data. And the technology is generic, but you're right, the model being generated by the machine is specific to every customer. So obviously the next best action model for a pharmaceutical company is based on doctors visiting, and is this person an oncologist, or what they're researching online. And that model is very different than a model for Demandbase for example, or Salesforce. >> Is it that the algorithm's different, or it's trained on different data? >> It's trained on different data. It's the same code, I mean we only have 20, 30 data scientists, so we're obviously not going to build custom code for... So the idea is it's the same model, but the same meta model is trained on different data. So public data, but also customers' private data. >> And how much does the customer, let's say your customer's Tesla, how much of it is them running some of their data through this boot-strapping process, versus how much of it is, your model is set up and it just automatically once you've boot-strapped it, it automatically starts learning from the interactions with the Tesla, with Tesla itself from all the different partners and customers? >> Right, I think you know, we have found, most startups are just learning over small data sets, which are customer-centric. What we have found is real magic happens when you take private data and combine it with large amounts of public data. So at Demandbase, we have massive amounts of public and proprietary data. And then we plug in, and we have to tell you that our client is Tesla, so it understands the localized graph, and knows the Tesla ecosystem, and that's based on public data sets and our proprietary data. Then we also bring in your private slice whenever possible. >> George: Private...? >> Slice of data. 
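Aman's "same code, different data" point, one generic meta model, trained separately on each customer's private slice, is easy to make concrete. The mean-threshold "model" below is deliberately trivial and invented; the point is only that identical code yields customer-specific behavior once fit on different data.

```python
# One generic model class: the "same code" shared across all customers.
class ThresholdModel:
    def fit(self, values):
        # The learned parameter differs per customer's data slice.
        self.threshold = sum(values) / len(values)
        return self

    def predict(self, x):
        return x > self.threshold

# Same class, fit on different (toy) customer data slices.
pharma_model = ThresholdModel().fit([10, 20, 30])   # learns threshold 20.0
saas_model   = ThresholdModel().fit([1, 2, 3])      # learns threshold 2.0
```

The same input now gets different predictions depending on whose data trained the model, which is exactly the pharmaceutical-versus-Salesforce contrast in the transcript.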
So we have code that can plug into your web site, and then start understanding interactions that your customers are having. And then based on that, we're able to train our models. As much as possible, we try to automate the data capture process, so in essence using a sensor or using a pixel on your web site, and then we take that private stream of data and include it in our graph and merge it in, and that's where we find... Our data by itself is not as powerful as our data mixed with your private data. >> So I guess one way to think about it would be, there's a skeletal graph, and that may be sounding too minimalistic, there's a graph. But let's say you take Tesla as the example, you tell them what data you need from them, and that trains the meta models, and then it fleshes out the graph of the Tesla ecosystem. >> Right, whatever data we couldn't get or infer, from the outside. And we have a lot of proprietary data, where we see online traffic, business traffic, what people are reading, who's interested in what, for hundreds of millions of people. We have developed that technology. So we know a lot without actually getting people's private slice. But you know, whenever possible, we want the maximum impact. >> So... >> It's actually simple, and let's divorce the words graphs for a second. It's really about, let's say that I know you, right, and there's some information you can tell me about you. But imagine if I google your name, and I read every document about you, every video you have produced, every blog you have written, then I have the best of both knowledge, right, your private data from maybe your social graph on Facebook, and then your public data. And then if I knew, you know... If I partnered with Forbes and they told me you logged in and read something on Forbes, then they'll get me that data, so now I really have a deep understanding of what you're interested in, who you are, what's your language, you know, what are you interested in. 
It's that, sort of simplified, but similar, at a much larger scale. >> Alright, let's take a pause at this point and then we'll come back with part three. >> Excellent.

Published Date : Nov 2 2017

