Ed Walsh and Thomas Hazel, ChaosSearch
>> Welcome to theCUBE, I'm Dave Vellante. And today we're going to explore the ebb and flow of data as it travels into the cloud and the data lake. The concept of the data lake was alluring when it was first coined last decade by Pentaho CTO James Dixon. Rather than being limited to highly structured and curated data that lives in a relational database, in the form of an expensive and rigid data warehouse or data mart, a data lake is formed by flowing data from a variety of sources into a scalable repository, like, say, an S3 bucket, that anyone can access, dive into, and extract water, a.k.a. data, from, analyzing data that's much more fine-grained and less expensive to store at scale. The problem became that organizations started to dump everything into their data lakes with no schema on write, no metadata, no context, just shoving it into the data lake to figure out what's valuable at some point down the road. Kind of reminds you of your attic, right? Except this is an attic in the cloud, so it's too big to clean out over a weekend. Well look, it's 2021 and we should be solving this problem by now. A lot of folks are working on this, but often the solutions add other complexities for technology pros. So to understand this better, we're going to enlist the help of ChaosSearch CEO Ed Walsh and Thomas Hazel, the CTO and founder of ChaosSearch. We're also going to speak with Kevin Miller, who's the Vice President and General Manager of S3 at Amazon Web Services, and of course they manage the largest and deepest data lakes on the planet. And we'll hear from a customer to get their perspective on this problem and how to go about solving it. But let's get started. Ed, Thomas, great to see you. Thanks for coming on theCUBE.

>> Likewise.

>> Face to face, it's really good to be here.

>> It is nice face to face.

>> It's great.

>> So, Ed, let me start with you. We've been talking about data lakes in the cloud forever. Why is it still so difficult to extract value from those data lakes?

>> Good question. I mean, data analytics at scale has always been a challenge, right? So we're making some incremental changes. As you mentioned, we need to see some step-function changes. In fact, it's the reason ChaosSearch was really founded. But if you look at it, it's the same challenge around a data warehouse or a data lake. Really, it's not just flowing the data in, it's how to get insights out. So it kind of falls into a couple of areas. The business side will always complain, and it's kind of uniform across everything in data lakes, everything in data warehousing. They'll say, "Hey, listen, I typically have to deal with a centralized team to do that data prep, because it's data scientists and DBAs." Most of the time they're a centralized group; sometimes they're in business units, but most of the time, because they're scarce resources, they're pulled together. And then it takes a lot of time. It's arduous, it's complicated, it's a rigid process to deal with that team. It's hard to add new data, it's very hard to share data, and there's no way to do governance without locking it down. And of course they'd like it to be more self-serve. So you hear that from the business side constantly. Now underneath, there are some real technology issues, like the fact that we haven't really changed the way we do data prep since the two thousands, right? So if you look at it, it falls into two big areas. One is how to do data prep: a request comes in from a business unit.
I want to do X, Y, Z with this data. I want to use this type of tool set to do the following. Someone has to figure out how to put that data in the right schema, as you mentioned. You have to put it in the right format so that the tool sets can analyze that data before you do anything. And then there's the second thing, and I'll come back to the first because that's the biggest challenge, but the second challenge is how these different data lakes and data warehouses are now persisting data, the complexity of managing that data, and also the cost of computing on it. And I'll go through that. But basically the biggest thing is actually getting from raw data to something usable. The rigidness and complexity the business side runs into is that literally someone has to do this ETL process, extract, transform, load. A request comes in, I need this much data put together in this particular way, and they're literally physically duplicating data and putting it together in a schema. They're stitching together almost a data puddle for all these different requests. And what happens is, any time they have to do that, someone has to do it, and those very skilled resources are scarce in the enterprise, right? So it's DBAs and data scientists. And then when they want new data, you give them a data set, and they're always saying, what can I add to this data? Now that I've seen the reports, I want to add this data, and I want it fresher. And the same process has to happen. This takes about 60% to 80% of the data scientists' and DBAs' time. It's kind of well documented. And this is what actually stops the process. That's what is rigid. It has to be rigid because there's a process around it. That's the biggest challenge of doing this. And it takes an enterprise weeks or months. I always say three weeks or three months, and no one challenges me beyond that. It also ties up the same skill set of people you want driving digital transformation, data warehousing initiatives, modernization, being data-driven, all these data scientists and DBAs you don't have enough of. So this is not only hurting you getting insights out of your data lakes and warehouses, this resource constraint is also hurting those broader initiatives.

>> So that smallest atomic unit is that team, that super specialized team, right?

>> Right.

>> Yeah. Okay. So you guys talk about activating the data lake.

>> Yep.

>> For analytics. What's unique about that? What problems are you all solving? You know, when you guys created this magic sauce.

>> No, and basically there are a lot of things. I highlighted the biggest one, how to do the data prep, but there's also how you're persisting and using the data. In the end, there are a lot of challenges in getting analytics at scale, and this is really why Thomas and I founded the team to go after this. But I'll try to say it simply. I'll compare and contrast what we do with what you do with maybe an Elastic cluster or a BI cluster. If you look at it, what we do is we simply use your data in S3. Don't move it, don't transform it. In fact, we're against data movement. What we do is we literally point at that data, we index that data, and we make it available in a data representation where you can give virtual views to end users. And those virtual views are available immediately over petabytes of data, and it actually gets presented to the end user as an open API. So if you're an Elasticsearch user, you can use all your Elasticsearch tools on this view.
If you're a SQL user, it's Tableau, Looker, all the different tools, and the same thing with machine learning next year. So what we do is we make it very simple. The data is already there; simply point us at it. We do the hard work of indexing it and making it available, and then we publish it as an open API, so your users can use exactly the tools they use today. So, dramatically, I'll give you a before and after. Let's say you're doing Elasticsearch, you're doing log analytics at scale. They're landing their data in S3, and then they're ETLing it, physically duplicating and moving data, and typically deleting a lot of data, to get it into a format that Elasticsearch can use. They're persisting it in a data layer called Lucene. It's physically sitting in memory, CPU, SSDs, and it's not one of those, it's a bunch of those. In the cloud you have to set them up because they're persisting on that compute, and they stay up seven by 24, not a very cost-effective way to do cloud computing. What we do, in comparison, is literally point at the same S3. In fact, you can run us in complete parallel with the data that's being ETLed out; we're just one more read-only use case that gets that data and makes these virtual views. So we run in complete parallel, but we just give a virtual view to the end users. We don't need this persistence layer, this extra cost layer, this extra time, cost, and complexity. So when you look at what happens in Elastic, they have a constraint, a trade-off of how much you can keep versus how much you can afford to keep. And it also becomes unstable over time, because you have to build out a schema, it's on a server, and as the schema scales out, guess what, you have to add more servers, which is very expensive. They're up seven by 24. And they also become brittle: you lose one node and the whole thing has to be put back together. We have none of that cost and complexity. You keep whatever you want to keep in S3 as the single persistence layer, very cost effective. And on cost, we save 50 to 80%. Why? We don't go with the old paradigm of setting it up on servers, spinning them up for persistence, and keeping them up seven by 24. We literally ask, at the query, what do you want to run, bring up the right compute resources, and then release those resources after the query is done. So we can do some queries that they can't imagine at scale, but we're also able to do the exact same query at 50 to 80% savings. And they don't have to do any of that work of moving data or managing that layer of persistence, which is not only expensive, it becomes brittle. And then, I'll be quick, once you go to BI, it's the same challenge. With the BI systems, the requests are constantly coming from a business unit down to the centralized data team. Give me this flavor of data, I want to use this analytic tool in that tool set. So they have to do all this pipeline work. They're constantly saying, okay, I'll give you this data and this data, I'm duplicating that data, I'm moving it and stitching it together. And then the minute you want more data, they do the same process all over. We completely eliminate that.
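To make that "open API" point a bit more concrete, here is a minimal sketch of what a read-only query against indexed S3 data might look like if the service exposes an Elasticsearch-compatible search endpoint. The endpoint URL, the view name, and the field names are hypothetical placeholders, not ChaosSearch's actual API; the request body itself is standard Elasticsearch query DSL.

```python
import requests

# Hypothetical Elasticsearch-compatible endpoint published by the indexing
# service; the host and view name are placeholders, not a real API.
ENDPOINT = "https://search.example.com"
VIEW = "app-logs-view"          # a virtual view defined over raw objects in S3

# Standard Elasticsearch query DSL: count 5xx responses in the last hour,
# bucketed by service name.
query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-1h"}}},
                {"range": {"status": {"gte": 500}}},
            ]
        }
    },
    "aggs": {
        "by_service": {"terms": {"field": "service.keyword", "size": 10}}
    },
}

resp = requests.post(f"{ENDPOINT}/{VIEW}/_search", json=query, timeout=30)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["by_service"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```

The appeal of the virtual-view model is that dashboards and client code that already speak this DSL can keep working unchanged while the underlying data never leaves the bucket.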
>> And those requests queue up. Thomas, to me, the fact that you don't have to move the data, that's kind of the exciting piece here, isn't it?

>> Absolutely. And I think, you know, the data lake philosophy has always been solid, right? The problem is we had that Hadoop hangover, right, where, let's say, we were using that platform in a few too many ways. So I always believed in the data lake philosophy. When James came along and coined it, I was like, that's it. However, HDFS wasn't really a service. Cloud object storage is a service; the elasticity, the security, the durability, all those benefits are really why we founded on cloud object storage as a first move.

>> So Thomas was talking about, you know, being able to shut off essentially the compute so you don't have to keep paying for it. But there are other vendors out there doing something similar, separating compute from storage; they're famous for that. And you have Databricks out there doing their lakehouse thing. Do you compete with those? How do you participate, and how do you differentiate?

>> Well, you know, you've heard these terms, data lake, warehouse, now lakehouse. What everybody wants is simple in, easy in; however, the problem with data lakes was the complexity of getting value out. So I said, what if you could have the easy in and the value out? If you look at, say, Snowflake as a warehousing solution, you have to do all that prep and data movement to get into that system, and it's rigid and static. Now Databricks, that lakehouse, has the exact same thing. Sure, they have a data lake philosophy, but their data ingestion is not data lake philosophy. So I said, what if we had that simple in, with a unique architecture and index technology that makes it virtually accessible, publishable dynamically at petabyte scale? And so our service connects to the customer's cloud storage. You stream the data in, set up what we call a live indexing stream, and then go to our data refinery and publish views that can be consumed via the Elastic API, using Kibana or Grafana, or as SQL tables for Looker or, say, Tableau. And so we're getting the benefits of both sides: the write performance of schema-on-read with the read performance of schema-on-write. If you can do that, that's the true promise of a data lake. You know, again, nothing against Hadoop, but schema-on-read with all that complexity of software turned into a bit of a data swamp.

>> Well, you've got to start somewhere, okay, so we've got to give them their props. But everybody I talk to has got this big bunch of Spark clusters and is now saying, all right, this doesn't scale, we're stuck. And so, you know, I'm a big fan of Zhamak Dehghani and her concept of the data mesh, and it's early days. But if you fast forward to the end of the decade, what do you see as being the critical components of this notion people call data mesh, and the analytics stack around it? You're a visionary, Thomas. How do you see this thing playing out over the next decade?

>> I love her thought leadership, to be honest. Our core principles were her core principles, now, five, six, seven years ago. This idea of decentralization, data as a product, self-serve, and federated computational governance, all of that was our core principle. The trick is, how do you enable that mesh philosophy? I can say we're mesh-ready, meaning we can participate in a way that very few products can. If there are gates to getting data into your system, the ETL, the schema management, you have a problem. My argument with the data mesh is that producers and consumers have the same rights. I want the consumer, the people who choose how they want to consume that data, to have those rights, as well as the producer publishing it. I can say our data refinery is that answer. You know, shoot, I'd love to open up a standard, right?
A standard where we could really talk about the producers and consumers and the rights each of them has. But I think she's right on the philosophy. I think as products mature in the cloud, in these data lake capabilities, the trick is those gates. If you have to structure up front, if you have to set up those pipelines, getting your data into a mesh runs into the weeks and months that Ed was mentioning.

>> Well, I think you're right. I think the problem with data mesh today is the lack of standards. You know, when you draw the conceptual diagrams, you've got a lot of lollipops, which are APIs, but they're all unique primitives. So there aren't standards by which, to your point, the consumer can take the data the way he or she wants it and build their own data products, without having to tap people on the shoulder to ask, how can I use this, where does the data live, and without being able to add their own data.

>> You're exactly right. So I'm an organization, I'm generating data, and I continuously stream it into a lake. And then with the service, the ChaosSearch service, that data is discoverable and configurable by the consumer. Let's say you want to go to the corner store: I want to make a certain meal tonight, I want to pick and choose what I want, how I want it. Imagine if the data mesh truly had that, the producer of information offering, you know, all the things you can buy at a grocery store, and you choosing what you want to make for dinner. If it's static, if you have to call up your producer to make the change, was it really a data mesh enabled service? I would argue not.

>> Ed, bring us home.

>> Well, maybe one more thing with this.

>> Please, yeah.

>> 'Cause some of this is we're talking 2031, but largely these principles are what we have in production today, right? So even the self-service, where you can actually have a business context on top of a data lake, we do that today. We talked about how we get rid of the physical ETL, which is 80% of the work, but the last 20% is done by this refinery, where you can do virtual views, do all the transformation you need, and make it available. And you can actually give that as a role-based access service to your end users, actual analysts, and you don't have to be a data scientist or DBA. In the hands of a data scientist or DBA it's powerful, but the fact of the matter is, all of our employees, regardless of seniority, whether they're in finance or in sales, actually go through and learn how to do this. So you don't have to be IT. And they can come up with their own views, which is one of the things about data lakes the business units want to do themselves. But more importantly, because they have the context of what they're trying to do, instead of queuing up a very specific request that takes weeks, they're able to do it themselves.

>> And I don't have to put it in different data stores and ETL it, so I can do things in real time or near real time. And that's game changing, and something we haven't been able to do ever.
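As a toy illustration of the "virtual view" idea Ed describes, rather than ChaosSearch's actual refinery, here is a sketch using SQLite: the raw events land once, and what the analyst gets is a view that reshapes them, which can be redefined or dropped without copying any data. The table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw, wide event data lands once and is never copied again.
conn.execute(
    "CREATE TABLE raw_events (ts TEXT, user_id TEXT, country TEXT, amount REAL, payload TEXT)"
)
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?, ?, ?)",
    [
        ("2021-10-01T10:00:00", "u1", "US", 19.99, "{...}"),
        ("2021-10-01T10:05:00", "u2", "DE", 5.00, "{...}"),
        ("2021-10-01T11:00:00", "u1", "US", 42.50, "{...}"),
    ],
)

# A "virtual view" for the sales team: no physical duplication, just a
# different shape over the same raw rows. Redefining it is a metadata change.
conn.execute("""
    CREATE VIEW sales_by_country AS
    SELECT country, COUNT(*) AS orders, ROUND(SUM(amount), 2) AS revenue
    FROM raw_events
    GROUP BY country
""")

for row in conn.execute("SELECT * FROM sales_by_country ORDER BY revenue DESC"):
    print(row)
```

Role-based access in this picture amounts to granting an analyst the view rather than the raw table; the per-role grant mechanics would depend on the engine in use.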
>> And then maybe just to wrap it up. Listen, you know, eight years ago Thomas and his group of founders came up with the concept: how do you actually get after analytics at scale and solve the real problems? And it's not one thing, it's not just getting to S3, it's all these different things. What we have in market today is the ability to literally just stream your data to S3; what we do is automate the process of getting that data into a representation that you can now share and augment, and then we publish open APIs, so you can actually use the tools you want. The first use case is log analytics: hey, it's easy to just stream your logs in, and we give you Elasticsearch-type services. Same thing with SQL, and you'll see mainstream machine learning next year. So listen, I think we have data lake 3.0 now, and we're just stretching our legs right now to have fun.

>> Well, you had to say log analytics, but I really do believe in this concept of building data products and data services, because I want to sell them, I want to monetize them, and being able to do that quickly and easily, so others can consume them, is the future. So guys, thanks so much for coming on the program. Really appreciate it.
Charles Gaddy, Melissa Data | PentahoWorld 2017
(Upbeat music)

>> Announcer: Live from Orlando, Florida, it's theCUBE, covering PentahoWorld 2017. Brought to you by Hitachi Vantara.

>> Welcome back to theCUBE's coverage of PentahoWorld, brought to you, of course, by Hitachi Vantara. I'm your host, Rebecca Knight, along with my cohost, James Kobielius. We're joined by Charles Gaddy, he is the Business Development Manager at Melissa Data. Thanks so much for joining us.

>> Great, thank you for having me.

>> So tell us, tell our viewers a little bit about Melissa Data and what you do there.

>> Well, Melissa is a data quality and identity assurance company, and we have been around for 30 years. We're a 30-year-old startup, you might say, very innovative in what we do and the way we address our problems. We are the strategic partner for Pentaho as it relates to data quality, so most of our data quality solutions are embedded and available within the Pentaho stack. My particular role there is to facilitate global sales and alliances, and Pentaho is one of our global alliances.

>> Okay, so it's a strategic alliance, and so what is your relationship now with Hitachi Vantara?

>> That's a great question, because now that we're with Hitachi Vantara, one of the things we're focusing on is a strategy around data quality blueprints. Data quality blueprints are something that Pentaho brought into that relationship, or that new company, right? And it's a powerful way that they sell their solutions and craft the message around their solutions in a way that sounds less technical and more engaging, I think, and I'll give you a bit of an opinion there. So we're very excited to be one of the first companies, from a partner perspective, to do a blueprint that's not strictly Pentaho based.

>> You're talking about blueprints; is it a consultative marketing and sales tool? Or is it a solution accelerator template, or a bit of both?

>> You stole my thunder, I was going to say I think it's a bit of both, actually, yes. The nice thing that I've seen about the other ones they've done, and the one that we're crafting, is that you're taking a use case, effectively, and you're breaking down what you're bringing to that use case, with a sprinkle of technology, so that they know it is a technical solution as well as a consultative sale. Then you're telling them about the problem you're going to solve with it, and the expected outcomes after you've solved that problem. So the first use case is around customer data quality within online retail, right. So everything from preventing packages from being misplaced, by using address verification and geocoding to improve the quality of the address data you're shipping to, all the way through to customer demographics, so you can understand and overlay demographic information about the customers you're targeting online. All of these solutions, we bring the data piece of that, and Pentaho brings the other elements to make that combined blueprint.

>> So just in hearing you say those things, I'm thinking back to what we heard on the main stage today about the potential of the dark side, in the sense of the models maybe being used for nefarious reasons. I mean, how do you guard against that?
>> Well, you know, there's that AI component, which was very much the Skynet comment, I believe, and then there's data quality. Having been around data quality for quite a while, there's a rules-based element to that, which isn't necessarily AI based, so you don't necessarily have as much of that dark side to deal with. What you are rightfully pointing out is the idea that you're using elements of data that potentially represent someone's identity, right? And how do you protect and safeguard that? Our 30 years in the business really gives us insight on how to protect the data in ways that ensure the quality of it, but also ensure that it's not used for nefarious purposes, like you said.

>> Okay, so as you know, Pentaho co-founder James Dixon coined the term "the data lake". So how has Melissa partnered and integrated with Pentaho in that way?

>> And how does data governance and quality ride upon and leverage the data lake to be effective?

>> Okay, so it's a two-part question. Looking at it from the perspective of what was described as the data lake, things are going into the data lake. Well, you can take two approaches to it, I guess. You can try to boil that data lake, which is very challenging, you know, or you can extract quality information out of it. And so data quality, whether you're pushing data quality into the lake or whether you're trying to extract actionable intelligence out of the lake, fits on both sides and gives you that step towards the analytics and intelligence that you need. Otherwise, it's just a lake. The other side you mentioned is the governance side of it. So our components and our services that run as a part of what is offered with Pentaho give you elements of a feature like profiling, so you're able to profile the data as it's moving between these different places, see the anomalies, potentially address the anomalies if that's something you need to do, or at least be aware of them so you know what's going on, right, and you're constantly monitoring.

>> Does that involve AI or machine learning on your end to do that, the anomaly detection within the data lake?

>> There are elements of our technology that leverage pieces of that, for sure. I wouldn't call it full-blown AI from that perspective, but there are some patents and some proprietary technology that we have that give us a unique approach to how we profile that data and how we make that profiled information actionable within Pentaho.

>> So, you talked about the retailer use case, and that's how we can make sure the packages are delivered to the right places, and the demographics. What are some other examples of ways that we can use Melissa Data?

>> Okay, so as luck would have it, the first blueprint we're doing is the customer one I just mentioned, but we're already talking with Hitachi Vantara about the idea of doing a financial services one. And so in that fintech space, not only would you be able to leverage matching and deduplication, which they call more of an identity resolution in that element, but you'd also be able to leverage the elements of data that we bring to bear to say that you are who you say you are. So you bundle those together in a fintech, or a financial services model, and you've got a different use case from customers and online retail, but you still have a very compelling joint offering as you're pushing data through.
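To give a feel for what the profiling Charles mentions typically checks, here is a small, generic sketch in plain Python, not Melissa's or Pentaho's actual component: it scans records in flight and flags null rates and malformed postal codes so anomalies surface before the data lands downstream. The sample records and field names are hypothetical.

```python
import re

# Hypothetical sample of in-flight customer records.
records = [
    {"name": "Ann Lee", "zip": "02139", "email": "ann@example.com"},
    {"name": "Bob Roy", "zip": "2139", "email": None},
    {"name": None, "zip": "30301-1234", "email": "bob@example.com"},
]

US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")

def profile(rows):
    """Return simple per-field quality metrics: null counts and bad ZIP codes."""
    stats = {"rows": len(rows), "nulls": {}, "bad_zip": 0}
    for row in rows:
        for field, value in row.items():
            if value is None:
                stats["nulls"][field] = stats["nulls"].get(field, 0) + 1
        zip_code = row.get("zip")
        if zip_code is not None and not US_ZIP.match(zip_code):
            stats["bad_zip"] += 1
    return stats

print(profile(records))
# e.g. {'rows': 3, 'nulls': {'email': 1, 'name': 1}, 'bad_zip': 1}
```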
>> That identity piece is particularly relevant in light of the Equifax breach, which will haunt us for the rest of our lives; we keep hearing about this.

>> Yes, you have to be very careful with the data that you utilize, absolutely.

>> One of the terms we keep hearing a lot is future proofing. What does that mean to you at Melissa Data? How do you describe your approach to future proofing your business?

>> So, it's interesting because, as I mentioned, we're pretty much a 30-year-old startup, so as a function of that, we've future proofed ourselves, because we've evolved and adapted. You have to be nimble, you have to be agile, as well as embracing agile concepts, and there are two different meanings there, if you will. And so, in looking at that, you want to make sure that you've got the right technology set, and that that technology set can be easily adapted and evolved over time. I think those are the key things we've done as a company with the solutions we've built. And much like what I heard today on the keynote that Hitachi has focused on doing, we've done a very similar thing, because we started in direct marketing with a database of ZIP codes, and now we offer matching, and we offer these cloud solutions and identity. So we've had a very similar track to that story you heard earlier.

>> You've said it a couple of times, you're a 30-year-old startup. How do you stay innovative? I mean, you're a 30-year-old startup that now has employees in four locations across the U.S., dealing with huge businesses. How do you keep that startup mentality? The hungry mentality, and the hack-y mentality, I guess I should say too?

>> One of the real advantages we've got there is that our CEO and founder has always innovated. From the first company before Melissa, all the way up through today, he's always been one to say we need to try that next thing, right. Pentaho, five or six years ago, was that next thing that he and our VP of strategy said we should try, and now I'm sitting here with you today. There's a top-down, bottom-up approach, if that makes sense to you, because if you have an idea, you can bring that idea forward as well.

>> You mentioned the next thing, and Hitachi Vantara's been saying that in spades today here at this event. It's also a Wikibon research focus: the Edge, Edge computing, Edge analytics, machine data coming from Edge devices. How is Melissa Data, in partnership with Pentaho, moving towards this Edge-to-outcome frame of reference, or frame for building innovative solutions? Where does that fit with your roadmap going forward?

>> So our perspective on that: much like when we first engaged with them, data was going into the data lake, let's just get it all in there, get it all in there, right? Well, eventually you have to make that data actionable. You're going to have a reverse scenario with the Edge. There's a lot of data, in small amounts, small chunks, that is going to be everywhere; I think it was talked about being on cell phones and everywhere else. The idea that you can extend the reach of data quality along with the reach of analytics, to actually make sure you're getting the best data you can to feed those microanalytics, that's a critical part that we see as potential.

>> Looking ahead, what are some of the problems that you want to solve, just sort of in the next year, the next five years? What are some of the things that you're thinking about and keeping you up at night right now?
>> We're doing some very interesting things with globally unique identifiers, I'll call them that, not a GUID in that sense, but the idea that every address on the planet could be indexed, right. And then the idea beyond that was that every email and every phone and every identity around that could be indexed. When you're dealing with a massive set of indexes, it becomes a lot faster and a lot easier to match, to dedupe, to do other data quality tasks. So it's one of the projects that our CEO is very interested in, this sort of indexing, or massive indexing table, concept. And that's one of the things I know we're very focused on as an organization, and how that can feed all of our other technologies.

>> How would that work? I mean, I know it's a research process in motion, but...

>> And keep in mind, I am the head of global sales and alliances, so don't bust out too technical a question. (laughter)

>> Yeah, so this is identity resolution at a massive scale. Does it involve an internet of things, almost like, slap me on the wrist, a graph, a social graph of you and all the identities you may have running on various Edge devices? You meaning a user.

>> I think there is the potential for pieces.

>> Remember, I'm a geek here, so.

>> Yeah, there's a potential for pieces of that to be used in that way. Like, an example we got approached about was someone who wanted to have a cookie that represented the address they just captured from a particular interaction on the web, right. Well, imagine if you could use this indexed table of addresses to get that number back, and you just store that number with that cookie. You'd never have to store that address data again, you could match that index against other indexes, and the uses go on and on and on.

>> James: Right.

>> So it's not complete in any way, so I wouldn't want to venture to answer the complete part of your question, but the idea that you can represent things with a series of numbers is how the internet got started, effectively, right, so you could look at something similar.

>> Right.

>> So you're here at PentahoWorld, and you said you're a biz dev manager. What do you hope to take away from it? I mean, are you talking?

>> You mean outside of business? (laughter)

>> Get some deals done, exactly. But what are you learning, what are you hearing, are you sharing best practices, and how do you do that here?

>> Well, we're pretty tightly connected into different elements of what is now Hitachi Vantara, right. We work with their office in Singapore, we're engaged with them all over the world on many different fronts, and so it's nice to be here, one, so you can literally put some faces with some names, right. And as you look at some of their different initiatives, like the cyber security one that I've seen over there, and some of the other initiatives they've got going, they march a bit in lockstep with what we're doing. And the nice thing about being here is the ability to sort of reconcile that and talk about how we can go forward together with those elements, if that makes sense.

>> James: Right.

>> Absolutely. Well, Charles, thanks so much for coming on theCUBE, it's been great talking to you.

>> James: Yeah, absolutely.

>> Thank you for having me, I appreciate it.

>> We will have more from theCUBE's live coverage of PentahoWorld in just a little bit. (upbeat music)
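An aside on the indexing idea Charles sketches above: the core mechanic, assigning every normalized address a stable numeric identifier so downstream systems store and match on the number rather than the string, can be illustrated in a few lines of Python. The normalization and the ID scheme here are deliberately naive placeholders; a real global address index would be far more involved.

```python
# Toy address index: each distinct normalized address gets a stable integer ID.
address_index = {}

def normalize(address: str) -> str:
    # Placeholder normalization; real address standardization is much richer.
    return " ".join(address.upper().split())

def address_id(address: str) -> int:
    key = normalize(address)
    if key not in address_index:
        address_index[key] = len(address_index) + 1
    return address_index[key]

# Downstream systems (a cookie, a CRM row) store only the number.
a = address_id("100 Main St, Springfield, IL 62701")
b = address_id("100  main st,  springfield, il 62701")   # same place, messy input
print(a, b, a == b)   # 1 1 True -- matching and dedupe become integer compares
```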