Steve Spear, Author - HPE Big Data Conference 2016 #SeizeTheData #theCUBE


 

>> Announcer: It's The Cube. Covering HPE Big Data Conference 2016. Now here are your hosts, Dave Vellante and Paul Gillin. >> Welcome back to Boston, everybody, this is The Cube, we're here live at HPE's big data conference, hashtag seize the data. Steve Spear is here, he's an author, MIT professor, author of The High Velocity Edge, welcome to The Cube, thanks for coming on. >> Oh, thanks for having me. >> I got to tell you, following Phil Black, you were coming onstage, I have never heard you speak before, I said, "Oh, this poor guy," and you did awesome, you were great, you held the audience, so congratulations, you were very dynamic and he was unbelievable and you were fantastic, so. >> Today was the second-worst speaking setup, one time I was on a panel where it was three admirals, a general, and then the other guy wearing a suit, I said, "Well at least another schmo in a suit," and his opening lines were, "You know, this reminds me, "when I was on the space shuttle and we were flying "to the Hubble," and I'm like, "A flipping astronaut, "I got to follow an astronaut?" So anyway, this was only a SEAL, there were a lot of them, there are far fewer astronauts, so that was easy. >> What I really liked about your talk is, first of all, you told the story of Toyota, which I didn't know, you may. >> No, my experience with Toyota was in the early '70s, I remember the Toyota sort of sweeping into the market but you talked about 20 years before it when they were first entering and how this really was a company that had a lot of quality problems and it was perceived as not being very competitive. >> Yeah, Toyota now people look at as almost, they just take for granted the quality, the productivity, they assume good labor relations and that kind of thing, it's non-unionized, not because the unions haven't tried to unionize, but the employees don't feel the need. And again, in the '50s, Toyota was absolutely an abysmal auto-maker, their product was terrible, their productivity was awful and they didn't have particularly good relations with the workforce either. I mean, it's a profound transformation. >> And you gave this stat, in the '50s, I forget what it was, it was one-tenth the productivity of the sort of average automobile manufacturer and then they reached parity in '62, by '68 they were 2X, and by '73, they were off the charts. >> Right, right, right. >> Right, so amazing transformation and then you tried to figure out how they did it and they couldn't answer, but they said, "We can show you," right? And that sort of led to your research and your book. >> Yeah, so the quick background is in some regards, this fellow Kenneth Bowen, who was my mentor and advisor when I was doing my doctorate, you could argue we were late to the game because people started recognizing Toyota as this paragon of virtue, high quality at low cost, and so that in the 1980s prompted this whole investigation and the term lean manufacturing came out of the realization that on any given day, Toyota and its suppliers were making basically twice the product with half the effort and so you had this period of '85 to about '95 where there was this intense attempt to study Toyota, document Toyota, imitate Toyota, General Motors had a joint venture with Toyota, and then you get to the mid-'90s and there's no second Toyota, despite all this investment, so we go to the Toyota guys and say, "Look, clearly if everyone is studying you, imitating you, "copying you, and they haven't replicated you, "they've missed something, so what is it?" 
And they say, "I'm sorry, but we can't tell you." And we said, "Well you got to be kidding, I mean, "you have a joint venture with your biggest competitor, "General Motors," and they said, "No, no, it's not that we wouldn't tell you, "we just actually don't know how to explain what we do "'cause most of us learn it in this very immersive setting, "but if you'd like to learn it, "you can learn it the way we do." I didn't realize at the time that it would be this Karate Kid wax-on, wax-off, paint-up, paint-down experience, which took years and years to learn and there are some funny anecdotes about it but even at the end, their inability to say what it is, so I went years trying to capture what they were doing and realizing I was wrong 'cause different things wouldn't work quite right, and I can tell you, I was on the Shinkansen with the guy who was my Toyota mentor and I finally said, "Mr. Oba, I think I finally "figured it out, it all boils down to these basic "approaches to seeing and solving problems." And he's looking over my cartoons and stuff and he says, "Well, I don't see anything wrong with this." (laughs) >> That was as good as it got. >> That was as good as it got, I was like, "Score, nothing wrong that he can see!" So anyway. >> But so if you talk about productivity, reliability, you made huge gains there, and the speed of product cycles, were the three knobs that Toyota was turning much more significantly than anybody else and then fuel efficiency came. >> Right, so if you start looking at Toyota and I think this is where people first got the attraction and then sort of the dismissive of, we don't make cars, so the initial hook was the affordable reliability, they could deliver a much higher-quality car, much more affordable based on their productivity. And so that's what triggered attention which then manifest itself as this lean manufacturing and its production control tools. What then sort of started to fall off people's radar is that Toyota not only stayed ahead on those dimensions but they added to the dimensionality of the game, so they started introducing new product faster than anybody else and then they introduced new brand more successfully so all the Japanese, Nissan, Honda, Toyota, all came out with a luxury version, but no one came out with Lexus other than Toyota. The Affinity and the Acura, I mean, it's nice cars, but it didn't become this dominant brand like the Lexus. And then in trying to hit the youth market, everyone tried to come up with, like Honda had the Element but nothing like the Scion, so then Toyota's, and that's much further upstream, a much more big an undertaking than just productivity in a factory. And then when it came time to this issue around fuel efficiency, that's a big technology play of trying to figure out how you get these hybridized technologies with a very very complex software engineering overlay to coordinate power flow in this thing and that, and everyone has their version of hybrid, but no one has it through six generations, 21 platforms, and millions of copies sold. So it didn't matter where you were, Toyota figured out how to compete on this value to market with speed and ease which no one else in their industry was replicating. 
>> You're talking about, this has nothing to do with operational efficiency, when you talk about the Scion for example, you're talking about tapping into a customer, into an emotional connection with your customer and being able to actually anticipate what they will want before they even know, how do you operationalize that? >> So I think, again, Toyota made such an impression on people with operational efficiency that a lot of their genius went unrecognized, so what I was trying to elaborate on this morning is that Toyota's operational efficiency is not the consequence of just more clever design of operations, like you have an algorithm which I lack and so you get to a better answer than I do, it was this very intense, almost empathetic approach to improving existing operations, so you're working on something and it's difficult, so we're perceptive of that difficulty and try to understand the source of that difficulty and resolve it, and just do that relentlessly about everything all the time, and it's that empathy to understand your difficulty which then becomes the trigger for making things better, so as far as the Scion comes in, what you see is the same notion of empathic design applied to the needs of the youth market. And the youth market, unlike the folks who were, let's say at the time, middle-aged, was less about reliable affordability, but these were people who were coming of age during the Benetton era of very fast mass customization, or the iPod era, which was a common chassis but very fast, inexpensive personalization, and the folks at Toyota said, "You know what, "the youth market, we don't really understand that, "we've been really successful for this older mid-market, "so let's try to understand the problems that the youth "are trying to solve with their acquisitions," and it turned out to be personalization. And so if you look at the Scion, it wasn't necessarily a technically or technologically sophisticated quote-unquote sexy product, what it did was it lent itself towards very diverse personalization, which was the problem that the youth market was trying to solve. And you actually see, if I can go on this notion of empathic design, so you see this with the Lexus, so I think the conventional wisdom about luxury cars was uber-technology and bling: throw chrome and leather and wood on it, and when Toyota tried that initially, they took what was I guess now the Avalon, a full-sized car, and they blinged it up and it was contradictory 'cause if you're looking for a luxury car, you don't go to a Toyota dealer, and if you go to a Toyota dealer and you see something with chrome and leather and wood veneer, you're like, you have dissonance. So they tried to understand what luxury meant from the American consumer perspective and again, it wasn't, you always wish you'd get this job, but they sent an engineering team to live in Beverly Hills for some months. (laughs) It's like, ooh, twist my arm on that one, right? But what they found was that luxury wasn't just the physical product, it was the respectful service around it, like when you came back to your hotel room, you walked in, people remembered your name or remembered that, oh we noticed that you used a lot of bath towels so we made sure there were extra in your room, that sort of thing, and if you look at the Lexus, and people were dismissive of the Lexus, saying, "It looks like a slightly fancier Toyota, "but what's the big deal, it's not a Beamer or Mercedes." 
But that wasn't the point, it was the experience you got when you went for sales and service, which was, you got treated so nicely, and again, not hoity-toity, but you got treated respectfully, so anyway, it all comes back to this empathic design around what problem the customer, or someone inside a plant, is trying to solve. >> So Toyota and Volkswagen are vying for top market share, but Toyota, as you say, has got this brand and this empathy that Volkswagen doesn't. You must get a lot of questions about Tesla. Thoughts on Tesla. >> Yeah, cool product, cool technology and time will tell if they're actually solving a real problem. And I don't mean to be dismissive, it's just not an area where I've spent a lot of time. >> And we don't really know, I mean, it's amazing, and a software-defined automobile and autonomous, very difficult to predict, we're very tight on time. >> All the cool people seem to drive them though. >> Yeah, that's true. Last question I have is, what the heck does this have to do with analytics at a conference like this? >> Right, so you start thinking about the Toyota model, really, it's not that you can sit down and design something right, it's that you design things knowing, deep-rooted in your DNA, that what you've designed is wrong, and that in order to get it right, and actually much righter than anything else in the marketplace, what you need to do is understand what's wrong about it, and so the experience of the user will help inform what's wrong, the workarounds they do, the inconveniences they experience, the coping, the compensation they do, and you can not only use that to help inform what's wrong, but then help shape your understanding of how to get to right, and so where all this fits in is that when you start thinking about data, well first of all, these are gigantic systems, right, and it's probably well-informed to think of these systems as being designed by flawed human beings, so the systems themselves have flaws, so it's good to be attentive to the flaws that are designed into them so you can fix them and make them more usable by your intended clientele. But the other thing is that these systems can help you gain much greater precision, granularity, frequency of sampling and understanding of where things are misfiring, sooner than later, smaller than larger, so you can adjust and adapt and be more agile in shaping the experience. >> Well Steve, great work, thanks very much for coming on The Cube and sharing, and great to meet you. >> Yeah likewise, thanks for having me. >> You're welcome. Alright, keep it right there, everybody, Paul and I will be back with our next guest, we're live from Boston, this is The Cube, we'll be right back. (upbeat music)

Published Date : Aug 30 2016



Rich Gaston, Micro Focus | Virtual Vertica BDC 2020


 

(upbeat music) >> Announcer: It's theCUBE covering the virtual Vertica Big Data Conference 2020 brought to you by Vertica. >> Welcome back to the Vertica Virtual Big Data Conference, BDC 2020. You know, it was supposed to be a physical event in Boston at the Encore. Vertica pivoted to a digital event, and we're pleased that The Cube could participate because we've participated in every BDC since the inception. Rich Gaston is here, he's the global solutions architect for security, risk and governance at Micro Focus. Rich, thanks for coming on, good to see you. >> Hey, thank you very much for having me. >> So you got a chewy title, man. You got a lot of stuff, a lot of hairy things in there. But maybe you can talk about your role as an architect in those spaces. >> Sure, absolutely. We handle a lot of different requests from the global 2000 type of organization that will try to move various business processes, various application systems, databases, into new realms. Whether they're looking at opening up new business opportunities, whether they're looking at sharing data with partners securely, they might be migrating it to cloud applications, and doing migration into a Hybrid IT architecture. So we will take those large organizations and their existing installed base of technical platforms and data, users, and try to chart a course to the future, using Micro Focus technologies, but also partnering with other third parties out there in the ecosystem. So we have large, solid relationships with the big cloud vendors, and also with a lot of the big database vendors. Vertica's our in-house solution for big data and analytics, and we are one of the first integrated data security solutions with Vertica. We've had great success out in the customer base with Vertica as organizations have tried to add another layer of security around their data. So what we will try to emphasize is an enterprise-wide data security approach, where you're taking a look at data as it flows throughout the enterprise from its inception, where it's created, where it's ingested, all the way through the utilization of that data. And then to the other uses where we might be doing shared analytics with third parties. How do we do that in a secure way that maintains regulatory compliance, and that also keeps our company safe against data breach. >> A lot has changed since the early days of big data, certainly since the inception of Vertica. You know, it used to be big data, everyone was rushing to figure it out. You had a lot of skunkworks going on, and it was just like, figure out data. And then as organizations began to figure it out, they realized, wow, who's governing this stuff? A lot of shadow IT was going on, and then the CIO was called to sort of reign that back in. As well, you know, with all kinds of whatever, fake news, the hacking of elections, and so forth, the sense of heightened security has gone up dramatically. So I wonder if you can talk about the changes that have occurred in the last several years, and how you guys are responding. >> You know, it's a great question, and it's been an amazing journey because I was walking down the street here in my hometown of San Francisco at Christmastime years ago and I got a call from my bank, and they said, we want to inform you your card has been breached by Target, a hack at Target Corporation, and they got your card, and they also got your PIN. And so you're going to need to get a new card, we're going to cancel this. Do you need some cash? 
I said, yeah, it's Christmastime so I need to do some shopping. And so they worked with me to make sure that I could get that cash, and then get the new card and the new PIN. And being a professional on the inside of the industry, I really questioned, how did they get the PIN? Tell me more about this. And they said, well, we don't know the details, but you know, I'm sure you'll find out. And in fact, we did find out a lot about that breach and what it did to Target. The impact: a $250 million immediate impact, CIO gone, CEO gone. This was a big one in the industry, and it really woke a lot of people up to the different types of threats on the data that we're facing with our largest organizations. Not just financial data; medical data, personal data of all kinds. Flash forward to the Cambridge Analytica scandal that occurred, where Facebook is handing off data, they're making a partnership agreement with someone they think they can trust, and then that is misused. And who's going to end up paying the cost of that? Well, it's going to be Facebook, to the tune of about five billion on that, plus some other fines that'll come along, and other costs that they're facing. So what we've seen over the course of the past several years has been an evolution from data breach making the headlines, to how do my customers come to us and say, help us neutralize the threat of this breach. Help us mitigate this risk, and manage this risk. What do we need to be doing, what are the best practices in the industry? Clearly what we're doing on the perimeter security, the application security and the platform security is not enough. We continue to have breaches, and we are the experts at that answer. The follow-on fascinating piece has been the regulators jumping in now. First in Europe, but now we see California enacting a law just this year. It came in very stringent, and has a lot of deep protections that are really far-reaching around personal data of consumers. Look at jurisdictions like Australia, where fiduciary responsibility now goes to the Board of Directors. That's getting attention. For a regulated entity in Australia, if you're on the Board of Directors, you better have a plan for data security. And if there is a breach, you need to follow protocols, or you personally will be liable. And that is a sea change that we're seeing out in the industry. So we're getting a lot of attention on both, how do we neutralize the risk of breach, but also how can we use software tools to maintain and support our regulatory compliance efforts as we work with, say, the largest money center bank out of New York. I've watched their audit year after year, and it's gotten more and more stringent, more and more specific, tell me more about this aspect of data security, tell me more about encryption, tell me more about key management. The auditors are getting better. And we're supporting our customers in that journey to provide better security for the data, to provide a better operational environment for them to be able to roll new services out with confidence that they're not going to get breached. With that confidence, they're not going to have a regulatory compliance fine or a nightmare in the press. And these are the major drivers that help us with Vertica sell together into large organizations to say, let's add some defense in depth to your data. And that's really a key concept in the security field, this concept of defense in depth. 
We apply that to the data itself by changing the actual data element of Rich Gaston, I will change that name into ciphertext, and that then yields a whole bunch of benefits throughout the organization as we deal with the lifecycle of that data. >> Okay, so a couple things I want to mention there. So first of all, totally board-level topic, every board of directors should really have cyber and security as part of its agenda, and it does for the reasons that you mentioned. The other is, GDPR got it all started. I guess it was May 2018 that the penalties went into effect, and that just created a whole domino effect. You mentioned California enacting its own laws, which, you know, in some cases are even more stringent. And you're seeing this all over the world. So I think one of the questions I have is, how do you approach all this variability? It seems to me, you can't just take a narrow approach. You have to have an end-to-end perspective on governance and risk and security, and the like. So are you able to do that? And if so, how so? >> Absolutely, I think one of the key areas in big data in particular has been the concern that we have a schema, we have database tables, we have columns, and we have data, but we're not exactly sure what's in there. We have application developers that have been given sandbox space in our clusters, and what are they putting in there? So can we discover that data? We have those tools within Micro Focus to discover sensitive data within your data stores, but we can also protect that data, and then we'll track it. And what we really find is that when you protect, let's say, five billion rows of a customer database, we can now know what is being done with that data on a very fine-grained and granular basis, to say that this business process has a justified need to see the data in the clear, we're going to give them that authorization, they can decrypt the data. SecureData, my product, knows about that and tracks that, and can report on that and say at this date and time, Rich Gaston did the following thing to be able to pull data in the clear. And that could then be used to support the regulatory compliance responses, and then audit to say, who really has access to this, and what really is that data? Then in GDPR, we're getting down into much more fine-grained decisions around who can get access to the data, and who cannot. And organizations are scrambling. One of the funny conversations that I had a couple years ago as GDPR came into place was, it seemed a couple of customers were taking this sort of brute force approach of, we're going to move our analytics and all of our data to Europe, to European data centers, because we believe that if we do this in the U.S., we're going to violate their law. But if we do it all in Europe, we'll be okay. And that simply was a short-term way of thinking about it. You really can't be moving your data around the globe to try to satisfy a particular jurisdiction. You have to apply the controls and the policies and put the software layers in place to make sure that anywhere that someone wants to get that data, we have the ability to look at that transaction and say it is or is not authorized, and that we have a rock-solid way of approaching that for audit and for compliance and risk management. And once you do that, then you really open up the organization to go back and use those tools the way they were meant to be used. 
We can use Vertica for AI, we can use Vertica for machine learning, and for all kinds of really cool use cases that are being done with IoT, with other kinds of cases that we're seeing that require data being managed at scale, but with security. And that's the challenge, I think, in the current era, is how do we do this in an elegant way? How do we do it in a way that's future-proof when CCPA comes in? How can I lay this on as another layer of audit responsibility and control around my data so that I can satisfy those regulators as well as the folks over in Europe and Singapore and China and Turkey and Australia. It goes on and on. Each jurisdiction out there is now requiring audit. And like I mentioned, the audits are getting tougher. And if you read the news, the GDPR example I think is classic. They told us in 2016, it's coming. They told us in 2018, it's here. They're telling us in 2020, we're serious about this, and here's the fines, and you better be aware that we're coming to audit you. And when we audit you, we're going to be asking some tough questions. If you can't answer those in a timely manner, then you're going to be facing some serious consequences, and I think that's what's getting attention. >> Yeah, so the whole big data thing started with Hadoop, and Hadoop is open, it's distributed, and it just created a real governance challenge. I want to talk about your solutions in this space. Can you tell us more about Micro Focus Voltage? I want to understand what it is, and then get into sort of how it works, and then I really want to understand how it's applied to Vertica. >> Yeah, absolutely, that's a great question. First of all, we were the originators of format-preserving encryption, we developed some of the core basic research out of Stanford University that then became the company Voltage; that built the brand name that we still apply even though we're part of Micro Focus. So the lineage still goes back to Dr. Benet down at Stanford, one of my buddies there, and he's still at it doing amazing work in cryptography and keeping moving the industry forward, and the science of cryptography forward. It's a very deep science, and we all want to have it peer-reviewed, we all want to be attacked, we all want it to be proved secure, that we're not selling something to a major money center bank that is potentially risky because it's obscure and we're private. So we have an open standard. For six years, we worked with the Department of Commerce to get our standard approved by NIST, the National Institute of Standards and Technology. They initially said, well, AES256 is going to be fine. And we said, well, it's fine for certain use cases, but for your database, you don't want to change your schema, you don't want to have this increase in storage costs. What we want is format-preserving encryption. And what that does is turns my name, Rich, into a four-letter ciphertext. It can be reversed. The mathematics of that are fascinating, and really deep and amazing. But we really make that very simple for the end customer because we produce APIs. So these application programming interfaces can be accessed by applications in C or Java, C#, other languages. But they can also be accessed in microservice manner via REST and web service APIs. And that's the core of our technical platform.
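To make that call pattern concrete, here is a minimal sketch of what format-preserving protection can look like from Vertica SQL once the Voltage SecureData UDx functions are installed. The function names follow the SecureData-for-Vertica integration Gaston describes, but the format identifiers, columns, and tables are hypothetical, and the exact signatures may differ by release:

```sql
-- Illustrative sketch only: format-preserving encryption from SQL.
-- Assumes the Voltage SecureData UDx is installed and configured;
-- format names ('name', 'ssn') and table/column names are hypothetical.

-- Protect: each value comes back as ciphertext in the same shape,
-- so schemas, column widths, and joins don't have to change.
SELECT VoltageSecureProtect(first_name USING PARAMETERS format='name') AS first_name_prot,
       VoltageSecureProtect(ssn        USING PARAMETERS format='ssn')  AS ssn_prot
FROM customers;

-- Access: an authorized caller reverses the protection the same way.
SELECT VoltageSecureAccess(ssn_prot USING PARAMETERS format='ssn') AS ssn_clear
FROM customers_protected;
```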
We have an appliance-based approach, so we take a SecureData appliance, we'll put it on prem, we'll make 50 of them if you're a big company like Verizon and you need to have these co-located around the globe, no problem; we can scale to the largest enterprise needs. But our typical customer will install several appliances and get going with a couple of environments like QA and Prod to be able to start getting encryption going inside their organization. Once the appliances are set up and installed, it takes just a couple of days of work for a typical technical staff to get done. Then you're up and running to be able to plug in the clients. Now what are the clients? Vertica's a huge one. Vertica's one of our most powerful client endpoints because you're able to now take that API, put it inside Vertica, it's all open on the internet. We can go and look at Vertica.com/secure data. You get all of our documentation on it. You understand how to use it very quickly. The APIs are super simple; they require three parameter inputs. It's a really basic approach to being able to protect and access data. And then it gets very deep from there because you have data like credit card numbers. Very different from a street address, and we want to take a different approach to that. We have data like birthdate, and we want to be able to do analytics on dates. We have deep approaches on managing analytics on protected data like dates without having to put it in the clear. So we've maintained a lead in the industry in terms of being an innovator of the FF1 standard; what we call FF1 is format-preserving encryption. We license that to others in the industry, per our NIST agreement. So we're the owner, we're the operator of it, and others use our technology. And we're the original founders of that, and so we continue to sort of lead the industry by adding additional capabilities on top of FF1 that really differentiate us from our competitors. Then you look at our API presence. We can definitely run in Hadoop, but we also run on open systems. We run on mainframe, we run on mobile. So anywhere in the enterprise or in the cloud, anywhere you want to be able to put SecureData and be able to access the protected data, we're going to be there and be able to support you there. >> Okay so, let's say I've talked to a lot of customers this week, and let's say I'm running in Eon mode. And I got some workload running in AWS, I've got some on prem. I'm going to take an appliance or multiple appliances, I'm going to put it on prem, but that will also secure my cloud workloads as part of a sort of shared responsibility model, for example? Or how does that work? >> No, that's absolutely correct. We're really flexible in that we can run on prem or in the cloud as far as our crypto engine; the key management is really hard stuff. Cryptography is really hard stuff, and we take care of all that, so we've baked that all in, and we can run that for you as a service either in the cloud or on prem on your small VMs. So it's a really lightweight footprint for running the infrastructure. When I look at the organization like you just described, it's a classic example of where we fit, because we will be able to protect that data. Let's say you're ingesting it from a third party, or from an operational system, you have a website that collects customer data. Someone has now registered as a new customer, and they're going to do e-commerce with you. We'll take that data, and we'll protect it right at the point of capture. 
And we can now flow that through the organization and decrypt it at will on any platform that you have that you need us to be able to operate on. So let's say you wanted to take that customer data from the operational transaction system, let's throw it into Eon, let's throw it into the cloud, let's do analytics there on that data, and we may need some decryption. We can place SecureData wherever you want to be able to service that use case. In most cases, what you're doing is a simple, tiny little atomic fetch across a protected tunnel, your typical TLS tunnel. And once that key is then cached within our client, we maintain all that technology for you. You don't have to know about key management or caching. We're good at that; that's our job. And then you'll be able to make those API calls to access or protect the data, and apply the authorization and authentication controls that you need to be able to service your security requirements. So you might have third parties having access to your Vertica clusters. That is a special need, and we can have that ability to say employees can get X, and the third party can get Y, and that's a really interesting use case we're seeing for shared analytics on the internet now. >> Yeah for sure, so you can set the policy how you want. You know, I have to ask you, in a perfect world, I would encrypt everything. But part of the reason why people don't is because of performance concerns. Can you talk about, and you touched upon it I think recently with your sort of atomic access, but can you talk about, and I know it's Vertica, it's a Ferrari, etc, but anything that slows it down, I'm going to be concerned about. Are customers concerned about that? What are the performance implications of running encryption on Vertica? >> Great question there as well, and what we see is that we want to be able to apply scale where it's needed. And so if you look at ingest platforms that we find, Vertica is commonly connected up to something like Kafka. Maybe StreamSets, maybe NiFi; there are a variety of different technologies that can route that data, pipe that data into Vertica at scale. SecureData is architected to go along with that architecture at the node or at the executor or at the lowest-level operator level. And what I mean by that is that we don't have a bottleneck where everything has to go through one process or one box or one channel to be able to operate. We don't put an interceptor in between your data coming and going. That's not our approach, because those approaches are fragile and they're slow. So we typically want to focus on integrating our APIs natively within those pipeline processes that come into Vertica. Within the Vertica ingestion process itself, you can simply apply our protection when you do the copy command in Vertica. So really basic simple use case that everybody is typically familiar with in Vertica land; be able to copy the data and put it into Vertica, and you simply say protect as part of the copy. So my first name is coming in as part of this ingestion. I'll simply put the protect keyword in the syntax right in SQL; it's nothing other than just an extension of SQL. Very, very simple for the developer, easy to read, easy to write. And then you're going to provide the parameters that you need to say, oh, the name is protected with this kind of a format. To differentiate between a credit card number and an alphanumeric stream, for example. So once you do that, you then have the ability to decrypt.
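As a rough illustration of that copy-time integration, the sketch below protects columns inline during a Vertica bulk load; the table, S3 path, filler columns, and format names are made up, and the exact invocation in a given SecureData release may differ:

```sql
-- Illustrative sketch only: applying protection during Vertica's COPY.
-- The table, S3 path, and format identifiers are hypothetical; the
-- FILLER/AS pattern is standard Vertica COPY syntax for computed columns.
COPY customers (
    first_name_raw  FILLER VARCHAR(64),
    first_name      AS VoltageSecureProtect(first_name_raw USING PARAMETERS format='name'),
    card_number_raw FILLER VARCHAR(19),
    card_number     AS VoltageSecureProtect(card_number_raw USING PARAMETERS format='cc'),
    signup_date
)
FROM 's3://ingest-bucket/customers/*.csv'
DELIMITER ',';
```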
Now, on decrypt, let's look at a couple different use cases. First, within Vertica, we might be doing select statements within Vertica, we might be doing all kinds of jobs within Vertica that just operate at the SQL layer. Again, just insert the word "access" into the Vertica select string and provide us with the data that you want to access; that's our word for decryption, that's our lingo. And we will then, at the Vertica level, harness the power of its CPU, its RAM, its horsepower at the node to be able to execute that operator, the decryption request, if you will. So that gives us the speed and the ability to scale out. So if you start with two nodes of Vertica, we're going to operate at X hundreds of thousands of transactions a second, depending on what you're doing. Long strings are a little bit more intensive in terms of performance, but short strings like social security numbers are our sweet spot. So we operate at very, very high speed on that, and you won't notice the overhead with Vertica, per se, at the node level. When you scale Vertica up and you have 50 nodes, and you have large clusters of Vertica resources, then we scale with you. And we're not a bottleneck at any particular point. Everybody's operating independently, but they're all copies of each other, all doing the same operation. Fetch a key, do the work, go to sleep. >> Yeah, you know, I think this is, a lot of the customers have said to us this week that one of the reasons why they like Vertica is it's very mature, it's been around, it's got a lot of functionality, and of course, you know, look, security, I understand, is kind of table stakes, but it can also be a differentiator. You know, big enterprises that you sell to, they're asking for security assessments, SOC 2 reports, penetration testing, and I think I'm hearing, with the partnership here, you're sort of passing those with flying colors. Are you able to make security a differentiator, or is it just sort of everybody's kind of got to have good security? What are your thoughts on that? >> Well, there's good security, and then there's great security. And what I found with one of my money center bank customers, who was based here in San Francisco, was the concern around insider access, when they had a large data store. And the concern that a DBA, a database administrator who has privilege to everything, could potentially exfil data out of the organization, and in one fell swoop, create havoc for them because of the amount of data that was present in that data store, and the sensitivity of that data in the data store. So when you put Voltage encryption on top of Vertica, what you're doing now is that you're putting a layer in place that would prevent that kind of a breach. So you're looking at insider threats, you're looking at external threats, you're looking at also being able to pass your audit with flying colors. The audits are getting tougher. And when they say, tell me about your encryption, tell me about your authentication scheme, show me the access control list that says that this person can or cannot get access to something. They're asking tougher questions. That's where SecureData can come in and give you that quick answer of: it's encrypted at rest. It's encrypted and protected while it's in use, and we can show you exactly who's had access to that data because it's tracked via a different layer, a different appliance. And I would even draw the analogy, many of our customers use a device called a hardware security module, an HSM. 
Now, these are fairly expensive devices that were invented for military applications and adopted by banks. And now they're really spreading out, and people say, do I need an HSM? Well, with SecureData, we certainly protect your crypto very, very well. We have very, very solid engineering. I'll stand on that any day of the week, but your auditor is going to want to ask a checkbox question. Do you have an HSM? Yes or no. Because the auditor understands, it's another layer of protection. And it provides me another tamper-evident layer of protection around your key management and your crypto. And we, as professionals in the industry, nod and say, that is worth it. That's an expensive option that you're going to add on, but your auditor's going to want it. If you're in financial services, you're dealing with PCI data, you're going to enjoy the checkbox that says, yes, I have HSMs, and not get into some arcane conversation around, well no, but it's good enough. That's kind of the argument and conversation we get into when folks want to say, Vertica has great security, Vertica's fantastic on security. Why would I want SecureData as well? It's another layer of protection, and it's defense in depth for your data. When you believe in that, when you take security really seriously, and you're really paranoid, like a person like myself, then you're going to invest in those kinds of solutions that get you best-in-class results. >> So I'm hearing a data-centric approach to security. Security experts will tell you, you've got to layer it. I often say, we live in a new world. You used to just build a moat around the queen, but the queen, she's leaving her castle in this world of distributed data. Rich, incredibly knowledgeable guest, and we really appreciate you being on the front lines and sharing with us your knowledge about this important topic. So thanks for coming on theCUBE. >> Hey, thank you very much. >> You're welcome, and thanks for watching everybody. This is Dave Vellante for theCUBE, we're covering wall-to-wall coverage of the Virtual Vertica BDC, Big Data Conference. Remotely, digitally, thanks for watching. Keep it right there. We'll be right back right after this short break. (intense music)

Published Date : Mar 31 2020



Ron Cormier, The Trade Desk | Virtual Vertica BDC 2020


 

>> David: It's theCUBE covering the virtual Vertica Big Data Conference 2020, brought to you by Vertica. Hello everybody, welcome to this special digital presentation of theCUBE. We're tracking the virtual Vertica Big Data Conference; this is theCUBE's, I think, fifth year doing the BDC. We've been to every big data conference that they've held, and we're really excited to be helping with the digital component here in these interesting times. Ron Cormier is here, principal database engineer at the Trade Desk. Ron, great to see you. Thanks for coming on. >> Hi, David, my pleasure, good to see you as well. >> So we're talking a little bit about your background; you're basically a Vertica and database guru, but tell us about your role at Trade Desk and then I want to get into a little bit about what Trade Desk does. >> Sure, so I'm a principal database engineer at the Trade Desk. The Trade Desk was one of my customers when I was working at HP, as a member of the Vertica team, and I joined the Trade Desk in early 2016. And since then, I've been working on building out their Vertica capabilities and expanding the data warehouse footprint in an ever-growing database technology, data volume environment. >> And the Trade Desk is an ad tech firm, and you are specializing in real-time ad serving and pricing. And I guess real time, you know, people talk about real time a lot; we define real time as before you lose the customer. Maybe you can talk a little bit about, you know, the Trade Desk and the business, and maybe how you define real time. >> Totally, so to give everybody kind of a frame of reference, anytime you pull up your phone or your laptop and you go to a website or you use some app and you see an ad, what's happening behind the scenes is an auction is taking place. And people are bidding on the privilege to show you an ad. And across the open Internet, this happens seven to 13 million times per second. And so the ads, the whole auction dynamic and the display of the ad needs to happen really fast. So that's about as real time as it gets outside of high-frequency trading, as far as I'm aware. So the Trade Desk participates in those auctions, we bid on behalf of our customers, which are ad agencies, and the agencies represent brands, so the agencies are the Mad Men companies of the world, and they have brands under their guidance, and so they give us budget to spend, to place the ads and to display them. So we bid on the hundreds of thousands of auctions per second, and once we make those bids, anytime we do make a bid, some data flows into our data platform, which is powered by Vertica. And so we're getting hundreds of thousands of events per second. We have other events that flow into Vertica as well. And we clean them up, we aggregate them, and then we run reports on the data. And we run about 40,000 reports per day on behalf of our customers. The reports aren't as real time as I was talking about earlier; they're more batch-oriented. Our customers like to see big chunks of time, like a whole day or a whole week or a whole month on a single report. So we wait for that time period to complete and then we run the reports on the results. >> So you have one of the largest commercial infrastructures in the Big Data sphere. Paint a picture for us. I understand you got a couple of, like, 320-node clusters, we're talking about petabytes of data. But describe what your environment looks like. >> Sure, so like I said, we've been very good customers for a while. 
And we started out with a bunch of enterprise clusters. So Enterprise Mode is the traditional Vertica deployment, where the compute and the storage is tightly coupled, all RAID arrays on the servers. And we had four of those, and we were doing okay, but our volumes are ever-increasing, we wanted to store more data, and we wanted to run more reports in a shorter period of time, so we wanted to keep pushing. And so we had these four clusters and then we started talking with Vertica about Eon mode, and that's Vertica's separation of compute and storage, where the compute and the storage can be scaled independently; we can add storage without adding compute, or vice versa, or we can add both. So that was something that we were very interested in for a couple reasons. One, our enterprise clusters, we were running out of disk, and adding disk is expensive. In Enterprise Mode, it's kind of a pain, you've got to add compute at the same time, so you kind of end up in an unbalanced place. So in Eon mode that problem gets a lot better. We can add disk, infinite disk, because it's backed by S3. And we can add compute really easily; to scale the number of things that we run in parallel, the concurrency, just add a sub cluster. So they are in two regions, US East and US West of Amazon, so reasonably diverse. And the real benefit is that we can stop nodes when we don't need them. Our workload is fairly lumpy, I call it. Like, after the day completes, we do the ingest, we do the aggregation; we're ingesting and aggregating all day, but the final hour needs to be completed. And then once that's done, then the number of reports that we need to run spikes up, it goes really high. And we run those reports, we spin up a bunch of extra compute on the fly, run those reports and then spin them down. And we don't have to pay for that for the rest of the day. So Eon has been a nice boon for us for both those reasons. >> I'd love to explore Eon a little bit more. I mean, it's relatively new, I think 2018 Vertica announced Eon mode, so it's only been out there a couple years. So I'm curious, for the folks that haven't moved to Eon mode, which presumably they want to for the same reasons that you mentioned, why buy the storage in chunks if you don't have to, what were some of the challenges that you faced in going to Eon mode? What kind of things did you have to prepare for? Were there any out-of-scope expectations? Can you share that experience with us? >> Sure, so we were an early adopter. We participated in the beta program. I mean, I think it's fair to say we actually drove the requirements in a lot of ways because we approached Vertica early on. So the challenges were what you'd expect any early adopter to be going through: the sort of getting things working as expected. I mean, there's a number of cases, which I could touch upon, like, we found an inefficiency in the way that it accesses the data on S3; it was accessing the data too frequently, which ended up being just expensive. So our S3 bill went up pretty significantly for a couple of months. So that was a challenge, but we worked through that. Another area where Vertica recently made huge strides was the ability to stop and start nodes, to have them start very quickly, and when they start, to not interfere with any running queries. So when we want to spin up a bunch of compute, there was a point in time when it would break certain queries that were already running. 
So that was a challenge. But again, the Vertica team has been quite responsive to solving these issues, and now that's behind us. In terms of those who need to get started, or are looking to get started, there's a number of things to think about. Off the top of my head, there's sort of new configuration items that you'll want to think about, like the instance type. So certainly Amazon has a variety of instances, and it's important to consider one of Vertica's architectural advantages in this area: Vertica has this caching layer on the instances themselves. And what that does is, if we can keep the data in cache, what we've found is that the performance is basically the same as the performance of Enterprise Mode. So having a good-sized cache when needed is worth thinking about. So we went with the i3 instance types, which have a lot of local NVMe storage, so we can cache data and get good performance. That's one thing to think about. The number of nodes, the instance type, certainly the number of shards, is a sort of technical item that needs to be considered. It's how the data gets distributed. It's sort of a layer on top of the segmentation that some Vertica engineers will be familiar with. And probably one of the big things that one needs to consider is how to get data into the database. So if you have an existing database, there's no sort of nice tool yet to suck all the data into an Eon database. And so I think they're working on that, but we weren't at that point when we got there. We had to export all our data out of the enterprise cluster as CSV, dump it out to S3, and then have the Eon cluster suck that data in. >> So awesome advice. Thank you for sharing that with the community. But at the end of the day, it sounds like you had some learning to do, some tweaking to do, and obviously had to figure out how to get the data in. At the end of the day, was it worth it? What was the business impact? >> Yeah, it definitely was worth it for us. I mean, right now we have four times the data in our Eon cluster that we have in our enterprise clusters. We still run some enterprise clusters; we started with four at the peak, now we're down to two. So we have the two Eon clusters. So it's been, I think our business would say it's been a huge win; we're doing things that we really never could have done before. Like, accessing the data on enterprise would have been really difficult. It would have required non-trivial engineering to do things like daisy-chaining clusters together, and then how to aggregate data across clusters, which would be, again, non-trivial. So we have all the data we want, we can continue to grow data, we're running reports on seasonality. So our customers can compare their campaigns last year versus this year, which is something we just haven't been able to do in the past. We've expanded that. So we grew the data vertically, we've expanded the data horizontally as well. So we're adding columns to our aggregates. We're enriching the data much more than we have in the past. So while we still have enterprise kicking around, I'd say our Eon clusters are doing the majority of the heavy lifting. >> And the cloud was part of the enablement here, particularly with scale, is that right? And are you running certain... >> Definitely. >> And you are running on prem as well, or are you in a hybrid mode? Or is it all AWS? >> Great question, so yeah. When I've been speaking about enterprise, I've been referring to on prem. 
So we have physical machines in data centers. So yeah, we are running hybrid now, and it's really hard to get an apples-to-apples direct comparison of enterprise on prem versus Eon in the cloud. One thing that I touched upon in my presentation: if I try to get apples to apples, and I think about how I would run the entire workload on enterprise or on Eon, if I had to run the entire thing on both, I tried to think about how many CPU cores we would need to do that. And basically, it would be about the same number of cores, I think, for enterprise on prem versus Eon in the cloud. However, the Eon nodes only need to be running about six hours out of the day. So the other 18 hours I can shut them down and not be paying for them, mostly. >> Interesting, okay, and so, I got to ask you, I mean, notwithstanding the fact that you've got a lot invested in Vertica, and you've got a lot of experience there, there are a lot of, you know, emerging cloud databases. And you know a lot about databases, not just Vertica; you're a database guru in many areas, you know, traditional RDBMS, as well as MPP and new cloud databases. What is it about Vertica that works for you in this specific sweet spot that you've chosen? What's really the difference there? >> Yeah, so I think the key difference is the maturity. I'm familiar with a number of other database platforms, in the cloud and otherwise, column stores specifically, that don't have the maturity that we're used to and that we need at our scale. So being able to specify alternate projections, so different sort orders on my data, is huge. And there's other platforms where we don't have that capability. And so Vertica is, of course, the original column store, and they've had time to build up a lead in terms of their maturity and features, and I think the other column stores, cloud or otherwise, are playing a little bit of catch-up in that regard. Of course, Vertica is playing catch-up on the cloud side. But if I had to pick whether I wanted to write a column store from scratch, or a distributed file system, like a cloud file system, from scratch, I'd probably think it would be easier to write the cloud file system. The column store is where the real smarts are. >> Interesting, let's talk a little bit about some of the challenges you have in reporting. You have a very dynamic nature of reporting; like I said, your clients want a time series, they just don't want a snapshot of a slice. But at the same time, your reporting is probably pretty lumpy, a very dynamic, you know, demand curve. So first of all, is that accurate? Can you describe that sort of dynamism and how you're handling it? >> Yep, that's exactly right. It is lumpy. And that's the exact word that I use. So like, at the end of the UTC day, when UTC midnight rolls around, that's when we do the final ingest, the final aggregate, and then the queue of reports that need to run spikes. So the majority of those 40,000 reports that we run per day are run in the four to six hours after that. It spikes up, and so that's when we need to have all the compute come online. And that's what helps us answer all those queries as fast as possible. And that's a big reason why Eon is an advantage for us, because the rest of the day we kind of don't necessarily need all that compute, and we can shut it down and not pay for it. 
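To put rough, illustrative numbers on that comparison (a back-of-the-envelope reading, not figures Cormier states directly): if both deployments need roughly the same N cores, an always-on enterprise cluster consumes N x 24 = 24N core-hours per day, while an Eon cluster that only runs its full compute for the ~6-hour report window consumes about N x 6 = 6N core-hours, roughly a quarter of the compute spend, before any storage or pricing-model differences.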
>> So Ron, I wonder if you could share with us, just as the wrap here, where you want to take this. You're obviously very close to Vertica. Are you driving them hard on Eon mode? You mentioned before that a tool to load data into Eon mode would have been nice for you, though I guess you're over that hump now. But what are the kinds of things, if Colin Mahony is here in the room, that you're telling him you want the engineering team at Vertica to work on that would make your life better? >> I think the things that need the most attention near term are about smoothing out some of the edges, making the cloud aspects a little more seamless. Our goal is to be able to start instances and have them join the cluster in less than five minutes. We're not quite there yet, and if you look at some of the other cloud database platforms, they're beating that handily, so I know the team is working on it. Some of the other things are around control. Like I mentioned, while we like the control we get in the column store, we also want control on the cloud side of things, in terms of being able to dedicate clusters: we can pin workloads against a specific sub-cluster and take advantage of the cache that's over there. The sub-cluster is a relatively new concept for Vertica, so we want to be able to control many things at the sub-cluster level: resource pools, configuration parameters, and so on. >> Yeah, I personally have always been impressed with Vertica and their ability to ride the wave and adopt new trends. They have a robust stack that's been around 10-plus years, they embraced Hadoop, they're embracing machine learning, and we've been talking about the cloud. So I actually have a lot of confidence in them, especially when you compare them to the other mid-last-decade MPP column stores that came out; Vertica is one of the few remaining, certainly as an independent brand. I think that speaks to the team there and the engineering culture. But you get the final word: final thoughts on your role, the company, Vertica, wherever you want to take it. >> Yeah, we're really appreciative and we value the partnership we have, so I think it's been a win-win. Our volumes are... I know that we have some data that got pulled into their test suite. So I think it's been a win-win for both sides, and it'll be a win for other Vertica customers and prospects, knowing that they're working with some of the highest volume, velocity, and variety data that (mumbles). >> Well, Ron, thanks for coming on. I wish we could have met face to face at the Encore in Boston; I think next year we'll be able to do that. But I appreciate that technology allows us to have these remote conversations. Stay safe, all the best to you and your family, and thanks again. >> My pleasure, David, good speaking with you. >> And thank you for watching, everybody. This is theCUBE's coverage of the Vertica Virtual Big Data Conference. I'm Dave Vellante, and we'll be right back after this short break. (soft music)
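For readers wondering what "shut them down and not pay for them" looks like mechanically, here is a sketch of the EC2 half of that schedule, using boto3 with hypothetical instance IDs. The Vertica side, having restarted nodes rejoin the cluster inside the five-minute goal Ron mentions, is a separate database operation that this does not cover.

```python
# Sketch: start the reporting nodes' EC2 instances ahead of the UTC-midnight
# report spike, and stop them once the queue drains. Instance IDs are
# hypothetical; rejoining the Vertica cluster is a separate step.
import boto3

REPORTING_NODE_IDS = ["i-0aaaa1111bbbb2222", "i-0cccc3333dddd4444"]

ec2 = boto3.client("ec2", region_name="us-east-1")

def spin_up() -> None:
    ec2.start_instances(InstanceIds=REPORTING_NODE_IDS)
    # Block until EC2 reports the instances running; the Vertica restart
    # and cluster join happen after this point.
    ec2.get_waiter("instance_running").wait(InstanceIds=REPORTING_NODE_IDS)

def spin_down() -> None:
    # Stop rather than terminate so the instances come back quickly.
    # Data on local NVMe instance-store volumes (the i3 cache discussed
    # earlier) does not survive a stop, so the cache re-warms on start.
    ec2.stop_instances(InstanceIds=REPORTING_NODE_IDS)
```

Triggering spin_up just before UTC midnight and spin_down four to six hours later, via cron or an EventBridge rule, matches the demand curve described in the interview.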

Published Date : Mar 31 2020


Josh Rogers, Syncsort | Big Data NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back, everyone, we're live here in New York City with theCUBE's coverage of our fifth annual event that we put on ourselves in conjunction with Strata Hadoop, now called Strata Data. It's theCUBE, and we're covering the scene here at Hadoop World going back to 2010, eight years of coverage. I'm John Furrier, co-host of theCUBE. Usually Dave Vellante is here, but he's down covering the Splunk conference, and who was there yesterday but none other than Josh Rogers, my next guest, the CEO of Syncsort. You were with Dave Vellante yesterday, live on theCUBE in Washington, DC for Splunk .conf, kind of a Big Data conference, but a proprietary, branded event for themselves. This is a more industry event here at Big Data NYC that we put on. Welcome back; glad you flew up on the Concorde, the private jet. >> Early morning, but it was fine. >> Good to see you, CEO of Syncsort. You guys have been busy. The folks watching in theCUBE community know that you've been on many times; for the folks that are learning more about theCUBE every day, you've had an interesting transformation as a company. Take a minute to talk about where you've come from and where you are today. Certainly a ton of corporate development activity on your end, and as you see the opportunities, you're moving on them. Take a minute to explain. >> So, you know, it's been a great journey so far, and there's a lot more work to do. Syncsort is one of the first software companies, right; founded in the late '60s, today it has an unparalleled franchise in the mainframe space. But over the last 10 years or so we branched out into open systems and delivered high-performance data integration solutions. About four years ago we really started to invest in the Big Data space; we had a DNA around performance and scale, and we felt that would be relevant there. We delivered a Hadoop-focused product, and today we focus that product on helping customers ingest mainframe data assets into their Hadoop clusters, along with other types of data, but with a specific focus there. That has led us into understanding a bigger market space that we call Big Iron to Big Data. And what we see in the marketplace is that customers are adapting. >> Just before you get in there, I love that term, Big Iron to Big Data; you know I love Big Iron. It used to be a term for the mainframe, for the younger generation out there. But you're really talking about how you've leveraged experience with the installed base activity at that scale, call it batch, single-threaded, whatever you want to call it. But as you got into the game of Big Data, you then saw other opportunities, did I get that right? You got into the game with some Hadoop, then you realized, whoa, I can do some large scale. What was that opportunity? >> The opportunity is that large enterprises are absolutely investing heavily in the next generation of analytic technologies, in a new stack. Hadoop is a part of that, Spark is a part of that. And they're rapidly adopting these new infrastructures to drive deeper analytics, to answer bigger questions, and to improve their business in multiple dimensions. The opportunity we saw was the ability for those enterprises to integrate this new kind of architecture with the legacy architectures.
So that was a challenge: the old architectures powering key applications were the key producers of data, and integrating them raised multiple technology challenges and cultural challenges. We had expertise on both sides of the house, and we found that to be unique in the marketplace. So we put a lot of effort into understanding and defining the challenges in that Big Iron to Big Data space that, when solved, help customers maximize the value of their investments in next-generation architectures. And we define the problem as having two components. One is that people are generating more and more data, from more and more touch points, and driving more and more transactions with their customers. That's generating increased load on the compute environments, and they want to figure out how to run that: if I have a mainframe, how do I run it as efficiently as possible, contain my costs, and maximize availability and uptime? At the same time, I've got all this new data that I can start to analyze, but I've got to get it from where it's produced into these next-generation systems, and there are a lot of challenges there. So we started to isolate the specific use cases that present customers challenges and deliver differentiated solutions for them. The overarching positioning is around solving the Big Iron to Big Data challenge. >> You guys have done some acquisitions and been successful. I want to talk a little bit about the ones you like right now, the ones that happened in the past year or two; I think you've done five in the past two years. A couple of key, notable ones set you up and give you pole position in some of these big markets, and then after that I want to talk about your ecosystem opportunity. So, some of the acquisitions: what's working for you? What have been the big deals? >> So the largest one we did in 2016 was a company called Trillium, a long-time leader in the data quality space. The opportunity we saw with Trillium was to complement our data movement and integration capabilities, a natural complement, but to focus very specifically on how to drive value in this next-generation architecture, particularly in things like Hadoop. What I'd like to be able to do is apply best-in-class data quality routines directly in that environment. From our experience delivering Big Data solutions, we knew that we could take a lot of that technology and create really powerful solutions that leverage the native capabilities of Hadoop but add a layer of proven technology for best-in-class data quality. Probably the biggest news of the last few weeks is that we were acquired by a new private equity partner, Centerbridge Partners. That acquisition actually included both Syncsort and a company called Vision Solutions, and we've combined those organizations. >> John: When did that happen? >> The deal was announced in early July and closed in the middle of August. And Vision Solutions is a really interesting company. They're the leader in high availability for the IBM i market; IBM i was originally called the AS/400, it's had a couple of different names, and it holds a dominant market position. What we liked about that business was, A, that market position: four thousand customers, generally large enterprises. And also a market-leading capability around real-time data replication. >> And we saw IBM...
>> Migration data, disaster recovery kind of thing? >> It's DR, it's high availability, it's migrations, and it's also change data capture, actually, all leveraging common technology elements. But it also represents a market-leading franchise in IBM i, which is in many ways very similar to the mainframe: optimized for transactional systems, and hard to get at. >> Sounds like you're reconstructing the mainframe in the cloud. >> It's not so much that; it's the recognition that those compute systems still run the world. They still run all the transactions. >> Well, some say the cloud is a software mainframe. >> I think over time you'll see that; we don't see it in our business today. There is a cloud aspect to our business, but it's not about moving the transactional applications running on those platforms into the cloud yet, although I suspect that happens at some point. Our interest was more that these are the systems producing the world's data, and it's hard to get at. These are big, big power sources for data, and they're not going anywhere. We've got the expertise to source that data into these next-generation systems, and that's a tricky problem for a lot of customers, and not something... >> That's a problem they have. And you guys have basically cornered the market on that. >> So think about Big Iron to Big Data as these two components: being able to source data and make it productive in these next-generation analytics systems, and also being able to run those existing systems as efficiently as possible. >> All right, so how do you talk to customers? I've asked this question before, so I'll just ask it again: oh, Syncsort, now you've got Vision, you guys are just a bunch of old mainframe guys. What do you know about cloud native? A lot of the hipsters and young guns out there might not know about some of the things you're doing on the cutting edge, because even though you have the power base of these old big systems, which are throwing off massive amounts of data and aren't going anywhere, you're still integrated with cutting-edge technology. Talk about that narrative. >> So, the folks that we target... >> I used cloud only as an example. Shiny, cool, new toys. >> The organizations we target, our customers and prospects, are generally large, complex, global enterprises. They are making significant investments in Hadoop and Splunk and these next-generation environments. We approach them and say: we believe that to get full value out of your investments in these next-generation technologies, it would be helpful to have your most critical data assets available. That's hard, and we can help you do it, in a number of ways you won't find anywhere else. That includes features in our products, and it includes experts on the ground. And what we're seeing is huge demand, because Hadoop, and you can see it in the Cloudera and Hortonworks results and the scale of their revenue, is a real foundational component of data management at this point. Enterprises are embracing it. If they can't solve that integration challenge between the systems that produce all the data and the systems where they want to analyze the data, there's a big value gap. And we think we're uniquely positioned to solve it: one, because we've got the technical expertise, and two, they're already our customers at this point; we have six thousand customers.
>> You guys have executed very well. I've got to say, you're just steadily taking territory, and you've got a great strategy: get into a business, don't overplay your hand or get over your skis, whatever you want to call it, figure it out and see if it's a fit. If it is, grab it; if not, move on. You also have relationships, so let's talk about your ecosystem. What is your ecosystem, and what is your partner strategy? >> I'll talk a little bit about the overall strategy and how partners fit into it. Our strategy is to identify specific use cases that are common and challenging in our customer set and fall within this Big Iron to Big Data umbrella, and then to deliver a solution that is highly differentiated. The third piece is to partner very closely with the emerging platform vendors in the Big Data space, like Cloudera, like Hortonworks, like Splunk, and the reason is that we're solving an integration challenge for them. We launched a relationship with Collibra in the middle of the year, and we just announced our relationship. >> Yeah, for them the benefit is they don't have to do the heavy lifting; you've got that covered. >> We can solve a lot of pain points they have in getting their platforms set up. >> That's hard to replicate on their end; it's not like they're going to go build it. >> Cloudera and Hortonworks don't have mainframe skills. They don't understand how to go access it. >> Classic partnering example. >> But the other piece is that we do real engineering work within these partnerships. We write code to integrate with and add value to these platforms. >> It's not a Barney deal, it's not an optical deal. >> Absolutely. >> Andy Jassy's deal in the VMware world is an example of the deals being done in the industry, and that seems to be back in vogue, thank God: people say they're going to do a deal and then back it up by actually following through. What about other partnerships? Wherever it fits in your business, are people coming to you, or are you going to them? >> We certainly have people coming to us. The key thing, the number one driver, is customers. As we understand use cases, as customers introduce us to new challenges they're facing, we look not just at how to solve them but also at which other platforms we'd be integrating with, and if we believe we can add unique value to a partner, we'll approach that partner. >> Let's talk customers. Give me some customer use cases you're working on right now that you think are notable, worth highlighting. >> Sure. We do a lot in the financial services space; we have a number of customers... >> Where there's mainframes. >> Where there's a lot of mainframes, but it's not just financial services. Here's an interesting one: an insurance company that was looking at how to transition its mainframe archive strategy. They have regulations around how long they have to keep data, and they had been using traditional mainframe archive technology that was very expensive on an annual basis and also inflexible; they didn't have easy access to the data. >> And performance too; at the end of the day, don't forget performance. >> They want performance, though this was more of an archive use case, and what they really wanted was the ability to both access the data and lower the cost of storing it for the retention period the regulations require.
And so they made the decision that they wanted to store it in the cloud, in S3. There's a complicated data movement there and a complicated data translation process; you need to understand the mainframe, and you need to understand AWS and S3 and all those components. We had all those pieces and all that expertise and were able to solve it, and we're doing that with a few different customers now. It's just one example, but there's a great ROI, there's a lot more business flexibility, and there's a modernization aspect to it that's very attractive. >> Well, great to hear from you today. I'm glad you made it up here; again, you were in DC yesterday, so thanks for coming in and checking out two shows. You're certainly pounding the pavement, as they say in New York, to quote a New Yorker phrase. What's new for you guys, what's coming out? More acquisitions happening? What's the outlook for Syncsort? >> So we're always active on the M&A front. We certainly have a pipeline of activity, and there are a lot of interesting spaces and adjacencies we're exploring right now. There's nothing I can really talk about there. >> Can you talk about the categories you're looking at? >> Sure: things around metadata management, things around real-time data movement, cloud opportunities. There are some interesting opportunities in the artificial intelligence and machine learning space. Those are all... >> Deep learning. >> Deep learning, those are all interesting spaces for us to think about. Security is another space that's interesting. So we're pretty active in a lot of adjacencies. >> Classic adjacent markets that you're looking at, so you take one step at a time, slowly. >> But then we try to innovate after the catch, so to speak; we made three announcements this week, including transaction tracing for Ironstream and a refresh of our data quality for Hadoop approach. We'll continue to innovate on the organic side as well. >> Final question: the whole private equity thing. That's done; they put a big bag of money in and brought the two companies together. Are there structural changes, management changes? You're the Syncsort CEO; is there a new company name? >> The combined companies will operate under the Syncsort name, and I'll serve as the CEO. >> Syncsort is the remaining name, and you now have another company under it. >> Yes, that's right. >> And they put in cash, probably a boatload of cash for corporate development. >> The announced deal value was a little over $1.2 billion. >> So you get a checkbook, and you're looking to buy companies? >> We are, we're going to continue. As I said to Dave yesterday, I like to believe that we've proved the hypothesis and we're in about the second inning. Can't wait to keep playing the game. >> It's interesting; real quick while I've got you here, we've got a break coming up. The private equity move is a good move in these transitional markets; you and I have talked about this in the past off-camera. If you're public and you're not really knocking it out of the park, it's a great thing to do: kill the 90-day shot clock, go private, retool, and then re-emerge stronger. There seems to be a lot of movement there. >> We've never been public, but I will say the Centerbridge team has been terrific. There are a lot of resources there, and certainly we're still very quarterly focused, but I think we've got a great partner, and we look forward to continuing.
>> The waves are coming, the big waves are coming, so get your big surfboard out, as we say in California. Josh, thanks for spending the time. Josh Rogers, CEO of Syncsort, here on theCUBE. More live coverage in New York after this break. Stay with us for day two of our three days of coverage of Big Data NYC 2017, the event that we hold every year here in conjunction with Hadoop World, right around the corner. I'm John Furrier; we'll be right back.
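To give a feel for the "complicated data translation process" Josh mentions in the insurance archive example: mainframe data is typically EBCDIC-encoded in fixed-length records, so even the simplest case needs a code-page conversion on its way into S3. Below is a toy sketch with a hypothetical file, record length, and bucket; real archives add COBOL copybook layouts and packed-decimal (COMP-3) fields, which is exactly the heavy lifting commercial tools in this space take on.

```python
# Toy illustration of the EBCDIC -> text -> S3 step in a mainframe archive
# migration. File name, record length, and bucket are hypothetical, and a
# plain code-page decode cannot handle copybook layouts or COMP-3 fields.
import boto3

RECORD_LEN = 80  # hypothetical fixed-length record size

def decode_records(path: str):
    with open(path, "rb") as f:
        while chunk := f.read(RECORD_LEN):
            # cp037 is the EBCDIC code page used on many US mainframes.
            yield chunk.decode("cp037").rstrip()

body = "\n".join(decode_records("archive.dat")).encode("utf-8")

s3 = boto3.client("s3")
s3.put_object(Bucket="example-archive-bucket",
              Key="mainframe/archive.txt",
              Body=body)
```

Even this trivial version shows why the combination of mainframe knowledge and AWS knowledge Josh describes is the hard part: the two sides share neither encodings, record formats, nor tooling.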

Published Date : Oct 2 2017
