Tendü Yogurtçu, Syncsort
(upbeat music) >> Hi and welcome to another Cube Conversation, where we go in depth with the thought leaders in the industry that are making significant changes to how we conduct digital business, and the likelihood of success with digital business transformations. I'm Peter Burris. Every organization today has some experience with the power of analytics, but they're also learning that the value of their analytic systems is, in part, constrained and determined by their access to core information. Some of the most important information that any business can start to utilize within their new advanced analytic systems, quite frankly, is the operational business information that the business has been using to run the business for years. Now, we've looked at that as silos, and maybe it is, although partly that's in response to the need to have good policy, good governance, and good certainty and predictability in how the system behaves, and how secure it's going to be. So, the question is, how do we marry the new world of advanced analytics with the older, but, nonetheless, extremely valuable world of operational processing to create new types of value within digital business today? It's a great topic and we've got a great conversation. Tendü Yogurtçu is the CTO of Syncsort. Tendü, welcome back to theCube. >> Hi Peter, it's great to be back in theCube. >> Excellent. So, look, let's start with a quick update on Syncsort. How are you doing? What's going on? >> Oh, it's been a really exciting time at Syncsort. We have seen tremendous growth in the last three years. We quadrupled our revenue and also our number of employees, and tripled through organic innovation and growth, as well as through acquisitions. So, we now have 7,000 plus customers in over 100 countries, and we still have 84 of the Fortune 100, serving large enterprises. It's been a really great journey. >> Well, so let's get into the specific distinction that you guys have.
At Wikibon and theCube, we've observed, we predicted that 2019 was going to be the year that the enterprise asserted itself in the cloud. We had seen a lot of developers drive cloud forward, we've seen a lot of analytics drive cloud forward, but now as enterprises are entering into cloud in a big way, they're generating or bringing with them new types of challenges and issues that have to be addressed. So, when you think about where we are in the journey to more advanced analytics, better operational certainty, greater use of information, what do you think the chief challenges that customers face today are? >> Of course, as you mentioned, every organization is trying to take advantage of the data, data is the core, and take advantage of the digital transformation to enable them to get more value out of their data. And, in doing so, they are moving into cloud, into hybrid cloud architectures. We have seen early implementations starting with the data lake. Everybody started creating this centralized data hub enabling advanced analytics and creating a data marketplace for their internal or external clients. And the early data lakes were utilizing Hadoop on on-premise architectures; now we are also seeing data lakes sometimes expanding over hybrid or cloud architectures. The challenge that these organizations also started realizing is that, once I create this data marketplace, the access to the data, critical customer data, critical product data-- >> Order data. >> Order data, is a bigger challenge than it appeared to be in the pilot project, because these critical data sets and core data sets, often in financial services, banking, insurance, and healthcare, are in environments and data platforms that these companies have invested in over multiple decades.
And I'm not referring to that as legacy, because the definition of legacy changes; these environments and platforms have been holding these critical data assets successfully for decades. >> We call them high value traditional applications, because with the traditional ones we know what they do, there's a certain operational certainty, and we've built up, you know, the organization around them to take care of those assets, but they still are very, very high value. >> Exactly, and making those applications and data available for next generation, next wave platforms is becoming a challenge for a couple of different reasons. One, accessing this data, and making sure the policies and the security and the privacy around these data stores are preserved when the data is made available for advanced analytics, whether it's in cloud or on-premise deployments. >> So, before you go to the second one, I want to make sure I understand that, because it seems very, very important. What you're saying is, if I may, the data is not just the ones and the zeros in the file; the data really needs to start being thought of as the policies, the governance, the security, and all the other attributes and elements. The metadata, if you will, has to be preserved as the data is getting used. >> Absolutely, and there are challenges around that, because now you have to have the skillsets to understand the data in those different types of stores, relational data warehouses, Mainframe, IBM i, SQL, Oracle, many different data owners and different teams in the organization, and then you have to make sense of it and preserve the policies around each of these data assets while bringing it to the new analytics environments. And make sure that everybody is aligned on the access, the privacy, and the policies and the governance around that data. And also, mapping the metadata to the target systems, right?
That's a big challenge, because somebody who understands these data sets in a Mainframe environment does not necessarily understand the cloud data stores or the new data formats, so how do you kind of bridge that gap and map into the target environment? >> And vice versa, right? >> Likewise, yes. >> This is where Syncsort starts getting really interesting, because, as you noted, a lot of the folks in the Mainframe world may not have familiarity with how the cloud works, and, at least from a data standpoint, a lot of folks in the cloud that have been doing things with object stores and Hadoop and whatnot may not have the knowledge of how the Mainframe works. And so those two sides are seeing silos, but the reality is both sides have set up policies and governance models and security regimes and everything else, because it works for the workloads that are in place on each side. >> Absolutely. >> So Syncsort's an interesting company because you guys have experience of crossing that divide. >> Absolutely, and we see both the next wave and existing data platforms as a moving, evolving target, because these challenges existed twenty years ago, ten years ago; it's just that the platforms were different, and the volume, the variety, the complexity were different. Hadoop, five or ten years ago, was the next wave; now it's the cloud; blockchain will be the next platform that we have to adapt to while making sure that we are advancing our data and creating value out of data. So accessing data and preserving those policies is one challenge. And then the second challenge is that as you are making these data sets available for analytics or machine learning and data science applications, you're deduplicating, standardizing, and cleansing, and making sure that you can deliver trusted data becomes a big challenge, because if you train the models with bad data, if you create the models with bad data, you have bad models and then bad insights.
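The deduplicate-standardize-cleanse step described here can be sketched in a few lines of Python; the match key, field names, and normalization rules below are illustrative assumptions, not Syncsort's actual implementation:

```python
import re

def normalize(record):
    """Standardize a raw customer record into a canonical form."""
    return {
        "name": re.sub(r"\s+", " ", record["name"].strip().lower()),
        "email": record["email"].strip().lower(),
    }

def deduplicate(records):
    """Keep one record per normalized email, a stand-in for a real match key."""
    seen = {}
    for rec in records:
        canon = normalize(rec)
        seen.setdefault(canon["email"], canon)  # first occurrence wins
    return list(seen.values())

raw = [
    {"name": "John  Furrier ", "email": "JF@Example.com"},
    {"name": "john furrier", "email": "jf@example.com"},  # same customer, different casing
    {"name": "Tendu Yogurtcu", "email": "ty@example.com"},
]
clean = deduplicate(raw)
print(len(clean))  # 2 distinct customers survive deduplication
```

Everything downstream, including model training, sees only what survives this step, which is why quality here determines model quality later.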
So, machine learning and artificial intelligence depend on the data and the quality of the data, so it's not just bringing all enterprise data for analytics, it's also making sure that the data is delivered in a trusted way. That's a big challenge. >> Yeah, let me build on that if I may, Tendü, because a lot of these tools involve black box belief in what the tool's performing. >> Correct. >> So you really don't have a lot of visibility into the inner workings of how the algorithm is doing things. It's, you know, that's the way it is. So, in many respects, your only real visibility into the quality of the outcome of these tools is visibility into the quality of the data that's going into the building of these models. Have I got that right? >> Correct. And in machine learning, the effect of bad data really multiplies, because of the training of the model as well as the insights. And with blockchain, in the future, it will also become very critical, because once you load the data into a blockchain platform, it's immutable. So, data quality comes at a higher price in some sense. So that's another big challenge. >> Which is to say that if you load bad data into a blockchain, it's bad forever. >> Yes, that's very true. So that's obviously another area where Syncsort, as we are accessing all of the enterprise data, delivering high quality data, discovering and understanding the data, and delivering the deduplicated, standardized, enriched data to the machine learning and AI pipeline and the analytics pipeline, is focused with our products. And the third challenge is that as you are doing it, the speed starts mattering, because, okay, I created the data lake or the data hub, and the next big use case we started seeing is, oh yeah, but I have twenty terabytes of data and only ten percent is changing on a nightly basis, so how do I keep my data lake in sync?
Not only that, I want to keep my data lake in sync, but I also would like to feed that changed data and keep my downstream applications in sync. I want to feed the changed data to the microservices in the cloud. That speed of delivery started really becoming a very critical requirement for the businesses. >> Speed and the targeting of the delivery. >> Speed and the targeted delivery, exactly. Because I think the bottom line is you really want to create an architecture where you can be agnostic and also be able to deliver at the speed the business is going to require at different times. Sometimes it's near real time in a batch, sometimes it's real time and you have to feed the changes as quickly as possible to the consumer applications and the microservices in the cloud. >> Well, we've got a lot of CIOs who are starting to ask us questions about this, especially as they start thinking about Kubernetes and Istio and other types of platforms that are intended to facilitate the orchestration and ultimately the management of how these container-based applications work. And we're starting to talk more about the idea of data assurance. Make sure the data is good, make sure it's high quality, make sure it's being taken care of, but also make sure that it's targeted where it needs to be, because you don't want a situation where you spin up a new cluster, which you could do very quickly with Kubernetes, but you haven't made the data available to that Kubernetes-based application so that it can actually run. And a lot of CIOs and a lot of application development leaders and a lot of business people are now starting to think about that. How do I make sure the data is where it needs to be so that the applications run when they need to run?
>> That's a great point, and going back to your comment around the cloud and taking advantage of cloud architectures, one of the things we have observed is organizations looking at cloud in terms of scalable elasticity and reducing costs, doing a lift and shift of applications, and not all applications can take advantage of cloud elasticity when you do that. Most of these applications were created for the existing on-premise fixed architectures, so they are not designed to take advantage of that. And we are seeing a shift now, and the shift is, instead of trying to lift and shift the existing applications: one, for new applications, let me try to adopt the technology assets, like you mentioned Kubernetes, so that I can stay vendor agnostic for cloud vendors, but, more importantly, let me try to have some best practices in the organization so that new applications can be created to take advantage of the elasticity, even though they may not be running in the cloud yet. Some organizations refer to this as cloud native, cloud first, some different terms. And make the data, because the core asset here is always the data, make the data available; instead of going after the applications, make the data from these existing on-premise and different platforms available for cloud. We are definitely seeing that shift. >> Yeah, and make sure and assure that that data is high quality, carries the policies, carries the governance, doesn't break the security models, all those other things. >> That is a big difference between how organizations ran their initial Hadoop data lake implementations versus the cloud architectures now, because when the initial Hadoop data lake implementations happened, it was dump all the data. And then, oh, I have to deal with the data quality now.
>> No, it was also, oh, those Mainframe people, they're so difficult to work with; meanwhile, you're still closing the books on a monthly basis, on a quarterly basis, you're not losing orders, your customers aren't calling you on the phone angry, and that, at the end of the day, is what a business has to do. You have to be able to extend what you can currently do with a digital business approach, and if you can replace certain elements of it, okay. But you can't end up with less functionality as you move forward into the cloud. >> Absolutely, and it's not just Mainframe; it's IBM i, it's the Oracle, it's the Teradata, it's the DTSA; the complexity of that data infrastructure is growing rapidly. And for cloud, we are seeing a lot of pilots happening with the cloud data warehouses now, trying to see if the cloud data warehouses can accommodate some of these hybrid deployments, and we are also seeing more focus on data quality from day one, not after the fact. How am I going to ensure that I'm delivering trusted data when populating the cloud data stores, or delivering trusted data to microservices in the cloud? There is a greater focus on both governance and quality. >> So, high quality data movement that leads to high quality data delivery, in ways that the business can be certain that whatever derivative work is done remains high quality. >> Absolutely. >> Tendü Yogurtçu, thank you very much for being once again on theCube, it's always great to have you here. >> Thank you, Peter, it's wonderful to be here. >> Tendü Yogurtçu is the CTO of Syncsort, and, once again, I want to thank you very much for participating in this Cube Conversation, cloud on the mind. Until next time. (upbeat music)
Tendü Yogurtçu, Syncsort | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello everyone, welcome back to theCUBE's special BigData NYC coverage, here in Manhattan in New York City; we're in Hell's Kitchen. I'm John Furrier, with my cohost Jim Kobielus, who's a Wikibon analyst for big data. In conjunction with Strata Data going on right around the corner, this is our annual event where we break down the big data, the AI, the cloud, all the goodness of what's going on in big data. Our next guest is Tendu Yogurtcu, who's the Chief Technology Officer at Syncsort. Great to see you again, CUBE alumni, been on multiple times. Always great to have you on, get the perspective, a CTO perspective, and the Syncsort update, so good to see you. >> Good seeing you, John and Jim. It's a pleasure being here too. Again, the pulse of big data is in New York, and it's a great week with a lot happening.
As organizations progress with their modern data architectures and building the next generation analytics platforms, leveraging machine learning and cloud elasticity, we have observed that data quality and data governance have become more critical than ever. For a couple of years we have been seeing this trend: I would like to create a data lake, data as a service, and enable bigger insights from the data. And this year, really every enterprise is trying to have that trusted data set created, because data lakes are turning into data swamps, as Dave Vellante often says (John laughs), and the collection of these diverse data sets, whether it's mainframe, whether it's messaging queues, whether it's relational data warehouse environments, is challenging the customers. We can take one simple use case like Customer 360, which we have been talking about for decades now, right? Yet still it's a complex problem. Everybody is trying to get that trusted single view of their customers so that they can serve the customer needs in a better way, offer better solutions and products to customers, and get better insights about the customer behavior, whether leveraging deep learning, machine learning, et cetera. However, in order to do that, the data has to be in a clean, trusted, valid format, and every business is going global. You have data sets coming from Asia, from Europe, from Latin America, and many different places, in different formats, and it's becoming a challenge. We acquired Trillium Software in December 2016, and our vision was really to bring that world leader, enterprise grade data quality into the big data environments. So last week we announced our Trillium Quality for Big Data product. This product brings unmatched capabilities of data validation, cleansing, enrichment, and matching, fuzzy matching, to the data lake. We are also leveraging the Intelligent eXecution engine that we developed for our data integration product, the MX8.
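The validate-cleanse-enrich flow listed here can be illustrated with a generic sketch. This is not Trillium's API; every function name, field name, and the toy per-country postal table below are assumptions for illustration only:

```python
def validate(record):
    """Reject records missing fields that downstream analytics depend on."""
    return bool(record.get("customer_id")) and bool(record.get("country"))

def cleanse(record):
    """Trim whitespace and standardize casing so later matching can work."""
    return {k: v.strip().upper() if isinstance(v, str) else v
            for k, v in record.items()}

def enrich(record, postal_reference):
    """Attach reference data, e.g. a country-specific postal code format."""
    record["postal_format"] = postal_reference.get(record["country"], "UNKNOWN")
    return record

# Toy per-country reference table; real enrichment uses curated postal data.
postal_reference = {"US": "NNNNN", "UK": "AANN NAA"}

raw = [
    {"customer_id": "c1", "country": " us "},
    {"customer_id": "", "country": "US"},  # fails validation: no customer_id
]
trusted = [enrich(cleanse(r), postal_reference)
           for r in raw if validate(cleanse(r))]
print(trusted)  # one trusted record, country standardized to "US"
```

The design point is the ordering: cleanse before validate and enrich, so that a messy but salvageable record (" us ") is kept while a genuinely broken one is rejected.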
So we are enabling the organizations to take this data quality offering, whether it's in Hadoop, MapReduce or Apache Spark, whichever compute framework it's going to be in the future. So we are very excited about that now. >> Congratulations, you mentioned the data lake being a swamp, that Dave Vellante referred to. It's interesting, because how does it become a swamp if it's a silo, right? We've seen data silos being the antithesis of governance; it challenges, certainly, IoT. Then you've got the complication of geopolitical borders, you mentioned that earlier. So you still got to integrate the data, you need data quality, which has been around for a while, but now it's more complex. What specifically about the cleansing and the quality of the data is more important now in the landscape? Are those factors the drivers of the challenges today, and what's the opportunity for customers, how do they figure this out? >> Complexity is because of many different factors. Some of it comes from being global. Every business is trying to have a global presence, and the data is originating from web, from mobile, from many different data sets, and if we just take a simple address, these address formats are different in every single country. In Trillium Quality for Big Data, we support postal data from over 150 countries, and data enrichment with this data. So it becomes really complex, because you have to deal with different types of data from different countries, and the matching also becomes very difficult, whether it's John Furrier, J Furrier, John Currier, you have to be >> All my handles on Twitter, knowing that's about. (Tendu laughs) >> All of the handles you have. Every business is trying to have better targeting in terms of offering products and understanding the single and one and only John Furrier as a customer.
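The name variants in this exchange (John Furrier, J Furrier, John Currier) are the classic fuzzy matching problem. Here is a minimal sketch using Python's standard library; the 0.8 threshold and edit-distance scoring are illustrative assumptions, and real matchers layer in phonetic, address, and reference-data rules:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude fuzzy score in [0, 1]; case differences are ignored."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matches(candidate, canonical, threshold=0.8):
    """Treat a candidate as the same customer when its score clears the bar."""
    return similarity(candidate, canonical) >= threshold

canonical = "John Furrier"
variants = ["J Furrier", "John Currier", "Jon Furrier", "Jane Smith"]
matched = [v for v in variants if matches(v, canonical)]
print(matched)  # the three near-variants; "Jane Smith" is excluded
```

Tuning that threshold is exactly the kind of business rule a data steward ends up owning: too low and distinct customers collapse into one, too high and one customer splinters into many.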
That creates a complexity, and in any data management and data processing challenge, the variety of data and the speed at which data is being populated are higher than we have ever observed. >> Hold on Jim, I want to get Jim involved in this conversation, 'cause I want to just make sure those guys can get settled in, and adjust your microphone there. Jim, she's bringing up a good point, I want you to weigh in just to kind of add to the conversation and take it in the direction of where the automation's happening. If you look at what Tendu's saying, the complexity is going to have an opportunity in software. Machine learning, root-level cleanliness can be automated, because Facebook and others have shown that you can apply machine learning techniques to the volume of data. No human can get at all the nuances. How is that impacting the data platforms and some of the tooling out there, in your opinion? >> Yeah well, much of the issue, one of the core issues, is where do you place the data matching and data cleansing logic or execution in this distributed infrastructure. At the source, in the cloud, at the consumer level in terms of rolling up the disparate versions of data into a common view. So by acquiring a very strong, well-established, reputable brand in data cleansing, Trillium, as Syncsort has done, a great service to your portfolio, to your customers. You know, Trillium is well known for offering lots of options in terms of where to configure the logic, where to deploy it within distributed hybrid architectures. Give us a sense, going forward, of the range of options you're going to be providing for customers on where to place the cleansing and matching logic. How are you going to support, Syncsort, flexible workflows in terms of curation of the data and so forth, because the curation cycle for data is critically important, the stewardship. So how do you plan to address all of that going forward in your product portfolio, Tendu?
>> Thank you for asking the question, Jim, because that's exactly the challenge that we hear from our customers, especially from larger enterprises and financial services, banking and insurance. So our plan is that our next upcoming release, at the end of the year, is targeting very flexible deployment. Flexible deployment in the sense that, when you understand the data and create the business rules and decide what kind of matching and enrichment you'll be performing on the data sets, you can actually have those business rules executed at the source of the data or in the data lake, or switch between the source and the enterprise data lake that you are creating. That flexibility is what we are targeting, that's one area. On the data creation side, we see these percentages: 80% of data stewards' time is spent on data prep, data creation and data cleansing, and it is actually a really very high percentage. From our customers we see this still being a challenge. One area that we started investing in is using machine learning to understand the data, and using the data discovery capabilities we currently have to make recommendations on what those business rules can be, or what kind of data validation, cleansing and matching might be required. So that's an area that we will be investing in. >> Are you contemplating, in terms of incorporating in your product portfolio, using machine learning to drive, the term I like to use is a recommendation engine, that presents recommendations to the data stewards, human beings, about different data schemas or different ways of matching the data, the optimal way of reconciling different versions of customer data.
So is there going to be like a recommendation engine of that sort >> It's going to be... >> In line with your... >> That's what we currently plan, recommendations, so the users can opt to apply them or not, or to modify them, because sometimes when you go too far with automation you still need some human intervention in making these decisions, because you might be operating on a sample of data versus the full data set, and you may actually have to infuse some human understanding and insight as well. So our plan is to make it as a recommendation, in the first phase at least, that's what we are planning. And when we look at the portfolio of the products, and our CEO Josh actually today was also in theCUBE, part of Splunk .conf, we have acquisitions happening, we have organic innovation that's happening, and we really try to stay focused in terms of how do we create more value from your data, and how do we increase the business serviceability: whether it's with our Ironstream product, we made an announcement this week, Ironstream transaction tracing, to create more visibility into application performance and more visibility into IT operations, for example when you make a payment with your mobile, you might be having a problem and you want to be able to trace back to the back end, which is usually a legacy mainframe environment; or whether you are populating the data lake and you want to keep the data in sync and fresh with the data source, and apply the changes as CDC; or whether you are taking that data from raw data set to more consumable data by creating the trusted, high quality data set. We are very much focused on creating more value and bigger insights out of the data sets.
As Syncsort continues your journey, you keep on adding more and more things; it's been quite impressive, you guys have done a great job. >> Tendu: Thank you. >> We enjoy covering the success there, watching you guys really evolve. What is the value proposition for Syncsort today, technically? If you go in and talk to a customer, a prospective new customer, why Syncsort? What's the enabling value that you're providing under the hood, technically, for customers? >> We are enabling our customers to access and integrate data sets in a trusted manner. So we are ultimately liberating the data from all of the enterprise data stores, and making that data consumable in a trusted manner. And everything we provide in that data management stack is about making data available, making data accessible and integrated into the modern data architecture, bridging the gap between those legacy environments and the modern data architecture. And it becomes really a big challenge, because this is a cross-platform play. It is not a single environment that enterprises are working with. Hadoop is real now, right? Hadoop is in the center of the data warehouse architecture, and whether it's on-premise or in the cloud, there is also a big trend about the cloud. >> And certainly batch, they own the batch thing. >> Yeah, and as part of that, it becomes very important to be able to leverage the existing data assets in the enterprise, and that requires an understanding of the legacy data stores, and existing infrastructure, and existing data warehouse attributes. >> John: And you guys say you provide that. >> We provide that, and that's our baby, and we provide it in an enterprise grade manner. >> Hold on Jim, one second, just let her finish the thought. Okay, so given that, okay, cool, you got that out there. What's the problem that you're solving for customers today? What's the big problem in the enterprise and in the data world today that you address?
I want to have a single view of my data, and whether that data is originating on the mobile, or on the mainframe, or in the legacy data warehouse, we provide that single view in a trusted manner. >> When you mentioned Ironstream, that reminded me of one of the core things that we're seeing at Wikibon: IT operations is increasingly being automated through AI, some call it AI ops and whatnot, and we're going deeper on the research there. Ironstream is bringing mainframe and transactional data, like the use case you brought up of IT operations data, into a data lake alongside machine data that you might source from the internet of things and so forth. It seems to me that that's a great enabler potentially for Syncsort, if it wished to play your solutions or position them into IT operations as an enabler, leveraging your machine learning investments to build more automated anomaly detection and remediation into your capabilities. What are your thoughts? Is that where you're going, or do you see it as an opportunity, AI for IT ops, for Syncsort going forward? >> Absolutely. We target use cases around IT operations and application performance. We integrate with Splunk ITSI, and we also make this data available in the big data analytics platforms. So application performance and IT operations are really the main use cases we target, and as part of the advanced analytics platform, for example, we can correlate that data set with other machine data that's originating on other platforms in the enterprise. Nobody's looking at what's happening on the mainframe, or what's happening in my Hadoop cluster, or what's happening in my VMware environment, right? They want to correlate the data cross-platform, and that's one of the biggest values we bring, whether it's on the machine data or on the application data. >> Yeah, that's quite a differentiator for you. >> Tendu, thanks for coming on theCUBE, great to see you.
Congratulations on your success. Thanks for sharing. >> Thank you. >> Okay, CUBE coverage here in BigData NYC, exclusive coverage of our event, BigData NYC, in conjunction with Strata Hadoop right around the corner. This is our annual event for SiliconANGLE, and theCUBE and Wikibon. I'm John Furrier, with Jim Kobielus, who's our analyst at Wikibon on big data. Peter Burris has been on theCUBE, he's here as well. Big three days of wall-to-wall coverage on what's happening in the data world. This is theCUBE, thanks for watching, be right back with more after this short break.
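The AI-ops thread in this segment, correlating machine data across platforms and flagging anomalies automatically, can be made concrete with a toy rolling z-score detector. This is purely an illustrative Python sketch, not Syncsort's or Splunk's actual logic; the metric values and threshold are invented.

```python
from statistics import mean, stdev

def detect_anomalies(series, window=10, threshold=3.0):
    """Flag points whose z-score against the trailing window exceeds
    the threshold. A toy stand-in for the kind of anomaly detection
    an AI-ops pipeline might run over mainframe or syslog metrics."""
    anomalies = []
    for i in range(window, len(series)):
        trailing = series[i - window:i]
        mu, sigma = mean(trailing), stdev(trailing)
        if sigma == 0:
            if series[i] != mu:  # any deviation from a perfectly flat window
                anomalies.append(i)
        elif abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Steady CPU utilization with one spike at index 15.
cpu = [50, 51, 49, 50, 52, 50, 49, 51, 50, 50, 51, 49, 50, 51, 50, 95, 50, 49]
print(detect_anomalies(cpu))  # → [15]
```

In a real pipeline the series would be a stream of metrics forwarded by a collector such as Ironstream into Splunk or a data lake, and this toy detector would be replaced by a trained model.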
Scott Gnau, Hortonworks & Tendü Yogurtçu, Syncsort - DataWorks Summit 2017
>> Man's Voiceover: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE, we are live at Day One of the DataWorks Summit, we've had a great day here, I'm surprised that we still have our voices left. I'm Lisa Martin, with my co-host George Gilbert. We have been talking with great innovators today across this great community, folks from Hortonworks, of course, IBM, partners, now I'd like to welcome back to theCube, who was here this morning in the green shoes, the CTO of Hortonworks, Scott Gnau, welcome back Scott! >> Great to be here yet again. >> Yet again! And we have another CTO, we've got CTO corner over here, with CUBE Alumni and the CTO of SyncSort, Tendu Yogurtcu Welcome back to theCUBE both of you >> Pleasure to be here, thank you. >> So, guys, what's new with the partnership? I know that syncsort, you have 87%, or 87 of the Fortune 100 companies are customers. Scott, 60 of the Fortune 100 companies are customers of Hortonworks. Talk to us about the partnership that you have with syncsort, what's new, what's going on there? >> You know there's always something new in our partnership. We launched our partnership, what a year and a half ago or so? >> Yes. And it was really built on the foundation of helping our customers get time to value very quickly, right and leveraging our mutual strengths. And we've been back on theCUBE a couple of times and we continue to have new things to talk about whether it be new customer successes or new feature functionalities or new integration of our technology. And so it's not just something that's static and sitting still, but it's a partnership that was had a great foundation in value and continues to grow. 
And, ya know, with some of the latest moves that Syncsort has made, which I'm sure Tendu will bring us up to speed on, customers who have jumped on the bandwagon with us together are able to get much more benefit than they originally intended. >> Let me talk about some of the things actually happening with Syncsort and with the partnership. Thank you Scott. The Trillium acquisition has been transformative for us, really. We have achieved quite a lot within the last six months, delivering joint solutions between our data integration, DMX-h, and the Trillium data quality and profiling portfolio, and that was kind of our first step, very much focused on data governance. We are going to have a data quality for the data lake product available later this year, and this week actually we will be announcing our partnership with the Collibra data governance platform, basically making business rules and technical metadata available through the Collibra dashboards for data scientists. And in terms of our joint solution and joint offering for data warehouse optimization, the bundle that we launched early February of this year, that's in production; large, complex production deployments have already happened. Our customers access all their data, all enterprise data, including the legacy data warehouse, new data sources, as well as legacy mainframe, in the data lake, so we will be announcing in a week or so change data capture capabilities from legacy data stores into Hadoop, keeping that data fresh and giving more choices to our customers in terms of populating the data lake, as well as use cases like archiving data into cloud.
(laughter) >> So help us visualize a scenario where you have maybe DMX-h bringing data in, you might have change data capture coming from a live database, >> Tendu Voiceover: Yes. and you've got the data quality at work as well. Help us picture how much faster and higher fidelity the data flow might be relative to >> Sure, absolutely. So, our bundle and our joint solution with Hortonworks really focus on business use cases. And one of those use cases is enterprise data warehouse optimization, where we make all data, all enterprise data, accessible in the data lake. Now, if you are an insurance company managing claims, or you are building a data as a service, Hadoop as a service architecture, there are multiple ways that you can keep that data fresh in the data lake. And you can have change data capture by basically taking snapshots of the data and comparing them in the data lake, which is a viable method of doing it. But as the data volumes are growing and the real-time analytics requirements of the business are growing, we recognize our customers are also looking for alternative ways to actually capture the change in real time, when the change is just, like, less than 10% of the original data set, and keep the data fresh in the data lake. So that enables faster analytics, real-time analytics, and in the case that you are doing something from on-premise to the cloud, or archiving data, it also saves on resources like network bandwidth and overall resource efficiency. Now, while we are doing this, obviously we are accessing the data and the data goes through our processing engines. What Trillium brings to the table is the unmatched capabilities around profiling that data, getting a better understanding of that data.
So we will be focused on delivering products around that, because as we understand the data, we can also help our customers create the business rules, cleanse that data, and preserve the fidelity and integrity of the data. >> So, with the change data capture, it sounds like near real time, you're capturing changes in near real time, could that serve as a streaming solution that then is also populating the history as well? >> Absolutely. We can go through streaming or message queues. We also offer more efficient proprietary ways of streaming the data to Hadoop. >> So the, I assume the message queues refers to, probably Kafka, and then your own optimized solution for sort of maximum performance, lowest latency. >> Yes, we can go through Kafka queues, which is very efficient as well. We can also go through proprietary methods. >> So, Scott, help us understand then now the governance capabilities that, um, I'm having a senior moment (laughter) I'm getting too many of these! (laughter) Help us understand the governance capabilities that Syncsort's adding to the, sort of, mix with the data warehouse optimization package and how it relates to what you're doing. >> Yeah, right. So what we talked about even again this morning, right, the whole notion of the value of open squared, right, open source and open ecosystem. And I think this is clearly an open ecosystem kind of play. So we've done a lot of work since we initially launched the partnership and through the different product releases, where our engineering teams and the Syncsort teams have done some very good low-level integration of our mutual technologies, so that the Syncsort tool can exploit those horizontal core services like YARN for multi-tenancy and workload management, and of course Atlas for data governance. So as the Syncsort team adds feature functionality on the outside of that tool, that simply accretes to the benefit of what we've built together.
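Stepping back to the snapshot-versus-CDC point Tendü made: the batch approach, taking snapshots and comparing them, reduces to a keyed diff that emits insert, update, and delete events. Below is a minimal Python sketch of that idea; the table and field names are invented for illustration.

```python
def snapshot_diff(old, new, key="id"):
    """Derive insert/update/delete change events by comparing two
    snapshots of a table, each a list of dicts keyed by `key`.

    When only a small fraction of rows change, shipping just these
    events keeps a data-lake copy fresh without reloading the full
    data set."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    events = []
    for k, row in new_by_key.items():
        if k not in old_by_key:
            events.append(("insert", row))
        elif row != old_by_key[k]:
            events.append(("update", row))
    for k, row in old_by_key.items():
        if k not in new_by_key:
            events.append(("delete", row))
    return events

yesterday = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
today = [{"id": 1, "status": "closed"}, {"id": 3, "status": "open"}]
print(snapshot_diff(yesterday, today))
# → [('update', {'id': 1, 'status': 'closed'}),
#    ('insert', {'id': 3, 'status': 'open'}),
#    ('delete', {'id': 2, 'status': 'open'})]
```

True change data capture reads changes from the source's log instead of diffing snapshots, which matters when the delta is a small fraction of a large table; either way, the resulting event stream can be shipped over a message queue such as Kafka to keep the lake copy fresh.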
And so that's why I say customers who started down this journey with us together are now going to get the benefit of additional options from that ecosystem, that they can plug in additional feature functionality. And at the same time we're really thrilled because, and we've talked about this many times, right, the whole notion of governance and metadata management in the big data space is a big deal. And so the fact that we're able to come to the table with an open source solution to create common metadata tagging that then gets utilized by multiple different applications, I think, creates extreme value for the industry and frankly for our customers, because now, regardless of the application or applications they choose, they can at least have that common trusted infrastructure where all of that information is tagged and it stays with the data through the data's life cycle. >> So your partnership sounds very, very symbiotic, that there's changes made on one side that reflect the other. Give us an example of where is your common customer, and this might not be, well, they're all over the place, who has got an enterprise data warehouse. Are you finding more customers that are looking to modernize this? That have multi-cloud, core edge, IOT devices, that's a pretty distributed environment, versus customers that might be still more on prem? What's kind of the mix there? >> Can I start, and then I will let you build on. I want to add something to what Scott said earlier. Atlas is a very important integration point for us, and in terms of the partnership that you mentioned, I think one of the strengths of our partnership is that it's at many different levels. It's not just executive level, it's cross-functional, and also from very close field teams, marketing teams, and engineering teams working together. And in terms of our customers, it's really organizations trying to move toward a modern data architecture.
And as they are trying to build the modern data architecture, there is the data in motion piece, which I will let Scott talk about, and the data at rest piece. And as we have so much data coming from cloud, originating through mobile and web, in the enterprise, especially the Fortune 500 and Fortune 100 that we talked about, insurance, healthcare, telco, financial services, and banking have a lot of legacy data stores. So our joint solution and the first couple of business use cases we targeted were around that: how do we enable these data stores and data in the modern data architecture? I will let Scott-- >> Yeah, I agree. And so certainly we have a lot of customers already who are joint customers, and so they can get the value of the partnership kind of cuz they've already made the right decision, right. I also think, though, there's a lot of green field opportunity for us, because there are hundreds if not thousands of customers out there who have legacy data systems where their data is kind of locked away. And by the way, it's not to say the systems aren't functioning and doing a good job, they are. They're running business-facing applications and all of that's really great, but that is a source of raw material that belongs also in the data lake, right, and can certainly enhance the value of all the other data that's being built there. And so the value, frankly, of our partnership is really creating that easy bridge to kind of unlock that data from those legacy systems and get it in the data lake, and then from there, the sky's the limit, right. Is it reference data that can then be used for consistency of response when you're joining it to social data and web data? Frankly, is it an online archive, an optimization of the overall data fabric, offloading some of the historical data that may not even be used in legacy systems and having a place to put it where it actually can be accessed? And so, there are a lot of great use cases.
You're right, it's a very symbiotic relationship. I think there's only upside because we really do complement each other, and there is a distinct value proposition not just for our existing customers but frankly for a large set of customers out there that have, kind of, the data locked away. >> So, how do you see the data warehouse optimization solution set continuing to expand its functional footprint? What are some things to keep pushing out the edge conditions, the realm of possibilities? >> Some of the areas that we are jointly focused on: we are liberating that data from the enterprise data warehouse or legacy architectures. Through Syncsort DMX-h we actually understand the path that data traveled from, and the metadata is something that we can now integrate into Atlas, publish into Atlas, and have Atlas as the open data governance solution. So that's an area where we definitely see an opportunity to grow and also strengthen that joint solution. >> Sure, I mean extended provenance is kind of what you're describing, and that's a big deal when you think about some of these legacy systems where frankly 90% of the costs of implementing them originally was actually building out those business rules and that metadata. And so being able to preserve that and bring it over into a common or an open platform is a really big deal. I'd say inside of the platform, of course, as we continue to create new performance advantages in, ya know, the latest releases of Hive, as an example, where we can get low-latency query response times, there's a whole new class of workloads that is now appropriate to move into this platform, and you'll see us continue to move along those lines as we advance the technology from the open community. >> Well, congratulations on continuing this great, symbiotic, as we said, partnership. It sounds like it's incredibly strong on the technology side, on the strategic side, on the GTM side.
I loved how you said liberating data so that companies can really unlock its transformational value. We want to thank both of you, Scott for coming back on theCUBE >> Thank you. twice in one day. >> Twice in one day. Tendu, thank you as well >> Thank you. for coming back to theCUBE. >> Always a pleasure. For both of our CTOs that have joined us from Hortonworks and Syncsort, and my co-host George Gilbert, I am Lisa Martin. You've been watching theCUBE live from day one of the DataWorks Summit. Stick around, we've got great guests coming up (upbeat music)
Tendü Yogurtçu | BigData SV 2017
>> Announcer: Live from San Jose, California. It's The Cube, covering Big Data Silicon Valley 2017. (upbeat electronic music) >> California, Silicon Valley, at the heart of the big data world, this is The Cube's coverage of Big Data Silicon Valley in conjunction with Strata Hadoop. Of course we've been here for multiple years, covering Hadoop World, now Strata Hadoop, for our eighth year now, but we do our own event, Big Data SV, in New York City and Silicon Valley, SV, NYC. I'm John Furrier, my cohost George Gilbert, analyst at Wikibon. Our next guest is Tendü Yogurtçu with Syncsort, general manager of big data, did I get that right? >> Yes, you got it right. It's always a pleasure to be at The Cube. >> (laughs) I love your name. That's so hard for me to get, but I think I was close enough there. Welcome back. >> Thank you. >> Great to see you. You know, one of the things I'm excited about with Syncsort is we've been following you guys, we talk to you guys every year, and it just seems to be that every year, more and more announcements happen. You guys are unstoppable. You're like what Amazon does, just more and more announcements, but the theme seems to be integration. Give us the latest update. You had an update, you bought Trillium, you got a big deal with Hortonworks, you got integrated with Spark, you got big news here, what's the news here this year? >> Sure. Thank you for having me. Yes, it's very exciting times at Syncsort, and I probably say that every time I appear, because every time it's more exciting than the previous, which is great. We bought Trillium Software, and Trillium Software has been leading data quality for over a decade in many of the enterprises. It's very complementary to our data integration, data management portfolio, because we are helping our customers to access all of their enterprise data, not just the new emerging sources in the connected devices and mobile and streaming.
Also leveraging reference data, mainframe legacy systems, and the legacy enterprise data warehouse. While we are doing that, accessing data, the data lake is now actually, in some cases, turning into a data swamp. That was a term Dave Vellante used a couple of years back in one of the crowd chats, and it's becoming real. So, data-- >> Real being the data swamps, data lakes are turning into swamps because they're not being leveraged properly? >> Exactly, exactly. Because it's about also having access to right data, and data quality is very complementary, because Trillium has delivered trusted, right data to enterprise customers in the traditional environments, so now we are looking forward to bringing that enterprise trust of the data quality into the data lake. In terms of data integration, data integration has been always very critical to any organization. It's even more critical now that the data is shifting gravity, with the amount of data organizations have. What we have been delivering in very large enterprise production environments for the last three years, we are now hearing our competitors making announcements in those areas very recently, which is a validation, because we are already running in very large production environments. We offer value by saying "Create your applications for integrating your data," whether it's originating in the cloud or on the mainframe, whether it's on the legacy data warehouse, and you can deploy the same exact application without any recompilation, without any changes, on your standalone Windows laptop, or in Hadoop MapReduce, or Spark in the cloud. So this design once and deploy anywhere is becoming more and more critical, with data originating in many different places, and cloud is definitely one of them. Our data warehouse optimization solution with Hortonworks and AtScale is a special package to accelerate this adoption.
It's basically helping organizations to offload the workload from the existing Teradata or Netezza data warehouse and deploy it in Hadoop. We provide a single button to automatically map the metadata, create the metadata in Hive on Hadoop, and also make the data accessible in the new environment, and AtScale provides fast BI on top of that. >> Wow, that's amazing. I want to ask you a question, because this is a theme, so I just did a tweetup just now while you were talking, saying "the theme this year is cleaning up the data lakes, or data swamps, AKA data lakes." The other theme is integration. Can you just lay out your premise on how enterprises should be looking at integration now, because it's the multi-vendor world, it's the multi-cloud world, the multi-data-type-and-source-with-metadata world. How do you advise customers that have the plethora of action coming at them? IOT, you've got cloud, you've got big data, I've got Hadoop here, I got Spark over here, what's the integration formula? >> First thing is identify your business use cases. What's your business challenge, what are your business goals, because that should be the real driver. We see that in some organizations, they start with the intention "we would like to create a data lake" without having a very clear understanding of what is it that I'm trying to solve with this data lake. Data as a service is really becoming a theme across multiple organizations, whether it's on the enterprise side or on some of the online retail organizations, for example. As part of that data as a service, organizations really need to adopt tools that are going to enable them to take advantage of the technology stack. The technology stack is evolving very rapidly. The skill sets are rare, and skill sets are rare because you need to be kind of making adjustments.
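The "single button" metadata mapping described above, taking a source table's schema and creating the matching table in Hive, is at its core a type-mapping exercise. The sketch below is a hypothetical Python illustration; the type map, table name, and column names are all invented, and the real product logic is far richer.

```python
# Hypothetical mapping from legacy warehouse column types to Hive types.
TYPE_MAP = {
    "NUMBER": "BIGINT",
    "DECIMAL": "DECIMAL",
    "VARCHAR2": "STRING",
    "CHAR": "STRING",
    "DATE": "TIMESTAMP",
}

def to_hive_ddl(table, columns):
    """Render a Hive CREATE TABLE statement from (name, source_type)
    column pairs, defaulting unknown source types to STRING."""
    cols = ",\n".join(
        f"  {name} {TYPE_MAP.get(src_type, 'STRING')}"
        for name, src_type in columns
    )
    return f"CREATE TABLE {table} (\n{cols}\n) STORED AS ORC;"

ddl = to_hive_ddl("claims", [("claim_id", "NUMBER"),
                             ("holder", "VARCHAR2"),
                             ("filed_on", "DATE")])
print(ddl)
```

For the example input this emits a `claims` table with `claim_id BIGINT`, `holder STRING`, and `filed_on TIMESTAMP`, stored as ORC; a production tool would also carry over keys, partitioning, and data-movement jobs alongside the DDL.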
Am I hiring Ph.D. students who can program Scala in the most optimized way, or should I hire Java developers, or should I hire Python developers? The names of the tools in the stack change, Spark 1 versus Spark 2 APIs change. It's really evolving very rapidly. >> It's hard to find Scala developers, I mean, once you go outside Silicon Valley. >> Exactly. So as an organization, our advice is that you really need to find tools that are going to fit those business use cases and provide a single software environment. That data integration might be happening on premise now, with some of the legacy enterprise data warehouse, and it might happen in a hybrid, on-premise and cloud environment in the near future, and perhaps completely in the cloud. >> So standard tools, tools that have some standard software behind them, so you don't get stuck in the personnel hiring problem, some unique domain expertise that's hard to hire. >> Yes, skill set is one problem. The second problem is the fact that the applications need to be recompiled, because the stack is evolving and the APIs are not compatible with the previous version, so that's the maintenance cost, to keep up with things, to be able to catch up with the new versions of the stack. That's another area where the tools really help, because you want to be able to develop the application and deploy it anywhere, on any compute platform. >> So Tendü, if I hear you properly, what you're saying is integration sounds great on paper, it's important, but there are some hidden costs there, and that is the skill set, and then there's the stack recompiling, I'm making sure. Okay, that's awesome. >> The tools help with that. >> Take a step back and zoom out and talk about Syncsort's positioning, because you guys have been changing with the stacks as well. I mean, you guys have been doing very well with the announcements, you've been just coming on the market all the time. What is the current value proposition for Syncsort today?
>> The current value proposition is really that we help organizations to create the next generation modern data architecture by accessing and liberating all enterprise data, and delivering that data at the right time and with the right quality. It's liberate, integrate, with integrity. That's our value proposition. How do we do that? We provide that single software environment. You can have batch legacy data and streaming data sources integrated in the same exact environment, and it enables you to adapt to Spark 2 or Flink or whichever compute framework is going to help them. That has been our value proposition, and it is proven in many production deployments. >> What's interesting too is the way you guys have approached the market. You've locked down the legacy, so you have, we talk about the mainframe and well beyond that now, you guys have and understand the legacy, so you kind of lock that down, protect it, make it secure, security-wise, but you do that too, making sure it works, because it's still data there, because legacy systems are really critical in the hybrid. >> Mainframe expertise and heritage that we have is a critical part of our offering. We will continue to focus on innovation on the mainframe side as well as on the distributed. One of the announcements that we made since our last conversation was we have a partnership with Compuware, and we now bring in more data types about application failures, Abend-AID data, to Splunk for operational intelligence. We will continue to also support more delivery types; we have batch delivery, we have streaming delivery, and now replication into Hadoop has been a challenge, so our focus is now replication from DB2 on mainframe and VSAM on mainframe to Hadoop environments. That's what we will continue to focus on, mainframe, because we have heritage there and it's also part of the big enterprise data lake.
You cannot make sense of the customer data that you are getting from mobile if you don't reference the critical data sets that are on the mainframe. With the Trillium acquisition, it's very exciting, because now we are at a kind of pivotal point in the market: we can bring the superior data validation, cleansing, and matching capabilities we have to the big data environments. One of the things-- >> So when you get in low latency, you guys do the whole low latency thing too? You bring it in fast? >> Yes, we bring it, that's our current value proposition, and as we are accessing this data and integrating it as part of the data lake, now we have capabilities with Trillium where we can profile that data, get statistics, and start using machine learning to automate the data steward's job. Data stewards are still spending 75% of their time trying to clean the data. So if we can-- >> Lots of manual labor there, and modeling too, by the way, the modeling and just the cleaning, cleaning and modeling kind of go hand in hand. >> Exactly. If we can automate any of these steps to derive the business rules automatically and provide right data on the data lake, that would be very valuable. This is what we are hearing from our customers as well. >> We've heard probably five years about the data lake as the center of gravity of big data, but we're hearing at least a bifurcation, maybe more, where now we want to take that data and apply it, operationalize it in making decisions with machine learning, predictive analytics, but at the same time we're trying to square this strange circle of data, the data lake where you didn't say up front what you wanted it to look like, but now we want ever richer metadata to make sense out of it, a layer that you're putting on it, the data prep layer, and others are trying to put different metadata on top of it. What do you see that metadata layer looking like over the next three to five years?
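Backing up to the profiling point for a moment: the statistics that feed a data steward's cleansing work can be sketched as simple per-column aggregates. This is a toy Python version with invented column data; real profilers like Trillium's add pattern analysis, outlier detection, and matching on top of basics like these.

```python
def profile(rows):
    """Per-column profiling: null rate, distinct count, sample values.

    A toy version of the column statistics a data-quality tool
    gathers before suggesting cleansing or matching rules."""
    stats = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v not in (None, "")]
        stats[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
            "sample": non_null[:3],
        }
    return stats

rows = [{"zip": "10001", "state": "NY"},
        {"zip": "",      "state": "NY"},
        {"zip": "94105", "state": "CA"},
        {"zip": "10001", "state": None}]
print(profile(rows)["zip"])
# → {'null_rate': 0.25, 'distinct': 2, 'sample': ['10001', '94105', '10001']}
```

Even aggregates this simple are enough to flag candidates for steward attention, for example a column with a high null rate or far more distinct values than expected, which is the kind of signal a machine-learning layer could learn to act on automatically.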
>> Governance is a very key topic, and especially for organizations who are ahead of the game in big data and who have already established that data lake, data governance and even analytics governance become important. What we are delivering here with Trillium, which we will have generally available by end of Q1, is basically bringing business rules to the data. Instead of bringing data to the business rules, we are taking the business rules and deploying them where the data exists. That will be key because of the data gravity you mentioned: the data might be in the Hadoop environment, it might be in an enterprise data warehouse, and it might be originating in the cloud, and you don't want to move the data to the business rules. You want to move the business rules to where the data exists. Cloud is an area where we see more and more of our customers moving forward. There are two main use cases around our integration: one, the data is originating in cloud, and the second one is archiving data to cloud. We actually announced tighter integration with Cloudera Director earlier this week for this event, and we have been in cloud deployments; we have had an offering on Elastic MapReduce and on EC2 for a couple of years now, and also on Google Cloud Storage. This announcement is primarily about making deployments even easier by leveraging Cloudera Director's elasticity for growing and shrinking the deployment. Now our customers' integration jobs will also take advantage of that elasticity. >> Tendü, it's great to have you on The Cube because you have an engineering mind but you're also now general manager of the business, and your business is changing. 
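"Bringing business rules to the data" can be sketched as a rule defined once and then compiled for whichever platform holds the data, pushed down as SQL to a warehouse or applied in place to in-memory records. The class and method names here are illustrative, not a Trillium API:

```python
# Sketch of deploying a business rule where the data lives, instead of
# moving the data to the rule. Names are illustrative, not a Trillium API.

class Rule:
    def __init__(self, field, predicate_sql, predicate_fn):
        self.field = field
        self.predicate_sql = predicate_sql  # pushdown form
        self.predicate_fn = predicate_fn    # in-memory form

    def to_sql(self, table):
        # Push the rule down to a warehouse: emit a query that
        # returns the rows violating the rule.
        return f"SELECT * FROM {table} WHERE NOT ({self.predicate_sql})"

    def apply_local(self, records):
        # Run the same rule where the data already sits in memory;
        # returns the violating records.
        return [r for r in records if not self.predicate_fn(r[self.field])]

not_null = Rule("email", "email IS NOT NULL", lambda v: v is not None)
print(not_null.to_sql("customers"))
print(not_null.apply_local([{"email": None}, {"email": "a@b.com"}]))
```

Either way the rule's definition stays single-sourced; only its execution target changes with the data's gravity.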
You're in the center of the action, so I want to get your expertise and insight into this enterprise readiness concept. We saw last week at Google Cloud 2017, you know, Google going down the path of being enterprise ready, or taking steps; I don't think they're fully ready, but they're certainly serious about the cloud in the enterprise, and that's clear from Diane Greene, who knows the enterprise. It sparked the conversation last week around what enterprise readiness means for cloud players, because there are so many details in between the lines, if you will, of what the products are: the integration, certification, SLAs. What's your take on the notion of cloud readiness? Vis-à-vis Google and others that are bringing cloud compute, a lot of resources, with an IOT market that's now booming, big data evolving very, very fast, lots of realtime, lots of analytics, lots of innovation happening. What does the enterprise picture look like from a readiness standpoint? How do these guys get ready? >> From a big picture, for the enterprise there are a couple of things that cannot be an afterthought. Security, metadata lineage as part of data governance, and being able to have flexibility in the architecture, so that they will not be recreating the jobs that they might have already deployed in on-premise environments, right? To be able to have the same application running from on premise to cloud will be critical, because it gives flexibility for adaptation in the enterprise. An enterprise may have some MapReduce jobs running on premise with Spark jobs in the cloud because they are doing some predictive analytics, graph analytics on those; they want to be able to have that flexible architecture, where we hear this concept of a hybrid environment. You don't want to be deploying a completely different product in the cloud and redo your jobs. 
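The "same application, on premise or cloud" flexibility described above amounts to separating the job definition from the environment it targets, so only configuration changes between deployments. A minimal sketch, with made-up endpoint values for illustration:

```python
# Sketch of one job definition targeting on-prem or cloud without a rewrite:
# only the environment config changes; the job itself does not.
# Endpoint URLs below are invented for illustration.

JOB = {"name": "customer-cleanse", "steps": ["extract", "validate", "load"]}

ENVIRONMENTS = {
    "on_prem": {"master": "yarn://cluster.internal", "storage": "hdfs:///lake"},
    "cloud":   {"master": "spark://emr.example",     "storage": "s3://lake"},
}

def plan(job, env_name):
    env = ENVIRONMENTS[env_name]
    # Same steps, different execution target -- the flexible architecture
    # that avoids maintaining two code bases for the same job.
    return [f"{step} -> {env['master']} ({env['storage']})"
            for step in job["steps"]]

print(plan(JOB, "on_prem")[0])
print(plan(JOB, "cloud")[0])
```

With this shape, moving a job between environments is a config change rather than the "two jobs to do the same thing" problem raised next.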
That flexibility of architecture, flexibility-- >> So having different code bases in the cloud versus on prem requires two jobs to do the same thing. >> Two jobs for maintaining, two jobs for standardizing, and two different skill sets of people, potentially. So security, governance, and being able to access data easily and have applications move between environments will be very critical. >> So seamless integration between clouds and on prem first, and then potentially multi-cloud. That's table stakes in your mind. >> They are absolutely table stakes. A lot of vendors are trying to focus on that; definitely the Hadoop vendors are also focusing on that. Also, when people talk about governance, the requirements are changing. We have been talking about single view and customer 360 for a while now, right? Do we have it right yet? Enrichment is becoming key. With Trillium we made the recent announcement around precise enrichment: it's not just the postal address that you want to validate and make sure is correct, it's also the email address and the phone number. Is it a mobile number, is it a landline? It's enriched data sets that we really have to be dealing with, and there's a lot of opportunity, and we are really excited because data quality, discovery, and integration are coming together and we have a good-- >> Well Tendü, thank you for joining us, and congratulations as Syncsort broadens its scope to being a modern data platform solution provider for companies. Congratulations. >> Thank you. >> Thanks for coming. >> Thank you for having me. >> This is The Cube here live in Silicon Valley, San Jose; I'm John Furrier with George Gilbert, and you're watching our coverage of Big Data Silicon Valley in conjunction with Strata Hadoop. This is SiliconANGLE's The Cube; we'll be right back with more live coverage. We've got two days of wall-to-wall coverage with experts and pros talking about big data, the transformations here inside The Cube. 
We'll be right back. (upbeat electronic music)