Vertica Big Data Conference Keynote

>> Joy: Welcome to the Virtual Big Data Conference. Vertica is so excited to host this event. I'm Joy King, and I'll be your host for today's Big Data Conference Keynote Session. It's my honor and my genuine pleasure to lead Vertica's product and go-to-market strategy. And I'm so lucky to have a passionate and committed team who turned our Vertica BDC event into a virtual event in a very short amount of time. I want to thank the thousands of people, and yes, that's our true number, who have registered to attend this virtual event. We were determined to balance your health, safety and your peace of mind with the excitement of the Vertica BDC. This is a very unique event, because as I hope you all know, we focus on engineering and architecture, best practice sharing and customer stories that will educate and inspire everyone. I also want to thank our top sponsors for the virtual BDC, Arrow, and Pure Storage. Our partnerships are so important to us and to everyone in the audience, because together, we get things done faster and better. Now for today's keynote, you'll hear from three very important and energizing speakers. First, Colin Mahony, our SVP and General Manager for Vertica, will talk about the market trends that Vertica is betting on to win for our customers. And he'll share the exciting news about our Vertica 10 announcement and how this will benefit our customers. Then you'll hear from Amy Fowler, VP of Strategy and Solutions for FlashBlade at Pure Storage. Our partnership with Pure Storage is truly unique in the industry, because together, modern infrastructure from Pure powers modern analytics from Vertica. And then you'll hear from John Yovanovich, Director of IT at AT&T, who will tell you about the Pure Vertica Symphony that plays live every day at AT&T. Here we go, Colin, over to you. >> Colin: Well, thanks a lot, Joy. And I want to echo Joy's thanks to our sponsors, and so many of you who have helped make this happen. This is not an easy time for anyone. We were certainly looking forward to getting together in person in Boston during the Vertica Big Data Conference and Winning with Data. But I think all of you and our team have done a great job scrambling and putting together a terrific virtual event. So really appreciate your time. I also want to remind people that we will make both the slides and the full recording available after this. So for any of those who weren't able to join live, that is still going to be available. Well, things have been pretty exciting here, and in the analytic space in general, certainly for Vertica, there's a lot happening. There are a lot of problems to solve, a lot of opportunities to make things better, and a lot of data that can really make every business stronger, more efficient, and frankly, more differentiated. For Vertica, though, we know that focusing on the challenges that we can directly address with our platform and our people, and where we can actually make the biggest difference, is where we ought to be putting our energy and our resources. I think one of the things that has made Vertica so strong over the years is our ability to focus on those areas where we can make a great difference. So for us, as we look at the market and we look at where we play, there are really three market trends, some recent and some not so recent but certainly picking up, that have become critical for every industry that wants to Win Big With Data. We've heard this loud and clear from our customers and from the analysts that cover the market.
If I were to summarize these three areas, this really is the core focus for us right now. We know that there's massive data growth. And if we can unify the data silos so that people can really take advantage of that data, we can make a huge difference. We know that public clouds offer tremendous advantages, but we also know that balance and flexibility are critical. And we all need the benefits that machine learning, in all its forms up through full data science, can bring to every single use case, but only if it can really be operationalized at scale, accurately and in real time. And the power of Vertica is, of course, how we're able to bring so many of these things together. Let me talk a little bit more about some of these trends. So one of the first industry trends that we've all been following, probably now for over the last decade, is Hadoop and specifically HDFS. So many companies have invested time, money, and more importantly, people in leveraging the opportunity that HDFS brought to the market. HDFS is really part of a much broader storage disruption that we'll talk a little bit more about, more broadly than HDFS. But HDFS itself was really designed for petabytes of data, leveraging low-cost commodity hardware and the ability to capture a wide variety of data formats from a wide variety of data sources and applications. And I think what people really wanted was to store that data before having to define exactly what structures it should go into. So over the last decade or so, the focus for most organizations has been figuring out how to capture, store and frankly manage that data. And as a platform to do that, I think Hadoop was pretty good. It certainly changed the way that a lot of enterprises think about their data and where it's locked up. In parallel with Hadoop, particularly over the last five years, cloud object storage has also given every organization another option for collecting, storing and managing even more data. That has led to a huge growth in data storage, obviously, up on public clouds like Amazon and their S3, Google Cloud Storage and Azure Blob Storage, just to name a few. And then when you consider regional and local object storage offered by cloud vendors all over the world, the explosion of data leveraging this type of object storage is very real. And I think, as I mentioned, it's just part of this broader storage disruption that's been going on. But with all this growth in the data, and all these new places to put this data, every organization we talk to is facing even more challenges now around the data silo. Sure, the data silos are certainly getting bigger, and hopefully they're getting cheaper per bit. But as I said, the focus has really been on collecting, storing and managing the data. But between the new data lakes and many different cloud object stores, combined with all sorts of data types and the complexity of managing all this, getting that business value has been very limited. This actually takes me to big bet number one for Team Vertica, which is to unify the data. Our goal, in some of the announcements we have made today plus roadmap announcements I'll share with you throughout this presentation, is to ensure that all the time, money and effort that has gone into storing that data turns into business value. So how are we going to do that?
With a unified analytics platform that analyzes the data wherever it is: HDFS, cloud object storage, external tables in any format, ORC, Parquet, JSON, and of course, our own native ROS Vertica format. Analyze the data in the right place, in the right format, using a single unified tool. This is something that Vertica has always been committed to, and you'll see in some of our announcements today, we're just doubling down on that commitment. Let's talk a little bit more about the public cloud. This is certainly the second trend. It's the second wave, maybe, of data disruption, with object storage. And there are a lot of advantages when it comes to public cloud. There's no question that the public clouds give rapid access to compute and storage, with the added benefit of eliminating the data center maintenance that so many companies want to get out of themselves. But maybe the biggest advantage that I see is the architectural innovation. The public clouds have introduced so many methodologies around how to provision quickly, separating compute and storage and really dialing in the exact needs on demand, as you change workloads. When public clouds began, it made a lot of sense for the cloud providers and their customers to charge and pay for compute and storage in the ratio that each use case demanded. And I think you're seeing that trend proliferate all over the place, not just up in public cloud. That architecture itself is really becoming the next generation architecture for on-premise data centers as well. But there are a lot of concerns. I think we're all aware of them. They're out there. Many times, for different workloads, there are higher costs, especially for some of the workloads that are being run through analytics, which tend to run all the time. Just like some of the silo challenges that companies are facing with HDFS, data lakes and cloud storage, the public clouds have similar types of silo challenges as well. Initially, there was a belief that they were cheaper than data centers, and when you added in all the costs, it looked that way. And again, for certain elastic workloads, that is the case. I don't think that's true across the board overall, even to the point where a lot of the cloud vendors aren't just charging lower costs anymore. We hear from a lot of customers that they don't really want to tether themselves to any one cloud because of some of those uncertainties. Of course, security and privacy are a concern. We hear a lot of concerns with regard to cloud and even some SaaS vendors around shared data catalogs across all the customers and not enough separation. But security concerns are out there, you can read about them. I'm not going to jump onto that bandwagon. But we hear about them. And then, of course, I think one of the things we hear the most from our customers is that each cloud stack is starting to feel even a lot more locked in than the traditional data warehouse appliance. And as everybody knows, the industry has been running away from appliances as fast as it can. And so they're not eager to get locked into another, quote, unquote, virtual appliance, if you will, up in the cloud. They really want to make sure they have flexibility in which clouds they're going to today, tomorrow and in the future. And frankly, we hear from a lot of our customers that they're very interested in eventually mixing and matching, compute from one cloud with, say, storage from another cloud, which I think is something that we'll hear a lot more about.
And so for us, that's why we've got our big bet number two. We love the cloud. We love the public cloud. We love the private clouds, on-premise and other hosting providers. But our passion and commitment is for Vertica to be able to run in any of the clouds that our customers choose, and to make it portable across those clouds. We have supported on-premises and all public clouds for years. And today, we have announced even more support for Vertica in Eon Mode, the deployment option that leverages the separation of compute from storage, with even more deployment choices, which I'm going to also touch more on as we go. So super excited about our big bet number two. And finally, as I mentioned, for all the hype that there is around machine learning, I actually think that most importantly, this third trend that Team Vertica is determined to address is the need to bring business-critical analytics, machine learning and data science projects into production. For so many years, there just wasn't enough data available to justify the investment in machine learning. Also, processing power was expensive, and storage was prohibitively expensive. So to train, score and evaluate all the different models to unlock the full power of predictive analytics was tough. Today you have those massive data volumes. You have the relatively cheap processing power and storage to make that dream a reality. And if you think about this, I mean, with all the data that's available to every company, the real need is to operationalize the speed and the scale of machine learning so that these organizations can actually take advantage of it where they need to. I mean, we've seen this for years with Vertica, going back to some of the most advanced gaming companies in the early days; they were incorporating this with live data directly into their gaming experiences. Well, every organization wants to do that now. And accuracy, repeatability and real-time action are all key to separating the leaders from the rest of the pack in every industry when it comes to machine learning. But if you look at a lot of these projects, the reality is that there's a ton of buzz, there's a ton of hype spanning every acronym that you can imagine. But most companies are struggling, due to separate teams, different tools, silos and the limitations that many platforms are facing: driving down-sampling to get a small subset of the data to try to create a model that then doesn't apply, or compromising accuracy and making it virtually impossible to replicate models and understand decisions. And if there's one thing that we've learned when it comes to data, it's the power of prescriptive data at the atomic level, being able to show an "N of 1," as we refer to it, meaning individually tailored data. No matter what it is, healthcare, entertainment experiences like gaming or others, being able to get at the granular data and make these decisions, make that scoring work, applies to machine learning just as much as it applies to giving somebody a next-best-offer. And the opportunity has never been greater. The need to integrate this end-to-end workflow and support the right tools without compromising on that accuracy, think about it as no down-sampling, using all the data, really is key to machine learning success. Which should be no surprise, then, why the third big bet from Vertica is one that we've actually been working on for years. And we're so proud to be where we are today, helping the data disruptors across the world operationalize machine learning.
This big bet has the potential to truly unlock, really, the potential of machine learning. And today, we're announcing some very important new capabilities specifically focused on unifying the work being done by the data science community, with their preferred tools and platforms, and the volume of data and performance at scale available in Vertica. Our strategy has been very consistent over the last several years. As I said in the beginning, we haven't deviated from our strategy. Of course, there are always things that we add. Most of the time, it's customer driven, it's based on what our customers are asking us to do. But I think we've also done a great job of not trying to be all things to all people. Especially as these hype cycles flare up around us, we absolutely love participating in these different areas without getting completely distracted. I mean, there's a variety of query tools and data warehouses and analytics platforms in the market. We all know that. There are tools and platforms that are offered by the public cloud vendors, and by other vendors that support one or two specific clouds. There are appliance vendors, who I was referring to earlier, who can deliver packaged data warehouse offerings for private data centers. And there's a ton of popular machine learning tools, languages and other kits. But Vertica is the only advanced analytics platform that can do all this, that can bring it together. We can analyze the data wherever it is, in HDFS, S3 object storage, or Vertica itself. Natively we support multiple clouds and on-premise deployments. And maybe most importantly, we offer that choice of deployment modes to allow our customers to choose the architecture that works for them right now. It still also gives them the option to change, move and evolve over time. And Vertica is the only analytics database with end-to-end machine learning that can truly operationalize ML at scale. And I know it's a mouthful. But it is not easy to do all these things. It is one of the things that highly differentiates Vertica from the rest of the pack. It is also why our customers, all of you, continue to bet on us and see the value that we are delivering and will continue to deliver. Here's a couple of examples of some of our customers who are powered by Vertica. It's the scale of data. It's the millisecond response times. Performance and scale have always been a huge part of what we have been about, though not the only thing. I think the functionality, all the capabilities that we add to the platform, the ease of use, the flexibility, obviously with the deployment. But look at some of the numbers under these customers on this slide. And I've shared a lot of different stories about these customers, which, by the way, still amaze me every time I talk to one and get the updates. You can see the power and the difference that Vertica is making. Equally important, if you look at a lot of these customers, they are the epitome of being able to deploy Vertica in a lot of different environments. Many of the customers on this slide are not using Vertica just on-premise or just in the cloud. They're using it in a hybrid way. They're using it in multiple different clouds. And again, we've been with them on that journey throughout, which is what has made this product and frankly, our roadmap and our vision exactly what they are. It's been quite a journey. And that journey continues now with the Vertica 10 release. The Vertica 10 release is obviously a massive release for us.
But if you look back, you can see that we're building on that native columnar architecture that started a long time ago, obviously, with the C-Store paper. We built it to leverage commodity hardware, because it was an architecture that was never tightly integrated with any specific underlying infrastructure. I still remember hearing the initial pitch from Mike Stonebraker about the vision of Vertica as a software-only solution and the importance of separating the company from hardware innovation. And at the time, Mike basically said to me, "There's so much R&D and innovation that's going to happen in hardware, we shouldn't bake hardware into our solution. We should do it in software, and we'll be able to take advantage of that hardware." And that is exactly what has happened. But one of the most recent innovations that we embraced with hardware is certainly that separation of compute and storage. As I said previously, the public cloud providers offered this next generation architecture really to ensure that they could provide the customers exactly what they needed, more compute or more storage, and charge for each, respectively. The separation of compute from storage is a major milestone in data center architectures. If you think about it, it's really not only a public cloud innovation, though. It fundamentally redefines the next generation data architecture for on-premise and for pretty much every way people are thinking about computing today. And that goes for software too. Object storage is an example of a cost-effective means for storing data. And even more importantly, separating compute from storage for analytic workloads has a lot of advantages, including the opportunity to manage much more dynamic, flexible workloads, and more importantly, truly isolate those workloads from others. And by the way, once you start having something that can truly isolate workloads, then you can have the conversations around autonomic computing, around setting up some nodes, some compute resources, on the data that won't affect any of the other data, to do some things on their own, maybe some self-analytics by the system, etc. A lot of things that many of you know we've already been exploring in terms of our own system data in the product. But it was May 2018, believe it or not, it seems like a long time ago, when we first announced Eon Mode. And I want to make something very clear, actually, about Eon Mode. It's a mode, it's a deployment option for Vertica customers. And I think this is another huge benefit that we don't talk about enough. Unlike a lot of vendors in the market who will ding you and charge you for every single add-on, you name it, you get this with the Vertica product. If you continue to pay support and maintenance, this comes with the upgrade. This comes as part of the new release. So any customer who owns or buys Vertica has the ability to set up either Enterprise Mode or Eon Mode, which is a question I know comes up sometimes. Our first announcement of Eon was obviously for AWS customers, including The Trade Desk and AT&T, most of whom will be speaking here later at the Virtual Big Data Conference. They saw a huge opportunity. Eon Mode not only allowed Vertica to scale elastically with that specific compute and storage that was needed, but it really dramatically simplified database operations, including things like workload balancing, node recovery, compute provisioning, etc.
So one of the most popular functions is that ability to isolate the workloads and really allocate those resources without negatively affecting others. And even though traditional data warehouses, including Vertica Enterprise Mode, have been able to do lots of different workload isolation, it's never been as strong as in Eon Mode. Well, it certainly didn't take long for our customers to see that value across the board with Eon Mode, and not just up in the cloud. In partnership with one of our most valued partners, and a platinum sponsor here, whom Joy mentioned at the beginning, we announced Vertica Eon Mode for Pure Storage FlashBlade in September 2019. And again, just to be clear, this is not a new product, it's one Vertica with yet more deployment options. With Pure Storage, Vertica in Eon Mode is not limited in any way by variable cloud network latency. The performance is actually amazing when you take the benefits of separating compute from storage and you run it with a Pure environment on-premise. Vertica in Eon Mode has a super smart cache layer that we call the depot. It's a big part of our secret sauce around Eon Mode. And combined with the power and performance of Pure's FlashBlade, Vertica became the industry's first advanced analytics platform that actually separates compute and storage for on-premises data centers. Something that a lot of our customers are already benefiting from, and we're super excited about it. But as I said, this is a journey. We don't stop, we're not going to stop. Our customers need the flexibility of multiple public clouds. So today with Vertica 10, we're super proud and excited to announce support for Vertica in Eon Mode on Google Cloud. This gives our customers the ability to use their Vertica licenses on Amazon AWS, on-premise with Pure Storage, and on Google Cloud. Now, we were talking about HDFS, and a lot of our customers who have invested quite a bit in HDFS, especially as a place to store data, have been pushing us to support Eon Mode with HDFS. So as part of Vertica 10, we are also announcing support for Vertica in Eon Mode using HDFS as the communal storage. Vertica's own ROS-format data can be stored in HDFS, and actually the full functionality of Vertica, its complete analytics, geospatial, pattern matching, time series, machine learning, everything that we have in there, can be applied to this data. And on the same HDFS nodes, Vertica can also analyze data in ORC or Parquet format, using external tables. We can also execute joins between the ROS data and the data the external tables hold, which powers a much more comprehensive view. So again, it's that flexibility to be able to support our customers wherever they need us to support them, on whatever platform they have. Vertica 10 gives us a lot more ways that we can deploy Eon Mode in various environments for our customers. It allows them to take advantage of Vertica in Eon Mode and the power that it brings with that separation, with that workload isolation, on whichever platform they are most comfortable with. Now, there's a lot that has come in Vertica 10. I'm definitely not going to be able to cover everything. But we also introduced complex types, as an example. And complex data types fit very well into Eon as well, in this separation. They significantly reduce the data pipeline and the cost of moving data, and bring much better support for unstructured data, which a lot of our customers have mixed with structured data, of course, and they leverage a lot of the columnar execution that Vertica provides.
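To make that concrete, here is a minimal sketch, not taken from the keynote itself, of what querying data in place looks like from a client application. It assumes a hypothetical host, credentials, HDFS path, and table names, and uses the open source vertica_python client; the external-table-over-Parquet pattern is the capability Colin describes, but treat the exact SQL as illustrative and check it against your Vertica version's documentation.

```python
import vertica_python

# Hypothetical connection details -- replace with your own environment.
conn_info = {
    "host": "vertica-node-01.example.com",
    "port": 5433,
    "user": "dbadmin",
    "password": "secret",
    "database": "analytics",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()

    # Define an external table over Parquet files already sitting in HDFS.
    # The data stays where it is; Vertica reads it at query time.
    cur.execute("""
        CREATE EXTERNAL TABLE IF NOT EXISTS web_clicks_ext (
            user_id    INT,
            clicked_at TIMESTAMP,
            url        VARCHAR(2048)
        ) AS COPY FROM 'hdfs:///data/clicks/*.parquet' PARQUET
    """)

    # Join the external Parquet data with a native (ROS-format) table,
    # the "much more comprehensive view" described above.
    cur.execute("""
        SELECT c.user_id, a.plan, COUNT(*) AS clicks
        FROM web_clicks_ext c
        JOIN accounts a ON a.user_id = c.user_id
        GROUP BY c.user_id, a.plan
        ORDER BY clicks DESC
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)
```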
So you get complex data types in Vertica now, a lot more data, stronger performance. It goes great with the announcements that we made around the broader Eon Mode. Let's talk a little bit more about machine learning. We've actually been doing work in and around machine learning, with various regressions and a whole bunch of other algorithms, for several years. We saw the huge advantage that MPP offered, not just as a SQL engine, as a database, but for ML as well. It didn't take us long to realize that there's a lot more to operationalizing machine learning than just those algorithms. It's data preparation, it's model training. It's the scoring, the shaping, the evaluation. That is so much of what machine learning and, frankly, data science is about. You know, everybody always wants to jump to the sexy algorithm, but we handle those tasks very, very well. It makes Vertica a terrific platform to do that. A lot of work in data science and machine learning is done in other tools. I had mentioned that there are just so many tools out there. We want people to be able to take advantage of all that. We never believed we were going to be the best algorithm company or come up with the best models for people to use. So with Vertica 10, we support PMML. We can now import and export PMML models. It's a huge step for us around operationalizing machine learning projects for our customers, allowing the models to be built outside of Vertica, yet be imported in and then applied to that full scale of data with all the performance that you would expect from Vertica. We are also more tightly integrating with Python. As many of you know, we've been doing a lot of open source projects with the community, driven by many of our customers, like Uber. And now, alongside Python, we've integrated with TensorFlow, allowing data scientists to build models in their preferred language, to take advantage of TensorFlow, but again, to store and deploy those models at scale with Vertica. I think both these announcements are proof of our big bet number three, and really our commitment to supporting innovation throughout the community by operationalizing ML with that accuracy, performance and scale of Vertica for our customers. Again, there are a lot of steps when it comes to the workflow of machine learning. These are some of them that you can see on the slide, and it's definitely not linear, either. We see this as a circle. And companies that do it well just continue to learn, they continue to re-score, they continue to redeploy, and they want to operationalize all that within a single platform that can take advantage of all those capabilities. And that is the platform, with a very robust ecosystem, that Vertica has always been committed to as an organization and will continue to be. This graphic, many of you have seen it evolve over the years. Frankly, if we put everything and everyone on here, it wouldn't fit on a slide. But it will absolutely continue to evolve and grow as we support our customers where they need the support most. So, again, being able to deploy everywhere, being able to take advantage of Vertica, not just as a business analyst or a business user, but as a data scientist or as an operational or BI person. We want Vertica to be leveraged and used by the broader organization. So I think it's fair to say, and I encourage everybody to learn more about Vertica 10, because I'm just highlighting some of the bigger aspects of it. But we talked about those three market trends.
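As a rough sketch of the PMML workflow just described (train a model elsewhere, import it, then score in-database at full scale), the snippet below uses the vertica_python client. IMPORT_MODELS and PREDICT_PMML are the function names from this Vertica 10 time frame; the file path, model name, and columns are assumptions for illustration, so verify the details against the documentation for your release.

```python
import vertica_python

with vertica_python.connect(host="vertica-node-01.example.com", port=5433,
                            user="dbadmin", password="secret",
                            database="analytics") as conn:
    cur = conn.cursor()

    # Bring in a model trained outside Vertica (for example, a
    # scikit-learn pipeline exported to PMML).
    cur.execute("""
        SELECT IMPORT_MODELS('/home/dbadmin/churn_model.pmml'
                             USING PARAMETERS category='PMML')
    """)

    # Score the full table in-database: no down-sampling, no data movement.
    cur.execute("""
        SELECT customer_id,
               PREDICT_PMML(tenure, monthly_spend, support_calls
                            USING PARAMETERS model_name='churn_model')
                   AS churn_risk
        FROM customers
    """)
    print(cur.fetchmany(5))
```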
The need to unify the silos, the need for hybrid, multiple-cloud deployment options, the need to operationalize business-critical machine learning projects: Vertica 10 has absolutely delivered on those. But again, we are not going to stop. It is our job not to, and this is how Team Vertica thrives. I always joke that the next release is the best release. And, of course, even after Vertica 10, that is also true, although Vertica 10 is pretty awesome. But, you know, from the first line of code, we've always been focused on performance and scale, right? And like any really strong data platform, the optimizer and the execution engine are the two core pieces of that. Beyond Vertica 10, one of the big things that we're already working on is the next generation execution engine. We're already actually seeing incredible early performance from this. And this is just one example of how important it is for an organization like Vertica to constantly go back and re-innovate. Every single release, we do the sit-ups and crunches on our performance and scale. How do we improve? And there are so many parts of the core server, so many parts of our broader ecosystem. We are constantly looking at how we can go back to all the code lines that we have and make them better in the current environment. And it's not an easy thing to do when you're doing that and you're also expanding into new environments to take advantage of the different deployments, which is a great segue to this slide. Because if you think about today, we're obviously already available with Eon Mode on Amazon AWS and Pure, and actually MinIO as well. As I talked about, in Vertica 10 we're adding Google and HDFS. And coming next, obviously, Microsoft Azure and Alibaba Cloud. So being able to expand into more of these environments is really important for the Vertica team and how we go forward. And it's not just running in these clouds; for us, we want it to be a SaaS-like experience in all these clouds. We want you to be able to deploy Vertica in 15 minutes or less on these clouds. You can also consume Vertica in a lot of different ways on these clouds, as an example, in Amazon, Vertica by the Hour. So for us, it's not just about running, it's about taking advantage of the ecosystems that all these cloud providers offer, and really optimizing the Vertica experience as part of them. Optimization around automation, around self-service capabilities, extending our management console. We now have products like the Vertica Advisor Tool that our Customer Success Team has created to actually use our own smarts in Vertica, to take data from customers who give it to us and help them tune their environment automatically. You can imagine that we're taking that to the next level, in a lot of different endeavors that we're doing around how Vertica as a product can actually be smarter, because we all know that simplicity is key. There just aren't enough people in the world who are good at managing data and taking it to the next level. And of course, other things that we all hear about, whether it's Kubernetes and containerization. You can imagine that that probably works very well with Eon Mode and separating compute and storage. But innovation happens everywhere. We innovate around our community documentation. Many of you have taken advantage of the Vertica Academy. The numbers there are through the roof in terms of the number of people coming in and certifying on it.
So there are a lot of things within the core products, and a lot of activity and action beyond the core products, that we're taking advantage of. And let's not forget why we're here, right? It's easy to talk about a platform, a data platform; it's easy to jump into all the functionality, the analytics, the flexibility, how we can offer it. But at the end of the day, somebody, a person, she's got to take advantage of this data; she's got to be able to take this data and use this information to make a critical business decision. And that doesn't happen unless we explore lots of different and, frankly, new ways to get that predictive analytics UI and interface, beyond just the standard BI tools, in front of her at the right time. And so there's a lot of activity, I'll tease you with that, going on in this organization right now about how we can do that and deliver that for our customers. We're in a great position to be able to see exactly how this data is consumed and used, and to start with this core platform that we have to go out. Look, I know the plan wasn't to do this as a virtual BDC. But I really appreciate you tuning in. Really appreciate your support. I think if there's any silver lining to us maybe not being able to do this in person, it's the fact that the reach has actually gone significantly higher than what we would have been able to do in person in Boston. We're certainly looking forward to doing a Big Data Conference in the future. But if I could leave you with anything, know this: since that first release of Vertica, and our very first customers, we have been very consistent. We respect all the innovation around us, whether it's open source or not. We understand the market trends. We embrace those new ideas and technologies, and for us, true north, and the most important thing, is: what does our customer need to do? What problem are they trying to solve? And how do we use the advantages that we have without disrupting our customers, knowing that you depend on us to deliver that unified analytics strategy? We will deliver that performance and scale, not only today, but tomorrow and for years to come. We've added a lot of great features to Vertica. I think we've said no to a lot of things, frankly, that we just knew we wouldn't be the best company to deliver. When we say we're going to do things, we do them. Vertica 10 is a perfect example of so many of those things that we have heard loud and clear from you, our customers, and we have delivered. I am incredibly proud of this team across the board. I think the culture of Vertica, a customer-first culture, jumping in to help our customers win no matter what, is also something that sets us massively apart. I hear horror stories about support experiences with other organizations. And people always seem to be amazed at Team Vertica's willingness to jump in, their aptitude for certain technical capabilities, and their understanding of the business. And I think sometimes we take that for granted. But that is the team that we have as Team Vertica. We are incredibly excited about Vertica 10. I think you're going to love the Virtual Big Data Conference this year. I encourage you to tune in. Maybe one other benefit is, I know some people were worried about not being able to see different sessions because they were going to overlap with each other; well, now, even if you can't do it live, you'll be able to watch those sessions on demand. Please enjoy the Vertica Big Data Conference here in 2020.
Please, you and your families and your co-workers, be safe during these times. I know we will get through it. And analytics is probably going to help with a lot of that, and we already know it is helping in many different ways. So believe in the data, believe in data's ability to change the world for the better. And thank you for your time. And with that, I am delighted to now introduce Micro Focus CEO Stephen Murdoch to the Vertica Big Data Virtual Conference. Thank you, Stephen. >> Stephen: Hi, everyone, my name is Stephen Murdoch. I have the pleasure and privilege of being the Chief Executive Officer here at Micro Focus. Please let me add my welcome to the Big Data Conference, and also my thanks for your support as we've had to pivot to this being a virtual rather than a physical conference. It's amazing how quickly we all reset to a new normal. I certainly didn't expect to be addressing you from my study. Vertica is an incredibly important part of the Micro Focus family. It's key to our goal of trying to enable and help customers become much more data-driven across all of their IT operations. Vertica 10 is a huge step forward, we believe. It allows for multi-cloud innovation and genuinely hybrid deployments, lets enterprises begin to leverage machine learning properly, and also allows the opportunity to unify currently siloed lakes of information. We operate in a very noisy, very competitive market, and there are people in that market who can do some of those things. The reason we are so excited about Vertica is we genuinely believe that we are the best at doing all of those things. And that's why we've announced publicly, and are executing internally, incremental investment into Vertica. That investment is targeted at accelerating the roadmaps that already exist, and getting that innovation into your hands faster. The idea is that speed is key. It's not a question of if companies have to become data-driven organizations, it's a question of when. So that speed now is really important. And that's why we believe that the Big Data Conference gives a great opportunity for you to accelerate your own plans. You will have the opportunity to talk to some of our best architects, some of the best development brains that we have. But more importantly, you'll also get to hear from some of our phenomenal Vertica customers. You'll hear from Uber, from The Trade Desk, from Philips, and from AT&T, as well as many, many others. And just hearing how those customers are using the power of Vertica to accelerate their own plans, I think, is the highlight. And I encourage you to use this opportunity to its full. Let me close by again saying thank you. We genuinely hope that you get as much from this virtual conference as you could have from a physical conference. And we look forward to your engagement, and we look forward to hearing your feedback. With that, thank you very much. >> Joy: Thank you so much, Stephen, for joining us for the Vertica Big Data Conference. Your support and enthusiasm for Vertica is so clear, and it makes a big difference. Now, I'm delighted to introduce Amy Fowler, the VP of Strategy and Solutions for FlashBlade at Pure Storage, which is one of our BDC Platinum Sponsors and one of our most valued partners. It was a proud moment for me when we announced Vertica in Eon Mode for Pure Storage FlashBlade, and we became the first analytics data warehouse that separates compute from storage for on-premise data centers. Thank you so much, Amy, for joining us. Let's get started.
>> Amy: Well, thank you, Joy, so much for having us. And thank you all for joining us today, virtually, as we all may be. So, as we just heard from Colin Mahony, there are some really interesting trends happening right now in the big data analytics market: the end of the Hadoop hype cycle, the new cloud reality, and even the opportunity to help the many data science and machine learning projects move from labs to production. So let's talk about these trends in the context of infrastructure, and in particular, look at why a modern storage platform is relevant as organizations take on the challenges and opportunities associated with these trends. The answer is that the Hadoop hype cycle left a lot of data in HDFS data lakes, or reservoirs, or swamps, depending upon the level of the data hygiene, but without the ability to get the value that was promised from Hadoop as a platform rather than a distributed file store. And when we combine that data with the massive volume of data in cloud object storage, we find ourselves with a lot of data and a lot of silos, but without a way to unify that data and find value in it. Now, when you look at the infrastructure data lakes are traditionally built on, it is often direct-attached storage, or DAS. The approach that Hadoop took when it entered the market was primarily bound by the limits of networking and storage technologies: one-gig Ethernet and slower spinning disk. But today, those barriers do not exist. All-flash storage has fundamentally transformed how data is accessed, managed and leveraged. The need for local data storage for significant volumes of data has been largely mitigated by the performance increases afforded by all-flash. At the same time, organizations can achieve superior economies of scale with the segregation of compute and storage. Compute and storage don't always scale in lockstep. Would you want to add an engine to the train every time you add another boxcar? Probably not. And from a Pure Storage perspective, FlashBlade is uniquely architected to allow customers to achieve better resource utilization for compute and storage, while at the same time reducing the complexity that has arisen from the siloed nature of the original big data solutions. The second and equally important recent trend we see is something I'll call cloud reality. The public clouds made a lot of promises, and some of those promises were delivered. But cloud economics, especially usage-based and elastic scaling without the control that many companies need to manage the financial impact, is causing a lot of issues. In addition, the risk of vendor lock-in, from data egress charges to integrated software stacks that can't be moved or deployed on-premise, is causing a lot of organizations to back off of an all-in cloud strategy and move toward hybrid deployments. Which is kind of funny in a way, because it wasn't that long ago that there was a lot of talk about no more data centers. For example, one large retailer, I won't name them, but I'll admit they are my favorite, several years ago told us they were completely done with on-prem storage infrastructure, because they were going 100% to the cloud. But they just deployed FlashBlade for their data pipelines, because they need predictable performance at scale, and the all-cloud TCO just didn't add up. Now, that being said, while there are certainly challenges with the public cloud, it has also brought some things to the table that we see most organizations wanting.
First of all, in a lot of cases, applications have been built to leverage object storage platforms like S3. So they need that object protocol, but they may also need it to be fast. And "fast object" may have been an oxymoron only a few years ago, but this is an area of the market where Pure and FlashBlade have really taken a leadership position. Second, regardless of where the data is physically stored, organizations want the best elements of a cloud experience. And for us, that means two main things. Number one is simplicity and ease of use. If you need a bunch of storage experts to run the system, that should be considered a bug. The other big one is the consumption model: the ability to pay for what you need when you need it, and seamlessly grow your environment over time, totally nondisruptively. This is actually pretty huge, and something that a lot of vendors try to solve for with finance programs. But no finance program can address the pain of a forklift upgrade when you need to move to next-gen hardware. To scale nondisruptively over long periods of time, five to ten years plus, crucial architectural decisions need to be made at the outset. Plus, you need the ability to pay as you use it. And we offer something for FlashBlade called Pure as a Service, which delivers exactly that. The third cloud characteristic that many organizations want is the option for hybrid, even if that is just a DR site in the cloud. In our case, that means supporting replication to S3 at AWS. And the final trend, which to me represents the biggest opportunity for all of us, is the need to help the many data science and machine learning projects move from labs to production. This means bringing all the machine learning functions and model training to the data, rather than moving samples or segments of data to separate platforms. As we all know, machine learning needs a ton of data for accuracy, and there is just too much data to retrieve from the cloud for every training job. At the same time, predictive analytics without accuracy is not going to deliver the business advantage that everyone is seeking. You can kind of visualize data analytics as it is traditionally deployed as being on a continuum, with the thing we've been doing the longest, data warehousing, on one end, and AI on the other end. But the way this manifests in most environments is a series of silos that get built up. So data is duplicated across all kinds of bespoke analytics and AI environments and infrastructure. This creates an expensive and complex environment. Historically, there was no other way to do it, because some level of performance is always table stakes, and each of these parts of the data pipeline has a different workload profile. A single platform to deliver on the multi-dimensional performance that this diverse set of applications requires didn't exist three years ago. And that's why the application vendors pointed you towards bespoke things like the DAS environments that we talked about earlier. And the fact that better options exist today is why we're seeing them move towards supporting this disaggregation of compute and storage. And when it comes to a platform that is a better option, one with a modern architecture that can address the diverse performance requirements of this continuum and allow organizations to bring a model to the data instead of creating separate silos, that's exactly what FlashBlade is built for: small files, large files, high throughput, low latency, and scale to petabytes in a single namespace.
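As a small illustration of the "fast object" point, an application written for the S3 protocol can be pointed at an on-prem object store such as FlashBlade simply by overriding the endpoint. This sketch uses boto3; the endpoint URL, credentials, and bucket are assumptions, and the point is that the application code is otherwise unchanged from what it would be against AWS S3.

```python
import boto3

# Hypothetical on-prem S3-compatible endpoint and credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://flashblade.example.com",
    aws_access_key_id="PURE_ACCESS_KEY",
    aws_secret_access_key="PURE_SECRET_KEY",
)

# Same S3 calls an app would make against the public cloud.
s3.put_object(Bucket="analytics", Key="staging/events.json", Body=b"{}")
resp = s3.list_objects_v2(Bucket="analytics", Prefix="staging/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```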
And this, importantly, in a single namespace, is what we're focused on delivering for our customers. At Pure, we talk about it in the context of the modern data experience, because at the end of the day, that's what it's really all about: the experience for your teams, in your organization. And together, Pure Storage and Vertica have delivered that experience to a wide range of customers: from a SaaS analytics company, which uses Vertica on FlashBlade to authenticate the quality of digital media in real time, to a multinational car company, which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars, to a healthcare organization, which uses Vertica on FlashBlade to enable healthcare providers to make real-time decisions that impact lives. And I'm sure you're all looking forward to hearing from John Yovanovich from AT&T, to hear how he's been doing this with Vertica and FlashBlade as well. He's coming up soon. We have been really excited to build this partnership with Vertica. And we're proud to provide the only on-premise storage platform validated with Vertica Eon Mode, and to deliver this modern data experience to our customers together. Thank you all so much for joining us today. >> Joy: Amy, thank you so much for your time and your insights. Modern infrastructure is key to modern analytics, especially as organizations leverage next generation data center architectures and object storage for their on-premise data centers. Now, I'm delighted to introduce our last speaker in our Vertica Big Data Conference Keynote, John Yovanovich, Director of IT for AT&T. Vertica is so proud to serve AT&T, and especially proud of the harmonious impact we are having in partnership with Pure Storage. John, welcome to the Virtual Vertica BDC. >> John: Thank you, Joy. It's a pleasure to be here, and I'm excited to go through this presentation today, and in a unique fashion, 'cause as I was thinking through how I wanted to present the partnership that we have formed together between Pure Storage, Vertica and AT&T, I wanted to emphasize how well we all work together and how these three components have really driven home my desire for a harmonious, to use your word, relationship. So, I'm going to move forward here. The theme of today's presentation is the Pure Vertica Symphony, live at AT&T. And if anybody is a Westworld fan, you can appreciate the sheet music on the right-hand side. What I'm going to highlight here, in a musical fashion, is how we at AT&T leverage these technologies to save money, to deliver a more efficient platform, and actually just to make our customers happier overall. So as we look back, as early as just maybe a few years ago here at AT&T, I realized that we had many musicians to help the company. Or maybe you might want to call them data scientists or data analysts. For the theme, we'll stay with musicians. None of them were singing or playing from the same hymn book or sheet music. And so what we had was many organizations chasing a similar dream, but not exactly the same dream. And the best way to describe that, and I think with a lot of people this might resonate in your organizations: how many organizations are chasing a customer 360 view in your company? Well, I can tell you that I have at least four in my company. And I'm sure there are many that I don't know of. That is our problem, because what we see is a repetitive sourcing of data. We see a repetitive copying of data.
And there's just so much money being spent. This is where I asked Pure Storage and Vertica to help me solve that problem with their technologies. What I also noticed was that there was no coordination between these departments. In fact, if you look here, nobody really wants to play with finance. Sales, marketing and care, sure, they all copied each other's data. But they didn't actually communicate with each other as they were copying the data. So the data became replicated and out of sync. This is a challenge throughout, not just my company, but all companies across the world. And that is, the more we replicate the data, the more problems we have at chasing or conquering the goal of a single version of truth. In fact, I kid that at AT&T we have actually adopted the multiple-versions-of-truth techno theory, which is not where we want to be, but this is where we are. But we are conquering that with the synergies between Pure Storage and Vertica. This is what it leaves us with, and this is where we are challenged: each one of our siloed business units had their own dedicated storage, and some of them had more money than others, so they bought more storage. Some of them anticipated storing more data than they really did. Others are running out of space but can't add any more, because their budgets haven't been replenished. So if you look at it from this side view here, we have a limited amount of compute, or fixed compute, dedicated to each one of these silos. And that's because of the wanting to own your own. And the other part is that you are limited or wasting space, depending on where you are in the organization. So the synergies aren't just about the data, but actually the compute and the storage. And I wanted to tackle that challenge as well. So I was tackling the data, I was tackling the storage, and I was tackling the compute, all at the same time. So my ask across the company was: can we just please play together, okay? And to do that, I knew that I wasn't going to tackle this by getting everybody in the same room and getting them to agree that we needed one account table, because they would argue about whose account table is the best account table. But I knew that if I brought the account tables together, they would soon see that they had so much redundancy that I could then start retiring data sources. I also knew that if I brought all the compute together, they would all be happy. But I didn't want them to trample each other. And in fact, that was one of the things that all the business units really enjoy: they enjoy the silo of having their own compute, and more or less being able to control their own destiny. Well, Vertica's subclustering allows just that. And this is exactly what I was hoping for, and I'm glad they've brought it through. And finally, how did I solve the problem of the single account table? Well, you can when you don't have dedicated storage, and you can separate compute and storage as Vertica in Eon Mode does, and we store the data on FlashBlades, which you see on the left and right-hand sides of our container, which I can describe in a moment. Okay, so what we have here is a container full of compute, with all the Vertica nodes sitting in the middle, and two loader, we'll call them loader subclusters, sitting on the sides, which are dedicated to just putting data onto the FlashBlades, which are sitting on both ends of the container.
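Below is a hedged sketch of the workload isolation John describes: loader subclusters handle ingest to the communal storage on FlashBlade, while each business unit's subcluster queries the same data without contending for compute. The hostnames and the helper function are hypothetical, and the admintools invocation in the comment is paraphrased rather than quoted from AT&T's setup, so treat it as an assumption to verify against your version's documentation.

```python
# Adding a subcluster is an admin operation, roughly of this shape
# (flags paraphrased -- check your admintools reference):
#   admintools -t db_add_subcluster -d analytics \
#              -s host07,host08,host09 -c finance_subcluster
import vertica_python

LOADER_HOST = "loader-sc-01.example.com"    # loader subcluster: ingest only
FINANCE_HOST = "finance-sc-01.example.com"  # finance's query subcluster

def run(host, sql):
    """Run one statement against a specific subcluster's node (hypothetical helper)."""
    with vertica_python.connect(host=host, port=5433, user="dbadmin",
                                password="secret", database="analytics") as conn:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall() if cur.description else None

# Heavy ingest lands on the loaders; finance queries the same communal
# data from its own compute and never feels the load.
run(LOADER_HOST,
    "COPY accounts FROM 'hdfs:///staging/accounts/*.csv' DELIMITER ','")
print(run(FINANCE_HOST, "SELECT COUNT(*) FROM accounts"))
```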
Now today, I have two dedicated storage racks, "dedicated" might not be the right word, but two storage racks, one on the left, one on the right. And I treat them as separate storage racks. They could be one, but I created them separately for disaster recovery purposes, in case one rack were to go down. But that being said, I'm probably going to add a couple more here in the future, so I can have, say, a five-to-10-petabyte storage setup, and I'll have my DR in another, 'cause the DR shouldn't be in the same container. Okay, I'll DR outside of this container. So I got them all together, I leveraged subclustering, I leveraged separating compute and storage. I was able to convince many of my clients that they didn't need their own account table, that they were better off having one. I reduced latency, and I reduced our data quality issues, a.k.a. ticketing, okay. I was able to expand. What is this? Elasticity at work. I was able to leverage elasticity within this cluster. As you can see, there are racks and racks of compute. We set up what we'll call the fixed capacity that each of the business units needed, and then I'm able to ramp up and release the compute that's necessary for each one of my clients based on their workloads throughout the day. And while some of the compute, to the right there, has more or less dedicated itself to particular workloads, all the rest is free for anybody to use. So in essence, what I have is a concert hall with a lot of seats available. So if I want to run a 10-chair symphony or an 80-chair symphony, I'm able to do that. And all the while, I can also do the same with my loader nodes. I can expand my loader nodes to actually have their own symphony, all to themselves, and not compete with any other workloads of the other clusters. What does that change for our organization? Well, it really changes the way our database administrators actually do their jobs. This has been a big transformation for them. They have actually become data conductors. Maybe you might even call them composers, which is interesting, because what I've asked them to do is morph into less technology and more workload analysis. And in doing so, we're able to write auto-detect scripts that watch the queues, watch the workloads, so that we can help ramp up and trim down the cluster and subclusters as necessary. This has been an exciting transformation for our DBAs, who I need to now classify as something maybe like DCAs. I don't know, I'll have to work with HR on that. But I think it's an exciting future for their careers. And if we bring it all together, then our clusters start looking like this, where everything is moving in harmony, we have lots of seats open for extra musicians, and we are able to emulate a cloud experience on-prem. And so, I want you to sit back and enjoy the Pure Vertica Symphony, live at AT&T. (soft music) >> Joy: Thank you so much, John, for an informative and very creative look at the benefits that AT&T is getting from its Pure Vertica Symphony. I do really like the idea of engaging HR to change the title to Data Conductor. That's fantastic. I've always believed that music brings people together. And now it's clear that analytics at AT&T is part of that musical advantage. So, now it's time for a short break. And we'll be back for our breakout sessions, beginning at 12 pm Eastern Daylight Time.
We have some really exciting sessions planned later today, and then again, as you can see, on Wednesday. Now, because all of you are already logged in and listening to this keynote, you already know the steps to continue to participate in the sessions that are listed here and on the previous slide. In addition, everyone received an email yesterday and today, and you'll get another one tomorrow, outlining the simple steps to register, log in and choose your session. If you have any questions, check out the emails or go to www.vertica.com/bdc2020 for the logistics information. There are a lot of choices, and that's always a good thing. Don't worry if you want to attend one or more, or can't listen to the live sessions due to your timezone. All the sessions, including the Q&A sections, will be available on demand, and everyone will have access to the recordings, as well as even more pre-recorded sessions that we'll post to the BDC website. Now, I do want to leave you with two other important sites. First, our Vertica Academy. Vertica Academy is available to everyone, and there's a variety of very technical, self-paced, on-demand training, virtual instructor-led workshops, and Vertica Essentials Certification. And it's all free, because we believe that Vertica expertise helps everyone accelerate their Vertica projects and the advantage that those projects deliver. Now, if you have questions or want to engage with our Vertica engineering team, we're waiting for you on the Vertica forum. We'll answer any questions or discuss any ideas that you might have. Thank you again for joining the Vertica Big Data Conference Keynote Session. Enjoy the rest of the BDC, because there's a lot more to come.

Published Date : Mar 30 2020


Infrastructure For Big Data Workloads


 

>> From the SiliconANGLE media office in Boston, Massachusetts, it's theCUBE! Now, here's your host, Dave Vellante. >> Hi, everybody, welcome to this special CUBE Conversation. You know, big data workloads have evolved, and the infrastructure that runs big data workloads is also evolving. Big data, AI, and other emerging workloads need infrastructure that can keep up. Welcome to this special CUBE Conversation with Patrick Osborne, who's the vice president and GM of big data and secondary storage at Hewlett Packard Enterprise, @patrick_osborne. Great to see you again, thanks for coming on. >> Great, love to be back here. >> As I said up front, big data's changing. It's evolving, and the infrastructure has to also evolve. What are you seeing, Patrick, and what's HPE seeing in terms of the market forces right now driving big data and analytics? >> Well, some of the things that we see in the data center: there is a continuous move from bare metal to virtualized. Everyone's on that train. Then to containerization of existing apps, your apps of record, your business, mission-critical apps. But really, what a lot of folks are doing right now is adding additional services to those applications and those data sets: new ways to interact, new apps. A lot of those are being developed with techniques that revolve around big data and analytics. We're definitely seeing the pressure to modernize what you have on-prem today, but you know, you can't sit there and be static. You gotta provide new services around what you're doing for your customers. A lot of those are coming in the form of this Mode 2 type of application development. >> One of the things that we're seeing: everybody talks about digital transformation. It's the hot buzzword of the day. To us, digital means data first. Presumably, you're seeing that. Are organizations organizing around their data, and what does that mean for infrastructure? >> Yeah, absolutely. We see a lot of folks employing not only technology to do that; they're using organizational techniques too, like peak teams, bringing together a lot of different functions. Also, organizing around the data has become very different right now, in that you've got data out on the edge, right? It's coming into the core. A lot of folks are moving some of their edge to the cloud, or even their core to the cloud. You gotta make a lot of decisions and be able to organize around a pretty complex set of places, physical and virtual, where your data's gonna lie. >> There's a lot of talk, too, about the data pipeline. The data pipeline used to be: you had an enterprise data warehouse, and the pipeline was, you'd go through a few people that would build some cubes and then they'd hand off a bunch of reports. The data pipeline is getting much more complex. You've got the edge coming in, you've got the core, you've got the cloud, which can be on-prem or public cloud. Talk about the evolution of the data pipeline and what that means for infrastructure and big data workloads. >> For a lot of our customers, we've got a pretty interesting business here at HPE. We do a lot with the Intelligent Edge, so, our Edgeline servers and Aruba, where a lot of the data is sitting outside of the traditional data center.
Then we have what's going on in the core, where a lot of customers are moving from either a traditional EDW, or even Hadoop 1.0 if they started that transformation five to seven years ago, to a world where a lot of things are happening in real time, or a combination thereof. The data types are pretty dynamic. Some of that is always getting processed out on the edge, and the results are getting sent back to the core. We're also seeing a lot of folks move to real-time data analytics, or what some people call fast data, that sits in your core data center, utilizing things like Kafka and Spark. A lot of the techniques for persistent storage are brand new. What it boils down to is, it's an opportunity, but it's also very complex for our customers. >> What about some of the technical trends behind what's going on with big data? I mean, you've got sprawl, both data sprawl and workload sprawl. You've got developers that are dealing with a lot of complex tooling. What are you guys seeing there, in terms of the big mega-trends? >> As you know, HPE has quite a few customers in the mid-range and enterprise segments, and some of those customers are very tech-forward. A lot of them are moving from this Hadoop 1.0, Hadoop 2.0 system to a set of essentially mixed workloads that are very multi-tenant. We see customers that have, essentially, a mix of batch-oriented workloads, who are now introducing these streaming types of workloads, and folks who are bringing in things like TensorFlow and GPGPUs and trying to apply some of the techniques of AI and ML in those clusters. What we're seeing right now is that this is causing a lot of complexity, not only in the way you do your apps, but in the number of applications and the number of tenants who use that data. It's getting used all day long for various different purposes, so now what we're seeing is it's grown up. It started as an opportunity, a science project, the POC; now it's business-critical, very mission-critical for a lot of the services it drives. >> Am I correct that those diverse workloads used to require a bespoke set of infrastructure that was very siloed? I'm inferring that technology today will allow you to bring those workloads together on a single platform. Is that correct? >> A couple of things that we offer have been helping customers get off the complexity train while providing them flexibility and elasticity. A lot of the workloads that we did in the past were very vertically focused and integrated: one app server, networking, storage. The beginning of the analytics phase was really around symmetrical clusters and scaling them out. Now we've got a very rich and diverse set of components and infrastructure that can essentially allow a customer to make a data lake that's very scalable: compute-oriented nodes, storage-oriented nodes, GPU-oriented nodes. It's very flexible and helps the customers take complexity out of their environment. >> In thinking about, when you talk to customers, what are they struggling with, specifically as it relates to infrastructure? Again, we talked about tooling; Hadoop is well-known for the complexity of its tooling. But specifically from an infrastructure standpoint, what are the big complaints that you hear? >> A couple of things that we hear. One is that my budget's flat for the next year or couple of years, right?
We talked earlier in the conversation about having to modernize, virtualize, and containerize existing apps; that means I have to introduce new services as well, with a very different, DevOps-style mode of operations. That's all with the existing staff, right? That's the number one issue that we hear from the customers: anything that we can do to help increase the velocity of deployment through automation. We hear now, frankly, that the battle is over whether I'm gonna run these types of workloads on-prem versus off-prem. We have a set of technology as well as services, enabling services with Pointnext. You remember the acquisition we made around Cloud Technology Partners: to right-place where those workloads are gonna go, become like a broker in that conversation, assist customers to make that transition, and then, ultimately, give them an elastic platform that's gonna scale for a diverse set of workloads and that's well-known, sized, and easy to deploy. >> As you get all this data... Hadoop sorta blew up the data model. It said, "Okay, we'll leave the data where it is, we'll bring the compute there." You had a lot of skunkworks projects growing. What about governance, security, compliance? As you have data sprawl, how are customers handling that challenge? Is it a challenge? >> Yeah, it certainly is a challenge. I mean, we've gone through it just recently with, you know, GDPR being implemented. You gotta think about how that's gonna fit into your workflow, and certainly security. The big thing that we see, certainly, is that if the data's residing outside of your traditional data center, that's a big issue. For us, when we have Edgeline servers, certainly a lot of things are coming in over wireless, and there's a big buildout with the advent of 5G coming. That certainly is an area that customers are very concerned about, in terms of who has their data, who has access to it, how can you tag it, how can you make sure it's secure. That's a big part of what we're trying to provide here at HPE. >> What specifically is HPE doing to address these problems? Products, services, partnerships; maybe you could talk about that a little bit. Maybe even start with, you know, what's your philosophy on infrastructure for big data and AI workloads? >> I mean, for us, over the last two years we've really concentrated on essentially two areas. We have the Intelligent Edge, which has been enabled by fantastic growth with our Aruba products in the networking space and our Edgeline systems: being able to take that type of compute and get it as far out to the edge as possible. The other piece of it is around making hybrid IT simple, right? In that area, we wanna provide a very flexible, yet easy-to-deploy, set of infrastructure for big data and AI workloads. We have this concept of the Elastic Platform for Analytics; it helps customers deploy that for a whole myriad of requirements: very compute-oriented, storage-oriented, GPUs, cold and warm data lakes, for that matter. And the third area we've really focused on is the ecosystem that we bring to our customers; as a portfolio company, it's evolving rapidly. As you know, in this big data and analytics workload space, the software development portion of it is super dynamic. If we can bring a vetted, well-known ecosystem to our customers as part of a solution with advisory services, that's definitely one of the key pieces that our customers love to come to HP for.
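The fast-data core Patrick described a few exchanges back, Kafka as the ingest bus feeding Spark for real-time processing, is a well-worn pattern that can be sketched concretely. The snippet below is a minimal, hedged illustration, not HPE reference code: the broker address, topic name, and JSON record shape are assumptions, and running it requires Spark's Kafka connector package (spark-sql-kafka) on the submit classpath.

```python
# A minimal sketch of the Kafka -> Spark Structured Streaming fast-data
# pattern, under assumed names: broker "broker:9092", topic
# "edge-telemetry", and a simple JSON telemetry schema.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

spark = (SparkSession.builder
         .appName("edge-telemetry")   # assumed app name
         .getOrCreate())

# Assumed shape of the edge telemetry records.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # assumed address
       .option("subscribe", "edge-telemetry")             # assumed topic
       .load())

# Kafka delivers bytes; parse the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# One-minute rolling averages per device, computed as data arrives.
agg = (events
       .withWatermark("event_time", "5 minutes")
       .groupBy(window("event_time", "1 minute"), "device_id")
       .agg(avg("value").alias("avg_value")))

query = (agg.writeStream
         .outputMode("update")
         .format("console")   # stand-in sink; a real pipeline would persist
         .start())
query.awaitTermination()
```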
>> What about partnerships around things like containers and simplifying the developer experience? >> I mean, we've been pretty public about some of our efforts in this area around OneSphere, and some of the models around advisory services in this area, with some recent acquisitions. For us, it's all about automation, and we wanna be able to provide that experience to the customers, whether they want to develop those apps and deploy on-prem. You know, we love that. I think you guys tag it as true private cloud. But we know that the reality is, most people are very quickly embracing a hybrid cloud model. The ability to take those apps, develop them, put them on-prem, and run them off-prem is pretty key for OneSphere. >> I remember Antonio Neri, when you guys announced Apollo, and you had the astronaut there. Antonio was just a lowly GM and VP at the time, and now he's, of course, CEO. Who knows what's in the future? But Apollo, at the time, it was like, okay, this is a high-performance computing system. We've talked about those worlds, HPC and big data, coming together. Where does a system like Apollo fit in this world of big data workloads? >> Yeah, so we have a very wide product line for Apollo; some of the systems are very tailored to specific workloads, if you take a look at the way that people are deploying these infrastructures now, multi-tenant with many different workloads. We allow for some compute-focused systems, like the Apollo 2000. We have very balanced systems, like the Apollo 4200, that allow a very good mix of CPU and memory, and now customers are certainly moving to flash and storage-class memory for these types of workloads. And then the Apollo 6500 is one of the newer systems that we have: big memory footprint, NVIDIA GPUs, allowing you to do very high calculation rates for AI and ML workloads. We take that and we aggregate it together. We've made some recent acquisitions, like Plexxi, for example. A big part of this is around simplification of the networking experience. You can probably see into the future: automation at the networking level, automation at the compute and storage level, and then a very large and scalable data lake for customers' data repositories. Object, file, HDFS: some pretty interesting trends in that space. >> Yeah, I'm actually really super excited about the Plexxi acquisition. It used to be that the bottleneck was the spinning disk; flash pushes the bottleneck largely to the network. Plexxi's gonna allow you guys to scale, and I think actually leapfrog some of the other hyperconverged players that are out there. So, super excited to see what you guys do with that acquisition. It sounds like your focus is on optimizing the design for I/O. I'm sure flash fits in there as well. >> And that's a huge accelerator, even when you take a look at our storage business, right? So, 3PAR, Nimble, All-Flash, certainly moving to NVMe and storage-class memory for acceleration of other types of big data databases. Even though we're talking about Hadoop today, certainly SAP HANA, scale-out databases, Oracle, SQL, all these things play a part in the customer's infrastructure. >> Okay, so you were talking before a little bit about GPUs. What is this HPE Elastic Platform for big data analytics? What's that all about? >> I mean, a lot of the sizing and scalability falls on the shoulders of our customers in this space, especially in some of these new areas.
What we've done is, we have a product and a concept called the Elastic Platform for Analytics. With all those different components that I rattled off, all great systems on their own, when it comes to very complex multi-tenant workloads, what we do is try to take the mystery out of that for our customers, so they're able to deploy that cookie-cutter module. We're even gonna get to a place pretty soon where we're able to offer that as a consumption-based service, so you don't have to choose between on-prem and off-prem for an elastic type of acquisition experience; we're gonna provide that as well. It's not only a set of products; it's reference architectures. We do a lot of sizing with our partners: Hortonworks, Cloudera, MapR, and a lot of the things that are out in the open source world. It's pretty good. >> We've been covering big data, as you know, for a long, long time. The early days of big data were like, "Oh, this is great, we're just gonna put white boxes out there and off-the-shelf storage!" Well, that changed as big data workloads became more enterprise, more mainstream; they needed to be enterprise-ready. But my question to you is, okay, I hear you. You got products, you got services, you got perspectives, a philosophy. Obviously, you wanna sell some stuff. What has HPE done internally with regard to big data? How have you transformed your own business? >> For us, we wanna provide a really rich experience, not just products. To do that, you need to provide a set of services and automation, and we've done that with products and solutions like InfoSight. We call it AI for the data center, and certainly the tagline of predictive analytics is something that Nimble brought to the table a long time ago. To provide that level of services (InfoSight, predictive analytics, AI for the data center), we're running our own big data infrastructure. It started a number of years ago, even on our 3PAR platforms and other products, where we had scale-up databases. We moved and transitioned to batch-oriented Hadoop. Now we're fully embedded with real-time streaming analytics that come in every day, all day long, from our customers' telemetry. We're using AI and ML techniques to not only improve on what we've done, certainly automating the support experience and making it easy to manage the platforms, but now introducing things like learning engines, automation engines, and recommendation engines, so the hands-on work of managing the products is automated and put into the products themselves. So, for us, we've gone through a multi-phase, multi-year transition that's brought in things like Kafka and Spark and Elasticsearch. We're using all these techniques in our system to provide new services for our customers as well. >> Okay, great. You're practitioners, you got some street cred. >> Absolutely. >> Can I come back on InfoSight for a minute? It came through the acquisition of Nimble. It seems to us that you're a little bit ahead, and maybe you'd say a lot ahead, of the competition with regard to that capability. How do you see it? Where do you see InfoSight being applied across the portfolio, and how much of a lead do you think you have on competitors? >> I'm paranoid, so I don't think we ever have a good enough lead, right? You always gotta stay grinding on that front. But we think we have a really good product. You know, it speaks for itself.
A lot of the customers love it. We've applied it to 3PAR, for example: we came out with VMVision for 3PAR, which is based on InfoSight. We've got some things in the works for other product lines that are imminent pretty soon. You can think about what we've done for Nimble and 3PAR; we can apply a similar type of logic to the Elastic Platform for Analytics, running at that type of cluster scale, to automate a number of items that are pretty pedantic for the customers to manage. There's a lot of work going on within HPE to scale that as a service that we provide with most of our products. >> Okay, so where can I get more information on your big data offerings and what you guys are doing in that space? >> Yeah, so, you can always go to hp.com/bigdata. We've got some really great information out there. We're in the run-up to our big end-user event that we do every June in Las Vegas: HPE Discover. We have about 15,000 of our customers and trusted partners there, and we'll be doing a number of talks. I'm doing some work there with a British telecom. We'll give some great talks, and those'll be available online virtually, so you'll hear about not only what we're doing with our own InfoSight and big data services, but how other customers like BTE and 21st Century Fox and other folks are applying some of these techniques and making a big difference for their business as well. >> That's June 19th to the 21st, at the Sands Convention Center, in between the Palazzo and the Venetian, so it's a good conference. Definitely check that out live if you can, or if not, you can always watch online. Excellent. Patrick, thanks so much for coming on and sharing with us this big data evolution. We'll be watching. >> Yeah, absolutely. >> And thank you for watching, everybody. We'll see you next time. This is Dave Vellante for theCUBE. (fast techno music)
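Patrick's description of the InfoSight pipeline, customer telemetry streaming in all day long with ML techniques flagging issues before customers see them, reduces at its smallest to scoring each new reading against its recent history. A toy sketch of that idea follows; the window size, warm-up count, threshold, and metric name are illustrative assumptions, and it uses only the Python standard library rather than the Kafka/Spark/Elasticsearch stack named above.

```python
# A toy rolling z-score detector: flag a metric as anomalous when it
# drifts several standard deviations from its recent history.
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Keep a sliding window per metric and score new readings."""

    def __init__(self, window: int = 120, threshold: float = 3.0):
        self.window = window          # samples of history to keep (assumed)
        self.threshold = threshold    # z-score that counts as anomalous
        self.history = {}             # metric name -> deque of readings

    def observe(self, metric: str, value: float) -> bool:
        """Record a reading; return True if it looks anomalous."""
        buf = self.history.setdefault(metric, deque(maxlen=self.window))
        anomalous = False
        if len(buf) >= 30:            # need some history before judging
            mu, sigma = mean(buf), stdev(buf)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        buf.append(value)
        return anomalous

# Example: steady latency readings, then a spike gets flagged.
detector = RollingAnomalyDetector()
for v in [10.1, 9.8, 10.3] * 20:
    detector.observe("array.read_latency_ms", v)   # assumed metric name
print(detector.observe("array.read_latency_ms", 48.0))  # True
```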

Published Date : Jun 12 2018


Big Data Silicon Valley 2018 Recap


 

>> Dave: Good morning everybody, and welcome to Big Data SV. >> Come down, hang out with us today as we have continued conversations. >> Will this trend, this Big Data trend, solve the problems that decision support and business intelligence couldn't solve? We're going to talk about that today. Gentlemen, welcome to theCUBE. (energetic rock music) >> Dave: We're setting up for the digital business era. >> What do people really want to do? It's big data analytics. I want to ingest a lot of information, I want to enrich it, I want to analyze it, I want to take actions, and then I want to go park it. >> Leveraging everything that is open source to build models and put models in production. >> We talk about it a little bit like it's Google Docs for your data. >> So I no longer have to send daily data dumps to partners. They can simply query the data themselves. >> We've taken the two approaches of enterprise analytics and self-service and tried to create a scenario where you kind of get the best of both worlds. >> The epicenter of this whole data management has to move to cloud. >> It saves you a lot of time and effort. You can focus on more strategic projects. >> Do you agree it's kind of bifurcated? There's the Spotifys, and the Ubers, and the AirBnBs that are crushing it, and then there's a lot of traditional enterprises that are still stovepiped and struggling. >> Marketing people, operational people, finance people, they need data to do their jobs. Their jobs are becoming more data-driven, but they're not necessarily data people. >> They're depending on the vendor landscape to provide them with an entry-level set of tools. >> Don't make me work harder and add new staff. Solve the problem. >> Yeah, it's all about solving problems. >> A lot more on machine learning now, and artificial intelligence, and frankly a lot of discussion around ethics. >> Data governance is in fact a business imperative. >> Marketers want all the customer data they can get, right? But there's social security numbers, PII-- Who should be able to see and use what? Because if this data is used inappropriately, then it can cause a lot of problems. >> Creating that visibility is very important. >> The biggest casualty is going to be their customer relationship if they don't do this, because most companies don't know their customers fully. >> The key is that digital transformation is really anchored on the concept of real time. >> If I deal with the data while it's in motion and you don't, you lose, because I'm analyzing it as it's happening, and you would be analyzing it after, at rest. >> Speed is so important these days, and the new companies that are grasping data aggressively, putting it somewhere where they can make decisions on it on a day-to-day basis, they're winning. >> Come on down, be part of our audience. We also have a great party tonight where you can network with some of our experts and analysts. (energetic rock music)
You don't have to go sit in the basement for a year building something that is "the thing," the unicorn in the business; it's about small, quick wins. >> We're not afraid of makin' mistakes. If we provision infrastructure and we don't get it right the first time, we just change it. >> That's something that we would just never be able to do previously in a data center. >> When companies get started with the right first project, they can build on that success and invest more, whereas if you're not experimenting and trying things and moving, you're never going to get there. >> Dave: Thanks for watching, everybody. This is theCUBE. We're live from Big Data SV. >> And we're clear. Thank you. (audience applauds)

Published Date : Mar 12 2018


Lewis Kaneshiro & Karthik Ramasamy, Streamlio | Big Data SV 2018


 

(upbeat techno music) >> Narrator: Live, from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to Big Data SV, everybody. My name is Dave Vellante and this is theCUBE, the leader in live tech coverage. You know, this is our 10th big data event. When we first started covering big data, back in 2010, it was Hadoop, and everything was a batch job. About four or five years ago, everybody started talking about real time and the ability to affect outcomes before you lose the customer. Lewis Kaneshiro is here. He's the CEO of Streamlio, and he's joined by Karthik Ramasamy, who's the chief product officer. They're both co-founders. Gentlemen, welcome to theCUBE. My first question is, why did you start this company? >> Sure, we came together around a vision that enterprises need to access the value around fast data. As you mentioned, enterprises are moving out of the slow data era and looking for fast data value, to really deliver that back to their users or their use cases. Coming together around that idea of real-time action, what we realized is that enterprises can't access this data with projects that are not meant to work together and that are very difficult, perhaps, to stitch together. So what we did was create an intelligent platform for fast data that's really accessible to enterprises of all sizes. We unify the core components needed to access fast data, which are messaging, compute, and stream storage, using best-of-breed open-source technology that came out of Twitter and Yahoo! >> It's a good thing, I was going to ask why the world needs another, you know, streaming platform, but Lewis kind of touched on it: 'cause it's too hard. It's too complicated, so you guys are trying to simplify all that. >> Yep, the main reason we wanted to simplify it, based on all our experiences at Twitter and Yahoo!, is to make it consumable by regular enterprises, because companies in Twitter's and Yahoo!'s position can afford the talent and the expertise needed to build these real-time platforms. Normal enterprises don't have access to that expertise, or to the cost they might have to incur. Because of that, we wanted to take these open-source projects that Twitter and Yahoo! provided, combine them, and make sure that you have a simple, easy, drag-and-drop kind of interface, so that it's easily consumable for any enterprise. Essentially, what we are trying to do is reduce the (mumbles) for enterprises for real time, for all enterprises. >> Dave: Yeah, enterprises will pay up... >> Yes. >> For a solution. The companies that you used to work for, they all gladly throw engineering at the problem. >> Yeah. >> Sure. >> To save time, but most organizations don't have the resources, and so... Okay, so how would it work prior to Streamlio? Maybe take us through sort of how a company would attack this problem, the complexities of what they have to deal with, and what life is like with you guys. >> So, the current state of the world is a fragmented solution. The state of the world is where you take multiple pieces of different projects and you assemble them together in formats so that you can do (mumbles), right?
The reason why people end up doing that is that each of these big data projects was designed for a completely different purpose: messaging is one, compute is another, and the third one is storage. Essentially, what we have done as a company is simplify this by integrating well-known, best-of-breed projects: for messaging we use something called Apache Pulsar, for compute we use something called Apache Heron, from Twitter, and similarly for storage, for real-time storage, we use something called Apache BookKeeper. And we unify them, so that, under the hood, it may be three systems, but, as a user, when you are using it, it functions as a single system. You install the system, ingest your data, express your computation, and get the results out, in one single system. >> So you've unified or converged these functions. If I understand it correctly, and we were talking off camera a little bit, the team, Lewis, that you've assembled actually developed a lot of these, or hugely committed to these open-source projects, right? >> Absolutely, co-creators of each of the projects, and what that allows us to do is to really integrate each project at a deep level. For example, Pulsar is actually a pub/sub system that is built on BookKeeper, and BookKeeper, in our minds, is the purest best-of-breed stream storage solution: fast and durable storage. That storage is also used in Apache Heron to store state. So, as you can see, enterprises, rather than stitching together multiple different solutions for queuing, streaming, compute, and storage, now have one option that they can install in a very small cluster, and operationally it's very simple to scale up. We simply add nodes if you get data spikes. And what this allows is enterprises to access new and exciting use cases that really weren't possible before. For example, machine learning model deployment in real time. I'm a data scientist, and what I found is that in data science, you spend a lot of time training models in batch mode. It's a legacy type of approach, but once the model is trained, you want to put that model into production in real time, so that you can deliver that value back to a user in real time; let's call it an under-two-second SLA. That has been a great use case for Streamlio, because we are a ready-made intelligent platform for fast data, for ML/AI deployment. >> And the use cases are typically stateful, and you're persisting data, is that right? >> Yes, it can be used for stateless use cases also, but the key advantage that we bring to the table is stateful storage. Since we ship along with the storage, (mumbles) stateful storage becomes much easier, because it can be used to store the intermediate state of the computation, or it can be used for staging (mumbles) data: when it spills over from memory, it's automatically stored to disk. Or you can even keep the data for as long as you want, so that you can unlock the value later, after the data has been processed as fast data; you can access the lazy data later, in time. >> So give us the run-down on the company: funding, you know, VCs, head count. Give us the basics. >> Sure, we raised a Series A from Lightspeed Venture Partners, led by John Vrionis and Sudip Chakrabarti. We've raised seven and a half million, and we emerged from stealth back in August.
That allowed us to ramp up our team to 17, now, mainly engineers, in order to really have a very solid product. We launched post-revenue, and some of our customers are really looking at geo-replication across multiple data centers, so active-active geo-replication is an open-source feature in Apache Pulsar that's been a huge draw compared to some other solutions that are out there. As you can see, this theme of simplifying architecture is where Streamlio sits; unifying queuing and streaming allows us to replace a number of different legacy systems. So that's been one avenue to help growth. The other, obviously, is on the compute piece. As enterprises are finding new and exciting use cases to deliver back to their users, the compute piece needs to scale up and down. We also announced Pulsar Functions, which is stream-native compute that allows very simple function computation in native Python and Java. You spin up an Apache Pulsar cluster or the Streamlio platform, and you simply have compute functionality. That allows us to access edge use cases, so IoT is a huge source of exciting POCs for us right now, where we have connected-car examples that don't need heavyweight scheduler deployment at the edge; it's Pulsar, with Pulsar Functions. What that allows us to do are things like fraud detection, anomaly detection at the edge, model deployment at the edge, interpolation, observability, and alerts. >> And so how do you charge for this? Is it usage based? >> Sure. What we found is that enterprises are more comfortable on a per-node basis, simply because we have the ambition to really scale up and help enterprises use Streamlio as their fast data platform across the entire enterprise. We found that a per-data charge rate would actually limit that growth, so it's per node and a shared architecture. We took an early investment in optimizing around Kubernetes, and so, as enterprises are adopting Kubernetes, we are the most simple installation on Kubernetes: on-prem, multi-cloud, at the edge. >> I love it. I mean, for years we've been talking about the complexity headwinds in this big data space. We certainly saw that with Hadoop. You know, Spark was designed to solve some of those problems. Sounds like you're doing some really good work to take that further. Lewis and Karthik, thank you so much for coming on theCUBE. I really appreciate it. >> Thanks for having us, Dave. >> All right, thank you for watching. We're here at Big Data SV, live from San Jose. We'll be right back. (techno music)
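Pulsar Functions, as Lewis describes them, are plain Python or Java functions attached to input and output topics. A minimal sketch of the kind of edge alerting he mentions follows; the topic names and threshold are assumptions, while the Function base class and process() signature follow the Pulsar Functions Python SDK.

```python
# A minimal sketch of a Pulsar Function for threshold alerting at the
# edge. Assumed: input topic "sensor-readings", output topic "alerts",
# and string payloads containing a numeric reading.
from pulsar import Function

class ThresholdAlertFunction(Function):
    """Flag sensor readings that exceed a fixed threshold."""

    THRESHOLD = 90.0  # assumed alerting threshold

    def process(self, input, context):
        # The raw message payload arrives as `input`.
        reading = float(input)
        if reading > self.THRESHOLD:
            # Returned values are published to the function's output topic.
            return f"ALERT: reading {reading} exceeded {self.THRESHOLD}"
        return None  # below threshold: emit nothing

# Deployment is handled by the Pulsar CLI, roughly along these lines
# (exact flags depend on your Pulsar version):
#
#   pulsar-admin functions create \
#     --py threshold_alert.py \
#     --classname threshold_alert.ThresholdAlertFunction \
#     --inputs persistent://public/default/sensor-readings \
#     --output persistent://public/default/alerts
```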

Published Date : Mar 9 2018


John Furrier & Dave Vellante unpack the Russian Hack | Big Data SV 2018


 

>> Announcer: Live from San Jose. It's theCUBE. Presenting big data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello everyone, I'm John Furrier, co-host of theCUBE. I'm here with Dave Vellante, my co-host. This is an exclusive conversation around the role of data, data for good and bad. We always cover the role of data. We used to talk about AI and data for good, but in this exclusive interview we have some exclusive material about data for bad. Dave, we were talking about weaponizing data a year ago on SiliconANGLE and theCUBE, around how data is being weaponized, certainly in the elections. We know the Russians were involved. We know that with data, you can buy journalists, you can create fake news. And for every click-bait and fake news, there's bad content; but on the other side of this, there's good bait, good news. So the world's changin'. There needs to be a better place; there needs to be some action taken, because there's now evidence of the role that the Russians had, using fake news and weaponizing it to sway the election and other things. So this is somethin' that we've been talkin' about. >> Yeah, I mean the signature of the hacks is pretty clear. I think there is a distinct signature, when you talk to the experts, of when it's China or when it's Russia. Russia, very clever about the way they target somebody who's maybe a pawn; they try to make him or her feel like a king, grab their credentials, and then work their way in. They've been doing this for decades, right? >> And the thing is, now it's not just state-sponsored. There are new groups out there that are enabled by open-source tools. We report on theCUBE that terrorist organizations and bad actors are taking open-source tools and techniques from nation-states and posing threats to democracy in the U.S. and other countries. This is a huge problem. >> And in a way, it's harder than the nuclear problem. We had weapons pointed at each other, right? This is... The United States has a lot to lose. If we go on the offense, others can attack us and attack our systems, which are pretty mature. So, recently we talked to Garry Kasparov. I had an exclusive interview with him. He's very outspoken. Kasparov is the greatest chess player in history, by most accounts, and he is a political activist and an author. And he had a number of things to say about this. Let's listen to him; it's about a couple-minute clip, and then we'll come back and talk about it. Watch this. >> Garry: Knowing Vladimir Putin and the KGB mentality, and the way he has been approaching the global problems, I had no doubt that the question was not if Putin would attack somewhere, but the question was when and where. And the attack on U.S. democracy was a surprise here, but it was not a surprise for us, because we could see how they built these capabilities for more than a decade. They have been creating a fake news industry in Russia to deal with the Russian opposition since 2004, 2005. Then they used it against neighboring countries like Estonia in 2007. Then they moved to eastern Europe and then through western Europe. So when they ended up attacking the United States, they had almost a decade of experience. And it's quite unfortunate that, while there was information about these attacks, the previous administration decided just to take it easy. And the result is that we have this case of interference; I hope there will be more indictments.
I hope we'll get to the bottom of that, because we know that they are still pretty active in Europe, and they will never cease there-- >> Dave: Germany, France-- >> Garry: Exactly. But it's... I call Putin a merchant of doubt, because, unlike the Soviet propaganda machine, he's not selling one ideology. All he wants is to spread chaos. So that's why it's not about, oh, this is the only right teaching. No, no, no. It's wrong, it's wrong, everything... Yeah, maybe there are 10 different ways of saying the truth. Truth is relative. And that's a very powerful message, because it's spreading these doubts. And he's very good at creating these confusions and actually bringing people to fight each other. And I have to say he succeeded-- >> Dave: Our president has taken a page out of that, unfortunately. But I also think the big issue we face as a country, in the United States, is 2020. The election in 2020 is going to be about who leverages social media and the weaponization of social media. And the Russian attackers, you talk to the black hats: very sophisticated, very intriguing how they come in, they find the credentials-- >> Garry: But look, we know, Jesus, every expert knows that in this industry, if you are trying to defend yourself, if you are on the defense all the time, you will lose. It's a losing proposition. So the only way to deter the aggression is to make sure that there will be counterattacks, that there will be devastating blows to those who are attacking the United States. And you need the political will, because the technology is here; America is still the leading power in the world. But the political will, unfortunately-- >> Dave: However, I would say that it's different than with nuclear warheads. Robert Gates was on theCUBE, and I asked him about offense versus defense. He said the thing about the United States is we have a lot to lose, so we have to be careful (laughter) how aggressive we can be. >> Garry: No, exactly. It's a great area of uncertainty: what can you lose if you show strength? But I can tell you exactly how you are going to lose everything, if you are not-- >> Dave: Vigilant. >> Garry: If you are not vigilant. If you are not a deterrent. If you are not sending the right signal to the Putins of this world that aggression against America will have a price that you cannot bear. >> So John, pretty unequivocal comments from Garry Kasparov. A lot of people don't believe that you can actually manipulate social media that way. You've been in social for a long time, since the beginning days. Maybe you could explain: how would a country or a state-sponsored actor go about manipulating individuals? >> You know Dave, I've been involved in internet infrastructure from the beginning days of Web 1.0 and through search engines. I'm a student of the data, and I've seen the data: the data that we have from our media company, the data on Facebook. And here's the deal: there are bad actors doin' fake news, controlling everything, creating bad outcomes. It's important for everyone to understand that there's an actual opposite spectrum, the exact opposite of the bad; there's a good version. So what we can learn from this is that there's a positive element of this, if we can believe it, which is actually a way to make it work for good. And that is trust, high-quality data, reputation, and context. That is a very hard problem. Facebook is tryin' to solve it.
You know we're workin' on solving that. But here's the anatomy of the hack. If you control the narrative, you can control the meme. If you can control the meme, you can control the idea. If you can control the idea, you can control the belief system. If you can control the belief system, you can control the population. That is exactly what happened with the election. That is what's happening now in social networks, and that's why so many people are turning off social networks: because this is hackable. You can actually hack the brains and outcomes of people, because by controlling the narrative, controlling the meme, controlling the idea, controlling the belief system, you can impact the population. That has absolutely been done. >> Without firin' a shot. >> Without firing a shot. These are the new cold social network wars that are goin' on. And again, that has been identified, but there's an opposite effect. And the opposite effect is having a trust system, a shortcut to trust. There will be a Google in our future: what Google did to search engines, someone will do for social networks. That is, whoever can nail trust, reputation, and context, what is real and what is not, will ultimately have all the users goin' to their doorstep. This is the opportunity for news organizations and for platforms, and it's all going to be driven by new infrastructure, new software. This is something we can learn from. But there is a way to hack; it's been done. I've just laid it out. That's what's happening. >> Will blockchain solve, or play a role in solving, this problem of reputation, in your opinion? >> Well, you know that I believe centralized is bad, 'cause you can hack a centralized database and the data. Ownership is huge. I personally believe that blockchain and this notion of decentralized data ownership will ultimately give power back to the people, and that decentralized applications and cryptocurrency lead a path there. It's not yet proven; there's no clear visibility yet. But many believe that the wallet is the new browser, and that cryptocurrency can put the power with the people, so that new data can emerge: to vet a person who says they're something that they're not, or news that says it's somethin' that it's not. This is trust. This is something that is not yet available. That's what I'm sayin'. You can't get it with Google, you can't get it with Facebook. You can't get it in these platforms. So the world has to change at an infrastructure level. That's the opportunity for blockchain. Aside from questions like who's going to provide the power for the miners, there are a variety of technical issues. But conceptually, there is a path there. That's a new democracy. This is a global phenomenon. It's a societal change. This is so cutting edge, but it's very promising at the same time. >> This is super important, because I can't tell you how many times you've received an email from one political persuasion or the other that lays out emphatically that this individual did that or... And you do some research and you find out it's fake news. It happens all the time. >> There's no context for these platforms. Facebook optimizes their data for advertising optimization, and you're going to see data being optimized for user control, community control, community curation: more objective, not subjective, data. This is the new algorithm; this is where machine learning and AI will make a difference. This is the new trust equation that will emerge. This is a phenomenal opportunity for entrepreneurs.
If you're in the media business and you're not thinking about this, you will be out of business. That's our opinion. >> Excellent, John. Well, thanks for your thoughts and for sharing with us how these hacks are done. This is real. The midterm elections and 2020 are really going to be won or lost on social media. Appreciate that. >> And Facebook's fumbling, and they're going to try to do good. We'll see what they do. >> Alright. >> Alright. >> That's a wrap. Good job. >> Thanks for watching.

Published Date : Mar 9 2018


Jaspreet Singh, Druva & Jake Burns, Live Nation | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's theCUBE. Presenting: Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back, everyone, we're here live at San Jose for Big Data SV, Big Data Silicon Valley. I'm John Furrier, cohost of theCUBE. We're here with two great guests: Jaspreet Singh, founder and CEO of Druva, and Jake Burns, VP of Cloud Services at Live Nation Entertainment. Welcome to theCUBE. So what's going on with Cloud? Apps are out there, backup, recovery, what's going on? >> So, we went all in with AWS. In late 2015 and through 2016 we moved all of our corporate infrastructure into AWS, and I think we're a little bit unique in that situation, so in terms of our posture, we're 100% Cloud. >> John: Jaspreet, what's going on with you guys in the Cloud? Because we've talked about this before: with a lot of the apps in the cloud, backup is really important. What's the key thing that you guys are doing together with Live Nation? >> Sure, so I think the notion of data is now pretty much everywhere. Data used to be captured and controlled in the data center; now it's getting decentralized into apps, ecosystems, software, and services deployed either at the edge or in the Cloud. As the data gets more and more decentralized, the notion of data management, be it backup or e-discovery, has to get more and more centralized. And we strongly believe the epicenter of this whole data management has to move to the Cloud. So, Druva is a SaaS-based provider for data management, and we work with Live Nation to protect the apps not just in the data center, but also at the edge and in the Cloud data center: the applications deployed in the Cloud, be it Live Nation or Ticketmaster. >> And what are some of the workloads you guys are backing up? That's with Druva. >> Yeah, so it's pretty much all corporate IT applications. You know, typical things you'd find in any IT shop, really. So, you know, we have our financial systems, we have some of our smaller ticketing systems, and, you know, corporate websites. Things of that nature. We have 120 applications running, and it's really kind of one of everything. >> We were talking before we came on camera about the history of computing, and the Cloud has obviously changed the game. How would you compare the Cloud as a trend, relative to operationalizing the role of data, and obviously GDPR and ransomware? These are things that, with the perimeter gone, create worries. So now, how do you guys look at the Cloud? Jake, I'll start with you. If you can compare and contrast where we have come from and where we are going: the role of the Cloud, significant, primary, expanding. How would you compare that? And how would you talk to someone who says, hey, I'm still in the data center world? What's going on with Cloud? >> Well, yeah, it's significant and it's expanding, both. And, you know, it's really transforming the way we do business. So, just from a high level: things like shortening the time to market for applications, going from three to six months just to get a proof of concept started, to today, you know, in the Cloud. Being able to innovate, really, by trying things: we try 20 different things, decide what works and what doesn't work, and at very low cost. So, it allows us to really do things that just weren't possible before. Also, we move more quickly because, you know, we're not afraid of making mistakes.
If we provision infrastructure and we don't get it right the first time, we just change it. You know, that's something that we would just never be able to do previously in the data center. So to answer your question, everything is different. >> And the as-a-service model's been kind of key. Is the consumption on your end different, like, I mean, radically different? Like, give an example of how much time would be saved, compared to the traditional approaches. >> Oh, for sure. You know, the role of IT has completely changed because, you know, instead of worrying about nuts and bolts and servers and storage arrays and data centers, you know, we can really focus on the things that are important to the business, the things delivering results for the business. So, bringing value, bringing applications online and trying things that are going to help us do business, rather than focusing on all the minutiae. All that stuff's now been outsourced to Cloud providers. So, really, we have a similar head count and staff, but we are focused on things that bring value rather than things that are just kind of frivolous. >> Jaspreet, you guys have been a very successful startup, growing rapidly. The Cloud's been a good friend; that trend is your friend. >> What's different operationally that you guys are tapping into? What's that tailwind for Druva that's making you guys successful? And is it the ease of use? Is it the ease of consumption? Is it the tech? What's the secret to success with Druva? >> Sure, so, we believe Cloud is a very big business transformation trend more than a technology trend. It's how you consume a service with a fixed SLA, with a fixed service agreement, across the globe. So, it's ease of consumption. It's simplicity of use. It's orchestration. It's cost control. All those things. So, our promise to our customers is that the complexity of data management, backups, archives, data protection, which is a risk mitigation project, you know, can be completely abstracted by a simple service. For example, you know, Live Nation consumes Druva's service through the AWS Marketplace. So, think about consuming a critical service like data management through the simplicity of a marketplace, paying as you go, as you consume the service, across the globe: in the US, in Australia, and Europe. And it also helps vendors like us to innovate better. Because we have a controlled environment to understand how different customers are using the service, and can orchestrate a better security posture, better threat prevention, better cost control, DevOps. So, it improves the posture of the service being offered and helps the customer consume it. >> You both are industry veterans by today's standards, unless you're, like, 24, doing some of the cryptocurrency stuff and, you know, don't know the old IT baggage. How would you guys view the multi-Cloud conversation? Because we hear that all the time. Multi-Cloud has come up so many times. What does it mean? Jake, what does multi-Cloud actually mean? Is it the same workload across multiple Clouds? Is it the fact that there are multiple Clouds? Certainly, there will be multiple Clouds. But, so, help us digest what that even means these days. >> Yeah, that's a great question and it's a really interesting topic. Multi-Cloud is one of those things where, you know, there are so many benefits to using more than one Cloud provider. But, there are also a lot of pitfalls.
So, people really underestimate the difference in the technology and the complexity of managing the technology when you change Cloud providers. I'm talking primarily about infrastructure service providers like Amazon Web Services. So, you know, I think there are a lot of good reasons to be multi-Cloud: to get the best features out of different providers, to not have, you know, the risk of having all your data in one place with one vendor. But, you know, it needs to be done in such a way where you don't take that hit in overhead and complexity, and, you know, I think that's kind of a prohibitive barrier for most enterprises. >> And what are the big pitfalls that you see? Is it mainly underestimating the stack complexity between them, or is it more just operational questions? I mean, what are the pitfalls that you've observed? >> Yeah, so, moving from a typical IT data center environment to a public Cloud provider like AWS, you're essentially asking all your technical staff to start speaking a new language. Now, if you were to introduce a second Cloud provider to that environment, you're asking them to learn a third language as well. And that's a lot to ask. So, you really have two scenarios where you can make that work today without using a third party. One is to ask all of your staff to know both, and that's just not feasible. Or have two tech teams, one for each Cloud platform, and that's really not something businesses want to do. So, I think the real answer is to rely on a third party that can come in and abstract one of those Cloud providers out, so you don't have to directly manage it. And in that way, you can get the benefit of being multi-Cloud, that data protection of being multi-Cloud, but not have to introduce that complexity to your environment. >> To provide some abstraction layer. Some sort of software approach. >> Yeah, like, for example, if you have your primary systems in AWS, and you use software like Druva Phoenix to back up your data and put that data into a second Cloud provider, you don't have to have an account with that second Cloud provider, and you don't have the complexity associated with it. That, I think, is a very >> And that's where you're looking for differentiation. We look at vendors and say, hey, don't make me work harder. >> Right. >> And add new staff. Solve the problem. >> Yeah, it's all about solving problems, right? And that's why we're doing this. >> So, Druva, talk about this thing. Because we talked about it earlier. To me, it could be, oh, we're on Azure. Well, they have Office 365, so of course they're going to have Microsoft. A lot of people have a lot going on in AWS. So, maybe we're not there at the world where you can actually provision the same workload across Clouds. It would be nice to have that someday, if it was seamless, but that might be the nirvana. At the end of the day, an enterprise might have Office 365 and some Azure, but I've got mostly Amazon over here that I'm doing a lot of development on, doing DevOps, and I'm on-prem. How do you talk to that? Because that's like, you've got to back up Office 365, you've got to do the on-prem thing, you've got to do the Amazon thing. How do you guys solve that problem? What's the conversation? >> Absolutely. I think over time we believe best of breed will win. So, people will deploy different types of cloud for different workloads, be it SaaS, hosted IaaS, or a platform like PaaS.
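Jake's suggestion, a third party that abstracts one of the Cloud providers away, is at bottom an interface-extraction exercise. The following is a minimal, illustrative Python sketch of that idea, not Druva's actual design: the class and method names are invented, and local directories stand in for the two object stores so the example runs anywhere without cloud credentials.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BackupTarget(ABC):
    """Hypothetical interface a backup service could expose so that
    callers never deal with a specific cloud provider directly."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...


class LocalDirTarget(BackupTarget):
    """Stands in for one cloud object store (the primary cloud, or a
    second provider used only to hold backup copies)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, data: bytes) -> None:
        (self.root / name).write_bytes(data)

    def get(self, name: str) -> bytes:
        return (self.root / name).read_bytes()


def backup(obj_name, data, targets):
    # The caller gets multi-cloud data protection without holding an
    # account with, or writing code against, each provider directly.
    for target in targets:
        target.put(obj_name, data)


if __name__ == "__main__":
    primary = LocalDirTarget("/tmp/primary-cloud")
    secondary = LocalDirTarget("/tmp/second-cloud")
    backup("financials.db", b"example payload", [primary, secondary])
    assert secondary.get("financials.db") == b"example payload"
```

The point of the design is that adding a third provider means writing one more `BackupTarget` implementation, not retraining the whole staff on a new platform.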
When they do that, when they host multiple services and software-deployed services, I think it's hard to control where the data will go. What we can orchestrate, what anybody can orchestrate, is centralizing the data management part of it. So, Druva has the best posture, the best coverage, across multiple heterogeneous breeds of Cloud. You know, SaaS services like Office 365, Box, or Salesforce; platforms like S3 or DynamoDB, through our product called Apollo; or hosted platforms like what Live Nation is using, through our Phoenix product line. So getting the breadth of coverage, and consistency of policies on a single platform, is what will make enterprises adopt what's best out there without worrying about how you build abstraction for data management. >> Jake, what's the biggest thing you see with people who are moving to the Cloud for the first time? What are they struggling with? Is it the idea that there's no perimeter? Is it staff training? I mean, what are some of the critical things they should think about as people move from test-dev and/or start to put the Cloud in production? >> Yeah, there are so many of them. But first, really, it's just getting buy-in, you know, from your technical staff, because, you know, in an enterprise environment you bring in a Cloud provider and it's very easily framed as if we're just being outsourced, right? So, I think getting past that barrier first, and really getting through to folks and letting them know that really this is good for you. This is not bad for you. You're going to be learning a new skill, a very valuable skill, and you're going to be more effective at your job. So, I think that's the first thing. After that, once you start moving to the Cloud, the thing that becomes apparent very quickly is cost control. So, you know, the thing with public Cloud is, before, with the traditional data center, you had this really kind of narrow range of what IT could cost. Now we have this huge range. And yes, it can be cheaper than it was before. But, it can also be far more expensive than it was before. >> So, is it services sprawl or just not paying attention? Both? >> Well, essentially you're giving your engineers a blank check. So, you need to have some governance and, you know, you really need to think about things that you didn't have to think about before. You're paying for consumption. So, you really have to watch your consumption. >> So, take me through the mental model of deduplication in the Cloud. Because I'm trying to, like, visualize it or grok it a little bit. Okay, so, the Cloud is out there, data's everywhere. And do I move the compute to the data? How does the backup and recovery and data management work? And does dedup change with Cloud? Because some people think, I've got my dedup already and I'm on-premise. I've been doing these old solutions. How does dedup specifically change in the Cloud, or does it? >> I know scale changes. You're looking at, you know, the best dedup systems, if you look historically, were 100-terabyte, 200-terabyte dedup indexes, like Data Domain. The scale changes, you know; customers expect massive scale in Cloud. Our largest customer has 10 petabytes in a single dedup index. That's a 100x scale difference compared to what traditional systems could do. Number two, you can create a quality of service which is not really bound by one fixed, you know, algorithm, variable length or whatever. So, you can optimize dedup very clearly for the right workload.
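To make that mental model concrete, here is a small, illustrative Python sketch of variable-length (content-defined) chunking feeding a hash-keyed dedup index. The boundary parameters are arbitrary toy choices, not any vendor's actual tuning, and the running hash is deliberately simplistic.

```python
import hashlib

# Content-defined chunking: cut wherever a simple running hash of the
# bytes hits a boundary pattern (roughly 1 in 1024 positions for random
# data), so chunk lengths vary with the content itself and a small edit
# does not shift every subsequent chunk boundary.
MASK = 0x3FF
MIN_CHUNK, MAX_CHUNK = 256, 8192


def chunks(data: bytes):
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class DedupStore:
    """Toy dedup index: each unique chunk is stored once, keyed by its
    SHA-256 fingerprint; a backup is just a list of fingerprints."""

    def __init__(self):
        self.index = {}                        # fingerprint -> chunk bytes

    def write(self, data: bytes):
        recipe = []                            # fingerprints to rebuild data
        for chunk in chunks(data):
            fp = hashlib.sha256(chunk).hexdigest()
            self.index.setdefault(fp, chunk)   # store only unseen chunks
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        return b"".join(self.index[fp] for fp in recipe)


if __name__ == "__main__":
    store = DedupStore()
    backup1 = bytes(range(256)) * 64
    recipe1 = store.write(backup1)
    recipe2 = store.write(backup1)             # identical second backup
    assert store.read(recipe2) == backup1      # rebuilt, but stored once
    print(len(store.index), "unique chunks held for two identical backups")
```

Tuning `MIN_CHUNK`, `MAX_CHUNK`, and the boundary mask per workload is one way to read the "right dedup for the right workload" point: Office 365 mail, VM images, and databases each dedup best with different chunking behavior.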
The right dedup for the right workload. So, you may dedup Office 365 differently than your VMware instances, compared to your Oracle databases or your endpoint workloads. So, the as-a-service business model helps you create a custom, tailored solution for the right data, and it brings the scale. You don't have the complexity of scale, but you get the benefit of scale, all, you know, simply managed in the Cloud. >> Jake, what's it like working with Druva? What's the benefit that they bring to you guys? >> Yeah, so, specifically around backups for our enterprise systems, you know, that's a difficult challenge to solve natively in the Cloud, especially if you're going to be limited to using Cloud-native tools. So, it's really a perfect use case for a third-party provider. You know, people don't think about this much, but in the old days, in the data center, you know, our backups went offsite into a vault. They were on tapes. It was very difficult for us to lose those, or for them to be erased accidentally or even intentionally. Once you go into the Cloud, especially if you're all in with the Cloud like we are, everything is easier. And so, accidents are easier also. You know, deleting your data is easier. So, you know, what we really want, and what a lot of enterprises want >> And security too is a potential >> Absolutely, yeah. And so, what we want is to get some of that benefit, you know, back that we had from that inefficiency we had beforehand. We love all the benefits of the Cloud, but we want to have our data protected also. So, this is a great role for a company like Druva to come in and offer a product like Phoenix and say, you know, we're going to handle your backups for you, essentially. You're going to put it in a safe place. We're going to secure it for you. And we're going to make sure it stays secure for you. And doing it software-as-a-service, like Druva does with Phoenix, I think is the absolute right way to go. It's exactly what you need. >> Well, congratulations, Jake Burns, Vice President of Cloud Services >> Thank you. >> at Live Nation Entertainment. Jaspreet Singh, CEO of Druva, great to have you on. Congratulations on your success. >> Thank you. >> Inside the tornado called Cloud computing. A lot more stuff coming. More CUBE coverage coming up after this short break. Be right back. (electronic music)

Published Date : Mar 9 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jake Burns | PERSON | 0.99+
Jaspreet Singh | PERSON | 0.99+
Europe | LOCATION | 0.99+
John | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Live Nation Entertainment | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
US | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
Jake | PERSON | 0.99+
Australia | LOCATION | 0.99+
Amazon | ORGANIZATION | 0.99+
100x | QUANTITY | 0.99+
three | QUANTITY | 0.99+
San Jose | LOCATION | 0.99+
One | QUANTITY | 0.99+
Jaspreet | PERSON | 0.99+
Office 365 | TITLE | 0.99+
one | QUANTITY | 0.99+
Live Nation | ORGANIZATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
Druva | ORGANIZATION | 0.99+
200 terabyte | QUANTITY | 0.99+
first | QUANTITY | 0.99+
120 applications | QUANTITY | 0.99+
Both | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
100 terabyte | QUANTITY | 0.99+
second | QUANTITY | 0.99+
Phoenix | ORGANIZATION | 0.99+
two scenarios | QUANTITY | 0.99+
late 2015 | DATE | 0.98+
six months | QUANTITY | 0.98+
first time | QUANTITY | 0.98+
theCUBE | ORGANIZATION | 0.98+
Ticketmaster | ORGANIZATION | 0.98+
2016 | DATE | 0.98+
10 petabyte | QUANTITY | 0.98+
two great guests | QUANTITY | 0.97+
S3 | TITLE | 0.97+
Cloud | TITLE | 0.97+
one vendor | QUANTITY | 0.97+
GDPR | TITLE | 0.97+
single platform | QUANTITY | 0.96+
Oracle | ORGANIZATION | 0.96+
Big Data SV | ORGANIZATION | 0.96+
Azure | TITLE | 0.95+
365 | QUANTITY | 0.95+
today | DATE | 0.94+
20 different things | QUANTITY | 0.94+
Big Data Silicon Valley | ORGANIZATION | 0.94+
Druva Phoenix | TITLE | 0.93+
Druva | TITLE | 0.93+
one place | QUANTITY | 0.93+
Cloud Services | ORGANIZATION | 0.92+
more than one Cloud | QUANTITY | 0.91+
two tech teams | QUANTITY | 0.91+
first thing | QUANTITY | 0.89+
DB | TITLE | 0.89+

Peter Burris Big Data Research Presentation


 

(upbeat music) >> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> What am I going to spend time, the next 15, 20 minutes or so, talking about? I'm going to answer three things. Our research has gone deep into where are we now in the big data community. I'm sorry, where is the big data community going, number one. Number two is how are we going to get there, and number three, what do the numbers say about where we are? So those are the three things. Now, since we want to get out of here, I'm going to fly through some of these slides, but again, there's a lot of opportunity for additional conversation, because we're all about having conversations with the community. So let's start here. The first thing to know, when we think about where this is all going, is that it's inextricably bound up with digital transformation. Well, what is digital transformation? We've done a lot of research on this. This is Peter Drucker, who famously said many years ago that the purpose of a business is to create and keep a customer. That's what a business is. Now what's the difference between a business and a digital business? What's the difference between Sears Roebuck and Amazon? It's data. A digital business uses data as an asset to create and keep customers. It infuses data and operations differently to create more automation. It infuses data and engagement differently to catalyze superior customer experiences. It reformats and restructures its concept of value proposition and product to move from a product to a services orientation. The role of data is the centerpiece of digital business transformation, and in many respects that is where we're going: an understanding and appreciation of that. Now, we think there are going to be a number of strategic capabilities that will have to be built out to make that possible. First off, we have to start thinking about what it means to put data to work. The whole notion of an asset is that an asset is something that can be applied to a productive activity. Data can be applied to a productive activity. Now, there are a lot of very interesting implications that we won't get into now, but essentially, if we're going to treat data as an asset and think about how we can put more data to work, we're going to focus on three core strategic capabilities to make that possible. One, we need to build a capability for collecting and capturing data. That's a lot of what IoT is about. It's a lot of what mobile computing is about. There are going to be a lot of implications around how to ethically and properly do some of those things, but a lot of that investment is about finding better and superior ways to capture data. Two, once we are able to capture that data, we have to turn it into value. That, in many respects, is the essence of big data: how we turn data into data assets, in the form of models, in the form of insights, in the form of any number of other approaches to thinking about how we're going to appropriate value out of data. But it's not enough to create value out of it and have it sit there as potential value. We have to turn it into kinetic value, to actually do the work with it, and that is the last piece. We have to build new capabilities for how we're going to apply data to perform work better, to act based on data.
Now, we've got a concept we're researching now that we call systems of agency, which is the idea that there are going to be a lot of new approaches, new systems with a lot of intelligence and a lot of data, that act on behalf of the brand. I'm not going to spend a lot of time going into this, but remember that word, because I will come back to it. Systems of agency is about how you're going to apply data to perform work with automation, augmentation, and actuation on behalf of your brand. Now, all this is going to happen against the backdrop of cloud optimization. I'll explain what we mean by that right now. Very importantly, increasingly how you create value out of data, how you create future options on the value of your data, is going to drive your technology choices. For the first 10 years of the cloud, the presumption was that all data was going to go to the cloud. We think a better way of thinking about it is: how is the cloud experience going to come to the data? We've done a lot of research on the cost of data movement, both in terms of the actual out-of-pocket costs but also the potential uncertainty, the transaction costs, etc., associated with data movement. And that's going to be one of the fundamental elements of how we think about the future of big data and how digital business works: what we think about data movement. I'll come to that in a bit. But our proposition is, increasingly, we're going to see architectural approaches that focus on how we're going to move the cloud experience to the data. We've got this notion of true private cloud, which is effectively the idea of the cloud experience on or near premise. That doesn't diminish the role that the cloud's going to play in the industry, or say that Amazon AWS and Microsoft Azure and all the other options are not important. They're crucially important, but it means we have to start thinking architecturally about how we're going to create value out of data, and recognize that we have to start envisioning how our organization and infrastructure are going to be set up so that we can use data where it needs to be, or where it's most valuable, and often that's close to the action. So if we think then about that very quickly, because it's a backdrop for everything: increasingly we're going to start talking about the idea of where's the workload going to go? Where's the workload going to be against this kind of backdrop of the divorce of infrastructure? We believe, and our research pretty strongly shows, that a lot of workloads are going to go to true private cloud, but a lot of big data is moving into the cloud. This is a prediction we made a few years ago, and it's clearly happening and it's underway, and we'll get into what some of the implications are. So again, when we say that a lot of the big data elements, a lot of the process of creating value out of data, is going to move into the cloud, that doesn't mean that all the systems of agency that build on or rely on that data, the inference engines, etc., are also in a public cloud. A lot of them are going to be distributed out to the edge, out to where the action needs to be, because of latency and other types of issues. This is a fundamental proposition, and I know I'm going fast, but hopefully I'm being clear. All right, so let's now get to the second part. This is kind of where the industry's going. Data is an asset.
Invest in strategic business capabilities to create those data assets and appreciate the value of those assets, and utilize the cloud intelligently to generate and ensure increasing returns. So the next question is, well, how will we get there? Now. Right now, not too far from here, Neil Raden, for example, was on the show floor yesterday. Neil made the observation that, as he wandered around, he only heard the word big data two or three times. The concept of big data is not dead. Whether the term is or is not is somebody else's decision. Our perspective, very simply, is that the notion is bifurcating. And it's bifurcating because we see different strategic imperatives happening at two different levels. On the one hand, we see infrastructure convergence: the idea that increasingly we have to think about how we're going to bring and federate data together, both from a systems and a data management standpoint. And on the other hand, we're going to see infrastructure or application specialization. That's going to have an enormous implication over the next few years, if only because there just aren't enough people in the world who understand how to create value out of data. And there's going to be a lot of effort made over the next few years to find new ways to go from that one expertise group to billions of people, billions of devices, and those are the two dominant considerations in the industry right now. How can we converge data physically, logically, and on the other hand, how can we liberate more of the smarts associated with this very, very powerful approach so that more people get access to the capacities and the capabilities and the assets that are being generated by that process? Now, we've done at Wikibon, probably, I don't know, 18, 20, 23 predictions overall on the changes being wrought by digital business. Here I'm going to focus on four of them that are central to our big data research. We have many more, but I'm just going to focus on four. The first one: when we think about infrastructure convergence, we worry about hardware. Here's a prediction about what we think is going to happen with hardware, and our observation is, we believe pretty strongly, that future systems are going to be built on the concept of how you increase the value of data assets. The technologies are all in place. Simpler parts that more successfully bind compute, storage, and network are going to play together. Why? Because increasingly that's the fundamental constraint: how do I make data available to other machines, actors, sources of change, sources of process within the business? Now, we are watching before our very eyes new technologies that allow us to take these simple piece parts and weave them together in very powerful fabrics or grids, what we call UniGrid, so that there is almost no latency between data that exists within one of these, call it a molecule, and anywhere else in that grid or lattice. Now again, these are not systems that are five years out. All the piece parts are here today, and there are companies that are actually delivering them. So if you take a look at what Micron has done with Mellanox and other players, that's an example of one of these true private cloud oriented machines in place. The bottom line, though, is that there is a lot of room left in hardware. A lot of room.
This is what cloud suppliers are building and are going to build, but increasingly, as we think about true private cloud, enterprises are going to look at this as well. So: future systems for improving data assets. The capacity of this type of system, with low latency amongst any source of data, means that we can now think about data not as a set of sources and sinks, each individually having some control over its own data, woven together by middleware and applications, but literally as networks of data. As we start to think about distributing data, and distributing the control and authority associated with that data, more broadly across systems, we now have to think about what it means to create networks of data. Because that, in many respects, is how these assets are going to be forged. I haven't even mentioned the role that security is going to play in all of this, by the way, but fundamentally that's how it's likely to play out. We'll have a lot of different sources, but from a business standpoint, we're going to think about how those sources come together into a persistent network that can be acted upon by the business. One of the primary drivers of this is what's going on at the edge. Marc Andreessen famously said that software is eating the world; well, our observation is, great, but if software's eating the world, it's eating it at the edge. That's where it's happening. Secondly, there's this notion of agency zones. I said I was going to bring that word up again: how systems act on behalf of a brand, or act on behalf of an institution or business, is very, very crucial, because the time necessary to do the analysis, perform the intelligence, and then take action is a real constraint on how we do things. And our expectation is that we're going to see what we call an agency zone or a hub zone or cloud zone, defined by latency and by how we architect data to get the data that's necessary to perform that piece of work into the zone where it's required. Now, the implication of this is that none of it is going to happen if we don't use AI and related technologies to increasingly automate how we handle infrastructure. And technologies like blockchain have the potential to provide an interesting way of imagining how these networks of data actually get structured. It's not going to solve everything. There are some people who think the blockchain is kind of everything that's necessary, but it will be a way of describing a network of data. So we see those technologies on the ascension. But what does it mean for DBMS? In the old way, in the old world, the old way of thinking, the database manager was the control point for data. In the new world, these networks of data are going to exist beyond a single DBMS, and in fact, over time, that concept of federated data actually has the potential to become real. When we have these networks of data, we're going to need people to act upon them, and that's essentially a lot of what the data scientist is going to be doing: identifying the outcome, identifying the data that's required, and weaving that data through the construction, management, and manipulation of pipelines, to ensure that the data as an asset can persist for the purposes of solving a near-term problem, or over whatever duration is required to solve a longer-term problem.
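The "networks of data" idea maps naturally onto a simple data structure: datasets as nodes, with directed edges recording which datasets each one was derived from, so lineage and control live in the network rather than inside a single DBMS. The toy Python sketch below is one illustrative reading of that concept, with invented dataset names; it is not Wikibon's model or any product's implementation.

```python
from collections import defaultdict


class DataNetwork:
    """Toy 'network of data': nodes are datasets, directed edges say
    'derived from', so lineage spans systems instead of one DBMS."""

    def __init__(self):
        self.owners = {}                        # dataset -> controlling party
        self.derived_from = defaultdict(list)   # dataset -> upstream datasets

    def add(self, name, owner, upstream=()):
        self.owners[name] = owner
        self.derived_from[name].extend(upstream)

    def lineage(self, name):
        """All upstream datasets a given data asset depends on."""
        seen, stack = [], list(self.derived_from[name])
        while stack:
            ds = stack.pop()
            if ds not in seen:
                seen.append(ds)
                stack.extend(self.derived_from[ds])
        return seen


if __name__ == "__main__":
    net = DataNetwork()
    net.add("clickstream", owner="web team")
    net.add("crm_accounts", owner="sales ops")
    net.add("churn_features", owner="data science",
            upstream=["clickstream", "crm_accounts"])
    net.add("churn_model_scores", owner="data science",
            upstream=["churn_features"])
    print(net.lineage("churn_model_scores"))
    # ['churn_features', 'crm_accounts', 'clickstream']
```

In this framing, the pipeline work described above is the construction and maintenance of those edges, which is why the data scientist's role shades into administering networks of data.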
Data scientists remain very important, but we're going to see, as a consequence of improvements in tooling capable of doing these things, an increasing recognition that there's a difference between a data scientist and a data scientist. There are going to be a lot of folks who participate in the process of manipulating, maintaining, and managing these networks of data to create these business outcomes, but we're going to see specialization in those ranks as the tooling is more targeted to specific types of activities. So the data scientist is going to become, or will remain, an important job, though it's going to lose a little bit of its luster because it's going to become clearer what it means. So some data scientists will probably become, let's call them data network administrators, or administrators of networks of data. And very importantly, as I said earlier, there just aren't enough of these people on the planet, and so increasingly, when we think again about digital business and the idea of creating data assets, a central challenge is going to be how to turn all the data that can be captured into assets that can be applied to a lot of different uses. There are going to be two fundamental changes on the horizon to the way we are currently conceiving of the big data world. One is, well, it's pretty clear that Hadoop can only go so far. Hadoop is a great tool for certain types of activities and certain numbers of individuals. So Hadoop solves problems for an important but relatively limited subset of the world. Some of the new data science platforms that I just talked about are going to help, with a degree of specialization that hasn't been available before in the data world, but they also will only take it so far. The real way that we see the work that the big data community is performing turned into sources of value that extend into virtually every single corner of humankind is going to be through these cloud services that are being built, and increasingly through packaged applications. A lot of computer science still sits between what I just said and when this actually happens. But in many respects, that's the challenge of the vendor ecosystem: how to reconstruct the idea of packaged software, which has historically been built around operations and transaction processing, with a known data model and a known process, and some technology challenges. How do we reapply that to a world where we don't know exactly what the process is, because the data tells us in the moment what actions are going to take place? It's a very different way of thinking about application development. A very different way of thinking about what's important in IT, and a very different way of thinking about how business is going to be constructed and how strategy is going to be established. Packaged applications are going to be crucially important. So, in the last few minutes here, what are the numbers? So this is kind of the basis for our analysis: digital business, the role of data as an asset, having an enormous impact on how we think about hardware, how we think about database management or data management, how we think about the people involved in this, and ultimately how we think about how we're going to deliver all this value out to the world. And the numbers are starting to reflect that. So why don't you think about four numbers as I go through the two or three slides.
Hundred and three billion, 68%, 11%, and 2017. So of all the numbers that you will see, those are four of the most important. So let's start by looking at the total marketplace. This is the growth of the hardware, software, and services pieces of the big data universe. Now, we have a fair amount of additional research that breaks all these down into tighter segments, especially on the software side. But the key number here is that we're talking about big numbers: 103 billion over the course of the next 10 years. And let's be clear, that 103 billion dollars actually has a dramatic amplification on the rest of the computing industry, because a lot of the pricing models associated with it, especially the software, are tied back to open source, which has its own issues. And very importantly, the services business is going to go through an enormous amount of change over the next five years as service companies better understand how to deliver some of these big-data-rich applications. The second point to note here is that it was in 2017 that the software market surpassed the hardware market in big data. Again, for the first number of years we focused on buying the hardware and the system software associated with that, and the software value became something that we hoped to discover. So I was having a conversation here in theCUBE with the CEO of Transwarp, which is a very interesting Chinese big data company, and I asked, what's the difference between how you do things in China and how we do things in the US? He said, well, in the US you guys focus on proof of concept. You spend an enormous amount of time asking, does the hardware work? Does the database software work? Does the data management software work? In China, we focus on the outcome. That's what we focus on. Here you have to placate the IT organization to make sure that everybody in IT is comfortable with what's about to happen. In China, we're focused on the business people. This is the first year that software is bigger than hardware, and it's only going to get bigger over time. It doesn't mean, again, that hardware is dead or that hardware is not important. It's going to remain very important, but it does mean that the centerpiece, the locus of the industry, is moving. Now, when we think about what the market shares look like, it's a very fragmented market. 60%, 68% of the market is still other. This is a highly immature market that's going to go through a number of changes over the next few years, partly catalyzed by that notion of infrastructure convergence. So in four years, our expectation is that that 68% is going to start going down pretty fast as we see greater consolidation in how some of these numbers come together. Now, IBM is the biggest one on the basis of the fact that they operate in all these different segments. They operate in the hardware, software, and services segments, but especially because they're very strong within the services business. The last one I want to point your attention to is this one. I mentioned earlier on that our expectation is that the market increasingly is going to move to a packaged application orientation, or packaged services orientation, as a way of delivering expertise about big data to customers. Splunk is the leading software player right now. Why? Because that's the perspective they've taken. Now, perhaps it's a limited subset.
Perhaps it's for a limited subset of individuals or markets or sectors, but it takes a packaged application, weaves these technologies together, and applies them to an outcome. And we think this presages more of that kind of activity over the course of the next few years. Oracle: kind of a different approach, and we'll see how that plays out over the course of the next five years as well. Okay, so that's where the numbers are. Again, a lot more numbers, a lot of people you can talk to. Let me give you some action items. First one: if data were a core asset, how would IT, how would your business, be different? Stop and think about that. If it wasn't your buildings that were the asset, it wasn't the machines that were the asset, it wasn't your people by themselves who were the asset, but data was the asset, how would you reinstitutionalize work? That's what every business is starting to ask, even if they don't ask it in the same way. And our advice is: then do it, because that's the future of business. Not that data is the only asset, but data is a recognized central asset, and that's going to have enormous impacts on a lot of things. The second point I want to leave you with: tens of billions of users, and I'm including people and devices, are dependent on thousands of data scientists. That's an impedance mismatch that cannot be sustained. Packaged apps and these cloud services are going to be the way to bridge that gap. I'd love to tell you that it's all going to be about tools, that we're going to have hundreds of thousands or millions or tens of millions or hundreds of millions of data scientists suddenly emerge out of the woodwork. It's not going to happen. The third thing is, we think that big businesses, enterprises, have to master what we call the big inflection, the big tech inflection. The first 50 years were about known process and unknown technology. How do I take an accounting package, and do I put it on a mainframe or a minicomputer or client/server, or do I do it on the web? Unknown technology. Well, increasingly today, all of us have a pretty good idea what the base technology is going to be. Does anybody doubt it's going to be the cloud? We've got a pretty good idea what the base technology is going to be. What we don't know is: what are the new problems that we can attack, that we can address with data-rich approaches, as we think about how we turn those systems into actors on behalf of our business and customers? So I'm a couple minutes over, I apologize. I want to make sure everybody can get over to the keynotes if you want to. Feel free to stay, theCUBE's going to be live at 9:30, if I've got that right. So it's actually pretty exciting; if anybody wants to see how it works, feel free to stay. Georgia's here, Neil's here, I'm here. I mentioned Greg Terrio, Dave Volante, John Greco; I think I saw Sam Kahane back in the corner. Any questions, come and ask us, we'll be more than happy. Thank you very much for, oh, David Volante. >> David: I have a question. >> Yes. >> David: Do you have time? >> Yep. >> David: So you talk about data as a core asset. If you look at the top five companies by market cap in the US, Google, Amazon, Facebook, etc., they're data companies; they've got data at the core, which is kind of what your first bullet here describes. How do you see traditional companies closing that gap, where humans, buildings, etc. are at the core, as we enter this machine intelligence era? What's your advice to the traditional companies on how they close that gap? >> All right.
So the question was: the most valuable companies in the world are companies that are well down the path of treating data as an asset; how does everybody else get going? Our observation is, you go back to: what's the value proposition? What actions are most important? What data is necessary to perform those actions? Can changing the way the data is orchestrated, organized, and put together change the cost of performing that work by changing the transaction costs? Can you introduce a new service along the same lines, and then architect your infrastructure and your business to make sure that the data is near the action, in time for the action to be absolute genius to your customer? So it's a relatively simple thought process. That's how Amazon thought; Apple increasingly thinks like that, where they design the experience and then ask what data is necessary to deliver that experience. That's a simple approach, but it works. Yes, sir. >> Audience Member: With the slide that you had a few slides ago, the market share, the big spenders, you asked the question, do any of us doubt that cloud is the future? I'm with Snowflake. I don't see many of those large vendors in the cloud, and I was wondering if you could speak to what you are seeing in terms of emerging vendors in that space. >> What a great question. So the question was: when you look at the companies that are catalyzing a lot of the change, you don't see a lot of the big companies in the leadership. And someone from Snowflake just asked, well, who's going to lead it? That's a big question that has a lot of implications, but at this point in time it's very clear that the big companies are suffering a bit from the old, from the old, trying to remember what the... RCA syndrome. I think Clay Christensen talked about this. You know, the innovator's dilemma. So RCA was one of the early players; they held a lot of the original patents around the transistor. They put that incredible new technology, back in the forties and fifties, under the control of the people who ran the vacuum tube business. When was the last time anybody bought RCA stock? The same problem exists today. Now, how is that going to play out? Are we going to see, as we've always seen, a lot of new vendors emerge out of this industry and grow into big vendors with IPO-related exits to scale their business? Or are we going to see a whole bunch of gobbling up? That's what I'm not clear on, but it's pretty clear at this point in time that a lot of the technology, a lot of the science, is being done in smaller places. The moderating feature of that is the services side. Because there are limited groupings of expertise, and the companies that are able to attract that expertise today, the Googles, the Facebooks, the AWSs, the Amazons, are doing so in support of a particular service. IBM and others are trying to attract that talent so they can apply it to customer problems. We'll see over the next few years whether the IBMs and the Accentures and the big service providers are able to attract the kind of talent necessary to diffuse that knowledge into the industry faster. So it's the rate at which the idea of internet-scale computing, the idea of big data being applied to business problems, can diffuse into the marketplace through services. If it can diffuse faster, that will have an accelerating impact for smaller vendors, as it has in the past.
But it may also, again, have a moderating impact, because a lot of that expertise that comes out of IBM, IBM is going to find ways to drive into product faster than it ever has before. So it's a complicated answer, but that's our thinking at this point in time. >> Dave: Can I add to that? >> Yeah. (audience member speaking faintly) >> I think that's true now, but I think the real question, not to argue with Dave, but this is part of what we do. The real question is: how is that knowledge going to diffuse into the enterprise broadly? Because Airbnb, I doubt, is going to get into the business of providing services. (audience member speaking faintly) So I think the whole concept of community, partnership, ecosystem is going to remain very important, as it always has, and we'll see how fast that diffusion of knowledge into customer problems actually occurs through the service companies dedicated to it. Our expectation is that as the tooling gets better, we will see more people able to present themselves as truly capable of doing this, and that will accelerate the process. But the next few years are going to be really turbulent, and we'll see which way it actually ends up going. (audience member speaking faintly) >> Audience Member: So I'm with IBM. So I can tell you 100% for sure that we are. I hired literally 50 data scientists in the last three months to go out and do exactly what you're saying: sit down with clients and help them figure out how to do data science in the enterprise. And so we are in fact scaling it. We're getting people that have done this at Google, Facebook. Not a whole lot of those, 'cause we want to do it with people that have actually done it in legacy Fortune 500 companies, right? Because there's a little bit of a difference there. >> So. >> Audience Member: So we are doing exactly what you said, and Microsoft is doing the same thing, Amazon is actually doing the same thing too, Domino Data Lab. >> They don't like talking about it too much, but they're doing it. >> Audience Member: But all the big players from the data science platform game are doing this at a different scale. >> Exactly. >> Audience Member: IBM is doing it on a much bigger scale than anyone else. >> And that will have an impact on how the market ultimately gets structured and who the winners end up being. >> Audience Member: To add to that, a lot of people thought, you mentioned the Red Hat of big data, a lot of people thought Cloudera was going to be the Red Hat of big data, and look at what's happened to their business. (background noise drowns out other sounds) They're getting surrounded by the cloud. We look at, like, how can we get closer to companies like AWS? That was a wild card that wasn't expected. >> Yeah, but look, at the end of the day, Red Hat isn't even the Red Hat of open source. So the bottom line is, the thing to focus on is how this knowledge is going to diffuse. That's the thing to focus on. And there are a lot of different ways; some of it's going to diffuse through tools. If it diffuses through tools, it increases the likelihood that we'll have more people capable of doing this, and IBM and others can hire more, Citibank can hire more. That's an important participant, an important play. So that says something, but it also says we're going to see more of the packaged applications emerge, because that facilitates the diffusion.
Nobody knows exactly the shape it's going to take. But that's the centerpiece of our big data research: how is that diffusion process going to happen and accelerate, what's the resulting structure going to look like, and ultimately, how are enterprises going to create value with whatever results? Yes, sir. (audience member asks question faintly) So the recap question is: you see more people coming in and promising the moon but being incapable of delivering, partly because the technology is uncertain and for other reasons. So here's our approach. Or here's our observation. We actually did a fair amount of research on this. Take a look at what we call an approach to doing big data that's optimized for the cost of procurement, i.e., let's get the simplest combination of infrastructure, the simplest combination of open-source software, the simplest contracting, to create that proof of concept. You can stand things up very quickly if you have enough expertise, and you can create that proof of concept, but the process of turning that into an actual production system extends dramatically. And that's one of the reasons why the Clouderas did not take over the universe. There are other reasons. As George Gilbert's research has pointed out, Cloudera is spending 53, 55% of their money right now just integrating all the stuff that they bought into the distribution five years ago. Which is a real great recipe for creating customer value. The bottom line, though, is that if we focus on the time to value in production, we end up taking a different path. We don't focus as much on whether the hardware is going to work, and whether the network is going to work, and whether the storage can be integrated, and how it's going to impact the database, and what that's going to mean to our Oracle license pool, and all the other things that people tend to think about if they're focused on the technology. And so, as a consequence, you get better time to value if you focus on bringing the domain expertise, working with the right partner, working with the appropriate approach, to go from: what's the value proposition, what actions are associated with that value proposition, what data is needed to perform those actions, how can I take transaction costs out of performing those actions, where does the data need to be, what infrastructure do I require? So we have to focus on time to value, not time to procure. And that's not what a lot of professional IT-oriented people are doing, because many of them, I hate to say it, still acquire new technology with the promise of helping the business but with a stronger focus on what it's going to mean to their careers. All right, I want to be really respectful of everybody's time. The keynotes start in about five minutes, which means you've just got time. If you want to stay, feel free to stay. We'll be here, we'll be happy to talk, but I think that's pretty much going to close our presentation broadcast. Thank you very much for being an attentive audience, and I hope you found this useful. (upbeat music)

Published Date : Mar 9 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Volante | PERSON | 0.99+
Marc Andreessen | PERSON | 0.99+
Dave | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
Neil | PERSON | 0.99+
Facebook | ORGANIZATION | 0.99+
Sam Kahane | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Neil Raden | PERSON | 0.99+
2017 | DATE | 0.99+
John Greco | PERSON | 0.99+
Citibank | ORGANIZATION | 0.99+
Greg Terrio | PERSON | 0.99+
China | LOCATION | 0.99+
David Volante | PERSON | 0.99+
Apple | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Clay Christensen | PERSON | 0.99+
David | PERSON | 0.99+
Sears Roebuck | ORGANIZATION | 0.99+
100% | QUANTITY | 0.99+
Domino Data Lab | ORGANIZATION | 0.99+
Peter Drucker | PERSON | 0.99+
US | LOCATION | 0.99+
Amazons | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
11% | QUANTITY | 0.99+
George Gilbert | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
68% | QUANTITY | 0.99+
millions | QUANTITY | 0.99+
53, 55 % | QUANTITY | 0.99+
60% | QUANTITY | 0.99+
Peter Burris | PERSON | 0.99+
Facebooks | ORGANIZATION | 0.99+
103 billion | QUANTITY | 0.99+
Googles | ORGANIZATION | 0.99+
second part | QUANTITY | 0.99+
second point | QUANTITY | 0.99+
IBMs | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
AWSs | ORGANIZATION | 0.99+
Accentures | ORGANIZATION | 0.99+
Hadoop | TITLE | 0.99+
One | QUANTITY | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
Snowflake | ORGANIZATION | 0.99+
four | QUANTITY | 0.99+
Hundred | QUANTITY | 0.99+
Transwarp | ORGANIZATION | 0.99+
Mellanox | ORGANIZATION | 0.99+
tens of millions | QUANTITY | 0.99+
three things | QUANTITY | 0.99+
Micron | ORGANIZATION | 0.99+
50 data scientists | QUANTITY | 0.99+
First | QUANTITY | 0.99+
yesterday | DATE | 0.99+
three times | QUANTITY | 0.99+
103 billion dollars | QUANTITY | 0.99+
Red Hat | TITLE | 0.99+
first bullet | QUANTITY | 0.99+
Two | QUANTITY | 0.99+
Airbnb | ORGANIZATION | 0.99+
Secondly | QUANTITY | 0.99+
five years | QUANTITY | 0.98+
one | QUANTITY | 0.98+
both | QUANTITY | 0.98+
hundreds of millions | QUANTITY | 0.98+
first | QUANTITY | 0.98+

Jacques Nadeau, Dremio | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to Big Data SV in San Jose. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante, and this is day two of our wall-to-wall coverage. We've been here most of the week, had a great event last night; about 50 or 60 of our CUBE community members were here. We had a breakfast this morning where the Wikibon research team laid out its big data forecast, the eighth big data forecast and report that we've put out, so check that out online. Jacques Nadeau is here. He is the CTO and co-founder of Dremio. Jacques, welcome to theCUBE, thanks for coming on. >> Thanks for having me here. >> So we were talking a little bit about what you guys do. Three-year-old company. Well, let me start. Why did you co-found Dremio? >> So, it was a very simple thing I saw. Over the last ten years or so, we saw a regression in the ability for people to get at data. So you see all these really cool technologies that came out to store data: data lakes, you know, SQL systems, all these different things that make developers very agile with data. But what we were also seeing was a regression in the ability for analysts and data consumers to get at that data, because the systems weren't designed for analysts; they were designed for data producers and developers. And we said, you know what, there needs to be a way to solve this. We need to be able to empower people to be self-sufficient again at the data consumption layer. >> Okay, so you solve that problem how? You called it a self-service data platform. >> Yeah, yeah, so a self-service data platform, and the idea is pretty simple. It's that, no matter where the data is physically, people should be able to interact with a logical view of it. And so, we talk a little bit like it's Google Docs for your data. So people can go into the system, see the different data sets that are available to them, collaborate around those, create changes to those that they can then share with other people in the organization, always dealing with the logical layer. And then, behind the scenes, we have physical capabilities to interact with all the different systems we connect to. But that's something that business users shouldn't have to think as much about, and so, if you think about how people interact with data today, it's very much about copies. Every time you want to do something, typically you're going to make a copy. I want to reshape the data, I make a copy. I want to make it go faster, I make a copy. And those copies are very, very difficult for people to manage, and they mix the business meaning of data with the physical concerns, making copies to make things faster or whatever. And so our perspective is that, if you can separate the physical concerns from the logical, then business users have a much better likelihood of being able to do something self-service.
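As a rough illustration of the "logical view, no copies" idea described above, consider the following minimal Python sketch: a virtual dataset is just a name plus a transformation over a physical source (or over another virtual dataset), evaluated on demand rather than materialized as another copy. The names and structure are hypothetical, chosen for the example; they are not Dremio's internals or API.

```python
class VirtualDataset:
    """A logical view: a name plus a transformation over a source.
    Nothing is copied; reading evaluates against the live source."""

    def __init__(self, name, source, transform=lambda rows: rows):
        self.name = name
        self.source = source          # "physical" data or another view
        self.transform = transform

    def rows(self):
        # A plain list stands in for a physical table; a VirtualDataset
        # source is evaluated recursively, so views stack logically.
        base = self.source.rows() if hasattr(self.source, "rows") \
            else self.source
        return self.transform(base)


if __name__ == "__main__":
    # "Physical" data, standing in for a table in some source system.
    orders = [
        {"region": "west", "amount": 120},
        {"region": "east", "amount": 80},
        {"region": "west", "amount": 200},
    ]
    west = VirtualDataset(
        "west_orders", orders,
        lambda rows: [r for r in rows if r["region"] == "west"])
    big_west = VirtualDataset(
        "big_west_orders", west,
        lambda rows: [r for r in rows if r["amount"] > 150])
    print(big_west.rows())   # [{'region': 'west', 'amount': 200}]
```

In this framing, the "make it go faster" copy becomes a system-managed cache of `rows()` rather than an analyst-managed duplicate, which is roughly the role the reflections mentioned below appear to play.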
But then, on top of that, expose a very, sort of a very user-friendly interface that allows people to catalog and understand the different things, you know, search for things that they want to interact with, and then curate things, even if they're non-technical users, right? So the goal is, if you talk to even the large internet companies in the Valley, it's very hard to hire the amount of data engineering that you need to satisfy all the requests of your end-users of data. And so the goal of Dremio is basically to figure out different tools that can provide a non-technical experience for getting at the data. So that's sort of the start of it, but then the second step is, once you've got access to this thing and people can collaborate and deal with the data, then you've got these huge volumes of data, right? It's big data, and so how do you make that go faster? And then we have some components that deal with, sort of, speed and acceleration. >> So maybe talk about how people are leveraging this capability, this platform. What's the business impact? What have you seen there? >> So a lot of people have this problem, which is: they have data all over the place, and they're trying to figure out, "How do I expose this to my end-users?" And those end-users might be analysts, they might be data scientists, they might be product managers that are trying to figure out how their product is working. And so, what they're doing today is typically trying to build systems internally to provide these capabilities. So, for example, we're working with a large auto manufacturer. They have huge amounts of data across all sorts of different parts of the organization, and they've got a big initiative to make that data available to different data consumers. Now, of course, there's a bunch of security concerns that you need to have around that, but they just want to make the data more accessible. And so, what they're doing is using Dremio to basically catalog all the data below, expose that to the different users, apply lots of different security rules around that, and then create a bunch of reflections, which make things go faster as people are interacting with them. >> Well, what about the governance factor? I mean, you heard this in the Hadoop world years ago: "We're going to harden Hadoop," and really, there was no governance, and it became more and more important. How do you guys handle that? Do you partner with people? Is it up to the customer to figure that out? Do you provide that? >> It's several different things, right? It's a complex ecosystem, right? So it's a combination of things. You start with partnering with different systems to make sure that you integrate well with those things. So, the different things that control some parts of credentials inside the systems, all the way down to "What are the file system permissions?", right? "What are the permissions inside of something like Hive and the metastore there?" And then other systems on top of that, like Sentry or Ranger, are also exposing different credentialing, right? And so we work hard to integrate with those things. On top of that, Dremio also provides a full security model inside of the sort of virtual space that we work in.
And so people can control the permissions, the ability to access or edit any object inside of Dremio, based on user roles and LDAP and those kinds of things. So it's kind of multiple layers that have to be working together. >> And tell me more about the company. So founded three years ago, I think a couple of raises, >> Yep >> who's backing you? >> Yeah, yeah, yeah, so we founded just under three years ago. We had two great initial investors in Redpoint and Lightspeed, and we raised about 15 million on that round. And then we actually just closed a B round in January of this year, and we added Norwest to the portfolio there. >> Awesome, so you're now in the mode of, I mean, they always say, you know, software is such a capital-efficient business, but you see software companies raising, you know, 900 million dollars and so, presumably, that's to compete, to go to market and, you know, differentiate with your messaging and branding. Is that sort of the phase that you're in now? You've kind of developed a product, it's technically sound, it's proven in the marketplace, and now you're scaling the go-to-market, is that right? >> That's exactly right. So we've had a lot of early successes, a lot of Fortune 100 companies using Dremio today. For example, we're working with TransUnion. We're working with Intel. We actually have a great relationship with OVH, which is the third-largest hosting company in the world, so a lot of great companies; Daimler is another one. So working with a lot of great companies, seeing sort of great early success with the product with those companies, and really looking to say "Hey, we're out here." We've got a booth for the first time at Strata here, and we're sort of letting people know about, sort of, a better way, or easier way, for people to deal with data >> Yeah. >> A happier way. >> I mean, it's a crowded space, right? There's a lot of tools out there, a lot of companies. I'm interested in how you sort of differentiate. Obviously simplification is a part of that, the breadth of your capabilities. But maybe, in your words, you could share with me how you differentiate from the competition and how you break out from the noise. >> Yeah, yeah, yeah, you're absolutely right, it's a very crowded space. Everybody's using the same words, and that makes it very hard for people to understand what's going on. And so, what we've found is very simple: typically, in the first meeting we have with a customer, within the first 10 minutes we'll demo the product. Because so many technologies are just technologies, not products, and so you have to figure out how to use the product. You've got to figure out how you would customize it for your certain use-case. And what we've found with our product is, by making it very, very simple, the light goes on in a very short amount of time, and so we also do things on our website so that you can see, in a couple of minutes, or even less than that, little animations that sort of give you a sense of what it's about. But really, it's just "Hey, this is a product." There's this light bulb that goes on, it's great. And you figure this out over the course of working with different customers, right?
But there's this light bulb that goes on for people that are so confused by all the things that are going on, and if we can just sit down with them, show them the product for a few minutes, all of a sudden they're like "Wait a minute, I can use this," right? So you're frequently talking to buyers that are not the most technical parts of the organization initially, and so most of the technologies they look at are technologies that are very difficult to understand, and they have to look to others to try to even understand how it would fit into their architecture. With Dremio, we have customers that have installed it, gotten up and running, and within an hour or two started to see real value. And that sort of excitement happens even in the demo, with most people. >> So you kind of have this bifurcated market. Since the big data meme, everybody says they're data-driven, and you've got a bifurcated market in that you've got the companies that are data-driven and you've got companies who say they're data-driven but really aren't. Who are your customers? Are they in both? Are they predominantly in the data-driven side? Are they predominantly in the trying-to-be data-driven? >> Well, I would say that they all would say that they're data-driven. >> Yeah, everyone, who's going to say "Well, we're not data-driven." >> Yeah, yeah, yeah. So I would say >> We're dead. >> I would say that everybody has data, and they've got some ways that they're using it well and other places where they feel like they're not using it as well as they should. And so, I mean, the reason that we exist is to make it so it's easier for people to get value out of data, and so, if they were getting all the value they think they could get out of data, then we probably wouldn't exist and they would be fully data-driven. So I think that everybody, it's a journey, and people are responding well to us, in part, because we're helping them down that journey. >> Well, the reason I asked that question is that we go to a lot of shows and everybody likes to throw out the digital transformation buzzword and then use Uber and Airbnb as an example, but if you dig deeper, you see that data is at the core of those companies, and they're now beginning to apply machine intelligence and they're leveraging all this data that they've built up, this data architecture that they built up over the last five or 10 years. And then you've got this set of companies where all the data lives in silos, and I can see you guys being able to help them. At the same time, I can see you helping the disruptors, so how do you see that? I mean, in terms of your role, in terms of affecting either digital transformations or digital disruptions. >> Well, I'd say that in either case, so we believe in a very sort of simple thing, which is that, so going back to what I said at the beginning, which is just that I see this regression in terms of data access, right? And so what happens is that, if you have a tightly-coupled system between two layers, then it becomes very difficult for people to sort of accommodate two different sets of needs. And so, the change over the last 10 years was the rise of the developer as the primary person for controlling data, and that brought a huge number of great things with it, but analysis was not one of them. And there are tools that try to make that better, but that's really the problem. And so our belief is very simple, which is that a new tier needs to be introduced between the consumers and the producers of data.
And so that tier may interact with different systems, it may be more complex or whatever, for certain organizations, but the tier is necessary in all organizations, because the analysts shouldn't be shaken around every time the developers change how they're doing data. >> Great. John Furrier has a saying that "Data is the new development kit," you know. He said that, I don't know, eight years ago, and it's really kind of turned out to be the case. Jacques Nadeau, thanks very much for coming on theCUBE. Really appreciate your time. >> Yeah. >> Great to meet you. Good luck and keep us informed, please. >> Yes, thanks so much for your time, I've enjoyed it. >> You're welcome. Alright, thanks for watching everybody. This is theCUBE. We're live from Big Data SV. We'll be right back. (bright music)
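To make the logical-versus-physical split Jacques describes concrete, here is a minimal sketch in plain SQL. The table, schema, and column names are invented, and this is generic SQL rather than Dremio's actual syntax; it only illustrates the pattern of analysts querying a curated logical view while the physical data stays where it is.

    -- Physical layer: raw events as landed by data producers.
    CREATE TABLE raw_orders (
        order_id     BIGINT,
        cust_id      BIGINT,
        order_ts     TIMESTAMP,
        amount_cents BIGINT
    );

    -- Logical layer: a curated, business-friendly view. Renames, casts,
    -- and filters live here, so upstream changes are absorbed in one
    -- place instead of breaking every analyst's query.
    CREATE SCHEMA curated;
    CREATE VIEW curated.orders AS
    SELECT
        order_id,
        cust_id              AS customer_id,
        order_ts             AS ordered_at,
        amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    WHERE amount_cents > 0;  -- drop known-bad records

    -- Analysts work only against the logical view:
    SELECT customer_id, SUM(amount_usd) AS revenue
    FROM curated.orders
    GROUP BY customer_id;

In Dremio terms the curated object would be a virtual dataset, and the reflections mentioned in the interview are materializations the engine maintains behind such views to accelerate queries against them; nothing above should be read as Dremio's literal DDL.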

Published Date : Mar 9 2018


Steve Wilkes, Striim | Big Data SV 2018


 

>> Narrator: Live from San Jose it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. (upbeat music) >> Welcome back to San Jose everybody, this is theCUBE, the leader in live tech coverage, and you're watching BigData SV, my name is Dave Vellante. In the early days of Hadoop everything was batch oriented. About four or five years ago the market really started to focus on real time and streaming analytics to try to really help companies affect outcomes while things were still in motion. Steve Wilkes is here, he's the co-founder and CTO of a company called Striim, a firm that's been in this business for around six years. Steve, welcome to theCUBE, good to see you. Thanks for coming on. >> Thanks Dave, it's a pleasure to be here. >> So tell us more about that. You started about six years ago, a little bit before the market really started talking about real time and streaming. So what led you to that conclusion, that you should co-found Striim way ahead of its time? >> It's partly our heritage. So the four of us that founded Striim, we were executives at GoldenGate Software. In fact our CEO Ali Kutay was the CEO of GoldenGate Software. So when we were acquired by Oracle in 2009, after having to work for Oracle for a couple years, we were trying to work out what to do next. And GoldenGate was replication software, right? So it's moving data from one place to another. But customers would ask us in customer advisory boards: that data seems valuable, it's moving. Can you look at it while it's moving and analyze it while it's moving, get value out of that moving data? And so that was kind of set in our heads. And then we were thinking about what to do next, that was kind of the genesis of the idea. So the concept around Striim when we first started the company was we can't just give people streaming data, we need to give them the ability to process that data, analyze it, visualize it, play with it and really truly understand the data. As well as being able to collect it and move it somewhere else. And so the goal from day one was always to build a full end-to-end platform that did everything customers needed to do for streaming integration and analytics out of the box. And that's what we've done after six years. >> I've got to ask a really basic question, so you're talking about your experience at GoldenGate moving data from point a to point b and somebody said well why don't we put that to work. But was that change data or was it static data? Why couldn't I just analyze it in place? >> GoldenGate works on change data. >> Okay so that's why, there were changes going through. Why wait until it hits its target, let's do some work in real time and learn from that, get greater productivity. And now you guys have taken that to a new level. That new level being what? Modern tools, modern technologies? >> A platform built from the ground up to be inherently distributed, scalable, reliable, with exactly-once processing guarantees. And to be a complete end-to-end platform. There's a recognition that the first part of being able to do streaming data integration or analytics is that you need to be able to collect the data, right? And while change data capture from databases is the way to get data out of databases in a streaming fashion, you also have to deal with files and devices and message queues and anywhere else the data can reside. So you need a large number of different data collectors that all turn the enterprise data sources into streaming data.
And similarly if you want to store data somewhere you need a large collection of target adapters that deliver to things not just on-premises but also in the cloud. So things like Amazon S3 or the cloud databases like Redshift and Google BigQuery. So the idea was really that we wanted to give customers everything they need, and that everything they need isn't trivial. It's not just, well we take Apache Kafka and then we stuff things into it and then we take things out. Pretty often, for example, you need to be able to enrich data, and that means you need to be able to join streaming data with additional context information, reference data. And that reference data may come from a database or from files or somewhere else. So you can't call out to the database and maintain the speeds of streaming data. We have customers that are doing hundreds of thousands of events per second. So you can't call out to a database for every event and ask for records to enrich it with. And you can't even do that with an external cache because it's just not fast enough. So we built in an in-memory data grid as part of our platform. So you can join streaming data with the context information in real time without slowing anything down. So when you're thinking about doing streaming integration, it's more than just moving data around. It's the ability to process it and get it in the right form, to be able to analyze it, to be able to do things like complex event processing on that data. And also being able to visualize it and play with it is an essential part of the whole platform. >> So I wanted to ask you about end-to-end. I've seen a lot of products from larger, maybe legacy companies that will say it's end-to-end but what it really is, is cobbled-together pieces that they bought in and then, this is our end-to-end platform, but it's not unified. Or I've seen others: "Well, we've got an end-to-end platform." Oh really, can I see the visualization? "Well, we don't have visualization, we use this third party for visualization." So convince me that you're end-to-end. >> So with our platform, when you start with it you go into a UI, you can start building data flows. Those data flows start from connectors; we have all the connectors that you need to get your enterprise data. We have wizards to help you build those. And so now you have a data stream. Now you want to start processing that; we have SQL-based processing so you can do everything from filtering, transformation, aggregation, enrichment of data. If you want to load reference data into memory you use a cache component to drag that in, configure that. You now have data in-memory you can join with your streams. If you want to now take the results of all that processing and write it somewhere, use one of our target connectors, drag that in, so you've got a data flow that's getting bigger and bigger, doing more and more processing. So now you're writing some of that data out to Kafka; oh, I'm going to also add in another target adapter, write some of it into Azure Blob Storage, and some of it's going to Amazon Redshift. So now you have a much bigger data flow. But now you say okay well I also want to do some analytics on that. So you take the data stream, you build another data flow that is doing some aggregation over a window, maybe some complex event processing, and then you use the dashboard builder to build a dashboard to visualize all of that. And that's all in one product. So it literally is everything you need to get value immediately.
And you're right, the big vendors, they have multiple different products, and they're very happy to sell you consulting to put them all together. Even if you're trying to build this from open source, and you know, organizations try and do that, you need five or six major pieces of open source, a lot of supporting libraries, and a huge team of developers to just build a platform that you can start to build applications on. And most organizations aren't software platform companies, they're finance companies, oil and gas companies, healthcare companies. And they really want to focus on solving business problems and not on reinventing the wheel by building a software platform. So we can just go in there and say look: value immediately. And that really, really helps. >> So what are some of your favorite use cases, examples, maybe customer examples that you can share with me? >> So one of the great examples, one of my customers, they have a lot of data in an HP NonStop system. And they needed to be able to get visibility into that immediately. And this was like order processing, supply chain, ERP data. And it would've taken a very large amount of time to do analytics directly on the HP NonStop. And finding resources to do that is hard as well. So they needed to get the data out and they needed to get it into the appropriate place. And they recognized that you use the right technology to ask the right question. So they wanted some of it in Hadoop so they could do some machine learning on that. They wanted some of it to go into Kafka so they could get real time analytics. And they wanted some of it to go into HBase so they could query it immediately and use that for reference purposes. So they utilized us to do change data capture against the HP NonStop, deliver that data stream out immediately into Kafka, and also push some of it into HDFS and some of it into HBase. So they immediately got value out of that, because then they could also build some real-time analytics on it. It would send out alerts if things were taking too long in their order processing system. And it allowed them to get visibility directly into their process that they couldn't get before, with far fewer resources and more modern technologies than they could have used before. So that's one example. >> Can I ask you a question about that? So you talked about Kafka, HBase, you talk about a lot of different open source projects. You've integrated those or you've got entries and exits into those? >> So we ship with Kafka as part of our product. It's an optional messaging bus. So, our platform has two different ways of moving data around. We have a high-speed, in-memory-only message bus, and that works at almost network speed, and it's great for a lot of different use cases. And that is what backs our data streams. So when you build a data flow, you have streams in between each step; that is backed by an in-memory bus. Pretty often though, in use cases, you need to be able to potentially rewind data for recovery purposes or have different applications running at different speeds, and that's where a persistent message bus like Kafka comes in. But you don't want to use a persistent message bus for everything, because it's doing IO and it's slowing things down. So you typically use that at the beginning, at the sources, especially things like IoT where you can't rewind into them. Things like databases and files, you can rewind into them and replay and recover, but IoT sources, you can't do that.
So you would push that into a Kafka-backed stream and then subsequent processing is in-memory. So we have that as part of our product. We also have Elastic as part of our product for results storage. You can switch to other results storage, but that's our default. And we have a few other key components that are part of our product, but then on the periphery, we have adapters that integrate with a lot of the other things that you mentioned. So we have adapters to read and write HDFS, Hive, HBase across Cloudera, Hortonworks, even MapR. So we have the MapR versions of the file system, and MapR Streams and MapR DB. And then there's lots of other more proprietary connectors like CDC from Oracle, and SQL Server, and MySQL and MariaDB. And then database connectors for delivery to virtually any JDBC-compliant database. >> I took you down a tangent before you had a chance. You were going to give us another example. We're pretty much out of time, but if you can briefly share either that or the last word, I'll give it to you. >> I think the last word would be that that is one example. We have lots and lots of other types of use cases that we do, including things like migrating data from on-premises to the cloud, being able to distribute log data, being able to analyze that log data, being able to do in-memory analytics and get real-time insights immediately and send alerts. It's a very comprehensive platform, but each one of those use cases is very easy to develop on its own and you can do them very quickly. And of course as the use case expands within a customer, they build more and more, and so they end up using the same platform for lots of different use cases within the same account. >> And how large is the company? How many people? >> We are around 70 people right now. >> 70 people, and you're looking for funding? What round are you in? Where are you at with funding and revenue and all that stuff? >> Well, I'd have to defer to my CEO for those questions. >> All right, so you've been around for what, six years you said? >> Yeah, we have a number of rounds of funding. We had initial seed funding, then we had the investment by Summit Partners that carried us through for a while. Then subsequent investment from Intel Capital, Dell EMC, Atlantic Bridge. And that's where we are right now. >> Good, excellent. Steve, thanks so much for coming on theCUBE, really appreciate your time. >> Great, it's awesome. Thank you Dave. >> Great to meet you. All right, keep it right there everybody, we'll be back with our next guest. This is theCUBE. We're live from BigData SV in San Jose. We'll be right back. (techno music)
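The enrichment step Steve describes (joining a high-rate event stream to reference data held in memory rather than calling out to a database per event) has a simple logical shape. The sketch below shows that shape in ordinary SQL with invented table and column names; Striim's own SQL-based language runs this as a continuous query against its in-memory data grid, so treat this as an illustration of the join, not Striim's actual syntax.

    -- Reference data, loaded once into memory (the cache component).
    CREATE TABLE customer_ref (
        customer_id  BIGINT PRIMARY KEY,
        segment      VARCHAR(32),
        credit_limit BIGINT
    );

    -- Incoming change-data-capture events, one row per captured change.
    CREATE TABLE order_events (
        event_ts     TIMESTAMP,
        customer_id  BIGINT,
        amount_cents BIGINT
    );

    -- The enrichment: each streaming event picks up its reference
    -- attributes before delivery to a target such as Kafka or S3.
    SELECT
        e.event_ts,
        e.customer_id,
        c.segment,
        e.amount_cents / 100.0 AS amount_usd
    FROM order_events e
    JOIN customer_ref c ON c.customer_id = e.customer_id
    WHERE e.amount_cents > 0;

Because the lookup side of the join is served from memory, the per-event cost is a hash probe rather than a network round trip, which is what makes the hundreds of thousands of events per second mentioned above feasible.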

Published Date : Mar 9 2018


Praveen Kankariya, Impetus | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media, and its ecosystem partners. (electronica flourish) >> We're back at Big Data SV. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante. Praveen Kankariya is here. He's the CEO of a company called Impetus. The company's been around the Big Data space before Hadoop, even. Praveen, thanks for coming back to theCUBE, good to see you. >> Thank you, Dave. >> So, as I said in the open, you've seen a lot. You really got into the Big Data space in 2007, seen it blow through the Hadoop, you know, sort of batch world into the real time world, seen the data management headwinds. From your perspective, you know, what kind of problems are you solving today in the Big Data world? >> So I can go into the details of what we are doing, but at a high level, we are helping companies converge to a singular, enterprise-wide data model. 'Cause I think that is a crisis in the Fortune 500 today, and there'll be haves and have-nots. >> Dave: What do you mean a crisis? >> I routinely run into companies who do not have their data model stitched. So they know the same customer, they know me by five different handles, and they don't have it figured out, that I'm the same guy. So, that I think is a major problem. So I think the C-suite, they would not like to hear this, but they are flying partially blind. >> I have a theory on this, but I want to hear yours-- >> Sure. >> Why is that such a big problem? >> So, the most efficient business in the world is a one-man business, because everything is flowing in the same brain. The moment you hire your first employee, you start having communication breakdowns. And now these companies have hundreds and thousands of employees. Hundreds of thousands of employees. There's a lot of breakdown. There are airlines that, when I'm upgraded to first class, are offering me an economy-plus seat when I go to check in. That's ... they're turning me off, and they're losing a real opportunity to upsell something else to me. So. >> Okay, well, so let's bring this into the world of digital transformation. Everybody talks about those buzzwords, so let's try to put some sort of meat on that bone. If you look at the top five companies by market cap, Amazon, Apple, Facebook, Google. I'm missing somebody. Anyway, they're big. 500 billion, 700 billion dollars. They're all sort of what we would call data-driven. What does that mean? Data is at the core of their enterprise. A lot of the companies you're talking about, human expertise is the core of their enterprise, and they've got data that's sort of in silos, surrounding it. >> Praveen: Yes, yes. >> Is that an accurate description? >> That's-- And how can you help close that gap? >> So they have data in silos, and even that data in silos is not being used with velocity. That data is, you know, it's taking much longer for them to even clean up that data, get access to that data, derive insights from that data. >> Dave: Right. >> So there's a lot of sluggishness, overall. >> Dave: So how do you help? >> How do we help? Great question. We help in many different ways. So my company provides solutions. We have a few products of our own, and then we work with all kinds of product companies. But we're about solving a problem, so when we engage with customers, we actually solve a problem, so that there's a business outcome before we walk out.
That's the big difference. We're not here to just sell the next sexy platform, or this or that, you know. We're not just here to excite the developers. >> So, maybe you could give me some of your favorite examples of where you've helped some of your clients. >> So there's one fairly large company, it's a household name around the world. And we have helped them create a single source of truth using a Big Data infrastructure. This has about six and a half thousand feeds of data coming in. Some continuously, some every few minutes, every few hours, whatnot. But then all their data is stitched together, and it's got guardrails, there's full governance. And now this platform is available to every business unit, to run their own applications. There's a set of APIs they can go in with and develop their own applications. So shadow IT is being promoted in this environment. It's not being looked down upon. >> So it's not sitting in one box, presumably, it's distributed throughout the organization? >> It is distributed. And you know, as long as you stay within the governance structure, you can derive, you know, if somebody wants a graph database, they can derive a graph database from this massive, fully-connected data set, which is an enterprise-wide data set. >> Don't you see some of the challenges as cultural as well? There are some industries that might say, or some executives that say, "Well, you know my industry, healthcare is an example, really hasn't been disrupted. We're maybe insulated from that." I feel as though that's somewhat risky thinking, and it's easy to maybe sit back and say, "Well, I'm going to wait, see what happens." What are your thoughts on that? >> Look at the data. The week Jeff Bezos announced that he is tying up with JPMC and Warren Buffett, some of the largest healthcare companies, and I'm talking of Fortune 10 companies, they lost about 20% of their market cap that week. So, you don't have to listen to me. Listen to the markets. >> Well, that's true. We see what happens in grocery, see what happens in... We haven't really seen, as I say, the disruption in healthcare, financial services, but it's all data, and that changes the equation. So why, let's see, not why, but how and when. If you get to this, it sounds like step one is to get that sort of single data model across the organization, but there's other steps. You've got to figure out how to monetize the data, not necessarily by selling it, but how data contributes to the monetization of the company. You've got to make it accessible, you've got to make it of high quality, you've got to get the right skill sets. So there's a lot to it, more than just the technology. Maybe you could talk about that. >> So the way, I would like to preach, if I'm allowed to-- >> Dave: Please, it's theCUBE... (laughs) >> No, no, I mean, I don't mean here, but if any CEO was listening to me, what I would like to tell them is, just create a vision of your ultimate connected data model. And then start looking at how you converge on that vision. It may not happen in one day, one week, one year. It's going to take time, and you know, every business is in flight, so they have to operate continuously, but they have to keep gravitating. And the biggest casualty is going to be their customer relationship if they don't do this. Because most companies don't know their customers fully.
I mean, that little example of the airline which was showing me, flashing an ad for economy seats, premium economy seats when I'm already in first class, they don't know me. Some part of that company doesn't know me. So they're not able to service me well. Here they lost an opportunity to monetize, but I think from another perspective, they lost an opportunity to really offer me something which would've made my flight way more comfortable. >> Well. >> So. >> Then you wonder, if that's the dynamic that you encountered, what's the speed to market, the agility of that organization? They're hampered in their ability to, whether it's rolling out new apps, identifying new data sources, creating new products for the customers. What kind of impacts have you seen within your customers? You gave the example before, of that sort of single data model, the single version of the truth. What business impacts have you been able to effect for your customers? >> So, there, I mean I can go on giving you anecdotes from my observations, my front row observations into these companies. >> Yeah, it'd be good to have some kind of proof points, right? Our audience would love to hear that. >> So, you know, there's a company not too far from here. They've stitched every click stream, right to product usage data. To support data, to every marketing email opened. And they can tell who's buying, what happened, what is their support experience, who's upgrading, who's upgrading faster because they had a positive support experience, or not. So everything is tied. Any direction you want to look into your customer space, you can go and get visibility from every perspective you can think of. That's customer 360. We worked with a credit card company where they had a massive rules engine, which had been developed over generations to catch fraud while a transaction's being processed. Once they got all their data together, we could apply a massive machine learning engine. And we started learning from customers' own behavior, so we completely discarded the rules engine, and now we have a learning system which is flagging fraudulent transactions. So they managed to cut down their false positives tremendously, and in turn reduced inconvenience. It used to be embarrassing for me to give out a card and get it declined in front of a customer. >> So, as I said at the top, you've seen sort of the evolution of this whole Big Data meme before it was called Big Data. What are the things that may be exciting you? We seem to be entering a new era we call digital. There's a cognitive era, AI, machine intelligence. What do you see that's exciting, and real? >> So number one, I like to divide this space, the whole space of data analytics, into two parts. There's the data plumbing, which we call data management, and whatnot. I have to plumb all my data together. Only then can I feed this data into my AI models. Now, I can do that in my silos today, but to do it at a global level for my entire corporation, I need it all stitched together. And then, of course, these models are very real. My son, my 22-year-old son, is using TensorFlow for some little startup that he's cooking. And it took him just a month to pick it up and start applying it. So why can't our large companies do so? And in turn, bring down the cost of services, cost of products, the velocity of delivering those things to us, and make life better. >> So, the barriers to technology deployment are getting lower.
>> And this is all feasible, Dave, right now. >> Yeah. >> You know, I mean, this was all a dream 10 years ago. If somebody had said, you know, for an old corporation to stitch all its data, "What're you talking about? It's not going to happen." But now, this is possible, and it's feasible. It's not going to make a massive hole in their budgets. >> But don't you think it's also table stakes to compete over the next 10 years? >> It is, it is table stakes. It's actually kind of late, from my perspective. If I had to go invest in the market, I mean, I would invest in companies who have their data act together. >> Yeah, yeah. So, what's the, how do you tell when a company has its data act together? When you walk into a prospect, how do you know, what do you see, what're the characteristics of somebody who has that act together? >> It's hard for me to give you a few characteristics, but you know, you can tell what is the mandate they're operating under, if there are clear mandates. Because, for most companies, this is lost because of turf battles. This whole battle is lost due to turf issues. And the moment you see senior executives working together, with a massive willingness to bring everything together. You know, they'll have different turfs, and they're willing to contribute data, and bring it together. That's a phenomenally positive sign, because once that happens, then every large company has the wherewithal to go hire 50 data scientists, or work with all kinds of companies, including mine, to get data science help. >> Yeah, it comes back to the culture, doesn't it? >> Yes, absolutely. >> All right, Praveen, we have to leave it right there. Thanks very much for coming back on theCUBE. >> Thank you Dave, thank you. Thank you for the opportunity. >> You're very welcome. All right, keep it right there, everybody. This is theCUBE. We're live from the Forager in San Jose, Big Data SV. We'll be right back. (electronica flourish)
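The stitching Praveen keeps returning to (one customer known by five different handles across silos) reduces, at its simplest, to an identity map that resolves each silo's handle to a single enterprise-wide key. The sketch below shows that shape in plain SQL; every table and column name is invented for illustration, and real identity resolution involves far more matching logic than a lookup table.

    -- An invented identity map: each silo's handle resolves to one key.
    CREATE TABLE identity_map (
        source_system  VARCHAR(32),   -- e.g. 'web', 'support'
        source_handle  VARCHAR(64),   -- the identifier that silo uses
        customer_id    BIGINT         -- the resolved enterprise-wide key
    );

    CREATE TABLE web_clicks (
        handle    VARCHAR(64),
        click_ts  TIMESTAMP,
        url       VARCHAR(256)
    );

    CREATE TABLE support_cases (
        handle     VARCHAR(64),
        opened_ts  TIMESTAMP,
        severity   INT
    );

    -- Activity from two silos, viewed against one customer key:
    SELECT
        m.customer_id,
        COUNT(DISTINCT w.click_ts)  AS web_clicks,
        COUNT(DISTINCT s.opened_ts) AS support_cases
    FROM identity_map m
    LEFT JOIN web_clicks    w ON m.source_system = 'web'     AND w.handle = m.source_handle
    LEFT JOIN support_cases s ON m.source_system = 'support' AND s.handle = m.source_handle
    GROUP BY m.customer_id;

Once such a map exists, the customer-360 view described in the interview (clicks, support cases, marketing opens, all against one customer_id) becomes ordinary joins instead of guesswork across silos.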

Published Date : Mar 9 2018


David Abercrombie, Sharethrough & Michael Nixon, Snowflake | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference, we're right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon, from Snowflake, which is a leading cloud data warehouse. And David Abercrombie from Sharethrough, which is a leading ad tech company. And between the two of them, they're going to tell us some of the most advanced use cases we have now for cloud-native data warehousing. Michael, why don't you start with giving us some context for how, on a cloud platform, one might rethink a data warehouse? >> Yeah, thank you. That's a great question, because let me first answer it from the end-user, business value perspective. When you run a workload on a cloud, there's a certain level of expectation you want out of the cloud. You want scalability, you want unlimited scalability, you want to be able to support all your users, you want to be able to support the data types, whatever they may be, that come into your organization. So, there's a level of expectation that one should expect from a service point of view once you're in a cloud. So, a lot of the technologies that were built up to this point have been optimized for on-premises types of data warehousing, where perhaps that level of service and concurrency and unlimited scalability was not really expected but, guess what? Once it comes to the cloud, it's expected. So those on-premises technologies aren't suitable in the cloud, so for enterprises and, I mean, companies, organizations of all types from finance, banking, manufacturing, ad tech as we'll have today, they want that level of service in the cloud. And so, those technologies will not work, and so it requires a rethinking of how those architectures are built. And it requires being built for the cloud. >> And just to, alright, to break this down and be really concrete, some of the rethinking. We separate compute from storage, which is a familiar pattern that we've learned in the cloud, but we also then have to have this sort of independent elasticity between-- >> Yes. Storage and the compute, and then Snowflake's taken it even a step further, where you can spin out multiple compute clusters. >> Right. >> Tell us how that works and why that's so difficult and unique. >> Yeah, you know, that's taking us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, storage from the compute layer, from the services layer. And that's really important because, as I mentioned before, you want unlimited capacity, unlimited resources. So, if you scale compute in today's world of on-premises MPP, what that really means is that you have to bring the storage along with the compute, because compute is tied to the storage, so when you scale the storage along with the compute, usually that involves a lot of burden on the data warehouse manager, because now they have to redistribute the data, and that means redistributing keys, managing keys if you will. And that's a burden. And in the reverse, if all you wanted to do was increase storage but not the compute, because compute was tied to storage, why would you have to buy these additional compute nodes, and add to the cost, when, in fact, all you really wanted to pay for was additional storage?
So, by separating those, you keep them independent, and so you can scale storage apart from compute, and then, once you have your compute resources in place, the virtual warehouses that you're talking about that have completed the job, you spun them up, it's done its job, and you take it down, guess what? You can release those resources, and of course, in releasing those resources, basically you can cut your cost as well because, for us, it's pure usage-based pricing. You only pay for what you use, and that's really fantastic. >> Very different from the on-prem model where, as you were saying, tied compute and storage together, so. >> Yeah, let's think about what that means architecturally, right? So if you have an on-premises data warehouse, and you want to scale your capacity, chances are you'll have to have that hardware in place already. And having that hardware in place already means you're paying that expense, so you may pay for that expense six months prior to needing it. Let's take a retailer example. >> Yeah. >> You're gearing up for a peak season, which might be Christmas, and so you put that hardware in place sometime in June. You'll always put it in in advance because, why? You have to bring up the environment, so you have to allow time for implementation or, if you will, deployment to make sure everything is operational. >> Okay. >> And then what happens is, when that peak period comes, you can't expand beyond that capacity. But what happens once that peak period is over? You paid for that hardware, but you don't really need it. So, our vision is, or the vision we believe you should have when you move workloads to the cloud is, you pay for those when you need them. >> Okay, so now, David, help us understand, first, what was the business problem you were trying to solve? And why was Snowflake, you know, sort of uniquely suited for that? >> Well, let me talk a little bit about Sharethrough. We're ad tech; at the core of our business we run an ad exchange, where we're doing programmatic trading with the bids, with the real-time bidding spec. The data is very high in volume, with 12 billion impressions a month; that's a lot of bids that we have to process, a lot of bid requests. The way it operates, the bids and the bid responses in programmatic trading are encoded in JSONs, so our ad exchange is basically exchanging messages in JSON with our business partners. And the JSONs are very complicated; there's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. Well, this data is very complicated, very high-volume. And in advertising, like any business, we really need to have good analytics to understand how our business is operating, how our publishers are doing, how our advertisers are doing. And it all depends upon this very high-volume, very complex JSON event data stream. So, Snowflake was able to ingest our high-volume data very gracefully. The JSON parsing techniques of Snowflake allow me to expose the complicated data structure in a way that's very transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do an analysis. And now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries; literally, an afternoon analysis is now a five-minute query. >> So, let me, as I'm listening to you describe this.
We've had various vendors telling us about these workflows in the sort of data prep and data science tool chain. It almost sounds to me like Snowflake is taking semi-structured or complex data and sort of unraveling it and, normalizing is kind of an overloaded term, but making it business-ready, so you don't need as much of that manual data prep. >> Yeah, exactly, you don't need as much manual data prep, or you don't need as much expertise. For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot path notation, or expanding nested objects, are very expressive, very powerful, but still your typical analyst or your BI tool certainly wouldn't know how to do that. So, in Snowflake, we sort of have our cake and eat it too. We can have our JSONs with their full richness in our database, but yet we can simplify and expose the data elements that are needed for analysis, so that an analyst, their first day on the job, can get right to work and start writing queries. >> So let me ask you a little more about the programmatic ad use case. So if you have billions of impressions per month, I'm guessing that means you have quite a few times more, in terms of bids, and then there's the, you know, once you have, I guess, a successful one, you want to track what happens. >> Correct. >> So tell us a little more about that, what that workload looks like, in terms of what analytics you're trying to perform, what you're tracking. >> Yeah, well, you're right. There's different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all that data, the responses come back, we track that, we track our decisions and why we selected the bidder. And then, once the ad is shown, of course there's various beacons and tracking things that fire. We have to track all of that data, and the only way we could make sense out of our business is by bringing all that data together. And in a way that is reliable, transparent, and visible, and also has data integrity; that's another thing I like about the Snowflake database, it's a good old-fashioned SQL database where I can declare my primary keys, I can run QC checks, I can ensure the high data integrity that is demanded by BI and other sorts of analytics. >> What would be, as you continue to push the boundaries of the ad tech service, what's some functionality that you're looking to add, with Snowflake as your partner? Either that's in there now and you still need to take advantage of it, or things that you're looking to in the future? >> Well, moving forward, of course, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing; there's always new ways of bidding, new products that are being developed, new ways for the ad ecosystem to work. And so, as we roll those out, we need to be able to quickly analyze, you know, "Is this thing working or not?" You know, kind of an agile environment, pivot or prove it. Does this feature work or not? So, having all the data in one place makes possible that very quick assessment of the viability of a new feature, new product. >> And, dropping down a little under the covers for how that works, does that mean, like, you still have the base JSON data that you've absorbed, but you're going to expose it with different schemas or access patterns? >> Yeah, indeed.
For instance, we make use of the SQL schemas, roles, and permissions internally, where we can have the different teams have their own domain of data that they can expose internally. And looking forward, there's the sharehouse feature of Snowflake that we're looking to implement with our partners, where, rather than sending them data, like a daily dump of data, we can give them access to their data in our database through this top layer that Michael mentioned, the service layer, which essentially allows me to create a view and grant select on it to another customer. So I no longer have to send daily data dumps to partners or have some sort of API for getting data. They can simply query the data themselves, so we'll be implementing that feature with our major partners. >> I would be remiss in not asking at a data conference like this, now that there's the tie-in with Qubole and Spark integration and machine learning, is there anything along that front that you're planning to exploit in the near future? >> Well, yeah, at Sharethrough we're very experimental, playful, we're always examining new data technologies and new ways of doing things, but now with Snowflake as sort of our data warehouse of curated data, I've got two petabytes of data with referential integrity, and that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in a business context, in a relational manner. It's clean, good data integrity, reliable, accessible, visible, and it's just plain old SQL. (chuckles) >> That's actually a nice way to sum it up. We've got the integrity that we've come to expect and love from relational databases. We've got the flexibility of machine-oriented data, or JSON. But we don't have to give up the query engine, and now you have more advanced features, analytic features, that you can take advantage of coming down the pipe. >> Yeah, again, we're a modern platform for the modern age, which is basically cloud-based computing. With a platform like Snowflake on the backend, you can now move those workloads that you're accustomed to to the cloud, and have them in the environment that you're familiar with, and it saves you a lot of time and effort. You can focus on more strategic projects. >> Okay, well, with that, we're going to take a short break. This has been George Gilbert. We're with Michael Nixon of Snowflake, and David Abercrombie of Sharethrough, listening to how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. And we'll be back after a short break here at the Strata Data Conference, thanks. (quirky music)
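For readers who want to see what the capabilities discussed above look like in practice, here is a rough sketch in Snowflake SQL covering the three themes of the conversation: an elastic virtual warehouse, JSON queried in place with path notation and FLATTEN, and a secure view exposed through a share instead of a daily data dump. The warehouse, table, database, and account names are all invented, and the JSON shape is assumed; treat this as an illustration rather than Sharethrough's actual setup.

    -- 1. Independent compute: a virtual warehouse that suspends itself
    --    when idle, so you pay only while queries are running.
    CREATE WAREHOUSE analytics_wh
      WITH WAREHOUSE_SIZE = 'MEDIUM'
           AUTO_SUSPEND   = 300     -- seconds idle before suspending
           AUTO_RESUME    = TRUE;

    -- 2. Semi-structured data: bid events land as JSON in a VARIANT
    --    column and are queried directly, with no ETL flattening step.
    CREATE TABLE bid_events (raw VARIANT);

    SELECT
        raw:auction_id::STRING AS auction_id,
        b.value:bidder::STRING AS bidder,
        b.value:price::FLOAT   AS bid_price
    FROM bid_events,
         LATERAL FLATTEN(input => raw:bids) b;

    -- 3. Sharing: expose a curated view to a partner account rather
    --    than shipping daily data dumps.
    CREATE SECURE VIEW partner_metrics AS
        SELECT raw:auction_id::STRING AS auction_id,
               raw:ts::TIMESTAMP      AS event_ts
        FROM bid_events;

    CREATE SHARE partner_share;
    GRANT USAGE ON DATABASE my_db TO SHARE partner_share;
    GRANT USAGE ON SCHEMA my_db.public TO SHARE partner_share;
    GRANT SELECT ON VIEW partner_metrics TO SHARE partner_share;
    ALTER SHARE partner_share ADD ACCOUNTS = partner_account;

The statement forms here follow Snowflake's documented SQL, but the exact grants required depend on where the objects live; the point is that the consumer queries the shared view in place, which is the "no more data dumps" workflow described in the interview.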

Published Date : Mar 9 2018


Satyen Sangani, Alation | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. (upbeat music) >> Welcome back to theCUBE, I'm Lisa Martin with John Furrier. We are covering the second day of our event Big Data SV. We've had some great conversations, John, yesterday, today as well. Really looking at Big Data, digital transformation, Big Data plus data science, lots of opportunity. We're excited to welcome back to theCUBE an alumnus, Satyen Sangani, the co-founder and CEO of Alation. Welcome back! >> Thank you, it's wonderful to be here again. >> So you guys finished up your fiscal year at the end of December 2017; we're in the first quarter of 2018. You guys had some really strong results, really strong momentum. >> Yeah. >> Tell us what's going on at Alation, how are you pulling this momentum through 2018? >> Well, I think we have had an enterprise-focused business historically, because we solve a very complicated problem for very big enterprises, and so, in the last quarter we added customers like American Express, PepsiCo, Roche. And with huge expansions from our existing customers, some of whom, over the course of a year, I think went 12X from an initial base. And so, we found some just incredible momentum in Q4, and for us that was a phenomenal cap to a great year. >> What about the platform you guys are doing? Can you just take a minute to explain what Alation does again, just to refresh where you are on the product side? You mentioned some new accounts, some new use cases. >> Yeah. >> What's the update? Take a minute, talk about the update. >> Absolutely. So, you certainly know, John, but Alation's a data catalog, and a data catalog, essentially, you can think of it as Yelp or Amazon for data and information inside of the enterprise. So if you think about how many different databases there are, how many different reports there are, how many different BI tools there are, how many different APIs there are, how many different algorithms there are, it's pretty dizzying for the average analyst. It's pretty dizzying for the average CIO. It's pretty dizzying for the average chief data officer. And particularly inside of Fortune 500s, where you have hundreds of thousands of databases, you have a situation where people just have too much noise and not enough signal. And so what we do is we provide this Yelp for that information. You can come to Alation as a catalog. You can do a search on revenue 2017. You'll get all of the reports, all of the dashboards, all of the tables, all of the people that you might need to be able to find. And that gives you a single place of reference, so you can understand what you've got and what can answer your questions. >> What's interesting is, first of all, I love data. We're data driven, we're geeks on data. But when I start talking to folks that are outside the geek community or nerd community, you say data and they go, "Oh," because they cringe and they say, "Facebook." They see the data issues there. GDPR, data nightmare, where's it stored, you've got to manage it. And then, people are actually using data, so they're realizing how hard (laughs) it is. >> Yeah. >> How much data do we have? So it's kind of like the trough of disillusionment, if you will. Now they've got to get their hands on it. They've got to put it to work. >> Yeah. >> And they know that. So, it's now becoming really hard (laughs) in their mind. This is business people. >> Yeah. >> They have data everywhere.
How do you guys talk to that customer? Because, if you don't have quality data, if you don't have data you can trust, if you don't have the right people, it's hard to get it going. >> Yeah. >> How do you guys solve that problem and how do you talk to customers? >> So we talk a lot about data literacy. There is a lot of data in this world and that data is just emblematic of all of the stuff that's going on in this world. There's lots of systems, there's lots of complexity and the data, basically, just is about that complexity. Whether it's weblogs, or sensors, or the like. And so, you can either run away from that data, and say, "Look, I'm going to not, "I'm going to bury my head in the sand. "I'm going to be a business. "I'm just going to forget about that data stuff." And that's certainly a way to go. >> John: Yeah. >> It's a way to go away. >> Not a good outlook. >> I was going to say, is that a way of going out of business? >> Or, you can basically train, it's a human resources problem fundamentally. You've got to train your people to understand how to use data, to become data literate. And that's what our software is all about. That's what we're all about as a company. And so, we have a pretty high bar for what we think we do as a business and we're this far into that. Which is, we think we're training people to use data better. How do you learn to think scientifically? How do you go use data to make better decisions? How do you build a data driven culture? Those are the sorts of problems that I'm excited to work on. >> Alright, now take me through how you guys play out in an engagement with the customer. So okay, that's cool, you guys can come in, we're getting data literate, we understand we need to use data. Where are you guys winning? Where are you guys seeing some visibility, both in terms of the traction of the usage of the product, the use cases? Where is it kind of coming together for you guys? >> Yeah, so we literally, we have a mantra. I think any early stage company basically wins because they can focus on doing a couple of things really well. And for us, we basically do three things. We allow people to find data. We allow people to understand the data that they find. And we allow them to trust the data that they see. And so if I have a question, the first place I start is, typically, Google. I'll go there and I'll try to find whatever it is that I'm looking for. Maybe I'm looking for a Mediterranean restaurant on 1st Street in San Jose. If I'm going to go do that, I'm going to do that search and I'm going to find the thing that I'm looking for, and then I'm going to figure out, out of the possible options, which one do I want to go to. And then I'll figure out whether or not the one that has seven ratings is the one that I trust more than the one that has two. Well, data is no different. You're going to have to find the data sets. And inside of companies, there could be 20 different reports and there could be 20 different people who have information, and so you're going to trust those people through having context and understanding. >> So, trust, people, collaboration. You mentioned some big brands that you guys added towards the end of calendar 2017. How do you facilitate these conversations with maybe the chief data officer. As we know, in large enterprises, there's still a lot of ownership over data silos. >> Satyen: Yep. >> What is that conversation like, as you say on your website, "The first data catalog designed for collaboration"? 
How do you help these organizations as large as Coca-Cola understand where all the data are and enable the human resources to extract value, and find it, understand it, and trust it? >> Yeah, so we have a very simple hypothesis, which is, look, people fundamentally have questions. They're fundamentally curious. So, what you need to do as a chief data officer, as a chief information officer, is really figure out how to unlock that curiosity. Start with the most popular data sets. Start with the most popular systems. Start with the business people who have the most curiosity and the most demand for information. And oh, by the way, we can measure that. Which is the magical thing that we do. So we can come in and say, "Look, we look at the logs inside of your systems to know which people are using which data sets, which sources are most popular, which areas are hot." Just like a social network might do. And so, just like you can say, "Okay, these are the trending restaurants," we can say, "These are the trending data sets." And that curiosity allows people to know, what data should I document first? What data should I make available first? What data do I improve the data quality over first? What data do I govern first? And so, in a world where you've got tons of signal, tons of systems, it's totally dizzying to figure out where you should start. But what we do is, we go to these chief data officers and say, "Look, we can give you a tool and a catalyst so that you know where to go, what questions to answer, who to serve first." And you can use that to expand to other groups in the company. >> And this is interesting. A lot of people, you mentioned social networks, use data to optimize for something, and in the case of Facebook, they use my data to target ads for me. You're using data to actually say, "This is how people are using the data." So you're using data for data. (laughs) >> That's right. >> So you're saying-- >> Satyen: We're measuring how you can use data. >> And that's interesting because, I hear a lot of stories like, we bought a tool, we never used it. >> Yep. >> Or people didn't like the UI, it just kind of falls by the wayside. You're looking at it and saying, "Let's get it out there and let's see who's using the data." And then, are you doubling down? What happens? Do I get a little star, do I get a reputation point, am I being flagged to HR as a power user? How are you guys treating that gamification in this way? It's interesting, I mean, what happens? Do I become like-- >> Yeah, so it's funny because, when you think about search, how do you figure out that something's good? So what Google did is, they came along and they said, "We've got PageRank." What we're going to do is we're going to say, "The pages that are the best pages are the ones that people link to most often." Well, we can do the same thing for data. The data sources that are the most useful ones are the ones that are used most often. Now on top of that, you can say, "We're going to have experts put ratings," which we do. And you can say people can contribute knowledge and reviews of how this data set can be used. And people can contribute queries and reports on top of those data sets. And all of that gives you this really rich graph, this rich social graph, so that now when I look at something it doesn't look like Greek.
It looks like, "Oh, well I know Lisa used this data set, "and then John used it "and so at least it must answer some questions "that are really intelligent about the media business "or about the software business. "And so that can be really useful for me "if I have no clue as to what I'm looking at." >> So the problem that you-- >> It's on how you demystify it through the social connections. >> So the problem that you solve, if what I hear you correctly, is that you make it easy to get the data. So there's some ease of use piece of it, >> Yep. >> cataloging. And then as you get people using it, this is where you take the data literacy and go into operationalizing data. >> Satyen: That's right. >> So this seems to be the challenge. So, if I'm a customer and I have a problem, the profile of your target customer or who your customers are, people who need to expand and operationalize data, how would you talk about it? >> Yeah, so it's really interesting. We talk about, one of our customers called us, sort of, the social network for nerds inside of an enterprise. And I think for me that's a compliment. (John laughing) But what I took from that, and when I explained the business of Alation, we start with those individuals who are data literate. The data scientists, the data engineers, the data stewards, the chief data officer. But those people have the knowledge and the context to then explain data to other people inside of that same institution. So in the same way that Facebook started with Harvard, and then went to the rest of the Ivies, and then went to the rest of the top 20 schools, and then ultimately to mom, and dad, and grandma, and grandpa. We're doing the exact same thing with data. We start with the folks that are data literate, we expand from there to a broader audience of people that don't necessarily have data in their titles, but have curiosity and questions. >> I like that on the curiosity side. You spent some time up at Strata Data. I'm curious, what are some of the things you're hearing from customers, maybe partners? Everyone used to talk about Hadoop, it was this big thing. And then there was a creation of data lakes, and swampiness, and all these things that are sort of becoming more complex in an organization. And with the rise of myriad data sources, the velocity, the volume, how do you help an enterprise understand and be able to catalog data from so many different sources? Is it that same principle that you just talked about in terms of, let's start with the lowest hanging fruit, start making the impact there and then grow it as we can? Or is an enterprise needs to be competitive and move really, really quickly? I guess, what's the process? >> How do you start? >> Right. >> What do people do? >> Yes! >> So it's interesting, what we find is multiple ways of starting with multiple different types of customers. And so, we have some customers that say, "Look, we've got a big, we've got Teradata, "and we've got some Hadoop, "and we've got some stuff on Amazon, "and we want to connect it all." And those customers do get started, and they start with hundreds of users, in some case, they start with thousands of users day one, and they just go Big Bang. And interestingly enough, we can get those customers enabled in matters of weeks or months to go do that. We have other customers that say, "Look, we're going to start with a team of 10 people "and we're going to see how it grows from there." And, we can accommodate either model or either approach. 
From our perspective, you just have to have the resources and the investment corresponding to what you're trying to do. If you're going to say, "Look, we're going to have two dollars of budget, and we're not going to have the human resources and the stewardship resources behind it," it's going to be hard to do the Big Bang. But if you're going to put the appropriate resources up behind it, you can do a lot of good. >> So, you can really facilitate the whole go big or go home approach, as well as the let's start small, think fast approach. >> That's right, and we always, actually ironically, recommend the latter. >> Let's start small, think fast, yeah. >> Because everybody's got a bigger appetite than they do the ability to execute. And what's great about the tool, and what I tell our customers and our employees all day long is, there's only one metric I track. So year over year, for our business, we basically grow in accounts, net of churn, by 55%. Year over year, and that's actually up from the prior year. And so from my perspective-- >> And what does that mean? >> So what that means is, the same customer gave us 55 cents more on the dollar than they did the prior year. Now that's best in class for most software businesses that I've heard of. But what matters to me is not so much that growth rate in and of itself. What it means to me is this: nobody's come along and said, "I've mastered my data. I understand all of the information side of my company. Every person knows everything there is to know." That's never been said. So if we're solving a problem where customers are saying, "Look, we can get, and we can find, and understand, and trust data, and we can do that better this year than we did last year, and we can do it with even more people," we're going to be successful. >> What I like about what you're doing is, you're bringing an element of operationalizing data for literacy and for usage. But you're really bringing this notion of a humanizing element to it. You see it in security, you see it in emerging ecosystems, where there's a community of data people who know how hard it is and was, and it seems to be getting easier. But there's the tsunami of new data coming in, IOT data, whatever, and new regulations like GDPR. These are all more surface area problems. But there's a community coming together. How have you guys seen your product create community? Have you seen any data on that? 'Cause it sounds like, as people get networked together, the natural outcome of that is possibly the usage you attract. But is there a community vibe that you're seeing? Is there an internal collaboration where they sit, they're having meet-ups, they're having lunches? There's a social aspect and a human aspect. >> No, it's human, and it's amazing. So in really subtle but really, really powerful ways. One thing that we do for every single data source or every single report that we document, we just put who are the top users of this particular thing. So really subtly, day one, you're like, "I want to go find a report. I don't even know where to go inside of this really mysterious system." Post-Alation, you're able to say, "Well, I don't know where to go, but at least I can go call up John or Lisa," and say, "Hey, what is it that we know about this particular thing?" And I didn't have to know them. I just had to know that they had this report and they had this intelligence. So by just discovering people and who they are, you pick up on what people can know.
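Satyen's usage-based ranking is easy to picture in code. The sketch below is a hypothetical illustration of the idea, not Alation's implementation: it scans a query log for table references, counts usage, and surfaces each table's top users.

```python
# Hypothetical illustration of usage-based ranking from query logs;
# not Alation's actual implementation.
import re
from collections import Counter, defaultdict

query_log = [
    ("lisa",   "SELECT * FROM sales.revenue_2017 WHERE region = 'EMEA'"),
    ("john",   "SELECT customer_id FROM sales.revenue_2017"),
    ("satyen", "SELECT * FROM marketing.campaigns JOIN sales.revenue_2017 USING (id)"),
]

# Naive table extraction: anything following FROM or JOIN.
TABLE_REF = re.compile(r"(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

usage = Counter()                  # how often each table is queried
top_users = defaultdict(Counter)   # who queries each table most

for user, sql in query_log:
    for table in TABLE_REF.findall(sql):
        usage[table] += 1
        top_users[table][user] += 1

# "Trending data sets" plus the people to call about them.
for table, count in usage.most_common():
    experts = ", ".join(u for u, _ in top_users[table].most_common(2))
    print(f"{table}: {count} queries (ask: {experts})")
```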
>> So people are the new Google results. You mentioned Google PageRank, which is web pages and relevance. You're taking a much more people-centric approach to relevance. >> Satyen: That's right. >> To the data itself. >> That's right, and that builds community in very, very clear ways, because people have curiosity. Other people are the mechanism by which they satisfy that curiosity. And so that community builds automatically. >> They pay it forward, they know who to ask for help. >> That's right. >> Interesting. >> That's right. >> Last question, Satyen. The tag line, first data catalog designed for collaboration: is there a customer that comes to mind for you as really one that articulates that point exactly? Where Alation has come in and really kicked open the door in terms of facilitating collaboration. >> Oh, absolutely. I was literally, this morning, talking to one of our customers, Munich Reinsurance, the largest reinsurance company in the world. Their chief data officer said, "Look, three years ago, we started with 10 people working on data. Today, we've got hundreds. Our aspiration is to get to thousands." We have three things that we do. One is, we actually discover insights. It's actually the smallest part of what we do. The second thing that we do is, we enable people to use data. And the third thing that we do is, drive a data-driven culture. And for us, it's all about scaling knowledge, to centers in China, to centers in North America, to centers in Australia. And they've been doing that at scale. And they go to each of their people and they say, "Are you a data black belt, are you a data novice?" It's kind of like skiing. Are you a blue diamond or a black diamond? >> Always ski in pairs. (laughs) >> That's right. >> And they do ski in pairs. And what they end up ultimately doing is saying, "Look, we're going to train all of our workforce to become better, so that in three to 10 years, we're recognized as one of the most innovative insurance companies in the world." Three years ago, that was not the case. >> Process improvement at a whole other level. My final question for you is, for the folks watching or the folks that are going to watch this video, that could be a potential customer of yours, what are they feeling? If I'm the customer, what smoke signals am I seeing that say, I need to call Alation? What are some of the things that you've found that would tell a potential customer that they should be talkin' to you guys? >> Look, I think that they've got to throw out the old playbook. And this was a point that was made by some folks at a conference that I was at earlier this week. But they basically were saying, "Look, the old playbook was all about providing the right answer." Forget about that. Just allow people to ask the right questions. And if you let people's curiosity guide them, people are industrious, and ambitious, and innovative enough to go figure out what they need to go do. But if you see this as a world of control, where I'm going to just figure out what people should know and tell them what they're going to go know, that's going to be a pretty poor career to go choose, because data's all about, sort of, freedom and innovation and understanding. And we're trying to push that along. >> Satyen, thanks so much for stopping by >> Thank you. >> and sharing how you guys are helping organizations, enterprises unlock data curiosity. We appreciate your time. >> I appreciate the time too. >> Thank you. >> And thanks John! >> And thank you.
>> Thanks for co-hosting with me. For John Furrier, I'm Lisa Martin, you're watching theCUBE live from our second day of coverage of our event Big Data SV. Stick around, we'll be right back with our next guest after a short break. (upbeat music)

Published Date : Mar 9 2018

Ian Swanson, DataScience.com | Big Data SV 2018


 

(royal music) >> Announcer: John Cleese. >> There's a lot of people out there who have no idea what they're doing, but they have absolutely no idea that they have no idea what they're doing. Those are the ones with the confidence and stupidity who finish up in power. That's why the planet doesn't work. >> Announcer: Knowledgeable, insightful, and a true gentleman. >> The guy at the counter recognized me and said... Are you listening? >> John Furrier: Yes, I'm tweeting away. >> No, you're not. >> I tweet, I'm tweeting away. >> He is kind of rude that way. >> You're on your (bleep) keyboard. >> Announcer: John Cleese joins the Cube alumni. Welcome, John. >> John Cleese: Have you got any phone calls you need to answer? >> John Furrier: Hold on, let me check. >> Announcer: Live from San Jose, it's the Cube, presenting Big Data Silicon Valley, brought to you by Silicon Angle Media and its ecosystem partners. (busy music) >> Hey, welcome back to the Cube's continuing coverage of our event, Big Data SV. I'm Lisa Martin with my co-host, George Gilbert. We are down the street from the Strata Data Conference. This is our second day, and we've been talking all things big data, cloud data science. We're now excited to be joined by the CEO of a company called Data Science, Ian Swanson. Ian, welcome to the Cube. >> Thanks so much for having me. I mean, it's been a awesome two days so far, and it's great to wrap up my trip here on the show. >> Yeah, so, tell us a little bit about your company, Data Science, what do you guys do? What are some of the key opportunities for you guys in the enterprise market? >> Yeah, absolutely. My company's called datascience.com, and what we do is we offer an enterprise data science platform where data scientists get to use all they tools they love in all the languages, all the libraries, leveraging everything that is open source to build models and put models in production. Then we also provide IT the ability to be able to manage this massive stack of tools that data scientists require, and it all boils down to one thing, and that is, companies need to use the data that they've been storing for years. It's about, how do you put that data into action. We give the tools to data scientists to get that data into action. >> Let's drill down on that a bit. For a while, we thought if we just put all our data in this schema-on-read repository, that would be nirvana. But it wasn't all that transparent, and we recognized we have to sort of go in and structure it somewhat, help us take the next couple steps. >> Ian: Yeah, the journey. >> From this partially curated data sets to something that turns into a model that is actionable. >> That's actually been the theme in the show here at the Strata Data Conference. If we went back years ago, it was, how do we store data. Then it was, how do we not just store and manage, but how do we transform it and get it into a shape that we can actually use it. The theme of this year is how do we get it to that next step, the next step of putting it into action. To layer onto that, data scientists need to access data, yes, but then they need to be able to collaborate, work together, apply many different techniques, machine learning, AI, deep learning, these are all techniques of a data scientist to be able to build a model. But then there's that next step, and the next is, hey, I built this model, how do I actually get it in production? How does it actually get used? Here's the shocking thing. 
I was at an event where there's 500 data scientists in the audience, and I said, "Stand up if you worked on a model for more than nine months "and it never went into production." 90% of the audience stood up. That's the last mile that we're all still working on, and what's exciting is, we can make it possible today. >> Wanting to drill down into the sort of, it sounds like there's a lot of choice in the tools. But typically, to do a pipeline, you either need well established APIs that everyone understands and plugs together with, or you need an end to end sort of single vendor solution that becomes the sort of collaboration backbone. How are you organized, how are you built? >> This might be self-serving, but datascience.com, we have enterprise data science platform, we recommend a unified platform for data science. Now, that unified platform needs to be highly configurable. You need to make it so that that workbench, you can use any tool that you want. Some data scientists might want to use a hammer, others want to be able to use a screwdriver over here. The power is how configurable, how extensible it is, how open source you can adopt everything. The amazing trends that we've seen have been proprietary solutions going back decades, to now, the rise of open source. Every day, dozens if not hundreds of new machine learning libraries are being released every single day. We've got to give those capabilities to data scientists and make them scale. >> OK, so the, and I think it's pretty easy to see how you would have incorporate new machine learning libraries into a pipeline. But then there's also the tools for data preparation, and for like feature extraction and feature engineering, you might even have some tools that help you with figuring out which algorithm to select. What holds all that together? >> Yeah, so orchestrating the enterprise data science stack is the hardest challenge right now. There has to be a company like us that is the glue, that is not just, do these solutions work together, but also, how do they collaborate, what is that workflow? What are those steps in that process? There's one thing that you might have left out, and that is, model deployment, model interpretation, model management. >> George: That's the black art, yeah. >> That's where this whole thing is going next. That was the exciting thing that I heard in terms of all these discussion with business leaders throughout the last two days is model deployment, model management. >> If I can kind of take this to maybe shift the conversation a little bit to the target audience. Talked a lot about data scientists and needing to enable them. I'm curious about, we just talked with, a couple of guests ago, about the chief data officer. How, you work with enterprises, how common is the chief data officer role today? What are some of the challenges they've got that datascience.com can help them to eliminate? >> Yeah, the CIO and the chief data officer, we have CIOs that have been selecting tools for companies to use, and now the chief data officer is sitting down with the CEO and saying, "How do we actually drive business results?" We work very closely with both of those personas. But on the CDO side, it's really helping them educate their teams on the possibilities of what could be realized with the data at hand, and making sure that IT is enabling the data scientists with the right tools. 
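As a rough sketch of what closing that last mile can look like, the snippet below wraps a trained scikit-learn model in a minimal Flask scoring service. This is illustrative only, not datascience.com's platform, and the model file and feature names are hypothetical placeholders.

```python
# A minimal sketch of the "last mile": serving a trained model over HTTP.
# The model file and feature names are hypothetical placeholders.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("churn_model.pkl")  # any pickled scikit-learn classifier

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()
    features = [[payload["tenure_months"], payload["monthly_spend"]]]
    probability = model.predict_proba(features)[0][1]
    return jsonify({"churn_probability": float(probability)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A POST to /score with a JSON payload returns a probability in milliseconds; the point is that deployment can be a small, repeatable step rather than a nine-month stall.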
We supply the tools, but we also like to go in there with our customers and help coach, help educate what is possible, and that helps with the CDO's mission. >> A question along that front. We've been talking about sort of empowering the data scientist, and really, from one end of the modeling life cycle all the way to the end or the deployment, which is currently the hardest part and least well supported. But we also have tons of companies that don't have data science trained people, or who are only modestly familiar. Where do, what do we do with them? How do we get those companies into the mainstream in terms of deploying this? >> I think whether you're a small company or a big company, digital transformation is the mandate. Digital transformation is not just, how do I make a taxi company become Uber, or how do I make a speaker company become Sonos, the smart speaker, it's how do I exploit all the sources of my data to get better and improved operational processes, new business models, increased revenue, reduced operation costs. You could start small, and so we work with plenty of smaller companies. They'll hire a couple data scientists, and they're able to do small quick wins. You don't have to go sit in the basement for a year having something that is the thing, the unicorn in the business, it's small quick wins. Now we, my company, we believe in writing code, trained, educated, data scientists. There are solutions out there that you throw data at, you push a button, it gets an output. It's this magic black box. There's risk in that. Model interpretation, what are the features it's scoring on, there's risk, but those companies are seeing some level of success. We firmly believe, though, in hiring a data science team that is trained, you can start small, two or three, and get some very quick wins. >> I was going to say, those quick wins are essential for survivability, like digital transformation is essential, but it's also, I mean, to survival at a minimum, right? >> Ian: Yes. >> Those quick wins are presumably transformative to an enterprise being able to sustain, and then eventually, or ideally, be able to take market share from their competition. >> That is key for the CDO. The CDO is there pitching what is possible, he's pitching, she's pitching the dream. In order to be able to help visualize what that dream and the outcome could be, we always say, start small, quick wins, then from there, you can build. What you don't want to do is go nine months working on something and you don't know if there's going to be outcome. A lot of data science is trial and error. This is science, we're testing hypotheses. There's not always an outcome that's to be there, so small quick wins is something we highly recommend. >> A question, one of the things that we see more and more is the idea that actionable insights are perishable, and that latency matters. In fact, you have a budget for latency, almost, like in that short amount of time, the more sort of features that you can dynamically feed into a model to get a score, are you seeing more of that? How are the use cases that you're seeing, how's that pattern unfolding? >> Yeah, so we're seeing more streaming data use cases. We work with some of the biggest technology companies in the world, so IoT, connected services, streaming real time decisions that are happening. But then, also, there are so many use cases around org that could be marketing, finance, HR related, not just tech related. 
On the marketing side, imagine if you're customer service, and somebody calls you, and you know instantly the lifetime value of that customer, and it kicks off a totally new talk track, maybe get escalated immediately to a new supervisor, because that supervisor can handle this top tier customer. These are decisions that can happen real time leveraging machine learning models, and these are things that, again, are small quick wins, but massive, massive impact. It's about decision process now. That's digital transformation. >> OK. Are you seeing patterns in terms of how much horsepower customers are budgeting for the training process, creating the model? Because we know it's very compute intensive, like, even Intel, some people call it, like, high performance compute, like a supercomputer type workload. How much should people be budgeting? Because we don't see any guidelines or rules of thumb for this. >> I still think the boundaries are being worked out. There's a lot of great work that Nvidia's doing with GPU, we're able to do things faster on compute power. But even if we just start from the basics, if you go and talk to a data scientist at a massive company where they have a team of over 1,000 data scientists, and you say to do this analysis, how do you spin up your compute power? Well, I go walk over to IT and I knock on the door, and I say, "Set up this machine, set up this cluster." That's ridiculous. A product like ours is able to instantly give them the compute power, scale it elastically with our cloud service partners or work with on-prem solutions to be able to say, get the power that you need to get the results in the time that's needed, quick, fast. In terms of the boundaries of the budget, that's still being defined. But at the end of the day, we are seeing return on investment, and that's what's key. >> Are you seeing a movement towards a greater scope of integration for the data science tool chain? Or is it that at the high end, where you have companies with 1,000 data scientists, they know how to deal with specialized components, whereas, when there's perhaps less of, a smaller pool of expertise, the desire for end to end integration is greater. >> I think there's this kind of thought that is not necessarily right, and that is, if you have a bigger data science team, you're more sophisticated. We actually see the same sophistication level of 1,000 person data science team, in many cases, to a 20 person data science team, and sometimes inverse, I mean, it's kind of crazy. But it's, how do we make sure that we give them the tools so they can drive value. Tools need to include collaboration and workflow, not just hammers and nails, but how do we work together, how do we scale knowledge, how do we get it in the hands of the line of business so they can use the results. It's that that is key. >> That's great, Ian. I also like that you really kind of articulated start small, quick ins can make massive impact. We want to thank you so much for stopping by the Cube and sharing that, and what you guys are doing at Data Science to help enterprises really take advantage of the value that data can really deliver. >> Thanks so much for having datascience.com on, really appreciate it. >> Lisa: Absolutely. George, thank you for being my co-host. >> You're always welcome. >> We want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert, and we are at our event Big Data SV on day two. Stick around, we'll be right back with our next guest after a short break. (busy music)

Published Date : Mar 8 2018

Ziya Ma, Intel | Big Data SV 2018


 

>> Live from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE, our continuing coverage of our event, Big Data SV. I'm Lisa Martin with my co-host George Gilbert. We're down the street from the Strata Data Conference, hearing a lot of interesting insights on big data. Peeling back the layers, looking at opportunities, some of the challenges, barriers to overcome, but also the plethora of opportunities that enterprises have that they can take advantage of. Our next guest is no stranger to theCUBE, she was just on with me a couple days ago at the Women in Data Science Conference. Please welcome back to theCUBE, Ziya Ma, Vice President of the Software and Services Group and the Director of Big Data Technologies at Intel. Hi Ziya! >> Hi Lisa. >> Long time, no see. >> I know, it was just really two to three days ago. >> It was, well, and now I can say happy International Women's Day. >> The same to you, Lisa. >> Thank you, it's great to have you here. So as I mentioned, we are down the street from the Strata Data Conference. You've been up there over the last couple days. What are some of the things that you're hearing with respect to big data? Trends, barriers, opportunities? >> Yeah, so first it's very exciting to be back at the conference again. The one biggest trend, or one topic that was hit really hard by many presenters, is the power of bringing the big data system and data science solutions together. You know, we're definitely seeing in the last few years the advancement of big data and the advancement of data science, you know, machine learning, deep learning, truly pushing forward business differentiation and improving our quality of life. So that's definitely one of the biggest trends. Another thing I noticed is there was a lot of discussion on big data and data science getting deployed into the cloud. What are the learnings, what are the use cases? So I think that's another noticeable trend. And also, there were some presentations on doing the data science or having the business intelligence on the edge devices. That's another noticeable trend. And of course, there was discussion on security and privacy for data science and big data, so that continued to be one of the topics. >> So we were talking earlier, 'cause there's so many concepts and products to get your arms around. If someone is looking at AI and machine learning on the back end, you know, we'll worry about edge intelligence some other time, but we know that Intel has the CPU with the Xeon and then this lower-power one with Atom. There's the GPU, there's ASICs, FPGAs, and then there are these software layers, you know, at a higher abstraction level. Help us put some of those pieces together for people who are like saying, okay, I know I've got a lot of data, I've got to train these sophisticated models, you know, explain this to me. >> Right, so Intel is a real solution provider for data science and big data. So at the hardware level, and George, as you mentioned, we offer a wide range of products, from general purpose like Xeon to targeted silicon such as FPGAs and other ASIC chips like Nervana. And also we provide adjacencies like networking hardware, non-volatile memory, and mobile. You know, those are the other adjacent products that we offer. Now on top of the hardware layer, we deliver a fully optimized software solution stack, from libraries and frameworks to tools and solutions.
So that we can help engineers or developers to create AI solutions with greater ease and productivity. For instance, we deliver the Intel-optimized math kernel library. It leverages the latest instruction sets to give significant performance boosts when you are running your software on Intel hardware. We also deliver frameworks like BigDL, for Spark and big data customers who are looking for deep learning capabilities. We also optimize some popular open source deep learning frameworks like Caffe, TensorFlow, MXNet, and a few others. So our goal is to provide all the necessary solutions so that, at the end, our customers can create the applications, the solutions, that they really need to address their biggest pain points. >> Help us think about the maturity level now. Like, we know that the very most sophisticated internet service providers have been sort of all over this machine learning now for quite a few years. Banks, insurance companies, people who've had this, statisticians and actuaries who have that sort of skillset, are beginning to deploy some of these early production apps. Where are we in terms of getting this out to the mainstream? What are some of the things that have to happen? >> To get it to mainstream, there are so many things we could do. First, I think we will continue to see the wide range of silicon products, but then there are a few things Intel is pushing. For example, we're developing the Nervana Graph compiler that will encapsulate the hardware integration details and present a consistent API for developers to work with. And this is one thing that we hope we can eventually help the developer community with. And also, we are collaborating with the end user, like, from the enterprise segment. For example, we're working with the financial services industry, we're working with the manufacturing sector and also customers from the medical field. And online retailers, trying to help them to deliver or create the data science and analytics solutions on Intel-based hardware or Intel-optimized software. So that's another thing that we do. And we're seeing actually very good progress in this area. Now we're also collaborating with many cloud service providers. For instance, we work with some of the top seven cloud service providers, both in the U.S. and also in China, to democratize not only our hardware, but also our libraries and tools, BigDL, MKL, and other frameworks and libraries, so that our customers, including individuals and businesses, can easily access those building blocks from the cloud. So definitely we're working on different fronts. >> So last question in the last couple of minutes. Let's kind of vibe on this collaboration theme. Tell us a little bit about the collaboration that you're having with, you mentioned, customers in some highly regulated industries, for example. But a little bit to understand, what's that symbiosis? What is Intel learning from your customers that's driving Intel's innovation of your technologies and big data? >> That's an excellent question. So Lisa, maybe I can start by sharing a couple of customer use cases, what kind of solutions we help our customers to address. I think it's always wise not to start a conversation with the customer on the technology that you deliver. You want to understand the customer's needs first, and then so that you can provide a solution that really addresses their biggest pain point rather than simply selling technology.
So for example, we have worked with an online retailer to better understand their customers' shopping behavior and to assess their customers' preferences and interests. And based upon that analysis, the online retailer made different product recommendations and maximized its customers' purchase potential. And it drove up the retailer's sales. You know, that's one type of use case that we have worked on. We also have partnered with customers from the medical field. Actually, today at the Strata Conference we had a joint presentation with UCSF, where we helped the medical center to automate the diagnosis and grading of meniscus lesions. Today that's all done manually by the radiologist, but now that entire process is automated. The result is much more accurate, much more consistent, and much more timely, because you don't have to wait for the availability of a radiologist to read all the 3D MRI images. That can all be done by machines. You know, so those are the areas where we work with our customers, understand their business needs, and give them the solution they are looking for.
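The retailer use case follows a familiar pattern: score item-to-item similarity from past purchase behavior, then recommend. The toy sketch below illustrates that pattern with made-up data; it is not Intel's or the retailer's actual solution. On an MKL-backed NumPy build, the matrix products here are exactly the kind of kernels the optimized math libraries Ziya mentions accelerate.

```python
# Toy item-based collaborative filtering on a purchase matrix (made-up data).
import numpy as np

# Rows = customers, columns = products; 1 means the customer bought it.
purchases = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

# Cosine similarity between product columns.
norms = np.linalg.norm(purchases, axis=0)
sim = (purchases.T @ purchases) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)  # a product shouldn't recommend itself

def recommend(customer: int, k: int = 2) -> np.ndarray:
    """Return indices of the top-k products similar to past purchases."""
    scores = sim @ purchases[customer]
    scores[purchases[customer] > 0] = -1.0  # don't re-recommend owned items
    return np.argsort(scores)[::-1][:k]

print(recommend(0))  # top product indices for customer 0
```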

Published Date : Mar 8 2018

Blaine Mathieu, VANTIQ | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's The Cube, presenting Big Data, Silicon Valley. Brought to you by Silicon Angle Media and its ecosystem partners. >> Welcome back to The Cube. Our continuing coverage of our event, Big Data SV, continues. I am Lisa Martin, joined by Peter Burris. We're in downtown San Jose at a really cool place called Forager Tasting and Eatery. Come down, hang out with us today as we have continued conversations around all things big data and everything in between. This is our second day here, and we're excited to welcome to The Cube the CMO of VANTIQ, Blaine Mathieu. Blaine, great to meet you, great to have you on the program. >> Great to be here, thanks for inviting me. >> So, VANTIQ, you guys are up the street in Walnut Creek. What do you guys do, what are you about, what makes VANTIQ different? >> Well, in a nutshell, VANTIQ is a so-called high-productivity application development platform to allow developers to build, deploy, and manage so-called event-driven real-time applications, the kind of applications that are critical for driving many of the digital transformation initiatives that enterprises are trying to get on top of these days. >> Digital transformation, it's a term that can mean so many different things, but today, it's essential for companies to be able to compete, especially enterprise companies, with newer companies that are more agile, more modern. But if we peel apart digital transformation, there are so many elements that are essential. How do you guys help companies, enterprises, say, evolve their application architectures that might currently not be able to support an actual transformation to a digital business? >> Well, I think that's a great question, thank you. I think the key to digital transformation is really a lot around the concept of real time, okay. The reason Uber is disrupting or has disrupted the taxi industry is the old way of doing it was somebody called a taxi and then they waited 30 minutes for a taxi to show up and then they told the taxi where to go and hopefully they got there. Whereas Uber turned that into a real-time business, right? You called, you pinged something on your phone. They knew your location. They knew the location of the driver. They matched those up, brought 'em together in real time, already knew where to bring you to, and ensured you had the right route and that location. All of this data flowing, all of these actions being taken in real time. The same thing applies to a disruptor like Netflix, okay? In the old days, Blockbuster used to send you, you know, a leaflet in the mail telling you what the new movies are. Maybe it was personalized for you. Probably not. No, Netflix knows who you are instantly, gives you that information, again, in real time based on what you've done in the past, and is able to deliver the movie also in real time pretty well. Every disruptor you look at around digital transformation is taking a business or a process that was done slowly and impersonally and making it happen in real time. Unfortunately, enterprise applications and the architectures, as you said a second ago, that are being used in most applications today weren't designed to enable these real-time use cases. A great example is Salesforce. So, Salesforce is a pretty standard, what you'd call a request application. You make a request, a person, generally, makes a request of the system, the system goes into a database, queries that database, finds information, and then returns it back to the user. And that whole process could take, you know, significant amounts of time, especially if the right data isn't in the database at the time and you have to go request it or find it or create it. A new type of application needs to be created that's not fundamentally database-centric, but is able to take these real-time data streams coming in from devices, from people, from enterprise systems, process them in real time, and then take an action.
And that whole process could take, you know, significant amounts of time, especially if the right data isn't in the database at the time and you have to go request it or find it or create it. A new type of application needs to be created that's not fundamentally database centric, but is able to take these real time data streams coming in from devices, from people, from enterprise systems, process them in real time and then take an action. >> So, let's pretend I'm a CEO. >> Yeah. >> One of the key things you said, and I want you to explain it better, is event. What is an event and how does that translate into a digital business decision? >> This notion of complex event processing, CEP, has been around in technology for a long time and yet, it surprises me, still a lot of folks we talk to, CEOs, have never heard of the concept. And, it's very simple really. An event is just something that happens in the context of business. That's as complex and as simple as it is. An event could be a machine increasing in temperature by one degree, a car moving from one location to another location. It could be an enterprise system, like an ERP system, you know, approving a PO. It could be a person pressing a button on a mobile device. All of those, or it could be an IoT device putting off a signal about the state of a machine. Increasingly, we're getting a lot of events coming from IoT devices. So, really, any particular interesting business situation, or a change in a situation that happens, is an event. And increasingly, as you know, IoT, augmented reality, AI and machine learning, autonomous vehicles, all these new real time technologies are spinning off more and more events, streams of these events coming off in rapid fashion, and we have to be able to do something about them. >> Let me take a crack at it and you tell me if I've got this right. That, historically, applications have been defined in terms of processes and so, in many respects, there was a very concrete, discrete, well established program, a set of steps that were performed, and then the transaction took place. An event, it seems to me is, yeah, we generally described it, but it changes in response to the data. >> Right, right. >> So, an event is kind of like an outside-in, driven by data. >> Right, right. >> System response, whereas your traditional transaction processing is an inside-out, driven by a sequence of programmed steps, and that decision might have been made six years ago. So, the event is what's happening right now, informed by data, versus a transaction, a traditional transaction, is much more, what did we decide to do six years ago, and it just gets sustained. Have I got that right? >> That's right. Absolutely right, or six hours ago or even six minutes ago, which might seem, wow, six minutes, that's pretty good, but take a use case for a field service agent trying to fix a machine or an air conditioner on top of a building. In today's world now, that air conditioner has hundreds of sensors that are putting off data about the state of that air conditioner in real time. A service tech has the ability to, while the machine is still putting off that data, be able to make repairs and changes and fixes, again, in the moment, see how that is changing the data coming off the machine, and then, continue to make the appropriate repairs in collaboration with a smart system or an application that's helping them.
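To make the request-versus-event contrast concrete, here is a minimal sketch, in Python, of the pattern Blaine describes: instead of a person querying a database on demand, each incoming event is evaluated the moment it arrives and can trigger an action. The event fields, device name, and temperature threshold are illustrative assumptions, not VANTIQ's actual API.

    # Minimal event-driven handler (illustrative only, not VANTIQ's API).
    # Each event is processed as it arrives; no user request, no database query.
    import time
    from dataclasses import dataclass

    @dataclass
    class Event:
        source: str    # e.g. the rooftop air conditioner from the example
        kind: str      # e.g. "temperature"
        value: float
        ts: float

    def on_event(event, notify):
        # React in the moment a reading crosses a threshold.
        if event.kind == "temperature" and event.value > 85.0:
            notify(f"{event.source}: overheating at {event.value}F")

    def run(stream, notify=print):
        for event in stream:    # in practice an MQTT or Kafka subscription
            on_event(event, notify)

    # Two readings arrive; only the second one fires an alert.
    run([Event("ac-unit-17", "temperature", 72.0, time.time()),
         Event("ac-unit-17", "temperature", 91.5, time.time())])

A request-driven application would only discover the overheating when someone asked for it; here the alert is pushed the instant the condition holds.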
>> That's about identifying patterns in what the problem is, versus some of the old ways, where we had a recipe of, you know, steps that you went through in the call center. >> Right, right. And the customer is getting more and more frustrated. >> They got their clipboard out and had the 52 steps they followed to see, oh, that didn't work, now the next step. No, data can help us do that much more efficiently and effectively if we're able to process it in real time. >> So, in many respects, what we're really talking about is an application world, or a world looking forward, where the applications, which historically have been very siloed, process driven, move to a world where the application function is much more networked together, and the output of one application is having a significant impact through data on the performance of an application somewhere else. That seems like it's got the potential to be an extremely complex fabric. (laughing) So, do I wait until I figure all that out (laughing) and then I start building it? Or do I, I mean, how do I do it? Do I start small and create and grow into it? What's the best way for people to start working on this? >> Well, you're absolutely right. Building these complex, geeking out a little bit, you know, asynchronous, non-blocking, so-called reactive applications, that's the concept that we've been using in computer science for some time, is very hard, frankly. Okay, it's much easier to build computing systems that process things step one, step two, step three, in order, but if you have to build a system that is able to take real time inputs or changes at any point in the process at any time and go in a different direction, it's very complex. And, computer scientists have been writing applications like this for decades. It's possible to do, but that isn't possible to do at the speed that companies now want to transform themselves, right? By the time you spec out an application and spend two years writing it, your business competitors have already disrupted you. The requirements have already changed. You need to be much more rapid and agile. And so, the secret sauce to this whole thing is to be able to write these transformative applications, or create them, not even write, write is actually the wrong word to use, to be able to create them. >> Generate them. >> Yeah, generate them in a way which is very fast, does not require a guru-level developer in reactive Java or some super low level code that you'd have to use to otherwise do it, so that you can literally have business people help design the applications, conceptually build them almost in real time, get them out into the market, and then be able to modify them as you need to, you know, on the fly. >> If I can build on that for just one second. So, it used to be we had this thing called computer-aided software engineering, CASE. >> (laughs) Right, right. >> We were going to operate at this very, very high level language. It's kind of-- But then, we would use code and build the code, and the two of them were separated, and so the minute that we deployed, somebody would go off and maintain it and the whole thing would break. >> Right, right. >> Do you have that problem? >> No, well, that's exactly right. So, the old, you know, the previous way of doing it was about really modeling an application, maybe visually, drag and drop, but then fundamentally, you created a bunch of code and then your job, as you said, after was to maintain and deploy and manage.
>> Try to sustain some connection back up to that beautiful visual model. >> And you probably didn't, because that was too much. That was too much work, so forget about the model after that. Instead, what we're able to do these days is to build the applications visually, you know, really for the most part with either super low code or, in many cases, no code, because we have the ability to abstract away a lot of the complexity, a lot of the complex code that you'd have to write. We can represent that, okay, with these logical abstractions, create the applications themselves, and then continue to maintain, add to, modify the application using the exact same structure. You're not now stuck with 20,000 lines of code that you have to edit. You're continuing to run and maintain the application just the way you built it, okay. We've now got to the place in computer science where we can actually do these things. We couldn't do them, you know, 20 years ago with CASE, but we can absolutely do them now. >> So, I'm hearing from a customer internal perspective a lot of operational efficiencies that VANTIQ can drive. Let's look now from a customer's perspective. What are the business impacts you're able to make? You mentioned the word reactive a minute ago when you were talking about applications, but do you have an example where VANTIQ has enabled a customer, a business, to be proactive and be able to identify through, you know, complex event processing, what their customers are doing, to be able to deliver relevant messages and really drive revenue, drive profit? >> Right, right. So many, you know, so many great examples. And, I mentioned field service a few minutes ago. I've got a lot of clients in that doing this real time field service using these event processing applications. One that I want to bring up right now is one of the largest global shoe manufacturers, actually, that's a client of VANTIQ. I, unfortunately, can't say the name right now 'cause they want to keep what they're doing under wraps, but we all definitely know the company. And they're using this to manage the security, primarily, around their real time global supply chain. So, they've got a big challenge with companies in different countries redirecting shipments of their shoes, selling them on the gray market, at different prices than what are allowed in different regions of the world. And so, through both sensorizing the packages, the barcode scanning, the enterprise systems, bringing all that data together in real time, they can literally tell in the moment if something is be-- If a package is redirected to the wrong region, or if literally a shoe or a box of shoes is being sold where it shouldn't be sold, at the wrong price. They used to get a monthly report on the activities and then they would go and investigate what happened last month. Now, their fraud detection manager is literally sitting there getting this in real time, saying, oh, Singapore sold a pallet of shoes that they should not have been able to sell five minutes ago. Call up the guy in Singapore and have him go down and see what's going on and fix that issue. That's pretty powerful when you think about it.
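The shoe manufacturer's gray-market check can be pictured as one declarative rule evaluated against each scan event as it streams in, in the spirit of the logical abstractions described a moment ago. The field names, region codes, and price table below are invented for illustration; the real application and its data model are under wraps, as Blaine notes.

    # Hypothetical supply-chain rule, evaluated per barcode-scan event.
    # Field names and the price table are invented for illustration.
    PRICE_LIMITS = {"US": 120.0, "EU": 110.0, "SG": 90.0}   # max allowed price

    def check_scan(scan):
        """Return the violations raised by one scan event, if any."""
        violations = []
        if scan["region"] != scan["intended_region"]:
            violations.append("redirected shipment")
        if scan["sale_price"] > PRICE_LIMITS.get(scan["region"], 0.0):
            violations.append("sold above allowed price")
        return violations

    scan = {"pallet": "P-4471", "region": "SG",
            "intended_region": "EU", "sale_price": 95.0}
    for v in check_scan(scan):
        # In the real system this reaches the fraud detection manager
        # in the moment, instead of landing in next month's report.
        print(f"ALERT {scan['pallet']}: {v}")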
>> Definitely, so like reduction in fraud, or increase in fraud detection. Sounds like, too, there's a potential for a significant amount of cost savings to the business, not just meeting the external customer needs, but from a cost perspective, reduction. Not just some, probably, TCO, but in operational expenses. >> For sure, although I would say most of the digital transformation initiatives, when we talk to CEOs and CIOs, they're not focused as much on cost savings as they're focused on, A, avoiding being disrupted by the next interesting startup, B, creating new lines of business, new revenue streams, finding out a way to do something differently, dramatically better than they're currently doing it. It's not only about optimizing or squeezing some cost out of their current application. This thing that we are talking about, I guess you could say it's an improvement on their current process, but really, it's actually something they just weren't even really doing before. Just a totally different way of doing fraud detection and managing their global supply chain that they just fundamentally weren't even doing. And now, of course, they're looking at many other use cases across the company, not just in supply chain, but, you know, smart manufacturing, so many use cases. Your point about savings, though: there's, you know, what value does the application itself bring? Then, there's the question of what does it cost to build and maintain and deploy the application itself, right? And, again, with these new visual development tools, they're not modeling tools, you're literally developing the application visually. You know, I've been in so many scenarios where we talked to large enterprises. You know, we talk about what we're doing, like we talk about right now, and they say, okay, we'd love to do a POC, proof of concept. We want to allocate six months for this POC, like normally you would probably do for building most enterprise applications. And, we inevitably say, well, how about Friday? How about we have the POC done by Friday? And, you know, we get the nervous laugh, you know, they laugh uncomfortably, and we go away and deliver the POC by Friday, because of how much different it is to build applications this way versus writing low level Java or C-sharp code and sticking together a bunch of technologies and tools, 'cause we abstract all that away. And, you know, the eyes open wide and the mouth drops open, and it's incredible what modern technology can do to radically change how software is being developed. >> Wow, big impact in a short period of time. That's always a nice thing to be able to deliver. >> It is, it is to-- It's great to be able to surprise people like that. >> Exactly, exactly. Well, Blaine, thank you so much for stopping by, sharing what VANTIQ is doing to help companies be disruptive, and for sharing those great customer examples. We appreciate your time. >> You're welcome. Appreciate the time. >> And for my co-host, Peter Burris, I'm Lisa Martin. You're watching The Cube's continuing coverage of our event, Big Data SV Live, from San Jose, down the street from the Strata Data Conference. Stick around, we'll be right back with our next guest after a short break. (techy music)

Published Date : Mar 8 2018


Matt Maccaux, Dell EMC | Big Data SV 2018


 

>> Male Narrator: Live from San Jose, it's theCube. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCube's continuing coverage of our event, Big Data SV, in downtown San Jose. I'm Lisa Martin, my co-host is Dave Vellante. Hey Dave. >> Hey Lisa, how's it going? >> Good. >> Doing a great job here, by the way. >> Well thank you, sir. >> Keeping the trains going. >> Yeah. >> Well done. >> We've had a really interesting couple of days. We started here yesterday interviewing lots of great guys and gals on Big Data and everything in between. Lots of different topics there, opportunities, challenges, digital transformation, how can customers really evolve on this journey? We're excited to welcome back to theCube one of our distinguished alumni, Matt Maccaux, the Global Big Data Practice Lead from Dell EMC. Welcome back. >> Well thanks for having me, appreciate it, it's a pleasure to be here. >> Yeah, so lots of stuff going on. We've been here, as I mentioned, we're down the street from the Strata Data Conference and we've had a lot of great conversations, very educational, informative. You've been with the whole Dell EMC family for a while now. We'd love to get your perspective on, kind of, what's going on from your team's standpoint. What are you seeing in the enterprises with respect to Big Data and being able to really leverage data across the business as a value driver and a revenue generator? >> Yeah, it's interesting, what we see across the business, especially in the big enterprises, is that many organizations, even the more mature ones, are still struggling to get that extra dollar, that extra level of monetization out of their data assets. Everyone talks about monetizing data and using data, treating it as an asset, but organizations are struggling with that, not because of the technology. The technology's been put in, they've ramped up their teams, their skills. What we tend to see inhibiting this digital transformation growth is process. It's organizational strife, and it's not looking to best practices, even within their own organization, for doing things like DevOps. So, why would we treat the notion of creating a data model any different than we would regular application development? Well, organizations still carry that weight, that inertia. They still treat Big Data and analytics like they do the data warehouse, and the most effective organizations are starting to incorporate that agile methodology and agile thinking, no snowflakes, infrastructure as code, these concepts of quickly and rapidly, repeatedly doing these things. Those are the organizations that are really starting to pull away from their competitors in industry. So, Dell EMC, our consulting group and our product lines are all there to support that transformation journey by taking those best practices in DevOps, DataOps, and bringing them to the analytical space. >> Do you think that companies, Matt, have a pretty good sense as to how applications that they develop are going to affect, create value? Creating value is, let's simplify it, increasing revenue or cutting cost. Generally people can predict the impact, they can write a business case around it.
My observation was that certainly in the early days of so-called Big Data, people really didn't have an understanding as to the relationship between their data and that value, and so, many companies mistakenly thought, "Well I need to figure out how to sell my data," versus understanding how data affects monetization. I wonder if you could comment on that and how that has progressed throughout the years? >> Yeah, that's a good point. We, from a consulting practice, used to do a lot of, what we call, proof of values, where organizations, after they kicked the tires and covered some use cases, we took them through a very slow, methodical business case ROI analysis. You're going to spend this much on infrastructure, you're going to hire these people, you're going to take this data, and poof, you're going to make this much money, you're going to save this much money. Well, we're doing less and less of that these days, because organizations have a good feel for where they want to go and the potential upside for doing this. Where they now tend to struggle is, "Well, how do I actually get there?" "There's still a lot of tools and a lot of technologies, and which is right for my business?" "What is the right process and how do I build that consensus in the organization?" And so, from a business consulting perspective, we're doing less of the ROI work and more of the governance work, by aligning stakeholders, getting those repeatable patterns and architectures in place to help organizations take those first few wins and then scale it. >> Where do you see the action these days? I mean there's some high profile use cases, obviously getting people to click on ads, Big Data has helped with that, fraud detection has come such a long way in the last 10 years, ya know, no doubt, certainly risk assessment, ya know, from the financial services industry. Those are the obvious ones, where else do you see Big Data analytics changing the world, if you will? >> Yeah, so I'd say those static or batch-type workloads are well understood. That, hey, is there fraud on transactions that occurred yesterday or last night? What is the customer score, the lifetime value score, for a customer? Where we see more trends in the enterprise space is streaming. So, what can we catch in real time and help our people make real time decisions? And that means dealing with unstructured data. So, I've got a call center and I'm listening to the voice that's coming in, putting some sentiment analysis on that and then providing a score or script to the customer call agent in real time. And those, sort of, streaming use cases, whether it's images or voice, that, I think, is the next paradigm for use cases that organizations want to tackle. 'Cause if you can prevent a customer from leaving in real time, right, say, you know what, it sounds like you're upset, what if we did X to help retain you, it's going to be significant. All these organizations have a good idea of the cost it takes to acquire a new customer and the cost of losing a customer, so if they can put that intelligence in upstream, they no longer have to spend so much money trying to capture new customers 'cause they can focus on the ones they have. So, I think that, sort of, time between customer and streaming is where the next set of, I think, money's to be found.
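As a rough illustration of that call-center use case, the sketch below scores each utterance as it arrives and prompts the agent once sentiment trends negative. A production system would run a trained model over streaming speech-to-text; the word lists and thresholds here are stand-ins.

    # Toy real-time sentiment scoring for a call stream (illustrative only).
    import re

    NEGATIVE = {"cancel", "frustrated", "terrible", "refund"}
    POSITIVE = {"great", "thanks", "helpful", "perfect"}

    def score(utterance):
        words = re.findall(r"[a-z]+", utterance.lower())
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    def monitor(utterances, window=3, threshold=-2):
        recent = []
        for text in utterances:          # arrives live from speech-to-text
            recent = (recent + [score(text)])[-window:]
            if sum(recent) <= threshold:
                # Surface a retention script to the agent in the moment.
                yield f"PROMPT AGENT after: {text!r}"

    calls = ["I am frustrated, this is terrible",
             "I want to cancel and get a refund"]
    for action in monitor(calls):
        print(action)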
>> So customer experience is critical for businesses in any organization. I'm wondering, kind of, what the juxtaposition is of businesses going, "Yes, we have to be able to do things in real time, in enterprise, we have to be agile," yet, in order to really facilitate a really effective, relevant, timely customer experience, many departments and organizations in a business need access to data. From a political perspective, how does Dell EMC, how does your consulting practice, help an enterprise be able to start opening up these barriers internally, to be able to enable data sharing so that they can drive and take advantage of things like real-time streaming to ultimately improve the customer experience, revenue, et cetera? >> Yeah, it's going to sound really trite, but the first step is getting everyone in a room and talking about what good looks like, what are the low-hanging... And everyone's going to agree on those use cases. They're going to say, "These are the things we have to do," right, "We want to lose fewer customers, we want to..." You know, whatever the case may be, so everyone will agree on that. So, the politics don't come into play there. So, "Well, what data do we require for that?" "Okay, well, we've got all this data, great, no disagreement there." Well, where is the data located? Who's the owner or the steward of that data? And now, who's going to be responsible for monetizing that? And that's where we tend to see the breakdown, because when these things cross the line of business, and customer always crosses the line of business, you end up with turf wars. And so, the emergence of the Chief Data Officer, who's responsible for the policy and the prioritization and the ownership of these things, is such a key role now. And it's not a CIO responsible for data, it is a business-aligned executive reporting to the CEO, COO, CFO. Again, business alignment, that tends to be the decision maker, or at least the thing that solves for those conflicts across those BUs. And when that happens, then we see real change. But, if there's not that role or that person that can put that line in the sand and say, "This is how we're going to do it," you end up with that political strife, and then you end up with silos of information or point solutions across the enterprise, and it doesn't serve anyone. >> What are you seeing in terms of that CDO role? I mean, initially the Chief Data Officer was really within regulated businesses, financial services, healthcare, government. And then you've seen it permeate, ya know, to more mainstream. Do you see that role as having legs? A lot of people have questioned that role. The Chief Digital Officer, the Chief Data Officer, is it encroaching on the CIO territory? I'm inferring from your comments that you're optimistic about that role going forward. >> I am, as long as it's well-defined as having unique capabilities that are different than the CIO's. Again, I think the first generation of Chief Data Officers were very CIO-like, or CIO-for-data, and that's when you ended up with the turf wars. And then it was like, "Okay, well this is what we're doing." But then you had someone who was sort of a peer for infrastructure, and so, it just didn't seem to work out. And so, now we're seeing that role being redefined. It's less about the technology and the tools and the infrastructure, and it's more about the policies, the consistency, the architectures.
To me, one of the first things a CDO has to do is understand how a company gets value out of its data, and if it's a for-profit company, what's the monetization, where does that come from? Not selling the data, as we were talking about earlier. And then there is what data, where it is, the data architecture, data sources, how do we give access to that? And then quality, data quality seems to be something that they worry about. And then skills, and note, no technology in here. And then somehow they're going to form relationships with the line of business, and it's simultaneous to figuring that out. Does that seem like a reasonable framework for the CIO, the CDO's job? >> It does, and you could call them Chief Data Governance Officer, I mean, it really falls under the umbrella of governance. It's about standards and consistency, but also these policies of, there are finite resources, whether we're talking people or compute. What do you do when there's not enough resources and more demand? How do you prioritize the things that the business does? Well, do you have policies and matrices that say, "Okay, well, is it material, actionable, timely?" "Then yes, then we'll proceed with this." "No, it doesn't pass." And it doesn't have to be about money. However the organization judges itself is what it should be based on. So, whether we're talking non-profit, we helped a school system recently better align kids with schedules and also learning abilities by sitting them next to each other in classes. There's no profit in that other than the education of children, so every organization judges itself or measures itself a little differently, but it comes back to those KPIs. What are your KPIs, how do those align to business initiatives? And then everything should flow from there. Now, I'm not saying it's easy work. Data governance is the hardest thing to do in this space and that's why I think so few organizations take it on, 'cause it's a long, slow process and, ya know, you should've started 10 years ago on it, and if you haven't, it feels like this mountain that is really high to climb. >> What you're saying is outcome driven. >> Yeah. >> Independent of the types of organizations. I want to talk about innovation. I've been asking a lot of people this week, do you feel like Big Data, ya know, the meme of Big Data that was created eight, 10 years ago, do you feel like it lived up to its promises? >> That's a loaded question. I think if you were to ask the back office enterprises, I would say yes. In terms of customers feeling it, probably not, because when you use an Uber app to hail a cab and pay $3.75 to go across town, it feels like a quality of life improvement, but you don't know that that's a data-driven decision. As a consumer, your average consumer, you probably don't feel that. As you're clicking through Amazon and they know, sort of, the goods that you need, or the fact that they know what you're going to need and they've got it in a warehouse that they can get to you later that day, it doesn't feel like a Big Data solution, it just feels like, "Hey, the people I'm doing business with, they know me better." People don't really understand that that's a Big Data and analytics concept, so, has it lived up to the hype? Externally, I think the perception is that it has not, but the businesses that really get it feel that absolutely it has. >> That's 'cause you-- do you agree it's kind of bifurcated? >> Matt Maccaux: Yeah, it is.
>> The Spotifys and the Ubers and the Airbnbs that are crushing it, and then there's a lot of traditional enterprises that are still stovepiped and struggling. >> Yeah, it's funny, when we talk to customers, we've got our introductory PowerPoints, right, it always talks about the new businesses and the old businesses, and I'm finding that that doesn't play very well anymore with enterprise customers. They're like, "We're never going to be the Uber of our industry, it's not going to happen. If I'm a Fortune 100 legacy, it's not going to happen. What I really want to do, though, is help my customers or make more money here. I'm not going to be the Uber, it's just not going to happen. We're not the culture, we're not set up that way, we have all of this technical legacy stuff, but I really want to get more value out of my data, how do I do that?" And so that message resonates. >> Isn't that in some ways, though, how do you feel about this, a recipe for disruption, where that's not going to happen, but something could happen where somebody digitizes your business? >> Yes, absolutely. If you're in the Fortune 500 and you are not worried about someone coming along and disrupting you, then you are probably not doing the right job. I would be kept awake every night, whether it was financial services or industrial manufacturing. >> Dave Vellante: Grocery. >> Nobody thought that the taxis, who the hell would come in and disrupt the cab industry? Ya got to hire all these people, the cars are junk, the customer experience is awful. Well, someone has come along, and there's been an industry related to this. Now they have their bumps in the road, so are they going to be disrupted again, or what's the next level of disruption? But, I think it is technology that fuels that, but it's also the cultural shift as part of that, which is outside the technology: the socioeconomic trends that I think drive that, as well. >> But even, ya know, and we've got just a few seconds left, the cultural shift internally. It sounds like, from what you're describing, if an enterprise is going to recognize, "I'm not going to compete with an Uber or an Airbnb or a Netflix, but I've got to be able to compete with my existing peers of enterprise organizations," the CDO role sounds like it's a matter of survivability. >> Yes. >> Without putting that in place, you can't capitalize on the value of data, monetization, et cetera. Well guys, I wish we had more time 'cause I think we're opening a can of worms here, but Dave, Matt, thanks so much for having this conversation. Thank you for stopping by. >> Thanks for having me here, it was a real pleasure. >> Likewise. We want to thank you for watching theCube. We are continuing our coverage of our event, Big Data SV, in downtown San Jose. For Dave Vellante, my co-host, I'm Lisa Martin. Stick around, we'll be right back with our next guest after a short break. (upbeat music)

Published Date : Mar 8 2018


Octavian Tanase, NetApp | Big Data SV 2018


 

>> Announcer: Live from San Jose it's The Cube presenting Big Data, Silicon Valley brought to you by SiliconANGLE Media and its ecosystem partners. >> Good morning. Welcome to The Cube. We are on day two of our coverage our event Big Data SV. I'm Lisa Martin with my cohost Dave Vellante. We're down the street from the Strata Data Conference. This is The Cube's tenth big data event and we had a great day yesterday learning a lot from myriad guests on very different nuances of big data journey where things are going. We're excited to welcome back to The Cube an alumni, Octavian Tanase, the Senior Vice President of Data ONTAP fron Net App. Octavian, welcome back to The Cube. >> Glad to be here. >> So you've been at the Strata Data Conference for the last couple of days. From a big data perspective, what are some of the things that you're hearing, in terms of from a customer's perspective on what's working, what challenges, opportunities? I'm very excited to be here and learn about the innovation of our partners in the industry and share with our partners and our customers what we're doing to enable them to drive more value out of that data. The reality is that data has become the 21st Century gold or oil that powers the business and everybody's looking to apply new techniques, a lot of times machine learning, deep learning, to draw more value of the data, make better decisions and compete in the marketplace. Octavian, you've been at NetApp now eight years and I've been watching NetApp, as we were talking about offline, for decades and I've seen the ebb and flow and this company has transformed many, many times. The latest, obviously cloud came in, flash came into play and then you're also going through a major transition in the customer based to clustered ONTAP. You seemed to negotiate that. NetApp is back, thriving, stock's up. What's happening at NetApp? What's the culture like these days? Give us the update. >> I think we've been very fortunate to have a CEO like George Kurian, who has been really focused on helping us do basically fewer things better, really focus on our core business, simplify our operations and continue to innovate and this is probably the area that I'm most excited about. It's always good to make sure that you accelerate the business, make it simpler for your customers and your partners to do business with you, but what you have to do is innovate. We are a product company. We are passionate about innovation. I believe that we are innovating with more pace than many of the startups in the space so that's probably the most exciting thing that has been part of our transformation. >> So let's talk about big data. Back in the day if you had a big data problem you would buy a big Unix box, maybe buy some Oracle licenses, try to put all your data into that box and that became your data warehouse. The brilliance of Hadoop was hey we can leave the data where it is. There's too much data to put into the box so we're going to bring five megabytes to code to a petabyte of data. And the other piece of it is CFOs loved it, because we're going to reduce the cost of our expensive data warehouse and we're going to buy off the shelf components: white box, servers and off the shelf disk drives. We're going to put that together and life will be good. Well as things matured, the old client-server days, it got very expensive, you needed enterprise grade. So where does NetApp fit into that equation, because originally big storage companies like NetApp, they weren't part of the equation? 
Has that changed? >> Absolutely. One of the things that has enabled that transformation, that change, is that we made a deliberate decision to focus on software defined, and making sure that the ONTAP operating system is available wherever data is being created: on the edge in an IoT device, in the traditional data center or in the cloud. So we are in the unique position to enable analytics, big data, wherever those applications reside. One of the things that we've recently done is we've partnered with IDC, and what the study, what the analysis, has shown is that deploying analytics, a Hadoop or NoSQL type of solution, on top of NetApp is half the cost of DAS. So when you consider the cost of servers, the licenses that you're going to have to pay for these commercial implementations of Hadoop, as well as the storage and the data infrastructure, you are much better off choosing NetApp than a white box type of solution. >> Let's unpack that a little bit, because if I infer correctly from what you said, normally you would say the operational costs are going to be dramatically lower, it's easier to manage a professional system like a NetApp ONTAP, it's integrated, great software, but am I hearing you correctly, you're saying the acquisition costs are actually less than if I'm buying white box? A lot of people are going to be skeptical about that, say, Octavian, no way, it's cheaper to buy white box stuff. Defend that statement. >> Absolutely. If you're looking at the whole solution, that includes the server and the storage. What NetApp enables you to do, if you're running the solution on top of ONTAP, is reduce the need for so many servers. If you reduce that number, you also reduce the licensing cost. Moreover, if you actually look at the core value proposition of the storage layer there, DAS typically makes three copies of the data. We don't. We are very greedy, and we're making sure that you're using shared storage and we are applying a bunch of storage efficiency techniques to further compress, compact that data for world class storage efficiency. >> So cost efficiency is obviously a great benefit for any company, especially when they're evolving from a digital perspective. What are some of the business level benefits? You mentioned speed a minute ago. What are Data ONTAP and even ONTAP in the cloud enabling your enterprise customers to achieve at the business level, maybe from faster time to market, identifying new products with machine learning and AI? Give me an example of maybe a customer that you think really articulates the value that ONTAP in the cloud can deliver. >> One of the things that's really important is to have your data management capability wherever the data is being produced, so ONTAP being consumed either as a VM or a service ... I don't know if you've seen some of the partnerships that we have with AWS and Azure. We're able to offer the same rich data management capabilities, not only in the traditional data center, but in the cloud. What that really enables customers to do is to simplify and have the same operating system, the same data management platform, for both the second platform traditional applications as well as the third platform applications. I've seen a company like Adobe be very successful in deploying their infrastructure, their services, not only on prem in their traditional data center, but using ONTAP Cloud. So we have more than about 1,500 customers right now that have adopted ONTAP in the AWS cloud.
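The arithmetic behind that claim is easy to sketch. DAS-style Hadoop deployments default to three full copies of every block, while a shared storage layer keeps roughly one copy plus parity and applies compression and compaction on top. The overhead and efficiency ratios below are assumptions for illustration, not figures from the IDC study.

    # Back-of-the-envelope raw-capacity comparison (assumed ratios).
    usable_pb = 1.0                       # data the application actually needs

    das_raw    = usable_pb * 3.0          # HDFS-style 3x replication
    shared_raw = usable_pb * 1.3 / 2.0    # ~30% parity overhead, assumed 2:1
                                          # compression/compaction efficiency

    print(f"DAS raw capacity needed:    {das_raw:.2f} PB")
    print(f"Shared raw capacity needed: {shared_raw:.2f} PB")
    print(f"DAS needs {das_raw / shared_raw:.1f}x more raw storage")

Fewer raw terabytes also means fewer server nodes to attach them to, which is where the Hadoop licensing savings he mentions come from.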
>> What are you seeing in terms of the adoption of flash, and I'm particularly interested in the intersection of flash adoption and the developer angle, because we've seen, in certain instances, certain organizations are able to share data off of flash much more efficiently than you would be, for instance, off of spinning disk. Have you seen a developer impact in your customer base? >> Absolutely, I think most customers initially adopted flash because of high throughput and low latency. I think over time customers really understood and identified the overall value proposition and cost of ownership of flash, that it enables them to consolidate multiple workloads in a smaller footprint. So that enables you to then reduce the cost to operate that infrastructure, and it really gives you a range of applications that you can deploy that you were never able to before. Everybody's looking to do in-place, in-line analytics that are now possible because of this fast media. Folks are looking to accelerate old applications in which they cannot invest anymore, but they just want to run faster. Flash also tends to be more reliable than traditional storage, so customers definitely appreciate that fewer things could go wrong. So overall, the value proposition of flash is all encompassing, and we believe that in the near future flash will be the de facto standard in everybody's data center, whether it's on prem or in the cloud. >> How about backup and recovery in big data? We obviously, in the enterprise, are very concerned about data protection. What's similar in big data? What's different and what's NetApp's angle on that? >> I think data protection and data security will never stop being important to our customers. Security's top of mind for everybody in the industry, and it's a source of resume changing events, if you would, and they're typically not promotions. So we have invested a tremendous deal in certifications for HIPAA, for FIPS. We are enabling encryption, both at rest and in flight. We've done a lot of work to make sure that the encryption can happen in the software layer, to make sure that we give the customers best storage class efficiency, and what we're also leveraging is the innovation that ONTAP has done over many years to protect the data: replication, snapshots, tiering the data to the cloud. These are techniques that we're commonly using to reduce the cost of ownership and also protect the data the customers deploy. >> So security's still a hot topic and, like you said, it probably always will be, but it's a shared responsibility, right? So customers leveraging NetApp, say, for on-prem hybrid, also using Azure or AWS, who's your target audience? If you're talking to the guys and gals that are still managing storage, are you also having the CSO or the security guys and gals come in to understand, we've got this deployment in Azure or AWS, so we're going to bring in ONTAP to facilitate this? There's a shared responsibility for security. Who's at the table, from your perspective, in your customers, that you need to help understand how they facilitate true security? >> It's definitely been a transformative event, where more and more people in IT organizations are involved in the decisions that are required to deploy the applications. There was a time when we would talk only to the storage admin.
After a while we started talking to the application admin, the virtualization admin, and now you're talking to the line of business, who has that vested interest to make sure that they can harness the power of the data in their environment. So you have the CSO, you have the traditional infrastructure people, you have the app administrators and you have the app owner, the business owner, all at the table, coming and looking to choose the best of breed solution for their data management. >> What are the conversations like with your CXOs, executives? Everybody talks about digital transformation. It's kind of an overused term, but there's real substance when you actually peel the onion. What are you seeing as NetApp's role in effecting digital transformations within your customer base? >> I think we have a vision of how we can help enterprises take advantage of the digital transformation and adopt it. I think we have three tenets of that vision. Number one is we're helping customers harness the power of the cloud. Number two, we're looking to enable them to future-proof their investments and build the next generation data center. And number three, nobody starts with a fresh slate, so we're looking to help customers modernize their current infrastructure through storage. We have a lot of expertise in storage. We've helped, over time, customers time and again adopt disruptive technologies in nondisruptive ways. We're looking to adopt these technologies and trends on behalf of our customers and then help them use them in a seamless, safe way. >> And continue their evolution to identify new revenue streams, new products, new opportunities, and even probably give other lines of business access to this data that they need to understand: is there value here, how can we harness it faster than our competitors, right? >> Absolutely. It's all about deriving value out of the data. I think earlier I called it the gold of the 21st Century. This is a trend that will continue. I believe there will be no enterprise or data center that won't focus on using machine learning, deep learning, analytics to derive more value out of the data, to find more customer touch points, to optimize their business, to really compete in the marketplace. >> Data plus AI plus cloud economics are the new innovation drivers of the next 10, 20 years. >> Completely agree. >> Well, Octavian, thanks so much for spending time with us this morning, sharing what's new at NetApp, some of the visions that you guys have, and also some of the impact that you're making with customers. We look forward to having you back on the program in the near future. >> Thank you. Appreciate having the time. >> And for my cohost Dave Vellante, I'm Lisa Martin. You're watching The Cube live on day two of coverage of our event, Big Data SV. We're at this really cool venue, Forager Tasting Room. Come down here, join us, get to hear all these great conversations. Stick around and we'll be right back with our next guest after a short break. (electronic music)

Published Date : Mar 8 2018


Sastry Malladi, FogHorn | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. (upbeat electronic music) >> Welcome back to The Cube. I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV, in downtown San Jose, down the street from the Strata Data Conference. We're joined by a new guest to theCUBE, Sastry Malladi, the CTO of FogHorn. Sastry, welcome to theCUBE. >> Thank you, thank you, Lisa. >> So FogHorn, cool name, what do you guys do, who are you? Tell us all that good stuff. >> Sure. We are a startup based in Silicon Valley, right here in Mountain View. We started about three years ago, three plus years ago. We provide intelligence software for edge computing, or fog computing. That's how our company name got started: FogHorn. Particularly for the industrial IoT sector. All of the industrial guys, whether it's transportation, manufacturing, oil and gas, smart cities, smart buildings, any of those different sectors, they use our software to predict failure conditions in real time, or do condition monitoring, or predictive maintenance, any of those use cases, and successfully save a lot of money. Obviously in the process, you know, we get paid for what we do. >> So Sastry... GE popularized this concept of IIoT and the analytics and, sort of, the new business outcomes you could build on it, like Power by the Hour instead of selling a jet engine. >> Sastry: That's right. >> But there's... Actually, we keep on this, and David Floyer did some pioneering research on how we're going to have to do a lot of analytics on the edge for latency and bandwidth. What's the FogHorn secret sauce that others would have difficulty with on edge analytics? >> Okay, that's a great question. Before I directly answer the question, if you don't mind, I'll actually describe why that's even important to do, right? So a lot of these industrial customers, if you look at, because we work with a lot of them, the amount of data that's produced from all of these different machines is terabytes to petabytes of data, it's real. And it's not just the traditional digital sensors, but there are video, audio, acoustic sensors out there. The amount of data is humongous, right? It's not even practical to send all of that to a Cloud environment and do data processing, for many reasons. One is obviously the connectivity, bandwidth issues, and all of that. But the two most important things are cyber security. None of these customers actually want to connect these highly expensive machines to the internet. That's one. The second is the lack of real-time decision making. What they want to know, when there is a problem, they want to know before it's too late. We want to notify them that there is a problem occurring so that they have a chance to go fix it and optimize the asset that is in question. Now, existing solutions do not work in this constrained environment. That's why FogHorn had to invent that solution. >> And tell us, actually, just to be specific, how constrained an environment you can operate in. >> We can run in less than about 100 to 150 megabytes of memory, a single-core to dual-core CPU, whether it's an ARM processor or an x86 Intel-based processor, almost literally no storage, because we're a real-time processing engine. Optionally, you could have some storage if you wanted to store some of the results locally there, but that's the kind of environment we're talking about.
Now, when I say 100 megabytes of memory, it's like a quarter of a Raspberry Pi, right? And even in that environment we have customers that run dozens of machine learning models, right? And we're not talking -- >> George: Like an ensemble. >> Like an anomaly detection, a regression, a random forest, or a clustering, or a gamut, some of those. Now, if we get into more deep learning models, like image processing and neural nets and all of that, you obviously need a little bit more memory. But what we have shown, we could still run. One of our largest smart city building customers, an elevator company, runs on a Raspberry Pi on millions of elevators, right? Dozens of machine learning algorithms on top of that, right? So that's the kind of size we're talking about. >> Let me just follow up with one question on the other thing you said, besides having to do the low latency locally. You said a lot of customers don't want to connect these brownfield, I guess, operations technology machines to the internet, and physically, I mean there was physical separation for security. So it's like security, Bill Joy used to say "security by obscurity." Here it's security by -- >> Physical separation, absolutely. Tell me about it. I was actually coming from, if you don't mind, last week I was in Saudi Arabia. One of the oil and gas plants where we deployed our software, you have to go through five levels of security even to get there. It's a multibillion dollar plant, refining the gas and all of that. Completely offline, no connectivity to the internet, and we installed, in their existing small box, our software, connected to their live video cameras that are actually measuring the stuff, doing the processing and detecting the specific conditions that we're looking for. >> That's my question, which was, if they want to be monitoring... So there's like one low level, really low hardware level, the sensor feeds. But you could actually have a richer feed, which is video and audio, but how much of that, then, are you doing the, sort of, inferencing locally? Or even retraining, and I assume that since it's not the OT device, and it's something that's looking at it, you might be more able to send it back up to the Cloud if you needed to do retraining? >> That's exactly right. So the way the model works is, particularly for image processing, because it's a more complex process to train and create a model, you could create a model offline, like in a GPU box, an FPGA box and whatnot, then import and bring the model back into this small little device that's running in the plant, and now the live video data is coming in and the model is inferencing the specific thing. Now there are two ways to update and revise the model: incremental revision of the model, you could do that if you want, or you can send the results to a central location. Not the internet, they do have a local, in this example, a PI DB, an OSIsoft PI DB, or some other local service out there, where you have an opportunity to gather the results from each of these different locations and then consolidate and retrain the model, put the model back again. >> Okay, the one part that I didn't follow completely is... If the model is running ultimately on the device, again, perhaps not even on a CPU, but a programmable logic controller. >> It could, even though a programmable controller also typically has some shape of CPU there as well. These days, most of the PLCs, programmable controllers, have either an ARM-based processor or an x86-based processor. We can run on either one of those, too.
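For a sense of what fits in that footprint, here is a generic streaming anomaly detector that keeps only a few floats of state per sensor: constant memory, no stored history. It is a plain EWMA z-score sketch, not FogHorn's actual algorithm.

    # Constant-memory anomaly detection for one sensor stream (generic sketch,
    # not FogHorn's algorithm). State per sensor: three numbers.
    class StreamingAnomaly:
        def __init__(self, alpha=0.05, z_thresh=4.0, warmup=5):
            self.alpha, self.z_thresh, self.warmup = alpha, z_thresh, warmup
            self.mean, self.var, self.n = 0.0, 1e-6, 0

        def update(self, x):
            self.n += 1
            if self.n == 1:
                self.mean = x
                return False
            diff = x - self.mean
            z = abs(diff) / (self.var ** 0.5)
            anomalous = self.n > self.warmup and z > self.z_thresh
            # Exponentially weighted running mean and variance.
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
            return anomalous

    det = StreamingAnomaly()
    for reading in [10.1, 10.0, 9.9, 10.2, 10.1, 25.0]:   # last one is a spike
        if det.update(reading):
            print("anomaly:", reading)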
We can run on either one of those too. >> So, okay, assume you've got the model deployed down there for the, you know, local inferencing. Now, some retraining is going to go on in the Cloud, where you're pulling in the richer perspective from many different devices. How does that model get back out to the device if it doesn't have the connectivity between the device and the Cloud? >> Right, so if there's strictly no connectivity, what happens is, once the model is regenerated or retrained, they put the model on a USB stick; it's low-tech. Bring the USB stick to the PLC device and upload the model. >> George: Oh, so this is sort of how we destroyed the Iranian centrifuges. >> That's exactly right, exactly right. But you know, in some other environments, even though there's no connectivity from the Cloud environment, per se, the devices have the ability to connect to the Cloud. Optionally, they say, "Look, I'm the device that's coming up, do you have an upgraded model for me?" Then it can pull the model. So in some of the environments it's super strict, where there is absolutely no way to connect the device: you put it on a USB stick and bring the model back there. In other environments, the device can query the Cloud but the Cloud cannot connect to the device. This is a very popular model these days because, in other words, imagine this: an elevator sitting in a building. Somebody from the Cloud cannot reach the elevator, but the elevator can reach the Cloud when it wants to. >> George: Sort of like a jet engine, you don't want the Cloud to reach the jet engine. >> That's exactly right. The jet engine can reach the Cloud if it wants to, when it wants to, but the Cloud cannot reach the jet engine. That's how we can pull the model. >> So Sastry, as a CTO you meet with customers often. You mentioned you were in Saudi Arabia last week. I'd love to understand how you're leveraging and engaging with customers to really help drive the development of FogHorn, in terms of being differentiated in the market. What are those kind of bi-directional, symbiotic customer relationships like? And how are they helping FogHorn? >> Right, that's actually a great question. We learn a lot from customers, because we started a long time ago. We did an initial version of the product. As we began to talk to the customers, particularly, that's part of my job, where I go talk to many of these customers, they give us feedback. "Well, my problem is really that I can't even give you connectivity to the Cloud to upgrade the model. I can't even give you sample data. How do you do that modeling, right?" And sometimes they say, "You know what, we are not technical people. Help us express the problem, the outcome. Give me tools that help me express that outcome." So we created a bunch of what we call OT tools, operational technology tools. How we distinguish ourselves in this process from the traditional Cloud-based vendors, the traditional data science and data analytics companies, is that they think in terms of computer scientists, computer programmers, and expressions. We think in terms of industrial operators: what can they express, what do they know? They don't really necessarily care, when you tell them, "I've got an anomaly detection data science machine learning algorithm," they're going to look at you like, "What are you talking about? I don't understand what you're talking about," right? You need to tell them, "Look, this machine is failing." What are the conditions in which the machine is failing?
How do you express that? And then we translate that requirement into the underlying models, the underlying VEL expressions (VEL is our CEP expression language). So we learned a ton about user interface capabilities, latency issues, connectivity issues, different protocols, a number of things that we learned from customers. >> So I'm curious with... More of the big data vendors are recognizing data in motion and data coming from devices. And some, like Hortonworks DataFlow, NiFi, has a MiNiFi component written in C++, a really low resource footprint. But I assume that that's really just a transport. It's almost like a collector, and that it doesn't have the analytics built in -- >> That's exactly right. NiFi has the transport, it has the real-time transport capability, for sure. What it does not have is this notion of that CEP concept. How do you combine all of the streams? Everything is time series data for us, right, from the devices, whether it's coming from a device or whether it's coming from another static source out there. How do you express a pattern, a recognition pattern definition, across these streams? That's where our CEP comes into the picture. A lot of these seemingly similar software capabilities that people talk about don't quite exactly have either the streaming capability, or the CEP capability, or the real-time, or the low footprint. What we have is a combination of all of that. >> And you talked about how everything's time series to you. Is there a need to have, sort of, an equivalent time series database up in some central location? So that when you subset, when you determine what relevant subset of data to move up to the Cloud, or, you know, an on-prem central location, does it need to be the same database? >> No, it doesn't need to be the same database. It's optional. In fact, we do ship a local time series database at the edge itself. If you have a little bit of local storage, you can downsample, take the results, and store them locally, and many customers actually do that. Some others, because they have their existing environment, they have some Cloud storage, whether it's Microsoft, it doesn't matter what they use, we have connectors from our software to send these results into their existing environments. >> So, you had also said something interesting about your, sort of, tool set as being optimized for operations technology. So this is really important, because back when we had the Net-Heads and the Bell-Heads, you know, it was a cultural clash and they had different technologies. >> Sastry: They sure did, yeah. >> Tell us more about how selling to operations, not just selling, but supporting operations technology is different from IT technology, and where does that boundary live? >> Right, so in a typical IT environment, you start with the boss who is the decision maker, you work with them, they approve the project, and you go and execute it. In an industrial, OT environment, it doesn't quite work like that. Even if the boss says, "Go ahead and do this project," if the operator on the floor doesn't understand what you're talking about, because that person is in charge of operating that machine, it doesn't quite work like that. So you need to work bottom-up as well, convincing them that you are indeed actually solving their pain point. So the way we start, rather than trying to tell them what capabilities we have as a product, or what we're trying to do, the first thing we ask is, what is their pain point? "What's your problem?
What is the problem "you're trying to solve?" Some customers say, "Well I've got yield, a lot of scrap. "Help me reduce my scrap. "Help me to operate my equipment better. "Help me predict these failure conditions "before it's too late." That's how the problem starts. Then we start inquiring them, "Okay, what kind of data "do you have, what kind of sensors do you have? "Typically, do you have information about under what circumstances you have seen failures "versus not seeing failures out there?" So in the process of inauguration we begin to understand how they might actually use our software and then we tell them, "Well, here, use your software, "our software, to predict that." And, sorry, I want 30 more seconds on that. The other thing is that, typically in an IT environment, because I came from that too, I've been in this position for 30 plus years, IT, UT and all of that, where we don't right away talk about CEP, or expressions, or analytics, and we don't talk about that. We talk about, look, you have these bunch of sensors, we have OT tools here, drag and drop your sensors, express the outcome that you're trying to look for, what is the outcome you're trying to look for, and then we drive behind the scenes what it means. Is it analytics, is it machine learning, is it something else, and what is it? So that's kind of how we approach the problem. Of course, if, sometimes you do surprisingly occasionally run into very technical people. From those people we can right away talk about, "Hey, you need these analytics, you need to use machinery, "you need to use expressions" and all of that. That's kind of how we operate. >> One thing, you know, that's becoming clearer is I think this widespread recognition that's data intensive and low latency work to be done near the edge. But what goes on in the Cloud is actually closer to simulation and high-performance compute, if you want to optimize a model. So not just train it, but maybe have something that's prescriptive that says, you know, here's the actionable information. As more of your data is video and audio, how do you turn that into something where you can simulate a model, that tells you the optimal answer? >> Right, so this is actually a good question. From our experience, there are models that require a lot of data, for example, video and audio. There are some other models that do not require a lot of data for training. I'll give you an example of what customer use cases that we have. There's one customer in a manufacturing domain, where they've been seeing a lot of finished goods failures, there's a lot of scrap and the problem then was, "Hey, predict the failures, "reduce my scrap, save the money", right? Because they've been seeing a lot of failures every single day, we did not need a lot of data to train and create a model to that. So, in fact, we just needed one hour's worth of data. We created a model, put the thing, we have reduced, completely eliminated their scrap. There are other kinds of models, other kinds of models of video, where we can't do that in the edge, so we're required for example, some video files or simulated audio files, take it to an offline model, create the model, and see whether it's accurately predicting based on the real-time video coming in or not. So it's a mix of what we're seeing between those two. 
>> Well Sastry, thank you so much for stopping by theCUBE and sharing what it is that you guys at FogHorn are doing, what you're hearing from customers, how you're working together with them to solve some of these pretty significant challenges. >> Absolutely, it's been a pleasure. Hopefully this was helpful, and yeah. >> Definitely, very educational. We want to thank you for watching theCUBE, I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV in downtown San Jose. Come stop by Forager Tasting Room, hang out with us, learn as much as we are about all the layers of big data digital transformation and the opportunities. Stick around, we will be back after a short break. (upbeat electronic music)

Published Date : Mar 8 2018


Daniel Raskin, Kinetica | Big Data SV 2018


 

>> Narrator: Live, from San Jose, it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners (mellow electronic music) >> Welcome back to theCUBE, on day two of our coverage of our event, Big Data SV. I'm Lisa Martin, my co-host is Peter Burris. We are down the street from the Strata Data Conference. We had a great day yesterday, and a great morning already, really learning and peeling back the layers of big data: challenges, opportunities, next generation. We're welcoming back to theCUBE an alumni, the CMO of Kinetica, Dan Raskin. Hey Dan, welcome back to theCUBE. >> Thank you, thank you for having me. >> So, I'm a messaging girl. I look at your website: the insight engine for the extreme data economy. Tell us about the extreme data economy, and what is that, what does it mean for your customers? >> Yeah, so it's a great question, and, from our perspective, we're here at Strata, and you see all the different vendors kind of talking about what's going on, and there's a little bit of word spaghetti out there that makes it really hard for customers to think about how big data is affecting them today, right? And so, what we're actually looking at is the idea that the world's changed. That big data from five years ago doesn't necessarily address all the use cases today. If you think about what customers are going through, you have more users, devices, and things coming on, there's more data coming back than ever before, and it's not just about creating the data-driven business, and building these massive data lakes that turn into data swamps, it's really about how do you create the data-powered business. So when we're using that term, we're really trying to call out that the world's changed, that, in order for businesses to compete in this new world, they have to think about how to take data and create core IP that differentiates: how do I use it to affect the omnichannel, how do I use it to deal with new things in the realm of banking and Fintech, how do I use it to protect myself against disruption in telco? And so, the extreme data economy is really this idea that you have business in motion, more things coming online than ever before: how do I create a data strategy where data is infused in my business, and creates core IP that helps me maintain category leadership or grow? >> So as you think about that challenge, there's a number of technologies that come into play. Not least of which is the industry, while it's always to a degree been driven by what hardware can do, that's moderated a bit over time, but today, in many respects, a lot of what is possible is made possible by what hardware can do, and what hardware's going to be able to do. We've been using similar AI algorithms for a long time. But we didn't have the power to use them! We had access to data, but we didn't have the power to acquire and bring it in. So how is the relationship between your software, and your platform, and some of the new hardware that's becoming available, starting to play out in a way of creating value for customers? >> Right, so, if you think about this in terms of this extreme data concept, and you think about it in terms of a couple of things: one, streaming data, just massive amounts of streaming data coming in. Billions of rows that people want to take and translate into value.
>> And that data coming from-- >> It's coming from users, devices, things, interacting with all the different assets, more edge devices that are coming online, and the Wild West, essentially. You look at the world of IoT and it's absolutely insane, with the number of protocols and device data that's coming back to a company, and then you think about how you actually translate this into real-time insight. Not near real-time, where it's taking seconds, but true millisecond response times where you can infuse this into your business. And one of the whole premises of Kinetica is the idea of this massively parallel compute. So the idea of not using CPUs anymore to drive the power behind your intelligence, but leveraging GPUs, and if you think about this, a CPU has 64 cores, 64 parallel things that you can do at a time; a GPU can have up to 6,000 cores, 6,000 parallel things, so it's kind of like lizard brain versus modern brain. How do you actually create this next-generation brain that has all these neural networks for processing the data in a way that you couldn't before? And then on top of that, you're using not just the technology of GPUs, you're trying to operationalize it. So how do you actually bring the data scientists, the BI folks, the business folks all together to create a unified operational process? The underlying piece is the Kinetica engine and the GPUs used to do this, but the power is really in the use cases of what you can do with it, and how you actually affect different industries. >> So can you elaborate a little bit more on the use cases, in this kind of game-changing environment? >> Yeah, so there's a couple of common use cases that we're seeing. One that affects every enterprise is the idea of breaking down silos of business units and creating the customer 360 view. How do I actually take all these disparate data feeds, bring them into an engine where I can visualize concepts about my customer and the environment that they're living in, and provide more insight? So if you think about things like Whole Foods and Amazon merging together, you now have this power of, how do I actually bridge the digital and physical world to create a better omnichannel experience for the user, how do I think about things in terms of what preferences they have, personalization, how to actually pair that with sensor data to affect how they navigate in a Whole Foods store more efficiently? And that's affecting every industry. You could take that to banking as well and think about the banking omnichannel, and ATMs, and the digital bank, and all these Fintech upstarts that are working to disrupt them. A great example for us is the United States Postal Service, where we're actually looking at all the data, the environmental data, around the US Postal Service. We're able to visualize it in real time, we're able to affect the logistics of how they navigate through their routes, we're able to look at things like postal workers straying out of their zones, and potentially kick off alerts around that, so effectively making the business more efficient. But we've moved into this world where we always used to talk about brick and mortar going to cloud; we're now in this world where the true value is how you bridge the digital and physical world, and create more transformative experiences, and that's what we want to do with data. So it could be logistics, it could be omnichannel, it could be security, you name it.
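Kinetica's engine itself is proprietary, but the CPU-versus-GPU fan-out Dan describes can be sketched with CuPy, an open-source library that mirrors the NumPy API on the GPU. This is only an illustration of the parallelism idea, not Kinetica's implementation; the array size is arbitrary and the snippet assumes a CUDA-capable GPU is present.

```python
import numpy as np
import cupy as cp  # NumPy-compatible API that executes on the GPU

n = 10_000_000
x_cpu = np.random.rand(n).astype(np.float32)
x_gpu = cp.asarray(x_cpu)          # one copy into GPU memory

cpu_result = np.sqrt(x_cpu).sum()  # a handful of CPU cores working serially
gpu_result = cp.sqrt(x_gpu).sum()  # same math fanned out across thousands of cores
print(float(cpu_result), float(gpu_result))
```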
It affects every single industry that we're talking about. >> So I've got two questions: what is Kinetica's contribution to that, and then, very importantly, as a CMO, how are you thinking about making sure that the value that people are creating, or can create with Kinetica, gets more broadly diffused into an ecosystem? >> Yeah, so the power that we're bringing is the idea of how to operationalize this in a way where, again, you're using your data to create value. So, having a single engine where you're collecting all of this data, massive volumes of data, terabytes upon terabytes of data, enabling it where you can query the data with millisecond response times, and visualize it with millisecond response times, and run machine learning algorithms against it to augment it. You still have that human ability to look at massive sets of data and do ad hoc discovery, but you can run machine learning algorithms against that and complement it with machine learning. And then the operational piece of bringing the data scientists into the same platform that the business is using, so you don't have data recency issues, is a really powerful mix. The other piece I would just add is the whole piece around data discovery. You can't really call it big data if, in order to analyze the data, you have to downsize and downsample to look at a subset of data. It's all about looking at the entire set. So that's where we really bring value. >> So, to summarize very quickly, you are providing a platform that can run very, very fast, in a parallel system, with memory in these parallel systems, so that large amounts of data can be acted upon. >> That's right. >> Now, so, the next question is, there's not going to be a billion people that are going to use your tool to do things. How are you going to work with an ecosystem and partners to get the value that you're able to create with this data out into the enterprise? >> It's a great question, and probably the biggest challenge that I have, which is, how do you get above the word spaghetti and just get into education around this? And so I think the key is getting into examples of how it's affecting the industry. So don't talk about the technology, and streaming from Kafka into a GPU-powered engine; talk about the impact to the business in terms of what it brings in terms of the omnichannel. You look at something like Japan in the 2020 Olympics, and you think about that in terms of telco: how are the mobile providers going to be able to take all the data of what people are doing, and relate that to ad tech, to relate that to customer insight, to relate that to new business models of how they could sell the data? That's the world of education we have to focus on: talk about the transformative value it brings from the customer perspective, the outside-in as opposed to the inside-out. >> On that educational perspective, as a CMO, I'm sure you meet with a lot of customers. Do you find that you might be in this role of trying to help bridge the gaps between different roles in an organization, where there's data silos, and there's probably still some territorial culture going on? What are you finding in terms of Kinetica's ability to really help educate and maybe bring more stakeholders, not just to the table, but kind of build a foundation of collaboration?
>> Yeah, it's a really interesting question, because I think it means, not just for Kinetica, but all vendors in the space, we have to get out of our comfort zone and stop talking speeds and feeds and scale. In fact, when we were looking at how to tell our story, we did an analysis of where most companies were talking, and they were focusing a lot more on the technical aspects that appeal to developers, which is important, you still need to court the developer, you have community products that they can download and kick the tires with, but we need to extend our dialogue, get out of our customer comfort zone, and start talking more to CIOs, CTOs, CDOs, and that's just reaching out through different avenues of communication, different ways of engaging. And so, I think that's kind of a core piece that I'm taking away from Strata: we do a wonderful job of speaking to developers; we all need to get out of our comfort zone and talk to a broader set of folks, so business folks. >> Right, 'cause that opens up so many new potential products, new revenue streams, on the marketing side being able to really target your customer base audience with relevant, timely offers, to be able to be more connected. >> Yeah, the worst scenario is talking to an enterprise about the wonders of a technology that they're super excited about, but they don't know the use case that they're trying to solve. Start with the use case they're trying to solve, start with thinking about how this could affect their position in the market, and work on that, in partnership. We have to do that in collaboration with the customers. We can't just do that alone; it's about building a partnership and learning together around how you use data in a different way. >> So as you imagine the investments that Kinetica is going to make over the next few years, with partners, with customers, what do you hope Kinetica will be in 2020? >> So, we want it to be that transformative engine for enterprises. We think we are delivering something that's quite unique in the world, and you want to see this on a global basis, affecting our customers' value. I almost want to take us out of the story, and if I'm successful, you're going to hear wonderful enterprise companies across telco, banking, and other areas just telling their story, and we happen to be the engine behind it. >> So you're an ingredient in their success. >> Yes, a core ingredient in their success. >> So if we think about, over the course of the next set of technology waves, are there any particular applications that you think you're going to be stronger in? So I'll give you an example: do you envision that Kinetica can have a major play in how automation happens inside infrastructure, or how developers start seeing patterns in data, and imagine how those assets get created? Where are some of the kind of practical, but rarely talked about, applications where you might find yourselves becoming more of an ingredient, because they themselves become ingredients to some of these other big use cases? >> There are a lot of commonalities that we're starting to see, and the interesting piece is that the architecture that you implement tends to be the same, but the context of how you talk about it, and the impact it has, tends to be different. So, I already mentioned the customer 360 view?
First and foremost, break down silos across your organization, figure out how you get your data into one place where you can run queries against it, you can visualize it, you can do machine learning analysis. That's a foundational element, and I have a company in Asia called Lippo that is doing that in their space, where all of a sudden they're starting to glean things they didn't know about their customers before, doing that ad hoc discovery. So that's one area. The other piece is this use case of how do you actually operationalize data scientists, and machine learning, into your core business? So that's another area that we focus on. There are simple entry points, things like Tableau acceleration, where you put us underneath the existing BI infrastructure, and all of a sudden you're a hundred times faster, and now your business folks can sit at the table and make real-time business decisions, where in the past, if they clicked on certain things, they'd have to wait to get those results. Geospatial visualization's a no-brainer: the idea of taking environmental data, pairing it with your customer data, for example, and now learning about interactions. And I'd say the other piece is more innovation-driven, where we would love to sit down with different innovation groups in different verticals and talk with them about, how are you looking to monetize your data in the future, what are the new business models, how do things like voice interaction affect your data strategy, what are the different ways you want to engage with your data? So there's a lot of different realms we can go to. >> One of the things you said, as we wrap up here, that I couldn't agree with more, is that the best value articulation I think a brand can have, period, is through the voice of their customer. And I think that's one of the things that Paul said yesterday: defining Kinetica's success based on the success of your customers across industries. And it really doesn't get more objective than a customer who has, not just from a developer perspective, maybe improved productivity, or workforce productivity, but actually moved the business forward, to a point where you're maybe bridging the gaps between the digital and physical, and actually enabling that business to be more profitable, and open up new revenue streams, because this foundation of collaboration has been established. >> I think that's a great way to think about it-- >> Which is good, 'cause he's your CEO. >> (laughs) Yes, that sustains my job. But the other piece is, I almost get embarrassed talking about Kinetica. I don't want to be the car salesman, or the vacuum salesman that sprinkles dirt on the floor and then vacuums it up. I'd rather us kind of fade into the behind-the-scenes power, where our customers are out there telling wonderful stories that have an impact on how people live in this world. To me, that's the best marketing you can do: real stories, real value. >> Couldn't agree more. Well Dan, thanks so much for stopping by, sharing the things that Kinetica is doing, some of the things you're hearing, and how you're working to really build this foundation of collaboration and enablement within your customers across industries. We look forward to hearing the kind of cool stuff that happens with Kinetica throughout the rest of the year, and again, thanks for stopping by and sharing your insights. >> Thank you for having me.
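The postal-worker example from earlier in the conversation lends itself to a tiny worked sketch: check each reported position against the route's zone and raise an alert on deviation. The zone definition, radius, and coordinates below are hypothetical, and a production geospatial engine would use indexed geofences rather than a per-point distance check like this.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Hypothetical route zone: a center point plus an allowed radius.
ZONES = {"carrier_17": {"lat": 37.33, "lon": -121.89, "radius_km": 2.0}}

def check_position(carrier_id, lat, lon):
    z = ZONES[carrier_id]
    dist = haversine_km(lat, lon, z["lat"], z["lon"])
    if dist > z["radius_km"]:
        return f"ALERT: {carrier_id} is {dist:.1f} km from zone center (limit {z['radius_km']} km)"
    return None

print(check_position("carrier_17", 37.36, -121.84))  # a few km off-route
```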
>> I want to thank you for watching theCUBE, I'm Lisa Martin with my co-host Peter Burris, we are at Big Data SV, our second day of coverage, at a cool place called the Forager Tasting Room, in downtown San Jose, stop by, check us out, and have a chance to talk with some of our amazing analysts on all things big data. Stick around though, we'll be right back with our next guest after a short break. (mellow electronic music)

Published Date : Mar 8 2018


Chris Selland, Unifi Software | Big Data SV 2018


 

>> Voiceover: Live from San Jose, it's The Cube. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to The Cube, our continuing coverage of our event, Big Data SV. We're on day two of this event. I'm Lisa Martin, with George Gilbert. We had a great day yesterday, learning a lot and really peeling back the layers of big data, looking at it from different perspectives, from challenges to opportunities. Joining us next is one of our Cube alumni, Chris Selland, the VP of Strategic Alliances from Unifi Software. Chris, great to meet you, welcome back! >> Thank you Lisa, it's great to be here. I have to say, as an alumni and a many-time speaker, this venue is spectacular. Congratulations on the growth of The Cube, and this is an awesome venue. I've been on The Cube a bunch of times and this is as nice as I've ever seen it. >> Yeah, this is pretty cool. >> Onward and upward. This place is great. Isn't it cool? >> It really is. This is our 10th Big Data event. We've been having five now in San Jose; we do our fifth one in New York City in the fall, and it's always interesting because we get the chance, George and I, and the other hosts, to really look at what is going on from different perspectives in the industry of big data. So before we kind of dig into that, tell us a little bit about Unifi Software: what do you guys do, what is unique and differentiating about Unifi? >> Sure, yeah, so I joined Unifi a little over a year ago. You know, I was attracted to the company because it really, I think, is aligned with where the market is going, and Peter Burris was talking this morning about networks of data. Unifi is fundamentally a data catalog and data preparation platform, kind of combined or unified together. So, you know, people say, "What do you do?" We're a data catalog with integrated data preparation. And the idea behind that, to go to Peter's mention of networks of data, is that data is becoming more and more distributed in terms of where it is, where it lives, where it sits. This idea of we're going to put everything in the data warehouse, and then we're going to put everything in the data lake; well, in reality, some of the data's in the warehouse, some of the data's in the lake, some of the data's in SaaS applications, some of the data's in blob storage. And where is all of that data, what is it, and what can I do with it? That's really the fundamental problem that we solve. And, by the way, we solve it for business people, because it's not just data scientists anymore; it's really going out into the entire business community now. You know, marketing people, operations people, finance people, they need data to do their jobs. Their jobs are becoming more data-driven, but they're not necessarily data people. They don't know what schemas are, or joins are, but they know, "I need better data to be able to do my job more effectively." So that's really what we're helping with. >> So, Chris, this is kind of interesting. If you distill, you know, the capability down to the catalog and the prep-- >> Chris: Yep. >> So that it's ready for a catalog, that sort of thing is like an investment in infrastructure, in terms of, like, building the highway system, but there are going to be, you know, for those early highways, there have got to be routes, a reason to build them out. What are some of those early use cases that justify the investment in data infrastructure?
There absolutely are. I mean, and by the way, those routes don't go away; those routes, you know, just like cities, right? New routes get built on top of them. So we're very much, you know, about, there's still data sitting in mainframes and legacy systems, and you know, that data is absolutely critical for many large organizations. We do a lot of work in banking and financial services, and healthcare. They're still-- >> George: Are there common use cases that they start with? >> A lot of times-- >> Like, either by industry or just cross-sectional? >> Well, it's interesting, because, you know, analysts like yourselves have tended to put data catalog, which is a relatively new term, although some other big analyst firm that's having another conference this week, one that starts with a "G," right? They were telling us recently that data catalog is now the number one search term they're getting. But it's been, by many analysts, also kind of lumped in, lumped in's the wrong word, but incorporated with data governance. So traditionally, governance, another word that starts with "G," has been the term. So, we're not a traditional data governance platform, per se, but cataloging data has to have a foundation of security and governance. You know, think about what's going on in the world right now, both in the court of law and the court of public opinion, things like GDPR, right? GDPR sort of says any customer data you have needs to be managed a certain way, with a certain level of sensitivity, and then there are other capabilities you need to open up to customers, like the right to be forgotten, so that means I need to have really good control of, knowledge of, and governance over my customer data. I talked about all those business people before. Certainly marketers are a great example. Marketers want all the customer data they can get, right? But there are social security numbers, PII; who should be able to see and use what? Because, if this data is used inappropriately, then it can cause a lot of problems. So, IT kind of sits in a-- they want to enable the business, but at the same time, there's a lot of risk there. So, anyway, going back to your question, you know, the catalog market has kind of evolved out of the governance market, with more of a focus on kind of, you know, enabling the business, but making sure that it's done in a secure and well-governed way. >> George: Guard rails. >> Yes, guard rails, exactly, good way to say it. So, yep, that's good, I said about 500 words, and you distilled it to about two, right? Perfect, yep. >> So, in terms of your role in strategic alliances, tell us a little about some of the partnerships that Unifi is forging to help customers understand where all this data is, to your point earlier, so the different lines of business that need it to drive and identify where its value is can actually get it. >> Absolutely. Well, certainly to your point, our customers are our partners, and we can talk about some of them. But also, strategic alliances: we work very closely with a number of, you know, larger technology companies. Microsoft is a good example.
We were actually part of the Microsoft Accelerator Program, which I think they've now rebranded Microsoft for Startups, but we've really been given tremendous support by that team, and we're doing a lot of work with them. We're to some degree cloud-agnostic: we support AWS, we support Azure, we support Google Cloud, but we're doing a lot of our development also on the Azure cloud platform. But you know, customers use all of the above, so we need to support all of the above. So Microsoft's a very close partner of ours. Another, where I'll be in two weeks, and we've got some interesting news pending, which unfortunately I can't get into today, but maybe in a couple weeks, is Adobe. We're working very closely with them on their marketing cloud, their Experience Cloud, which is what they call their enterprise marketing cloud, which obviously has a big, big focus on customer data. And then we've been working with a number of organizations in the sort of professional services and system integration space. We've had a lot of success with a firm called Access Group. We announced the partnership with them about two weeks ago. They've been a great partner for us as well. So, you know, it's all about an ecosystem. Making customers successful is about getting an ecosystem together, so it's a really exciting place to be. >> So, Chris, it's actually interesting. It sounds like there are sort of two classic routes to market. One is essentially people building your solution into theirs, whether it's an application or, you know, >> Chris: An enabling layer. >> Yes. >> Chris: Yes. >> Even a higher layer. But with corporate developers, you know, it's almost like we spent years experimenting with these data lakes. But they were a little too opaque. >> Chris: Yes. >> And you know, it's not just that you provide the guard rails, but you also provide, sort of, some transparency-- >> Chris: Yes. >> Into that. Have you seen a greater success rate within organizations who curate their data lakes, as opposed to those who, you know, who don't? >> Yes, absolutely. I think Peter said it very well in his presentation this morning, as well. That, you know, generally when you see "data lake," we associate it with Hadoop. There are use cases that Hadoop is very good for, but there are others where it might not be the best fit. Which goes to the earlier point about networks of data and distributed data: organizations that have approached Hadoop with a "let's use it for what it's good for," as opposed to "let's just dump everything in there and figure it out later," and there have been a lot of the latter, but the former have done, generally speaking, a lot better, and that's what you're seeing. And we actually use Hadoop as a part of our platform, at least for the data preparation and transformation side of what we do. We use it as an enabling technology, as well. >> You know, it's funny, actually, when you talk about, as Peter talked about, networks of data versus centralized repositories. Scott Gnau, CTO of Hortonworks, was on yesterday, and he was talking about how he had originally come from Teradata, and how he had tried to push them in the direction of recognizing that not all the analytic data was going to be in Teradata, you know, that they had to look more broadly, with Hadapt, and I forget what the rest of, you know-- >> Chris: Right, Aster, and-- >> Aster, yeah. >> Chris: Yes, exactly, yep.
>> But what was interesting is that Hortonworks was moving towards the "we believe everything is going to be in the data lake," but now, with their data plane service, they're talking about, you know, "We have to give you visibility and access." You mediate access to data everywhere. >> Chris: Right. >> So maybe help, for folks who aren't, like, all bought into Hortonworks, for example, explain how you work relative to the data plane service. >> Well, you know, maybe I could step back and give you a more general answer, because I agree with that philosophically, right? That, as I think we've been talking about here, with the networks of data, that goes back to my prior statement that there are, you know, different types of data platforms that have different use cases, and different types of solutions should be built on top of them, so things are getting more distributed. I think that, you know, Hortonworks, like every company, has to make the investments that, as we do, make their customers successful. So, using Hadoop, and Hortonworks is one of our supported Hadoop platforms, we do work with them on engagements, but you know, it's all about making customers successful, ultimately. It's not about a particular product; it's about, you know, which data belongs in which location, and for what use case and what purpose. And then at the same time, when we're taking all of these different data sets and data sources, and cataloging them and preparing them and creating our output, where should we put that and catalog that, so we can create kind of a continuous improvement cycle as well? And for those types-- >> A flywheel. >> A flywheel, exactly, a continuous improvement flywheel, and for those types of purposes, you know, that's actually a great use case for, you know, Hortonworks, Hadoop. That's a lot of what we typically use it for. We can actually put the data any place our customers define, but that's very often what we do with it, and doing it in a very structured and organized way. As opposed to, you know, a lot of the early Hadoop, and not specific to any particular distro, that went bad, where it was just like, "Let's just dump it all into Hadoop because it's cheaper." You know, "'Cause it's cheaper than the warehouse, so let's just put it all in there, and we'll figure out what to do with it later." That's bad, but if you're using it in a structured way, it can be extremely useful. At the same time, not everything that's going to go there belongs there, if you're being thoughtful about it. So you're seeing a lot more thoughtfulness these days, which is good. Which is good for customers, and it's good for us on the vendor side. Us, Hortonworks, everybody, so. >> So, Chris, maybe you can tell us about the different approaches, like the advantage of integrating the data prep with the catalog service, because as soon as you're done with data prep it's visible within the catalog. >> Chris: Absolutely, that's one, yep. >> And, let's say when people do derive additional views into the data, how are they doing that in a way that then also gets registered back in the catalog, for further discovery? >> Yeah, well, having the integrated data preparation, which is a huge differentiator for us: there are a lot of data catalog products out there, but our huge differentiator, one of them, is the fact that we have integrated data preparation.
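The flywheel Chris describes can be reduced to a few lines: every preparation job registers its output, with lineage, back into the same catalog it read from, so the next user discovers the curated set rather than re-deriving it. This is a conceptual in-memory stand-in, not Unifi's API; the dataset names and storage paths are invented.

```python
# Minimal stand-in for a catalog with integrated prep: prep output is
# registered back into the catalog, closing the continuous-improvement loop.
catalog = {}

def register(name, location, lineage=None):
    catalog[name] = {"location": location, "lineage": lineage or []}

def prep_job(name, sources, transform):
    inputs = [catalog[s]["location"] for s in sources]
    output_location = transform(inputs)               # run the preparation step
    register(name, output_location, lineage=sources)  # the flywheel turn
    return output_location

register("crm_customers", "s3://lake/raw/crm.parquet")
register("web_events", "s3://lake/raw/events.parquet")
prep_job("customer_360", ["crm_customers", "web_events"],
         transform=lambda inputs: "s3://lake/curated/customer_360.parquet")
print(catalog["customer_360"]["lineage"])  # ['crm_customers', 'web_events']
```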
We don't have to hand off to another product, so that, as you said, gives us the ability to then catalog our output and build that flywheel, that continuous improvement flywheel, and it also just basically simplifies things for customers, hence our name. So, you know, it really kind of starts there. The second part of your question, I didn't really, rewind back on that for me, it was-- >> Go ahead. >> Well, I'm not sure I remember it right now, either. >> We all need more coffee. >> Exactly, we all need more coffee. >> So I'll ask you this last question, then. >> Yes, please. >> Here we are in March 2018: what are you looking forward to, in terms of the momentum and evolution of Unifi this year? >> Well, a lot of it, tying into my role: I mentioned we will be at Adobe Summit in two weeks, so if you're going to be at Adobe Summit, come see us there; some of the work that we're doing with our partners, some of the events we're doing with people like Microsoft and Access. But really it's also just customer success. I mean, we're seeing tremendous momentum on the customer side, working with our customers, working with our partners, and again, as I mentioned, we're seeing so much more thoughtfulness in the market these days, and less talk about, you know, the speeds and feeds, and more around business solutions. That's really also where our professional services and system integration partners, many of whom I've been with this week, really help, because they're building out solutions. You know, GDPR is coming in May, right? And you're starting to really see a groundswell of, okay, you know, and that's not about, you know, speeds and feeds. That's ultimately about making sure that I'm compliant with, you know, this huge regulatory environment. And at the same time, the court of public opinion is just as important. You know, we want to make sure that we're doing the right thing with data. Spread it throughout the organization, make ourselves successful, and make our customers successful. So, it's a lot of fun. >> That's, fun is good. >> Exactly, fun is good. >> Well, we thank you so much, Chris, for stopping back by The Cube and sharing your insights, what you're hearing in the big data industry, and some of the momentum that you're looking forward to carrying throughout the year. >> It's always a pleasure, and you, too. So, love the venue. >> Lisa: All right. >> Thank you, Lisa. Thank you, George. >> Absolutely. We want to thank you for watching The Cube. You're watching our coverage of our event, Big Data SV, hashtag BigDataSV. For George, I almost said George Martin. For George Gilbert. >> George: I wish. >> George R.R., yeah. You would not be here if you were George R.R. Martin. >> George: No, I wouldn't. >> That was a really long way to say thank you for watching. I'm Lisa Martin, for this George. Stick around, we'll be right back with our next guest. (techno music)

Published Date : Mar 8 2018


Yuanhao Sun, Transwarp | Big Data SV 2018


 

>> Announcer: Live, from San Jose, it's The Cube (light music) Presenting Big Data Silicon Valley. Brought to you by Silicon Angle Media, and its ecosystem partners. >> Hi, I'm Peter Burris and welcome back to Big Data SV, The Cube's, again, annual broadcast of what's happening in the big data marketplace, here at, or adjacent to, Strata here in San Jose. We've been broadcasting all day. We're going to be here tomorrow as well, over at the Forager eatery, a place to come meander. So come on over. Spend some time with us. Now, we've had a number of great guests. Many of the thought leaders in the big data marketplace that are visiting San Jose today have been on. But I don't think any has traveled as far as our next guest. Yuanhao Sun is the CEO of Transwarp, come all the way from Shanghai. Yuanhao, it's once again great to see you on The Cube. Thank you very much for being here. >> Good to see you again. >> So Yuanhao, Transwarp as a company has become extremely well known for great technology. There's a lot of reasons why that's the case, but you have some interesting updates on how the technology's being applied. Why don't you tell us what's going on? >> Okay, so, recently we announced the first audited TPC-DS benchmark result. Our product, called Inceptor, is a SQL engine on top of Hadoop. We have added quite a lot of features, like distributed transactions and full SQL support, so that it can mimic Oracle and other traditional database features, and we can pass the whole test. This engine is also scalable, because it's distributed and scalable, so for a large benchmark like TPC-DS, which starts from 10 terabytes, our SQL engine can pass it without much trouble. >> So I know that there have been other firms that have claimed to pass TPC-DS, but they haven't been audited. What does it mean to say you're audited? I'd presume that as a result, you've gone through some extremely stringent and specific tests to demonstrate that you can actually pass the entire suite. >> Yes, actually, there is a third-party auditor. They audited our test process and results over the past six, uh, five months. So it is fully audited. The reason why we can pass the test is, actually, there are two major reasons. Traditional databases are not scalable enough to process large datasets, so they could not pass the test. For Hadoop vendors, the SQL engine features are not rich enough to pass all the tests. You know, there are several steps in the benchmark, and for the SQL queries, there are 99 queries; the syntax is not supported by all the Hadoop vendors yet. And also, the benchmark requires you to update the data after the queries, and then run the queries for multiple concurrent users. That means you have to support distributed transactions; you have to keep the updated data consistent. The Hadoop vendors' SQL engines on Hadoop haven't implemented the distributed transaction capabilities. So that's why they failed to pass the benchmark. >> So I had the honor of traveling to Shanghai last year, going and speaking at your user conference, and was quite impressed with the energy that was in the room as you announced a large number of new products. You've been very focused on taking what open source has to offer but adding significant value to it. As you said, you've done a lot with the SQL interfaces and various capabilities of SQL on top of Hadoop. Where is Transwarp going with its products today? How is it expanding? How is it being organized? How is it being used?
>> We group these products into three categories, including big data, cloud, and AI and machine learning. So there are three categories. For big data, we upgraded the SQL engine and the stream engine, and we have a set of tools, called Studio, to help people streamline big data operations. And the second product line is the data cloud. We call it Transwarp Data Cloud. This product is going to be released in early May this year. We build this product on top of Kubernetes. We provide Hadoop as a service, data science as a service, and AI as a service to customers. It allows people to create multiple tenants, and the tenants are isolated by network, storage, and CPU. They are free to create clusters, spin them up and turn them off. And it can scale to hundreds of hosts. I think this is the first implementation of network isolation and multi-tenancy in Kubernetes, so that it can support HDFS and all the Hadoop components. And because it is elastic, just like cloud computing, but we run on bare metal, people can consolidate the data and consolidate the applications in one place. Because all the applications and Hadoop components are containerized, that means they are Docker images, we can spin up very quickly and scale to a larger cluster. So this data cloud product is very interesting for large companies, because they usually have a small IT team, but they have to provide big data and machine learning capabilities to larger groups, like one thousand people. So they need a convenient way to manage all these big data clusters. And they have to isolate the resources. They even need a billing system. For this product, we already have a few big names in China, like China Post, Picture Channel, and Secret of Source Channel. They are already applying this data cloud for their internal customers. >> And China has a, has a few people, so I presume that, you know, China Post for example, is probably a pretty big implementation. >> Yes, so their IT team is like less than 100 people, but they have to support thousands of users. In the past, you would usually deploy one cluster for each application, right, but today, for a large organization, they have lots of applications. They hope to leverage big data capabilities, but a very small IT team must also support so many applications. So they need a convenient way, just like when you put Hadoop on a public cloud. We provide a product that allows you to provide Hadoop as a service in a private cloud on bare-metal machines. So this is the second product category. And the third is machine learning and artificial intelligence. We provide a data science platform, a machine learning tool, that is, an interactive tool that allows people to create machine learning pipelines and models. We even implemented some automatic modeling capability that allows you to do feature engineering automatically or semi-automatically, and to select the best models for you, so that everyone can be a data scientist. So they can use our tool to quickly create models. And we also have some pre-built models for different industries, like financial services, banks, securities companies, even IoT. So we have different pre-built machine learning models for them. We just need to modify the template, then apply the machine learning models to the applications very quickly. For example, a bank customer can use it to deploy a model in one week.
This is very quick for them. Otherwise, in the past, they would hire a company to build that application and develop the models, which usually takes several months. Today it is much faster. So today we have three categories, the big data products, plus the cloud and machine learning. >> Peter Burris: Machine learning and AI. >> And so three products. >> And you've got some very, very big implementations. So you were talking about a couple of banks, but we were talking, before we came on, about some of the smart cities. >> Yuanhao Sun: Right. >> Kinds of things that you guys are doing at enormous scale. >> Yes, so we deployed our streaming product in more than 300 cities in China. These clusters are, like, connected together. We use the streaming capability to monitor the traffic and send the information from each city to the central government, to a sort of central repository. So whenever illegal behavior on the road is detected, that information will be sent to the policemen, or to the central repository, within two seconds. Whenever you are seen by a camera in any place in China, the alert will be sent out within two seconds. >> So the bad behavior is detected. It's identified as to the location. The system also knows where the nearest police person is. And it sends a message and says, this car has performed something bad. >> Yeah, and you should stop that car at the next station or at the next crossroad. Today there are tens of thousands of policemen. They depend on this system for their daily work. >> Peter Burris: Interesting. >> So, just a question on, it sounds like one of your, sort of, nearest competitors, in terms of, let's take the open source community, at least the APIs, and in their case open source, is Huawei. Have there been customers that tried to do a POC with you and with Huawei, and said, well, it took four months using the pure open source stuff, and it took, say, two weeks with your stack, it being much broader and deeper? Are there any examples like that? >> There are quite a lot. We have more market share; in financial services, for example, we have about 100 bank customers. If we take into account all the banks that already use Hadoop, our market share is above 60%. >> George Gilbert: 60. >> Yeah, in financial services. We usually do a POC and, like, run benchmarks. They are real workloads, and usually it takes us three days or one week. They find we can speed up their workloads very quickly. For Bank of China, they migrated their Oracle workload to our platform. And they tested our platform and the Huawei platform too. So the first thing is, they cannot migrate the whole Oracle workload to open source Hadoop, because of the missing features. We are able to support all of these workloads with very minor modifications. The modifications take only several hours. And we can finish the whole workload within two hours, but originally it usually takes Oracle more than one day, >> George Gilbert: Wow. >> more than ten hours, to finish the workload. So it is very easy to see the benefits quickly. >> Now you have a streaming product, also with that same SQL interface. Are you going to see a migration of applications that used to be batch to more near real time or continuous, or will you see a whole new set of applications that weren't done before, because the latency wasn't appropriate? >> For streaming applications, the real time cases are mostly new applications, but if you are using the Storm API or the Spark Streaming API, it is not so easy to develop your applications.
And another issue is, once you define a new rule, you have to add those rules dynamically to your cluster. And for the people adding the rules, they do not have much knowledge of writing Scala code. They only know how to configure things, but they are probably familiar with SQL. They just need to add one SQL statement to add a new rule. So that they can. >> In your system. >> Yeah, in our system. So it is much easier for them to program streaming applications. And for those customers who don't have real time use cases, they hope to do something like real time data warehousing. They collect all this data from websites and from their sensors. Like PetroChina, the large oil company, they collect all the sensor information directly into our streaming product. In the past, they just loaded it into Oracle and ran the dashboard, so it took hours to see the results. But today, the application can be moved to our streaming product with only a few modifications, because they are all SQL statements. And this application becomes real time. They can see the real time dashboard results in several seconds. >> So Yuanhao, you're number one in China. You're moving more aggressively to participate in the US market. What's the, last question, what's the biggest difference between being number one in China, the way that big data is being done in China, versus the way you're encountering big data being done here, certainly in the US, for example? Is there a difference? >> I think there are some differences. US customers usually request a POC. But in China, they usually, I think they focus more on the results. They focus on what benefit they can gain from your product. So we have to prove it to them. We have to help them migrate applications to see the benefits. I think in the US, they focus more on the technology than Chinese customers do. >> Interesting, so they're more about the technology here in the US, more about the outcome in China. Once again, Yuanhao Sun, CEO of Transwarp, thank you very much for being on The Cube. >> Thank you. >> And I'm Peter Burris with George Gilbert, my co-host, and we'll be back with more from Big Data SV, in San Jose. Come on over to the Forager, and spend some time with us. And we'll be back in a second. (light music)
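[Editor's note: a minimal sketch of the streaming-SQL pattern Yuanhao describes above, where a new rule is added with a single SQL statement instead of custom Scala code. The DSN, the table names, and the SQL dialect are hypothetical stand-ins; Transwarp's actual interface is not shown in the interview.]

```python
# Hypothetical sketch: register a streaming rule as one SQL statement.
# Assumes an ODBC data source named "streaming-engine" is configured;
# table names and rule syntax are invented for illustration.
import pyodbc

conn = pyodbc.connect("DSN=streaming-engine")
cur = conn.cursor()

# Example rule: flag any plate seen at two different checkpoints less
# than sixty seconds apart, writing matches to an alerts table.
cur.execute("""
    INSERT INTO traffic_alerts
    SELECT a.plate, a.checkpoint, b.checkpoint, a.seen_at
    FROM camera_stream a
    JOIN camera_stream b
      ON a.plate = b.plate
     AND a.checkpoint <> b.checkpoint
     AND b.seen_at BETWEEN a.seen_at AND a.seen_at + INTERVAL '60' SECOND
""")
conn.commit()
```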

Published Date : Mar 8 2018


Ron Bodkin, Google | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley, brought to you by Silicon Angle Media and its ecosystem partners. >> Welcome back to theCUBE's continuing coverage of our event Big Data SV. I'm Lisa Martin, joined by Dave Vellante and we've been here all day having some great conversations really looking at big data, cloud, AI machine-learning from many different levels. We're happy to welcome back to theCUBE one of our distinguished alumni, Ron Bodkin, who's now the Technical Director of Applied AI at Google. Hey Ron, welcome back. >> It's nice to be back Lisa, thank you. >> Yeah, thanks for coming by. >> Thanks Dave. >> So you have been a friend of theCUBE for a long time, you've been in this industry and this space for a long time. Let's take a little bit of a walk down memory lane, your perspectives on Big Data Hadoop and the evolution that you've seen. >> Sure, you know so I first got involved in big data back in 2007. I was VP of Engineering at a startup called QuantCast in the online advertising space. You know, we were using early versions of Hadoop to crunch through petabytes of data and build data science models and I saw a huge opportunity to bring those kinds of capabilities to the enterprise. You know, we were working with early Hadoop vendors. Actually, at the time, there was really only one commercial vendor of Hadoop, it was Cloudera and we were working with them and then you know, others as they came online, right? So back then we had to spend a lot of time explaining to enterprises what was this concept of big data, why Hadoop as open source could get interesting, what did it mean to build a data lake? And you know, we always said look, there's going to be a ton of value around data science, right? Putting your big data together and collecting complete information and then being able to build data science models to act in your business. So you know, the exciting thing for me is you know, now we're at a stage where many companies have put those assets together. You've got access to amazing cloud scale resources like we have at Google to not only work with great information, but to start to really act on it because you know, kind of in parallel with that evolution of big data was the evolution of the algorithms as well as the access to large amounts of digital data that's propelled, you know, a lot of innovation in AI through this new trend of deep learning that we're invested heavily in. >> I mean the epiphany of Hadoop when I first heard about it was bringing, you know, five megabytes of code to a petabyte of data as sort of the bromide. But you know, the narrative in the press has really been well, they haven't really lived up to expectations, the ROI has been largely a reduction on investment and so is that fair? I mean you've worked with practitioners, you know, all your big data career and you've seen a lot of companies transform. Obviously Google as a big data company is probably the best example of one. Do you think that's a fair narrative or did the big data hype fail to live up to expectations? >> I think there's a couple of things going on here. One is, you know, that the capabilities in big data have varied widely, right? So if you look at the way, for example, at Google we operate with big data tools that we have, they're extremely productive, work at massive scale, you know, with large numbers of users being able to slice and dice and get deep analysis of data. It's a great setup for doing machine learning, right?
That's why we have things like BigQuery available in the cloud. You know, I'd say that what happened in the open source Hadoop world was it ended up settling in on more of a subset of use cases around how do we make it easy to store large amounts of data inexpensively, how do we offload ETL, how do we make it possible for data scientists to get access to raw data? I don't think that's as functional as what people really had imagined coming out of big data. But it's still served a useful function complementing what companies were already doing at their warehouse, right? So I'd say those efforts to collect big data and to make them available have really been a, they've set the stage for analytic value both through better building of analytic databases but especially through machine learning. >> And there's been some clear successes. I mean, one of them obviously is advertising, Google's had a huge success there. But much more, I mean fraud detection, you're starting to see health care really glom on. Financial services have been big on this, you know, maybe largely for marketing reasons but also risk, you know, for sure, so there's been some clear successes. I've likened it to, you know, before you've got to paint, you've got to scrape and you've got to put in caulking and so forth. And now we're in a position where you've got a corpus of data in your organization and you can really start to apply things like machine learning and artificial intelligence. Your thoughts on that premise? >> Yeah, I definitely think there's a lot of truth to that. I think some of it was, there was a hope, a lot of people thought that big data would be magic, that you could just dump a bunch of raw data without any effort and out would come all the answers. And that was never a realistic hope. There's always a level of, you have to at least have some level of structure in the data, you have to put some effort in curating the data so you have valid results, right? So it's created a set of tools to allow scaling. You know, we now take for granted the ability to have elastic data, to have it scale and have it in the cloud in a way that just wasn't the norm even 10 years ago. It's like people were thinking about very brittle, limited amounts of data in silos as the norm, so the conversation's changed so much, we almost forget how much things have evolved. >> Speaking of evolution, tell us a little bit more about your role with applied AI at Google. What was the genesis of it and how are you working with customers for them to kind of leverage this next phase of big data and applying machine learning so that they really can identify, well monetize content and data and actually identify new revenue streams? >> Absolutely, so you know at Google, we really started the journey to become an AI-first company early this decade, a little over five years ago. We invested in the Google X team, you know, Jeff Dean was one of the leaders there, sort of to invest in, hey, these deep learning algorithms are having a big impact, right? Fei-Fei Li, who's now the Chief Scientist at Google Cloud, was at Stanford doing research around how can we teach a computer to see and catalog a lot of digital data for visual purposes? So combining that with advances in computing, with first GPUs and then ultimately we invested in specialized hardware that made it work well for us. The massive-scale TPUs, right? That combination really started to unlock all kinds of problems that we could solve with machine learning in a way that we couldn't before.
So it's now become central to all kinds of products at Google, whether it be the biggest improvements we've had in search and advertising coming from these deep learning models but also breakthroughs, products like Google Photos where you can now search and find photos based on keywords from intelligence in a machine that looks at what's in the photo, right? So we've invested and made that a central part of the business and so what we're seeing is as we build up the cloud business, there's a tremendous interest in how can we take Google's capabilities, right, our investments in open source deep learning frameworks, TensorFlow, our investments in hardware, TPU, our scalable infrastructure for doing machine learning, right? We're able to serve a billion inferences a second, right? So we've got this massive capability we've built for our own products that we're now making available for customers and the customers are saying, "How do I tap into that? "How can I work with Google, how can I work with "the products, how can I work with the capabilities?" So the applied AI team is really about how do we help customers drive these 10x opportunities with machine learning, partnering with Google? And the reason it's a 10x opportunity is you've had a big set of improvements where models that weren't useful commercially until recently are now useful and can be applied. So you can do things like translating languages automatically, like recognizing speech, like having automated dialog for chat bots or you know, all kinds of visual APIs like our AutoML API where engineers can feed it images and it will train a model specialized to their need to recognize what you're looking for, right? So those types of advances mean that all kinds of business processes can be reconceived, and dramatically improved with automation, taking a lot of human drudgery out. So customers are like, "That's really exciting, and at Google you're doing that. How do we get that, right? We don't know how to go there." >> Well natural language processing has been amazing in the last couple of years. Not surprising that Google is so successful there. I was kind of blown away that Amazon with Alexa sort of blew past Siri, right? And so thinking about new ways in which we're going to interact with our devices, it's clearly coming, so it leads me into my question on innovation. What's driven, in your view, the innovation in the last decade, and what's going to drive innovation in the next 10 years? >> I think innovation is very much a function of having the right kind of culture and mindset, right? So I mean for us at Google, a big part of it is what we call 10x thinking, which is really focusing on how do you think about the big problem and work on something that could have a big impact? I also think that you can't really predict what's going to work, but there's a lot of interesting ideas and many of them won't pan out, right? But the more you have a culture of failing fast and trying things and at least being open to the data and give it a shot, right, and say "Is this crazy thing going to work?" That's why we have things like Google X where we invest in moonshots but that's where, you know, throughout the business, we say hey, you can have a 20% project, you can go work on something and many of them don't work or have a small impact but then you get things like Gmail getting created out of a 20% project.
It's a cultural thing that you foster and encourage people to try things and be open to the possibility that something big is on your hands, right? >> On the cultural front, it sounds like in some cases depending on the enterprise, it's a shift, in some cases it's a cultural journey. The Google on Google story sounds like it could be a blueprint, of course, how do we do this? You've done this, but how much is it a blueprint on the technology, capitalizing on deep learning capabilities, as well as a blueprint for helping organizations on this cultural journey, to actually being able to benefit and profit from this? >> Yeah, I mean that's absolutely right Lisa that these are both really important aspects, that there's a big part of the cultural journey. In order to be an AI-first company, to really reconceive your business around what can happen with machine learning, it's important to be a digital company, right? To have a mindset of making quick decisions and thinking about how data impacts your business and activating in real time. So there's a cultural journey that companies are going through. How do we enable our knowledge workers to do this kind of work, how do we think about our products in a new way, how do we reconceive, think about automation? There's a lot of these aspects that are cultural as well, but I think a big part of it is, you know, it's easy for companies to get overwhelmed, but it's like, you have to pick somewhere, right? What's something you can do, what's a true north, what's an area where you can start to invest and get impact and start the journey, right? Start to do pilots, start to get something going. What we found, something I've found in my career, has been when companies get started with the right first project and get some success, they can build on that success and invest more, right? Whereas you know, if you're not experimenting and trying things and moving, you're never going to get there. >> Momentum is key, well Ron, thank you so much for taking some time to stop by theCUBE. I wish we had more time to chat but we appreciate your time. >> No, it's great to be here again. >> See ya. >> We want to thank you for watching theCUBE live from our event, Big Data SV in San Jose. I'm Lisa Martin with Dave Vellante, stick around we'll be back with our wrap shortly. (relaxed electronic jingle)
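[Editor's note: Ron mentions hosted visual APIs that engineers can feed images. As a rough flavor of that idea, here is a hedged sketch against the pretrained Cloud Vision label-detection endpoint; it assumes the google-cloud-vision package is installed and application credentials are configured, and the file path is just an example. The AutoML flow he describes additionally trains a custom model, which is not shown here.]

```python
# Sketch: send an image to a hosted vision API and print labels with
# confidence scores. Requires google-cloud-vision and configured
# Google Cloud credentials; "example.jpg" is a placeholder file.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("example.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```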

Published Date : Mar 8 2018


Seth Dobrin, IBM | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE's continuing coverage of our own event, Big Data SV. I'm Lisa Martin, with my cohost Dave Vellante. We're in downtown San Jose at this really cool place, Forager Eatery. Come by, check us out. We're here tomorrow as well. We're joined by, next, one of our CUBE alumni, Seth Dobrin, the Vice President and Chief Data Officer at IBM Analytics. Hey, Seth, welcome back to theCUBE. >> Hey, thanks for having me again. Always fun being with you guys. >> Good to see you, Seth. >> Good to see you. >> Yeah, so last time you were chatting with Dave and company was back in the fall at the Chief Data Officers Summit. What's kind of new with you in IBM Analytics since then? >> Yeah, so at the Chief Data Officers Summit, I was talking with one of the data governance people from TD Bank and we spent a lot of time talking about governance. Still doing a lot with governance, especially with GDPR coming up. But really started to ramp up my team to focus on data science, machine learning. How do you do data science in the enterprise? How is it different from doing a Kaggle competition, or someone getting their PhD or Masters in Data Science? >> Just quickly, who is your team composed of in IBM Analytics? >> So IBM Analytics represents, think of it as our software umbrella, so it's everything that's not pure cloud or Watson or services. So it's all of our software franchise. >> But in terms of roles and responsibilities, data scientists, analysts. What's the mixture of-- >> Yeah. So on my team I have a small group of people that do governance, and so they're really managing our GDPR readiness inside of IBM in our business unit. And then the rest of my team is really focused on this data science space. And so this is set up from the perspective of we have machine-learning engineers, we have predictive-analytics engineers, we have data engineers, and we have data journalists. And that's really focused on helping IBM and other companies do data science in the enterprise. >> So what's the dynamic amongst those roles that you just mentioned? Is it really a team sport? I mean, initially it was the data scientist on a pedestal. Have you been able to attack that problem? >> So I know a total of two people that can do that all themselves. So I think it absolutely is a team sport. And it really takes a data engineer or someone with deep expertise in there, that also understands machine-learning, to really build out the data assets, engineer the features appropriately, provide access to the model, and ultimately to what you're going to deploy, right? Because the way you do it as a research project or an activity is different than using it in real life, right? And so you need to make sure the data pipes are there. And when I look for people, I actually look for a differentiation between machine-learning engineers and optimization. I don't even post for data scientists because then you get a lot of data scientists, right? People who aren't really data scientists, and so if you're specific and ask for machine-learning engineers or decision optimization, OR-type people, you really get a whole different crowd in. But the interplay is really important because most machine-learning use cases you want to be able to give information about what you should do next. What's the next best action? And to do that, you need decision optimization.
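[Editor's note: a small sketch of the pairing Seth describes, where a machine-learning model scores an outcome and a decision-optimization step picks the next best action. The synthetic data, the action names, and the margin and cost figures are all invented for illustration; real decision optimization would typically use a proper solver rather than this trivial argmax.]

```python
# Sketch: predict a response probability, then choose the action with
# the highest expected value. Data and economics are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # customer features
y = X[:, 0] + rng.normal(size=500) > 0   # did the customer respond?

model = LogisticRegression().fit(X, y)

# (margin if the customer responds, cost of taking the action)
actions = {"email": (1.0, 0.1), "call": (4.0, 2.0), "discount": (2.5, 1.0)}

customer = rng.normal(size=(1, 3))
p = model.predict_proba(customer)[0, 1]  # P(response) for this customer

expected = {a: p * margin - cost for a, (margin, cost) in actions.items()}
best = max(expected, key=expected.get)
print(best, round(expected[best], 3))
```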
>> So in the early days of when we, I mean, data science has been around forever, right? We always hear that. But in the, sort of, more modern use of the term, you never heard much about machine learning. It was more like stats, math, some programming, data hacking, creativity. And then now, machine learning sounds fundamental. Is that a new skillset that the data scientists had to learn? Did they get it from other parts of the organization? >> I mean, when we talk about math and stats, what we call machine learning today is what we've been doing with statistics for years, right? I mean, a lot of the same things we apply in what we call machine learning today I did during my PhD 20 years ago, right? It was just with a different perspective. And you applied those types of, they were more static, right? So I would build a model to predict something, and it was only for that. It really didn't apply beyond that, so it was very static. Now, when we're talking about machine learning, I want to understand Dave, right? And I want to be able to predict Dave's behavior in the future, and learn how you're changing your behavior over time, right? So one of the things that a lot of people don't realize, especially senior executives, is that machine learning creates a self-fulfilling prophecy. You're going to drive a behavior so your data is going to change, right? So your model needs to change. And so that's really the difference between what you think of as stats and what we think of as machine learning today. So what we were looking for years ago is all the same, we just described it a little differently. >> So how fine is the line between a statistician and a data scientist? >> I think any good statistician can really become a data scientist. There's some issues around data engineering and things like that but if it's a team sport, I think any really good, pure mathematician or statistician could certainly become a data scientist. Or machine-learning engineer. Sorry. >> I'm interested in it from a skillset standpoint. You were saying how you're advertising to bring on these roles. I was at the Women in Data Science Conference with theCUBE just a couple of days ago, and we hear so much excitement about the role of data scientists. It's so horizontal. People have the opportunity to make an impact in policy change, healthcare, etc. So the hard skills, the soft skills, mathematician, what are some of the other elements that you would look for, or that companies, enterprises that need to learn how to embrace data science, should look for? Someone that's not just a mathematician but someone that has communication skills, collaboration, empathy, what are some of those, openness, to not lead data down a certain, what do you see as the right mix there of a data scientist? >> Yeah, so I think that's a really good point, right? It's not just the hard skills. When my team goes out, because part of what we do is we go out and sit with clients and teach them our philosophy on how you should integrate data science in the enterprise. A good part of that is sitting down and understanding the use case. And working with people to tease out, how do you get to this ultimate use case, because any problem worth solving is not one model, any use case is not one model, it's many models. How do you work with the people in the business to understand, okay, what's the most important thing for us to deliver first? And it's almost a negotiation, right? Talking them back. Okay, we can't solve the whole problem.
We need to break it down in discrete pieces. Even when we break it down into discrete pieces, there's going to be a series of sprints to deliver that. Right? And so having these soft skills to be able to tease that out in a way, and really help people understand that their way of thinking about this may or may not be right. And doing that in a way that's not offensive. And there's a lot of really smart people that can say that, but they can come across as being offensive, so those soft skills are really important. >> I'm going to talk about GDPR in the time we have remaining. We've talked about it in the past; the clock's ticking, and in May the fines go into effect. The relationship between data science, machine learning, GDPR, is it going to help us solve this problem? This is a nightmare for people. And many organizations aren't ready. Your thoughts. >> Yeah, so I think there's some aspects that we've talked about before. How important it's going to be to apply machine learning to your data to get ready for GDPR. But I think there's some aspects that we haven't talked about before here, and that's around what impact does GDPR have on being able to do data science, and being able to implement data science. So one of the aspects of the GDPR is this concept of consent, right? So it really requires consent to be understandable and very explicit. And it allows people to be able to retract that consent at any time. And so what does that mean when you build a model that's trained on someone's data? If you haven't anonymized it properly, do I have to rebuild the model without their data? And then it also brings up some points around explainability. So you need to be able to explain your decision, how you used analytics, how you got to that decision, to someone if they request it. To an auditor if they request it. Traditional machine learning, that's not too much of a problem. You can look at the features and say these features, this contributed 20%, this contributed 50%. But as you get into things like deep learning, this concept of explainable AI, or XAI, becomes really, really important. And there were some talks earlier today at Strata about how you apply machine learning, traditional machine learning, to interpret your deep learning or black box AI. So that's really going to be important, those two things, in terms of how they affect data science.
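[Editor's note: a short sketch of the explainability point Seth makes, that traditional models expose per-feature contributions directly. The data and feature names are synthetic stand-ins; deep models generally need separate XAI techniques.]

```python
# Sketch: fit a tree ensemble and read off per-feature contributions,
# the "this contributed 20%, this contributed 50%" style of answer.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
for name, imp in zip(["tenure", "balance", "age"], model.feature_importances_):
    print(f"{name}: {imp:.0%}")
```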
>> Yeah, thanks for having me. Good seeing you again. >> Pleasure. >> Nice meeting you. >> Likewise. We want to thank you for watching theCUBE live from our event Big Data SV down the street from the Strata Data Conference. I'm Lisa Martin, for Dave Vellante. Thanks for watching, stick around, we'll be right back after a short break.

Published Date : Mar 8 2018


Maribel Lopez, Lopez Research | Big Data SV 2018


 

>> Narrator: Live, from San Jose. It's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconAngle Media, and its ecosystem partners. >> Welcome back to theCUBE, we are live in San Jose, at our event, Big Data SV. I'm Lisa Martin. And we are down the street from the Strata Data Conference. We've had a great day so far, talking with a lot of folks from different companies that are all involved in the big data unraveling process. I'm excited to welcome back to theCUBE one of our distinguished alumni, Maribel Lopez, the founder and principal analyst at Lopez Research. Welcome back to theCUBE. >> Thank you. I'm excited to be here. >> Yeah, so you've been at the conference, which started a couple of days ago. What are some of the trends and things that you're hearing that are really kind of top of mind, not just for the customers that are attending, but for the companies that are creating or trying to create solutions around this big data challenge and opportunity?
And now I think we're trying to be a little more open and looser, and not really lead the data where per se, but try to find the right patterns and correlations in the data. And then just awareness in general. Like we don't believe we're biased. But if we have data that's biased who gets put into the system. So we have to really be thoughtful about what we put into the system. So I think that those three things combined have really changed the way people are looking at it. And there's a lot of awareness now around that. Because we assume at some point, the machines might be making certain decisions for us. And we want to make sure that they have the best information to do that. And that they don't limit our opportunities as a society. >> Where are companies in terms of the clients that you see, culturally in terms of embracing the openness? 'Cause you're right! From a scientific scientific method perspective. People go into, I'm going to hypothesize this because I think I'm going to find this. And maybe wanting the data to say this. Where are companies, we'll say enterprises, in becoming culturally more open to not leading the data somewhere and bringing up bias? >> Well, there are two interesting things here, right? I think there are some people that have gone down the data route for a while now, sort of the industry leading companies. They're in this mindset now trying to make sure they don't leave the data, they don't create biases in the data. They have ways to explain how the data and the analysis of the learning came about, not just for regulation, but so that they can make sure they ethically done the right thing. But then I think there's the other 95 percent of companies that they're not even there yet. They don't know that this is a problem yet. So they're still dealing with the "I've got a pool in the data." "I've got to do something with it." They don't even know what they want to do with it let alone if it's biased or not. So we're not quite at the leading the witness point there with a lot of organizations. >> But that's something that you expect to see maybe down the road. >> I'm hoping we'll get ahead of it. I'm really hoping that we'll get ahead of it. >> It's a good positive outlook on it, yeah? >> I think that, I think because the real analysis of the data problem in a big machine learning, deep learning way is so new, and the people are actually out seeking guidance, that there is an opportunity to get ahead of it. The second thing that's happening is, people don't have data scientists, right? So they don't necessarily have the people that can code this. So what they're doing now, is they're depending on the vendor landscape to provide them with an entry level set of tools. So if you're Microsoft, if you're Google, if you're Amazon, you're trying very hard to make sure that you're giving tools that have the right ethics in them, and that can help kickstart people's Machine Learning efforts. So I think that's going to be a real win for us. And we talked a lot today at the Strata conference about how, oh you don't have enough images, you can't do that. Or you don't have enough data, you can't do that. Or you don't have enough data scientists. And some of what came back is that, some of the best and the brightest have coded some things that you can start to use to kickstart that will get you to a better place than you ever could have started with yourself. So that was pretty exciting, you know. 
Transfer learning as an example of taking you know, image node from Google and some algorithms, and using those to take your images and try to figure out if somebody has Alzheimer's or not. Encode things Alzheimer's or not characteristic. So, very cool stuff, very exciting and nice to see that we've got some minds working on this for us. >> Yeah, definitely. Where you're meeting with clients that don't have a data scientist, or chief analytics officer? Sounds like a lot of the technologies need to or some have built in sort of enablement for a difference data citizen within a company. If you talking to clients that don't have a data scientist or data science team, who are your constituents there? Where are companies that don't maybe have that skill gap? Who do they go to in their organization to start evaluating the data that they have to get to know what and start to understand what their potential is? >> Yeah, there's a couple of places people go. They go to their business decision analytics people. So the people that were working with their BI dashboards, for example. The second place they go is to the cloud computing guys, cuz we're hearing a lot about cloud computing and maybe I can buy some of the stuff from the cloud. I'm just going to roll up and get all my machine learning in the cloud, right? So we're not there yet. So the biggest thing that I talk to people about right now is, what are the realities around Machine Learning and AI? We've made tremendous progress but you know you read the newspaper, and something is going to get rid of your job, and AI's going to take over the world, and we're kind of far from that reality. First of all it's very dystopian and negative. But even if it weren't that, you know what you can do today, is not that. So there's a lot of stages in between. So the first thing is just trying to get people comfortable with. No you can't just buy one product, and throw in some data, and you've got everything you need. >> Right. >> We're not there yet. But we're getting closer. You can add some components, you can get some new information, you could do some new correlations. So just getting a reality and grounding of where we are, and that we have a lot of opportunity, and that it's moving very fast. that's the other thing. >> Right. >> IT leaders are used to all evaluated once a year, evaluated once every couple of years. These things are moving in monthly increments. Like really huge changes in product categories. So you kind of have to keep on top of it to make sure you know what's available to you. >> Right. And if they don't they miss out on not only the ability to monetize data streams, but essentially going out of business. Because somebody will come in may be more nimble and agile, and be able to do it faster. >> Yeah. And we already saw those with the digital native companies that started born in the cloud companies, we used to call them. Well, now, everybody can be using the cloud. So the question then is like what's the next wave of that? The next wave of that is around understanding how to use your data, understanding how to get third-party data, and being able to rapidly make decisions and change models based on that. >> One of the things that's interesting about big data is you know it was a big buzzword, and it seems to be becoming less of a buzzword now. Gartner even was saying I think the number was 85 percent of big data projects and I think that's more in tested environments fail. 
And I often say, "Failure in a lot of cases is not a bad effort." Because it spawns genesis of new products, new ideas, et cetera. But when you're talking with clients who go, alright, we've embraced Hadoop, we've got this big data lake, now it's turning really swampy. We don't know-- >> We've got lakes, we've got oceans, we've got ponds. Yeah. >> Right. What's the conversation there where you're helping a customer clean that swamp up, get broader visibility across their datasets and enable different lines of business. Not just you know, the BI folks or the cloud folks or IT. But marketing, logistics, sales. What's that conversation like to clean up the swamp and do more enablement for visibility? >> I think one of the things that we got really hung up on was, you know, creating a data ocean, right? We're going to bring everything all in one place, it's going to be this one massive data source. >> It sounded great. >> It's going to be awesome. And this is not the reality of the world, right? So I think the first thing in the cleaning up that we have to do, is being able to figure out what's the source of truth for any given dataset that somebody needs. So you see 15 salespeople walk in and they all have different versions of the data that shouldn't happen. >> Right. >> So we need to get to the point where they know where the source of truth is for that data. The second is sort of governance around the data. We spent a lot of time dumping the data but not a lot of time in terms of getting governance around who can access it, what they can do with it, for how long they could have access to it. Is it just internal? Is it internal and external? So I think that's the second thing around like harassing and haranguing the swamps, and the lakes and the ponds, right? And then assuming that you do that, I think the other thing is, You know, if you have a hammer everything looks like a nail. Well, in reality you know when you construct things you have nails, you have screws, you have bolts, right? And picking the right tool for the job is something that the IT leadership has to work with. And the only way that they get that right is to work very closely with the different lines of business so they can understand the problem. Because the business leader knows the problem, they don't know the solution. If you put them together which we've talked about forever, frankly. But now I think we're seeing more imperatives for those two to work closely together. And sometimes it's even driven by security, just to make sure that the data isn't leaking into other places or that it's secure and that they've met regulatory compliance. So we're in a much better space than we were two, three, five years ago cuz we're thinking about the real problems now. Not just how do you collect it, and how do you store it. But how do we actually make it an actionable manageable set of solutions. >> Exactly, and make it work for the business. Well Maribel, I wish we had more time, but thank you so much for stopping by theCUBE, sharing the insights that you've seen. Not just at a conference, but also with your clients. >> Thank you. >> We want to thank you for watching theCUBE. Again, I'm Lisa Martin, live from Big Data SV, in Downtown San Jose. Get involved in the conversation #BigDataSV. Come see us at the Forager Eatery & Tasting Room, and I'll be right back with our next guest. (upbeat music)

Published Date : Mar 8 2018


Kunal Agarwal, Unravel Data | Big Data SV 2018

>> Announcer: Live from San Jose, it's theCUBE! Presenting Big Data: Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. (techno music) >> Welcome back to theCUBE. We are live on our first day of coverage at our event BigDataSV. I am Lisa Martin with my co-host George Gilbert. We are at this really cool venue in downtown San Jose. We invite you to come by today, tonight for our cocktail party. It's called Forager Tasting Room and Eatery. Tasty stuff, really, really good. We are down the street from the Strata Data Conference, and we're excited to welcome to theCUBE a first-time guest, Kunal Agarwal, the CEO of Unravel Data. Kunal, welcome to theCUBE. >> Thank you so much for having me. >> So, I'm a marketing girl. I love the name Unravel Data. (Kunal laughs) >> Thank you. >> Two-year-old company. Tell us a bit about what you guys do and why that name... What's the implication there with respect to big data? >> Yeah, we are an application performance management company. And big data applications are just very complex. And the name Unravel is all about unraveling the mysteries of big data and understanding why things are not performing well, and not really needing a PhD to do so. We're simplifying application performance management for the big data stack. >> Lisa: Excellent. >> So, um, you know, one of the things that a lot of people are talking about with Hadoop: originally it was this cauldron of innovation, because we had the "let a thousand flowers bloom" in terms of all the Apache projects. But then once we tried to get it into operation, we discovered there's a... >> Kunal: There's a lot of problems. (Kunal laughs) >> There's an overhead, there's a downside to it. >> Maybe tell us why you both need to know how people have done this many, many times. >> Yeah. >> How you need to learn from experience, and then how you can apply that even in an environment where someone hasn't been doing it for that long. >> Right. So, if I step back a little bit. Big data is powerful, right? It's giving companies an advantage that they never had, and data's an asset to all of these different companies. Now they're running everything from BI, machine learning, artificial intelligence, IoT, streaming applications on top of it for various reasons. Maybe it is to create a new product, to understand the customers better, etc. But as you rightly pointed out, when you start to implement all of these different applications and jobs, it's very, very hard, because big data is very complex. With that great power comes a lot of complexity, and what we started to see is a lot of companies, while they want to create these applications and provide that differentiation to their company, just don't have enough expertise in house to write good applications, maintain these applications, and even manage the underlying infrastructure and cluster that all these applications are running on. So we took it upon ourselves where we thought, hey, if we simplify application performance management and if we simplify ongoing management challenges, then these companies would run more big data applications, they would be able to expand their use cases, and not really be fearful of: hey, we don't know how to go and solve these problems; do we actually want to rely on a system that is so complex and new?
And that's the gap that Unravel fills, which is: we monitor and manage not only one component of the big data ecosystem but, like you pointed out, a full zoo of all of these systems. You have Hadoop, and you have Spark, and you have Kafka for data ingestion. You may have some NoSQL systems and newer MPP platforms as well. So the vision of Unravel is really to be that one place where you can come in and understand what's happening with your applications and your system overall, and be able to resolve those problems in an automatic, simple way. >> So, all right, let's start at the concrete level of what a developer might get out of >> Kunal: Right. >> something that's wrapped in Unravel, and then tell us what the administrator experiences. >> Kunal: Absolutely. So if you are a big data developer, you've got in a business requirement that says, hey, go and make this application that understands our customers better, right? They may choose a tool of their liking, maybe Hive, maybe Spark, maybe Kafka for data ingestion. And what they'll do is they'll write an app first in dev, in their dev environment or the QA environment. And they'll say, hey, maybe this application is failing, or maybe this application is not performing as fast as I want it to, or, even worse, this application is starting to hog a lot of resources, which may slow down my other applications. Now, to understand what's causing these kinds of problems, today developers really need a PhD to go and decipher them. They have to look at tons of raw logs, metrics, configuration settings, and then try to stitch the story up in their head, trying to figure out what is the effect, what is the cause. Maybe it's this problem, maybe it's some other problem. And then do trial and error to try, you know, to solve that particular issue. Now, what we've seen is big data developers come in a variety of flavors. You have the hardcore developers who truly understand Spark and Hadoop and everything, but then 80% of the people submitting these applications are data scientists or business analysts, who may understand SQL, who may know Python, but don't necessarily know what distributed computing and parallel processing and all of these things really are, and where inefficiencies and problems can really lie. So we give them this one view, which will connect all of these different data sources and then tell them in plain English: this is the problem, this is why this problem happened, and this is how you can go and resolve it, thereby getting them unstuck and making it very simple for them to go in and get the performance that they're expecting. >> So, these, um, they're the developers up front, and you're giving them a whole new, sort of, toolchain or environment to solve the operational issues. >> Kunal: Right. >> So that, if it's DevOps, the dev side really becomes much more self-sufficient. >> Yes, yes. I mean, all companies want to run fast. They don't want to be slowed down. If you have a problem today, they'll file a ticket, it'll go to the operations team, you wait a couple of days to get some more information back. That just means your business has slowed down. If things are simple enough that the application developers themselves can resolve a lot of these issues, that'll get the business unstuck and get them moving on further. Now, to the other point which you were asking, which is: what about the operations and the app support people?
So, Unravel's a great tool for them too, because it helps them see what's happening holistically in the cluster. How are other applications behaving with each other? It's usually a multitenant, multiapplication environment that these big data jobs are running on. So, are my apps slowing down George's apps? Am I stealing resources from your applications? More so, it's not just about an individual application issue itself. So Unravel will give you visibility into each app, as well as the overall cluster, to help you understand cluster-wide problems. >> Love to get at, maybe peel apart your target audience a little bit. You talked about DevOps. But also the business analysts, data scientists, and we talk about big data. Data has such tremendous power to fuel a company and, you know, like you said, use it to create and deliver new products. Are you talking with multiple audiences within a company? Do you start at DevOps and they bring in their peers? Or do you actually start, maybe, at the Chief Data Officer level? What's that kind of entrance for Unravel? >> So the word I use to describe this is DataOps, instead of DevOps, right? So in the older world you had developers, and you had operations people. Over here you have a data team and operations people, and that data team can comprise the developers, the data scientists, the business analysts, etc., as well. But you're right. Although we first target the operations role, because they have to manage and monitor the system and make sure everything is running like a well-oiled machine, they are now spreading it out to the end-users, meaning the developers themselves, saying, "Don't come to me for every problem. Look at Unravel, try to solve it here, and if you cannot, then come to me." This is all, again, improving agility within the company, making sure that people have the necessary tools and insights to carry on with their day. >> Sounds like an enabler, >> Yeah, absolutely. >> that operations would push down to the dev side, the developers themselves. >> And even the managers and the CDOs, for example, they want to see the ROI that they're getting from their big data investments. They have put in these millions of dollars, have got the infrastructure and these services set up, but how are we actually moving the needle forward? Are there any applications that we're actually putting in business, and is that driving any business value? So we will be able to give them a very nice dashboard helping them understand what kind of throughput you are getting from your system, how many applications you were able to develop last week and onboard to your production environment, and what's the rate of innovation that's really happening inside your company on those big data ecosystems. >> It sort of brings up an interesting question on two prongs. One is the well-known but inexact number about how many big data projects, >> Kunal: Yeah, yeah. >> I don't know whether they fail or didn't pay off. So there's going in and saying, "Hey, we can help you manage this because it was too complicated." But then there's also all the folks who decided, "Well, we really don't want to run it all on-prem. We're not going to throw away everything we did there, but we're going to also put a lot of new investment >> Kunal: Exactly, exactly. >> in the cloud." Now, Wikibon has a term for that, which is true private cloud, which is when you have the operational processes that you use in the public cloud and you can apply them on-prem. >> Right.
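Kunal's description of turning raw logs and metrics into plain-English answers, and of spotting one tenant starving another, can be sketched as a small rule engine. The sketch below is a loose illustration under assumed metric names and thresholds; it is not Unravel's actual method or API.

```python
# Minimal sketch: correlate full-stack signals for one job and emit a
# plain-English cause and fix. Field names and thresholds are assumptions.
def diagnose(job):
    findings = []
    # Out-of-memory failure: containers ran into their memory limit.
    if job["status"] == "FAILED" and job["peak_mem_mb"] >= job["mem_limit_mb"]:
        findings.append("Failed because containers hit their memory limit; "
                        "raise executor memory or repartition the input.")
    # Data skew: the slowest task dwarfs the median task.
    if job["max_task_s"] > 10 * job["median_task_s"]:
        findings.append("Slow because one task takes over 10x the median; "
                        "the input is likely skewed on the join/group key.")
    # Multitenant contention: queued longer than it ran on a busy cluster.
    if job["queue_wait_s"] > job["run_s"] and job["cluster_util"] > 0.9:
        findings.append("Spent longer queued than running while the cluster "
                        "was over 90% utilized; another tenant holds the resources.")
    return findings or ["No known pattern matched; inspect the raw logs."]

# Usage: a successful but slow run whose longest task dominates.
slow_job = {"status": "SUCCEEDED", "peak_mem_mb": 3000, "mem_limit_mb": 4096,
            "max_task_s": 900, "median_task_s": 40,
            "queue_wait_s": 20, "run_s": 950, "cluster_util": 0.6}
print(diagnose(slow_job))  # -> skew finding, in plain English
```

A real system would learn such thresholds from history rather than hard-coding them, but the shape Kunal describes is the same: join app, container, and cluster signals, then answer with a cause and a fix.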
>> George: But there's not many products that help you do that. How can Unravel work...? >> Kunal: That's a very good question, George. We're seeing the world move more and more to a cloud environment, or I should say an on-demand environment, where you're not so bothered about the infrastructure and the services, but you want Spark as a dial tone. You want Kafka as a dial tone. You want a machine-learning platform as a dial tone. You want to come in there, you want to put in your data, and you want to just start running it. Unravel has been designed from the ground up to monitor and manage any of these environments. So, Unravel can solve problems for your applications running on-premise, and similarly all the applications that are running on cloud. Now, on the cloud there are other levels of problems as well. So, of course, you'd have applications that are slow, applications that are failing; we can solve those problems. But if you look at a cloud environment, a lot of these now provide you an autoscaling capability, meaning, hey, if this app doesn't run in the amount of time that we were hoping it to run, let's add extra hardware and run this application. Well, if you just keep throwing machines at the problem, it's not going to solve your issue, because the runtime doesn't decrease linearly with how many servers you're actually throwing in there. So what we can help companies understand is: what is the resource requirement of a particular application? How should we be intelligently allocating resources to make sure that you're able to meet your time SLAs, your constraints of, here, I need to finish this within x number of minutes, but at the same time be intelligent about how much cost you're spending over there? Do you actually need 500 containers to go and run this app? Well, you may have needed 200. How do you know that? So, Unravel will also help you get efficient with your run: not just faster, but also, can it be a good multitenant citizen, can it use limited resources to actually run these applications as well? >> So, Kunal, some of the things I'm hearing from a customer's standpoint that are potential positive business outcomes are internal: performance boost. >> Kunal: Yeah. >> It also sounds like, sort of... productivity improvements internally. >> And then also the opportunity to have the insight to deliver new products, but even, I'm thinking of, you know, helping a retailer, for example, be able to do more targeted marketing. So the business outcomes and the impact that Unravel can make really seem to have pretty strong internal and external benefits. >> Kunal: Yes. >> Is there a favorite customer story, (Kunal laughs) don't have to mention names, that you really think speaks to your capabilities? >> So, 100%. Improving performance is a very big factor of what Unravel can do. Decreasing costs by improving productivity, by limiting the amount of resources that you're using, is a very, very big factor. Now, amongst all of these companies that we work with, one key factor is improving reliability, which means, hey, it's fine that you can speed up this application, but sometimes I know the latency that I expect from an app, maybe it's a second, maybe it's a minute, depending on the type of application. But what businesses cannot tolerate is this app taking five times more time today. If it's going to finish in a minute, tell me it'll finish in a minute and make sure it finishes in a minute.
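Kunal's autoscaling point, that runtime does not shrink linearly with added hardware, can be made concrete with an Amdahl-style model. This is a minimal sketch under an assumed serial fraction; it is not Unravel's resource model.

```python
# Minimal sketch: pick the smallest container count that meets the SLA,
# instead of throwing hardware at the job. Serial fraction is an assumption.
def predicted_runtime_s(base_runtime_s, containers, serial_fraction=0.05):
    """Amdahl-style estimate: only the parallel part of the job speeds up."""
    parallel = (1 - serial_fraction) / containers
    return base_runtime_s * (serial_fraction + parallel)

def containers_needed(base_runtime_s, sla_s, max_containers=500):
    """Smallest container count whose predicted runtime meets the SLA."""
    for n in range(1, max_containers + 1):
        if predicted_runtime_s(base_runtime_s, n) <= sla_s:
            return n
    return None  # SLA unreachable by scaling alone; fix the app instead

# A job that takes 3600s on one container:
print(containers_needed(3600, 300))  # -> 29, not 500
print(containers_needed(3600, 150))  # -> None: scaling alone cannot meet it
```

With a 5% serial fraction, a one-hour job meets a five-minute SLA with 29 containers rather than 500, and no container count at all reaches a 150-second SLA; at that point the application itself, not the cluster, has to change. That trade-off is exactly the reliability-versus-cost question the next exchange turns on.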
And this is a big use case for all of the big data vendors, because a lot of the customers are moving from Teradata, or from Vertica, or from other relational databases, on to Hortonworks or Cloudera or Amazon EMR. Why? Because it's one tenth the cost of running these workloads. But all the customers get frustrated and say, "I don't mind paying 10x more money, because over there it used to work. Over here, there are just so many complications, and I don't have reliability with these applications." So that's a big, big factor of, you know, how we actually help these customers get value out of the Unravel product. >> Okay, so, um... A question I'm, sort of... why aren't there so many other Unravels? >> Kunal: Yeah. (Kunal laughs) >> From what I understood from past conversations, >> Kunal: Yeah. >> you can only really build the models that are at the heart of your capabilities based on tons and tons of telemetry >> Kunal: Yeah. >> that cloud providers or, sort of, internet-scale service providers have accumulated, because they all have sort of a well-known set of configurations and a well-known kind of topology. In other words, there aren't a million degrees of freedom on any particular side: you have a well-scoped problem, and you have tons of data. So it's easier to build the models. So who, who else could do this? >> Yeah, so the difference between Unravel and other monitoring products is that Unravel is not a monitoring product. It's an intelligent performance management suite. What that means is we don't just give you graphs and metrics and say, "Here is all the raw information, you go figure it out." Instead, we have to take it a step further, where we are actually giving people answers. In order to develop something like that, you need full-stack information; that's number one. Meaning information from applications all the way down to infrastructure and everything in between. Why? Because problems can lie anywhere. And if you don't have that full-stack info, you're blindsiding yourself, or limiting the scope of the problems that you can actually search for. Secondly, like you were rightly pointing out, how do I create answers from all this raw data? So you have to think like an expert in big data would think, which is: if there is a problem, what are the kinds of checks, balances, places that that person would look into, and how would that person establish that this is indeed the root cause of the problem today? And then, how would that person actually resolve this particular problem? So, we have a big team of scientists, researchers. In fact, my co-founder is a professor of computer science at Duke University who has been researching database optimization techniques for the last decade. We have about 80-plus publications in this area, Starfish being one of them. We have a bunch of other publications which talk about how you automate problem discovery, root cause analysis, as well as resolution, to get the best performance out of these different databases. And you're right, a lot of work has gone on the research side, but a lot of work has also gone into understanding the needs of the customers. So we worked with some of the biggest companies out there, which have some of the biggest big data clusters, to learn from them: what are some everyday, ongoing management challenges that you face? And then taking those problems to our datasets and figuring out, how can we automate problem discovery? How can we proactively spot a lot of these errors?
I joke around and I tell people that we're big data for big data, right? All these companies that we serve are gathering all of this data, and they're trying to find patterns, and they're trying to find, you know, some sort of insight in their data. Our data is system-generated data, performance data, application data, and we're doing the exact same thing, which is figuring out inefficiencies, problems, the cause and effect of things, to be able to solve them in a more intelligent way. >> Well, Kunal, thank you so much for stopping by theCUBE >> Kunal: Of course. >> and sharing how Unravel Data is helping to unravel the complexities of big data. (Kunal laughs) >> Thank you so much. Really appreciate it. >> Now you're a Cube alumni. (Kunal laughs) >> Absolutely. Thanks so much for having me. >> Kunal, thanks. >> Yeah, and we want to thank you for watching theCUBE. I'm Lisa Martin with George Gilbert. We are live at our own event, BigData SV, in downtown San Jose, California. Stick around. George and I will be right back with our next guest. (quiet crowd noise) (techno music)
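Kunal's "big data for big data" line, mining system-generated performance data for patterns, comes down to baselining. Below is a minimal sketch, with assumed thresholds and field names, of proactively flagging a run that drifts from its own history before users notice.

```python
# Minimal sketch: flag a job run whose duration drifts from its baseline.
from statistics import median

def flag_drift(history_s, latest_s, tolerance=3.0):
    """Flag a run whose duration deviates from the robust baseline.

    history_s: past run durations in seconds; latest_s: the newest run.
    Uses median absolute deviation so one bad run doesn't skew the baseline.
    """
    base = median(history_s)
    mad = median(abs(x - base) for x in history_s) or 1.0
    score = abs(latest_s - base) / mad
    return score > tolerance, score

# A job that reliably finishes in about a minute, until it doesn't:
history = [58, 61, 60, 59, 62, 60, 61]
flagged, score = flag_drift(history, 300)  # five times the usual runtime
print(flagged, round(score, 1))            # True, far past tolerance
```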

Published Date : Mar 8 2018

