Day Two Wrap Up | PentahoWorld 2017

>> Narrator: Live from Orlando, Florida it's theCUBE covering PentahoWorld 2017. Brought to you by Hitachi Vantara. >> Welcome back to sunny Orlando everybody. This is theCUBE, the leader in live tech coverage, and this is our second day covering PentahoWorld 2017. theCUBE was here in 2015 when Pentaho had just been recently acquired by Hitachi. We then, let's see, around September timeframe we saw Hitachi rebrand, Hitachi Data Systems rebrand as Hitachi Vantara, bringing together three components of its business, the Hitachi Data Systems business, the Hitachi Insights business, and of course, the Pentaho Analytics platform. We heard yesterday from Brian Householder, the president and COO of Hitachi Vantara, what the strategy was. I thought he was a very crisp, clear presenter. The strategy made a lot of sense, it resonated. Obviously a lot of execution to be done. And then subsequently at the last two days we've heard largely from Pentaho practitioners who are applying this end to end analytics platform to really transform their businesses, to really become data driven supporting those digital transformations. So pretty positive story overall. A lot of work to be done. We got to see how this whole edge to outcome plays out. Sounds good. There's got to be some execution there. We got to see the ecosystem grow for sure. These guys got a great story. This conference should explode. >> It's really a validation for Pentaho. They've been on the market for more than a decade now as the spearhead for the open source analytics revolution in business analytics, and in predictive modeling, and in data integration, all of it open source. And they've come very far and they're really a blue chip solution program. I think this show has been a great validation of Pentaho's portfolio presence in the market. Now Hitachi Vantara has a gem of a core asset. 
Clearly, the storage market, the data center converged infrastructure, the core Hitachi Data Systems product lines, are starting to experience the low growth that such a mature space experiences. And clearly they're placing a strong bet on Hitachi Vantara that the IoT, that the edge analytics market, will just boom wide open. Hitachi Insight Group, which was only created last year by their corporate parent, was chartered to explore opportunities in IoT. They've got the Lumada platform. They had Hitachi Next, their conference last month, focused on IoT. I think that's really the capstone, the Lumada portfolio, in this overall story. Now, I think what we're hearing this week is that great, they've got the components, the building blocks, of potential growth, but I don't think they're going to be able to achieve takeoff growth until such time as they, Hitachi Vantara, have a stronger, more credible reach out to the developer community, specifically the developers who are building the AI and machine learning for deployment to the edge. That will require them to have credibility in that space. Clearly it's going to have to be the new set of frameworks, such as TensorFlow, and MXNet, and Theano, and so forth. They're going to need some sort of a modeling framework or abstraction that sits on top of the Pentaho platform or really across all of their offerings, including Lumada, and enables a developer, the mainstream application developer, to use code, whether it be Python or R or Java, whatever, to build the deep learning and AI models at the highest level of abstraction, the business level of abstraction, then to automatically compile those models, which are computational graphs, down to formats that are optimized and efficient to run on devices of all sorts, chip sets of all sorts, that are increasingly resource constrained. They're not there yet. I'm not hearing that overall developer story at this show. 
I think they've got a lot of smart people, including Brian, pushing them in that direction. Hopefully next year's PentahoWorld or however they may rebrand this show, I think they'll probably have more of that put together, but we'll keep on waiting to see. >> And that's something that I pushed on a little bit this week. In particular, that requires a whole new go to market where the starting point is developers and then you're nurturing those developers. And certainly Pentaho has experience with community editions, but that was more to get enterprise buyers to kind of try before they buy. As you know well, Jim, the developer community is, they're very fickle, they're persnickety, they're demanding, and they're super smart, and they can be your best advocates or they'll just ignore you. That's just kind of the way it is with developers. And if you can appeal to them you can get a foothold in markets. We've seen it. Look at what Microsoft has done, look at what Amazon has done, certainly Docker, you know, on and on and on. >> Community marketing that's full bore (mumbles) user groups, developer days, hackathons, the whole nine yards, I'm not seeing a huge emphasis on community marketing in that really evangelistic sense. They need to go there seriously. They need to win the hearts and minds of the next generation developer, the next generation developer who actually won't care about whether it's TensorFlow backends or the other ones. What they will care is the high level framework, and really a collaborative framework, that's a solution provider gives them for their teams to collaborate on building and training and deploying all this stuff. I'm not hearing from this solution provider, devops really, here this year. Hopefully in the coming years there will be. Other vendors are a bit further along than they are. We see a bit further along IBM is. 
We see a bit further along like Cloudera and others are in putting together really a developer friendly ecosystem of components within a broader data lake framework. >> Yeah, and that's not been the historical Pentaho DNA. However, as you know, to reach out, have a community effort to reach out to developers requires resources and commitment, and it's not a one shot deal. But, it also requires a platform, and what we're seeing today is the formation of that. The reformation of Hitachi into Hitachi Vantara with a lot of resources that has a vision of a platform, of which Pentaho is a critical component, but it's going to take a lot of effort, a lot of cultivating. I presume they're having those conversations internally. They're not ready to have them externally, which is I presume why they're not having them. But that's something that we're going to certainly watch for in the coming years. What else? You gave a talk this afternoon. >> Yeah, AI is Eating the Edge, and it was well received. In fact, when I prepared my thoughts and my research about a month ago for this event I was thinking, "Am I way too far ahead?" This is Pentaho. I've been of course familiar with them since their inception. I thought, "Are there other users? "Are there developers? "Is their community going deep into AI "and all the IoT stuff?" And the last day or so here at this event it's like, "Whoa, everybody here is into that. "They know this stuff." So, not only was I relieved that I wouldn't have to explain the ABCs of all that, they were ahead of me in terms of the questions I got. The questions are, once again, what framework should we adopt for AI, the whole TensorFlow, all those framework wars, which I think are sort of overblown and they will be fairly soon, it'll be irrelevant, but those kinds of questions. Those are actually developer level questions that people are just here and they're coming to me with. 
>> Well, you know, I tell you, I'm no expert in frameworks, but my advice would be whatever framework you adopt you're probably not going to be using that same framework down the road. So you have to be flexible as an organization. What a lot of technical leaders tell me is, look, technology is going to come and it's going to go. We got to have great people. We've got to be able to respond to the market requirements. We have to have processes that allow us to be proactive and responsive, and your choice of framework should ensure that it doesn't constrict you in those areas. >> And you know, the framework that actually appeals to this crowd, including the people in my room, it's a Wikibon framework, it's also what Brian Hopkins of Forrester presented, the three tier architecture. There's the edge devices. There are the gateways or hubs. There's the cloud. We call them primary, secondary, tertiaries. Whatever you call them, you put different data, you put different analytics on each of those tiers. And then really in many ways in a modular fashion then you begin to orchestrate with Kubernetes and so forth these AI infused apps and these distributed architectures, like self driving vehicles or whatever. And the buzz I've been getting here, including in my session, everybody is saying, "Yeah, that's exactly the way to go." In other words, thinking in those terms prevents you as a developer from thinking that AI has to be some monolithic frigging stack on one single node. No, it actually has to be massively parallel and distributed, because these are potentially very compute intensive applications. I think there's a growing realization in the developer community that when you're talking about developing AI you're really talking about developing two core workloads. 
There's the inferencing, which is where the magic happens in terms of predictions and classifications, but even more resource consumptive is the training that has to happen in the cloud, and that's data, that's exabytes, petabytes intensive potentially. That's compute intensive. Very different workload. That definitely needs to happen in the cloud primarily. There's a little bit of federated training that goes out to the edge, but that's really the exception right now. So there's a growing realization in the developer community that boy, we better get a really good platform for training. And actually they could leverage, we've seen it in our research at Wikibon, that many AI developers, many deep learning developers, actually leverage their Spark clusters for training of TensorFlow and so forth, because of in memory massive parallelism, so forth and so on. I think there will be a growing realization in the developer community that the investments they've been making in Hadoop and Spark will just be leveraged for this growing stack, for training if nothing else. >> Well, in 8.0 that was sort of the big buzz here. And you and I talked at the open with Rebecca, our other co-host, about 8.0. A lot of incremental improvements. But you know what, in talking to customers that's kind of what they want. They want Pentaho to do a good job of incorporating, curating, open source content, open source platforms and products, bringing them into their system, and making sure that their customers can take advantage of them. That's what they consistently kept asking for. They weren't freaked out about lack of AI and lack of deep learning and ML and Weka is fine. Now maybe it's a blind spot, I don't know. >> No, no, actually I've had 24 hours since they announced to chew on it. In fact, I have a SiliconANGLE article going up fairly soon with essentially my trip report and my basic takeaway. 
And actually what I like about 8.0 is that it focuses on streaming, bringing open source analytic streaming more completely into the Pentaho data integration platform, in other words, stronger interoperability with Spark streaming, with Kafka, and so forth, but also they have the ability within 8.0 to better match realtime streaming workloads to execution engines in a distributed fabric. In other words, what I think that represents not only in terms of Hitachi Vantara's portfolio, but in terms of where the industry is going with all things to do with big data applications, whether or not they involve AI, is that streaming is coming into the mainstream, pun intended, and data at rest platforms are starting to become marginalized in a lot of applications. In other words, Hadoop is data at rest par excellence, so are a fair number of other NoSQL platforms. Those are not going away. Those are the core of your data lakes. But most development is being done now, most AI and machine learning is being developed, for streaming environments that increasingly are edge oriented. So Pentaho, Hitachi Vantara, for 8.0 have put in the right incremental features for the market that lies ahead. So in many ways I think that was actually a well thought out release for this particular event. >> Great. Okay, some of the highlights here. We had a lot of different industries, gaming, we had experts on autonomous vehicles, we had the NASDAQ guys on, that was a very interesting segment, the German police interview you did, the chief data officer of community colleges in Indiana. So, a lot of diversity, which underscores the platformness of Pentaho. It's not some industry specific system. It is a horizontal capabilities platform. Final thoughts on the show, some interesting things that you saw, things you learned? >> Yeah, on the show itself, they did a really good job. 
Hitachi Vantara, of course it's a new brand, but it's an old company, and it's even an old established set of product teams that have come together in a hurry essentially, though it's really been two years since the acquisition. They did a really good job of presenting a unified go to market message. That's a good start. They've done a good job of the fact that they had these two shows in a rapid sequence, Hitachi Next, which was IoT and Lumada, but it was Hitachi Vantara, and now this one where it's all data analytics. The fact that here in the peak of fall event season they had these two shows really highlighting their innovations and their roadmaps for those two cores of their portfolio, and have done a good job of positioning themselves in each case, that shows that the teams are orchestrating well in terms of at least go to market presenting their value prop. I think in terms of the actual, we've had a lot of great customer and partner interviews on this show. And I think, you mentioned gaming first, I wasn't actually on the gaming related CUBE interview, but gaming is a hot, of course it's a hot, hot market for AI increasingly. A lot of AI that gets developed now for lots of applications involves simulations of whatever scenario you're building, including like autonomous vehicles. So gaming is in many ways a set of practices that are well established and mature that are becoming fundamental to development of all AI, because you're developing synthetic data based on simulation environments. The fact that Hitachi Vantara has strong presence as a data provider in the gaming market I think in many ways indicates that they've got ... It's a crowded marketplace. They have much larger competitors and deeper pocketed, but I think the fact is they've got all the piece parts needed to be a roaring success in this new era, and they've got strong and very loyal customers I'm discovering, not discovering, I've known this all along. 
But, since I've rejoined the analyst space it's been revalidated how strong and blue chip Pentaho is. Now that they're a new brand in a new era, they're turning themselves around fairly well. I don't think that they'll be isolated by ... Clearly, I mean, with AI ... AI right now belongs to AWS and Microsoft and Google and IBM to some degree. We have to recognize that the Hitachi Vantaras of the world right now are still a second tier in that arena. They probably have to hitch their wagon to at least one of those core cloud providers as a core partner going forward to really prevail. >> Dave: Which they can do. >> Yeah, they can do. >> Alright. Jim, thanks very much for closing with me. Thanks to you all for watching. theCUBE puts out a lot of content. You can go to SiliconANGLE.com to see all the news. theCUBE.net is where we host all these videos. Wikibon.com is our research site, so check that out, as well. We've got CrowdChats going on, CrowdChat.net. It's just unbelievable. >> Unbelievable. >> Rush of content. We're all about the data, we're all about sharing, so check those sites out. Thanks very much to the crew here. Great job. And next week a lot going on. We're in New York City. We've got some stuff going on there. Want to thank our sponsor, without whom this show, this CUBE show, would not be possible, Hitachi Vantara slash Pentaho. >> Thank you to sunny Orlando. It's great and wonderful. >> This has been theCUBE at PentahoWorld 2017. We'll see you next time. Thanks for watching. (techno music)

Published Date : Oct 27 2017


ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Brian	PERSON	0.99+
Brian Hopkins	PERSON	0.99+
Hitachi	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Jim	PERSON	0.99+
Brian Householder	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Indiana	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Rebecca	PERSON	0.99+
2015	DATE	0.99+
last year	DATE	0.99+
New York City	LOCATION	0.99+
Hitachi Vantara	ORGANIZATION	0.99+
Pentaho	ORGANIZATION	0.99+
24 hours	QUANTITY	0.99+
Hitachi Data Systems	ORGANIZATION	0.99+
two shows	QUANTITY	0.99+
last month	DATE	0.99+
two years	QUANTITY	0.99+
yesterday	DATE	0.99+
Python	TITLE	0.99+
Java	TITLE	0.99+
Hitachi Insight Group	ORGANIZATION	0.99+
each case	QUANTITY	0.99+
Lumada	ORGANIZATION	0.99+
Orlando, Florida	LOCATION	0.99+
Forrester	ORGANIZATION	0.99+
next year	DATE	0.99+
second day	QUANTITY	0.99+
next week	DATE	0.99+
NASDAQ	ORGANIZATION	0.98+
two core	QUANTITY	0.98+
Spark	TITLE	0.98+
this week	DATE	0.98+
theCUBE	ORGANIZATION	0.97+
R	TITLE	0.97+
three tier	QUANTITY	0.97+
this year	DATE	0.97+
second tier	QUANTITY	0.97+
Orlando	LOCATION	0.96+
Hitachi Insights	ORGANIZATION	0.96+
8.0	QUANTITY	0.96+
more than a decade	QUANTITY	0.94+
theCUBE.net	OTHER	0.94+
first	QUANTITY	0.94+
PentahoWorld 2017	EVENT	0.94+
one shot	QUANTITY	0.93+
each	QUANTITY	0.92+
this afternoon	DATE	0.92+
Kafka	TITLE	0.91+
today	DATE	0.91+
Cloudera	ORGANIZATION	0.9+
three components	QUANTITY	0.89+
Hadoop	TITLE	0.89+
a month ago	DATE	0.89+
TensorFlow	TITLE	0.87+
wiki	TITLE	0.86+
German	OTHER	0.85+
MXNet	TITLE	0.85+

Arik Pelkey, Pentaho - BigData SV 2017 - #BigDataSV - #theCUBE

>> Announcer: Live from San Jose, California, it's theCUBE covering Big Data Silicon Valley 2017. >> Welcome back, everyone. We're here live in Silicon Valley in San Jose for Big Data SV in conjunction with Strata + Hadoop, part two. Three days of coverage here in Silicon Valley and Big Data. It's our eighth year covering Hadoop and the Hadoop ecosystem. Now expanding beyond just Hadoop into AI, machine learning, IoT, cloud computing with all this compute is really making it happen. I'm John Furrier with my co-host George Gilbert. Our next guest is Arik Pelkey who is the senior director of product marketing at Pentaho that we've covered many times and covered their event at PentahoWorld. Thanks for joining us. >> Thank you for having me. >> So, in following you guys, obviously Pentaho was once an independent company bought by Hitachi, but still an independent group within Hitachi. >> That's right, very much so. >> Okay so you guys have some news. Let's just jump into the news. You guys announced some of the machine learning. >> Exactly, yeah. So, Arik Pelkey, Pentaho. We are a data integration and analytics software company. You mentioned you've been doing this for eight years. We have been at Big Data for the past eight years as well. In fact, we're one of the first vendors to support Hadoop back in the day, so we've been along for the journey ever since then. What we're announcing today is really exciting. It's a set of machine learning orchestration capabilities, which allows data scientists, data engineers, and data analysts to really streamline their data science processes. Everything from ingesting new data sources through data preparation, feature engineering which is where a lot of data scientists spend their time, through tuning their models which can still be programmed in R, in Weka, in Python, and any other kind of data science tool of choice. 
What we do is we help them deploy those models inside of Pentaho as a step inside of Pentaho, and then we help them update those models as time goes on. So, really what this is doing is it's streamlining. It's making them more productive so that they can focus their time on things like model building rather than data preparation and feature engineering. >> You know, it's interesting. The market is really active right now around machine learning and even just last week at Google Next, which is their cloud event, they had made the acquisition of Kaggle, which is kind of an open data science. You mentioned the three categories: data engineer, data science, data analyst. Almost on a progression, super geek to business facing, and there's different approaches. One of the comments from the CEO of Kaggle on the acquisition when we wrote up at SiliconANGLE was, and I found this fascinating, I want to get your commentary and reaction to is, he says the data science tools are as early as generations ago, meaning that all the advances and open source and tooling and software development is far along, but now data science is still at that early stage and is going to get better. So, what's your reaction to that, because this is really the demand we're seeing is a lot of heavy lifting going on in the data science world, yet there's a lot of runway of more stuff to do. What is that more stuff? >> Right. Yeah, we're seeing the same thing. Last week I was at the Gartner Data and Analytics conference, and that was kind of the take there from one of their lead machine learning analysts was this is still really early days for data science software. So, there's a lot of Apache projects out there. There's a lot of other open source activity going on, but there are very few vendors that bring to the table an integrated kind of full platform approach to the data science workflow, and that's what we're bringing to market today. 
Let me be clear, we're not trying to replace R, or Python, or MLlib, because those are the tools of the data scientists. They're not going anywhere. They spent eight years in their PhD program working with these tools. We're not trying to change that. >> They're fluent with those tools. >> Very much so. They're also spending a lot of time doing feature engineering. Some research reports say between 70 and 80% of their time. What we bring to the table is a visual drag and drop environment to do feature engineering in a much faster, more efficient way than before. So, there's a lot of different kind of disparate siloed applications out there that all do interesting things on their own, but what we're doing is we're trying to bring all of those together. >> And the trends are reduce the time it takes to do stuff and take away some of those tasks that you can use machine learning for. What unique capabilities do you guys have? Talk about that for a minute, just what Pentaho is doing that's unique and added value to those guys. >> So, the big thing is I keep going back to the data preparation part. I mean, that's 80% of time that's still a really big challenge. There's other vendors out there that focus on just the data science kind of workflow, but where we're really unique is around being able to accommodate very complex data environments, and being able to onboard data. >> Give me an example of those environments. >> Geospatial data combined with data from your ERP or your CRM system and all kinds of different formats. So, there might be 15 different data formats that need to be blended together and standardized before any of that can really happen. That's the complexity in the data. So, Pentaho, very consistent with everything else that we do outside of machine learning, is all about helping our customers solve those very complex data challenges before doing any kind of machine learning. One example is one customer is called Caterpillar Marine Asset Intelligence. 
So, they're doing predictive maintenance onboard container ships and on ferries. So, they're taking data from hundreds and hundreds of sensors onboard these ships, combining that kind of operational sensor data together with geospatial data and then they're serving up predictive maintenance alerts if you will, or giving signals when it's time to replace an engine or replace a compressor or something like that. >> Versus waiting for it to break. >> Versus waiting for it to break, exactly. That's one of the real differentiators is that very complex data environment, and then I was starting to move toward the other differentiator which is our end to end platform which allows customers to deliver these analytics in an embedded fashion. So, kind of full circle, being able to send that signal, but not to an operational system which is sometimes a challenge because you might have to rewrite the code. Deploying models is a really big challenge within Pentaho because it is this fully integrated application. You can deploy the models within Pentaho and not have to jump out into a mainframe environment or something like that. So, I'd say differentiators are very complex data environments, and then this end to end approach where deploying models is much easier than ever before. >> Perhaps, let's talk about alternatives that customers might see. You have a tool suite, and others might have to put together a suite of tools. Maybe tell us some of the geeky version would be the impedance mismatch. You know, like the chasms you'd find between each tool where you have to glue them together, so what are some of those pitfalls? >> One of the challenges is, you have these data scientists working in silos often times. You have data analysts working in silos, you might have data engineers working in silos. One of the big pitfalls is not really collaborating enough to the point where they can do all of this together. So, that's a really big area that we see pitfalls. 
>> Is it binary not collaborating, or is it that the round trip takes so long that the quality or number of collaborations is so drastically reduced that the output is of lower quality? >> I think it's probably a little bit of both. I think they want to collaborate but one person might sit in Dearborn, Michigan and the other person might sit in Silicon Valley, so there's just a location challenge as well. The other challenge is, some of the data analysts might sit in IT and some of the data scientists might sit in an analytics department somewhere, so it kind of cuts across both location and functional area too. >> So let me ask from the point of view of, you know we've been doing these shows for a number of years and most people have their first data lakes up and running and their first maybe one or two use cases in production, very sophisticated customers have done more, but what seems to be clear is the highest value coming from those projects isn't to put a BI tool in front of them so much as to do advanced analytics on that data, apply those analytics to inform a decision, whether a person or a machine. >> That's exactly right. >> So, how do you help customers over that hump and what are some other examples that you can share? >> Yeah, so speaking of transformative. I mean, that's what machine learning is all about. It helps companies transform their businesses. We like to talk about that at Pentaho. One customer kind of industry example that I'll share is a company called IMS. IMS is in the business of providing data and analytics to insurance companies so that the insurance companies can price insurance policies based on usage. So, it's a usage model. So, IMS has a technology platform where they put sensors in a car, and then using your mobile phone, can track your driving behavior. Then, your insurance premium that month reflects the driving behavior that you had during that month. 
In terms of transformative, this is completely upending the insurance industry which has always had a very fixed approach to pricing risk. Now, they understand everything about your behavior. You know, are you turning too fast? Are you braking too fast, and they're taking it further than that too. They're able to now do kind of a retroactive look at an accident. So, after an accident, they can go back and kind of decompose what happened in the accident and determine whether or not it was your fault or was in fact the ice on the street. So, transformative? I mean, this is just changing things in a really big way. >> I want to get your thoughts on this. I'm just looking at some of the research. You know, we always have the good data but there's also other data out there. In your news, 92% of organizations plan to deploy more predictive analytics, however 50% of organizations have difficulty integrating predictive analytics into their information architecture, which is what the research has shown. So my question to you is, there's a huge gap between the technology landscapes of front end BI tools and then complex data integration tools. That seems to be the sweet spot where the value's created. So, you have the demand and then front end BI's kind of sexy and cool. Wow, I could power my business, but the complexity is really hard in the backend. Who's accessing it? What's the data sources? What's the governance? All these things are complicated, so how do you guys reconcile the front end BI tools and the backend complexity integrations? >> Our story from the beginning has always been this one integrated platform, both for complex data integration challenges together with visualizations, and that's very similar to what this announcement is all about for the data science market. We're very much in line with that. >> So, it's the cart before the horse? Is it like the BI tools are really driven by the data? I mean, it makes sense that the data has to be key. 
Front end BI could be easy if you have one data set. >> It's funny you say that. I presented at the Gartner conference last week and my topic was, this just in: it's not about analytics. Kind of in jest, but it drove a really big crowd. So, it's about the data right? It's about solving the data problem before you solve the analytics problem whether it's a simple visualization or it's a complex fraud machine learning problem. It's about solving the data problem first. To that quote, I think one of the things that they were referencing was the challenging information architectures into which companies are trying to deploy models and so part of that is when you build a machine learning model, you use R and Python and all these other ones we're familiar with. In order to deploy that into a mainframe environment, someone has to then recode it in C++ or COBOL or something else. That can take a really long time. With our integrated approach, once you've done the feature engineering and the data preparation using our drag and drop environment, what's really interesting is that you're like 90% of the way there in terms of making that model production ready. So, you don't have to go back and change all that code, it's already there because you used it in Pentaho. >> So obviously for those two technology groups I just mentioned, I think you had a good story there, but it creates problems. You've got product gaps, you've got organizational gaps, you have process gaps between the two. Are you guys going to solve that, or are you currently solving that today? There's a lot of little questions in there, but that seems to be the disconnect. You know, I can do this, I can do that, do I do them together? >> I mean, sticking to my story of one integrated approach to being able to do the entire data science workflow, from beginning to end and that's where we've really excelled. 
To the extent that more and more data engineers and data analysts and data scientists can get on this one platform, even if they're using R and Weka and Python. >> You guys want to close those gaps down, that's what you guys are doing, right? >> We want to make the process more collaborative and more efficient. >> So Dave Vellante has a question on CrowdChat for you. Dave Vellante was in the snowstorm in Boston. Dave, good to see you, hope you're doing well shoveling out the driveway. Thanks for coming in digitally. His question is, HDS has been known for mainframes and storage, but Hitachi is an industrial giant. How is Pentaho leveraging Hitachi's IoT chops? >> Great question, thanks for asking. Hitachi acquired Pentaho about two years ago, this is before my time. I've been with Pentaho about ten months. One of the reasons that they acquired Pentaho is because of a platform that they've announced which is called Lumada, which is their IoT platform, so what Pentaho is, is the analytics engine that drives that IoT platform Lumada. So, Lumada is about solving more of the hardware and sensor side, bringing data in from the edge to be able to do the analytics. So, it's an incredibly great partnership between Lumada and Pentaho. >> Makes an internal customer too. >> It's a 90 billion dollar conglomerate so yeah, the acquisition's been great and we're still very much an independent company going to market on our own, but we now have a much larger channel through Hitachi's reps around the world. >> You've got an IoT use case right there in front of you. >> Exactly. >> But you are leveraging it big time, that's what you're saying? >> Oh yeah, absolutely. We're a very big part of their IoT strategy. It's the analytics. Both of the examples that I shared with you are in fact IoT, not by design but it's because there's a lot of demand. >> You guys seeing a lot of IoT right now? >> Oh yeah.
We're seeing a lot of companies coming to us who have just hired a director or vice president of IoT to go out and figure out the IoT strategy. A lot of these are manufacturing companies or coming from industries that are inefficient. >> Digitizing the business model. >> So the other point about Hitachi that I'll make is that, as it relates to data science, as a 90 billion dollar manufacturing and otherwise giant, we have a very deep bench of PhD data scientists that we can go to when there are very complex data science problems to solve at a customer site. So, if a customer's struggling with some of the basic how do I get up and running doing machine learning, we can bring our bench of data scientists at Hitachi to bear in those engagements, and that's a really big differentiator for us. >> Just to be clear and one last point, you've talked about how you handle the entire life cycle of modeling, from acquiring the data and prepping it all the way through to building a model, deploying it, and updating it, which is a continuous process. I think as we've talked about before, data scientists or just the DevOps community has had trouble operationalizing the end of the model life cycle where you deploy it and update it. Tell us how Pentaho helps with that. >> Yeah, it's a really big problem and it's a very simple solution inside of Pentaho. It's basically a step inside of Pentaho. So, in the case of fraud let's say for example, a prediction might say fraud, not fraud, fraud, not fraud, whatever it is. We can then bring that kind of full lifecycle back into the data workflow at the beginning. It's a simple drag and drop step inside of Pentaho to say which were right and which were wrong and feed that back into the next prediction. We could also take it one step further where there has to be a manual part of this too, where it goes to the customer service center, they investigate and they say yes fraud, no fraud, and that then gets funneled back into the next prediction.
So yeah, it's a big challenge and it's something that's relatively easy for us to do just as part of the data science workflow inside of Pentaho. >> Well Arik, thanks for coming on The Cube. We really appreciate it, good luck with the rest of the week here. >> Yeah, very exciting. Thank you for having me. >> You're watching The Cube here live in Silicon Valley covering Strata Hadoop, and of course our Big Data SV event, we also have a companion event called Big Data NYC. We program with O'Reilly Strata Hadoop, and of course have been covering Hadoop really since it was founded. This is The Cube, I'm John Furrier. George Gilbert. We'll be back with more live coverage today for the next three days here inside The Cube after this short break.
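
The feedback loop described in the interview (score transactions, have an investigator confirm fraud or not fraud, then feed those outcomes into the next round of predictions) can be sketched in a few lines. This is only an illustrative sketch in plain Python, not Pentaho's API or its drag and drop step. The names `Prediction`, `fit_threshold`, `score`, and `feed_back` are hypothetical, and the single amount cutoff is an assumed stand-in for whatever model a real workflow would train.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical record for one scored transaction (not a Pentaho type).
@dataclass
class Prediction:
    transaction_id: str
    amount: float
    predicted_fraud: bool
    confirmed_fraud: Optional[bool] = None  # filled in later by an investigator


def fit_threshold(labeled: List[Tuple[float, bool]]) -> float:
    """Toy 'model': pick the amount cutoff with the fewest errors on labeled data."""
    best_cutoff, best_errors = float("inf"), None
    for cutoff in sorted({amount for amount, _ in labeled}):
        errors = sum((amount >= cutoff) != is_fraud for amount, is_fraud in labeled)
        if best_errors is None or errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


def score(cutoff: float, batch: List[Tuple[str, float]]) -> List[Prediction]:
    """Flag every transaction at or above the cutoff as fraud."""
    return [Prediction(tid, amount, amount >= cutoff) for tid, amount in batch]


def feed_back(training_set: List[Tuple[float, bool]],
              predictions: List[Prediction]) -> int:
    """Append confirmed outcomes to the training data; return how many predictions were right."""
    correct = 0
    for p in predictions:
        if p.confirmed_fraud is None:
            continue  # not investigated yet, so there is nothing to learn from
        training_set.append((p.amount, p.confirmed_fraud))
        correct += p.predicted_fraud == p.confirmed_fraud
    return correct


if __name__ == "__main__":
    training = [(20.0, False), (35.0, False), (900.0, True)]
    cutoff = fit_threshold(training)
    preds = score(cutoff, [("t1", 400.0), ("t2", 50.0)])
    # Manual step: the service center investigates and records the real outcome.
    preds[0].confirmed_fraud = True
    preds[1].confirmed_fraud = False
    right = feed_back(training, preds)
    print("correct:", right, "new cutoff:", fit_threshold(training))
```

The point of the sketch is the shape of the loop rather than the model: confirmed outcomes become new training rows, so the next fit reflects what the investigators found.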

Published Date : Mar 14 2017

ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
Hitachi	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Pentaho	ORGANIZATION	0.99+
Dave	PERSON	0.99+
90%	QUANTITY	0.99+
Arik Pelkey	PERSON	0.99+
Boston	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
Hitatchi	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
one	QUANTITY	0.99+
50%	QUANTITY	0.99+
eight years	QUANTITY	0.99+
Arick	PERSON	0.99+
One	QUANTITY	0.99+
Lumada	ORGANIZATION	0.99+
Last week	DATE	0.99+
two technologies	QUANTITY	0.99+
15 different data formats	QUANTITY	0.99+
first	QUANTITY	0.99+
92%	QUANTITY	0.99+
One example	QUANTITY	0.99+
Both	QUANTITY	0.99+
Three days	QUANTITY	0.99+
Python	TITLE	0.99+
Kaggle	ORGANIZATION	0.99+
one customer	QUANTITY	0.99+
today	DATE	0.99+
eighth year	QUANTITY	0.99+
last week	DATE	0.99+
Santa Fe, California	LOCATION	0.99+
two	QUANTITY	0.99+
each tool	QUANTITY	0.99+
90 billion dollar	QUANTITY	0.99+
80%	QUANTITY	0.99+
Caterpillar	ORGANIZATION	0.98+
both	QUANTITY	0.98+
NYC	LOCATION	0.98+
first data	QUANTITY	0.98+
Pentaho	LOCATION	0.98+
San Jose	LOCATION	0.98+
The Cube	TITLE	0.98+
Big Data SV	EVENT	0.97+
COBOL	TITLE	0.97+
70	QUANTITY	0.97+
C++	TITLE	0.97+
IMS	TITLE	0.96+
MLlib	TITLE	0.96+
one person	QUANTITY	0.95+
R	TITLE	0.95+
Big Data	EVENT	0.95+
Gartner Data and Analytics	EVENT	0.94+
Gartner	EVENT	0.94+
Strata Hadoop	TITLE	0.93+
