Stephanie McReynolds, Alation | CUBEConversation, November 2019
>> Announcer: From our studios, in the heart of Silicon Valley, Palo Alto, California, this is a CUBE conversation. >> Hello, and welcome to theCUBE studios, in Palo Alto, California for another CUBE conversation where we go in depth with thought leaders driving innovation across the tech industry. I'm your host, Peter Burris. The whole concept of self-service analytics has been with us for decades in the tech industry. Sometimes it's been successful, most times it hasn't been. But we're making great progress and have over the last few years as the technologies mature, as the software becomes more potent, but very importantly as the users of analytics become that much more familiar with what's possible and that much more wanting of what they could be doing. But this notion of self-service analytics requires some new invention, some new innovation. What are they? How's that going to play out? Well, we're going to have a great conversation today with Stephanie McReynolds, she's Senior Vice President of Marketing at Alation. Stephanie, thanks again for being on theCUBE. >> Thanks for inviting me, it's great to be back. >> So, tell us a little, give us an update on Alation. >> So as you know, Alation was one of the first companies to bring a data catalog to the market. And that market category has now been cemented and defined depending on the industry analyst you talk to. There could be 40 or 50 vendors now who are providing data catalogs to the market. So this has become one of the hot technologies to include in a modern analytics stack. Particularly, we're seeing a lot of demand as companies move from on-premise deployments into the cloud. Not only are they thinking about how do we migrate our systems, our infrastructure into the cloud but with data cataloging more importantly, how do we migrate our users to the cloud?
How do we get self-service users to understand where to go to find data, how to understand it, how to trust it, what re-use can we do of its existing assets so we're not just exploding the amount of processing we're doing in the cloud. So that's been very exciting, it's helped us grow our business. We've now seen four straight years of triple digit revenue growth which is amazing for a high growth company like us. >> Sure. >> We also have over 150 different organizations in production with a data catalog as part of their modern analytics stack. And many of those organizations are moving into the thousands of users. So eBay was probably our first customer to move into the, you know, over a thousand weekly logins, they're now up to about 4,000 weekly logins through Alation. But now we have customers like Boeing and General Electric and Pfizer and we just closed a deal with the US Air Force. So we're starting to see all sorts of different industries and all sorts of different users from the analytics specialist in your organization, like a data scientist or a data engineer, all the way out to maybe a product manager or someone who doesn't really think of themselves as an analytics expert using Alation either directly or sometimes through one of our partnerships with folks like Tableau or Microstrategy or Power BI. >> So, if we think about this notion of self-service analytics, Stephanie, and again, Alation has been a leader in defining this overall category, we think in terms of an individual who has some need for data but, most importantly, has questions they think data can answer and now they're out looking for data. Take us through that process. They need to know where the data is, they need to know what it is, they need to know how to use it, and they need to know what to do if they make a mistake. How is that, how are the data catalogs, like Alation, serving that, and what's new?
Yeah, so as consumers, this world of data cataloging is very similar if you go back to the introduction of the internet. >> Sure. >> How did you find a webpage in the '90s? Pretty difficult, you had to know the exact URL to go to in most cases, to find a webpage. And then Yahoo was introduced, and Yahoo did a whole bunch of manual curation of those pages so that you could search for a page and find it. >> So Yahoo was like a big catalog. >> It was like a big catalog, an inventory of what was out there. So the original data catalogs, you could argue, were what we would call from a technical perspective, a metadata repository. No business user wants to use a metadata repository but it created an inventory of what are all the data assets that we have in the organizations and what's the description of those data assets. The metadata. So metadata repositories were kind of the original catalogs. The big breakthrough for data catalogs was: How do we become the Google of finding data in the organization? So rather than manually curating everything that's out there and providing an end user with an answer, how could we use machine learning and AI to look at patterns of usage, what people are clicking on in terms of data assets, surface those as data recommendations to any end user whether they're an analytics specialist or they're just a self-service analytics user. And so that has been the real breakthrough of this new category called data cataloging.
And so most folks are accessing a data catalog through a search interface or maybe they're writing a SQL query and there are SQL recommendations that are being provided by the catalog-- >> Or using a tool that utilizes SQL >> Or using a tool that utilizes SQL, and for most people, most employees in a large enterprise, when you get those thousands of users, they're using some other tool like Tableau or Microstrategy or, you know, a variety of different data visualization providers or data science tools to actually access that data. So a big part of our strategy at Alation has been, how do we surface this data recommendation engine in those third party products. And then if you think about it, once you're surfacing that information and providing some value to those end users, the next thing you want to do is make sure that they're using that data accurately. And that's a non-trivial problem to solve, because analytics and data is complicated. >> Right >> And metadata is extremely complicated-- >> And metadata is-- because often it's written in a language that's arcane and designed to be precise from a data standpoint, that's not easily consumable or easily accessible by your average human being. >> Right, so a label, for example, on a table in a database might be cust_seg_257, what does that mean? >> It means we can process it really quickly in the system. >> Yeah, but as-- >> But it's useless to a human being-- >> As a marketing manager, right?
I'm like, hey, I want to do some customer segmentation analysis and I want to find out if people who live in California might behave differently if I provide them an offer than people that live in Massachusetts, it's not intuitive to say, oh yeah, that's in customer_seg_ so what data catalogs are doing is they're thinking about that marketing manager, they're thinking about that peer business user and helping make that translation between business terminology, "Hey I want to run some customer segmentation analysis for the West" with the technical, physical model that underlies the data in that database, which is: customer_seg_257 is the table you need to access to get the answer to that question. So as organizations start to adopt more self-service analytics, it's important that we're managing not just the data itself and this translation from technical metadata to business metadata, but there's another layer that's becoming even more important as organizations embrace self-service analytics. And that's how is this data actually being processed? What is the logic that is being used to traverse different data sets that end users now have access to. So if I take gender information in one table and I have information on income on another table, and I have some private information that identifies those two customers as the same in those two tables, in some use cases I can join that data, if I'm doing marketing campaigns, I likely can join that data. >> Sure. >> If I'm running a loan approval process here in the United States, I cannot join that data. >> That's a legal limitation, that's not a technical issue-- >> That's a legal, federal, government issue. Right? And so here's where there's a discussion, among folks that are knowledgeable about data and data management, there's a discussion of how do we govern this data?
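The business-to-physical translation described above can be sketched in a few lines of Python. The glossary entries and table names below are illustrative assumptions modeled on the cust_seg_257 example, not Alation's actual data model:

```python
# A hypothetical business glossary mapping friendly terms to the
# physical table names an analyst would actually query. The entries
# are invented for illustration, following the cust_seg_257 example.
BUSINESS_GLOSSARY = {
    "customer segment": "cust_seg_257",
    "west region customers": "cust_geo_114",
}

def resolve(term: str) -> str:
    """Translate a business term into its physical table name,
    raising if the catalog has no entry for the term."""
    try:
        return BUSINESS_GLOSSARY[term.strip().lower()]
    except KeyError:
        raise LookupError(f"no catalog entry for {term!r}")

# A marketing manager asks in business language, not table names:
print(resolve("Customer Segment"))  # cust_seg_257
```

A real catalog layers search, ranking, and usage-based recommendations on top of this lookup, but the core idea of mapping business terminology to the physical model is the same.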
But I think by saying how we govern this data, we're kind of covering up what's actually going on, because you don't have to govern that data so much as you have to govern the analysis. How is this joined, how are we combining these two data sets? If I just govern the data for accuracy, I might not know the usage scenario, which is someone wants to combine these two things, which makes it illegal. Separately, it's fine, combined, it's illegal. So now we need to think about, how do we govern the analytics themselves, the logic that is being used. And that gets kind of complicated, right? For a marketing manager to understand the difference between those things on the surface doesn't really make sense. It only makes sense when the context of that government regulation is shared and explained in the course of your workflow, and dragging and dropping in a Tableau report, you might not remember that, right? >> That's right, and the derivative output that you create that other people might then be able to use because it's back in the data catalog, doesn't explicitly note, often, that this data was generated as a combination of a join that might not be in compliance with any number of different rules. >> Right, so about a year and a half ago, we introduced a new feature in our data catalog called Trust Check. >> Yeah, I really like this. This is a really interesting thing. >> And that was meant to be a way where we could alert end users to these issues: hey, you're trying to run the same analytic and that's not allowed. We're going to give you a warning, we're not going to let you run that query, we're going to stop you in your place. So that was a way in the workflow of someone while they're typing a SQL statement or while they're dragging and dropping in Tableau to surface that up. Now, some of the vendors we work with, like Tableau, have doubled down on this concept of how do they integrate with an enterprise data catalog to make this even easier.
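Governing the analysis rather than the raw data, as described here, amounts to checking a proposed combination of columns against the intended use. A rough sketch of that idea follows; the policy rule is invented for illustration (real fair-lending regulations are far more involved than one dictionary entry), and this is not Alation's actual Trust Check implementation:

```python
# Hypothetical policy table: column combinations that are fine for one
# use case (marketing) may be prohibited for another (loan approval),
# mirroring the gender + income join example above.
PROHIBITED_COMBINATIONS = {
    "loan_approval": [{"gender", "income"}],
}

def check_join(use_case, columns):
    """Return (allowed, reason). A join is flagged only when every
    column of a prohibited combination appears together."""
    requested = set(columns)
    for combo in PROHIBITED_COMBINATIONS.get(use_case, []):
        if combo <= requested:
            return False, f"{sorted(combo)} may not be combined for {use_case}"
    return True, "ok"

print(check_join("marketing", ["gender", "income"]))      # allowed
print(check_join("loan_approval", ["gender", "income"]))  # blocked
```

Returning the reason string, rather than silently refusing the query, is what turns a hard "no" into the kind of explanatory alert the conversation goes on to describe.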
So at Tableau Conference last week, they introduced a new metadata API, they introduced a Tableau catalog, and the opportunity for these types of alerts to be pushed into the Tableau catalog as well as directly into reports and worksheets and dashboards that end users are using. >> Let me make sure I got this. So it means that you can put a lot of the compliance rules inside Alation and have a metadata API so that Alation effectively is governing the utilization of data inside the Tableau catalog. >> That's right. So think about the integration with Tableau as this communication mechanism to surface up these policies that are stored centrally in your data catalog. And so this is important, this notion of a central place of reference. We used to talk about data catalogs just as a central place of reference for where all your data assets lie in the organizations, and we have some automated ways to crawl those sources and create a centralized inventory. What we've added in our new release, which is coming out here shortly, is the ability to centralize all your policies in that catalog as well as the pointers to your data in that catalog. So you have a single source of reference for how this data needs to be governed, as well as a single source of reference for how this data is used in the organization. >> So does that mean, ultimately, that someone could try to do something, Trust Check will say, no you can't, but this new capability will say, and here's why or here's what you do. >> Exactly. >> A descriptive step that says let me explain why you can't do it. >> That's right. Let me not just stop your query and tell you no, let me give you the details as to why this query isn't a good query and what you might be able to do to modify that query should you still want to run it. And so all of that context is available for any end user to be able to become more aware of what the system is doing, and why it's recommending.
And on the flip side, in the world before we had something like Trust Check, the only opportunity for an IT Team to stop those queries was just to stop them without explanation or to try to publish manuals and ask people to run tests, like the DMV, so that they memorize all those rules of governance. >> Yeah, self-service, but if there's a problem you have to call us. >> That's right. That's right. So what we're trying to do is trying to make the work of those governance teams, those IT Teams, much easier by scaling them. Because we all know the volume of data that's being created, the volume of analysis that's being created is far greater than any individual can keep up with, so we're trying to scale those precious data expert resources-- >> Digitize them-- >> Yeah, exactly. >> It's a digital transformation of how we acquire data necessary-- >> And then-- >> for data transformation. >> make it super transparent for the end user as to why they're being told yes or no so that we remove this friction that's existed between business and IT when trying to perform analytics. >> But I want to build a little bit on one of the things I thought I heard you say, and that is the idea that this new feature, this new capability will actually prescribe an alternative, logical way for you to get your information that might be in compliance. Have I got that right? >> Yeah, that's right. Because what we also have in the catalog is a workflow that allows individuals called Stewards, analytics Stewards, to be able to make recommendations and certifications. So if there's a policy that says thou shalt not use the data in this way, the Stewards can then say, but here's an alternative mechanism, here's an alternative method, and by the way, not only are we making this as a recommendation but this is certified for success. We know that our best analysts have already tried this out, or we know that this complies with government regulation.
And so this is a more active way, then, for the two parties to collaborate together in a distributed way, that's asynchronous, and so it's easy for everyone no matter what hour of the day they're working or where they're globally located. And it helps progress analytics throughout the organization. >> Oh and more importantly, it increases the likelihood that someone who is told you now have self-service capability doesn't find themselves abandoning it the first time that somebody says no, because we've seen that over and over with a lot of these query tools, right? That somebody says, oh wow, look at this new capability until the screen, you know, metaphorically, goes dark. >> Right, until it becomes too complicated-- >> That's right-- >> and then you're like, oh I guess I wasn't really trained on this. >> And then they walk away. And it doesn't get adopted. >> Right. >> And this is a way, it's a very human-centered way to bring that self-service analyst into the system and be a full participant in how you generate value out of it. >> And help them along. So you know, the ultimate goal that we have as an organization, is to help organizations, our customers, become data-literate populations. And you can only become data literate if you get comfortable working with the data and it's not a black box to you. So the more transparency that we can create through our policy center, through documenting the data for end users, and making it easier for them to access, the better.
And so, in the next version of the Alation product, not only have we implemented features for analytics Stewards to use, to certify these different assets, to log their policies, to ensure that they can document those policies fully with examples and use cases, but we're also bringing to market a professional services offering from our own team that says look, given that we've now worked with about 20% of our installed base, and observed how they roll out Stewardship initiatives and how they assign Stewards and how they manage this process, and how they manage incentives, we've done a lot of thinking about what are some of the best practices for having a strong analytics Stewardship practice if you're a self-service analytics oriented organization. And so our professional services team is now available to help organizations roll out this type of initiative, make it successful, and have that be supported with product. So the psychological incentives of how you get one of these programs really healthy is important. >> Look, you guys have always been very focused on ensuring that your customers were able to adopt the value proposition, not just buy the value proposition. >> Right. >> Stephanie McReynolds, Senior Vice President of Marketing at Alation, once again, thanks for being on theCUBE. >> Thanks for having me. >> And thank you for joining us for another CUBE conversation. I'm Peter Burris. See you next time.
Stephanie McReynolds, Alation | CUBE Conversation, December 2018
(bright classical music) >> Hi, I'm Peter Burris and welcome to another CUBE Conversation from our studios here in Palo Alto, California. We've got another great conversation today, specifically we're going to talk about some of the trends and changes in data catalogs, which are emerging as a crucial technology to advance data-driven business on a global scale. And to do that, we've got Alation here, specifically Stephanie McReynolds who's the Vice-President of Marketing at Alation. Stephanie, welcome back to theCUBE. >> Thank you, it's great to be here again. >> So Stephanie, before we get into this very important topic of the increasing, obvious connection between knowing what your data is, knowing where it is, and business outcomes in a data-driven business world, let's talk about Alation. What's the update? >> Yeah, so we just celebrated, yesterday in fact, the sixth anniversary of incorporation of the company. And upon reflecting on some of the milestones that we've seen over those six years, one of the exciting developments is we went from initially about seven production implementations a couple years after we were founded, to now over a hundred organizations that are using Alation. And in those organizations over the last couple of years, we've seen many organizations move from hundreds of users, to now thousands of users. An organization like Scout24 has 70 percent of the company as self-service analytics users and a significant portion of those users now using Alation. So we're seeing companies in Europe like Scout24 who's in Germany. Companies like Pfizer in the United States. Munich Reinsurance in the financial services industry also hit about 2000 users of Alation, and so it's exciting to look at our origins with eBay as our very first customer, who's now up to about 3000 users.
And then these more recent companies adopted Alation, all of them now getting to a point where they really have a large population that's using a data catalog to drive self-service analytics and business outcomes out of those self-service analytics. >> So a hundred first-rate brands as users, it's international expansion. Sounds like Alation's really going places. What I want to do though, is I want to talk a little bit about some of the outcomes that these companies are starting to achieve. Now we have been on the record here at SiliconANGLE, with theCUBE and Wikibon, for quite some time, trying to draw a relationship between business, digital business, and the role that data plays. Digital business transformation, in many respects, is about how you evolve the role that data plays in your business to become more data-driven. It's hard to do without knowing what your data is, where it is, and having some notion of how it's being used in a verified, trusted way. How are you seeing your companies start to tie the use of catalogs to some of these outcomes? What kind of outcomes are folks trying to achieve first off? >> Yeah, you're right. Just basic table stakes for turning an organization into an organization that relies on data-driven decision-making rather than intuitive decision-making requires an inventory. And so that's table stakes for any catalog, and you see a number of vendors out there providing data inventories. But what I think is exciting with the customers that we work with, is they are really undertaking transformative change, not just in the tooling and technology their company uses, but also in the organizational structure, and data literacy programs, and driving towards real business impact, and real business outcomes. An example of an Alation customer, who's been talking recently about outcomes, is Pfizer. Pfizer was covered in a Wall Street Journal article, recently.
Also was speaking at Tableau Conference, about how they're using a combination of the Alation data catalog with Tableau on the front end, and a data science platform called Dataiku, in an integrated analytics workbench that is helping them with new drug discovery. And so, for populations of ill individuals, who may have a rare form of heart disease, they're now able to use machine learning and algorithms that are informed by the data catalog to catch one percent, two percent of heart disease patients who have a slight deviation from the norm, and can deliver drugs appropriately to that population. Another example of the business outcome would be with an insurance company; very different industry, right? But, Munich Reinsurance is a huge global reinsurance company, so you think about hurricanes or the fires we had here in the United States, they actually support first line insurers by reinsuring them. They're also founding new business units for new types of risks in the market. An example would be a factory that is fully controlled by robots. Think about the risks of having that factory be taken over by hackers in the middle of the night, where there's not a lot of employees on the floor. Munich Reinsurance is leveraging the data catalog as a collaboration platform between actuaries and individuals that are knowledgeable in the business to define what are the data products that could support entirely new business units, like one for cyber crime. And investing in those business units based on the innovation they're doing using the data catalog as a collaboration platform. So these are two great examples of organizations that, a couple years ago started with a data catalog, but have driven so many more initiatives than just analyst productivity off of that implementation. >> Oh, those are great outcomes. One of them talking about robots in the factory, automated factory, one thing, if they went haywire, would make for some interesting viral video.
(gently laughs) >> That's right. >> That's right. >> But coming back, the reason I say that is because in many respects, these practices, these relations with the outcomes, the outcomes are the real complex thing. You talked about becoming more familiar with data, using data differently, becoming more data-driven. That requires some pretty significant organizational change. And it seems to me, and I'm querying you on this, that bringing together these users to share their stories about how to achieve these data-driven outcomes is made more productive by catalogs and related technologies. Communities must be starting to form. Are you seeing communities form around achieving these outcomes and utilizing these types of technologies to accelerate the business change? >> So what's really interesting at an organization like Munich Reinsurance or at Pfizer, is there's an internal community that is using the data catalog as a collaboration platform and as kind of a social networking platform for the data nerds. So if I am a brand new user of self-service analytics, I may be a product manager who doesn't know how to write a SQL query yet. Who doesn't know how to go and wrangle my own data. >> Yeah, may never want to. (playfully laughs) >> May never want to. May never want to. Who may not know how to go and validate data for quality or consistency. I can now go to the data catalog to find trusted resources of data assets, be that a dashboard or report that's already been written or be that raw data that someone else has certified, or just has used in the past. So we're seeing this social influence happen within companies that are using data catalogs, where they can see from the data catalog pages who's used, who's validated this data set, so that I now trust the data.
And then, what we've seen happen, just within the last year and a half or so, is these organizations, the sponsors of data in these organizations, are starting to share best practices naturally with one another, and saying, hey >> Across organizations. >> Across organizations. And so there has been a demand for Alation to get out into the market and help catalyze the creation of communities across different organizations. We kicked off, within the last two months, a series of meetings that we've called RevAlation. >> R-E-V-A-L >> That's right. >> A-T-I-O-N. R-E-V-A-L-A-T-I-O-N. And the thing behind the name is, if you can start to share best practices in terms of how you create a data-driven culture across organizations, you can begin to really get breakthrough speed, right? In making this transformation to a data-driven organization. And so, I think what's interesting at the RevAlation events, is folks are not talking just about how they're using the tool, how they're using technology. They're actually talking about how do we improve the data literacy of our organizations and what are the programs in place that leverage, maybe, the data catalog to do that. And so they're starting to really think about how does not just the technical architecture and the tooling change in their organizations, but how do we close this gap between having access to data and trusting the data, and getting folks who maybe aren't too familiar with the technical aspects of the data supply chain. How do we make them comfortable in moving away from intuitive decisions to data-driven decisions? >> Yeah, so the outcome really is not just the application of the tool, it's the new behaviors in the business that are associated with being data-driven. But to do that, you still have to gain insight and understand what kinds of practices are best used with the tool itself. >> That's right. >> So it's got to be a combination. But, you know, Alation has been, if I can say this.
Alation's been on this path for a while. Not too long ago, you came on theCUBE and you talked about Trust Check. >> Right. >> Which was an effort to establish conventions and standards for how data could be verified and validated so that it would be easy to use, so that someone could use the data and be certain that it is what it is, without necessarily having to understand the data. Something that could be very good for, for example, folks who are very focused on the outcome, and not focused on the science of the data associated with it. >> That's right. >> So, is this part of, it's RevAlation, it's Trust Check. Is this part of the journey you're on to try to get people to see this relationship between data-driven business and knowing more about your data? >> It absolutely is. It's a journey to get organizations to understand what is the power that they have internally, within this data. And to close the gap, which is in part organizational, but in part, for individual users, psychological: how do you get to a trusted decision. And so, you'll continue to see us invest in features like Trust Check that highlight how technology can make recommendations; can help validate and verify what the experts in the organization know and propagate that more widely. And then you'll also see us share more best practices about how do you start to create the right organizational change, and how do you start to impact the psychology of fear that we've had in many organizations around data. And I think that's where Alation is uniquely placed, because we have the highest number of data catalog customers of any vendor I'm familiar with in this space. And we also have a unique design approach. When we go into organizations and talk about adopting a data catalog, it's as much about how do our products support psychological comfort with data as well as how do they support the actual workflow of getting that query completed, or getting that data certified.
And so I think we've taken a bit of a unique approach to the market from the beginning where we're really designing holistically. We're not just asking, how do you execute a software program that supports workflow? But how do you start to think about how the data consumer actually adopts those best practices and starts to think differently about how they use data in a more confident way? >> Well I think the first time that you and I talked in theCUBE was probably 2016, and I was struck by the degree to which Alation as a tool, and the language that you used in describing it, was clearly designed for human beings to use it. >> Right. >> As opposed to for data. And I think that, that is a unique proposition, because at the end of the day, the goal here is to have people use data to achieve outcomes and not just to do a better job of managing data. >> And that doesn't mean that, I mean we have a ton of machine learning, >> Sure. >> And AI in the products. That doesn't take away from the power of those algorithms to speed up human work and human behavior. But we really believe that the algorithms need to complement human input and that there should be a human in the loop with decision-making. And then the algorithms propagate the knowledge that we have of experts in the organization. And that's where you get the real breakthrough business outcomes, when you can take input from a lot of different human perspectives and optimize an outcome by using technology as a support structure to help that. >> In a way that's familiar and natural and easy for others in your organization. >> That's right. That seems, you know, if you go back to. >> It makes sense. >> When we were all introduced to Google it was a little bit of an odd thing to go ask Google questions and get results back from the internet. We see data evolving in the same way. Alation is the Google for your data in your organization.
At some point it'll be very natural to say, 'Hey Alation, what happened with revenue last month?' And Alation will come back with an answer. So I think that, that future is in sight, where it's very easy to use data. You know you're getting trusted responses. You know that they're accurate because there's either a certification program in place that the technology supports, or there's a social network that's bubbling this information up to the top, that is a trusted source. And so, that evolution in data needs to happen for our organizations to broadly see analytics-driven outcomes. Just as in our consumer or personal life, Google had to show us a new way of evolving, you know, to a kind of answering machine on the internet. >> Excellent. Stephanie McReynolds, Vice President of Marketing at Alation, talked to us about building communities, to become more of a, to achieve data-driven outcomes, utilizing data catalog technology. Stephanie, thanks very much for being here. >> Thanks for inviting me. >> And once again, I'm Peter Burris, and this has been another CUBE Conversation until next time. (bright classical music)
Stephanie McReynolds, Alation | theCUBE NYC 2018
>> Live from New York, It's theCUBE! Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello and welcome back to theCUBE live in New York City, here for CUBE NYC. In conjunction with Strata Conference, Strata Data, Strata Hadoop. This is our ninth year covering the big data ecosystem, which has evolved into machine learning, A.I., data science, cloud, a lot of great things happening, all things data, impacting all businesses. I'm John Furrier, your host with Dave Vellante and Peter Burris, Peter is filling in for Dave Vellante. Next guest, Stephanie McReynolds who is the CMO, VP of Marketing for Alation, thanks for joining us. >> Thanks for having me. >> Good to see you. So you guys have a pretty spectacular exhibit here in New York. I want to get to that right away, top story is Attack of the Bots. And you're showing a great demo. Explain what you guys are doing in the show. >> Yeah, well it's robot fighting time in our booth, so we brought a little fun to the show floor, my kids are.. >> You mean big data is not fun enough? >> Well big data is pretty fun but occasionally you got to get your geek battle on there, so we're having fun with robots, but I think the real story in the Alation booth is about the product and how machine learning data catalogs are helping a whole variety of users in the organization, everything from improving analyst productivity and even some business user productivity of data, to then really supporting data scientists in their work by helping them to distribute their data products through a data catalog. >> You guys are one of the new guard companies that are doing things that make it really easy for people who want to use data, practitioners, the average data citizen as they've been called, or people who want productivity. Not necessarily the hardcore, setting up clusters, really kind of like the big data user.
What's that market look like right now, has it met your expectations, how's business, what's the update? >> Yeah, I think we have a strong perspective that for us to close the final mile and get to real value out of the data, it's a human challenge, there's a trust gap with managers. Today on stage over at Strata it was interesting because Google had a speaker and it wasn't their chief data officer, it was their chief decision scientist, and I think that reflects what that final mile is, is making decisions, and it's the trust gap that managers have with data because they don't know how the insights are coming to them, what are all the details underneath. In order to be able to trust decisions you have to understand who processed the data, what decision-making criteria did they use, was this data governed well, are we introducing some bias into our algorithms, and can that be controlled? And so Alation becomes a platform for supporting getting answers to those issues. And then there's plenty of other companies that are optimizing the performance of those queries and the storage of that data, but we're really trying to close that trust gap.
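The trust questions listed above — who processed the data, what criteria did they use, was it governed — amount to provenance metadata attached to each asset. As a rough sketch (the field names and trust rule here are invented for illustration, not Alation's actual schema), the kind of record a data consumer would check before trusting a result might look like:

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """Hypothetical provenance metadata a consumer checks before trusting a result."""
    asset: str                 # e.g. "sales.quarterly_revenue"
    processed_by: str          # who last transformed the data
    criteria: str              # decision-making criteria that were applied
    governed: bool             # did it pass governance review?
    known_biases: list = field(default_factory=list)  # open bias flags, if any

    def trustworthy(self) -> bool:
        # A deliberately simple rule: governed data with no open bias flags.
        return self.governed and not self.known_biases

record = ProvenanceRecord(
    asset="sales.quarterly_revenue",
    processed_by="finance-etl",
    criteria="GAAP revenue recognition",
    governed=True,
)
print(record.trustworthy())  # → True
```

A catalog's real trust model is of course richer than one boolean rule; the point is that the answers live alongside the asset instead of in someone's head.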
It sounds like you're trying to look at data as an asset to be created, to be distributed so it can be more easily used by more people in your organization, have we got that right? >> Absolutely. So we're now seeing, we're in just over a hundred production implementations of Alation at large enterprises, and we're now seeing those production implementations get into the thousands of users. So this is going beyond those data specialists. Beyond the unicorn data scientists that understand the systems and math and technology. >> And business. >> And business, right. In business. So what we're seeing now is that a data catalog can be a point of collaboration across those different audiences in an enterprise. So whereas three years ago some of our initial customers kept the data catalog implementations small, right. They were getting access for the specialists to this catalog and asked them to certify data assets for others; what we're starting to see is a proliferation of creation of self-service data assets, a certification process that now is enterprise-wide, and thousands of users in these organizations. So eBay has over a thousand weekly logins, Munich Reinsurance was on stage yesterday, their head of data engineering said they have 2,000 users on Alation at this point on their data lake, Fiserv is going to speak on Thursday and they're getting up to those numbers as well, so we see some really solid organizations that are solving medical, pharmaceutical issues, right, the largest reinsurer in the world, leading tech companies, starting to adopt a data catalog as a foundation for how they're going to make those data-driven decisions in the organization.
>> Talk about how the product works because essentially you're bringing kind of the decision scientists, for lack of a better word, and productivity worker, almost like a business office suite concept, as a SaaS, so you got a SaaS model that says "Hey you want to play with data, use it but you have to do some front end work." Take us through how you guys roll out the platform, how are your customers consuming the service, take us through the engagement with customers. >> I think for customers, the most interesting part of this product is that it displays itself as an application that anyone can use, right? So there's a super familiar search interface that, rather than bringing back webpages, allows you to search for data assets in your organization. If you want more information on that data asset you click on those search results and you can see all of the information of how that data has been used in the organization, as well as the technical details and the technical metadata. And I think what's even more powerful is we actually have a recommendation engine that recommends data assets to the user. And that can be plugged into Tableau and Salesforce Einstein Analytics, and a whole variety of other data science tools like Dataiku that you might be using in your organization. So this looks like a very easy to use application that folks are familiar with, that you just need a web browser to access, but on the backend, the hard work that's happening is the automation that we do with the platform. So by going out and crawling these source systems and looking at not just the technical descriptions of data, the metadata that exists, but then being able to understand, by parsing the SQL query logs, how that data is actually being used in the organization. We call it Behavior I/O.
by looking at the behaviors of how that data's being used, from those logs, we can actually give you a really good sense of how that data should be used in the future, or where you might have gaps in governing that data, or how you might want to reorient your storage or compute infrastructure to support the type of analytics that are actually being executed by real humans in your organization. And that's eye-opening to a lot of I.T. sources. >> So you're providing insights into the data usage so that the business could get optimized, whether it's the I.T. footprint component, or kinds of use cases, is that kind of how it's working? >> So what's interesting is the optimization actually happens in a pretty automated way, because we can make recommendations to those consumers of data of how they want to navigate the system. Kind of like Google makes recommendations as you browse the web, right? >> If you misspell something, "Oh did you mean this", kind of thing? >> "Did you mean this, might you also be interested in this", right? It's kind of a cross between Google and Amazon. Others like you may have used these other data assets in the past to determine revenue for that particular region, have you thought about using this filter, have you thought about using this join, did you know that you're trying to do analysis that maybe the sales ops guy has already done, and here's the certified report, why don't you just start with that?
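The behavioral analysis described above — parsing query logs to see which tables people actually touch — can be sketched in miniature. This is only an illustration of the idea, not Alation's implementation; the log excerpt and table names are made up, and real SQL parsing is far more involved than a regex:

```python
import re
from collections import Counter

# Hypothetical excerpt from a SQL query log.
query_log = [
    "SELECT region, SUM(amount) FROM sales JOIN regions ON sales.region_id = regions.id",
    "SELECT * FROM sales WHERE year = 2018",
    "SELECT name FROM customers",
]

def table_usage(queries):
    """Count how often each table name appears after FROM or JOIN."""
    usage = Counter()
    for q in queries:
        usage.update(re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", q, re.IGNORECASE))
    return usage

print(table_usage(query_log).most_common(1))  # → [('sales', 2)]
```

Rankings like this are what let a catalog surface "others used this asset for that question" style recommendations, and show I.T. which tables actually carry the analytic load.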
We're seeing a lot of reuse in organizations, where in the past, I think as an industry, when Tableau and Qlik and all these B.I. tools that were very self-service oriented started to take off, it was all about democratizing visualization by letting every user do their own thing, and now we're realizing to get speed and accuracy and efficiency and effectiveness, maybe there's more reuse of the work we've already done in existing data assets, and by recommending those and expanding the data literacy around the interpretation of those, you might actually close this trust gap with the data. >> But there's one really important point that you raised, and I want to come back to it, and that is this notion of bias. So you know, Alation knows something about the data, knows a lot about the metadata, so therefore, I don't want to say understands, but it's capable of categorizing data in that way. And you're also able to look at the usage of that data by parsing some of the SQL statements and then making a determination of whether the data as it's identified is appropriately being used, based on how people are actually applying it, so you can identify potential bias or potential misuse or whatever else it might be. That is an incredibly important thing. As you know John, we had an event last night and one of the things that popped up is how do you deal with emergence in data science, in A.I., etc. And what methods do you put in place to actually ensure that the governance model can be extended to understand how those things are potentially, in a very soft way, corrupting the use of the data. So could you spend a little bit more time talking about that, because it's something a lot of people are interested in, quite frankly we don't know about a lot of tools that are doing that kind of work right now. It's an important point. >> I think the traditional viewpoint was if we just can manage the data we will be able to have a governed system.
So if we control the inputs then we'll have a safe environment, and that was kind of like the classic single source of truth, data warehouse type model. >> Stewards of the data. >> What we're seeing is, with the proliferation of sources of data and how quickly, with IOT and new modern sources, data is getting created, you're not able to manage data at that point of entry. And it's not just about systems, it's about individuals that go on the web and find a dataset and then load it into a corporate database, right? Or you merge an Excel file with something that's in a database. And so I think what we see happening, not only when you look at bias but if you look at some of the new regulations like [Inaudible] >> Sure. Ownership, [Inaudible] >> The logic that you're using to process that data, the algorithm itself, can be biased. If you have a biased training data set that you feed into a machine learning algorithm, the algorithm itself is going to be biased. And so the control point in this world, where data is proliferating and we're not sure we can control that entirely, becomes the logic embedded in the algorithm. Even if that's a simple SQL statement that's feeding a report. And so Alation is able to introspect that SQL and highlight that maybe there is bias at work in how this algorithm is composed. So with GDPR the consumer owns their own data; if they want to pull it out from a training data set, you've got to rerun that algorithm without that consumer data, and that's your control point going forward for the organization on different governance issues that pop up.
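The GDPR control point described above — drop the consumer's rows, then rerun the algorithm without them — reduces to a small, mechanical loop. A toy sketch, with invented data and a trivial averaging "model" standing in for real training:

```python
# Hypothetical training data; each row belongs to one consumer.
training_data = [
    {"consumer_id": "a1", "spend": 100},
    {"consumer_id": "b2", "spend": 300},
    {"consumer_id": "c3", "spend": 200},
]

def train(rows):
    """Stand-in for model training: here, just the average spend."""
    return sum(r["spend"] for r in rows) / len(rows)

def erase_and_retrain(rows, consumer_id):
    """Honor an erasure request: drop that consumer's rows, then retrain."""
    remaining = [r for r in rows if r["consumer_id"] != consumer_id]
    return train(remaining)

print(train(training_data))                    # → 200.0
print(erase_and_retrain(training_data, "b2"))  # → 150.0
```

The hard part in practice is not the filter but knowing which training sets, reports, and downstream models a consumer's data flowed into — which is exactly the lineage question a catalog is meant to answer.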
Talk about the psychology of the user base, because one of the things that shifted in the data world is a few stewards of data managed everything; now you've got a model where literally thousands of people in an organization could be users, productivity users, so you get a social component in here, that people know who's doing data work, which in a way creates a new persona or class of worker. A non-techy worker. >> Yeah. It's interesting, if you think about moving access to the data and moving the individuals that are creating algorithms out to a broader user group, what's important, you have to make sure that you're educating and training and sharing knowledge with that democratized audience, right? And to be able to do that you kind of want to work with human psychology, right? You want to be able to give people guidance in the course of their work rather than have them memorize a set of rules and try to remember to apply those. If you had a specialist group you could kind of control and force them to memorize and then apply; the more modern approach is to say "look, with some of these machine learning techniques that we have, why don't we make a recommendation." What you're going to do is introduce bias into that calculation. >> And we're capturing that information as you use the data. >> Well we're also making a recommendation to say "Hey do you know you're doing this? Maybe you don't want to do that." Most people using the data are not bad actors. They just can't remember all the rule sets to apply. So what we're trying to do is catch someone behaviorally in the act before they make that mistake and say hey, just a bit of a reminder, a bit of a coaching moment, did you know what you're doing? Maybe you can think of another approach to this. And we've found that in many organizations that changes the discussion around data governance. It's no longer this top-down constraint to finding insight, which frustrates an audience that is trying to use that data.
It's more like a coach helping you improve, and then the social aspect of wanting to contribute to the system comes into play, and people start communicating, collaborating on the platform, and curating information a little bit. >> I remember when Microsoft Excel came out, the spreadsheet, or Lotus 123, oh my God, people are going to do these amazing things with spreadsheets, and they did. You're taking a similar approach with analytics, much bigger surface area of work to kind of attack from a data perspective, but in a way kind of the same kind of concept, put it in the hands of the users, have the data in their hands so to speak. >> Yeah, enable everyone to make data-driven decisions. But make sure that they're interpreting that data in the right way, right? Give them enough guidance, don't let them just kind of attack the wild west and ferret it out. >> Well looking back at the Microsoft Excel spreadsheet example, I remember when a finance department would send a formatted spreadsheet with all the rules for how to use it out to 50 different groups around the world, and everyone figured out that you can go in and manipulate the macros and deliver any results they want. And so it's that same notion, you have to know something about that, but this site, in many respects Stephanie, you're describing a data governance model that really is more truly governance, that if we think about a data asset it's how do we mediate a lot of different claims against that set of data so that it's used appropriately, so it's not corrupted, so that it doesn't affect other people, but very importantly so that the outcomes are easier to agree upon because there's some trust and there's some valid behaviors and there's some verification in the flow of the data utilization. >> And where we give voice to a number of different constituencies. Because business opinions from different departments can run slightly counter to one another.
There can be friction in how to use particular data assets in the business depending on the lens that you have in that business, and so what we're trying to do is surface those different perspectives, give them voice, allow those constituencies to work that out in a platform that captures that debate, captures that knowledge, makes that debate a knowledge foundation to build upon, so in many ways it's kind of like the scientific method, right? As a scientist I publish a paper. >> Get peer reviewed. >> Get peer reviewed, let other people weigh in. >> And it becomes part of the canon of knowledge. >> And it becomes part of the canon. And in the scientific community over the last several years you see that folks are publishing their data sets out publicly, so why can't an enterprise do the same thing internally for different business groups. Take the same approach. Allow others to weigh in. It gets them better insights and it gets them more trust in that foundation.
I think this concept of a more agile, grassroots approach to data governance is a breath of fresh air for anyone who has spent their career in the data space. We're at a turning point in the industry where you're now seeing chief decision scientists, chief data officers, chief analytics officers take a leadership role in organizations. Munich Reinsurance is using their data team to actually invest in whole new arms of their business. That's how they're pushing the envelope on leadership in the insurance space, and we're seeing that across our install base. Alation becomes this knowledge repository for all of those minds in the organization, and encourages a community to be built around data and insightful questions of data. And in that way the whole organization rises to the next level, and I think it's that vision of what can be created internally, how we can move away from just claiming that we're a big data organization and really starting to see the impact of how new business models can be created on these data assets, that's exciting to our customer base. >> Well congratulations. A hot startup. Alation here on theCUBE in New York City for CUBE NYC. Changing the game on analytics, bringing a breath of fresh air to the hands of the users. A new persona developing. Congratulations, great to have you. Stephanie McReynolds. It's theCUBE. Stay with us for more live coverage, day one of two days live in New York City. We'll be right back.
Stephanie McReynolds, Alation | DataWorks Summit 2018
>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Stephanie McReynolds. She is the Vice President of Marketing at Alation. Thanks so much for returning to theCUBE, Stephanie. >> Thank you for having me again. >> So, before the cameras were rolling, we were talking about Kevin Slavin's talk on the main stage this morning, and talking about, well really, a background to sort of this concern about AI and automation coming to take people's jobs, but really, his overarching point was that we shouldn't let the algorithms take over, and that humans actually are an integral piece of this loop. So, riff on that a little bit. >> Yeah, what I found fascinating about what he presented were actual examples where having a human in the loop of AI decision-making had a more positive impact than just letting the algorithms decide for you, and turning it into kind of a black box. And the issue is not so much that, you know, there's very few cases where the algorithms make the wrong decision. What happens the majority of the time is that the algorithms actually can't be understood by humans. So if you have to roll back >> They're opaque, yeah. >> in your decision-making, or uncover it, >> I mean, who can crack what a convolutional neural network does, layer by layer? Nobody can. >> Right, right. And so, his point was, if we want to avoid not just poor outcomes, but also make sure that the robots don't take over the world, right, which is where every like, media person goes first, right? (Rebecca and James laugh) That you really need a human in the loop of this process.
And a really interesting example he gave was what happened with the 2015 storm, and he talked about 16 different algorithms that do weather predictions, and only one algorithm predicted, mis-predicted that there would be a huge weather storm on the east coast. So if there had been a human in the loop, we wouldn't have, you know, caused all this crisis, right? The human could've >> And this is the storm >> Easily seen. >> That shut down the subway system, >> That's right. That's right. >> And really canceled New York City for a few days there, yeah. >> That's right. So I find this pretty meaningful, because Alation is in the data cataloging space, and we have a lot of opportunity to take technical metadata and automate the collection of technical and business metadata and do all this stuff behind the scenes. >> And you make the discovery of it, and the analysis of it. >> We do the discovery of this, and leading to actual recommendations to users of data, that you could turn into automated analyses or automated recommendations. >> Algorithmic, algorithmically augmented human judgment is what it's all about, the way I see it. What do you think? >> Yeah, but I think there's a deeper insight that he was sharing, is it's not just human judgment that is required, but for humans to actually be in the loop of the analysis as it moves from stage to stage, that we can try to influence or at least understand what's happening with that algorithm. And I think that's a really interesting point. You know, there's a number of data cataloging vendors, you know, some analysts will say there's anywhere from 10 to 30 different vendors in the data cataloging space, and as vendors, we kind of have this debate. Some vendors have more advanced AI and machine learning capabilities, and other vendors haven't automated at all. 
And I think that the answer, if you really want humans to adopt analytics, and to be comfortable with the decision-making of those algorithms, is you need to have a human in the loop, in the middle of that process, of not only making the decision, but actually managing the data that flows through these systems. >> Well, algorithmic transparency and accountability is an increasing requirement. It's a requirement for GDPR compliance, for example. >> That's right. >> That I don't see yet with Wiki, but we don't see a lot of solution providers offering solutions to enable more of an automated roll-up of a narrative of an algorithmic decision path. But that clearly is a capability as it comes along, and it will. That will absolutely depend on a big data catalog managing the data, the metadata, but also helping to manage the tracking of what models were used to drive what decision, >> That's right. >> And what scenario. So that plays into what Alation >> So we talk, >> And others in your space do. >> We call that a data catalog, almost as if the data's the only thing that we're tracking, but in addition to that metadata or the data itself, you also need to track the business semantics, how the business is using or applying that data, and that algorithmic logic. So that might be logic that's just being used to transform that data, or it might be logic to actually make and automate decisions, like what they're talking about with GDPR. >> It's a data artifact catalog. These are all artifacts that are derived in many ways, or supplement and complement the data. >> That's right. >> They're all, it's all the logic, like you said. >> And what we talk about is, how do you create transparency into all those artifacts, right? So, a catalog starts with this inventory that creates a foundation for transparency, but if you don't make those artifacts accessible to a business person, who might not understand what is metadata, what is a transformation script.
If you can't make those artifacts accessible to what I consider a real, normal human being, right, (James laughs) I love to geek out, but, (all laugh) at some point, not everyone is going to understand. >> She's the normal human being in this team. >> I'm normal. I'm normal. >> I'm the abnormal human being among the questioners here. >> So, yeah, most people in the business are just getting their arms around how do we trust the output of analytics, how do we understand enough statistics and know what to apply to solve a business problem or not, and then we give them this like, hairball of technical artifacts and say, oh, go at it. You know, here's your transparency. >> Well, I want to ask about that human that we're talking about, that needs to be in the loop at every stage. Surely, we can make the data more accessible, but it also requires a specialized skill set, and I want to ask you about the talent, because I noticed on your LinkedIn, you said, hey, we're hiring, so let me know. >> That's right, we're always hiring. We're a startup, growing well. >> So I want to know from you, I mean, are you having difficulty with filling roles? I mean, what is the pipeline here? Are people getting the skills that they need?
So, we have some great customers, who have formally kicked off internal training programs that are data literacy programs. Munich Re Insurance is a good example. They spoke with James a couple of months ago in Berlin. >> Yeah, this conference in Berlin, yeah. >> That's right, that's right, and their chief data officer has kicked off a formal data literacy training program for their employees, so that they can get business people comfortable enough and trusting the data, and-- >> It's a business culture transformation initiative that's very impressive. >> Yeah. >> How serious they are, and how comprehensive they are. >> But I think we're going to see that become much more common. Pfizer, who's another customer of ours, has taken on a similar initiative: how do they make all of their employees be able to have access to data, but then also know when to apply it to particular decision-making use cases. And so, we're seeing this need for business people to get a little bit of training, and then for new roles, like information stewards, or data stewards, to come online, folks who can curate the data and the data assets, and help be kind of translators in the organization. >> Stephanie, will there be a need for an algorithm curator, or a model curator, to, you know, like a model whisperer, to explain how these AI, convolutional, recurrent, >> Yeah. >> whatever, all these neural nets, what they actually do, you know. Would there be a need for that going forward? Another normal human being, who can somehow be bilingual in neural nets and in standard language? >> I think so. I mean, I think we've put this pressure on data scientists to be that person. >> Oh my gosh, they're so busy doing their job. How can we expect them to explain, and I mean, >> Right. >> And to spend 100% of their time explaining it to the rest of us? >> And this is the challenge with some of the regulations like GDPR.
We aren't set up yet, as organizations, to accommodate this complexity of understanding, and I think that this part of the market is going to move very quickly. So as vendors, one of the things that we can do is continue to help by building out applications that make it easy for information stewardship. How do you lower the barrier for these specialist roles and make it easy for them to do their job by using AI and machine learning, where appropriate, to help scale the manual work, but keeping a human in the loop to certify that data asset, or to add additional explanation, and then taking their work and using AI, machine learning, and automation to propagate that work out throughout the organization, so that everyone then has access to those explanations. So you're no longer requiring the data scientists to hold office hours. I know organizations that hold office hours, where the data scientist sits at a desk, like you did in college, and people can come in and ask them questions about neural nets. That's just not going to scale at today's pace of business. >> Right, right. >> You know, the term that I used just now, the algorithm or model whisperer, you know, the recommender function that is built into your environment, in a similar data catalog, is a key piece of infrastructure to relevance-rank, you know, the outputs of the catalog or responses to queries that human beings might make. You know, the recommendation ranking is critically important to help human beings assess, you know, what's going on in the system, and give them some advice about what avenues to explore, I think, so. >> Yeah, yeah. And that's part of our definition of data catalog. It's not just this inventory of technical metadata. >> That would be boring, and dry, and useless. >> But that's where, >> For most human beings. >> That's where a lot of vendor solutions start, right? >> Yeah. >> And that's an important foundation.
>> Yeah, for people who don't live 100% of their work day inside the big data catalog. I hear what you're saying, you know. >> Yeah, so for people who want a data catalog, how you make that relevant to the business is you connect those technical assets, that technical metadata, with how the business is actually using this in practice, and how can we have proactive recommendations or recommendation engines, and certifications, and this information steward then communicating through this platform to others in the organization about how do you interpret this data and how do you use it to actually make business decisions. And I think that's how we're going to close the gap between technology adoption and actual data-driven decision-making, which we're not quite seeing yet. When they survey, only about 36% of companies are actually confident they're making data-driven decisions, even though there have been, you know, millions, if not billions of dollars that have gone into data analytics investments. And it's because as a manager, I don't quite have the data literacy yet, and I don't quite have the transparency across the rest of the organization to close that trust gap on analytics. >> Here's my feeling, in terms of cultural transformations across businesses in general. I think the legal staff of every company is going to need to get real savvy on using those kinds of tools, like your catalog, with recommendation engines, to support e-discovery, or discovery of the algorithmic decision paths that were taken by their company's products, 'cause they're going to be called by judges and juries, under a subpoena and so forth, and so on, to explain all this, and they're human beings who've got law degrees, but who don't know data, and they need the data environment to help them frame up a case for what we did, and you know, so, we being the company that's involved. >> Yeah, and our politicians.
I mean, anyone who's read Cathy's book, Weapons of Math Destruction, there are some great use cases of where, >> Math, M-A-T-H, yeah. >> Yes, M-A-T-H. But there are some great examples of where algorithms can go wrong, and many of our politicians and our representatives in government aren't quite ready to have that conversation. I think anyone who watched the Zuckerberg hearings, you know, in Congress saw the gap of knowledge that exists between >> Oh my gosh. >> The legal community, and you know, and the tech community today. So there's a lot of work to be done to get ready for this new future. >> But just getting back to the cultural transformation needed to make data-driven decisions, one of the things you were talking about is getting the managers to trust the data, and we're hearing about what are the best practices to have that happen, in the sense of starting small, be willing to experiment, get out of the lab, try to get to insight right away. What would your best advice be, to gain trust in the data? >> Yeah, I think the biggest gap is this issue of transparency. How do you make sure that everyone understands each step of the process and has access to be able to dig into that? If you have a foundation of transparency, it's a lot easier to trust, rather than, you know, right now, we have kind of like the high priesthood of analytics going on, right? (Rebecca laughs) And some believers will believe, but a lot of folks won't. And, you know, the origin story of Alation is really about taking these concepts of the scientific revolution and scientific process and how can we support, for data analysis, those same steps of scientific evaluation of a finding. That means that you need to publish your data set, you need to allow others to rework that data, and come up with their own findings, and you have to be open and foster conversations around data in your organization.
One other customer of ours, Meijer, who's a grocery store in the mid-west, and if you're west coast or east coast-based, you might not have heard of them-- >> Oh, Meijers, thrifty acres. I'm from Michigan, and I know them, yeah. >> Gigantic. >> Yeah, there you go. Gigantic grocery chain in the mid-west, and Joe Oppenheimer there actually introduced a program that he calls the social contract for analytics. Before anyone gets their license to use Tableau, or MicroStrategy, or SAS, or any of the tools internally, he asks those individuals to sign a social contract, which basically says that I'll make my work transparent, I will document what I'm doing so that it's shareable, I'll use certain standards on how I format the data, so that if I come up with a really insightful finding, it can be easily put into production throughout the rest of the organization. So this is a really simple example. His inspiration for that social contract was his high school freshman. He was entering high school and had to sign a social contract, that he wouldn't make fun of the teachers, or the students, you know, >> I love it. >> Very simple basics. >> Yeah, right, right, right. >> I wouldn't make fun of the teacher. >> We all need a social contract. >> Oh my gosh, you have to make fun of the teacher. >> I think it was a little more formal than that, in the language, but that was the concept. >> That's violating your civil rights as a student. I'm sorry. (Stephanie laughs) >> Stephanie, always so much fun to have you here. Thank you so much for coming on. >> Thank you. It's a pleasure to be here. >> I'm Rebecca Knight, for James Kobielus. We'll have more of theCUBE's live coverage of DataWorks just after this.
Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017
>> Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by Silicon Angle Media and its ecosystem sponsors. (upbeat techno music) >> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall to wall coverage, theCUBE covering Big Data NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla, Co-Founder and Chief Product Officer of Paxata, a hot startup in the space. A lot of kudos. Of course, they launched on theCUBE in 2013 when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys. >> Thank you. >> Happy to be here. >> So, winding down, so great kind of wrap-up segment here, in addition to the partnership that you guys have. So, let's first talk about that before we get to the wrap-up of the show and kind of bring together the week here and kind of summarize everything. Tell us about the partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data. That's their ethos. They're very user-centric. >> Nenshad: Yes. >> From the founders. Seemed like a good fit. What's the deal? >> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities, it became very clear that the market had a strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it.
And then once I find it, I want to be able to prepare it. And so, one of the things that really drove this partnership was the very common interests that both companies have. And number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time they also share our interests in working in these hybrid multicloud environments. So, what we've done and what we announced here at Strata is actually this bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So, it's a really nice synergy. >> So, you pushed the user data back to Alation, cause that's what they care a lot about, the cataloging and making the user-centric view work. So, you provide, it's almost a flow back and forth. It's a handshake if you will to data. Am I getting that right? >> Yeah, I mean, the idea's to keep the analyst or the user of that data, data scientist, even in some cases a business user, keep them in the flow of their work as much as possible. But give them the advantage of understanding what others in the organization have done with that data prior and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data. >> John: So, give me an example. I like your Excel spreadsheet concept cause that's obvious. People know what Excel spreadsheet is so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. 
But this one, just take me through a use case. >> So, I've got a good example. >> Okay, take me through. >> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So, if the data is compressed in Avro or Parquet, or it's even like JSON format, a business user has no idea what's in that file. >> John: Yeah. >> So, what we do is we find the file for them. It may have some comments on that file of how that data's been used in past projects that we infer from looking at how others have used that data in Alation. >> John: So, you put metadata around it. >> We put a whole bunch of metadata around it. It might be comments that people have made. It might be >> Annotations, yeah. >> actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So, all of a sudden, the business user has access without having to use a command line utility or understand anything about compression, and how you open that file up-- >> John: So, as Paxata's spitting out those nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right? >> It's making an Avro or a Parquet file as easy to use as Excel, basically. Which is great, right? >> It's awesome. >> Now, you've enabled >> a whole new class of people who can use that. >> Well, and people just >> Get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!" >> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started-- >> It's like getting your pictures like DS, JPEG. It's like what? >> Exactly. >> Right.
>> So, you're talking about unlabeled-- >> If you looked on your laptop, and you didn't have JPEG or DOC or PPT: okay, I don't know what this file is. Well, what you have in the data lake environment is that you have thousands of these files that people don't really know what they are. And so, with Alation we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation you're immediately taken into Paxata. And now you're actually looking at what's in that file. So, you can very quickly go from this looks interesting to let me understand what's inside of it. And that's very powerful. >> Talk about Alation. Cause I had the CEO on, also their lead investor Greg Sands from Costanoa Ventures. They're a pretty amazing team but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs) >> They got a Symbolic >> Stephanie: Keep going. >> Systems Stanford guy, who's like super-smart. >> Nenshad: Yeah. >> They're on something that's really unique but it's almost too simple to be. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?" >> Yeah, so I think that the best way to describe it is it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple, but they have a lot of richness in what they're able to do behind the scenes in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user.
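[Editor's note: the "black box" problem described above is easy to demonstrate. Below is a toy Python sketch of what profiling a compressed, extension-less lake file involves: decompress, infer the fields, summarize. It uses gzipped JSON lines from the standard library as a stand-in; the file name and field names are invented for illustration, and this is not Alation's or Paxata's actual implementation.]

```python
import gzip
import json
import os
import tempfile

# A compressed object in a data lake often has no extension, so to a business
# user it looks like binary noise. Write a stand-in file to demonstrate.
path = os.path.join(tempfile.mkdtemp(), "part-00000")  # no extension, as in many lakes
records = [{"customer_id": 1, "spend": 19.99},
           {"customer_id": 2, "spend": 5.00}]
with gzip.open(path, "wt") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

# "Profiling" amounts to: decompress, infer the fields, summarize --
# the work a catalog does so the user never touches a command line.
with gzip.open(path, "rt") as f:
    rows = [json.loads(line) for line in f]
fields = sorted(rows[0])
print(fields)     # ['customer_id', 'spend']
print(len(rows))  # 2
```

The asymmetry is the point: producing and opening the file takes tooling knowledge a business user does not have, while the catalog's job is to do the second half on their behalf and surface just the schema and row count.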
So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used, or the different patterns people are using to consume data. So, what we find really exciting is that this is something that is very complex under the covers, which Paxata is as well, being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do. >> And that's the hard problem. Okay, Stephanie back ... That was a great example by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, how do you talk about, you've been VP of Marketing of Alation. But you've been around the block. You know B2B, tech, big data. So, you've seen a bunch of different, you've worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation? >> I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this-- >> John: Hold on. Repeat that. Cause that's like a key thing. What problem hasn't been solved since the data warehouse? >> To be able to actually find and fully understand, understand to the point of trust, the data that you want to use for your analysis. And so, in the world of-- >> John: That sounds so simple. >> Stephanie: In the world of data warehousing-- >> John: Why is it so hard?
Well, because in the world of data warehousing, business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPI calculation, and told you as a business person, you as a CEO, this is how you're going to monitor your business. >> John: Yeah. >> What business person >> Wants to be told that by an IT guy, right? >> Well, it was bounded by IT. >> Right. >> Expression and discovery >> Should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But like, when you start to get into the discovery side of it, it should be free. >> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work. >> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. >> Right. >> Not even 10, 15, 20 years ago. >> So, now we have more self-service access to data, and we can have more exploratory analysis. What data science and Hadoop really introduced was this on-demand ability to create these structures, this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right? >> John: Yeah. >> Still lost in the data. >> So, we simply call that a catalog. But a catalog is much more-- >> Index, catalog, anthology, there's other words for it, right? >> Yeah, but I think it's interesting, because the concept of a catalog as an inventory has been around forever in this space. But the concept of a catalog that learns from others' behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today. The fact that behavior of how people query data is an input, and that input then informs a recommendation as an output, is very powerful. And that's where all the machine learning and A.I. comes to work.
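[Editor's note: the Behavior I/O idea described here — how people query data as the input, ranked recommendations as the output — can be sketched as a simple co-occurrence recommender. This is a toy illustration over made-up query logs, not Alation's actual algorithm, which presumably weights many more signals.]

```python
from collections import Counter
from itertools import combinations

# Behavior as input: which tables each observed query touched.
query_logs = [
    {"orders", "customers"},
    {"orders", "customers", "returns"},
    {"orders", "shipments"},
]

# Count how often each pair of tables appears in the same query.
pair_counts = Counter()
for tables in query_logs:
    for pair in combinations(sorted(tables), 2):
        pair_counts[pair] += 1

def recommend(table, k=2):
    """Recommendation as output: tables most often queried alongside `table`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == table:
            scores[b] += n
        elif b == table:
            scores[a] += n
    return [t for t, _ in scores.most_common(k)]

print(recommend("orders", k=1))  # ['customers'] -- co-occurs with orders twice
```

A real system would weight recency, user similarity, and trust signals rather than raw counts, but the shape is the same: behavior in, ranked suggestions out.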
It's hidden underneath that concept of Behavior I/O, but that's the real innovation that drives this rich catalog: how can we make active recommendations to a business person who doesn't have to understand the technology but knows how to apply that data to making a decision. >> Yeah, that's key. Behavior and textual information has always been the two fly wheels in analysis, whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this week, and we've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more, is that people are using data differently. Not only say differently, there's baselining, foundational things you got to do. But the real innovators have a twist on it that gives them an advantage. They see how they can use data. And the trend is collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So, it seems to be this fly wheel of some data, get some collective data. What's your thoughts and reactions? Are people getting it? Is this people doing it by accident or on purpose kind of thing? Did people just fall on their head? Or do you see, "Oh, I just backed into this?" >> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example, these are companies whose entire business models were based on data. They've generated out-sized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data is a monetizable asset to be turned into information, either for analysis or for generating new products that can then be resold on the market.
The leading edge companies have figured that out, and are adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside of the enterprise. So, I don't think it's a fluke at all. I think that most of these companies are being forced to go down that path because they have been shown the way by the digital giants that are currently ruling the enterprise tech world. >> All right, what's your thoughts on the week this week so far, on the big trends? What are obvious, obviously A.I., don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out, from a trends standpoint, buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge, and what didn't emerge? What was the surprise? Any surprises? >> Basically, we're seeing in general the maturation of the market finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about can you actually drive return on investment? Can you actually drive insights and results from the stack? And so, even the technologists that we were talking with today throughout the course of the show are starting to talk about it's that last mile of making the humans more intelligent about navigating this data, where all the breakthroughs are going to happen. Even in places like IOT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions, there's still a lot of human training that goes into that decision-making process and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level. >> What's missing?
What do you see that people missed, that is super-important, that wasn't talked about much? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now. >> Yeah, I would say I completely agree with what Stephanie said, which is we are seeing the market mature. >> John: Yeah. >> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. That being said, though, this is my sort of way of being a little more provocative: I still think there's way too much emphasis on data science and not enough emphasis on the average business analyst who's doing work in the Fortune 500. >> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst, maybe. >> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyperparameter tuning, and other words that I could throw at you to show that I'm smart... (laughter) >> You guys have a vision with the Excel thing. I could see how you see that perspective, because you see a future. I just think we're not there yet, because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right? >> Yeah. >> To your point about surfacing the insights, it's like the data scientists: "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like, it's not their job. >> Well, we need to get them out of data preparation, right? >> Yeah, get out of that. >> You shouldn't be a data scientist-- >> Right now, you have two values. You've got the user interface value, which I love, but you guys do the automation. So, I think we're getting there.
I see where you're coming from, but still those data scientists have to set the tone for the generation, right? So, it's kind of like you've got to get those guys productive. >> And it's not a... Please go ahead. >> I mean, it's somewhat interesting if you look at: can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of a scientific inquiry process. >> John: Yeah. >> Right? >> If you can have more innovators around the table in a common place to discuss what the insights in this data are, and people are bringing business perspective together with machine learning perspective, or knowledge of the algorithms, then maybe you can bring those next leaps forward. >> Great insight. If you want my observations, I use the crazy analogy. Here's my crazy analogy. For years it's been about the engine, the Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And it's got wheels, it's got a chassis. And so, it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So, they've got to go through that. But now people are crashing. Now, there's an art to driving the car. >> Right. >> So, whether you're in a sports car or whatever, this is where the value piece I think hits home: people are driving the data now. They're driving the value proposition. So, I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either; it's outcomes. It's like, "I'm busy. I got security. I got app development." >> And I think they're getting smarter about how they're valuing data.
We're starting to see some economic models, and some ways of putting actual numbers on what impact this data is having today. We do a lot of usage analysis with our customers, looking at how they have a goal to distribute data across more of the organization and really get people using it in a self-service manner, and from that, being able to calculate what the impact actually is. We're not just storing this for insurance policy reasons. >> Yeah, yeah. >> And this cheap-- >> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment on you guys having the last word. I want to phrase it this way. Share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata? >> For me, the coolest thing: they said, "This is a social network for nerds. I finally feel like I've found my home." (laughter) >> Data nerds, okay. >> Data nerds. So, if you're a data nerd and you want to network, Alation is the place you want to be. >> So, there are like profiles? Like, you guys have a profile for everybody who comes in? >> Yeah, so the interesting thing is, as part of our automation, when we go and we index the data sources, we also index the people that are accessing those sources. So, you kind of have a leaderboard now of data users, who can contact one another in the system. >> John: Ooh. >> And at eBay the leader was this guy, Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was well known if you were around eBay for awhile. >> John: Yeah, he was the master of the domain. >> And then when we turned on, you know, we were indexing tables on Teradata as well as their Hadoop implementation.
And all of a sudden, there are table structures that are Caleb underscore cussed. Caleb underscore revenue. Caleb underscore... We're like, "Wow!" Caleb drove a lot of Teradata revenue. (Laughs) >> Awesome. >> Paxata, what was the coolest thing someone said about you, in terms of being the nicest or coolest most relevant thing? >> So, something that a prospect said earlier this week is that, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it, I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. So, I'm looking forward to self-driving information. Nenshad, thanks so much. Stephanie from Alation, thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing your perspective. >> Thank you very much. >> And your color commentary on our wrap-up segment here for Big Data NYC. This is theCUBE, live from New York, wrapping up three great days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techno music)
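The usage indexing described in this segment, crawling query logs to see which people touch which tables and surfacing a "leaderboard" of data users like Caleb, can be sketched as a simple aggregation. This is a hypothetical illustration, not Alation's actual implementation: the log format is invented, and a real catalog would parse warehouse audit logs with a proper SQL parser rather than a regex.

```python
from collections import Counter, defaultdict
import re

# Hypothetical query-log records: (user, SQL text).
QUERY_LOG = [
    ("caleb", "SELECT * FROM caleb_revenue JOIN dim_date USING (day)"),
    ("caleb", "SELECT sum(gmv) FROM caleb_revenue"),
    ("ana",   "SELECT * FROM dim_date"),
]

# Naive table extraction: grab the identifier after FROM or JOIN.
TABLE_RE = re.compile(r"(?:FROM|JOIN)\s+([a-zA-Z_][\w.]*)", re.IGNORECASE)

def index_usage(log):
    """Count table accesses per user, and top users per table."""
    by_user = Counter()
    by_table = defaultdict(Counter)
    for user, sql in log:
        for table in TABLE_RE.findall(sql):
            by_user[user] += 1
            by_table[table][user] += 1
    return by_user, by_table

users, tables = index_usage(QUERY_LOG)
print(users.most_common(1))     # the most active data user (the "Caleb")
print(tables["caleb_revenue"])  # who to ask about this table
```

From `by_table` a catalog could surface, next to each table in search results, the people who query it most, which is how a new analyst finds their local expert.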
Stephanie McReynolds, Alation & Lee Paries, Think Big Analytics - #BigDataSV - #theCUBE
>> Voiceover: San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (techno music) >> Hey, welcome back everyone. Live in Silicon Valley for Big Data SV. This is theCUBE coverage in conjunction with Strata + Hadoop. I'm John Furrier with George Gilbert at Wikibon. Two great guests. We have Stephanie McReynolds, Vice President of startup Alation, and Lee Paries, who is the VP of Think Big Analytics. Thanks for coming back. Both been on theCUBE, you have been on theCUBE before, but Think Big has been on many times. Good to see you. What's new, what are you guys up to? >> Yeah, excited to be here and to be here with Lee. Lee and I have a personal relationship that goes back quite a ways in the industry. And what we're talking about today is the integration between Kylo, which was recently announced as an open source project from Think Big, and Alation's capability to sit on top of Kylo and together increase the velocity of data lake initiatives, kind of going from zero to 60 in a pretty short amount of time, to get both technical value from Kylo and business value from Alation. >> So talk about Alation's traction, because you guys have been an interesting startup, a lot of great press. George is a big fan. He's going to jump in with some questions, but some good product fit with the market. What's the update? What's the status on the traction in terms of the company and customers and whatnot? >> Yeah, we've been growing pretty rapidly for a startup. We've doubled our production customer count since the last time we talked. Some great brand names. Munich Reinsurance this morning was talking about their implementation. So they have 600 users of Alation in their organization. We've entered Europe, not only with Munich Reinsurance; Tesco is a large account of ours in Europe now.
And here in the States we've seen broad adoption across a wide range of industries, everyone from Pfizer in the healthcare space to eBay, who's been our longest standing customer. They have about 1,000 weekly users on Alation. So not only a great increase in number of logos, but also organic growth internally at many of these companies, across data scientists, data analysts, business analysts, a wide range of users of the product as well. >> It's been interesting. What I like about your approach, and we talked with Think Big about it before: every guest that's come in so far in the same area is talking about metadata layers, and so this is interesting. There's a metadata addressability, if you will, for lack of a better description, but it has to be human-usable, integrating into human processes, whether it's virtualization or any kind of real-time app or anything. So you're seeing this convergence between "I need to get the data into an app," whether it's IoT data or something else, really really fast, and really the discovery piece now, the interesting layer. How competitive is it, and what are the different solutions that you guys see in this market? >> Yeah, I think it's interesting, because metadata has kind of had a revival, right? Everyone is talking about the importance of metadata and open integration with metadata. I think really our angle as Alation is that having open transfer of technical metadata is very important for the foundation of analytics, but what really brings that technical metadata to life is also understanding: what is the business context of what's happening technically in the system? What's the business context of data? What's the behavioral context of how that data has been used, that might inform me as an analyst? >> And what's your unique approach to that? Because that's like the Holy Grail. It's like translating geek metadata, indexing stuff, into usable business outcomes.
It's been a cliche for years, you know. >> The approach is really based on machine learning and AI technology to make recommendations to business users about what might be interesting to them. So we're at a state in the market where there is so much data that is available and that you can access, either in Hadoop as a data lake or in a data warehouse or database like Teradata, that today what you need as state of the art is a system that starts to recommend to you what might be interesting data for you to use as a data scientist or an analyst; and not just what data you could use, but how accurate is that data, how trustworthy is it? I think there's a whole other theme of governance that's rising, tied to that metadata discussion, which is that it's not enough to just shove bits and bytes between different systems anymore. You really need to understand how this data has been manipulated and used, and how that influences my security considerations, my privacy considerations, the value I'm going to be able to get out of that data set. >> What's your take on this, 'cause you guys have a relationship. How is Think Big doing? Then talk about the partnership you guys have with Alation. >> Sure, so when you look at what we've done specifically with an open source project, it's the first one that Teradata has fully sponsored and released, based on Apache 2.0, called Kylo. It's really about the enablement of the full data lake platform and the full framework, everything from ingest, to securing it, to governing it. Part of that process is collecting the basic technical and business metadata, so later you can hand it over to the user so they can sample, they can profile the data, they can find, they can search in a Google-like manner, and then you can enable the organization with that data.
So when you look at it from the standpoint of partnering together, it's really about collecting that data specifically within Hadoop to enable it, with the ability then to hand it off to the more enterprise-wide solution like Alation through API connections, and then they enrich it in the way they go about it, with the social collaboration and the business context, to extend it from there. >> So that's the accelerant then. So you're accelerating the open source project through this integration with Alation. So you're still going to rock and roll with the open source. >> Very much going to rock and roll with the open source. It's really been based on five years of Think Big's work in the marketplace, over about 150 data lakes. The IP we've built around that to do things repeatably, consistently, and then releasing that in the last two years as dedicated development based on Apache Spark and NiFi to stand that up. >> Great work, by the way. Open source continues to be more relevant. But I've got to get your perspective on a meme that's been floating around day one here, and maybe it's because of the election, but someone said, "We got to drain the data swamp, and make data great again." Not a play on Trump, but the data lake is going through a transition, saying, "Okay, we've got data lakes," but now this year there's been a focus on making that much more active and cleaner, making sure it doesn't become a swamp, if you will. So there's been a focus on taking data lake content and getting it into real time, and IoT has, I think, been a forcing function. But do you guys have a perspective on where data lakes are going? Certainly it's been a trending conversation here at the show. >> Yeah, I think IoT has been part of "drain that data swamp," but I think also now you have a mass of business analysts that are starting to get access to that data in the lake.
These Hadoop implementations are maturing to the stage where you have-- >> John: To value coming out of it. >> Yeah, and people are trying to wring value out of that lake, and sometimes finding that it is harder than they expected, because the data hasn't been pre-prepared for them. This old world, where IT would pre-prepare the data and then I've got a single metric or a couple metrics to choose from, is now turned on its head. People are taking a more exploratory, discovery-oriented approach to navigating through their data, and finding that the nuances of data really matter when trying to evolve an insight. So the literacy in these organizations, and their awareness of some of the challenges of a lake, are coming to the forefront, and I think that's a healthy conversation for us all to have. If you're going to have a data-driven organization, you have to really understand the nuances of your data to know where to apply it appropriately to decision making. >> So (mumbles), actually going back quite a few years, when he started at Microsoft, said Internet software changed the paradigm so much in that we have this new set of actions: discover, learn, try, buy, recommend. And it sounds like as a consumer of data in a data lake we've added, or prepended, this discovery step. Where in a well-curated data warehouse it was learn: you had your X dimensions that were curated and refined, and you don't have that as much with the data lake. I guess I'm wondering, as we were talking to the last team with AtScale about moving OLAP to be something you consume on a data lake the way you consume on a data warehouse, it's almost like Alation and a smart catalog is as much a requirement as a visualization tool is by itself on a data warehouse?
>> I think what we're seeing is this notion of data needing to be curated, and including many brains and many different perspectives in that curation process, as something that's defining the future of analytics: how people use technical metadata, and what it means for the devops organization to get involved in draining that swamp. That means not only looking at the elements of the data that are coming in from a technical perspective, but then collaborating with the business to curate the value on top of that data. >> So in other words it's not just to help the user, the business analyst, navigate, but it's also to help the operational folks do a better job of curating once they find out who's using the data and how. >> That's right. They kind of need to know how this data is going to be used in the organization. The volumes are so high that they couldn't possibly curate every bit and byte that is stored in the data lake. So looking at how different individuals and different groups in the organization are trying to access that data gives an early signal to where we should be spending more or less time in processing this data, and helps the organization really get to their end goals of usage. >> Lee, I want to ask you a question. On your blog post, as was pointed out to me earlier, you guys quote a Gartner stat, which is pretty doom and gloom, which said, "70% of Hadoop deployments in 2017 will either fail or fall short of their estimated cost savings and predicted revenue." And then it says, "That's a dim view, but not shared by the Kylo community." How are you guys going to make the Kylo data lake software work well? What are your thoughts on that? Because I think that's the number one question, again, that I highlighted earlier: okay, I don't want a swamp. So that's the fear, whether they get one or not, so they worry about data cleansing and all these things.
So what's Kylo doing that's going to accelerate, or lower, that number of fails in the data lake world? >> Yeah sure, so again, a lot of it's through experience of going out there and seeing what's done. A lot of people have been doing a lot of different things within the data lakes, but when you go in there, there are certain things they're not doing, and then when you're doing them, it's about doing them consistently, over and over, and continually improving upon that. And that's what Kylo is: it's really a framework that we keep adding to, and as the community grows, other projects that come in can enhance it and bring more value. A lot of times when we go in, it's basically that end users can't get to the data, either because they're not allowed to, because maybe it's not secured enough to turn it over to them and let them drive with it, or because they don't know the data is there, which goes back to collecting the basic metadata and data (mumbles) to know it's there to leverage it. So a lot of times it's going back and leveraging what we have to build that solid foundation, so IT and operations can feel like they can hand that over in a template format, so business users can get to the data and start acting off of it. >> You just lost your mic there, but Stephanie, I've got to ask you a question. Just on a point of clarification: are you guys supporting Kylo? Is that the relationship, or how does that work? >> So we're integrated with Kylo. Kylo will ingest data into the lake, manage that data lake from a security perspective, giving folks permissions, and enable some wrangling on that data, and what Alation is receiving from Kylo is the technical metadata that's being created along that entire path. >> So you're certified with Kylo? How does that all work from the customer standpoint? >> It's very much an integration partnership; we're working on that together.
>> So from a customer standpoint it's clean, and you then provide the benefits on the other side? >> Correct. >> Yeah, absolutely. We've been working with data lake implementations for some time, since our founding really, and I think this is an extension of our philosophy that data lakes are going to play an important role, complementing databases, analytics tools, business intelligence tools, and the analytics environment, and that open source is part of the future of how folks are building these environments. So we're excited to support the Kylo initiative. We've had a longstanding relationship with Teradata as a partner, so it's a great way to work together. >> Thanks for coming on theCUBE. Really appreciate it. And thank... What do you guys think of the show so far? What's the current vibe of the show? >> Oh, it's been good so far. I mean, it's one day into it, but a very good vibe so far. Different topics and different things-- >> AI, machine learning. You couldn't be happier with that machine learning-- >> Great to see machine learning taking the forefront, people really digging into the details around what it means when you apply it. >> Stephanie, thanks for coming on theCUBE, really appreciate it. More CUBE coverage after the show break. Live from Silicon Valley, I'm John Furrier with George Gilbert. We'll be right back after this short break. (techno music)
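The handoff described in this segment, Kylo gathering technical metadata at ingest time (schemas, sources, profiling stats) and a catalog like Alation receiving it through API connections, might look like the sketch below. The payload shape and field names are invented for illustration; they are not the actual Kylo or Alation API, and a real integration would follow the vendor's documented endpoints.

```python
import json

# Hypothetical shape for technical metadata captured at ingest time.
def build_catalog_payload(table, columns, source, row_count):
    """Package ingest-time technical metadata for a catalog API call."""
    return {
        "object_type": "table",
        "name": table,
        "source_system": source,
        "columns": [{"name": name, "type": dtype} for name, dtype in columns],
        "ingest_stats": {"row_count": row_count},
    }

payload = build_catalog_payload(
    table="web_events",
    columns=[("event_id", "bigint"), ("ts", "timestamp")],
    source="hadoop/data-lake",
    row_count=1_250_000,
)
print(json.dumps(payload, indent=2))
```

The division of labor follows the conversation: the ingest framework supplies this technical metadata, and the catalog layers business context (annotations, usage, expert endorsements) on top of it.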
Stephanie McReynolds - HP Big Data 2015 - theCUBE
>> Voiceover: Live from Boston, Massachusetts, extracting the signal from the noise, it's theCUBE, covering HP Big Data Conference 2015, brought to you by HP Software. Now your hosts, John Furrier and Dave Vellante. >> Okay, welcome back everyone. We are here live in Boston, Massachusetts for HP's big data conference. This is a special presentation of theCUBE, our flagship program where we go out to the events and extract the signal from the noise. I'm John Furrier with Dave Vellante, here with Wikibon research. Our next guest is Stephanie McReynolds, VP of Marketing at Alation, a hot new startup that's been kind of coming out of stealth, out there in big data, a lot of great stuff. Stephanie, welcome to theCUBE. Great to see you. >> Great to be here. >> Tell us about the startup first of all, because there's good buzz going on. It's kind of stealth buzz, but it's really with the thought leaders, really the people in the industry who know what they're talking about, like what you guys are doing. So introduce the company, tell us what you guys are doing, and the relationship with Vertica. Exciting stuff. >> Absolutely. Alation is an exciting company. We just started to come out of stealth in March of this year, and we came out of stealth with some great production customers. So eBay is a customer; they have hundreds of analysts using our systems. We also have Square as a customer, a smaller analytics team, but the value that analytics teams are getting out of this product is really being able to access their data in human context. So we do some machine learning to look at how individuals are using data in an organization, take that machine learning, and also gather some of the human insights about how that data is being used by experts, and surface that all in line within work. >> So what kind of data? Because Stonebraker was kind of talking yesterday about the three V's, which we all know, but the one that's really coming mainstream in terms of a problem space is variety. You have the different variety of schema sources, and then you have a lot of
unstructured exhaust, or data flying around. Can you be specific on what you guys do? >> Yeah, I mean, it's interesting, because there are several definitions of data and big data going around, right? So, you know, we connect to a lot of database systems, and we also connect to a lot of Hadoop implementations, so we deal with both structured data as well as what I consider unstructured data. And I think the third part of what we do is bring in context from human-created data, or human information, which Robert yesterday was talking about a little bit. What happens in a lot of analytic organizations is that there's a very manual process of documenting some of the data that's being used in these projects, and that's done on wiki pages, or spreadsheets that are floating around the organization, or Basecamp, all these collaboration platforms. And what you realize when you start to really get into the work of using that information to try to write your queries is that trying to reference a wiki page, and then write your SQL, and flip back and forth between maybe ten different documents, is not very productive for the analyst. So what our customers are seeing is that by consolidating all of that data and information in one place, where the tables are actually referenced side by side with the annotations, their analysts can get from twenty to fifty percent savings in productivity, and, maybe more importantly, new analysts can get up to speed quite a bit quicker. At Square the other day I was talking to one of the data scientists, and he was talking about his process for finding data in the organization. Prior to using Alation, it would take about 30 minutes, going to maybe three or four people, to find the data he needed for his analysis. With Alation, in five seconds he can run a query search for the data he wants, get it back, and get all that expert annotation already
around that base data said he's ready to roll he can start I'm testing some of us akashi go platform right they've heard it was it a platform and it and you said you work with a lot of database the databases right so it's tightly integrated with the database in this use case so it's interesting and you know we see databases as a source of information so we don't create copies of the data on our platform we go out and point to the data where it lies and surface that you know that data to to the end user now in the case of verdict on our relationship with Vertica and we've also integrated verdict in our stack to support we call data forensics which is the building for not an analyst who's using the system day to day but for NIT individual to understand where the behaviors around this data and the types of analysis that are being done and so verdicts a great high performance platform for dashboarding and business intelligence a back end of that providing you know quick access to aggregates so one of they will work on a vertica you guys just the engine what specifically again yeah so so we use the the vertica the vertical engine underneath our forensics product and then the that's you know one portion of our platform the rest of our platform is built out on other other technologies so verdict is part of your solution it's part of our solution it's it's one application that we part of one application we deliver so we've been talking all week about this week Colin Mahoney in his talk yesterday and I saw a pretty little history on erp how initially was highly customized and became packaged apps and he sort of pointed to a similar track with analytics although he said it's not going to be the same it's going to be more composable sort of applications I wonder and historically the analytics in the database have been closely aligned I'll say maybe not integrated you see that model continuing do you see it more packaged apps or will thus what Collins calling composable apps 
what's the relationship between your platforming and the application yeah so our platform is is really more tooling for those individuals that are building or creating those applications so we're helping data scientists and analysts find what algorithms they want to use as a foundation for those applications so a little bit more on the discovery side where folks are doing a lot of experiment and experimentation they may be having to prepare data in different ways in order to figure out what might work for those applications and that's where we fit in as a vendor and what's your license model and so you know we're on a subscription model we have customers that have data teams in the in the hundreds at a place like eBay you know the smaller implementations could be maybe just teams of five analyst 10a analyst fairly small spatial subscription and it's a seat base subscription but we can run in the cloud we can run on premise and we do some interesting things around securing the data where you can and see your columns bommana at the data sets for financial services organizations and our customers that have security concerns and most of those are on premise top implementation 70 talk about the inspiration of the company in about the company he's been three years since then came out of stealth what's the founders like what's the DNA the company what do you guys do differently and what was the inspiration behind this yeah what's really what's really interesting I think about the founding of the company is that and the technical founders come from both Google and Apple so you have an interesting observation that both individuals had made independently hardcore algorithmic guy and then like relevant clean yeah and both those kind of made interesting observations about how Google and Apple two of the most data-driven companies you know on the planet we're struggling and their analytics teams were struggling with being able to share queries and share data sets and there was 
a lot of replication of work that was happening and so much for the night you know but both of these folks from different angles kind of came together at adulation said look there's there's a lot of machine learning algorithms that could help with this process and there's also a lot of good ways with natural language processing to let people interact with their data in more natural ways the founder from from Apple Aaron key he was on the Siri team so we had a lot of experience designing products for navigability and ease of use and natural language learning and so those two perspectives coming together have created some technology fundamentals in our product and it's an experience to some scar tissue from large-scale implementations of data yeah very large-scale implementations of data and also a really deep awareness of what the human equation brings to the table so machine learning algorithms aren't enough in and of themselves and I think ken rudin had some interesting comments this morning where you know he kind of pushed it one step further and said it's not just about finding insight data science about is about having impact and you can't have impact unless you create human contacts and you have communication and collaboration around the data so we give analyst a query tool by which we surface the machine learning context that we have about the data that's being used in the organization and what queries have been running that data but we surface in a way where the human can get recommendations about how to improve their their sequel and drive towards impact and then share that understanding with other analysts in the organization so you get an innovation community that's started so who you guys targets let's step back on the page go to market now you guys are launched got some funding can you share the amount or is it private confidential or was how much did you raise who are you targeting what's your go-to market what's the value proposition give us the give 
us this data yeah so its initial value proposition is just really about analyst productivity that's where we're targeted how can you take your teams of analysts and everyone knows it's hard to hire these days so you're not going to be able to grow those teams out overnight how do you make the analyst the data scientist the phd's you have on staff much more productive how do you take that eighty to ninety percent of the time that they make them using stuff sharing data because I stuff you in the sharing data try to get them out of the TD of trying to just find eight in the organization and prepare it and let them really innovate and and use that to drive value back to the to the organization so we're often selling to individual analysts to analytics teams the go to market starts there and the value proposition really extends much further in the organization so you know you find teams and organizations that have been trying to document their data through traditional data governance means or ETL tools for a very long time and a lot of those projects have stalled out and the way that we crawl systems and use machine learning automation and to automate some of that documentation really gives those projects and new life in our enterprise data has always been elusive I mean do you go back decades structured day to all these pre pre built databases it's been hard right so it's you can crack that nut that's going to be a very lucrative in this opportunity I got the Duke clusters now storing everything I mean some clients we talked to here on the key customers of a CHP or IBM big companies they're storing everything just because they don't know they do it again yeah I mean if the past has been hard in part because we in some cases over manage the modeling of the data and I think what's exciting now about storing all your data in Hadoop and storing first and then asking questions later is you're able to take a more discovery oriented hypothesis testing iterative approach and 
if you think about how true innovation works you know you build insights on top of one another to get to the big breakthrough concepts and so I think we're at an interesting point in the market for a solution like this that can help with that increasing complexity of data environment so you just raise your series a raised nine million you maybe did some seed round before that so pretty early days for you guys you mentioned natural language processing before one of your founders are you using NLP and in your solution in any way or so we have a we have a search interface that allows you to look for that technical data to look for metadata and for data objects and by entering a simple simple natural language search terms so we are using that as part of our interface in solution right and so kind of early customer successes can you talk about any examples or yeah you know there's some great examples and jointly with Vertica square is as a customer and their analytics team is using us on a day-to-day basis not only to find data sets and the organization but to document those those data sets and eBay has hundreds of analysts that are using elation today in a day to day manner and they've seen quite a bit of productivity out of their new analysts that are coming on the system's it used to take analysts about 18 months to really get their feet around them in the ebay environment because of the complexity of all of the different systems at ebay and understanding where to go for that customer table you know that they needed to use and now analysts are up and running about six months and their data governance team has found that elation has really automated and prioritized the process around documentation for them and so it's a great light a great foundation for them there and data curators and data stewards to go in and rich the data and collaborate more with the analysts and the actual data users to get to a point of catalogued catalog data disease so what's the next you 
guys going to be on the road in New York Post Radek hadoop world big data NYC is coming up a big event in New York I'm Cuba visa we're getting the word out about elation and then what we're doing we have customers that are you know starting to speak about their use cases and the value that they're seeing and will be in New York market share I believe will be speaking on our behalf there to share their stories and then we're also going to a couple other conferences after that you know the fall is an exciting time which one's your big ones there so i will be at strada in New York and a September early October and then mid-october we're going to be at both teradata partners and tableaus conference as well so we connect not only to databases of all set different sorts but also to go with users are the tools yeah awesome well anything else you'd like to add share at the company is awesome we're some great things about you guys been checking around I'll see you found out about you guys and a lot of people like the company I mean a lot of insiders like moving little see you didn't raise too much cash that's raised lettin that's not the million zillion dollar round I think what led you guys take nine million yeah raised a million and I you know I think we're building this company in a traditional value oriented way great word hey stay long bringing in revenue and trying to balance that out with the venture capital investment it's not that we won't take money but we want to build this company in a very durable so the vision is to build a durable company absolutely absolutely and that may be different than some of our competitors out there these days but that's that we've and I have not taken any financing and SiliconANGLE at all so you know we're getting we believe in that and you might pass up some things but you know what have control and you guys have some good partners so congratulations um final word what's this conference like you go to a lot of events what's your 
take on this on this event yeah I do i do end up going to a lot of events that's part of the marketing role you know i think what's interesting about this conference is that there are a lot of great conversations that are happening and happening not just from a technology perspective but also between business people and deep thinking about how to innovate and verticals customers i think are some of the most loyal customers i've seen in the in the market so it's great in their advanced to they're talking about some pretty big problems but they're solving it's not like little point solutions it's more we architecting some devops i get a dev I'm good I got trashed on Twitter private messages all last night about me calling this a DevOps show it's not really a DevOps cloud show but there's a DevOps vibe here the people who are working on the solutions I think they're just a real of real vibe people are solving real problems and they're talking about them and they're sharing their opinions and I I think that's you know that's similar to what you see in DevOps the guys with dev ops are in the front line the real engineers their engineering so they have to engineer because of that no pretenders here that's for sure are you talking about it's not a big sales conference right it's a lot of customer content their engineering solutions talking to Peter wants a bullshit they want reaiah I mean I got a lot on the table i'm gonna i'm doing some serious work and i want serious conversations and that's refreshing for us but we love love of hits like it's all right Stephanie thinks for so much come on cubes sharing your insight congratulations good luck with the new startup hot startups here in Boston hear the verdict HP software show will be right back more on the cube after this short break you you
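The data-discovery anecdote above -- thirty minutes and three or four people versus a five-second query search -- is easy to picture as code. Here is a toy sketch of a searchable catalog in which tables are stored side by side with expert annotations; the table names, annotations, and functions are invented for illustration and are not Alation's actual data model or API:

```python
# Toy data catalog: each table entry carries expert annotations,
# so one keyword search surfaces the table and the human context
# together (illustrative only; not Alation's real implementation).

CATALOG = [
    {"table": "dw.customer_master",
     "annotations": ["Canonical customer table; join on cust_id",
                     "Preferred over legacy crm.customers"]},
    {"table": "crm.customers",
     "annotations": ["Deprecated; migrated to dw.customer_master in Q2"]},
    {"table": "sales.orders_2015",
     "annotations": ["Partitioned by order_date; use order_ts for joins"]},
]

def search(catalog, term):
    """Return entries whose table name or annotations mention the term."""
    term = term.lower()
    return [entry for entry in catalog
            if term in entry["table"].lower()
            or any(term in note.lower() for note in entry["annotations"])]

hits = search(CATALOG, "customer")
for entry in hits:
    print(entry["table"], "--", "; ".join(entry["annotations"]))
```

The point of the sketch is the consolidation: the analyst never flips between a wiki page and a SQL editor, because the annotations come back with the table reference in a single lookup.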
Wikibon Presents: Software is Eating the Edge | The Entangling of Big Data and IIoT
>> So as folks make their way over from Javits I'm going to give you the least interesting part of the evening, and that's my segment, in which I welcome you here, introduce myself, and lay out what we're going to do for the next couple of hours. So first off, thank you very much for coming. As all of you know, Wikibon is a part of SiliconANGLE, which also includes theCUBE, so if you look around, this is what we have been doing for the past couple of days here in theCUBE. We've been inviting some significant thought leaders from over on the show floor and, in incredibly expensive limousines, driving them up the street to come on to theCUBE and spend time with us and talk about some of the things that are happening in the industry today that are especially important. We tore it down, and we're having this party tonight. So we want to thank you very much for coming, and we look forward to having more conversations with all of you. Now what are we going to talk about? Well, Wikibon is the research arm of SiliconANGLE. So we take data that comes out of theCUBE and other places and incorporate it into our research, and we work very closely with large end users and large technology companies regarding how to make better decisions in this incredibly complex, incredibly important, transformative world of digital business. What we're going to talk about tonight, and I've got a couple of my analysts assembled, and we're also going to have a panel, is this notion of software is eating the Edge. Now most of you have probably heard Marc Andreessen, the venture capitalist and developer, original developer of Netscape many years ago, talk about how software's eating the world. Well, if software is truly going to eat the world, it's going to take the big chunks, big bites, at the Edge. That's where the actual action's going to be. And what we want to talk about specifically is the entangling of the internet, or the industrial internet of things, IIoT, with analytics.
So that's what we're going to talk about over the course of the next couple of hours. To do that we're going to, I've already blown the schedule, that's on me. But to do that I'm going to spend a couple minutes talking about what we regard as the essential digital business capabilities which includes analytics and Big Data, and includes IIoT and we'll explain at least in our position why those two things come together the way that they do. But I'm going to ask the august and revered Neil Raden, Wikibon analyst to come on up and talk about harvesting value at the Edge. 'Cause there are some, not now Neil, when we're done, when I'm done. So I'm going to ask Neil to come on up and we'll talk, he's going to talk about harvesting value at the Edge. And then Jim Kobielus will follow up with him, another Wikibon analyst, he'll talk specifically about how we're going to take that combination of analytics and Edge and turn it into the new types of systems and software that are going to sustain this significant transformation that's going on. And then after that, I'm going to ask Neil and Jim to come, going to invite some other folks up and we're going to run a panel to talk about some of these issues and do a real question and answer. So the goal here is before we break for drinks is to create a community feeling within the room. That includes smart people here, smart people in the audience having a conversation ultimately about some of these significant changes so please participate and we look forward to talking about the rest of it. All right, let's get going! What is digital business? One of the nice things about being an analyst is that you can reach back on people who were significantly smarter than you and build your points of view on the shoulders of those giants including Peter Drucker. Many years ago Peter Drucker made the observation that the purpose of business is to create and keep a customer. Not better shareholder value, not anything else. 
It is about creating and keeping your customer. Now you can argue with that, at the end of the day, if you don't have customers, you don't have a business. Now the observation that we've made, what we've added to that is that we've made the observation that the difference between business and digital business essentially is one thing. That's data. A digital business uses data to differentially create and keep customers. That's the only difference. If you think about the difference between taxi cab companies here in New York City, every cab that I've been in in the last three days has bothered me about Uber. The reason, the difference between Uber and a taxi cab company is data. That's the primary difference. Uber uses data as an asset. And we think this is the fundamental feature of digital business that everybody has to pay attention to. How is a business going to use data as an asset? Is the business using data as an asset? Is a business driving its engagement with customers, the role of its product et cetera using data? And if they are, they are becoming a more digital business. Now when you think about that, what we're really talking about is how are they going to put data to work? How are they going to take their customer data and their operational data and their financial data and any other kind of data and ultimately turn that into superior engagement or improved customer experience or more agile operations or increased automation? Those are the kinds of outcomes that we're talking about. But it is about putting data to work. That's fundamentally what we're trying to do within a digital business. Now that leads to an observation about the crucial strategic business capabilities that every business that aspires to be more digital or to be digital has to put in place. And I want to be clear. When I say strategic capabilities I mean something specific. 
When you talk about, for example technology architecture or information architecture there is this notion of what capabilities does your business need? Your business needs capabilities to pursue and achieve its mission. And in the digital business these are the capabilities that are now additive to this core question, ultimately of whether or not the company is a digital business. What are the three capabilities? One, you have to capture data. Not just do a good job of it, but better than your competition. You have to capture data better than your competition. In a way that is ultimately less intrusive on your markets and on your customers. That's in many respects, one of the first priorities of the internet of things and people. The idea of using sensors and related technologies to capture more data. Once you capture that data you have to turn it into value. You have to do something with it that creates business value so you can do a better job of engaging your markets and serving your customers. And that essentially is what we regard as the basis of Big Data. Including operations, including financial performance and everything else, but ultimately it's taking the data that's being captured and turning it into value within the business. The last point here is that once you have generated a model, or an insight or some other resource that you can act upon, you then have to act upon it in the real world. We call that systems of agency, the ability to enact based on data. Now I want to spend just a second talking about systems of agency 'cause we think it's an interesting concept and it's something Jim Kobielus is going to talk about a little bit later. When we say systems of agency, what we're saying is increasingly machines are acting on behalf of a brand. Or systems, combinations of machines and people are acting on behalf of the brand. And this whole notion of agency is the idea that ultimately these systems are now acting as the business's agent. 
They are at the front line of engaging customers. It's an extremely rich proposition that has subtle but crucial implications. For example I was talking to a senior decision maker at a business today and they made a quick observation, they talked about they, on their way here to New York City they had followed a woman who was going through security, opened up her suitcase and took out a bird. And then went through security with the bird. And the reason why I bring this up now is as TSA was trying to figure out how exactly to deal with this, the bird started talking and repeating things that the woman had said and many of those things, in fact, might have put her in jail. Now in this case the bird is not an agent of that woman. You can't put the woman in jail because of what the bird said. But increasingly we have to ask ourselves as we ask machines to do more on our behalf, digital instrumentation and elements to do more on our behalf, it's going to have blow back and an impact on our brand if we don't do it well. I want to draw that forward a little bit because I suggest there's going to be a new lifecycle for data. And the way that we think about it is we have the internet or the Edge which is comprised of things and crucially people, using sensors, whether they be smaller processors in control towers or whether they be phones that are tracking where we go, and this crucial element here is something that we call information transducers. Now a transducer in a traditional sense is something that takes energy from one form to another so that it can perform new types of work. By information transducer I essentially mean it takes information from one form to another so it can perform another type of work. This is a crucial feature of data. One of the beauties of data is that it can be used in multiple places at multiple times and not engender significant net new costs. It's one of the few assets that you can say about that. 
So the concept of an information transducer is really important because it's the basis for a lot of transformations of data as data flies through organizations. So we end up with the transducers storing data in the form of analytics, machine learning, business operations, other types of things, and then it goes back and it's transduced back into the real world as we program the real world, turning it into these systems of agency. So that's the new lifecycle. And increasingly, that's how we have to think about data flows. Capturing it, turning it into value and having it act on our behalf in front of markets. That could have enormous implications for how ultimately money is spent over the next few years. So Wikibon does a significant amount of market research in addition to advising our large user customers. And that includes doing studies on cloud, public cloud, but also studies on what's happening within the analytics world. And if you take a look at it, what we basically see happening over the course of the next few years is significant investments in software and also services to get the word out. But we also expect there's going to be a lot of hardware. A significant amount of hardware that's ultimately sold within this space. And that's because of something that we call true private cloud. This concept of ultimately a business increasingly being designed and architected around the idea of data assets means that the reality, the physical realities of how data operates, how much it costs to store it or move it, the issues of latency, the issues of intellectual property protection, as well as things like the regulatory regimes that are being put in place to govern how data gets used in between locations -- all of those factors are going to drive increased utilization of what we call true private cloud. On premise technologies that provide the cloud experience but act where the data naturally needs to be processed. I'll come a little bit more to that in a second.
So we think that it's going to be a relatively balanced market, a lot of stuff is going to end up in the cloud, but as Neil and Jim will talk about, there's going to be an enormous amount of analytics that pulls an enormous amount of data out to the Edge 'cause that's where the action's going to be. Now one of the things I want to also reveal to you is we've done a fair amount of data, we've done a fair amount of research around this question of where or how will data guide decisions about infrastructure? And in particular the Edge is driving these conversations. So here is a piece of research that one of our cohorts at Wikibon did, David Floyer. Taking a look at IoT Edge cost comparisons over a three year period. And it showed on the left hand side, an example where the sensor towers and other types of devices were streaming data back into a central location in a wind farm, stylized wind farm example. Very very expensive. Significant amounts of money end up being consumed, significant resources end up being consumed by the cost of moving the data from one place to another. Now this is even assuming that latency does not become a problem. The second example that we looked at is if we kept more of that data at the Edge and processed at the Edge. And literally it is a 85 plus percent cost reduction to keep more of the data at the Edge. Now that has enormous implications, how we think about big data, how we think about next generation architectures, et cetera. But it's these costs that are going to be so crucial to shaping the decisions that we make over the next two years about where we put hardware, where we put resources, what type of automation is possible, and what types of technology management has to be put in place. Ultimately we think it's going to lead to a structure, an architecture in the infrastructure as well as applications that is informed more by moving cloud to the data than moving the data to the cloud. 
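The wind-farm comparison above can be sketched as a back-of-envelope model. Every input below is an invented assumption for illustration -- the 85-plus percent figure cited in the talk comes from Wikibon's own study, not from these numbers -- but the structure shows why shipping raw sensor data dominates the centralized cost:

```python
# Toy 3-year cost model: stream all raw data to a central cloud vs.
# summarize at the Edge and ship only aggregates. All figures are
# made-up assumptions, not Wikibon's actual model inputs.

GB_PER_SENSOR_PER_DAY = 2.0
SENSORS = 200
DAYS = 3 * 365                      # three-year window, as in the study
TRANSFER_COST_PER_GB = 0.09        # assumed WAN transfer cost
CLOUD_PROCESS_COST_PER_GB = 0.02   # assumed central processing cost
EDGE_REDUCTION = 0.95              # assume 95% of raw data summarized locally
EDGE_NODE_COST = 5_000             # assumed fixed Edge compute cost over 3 years

raw_gb = GB_PER_SENSOR_PER_DAY * SENSORS * DAYS

# Centralized: every raw gigabyte is moved and processed in the cloud.
centralized = raw_gb * (TRANSFER_COST_PER_GB + CLOUD_PROCESS_COST_PER_GB)

# Edge: fixed local compute cost, plus moving only the residual aggregates.
edge = (EDGE_NODE_COST
        + raw_gb * (1 - EDGE_REDUCTION)
        * (TRANSFER_COST_PER_GB + CLOUD_PROCESS_COST_PER_GB))

savings = 1 - edge / centralized
print(f"centralized: ${centralized:,.0f}  edge: ${edge:,.0f}  "
      f"savings: {savings:.0%}")
```

Under these assumed inputs the Edge option comes out far cheaper, because the per-gigabyte movement cost scales with raw volume while the Edge hardware cost is fixed -- the same shape of result the study describes.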
That's kind of our fundamental proposition is that the norm in the industry has been to think about moving all data up to the cloud because who wants to do IT? It's so much cheaper, look what Amazon can do. Or what AWS can do. All true statements. Very very important in many respects. But most businesses today are starting to rethink that simple proposition and asking themselves do we have to move our business to the cloud, or can we move the cloud to the business? And increasingly what we see happening as we talk to our large customers about this, is that the cloud is being extended out to the Edge, we're moving the cloud and cloud services out to the business. Because of economic reasons, intellectual property control reasons, regulatory reasons, security reasons, any number of other reasons. It's just a more natural way to deal with it. And of course, the most important reason is latency. So with that as a quick backdrop, if I may quickly summarize, we believe fundamentally that the difference today is that businesses are trying to understand how to use data as an asset. And that requires an investment in new sets of technology capabilities that are not cheap, not simple and require significant thought, a lot of planning, lot of change within an IT and business organizations. How we capture data, how we turn it into value, and how we translate that into real world action through software. That's going to lead to a rethinking, ultimately, based on cost and other factors about how we deploy infrastructure. How we use the cloud so that the data guides the activity and not the choice of cloud supplier determines or limits what we can do with our data. And that's going to lead to this notion of true private cloud and elevate the role the Edge plays in analytics and all other architectures. So I hope that was perfectly clear. And now what I want to do is I want to bring up Neil Raden. Yes, now's the time Neil! 
So let me invite Neil up to spend some time talking about harvesting value at the Edge. Can you see his, all right. Got it. >> Oh boy. Hi everybody. Yeah, this is a really big and complicated topic so I decided to just concentrate on something fairly simple, but I know that Peter mentioned customers. And he also had a picture of Peter Drucker. I had the pleasure in 1998 of interviewing Peter and photographing him. Peter Drucker, not this Peter. Because I'd started a magazine called Hired Brains. It was for consultants. And Peter said a number of really interesting things to me, but one of them was his definition of a customer was someone who wrote you a check that didn't bounce. He was kind of a wag. He was! So anyway, he had to leave to do a video conference with Jack Welch and so I said to him, how do you charge Jack Welch to spend an hour on a video conference? And he said, you know I have this theory that you should always charge your client enough that it hurts a little bit or they don't take you seriously. Well, I had the chance to talk to Jack's wife, Suzy Welch recently and I told her that story and she said, "Oh he's full of it, Jack never paid a dime for those conferences!" (laughs) So anyway, all right, so let's talk about this. To me, the engineered things, like the hardware and network and all these other standards and so forth, we haven't fully developed those yet, but they're coming. As far as I'm concerned, they're not the most interesting thing. The most interesting thing to me in Edge Analytics is what you're going to get out of it, what the result is going to be. Making sense of this data that's coming. And while we're on data, something I've been thinking a lot about lately, because everybody I've talked to for the last three days just keeps talking to me about data: I have this feeling that data isn't actually quite real. 
That any data that we deal with is the result of some process that's captured it from something else that's actually real. In other words, it's a proxy. So it's not exactly perfect. And that's why we've always had these problems about customer A, customer A, customer A, what's their definition? What's the definition of this, that and the other thing? And with sensor data, I really have the feeling, when companies get, not you know, not companies, organizations get instrumented and start dealing with this kind of data, what they're going to find is that this is the first time, and I've been involved in analytics, I don't want to date myself, 'cause I know I look young, but I've been dealing with analytics since 1975. And everything we've ever done in analytics has involved pulling data from some other system that was not designed for analytics. But if you think about sensor data, this is data that we're actually going to catch the first time. It's going to be ours! We're not going to get it from some other source. It's going to be the real deal, to the extent that it's the real deal. Now you may say, ya know Neil, a sensor that's sending us information about oil pressure or temperature or something like that, how can you quarrel with that? Well, I can quarrel with it because I don't know if the sensor's doing it right. So we still don't know, even with that data, if it's right, but that's what we have to work with. Now, what does that really mean? It means that we have to be really careful with this data. It's ours, we have to take care of it. We don't get to reload it from source some other day. If we munge it up, it's gone forever. So that has very serious implications, but let me roll you back a little bit. The way I look at analytics is it's come in three different eras. And we're entering into the third now. The first era was business intelligence. It was basically built and governed by IT, it was system of record kind of reporting. 
And as far as I can recall, it probably started around 1988, or at least that's the year that Howard Dresner claims to have invented the term. I'm not sure it's true. And things happened before 1988 that were sort of like BI, but 88 was when they really started coming out, that's when we saw BusinessObjects and Cognos and MicroStrategy and those kinds of things. The second generation just popped out on everybody else. We were all looking around at BI and we were saying why isn't this working? Why are only five people in the organization using this? Why are we not getting value out of this massive license we bought? And along come companies like Tableau doing data discovery, visualization, data prep, and Line of Business people are using this now. But it's still the same kind of data sources. It's moved out a little bit, but it still hasn't really hit the Big Data thing. Now we're in the third generation, so we not only have Big Data, which has come and hit us like a tsunami, but we're looking at smart discovery, we're looking at machine learning. We're looking at AI induced analytics workflows. And then all the natural language cousins. You know, natural language processing, natural language, what's? Oh Q, natural language query. Natural language generation. Anybody here know what natural language generation is? Yeah, so what you see now is you do some sort of analysis and that tool comes up and says this chart is about the following and it used the following data, and it's blah blah blah blah blah. I think it's kind of wordy and it's going to be refined some, but it's an interesting thing to do. Now, the problem I see with Edge Analytics and IoT in general is that most of the canonical examples we talk about are pretty thin. I know we talk about autonomous cars, I hope to God we never have them, 'cause I'm a car guy. Fleet Management, I think Qualcomm started Fleet Management in 1988, that is not a new application. Industrial controls. 
I seem to remember Honeywell doing industrial controls at least in the 70s, and before that I wasn't, I don't want to talk about what I was doing, but I definitely wasn't in this industry. So my feeling is we all need to sit down and think about this and get creative. Because the real value in Edge Analytics or IoT, whatever you want to call it, the real value is going to be figuring out something that's new or different. Creating a brand new business. Changing the way an operation happens in a company, right? And I think there's a lot of smart people out there and I think there's a million apps that we haven't even talked about so, if you as a vendor come to me and tell me how great your product is, please don't talk to me about autonomous cars or Fleet Management, 'cause I've heard about that, okay? Now, hardware and architecture are really not the most interesting thing. We fell into that trap with data warehousing. We've fallen into that trap with Big Data. We talk about speeds and feeds. Somebody said to me the other day, what's the narrative of this company? This is a technology provider. And I said as far as I can tell, they don't have a narrative, they have some products and they compete in a space. And when they go to clients and the clients say, what's the value of your product? They don't have an answer for that. So we don't want to fall into this trap, okay? Because IoT is going to inform you in ways you've never even dreamed about. Unfortunately some of them are going to be really stinky, you know, they're going to be really bad. You're going to lose more of your privacy, it's going to get harder to get, I dunno, a mortgage for example, I dunno, maybe it'll be easier, but in any case, it's not going to all be good. So let's really think about what you want to do with this technology to do something that's really valuable. Cost takeout is not the place to justify an IoT project. 
Because number one, it's very expensive, and number two, it's a waste of the technology because you should be looking at, you know the old numerator denominator thing? You should be looking at the numerators and forget about the denominators, because that's not what you do with IoT. And the other thing is you don't want to get overconfident. Actually this is good advice about anything, right? But in this case, I love this quote by Derek Sivers. He's a pretty funny guy. He said, "If more information was the answer, then we'd all be billionaires with perfect abs." I'm not sure what's on his wishlist, but you know, those aren't necessarily the two things I would think of, okay. Now, what I said about the data, I want to explain some more. Big Data Analytics, if you look at this graphic, it depicts it perfectly. It's a bunch of different stuff falling into the funnel. All right? It comes from other places, it's not original material. And when it comes in, it's always used as second hand data. Now what does that mean? That means that you have to figure out the semantics of this information and you have to find a way to put it together in a way that's useful to you, okay. That's Big Data. That's where we are. How is that different from IoT data? It's like I said, IoT is original. You can put it together any way you want because no one else has ever done that before. It's yours to construct, okay. You don't even have to transform it into a schema because you're creating the new application. But the most important thing is you have to take care of it 'cause if you lose it, it's gone. It's the original data. In the same way, in operational systems for a long long time we've always been concerned about backup and security and everything else. You better believe this is a problem. I know a lot of people think about streaming data, that we're going to look at it for a minute, and we're going to throw most of it away. Personally I don't think that's going to happen. 
I think it's all going to be saved, at least for a while. Now, the governance and security, oh, by the way, I don't know where you're going to find a presentation where somebody uses a newspaper clipping about Vladimir Lenin, but here it is, enjoy yourselves. I believe that when people think about governance and security today they're still thinking along the same grids that we thought about it all along. But this is very, very different and again, I'm sorry I keep thrashing this around, but this is treasured data that has to be carefully taken care of. Now when I say governance, my experience has been over the years that governance is something that IT does to make everybody's lives miserable. But that's not what I mean by governance today. It means a comprehensive program to really secure the value of the data as an asset. And you need to think about this differently. Now the other thing is you may not get to think about it differently, because some of the stuff may end up being subject to regulation. And if the regulators start regulating some of this, then that'll take some of the degrees of freedom away from you in how you put this together, but you know, that's the way it works. Now, machine learning, I think I told somebody the other day that claims about machine learning in software products are as common as twisters in trailer parks. And a lot of it is not really what I'd call machine learning. But there's a lot of it around. And I think all of the open source machine learning and artificial intelligence that's popped up, it's great because all those math PhDs who work at Home Depot now have something to do when they go home at night and they construct this stuff. But if you're going to have machine learning at the Edge, here's the question: what kind of machine learning would you have at the Edge? As opposed to developing your models back at, say, the cloud, when you transmit the data there. The devices at the Edge are not very powerful. 
And they don't have a lot of memory. So you're only going to be able to do things that have been modeled or constructed somewhere else. But that's okay. Because machine learning algorithm development is actually slow and painful. So you really want the people who know how to do this working with gobs of data, creating models and testing them offline. And when you have something that works, you can put it there. Now there's one thing I want to talk about before I finish, and I think I'm almost finished. I wrote a book about 10 years ago about automated decision making, and the conclusion that I came up with was that little decisions add up, and that's good. But it also means you don't have to get them all right. But you don't want computers or software making decisions unattended if it involves human life, or frankly any life. Or the environment. So when you think about the applications that you can build using this architecture and this technology, think about the fact that you're not going to be doing air traffic control, you're not going to be monitoring crossing guards at the elementary school. You're going to be doing things that may seem fairly mundane. Managing machinery on the factory floor, I mean that may sound great, but really isn't that interesting. Managing wellheads, drilling for oil, well I mean, it's great to the extent that it doesn't cause wells to explode, but they don't usually explode. What it's usually used for is to drive the cost out of preventative maintenance. Not very interesting. So use your heads. Come up with really cool stuff. And any of you who are involved in Edge Analytics, the next time I talk to you I don't want to hear about the same five applications that everybody talks about. Let's hear about some new ones. So, in conclusion, I don't really have anything in conclusion except that Peter mentioned something about limousines bringing people up here. 
On Monday I was slogging up and down Park Avenue and Madison Avenue with my client and we were visiting all the hedge funds there because we were doing a project with them. And in the miserable weather I looked at him and I said, for godsake Paul, where's the black car? And he said, that was the 90s. (laughs) Thank you. So, Jim, up to you. (audience applauding) This is terrible, go that way, this was terrible coming that way. >> Woo, don't want to trip! And let's move to, there we go. Hi everybody, how ya doing? Thanks Neil, thanks Peter, those were great discussions. So I'm the third leg in this relay race here, talking about of course how software is eating the world. And focusing on the value of Edge Analytics in a lot of real world scenarios. Programming the real world, to make the world a better place. So I will talk, I'll break it out analytically in terms of the research that Wikibon is doing in the area of the IoT, but specifically how AI is being embedded, really, into all material reality, potentially, at the Edge. In mobile applications and industrial IoT and smart appliances and self driving vehicles. I will break it out in terms of a reference architecture for understanding what functions are being pushed to the Edge, to hardware, to our phones and so forth, to drive various scenarios in terms of real world results. So I'll move apace here. So basically AI software, or AI microservices, are being infused into Edge hardware as we speak. What we see is more vendors of smart phones and other real world appliances and things like self driving vehicles. What they're doing is they're instrumenting their products with computer vision and natural language processing, environmental awareness based on sensing and actuation, and those capabilities and inferences that these devices do, both to provide support for human users of these devices as well as to enable varying degrees of autonomous operation. 
So what I'll be talking about is how AI is a foundation for data driven systems of agency of the sort that Peter is talking about. Infusing data driven intelligence into everything, or potentially so. As more of this capability, all these algorithms for things like, ya know, doing real time predictions and classifications, anomaly detection and so forth, as this functionality gets diffused widely and becomes more commoditized, you'll see it burned into an ever-wider variety of hardware architectures, neurosynaptic chips, GPUs and so forth. So what I've got here in front of you is a sort of a high level reference architecture that we're building up in our research at Wikibon. So AI, artificial intelligence, is a big term, a big paradigm, I'm not going to unpack it completely. Of course we don't have oodles of time so I'm going to take you fairly quickly through the high points. It's a driver for systems of agency. Programming the real world. Transducing digital inputs, the data, to analog real world results. Through the embedding of this capability in the IoT, but pushing more and more of it out to the Edge, with points of decision and action in real time. And the four capabilities that we're seeing in terms of AI enabling capabilities that are absolutely critical to software being pushed to the Edge are sensing, actuation, inference, and learning. Sensing and actuation, like Peter was describing, it's about capturing data from the environment within which a device or user is operating or moving. And then actuation is the fancy term for doing stuff, ya know, like industrial IoT, it's obviously machine control, but clearly, you know, self driving vehicles, it's steering a vehicle and avoiding crashing and so forth. Inference is the meat and potatoes, as it were, of AI. Analytics does inferences. It infers from the data the logic of the application. 
Predictive logic, correlations, classification, abstractions, differentiation, anomaly detection, recognizing faces and voices. We see that now with Apple, and the latest version of the iPhone is embedding face recognition as a core, as the core multifactor authentication technique. Clearly that's a harbinger of what's going to be universal fairly soon, and it depends on AI. That depends on convolutional neural networks, that is some heavy hitting processing power that's necessary, and it's processing the data that's coming from your face. So that's critically important. So what we're looking at then is AI software taking root in hardware to power continuous agency. Getting stuff done. Powering decision support for human beings who have to take varying degrees of action in various environments. We don't necessarily want to let the car steer itself in all scenarios, we want some degree of override, for lots of good reasons. They want to protect life and limb, including their own. And just more data driven automation across the internet of things in the broadest sense. So unpacking this reference framework, what's happening is that AI driven intelligence is powering real time decisioning at the Edge. Real time local sensing from the data that it's capturing there, it's ingesting the data. Some, not all of that data, may be persisted at the Edge. Some, perhaps most of it, will be pushed into the cloud for other processing. When you have these highly complex algorithms that are doing AI deep learning, multilayer, to do a variety of anti-fraud and higher level, like narrative, auto-narrative roll-ups from various scenes that are unfolding, a lot of this processing is going to begin to happen in the cloud, but a fair amount of the more narrowly scoped inferences that drive real time decision support at the point of action will be done on the device itself. 
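The sense / infer / actuate portion of that reference framework can be sketched as a minimal device loop. The temperature signal, threshold, and "throttle" action are all invented for illustration, and learning is left out, since as the discussion notes, training typically happens off the device:

```python
# Minimal sketch of an Edge device loop: sense -> infer -> actuate.
# The sensor signal, threshold, and actions are hypothetical.

def sense(t):
    """Sensing: capture a reading from the environment (simulated here)."""
    return 20.0 + (t % 7)  # fake temperature signal that cycles upward

def infer(reading, threshold=25.0):
    """Inference: a narrowly scoped local decision at the point of action."""
    return "alert" if reading > threshold else "normal"

def actuate(decision, log):
    """Actuation: 'doing stuff' -- here just recording the action taken."""
    log.append("throttle" if decision == "alert" else "noop")

actions = []
for t in range(10):
    actuate(infer(sense(t)), actions)
print(actions.count("throttle"))  # → 1; only readings above 25.0 trigger action
```

A real device would replace `sense` with hardware reads and `infer` with a model pushed down from the cloud, but the loop shape, local decision at the point of action, is the same.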
Contextual actuation, so it's the sensor data that's captured by the device, along with other data that may be coming down in real time streams through the cloud, that will provide the broader contextual envelope of data needed to drive actuation, to drive various models and rules and so forth that are making stuff happen at the point of action, at the Edge. Continuous inference. What it all comes down to is that inference is what's going on inside the chips at the Edge device. And what we're seeing is a growing range of hardware architectures, GPUs, CPUs, FPGAs, ASICs, neurosynaptic chips of all sorts, playing in various combinations that are automating more and more very complex inference scenarios at the Edge. And not just individual devices; swarms of devices, like drones and so forth, are essentially an Edge unto themselves. You'll see these tiered hierarchies of Edge swarms that are playing and doing inferences of an ever more complex, dynamic nature. And much of this capability, the fundamental capabilities that are powering them all, will be burned into the hardware that powers them. And then adaptive learning. Now I use the term learning rather than training here; training is at the core of it. Training means everything in terms of the predictive fitness, or the fitness of your AI services for whatever task, predictions, classifications, face recognition, that you've built them for. But I use the term learning in a broader sense. What makes your inferences get better and better, more accurate over time, is that you're training them with fresh data in a supervised learning environment. But you can have reinforcement learning if you're doing, say, robotics and you don't have ground truth against which to train the data set. You know, it's maximize a reward function versus minimize a loss function, the latter being the standard approach for supervised learning. 
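The "minimize a loss function" half of that distinction can be shown with plain gradient descent on a one-parameter model; the synthetic data, learning rate, and step count are arbitrary choices for the sketch:

```python
# Supervised learning as loss minimization: fit y = w*x by gradient
# descent on a squared-error loss. Data and learning rate are synthetic.

data = [(x, 3.0 * x) for x in range(1, 6)]  # ground truth: y = 3x
w = 0.0                                     # model parameter to learn

for _ in range(200):
    # L(w) = sum((w*x - y)^2)  =>  dL/dw = sum(2*x*(w*x - y))
    grad = sum(2 * x * (w * x - y) for x, y in data)
    w -= 0.001 * grad                       # step against the gradient

print(round(w, 3))  # → 3.0, the loss-minimizing weight
```

Reinforcement learning flips the sign of the objective: instead of stepping against the gradient of a loss, the policy is adjusted toward higher expected reward, which is why it suits robotics settings with no labeled ground truth.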
There's also, of course, the issue, or not the issue, the approach of unsupervised learning, with cluster analysis critically important in a lot of real world scenarios. So, Edge AI algorithms: clearly, deep learning, which is multilayered machine learning models that can do abstractions at higher and higher levels. Face recognition is a high level abstraction. Faces in a social environment is an even higher level of abstraction, in terms of groups. Faces over time and bodies and gestures, doing various things in various environments, is an even higher level abstraction, in terms of narratives that can be rolled up, are being rolled up, by deep learning capabilities of great sophistication. Convolutional neural networks for processing images, recurrent neural networks for processing time series. Generative adversarial networks for doing essentially what's called generative applications of all sorts, composing music, and a lot of it's being used for auto programming. These are all deep learning. There's a variety of other algorithm approaches I'm not going to bore you with here. Deep learning is essentially the enabler of the five senses of the IoT. Your phone has a camera, it has a microphone, and of course it has geolocation and navigation capabilities. It's environmentally aware, it's got an accelerometer and so forth embedded therein. The reason that your phone and all of these devices are getting scary sentient is that they have the sensory modalities and the AI, the deep learning, that enable them to make environmentally correct decisions in a wider range of scenarios. So machine learning is the foundation of all of this; deep learning, artificial neural networks, is in turn the foundation of that. 
But there are other approaches for machine learning I want to make you aware of, because support vector machines and these other established approaches for machine learning are not going away, but really what's driving the show now is deep learning, because it's scary effective. And so that's where most of the investment in AI is going these days, into deep learning. AI Edge platforms, tools and frameworks are just coming along like gangbusters. Much development of AI, of deep learning, happens in the context of your data lake. This is where you're storing your training data. This is the data that you use to build and test and validate your models. So we're seeing a deepening stack of Hadoop and Kafka and Spark and so forth that are driving the training of AI models that power all these Edge Analytics applications, so that lake will continue to broaden and deepen in terms of the scope and range of data sets and the range of AI modeling it supports. Data science is critically important in this scenario because the data scientists, the data science teams, the tools and techniques and flows of data science are the fundamental development paradigm or discipline or capability that's being leveraged to build and to train and to deploy and iterate all this AI that's being pushed to the Edge. So clearly data science is at the center; data scientists of an increasingly specialized nature are necessary to the realization of this value at the Edge. AI frameworks are coming along at, you know, a mile a minute. TensorFlow, which is open source, most of these are open source, has achieved almost a de facto standard status, and I'm using the word de facto in air quotes. There's Theano and Keras and MXNet and CNTK and a variety of other ones. We're seeing a range of AI frameworks come to market, most open source. Most are supported by most of the major tool vendors as well. 
So at Wikibon we're definitely tracking that; we plan to go deeper in our coverage of that space. And then next best action, which powers recommendation engines. I mean, next best action, decision automation of the sort of thing Neil's covered in a variety of contexts in his career, is fundamentally important to Edge Analytics, to systems of agency, 'cause it's driving the process automation, decision automation, sort of the targeted recommendations that are made at the Edge to individual users as well as to process automation. That's absolutely necessary for self driving vehicles to do their jobs, and industrial IoT. So what we're seeing is more and more recommendation engine or recommender capabilities powered by ML and DL are going to the Edge, are already at the Edge, for a variety of applications. Edge AI capabilities, like I said, there's sensing. And sensing at the Edge is becoming ever more rich; mixed reality Edge modalities of all sorts, for augmented reality and so forth. We're just seeing a growth in the range of sensory modalities that are enabled, or filtered and analyzed through AI, that are being pushed to the Edge, into the chip sets. Actuation, that's where robotics comes in. Robotics is coming into all aspects of our lives. And you know, it's brainless without AI, without deep learning and these capabilities. Inference, autonomous Edge decisioning. Like I said, there's a growing range of inferences that are being done at the Edge. And that's where it has to happen, 'cause that's the point of decision. Learning, training: much training, most training, will continue to be done in the cloud because it's very data intensive. It's a grind to train and optimize an AI algorithm to do its job. 
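That split, heavy training in the cloud and lightweight inference at the Edge, can be sketched end to end with a model small enough to serialize as JSON. The one-dimensional threshold classifier and the JSON artifact are stand-ins for what would really be a deep learning model pushed through a dev ops pipeline:

```python
# Sketch: train centrally, serialize the model, run inference-only at the Edge.
# The tiny threshold classifier and JSON artifact are hypothetical stand-ins
# for a real trained model and its deployment package.
import json

def train(normal, faulty):
    """'Cloud' side: the data-intensive step; fit a 1-D decision threshold
    halfway between the means of two labeled sensor-reading classes."""
    midpoint = (sum(normal) / len(normal) + sum(faulty) / len(faulty)) / 2
    return {"threshold": midpoint}

def infer(model, reading):
    """'Edge' side: cheap, memory-light execution of the trained artifact."""
    return "faulty" if reading > model["threshold"] else "normal"

artifact = json.dumps(train([1.0, 1.2, 0.9], [3.1, 2.8, 3.0]))
model = json.loads(artifact)  # what the Edge device loads at startup
print(infer(model, 2.5))      # → faulty
```

The design point this illustrates is the one in the talk: the expensive, iterative fitting happens where the data and compute live, while the device only evaluates a small frozen artifact.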
It's not something that you necessarily want to do or can do at Edge devices, so the models that are built and trained in the cloud are pushed down through a dev ops process to the Edge, and that's the way it will work pretty much in most AI environments, Edge Analytics environments. You centralize the modeling, you decentralize the execution of the inference models. The training engines will be in the cloud. Edge AI applications. I'll just run you through sort of a core list of the ones that are coming into, have already come into, the mainstream at the Edge. Multifactor authentication, clearly, the Apple announcement of face recognition is just a harbinger of the fact that that's coming to every device. Computer vision, speech recognition, NLP, digital assistants and chat bots powered by natural language processing and understanding, it's all AI powered. And it's becoming very mainstream. Emotion detection, face recognition, you know, I could go on and on, but these are like the core things that everybody has access to, or will by 2020, and they're core devices, mass market devices. Developers, designers and hardware engineers are coming together to pool their expertise to build and train not just the AI, but also the entire package of hardware and UX and the orchestration of real world business scenarios or life scenarios that all this intelligence, this embedded intelligence, enables. And most, much of what they build in terms of AI will be containerized as microservices through Docker and orchestrated through Kubernetes as full cloud services in an increasingly distributed fabric. That's coming along very rapidly. We can see a fair amount of that already on display at Strata in terms of what the vendors are doing or announcing or who they're working with. The hardware itself, the Edge, you know at the Edge, some data will be persistent, needs to be persistent, to drive inference. 
That's to drive a variety of different application scenarios that need some degree of historical data related to what the device in question happens to be sensing, or has sensed in the immediate past, or, you know, whatever. The hardware itself is geared towards both sensing and, increasingly, persistence and Edge driven actuation of real world results. The whole notion of drones and robotics being embedded into everything that we do, that's where that comes in. That has to be powered by low cost, low power commodity chip sets of various sorts. What we see right now in terms of chip sets is GPUs; Nvidia has gone real far and GPUs have come along very fast in terms of powering inference engines, you know, like the Tesla cars and so forth. GPUs are in many ways the core hardware substrate for inference engines in DL so far. But to become a mass market phenomenon, it's got to get cheaper and lower powered and more commoditized, and so we see a fair number of CPUs being used as the hardware for Edge Analytics applications. Some vendors are fairly big on FPGAs; I believe Microsoft has gone fairly far with FPGAs in its DL strategy. ASICs, I mean, there are neurosynaptic chips, like IBM's got one. There are at least a few dozen vendors of neurosynaptic chips on the market, so at Wikibon we're going to track that market as it develops. And what we're seeing is a fair number of scenarios where it's a mixed environment, where you use one chip set architecture at the inference side of the Edge, and other chip set architectures driving the DL as processed in the cloud, playing together within a common architecture. And we see a fair number of DL environments where the actual training is done in the cloud on Spark using CPUs and parallelized in memory, but pushing TensorFlow models that might be trained through Spark down to the Edge, where the inferences are done in FPGAs and GPUs. 
Those kinds of mixed hardware scenarios are very, very likely to be standard going forward in lots of areas. So analytics at the Edge powering continuous results is what it's all about. The whole point is really not moving the data; it's putting the inference at the Edge and working from the data that's already captured and persistent there for the duration of whatever action or decision or result needs to be powered from the Edge. Like Neil said, cost takeout alone is not worth doing. Cost takeout alone is not the rationale for putting AI at the Edge. It's getting new stuff done, new kinds of things done, in an automated, consistent, intelligent, contextualized way to make our lives better and more productive. Security and governance are becoming more important. Governance of the models, governance of the data, governance in a dev ops context in terms of version controls over all those DL models that are built, that are trained, that are containerized and deployed. Continuous iteration and improvement of those to help them learn to make our lives better and easier. With that said, I'm going to hand it over now. It's five minutes after the hour. We're going to get going with the Influencer Panel, so what we'd like to do is I'll call Peter, and Peter's going to call our influencers. >> All right, am I live yet? Can you hear me? All right so, we've got, let me jump back in control here. We've got, again, the objective here is to have the community take on some things. And so what we want to do is I want to invite five other people up, Neil why don't you come on up as well. Start with Neil. You can sit here. On the far right hand side, Judith, Judith Hurwitz. >> Neil: I'm glad I'm on the left side. >> From the Hurwitz Group. >> From the Hurwitz Group. Jennifer Shin who's affiliated with UC Berkeley. Jennifer are you here? >> She's here, Jennifer where are you? >> She was here a second ago. 
>> Neil: I saw her walk out she may have, >> Peter: All right, she'll be back in a second. >> Here's Jennifer! >> Here's Jennifer! >> Neil: With 8 Path Solutions, right? >> Yep. >> Yeah 8 Path Solutions. >> Just get my mic. >> Take your time Jen. >> Peter: All right, Stephanie McReynolds. Far left. And finally Joe Caserta, Joe come on up. >> Stephanie's with Alation. >> And to the left. So what I want to do is I want to start by having everybody just go around and introduce yourself quickly. Judith, why don't we start there. >> I'm Judith Hurwitz, I'm president of Hurwitz and Associates. We're an analyst research and thought leadership firm. I'm the co-author of eight books. The most recent is Cognitive Computing and Big Data Analytics. I've been in the market for a couple of years now. >> Jennifer. >> Hi, my name's Jennifer Shin. I'm the founder and Chief Data Scientist of 8 Path Solutions LLC. We do data science, analytics and technology. We're actually about to do a big launch next month, with Box actually. >> Sorry Jennifer, are we having a problem with Jennifer's microphone? >> Man: Just turn it back on? >> Oh you have to turn it back on. >> It was on, oh sorry, can you hear me now? >> Yes! We can hear you now. >> Okay, I don't know how that turned back off, but okay. >> So you've got to redo all that Jen. >> Okay, so my name's Jennifer Shin, I'm founder of 8 Path Solutions LLC, it's a data science, analytics and technology company. I founded it about six years ago. So we've been developing some really cool technology that we're going to be launching with Box next month. It's really exciting. And I've been developing a lot of patents and some technology, as well as teaching at UC Berkeley as a lecturer in data science. >> You know Jim, you know Neil, Joe, you ready to go? >> Joe: Just broke my microphone. >> Joe's microphone is broken. >> Joe: Now it should be all right. >> Jim: Speak into Neil's. >> Joe: Hello, hello? 
>> I just feel not worthy in the presence of Joe Caserta. (several laughing) >> That's right, master of mics. If you can hear me, Joe Caserta, so yeah, I've been doing data technology solutions since 1986, almost as old as Neil here, but I've been doing specifically BI, data warehousing, business intelligence type of work since 1996. And I've been wholly dedicated to Big Data solutions and modern data engineering since 2009. Where should I be looking? >> Yeah I don't know where is the camera? >> Yeah, and that's basically it. So my company was formed in 2001, it's called Caserta Concepts. We recently rebranded to only Caserta 'cause what we do is way more than just concepts. So we conceptualize the stuff, we envision what the future brings and we actually build it. And we help clients large and small who just want to be leaders in innovation using data, specifically to advance their business. >> Peter: And finally Stephanie McReynolds. >> I'm Stephanie McReynolds, I head product marketing as well as corporate marketing for a company called Alation. And we are a data catalog, so we help bring together not only a technical understanding of your data, but we curate that data with human knowledge and use automated intelligence internally within the system to make recommendations about what data to use for decision making. And some of our customers, like the City of San Diego, a large automotive manufacturer working on self driving cars, and General Electric, use Alation to help power their solutions for IoT at the Edge. >> All right so let's jump right into it. And again if you have a question, raise your hand, and we'll do our best to get it to the floor. But what I want to do is I want to get seven questions in front of this group and have you guys discuss, slog, disagree, agree. Let's start here. What is the relationship between Big Data, AI and IoT? 
Now Wikibon's put forward its observation that data's being generated at the Edge, that action is being taken at the Edge, and that increasingly the software and other infrastructure architectures need to accommodate the realities of how data is going to work in these very complex systems. That's our perspective. Anybody, Judith, you want to start? >> Yeah, so I think that if you look at AI, machine learning, all these different areas, you have to be able to have the data to learn from. Now when it comes to IoT, I think one of the issues we have to be careful about is not all data will be at the Edge. Not all data needs to be analyzed at the Edge. For example if the light is green and that's good and it's supposed to be green, do you really have to constantly analyze the fact that the light is green? You actually only really want to be able to analyze and take action when there's an anomaly. Well if it goes purple, that's actually a sign that something might explode, so that's where you want to make sure that you have the analytics at the Edge. Not for everything, but for the things where there is an anomaly and a change. >> Joe, how about from your perspective? >> For me I think the evolution of data is really becoming, eventually data's going to be the oxygen we breathe. It used to be very, very reactive and there used to be a latency. You do something, there's a behavior, there's an event, there's a transaction, and then you go record it and then you collect it, and then you can analyze it. And it was very, very waterfallish, right? And then eventually we figured out to put it back into the system. Or at least human beings interpret it to try to make the system better, and that has really been completely turned on its head, we don't do that anymore. Right now it's very, very synchronous, where as we're actually making these transactions, the machines, we don't really need, I mean human beings are involved a bit, but less and less and less. 
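Judith's traffic-light example, analyze only the anomaly rather than the steady state, comes down to a filter running on the device. A minimal sketch (the green/purple states and the `edge_filter` helper are illustrative, not from any real product):

```python
# The edge device watches a signal and only reports readings that deviate
# from the expected state, instead of streaming every "light is green"
# observation upstream for analysis.
EXPECTED = "green"

def edge_filter(readings):
    """Yield only the anomalous readings worth analyzing or acting on."""
    for timestamp, state in readings:
        if state != EXPECTED:
            yield (timestamp, state)

stream = [(1, "green"), (2, "green"), (3, "purple"), (4, "green")]
alerts = list(edge_filter(stream))
print(alerts)  # only the anomaly survives: [(3, 'purple')]
```

Everything that matches the expected state is dropped at the Edge, so the upstream analytics only ever see the events that warrant action.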
And it's just a reality, it may not be politically correct to say but it's a reality that my phone in my pocket is following my behavior, and it knows without telling a human being what I'm doing. And it can actually help me do things like get to where I want to go faster depending on my preference if I want to save money or save time or visit things along the way. And I think that's all integration of big data, streaming data, artificial intelligence and I think the next thing that we're going to start seeing is the culmination of all of that. I actually, hopefully it'll be published soon, I just wrote an article for Forbes with the term of ARBI and ARBI is the integration of Augmented Reality and Business Intelligence. Where I think essentially we're going to see, you know, hold your phone up to Jim's face and it's going to recognize-- >> Peter: It's going to break. >> And it's going to say exactly you know, what are the key metrics that we want to know about Jim. If he works on my sales force, what's his attainment of goal, what is-- >> Jim: Can it read my mind? >> Potentially based on behavior patterns. >> Now I'm scared. >> I don't think Jim's buying it. >> It will, without a doubt be able to predict what you've done in the past, you may, with some certain level of confidence you may do again in the future, right? And is that mind reading? It's pretty close, right? >> Well, sometimes, I mean, mind reading is in the eye of the individual who wants to know. And if the machine appears to approximate what's going on in the person's head, sometimes you can't tell. So I guess, I guess we could call that the Turing machine test of the paranormal. >> Well, face recognition, micro gesture recognition, I mean facial gestures, people can do it. Maybe not better than a coin toss, but if it can be seen visually and captured and analyzed, conceivably some degree of mind reading can be built in. I can see when somebody's angry looking at me so, that's a possibility. 
That's kind of a scary possibility in a surveillance society, potentially. >> Neil: Right, absolutely. >> Peter: Stephanie, what do you think? >> Well, I hear a world of the bots versus the humans being painted here, and I think that, you know, at Alation we have a very strong perspective on this, and that is that the greatest impact, or the greatest results, is going to come when humans figure out how to collaborate with the machines. And so yes, you want to get to the location more quickly, but it's not that the machine, as in the bot, is able to tell you exactly what to do and you're just going to blindly follow it. You need to train that machine, you need to have a partnership with that machine. So a lot of the power, and I think this goes back to Judith's story, is then what is the human decision making that can be augmented with data from the machine, while the humans are actually doing the training and driving the machines in the right direction. I think that's when we get true power out of some of these solutions, so it's not just all about the technology. It's not all about the data or the AI, or the IoT, it's about how that empowers human systems to become smarter and more effective and more efficient. And I think we're playing that out in our technology in a certain way, and I think organizations that are thinking along those lines with IoT are seeing more benefits immediately from those projects. >> So I think we have general agreement on some of the things you talked about: IoT being crucial for capturing information and then having action taken, AI being crucial to defining and refining the nature of the actions that are being taken, and Big Data ultimately powering how a lot of that changes. Let's go to the next one. >> So actually I have something to add to that. So I think it makes sense, right, with IoT, why we have Big Data associated with it. If you think about what data is collected by IoT, we're talking about serial information, right? 
It's over time, it's going to grow exponentially just by definition, right? So every minute you collect a piece of information, that means over time it's going to keep growing, growing, growing as it accumulates. So that's one of the reasons why the IoT is so strongly associated with Big Data. And it's also why you need AI, to be able to differentiate between one minute versus the next minute, right? To find a better way, rather than looking at all that information and manually picking out patterns, to have some automated process for being able to filter through that much data that's being collected. >> I want to point out though, based on what you just said Jennifer, and I want to bring Neil in at this point, that this question of IoT now generating unprecedented levels of data does introduce this idea of the primary source. Historically what we've done within technology, or within IT certainly, is we've taken stylized data. There is no such thing as a real world accounting thing. It is a human contrivance. And we stylize data, and therefore it's relatively easy to be very precise about it. But when we start, as you noted, when we start measuring things with a tolerance down to thousandths of a millimeter, now we're still sometimes dealing with errors that we have to attend to. So the reality is we're not just dealing with stylized data, we're dealing with real data, and it's more frequent, but it also has special cases that we have to attend to in terms of how we use it. What do you think Neil? >> Well, I mean, I agree with that, I think I already said that, right. >> Yes you did, okay let's move on to the next one. >> Well it's a doppelganger, the digital twin doppelganger that's automatically created by the very fact that you're living and interacting and so forth and so on. It's going to accumulate regardless. 
Now that doppelganger may not be your agent, or might not be the foundation for your agent unless there's some other piece of logic like an interest graph that you build, a human being saying this is my broad set of interests, and so all of my agents out there in the IoT, you all need to be aware that when you make a decision on my behalf as my agent, this is what Jim would do. You know I mean there needs to be that kind of logic somewhere in this fabric to enable true agency. >> All right, so I'm going to start with you. Oh go ahead. >> I have a real short answer to this though. I think that Big Data provides the data and compute platform to make AI possible. For those of us who dipped our toes in the water in the 80s, we got clobbered because we didn't have the, we didn't have the facilities, we didn't have the resources to really do AI, we just kind of played around with it. And I think that the other thing about it is if you combine Big Data and AI and IoT, what you're going to see is people, a lot of the applications we develop now are very inward looking, we look at our organization, we look at our customers. We try to figure out how to sell more shoes to fashionable ladies, right? But with this technology, I think people can really expand what they're thinking about and what they model and come up with applications that are much more external. >> Actually what I would add to that is also it actually introduces being able to use engineering, right? Having engineers interested in the data. Because it's actually technical data that's collected not just say preferences or information about people, but actual measurements that are being collected with IoT. So it's really interesting in the engineering space because it opens up a whole new world for the engineers to actually look at data and to actually combine both that hardware side as well as the data that's being collected from it. 
>> Well, Neil, you and I have talked about something, 'cause it's not just engineers. We have in the healthcare industry for example, which you know a fair amount about, this notion of empirically based management. And the idea is that increasingly we have to be driven by data as a way of improving the way that managers do things, the way managers collect data or collaborate, and ultimately, collectively, how they take action. So it's not just engineers, it's supposed to also inform business. What's actually happening in the healthcare world when we start thinking about some of this empirically based management? Is it working? What are some of the barriers? >> It's not a function of technology. What happens in medicine and healthcare research is, I guess you can say it borders on fraud. (people chuckling) No, I'm not kidding. I know the New England Journal of Medicine a couple of years ago released a study and said that at least half the articles that they published turned out to be ghostwritten by pharmaceutical companies. (man chuckling) Right, so I think the problem is that when you do a clinical study, the one that really killed me about 10 years ago was the Women's Health Initiative. They spent $700 million gathering this data over 20 years. And when they released it they looked at all the wrong things, deliberately, right? So I think that's a systemic-- >> I think you're bringing up a really important point that we haven't brought up yet, and that is: can you use Big Data and machine learning to begin to take the biases out? If you divorce your preconceived notions and your biases from the data and let the data lead you to the logic, you start to, I think, get better over time, but it's going to take a while to get there because we do tend to gravitate towards our biases. >> I will share an anecdote. So I had some arm pain, and I had numbness in my thumb and pointer finger, excruciating pain, and I went to the hospital. 
So the doctor examined me, and he said you probably have a pinched nerve, he said, but I'm not exactly sure which nerve it would be, I'll be right back. And I kid you not, he went to a computer and he Googled it. (Neil laughs) And he came back because this little bit of information was something that could easily be looked up, right? Every nerve in your spine is connected to your different fingers so the pointer and the thumb just happens to be your C6, so he came back and said, it's your C6. (Neil mumbles) >> You know an interesting, I mean that's a good example. One of the issues with healthcare data is that the data set is not always shared across the entire research community, so by making Big Data accessible to everyone, you actually start a more rational conversation or debate on well what are the true insights-- >> If that conversation includes what Judith talked about, the actual model that you use to set priorities and make decisions about what's actually important. So it's not just about improving, this is the test. It's not just about improving your understanding of the wrong thing, it's also testing whether it's the right or wrong thing as well. >> That's right, to be able to test that you need to have humans in dialog with one another bringing different biases to the table to work through okay is there truth in this data? >> It's context and it's correlation and you can have a great correlation that's garbage. You know if you don't have the right context. >> Peter: So I want to, hold on Jim, I want to, >> It's exploratory. >> Hold on Jim, I want to take it to the next question 'cause I want to build off of what you talked about Stephanie and that is that this says something about what is the Edge. And our perspective is that the Edge is not just devices. 
When we talk about the Edge, we're talking about human beings and the role that human beings are going to play, both as sensors, carrying things with them, but also as actuators, actually taking action, which is not a simple thing. So what do you guys think? What does the Edge mean to you? Joe, why don't you start? >> Well, I think it could be a combination of the two. And specifically when we talk about healthcare. So I believe in 2017, when we eat, we don't know why we're eating. I think we should absolutely by now be able to know exactly what is my protein level, what is my calcium level, what is my potassium level? And then find the foods to meet that. What have I depleted versus what I should have, and eat very, very purposely and not by taste-- >> And it's amazing that red wine is always the answer. >> It is. (people laughing) And tequila, that helps too. >> Jim: You're a precision foodie is what you are. (several chuckle) >> There's no reason why we should not be able to know that right now, right? And when it comes to healthcare, the biggest challenge is that no matter how great a technology you have, you can't manage what you can't measure. And you're really not allowed to use a lot of this data, so you can't measure it, right? You can't do things very scientifically in the healthcare world, and I think regulation in the healthcare world is really burdening advancement in science. >> Peter: Any thoughts Jennifer? >> Yes, I teach statistics for data scientists, right, so you know we talk about a lot of these concepts. I think what makes these questions so difficult is you have to find a balance, right, a middle ground. For instance, in the case of are you being too biased through data, well you could say we want to look at data only objectively, but then there are certain relationships that your data models might show that aren't actually causal relationships. 
For instance, if an alien came from space and saw Earth, saw the people, everyone's carrying umbrellas, right, and then it started to rain, that alien might think, well, it's because they're carrying umbrellas that it's raining. Now we know from the real world that that's actually not the way these things work. So if you look only at the data, that's the potential risk. You'll start making associations or saying something's causal when it's actually not, right? So that's one of the big challenges, I think. I think the same when it comes to looking at things like healthcare data, right? Do you collect data about anything and everything? Does it mean that, A, we need to collect all that data for the question we're looking at? Or that it's actually the best, most optimal way to be able to get to the answer? Meaning sometimes you can take some shortcuts in terms of what data you collect and still get the right answer, and not have maybe that level of specificity that's going to cost you millions extra to be able to get. >> So Jennifer, as a data scientist, I want to build upon what you just said. And that is, are we going to start to see methods and models emerge for how we actually solve some of these problems? So for example, we know how to build a system for a stylized process like accounting, or some elements of accounting. We have methods and models that lead to technology and actions and whatnot, all the way down to where that system can be generated. We don't have the same notion to the same degree when we start talking about AI and some of these Big Data problems. We have algorithms, we have technology. But are we going to start seeing, as a data scientist, repeatability and learning in how to think the problems through that's going to lead us to a more likely best, or at least good, result? >> So I think that's a bit of a tough question, right? 
Because part of it is, it's going to depend on how many of these researchers actually get exposed to real world scenarios, right? Research looks into all these papers, and you come up with all these models, but if it's never tested in a real world scenario, well, we really can't validate that it works, right? So I think it is dependent on how much of this integration there's going to be between the research community and industry, and how much investment there is. Funding is going to matter in this case. If there's no funding on the research side, then you'll see a lot of industry folk who feel very confident about their models, but again, on the other side, if researchers don't validate those models then you really can't say for sure that it's actually more accurate, or more efficient. >> It's the issue of real world testing and experimentation. A/B testing is standard practice in many operationalized ML and AI implementations in the business world, but with real world experimentation in Edge analytics, what you're actually transducing is touching people's actual lives. The problem there is, like in healthcare and so forth, when you're experimenting with people's lives, somebody's going to die. In other words, in terms of causal analysis, you've got to tread lightly when operationalizing that kind of testing in the IoT when people's lives and health are at stake. >> We still give 'em placebos. So we still test 'em. All right so let's go to the next question. What are the hottest innovations in AI? Stephanie I want to start with you, as someone at a company that's got kind of an interesting little thing happening. We start thinking about how do we better catalog data and represent it to a large number of people. What are some of the hottest innovations in AI as you see it? 
>> I think it's a little counterintuitive what the hottest innovations in AI are, because we're at a spot in the industry where the most successful companies that are working with AI are actually incorporating it into solutions. So the best AI solutions are actually the products where you don't know there's AI operating underneath. But they're having a significant impact on business decision making, or bringing a different type of application to the market, and you know, I think there's a lot of investment that's going into AI tooling and tool sets for data scientists or researchers, but the more innovative companies are thinking through how do we really take AI and make it have an impact on business decision making, and that means kind of hiding the AI from the business user. Because if you think a bot is making a decision instead of you, you're not going to partner with that bot very easily or very readily. Way at the start of my career, I worked in CRM when recommendation engines were all the rage, online and also in call centers. And the hardest thing was to get a call center agent to actually read the script that the algorithm was presenting to them. That algorithm was 99% correct most of the time, but there was this human resistance to letting a computer tell you what to tell that customer on the other side, even if it was more successful in the end. And so I think that the innovation in AI that's really going to push us forward is when humans feel like they can partner with these bots, and they don't think of it as a bot, but they think of it as assisting their work and getting to a better result-- >> Hence the augmentation point you made earlier. >> Absolutely, absolutely. >> Joe how 'about you? What do you look at? What are you excited about? >> I think the coolest thing at the moment right now is chat bots. To have voice, to be able to speak with a computer in natural language, I think that's pretty innovative, right? 
And I do think that eventually, for the average user, not for techies like me, but for the average user, I think keyboards are going to be a thing of the past. I think we're going to communicate with computers through voice, and I think this is the very, very beginning of that, and it's an incredible innovation. >> Neil? >> Well, I think we all have myopia here. We're all thinking about commercial applications. Big, big things are happening with AI in the intelligence community, in the military, the defense industry, in all sorts of things. Meteorology. And that's where, well, hopefully not on an everyday basis with the military, you really see the effect of this. But I was involved in a project a couple of years ago where we were developing AI software to detect artillery pieces in terrain from satellite imagery. I don't have to tell you what country that was. I think you can probably figure that one out, right? But there are legions of people in many, many companies that are involved in that industry. So if you're talking about the dollars spent on AI, I think the stuff that we do in our industries is probably fairly small. >> Well it reminds me of an application I actually thought was interesting, AI being applied to removing mines from war zones. >> Why not? >> Which is not a bad thing for a whole lot of people. Judith what do you look at? >> So I'm looking at things like being able to have pre-trained data sets in specific solution areas. I think that that's something that's coming. Also the ability to really have a machine assist you in selecting the right algorithms based on what your data looks like and the problems you're trying to solve. Those are some of the things that data scientists still spend a lot of their time on, but they can be augmented. Basically we have to move to higher levels of abstraction before this becomes truly ubiquitous across many different areas. >> Peter: Jennifer? >> So I'm going to say computer vision. 
>> Computer vision? >> Computer vision. So computer vision ranges from image recognition to being able to say what content is in the image. Is it a dog, is it a cat, is it a blueberry muffin? Like that sort of popular post out there where it's a blueberry muffin versus, I think, a chihuahua, and it compares the two. And can the AI really actually detect the difference, right? So I think that's really where a lot of people who are in both the AI space as well as data science are looking to for the new innovations. For instance, Cloud Vision, I think that's what Google still calls it, the Vision API they've released in beta, allows you to actually use an API to send your image and then have it be recognized, right, by their API. There's another startup in New York called Clarifai that also does a similar thing, as well as, you know, Amazon has their Rekognition platform as well. So from images, being able to detect what's in the content, as well as from videos, being able to say things like how many people are entering a frame? How many people enter the store? Not having to actually go look at it and count it, but having a computer actually tally that information for you, right? >> There's actually an extra piece to that. So if I have a picture of a stop sign, and I'm an automated car, is it a picture on the back of a bus of a stop sign, or is it a real stop sign? So that's going to be one of the complications. >> Doesn't matter to a New York City cab driver. How 'about you Jim? >> Probably not. (laughs) >> The hottest thing in AI is Generative Adversarial Networks, GANs. What's hot about that, well, I'll be very quick: most AI, most deep learning, machine learning, is analytical, it's distilling or inferring insights from the data. Generative takes that same algorithmic basis but to build stuff. 
In other words, to create realistic looking photographs, to compose music, to build CAD/CAM models, essentially, that can be constructed on 3D printers. So GANs are a huge research focus all around the world, and they're increasingly used for natural language generation. In other words it's institutionalizing, or having a foundation for, nailing the Turing test every single time: building something with machines that looks like it was constructed by a human, and doing it over and over again to fool humans. I mean you can imagine the fraud potential. But you can also imagine just the sheer, like it's going to shape the world, GANs. >> All right so I'm going to say one thing, and then we're going to ask if anybody in the audience has an idea. So the thing that I find interesting is with traditional programs, or when you tell a machine to do something, you don't need incentives. When you tell a human being something, you have to provide incentives. Like how do you get someone to actually read the text? And this whole question of elements within AI that incorporate incentives as a way of trying to guide human behavior is absolutely fascinating to me. Whether it's gamification, or even some things we're thinking about with blockchain and bitcoin and related types of stuff. To my mind that's going to have an enormous impact, some good, some bad. Anybody in the audience? I don't want to lose everybody here. What do you think sir? And I'll try to do my best to repeat it. Oh we have a mic. >> So my question's pretty much about what Stephanie's talking about, which is human-in-the-loop training, right? I come from a computer vision background. That's the problem, we need millions of images trained, we need humans to do that. And you know, the workforce is essentially people that aren't necessarily part of the AI community, they're people that are just able to use that data and analyze the data and label that data. 
That's something that I think is a big problem everyone in the computer vision industry at least faces. I was wondering-- >> So again, is the problem the difficulty of methodologically bringing together people who have domain expertise and people who have algorithm expertise, and getting them working together? >> I think the expertise issue comes in healthcare, right? In healthcare you need experts to be labeling your images. With contextual information, where essentially augmented reality applications are coming in, you have the AR kit and everything coming out, but there is a lack of context based intelligence. And all of that comes through training images, and all of that requires people to do it. And that's kind of the foundational basis of AI going forward: it's not necessarily an algorithm, right? It's how well is the data labeled? Who's doing the labeling, and how do we ensure that it happens? >> Great question. So for the panel. So if you think about it, a consultant talks about being on the bench. How much time are they going to have to spend on trying to develop additional business? How much time should we set aside for executives to help train some of these assistants? >> I think the key is, to think of the problem a different way, you could have people manually label data, and that's one way to solve the problem. But you can also look at what is the natural workflow of that executive, or that individual? And is there a way to gather that context automatically using AI, right? And if you can do that, it's similar to what we do in our product: we observe how someone is analyzing the data, and from those observations we can actually create the metadata that then trains the system in a particular direction. But you have to think about solving the problem differently, of finding the workflow that you can then feed into, to make this labeling easy without the human really realizing that they're labeling the data. 
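The approach Stephanie outlines, deriving labels from a user's natural workflow instead of manual tagging, can be sketched as a simple event-log aggregation. Everything here (the event tuples, the `derive_metadata` helper) is a hypothetical simplification of what a catalog product might do internally:

```python
from collections import defaultdict

# Hypothetical observation log: who touched which data asset, and how.
# The users and asset names are made up for illustration.
events = [
    ("ana", "sales_2019", "query"),
    ("ana", "sales_2019", "join"),
    ("bob", "sales_2019", "query"),
    ("bob", "leads_raw", "query"),
]

def derive_metadata(events):
    """Turn passive observations into catalog metadata: usage counts per
    asset act as implicit labels of which data people actually rely on."""
    usage = defaultdict(int)
    for user, asset, action in events:
        usage[asset] += 1
    # Rank assets so the catalog can recommend the most-used one first.
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)

print(derive_metadata(events))  # [('sales_2019', 3), ('leads_raw', 1)]
```

The point of the design is that nobody ever fills in a label field: the ranking falls out of work the users were doing anyway, which is the "labeling without realizing it" idea.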
>> Peter: Anybody else? >> I'll just add to what Stephanie said, so in the IoT applications, all those sensory modalities, the computer vision, the speech recognition, all that, that's all potential training data. So it cross-checks against all the other models that are processing all the other data coming from that device. So that the natural language processing of understanding can be reality-checked against the images that the person happens to be commenting upon, or the scene in which they're embedded, so yeah, the data's embedded-- >> I don't think we're, we're not at the stage yet where this is easy. It's going to take time before we do start doing the pre-training of some of these details so that it goes faster, but right now, there aren't that many shortcuts. >> Go ahead Joe. >> Sorry, so a couple things. So one is, like, I was just caught up on your incentivizing programs to be more efficient, like humans. You know Ethereum, which is a blockchain, has this concept of gas. Where as the process becomes more efficient it costs less to actually run, right? It costs less ether, right? So the machine is actually kind of incentivized, and you don't really know what it's going to cost until the machine processes it, right? So there is some notion of that there. But as far as vision, like training the machine for computer vision, I think it's through adoption and crowdsourcing, so as people start using it more they're going to be adding more pictures. Very, very organically. And then the machines will be trained, and right now it's a very small handful doing it, and it's very proactive by the Googles and the Facebooks and all of that. But as we start using it, as they start looking at my images and Jim's and Jen's images, it's going to keep getting smarter and smarter through adoption and through a very organic process. >> So Neil, let me ask you a question.
Who owns the value that's generated as a consequence of all these people ultimately contributing their insight and intelligence into these systems? >> Well, to a certain extent the people who are contributing the insight own nothing, because the systems collect their actions and the things they do, and then that data doesn't belong to them, it belongs to whoever collected it or whoever's going to do something with it. But the other thing, getting back to the medical stuff. It's not enough to say that people will do the right thing, because a lot of them are not motivated to do the right thing. The whole grant thing, the whole oh my god I'm not going to go against the senior professor. A lot of these, I knew a guy who was a doctor at the University of Pittsburgh and they were doing a clinical study on the tubes that they put in little kids' ears who have ear infections, right? And-- >> Google it! Who helps out? >> Anyway, I forget the exact thing, but he came out and said that the principal investigator lied when he made the presentation, that it should be this, I forget which way it went. He was fired from his position at Pittsburgh and he has never worked as a doctor again. 'Cause he went against the senior line of authority. He was-- >> Another question back here? >> Man: Yes, Mark Turner has a question. >> Not a question, just want to piggyback on what you're saying about the transformation, maybe in healthcare, of black and white images into color images, in the case of sonograms and ultrasound and mammograms. Do you see that happening using AI? You see that being, I mean it's already happening, do you see it moving forward in that kind of way? I mean, talk more about that, about, you know, AI and black and white images being used, and how they can be transformed, made into color images so you can see things better, doctors can perform better operations. >> So I'm sorry, but could you summarize down? What's the question?
Summarize it just, >> I had a lot of students, they're interested in the cross-pollination between AI and, say, the medical community, as far as things like ultrasound and sonograms and mammograms, and how you can literally take a black and white image and, using algorithms and stuff, it can be made into a color image that can help doctors better do the work that they've already been doing, just do it better. You touched on it for like 30 seconds. >> So how AI can be used to actually add information in a way that's not necessarily invasive but ultimately improves how someone might respond to it or use it, yes? Related? I've also got something to say about medical images in a second, any of you guys want to, go ahead Jennifer. >> Yeah, so for one thing, you know, and it kind of goes back to what we were talking about before. When we look at, for instance, scans, like at some point I was looking at CT scans, right, for lung cancer nodules. In order for me, who doesn't have a medical background, to identify where the nodule is, of course, a doctor actually had to go in and specify which slice of the scan had the nodule and where exactly it is, so it's on both the slice level as well as, within that 2D image, where it's located and the size of it. So the beauty of things like AI is that ultimately, right now, a radiologist has to look at every slice and actually identify this manually, right? The goal of course would be that one day we wouldn't have to have someone look at every slice, usually like 300 slices, and be able to identify it in a much more automated way. And I think the reality is we're not going to get something that's going to be 100%. As with anything we do in the real world, it's always like a 95% chance of it being accurate. So I think it's finding that in-between of, what's the threshold that we want to use to be able to say that this is definitively a lung cancer nodule or not.
I think the other thing to think about is in terms of how they're using other information. What they might use, for instance, is to say, like, you know, based on other characteristics of the person's health, they might use that as sort of a grading, right? So, you know, how dark or how light something is, to identify maybe in that region the prevalence of that specific variable. So that's usually how they integrate that information into something that's already existing in the computer vision sense. I think the difficulty with this, of course, is being able to identify which variables were introduced into data that does exist. >> So I'll make two quick observations on this, then I'll go to the next question. One is radiologists have historically been some of the highest paid physicians within the medical community, partly because they don't have to be particularly clinical. They don't have to spend a lot of time with patients. They tend to spend time with doctors, which means they can do a lot of work in a little bit of time, and charge a fair amount of money. As we start to introduce some of these technologies that allow us to, from a machine standpoint, actually make diagnoses based on those images, I find it fascinating that you now see television ads promoting the role that the radiologist plays in clinical medicine. It's kind of an interesting response. >> It's also disruptive, as I'm seeing more and more studies showing that deep learning models processing images, ultrasounds and so forth, are getting as accurate as many of the best radiologists. >> That's the point! >> Detecting cancer. >> Now radiologists are saying oh look, we do this great thing in terms of interacting with the patients, which they never have, because they're being disintermediated. The second thing that I'll note is one of my favorite examples of that, if I got it right, is looking at the images, the deep space images, that come out of Hubble.
Where they're taking data from thousands, maybe even millions of images and combining it together in interesting ways so you can actually see depth. You can actually move through, at a very, very small scale, a system that's 150, well maybe that, can't be that much, maybe six billion light years away. Fascinating stuff. All right so let me go to the last question here, and then I'm going to close it down, then we can have something to drink. What are the hottest, oh I'm sorry, question? >> Yes, hi, my name's George, I'm with Blue Talon. You asked earlier the question, what's the hottest thing in the Edge and AI, and I would say that it's security. It seems to me that before you can empower agency you need to be able to authorize what they can act on, how they can act on it, who they can act on. So it seems if you're going to move to very distributed data at the Edge and analytics at the Edge, there has to be security similarly done at the Edge. And I saw (speaking faintly) slides that called out security as a key prerequisite, and maybe Judith can comment, but I'm curious how security's going to evolve to meet this analytics at the Edge. >> Well, let me do that and I'll ask Jim to comment. The notion of agency is crucially important, slightly different from security, just so we're clear. And the basic idea here is historically folks have thought about moving data, or they thought about moving application function, now we are thinking about moving authority. So as you said. That's not necessarily, that's not really a security question, but this has been a problem that's been of concern in a number of different domains. How do we move authority with the resources? And that's really what informs the whole agency process. But with that said, Jim. >> Yeah actually I'll, yeah, thank you for bringing up security, so identity is the foundation of security. Strong identity, multifactor, face recognition, biometrics and so forth.
Clearly AI, machine learning, deep learning are powering a new era of biometrics, and you know it's behavioral metrics and so forth that's organic to people's use of devices and so forth. You know, getting to the point that Peter was raising is important, agency! Systems of agency. Your agent, you have to, you as a human being should be vouching in a secure, tamper-proof way, your identity should be vouching for the identity of some agent, physical or virtual, that does stuff on your behalf. How can that, how should that be managed within this increasingly distributed IoT fabric? Well a lot of that's been worked out. It all runs through webs of trust, public key infrastructure, formats and, you know, SAML for single sign-on and so forth. It's all about assertions, strong assertions and vouching. I mean there's the whole workflows of things. Back in the ancient days when I was actually a PKI analyst, three analyst firms ago, I got deep into all the guts of all those federation agreements, and something like that has to be IoT-scalable to enable systems of agency to be truly fluid. So we can vouch for our agents wherever they happen to be. We're going to keep on having, as human beings, agents all over creation, we're not even going to be aware of everywhere that our agents are, but our identity-- >> It's not just-- >> Our identity has to follow. >> But it's not just identity, it's also authorization and context. >> Permissioning, of course. >> So I may be the right person to do something yesterday, but I'm not authorized to do it in another context in another application. >> Role-based permissioning, yeah. Or persona-based. >> That's right. >> I agree. >> And obviously it's going to be interesting to see the role that blockchain, or its follow-on technology, is going to play here. Okay so let me throw one more question out. What are the hottest applications of AI at the Edge? We've talked about a number of them, does anybody want to add something that hasn't been talked about?
Or do you want to get a beer? (people laughing) Stephanie, you raised your hand first. >> I was going to go, I'll bring something mundane to the table actually, because I think some of the most exciting innovations with IoT and AI are actually simple things, like the City of San Diego rolling out 3,200 automated street lights that will actually help you find a parking space and reduce the amount of emissions into the atmosphere, so it has some positive environmental impact. I mean, it's street lights, it's not like a, it's not the medical industry, it doesn't look like a life-changing innovation, and yet if we automate streetlights and we manage our energy better, and maybe they can flicker on and off if there's a parking space there for you, that's a significant impact on everyone's life. >> And dramatically suppress the impact of backseat driving! >> (laughs) Exactly. >> Joe what were you saying? >> I was just going to say, you know, there's already technology out there where you can put a camera on a drone with machine learning, with artificial intelligence within it, and it can look at buildings and determine whether there's rusty pipes and cracks in cement and leaky roofs and all of those things. And that's all based on artificial intelligence. And I think if you can do that, to be able to look at an x-ray and determine if there's a tumor there is not out of the realm of possibility, right? >> Neil? >> I agree with both of them, that's what I meant about external kinds of applications. Instead of figuring out what to sell our customers, which is mostly what we hear. I just, I think all of those things are eminently doable. And boy, street lights that help you find a parking place, that's brilliant, right? >> Simple! >> It improves your life more than, I dunno, something I used on the internet recently, but I think it's great! That's, I'd like to see a thousand things like that. >> Peter: Jim?
>> Yeah, building on what Stephanie and Neil were saying, it's ambient intelligence built into everything to enable fine-grained microclimate awareness of all of us as human beings moving through the world. And enable reading of every microclimate in buildings. In other words, you know, you have sensors on your body that are always detecting the heat, the humidity, the level of pollution or whatever in every environment that you're in, or that you might be likely to move into fairly soon, and it either can help give you guidance in real time about where to avoid, or give that environment guidance about how to adjust itself to your, like the lighting or whatever it might be, to your specific requirements. And you know when you have a room like this, full of other human beings, there has to be some negotiated settlement. Some will find it too hot, some will find it too cold or whatever, but I think that is fundamental in terms of reshaping the sheer quality of experience of most of our lived habitats on the planet, potentially. That's really the Edge analytics application that depends on everybody being fully equipped with a personal area network of sensors that's communicating into the cloud. >> Jennifer? >> So I think what's really interesting about it is being able to utilize the technology we do have, it's a lot cheaper now to have a lot of these ways of measuring that we didn't have before. And whether or not engineers can then leverage what we have as ways to measure things, and then of course you need people like data scientists to build the right model. So you can collect all this data, but if you don't build the right model that identifies these patterns, then all that data's just collected and it just sits in a repository. So without having the models that support the patterns that are actually in the data, you're not going to find a better way of being able to find insights in the data itself.
So I think what will be really interesting is to see how existing technology is leveraged to collect data, and then how that's actually modeled, as well as to see how technology's going to develop from where it is now, to being able to either collect things more sensitively or, in the case of, say, for instance, if you're dealing with how people move, whether we can build things that we can then use to measure how we move, right? Like how we move every day, and then being able to model that in a way that is actually going to give us better insights into things like healthcare, and maybe even just our behaviors. >> Peter: Judith? >> So, I think we also have to look at it from a peer-to-peer perspective. So I may be able to get some data from one thing at the Edge, but then all those Edge devices, sensors or whatever, they all have to interact with each other, because we may act in silos in our business lives, but in the real world, when you look at things like sensors and devices, it's how they react with each other on a peer-to-peer basis. >> All right, before I invite John up, I want to say, I'll say what my thing is, and it's not the hottest. It's the one I hate the most. I hate AI-generated music. (people laughing) Hate it. All right, I want to thank all the panelists, every single person, some great commentary, great observations. I want to thank you very much. I want to thank everybody that joined. John, in a second you'll kind of announce who's the big winner. But the one thing I want to do is, I was listening, I learned a lot from everybody, but I want to call out the one comment that I think we all need to remember, and I'm going to give you the award, Stephanie. And that is, increasingly we have to remember that the best AI is probably AI that we don't even know is working on our behalf. The flip side of that is all of us have to be very cognizant of the idea that AI is acting on our behalf and we may not know it.
So, John why don't you come on up. Who won the, whatever it's called, the raffle? >> You won. >> Thank you! >> How 'bout a round of applause for the great panel. (audience applauding) Okay, we had people put their business cards in the basket, we're going to have that brought up. We're going to have two raffle gifts, some nice Bose headsets and a speaker, a Bluetooth speaker. Got to wait for that. I just want to say thank you for coming, and for the folks watching, this is our fifth year doing our own event called Big Data NYC, which is really an extension of the landscape beyond the Big Data world, that's Cloud and AI and IoT and other great things happening, and great experts and influencers and analysts here. Thanks for sharing your opinion. Really appreciate you taking the time to come out and share your data and your knowledge, appreciate it. Thank you. Where's the? >> Sam's right in front of you. >> There's the thing, okay. Got to be present to win. We saw some people sneaking out the back door to go to a dinner. >> First prize first. >> Okay, first prize is the Bose headset. >> Bluetooth and noise canceling. >> I won't look, Sam, you've got to hold it down, I can see the cards. >> All right. >> Stephanie you won! (Stephanie laughing) Okay, Sawny Cox, Sawny Allie Cox? (audience applauding) Yay look at that! He's here! The bar's open so help yourself, but we got one more. >> Congratulations. Picture right here. >> Hold that, I saw you. Wake up a little bit. Okay, all right. Next one is, my kids love this. This is great, great for the beach, great for everything, portable speaker, great gift. >> What is it? >> Portable speaker. >> It is a portable speaker, it's pretty awesome. >> Oh you grabbed mine. >> Oh that's one of our guys. >> (laughing) But who was it? >> Can't be related! Ava, Ava, Ava. Okay Gene Penesko (audience applauding) Hey! He came in! All right look at that, the timing's great. >> Another one?
(people laughing) >> Hey thanks everybody, enjoy the night, thank Peter Burris, head of research for SiliconANGLE, Wikibon, and the great guests and influencers and friends. And you guys for coming in from the community. Thanks for watching and thanks for coming. Enjoy the party and some drinks, and that's out, that's it for the influencer panel and analyst discussion. Thank you. (logo music)