Zhamak Dehghani, ThoughtWorks | theCUBE on Cloud 2021

>>from around the globe. It's the Cube presenting Cuban cloud brought to you by silicon angle in 2000 >>nine. Hal Varian, Google's chief economist, said that statisticians would be the sexiest job in the coming decade. The modern big data movement >>really >>took off later in the following year. After the Second Hadoop World, which was hosted by Claudette Cloudera in New York City. Jeff Ham Abakar famously declared to me and John further in the Cube that the best minds of his generation, we're trying to figure out how to get people to click on ads. And he said that sucks. The industry was abuzz with the realization that data was the new competitive weapon. Hadoop was heralded as the new data management paradigm. Now, what actually transpired Over the next 10 years on Lee, a small handful of companies could really master the complexities of big data and attract the data science talent really necessary to realize massive returns as well. Back then, Cloud was in the early stages of its adoption. When you think about it at the beginning of the last decade and as the years passed, Maurin Mawr data got moved to the cloud and the number of data sources absolutely exploded. Experimentation accelerated, as did the pace of change. Complexity just overwhelmed big data infrastructures and data teams, leading to a continuous stream of incremental technical improvements designed to try and keep pace things like data Lakes, data hubs, new open source projects, new tools which piled on even Mawr complexity. And as we reported, we believe what's needed is a comm pleat bit flip and how we approach data architectures. Our next guest is Jean Marc de Connie, who is the director of emerging technologies That thought works. John Mark is a software engineer, architect, thought leader and adviser to some of the world's most prominent enterprises. She's, in my view, one of the foremost advocates for rethinking and changing the way we create and manage data architectures. Favoring a decentralized over monolithic structure and elevating domain knowledge is a primary criterion. And how we organize so called big data teams and platforms. Chamakh. Welcome to the Cube. It's a pleasure to have you on the program. >>Hi, David. This wonderful to be here. >>Well, okay, so >>you're >>pretty outspoken about the need for a paradigm shift in how we manage our data and our platforms that scale. Why do you feel we need such a radical change? What's your thoughts there? >>Well, I think if you just look back over the last decades you gave us, you know, a summary of what happened since 2000 and 10. But if even if we go before then what we have done over the last few decades is basically repeating and, as you mentioned, incrementally improving how we've managed data based on a certain assumptions around. As you mentioned, centralization data has to be in one place so we can get value from it. But if you look at the parallel movement off our industry in general since the birth of Internet, we are actually moving towards decentralization. If we think today, like if this move data side, if he said the only way Web would work the only way we get access to you know various applications on the Web pages is to centralize it. We would laugh at that idea, but for some reason we don't. We don't question that when it comes to data, right? So I think it's time to embrace the complexity that comes with the growth of number of sources, the proliferation of sources and consumptions models, you know, embrace the distribution of sources of data that they're not just within one part of organization. They're not just within even bounds of organization there beyond the bounds of organization. And then look back and say Okay, if that's the trend off our industry in general, Um, given the fabric of computation and data that we put in, you know globally in place, then how the architecture and technology and organizational structure incentives need to move to embrace that complexity. And to me, that requires a paradigm shift, a full stack from how we organize our organizations, how we organize our teams, how we, you know, put a technology in place, um, to to look at it from a decentralized angle. >>Okay, so let's let's unpack that a little bit. I mean, you've spoken about and written that today's big architecture and you basically just mentioned that it's flawed, So I wanna bring up. I love your diagrams of a simple diagram, guys, if you could bring up ah, figure one. So on the left here we're adjusting data from the operational systems and other enterprise data sets and, of course, external data. We cleanse it, you know, you've gotta do the do the quality thing and then serve them up to the business. So So what's wrong with that picture that we just described and give granted? It's a simplified form. >>Yeah, quite a few things. So, yeah, I would flip the question may be back to you or the audience if we said that. You know, there are so many sources off the data on the Actually, the data comes from systems and from teams that are very diverse in terms off domains. Right? Domain. If if you just think about, I don't know retail, Uh, the the E Commerce versus Order Management versus customer This is a very diverse domains. The data comes from many different diverse domains. And then we expect to put them under the control off a centralized team, a centralized system. And I know that centralization. Probably if you zoom out, it's centralized. If you zoom in it z compartmentalized based on functions that we can talk about that and we assume that the centralized model will be served, you know, getting that data, making sense of it, cleansing and transforming it then to satisfy in need of very diverse set of consumers without really understanding the domains, because the teams responsible for it or not close to the source of the data. So there is a bit of it, um, cognitive gap and domain understanding Gap, um, you know, without really understanding of how the data is going to be used, I've talked to numerous. When we came to this, I came up with the idea. I talked to a lot of data teams globally just to see, you know, what are the pain points? How are they doing it? And one thing that was evident in all of those conversations that they actually didn't know after they built these pipelines and put the data in whether the data warehouse tables or like, they didn't know how the data was being used. But yet the responsible for making the data available for these diverse set of these cases, So s centralized system. A monolithic system often is a bottleneck. So what you find is, a lot of the teams are struggling with satisfying the needs of the consumers, the struggling with really understanding the data. The domain knowledge is lost there is a los off understanding and kind of in that in that transformation. Often, you know, we end up training machine learning models on data that is not really representative off the reality off the business. And then we put them to production and they don't work because the semantic and the same tax off the data gets lost within that translation. So we're struggling with finding people thio, you know, to manage a centralized system because there's still the technology is fairly, in my opinion, fairly low level and exposes the users of those technologies. I said, Let's say warehouse a lot off, you know, complexity. So in summary, I think it's a bottleneck is not gonna, you know, satisfy the pace of change, of pace, of innovation and the pace of, you know, availability of sources. Um, it's disconnected and fragmented, even though the centralizes disconnected and fragmented from where the data comes from and where the data gets used on is managed by, you know, a team off hyper specialized people that you know, they're struggling to understand the actual value of the data, the actual format of the data, so it's not gonna get us where our aspirations and ambitions need to be. >>Yes. So the big data platform is essentially I think you call it, uh, context agnostic. And so is data becomes, you know, more important, our lives. You've got all these new data sources, you know, injected into the system. Experimentation as we said it with the cloud becomes much, much easier. So one of the blockers that you've started, you just mentioned it is you've got these hyper specialized roles the data engineer, the quality engineer, data scientists and and the It's illusory. I mean, it's like an illusion. These guys air, they seemingly they're independent and in scale independently. But I think you've made the point that in fact, they can't that a change in the data source has an effect across the entire data lifecycle entire data pipeline. So maybe you could maybe you could add some color to why that's problematic for some of the organizations that you work with and maybe give some examples. >>Yeah, absolutely so in fact, that initially the hypothesis around that image came from a Siris of requests that we received from our both large scale and progressive clients and progressive in terms of their investment in data architectures. So this is where clients that they were there were larger scale. They had divers and reached out of domains. Some of them were big technology tech companies. Some of them were retail companies, big health care companies. So they had that diversity off the data and the number off. You know, the sources of the domains they had invested for quite a few years in, you know, generations. If they had multi generations of proprietary data warehouses on print that they were moving to cloud, they had moved to the barriers, you know, revisions of the Hadoop clusters and they were moving to the cloud. And they the challenges that they were facing were simply there were not like, if I want to just, like, you know, simplifying in one phrase, they were not getting value from the data that they were collecting. There were continuously struggling Thio shift the culture because there was so much friction between all of these three phases of both consumption of the data and transformation and making it available consumption from sources and then providing it and serving it to the consumer. So that whole process was full of friction. Everybody was unhappy. So its bottom line is that you're collecting all this data. There is delay. There is lack of trust in the data itself because the data is not representative of the reality has gone through a transformation. But people that didn't understand really what the data was got delayed on bond. So there is no trust. It's hard to get to the data. It's hard to create. Ultimately, it's hard to create value from the data, and people are working really hard and under a lot of pressure. But it's still, you know, struggling. So we often you know, our solutions like we are. You know, Technologies will often pointed to technology. So we go. Okay, This this version of you know, some some proprietary data warehouse we're using is not the right thing. We should go to the cloud, and that certainly will solve our problems. Right? Or warehouse wasn't a good one. Let's make a deal Lake version. So instead of you know, extracting and then transforming and loading into the little bits. And that transformation is that, you know, heavy process, because you fundamentally made an assumption using warehouses that if I transform this data into this multi dimensional, perfectly designed schema that then everybody can run whatever choir they want that's gonna solve. You know everybody's problem, but in reality it doesn't because you you are delayed and there is no universal model that serves everybody's need. Everybody that needs the divers data scientists necessarily don't don't like the perfectly modeled data. They're looking for both signals and the noise. So then, you know, we've We've just gone from, uh, et elles to let's say now to Lake, which is okay, let's move the transformation to the to the last mile. Let's just get load the data into, uh into the object stores into semi structured files and get the data. Scientists use it, but they're still struggling because the problems that we mentioned eso then with the solution. What is the solution? Well, next generation data platform, let's put it on the cloud, and we sell clients that actually had gone through, you know, a year or multiple years of migration to the cloud. But with it was great. 18 months I've seen, you know, nine months migrations of the warehouse versus two year migrations of the various data sources to the clubhouse. But ultimately, the result is the same on satisfy frustrated data users, data providers, um, you know, with lack of ability to innovate quickly on relevant data and have have have an experience that they deserve toe have have a delightful experience off discovering and exploring data that they trust. And all of that was still a missed so something something else more fundamentally needed to change than just the technology. >>So then the linchpin to your scenario is this notion of context and you you pointed out you made the other observation that look, we've made our operational systems context aware. But our data platforms are not on bond like CRM system sales guys very comfortable with what's in the CRM system. They own the data. So let's talk about the answer that you and your colleagues are proposing. You're essentially flipping the architecture whereby those domain knowledge workers, the builders, if you will, of data products or data services there now, first class citizens in the data flow and they're injecting by design domain knowledge into the system. So So I wanna put up another one of your charts. Guys, bring up the figure to their, um it talks about, you know, convergence. You showed data distributed domain, dream and architecture. Er this self serve platform design and this notion of product thinking. So maybe you could explain why this approach is is so desirable, in your view, >>sure. The motivation and inspiration for the approach came from studying what has happened over the last few decades in operational systems. We had a very similar problem prior to micro services with monolithic systems, monolithic systems where you know the bottleneck. Um, the changes we needed to make was always, you know, our fellow Noto, how the architecture was centralized and we found a nice nation. I'm not saying this is the perfect way of decoupling a monolith, but it's a way that currently where we are in our journey to become data driven, um is a nice place to be, um, which is distribution or decomposition off your system as well as organization. I think when we whenever we talk about systems, we've got to talk about people and teams that's responsible for managing those systems. So the decomposition off the systems and the teams on the data around domains because that's how today we are decoupling our business, right? We're decoupling our businesses around domains, and that's a that's a good thing and that What does that do really for us? What it does? Is it localizes change to the bounded context of fact business. It creates clear boundary and interfaces and contracts between the rest of the universe of the organization on that particular team, so removes the friction that often we have for both managing the change and both serving data or capability. So it's the first principle of data meshes. Let's decouple this world off analytical data the same to mirror the same way we have to couple their systems and teams and business why data is any different. And the moment you do that, So you, the moment you bring the ownership to people who understands the data best, then you get questions that well, how is that any different from silence that's connected databases that we have today and nobody can get to the data? So then the rest of the principles is really to address all of the challenges that comes with this first principle of decomposition around domain Context on the second principle is well, we have to expect a certain level off quality and accountability and responsibility for the teams that provide the data. So let's bring product thinking and treating data as a product to the data that these teams now, um share and let's put accountability around. And we need a new set of incentives and metrics for domain teams to share the data. We need to have a new set off kind of quality metrics that define what it means for the data to be a product. And we can go through that conversation perhaps later eso then the second principle is okay. The teams now that are responsible, the domain teams responsible for the analytical data need to provide that data with a certain level of quality and assurance. Let's call that a product and bring products thinking to that. And then the next question you get asked off by C. E. O s or city or the people who build the infrastructure and, you know, spend the money. They said, Well, it's actually quite complex to manage big data, and now we're We want everybody, every independent team to manage the full stack of, you know, storage and computation and pipelines and, you know, access, control and all of that. And that's well, we have solved that problem in operational world. And that requires really a new level of platform thinking toe provide infrastructure and tooling to the domain teams to now be able to manage and serve their big data. And that I think that requires reimagining the world of our tooling and technology. But for now, let's just assume that we need a new level of abstraction to hide away ton of complexity that unnecessarily people get exposed to and that that's the third principle of creating Selves of infrastructure, um, to allow autonomous teams to build their domains. But then the last pillar, the last you know, fundamental pillar is okay. Once you distributed problem into a smaller problems that you found yourself with another set of problems, which is how I'm gonna connect this data, how I'm gonna you know, that the insights happens and emerges from the interconnection of the data domains right? It does not necessarily locked into one domain. So the concerns around interoperability and standardization and getting value as a result of composition and interconnection of these domains requires a new approach to governance. And we have to think about governance very differently based on a Federated model and based on a computational model. Like once we have this powerful self serve platform, we can computational e automate a lot of governance decisions. Um, that security decisions and policy decisions that applies to you know, this fabric of mesh not just a single domain or not in a centralized. Also, really. As you mentioned that the most important component of the emissions distribution of ownership and distribution of architecture and data the rest of them is to solve all the problems that come with that. >>So very powerful guys. We actually have a picture of what Jamaat just described. Bring up, bring up figure three, if you would tell me it. Essentially, you're advocating for the pushing of the pipeline and all its various functions into the lines of business and abstracting that complexity of the underlying infrastructure, which you kind of show here in this figure, data infrastructure is a platform down below. And you know what I love about this Jama is it to me, it underscores the data is not the new oil because I could put oil in my car I can put in my house, but I can't put the same court in both places. But I think you call it polyglot data, which is really different forms, batch or whatever. But the same data data doesn't follow the laws of scarcity. I can use the same data for many, many uses, and that's what this sort of graphic shows. And then you brought in the really important, you know, sticking problem, which is that you know the governance which is now not a command and control. It's it's Federated governance. So maybe you could add some thoughts on that. >>Sure, absolutely. It's one of those I think I keep referring to data much as a paradigm shift. And it's not just to make it sound ground and, you know, like, kind of ground and exciting or in court. And it's really because I want to point out, we need to question every moment when we make a decision around how we're going to design security or governance or modeling off the data, we need to reflect and go back and say, um, I applying some of my cognitive biases around how I have worked for the last 40 years, I have seen it work. Or do I do I really need to question. And we do need to question the way we have applied governance. I think at the end of the day, the rule of the data governance and objective remains the same. I mean, we all want quality data accessible to a diverse set of users. And these users now have different personas, like David, Personal data, analyst data, scientists, data application, Um, you know, user, very diverse personal. So at the end of the day, we want quality data accessible to them, um, trustworthy in in an easy consumable way. Um, however, how we get there looks very different in as you mentioned that the governance model in the old world has been very commander control, very centralized. Um, you know, they were responsible for quality. They were responsible for certification off the data, you know, applying making sure the data complies. But also such regulations Make sure you know, data gets discovered and made available in the world of the data mesh. Really. The job of the data governance as a function becomes finding that equilibrium between what decisions need to be um, you know, made and enforced globally. And what decisions need to be made locally so that we can have an interoperable measure. If data sets that can move fast and can change fast like it's really about instead of hardest, you know, kind of putting the putting those systems in a straitjacket of being constant and don't change, embrace, change and continuous change of landscape because that's that's just the reality we can't escape. So the role of governance really the governance model called Federated and Computational. And by that I mean, um, every domain needs to have a representative in the governance team. So the role of the data or domain data product owner who really were understand the data that domain really well but also wears that hacks of a product owner. It is an important role that had has to have a representation in the governance. So it's a federation off domains coming together, plus the SMEs and people have, you know, subject matter. Experts who understands the regulations in that environmental understands the data security concerns, but instead off trying to enforce and do this as a central team. They make decisions as what need to be standardized, what need to be enforced. And let's push that into that computational E and in an automated fashion into the into the camp platform itself. For example, instead of trying to do that, you know, be part of the data quality pipeline and inject ourselves as people in that process, let's actually, as a group, define what constitutes quality, like, how do we measure quality? And then let's automate that and let Z codify that into the platform so that every native products will have a C I City pipeline on as part of that pipeline. Those quality metrics gets validated and every day to product needs to publish those SLOC or service level objectives. So you know, whatever we choose as a measure of quality, maybe it's the, you know, the integrity of the data, the delay in the data, the liveliness of it, whatever the are the decisions that you're making, let's codify that. So it's, um, it's really, um, the role of the governance. The objectives of the governance team tried to satisfies the same, but how they do it. It is very, very different. I wrote a new article recently trying to explain the logical architecture that would emerge from applying these principles. And I put a kind of light table to compare and contrast the roll off the You know how we do governance today versus how we will do it differently to just give people a flavor of what does it mean to embrace the centralization? And what does it mean to embrace change and continuous change? Eso hopefully that that that could be helpful. >>Yes, very so many questions I haven't but the point you make it to data quality. Sometimes I feel like quality is the end game. Where is the end game? Should be how fast you could go from idea to monetization with the data service. What happens again? You sort of address this, but what happens to the underlying infrastructure? I mean, spinning a PC to S and S three buckets and my pie torches and tensor flows. And where does that that lives in the business? And who's responsible for that? >>Yeah, that's I'm glad you're asking this question. Maybe because, um, I truly believe we need to re imagine that world. I think there are many pieces that we can use Aziz utilities on foundational pieces, but I but I can see for myself a 5 to 7 year roadmap of building this new tooling. I think, in terms of the ownership, the question around ownership, if that would remains with the platform team, but and perhaps the domain agnostic, technology focused team right that there are providing instead of products themselves. And but the products are the users off those products are data product developers, right? Data domain teams that now have really high expectations in terms of low friction in terms of lead time to create a new data product. Eso We need a new set off tooling, and I think with the language needs to shift from, You know, I need a storage buckets. So I need a storage account. So I need a cluster to run my, you know, spark jobs, too. Here's the declaration of my data products. This is where the data for it will come from. This is the data that I want to serve. These are the policies that I need toe apply in terms of perhaps encryption or access control. Um, go make it happen. Platform, go provision, Everything that I mean so that as a data product developer. All I can focus on is the data itself, representation of semantic and representation of the syntax. And make sure that data meets the quality that I have that I have to assure and it's available. The rest of provisioning of everything that sits underneath will have to get taken care of by the platform. And that's what I mean by requires a re imagination and in fact, Andi, there will be a data platform team, the data platform teams that we set up for our clients. In fact, themselves have a favorite of complexity. Internally, they divide into multiple teams multiple planes, eso there would be a plane, as in a group of capabilities that satisfied that data product developer experience, there would be a set of capabilities that deal with those need a greatly underlying utilities. I call it at this point, utilities, because to me that the level of abstraction of the platform is to go higher than where it is. So what we call platform today are a set of utilities will be continuing to using will be continuing to using object storage, will continue using relation of databases and so on so there will be a plane and a group of people responsible for that. There will be a group of people responsible for capabilities that you know enable the mesh level functionality, for example, be able to correlate and connects. And query data from multiple knows. That's a measure level capability to be able to discover and explore the measure data products as a measure of capability. So it would be set of teams as part of platforms with a strong again platform product thinking embedded and product ownership embedded into that. To satisfy the experience of this now business oriented domain data team teams s way have a lot of work to do. >>I could go on. Unfortunately, we're out of time. But I guess my first I want to tell people there's two pieces that you put out so far. One is, uh, how to move beyond a monolithic data lake to a distributed data mesh. You guys should read that in a data mesh principles and logical architectures kind of part two. I guess my last question in the very limited time we have is our organization is ready for this. >>E think the desire is there I've bean overwhelmed with number off large and medium and small and private and public governments and federal, you know, organizations that reached out to us globally. I mean, it's not This is this is a global movement and I'm humbled by the response of the industry. I think they're the desire is there. The pains are really people acknowledge that something needs to change. Here s so that's the first step. I think that awareness isa spreading organizations. They're more and more becoming aware. In fact, many technology providers are reach out to us asking what you know, what shall we do? Because our clients are asking us, You know, people are already asking We need the data vision. We need the tooling to support. It s oh, that awareness is there In terms of the first step of being ready, However, the ingredients of a successful transformation requires top down and bottom up support. So it requires, you know, support from Chief Data Analytics officers or above the most successful clients that we have with data. Make sure the ones that you know the CEOs have made a statement that, you know, we want to change the experience of every single customer using data and we're going to do, we're going to commit to this. So the investment and support, you know, exists from top to all layers. The engineers are excited that maybe perhaps the traditional data teams are open to change. So there are a lot of ingredients. Substance to transformation is to come together. Um, are we really ready for it? I think I think the pioneers, perhaps the innovators. If you think about that innovation, careful. My doctors, probably pioneers and innovators and leaders. Doctors are making making move towards it. And hopefully, as the technology becomes more available, organizations that are less or in, you know, engineering oriented, they don't have the capability in house today, but they can buy it. They would come next. Maybe those are not the ones who aren't quite ready for it because the technology is not readily available. Requires, you know, internal investment today. >>I think you're right on. I think the leaders are gonna lead in hard, and they're gonna show us the path over the next several years. And I think the the end of this decade is gonna be defined a lot differently than the beginning. Jammeh. Thanks so much for coming in. The Cuban. Participate in the >>program. Pleasure head. >>Alright, Keep it right. Everybody went back right after this short break.

Published Date : Jan 22 2021

SUMMARY :

cloud brought to you by silicon angle in 2000 The modern big data movement It's a pleasure to have you on the program. This wonderful to be here. pretty outspoken about the need for a paradigm shift in how we manage our data and our platforms the only way we get access to you know various applications on the Web pages is to So on the left here we're adjusting data from the operational lot of data teams globally just to see, you know, what are the pain points? that's problematic for some of the organizations that you work with and maybe give some examples. And that transformation is that, you know, heavy process, because you fundamentally So let's talk about the answer that you and your colleagues are proposing. the changes we needed to make was always, you know, our fellow Noto, how the architecture was centralized And then you brought in the really important, you know, sticking problem, which is that you know the governance which So at the end of the day, we want quality data accessible to them, um, Where is the end game? And make sure that data meets the quality that I I guess my last question in the very limited time we have is our organization is ready So the investment and support, you know, Participate in the Alright, Keep it right.

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Jean Marc de Connie	PERSON	0.99+
Hal Varian	PERSON	0.99+
Zhamak Dehghani	PERSON	0.99+
New York City	LOCATION	0.99+
John Mark	PERSON	0.99+
5	QUANTITY	0.99+
Jeff Ham Abakar	PERSON	0.99+
two year	QUANTITY	0.99+
two pieces	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
John	PERSON	0.99+
nine months	QUANTITY	0.99+
2000	DATE	0.99+
18 months	QUANTITY	0.99+
first step	QUANTITY	0.99+
second principle	QUANTITY	0.99+
both places	QUANTITY	0.99+
both	QUANTITY	0.99+
One	QUANTITY	0.99+
a year	QUANTITY	0.99+
one part	QUANTITY	0.99+
first	QUANTITY	0.99+
Claudette Cloudera	PERSON	0.99+
third principle	QUANTITY	0.98+
10	DATE	0.98+
first principle	QUANTITY	0.98+
one domain	QUANTITY	0.98+
today	DATE	0.98+
Lee	PERSON	0.98+
one phrase	QUANTITY	0.98+
three phases	QUANTITY	0.98+
Cuban	OTHER	0.98+
Jammeh	PERSON	0.97+
7 year	QUANTITY	0.97+
Mawr	PERSON	0.97+
Jamaat	PERSON	0.97+
last decade	DATE	0.97+
Maurin Mawr	PERSON	0.94+
single domain	QUANTITY	0.92+
one thing	QUANTITY	0.91+
ThoughtWorks	ORGANIZATION	0.9+
one	QUANTITY	0.9+
nine	QUANTITY	0.9+
theCUBE	ORGANIZATION	0.89+
end	DATE	0.88+
last few decades	DATE	0.87+
one place	QUANTITY	0.87+
Second Hadoop World	EVENT	0.86+
three	OTHER	0.85+
C. E. O	ORGANIZATION	0.84+
this decade	DATE	0.84+
Siris	TITLE	0.83+
coming decade	DATE	0.83+
Andi	PERSON	0.81+
Chamakh	PERSON	0.8+
three buckets	QUANTITY	0.77+
Jama	PERSON	0.77+
Cuban	PERSON	0.76+
Aziz	ORGANIZATION	0.72+
years	DATE	0.72+
first class	QUANTITY	0.72+
last 40	DATE	0.67+
single customer	QUANTITY	0.66+
part two	OTHER	0.66+
last	DATE	0.66+
Cloud	TITLE	0.56+
2021	DATE	0.55+
next 10 years	DATE	0.54+
Hadoop	EVENT	0.53+
following year	DATE	0.53+
years	QUANTITY	0.51+
Cube	ORGANIZATION	0.5+
Noto	ORGANIZATION	0.45+
Cube	PERSON	0.39+
Cube	COMMERCIAL_ITEM	0.26+

Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks

(bright music) >> In 2009, Hal Varian, Google's Chief Economist said that statisticians would be the sexiest job in the coming decade. The modern big data movement really took off later in the following year, after the second Hadoop World, which was hosted by Cloudera, in New York city. Jeff Hama Bachar, famously declared to me and John Furrie, in "theCUBE," that the best minds of his generation were trying to figure out how to get people to click on ads. And he said that sucks. The industry was abuzz with the realization that data was the new competitive weapon. Hadoop was heralded as the new data management paradigm. Now what actually transpired over the next 10 years was only a small handful of companies could really master the complexities of big data and attract the data science talent, really necessary to realize massive returns. As well, back then, cloud was in the early stages of its adoption. When you think about it at the beginning of the last decade, and as the years passed, more and more data got moved to the cloud, and the number of data sources absolutely exploded, experimentation accelerated, as did the pace of change. Complexity just overwhelmed big data infrastructures and data teams, leading to a continuous stream of incremental technical improvements designed to try and keep pace, things like data lakes, data hubs, new open source projects, new tools, which piled on even more complexity. And as we reported, we believe what's needed is a complete bit flip and how we approach data architectures. Our next guest is Zhamak Dehgani, who is the Director of Emerging Technologies at ThoughtWorks. Zhamak is a software engineer, architect, thought leader and advisor, to some of the world's most prominent enterprises. She's in my view, one of the foremost advocates for rethinking and changing the way we create and manage data architectures, favoring a decentralized over monolithic structure, and elevating domain knowledge as a primary criterion, and how we organize so-called big data teams and platforms. Zhamak, welcome to the cube, it's a pleasure to have you on the program. >> Hi David, it's wonderful to be here. >> Okay. So you're pretty outspoken about the need for a paradigm shift, in how we manage our data, and our platforms at scale. Why do you feel we need such a radical change? What's your thoughts there? >> Well, I think if you just look back over the last decades, you gave us a summary of what happened since 2010. But even if we got it before then, what we have done over the last few decades is basically repeating, and as you mentioned, incrementally improving how we manage data, based on certain assumptions around, as you mentioned, centralization. Data has to be in one place so we can get value from it. But if you look at the parallel movement of our industry in general, since the birth of internet, we are actually moving towards decentralization. If we think today, like if in this move data side, if we said, the only way web would work, the only way we get access to various applications on the web or pages is to centralize it, we would laugh at that idea, but for some reason, we don't question that when it comes to data, right? So I think it's time to embrace the complexity that comes with the growth of number of sources, the proliferation of sources and consumptions models, embrace the distribution of sources of data, that they're not just within one part of organization. They're not just within even bounds of organizations. They're beyond the bounds of organization, and then look back and say, okay, if that's the trend of our industry in general, given the fabric of compensation and data that we put in globally in place, then how the architecture and technology and organizational structure incentives need to move, to embrace that complexity. And to me, that requires a paradigm shift. A full stack from how we organize our organizations, how we organize our teams, how we put a technology in place to look at it from a decentralized angle. >> Okay, so let's unpack that a little bit. I mean, you've spoken about and written today's big architecture, and you've basically just mentioned that it's flawed. So I want to bring up, I love your diagrams, you have a simple diagram, guys if you could bring up figure one. So on the left here, we're adjusting data from the operational systems, and other enterprise data sets. And of course, external data, we cleanse it, you've got to do the quality thing, and then serve them up to the business. So what's wrong with that picture that we just described, and give granted it's a simplified form. >> Yeah. Quite a few things. So, and I would flip the question maybe back to you or the audience. If we said that there are so many sources of the data and actually data comes from systems and from teams that are very diverse in terms of domains, right? Domain. If you just think about, I don't know, retail, the E-Commerce versus auto management, versus customer. These are very diverse domains. The data comes from many different diverse domains, and then we expect to put them under the control of a centralized team, a centralized system. And I know that centralization probably, if you zoom out is centralized, if you zoom in it's compartmentalized based on functions, and we can talk about that. And we assume that the centralized model, will be getting that data, making sense of it, cleansing and transforming it, then to satisfy a need of very diverse set of consumers without really understanding the domains because the teams responsible for it are not close to the source of the data. So there is a bit of a cognitive gap and domain understanding gap, without really understanding how the data is going to be used. I've talked to numerous, when we came to this, I came up with the idea. I talked to a lot of data teams globally, just to see, what are the pain points? How are they doing it? And one thing that was evident in all of those conversations, that they actually didn't know, after they built these pipelines and put the data in, whether the data warehouse tables or linked, they didn't know how the data was being used. But yet they're responsible for making the data available for this diverse set of use cases. So essentially system and monolithic system, often is a bottleneck. So what you find is that a lot of the teams are struggling with satisfying the needs of the consumers, are struggling with really understanding the data, the domain knowledge is lost, there is a loss of understanding and kind of it in that transformation, often we end up training machine learning models on data, that is not really representative of the reality of the business, and then we put them to production and they don't work because the semantic and the syntax of the data gets lost within that translation. So, and we are struggling with finding people to manage a centralized system because still the technology's fairly, in my opinion, fairly low level and exposes the users of those technology sets and let's say they warehouse a lot of complexity. So in summary, I think it's a bottleneck, it's not going to satisfy the pace of change or pace of innovation, and the availability of sources. It's disconnected and fragmented, even though there's centralized, it's disconnected and fragmented from where the data comes from and where the data gets used, and is managed by a team of hyper specialized people, they're struggling to understand the actual value of the data, the actual format of the data. So it's not going to get us where our aspirations, our ambitions need to be. >> Yeah, so the big data platform is essentially, I think you call it context agnostic. And so as data becomes more important in our lives, you've got all these new data sources injected into the system, experimentation as we said, the cloud becomes much, much easier. So one of the blockers that you've cited and you just mentioned it, is you've got these hyper specialized roles, the data engineer, the quality engineer, data scientist. And it's a losery. I mean, it's like an illusion. These guys, they seemingly they're independent, and can scale independently, but I think you've made the point that in fact, they can't. That a change in a data source has an effect across the entire data life cycle, entire data pipeline. So maybe you could add some some color to why that's problematic for some of the organizations that you work with, and maybe give some examples. >> Yeah, absolutely. So in fact initially, the hypothesis around data mesh came from a series of requests that we received from our both large scale and progressive clients, and progressive in terms of their investment in data architecture. So these were clients that were larger scale, they had diverse and rich set of domain, some of them were big technology, tech companies, some of them were big retail companies, big healthcare companies. So they had that diversity of the data and a number of the sources of the domains. They had invested for quite a few years in generations, of they had multi-generations of PROPRICER data warehouses on prem that were moving to cloud. They had moved through the various revisions of the Hadoop clusters, and they were moving to that to cloud, and then the challenges that they were facing were simply... If I want to just simplify it in one phrase, they we're not getting value from the data that they were collecting. They were continuously struggling to shift the culture because there was so much friction between all of these three phases of both consumption of the data, then transformation and making it available. Consumption from sources and then providing it and serving it to the consumer. So that whole process was full of friction. Everybody was unhappy. So it's bottom line is that you're collecting all this data, there is delay, there is lack of trust in the data itself, because the data is not representative of the reality, it's gone through the transformation, but people that didn't understand really what the data was got delayed. And so there's no trust, it's hard to get to the data. Ultimately, it's hard to create value from the data, and people are working really hard and under a lot of pressure, but it's still struggling. So we often, our solutions, like we are... Technologies, we will often point out to technology. So we go. Okay, this version of some proprietary data warehouse we're using is not the right thing. We should go to the cloud and that certainly will solve our problem, right? Or warehouse wasn't a good one, let's make a data Lake version. So instead of extracting and then transforming and loading into the database, and that transformation is that heavy process because you fundamentally made an assumption using warehouses that if I transform this data into this multidimensional perfectly designed schema, that then everybody can draw on whatever query they want, that's going to solve everybody's problem. But in reality, it doesn't because you are delayed and there is no universal model that serves everybody's need, everybody needs are diverse. Data scientists necessarily don't like the perfectly modeled data, they're for both signals and the noise. So then we've just gone from ATLs to let's say now to Lake, which is... Okay, let's move the transformation to the last mile. Let's just get load the data into the object stores and sort of semi-structured files and get the data scientists use it, but they still struggling because of the problems that we mentioned. So then what is the solution? What is the solution? Well, next generation data platform. Let's put it on the cloud. And we saw clients that actually had gone through a year or multiple years of migration to the cloud but it was great, 18 months, I've seen nine months migrations of the warehouse versus two year migrations of various data sources to the cloud. But ultimately the result is the same, unsatisfied, frustrated data users, data providers with lack of ability to innovate quickly on relevant data and have an experience that they deserve to have, have a delightful experience of discovering and exploring data that they trust. And all of that was still amiss. So something else more fundamentally needed to change than just the technology. >> So the linchpin to your scenario is this notion of context. And you pointed out, you made the other observation that "Look we've made our operational systems context aware but our data platforms are not." And like CRM system sales guys are very comfortable with what's in the CRMs system. They own the data. So let's talk about the answer that you and your colleagues are proposing. You're essentially flipping the architecture whereby those domain knowledge workers, the builders if you will, of data products or data services, they are now first-class citizens in the data flow, and they're injecting by design domain knowledge into the system. So I want to put up another one of your charts guys, bring up the figure two there. It talks about convergence. She showed data distributed, domain driven architecture, the self-serve platform design, and this notion of product thinking. So maybe you could explain why this approach is so desirable in your view. >> Sure. The motivation and inspirations for that approach came from studying what has happened over the last few decades in operational systems. We had a very similar problem prior to microservices with monolithic systems. One of the things systems where the bottleneck, the changes we needed to make was always on vertical now to how the architecture was centralized. And we found a nice niche. And I'm not saying this is a perfect way of decoupling your monolith, but it's a way that currently where we are in our journey to become data driven, it is a nice place to be, which is distribution or a decomposition of your system as well as organization. I think whenever we talk about systems, we've got to talk about people and teams that are responsible for managing those systems. So the decomposition of the systems and the teams, and the data around domains. Because that's how today we are decoupling our business, right? We are decoupling our businesses around domains, and that's a good thing. And what does that do really for us? What it does is it localizes change to the bounded context of that business. It creates clear boundary and interfaces and contracts between the rest of the universe of the organization, and that particular team, so removes the friction that often we have for both managing the change, and both serving data or capability. So if the first principle of data meshes, let's decouple this world of analytical data the same to mirror. The same way we have decoupled our systems and teams, and business. Why data is any different. And the moment you do that, so the moment you bring the ownership to people who understands the data best, then you get questions that well, how is that any different from silos of disconnected databases that we have today and nobody can get to the data? So then the rest of the principles is really to address all of the challenges that comes with this first principle of decomposition around domain context. And the second principle is, well, we have to expect a certain level of quality and accountability, and responsibility for the teams that provide the data. So let's bring products thinking and treating data as a product, to the data that these teams now share, and let's put accountability around it. We need a new set of incentives and metrics for domain teams to share the data, we need to have a new set of kind of quality metrics that define what it means for the data to be a product, and we can go through that conversation perhaps later. So then the second principle is, okay, the teams now that are responsible, the domain teams responsible for their analytical data need to provide that data with a certain level of quality and assurance. Let's call that a product, and bring product thinking to that. And then the next question you get asked off at work by CIO or CTO is the people who build the infrastructure and spend the money. They say, well, "It's actually quite complex to manage big data, now where we want everybody, every independent team to manage the full stack of storage and computation and pipelines and access control and all of that." Well, we've solved that problem in operational world. And that requires really a new level of platform thinking to provide infrastructure and tooling to the domain teams to now be able to manage and serve their big data, and I think that requires re-imagining the world of our tooling and technology. But for now, let's just assume that we need a new level of abstraction to hide away a ton of complexity that unnecessarily people get exposed to. And that's the third principle of creating self-serve infrastructure to allow autonomous teams to build their domains. But then the last pillar, the last fundamental pillar is okay, once he distributed a problem into smaller problems that you found yourself with another set of problems, which is how I'm going to connect this data. The insights happens and emerges from the interconnection of the data domains, right? It's just not necessarily locked into one domain. So the concerns around interoperability and standardization and getting value as a result of composition and interconnection of these domains requires a new approach to governance. And we have to think about governance very differently based on a federated model. And based on a computational model. Like once we have this powerful self-serve platform, we can computationally automate a lot of covenants decisions and security decisions, and policy decisions, that applies to this fabric of mesh, not just a single domain or not in a centralized. So really, as you mentioned, the most important component of the data mesh is distribution of ownership and distribution of architecture in data, the rest of them is to solve all the problems that come with that. >> So, very powerful. And guys, we actually have a picture of what Zhamak just described. Bring up figure three, if you would. So I mean, essentially, you're advocating for the pushing of the pipeline and all its various functions into the lines of business and abstracting that complexity of the underlying infrastructure which you kind of show here in this figure, data infrastructure as a platform down below. And you know why I love about this, Zhamak, is, to me it underscores the data is not the new oil. Because I can put oil in my car, I can put it in my house but I can't put the same code in both places. But I think you call it polyglot data, which is really different forms, batch or whatever. But the same data doesn't follow the laws of scarcity. I can use the same data for many, many uses, and that's what this sort of graphic shows. And then you brought in the really important, sticking problem, which is that the governance which is now not a command and control, it's federated governance. So maybe you could add some thoughts on that. >> Sure, absolutely. It's one of those, I think I keep referring to data mesh as a paradigm shift, and it's not just to make it sound grand and like kind of grand and exciting or important, it's really because I want to point out, we need to question every moment when we make a decision around, how we're going to design security, or governance or modeling of the data. We need to reflect and go back and say, "Am I applying some of my cognitive biases around how I have worked for the last 40 years?" I've seen it work? Or "Do I do I really need to question?" And do need to question the way we have applied governance. I think at the end of the day, the role of the data governance and the objective remains the same. I mean, we all want quality data accessible to a diverse set of users and its users now know have different personas, like data persona, data analysts, data scientists, data application user. These are very diverse personas. So at the end of the day, we want quality data accessible to them, trustworthy in an easy consumable way. However, how we get there looks very different in as you mentioned that the governance model in the old world has been very command and control, very centralized. They were responsible for quality, they were responsible for certification of the data, applying and making sure the data complies with all sorts of regulations, make sure data gets discovered and made available. In the world of data mesh, really the job of the data governance as a function becomes finding the equilibrium between what decisions need to be made and enforced globally, and what decisions need to be made locally so that we can have an interoperable mesh of data sets that can move fast and can change fast. It's really about, instead of kind of putting those systems in a straight jacket of being constantly and don't change, embrace change, and continuous change of landscape because that's just the reality we can't escape. So the role of governance really, the modern governance model I called federated and computational. And by that I mean, every domain needs to have a representative in the governance team. So the role of the data or domain data product owner who really were understands that domain really well, but also wears that hats of the product owner. It's an important role that has to have a representation in the governance. So it's a federation of domains coming together. Plus the SMEs, and people have Subject Matter Experts who understand the regulations in that environment, who understands the data security concerns. But instead of trying to enforce and do this as a central team, they make decisions as what needs to be standardized. What needs to be enforced. And let's push that into that computationally and in an automated fashion into the platform itself, For example. Instead of trying to be part of the data quality pipeline and inject ourselves as people in that process, let's actually as a group, define what constitutes quality. How do we measure quality? And then let's automate that, and let's codify that into the platform, so that every day the products will have a CICD pipeline, and as part of that pipeline, law's quality metrics gets validated, and every day to product needs to publish those SLOs or Service Level Objectives, or whatever we choose as a measure of quality, maybe it's the integrity of the data, or the delay in the data, the liveliness of the data, whatever are the decisions that you're making. Let's codify that. So it's really the objectives of the governance team trying to satisfies the same, but how they do it, it's very, very different. And I wrote a new article recently, trying to explain the logical architecture that would emerge from applying these principles, and I put a kind of a light table to compare and contrast how we do governance today, versus how we'll do it differently, to just give people a flavor of what does it mean to embrace decentralization, and what does it mean to embrace change, and continuous change. So hopefully that could be helpful. >> Yes. There's so many questions I have. But the point you make it too on data quality, sometimes I feel like quality is the end game, Where the end game should be how fast you can go from idea to monetization with a data service. What happens again? And you've sort of addressed this, but what happens to the underlying infrastructure? I mean, spinning up EC2s and S3 buckets, and MyPytorches and TensorFlows. That lives in the business, and who's responding for that? >> Yeah, that's why I'm glad you're asking this question, David, because I truly believe we need to reimagine that world. I think there are many pieces that we can use as utilities are foundational pieces, but I can see for myself at five to seven year road map building this new tooling. I think in terms of the ownership, the question around ownership, that would remain with the platform team, but I don't perhaps a domain agnostic technology focused team, right? That there are providing a set of products themselves, but the users of those products are data product developers, right? Data domain teams that now have really high expectations, in terms of low friction, in terms of a lead time to create a new data products. So we need a new set of tooling and I think the language needs to shift from I need a storage bucket, or I need a storage account, to I need a cluster to run my spark jobs. Too, here's the declaration of my data products. This is where the data file will come from, this is a data that I want to serve, these are the policies that I need to apply in terms of perhaps encryption or access control, go make it happen platform, go provision everything that I need, so that as a data product developer, all I can focus on is the data itself. Representation of semantic and representation of the syntax, and make sure that data meets the quality that I have to assure and it's available. The rest of provisioning of everything that sits underneath will have to get taken care of by the platform. And that's what I mean by requires a reimagination. And there will be a data platform team. The data platform teams that we set up for our clients, in fact themselves have a fair bit of complexity internally, they divide into multiple teams, multiple planes. So there would be a plane, as in a group of capabilities that satisfied that data product developer experience. There would be a set of capabilities that deal with those nitty gritty underlying utilities, I call them (indistinct) utilities because to me, the level of abstraction of the platform needs to go higher than where it is. So what we call platform today are a set of utilities we'll be continuing to using. We'll be continuing to using object storage, we will continue to using relational databases and so on. So there will be a plane and a group of people responsible for that. There will be a group of people responsible for capabilities that enable the mesh level functionality, for example, be able to correlate and connect and query data from multiple nodes, that's a mesh level capability, to be able to discover and explore the mesh of data products, that's the mesh of capability. So it would be a set of teams as part of platform. So we use a strong, again, products thinking embedded in a product and ownership embedded into that to satisfy the experience of this now business oriented domain data teams. So we have a lot of work to do. >> I could go on, unfortunately, we're out of time, but I guess, first of all, I want to tell people there's two pieces that you've put out so far. One is how to move beyond a Monolithic Data Lake to a distributed data mesh. You guys should read that in the "Data Mesh Principles and Logical Architecture," is kind of part two. I guess my last question in the very limited time we have is are organizations ready for this? >> I think how the desire is there. I've been overwhelmed with the number of large and medium and small and private and public, and governments and federal organizations that reached out to us globally. I mean, this is a global movement and I'm humbled by the response of the industry. I think, the desire is there, the pains are real, people acknowledge that something needs to change here. So that's the first step. I think awareness is spreading, organizations are more and more becoming aware, in fact, many technology providers are reaching to us asking what shall we do because our clients are asking us, people are already asking, we need the data mesh and we need the tooling to support it. So that awareness is there in terms of the first step of being ready. However, the ingredients of a successful transformation requires top-down and bottom-up support. So it requires support from chief data analytics officers, all above, the most successful clients that we have with data mesh are the ones that, the CEOs have made a statement that, "We'd want to change the experience of every single customer using data, and we're going to commit to this." So the investment and support exists from top to all layers, the engineers are excited, the maybe perhaps the traditional data teams are open to change. So there are a lot of ingredients of transformations that come together. Are we really ready for it? I think the pioneers, perhaps, the innovators if you think about that innovation curve of adopters, probably pioneers and innovators and lead adopters are making moves towards it, and hopefully as the technology becomes more available, organizations that are less engineering oriented, they don't have the capability in-house today, but they can buy it, they would come next. Maybe those are not the ones who are quite ready for it because the technology is not readily available and requires internal investments to make. >> I think you're right on. I think the leaders are going to lean in hard and they're going to show us the path over the next several years. And I think that the end of this decade is going to be defined a lot differently than the beginning. Zhamak, thanks so much for coming to "theCUBE" and participating in the program. >> Thank you for hosting me, David. >> Pleasure having you. >> It's been wonderful. >> All right, keep it right there everybody, we'll be back right after this short break. (slow music)

Published Date : Dec 23 2020

SUMMARY :

and attract the data science and our platforms at scale. and data that we put in globally in place, So on the left here, we're adjusting data how the data is going to be used. So one of the blockers that you've cited and a number of the So the linchpin to your scenario for the data to be a product, is that the governance So at the end of the day, we But the point you make and make sure that data meets the quality in the "Data Mesh Principles and hopefully as the technology and participating in the program. after this short break.

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
David	PERSON	0.99+
Michael	PERSON	0.99+
Marc Lemire	PERSON	0.99+
Chris O'Brien	PERSON	0.99+
Verizon	ORGANIZATION	0.99+
Hilary	PERSON	0.99+
Mark	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Ildiko Vancsa	PERSON	0.99+
John	PERSON	0.99+
Alan Cohen	PERSON	0.99+
Lisa Martin	PERSON	0.99+
John Troyer	PERSON	0.99+
Rajiv	PERSON	0.99+
Europe	LOCATION	0.99+
Stefan Renner	PERSON	0.99+
Ildiko	PERSON	0.99+
Mark Lohmeyer	PERSON	0.99+
JJ Davis	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Beth	PERSON	0.99+
Jon Bakke	PERSON	0.99+
John Farrier	PERSON	0.99+
Boeing	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Dave Nicholson	PERSON	0.99+
Cassandra Garber	PERSON	0.99+
Peter McKay	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Dave Brown	PERSON	0.99+
Beth Cohen	PERSON	0.99+
Stu Miniman	PERSON	0.99+
John Walls	PERSON	0.99+
Seth Dobrin	PERSON	0.99+
Seattle	LOCATION	0.99+
5	QUANTITY	0.99+
Hal Varian	PERSON	0.99+
JJ	PERSON	0.99+
Jen Saavedra	PERSON	0.99+
Michael Loomis	PERSON	0.99+
Lisa	PERSON	0.99+
Jon	PERSON	0.99+
Rajiv Ramaswami	PERSON	0.99+
Stefan	PERSON	0.99+

Democratizing AI and Advanced Analytics with Dataiku x Snowflake

>>My name is Dave Volonte, and with me are two world class technologists, visionaries and entrepreneurs. And Wa Dodgeville is the he co founded Snowflake, and he's now the president of the product division. And Florian Duetto is the co founder and CEO of Data Aiko. Gentlemen, welcome to the Cube to first timers. Love it. >>Great to be here >>now, Florian you and Ben Wa You have a number of customers in common. And I have said many times on the Cube that you know, the first era of cloud was really about infrastructure, making it more agile, taking out costs. And the next generation of innovation is really coming from the application of machine intelligence to data with the cloud is really the scale platform. So is that premise your relevant to you? Do you buy that? And and why do you think snowflake and data ICU make a good match for customers? >>I think that because it's our values that are aligned when it's all about actually today allowing complexity for customers. So you close the gap or the democratizing access to data access to technology. It's not only about data data is important, but it's also about the impact of data. Who can you make the best out of data as fast as possible as easily as possible within an organization. And another value is about just the openness of the platform building the future together? Uh, I think a platform that is not just about the platform but also full ecosystem of partners around it, bringing the level off accessibility and flexibility you need for the 10 years away. >>Yeah, so that's key. But it's not just data. It's turning data into insights. Have been why you came out of the world of very powerful but highly complex databases. And we know we all know that you and the snowflake team you get very high marks for really radically simplifying customers lives. But can you talk specifically about the types of challenges that your customers air using snowflake to solve? >>Yeah, so So the really the challenge, you know, be four. Snowflake. I would say waas really? To put all the data, you know, in one place and run all the computers, all the workloads that you wanted to run, You know, against that data and off course, you know, existing legacy platforms. We're not able to support. You know that level of concurrency, Many workload. You know, we we talk about machine learning that a science that are engendering, you know, that our house big data were closed or running in one place didn't make sense at all. And therefore, you know what customers did is to create silos, silos of data everywhere, you know, with different system having a subset of the data. And of course, now you cannot analyze this data in one place. So, snowflake, we really solve that problem by creating a single, you know, architectural where you can put all the data in the cloud. So it's a really cloud native we really thought about You know how to solve that problem, how to create, you know, leverage, Cloud and the lessee cc off cloud to really put all the die in one place, but at the same time not run all workload at the same place. So each workload that runs in Snowflake that is dedicated, You know, computer resource is to run, and that makes it very Ajai, right? You know, Floyd and talk about, you know, data scientists having to run analysis, so they need you know a lot of compute resources, but only for, you know, a few hours on. Do you know, with snowflake they can run these new work lord at this workload to the system, get the compute resources that they need to run this workload. And when it's over, they can shut down. You know that their system, it will be automatically shut down. Therefore, they would not pay for the resources that they don't use. So it's a very Ajai system where you can do this, analyzes when you need, and you have all the power to run all this workload at the same time. >>Well, it's profound what you guys built to me. I mean, of course, everybody's trying to copy it now. It was like, remember that bringing the notion of bringing compute to the data and the Hadoop days, and I think that that Asai say everybody is sort of following your suit now are trying to Florian I gotta say the first data scientist I ever interviewed on the Cube was amazing. Hilary Mason, right after she started a bit Lee. And, you know, she made data science that sounds so compelling. But data science is hard. So same same question for you. What do you see is the biggest challenges for customers that they're facing with data science. >>The biggest challenge, from my perspective, is that owns you solve the issue of the data. Seidel with snowflake, you don't want to bring another Seidel, which would be a side off skills. Essentially, there is to the talent gap between the talented label of the market, or are it is to actually find recruits trained data scientist on what needs to be done. And so you need actually to simplify the access to technologies such as every organization can make it, whatever the talent, by bridging that gap and to get there, there is a need of actually breaking up the silos. And in a collaborative approach where technologists and business work together and actually put some their hands into those data projects together, >>it makes sense for flooring. Let's stay with you for a minute. If I can your observation spaces, you know it's pretty, pretty global, and and so you have a unique perspective on how companies around the world might be using data and data science. Are you seeing any trends may be differences between regions or maybe within different industries. What are you seeing? >>Yes. Yeah, definitely. I do see trends that are not geographic that much, but much more in terms of maturity of certain industries and certain sectors, which are that certain industries invested a lot in terms of data, data access, ability to start data in the last few years and no age, a level of maturity where they can invest more and get to the next steps. And it's really rely on the ability of certain medial certain organization actually to have built this long term strategy a few years ago and no start raping up the benefits. >>You know, a decade ago, Florian Hal Varian, we, you know, famously said that the sexy job in the next 10 years will be statisticians. And then everybody sort of change that to data scientists and then everybody. All the statisticians became data scientists, and they got a raise. But data science requires more than just statistics acumen. What what skills >>do >>you see as critical for the next generation of data science? >>Yeah, it's a good question because I think the first generation of the patient is became the licenses because they could done some pipe and quickly on be flexible. And I think that the skills or the next generation of data sentences will definitely be different. It will be first about being able to speak the language of the business, meaning, oh, you translate data inside predictive modeling all of this into actionable insight or business impact. And it would be about you collaborate with the rest of the business. It's not just a farce. You can build something off fast. You can do a notebook in python or your credit models off themselves. It's about, oh, you actually build this bridge with the business. And obviously those things are important. But we also has become the center of the fact that technology will evolve in the future. There will be new tools and technologies, and they will still need to keep this level of flexibility and get to understand quickly, quickly. What are the next tools they need to use the new languages or whatever to get there. >>As you look back on 2020 what are you thinking? What are you telling people as we head into next year? >>Yeah, I I think it's Zaveri interesting, right? We did this crisis, as has told us that the world really can change from one day to the next. And this has, you know, dramatic, you know, and perform the, you know, aspect. For example, companies all the sudden, you know, So their revenue line, you know, dropping. And they had to do less meat data. Some of the companies was the reverse, right? All the sudden, you know, they were online, like in stock out, for example, and their business, you know, completely, you know, change, you know, from one day to the other. So this GT off, You know, I, you know, adjusting the resource is that you have tow the task a need that can change, you know, using solution like snowflakes, you know, really has that. And we saw, you know, both in in our customers some customers from one day to the to do the next where, you know, growing like big time because they benefited, you know, from from from from co vid and their business benefited, but also, as you know, had to drop. And what is nice with with with cloud, it allows to, you know, I just compute resources toe, you know, to your business needs, you know, and really adjusted, you know, in our, uh, the the other aspect is is understanding what is happening, right? You need to analyze the we saw all these all our customers basically wanted to understand. What is that going to be the impact on my business? How can I adapt? How can I adjust? And and for that, they needed to analyze data. And, of course, a lot of data which are not necessarily data about, you know, their business, but also data from the outside. You know, for example, coffee data, You know, where is the States? You know, what is the impact? You know, geographic impact from covitz, You know, all the time and access to this data is critical. So this is, you know, the promise off the data crowd, right? You know, having one single place where you can put all the data off the world. So our customers, all the Children you know, started to consume the cov data from our that our marketplace and and we had the literally thousands of customers looking at this data analyzing this data, uh, to make good decisions So this agility and and and this, you know, adapt adapting, you know, from from one hour to the next is really critical. And that goes, you know, with data with crowding adjusting, resource is on and that's, you know, doesn't exist on premise. So So So indeed, I think the lesson learned is is we are living in a world which machines changing all the time and we have for understanding We have to adjust and and And that's why cloud, you know, somewhere it's great. >>Excellent. Thank you. You know the kid we like to talk about disruption, of course. Who doesn't on And also, I mean, you look at a I and and the impact that is beginning to have and kind of pre co vid. You look at some of the industries that were getting disrupted by, you know, we talked about digital transformation and you had on the one end of the spectrum industries like publishing which are highly disrupted or taxis. And you could say Okay, well, that's, you know, bits versus Adam, the old Negroponte thing. But then the flip side of that look at financial services that hadn't been dramatically disrupted. Certainly healthcare, which is ripe for disruption Defense. So the number number of industries that really hadn't leaned into digital transformation If it ain't broke, don't fix it. Not on my watch. There was this complacency and then, >>of >>course, co vid broke everything. So, florian, I wonder if you could comment? You know what industry or industries do you think you're gonna be most impacted by data science and what I call machine intelligence or a I in the coming years and decades? >>Honestly, I think it's all of them artist, most of them because for some industries, the impact is very visible because we're talking about brand new products, drones like cars or whatever that are very visible for us. But for others, we are talking about sport from changes in the way you operate as an organization, even if financial industry itself doesn't seems to be so impacted when you look it from the consumer side or the outside. In fact, internally, it's probably impacted just because the way you use data on developer for flexibility, you need the kind off cost gay you can get by leveraging the latest technologies is just enormous, and so it will actually transform the industry that also and overall, I think that 2020 is only a where, from the perspective of a I and analytics, we understood this idea of maturity and resilience, maturity, meaning that when you've got a crisis, you actually need data and ai more than before. You need to actually call the people from data in the room to take better decisions and look for a while and not background. And I think that's a very important learning from 2020 that will tell things about 2021 and the resilience it's like, Yeah, Data Analytics today is a function consuming every industries and is so important that it's something that needs to work. So the infrastructure is to work in frustration in super resilient. So probably not on prime on a fully and prime at some point and the kind of residence where you need to be able to plan for literally anything like no hypothesis in terms of behaviors can be taken for granted. And that's something that is new and which is just signaling that we're just getting to the next step for the analytics. >>I wonder, Benoit, if you have anything to add to that. I mean, I often wonder, you know, winter machine's gonna be able to make better diagnoses than doctors. Some people say already, you know? Well, the financial services traditional banks lose control of payment systems. Uh, you know what's gonna happen to big retail stores? I mean, maybe bring us home with maybe some of your final thoughts. >>Yeah, I would say, you know, I I don't see that as a negative, right? The human being will always be involved very closely, but the machine and the data can really have, you know, see, Coalition, you know, in the data that that would be impossible for for for human being alone, you know, you know, to to discover so So I think it's going to be a compliment, not a replacement on. Do you know everything that has made us you know faster, you know, doesn't mean that that we have less work to do. It means that we can doom or and and we have so much, you know, to do, uh, that that I would not be worried about, You know, the effect off being more efficient and and and better at at our you know, work. And indeed, you know, I fundamentally think that that data, you know, processing off images and doing, you know, I ai on on on these images and discovering, you know, patterns and and potentially flagging, you know, disease, where all year that then it was possible is going toe have a huge impact in in health care, Onda and And as as as Ryan was saying, every you know, every industry is going to be impacted by by that technology. So So, yeah, I'm very optimistic. >>Great guys. I wish we had more time. I gotta leave it there. But so thanks so much for coming on. The Cube was really a pleasure having you.

Published Date : Nov 20 2020

SUMMARY :

And Wa Dodgeville is the he co founded And I have said many times on the Cube that you know, the first era of cloud was really about infrastructure, So you close the gap or the democratizing access to data And we know we all know that you and the snowflake team you get very high marks for Yeah, so So the really the challenge, you know, be four. And, you know, And so you need actually to simplify the access to you know it's pretty, pretty global, and and so you have a unique perspective on how companies the ability of certain medial certain organization actually to have built this long term strategy You know, a decade ago, Florian Hal Varian, we, you know, famously said that the sexy job in the next And it would be about you collaborate with the rest of the business. So our customers, all the Children you know, started to consume the cov you know, we talked about digital transformation and you had on the one end of the spectrum industries You know what industry or industries do you think you're gonna be most impacted by data the kind of residence where you need to be able to plan for literally I mean, I often wonder, you know, winter machine's gonna be able to make better diagnoses that data, you know, processing off images and doing, you know, I ai on I gotta leave it there.

ENTITIES

Entity	Category	Confidence
Dave Volonte	PERSON	0.99+
Florian Duetto	PERSON	0.99+
Hilary Mason	PERSON	0.99+
Florian Hal Varian	PERSON	0.99+
Florian	PERSON	0.99+
Benoit	PERSON	0.99+
Ryan	PERSON	0.99+
Ben Wa	PERSON	0.99+
Data Aiko	ORGANIZATION	0.99+
2020	DATE	0.99+
10 years	QUANTITY	0.99+
Lee	PERSON	0.99+
Wa Dodgeville	PERSON	0.99+
next year	DATE	0.99+
python	TITLE	0.99+
Snowflake	ORGANIZATION	0.99+
first	QUANTITY	0.99+
one place	QUANTITY	0.99+
one hour	QUANTITY	0.98+
a decade ago	DATE	0.98+
Floyd	PERSON	0.98+
2021	DATE	0.98+
one day	QUANTITY	0.98+
both	QUANTITY	0.97+
today	DATE	0.97+
first generation	QUANTITY	0.96+
Adam	PERSON	0.93+
Onda	ORGANIZATION	0.93+
one single place	QUANTITY	0.93+
florian	PERSON	0.93+
each workload	QUANTITY	0.92+
one	QUANTITY	0.91+
four	QUANTITY	0.9+
few years ago	DATE	0.88+
thousands of customers	QUANTITY	0.88+
Cube	COMMERCIAL_ITEM	0.87+
first data scientist	QUANTITY	0.84+
single	QUANTITY	0.83+
Asai	PERSON	0.82+
two world	QUANTITY	0.81+
first era	QUANTITY	0.74+
next 10 years	DATE	0.74+
Negroponte	PERSON	0.73+
Zaveri	ORGANIZATION	0.72+
Dataiku	ORGANIZATION	0.7+
Cube	ORGANIZATION	0.64+
Ajai	ORGANIZATION	0.58+
years	DATE	0.57+
covitz	PERSON	0.53+
decades	QUANTITY	0.52+
Cube	PERSON	0.45+
Snowflake	TITLE	0.45+
Seidel	ORGANIZATION	0.43+
snowflake	EVENT	0.35+
Seidel	COMMERCIAL_ITEM	0.34+

Democratizing AI & Advanced Analytics with Dataiku x Snowflake | Snowflake Data Cloud Summit

>> My name is Dave Vellante. And with me are two world-class technologists, visionaries and entrepreneurs. Benoit Dageville, he co-founded Snowflake and he's now the President of the Product Division, and Florian Douetteau is the Co-founder and CEO of Dataiku. Gentlemen, welcome to the cube to first timers, love it. >> Yup, great to be here. >> Now Florian you and Benoit, you have a number of customers in common, and I've said many times on theCUBE, that the first era of cloud was really about infrastructure, making it more agile, taking out costs. And the next generation of innovation, is really coming from the application of machine intelligence to data with the cloud, is really the scale platform. So is that premise relevant to you, do you buy that? And why do you think Snowflake, and Dataiku make a good match for customers? >> I think that because it's our values that aligned, when it gets all about actually today, and knowing complexity of our customers, so you close the gap. Where we need to commoditize the access to data, the access to technology, it's not only about data. Data is important, but it's also about the impacts of data. How can you make the best out of data as fast as possible, as easily as possible, within an organization. And another value is about just the openness of the platform, building a future together. Having a platform that is not just about the platform, but also for the ecosystem of partners around it, bringing the level of accessibility, and flexibility you need for the 10 years of that. >> Yeah, so that's key, that it's not just data. It's turning data into insights. Now Benoit, you came out of the world of very powerful, but highly complex databases. And we know we all know that you and the Snowflake team, you get very high marks for really radically simplifying customers' lives. But can you talk specifically about the types of challenges that your customers are using Snowflake to solve? >> Yeah, so the challenge before snowflake, I would say, was really to put all the data in one place, and run all the computes, all the workloads that you wanted to run against that data. And of course existing legacy platforms were not able to support that level of concurrency, many workload, we talk about machine learning, data science, data engineering, data warehouse, big data workloads, all running in one place didn't make sense at all. And therefore be what customers did this to create silos, silos of data everywhere, with different system, having a subset of the data. And of course now, you cannot analyze this data in one place. So Snowflake, we really solved that problem by creating a single architecture where you can put all the data into cloud. So it's a really cloud native. We really thought about how solve that problem, how to create, leverage cloud, and the elasticity of cloud to really put all the data in one place. But at the same time, not run all workload at the same place. So each workload that runs in Snowflake, at its dedicated compute resources to run. And that makes it agile, right? Florian talked about data scientist having to run analysis, so they need a lot of compute resources, but only for a few hours. And with Snowflake, they can run these new workload, add this workload to the system, get the compute resources that they need to run this workload. And then when it's over, they can shut down their system, it will automatically shut down. Therefore they would not pay for the resources that they don't use. So it's a very agile system, where you can do this analysis when you need, and you have all the power to run all these workload at the same time. >> Well, it's profound what you guys built. I mean to me, I mean of course everybody's trying to copy it now, it was like, I remember that bringing the notion of bringing compute to the data, in the Hadoop days. And I think that, as I say, everybody is sort of following your suit now or trying to. Florian, I got to say the first data scientist I ever interviewed on theCUBE, it was the amazing Hillary Mason, right after she started at Bitly, and she made data sciences sounds so compelling, but data science is a hard. So same question for you, what do you see as the biggest challenges for customers that they're facing with data science? >> The biggest challenge from my perspective, is that once you solve the issue of the data silo, with Snowflake, you don't want to bring another silo, which will be a silo of skills. And essentially, thanks to the talent gap, between the talent available to the markets, or are released to actually find recruits, train data scientists, and what needs to be done. And so you need actually to simplify the access to technologies such as, every organization can make it, whatever the talent, by bridging that gap. And to get there, there's a need of actually backing up the silos. Having a collaborative approach, where technologies and business work together, and actually all puts up their ends into those data projects together. >> It makes sense, Florain let's stay with you for a minute, if I can. Your observation space, it's pretty, pretty global. And so you have a unique perspective on how can companies around the world might be using data, and data science. Are you seeing any trends, maybe differences between regions, or maybe within different industries? What are you seeing? >> Yeah, definitely I do see trends that are not geographic, that much, but much more in terms of maturity of certain industries and certain sectors. Which are, that certain industries invested a lot, in terms of data, data access, ability to store data. As well as experience, and know region level of maturity, where they can invest more, and get to the next steps. And it's really relying on the ability of certain leaders, certain organizations, actually, to have built these long-term data strategy, a few years ago when no stats reaping of the benefits. >> A decade ago, Florian, Hal Varian famously said that the sexy job in the next 10 years will be statisticians. And then everybody sort of changed that to data scientist. And then everybody, all the statisticians became data scientists, and they got a raise. But data science requires more than just statistics acumen. What skills do you see as critical for the next generation of data science? >> Yeah, it's a great question because I think the first generation of data scientists, became data scientists because they could have done some Python quickly, and be flexible. And I think that the skills of the next generation of data scientists will definitely be different. It will be, first of all, being able to speak the language of the business, meaning how you translates data insight, predictive modeling, all of this into actionable insights of business impact. And it would be about how you collaborate with the rest of the business. It's not just how fast you can build something, how fast you can do a notebook in Python, or do predictive models of some sorts. It's about how you actually build this bridge with the business, and obviously those things are important, but we also must be cognizant of the fact that technology will evolve in the future. There will be new tools, new technologies, and they will still need to keep this level of flexibility to understand quickly what are the next tools they need to use a new languages, or whatever to get there. >> As you look back on 2020, what are you thinking? What are you telling people as we head into next year? >> Yeah, I think it's very interesting, right? This crises has told us that the world really can change from one day to the next. And this has dramatic and perform the aspects. For example companies all of a sudden, show their revenue line dropping, and they had to do less with data. And some other companies was the reverse, right? All of a sudden, they were online like Instacart, for example, and their business completely changed from one day to the other. So this agility of adjusting the resources that you have to do the task, and need that can change, using solution like Snowflake really helps that. Then we saw both in our customers. Some customers from one day to the next, were growing like big time, because they benefited from COVID, and their business benefited. But others had to drop. And what is nice with cloud, it allows you to adjust compute resources to your business needs, and really address it in house. The other aspect is understanding what happening, right? You need to analyze. We saw all our customers basically, wanted to understand what is the going to be the impact on my business? How can I adapt? How can I adjust? And for that, they needed to analyze data. And of course, a lot of data which are not necessarily data about their business, but also they are from the outside. For example, COVID data, where is the States, what is the impact, geographic impact on COVID, the time. And access to this data is critical. So this is the premise of the data cloud, right? Having one single place, where you can put all the data of the world. So our customer obviously then, started to consume the COVID data from that our data marketplace. And we had delete already thousand customers looking at this data, analyzing these data, and to make good decisions. So this agility and this, adapting from one hour to the next is really critical. And that goes with data, with cloud, with interesting resources, and that doesn't exist on premise. So indeed I think the lesson learned is we are living in a world, which is changing all the time, and we have to understand it. We have to adjust, and that's why cloud some ways is great. >> Excellent thank you. In theCUBE we like to talk about disruption, of course, who doesn't? And also, I mean, you look at AI, and the impact that it's beginning to have, and kind of pre-COVID. You look at some of the industries that were getting disrupted by, everyone talks about digital transformation. And you had on the one end of the spectrum, industries like publishing, which are highly disrupted, or taxis. And you can say, okay, well that's Bits versus Adam, the old Negroponte thing. But then the flip side of, you say look at financial services that hadn't been dramatically disrupted, certainly healthcare, which is ripe for disruption, defense. So there a number of industries that really hadn't leaned into digital transformation, if it ain't broke, don't fix it. Not on my watch. There was this complacency. And then of course COVID broke everything. So Florian I wonder if you could comment, what industry or industries do you think are going to be most impacted by data science, and what I call machine intelligence, or AI, in the coming years and decade? >> Honestly, I think it's all of them, or at least most of them, because for some industries, the impact is very visible, because we have talking about brand new products, drones, flying cars, or whatever that are very visible for us. But for others, we are talking about a part from changes in the way you operate as an organization. Even if financial industry itself doesn't seem to be so impacted, when you look at it from the consumer side, or the outside insights in Germany, it's probably impacted just because the way you use data (mumbles) for flexibility you need. Is there kind of the cost gain you can get by leveraging the latest technologies, is just the numbers. And so it's will actually comes from the industry that also. And overall, I think that 2020, is a year where, from the perspective of AI and analytics, we understood this idea of maturity and resilience, maturity meaning that when you've got to crisis you actually need data and AI more than before, you need to actually call the people from data in the room to take better decisions, and look for one and a backlog. And I think that's a very important learning from 2020, that will tell things about 2021. And the resilience, it's like, data analytics today is a function transforming every industries, and is so important that it's something that needs to work. So the infrastructure needs to work, the infrastructure needs to be super resilient, so probably not on prem or not fully on prem, at some point. And the kind of resilience where you need to be able to blend for literally anything, like no hypothesis in terms of BLOs, can be taken for granted. And that's something that is new, and which is just signaling that we are just getting to a next step for data analytics. >> I wonder Benoir if you have anything to add to that. I mean, I often wonder, when are machines going to be able to make better diagnoses than doctors, some people say already. Will the financial services, traditional banks lose control of payment systems? What's going to happen to big retail stores? I mean, maybe bring us home with maybe some of your finals thoughts. >> Yeah, I would say I don't see that as a negative, right? The human being will always be involved very closely, but then the machine, and the data can really help, see correlation in the data that would be impossible for human being alone to discover. So I think it's going to be a compliment not a replacement. And everything that has made us faster, doesn't mean that we have less work to do. It means that we can do more. And we have so much to do, that I will not be worried about the effect of being more efficient, and bare at our work. And indeed, I fundamentally think that data, processing of images, and doing AI on these images, and discovering patterns, and potentially flagging disease way earlier than it was possible. It is going to have a huge impact in health care. And as Florian was saying, every industry is going to be impacted by that technology. So, yeah, I'm very optimistic. >> Great, guys, I wish we had more time. I've got to leave it there, but so thanks so much for coming on theCUBE. It was really a pleasure having you.

Published Date : Nov 9 2020

SUMMARY :

and Florian Douetteau is the And the next generation of innovation, the access to data, about the types of challenges all the workloads that you of bringing compute to the And essentially, thanks to the talent gap, And so you have a unique perspective And it's really relying on the that the sexy job in the next 10 years of the next generation the resources that you have and the impact that And the kind of resilience where you need Will the financial services, and the data can really help, I've got to leave it there,

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Benoit	PERSON	0.99+
Florian Douetteau	PERSON	0.99+
Florian	PERSON	0.99+
Benoit Dageville	PERSON	0.99+
Dataiku	ORGANIZATION	0.99+
2020	DATE	0.99+
Hillary Mason	PERSON	0.99+
Hal Varian	PERSON	0.99+
10 years	QUANTITY	0.99+
Python	TITLE	0.99+
Snowflake	ORGANIZATION	0.99+
Germany	LOCATION	0.99+
one hour	QUANTITY	0.99+
both	QUANTITY	0.99+
next year	DATE	0.99+
Bitly	ORGANIZATION	0.99+
one day	QUANTITY	0.98+
2021	DATE	0.98+
A decade ago	DATE	0.98+
one place	QUANTITY	0.97+
Snowflake Data Cloud Summit	EVENT	0.97+
Snowflake	TITLE	0.96+
each workload	QUANTITY	0.96+
today	DATE	0.96+
first generation	QUANTITY	0.96+
Benoir	PERSON	0.95+
snowflake	EVENT	0.94+
first era	QUANTITY	0.92+
COVID	OTHER	0.92+
single architecture	QUANTITY	0.91+
thousand customers	QUANTITY	0.9+
first data scientist	QUANTITY	0.9+
one	QUANTITY	0.88+
one single place	QUANTITY	0.87+
few years ago	DATE	0.86+
Negroponte	PERSON	0.85+
Florain	ORGANIZATION	0.82+
two world	QUANTITY	0.81+
first	QUANTITY	0.8+
Instacart	ORGANIZATION	0.75+
next 10 years	DATE	0.7+
hours	QUANTITY	0.67+
Snowflake	EVENT	0.59+
a minute	QUANTITY	0.58+
theCUBE	ORGANIZATION	0.55+
Adam	PERSON	0.49+

Benoit Dageville and Florian Douetteau V1

>> Hello everyone, welcome back to theCUBE'S wall to wall coverage of the Snowflake Data Cloud Summit. My name is Dave Vellante and with me are two world-class technologists, visionaries, and entrepreneurs. Benoit Dageville is the, he co-founded Snowflake. And he's now the president of the Product division and Florian Douetteau is the co-founder and CEO of Dataiku. Gentlemen, welcome to theCUBE, two first timers, love it. >> Great time to be here. >> Now Florian, you and Benoit, you have a number of customers in common. And I've said many times on theCUBE that, the first era of cloud was really about infrastructure, making it more agile taking out costs. And the next generation of innovation is really coming from the application of machine intelligence to data with the cloud, is really the scale platform. So is that premise relevant to you, do you buy that? And why do you think Snowflake and Dataiku make a good match for customers? >> I think that because it's our values that align. When it gets all about actually today, and knowing complexity per customer, so you close the gap or we need to commoditize the access to data, the access to technology, it's not only about data, data is important, but it's also about the impacts of data. How can you make the best out of data as fast as possible, as easily as possible within an organization? And another value is about just the openness of the platform, building a future together. I think a platform that is not just about the platform but also for the ecosystem of partners around it, bringing the little bit of accessibility and flexibility, you need for the 10 years of that. >> Yes, so that's key, but it's not just data. It's turning data into insights. Now Benoit, you came out of the world of very powerful, but highly complex databases. And we all know that, you and the Snowflake team, you get very high marks for really radically simplifying customers' lives. But can you talk specifically about the types of challenges that your customers are using Snowflake to solve? >> Yeah, so really the challenge before Snowflake, I would say, was really to put all the data, in one place and run all the computes, all the workloads that you wanted to run, against that data. And of course, existing legacy platforms were not able to support that level of concurrency, many workload. We talk about machine learning, data science, data engineering, data warehouse, big data workloads, all running in one place, didn't make sense at all. And therefore, what customers did, is to create silos, silos of data everywhere, with different systems having a subset of the data. And of course now you cannot analyze this data in one place. So Snowflake, we really solved that problem by creating a single architecture where you can put all the data in the cloud. So it's a really cloud native. We really thought about how to solve that problem, how to create leverage cloud and the elasticity of cloud to really put all the data in one place. But at the same time, not run all workload at the same place. So each workload that runs in Snowflake at least dedicate compute resources to run. And that makes it very agile, right. Florian talked about data scientist having to run analysis. So they need a lot of compute resources, but only for few hours and with Snowflake, they can run these new workload, add this workload to the system, get the compute resources that they need to run this workload. And then when it's over, they can shut down their system. It will automatically shut down. Therefore they would not pay for the resources that they don't choose. So it's a very agile system, where you can do these analysis when you need, and you have all the power to run all these workload at the same time. >> Well, it's profound what you guys built. To me, I mean, because everybody's trying to copy it now. It's like, I remember the notion of bringing compute to the data in the Hadoop days. And I think that, as I say, everybody is sort of following your suit now or trying to. Florian, I got to say, the first data scientist I ever interviewed on theCUBE was the amazing Hilary Mason, right after she started at Bitly. And she made data science sounds so compelling, but data science is hard. So same question for you. What do you see is the biggest challenges for customers that they're facing with data science? >> The biggest challenge from my perspective is that once you solve the issue of the data silo with Snowflake, you don't want to bring another silo, which would be a silo of skills. And essentially, thanks to that talent gap between the talent and labor of the markets, or how it is to actually find, recruit and train data scientists and what needs to be done. And so you need actually to simplify the access to technology such as every organization can make it, whatever the talents by bridging that gap. And to get there, there is a need of actually breaking up the silos. I think a collaborative approach, where technologies and business work together and actually all put some of their ends into those data projects together. >> Yeah, it makes sense. So Florian, Let's stay with you for a minute, if I can. Your observation spaces, is pretty, pretty global. And so, you have a unique perspective on how companies around the world might be using data and data science. Are you seeing any trends, maybe differences between regions or maybe within different industries? What are you seeing? >> Yep. Yeah, definitely, I do see trends that are not geographic that much, but much more in terms of maturity of certain industries and certain sectors, which are that certain industries invested a lot in terms of data, data access, ability to store data as well as few years and know each level of maturity where they can invest more and get to the next steps. And it's really reliant to reach out to certain details, certain organization, actually to have built this longterm data strategy a few years ago, and no stocks ripping off the benefits. >> You know, a decade ago, Florian, Hal Varian famously said that the sexy job in the next 10 years will be statisticians. And then everybody sort of changed that to data scientists. And then everybody, all the statisticians became data scientists and they got a raise. But data science requires more than just statistics acumen. What skills do you see is critical for the next generation of data science? >> Yeah, it's a good question because I think the first generation of data scientists became better scientists because they could learn some Python quickly and be flexible. And I think that skills of the next generation of data scientists will definitely be different. It will be first about being able to speak the language of the business, meaning all you translate data insight, predictive modeling, all of this into actionable insights or business impact. And it will be about who you collaborate with the rest of the business. It's not just how fast you can build something, how fast you can do a notebook in Python or do quantity models of some sorts. It's about how you actually build this bridge with the business. And obviously those things are important, but we also must be cognizant of the fact that technology will evolve in the future. There will be new tools in technologies, and they will still need to get this level of flexibility and get to understand quickly what are the next tools, they need to use or new languages or whatever to get there. >> Thank you for that. Benoit, let's come back to you. This year has been tumultuous to say the least for everyone, but it's a good time to be in tech, ironically. And if you're in cloud, it's even better. But you look at Snowflake and Dataiku, you guys had done well, despite the economic uncertainty and the challenges of the pandemic. As you look back on 2020, what are you thinking? What are you telling people as we head into next year? >> Yeah, I think it's very interesting, right. We, this crisis has told us that the world really can change from one day to the next. And this has dramatic and profound aspects. For example, companies all of a sudden, saw their revenue line dropping and they had to do less with data. And some of the companies was the reverse, right? All of a sudden, they were online like Instacart, for example, and their business completely change from one day to the other. So this agility of adjusting the resources that you have to do the task, a need that can change, using solution like Snowflake, really helps that. And we saw both in our customers. Some customers from one day to the next, were growing like big time, because they benefited from COVID and their business benefited, but also, as you know, had to drop and what is nice with cloud, it allows to adjust compute resources to your business needs and really address it in-house. The other aspect is understanding what is happening, right? You need to analyze. So we saw all our customers basically wanted to understand, what is it going to be the impact on my business? How can I adapt? How can I adjust? And for that, they needed to analyze data. And of course, a lot of data, which are not necessarily data about their business, but also data from the outside. For example, COVID data. Where is the state, what is the impact, geographic impact on COVID all the time. And access to this data is critical. So this is the promise of the data cloud, right? Having one single place where you can put all the data of the world. So, our customers all of a sudden, started to consume the COVID data from our data marketplace. And we have the unit already thousands of customers looking at this data, analyzing this data to make good decisions. So this agility and this adapting from one hour to the next is really critical and that goes with data, with cloud, more interesting resources and that's doesn't exist on premise. So, indeed I think the lesson learned is, we are living in a world which is changing all the time, and we have to understand it. We have to adjust and that's why cloud, some way is great. >> Excellent, thank you. You know, in theCUBE, we like to talk about disruption, of course, who doesn't. And also, I mean, you look at AI and the impact that it's beginning to have and kind of pre-COVID, you look at some of the industries that were getting disrupted by, everybody talks about digital transformation and you had on the one end of the spectrum, industries like publishing, which are highly disrupted or taxis, and you can say, "Okay well, that's Bits versus Adam, the old Negroponte thing." But then the flip side of this, it says, "Look at financial services that hadn't been dramatically disrupted, certainly healthcare, which is right for disruption, defense." So the more the number of industries that really hadn't leaned into digital transformation, if it ain't broke, don't fix it. Not on my watch. There was this complacency. And then of course COVID broke everything. So Florian, I wonder if you could comment, what industry or industries do you think are going to be most impacted by data science and what I call machine intelligence or AI in the coming years and decades? >> Honestly, I think it's all of them, or at least most of them. Because for some industries, the impact is very visible because we are talking about brand new products, drones, flying cars, or whatever is that are very visible for us. But for others, we are talking about spectrum changes in the way you operate as an organization. Even if financial industry itself doesn't seem to be so impacted when you look at it from the consumer side or the outside. In fact internally, it's probably impacted just because of the way you use data to develop for flexibility you need, is there kind of a cost gain you can get by leveraging the latest technologies, is just enormous. And so it will, actually comes from the industry, that also. And overall, I think that 2020 is a year where, from the perspective of AI and analytics, we understood this idea of maturity and resilience. Maturity, meaning that when you've got a crisis, you actually need data and AI more than before, you need to actually call the people from data in the room to take better decisions and look forward and not backward. And I think that's a very important learning from 2020 that will tell things about 2021. And resilience, it's like, yeah, data analytics today is a function consuming every industries, and is so important that it's something that needs to work. So the infrastructure needs to work, the infrastructure needs to be super resilient. So probably not on trend and not fully on trend, at some point and the kind of residence where you need to be able to plan for literally anything. like no hypothesis in terms of behaviors can be taken for granted. And that's something that is new and which is just signaling that we are just getting into a next step for all data analytics. >> I wonder Benoit, if you have anything to add to that, I mean, I often wonder, you know, when are machines going to be able to make better diagnoses than doctors, some people say already. Will the financial services, traditional banks lose control of payment systems? You know, what's going to happen to big retail stores? I mean, may be bring us home with maybe some of your final thoughts. >> Yeah, I would say, I don't see that as a negative, right? The human being will always be involved very closely, but then the machine and the data can really help, see correlation in the data that would be impossible for human being alone to discover. So, I think it's going to be a compliment, not a replacement and everything that has made us faster, doesn't mean that we have less work to do. It means that we can do more. And we have so much to do. That I would not be worried about the effect of being more efficient and better at our work. And indeed, I fundamentally think that, data, processing of images and doing AI on these images and discovering patterns and potentially flagging disease, way earlier than it was possible, it is going to have a huge impact in health care. And as Florian was saying, every industry is going to be impacted by that technology. So, yeah, I'm very optimistic. >> Great, Guys, I wish we had more time. We got to leave it there but so thanks so much for coming on theCUBE. It was really a pleasure having you. >> [Benoit & Florian] Thank you. >> You're welcome but keep it right there, everybody. We'll back with our next guest, right after this short break. You're watching theCUBE.

Published Date : Oct 21 2020

SUMMARY :

And he's now the president And the next generation of the access to data, the And we all know that, you all the workloads that you the notion of bringing the access to technology such as And so, you have a unique And it's really reliant to reach out Hal Varian famously said that the sexy job And it will be about who you collaborate and the challenges of the pandemic. adjusting the resources that you have end of the spectrum, of the way you use data to I mean, I often wonder, you know, So, I think it's going to be a compliment, We got to leave it there right after this short break.

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Florian	PERSON	0.99+
Benoit	PERSON	0.99+
Florian Douetteau	PERSON	0.99+
Benoit Dageville	PERSON	0.99+
2020	DATE	0.99+
10 years	QUANTITY	0.99+
Dataiku	ORGANIZATION	0.99+
Hilary Mason	PERSON	0.99+
Python	TITLE	0.99+
Hal Varian	PERSON	0.99+
next year	DATE	0.99+
Snowflake	ORGANIZATION	0.99+
one place	QUANTITY	0.99+
both	QUANTITY	0.99+
one hour	QUANTITY	0.99+
Bitly	ORGANIZATION	0.99+
Snowflake Data Cloud Summit	EVENT	0.99+
a decade ago	DATE	0.98+
one day	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.98+
first	QUANTITY	0.98+
each level	QUANTITY	0.98+
Snowflake	TITLE	0.98+
2021	DATE	0.97+
today	DATE	0.97+
first generation	QUANTITY	0.97+
pandemic	EVENT	0.97+
few years ago	DATE	0.93+
thousands of customers	QUANTITY	0.93+
single architecture	QUANTITY	0.92+
first era	QUANTITY	0.88+
Negroponte	PERSON	0.87+
first data scientist	QUANTITY	0.87+
Instacart	ORGANIZATION	0.87+
This year	DATE	0.86+
one single place	QUANTITY	0.86+
two	QUANTITY	0.83+
two world-	QUANTITY	0.78+
each workload	QUANTITY	0.78+
one	QUANTITY	0.76+
Adam	PERSON	0.74+
next 10 years	DATE	0.69+
first timers	QUANTITY	0.52+
COVID	OTHER	0.51+
COVID	ORGANIZATION	0.43+
COVID	EVENT	0.37+
decades	DATE	0.29+

Leigh Martin, Infor | Inforum DC 2018

>> Live from Washington, D.C., it's theCUBE! Covering Inforum D.C. 2018. Brought to you by Infor. >> Well, welcome back to Washington, D.C., We are alive here at the Convention Center at Inforum 18, along with Dave Vellante, I'm John Walls. It's a pleasure now, welcome to theCUBE, Leigh Martin, who is the Senior Director of the Dynamic Science Labs at Infor, and good afternoon to you Leigh! >> Good afternoon, thank you for having me. >> Thanks for comin' on. >> Thank you for being here. Alright, well tell us about the Labs first off, obviously, data science is a big push at Infor. What do you do there, and then why is data science such a big deal? >> So Dynamic Science Labs is based in Cambridge, Massachusetts, we have about 20 scientists with backgrounds in math and science areas, so typically PhDs in Statistics and Operations Research, and those types of areas. And, we've really been working over the last several years to build solutions for Infor customers that are Math and Science based. So, we work directly with customers, typically through proof of concept, so we'll work directly with customers, we'll bring in their data, and we will build a solution around it. We like to see them implement it, and make sure we understand that they're getting the value back that we expect them to have. Once we prove out that piece of it, then we look for ways to deliver it to the larger group of Infor customers, typically through one of the Cloud Suites, perhaps functionality, that's built into a Cloud Suite, or something like that. >> Well, give me an example, I mean it's so, as you think-- you're saying that you're using data that's math and science based, but, for application development or solution development if you will. How? >> So, I'll give you an example, so we have a solution called Inventory Intelligence for Healthcare, it's moving towards a more generalized name of Inventory Intelligence, because we're going to move it out of the healthcare space and into other industries, but this is a product that we built over the last couple of years. We worked with a couple of customers, we brought in their loss and data, so their loss in customers, we bring the data into an area where we can work on it, we have a scientist in our team, actually, she's one of the Senior Directors in the team, Dawn Rose, who led the effort to design and build this, design and build the algorithm underlying the product; and what it essentially does is, it allows hospitals to find the right level of inventory. Most hospitals are overstocked, so this gives them an opportunity to bring down their inventory levels, to a manageable place without increasing stockouts, so obviously, it's very important in healthcare, that you're not having a lot of stockouts. And so, we spent a lot of time working with these customers, really understanding what the data was like that they were giving to us, and then Dawn and her team built the algorithm that essentially says, here's what you've done historically, right? So it's based on historic data, at the item level, at the location level. What've you done historically, and how can we project out the levels you should have going forward, so that they're at the right level where you're saving money, but again, you're not increasing stockouts, so. So, it's a lot of time and effort to bring those pieces together and build that algorithm, and then test it out with the customers, try it out a couple of times, you make some tweaks based on their business process and exactly how it works. And then, like I said, we've now built that out into originally a stand-alone application, and in about a month, we're going to go live in Cloud Suite Financials, so it's going to be a piece of functionality inside of Cloud Suite Financials. >> So, John, if I may, >> Please. >> I'm going to digress for a moment here because the first data scientist that I ever interviewed was the famous Hilary Mason, who's of course now at Cloudera, but, and she told me at the time that the data scientist is a part mathematician, part scientist, part statistician, part data hacker, part developer, and part artist. >> Right. (laughs) >> So, you know it's an amazing field that Hal Varian, who is the Google Economist said, "It's going to be the hottest field, in the next 10 years." And this is sort of proven true, but Leigh, my question is, so you guys are practitioners of data science, and then you bring that into your product, and what we hear from a lot of data scientists, other than that sort of, you know, panoply of skill sets, is, they spend more time wrangling data, and the tooling isn't there for collaboration. How are you guys dealing with that? How has that changed inside of Infor? >> It is true. And we actually really focus on first making sure we understand the data and the context of the data, so it's really important if you want to solve a particular business problem that a customer has, to make sure you understand exactly what is the definition of each and every piece of data that's in all of those fields that they sent over to you, before you try to put 'em inside an algorithm and make them do something for you. So it is very true that we spend a lot of time cleaning and understanding data before we ever dive into the problem solving aspect of it. And to your point, there is a whole list of other things that we do after we get through that phase, but it's still something we spend a lot of time on today, and that has been the case for, a long time now. We, wherever we can, we apply new tools and new techniques, but actually just the simple act of going in there and saying, "What am I looking at, how does it relate?" Let me ask the customer to clarify this to make sure I understand exactly what it means. That part doesn't go away, because we're really focused on solving the customer solution and then making sure that we can apply that to other customers, so really knowing what the data is that we're working with is key. So I don't think that part has actually changed too much, there are certainly tools that you can look at. People talk a lot about visualization, so you can start thinking, "Okay, how can I use some visualization to help me understand the data better?" But, just that, that whole act of understanding data is key and core to what we do, because, we want to build the solution that really answers the answers the business problem. >> The other thing that we hear a lot from data scientists is that, they help you figure out what questions you actually have to ask. So, it sort of starts with the data, they analyze the data, maybe you visualize the data, as you just pointed out, and all these questions pop out. So what is the process that you guys use? You have the data, you've got the data scientist, you're looking at the data, you're probably asking all these questions. You get, of course, get questions from your customers as well. You're building models maybe to address those questions, training the models to get better and better and better, and then you infuse that into your software. So, maybe, is that the process? Is it a little more complicated than that? Maybe you could fill in the gaps. >> Yeah, so, I, my personal opinion, and I think many of my colleagues would agree with me on this is, starting with the business problem, for us, is really the key. There are ways to go about looking at the data and then pulling out the questions from the data, but generally, that is a long and involved process. Because, it takes a lot of time to really get that deep into the data. So when we work, we really start with, what's the business problem that the customer's trying to solve? And then, what's the data that needs to be available for us to be able to solve that? And then, build the algorithm around that. So for us, it's really starting with the business problem. >> Okay, so what are some of the big problems? We heard this morning, that there's a problem in that, there's more job openings than there are candidates, and productivity, business productivity is not being impacted. So there are two big chewy problems that data scientists could maybe attack, and you guys seem to be passionate about those, so. How does data science help solve those problems? >> So, I think that, at Infor, I'll start off by saying at Infor there's actually, I talked about the folks that are in our office in Cambridge, but there's quite a bit of data science going on outside of our team, and we are the data science team, but there are lots of places inside of Infor where this is happening. Either in products that contains some sort of algorithmic approach, the HCM team for sure, the talent science team which works on HCM, that's a team that's led by Jill Strange, and we work with them on certain projects in certain areas. They are very focused on solving some of those people-related problems. For us, we work a little bit more on the, some of the other areas we work on is sort of the manufacturing and distribution areas, we work with the healthcare side of things, >> So supply chain, healthcare? >> Exactly. So some of the other areas, because they are, like I said, there are some strong teams out there that do data science, it's just, it's also incorporated with other things, like the talent science team. So, there's lots of examples of it out there. In terms of how we go about building it, so we, like I was saying, we work on answering the business, the business question upfront, understanding the data, and then, really sitting with the customer and building that out, and, so the problems that come to us are often through customers who have particular things that they want to answer. So, a lot of it is driven by customer questions, and particular problems that they're facing. Some of it is driven by us. We have some ideas about things that we think, would be really useful to customers. Either way, it ends up being a customer collaboration with us, with the product team, that eventually we'll want to roll it out too, to make sure that we're answering the problem in the way that the product team really feels it can be rolled out to customers, and better used, and more easily used by them. >> I presume it's a non-linear process, it's not like, that somebody comes to you with a problem, and it's okay, we're going to go look at that. Okay now, we got an answer, I mean it's-- Are you more embedded into the development process than that? Can you just explain that? >> So, we do have, we have a development team in Prague that does work with us, and it's depending on whether we think we're going to actually build a more-- a product with aspects to it like a UI, versus just a back end solution. Depends on how we've decided we want to proceed with it. so, for example, I was talking about Inventory Intelligence for Healthcare, we also have Pricing Science for Distribution, both of those were built initially with UIs on them, and customers could buy those separately. Now that we're in the Cloud Suites, that those are both being incorporated into the Cloud Suite. So, we have, going back to where I was talking about our team in Prague, we sometimes build product, sort of a fully encased product, working with them, and sometimes we work very closely with the development teams from the various Cloud Suites. And the product management team is always there to help us, to figure out sort of the long term plan and how the different pieces fit together. >> You know, kind of big picture, you've got AI right, and then machine learning, pumping all kinds of data your way. So, in a historical time frame, this is all pretty new, this confluence right? And in terms of development, but, where do you see it like 10 years from now, 20 years from now? What potential is there, we've talked about human potential, unlocking human potential, we'll unlock it with that kind of technology, what are we looking at, do you think? >> You know, I think that's such a fascinating area, and area of discussion, and sort of thinking, forward thinking. I do believe in sort of this idea of augmented intelligence, and I think Charles was talking a little bit about, about that this morning, although not in those particular terms; but this idea that computers and machines and technology will actually help us do better, and be better, and being more productive. So this idea of doing sort of the rote everyday tasks, that we no longer have to spend time doing, that'll free us up to think about the bigger problems, and hopefully, and my best self wants to say we'll work on famine, and poverty, and all those problems in the world that, really need our brains to focus on, and work. And the other interesting part of it is, if you think about, sort of the concept of singularity, and are computers ever going to actually be able to think for themselves? That's sort of another interesting piece when you talk about what's going to happen down the line. Maybe it won't happen in 10 years, maybe it will never happen, but there's definitely a lot of people out there, who are well known in sort of tech and science who talk about that, and talk about the fears related to that. That's a whole other piece, but it's fascinating to think about 10 years, 20 years from now, where we are going to be on that spectrum? >> How do you guys think about bias in AI and data science, because, humans express bias, tribalism, that's inherent in human nature. If machines are sort of mimicking humans, how do you deal with that and adjudicate? >> Yeah, and it's definitely a concern, it's another, there's a lot of writings out there and articles out there right now about bias in machine learning and in AI, and it's definitely a concern. I actually read, so, just being aware of it, I think is the first step, right? Because, as scientists and developers develop these algorithms, going into it consciously knowing that this is something they have to protect against, I think is the first step, for sure. And then, I was just reading an article just recently about another company (laughs) who is building sort of a, a bias tracker, so, a way to actually monitor your algorithm and identify places where there is perhaps bias coming in. So, I do think we'll see, we'll start to see more of those things, it gets very complicated, because when you start talking about deep learning and networks and AI, it's very difficult to actually understand what's going on under the covers, right? It's really hard to get in and say this is the reason why, your AI told you this, that's very hard to do. So, it's not going to be an easy process but, I think that we're going to start to see that kind of technology come. >> Well, we heard this morning about some sort of systems that could help, my interpretation, automate, speed up, and minimize the hassle of performance reviews. >> Yes. (laughs) >> And that's the classic example of, an assertive woman is called abrasive or aggressive, an assertive man is called a great leader, so it's just a classic example of bias. I mentioned Hilary Mason, rock star data scientist happens to be a woman, you happen to be a woman. Your thoughts as a woman in tech, and maybe, can AI help resolve some of those biases? >> Yeah. Well, first of all I want to say, I'm very pleased to work in an organization where we have some very strong leaders, who happen to be women, so I mentioned Dawn Rose, who designed our IIH solution, I mentioned Jill Strange, who runs the talent science organization. Half of my team is women, so, particularly inside of sort of the science area inside of Infor, I've been very pleased with the way we've built out some of that skill set. And, I'm also an active member of WIN, so the Women's Infor Network is something I'm very involved with, so, I meet a lot of people across our organization, a lot of women across our organization who have, are just really strong technology supporters, really intelligent, sort of go-getter type of people, and it's great to see that inside of Infor. I think there's a lot of work to be done, for sure. And you can always find stories, from other, whether it's coming out of Silicon Valley, or other places where you hear some, really sort of arcane sounding things that are still happening in the industry, and so, some of those things it's, it's disappointing, certainly to hear that. But I think, Van Jones said something this morning about how, and I liked the way he said it, and I'm not going to be able say it exactly, but he said something along the lines of, "The ground is there, the formation is starting, to get us moving in the right direction." and I think, I'm hopeful for the future, that we're heading in that way, and I think, you know, again, he sort of said something like, "Once the ground swell starts going in that direction, people will really jump in, and will see the benefits of being more diverse." Whether it's across, having more women, or having more people of color, however things expand, and that's just going to make us all better, and more efficient, and more productive, and I think that's a great thing. >> Well, and I think there's a spectrum, right? And on one side of the spectrum, there's intolerable and unacceptable behavior, which is just, should be zero tolerance in my opinion, and the passion of ours in theCUBE. The other side of that spectrum is inclusion, and it's a challenge that we have as a small company, and I remember having a conversation, earlier this year with an individual. And we talk about quotas, and I don't think that's the answer. Her comment was, "No, that's not the answer, you have to endeavor to reach deeper beyond your existing network." Which is hard sometimes for us, 'cause you're so busy, you're running around, it's like okay it's the convenient thing to do. But you got to peel the onion on that network, and actually take the extra time and make it a priority. I mean, your thoughts on that? >> No, I think that's a good point, I mean, if I think about who my circle is, right? And the people that I know and I interact with. If I only reach out to the smallest group of people, I'm not getting really out beyond my initial circle. So I think that's a very good point, and I think that that's-- we have to find ways to be more interactive, and pull from different areas. And I think it's interesting, so coming back to data science for a minute, if you sort of think about the evolution of where we got to, how we got to today where, now we're really pulling people from science areas, and math areas, and technology areas, and data scientists are coming from lots of places, right? And you don't always have to have a PhD, right? You don't necessary have to come up through that system to be a good data scientist, and I think, to see more of that, and really people going beyond, beyond just sort of the traditional circles and the traditional paths to really find people that you wouldn't normally identify, to bring into that, that path, is going to help us, just in general, be more diverse in our approach. >> Well it certainly it seems like it's embedded in the company culture. I think the great reason for you to be so optimistic going forward, not only about your job, but about the way companies going into that doing your job. >> What would you advise, young people generally, who want to crack into the data science field, but specifically, women, who have clearly, are underrepresented in technology? >> Yeah, so, I think the, I think we're starting to see more and more women enter the field, again it's one of those, people know it, and so there's less of a-- because people are aware of it, there's more tendency to be more inclusive. But I definitely think, just go for it, right? I mean if it's something you're interested in, and you want to try it out, go to a coding camp, and take a science class, and there's so many online resources now, I mean there's, the massive online courses that you can take. So, even if you're hesitant about it, there are ways you can kind of be at home, and try it out, and see if that's the right thing for you. >> Just dip your toe in the water. >> Yes, exactly, exactly! Try it out and see, and then just decide if that's the right thing for you, but I think there's a lot of different ways to sort of check it out. Again, you can take a course, you can actually get a degree, there's a wide range of things that you can do to kind of experiment with it, and then find out if that's right for you. >> And if you're not happy with the hiring opportunities out there, just start a company, that's my advice. >> That's right. (laughing together) >> Agreed, I definitely agree! >> We thank you-- we appreciate the time, and great advice, too. >> Thank you so much. >> Leigh Martin joining us here at Inforum 18, we are live in Washington, D.C., you're watching the exclusive coverage, right here, on theCUBE. (bubbly music)

Published Date : Sep 25 2018

SUMMARY :

Brought to you by Infor. and good afternoon to you Leigh! and then why is data science such a big deal? and we will build a solution around it. Well, give me an example, I mean it's so, as you think-- and how can we project out that the data scientist is a part mathematician, (laughs) and then you bring that into your product, and that has been the case for, a long time now. and then you infuse that into your software. and I think many of my colleagues and you guys seem to be passionate about those, so. some of the other areas we work on is sort of the so the problems that come to us are often through that somebody comes to you with a problem, And the product management team is always there to help us, what are we looking at, do you think? and talk about the fears related to that. How do you guys think about bias that this is something they have to protect against, Well, we heard this morning about some sort of And that's the classic example of, and it's great to see that inside of Infor. and it's a challenge that we have as a small company, and I think that that's-- I think the great reason for you to be and see if that's the right thing for you. and then just decide if that's the right thing for you, the hiring opportunities out there, That's right. we appreciate the time, and great advice, too. at Inforum 18, we are live in Washington, D.C.,

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Hilary Mason	PERSON	0.99+
John Walls	PERSON	0.99+
Hal Varian	PERSON	0.99+
Jill Strange	PERSON	0.99+
Dynamic Science Labs	ORGANIZATION	0.99+
John	PERSON	0.99+
Leigh Martin	PERSON	0.99+
Washington, D.C.	LOCATION	0.99+
Cambridge	LOCATION	0.99+
Prague	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
Charles	PERSON	0.99+
Leigh	PERSON	0.99+
Infor	ORGANIZATION	0.99+
Van Jones	PERSON	0.99+
Dawn	PERSON	0.99+
WIN	ORGANIZATION	0.99+
first step	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Dawn Rose	PERSON	0.99+
Cambridge, Massachusetts	LOCATION	0.99+
Cloud Suite	TITLE	0.99+
Women's Infor Network	ORGANIZATION	0.98+
Convention Center	LOCATION	0.98+
one	QUANTITY	0.98+
today	DATE	0.98+
both	QUANTITY	0.97+
10 years	QUANTITY	0.97+
this morning	DATE	0.96+
Cloud Suites	TITLE	0.96+
first	QUANTITY	0.96+
one side	QUANTITY	0.95+
Cloud Suite Financials	TITLE	0.93+
each	QUANTITY	0.92+
two big chewy problems	QUANTITY	0.92+
about 20 scientists	QUANTITY	0.92+
D.C.	LOCATION	0.9+
earlier this year	DATE	0.9+
20 years	QUANTITY	0.88+
last couple of years	DATE	0.88+
DC	LOCATION	0.87+
first data scientist	QUANTITY	0.85+
Inforum 18	ORGANIZATION	0.83+
Google	ORGANIZATION	0.79+
Half of my team	QUANTITY	0.76+
years	DATE	0.75+
couple	QUANTITY	0.74+
Inventory Intelligence	TITLE	0.71+
years	QUANTITY	0.69+
HCM	ORGANIZATION	0.68+
about a month	QUANTITY	0.68+
next 10 years	DATE	0.68+
2018	DATE	0.66+
20	DATE	0.63+
theCUBE	ORGANIZATION	0.62+
last	DATE	0.55+
Inforum	ORGANIZATION	0.54+
zero	QUANTITY	0.52+
Economist	TITLE	0.51+
Cloud	TITLE	0.49+
Inventory	ORGANIZATION	0.47+
Inforum	EVENT	0.42+

Next-Generation Analytics Social Influencer Roundtable - #BigDataNYC 2016 #theCUBE

>> Narrator: Live from New York, it's the Cube, covering big data New York City 2016. Brought to you by headline sponsors, CISCO, IBM, NVIDIA, and our ecosystem sponsors, now here's your host, Dave Valante. >> Welcome back to New York City, everybody, this is the Cube, the worldwide leader in live tech coverage, and this is a cube first, we've got a nine person, actually eight person panel of experts, data scientists, all alike. I'm here with my co-host, James Cubelis, who has helped organize this panel of experts. James, welcome. >> Thank you very much, Dave, it's great to be here, and we have some really excellent brain power up there, so I'm going to let them talk. >> Okay, well thank you again-- >> And I'll interject my thoughts now and then, but I want to hear them. >> Okay, great, we know you well, Jim, we know you'll do that, so thank you for that, and appreciate you organizing this. Okay, so what I'm going to do to our panelists is ask you to introduce yourself. I'll introduce you, but tell us a little bit about yourself, and talk a little bit about what data science means to you. A number of you started in the field a long time ago, perhaps data warehouse experts before the term data science was coined. Some of you started probably after Hal Varian said it was the sexiest job in the world. (laughs) So think about how data science has changed and or what it means to you. We're going to start with Greg Piateski, who's from Boston. A Ph.D., KDnuggets, Greg, tell us about yourself and what data science means to you. >> Okay, well thank you Dave and thank you Jim for the invitation. Data science in a sense is the second oldest profession. I think people have this built-in need to find patterns and whatever we find we want to organize the data, but we do it well on a small scale, but we don't do it well on a large scale, so really, data science takes our need and helps us organize what we find, the patterns that we find that are really valid and useful and not just random, I think this is a big challenge of data science. I've actually started in this field before the term Data Science existed. I started as a researcher and organized the first few workshops on data mining and knowledge discovery, and the term data mining became less fashionable, became predictive analytics, now it's data science and it will be something else in a few years. >> Okay, thank you, Eves Mulkearns, Eves, I of course know you from Twitter. A lot of people know you as well. Tell us about your experiences and what data scientist means to you. >> Well, data science to me is if you take the two words, the data and the science, the science it holds a lot of expertise and skills there, it's statistics, it's mathematics, it's understanding the business and putting that together with the digitization of what we have. It's not only the structured data or the unstructured data what you store in the database try to get out and try to understand what is in there, but even video what is coming on and then trying to find, like George already said, the patterns in there and bringing value to the business but looking from a technical perspective, but still linking that to the business insights and you can do that on a technical level, but then you don't know yet what you need to find, or what you're looking for. >> Okay great, thank you. Craig Brown, Cube alum. How many people have been on the Cube actually before? >> I have. >> Okay, good. I always like to ask that question. So Craig, tell us a little bit about your background and, you know, data science, how has it changed, what's it all mean to you? >> Sure, so I'm Craig Brown, I've been in IT for almost 28 years, and that was obviously before the term data science, but I've evolved from, I started out as a developer. And evolved through the data ranks, as I called it, working with data structures, working with data systems, data technologies, and now we're working with data pure and simple. Data science to me is an individual or team of individuals that dissect the data, understand the data, help folks look at the data differently than just the information that, you know, we usually use in reports, and get more insights on, how to utilize it and better leverage it as an asset within an organization. >> Great, thank you Craig, okay, Jennifer Shin? Math is obviously part of being a data scientist. You're good at math I understand. Tell us about yourself. >> Yeah, so I'm a senior principle data scientist at the Nielsen Company. I'm also the founder of 8 Path Solutions, which is a data science, analytics, and technology company, and I'm also on the faculty in the Master of Information and Data Science program at UC Berkeley. So math is part of the IT statistics for data science actually this semester, and I think for me, I consider myself a scientist primarily, and data science is a nice day job to have, right? Something where there's industry need for people with my skill set in the sciences, and data gives us a great way of being able to communicate sort of what we know in science in a way that can be used out there in the real world. I think the best benefit for me is that now that I'm a data scientist, people know what my job is, whereas before, maybe five ten years ago, no one understood what I did. Now, people don't necessarily understand what I do now, but at least they understand kind of what I do, so it's still an improvement. >> Excellent. Thank you Jennifer. Joe Caserta, you're somebody who started in the data warehouse business, and saw that snake swallow a basketball and grow into what we now know as big data, so tell us about yourself. >> So I've been doing data for 30 years now, and I wrote the Data Warehouse ETL Toolkit with Ralph Timbal, which is the best selling book in the industry on preparing data for analytics, and with the big paradigm shift that's happened, you know for me the past seven years has been, instead of preparing data for people to analyze data to make decisions, now we're preparing data for machines to make the decisions, and I think that's the big shift from data analysis to data analytics and data science. >> Great, thank you. Miriam, Miriam Fridell, welcome. >> Thank you. I'm Miriam Fridell, I work for Elder Research, we are a data science consultancy, and I came to data science, sort of through a very circuitous route. I started off as a physicist, went to work as a consultant and software engineer, then became a research analyst, and finally came to data science. And I think one of the most interesting things to me about data science is that it's not simply about building an interesting model and doing some interesting mathematics, or maybe wrangling the data, all of which I love to do, but it's really the entire analytics lifecycle, and a value that you can actually extract from data at the end, and that's one of the things that I enjoy most is seeing a client's eyes light up or a wow, I didn't really know we could look at data that way, that's really interesting. I can actually do something with that, so I think that, to me, is one of the most interesting things about it. >> Great, thank you. Justin Sadeen, welcome. >> Absolutely, than you, thank you. So my name is Justin Sadeen, I work for Morph EDU, an artificial intelligence company in Atlanta, Georgia, and we develop learning platforms for non-profit and private educational institutions. So I'm a Marine Corp veteran turned data enthusiast, and so what I think about data science is the intersection of information, intelligence, and analysis, and I'm really excited about the transition from big data into smart data, and that's what I see data science as. >> Great, and last but not least, Dez Blanchfield, welcome mate. >> Good day. Yeah, I'm the one with the funny accent. So data science for me is probably the funniest job I've ever to describe to my mom. I've had quite a few different jobs, and she's never understood any of them, and this one she understands the least. I think a fun way to describe what we're trying to do in the world of data science and analytics now is it's the equivalent of high altitude mountain climbing. It's like the extreme sport version of the computer science world, because we have to be this magical unicorn of a human that can understand plain english problems from C-suite down and then translate it into code, either as soles or as teams of developers. And so there's this black art that we're expected to be able to transmogrify from something that we just in plain english say I would like to know X, and we have to go and figure it out, so there's this neat extreme sport view I have of rushing down the side of a mountain on a mountain bike and just dodging rocks and trees and things occasionally, because invariably, we do have things that go wrong, and they don't quite give us the answers we want. But I think we're at an interesting point in time now with the explosion in the types of technology that are at our fingertips, and the scale at which we can do things now, once upon a time we would sit at a terminal and write code and just look at data and watch it in columns, and then we ended up with spreadsheet technologies at our fingertips. Nowadays it's quite normal to instantiate a small high performance distributed cluster of computers, effectively a super computer in a public cloud, and throw some data at it and see what comes back. And we can do that on a credit card. So I think we're at a really interesting tipping point now where this coinage of data science needs to be slightly better defined, so that we can help organizations who have weird and strange questions that they want to ask, tell them solutions to those questions, and deliver on them in, I guess, a commodity deliverable. I want to know xyz and I want to know it in this time frame and I want to spend this much amount of money to do it, and I don't really care how you're going to do it. And there's so many tools we can choose from and there's so many platforms we can choose from, it's this little black art of computing, if you'd like, we're effectively making it up as we go in many ways, so I think it's one of the most exciting challenges that I've had, and I think I'm pretty sure I speak for most of us in that we're lucky that we get paid to do this amazing job. That we get make up on a daily basis in some cases. >> Excellent, well okay. So we'll just get right into it. I'm going to go off script-- >> Do they have unicorns down under? I think they have some strange species right? >> Well we put the pointy bit on the back. You guys have in on the front. >> So I was at an IBM event on Friday. It was a chief data officer summit, and I attended what was called the Data Divas' breakfast. It was a women in tech thing, and one of the CDOs, she said that 25% of chief data officers are women, which is much higher than you would normally see in the profile of IT. We happen to have 25% of our panelists are women. Is that common? Miriam and Jennifer, is that common for the data science field? Or is this a higher percentage than you would normally see-- >> James: Or a lower percentage? >> I think certainly for us, we have hired a number of additional women in the last year, and they are phenomenal data scientists. I don't know that I would say, I mean I think it's certainly typical that this is still a male-dominated field, but I think like many male-dominated fields, physics, mathematics, computer science, I think that that is slowly changing and evolving, and I think certainly, that's something that we've noticed in our firm over the years at our consultancy, as we're hiring new people. So I don't know if I would say 25% is the right number, but hopefully we can get it closer to 50. Jennifer, I don't know if you have... >> Yeah, so I know at Nielsen we have actually more than 25% of our team is women, at least the team I work with, so there seems to be a lot of women who are going into the field. Which isn't too surprising, because with a lot of the issues that come up in STEM, one of the reasons why a lot of women drop out is because they want real world jobs and they feel like they want to be in the workforce, and so I think this is a great opportunity with data science being so popular for these women to actually have a job where they can still maintain that engineering and science view background that they learned in school. >> Great, well Hillary Mason, I think, was the first data scientist that I ever interviewed, and I asked her what are the sort of skills required and the first question that we wanted to ask, I just threw other women in tech in there, 'cause we love women in tech, is about this notion of the unicorn data scientist, right? It's been put forth that there's the skill sets required to be a date scientist are so numerous that it's virtually impossible to have a data scientist with all those skills. >> And I love Dez's extreme sports analogy, because that plays into the whole notion of data science, we like to talk about the theme now of data science as a team sport. Must it be an extreme sport is what I'm wondering, you know. The unicorns of the world seem to be... Is that realistic now in this new era? >> I mean when automobiles first came out, they were concerned that there wouldn't be enough chauffeurs to drive all the people around. Is there an analogy with data, to be a data-driven company. Do I need a data scientist, and does that data scientist, you know, need to have these unbelievable mixture of skills? Or are we doomed to always have a skill shortage? Open it up. >> I'd like to have a crack at that, so it's interesting, when automobiles were a thing, when they first bought cars out, and before they, sort of, were modernized by the likes of Ford's Model T, when we got away from the horse and carriage, they actually had human beings walking down the street with a flag warning the public that the horseless carriage was coming, and I think data scientists are very much like that. That we're kind of expected to go ahead of the organization and try and take the challenges we're faced with today and see what's going to come around the corner. And so we're like the little flag-bearers, if you'd like, in many ways of this is where we're at today, tell me where I'm going to be tomorrow, and try and predict the day after as well. It is very much becoming a team sport though. But I think the concept of data science being a unicorn has come about because the coinage hasn't been very well defined, you know, if you were to ask 10 people what a data scientist were, you'd get 11 answers, and I think this is a really challenging issue for hiring managers and C-suites when the generants say I was data science, I want big data, I want an analyst. They don't actually really know what they're asking for. Generally, if you ask for a database administrator, it's a well-described job spec, and you can just advertise it and some 20 people will turn up and you interview to decide whether you like the look and feel and smell of 'em. When you ask for a data scientist, there's 20 different definitions of what that one data science role could be. So we don't initially know what the job is, we don't know what the deliverable is, and we're still trying to figure that out, so yeah. >> Craig what about you? >> So from my experience, when we talk about data science, we're really talking about a collection of experiences with multiple people I've yet to find, at least from my experience, a data science effort with a lone wolf. So you're talking about a combination of skills, and so you don't have, no one individual needs to have all that makes a data scientist a data scientist, but you definitely have to have the right combination of skills amongst a team in order to accomplish the goals of data science team. So from my experiences and from the clients that I've worked with, we refer to the data science effort as a data science team. And I believe that's very appropriate to the team sport analogy. >> For us, we look at a data scientist as a full stack web developer, a jack of all trades, I mean they need to have a multitude of background coming from a programmer from an analyst. You can't find one subject matter expert, it's very difficult. And if you're able to find a subject matter expert, you know, through the lifecycle of product development, you're going to require that individual to interact with a number of other members from your team who are analysts and then you just end up well training this person to be, again, a jack of all trades, so it comes full circle. >> I own a business that does nothing but data solutions, and we've been in business 15 years, and it's been, the transition over time has been going from being a conventional wisdom run company with a bunch of experts at the top to becoming more of a data-driven company using data warehousing and BI, but now the trend is absolutely analytics driven. So if you're not becoming an analytics-driven company, you are going to be behind the curve very very soon, and it's interesting that IBM is now coining the phrase of a cognitive business. I think that is absolutely the future. If you're not a cognitive business from a technology perspective, and an analytics-driven perspective, you're going to be left behind, that's for sure. So in order to stay competitive, you know, you need to really think about data science think about how you're using your data, and I also see that what's considered the data expert has evolved over time too where it used to be just someone really good at writing SQL, or someone really good at writing queries in any language, but now it's becoming more of a interdisciplinary action where you need soft skills and you also need the hard skills, and that's why I think there's more females in the industry now than ever. Because you really need to have a really broad width of experiences that really wasn't required in the past. >> Greg Piateski, you have a comment? >> So there are not too many unicorns in nature or as data scientists, so I think organizations that want to hire data scientists have to look for teams, and there are a few unicorns like Hillary Mason or maybe Osama Faiat, but they generally tend to start companies and very hard to retain them as data scientists. What I see is in other evolution, automation, and you know, steps like IBM, Watson, the first platform is eventually a great advance for data scientists in the short term, but probably what's likely to happen in the longer term kind of more and more of those skills becoming subsumed by machine unique layer within the software. How long will it take, I don't know, but I have a feeling that the paradise for data scientists may not be very long lived. >> Greg, I have a follow up question to what I just heard you say. When a data scientist, let's say a unicorn data scientist starts a company, as you've phrased it, and the company's product is built on data science, do they give up becoming a data scientist in the process? It would seem that they become a data scientist of a higher order if they've built a product based on that knowledge. What is your thoughts on that? >> Well, I know a few people like that, so I think maybe they remain data scientists at heart, but they don't really have the time to do the analysis and they really have to focus more on strategic things. For example, today actually is the birthday of Google, 18 years ago, so Larry Page and Sergey Brin wrote a very influential paper back in the '90s About page rank. Have they remained data scientist, perhaps a very very small part, but that's not really what they do, so I think those unicorn data scientists could quickly evolve to have to look for really teams to capture those skills. >> Clearly they come to a point in their career where they build a company based on teams of data scientists and data engineers and so forth, which relates to the topic of team data science. What is the right division of roles and responsibilities for team data science? >> Before we go, Jennifer, did you have a comment on that? >> Yeah, so I guess I would say for me, when data science came out and there was, you know, the Venn Diagram that came out about all the skills you were supposed to have? I took a very different approach than all of the people who I knew who were going into data science. Most people started interviewing immediately, they were like this is great, I'm going to get a job. I went and learned how to develop applications, and learned computer science, 'cause I had never taken a computer science course in college, and made sure I trued up that one part where I didn't know these things or had the skills from school, so I went headfirst and just learned it, and then now I have actually a lot of technology patents as a result of that. So to answer Jim's question, actually. I started my company about five years ago. And originally started out as a consulting firm slash data science company, then it evolved, and one of the reasons I went back in the industry and now I'm at Nielsen is because you really can't do the same sort of data science work when you're actually doing product development. It's a very very different sort of world. You know, when you're developing a product you're developing a core feature or functionality that you're going to offer clients and customers, so I think definitely you really don't get to have that wide range of sort of looking at 8 million models and testing things out. That flexibility really isn't there as your product starts getting developed. >> Before we go into the team sport, the hard skills that you have, are you all good at math? Are you all computer science types? How about math? Are you all math? >> What were your GPAs? (laughs) >> David: Anybody not math oriented? Anybody not love math? You don't love math? >> I love math, I think it's required. >> David: So math yes, check. >> You dream in equations, right? You dream. >> Computer science? Do I have to have computer science skills? At least the basic knowledge? >> I don't know that you need to have formal classes in any of these things, but I think certainly as Jennifer was saying, if you have no skills in programming whatsoever and you have no interest in learning how to write SQL queries or RR Python, you're probably going to struggle a little bit. >> James: It would be a challenge. >> So I think yes, I have a Ph.D. in physics, I did a lot of math, it's my love language, but I think you don't necessarily need to have formal training in all of these things, but I think you need to have a curiosity and a love of learning, and so if you don't have that, you still want to learn and however you gain that knowledge I think, but yeah, if you have no technical interests whatsoever, and don't want to write a line of code, maybe data science is not the field for you. Even if you don't do it everyday. >> And statistics as well? You would put that in that same general category? How about data hacking? You got to love data hacking, is that fair? Eaves, you have a comment? >> Yeah, I think so, while we've been discussing that for me, the most important part is that you have a logical mind and you have the capability to absorb new things and the curiosity you need to dive into that. While I don't have an education in IT or whatever, I have a background in chemistry and those things that I learned there, I apply to information technology as well, and from a part that you say, okay, I'm a tech-savvy guy, I'm interested in the tech part of it, you need to speak that business language and if you can do that crossover and understand what other skill sets or parts of the roles are telling you I think the communication in that aspect is very important. >> I'd like throw just something really quickly, and I think there's an interesting thing that happens in IT, particularly around technology. We tend to forget that we've actually solved a lot of these problems in the past. If we look in history, if we look around the second World War, and Bletchley Park in the UK, where you had a very similar experience as humans that we're having currently around the whole issue of data science, so there was an interesting challenge with the enigma in the shark code, right? And there was a bunch of men put in a room and told, you're mathematicians and you come from universities, and you can crack codes, but they couldn't. And so what they ended up doing was running these ads, and putting challenges, they actually put, I think it was crossword puzzles in the newspaper, and this deluge of women came out of all kinds of different roles without math degrees, without science degrees, but could solve problems, and they were thrown at the challenge of cracking codes, and invariably, they did the heavy lifting. On a daily basis for converting messages from one format to another, so that this very small team at the end could actually get in play with the sexy piece of it. And I think we're going through a similar shift now with what we're refer to as data science in the technology and business world. Where the people who are doing the heavy lifting aren't necessarily what we'd think of as the traditional data scientists, and so, there have been some unicorns and we've championed them, and they're great. But I think the shift's going to be to accountants, actuaries, and statisticians who understand the business, and come from an MBA star background that can learn the relevant pieces of math and models that we need to to apply to get the data science outcome. I think we've already been here, we've solved this problem, we've just got to learn not to try and reinvent the wheel, 'cause the media hypes this whole thing of data science is exciting and new, but we've been here a couple times before, and there's a lot to be learned from that, my view. >> I think we had Joe next. >> Yeah, so I was going to say that, data science is a funny thing. To use the word science is kind of a misnomer, because there is definitely a level of art to it, and I like to use the analogy, when Michelangelo would look at a block of marble, everyone else looked at the block of marble to see a block of marble. He looks at a block of marble and he sees a finished sculpture, and then he figures out what tools do I need to actually make my vision? And I think data science is a lot like that. We hear a problem, we see the solution, and then we just need the right tools to do it, and I think part of consulting and data science in particular. It's not so much what we know out of the gate, but it's how quickly we learn. And I think everyone here, what makes them brilliant, is how quickly they could learn any tool that they need to see their vision get accomplished. >> David: Justin? >> Yeah, I think you make a really great point, for me, I'm a Marine Corp veteran, and the reason I mentioned that is 'cause I work with two veterans who are problem solvers. And I think that's what data scientists really are, in the long run are problem solvers, and you mentioned a great point that, yeah, I think just problem solving is the key. You don't have to be a subject matter expert, just be able to take the tools and intelligently use them. >> Now when you look at the whole notion of team data science, what is the right mix of roles, like role definitions within a high-quality or a high-preforming data science teams now IBM, with, of course, our announcement of project, data works and so forth. We're splitting the role division, in terms of data scientist versus data engineers versus application developer versus business analyst, is that the right breakdown of roles? Or what would the panelists recommend in terms of understanding what kind of roles make sense within, like I said, a high performing team that's looking for trying to develop applications that depend on data, machine learning, and so forth? Anybody want to? >> I'll tackle that. So the teams that I have created over the years made up these data science teams that I brought into customer sites have a combination of developer capabilities and some of them are IT developers, but some of them were developers of things other than applications. They designed buildings, they did other things with their technical expertise besides building technology. The other piece besides the developer is the analytics, and analytics can be taught as long as they understand how algorithms work and the code behind the analytics, in other words, how are we analyzing things, and from a data science perspective, we are leveraging technology to do the analyzing through the tool sets, so ultimately as long as they understand how tool sets work, then we can train them on the tools. Having that analytic background is an important piece. >> Craig, is it easier to, I'll go to you in a moment Joe, is it easier to cross train a data scientist to be an app developer, than to cross train an app developer to be a data scientist or does it not matter? >> Yes. (laughs) And not the other way around. It depends on the-- >> It's easier to cross train a data scientist to be an app developer than-- >> Yes. >> The other way around. Why is that? >> Developing code can be as difficult as the tool set one uses to develop code. Today's tool sets are very user friendly. where developing code is very difficult to teach a person to think along the lines of developing code when they don't have any idea of the aspects of code, of building something. >> I think it was Joe, or you next, or Jennifer, who was it? >> I would say that one of the reasons for that is data scientists will probably know if the answer's right after you process data, whereas data engineer might be able to manipulate the data but may not know if the answer's correct. So I think that is one of the reasons why having a data scientist learn the application development skills might be a easier time than the other way around. >> I think Miriam, had a comment? Sorry. >> I think that what we're advising our clients to do is to not think, before data science and before analytics became so required by companies to stay competitive, it was more of a waterfall, you have a data engineer build a solution, you know, then you throw it over the fence and the business analyst would have at it, where now, it must be agile, and you must have a scrum team where you have the data scientist and the data engineer and the project manager and the product owner and someone from the chief data office all at the table at the same time and all accomplishing the same goal. Because all of these skills are required, collectively in order to solve this problem, and it can't be done daisy chained anymore it has to be a collaboration. And that's why I think spark is so awesome, because you know, spark is a single interface that a data engineer can use, a data analyst can use, and a data scientist can use. And now with what we've learned today, having a data catalog on top so that the chief data office can actually manage it, I think is really going to take spark to the next level. >> James: Miriam? >> I wanted to comment on your question to Craig about is it harder to teach a data scientist to build an application or vice versa, and one of the things that we have worked on a lot in our data science team is incorporating a lot of best practices from software development, agile, scrum, that sort of thing, and I think particularly with a focus on deploying models that we don't just want to build an interesting data science model, we want to deploy it, and get some value. You need to really incorporate these processes from someone who might know how to build applications and that, I think for some data scientists can be a challenge, because one of the fun things about data science is you get to get into the data, and you get your hands dirty, and you build a model, and you get to try all these cool things, but then when the time comes for you to actually deploy something, you need deployment-grade code in order to make sure it can go into production at your client side and be useful for instance, so I think that there's an interesting challenge on both ends, but one of the things I've definitely noticed with some of our data scientists is it's very hard to get them to think in that mindset, which is why you have a team of people, because everyone has different skills and you can mitigate that. >> Dev-ops for data science? >> Yeah, exactly. We call it insight ops, but yeah, I hear what you're saying. Data science is becoming increasingly an operational function as opposed to strictly exploratory or developmental. Did some one else have a, Dez? >> One of the things I was going to mention, one of the things I like to do when someone gives me a new problem is take all the laptops and phones away. And we just end up in a room with a whiteboard. And developers find that challenging sometimes, so I had this one line where I said to them don't write the first line of code until you actually understand the problem you're trying to solve right? And I think where the data science focus has changed the game for organizations who are trying to get some systematic repeatable process that they can throw data at and just keep getting answers and things, no matter what the industry might be is that developers will come with a particular mindset on how they're going to codify something without necessarily getting the full spectrum and understanding the problem first place. What I'm finding is the people that come at data science tend to have more of a hacker ethic. They want to hack the problem, they want to understand the challenge, and they want to be able to get it down to plain English simple phrases, and then apply some algorithms and then build models, and then codify it, and so most of the time we sit in a room with whiteboard markers just trying to build a model in a graphical sense and make sure it's going to work and that it's going to flow, and once we can do that, we can codify it. I think when you come at it from the other angle from the developer ethic, and you're like I'm just going to codify this from day one, I'm going to write code. I'm going to hack this thing out and it's just going to run and compile. Often, you don't truly understand what he's trying to get to at the end point, and you can just spend days writing code and I think someone made the comment that sometimes you don't actually know whether the output is actually accurate in the first place. So I think there's a lot of value being provided from the data science practice. Over understanding the problem in plain english at a team level, so what am I trying to do from the business consulting point of view? What are the requirements? How do I build this model? How do I test the model? How do I run a sample set through it? Train the thing and then make sure what I'm going to codify actually makes sense in the first place, because otherwise, what are you trying to solve in the first place? >> Wasn't that Einstein who said if I had an hour to solve a problem, I'd spend 55 minutes understanding the problem and five minutes on the solution, right? It's exactly what you're talking about. >> Well I think, I will say, getting back to the question, the thing with building these teams, I think a lot of times people don't talk about is that engineers are actually very very important for data science projects and data science problems. For instance, if you were just trying to prototype something or just come up with a model, then data science teams are great, however, if you need to actually put that into production, that code that the data scientist has written may not be optimal, so as we scale out, it may be actually very inefficient. At that point, you kind of want an engineer to step in and actually optimize that code, so I think it depends on what you're building and that kind of dictates what kind of division you want among your teammates, but I do think that a lot of times, the engineering component is really undervalued out there. >> Jennifer, it seems that the data engineering function, data discovery and preparation and so forth is becoming automated to a greater degree, but if I'm listening to you, I don't hear that data engineering as a discipline is becoming extinct in terms of a role that people can be hired into. You're saying that there's a strong ongoing need for data engineers to optimize the entire pipeline to deliver the fruits of data science in production applications, is that correct? So they play that very much operational role as the backbone for... >> So I think a lot of times businesses will go to data scientist to build a better model to build a predictive model, but that model may not be something that you really want to implement out there when there's like a million users coming to your website, 'cause it may not be efficient, it may take a very long time, so I think in that sense, it is important to have good engineers, and your whole product may fail, you may build the best model it may have the best output, but if you can't actually implement it, then really what good is it? >> What about calibrating these models? How do you go about doing that and sort of testing that in the real world? Has that changed overtime? Or is it... >> So one of the things that I think can happen, and we found with one of our clients is when you build a model, you do it with the data that you have, and you try to use a very robust cross-validation process to make sure that it's robust and it's sturdy, but one thing that can sometimes happen is after you put your model into production, there can be external factors that, societal or whatever, things that have nothing to do with the data that you have or the quality of the data or the quality of the model, which can actually erode the model's performance over time. So as an example, we think about cell phone contracts right? Those have changed a lot over the years, so maybe five years ago, the type of data plan you had might not be the same that it is today, because a totally different type of plan is offered, so if you're building a model on that to say predict who's going to leave and go to a different cell phone carrier, the validity of your model overtime is going to completely degrade based on nothing that you have, that you put into the model or the data that was available, so I think you need to have this sort of model management and monitoring process to take this factors into account and then know when it's time to do a refresh. >> Cross-validation, even at one point in time, for example, there was an article in the New York Times recently that they gave the same data set to five different data scientists, this is survey data for the presidential election that's upcoming, and five different data scientists came to five different predictions. They were all high quality data scientists, the cross-validation showed a wide variation about who was on top, whether it was Hillary or whether it was Trump so that shows you that even at any point in time, cross-validation is essential to understand how robust the predictions might be. Does somebody else have a comment? Joe? >> I just want to say that this even drives home the fact that having the scrum team for each project and having the engineer and the data scientist, data engineer and data scientist working side by side because it is important that whatever we're building we assume will eventually go into production, and we used to have in the data warehousing world, you'd get the data out of the systems, out of your applications, you do analysis on your data, and the nirvana was maybe that data would go back to the system, but typically it didn't. Nowadays, the applications are dependent on the insight coming from the data science team. With the behavior of the application and the personalization and individual experience for a customer is highly dependent, so it has to be, you said is data science part of the dev-ops team, absolutely now, it has to be. >> Whose job is it to figure out the way in which the data is presented to the business? Where's the sort of presentation, the visualization plan, is that the data scientist role? Does that depend on whether or not you have that gene? Do you need a UI person on your team? Where does that fit? >> Wow, good question. >> Well usually that's the output, I mean, once you get to the point where you're visualizing the data, you've created an algorithm or some sort of code that produces that to be visualized, so at the end of the day that the customers can see what all the fuss is about from a data science perspective. But it's usually post the data science component. >> So do you run into situations where you can see it and it's blatantly obvious, but it doesn't necessarily translate to the business? >> Well there's an interesting challenge with data, and we throw the word data around a lot, and I've got this fun line I like throwing out there. If you torture data long enough, it will talk. So the challenge then is to figure out when to stop torturing it, right? And it's the same with models, and so I think in many other parts of organizations, we'll take something, if someone's doing a financial report on performance of the organization and they're doing it in a spreadsheet, they'll get two or three peers to review it, and validate that they've come up with a working model and the answer actually makes sense. And I think we're rushing so quickly at doing analysis on data that comes to us in various formats and high velocity that I think it's very important for us to actually stop and do peer reviews, of the models and the data and the output as well, because otherwise we start making decisions very quickly about things that may or may not be true. It's very easy to get the data to paint any picture you want, and you gave the example of the five different attempts at that thing, and I had this shoot out thing as well where I'll take in a team, I'll get two different people to do exactly the same thing in completely different rooms, and come back and challenge each other, and it's quite amazing to see the looks on their faces when they're like, oh, I didn't see that, and then go back and do it again until, and then just keep iterating until we get to the point where they both get the same outcome, in fact there's a really interesting anecdote about when the UNIX operation system was being written, and a couple of the authors went away and wrote the same program without realizing that each other were doing it, and when they came back, they actually had line for line, the same piece of C code, 'cause they'd actually gotten to a truth. A perfect version of that program, and I think we need to often look at, when we're building models and playing with data, if we can't come at it from different angles, and get the same answer, then maybe the answer isn't quite true yet, so there's a lot of risk in that. And it's the same with presentation, you know, you can paint any picture you want with the dashboard, but who's actually validating when the dashboard's painting the correct picture? >> James: Go ahead, please. >> There is a science actually, behind data visualization, you know if you're doing trending, it's a line graph, if you're doing comparative analysis, it's bar graph, if you're doing percentages, it's a pie chart, like there is a certain science to it, it's not that much of a mystery as the novice thinks there is, but what makes it challenging is that you also, just like any presentation, you have to consider your audience. And your audience, whenever we're delivering a solution, either insight, or just data in a grid, we really have to consider who is the consumer of this data, and actually cater the visual to that person or to that particular audience. And that is part of the art, and that is what makes a great data scientist. >> The consumer may in fact be the source of the data itself, like in a mobile app, so you're tuning their visualization and then their behavior is changing as a result, and then the data on their changed behavior comes back, so it can be a circular process. >> So Jim, at a recent conference, you were tweeting about the citizen data scientist, and you got emasculated by-- >> I spoke there too. >> Okay. >> TWI on that same topic, I got-- >> Kirk Borne I hear came after you. >> Kirk meant-- >> Called foul, flag on the play. >> Kirk meant well. I love Claudia Emahoff too, but yeah, it's a controversial topic. >> So I wonder what our panel thinks of that notion, citizen data scientist. >> Can I respond about citizen data scientists? >> David: Yeah, please. >> I think this term was introduced by Gartner analyst in 2015, and I think it's a very dangerous and misleading term. I think definitely we want to democratize the data and have access to more people, not just data scientists, but managers, BI analysts, but when there is already a term for such people, we can call the business analysts, because it implies some training, some understanding of the data. If you use the term citizen data scientist, it implies that without any training you take some data and then you find something there, and they think as Dev's mentioned, we've seen many examples, very easy to find completely spurious random correlations in data. So we don't want citizen dentists to treat our teeth or citizen pilots to fly planes, and if data's important, having citizen data scientists is equally dangerous, so I'm hoping that, I think actually Gartner did not use the term citizen data scientist in their 2016 hype course, so hopefully they will put this term to rest. >> So Gregory, you apparently are defining citizen to mean incompetent as opposed to simply self-starting. >> Well self-starting is very different, but that's not what I think what was the intention. I think what we see in terms of data democratization, there is a big trend over automation. There are many tools, for example there are many companies like Data Robot, probably IBM, has interesting machine learning capability towards automation, so I think I recently started a page on KDnuggets for automated data science solutions, and there are already 20 different forums that provide different levels of automation. So one can deliver in full automation maybe some expertise, but it's very dangerous to have part of an automated tool and at some point then ask citizen data scientists to try to take the wheels. >> I want to chime in on that. >> David: Yeah, pile on. >> I totally agree with all of that. I think the comment I just want to quickly put out there is that the space we're in is a very young, and rapidly changing world, and so what we haven't had yet is this time to stop and take a deep breath and actually define ourselves, so if you look at computer science in general, a lot of the traditional roles have sort of had 10 or 20 years of history, and so thorough the hiring process, and the development of those spaces, we've actually had time to breath and define what those jobs are, so we know what a systems programmer is, and we know what a database administrator is, but we haven't yet had a chance as a community to stop and breath and say, well what do we think these roles are, and so to fill that void, the media creates coinages, and I think this is the risk we've got now that the concept of a data scientist was just a term that was coined to fill a void, because no one quite knew what to call somebody who didn't come from a data science background if they were tinkering around data science, and I think that's something that we need to sort of sit up and pay attention to, because if we don't own that and drive it ourselves, then somebody else is going to fill the void and they'll create these very frustrating concepts like data scientist, which drives us all crazy. >> James: Miriam's next. >> So I wanted to comment, I agree with both of the previous comments, but in terms of a citizen data scientist, and I think whether or not you're citizen data scientist or an actual data scientist whatever that means, I think one of the most important things you can have is a sense of skepticism, right? Because you can get spurious correlations and it's like wow, my predictive model is so excellent, you know? And being aware of things like leaks from the future, right? This actually isn't predictive at all, it's a result of the thing I'm trying to predict, and so I think one thing I know that we try and do is if something really looks too good, we need to go back in and make sure, did we not look at the data correctly? Is something missing? Did we have a problem with the ETL? And so I think that a healthy sense of skepticism is important to make sure that you're not taking a spurious correlation and trying to derive some significant meaning from it. >> I think there's a Dilbert cartoon that I saw that described that very well. Joe, did you have a comment? >> I think that in order for citizen data scientists to really exist, I think we do need to have more maturity in the tools that they would use. My vision is that the BI tools of today are all going to be replaced with natural language processing and searching, you know, just be able to open up a search bar and say give me sales by region, and to take that one step into the future even further, it should actually say what are my sales going to be next year? And it should trigger a simple linear regression or be able to say which features of the televisions are actually affecting sales and do a clustering algorithm, you know I think hopefully that will be the future, but I don't see anything of that today, and I think in order to have a true citizen data scientist, you would need to have that, and that is pretty sophisticated stuff. >> I think for me, the idea of citizen data scientist I can relate to that, for instance, when I was in graduate school, I started doing some research on FDA data. It was an open source data set about 4.2 million data points. Technically when I graduated, the paper was still not published, and so in some sense, you could think of me as a citizen data scientist, right? I wasn't getting funding, I wasn't doing it for school, but I was still continuing my research, so I'd like to hope that with all the new data sources out there that there might be scientists or people who are maybe kept out of a field people who wanted to be in STEM and for whatever life circumstance couldn't be in it. That they might be encouraged to actually go and look into the data and maybe build better models or validate information that's out there. >> So Justin, I'm sorry you had one comment? >> It seems data science was termed before academia adopted formalized training for data science. But yeah, you can make, like Dez said, you can make data work for whatever problem you're trying to solve, whatever answer you see, you want data to work around it, you can make it happen. And I kind of consider that like in project management, like data creep, so you're so hyper focused on a solution you're trying to find the answer that you create an answer that works for that solution, but it may not be the correct answer, and I think the crossover discussion works well for that case. >> So but the term comes up 'cause there's a frustration I guess, right? That data science skills are not plentiful, and it's potentially a bottleneck in an organization. Supposedly 80% of your time is spent on cleaning data, is that right? Is that fair? So there's a problem. How much of that can be automated and when? >> I'll have a shot at that. So I think there's a shift that's going to come about where we're going to move from centralized data sets to data at the edge of the network, and this is something that's happening very quickly now where we can't just hold everything back to a central spot. When the internet of things actually wakes up. Things like the Boeing Dreamliner 787, that things got 6,000 sensors in it, produces half a terabyte of data per flight. There are 87,400 flights per day in domestic airspace in the U.S. That's 43.5 petabytes of raw data, now that's about three years worth of disk manufacturing in total, right? We're never going to copy that across one place, we can't process, so I think the challenge we've got ahead of us is looking at how we're going to move the intelligence and the analytics to the edge of the network and pre-cook the data in different tiers, so have a look at the raw material we get, and boil it down to a slightly smaller data set, bring a meta data version of that back, and eventually get to the point where we've only got the very minimum data set and data points we need to make key decisions. Without that, we're already at the point where we have too much data, and we can't munch it fast enough, and we can't spin off enough tin even if we witch the cloud on, and that's just this never ending deluge of noise, right? And you've got that signal versus noise problem so then we're now seeing a shift where people looking at how do we move the intelligence back to the edge of network which we actually solved some time ago in the securities space. You know, spam filtering, if an emails hits Google on the west coast of the U.S. and they create a check some for that spam email, it immediately goes into a database, and nothing gets on the opposite side of the coast, because they already know it's spam. They recognize that email coming in, that's evil, stop it. So we've already fixed its insecurity with intrusion detection, we've fixed it in spam, so we now need to take that learning, and bring it into business analytics, if you like, and see where we're finding patterns and behavior, and brew that out to the edge of the network, so if I'm seeing a demand over here for tickets on a new sale of a show, I need to be able to see where else I'm going to see that demand and start responding to that before the demand comes about. I think that's a shift that we're going to see quickly, because we'll never keep up with the data munching challenge and the volume's just going to explode. >> David: We just have a couple minutes. >> That does sound like a great topic for a future Cube panel which is data science on the edge of the fog. >> I got a hundred questions around that. So we're wrapping up here. Just got a couple minutes. Final thoughts on this conversation or any other pieces that you want to punctuate. >> I think one thing that's been really interesting for me being on this panel is hearing all of my co-panelists talking about common themes and things that we are also experiencing which isn't a surprise, but it's interesting to hear about how ubiquitous some of the challenges are, and also at the announcement earlier today, some of the things that they're talking about and thinking about, we're also talking about and thinking about. So I think it's great to hear we're all in different countries and different places, but we're experiencing a lot of the same challenges, and I think that's been really interesting for me to hear about. >> David: Great, anybody else, final thoughts? >> To echo Dez's thoughts, it's about we're never going to catch up with the amount of data that's produced, so it's about transforming big data into smart data. >> I could just say that with the shift from normal data, small data, to big data, the answer is automate, automate, automate, and we've been talking about advanced algorithms and machine learning for the science for changing the business, but there also needs to be machine learning and advanced algorithms for the backroom where we're actually getting smarter about how we ingestate and how we fix data as it comes in. Because we can actually train the machines to understand data anomalies and what we want to do with them over time. And I think the further upstream we get of data correction, the less work there will be downstream. And I also think that the concept of being able to fix data at the source is gone, that's behind us. Right now the data that we're using to analyze to change the business, typically we have no control over. Like Dez said, they're coming from censors and machines and internet of things and if it's wrong, it's always going to be wrong, so we have to figure out how to do that in our laboratory. >> Eaves, final thoughts? >> I think it's a mind shift being a data scientist if you look back at the time why did you start developing or writing code? Because you like to code, whatever, just for the sake of building a nice algorithm or a piece of software, or whatever, and now I think with the spirit of a data scientist, you're looking at a problem and say this is where I want to go, so you have more the top down approach than the bottom up approach. And have the big picture and that is what you really need as a data scientist, just look across technologies, look across departments, look across everything, and then on top of that, try to apply as much skills as you have available, and that's kind of unicorn that they're trying to look for, because it's pretty hard to find people with that wide vision on everything that is happening within the company, so you need to be aware of technology, you need to be aware of how a business is run, and how it fits within a cultural environment, you have to work with people and all those things together to my belief to make it very difficult to find those good data scientists. >> Jim? Your final thought? >> My final thoughts is this is an awesome panel, and I'm so glad that you've come to New York, and I'm hoping that you all stay, of course, for the the IBM Data First launch event that will take place this evening about a block over at Hudson Mercantile, so that's pretty much it. Thank you, I really learned a lot. >> I want to second Jim's thanks, really, great panel. Awesome expertise, really appreciate you taking the time, and thanks to the folks at IBM for putting this together. >> And I'm big fans of most of you, all of you, on this session here, so it's great just to meet you in person, thank you. >> Okay, and I want to thank Jeff Frick for being a human curtain there with the sun setting here in New York City. Well thanks very much for watching, we are going to be across the street at the IBM announcement, we're going to be on the ground. We open up again tomorrow at 9:30 at Big Data NYC, Big Data Week, Strata plus the Hadoop World, thanks for watching everybody, that's a wrap from here. This is the Cube, we're out. (techno music)

Published Date : Sep 28 2016

SUMMARY :

Brought to you by headline sponsors, and this is a cube first, and we have some really but I want to hear them. and appreciate you organizing this. and the term data mining Eves, I of course know you from Twitter. and you can do that on a technical level, How many people have been on the Cube I always like to ask that question. and that was obviously Great, thank you Craig, and I'm also on the faculty and saw that snake swallow a basketball and with the big paradigm Great, thank you. and I came to data science, Great, thank you. and so what I think about data science Great, and last but not least, and the scale at which I'm going to go off script-- You guys have in on the front. and one of the CDOs, she said that 25% and I think certainly, that's and so I think this is a great opportunity and the first question talk about the theme now and does that data scientist, you know, and you can just advertise and from the clients I mean they need to have and it's been, the transition over time but I have a feeling that the paradise and the company's product and they really have to focus What is the right division and one of the reasons I You dream in equations, right? and you have no interest in learning but I think you need to and the curiosity you and there's a lot to be and I like to use the analogy, and the reason I mentioned that is that the right breakdown of roles? and the code behind the analytics, And not the other way around. Why is that? idea of the aspects of code, of the reasons for that I think Miriam, had a comment? and someone from the chief data office and one of the things that an operational function as opposed to and so most of the time and five minutes on the solution, right? that code that the data but if I'm listening to you, that in the real world? the data that you have or so that shows you that and the nirvana was maybe that the customers can see and a couple of the authors went away and actually cater the of the data itself, like in a mobile app, I love Claudia Emahoff too, of that notion, citizen data scientist. and have access to more people, to mean incompetent as opposed to and at some point then ask and the development of those spaces, and so I think one thing I think there's a and I think in order to have a true so I'd like to hope that with all the new and I think So but the term comes up and the analytics to of the fog. or any other pieces that you want to and also at the so it's about transforming big data and machine learning for the science and now I think with the and I'm hoping that you and thanks to the folks at IBM so it's great just to meet you in person, This is the Cube, we're out.

ENTITIES

Entity	Category	Confidence
Jennifer	PERSON	0.99+
Jennifer Shin	PERSON	0.99+
Miriam Fridell	PERSON	0.99+
Greg Piateski	PERSON	0.99+
Justin	PERSON	0.99+
IBM	ORGANIZATION	0.99+
David	PERSON	0.99+
Jeff Frick	PERSON	0.99+
2015	DATE	0.99+
Joe Caserta	PERSON	0.99+
James Cubelis	PERSON	0.99+
James	PERSON	0.99+
Miriam	PERSON	0.99+
Jim	PERSON	0.99+
Joe	PERSON	0.99+
Claudia Emahoff	PERSON	0.99+
NVIDIA	ORGANIZATION	0.99+
Hillary	PERSON	0.99+
New York	LOCATION	0.99+
Hillary Mason	PERSON	0.99+
Justin Sadeen	PERSON	0.99+
Greg	PERSON	0.99+
Dave	PERSON	0.99+
55 minutes	QUANTITY	0.99+
Trump	PERSON	0.99+
2016	DATE	0.99+
Craig	PERSON	0.99+
Dave Valante	PERSON	0.99+
George	PERSON	0.99+
Dez Blanchfield	PERSON	0.99+
UK	LOCATION	0.99+
Ford	ORGANIZATION	0.99+
Craig Brown	PERSON	0.99+
10	QUANTITY	0.99+
8 Path Solutions	ORGANIZATION	0.99+
CISCO	ORGANIZATION	0.99+
five minutes	QUANTITY	0.99+
two	QUANTITY	0.99+
30 years	QUANTITY	0.99+
Kirk	PERSON	0.99+
25%	QUANTITY	0.99+
Marine Corp	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
43.5 petabytes	QUANTITY	0.99+
Boston	LOCATION	0.99+
Data Robot	ORGANIZATION	0.99+
10 people	QUANTITY	0.99+
Hal Varian	PERSON	0.99+
Einstein	PERSON	0.99+
New York City	LOCATION	0.99+
Nielsen	ORGANIZATION	0.99+
first question	QUANTITY	0.99+
Friday	DATE	0.99+
Ralph Timbal	PERSON	0.99+
U.S.	LOCATION	0.99+
6,000 sensors	QUANTITY	0.99+
UC Berkeley	ORGANIZATION	0.99+
Sergey Brin	PERSON	0.99+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Hal Varian: