A Day in the Life of a Data Scientist


 

>>Hello, everyone. Welcome to the "A Day in the Life of a Data Scientist" talk. My name is Terry Chang. I'm a data scientist for the HPE Ezmeral Container Platform team. And with me in the chat room, moderating the chat, I have Matt Maccaux as well as Doug Tackett, and we're going to dive straight into what we can do with the Ezmeral Container Platform and how we can support the role of a data scientist.
>>So just a quick agenda. I'm going to do some introductions and set the context of what we're going to talk about, and then we're actually going to dive straight into the Ezmeral Container Platform. We're going to walk through what a data scientist will do, pretty much a day in the life of the data scientist, and then we'll have some question and answer. So big data has been the talk within the last few years, within the last decade or so, and with big data there are a lot of ways to derive meaning. A lot of businesses are trying to optimize every decision in their applications by utilizing data. Previously we had a lot of focus on data analytics, but recently we've seen a lot of data being used for machine learning: taking whatever data they can and sending it off to the data scientists to start doing some modeling and some prediction.
>>So that's where we're seeing modern businesses rooted in analytics, and data science in itself is a team sport. We're seeing that we need more than data scientists to do all this modeling. We need data engineers to take the data, massage the data, and do some data manipulation in order to get it right for the data scientists. We have data analysts who are monitoring the models, and we even have the data scientists themselves, who are building and iterating through multiple different models until they find one that is satisfactory to the business needs. Once they're done, they can send it off to the software engineers, who will actually build it into their application, whether it's a mobile app or a web app. And then we have the operations team assigning the resources and also monitoring it as well.
>>So we're really seeing data science as a team sport, and it does require a lot of different expertise. Here's the basic machine learning pipeline that we see in the industry now. At the top we have this training environment, and this is an entire loop: we'll have some registration, we'll have some inferencing, and at the center of all this is the data prep, as well as your repositories, such as for your data or for your GitHub repository, things of that sort. So we're seeing the machine learning industry follow this very basic pattern, and at a high level, glancing through it very quickly, this is what the machine learning pipeline will look like on the Ezmeral Container Platform. At the top left we'll have our project repository, which is our persistent storage. We'll have some training clusters, we'll have a notebook, we'll have an inference deployment engine and a REST API, which are all sitting on top of a Kubernetes cluster. And the benefit of the container platform is that this is all abstracted away from the data scientist. So I will actually go straight into that.
So just to preface, before we go into the Ezmeral Container Platform: what we're going to look at is an example machine learning problem, trying to predict how long a specific taxi ride will take. With a Jupyter notebook, the data scientist can take all of this data, do their data manipulation, train a model on a specific set of features, such as the location of a taxi ride and the duration of a taxi ride, and then use the model to figure out what kind of prediction we can get on a future taxi ride.
>>So that's the example that we will talk through today. I'm going to hop out of my slides and jump into my web browser. Let me zoom in on this. Here I have a Jupyter environment, and this is all running on the container platform. All I need is this link and I can access my environment. So as a data scientist, I can grab this link from my IT admin or my system administrator and quickly start iterating and coding. On the left-hand side of Jupyter we actually have a file directory structure. This is already synced up to my Git repository, which I will show in a little bit on the container platform, so I can quickly pull any files that are on my GitHub repository. I can even push with a button here. And I can open up this Python notebook.
>>With all the unique features of the Jupyter environment, I can start coding. Each of these cells can run Python code, and in particular, on the Ezmeral Container Platform team we've actually built our own in-house line magic commands. These are unique commands that we can use to interact with the underlying infrastructure of the container platform. The first line magic command that I want to mention is the command called %attachments. When I run this command, I'll get the available training clusters that I can send training jobs to. This specific notebook has pretty much been created for me to iterate and develop a model very quickly. I don't have to use all the resources; I don't have to allocate a full set of GPU boxes to my little Jupyter environment. With the training clusters, I can attach these individual data science notebooks to those training clusters, and the data scientists can utilize those resources as a shared environment.
>>So essentially that shared, large, eight-GPU box can actually be shared; it doesn't have to be allocated to a single data scientist. Moving on, we have another magic command, a cell magic for Python training. This is how we're going to utilize that training cluster. I prefix the cell with the %% magic and the name of the training cluster, and this tells the notebook to send that entire training cell to be trained on the resources of that training cluster. So the data scientist can quickly iterate on a model, then format that model and all of that code into one large cell and send it off to the training cluster. Because that training cluster is located somewhere else, it has no context of what has been done locally in this notebook, so we're going to have to copy everything into one large cell.
>>So as you see here, I'm going to be importing some libraries, I'm going to start defining some helper functions, and I'm going to read in my dataset. With the typical data science modeling life cycle, we're going to have to take in the data.
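As a rough sketch of how those notebook magics are used: only %attachments is named explicitly in the talk, so the cell-magic name, cluster name, and file path below are illustrative placeholders rather than the platform's exact syntax.

```python
# Cell 1 -- list the training clusters this notebook is attached to,
# using the %attachments line magic described in the talk.
%attachments

# Cell 2 (separate notebook cell) -- submit the whole cell to a training cluster.
# The magic name and "training-cluster-1" are placeholders for whatever
# %attachments reported. Everything the remote job needs must live inside
# this one cell, because the training cluster has no context from the
# local notebook session.
%%python_training training-cluster-1
import pandas as pd

df = pd.read_csv("/project_repo/data/taxi_rides.csv")  # hypothetical project-repo path
print("rows read:", len(df))
```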
We're going to have to do some data pre-processing. Maybe the data scientist will do this, maybe the data engineer will do this, but they have access to that data. Here I'm actually going to be reading in the data from the project repository, and I'll talk about this a little bit later: all of the clusters within the container platform have access to a project repository that has been set up using the underlying data fabric. So with this, I have some data preprocessing. I'm going to cleanse some of my data where I noticed that maybe something is missing or some data looks funky.
Maybe the data types aren't correct. This will all happen here in these cells. Once that is done, I can print out that the data is done cleaning, and I can start training my model. Here we have to split our dataset into a test/train split, so that we have some data for actually training the model and some data to test the model. So I can split my data there, I can create my XGBoost object to start doing my training (XGBoost is a decision-tree-based machine learning algorithm), I'm going to fit my data into this XGBoost algorithm, and then I'm going to do some prediction. In addition, I'm actually going to be tracking some of the metrics and printing them out. These are common metrics that data scientists want to see when they train an algorithm:
whether the accuracy is improving, whether the loss is improving, the mean absolute error, things like that. These are all things data scientists want to see. At the end of this training job I'm going to be saving the model, saving it back into the project repository, which we will have access to, and at the end I will print out the end time. So I can execute that cell, and I've already executed it, so you'll see all of these print statements happening here: importing the libraries, the training run, reading in data, et cetera. All of this has been printed out from that training job. And in order to access that and glance through it, we get an output with a unique history URL.
So when we send the training job to the training cluster, the training cluster will send back a unique URL, and we'll use the last magic command that I want to talk about, called %logs. The %logs command will parse out that response from the training cluster, and we can track in real time what is happening in that training job. So quickly, we can see that the data scientist has a sandbox environment available to them. They have access to their Git repository; they have access to a project repository in which they can read in some of their data and save the model. It's a very quick, interactive environment for the data scientist to do all of their work, it's all provisioned on the Ezmeral Container Platform, and it's all abstracted away. Here I want to mention again that this URL is being surfaced through the container platform;
the data scientist doesn't have to interact with that at all. But let's take a step back. That is the day-to-day life of the data scientist. Now we're going to go backwards into the container platform, and we're going to walk through how it was all set up for them.
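Before moving on to the platform setup, here is a rough sketch of the kind of self-contained training cell described above. The file paths, column names, and hyperparameters are illustrative, not the demo's actual code.

```python
# A self-contained training cell of the sort shipped to the training cluster.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Read from the shared project repository (hypothetical path).
df = pd.read_csv("/project_repo/data/taxi_rides.csv")

# Basic cleansing: drop missing rows, fix up types, derive a feature.
df = df.dropna()
df["pickup_hour"] = pd.to_datetime(df["pickup_datetime"]).dt.hour
print("data is done cleaning")

features = ["pickup_hour", "pickup_longitude", "pickup_latitude",
            "dropoff_longitude", "dropoff_latitude", "passenger_count"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["trip_duration"], test_size=0.2, random_state=42)

# Fit the XGBoost regressor and report a metric the talk mentions.
model = xgb.XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("mean absolute error (seconds):", mean_absolute_error(y_test, preds))

# Save the model back into the project repository for registration later.
model.save_model("/project_repo/models/taxi_duration.json")
print("training finished")
```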
So here is my login page to the container platform. I'm going to log in as my user, and this is going to bring me to the view of the ML Ops tenant within the container platform. This is where everything has been set up for me; the data scientist doesn't have to see this if they don't need to. What I'll walk through now are the topics that I mentioned previously that we would come back to. First is the project repository. This project repository comes with each tenant that is created on the platform.
It's nothing more than a shared, collaborative workspace in which any data scientist who is allocated to this tenant has a client view where they can visually see all of their data and all of their code. It's actually taking a piece of the underlying data fabric and using that as your project repository. So you can see here I have some code, I can see my scoring script, and I can see the models that have been created within this tenant. It's a pretty powerful tool in which you can store your code, store any of your data, and have the ability to read and write from any of your Jupyter environments or any of your created clusters within this tenant. A very useful addition, in which you can quickly interact with your data.
The next thing I want to show is the source control. Here is where you would plug in all of your information for your source control, and if I edit this, you'll see all the information that I've passed in to configure it. On the back end, the container platform will take these credentials and connect the Jupyter notebooks you create within this tenant to that Git repository. This is the information that I've passed in, and if GitHub is not of interest, we also have support for Bitbucket here as well. Next, I want to show you that we do have these notebook environments. The notebook environment was created here, and you can see that I have a notebook called Teri notebook, and this is all running on the Kubernetes environment within the container platform. Either the data scientists can come here and create their notebook, or their project admin can create the notebook for them.
All you'd have to do is come here to these notebook endpoints. The container platform will map the notebook service to a specific port, and you can just give this link to the data scientists. The link will bring them to their own Jupyter environment, and they can start doing all of their modeling just as I showed in that previous Jupyter environment. Next I want to show the training cluster. This is the training cluster that was created, to which I can attach my notebook to start utilizing those training resources. And then the last thing I want to show is the deployment cluster. Once that model has been saved, we have a model registry in which we can register the model into the platform, and then the last step is to create a deployment cluster. Here on my screen I have a deployment cluster called taxi deployment.
All of these serving endpoints have been configured for me, and most importantly, this model endpoint. The deployment cluster will actually wrap the trained model with a Flask wrapper and add a REST endpoint to it, so I can quickly operationalize my model by taking this endpoint and creating a curl command or even a POST request. So here I have my trusty Postman tool, in which I can format a POST request.
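In code rather than Postman, a request like that might look roughly like the following. The gateway URL and feature names are illustrative placeholders, not the actual endpoint surfaced by the platform in the demo.

```python
# Call the model's REST serving endpoint, the equivalent of the Postman request.
import requests

url = "https://<gateway-host>:<port>/taxi-deployment/predict"  # placeholder endpoint
payload = {
    "pickup_hour": 17,
    "pickup_longitude": -73.98, "pickup_latitude": 40.75,
    "dropoff_longitude": -73.95, "dropoff_latitude": 40.78,
    "passenger_count": 2,
}
resp = requests.post(url, json=payload, timeout=30)
print(resp.json())  # e.g. a predicted ride duration in seconds
```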
So I've taken that endpoint from the container platform, and I've formatted my body right here. These are some of the features that I want to send to that model: I want to know how long this specific taxi ride, at this location, at this time of day, would take. So I can go ahead and send that request, and quickly I will get an output of the ride:
the duration will take about 2,600 seconds. So pretty much we've walked through how a data scientist can quickly interact with their notebook and train their model. And then coming into the platform, we saw the project repository, we saw the source control, we can register the model within the platform, and then quickly we can operationalize that model with our deployment cluster and have our model up and running and available for inference. So that wraps up the demo. I'm going to pass it back to Doug and Matt and see if they want to come off mute and see if there are any questions. Matt, Doug, are you there? Okay.
>>Yeah. Hey, hey Terry, sorry. Just had some trouble getting off mute there. No, that was an excellent presentation. And I think there are generally some questions that come up when I talk to customers around how integrated into the Kubernetes ecosystem this capability is, and where Ezmeral stops and the open source technologies, like Kubeflow as an example, begin.
>>Yeah, sure, Matt. So this is kind of one layer up. We have our ML Ops tenant, and this is all running on a piece of a Kubernetes cluster. If I log back out and go into the site admin view, this is where you would see all the Kubernetes clusters being created, and it's all abstracted away from the data scientists. They don't have to know Kubernetes; they just interact with the platform if they want to. But here in the site admin view I have this Kubernetes dashboard, and on the left-hand side I have all my Kubernetes sections. So if I just add some compute hosts, whether they're VMs or cloud compute hosts, like EC2 hosts, we can have those resources abstracted away from us to then create a Kubernetes cluster. Moving on down, I have created this Kubernetes cluster utilizing those resources.
So if I go ahead and edit this cluster, you'll see that I have these hosts, and with a simple click-and-drop method I can move different hosts around to configure my Kubernetes cluster. Once my Kubernetes cluster is configured, I can then create a Kubernetes tenant, or in this case a namespace. Once I have the namespace available, I can go into that tenant, and as my user I don't actually see that it is running on Kubernetes. In addition, with our ML Ops tenants you have the ability to bootstrap Kubeflow. Kubeflow is an open source machine learning framework that runs on Kubernetes, and we have the ability to link that up as well. So, coming back to my ML Ops tenant, I can log in; what I showed is the Ezmeral Container Platform version of ML Ops, but you can see here we've also integrated Kubeflow. So a nod to HPE's contribution to utilizing open source, and it's all configured within our platform. So, hopefully...
>>Yeah, actually, Terry, can you hear me? It's Doug. So there were a couple of other questions that came in about Kubeflow. I wonder whether you could just comment on why we've chosen Kubeflow,
because I know there was a question about MLflow instead, and what the differences are between MLflow and Kubeflow.
>>Yeah, sure. So, just to reiterate, there are some questions about Kubeflow, and...
>>Yeah, so obviously one of the people watching saw the Kubeflow dashboard there, I guess, and couldn't help but get excited about it. But there was another question about MLflow versus Kubeflow and what the difference is between them.
>>Yeah. So Kubeflow is an open source framework that Google developed. It's a very powerful framework that comes with a lot of other unique tools on Kubernetes. With Kubeflow, you have the ability to launch other notebooks; you have the ability to utilize different Kubernetes operators, like the TensorFlow and PyTorch operators; and you can utilize some of the frameworks within Kubeflow to do training, like Kubeflow Pipelines, which visually let you see your training jobs within Kubeflow. It also has a plethora of different serving mechanisms, such as Seldon, for deploying your machine learning models; you have KFServing, you have TF Serving. So Kubeflow is a very powerful tool for data scientists to utilize if they want a full end-to-end open source stack and know how to use Kubernetes. It's just another way to do your machine learning model development. MLflow is actually a different piece of the machine learning pipeline: MLflow mainly focuses on model experimentation, comparing different models during training, and it can be used alongside Kubeflow.
>>They're complementary, Terry, I think is what you're saying. Sorry, I know we are dramatically running out of time now. That was a really fantastic demo. Thank you very much indeed.
>>Exactly. Thank you. So yeah, I think that wraps it up. One last thing I want to mention: there is a slide that I want to show in case you have any other questions. You can visit hpe.com/ezmeral, the Ezmeral Container Platform page, if you have any questions. And that wraps it up. So thank you guys.
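As a footnote to the MLflow question in the Q&A above: the experiment-tracking role described there looks roughly like the sketch below. The run names, parameters, and metric values are placeholders, included only to illustrate why MLflow's tracking is complementary to Kubeflow's orchestration and serving pieces.

```python
# Minimal MLflow experiment-tracking sketch: log one run per candidate model
# so the runs can be compared side by side in the MLflow UI.
import mlflow

for max_depth in (4, 6, 8):
    with mlflow.start_run(run_name=f"xgb-depth-{max_depth}"):
        mlflow.log_param("max_depth", max_depth)
        # ... train a model here with this setting, then log how it did ...
        mlflow.log_metric("mae_seconds", 300.0)  # placeholder metric value
```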

Published Date : Mar 17 2021


Matthias Funke, IBM | IBM Data and AI Forum


 

>>Live from Miami, Florida, it's theCUBE, covering IBM's Data and AI Forum, brought to you by IBM.
>>We're back in Miami. You're watching theCUBE, the leader in live tech coverage, and we're covering the IBM Data and AI Forum in the port of Miami. Matthias Funke is here; he's the director of offering management for hybrid data management, everything data. Matthias, it's great to see you, it's great to have you.
>>Great to be here with you.
>>We're going to talk databases, we're going to talk data warehouses, everything data. The database market 10 or 12 years ago was kind of boring, right? And now data's everywhere and databases are exploding. What's your point of view on what's going on in the marketplace?
>>You know, it's funny you call it boring, because I think it's the boring stuff that really matters nowadays: getting people to value with the solutions they want to build, or the modernization they're seeking to do on their data estates, and the challenge they have in embracing multi-cloud data architectures. To get there, you have to take care of the boring stuff.
>>How real is multi-cloud? I mean, I know multi-cloud is real in that everybody has multiple clouds. But is multi-cloud a strategy, or is it a sort of symptom of multi-vendor, and we just could have ended up here with shadow IT and everything else?
>>I think it's a reality, and yes, it should be a strategy, but more clients than not find themselves exposed to this as a reality, with different lines of business acquiring data estates running in different locations, on different clouds. And then companies have a challenge if they want to bring it all together and actually unlock the value of that data and make it available for analytics or AI solutions.
>>You've got to have a strategy. So IBM is one of the few companies that has both a cloud and an aggressive multi-cloud strategy. Amazon's got Outposts a little bit here, and Microsoft I guess has some stuff, and Oracle has got a little bit here, but generally speaking, IBM has both. You'd love people to come into your cloud, but you recognize not everybody's going to come into your cloud, so you have an aggressive multi-cloud strategy. Why is that? What's the underpinning of that strategy? Is it openness? Is it just market, total available market? Why?
>>So first of all, yes, we have a strong portfolio on IBM Cloud, and we think it's the best in terms of integration with other cloud services and the performance you get on the different data services.
>>But we also have a strategy that says we want to be where our clients want to go, and many clients might have committed already, on a strategic level, to a different cloud, whether that's AWS, IBM Cloud, or Azure. So we want to be there as clients want to go, and our commitment is to offer them a complete portfolio of data services that support different workloads: a complete portfolio in terms of IBM-developed technologies as well as open source technologies, giving clients choice, but then making them available across that universe of multi-cloud, hybrid cloud, and on premises, in a way that they get a consistent experience. And, you know, you're familiar with the term divide and conquer, right? I like to talk about it as unify to conquer.
>>So our mission is really a unified experience, and unified access to the different capabilities available across multi-cloud architectures.
>>So is that really the brand promise, to unify across clouds?
>>Absolutely, that's our mission.
>>And what's the experience like today, and what is the optimal outcome that you guys are looking for? Being able to run any database on any cloud, anywhere? Describe that.
>>So I think we're talking about chapter one and chapter two of the cloud, right? Chapter one, in my view, was very much about attracting people to the cloud by offering them a set of managed services that take away the management headaches and the infrastructure management aspects. But when you think about chapter two, when you think about how to run mission-critical workloads on a cloud or on premises, you want the ability to integrate data estates that run in different environments, and we think that OpenShift is leveling the playing field by avoiding location lock-in, by giving clients the ability to basically abstract away from proprietary cloud infrastructure services and mechanisms.
>>And that gives them freedom of action. They can deploy a certain workload in one place and then decide six months later that they are better off moving that workload somewhere else.
>>Yes. So OpenShift is the linchpin of that cross-cloud integration, is that right?
>>Correct. And with the advent and the rise of the operator, I think you see the industry closing the gap between the value proposition of a fully managed service and what a client-managed, OpenShift-based environment can deliver in terms of automation, simplicity, and overall value.
>>Let's talk about the database market and what's happening. You've got transactional databases, you've got analytic databases, you've got legacy data warehouses, you've got new, emerging databases that are handling unstructured data, you've got NoSQL and not-only-SQL. Lay out the landscape: what's IBM's strategy in the database space?
>>So our strategy, starting with the Db2 family: about two years ago we introduced something called the common SQL engine. That gives you a consistent experience, from an application and user perspective, in the way you consume data for different workload types, whether that's transactional data, analytical use cases, big data, or fast data solutions and event-driven data architectures; everything with a consistent experience from a management perspective and from a working-behavior perspective, in the way you interact with it as an application. And not only that, but we also make that available on premises, in the cloud fully managed, or now OpenShift-based on any cloud. So I would say our commitment right now is very much focused on leveraging OpenShift, leveraging Cloud Pak for Data as a platform, to deliver all these capabilities, Db2 and open source, in a unified and consistent way, with a unified and consistent experience on anybody's cloud, just like when we first announced it, you know, six months ago.
And I think now for us it's doing the same with data, making it easy for people to access data wherever it resides.
>>Matthias, what's IBM's point of view on the degree of integration that you have to have in that stack, from hardware and software? Some people would argue, well, you have to have the same control plane, same data plane, same hardware, same software, same database on prem as you have in the cloud. What are your thoughts on the degree of homogeneity that's required to succeed?
>>So I think it's certainly something that companies strive for, to simplify their data architectures: unify, consolidate, reduce the number of data sources that you have to deal with. But the reality is that the average enterprise client has 168 different data services they have to incorporate, right? So to me it's a moving target, and while you want to consolidate, you will never fully get there. So our approach is that we want to give the client choice, the best choice in terms of technologies, potentially for the same workload type, whether it's Postgres for transactional workloads or Db2 for transactional workloads, whatever fits the bill, right? And then, at the same time, abstract or unify on top of that. When you think about operators and OpenShift, for instance, we invest in operators leveraging a consistent framework that basically provides a homogeneous set of interfaces by which people can deploy and lifecycle-manage a Postgres instance or a Db2 instance.
So you need only one skill set to manage all these different data services, and it reduces total cost of ownership, it provides more agility, and it accelerates time to value for the client.
>>So you're saying that IBM's strategy recognizes the heterogeneity within the client base, right? Even though you might have a box somewhere in the portfolio, you're not taking a "you need this box only" strategy, the God box, where this is the hammer and every opportunity is a nail.
>>Yeah, we're way beyond that. We are much more open in the way we embrace open source, and we bring open source technologies to our enterprise clients, and we invest in the integration of these different technologies so the value of those can be actuated in a much more straightforward fashion. Think about Cloud Pak for Data and the ability to access data and insights in different sources, different repositories, IBM ones and third party, but then make that data accessible through data virtualization, with full governance, applying governance to the data so that data scientists can actually get to that data for their work. That is really important.
>>Can you argue that that's less lock-in than, say, the God-box approach or the cloud-only approach?
>>Yeah, absolutely.
>>How so?
>>Well, because we give you choice to begin with, right? And it's not only choice in terms of the data services and the different technologies that are available, but also in terms of the location where you deploy these data services and how you run them.
>>Okay. So to me it's all about exit strategies. If I go down a path and the path doesn't work for me, how do I get out?
>>Exactly.
>>Is that a concern of customers in terms of risk management?
>>Yeah.
I think, look, every customer out there, I daresay, has a data strategy, and every customer needs to make some decisions. But there's only so much information you have today to make that decision, and as you learn more, your decision might change six months down the road. How to preserve that agility as a business, to do course corrections, I think is really important.
>>So, okay, a hypothetical, and this happens every day: you've got a big portfolio company, they've done a lot of M&A, they've got ten different databases that they're running, they've got different clouds that they're using, they've got different development teams using different tooling, certainly different physical infrastructure, and they really haven't had a strategy to bring it all together.
You're hired as the data architect or the CTO of the company, and, Matthias, the CEO says, fix this problem; we're not taking advantage of and leveraging our data. Where do you start?
>>So of course, being IBM, I would recommend starting with Cloud Pak for Data, as the number one data platform out there, because eventually every company will want to capitalize on the value that the data represents. It's not just about a data layer, it's not just about a database; it's about an integrated solution stack that gets people to do analytics over the data and derive insights from the data. That's number one. But even if it's not the IBM stack, right, I would always recommend the client think about a strategy that allows for the flexibility to change course and move workloads from one location to another, or move data from one technology stack to another.
And I think that that kind of agility and flexibility translates into risk mitigation strategies that every client should think about.
>>So Cloud Pak for Data: okay, let's start there. I'm going to install that, or I'm going to access it in the cloud. And then what do I have to do as a customer to take advantage of that? Do I just have to point it at my data stores? What are the prerequisites?
>>Yeah. So let's say you deploy that on IBM Cloud, right? Then you usually are invested already, so you have large data estates either residing on premises or already in the cloud. You can pull those datasets in remotely, without really moving the workloads or the datasets into a Cloud Pak for Data managed environment, by using technologies like data virtualization, right?
Or using technologies like DataStage and ETL capabilities to access the data. But you can also, as you modernize and build your next-generation application, do that within that managed environment with OpenShift. And that's what most people want to do: they want to do a digital transformation, they want to modernize the workloads, but they want to leverage the existing investments they have been making over the last decade.
>>Okay. But there's a discovery phase, right, where you bring in Cloud Pak for Data to say, okay, what do I have? Yup, go find it.
And then it's bringing in the necessary tooling on the development side, with things like OpenShift, and then what, it magically virtualizes my data?
>>So just on that point, I think what matters much more going forward for clients is how they can incorporate different datasets, whether they sit in open source technologies, in Db2, or in a third-party vendor's stack that I don't want to mention right now. But what matters more is: how do I make data accessible? How do I discover a dataset in a way that I can automatically generate metadata, so I have a business glossary, I have metadata, and I understand how the various datasets relate to the business objectives and technology objectives? To be able to do that, Watson Knowledge Catalog, which is part of Cloud Pak for Data, is a core component that helps you with exactly that: auto-discovery and metadata generation, basically cataloging datasets in a way that they are now visible to the data scientists and the ultimate end users. What really matters, and I think what our vision is overall, is the ability to serve the ultimate end user, the developer, the data scientist, the business analyst, so that they can get their job done without depending on IT.
>>Yeah, so that metadata catalog is part of the secret sauce that allows the system to know what data lives where, how to get to it, and how to join it.
>>It's one of the core elements of that integrated platform and solution stack. What I think is really key here is the effort we spend in integrating these different components, so that it looks seamless, it happens in an automated fashion as much as possible, and it delivers on that promise of a self-service experience for the person who sits at the very end of that chain.
>>Right. Matthias, thanks so much for explaining that and for coming on theCUBE.
>>Great to meet you.
>>All right, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching theCUBE from the IBM Data and AI Forum in Miami. We'll be right back.

Published Date : Oct 22 2019
