
Jean-Pierre Dijcks, Oracle - On the Ground - #theCUBE


 

>> Narrator: theCUBE presents On the Ground. (techno music) >> Hi, I'm Peter Burris. Welcome to an On the Ground here at Oracle headquarters, with SiliconANGLE Media's theCUBE. Today we're talking to JP Dijcks, who is one of the master product managers inside Oracle's big data product group. Welcome, JP. >> Thank you, Peter. >> Well, we're going to talk about how developers get access to this plethora, this miasma, this unbelievable complexity of data that's being made possible by IoT, traditional applications, and other sources. How are developers going to get access to this data? >> That's a good question, Peter. I still think that one of the key aspects to getting access to that data is SQL, and so one of the things we are driving is trying to figure out: can we get the Oracle SQL engine, and all the richness of SQL analytics, enabled on all of that data, no matter what the format is or where it lives? And then obviously we've all seen the shift in APIs and languages — people don't necessarily always want to speak SQL and write SQL queries. So how do we then enable things like R, how do we enable Perl, how do we enable Python, all sorts of things like that? And so the thought we had was: can we use SQL as the common metadata interface, and the common structure around some of this, and enable all of these languages on top of that through the database? So that's kind of the baseline of what we're thinking of for enabling this to developers and large communities of users. >> So that's SQL as an access method. Do you also envision that SQL will be a data creation language, as we think about how to envision big data coming together from a modeling perspective?
>> So I think from a modeling perspective, the metadata part we certainly look at as a creation — or definition language is probably the better word. How do I do structured queries, 'cause that's what SQL stands for, how do I do that on JSON documents, how do I do that on IoT data, as you said? How do I get that done? And so we certainly want to create the metadata in a very traditional database catalog, or if you compare it to a Hive catalog, very much like that. The execution is very different: it uses the mechanisms under the covers that NoSQL databases have, or that Hadoop HDFS offers, and we certainly have no real interest in doing insert into Hadoop, 'cause the transaction mechanisms work very, very differently. So it's really focused on the metadata areas: how do I expose that, how do I classify and categorize that data in ways people know and have seen for years? >> So the data manipulation will be handled by native tools, and some of the creation, some of the generation, some of the modeling will be handled now inside SQL — and there are a lot of SQL folks out there that have a pretty good affinity for how to work with data. >> That's absolutely correct. >> So that's what it is; now how does it work? Tell us a bit about how this Big Data SQL is going to work in a practical world. >> Okay. So we talked about the modeling already. The first step is that we extend the Oracle database and the catalog to understand things like Hive objects or HDFS — kind of, where does stuff live. So we expanded the catalog, and we found a way to classify the metadata first and foremost.
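The catalog extension JP describes — the database keeping metadata about where external data lives, without owning the data itself — can be sketched as a toy structure. This is an illustration only, not Oracle's actual catalog; all the names and types here are invented:

```python
# Toy sketch of a catalog that records where external data lives,
# in the spirit of extending a database catalog with Hive/HDFS objects.
# Illustrative only -- not Oracle's implementation.

from dataclasses import dataclass, field

@dataclass
class ExternalTable:
    name: str                  # logical table name exposed to SQL
    source: str                # e.g. "hive", "hdfs", "nosql"
    location: str              # where the data actually lives
    columns: dict = field(default_factory=dict)  # column -> declared type

class Catalog:
    def __init__(self):
        self._tables = {}

    def register(self, table: ExternalTable):
        # Classify the metadata first and foremost: the catalog stores
        # structure and location only, never the rows themselves.
        self._tables[table.name] = table

    def describe(self, name: str) -> ExternalTable:
        return self._tables[name]

catalog = Catalog()
catalog.register(ExternalTable(
    name="sensor_readings",
    source="hdfs",
    location="hdfs://cluster/iot/readings/",
    columns={"device_id": "VARCHAR2", "reading": "NUMBER", "ts": "TIMESTAMP"},
))

print(catalog.describe("sensor_readings").source)  # -> hdfs
```

The point of the sketch is the separation JP draws: execution stays native to each store, while the catalog gives every engine one well-known place to look up what the data is and where it lives.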
The real magic is leveraging the Hadoop stack. So you ask a BI question and you want to join data in Oracle — transactions, finance information, let's say — with IoT data, which you'd reach out to HDFS for. Big Data SQL runs on the Hadoop nodes, so it's local processing of that data, and it works exactly as HDFS and Hadoop work. In other words, I'm going to do the processing locally: I'm going to ask the NameNode which blocks am I supposed to read, we generate that query, and we push it down to the Hadoop nodes. And that's when some of the magic of SQL kicks in, which is really focused on performance — it's performance, performance, performance. That's always the problem with federated data: how do I get it to perform across the board? And so what we took was — >> Predictably. >> Predictably, that's an interesting one — predictable performance, 'cause sometimes it works, sometimes it doesn't. So what we did is we took the Exadata storage server software, with all the magic of how do I get performance out of a file system, out of IO, and we put that on the Hadoop nodes, and then we push the queries all the way down to that software. It does filtering, it does predicate pushdown, it leverages formats like Parquet and ORC on the HDFS side, and at the end of the day it takes the IO request, which is what a SQL query produces, feeds it to the Hadoop nodes, runs it locally, and then sends the results back to the database. And so we filter out a lot of the gunk we don't need, 'cause you said, oh, I only need yesterday's data, or whatever the predicates are. And that's how we think we can get an architecture that allows global optimization, 'cause we can see the entire ecosystem in its totality — IoT, Oracle, all of it combined. We optimize the queries and push everything down as far as we can — algorithms to data, not data to algorithms — and that's how we're going to get this predictable performance on all of these pieces of data.
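The "algorithms to data" idea — push the predicate to where the data lives, ship back only matches — can be simulated in a few lines. A toy sketch, not the real system (which compiles predicates into storage-side software); the data and node layout are invented:

```python
# Toy simulation of predicate pushdown: each "node" filters its local
# block of rows and ships back only the matches, so the coordinator
# never sees the filtered-out gunk. Illustrative only.

def node_scan(rows, predicate):
    # Runs ON the node holding the data: algorithms to data.
    return [r for r in rows if predicate(r)]

def run_query(blocks, predicate):
    # Coordinator: ask each node to scan locally, then merge results.
    results = []
    for block in blocks:
        results.extend(node_scan(block, predicate))
    return results

# Three "Hadoop nodes", each holding a block of IoT readings.
blocks = [
    [{"day": "2016-09-05", "reading": 10}, {"day": "2016-09-06", "reading": 12}],
    [{"day": "2016-09-06", "reading": 7}],
    [{"day": "2016-09-04", "reading": 99}],
]

# "I only need yesterday's data" becomes a predicate pushed to every node.
yesterday = run_query(blocks, lambda r: r["day"] == "2016-09-06")
print(len(yesterday))  # -> 2
```

The design choice being illustrated: the small thing (the predicate) travels to the large thing (the data blocks), and only the filtered result travels back — which is exactly why federated performance becomes predictable rather than dominated by data movement.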
>> So we end up with — if I've got this right, let me recap — we've got this notion that for data creation, data modeling, we can now use SQL, understood by a lot of people. It doesn't preclude us from using native tools, but at least that's one place where we can see how it all comes together, and we continue to use local tools for the actual manipulation elements. >> Absolutely. >> We are now using Exadata-like structures so we can push the algorithm down to the data — so we're moving a small amount of data to a large amount of data, 'cause that cuts cost down and improves predictability — but at the same time we've got metadata objects that allow us to anticipate, with some degree of predictability, how this whole thing will run and how it will come together back at the database. Got that right? >> Got that right. >> Alright, so, the next question is: what's the impact of doing it this way? Talk a bit, if you can, about how it's helping folks who run data, who build applications, and who are actually trying to get business value out of this whole process. >> So if we start with the business value, I think the biggest thing we bring to the table is simplicity and standardization. If I have to understand how is this object represented in NoSQL, how in HDFS, how did somebody put a JSON file in here, I have to now spend time literally digging through that — and then does it conform, do I have to modify it, what do I do? So I think the business value comes out of the SQL layer on top of it: it all looks exactly the same. It's well known, it's well understood, and it's far quicker to get from "I've got a bunch of data" to actually building a BI report, building a dashboard, building KPIs, and integrating that data. There's nothing new to learn; it's a level of abstraction we put on top of this, whether you use an API or, in this case, SQL, 'cause that's the most common analytics language. So that's one part of how it will impact things.
The second is — and I think that's where the architecture is completely unique — we keep complete control of the query execution, from the metadata we just talked about, and that enables us to do global optimization. And if you think this through a little bit and go, oh, global optimization sounds really cool, what does that mean? I can now actually start pushing processing around, I can move data, and it's what we've done in the Exadata platform for years: data lives on disk — oh, Peter likes to query it very frequently, let's move it up to flash, let's move it up to in-memory, let's twist the data around. So all of a sudden we've got control: we understand what gets queried, we understand where data lives, and we can start to optimize exactly for the usage pattern the customer has, and that's always the performance aspect. And that goes to the old saying of: how can I get data to a customer as quickly as possible when they really need it? That's what this does. How can I optimize this? I've got thousands of people querying certain elements — move them up in the stack, get the performance, and all these queries come back in, like, seconds. Regulatory stuff that needs to go through, like, five years of data — let's put it in cheap areas and optimize that. And so the impact is cheaper and faster at the end of the day, and all because there's a singular entity, almost, that governs the data, governs the queries, governs the usage patterns. That's what we uniquely bring to the table with this architecture. >> So I want to build on the notion of governance, because actually one of the interesting things you said was the idea that if it's all under a common sort of interface, then you have greater visibility: where the data is, who owns it, et cetera. One of the biggest challenges that businesses are having is a global sense of how you govern your data. If you do this right, are you that much closer to having a competent overall data governance?
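The "Peter queries it frequently, move it up to flash" idea is a heat-based placement policy. A toy sketch of one, purely for illustration — the tier names, thresholds, and promotion rule here are invented, not how Exadata or Big Data SQL actually decides placement:

```python
# Toy heat-based tiering: objects queried often get promoted up the
# storage stack. Thresholds and tier names are invented for illustration.

TIERS = ["disk", "flash", "memory"]  # cold -> hot

class Tierer:
    def __init__(self, promote_after=3):
        self.heat = {}             # object -> query count
        self.tier = {}             # object -> current tier index
        self.promote_after = promote_after

    def record_query(self, obj):
        self.heat[obj] = self.heat.get(obj, 0) + 1
        self.tier.setdefault(obj, 0)
        # Promote one tier per `promote_after` queries, capped at hottest.
        level = min(self.heat[obj] // self.promote_after, len(TIERS) - 1)
        self.tier[obj] = max(self.tier[obj], level)

    def placement(self, obj):
        # Never-queried objects stay in the cheapest tier.
        return TIERS[self.tier.get(obj, 0)]

t = Tierer()
for _ in range(3):
    t.record_query("finance_q3")
print(t.placement("finance_q3"))    # -> flash
print(t.placement("archive_2011"))  # -> disk
```

This is the "cheaper and faster" trade in miniature: hot objects earn expensive placement, and cold regulatory data defaults to cheap storage — possible only because one entity sees all the usage patterns.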
>> I think we were able to take a big step forward on it, and it sounds very simple, but we now have a central catalog that actually understands what your data is and where it lives, in kind of a well-known way. And again, it sounds very simple, but if you look at silos, that's the biggest problem: you have multiple silos, multiple things in there, and nobody really knows what's in there. So here we start to publish this in, like, a common structural layer. We have all the technical metadata, we track who queries what, who does all those things — so that's a tremendous help in governance. The other side, of course, because we still use native tools to, let's say, manipulate some data, or augment or add new data: we are now going to tie a lot of the metadata that comes from, say, the Hadoop ecosystem back into this catalog. And we're probably not there yet today on end-to-end governance where everything kind of works out of the box — >> And probably never will be. >> And we probably never will, you're right. But I think we took a major step forward with just consolidating it and exposing people to all the data they have, and you can still run all the other tools, like: crawl my data and flag anything that says SSN or looks like a social security number. All of those tools are still relevant; we just have a consolidated view, and that dramatically improves governance. >> So I'm going to throw you a curveball. >> Sure. >> Not all the data I want to use is inside my business or being generated by sensors that I control. How do Big Data SQL and related technologies play a role in the actual contracting for additional data sources, and in sustaining those relationships that are very, very fundamental to how data's shared across organizations? Do you see this information being brought in under this umbrella? Do you see Oracle facilitating those types of relationships, introducing standards so that data sharing across partnerships becomes even easier?
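The "crawl my data and flag anything that looks like a social security number" step JP mentions is a pattern-based classification pass. A minimal sketch, assuming the common `NNN-NN-NNNN` format; a real scanner would handle many more formats and validation rules:

```python
# Toy version of "crawl my data and flag anything that looks like a
# social security number" -- a pattern-based classification pass over
# records, for illustration only.

import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def flag_ssn_fields(records):
    """Return (record_index, field) pairs whose value looks like an SSN."""
    flagged = []
    for i, record in enumerate(records):
        for field, value in record.items():
            if isinstance(value, str) and SSN_PATTERN.search(value):
                flagged.append((i, field))
    return flagged

records = [
    {"name": "A. Customer", "note": "SSN 123-45-6789 on file"},
    {"name": "B. Customer", "note": "prefers email"},
]
print(flag_ssn_fields(records))  # -> [(0, 'note')]
```

The value of the consolidated catalog is that a pass like this runs once over one view of the data, instead of being re-implemented per silo.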
>> I'm not convinced that Big Data SQL as a technology is going to solve all the problems we see there. I'm absolutely convinced that Oracle is going to work towards that: you see it in so many acquisitions we've done, you see it in the efforts of making data as a service available to people, and to some extent Big Data SQL will be a foundation layer to make BI queries run smoother across more and more pillars of data. If we can integrate database, Hadoop, and NoSQL, there's nothing that says: oh, and by the way, storage cloud. >> And if we have relatively common physical governance — I have the same physical governance, and you have the same physical governance — now it's easier for us to introduce governance across our instances. >> Absolutely. And today we focus a lot on HDFS or Hadoop as the next data pillar; storage cloud, ground to cloud, all of those are on the roadmap for Big Data SQL to catch up with. And so if you have data as a service — let's declare that cloud for a second — and I have data in my database and in my Hadoop cluster, again, it all now becomes part of the same ecosystem of data, and it all looks the same to me from a BI query perspective, from an analytics perspective. And then the how-do-I-get-the-data-sharing-standards-set-up part: a lot of that is driving it into cloud and making it all as a service, 'cause again, you put a level of abstraction on top of it that makes it easier to consume, to understand where it came from, and to capture the metadata. >> So, JP, one last question. >> Sure. >> Oracle OpenWorld is on the horizon. What are you looking for, or what will your customers be looking for, as it pertains to Big Data SQL and related technologies?
>> I think specifically from a Big Data SQL perspective, we're going to drive the possible adoption scope much, much further. Today we work with HDFS and we work with the Oracle database; we're going to announce things like Exadata support, Hadoop will be supported, we'll add SuperCluster support — we're going to dramatically expand the footprint Big Data SQL will run on. People who come to the Big Data SQL or analytics sessions will see a roadmap looking much further forward. I already mentioned some things like ground to cloud: how can I run Big Data SQL when my Exadata is on-premises and the rest of my HDFS data is in the cloud? We're going to be talking about how we're going to do that, and what we think the evolution of Big Data SQL is going to be. I think that's going to be a very fun session to go to. >> JP Dijcks, a master product manager inside the Oracle big data product group, thank you very much for joining us here On the Ground, at Oracle headquarters. This is theCUBE.

Published Date : Sep 6 2016



Archana Venkatraman, IDC | Actifio Data Driven 2019


 

>> Narrator: From Boston, Massachusetts, it's theCUBE, covering Actifio 2019 Data Driven. Brought to you by Actifio. >> Hi, we're right outside of Boston Harbor. You're watching theCUBE. I'm Stu Miniman, and this is Actifio Data Driven 2019, day two. Two days digging into, you know, the role of data inside companies in an ever-changing world. Happy to welcome to the program first-time guest Archana Venkatraman, who's a research manager at IDC, coming to us from across the pond in London. Thanks so much for joining us. >> Pleasure. >> So tell us a little bit — IDC we know well, you know, the market landscapes, watching what's happening. The 77 zettabytes that was put up in the keynote came from IDC. Tell us your focus. >> Yeah, so I'm part of the data protection and storage research team, but I have a European focus. I cover the Western European markets, where data protection is almost of a neurotic interest to us. So a lot of our research is actually done in the context of data protection: how do I become data driven without compromising on security and sovereignty and data locality? So that's something that I look at. I'm also part of our broader multicloud infrastructure team and our DevOps practice, so I'm looking at all these modern new trends from a data perspective as well. So it's kind of nice, being — >> Keeping you busy, huh? So about a year ago, every show that I went to, there would be a big clock up on the keynote stage counting down until GDPR went live. We actually said on theCUBE many times, we'll know when GDPR starts with the lawsuits, and I feel like it was a couple of days, if not a couple of weeks, before some of the big tech firms got sued. So here we are in 2019; it's been a while now since this launch. How important is GDPR, how is it impacting customers, and what's the ripple effect? Because, you know, here in the States we're seeing some laws in California and beyond that are following it — a pushback against the "oh hey, we're just gonna have all the data in the world, and we'll store it somewhere, and sure, we'll protect it and keep it secure."

>> Yeah, so GDPR is a game changer, and it's interesting you said this big clock was ticking and everybody had been talking about it. So when the European Commission announced GDPR was coming, organizations had about two years to actually prepare for it. But there were a lot of naysayers, and they thought: this is not gonna happen, the regulators don't have enough resources to actually go after all of these data breaches, it's just too complicated, not everyone's going to comply, it's just not gonna happen. But then they realized that the regulators were sticking to it, and towards the end — the last six months in the race to GDPR — there was this helter-skelter rush. Organizations were trying to do some last-minute patch-up exercise to have that minimum viable compliance: they wanted to make sure they didn't go out of business and didn't have any major data breaches when GDPR came. That was the story of 2018 — although they had so much time to react, they didn't until the end, when they did a lot of patch-up work to get that minimum viable compliance. But over time, what we're seeing is that a lot of astute organizations are actually using GDPR to create competitive differentiation. If you look at companies like Barclays, they have been so much on top of that game, and they include it in their marketing strategies and their corporate social responsibility, to say: hey, you know, our business is important to us, but your privacy and your data is much more valuable to us. And that kind of instantly helps them build that trust. They have baked GDPR compliance into their operations so much, and so well, that they can actually sell GDPR consultancy services, because they're so good at it. And that's what we're seeing happening in 2019, and probably the next 12 to 18 months will be about scaling and operationalizing GDPR, moving on from that minimum viable compliance. >> It's interesting — we had a conversation with Holly St. Clair, who's with the state of Massachusetts, and in our keynote this morning she talked about being a data minimalist: I only want as much data as I know what I'm going to do with, how I'm going to leverage it — kind of that pendulum swing back from "I'm going to pour in all the data and think about it later." Did you see that as a trend? Is that just governments, or are you seeing that throughout industries?

>> Interesting. So when GDPR came into existence, there were a lot of workshops happening for organizations on how to become GDPR-compliant. And there was this Danish public-sector organization where one of the employees went to a workshop, came back all charged up, and said to his employer: hey, can you forget me? It took that organization about 14 employees and three months to forget one person. That's the amount of data they were holding, and they were not deleting any of it, and all the processes were manual, which is why it took them so long to actually forget one person. So if you don't cleanse your data and act now, meeting all these right-to-be-forgotten and other specific clauses within GDPR is going to be too difficult, and it's going to just eat up your business. >> Trying to connect the dots here: one of the big stumbling blocks, if you look at data protection — if I've got backup, if I've got archive, I mean, if I've taken a snapshot of something and stuck it under a mountain on a giant tape, and they say "forget about me" — oh my gosh, do I have to go retrieve that? Do I have to manage that? The cost could be quite onerous. Help us connect the dots as to what that means — what are the ramifications of this regulation?

>> Yeah. So I think GDPR is a beast — it's a dragon of regulations — and it's important to dice it up to understand what the initial requirements are. The first step is to get visibility and classify the data as to what is personal data. You don't want to apply policies to all the data, because there might be some garbage in there, so you need to get visibility into, and classify, what is personal data. Once you know what data is personal and what you want to retain, that's when you start applying policies to ensure it is safe — anonymized, pseudonymized if you want to do analytics at a later stage — and then you think about how you meet the individual clauses. So that's the GDPR framework: you start by classifying data, then you apply specific policies to ensure you protect and back up the personal data, and then you go about meeting the specific requirements. >> What else can you tell us about the European markets? You know, when I look at the cloud space, governance is something very specific to it — I need to make sure my data doesn't leave the borders — so what other trends and issues are you seeing? >> GDPR posed almost an existential threat to a lot of companies — like, say, the hyperscalers and SaaS vendors — so they were the first ones to actually become completely compliant, to understand the regulations, and to have European data hubs and local data centers. I think at that time Microsoft had a good collaboration with T-Systems to have a local data center controlled not by Microsoft but by what is purely a German organization — you cannot have data locality more than that, right? So they were trying different innovative ways to build confidence among enterprises and make sure cloud adoption continued. And what was interesting, what came out of our research, was this: we thought GDPR meant people's confidence in public cloud was going to plunge. That didn't happen. 42% of organizations were still going ahead with their cloud strategies as is; it's just that they were going to be a lot more cautious. They wanted to make sure that the applications and data they were putting in the cloud were something they had complete visibility into, that didn't have too much personal data, and that, even if it did, they had complete control over. So they had a different strategy for approaching public cloud, but it didn't slow them down. Over time they realized that to get that control of IT and of data, they needed a multicloud strategy, because cloud had to become a two-way street — they needed an exit strategy as well. So they tried to adopt multiple cloud technologies and have data interoperability as well, because data management was one of their top-of-mind priorities.

>> Okay, last question I had for you: we're here at the Actifio event. What do you hear from your customers about Actifio? Any research that you have relevant to what they're doing? >> It's interesting. So, copy data management — that's how Actifio started, right? They created a market for themselves in copy data management, and we classified copy data management within the replication market. Replication is quite a slow market, but copy data management is a big issue, and it's one of the fastest-growing segments. So they started off from a good base, they created a market for themselves, people started noticing them, and now they have grown further, and grown beyond that, to try to cover the entire data-management space. And I think what's going to be interesting is how they keep up the momentum in building that infrastructure ecosystem and platform ecosystem. Because companies are moving from protecting data centers to protecting centers of data, and if Actifio can help organizations protect multiple centers of data through a single pane of glass — a platform approach to data management — then they can help organizations become data driven, which gives them the competitive advantage. So if they can keep up that momentum, they're going great guns. >> Thank you so much for joining us and sharing the data that you have and the customer viewpoints from Europe. We'll be back with more coverage here from Actifio Data Driven 2019 in Boston, Massachusetts. I'm Stu Miniman. Thanks for watching theCUBE.

Published Date : Jun 18 2019

