Jean-Pierre Dijcks, Oracle - On the Ground - #theCUBE


 

>> Narrator: theCUBE presents On the Ground. (techno music)
>> Hi, I'm Peter Burris. Welcome to an On the Ground here at Oracle headquarters with SiliconANGLE Media's theCUBE. Today we're talking to JP Dijcks, who is one of the master product managers inside Oracle's big data product group. Welcome, JP.
>> Thank you, Peter.
>> Well, we're going to talk about how developers get access to this plethora, this miasma, this unbelievable complexity of data that's being made possible by IoT, traditional applications, and other sources. How are developers going to get access to this data?
>> That's a good question, Peter. I still think that one of the key aspects to getting access to that data is SQL, so one of the things we are driving is figuring out: can we get the Oracle SQL engine, and all the richness of SQL analytics, enabled on all of that data, no matter what the format is or where it lives? And then obviously we've all seen the shift in APIs and languages; people don't necessarily always want to speak SQL and write SQL queries. So how do we then enable things like R, how do we enable Python, all sorts of things like that? And the thought we had was: can we use SQL as the common metadata interface, and the common structure around some of this, and enable all of these languages on top of that through the database? So that's the baseline of what we're thinking of for enabling this to developers and large communities of users.
>> So that's SQL as an access method. Do you also envision that SQL will also be a data creation language, as we think about how to envision big data coming together from a modeling perspective?
>> So I think from a modeling perspective, the metadata part we certainly look at as a creation language, or a definition language is probably the better word. How do I do structured queries, 'cause that's what SQL stands for, how do I do that on JSON documents, how do I do that on IoT data, as you said, how do I get that done? And so we certainly want to create the metadata in a very traditional database catalog, or, if you compare it to a Hive catalog, very much like that. The execution is very different: it uses the mechanisms under the covers that NoSQL databases have, or that Hadoop and HDFS offer, and we certainly have no real interest in doing an insert into Hadoop, 'cause the transaction mechanisms work very, very differently. So it's really focused on the metadata areas: how do I expose that, how do I classify and categorize that data in ways people know and have seen for years?
>> So the data manipulation will be handled by native tools, and some of the creation, some of the generation, some of the modeling will now be handled inside SQL, and there are a lot of SQL folks out there that have a pretty good affinity for how to work with data.
>> That's absolutely correct.
>> So that's what it is; now how does it work? Tell us a bit about how this Big Data SQL is going to work in the practical world.
>> Okay. So we talked about the modeling already. The first step is that we extend the Oracle database and the catalog to understand things like Hive objects or HDFS, kind of where stuff lives. So we expanded the catalog, and we found a way to classify the metadata first and foremost.
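The metadata-definition idea JP describes, publishing a Hive table into the Oracle catalog with no transactional insert path, can be sketched with Big Data SQL's external-table DDL. The table, column, and directory names below are hypothetical, and exact access parameters vary by release; the general shape uses the ORACLE_HIVE access driver:

```sql
-- Hypothetical sketch: expose an existing Hive table (default.sensor_readings)
-- through the Oracle catalog. Only metadata is created here; the data stays
-- in HDFS and is read in place (there is no INSERT path into Hadoop).
CREATE TABLE sensor_readings (
  device_id   VARCHAR2(64),
  reading_ts  TIMESTAMP,
  temperature NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY default_dir
  ACCESS PARAMETERS (
    com.oracle.bigdata.tablename = default.sensor_readings
  )
)
REJECT LIMIT UNLIMITED;
```

Once defined, the table can be queried like any other Oracle table, which is the "classify and categorize that data in ways people have seen for years" point.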
The real magic is leveraging the Hadoop stack. So you ask a BI question and you want to join data in Oracle, transactions, finance information, let's say, with IoT data, which you'd reach out to HDFS for. Big Data SQL runs on the Hadoop nodes, so it's local processing of that data, and it works exactly as HDFS and Hadoop work. In other words, I'm going to do the processing locally, I'm going to ask the NameNode which blocks am I supposed to read, and that'll get run: we generate that query and we push it down to the Hadoop nodes. And that's when some of the magic of SQL kicks in, which is really focused on performance. It's performance, performance, performance; that's always the problem with federated data, how do I get it to perform across the board. And so what we took was,
>> Predictably.
>> Predictably, that's an interesting one, predictable performance, 'cause sometimes it works, sometimes it doesn't. So what we did is we took the Exadata storage server software, with all the magic as to how do I get performance out of a file system, out of I/O, and we put that on the Hadoop nodes, and then we push the queries all the way down to that software. It does filtering, it does predicate pushdown, it leverages formats like Parquet and ORC on the HDFS side, and at the end of the day it takes the I/O request, which is what a SQL query gives, feeds it to the Hadoop nodes, runs it locally, and then sends it back to the database. And so we filter out a lot of the gunk we don't need, 'cause you said, oh, I only need yesterday's data, or whatever the predicates are. And so that's how we think we can get an architecture ready that allows global optimization, 'cause we can see the entire ecosystem in its totality, IoT, Oracle, all of it combined. We optimize the queries, push everything down as far as we can, algorithms to data, not data to algorithms, and that's how we're going to get predictable performance on all of these pieces of data.
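The join-and-filter flow above can be sketched as a single query. The table names here are invented for illustration; the point is that the predicate on the HDFS-resident data is a candidate for pushdown, so the filtering happens on the Hadoop nodes before anything travels back to the database:

```sql
-- Hypothetical BI query: finance data in Oracle joined with IoT data on HDFS.
-- The reading_ts filter ("I only need yesterday's data") can be pushed down
-- and applied locally on the Hadoop nodes, discarding the gunk early.
SELECT f.account_id,
       SUM(f.amount)      AS total_spend,
       AVG(s.temperature) AS avg_temperature
FROM   finance_txns f        -- Oracle-resident table
JOIN   sensor_readings s     -- external table over HDFS/Hive data
       ON s.device_id = f.device_id
WHERE  s.reading_ts >= TRUNC(SYSDATE) - 1
GROUP  BY f.account_id;
```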
>> So we end up with, if I've got this right, let me recap: we've got this notion that for data creation, data modeling, we can now use SQL, understood by a lot of people. It doesn't preclude us from using native tools, but at least that's one place where we can see how it all comes together, and we continue to use local tools for the actual manipulation elements.
>> Absolutely.
>> We are now using Exadata-like structures so we can push the algorithm down to the data, so we're moving a small amount of data to a large amount of data, 'cause it cuts cost down and improves predictability, but at the same time we've got metadata objects that allow us to anticipate with some degree of predictability how this whole thing will run, and how this will come back together at the database. Got that right?
>> Got that right.
>> All right, so the next question is: what's the impact of doing it this way? Talk a bit, if you can, about how it's helping folks who run data, who build applications, and who are actually trying to get business value out of this whole process.
>> So if we start with the business value, I think the biggest thing we bring to the table is simplicity and standardization. If I have to understand how this object is represented in NoSQL, how in HDFS, how somebody put a JSON file in here, I now have to spend time literally digging through that: does it conform, do I have to modify it, what do I do? So I think the business value comes out of the SQL layer on top of it. It all looks exactly the same. It's well known, it's well understood, and it's far quicker to get from "I've got a bunch of data" to actually building a BI report, building a dashboard, building KPIs, and integrating that data. There's nothing new to the data; it's a level of abstraction we put on top of this, whether you use an API or, in this case, SQL, 'cause that's the most common analytics language. So that's one part of how it will impact things.
The second is, and I think that's where the architecture is completely unique: we keep complete control of the query execution, from the metadata we just talked about, and that enables us to do global optimization. And if you think this through a little bit and go, oh, global optimization sounds really cool, what does that mean? I can now actually start pushing processing, I can move data, and it's what we've done in the Exadata platform for years: data lives on disk; oh, Peter likes to query it very frequently, let's move it up to flash, let's move it up to in-memory, let's twist the data around. So all of a sudden we've got control: we understand what gets queried, we understand where data lives, and we can start to optimize exactly for the usage pattern the customer has, and that's always the performance aspect. And that goes to the old saying of, how can I get data to a customer as quickly as possible when he really needs it? That's what this does: how can I optimize this? I've got thousands of people querying certain elements: move them up in the stack, get the performance, and all these queries come back in seconds. Regulatory stuff that needs to go through, like, five years of data: let's put it in cheap areas and optimize that. And so the impact is cheaper and faster at the end of the day, and all because there's a singular entity, almost, that governs the data, governs the queries, governs the usage patterns. That's what we uniquely bring to the table with this architecture.
>> So I want to build on the notion of governance, because actually one of the interesting things you said was the idea that if it's all under a common sort of interface, then you have greater visibility: where the data is, who owns it, et cetera. One of the biggest challenges that businesses are having is the global sense of how you govern your data. If you do this right, are you that much closer to having competent overall data governance?
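On the Oracle side, the "move it up to flash, move it up to in-memory" tiering JP mentions corresponds to ordinary statements like the ones below. These are standard Oracle commands shown only to illustrate the idea; the table names are hypothetical, and applying such policies automatically across the Hadoop and database tiers is the architectural claim, not something these two statements do by themselves:

```sql
-- Hot, frequently queried data: promote it to the in-memory column store.
ALTER TABLE finance_txns INMEMORY PRIORITY HIGH;

-- Cold regulatory history: rebuild it compressed so it sits cheaply on disk.
ALTER TABLE audit_history MOVE COMPRESS;
```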
>> I think we were able to take a big step forward on it, and it sounds very simple, but we now have a central catalog that actually understands what your data is and where it lives, in kind of a well-known way. And again it sounds very simple, but if you look at silos, that's the biggest problem: you have multiple silos, multiple things are in there, and nobody really knows what's in there. So here we start to publish this in a common structural layer, we have all the technical metadata, we track who queries what, who does all those things, so that's a tremendous help in governance. On the other side, of course, because we still use native tools to, let's say, manipulate some data, or augment or add new data, we are now going to tie a lot of the metadata that comes from, say, the Hadoop ecosystem into this catalog, and we're probably not there yet today on end-to-end governance, everything out of the box, here we go.
>> And probably never will be.
>> And we probably never will, you're right. But I think we took a major step forward with just consolidating it and exposing people to all the data they have, and you can still run all the other tools, like, crawl my data and flag anything that says SSN or looks like a Social Security number; all of those tools are still relevant. We just have a consolidated view, and dramatically improved governance.
>> So I'm going to throw you a curve ball.
>> Sure.
>> Not all the data I want to use is inside my business or being generated by sensors that I control. How do Big Data SQL and related technologies play a role in the actual contracting for additional data sources, and sustaining those relationships that are very, very fundamental to how data's shared across organizations? Do you see this information being brought in under this umbrella? Do you see Oracle facilitating those types of relationships, so that introducing standards for data sharing across partnerships becomes even easier?
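The central catalog JP describes can already be interrogated with ordinary data-dictionary queries on the database side. These are standard Oracle views; treating them as the consolidated governance surface for Hadoop-backed tables is the sketch here, and auditing must actually be configured for the second query to return rows:

```sql
-- Which external (e.g. Hive/HDFS-backed) tables are published in the catalog?
SELECT owner, table_name, type_name
FROM   dba_external_tables;

-- Who queried what, and when? (Assumes unified auditing is enabled.)
SELECT dbusername, object_name, event_timestamp
FROM   unified_audit_trail
WHERE  action_name = 'SELECT'
ORDER  BY event_timestamp DESC;
```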
>> I'm not convinced that Big Data SQL as a technology is going to solve all the problems we see there. I'm absolutely convinced that Oracle is going to work towards that; you see it in so many acquisitions we've done, you see it in the efforts of making data as a service available to people. And to some extent Big Data SQL will be a foundation layer to make BI queries run smoother across more and more pillars of data. If we can integrate database, Hadoop, and NoSQL, there's nothing that says, oh, and by the way, storage cloud.
>> And if we have relatively common physical governance, that I have the same physical governance and you have the same physical governance, now it's easier for us to show how we can introduce governance across our instances.
>> Absolutely. And today we focus a lot on HDFS or Hadoop as the next data pillar; storage cloud, ground to cloud, all of those are on the roadmap for Big Data SQL to catch up with. And so if you have data as a service, let's declare that cloud for a second, and I have data in my database and in my Hadoop cluster, again, it all now becomes part of the same ecosystem of data, and it all looks the same to me from a BI query perspective, from an analytics perspective. And then, how do I get the data sharing standards set up and all that? Part of that is driving a lot of it into cloud and making it all as a service, 'cause again you put a level of abstraction on top of it that makes it easier to consume, to understand where it came from, and to capture the metadata.
>> So, JP, one last question.
>> Sure.
>> Oracle OpenWorld's on the horizon. What are you looking for, or what will your customers be looking for, as it pertains to this Big Data SQL and related technologies?
>> I think specifically from a Big Data SQL perspective, we're going to drive the possible adoption scope much, much further. Today we work with HDFS and we work with the Oracle database; we're going to announce that things like Exadata together with Hadoop will be supported, we'll roll out SuperCluster support, and we're going to dramatically expand the footprint Big Data SQL will run on. People who come to the Big Data SQL or analytics sessions will see a lot of the roadmap, looking far more forward. I already mentioned things like ground to cloud: how can I run Big Data SQL when my Exadata is on premises and the rest of my HDFS data is in the cloud? We're going to be talking about how we're going to do that, and what we think the evolution of Big Data SQL is going to be. I think that's going to be a very fun session to go to.
>> JP Dijcks, a master product manager inside the Oracle big data product group, thank you very much for joining us here On the Ground at Oracle headquarters. This is theCUBE.

Published Date : Sep 6 2016

