Yuanhao Sun, Transwarp | Big Data SV 2018
>> Announcer: Live, from San Jose, it's theCUBE. (light music) Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm Peter Burris, and welcome back to Big Data SV, theCUBE's, again, annual broadcast of what's happening in the big data marketplace, here at, or adjacent to, Strata here in San Jose. We've been broadcasting all day. We're going to be here tomorrow as well, over at the Forager eatery, a place to come meander. So come on over, spend some time with us. Now, we've had a number of great guests. Many of the thought leaders visiting here in San Jose for the big data marketplace have been on. But I don't think any has traveled as far as our next guest. Yuanhao Sun is the CEO of Transwarp, come all the way from Shanghai. Yuanhao, it's once again great to see you on theCUBE. Thank you very much for being here. >> Good to see you again. >> So Yuanhao, Transwarp as a company has become extremely well known for great technology. There are a lot of reasons why that's the case, but you have some interesting updates on how the technology's being applied. Why don't you tell us what's going on? >> Okay, so, recently we announced the first audited TPC-DS benchmark result. Our product, called Inceptor, is a SQL engine on top of Hadoop. We have already added quite a lot of features, like distributed transactions and full SQL support, so that it can mimic Oracle or DB2 and other traditional database features, and we can pass the whole test. The engine is also scalable, because it's distributed. So a large benchmark like TPC-DS, which starts from 10 terabytes, the SQL engine can pass without much trouble. >> So I know that there have been other firms that have claimed to pass TPC-DS, but they haven't been audited. What does it mean to say you're audited? I presume that, as a result, you've gone through some extremely stringent and specific tests to demonstrate that you can actually pass the entire suite. >> Yes, actually, there is a third-party auditor. They audited our test process and its results over the past five months, so it is fully audited. The reason why we can pass the test comes down to two major reasons the others cannot. Traditional databases are not scalable enough to process such a large dataset, so they could not pass the test. For Hadoop vendors, the SQL engine features are not rich enough to pass all the tests. You know, there are several steps in the benchmark, and for the SQL queries, there are 99 queries, and the syntax is not supported by the Hadoop vendors yet. And also, the benchmark requires you to update the data after the queries, and then run the queries for multiple concurrent users. That means you have to support distributed transactions; you have to keep the updated data consistent. The Hadoop vendors' SQL engines haven't implemented these distributed transaction capabilities, so that's why they failed to pass the benchmark. >> So I had the honor of traveling to Shanghai last year, going and speaking at your user conference, and was quite impressed with the energy that was in the room as you announced a large number of new products. You've been very focused on taking what open source has to offer but adding significant value to it. As you said, you've done a lot with the SQL interfaces and various capabilities of SQL on top of Hadoop. Where is Transwarp going with its products today? How is it expanding? How is it being organized? How is it being used?
>> We group these products into three categories: big data, cloud, and AI and machine learning. So there are three categories. In big data, we upgraded the SQL engine and the stream engine, and we have a set of tools, called Studio, to help people streamline their big data operations. The second product line is the data cloud. We call it Transwarp Data Cloud. This product is going to be released early in May this year. We built this product on top of Kubernetes. We provide Hadoop as a service, data science as a service, and AI as a service to customers. It allows people to create multiple tenants, and the tenants are isolated by network, storage, and CPU. They are free to create clusters, spin them up, and turn them off, and it can scale to hundreds of hosts. I think this is the first implementation of, like, network isolation and multi-tenancy in Kubernetes, so that it can support HDFS and all Hadoop components. And because it is elastic, just like cloud computing, but we run on bare metal, people can consolidate the data and the applications in one place. Because all applications and Hadoop components are containerized, that means they are Docker images, we can spin up very quickly and scale to a larger cluster. So this data cloud product is very interesting for large companies, because they usually have a small IT team, but they have to provide (mumbles) and machine learning capability to larger groups, like one thousand people. So they need a convenient way to manage all these clusters, and they have to isolate the resources. They even need a billing system. For this product, we already have a few big names in China, like China Post, Picture Channel, and Secret of Source Channel. They are already offering this data cloud to their internal customers. >> And China has, has a few people, so I presume that, you know, China Post, for example, is probably a pretty big implementation. >> Yes. Their IT team is, like, less than 100 people, but they have to support thousands of users. In the past you would usually deploy a Hadoop cluster for each application, right? But today, large organizations have lots of applications. They hope to leverage big data capability, but how can a very small IT team support so many applications? So they need a convenient way, just like when you put Hadoop on a public cloud. We provide a product that allows you to offer Hadoop as a service in a private cloud, on bare-metal machines. So this is the second product category. And the third is machine learning and artificial intelligence. We provide a data science platform, a machine learning tool, that is, an interactive tool that allows people to create machine learning pipelines and models. We even implemented some automatic modeling capability that allows you to do feature engineering automatically or semi-automatically, and to select the best algorithms for you, so that everyone can be a data scientist. They can use our tool to quickly create models, and we also have some pre-built models for different industries, like financial services, like banks, securities companies, even IoT. So we have different pre-built machine learning models for them. They just need to modify the template, then apply the machine learning models to their applications very quickly. So that, for example, a bank customer just used it to deploy a model in one week.
This is very quick for them. Otherwise, in the past, they would hire a company to build that application, to develop the models, and it usually takes several months. Today it is much faster. So today we have three categories, particularly big data, cloud, and machine learning. >> Peter Burris: Machine learning and AI. >> And so, three products. >> And you've got some very, very big implementations. So you were talking about a couple of banks, but we were talking, before we came on, about some of the smart cities. >> Yuanhao Sun: Right. >> The kinds of things that you guys are doing at enormous scale. >> Yes, we deploy our streaming product in more than 300 cities in China, and these clusters are, like, connected together. We use the streaming capability to monitor the traffic and send the information from each city to the central government, to a sort of central repository. So whenever illegal behavior on the road is detected, that information will be sent to the policemen, or to the central repository, within two seconds. Whenever you are seen by a camera in any place in China, the alert will be sent out within two seconds. >> So the bad behavior is detected, its location is identified, the system also knows where the nearest police person is, and it sends a message and says, this car has performed something bad. >> Yeah, and you should stop that car at the next station or the next crossroad. Today there are tens of thousands of policemen who depend on this system for their daily work. >> Peter Burris: Interesting. >> So, just a question: it sounds like one of your sort of nearest competitors, in terms of, let's take the open source community, at least the APIs, and in their case open source, is Huawei. Have there been customers that tried to do a POC with you and with Huawei, and said, well, it took four months using the pure open source stuff, and it took, say, two weeks with your stack, it being much broader and deeper? Are there any examples like that? >> There are quite a lot. We have more market share; like in financial services, we have about 100 bank users. If we take all banks that already use Hadoop into account, our market share is above 60%. >> George Gilbert: 60. >> Yeah, in financial services. We usually do a POC and, like, run benchmarks. They are real workloads, and usually it takes us three days or one week. They find we can speed up their workloads very quickly. For Bank of China, they migrated their Oracle workload to our platform, and they tested our platform and the Huawei platform too. The first thing is, they cannot migrate the whole Oracle workload to open-source Hadoop, because of the missing features. We are able to support all of these workloads with very minor modifications. The modifications take only several hours, and we can finish the whole workload within two hours, but it originally took Oracle more than one day, >> George Gilbert: Wow. >> more than ten hours, to finish the workload. So it is very easy to see the benefits quickly. >> Now, you have a streaming product, also with that same SQL interface. Are you going to see a migration of applications that used to be batch to more near-real-time or continuous, or will you see a whole new set of applications that weren't done before, because the latency wasn't appropriate? >> For streaming applications, real-time cases, they are mostly new applications. But if you are using the Storm API or the Spark Streaming API, it is not so easy to develop your applications.
And another issue is, once you define a new rule, you have to add those rules dynamically to your cluster. And for the operators, they do not have much knowledge of writing Scala code; they only know how to configure things, but probably they are familiar with SQL. They just need to add one SQL statement to add a new rule, so that they can. [An illustrative sketch of such a rule follows this segment.] >> In your system. >> Yeah, in our system. So it is much easier for them to program streaming applications. And for those customers who don't have real-time requirements, they hope to do, like, real-time data warehousing. They collect all this data from websites and from their sensors. Like PetroChina, the large oil company: they send all the (mumbles) information directly to our streaming product. In the past, they just loaded it into Oracle and ran the dashboards, so it took hours to see the results. But today, the application can be moved to our streaming product with only a few modifications, because they are all SQL statements, and the application becomes real time. They can see real-time dashboard results in several seconds. >> So Yuanhao, you're number one in China. You're moving more aggressively to participate in the US market. Last question: what's the biggest difference between being number one in China, the way that big data is being done in China, versus the way you're encountering big data being done here in the US, for example? Is there a difference? >> I think there are some differences. US customers usually request a POC. But in China, I think they focus more on the results. They focus on what benefit they can gain from your product. So we have to prove it to them; we have to help them migrate the application to see the benefits. I think in the US, they focus more on technology than Chinese customers do. >> Interesting. So it's more about the technology here in the US, more about the outcome in China. Once again, Yuanhao Sun, CEO of Transwarp, thank you very much for being on theCUBE. >> Thank you. >> And I'm Peter Burris, with George Gilbert, my co-host, and we'll be back with more from Big Data SV, in San Jose. Come on over to the Forager and spend some time with us. And we'll be back in a second. (light music)
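To make the one-SQL-statement-per-rule idea from this segment concrete, here is a minimal sketch. The syntax is modeled on open streaming-SQL dialects such as Flink SQL; the interview does not specify Transwarp's actual dialect, and the stream name, columns, connector settings, and threshold are all hypothetical.

```sql
-- Hypothetical source stream; the connector settings are assumptions.
CREATE TABLE sensor_events (
    device_id   STRING,
    temperature DOUBLE,
    event_time  TIMESTAMP(3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic'     = 'sensors'
);

-- Adding a new monitoring rule amounts to adding one continuous query.
-- This one emits an alert whenever a device averages above 90 degrees
-- over a one-minute window ('alerts' is an assumed sink table).
INSERT INTO alerts
SELECT device_id,
       AVG(temperature) AS avg_temp,
       TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end
FROM sensor_events
GROUP BY device_id,
         TUMBLE(event_time, INTERVAL '1' MINUTE)
HAVING AVG(temperature) > 90;
```

Because the rule is declarative, an operator can deploy or retire it without writing or recompiling Scala code, which is the point being made above.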
Yuanhao Sun, Transwarp Technology - BigData SV 2017 - #BigDataSV - #theCUBE
>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (upbeat percussion music) >> Okay, welcome back everyone. Live here in Silicon Valley, San Jose, it's Big Data SV, Big Data Silicon Valley, in conjunction with Strata Hadoop. This is theCUBE's exclusive coverage. Over the next two days, we've got wall-to-wall interviews with thought leaders and experts breaking down the future of big data, the future of analytics, the future of the cloud. I'm John Furrier with my co-host George Gilbert, with Wikibon. Our next guest is Yuanhao Sun, who's the co-founder and CTO of Transwarp Technologies. Welcome to theCUBE. You were on, I noticed, 166 days ago on theCUBE, previously. But now you've got some news. So let's get the news out of the way. What are you guys announcing here this week? >> Yes, so we are announcing 5.0, the latest version of Transwarp Data Hub. In this version, we would call it probably a revolutionary product, because the first thing is we embedded Kubernetes in our product, so we allow people to isolate different kinds of workloads using Docker containers, and we also provide a scheduler to better support mixed workloads. The second is, we are building a set of tools that allow people to build their warehouse and then migrate from an existing or traditional data warehouse to Hadoop. We are also providing people the capability to build a data mart, actually; it allows you to interactively query data. So we built a column store, in memory and on SSD, and we totally rewrote the whole SQL engine. It is a very tiny SQL engine that allows people to query data very quickly, and today that tiny SQL engine is about five to ten times faster than Spark 2.0. We also allow people to build cubes on top of Hadoop, and once the cube is built, the SQL performance, like the TPC-H performance, is about 100 times faster than an existing database or Spark 2.0. So it's super fast. And actually we found a pilot customer, so they replaced their data warehouse software to build a data mart, and we have already migrated, say, 100 reports from Teradata to our product. So the results are very promising. And the fourth is, we are providing a tool for people to build machine learning pipelines, and we are leveraging TensorFlow, MXNet, and also Spark, for people to visualize the pipeline and to build the data mining workflows. So this is kind of like a data science tool. It's very easy for people to use. >> John: Okay, so take a minute to explain, 'cause that was great. You got the performance there, that's the news out of the way. Take a minute to explain Transwarp, your value proposition, and when people engage you as a customer. >> Yuanhao: Yeah, so people choose our product, and the major reason is our compatibility with Oracle, DB2, and Teradata SQL syntax. Because, you know, they have built a lot of applications on those databases, so when they migrate to Hadoop, they don't want to rewrite the whole program. So our compatibility, SQL compatibility, is a big advantage to them. This is the first one. And we also support full ACID and distributed transactions on Hadoop, so a lot of applications can be migrated to our product with few modifications or without any changes. So this is our first advantage. The second is that we are providing maybe the best streaming engine, which is actually derived from Spark, and we apply this technology to IoT applications.
You know, in IoT, pretty soon they need very low latency, but they also need very complicated models on top of streams. So that's why we are providing full SQL support and machine learning support on top of streaming events, and we are also using event-driven technology to reduce the latency to five to ten milliseconds. So this is the second reason people choose our product. And then today we are announcing 5.0, and I think people will find more reasons to choose our product. >> So you have the SQL compatibility, you have the tooling, and now you have the performance. So kind of the triple threat there. So what are the customers saying? When you go out and talk with your customers, what's the view of the current landscape for customers? What are they solving right now? What are the key challenges and pain points that customers have today? >> We have customers in more than 12 vertical segments, and in different verticals they have different pain points, actually. Take one example: in financial services, the main pain point for them is to migrate existing legacy applications to Hadoop. You know, they have accumulated a lot of data, and the performance is very bad using a legacy database, so they need high-performance Hadoop and Spark to speed up the performance, like reports. In another vertical, like logistics, transportation, and IoT, the pain point is to find a very low latency streaming engine; at the same time, they need a very complicated programming model to write their applications. Another example, like the public sector: they actually need a very complicated and large-scale search engine. They need to build analytical capability on top of the search engine, so they can search the results and analyze the results at the same time. >> George: Yuanhao, as always, whenever we get to interview you on theCUBE, you toss out these gems, sort of like, you know, diamonds, big rocks that, under millions of years and incredible pressure, have been squeezed down into these incredibly valuable minerals with lots of goodness in them, so I need you to unpack that diamond back into something that's more accessible. You've done something that none of the Hadoop distro guys have managed to do, which is to build databases that are not just decision support, but can handle OLTP, can handle operational applications. You've done the streaming; you've done what even Databricks can't do without even trying any of the other stuff, which is getting the streaming down to an event at a time. Let's step back from all these amazing things, and tell us: what was the secret sauce that let you build a platform this advanced? >> So actually, we are driven by our customers, and we do see the trends. People are looking for better solutions. You know, there is a lot of pain in setting up a Hadoop cluster and using the Hadoop technology. So that's why we found it very meaningful, and also very necessary, for us to build a SQL database on top of Hadoop. Quite a lot of customers on the financial services side asked us to provide ACID, so that transactions can be put on top of Hadoop, because they have to guarantee the consistency of their data. Otherwise they cannot use the technology. >> At the risk of interrupting, maybe you can tell us why others have built analytic databases on top of Hadoop, to give the familiar SQL access, and obviously have a desire also to have transactions next to it, so you can inform a transaction decision with the analytics.
One of the questions is, how did you combine the two capabilities? I mean, it only took Oracle like 40 years. >> Right. So, actually, our transaction capability is only for analytics. You know, this OLTP capability is not for short transactional applications; it's for data warehouse kinds of workloads. >> George: Okay, so when you're ingesting. >> Yes, when you're ingesting, when you modify your data in batch, you have to guarantee the consistency. So that's the transaction capability. But we are also building another distributed storage and distributed database, and that will provide true OLTP capability. That means you can do concurrent transactions on that database, but we are still developing that software right now. Today our product provides the distributed transaction capability for people to actually build their warehouse. You know, quite a lot of people believe a data warehouse does not need transaction capability, but we found a lot of people modify their data in the data warehouse. You know, they are loading their data continuously to the data warehouse, and things like the CRM tables, customer information, can be changed over time. So every day people need to update or change the data; that's why we have to provide transaction capability in the data warehouse. >> George: Okay, and then tell us also, 'cause the streaming problem is, you know, we're told that roughly two thirds of Spark deployments use streaming as a workload, and the biggest knock on Spark is that it can't process one event at a time, you've got to do a little batch. Tell us some of the use cases that can take advantage of doing one event at a time, and how you solved that problem. >> Yuanhao: Yeah, so the first use case we encountered is the anti-fraud, or fraud detection, application in FSI. So whenever you swipe your credit card, the bank needs to tell you if the transaction is a fraud or not in a few milliseconds. But if you are using Spark Streaming, it will usually take 500 milliseconds, so the latency is too high for that kind of application. That's why we have to provide event-at-a-time, event-driven processing to detect the fraud, so that we can interrupt the transaction in a few milliseconds. So that's one kind of application. The other comes from IoT applications; we already put our streaming framework into a large manufacturing factory. They have to detect malfunctions in their equipment in a very short time, otherwise it may explode. So if you are using Spark Streaming, probably when you submit your application it will take hundreds of milliseconds, and by the time you finish your detection it usually takes a few seconds, so that will be too long for that kind of application. And that's why we need a low-latency streaming engine. But you might say it is okay to use Storm or Flink, right? The problem is, we found, they need a very complicated programming model: customers are going to solve equations on the streaming events, they need to do FFT transformations, and they are also asking to run some linear regression or some neural network on top of events. So that's why we have to provide a SQL interface, and we are also embedding CEP capability into our streaming engine, so that you can use patterns to match the events and to send alerts. >> George: So, SQL to get a set of events and maybe join some, and the complex event processing, CEP, to say, does this fit a pattern I'm looking for? >> Yuanhao: Yes.
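As a rough illustration of the pattern-matching capability being described, here is a sketch in the ANSI SQL:2016 MATCH_RECOGNIZE syntax, the standard construct for CEP-style rules (it also appears in Oracle and Flink SQL). The interview does not specify Transwarp's exact dialect, and the stream name, columns, and ten-minute window are hypothetical.

```sql
-- Hypothetical fraud rule: flag a card that makes two purchases in
-- different cities within ten minutes of each other.
SELECT *
FROM card_transactions
    MATCH_RECOGNIZE (
        PARTITION BY card_id           -- evaluate the pattern per card
        ORDER BY txn_time
        MEASURES
            FIRST(A.city)    AS first_city,
            LAST(B.city)     AS second_city,
            LAST(B.txn_time) AS alert_time
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A B)                  -- any purchase A, then suspicious B
        DEFINE
            B AS B.city <> A.city
                 AND B.txn_time <= A.txn_time + INTERVAL '10' MINUTE
    ) AS fraud_alerts;
```

An engine that evaluates such a pattern incrementally, as each event arrives, is what makes a declarative rule like this compatible with the millisecond-level, event-at-a-time detection discussed above.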
>> Okay, and so then, with the lightweight OLTP, that and any other new projects you're looking at, tell us perhaps the new use cases it would be appropriate for. >> Yuanhao: Yeah, so that's our future product, actually. We are going to solve the problem of large-scale OLTP transactions. You know, in China the population is so large that, in the public sector or in banks, they need to build highly scalable transaction systems, so that they can support very highly concurrent transactions at the same time. That's why we are building this kind of technology. You know, in the past, people just divided transactions across multiple databases, like multiple Oracle instances or multiple MySQL instances. But the problem is: if the application is simple, you can very easily divide a transaction over multiple instances of databases, but if the application is very complicated, especially when the ISV already wrote the application based on Oracle or a traditional database, they already depend on the transaction system. So that's why we have to build the same kind of transaction system, so that we can support their legacy applications, but scale to hundreds of nodes and millions of transactions per second. >> George: On the transactional stuff? >> Yuanhao: Yes. >> Just correct me if I'm wrong, I know we're running out of time, but I thought Oracle only scales out when you're doing decision support work, not when you're doing OLTP; that it can maybe stretch to ten nodes or something like that. Am I mistaken? >> Yuanhao: They can scale to 16, or at most 32, nodes. >> George: For transactional work? >> For transactional work. But that's the theoretical limit. You know, Google F1 and Google Spanner can scale to hundreds of nodes, but the latency is higher than Oracle, because you have to use a distributed protocol to communicate with the multiple nodes, so the latency is higher. >> On Google? >> Yes. >> On Google. The latency is higher on Google? >> 'Cause it has to go, like, all the way to Europe and back. >> Oracle or Google latency, you said? >> Google. Because if you are using the two-phase commit protocol, you have to broadcast your request to multiple nodes and then wait for the feedback, so that means you have a much higher latency, but it's necessary to maintain the consistency. So in a distributed OLTP database, the latency is usually higher, but the concurrency is also much higher, and the scalability is much better. [A sketch of this trade-off appears after this segment.] >> George: So that's a problem where you've stretched beyond what Oracle's done. >> Yuanhao: Yes. Because customers can tolerate the higher latency, but they need to scale to millions of transactions per second, that's why we have to build a distributed database. >> George: Okay, for this reason we're going to have to have you back for, like, maybe five or ten consecutive segments, you know, maybe starting tomorrow. >> We're going to have to get you back, for sure. Final question for you: what are you excited about, from a technology standpoint, in the landscape? As you look at open source, you're working with Spark, you mentioned Kubernetes, you have microservices, all the cloud. What are you most excited about right now in terms of new technology that's going to help simplify and scale, with low latency, the databases, the software? 'Cause you've got IoT, you've got autonomous vehicles, you have all this data. What are you excited about?
>> So actually, we already solve these problems with this technology, but I think the most exciting thing is what we found... There are two trends. The first trend is, we found it very exciting to see more computation frameworks coming out, like the AI frameworks, like TensorFlow and MXNet, Torch, and tons of such machine learning frameworks. They are solving different kinds of problems, like facial recognition from video and images, like human-computer interaction using voice, using audio. So it's very exciting, I think. And also, we found it very exciting to combine these technologies together, and that's why we are using Kubernetes, you know. We didn't use YARN, because it cannot support TensorFlow or other frameworks, but, you know, if you are using containers, and if you have a good scheduler, you can schedule any kind of computation framework. So we found it very interesting to have these new frameworks, and we can combine them together to solve different kinds of problems. >> John: Thanks so much for coming on theCUBE. It's an operating system world we're living in now; it's a great time to be a technologist. Certainly the opportunities are out there, and we're breaking it down here inside theCUBE, live in Silicon Valley, with the best tech executives, best thought leaders and experts here inside theCUBE. I'm John Furrier with George Gilbert. We'll be right back with more after this short break. (upbeat percussive music)
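To ground the two-phase commit trade-off discussed in this segment, here is a minimal sketch of the kind of statement a distributed OLTP engine has to coordinate. The table, shard placement, and account IDs are hypothetical; the SQL itself is ordinary, because the cost lives in the commit protocol underneath it.

```sql
-- A transfer that touches rows on two different shards/nodes.
BEGIN;

UPDATE accounts SET balance = balance - 100
WHERE account_id = 'A-1001';   -- assume this row lives on node 1

UPDATE accounts SET balance = balance + 100
WHERE account_id = 'B-2002';   -- assume this row lives on node 2

-- The COMMIT triggers two-phase commit: a "prepare" round trip to each
-- participant node, then a "commit" round trip. Those extra network
-- round trips are the added latency described above; the payoff is that
-- many such transactions can run concurrently across hundreds of nodes.
COMMIT;
```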
Peter Burris Big Data Research Presentation
(upbeat music) >> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> What am I going to spend time, the next 15, 20 minutes or so, talking about? I'm going to answer three things. Our research has gone deep into where are we now in the big data community. I'm sorry, where is the big data community going, number one. Number two is, how are we going to get there? And number three, what do the numbers say about where we are? So those are the three things. Now, since we all want to get out of here, I'm going to fly through some of these slides, but again there's a lot of opportunity for additional conversation, because we're all about having conversations with the community. So let's start here. The first thing to know, when we think about where this is all going, is that it's inextricably bound up with digital transformation. Well, what is digital transformation? We've done a lot of research on this. It was Peter Drucker who famously said, many years ago, that the purpose of a business is to create and keep a customer. That's what a business is. Now, what's the difference between a business and a digital business? What's the difference between Sears Roebuck and Amazon? It's data. A digital business uses data as an asset to create and keep customers. It infuses data in operations differently to create more automation. It infuses data in engagement differently to catalyze superior customer experiences. It reformats and restructures its concept of value proposition and product to move from a product to a services orientation. The role of data is the centerpiece of digital business transformation, and in many respects, that is where we're going: an understanding and appreciation of that. Now, we think there are a number of strategic capabilities that will have to be built out to make that possible. First off, we have to start thinking about what it means to put data to work. The whole notion of an asset is that an asset is something that can be applied to a productive activity. Data can be applied to a productive activity. Now, there are a lot of very interesting implications that we won't get into now, but essentially, if we're going to treat data as an asset and think about how we could put more data to work, we're going to focus on three core strategic capabilities to make that possible. One, we need to build a capability for collecting and capturing data. That's a lot of what IoT is about. It's a lot of what mobile computing is about. There are going to be a lot of implications around how to ethically and properly do some of those things, but a lot of that investment is about finding better and superior ways to capture data. Two, once we are able to capture that data, we have to turn it into value. That, in many respects, is the essence of big data: how we turn data into data assets, in the form of models, in the form of insights, in the form of any number of other approaches to thinking about how we're going to appropriate value out of data. But it's not enough to create value out of it and have it sit there as potential value. We have to turn it into kinetic value, to actually do the work with it, and that is the last piece. We have to build new capabilities for how we're going to apply data to perform work better, to act based on data.
Now, we've got a concept we're researching that we call systems of agency, which is the idea that there are going to be a lot of new approaches, new systems, with a lot of intelligence and a lot of data, that act on behalf of the brand. I'm not going to spend a lot of time going into this, but remember that word, because I will come back to it. Systems of agency are about how you're going to apply data to perform work, with automation, augmentation, and actuation, on behalf of your brand. Now, all this is going to happen against the backdrop of cloud optimization. I'll explain what we mean by that right now. Very importantly, increasingly, how you create value out of data, how you create future options on the value of your data, is going to drive your technology choices. For the first 10 years of the cloud, the presumption was that all data was going to go to the cloud. We think a better way of thinking about it is: how is the cloud experience going to come to the data? We've done a lot of research on the cost of data movement, both in terms of the actual out-of-pocket costs, but also the potential uncertainty, the transaction costs, etc., associated with data movement. And that's going to be one of the fundamental elements of how we think about the future of big data and how digital business works: what we think about data movement. I'll come to that in a bit. But our proposition is, increasingly, we're going to see architectural approaches that focus on how we're going to move the cloud experience to the data. We've got this notion of true private cloud, which is effectively the idea of the cloud experience on or near premise. That doesn't diminish the role that the cloud's going to play in the industry, or say that Amazon and AWS and Microsoft Azure and all the other options are not important. They're crucially important, but it means we have to start thinking architecturally about how we're going to create value out of data, and recognize that means we have to start envisioning how our organization and infrastructure are going to be set up, so that we can use data where it needs to be, or where it's most valuable, and often that's close to the action. So if we think about that very quickly, because it's a backdrop for everything: increasingly, we're going to start talking about the idea of where is the workload going to go? Where is the workload going to be, against this kind of backdrop of the divorce of workload and infrastructure? We believe, and our research pretty strongly shows, that a lot of workloads are going to go to true private cloud, but a lot of big data is moving into the cloud. This is a prediction we made a few years ago, and it's clearly happening and it's underway, and we'll get into what some of the implications are. So again, when we say that a lot of the big data elements, a lot of the process of creating value out of data, is going to move into the cloud, that doesn't mean that all the systems of agency that build on or rely on that data, the inference engines, etc., are also in a public cloud. A lot of them are going to be distributed out to the edge, out to where the action needs to be, because of latency and other types of issues. This is a fundamental proposition, and I know I'm going fast, but hopefully I'm being clear. All right, so let's now get to the second part. This is kind of where the industry's going. Data is an asset.
Invest in strategic business capabilities to create those data assets and appreciate the value of those assets, and utilize the cloud intelligently to generate and ensure increasing returns. So the next question is, well, how will we get there? Now, right now, not too far from here, Neil Raden, for example, was on the show floor yesterday. Neil made the observation that, as he wandered around, he only heard the word big data two or three times. The concept of big data is not dead. Whether the term is or is not is somebody else's decision. Our perspective, very simply, is that the notion is bifurcating. And it's bifurcating because we see different strategic imperatives happening at two different levels. On the one hand, we see infrastructure convergence: the idea that increasingly we have to think about how we're going to bring and federate data together, both from a systems and a data management standpoint. And on the other hand, we're going to see infrastructure, or application, specialization. That's going to have an enormous implication over the next few years, if only because there just aren't enough people in the world who understand how to create value out of data. And there's going to be a lot of effort made over the next few years to find new ways to go from that one expertise group to billions of people, billions of devices, and those are the two dominant considerations in the industry right now: how can we converge data physically and logically, and, on the other hand, how can we liberate more of the smarts associated with this very, very powerful approach, so that more people get access to the capacities and the capabilities and the assets that are being generated by that process? Now, we've done at Wikibon probably, I don't know, 18, 20, 23 predictions overall on the changes being wrought by digital business. Here I'm going to focus on four of them that are central to our big data research. We have many more, but I'm just going to focus on four. The first one: when we think about infrastructure convergence, we worry about hardware. Here's a prediction about what we think is going to happen with hardware, and our observation is, we believe pretty strongly that future systems are going to be built on the concept of how you increase the value of data assets. The technologies are all in place: simpler parts that are more successfully bound together, specifically through new storage and networking technologies, are going to play together. Why? Because increasingly, that's the fundamental constraint: how do I make data available to other machines, actors, sources of change, sources of process within the business? Now, we envision, or we are watching before our very eyes, new technologies that allow us to take these simple piece parts and weave them together in very powerful fabrics or grids, what we call UniGrid, so that there is almost no latency between data that exists within one of these, call it a molecule, and anywhere else in that grid or lattice. Now again, these are not systems that are going to be here in five years. All the piece parts are here today, and there are companies that are actually delivering them. So if you take a look at what Micron has done with Mellanox and other players, that's an example of one of these true-private-cloud-oriented machines in place. The bottom line, though, is that there is a lot of room left in hardware. A lot of room.
This is what cloud suppliers are building and are going to build, but increasingly, as we think about true private cloud, enterprises are going to look at this as well. So: future systems for improving data assets. The capacity of this type of system, with low latency among any sources of data, means that we can now think about data not as a set of sources, each individually having some control over its own data, and sinks, woven together by middleware and applications, but literally as networks of data. As we start to think about distributing data, and distributing the control and authority associated with that data, more broadly across systems, we now have to think about what it means to create networks of data. Because that, in many respects, is how these assets are going to be forged. I haven't even mentioned the role that security is going to play in all of this, by the way, but fundamentally that's how it's likely to play out. We'll have a lot of different sources, but from a business standpoint, we're going to think about how those sources come together into a persistent network that can be acted upon by the business. One of the primary drivers of this is what's going on at the edge. Marc Andreessen famously said that software is eating the world; well, our observation is, great, but if software's eating the world, it's eating it at the edge. That's where it's happening. Secondly, there's this notion of agency zones. I said I was going to bring that word up again. How systems act on behalf of a brand, or act on behalf of an institution or business, is very, very crucial, because the time necessary to do the analysis, perform the intelligence, and then take action is a real constraint on how we do things. And our expectation is that we're going to see what we call an agency zone, or a hub zone, or a cloud zone, defined by latency, and how we architect data to get the data that's necessary to perform that piece of work into the zone where it's required. Now, the implication of this is that none of it is going to happen if we don't use AI and related technologies to increasingly automate how we handle infrastructure. And technologies like blockchain have the potential to provide an interesting way of imagining how these networks of data actually get structured. It's not going to solve everything. There are some people who think blockchain is kind of everything that's necessary, but it will be a way of describing a network of data. So we see those technologies on the ascension. But what does it mean for the DBMS? In the old world, the old way of thinking, the database manager was the control point for data. In the new world, these networks of data are going to exist beyond a single DBMS, and in fact, over time, that concept of federated data actually has the potential to become real. When we have these networks of data, we're going to need people to act upon them, and that's essentially a lot of what the data scientist is going to be doing: identifying the outcome, identifying the data that's required, and weaving that data, through the construction, management, and manipulation of pipelines, to ensure that the data as an asset can persist, for the purposes of solving a near-term problem, or over whatever duration is required to solve a longer-term problem.
Data scientists remain very important, but we're going to see, as a consequence of improvements in tooling capable of doing these things, an increasing recognition that there's a difference between a data scientist and a data scientist. There are going to be a lot of folks who participate in the process of manipulating, maintaining, and managing these networks of data to create business outcomes, but we're going to see specialization in those ranks as the tooling is targeted to specific types of activities. So the data scientist is going to remain an important job, but it's going to lose a little bit of its luster, because it's going to become clear what it means. Some data scientists will probably become, let's call them, data network administrators, or networks-of-data administrators. And very importantly, as I said earlier, there just aren't enough of these people on the planet. So increasingly, when we think again about digital business and the idea of creating data assets, a central challenge is going to be how to turn all the data that can be captured into assets that can be applied to a lot of different uses. There are two fundamental changes to the way we are currently conceiving of the big data world on the horizon. One is, well, it's pretty clear that Hadoop can only go so far. Hadoop is a great tool for certain types of activities and certain numbers of individuals. So Hadoop solves problems for an important but relatively limited subset of the world. Some of the new data science platforms that I just talked about are going to help, with a degree of specialization that hasn't been available before in the data world, but they also will only take it so far. The real way that we see the work the big data community is performing turned into sources of value that extend into virtually every single corner of humankind is going to be through the cloud services that are being built, and increasingly through packaged applications. A lot of computer science still exists between what I just said and when this actually happens. But in many respects, that's the challenge for the vendor ecosystem: how to reconstruct the idea of packaged software, which has historically been built around operations and transaction processing, with a known data model, a known process, and some technology challenges. How do we reapply that to a world where we don't know exactly what the process is, because the data tells us, in the moment, the action that's going to take place? It's a very different way of thinking about application development, a very different way of thinking about what's important in IT, and a very different way of thinking about how business is going to be constructed and how strategy is going to be established. Packaged applications are going to be crucially important. So, in the last few minutes here, what are the numbers? This is kind of the basis for our analysis: digital business, the role of data as an asset, having an enormous impact on how we think about hardware, how we think about database management or data management, how we think about the people involved in this, and ultimately how we think about delivering all this value out to the world. And the numbers are starting to reflect that. So keep four numbers in mind as I go through the next two or three slides.
One hundred and three billion, 68%, 11%, and 2017. Of all the numbers that you will see, those are four of the most important. So let's start by looking at the total marketplace. This is the growth of the hardware, software, and services pieces of the big data universe. Now, we have a fair amount of additional research that breaks all these down into tighter segments, especially on the software side. But the key point here is that we're talking about big numbers: 103 billion dollars over the course of the next 10 years. And let's be clear: that 103 billion dollars actually has a dramatic amplification on the rest of the computing industry, because a lot of the pricing models associated with, especially, the software are tied back to open source, which has its own issues. And very importantly, the services business is going to go through an enormous amount of change over the next five years, as service companies better understand how to deliver some of these big-data-rich applications. The second point to note here is that it was in 2017 that the software market surpassed the hardware market in big data. Again, for the first number of years we focused on buying the hardware and the system software associated with it, and the software value became something that we hoped to discover. So I was having a conversation here on theCUBE with the CEO of Transwarp, which is a very interesting Chinese big data company, and I asked, what's the difference between how you do things in China and how we do things in the US? He said, well, in the US you guys focus on proof of concept. You spend an enormous amount of time asking, does the hardware work? Does the database software work? Does the data management software work? In China, we focus on the outcome. That's what we focus on. Here you have to placate the IT organization, to make sure that everybody in IT is comfortable with what's about to happen. In China, we're focused on the business people. This is the first year that software is bigger than hardware, and it's only going to get bigger and bigger over time. It doesn't mean, again, that hardware is dead or hardware is not important. It's going to remain very important, but it does mean that the locus of the industry is moving. Now, when we think about what the market shares look like, it's a very fragmented market. 68% of the market is still other. This is a highly immature market that's going to go through a number of changes over the next few years, partly catalyzed by that notion of infrastructure convergence. So, in four years, our expectation is that that 68% is going to start going down pretty fast, as we see greater consolidation in how some of these numbers come together. Now, IBM is the biggest player, on the basis of the fact that they operate in all these different segments. They operate in the hardware, software, and services segments, but especially, they're very strong in the services business. The last one I want to point your attention to is this one. I mentioned earlier that our expectation is that the market increasingly is going to move to a packaged application, or packaged services, orientation as a way of delivering big data expertise to customers. Splunk is the leading software player right now. Why? Because that's the perspective they've taken.
It's perhaps for a limited subset of individuals or markets or of sectors but it takes a packaged application, weaves these technologies together, and applies them to an outcome. And we think this presages more of that kind of activity over the course of the next few years. Oracle, kind of different approach and we'll see how that plays out over the course of the next five years as well. Okay, so that's where the numbers are. Again, a lot more numbers, a lot of people you can talk to. Let me give you some action items. First one, if data was a core asset, how would IT, how would your business be different? Stop and think about that. If it wasn't your buildings that were the asset, it wasn't the machines that were the asset, it wasn't your people by themselves who were the asset, but data was the asset. How would you reinstitutionalize work? That's what every business is starting to ask, even if they don't ask it in the same way. And our advice is, then do it because that's the future of business. Not that data is the only asset but data is a recognized central asset and that's going to have enormous impacts on a lot of things. The second point I want to leave you with, tens of billions of users and I'm including people and devices, are dependent on thousands of data scientists that's an impedance mismatch that cannot be sustained. Packaged apps and these cloud services are going to be the way to bridge that gap. I'd love to tell you that it's all going to be about tools, that we're going to have hundreds of thousands or millions or tens of millions or hundreds of millions of data scientists suddenly emerge out of the woodwork. It's not going to happen. The third thing is we think that big businesses, enterprises, have to master what we call the big inflection. The big tech inflection. The first 50 years were about known process and unknown technology. How do I take an accounting package and do I put on a mainframe or a mini computer a client/server or do I do it on the web? Unknown technology. Well increasingly today, all of us have a pretty good idea what the base technology is going to be. Does anybody doubt it's going to be the cloud? We got a pretty good idea what the base technology is going to be. What we don't know is what are the new problems that we can attack, that we can address with data rich approaches to thinking about how we turn those systems into actors on behalf of our business and customers. So I'm a couple minutes over, I apologize. I want to make sure everybody can get over to the keynotes if you want to. Feel free to stay, theCUBE's going to be live at 9:30. If I got that right. So it's actually pretty exciting if anybody wants to see how it works, feel free to stay. Georgia's here, Neil's here, I'm here. I mentioned Greg Terrio, Dave Volante, John Greco, I think I saw Sam Kahane back in the corner. Any questions, come and ask us, we'll be more than happy. Thank you very much for, oh David Volante. >> David: I have a question. >> Yes. >> David: Do you have time? >> Yep. >> David: So you talk about data as a core asset, that if you look at the top five companies by market cap in the US, Google, Amazon, Facebook, etc. They're data companies, they got data at the core which is kind of what your first bullet here describes. How do you see traditional companies closing that gap where humans, buildings, etc at the core as we enter this machine intelligence era, what's your advice to the traditional companies on how they close that gap? >> All right. 
So the question was: the most valuable companies in the world are companies that are well down the path of treating data as an asset; how does everybody else get going? Our observation is, you go back to: what's the value proposition? What actions are most important? What data is necessary to perform those actions? Can changing the way the data is orchestrated, organized, and put together inform or change the cost of performing that work by changing transaction costs? Can you introduce a new service along the same lines? And then architect your infrastructure and your business to make sure that the data is near the action, in time for the action, so the action is absolute genius to your customer. So it's a relatively simple thought process. That's how Amazon thought, and Apple increasingly thinks like that: they design the experience, and then they think about what data is necessary to deliver that experience. It's a simple approach, but it works. Yes, sir. >> Audience Member: On the slide that you had a few slides ago, the market share, the big spenders, you asked the question, do any of us doubt that cloud is the future? I'm with Snowflake. I don't see many of those large vendors in the cloud, and I was wondering if you could speak to what you are seeing in terms of emerging vendors in that space. >> What a great question. So the question was: when you look at the companies that are catalyzing a lot of the change, you don't see a lot of the big companies in the leadership, and someone from Snowflake just asked, well, who's going to lead it? That's a big question that has a lot of implications, but at this point in time it's very clear that the big companies are suffering a bit from the old... trying to remember what to call it... the RCA syndrome. I think Clay Christensen talked about this. You know, the innovator's dilemma. So RCA actually was one of the first creators. They created the transistor and they held a lot of the original patents on it. They put that incredible new technology, back in the forties and fifties, under the control of the people who ran the vacuum tube business. When was the last time anybody bought RCA stock? The same problem exists today. Now, how is that going to play out? Are we going to see, as we've always seen, a lot of new vendors emerge out of this industry and grow into big vendors with IPO-related exits to try to scale their business? Or are we going to see a whole bunch of gobbling up? That's what I'm not clear on, but it's pretty clear at this point in time that a lot of the technology, a lot of the science, is being done in smaller places. The moderating feature of that is the services side, because there are limited groupings of expertise, and the companies that today are able to attract that expertise, the Googles, the Facebooks, the AWSs, the Amazons, are doing so in support of a particular service. IBM and others are trying to attract that talent so they can apply it to customer problems. We'll see over the next few years whether the IBMs and the Accentures and the big service providers are able to attract the kind of talent necessary to diffuse that knowledge into the industry faster. So it's the rate at which the idea of internet-scale computing, the idea of big data being applied to business problems, can diffuse into the marketplace through services. If it can diffuse faster, that will have both an accelerating impact for smaller vendors, as it has in the past.
But it may also, again, have a moderating impact, because a lot of that expertise that comes out of IBM, IBM is going to find ways to drive into its products faster than it ever has before. So it's a complicated answer, but that's our thinking at this point in time. >> Dave: Can I add to that? >> Yeah. (audience member speaking faintly) I think that's true now, but I think the real question, not to argue with Dave, but this is part of what we do, the real question is: how is that knowledge going to diffuse into the enterprise broadly? Because Airbnb, I doubt, is going to get into the business of providing services. (audience member speaking faintly) So I think that the whole concept of community, partnership, ecosystem is going to remain very important, as it always has, and we'll see how fast those service companies that are dedicated to diffusing knowledge into customer problems can actually make that happen. Our expectation is that as the tooling gets better, we will see more people be able to present themselves as truly capable of doing this, and that will accelerate the process. But the next few years are going to be really turbulent, and we'll see which way it actually ends up going. (audience member speaking faintly) >> Audience Member: So I'm with IBM, and I can tell you 100% for sure that we are. I hired literally 50 data scientists in the last three months to go out and do exactly what you're saying: sit down with clients and help them figure out how to do data science in the enterprise. And so we are in fact scaling it. We're getting people that have done this at Google, Facebook. Not a whole lot of those, 'cause we want to do it with people that have actually done it in legacy Fortune 500 companies, right? Because there's a little bit of a difference there. >> So. >> Audience Member: So we are doing exactly what you said, and Microsoft is doing the same thing, Amazon is actually doing the same thing too, Domino Data Lab. >> They don't like talking about it too much, but they're doing it. >> Audience Member: But all the big players in the data science platform game are doing this at a different scale. >> Exactly. >> Audience Member: IBM is doing it on a much bigger scale than anyone else. >> And that will have an impact on how the market ultimately gets structured and who the winners end up being. >> Audience Member: To add to that, you mentioned the Red Hat of big data; a lot of people thought Cloudera was going to be the Red Hat of big data, and look at what's happened to their business. (background noise drowns out other sounds) They're getting surrounded by the cloud. We look at, how can we get closer to companies like AWS? That was a wild card that wasn't expected. >> Yeah, but look, at the end of the day Red Hat isn't even the Red Hat of open source. So the bottom line, the thing to focus on, is how this knowledge is going to diffuse. That's the thing to focus on. And there are a lot of different ways; some of it is going to diffuse through tools. If it diffuses through tools, it increases the likelihood that we'll have more people capable of doing this, and IBM and others can hire more. Citibank can hire more. That's an important participant, that's an important play. So that speaks to your point, but it also says we're going to see more of the packaged applications emerge, because that facilitates the diffusion.
This is not... we haven't figured out, nobody knows exactly the shape it's going to take. But that's the centerpiece of our big data research: how is that diffusion process going to happen and accelerate, what's the resulting structure going to look like, and ultimately how are enterprises going to create value with whatever results? Yes, sir. (audience member asks question faintly) So to recap the question: you see more people coming in and promising the moon but being incapable of delivering, partly because the technology is uncertain and for other reasons. So here's our approach, or here's our observation. We actually did a fair amount of research on this. When you take what we call an approach to doing big data that's optimized for the cost of procurement, i.e., let's get the simplest combination of infrastructure, the simplest combination of open-source software, the simplest contracting, you can stand things up very quickly if you have enough expertise and create that proof of concept, but the process of turning that into an actual production system extends dramatically. And that's one of the reasons why the Clouderas did not take over the universe. There are other reasons. As George Gilbert's research has pointed out, Cloudera is spending 53 to 55% of its money right now just integrating all the stuff that it bought into the distribution five years ago, which is a real great recipe for creating customer value. The bottom line, though, is that if we focus on the time to value in production, we end up taking a different path. We don't focus as much on whether the hardware is going to work, and the network is going to work, and whether the storage can be integrated, and how it's going to impact the database, and what that's going to mean to our Oracle license pool, and all the other things that people tend to think about if they're focused on the technology. And so, as a consequence, you get better time to value if you focus on bringing the domain expertise, working with the right partner, working with the appropriate approach: what's the value proposition, what actions are associated with that value proposition, what data is needed to perform those actions, how can I take transaction costs out of performing those actions, where does the data need to be, what infrastructure do I require? So we have to focus on the time to value, not the time to procure. And that's not what a lot of professional IT-oriented people are doing, because many of them, I hate to say it, but many of them still acquire new technology with the promise of helping the business but with a stronger focus on what it's going to mean to their careers. All right, I want to be really respectful of everybody's time. The keynotes start in about five minutes, which means you've just got time. If you want to stay, feel free to stay. We'll be here, we'll be happy to talk, but I think that's pretty much going to close our presentation broadcast. Thank you very much for being an attentive audience, and I hope you found this useful. (upbeat music)