Rahul Pathak & Shawn Bice, AWS | AWS re:Invent 2018


 

(futuristic electronic music)

>> Live from Las Vegas, it's theCUBE covering AWS re:Invent 2018. Brought to you by Amazon Web Services, Intel, and their ecosystem partners.

>> Hey, welcome back everyone. Live here in Las Vegas with AWS, Amazon Web Services, re:Invent 2018's CUBE coverage. Two sets, wall-to-wall coverage here on the ground floor. I'm here with Dave Vellante. Dave, six years we've been coming to re:Invent. Every year except for the first year. What a progression. We got great news. Always raising the bar, as they say at Amazon. This year, big announcements. One of them is blockchain, really kind of laying out the early formation of how they're going to roll out their thinking about blockchain. We're here to talk about that with Rahul Pathak, who's the GM of analytics, data lakes, and blockchain, and Shawn Bice, who's the vice president of non-relational databases. Guys, welcome to theCUBE.

>> Thank you.

>> Thank you, it's great to be here.

>> I wish my voice was a little bit stronger. I love this segment. You know, we've been doing blockchain; we've been following it, one of the big events in the industry. If you separate out the whole token ICO scam situation, token economics is actually a great business model opportunity. Blockchain is an infrastructure, a decentralized infrastructure, and that's great. But it's early. Day one, really, for you guys in a literal sense. How are you guys doing blockchain? Take a minute to explain the announcement, because there are use cases, low-hanging use cases, that look a lot like IoT and supply chain that people are interested in. So take a minute to explain the announcements and what they mean.

>> Absolutely. So when we began looking at blockchain and blockchain use cases, we realized there are really two things that customers are trying to do. One case is keeping an immutable record of transactions in a scenario where centralized trust is okay. And for that we have Amazon QLDB, which is an immutable, cryptographically verifiable ledger. And then in scenarios where customers really want decentralized trust and smart contracts, that's where blockchain frameworks like Hyperledger Fabric and Ethereum play a role. But they're just super complicated to use, and that's why we built Managed Blockchain: to make it easy to stand up, scale, and monitor these networks, so customers can focus on building applications. And in terms of use cases on the decentralized side, it's really quite diverse. I mean, we've got a customer, Guardian Life Insurance; they're looking at Managed Blockchain 'cause they have this distributed network of partners, providers, patients, and customers, and they want to provide decentralized, verifiable records of what's taking place. It's just a broad set of use cases.

>> And then we saw in the video this morning, I think it was Indonesian farmers, right? Wasn't that before the keynote? Did you see that? It was good.

>> I missed that one.

>> Yeah, so they don't have bank accounts.

>> Oh, got it.

>> And they've got a reward system, so they're using the blockchain to reward farmers to participate.

>> So a lot of people ask the question, why do I need blockchain? Why don't you just put it in a database? Which is true, by the way, 'cause latency's an issue. (chuckles) Certainly, you might want to avoid blockchain in the short term, until that gets fixed. Assume that all of that will get fixed over time, but what are some of the use cases where blockchain actually is relevant? Can you be specific? Because that's really what people are starting to base their selection criteria on. Look, I'll still use a database. I'm going to have all kinds of tokens and models around, but in a database. Where is blockchain specifically resonating right now?

>> I'll take a shot at this, or we can do it together. When you think of QLDB, it's not that customers are asking us for a ledger database. What they were really saying is, hey, we'd like to have this complete, immutable, cryptographically verifiable trail of data. And it wasn't necessarily a blockchain conversation, wasn't necessarily a database conversation; it was, I really would like to have this complete, cryptographically verifiable trail of data. And it turns out, as you look at the use cases, in particular the centralized-trust scenario, QLDB does exactly that. It's not about decentralized trust. It's really about simply being able to have a database where, when you write a transaction to that database, you can't change it. With a typical database, people are like, well, hey, wait a second, what does immutable really mean? And once you get people to understand that once a transaction is written to the journal it cannot be changed at all, then all of a sudden there's that breakthrough moment of it being immutable and having that cryptographic trail.

>> And the advantage relative to a distributed blockchain is performance, scale, and all the challenges that people always cite.

>> Yeah, exactly. With QLDB, you'll find it's going to be two to three times faster, 'cause you're not doing that distributed consensus.
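What makes a journal like that verifiable is worth pinning down. The sketch below is not QLDB's implementation, just a minimal illustration of the underlying idea in Python: entries are hash-chained, so the digest of the latest entry commits to the entire history, and any retroactive edit is detectable. All class and field names here are hypothetical.

```python
import hashlib
import json


class LedgerJournal:
    """Toy append-only journal: each entry carries the hash of the previous
    entry, so rewriting history invalidates every digest that follows."""

    def __init__(self):
        self.entries = []

    def append(self, transaction: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(transaction, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self.entries.append({"tx": transaction, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every later hash."""
        prev_hash = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["tx"], sort_keys=True)
            recomputed = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if e["prev"] != prev_hash or recomputed != e["hash"]:
                return False
            prev_hash = e["hash"]
        return True


journal = LedgerJournal()
journal.append({"policy": "GL-1042", "event": "issued"})
journal.append({"policy": "GL-1042", "event": "claim-filed"})
assert journal.verify()

journal.entries[0]["tx"]["event"] = "voided"  # tamper with history
assert not journal.verify()                   # the chain detects it
```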
>> How about data lakes? Let's talk about data lakes. What problem were you guys trying to solve with data lakes? There's a lot of them, but. (chuckles)

>> That's a great question. Essentially, it's been hard for customers to set up data lakes, 'cause you have to figure out where to get data from, you have to land it in S3, you've got to secure it, you've then got to secure every analytic service that you've got, and you might have to clean your data. So with Lake Formation, what we're trying to do is make it super easy to set up data lakes. We have blueprints for common databases and data sources. We bring that data into an S3 data lake, and we've created a central catalog for that data where customers can define granular access policies at the table, column, and row level. We've also got ML-based data cleansing and data deduplication. So now customers can just use Lake Formation to set up data lakes, curate their data, and protect it in a single place, and have those policies enforced across all of the analytic services that they might use.

>> So does it help solve the data swamp problem, get more value out of the data lake? And if so, how?

>> Absolutely. The way it does that is by automatically cataloging all data as it comes in. So we can recognize what the data is, and then we allow customers to add business metadata to it, so they can tag this as customer data, or PII data, or this is my table of sales history. And that then becomes searchable. So we automatically generate a catalog as data comes in, and that addresses the "what do I have in my data lake?" problem.
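For a concrete flavor of the granular access policies Rahul describes, here is a hedged sketch using the Lake Formation permissions API as it later appeared in boto3. The grant below gives a hypothetical analyst role column-level SELECT on a cataloged table; the account ID, role, database, and table names are all placeholders.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Grant an analyst role SELECT on two columns of a cataloged table.
# The role ARN, database name, and table name below are hypothetical.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales",
            "Name": "orders",
            "ColumnNames": ["order_id", "order_total"],
        }
    },
    Permissions=["SELECT"],
)
```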
>> So, Rahul, you're the general manager. Shawn, what's your job, what do you do?

>> Our team builds all the non-relational databases at Amazon. So DynamoDB, Neptune, ElastiCache, Timestream, which you'll hear about today, QLDB, et cetera. So all those things--

>> Beanstalk too, Elastic Beanstalk?

>> No, we do not build Beanstalk.

>> Okay, we're a customer of DynamoDB, by the way.

>> Great!

>> We're happy customers.

>> That's great!

>> And we use ElastiCache, right?

>> Yup, the elastic

>> There you go!

>> surge still has it.

>> So--

>> Haven't used Neptune yet.

>> What are the biggest problem statements you guys are trying to raise the bar on? What's the key focus as you get these new worlds and use cases coming together? These are new use cases. How are you guys evaluating it? How are you guys raising the bar?

>> You know, that's a really good question. What I've found in my experience is that, for developers who have been building apps for a long time, most people are familiar with relational databases. For years we've been building apps in that context, but when you look at how people build apps today, it's very different from how they did in the past. Today developers do what they do best. They take a big application, break it down into smaller parts, and they pick the right tool for the right job.

>> I think the game developer market is going to be a canary in the coal mine for developers, and it's a good spot for Lake Formation in these kinds of unstructured, non-relational scenarios. Okay, now all this engagement data, could be a first-person shooter, whatever it is: just throw it, I need to throw it somewhere, and I'll get to it and let it be ready to be worked on by analytics.

>> Well, yeah, if you think about that gamer scenario, think about if you and I are building a game: who knows if there's going to be one user, ten players, 10 million, or 100 million. And if we had 100 million, it's all about the performance being steady. At 100 million or at ten.

>> You need a fleet of servers. (John laughing)

>> And a fleet of servers!

>> Have you guys played Fortnite? Or do you have kids that play?

>> I look over my kid's shoulder. I might play it.

>> I've played, but--

>> They run all their analytics on us. They've got about 14 petabytes in S3, using S3 as their data lake, with EMR and Athena for analytics.

>> We got a season--

>> I mean, think about that F1 example in the keynote today. Great example of insights. Apply that kind of concept to Fortnite; by the way, Fortnite has theCUBE in there. It's always a popular term. We noticed the hashtag, #wherestheCUBEtoday. (Rahul chuckling) I couldn't resist. But think of the analytics you could get out of all that data, every interaction, all that gesture data. I mean, what are some of the things they're doing? Can you share how they're using the new tech to scale up and get these insights?

>> Yeah, absolutely. So they're doing a bunch of things. One is just the health of the systems: when you've got hundreds of millions of players, you need to know if you're up and it's working. The second is around engagement: what games, what collections of people work well together. And then it's what incentives they create in the game, what power-ups people buy that lead to continued engagement, 'cause that defines success over the long term. What gets people coming back? And then they have an offline analytics process where they're looking at reporting, and history, and telemetry, so it's very comprehensive. So you're exactly right about gaming and analytics being a huge consumer of databases.

>> Now, Shawn, didn't you guys have hard news today on DynamoDB, or?

>> Yeah, today we announced DynamoDB On-Demand. So for customers that have workloads that could spike up and then all of a sudden drop off, a lot of these customers don't even want to think about capacity planning. They don't want to guess. They just want to pay only for what they're using. So we announced DynamoDB On-Demand. The developer experience is simple: you create a table with its read/write capacity in the on-demand mode, and you literally only pay for the requests that your workload puts through the system.
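For reference, the developer experience Shawn describes surfaces in the DynamoDB API as a billing mode on the table. A minimal boto3 sketch, assuming configured AWS credentials; the table and attribute names are hypothetical.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# PAY_PER_REQUEST is the on-demand mode: no provisioned read/write
# capacity to plan; you are billed per request the workload makes.
dynamodb.create_table(
    TableName="game-sessions",  # hypothetical table
    AttributeDefinitions=[{"AttributeName": "player_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "player_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```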
>> It's a great service, actually. Again, making life easier for customers. Lower the bill, manage capacity, make things go better, faster; it enables value.

>> It's all about improving the customer experience.

>> Alright, guys, I really appreciate you coming on. I'm really interested in following what you guys do in the future, and I'm sure a lot of people watching will be as well, as analytics and AI become a real part of it, as you guys move up the stack and create that API model for apps, the way you did for infrastructure. A total game changer, we believe. We're interested in following you guys, and I'm sure others are. Where are you going to be this year? What's your focus? Where can people find out more, besides going to the Amazon site? Are there certain events you're going to be at? How do people get more information, and what are the plans?

>> There are actually some sessions on Lake Formation and blockchain that we're doing here. We'll have a continuous stream of summits, so as the AWS Summit calendar for 2019 gets published, that's a great place to go for more information. And then just engage with us, either on social media or through the web, and we'll be happy to follow up.

>> Alright, well, we'll do a good job of amplifying. A lot of people are interested; certainly blockchain is super hot. But people want it better, stronger, more stable; they want the decentralized, immutable database model.

>> Cryptographically verifiable!

>> And see, as everyone knows.

>> Scalable!

>> They talk about CUBE coins, but I haven't said CUBE coin once on this episode. Wait for those tokens to be released soon. More coverage after this short break, stay with us. I'm John Furrier, with Dave Vellante; we'll be right back. (futuristic buzzing) (futuristic electronic music)

Published Date : Nov 29 2018


Itamar Ankorion, Attunity & Arvind Rajagopalan, Verizon - #DataWorks - #theCUBE


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017, brought to you by Hortonworks.

>> Hey, welcome back to theCUBE, live from the DataWorks Summit, day 2. We've been here for a day and a half talking with fantastic leaders and innovators, learning a lot about what's happening in the world of big data, the convergence with Internet of Things, machine learning, artificial intelligence; I could go on and on. I'm Lisa Martin, my co-host is George Gilbert, and we are joined by a couple of guys. One is a CUBE alumni: Itamar Ankorion, CMO of Attunity. Welcome back to theCUBE.

>> Thank you very much, good to be here. Thank you, Lisa and George.

>> Lisa: Great to have you.

>> And Arvind Rajagopalan, the Director of Technology Services for Verizon. Welcome to theCUBE.

>> Thank you.

>> So we were chatting before we went on, and Verizon, you're actually going to be presenting tomorrow at the DataWorks Summit. Tell us about building... the journey that Verizon has been on building a data lake.

>> Verizon, over the last 20 years, has been a large corporation made up of a lot of different acquisitions and mergers; that's how it was formed 20 years back. And as we've gone through the journey of the mergers and the acquisitions over the years, we had data from different companies come together and form a lot of different data silos. So the reason we started looking at this is that our CFO started asking questions around being able to answer One Verizon questions. It's as simple as having Days Payable, or Working Capital Analysis, across all the lines of business. And since we have a three-major-ERP footprint, it is extremely hard to get that data out, and there were a lot of manual data prep activities going into bringing together those One Verizon views. So that's really what was the catalyst to get the journey started for us.

>> And it was driven by your CFO, you said?

>> Arvind: That's right.

>> Ah, very interesting, okay. So what are some of the things that people are going to hear tomorrow from your breakout session?

>> Arvind: I'm sorry, say that again?

>> Sorry, what are some of the things that the attendees of your breakout session are going to learn about the steps and the journey?

>> I'm going to primarily be talking about the challenges that we ran into, and share some thoughts around that, and also talk about some of the factors, such as the catalysts and what drew us to move in that direction, as well as getting into some architectural components from a high-level standpoint: certain partners that we work with, the choices we made from an architecture perspective and the tools, as well as closing the loop on user adoption and what users are seeing in terms of business value as we start centralizing all of the data at Verizon, from a back-office Finance and Supply Chain standpoint. So that's what I'm looking at talking about tomorrow.

>> Arvind, it's interesting to hear you talk about collecting data from, essentially, back-office operational systems in a data lake. I assume the data is more refined and easily structured than in the typical stories we hear about data lakes. Were there challenges in making it available for exploration and visualization, or were all the early use cases really just production reporting?
>> So standard reporting within each ERP system is very mature, and those capabilities are there, but then you look across ERP systems, and we have three major ERP systems for the lines of business. When you want to combine all of the data, it's very hard. And to add to that, you pointed at self-service discovery and visualization across all three data sets; that's even more challenging, because it takes a lot of heavy lifting to normalize all of the data and bring it into one centralized platform. We started off the journey with Oracle, and then we had SAP HANA; we were trying to bring all the data together, but then we were looking at our non-SAP ERP systems and bringing that data into a SAP kind of footprint. One, the cost was tremendously high; also there was a lot of heavy lifting and challenge in manually having to normalize the data and bring it into the same kind of data models. And even after all of that was done, it was not very self-service oriented for our users in Finance and Supply Chain.

>> Let me drill into two of those things. So it sounds like the ETL process of converting the data into a consumable format was very complex, and then it sounds like also the discoverability, where a tool perhaps like Alation might help, which is still young. Is that what was missing, or why was the ETL process so much more heavyweight than with a traditional data warehouse?

>> In the ETL processes there's a lot of heavy lifting involved, because of the proprietary data structures of the ERP systems. Especially SAP: the data structures, and how the data is used across clustered and pool tables, are very proprietary. And on top of that, you're bringing in the data formats and structures from a PeopleSoft ERP system, which supports different lines of business, so there's a lot of customization that's gone into place; there are specific things that we use in the ERPs, in terms of the modules and how the processes are modeled in each of the lines of business, that complicate things a lot. And then you try to bring these three different ERPs, and the nuances they have developed over the years, together, and it actually makes it very complex.

>> So tell us, then, help us understand how the data lake made that easier. Was it because you didn't have to do all the refinement before the data got there? And tell us how Attunity helped make that possible.

>> Oh, absolutely. I think that's one of the big reasons why we picked Hortonworks as one of our key partners in building out the data lake. It's just schema-on-read: you aren't necessarily worried about doing a whole lot of ETL before you bring the data in, and it also works with the tools and the technologies from a lot of other partners. There's a lot of maturity now, which provides better self-service discovery capabilities for ad hoc analysis and reporting. So this is helpful to the users, because now they don't have to wait for prolonged IT development cycles to model the data, do the ETL, and build reports for them to consume, which sometimes could take weeks and months.
Now, in a matter of days, they're able to see the data they're looking for and start the analysis, and once they start the analysis and the data is accessible, it's a matter of minutes and seconds: looking at the different tools, how they want to view it, how they want to model it. So it's actually been a huge value from the perspective of the users and what they're looking to do.

>> Speaking of value, one of the things that was kind of thematic yesterday: we see enterprises are now embracing big data, they're embracing Hadoop, it's got to coexist within our ecosystem, and it's got to interoperate. But just putting data in a data lake or Hadoop, that's not the value; the value is being able to analyze that data, in motion, at rest, structured, unstructured, and start being able to glean actionable insights. From your CFO's perspective, where are you now in answering some of the questions that he or she had, from an insights perspective, with the data lake that you have in place?

>> Yeah, before I address that, I wanted to quickly touch upon and wrap up George's question, if you don't mind, because one of the key challenges is where I do talk about how Attunity helped; I was just about to answer that before we moved on, so I just want to close the loop on it a little bit. In terms of bringing the data in, the data acquisition or ingestion is a key aspect of it, and again, dealing with the proprietary data structures from the ERP systems is very complex, and normally involves a multi-step process to bring the data into a staging environment, put it in the swamp, and bring it into the lake. What Attunity has been able to help us with is that it has the intelligence to look at and understand the proprietary data structures of the ERPs, and it is able to bring all the data from the ERP source systems directly into Hadoop, without any stops or staging databases along the way. So it's been a huge value from that standpoint; I'll get into more details around that. And to answer your question around how it's helping from a CFO standpoint, and the users in Finance: as I said, now all the data is available in one place, so it's very easy for them to consume the data and do ad hoc analysis. So if somebody's looking to, like I said earlier, calculate Days Payable, as an example, or they want to look at working capital, we are actually moving data using Attunity's CDC Replicate product; we're getting data in real time into the data lake. So now they're able to turn things around and do that kind of analysis in a matter of hours, versus overnight or a matter of days, which was the previous environment.

>> And that was kind of one of the themes this morning: it's really about speed, right? It's how fast can you move, and it sounds like, together with Attunity, Verizon is not only making things simpler, as you talked about, in this kind of model that you have with different ERP systems, but you're also really able to get information into the right hands much, much faster.
>> Absolutely, that's the beauty of the near-real-time CDC architecture: we're able to get data in very easily and quickly, and Attunity also provides a lot of visibility as the data is in flight. We're able to see what's happening in the source system, how many packets are flowing through, and, to a point, my developers are so excited to work with the product, because they don't have to worry about changes happening on the source-system side in terms of DDL; those changes are automatically understood by the product and pushed to the destination in Hadoop. So it's been a game changer, because we have not had any downtime. When things changed on the source-system side, historically we had to take downtime to change those configurations and scripts and publish them across environments, so that's been huge from that standpoint as well.

>> Absolutely.
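Arvind is describing the classic log-based change-data-capture pattern: rather than re-extracting whole tables, the replication tool reads source changes and streams each insert, update, or delete downstream in commit order. Attunity Replicate is a packaged product, not a library, so the following Python sketch is only a hedged illustration of the shape of such a change stream landing in a raw zone; the event format and paths are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical CDC events, in source-commit order, as a log-based
# replication tool might emit them: one record per row change.
events = [
    {"op": "insert", "table": "ap_invoices", "key": "inv-001",
     "after": {"amount": 1200, "status": "open"}},
    {"op": "update", "table": "ap_invoices", "key": "inv-001",
     "after": {"amount": 1200, "status": "paid"}},
    {"op": "delete", "table": "ap_invoices", "key": "inv-002", "after": None},
]

landing_zone = Path("landing/ap_invoices")
landing_zone.mkdir(parents=True, exist_ok=True)

# Append every change to a partition file; nothing is merged yet.
# This is the "swamp"/landing-zone stage described above: raw,
# ordered change history rather than a queryable current state.
with open(landing_zone / "changes-000001.jsonl", "a") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
```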
>> Itamar, maybe help us understand where Attunity fits. It sounds like there's greatly reduced latency in the pipeline between the operational systems and the analytic system, but it also sounds like you still need to essentially reformat the data so that it's consumable. So it sounds like there's an ETL pipeline that's just much, much faster, but at the same time, with Replicate, it sounds like that goes without transformations. So help us understand that nuance.

>> Yeah, that's a great question, George. Indeed, in the past few years, customers have been focused predominantly on getting the data to the lake. I actually think one of the changes in the theme we're hearing here at the show, and in the last few months, is: how do we move to start using the data, creating great applications on the data? So we're moving to the next step. In the last few years we focused a lot on innovating and creating the solutions that facilitate and accelerate the process of getting data to the lake, from a large scope of systems, including complex ones like SAP, and also on making that process easier, providing real-time data that can feed both streaming architectures as well as batch ones. So once we got that covered, to your question, what happens next? One of the things we found, and I think Verizon is also looking at it now, is that when you bring data in and you want to adopt a streaming, or continuous, incremental type of data ingestion process, you're inherently building an architecture that takes what was originally a database and, in a sense, breaks it apart into partitions as you load it over time. So when you land the data, and Arvind was referring to a swamp, or some customers refer to it as a landing zone, you bring the data into your lake environment, but at that first stage the data is not structured, to your point, George, in a manner that's easily consumable. Alright, so the next step is, how do we facilitate the next step of the process, which today is still very manual-driven, has custom development, and deals with complex structures?

So we're actually very excited: we've introduced here at the show a new product by Attunity, Compose for Hive, which extends our data lake solutions. What Compose for Hive is designed to do is address part of the problem you just described: when the data comes in and is partitioned, Compose for Hive reassembles those partitions and then creates analytic-ready data sets back in Hive. It can create operational data stores, it can create historical data stores, so the data becomes formatted in a manner that's more easily accessible for users who want to use analytic tools, BI tools, Tableau, Qlik, any type of tool that can easily access a database.
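Conceptually, the reassembly step Itamar describes is a replay-and-compact operation: fold the ordered change partitions into a current-state, analytic-ready view, while the raw history remains available as the historical store. A hedged Python sketch of that logic, continuing the hypothetical event format above (not Compose for Hive's actual mechanics):

```python
import json
from pathlib import Path


def materialize_current_state(landing_dir: str) -> dict:
    """Replay ordered CDC change files into a current-state view keyed
    by primary key -- roughly what an ODS materialization does."""
    state = {}
    for change_file in sorted(Path(landing_dir).glob("changes-*.jsonl")):
        for line in change_file.read_text().splitlines():
            event = json.loads(line)
            if event["op"] in ("insert", "update"):
                state[event["key"]] = event["after"]  # last write wins
            elif event["op"] == "delete":
                state.pop(event["key"], None)
    return state


# The raw change log stays behind as the historical store;
# `state` is the analytic-ready table a BI tool would query.
print(materialize_current_state("landing/ap_invoices"))
```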
And we're doing it both with Compose for Hive, but also for customers using streaming architectures like Kafka, we provide the mechanisms, from supporting or facilitating things like schema unpollution, and schema decoding, to be able to facilitate the downstream process of processing those partitions of data, so we can make the data available, that works both for analytics and streaming analytics, as well as for scenarios like microservices, where the way in which you partition the data or deliver the data, allows each microservice to pick up on the data it needs, from the relevant partition. >> Well guys, this has been a really informative conversation. Congratulations, Itamar, on the new announcement that you guys made today. >> Thank you very much. >> Lisa: Arvin, great to hear the use case and how Verizon really sounds quite pioneering in what you're doing, wish you continued success there, we look forward to hearing what's next for Verizon, we want to thank you for watching the CUBE, we are again live, day two, of the DataWorks summit, #DWS17, before me my co-host George Gilbert, I am Lisa Martin, stick around, we'll be right back. (relaxed techno music)

Published Date : Jun 14 2017
