Image Title

Search Results for Vertica 10:

Keynote Analysis | Virtual Vertica BDC 2020


 

(upbeat music) >> Narrator: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020. Brought to you by Vertica. >> Dave Vellante: Hello everyone, and welcome to theCUBE's exclusive coverage of the Vertica Virtual Big Data Conference. You're watching theCUBE, the leader in digital event tech coverage. And we're broadcasting remotely from our studios in Palo Alto and Boston. And, we're pleased to be covering wall-to-wall this digital event. Now, as you know, originally BDC was scheduled this week at the new Encore Hotel and Casino in Boston. Their theme was "Win big with big data". Oh sorry, "Win big with data". That's right, got it. And, I know the community was really looking forward to that, you know, meet up. But look, we're making the best of it, given these uncertain times. We wish you and your families good health and safety. And this is the way that we're going to broadcast for the next several months. Now, we want to unpack Colin Mahony's keynote, but, before we do that, I want to give a little context on the market. First, theCUBE has covered every BDC since its inception, since the BDC's inception that is. It's a very intimate event, with a heavy emphasis on user content. Now, historically, the data engineers and DBAs in the Vertica community, they comprised the majority of the content at this event. And, that's going to be the same for this virtual, or digital, production. Now, theCUBE is going to be broadcasting for two days. What we're doing, is we're going to be concurrent with the Virtual BDC. We got practitioners that are coming on the show, DBAs, data engineers, database gurus, we got a security experts coming on, and really a great line up. And, of course, we'll also be hearing from Vertica Execs, Colin Mahony himself right of the keynote, folks from product marketing, partners, and a number of experts, including some from Micro Focus, which is the, of course, owner of Vertica. But I want to take a moment to share a little bit about the history of Vertica. The company, as you know, was founded by Michael Stonebraker. And, Verica started, really they started out as a SQL platform for analytics. It was the first, or at least one of the first, to really nail the MPP column store trend. Not only did Vertica have an early mover advantage in MPP, but the efficiency and scale of its software, relative to traditional DBMS, and also other MPP players, is underscored by the fact that Vertica, and the Vertica brand, really thrives to this day. But, I have to tell you, it wasn't without some pain. And, I'll talk a little bit about that, and really talk about how we got here today. So first, you know, you think about traditional transaction databases, like Oracle or IMBDB tour, or even enterprise data warehouse platforms like Teradata. They were simply not purpose-built for big data. Vertica was. Along with a whole bunch of other players, like Netezza, which was bought by IBM, Aster Data, which is now Teradata, Actian, ParAccel, which was the basis for Redshift, Amazon's Redshift, Greenplum was bought, in the early days, by EMC. And, these companies were really designed to run as massively parallel systems that smoked traditional RDBMS and EDW for particular analytic applications. You know, back in the big data days, I often joked that, like an NFL draft, there was run on MPP players, like when you see a run on polling guards. You know, once one goes, they all start to fall. And that's what you saw with the MPP columnar stores, IBM, EMC, and then HP getting into the game. So, it was like 2011, and Leo Apotheker, he was the new CEO of HP. Frankly, he has no clue, in my opinion, with what to do with Vertica, and totally missed one the biggest trends of the last decade, the data trend, the big data trend. HP picked up Vertica for a song, it wasn't disclosed, but my guess is that it was around 200 million. So, rather than build a bunch of smart tokens around Vertica, which I always call the diamond in the rough, Apotheker basically permanently altered HP for years. He kind of ruined HP, in my view, with a 12 billion dollar purchase of Autonomy, which turned out to be one of the biggest disasters in recent M&A history. HP was forced to spin merge, and ended up selling most of its software to Microsoft, Micro Focus. (laughs) Luckily, during its time at HP, CEO Meg Whitman, largely was distracted with what to do with the mess that she inherited form Apotheker. So, Vertica was left alone. Now, the upshot is Colin Mahony, who was then the GM of Vertica, and still is. By the way, he's really the CEO, and he just doesn't have the title, I actually think they should give that to him. But anyway, he's been at the helm the whole time. And Colin, as you'll see in our interview, is a rockstar, he's got technical and business jobs, people love him in the community. Vertica's culture is really engineering driven and they're all about data. Despite the fact that Vertica is a 15-year-old company, they've really kept pace, and not been polluted by legacy baggage. Vertica, early on, embraced Hadoop and the whole open-source movement. And that helped give it tailwinds. It leaned heavily into cloud, as we're going to talk about further this week. And they got a good story around machine intelligence and AI. So, whereas many traditional database players are really getting hurt, and some are getting killed, by cloud database providers, Vertica's actually doing a pretty good job of servicing its install base, and is in a reasonable position to compete for new workloads. On its last earnings call, the Micro Focus CFO, Stephen Murdoch, he said they're investing 70 to 80 million dollars in two key growth areas, security and Vertica. Now, Micro Focus is running its Suse play on these two parts of its business. What I mean by that, is they're investing and allowing them to be semi-autonomous, spending on R&D and go to market. And, they have no hardware agenda, unlike when Vertica was part of HP, or HPE, I guess HP, before the spin out. Now, let me come back to the big trend in the market today. And there's something going on around analytic databases in the cloud. You've got companies like Snowflake and AWS with Redshift, as we've reported numerous times, and they're doing quite well, they're gaining share, especially of new workloads that are merging, particularly in the cloud native space. They combine scalable compute, storage, and machine learning, and, importantly, they're allowing customers to scale, compute, and storage independent of each other. Why is that important? Because you don't have to buy storage every time you buy compute, or vice versa, in chunks. So, if you can scale them independently, you've got granularity. Vertica is keeping pace. In talking to customers, Vertica is leaning heavily into the cloud, supporting all the major cloud platforms, as we heard from Colin earlier today, adding Google. And, why my research shows that Vertica has some work to do in cloud and cloud native, to simplify the experience, it's more robust in motor stack, which supports many different environments, you know deep SQL, acid properties, and DNA that allows Vertica to compete with these cloud-native database suppliers. Now, Vertica might lose out in some of those native workloads. But, I have to say, my experience in talking with customers, if you're looking for a great MMP column store that scales and runs in the cloud, or on-prem, Vertica is in a very strong position. Vertica claims to be the only MPP columnar store to allow customers to scale, compute, and storage independently, both in the cloud and in hybrid environments on-prem, et cetera, cross clouds, as well. So, while Vertica may be at a disadvantage in a pure cloud native bake-off, it's more robust in motor stack, combined with its multi-cloud strategy, gives Vertica a compelling set of advantages. So, we heard a lot of this from Colin Mahony, who announced Vertica 10.0 in his keynote. He really emphasized Vertica's multi-cloud affinity, it's Eon Mode, which really allows that separation, or scaling of compute, independent of storage, both in the cloud and on-prem. Vertica 10, according to Mahony, is making big bets on in-database machine learning, he talked about that, AI, and along with some advanced regression techniques. He talked about PMML models, Python integration, which was actually something that they talked about doing with Uber and some other customers. Now, Mahony also stressed the trend toward object stores. And, Vertica now supports, let's see S3, with Eon, S3 Eon in Google Cloud, in addition to AWS, and then Pure and HDFS, as well, they all support Eon Mode. Mahony also stressed, as I mentioned earlier, a big commitment to on-prem and the whole cloud optionality thing. So 10.0, according to Colin Mahony, is all about really doubling down on these industry waves. As they say, enabling native PMML models, running them in Vertica, and really doing all the work that's required around ML and AI, they also announced support for TensorFlow. So, object store optionality is important, is what he talked about in Eon Mode, with the news of support for Google Cloud and, as well as HTFS. And finally, a big focus on deployment flexibility. Migration tools, which are a critical focus really on improving ease of use, and you hear this from a lot of customers. So, these are the critical aspects of Vertica 10.0, and an announcement that we're going to be unpacking all week, with some of the experts that I talked about. So, I'm going to close with this. My long-time co-host, John Furrier, and I have talked some time about this new cocktail of innovation. No longer is Moore's law the, really, mainspring of innovation. It's now about taking all these data troves, bringing machine learning and AI into that data to extract insights, and then operationalizing those insights at scale, leveraging cloud. And, one of the things I always look for from cloud is, if you've got a cloud play, you can attract innovation in the form of startups. It's part of the success equation, certainly for AWS, and I think it's one of the challenges for a lot of the legacy on-prem players. Vertica, I think, has done a pretty good job in this regard. And, you know, we're going to look this week for evidence of that innovation. One of the interviews that I'm personally excited about this week, is a new-ish company, I would consider them a startup, called Zebrium. What they're doing, is they're applying AI to do autonomous log monitoring for IT ops. And, I'm interviewing Larry Lancaster, who's their CEO, this week, and I'm going to press him on why he chose to run on Vertica and not a cloud database. This guy is a hardcore tech guru and I want to hear his opinion. Okay, so keep it right there, stay with us. We're all over the Vertica Virtual Big Data Conference, covering in-depth interviews and following all the news. So, theCUBE is going to be interviewing these folks, two days, wall-to-wall coverage, so keep it right there. We're going to be right back with our next guest, right after this short break. This is Dave Vellante and you're watching theCUBE. (upbeat music)

Published Date : Mar 31 2020

SUMMARY :

Brought to you by Vertica. and the Vertica brand, really thrives to this day.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Larry LancasterPERSON

0.99+

ColinPERSON

0.99+

IBMORGANIZATION

0.99+

HPORGANIZATION

0.99+

70QUANTITY

0.99+

MicrosoftORGANIZATION

0.99+

Michael StonebrakerPERSON

0.99+

Colin MahonyPERSON

0.99+

Stephen MurdochPERSON

0.99+

VerticaORGANIZATION

0.99+

EMCORGANIZATION

0.99+

Palo AltoLOCATION

0.99+

ZebriumORGANIZATION

0.99+

two daysQUANTITY

0.99+

AWSORGANIZATION

0.99+

BostonLOCATION

0.99+

VericaORGANIZATION

0.99+

Micro FocusORGANIZATION

0.99+

2011DATE

0.99+

HPEORGANIZATION

0.99+

UberORGANIZATION

0.99+

firstQUANTITY

0.99+

MahonyPERSON

0.99+

Meg WhitmanPERSON

0.99+

AmazonORGANIZATION

0.99+

Aster DataORGANIZATION

0.99+

SnowflakeORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

FirstQUANTITY

0.99+

12 billion dollarQUANTITY

0.99+

OneQUANTITY

0.99+

this weekDATE

0.99+

John FurrierPERSON

0.99+

15-year-oldQUANTITY

0.98+

PythonTITLE

0.98+

OracleORGANIZATION

0.98+

olin MahonyPERSON

0.98+

around 200 millionQUANTITY

0.98+

Virtual Vertica Big Data Conference 2020EVENT

0.98+

theCUBEORGANIZATION

0.98+

80 million dollarsQUANTITY

0.97+

todayDATE

0.97+

two partsQUANTITY

0.97+

Vertica Virtual Big Data ConferenceEVENT

0.97+

TeradataORGANIZATION

0.97+

oneQUANTITY

0.97+

ActianORGANIZATION

0.97+

UNLIST TILL 4/2 - Extending Vertica with the Latest Vertica Ecosystem and Open Source Initiatives


 

>> Sue: Hello everybody. Thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session in entitled Extending Vertica with the Latest Vertica Ecosystem and Open Source Initiatives. My name is Sue LeClaire, Director of Marketing at Vertica and I'll be your host for this webinar. Joining me is Tom Wall, a member of the Vertica engineering team. But before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. There will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions that we don't get to, we'll do our best to answer them offline. Alternatively, you can visit the Vertica forums to post you questions after the session. Our engineering team is planning to join the forums to keep the conversation going. Also a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand later this week. We'll send you a notification as soon as it's ready. So let's get started. Tom, over to you. >> Tom: Hello everyone and thanks for joining us today for this talk. My name is Tom Wall and I am the leader of Vertica's ecosystem engineering team. We are the team that focuses on building out all the developer tools, third party integrations that enables the SoftMaker system that surrounds Vertica to thrive. So today, we'll be talking about some of our new open source initatives and how those can be really effective for you and make things easier for you to build and integrate Vertica with the rest of your technology stack. We've got several new libraries, integration projects and examples, all open source, to share, all being built out in the open on our GitHub page. Whether you use these open source projects or not, this is a very exciting new effort that will really help to grow the developer community and enable lots of exciting new use cases. So, every developer out there has probably had to deal with the problem like this. You have some business requirements, to maybe build some new Vertica-powered application. Maybe you have to build some new system to visualize some data that's that's managed by Vertica. The various circumstances, lots of choices will might be made for you that constrain your approach to solving a particular problem. These requirements can come from all different places. Maybe your solution has to work with a specific visualization tool, or web framework, because the business has already invested in the licensing and the tooling to use it. Maybe it has to be implemented in a specific programming language, since that's what all the developers on the team know how to write code with. While Vertica has many different integrations with lots of different programming language and systems, there's a lot of them out there, and we don't have integrations for all of them. So how do you make ends meet when you don't have all the tools you need? All you have to get creative, using tools like PyODBC, for example, to bridge between programming languages and frameworks to solve the problems you need to solve. Most languages do have an ODBC-based database interface. ODBC is our C-Library and most programming languages know how to call C code, somehow. So that's doable, but it often requires lots of configuration and troubleshooting to make all those moving parts work well together. So that's enough to get the job done but native integrations are usually a lot smoother and easier. So rather than, for example, in Python trying to fight with PyODBC, to configure things and get Unicode working, and to compile all the different pieces, the right way is to make it all work smoothly. It would be much better if you could just PIP install library and get to work. And with Vertica-Python, a new Python client library, you can actually do that. So that story, I assume, probably sounds pretty familiar to you. Sounds probably familiar to a lot of the audience here because we're all using Vertica. And our challenge, as Big Data practitioners is to make sense of all this stuff, despite those technical and non-technical hurdles. Vertica powers lots of different businesses and use cases across all kinds of different industries and verticals. While there's a lot different about us, we're all here together right now for this talk because we do have some things in common. We're all using Vertica, and we're probably also using Vertica with other systems and tools too, because it's important to use the right tool for the right job. That's a founding principle of Vertica and it's true today too. In this constantly changing technology landscape, we need lots of good tools and well established patterns, approaches, and advice on how to combine them so that we can be successful doing our jobs. Luckily for us, Vertica has been designed to be easy to build with and extended in this fashion. Databases as a whole had had this goal from the very beginning. They solve the hard problems of managing data so that you don't have to worry about it. Instead of worrying about those hard problems, you can focus on what matters most to you and your domain. So implementing that business logic, solving that problem, without having to worry about all of these intense, sometimes details about what it takes to manage a database at scale. With the declarative syntax of SQL, you tell Vertica what the answer is that you want. You don't tell Vertica how to get it. Vertica will figure out the right way to do it for you so that you don't have to worry about it. So this SQL abstraction is very nice because it's a well defined boundary where lots of developers know SQL, and it allows you to express what you need without having to worry about those details. So we can be the experts in data management while you worry about your problems. This goes beyond though, what's accessible through SQL to Vertica. We've got well defined extension and integration points across the product that allow you to customize this experience even further. So if you want to do things write your own SQL functions, or extend database softwares with UDXs, you can do so. If you have a custom data format that might be a proprietary format, or some source system that Vertica doesn't natively support, we have extension points that allow you to use those. To make it very easy to do passive, parallel, massive data movement, loading into Vertica but also to export Vertica to send data to other systems. And with these new features in time, we also could do the same kinds of things with Machine Learning models, importing and exporting to tools like TensorFlow. And it's these integration points that have enabled Vertica to build out this open architecture and a rich ecosystem of tools, both open source and closed source, of different varieties that solve all different problems that are common in this big data processing world. Whether it's open source, streaming systems like Kafka or Spark, or more traditional ETL tools on the loading side, but also, BI tools and visualizers and things like that to view and use the data that you keep in your database on the right side. And then of course, Vertica needs to be flexible enough to be able to run anywhere. So you can really take Vertica and use it the way you want it to solve the problems that you need to solve. So Vertica has always employed open standards, and integrated it with all kinds of different open source systems. What we're really excited to talk about now is that we are taking our new integration projects and making those open source too. In particular, we've got two new open source client libraries that allow you to build Vertica applications for Python and Go. These libraries act as a foundation for all kinds of interesting applications and tools. Upon those libraries, we've also built some integrations ourselves. And we're using these new libraries to power some new integrations with some third party products. Finally, we've got lots of new examples and reference implementations out on our GitHub page that can show you how to combine all these moving parts and exciting ways to solve new problems. And the code for all these things is available now on our GitHub page. And so you can use it however you like, and even help us make it better too. So the first such project that we have is called Vertica-Python. Vertica-Python began at our customer, Uber. And then in late 2018, we collaborated with them and we took it over and made Vertica-Python the first official open source client for Vertica You can use this to build your own Python applications, or you can use it via tools that were written in Python. Python has grown a lot in recent years and it's very common language to solve lots of different problems and use cases in the Big Data space from things like DevOps admission and Data Science or Machine Learning, or just homegrown applications. We use Python a lot internally for our own QA testing and automation needs. And with the Python 2 End Of Life, that happened at the end of 2019, it was important that we had a robust Python solution to help migrate our internal stuff off of Python 2. And also to provide a nice migration path for all of you our users that might be worried about the same problems with their own Python code. So Vertica-Python is used already for lots of different tools, including Vertica's admintools now starting with 9.3.1. It was also used by DataDog to build a Vertica-DataDog integration that allows you to monitor your Vertica infrastructure within DataDog. So here's a little example of how you might use the Python Client to do some some work. So here we open in connection, we run a query to find out what node we've connected to, and then we do a little DataLoad by running a COPY statement. And this is designed to have a familiar look and feel if you've ever used a Python Database Client before. So we implement the DB API 2.0 standard and it feels like a Python package. So that includes things like, it's part of the centralized package manager, so you can just PIP install this right now and go start using it. We also have our client for Go length. So this is called vertica-sql-go. And this is a very similar story, just in a different context or the different programming language. So vertica-sql-go, began as a collaboration with the Microsoft Focus SecOps Group who builds microfocus' security products some of which use vertica internally to provide some of those analytics. So you can use this to build your own apps in the Go programming language but you can also use it via tools that are written Go. So most notably, we have our Grafana integration, which we'll talk a little bit more about later, that leverages this new clients to provide Grafana visualizations for vertica data. And Go is another rising popularity programming language 'cause it offers an interesting balance of different programming design trade-offs. So it's got good performance, got a good current concurrency and memory safety. And we liked all those things and we're using it to power some internal monitoring stuff of our own. And here's an example of the code you can write with this client. So this is Go code that does a similar thing. It opens a connection, it runs a little test query, and then it iterates over those rows, processing them using Go data types. You get that native look and feel just like you do in Python, except this time in the Go language. And you can go get it the way you usually package things with Go by running that command there to acquire this package. And it's important to note here for the DC projects, we're really doing open source development. We're not just putting code out on our GitHub page. So if you go out there and look, you can see that you can ask questions, you can report bugs, you can submit poll requests yourselves and you can collaborate directly with our engineering team and the other vertica users out on our GitHub page. Because it's out on our GitHub page, it allows us to be a little bit faster with the way we ship and deliver functionality compared to the core vertica release cycle. So in 2019, for example, as we were building features to prepare for the Python 3 migration, we shipped 11 different releases with 40 customer reported issues, filed on GitHub. That was done over 78 different poll requests and with lots of community engagement as we do so. So lots of people are using this already, we see as our GitHub badge last showed with about 5000 downloads of this a day of people using it in their software. And again, we want to make this easy, not just to use but also to contribute and understand and collaborate with us. So all these projects are built using the Apache 2.0 license. The master branch is always available and stable with the latest creative functionality. And you can always build it and test it the way we do so that it's easy for you to understand how it works and to submit contributions or bug fixes or even features. It uses automated testing both for locally and with poll requests. And for vertica-python, it's fully automated with Travis CI. So we're really excited about doing this and we're really excited about where it can go in the future. 'Cause this offers some exciting opportunities for us to collaborate with you more directly than we have ever before. You can contribute improvements and help us guide the direction of these projects, but you can also work with each other to share knowledge and implementation details and various best practices. And so maybe you think, "Well, I don't use Python, "I don't use go so maybe it doesn't matter to me." But I would argue it really does matter. Because even if you don't use these tools and languages, there's lots of amazing vertica developers out there who do. And these clients do act as low level building blocks for all kinds of different interesting tools, both in these Python and Go worlds, but also well beyond that. Because these implementations and examples really generalize to lots of different use cases. And we're going to do a deeper dive now into some of these to understand exactly how that's the case and what you can do with these things. So let's take a deeper look at some of the details of what it takes to build one of these open source client libraries. So these database client interfaces, what are they exactly? Well, we all know SQL, but if you look at what SQL specifies, it really only talks about how to manipulate the data within the database. So once you're connected and in, you can run commands with SQL. But these database client interfaces address the rest of those needs. So what does the programmer need to do to actually process those SQL queries? So these interfaces are specific to a particular language or a technology stack. But the use cases and the architectures and design patterns are largely the same between different languages. They all have a need to do some networking and connect and authenticate and create a session. They all need to be able to run queries and load some data and deal with problems and errors. And then they also have a lot of metadata and Type Mapping because you want to use these clients the way you use those programming languages. Which might be different than the way that vertica's data types and vertica's semantics work. So some of this client interfaces are truly standards. And they are robust enough in terms of what they design and call for to support a truly pluggable driver model. Where you might write an application that codes directly against the standard interface, and you can then plug in a different database driver, like a JDBC driver, to have that application work with any database that has a JDBC driver. So most of these interfaces aren't as robust as a JDBC or ODBC but that's okay. 'Cause it's good as a standard is, every database is unique for a reason. And so you can't really expose all of those unique properties of a database through these standard interfaces. So vertica's unique in that it can scale to the petabytes and beyond. And you can run it anywhere in any environment, whether it's on-prem or on clouds. So surely there's something about vertica that's unique, and we want to be able to take advantage of that fact in our solutions. So even though these standards might not cover everything, there's often a need and common patterns that arise to solve these problems in similar ways. When there isn't enough of a standard to define those comments, semantics that different databases might have in common, what you often see is tools will invent plug in layers or glue code to compensate by defining application wide standard to cover some of these same semantics. Later on, we'll get into some of those details and show off what exactly that means. So if you connect to a vertica database, what's actually happening under the covers? You have an application, you have a need to run some queries, so what does that actually look like? Well, probably as you would imagine, your application is going to invoke some API calls and some client library or tool. This library takes those API calls and implements them, usually by issuing some networking protocol operations, communicating over the network to ask vertica to do the heavy lifting required for that particular API call. And so these API's usually do the same kinds of things although some of the details might differ between these different interfaces. But you do things like establish a connection, run a query, iterate over your rows, manage your transactions, that sort of thing. Here's an example from vertica-python, which just goes into some of the details of what actually happens during the Connect API call. And you can see all these details in our GitHub implementation of this. There's actually a lot of moving parts in what happens during a connection. So let's walk through some of that and see what actually goes on. I might have my API call like this where I say Connect and I give it a DNS name, which is my entire cluster. And I give you my connection details, my username and password. And I tell the Python Client to get me a session, give me a connection so I can start doing some work. Well, in order to implement this, what needs to happen? First, we need to do some TCP networking to establish our connection. So we need to understand what the request is, where you're going to connect to and why, by pressing the connection string. and vertica being a distributed system, we want to provide high availability, so we might need to do some DNS look-ups to resolve that DNS name which might be an entire cluster and not just a single machine. So that you don't have to change your connection string every time you add or remove nodes to the database. So we do some high availability and DNS lookup stuff. And then once we connect, we might do Load Balancing too, to balance the connections across the different initiator nodes in the cluster, or in a sub cluster, as needed. Once we land on the node we want to be at, we might do some TLS to secure our connections. And vertica supports the industry standard TLS protocols, so this looks pretty familiar for everyone who've used TLS anywhere before. So you're going to do a certificate exchange and the client might send the server certificate too, and then you going to verify that the server is who it says it is, so that you can know that you trust it. Once you've established that connection, and secured it, then you can start actually beginning to request a session within vertica. So you going to send over your user information like, "Here's my username, "here's the database I want to connect to." You might send some information about your application like a session label, so that you can differentiate on the database with monitoring queries, what the different connections are and what their purpose is. And then you might also send over some session settings to do things like auto commit, to change the state of your session for the duration of this connection. So that you don't have to remember to do that with every query that you have. Once you've asked vertica for a session, before vertica will give you one, it has to authenticate you. and vertica has lots of different authentication mechanisms. So there's a negotiation that happens there to decide how to authenticate you. Vertica decides based on who you are, where you're coming from on the network. And then you'll do an auth-specific exchange depending on what the auth mechanism calls for until you are authenticated. Finally, vertica trusts you and lets you in, so you going to establish a session in vertica, and you might do some note keeping on the client side just to know what happened. So you might log some information, you might record what the version of the database is, you might do some protocol feature negotiation. So if you connect to a version of the database that doesn't support all these protocols, you might decide to turn some functionality off and that sort of thing. But finally, after all that, you can return from this API call and then your connection is good to go. So that connection is just one example of many different APIs. And we're excited here because with vertica-python we're really opening up the vertica client wire protocol for the first time. And so if you're a low level vertica developer and you might have used Postgres before, you might know that some of vertica's client protocol is derived from Postgres. But they do differ in many significant ways. And this is the first time we've ever revealed those details about how it works and why. So not all Postgres protocol features work with vertica because vertica doesn't support all the features that Postgres does. Postgres, for example, has a large object interface that allows you to stream very wide data values over. Whereas vertica doesn't really have very wide data values, you have 30, you have long bar charts, but that's about as wide as you can get. Similarly, the vertica protocol supports lots of features not present in Postgres. So Load Balancing, for example, which we just went through an example of, Postgres is a single node system, it doesn't really make sense for Postgres to have Load Balancing. But Load Balancing is really important for vertica because it is a distributed system. Vertica-python serves as an open reference implementation of this protocol. With all kinds of new details and extension points that we haven't revealed before. So if you look at these boxes below, all these different things are new protocol features that we've implemented since August 2019, out in the open on our GitHub page for Python. Now, the vertica-sql-go implementation of these things is still in progress, but the core protocols are there for basic query operations. There's more to do there but we'll get there soon. So this is really cool 'cause not only do you have now a Python Client implementation, and you have a Go client implementation of this, but you can use this protocol reference to do lots of other things, too. The obvious thing you could do is build more clients for other languages. So if you have a need for a client in some other language that are vertica doesn't support yet, now you have everything available to solve that problem and to go about doing so if you need to. But beyond clients, it's also used for other things. So you might use it for mocking and testing things. So rather than connecting to a real vertica database, you can simulate some of that. You can also use it to do things like query routing and proxies. So Uber, for example, this log here in this link tells a great story of how they route different queries to different vertical clusters by intercepting these protocol messages, parsing the queries in them and deciding which clusters to send them to. So a lot of these things are just ideas today, but now that you have the source code, there's no limit in sight to what you can do with this thing. And so we're very interested in hearing your ideas and requests and we're happy to offer advice and collaborate on building some of these things together. So let's take a look now at some of the things we've already built that do these things. So here's a picture of vertica's Grafana connector with some data powered from an example that we have in this blog link here. So this has an internet of things use case to it, where we have lots of different sensors recording flight data, feeding into Kafka which then gets loaded into vertica. And then finally, it gets visualized nicely here with Grafana. And Grafana's visualizations make it really easy to analyze the data with your eyes and see when something something happens. So in these highlighted sections here, you notice a drop in some of the activity, that's probably a problem worth looking into. It might be a lot harder to see that just by staring at a large table yourself. So how does a picture like that get generated with a tool like Grafana? Well, Grafana specializes in visualizing time series data. And time can be really tricky for computers to do correctly. You got time zones, daylight savings, leap seconds, negative infinity timestamps, please don't ever use those. In every system, if it wasn't hard enough, just with those problems, what makes it harder is that every system does it slightly differently. So if you're querying some time data, how do we deal with these semantic differences as we cross these domain boundaries from Vertica to Grafana's back end architecture, which is implemented in Go on it's front end, which is implemented with JavaScript? Well, you read this from bottom up in terms of the processing. First, you select the timestamp and Vertica is timestamp has to be converted to a Go time object. And we have to reconcile the differences that there might be as we translate it. So Go time has a different time zone specifier format, and it also supports nanosecond precision, while Vertica only supports microsecond precision. So that's not too big of a deal when you're querying data because you just see some extra zeros, not fractional seconds. But on the way in, if we're loading data, we have to find a way to resolve those things. Once it's into the Go process, it has to be converted further to render in the JavaScript UI. So that there, the Go time object has to be converted to a JavaScript Angular JS Date object. And there too, we have to reconcile those differences. So a lot of these differences might just be presentation, and not so much the actual data changing, but you might want to choose to render the date into a more human readable format, like we've done in this example here. Here's another picture. This is another picture of some time series data, and this one shows you can actually write your own queries with Grafana to provide answers. So if you look closely here you can see there's actually some functions that might not look too familiar with you if you know vertica's functions. Vertica doesn't have a dollar underscore underscore time function or a time filter function. So what's actually happening there? How does this actually provide an answer if it's not really real vertica syntax? Well, it's not sufficient to just know how to manipulate data, it's also really important that you know how to operate with metadata. So information about how the data works in the data source, Vertica in this case. So Grafana needs to know how time works in detail for each data source beyond doing that basic I/O that we just saw in the previous example. So it needs to know, how do you connect to the data source to get some time data? How do you know what time data types and functions there are and how they behave? How do you generate a query that references a time literal? And finally, once you've figured out how to do all that, how do you find the time in the database? How do you do know which tables have time columns and then they might be worth rendering in this kind of UI. So Go's database standard doesn't actually really offer many metadata interfaces. Nevertheless, Grafana needs to know those answers. And so it has its own plugin layer that provides a standardizing layer whereby every data source can implement hints and metadata customization needed to have an extensible data source back end. So we have another open source project, the Vertica-Grafana data source, which is a plugin that uses Grafana's extension points with JavaScript and the front end plugins and also with Go in the back end plugins to provide vertica connectivity inside Grafana. So the way this works, is that the plugin frameworks defines those standardizing functions like time and time filter, and it's our plugin that's going to rewrite them in terms of vertica syntax. So in this example, time gets rewritten to a vertica cast. And time filter becomes a BETWEEN predicate. So that's one example of how you can use Grafana, but also how you might build any arbitrary visualization tool that works with data in Vertica. So let's now look at some other examples and reference architectures that we have out in our GitHub page. For some advanced integrations, there's clearly a need to go beyond these standards. So SQL and these surrounding standards, like JDBC, and ODBC, were really critical in the early days of Vertica, because they really enabled a lot of generic database tools. And those will always continue to play a really important role, but the Big Data technology space moves a lot faster than these old database data can keep up with. So there's all kinds of new advanced analytics and query pushdown logic that were never possible 10 or 20 years ago, that Vertica can do natively. There's also all kinds of data-oriented application workflows doing things like streaming data, or Parallel Loading or Machine Learning. And all of these things, we need to build software with, but we don't really have standards to go by. So what do we do there? Well, open source implementations make for easier integrations, and applications all over the place. So even if you're not using Grafana for example, other tools have similar challenges that you need to overcome. And it helps to have an example there to show you how to do it. Take Machine Learning, for example. There's been many excellent Machine Learning tools that have arisen over the years to make data science and the task of Machine Learning lot easier. And a lot of those have basic database connectivity, but they generally only treat the database as a source of data. So they do lots of data I/O to extract data from a database like Vertica for processing in some other engine. We all know that's not the most efficient way to do it. It's much better if you can leverage Vertica scale and bring the processing to the data. So a lot of these tools don't take full advantage of Vertica because there's not really a uniform way to go do so with these standards. So instead, we have a project called vertica-ml-python. And this serves as a reference architecture of how you can do scalable machine learning with Vertica. So this project establishes a familiar machine learning workflow that scales with vertica. So it feels similar to like a scickit-learn project except all the processing and aggregation and heavy lifting and data processing happens in vertica. So this makes for a much more lightweight, scalable approach than you might otherwise be used to. So with vertica-ml-python, you can probably use this yourself. But you could also see how it works. So if it doesn't meet all your needs, you could still see the code and customize it to build your own approach. We've also got lots of examples of our UDX framework. And so this is an older GitHub project. We've actually had this for a couple of years, but it is really useful and important so I wanted to plug it here. With our User Defined eXtensions framework or UDXs, this allows you to extend the operators that vertica executes when it does a database load or a database query. So with UDXs, you can write your own domain logic in a C++, Java or Python or R. And you can call them within the context of a SQL query. And vertica brings your logic to that data, and makes it fast and scalable and fault tolerant and correct for you. So you don't have to worry about all those hard problems. So our UDX examples, demonstrate how you can use our SDK to solve interesting problems. And some of these examples might be complete, total usable packages or libraries. So for example, we have a curl source that allows you to extract data from any curlable endpoint and load into vertica. We've got things like an ODBC connector that allows you to access data in an external database via an ODBC driver within the context of a vertica query, all kinds of parsers and string processors and things like that. We also have more exciting and interesting things where you might not really think of vertica being able to do that, like a heat map generator, which takes some XY coordinates and renders it on top of an image to show you the hotspots in it. So the image on the right was actually generated from one of our intern gaming sessions a few years back. So all these things are great examples that show you not just how you can solve problems, but also how you can use this SDK to solve neat things that maybe no one else has to solve, or maybe that are unique to your business and your needs. Another exciting benefit is with testing. So the test automation strategy that we have in vertica-python these clients, really generalizes well beyond the needs of a database client. Anyone that's ever built a vertica integration or an application, probably has a need to write some integration tests. And that could be hard to do with all the moving parts, in the big data solution. But with our code being open source, you can see in vertica-python, in particular, how we've structured our tests to facilitate smooth testing that's fast, deterministic and easy to use. So we've automated the download process, the installation deployment process, of a Vertica Community Edition. And with a single click, you can run through the tests locally and part of the PR workflow via Travis CI. We also do this for multiple different python environments. So for all python versions from 2.7 up to 3.8 for different Python interpreters, and for different Linux distros, we're running through all of them very quickly with ease, thanks to all this automation. So today, you can see how we do it in vertica-python, in the future, we might want to spin that out into its own stand-alone testbed starter projects so that if you're starting any new vertica integration, this might be a good starting point for you to get going quickly. So that brings us to some of the future work we want to do here in the open source space . Well, there's a lot of it. So in terms of the the client stuff, for Python, we are marching towards our 1.0 release, which is when we aim to be protocol complete to support all of vertica's unique protocols, including COPY LOCAL and some new protocols invented to support complex types, which is our new feature in vertica 10. We have some cursor enhancements to do things like better streaming and improved performance. Beyond that we want to take it where you want to bring it. So send us your requests in the Go client fronts, just about a year behind Python in terms of its protocol implementation, but the basic operations are there. But we still have more work to do to implement things like load balancing, some of the advanced auths and other things. But they're two, we want to work with you and we want to focus on what's important to you so that we can continue to grow and be more useful and more powerful over time. Finally, this question of, "Well, what about beyond database clients? "What else might we want to do with open source?" If you're building a very deep or a robust vertica integration, you probably need to do a lot more exciting things than just run SQL queries and process the answers. Especially if you're an OEM or you're a vendor that resells vertica packaged as a black box piece of a larger solution, you might to have managed the whole operational lifecycle of vertica. There's even fewer standards for doing all these different things compared to the SQL clients. So we started with the SQL clients 'cause that's a well established pattern, there's lots of downstream work that that can enable. But there's also clearly a need for lots of other open source protocols, architectures and examples to show you how to do these things and do have real standards. So we talked a little bit about how you could do UDXs or testing or Machine Learning, but there's all sorts of other use cases too. That's why we're excited to announce here our awesome vertica, which is a new collection of open source resources available on our GitHub page. So if you haven't heard of this awesome manifesto before, I highly recommend you check out this GitHub page on the right. We're not unique here but there's lots of awesome projects for all kinds of different tools and systems out there. And it's a great way to establish a community and share different resources, whether they're open source projects, blogs, examples, references, community resources, and all that. And this tool is an open source project. So it's an open source wiki. And you can contribute to it by submitting yourself to PR. So we've seeded it with some of our favorite tools and projects out there but there's plenty more out there and we hope to see more grow over time. So definitely check this out and help us make it better. So with that, I'm going to wrap up. I wanted to thank you all. Special thanks to Siting Ren and Roger Huebner, who are the project leads for the Python and Go clients respectively. And also, thanks to all the customers out there who've already been contributing stuff. This has already been going on for a long time and we hope to keep it going and keep it growing with your help. So if you want to talk to us, you can find us at this email address here. But of course, you can also find us on the Vertica forums, or you could talk to us on GitHub too. And there you can find links to all the different projects I talked about today. And so with that, I think we're going to wrap up and now we're going to hand it off for some Q&A.

Published Date : Mar 30 2020

SUMMARY :

Also a reminder that you can maximize your screen and frameworks to solve the problems you need to solve.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Tom WallPERSON

0.99+

Sue LeClairePERSON

0.99+

UberORGANIZATION

0.99+

Roger HuebnerPERSON

0.99+

VerticaORGANIZATION

0.99+

TomPERSON

0.99+

Python 2TITLE

0.99+

August 2019DATE

0.99+

2019DATE

0.99+

Python 3TITLE

0.99+

twoQUANTITY

0.99+

SuePERSON

0.99+

PythonTITLE

0.99+

pythonTITLE

0.99+

SQLTITLE

0.99+

late 2018DATE

0.99+

FirstQUANTITY

0.99+

end of 2019DATE

0.99+

VerticaTITLE

0.99+

todayDATE

0.99+

JavaTITLE

0.99+

SparkTITLE

0.99+

C++TITLE

0.99+

JavaScriptTITLE

0.99+

vertica-pythonTITLE

0.99+

TodayDATE

0.99+

first timeQUANTITY

0.99+

11 different releasesQUANTITY

0.99+

UDXsTITLE

0.99+

KafkaTITLE

0.99+

Extending Vertica with the Latest Vertica Ecosystem and Open Source InitiativesTITLE

0.98+

GrafanaORGANIZATION

0.98+

PyODBCTITLE

0.98+

firstQUANTITY

0.98+

UDXTITLE

0.98+

vertica 10TITLE

0.98+

ODBCTITLE

0.98+

10DATE

0.98+

PostgresTITLE

0.98+

DataDogORGANIZATION

0.98+

40 customer reported issuesQUANTITY

0.97+

bothQUANTITY

0.97+

UNLIST TILL 4/2 - The Next-Generation Data Underlying Architecture


 

>> Paige: Hello, everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled, Vertica next generation architecture. I'm Paige Roberts, open social relationship Manager at Vertica, I'll be your host for this session. And joining me is Vertica Chief Architect, Chuck Bear, before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait, just type your question or comment, in the question box that's below the slides and click submit. So as you think about it, go ahead and type it in, there'll be a Q&A session at the end of the presentation, where we'll answer as many questions, as we're able to during the time. Any questions that we don't get a chance to address, we'll do our best to answer offline. Or alternatively, you can visit the Vertica forums to post your questions there, after the session. Our engineering team is planning to join the forum and keep the conversation going, so you can, it's just sort of like the developers lounge would be in delight conference. It gives you a chance to talk to our engineering team. Also, as a reminder, you can maximize your screen by clicking the double arrow button in the lower right corner of the slide. And before you ask, yes, this virtual session is being recorded, and it will be available to view on demand this week, we'll send you a notification, as soon as it's ready. Okay, now, let's get started, over to you, Chuck. >> Chuck: Thanks for the introduction, Paige, Vertica vision is to help customers, get value from structured data. This vision is simple, it doesn't matter what vertical the customer is in. They're all analytics companies, it doesn't matter what the customers environment is, as data is generated everywhere. We also can't do this alone, we know that you need other tools and people to build a complete solution. You know our database is key to delivering on the vision because we need a database that scales. When you start a new database company, you aren't going to win against 30 year old products on features. But from day one, we had something else, an architecture built for analytics performance. This architecture was inspired by the C-store project, combining the best design ideas from academics and industry veterans like Dr. Mike Stonebreaker. Our storage is optimized for performance, we use many computers in parallel. After over 10 years of refinements against various customer workloads, much of the design held up and serendipitously, the fact that we don't store in place updates set Vertica up for success in the cloud as well. These days, there are other tools that embody some of these design ideas. But we have other strengths that are more important than the storage format, where the only good analytics database that runs both on premise and in the cloud, giving customers the option to migrate their workloads, in most convenient and economical environment, or a full data management solution, not just the query tool. Unlike some other choices, ours comes with integration with a sequel ecosystem and full professional support. We organize our product roadmap into four key pillars, plus the cross cutting concerns of open integration and performance and scale. We have big plans to strengthen Vertica, while staying true to our core. This presentation is primarily about the separation pillar, and performance and scale, I'll cover our plans for Eon, our data management architecture, Mart analytic clusters, or fifth generation query executer, and our data storage layer. Let's start with how Vertica manages data, one of the central design points for Vertica was shared nothing, a design that didn't utilize a dedicated hardware shared disk technology. This quote here is how Mike put it politely, but around the Vertica office, shared disk with an LMTB over Mike's dead body. And we did get some early field experience with shared disk, customers, well, in fact will learn on anything if you let them. There were misconfigurations that required certified experts, obscure bugs extent. Another thing about the shared nothing designed for commodity hardware though, and this was in the papers, is that all the data management features like fault tolerance, backup and elasticity have to be done in software. And no matter how much you do, procuring, configuring and maintaining the machines with disks is harder. The software configuration process to add more service may be simple, but capacity planning, racking and stacking is not. The original allure of shared storage returned, this time though, the complexity and economics are different. It's cheaper, even provision storage with a few clicks and only pay for what you need. It expands, contracts and brings the maintenance of the storage close to a team is good at it. But there's a key difference, it's an object store, an object stores don't support the API's and access patterns used by most database software. So another Vertica visionary Ben, set out to exploit Vertica storage organization, which turns out to be a natural fit for modern cloud shared storage. Because Vertica data files are written once and not updated, they match the object storage model perfectly. And so today we have Eon, Eon uses shared storage to hold Vertica data with local disk depot's that act as caches, ensuring that we can get the performance that our customers have come to expect. Essentially Eon in enterprise behave similarly, but we have the benefit of flexible storage. Today Eon has the features our customers expect, it's been developed in tune for years, we have successful customers such as Redpharma, and if you'd like to know more about Eon has helped them succeed in Amazon cloud, I highly suggest reading their case study, which you can find on vertica.com. Eon provides high availability and flexible scaling, sometimes on premise customers with local disks get a little jealous of how recovery and sub-clusters work in Eon. Though we operate on premise, particularly on pure storage, but enterprise also had strengths, the most obvious being that you don't need and short shared storage to run it. So naturally, our vision is to converge the two modes, back into a single Vertica. A Vertica that runs any combination of local disks and shared storage, with full flexibility and portability. This is easy to say, but over the next releases, here's what we'll do. First, we realize that the query executer, optimizer and client drivers and so on, are already the same. Just the transaction handling and data management is different. But there's already more going on, we have peer-to-peer depot operations and other internode transfers. And enterprise also has a network, we could just get files from remote nodes over that network, essentially mimicking the behavior and benefits of shared storage with the layer of software. The only difference at the end of it, will be which storage hold the master copy. In enterprise, the nodes can't drop the files because they're the master copy. Whereas in Eon they can be evicted because it's just the cache, the masters, then shared storage. And in keeping with versus current support for multiple storage locations, we can intermix these approaches at the table level. Getting there as a journey, and we've already taken the first steps. One of the interesting design ideas of the C-store paper is the idea that redundant copies, don't have to have the same physical organization. Different copies can be optimized for different queries, sorted in different ways. Of course, Mike also said to keep the recovery system simple, because it's hard to debug, whenever the recovery system is being used, it's always in a high pressure situation. This turns out to be a contradiction, and the latter idea was better. No down performing stuff, if you don't keep the storage the same. Recovery hardware if you have, to reorganize data in the process. Even query optimization is more complicated. So over the past couple releases, we got rid of non identical buddies. But the storage files can still diverge at the fifth level, because tuple mover operations are synchronized. The same record can end up in different files than different nodes. The next step in our journey, is to make sure both copies are identical. This will help with backup and restore as well, because the second copy doesn't need backed up, or if it is backed up, it appears identical to the deduplication that is going to look present in both backup systems. Simultaneously, we're improving the Vertica networking service to support this new access pattern. In conjunction with identical storage files, we will converge to a recovery system that instantaneous nodes can process queries immediately, by retrieving data they need over the network from the redundant copies as they do in Eon day with even higher performance. The final step then is to unify the catalog and transaction model. Related concepts such as segment and shard, local catalog and shard catalog will be coalesced, as they're really represented the same concepts all along, just in different modes. In the catalog, we'll make slight changes to the definition of a projection, which represents the physical storage organization. The new definition simplifies segmentation and introduces valuable granularities of sharding to support evolution over time, and offers a straightforward migration path for both Eon and enterprise. There's a lot more to our Eon story than just the architectural roadmap. If you missed yesterday's Vertica, in Eon mode presentation about supported cloud, on premise storage option, replays are available. Be sure to catch the upcoming presentation on sizing and configuring vertica and in beyond doors. As we've seen with Eon, Vertica can separate data storage from the compute nodes, allowing machines to quickly fill in for each other, to rebuild fault tolerance. But separating compute and storage is used for much, much more. We now offer powerful, flexible ways for Vertica to add servers and increase access to the data. Vertica nine, this feature is called sub-clusters. It allows computing capacity to be added quickly and incrementally, and isolates workloads from each other. If your exploratory analytics team needs direct access to the source data, they need a lot of machines and not the same number all the time, and you don't 100% trust the kind of queries and user defined functions, they might be using sub-clusters as the solution. While there's much more expensive information available in our other presentation. I'd like to point out the highlights of our latest sub-cluster best practices. We suggest having a primary sub-cluster, this is the one that runs all the time, if you're loading data around the clock. It should be sized for the ETL workloads and also determines the natural shard count. Additional read oriented secondary sub-clusters can be added for real time dashboards, reports and analytics. That way, subclusters can be added or deep provisioned, without disruption to other users. The sub-cluster features of Vertica 9.3 are working well for customers. Yesterday, the Trade Desk presented their use case for Vertica over 300,000 in 5 sub clusters running in the cloud. If you missed a presentation, check out the replay. But we have plans beyond sub-clusters, we're extending sub-clusters to real clusters. For the Vertica savvy, this means the clusters bump, share the same spread ring network. This will provide further isolation, allowing clusters to control their own independent data sets. While replicating all are part of the data from other clusters using a publish subscribe mechanism. Synchronizing data between clusters is a feature customers want to understand the real business for themselves. This vision effects are designed for ancillary aspects, how we should assign resource pools, security policies and balance client connection. We will be simplifying our data segmentation strategy, so that when data that originate in the different clusters meet, they'll still get fully optimized joins, even if those clusters weren't positioned with the same number of nodes per shard. Having a broad vision for data management is a key component to political success. But we also take pride in our execution strategy, when you start a new database from scratch as we did 15 years ago, you won't compete on features. Our key competitive points where speed and scale of analytics, we set a target of 100 x better query performance in traditional databases with path loads. Our storage architecture provides a solid foundation on which to build toward these goals. Every query starts with data retrieval, keeping data sorted, organized by column and compressed by using adaptive caching, to keep the data retrieval time in IO to the bare minimum theoretically required. We also keep the data close to where it will be processed, and you clusters the machines to increase throughput. We have partition pruning a robust optimizer evaluate active use segmentation as part of the physical database designed to keep records close to the other relevant records. So the solid foundation, but we also need optimal execution strategies and tactics. One execution strategy which we built for a long time, but it's still a source of pride, it's how we process expressions. Databases and other systems with general purpose expression evaluators, write a compound expression into a tree. Here I'm using A plus one times B as an example, during execution, if your CPU traverses the tree and compute sub-parts from the whole. Tree traversal often takes more compute cycles than the actual work to be done. Especially in evaluation is a very common operation, so something worth optimizing. One instinct that engineers have is to use what we call, just-in-time or JIT compilation, which means generating code form the CPU into the specific activity expression, and add them. This replaces the tree of boxes that are custom made box for the query. This approach has complexity bugs, but it can be made to work. It has other drawbacks though, it adds a lot to query setup time, especially for short queries. And it pretty much eliminate the ability of mere models, mere mortals to develop user defined functions. If you go back to the problem we're trying to solve, the source of the overhead is the tree traversal. If you increase the batch of records processed in each traversal step, this overhead is amortized until it becomes negligible. It's a perfect match for a columnar storage engine. This also sets the CPU up for efficiency. The CPUs look particularly good, at following the same small sequence of instructions in a tight loop. In some cases, the CPU may even be able to vectorize, and apply the same processing to multiple records to the same instruction. This approach is easy to implement and debug, user defined functions are possible, then generally aligned with the other complexities of implementing and improving a large system. More importantly, the performance, both in terms of query setup and record throughput is dramatically improved. You'll hear me say that we look at research and industry for inspiration. In this case, our findings in line with academic binding. If you'd like to read papers, I recommend everything you always wanted to know about compiled and vectorized queries, don't afraid to ask, so we did have this idea before we read that paper. However, not every decision we made in the Vertica executer that the test of time as well as the expression evaluator. For example, sorting and grouping aren't susceptible to vectorization because sort decisions interrupt the flow. We have used JIT compiling on that for years, and Vertica 401, and it provides modest setups, but we know we can do even better. But who we've embarked on a new design for execution engine, which I call EE five, because it's our best. It's really designed especially for the cloud, now I know what you're thinking, you're thinking, I just put up a slide with an old engine, a new engine, and a sleek play headed up into the clouds. But this isn't just marketing hype, here's what I mean, when I say we've learned lessons over the years, and then we're redesigning the executer for the cloud. And of course, you'll see that the new design works well on premises as well. These changes are just more important for the cloud. Starting with the network layer in the cloud, we can't count on all nodes being connected to the same switch. Multicast doesn't work like it does in a custom data center, so as I mentioned earlier, we're redesigning the network transfer layer for the cloud. Storage in the cloud is different, and I'm not referring here to the storage of persistent data, but to the storage of temporary data used only once during the course of query execution. Our new pattern is designed to take into account the strengths and weaknesses of cloud object storage, where we can't easily do a path. Moving on to memory, many of our access patterns are reasonably effective on bare metal machines, that aren't the best choice on cloud hyperbug that have overheads, page faults or big gap. Here again, we found we can improve performance, a bit on dedicated hardware, and even more in the cloud. Finally, and this is true in all environments, core counts have gone up. And not all of our algorithms take full advantage, there's a lot of ground to cover here. But I think sorting in the perfect example to illustrate these points, I mentioned that we use JIT in sorting. We're getting rid of JIT in favor of a data format that can be treated efficiently, independent of what the data types are. We've drawn on the best, most modern technology from academia and industry. We've got our own analysis and testing, you know what we chose, we chose parallel merge sort, anyone wants to take a guess when merge sort was invented. It was invented in 1948, or at least documented that way, like computing context. If you've heard me talk before, you know that I'm fascinated by how all the things I worked with as an engineer, were invented before I was born. And in Vertica , we don't use the newest technologies, we use the best ones. And what is noble about Vertica is the way we've combined the best ideas together into a cohesive package. So all kidding about the 1940s aside, or he redesigned is actually state of the art. How do we know the sort routine is state of the art? It turns out, there's a pretty credible benchmark or at the appropriately named historic sortbenchmark.org. Anyone with resources looking for fame for their product or academic paper can try to set the record. Record is last set in 2016 with Tencent Sort, 100 terabytes in 99 seconds. Setting the records it's hard, you have to come up with hundreds of machines on a dedicated high speed switching fabric. There's a lot to a distributed sort, there all have core sorting algorithms. The authors of the paper conveniently broke out of the time spent in their sort, 67 out of 99 seconds want to know local sorting. If we break this out, divided by two CPUs and each of 512 nodes, we find that each CPU so there's almost a gig and a half per second. This is for what's called an indy sort, like an Indy race car, is in general purpose. It only handles fixed hundred five records with 10 byte key. There is a record length can vary, then it's called daytona sort, a 10 set daytona sort, is a little slower. One point is 10 gigabytes per second per CPU, now for Verrtica, We have a wide variety ability in record sizes, and more interesting data types, but still no harm in setting us like phone numbers, comfortable to the world record. On my 2017 era AMD desktop CPU, the Vertica EE5 sort to store about two and a half gigabytes per second. Obviously, this test isn't apply to apples because they use their own open power chip. But the number of DRM channels is the same, so it's pretty close the number that says we've hit on the right approach. And it performs this way on premise, in the cloud, and we can adapt it to cloud temp space. So what's our roadmap for integrating EE5 into the product and compare replacing the query executed the database to replacing the crankshaft and other parts of the engine of a car while it's been driven. We've actually done it before, between Vertica three and a half and five, and then we never really stopped changing it, now we'll do it again. The first part in replacing with algorithm called storage merge, which combines sorted data from disk. The first time has was two that are in vertical in incoming 10.0 patch that will be EE5 or resegmented storage merge, and then convert sorting and grouping into do out. There the performance results so far, in cases where the Vertica execute is doing well today, simple environments with simple data patterns, such as this simple capitalistic query, there's a lot of speed up, when we ship the segmentation code, which didn't quite make the freeze as much like to bump longer term, what we do is grouping into the storage of large operations, we'll get to where we think we ought to be, given a theoretical minimum work the CPUs need to do. Now if we look at a case where the current execution isn't doing as well, we see there's a much stronger benefit to the code shipping in Vertica 10. In fact, it turns a chart bar sideways to try to help you see the difference better. This case also benefit from the improvements in 10 product point releases and beyond. They will not happening to the vertical query executer, That was just the taste. But now I'd like to switch to the roadmap first for our adapters layer. I'll start with a story about, how our storage access layer evolved. If you go back to the academic ideas, if you start paper that persuaded investors to fund Vertica, read optimized store was the part that had substantiation in the form of performance data. Much of the paper was speculative, but we tried to follow it anyway. That paper talked about the WS with RS, The rights are in the read store, and how they work together for transaction processing and how there was a supernova. In all honesty, Vertica engineers couldn't figure out from the paper what to do next, incase you want to try, and we asked them they would like, We never got enough clarification to build it that way. But here's what we built, instead. We built the ROS, read optimized store, introduction on steep major revision. It's sorted, ordered columnar and compressed that follows a table partitioning that worked even better than the we are as described in the paper. We also built the last byte optimized store, we built four versions of this over the years actually. But this was the best one, it's not a set of interrelated V tree. It's just an append only, insertion order remember your way here, am sorry, no compression, no base, no partitioning. There is, however, a tuple over which does what we call move out. Move the data from WOS to ROS, sorting and compressing. Let's take a moment to compare how they behave, when you load data directly to the ROS, there's a data parsing operation. Then we finished the sorting, and then compressing right out the columnar data files to stay storage. The next query through executes against the ROS and it runs as it should because the ROS is read optimized. Let's repeat the exercise for WOS, the load operation response before the sorting and compressing, and before the data is written to persistent storage. Now it's possible for a query to come along, and the query could be responsible for sorting the lost data in addition to its other processes. Effect on query isn't predictable until the TM comes along and writes the data to the ROS. Over the years, we've done a lot of comparisons between ROS and WOS. ROS has always been better for sustained load throughput, it achieves much higher records per second without pushing back against the client and hasn't Vertica for when we developed the first usable merge out algorithm. ROS has always been better for predictable query performance, the ROS has never had the same management complexity and limitations as WOS. You don't have to pick a memory size and figure out which transactions get to use the pool. A non persistent nature of ROS always cause headaches when there are unexpected cluster shutdowns. We also looked at field usage data, we found that few customers were using a lot, especially among those that studied the issue carefully. So how we set out on a mission to improve the ROS to the point where it was always better than both the WOS and the profit of the past. And now it's true, ROS is better than the WOS and the loss of a couple of years ago. We implemented storage bundling, better catalog object storage and better tuple mover merge outs. And now, after extensive Q&A and customer testing, we've now succeeded, and in Vertica 10, we've removed the whys. Let's talk for a moment about simplicity, one of the best things Mike Stonebreaker said is no knobs. Anyone want to guess how many knobs we got rid of, and we took the WOS out of the product. 22 were five knobs to control whether it didn't went to ROS as well. Six controlling the ROS itself, Six more to set policies for the typical remove out and so on. In my honest opinion is still wasn't enough control over to achieve excess in a multi tenant environment, the big reason to get rid of the WOS for simplicity. Make the lives of DBAs and users better, we have a long way to go, but we're doing it. On my desk, I keep a jar with the knob in it for each knob in Vertica. When developers add a knob to the product, they have to add a knob to the jar. When they remove a knob, they get to choose one to take out, We have a lot of work to do, but I'm thrilled to report that in 15 years 10 is the first release with a number of knobs ticked downward. Get back to the WOS, I've said the most important thing get rid of it for last. We're getting rid of it so we can deliver our vision of the future to our customer. Remember how he said an Eon and sub-clusters we got all these benefits from shared storage? Guess what can't live in shared storage, the WOS. Remember how it's been a big part of the future was keeping the copies that identical to the primary copy? Independent actions of the WOS took a little at the root of the divergence between copies of the data. You have to admit it when you're wrong. That was in the original design and held up to the a selling point of time, without onto the idea of a separate ROS and WOS for too long. In Vertica, 10, we can finally bid, good reagents. I've covered a lot of ground, so let's put all the pieces together. I've talked a lot about our vision and how we're achieving it. But we also still pay attention to tactical detail. We've been fine tuning our memory management model to enhance performance. That involves revisiting tens of thousands of satellite of code, much like painting the inside of a large building with small paintbrushes. We're getting results as shown in the chart in Vertica nine, concurrent monitoring queries use memory from the global catalog tool, and Vertica 10, they don't. This is only one example of an important detail we're improving. We've also reworked the monitoring tables without network messages into two parts. The increased data we're collecting and analyzing and our quality assurance processes, we're improving on everything. As the story goes, I still have my grandfather's axe, of course, my father had to replace the handle, and I had to replace the head. Along the same lines, we still have Mike Stonebreaker Vertica. We didn't replace the query optimizer twice the debate database designer and storage layer four times each. The query executed is and it's a free design, like charted out how our code has changed over the years. I found that we don't have much from a long time ago, I did some digging, and you know what we have left in 2007. We have the original curly braces, and a little bit of percent code for handling dates and times. To deliver on our mission to help customers get value from their structured data, with high performance of scale, and in diverse deployment environments. We have the sound architecture roadmap, reviews the best execution strategy and solid tactics. On the architectural front, we're converging in an enterprise, we're extending smart analytic clusters. In query processing, we're redesigning the execution engine for the cloud, as I've told you. There's a lot more than just the fast engine. that you want to learn about our new data support for complex data types, improvements to the query optimizer statistics, or extension to live aggregate projections and flatten tables. You should check out some of the other engineering talk that the big data conference. We continue to stay on top of the details from low level CPU and memory too, to the monitoring management, developing tighter feedback cycles between development, Q&A and customers. And don't forget to check out the rest of the pillars of our roadmap. We have new easier ways to get started with Vertica in the cloud. Engineers have been hard at work on machine learning and security. It's easier than ever to use Vertica with third Party product, as a variety of tools integrations continues to increase. Finally, the most important thing we can do, is to help people get value from structured data to help people learn more about Vertica. So hopefully I left plenty of time for Q&A at the end of this presentation. I hope to hear your questions soon.

Published Date : Mar 30 2020

SUMMARY :

and keep the conversation going, and apply the same processing to multiple records

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
MikePERSON

0.99+

Mike StonebreakerPERSON

0.99+

2007DATE

0.99+

Chuck BearPERSON

0.99+

VerticaORGANIZATION

0.99+

2016DATE

0.99+

Paige RobertsPERSON

0.99+

ChuckPERSON

0.99+

second copyQUANTITY

0.99+

99 secondsQUANTITY

0.99+

67QUANTITY

0.99+

100%QUANTITY

0.99+

1948DATE

0.99+

BenPERSON

0.99+

two modesQUANTITY

0.99+

RedpharmaORGANIZATION

0.99+

first timeQUANTITY

0.99+

first stepsQUANTITY

0.99+

PaigePERSON

0.99+

two partsQUANTITY

0.99+

FirstQUANTITY

0.99+

five knobsQUANTITY

0.99+

100 terabytesQUANTITY

0.99+

both copiesQUANTITY

0.99+

TodayDATE

0.99+

each knobQUANTITY

0.99+

WSORGANIZATION

0.99+

AMDORGANIZATION

0.99+

EonORGANIZATION

0.99+

1940sDATE

0.99+

todayDATE

0.99+

One pointQUANTITY

0.99+

first partQUANTITY

0.99+

fifth levelQUANTITY

0.99+

eachQUANTITY

0.99+

yesterdayDATE

0.98+

bothQUANTITY

0.98+

SixQUANTITY

0.98+

firstQUANTITY

0.98+

512 nodesQUANTITY

0.98+

ROSTITLE

0.98+

over 10 yearsQUANTITY

0.98+

YesterdayDATE

0.98+

15 years agoDATE

0.98+

twiceQUANTITY

0.98+

sortbenchmark.orgOTHER

0.98+

first releaseQUANTITY

0.98+

two CPUsQUANTITY

0.97+

Vertica 10TITLE

0.97+

100 xQUANTITY

0.97+

WOSTITLE

0.97+

vertica.comOTHER

0.97+

10 byteQUANTITY

0.97+

this weekDATE

0.97+

oneQUANTITY

0.97+

5 sub clustersQUANTITY

0.97+

twoQUANTITY

0.97+

one exampleQUANTITY

0.97+

over 300,000QUANTITY

0.96+

Dr.PERSON

0.96+

OneQUANTITY

0.96+

tens of thousands of satelliteQUANTITY

0.96+

EE5COMMERCIAL_ITEM

0.96+

fifth generationQUANTITY

0.96+

UNLIST TILL 4/2 - Migrating Your Vertica Cluster to the Cloud


 

>> Jeff: Hello everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's break-out session has been titled, "Migrating Your Vertica Cluster to the Cloud." I'm Jeff Healey, and I'm in Vertica marketing. I'll be your host for this break-out session. Joining me here are Sumeet Keswani and Chris Daly, Vertica product technology engineers and key members of our customer success team. Before we begin, I encourage you to submit questions and comments during the virtual session. You don't have to wait, just type your question or comment in the question box below the slides and click Submit. As always, there will be a Q&A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer them offline. And alternatively, you can visit Vertica forums at forum.vertica.com to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also as a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now let's get started. Over to you, Sumeet. >> Sumeet: Thank you, Jeff. Hello everyone, my name is Sumeet Keswani, and I will be talking about planning to deploy or migrate your Vertica cluster to the Cloud. So you may be moving an on-prem cluster or setting up a new cluster in the Cloud. And there are several design and operational considerations that will come into play. You know, some of these are cost, which industry you are in, or which expertise you have, in which Cloud platform. And there may be a personal preference too. After that, you know, there will be some operational considerations like VM and cluster sizing, what Vertica mode you want to deploy, Eon or Enterprise. It depends on your use keys. What are the DevOps skills available, you know, what elasticity, separation you need, you know, what is your backup and DR strategy, what do you want in terms of high availability. And you will have to think about, you know, how much data you have and where it's going to live. And in order to understand the cost, or the cost and the benefit of deployment and you will have to understand the access patterns, and how you are moving data from and to the Cloud. So things to consider before you move a deployment, a Vertica deployment to the Cloud, right, is one thing to keep in mind is, virtual CPUs, or CPUs in the Cloud, are not the same as the usual CPUs that you've been familiar with in your data center. A vCPU is half of a CPU because of hyperthreading. There is definitely the noisy neighbor effect. There is, depending on what other things are hosted in the Cloud environment, you may see performance, you may occasionally see performance issues. There are I/O limitations on the instance that you provision, so that what that really means is you can't always scale up. You might have to scale up, basically, you have to add more instances rather than getting bigger or the right size instances. Finally, there is an important distinction here. Virtualization is not free. There can be significant overhead to virtualization. It could be as much as 30%, so when you size and scale your clusters, you must keep that in mind. Now the other important aspect is, you know, where you put Vertica cluster is important. The choice of the region, how far it is from your various office locations. Where will the data live with respect to the cluster. And remember, popular locations can fill up. So if you want to scale out, additional capacity may or may not be available. So these are things you have to keep in mind when picking or choosing your Cloud platform and your deployment. So at this point, I want to make a plug for Eon mode. Eon mode is the latest mode, is a Cloud mode from Vertica. It has been designed with Cloud economics in mind. It uses shared storage, which is durable, available, and very cheap, like S3 storage or Google Cloud storage. It has been designed for quick scaling, like scale out, and highly elastic deployments. It has also been designed for high workload isolation, where each application or user group can be isolated from the other ones, so that they'll be paid and monitored separately, without affecting each other. But there are some disadvantages, or perhaps, you know, there's a cost for using Eon mode. Storage in S3 is neither cheap nor efficient. So there is a high latency of I/O when accessing data from S3. There is API and data access cost. There is API and data access cost associated with accessing your data in S3. Vertica in Eon mode has a pay as you go model, which you know, works for some people and does not work for others. And so therefore it is important to keep that in mind. And performance can be a little bit variable here, because it depends on cache, it depends on the local depot, which is a cache, and it is not as predictable as EE mode, so that's another trade-off. So let's spend about a minute and see how a Vertica cluster in Eon mode looks like. A Vertica cluster in Eon mode has S3 as the durability layer where all the data sits. There are subclusters, which are essentially just aggregation groups, which is separated compute, which will service different workloads. So for in this example, you may have two subclusters, one servicing ETL workload and the other one servicing (mic interference obscures speaking). These clusters are isolated, and they do not affect each other's performance. This allows you to scale them independently and isolate workloads. So this is the new Vertica Eon mode which has been specifically designed by us for use in the Cloud. But beyond this, you can use EE mode or Eon mode in the Cloud, it really depends on what your use case is. But both of these are possible, and we highly recommend Eon mode wherever possible. Okay, let's talk a little bit about what we mean by Vertica support in the Cloud. Now as you know, a Cloud is a shared data center, right. Performance in the Cloud can vary. It can vary between regions, availability zones, time of the day, choice of instance type, what concurrency you use, and of course the noisy neighbor effect. You know, we in Vertica, we performance, load, and stress test our product before every release. We have a bunch of use cases, we go through all of them, make sure that we haven't, you know, regressed any performance, and make sure that it works up to standards and gives you the high performance that you've come to expect. However, your solution or your workload is unique to you, and it is still your responsibility to make sure that it is tuned appropriately. To do this, one of the easiest things you can do is you know, pick a tested operating system, allocate the virtual machine, you know, with enough resources. It's something that we recommend, because we have tested it thoroughly. It goes a long way in giving you predictability. So after this I would like to now go into the various platforms, Cloud platforms, that Vertica has worked on. And I'll start with AWS, and my colleague Chris will speak about Azure and GCP. And our thoughts forward. So without further ado, let's start with the Amazon Web Services platform. So this is Vertica running on the Amazon Web Services platform. So as you probably are all aware, Amazon Web Services is the market leader in this space, and indeed really our biggest provider by far, and have been here for a very long time. And Vertica has a deep integration in the Amazon Web Services space. We provide a marketplace offering which has both pay as you go or a bring your own license model. We have many, you know, knowledge base articles, best practices, scripts, and resources that help you configure and use a Vertica database in the Cloud. We have several customers in the Cloud for many, many years now, and we have managed and console-based point and click deployments, you know, for ease of use in the Cloud. So Vertica has a deep integration in the Amazon space, and has been there for quite a bit now. So we communicate a lot of experience here. So let's talk about sizing on AWS. And sizing on any platform comes down to you know, these four or five different things. It comes down to picking the right instance type, picking the right disk volume and type, tuning and optimizing your networking, and finally, you know, some operational concerns like security, maintainability, and backup. So let's go into each one of these on the AWS ecosystem. So the choice of instance type is one of the important choices that you will make. In Eon mode, you know, you don't really need persistent disk. You can, you should probably choose ephemeral disk because it gives you extra speed, and speed with the instance type. We highly recommend the i3.4x instance types, which are very economical, have a big, 4 terabyte depot or cache per node. The i3.metal is similar to the i3.4, but has got significantly better performance, for those subclusters that need this extra oomph. The i3.2 is good for scale out of small ad hoc clusters. You know, they have a smaller cache and lower performance but it's cheap enough to use very indiscriminately. If you were in EE mode, well we don't use S3 as the layer of durability. Your local volumes is where we persist the data. Hence you do need an EBS volume in EE mode. In order to make sure that, you know, that the instance or the deployment is manageable, you might have to use some sort of a software RAID array over the EBS volumes. The most common instance type you see in EE mode is the r4.4x, the c4, or the m4 instance types. And then of course for temp space and depot we always recommend instance volumes. They're just much faster. Okay. So let's go, let's talk about optimizing your network or tuning your network. So the best, the best thing you can do about tuning your network, especially in Eon mode but in other modes too, is to get a VPC S3 endpoint. This is essentially a route table that makes sure that all traffic between your cluster and S3 goes over an internal fabric. This makes it much faster, you don't pay for egress cost, especially if you're doing external tables or your communal storage, but you do need to create it. Many times people will forget doing it. So you really do have to create it. And best of all, it's free. It doesn't cost you anything extra. You just have to create it during cluster creation time, and there's a significant performance difference for using it. The next thing about tuning your network is, you know, sizing it correctly. Pick the closest geographical region to where you'll consume the data. Pick the right availability zone. We highly recommend using cluster placement groups. In fact, they are required for the stability of the cluster. A cluster placement group is essentially, it operates this notion of rack. Nodes in a cluster placement group, are, you know, physically closer to each other than they would otherwise be. And this allows, you know, a 10 Gbps, bidirectional, TCP/IP flow between the nodes. And this makes sure that, you know, you get a high amount of Gbps per second. As you probably are all aware, the Cloud does not support broadcast or UDP broadcast. Hence you must use point-to-point UDP for spread in the Cloud, or in AWS. Beyond that, you know, point-to-point UDP does not scale very well beyond 20 nodes. So you know, as your cluster sizes increase, you must switch over to large cluster mode. And finally, use instances with enhanced networking or SR-IOV support. Again, it's free, it comes with the choice of the instance type and the operating system. We highly recommend it, it makes a big difference in terms of how your workload will perform. So let's talk a little bit about security, configuration, and orchestration. As I said, we provide CloudFormation scripts to make the ease of deployment. You can use the MC point and click. With regard to security, you know, Vertica does support instance profiles out of the box in Amazon. We recommend you use it. This is highly desirable so that you're not passing access keys and secret keys around. If you use our marketplace image, we have picked the latest operating systems, we have patched them, Amazon actually validates everything on marketplace and scans them for security vulnerabilities. So you get that for free. We do some basic configuration, like we disable root ssh access, we disallow any password access, we turn on encryption. And we run a basic set of security checks to make sure that the image is secure. Of course, it could be made more secure. But we try to balance out security, performance, and convenience. And finally, let's talk about backups. Especially in Eon mode I get the question, "Do we really need to back up our system, "since the data is in S3?" And the answer is yes, you do. Because you know, S3's not going to protect you against an accidental drop table. You know, S3 has a finite amount of reliability, durability, and availability. And you may want to be able to restore data differently. Also, backups are important if you're doing DR, or if you have additional cluster in a different region. The other cluster can be considered a backup. And finally, you know, why not create a backup or a disaster recovery cluster, you know, storage is cheap in the Cloud. So you know, we highly recommend you use it. So with this, I would like to hand it over to my colleague Christopher Daly, who will talk about the other two platforms that we support, that is Google and Azure. Over to you, Chris, thank you. >> Chris: Thanks, Sumeet, and hi everyone. So while there's no argument that we here at Vertica have a long history of running within the Amazon Web Services space, there are other alternative Cloud service providers where we do have a presence, such as Google Cloud Platform, or GCP. For those of you who are unfamiliar with GCP, it's considered the third-largest Cloud service provider in the marketspace, and it's priced very competitively to its peers. Has a lot of similarities to AWS in the products and services that it offers, but it tends to be the go-to place for newer businesses or startups. We officially started supporting GCP a little over a year ago with our first entry into their GCP marketplace. So a solution that deployed a fully-functional and ready-to-use Enterprise mode cluster. We followed up on that with the release and the support of Google storage buckets, and now I'm extremely pleased to announce that with the launch of Vertica 10, we're officially supporting Eon mode architecture in GCP as well. But that's not all, as we're adding additional offerings into the GCP marketplace. With the launch of version 10 we'll be introducing a second listing in the marketplace that allows for the deployment of an Eon mode cluster. It's all being driven by our own management consult. This will allow customers to quickly spin up Eon-based clusters within the GCP space. And if that wasn't enough, I'm also pleased to tell you that very soon after the launch we're going to be offering Vertica by the hour in GCP as well. And while we've done a lot to automate the solutions coming out of the marketplace, we recognize the simple fact that for a lot of you, building your cluster manually is really the only option. So with that in mind, let's talk about the things you need to understand in GCP to get that done. So wag me if you think this slide looks familiar. Well nope, it's not an erroneous duplicate slide from Sumeet's AWS section, it's merely an acknowledgement of all the things you need to consider for running Vertica in the Cloud. In Vertica, the choice of the operational mode will dictate some of the choices you'll need to make in the infrastructure, particularly around storage. Just like on-prem solutions, you'll need to understand the disk and networking capacities to get the most out of your cluster. And one of the most attractive things in GCP is the pricing, as it tends to run a little less than the others. But it does translate into less choices and options within the environment. If nothing else, I want you to take one thing away from this slide, and Sumeet said this earlier. VMs running, about AWS, Sumeet said this about AWS earlier. VMs running in the GCP space run on top of hardware that has hyperthreading enabled. And that a vCPU doesn't equate to a core, but rather a processing thread. This becomes particularly important if you're moving from an on-prem environment into the Cloud. Because a physical Vertica node with 32 cores is not the same thing as a VM with 32 vCPUs. In fact, with 32 vCPUs, you're only getting about 16 cores worth of performance. GCP does offer a handful of VM types, which they categorize by letter, but for us, most of these don't make great choices for Vertica nodes. The M series, however, does offer a good core to memory ratio, especially when you're looking at the high-mem variants. Also keep in mind, performance in I/O, such as network and disk, are partially dependent on the VM size, so customers in GCP space should be focusing on 16 vCPU VMs and above for their Vertica nodes. Disk options in GCP can be broken down into two basic types, persistent disks and local disks, which are ephemeral. Persistent disks come in two forms, standard or SSD. For Vertica in Eon mode, we recommend that customers use persistent SSD disks for the catalog, and either local SSD disks or persistent SSD disks for the depot and the temp space. Couple of things to think about here, though. Persistent disks are provisioned as a single device with a settable size. Local disks are provisioned as multiple disk devices with a fixed size, requiring you to use some kind of software RAIDing to create a single storage device. So while local SSD disks provide much more throughput, you're using CPU resources to maintain that RAID set. So you're giving, it's a little bit of a trade-off. Persistent disks offer redundancy, either within the zone that they exist or within the region, and if you're selecting regional redundancy, the disks are replicated across multiple zones in the region. This does have an effect in the performance to VM, so we don't recommend this. What we do recommend is the zonal redundancy when you're using persistent disks, as it gives you that redundancy level without actually affecting the performance. Remember also, in the Cloud space, all I/O is network I/O, as disks are basically block storage devices. This means that disk actions can and will slow down network traffic. And finally, the storage bucket access in GCP is based on GCP interoperability mode, which means that it's basically compliant with the AWS S3 API. In interoperability mode, access to the bucket is granted by a key pair that GCP refers to as HMAC keys. HMAC keys can be generated for individual users or for service accounts. We will recommend that when you're creating HMAC keys, choose a service account to ensure that the keys are not tied to a single employee. When thinking about storage for Enterprise mode, things change a little bit. We still recommend persistent SSD disks over standard ones. However, the use of local SSD disks for anything other than temp space is highly discouraged. I said it before, local SSD disks are ephemeral, meaning that the data's lost if the machine is turned off or goes down. So not really a place you want to store your data. In GCP, multiple persistent disks placed into a software RAID set does not create more throughput like you can find in other Clouds. The I/O saturation usually hits the VM limit long before it hits the disk limit. In fact, performance of a persistent disk is determined not just by the size of the disk but also by the size of the VM. So a good rule of thumb in GCP is to maximize your I/O throughput for persistent disks, is that the size tends to max out at two terabytes for SSDs and 10 terabytes for standard disks. Network performance in GCP can be thought of in two distinct ways. There's node-to-node traffic, and then there's egress traffic. Node-to-node performance in GCP is really good within the zone, with typical traffic between nodes falling in the 10-15 gigabits per second range. This might vary a little from zone to zone and region to region, but usually it's only limited, they're only limited by the existing traffic where the VMs exist. So kind of a noisy neighbor effect. Egress traffic from a VM, however, is subject to throughput caps, and these are based on the size of the VM. So the speed is set for the number of vCPUs in the VM at two gigabits per second per vCPU, and tops out at 32 gigabits per second. So the larger the VM, the more vCPUs you get, the larger the cap. So some things to consider in the NAV ring space for your Vertica cluster, pick a region that's physically close to you, even if you're connecting to the GCP network from a corporate LAN as opposed to the internet. The further the packets have to travel, the longer it's going to take. Also, GCP, like most Clouds, doesn't support UDP broadcast traffic on their virtual NAV ring, so you do have to use the point-to-point flag for spread when you're creating your cluster. And since the network cap on VMs is set at 32 gigabits per second per VM, maximize your network egress throughput and don't use VMs that are smaller than 16 vCPUs for your Vertica nodes. And that gets us to the one question I get asked the most often. How do I get my data into and out of the Cloud? Well, GCP offers many different methods to support different speeds and different price points for data ingress and egress. There's the obvious one, right, across the internet either directly to the VMs or into the storage bucket. Or you can, you know, light up a VPN tunnel to encrypt all that traffic. But additionally, GCP offers direct network interconnect from your corporate network. These get provided either by Google or by a partner, and they vary in speed. They also offer things called direct or carrier peering, which is connecting the edges of the networks between your network and GCP, and you can use a CDN interconnect, which creates, I believe, an on-demand connection from the GCP network, your network to the GCP network provided by a large host of CDN service providers. So GCP offers a lot of ways to move your data around in and out of the GCP Cloud. It's really a matter of what price point works for you, and what technology your corporation is looking to use. So we've talked about AWS, we've talked about GCP, it really only leaves one more Cloud. So last, and by far not the least, there's the Microsoft Azure environment. Holding on strong to the number two place in the major Cloud providers, Azure offers a very robust Cloud offering that's attractive to customers that already consume services from Microsoft. But what you need to keep in mind is that the underlying foundation of their Cloud is based on the Microsoft Windows products. And this makes their Cloud offering a little bit different in the services and offerings that they have. The good news here, though, is that Microsoft has done a very good job of getting their virtualization drivers baked into the modern kernels of most Linux operating systems, making running Linux-based VMs in Azure fairly seamless. So here's the slide again, but now you're going to notice some slight differences. First off, in Azure we only support Enterprise mode. This is because the Azure storage product is very different from Google Cloud storage and S3 on AWS. So while we're working on getting this supported, and we're starting to focus on this, we're just not there yet. This means that since we're only supporting Enterprise mode in Azure, getting the local disk performance right is one of the keys to success of running Vertica here, with the other major key being making sure that you're getting the appropriate networking speeds. Overall, Azure's a really good platform for Vertica, and its performance and pricing are very much on par with AWS. But keep in mind that the newer versions of the Linux operating systems like RHEL and CentOS run much better here than the older versions. Okay, so first things first again, just like GCP, in Azure VMs are running on top of hardware that has hyperthreading enabled. And because of the way Hyper-V, Azure's virtualization engine works, you can actually see this, right? So if you look down into the CPU information of the VM, you'll actually see how it groups the vCPUs by core and by thread. Azure offers a lot of VM types, and is adding new ones all the time. But for us, we see three VM types that make the most sense for Vertica. For customers that are looking to run production workloads in Azure, the Es_v3 and the Ls_v2 series are the two main recommendations. While they differ slightly in the CPU to memory ratio and the I/O throughput, the Es_v3 series is probably the best recommendation for a generalized Vertica node, with the Ls_v2 series being recommended for workloads with higher I/O requirements. If you're just looking to deploy a sandbox environment, the Ds_v3 series is a very suitable choice that really can reduce your overall Cloud spend. VM storage in Azure is provided by a grouping of four different types of disks, all offering different levels of performance. Introduced at the end of last year, the Ultra Disk option is the highest-performing disk type for VMs in Azure. It was designed for database workloads where high throughput and low latency is very desirable. However, the Ultra Disk option is not available in all regions yet, although that's been changing slowly since their launch. The Premium SSD option, which has been around for a while and is widely available, can also offer really nice performance, especially higher capacities. And just like other Cloud providers, the I/O throughput you get on VMs is dictated not only by the size of the disk, but also by the size of the VM and its type. So a good rule of thumb here, VM types with an S will have a much better throughput rate than ones that don't, meaning, and the larger VMs will have, you know, higher I/O throughput than the smaller ones. You can expand the VM disk throughput by using multiple disks in Azure and using a software RAID. This overcomes limitations of single disk performance, but keep in mind, you're now using CPU cycles to maintain that raid, so it is a bit of a trade-off. The other nice thing in Azure is that all their managed disks are encrypted by default on the server side, so there's really nothing you need to do here to enable that. And of course I mentioned this earlier. There is no native access to Azure storage yet, but it is something we're working on. We have seen folks using third-party applications like MinIO to access Azure's storage as an S3 bucket. So it might be something you want to keep in mind and maybe even test out for yourself. Networking in Azure comes in two different flavors, standard and accelerated. In standard networking, the entire network stack is abstracted and virtualized. So this works really well, however, there are performance limitations. Standard networking tends to top out around four gigabits per second. Accelerated networking in Azure is based on single root I/O virtualization of the Mellanox adapter. This is basically the VM talking directly to the physical network card in the host hardware, and it can produce network speeds up to 20 gigabits per second, so much, much faster. Keep in mind, though, that not all VM types and operating systems actually support accelerated networking, and you know, just like disk throughput, network throughput is based on VM type and size. So what do you need to think about for networking in the Azure space? Again, stay close to home. Pick regions that are geographically close to your location. Yes, the backbones between the regions are very, very fast, but the more hops your packets have to make, the longer it takes. Azure offers two types of groupings of their VMs, availability sets and availability zones. Availability zones offer good redundancy across multiple zones, but this actually increases the node-to-node latency, so we recommend you avoid this. Availability sets, on the other hand, keep all your VMs grouped together within a single zone, but makes sure that no two VMs are running on the same host hardware, for redundancy. And just like the other Clouds, UDP broadcast is not supported. So you have to use the point-to-point flag when you're creating your database to ensure that the spread works properly. Spread time out, okay, this is a good one. So recently, Microsoft has started monthly rolling updates of their environment. What this looks like is VMs running on top of hardware that's receiving an update can be paused. And this becomes problematic when the pausing of the VM exceeds eight seconds, as the unpaused members of the cluster now think the paused VM is down. So consider adjusting the spread time out for your clusters in Azure to 30 seconds, and this will help avoid a little of that. If you're deploying a large cluster in Azure, more than 20 nodes, use large closer mode, as point-to-point for spread doesn't really scale well with a lot of Vertica nodes. And finally, you know, pick VM types and operating systems that support accelerated networking. The difference in the node-to-node speeds can be very dramatic. So how do we move data around in Azure, right? So Microsoft views data egress a little differently than other Clouds, as it classifies any data being transmitted by a VM as egress. However, it only bills for data egress that actually leaves the Azure environment. Egress speed limits in Azure are based entirely on the VM type and size, and then they're limited by your connection to them. While not offering as many pathways to access their Cloud as GCP, Azure does offer a direct network-to-network connection called ExpressRoute. Offered by a large group of third-party processors, partners, the ExpressRoute offers multiple tiers of performance that are based on a flat charge for inbound data and a metered charge for outbound data. And of course you can still access these via the internet, and securely through a VPN gateway. So on behalf of Jeff, Sumeet, and myself, I'd like to thank you for listening to our presentation today, and we're now ready for Q&A.

Published Date : Mar 30 2020

SUMMARY :

Also as a reminder that you can maximize your screen So the best, the best thing you can do and the larger VMs will have, you know,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
ChrisPERSON

0.99+

SumeetPERSON

0.99+

Jeff HealeyPERSON

0.99+

Chris DalyPERSON

0.99+

JeffPERSON

0.99+

Christopher DalyPERSON

0.99+

Sumeet KeswaniPERSON

0.99+

GoogleORGANIZATION

0.99+

VerticaORGANIZATION

0.99+

AWSORGANIZATION

0.99+

MicrosoftORGANIZATION

0.99+

10 GbpsQUANTITY

0.99+

AmazonORGANIZATION

0.99+

forum.vertica.comOTHER

0.99+

30 secondsQUANTITY

0.99+

Amazon Web ServicesORGANIZATION

0.99+

RHELTITLE

0.99+

TodayDATE

0.99+

32 coresQUANTITY

0.99+

CentOSTITLE

0.99+

more than 20 nodesQUANTITY

0.99+

32 vCPUsQUANTITY

0.99+

two platformsQUANTITY

0.99+

eight secondsQUANTITY

0.99+

VerticaTITLE

0.99+

10 terabytesQUANTITY

0.99+

oneQUANTITY

0.99+

todayDATE

0.99+

bothQUANTITY

0.99+

20 nodesQUANTITY

0.99+

two terabytesQUANTITY

0.99+

each applicationQUANTITY

0.99+

S3TITLE

0.99+

two typesQUANTITY

0.99+

LinuxTITLE

0.99+

two subclustersQUANTITY

0.98+

first entryQUANTITY

0.98+

one questionQUANTITY

0.98+

fourQUANTITY

0.98+

AzureTITLE

0.98+

Vertica 10TITLE

0.98+

4/2DATE

0.98+

FirstQUANTITY

0.98+

16 vCPUQUANTITY

0.98+

two formsQUANTITY

0.97+

MinIOTITLE

0.97+

single employeeQUANTITY

0.97+

firstQUANTITY

0.97+

this weekDATE

0.96+

UNLIST TILL 4/2 - Vertica Big Data Conference Keynote


 

>> Joy: Welcome to the Virtual Big Data Conference. Vertica is so excited to host this event. I'm Joy King, and I'll be your host for today's Big Data Conference Keynote Session. It's my honor and my genuine pleasure to lead Vertica's product and go-to-market strategy. And I'm so lucky to have a passionate and committed team who turned our Vertica BDC event, into a virtual event in a very short amount of time. I want to thank the thousands of people, and yes, that's our true number who have registered to attend this virtual event. We were determined to balance your health, safety and your peace of mind with the excitement of the Vertica BDC. This is a very unique event. Because as I hope you all know, we focus on engineering and architecture, best practice sharing and customer stories that will educate and inspire everyone. I also want to thank our top sponsors for the virtual BDC, Arrow, and Pure Storage. Our partnerships are so important to us and to everyone in the audience. Because together, we get things done faster and better. Now for today's keynote, you'll hear from three very important and energizing speakers. First, Colin Mahony, our SVP and General Manager for Vertica, will talk about the market trends that Vertica is betting on to win for our customers. And he'll share the exciting news about our Vertica 10 announcement and how this will benefit our customers. Then you'll hear from Amy Fowler, VP of strategy and solutions for FlashBlade at Pure Storage. Our partnership with Pure Storage is truly unique in the industry, because together modern infrastructure from Pure powers modern analytics from Vertica. And then you'll hear from John Yovanovich, Director of IT at AT&T, who will tell you about the Pure Vertica Symphony that plays live every day at AT&T. Here we go, Colin, over to you. >> Colin: Well, thanks a lot joy. And, I want to echo Joy's thanks to our sponsors, and so many of you who have helped make this happen. This is not an easy time for anyone. We were certainly looking forward to getting together in person in Boston during the Vertica Big Data Conference and Winning with Data. But I think all of you and our team have done a great job, scrambling and putting together a terrific virtual event. So really appreciate your time. I also want to remind people that we will make both the slides and the full recording available after this. So for any of those who weren't able to join live, that is still going to be available. Well, things have been pretty exciting here. And in the analytic space in general, certainly for Vertica, there's a lot happening. There are a lot of problems to solve, a lot of opportunities to make things better, and a lot of data that can really make every business stronger, more efficient, and frankly, more differentiated. For Vertica, though, we know that focusing on the challenges that we can directly address with our platform, and our people, and where we can actually make the biggest difference is where we ought to be putting our energy and our resources. I think one of the things that has made Vertica so strong over the years is our ability to focus on those areas where we can make a great difference. So for us as we look at the market, and we look at where we play, there are really three recent and some not so recent, but certainly picking up a lot of the market trends that have become critical for every industry that wants to Win Big With Data. We've heard this loud and clear from our customers and from the analysts that cover the market. If I were to summarize these three areas, this really is the core focus for us right now. We know that there's massive data growth. And if we can unify the data silos so that people can really take advantage of that data, we can make a huge difference. We know that public clouds offer tremendous advantages, but we also know that balance and flexibility is critical. And we all need the benefit that machine learning for all the types up to the end data science. We all need the benefits that they can bring to every single use case, but only if it can really be operationalized at scale, accurate and in real time. And the power of Vertica is, of course, how we're able to bring so many of these things together. Let me talk a little bit more about some of these trends. So one of the first industry trends that we've all been following probably now for over the last decade, is Hadoop and specifically HDFS. So many companies have invested, time, money, more importantly, people in leveraging the opportunity that HDFS brought to the market. HDFS is really part of a much broader storage disruption that we'll talk a little bit more about, more broadly than HDFS. But HDFS itself was really designed for petabytes of data, leveraging low cost commodity hardware and the ability to capture a wide variety of data formats, from a wide variety of data sources and applications. And I think what people really wanted, was to store that data before having to define exactly what structures they should go into. So over the last decade or so, the focus for most organizations is figuring out how to capture, store and frankly manage that data. And as a platform to do that, I think, Hadoop was pretty good. It certainly changed the way that a lot of enterprises think about their data and where it's locked up. In parallel with Hadoop, particularly over the last five years, Cloud Object Storage has also given every organization another option for collecting, storing and managing even more data. That has led to a huge growth in data storage, obviously, up on public clouds like Amazon and their S3, Google Cloud Storage and Azure Blob Storage just to name a few. And then when you consider regional and local object storage offered by cloud vendors all over the world, the explosion of that data, in leveraging this type of object storage is very real. And I think, as I mentioned, it's just part of this broader storage disruption that's been going on. But with all this growth in the data, in all these new places to put this data, every organization we talk to is facing even more challenges now around the data silo. Sure the data silos certainly getting bigger. And hopefully they're getting cheaper per bit. But as I said, the focus has really been on collecting, storing and managing the data. But between the new data lakes and many different cloud object storage combined with all sorts of data types from the complexity of managing all this, getting that business value has been very limited. This actually takes me to big bet number one for Team Vertica, which is to unify the data. Our goal, and some of the announcements we have made today plus roadmap announcements I'll share with you throughout this presentation. Our goal is to ensure that all the time, money and effort that has gone into storing that data, all the data turns into business value. So how are we going to do that? With a unified analytics platform that analyzes the data wherever it is HDFS, Cloud Object Storage, External tables in an any format ORC, Parquet, JSON, and of course, our own Native Roth Vertica format. Analyze the data in the right place in the right format, using a single unified tool. This is something that Vertica has always been committed to, and you'll see in some of our announcements today, we're just doubling down on that commitment. Let's talk a little bit more about the public cloud. This is certainly the second trend. It's the second wave maybe of data disruption with object storage. And there's a lot of advantages when it comes to public cloud. There's no question that the public clouds give rapid access to compute storage with the added benefit of eliminating data center maintenance that so many companies, want to get out of themselves. But maybe the biggest advantage that I see is the architectural innovation. The public clouds have introduced so many methodologies around how to provision quickly, separating compute and storage and really dialing-in the exact needs on demand, as you change workloads. When public clouds began, it made a lot of sense for the cloud providers and their customers to charge and pay for compute and storage in the ratio that each use case demanded. And I think you're seeing that trend, proliferate all over the place, not just up in public cloud. That architecture itself is really becoming the next generation architecture for on-premise data centers, as well. But there are a lot of concerns. I think we're all aware of them. They're out there many times for different workloads, there are higher costs. Especially if some of the workloads that are being run through analytics, which tend to run all the time. Just like some of the silo challenges that companies are facing with HDFS, data lakes and cloud storage, the public clouds have similar types of siloed challenges as well. Initially, there was a belief that they were cheaper than data centers, and when you added in all the costs, it looked that way. And again, for certain elastic workloads, that is the case. I don't think that's true across the board overall. Even to the point where a lot of the cloud vendors aren't just charging lower costs anymore. We hear from a lot of customers that they don't really want to tether themselves to any one cloud because of some of those uncertainties. Of course, security and privacy are a concern. We hear a lot of concerns with regards to cloud and even some SaaS vendors around shared data catalogs, across all the customers and not enough separation. But security concerns are out there, you can read about them. I'm not going to jump into that bandwagon. But we hear about them. And then, of course, I think one of the things we hear the most from our customers, is that each cloud stack is starting to feel even a lot more locked in than the traditional data warehouse appliance. And as everybody knows, the industry has been running away from appliances as fast as it can. And so they're not eager to get locked into another, quote, unquote, virtual appliance, if you will, up in the cloud. They really want to make sure they have flexibility in which clouds, they're going to today, tomorrow and in the future. And frankly, we hear from a lot of our customers that they're very interested in eventually mixing and matching, compute from one cloud with, say storage from another cloud, which I think is something that we'll hear a lot more about. And so for us, that's why we've got our big bet number two. we love the cloud. We love the public cloud. We love the private clouds on-premise, and other hosting providers. But our passion and commitment is for Vertica to be able to run in any of the clouds that our customers choose, and make it portable across those clouds. We have supported on-premises and all public clouds for years. And today, we have announced even more support for Vertica in Eon Mode, the deployment option that leverages the separation of compute from storage, with even more deployment choices, which I'm going to also touch more on as we go. So super excited about our big bet number two. And finally as I mentioned, for all the hype that there is around machine learning, I actually think that most importantly, this third trend that team Vertica is determined to address is the need to bring business critical, analytics, machine learning, data science projects into production. For so many years, there just wasn't enough data available to justify the investment in machine learning. Also, processing power was expensive, and storage was prohibitively expensive. But to train and score and evaluate all the different models to unlock the full power of predictive analytics was tough. Today you have those massive data volumes. You have the relatively cheap processing power and storage to make that dream a reality. And if you think about this, I mean with all the data that's available to every company, the real need is to operationalize the speed and the scale of machine learning so that these organizations can actually take advantage of it where they need to. I mean, we've seen this for years with Vertica, going back to some of the most advanced gaming companies in the early days, they were incorporating this with live data directly into their gaming experiences. Well, every organization wants to do that now. And the accuracy for clickability and real time actions are all key to separating the leaders from the rest of the pack in every industry when it comes to machine learning. But if you look at a lot of these projects, the reality is that there's a ton of buzz, there's a ton of hype spanning every acronym that you can imagine. But most companies are struggling, do the separate teams, different tools, silos and the limitation that many platforms are facing, driving, down sampling to get a small subset of the data, to try to create a model that then doesn't apply, or compromising accuracy and making it virtually impossible to replicate models, and understand decisions. And if there's one thing that we've learned when it comes to data, prescriptive data at the atomic level, being able to show end of one as we refer to it, meaning individually tailored data. No matter what it is healthcare, entertainment experiences, like gaming or other, being able to get at the granular data and make these decisions, make that scoring applies to machine learning just as much as it applies to giving somebody a next-best-offer. But the opportunity has never been greater. The need to integrate this end-to-end workflow and support the right tools without compromising on that accuracy. Think about it as no downsampling, using all the data, it really is key to machine learning success. Which should be no surprise then why the third big bet from Vertica is one that we've actually been working on for years. And we're so proud to be where we are today, helping the data disruptors across the world operationalize machine learning. This big bet has the potential to truly unlock, really the potential of machine learning. And today, we're announcing some very important new capabilities specifically focused on unifying the work being done by the data science community, with their preferred tools and platforms, and the volume of data and performance at scale, available in Vertica. Our strategy has been very consistent over the last several years. As I said in the beginning, we haven't deviated from our strategy. Of course, there's always things that we add. Most of the time, it's customer driven, it's based on what our customers are asking us to do. But I think we've also done a great job, not trying to be all things to all people. Especially as these hype cycles flare up around us, we absolutely love participating in these different areas without getting completely distracted. I mean, there's a variety of query tools and data warehouses and analytics platforms in the market. We all know that. There are tools and platforms that are offered by the public cloud vendors, by other vendors that support one or two specific clouds. There are appliance vendors, who I was referring to earlier who can deliver package data warehouse offerings for private data centers. And there's a ton of popular machine learning tools, languages and other kits. But Vertica is the only advanced analytic platform that can do all this, that can bring it together. We can analyze the data wherever it is, in HDFS, S3 Object Storage, or Vertica itself. Natively we support multiple clouds on-premise deployments, And maybe most importantly, we offer that choice of deployment modes to allow our customers to choose the architecture that works for them right now. It still also gives them the option to change move, evolve over time. And Vertica is the only analytics database with end-to-end machine learning that can truly operationalize ML at scale. And I know it's a mouthful. But it is not easy to do all these things. It is one of the things that highly differentiates Vertica from the rest of the pack. It is also why our customers, all of you continue to bet on us and see the value that we are delivering and we will continue to deliver. Here's a couple of examples of some of our customers who are powered by Vertica. It's the scale of data. It's the millisecond response times. Performance and scale have always been a huge part of what we have been about, not the only thing. I think the functionality all the capabilities that we add to the platform, the ease of use, the flexibility, obviously with the deployment. But if you look at some of the numbers they are under these customers on this slide. And I've shared a lot of different stories about these customers. Which, by the way, it still amaze me every time I talk to one and I get the updates, you can see the power and the difference that Vertica is making. Equally important, if you look at a lot of these customers, they are the epitome of being able to deploy Vertica in a lot of different environments. Many of the customers on this slide are not using Vertica just on-premise or just in the cloud. They're using it in a hybrid way. They're using it in multiple different clouds. And again, we've been with them on that journey throughout, which is what has made this product and frankly, our roadmap and our vision exactly what it is. It's been quite a journey. And that journey continues now with the Vertica 10 release. The Vertica 10 release is obviously a massive release for us. But if you look back, you can see that building on that native columnar architecture that started a long time ago, obviously, with the C-Store paper. We built it to leverage that commodity hardware, because it was an architecture that was never tightly integrated with any specific underlying infrastructure. I still remember hearing the initial pitch from Mike Stonebreaker, about the vision of Vertica as a software only solution and the importance of separating the company from hardware innovation. And at the time, Mike basically said to me, "there's so much R&D in innovation that's going to happen in hardware, we shouldn't bake hardware into our solution. We should do it in software, and we'll be able to take advantage of that hardware." And that is exactly what has happened. But one of the most recent innovations that we embraced with hardware is certainly that separation of compute and storage. As I said previously, the public cloud providers offered this next generation architecture, really to ensure that they can provide the customers exactly what they needed, more compute or more storage and charge for each, respectively. The separation of compute and storage, compute from storage is a major milestone in data center architectures. If you think about it, it's really not only a public cloud innovation, though. It fundamentally redefines the next generation data architecture for on-premise and for pretty much every way people are thinking about computing today. And that goes for software too. Object storage is an example of the cost effective means for storing data. And even more importantly, separating compute from storage for analytic workloads has a lot of advantages. Including the opportunity to manage much more dynamic, flexible workloads. And more importantly, truly isolate those workloads from others. And by the way, once you start having something that can truly isolate workloads, then you can have the conversations around autonomic computing, around setting up some nodes, some compute resources on the data that won't affect any of the other data to do some things on their own, maybe some self analytics, by the system, etc. A lot of things that many of you know we've already been exploring in terms of our own system data in the product. But it was May 2018, believe it or not, it seems like a long time ago where we first announced Eon Mode and I want to make something very clear, actually about Eon mode. It's a mode, it's a deployment option for Vertica customers. And I think this is another huge benefit that we don't talk about enough. But unlike a lot of vendors in the market who will dig you and charge you for every single add-on like hit-buy, you name it. You get this with the Vertica product. If you continue to pay support and maintenance, this comes with the upgrade. This comes as part of the new release. So any customer who owns or buys Vertica has the ability to set up either an Enterprise Mode or Eon Mode, which is a question I know that comes up sometimes. Our first announcement of Eon was obviously AWS customers, including the trade desk, AT&T. Most of whom will be speaking here later at the Virtual Big Data Conference. They saw a huge opportunity. Eon Mode, not only allowed Vertica to scale elastically with that specific compute and storage that was needed, but it really dramatically simplified database operations including things like workload balancing, node recovery, compute provisioning, etc. So one of the most popular functions is that ability to isolate the workloads and really allocate those resources without negatively affecting others. And even though traditional data warehouses, including Vertica Enterprise Mode have been able to do lots of different workload isolation, it's never been as strong as Eon Mode. Well, it certainly didn't take long for our customers to see that value across the board with Eon Mode. Not just up in the cloud, in partnership with one of our most valued partners and a platinum sponsor here. Joy mentioned at the beginning. We announced Vertica Eon Mode for Pure Storage FlashBlade in September 2019. And again, just to be clear, this is not a new product, it's one Vertica with yet more deployment options. With Pure Storage, Vertica in Eon mode is not limited in any way by variable cloud, network latency. The performance is actually amazing when you take the benefits of separate and compute from storage and you run it with a Pure environment on-premise. Vertica in Eon Mode has a super smart cache layer that we call the depot. It's a big part of our secret sauce around Eon mode. And combined with the power and performance of Pure's FlashBlade, Vertica became the industry's first advanced analytics platform that actually separates compute and storage for on-premises data centers. Something that a lot of our customers are already benefiting from, and we're super excited about it. But as I said, this is a journey. We don't stop, we're not going to stop. Our customers need the flexibility of multiple public clouds. So today with Vertica 10, we're super proud and excited to announce support for Vertica in Eon Mode on Google Cloud. This gives our customers the ability to use their Vertica licenses on Amazon AWS, on-premise with Pure Storage and on Google Cloud. Now, we were talking about HDFS and a lot of our customers who have invested quite a bit in HDFS as a place, especially to store data have been pushing us to support Eon Mode with HDFS. So as part of Vertica 10, we are also announcing support for Vertica in Eon Mode using HDFS as the communal storage. Vertica's own Roth format data can be stored in HDFS, and actually the full functionality of Vertica is complete analytics, geospatial pattern matching, time series, machine learning, everything that we have in there can be applied to this data. And on the same HDFS nodes, Vertica can actually also analyze data in ORC or Parquet format, using External tables. We can also execute joins between the Roth data the External table holds, which powers a much more comprehensive view. So again, it's that flexibility to be able to support our customers, wherever they need us to support them on whatever platform, they have. Vertica 10 gives us a lot more ways that we can deploy Eon Mode in various environments for our customers. It allows them to take advantage of Vertica in Eon Mode and the power that it brings with that separation, with that workload isolation, to whichever platform they are most comfortable with. Now, there's a lot that has come in Vertica 10. I'm definitely not going to be able to cover everything. But we also introduced complex types as an example. And complex data types fit very well into Eon as well in this separation. They significantly reduce the data pipeline, the cost of moving data between those, a much better support for unstructured data, which a lot of our customers have mixed with structured data, of course, and they leverage a lot of columnar execution that Vertica provides. So you get complex data types in Vertica now, a lot more data, stronger performance. It goes great with the announcement that we made with the broader Eon Mode. Let's talk a little bit more about machine learning. We've been actually doing work in and around machine learning with various extra regressions and a whole bunch of other algorithms for several years. We saw the huge advantage that MPP offered, not just as a sequel engine as a database, but for ML as well. Didn't take as long to realize that there's a lot more to operationalizing machine learning than just those algorithms. It's data preparation, it's that model trade training. It's the scoring, the shaping, the evaluation. That is so much of what machine learning and frankly, data science is about. You do know, everybody always wants to jump to the sexy algorithm and we handle those tasks very, very well. It makes Vertica a terrific platform to do that. A lot of work in data science and machine learning is done in other tools. I had mentioned that there's just so many tools out there. We want people to be able to take advantage of all that. We never believed we were going to be the best algorithm company or come up with the best models for people to use. So with Vertica 10, we support PMML. We can import now and export PMML models. It's a huge step for us around that operationalizing machine learning projects for our customers. Allowing the models to get built outside of Vertica yet be imported in and then applying to that full scale of data with all the performance that you would expect from Vertica. We also are more tightly integrating with Python. As many of you know, we've been doing a lot of open source projects with the community driven by many of our customers, like Uber. And so now with Python we've integrated with TensorFlow, allowing data scientists to build models in their preferred language, to take advantage of TensorFlow. But again, to store and deploy those models at scale with Vertica. I think both these announcements are proof of our big bet number three, and really our commitment to supporting innovation throughout the community by operationalizing ML with that accuracy, performance and scale of Vertica for our customers. Again, there's a lot of steps when it comes to the workflow of machine learning. These are some of them that you can see on the slide, and it's definitely not linear either. We see this as a circle. And companies that do it, well just continue to learn, they continue to rescore, they continue to redeploy and they want to operationalize all that within a single platform that can take advantage of all those capabilities. And that is the platform, with a very robust ecosystem that Vertica has always been committed to as an organization and will continue to be. This graphic, many of you have seen it evolve over the years. Frankly, if we put everything and everyone on here wouldn't fit on a slide. But it will absolutely continue to evolve and grow as we support our customers, where they need the support most. So, again, being able to deploy everywhere, being able to take advantage of Vertica, not just as a business analyst or a business user, but as a data scientists or as an operational or BI person. We want Vertica to be leveraged and used by the broader organization. So I think it's fair to say and I encourage everybody to learn more about Vertica 10, because I'm just highlighting some of the bigger aspects of it. But we talked about those three market trends. The need to unify the silos, the need for hybrid multiple cloud deployment options, the need to operationalize business critical machine learning projects. Vertica 10 has absolutely delivered on those. But again, we are not going to stop. It is our job not to, and this is how Team Vertica thrives. I always joke that the next release is the best release. And, of course, even after Vertica 10, that is also true, although Vertica 10 is pretty awesome. But, you know, from the first line of code, we've always been focused on performance and scale, right. And like any really strong data platform, the execution engine, the optimizer and the execution engine are the two core pieces of that. Beyond Vertica 10, some of the big things that we're already working on, next generation execution engine. We're already actually seeing incredible early performance from this. And this is just one example, of how important it is for an organization like Vertica to constantly go back and re-innovate. Every single release, we do the sit ups and crunches, our performance and scale. How do we improve? And there's so many parts of the core server, there's so many parts of our broader ecosystem. We are constantly looking at coverages of how we can go back to all the code lines that we have, and make them better in the current environment. And it's not an easy thing to do when you're doing that, and you're also expanding in the environment that we are expanding into to take advantage of the different deployments, which is a great segue to this slide. Because if you think about today, we're obviously already available with Eon Mode and Amazon, AWS and Pure and actually MinIO as well. As I talked about in Vertica 10 we're adding Google and HDFS. And coming next, obviously, Microsoft Azure, Alibaba cloud. So being able to expand into more of these environments is really important for the Vertica team and how we go forward. And it's not just running in these clouds, for us, we want it to be a SaaS like experience in all these clouds. We want you to be able to deploy Vertica in 15 minutes or less on these clouds. You can also consume Vertica, in a lot of different ways, on these clouds. As an example, in Amazon Vertica by the Hour. So for us, it's not just about running, it's about taking advantage of the ecosystems that all these cloud providers offer, and really optimizing the Vertica experience as part of them. Optimization, around automation, around self service capabilities, extending our management console, we now have products that like the Vertica Advisor Tool that our Customer Success Team has created to actually use our own smarts in Vertica. To take data from customers that give it to us and help them tune automatically their environment. You can imagine that we're taking that to the next level, in a lot of different endeavors that we're doing around how Vertica as a product can actually be smarter because we all know that simplicity is key. There just aren't enough people in the world who are good at managing data and taking it to the next level. And of course, other things that we all hear about, whether it's Kubernetes and containerization. You can imagine that that probably works very well with the Eon Mode and separating compute and storage. But innovation happens everywhere. We innovate around our community documentation. Many of you have taken advantage of the Vertica Academy. The numbers there are through the roof in terms of the number of people coming in and certifying on it. So there's a lot of things that are within the core products. There's a lot of activity and action beyond the core products that we're taking advantage of. And let's not forget why we're here, right? It's easy to talk about a platform, a data platform, it's easy to jump into all the functionality, the analytics, the flexibility, how we can offer it. But at the end of the day, somebody, a person, she's got to take advantage of this data, she's got to be able to take this data and use this information to make a critical business decision. And that doesn't happen unless we explore lots of different and frankly, new ways to get that predictive analytics UI and interface beyond just the standard BI tools in front of her at the right time. And so there's a lot of activity, I'll tease you with that going on in this organization right now about how we can do that and deliver that for our customers. We're in a great position to be able to see exactly how this data is consumed and used and start with this core platform that we have to go out. Look, I know, the plan wasn't to do this as a virtual BDC. But I really appreciate you tuning in. Really appreciate your support. I think if there's any silver lining to us, maybe not being able to do this in person, it's the fact that the reach has actually gone significantly higher than what we would have been able to do in person in Boston. We're certainly looking forward to doing a Big Data Conference in the future. But if I could leave you with anything, know this, since that first release for Vertica, and our very first customers, we have been very consistent. We respect all the innovation around us, whether it's open source or not. We understand the market trends. We embrace those new ideas and technologies and for us true north, and the most important thing is what does our customer need to do? What problem are they trying to solve? And how do we use the advantages that we have without disrupting our customers? But knowing that you depend on us to deliver that unified analytics strategy, it will deliver that performance of scale, not only today, but tomorrow and for years to come. We've added a lot of great features to Vertica. I think we've said no to a lot of things, frankly, that we just knew we wouldn't be the best company to deliver. When we say we're going to do things we do them. Vertica 10 is a perfect example of so many of those things that we from you, our customers have heard loud and clear, and we have delivered. I am incredibly proud of this team across the board. I think the culture of Vertica, a customer first culture, jumping in to help our customers win no matter what is also something that sets us massively apart. I hear horror stories about support experiences with other organizations. And people always seem to be amazed at Team Vertica's willingness to jump in or their aptitude for certain technical capabilities or understanding the business. And I think sometimes we take that for granted. But that is the team that we have as Team Vertica. We are incredibly excited about Vertica 10. I think you're going to love the Virtual Big Data Conference this year. I encourage you to tune in. Maybe one other benefit is I know some people were worried about not being able to see different sessions because they were going to overlap with each other well now, even if you can't do it live, you'll be able to do those sessions on demand. Please enjoy the Vertica Big Data Conference here in 2020. Please you and your families and your co-workers be safe during these times. I know we will get through it. And analytics is probably going to help with a lot of that and we already know it is helping in many different ways. So believe in the data, believe in data's ability to change the world for the better. And thank you for your time. And with that, I am delighted to now introduce Micro Focus CEO Stephen Murdoch to the Vertica Big Data Virtual Conference. Thank you Stephen. >> Stephen: Hi, everyone, my name is Stephen Murdoch. I have the pleasure and privilege of being the Chief Executive Officer here at Micro Focus. Please let me add my welcome to the Big Data Conference. And also my thanks for your support, as we've had to pivot to this being virtual rather than a physical conference. Its amazing how quickly we all reset to a new normal. I certainly didn't expect to be addressing you from my study. Vertica is an incredibly important part of Micro Focus family. Is key to our goal of trying to enable and help customers become much more data driven across all of their IT operations. Vertica 10 is a huge step forward, we believe. It allows for multi-cloud innovation, genuinely hybrid deployments, begin to leverage machine learning properly in the enterprise, and also allows the opportunity to unify currently siloed lakes of information. We operate in a very noisy, very competitive market, and there are people, who are in that market who can do some of those things. The reason we are so excited about Vertica is we genuinely believe that we are the best at doing all of those things. And that's why we've announced publicly, you're under executing internally, incremental investment into Vertica. That investments targeted at accelerating the roadmaps that already exist. And getting that innovation into your hands faster. This idea is speed is key. It's not a question of if companies have to become data driven organizations, it's a question of when. So that speed now is really important. And that's why we believe that the Big Data Conference gives a great opportunity for you to accelerate your own plans. You will have the opportunity to talk to some of our best architects, some of the best development brains that we have. But more importantly, you'll also get to hear from some of our phenomenal Roth Data customers. You'll hear from Uber, from the Trade Desk, from Philips, and from AT&T, as well as many many others. And just hearing how those customers are using the power of Vertica to accelerate their own, I think is the highlight. And I encourage you to use this opportunity to its full. Let me close by, again saying thank you, we genuinely hope that you get as much from this virtual conference as you could have from a physical conference. And we look forward to your engagement, and we look forward to hearing your feedback. With that, thank you very much. >> Joy: Thank you so much, Stephen, for joining us for the Vertica Big Data Conference. Your support and enthusiasm for Vertica is so clear, and it makes a big difference. Now, I'm delighted to introduce Amy Fowler, the VP of Strategy and Solutions for FlashBlade at Pure Storage, who was one of our BDC Platinum Sponsors, and one of our most valued partners. It was a proud moment for me, when we announced Vertica in Eon mode for Pure Storage FlashBlade and we became the first analytics data warehouse that separates compute from storage for on-premise data centers. Thank you so much, Amy, for joining us. Let's get started. >> Amy: Well, thank you, Joy so much for having us. And thank you all for joining us today, virtually, as we may all be. So, as we just heard from Colin Mahony, there are some really interesting trends that are happening right now in the big data analytics market. From the end of the Hadoop hype cycle, to the new cloud reality, and even the opportunity to help the many data science and machine learning projects move from labs to production. So let's talk about these trends in the context of infrastructure. And in particular, look at why a modern storage platform is relevant as organizations take on the challenges and opportunities associated with these trends. The answer is the Hadoop hype cycles left a lot of data in HDFS data lakes, or reservoirs or swamps depending upon the level of the data hygiene. But without the ability to get the value that was promised from Hadoop as a platform rather than a distributed file store. And when we combine that data with the massive volume of data in Cloud Object Storage, we find ourselves with a lot of data and a lot of silos, but without a way to unify that data and find value in it. Now when you look at the infrastructure data lakes are traditionally built on, it is often direct attached storage or data. The approach that Hadoop took when it entered the market was primarily bound by the limits of networking and storage technologies. One gig ethernet and slower spinning disk. But today, those barriers do not exist. And all FlashStorage has fundamentally transformed how data is accessed, managed and leveraged. The need for local data storage for significant volumes of data has been largely mitigated by the performance increases afforded by all Flash. At the same time, organizations can achieve superior economies of scale with that segregation of compute and storage. With compute and storage, you don't always scale in lockstep. Would you want to add an engine to the train every time you add another boxcar? Probably not. But from a Pure Storage perspective, FlashBlade is uniquely architected to allow customers to achieve better resource utilization for compute and storage, while at the same time, reducing complexity that has arisen from the siloed nature of the original big data solutions. The second and equally important recent trend we see is something I'll call cloud reality. The public clouds made a lot of promises and some of those promises were delivered. But cloud economics, especially usage based and elastic scaling, without the control that many companies need to manage the financial impact is causing a lot of issues. In addition, the risk of vendor lock-in from data egress, charges, to integrated software stacks that can't be moved or deployed on-premise is causing a lot of organizations to back off the all the way non-cloud strategy, and move toward hybrid deployments. Which is kind of funny in a way because it wasn't that long ago that there was a lot of talk about no more data centers. And for example, one large retailer, I won't name them, but I'll admit they are my favorites. They several years ago told us they were completely done with on-prem storage infrastructure, because they were going 100% to the cloud. But they just deployed FlashBlade for their data pipelines, because they need predictable performance at scale. And the all cloud TCO just didn't add up. Now, that being said, well, there are certainly challenges with the public cloud. It has also brought some things to the table that we see most organizations wanting. First of all, in a lot of cases applications have been built to leverage object storage platforms like S3. So they need that object protocol, but they may also need it to be fast. And the said object may be oxymoron only a few years ago, and this is an area of the market where Pure and FlashBlade have really taken a leadership position. Second, regardless of where the data is physically stored, organizations want the best elements of a cloud experience. And for us, that means two main things. Number one is simplicity and ease of use. If you need a bunch of storage experts to run the system, that should be considered a bug. The other big one is the consumption model. The ability to pay for what you need when you need it, and seamlessly grow your environment over time totally nondestructively. This is actually pretty huge and something that a lot of vendors try to solve for with finance programs. But no finance program can address the pain of a forklift upgrade, when you need to move to next gen hardware. To scale nondestructively over long periods of time, five to 10 years plus is a crucial architectural decisions need to be made at the outset. Plus, you need the ability to pay as you use it. And we offer something for FlashBlade called Pure as a Service, which delivers exactly that. The third cloud characteristic that many organizations want is the option for hybrid. Even if that is just a DR site in the cloud. In our case, that means supporting appplication of S3, at the AWS. And the final trend, which to me represents the biggest opportunity for all of us, is the need to help the many data science and machine learning projects move from labs to production. This means bringing all the machine learning functions and model training to the data, rather than moving samples or segments of data to separate platforms. As we all know, machine learning needs a ton of data for accuracy. And there is just too much data to retrieve from the cloud for every training job. At the same time, predictive analytics without accuracy is not going to deliver the business advantage that everyone is seeking. You can kind of visualize data analytics as it is traditionally deployed as being on a continuum. With that thing, we've been doing the longest, data warehousing on one end, and AI on the other end. But the way this manifests in most environments is a series of silos that get built up. So data is duplicated across all kinds of bespoke analytics and AI, environments and infrastructure. This creates an expensive and complex environment. So historically, there was no other way to do it because some level of performance is always table stakes. And each of these parts of the data pipeline has a different workload profile. A single platform to deliver on the multi dimensional performances, diverse set of applications required, that didn't exist three years ago. And that's why the application vendors pointed you towards bespoke things like DAS environments that we talked about earlier. And the fact that better options exists today is why we're seeing them move towards supporting this disaggregation of compute and storage. And when it comes to a platform that is a better option, one with a modern architecture that can address the diverse performance requirements of this continuum, and allow organizations to bring a model to the data instead of creating separate silos. That's exactly what FlashBlade is built for. Small files, large files, high throughput, low latency and scale to petabytes in a single namespace. And this is importantly a single rapid space is what we're focused on delivering for our customers. At Pure, we talk about it in the context of modern data experience because at the end of the day, that's what it's really all about. The experience for your teams in your organization. And together Pure Storage and Vertica have delivered that experience to a wide range of customers. From a SaaS analytics company, which uses Vertica on FlashBlade to authenticate the quality of digital media in real time, to a multinational car company, which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars, or a healthcare organization, which uses Vertica on FlashBlade to enable healthcare providers to make real time decisions that impact lives. And I'm sure you're all looking forward to hearing from John Yavanovich from AT&T. To hear how he's been doing this with Vertica and FlashBlade as well. He's coming up soon. We have been really excited to build this partnership with Vertica. And we're proud to provide the only on-premise storage platform validated with Vertica Eon Mode. And deliver this modern data experience to our customers together. Thank you all so much for joining us today. >> Joy: Amy, thank you so much for your time and your insights. Modern infrastructure is key to modern analytics, especially as organizations leverage next generation data center architectures, and object storage for their on-premise data centers. Now, I'm delighted to introduce our last speaker in our Vertica Big Data Conference Keynote, John Yovanovich, Director of IT for AT&T. Vertica is so proud to serve AT&T, and especially proud of the harmonious impact we are having in partnership with Pure Storage. John, welcome to the Virtual Vertica BDC. >> John: Thank you joy. It's a pleasure to be here. And I'm excited to go through this presentation today. And in a unique fashion today 'cause as I was thinking through how I wanted to present the partnership that we have formed together between Pure Storage, Vertica and AT&T, I want to emphasize how well we all work together and how these three components have really driven home, my desire for a harmonious to use your word relationship. So, I'm going to move forward here and with. So here, what I'm going to do the theme of today's presentation is the Pure Vertica Symphony live at AT&T. And if anybody is a Westworld fan, you can appreciate the sheet music on the right hand side. What we're going to what I'm going to highlight here is in a musical fashion, is how we at AT&T leverage these technologies to save money to deliver a more efficient platform, and to actually just to make our customers happier overall. So as we look back, and back as early as just maybe a few years ago here at AT&T, I realized that we had many musicians to help the company. Or maybe you might want to call them data scientists, or data analysts. For the theme we'll stay with musicians. None of them were singing or playing from the same hymn book or sheet music. And so what we had was many organizations chasing a similar dream, but not exactly the same dream. And, best way to describe that is and I think with a lot of people this might resonate in your organizations. How many organizations are chasing a customer 360 view in your company? Well, I can tell you that I have at least four in my company. And I'm sure there are many that I don't know of. That is our problem because what we see is a repetitive sourcing of data. We see a repetitive copying of data. And there's just so much money to be spent. This is where I asked Pure Storage and Vertica to help me solve that problem with their technologies. What I also noticed was that there was no coordination between these departments. In fact, if you look here, nobody really wants to play with finance. Sales, marketing and care, sure that you all copied each other's data. But they actually didn't communicate with each other as they were copying the data. So the data became replicated and out of sync. This is a challenge throughout, not just my company, but all companies across the world. And that is, the more we replicate the data, the more problems we have at chasing or conquering the goal of single version of truth. In fact, I kid that I think that AT&T, we actually have adopted the multiple versions of truth, techno theory, which is not where we want to be, but this is where we are. But we are conquering that with the synergies between Pure Storage and Vertica. This is what it leaves us with. And this is where we are challenged and that if each one of our siloed business units had their own stories, their own dedicated stories, and some of them had more money than others so they bought more storage. Some of them anticipating storing more data, and then they really did. Others are running out of space, but can't put anymore because their bodies aren't been replenished. So if you look at it from this side view here, we have a limited amount of compute or fixed compute dedicated to each one of these silos. And that's because of the, wanting to own your own. And the other part is that you are limited or wasting space, depending on where you are in the organization. So there were the synergies aren't just about the data, but actually the compute and the storage. And I wanted to tackle that challenge as well. So I was tackling the data. I was tackling the storage, and I was tackling the compute all at the same time. So my ask across the company was can we just please play together okay. And to do that, I knew that I wasn't going to tackle this by getting everybody in the same room and getting them to agree that we needed one account table, because they will argue about whose account table is the best account table. But I knew that if I brought the account tables together, they would soon see that they had so much redundancy that I can now start retiring data sources. I also knew that if I brought all the compute together, that they would all be happy. But I didn't want them to tackle across tackle each other. And in fact that was one of the things that all business units really enjoy. Is they enjoy the silo of having their own compute, and more or less being able to control their own destiny. Well, Vertica's subclustering allows just that. And this is exactly what I was hoping for, and I'm glad they've brought through. And finally, how did I solve the problem of the single account table? Well when you don't have dedicated storage, and you can separate compute and storage as Vertica in Eon Mode does. And we store the data on FlashBlades, which you see on the left and right hand side, of our container, which I can describe in a moment. Okay, so what we have here, is we have a container full of compute with all the Vertica nodes sitting in the middle. Two loader, we'll call them loader subclusters, sitting on the sides, which are dedicated to just putting data onto the FlashBlades, which is sitting on both ends of the container. Now today, I have two dedicated storage or common dedicated might not be the right word, but two storage racks one on the left one on the right. And I treat them as separate storage racks. They could be one, but i created them separately for disaster recovery purposes, lashing work in case that rack were to go down. But that being said, there's no reason why I'm probably going to add a couple of them here in the future. So I can just have a, say five to 10, petabyte storage, setup, and I'll have my DR in another 'cause the DR shouldn't be in the same container. Okay, but I'll DR outside of this container. So I got them all together, I leveraged subclustering, I leveraged separate and compute. I was able to convince many of my clients that they didn't need their own account table, that they were better off having one. I eliminated, I reduced latency, I reduced our ticketing I reduce our data quality issues AKA ticketing okay. I was able to expand. What is this? As work. I was able to leverage elasticity within this cluster. As you can see, there are racks and racks of compute. We set up what we'll call the fixed capacity that each of the business units needed. And then I'm able to ramp up and release the compute that's necessary for each one of my clients based on their workloads throughout the day. And so while they compute to the right before you see that the instruments have already like, more or less, dedicated themselves towards all those are free for anybody to use. So in essence, what I have, is I have a concert hall with a lot of seats available. So if I want to run a 10 chair Symphony or 80, chairs, Symphony, I'm able to do that. And all the while, I can also do the same with my loader nodes. I can expand my loader nodes, to actually have their own Symphony or write all to themselves and not compete with any other workloads of the other clusters. What does that change for our organization? Well, it really changes the way our database administrators actually do their jobs. This has been a big transformation for them. They have actually become data conductors. Maybe you might even call them composers, which is interesting, because what I've asked them to do is morph into less technology and more workload analysis. And in doing so we're able to write auto-detect scripts, that watch the queues, watch the workloads so that we can help ramp up and trim down the cluster and subclusters as necessary. There has been an exciting transformation for our DBAs, who I need to now classify as something maybe like DCAs. I don't know, I have to work with HR on that. But I think it's an exciting future for their careers. And if we bring it all together, If we bring it all together, and then our clusters, start looking like this. Where everything is moving in harmonious, we have lots of seats open for extra musicians. And we are able to emulate a cloud experience on-prem. And so, I want you to sit back and enjoy the Pure Vertica Symphony live at AT&T. (soft music) >> Joy: Thank you so much, John, for an informative and very creative look at the benefits that AT&T is getting from its Pure Vertica symphony. I do really like the idea of engaging HR to change the title to Data Conductor. That's fantastic. I've always believed that music brings people together. And now it's clear that analytics at AT&T is part of that musical advantage. So, now it's time for a short break. And we'll be back for our breakout sessions, beginning at 12 pm Eastern Daylight Time. We have some really exciting sessions planned later today. And then again, as you can see on Wednesday. Now because all of you are already logged in and listening to this keynote, you already know the steps to continue to participate in the sessions that are listed here and on the previous slide. In addition, everyone received an email yesterday, today, and you'll get another one tomorrow, outlining the simple steps to register, login and choose your session. If you have any questions, check out the emails or go to www.vertica.com/bdc2020 for the logistics information. There are a lot of choices and that's always a good thing. Don't worry if you want to attend one or more or can't listen to these live sessions due to your timezone. All the sessions, including the Q&A sections will be available on demand and everyone will have access to the recordings as well as even more pre-recorded sessions that we'll post to the BDC website. Now I do want to leave you with two other important sites. First, our Vertica Academy. Vertica Academy is available to everyone. And there's a variety of very technical, self-paced, on-demand training, virtual instructor-led workshops, and Vertica Essentials Certification. And it's all free. Because we believe that Vertica expertise, helps everyone accelerate their Vertica projects and the advantage that those projects deliver. Now, if you have questions or want to engage with our Vertica engineering team now, we're waiting for you on the Vertica forum. We'll answer any questions or discuss any ideas that you might have. Thank you again for joining the Vertica Big Data Conference Keynote Session. Enjoy the rest of the BDC because there's a lot more to come

Published Date : Mar 30 2020

SUMMARY :

And he'll share the exciting news And that is the platform, with a very robust ecosystem some of the best development brains that we have. the VP of Strategy and Solutions is causing a lot of organizations to back off the and especially proud of the harmonious impact And that is, the more we replicate the data, Enjoy the rest of the BDC because there's a lot more to come

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
StephenPERSON

0.99+

Amy FowlerPERSON

0.99+

MikePERSON

0.99+

John YavanovichPERSON

0.99+

AmyPERSON

0.99+

Colin MahonyPERSON

0.99+

AT&TORGANIZATION

0.99+

BostonLOCATION

0.99+

John YovanovichPERSON

0.99+

VerticaORGANIZATION

0.99+

Joy KingPERSON

0.99+

Mike StonebreakerPERSON

0.99+

JohnPERSON

0.99+

May 2018DATE

0.99+

100%QUANTITY

0.99+

WednesdayDATE

0.99+

ColinPERSON

0.99+

AWSORGANIZATION

0.99+

Vertica AcademyORGANIZATION

0.99+

fiveQUANTITY

0.99+

JoyPERSON

0.99+

2020DATE

0.99+

twoQUANTITY

0.99+

UberORGANIZATION

0.99+

Stephen MurdochPERSON

0.99+

Vertica 10TITLE

0.99+

Pure StorageORGANIZATION

0.99+

oneQUANTITY

0.99+

todayDATE

0.99+

PhilipsORGANIZATION

0.99+

tomorrowDATE

0.99+

AT&T.ORGANIZATION

0.99+

September 2019DATE

0.99+

PythonTITLE

0.99+

www.vertica.com/bdc2020OTHER

0.99+

One gigQUANTITY

0.99+

AmazonORGANIZATION

0.99+

SecondQUANTITY

0.99+

FirstQUANTITY

0.99+

15 minutesQUANTITY

0.99+

yesterdayDATE

0.99+