
Search Results for DriveScale:

Brian Pawlowski, DriveScale | CUBEConversation, Sept 2018


 

(intense orchestral music) >> Hey welcome back everybody, Jeff Frick here with theCUBE. We're having a CUBE Conversation in our Palo Alto studios, getting a short little break between the madness of the conference season, which is fully upon us, and we're excited to have a long-time industry veteran, Brian Pawlowski, the CTO of DriveScale, joining us to talk about some of the crazy developments that continue to happen in this world that just advances, advances. Brian, great to see you.
>> Good morning, Jeff, it's great to be here. I'm a bit, still trying to get used to the timezone after a long, long trip in Europe, but I'm glad to be here, I'm glad we finally were able to schedule this.
>> Yes, it's never easy, (laughs) one of the secrets of our business is everyone is actually all together at conferences; it's hard to get 'em together when there's not that catalyst of a conference to bring everybody together. So give us the 101 on DriveScale.
>> So, DriveScale. Let me start with, what is composable infrastructure? DriveScale provides a product for orchestrating disaggregated components on a high-performance fabric to allow you to spin up essentially your own private cloud, your own clusters for these modern applications, scale out applications. And I just said a bunch of gobbledygook, what does that mean? The DriveScale software is essentially an orchestration package that provides the ability to take compute nodes and storage nodes on a high-performance fabric and securely form multi-tenant architectures, much like you would in a cloud. When we think of application deployment, we think of a hundred nodes or 500 nodes. The applications we're looking at are things that people are using for big data, machine learning, or AI, or these scale out databases. Things like Vertica and Aerospike. This is an alternative to the standard way of deploying applications in a very static nature onto fixed physical resources, or onto network storage coming from the likes of Network Appliance, sorry, NetApp, and Dell EMC. It's the modern applications we're after, the big data applications for analytics.
>> Right. So it's software that basically manages the orchestration of hardware, I mean of compute, storage, and networking, so you can deploy big data analytics applications?
>> Yes.
>> Ah, at scale.
>> It's absolutely focused on the orchestration part. The typical way the applications that we're in pursuit of right now are deployed is on 500 physical bare metal nodes from, pick your vendor, of compute and storage that is all bundled together and then laid out into a physical deployment on a network. What we do is essentially disaggregate: separate out compute, pure compute, no disks at all, storage into another layer, have the fabric, and we inventory it all. And, much like vCenter for virtualization, for doing software deployment of applications, we do software deployment of scale out applications onto a scale out cluster, so.
>> Right. So you talked about using industry standard servers, industry standard storage, does the system accommodate different types of compute and CPUs, different types of storage? Whether it's high performance disks, or it's Flash, how does it accommodate those things? And if I'm trying to set up my big stack of hardware to then deploy your software to get it configured, what're some of the things I should be thinkin' about?
>> That's actually a great question, I'm going to try to hit three points.
(clears throat) Absolutely. In fact, a core part of our orchestration layer is to essentially generalize the compute and storage components and the networking components of your data center, and do rule-based, constraint-based selection when creating a cluster. From your perspective, when creating a cluster (coughs) you say, "I want a hundred nodes, and I'm going to run this application on it, and I need this environment for the application." And this application is running on local, it thinks it's running on local, bare metal, so. You say, "A hundred nodes, eight cores each minimum, and I want 64 gig of memory minimum." It'll go out and look at the inventory and do a best match of the components there. You could have different products out there, we are compute agnostic, storage agnostic, you could have mix and match; we will basically do a best fit match of all of your available resources and then propose back to you in a couple seconds the cluster you want, and then you just hit go, and it forms a cluster in a couple seconds.
>> A virtual cluster within that inventory of assets that I--
>> A virtual cluster that-- Yes, out of the inventory of assets, except from the perspective of the application it looks like a physical cluster. This is the critical part of what we do, is that, somebody told me, "It's like we have an extension cord between the storage and the compute nodes." They used this analogy yesterday and I said I was going to reuse it, so if they listen to this: Hey, I stole your analogy! We basically provide a long extension cord to the direct-attached storage, except we've separated out the storage from the compute. What's really cool about that, and this was the second point of what you said, is that you can mix and match. The mix and match occurs because one of the things you're doing with your compute and storage is refreshing your compute and storage at three to five year cycles, separately. When you have the old style model of combining compute and storage in what I'd call a captive DAS scenario, you are forced to do refreshes of both compute and persistent storage at the same time. It just becomes an unmanageable position to be in, and separating out the components provides you a lot of flexibility for mixing and matching different types of components, doing rolling upgrades of the compute separate from the storage, and then also having different storage tiers you can combine. The biggest tiers today are SSD storage and spinning disk storage; being able to provide spinning disk, SSDs, solid-state storage, or a mixture of both for a hybrid deployment for an application, without having to worry at purchase time about configuring your box that way, we just basically do it on the fly.
>> Right. So, and then obviously I can run multiple applications against that big stack of assets, and it's going to go ahead and parse the pieces out that I need for each application.
>> We didn't even practice this beforehand, that was a great one too! (laughs) A key part of this is actually providing a secure multi-tenant environment, is the phrase I use, because it's a common phrase. Our target customer is running multiple applications. In 2010, when somebody was deploying big data, they were deploying Hadoop. Quickly, (snaps) think, what were the other things then? Nothing. It was Hadoop. Today it's 10 applications, all scale out, all having different requirements for the reference architecture, for the amount of compute and storage.
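To make the constraint-based request Brian describes concrete, here is a minimal, hypothetical sketch of best-fit node selection against an inventory. The field names, scoring rule, and `compose_cluster` function are illustrative assumptions, not DriveScale's actual API.

```python
# Hypothetical sketch of constraint-based node selection for a cluster request.
# Field names and the scoring rule are illustrative, not DriveScale's actual API.
from typing import Dict, List, Optional

def compose_cluster(inventory: List[Dict], count: int,
                    min_cores: int, min_mem_gb: int) -> Optional[List[Dict]]:
    """Pick `count` free nodes that satisfy the minimums, preferring a best fit
    (least over-provisioned nodes first) so larger nodes stay available."""
    candidates = [n for n in inventory
                  if n["free"] and n["cores"] >= min_cores and n["mem_gb"] >= min_mem_gb]
    if len(candidates) < count:
        return None  # not enough matching resources in the pool
    # Best fit: smallest surplus of cores and memory over the request.
    candidates.sort(key=lambda n: (n["cores"] - min_cores, n["mem_gb"] - min_mem_gb))
    return candidates[:count]

# Example: "a hundred nodes, eight cores each minimum, 64 gig of memory minimum"
inventory = [{"id": f"node-{i}", "cores": 8 + (i % 3) * 8,
              "mem_gb": 64 + (i % 2) * 64, "free": True} for i in range(500)]
cluster = compose_cluster(inventory, count=100, min_cores=8, min_mem_gb=64)
print(len(cluster) if cluster else "not enough capacity")
```

Sorting by the smallest surplus is one plausible way to read "best fit match": it keeps the biggest nodes free for workloads that actually need them.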
So, our orchestration layer basically allows you to provision separate virtual physical clusters in a secure, multi-tenant way, cryptographically secure, and you can encrypt the data too if you want; you can turn on over-the-wire encryption along with data-at-rest encryption, think GDPR and stuff like that. But the different clusters cannot interfere with each other's workloads, and because you're on a fully switched Ethernet fabric, they don't interfere with performance either. But that secure multi-tenant part is critical for the orchestration and management of multiple scale out clusters.
>> So then, (light laugh) so in theory, if I'm doing this well, I can continually add capacity, I can upgrade my drives to SSDs, I can put in new CPUs as new great things come out into my big cloud, not my cloud, but my big bucket of resources, and then using your software continue to deploy those against applications as is most appropriate?
>> Could we switch seats? (both laugh) Let me ask the questions. (laughing) No, because it's--
>> It sounds great, I just keep adding capacity, and then it redeploys based on the optimum, right?
>> That's a great summary, because the thing that we're-- the basic problem we're trying to solve is that... This is like the lesson from VMware, right? One lesson from VMware was, first it was, we had unused CPU resources, let's get those unused CPU cycles back. No CPU cycle shall go unused! Right?
>> I thought that they needed to keep 50% overhead, just to make sure they didn't bump against the roof. But that's a different conversation.
>> That's a little detail, (both laugh) that's a little detail. But anyway. The secondary effect was way more important. Once people decoupled their applications from physical purchase decisions and rolling out physical hardware, they stopped caring about any particular piece of hardware; they then found that the simplified management, the one button push software application deployment, was a critical enabler for business operations and business agility. So, we're trying to do what VMware did for that kind of captive legacy application deployment; we're trying to do that for essentially what has been, historically, bare metal, big data application deployment. Where people were... Seriously, in 2012, 2010, 2012, after virtualization took over the data center, and the IT manager had his cup of coffee and he's layin' back goin' "Man, this is great, I have nothing else to worry about." Then there's a (knocks) and the guy comes in his office, or his cube, and goes "Whaddya want?!" and he goes "Well, I'd like you to deploy 500 bare metal nodes to run this thing called Hadoop." And he goes "Well, I'll just give you 500 virtualized instances." And he goes "Nope, not good enough! I want to start going back to bare metal." And since then it's gotten worse. So what we're trying to do is restore the balance in the universe, and apply for the scale out clusters what virtualization did for the legacy applications. Does that make a little bit of sense?
>> Yeah! And it's heading in the other direction, right, towards the atomic, right? So if you're trying to break the units of compute and storage down to the base, so you've got a unified baseline that you can apply in more volume, rather than maybe a particular feature set in a particular CPU, or a particular characteristic of a particular type of storage?
>> Right.
>> This way you're doing it in software, and leveraging a whole bunch of it to satisfy, as you said, kind of the meets-min for that particular application.
>> Yeah, absolutely. And I think, kind of critical about the timing of all this, is that virtualization drove, very much, a model of commoditization of CPUs. Once VMware hit there, people weren't deploying applications on particular platforms, they were deploying applications on a virtualized hardware model, and that was how applications were always thought about from then on. And a lot of these scale out applications, not a lot of them, all of them, are designed to be hardware agnostic. They want to run on bare metal 'cause they're designed to run-- when you deploy a bare metal application for scale out, Apache Spark, it uses all of the CPU on the machine. You don't need virtualization because it will use all the CPU, it will use all the bandwidth and the disks underneath it. What we're doing is separating it out to provide lifecycle management between the two of them, but also allow you to change the configurations dynamically over time. But this word of atomic kinda's a-- the disaggregation part is the first step for composability. You want to break it out, and I'll go here and say that the enterprise storage vendors got it right at one point, I mean, they did something good. When they broke out captive storage to the network and provided a separation of compute and storage, before virtualization, that was a step towards gaining control and a sane management approach to what are essentially very different technologies evolving at very different speeds. And then your comment about "So what if you want to basically replace spinning disks with SSDs?" That's easily done in a composable infrastructure because it's a virtual function, you're just using software, software-defined data center, you're using software, except for the set of applications that just slip past what was being done in the virtualized infrastructure, and the network storage infrastructure.
>> Right. And this really supports kind of the trend that we see, which is the new age, which is "No, don't tell me what infrastructure I have, and then I'll build an app and try and make it fit." It's really app first, and the infrastructure has to support the app, and I don't really care, as a developer and as a competitive business trying to get apps to satisfy my marketplace; the infrastructure, I'm just now assuming, is going to support whatever I build. This is how you enable that.
>> Right. And very importantly, the people that are writing all of these apps, the tons of these apps, Apache-- by the way, there's so many Apache things, Apache Kafka, (laughing) Apache Spark, the Hadoops of the world, the NoSQL databases,
>> Flink, and Oracle,
>> Cassandra, Vertica, things that we consider--
>> MongoDB, you got 'em all. MongoDB, right. Let's just keep rolling these things off our tongue.
>> They're all CUBE alumni, so we've talked to 'em all.
>> Oh, this is great.
>> It's awesome. (laughs)
>> And they're all brilliant technologists, right? And they have defined applications that are so, so good at what they do, but they didn't all get together beforehand and say, "Hey, by the way, how can we work together to make sure that when this is all deployed, and operating in pipelines, and in parallel, that from an IT management perspective, it all just plays well together?" They solved their particular problems, and when it was just one application being deployed, no harm no foul, right?
When it's 10 applications being deployed, and all of a sudden the line item for big data applications starts creeping past five, six, approaching 10%, people start to get a little bit nervous about the operational cost, the management cost, deployability, I talked about lifecycle management, refreshes, tech refreshes, expansion, all these things that, when it's a small thing over there in the corner, okay, I'll just ignore it for a while. Yeah. Do you remember the old adventure games? (Jeff laughs) I'm dating myself.
>> What's an adventure game? I don't know. (laughs)
>> Yeah, when you watered a plant, "Water, please! Water, please!" The plant, the plant in there looked pitiful, you gave it water and then it goes "Water! Water! Give me water!" Then it starts to attack, but.
>> I'll have to look that one up. (both laugh) Alright so, before I let you go, you've been at this for a while, you've seen a lot of iterations. As you kind of look forward over the next little while, kind of what do you see as some of the next kind of big movements or kind of big developments, as kind of the IT evolution, and every company's now an IT company, or software company, continues?
>> So, let's just say that this is a great time, why I joined DriveScale actually, a couple reasons. This is a great time for composable infrastructure. It's like "Why is composable infrastructure important now?" It does solve a lot of problems, you can deploy legacy applications over it and stuff, but they don't have any pain points per se, they're running in their virtualization infrastructure over here, the enterprise storage over here.
>> And IBM still sells mainframes, right? So there's still stuff--
>> IBM still sells mainframes.
>> There's still stuff runnin' on those boxes.
>> Yes there is. (laughs)
>> Just let it be, let it run.
>> This came up in Europe. (laughs)
>> And just let it run, but there's no pain point there. But these increasingly deployed scale out applications-- in 2004, when the clock speed wall was hit, everything went multi-core, then parallel applications became the norm, and then it became scale out applications for the Facebooks of the world, the Googles of the world, whatever.
>> Amazon, et cetera.
>> For their applications, that scale out is becoming the norm moving forward for application architecture, and application deployment. The more data that you process, the more scale out you need, and composable infrastructure is becoming a-- is a critical part of getting that under control, and getting you the flexibility and manageability to allow you to actually make sense of that deployment, in the IT center, in the large. And the second thing I want to mention is that Flash has emerged, and that's driven something called NVMe over Fabrics, essentially a high-performance fabric interconnect for providing essentially local latency to remote resources; that is part of the composable infrastructure story today, and you're basically accessing, with the speed of local access to solid state memory, you're accessing it over the fabric. And all these things are coming together, driving a set of applications that are becoming both increasingly important, and increasingly expensive to deploy. And composable infrastructure allows you to get a handle on controlling those costs, and making it a lot more manageable.
>> That's a great summary. And clearly, the amount of data that's going to be coming into these things is only going up, up, up, so.
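As a rough illustration of the NVMe over Fabrics point Brian makes above, the sketch below compares an assumed local flash read latency with the added cost of a low-latency fabric hop versus a more traditional network storage path. Every microsecond figure here is a ballpark assumption for illustration only, not a measurement from DriveScale or any specific fabric.

```python
# Ballpark latency budget for remote flash access; all figures are rough assumptions.
local_nvme_read_us = 100          # assumed flash read latency, order of magnitude
nvme_of_fabric_overhead_us = 15   # assumed added cost of a low-latency fabric hop
legacy_network_overhead_us = 300  # assumed added cost of a traditional network-storage path

remote_nvme_of_us = local_nvme_read_us + nvme_of_fabric_overhead_us
remote_legacy_us = local_nvme_read_us + legacy_network_overhead_us

print(f"NVMe-oF remote read ~{remote_nvme_of_us} us "
      f"({remote_nvme_of_us / local_nvme_read_us:.2f}x local)")
print(f"Legacy remote read  ~{remote_legacy_us} us "
      f"({remote_legacy_us / local_nvme_read_us:.2f}x local)")
```

Under these assumed numbers the fabric hop adds only a small fraction on top of the flash access itself, which is the sense in which remote resources can feel "local."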
Great conversation, Brian; again, we still got to go meet at Terún later, so.
>> Yeah, we have to go, yes.
>> We will make that happen with ya.
>> Great restaurant in Palo Alto.
>> Thanks for stoppin' by, and, really appreciate the conversation.
>> Yeah, and if you need to buy DriveScale, I'm your guy. (both laughing)
>> Alright, he's Brian, I'm Jeff, you're watching theCUBE Conversation from our Palo Alto studios. Thanks for watchin', we'll see you at a conference soon, I'm sure. See ya next time. (intense orchestral music)

Published Date : Sep 28 2018


Tim Smith, AppNexus | BigData NYC 2017


 

>> Announcer: Live, from Midtown Manhattan, it's theCUBE. Covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.
>> Okay welcome back, everyone. Live in Manhattan, New York City, in Hell's Kitchen, this is theCUBE's special event, our annual CUBE-Wikibon Research Big Data event in Manhattan. Alongside Strata Hadoop; formerly Hadoop World, now called Strata Data, as the world continues. This is our annual event; it's our fifth year here, sixth overall, wanted to kind of move from uptown. I'm John Furrier, the co-host of theCUBE, with Peter Burris, Head of Research at SiliconANGLE and GM of Wikibon Research. Our next guest is Tim Smith, who's the SVP of technical operations at AppNexus; technical operations for large scale is an understatement. But before we get going; Tim, just talk about what AppNexus is as a company, what you guys do, what's the core business?
>> Sure, AppNexus is the second largest digital advertising marketplace after Google. We're an internet technology company that harnesses data and machine learning to power the companies that comprise the open internet. We began by building a powerful technology platform, in which we embedded core capabilities, tools and features. With me so far?
>> Yeah, we got it.
>> Okay, on top of that platform, we built a core suite of cloud-based enterprise products that enable the buying and selling of digital advertising, and a scale-transparent and low-cost marketplace where other companies can transact; either using our enterprise products, or those offered by other companies. If you want to hear a little about the daily peaks, peak feeds and speeds, it is Strata, we should probably talk about that. We do about 11.8 billion impressions transacted on a daily basis. Each of those is a real-time auction conducted in a fraction of a second, well under half a second. We see about 225 billion impressions per day, and we handle about 5 million queries per second at peak load. We produce about 150 terabytes of data each day, and we move about 400 gigabits into and out of the internet at peak; all those numbers are daily peaks. Makes sense?
>> Yep.
>> Okay, so by way of comparison, which might be useful for people, I believe the NYSE currently does roughly 2 million trades per day. So if we round that up to 3 million trades a day and assume the NYSE were to conduct that volume every single day of the year; 7 days a week, 365 days a year, that'd be about a billion trades a year. Similarly, I believe Visa did about 28-and-a-half billion transactions in their fiscal third quarter. I'll round that up to 30 billion, and average it out to about 333 million transactions per day, and annualize it to about 4 billion transactions per year. Little bit of math, but as I mentioned, AppNexus does in excess of 10 billion transactions per day. And so it seems reasonable to say that AppNexus does, in one day, roughly 10 times the transaction volume that the NYSE does in a year. And similarly, it seems reasonable to say that AppNexus daily does more than two times the transaction volume that Visa does in a year. Obviously, these are all just very rough numbers based on publicly available information about the NYSE and Visa, and both the NYSE and Visa do far, far more volume than AppNexus when measured in terms of dollars.
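For readers who want to follow the back-of-the-envelope math Tim just walked through, here is a quick sketch using the rounded figures he quotes in the conversation; the inputs are his public estimates, not independently verified data.

```python
# Back-of-the-envelope arithmetic for the comparison above, using the rounded
# figures quoted in the conversation (illustrative only, not audited numbers).
appnexus_tx_per_day = 10e9                        # "in excess of 10 billion transactions per day"

nyse_trades_per_day = 3e6                         # ~2M trades/day, rounded up to 3M
nyse_trades_per_year = nyse_trades_per_day * 365  # ~1.1 billion trades a year

visa_tx_per_quarter = 30e9                        # ~28.5B in fiscal Q3, rounded up to 30B
visa_tx_per_day = visa_tx_per_quarter / 90        # ~333 million transactions per day

print(f"NYSE trades per year:      ~{nyse_trades_per_year / 1e9:.1f} billion")
print(f"Visa transactions per day: ~{visa_tx_per_day / 1e6:.0f} million")
print(f"AppNexus day vs NYSE year: ~{appnexus_tx_per_day / nyse_trades_per_year:.0f}x")
```

With those rounded inputs, one AppNexus day works out to roughly nine to ten times the NYSE's annual trade count, which matches the "roughly 10 times" framing above.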
So given our volumes, it's imperative that AppNexus does each transaction with the maximum efficiency and the lowest reasonable cost, and that is one of the most challenging aspects of my job.
>> So thanks for spending the time to give the overview. There's a lot of data; I mean 10 billion a day is massive volume. I mean the internet, and you see the scale, is insane. We're in a new era right now of web-scale. We've seen it in Facebook, and it's enormous. It's only going to get bigger, right? So in online ad tech, you guys are essentially doing like a Google model, that's not everything but Google, which is still huge numbers. Then you include Microsoft and everybody else. Really heavy lifting, IT-like situation. What's the environment like? And just talk about, you know, what's it like for you guys. Because you got a lot of ops, I mean in terms of dev ops. You can't break anything, because at that 10 billion transactions, or near it, it's a significant impact. So you have to have everything buttoned-up super tight, yet you got to innovate and grow with the future growth. What's the IT environment like?
>> It's interesting. We have about 8,000 servers spread across about seven data centers on three continents, and we run, as you mentioned, around the clock. There's no closing bell; downtime is not acceptable. So when you look at our environment, you're talking about four major categories of server complexes. We have real-time processing, which is the actual ad serving. We have a data pipeline, which is what we call our big data environment. We also have a client-facing environment and an infrastructure environment. So we use a lot of different tools and applications, but I think the most relevant ones to this discussion are Hadoop and its friends HDFS, and Hive and Spark. And then we use the Vertica Analytics Platform. And together Hadoop and its friends, and Vertica, comprise our entire data pipeline. They're both very disk-intensive. They're cluster-based applications, and it's quite a challenge to keep them up and running.
>> So what are some of those challenges? Just explain a little bit, because you also have a lot of opportunity. I mean, it's money flowing through the air, basically; digital air, if you will. I mean, they got a lot of stuff happening. Take us through the challenges.
>> You know, our biggest apps are all clustered. And all of our clusters are built with commodity servers, just like a lot of other environments. The big data app clusters traditionally have had internal disks, while almost all of our other servers are very light on disk. One of the biggest challenges is, since the server is the fundamental building block of a cluster, then regardless of whether you need more compute or more storage, you always have to add more servers to get it. That really limits flexibility and creates a lot of inefficiencies, and I really, really am obsessive about reducing and eliminating inefficiencies. So, with me so far?
>> Yep.
>> Great. The inefficiencies result from two major factors. First, not all workloads require the same ratio of compute to storage. Some workloads are more compute-intensive, and others are really less dependent on storage, while other workloads require a lot more storage. So we have to use standard server configurations, and as a result, we wind up with underutilized compute and storage. This is undesirable, it's inefficient, yet given our scale, we have to use standardized configurations. So that's the first big challenge.
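To put hypothetical numbers on that first challenge, the sketch below shows how buying whole servers to satisfy a storage requirement strands compute when compute and storage live in the same box. The node specs and workload sizes are invented round numbers for illustration, not AppNexus figures.

```python
# Hypothetical illustration of the inefficiency described above: when compute and
# storage share a chassis, growing storage means buying compute you may not need.
# All node specs and workload sizes are made-up round numbers.
node_cores = 32              # cores per standard data node (assumed)
node_storage_tb = 96         # e.g. 12 x 8 TB internal drives per node (assumed)

workload_storage_tb = 4800   # storage the workload actually needs
workload_cores = 600         # compute the workload actually needs

nodes_for_storage = -(-workload_storage_tb // node_storage_tb)  # ceiling division
cores_purchased = nodes_for_storage * node_cores

stranded_cores = cores_purchased - workload_cores
print(f"Nodes bought to satisfy storage: {nodes_for_storage}")
print(f"Cores purchased: {cores_purchased}, cores needed: {workload_cores}, "
      f"stranded: {stranded_cores} ({stranded_cores / cores_purchased:.0%})")
```

In this made-up case, 50 nodes are bought purely to reach the storage target, and well over half the cores that come along with them sit idle.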
The second is the compute to disk ratio. It's generally fixed when you buy the servers. Yes, we can certainly add more disks in the field, but that's labor intensive, and it's complicated from a logistics and an asset management standpoint, and you're fundamentally limited by the number of disk slots in the server. So now you're right back into the trap of more storage requires more servers, regardless of whether you need more compute or not. And then you compound the inefficiencies.
>> Couldn't you just move the resources from, unused resources, from one cluster to the other?
>> I've been asked that a lot; and no, it's just not that simple. Each application cluster becomes a silo due to its configuration of storage and compute. This means you just can't move servers between clusters, because the clusters are optimized for the workloads, and the fact that you can't move resources from one cluster to another, it's more inefficiencies. And then they're compounded over time, since workloads change and the ideal ratio of compute-to-storage changes. And the end result is unused resources trapped in silos, and configurations that are no longer optimized for your workload. And there's only really one solution that we've been able to find. And to paraphrase an orator far, far more talented than I am, namely Ronald Reagan, we need to open this gate, tear down these silos. The silos just have to go away. They fundamentally limit flexibility and efficiency.
>> What were some of the other issues caused by using servers with internal drives?
>> You have more maintenance, you've got to deal with the logistics. But the biggest problem is servers and storage have significantly different life cycles. Servers typically have a three year life cycle before they're obsolete. Storage is typically four to six years. You can sometimes stretch that a little further with the storage. Inside the servers that are replaced every three years, we end up replacing storage before the end of its effective lifetime; that's inefficient. Further, since the storage is inside the servers, we have to do massive data migrations when we replace servers. Migrations, they're time consuming, they're logistically difficult, and they're high risk.
>> So how did DriveScale help you guys? Because you guys certainly have a challenging environment, you laid out the story, and we appreciate that. How did DriveScale help you with the challenges?
>> Well, what we really wanted to do was disaggregate storage from servers, and DriveScale enables us to do that. Disaggregating resources is a new term in the industry, but I think a lot of people are focusing on it. I can explain it if you think that would make sense.
>> What do you mean by disaggregating resources? Can you explain that, and how it works?
>> Sure, so instead of buying servers with internal drives, we now buy diskless servers with JBODs. And DriveScale lets us easily compose servers with whatever amount of disk storage we need, from the server resource pool and the disk resource pool; and they're separate pools. This means we have the right balance of compute and storage for each workload, and we can easily adjust it over time. And all of this is done via software, so it's easy to do with a GUI or, in our case, at our scale, scripting. And it's done on demand, and it's much more efficient.
>> How does it help you with the underutilized resource challenge you mentioned earlier?
>> Well, since we can add and remove resources from each cluster, we can manage exactly how much compute power and storage is deployed for each workload. Since this is all done via software, it can be done quickly and easily. We don't have to send a technician into a data center to physically swap drives, add drives, move drives. It's all done via software, and it's very, very efficient.
>> Can you move resources between silos?
>> Well, yes and no. First off, our goal is no more silos. That said, we still have clusters, and once we completely migrate to DriveScale, all of our compute and storage resources will be consolidated into just a few common pools. And disk storage will no longer differentiate pools; thus, we have fewer pools. What's more, we have fewer pools and can use the resources in each pool for more workloads. And when our needs change, and they always do, we can reallocate resources as needed.
>> What about the life cycle management challenge? How do you guys address that?
>> Well, that's addressed with DriveScale. The compute and the storage are now disaggregated, or separated, into diskless servers and JBODs, so we can upgrade one without touching the other. If we want to upgrade servers to take advantage of new processors or new memory architectures, we just replace the servers, re-combine the disks with the new servers, and we're back up and operating. It saves the cost of buying new disks when we don't need to, and it also simplifies logistics and reduces risk, as we no longer have to run the old plant and the new plant concurrently, and do a complicated data migration.
>> What about qualifying server and storage vendors? Do you still do that? Or how does that impact--
>> We actually don't have to do it. We're still using the same server vendor. We've used Dell for many, many years, we continue to use them. We are using them for storage, and there was no real work, we just had to add DriveScale into the mix.
>> What's it like working with DriveScale?
>> They're really wonderful to work with. They have a really seasoned team. They were at Sun Microsystems and Cisco; they built some of the really foundational products that changed the internet, that the internet was built on. They're really talented, they're really bright, and they're really focused on customer success.
>> Great story, thanks for sharing that. My final question for you is, you guys have a very big, awesome environment, you've got a lot of scale there. It's great for a startup to get into an environment like this, because one, they could get access to the data, work with a good team like you have. What's it like working with a startup?
>> You know, it's always challenging at first; too many things to do.
>> They got talented guys. Most of the startups, those early day startups, they got all their A players out there.
>> They have their A players, and we've been very pleased working with them. We're dealing with the top talent, some of the top talent in the industry, that created the industry. They have a proven track record. We really don't have any concerns, we know they're committed to our success, and they have a great team, and great investors.
>> A final, final question. For your friends out there who are watching, and other practitioners who are trying to run things at scale with a cloud. What's your advice to them? You've been operating at scale, and a lot of, billions of transactions, I mean huge; it's only going to get bigger. Put your IT friendly advice hat on.
What's the mindset of operators out there, technical ops, as dev ops comes in, seeing a lot of that. What do people need to be thinking about to run at scale?
>> There's no magic silver bullet. There are no magic answers. The public cloud is very helpful in a lot of ways, but you really have to think hard about your economics, you have to think about your scale. You just have to be sure that you're going into each decision knowing that you've looked at the costs and the benefits, the performance, the risks, and you don't expect there to be simple answers.
>> Yeah, there's no magic beans, as they say. You've got to make it work for the business.
>> No magic beans, I wish there were.
>> Tim, thanks so much for the story. Appreciate the commentary. Live coverage at Big Data NYC, it's theCUBE. Be back with more after this short break. (upbeat techno music)

Published Date : Sep 27 2017
