Tim Smith, AppNexus | BigData NYC 2017
>> Announcer: Live, from Midtown Manhattan, it's theCUBE. Covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.

>> Okay, welcome back, everyone. Live in Manhattan, New York City, in Hell's Kitchen, this is theCUBE's special event, our annual CUBE-Wikibon Research Big Data event in Manhattan, alongside Strata Data, formerly Hadoop World, as the world continues to evolve. This is our annual event; it's our fifth year here, sixth overall, and we wanted to kind of move from uptown. I'm John Furrier, the co-host of theCUBE, with Peter Burris, Head of Research at SiliconANGLE and GM of Wikibon Research. Our next guest is Tim Smith, who's the SVP of technical operations at AppNexus; technical operations at that scale is an understatement. But before we get going; Tim, just talk about AppNexus as a company. What do you guys do, what's the core business?

>> Sure, AppNexus is the second largest digital advertising marketplace after Google. We're an internet technology company that harnesses data and machine learning to power the companies that comprise the open internet. We began by building a powerful technology platform, in which we embedded core capabilities, tools and features. With me so far?

>> Yeah, we got it.

>> Okay. On top of that platform, we built a core suite of cloud-based enterprise products that enable the buying and selling of digital advertising, and an at-scale, transparent and low-cost marketplace where other companies can transact, either using our enterprise products or those offered by other companies. If you want to hear a little about the daily peaks, the peak feeds and speeds, it is Strata, so we should probably talk about that. We do about 11.8 billion impressions transacted on a daily basis. Each of those is a real-time auction conducted in a fraction of a second, well under half a second. We see about 225 billion impressions per day, and we handle about 5 million queries per second at peak load. We produce about 150 terabytes of data each day, and we move about 400 gigabits into and out of the internet at peak; all those numbers are daily peaks. Makes sense?

>> Yep.

>> Okay. So by way of comparison, which might be useful for people, I believe the NYSE currently does roughly 2 million trades per day. So if we round that up to 3 million trades a day and assume the NYSE were to conduct that volume every single day of the year, 7 days a week, 365 days a year, that'd be about a billion trades a year. Similarly, I believe Visa did about 28-and-a-half billion transactions in their fiscal third quarter. I'll round that up to 30 billion, which averages out to about 333 million transactions per day, or roughly 120 billion transactions annualized. Little bit of math, but as I mentioned, AppNexus does in excess of 10 billion transactions per day. And so it seems reasonable to say that AppNexus does roughly 10 times the transaction volume in one day that the NYSE does in a year. And similarly, it seems reasonable to say that AppNexus daily does roughly 30 times the transaction volume that Visa does in a day. Obviously, these are all just very rough numbers based on publicly available information about the NYSE and Visa, and both the NYSE and Visa do far, far more volume than AppNexus when measured in terms of dollars.
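For readers who want to sanity-check those comparisons, here is a quick back-of-envelope verification, a sketch in Python using only the rounded figures quoted above:

```python
# Back-of-envelope check of the volume comparisons above, using only
# the rounded figures quoted in the interview.

appnexus_per_day = 10e9                  # AppNexus: >10 billion transactions/day

nyse_per_day = 3e6                       # NYSE: ~2M trades/day, rounded up to 3M
nyse_per_year = nyse_per_day * 365       # ~1.1 billion trades/year

visa_per_quarter = 30e9                  # Visa: ~28.5B/quarter, rounded up to 30B
visa_per_day = visa_per_quarter / 90     # ~333 million transactions/day

print(f"NYSE trades/year:         {nyse_per_year / 1e9:.2f}B")
print(f"AppNexus day / NYSE year: {appnexus_per_day / nyse_per_year:.1f}x")
print(f"Visa transactions/day:    {visa_per_day / 1e6:.0f}M")
print(f"AppNexus day / Visa day:  {appnexus_per_day / visa_per_day:.0f}x")
```

This yields about 9x the NYSE's hypothetical annual volume and about 30x Visa's daily volume, consistent with the "roughly 10 times" and "roughly 30 times" figures in the conversation.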
So given our volumes, it's imperative that AppNexus does each transaction with maximum efficiency and the lowest reasonable cost, and that is one of the most challenging aspects of my job.

>> So thanks for spending the time to give the overview. There's a lot of data; I mean, 10 billion a day is massive volume. I mean the internet, and you see the scale, is insane. We're in a new era right now of web-scale. We've seen it in Facebook, and it's enormous. It's only going to get bigger, right? So in online ad tech, you guys are essentially doing something like a Google model; not everything Google does, but still huge numbers. Then you include Microsoft and everybody else. Really heavy lifting, an IT-like situation. What's the environment like? And just talk about what it's like for you guys, because you've got a lot of ops, I mean in terms of DevOps. You can't break anything, because at 10 billion transactions a day, or near it, any outage has a significant impact. So you have to have everything buttoned-up super tight, yet you've got to innovate and grow with the future growth. What's the IT environment like?

>> It's interesting. We have about 8,000 servers spread across about seven data centers on three continents, and we run, as you mentioned, around the clock. There's no closing bell; downtime is not acceptable. So when you look at our environment, you're talking about four major categories of server complexes. We have real-time processing, which is the actual ad serving. We have a data pipeline, which is what we call our big data environment. We also have a client-facing environment and an infrastructure environment. So we use a lot of different tools and applications, but I think the most relevant ones to this discussion are Hadoop and its friends HDFS, Hive and Spark. And then we use the Vertica Analytics Platform. Together, Hadoop and its friends, and Vertica, comprise our entire data pipeline. They're both very disk-intensive, they're cluster-based applications, and it's quite a challenge to keep them up and running.

>> So what are some of those challenges? Just explain a little bit, because you also have a lot of opportunity. I mean, it's money flowing through the air, basically; digital air, if you will. There's a lot happening. Take us through the challenges.

>> You know, our biggest apps are all clustered, and all of our clusters are built with commodity servers, just like a lot of other environments. The big data app clusters traditionally have had internal disks, while almost all of our other servers are very light on disk. One of the biggest challenges is that since the server is the fundamental building block of a cluster, regardless of whether you need more compute or more storage, you always have to add more servers to get it. That really limits flexibility and creates a lot of inefficiencies, and I really, really am obsessive about reducing and eliminating inefficiencies. With me so far?

>> Yep.

>> Great. The inefficiencies result from two major factors. First, not all workloads require the same ratio of compute to storage. Some workloads are more compute-intensive and less dependent on storage, while other workloads require a lot more storage. Yet we have to use standard server configurations, and as a result we wind up with underutilized compute and storage. This is undesirable and inefficient, but given our scale, we have to use standardized configurations. So that's the first big challenge.
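To see why fixed server configurations strand capacity, consider a toy calculation. The SKU numbers here are invented for illustration, not AppNexus's actual configurations:

```python
import math

# Hypothetical fixed server SKU: every node adds both compute and storage.
CORES_PER_SERVER = 32   # compute per node (illustrative)
TB_PER_SERVER = 24      # storage per node (illustrative)

def servers_needed(cores_required, tb_required):
    """With internal disks, compute and storage only scale together,
    so you buy enough servers to satisfy whichever need is larger."""
    return max(math.ceil(cores_required / CORES_PER_SERVER),
               math.ceil(tb_required / TB_PER_SERVER))

# A storage-heavy workload: modest compute, lots of disk.
n = servers_needed(cores_required=320, tb_required=2400)
stranded_cores = n * CORES_PER_SERVER - 320
print(f"{n} servers, {stranded_cores} cores bought only for their disk slots")
# -> 100 servers, 2880 stranded cores
```

Ten servers' worth of compute would satisfy the workload, but the disk requirement forces a hundred, and the excess cores sit idle in the silo.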
The second is the compute-to-disk ratio; it's generally fixed when you buy the servers. Yes, we can certainly add more disks in the field, but that's labor-intensive, it's complicated from a logistics and asset management standpoint, and you're fundamentally limited by the number of disk slots in the server. So now you're right back in the trap of more storage requiring more servers, regardless of whether you need more compute or not. And then you compound the inefficiencies.

>> Couldn't you just move unused resources from one cluster to another?

>> I've been asked that a lot; and no, it's just not that simple. Each application cluster becomes a silo due to its configuration of storage and compute. This means you just can't move servers between clusters, because the clusters are optimized for their workloads, and the fact that you can't move resources from one cluster to another creates more inefficiencies. And they're compounded over time, since workloads change and the ideal ratio of compute to storage changes with them. The end result is unused resources trapped in silos, in configurations that are no longer optimized for your workload. And there's really only one solution that we've been able to find. To paraphrase an orator far, far more talented than I am, namely Ronald Reagan: we need to open this gate, tear down these silos. The silos just have to go away. They fundamentally limit flexibility and efficiency.

>> What were some of the other issues caused by using servers with internal drives?

>> You have more maintenance, and you've got to deal with the logistics. But the biggest problem is that servers and storage have significantly different life cycles. Servers typically have a three-year life cycle before they're obsolete; storage is typically four to six years, and you can sometimes stretch that a little further. Since the storage sits inside servers that are replaced every three years, we end up replacing it before the end of its effective lifetime; that's inefficient. Further, since the storage is inside the servers, we have to do massive data migrations when we replace servers. Migrations are time consuming, logistically difficult, and high risk.

>> So how did DriveScale help you guys? Because you certainly have a challenging environment; you laid out the story, and we appreciate that. How did DriveScale help you with the challenges?

>> Well, what we really wanted to do was disaggregate storage from servers, and DriveScale enables us to do that. Disaggregating resources is a new term in the industry, but I think a lot of people are focusing on it. I can explain it if you think that would make sense.

>> What do you mean by disaggregating resources? Can you explain that, and how it works?

>> Sure. Instead of buying servers with internal drives, we now buy diskless servers plus JBODs (just a bunch of disks). And DriveScale lets us easily compose servers with whatever amount of disk storage we need, drawing from the server resource pool and the disk resource pool; they're separate pools. This means we have the right balance of compute and storage for each workload, and we can easily adjust it over time. All of this is done via software, so it's easy to do with a GUI or, in our case at our scale, scripting. And it's done on demand, and it's much more efficient.
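To make the composition model concrete, here is a minimal sketch of the idea: compute and disks live in separate pools, and a node is assembled in software. All names here are invented for illustration; this is not DriveScale's actual API:

```python
from dataclasses import dataclass, field

# Minimal sketch of disaggregated, composable infrastructure.
# Illustrative only -- not DriveScale's actual API.

@dataclass
class Pools:
    diskless_servers: list = field(default_factory=list)  # compute pool
    jbod_drives: list = field(default_factory=list)       # disk pool

def compose_node(pools: Pools, drives_needed: int) -> dict:
    """Bind JBOD drives to a diskless server, entirely in software."""
    server = pools.diskless_servers.pop()
    drives = [pools.jbod_drives.pop() for _ in range(drives_needed)]
    return {"server": server, "drives": drives}

pools = Pools(diskless_servers=[f"srv{i}" for i in range(4)],
              jbod_drives=[f"disk{i}" for i in range(24)])

# The compute-to-disk ratio becomes a per-workload software decision,
# not a purchase-time hardware decision.
hadoop_node = compose_node(pools, drives_needed=12)   # storage-heavy
realtime_node = compose_node(pools, drives_needed=2)  # compute-heavy
```

The design point is that drives released by one node return to the shared pool and can be rebound to any other node on demand, which is what dissolves the silos described above.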
>> How does it help you with the underutilized resource challenge you mentioned earlier?

>> Well, since we can add and remove resources from each cluster, we can manage exactly how much compute power and storage is deployed for each workload. Since this is all done via software, it can be done quickly and easily. We don't have to send a technician into a data center to physically swap, add, or move drives. It's all done via software, and it's very, very efficient.

>> Can you move resources between silos?

>> Well, yes and no. First off, our goal is no more silos. That said, we still have clusters, and once we completely migrate to DriveScale, all of our compute and storage resources will be consolidated into just a few common pools. Disk storage will no longer differentiate pools, so we have fewer pools, and we can use the resources in each pool for more workloads. And when our needs change, and they always do, we can reallocate resources as needed.

>> What about the life cycle management challenge? How do you guys address that?

>> Well, that's addressed with DriveScale. The compute and the storage are now disaggregated, or separated, into diskless servers and JBODs, so we can upgrade one without touching the other. When we want to upgrade servers to take advantage of new processors or new memory architectures, we just replace the servers, re-combine the disks with the new servers, and we're back up and operating. It saves the cost of buying new disks when we don't need to, and it also simplifies logistics and reduces risk, as we no longer have to run the old plant and the new plant concurrently and do a complicated data migration.

>> What about qualifying server and storage vendors? Do you still do that? Or how does that impact --

>> We actually don't have to do it. We're still using the same server vendor; we've used Dell for many, many years, and we continue to use them. We're using them for storage as well, and there was no real work; we just had to add DriveScale into the mix.

>> What's it like working with DriveScale?

>> They're really wonderful to work with. They have a really seasoned team. They were at Sun Microsystems and Cisco; they built some of the foundational products that the internet was built on. They're really talented, they're really bright, and they're really focused on customer success.

>> Great story, thanks for sharing that. My final question for you is, you guys have a very big, awesome environment; you've got a lot of scale there. It's great for a startup to get into an environment like this, because one, they can get access to the data and work with a good team like yours. What's it like working with a startup?

>> You know, it's always challenging at first; too many things to do.

>> But they've got talented guys. Most of those early-stage startups have all their A players out there.

>> They have their A players, and we've been very pleased working with them. We're dealing with some of the top talent in the industry, the people who created the industry. They have a proven track record. We really don't have any concerns; we know they're committed to our success, and they have a great team and great investors.

>> A final, final question. For your friends out there watching, and other practitioners who are trying to run things at scale with the cloud: what's your advice to them? You've been operating at scale, billions of transactions; it's huge, and it's only going to get bigger. Put your IT-friendly advice hat on.
What's the mindset for operators out there, technical ops, as DevOps comes in? We're seeing a lot of that. What do people need to be thinking about to run at scale?

>> There's no magic silver bullet; there are no magic answers. The public cloud is very helpful in a lot of ways, but you really have to think hard about your economics, and you have to think about your scale. You just have to be sure that you're going into each decision knowing that you've looked at the costs and the benefits, the performance, and the risks, and you don't expect there to be simple answers.

>> Yeah, there are no magic beans, as they say. You've got to make it work for the business.

>> No magic beans; I wish there were.

>> Tim, thanks so much for the story. Appreciate the commentary. Live coverage at Big Data NYC; it's theCUBE. We'll be back with more after this short break. (upbeat techno music)