Armstrong and Gumahad and Jacques


 

>> From around the globe, it's The Cube, covering the Space and Cybersecurity Symposium 2020, hosted by Cal Poly. >> Welcome to this special virtual conference, the Space and Cybersecurity Symposium 2020, put on by Cal Poly with support from The Cube. I'm John Furrier, your host and master of ceremonies. We've got a great topic today in this session: the intersection of space and cybersecurity. This topic, and this conversation, is cybersecurity workforce development through public and private partnerships. And we've got a great lineup. We have Jeff Armstrong, the president of California Polytechnic State University, also known as Cal Poly. Jeffrey, thanks for jumping on. And Bong Gumahad, Director of the C4ISR Division; he's joining us from the Office of the Under Secretary of Defense for Acquisition and Sustainment, Department of Defense, DOD. And, of course, Steve Jacques, executive director and founder of the National Security Space Association and managing partner at Velos. Gentlemen, thank you for joining me for this session. We've got an hour of conversation. Thanks for coming on. >> Thank you. >> So we've got a virtual event here. We've got an hour to have a great conversation, and I'd love for you guys to do an opening statement on how you see the development through public and private partnerships around cybersecurity in space. Jeff, we'll start with you. >> Well, thanks very much, John. It's great to be on with all of you. On behalf of Cal Poly, welcome, everyone. Educating the workforce of tomorrow is our mission at Cal Poly, whether that means traditional undergraduates, master's students, or increasingly, mid-career professionals looking to upskill or re-skill. Our signature pedagogy is learn by doing, which means that our graduates arrive at employers ready day one with practical skills and experience. We have long thought of ourselves as lucky to be on California's beautiful Central Coast, but in recent years, as we have developed closer relationships with Vandenberg Air Force Base, hopefully the future permanent headquarters of the United States Space Command, with Vandenberg and other regional partners we have discovered that our location is even more advantageous than we thought. We're just 50 miles away from Vandenberg, a little closer than UC Santa Barbara, and the base represents the southern border of what we have come to think of as the Central Coast region. Cal Poly and Vandenberg Air Force Base have partnered to support regional economic development, to encourage the development of a commercial spaceport, to advocate for the Space Command headquarters coming to Vandenberg, and other ventures. These partnerships have been possible because both parties stand to benefit: Vandenberg by securing new streams of revenue, workforce, and local supply chain, and Cal Poly by helping to grow local jobs for graduates, internship opportunities for students, and research and entrepreneurship opportunities for faculty and staff. Crucially, what's good for Vandenberg Air Force Base and for Cal Poly is also good for the Central Coast and the U.S., creating new head-of-household jobs, infrastructure, and opportunity. Our goal is that these new jobs bring more diversity and sustainability for the region. This regional economic development has taken on a life of its own, spawning a new nonprofit called REACH, which coordinates development efforts from Vandenberg Air Force Base in the south to Camp Roberts in the north.
Another factor that has facilitated our relationship with Vandenberg Air Force Base is that we have some of the same friends. For example, Northrop Grumman has long been an important defense contractor and an important partner to Cal Poly, funding scholarships and facilities that have allowed us to stay current with technology and to attract highly qualified students for whom Cal Poly's costs would otherwise be prohibitive. For almost 20 years, Northrop Grumman has funded scholarships for Cal Poly students. This year, they're funding 64 scholarships, some directly in our College of Engineering and most through our Cal Poly Scholars program. Cal Poly Scholars supports both incoming freshmen and transfer students. These are especially important because they allow us to provide additional support and opportunities to a group of students who are mostly first generation, low income, and underrepresented, and who otherwise might not choose to attend Cal Poly. They also allow us to recruit from partner high schools with large populations of underrepresented minority students, including the Fortune High School in Elk Grove, with which we have developed a deep and lasting connection. We know that the best work is done by balanced teams that include multiple and diverse perspectives. These scholarships help us achieve that goal. And I'm sure you know Northrop Grumman was recently awarded a very large contract to modernize the U.S. ICBM armory, with some of the work being done at Vandenberg Air Force Base, thus supporting the local economy. And protecting our efforts in space requires partnerships in the digital realm. Cal Poly has partnered with many private companies, such as AWS. Our partnership with Amazon Web Services has enabled us to train our students with next-generation cloud engineering skills, in part through our jointly created Digital Transformation Hub. Another partnership example is among Cal Poly's California Cybersecurity Institute, the College of Engineering, and the California National Guard. This partnership is focused on preparing a cyber-ready workforce by providing faculty and students with a hands-on research and learning environment, side by side with military, law enforcement professionals, and cyber experts. We also have a long-standing partnership with PG&E, most recently focused on workforce development and redevelopment. Many of our graduates do indeed go on to careers in the aerospace and defense industry. As a rough approximation, more than 4,500 Cal Poly graduates list aerospace or defense as their employment sector on LinkedIn, and it's not just our engineers and computer scientists. When I was speaking to our fellow panelists not too long ago, speaking to Bong, we learned that Rachel Sims, one of our liberal arts majors, is working in his office. So shout out to you, Rachel. And then finally, of course, some of our graduates soar to extraordinary heights, such as Commander Victor Glover, who will be heading to the International Space Station later this year. As I close, all of which is to say that we're deeply committed to workforce development and redevelopment, that we understand the value of public-private partnerships, and that we're eager to find new ways in which to benefit everyone from this further cooperation.
So we're committed to the region, the state, and the nation. Our past efforts in space and cybersecurity, and links to our partners in, as I indicated, the aerospace industry and government, provide a unique position for us to move forward at the interface of space and cybersecurity. Thank you so much, John. >> President Armstrong, thank you very much for those comments, and congratulations to Cal Poly for being on the forefront of innovation and really taking a unique, progressive view. I want to tip my hat to you guys over there. Thank you very much for those comments, appreciate it. Bong, Department of Defense. Exciting, you've got to defend the nation, and space is global. Your opening statement. >> Yes, sir. Thanks, John, appreciate that. Thank you, everybody. I'm honored to be on this panel along with President Armstrong of Cal Poly and my longtime friend and colleague Steve Jacques of the National Security Space Association to discuss a very important topic: cybersecurity workforce development, as President Armstrong alluded to. I'll tell you, both of these organizations, Cal Poly and the NSSA, have done and continue to do an exceptional job at finding talent, recruiting them, and training the current and future leaders and technical professionals that we vitally need for our nation's growing space programs, as well as our collective national security. Earlier today, during session three, I, along with my colleague Chris Samson, discussed space cybersecurity and how the space domain is changing the landscape of future conflicts. I discussed the rapid emergence of commercial space, with the proliferation of hundreds, if not thousands, of satellites providing a variety of services, including communications allowing for global internet connectivity, as one example. Within DOD, we continue to look at how we can leverage this opportunity. I'll tell you, one of the enabling technologies is the use of small satellites, which are inherently cheaper and perhaps more flexible than the traditional bigger systems that we have historically used and employed for DOD. Certainly not lost on me is the fact that Cal Poly pioneered CubeSats 20-some years ago, and they set the standard for the use of these systems today. So they saw the value and benefit gained way ahead of everybody else, it seems, and Cal Poly's focus on training and education is commendable. I'm especially impressed by the efforts of another of Steve's colleagues, the current CIO, Mr. Bill Britton, with his high-energy push to attract the next generation of innovators. Earlier this year, I had planned on participating in this year's Cyber Innovation Challenge in June, where Cal Poly hosts California middle and high school students and challenges them with situations to test their cyber knowledge. I tell you, I wish I had that kind of opportunity when I was a kid. Unfortunately, the pandemic changed the plan, but I truly look forward to future events such as these to participate in. Now I want to recognize my good friend Steve Jacques, whom I've known for perhaps too long of a time here, over two decades or so, who is an acknowledged space expert, and personally I truly applaud him for having the foresight a few years back to form the National Security Space Association to help the entire space enterprise navigate through not only technology but policy issues and challenges, and to pave the way for operationalizing space. Space is our newest warfighting domain. That's not a secret anymore.
And while it is a unique area, it shares a lot of common traits with the other domains, such as land, air, and sea, all of them obviously strategically important to the defense of the United States. In future conflicts they will all be contested, and therefore they all need to be defended. One domain alone will not win future conflicts; in a joint operation, we must succeed in all. Defending space is as critical as defending our other operational domains. Space is no longer the sanctuary available only to the government. Increasingly, as I discussed in the previous session, commercial space is taking the lead in a lot of different areas, including R&D, the so-called new space, so the cybersecurity threat is even more demanding and even more challenging. The U.S. considers unfettered access to and freedom to operate in space vital to advancing the security, economic prosperity, and scientific knowledge of the country. That makes cyberspace an inseparable component of America's financial, social, government, and political life. We stood up the U.S. Space Force a year ago or so as the newest military service. Like the other services, its mission is to organize, train, and equip space forces in order to protect U.S. and allied interests in space and to provide space capabilities to the joint force. Imagine combining that U.S. Space Force with U.S. Cyber Command to unify the direction of space and cyberspace operations, strengthen DOD capabilities, and integrate and bolster DOD cyber expertise. Now, of course, enabling all of this requires a trained and professional cadre of cybersecurity experts, combining a good mix of policy as well as a highly technical skill set. Much like we're seeing in STEM, we need to attract more people to this growing field. The DOD has recognized the importance of the cybersecurity workforce, and we have implemented policies to encourage its growth. Back in 2013, the deputy secretary of defense signed the DOD Cyberspace Workforce Strategy to create a comprehensive, well-equipped cybersecurity team to respond to national security concerns. Now, this strategy also created a program that encourages collaboration between the DOD and private sector employees. We call this the Cyber Information Technology Exchange program, or CITE. It's an exchange program, which is very interesting, in which private sector employees can temporarily work for the DOD in a cybersecurity position that spans across multiple mission-critical areas important to the DOD. A key responsibility of the cybersecurity community is advising military leaders on the related threats and the cybersecurity actions we need to take to defeat these threats. We talk about rapid acquisition and agile business processes and practices to speed up innovation; likewise, cybersecurity must keep up with this challenge. Cybersecurity needs to be right there with the challenges and changes, and this requires exceptional personnel. We need to attract talent and invest in the people now to grow a robust cybersecurity workforce for the future. I look forward to the panel discussion, John. Thank you. >> Thank you so much, Bong, for those comments. You know, new challenges, new opportunities, new possibilities, and freedom to operate in space: critical. Thank you for those comments. Looking forward to chatting further. Steve Jacques, executive director of the NSSA, your opening statement. >> Thank you, John.
And echoing Bong's thanks to Cal Poly for pulling this important event together and, frankly, for allowing the National Security Space Association to be a part of it. Likewise, we on behalf of the association are delighted and honored to be on this panel with President Armstrong, along with my friend and colleague Bong Gumahad. Something for you all to know about Bong: he spent the first 20 years of his career in the Air Force doing space programs. He then went into industry for several years and then came back into government to serve. Very few people do that. So Bong, on behalf of the space community, we thank you for your lifelong devotion of service to our nation. We really appreciate that. And I also echo Bong's shout out to that guy Bill Britton, who has been a longtime co-conspirator of ours, and you're doing great work there in the cyber program at Cal Poly. Bill, keep it up, but Professor Armstrong, try to keep a close eye on him. I would like to offer a little extra context to the great comments made by President Armstrong and Bong. In our view, the timing of this conference really could not be any better. We all recently reflected again on that tragic 9/11 surprise attack on our homeland, and it's an appropriate time, we think, to take pause. While a percentage of you in the audience here weren't even born, or were babies then, for most of us it still feels like yesterday. And moreover, a tragedy like 9/11 has taught us a lot, including to be more vigilant, to always keep our collective eyes and ears open, to include those quote eyes and ears from space, making sure nothing like this ever happens again. So this conference is a key aspect of that. Protecting our nation requires we work in a cybersecurity environment at all times. But, you know, the fascinating thing about space systems is we can't see them. Now, sure, we see space launches, and there's nothing more invigorating than that, but after launch they become invisible. So what are they really doing up there? What are they doing to enable our quality of life in the United States and in the world? Well, to illustrate, I'd like to paraphrase elements of an article in Forbes magazine by Bong's and my good friend Chuck Beames. Chuck is a space guy; he actually had Bong's job a few years ago in the Pentagon. He is now chairman and chief strategy officer at York Space Systems, and in his spare time he's chairman of the SmallSat Alliance. Chuck speaks in words that everyone can understand, so I'd like to give you some of his words out of his article, paraphrased somewhat. So these are Chuck's words. Let's talk about average Joe and plain Jane. Before heading to the airport for a business trip to New York City, Joe checks the weather forecast, informed by NOAA's weather satellites, to see what to pack for the trip. He then calls an Uber, that space app everybody uses; it matches riders with drivers via GPS to take him to the airport. So Joe has lunch at the airport. Unbeknownst to him, his organic lunch is made with the help of precision farming, made possible through optimized irrigation and fertilization, with remote spectral sensing coming from space, and GPS. On the plane, the pilot navigates around weather, aided by GPS and NOAA's weather satellites, and Joe makes his meeting on time to join his New York colleagues in a video call with a key customer in Singapore, made possible by telecommunication satellites.
En route to his next meeting, Joe receives notice changing the location of the meeting to the other side of town. So he calmly tells Siri to adjust the destination, and his satellite-guided Google Maps redirects him to the new location. That evening, Joe watches the news broadcast via satellite. The report details a meeting among world leaders discussing the developing crisis in Syria. As it turns out, various forms of quote remotely sensed information collected from satellites indicate that yet another banned chemical weapon may have been used on its own people. Before going to bed, Joe decides to call his parents and congratulate them on their wedding anniversary as they cruise across the Atlantic, made possible again by communications satellites, and Joe's parents can enjoy the call without even wondering how it happened. The next morning, back home, Joe's wife, Jane, is involved in a car accident. Her vehicle skids off the road and she's knocked unconscious, but because of her satellite-equipped OnStar system, the crash is detected immediately and first responders show up on the scene in time. Joe receives the news, books an early trip home, and sends flowers to his wife as he orders another Uber to the airport. Over those 24 hours, Joe and Jane used space system applications for nearly every part of their day. Imagine the consequences if at any point they were somehow denied these services, whether by natural causes or a foreign hostility. And each of these satellite applications used in this case was initially developed for military purposes, and continues to be, but also has remarkable application to our way of life. Many people just don't know that. So, ladies and gentlemen, now you know, thanks to Chuck Beames. Well, the United States has a proud heritage of being the world's leading spacefaring nation, dating back to the Eisenhower and Kennedy years. Today we have mature and robust systems operating from space, providing overhead reconnaissance to quote watch and listen, providing missile warning, communications, and positioning, navigation, and timing from our GPS system, much of what you heard in Lieutenant General J.T. Thompson's earlier speech. These systems are not only integral to our national security, but also to our quality of life. As Chuck told us, we simply could no longer live without these systems as a nation, and for that matter, as a world. But over the years, adversaries like China, Russia, and other countries have come to realize the value of space systems and are aggressively playing catch-up while also pursuing capabilities that will challenge our systems. As many of you know, in 2007 China demonstrated its ASAT system by actually shooting down one of its own satellites, and it has been aggressively developing counter-space systems to disrupt ours. So in a heavily congested space environment, our systems are now being contested like never before, and will continue to be. Well, as Bong mentioned, the United States has responded to these changing threats. In addition to adding ways to protect our systems, the administration and Congress recently created the United States Space Force and the operational United States Space Command, the latter of which, as you heard, President Armstrong and other Californians hope is going to be located at Vandenberg Air Force Base. Combined with our intelligence community, today we have focused military and civilian leadership in space, and that's a very, very good thing.
On the industry side, we did create the National Security Space Association, devoted solely to supporting the national security space enterprise. We're based here in the D.C. area, but we have arms and legs across the country, and we are loaded with extraordinary talent, in scores of former government executives. So NSSA is joined at the hip with our government customers to serve and to support. We're busy with a multitude of activities underway, ranging from a number of thought-provoking policy papers to our recurring Space Time webcast, supporting Congress's Space Power Caucus, and other serious efforts. Check us out at nssaspace.org. One of our strategic priorities, and central to today's events, is to actively promote and nurture workforce development. Just like Cal Poly, we will work with our U.S. government customers, industry leaders, and academia to attract and recruit students to join the space world, whether in government or industry, and to assist in mentoring and training as their careers progress. On that point, we're delighted to be working with Cal Poly as we hopefully will undertake a new pilot program with them very soon. So students, stay tuned. Something I can tell you: space is really cool. While our nation's satellite systems are technical and complex, our nation's government and industry workforce is highly diverse, with a combination of engineers, physicists, and mathematicians, but also with a large non-technical expertise as well. Think about how government gets these systems designed, manufactured, launched into orbit, and operated. They do this via contracts with our aerospace industry, requiring talents across the board, from cost estimating, cost analysis, budgeting, procurement, legal, and many other support tasks integral to the mission. Many thousands of people work in the space workforce, tens of billions of dollars every year. This is really cool stuff; no matter what your educational background, it's a great career to be part of. In summary, as Bong had mentioned as well, there is a great deal of exciting challenges ahead. We will see a new renaissance in space in the years ahead, and in some cases it's already begun. Billionaires like Jeff Bezos, Elon Musk, and Sir Richard Branson are in the game, stimulating new ideas and business models, and other private investors and startup space companies are now coming in from all angles. The exponential advancement of technology and microelectronics now allows the potential for a plethora of smallsat systems to possibly replace older satellites the size of a Greyhound bus. It's getting better by the day, and central to this conference, cybersecurity is paramount to our nation's critical infrastructure in space. So once again, thanks very much, and I look forward to the further conversation. >> Steve, thank you very much. Space is cool, it's relevant, but it's important, as you pointed out, and your awesome story about how it impacts our life every day. So I really appreciate that great story. I'm glad you took the time to share that. You forgot the part about the drone coming over the crime scene and, you know, mapping it out for you, but we'll add that to the story later. Great stuff. My first question is, let's get into the conversation, because I think this is super important. President Armstrong, I'd like you to talk about some of the points that were teased out by Bong and Steve.
One in particular is the comment around how military research was important in developing all these capabilities that are impacting all of our lives, through that story. It was the military research that has enabled generation after generation of value for consumers. This is kind of this workforce conversation. There are opportunities now with research and grants, and this funding of innovation is highly accelerated; it's happening very quickly. Can you comment on how research and the partnerships to get that funding into the universities are critical? >> Yeah, I really appreciate that, and I appreciate the comments of my colleagues. It really boils down, to me, to partnerships, public-private partnerships. You mentioned Northrop Grumman, but we have partnerships with Lockheed Martin, Boeing, Raytheon, SpaceX, JPL. We're also a member of an organization called the Business-Higher Education Forum, which brings together university presidents and CEOs of companies. They've been focused on cybersecurity and data science, and I hope that we can spill into cybersecurity in space. But those partnerships in the past have really brought a lot forward. At Cal Poly, as mentioned, we've been involved with CubeSats, we've had some secure work, and we plan to do more of that in the future. Those partnerships are essential not only for getting the R&D done, but also so the students and the faculty, whether master's or undergraduate, can be involved with that work. They get that real-life experience, whether it's on campus, or virtually now during COVID, or at the location with the partner, whether it may be governmental or industry. And then they're even better equipped to hit the ground running. And of course, we'd love to see even more of our students graduate with clearances so that they could do some of that secure work as well. So these partnerships are absolutely critical, and it's also in the context of trying to bring the best and the brightest, in all demographics of California and the U.S., into this field to really be successful. So these partnerships are essential, and our goal is to grow them, just like I know other colleagues in the CSU and the UC are planning to do. >> You know, just at my age, I've seen, I grew up in the eighties, in college, and during that systems generation, and the generation before me, they really kind of pioneered the space programs that spawned the computer revolution. I mean, you look at these key inflection points in our lives, they were really funded through these kinds of real deep research efforts. Bong, talk about that, because, you know, we're living in an age of cloud, and Bezos was mentioned, Elon Musk, Sir Richard Branson. You've got new ideas coming in from the outside. You have an accelerated clock now in terms of the innovation cycles, and so you've got to react differently. You guys have programs to go outside of the Defense Department. How important is this? Because the workforce that are in schools, and our folks re-skilling, are out there, and you've been on both sides of the table. So share your thoughts. >> No, thanks, John. Thanks for the opportunity to respond. And you hit on the note: back in the eighties, R&D in space especially was dominated by government funding, contracts, and so on. But things have changed. As Steve pointed out, a lot of these commercial entities funded by billionaires are coming out of the woodwork funding R&D, so they're taking the lead.
So what we can do within the DOD, within government, is truly take advantage of the work they've done, since they're, you know, paving the way to new approaches and new ways of doing things. And I think we can certainly learn from that and leverage off of it. It saves us money from an R&D standpoint while benefiting from the products that they deliver. You know, within the DOD, talking about workforce development, we have prioritized it, and we have policies now to attract and retain the talent we need. I had the folks do some research, and it looks like, from a cybersecurity workforce standpoint, a recent study done, I think, last year in 2019, found that the cybersecurity workforce gap in the U.S. is nearing half a million people, even though it is a growing industry. So the pipeline needs to be strengthened for getting people through, you know, starting young and through college, like Professor Armstrong indicated, because we're going to need them to be in place in a period of about maybe a decade or so. On top of that, of course, is the continuing issue we have with the gap with STEM students. We can't afford not to have expertise in place to support all the things we're doing, not only within the DOD, but on the commercial side as well. Thank you. >> How does the gap get filled? I mean, again, you've got cybersecurity, and with space it's a whole other kind of surface area, if you will, an entirely new surface area. But it is an IoT device, if you think about it, and it does have the same challenges that are kind of current and progressive with cybersecurity. Where does the gap get filled, Steve, or President Armstrong? I mean, how do you solve the problem and address this gap in the workforce? What are some solutions, and what approaches do we need to put in place? >> Steve, go ahead, I'll follow up. >> Okay, thanks. I'll let you correct me. It's a really good question, and the way I would approach it is to focus on it holistically and to acknowledge it up front. And it comes with our teaching, et cetera, across the board. And from an industry perspective, I mean, we see it. We've got to have secure systems with everything we do, and promoting this, and getting students at early ages, and mentoring them, and throwing internships at them is so paramount to the whole cycle. And it really takes focused attention, and we continue to use the word focus from an NSSA perspective. We know the challenges that are out there. There are such talented people in the workforce on the government side, but not nearly enough of them, and likewise on the industry side. We could use more as well. But when you get down to it, you know, we can connect dots. You know, the aspect that Professor Armstrong talked about earlier, where you continue to work partnerships as much as you possibly can. We hope to be a part of that network, of that ecosystem, the will of taking common objectives and working together to kind of make these things happen, and to bring the power not just of one or two companies, but of our entire membership, to help out. >> President Armstrong? >> Yeah, I would also add, again, it's back to the partnerships that I talked about earlier. One of our partners is high schools, the Fortune schools: Margaret Fortune, who worked in a couple of administrations in California, across party lines, in education.
Their fifth graders all visit Cal Poly and visit our learn-by-doing lab, and you've got to get students interested in STEM at an early age. We also need the partnerships, the scholarships, the financial aid, so the students can graduate with minimal to no debt and really hit the ground running. And that's exacerbated and really stressed now with this COVID-induced recession. California supports higher education at a higher rate than most states in the nation, but that has dropped this year, for reasons we all understand, due to COVID. And so our partnerships, our creativity in making sure that we help those that need the most help financially, that's really key, because the gaps are huge. As my colleagues indicated, you know, half a million jobs, and you need to look at the students that are in the pipeline. We've got to enhance that. And the placement rates are amazing. Once the students get to a place like Cal Poly or some of our other amazing CSU and UC campuses, placement rates are like 94%. Many of our engineers have jobs lined up a year before they graduate. So it's just going to take key partnerships working together, and that continued partnership with government, local, of course our state, the CSU, and partners like we have here today, both Steve and Bong. So partnerships are the thing. >> I could add, you know, the collaboration with universities is one that we put a lot of emphasis on, and it may not be a well-known fact, but as an example, the National Security Agency's National Centers of Academic Excellence in Cyber program works with over 270 colleges and universities across the United States to educate future cyber first responders, as an example. So that's vibrant and healthy and something that we ought to take advantage of. >> Well, I've got the brain trust here on this topic. I want to get your thoughts on this one point. I'd like to define what a public-private partnership is, because the theme that's coming out of the symposium is that the script has been flipped. It's a modern era, things are accelerated, you've got security, so you've got all these things kind of happening. It's a modern approach, and you're seeing a digital transformation play out all over the world, in business and in the public sector. So what is a modern public-private partnership? What does it look like today? Because people are learning differently, COVID has pointed that out, which is what we're seeing right now. How people progress through knowledge and learning, it's all changing. How do you guys view the modern version of a public-private partnership, and can you share some examples and proof points? We'll start with Professor Armstrong. >> Yeah, as I indicated earlier, and I could give other examples, but Northrop Grumman, they helped us with a cyber lab many years ago that is maintained directly, the software, the connection, outside its own unit, so that students can learn to hack, they can learn to penetrate defenses, and I know that has already had some considerations of space. But that's a benefit to both parties. So a good public-private partnership has benefits to both entities, and the common factor for universities with a lot of these partnerships is the talent, the talent that is needed. What we've been working on for years, you know, the undergraduate or master's or PhD programs, but now it's also spilling into upskilling and re-skilling.
As you know, folks are in jobs today that didn't exist two years, three years, five years ago. But it also spills into other aspects that can expand even more. We're very fortunate. We have land, there are opportunities, we have one tech park project, and we're expanding our tech park. I think we'll see opportunities for that, and it'll be adjusted due to the virtual world that we're all learning more and more about, which we were in before COVID. But I also think that person-to-person is going to be important. I want to make sure that when I'm driving across the bridge, or that satellite's being launched, the engineer has had at least some in-person training to do that, and that experience, especially as a first-time freshman coming onto a campus, getting that experience, expands and continues as an adult. And we're going to need those public-private partnerships in order to continue to fund those at a level that is at the excellence we need for these STEM and engineering fields. >> It's interesting, people and technology can work together in these partnerships in a new way. Bong, Steve, reaction to the modern version of what a successful public-private partnership looks like? >> If I could jump in, John, I think, you know, historically, DOD has had a high bar to overcome, if you will, in terms of rapidly pulling in new companies, and that's DOD's fault, if you will, relying heavily on the usual suspects of vendors and the like. And I think DOD has done a good job over the last couple of years of trying to reduce the burden of working with us. You know, the Air Force, I think they're pioneering this idea around pitch days, where companies come in, do a two-hour pitch, and are immediately notified of whether they won an award, without having to wait a long time to get feedback on the quality of the product and so on. So I think we're trying to do our best to strengthen that partnership with companies outside the main group of people that we typically use. >> Steve, any reaction, comment to add? >> Yeah, I would add a couple. These are very excellent thoughts. It's about taking a little gamble by coming out of your comfort zone. You know, the world that Bong lives in, and I used to live in in the past, has been quite structured. It's really about: we know what the threat is, we need to go fix it, we'll design it, we'll go make it happen, we'll fly it. Life is so much more complicated than that. And so to me, you take an example of the pitch days Bong talks about. I think taking a gamble by attempting to just do a lot of pilot programs works the trust factor between government folks and the industry folks and academia, because we are all in this together in a lot of ways. For example, I mean, we just sent a paper to the White House, at their request, about, you know, what we would do from a workforce development perspective, and we hope to embellish on this over time once the initiative matures. But a piece of it, for example, is the thing we call "cleared for success." Getting back to President Armstrong's comments, at the collegiate level, you know, high-quality folks are in high demand. So why don't we put together a program: grab kids in their underclass years, identify folks that are interested in doing something like this, get them scholarships.
Have a job waiting for them that they're contracted for before they graduate, and when they graduate, they walk with an SCI clearance. We believe that could be done. So that's an example of ways in which public-private partnerships can happen, where you now have a talented kid ready to go on day one. We think those kinds of things can happen. It just gets back down to being focused on specific initiatives, giving them a chance, and running as many pilot programs as you can, like these days. >> That's a great point. President? >> I just want to jump in and echo both Bong's and Steve's comments. But Steve, your point about our graduates: we consider them ready day one, well, they need to be ready day one and ready to go secure. We totally support that, and I'd love to follow up offline with you on that. That's exciting, and needed, very much needed, more of it. Some of it's happening, but we certainly have been thinking a lot about that and making some plans. >> And that's a great segue to my next question. This kind of reimagining of workflows is kind of breaking down the old way and bringing in kind of a new, accelerated way, all kinds of new things. There are creative ways to address this workforce issue, and this is the next topic. How can we employ new, creative solutions? Because, let's face it, you know, it's not the days of get your engineering degree, go interview for a job, get slotted in, and get the internship, you know, the programs that get you predictably through the system. This is multiple disciplines. Cybersecurity points at that. You could be smart in math and have a degree in anthropology and be among the best cyber talents on the planet. So this is a new world. What are some creative approaches in the workforce? >> That's quite good, John. One of the things I think is a challenge for us is, you know, somehow we've got to make working for the government sexy, right? Part of the challenge we have is attracting the right level of skill sets and personnel. But, you know, we're competing oftentimes with the commercial side; the gaming industry as an example is a big deal, and those are the same talents we need to support a lot of the programs we have in the DOD. So somehow we have to do a better job, to Steve's point, of making the work within the DOD, within the government, something that they would be interested in early on. So attract them early. I talked about Cal Poly's challenge program that they were going to have in June, inviting high school kids who are excited about the whole idea of space and cybersecurity, and so on. Those are something I think we have to continue to do over the course of the next several years. >> Awesome. Any other creative approaches that you guys see working, or might be an idea, or just to kind of stoke the ideation out there? Internships, obviously internships are known, but there's got to be new ways. >> I think you can take what Steve was talking about earlier, getting students in high school and aligning them, sometimes that first internship not just between the freshman and sophomore year, but before they enter Cal Poly per se, and they're involved. So I think that's absolutely key, getting them involved in many other ways. We have an example of upskilling and workforce redevelopment here on the Central Coast.
PG&E's Diablo Canyon nuclear plant is going to decommission around 2024, and so we have an ongoing partnership to work on repositioning those employees for the future. So that's, you know, engineering and beyond, but think about that just in the manner that you were talking about, the upskilling and re-skilling. And I think that's where, you know, we were talking about that, Purdue University and other California universities have been dealing with online programs before COVID, and now with COVID so many more faculty were pushed into that area. There's going to be much more going on. And to talk about workforce development and upskilling and re-skilling, the amount of training and education of our faculty across the country in virtual delivery has been huge. So there are always silver linings in the cloud. >> I want to get your guys' thoughts on one final question as we end the segment. We've seen on the commercial side, with cloud computing, these highly accelerated environments where, you know, SaaS, the subscription business model, that's on the business side. But one of the things that's clear in this trend is technology and people working together, and technology augments the people component. So I'd love to get your thoughts as we look at the world now. We're living in COVID. Cal Poly, you guys have remote learning right now; it's in its infancy, it's a whole new disruption, if you will, but also an opportunity to enable new ways to collaborate, right? So if you look at people and technology, can you guys share your view and vision on how communities can be developed, how these digital technologies and people can work together faster to get to the truth, or make a discovery, or build the workforce? These are opportunities. How do you guys view this new digital transformation? >> Well, I think there are huge opportunities, and just what we're doing with this symposium: we're filming this on one day, and it's going to stream live, and then the three of us, the four of us, can participate and chat with participants while it's going on. That's amazing, and I appreciate you, John, bringing that to this symposium. I think there's more and more that we can do from a Cal Poly perspective with our pedagogy. So, you know, linked to learn by doing, in person will always be important to us, but we see virtual, we see partnerships like this, can expand and enhance our ability and minimize the in-person time, decrease the time to degree, enhance graduation rates, eliminate opportunity gaps for students that don't have the same advantages. So I think the technological aspect of this is tremendous. Then on the upskilling and re-skilling, where employees are all over, they can be reached virtually, and then maybe they come to a location, or really advanced technology allows them to get hands-on virtually, or they come to that location and get it in a hybrid format. So I'm very excited about the future and what we can do, and it's going to be different with every university, with every partnership. One size does not fit all. >> There are so many possibilities. Bong, I could almost imagine a social network that has a verified, you know, secure clearance, where I can jump in, have a little cloak of secrecy, and collaborate with the DOD, possibly in the future. But these are the kind of crazy ideas that are needed. What are your thoughts on this whole digital transformation across policy? >> I think technology is going to be revolutionary here, John.
You know, we're focusing lately on what we call digital engineering to quicken the pace of delivering capability to the warfighter. As an example, I think AI and machine learning are all going to have a major play in how we operate in the future. We're embracing 5G technologies, giving the ability for near-zero latency, IoT, more automation of the supply chain, that sort of thing. I think the future ahead of us is very encouraging. It's going to do a lot for national defense, and certainly the security of the country. >> Steve, your final thoughts. Space systems are systems, and they're connected to other systems that are connected to people. Your thoughts on this digital transformation opportunity? >> Such a great question, and such a fun, great challenge ahead of us. Echoing my colleagues' sentiments, I would add to it, you know, I think we should do some focusing on campaigning so that people can feel comfortable, to include the Congress, to do things a little bit differently. You know, we're not attuned to doing things fast, but the way technology is just going like crazy right now, I think it ties back to hoping to convince some of our senior leaders, on what I call both sides of the Potomac River, that it's worth taking these gambles. We do need to take some of these risks. And I'm very confident and excited and comfortable; there's just going to be a great time ahead, and all for the better. >> You know, I talk about D.C. because I'm not a lawyer and I'm not a political person, but I always say fewer lawyers, more techies in Congress and the Senate. So I'm just doing my job when I say that. Sorry. President, go ahead. >> Yeah, just one other point, and Steve alluded to this, and Bong did as well. I mean, we've got to be less risk averse in these partnerships. That doesn't mean reckless, but we have to be less risk averse. And I would also, as you talk about technology, I have to reflect on something that happened. You both talked a bit about Bill Britton and his impact on Cal Poly and what we're doing. But we were faced a few years ago with replacing a traditional data warehouse, data storage, data center, and we partnered with AWS, and thank goodness we had that in progress. It enhanced our bandwidth on our campus before COVID hit, along with this partnership with the Digital Transformation Hub. So there is a great example where we had that going. That's not something we could have started when, oh, COVID hit, let's flip that switch. And so we have to be proactive, and we also have to not be risk averse and do some things differently. That has really salvaged the experience for students. Right now, as things are flowing well, we only have about 12% of our courses in person, those essential courses, and I'm just grateful for those partnerships that I've talked about today. >> Yeah, and it's a shining example of how being agile, continuous operations, these are themes that expand into space and the next workforce that needs to be built. Gentlemen, thank you very much for sharing your insights. I know, Bong, you're going to go into the defense side of space in your other sessions. Thank you, gentlemen, for your time, for a great session. Appreciate it. >> Thank you. >> Thank you. >> Thank you, thank you, thank you all. >> I'm John Furrier with The Cube here in Palo Alto, California, covering and hosting, with Cal Poly, the Space and Cybersecurity Symposium 2020.
Thanks for watching.

Published Date: Oct 1, 2020


Armstrong and Guhamad and Jacques V1


 

>> Announcer: From around the globe, it's The Cube, covering Space and Cybersecurity Symposium 2020, hosted by Cal Poly. >> Everyone, welcome to this special virtual conference, the Space and Cybersecurity Symposium 2020 put on by Cal Poly with support from The Cube. I'm John Furey, your host and master of ceremony's got a great topic today, and this session is really the intersection of space and cybersecurity. This topic, and this conversation is a cybersecurity workforce development through public and private partnerships. And we've got a great lineup, we've Jeff Armstrong is the president of California Polytechnic State University, also known as Cal Poly. Jeffrey, thanks for jumping on and Bong Gumahad. The second, Director of C4ISR Division, and he's joining us from the Office of the Under Secretary of Defense for the acquisition and sustainment of Department of Defense, DOD, and of course Steve Jacques is Executive Director, founder National Security Space Association, and managing partner at Velos. Gentlemen, thank you for joining me for this session, we've got an hour of conversation, thanks for coming on. >> Thank you. >> So we've got a virtual event here, we've got an hour to have a great conversation, I'd love for you guys to do an opening statement on how you see the development through public and private partnerships around cybersecurity and space, Jeff, we'll start with you. >> Well, thanks very much, John, it's great to be on with all of you. On behalf of Cal Poly, welcome everyone. Educating the workforce of tomorrow is our mission at Cal Poly, whether that means traditional undergraduates, masters students, or increasingly, mid-career professionals looking to upskill or re-skill. Our signature pedagogy is learn by doing, which means that our graduates arrive at employers, ready day one with practical skills and experience. We have long thought of ourselves as lucky to be on California's beautiful central coast, but in recent years, as we've developed closer relationships with Vandenberg Air Force Base, hopefully the future permanent headquarters of the United States Space Command with Vandenberg and other regional partners, We have discovered that our location is even more advantageous than we thought. We're just 50 miles away from Vandenberg, a little closer than UC Santa Barbara and the base represents the Southern border of what we have come to think of as the central coast region. Cal Poly and Vandenberg Air Force Base have partnered to support regional economic development, to encourage the development of a commercial space port, to advocate for the space command headquarters coming to Vandenberg and other ventures. These partnerships have been possible because both parties stand to benefit. Vandenberg, by securing new streams of revenue, workforce, and local supply chain and Cal Poly by helping to grow local jobs for graduates, internship opportunities for students and research and entrepreneurship opportunities for faculty and staff. Crucially, what's good for Vandenberg Air Force Base and for Cal Poly is also good for the central coast and the U.S., creating new head of household jobs, infrastructure, and opportunity. Our goal is that these new jobs bring more diversity and sustainability for the region. This regional economic development has taken on a life of its own, spawning a new nonprofit called REACH which coordinates development efforts from Vandenberg Air Force Base in the South to Camp Roberts in the North. 
Another factor that has facilitated our relationship with Vandenberg Air Force Base is that we have some of the same friends. For example, Northrop Grumman has long been an important defense contractor and an important partner to Cal Poly, funding scholarships and facilities that have allowed us to stay current with technology and to attract highly qualified students for whom Cal Poly's costs would otherwise be prohibitive. For almost 20 years, Northrop Grumman has funded scholarships for Cal Poly students. This year, they're funding 64 scholarships, some directly in our College of Engineering and most through our Cal Poly Scholars Program. Cal Poly Scholars supports both incoming freshmen and transfer students. These are especially important, 'cause they allow us to provide additional support and opportunities to a group of students who are mostly first generation, low income and underrepresented, and who otherwise might not choose to attend Cal Poly. They also allow us to recruit from partner high schools with large populations of underrepresented minority students, including the Fortune High School in Elk Grove, with which we have developed a deep and lasting connection. We know that the best work is done by balanced teams that include multiple and diverse perspectives. These scholarships help us achieve that goal and I'm sure you know Northrop Grumman was recently awarded a very large contract to modernize the U.S. ICBM armory with some of the work being done at Vandenberg Air Force Base, thus supporting the local economy and protecting... Protecting our efforts in space requires partnerships in the digital realm. Cal Poly has partnered with many private companies, such as AWS. Our partnership with Amazon Web Services has enabled us to train our students with next generation cloud engineering skills, in part, through our jointly created digital transformation hub. Another partnership example is among Cal Poly's California Cybersecurity Institute, College of Engineering, and the California National Guard. This partnership is focused on preparing a cyber-ready workforce by providing faculty and students with a hands-on research and learning environment, side by side with military and law enforcement professionals and cyber experts. We also have a long standing partnership with PG&E, most recently focused on workforce development and redevelopment. Many of our graduates do indeed go on to careers in the aerospace and defense industry. As a rough approximation, more than 4,500 Cal Poly graduates list aerospace or defense as their employment sector on LinkedIn. And it's not just our engineers and computer scientists. When I was speaking to our fellow panelists not too long ago, speaking to Bong, we learned that Rachel Sims, one of our liberal arts majors, is working in his office, so shout out to you, Rachel. And then finally, of course, some of our graduates soar to extraordinary heights, such as Commander Victor Glover, who will be heading to the International Space Station later this year. As I close, all of which is to say that we're deeply committed to workforce development and redevelopment, that we understand the value of public-private partnerships, and that we're eager to find new ways in which to benefit everyone from this further cooperation. 
So we're committed to the region, the state and the nation. Our past efforts in space and cyber security, and links to our partners, as I indicated, in the aerospace industry and government, provide a unique position for us to move forward at the interface of space and cyber security. Thank you so much, John. >> President Armstrong, thank you very much for the comments and congratulations to Cal Poly for being on the forefront of innovation and really taking a unique, progressive view, and I want to tip a hat to you guys over there, thank you very much for those comments, appreciate it. Bong, Department of Defense. Exciting, you've got to defend the nation, space is global, your opening statement. >> Yes, sir, thanks John, appreciate that. Thank you everybody, I'm honored to be on this panel along with President Armstrong of Cal Poly and my longtime friend and colleague Steve Jacques of the National Security Space Association to discuss the very important topic of cybersecurity workforce development, as President Armstrong alluded to. I'll tell you, both of these organizations, Cal Poly and the NSSA, have done and continue to do an exceptional job at finding talent, recruiting them and training current and future leaders and technical professionals that we vitally need for our nation's growing space programs, as well as our collective national security. Earlier today, during session three, I, along with my colleague, Chris Samson, discussed space cybersecurity and how the space domain is changing the landscape of future conflicts. I discussed the rapid emergence of commercial space with the proliferation of hundreds, if not thousands of satellites, providing a variety of services including communications, allowing for global internet connectivity, as one example. Within DOD, we continue to look at how we can leverage this opportunity. I'll tell you, one of the enabling technologies is the use of small satellites, which are inherently cheaper and perhaps more flexible than the traditional bigger systems that we have historically used and employed for DOD. Certainly not lost on me is the fact that Cal Poly pioneered CubeSats 28, 27 years ago, and they set a standard for the use of these systems today. So they saw the value and benefit gained way ahead of everybody else, it seems. And Cal Poly's focus on training and education is commendable. I'm especially impressed by the efforts of another of Steve's colleagues, the current CIO, Mr. Bill Britton, with his high energy push to attract the next generation of innovators. Earlier this year, I had planned on participating in this year's cyber innovation challenge in June, where Cal Poly hosts California middle and high school students and challenges them with situations to test their cyber knowledge. I tell you, I wish I had that kind of opportunity when I was a kid. Unfortunately, the pandemic changed the plan, but I truly look forward to future events such as these to participate in. Now, I want to recognize my good friend, Steve Jacques, whom I've known for perhaps too long of a time here, over two decades or so, who is an acknowledged space expert, and personally, I truly applaud him for having the foresight a few years back to form the National Security Space Association to help the entire space enterprise navigate through not only technology, but policy issues and challenges, and pave the way for operationalizing space. 
Space, it certainly is a warfighting domain, it's not a secret anymore, and while it is a unique area, it shares a lot of common traits with the other domains, such as land, air, and sea, obviously all are strategically important to the defense of the United States. In conflict, they will all be contested and therefore they all need to be defended. One domain alone will not win future conflicts, and in a joint operation, we must succeed in all. So defending space is critical, as critical as defending our other operational domains. And space is no longer a sanctuary available only to the government. Increasingly, as I discussed in a previous session, commercial space is taking the lead in a lot of different areas, including R&D, the so-called new space. So the cybersecurity threat is even more demanding and even more challenging. The U.S. considers unfettered access to, and freedom to operate in, space vital to advancing security, economic prosperity and scientific knowledge of the country, thus making cyberspace an inseparable component of America's financial, social, government and political life. We stood up the U.S. Space Force a year ago or so as the newest military service. Like the other services, its mission is to organize, train and equip space forces in order to protect U.S. and allied interests in space and to provide space capabilities to the joint force. Imagine combining that U.S. Space Force with the U.S. Cyber Command to unify the direction of space and cyberspace operations, strengthen DOD capabilities and integrate and bolster DOD's cyber expertise. Now, of course, to enable all of this requires a trained and professional cadre of cyber security experts, combining a good mix of policy, as well as a high technical skill set. Much like we're seeing in STEM, we need to attract more people to this growing field. Now, the DOD has recognized the importance of the cybersecurity workforce, and we have implemented policies to encourage its growth. Back in 2013, the Deputy Secretary of Defense signed a DOD Cyberspace Workforce Strategy, to create a comprehensive, well-equipped cyber security team to respond to national security concerns. Now, this strategy also created a program that encourages collaboration between the DOD and private sector employees. We call this the Cyber Information Technology Exchange program, or CITE. It's an exchange program, which is very interesting, in which a private sector employee can come and work for the DOD in a cyber security position that spans across multiple mission critical areas important to the DOD. A key responsibility of the cyber security community is to inform military leaders on related threats and the cyber security actions we need to take to defeat these threats. We talked about rapid acquisition, agile business processes and practices to speed up innovation; likewise, cyber security must keep up with this challenge. So cyber security needs to be right there with the challenges and changes, and this requires exceptional personnel. We need to attract talent, invest in the people now to grow a robust cybersecurity workforce for the future. I look forward to the panel discussion, John, thank you. >> Thank you so much, Bong, for those comments and, you know, new challenges or new opportunities and new possibilities and freedom to operate in space is critical, thank you for those comments, looking forward to chatting further. Steve Jacques, Executive Director of NSSA, you're up, opening statement. 
>> Thank you, John, and echoing Bong's thanks to Cal Poly for pulling this important event together and, frankly, for allowing the National Security Space Association to be a part of it. Likewise, on behalf of the association, I'm delighted and honored to be on this panel with President Armstrong, along with my friend and colleague, Bong Gumahad. Something for you all to know about Bong, he spent the first 20 years of his career in the Air Force doing space programs. He then went into industry for several years and then came back into government to serve, very few people do that. So Bong, on behalf of the space community, we thank you for your lifelong devotion to service to our nation, we really appreciate that. And I also echo Bong's shout out to that guy, Bill Britton, who's been a longtime co-conspirator of ours, and you're doing great work there in the cyber program at Cal Poly, Bill, keep it up. But Professor Armstrong, keep a close eye on him. (laughter) I would like to offer a little extra context to the great comments made by President Armstrong and Bong. And in our view, the timing of this conference really could not be any better. We all recently reflected again on that tragic 9/11 surprise attack on our homeland and it's an appropriate time, we think, to take pause. While a percentage of you in the audience here weren't even born or were babies then, for most of us, it still feels like yesterday. And moreover, a tragedy like 9/11 has taught us a lot, to include being more vigilant, always keeping our collective eyes and ears open, to include those "eyes and ears from space," making sure nothing like this ever happens again. So this conference is a key aspect; protecting our nation requires we work in a cyber secure environment at all times. But you know, the fascinating thing about space systems is we can't see 'em. Now sure, we see space launches, man, there's nothing more invigorating than that. But after launch they become invisible, so what are they really doing up there? What are they doing to enable our quality of life in the United States and in the world? Well, to illustrate, I'd like to paraphrase elements of an article in Forbes magazine by Bong's and my good friend, Chuck Beames. Chuck is a space guy, he actually had Bong's job a few years back in the Pentagon. He's now Chairman and Chief Strategy Officer at York Space Systems and in his spare time, he's Chairman of the SmallSat Alliance. Chuck speaks in words that everyone can understand, so I'd like to give you some of his words out of his article, paraphrased somewhat, so these are Chuck's words. "Let's talk about average Joe and plain Jane. "Before heading to the airport for a business trip "to New York City, Joe checks the weather forecast, "informed by NOAA's weather satellites, "to see what to pack for the trip. "He then calls an Uber, that space app everybody uses, "it matches riders with drivers via GPS, "to take him to the airport. "So Joe has lunch at the airport, "unbeknownst to him, his organic lunch is made "with the help of precision farming "made possible to optimize the irrigation and fertilization "with remote spectral sensing coming from space and GPS. "On the plane, the pilot navigates around weather, "aided by GPS and NOAA's weather satellites "and Joe makes his meeting on time "to join his New York colleagues in a video call "with a key customer in Singapore, "made possible by telecommunication satellites. 
"En route to his next meeting, "Joe receives notice changing the location of the meeting "to the other side of town. "So he calmly tells Siri to adjust the destination "and his satellite-guided Google maps redirect him "to the new location. "That evening, Joe watches the news broadcast via satellite, "report details of meeting among world leaders, "discussing the developing crisis in Syria. "As it turns out various forms of "'remotely sensed information' collected from satellites "indicate that yet another banned chemical weapon "may have been used on its own people. "Before going to bed, Joe decides to call his parents "and congratulate them for their wedding anniversary "as they cruise across the Atlantic, "made possible again by communication satellites "and Joe's parents can enjoy the call "without even wondering how it happened. "The next morning back home, "Joe's wife, Jane is involved in a car accident. "Her vehicle skids off the road, she's knocked unconscious, "but because of her satellite equipped OnStar system, "the crash is detected immediately, "and first responders show up on the scene in time. "Joe receives the news, books an early trip home, "sends flowers to his wife "as he orders another Uber to the airport. "Over that 24 hours, "Joe and Jane used space system applications "for nearly every part of their day. "Imagine the consequences if at any point "they were somehow denied these services, "whether they be by natural causes or a foreign hostility. "In each of these satellite applications used in this case, "were initially developed for military purposes "and continued to be, but also have remarkable application "on our way of life, just many people just don't know that." So ladies and gentlemen, now you know, thanks to Chuck Beames. Well, the United States has a proud heritage of being the world's leading space-faring nation. Dating back to the Eisenhower and Kennedy years, today, we have mature and robust systems operating from space, providing overhead reconnaissance to "watch and listen," provide missile warning, communications, positioning, navigation, and timing from our GPS system, much of which you heard in Lieutenant General JT Thomson's earlier speech. These systems are not only integral to our national security, but also to our quality of life. As Chuck told us, we simply no longer can live without these systems as a nation and for that matter, as a world. But over the years, adversaries like China, Russia and other countries have come to realize the value of space systems and are aggressively playing catch up while also pursuing capabilities that will challenge our systems. As many of you know, in 2007, China demonstrated its ASAT system by actually shooting down one of its own satellites and has been aggressively developing counterspace systems to disrupt ours. So in a heavily congested space environment, our systems are now being contested like never before and will continue to be. Well, as a Bong mentioned, the United States have responded to these changing threats. In addition to adding ways to protect our system, the administration and the Congress recently created the United States Space Force and the operational United States Space Command, the latter of which you heard President Armstrong and other Californians hope is going to be located at Vandenberg Air Force Base. Combined with our intelligence community, today we have focused military and civilian leadership now in space, and that's a very, very good thing. 
Commensurately on the industry side, we did create the National Security Space Association, devoted solely to supporting the National Security Space Enterprise. We're based here in the DC area, but we have arms and legs across the country and we are loaded with extraordinary talent in scores of former government executives. So NSSA is joined at the hip with our government customers to serve and to support. We're busy with a multitude of activities underway, ranging from a number of thought-provoking policy papers, our recurring spacetime webcasts, supporting Congress's space power caucus, and other main serious efforts. Check us out at nssaspace.org. One of our strategic priorities and central to today's events is to actively promote and nurture the workforce development, just like Cal-Poly. We will work with our U.S. government customers, industry leaders, and academia to attract and recruit students to join the space world, whether in government or industry, and to assist in mentoring and training as their careers progress. On that point, we're delighted to be working with Cal Poly as we hopefully will undertake a new pilot program with them very soon. So students stay tuned, something I can tell you, space is really cool. While our nation's satellite systems are technical and complex, our nation's government and industry workforce is highly diverse, with a combination of engineers, physicists and mathematicians, but also with a large non-technical expertise as well. Think about how government gets these systems designed, manufactured, launching into orbit and operating. They do this via contracts with our aerospace industry, requiring talents across the board, from cost estimating, cost analysis, budgeting, procurement, legal, and many other support tasks that are integral to the mission. Many thousands of people work in the space workforce, tens of billions of dollars every year. This is really cool stuff and no matter what your education background, a great career to be part of. In summary, as Bong had mentioned as well, there's a great deal of exciting challenges ahead. We will see a new renaissance in space in the years ahead and in some cases it's already begun. Billionaires like Jeff Bezos, Elon Musk, Sir Richard Branson, are in the game, stimulating new ideas and business models. Other private investors and startup companies, space companies are now coming in from all angles. The exponential advancement of technology and micro electronics now allows a potential for a plethora of small sat systems to possibly replace older satellites, the size of a Greyhound bus. It's getting better by the day and central to this conference, cybersecurity is paramount to our nation's critical infrastructure in space. So once again, thanks very much and I look forward to the further conversation. >> Steve, thank you very much. Space is cool, it's relevant, but it's important as you pointed out in your awesome story about how it impacts our life every day so I really appreciate that great story I'm glad you took the time to share that. You forgot the part about the drone coming over in the crime scene and, you know, mapping it out for you, but we'll add that to the story later, great stuff. My first question is, let's get into the conversations, because I think this is super important. President Armstrong, I'd like you to talk about some of the points that was teased out by Bong and Steve. 
One in particular is the comment around how military research was important in developing all these capabilities, which is impacting all of our lives through that story. It was the military research that has enabled a generation and generation of value for consumers. This is kind of this workforce conversation, there are opportunities now with research and grants, and this is a funding of innovation that is highly accelerated, it's happening very quickly. Can you comment on how research and the partnerships to get that funding into the universities is critical? >> Yeah, I really appreciate that and appreciate the comments of my colleagues. And it really boils down to me to partnerships, public-private partnerships, you have mentioned Northrop Grumman, but we have partnerships with Lockheed Martin, Boeing, Raytheon, Space X, JPL, also member of an organization called Business Higher Education Forum, which brings together university presidents and CEOs of companies. There's been focused on cybersecurity and data science and I hope that we can spill into cybersecurity and space. But those partnerships in the past have really brought a lot forward. At Cal Poly, as mentioned, we've been involved with CubeSat, we've have some secure work, and we want to plan to do more of that in the future. Those partnerships are essential, not only for getting the R&D done, but also the students, the faculty, whether they're master's or undergraduate can be involved with that work, they get that real life experience, whether it's on campus or virtually now during COVID or at the location with the partner, whether it may be governmental or industry, and then they're even better equipped to hit the ground running. And of course we'd love to see more of our students graduate with clearance so that they could do some of that secure work as well. So these partnerships are absolutely critical and it's also in the context of trying to bring the best and the brightest in all demographics of California and the U.S. into this field, to really be successful. So these partnerships are essential and our goal is to grow them just like I know our other colleagues in the CSU and the UC are planning to do. >> You know, just as my age I've seen, I grew up in the eighties and in college and they're in that system's generation and the generation before me, they really kind of pioneered the space that spawned the computer revolution. I mean, you look at these key inflection points in our lives, they were really funded through these kinds of real deep research. Bong, talk about that because, you know, we're living in an age of cloud and Bezos was mentioned, Elon Musk, Sir Richard Branson, you got new ideas coming in from the outside, you have an accelerated clock now in terms of the innovation cycles and so you got to react differently, you guys have programs to go outside of the defense department, how important is this because the workforce that are in schools and/or folks re-skilling are out there and you've been on both sides of the table, so share your thoughts. 
>> No, thanks Johnny, thanks for the opportunity to respond to, and that's what, you know, you hit on the nose back in the 80's, R&D and space especially was dominated by government funding, contracts and so on, but things have changed as Steve pointed out, allow these commercial entities funded by billionaires are coming out of the woodwork, funding R&D so they're taking the lead, so what we can do within the DOD in government is truly take advantage of the work they've done. And since they're, you know, paving the way to new approaches and new way of doing things and I think we can certainly learn from that and leverage off of that, saves us money from an R&D standpoint, while benefiting from the product that they deliver. You know, within DOD, talking about workforce development, you know, we have prioritized and we have policies now to attract and retain the talent we need. I had the folks do some research and it looks like from a cybersecurity or workforce standpoint, a recent study done, I think last year in 2019, found that the cyber security workforce gap in U.S. is nearing half a million people, even though it is a growing industry. So the pipeline needs to be strengthened, getting people through, you know, starting young and through college, like Professor Armstrong indicated because we're going to need them to be in place, you know, in a period of about maybe a decade or so. On top of that, of course, is the continuing issue we have with the gap with STEM students. We can't afford not have expertise in place to support all the things we're doing within DoD, not only DoD but the commercial side as well, thank you. >> How's the gap get filled, I mean, this is, again, you've got cybersecurity, I mean, with space it's a whole other kind of surface area if you will, it's not really surface area, but it is an IOT device if you think about it, but it does have the same challenges, that's kind of current and progressive with cybersecurity. Where's the gap get filled, Steve or President Armstrong, I mean, how do you solve the problem and address this gap in the workforce? What are some solutions and what approaches do we need to put in place? >> Steve, go ahead., I'll follow up. >> Okay, thanks, I'll let you correct me. (laughter) It's a really good question, and the way I would approach it is to focus on it holistically and to acknowledge it upfront and it comes with our teaching, et cetera, across the board. And from an industry perspective, I mean, we see it, we've got to have secure systems in everything we do, and promoting this and getting students at early ages and mentoring them and throwing internships at them is so paramount to the whole cycle. And that's kind of, it really takes a focused attention and we continue to use the word focus from an NSSA perspective. We know the challenges that are out there. There are such talented people in the workforce, on the government side, but not nearly enough of them and likewise on the industry side, we could use more as well, but when you get down to it, you know, we can connect dots, you know, the aspects that Professor Armstrong talked about earlier to where you continue to work partnerships as much as you possibly can. We hope to be a part of that network, that ecosystem if you will, of taking common objectives and working together to kind of make these things happen and to bring the power, not just of one or two companies, but of our entire membership thereabout. 
>> President Armstrong-- >> Yeah, I would also add it again, it's back to the partnerships that I talked about earlier, one of our partners is high schools and schools Fortune, Margaret Fortune, who worked in a couple of administrations in California across party lines and education, their fifth graders all visit Cal Poly, and visit our learned-by-doing lab. And you've got to get students interested in STEM at an early age. We also need the partnerships, the scholarships, the financial aid, so the students can graduate with minimal to no debt to really hit the ground running and that's exacerbated and really stress now with this COVID induced recession. California supports higher education at a higher rate than most states in the nation, but that has brought this year for reasons all understand due to COVID. And so our partnerships, our creativity, and making sure that we help those that need the most help financially, that's really key because the gaps are huge. As my colleagues indicated, you know, half a million jobs and I need you to look at the students that are in the pipeline, we've got to enhance that. And the placement rates are amazing once the students get to a place like Cal Poly or some of our other amazing CSU and UC campuses, placement rates are like 94%. Many of our engineers, they have jobs lined up a year before they graduate. So it's just going to take a key partnerships working together and that continued partnership with government local, of course, our state, the CSU, and partners like we have here today, both Steve and Bong so partnerships is the thing. >> You know, that's a great point-- >> I could add, >> Okay go ahead. >> All right, you know, the collaboration with universities is one that we put on lot of emphasis here, and it may not be well known fact, but just an example of national security, the AUC is a national centers of academic excellence in cyber defense works with over 270 colleges and universities across the United States to educate and certify future cyber first responders as an example. So that's vibrant and healthy and something that we ought to take advantage of. >> Well, I got the brain trust here on this topic. I want to get your thoughts on this one point, 'cause I'd like to define, you know, what is a public-private partnership because the theme that's coming out of the symposium is the script has been flipped, it's a modern era, things are accelerated, you've got security, so you've got all of these things kind of happenning it's a modern approach and you're seeing a digital transformation play out all over the world in business and in the public sector. So what is a modern public-private partnership and what does it look like today because people are learning differently. COVID has pointed out, which is that we're seeing right now, how people, the progressions of knowledge and learning, truth, it's all changing. How do you guys view the modern version of public-private partnership and some examples and some proof points, can you guys share that? We'll start with you, Professor Armstrong. >> Yeah, as I indicated earlier, we've had, and I could give other examples, but Northrop Grumman, they helped us with a cyber lab many years ago that is maintained directly, the software, the connection outside it's its own unit so the students can learn to hack, they can learn to penetrate defenses and I know that that has already had some considerations of space, but that's a benefit to both parties. 
So a good public-private partnership has benefits to both entities and the common factor for universities with a lot of these partnerships is the talent. The talent that is needed, what we've been working on for years of, you know, the undergraduate or master's or PhD programs, but now it's also spilling into upskilling and reskilling, as jobs, you know, folks who are in jobs today that didn't exist two years, three years, five years ago, but it also spills into other aspects that can expand even more. We're very fortunate we have land, there's opportunities, we have ONE Tech project. We are expanding our tech park, I think we'll see opportunities for that and it'll be adjusted due to the virtual world that we're all learning more and more about it, which we were in before COVID. But I also think that that person to person is going to be important, I want to make sure that I'm driving across a bridge or that satellite's being launched by the engineer that's had at least some in person training to do that in that experience, especially as a first time freshman coming on campus, getting that experience, expanding it as an adult, and we're going to need those public-private partnerships in order to continue to fund those at a level that is at the excellence we need for these STEM and engineering fields. >> It's interesting people and technology can work together and these partnerships are the new way. Bongs too with reaction to the modern version of what a public successful private partnership looks like. >> If I could jump in John, I think, you know, historically DOD's had a high bar to overcome if you will, in terms of getting rapid... pulling in new companies, miss the fall if you will, and not rely heavily on the usual suspects, of vendors and the like, and I think the DOD has done a good job over the last couple of years of trying to reduce that burden and working with us, you know, the Air Force, I think they're pioneering this idea around pitch days, where companies come in, do a two-hour pitch and immediately notified of, you know, of an a award, without having to wait a long time to get feedback on the quality of the product and so on. So I think we're trying to do our best to strengthen that partnership with companies outside of the main group of people that we typically use. >> Steve, any reaction, any comment to add? >> Yeah, I would add a couple and these are very excellent thoughts. It's about taking a little gamble by coming out of your comfort zone, you know, the world that Bong and I, Bong lives in and I used to live in the past, has been quite structured. It's really about, we know what the threat is, we need to go fix it, we'll design as if as we go make it happen, we'll fly it. Life is so much more complicated than that and so it's really, to me, I mean, you take an example of the pitch days of Bong talks about, I think taking a gamble by attempting to just do a lot of pilot programs, work the trust factor between government folks and the industry folks and academia, because we are all in this together in a lot of ways. For example, I mean, we just sent a paper to the white house at their request about, you know, what would we do from a workforce development perspective and we hope to embellish on this over time once the initiative matures, but we have a piece of it for example, is a thing we call "clear for success," getting back to president Armstrong's comments so at a collegiate level, you know, high, high, high quality folks are in high demand. 
So why don't we put together a program that grabs kids in their underclass years, identifies folks that are interested in doing something like this, get them scholarships, have a job waiting for them that they're contracted for before they graduate, and when they graduate, they walk with an SCI clearance. We believe that can be done, so that's an example of ways in which public-private partnerships can happen to where you now have a talented kid ready to go on day one. We think those kinds of things can happen, it just gets back down to being focused on specific initiatives, giving them a chance and run as many pilot programs as you can, like pitch days. >> That's a great point, it's a good segue. Go ahead, President Armstrong. >> I just want to jump in and echo both the Bong and Steve's comments, but Steve that, you know, your point of, you know our graduates, we consider them ready day one, well they need to be ready day one and ready to go secure. We totally support that and love to follow up offline with you on that. That's exciting and needed, very much needed more of it, some of it's happening, but we certainly have been thinking a lot about that and making some plans. >> And that's a great example, a good segue. My next question is kind of re-imagining these workflows is kind of breaking down the old way and bringing in kind of the new way, accelerate all kinds of new things. There are creative ways to address this workforce issue and this is the next topic, how can we employ new creative solutions because let's face it, you know, it's not the days of get your engineering degree and go interview for a job and then get slotted in and get the intern, you know, the programs and you'd matriculate through the system. This is multiple disciplines, cybersecurity points at that. You could be smart in math and have a degree in anthropology and be one of the best cyber talents on the planet. So this is a new, new world, what are some creative approaches that's going to work for you? >> Alright, good job, one of the things, I think that's a challenge to us is, you know, somehow we got me working for, with the government, sexy right? You know, part of the challenge we have is attracting the right level of skill sets and personnel but, you know, we're competing, oftentimes, with the commercial side, the gaming industry as examples is a big deal. And those are the same talents we need to support a lot of the programs that we have in DOD. So somehow we have do a better job to Steve's point about making the work within DOD, within the government, something that they would be interested early on. So attract them early, you know, I could not talk about Cal Poly's challenge program that they were going to have in June inviting high school kids really excited about the whole idea of space and cyber security and so on. Those are some of the things that I think we have to do and continue to do over the course of the next several years. >> Awesome, any other creative approaches that you guys see working or might be an idea, or just to kind of stoke the ideation out there? Internships, obviously internships are known, but like, there's got to be new ways. >> Alright, I think you can take what Steve was talking about earlier, getting students in high school and aligning them sometimes at first internship, not just between the freshman and sophomore year, but before they enter Cal Poly per se and they're involved. So I think that's absolutely key, getting them involved in many other ways. 
We have an example of upskilling or work redevelopment here in the central coast, PG&E Diablo nuclear plant that is going to decommission in around 2024. And so we have a ongoing partnership to work and reposition those employees for the future. So that's, you know, engineering and beyond but think about that just in the manner that you were talking about. So the upskilling and reskilling, and I think that's where, you know, we were talking about that Purdue University, other California universities have been dealing with online programs before COVID, and now with COVID so many more Faculty were pushed into that area, there's going to be a much more going and talk about workforce development in upskilling and reskilling, the amount of training and education of our faculty across the country in virtual and delivery has been huge. So there's always a silver linings in the cloud. >> I want to get your guys' thoughts on one final question as we end the segment, and we've seen on the commercial side with cloud computing on these highly accelerated environments where, you know, SAS business model subscription, and that's on the business side, but one of the things that's clear in this trend is technology and people work together and technology augments the people components. So I'd love to get your thoughts as we look at a world now, we're living in COVID, and Cal Poly, you guys have remote learning right now, it's at the infancy, it's a whole new disruption, if you will, but also an opportunity enable new ways to encollaborate, So if you look at people and technology, can you guys share your view and vision on how communities can be developed, how these digital technologies and people can work together faster to get to the truth or make a discovery, hire, develop the workforce, these are opportunities, how do you guys view this new digital transformation? >> Well, I think there's huge opportunities and just what we're doing with this symposium, we're filming this on Monday and it's going to stream live and then the three of us, the four of us can participate and chat with participants while it's going on. That's amazing and I appreciate you, John, you bringing that to this symposium. I think there's more and more that we can do. From a Cal Poly perspective, with our pedagogy so, you know, linked to learn by doing in-person will always be important to us, but we see virtual, we see partnerships like this, can expand and enhance our ability and minimize the in-person time, decrease the time to degree, enhance graduation rate, eliminate opportunity gaps for students that don't have the same advantages. So I think the technological aspect of this is tremendous. Then on the upskilling and reskilling, where employees are all over, they can re be reached virtually, and then maybe they come to a location or really advanced technology allows them to get hands on virtually, or they come to that location and get it in a hybrid format. So I'm very excited about the future and what we can do, and it's going to be different with every university, with every partnership. It's one size does not fit all, There's so many possibilities, Bong, I can almost imagine that social network that has a verified, you know, secure clearance. I can jump in, and have a little cloak of secrecy and collaborate with the DOD possibly in the future. But these are the kind of crazy ideas that are needed, your thoughts on this whole digital transformation cross-pollination. 
>> I think technology is going to be revolutionary here, John, you know, we're focusing lately on what we call visual engineering to quicken the pace of the delivery capability to warfighter as an example, I think AI, Machine Language, all that's going to have a major play in how we operate in the future. We're embracing 5G technologies, and the ability for zero latency, more IOT, more automation of the supply chain, that sort of thing, I think the future ahead of us is very encouraging, I think it's going to do a lot for national defense, and certainly the security of the country. >> Steve, your final thoughts, space systems are systems, and they're connected to other systems that are connected to people, your thoughts on this digital transformation opportunity. >> Such a great question and such a fun, great challenge ahead of us. Echoing my colleagues sentiments, I would add to it, you know, a lot of this has, I think we should do some focusing on campaigning so that people can feel comfortable to include the Congress to do things a little bit differently. You know, we're not attuned to doing things fast, but the dramatic, you know, the way technology is just going like crazy right now, I think it ties back to, hoping to convince some of our senior leaders and what I call both sides of the Potomac river, that it's worth taking this gamble, we do need to take some of these things you know, in a very proactive way. And I'm very confident and excited and comfortable that this is going to be a great time ahead and all for the better. >> You know, I always think of myself when I talk about DC 'cause I'm not a lawyer and I'm not a political person, but I always say less lawyers, more techies than in Congress and Senate, so (laughter)I always get in trouble when I say that. Sorry, President Armstrong, go ahead. >> Yeah, no, just one other point and Steve's alluded to this and Bong did as well, I mean, we've got to be less risk averse in these partnerships, that doesn't mean reckless, but we have to be less risk averse. And also, as you talk about technology, I have to reflect on something that happened and you both talked a bit about Bill Britton and his impact on Cal Poly and what we're doing. But we were faced a few years ago of replacing traditional data, a data warehouse, data storage, data center and we partnered with AWS and thank goodness, we had that in progress and it enhanced our bandwidth on our campus before COVID hit, and with this partnership with the digital transformation hub, so there's a great example where we had that going. That's not something we could have started, "Oh COVID hit, let's flip that switch." And so we have to be proactive and we also have to not be risk-averse and do some things differently. That has really salvaged the experience for our students right now, as things are flowing well. We only have about 12% of our courses in person, those essential courses and I'm just grateful for those partnerships that I have talked about today. >> And it's a shining example of how being agile, continuous operations, these are themes that expand the space and the next workforce needs to be built. Gentlemen, thank you very much for sharing your insights, I know Bong, you're going to go into the defense side of space in your other sessions. Thank you gentlemen, for your time, for a great session, I appreciate it. >> Thank you. >> Thank you gentlemen. >> Thank you. >> Thank you. >> Thank you, thank you all. 
I'm John Furey with The Cube here in Palo Alto, California covering and hosting with Cal Poly, the Space and Cybersecurity Symposium 2020, thanks for watching. (bright atmospheric music)

Published Date : Sep 18 2020



Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning


 

>> Paige: Hello, everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning." I'm Paige Roberts, Opensource Relations Manager at Vertica, and I'll be your host for this session. Joining me is Vertica Software Engineer, George Larionov. >> George: Hi. >> Paige: (chuckles) That's George. So, before we begin, I encourage you guys to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. So, as soon as a question occurs to you, go ahead and type it in, and there will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to get to during that time. Any questions we don't get to, we'll do our best to answer offline. Now, alternatively, you can visit Vertica Forum to post your questions there, after the session. Our engineering team is planning to join the forums to keep the conversation going, so you can ask an engineer afterwards, just as if it were a regular conference in person. Also, reminder, you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And, before you ask, yes, this virtual session is being recorded, and it will be available to view by the end this week. We'll send you a notification as soon as it's ready. Now, let's get started, over to you, George. >> George: Thank you, Paige. So, I've been introduced. I'm a Software Engineer at Vertica, and today I'm going to be talking about a new feature, Vertica's Integration with TensorFlow. So, first, I'm going to go over what is TensorFlow and what are neural networks. Then, I'm going to talk about why integrating with TensorFlow is a useful feature, and, finally, I am going to talk about the integration itself and give an example. So, as we get started here, what is TensorFlow? TensorFlow is an opensource machine learning library, developed by Google, and it's actually one of many such libraries. And, the whole point of libraries like TensorFlow is to simplify the whole process of working with neural networks, such as creating, training, and using them, so that it's available to everyone, as opposed to just a small subset of researchers. So, neural networks are computing systems that allow us to solve various tasks. Traditionally, computing algorithms were designed completely from the ground up by engineers like me, and we had to manually sift through the data and decide which parts are important for the task and which are not. Neural networks aim to solve this problem, a little bit, by sifting through the data themselves, automatically and finding traits and features which correlate to the right results. So, you can think of it as neural networks learning to solve a specific task by looking through the data without having human beings have to sit and sift through the data themselves. So, there's a couple necessary parts to getting a trained neural model, which is the final goal. By the way, a neural model is the same as a neural network. Those are synonymous. So, first, you need this light blue circle, an untrained neural model, which is pretty easy to get in TensorFlow, and, in edition to that, you need your training data. Now, this involves both training inputs and training labels, and I'll talk about exactly what those two things are on the next slide. 
But, basically, you need to train your model with the training data, and, once it is trained, you can use your trained model to predict on just the purple circle, so new training inputs. And, it will predict the training labels for you. You don't have to label it anymore. So, a neural network can be thought of as... Training a neural network can be thought of as teaching a person how to do something. For example, if I want to learn to speak a new language, let's say French, I would probably hire some sort of tutor to help me with that task, and I would need a lot of practice constructing and saying sentences in French. And a lot of feedback from my tutor on whether my pronunciation or grammar, et cetera, is correct. And, so, that would take me some time, but, finally, hopefully, I would be able to learn the language and speak it without any sort of feedback, getting it right. So, in a very similar manner, a neural network needs to practice on, example, training data, first, and, along with that data, it needs labeled data. In this case, the labeled data is kind of analogous to the tutor. It is the correct answers, so that the network can learn what those look like. But, ultimately, the goal is to predict on unlabeled data which is analogous to me knowing how to speak French. So, I went over most of the bullets. A neural network needs a lot of practice. To do that, it needs a lot of good labeled data, and, finally, since a neural network needs to iterate over the training data many, many times, it needs a powerful machine which can do that in a reasonable amount of time. So, here's a quick checklist on what you need if you have a specific task that you want to solve with a neural network. So, the first thing you need is a powerful machine for training. We discussed why this is important. Then, you need TensorFlow installed on the machine, of course, and you need a dataset and labels for your dataset. Now, this dataset can be hundreds of examples, thousands, sometimes even millions. I won't go into that because the dataset size really depends on the task at hand, but if you have these four things, you can train a good neural network that will predict whatever result you want it to predict at the end. So, we've talked about neural networks and TensorFlow, but the question is if we already have a lot of built-in machine-learning algorithms in Vertica, then why do we need to use TensorFlow? And, to answer that question, let's look at this dataset. So, this is a pretty simple toy dataset with 20,000 points, but it shows, it simulates a more complex dataset with some sort of two different classes which are not related in a simple way. So, the existing machine-learning algorithms that Vertica already has, mostly fail on this pretty simple dataset. Linear models can't really draw a good line separating the two types of points. NaĂŻve Bayes, also, performs pretty badly, and even the Random Forest algorithm, which is a pretty powerful algorithm, with 300 trees gets only 80% accuracy. However, a neural network with only two hidden layers gets 99% accuracy in about ten minutes of training. So, I hope that's a pretty compelling reason to use neural networks, at least sometimes. So, as an aside, there are plenty of tasks that do fit the existing machine-learning algorithms in Vertica. 
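For readers who want to see what the two-hidden-layer network George just described might look like in code, here is a minimal sketch, not taken from the talk itself: the data generation, layer sizes, and epoch count are illustrative assumptions rather than the exact setup behind the 99% figure, but the shape of the problem (a 20,000-point, two-class dataset that no straight line can separate) is the same.

```python
import numpy as np
import tensorflow as tf

# Synthetic two-class dataset (20,000 points): an inner disc and an outer ring,
# which no linear model can separate cleanly.
rng = np.random.default_rng(42)
n = 10000
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(1.5, 2.5, n)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
y = np.concatenate([np.zeros(n), np.ones(n)])

# Shuffle so the validation split sees both classes.
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]

# A small network with two hidden layers, as in the example from the talk.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=64, validation_split=0.2)
```

On a dataset with this kind of non-linear class boundary, a network like this typically reaches very high accuracy in a few minutes on a laptop, which is the contrast George draws against the linear and tree-based models.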
That's why they're there, and if one of your tasks that you want to solve fits one of the existing algorithms, well, then I would recommend using that algorithm, not TensorFlow, because, while neural networks have their place and are very powerful, it's often easier to use an existing algorithm, if possible. Okay, so, now that we've talked about why neural networks are needed, let's talk about integrating them with Vertica. So, neural networks are best trained using GPUs, which are Graphics Processing Units, and it's, basically, just a different processing unit than a CPU. GPUs are good for training neural networks because they excel at doing many, many simple operations at the same time, which is needed for a neural network to be able to iterate through the training data many times. However, Vertica runs on CPUs and cannot run on GPUs at all because that's not how it was designed. So, to train our neural networks, we have to go outside of Vertica, and exporting a small batch of training data is pretty simple. So, that's not really a problem, but, given this information, why do we even need Vertica? If we train outside, then why not do everything outside of Vertica? So, to answer that question, here is a slide that Philips was nice enough to let us use. This is an example of production system at Philips. So, it consists of two branches. On the left, we have a branch with historical device log data, and this can kind of be thought of as a bunch of training data. And, all that data goes through some data integration, data analysis. Basically, this is where you train your models, whether or not they are neural networks, but, for the purpose of this talk, this is where you would train your neural network. And, on the right, we have a branch which has live device log data coming in from various MRI machines, CAT scan machines, et cetera, and this is a ton of data. So, these machines are constantly running. They're constantly on, and there's a bunch of them. So, data just keeps streaming in, and, so, we don't want this data to have to take any unnecessary detours because that would greatly slow down the whole system. So, this data in the right branch goes through an already trained predictive model, which need to be pretty fast, and, finally, it allows Philips to do some maintenance on these machines before they actually break, which helps Philips, obviously, and definitely the medical industry as well. So, I hope this slide helped explain the complexity of a live production system and why it might not be reasonable to train your neural networks directly in the system with the live device log data. So, a quick summary on just the neural networks section. So, neural networks are powerful, but they need a lot of processing power to train which can't really be done well in a production pipeline. However, they are cheap and fast to predict with. Prediction with a neural network does not require GPU anymore. And, they can be very useful in production, so we do want them there. We just don't want to train them there. So, the question is, now, how do we get neural networks into production? So, we have, basically, two options. The first option is to take the data and export it to our machine with TensorFlow, our powerful GPU machine, or we can take our TensorFlow model and put it where the data is. In this case, let's say that that is Vertica. So, I'm going to go through some pros and cons of these two approaches. The first one is bringing the data to the analytics. 
The pros of this approach are that TensorFlow is already installed and running on this GPU machine, and we don't have to move the model at all. The cons, however, are that we have to transfer all the data to this machine, and if that data is big, gigabytes, terabytes, et cetera, then that transfer becomes a huge bottleneck, because you can only move it over in small batches and GPU machines tend not to have that much storage. Furthermore, TensorFlow prediction doesn't actually need a GPU, so you would end up paying for an expensive GPU for no reason. It's not parallelized, because you just have one GPU machine. You can't put your production system on this GPU machine, as we discussed. And, so, you're left with good results, but not fast and not where you need them. So, now, let's look at the second option: bringing the analytics to the data. The pros of this approach are that we can integrate with our production system. It's low impact, because prediction is not processor intensive. It's cheap, or, at least, pretty much as cheap as your system was before. It's parallelized, because Vertica was always parallelized, which we'll talk about on the next slide. There's no extra data movement. You get the benefit of model management in Vertica, meaning, if you import multiple TensorFlow models, you can keep track of their various attributes, when they were imported, et cetera. And, the results are right where you need them, inside your production pipeline. The two cons are that TensorFlow is limited to just prediction inside Vertica, and, if you want to retrain your model, you need to do that outside of Vertica and then reimport it. So, just as a recap of parallelization: everything in Vertica is parallelized and distributed, and TensorFlow is no exception. When you import your TensorFlow model into your Vertica cluster, it gets copied to all the nodes automatically, and TensorFlow runs in fenced mode, which means that if the TensorFlow process fails for whatever reason (it shouldn't, but if it does), Vertica itself will not crash, which is obviously important. And, finally, prediction happens on each node. There are multiple threads of TensorFlow processes running, each processing different little bits of data, which is much faster than processing the data line by line, because it all happens in a parallelized fashion. And, so, the result is fast prediction. So, here's an example which I hope is a little closer to what everyone is used to than the usual machine-learning TensorFlow example. This is the Boston housing dataset, or, rather, a small subset of it. On the left, we have the input data, to go back to the first slide, and, on the right, is the training label. The input data consists of one line per plot of land in Boston, along with various attributes, such as the level of crime in that area, how much industry is in that area, whether it's on the Charles River, et cetera, and, on the right, we have as the label the median house value for that plot of land. The goal is to put all this data into the neural network and, finally, get a model which can predict on new incoming data and produce a good housing value for that data. Now, I'm going to go through, step by step, how to actually use TensorFlow models in Vertica.
So, I won't go into much detail on the first step, because there are countless tutorials and resources online on how to use TensorFlow to train a neural network; that's the first step. The second step is to save the model in TensorFlow's 'frozen graph' format. Again, this information is available online. The third step is to create a small, simple JSON file describing the inputs and outputs of the model, what data types they are, et cetera. This is needed for Vertica to be able to translate from TensorFlow land into Vertica SQL land, so that it can use a SQL table instead of the input format TensorFlow usually takes. Once you have your model file and your JSON file, you put both of those files in a directory on a node, any node, in the Vertica cluster, and name that directory whatever you want your model to ultimately be called inside of Vertica. Once you do that, you can go ahead and import that directory into Vertica. This IMPORT_MODELS function already exists in Vertica; all we added was a new category that it can import. So, what you need to do is specify the path to your neural network directory and specify that the category of the model is TensorFlow. Once you successfully import, in order to predict, you run the brand new PREDICT_TENSORFLOW function. In this case, we're predicting on everything from the input table, which is what the star means. The model name is Boston housing net, which is the name of your directory, and then there's a little bit of boilerplate. The two names, ID and value, after the AS are just the names of your output columns, and, finally, the Boston housing data is whatever SQL table you want to predict on that fits the input type of your network. And, this will output a bunch of predictions, in this case, values of houses that the network thinks are appropriate for all the input data. So, just a quick summary. We talked about what TensorFlow and neural networks are, and then we discussed that TensorFlow works best for training on GPUs because training needs very specific hardware characteristics, while Vertica is designed to use CPUs, and it's really good at storing and accessing a lot of data quickly, but it's not very well designed for having neural networks trained inside of it. Then, we talked about how neural network models are powerful and we want to use them in our production flow, and, since prediction is fast, we can go ahead and do that; we just don't want to train them there. And, finally, I presented the Vertica TensorFlow integration, which allows importing a trained TensorFlow model into Vertica and predicting on all the data that is inside Vertica with a few simple lines of SQL. So, thank you for listening. I'm going to take some questions now.
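As a concrete recap of that walkthrough, here is a hedged sketch of what the import and predict calls look like in SQL. The directory, table, and column names are illustrative, and parameters such as num_passthru_cols and the OVER clause should be double-checked against the PREDICT_TENSORFLOW documentation for your Vertica version.

```sql
-- Step 4: import the frozen TensorFlow model; the directory name
-- (here 'boston_housing_net') becomes the model name inside Vertica.
SELECT IMPORT_MODELS('/home/dbadmin/models/boston_housing_net'
                     USING PARAMETERS category='TENSORFLOW');

-- Step 5: predict on a SQL table whose columns match the model's inputs;
-- id and value are simply the names chosen for the output columns.
SELECT PREDICT_TENSORFLOW(*
                          USING PARAMETERS model_name='boston_housing_net',
                                           num_passthru_cols=1)
       OVER (PARTITION BEST)
       AS (id, value)
FROM boston_housing_data;
```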

End-to-End Security in Vertica


 

>> Paige: Hello everybody and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled End-to-End Security in Vertica. I'm Paige Roberts, Open Source Relations Manager at Vertica. I'll be your host for this session. Joining me is Vertica Software Engineers, Fenic Fawkes and Chris Morris. Before we begin, I encourage you to submit your questions or comments during the virtual session. You don't have to wait until the end. Just type your question or comment in the question box below the slide as it occurs to you and click submit. There will be a Q&A session at the end of the presentation and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Also, you can visit Vertica forums to post your questions there after the session. Our team is planning to join the forums to keep the conversation going, so it'll be just like being at a conference and talking to the engineers after the presentation. Also, a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slide. And before you ask, yes, this whole session is being recorded and it will be available to view on-demand this week. We'll send you a notification as soon as it's ready. I think we're ready to get started. Over to you, Fen. >> Fenic: Hi, welcome everyone. My name is Fen. My pronouns are fae/faer and Chris will be presenting the second half, and his pronouns are he/him. So to get started, let's kind of go over what the goals of this presentation are. First off, no deployment is the same. So we can't give you an exact, like, here's the right way to secure Vertica because how it is to set up a deployment is a factor. But the biggest one is, what is your threat model? So, if you don't know what a threat model is, let's take an example. We're all working from home because of the coronavirus and that introduces certain new risks. Our source code is on our laptops at home, that kind of thing. But really our threat model isn't that people will read our code and copy it, like, over our shoulders. So we've encrypted our hard disks and that kind of thing to make sure that no one can get them. So basically, what we're going to give you are building blocks and you can pick and choose the pieces that you need to secure your Vertica deployment. We hope that this gives you a good foundation for how to secure Vertica. And now, what we're going to talk about. So we're going to start off by going over encryption, just how to secure your data from attackers. And then authentication, which is kind of how to log in. Identity, which is who are you? Authorization, which is now that we know who you are, what can you do? Delegation is about how Vertica talks to other systems. And then auditing and monitoring. So, how do you protect your data in transit? Vertica makes a lot of network connections. Here are the important ones basically. There are clients talk to Vertica cluster. Vertica cluster talks to itself. And it can also talk to other Vertica clusters and it can make connections to a bunch of external services. So first off, let's talk about client-server TLS. Securing data between, this is how you secure data between Vertica and clients. It prevents an attacker from sniffing network traffic and say, picking out sensitive data. Clients have a way to configure how strict the authentication is of the server cert. 
It's called the Client SSLMode and we'll talk about this more in a bit but authentication methods can disable non-TLS connections, which is a pretty cool feature. Okay, so Vertica also makes a lot of network connections within itself. So if Vertica is running behind a strict firewall, you have really good network, both physical and software security, then it's probably not super important that you encrypt all traffic between nodes. But if you're on a public cloud, you can set up AWS' firewall to prevent connections, but if there's a vulnerability in that, then your data's all totally vulnerable. So it's a good idea to set up inter-node encryption in less secure situations. Next, import/export is a good way to move data between clusters. So for instance, say you have an on-premises cluster and you're looking to move to AWS. Import/Export is a great way to move your data from your on-prem cluster to AWS, but that means that the data is going over the open internet. And that is another case where an attacker could try to sniff network traffic and pull out credit card numbers or whatever you have stored in Vertica that's sensitive. So it's a good idea to secure data in that case. And then we also connect to a lot of external services. Kafka, Hadoop, S3 are three of them. Voltage SecureData, which we'll talk about more in a sec, is another. And because of how each service deals with authentication, how to configure your authentication to them differs. So, see our docs. And then I'd like to talk a little bit about where we're going next. Our main goal at this point is making Vertica easier to use. Our first objective was security, was to make sure everything could be secure, so we built relatively low-level building blocks. Now that we've done that, we can identify common use cases and automate them. And that's where our attention is going. Okay, so we've talked about how to secure your data over the network, but what about when it's on disk? There are several different encryption approaches, each depends on kind of what your use case is. RAID controllers and disk encryption are mostly for on-prem clusters and they protect against media theft. They're invisible to Vertica. S3 and GCP are kind of the equivalent in the cloud. They also invisible to Vertica. And then there's field-level encryption, which we accomplish using Voltage SecureData, which is format-preserving encryption. So how does Voltage work? Well, it, the, yeah. It encrypts values to things that look like the same format. So for instance, you can see date of birth encrypted to something that looks like a date of birth but it is not in fact the same thing. You could do cool stuff like with a credit card number, you can encrypt only the first 12 digits, allowing the user to, you know, validate the last four. The benefits of format-preserving encryption are that it doesn't increase database size, you don't need to alter your schema or anything. And because of referential integrity, it means that you can do analytics without unencrypting the data. So again, a little diagram of how you could work Voltage into your use case. And you could even work with Vertica's row and column access policies, which Chris will talk about a bit later, for even more customized access control. Depending on your use case and your Voltage integration. We are enhancing our Voltage integration in several ways in 10.0 and if you're interested in Voltage, you can go see their virtual BDC talk. 
And then again, talking about roadmap a little, we're working on in-database encryption at rest. What this means is kind of a Vertica solution to encryption at rest that doesn't depend on the platform that you're running on. Encryption at rest is hard. (laughs) Encrypting, say, 10 petabytes of data is a lot of work. And once again, the theme of this talk is everyone has a different key management strategy, a different threat model, so we're working on designing a solution that fits everyone. If you're interested, we'd love to hear from you. Contact us on the Vertica forums. All right, next up we're going to talk a little bit about access control. So first off is how do I prove who I am? How do I log in? So, Vertica has several authentication methods. Which one is best depends on your deployment size/use case. Again, theme of this talk is what you should use depends on your use case. You could order authentication methods by priority and origin. So for instance, you can only allow connections from within your internal network or you can enforce TLS on connections from external networks but relax that for connections from your internal network. That kind of thing. So we have a bunch of built-in authentication methods. They're all password-based. User profiles allow you to set complexity requirements of passwords and you can even reject non-TLS connections, say, or reject certain kinds of connections. Should only be used by small deployments because you probably have an LDAP server, where you manage users if you're a larger deployment and rather than duplicating passwords and users all in LDAP, you should use LDAP Auth, where Vertica still has to keep track of users, but each user can then use LDAP authentication. So Vertica doesn't store the password at all. The client gives Vertica a username and password and Vertica then asks the LDAP server is this a correct username or password. And the benefits of this are, well, manyfold, but if, say, you delete a user from LDAP, you don't need to remember to also delete their Vertica credentials. You can just, they won't be able to log in anymore because they're not in LDAP anymore. If you like LDAP but you want something a little bit more secure, Kerberos is a good idea. So similar to LDAP, Vertica doesn't keep track of who's allowed to log in, it just keeps track of the Kerberos credentials and it even, Vertica never touches the user's password. Users log in to Kerberos and then they pass Vertica a ticket that says "I can log in." It is more complex to set up, so if you're just getting started with security, LDAP is probably a better option. But Kerberos is, again, a little bit more secure. If you're looking for something that, you know, works well for applications, certificate auth is probably what you want. Rather than hardcoding a password, or storing a password in a script that you use to run an application, you can instead use a certificate. So, if you ever need to change it, you can just replace the certificate on disk and the next time the application starts, it just picks that up and logs in. Yeah. And then, multi-factor auth is a feature request we've gotten in the past and it's not built-in to Vertica but you can do it using Kerberos. So, security is a whole application concern and fitting MFA into your workflow is all about fitting it in at the right layer. And we believe that that layer is above Vertica. If you're interested in more about how MFA works and how to set it up, we wrote a blog on how to do it. 
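As a concrete illustration of the LDAP option Fen described, here is a hedged sketch of creating and granting an LDAP authentication record in Vertica SQL, restricted to TLS connections so that non-TLS logins are rejected. The host, base DN, and user name are made up, and the exact parameter names should be verified against the authentication documentation for your release.

```sql
-- Define an LDAP authentication method that only applies to TLS connections
-- from any address, so plaintext logins never reach the LDAP check.
CREATE AUTHENTICATION v_ldap METHOD 'ldap' HOST TLS '0.0.0.0/0';

-- Point it at the LDAP server; Vertica forwards the credentials
-- instead of storing passwords itself.
ALTER AUTHENTICATION v_ldap SET
    host = 'ldap://ldap.example.com',
    basedn = 'dc=example,dc=com',
    binddn_prefix = 'cn=',
    binddn_suffix = ',ou=users,dc=example,dc=com';

-- Assign the method to a user (it could also go to a role or to PUBLIC).
GRANT AUTHENTICATION v_ldap TO alice;
```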
And now, over to Chris, for more on identity and authorization. >> Chris: Thanks, Fen. Hi everyone, I'm Chris. So, we're a Vertica user and we've connected to Vertica but once we're in the database, who are we? What are we? So in Vertica, the answer to that questions is principals. Users and roles, which are like groups in other systems. Since roles can be enabled and disabled at will and multiple roles can be active, they're a flexible way to use only the privileges you need in the moment. For example here, you've got Alice who has Dbadmin as a role and those are some elevated privileges. She probably doesn't want them active all the time, so she can set the role and add them to her identity set. All of this information is stored in the catalog, which is basically Vertica's metadata storage. How do we manage these principals? Well, depends on your use case, right? So, if you're a small organization or maybe only some people or services need Vertica access, the solution is just to manage it with Vertica. You can see some commands here that will let you do that. But what if we're a big organization and we want Vertica to reflect what's in our centralized user management system? Sort of a similar motivating use case for LDAP authentication, right? We want to avoid duplication hassles, we just want to centralize our management. In that case, we can use Vertica's LDAPLink feature. So with LDAPLink, principals are mirrored from LDAP. They're synced in a considerable fashion from the LDAP into Vertica's catalog. What this does is it manages creating and dropping users and roles for you and then mapping the users to the roles. Once that's done, you can do any Vertica-specific configuration on the Vertica side. It's important to note that principals created in Vertica this way, support multiple forms of authentication, not just LDAP. This is a separate feature from LDAP authentication and if you created a user via LDAPLink, you could have them use a different form of authentication, Kerberos, for example. Up to you. Now of course this kind of system is pretty mission-critical, right? You want to make sure you get the right roles and the right users and the right mappings in Vertica. So you probably want to test it. And for that, we've got new and improved dry run functionality, from 9.3.1. And what this feature offers you is new metafunctions that let you test various parameters without breaking your real LDAPLink configuration. So you can mess around with parameters and the configuration as much as you want and you can be sure that all of that is strictly isolated from the live system. Everything's separated. And when you use this, you get some really nice output through a Data Collector table. You can see some example output here. It runs the same logic as the real LDAPLink and provides detailed information about what would happen. You can check the documentation for specifics. All right, so we've connected to the database, we know who we are, but now, what can we do? So for any given action, you want to control who can do that, right? So what's the question you have to ask? Sometimes the question is just who are you? It's a simple yes or no question. For example, if I want to upgrade a user, the question I have to ask is, am I the superuser? If I'm the superuser, I can do it, if I'm not, I can't. But sometimes the actions are more complex and the question you have to ask is more complex. Does the principal have the required privileges? 
If you're familiar with SQL privileges, there are things like SELECT, INSERT, and Vertica has a few of their own, but the key thing here is that an action can require specific and maybe even multiple privileges on multiple objects. So for example, when selecting from a table, you need USAGE on the schema and SELECT on the table. And there's some other examples here. So where do these privileges come from? Well, if the action requires a privilege, these are the only places privileges can come from. The first source is implicit privileges, which could come from owning the object or from special roles, which we'll talk about in a sec. Explicit privileges, it's basically a SQL standard GRANT system. So you can grant privileges to users or roles and optionally, those users and roles could grant them downstream. Discretionary access control. So those are explicit and they come from the user and the active roles. So the whole identity set. And then we've got Vertica-specific inherited privileges and those come from the schema, and we'll talk about that in a sec as well. So these are the special roles in Vertica. First role, DBADMIN. This isn't the Dbadmin user, it's a role. And it has specific elevated privileges. You can check the documentation for those exact privileges but it's less than the superuser. The PSEUDOSUPERUSER can do anything the real superuser can do and you can grant this role to whomever. The DBDUSER is actually a role, can run Database Designer functions. SYSMONITOR gives you some elevated auditing permissions and we'll talk about that later as well. And finally, PUBLIC is a role that everyone has all the time so anything you want to be allowed for everyone, attach to PUBLIC. Imagine this scenario. I've got a really big schema with lots of relations. Those relations might be changing all the time. But for each principal that uses this schema, I want the privileges for all the tables and views there to be roughly the same. Even though the tables and views come and go, for example, an analyst might need full access to all of them no matter how many there are or what there are at any given time. So to manage this, my first approach I could use is remember to run grants every time a new table or view is created. And not just you but everyone using this schema. Not only is it a pain, it's hard to enforce. The second approach is to use schema-inherited privileges. So in Vertica, schema grants can include relational privileges. For example, SELECT or INSERT, which normally don't mean anything for a schema, but they do for a table. If a relation's marked as inheriting, then the schema grants to a principal, for example, salespeople, also apply to the relation. And you can see on the diagram here how the usage applies to the schema and the SELECT technically but in Sales.foo table, SELECT also applies. So now, instead of lots of GRANT statements for multiple object owners, we only have to run one ALTER SCHEMA statement and three GRANT statements and from then on, any time that you grant some privileges or revoke privileges to or on the schema, to or from a principal, all your new tables and views will get them automatically. So it's dynamically calculated. Now of course, setting it up securely, is that you want to know what's happened here and what's going on. So to monitor the privileges, there are three system tables which you want to look at. The first is grants, which will show you privileges that are active for you. 
That is your user and active roles and theirs and so on down the chain. Grants will show you the explicit privileges and inherited_privileges will show you the inherited ones. And then there's one more inheriting_objects which will show all tables and views which inherit privileges so that's useful more for not seeing privileges themselves but managing inherited privileges in general. And finally, how do you see all privileges from all these sources, right? In one go, you want to see them together? Well, there's a metafunction added in 9.3.1. Get_privileges_description which will, given an object, it will sum up all the privileges for a current user on that object. I'll refer you to the documentation for usage and supported types. Now, the problem with SELECT. SELECT let's you see everything or nothing. You can either read the table or you can't. But what if you want some principals to see subset or a transformed version of the data. So for example, I have a table with personnel data and different principals, as you can see here, need different access levels to sensitive information. Social security numbers. Well, one thing I could do is I could make a view for each principal. But I could also use access policies and access policies can do this without introducing any new objects or dependencies. It centralizes your restriction logic and makes it easier to manage. So what do access policies do? Well, we've got row and column access policies. Rows will hide and column access policies will transform data in the row or column, depending on who's doing the SELECTing. So it transforms the data, as we saw on the previous slide, to look as requested. Now, if access policies let you see the raw data, you can still modify the data. And the implication of this is that when you're crafting access policies, you should only use them to refine access for principals that need read-only access. That is, if you want a principal to be able to modify it, the access policies you craft should let through the raw data for that principal. So in our previous example, the loader service should be able to see every row and it should be able to see untransformed data in every column. And as long as that's true, then they can continue to load into this table. All of this is of course monitorable by a system table, in this case access_policy. Check the docs for more information on how to implement these. All right, that's it for access control. Now on to delegation and impersonation. So what's the question here? Well, the question is who is Vertica? And that might seem like a silly question, but here's what I mean by that. When Vertica's connecting to a downstream service, for example, cloud storage, how should Vertica identify itself? Well, most of the time, we do the permissions check ourselves and then we connect as Vertica, like in this diagram here. But sometimes we can do better. And instead of connecting as Vertica, we connect with some kind of upstream user identity. And when we do that, we let the service decide who can do what, so Vertica isn't the only line of defense. And in addition to the defense in depth benefit, there are also benefits for auditing because the external system can see who is really doing something. It's no longer just Vertica showing up in that external service's logs, it's somebody like Alice or Bob, trying to do something. One system where this comes into play is with Voltage SecureData. So, let's look at a couple use cases. 
The first one, I'm just encrypting for compliance or anti-theft reasons. In this case, I'll just use one global identity to encrypt or decrypt with Voltage. But imagine another use case: I want to control which users can decrypt which data. Now I'm using Voltage for access control. So in this case, we want to delegate. The solution here is, on the Voltage side, give Voltage users access to appropriate identities, and these identities control encryption for sets of data. A Voltage user can access multiple identities, like groups. Then on the Vertica side, a Vertica user can set their Voltage username and password in a session and Vertica will talk to Voltage as that Voltage user. So in the diagram here, you can see an example of how this is leveraged so that Alice can decrypt something but Bob cannot. Another place the delegation paradigm shows up is with storage. Vertica can store and interact with data on non-local file systems, for example, HDFS or S3. Sometimes Vertica's storing Vertica-managed data there. For example, in Eon mode, you might store your projections in communal storage in S3. But sometimes, Vertica is interacting with external data. This usually maps to a user storage location on the Vertica side, and it might, on the external storage side, be something like Parquet files on Hadoop. In that case, it's not really Vertica's data and we don't want to give Vertica more power than it needs, so let's request the data on behalf of who needs it. Let's say I'm an analyst and I want to copy from or export to Parquet, using my own bucket. It's not Vertica's bucket, it's my data, but I want Vertica to manipulate data in it. So the first option I have is to give Vertica as a whole access to the bucket, and that's problematic because in that case, Vertica becomes kind of an AWS god. Any Vertica user might be able to push or pull data to or from any bucket Vertica can see, any time. So it's not good for the principles of least access and zero trust, and we can do better than that. In the second option, we use an ID and secret key pair for an AWS IAM principal, if you're familiar, that does have access to the bucket. So I might use my own analyst credentials, or I might use credentials for an AWS role that has even fewer privileges than I do, sort of a restricted subset of my privileges. I set that in Vertica at the session level and Vertica will use those credentials for the copy and export commands. And it gives more isolation. Something that's in the works is support for keyless delegation, using assumable IAM roles. So similar benefits to option two here, but also not having to manage keys at the user level. We can do basically the same thing with Hadoop and HDFS, with three different methods. The first option is Kerberos delegation. I think it's the most secure. If access control is your primary concern here, this will give you the tightest access control. The downside is it requires the most configuration outside of Vertica, with Kerberos and HDFS, but with this, you can really determine which Vertica users can talk to which HDFS locations. Then, you've got secure impersonation. If you've got a highly trusted Vertica userbase, or at least some subset of it is, and you're not worried about them doing things wrong, but you do want auditing on the HDFS side, and that's your primary concern, you can use this option. This diagram here gives you a visual overview of how that works, but I'll refer you to the docs for details.
And then finally, option three, this is bringing your own delegation token. It's similar to what we do with AWS. We set something in the session level, so it's very flexible. The user can do it at an ad hoc basis, but it is manual, so that's the third option. Now on to auditing and monitoring. So of course, we want to know, what's happening in our database? It's important in general and important for incident response, of course. So your first stop, to answer this question, should be system tables. And they're a collection of information about events, system state, performance, et cetera. They're SELECT-only tables, but they work in queries as usual. The data is just loaded differently. So there are two types generally. There's the metadata table, which stores persistent information or rather reflects persistent information stored in the catalog, for example, users or schemata. Then there are monitoring tables, which reflect more transient information, like events, system resources. Here you can see an example of output from the resource pool's storage table which, these are actually, despite that it looks like system statistics, they're actually configurable parameters for using that. If you're interested in resource pools, a way to handle users' resource allocation and various principal's resource allocation, again, check that out on the docs. Then of course, there's the followup question, who can see all of this? Well, some system information is sensitive and we should only show it to those who need it. Principal of least privilege, right? So of course the superuser can see everything, but what about non-superusers? How do we give access to people that might need additional information about the system without giving them too much power? One option's SYSMONITOR, as I mentioned before, it's a special role. And this role can always read system tables but not change things like a superuser would be able to. Just reading. And another option is the RESTRICT and RELEASE metafunctions. Those grant and revoke access to from a certain system table set, to and from the PUBLIC role. But the downside of those approaches is that they're inflexible. So they only give you, they're all or nothing. For a specific preset of tables. And you can't really configure it per table. So if you're willing to do a little more setup, then I'd recommend using your own grants and roles. System tables support GRANT and REVOKE statements just like any regular relations. And in that case, I wouldn't even bother with SYSMONITOR or the metafunctions. So to do this, just grant whatever privileges you see fit to roles that you create. Then go ahead and grant those roles to the users that you want. And revoke access to the system tables of your choice from PUBLIC. If you need even finer-grained access than this, you can create views on top of system tables. For example, you can create a view on top of the user system table which only shows the current user's information, uses a built-in function that you can use as part of the view definition. And then, you can actually grant this to PUBLIC, so that each user in Vertica could see their own user's information and never give access to the user system table as a whole, just that view. Now if you're a superuser or if you have direct access to nodes in the cluster, filesystem/OS, et cetera, then you have more ways to see events. Vertica supports various methods of logging. 
You can see a few methods here which are generally outside of running Vertica, you'd interact with them in a different way, with the exception of active events which is a system table. We've also got the data collector. And that sorts events by subjects. So what the data collector does, it extends the logging and system table functionality, by the component, is what it's called in the documentation. And it logs these events and information to rotating files. For example, AnalyzeStatistics is a function that could be of use by users and as a database administrator, you might want to monitor that so you can use the data collector for AnalyzeStatistics. And the files that these create can be exported into a monitoring database. One example of that is with the Management Console Extended Monitoring. So check out their virtual BDC talk. The one on the management console. And that's it for the key points of security in Vertica. Well, many of these slides could spawn a talk on their own, so we encourage you to check out our blog, check out the documentation and the forum for further investigation and collaboration. Hopefully the information we provided today will inform your choices in securing your deployment of Vertica. Thanks for your time today. That concludes our presentation. Now, we're ready for Q&A.
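For anyone who wants to try the access-control pieces from this talk, here is a hedged sketch that pulls a few of them together: a role, schema-inherited privileges, a column access policy, and a restricted view over a system table. The object names are illustrative, the hr role is assumed to already exist, and the exact syntax for inherited privileges and access policies should be confirmed against the documentation for your Vertica version.

```sql
-- A role that users enable only when they need it.
CREATE ROLE analyst;
GRANT analyst TO alice;   -- alice runs SET ROLE analyst; when she needs it

-- Schema-inherited privileges: new tables and views in 'sales'
-- pick up the schema grants automatically.
ALTER SCHEMA sales DEFAULT INCLUDE SCHEMA PRIVILEGES;
GRANT USAGE, SELECT ON SCHEMA sales TO analyst;

-- Column access policy: only principals with the hr role enabled
-- see raw social security numbers; everyone else sees a masked value.
CREATE ACCESS POLICY ON employees FOR COLUMN ssn
    CASE WHEN ENABLED_ROLE('hr') THEN ssn
         ELSE '***-**-' || RIGHT(ssn, 4)
    END
ENABLE;

-- Give everyone a view of their own row in the USERS system table
-- instead of access to the whole table.
CREATE VIEW my_user AS
    SELECT * FROM v_catalog.users WHERE user_name = CURRENT_USER();
GRANT SELECT ON my_user TO PUBLIC;
```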

Keep Data Private: Prepare and Analyze Without Unencrypting With Voltage SecureData for Vertica


 

>> Paige: Hello everybody and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled Keep Data Private Prepare and Analyze Without Unencrypting With Voltage SecureData for Vertica. I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me is Rich Gaston, Global Solutions Architect, Security, Risk, and Government at Voltage. And before we begin, I encourage you to submit your questions or comments during the virtual session, you don't have to wait till the end. Just type your question as it occurs to you, or comment, in the question box below the slide and then click Submit. There'll be a Q&A session at the end of the presentation where we'll try to answer as many of your questions as we're able to get to during the time. Any questions that we don't address we'll do our best to answer offline. Now, if you want, you can visit the Vertica Forum to post your questions there after the session. Now, that's going to take the place of the Developer Lounge, and our engineering team is planning to join the Forum, to keep the conversation going. So as a reminder, you can also maximize your screen by clicking the double arrow button, in the lower-right corner of the slides. That'll allow you to see the slides better. And before you ask, yes, this virtual session is being recorded and it will be available to view on-demand this week. We'll send you a notification as soon as it's ready. All right, let's get started. Over to you, Rich. >> Rich: Hey, thank you very much, Paige, and appreciate the opportunity to discuss this topic with the audience. My name is Rich Gaston and I'm a Global Solutions Architect, within the Micro Focus team, and I work on global Data privacy and protection efforts, for many different organizations, looking to take that journey toward breach defense and regulatory compliance, from platforms ranging from mobile to mainframe, everything in between, cloud, you name it, we're there in terms of our solution sets. Vertica is one of our major partners in this space, and I'm very excited to talk with you today about our solutions on the Vertica platform. First, let's talk a little bit about what you're not going to learn today, and that is, on screen you'll see, just part of the mathematics that goes into, the format-preserving encryption algorithm. We are the originators and authors and patent holders on that algorithm. Came out of research from Stanford University, back in the '90s, and we are very proud, to take that out into the market through the NIST standard process, and license that to others. So we are the originators and maintainers, of both standards and athureader in the industry. We try to make this easy and you don't have to learn any of this tough math. Behind this there are also many other layers of technology. They are part of the security, the platform, such as stateless key management. That's a really complex area, and we make it very simple for you. We have very mature and powerful products in that space, that really make your job quite easy, when you want to implement our technology within Vertica. So today, our goal is to make Data protection easy for you, to be able to understand the basics of Voltage Secure Data, you're going to be learning how the Vertica UDx, can help you get started quickly, and we're going to see some examples of how Vertica plus Voltage Secure Data, are going to be working together, in our customer cases out in the field. 
First, let's take you through a quick introduction to Voltage Secure Data. The business drivers and what's this all about. First of all, we started off with Breach Defense. We see that despite continued investments, in personal perimeter and platform security, Data breaches continue to occur. Voltage Secure Data plus Vertica, provides defense in depth for sensitive Data, and that's a key concept that we're going to be referring to. in the security field defense in depth, is a standard approach to be able to provide, more layers of protection around sensitive assets, such as your Data, and that's exactly what Secure Data is designed to do. Now that we've come through many of these breach examples, and big ticket items, getting the news around breaches and their impact, the business regulators have stepped up, and regulatory compliance, is now a hot topic in Data privacy. Regulations such as GDPR came online in 2018 for the EU. CCPA came online just this year, a couple months ago for California, and is the de-facto standard for the United States now, as organizations are trying to look at, the best practices for providing, regulatory compliance around Data privacy and protection. These gives massive new rights to consumers, but also obligations to organizations, to protect that personal Data. Secure Data Plus Vertica provides, fine grained authorization around sensitive Data, And we're going to show you exactly how that works, within the Vertica platform. At the bottom, you'll see some of the snippets there, of the news articles that just keep racking up, and our goal is to keep you off the news, to keep your company safe, so that you can have the assurance, that even if there is an unintentional, or intentional breach of Data out of the corporation, if it is protected by voltage Secure Data, it will be of no value to those hackers, and then you have no impact, in terms of risk to the organization. What do we mean by defense in depth? Let's take a look first at the encryption types, and the benefits that they provide, and we see our customers implementing, all kinds of different protection mechanisms, within the organization. You could be looking at disk level protection, file system protection, protection on the files themselves. You could protect the entire Database, you could protect our transmissions, as they go from the client to the server via TLS, or other protected tunnels. And then we look at Field-level Encryption, and that's what we're talking about today. That's all the above protections, at the perimeter level at the platform level. Plus, we're giving you granular access control, to your sensitive Data. Our main message is, keep the Data protected for at the earliest possible point, and only access it, when you have a valid business need to do so. That's a really critical aspect as we see Vertica customers, loading terabytes, petabytes of Data, into clusters of Vertica console, Vertica Database being able to give access to that Data, out to a wide variety of end users. We started off with organizations having, four people in an office doing Data science, or analytics, or Data warehousing, or whatever it's called within an organization, and that's now ballooned out, to a new customer coming in and telling us, we're going to have 1000 people accessing it, plus service accounts accessing Vertica, we need to be able to provide fine level access control, and be able to understand what are folks doing with that sensitive Data? And how can we Secure it, the best practices possible. 
In very simple state, voltage protect Data at rest and in motion. The encryption of Data facilitates compliance, and it reduces your risk of breach. So if you take a look at what we mean by feel level, we could take a name, that name might not just be in US ASCII. Here we have a sort of Latin one extended, example of Harold Potter, and we could take a look at the example protected Data. Notice that we're taking a character set approach, to protecting it, meaning, I've got an alphanumeric option here for the format, that I'm applying to that name. That gives me a mix of alpha and numeric, and plus, I've got some of that Latin one extended alphabet in there as well, and that's really controllable by the end customer. They can have this be just US ASCII, they can have it be numbers for numbers, you can have a wide variety, of different protection mechanisms, including ignoring some characters in the alphabet, in case you want to maintain formatting. We've got all the bells and whistles, that you would ever want, to put on top of format preserving encryption, and we continue to add more to that platform, as we go forward. Taking a look at tax ID, there's an example of numbers for numbers, pretty basic, but it gives us the sort of idea, that we can very quickly and easily keep the Data protected, while maintaining the format. No schema changes are going to be required, when you want to protect that Data. If you look at credit card number, really popular example, and the same concept can be applied to tax ID, often the last four digits will be used in a tax ID, to verify someone's identity. That could be on an automated telephone system, it could be a customer service representative, just trying to validate the security of the customer, and we can keep that Data in the clear for that purpose, while protecting the entire string from breach. Dates are another critical area of concern, for a lot of medical use cases. But we're seeing Date of Birth, being included in a lot of Data privacy conversations, and we can protect dates with dates, they're going to be a valid date, and we have some really nifty tools, to maintain offsets between dates. So again, we've got the real depth of capability, within our encryption, that's not just saying, here's a one size fits all approach, GPS location, customer ID, IP address, all of those kinds of Data strings, can be protected by voltage Secure Data within Vertica. Let's take a look at the UDx basics. So what are we doing, when we add Voltage to Vertica? Vertica stays as is in the center. In fact, if you get the Vertical distribution, you're getting the Secure Data UDx onboard, you just need to enable it, and have Secure Data virtual appliance, that's the box there on the middle right. That's what we come in and add to the mix, as we start to be able to add those capabilities to Vertica. On the left hand side, you'll see that your users, your service accounts, your analytics, are still typically doing Select, Update, Insert, Delete, type of functionality within Vertica. And they're going to come into Vertica's access control layer, they're going to also access those services via SQL, and we simply extend SQL for Vertica. So when you add the UDx, you get additional syntax that we can provide, and we're going to show you examples of that. You can also integrate that with concepts, like Views within Vertica. 
So that we can say, let's give a view of the data that shows the data in the clear, using the UDx to decrypt that data, and let's give everybody else access to the raw data, which is protected. Third parties could be brought in, folks like contractors or folks that aren't vetted as closely as a security team might do for internal sensitive data access, and they could be given access to the Vertica cluster without risk of them breaching and going into some area they're not supposed to take a look at. Vertica has excellent access control, down even to the column level, which is phenomenal and really provides you with world-class security around the Vertica solution itself. Secure Data adds another layer of protection, like we're mentioning, so that we can have data protected in use, data protected at rest, and then we can have the ability to share that protected data throughout the organization. And that's really where Secure Data shines: the ability to protect that data on mainframe, on mobile, on open systems, in the cloud, everywhere you want to have that data move to and from Vertica, you can have Secure Data integrated with those endpoints as well. That's an additional solution on top of the Secure Data plus Vertica solution that is bundled together today for sales purposes. But we can also have that conversation with you about those wider Secure Data use cases; we'd be happy to talk to you about that. Secure Data's virtual appliance is a lightweight appliance. It sits on something like eight cores, 16 gigs of RAM, 100 gig of disk or 200 gig of disk, really a lightweight appliance, and you can have one or many. Most customers have four in production, just for redundancy; they don't need them for scale. But we have some customers with 16 or more in production, because they're running such high volumes of transaction load. They're running a lot of web service transactions, and they're running Vertica as well. So we're going to have those virtual appliances co-located around the globe, hooked up to all kinds of systems, like Syslog, LDAP, load balancers; we've got a lot of capability within the appliance to fit into your enterprise IT landscape. So let me get you directly into the meat of what the UDx does. If you're technical and you know SQL, this is probably going to be pretty straightforward to you. You'll see the COPY command, used widely in Vertica to get data into Vertica. So let's try to protect that data when we're ingesting it. Let's grab it from maybe a CSV file and put it straight into Vertica, but protected on the way, and that's what the UDx does. We have Voltage Secure protectors, an added syntax, like I mentioned, to the Vertica SQL. And that allows us to say we're going to protect the customer first name, using a hyper-alphanumeric format as the parameter. That's our internal lingo for a format within Secure Data; this is part of our API, and the API requires very few inputs. The format is the one that you as a developer will be supplying, and you'll have different ones for maybe SSN, you'll have different formats for street address, but you can reuse a lot of your formats across a lot of your PII and PHI data types. Protecting after ingest is also common. So I've got some data that's already been put into a staging area, perhaps I've got a landing zone, a sandbox of some sort, and now I want to be able to move that into a different zone in Vertica, a different area of the schema, and I want to have that data protected.
We can do that with the UPDATE command, and again, you'll notice Voltage Secure protect, nothing too wild there, basically the same syntax. Next, querying protected data. How do we search once I've encrypted all my data? Well, actually, there's a pretty nifty trick to do so. If you want to be able to query on protected data and you have the search string, like a phone number there in this example, simply call Voltage Secure protect on that string. Now you'll have the ciphertext, and you'll be able to search the stored ciphertext. Again, we're just format-preserving encrypting the data, and it's just a string, and we can always compare those strings using standard syntax in SQL. Using views to decrypt data is again a powerful concept in terms of how to make this work within the Vertica landscape when you have a lot of different groups of users. Views are very powerful. Business intelligence tools, for instance, Cognos, Tableau, etc., might be accessing data from Vertica with simple queries. Well, let's point them to a view that does the hard work and uses the Vertica nodes and their horsepower of CPU and RAM to actually run that UDx and do the decryption of the data in use, temporarily in memory, and then throw that away so that it can't be breached. That's a nice way to keep your users active and working and going forward with their data access and data analytics, while also keeping the data secure in the process. And then we might want to export some data and push it out to someone in a cleartext manner. We've got a third party that needs to take the tax ID along with some data to do some processing. All we need to do is call Voltage Secure Access, again, very similar to the protect call, and you pass the format parameter again, and boom, we have decrypted the data and used, again, the Vertica resources of RAM and CPU and horsepower to do the work. All we're doing with the Voltage Secure Data appliance is a real simple little key fetch across a protected tunnel; that's a tiny atomic transaction, it gets done very quickly, and you're good to go. This is it in terms of the UDx: you have a couple of calls and one parameter to pass, everything else is config driven, and really, you're up and running very quickly. We can even do demos and samples of this Vertica UDx using hosted appliances that we put up for pre-sales purposes. So if folks want to get up and get a demo going, we could take that UDx, configure it to point to our appliance sitting on the internet, and within a couple of minutes, we're up and running with some simple use cases. Of course, for on-prem deployment, or deployment in the cloud, you'll want your own appliance in your own crypto district, and you'll have your own security, but it just shows that we can easily connect to any appliance and get this working in a matter of minutes. Let's take a deeper look at the Voltage plus Vertica solution, and we'll describe some of the use cases and the path to success. First of all, your steps to implementing data-centric security in Vertica. I want to note there on the left-hand side: identify sensitive data. How do we do this? I have one customer where they look at me and say, Rich, we know exactly what our sensitive data is, we developed the schema, it's our own app, we have a customer table, we don't need any help with this.
We've got other customers that say, Rich, we have a very complex Database environment, with multiple Databases, multiple schemas, thousands of tables, hundreds of thousands of columns, it's really, really complex help, and we don't know what people have been doing exactly, with some of that Data, We've got various teams that share this resource. There, we do have additional tools, I wanted to give a shout out to another microfocus product, which is called Structured Data Manager. It's a great tool that helps you identify sensitive Data, with some really amazing technology under the hood, that can go into a Vertica repository, scan those tables, take a sample of rows or a full table scan, and give you back some really good reports on, we think this is sensitive, let's go confirm it, and move forward with Data protection. So if you need help on that, we've got the tools to do it. Once you identify that sensitive Data, you're going to want to understand, your Data flows and your use cases. Take a look at what analytics you're doing today. What analytics do you want to do, on sensitive Data in the future? Let's start designing our analytics, to work with sensitive Data, and there's some tips and tricks that we can provide, to help you mitigate, any kind of concerns around performance, or any kind of concerns around rewriting your SQL. As you've noted, you can just simply insert our SQL additions, into your code and you're off and running. You want to install and configure the Udx, and secure Data software plants. Well, the UDx is pretty darn simple. The documentation on Vertica is publicly available, you could see how that works, and what you need to configure it, one file here, and you're ready to go. So that's pretty straightforward to process, either grant some access to the Udx, and that's really up to the customer, because there are many different ways, to handle access control in Vertica, we're going to be flexible to fit within your model, of access control and adding the UDx to your mix. Each customer is a little different there, so you might want to talk with us a little bit about, the best practices for your use cases. But in general, that's going to be up and running in just a minute. The security software plants, hardened Linux appliance today, sits on-prem or in the cloud. And you can deploy that. I've seen it done in 15 minutes, but that's what the real tech you had, access to being able to generate a search, and do all this so that, your being able to set the firewall and all the DNS entries, the basically blocking and tackling of a software appliance, you get that done, corporations can take care of that, in just a couple of weeks, they get it all done, because they have wait waiting on other teams, but the software plants are really fast to get stood up, and they're very simple to administer, with our web based GUI. Then finally, you're going to implement your UDx use cases. Once the software appliance is up and running, we can set authentication methods, we could set up the format that you're going to use in Vertica, and then those two start talking together. And it should be going in dev and test in about half a day, and then you're running toward production, in just a matter of days, in most cases. We've got other customers that say, Hey, this is going to be a bigger migration project for us. We might want to split this up into chunks. 
Let's do the really sensitive and scary data, like tax IDs, first, as our sort of toe-in-the-water approach, and then we'll come back and protect other data elements. That's one way to slice and dice and implement your solution in a planned manner. Another way is schema based: let's take a look at this section of the schema and implement protection on these data elements, then take a look at a different schema and repeat the process, so you can iteratively move forward with your deployment. So what's the added value when you add Vertica plus Voltage? I want to highlight this distinction because Vertica contains world-class security controls around the database. I'm an old-time DBA from a different product, one that competed against Vertica in the past, and I'm very aware of the granular access controls provided within various platforms. Vertica would rank at the very top of the list in terms of being able to give me very tight control and a lot of different access methods, being able to protect the data in a lot of different use cases. So Vertica can handle a lot of your data protection needs right out of the box. Voltage SecureData, as we keep mentioning, adds that defense in depth, and it's going to enable those enterprise-wide use cases as well. So first off, I mentioned the standard FF1 — that is format-preserving encryption. We're the authors of it, we continue to maintain it, and we want to emphasize that customers really ought to be very, very careful to choose a NIST standard when implementing any kind of encryption within the organization. AES was one of the first hallmark, benchmark encryption algorithms, and in 2016, we were added to that mix as FF1. If you search NIST and Voltage Security, you'll see us right there as the author of the standard, along with all the processes that went into that approval. We have centralized policy for key management, authentication, audit, and compliance. We can now see that Vertica selected or fetched a key to protect some data at this date and time; we can track that and give you audit and compliance reporting against that data. You can move protected data into and out of Vertica. So if we ingest via Kafka, or via NiFi and Kafka, or ingest on StreamSets — there are a variety of different ingestion and streaming methods that can get data into Vertica — we can integrate SecureData with all of those components. We're very well suited to integrate with any Hadoop technology or any big data technology, as we have APIs in a variety of languages, bitnesses, and platforms. So we've got all of that out of the box, ready to go for you if you need it. When you're moving data out of Vertica, you might move it into an open-systems platform or to the cloud; we can operate and do the decryption there, and you're going to get the same plaintext back. And if you protect data over in the cloud and move it into Vertica, you're going to be able to decrypt it in Vertica. That's our cross-platform promise. We've been delivering on that for many, many years, and we now have many, many endpoints that do it in production for the world's largest organizations. We're going to preserve your data format and referential integrity.
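As the next passage spells out, the encryption is deterministic: the same plaintext always yields the same ciphertext under the same format and key, which is what keeps joins and group-bys working on protected columns without any decryption. A rough illustration, with made-up table and column names:

```sql
-- Both sides of the join hold ciphertext, yet the join still matches,
-- because equal plaintext values were protected to equal ciphertext values.
SELECT c.account_id, SUM(t.amount) AS total_spend
  FROM customer     c
  JOIN transactions t ON c.tax_id = t.tax_id   -- ciphertext = ciphertext
 GROUP BY c.account_id;
```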
So if I protect my Social Security number today, and I protect another batch of data tomorrow, that same ciphertext will be generated. When I put that into Vertica, I can have absolute referential integrity on that data, which allows analytics to occur without even decrypting the data in many cases. And we have decrypt access for authorized users only, with the ability to add LDAP authentication and authorization for UDx users. So you can really have a number of different approaches and flavors of how you implement Voltage within Vertica, but what you're getting is the additional ability to have confidence that we've got the data protected at rest — even if I have a DBA who's not vetted, or someone new, or someone from a third party I don't know much about, being granted DBA-level privileges. They can select star all day long, and they're going to get ciphertext; they're going to have nothing of any value. And if they want to use the UDx to decrypt it, they're going to be tracked and traced in their use of it. So it allows us to have that control and an additional layer of security on your sensitive data. This may be required by regulatory agencies, and it seems that compliance audits get more and more strict every year. GDPR was kind of funny, because they said in 2016, hey, this is coming; they said in 2018, it's here; and now they're saying in 2020, hey, we're serious about this, and the fines are mounting. Let me give you some examples to help you understand that these regulations are real, the fines are real, and your reputational damage can be significant if you were to be in breach of a regulatory compliance requirement. We're finding so many different use cases now popping up around regional protection of data: I need to protect this data so that it cannot go offshore; I need to protect this data so that people from another region cannot see it. That's all capability that we have within SecureData and that we can add to Vertica. We have broad platform support, and I mentioned NiFi and Kafka — those would be on the left-hand side, as we start to ingest data from applications into Vertica. We can take landing-zone approaches, where we provide some automated scripting at the OS level to protect ETL batch transactions coming in. We can protect within the Vertica UDx, as I mentioned, with the copy command, directly inside Vertica. Everything inside that dotted line is the Vertica plus Voltage SecureData combo that's sold together as a single package. Additionally, we'd love to talk with you about the pieces outside the dotted box, because we have dozens and dozens of endpoints that can protect and access data on many different platforms. And this is where you really start to leverage some of the extensive power of SecureData: to go cross-platform, to handle your web-based apps, to handle apps in the cloud, and to handle all of this at scale, with hundreds of thousands of format-preserving encryption transactions per second. That may not sound like much, but when you take a look at the algorithm, at what we're doing on the mathematics side, at everything that goes into that transaction — to me, that's an amazing accomplishment, that we're reaching those kinds of levels of scale. And with Vertica, it scales horizontally: the more nodes you add, the more power you get, and the more throughput you're going to get from Voltage SecureData.
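Here's a sketch of the copy-command pattern just mentioned: protect a column as it is loaded, so the plaintext never lands in the table. The table, file path, and format name are hypothetical, and the FILLER/AS construction is standard Vertica COPY syntax for computed columns — this assumes, as the talk indicates, that the UDx can be called from a COPY expression.

```sql
-- Load a CSV, protecting the tax ID in flight; only ciphertext is stored
COPY customer (
    account_id,
    tax_id_raw FILLER VARCHAR(11),
    tax_id AS VoltageSecureProtect(tax_id_raw USING PARAMETERS format='ssn')
)
FROM '/data/customers.csv' DELIMITER ',';
```

The same idea applies whether the rows arrive by batch file, Kafka, or NiFi: protect at the edge of Vertica, and everything downstream only ever sees ciphertext.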
I want to highlight the next steps in how we can continue to move forward. Our SecureData team is available to you to talk about the landscape, your use cases, and your data. We really love that we've got so many different organizations out there using SecureData in so many different and unique ways. We have vehicle manufacturers who are protecting not just the VIN, not just their customer data, but in fact the sensor data from the vehicles, which is sent over the network down to the home base every 15 minutes for every vehicle on the road — and every vehicle this customer of ours has sold since 2017 has included that capability. So now we're talking about additional millions and millions of units coming online as those cars are sold, distributed, and used by customers. That sensor data is critical to the customer, and they cannot let it be exfiltrated in the clear, so they protect it with SecureData. We have a great track record of meeting a variety of different, unique requirements — whether it's IoT, web-based apps, e-commerce, healthcare, all kinds of different industries — and we would love to help move the conversation forward. We find that it's really a three-party discussion: the customer, SecureData experts in some cases, and the Vertica team. We have great enablement within the Vertica team to be able to explain and present our SecureData solution to you, but we also have the ability to bring other experts in, to take the conversation to a broader perspective of how you can protect your data across all your platforms, not just in Vertica. I want to give a shout-out to our friends at Vertica Academy. They're building out great demo and training facilities to help you learn more about these UDxs and how they're implemented. The Academy is a terrific reference and resource for your teams to learn more about the solution in a self-guided way, and then we'd love to have your feedback. How can we help you more? What topics would you like to learn more about? How can we look to the future of protecting unstructured data? How can we look to the future of protecting data at scale? What requirements do we need to be meeting? Help us through the learning process and, through feedback to the team, to get better, and then we'll help you deliver more solutions out to those endpoints and protect that data, so that we're not having data breaches and we're not having regulatory compliance concerns. And then lastly, learn more about the UDx. As I mentioned, all of our content there is online and available to the public. At vertica.com/secureData you'll be able to walk through the basics of the UDx: you'll see how simple it is to set up, what the UDx syntax looks like, and how to grant access to it, and then you'll start to figure out, hey, how can I put this into a POC in my own environment? Like I mentioned before, we have a publicly available hosted appliance for demo purposes that we can make available to you if you want to POC this. Reach out to us, let's get a conversation going, and we'll get you the address, get you some instructions, and have a quick enablement session.
We really want to make this accessible to you and help demystify the concept of encryption, because when you see it as a developer, and you start to get your hands on it and put it to use, you can very quickly see: huh, I could use this in a variety of different cases, and I could use this to protect my data without impacting my analytics. Those are some of the really big concerns that folks have, and once we get through that learning process and play around with it in a POC, we can start to put it into practice in production and say, with confidence, we're going to move forward toward data encryption and have a very good result at the end of the day. This is one of the things I find really interesting with customers. Their biggest stress is not the timeframe or the resources; it's really, this is my data, I have been working on collecting this data and making it available in a very high-quality way for many years, this is my job and I'm responsible for this data — and now you're telling me you're going to encrypt that data? It makes me nervous. And that's common; everybody feels that. So we want to have that conversation, and that sort of trial-and-error process, to say, hey, let's get your feet wet in a sandbox environment and see how you like it. Let's then take it into analytics and look at how we can make a quick 1.0 release, and then look at future expansions, where we start adding Kafka on the ingest side, or we start sending data off into other machine learning and analytics platforms that we might want to use outside of Vertica for certain purposes in certain industries. Let's take a look at those use cases together, and through that journey we can really chart a path toward the future, where we can help you protect that data, at rest and in use, and keep you safe from both the hackers and the regulators — and that, I think, at the end of the day, is really what it's all about in terms of protecting our data within Vertica. We're going to have a couple of minutes for Q&A, and we'd encourage you to ask any questions here; we'd also love to follow up with you about any questions you might have about Vertica plus Voltage SecureData. Thank you very much for your time today.

Published Date : Mar 30 2020


The Next-Generation Data Underlying Architecture


 

>> Paige: Hello, everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled Vertica Next-Generation Architecture. I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me is Vertica Chief Architect, Chuck Bear. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait — just type your question or comment in the question box below the slides and click submit. So as you think of it, go ahead and type it in. There'll be a Q&A session at the end of the presentation, where we'll answer as many questions as we're able to during the time. Any questions that we don't get a chance to address, we'll do our best to answer offline. Or alternatively, you can visit the Vertica forums to post your questions there after the session. Our engineering team is planning to join the forum and keep the conversation going, so it's sort of like the developers' lounge would be at a live conference — it gives you a chance to talk to our engineering team. Also, as a reminder, you can maximize your screen by clicking the double-arrow button in the lower right corner of the slide. And before you ask: yes, this virtual session is being recorded, and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. Okay, now let's get started. Over to you, Chuck. >> Chuck: Thanks for the introduction, Paige. Vertica's vision is to help customers get value from structured data. This vision is simple: it doesn't matter what vertical the customer is in — they're all analytics companies — and it doesn't matter what the customer's environment is, as data is generated everywhere. We also can't do this alone; we know that you need other tools and people to build a complete solution. Our database is key to delivering on the vision, because we need a database that scales. When you start a new database company, you aren't going to win against 30-year-old products on features. But from day one, we had something else: an architecture built for analytics performance. This architecture was inspired by the C-Store project, combining the best design ideas from academics and industry veterans like Dr. Mike Stonebraker. Our storage is optimized for performance, and we use many computers in parallel. After over 10 years of refinement against various customer workloads, much of the design has held up, and serendipitously, the fact that we don't store in-place updates set Vertica up for success in the cloud as well. These days, there are other tools that embody some of these design ideas, but we have other strengths that are more important than the storage format: we're the only good analytics database that runs both on premise and in the cloud, giving customers the option to run their workloads in the most convenient and economical environment, and we're a full data management solution, not just a query tool. Unlike some other choices, ours comes with integration with the SQL ecosystem and full professional support. We organize our product roadmap into four key pillars, plus the cross-cutting concerns of open integration and performance and scale. We have big plans to strengthen Vertica while staying true to our core.
This presentation is primarily about the separation pillar, and about performance and scale. I'll cover our plans for Eon, our data management architecture, smart analytic clusters, our fifth-generation query executor, and our data storage layer. Let's start with how Vertica manages data. One of the central design points for Vertica was shared nothing — a design that didn't rely on dedicated shared-disk hardware. The quote here is how Mike put it politely, but around the Vertica office, shared disk was "over Mike's dead body." And we did get some early field experience with shared disk — customers will, in fact, run on anything if you let them. There were misconfigurations that required certified experts, and obscure bugs existed. Another thing about the shared-nothing design for commodity hardware, though — and this was in the papers — is that all the data management features, like fault tolerance, backup, and elasticity, have to be done in software. And no matter how much you do, procuring, configuring, and maintaining the machines with disks is harder. The software configuration process to add more servers may be simple, but capacity planning, racking, and stacking is not. Then the original allure of shared storage returned — this time, though, the complexity and economics are different. It's cheaper: you can provision storage with a few clicks and only pay for what you need. It expands, it contracts, and it moves the maintenance of the storage close to a team that's good at it. But there's a key difference: it's an object store, and object stores don't support the APIs and access patterns used by most database software. So another Vertica visionary, Ben, set out to exploit Vertica's storage organization, which turns out to be a natural fit for modern cloud shared storage. Because Vertica data files are written once and not updated, they match the object storage model perfectly. And so today we have Eon. Eon uses shared storage to hold Vertica data, with local disk depots that act as caches, ensuring that we can get the performance our customers have come to expect. Essentially, Eon and Enterprise behave similarly, but we get the benefit of flexible storage. Today, Eon has the features our customers expect — it's been developed and tuned for years — and we have successful customers such as Redpharma. If you'd like to know more about how Eon has helped them succeed in the Amazon cloud, I highly suggest reading their case study, which you can find on vertica.com. Eon provides high availability and flexible scaling; sometimes on-premise customers with local disks get a little jealous of how recovery and sub-clusters work in Eon. Eon can also operate on premise, particularly on Pure Storage. But Enterprise also has strengths, the most obvious being that you don't need shared storage to run it. So naturally, our vision is to converge the two modes back into a single Vertica — a Vertica that runs any combination of local disks and shared storage, with full flexibility and portability. This is easy to say, but over the next releases, here's what we'll do. First, we recognize that the query executor, optimizer, client drivers, and so on are already the same; just the transaction handling and data management are different. But there's already more going on: we have peer-to-peer depot operations and other internode transfers. And Enterprise also has a network — we could just get files from remote nodes over that network, essentially mimicking the behavior and benefits of shared storage with a layer of software.
The only difference at the end of it will be which storage holds the master copy. In Enterprise, the nodes can't drop the files, because they hold the master copy; whereas in Eon they can be evicted, because the local copy is just a cache — the master is in shared storage. And in keeping with Vertica's current support for multiple storage locations, we can intermix these approaches at the table level. Getting there is a journey, and we've already taken the first steps. One of the interesting design ideas of the C-Store paper is that redundant copies don't have to have the same physical organization: different copies can be optimized for different queries, sorted in different ways. Of course, Mike also said to keep the recovery system simple, because it's hard to debug — whenever the recovery system is being used, it's always in a high-pressure situation. These turn out to be contradictory, and the latter idea was the better one. Node-down performance suffers if you don't keep the storage the same, recovery is harder if you have to reorganize data in the process, and even query optimization is more complicated. So over the past couple of releases, we got rid of non-identical buddies. But the storage files can still diverge at the file level, because tuple mover operations aren't synchronized; the same record can end up in different files on different nodes. The next step in our journey is to make sure both copies are identical. This will help with backup and restore as well, because the second copy doesn't need to be backed up — or, if it is backed up, it appears identical to the deduplication that's present in most backup systems. Simultaneously, we're improving the Vertica networking service to support this new access pattern. In conjunction with identical storage files, we will converge to a recovery system where newly started nodes can process queries immediately, by retrieving the data they need over the network from the redundant copies, as they do in Eon today — with even higher performance. The final step, then, is to unify the catalog and transaction model. Related concepts such as segment and shard, local catalog and shard catalog, will be coalesced, as they really represented the same concepts all along, just in different modes. In the catalog, we'll make slight changes to the definition of a projection, which represents the physical storage organization. The new definition simplifies segmentation and introduces valuable granularities of sharding, to support evolution over time, and offers a straightforward migration path for both Eon and Enterprise. There's a lot more to our Eon story than just the architectural roadmap. If you missed yesterday's Vertica in Eon Mode presentation about supported cloud and on-premise storage options, replays are available. And be sure to catch the upcoming presentation on sizing and configuring Vertica in Eon mode and beyond. As we've seen with Eon, Vertica can separate data storage from the compute nodes, allowing machines to quickly fill in for each other and rebuild fault tolerance. But separating compute and storage is used for much, much more. We now offer powerful, flexible ways for Vertica to add servers and increase access to the data. In Vertica nine, this feature is called sub-clusters. It allows computing capacity to be added quickly and incrementally, and it isolates workloads from each other.
If your exploratory analytics team needs direct access to the source data, needs a lot of machines — and not the same number all the time — and you don't 100% trust the kinds of queries and user-defined functions they might be using, sub-clusters are the solution. There's much more extensive information available in our other presentations, but I'd like to point out the highlights of our latest sub-cluster best practices. We suggest having a primary sub-cluster; this is the one that runs all the time, if you're loading data around the clock. It should be sized for the ETL workloads, and it also determines the natural shard count. Additional read-oriented secondary sub-clusters can be added for real-time dashboards, reports, and analytics. That way, sub-clusters can be added or deprovisioned without disruption to other users. The sub-cluster features of Vertica 9.3 are working well for customers. Yesterday, The Trade Desk presented their use case for Vertica — over 300,000, across five sub-clusters running in the cloud. If you missed the presentation, check out the replay. But we have plans beyond sub-clusters: we're extending sub-clusters to real clusters. For the Vertica savvy, this means the clusters won't share the same spread ring network. This will provide further isolation, allowing clusters to control their own independent data sets, while replicating all or part of the data from other clusters using a publish-subscribe mechanism. Synchronizing data between clusters is a feature customers want, so they can realize the real business value for themselves. This vision affects our designs for ancillary aspects: how we assign resource pools, security policies, and balance client connections. We will be simplifying our data segmentation strategy, so that when data that originated in different clusters meet, they'll still get fully optimized joins, even if those clusters weren't provisioned with the same number of nodes per shard. Having a broad vision for data management is a key component of Vertica's success, but we also take pride in our execution strategy. When you start a new database from scratch, as we did 15 years ago, you won't compete on features. Our key competitive points were speed and scale of analytics; we set a target of 100x better query performance than traditional databases, with fast loads. Our storage architecture provides a solid foundation on which to build toward these goals. Every query starts with data retrieval: keeping data sorted, organized by column, and compressed, and using adaptive caching, keeps the data retrieval time and I/O to the bare minimum theoretically required. We also keep the data close to where it will be processed, and we cluster the machines to increase throughput. We have partition pruning, a robust optimizer, and we actively use segmentation as part of the physical database design to keep records close to the other relevant records. So that's the solid foundation, but we also need optimal execution strategies and tactics. One execution strategy that we built a long time ago, but that's still a source of pride, is how we process expressions. Databases and other systems with general-purpose expression evaluators represent a compound expression as a tree. Here I'm using A plus one times B as an example. During execution, the CPU traverses the tree and computes sub-parts of the whole. Tree traversal often takes more compute cycles than the actual work to be done. Expression evaluation is a very common operation, so it's something worth optimizing.
One instinct that engineers have is to use what we call just-in-time, or JIT, compilation, which means generating code for the CPU specific to the expression at hand. This replaces the tree of boxes with a custom-made box for the query. This approach has complexity and bugs, but it can be made to work. It has other drawbacks, though: it adds a lot to query setup time, especially for short queries, and it pretty much eliminates the ability of mere mortals to develop user-defined functions. If you go back to the problem we're trying to solve, the source of the overhead is the tree traversal. If you increase the batch of records processed in each traversal step, this overhead is amortized until it becomes negligible. It's a perfect match for a columnar storage engine. It also sets the CPU up for efficiency: CPUs look particularly good at following the same small sequence of instructions in a tight loop, and in some cases the CPU may even be able to vectorize, applying the same processing to multiple records with the same instruction. This approach is easy to implement and debug, user-defined functions are possible, and it generally aligns with the other complexities of implementing and improving a large system. More importantly, the performance, both in terms of query setup and record throughput, is dramatically improved. You'll hear me say that we look to research and industry for inspiration; in this case, our findings are in line with the academic findings. If you'd like to read papers, I recommend "Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask" — and yes, we did have this idea before we read that paper. However, not every decision we made in the Vertica executor stood the test of time as well as the expression evaluator. For example, sorting and grouping aren't as susceptible to vectorization, because sort decisions interrupt the flow. We have used JIT compiling on those for years, since Vertica 4.1, and it provides modest speedups, but we know we can do even better. So we've embarked on a new design for the execution engine, which I call EE5, because it's our fifth. It's really designed especially for the cloud. Now I know what you're thinking: I just put up a slide with an old engine, a new engine, and a sleek plane headed up into the clouds. But this isn't just marketing hype. Here's what I mean when I say we've learned lessons over the years and are redesigning the executor for the cloud — and of course, you'll see that the new design works well on premises too; these changes are just more important for the cloud. Starting with the network layer: in the cloud, we can't count on all nodes being connected to the same switch, and multicast doesn't work like it does in a custom data center, so as I mentioned earlier, we're redesigning the network transfer layer for the cloud. Storage in the cloud is different, and I'm not referring here to the storage of persistent data, but to the storage of temporary data used only once during the course of query execution. Our new design takes into account the strengths and weaknesses of cloud object storage, where we can't easily do appends. Moving on to memory: many of our access patterns are reasonably effective on bare-metal machines but aren't the best choice on cloud hypervisors, which have overheads like page faults, and the gap is bigger there. Here again, we found we can improve performance a bit on dedicated hardware, and even more in the cloud.
Finally — and this is true in all environments — core counts have gone up, and not all of our algorithms take full advantage. There's a lot of ground to cover here, but I think sorting is the perfect example to illustrate these points. I mentioned that we use JIT in sorting; we're getting rid of the JIT in favor of a data format that can be treated efficiently, independent of what the data types are. We've drawn on the best, most modern technology from academia and industry, and we've done our own analysis and testing. You know what we chose? We chose parallel merge sort. Anyone want to take a guess when merge sort was invented? It was invented in 1948, or at least documented that way, in the computing context. If you've heard me talk before, you know that I'm fascinated by how all the things I work with as an engineer were invented before I was born. In Vertica, we don't use the newest technologies, we use the best ones, and what is novel about Vertica is the way we've combined the best ideas together into a cohesive package. So, all kidding about the 1940s aside, our redesign is actually state of the art. How do we know the sort routine is state of the art? It turns out there's a pretty credible benchmark, at the appropriately named sortbenchmark.org. Anyone with resources looking for fame for their product or academic paper can try to set the record. The record was last set in 2016 with Tencent Sort: 100 terabytes in 99 seconds. Setting the record is hard — you have to come up with hundreds of machines on a dedicated high-speed switching fabric. There's a lot to a distributed sort, but they all have core sorting algorithms. The authors of the paper conveniently broke out the time spent in their sort: 67 out of 99 seconds went to local sorting. If we break that out, divided across two CPUs in each of 512 nodes, we find that each CPU sorted almost a gig and a half per second. That's for what's called an indy sort, which, like an Indy race car, isn't general purpose: it only handles fixed 100-byte records with a 10-byte key. If the record length can vary, it's called a daytona sort, and Tencent's daytona sort is a little slower — a little over a gigabyte per second per CPU. Now, in Vertica, we have wide variability in record sizes and more interesting data types, but still, there's no harm in comparing ourselves to the world record. On my 2017-era AMD desktop CPU, the Vertica EE5 sort sorts about two and a half gigabytes per second. Obviously, this isn't apples to apples, because they used their own OpenPOWER chips, but the number of DRAM channels is the same, so it's close enough to say we've hit on the right approach. And it performs this way on premise and in the cloud, and we can adapt it to cloud temp space. So what's our roadmap for integrating EE5 into the product? Compare replacing the query executor in the database to replacing the crankshaft and other parts of the engine of a car while it's being driven. We've actually done it before, between Vertica three and a half and five, and then we never really stopped changing it — now we'll do it again. The first part we're replacing is an algorithm called storage merge, which combines sorted data from disk. The first pieces of this are in the upcoming Vertica 10.0 release — the EE5 storage merge and the resegmented storage merge — and then we'll convert sorting and grouping as well.
Here are the performance results so far. In cases where the Vertica executor is doing well today — simple environments with simple data patterns, such as this simple analytic query — there's a lot of speedup, and there will be more when we ship the resegmentation code, which didn't quite make the freeze. Longer term, when we move grouping into the storage merge operations, we'll get to where we think we ought to be, given the theoretical minimum work the CPUs need to do. Now, if we look at a case where the current executor isn't doing as well, we see there's a much stronger benefit from the code shipping in Vertica 10 — in fact, I turned the bar chart sideways to try to help you see the difference better. This case will also benefit from the improvements in 10.x point releases and beyond. There's a lot more happening in the Vertica query executor; that was just a taste. Now I'd like to switch to the roadmap for our storage layer, and I'll start with a story about how our storage access layer evolved. If you go back to the academic ideas in the C-Store paper — the paper that persuaded investors to fund Vertica — the read-optimized store was the part that had substantiation in the form of performance data. Much of the paper was speculative, but we tried to follow it anyway. The paper talked about the WS and the RS — the write store and the read store — and how they work together for transaction processing, with a tuple mover in between. In all honesty, Vertica engineers couldn't figure out from the paper what to do next. In case you want to try: we asked the authors what they intended, and we never got enough clarification to build it that way. So here's what we built instead. We built the ROS, the read-optimized store. It's sorted, ordered, columnar, and compressed, and it follows the table partitioning — it works even better than the RS described in the paper. We also built the WOS, the write-optimized store — we actually built four versions of it over the years, and this was the best one. It's not a set of interrelated B-trees; it's just an append-only, insertion-ordered, in-memory store: no compression, no sorting, no partitioning. There is, however, a tuple mover, which does what we call moveout: it moves data from the WOS to the ROS, sorting and compressing along the way. Let's take a moment to compare how they behave. When you load data directly to the ROS, there's a data parsing operation, then we finish the sorting, and then we compress and write the columnar data files out to stable storage. The next query that comes through executes against the ROS, and it runs as it should, because the ROS is read-optimized. Let's repeat the exercise for the WOS: the load operation responds before the sorting and compressing, and before the data is written to persistent storage. Now it's possible for a query to come along, and that query could be responsible for sorting the WOS data in addition to its other processing. The effect on queries isn't predictable until the tuple mover comes along and writes the data to the ROS. Over the years, we've done a lot of comparisons between ROS and WOS. The ROS has always been better for sustained load throughput — it achieves much higher records per second without pushing back against the client, and it has ever since we developed the first usable mergeout algorithm. The ROS has always been better for predictable query performance, and it has never had the same management complexity and limitations as the WOS: you don't have to pick a memory size and figure out which transactions get to use the pool.
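For reference, here's roughly how that choice looked to users before Vertica 10. The table name is hypothetical; the DIRECT keyword and hint are standard Vertica load options, and by default smaller loads landed in the WOS while DIRECT forced sorted, compressed ROS files to be written immediately (in Vertica 10, with the WOS gone, every load effectively behaves as DIRECT).

```sql
-- Default (AUTO) behavior pre-Vertica 10: small trickle loads land in the WOS first
COPY trades FROM '/data/trades.csv' DELIMITER ',';

-- DIRECT bypasses the WOS and writes ROS containers right away
COPY trades FROM '/data/trades.csv' DELIMITER ',' DIRECT;

-- The same hint exists for INSERT...SELECT style loads
INSERT /*+ DIRECT */ INTO trades SELECT * FROM staging_trades;
```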
The non-persistent nature of the WOS always caused headaches when there were unexpected cluster shutdowns. We also looked at field usage data, and we found that few customers were using the WOS much, especially among those that studied the issue carefully. So we set out on a mission to improve the ROS to the point where it was always better than both the WOS and the ROS of the past. And now it's true: the ROS is better than the WOS, and better than the ROS of a couple of years ago. We implemented storage bundling, better catalog object storage, and better tuple mover mergeouts. And now, after extensive QA and customer testing, we've succeeded, and in Vertica 10 we've removed the WOS. Let's talk for a moment about simplicity. One of the best things Mike Stonebraker said is "no knobs." Anyone want to guess how many knobs we got rid of when we took the WOS out of the product? 22. There were five knobs to control whether data went to the WOS or the ROS, six controlling the WOS itself, six more to set policies for the tuple mover moveout, and so on. And in my honest opinion, that still wasn't enough control to achieve success in a multi-tenant environment. The big reason to get rid of the WOS is simplicity — to make the lives of DBAs and users better. We have a long way to go, but we're doing it. On my desk, I keep a jar with a knob in it for each knob in Vertica. When developers add a knob to the product, they have to add a knob to the jar; when they remove a knob, they get to choose one to take out. We have a lot of work to do, but I'm thrilled to report that, in 15 years, 10 is the first release where the number of knobs ticked downward. Getting back to the WOS, I've saved the most important reason to get rid of it for last. We're getting rid of it so we can deliver our vision of the future to our customers. Remember how I said that with Eon and sub-clusters we got all these benefits from shared storage? Guess what can't live in shared storage: the WOS. Remember how a big part of the future was keeping the redundant copies identical to the primary copy? Independent actions of the WOS were at the root of the divergence between copies of the data. You have to admit it when you're wrong: the WOS was in the original design and held up as a selling point for a time, but we hung onto the idea of a separate ROS and WOS for too long. In Vertica 10, we can finally bid it good riddance. I've covered a lot of ground, so let's put all the pieces together. I've talked a lot about our vision and how we're achieving it, but we also still pay attention to tactical detail. We've been fine-tuning our memory management model to enhance performance. That involves revisiting tens of thousands of lines of code, much like painting the inside of a large building with small paintbrushes. We're getting results, as shown in the chart: in Vertica nine, concurrent monitoring queries use memory from the global catalog pool; in Vertica 10, they don't. This is only one example of an important detail we're improving. We've also reworked the monitoring tables and the network messages behind them, and increased the data we're collecting and analyzing in our quality assurance processes — we're improving on everything. As the story goes, I still have my grandfather's axe; of course, my father had to replace the handle, and I had to replace the head. Along the same lines, we still have Mike Stonebraker's Vertica. We've replaced the query optimizer twice, and the database designer and storage layer four times each.
The query executor is on its fifth design, and I charted out how our code has changed over the years. I found that we don't have much from a long time ago. I did some digging, and you know what we have left from 2007? We have the original curly braces, and a little bit of the code for handling dates and times. To deliver on our mission — to help customers get value from their structured data, with high performance at scale, and in diverse deployment environments — we have a sound architecture roadmap, the best execution strategy, and solid tactics. On the architectural front, we're converging Eon and Enterprise, and we're extending smart analytic clusters. In query processing, we're redesigning the execution engine for the cloud, as I've told you. But there's a lot more than just a fast engine. If you want to learn about our new support for complex data types, improvements to the query optimizer statistics, or extensions to live aggregate projections and flattened tables, you should check out some of the other engineering talks at the Big Data Conference. We continue to stay on top of the details, from low-level CPU and memory use to monitoring and management, developing tighter feedback cycles between development, QA, and customers. And don't forget to check out the rest of the pillars of our roadmap: we have new, easier ways to get started with Vertica in the cloud, and engineers have been hard at work on machine learning and security. It's easier than ever to use Vertica with third-party products, as the variety of tool integrations continues to increase. Finally, the most important thing we can do is to help people get value from structured data — to help people learn more about Vertica. So hopefully I've left plenty of time for Q&A at the end of this presentation. I hope to hear your questions soon.

Published Date : Mar 30 2020


A Technical Overview of Vertica Architecture


 

>> Paige: Hello, everybody and thank you for joining us today on the Virtual Vertica BDC 2020. Today's breakout session is entitled A Technical Overview of the Vertica Architecture. I'm Paige Roberts, Open Source Relations Manager at Vertica and I'll be your host for this webinar. Now joining me is Ryan Role-kuh? Did I say that right? (laughs) He's a Vertica Senior Software Engineer. >> Ryan: So it's Roelke. (laughs) >> Paige: Roelke, okay, I got it, all right. Ryan Roelke. And before we begin, I want to be sure and encourage you guys to submit your questions or your comments during the virtual session while Ryan is talking as you think of them as you go along. You don't have to wait to the end, just type in your question or your comment in the question box below the slides and click submit. There'll be a Q and A at the end of the presentation and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to get back to you offline. Now, alternatively, you can visit the Vertica forums to post your question there after the session as well. Our engineering team is planning to join the forums to keep the conversation going, so you can have a chat afterwards with the engineer, just like any other conference. Now also, you can maximize your screen by clicking the double arrow button in the lower right corner of the slides and before you ask, yes, this virtual session is being recorded and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now, let's get started. Over to you, Ryan. >> Ryan: Thanks, Paige. Good afternoon, everybody. My name is Ryan and I'm a Senior Software Engineer on Vertica's Development Team. I primarily work on improving Vertica's query execution engine, so usually in the space of making things faster. Today, I'm here to talk about something that's more general than that, so we're going to go through a technical overview of the Vertica architecture. So the intent of this talk, essentially, is to just explain some of the basic aspects of how Vertica works and what makes it such a great database software and to explain what makes a query execute so fast in Vertica, we'll provide some background to explain why other databases don't keep up. And we'll use that as a starting point to discuss an academic database that paved the way for Vertica. And then we'll explain how Vertica design builds upon that academic database to be the great software that it is today. I want to start by sharing somebody's approximation of an internet minute at some point in 2019. All of the data on this slide is generated by thousands or even millions of users and that's a huge amount of activity. Most of the applications depicted here are backed by one or more databases. Most of this activity will eventually result in changes to those databases. For the most part, we can categorize the way these databases are used into one of two paradigms. First up, we have online transaction processing or OLTP. OLTP workloads usually operate on single entries in a database, so an update to a retail inventory or a change in a bank account balance are both great examples of OLTP operations. Updates to these data sets must be visible immediately and there could be many transactions occurring concurrently from many different users. OLTP queries are usually key value queries. The key uniquely identifies the single entry in a database for reading or writing. 
Early databases and applications were probably designed for OLTP workloads. This example on the slide is typical of an OLTP workload. We have a table, accounts, such as for a bank, which tracks information for each of the bank's clients. An update query, like the one depicted here, might be run whenever a user deposits $10 into their bank account. Our second category is online analytical processing or OLAP which is more about using your data for decision making. If you have a hardware device which periodically records how it's doing, you could analyze trends of all your devices over time to observe what data patterns are likely to lead to failure or if you're Google, you might log user search activity to identify which links helped your users find the answer. Analytical processing has always been around but with the advent of the internet, it happened at scales that were unimaginable, even just 20 years ago. This SQL example is something you might see in an OLAP workload. We have a table, searches, logging user activity. We will eventually see one row in this table for each query submitted by users. If we want to find out what time of day our users are most active, then we could write a query like this one on the slide which counts the number of unique users running searches for each hour of the day. So now let's rewind to 2005. We don't have a picture of an internet minute in 2005, we don't have the data for that. We also don't have the data for a lot of other things. The term Big Data is not quite yet on anyone's radar and The Cloud is also not quite there or it's just starting to be. So if you have a database serving your application, it's probably optimized for OLTP workloads. OLAP workloads just aren't mainstream yet and database engineers probably don't have them in mind. So let's innovate. It's still 2005 and we want to try something new with our database. Let's take a look at what happens when we do run an analytic workload in 2005. Let's use as a motivating example a table of stock prices over time. In our table, the symbol column identifies the stock that was traded, the price column identifies the new price and the timestamp column indicates when the price changed. We have several other columns which, we should know that they're there, but we're not going to use them in any example queries. This table is designed for analytic queries. We're probably not going to make any updates or look at individual rows since we're logging historical data and want to analyze changes in stock price over time. Our database system is built to serve OLTP use cases, so it's probably going to store the table on disk in a single file like this one. Notice that each row contains all of the columns of our data in row major order. There's probably an index somewhere in the memory of the system which will help us to point lookups. Maybe our system expects that we will use the stock symbol and the trade time as lookup keys. So an index will provide quick lookups for those columns to the position of the whole row in the file. If we did have an update to a single row, then this representation would work great. We would seek to the row that we're interested in, finding it would probably be very fast using the in-memory index. And then we would update the file in place with our new value. On the other hand, if we ran an analytic query like we want to, the data access pattern is very different. The index is not helpful because we're looking up a whole range of rows, not just a single row. 
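Since the slides themselves aren't reproduced in this transcript, here is a rough reconstruction of the two SQL examples described above — the OLTP deposit and the OLAP hourly-activity count. The column names are guesses made for illustration, not the actual slide content.

```sql
-- OLTP: touch a single row, identified by a key
UPDATE accounts
   SET balance = balance + 10
 WHERE account_id = 42;

-- OLAP: scan many rows to find a trend in user activity by hour of day
SELECT EXTRACT(HOUR FROM query_time) AS hour_of_day,
       COUNT(DISTINCT user_id)       AS active_users
  FROM searches
 GROUP BY EXTRACT(HOUR FROM query_time)
 ORDER BY hour_of_day;
```

Back to the stocks table, then, where the in-memory index can't help with a range query.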
As a result, the only way to find the rows that we actually need for this query is to scan the entire file. We're going to end up scanning a lot of data that we don't need and that won't just be the rows that we don't need, there's many other columns in this table. Many information about who made the transaction, and we'll also be scanning through those columns for every single row in this table. That could be a very serious problem once we consider the scale of this file. Stocks change a lot, we probably have thousands or millions or maybe even billions of rows that are going to be stored in this file and we're going to scan all of these extra columns for every single row. If we tried out our stocks use case behind the desk for the Fortune 500 company, then we're probably going to be pretty disappointed. Our queries will eventually finish, but it might take so long that we don't even care about the answer anymore by the time that they do. Our database is not built for the task we want to use it for. Around the same time, a team of researchers in the North East have become aware of this problem and they decided to dedicate their time and research to it. These researchers weren't just anybody. The fruits of their labor, which we now like to call the C-Store Paper, was published by eventual Turing Award winner, Mike Stonebraker, along with several other researchers from elite universities. This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. That sounds exactly like what we want for our stocks use case. Reasoning about what makes our queries executions so slow brought our researchers to the Memory Hierarchy, which essentially is a visualization of the relative speeds of different parts of a computer. At the top of the hierarchy, we have the fastest data units, which are, of course, also the most expensive to produce. As we move down the hierarchy, components get slower but also much cheaper and thus you can have more of them. Our OLTP databases data is stored in a file on the hard disk. We scanned the entirety of this file, even though we didn't need most of the data and now it turns out, that is just about the slowest thing that our query could possibly be doing by over two orders of magnitude. It should be clear, based on that, that the best thing we can do to optimize our query's execution is to avoid reading unnecessary data from the disk and that's what the C-Store researchers decided to look at. The key innovation of the C-Store paper does exactly that. Instead of storing data in a row major order, in a large file on disk, they transposed the data and stored each column in its own file. Now, if we run the same select query, we read only the relevant columns. The unnamed columns don't factor into the table scan at all since we don't even open the files. Zooming out to an internet scale sized data set, we can appreciate the savings here a lot more. But we still have to read a lot of data that we don't need to answer this particular query. Remember, we had two predicates, one on the symbol column and one on the timestamp column. Our query is only interested in AAPL stock, but we're still reading rows for all of the other stocks. So what can we do to optimize our disk read even more? Let's first partition our data set into different files based on the timestamp date. This means that we will keep separate files for each date. 
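For reference, the analytic query being optimized throughout this example probably looks something like the sketch below. It's reconstructed from the description of the two predicates (symbol and timestamp) and the later mention of an "average price query"; the exact columns and date range are guesses.

```sql
-- Reconstructed example: average AAPL price over a window of time
SELECT AVG(price)
  FROM stocks
 WHERE symbol = 'AAPL'
   AND trade_time BETWEEN '2005-01-03 00:00:00' AND '2005-01-03 23:59:59';
```

With that query in mind, partitioning by date lets the database skip whole files before it reads a single row.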
When we query the stocks table, the database knows all of the files we have to open. If we have a simple predicate on the timestamp column, as our sample query does, then the database can use it to figure out which files we don't have to look at at all. So now all of the disk reads that we have to do to answer our query will produce rows that pass the timestamp predicate. This eliminates a lot of wasteful disk reads. But not all of them. We do have another predicate on the symbol column, where symbol equals AAPL. We'd like to avoid disk reads of rows that don't satisfy that predicate either. And we can avoid those disk reads by clustering all the rows that match the symbol predicate together. If all of the AAPL rows are adjacent, then as soon as we see something different, we can stop reading the file. We won't see any more rows that can pass the predicate. Then we can use the positions of the rows we did find to identify which pieces of the other columns we need to read. One technique that we can use to cluster the rows is sorting. So we'll use the symbol column as a sort key for all of the columns. And that way we can reconstruct a whole row by seeking to the same row position in each file. It turns out, having sorted all of the rows, we can do a bit more. We don't have any more wasted disk reads but we can still be more efficient with how we're using the disk. We've clustered all of the rows with the same symbol together, so we don't really need to bother repeating the symbol so many times in the same file. Let's just write the value once and say how many rows we have. This run-length encoding technique can compress large numbers of rows into a small amount of space. In this example, we de-duplicate just a few rows, but you can imagine de-duplicating many thousands of rows instead. This encoding is great for reducing the amount of disk we need to read at query time, but it also has the additional benefit of reducing the total size of our stored data. Now our query requires substantially fewer disk reads than it did when we started. Let's recap what the C-Store paper did to achieve that. First, we transposed our data to store each column in its own file. Now, queries only have to read the columns used in the query. Second, we partitioned the data into multiple file sets so that all rows in a file have the same value for the partition column. Now, a predicate on the partition column can skip non-matching file sets entirely. Third, we selected a column of our data to use as a sort key. Now rows with the same value for that column are clustered together, which allows our query to stop reading data once it finds non-matching rows. Finally, sorting the data this way enables high compression ratios, using run-length encoding, which minimizes the size of the data stored on the disk. The C-Store system combined each of these innovative ideas to produce an academically significant result. And if you used it behind the desk of a Fortune 500 company in 2005, you probably would've been pretty pleased. But it's not 2005 anymore, and the requirements of a modern database system are much stricter. So let's take a look at how C-Store fares in 2020. First of all, we have designed the storage layer of our database to optimize a single query in a single application. Our design optimizes the heck out of that query, and probably some similar ones, but if we want to do anything else with our data, we might be in a bit of trouble. What if we just decide we want to ask a different question?
For example, in our stock example, what if we want to plot all the trades made by a single user over a large window of time? How do our optimizations for the previous query measure up here? Well, our data's partitioned on the trade date. That could still be useful, depending on our new query. If we want to look at a trader's activity over a long period of time, we would have to open a lot of files. But if we're still interested in just a day's worth of data, then this optimization is still an optimization. Within each file, our data is ordered on the stock symbol. That's probably not too useful anymore; the rows for a single trader aren't going to be clustered together, so we will have to scan all of the rows in order to figure out which ones match. You could imagine a worse design, but if it becomes crucial to optimize this new type of query, then we might have to go as far as reconfiguring the whole database. The next problem is one of scale. One server is probably not good enough to serve a database in 2020. C-Store, as described, runs on a single server and stores lots of files. What if the data overwhelms this small system? We could imagine exhausting the file system's inode limit with lots of small files due to our partitioning scheme. Or we could imagine something simpler, just filling up the disk with huge volumes of data. But there's an even simpler problem than that. What if something goes wrong and C-Store crashes? Then our data is no longer available to us until the single server is brought back up. A third concern, this one of flexibility, is that one deployment does not really suit all of the use cases we could imagine. We haven't really said anything about being flexible. A contemporary database system has to integrate with many other applications, which might themselves have pretty restricted deployment options. Or the demands imposed by your workloads change, and the setup you had before doesn't suit what you need now. C-Store doesn't do anything to address these concerns. What the C-Store paper did do was lead very quickly to the founding of Vertica. Vertica's architecture and design are essentially all about bringing the C-Store designs into an enterprise software system. The C-Store paper was just an academic exercise, so it didn't really need to address any of the hard problems that we just talked about. But Vertica, the first commercial database built upon the ideas of the C-Store paper, definitely would have to. This brings us back to the present, to look at how an analytic query runs in 2020 on the Vertica Analytic Database. Vertica takes the key idea from the paper, that we can significantly improve query performance by changing the way our data is stored, and gives its users the tools to customize their storage layer in order to heavily optimize really important or commonly run queries. On top of that, Vertica is a distributed system, which allows it to scale up to internet-sized data sets, as well as have better reliability and uptime. We'll now take a brief look at what Vertica does to address the three inadequacies of the C-Store system that we mentioned. To avoid locking into a single database design, Vertica provides tools for the database user to customize the way their data is stored. To address the shortcomings of a single node system, Vertica coordinates processing among multiple nodes.
To acknowledge the large variety of desirable deployments, Vertica does not require any specialized hardware and has many features which smoothly integrate it with a cloud computing environment. First, we'll look at the database design problem. We're a SQL database, so our users are writing SQL and describing their data in the SQL way, with the Create Table statement. Create Table is a logical description of what your data looks like, but it doesn't specify the way that it has to be stored. For a single Create Table, we could imagine a lot of different storage layouts. Vertica adds some extensions to SQL so that users can go even further than Create Table and describe the way that they want the data to be stored. Using terminology from the C-Store paper, we provide the Create Projection statement. Create Projection specifies how table data should be laid out, including column encoding and sort order. A table can have multiple projections, each of which could be ordered on different columns. When you query a table, Vertica will answer the query using the projection which it determines to be the best match. Referring back to our stock example, here's a sample Create Table and Create Projection statement. Let's focus on our heavily optimized example query, which had predicates on the stock symbol and date. We specify that the table data is to be partitioned by date. The Create Projection statement here is excellent for this query. We specify, using the order by clause, that the data should be ordered according to our predicates. We'll use the timestamp as a secondary sort key. Each projection stores a copy of the table data. If you don't expect to need a particular column in a projection, then you can leave it out. Our average price query didn't care about who did the trading, so maybe our projection design for this query can leave the trader column out entirely. If the question we want to ask ever does change, maybe we already have a suitable projection, but if we don't, then we can create another one. This example shows another projection which would be much better at identifying trends of traders, rather than identifying trends for a particular stock. Next, how should you decide what design is best for your queries? Well, you could spend a lot of time figuring it out on your own, or you could use Vertica's Database Designer tool, which will help you by automatically analyzing your queries and spitting out a design which it thinks is going to work really well. If you want to learn more about the Database Designer tool, then you should attend the session Vertica Database Designer: Today and Tomorrow, which will tell you a lot about what the Database Designer does and some recent improvements that we have made. Okay, now we'll move to our next problem. (laughs) The challenge that one server does not fit all. In 2020, we have several orders of magnitude more data than we had in 2005, and you need a lot more hardware to crunch it. It's not tractable to keep multiple petabytes of data in a system with a single server. So Vertica doesn't try. Vertica is a distributed system, so we'll deploy multiple servers which work together to maintain such a high data volume. In a traditional Vertica deployment, each node keeps some of the data in its own locally-attached storage. Data is replicated so that there is a redundant copy somewhere else in the system. If any one node goes down, then the data that it served is still available on a different node.
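To picture the Create Table and Create Projection statements described above, here is a hedged sketch in Vertica-style SQL. The exact statements on the slide aren't in the transcript, so the names, types, and encodings below are assumptions; the clauses themselves (PARTITION BY, ORDER BY, Create Projection) are the features the talk names.

```sql
-- Hedged sketch of the table and projections discussed above; identifiers and
-- types are assumed. Partitioning by trade date gives one file set per day.
CREATE TABLE stock_trades (
    symbol   VARCHAR(16),
    trade_ts TIMESTAMP,
    price    NUMERIC(10,2),
    trader   VARCHAR(64)
)
PARTITION BY trade_ts::DATE;

-- Projection tuned for the average-price query: ordered on the predicate
-- columns, with the trader column left out because that query never reads it.
CREATE PROJECTION stock_trades_by_symbol (
    symbol ENCODING RLE,   -- sorted first, so run-length encoding compresses well
    trade_ts,
    price
)
AS SELECT symbol, trade_ts, price
FROM stock_trades
ORDER BY symbol, trade_ts;

-- An alternative projection better suited to the "one trader's activity" question.
CREATE PROJECTION stock_trades_by_trader
AS SELECT trader, trade_ts, symbol, price
FROM stock_trades
ORDER BY trader, trade_ts;
```

Vertica then answers each incoming query with whichever projection it judges to be the best match.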
We'll also have it so that in the system, there's no special node with extra duties. All nodes are created equal. This ensures that there is no single point of failure. Rather than replicate all of your data, Vertica divvies it up amongst all of the nodes in your system. We call this segmentation. The way data is segmented is another parameter of storage customization, and it can definitely have an impact upon query performance. A common way to segment data is by using a hash expression, which essentially randomizes the node that a row of data belongs to, but with a guarantee that the same data will always end up in the same place. Describing the way data is segmented is another part of the Create Projection statement, as seen in this example. Here we segment on the hash of the symbol column, so all rows with the same symbol will end up on the same node. For each row that we load into the system, we'll apply our segmentation expression. The result determines which segment the row belongs to, and then we'll send the row to each node which holds a copy of that segment. In this example, our projection is marked KSAFE 1, so we will keep one redundant copy of each segment. When we load a row, we might find that its segment has copies on Node One and Node Three, so we'll send a copy of the row to each of those nodes. If Node One is temporarily disconnected from the network, then Node Three can serve the other copy of the segment so that the whole system remains available. The last challenge we brought up from the C-Store design was that one deployment does not fit all. Vertica's cluster design neatly addresses many of our concerns here. Our use of segmentation to distribute data means that a Vertica system can scale to any size of deployment. And since we lack any special hardware or nodes with special purposes, Vertica servers can run anywhere, on premise or in the cloud. But let's suppose you need to scale out your cluster to rise to the demands of a higher workload. Suppose you want to add another node. This changes the division of the segmentation space. We'll have to re-segment every row in the database to find its new home, and then we'll have to move around any data that belongs to a different segment. This is a very expensive operation, not something you want to be doing all that often. Traditional Vertica doesn't solve that problem especially well, but Vertica Eon Mode definitely does. Vertica's Eon Mode is a large set of features which are designed with a cloud computing environment in mind. One feature of this design is elastic throughput scaling, which is the idea that you can smoothly change your cluster size without having to pay the expense of shuffling your entire database. Vertica Eon Mode had an entire session dedicated to it this morning. I won't say any more about it here, but maybe you already attended that session, or if you haven't, then I definitely encourage you to listen to the recording. If you'd like to learn more about the Vertica architecture, then you'll find on this slide links to several of the academic conference publications: these four papers here, as well as the Vertica Seven Years Later paper, which describes some of the Vertica designs seven years after the founding, and also a paper about the innovations of Eon Mode. And of course, the Vertica documentation is an excellent resource for learning more about what's going on in a Vertica system. I hope you enjoyed learning about the Vertica architecture. I would be very happy to take all of your questions now.
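The segmentation and K-safety clauses just described would be added to the projection definition roughly as in the sketch below; again, the identifiers are assumptions rather than the statement shown on the slide.

```sql
-- Hedged sketch of a segmented, K-safe projection. Hashing on symbol keeps
-- all rows for one stock on the same node; KSAFE 1 keeps one redundant copy
-- of every segment on a buddy node.
CREATE PROJECTION stock_trades_segmented
AS SELECT symbol, trade_ts, price
FROM stock_trades
ORDER BY symbol, trade_ts
SEGMENTED BY HASH(symbol) ALL NODES
KSAFE 1;
```

If a node drops out, the buddy holding the redundant copy of its segments serves them until the node is restored, as described above.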
Thank you for attending this session.

Published Date : Mar 30 2020


Vertica in Eon Mode: Past, Present, and Future


 

>> Paige: Hello everybody and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled Vertica in Eon Mode: Past, Present and Future. I'm Paige Roberts, open source relations manager at Vertica, and I'll be your host for this session. Joining me are Vertica engineer Yuanzhe Bei and Vertica product manager David Sprogis. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait till the end. Just type your question or comment as you think of it in the question box below the slides and click Submit. There will be a Q&A session at the end of the presentation. We'll answer as many of your questions as we're able to during that time, and any questions that we don't address, we'll do our best to answer offline. If you wish, after the presentation you can visit the Vertica forums to post your questions there, and our engineering team is planning to join the forums to keep the conversation going, just like a Dev Lounge at a normal in-person BDC. So, as a reminder, you can maximize your screen by clicking the double arrow button in the lower right corner of the slides, if you want to see them bigger. And yes, before you ask, this virtual session is being recorded and will be available to view on demand this week. We will send you a notification as soon as it's ready. All right, let's get started. Over to you, Dave. >> David: Thanks, Paige. Hey, everybody. Let's start with a timeline of the life of Eon Mode. About two years ago, a little bit less than two years ago, we introduced Eon Mode on AWS, pretty specifically for the purpose of rapid scaling to meet the cloud economics promise. It wasn't long after that we realized that workload isolation, a byproduct of the architecture, was very important to our users, and going to the third tick, you can see that the importance of that workload isolation was manifest in Eon Mode being made available on-premise using Pure Storage FlashBlade. Moving to the fourth tick mark, we took steps to improve workload isolation with a new type of subcluster, which Yuanzhe will go through, and at the fifth tick mark, the introduction of secondary subclusters for faster scaling and other improvements, which we will cover in the slides to come. Getting started with why we created Eon Mode in the first place: let's imagine that your database is this pie, the pecan pie, and we're loading pecan data in through the ETL cutting board in the upper left hand corner. We have a couple of free-floating pecans, which we might imagine to be data supporting external tables. As you know, Vertica has a query engine capability as well, which we call external tables. And so if we imagine this pie, we want to serve it with a number of servers. Well, let's say we wanted to serve it with three servers, three nodes; we would need to slice that pie into three segments and we would serve each one of those segments from one of our nodes. Now because the data is important to us and we don't want to lose it, we're going to be saving that data on some kind of RAID storage or redundant storage. In case one of the drives goes bad, the data remains available because of the durability of RAID. Imagine also that we care about the availability of the overall database. Imagine that a node goes down, perhaps the second node goes down; we still want to be able to query our data, and through nodes one and three, we still have all three shards covered, and we can do this because of buddy projections.
Each node's neighbor contains a copy of the data from the node next to it. And so in this case, node one is sharing its segment with node two. So node two can cover node one, node three can cover node two, and node one covers node three. Adding a little bit more complexity, we might store the data in different copies, each copy sorted for a different kind of query. We call these projections in Vertica, and for each projection, we have another copy of the data, sorted differently. Now it gets complex. What happens when we want to add a node? Well, if we wanted to add a fourth node here, what we would have to do is figure out how to re-slice all of the data in all of the copies that we have. In effect, what we want to do is take our three slices and slice them into four, which means taking a portion of each of our existing thirds and re-segmenting into quarters. Now that looks simple in the graphic here, but when it comes to moving data around, it becomes quite complex, because for each copy of each segment, we need to replace it and move that data on to the new node. What's more, the fourth node can't hold a copy of itself; that would be problematic in case it went down. Instead, what we need is for that buddy to be sitting on another node, a neighboring node. So we need to re-orient the buddies as well. All of this takes a lot of time; it can take 12, 24 or even 36 hours, in a period when you do not want your database under high demand. In fact, you may want to stop loading data altogether in order to speed it up. This is a planned event, and your applications should probably be down during this period, which makes it difficult. With the advent of cloud computing, we saw that services were coming up and down faster, and we determined to re-architect Vertica in a way to accommodate that rapid scaling. Let's see how we did it. So let's start with four nodes now, and we've got our four-node database. Let's add communal storage and move each of the segments of data into communal storage. Now that's the separation that we're talking about. What happens if we run queries against it? Well, it turns out that the communal storage is not necessarily performant, and so the IO would be slow, which would make the overall queries slow. In order to compensate for the low performance of communal storage, we need to add back local storage. Now it doesn't have to be RAID, because this is just an ephemeral copy, but with the data files local to the node, the queries will run much faster. In AWS, communal storage really does mean an S3 bucket, and here's a simplified version of the diagram. Now, do we need to store all of the data from the segment in the depot? The answer is no, and the graphic inside the bucket has changed to reflect that. It looks more like a bullseye, showing just a segment of the data being copied to the cache, or to the depot, as we call it, on each one of the nodes. How much data do you store on the node? Well, it would be the active data set: the last 30 days, the last 30 minutes, or whatever period of time you're working with. The active working set is the hot data, and that's how large you want to size your depot. By architecting this way, when you scale up, you're not re-segmenting the database. What you're doing is adding more compute and more subscriptions to the existing shards of the existing database. So in this case, we've added a complete set of four nodes.
So we've doubled our capacity and we've doubled our subscriptions, which means that now two nodes can serve the yellow shard, two nodes can serve the red shard, and so on. In this way, we're able to run twice as many queries in the same amount of time. So you're doubling the concurrency. How high can you scale? Well, can you scale to 3X, 5X? We tested this in the graphic on the right, which shows concurrent users on the X axis by the number of queries executed in a minute along the Y axis. We've grouped executions in runs of 10 users, 30 users, 50, 70, up to 150 users. Now focusing on any one of these groups, particularly up around 150, you can see through the three bars, starting with the bright purple bar, three nodes and three segments, that as you add nodes, with the middle purple bar, six nodes and three segments, you've almost doubled your throughput, up to the dark purple bar, which is nine nodes and three segments, and our tests show that you can go to 5X with a pretty linear performance increase. Beyond that, you do continue to get an increase in performance, but your incremental performance begins to fall off. The Eon architecture does something else for us, and that is it provides high availability, because each of the nodes can be thought of as ephemeral, and in fact, each node has a buddy subscription in a way similar to the prior architecture. So if we lose node four, we're losing the node responsible for the red shard, and now node one has to pick up responsibility for the red shard while that node is down. When a query comes in, and let's say it comes into node one and node one is the initiator, then node one will look for participants. It'll find a blue shard and a green shard, but when it's looking for the red, it finds itself, and so node number one will be doing double duty. This means that your performance will be cut approximately in half for the query. This is acceptable until you are able to restore the node. Once you restore it, and once the depot becomes rehydrated, then your performance goes back to normal. So this is a much simpler way to recover nodes in the event of node failure. By comparison, in Enterprise Mode, the older architecture, when we lose the fourth node, node one takes over responsibility for its own yellow shard as well as the red shard. But it also is responsible for rehydrating the entire data segment of the red shard to node four; this can be very time consuming and imposes even more stress on the first node. So performance will go down even further. Eon Mode has another feature, and that is you can scale down completely to zero. We call this hibernation. You shut down your database, and your database will maintain full consistency in a rest state in your S3 bucket, and then when you need access to your database again, you simply recreate your cluster and revive your database, and you can access your database once again. That concludes the rapid scaling portion of why we created Eon Mode. To take us through workload isolation is Yuanzhe Bei. Yuanzhe? >> Yuanzhe: Thanks Dave, for presenting how Eon works in general. In the next section, I will show you another important capability of Vertica Eon Mode: workload isolation. Dave used a pecan pie as an example of a database. Now let's say it's time for the main course. Does anyone still have a problem with food touching on their plates? Parents know that it's a common problem for kids. Well, we have a similar problem in databases as well.
So there could be multiple different workloads accessing your database at the same time. Say you have ETL jobs running regularly, while at the same time there are dashboards running short queries against your data. You may also have the end-of-month report running, and there can be ad hoc data scientists connecting to the database and doing whatever data analysis they want to do, and so on. How to make these mixed workload requests not interfere with each other is a real challenge for many DBAs. Vertica Eon Mode provides you the solution. I'm very excited here to introduce to you an important concept in Eon Mode called subclusters. In Eon Mode, nodes belong to predefined subclusters rather than to the whole cluster. DBAs can define different subclusters for different kinds of workloads and redirect those workloads to the specific subclusters. For example, you can have an ETL subcluster, a dashboard subcluster, a report subcluster and an analytic machine learning subcluster. Vertica Eon subclusters are designed to achieve three main goals. First of all, strong workload isolation. That means any operation in one subcluster should not affect or be affected by other subclusters. For example, say the subcluster running the reports is quite overloaded, and at the same time the data scientists are running crazy analytic jobs, machine learning jobs, on the analytics subcluster and making it very slow, even stuck or crashed, or whatever. In such a scenario, your ETL and dashboard subclusters should not be impacted, or at the very least should be minimally impacted, by this crisis, which means your ETL jobs should not lag behind and your dashboards should respond in a timely manner. We have made a lot of improvements as of the 10.0 release and will continue to deliver improvements in this category. Secondly, fully customized subcluster settings. That means any subcluster can be set up and tuned for very different workloads without affecting other subclusters. Users should be able to tune certain parameters up or down based on the actual needs of the individual subcluster's workload requirements. As of today, Vertica already supports a few settings that can be done at the subcluster level, for example the depot pinning policy, and we will continue extending more, like resource pools, (mumbles) in the near future. Lastly, Vertica subclusters should be easy to operate and cost efficient. What that means is that subclusters should be easy to turn on, turn off, add or remove, and should be available for use according to rapidly changing workloads. Let's say in this case, you want to spin up more dashboard subclusters because you need higher dashboard concurrency; we can do that. You might need to run several report subclusters because you might want to run multiple reports at the same time. While on the other hand, you can shut down your analytic machine learning subcluster because no data scientists need to use it at this moment. So we have made a lot of improvements in this category, which I'll explain in detail later, and one of the ultimate goals is to support auto scaling. To sum up, what we really want to deliver for subclusters is very simple. You just need to remember that accessing subclusters should be just like accessing individual clusters. Well, these subclusters do share the same catalog, so you don't have to work around stale data and don't need to worry about data synchronization.
That's a nice goal, and Vertica's upcoming 10.0 release is certainly a milestone towards it, which will deliver a large part of the capability in this direction, and then we will continue to improve it after the 10.0 release. In the next couple of slides, I will highlight some issues with workload isolation in the initial Eon release and show you how we resolved these issues. The first issue: when we initially released our first, so-called subcluster mode, it was implemented using fault groups. Well, fault groups and subclusters have something in common. Yes, they are both defined as a set of nodes. However, they are very different in all the other ways. So that was very confusing in the first place, when we implemented this. As of version 9.3.0, we decided to detach the subcluster definition from fault groups, which enabled us to further extend the capability of subclusters. Fault groups in the pre-9.3.0 versions will be converted into subclusters during the upgrade, and this was a very important step that enabled us to provide all the amazing improvements on subclusters that follow. The second issue in the past was that it was hard to control the execution groups for different types of workloads. There are two types of problems here, and I will use some examples to explain. The first problem is about controlling group size. Say you allocate six nodes for your dashboard subcluster, and what you really want is, on the left, the three pairs of nodes as three execution groups, and each pair of nodes will need to subscribe to all the four shards. However, that's not really what you get. What you really get is there on the right side: the first four nodes subscribe to one shard each, and the remaining two nodes subscribe to two dangling shards. So you won't really get three execution groups; instead you only get one, and the two extra nodes have no value at all. The solution is to use subclusters. So instead of having one subcluster with six nodes, you can split it up into three smaller ones. Each subcluster is guaranteed to subscribe to all the shards, and you can further handle these three subclusters using a load balancer across them. In this way you achieve the three real execution groups. The second problem is that the session participation is non-deterministic. Any session will just pick four random nodes from the subcluster, as long as they cover one shard each. In other words, you don't really know which set of nodes will make up your execution group. What's the problem? Well, in this case, the fourth node will be double booked by two concurrent sessions. And you can imagine that the resource usage will be imbalanced and both queries' performance will suffer. What is even worse is when the queries of the two concurrent sessions target different tables. They will cause another issue: depot efficiency will be reduced, because both sessions will try to fetch the files of the two tables into the same depot, and if your depot is not large enough, they will evict each other, which will be very bad. You can solve this the same way, by declaring subclusters, in this case two subclusters, and a load balancer group across them. The reason this solves the problem is that session participation will not go across the subcluster boundary.
So there won't be a case where any node is double booked. And in terms of the depot, if you use subclusters, avoid using a load balancer group, and carefully send the first workload to the first subcluster and the second to the second subcluster, then the result is that depot isolation is achieved. The first subcluster will maintain the data files for the first query, and you don't need to worry about the files being evicted by the second kind of session. Here comes the next issue: scaling down. In the old way of defining subclusters, you may have several execution groups in the subcluster, and you want to shut down one or two execution groups to save cost. Well, here comes the pain: because you don't know which nodes may be used by which session at any point, it is hard to find the right time to hit the shutdown button on any of the instances. And if you do and get unlucky, say in this case you pull the first four nodes, one of the sessions will fail because it's participating on node two and node four at that point. The user of that session will notice because their query fails, and we know that for many businesses this is a critical problem and not acceptable. Again, with subclusters this problem is resolved. For the same reason, a session cannot go across the subcluster boundary. So all you need to do is first prevent queries from being sent to the first subcluster, and then you can shut down the instances in that subcluster. You are guaranteed not to break any running sessions. Now, you're happy and you want to shut down more subclusters, and then you hit issue four: the whole cluster will go down. Why? Because the cluster loses quorum. As a distributed system, you need to have more than half of the nodes up in order to commit and keep the cluster up. This is to prevent catalog divergence from happening, which is important. But you still want to shut down those nodes, because what's the point of keeping those nodes up if you are not using them and letting them cost you money, right? So Vertica has a solution: you can define a subcluster as secondary to allow it to shut down without worrying about quorum. In this case, you can define the first three subclusters as secondary and the fourth one as primary. By doing so, the secondary subclusters will not be counted towards the quorum, because we changed the rule. Now instead of requiring more than half of the nodes to be up, it only requires more than half of the primary nodes to be up. Now you can shut down your second subcluster and even shut down your third subcluster as well, and keep the remaining primary subcluster still running healthily. There are actually more benefits to defining secondary subclusters, in addition to the quorum concern. Because the secondary subclusters no longer have voting power, they don't need to persist the catalog anymore. This means those nodes are faster to deploy and can be dropped and re-added without worrying about catalog persistence. For subclusters that only need to run read-only queries, it's best practice to define them as secondary. Commits will be faster on these secondary subclusters as well, so queries running on a secondary subcluster will have fewer spikes. The primary subcluster, as usual, handles everything, is responsible for consistency, and runs the background tasks. So DBAs should make sure that the primary subcluster is stable and is running all the time. Of course, you need at least one primary subcluster in your database.
Now with secondary subclusters, users can start and stop them as they need, which is very convenient, and this further brings up another issue: what if there's an ETL transaction running and, in the middle of it, a subcluster starts and comes up? In older versions, there was no catalog resync mechanism to keep the new subcluster up to date, so Vertica rolled back the ETL session to keep the data consistent. This is actually quite disruptive, because real-world ETL workloads can sometimes take hours, and rolling back at the end means a large waste of resources. We resolved this issue in version 9.3.1 by introducing a catalog resync mechanism for when such a situation happens. ETL transactions will not roll back anymore, but instead will take some time to resync the catalog and commit, and the problem is resolved. And the last issue I would like to talk about is subscriptions. Especially for a large subcluster, when you start it, the startup time is quite long, because subscription commits used to be serialized. In one of our internal tests with a large catalog, committing a subscription can take five minutes. A secondary subcluster is better, because it doesn't need to persist the catalog during the commit, but it still takes about two seconds to commit. So what's the problem here? Let's do the math and look at this chart. The X axis is the time in minutes and the Y axis is the number of nodes to be subscribed. The dark blue represents your primary subcluster and the light blue represents the secondary subcluster. Let's say the subcluster has 16 nodes in total. If you start a secondary subcluster, it will spend about 30 seconds in total, because 2 seconds times 16 is 32. That's not actually that long a time, but you expect starting a secondary subcluster to be super fast, to react to a fast-changing workload, so 30 seconds is no longer trivial. And what is even worse is on the primary subcluster side. Because the commit is much longer, five minutes let's assume, then by the time you are committing the sixth node's subscription, all the other nodes have already waited for 30 minutes for the GCLX, or as we know it, the global catalog lock, and Vertica will crash a node if it cannot get the GCLX for 30 minutes. So the end result is that your whole database crashes. That's a serious problem, and we know that, and that's why we are already planning the fix for 10.0, so that all the subscriptions will be batched up and all the nodes will commit at the same time, concurrently. And by doing that, you can imagine the primary subcluster can finish committing in five minutes instead of crashing, and the secondary subcluster can finish even in seconds. That summarizes the highlights of the improvements we have made as of 10.0, and I hope you are already excited about the emerging Eon deployment pattern shown here. A primary subcluster that handles data loading, ETL jobs and Tuple Mover jobs is the backbone of the database, and you keep it running all the time. At the same time, you define different secondary subclusters for different workloads, provision them when the workload requirement arrives, and then de-provision them when the workload is done to save on operational cost. So if you can't wait to play with subclusters, here are some Admin Tools commands you can start using. For more details, check out our Eon subcluster documentation. And thanks everyone for listening. I'll hand it back to Dave to talk about Eon on-prem.
>> David: Thanks, Yuanzhe. At the same time that Yuanzhe and the rest of the dev team were working on the improvements that Yuanzhe described, and other improvements, this guy, John Yovanovich, stood on stage and told us about his deployment at AT&T, where he was running Eon Mode on-prem. Now this was only six months after we had launched Eon Mode on AWS. So when he told us that he was putting it into production on-prem, we nearly fell out of our chairs. How is this possible? We took a look back at Eon and determined that the workload isolation and the improvements to the operations for restoring nodes, and other things, had sufficient value that John wanted to run it on-prem. And he was running it on the Pure Storage FlashBlade. Taking a second look at the FlashBlade, we thought, alright, well, does it have the performance? Yes, it does. The FlashBlade is a collection of individual blades, each one of them with NVMe storage on it, which is not only performant but also scalable, and so we then asked, is it durable? The answer is yes. The data safety is implemented with N+2 redundancy, which means that up to two blades can fail and the data remains available. And so with this we realized DBAs can sleep well at night, knowing that their data is safe; after all, Eon Mode outsources the durability to the communal storage data store. Does FlashBlade have the capacity for growth? Well, yes it does. You can start as low as 120 terabytes and grow as high as about eight petabytes. So it certainly covers the range for most enterprise usages. And operationally, it couldn't be easier to use. When you want to grow your database, you can simply pop new blades into the FlashBlade unit, and you can do that hot. If one goes bad, you can pull it out and replace it hot. So you don't have to take your data store down, and therefore you don't have to take Vertica down. Knowing all of these things, we got behind Pure Storage and partnered with them to implement the first version of Eon on-premise. That changed our roadmap a little bit. We were imagining it would start with Amazon and then go to Google and then to Azure and at some point to Alibaba Cloud, but as you can see from the left column, we started with Amazon and went to Pure Storage. And then from Pure Storage, we went to MinIO, and we launched Eon Mode on MinIO at the end of last year. MinIO is a little bit different from Pure Storage. It's software only, so you can run it on pretty much any x86 servers, and you can cluster them with storage to serve up an S3 bucket. It's a great solution for up to about 120 terabytes. Beyond that, we're not sure about the performance implications because we haven't tested it, but for your dev environments or small production environments, we think it's great. With Vertica 10, we're introducing Eon Mode on Google Cloud. This means not only running Eon Mode in the cloud, but also being able to launch it from the marketplace. We're also offering Eon Mode on HDFS with version 10. If you have a Hadoop environment and you want to breathe new fresh life into it with the high performance of Vertica, you can do that starting with version 10. Looking forward, we'll be moving Eon Mode to Microsoft Azure. We expect to have something running in the fall and to offer it to select customers for beta testing, and then we expect to release it sometime in 2021. Following that, further on the horizon is Alibaba Cloud.
Now, to be clear, we will be putting Vertica in Enterprise Mode on Alibaba Cloud in 2020, but Eon Mode is going to trail behind; whether it lands in 2021 or not, we're not quite sure at this point. Our goal is to deliver Eon Mode anywhere you want to run it, on-prem or in the cloud, or both, because one of the great value propositions of Vertica is the hybrid capability, the ability to run in both your on-prem environment and in the cloud. What's next? I've got three priority and roadmap slides. This is the first of the three. We're going to start with improvements to the core of Vertica, starting with query crunching, which allows you to run long-running queries faster by getting nodes to collaborate; you'll see that coming very soon. We'll be making improvements to large clusters, and specifically large cluster mode. The management of large clusters, over 60 nodes, can be tedious. We intend to improve that, in part by creating a third network channel to offload some of the communication that we're now loading onto Spread, our agreement protocol. We'll be improving depot efficiency. We'll be pushing down more controls to the subcluster level, allowing you to control your resource pools at the subcluster level, and we'll be pairing tuple moving with data loading. From an operational flexibility perspective, we want to make it very easy to shut down and revive primaries and secondaries, on-prem and in the cloud. Right now, it's a little bit tedious, though very doable; we want to make it as easy as a walk in the park. We also want to allow you to be able to revive into a different size subcluster, and last but not least, in fact probably the most important, the ability to change shard count. This has been a sticking point for a lot of people, and it puts a lot of pressure on the early decision of how many shards my database should have. Whether it lands in 2020 or 2021, we know it's important to you, so it's important to us. Ease of use is also important to us, and we're making big investments in the Management Console to improve managing subclusters, as well as to help you manage your load balancer groups. We also intend to grow and extend Eon Mode to new environments. Now we'll take questions and answers.

Published Date : Mar 30 2020


Day Zero Analysis | Cisco Live EU Barcelona 2020


 

>> Announcer: Live from Barcelona, Spain, it's theCUBE, covering Cisco Live 2020. Brought to you by Cisco and its ecosystem partners. >> Hello everyone, welcome to theCUBE, here live in Barcelona, Spain for Cisco Live 2020. This is our first CUBE event for the year. Looking at the next 10 years of CUBE history, we look back at the 10 years we've been around, and we have another 10 more we're looking forward to. And this is the first event for 2020, Cisco Live at Barcelona. I'm John Furrier, your host, with Dave Vellante and Stuart Miniman, extracting the signal from the noise. The cloud business is noisy; the networking business is under siege and changing. Dave and Stu, we're pregaming Cisco Live, kicking off the show at the end of the first kind of pre-day. Tomorrow's the big keynotes. David Goeckeler, the Cisco exec, is preparing to announce, rumor has it, some insights into what Cisco's position will be vis-a-vis cloudification, that's going to change their portfolio and probably identify some opportunities, and also some potential gaps in their strategy and what they can do to be competitive. They're the number one leader in networking, and they've got a great market position. But cloud is changing the game with networking. >> Yeah, John, it's funny, I heard you talking about the 10 years and everything. 10 years ago, if I thought about Cisco, I'd be looking at things like getting the jitter out of the network and trying to tweak everything. And today, what are we talking about with Cisco? We're talking about software, we're talking about cloud, we're talking about developers. Yeah, they're a networking company at its core, but Cisco has been going through a significant transformation, and it's been an interesting one to watch. Dave, you wrote a little bit about how Cisco is one of the four horsemen of the internet era. Of course, they were one of the ones that actually survived and thrived after the dotcom bust, but Cisco is a very different company today, far from the $500 billion market cap that they had at their peak; they're at about $200 billion, but still dominant in switching and routing. But there are threats from a number of environments and a lot of changes as to what you need to think about when it comes to Cisco. >> Well, sometimes it's instructive to look back and see how we got here. Cisco made three big bets during its ascendancy. The first one was it bet on IP; I mean, John, you've talked about this a lot, it decimated the minicomputer industry by connecting distributed computing and client-server, the underlying plumbing there. The second big bet it made was it trained a bunch of engineers, the Cisco certified engineers, CCIEs, and they used that as a lever and created a whole army of people that were Cisco advocates, and that was just a brilliant move. And the third was, under the leadership of John Chambers, they did about 180 acquisitions, and they were quite good at acquisitions, and what that did for them is it continued to fuel growth, it filled in gaps and it kept them relevant with customers. Now, part of that, too, was Chambers had dozens and dozens of adjacent businesses; remember, he said they were all going to be a billion dollars. Well, most of them didn't pan out. So they had to cut and burn, but now, under the leadership of Robbins, they're a much more focused company, kind of getting back to basics, trying to bet on sure things, and so let's talk about what some of those sure things are and how Cisco's performing.
Well, it's clear, you said lever; they've got to pull a lever at some point and turn the boat that is Cisco, aircraft carrier, whatever you want to call it, in the right direction. That's been something that we've been covering, Cisco, for decades, Stu, as you just pointed out, and while we've been close to all the action, I think Cisco knows what's going on. It's clear to me that they kind of understand the landscape. They understand their opportunities in the future, but they're a massive business, Dave, you pointed it out, the combination of all those mergers. The thing that got my attention was that, just as they understood unification many, many years ago on the compute side, you can see Cisco clearly understands unification now. They know cloud is here; they know that if they do not make a move that's cloud friendly, they're going to get swept away and be adrift with the next wave, which is cloud 3.0, whatever we call it. So to me, that's the big story with Cisco. What is the impact on the company when you cloudify the business? That's not public cloud, that's hybrid public; the economics are changing, the compute capabilities are changing, the network capabilities are changing, you've got the edge. I think Cisco will be defined by their actions over the next two to three years: what they announce, how they position it, and what value they bring to the customers. Because you've got the Silicon One chip, good move; you've got the cloud position; you've got AppD on the top of the stack; you've got CloudCenter; they're trying to get to the cloud, but you can't do that until you have the subscription business, and you can't do pricing by usage unless you have that model. So I think it's brick by brick, but slowly they're doing it. We have to hear some things from Cisco on how they're going to be truly cloud enabled. >> Well, software is a huge play for them, right? I mean, they've got to, because Cisco's been the dominant player in networking with two thirds of the market, I mean, they've sustained that for a decade plus, and it has allowed them to drive 60-plus percent gross margins for years and years and years, huge operating margin. So how are they going to continue that? Software is the key. And as you say, John, subscriptions is the cloud model that is critical for Cisco. Now, they talk about 70% of their software business being subscriptions and annual recurring revenue. It's unclear really how big their software business is; they give hints. I'd peg it at about seven to eight billion last year, maybe growing to 10, 12 billion this year. So pretty sizable, but that's critical in terms of them driving the margins that they need to throw off free cash flow so they can invest in things like stock buybacks and dividends, which prop up the stock. >> Well, the problem is if you start chasing your tail on the stock price and/or product TAMs and product revenue, you might actually miss the boat on the new product. So it's a balance between cannibalizing your own before you can bring in the new, and this is going to be the challenge with Cisco: when do they bite the bullet and say, "Okay, we've got to get a position on this piece here or that piece there." Ultimately, it's going to be about customers. And what do we know? Public cloud succeeded, hybrid cloud is a reality, and people are executing specific technologies to get to an operating model that's cloud. And to me, the big wave for Cisco, in my opinion, is multicloud, because that's not a technology. That's a value proposition, it's not so much a technology.
>> Yeah, Dave, you mentioned a lot of the acquisitions that Cisco has done. In many ways, though, some of the areas where Cisco can be defined are the acquisitions that they didn't do. Cisco did not buy VMware, and was behind in the virtualization wave. And then they created UCS, and that actually was a great tailwind for them; it created their data center business. They did not end up buying Nicira, and yet Nicira's done very well. But if you talk to most customers, well, even if you're deploying NSX, whose hardware do you tend to have? Well, yes, sure, it might be Arista, might be somebody else's, but Cisco is still doing well. So there hasn't been a silver bullet to kill Cisco's dominance, but how are they going to fare as cloudification takes hold? The data center group has gone through a lot of challenges. They fumbled along with OpenStack, like many other companies did. Just as VMware really failed with vCloud Air, the cloud group inside of Cisco had this large cloud offering that, for a couple of years, had everybody looking and asking, I don't know, are you enabling service providers? What are you doing? Now they have management pieces, they're partnering with Google, Amazon, Azure, across the environments, and they are heavily involved in Kubernetes and the service meshes. So it remains to be seen where Cisco will find that next TAM expansion to kind of take them to the next wave. >> But Stu, acquisitions are a good piece of it. I think they've got to do some M&A, clearly, as well as organic development, but the question is, would Nicira have been successful at Cisco versus VMware? Look at the timing of that. I think buying VMware would have been a home run. But Nicira, I don't think that succeeds at Cisco. I think that would be a bunch of knife fights internally. And Nicira would have shifted, because what it was then and what it is now at VMware are two different things; VMware took it and shaped it in a way that I don't think Cisco could have done at that time. >> The success would have been a defensive move to keep VMware out. That would have been the nature of the success, but I think you're right, the infighting would have been brutal, but VMware wouldn't have Nicira. >> VMware, what they did when they bought Nicira is they spent the first three or four years just making it an extension of VMware. Now it's starting to become their multicloud interconnect. And that's where we need to see Cisco be involved. Cisco has bought many companies that have promised to be multicloud management or that interconnecting fabric, and they have not yet panned out. >> Well, security is the linchpin though here; they've made a bunch of acquisitions in security. And I've always said that they've got a position: their networking is the most cost effective, the highest performance and the most secure way to connect multiple clouds to hybrid on-prem. And they're in a good position to do that. >> Well, I think, and I've always said this from day one, you guys know I'm harping on this, Stu and I, we high-five each other all the time when we say this, but back in the days in IT, the heyday, if you were a network operator, network designer, network architect, you were the king, king or queen. So you had the keys to the kingdom. VMware is a legitimate threat to Cisco. They compete, and they talk about that all the time. But the question is, which community has the keys to the kingdom? Rhetorical question. >> Yeah, well John, one point I made earlier, (John laughing) >> Okay, go ahead. >> I remember Pat Gelsinger got on stage and he's like, "Hey, here's the largest collection of network admins," and everybody's looking at him like, what are you talking about, Pat? When I talk to customers that are deploying NSX, it is mostly not the network team, it is the virtualization team, and they're still often fighting with the network team. But to your point, where I've seen some of the really smart network architects, and people building stuff, Amazon, Azure, Google all have phenomenal people, and they're building environments, and Cisco needs to make sure they partner and are embedded there. >> Dave mentioned the leverage. Cisco's got to pull that lever, or turn the boat around in one swift move now, or otherwise they'll lose that leverage. They have more power than they think, in my opinion, and they probably do know, but they have the network. And I think the network guys trump the operating guys, because you can always swap in operating staff, but you've got the network, and the network runs the business. No one could swap out Cisco boxes for a SynOptics years ago, or Bay, whatever it turned into, so they have that nested position. If they lose that, they're done. >> Yeah, and I agree with you, John. There's a lot of, Stu, you pointed this out, people buy NSX and Cisco ACI, but my question is, okay, how long will that redundancy last? I think, to your rhetorical question, Cisco is sitting in the catbird seat; they know networking, they're investing in it. I don't think they're going to lose sight of that. Yeah, Arista is coming at them, and Juniper, but Cisco, they know how to manage that business and maintain its leadership. I guess my question is, have they lost that acquisition formula? Are they as good at acquisitions as they used to be? >> I think their old model is flawed for the modern era. I think acquisitions have got to come in and integrate, and I think VMware has proven that they can do acquisitions right. I think that comes from the EMC kind of concept, where it's got to fit in beautifully and have synergies right away. I think what Pat Gelsinger is doing, I think he's smart, and I think that's why VMware is so successful. They've got great technical talent, they know the right waves to be on and they execute. So I think Cisco has got to get out of these siloed acquisitions, this business unit mindset, and have things come in, if they work, in line with the strategy and the execution. It has to, from day one. You've got to fit in perfectly. >> The portfolio is still pretty complicated. You've got the core networking. You've got things like WebEx, right? I mean, would you want to be going up against Microsoft Teams? But they're in it, Cisco's in it to win it, and they've got to talk about-- >> Don't count out Zoom. >> Talking about, no, Zoom's right there too in the mix. And so Cisco's got some work to do; expect some enhancements coming there. In HCI, they've got to walk a fine line, Stu, you made this point. On the one hand, they've got IBM and NetApp with UCS and converged infrastructure, but then they buy Springpath, which is designed to replace converged infrastructure. So they've got to walk that fine line. >> All right, what are you guys going to hear this week?
Let's just wrap this up by going down the line on thoughts and predictions as the keynote kicks off tomorrow. I took some notes; I was going around the floor trying to get inside people's heads and ask them probing questions. And here's what I got out of it. I think Cisco is going to recognize cloud and absolutely throw the holy water on the fact that it's part of their strategy. I think we'll hear a little bit about Silicon One and how it relates to the portfolio, but I think the big story will be tying the application environment together with networking, not end to end but really as one seamless solution for customers. I think it's going to be a top story that's been teased out by some of the booths that I saw, connecting things as one holistic thing, with an application development focus, with DevOps. >> Yeah, so John, ACI was application centric infrastructure. And the criticism back in the day was, well, the application owner really doesn't have much connection there. If you look at what Cisco has been doing the last few years, it is tying together more of that application owner, the DevNet group, we're sitting here in the DevNet zone, that connection with the developer, and enabling them as part of the business absolutely is a wave that Cisco needs to drive. I don't think we're going to see a ton of the Silicon One, 5G, and that kind of stuff, if for no other reason than in about a month they're going to be sitting here with 100,000 people for Mobile World Congress, and that's where they keep their dry powder to make sure that they push that piece of it. But that is super important, so, yeah. >> I think software and security. I mean, as you were talking about, Zoom, Teams, so they better focus on collaboration and I want to hear some stuff there; security, IoT, and the edge, they've got a very strong position there. Cisco's security business grew 22% last quarter, it's really doing well, so I want to hear more about that. And then data center, what they're doing in the data center, what they're doing with their switching business, their HCI stuff and converged infrastructure, hyperconverged. Those are three important areas that we'll hear about this week. >> And Dave, I'll emphasize what you were saying. Edge, edge, edge, absolutely. If Cisco is going to remain a dominant player in the network, they need to deliver on that edge. And I've heard a couple of messaging strategies in the past, there was fog computing and all this other stuff, but I think Cisco is in a position today, between Meraki that they have, between their core product, >> Dave: DevNet. >> To really be able to enable-- >> And those are really-- >> Well, I want to see more progress. I'm looking forward to it, I'm going to drill them in the interviews we do here. They spent millions, billions of dollars, satisfying and creating a subscription model with the cloud. We're going to dig into it, we're going to extract the signal from the noise. theCUBE coverage here in Barcelona, Spain. First show of 2020, Cisco Live 2020, I'm John Furrier, with Stuart Miniman and Dave Vellante. We'll be right back. (upbeat music)

Published Date : Jan 27 2020


Tim Vincent & Steve Roberts, IBM | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone to day two of theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host James Kobielus. We have two guests on this panel today, we have Tim Vincent, he is the VP of Cognitive Systems Software at IBM, and Steve Roberts, who is the Offering Manager for Big Data on IBM Power Systems. Thanks so much for coming on theCUBE. >> Oh thank you very much. >> Thanks for having us. >> So we're now in this new era, this Cognitive Systems era. Can you set the scene for our viewers, and tell our viewers a little bit about what you do and why it's so important? >> Okay, I'll give a bit of a background first, because James knows me from my previous role, and you know I spent a lot of time in the data and analytics space. I was the CTO for Bob, running the analytics group up 'til about a year and a half ago, and we spent a lot of time looking at what we needed to do from a data perspective and an AI perspective. And when Bob Picciano, who's my current boss, moved over to Cognitive Systems, Bob asked me to move over and really start helping build out more of a software focus, more of an AI focus, and a workload focus in how we think of the Power brand. So we spent a lot of time on that. So when you talk about cognitive systems or AI, what we're really trying to do is think about how you actually couple a combination of software, so co-optimize the software space and the hardware space, specific to what's needed for AI systems. Because the act of processing, the data processing, the algorithmic processing for AI is very, very different than what you would have for a traditional data workload. So we're spending a lot of time thinking about how you actually co-optimize those systems so you can actually build a system that's really optimized for the demands of AI. >> And is this driven by customers, is this driven by just a trend that IBM is seeing? I mean how are you, >> It's a combination of both. >> So a lot of this is, you know, there's a lot of thought put into this before I joined the team. So there was a lot of good thinking from the Power brand, but it was really foresight on things like Moore's Law coming to the end of its lifecycle, right, and the ramifications of that. And at the same time, as you start getting into things like neural nets and the floating point operations that you need to drive a neural net, it was clear that we were hitting the boundaries. And then there's new technologies such as what Nvidia produces with their GPUs, that are clearly advantageous. So there's a lot of trends that were comin' together that the technical team saw, and at the same time we were seeing customers struggling with specific things. You know, how to actually build a model if the training time is going to be weeks and months, let alone hours. And one of the scenarios I like to think about, I'm probably showing my age a bit, but I went to a school called University of Waterloo, and when I went to school, in my early years, they had a batch-based system for compilation and systems runs. You sit in the lab at night and you submit a compile job and the compile job will say, okay, it's going to take three hours to compile the application, and you think of the productivity hit that has on you.
And now you start thinking about, okay, you've got this new skill in data scientists, which is really, really hard to find, they're very, very valuable. And you're giving them systems that take hours and weeks to do what they need to do. And you know, so they're trying to drive these models and get a high degree of accuracy in their predictions, and they just can't do it. So there's foresight on the technology side and there's clear demand on the customer side as well. >> Before the cameras were rolling you were talking about how the term data scientists and app developers is used interchangeably, and that's just wrong. >> And actually, let's hear it, 'cause I'm in the position that I agree with it. I think it's the right framework. Data science is a team sport, but application development is an even larger team sport in which data scientists and data engineers play a role. So, yeah, we want to hear your ideas on the broader application development ecosystem, and where data scientists and data engineers sort of fall into that broader spectrum. And then how IBM is supporting that entire new paradigm of application development, with your solution portfolio including, you know, Power and AI on Power? >> So I think you used the words collaboration and team sport, and data science is a collaborative team sport. But you're 100% correct, there's also a piece, and I think it's missing to a great degree today, and it's probably limiting the actual value of AI in the industry, and that's how the data scientists and the application developers interact with each other. Because if you think about it, one of the models I like to think about is a consumer-producer model. Who consumes things and who produces things? And basically the data scientists are producing a specific thing, which is, you know, simply an AI model, >> Machine models, deep-learning models. >> Machine learning and deep learning, and the application developers are consuming those things and then producing something else, which is the application logic that is driving your business processes, in this view. So they've got to work together. But there's a lot of confusion about who does what. You know, you see people who expect data scientists to build application logic, and, you know, data scientists who can do that exist, but it's not where the value is, not the value they bring to the equation. And application developers developing AI models, you know, they exist, but it's not the most prevalent form factor. >> But you know it's kind of unbalanced, Tim, in the industry discussion of these role definitions. Quite often the traditional, you know, definition, our sculpting of the data scientist role, is that they know statistical modeling, plus data management, plus coding, right? But you never hear the opposite, that coders somehow need to understand how to build statistical models and so forth. Do you think that the coders of the future will at least on some level need to be conversant with the practices of building, tuning, or training machine learning models, or no? >> I think it will absolutely happen. And I will actually take it a step further, because again the data scientist skill is hard for a lot of people to find. >> Yeah. >> And as such it is a very valuable skill.
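Before getting to the tooling Tim describes next, the hours-to-weeks training-time point above is easy to make concrete on whatever hardware is at hand. The sketch below is a generic, illustrative timing harness, not IBM's PowerAI stack: it runs one small training epoch on the CPU and, when a GPU is present, repeats it there. The model, data sizes, and batch size are synthetic placeholders chosen only for the example, and PyTorch is assumed to be installed.

```python
# Minimal, illustrative sketch (not IBM's PowerAI stack): time one training
# epoch on CPU and, when available, on a GPU, using synthetic data.
import time
import torch
import torch.nn as nn

def train_one_epoch(device: torch.device) -> float:
    torch.manual_seed(0)
    x = torch.randn(20_000, 512, device=device)        # synthetic features
    y = torch.randint(0, 10, (20_000,), device=device)  # synthetic labels
    model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(),
                          nn.Linear(1024, 10)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    start = time.perf_counter()
    for i in range(0, x.size(0), 256):                  # mini-batch loop
        opt.zero_grad()
        loss = loss_fn(model(x[i:i + 256]), y[i:i + 256])
        loss.backward()
        opt.step()
    if device.type == "cuda":
        torch.cuda.synchronize()                        # wait for queued GPU work
    return time.perf_counter() - start

print(f"cpu epoch: {train_one_epoch(torch.device('cpu')):.2f}s")
if torch.cuda.is_available():
    print(f"gpu epoch: {train_one_epoch(torch.device('cuda')):.2f}s")
```

Scaled up from this toy to real models and data, the same loop is what stretches from minutes into the weeks Tim mentions, which is why the interconnect and accelerator discussion below matters.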
And what we're seeing, and one of the offerings that we're putting out, is something called PowerAI Vision, and it takes it up another level above the application developer, which is, how do you actually unlock the capabilities of AI for the business persona, the subject matter expert? So in the case of vision, how do you actually allow somebody to build a model without really knowing what a deep learning algorithm is, what kind of neural nets to use, how to do data preparation? So we built a tool set which is, you know, effectively an SME tool set, which allows you to automatically label; it actually allows you to tag and label images, and then as you're tagging and labeling images it learns from that and actually helps automate the labeling of the images. >> Is this distinct from Data Science Experience on the one hand, which is geared towards the data scientists, and I think Watson Analytics among your tools, which is geared towards the SME? Is this a third tool, or an overlap? >> Yeah, this is a third tool, which is really again one of the co-optimized capabilities that I talked about; it's a tool that we built out that really leverages the combination of what we do in Power, the interconnect we have with the GPUs, which is the NVLink interconnect, which gives us basically a 10X improvement in bandwidth between the CPU and GPU. That allows you to actually train your models much more quickly, so we're seeing about a 4X improvement over competitive technologies that are also using GPUs. And if we're looking at machine learning algorithms, we've recently come out with some technology we call Snap ML, which allows you to push machine learning, >> Snap ML, >> Yeah, it allows you to push machine learning algorithms down into the GPUs, and with this we're seeing about a 40 to 50X improvement over traditional processing. So it's coupling all these capabilities, but really allowing a business persona to do something specific, which is to build out AI models to do recognition on either images or videos. >> Is there a pre-existing library of models in the solution that they can tap into? >> Basically it has a, >> Are they pre-trained? >> No, they're not pre-trained models, that's one of the differences in it. It actually has a set of models that it picks from for you, and actually, so, >> Oh yes, okay. >> So this is why it helps the business persona, because it's helping them with labeling the data. It's also helping select the best model. It's doing things under the covers to optimize things like hyperparameter tuning, but you know the end user doesn't have to know about all these things, right? So you're tryin' to lift, and it comes back to your point on application developers, it allows you to lift the barrier for people to do these tasks.
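To give a feel for how low that barrier can go, here is a generic transfer-learning sketch in plain Keras. It is not PowerAI Vision's actual interface, and the "labeled_images/" directory is a placeholder (one subfolder per class); the point is simply how little deep-learning code a labeled image folder needs once a pretrained backbone does the heavy lifting.

```python
# Generic transfer-learning sketch (not PowerAI Vision's actual interface).
# "labeled_images/" is a placeholder folder with one subfolder per class.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "labeled_images/", image_size=(224, 224), batch_size=32)
num_classes = len(train_ds.class_names)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                                   # reuse pretrained features

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)                            # quick pass to gauge accuracy
```

A tool aimed at the subject matter expert hides even this much code behind labeling, model selection, and hyperparameter tuning, which is the gap the conversation above is describing.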
>> Even for professional data scientists, there may be a vast library of models, and they don't necessarily know which is the best fit for the particular task. Ideally, the infrastructure should recommend and choose, under various circumstances, the models, the algorithms, the libraries, whatever, for you for the task, great. >> One extra feature of PowerAI Enterprise is that it does include a way to do a quick visual inspection of a model's accuracy with a small data sample before you invest in scaling over a cluster or large data set. So you can get a visual indicator as to whether the model is moving towards accuracy or you need to go and test an alternate model. >> So it's like a dashboard of, like, Gini coefficients and all that stuff, okay. >> Exactly, it gives you a snapshot view. And the other thing I was going to mention, you guys talked about application development and data scientists, and of course a big message here at the conference is, you know, data science meets big data, and the work that Hortonworks is doing involving the notion of container support in YARN, GPU awareness in YARN, bringing Data Science Experience, in which you can include the PowerAI capability that Tim was talking about, as a workload tightly coupled with Hadoop. And this is where our Power servers are really built, not just for a monolithic building block that always has the same ratio of compute and storage, but fit-for-purpose servers that can address either GPU-optimized workloads, providing the bandwidth enhancements that Tim talked about with the GPU, but also day-to-day servers that can now support two terabytes of memory, double the overall memory bandwidth on the box, 44 cores that can support up to 176 threads for parallelization of Spark workloads, SQL workloads, distributed data science workloads. So it's really about choosing the combination of servers that can really mix this evolving workload need, 'cause Hadoop isn't now just MapReduce, it's a multitude of workloads that you need to be able to mix and match, and bring various capabilities to the table for compute, and that's where Power8, and now Power9, has really been built for these kinds of combination workloads, where you can add acceleration where it makes sense, add big data, smaller core, smaller memory, where it makes sense, pick and choose. >> So Steve, at this show, at DataWorks 2018 here in San Jose, the prime announcement, the partnership announcement between IBM and Hortonworks, was IHAH, which I believe is IBM Host Analytics on Hortonworks. What I want to know is, that solution runs on top of HDP 3.0 and so forth; is there any tie-in from an offering management standpoint between that and PowerAI, so you can build models in the PowerAI environment and then deploy them out in conjunction with IHAH? Going forward, I mean, I just wanted to get a sense of whether those kinds of integrations are there. >> Well, the same data science capability, Data Science Experience, whether you choose to run it in the public cloud or run it in a private cloud model on prem, it's the same data science package. You know, PowerAI has a set of optimized deep-learning libraries that can provide advantage on Power, applied when you choose to run those deployments on our Power systems, all right, so we can provide additional value in terms of these optimized libraries, these memory bandwidth improvements. So really it depends upon the customer requirements and whether a Power foundation would make sense in some of those deployment models. I mean, for us here with Power9, we've recently announced a whole series of Linux Power9 servers. That's our latest family, including, as I mentioned, storage dense servers, the one we're showcasing on the floor here today, along with GPU-rich servers. We're releasing fresh reference architectures. It's really to support combinations of clustered models that are, as I mentioned, fit for purpose for the workload, to bring data science and big data together in the right combination. And we're working towards cloud models as well that can support mixing Power in ICP with big data solutions.
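To make the parallelization point above concrete, here is a back-of-the-envelope executor-sizing calculation for a single node of the size Steve describes (176 hardware threads, two terabytes of memory). The reservation numbers and the five-cores-per-executor rule of thumb are assumptions for illustration, not an IBM or Hortonworks sizing guide; real clusters also set aside headroom for the OS, file system daemons, and YARN overhead.

```python
# Back-of-the-envelope Spark executor sizing for one large SMT node
# (176 hardware threads, 2 TB RAM, per the conversation above).
# Illustrative arithmetic only; reservations below are assumptions.
hw_threads = 176
node_ram_gb = 2048
reserved_threads = 16          # assumed: keep some threads for OS/daemons
reserved_ram_gb = 128          # assumed: OS + daemons + YARN overhead

cores_per_executor = 5         # common rule of thumb for Spark executors
executors = (hw_threads - reserved_threads) // cores_per_executor
mem_per_executor_gb = (node_ram_gb - reserved_ram_gb) // executors

print(f"executors per node:       {executors}")
print(f"cores per executor:       {cores_per_executor}")
print(f"memory per executor (GB): {mem_per_executor_gb}")
# These feed the standard spark-submit knobs, e.g.:
#   --num-executors <executors> --executor-cores 5 --executor-memory <N>g
```

With these assumptions a single node carries roughly 32 executors of about 60 GB each, which is the sense in which one fat node can absorb work that would otherwise force a wider scale-out cluster.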
>> And before we wrap, we just want to wrap up. I think, in the reference architecture you describe, I'm excited about the fact that you've commercialized distributed deep learning, for the growing number of instances where you're going to build containerized AI and distribute pieces of it across this multi-cloud; you need the underlying middleware fabric to allow all those pieces to play together into some larger applications. So I've been following DDL, because your research lab has been posting information about that for quite a while. So I'm excited that you guys have finally commercialized it. I think IBM does a really good job of commercializing what comes out of the lab, like with Watson. >> Great, well, a good note to end on. Thanks so much for joining us. >> Oh, thank you. >> Thank you for the, >> Thank you. >> We will have more from theCUBE's live coverage of DataWorks coming up just after this. (bright electronic music)

Published Date : Jun 20 2018


Analysis of Cisco | DevNet Create 2018


 

Live from the Computer History Museum in Mountain View, California, it's theCUBE, covering DevNet Create 2018, brought to you by Cisco. Hey, welcome back everyone, we're live here at DevNet Create, Cisco's event here at this museum in Mountain View, California, the heart of Silicon Valley. I'm here with Lauren Cooney. We're here for two days of wall-to-wall coverage, breaking down Cisco's move into the DevOps and developer world, separate from the DevNet community, which is the Cisco developer program, and we've been breaking it down. Lauren, great to have you these past two days. So we talked to a lot of the Cisco folks, a lot of the practitioners. Let's analyze it, let's discuss kind of what's going on. First of all, Cisco created a new group, almost a year ago next month, called DevNet Create, to get out of the Cisco bubble and go out into the cloud world and see if they can't connect the cloud ecosystem, cloud native, Kubernetes, all the microservices goodness going on on the application side, infrastructure as code, and bring that together with the Cisco network engineering community, who are plumbers, network plumbers; they're network engineers, they deal with provisioning gear and routes. Well, I think it's interesting, because you have this CCIE number that has been decreasing over the past couple of years, and that's not because the network is less important, it's actually because new skill sets are emerging and folks need to take on these new skills to learn and to really flourish in their careers. So I think what DevNet is doing is just tremendous in terms of enabling developers to move up the stack, to look at things like Kubernetes, to look at things like, you know, cloud native, to look at new applications you can build, new things that you can extend to, API integration into, you know, new types of applications. You know, we had folks here that were learning to code in Python for the first time, and I think that's awesome, I think that's great. And the timing is perfect. I mean, I've got to give credit to Susie Wee and her team at Cisco, they're doing all the right things. I think the way they're handling this is, they're not overly aggressive, they're not arrogant, they're humble, they're learning, they're listening, and they're doing all the right things, bringing a lot from the Cisco table to this community, and they've got, you know, this is very cool. But the timing is critical. If they had tried to do this four years ago, how hard would it have been? You know, you've been there. Could they have pulled this off four years ago? I think the goal was always there four years ago, but the timing was, you know, you have to kind of put the mission in order and get things up and running first. You can't just, you don't launch a community, you build one, and I think, you know, Cisco really needed to build that core community first, and that was super critical. But even four years ago, let's just go back and rewind the clock, where was cloud then? It was still the purist DevOps culture, it was certainly hard-charging, definitely flying, but still, even a lot of the on-premise enterprise folks were still kind of poo-pooing the cloud. You even saw it four years ago, Oracle just made their move a couple years ago to the cloud and they're still trying to catch up. So, you know, these legacy vendors, and Cisco is one, they've pivoted nicely into this, because now the timing's there, as with Kubernetes there's enough code to get glued in, plugged in with the stack. So I think timing has also been a tailwind. Timing was critical.
I mean, back then we were talking about software-defined networking and, you know, new services that you could deliver to the cloud in new ways, and then DevOps came in as, like, really the glory child, right, saying DevOps was going to solve world hunger, and what it came down to, basically, is, you know, it is a critical part, but there are certain piece parts that needed to come together, especially in the open source world, to make these things happen. I mean, to me, if I had to, like, point it out, I'm just riffing here, but to me the seminal moment for cloud, and, you know, agile was happening, that's a key driver, but it was the fact that you had horizontally scalable tech, unstructured data, the role of databases, software becoming this new lightweight glue layer, control planes moving up and down the stack. So there wasn't one thing; a combination of these awesome things were happening that made people go, whoa, we could do more if we think about scale differently, skills differently. And really, how do you bring this, you know, and this is where you get to edge computing, it's how do you actually bring these to the masses, how do you go where the people are, how do you store data where people are, how do you extend security in new ways? I mean, that's going to be super critical. I think the other thing that's also pretty evident is that when you start having new entrants into a market start eating some of your breakfast, then they start eating some of your lunch, then you go, wait a minute, if I don't do something, my dinner is going to be eaten. I mean, you're starting to see people see their business at risk. Yeah, this is a huge thing that lights up the CXOs, the CEO, the COO, the CDO, the CIO; now it's like, okay, we've got to make a move. Definitely, I think that's the way that it has to be. And in terms of Cisco, I want to get your thoughts, because I've always been talking about this and I'm a big Cisco fan. I know a lot of people who work there, and I've been a big admirer of the company from day one and what they did in the internet generation. They did buy a lot of companies, which created a little bit of a mishmash, but that's not an issue; they really ran the networks, what a great culture. However, we're now seeing applications driving a lot of value, and the network needs to be programmable, and the challenge at Cisco has always been, how do we move up the stack as a company? And all the little scuttlebutt and conversations at parties, and the hallway conversations with Cisco executives and employees, that's been the internal debate: how does Cisco, should Cisco, move up the stack, and if so, how? So it's been kind of this internal thing. Good timing now to start moving up the stack, because the automation's here. I think it was great timing four years ago to move up the stack, to be honest. I think that there were efforts then, I know that I was engaged in some, to do that rather quickly. You know, those turned into things that went one way or the other. I think that there are the right people in the right places at Cisco now to make that actually happen. I think, you know, we're a little early on that. I think Susie Wee is just tremendous in terms of driving the users up the stack to have them learn these new skills, and as they learn these new skills, they're learning it on Cisco, and that's going to be really critical, that's going to have the pull power. Yeah, I think this has got a chance, a real great chance, and it's not too far-reaching of an accomplishment for them to do this, because they can now actually build
a developer program now, because before they didn't have enough software. But with what Susie's doing, if I'm Chuck Roberts, CEO of Cisco, I'm doubling down on what's going on with DevNet and DevNet Create, and I can take that DevNet component and almost kind of expand it out, because Cisco has a developer opportunity. You look at what they're doing on the collaborative software side, the stuff with video; they have a total core competence in video. I mean, they were early on so many things, but now with WebEx they're still a force in video conferencing, and beyond that, IoT is a video application as well, huge opportunity in these communities that pop up, and right now you don't have a product if you don't have a supporting community. So absolutely they should be doubling down on this, they need to double down on that, they probably need to invest more in it than they are now. I see it as absolutely critical as they move forward, because, you know, Cisco wants to be one or two in the market for all their products, all their solutions, and to have that, they need to have the supporting community. Yeah, we did two days here, and, you know, in terms of events, it's not the big glam event, it's really early stage, only the second event, and it hasn't even been 12 months since the first DevNet Create. What I'm impressed by, what I love about theCUBE, is when you get at these early moments, when you see the magic happening, you get into the communities and you realize, wow, this is a team that could pull it off. And I think, at Cisco Live in Barcelona, you know, it really became apparent to me that Cisco's really pulling in the right direction on a couple of things. I feel that the big-company thing, they've got to kind of clean that up a bit, just make it more nimble, but they've got their eye on the prize on video, they could really crush the IoT opportunity, and the leverage of the network is a huge asset, and if they could make that programmable with an open source community behind it, man, this could be a whole nother Cisco, almost bring back the glory days. I fully agree. Lauren, what are you up to these days? I mean, you've got a new gig. I do, tell us about your new company and what you're working on. You guys write code, you do advisory, you do consulting, all that stuff. I mean, you know, I like my hands in lots of things, so I think it's important to say that, you know, I've taken my experience at IBM and Microsoft and Juniper and Cisco, driving new innovation to market faster and new revenue channels, and I've taken that and I've started a consulting firm called Spark Labs, and what we do is we use new models like minimum viable product and business model canvas to actually drive, you know, whether it's product, whether it's service, whether it's these, you know, new channels, whether it's partner, or whether you're just trying to kind of pull together your team in a new way, we actually take this and help you do it in a faster way, and, you know, we've got the models, we've got the background, and we're working with companies that are big and small. What kind of engagements are you working on, what kind of problems are you solving? So, you know, we have a larger company that we're working with, and one of the things that they ran into is they had just changed around kind of their leadership, and we've gone in and worked with their leadership team to kind of establish what this new team needs to look like, what they are going to deliver on, what the metrics are, what, you know, the kind of success markers are that people are really trying to
achieve, and how do we empower this new team that has this new leader, and, you know, how do we make sure that everyone's aligned? I think that's part of it, and the alignment is critical. You know, it's great to have an amazing vision, but if you don't have the execution, you're just not going to get there. Yeah, Andy Jassy, one of my favorite execs that I've interviewed, he's pragmatic, he's strong, went to Harvard, don't hold it against him, but a great, super great guy, and he's got a great philosophy, I think it comes from the Amazon culture: you argue all day long, but once a decision's made, you align behind it. Yeah, so bring some constructive discourse to the table. Yep, but once it's done, they don't tolerate any, you know, dysfunctional, passive-aggressive behavior, okay, so say your piece. That's exactly it. I mean, we pull people together for a day or a day and a half and actually run them through the business model canvas, which will align, like, what their goals are, what their mission is, how they're, you know, being seen in the market, and lots of other things, but the real goal there is to pull the team together on, you know, what exactly those things are and the value that their organization has, because if you can't deliver on that message, you can't deliver on much more. So you do need that alignment, and teams are so all over the place; often when you're running fast you kind of forget, and so sometimes they need to be reminded. What's your take on DevNet Create this year, what are your thoughts? I think it's great. I mean, I love the fact that, you know, there's folks from so many different backgrounds and so many different, you know, kind of technical areas here. I love, you know, that Meraki's giving away 1.2 million dollars of equipment and software licenses, I think that's phenomenal. I'm impressed by the Cisco folks here. I don't want to go too overboard and give them too many compliments, because, you know, they'll get cocky, no, but seriously, the Cisco people that are here are kicking ass, they're doing a great job. They've got the microphones on, they're doing the demos, they're doing a lot, they are jazzed, and they're not mailing it in either, doing a great job, and I think that's authentic, genuine. I think that's going to be a great, you know, seed in the community to grow that up. Again, they've still got a lot of work to do, but I don't think it's too far of a bridge for network guys to be cloud guys and to kind of find some middle ground. So I think the timing's perfect, I'm super impressed with the team, and I think this is a great path for Cisco to double down on and really invest more in, because it's definitely got legs. And I'm a big fan of the camp thing too, we talked about Camp Create, where they had competitive teams hacking and spending two days on it, so, you know, love it, love the culture, but again, it's early, let's see where they go with it. I mean, if they can get the network ops going on this DevOps-for-networks concept, yeah, and bring it up and make it programmable, you couldn't ask for a better time, with Kubernetes and all the coolness going on with microservices. Good time, definitely a great time. Well, great to host with you, and we're here live at DevNet Create, wrapping up two days of wall-to-wall coverage from theCUBE. DevNet Create, again, this is the cloud ecosystem for Cisco, separate from the core DevNet, which is the Cisco developer program for all of Cisco, a great opportunity for them. Of course, theCUBE is here covering it. We're gonna wrap
this up, and thanks for watching theCUBE coverage here in the Computer History Museum in Mountain View, California. Thanks for watching.

Published Date : Apr 11 2018


Steve Roberts, IBM | DataWorks Summit Europe 2017 #DW17 #theCUBE


 

>> Narrator: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. >> Welcome back to Munich everybody. This is The Cube. We're here live at DataWorks Summit, and we are the live leader in tech coverage. Steve Roberts is here as the offering manager for big data on Power Systems for IBM. Steve, good to see you again. >> Yeah, good to see you Dave. >> So we're here in Munich, a lot of action, good European flavor. It's my second European one, formerly Hadoop Summit, now DataWorks. What's your take on the show? >> I like it. I like the size of the venue. It's the ability to interact and talk to a lot of the different sponsors and clients and partners, so the ability to network with a lot of people from a lot of different parts of the world in a short period of time, so it's been great so far, and I'm looking forward to building upon this and towards the next DataWorks Summit in San Jose. >> Terri Virnig, a VP in your organization, was up this morning, had a keynote presentation, so IBM got a lot of love in front of a fairly decent sized audience, talking a lot about the sort of ecosystem that's evolving, the openness. Talk a little bit about open generally at IBM, but specifically what it means to your organization in the context of big data. >> Well, I am from the Power Systems team. So we have an initiative that we launched a couple years ago called OpenPOWER. And OpenPOWER is a foundation of participants innovating from the Power processor through all aspects, through accelerators, IO, GPUs, advanced analytics packages, system integration, but all to the point of being able to drive OpenPOWER capability into the market and have Power servers delivered not just through IBM, but through a whole ecosystem of partners. This complements quite well with the Apache Hadoop and Spark philosophy of openness as it relates to the software stack. So our story is really about being able to marry the benefits of an open ecosystem for OpenPOWER as it relates to the system infrastructure technology, which drives the same time to innovation, community value, and choice for customers as it relates to a multi-vendor ecosystem, coupled with the same premise as it relates to Hadoop and Spark. And of course, IBM is making significant contributions to Spark as part of the Apache Spark community and we're a key active member, as is Hortonworks with the ODPi organization forwarding the standards around Hadoop. So this is a one-two combo of open Hadoop, open Spark, either from Hortonworks or from IBM, sitting on the OpenPOWER platform built for big data. No other story really exists like that in the market today, open on open. >> So Terri mentioned cognitive systems. Bob Picciano has recently taken over and obviously has some cognitive chops, and some systems chops. Is this a rebranding of Power? Is it sort of a layer on top? How should we interpret this? >> No, think of it more as a layer on top. So Power will now be one of the assets, one of the member family of the Cognitive Systems portion of IBM. System z can also be used as another great engine for cognitive with certain clients, certain use cases where they want to run cognitive close to the data and they have a lot of data sitting on System z. So Power Systems is a server family really built for big data and machine learning, in particular our S822LC for high performance computing. This is a server which is landing very well in the deep learning, machine learning space.
It offers the Tesla P100 GPU, and with the NVIDIA NVLink technology can offer up to 2.8x bandwidth benefits CPU to GPU over what would be available through a PCIe Intel combination today. So this drives immediate value when you need to ensure not just that you're exploiting GPUs, but that you can of course move your data quickly from the processor to the GPU. >> So I was going to ask you actually, sort of what makes Power so well suited for big data and cognitive applications, particularly relative to Intel alternatives. You touched on that. IBM talks a lot about Moore's Law starting to hit its peak, that innovation is going to come from other places. I love that narrative 'cause it's really combinatorial innovation that's going to lead us in the next 50 years, but can we stay on that thread for a bit? What makes Power so substantially unique, uniquely suited and qualified to run cognitive systems and big data? >> Yeah, it actually starts with even more of the fundamentals of the Power processor. The Power processor has eight threads per core, in contrast to Intel's two threads per core. So this just means, for being able to parallelize your workloads, and workloads that come up in the cognitive space, whether you're running complex queries and need to drive SQL over a lot of parallel pipes, or you're running iterative computation over the same data set as when you're doing model training, these can all benefit from being highly parallelized, which can benefit from this 4x thread advantage. But of course to do this, you also need large, fast memory, and we have six times more cache per core versus Broadwell, so this just means you have a lot of memory close to the processor, driving that throughput that you require. And then on top of that, now we get to the ability to add accelerators, and unique accelerators such as, as I mentioned, the NVIDIA NVLink scenario for GPU, or using OpenCAPI as an approach to attach FPGAs or flash to get access speeds, processor memory access speeds, but with an attached acceleration device. And so this is economies of scale in terms of being able to offload specialized compute processing to the right accelerator at the right time, so you can drive way more throughput. The upper bound on driving workload through individual nodes, and being able to balance your IO and compute on an individual node, is far superior with the Power Systems server.
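A quick, purely illustrative calculation shows why that CPU-to-GPU link matters for the training loops described earlier. The 2.8x ratio comes from the conversation above; the absolute bandwidth figures and the batch size below are assumptions chosen only for the example.

```python
# Illustrative arithmetic only: how link bandwidth bounds CPU-to-GPU staging
# time per training step. The 2.8x ratio is from the conversation above;
# the absolute GB/s figures and batch size here are assumptions.
batch_gb = 8.0                      # assumed data staged per step
pcie_gbps = 12.0                    # assumed effective PCIe bandwidth (GB/s)
nvlink_gbps = pcie_gbps * 2.8       # the ~2.8x uplift quoted above

for name, bw in [("PCIe (assumed)", pcie_gbps), ("NVLink (2.8x)", nvlink_gbps)]:
    ms = batch_gb / bw * 1000
    print(f"{name:15s} {bw:5.1f} GB/s -> {ms:6.1f} ms per {batch_gb:.0f} GB batch")
```

Multiplied across the thousands of steps in a training run, that per-batch difference is where the "hours instead of days" framing in the interview comes from.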
>> Okay, so multi-threaded, giant memories, and this OpenCAPI gives you primitive-level access, I guess, to a memory extension, instead of having to-- >> Yeah, pluggable accelerators through this high-speed memory extension. >> Instead of going through what I often call the horrible storage stack, aka SCSI. And so that's cool, some good technology discussion there. What's the business impact of all that? What are you seeing with clients? >> Well, the business impact is, not everyone is going to start with souped-up accelerated workloads, but they're going to get there. So part of the vision that clients need to understand, to begin to get more insights from their data, is that it's hard to predict where your workloads are going to go. So you want to start with a server that provides you some of that headroom for growth. You don't want to keep scaling out horizontally by having to add nodes every time you need to add storage or add more compute capacity. So firstly, it's the flexibility, being able to bring versatile workloads onto a node or a small number of nodes and be able to exploit some of these memory advantages, acceleration advantages, without necessarily having to build large scale-out clusters. Ultimately, it's about improving time to insights. So with accelerators and with large memory, running workloads on similarly configured clusters, you're simply going to get your results faster. For example, in a recent benchmark we did with a representative set of TPC-DS queries on Hortonworks running on Linux on Power servers, we were able to drive 70% more queries per hour over a comparable Intel configuration. So this is just getting more work done on what is now similarly priced infrastructure. 'Cause the Power family is a broad family that now includes 1U and 2U scale-out servers, along with our 192-core horsepower for the enterprise grade. So we can directly price-compete on a scale-out box, but we offer a lot more flexible choice as clients want to move up in the workload stack or to bring accelerators to the table as they start to experiment with machine learning. >> So if I understand that right, I can turn two knobs. I can do the same amount of work for less money, a TCO play. Or, for the same amount of money, I can do more work. >> Absolutely. >> Is that fair? >> Absolutely. Now in some cases, especially in the Hadoop space, the size of your cluster is somewhat gated by how much storage you require. And if you're using the classic scale-up storage model, you're going to have so many nodes no matter what, 'cause you can only put so much storage on the node. So in that case, >> You're scaling storage. >> Your clusters can look the same, but you can put a lot more workload on that cluster, or you can bring in IBM, a solution like IBM Spectrum Scale with our Elastic Storage Server, which allows you to essentially pull that storage off the nodes, put it in a storage appliance, and at that point you now have high-speed access to storage, 'cause of course the network bandwidth has increased to the point that the performance benefit of local storage is no longer really a driving factor for a classic Hadoop deployment. You can get that high-speed access in a storage appliance mode, with the resiliency, at far less cost, 'cause you don't need 3x replication, you just have about a 30% overhead for the software erasure coding. And now with your compute nodes, you can really choose and scale those nodes just for your workload purposes. So you're not bound by the number of nodes equaling total storage required divided by storage per node, which is the classic how-big-is-my-cluster calculation. That just doesn't work if you get over 10 nodes, 'cause now you're just starting to get to the point where you're wasting something, right? You're either wasting storage capacity or, typically, you're wasting compute capacity, 'cause you're over-provisioned on one side or the other. >> So you're able to scale compute and storage independently and tune that for the workload and grow that resource more efficiently? >> You can right-size the compute and storage for your cluster, but also, importantly, you gain the flexibility with that storage tier; that data plane can be used for other non-HDFS workloads.
You can still have classic POSIX applications, or you may have new object-based applications, and you can, with a single copy of the data, one virtual file system, which could also be geographically distributed, serve both Hadoop and non-Hadoop workloads, so you're saving additional replicas of the data from being required by being able to onboard that onto a common data layer.
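The raw-capacity arithmetic behind the 3x-replication-versus-erasure-coding point a few lines up is simple enough to sketch. The 3x and roughly 30% figures come from the conversation; the 1 PB starting point is an assumption for illustration, and real layouts add headroom for temp space, snapshots, and growth.

```python
# Raw-capacity arithmetic for the replication point above: 3x HDFS
# replication versus erasure coding with ~30% overhead. Illustrative only.
usable_pb_needed = 1.0                         # assumed: 1 PB of data to store

raw_with_replication = usable_pb_needed * 3.0  # three full copies
raw_with_erasure = usable_pb_needed * 1.3      # ~30% parity overhead

saved = raw_with_replication - raw_with_erasure
print(f"raw capacity, 3x replication: {raw_with_replication:.1f} PB")
print(f"raw capacity, erasure coding: {raw_with_erasure:.1f} PB")
print(f"raw capacity saved:           {saved:.1f} PB "
      f"({(1 - raw_with_erasure / raw_with_replication) * 100:.0f}% less)")
```

Under these assumptions the appliance approach needs a bit less than half the raw disk of the replicated layout, which is the TCO half of the argument; the common data layer serving non-HDFS workloads is the flexibility half.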
>> So that's a return on asset play. You've got an asset that's more fungible across the application portfolio. You can get more value out of it. You don't have to dedicate it to this one workload and then over-provision for another one when you've got extra capacity sitting there. >> It's a TCO play, but it's also a time saver. It's going to get you time to insight faster, 'cause you don't have to keep moving that data around. The time you spend copying data is time you should be spending getting insights from the data, so having a common data layer removes that delay. >> Okay, 'cause it's HDFS-ready, I don't have to essentially move data from my existing systems into this new stovepipe. >> Yeah, we just present it through the HDFS API as it lands in the file system from the original application. >> So now, all this talk about flexibility, agility, et cetera, what about cloud? How does cloud fit into this strategy? What are you guys doing with your colleagues and cohorts at Bluemix, aka SoftLayer? You don't use that term anymore, but we do. When we get our bill it says SoftLayer still, but at any rate, you know what I'm talking about. The cloud with IBM, how does it relate to what you guys are doing in Power Systems? >> Well, the born-on-the-cloud philosophy of the IBM software and analytics team is still very much the motto. So as you see with Data Science Experience, which was launched last year, born in the cloud, all our analytics packages, whether it be our BigInsights software or our business intelligence software like Cognos, our future generations are landing first in the cloud. And of course we have our whole arsenal of Watson-based analytics and APIs available through the cloud. So what we're now seeing as well is we're taking those born-in-the-cloud offerings and now also offering a lot of those in an on-premise model. So they can also participate in the hybrid model, so Data Science Experience is now coming on premise, we're showing it at the booth here today. Bluemix has an on-premise version as well, and the same software library, BigInsights, Cognos, SPSS, are all available for on-prem deployment. So Power is still an ideal place for hosting your on-prem data and running your analytics close to the data, and now we can federate that through hybrid access to these elements running in the cloud. So the focus is really on the cloud applications being able to leverage the Power and System z based data through high-speed connectors, and being able to build hybrid configurations where you're running your analytics where they make the most sense based upon your performance requirements, data security, and compliance requirements. And a lot of companies, of course, are still not comfortable putting all their jewels in the cloud, so typically there's going to be a mix and match. We are expanding the footprint for cloud-based offerings, both in terms of Power servers offered through SoftLayer, but also through other cloud providers; Nimbix is a partner we're working with right now who actually is offering our PowerAI package. PowerAI is a package of open source deep learning frameworks, packaged by IBM, optimized for Power, in an easily deployed package with IBM support available. And that could be deployed on premise on a Power server, but it's also available on a pay-per-drink basis through the Nimbix cloud. >> All right, we covered a lot of ground here. We talked strategy, we talked strategic fit, which I guess is sort of an adjunct to strategy, we talked a little bit about the competition and where you differentiate, some of the deployment models, like cloud, other bits and pieces of your portfolio. Can we talk specifically about the announcements that you have here at this event, just maybe summarize for us? >> Yeah, no, absolutely. As it relates to IBM, and Hadoop, and Spark, we really have the full stack support, the rich analytics capabilities that I was mentioning, deep insight, prescriptive insights, streaming analytics with IBM Streams, Cognos Business Intelligence, so this set of technologies is available for both IBM's Hadoop stack and Hortonworks' Hadoop stack today. Our BigInsights and IOP offering is now out for tech preview; the next release, the 4.3 release, is available for technical preview and will be available for both Linux on Intel and Linux on Power towards the end of this month, so that's kind of one piece of new Hadoop news at the analytics layer. As it relates to Power Systems, as Hortonworks announced this morning, HDP 2.6 is now available for Linux on Power, so we've been partnering closely with Hortonworks to ensure that we have an optimized story for HDP running on Power Systems servers, as in the data point I shared earlier with the 70% improvement in queries per hour. At the storage layer, we have work in progress with Hortonworks to certify the Spectrum Scale file system, which really unlocks the ability to offer this converged storage alternative to the classic Hadoop model. Spectrum Scale actually supports and provides advantages in a classic Hadoop model with local storage, or it can provide the flexibility of offering the same sort of multi-application support in a scale-out model for storage; it also has the ability to form part of a storage appliance that we call Elastic Storage Server, which is a combination of Power servers and high-density storage enclosures, SSD, spinning disk, or flash, depending on the configuration, and that certification will now have that as an available storage appliance, which could underpin either IBM Open Platform or HDP as a Hadoop data lake. But as I mentioned, not just for Hadoop, really for building a common data plane behind mixed analytics workloads that reduces your TCO through a converged storage footprint, but more importantly, provides you that flexibility of not having to create data copies to support multiple applications. >> Excellent, IBM opening up its portfolio to the open source ecosystem. You guys have always had, well, not always, but in the last 20 years, major, major investments in open source. They continue on, we're seeing it here. Steve, people are filing in. The evening festivities are about to begin. >> Steve: Yeah, yeah, the party will begin shortly. >> Really appreciate you coming on The Cube, thanks very much. >> Thanks a lot Dave. >> You're welcome. >> Great to talk to you. >> All right, keep it right there everybody. John and I will be back with a wrap up right after this short break, right back.

Published Date : Apr 6 2017


Tom Roberts, SAP - #sapphirenow - theCUBE


 

>> Voiceover: From Orlando, Florida, it's theCUBE. (upbeat music) Covering Sapphire Now. Headlines sponsored by SAP HANA Cloud, the leader in platform as a service, with support from Console, Inc., the cloud internet company. Now, here's your host, Peter Burris.
>> Welcome back to theCUBE. I'm Peter Burris, and theCUBE is, once again, our flagship platform for bringing what's happening in big events to the community. Today, we're here at SAP Sapphire and I'm being joined by Tom Roberts, who's the Global Vice President of Third-Party Software Solutions. Tom, we're going to spend some time talkin' about how you're working with the ecosystem at SAP to fill in some of those crucial gaps that customers face as they try to create those new outcomes with SAP-related technologies. Tell us a little bit about what your team does.
>> Great, Peter, and thanks, appreciate you havin' us here. You know, Peter, one of the key things that Third-Party Solutions does, and what my team does, is we really help complete the solution. Right? So, it's a complex world. We've got customers out there trying to solve some very challenging problems and, of course, SAP brings the bulk of the solution there, but there's going to be some gaps. We've created unique relationships in our ecosystems in order to help fill that and deliver a complete solution. So, for example, you'll hear the name out in the marketplace, Solution Extensions, and that's our external branding. These are solutions that SAP sells on its paper, that have been tested and are supported by SAP, same as our own products, so the customer can buy with confidence and help get that total solution in place.
>> So, it's your almost SAP-compliant additional software.
>> Yeah, that's right.
>> Excellent. That's a really interesting perspective. You know, it's interesting. Over the course of our two days here at Sapphire, and we'll be here tomorrow as well, two things have popped out that are a little bit different from SAP. First off, the tension between whether or not SAP was an applications company or platform company seems to have totally gone away.
>> Yes.
>> You're a platform company.
>> That's correct.
>> The second thing that I find very interesting is that SAP has always been the company that kind of, was a little bit more neutral, stood back. When a customer needed us, we'll show up and we'll do it. You're now being a little bit more aggressive about going after business, after some other companies' customers. How are you utilizing this extensions approach to more rapidly create a solutions fabric that can bring, that can rapidly grab new customers for SAP, and your partners?
>> Well, Peter, you're right on the money. You know, it's no doubt that the industry has moved rapidly to the cloud. In fact, everybody said it would happen faster and it's happened even faster than they said it would. Everyone is, when they see results, they're always surprised, and cloud growth was even faster than we thought it would be. Now, what a lot of people haven't figured out, but I think SAP has, is that, in a cloud-based solution world, the expectation is that, one, it's seamlessly integrated, and, two, the experience of buying it is seamlessly integrated, and, three, it's supported in a seamlessly integrated way and that's what Solution Extensions delivers in the cloud.
So, you take an example of the success we've had with the acquisition of SuccessFactors, growing great, growin' well in the industry, but they have a lot of needs in order to mature the solution and meet the customer's entire wishlist. One example that we use is we've got a relationship with WorkForce Software for time and attendance, so it wasn't something that SAP developed, but it's something that the customers needed and provides high ROI. But, if you go and you look at that solution, you'll look and see that it's directly embedded inside Employee Central, right on the drop-down, so, for the customers, a completely seamless experience, and they can buy that from their SAP account executive.
>> So, SAP is installed in a lot of companies, 300 thousand across all industries.
>> Right.
>> As we move to a digital world, a lot of your customers, a lot of your SAP customers themselves, are starting to envision how software becomes part of their delivery mechanism.
>> Right.
>> And they're looking at the customers that they serve and saying, I wonder if I can use this software better. Are you startin' to see non-traditional software companies starting to come to you and saying, how can we be part of this program so that we can plug into, or we can enhance, that broad set of solutions for our customers?
>> Right. So, look at, everyone likes to talk about Internet of Things, right? So you take a historical business that's asset heavy and, by that I mean, think of like an oil and gas company. You know, traditionally the guys would work out in the field and they didn't carry devices with them. They carried wrenches, (giggles), right? They didn't carry mobile devices that were digitally connected.
>> And flasks. (laughter)
>> Sometimes. I hope not too often. That's a dangerous line of work. But, if you think about it, now that's changed. Right? They now use the Internet of Things not only to get information back from the field, but they also use it so that when they have to go out and do those repairs, they're getting digital assets that they can see. Now, we have created some relationships, and I'll give you two examples. You'll hear about a relationship that SAP has with OSIsoft, right. They have a well-known reputation for being able to draw that information off the Internet of Things, and we've created a link between that and the HANA platform. So that now, you can do that analysis in real time, because, as you know, HANA is made for the real time and, if you're going to do Internet of Things, that's the only platform you can really go with. It's not the old batch-then-analyze-later model; you need that information happening in real time. That's one example. The other example that I'll give you is here at Sapphire you'll see a company called Utopia. You say, well, alright, I've never heard of this company, but they do a unique thing. It's a direct add-in into the SAP platform, a solution extension, that allows you to do master data governance around your enterprise assets. And you say, wow, that sounds really complicated. Okay, what is that? This is the ability to look at those documents in a digital way while you're out in the field to understand, hey, that bolt there, that needs to be made out of steel, not aluminum, or you're going to have a chemical reaction, for example. That's the kind of thing that can save lives, save time, and also make the job out in the field easier. And you can't do that just with SAP's software by itself; we need the partners to contribute into that ecosystem and bring that richness there.
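To ground the IoT-to-HANA pattern Tom describes, here is a minimal, hypothetical Python sketch: a sensor reading from the field is written into a HANA table and queried immediately, rather than batched and analyzed later. The host, credentials, and ASSET_READINGS table are illustrative assumptions, as is the idea that an OSIsoft integration would feed the same table; only the hdbcli driver and the general SQL shape are standard.

```python
# Hypothetical sketch of streaming a field reading into HANA and querying
# it in real time. Connection details and the table are placeholders.
from hdbcli import dbapi  # SAP's Python driver for HANA

conn = dbapi.connect(
    address="hana.example.com",  # assumed hostname
    port=30015,
    user="IOT_INGEST",
    password="********",
)
cur = conn.cursor()

# Write a reading as it arrives from a connected device in the field.
cur.execute(
    "INSERT INTO ASSET_READINGS (ASSET_ID, METRIC, VALUE, TS) "
    "VALUES (?, ?, ?, CURRENT_UTCTIMESTAMP)",
    ("PUMP-0042", "vibration_mm_s", 7.3),
)
conn.commit()

# Because the data is already in memory, analysis runs on the same table
# immediately: average vibration per asset over the last five minutes.
cur.execute(
    "SELECT ASSET_ID, AVG(VALUE) AS AVG_VIBRATION "
    "FROM ASSET_READINGS "
    "WHERE METRIC = 'vibration_mm_s' "
    "AND TS > ADD_SECONDS(CURRENT_UTCTIMESTAMP, -300) "
    "GROUP BY ASSET_ID"
)
print(cur.fetchall())

cur.close()
conn.close()
```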
>> You talked about the rapid adoption of the cloud, in many respects, almost surprising adoption of the cloud. 'Cause you're right, we all knew it was going to happen, many of us didn't necessarily know how fast it was going to be. SAP has a very on-premise heritage, and a lot of the programs that SAP put together were initially optimized for that on-premise orientation.
>> That's right.
>> Are your clients today, when they become part of the SAP extension, or the Solution Extensions program, are they automatically part of both worlds? First off, let me start there.
>> Yeah, I mean, it's true that we live in a hybrid world already today. Hybrid happened so quickly. You saw SAP move aggressively forward and acquire some leading cloud companies.
>> Yep.
>> Right. (mumbles)
>> And you did a great job of integrating them, by the way.
>> Thank you. I think we did. And I'm really impressed with these properties. I think you saw in the keynote yesterday a really great representation of some of the leaders of those businesses up there and how tightly they've become part of the SAP family. Now, when you look at Solution Extensions, it mirrors that. We have solutions across all five of the major pillars of the business, which, of course, include these cloud properties, and the areas where we're seeing the fastest growth, or the most rapid adoption, are in these cloud properties. Because we all went through the era where best-of-breed became the suite, and then we had the era of the cloud. And if you noticed, when the cloud companies were launched, they were best-of-breed companies, and now we're in that period where people want things to move back to the suite because they want integration.
>> Or at least at a platform level.
>> Sure, because they want efficiency. Efficiency comes from that integration, and they get the first round of benefits by moving to the new application in the cloud and getting out of the business of having to operate it themselves. But, then, they want to get back to the business of having that seamlessly integrated with their core operations. So, we live in a hybrid world today, but it's clear that the pendulum is moving directly to cloud.
>> So are you suggesting to companies that want to be part of the extensions program, that they focus on the cloud first and then everything else second?
>> Yes, I would, and here's why. All conversations with customers start with cloud. They'll look to see if they can do something in the cloud first; it's the default. So, we've really moved past that world where the first conversation's around on-prem and then you look to cloud. That changed maybe two to three years ago, and today every conversation starts with the cloud.
>> So, I want to go back to that notion of non-traditional software companies creating solutions within the SAP ecosystem for their customers. Do you have companies like that in the extensions program today?
>> Well, I think many of these companies are evolving, just like SAP. Now, I tend to deal with the ISVs, so I tend to deal with companies that are in the business of that. But I will tell you this: what we're seeing with HANA Cloud Platform is exactly what you're talking about.
It's that intersection of SAP, our ISV ecosystem, and those non-traditional customers that are, themselves, moving into digital, and you'll see that happen on HCP, where they'll develop applications unique to their own business. I like to remind people of this: when we first rolled out R/3 and then we went to the Business Suite, companies wrote billions of lines of custom ABAP code to get that system the way they wanted it, in each of these individual companies. Well, as we move to S4, companies are going to revisit what they did to make those systems special and perform just the way they want it to. But they're not going to do that in ABAP, likely. They're likely going to do that on HCP, and they're going to build on that platform, because that's where they're going to get the integration, that's where they're going to get the benefit of where our ISV ecosystem is headed and tap into the richness of that. So, I think this is why you hear this rebirth of innovation at SAP, and it's because it's driven by the customers. That's why we have so many people turn out at Sapphire this week, so much so that even the SAP employees are like, wow, this is really an impressive turnout.
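As a rough illustration of that shift, here is a small, hypothetical sketch of the kind of extension logic that once lived as custom ABAP in the core and would instead run as an app on the platform, calling into the suite over an OData-style HTTP service. The service URL, entity names, fields, and technical user below are made-up placeholders, not a documented SAP API; only the OData v2 JSON envelope and query options are standard conventions.

```python
# Hypothetical platform-side extension: custom logic kept outside the core
# system, reading business data over an assumed OData v2 service.
import requests

S4_SERVICE = "https://s4.example.com/sap/opu/odata/sap/ZMAINT_ORDERS_SRV"  # assumed URL


def overdue_orders(session: requests.Session) -> list:
    """Return open maintenance orders that have been open for over 30 days.

    This rule would historically have been an ABAP modification; here it
    lives in the extension app and the core system stays untouched.
    """
    resp = session.get(
        f"{S4_SERVICE}/Orders",
        params={"$filter": "Status eq 'OPEN'", "$format": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    orders = resp.json()["d"]["results"]  # OData v2 response envelope
    return [o for o in orders if int(o.get("DaysOpen", 0)) > 30]


if __name__ == "__main__":
    s = requests.Session()
    s.auth = ("EXT_APP_USER", "********")  # assumed technical user
    print(f"{len(overdue_orders(s))} orders need escalation")
```

The point of the sketch is that the extension carries its own logic and lifecycle, while the suite's core code stays clean.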
And it, and we're moving at such a pace it requires that level of coordination. Right? We can't just let it to chance. Or, we can't let it be ambiguous. We have to be clear about we're going to build this and we're expecting our partners to step up here, so that that dance happens the way it should happen. I do respect though, that the partners have that concern, 'cause it's a legacy. >> They're always going to have the concern, but a big piece of it is going to be how well do you share and how well do you work together. >> Yeah. >> Hey, Tom, thank you very much. Tom Roberts, Global Vice President, SAP Solutions Extension program. Thank you very much for being here as part of this great show, talkin' about partnerships and the evolution of the SAP platform and SAP the company. This is theCUBE, we're going to be back shortly with more from Sapphire. (upbeat music) (slow tempo music) >> Voiceover: There'll be millions of people in the near future that are, want to be involved in their own personal well-being and in wellness. Nobody--

Published Date : May 19 2016

SUMMARY :

At SAP Sapphire in Orlando, Peter Burris talks with Tom Roberts, Global Vice President of Third-Party Software Solutions at SAP, about the Solution Extensions program: partner solutions sold on SAP paper, tested and supported like SAP's own products, that complete the solution where gaps remain. Roberts walks through examples such as WorkForce Software time and attendance embedded in SuccessFactors Employee Central, OSIsoft feeding Internet of Things data into HANA for real-time analysis, and Utopia for master data governance around enterprise assets. He also discusses the shift to cloud and hybrid landscapes, why customers will extend S4 on HANA Cloud Platform rather than with custom ABAP, and how clearer, more transparent road maps, such as the OpenText integration into S4, are building trust with partners.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Tom Roberts | PERSON | 0.99+
Peter | PERSON | 0.99+
Tom | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Utopia | ORGANIZATION | 0.99+
two days | QUANTITY | 0.99+
Console, Inc. | ORGANIZATION | 0.99+
Monday | DATE | 0.99+
yesterday | DATE | 0.99+
five | QUANTITY | 0.99+
Today | DATE | 0.99+
each | QUANTITY | 0.99+
today | DATE | 0.99+
300 thousand | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
two examples | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Orlando, Florida | LOCATION | 0.99+
one example | QUANTITY | 0.99+
SAP | ORGANIZATION | 0.99+
Wieland Schriener | PERSON | 0.99+
two things | QUANTITY | 0.99+
HANA | TITLE | 0.99+
second thing | QUANTITY | 0.99+
OSIsoft | ORGANIZATION | 0.99+
one | QUANTITY | 0.99+
First | QUANTITY | 0.99+
billions | QUANTITY | 0.99+
first round | QUANTITY | 0.99+
first | QUANTITY | 0.98+
Sapphire | ORGANIZATION | 0.98+
two | DATE | 0.98+
first conversation | QUANTITY | 0.98+
about 19 million users | QUANTITY | 0.98+
three years ago | DATE | 0.98+
One example | QUANTITY | 0.98+
this week | DATE | 0.98+
millions | QUANTITY | 0.98+
both worlds | QUANTITY | 0.97+
60,000 plus people | QUANTITY | 0.97+
HANA Cloud Platform | TITLE | 0.97+
S4 | TITLE | 0.96+
SuccessFactors | ORGANIZATION | 0.94+
SAP Sapphire | ORGANIZATION | 0.93+
second | QUANTITY | 0.93+
two | QUANTITY | 0.91+
WorkForce Software | ORGANIZATION | 0.9+
eight months | QUANTITY | 0.89+
SAP | TITLE | 0.88+
theCUBE | ORGANIZATION | 0.81+