Joseph Nelson, Roboflow | AWS Startup Showcase
(chill electronic music) >> Hello everyone, welcome to theCUBE's presentation of the AWS Startups Showcase, AI and machine learning, the top startups building generative AI on AWS. This is the season three, episode one of the ongoing series covering the exciting startups from the AWS ecosystem, talking about AI and machine learning. Can't believe it's three years and season one. I'm your host, John Furrier. Got a great guest today, we're joined by Joseph Nelson, the co-founder and CEO of Roboflow, doing some cutting edge stuff around computer vision and really at the front end of this massive wave coming around, large language models, computer vision. The next gen AI is here, and it's just getting started. We haven't even scratched the surface. Thanks for joining us today. >> Thanks for having me. >> So you got to love the large language model, foundation models, really educating the mainstream world. ChatGPT has got everyone in the frenzy. This is educating the world around these next gen AI capabilities, enterprise, image and video data, all a big part of it. I mean the edge of the network, Mobile World Conference is happening right now, this month, and it's just ending up, it just continues to explode. Video is huge. So take us through the company, do a quick explanation of what you guys are doing, when you were founded. Talk about what the company's mission is, and what's your North Star, why do you exist? >> Yeah, Roboflow exists to really kind of make the world programmable. I like to say make the world be read and write access. And our North Star is enabling developers, predominantly, to build that future. If you look around, anything that you see will have software related to it, and can kind of be turned into software. The limiting reactant though, is how to enable computers and machines to understand things as well as people can. And in a lot of ways, computer vision is that missing element that enables anything that you see to become software. 
So in the virtue of, if software is eating the world, computer vision kind of makes the aperture infinitely wide. It's something that I kind of like, the way I like to frame it. And the capabilities are there, the open source models are there, the amount of data is there, the computer capabilities are only improving annually, but there's a pretty big dearth of tooling, and an early but promising sign of the explosion of use cases, models, and data sets that companies, developers, hobbyists alike will need to bring these capabilities to bear. So Roboflow is in the game of building the community around that capability, building the use cases that allow developers and enterprises to use computer vision, and providing the tooling for companies and developers to be able to add computer vision, create better data sets, and deploy to production, quickly, easily, safely, invaluably. >> You know, Joseph, the word in production is actually real now. You're seeing a lot more people doing in production activities. That's a real hot one and usually it's slower, but it's gone faster, and I think that's going to be more the same. And I think the parallel between what we're seeing on the large language models coming into computer vision, and as you mentioned, video is data, right? I mean we're doing video right now, we're transcribing it into a transcript, linking up to your linguistics, times and the timestamp, I mean everything's data and that really kind of feeds. So this connection between what we're seeing, the large language and computer vision are coming together kind of cousins, brothers. I mean, how would you compare, how would you explain to someone, because everyone's like on this wave of watching people bang out their homework assignments, and you know, write some hacks on code with some of the OpenAI technologies, there is a corollary directly related to the vision side. Can you explain? 
>> Yeah, the rise of large language models is showing what's possible, especially with text, and I think increasingly will get multimodal as the images and video become ingested. Though there's kind of this still core missing element of basically like understanding. So the rise of large language models kind of creates this new area of generative AI, and generative AI in the context of computer vision is a lot of, you know, creating video and image assets and content. There's also this whole surface area to understanding what's already created. Basically digitizing physical, real world things. I mean the Metaverse can't be built if we don't know how to mirror or create or identify the objects that we want to interact with in our everyday lives. And where computer vision comes to play in, especially what we've seen at Roboflow is, you know, a little over a hundred thousand developers now have built with our tools. That's to the tune of a hundred million labeled open source images, over 10,000 pre-trained models. And they've kind of showcased to us all of the ways that computer vision is impacting and bringing the world to life. And these are things that, you know, even before large language models and generative AI, you had pretty impressive capabilities, and when you add the two together, it actually unlocks these kind of new capabilities. So for example, you know, one of our users actually powers the broadcast feeds at Wimbledon. So here we're talking about video, we're streaming, we're doing things live, we've got folks that are cropping and making sure we look good, and audio/visual all plugged in correctly. When you broadcast Wimbledon, you'll notice that the camera controllers need to do things like track the ball, which is moving at extremely high speeds and zoom crop, pan tilt, as well as determine if the ball bounced in or out. The very controversial but critical key to a lot of tennis matches. 
And a lot of that has been historically done with the trained, but fallible human eye and computer vision is, you know, well suited for this task to say, how do we track, pan, tilt, zoom, and see, track the tennis ball in real time, run at 30 plus frames per second, and do it all on the edge. And those are capabilities that, you know, were kind of like science fiction, maybe even a decade ago, and certainly five years ago. Now the interesting thing, is that with the advent of generative AI, you can start to do things like create your own training data sets, or kind of create logic around once you have this visual input. And teams at Tesla have actually been speaking about, of course the autopilot team's focused on doing vision tasks, but they've combined large language models to add reasoning and logic. So given that you see, let's say the tennis ball, what do you want to do? And being able to combine the capabilities of what LLMs represent, which is really a lot of basically, core human reasoning and logic, with computer vision for the inputs of what's possible, creates these new capabilities, let alone multimodality, which I'm sure we'll talk more about. >> Yeah, and it's really, I mean it's almost intoxicating. It's amazing that this is so capable because the cloud scales here, you got the edge developing, you can decouple compute power, and let Moore's law and all the new silicon and the processors and the GPUs do their thing, and you got open source booming. You're kind of getting at this next segment I wanted to get into, which is the, how people should be thinking about these advances of the computer vision. So this is now a next wave, it's here. I mean I'd love to have that for baseball because I'm always like, "Oh, it should have been a strike." I'm sure that's going to be coming soon, but what is the computer vision capable of doing today? I guess that's my first question. You hit some of it, unpack that a little bit. 
What does generative AI mean in computer vision? What's the new thing? Because the old technology's been around, proprietary, bolted onto hardware, but hardware advances at a different pace, but now you got new capabilities, generative AI for vision, what does that mean? >> Yeah, so computer vision, you know, at its core is basically enabling machines, computers, to understand, process, and act on visual data as effectively or more effectively than people can. Traditionally this has been, you know, task types like classification, which you know, identifying if a given image belongs in a certain category of goods on maybe a retail site, is it shoes or is it clothing? Or object detection, which is, you know, creating bounding boxes, which allows you to do things like count how many things are present, or maybe measure the speed of something, or trigger an alert when something becomes visible in frame that wasn't previously visible in frame, or instance segmentation where you're creating pixel wise segmentations for both instance and semantic segmentation, where you often see these kind of beautiful visuals of the polygon surrounding objects that you see. Then you have key point detection, which is where you see, you know, athletes, and each of their joints are kind of outlined, is another more traditional type problem in signal processing and computer vision. With generative AI, you kind of get a whole new class of problem types that are opened up. So in a lot of ways I think about generative AI in computer vision as some of the, you know, problems that you aimed to tackle, might still be better suited for one of the previous task types we were discussing. Some of those problem types may be better suited for using a generative technique, and some are problem types that just previously wouldn't have been possible absent generative AI. 
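[Editor's sketch, not part of the interview: the object detection uses Joseph lists above, counting what is present and alerting when something newly appears in frame, can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions. The `Detection` record, `count_objects`, `newly_visible`, and the sample frames are all hypothetical names and data invented for illustration; they are not Roboflow's API or the output format of any particular model.]

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One bounding box from a hypothetical object detector (assumed format)."""
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def count_objects(detections, min_confidence=0.5):
    """Count detections per class label, ignoring low-confidence boxes."""
    return Counter(d.label for d in detections if d.confidence >= min_confidence)


def newly_visible(previous_counts, current_counts):
    """Return labels present in the current frame but absent from the previous one."""
    return sorted(
        label for label in current_counts if previous_counts.get(label, 0) == 0
    )


# Made-up detections for two consecutive video frames.
frame_1 = [
    Detection("person", 0.91, (10, 20, 110, 220)),
    Detection("person", 0.84, (150, 30, 240, 230)),
    Detection("chair", 0.30, (300, 120, 380, 220)),  # below threshold, ignored
]
frame_2 = frame_1 + [Detection("forklift", 0.77, (400, 80, 620, 300))]

counts_1 = count_objects(frame_1)
counts_2 = count_objects(frame_2)

print(counts_1["person"])                 # 2
print(newly_visible(counts_1, counts_2))  # ['forklift']
```

[The confidence threshold is the main design choice here: a lower value counts more objects but raises the chance of false alerts.]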
And so if you make that kind of Venn diagram in your head, you can think about, okay, you know, visual question answering is a task type where if I give you an image and I say, you know, "How many people are in this image?" We could either build an object detection model that might count all those people, or maybe a visual question answering system would sufficiently answer this type of problem. Let alone generative AI being able to create new training data for old systems. And that's something that we've seen be an increasingly prominent use case for our users, as much as things that we advise our customers and the community writ large to take advantage of. So ultimately those are kind of the traditional task types. I can give you some insight, maybe, into how I think about what's possible today, or five years or ten years as you sort of go back. >> Yes, definitely. Let's get into that vision. >> So I kind of think about the types of use cases in terms of what's possible. If you just imagine a very simple bell curve, your normal distribution, for the longest time, the types of things that are in the center of that bell curve are identifying objects that are very common or common objects in context. Microsoft published the COCO dataset in 2014, Common Objects in Context, of hundreds of thousands of images of chairs, forks, food, person, these sorts of things. And you know, the challenge of the day had always been, how do you identify just those 80 objects? So if we think about the bell curve, that'd be maybe the like dead center of the curve, where there's a lot of those objects present, and it's a very common thing that needs to be identified. But it's a very, very, very small sliver of the distribution. 
Now if you go out to the way long tail, let's go like deep into the tail of this imagined visual normal distribution, you're going to have a problem like one of our customers, Rivian, in tandem with AWS, is tackling, to do visual quality assurance and manufacturing in production processes. Now only Rivian knows what a Rivian is supposed to look like. Only they know the imagery of what their goods that are going to be produced are. And then between those long tails of proprietary data of highly specific things that need to be understood, and the center of the curve, you have a whole kind of messy middle, type of problems I like to say. The way I think about computer vision advancing, is it's basically you have larger and larger and more capable models that eat from the center out, right? So if you have a model that, you know, understands the 80 classes in COCO, well, pretty soon you have advances like CLIP, which was trained on 400 million image text pairs, and has a greater understanding of a wider array of objects than just 80 classes in context. And over time you'll get more and more of these larger models that kind of eat outwards from that center of the distribution. And so the question becomes for companies, when can you rely on maybe a model that just already exists? How do you use your data to get what may be capable off the shelf, so to speak, into something that is usable for you? Or, if you're in those long tails and you have proprietary data, how do you take advantage of the greatest asset you have, which is observed visual information that you want to put to work for your customers, and you're kind of living in the long tails, and you need to adapt state of the art for your capabilities. So my mental model for like how computer vision advances is you have that bell curve, and you have increasingly powerful models that eat outward. 
And multimodality has a role to play in that, larger models have a role to play in that, more compute, more data generally has a role to play in that. But it will be a messy and I think long condition. >> Well, the thing I want to get, first of all, it's great, great mental model, I appreciate that, 'cause I think that makes a lot of sense. The question is, it seems now more than ever, with the scale and compute that's available, that not only can you eat out to the middle in your example, but there's other models you can integrate with. In the past there was siloed, static, almost bespoke. Now you're looking at larger models eating into the bell curve, as you said, but also integrating in with other stuff. So this seems to be part of that interaction. How does, first of all, is that really happening? Is that true? And then two, what does that mean for companies who want to take advantage of this? Because the old model was operational, you know? I have my cameras, they're watching stuff, whatever, and like now you're in this more of a, distributed computing, computer science mindset, not, you know, put the camera on the wall kind of- I'm oversimplifying, but you know what I'm saying. What's your take on that? >> Well, to the first point of, how are these advances happening? What I was kind of describing was, you know, almost uni-dimensional in that you have like, you're only thinking about vision, but the rise of generative techniques and multi-modality, like CLIP is a multi-modal model, it was trained on 400 million image text pairs. That will advance the generalizability at a faster rate than just treating everything as only vision. And that's kind of where LLMs and vision will intersect in a really nice and powerful way. Now in terms of like companies, how should they be thinking about taking advantage of these trends? The biggest thing that, and I think it's different, obviously, on the size of business, if you're an enterprise versus a startup. 
The biggest thing that I think if you're an enterprise, and you have an established scaled business model that is working for your customers, the question becomes, how do you take advantage of that established data moat, potentially, resource moats, and certainly, of course, established way of providing value to an end user. So for example, one of our customers, Walmart, has the advantage of one of the largest inventory and stock of any company in the world. And they also of course have substantial visual data, both from like their online catalogs, or understanding what's in stock or out of stock, or understanding, you know, the quality of things that they're going from the start of their supply chain to making it inside stores, for delivery of fulfillments. All these are visual challenges. Now they already have a substantial trove of useful imagery to understand and teach and train large models to understand each of the individual SKUs and products that are in their stores. And so if I'm a Walmart, what I'm thinking is, how do I make sure that my petabytes of visual information is utilized in a way where I capture the proprietary benefit of the models that I can train to do tasks like, what item was this? Or maybe I'm going to create Amazon Go-like technology, or maybe I'm going to build like delivery robots, or I want to automatically know what's in and out of stock from visual input feeds that I have across my in-store traffic. And that becomes the question and flavor of the day for enterprises. I've got this large amount of data, I've got an established way that I can provide more value to my own customers. How do I ensure I take advantage of the data advantage I'm already sitting on? If you're a startup, I think it's a pretty different question, and I'm happy to talk about. >> Yeah, what's startup angle on this? Because you know, they're going to want to take advantage. 
It's like cloud startups, cloud native startups, they were born in the cloud, they never had an IT department. So if you're a startup, is there a similar role here? And if I'm a computer vision startup, what's that mean? So can you share your take on that, because there'll be a lot of people starting up from this. >> So the startup has the opposite advantage and disadvantage, right? Like a startup doesn't have a proven way of delivering repeatable value in the same way that a scaled enterprise does. But it does have the nimbleness to identify and take advantage of techniques that you can start from a blank slate. And I think the thing that startups need to be wary of in the generative AI and large language model, and multimodal world, is building what I like to call, kind of like sandcastles. A sandcastle is maybe a business model or a capability that's built on top of an assumption that is going to be pretty quickly wiped away by improving underlying model technology. So almost like if you imagine like the ocean, the waves are coming in, and they're going to wipe away your progress. You don't want to be in the position of building a sandcastle business where, you don't want to bet on the fact that models aren't going to get good enough to solve the task type that you might be solving. In other words, don't take a screenshot of what's capable today. Assume that what's capable today is only going to continue to become possible. And so for a startup, what you can do, that like enterprises are quite comparatively less good at, is embedding these capabilities deeply within your products and delivering maybe a vertical based experience, where AI kind of exists in the background. >> Yeah. >> And we might not think of companies as, you know, even AI companies, it's just so embedded in the experience they provide, but that's like the vertical application example of taking AI and making it be immediately usable. 
Or, of course there's tons of picks and shovels businesses to be built like Roboflow, where you're enabling these enterprises to take advantage of something that they have, whether that's their data sets, their computes, or their intellect. >> Okay, so if I hear that right, by the way, I love, that's horizontally scalable, that's the large language models, go up and build them the apps, hence your developer focus. I'm sure that's probably the reason that the tsunami of developer's action. So you're saying picks and shovels tools, don't try to replicate the platform of what could be the platform. Oh, go to a VC, I'm going to build a platform. No, no, no, no, those are going to get wiped away by the large language models. Is there one large language model that will rule the world, or do you see many coming? >> Yeah, so to be clear, I think there will be useful platforms. I just think a lot of people think that they're building, let's say, you know, if we put this in the cloud context, you're building a specific type of EC2 instance. Well, it turns out that Amazon can offer that type of EC2 instance, and immediately distribute it to all of their customers. So you don't want to be in the position of just providing something that actually ends up looking like a feature, which in the context of AI, might be like a small incremental improvement on the model. If that's all you're doing, you're a sandcastle business. Now there's a lot of platform businesses that need to be built that enable businesses to get to value and do things like, how do I monitor my models? How do I create better models with my given data sets? How do I ensure that my models are doing what I want them to do? How do I find the right models to use? There's all these sorts of platform wide problems that certainly exist for businesses. I just think a lot of startups that I'm seeing right now are making the mistake of assuming the advances we're seeing are not going to accelerate or even get better. 
>> So if I'm a customer, if I'm a company, say I'm a startup or an enterprise, either one, same question. And I want to stand up, and I have developers working on stuff, I want to start standing up an environment to start doing stuff. Is that a service provider? Is that a managed service? Is that you guys? So how do you guys fit into your customers leaning in? Is it just for developers? Are you targeting with a specific like managed service? What's the product consumption? How do you talk to customers when they come to you? >> The thing that we do is enable, we give developers superpowers to build automated inventory tracking, self-checkout systems, identify if this image is malignant cancer or benign cancer, ensure that these products that I've produced are correct. Make sure that the defect that might exist on this electric vehicle makes its way back for review. All these sorts of problems are immediately able to be solved and tackled. In terms of the managed services element, we have solutions integrators that will often build on top of our tools, or we'll have companies that look to us for guidance, but ultimately the company is in control of developing and building and creating these capabilities in house. I really think the distinction is maybe less around managed service and tool, and more around ownership in the era of AI. So for example, if I'm using a managed service, in that managed service, part of their benefit is that they are learning across their customer sets, then it's a very different relationship than using a managed service where I'm developing some amount of proprietary advantages for my data sets. And I think that's a really important thing that companies are becoming attuned to, just the value of the data that they have. And so that's what we do. 
We tell companies that you have this proprietary, immense treasure trove of data, use that to your advantage, and think about us more like a set of tools that enable you to get value from that capability. You know, the HashiCorp's and GitLab's of the world have proven like what these businesses look like at scale. >> And you're targeting developers. When you go into a company, do you target developers with freemium, is there a paid service? Talk about the business model real quick. >> Sure, yeah. The tools are free to use and get started. When someone signs up for Roboflow, they may elect to make their work open source, in which case we're able to provide even more generous usage limits to basically move the computer vision community forward. If you elect to make your data private, you can use our hosted data set managing, data set training, model deployment, annotation tooling up to some limits. And then usually when someone validates that what they're doing gets them value, they purchase a subscription license to be able to scale up those capabilities. So like most developer centric products, it's free to get started, free to prove, free to poke around, develop what you think is possible. And then once you're getting to value, then we're able to capture the commercial upside in the value that's being provided. >> Love the business model. It's right in line with where the market is. There's kind of no standards bodies these days. The developers are the ones who are deciding kind of what the standards are by their adoption. I think making that easy for developers to get value as the model open sources continuing to grow, you can see more of that. Great perspective Joseph, thanks for sharing that. Put a plug in for the company. What are you guys doing right now? Where are you in your growth? What are you looking for? How should people engage? Give the quick commercial for the company. 
>> So as I mentioned, Roboflow is I think one of the largest, if not the largest collections of computer vision models and data sets that are open source, available on the web today, and have a private set of tools that over half the Fortune 100 now rely on those tools. So we're at the stage now where we know people want what we're working on, and we're continuing to drive that type of adoption. So companies that are looking to make better models, improve their data sets, train and deploy, often will get a lot of value from our tools, and certainly reach out to talk. I'm sure there's a lot of talented engineers that are tuning in too, we're aggressively hiring. So if you are interested in being a part of making the world programmable, and being at the ground floor of the company that's creating these capabilities to be writ large, we'd love to hear from you. >> Amazing, Joseph, thanks so much for coming on and being part of the AWS Startup Showcase. Man, if I was in my twenties, I'd be knocking on your door, because it's the hottest trend right now, it's super exciting. Generative AI is just the beginning of massive sea change. Congratulations on all your success, and we'll be following you guys. Thanks for spending the time, really appreciate it. >> Thanks for having me. >> Okay, this is season three, episode one of the ongoing series covering the exciting startups from the AWS ecosystem, talking about the hottest things in tech. I'm John Furrier, your host. Thanks for watching. (chill electronic music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Joseph Nelson | PERSON | 0.99+ |
Joseph | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Tesla | ORGANIZATION | 0.99+ |
400 million | QUANTITY | 0.99+ |
2014 | DATE | 0.99+ |
80 objects | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
three years | QUANTITY | 0.99+ |
ten years | QUANTITY | 0.99+ |
80 classes | QUANTITY | 0.99+ |
first question | QUANTITY | 0.99+ |
five years | QUANTITY | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Roboflow | ORGANIZATION | 0.99+ |
Wimbledon | EVENT | 0.99+ |
today | DATE | 0.98+ |
both | QUANTITY | 0.98+ |
five years ago | DATE | 0.98+ |
GitLab | ORGANIZATION | 0.98+ |
one | QUANTITY | 0.98+ |
North Star | ORGANIZATION | 0.98+ |
first point | QUANTITY | 0.97+ |
each | QUANTITY | 0.97+ |
over 10,000 pre-trained models | QUANTITY | 0.97+ |
a decade ago | DATE | 0.97+ |
Rivian | ORGANIZATION | 0.97+ |
Mobile World Conference | EVENT | 0.95+ |
over a hundred thousand developers | QUANTITY | 0.94+ |
EC2 | TITLE | 0.94+ |
this month | DATE | 0.93+ |
season one | QUANTITY | 0.93+ |
30 plus frames per second | QUANTITY | 0.93+ |
twenties | QUANTITY | 0.93+ |
sandcastle | ORGANIZATION | 0.9+ |
HashiCorp | ORGANIZATION | 0.89+ |
theCUBE | ORGANIZATION | 0.88+ |
hundreds of thousands | QUANTITY | 0.87+ |
wave | EVENT | 0.87+ |
North Star | ORGANIZATION | 0.86+ |
400 million image text pairs | QUANTITY | 0.78+ |
season three | QUANTITY | 0.78+ |
episode one | QUANTITY | 0.76+ |
AmazonGo | ORGANIZATION | 0.76+ |
over half | QUANTITY | 0.69+ |
a hundred million | QUANTITY | 0.68+ |
Startup Showcase | EVENT | 0.66+ |
Fortune 100 | TITLE | 0.66+ |
COCO | TITLE | 0.65+ |
Roboflow | PERSON | 0.6+ |
ChatGPT | ORGANIZATION | 0.58+ |
Dataset | TITLE | 0.53+ |
Moore | PERSON | 0.5+ |
COCO | ORGANIZATION | 0.39+ |
Joseph Nelson, Roboflow | Cube Conversation
(gentle music) >> Hello everyone. Welcome to this CUBE conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE. We got a great remote guest coming in. Joseph Nelson, co-founder and CEO of RoboFlow, a hot startup in AI, computer vision. Really interesting topic in this wave of AI next gen hitting. Joseph, thanks for coming on this CUBE conversation. >> Thanks for having me. >> Yeah, I love the startup tsunami that's happening here in this wave. RoboFlow, you're in the middle of it. Exciting opportunities, you guys are in the cutting edge. I think computer vision's been talked about just as much as the large language models and these foundational models are merging. You're in the middle of it. What's it like right now as a startup and growing in this new wave hitting? >> It's kind of funny, it's, you know, I kind of describe it like sometimes you're in a garden of gnomes. It's like we feel like we've got this giant headstart with hundreds of thousands of people building with computer vision, training their own models, but that's a fraction of what it's going to be in six months, 12 months, 24 months. So, as you described it, a wave is a good way to think about it. And the wave is still building before it gets to its full size. So it's a ton of fun. >> Yeah, I think it's one of the most exciting areas in computer science. I wish I was in my twenties again, because I would be all over this. It's the intersection, there's so many disciplines, right? It's not just tech computer science, it's computer science, it's systems, it's software, it's data. There's so much aperture of things going on around your world. So, I mean, you got to be batting all the students away kind of trying to get hired in there, probably. I can only imagine your hiring regimen. I'll ask that later, but first talk about what the company is that you're doing. How it's positioned, what's the market you're going after, and what's the origination story? 
How did you guys get here? How did you just say, hey, want to do this? What was the origination story? What do you do and how did you start the company? >> Yeah, yeah. I'll give you the what we do today and then I'll shift into the origin. RoboFlow builds tools for making the world programmable. Like anything that you see should be read write access if you think about it with a programmer's mind or legible. And computer vision is a technology that enables software to be added to these real world objects that we see. And so any sort of interface, any sort of object, any sort of scene, we can interact with it, we can make it more efficient, we can make it more entertaining by adding the ability for the tools that we use and the software that we write to understand those objects. And at RoboFlow, we've empowered a little over a hundred thousand developers, including those in half the Fortune 100 so far in that mission. Whether that's Walmart understanding the retail in their stores, Cardinal Health understanding the ways that they're helping their patients, or even electric vehicle manufacturers ensuring that they're making the right stuff at the right time. As you mentioned, it's early. Like I think maybe computer vision has touched one, maybe 2% of the whole economy and it'll be like everything in a very short period of time. And so we're focused on enabling that transformation. I think it's it, as far as I think about it, I've been fortunate to start companies before, start, sell these sorts of things. This is the last company I ever wanted to start and I think it will be, should we do it right, the world's largest in riding the wave of bringing together the disparate pieces of that technology. >> What was the motivating point of the formation? Was it, you know, you guys were hanging around? Was there some catalyst? What was the moment where it all kind of came together for you? 
>> You know what's funny is my co-founder, Brad and I, we were making computer vision apps for making board games more fun to play. So in 2017, Apple released ARKit, its augmented reality kit for building augmented reality applications. And Brad and I are both sort of like hacker persona types. We feel like we don't really understand the technology until we build something with it and so we decided that we should make an app that if you point your phone at a Sudoku puzzle, it understands the state of the board and then it kind of magically fills in that experience with all the digits in real time, which totally ruins the game of Sudoku to be clear. But it also just creates this like aha moment of like, oh wow, like the ability for our pocket devices to understand and see the world as well as or better than we can is possible. And so, you know, we actually did that as I mentioned in 2017, and the app went viral. It was, you know, top of some subreddits, top of Imgur, Reddit, the hacker community as well as Product Hunt really liked it. So it actually won Product Hunt AR app of the year, which was the same year that the Tesla Model 3 won the product of the year. So we joked that we share an award with Elon, our shared (indistinct). But frankly, so that was 2017. RoboFlow wasn't incorporated as a business until 2019. And so, you know, when we made Magic Sudoku, I was running a different company at the time, Brad was running a different company at the time, and we kind of just put it out there and were excited by how many people liked it. And we assumed that other curious developers would see this inevitable future of, oh wow, you know. This is much more than just a pedestrian point your phone at a board game. This is everything can be seen and understood and rewritten in a different way. Things like, you know, maybe your fridge. 
Knowing what ingredients you have and suggesting recipes or auto ordering for you, or we were talking about some retail use cases of automated checkout. Like anything can be seen and observed and we presume that that would kick off a Cambrian explosion of applications. It didn't. So you fast forward to 2019, we said, well we might as well be the guys to start to tackle this sort of problem. And because of our success with board games before, we returned to making more board game solving applications. So we made one that solves Boggle, you know, the four by four word game, we made one that solves chess, you point your phone at a chess board and it understands the state of the board and then can make move recommendations. And each additional board game that we added, we realized that the tooling was really immature. The process of collecting images, knowing which images are actually going to be useful for improving model performance, training those models, deploying those models. And if we really wanted to make the world programmable, developers waiting for us to make an app for their thing of interest is a lot less efficient, less impactful than taking our tool chain and releasing that externally. And so, that's what RoboFlow became. RoboFlow became the internal tools that we used to make these game changing applications readily available. And as you know, when you give developers new tools, they create new billion dollar industries, let alone all sorts of fun hobbyist projects along the way. >> I love that story. Curious, inventive, little radical. Let's break the rules, see how we can push the envelope on the board games. That's how companies get started. It's a great story. I got to ask you, okay, what happens next? Now, okay, you realize this new tooling, but this is like how companies get built. Like they solve their own problem that they had 'cause they realized there's one, but then there has to be a market for it. 
So you guys actually knew that this was coming around the corner. So okay, you got your hacker mentality, you did that thing, you got the award and now you're like, okay, wow. Were you guys conscious of the wave coming? Was it one of those things where you said, look, if we do this, we solve our own problem, this will be big for everybody. Did you have that moment? Was that in 2019 or was that more of like, it kind of was obvious to you guys? >> Absolutely. I mean Brad puts this pretty effectively where he describes how we lived through the initial internet revolution, but we were kind of too young to really recognize and comprehend what was happening at the time. And then mobile happened and we were working on different companies that were not in the mobile space. And computer vision feels like the wave that we've caught. Like, this is a technology and capability that rewrites how we interact with the world, how everyone will interact with the world. And so we feel we've been kind of lucky this time, right place, right time of every enterprise will have the ability to improve their operations with computer vision. And so we've been very cognizant of the fact that computer vision is one of those groundbreaking technologies that every company will have as a part of their products and services and offerings, and we can provide the tooling to accelerate that future. >> Yeah, and the developer angle, by the way, I love that because I think, you know, as we've been saying in theCUBE all the time, developers are the new de facto standards bodies because what they adopt is pure, you know, meritocracy. And they pick the best. If it's self-service and it's good and it's got open source community around it, they're all in. And they'll vote. They'll vote with their code and that is clear. Now I got to ask you, as you look at the market, we were just having this conversation on theCUBE in Barcelona at the recent Mobile World Congress, now called MWC, around 5G versus wifi. 
And the debate was specifically computer vision, like facial recognition. We were talking about how the Cleveland Browns were using facial recognition for people coming into the stadium, and how it was being used for ships in international ports. So the question was 5G versus wifi. My question is what infrastructure or what are the areas that need to be in place to make computer vision work? If you have developers building apps, apps got to run on stuff. So how do you sort that out in your mind? What's your reaction to that? >> A lot of the times when we see applications that need to run in real time and on video, they'll actually run at the edge without internet. And so a lot of our users will actually take their models and run them in a fully offline environment. Now to act on that information, you'll often need to have internet signal at some point 'cause you'll need to know how many people were in the stadium or what shipping crates are in my port at this point in time. You'll need to relay that information somewhere else, which will require connectivity. But actually using the model and creating the insights at the edge does not require internet. I mean we have users that deploy models on underwater submarines just as much as in outer space actually. And those are not very friendly environments to internet, let alone 5G. And so what you do is you use an edge device, like an Nvidia Jetson is common, mobile devices are common. Intel has some strong edge devices, the Movidius family of chips for example. And you use that compute that runs completely offline in real time to process those signals. Now again, what you do with those signals may require connectivity and that becomes a question of the problem you're solving of how soon you need to relay that information to another place. >> So, that's an architectural issue on the infrastructure. If you're a tactical edge war fighter for instance, you might want to have highly available and maybe high availability. 
I mean, these are words that mean something. You got storage, but it's not at the edge in real time. But you can trickle it back and pull it down. That's management. So that's more of a business by business decision or environment, right? >> That's right, that's right. Yeah. So I mean we can talk through some specifics. So for example, RoboFlow actually powers the broadcaster that does the tennis ball tracking at Wimbledon. That runs completely at the edge in real time, and, you know, technically, to track the tennis ball and point the camera, you actually don't need internet. Now they do have internet of course to do the broadcasting and relay the signal and feeds and these sorts of things. And so that's a case where you have both edge deployment of running the model and high availability to act on that model. We have other instances where customers will run their models on drones and the drone will go and do a flight and it'll say, you know, this many residential homes are in this given area, or this many cargo containers are in this given shipping yard. Or maybe we saw these environmental considerations of soil erosion along this riverbank. The model in that case can run on the drone during flight without internet, but then you only need internet once the drone lands and you're going to act on that information because for example, if you're doing like a study of soil erosion, you don't need to be real time. You just need to be able to process and make use of that information once the drone finishes its flight. >> Well I can imagine a zillion use cases. I heard of a use case interview at a company that does computer vision to help people see if anyone's jumping the fence on their company. Like, they know what a body looks like climbing a fence and they can spot it. Pretty easy use case compared to probably some of the other things, but this is the horizontal use cases, there are so many use cases. 
So how do you guys talk to the marketplace when you say, hey, we have generative AI for computer vision. You might know language models, but that's a completely different animal because vision's like the world, right? So you got a lot more to do. What's the difference? How do you explain that to customers? What can I build and what's their reaction? >> Because we're such a developer centric company, developers are usually creative and show you the ways that they want to take advantage of new technologies. I mean, we've had people use things for identifying conveyor belt debris, doing gas leak detection, measuring the size of fish, airplane maintenance. We even had someone that like a hobby use case where they did like a specific sushi identifier. I dunno if you know this, but there's a specific type of whitefish that if you grew up in the western hemisphere and you eat it in the eastern hemisphere, you get very sick. And so there was someone that made an app that tells you if you happen to have that fish in the sushi that you're eating. But security camera analysis, transportation flows, plant disease detection, really, you know, smarter cities. We have people that are doing curb management. And across a lot of these use cases, the fantastic thing about building tools for developers is they're a creative bunch and they have these ideas that if you and I sat down for 15 minutes and said, let's guess every way computer vision can be used, we would need weeks to list all the example use cases. >> We'd miss everything. >> And we'd miss. And so having the community show us the ways that they're using computer vision is impactful. Now that said, there are of course commercial industries that have discovered the value and been able to be out of the gate. And that's where we have the Fortune 100 customers, like we do. 
Like the retail customers in the Walmart sector, healthcare providers like Medtronic, or vehicle manufacturers like Rivian who all have very difficult either supply chain, quality assurance, in stock, out of stock, anti-theft protection considerations that require successfully making sense of the real world. >> Let me ask you a question. This is maybe a little bit in the weeds, but it's more developer focused. What are some of the developer profiles that you're seeing right now in terms of low-hanging fruit applications? And can you talk about the academic impact? Because I imagine if I was in school right now, I'd be all over it. Are you seeing master's theses being worked on with some of your stuff? Is the uptake in both areas of younger pre-graduates? And then inside the workforce, what are some of the devs like? Can you share just either what their makeup is, what they work on, give a little insight into the devs you're working with. >> Leading developers that want to be on state-of-the-art technology build with RoboFlow because they know they can use the best in class open source. They know that they can get the most out of their data. They know that they can deploy extremely quickly. That's true among students as you mentioned, just as much as industries. So we welcome students and I mean, we have research grants that regularly support people who want to publish. I mean we actually have a channel inside our internal slack where every day, more student publications that cite building with RoboFlow pop up. And so, that helps inspire some of the use cases. Now what's interesting is that the use case is relatively, you know, useful or applicable for the business or the student. In other words, if a student does a thesis on how to do, we'll say like shingle damage detection from satellite imagery and they're just doing that as a master's thesis, in fact most insurance businesses would be interested in that sort of application. 
So, that's kind of how we see uptake and adoption both among researchers who want to be on the cutting edge and publish, both with RoboFlow and making use of open source tools in tandem with the tool that we provide, just as much as industry. And you know, I'm a big believer in the philosophy that kind of like what the hackers are doing nights and weekends, the Fortune 500 are doing in a pretty short order period of time and we're experiencing that transition. Computer vision used to be, you know, kind of like a PhD, multi-year investment endeavor. And now with some of the tooling that we're working on in open source technologies and the compute that's available, these science fiction ideas are possible in an afternoon. And so you have this idea of maybe doing asset management or the aerial observation of your shingles or things like this. You have a few hundred images and you can de-risk whether that's possible for your business today. So there's pretty broad-based adoption among both researchers that want to be on the state of the art, as much as companies that want to reduce the time to value. >> You know, Joseph, you guys and your partner have got a great front row seat, ground floor, present-at-creation wave here. I'm seeing a pattern emerging from all my conversations on theCUBE with founders that are successful, like yourselves, that there's two kind of real things going on. You got the enterprises grabbing the products and retrofitting into their legacy and rebuilding their business. And then you have startups coming out of the woodwork. Young, seeing greenfield or picking a specific niche or focus and making that the signature lever to move the market. >> That's right. >> So can you share your thoughts on the startup scene, other founders out there and talk about that? And then I have a couple questions for like the enterprises, the old school, the existing legacy. Little slower, but the startups are moving fast. 
What are some of the things you're seeing as startups are emerging in this field? >> I think you make a great point that independent of RoboFlow, very successful, especially developer focused businesses, kind of have three customer types. You have the startups and maybe like series A, series B startups that you're building a product as fast as you can to keep up with them, and they're really moving just as fast as you are and pulling the product out of you for things that they need. The second segment that you have might be, call it SMB but not enterprise, who are able to purchase and aren't, you know, as fast of moving, but are stable and getting value and able to get to production. And then the third type is enterprise, and that's where you have typically larger contract value sizes, slower moving in terms of adoption and feedback for your product. And I think what you see is that successful companies balance having those three customer personas because you have the small startups, small fast moving upstarts that are discerning buyers who know the market and elect to build on tooling that is best in class. And so you basically kind of pass the smell test of companies who are quite discerning in their purchases, plus are moving so quick they're pulling the product out of you. Concurrently, you have a product that's enterprise ready to service the scalability, availability, and trust of enterprise buyers. And that's ultimately where a lot of companies will see tremendous commercial success. I mean I remember seeing the Twilio IPO, Uber being like a full 20% of their revenue, right? 
And so there's this very common pattern where you have the ability to find some of those upstarts that you make bets on, like the next Ubers of the world, the smaller companies that continue to get developed with the product and then the enterprise, which allows you to really fund the commercial success of the business, and validate the size of the opportunity in market that's being created. >> It's interesting, there's so many things happening there. It's like, in a way it's a new category, but it's not a new category. It becomes a new category because of the capabilities, right? So, it's really interesting, 'cause what you're talking about is category creation. >> I think developer tools. So people often talk about B to B and B to C businesses. I think developer tools are in some ways a third way. I mean ultimately they're B to B, you're selling to other businesses and that's where your revenue's coming from. However, you look kind of like a B to C company in the ways that you measure product adoption and kind of go to market. In other words, you know, we're often tracking the leading indicators of commercial success in the form of usage, adoption, retention. Really consumer app, traditionally based metrics of how to know you're building the right stuff, and that's what product led growth companies do. And then you ultimately have commercial traction in a B to B way. And I think that that actually kind of looks like a third thing, right? Like you can do these sort of funny zany marketing examples that you might see historically from consumer businesses, but yet you ultimately make your money from the enterprise who has these de-risked high value problems you can solve for them. And I selfishly think that that's the best of both worlds because I don't have to be like Evan Spiegel, guessing the next consumer trend or maybe creating the next consumer trend and catching lightning in a bottle over and over again on the consumer side. 
But I still get to have fun in our marketing and make sort of fun, like we're launching the world's largest game of rock paper scissors being played with computer vision, right? Like that's sort of like a fun thing you can do, but then you can concurrently have the commercial validation and customers telling you the things that they need to be built for them next to solve commercial pain points for them. So I really do think that you're right by calling this a new category and it really is the best of both worlds. >> It's a great call out, it's a great call out. In fact, I always joke with the VCs. I'm like, it's so easy. Your job is so easy to pick the winners. What are you talking about, it's so easy? I go, just watch what the developers jump on. And it's not about who started, it could be someone in the dorm room to the boardroom person. You don't know because that B to C, the C, it's B to D, you know? You know, it's developer, 'cause that's a human, right? That's a consumer of the tool which influences the business that never was there before. So I think this direct business model evolution, whether it's media going direct or going direct to the developers rather than going to a gatekeeper, this is the reality. >> That's right. >> Well I got to ask you while we got some time left to describe, I want to get into this topic of multi-modality, okay? And can you describe what that means in computer vision? And what's the state of the growth of that portion of this piece? >> Multi-modality refers to using multiple traditionally siloed problem types, meaning text, image, video, audio. So you could treat an audio problem as only processing audio signal. That is not multimodal, but you could use the audio signal at the same time as a video feed. Now you're talking about multi-modality. In computer vision, multi-modality is predominantly happening with images and text. 
And one of the biggest releases in this space, which is actually two years old now, was CLIP, Contrastive Language-Image Pre-training, which took 400 million image text pairs and basically instead of previously when you do classification, you basically map every single image to a single class, right? Like here's a bunch of images of chairs, here's a bunch of images of dogs. What CLIP did is use, you can think about it like, the class for an image being the Instagram caption for the image. So it's not one single thing. And by training on understanding the corpora, you basically see which words, which concepts are associated with which pixels. And this opens up the aperture for the types of problems and generalizability of models. So what does this mean? This means that you can get to value more quickly from an existing trained model, or at least validate that what you want to tackle with computer vision, you can get there more quickly. It also opens up the, I mean, CLIP has been the bedrock of some of the generative image techniques that have come to bear, just as much as some of the LLMs. And increasingly we're going to see more and more of multi-modality being a theme simply because at its core, you're including more context into what you're trying to understand about the world. I mean, in its most basic sense, you could ask yourself, if I have an image, can I know more about that image with just the pixels? Or if I have the image and the sound of when that image was captured or if I had someone describe what they see in that image when the image was captured, which one's going to be able to get you more signal? And so multi-modality helps expand the ability for us to understand signal processing. >> Awesome. And can you just real quick, define CLIP for the folks that don't know what that means? >> Yeah. 
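The caption-matching idea described above can be sketched in a few lines of Python. This is a toy illustration, not CLIP itself: the three-dimensional embeddings are hand-made stand-ins for what trained image and text encoders would produce, the function names are hypothetical, and the temperature of 0.07 is an assumed value chosen for the example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, caption_embeddings, temperature=0.07):
    """Score one image embedding against every caption embedding.

    Returns a dict mapping each caption to a probability. The temperature
    is an assumption for this sketch; smaller values sharpen the result.
    """
    captions = list(caption_embeddings)
    scaled = [
        cosine_similarity(image_embedding, caption_embeddings[c]) / temperature
        for c in captions
    ]
    return dict(zip(captions, softmax(scaled)))


# Hypothetical, hand-made embeddings. A real system would get these from
# trained image and text encoders that share one embedding space.
IMAGE = [0.9, 0.1, 0.2]
CAPTIONS = {
    "a photo of a chair": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
    "a photo of a chess board": [0.2, 0.1, 0.9],
}

probs = zero_shot_classify(IMAGE, CAPTIONS)
best = max(probs, key=probs.get)
print(best)  # a photo of a chair
```

Because the classes are just captions, swapping in a new set of text prompts changes what the same image embedding is classified against, with no retraining in this toy setup. That is the sense in which an existing trained model can be used to validate an idea quickly.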
CLIP is a model architecture, it's an acronym for Contrastive Language-Image Pre-training, and like, you know, model architectures that have come before it, it captures the, almost like, models are kind of like brands. So I guess it's a brand of a model where you've done these 400 million image text pairs to match up which visual concepts are associated with which text concepts. And there have been new releases of CLIP, just at bigger sizes, with bigger encodings, longer strings of text, or larger image windows. But it's been a really exciting advancement that OpenAI released in January 2021. >> All right, well great stuff. We got a couple minutes left. Just I want to get into more of a company-specific question around culture. All startups have, you know, some sort of cultural vibe. You know, Intel has Moore's law doubles every whatever, six months. What's your culture like at RoboFlow? I mean, if you had to describe that culture, obviously love the hacking story, you and your partner with the games going number one on Product Hunt next to Elon and Tesla and then hey, we should start a company two years later. That's kind of like a curious, inventing, building, hard charging, but laid back. That's my take. How would you describe the culture? >> I think that you're right. The culture that we have is one of shipping, making things. So every week each team shares what they did for our customers on a weekly basis. And we have such a strong emphasis on being better week over week that those sorts of things compound. So one big emphasis in our culture is getting things done, shipping, doing things for our customers. The second is we're an incredibly transparent place to work. For example, how we think about giving decisions, where we're progressing against our goals, what problems are biggest and most important for the company is all open information for those that are inside the company to know and progress against. 
The third thing that I'd use to describe our culture is one that thrives with autonomy. So RoboFlow has a number of individuals who have founded companies before, some of which have sold their businesses for a hundred million plus upon exit. And the way that we've been able to attract talent like that is because the problems that we're tackling are so immense, yet individuals are able to charge at it with the way that they think is best. And this is what pairs well with transparency. If you have a strong sense of what the company's goals are, how we're progressing against it, and you have this ownership mentality of what can I do to change or drive progress against that given outcome, then you create a really healthy pairing of, okay cool, here's where the company's progressing. Here's where things are going really well, here's the places that we most need to improve and work on. And if you're inside that company as someone who has a preponderance to be a self-starter and even a history of building entire functions or companies yourself, then you're going to be a place where you can really thrive. You have the inputs of the things where we need to work on to progress the company's goals. And you have the background of someone that is just necessarily a fast moving and ambitious type of individual. So I think the best way to describe it is a transparent place with autonomy and an emphasis on getting things done. >> Getting shit done as they say. Getting stuff done. Great stuff. Hey, final question. Put a plug out there for the company. What are you going to hire? What's your pipeline look like for people? What jobs are open? I'm sure you got hiring all around. Give a quick plug for the company what you're looking for. >> I appreciate you asking. Basically you're either building the product or helping customers be successful with the product. 
So in the building product category, we have platform engineering roles, machine learning engineering roles, and we're solving some of the hardest and most impactful problems of bringing such a groundbreaking technology to the masses. And so it's a great place to be where you can kind of be your own user as an engineer. And then if you're enabling people to be successful with the products, I mean you're working in a place where there's already such a strong community around it and you can help shape, foster, cultivate, activate, and drive commercial success in that community. So those are roles that lend themselves to being those that build the product for developer advocacy, those that are account executives that are enabling our customers to realize commercial success, and even hybrid roles like we call it field engineering, where you are a technical resource to drive success within customer accounts. And so all this is listed on roboflow.com/careers. And one thing that I actually kind of want to mention John that's kind of novel about the thing that's working at RoboFlow. So there's been a lot of discussion around remote companies and there's been a lot of discussion around in-person companies and do you need to be in the office? And one thing that we've kind of recognized is you can actually chart a third way. You can create a third way which we call satellite, which basically means people can work from where they most like to work and there's clusters of people, regular onsites. And at RoboFlow everyone gets, for example, $2,500 a year that they can use to spend on visiting coworkers. And so what's sort of organically happened is team members have started to pull together these resources and rent out like, lavish Airbnbs for like a week and then everyone kind of like descends in and works together for a week and makes and creates things. 
And we call this lighthouses because you know, a lighthouse kind of brings ships into harbor and we have an emphasis on shipping. >> Yeah, quality people that are creative and doers and builders. You give 'em some cash and let the self-governing begin, you know? And like, creativity goes through the roof. It's a great story. I think that sums up the culture right there, Joseph. Thanks for sharing that and thanks for this great conversation. I really appreciate it and it's very inspiring. Thanks for coming on. >> Yeah, thanks for having me, John. >> Joseph Nelson, co-founder and CEO of RoboFlow. Hot company, great culture in the right place in a hot area, computer vision. This is going to explode in value. The edge is exploding. More use cases, more development, and developers are driving the change. Check out RoboFlow. This is theCUBE. I'm John Furrier, your host. Thanks for watching. (gentle music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Brad | PERSON | 0.99+ |
Joseph | PERSON | 0.99+ |
Joseph Nelson | PERSON | 0.99+ |
January, 2021 | DATE | 0.99+ |
John Furrier | PERSON | 0.99+ |
Medtronic | ORGANIZATION | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
2019 | DATE | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
400 million | QUANTITY | 0.99+ |
Evan Spiegel | PERSON | 0.99+ |
24 months | QUANTITY | 0.99+ |
2017 | DATE | 0.99+ |
RoboFlow | ORGANIZATION | 0.99+ |
15 minutes | QUANTITY | 0.99+ |
Rivian | ORGANIZATION | 0.99+ |
12 months | QUANTITY | 0.99+ |
20% | QUANTITY | 0.99+ |
Cardinal Health | ORGANIZATION | 0.99+ |
Palo Alto, California | LOCATION | 0.99+ |
Barcelona | LOCATION | 0.99+ |
Wimbledon | EVENT | 0.99+ |
roboflow.com/careers | OTHER | 0.99+ |
first | QUANTITY | 0.99+ |
second segment | QUANTITY | 0.99+ |
each team | QUANTITY | 0.99+ |
six months | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
Intel | ORGANIZATION | 0.99+ |
both worlds | QUANTITY | 0.99+ |
2% | QUANTITY | 0.99+ |
two years later | DATE | 0.98+ |
Mobile World Congress | EVENT | 0.98+ |
Ubers | ORGANIZATION | 0.98+ |
third way | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
a week | QUANTITY | 0.98+ |
Magic Sudoku | TITLE | 0.98+ |
second | QUANTITY | 0.98+ |
Nvidia | ORGANIZATION | 0.98+ |
Sudoku | TITLE | 0.98+ |
MWC | EVENT | 0.97+ |
today | DATE | 0.97+ |
billion dollar | QUANTITY | 0.97+ |
one single thing | QUANTITY | 0.97+ |
over a hundred thousand developers | QUANTITY | 0.97+ |
four | QUANTITY | 0.97+ |
third | QUANTITY | 0.96+ |
Elon | ORGANIZATION | 0.96+ |
third thing | QUANTITY | 0.96+ |
Tesla | ORGANIZATION | 0.96+ |
Jetson | COMMERCIAL_ITEM | 0.96+ |
Elon | PERSON | 0.96+ |
RoboFlow | TITLE | 0.96+ |
ORGANIZATION | 0.95+ | |
Twilio | ORGANIZATION | 0.95+ |
twenties | QUANTITY | 0.95+ |
Product Hunt AR | TITLE | 0.95+ |
Moore | PERSON | 0.95+ |
both researchers | QUANTITY | 0.95+ |
one thing | QUANTITY | 0.94+ |