
Nick Ducoff, Infochimps - SxSWi 2011 - theCUBE


 

Hello, welcome back. Mark "Rizzn" Hopkins here at South by Southwest 2011, and I'm here with Nick Ducoff from Infochimps, who I'm pretty familiar with because I'm a techie and I hear about these guys all the time. You may or may not know who these people are (you probably should), but if you don't: Nick, I'm just going to have you start off with a bit of an elevator pitch. Talk about what your company does and acquaint them. Hopefully they can hear you over whatever that is, a keynote or a contest. What is going on out there?

Sure, thank you. Infochimps is a marketplace to find, share, and build on data. We have two big customer bases. One is the developer community, where we're really focused on making it super easy for developers to build applications. An application is really two things, right? It's code and it's a database. There are lots of folks out there that help developers get access to code, such as GitHub, but there's really not a centralized repository for structured information, for data. So that's what we're building, and we're really excited about it. The other part of our business is our marketplace, where we have data sets that are published and can be downloaded as flat files. So if you're a mom-and-pop or a non-technical user, and data for you is something viewable in Microsoft Excel, that's the place for you. The beautiful thing is it's all found in the same place, and that's Infochimps.com.

I wanted to talk a little bit about your recent announcement. Michelle Greer, a former SiliconANGLE contributor (if you're watching this video you probably know who Michelle Greer is), has been excitedly talking about it in hushed tones: don't tell anybody till we announce, but check this out, it's really cool. Your API Explorer and the launch of, is it 1,000 APIs? 1,000, 2,000 data sets? I've never dug as deep into your data sets as I have in the last couple of weeks, while you've been turning on the API Explorer and uploading these new things. So tell me broadly about the data sets and the API Explorer, how that works, and then we'll dive deeper into a couple of these that are really cool.

Thanks, and sorry to steal Michelle from you, but she's a rock star and we love her. So we recently published two thousand new API calls, and that's pretty exciting for us. We're trying to make as much data available in one place as there is on the internet, and these two thousand API calls range from social media data to weather data to stock data. Our key focus here was to think about what the building blocks for an application are, and how we can provide data sets that inspire developers to build applications without ever having to bring data down onto their own servers. The API Explorer makes it super easy for anybody to come and see, after they pass in an input, what the output looks like within their web browser. So they don't have to start coding to figure out what the output is going to look like; they can get a few samples right there in the browser.

And as someone who is a lightweight developer these days but was a heavy coder back in my early days, the API Explorer is what really makes it real, in my opinion. You can look at the documentation all day long. We spoke to somebody earlier today who's in the documentation business, and as soon as you hear that, you know it's a chore, right? Either someone has to write the documentation, which is always a big task, or someone's got to read it, and unless you needed it five minutes ago, you're not going to be hitting the books. But being able to just see a little box and say, okay, here's what I put into this box, hit the button, and see what comes out the other end: that's what makes it real. So that's, I think, something that makes what you guys are doing pretty exciting.

Now, one of the ones that Michelle showed me was Qwerly, which is another company that uses you as the platform to publish their data and API. Talk a little bit about what Qwerly does. I can see a hundred uses for this in applications we're developing, so talk about what it does, and in as much depth as you can about how they get their data and all that.

So Qwerly is a company run by Max Niederhofer, based in London, UK. He was previously at Atlas Venture; he was a VC, came back to the bright side of things, and started his own company. What Qwerly does is a database across social identities. So, who are you online? Who am I online? I'm @NickDucoff on Twitter, I'm /ducoff on Facebook, I'm /nick-ducoff on LinkedIn, and it's sometimes hard to find, in a programmatic fashion, all of the identities for a person online. With Qwerly, you can pass in whatever you've got, a Twitter handle or a Facebook account or a LinkedIn account, and it will map across all of the other social networks and help you find your Flickr account, your YouTube account, your LinkedIn account, so that developers can build any number of applications.

We're based out of the Cloudera office; our Palo Alto group is based in the Cloudera office, so a lot of what we do is using Hadoop to bring structure to unstructured data. And I know that API right there probably saved us about three months' worth of development on one aspect, so we're going to be using it, just so you know. Think about being able to surface content around the people at an event like South by Southwest: you can trawl feeds and find people who are there, but you don't always have access to all the content they're publishing, because they may not have an auto-feed set up. With something like Qwerly, you can pull all their other feeds, filter them based on location or date range or whatever it is you're doing, and really come up with something useful.

To speak a little bit about what they do (and I'm happy to also introduce you to Max; he's coming into Austin for South by Southwest, though I hope you get it through us and not them): Max's team uses indicators, strong links across your various profiles, to see whether @NickDucoff on Twitter is really the same guy as the Facebook profile. Am I linking to my Facebook profile from my Twitter profile? On Facebook, have I linked back to my Twitter profile, or my About.me profile, or something else? That way they can tell whether this person is really that person.

And this links into the other API we were discussing earlier, the Twitter profile search. That, combined with the Qwerly search, would be a great way of surfacing authority nodes among content providers. So talk about the differences between Twitter's native profile search (we ran it on "Batman comics," my thing) and the profile search that you guys have.

The data store of choice for us is increasingly Elasticsearch. It's an incredibly powerful tool that lets you do essentially Boolean searches across large data files. For instance, the Twitter profile search is a search across 100 million nodes, and what we've got now is the ability to search across those 100 million users with the keywords they use in their profiles. That can obviously be their name, how they describe themselves, what they like.

Whereas Twitter, the way they do it, based on just a couple of searches that we ran, seems to have some method of looking both at the tweets themselves and potentially at other keywords around what you searched for. We got in-character accounts, Gotham news, and all kinds of crazy stuff; none of it had to do with Batman comics per se, it was just loosely associated with Batman. So I guess if you're into that, there you go, but if you want an exact match, this would be the way to go. And it's not all social data you've got. I know there are some sports-related ones in there, there are the raw word searches from the British National Corpus, and a couple of other ones that escape me at the moment, but with 2,000 of them there's lots of interesting data to search through.

So let's look a little more broadly. Where was the inspiration for this? What was the aha moment? Big data is an editorial focus for us for the foreseeable future, whatever that ends up being, because we covered a couple of conferences recently, Strata and Hadoop World, with amazing viewership. We were talking about the concepts behind big data, and it resonated with our consumer-oriented audiences, developers of course, but also enterprise, because big data affects them too. It's not just about social and mobile and the fun stuff that Mashable and TechCrunch and the Web 2.0 blogs like to talk about; it's crossed over into IT. So what was the aha moment that led you down the path Infochimps has taken? You're positioned at a good nexus between enterprise and consumer-facing data stores, so talk a little bit about that journey.

Sure. Flip Kromer, another one of our co-founders and our CTO, was pursuing his PhD in physics at UT, and in the course of his research he spent a lot of time finding and munging data. The aha moment for him was that it's a pain in the butt to find data online. Google does a wonderful job of indexing blobs of unstructured information on web pages, but it doesn't do a great job of indexing structured information. So Flip set out to solve this problem, asked around among his fellow PhD candidates whether anybody might be interested in pursuing this mission, and found Dhruv Bansal. From there we've built up to 15 chimps trying to democratize access to structured information.

Talk about the process of data sanitization. I know it's a mix of automated and by-hand washing of the data. It may be part of your secret sauce, but if you can talk a little about that process, I'd like to learn more.

Sure. One of our core philosophies is that we take data and publish it in a structured format; we don't necessarily cleanse it. When there's clearly articulated demand for a very high-quality data set, we'll either find it through a third-party supplier or build it ourselves, but unless there's clearly articulated demand, we publish it the same way we find it. The only change we make is to identify columns and rows, so that it's in a machine-readable format.

Okay. And part of the role is also documenting that, which is your next big thing, but there's only so much you can do at one time with 15 people. So you've got all the data published, and part of that role is actually making it searchable, curated, and findable.

Yeah, we absolutely want to keep working on cleaning up the metadata around the data. One of the things we've been working on is a unified metadata format. We're pretty far along on that and really excited about it, and I think it will really help with scalability, because our data team can ingest data pretty quickly at this point; we're pulling in hundreds of gigabytes a week or more, probably closer to terabytes a week. But we've got to make sure we keep up with documentation, like you were saying, and with making it easily findable, or we end up in the same place we were before we started Infochimps. So what we've done is load all of the metadata into Elasticsearch, as well as some of the data. Our search algorithm is part of our special sauce, but we try to surface the data set that's most relevant to you, adjacent to the data you either have or are otherwise looking for.

So search is really a case of everything old being new again. That's one of the themes: people going back to search and reapplying it to problems that Google doesn't need to work on. Everybody thinks Google has solved search, and I think they'd probably be the first to tell you they've got 95 percent of it down, but I think there's more left than that, because there are so many aspects of search that haven't been tackled. You've got the semantic side, you've got organizations trying to patch holes in microsite search or whitelisted, topic-specific search, and you're working on a couple of different approaches to structured data search. So that's one of the themes I'm seeing emerge. Stepping back: I've spent about a day and a half here at South by Southwest, but being local to Austin, you've probably been exposed to the prep a little longer than I have. What are some of the themes you're seeing emerge out of the conference?

It's all about location, right? Location, local, and the data that powers them. With respect to location, one of the important themes is places: where am I standing right now? There are a number of folks out there who might even tell you different things about where you're standing. So over the next couple of months we're pretty excited to announce some partnerships, which we'll save for another story, to make it really easy for developers to build location-based applications. Obviously a big part of that will be retail inventory and other things about where you are, right? Happy hour specials, ratings and reviews, all the kinds of stuff folks ask for all the time: can you scrape Citysearch, can you scrape Yelp? We won't necessarily, but we'll work with a lot of folks who have similar databases, or with those companies themselves, to make it available to our developer community.

That's a good point to delve into a little, because I think the fear with companies in a position like yours, where you envelop so much of an ecosystem, is that you'll eventually compete with that ecosystem. We see it with Twitter, you see it with Facebook, and the evangelists for those organizations will tell you, okay, we're not really competing, but we know they are. Either they are, or they're just really bad at communicating that they don't want to compete with their own ecosystem. So you leave the data sanitization, scraping, and other organizing to other people, and you're just organizing the organization of the data?

That's an interesting point to elaborate on. For instance, a good number of those two thousand data sets were cases where we took existing corpora of data sets and published them as APIs. We took what was structured data and published it through an application programming interface, which hadn't been done before, and now it's even easier to build on top of those databases. They existed in the wild, and we just made them easier to find and easier to access, and that's really what we're trying to do.

Very cool stuff. Big data is a theme, search is a theme, here at South by Southwest 2011. I'm Mark "Rizzn" Hopkins, and we've been chatting with Infochimps. It's a company to watch, so keep an eye on these guys and play with the API Explorer. I'm not getting paid by these guys to say this; I played with it and I really liked it, so I think you should too. Stay tuned to SiliconANGLE.com and SiliconANGLE.tv; we'll have more coverage coming out of the conference, so don't go away.
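
Ducoff's description of how Qwerly decides that two profiles belong to the same person (a Twitter profile that links to a Facebook profile which links back) can be illustrated with a short Python sketch. This is not Qwerly's actual method or API. It assumes, hypothetically, that each profile is identified by a URL-like string and that we already know which other profiles it links to; only reciprocal links count as "strong" indicators, and a union-find structure groups profiles connected by chains of such links.

```python
from collections import defaultdict


def link_identities(profiles: dict[str, set[str]]) -> list[set[str]]:
    """Group profile URLs that appear to belong to the same person.

    `profiles` maps each profile (hypothetical URL strings) to the set of
    profiles it links to. Two profiles are merged only when each links to
    the other; a one-way link is ignored because anyone can link anywhere.
    """
    parent = {p: p for p in profiles}

    def find(p: str) -> str:
        # Walk up to the group's representative, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for source, targets in profiles.items():
        for target in targets:
            # A reciprocal link counts as a "strong" indicator.
            if target in profiles and source in profiles[target]:
                union(source, target)

    groups: dict[str, set[str]] = defaultdict(set)
    for p in profiles:
        groups[find(p)].add(p)
    # Sort groups by their sorted members for a stable, readable result.
    return sorted(groups.values(), key=sorted)


# Hypothetical example data, not real accounts.
example = {
    "twitter.com/alice": {"facebook.com/alice", "flickr.com/alice"},
    "facebook.com/alice": {"twitter.com/alice"},
    "flickr.com/alice": set(),                 # no link back: stays separate
    "twitter.com/bob": {"twitter.com/alice"},  # one-way link: ignored
}
identities = link_identities(example)
print(identities)
```

Requiring the link in both directions is a deliberately conservative choice for the sketch; a production system would likely weigh several weaker signals as well.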
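
The Boolean keyword search over profile text that Ducoff attributes to Elasticsearch can be sketched with a toy in-memory inverted index. This does not use Elasticsearch or its query DSL; the ProfileIndex class, its search(must, should, must_not) method, and the sample profiles are hypothetical, chosen only to mirror must/should/must-not Boolean semantics and to show how a keyword match on profile text returns exact matches rather than loosely associated results.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters and digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


class ProfileIndex:
    """Toy inverted index over profile text; a stand-in for a real search engine."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> user ids
        self.all_ids: set[str] = set()

    def add(self, user_id: str, profile_text: str) -> None:
        self.all_ids.add(user_id)
        for term in tokenize(profile_text):
            self.postings[term].add(user_id)

    def search(self, must=(), should=(), must_not=()) -> set[str]:
        """Return ids containing every `must` term, at least one `should`
        term (when any are given), and no `must_not` term."""
        hits = set(self.all_ids)
        for term in must:
            hits &= self.postings.get(term, set())
        if should:
            hits &= set().union(*(self.postings.get(t, set()) for t in should))
        for term in must_not:
            hits -= self.postings.get(term, set())
        return hits


# Hypothetical profiles echoing the "Batman comics" example from the interview.
index = ProfileIndex()
index.add("u1", "Comic book collector. Batman comics forever!")
index.add("u2", "Gotham news and weather, in character")
index.add("u3", "Batman cosplay, not comics")

print(index.search(must=["batman", "comics"], must_not=["cosplay"]))  # {'u1'}
```

Query terms are expected in lowercase because the tokenizer lowercases profile text. A real search engine would also handle stemming ("comic" and "comics" are different terms here), relevance scoring, and distribution across the 100 million profiles mentioned in the interview.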
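
Ducoff's "publish it the same way we find it" philosophy, where the only change is identifying columns and rows, can be sketched with Python's standard csv module. The function name structure_as_found and the sample data are hypothetical; the point is that values are kept exactly as found (stray spaces and blanks included) while each value is attached to a named column, which is what makes the data machine-readable.

```python
import csv
import io


def structure_as_found(raw_text: str, delimiter: str = ","):
    """Identify columns and rows in raw delimited text without cleansing it.

    Every value stays exactly as found (a string, with any stray spaces or
    blanks); the only change is that each value is attached to a named column.
    Returns (header, rows), where rows is a list of dicts.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for record_no, record in enumerate(reader, start=1):
        if not record:
            continue  # skip blank lines
        if len(record) != len(header):
            raise ValueError(
                f"record {record_no}: expected {len(header)} columns, got {len(record)}"
            )
        rows.append(dict(zip(header, record)))
    return header, rows


# Hypothetical raw file: the stray space and the missing value are kept as-is.
raw = "city,temp_f\nAustin, 81\nLondon,\n"
header, rows = structure_as_found(raw)
print(header)  # ['city', 'temp_f']
print(rows)    # [{'city': 'Austin', 'temp_f': ' 81'}, {'city': 'London', 'temp_f': ''}]
```

Rejecting rows with the wrong number of columns is the one check kept here, since a misaligned row would break the column structure itself; everything else is left for a later cleansing step when there is demand for it.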

Published Date: Mar 17, 2011

