Shafaq Abdullah, The Honest Company - #SparkSummit - #theCUBE

>> Announcer: Covering Spark Summit 2017, brought to you by Databricks.

>> This is theCUBE, and we're having a great time at Spark Summit 2017. One of our last guests of the day is Shafaq Abdullah, who is the director of data infrastructure at The Honest Company. Shafaq, welcome to the show.

>> Thank you.

>> Now, I heard about The Honest Company because of the celebrity founder, right, Jessica Alba?

>> Shafaq: That's correct.

>> Okay, but how did you end up at the company? Weren't you at a start-up before?

>> That's exactly correct. We did a start-up called InSnap before we actually got into Honest, and the way it happened is that InSnap was about instantaneously building personas using machine learning and a big data stack, and Honest at that time was trying to find someone who could help them with their data challenges. InSnap was the right fit in terms of its technology and expertise in big data and machine learning. We had built real-time, instantaneous personas to increase engagement and monetization, backed by big data, machine learning, and Spark inside our technology stack. So we used that to help Honest really become data driven, to solve their next-generation problem of making products that drive value out of data, understand their customers better, and operate and optimize the business better. That is why they acquired us, and essentially, we deal with the technology in their stack, and not only the technology, but also the culture, the business processes, and the teams which operate those.

>> Okay, we're going to dive into some of the technical details about what you're developing with George in just a second, but I have to ask: company culture is really important at The Honest Company, right? They're well known for being eco-friendly and socially responsible. What was it like moving from a start-up into that company environment, or was it just natural?

>> Honest was, of course, a much bigger start-up, four or five years after it was initially created, whereas we at InSnap were very lean, agile, and much more data driven. That was the biggest difference. The way we solved it was that they allowed us to create our own data organization, called Data Science, which was heading all the data initiatives. Then we worked with cross-functional teams, with finance, with accounting, with growth, with sales, to help them understand what their needs were and how to become really data driven by driving value out of the data using state-of-the-art technology. So it was a mix of team alignment and cultural change, focused on the business goal and getting everyone to gather around it to make the change. I really enjoyed that as we carried Honest on this journey from being just descriptive, which is essentially finding what has happened in the data, just generating reports for revenue, to becoming more predictive and prescriptive, which is more like advanced analytics plus an advisory role, which together play into decisions around features, the business, and operations.

>> And George, you talked to a lot of customers today, and some of the same themes came up. Do you want to drill down into some of the details of what they're doing?

>> I'm curious about how you chose the first projects, to get quick wins and to establish credibility.

>> Yeah, that's actually a very good question.
Basically, we focused on the low-hanging fruit to give us a jump-start and build our reputation, so that we could then take on much more advanced technology projects. If you went to Honest.com and used their search bar, the search was very flimsy; it was not returning good results. We had already built a matching engine, so it was very easy to extend it into a full search engine. That was the first deliverable, and we delivered it in under a month and a half or two months, right when we came in. And it was like, hey, these guys just improved our search by 10x or 100x; we are getting many more hits and much better coverage of the search terms. That set the tone.

Another piece we wanted to tackle was how to improve Honest's recommendations. That was another project. But before doing that, Honest did not even have what you could call a data warehouse, where you can get all the data in one place, like a data lake, because the data was siloed across organizations, and the analysts could not really get the data into one place to mix, match, and analyze it. So that was another big piece, which we did very early on. That was the second big deliverable, even before recommendations: the data warehouse. We plugged Spark right in the middle, sucked up all the data from different places, shoved the data in, and built this ETL engine, which extracted, transformed, and loaded the data into the data warehouse. Now, this data warehouse broke down those silos and made the data a cohesive lake which could be used for driving value and understanding patterns, especially by machine learning, by analysts, and by all the decision makers.

>> Was it a data warehouse, or was it a data lake? The reason I ask for the distinction is that a data warehouse is usually extremely well curated for navigation and discoverability, whereas the data lake is, as some people say, just a little step up from a swamp.

>> That's right. When I call it a data lake, it's because we have two data aggregation, or data gathering, infrastructures. One is backed by Spark and S3, which we call the data lake, where there is unstructured and structured data, all kinds of data, mixed and matched, and it's not always easy: sometimes you need to do some transformation on top of the data sitting there in order to really get to the needle in the haystack. But the data warehouse is in Redshift, which gets the data from the data lake via the Spark ETL engine and turns it into metric-driven reports, so that it's easily discoverable and matches what the business requires right now. It's more like formal reports, where the dimensions and all the attributes are much more well thought out. Whereas the data lake is more like throwing it all into one place, so that at least we have the data in one place, and then we can analyze and process it.
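To make the pipeline Shafaq describes concrete, here is a minimal sketch of a Spark job that extracts siloed sources, joins them, and loads a curated table. The bucket paths, column names, and formats are hypothetical illustrations, not Honest's actual code; a Redshift load would typically replace the final write with a JDBC write or an S3 COPY step.

```python
# A rough sketch of a Spark ETL job in the spirit described above:
# extract siloed sources, join them, and write a curated table.
# All paths, schemas, and column names here are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw, siloed data from the S3 data lake.
orders = spark.read.json("s3://example-lake/raw/orders/")
customers = spark.read.option("header", "true").csv("s3://example-lake/raw/customers/")

# Transform: join the silos into one cohesive view and normalize types.
enriched = (
    orders.join(customers, on="customer_id", how="left")
          .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write a curated, query-ready table back to the lake. Loading into
# a warehouse such as Redshift would typically use a JDBC write or a COPY
# from S3 instead of this parquet write.
enriched.write.mode("overwrite").parquet("s3://example-lake/curated/orders_enriched/")
```

Scheduling one such job per source system is enough to turn scattered silos into the single queryable lake described above.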
>> In putting all the data first into the data lake and then, essentially, refining it into the data warehouse, what did you use to keep track of lineage, and to make sure that you knew the truth, or truthfulness, behind all the data in the data warehouse once it got there?

>> So basically, we built a data model on top of S3 and Spark. We used that data model as the basis, as a source of truth, to feed the reports, and that data model is consistent wherever you find it. We want to make sure that the attributes, the dimensions, and anything related to that data model are consistent for e-commerce as well as the offline side. We use Spark and S3 to keep that data model consistent, and we also use a bunch of advanced monitoring on top of that. When we are processing jobs, we want to make sure that we don't lose data, and we removed the coupling between the systems by decoupling them; in the next version, we even made it event-based streams. That was the general strategy we adopted to make sure that we have consistency across the data lake and the data warehouse.
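As a concrete illustration of the monitoring Shafaq mentions, here is a hedged sketch of a post-load consistency check; the paths and the count-based rule are assumptions for illustration, not the team's actual checks.

```python
# Sketch of a post-load consistency check: verify the curated table did not
# lose records relative to the raw source. Paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("consistency-check").getOrCreate()

raw = spark.read.json("s3://example-lake/raw/events/")
curated = spark.read.parquet("s3://example-lake/curated/events/")

raw_count, curated_count = raw.count(), curated.count()

# Fail the job loudly if rows were silently dropped during ETL.
if curated_count < raw_count:
    raise RuntimeError(
        f"Possible data loss: raw={raw_count}, curated={curated_count}"
    )
print(f"Consistency check passed: {curated_count} rows")
```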
>> What would be the next step? Now you've significantly enhanced business intelligence, and you have this rich repository behind the data warehouse. What would you do next, either with the data in the data warehouse or the data in the data lake?

>> We are constantly enriching our data lake, because it needs to be updated all the time, but at the same time, we want to connect the business with our metrics; they essentially derive from all of that data sitting in the data lake to help optimize a problem. For example, we are working on sales optimization, operations optimization, demand planning, and supply planning, in addition to customer insights. We are also working on other strategic projects. For example, instead of just predicting LTV or churn, we are trying to be more prescriptive in our analytics, taking an advisory role that looks over all the marketing spend: not just predicting the high-LTV customers, but actually allocating budget across different marketing channels for omni-channel, for example TV and display ads. That's also happening as we speak, as we enrich our data lake and generate those reports. Then we also need to circle back with the business folks, the decision makers, to really convince them to use it. That's why we created these cross-functional teams, aligned to a business goal: contextually aware teams which know their roles and responsibilities, but which at the same time can collaborate effectively and produce a result that drives the bottom line.

>> What kind of customer insights were you looking for? They deliver family products, diapers to the home and that sort of thing. What sort of customer insights were you looking for, and how is it working?

>> Basically, for Honest, across all our target customers, we need to better understand what their needs are. Customer insights include, for example, the demographics of the customers. We also wanted to see which patterns are common among customers, so that we can recommend products being bought by one segment of customers versus another. Those common properties could be mothers who have recently had children, who live in this neighborhood, and who have this kind of income level. So how do we ensure that we actually predict their demand before it happens? We need to understand their habits and the context behind them: if they are making a search, how many pages they view for this kind of product or that kind of product, and similarly other things which enhance our understanding of the customers. Then we put them into different buckets, or segments, and use those segments for targeting, because we already have LTV and churn predictive models revealing whether a customer is going to churn for whatever reason. If we know that running a similar campaign for other customers has successfully given us more subscriptions or helped us reduce churn, that is how we target them and optimize our campaigns or promotions.

>> David: Sure.

>> We're also looking at the overall lifestyle of the people who are passionate about Honest brands, or brands that exhibit similar values, for example eco-friendly, safe, and trusted products.

>> Right, so we have just a couple of minutes to go before we get to the break. This is great stuff, and George, I'll come back to you for a final question in just a moment, but in 30 seconds or so, tell us why you selected Databricks. You probably looked at other options, right?

>> Shafaq: Absolutely.

>> Can you give us a quick take on why you made the decision?

>> Absolutely. When we came in at Honest, all they had was a bunch of MySQL developers and very limited big data knowledge. They needed a jump-start in order to really get to that level in a very short time. How is that even achievable? We didn't even have dedicated data ops on our team. So Databricks helped to bridge that gap by giving us the infrastructure efficiency we needed, spinning up clusters in a hassle-free manner. They also have this notebooks feature, where we can scale the code and scale the team by reusing boilerplate code. Different teams have different expertise: for example, data science teams like Python and data engineers like Scala. So the Scala people write functions which can be called by the data science teams in the same notebook, essentially giving them the ability to collaborate effectively. We also needed a tool to give more traction and visualization for data scientists as well as data engineers. Databricks has visualization built in, which helps you see causation and correlation, at least correlation, right off the bat, without even exporting the data into some other external tool to make those charts. So there are a bunch of advantages there that we wanted. And then it has a platform API, like DBFS, the Databricks File System, which is similar to S3; these are cool APIs which again gave us the jump-start we needed. In a very short amount of time, we built not only the data warehouse, but also data-driven products.

>> It sounds like Databricks has delivered.

>> Shafaq: Oh yeah.

>> Awesome.
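The Scala-to-Python collaboration Shafaq describes maps onto a standard Spark pattern: a function registered against the shared SparkSession in one language is callable by name from another. The sketch below shows the Python side; the function name, path, and the Scala registration shown in the comment are hypothetical.

```python
# Python (data science) cell in a Databricks notebook. Assume a Scala
# (data engineering) cell earlier in the notebook has registered a Spark
# SQL function, e.g.:
#   spark.udf.register("normalizeSku", (s: String) => s.trim.toUpperCase)
# The function name and columns below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

products = spark.read.parquet("/mnt/example-lake/curated/products/")

# Call the Scala-registered function by name from Python via a SQL expression.
cleaned = products.withColumn("sku", F.expr("normalizeSku(sku)"))

# In a Databricks notebook, display(cleaned) gives the built-in charts
# Shafaq mentions; outside Databricks, show() prints a plain table.
cleaned.show()
```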
>> All right, George, just enough time for one more question if you want to throw one in.

>> This one is kind of technical, but not on the technology side so much: how do you guys measure attribution between channels in omni-channel marketing?

>> That's a very good question. We have this project called Marketing Attribution, and essentially the scope of that project is that we want to give the right weights to the right clicks along the customer's journey to subscription or conversion. We have a model which uses a bunch of techniques, including weighted and linear regression, to come up with a weighted way of distributing those weights among the different channels. The first problem to solve was that we needed to instrument logging, so that we get all those clicks and searches into our data lake. That was done beforehand, before starting the MTA project, because we have a bunch of touch points: a customer could be doing a search, calling our sales rep, tracking an order online, or just leaving a cart unfulfilled. Now we are also working on getting offline data on top of that, so that we know what a customer is doing in store, and we can use the next version of this MTA to give them a seamless experience in a brick-and-mortar store or online.
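One plausible reading of the regression-based attribution described here, sketched under assumptions: represent each customer journey as touch counts per channel, regress against conversion, and normalize the coefficients into channel weights. The data, channel names, and model choice are illustrative, not Honest's actual MTA model.

```python
# Sketch of regression-based marketing attribution: per-customer channel
# touch counts regressed against conversion, with normalized coefficients
# read as channel weights. Data and channel names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("attribution-sketch").getOrCreate()

channels = ["tv", "display", "search", "email"]

# Each row: touch counts per channel for one customer, plus 1/0 conversion.
journeys = spark.createDataFrame(
    [
        (2, 0, 3, 1, 1.0),
        (0, 1, 0, 0, 0.0),
        (1, 2, 1, 0, 1.0),
        (0, 0, 1, 2, 0.0),
        (3, 1, 0, 1, 1.0),
        (0, 2, 0, 0, 0.0),
    ],
    channels + ["converted"],
)

assembler = VectorAssembler(inputCols=channels, outputCol="features")
model = LinearRegression(featuresCol="features", labelCol="converted").fit(
    assembler.transform(journeys)
)

# Normalize non-negative coefficients into per-channel attribution weights.
coefs = [max(c, 0.0) for c in model.coefficients]
total = sum(coefs) or 1.0
for name, c in zip(channels, coefs):
    print(f"{name}: {c / total:.2f}")
```

In practice a model like this would be fit on the instrumented click and search logs in the data lake rather than a toy DataFrame, and the weights would be recomputed as new touch points, such as the in-store data mentioned above, come online.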
>> Great, that's great stuff, Shafaq. I wish we had more time. We'll talk to you more after we stop rolling. Thank you for being so honest, and we appreciate you being on the show.

>> Thank you, I really appreciate it.

>> Thank you so much.

>> George: Shafaq, that was great.

>> All right, to all of you, thank you so much. We're going to be back in a few moments with the daily wrap-up. You don't want to miss that. Thank you for joining us on theCUBE for Spark Summit 2017.

Published Date: June 7, 2017
