Tiji Mathew, Patrick Zimet and Senthil Karuppaiah | Io-Tahoe Data Quality: Active DQ
(upbeat music), (logo pop up) >> Narrator: From around the globe, it's theCUBE, presenting Active DQ, intelligent automation for data quality, brought to you by IO-Tahoe. >> Are you ready to see Active DQ on Snowflake in action? Let's get into the show and tell, and do the demo. With me is Tiji Mathew, Data Solutions Engineer at IO-Tahoe. Also joining us is Patrick Zimet, Data Solutions Engineer at IO-Tahoe, and Senthilnathan Karuppaiah, who's the Head of Production Engineering at IO-Tahoe. Patrick, over to you, let's see it. >> Hey Dave, thank you so much. Yeah, we've seen a huge increase in the number of organizations interested in a Snowflake implementation, who are looking for an innovative, precise and timely method to ingest their data into Snowflake. And where we are seeing a lot of success is a ground-up method utilizing both IO-Tahoe and Snowflake. To start, you define your as-is model by leveraging IO-Tahoe to profile your various data sources and push the metadata to Snowflake. Meaning we create a data catalog within Snowflake as a centralized location to document items such as source system owners, allowing you to have those key conversations and understand the data's lineage, potential blockers and what data is readily available for ingestion. Once the data catalog is built, you have a much more dynamic strategy surrounding your Snowflake ingestion. And what's great is that while you're working through those key conversations, IO-Tahoe will maintain that metadata push, and paired with Snowflake's ability to version the data, you can easily incorporate potential schema changes along the way, making sure that the information you're working on stays as current as the systems that you're hoping to integrate with Snowflake.
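To make the ground-up method Patrick describes more concrete, here is a minimal, illustrative sketch of profiling a source extract and pushing the resulting metadata into a catalog table in Snowflake. It uses the snowflake-connector-python library; the table, column and connection names are hypothetical, and this is an assumption about one way such a metadata push could look, not IO-Tahoe's actual mechanism.

```python
import pandas as pd
import snowflake.connector  # pip install snowflake-connector-python

def profile_source(df: pd.DataFrame, source_name: str) -> list:
    """Build simple column-level metadata for one source extract."""
    rows = []
    for col in df.columns:
        rows.append((
            source_name,                      # e.g. "CRM.CONTACTS"
            col,
            str(df[col].dtype),
            int(df[col].isna().sum()),        # null count
            int(df[col].nunique()),           # distinct count
        ))
    return rows

# Hypothetical connection details; replace with your own account settings.
conn = snowflake.connector.connect(
    user="DQ_SERVICE", password="***", account="my_account",
    warehouse="DQ_WH", database="GOVERNANCE", schema="CATALOG",
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS DATA_CATALOG (
        SOURCE_SYSTEM STRING, COLUMN_NAME STRING, DATA_TYPE STRING,
        NULL_COUNT NUMBER, DISTINCT_COUNT NUMBER,
        PROFILED_AT TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
    )
""")

# Profile one source extract and push its metadata to the catalog table.
crm_contacts = pd.read_csv("crm_contacts.csv")   # hypothetical source extract
cur.executemany(
    "INSERT INTO DATA_CATALOG (SOURCE_SYSTEM, COLUMN_NAME, DATA_TYPE,"
    " NULL_COUNT, DISTINCT_COUNT) VALUES (%s, %s, %s, %s, %s)",
    profile_source(crm_contacts, "CRM.CONTACTS"),
)
conn.close()
```

Repeating this push on a schedule is what keeps the catalog as current as the systems feeding it.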
>> Nice. Patrick, I wonder if you could address how the IO-Tahoe platform scales, and maybe in what way it provides a competitive advantage for customers. >> Great question. Where IO-Tahoe shines is through its active DQ, or the ability to monitor your data's quality in real time, marking which rows need remediation according to the customized business rules that you can set, ensuring that the data quality standards meet the requirements of your organization. What's great is, through our use of RPA, we can scale with an organization. So as you ingest more data sources, we can allocate more robotic workers, meaning the results will continue to be delivered in the same timely fashion you've grown used to. What's more, IO-Tahoe is doing the heavy lifting on monitoring data quality. That frees up your data experts to focus on the more strategic tasks such as remediation, data augmentation and analytics development. >> Okay, maybe Tiji, you could address this. I mean, how does all this automation change the operating model that we were talking to Aj and Dunkin about before? I mean, if it involves less people and more automation, what else can I do in parallel? >> I'm sure the participants today will also be asking the same question. Let me start with the strategic tasks Patrick mentioned: IO-Tahoe does the heavy lifting, freeing up data experts to act upon the data events generated by IO-Tahoe. For companies that have teams focused on manually building their inventory of the data landscape, that leads to longer turnaround times in producing actionable insights from their own data assets, thus diminishing the value realized by traditional methods. However, our operating model involves profiling and remediating at the same time, creating a cataloged data estate that can be used by business or IT accordingly. With increased automation and fewer people, our machine learning algorithms augment the data pipeline to tag and capture the data elements into a comprehensive data catalog. As IO-Tahoe automatically catalogs the data estate in a centralized view, the data experts can now focus on remediating the data events generated from validating against business rules. We envision that data events, coupled with this drillable and searchable view, will be a comprehensive way to assess the impact of bad quality data. Let's briefly look at the image on screen. For example, the view indicates that bad quality zip code data impacts the contact data, which in turn impacts other related entities in systems. Now contrast that with a manually maintained spreadsheet that drowns out the main focus of your analysis. >> Tiji, how do you tag and capture bad quality data, and, you've mentioned these dependencies, how do you stop that from flowing downstream into the processes within the applications or reports? >> As IO-Tahoe builds the data catalog across source systems, we tag the elements that meet the business rule criteria while segregating the failed data examples associated with the elements that fall below a certain threshold. The elements that meet the business rule criteria are tagged to be searchable, thus providing an easy way to identify data elements that may flow through the system. The segregated data examples, on the other hand, are used by data experts to triage for the root cause. Based on the root cause, potential outcomes could be: one, changes in the source system to prevent that data from entering the system in the first place; two, add data pipeline logic to sanitize bad data from being consumed by downstream applications and reports; or three, just accept the risk of storing bad data and address it when it meets a certain threshold. However, Dave, as for your question about preventing bad quality data from flowing into the system, IO-Tahoe will not prevent it, because the controls of data flowing between systems are managed outside of IO-Tahoe. Although, IO-Tahoe will alert and notify the data experts to events that indicate bad data has entered the monitored assets. Also, we have redesigned our product to be modular and extensible. This allows data events generated by IO-Tahoe to be consumed by any system that wants to protect the targets from bad data. Thus IO-Tahoe empowers the data experts to control the bad data from flowing into their systems.
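As a rough illustration of the tag-and-segregate flow Tiji outlines, the sketch below validates one element against a business rule, tags it as searchable when the pass rate clears a threshold, and otherwise segregates the failing examples for triage. The rule, threshold and column names are assumptions made for the example, not IO-Tahoe's implementation.

```python
from dataclasses import dataclass, field

import pandas as pd

@dataclass
class RuleResult:
    """Outcome of one business-rule check (illustrative structure only)."""
    element: str
    pass_rate: float
    searchable: bool                      # tagged when the rule criteria are met
    failed_examples: pd.DataFrame = field(default_factory=pd.DataFrame)

def apply_rule(df: pd.DataFrame, element: str, pattern: str,
               threshold: float = 0.95) -> RuleResult:
    """Tag the element if enough values conform; otherwise segregate
    the failing examples for the data experts to triage."""
    values = df[element].astype(str)
    passed = values.str.match(pattern)    # row-level conformity flags
    pass_rate = float(passed.mean())
    if pass_rate >= threshold:
        return RuleResult(element, pass_rate, searchable=True)
    return RuleResult(element, pass_rate, searchable=False,
                      failed_examples=df.loc[~passed, [element]])

# Example: zip codes must be 5 digits or ZIP+4 (hypothetical rule).
contacts = pd.DataFrame({"zip_code": ["30301", "98101-0001", "3030", "ABCDE"]})
result = apply_rule(contacts, "zip_code", r"^\d{5}(-\d{4})?$", threshold=0.9)
print(result.searchable, result.pass_rate)
print(result.failed_examples)
```

In this toy run the pass rate is 0.5, so the element is not tagged and the two failing rows are handed to the data experts for root cause analysis.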
>> Thank you for that. So, one of the things that we've noticed, and we've written about, is that you've got these hyper-specialized roles within the centralized data organization. I wonder, how do the data folks get involved here, if at all, and how frequently do they get involved? Maybe Senthilnathan, you could take that. >> Thank you, Dave, for having me here. Well, based on whether the data element in question is in the data cataloging or monitoring phase, different data folks get involved. When it is in the data cataloging stage, the data governance team, along with enterprise architecture or IT, is involved in setting up the data catalog, which includes identifying the critical data elements, business term identification, definition and documentation, data quality rules and data events setup, data domain and business line mapping, lineage tracking, source of truth, and so on and so forth. It's typically a one-time setup: review, certify, then govern and monitor. But when it is in the monitoring phase, during any data incident or data issue, IO-Tahoe broadcasts data signals to the relevant data folks to act and remedy it as quickly as possible, and alerts the consumption teams, it could be the data science, analytics or business ops teams, or all of them, of a potential issue so that they are aware and take the necessary preventative measures. Let me show you an example of a critical data element, going from the data quality dashboard view, to the lineage view, to the data 360 degree view, for a zip code conformity check. So in this case the zip code did not meet the pass threshold during the technical data quality check and was identified as a non-compliant item, and a notification was sent to the IT folks. So clicking on the zip code will take us to the lineage view to visualize the dependent systems, that is, who are the producers and who are the consumers. And further drilling down will take us to the detailed view, where a lot of other information is presented to facilitate a root cause analysis and take it to a final closure. >> Thank you for that. So, Tiji, Patrick was talking about the as-is and to-be. So I'm interested in how it's done now versus before. Do you need a data governance operating model, for example? >> Typically, a company that decides to make an inventory of its data assets would start out by manually building a spreadsheet managed by the data experts of the company. What started as a draft now gets baked into the operating model of the company. This leads to loss of collaboration, as each department makes a copy of the catalog for its specific needs. This decentralized approach leads to loss of uniformity, with each department having different definitions, which ironically needs a governance model for the data catalog itself. And as the spreadsheet grows in complexity, the skill level needed to maintain it also increases, thus leading to fewer and fewer people knowing how to maintain it. Above all, the content that took so much time and effort to build is not searchable outside of that spreadsheet document. >> Yeah, I think you really hit the nail on the head, Tiji. Now companies want to move away from the spreadsheet approach. IO-Tahoe addresses the shortcomings of the traditional approach, enabling companies to achieve more with less. >> Yeah, and what has the customer reaction been? We had Webster Bank on one of the early episodes, for example. I mean, could they have achieved what they did without something like active data quality and automation? Maybe Senthilnathan, you could address that. >> Sure. It is impossible to achieve full data quality monitoring and remediation without automation or digital workers in place. The reality is that data teams don't have the time to do the remediation manually, because they have to analyze, conform and fix any data quality issues as fast as possible before they get bigger, and Webster is no exception. That's why Webster implemented IO-Tahoe's Active DQ to set up the business metadata management and data quality monitoring and remediation in the Snowflake cloud data lake. We helped in building the center of excellence in data governance, which is managing the data catalog, plus scheduled, on-demand and in-flight data quality checks, and Snowflake's Snowpipe and Streams are super beneficial for achieving the in-flight quality checks. Then there is the data consumption monitoring and reporting, and last but not least, the time saver is persisting the non-compliant records for every data quality run within the Snowflake cloud, along with a remediation script, so that during any exceptions the respective team members are not only alerted, but also supplied with the necessary scripts and tools to perform remediation right from IO-Tahoe's Active DQ.
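The persistence step Senthilnathan mentions could look something like the following sketch: non-compliant records from a data quality run are written back to a Snowflake table together with a suggested remediation script, so the responsible team has both the evidence and the fix in one place. Table names, connection details and the remediation statement are hypothetical assumptions for the example, and the in-flight Snowpipe/Streams checks are not shown.

```python
import json
import snowflake.connector  # pip install snowflake-connector-python

def persist_non_compliant(cur, run_id: str, rule_name: str,
                          failed_records: list, remediation_sql: str) -> None:
    """Persist failing records plus a suggested remediation script for one
    data quality run (illustrative sketch only)."""
    cur.execute("""
        CREATE TABLE IF NOT EXISTS DQ_EXCEPTIONS (
            RUN_ID STRING, RULE_NAME STRING, RECORD_JSON STRING,
            REMEDIATION_SQL STRING,
            LOGGED_AT TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
        )
    """)
    cur.executemany(
        "INSERT INTO DQ_EXCEPTIONS (RUN_ID, RULE_NAME, RECORD_JSON,"
        " REMEDIATION_SQL) VALUES (%s, %s, %s, %s)",
        [(run_id, rule_name, json.dumps(rec), remediation_sql)
         for rec in failed_records],
    )

# Hypothetical usage after a zip code conformity run.
conn = snowflake.connector.connect(user="DQ_SERVICE", password="***",
                                   account="my_account", database="GOVERNANCE",
                                   schema="DQ", warehouse="DQ_WH")
persist_non_compliant(
    conn.cursor(),
    run_id="2021-05-18T02:00Z",
    rule_name="zip_code_conformity",
    failed_records=[{"contact_id": 42, "zip_code": "3030"}],
    remediation_sql="UPDATE CRM.CONTACTS SET ZIP_CODE = NULL WHERE CONTACT_ID = 42;",
)
conn.close()
```

Keeping the exceptions and the proposed fix side by side is what lets the alerted team act immediately instead of re-deriving the failure from scratch.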
>> Very nice. Okay guys, thanks for the demo. Great stuff. Now, if you want to learn more about the IO-Tahoe platform and how you can accelerate your adoption of Snowflake, book some time with a data RPA expert. All you've got to do is click on the demo icon on the right of your screen and set a meeting. We appreciate you attending this latest episode of the IO-Tahoe data automation series. Look, if you missed any of the content, it's all available on demand. This is Dave Vellante for theCUBE. Thanks for watching. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Patrick | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Tiji Matthew | PERSON | 0.99+ |
Tiji Mathew | PERSON | 0.99+ |
Senthil Karuppaiah | PERSON | 0.99+ |
Patrick Zimet | PERSON | 0.99+ |
IO-Tahoe | ORGANIZATION | 0.99+ |
Io-Tahoe | ORGANIZATION | 0.99+ |
Tiji | PERSON | 0.99+ |
360 degree | QUANTITY | 0.99+ |
Senthilnathan Karuppaiah | PERSON | 0.99+ |
each department | QUANTITY | 0.99+ |
Snowflake | TITLE | 0.99+ |
today | DATE | 0.99+ |
Webster | ORGANIZATION | 0.99+ |
Aj | PERSON | 0.99+ |
Dunkin | PERSON | 0.98+ |
Two | QUANTITY | 0.98+ |
IO | ORGANIZATION | 0.97+ |
Patrick Zeimet | PERSON | 0.97+ |
Webster Bank | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.97+ |
one time | QUANTITY | 0.97+ |
both | QUANTITY | 0.96+ |
Senthilnathan | PERSON | 0.96+ |
IO-Tahoe | TITLE | 0.93+ |
first place | QUANTITY | 0.89+ |
IO | TITLE | 0.72+ |
Snowflake | EVENT | 0.71+ |
Tahoe | ORGANIZATION | 0.69+ |
Data Solutions | ORGANIZATION | 0.69+ |
-Tahoe | TITLE | 0.64+ |
Tahoe | TITLE | 0.63+ |
Snowflake | ORGANIZATION | 0.6+ |
Morrisons | ORGANIZATION | 0.6+ |