Journey to data platform: Setting up your taxonomy

In today’s enterprises, data is the driver behind all business decisions. A data catalog is central to an enterprise’s efforts to enable its business users and data scientists to find the right data of good quality. Data with known quality and known provenance is worth its weight in gold. A key piece of cataloging is tagging data assets with business vocabulary so that they can be easily found and consumed by data users. The first step in establishing a data catalog is to have a well-defined taxonomy: business vocabulary expressed as a hierarchy of business terms, with definitions and well-established relationships among them.

The following is a process to set up your business glossary so that you can glean insights about your data and provide the most value for your business:

  • Initially concentrate on a single high-value information area, for example personally identifiable information or high-value KPIs. If the highest priority is a regulatory purpose (e.g., personally identifiable information for the California Consumer Privacy Act, CCPA), also choose another focus area that will benefit from having a taxonomy.

  • Focus on the semantic business definitions, and borrow from the language of the business in the form of logical or business intelligence models. Leverage existing lexicons or industry standards. Poll users to gather a current understanding of how such concepts and definitions are used or applied today.

  • Establish benefit, and garner interest within a selected and focused area of the organization. Demonstrate the richness of a business term: it includes relationships to information assets, links to external specifications or documentation, and details specific to the application and usage of the term. Adoption across the organization may not be immediate; however, it will follow as users realize the benefits of a single portal for defined information. Pick a single purpose where your glossary will bring the highest return to the business. If your first choice is regulatory compliance, pick a second choice to advertise the benefits of governance.

  • Finally, develop milestones for the establishment and publication of business categories and business terms, including the correct set of authors, approvers, and publishers. Authors are the domain or subject-matter experts who draft the information and its details. Approvers review, comment on, and accept or reject the drafted information. Publishers then make the approved information available to everyone within the business glossary.
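The author, approver, and publisher roles described above can be read as a small state machine. The following is a minimal Python sketch of that idea, not the behavior of any particular catalog product; the names `Status`, `TermDraft`, `review`, and `publish` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"


@dataclass
class TermDraft:
    """A business term moving through a hypothetical glossary workflow."""
    name: str
    definition: str
    author: str                      # subject-matter expert who drafted the term
    status: Status = Status.DRAFT
    comments: list = field(default_factory=list)   # (approver, comment) pairs
    published_by: Optional[str] = None

    def review(self, approver: str, accept: bool, comment: str = "") -> None:
        """An approver comments on a draft and accepts or rejects it."""
        if self.status is not Status.DRAFT:
            raise ValueError(f"only drafts can be reviewed, not {self.status.value}")
        if comment:
            self.comments.append((approver, comment))
        self.status = Status.APPROVED if accept else Status.REJECTED

    def publish(self, publisher: str) -> None:
        """A publisher makes an approved term visible in the glossary."""
        if self.status is not Status.APPROVED:
            raise ValueError("only approved terms can be published")
        self.published_by = publisher
        self.status = Status.PUBLISHED


# Example: one term going from draft to published.
term = TermDraft("Customer", "A party that purchases goods or services.", author="alice")
term.review("bob", accept=True, comment="Matches the sales definition.")
term.publish("carol")
print(term.status.value)  # published
```

Keeping the transitions explicit makes it hard to publish a term that was never reviewed, which is the point of separating the three roles.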

Some of the common steps and inputs to consider when creating a good taxonomy include the following:

  1. Begin by creating a set of hierarchical business categories to serve as the containers for the terms to be created. Each category should be unique and reflect a distinction in its definition. Examples include lines of business, geographic boundaries, and cost centers.

  2. Create a set of terms that must appear within a business category. It is good practice to additionally include a description, status, and example details.

  3. If you already have terms in categories, go through an exercise to normalize them. For example, if you have a customer name, a vendor name, and a contractor name, verify whether these are distinct terms or a single term associated with a relationship.

  4. Use the import capability to ingest pre-existing terms into the catalog.

  5. If your current terms have a 1-1 relationship with the columns in your tables, then a logical model could be used to normalize them.

  6. If you have BI reports, the items they contain usually represent something of business value, so they can directly become business terms.

  7. Set up the workflow process so that you have a set of reviewers and approvers selected from the stakeholders, and a group of editors and publishers from the stewardship organization. Agreeing on terms and their descriptions is typically aided by a meeting of stakeholders.

  8. As terms are created, set up relationships between terms (e.g., a Customer “Has a” address or “Has a” identifier).

  9. Finally, institute a further review process with key business owners and stakeholders to gather feedback and understand the key benefits they may have received.
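To make the structure concrete, the steps above (hierarchical categories, terms with a description, status, and example, and relationships such as “Has a”) can be sketched as a small data model. This is an illustrative Python sketch only; the `Category` and `Term` classes and their fields are assumptions made for the example and do not correspond to any specific catalog's import format or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """A container for terms; categories nest to form the hierarchy."""
    name: str
    parent: Optional["Category"] = None

    def path(self) -> str:
        """Return the full path from the root category down to this one."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


@dataclass
class Term:
    """A business term that lives in exactly one category."""
    name: str
    category: Category
    description: str = ""
    status: str = "draft"
    example: str = ""
    related: list = field(default_factory=list)  # (relationship, Term) pairs

    def relate(self, relationship: str, other: "Term") -> None:
        """Record a named relationship from this term to another term."""
        self.related.append((relationship, other))


# Hypothetical categories: a line of business with a geographic sub-category.
sales = Category("Sales")
emea = Category("EMEA", parent=sales)

# Terms with description and example details, linked by "Has a" relationships.
customer = Term("Customer", emea, description="A party that purchases goods or services.")
address = Term("Address", emea, description="A postal location.", example="1 Main St")
identifier = Term("Identifier", emea, description="A unique key for a party.")
customer.relate("Has a", address)
customer.relate("Has a", identifier)

print(emea.path())                                   # Sales / EMEA
print([(rel, t.name) for rel, t in customer.related])
```

Modeling “Has a” as a relationship between terms, rather than creating separate terms such as customer address and vendor address, mirrors the normalization exercise in step 3.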