Quality control

The best strategy is to combine a manual review process with
automated quality control.

Manual checks

Strategies to ensure quality through a
peer review process, i.e. the pipeline creator
and the moderator are two different people.
  • Spot check data
  • Compare against data source target
  • Code peer review

Auto checks

Strategies to detect issues in a more automated
and scalable way.
  • JSON schema validation
  • Custom validators
  • Schema editor

Manual checks

Spot checking

Simply reviewing the data visually goes a long way. If you are familiar
with the industry, your experience will help you quickly spot values that look off.

Learn more
  • Pick random data points and make sure they match the source.

  • Number of rows and checksums: if the target website
    has 15 pages with 100 records each, you need to have 1500 records in your pipeline.

  • Null checks: be aware of how many fields are null or undefined, and be explicit that you want it that way.
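The first two checks can be sketched in a few lines of Python. This is only an illustration: the `spot_check` helper, its parameters, and the sample records are hypothetical and not part of the platform.

```python
import random


def spot_check(records, expected_count, sample_size=5, seed=0):
    """Compare the row count with the expected total and draw a random sample."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-reviewed
    sample = rng.sample(records, min(sample_size, len(records)))
    return {
        "count_ok": len(records) == expected_count,
        "actual_count": len(records),
        "sample": sample,  # compare these rows by hand against the source
    }


# Hypothetical pipeline output: 15 pages with 100 records each.
records = [{"id": i, "name": f"item-{i}"} for i in range(1500)]
report = spot_check(records, expected_count=15 * 100)
print(report["count_ok"], report["actual_count"])
```

The sampled rows are meant to be compared by a person against the target website; the code only picks them and flags a count mismatch.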

Snapshot review

You can inspect the result of your code run and make sure the
snapshot makes sense to you. You can review the output
as JSON in the inline editor.

Alternatively, you can download the JSON file, or copy the
link to the snapshot JSON file and load it into another tool (like Postman).

Review statistics

You can review basic statistics for each field to see how many of its values are null or undefined.
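If you want to reproduce such statistics on a downloaded snapshot, a minimal sketch could look like the following. It assumes that "undefined" means the key is absent from a record and "null" means the value is `None`; the `field_stats` function and the sample rows are hypothetical.

```python
def field_stats(records, fields):
    """Count null (None) and undefined (missing key) values per field."""
    stats = {field: {"null": 0, "undefined": 0} for field in fields}
    for record in records:
        for field in fields:
            if field not in record:
                stats[field]["undefined"] += 1
            elif record[field] is None:
                stats[field]["null"] += 1
    return stats


# Hypothetical snapshot rows.
snapshot_rows = [
    {"name": "A", "price": 10},
    {"name": None, "price": 12},
    {"name": "C"},
]
stats = field_stats(snapshot_rows, ["name", "price"])
print(stats)
```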

Automated checks

The following automated tools are available for checking your data.

JSON Schema validator

It automatically checks, for each row and each field, whether the data
matches the schema.
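To show what a per-row, per-field check means, here is a small hand-rolled sketch that covers only the `required` and `type` keywords of JSON Schema. It is not the platform's validator and not a full JSON Schema implementation; the `validate_row` function, the schema, and the rows are assumptions made for illustration.

```python
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def validate_row(row, schema):
    """Return a list of error messages for one row (an empty list means valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in row:
            errors.append(f"missing required field: {field}")
    for field, rule in schema.get("properties", {}).items():
        if field not in row:
            continue
        expected = rule["type"]
        value = row[field]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("number", "integer"):
            ok = False
        else:
            ok = isinstance(value, TYPE_MAP[expected])
        if not ok:
            errors.append(f"{field}: expected {expected}")
    return errors


# Hypothetical schema and rows.
schema = {
    "type": "object",
    "required": ["name", "price"],
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
}
schema_rows = [{"name": "A", "price": 9.5}, {"name": "B", "price": "n/a"}, {"price": 3}]
results = [validate_row(row, schema) for row in schema_rows]
print(results)
```

Returning all errors per row, instead of stopping at the first one, makes it easier to see how widespread a problem is.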

Custom validators

We also let you define custom rules for your dataset. A custom validator is a Python program that validates the output of your code.

Learn more about custom validators
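As a rough idea of what such a program might contain, the sketch below checks two dataset-specific rules: ids must be unique and prices must be positive. The `validate` function name, its signature, and the record fields are assumptions; the actual interface expected by the platform is described in the custom validators documentation.

```python
def validate(records):
    """Hypothetical custom validator: return a list of rule violations."""
    errors = []
    seen_ids = set()
    for index, record in enumerate(records):
        record_id = record.get("id")
        if record_id in seen_ids:
            errors.append(f"row {index}: duplicate id {record_id!r}")
        seen_ids.add(record_id)
        price = record.get("price")
        if price is not None and price <= 0:
            errors.append(f"row {index}: price must be positive, got {price}")
    return errors


# Hypothetical code output to validate.
output = [{"id": 1, "price": 10}, {"id": 1, "price": 5}, {"id": 2, "price": -3}]
problems = validate(output)
print(problems)
```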

Template tuning

The most powerful tool we have is the interactive schema editor. It shows which fields your dataset is EXPECTED to have versus which fields it actually has.

You should strive to keep the schema as strict as possible.
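The expected-versus-actual comparison can be pictured as a set difference over field names. The sketch below is only an illustration of that idea, not how the schema editor is implemented; `compare_fields` and the example fields are hypothetical.

```python
def compare_fields(expected_fields, records):
    """Report fields that are expected but absent, and present but not expected."""
    actual = set()
    for record in records:
        actual.update(record.keys())
    expected = set(expected_fields)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Hypothetical expected fields and dataset.
diff = compare_fields(
    ["name", "price", "url"],
    [{"name": "A", "price": 1, "colour": "red"}],
)
print(diff)
```

With a strict schema, both lists should be empty; any entry in either list is a signal to fix the pipeline or update the schema deliberately.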