Custom validators

One of the most important goals of the data platform is to ensure data quality.
One of the mechanisms is using JSON schema, as discussed above.
However, we also provide ability to provide custom rules for your dataset.
Here are some scenarios when it may be useful.

Contextual validation. While JSON schema works for most scenarios when we need to validate specific value (for example, number of rooms in apartment), it is impossible to validate context (for example that when room number is 1, area should be less than 50 square meters).

Custom logics. You may need very custom logic, for example normalize value (make string lowercase and then check if its one of the list).
import sys
import json


apartments = json.loads(sys.stdin.read())


# CONFIG START
WARNING_PRICE_MIN = 1500
WARNING_PRICE_MAX = 6000


ERROR_PRICE_MIN = 500
ERROR_PRICE_MAX = 10000
# CONFIG END


has_price = False
for apartment in apartments:
 flat_id = apartment.get('id')
 price = apartment.get('price')
 area = apartment.get('area')


 price_sqm = None
 if area and price:
   has_price = True
   price_sqm = price / area
   if price_sqm < ERROR_PRICE_MIN:
     raise Exception('Price for %s too low %f' % (flat_id, price_sqm))
   elif price_sqm > ERROR_PRICE_MAX:
     raise Exception('Price for %s too high %f' % (flat_id, price_sqm))


   elif price_sqm < WARNING_PRICE_MIN:
     print('Price for %s MAY BE too low %f' % (flat_id, price_sqm))
   elif price_sqm > WARNING_PRICE_MAX:
     print('Price for %s MAY BE too high %f' % (flat_id, price_sqm))

Reading the data

Simply load data from stdin.

import sys
import json


apartments = json.loads(sys.stdin.read())

Warning level validation

For warnings print error message to console

 elif price_sqm < WARNING_PRICE_MIN:
     print('Price for %s MAY BE too low %f' % (flat_id, price_sqm))

Error level validation

For errors simply throw exception

elif price_sqm > ERROR_PRICE_MAX:
     raise Exception('Price for %s too high %f' % (flat_id, price_sqm))

Error & warnings

It is possible scenario that validator triggers both warning and error.
In that case higher importance (error) level validation is shown in our tool.

Testing your validator

Now you have your validator code its time to test it! We recommend to test with valid and invalid scenario to make sure it works. First lets create valid JSON file APARTMENTS_VALID.json:

[{‘area’: 50, ‘price’: 200000, ‘id’: ‘1’}]

Test it by piping via command line using terminal:

>>
>> python3 price_validator.py < APARTMENTS_VALID.json
>>

Since data in JSON is valid, as expected nothing will be printed. Now lets created invalid data sample APARTMENTS_INVALID.JSON

[{‘area’: 50, ‘price’: 20000000, ‘id’: ‘1’}]

Lets run it in command line

>>
>> python3 price_validator.py < APARTMENTS_INVALID.json
>> Price for 1 too high 200000000!

We have just completed writing and testing our first validator! Good luck creating new rules to ensure data quality.