One of the most important goals of the data platform is to ensure data quality.
One mechanism for this is JSON Schema, as discussed above.
However, we also let you define custom validation rules for your dataset.
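Here is a scenario where this may be useful: a validator that checks each apartment's price per square meter against configured warning and error thresholds.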
import sys
import json

# Read the whole dataset (a JSON array of apartments) from stdin.
apartments = json.loads(sys.stdin.read())

# CONFIG START
WARNING_PRICE_MIN = 1500
WARNING_PRICE_MAX = 6000
ERROR_PRICE_MIN = 500
ERROR_PRICE_MAX = 10000
# CONFIG END

has_price = False
for apartment in apartments:
    flat_id = apartment.get('id')
    price = apartment.get('price')
    area = apartment.get('area')
    price_sqm = None
    # Only apartments with both a price and an area can be checked.
    if area and price:
        has_price = True
        price_sqm = price / area
        # Errors are raised as exceptions, warnings are printed.
        if price_sqm < ERROR_PRICE_MIN:
            raise Exception('Price for %s too low %f' % (flat_id, price_sqm))
        elif price_sqm > ERROR_PRICE_MAX:
            raise Exception('Price for %s too high %f' % (flat_id, price_sqm))
        elif price_sqm < WARNING_PRICE_MIN:
            print('Price for %s MAY BE too low %f' % (flat_id, price_sqm))
        elif price_sqm > WARNING_PRICE_MAX:
            print('Price for %s MAY BE too high %f' % (flat_id, price_sqm))
First, load the data from stdin:
import sys
import json
apartments = json.loads(sys.stdin.read())
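The loading step above assumes that stdin holds a JSON array of apartment objects. A slightly more defensive loader, sketched below, fails with a clear message when that assumption does not hold. The helper name `load_apartments` is hypothetical and not part of the platform; a `StringIO` stands in for stdin so the sketch can run on its own.

```python
import io
import json


def load_apartments(stream):
    """Parse a JSON array of apartment objects from a text stream.

    `load_apartments` is a hypothetical helper, not a platform API.
    """
    data = json.loads(stream.read())
    if not isinstance(data, list):
        raise ValueError('Expected a JSON array of apartments, got %s'
                         % type(data).__name__)
    return data


# In a validator this would be called as load_apartments(sys.stdin);
# here a StringIO stands in for stdin.
sample = io.StringIO('[{"area": 50, "price": 200000, "id": "1"}]')
apartments = load_apartments(sample)
print(len(apartments))
```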
For warnings, print a message to the console:
elif price_sqm < WARNING_PRICE_MIN:
    print('Price for %s MAY BE too low %f' % (flat_id, price_sqm))
For errors, simply raise an exception:
elif price_sqm > ERROR_PRICE_MAX:
    raise Exception('Price for %s too high %f' % (flat_id, price_sqm))
It is possible for a validator to trigger both a warning and an error.
In that case, the result with the higher severity (the error) is shown in our tool.
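The precedence rule can be illustrated with a small sketch that classifies every apartment and keeps only the highest severity seen. This is an illustration of the rule under the thresholds used above, not how the tool itself is implemented; the names `SEVERITY`, `classify` and `overall_level` are hypothetical.

```python
WARNING_PRICE_MIN = 1500
WARNING_PRICE_MAX = 6000
ERROR_PRICE_MIN = 500
ERROR_PRICE_MAX = 10000

# Hypothetical ranking: a higher number means a higher severity.
SEVERITY = {'ok': 0, 'warning': 1, 'error': 2}


def classify(price_sqm):
    """Return 'error', 'warning' or 'ok' for one price per square meter."""
    if price_sqm < ERROR_PRICE_MIN or price_sqm > ERROR_PRICE_MAX:
        return 'error'
    if price_sqm < WARNING_PRICE_MIN or price_sqm > WARNING_PRICE_MAX:
        return 'warning'
    return 'ok'


def overall_level(apartments):
    """Return the highest severity found across all apartments."""
    level = 'ok'
    for apartment in apartments:
        price = apartment.get('price')
        area = apartment.get('area')
        if not (area and price):
            continue  # apartments without price or area are skipped
        level = max(level, classify(price / area), key=SEVERITY.get)
    return level
```

With one apartment in the warning range and another in the error range, `overall_level` returns `'error'`, which mirrors the behaviour described above.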
Now that you have your validator code, it's time to test it! We recommend testing with both a valid and an invalid scenario to make sure it works. First, let's create a valid JSON file, APARTMENTS_VALID.json:
[{"area": 50, "price": 200000, "id": "1"}]
Test it by piping the file into the validator from a terminal:
>>
>> python3 price_validator.py < APARTMENTS_VALID.json
>>
Since the data is valid (200000 / 50 = 4000, which lies inside the warning range), nothing is printed, as expected. Now let's create an invalid data sample, APARTMENTS_INVALID.json:
[{"area": 50, "price": 20000000, "id": "1"}]
Let's run it from the command line:
>>
>> python3 price_validator.py < APARTMENTS_INVALID.json
>> Traceback (most recent call last):
>>   ...
>> Exception: Price for 1 too high 400000.000000
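The two manual checks can also be automated. The sketch below runs a validator script as a subprocess and feeds it a dataset on stdin, the same way the shell redirection does. The helper name `run_validator` is hypothetical, and the sketch assumes the script is runnable with the same Python interpreter.

```python
import json
import subprocess
import sys


def run_validator(script_path, apartments):
    """Run a validator script with `apartments` as JSON on stdin.

    Returns (exit code, stdout, stderr). `run_validator` is a
    hypothetical helper, not part of the platform.
    """
    result = subprocess.run(
        [sys.executable, script_path],
        input=json.dumps(apartments),
        capture_output=True,  # collect stdout and stderr separately
        text=True,            # exchange str instead of bytes
    )
    return result.returncode, result.stdout, result.stderr
```

For the valid sample, `run_validator('price_validator.py', [{'area': 50, 'price': 200000, 'id': '1'}])` should give exit code 0 and an empty stdout. For the invalid sample the exit code should be non-zero, because an uncaught exception makes Python exit with status 1, and the message appears in stderr.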
We have just completed writing and testing our first validator! Good luck creating new rules to ensure data quality.