.. _benchmarks:
Benchmarks
==========
ValidX is the fastest validation library among the following competitors.
* `Cerberus 1.3.4 `_ ~145x slower
* `Colander 2.0 `_ ~3x slower
* `JSONSchema 4.17.3 `_ ~20x slower
* `Marshmallow 3.19.0 `_ ~14x slower
* `Pydantic 1.10.8 `_ ~5x slower
* `Schema 0.7.5 `_ ~30x slower
* `Voluptuous 0.13.1 `_ ~3.5x slower
The following competitors have been excluded from the benchmark,
because the libraries do not work on Python >= 3.10.
* `Valideer 0.4.2 `_
had comparable performance with the pure-Python implementation of ValidX.
Excluded until `issue #27 `_ is fixed.
* `Validr 1.2.1 `_
had comparable performance with the Cython implementation of ValidX.
Excluded until `issue #60 `_ is fixed.
Use the following command to run benchmarks::

    make benchmarks

I got the following results on my laptop:
* CPU Intel i7-1260P
* RAM 32GB
* OS Xubuntu 22.04.2, Linux kernel 5.15.0-72-generic
* Python 3.10.6
::

    ---------------------------------------------- benchmark: 9 tests -----------------------------------------------
    Name (time in us)                Min                  Max               Mean              StdDev     OPS (Kops/s)
    -----------------------------------------------------------------------------------------------------------------
    test_validx_cy       1.8540 (1.0)         6.6900 (1.0)       2.0220 (1.0)       0.1247 (1.0)      494.5673 (1.0)
    test_validx_py       3.5870 (1.93)       11.1630 (1.67)      4.0128 (1.98)      0.2350 (1.88)     249.2040 (0.50)
    test_colander        5.9800 (3.23)       19.6410 (2.94)      6.6070 (3.27)      0.3332 (2.67)     151.3540 (0.31)
    test_voluptuous      7.0590 (3.81)       18.3420 (2.74)      7.6800 (3.80)      0.3089 (2.48)     130.2080 (0.26)
    test_pydantic        8.7520 (4.72)       23.0670 (3.45)     10.5461 (5.22)      0.5650 (4.53)      94.8216 (0.19)
    test_marshmallow    26.5630 (14.33)      47.8270 (7.15)     28.7742 (14.23)     0.9160 (7.34)      34.7533 (0.07)
    test_jsonschema     44.2580 (23.87)      62.9430 (9.41)     47.4968 (23.49)     1.3421 (10.76)     21.0540 (0.04)
    test_schema         61.0670 (32.94)      82.3220 (12.31)    65.3104 (32.30)     1.5263 (12.24)     15.3115 (0.03)
    test_cerberus      250.4110 (135.07)  6,304.0850 (942.31)  295.1710 (145.98)  207.4218 (>1000.0)    3.3879 (0.01)
    -----------------------------------------------------------------------------------------------------------------

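
The relative factors in the Mean column and the OPS values can be re-derived from the mean times.
The following is a minimal sketch and not part of the benchmark suite:
the ``MEAN_US`` dictionary and both helper functions are illustrative names,
and the values are copied from the table above:

.. code-block:: python

    # Mean time per call in microseconds, copied from the benchmark table.
    MEAN_US = {
        "validx_cy": 2.0220,
        "validx_py": 4.0128,
        "colander": 6.6070,
        "voluptuous": 7.6800,
        "pydantic": 10.5461,
        "marshmallow": 28.7742,
        "jsonschema": 47.4968,
        "schema": 65.3104,
        "cerberus": 295.1710,
    }


    def slowdown(name, baseline="validx_cy"):
        """Return how many times slower ``name`` is than ``baseline``."""
        return MEAN_US[name] / MEAN_US[baseline]


    def kops_per_second(name):
        """Convert a mean time in microseconds to thousands of calls per second."""
        return 1_000 / MEAN_US[name]


    for name in MEAN_US:
        print(f"{name:<12} {slowdown(name):7.2f}x {kops_per_second(name):8.2f} Kops/s")

The derived OPS values can differ from the table in the last digits,
because the mean times in the table are already rounded.
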
Why you should care about performance
-------------------------------------
.. note::
    I got tired of updating the numbers in this section on each release,
    so I decided to give up and leave them as they are.
    The numbers here are outdated and are no longer based on the benchmark above.
    But that doesn't change the main point:
    performance is important.

I have been asked by my colleagues:
“Why should we care about performance?
Data validation is usually not a bottleneck.”
And that is correct.
But let's look at it from another side.
Let's say you have a web application that uses Cerberus for data validation,
because Cerberus is number one in `7 Best Python Libraries for Validating Data`_.
How much will you gain by replacing Cerberus with ValidX?
According to the benchmark above, Cerberus spends 808 μs on each request,
while ValidX spends only 2 μs.
So you will save 806 μs on each request.
How much is it?
If you have a small webserver that takes about 200 requests per second
(I took the number from this `discussion on Stack Overflow`_),
you will save::

    806 μs × 200 × 60 × 60 × 24 = 13927.68 s/day
    13927.68 ÷ 60 ÷ 60 = 3.8688 h/day

Yes,
you will save almost 4 hours of server time daily,
or almost 5 days monthly!
That is about $5 monthly for each general-purpose ``t3.medium`` instance on AWS_,
which costs $0.0416 per hour.
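
The estimate above can be reproduced for your own traffic with a few lines of Python.
This is a minimal sketch: the function name and its parameters are hypothetical,
the defaults are simply the figures used in this section,
and the 30-day month is an assumption made here for the monthly totals:

.. code-block:: python

    def monthly_savings(saved_us=806, requests_per_second=200,
                        price_per_hour=0.0416, days=30):
        """Estimate the server time and money saved per month.

        Returns a tuple:
        (hours saved per day, hours saved per month, dollars saved per month).
        The 30-day month is an assumption, not a figure from the benchmark.
        """
        # Microseconds saved per request -> seconds saved per day.
        seconds_per_day = saved_us * 1e-6 * requests_per_second * 60 * 60 * 24
        hours_per_day = seconds_per_day / 3600
        hours_per_month = hours_per_day * days
        return hours_per_day, hours_per_month, hours_per_month * price_per_hour


    hours_per_day, hours_per_month, dollars = monthly_savings()
    print(f"{hours_per_day:.2f} h/day, "
          f"{hours_per_month / 24:.2f} days/month, "
          f"${dollars:.2f}/month")
    # prints: 3.87 h/day, 4.84 days/month, $4.83/month

Pass your own request rate and instance price to see what the same per-request saving means for your setup.
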
And now it is time to look at your logs,
calculate the number of requests you got in the last month,
and compare it with a bill from your hosting provider.
.. _7 Best Python Libraries for Validating Data: https://www.yeahhub.com/7-best-python-libraries-validating-data/
.. _discussion on Stack Overflow: https://stackoverflow.com/questions/1319965/how-many-requests-per-minute-are-considered-heavy-load-approximation
.. _AWS: https://aws.amazon.com/ec2/pricing/on-demand/