Benchmarks

ValidX is the fastest validation library among the competitors benchmarked below.

The following competitors have been excluded from the benchmark because they do not work on Python >= 3.10.

  • Valideer 0.4.2 had performance comparable to the pure-Python implementation of ValidX. Excluded until issue #27 is fixed.
  • Validr 1.2.1 had performance comparable to the Cython implementation of ValidX. Excluded until issue #60 is fixed.

Use the following command to run benchmarks:

make benchmarks

I got the following results on my laptop:

  • CPU Intel i7-1260P
  • RAM 32GB
  • OS Xubuntu 22.04.2, Linux kernel 5.15.0-72-generic
  • Python 3.10.6

----------------------------------------------------- benchmark: 9 tests -----------------------------------------------------
Name (time in us)          Min                   Max                Mean              StdDev            OPS (Kops/s)
------------------------------------------------------------------------------------------------------------------------------
test_validx_cy          1.8540 (1.0)          6.6900 (1.0)        2.0220 (1.0)        0.1247 (1.0)          494.5673 (1.0)
test_validx_py          3.5870 (1.93)        11.1630 (1.67)       4.0128 (1.98)       0.2350 (1.88)         249.2040 (0.50)
test_colander           5.9800 (3.23)        19.6410 (2.94)       6.6070 (3.27)       0.3332 (2.67)         151.3540 (0.31)
test_voluptuous         7.0590 (3.81)        18.3420 (2.74)       7.6800 (3.80)       0.3089 (2.48)         130.2080 (0.26)
test_pydantic           8.7520 (4.72)        23.0670 (3.45)      10.5461 (5.22)       0.5650 (4.53)          94.8216 (0.19)
test_marshmallow       26.5630 (14.33)       47.8270 (7.15)      28.7742 (14.23)      0.9160 (7.34)          34.7533 (0.07)
test_jsonschema        44.2580 (23.87)       62.9430 (9.41)      47.4968 (23.49)      1.3421 (10.76)         21.0540 (0.04)
test_schema            61.0670 (32.94)       82.3220 (12.31)     65.3104 (32.30)      1.5263 (12.24)         15.3115 (0.03)
test_cerberus         250.4110 (135.07)   6,304.0850 (942.31)   295.1710 (145.98)   207.4218 (>1000.0)        3.3879 (0.01)
------------------------------------------------------------------------------------------------------------------------------
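The columns of the report are related in a simple way: the number in parentheses is the ratio to the fastest row, and OPS is the reciprocal of the mean time. Here is a minimal sketch that recomputes both from a few of the mean times above. The helper names are illustrative and are not part of ValidX or of the benchmark tooling, and the results match the report only to within rounding, since the report presumably works from unrounded timings.

```python
# Mean times in microseconds, copied from the report above.
MEAN_US = {
    "test_validx_cy": 2.0220,
    "test_validx_py": 4.0128,
    "test_cerberus": 295.1710,
}


def kops_per_second(mean_us: float) -> float:
    """Thousands of operations per second for a mean time given in microseconds."""
    # 1 second = 1,000,000 us, so ops/s = 1e6 / mean_us and Kops/s = 1e3 / mean_us.
    return 1_000.0 / mean_us


def relative_to_fastest(mean_us: float, fastest_us: float) -> float:
    """How many times slower a row is than the fastest row."""
    return mean_us / fastest_us


fastest = min(MEAN_US.values())
for name, mean in MEAN_US.items():
    print(
        f"{name:16s} {kops_per_second(mean):9.2f} Kops/s  "
        f"({relative_to_fastest(mean, fastest):.2f})"
    )
```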

Why you should care about performance

Note

I got tired of updating the numbers in this section on each release, so I gave up. The numbers here are outdated and no longer based on the benchmark above. But that doesn’t change the main point: performance is important.

My colleagues have asked me: “Why should we care about performance? Data validation is not usually a bottleneck.” And that is correct. But let’s look at it from the other side.

Let’s say you have a web application that uses Cerberus for data validation, because Cerberus is number one in 7 Best Python Libraries for Validating Data. How much will you save by replacing Cerberus with ValidX?

According to the benchmark above, Cerberus spends 808 μs on each request, while ValidX spends only 2 μs. So you will save 806 μs per request. How much is that?

If you have a small web server that handles about 200 requests per second (I took the number from this discussion on Stack Overflow), you will save:

806 μs × 200 × 60 × 60 × 24 = 13927.68 s/day
13927.68 ÷ 60 ÷ 60 = 3.8688 h/day

Yes, you will save almost 4 hours of server time daily, or almost 5 days monthly! That is about $5 per month for each general-purpose t3.medium instance on AWS, which costs $0.0416 per hour.
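The same arithmetic can be wrapped in a couple of functions, so you can plug in your own request rate and instance price. This is only a sketch: the function names are made up for illustration, and the 30-day month is an assumption used to turn hours per day into a monthly figure.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def saved_hours_per_day(saved_us_per_request: float, requests_per_second: float) -> float:
    """Server time saved per day, in hours."""
    saved_seconds = saved_us_per_request * 1e-6 * requests_per_second * SECONDS_PER_DAY
    return saved_seconds / 3600


def saved_dollars_per_month(
    hours_per_day: float,
    price_per_hour: float,
    days_per_month: int = 30,  # assumption: a 30-day month
) -> float:
    """Cost of the saved server time per month, in dollars."""
    return hours_per_day * days_per_month * price_per_hour


# Numbers from the text: 806 us saved per request, 200 requests per second,
# and $0.0416 per hour for a t3.medium instance.
hours = saved_hours_per_day(806, 200)
print(f"{hours:.4f} h/day")
print(f"${saved_dollars_per_month(hours, 0.0416):.2f} per month")
```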

Now it is time to look at your logs, count the requests you received last month, and compare that with the bill from your hosting provider.