Hi @vovavili, thanks for your detailed post! Short answer: yes! Let's chip away at getting feature parity between the built-in checks in GE and pandera. The 5 checks you listed are easy to implement in pandera, and I'll open an issue for them. @vovavili, would you mind helping prioritize which GE checks you'd like supported in pandera?
Hello, gentlemen!
I've been using this package for work-related tasks for quite some time now, and I find it useful beyond belief. Compared with something like Great Expectations (another data validation suite I have to use at work), Pandera is lightweight and easy to set up and extend, and it does its job in a highly intuitive way, with fantastic integration with the hypothesis package. With Great Expectations, I had to read the documentation multiple times just to begin to understand how to configure it properly.
However, one thing I really like about GE is that it has a wide assortment of built-in checks, whereas working with Pandera sometimes makes me write custom checks for what seem like common validation tasks. While this is not too cumbersome, I think Pandera as a whole would benefit from bundling as many common validation tasks as possible, leaving custom checks as a resort for less common operations. That would push Pandera's already high ease of use even further.
For example, here are some checks from GE that I rely on frequently in my workflow and that I think are common enough to be considered for inclusion in Pandera as well:
- Expect datetime strings in a given column to follow a specific format
- Expect all values in a column to be unique; also this
- The ability to operate specifically on a column's min, max, and average values
- For a pair of columns, expect the value in column n1 to be greater than the value in column n2
- Checks on row order, i.e. expect column values to be increasing/decreasing
One common thread among the checks outlined above is that most of them could be fairly simple custom wide checks, whereas Pandera's built-in checks all deal with tidy data. I think there is ample room for improvement here as well, since I don't see why most common validation operations couldn't be cross-column.
Would you all agree?
What other common validation operations do you think could be bundled into Pandera?
Would you agree with a proposal that some wide checks should be built in as well?
Thank you all in advance for your input, thoughts and opinions.