After days of digging into it and trying to understand what's going on, I have concluded that the checksum comparison in the 12.X.X test cases sometimes fails. The failures are deterministic, but I don't know what they depend on.
I am talking about this line.
It happens for both text and binary cases. I used the Docker testsuite (latest tag as of 4 August 2023). We tried both echoing frames and echoing whole messages. In both cases, some test cases fail.
The client definitely receives the correct messages and echoes them back correctly. All of that has been checked thoroughly. What's interesting is that most of the 12.X.X test cases work perfectly fine in all environments, while a few fail on some operating systems and a different few fail on others. Within a given operating system, the set of failing cases is consistent and deterministic.
But the most interesting bit is that exactly equivalent frames (and messages) work in other test cases (as similar ones are repeated throughout the suite). Again, deterministic.
Without knowing too much about implementation specifics, I think one of two things is happening:
- This leads to different hashes and doesn't accept correct messages from the client as it's the other way round on the other end.
- It's a compression thing where the uncompressed message is correct but the testsuite checks the compressed message. There might be cases where deflate is not deterministic but the uncompressed data is still correct.
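To illustrate the second idea, here is a minimal sketch using Python's standard `zlib` module. It is not taken from the testsuite; the helper names `raw_deflate` and `raw_inflate` and the sample payload are made up for the example. It only shows that two valid deflate encodings of the same data can differ byte for byte while still inflating to identical content.

```python
import zlib

def raw_deflate(payload: bytes, level: int) -> bytes:
    # Hypothetical helper: negative wbits selects raw deflate (no zlib header),
    # which is the stream format permessage-deflate is built on.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)

def raw_inflate(data: bytes) -> bytes:
    # Hypothetical helper: inflate a raw deflate stream.
    return zlib.decompressobj(-15).decompress(data)

payload = b"Hello WebSocket " * 64

# Level 0 emits stored (uncompressed) blocks, level 9 compresses as hard as it can.
stored = raw_deflate(payload, 0)
best = raw_deflate(payload, 9)

# The bytes on the wire differ...
print(stored == best)
# ...but both streams carry exactly the same message.
print(raw_inflate(stored) == payload and raw_inflate(best) == payload)
```

If the testsuite hashed the compressed bytes, a client using a different deflate implementation or settings could fail even though its echo is correct. Whether this is what actually happens is an assumption on my side.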
I don't know too much about the testsuite and the implementation details, so it's difficult for me to rule out either of those 2 ideas and also to come up with others.
Would there be any way to change how this comparison works to fix this? A full comparison of the uncompressed message, for example? Even if temporary, we would like to continue testing the 12.X.X test cases.
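A rough sketch of what I mean by comparing the uncompressed message, again assuming raw deflate streams and Python's `zlib`. The function names (`deflate`, `inflate`, `digest_of_uncompressed`, `echo_matches`) and the choice of SHA-256 are hypothetical and not how the testsuite is known to work.

```python
import hashlib
import zlib

def deflate(payload: bytes, level: int) -> bytes:
    # Hypothetical helper, only used to produce sample input for the comparison.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)

def inflate(compressed: bytes) -> bytes:
    # Inflate a raw deflate stream back to the original message bytes.
    return zlib.decompressobj(-15).decompress(compressed)

def digest_of_uncompressed(compressed: bytes) -> str:
    # Hash the message content, not its compressed encoding.
    return hashlib.sha256(inflate(compressed)).hexdigest()

def echo_matches(sent_compressed: bytes, received_compressed: bytes) -> bool:
    # The echo counts as correct when both sides carry the same uncompressed message.
    return digest_of_uncompressed(sent_compressed) == digest_of_uncompressed(received_compressed)

message = b"autobahn test payload " * 32
sent = deflate(message, 9)           # what the testsuite might send
echoed = deflate(message, 0)         # same message, encoded differently by the client
corrupted = deflate(message + b"!", 9)

print(echo_matches(sent, echoed))
print(echo_matches(sent, corrupted))
```

With a check like this, differences in how each side compresses would no longer matter, while a genuinely wrong echo would still be caught.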
If there is anything I can provide to help debug this, I will happily do so.