Skip to content

Fix endian serialization for EventPipe and TraceLog on big-endian hosts#2429

Open
saitama951 wants to merge 2 commits into
microsoft:mainfrom
saitama951:fix_endian_serialization
Open

Fix endian serialization for EventPipe and TraceLog on big-endian hosts#2429
saitama951 wants to merge 2 commits into
microsoft:mainfrom
saitama951:fix_endian_serialization

Conversation

@saitama951

Copy link
Copy Markdown

Currently dotnet-trace is broken on s390x because
FastSerialization, SpanReader, and TraceLog all wrote and read multi-byte integers using raw struct copies or BitConverter without normalising byte order, producing corrupt .etlx files and misread EventPipe streams when running on a big-endian machine a.k.a s390x.

@saitama951 saitama951 requested a review from a team as a code owner May 27, 2026 17:06
@saitama951

Copy link
Copy Markdown
Author

@microsoft-github-policy-service agree company="IBM"

@saitama951 saitama951 force-pushed the fix_endian_serialization branch from 9ea4ec5 to d4f19c5 Compare May 27, 2026 17:24
@brianrob

Copy link
Copy Markdown
Member

@saitama951, thank you for contributing this change. I think to add support for big endian targets, we'd need to do something at the FastSerialization level to understand the endianness of the file vs. the endianness of the host. Then, we can swap in read/write primitives that can handle the differences.

From what I'm seeing in your change, it looks like there are a attempts to read in one endianness, and if that doesn't work, read in the other. I think we need a pattern that doesn't materially change the development pattern, but allows us to do the right thing based on the file target and current host.

I'd also like to make sure that we have some big endian tests as well so that we can make sure we don't break this support in the future.

Would you be up for looking at this, and proposing a design? I think this would be a good path forward before writing more code, so that we can make sure we're aligned on design.

@saitama951

Copy link
Copy Markdown
Author

From what I'm seeing in your change, it looks like there are a attempts to read in one endianness, and if that doesn't work

Thank you for taking a look. I don't quite understand this. Can you please point to an example in the diff?

@brianrob

brianrob commented Jun 2, 2026

Copy link
Copy Markdown
Member

From what I'm seeing in your change, it looks like there are a attempts to read in one endianness, and if that doesn't work

Thank you for taking a look. I don't quite understand this. Can you please point to an example in the diff?

Sorry, I mis-read a part of your change. I was looking at SpanReader.cs at the time.

@brianrob

brianrob commented Jun 2, 2026

Copy link
Copy Markdown
Member

Do I read correctly that this change makes its endian decision based on the current machine and not the architecture of the machine where the trace was captured?

@saitama951

saitama951 commented Jun 3, 2026

Copy link
Copy Markdown
Author

The .nettrace file format specifies that every primitive type should be in little-endian format,
we have already added the support for eventpipe on s390x in dotnet/runtime which already writes in little-endian format in the .nettrace file (#68648).

now when converting this .nettrace into something readable file format i.e speedscope this fails.
for example in an .nettrace file we have the fast serialization header which writes the length i.e 20 and followed by the string "!FastSerialization.1"

00000000  4e 65 74 74 72 61 63 65  14 00 00 00 21 46 61 73  |Nettrace....!Fas|
00000010  74 53 65 72 69 61 6c 69  7a 61 74 69 6f 6e 2e 31  |tSerialization.1|

now this length of this string in hex present in the .nettrace file is 14 00 00 00 (both in little-endian and big-endian system)
now when this is being read by the TraceEvent library here (link). here the length would result as 00 00 00 14 on a little-endian machine while on a big-endian machine would result in 14 00 00 00 (which is incorrect). so we make use of BinaryPrimitives to reverse this value only on s390x.

now when again writing back the string length into the .etlx file i.e (link) we write back the length before writing the string (link) now when writing on a little endian machine this would write as 14 00 00 00 while on a big-endian machine this would be written as 00 00 00 14, which is again incorrect.

As for the tests, In my opinion the existing tests should suffice for big endian system (since test everything in little-endian format) in case something fails we know we are not writing in the correct endian-format.

SwapEventRecordEndianness(buffer);
fixed (byte* bufferPtr = buffer)
{
WriteBlob((IntPtr)bufferPtr, writer, headerSize);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brianrob,
based on you're suggestion I have made most of the changes in the StreamReadWriter file, although I had one concern while writing Int32 to the etlx file, we try to copy all the raw events into the file as a blob which writes in-terms of 32 bit pointer values of the EventRecord,
do you want to segregate WriteBlob as a different implementation (doesn't call write(int) from StreamReadWriter, rather copy its implementation) or keep the BinaryPrimitives.ReadInt32 as it is present in the patch?

@saitama951 saitama951 force-pushed the fix_endian_serialization branch 8 times, most recently from d773612 to 5d78759 Compare June 11, 2026 13:43
@saitama951

Copy link
Copy Markdown
Author

I have re-factored the entire patch, please do review and let me know if any further changes are required?

@saitama951 saitama951 force-pushed the fix_endian_serialization branch from 5d78759 to a4a2635 Compare June 11, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants