
Excel shared string pool parse bug (string reuse) and weird blank Excel cells

Morning.

My colleagues have managed to produce some weird .xlsx files again.

In the shared string pool, string IDs 85 and 86 are as follows. (screenshot)

The cells that appear blank when I open the file in LibreOffice refer to string ID 86. (screenshot)

When I read the file with Orcus, I see that "Ei tiedossa" is read twice. (screenshot)

From reading Orcus's shared string pool reader, my guess is that "<t/>" does not clear xlsx_shared_strings_context::m_cur_str, so when "<si/>" is hit, the previous string's contents are reused.
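
To make the suspected failure mode concrete, here is a minimal sketch of a shared-strings handler that resets its buffer at the start of every "<si>". It assumes a simplified callback model (start of "<si>", character data from "<t>", end of "<si>"); the class and method names are hypothetical and are not Orcus's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified model of a shared-strings (sst) handler.
// This is NOT orcus's xlsx_shared_strings_context; names are illustrative.
class shared_strings_sketch
{
public:
    // Called when <si> starts (also for a self-closing <si/>).
    void start_si()
    {
        assert(!m_in_si);
        m_in_si = true;
        m_cur_str.clear(); // reset so nothing leaks from the previous <si>
    }

    // Called for character data inside <t>; never called for an empty <t/>.
    void characters(std::string_view s)
    {
        assert(m_in_si);
        m_cur_str.append(s);
    }

    // Called when </si> is reached (or right after <si/>).
    // Always stores an entry, even an empty one, so IDs stay aligned
    // with the position of each <si> in the pool.
    std::size_t end_si()
    {
        assert(m_in_si);
        m_in_si = false;
        m_pool.push_back(m_cur_str);
        return m_pool.size() - 1;
    }

    const std::string& get(std::size_t id) const { return m_pool.at(id); }

private:
    std::string m_cur_str;
    std::vector<std::string> m_pool;
    bool m_in_si = false;
};
```

Without the clear() call in start_si(), an "<si><t/></si>" following "Ei tiedossa" would be stored as "Ei tiedossa" again, which is the same symptom as described above.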

Beyond that, I assume there are no further guards for the case where a blank cell's shared string is an empty string. What is your opinion on that? Either the document's string factory or the parser could handle this by explicitly storing the ID of the empty string.
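
A factory-side guard could look roughly like this: a hypothetical interning pool that pre-registers the empty string under a fixed ID, so a consumer can recognise an empty shared string by its ID alone. This is an illustration under that assumption, not how Orcus's string pool is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical interning pool: identical strings share one ID, and the
// empty string is registered first so its ID is known up front.
class interning_pool
{
public:
    static constexpr std::size_t empty_id = 0;

    interning_pool() { intern(""); } // reserves empty_id for ""

    std::size_t intern(std::string_view s)
    {
        std::string key(s);
        auto it = m_index.find(key);
        if (it != m_index.end())
            return it->second; // already stored: reuse the existing ID

        std::size_t id = m_strings.size();
        m_strings.push_back(key);
        m_index.emplace(std::move(key), id);
        return id;
    }

    bool is_empty_id(std::size_t id) const { return id == empty_id; }

    const std::string& get(std::size_t id) const
    {
        assert(id < m_strings.size());
        return m_strings[id];
    }

private:
    std::vector<std::string> m_strings;
    std::unordered_map<std::string, std::size_t> m_index;
};
```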

I would fix it in the parser, since my colleagues edited these files in some Excel-like program that did not fix this on resave. LibreOffice does fix it when the file is resaved.

Edited by Henrik Valve