Bug 160985 - 16-digits integer values rounded to 15 digits when saving to OOXML .xlsx
Summary: 16-digits integer values rounded to 15 digits when saving to OOXML .xlsx
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.0.0.3 release
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:24.8.0 target:24.2.4
Keywords: bibisectNotNeeded, filter:xlsx, regression
Depends on:
Blocks:
 
Reported: 2024-05-08 08:58 UTC by Andrey
Modified: 2024-05-15 19:30 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Original file (133 bytes, text/csv)
2024-05-08 08:59 UTC, Andrey
Details
Result file (5.56 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-05-08 08:59 UTC, Andrey
Details
Screenshot of two files (468.54 KB, image/png)
2024-05-08 11:31 UTC, Andrey
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey 2024-05-08 08:58:33 UTC
Description:
When converting from csv to xlsx, the cell value does not match the original file

Steps to Reproduce:
1. Open csv in spreadsheet
2. Save copy to xlsx
3. Check B3 cell value on csv and in xlsx

Actual Results:
The cell value B3 does not match the original file

Expected Results:
The cell value B3 will match the original file


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.2.3.2 (X86_64) / LibreOffice Community
Build ID: 433d9c2ded56988e8a90e6b2e771ee4e6a5ab2ba
CPU threads: 16; OS: macOS 14.4.1; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 1 Andrey 2024-05-08 08:59:27 UTC
Created attachment 194022 [details]
Original file
Comment 2 Andrey 2024-05-08 08:59:54 UTC
Created attachment 194023 [details]
Result file
Comment 3 ady 2024-05-08 11:26:52 UTC Comment hidden (obsolete)
Comment 4 Andrey 2024-05-08 11:31:24 UTC Comment hidden (obsolete)
Comment 5 Andrey 2024-05-08 11:35:54 UTC Comment hidden (obsolete)
Comment 6 ady 2024-05-08 17:07:13 UTC
So, cell B3 in the original CSV file is:
2000000528764739

whereas the exported XLSX file shows:
2000000528764740

While the importing/exporting result from CSV is slightly different with LO 3.3, the value shown when exporting to XLSX with LO 3.3 is the same as the original CSV file. > REGRESSION.

When saving as ODS, and when exporting to XLS, there is no problem.

While I can reproduce the behavior, I am not sure this can be considered a bug.


Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 7a895ec4205659038aa95941b65715fed1a3e7be
CPU threads: 4; OS: Windows 10 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: en-US (es_AR); UI: en-US
Calc: CL threaded
Comment 7 ady 2024-05-08 17:13:51 UTC
(In reply to ady from comment #6)

> While I can reproduce the behavior, I am not sure this can be considered a
> bug.

I should clarify that. The length of the value in cell B3 is 16 digits [1]. IDK whether this is about some assumed accuracy limitation, or some rounding error when exporting.

Considering that this was exported correctly in the past, I would guess there is some avoidable problem. In that sense, this is a bug.

[1] <https://wiki.documentfoundation.org/Faq/Calc/Accuracy>
Comment 8 ady 2024-05-08 17:30:07 UTC
In LO 4.0, the CSV is _imported_ with the 2000000528764740 value instead of using the correct 2000000528764739 value. Also in LO 5.0.

In LO 5.1, the CSV is _imported_ correctly again with the correct 2000000528764739 value, but exporting to XLSX still shows the wrong 2000000528764740 value. Exporting to XLS is correct.

Setting the earliest version field as 5.1.
Comment 9 ady 2024-05-08 17:37:05 UTC
The original problem (exporting to XLSX) might be triggered before LO 4.0 already, together with the importing problem.

The import issue was resolved for LO 5.1, but not the export to XLSX.

So, the bibisect procedure might need to start before the LO 5.1 version, when the import was solved. Bibisecting even before LO 4.0 might be needed.
Comment 10 Andrey 2024-05-08 18:00:52 UTC
I created a csv document with 50 lines and 50 columns, saved this document as xlsx and gave it to the customer. in the xlsx document, there were incorrect values in 5 cells that I didn't notice right away. LO QUIETLY changed the values. I think this is a major bug.
Comment 11 ady 2024-05-08 19:22:37 UTC
(In reply to Andrey from comment #10)
> there were
> incorrect values in 5 cells that I didn't notice right away. LO QUIETLY
> changed the values. I think this is a major bug.

I would tend to agree that the issue is important, even though the regression has been present for such a long time and it happens on XLSX but not on ODS files.

Unfortunately, it has to be bisected (either in LO 5.0, or maybe even before LO 4.0 as I mentioned) before it can be fixed.


@Xisco?
Comment 12 raal 2024-05-09 13:47:56 UTC
Hello,
this is not problem with csv import. Try open new calc sheet and paste value 2000000528764739. Then, after export to xlsx is number rounded. ODS is correct. Tested with excel2010 - after paste number 2000000528764739 is number 2000000528764730 in the cell. So I assume this is limitation of xlsx format and not a bug.
As a workaround for csv case you can import column as text in import dialog.
Comment 13 ady 2024-05-09 17:13:50 UTC
(In reply to raal from comment #12)

> this is not problem with csv import.

Correct – please read my comments above. The import problem was solved already for LO version 5.1 (see comment 8). The problem in LO Dev 24.8 alpha is when _exporting_ to XLSX format, which started _after_ LO 3.3 but before (or in) LO 4.0.

> this is not problem with csv import. Try open new calc sheet and paste value
> 2000000528764739. Then, after export to xlsx is number rounded. ODS is
> correct. Tested with excel2010 - after paste number 2000000528764739 is
> number 2000000528764730 in the cell. So I assume this is limitation of xlsx
> format and not a bug.
> As a workaround for csv case you can import column as text in import dialog.

As you demonstrated by pasting the plain value 2000000528764739 to a new empty spreadsheet and saving as XLSX fie, there is no need to workaround the CSV import case.

Considering that the export to XLSX of the same 2000000528764739 value used to work _correctly_ in LO Calc 3.3, there might be a way to avoid the export problem in new versions.

Moreover, exporting to XLS works correctly, so there should be a way to avoid the problem when exporting to XLSX too.
Comment 14 ady 2024-05-09 17:16:39 UTC
I am changing the version field from 5.1 to 4.0, to reduce the confusion between the import problem (which is solved since 5.1) and the export-to-XLSX problem (which is present at least in 4.0 and newer but not in 3.3).
Comment 15 raal 2024-05-09 17:43:14 UTC
See this https://learn.microsoft.com/en-us/office/troubleshoot/excel/last-digits-changed-to-zeros
Excel therefore stores only 15 significant digits in a number and changes digits after the fifteenth place to zeros.
Comment 16 ady 2024-05-09 18:12:46 UTC
(In reply to raal from comment #15)
> See this
> https://learn.microsoft.com/en-us/office/troubleshoot/excel/last-digits-
> changed-to-zeros
> Excel therefore stores only 15 significant digits in a number and changes
> digits after the fifteenth place to zeros.

Yes, I assumed that this is somewhat related to accuracy [1], as I mentioned in comment 7.

But, if Calc can save the numeric value correctly (as shown by LO 3.3 when exporting to XLSX, and by the current export to XLS and by saving to ODS), shouldn't the value be saved accurately?

I am in favor of Calc imitating the behavior of other spreadsheet tools regarding functions (e.g. SUMPRODUCT, VLOOKUP, etc.) and also imitating other aspects such as keyboard shortcuts. But, should accuracy limitations be a limit too?

IOW, should Calc imitate this restriction seen in Excel? Or rather, perhaps let Excel apply its own accuracy limitations when opening the XLSX file (i.e. not-our-bug for such case), but maybe Calc should save the value accurately when it can. Are there negative consequences of such better accuracy?

How Excel handles the value in the XLS equivalent file (which saves the correct value when exported by Calc)?

I am unsure about this one, thus the questions.


[1] <https://wiki.documentfoundation.org/Faq/Calc/Accuracy>
Comment 17 Stéphane Guillou (stragu) 2024-05-14 13:29:38 UTC
(In reply to ady from comment #9)
> The import issue was resolved for LO 5.1, but not the export to XLSX.
This is since build [7ca6c9c122aaad426b4aa6199ceff409c31a7f42] in linux-64-5.1 repo, which is:

commit 0f6203edf74832f84d8263d7a544d679203a4efc
author	Eike Rathke 	Wed Jan 13 14:40:12 2016 +0100
committer	Eike Rathke 	Wed Jan 13 14:47:57 2016 +0100
tdf#96918 display accurate integer double values up to (2^53)-1

Eike, thoughts on this?
Comment 18 ady 2024-05-14 14:45:52 UTC
Simplest STR:
In a new spreadsheet, paste the numeric value:
2000000528764739

Save as ODS and as XLS, then reload > same value.

Save as XLSX, then reload:
* LO 3.3: same value
* LO 4.0 and newer (but maybe older versions too): 2000000528764740

Excel: see comment 15.

Calc could (potentially) still save the value accurately and leave Excel's behavior to Excel. Pros and cons, see comment 16. If Excel would perform calculations with these (different) values differently than Calc (would it?), maybe users won't like that, thus, "pros and cons".

As for the original case presented in this report (i.e. to share accurate values with other parties), there are alternative methods, but that's a different issue.
Comment 19 Eike Rathke 2024-05-14 16:06:07 UTC
This happens already during save to .xlsx, the value 2000000528764739 is saved as 2000000528764740 in the XML file. It does not happen when saving to .ods or .csv (.xls stores the binary value anyway) where also the IEEE 754 binary double floating-point value is converted to string, same as the display string. Apparently two different conversions are involved here.
Comment 20 Commit Notification 2024-05-15 16:03:20 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/1ea85e8f2d815e6c0834215714ed3dad1101b971

Resolves: tdf#160985 Max integer representation for rtl_math_StringFormat_G

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Commit Notification 2024-05-15 16:03:23 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/11bfced0a4f1d5f3ccaf8e1c87c11733ca71f29d

Unit test for tdf#160985

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Eike Rathke 2024-05-15 16:05:37 UTC
Pending review https://gerrit.libreoffice.org/c/core/+/167673 for 24-2
Comment 23 Commit Notification 2024-05-15 19:30:01 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-24-2":

https://git.libreoffice.org/core/commit/49a69504963812d8e2bd7d9a7de6c011a5749dda

Resolves: tdf#160985 Max integer representation for rtl_math_StringFormat_G

It will be available in 24.2.4.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.