feat(array): update String array type#5545
Merged
Update flux's String array type to be either an arrow Binary array or an arrow Dictionary with Binary values. This removes the single-string variant, which was not arrow-compatible, and instead uses a dictionary to provide the low-memory representation for repeated values. A dictionary is a more general-purpose implementation of the same idea: avoid repeating identical values.

The StringBuilder still swaps to a standard Binary array once a second unique String value is observed, but does not do so just because a NULL is added to the array. In the future the heuristic could be changed to provide memory-efficient string representations in other contexts.

Moving to a completely arrow-compatible interface makes the String array type much less fragile. The String array can now be used in any context where an arrow Array can be used, and the special-case code previously required to split a String array has been removed.
The NewStringFromBinaryArray function is only used in one place, where it can be replaced with NewStringData to perform a more succinct conversion. Repurpose the NewStringFromBinaryArray test to test NewStringData.
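The builder heuristic described above can be sketched with the standard library alone. This is a simplified model, not flux's StringBuilder and not the arrow API: the `stringBuilder` type, its fields, and its methods are hypothetical names chosen for illustration. It stays in a dictionary layout (one stored value plus per-row indices) while every non-NULL value is identical, tolerates NULLs without switching, and expands to a plain layout when a second unique value arrives.

```go
package main

import "fmt"

// stringBuilder is a hypothetical, stdlib-only model of the heuristic.
// It does not use arrow and is not flux's implementation.
type stringBuilder struct {
	dict    []string  // unique values while dictionary-encoded (at most one)
	indices []int     // per-row index into dict; -1 marks NULL
	plain   []*string // per-row values after switching; nil marks NULL
	isPlain bool
}

// Append adds a value, switching layouts on the second unique value.
func (b *stringBuilder) Append(v string) {
	if b.isPlain {
		b.plain = append(b.plain, &v)
		return
	}
	if len(b.dict) == 0 {
		b.dict = append(b.dict, v)
	}
	if b.dict[0] == v {
		b.indices = append(b.indices, 0)
		return
	}
	b.toPlain()
	b.plain = append(b.plain, &v)
}

// AppendNull adds a NULL row; it never forces a layout switch.
func (b *stringBuilder) AppendNull() {
	if b.isPlain {
		b.plain = append(b.plain, nil)
		return
	}
	b.indices = append(b.indices, -1)
}

// toPlain expands the dictionary-encoded rows into one value per row.
func (b *stringBuilder) toPlain() {
	for _, idx := range b.indices {
		if idx < 0 {
			b.plain = append(b.plain, nil)
			continue
		}
		s := b.dict[idx]
		b.plain = append(b.plain, &s)
	}
	b.dict, b.indices, b.isPlain = nil, nil, true
}

// Layout reports which representation is currently in use.
func (b *stringBuilder) Layout() string {
	if b.isPlain {
		return "binary"
	}
	return "dictionary"
}

// Value returns the value at row i and whether it is non-NULL.
func (b *stringBuilder) Value(i int) (string, bool) {
	if b.isPlain {
		if b.plain[i] == nil {
			return "", false
		}
		return *b.plain[i], true
	}
	if b.indices[i] < 0 {
		return "", false
	}
	return b.dict[b.indices[i]], true
}

func main() {
	b := &stringBuilder{}
	b.Append("cpu")
	b.AppendNull()
	b.Append("cpu")
	fmt.Println(b.Layout()) // repeated value plus a NULL stays dictionary-encoded
	b.Append("mem")
	fmt.Println(b.Layout()) // a second unique value triggers the switch
}
```

In this model the repeated value is stored once no matter how many rows reference it, which is the memory saving the description attributes to the dictionary variant.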
jeffreyssmith2nd approved these changes on Jun 24, 2025
Comment on lines +89 to +90:
    values.Release()
    indices.Release()
Contributor
Feels like I'm back in my C++ days 😄
Contributor
Author
That is because the Go API is modelled on the C++ one.
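For readers unfamiliar with the pattern behind those two `Release()` calls: arrow's Go arrays are reference-counted through `Retain` and `Release`, so code that builds a composite value typically drops its own references to the parts once the composite holds them. The sketch below models that convention with the standard library only. The `buffer` and `dictionary` types and `newDictionary` are hypothetical stand-ins, not arrow types, and the sketch assumes the constructor retains its inputs.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// buffer is a hypothetical stand-in for a reference-counted arrow value.
type buffer struct {
	refs  int64
	freed bool
}

// newBuffer returns a buffer owned by the caller (count starts at 1).
func newBuffer() *buffer { return &buffer{refs: 1} }

// Retain adds a reference.
func (b *buffer) Retain() { atomic.AddInt64(&b.refs, 1) }

// Release drops a reference and marks the buffer freed at zero.
func (b *buffer) Release() {
	if atomic.AddInt64(&b.refs, -1) == 0 {
		b.freed = true
	}
}

// dictionary is a hypothetical composite that keeps its parts alive.
type dictionary struct {
	indices, values *buffer
}

// newDictionary retains both parts (an assumption of this sketch).
func newDictionary(indices, values *buffer) *dictionary {
	indices.Retain()
	values.Retain()
	return &dictionary{indices: indices, values: values}
}

// Release drops the composite's references to its parts.
func (d *dictionary) Release() {
	d.indices.Release()
	d.values.Release()
}

func main() {
	values := newBuffer()
	indices := newBuffer()
	dict := newDictionary(indices, values)

	// The caller no longer needs its own references.
	values.Release()
	indices.Release()
	fmt.Println(values.freed, indices.freed) // parts are still held by dict

	dict.Release()
	fmt.Println(values.freed, indices.freed) // last references dropped
}
```

Forgetting either caller-side `Release()` in this model would leave the count above zero after `dict.Release()`, which is the kind of leak the manual calls guard against.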
Checklist
Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.
experimental/docs/Spec.md has been updated
Dear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.