
Commit 97287bd

Move is_valid_ascii() to ascii.h.
This function requires simd.h, which is a rather large dependency for a widely-used header file like pg_wchar.h. Furthermore, there is a report of a third-party tool that is struggling to use pg_wchar.h due to its dependence on simd.h (presumably because simd.h uses several intrinsics). Moving the function to the much less popular ascii.h resolves these issues for now.

This commit is back-patched for the benefit of the aforementioned third-party tool. The simd.h dependency was only added in v16, but we've opted to back-patch to v15 so that is_valid_ascii() lives in the same file for all versions where it exists. This could break existing third-party code that uses the function, but we couldn't find any examples of such code. It should be possible to fix any code that this commit breaks by including ascii.h in the file that uses is_valid_ascii().

Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
1 parent 400928b commit 97287bd


3 files changed (+69, -69 lines)


src/common/wchar.c

+1
@@ -13,6 +13,7 @@
 #include "c.h"
 
 #include "mb/pg_wchar.h"
+#include "utils/ascii.h"
 
 
 /*

src/include/mb/pg_wchar.h

-69
@@ -22,8 +22,6 @@
 #ifndef PG_WCHAR_H
 #define PG_WCHAR_H
 
-#include "port/simd.h"
-
 /*
  * The pg_wchar type
  */
@@ -722,71 +720,4 @@ extern int mic2latin_with_table(const unsigned char *mic, unsigned char *p,
 extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
 #endif
 
-
-/*
- * Verify a chunk of bytes for valid ASCII.
- *
- * Returns false if the input contains any zero bytes or bytes with the
- * high-bit set. Input len must be a multiple of the chunk size (8 or 16).
- */
-static inline bool
-is_valid_ascii(const unsigned char *s, int len)
-{
-    const unsigned char *const s_end = s + len;
-    Vector8 chunk;
-    Vector8 highbit_cum = vector8_broadcast(0);
-#ifdef USE_NO_SIMD
-    Vector8 zero_cum = vector8_broadcast(0x80);
-#endif
-
-    Assert(len % sizeof(chunk) == 0);
-
-    while (s < s_end)
-    {
-        vector8_load(&chunk, s);
-
-        /* Capture any zero bytes in this chunk. */
-#ifdef USE_NO_SIMD
-
-        /*
-         * First, add 0x7f to each byte. This sets the high bit in each byte,
-         * unless it was a zero. If any resulting high bits are zero, the
-         * corresponding high bits in the zero accumulator will be cleared.
-         *
-         * If none of the bytes in the chunk had the high bit set, the max
-         * value each byte can have after the addition is 0x7f + 0x7f = 0xfe,
-         * and we don't need to worry about carrying over to the next byte. If
-         * any input bytes did have the high bit set, it doesn't matter
-         * because we check for those separately.
-         */
-        zero_cum &= (chunk + vector8_broadcast(0x7F));
-#else
-
-        /*
-         * Set all bits in each lane of the highbit accumulator where input
-         * bytes are zero.
-         */
-        highbit_cum = vector8_or(highbit_cum,
-                                 vector8_eq(chunk, vector8_broadcast(0)));
-#endif
-
-        /* Capture all set bits in this chunk. */
-        highbit_cum = vector8_or(highbit_cum, chunk);
-
-        s += sizeof(chunk);
-    }
-
-    /* Check if any high bits in the high bit accumulator got set. */
-    if (vector8_is_highbit_set(highbit_cum))
-        return false;
-
-#ifdef USE_NO_SIMD
-    /* Check if any high bits in the zero accumulator got cleared. */
-    if (zero_cum != vector8_broadcast(0x80))
-        return false;
-#endif
-
-    return true;
-}
-
 #endif /* PG_WCHAR_H */

src/include/utils/ascii.h

+68
@@ -11,6 +11,74 @@
 #ifndef _ASCII_H_
 #define _ASCII_H_
 
+#include "port/simd.h"
+
 extern void ascii_safe_strlcpy(char *dest, const char *src, size_t destsiz);
 
+/*
+ * Verify a chunk of bytes for valid ASCII.
+ *
+ * Returns false if the input contains any zero bytes or bytes with the
+ * high-bit set. Input len must be a multiple of the chunk size (8 or 16).
+ */
+static inline bool
+is_valid_ascii(const unsigned char *s, int len)
+{
+    const unsigned char *const s_end = s + len;
+    Vector8 chunk;
+    Vector8 highbit_cum = vector8_broadcast(0);
+#ifdef USE_NO_SIMD
+    Vector8 zero_cum = vector8_broadcast(0x80);
+#endif
+
+    Assert(len % sizeof(chunk) == 0);
+
+    while (s < s_end)
+    {
+        vector8_load(&chunk, s);
+
+        /* Capture any zero bytes in this chunk. */
+#ifdef USE_NO_SIMD
+
+        /*
+         * First, add 0x7f to each byte. This sets the high bit in each byte,
+         * unless it was a zero. If any resulting high bits are zero, the
+         * corresponding high bits in the zero accumulator will be cleared.
+         *
+         * If none of the bytes in the chunk had the high bit set, the max
+         * value each byte can have after the addition is 0x7f + 0x7f = 0xfe,
+         * and we don't need to worry about carrying over to the next byte. If
+         * any input bytes did have the high bit set, it doesn't matter
+         * because we check for those separately.
+         */
+        zero_cum &= (chunk + vector8_broadcast(0x7F));
+#else
+
+        /*
+         * Set all bits in each lane of the highbit accumulator where input
+         * bytes are zero.
+         */
+        highbit_cum = vector8_or(highbit_cum,
+                                 vector8_eq(chunk, vector8_broadcast(0)));
+#endif
+
+        /* Capture all set bits in this chunk. */
+        highbit_cum = vector8_or(highbit_cum, chunk);
+
+        s += sizeof(chunk);
+    }
+
+    /* Check if any high bits in the high bit accumulator got set. */
+    if (vector8_is_highbit_set(highbit_cum))
+        return false;
+
+#ifdef USE_NO_SIMD
+    /* Check if any high bits in the zero accumulator got cleared. */
+    if (zero_cum != vector8_broadcast(0x80))
+        return false;
+#endif
+
+    return true;
+}
+
 #endif /* _ASCII_H_ */
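The USE_NO_SIMD branch of is_valid_ascii() treats an 8-byte chunk as one integer and relies on the carry argument in its comment. As a rough standalone illustration, the sketch below re-creates that branch in plain C, assuming a uint64_t can stand in for the 8-byte Vector8 and memcpy() for vector8_load(). The names broadcast8(), is_valid_ascii_swar() and all_ascii_nonzero() are hypothetical, not PostgreSQL functions; the last one shows one way a caller might satisfy the "multiple of the chunk size" requirement by checking leftover bytes separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy byte b into every lane of a 64-bit word,
 * playing the role of vector8_broadcast() in the USE_NO_SIMD branch. */
static inline uint64_t
broadcast8(uint8_t b)
{
    return UINT64_C(0x0101010101010101) * b;
}

/*
 * Sketch of the USE_NO_SIMD branch of is_valid_ascii(): returns false if any
 * byte is zero or has its high bit set. len must be a multiple of 8.
 */
static bool
is_valid_ascii_swar(const unsigned char *s, size_t len)
{
    const unsigned char *const s_end = s + len;
    uint64_t    highbit_cum = 0;
    uint64_t    zero_cum = broadcast8(0x80);

    assert(len % sizeof(uint64_t) == 0);

    while (s < s_end)
    {
        uint64_t    chunk;

        /* memcpy stands in for vector8_load(); safe for unaligned input. */
        memcpy(&chunk, s, sizeof(chunk));

        /* 0x01..0x7f + 0x7f lands in 0x80..0xfe: high bit set, no carry.
         * A zero byte gives 0x7f, which clears that lane in zero_cum. */
        zero_cum &= chunk + broadcast8(0x7F);

        /* Any byte >= 0x80 leaves its high bit in the accumulator. */
        highbit_cum |= chunk;

        s += sizeof(chunk);
    }

    /* A high-bit byte may have carried into a neighboring lane above,
     * but the chunk is rejected here regardless. */
    if (highbit_cum & broadcast8(0x80))
        return false;
    if (zero_cum != broadcast8(0x80))
        return false;
    return true;
}

/* Hypothetical caller: whole 8-byte chunks go through the fast path; the
 * remaining 0..7 bytes are checked one at a time. */
static bool
all_ascii_nonzero(const unsigned char *s, size_t len)
{
    size_t      prefix = len - len % 8;

    if (!is_valid_ascii_swar(s, prefix))
        return false;
    for (size_t i = prefix; i < len; i++)
    {
        if (s[i] == 0 || (s[i] & 0x80))
            return false;
    }
    return true;
}
```

Checking the high-bit accumulator first is what makes the unguarded addition safe: a lane corrupted by a carry can only come from a byte that is rejected anyway. The multiply-based broadcast and memcpy() load are common portable idioms; code built against PostgreSQL would use the vector8_* helpers from simd.h instead.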
