
Commit d3cd36a

Make to_timestamp() and to_date() range-check fields of their input.
Historically, something like to_date('2009-06-40','YYYY-MM-DD') would return '2009-07-10' because there was no prohibition on out-of-range month or day numbers. This has been widely panned, and it also turns out that Oracle throws an error in such cases. Since these functions are nominally Oracle-compatibility features, let's change that.

There's no particular restriction on year (modulo the fact that the scanner may not believe that more than 4 digits are year digits, a matter to be addressed separately if at all). But we now check month, day, hour, minute, second, and fractional-second fields, as well as day-of-year and second-of-day fields if those are used.

Currently, no checks are made on ISO-8601-style week numbers or day numbers; it's not very clear what the appropriate rules would be there, and they're probably so little used that it's not worth sweating over.

Artur Zakirov, reviewed by Amul Sul, further adjustments by me

Discussion: <1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com>
See-Also: <57786490.9010201@wars-nicht.de>
1 parent 967ed92 commit d3cd36a
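
To make the behavior change in the commit message concrete, here is a minimal standalone C sketch contrasting lenient rollover with strict range checking. It is not the PostgreSQL code: all function names are hypothetical, and the rollover is produced by the C library's mktime() normalization, which only resembles the old to_date() behavior.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: number of days in a Gregorian month (month is 1..12). */
static int days_in_month(int year, int month)
{
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;

    return (month == 2 && leap) ? 29 : days[month - 1];
}

/* Strict check: reject out-of-range month or day instead of rolling over. */
static bool date_fields_in_range(int year, int month, int day)
{
    if (month < 1 || month > 12)
        return false;
    return day >= 1 && day <= days_in_month(year, month);
}

/* Strict check for the time-of-day fields. */
static bool time_fields_in_range(int hour, int min, int sec)
{
    return hour >= 0 && hour < 24 &&
           min >= 0 && min < 60 &&
           sec >= 0 && sec < 60;
}

/*
 * Lenient variant: let mktime() normalize the fields, so that day 40 of
 * June silently becomes July 10.  On success, *month and *day hold the
 * normalized values.
 */
static bool rollover_date(int year, int *month, int *day)
{
    struct tm tm = {0};

    tm.tm_year = year - 1900;
    tm.tm_mon = *month - 1;
    tm.tm_mday = *day;
    tm.tm_hour = 12;    /* midday, to stay clear of DST edge cases */
    tm.tm_isdst = -1;

    if (mktime(&tm) == (time_t) -1)
        return false;

    *month = tm.tm_mon + 1;
    *day = tm.tm_mday;
    return true;
}
```

With these helpers, rollover_date() turns 2009-06-40 into month 7, day 10, while date_fields_in_range(2009, 6, 40) returns false, which corresponds to the case that now raises an error.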

4 files changed: +239 -66 lines changed

doc/src/sgml/func.sgml

+48 -37
Original file line number | Diff line number | Diff line change
@@ -5832,6 +5832,17 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
58325832
</para>
58335833
</note>
58345834

5835+
<tip>
5836+
<para>
5837+
<function>to_timestamp</function> and <function>to_date</function>
5838+
exist to handle input formats that cannot be converted by
5839+
simple casting. For most standard date/time formats, simply casting the
5840+
source string to the required data type works, and is much easier.
5841+
Similarly, <function>to_number</> is unnecessary for standard numeric
5842+
representations.
5843+
</para>
5844+
</tip>
5845+
58355846
<para>
58365847
In a <function>to_char</> output template string, there are certain
58375848
patterns that are recognized and replaced with appropriately-formatted
@@ -6038,7 +6049,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
60386049
</row>
60396050
<row>
60406051
<entry><literal>Q</literal></entry>
6041-
<entry>quarter (ignored by <function>to_date</> and <function>to_timestamp</>)</entry>
6052+
<entry>quarter</entry>
60426053
</row>
60436054
<row>
60446055
<entry><literal>RM</literal></entry>
@@ -6156,20 +6167,6 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
61566167
</para>
61576168
</listitem>
61586169

6159-
<listitem>
6160-
<para>
6161-
<function>to_timestamp</function> and <function>to_date</function>
6162-
exist to handle input formats that cannot be converted by
6163-
simple casting. These functions interpret input liberally,
6164-
with minimal error checking. While they produce valid output,
6165-
the conversion can yield unexpected results. For example,
6166-
input to these functions is not restricted by normal ranges,
6167-
thus <literal>to_date('20096040','YYYYMMDD')</literal> returns
6168-
<literal>2014-01-17</literal> rather than causing an error.
6169-
Casting does not have this behavior.
6170-
</para>
6171-
</listitem>
6172-
61736170
<listitem>
61746171
<para>
61756172
Ordinary text is allowed in <function>to_char</function>
@@ -6195,7 +6192,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
61956192

61966193
<listitem>
61976194
<para>
6198-
If the year format specification is less than four digits, e.g.
6195+
In <function>to_timestamp</function> and <function>to_date</function>,
6196+
if the year format specification is less than four digits, e.g.
61996197
<literal>YYY</>, and the supplied year is less than four digits,
62006198
the year will be adjusted to be nearest to the year 2020, e.g.
62016199
<literal>95</> becomes 1995.
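
The "nearest to the year 2020" rule quoted in this hunk can be sketched as follows. This is an illustrative helper with a hypothetical name, not the function used in formatting.c. It assumes a two- or three-digit partial year, and it assumes that an exact tie goes to the earlier year; one-digit years are not covered.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: expand a 2- or 3-digit partial year to the full
 * year closest to 2020.  An exact tie is resolved toward the earlier year
 * (an assumption of this sketch).
 */
static int nearest_to_2020(int partial, int digits)
{
    int modulus = (digits == 2) ? 100 : 1000;
    int base = (2020 / modulus) * modulus + partial;
    int best = base - modulus;      /* earliest candidate considered */

    /* Try the two later candidates; strict '<' keeps the earlier on a tie. */
    for (int k = 0; k <= 1; k++)
    {
        int candidate = base + k * modulus;

        if (abs(candidate - 2020) < abs(best - 2020))
            best = candidate;
    }
    return best;
}
```

For example, nearest_to_2020(95, 2) yields 1995, matching the documentation text, and nearest_to_2020(20, 2) yields 2020.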
@@ -6204,8 +6202,9 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
62046202

62056203
<listitem>
62066204
<para>
6207-
The <literal>YYYY</literal> conversion from string to <type>timestamp</type> or
6208-
<type>date</type> has a restriction when processing years with more than 4 digits. You must
6205+
In <function>to_timestamp</function> and <function>to_date</function>,
6206+
the <literal>YYYY</literal> conversion has a restriction when
6207+
processing years with more than 4 digits. You must
62096208
use some non-digit character or template after <literal>YYYY</literal>,
62106209
otherwise the year is always interpreted as 4 digits. For example
62116210
(with the year 20000):
@@ -6219,22 +6218,32 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
62196218

62206219
<listitem>
62216220
<para>
6222-
In conversions from string to <type>timestamp</type> or
6223-
<type>date</type>, the <literal>CC</literal> (century) field is ignored
6221+
In <function>to_timestamp</function> and <function>to_date</function>,
6222+
the <literal>CC</literal> (century) field is accepted but ignored
62246223
if there is a <literal>YYY</literal>, <literal>YYYY</literal> or
62256224
<literal>Y,YYY</literal> field. If <literal>CC</literal> is used with
6226-
<literal>YY</literal> or <literal>Y</literal> then the year is computed
6227-
as the year in the specified century. If the century is
6225+
<literal>YY</literal> or <literal>Y</literal> then the result is
6226+
computed as that year in the specified century. If the century is
62286227
specified but the year is not, the first year of the century
62296228
is assumed.
62306229
</para>
62316230
</listitem>
62326231

62336232
<listitem>
62346233
<para>
6235-
An ISO 8601 week-numbering date (as distinct from a Gregorian date)
6236-
can be specified to <function>to_timestamp</function> and
6237-
<function>to_date</function> in one of two ways:
6234+
In <function>to_timestamp</function> and <function>to_date</function>,
6235+
weekday names or numbers (<literal>DAY</literal>, <literal>D</literal>,
6236+
and related field types) are accepted but are ignored for purposes of
6237+
computing the result. The same is true for quarter
6238+
(<literal>Q</literal>) fields.
6239+
</para>
6240+
</listitem>
6241+
6242+
<listitem>
6243+
<para>
6244+
In <function>to_timestamp</function> and <function>to_date</function>,
6245+
an ISO 8601 week-numbering date (as distinct from a Gregorian date)
6246+
can be specified in one of two ways:
62386247
<itemizedlist>
62396248
<listitem>
62406249
<para>
@@ -6276,23 +6285,24 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
62766285

62776286
<listitem>
62786287
<para>
6279-
In a conversion from string to <type>timestamp</type>, millisecond
6288+
In <function>to_timestamp</function>, millisecond
62806289
(<literal>MS</literal>) or microsecond (<literal>US</literal>)
6281-
values are used as the
6290+
fields are used as the
62826291
seconds digits after the decimal point. For example
6283-
<literal>to_timestamp('12:3', 'SS:MS')</literal> is not 3 milliseconds,
6284-
but 300, because the conversion counts it as 12 + 0.3 seconds.
6285-
This means for the format <literal>SS:MS</literal>, the input values
6286-
<literal>12:3</literal>, <literal>12:30</literal>, and <literal>12:300</literal> specify the
6287-
same number of milliseconds. To get three milliseconds, one must use
6288-
<literal>12:003</literal>, which the conversion counts as
6292+
<literal>to_timestamp('12.3', 'SS.MS')</literal> is not 3 milliseconds,
6293+
but 300, because the conversion treats it as 12 + 0.3 seconds.
6294+
So, for the format <literal>SS.MS</literal>, the input values
6295+
<literal>12.3</literal>, <literal>12.30</literal>,
6296+
and <literal>12.300</literal> specify the
6297+
same number of milliseconds. To get three milliseconds, one must write
6298+
<literal>12.003</literal>, which the conversion treats as
62896299
12 + 0.003 = 12.003 seconds.
62906300
</para>
62916301

62926302
<para>
62936303
Here is a more
62946304
complex example:
6295-
<literal>to_timestamp('15:12:02.020.001230', 'HH:MI:SS.MS.US')</literal>
6305+
<literal>to_timestamp('15:12:02.020.001230', 'HH24:MI:SS.MS.US')</literal>
62966306
is 15 hours, 12 minutes, and 2 seconds + 20 milliseconds +
62976307
1230 microseconds = 2.021230 seconds.
62986308
</para>
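
The MS handling described in this hunk treats the field's digits as the fraction after the decimal point, so short inputs are scaled up. A minimal sketch of that interpretation, using hypothetical helper names that do not appear in formatting.c:

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: interpret the digits of an MS field as the first
 * three decimal places of the seconds value.  "3" means 0.3 s = 300 ms,
 * while "003" means 0.003 s = 3 ms.  Returns -1 for invalid input.
 */
static int ms_field_to_millis(const char *digits)
{
    size_t len = strlen(digits);
    int value = 0;

    if (len < 1 || len > 3)
        return -1;

    for (size_t i = 0; i < len; i++)
    {
        if (!isdigit((unsigned char) digits[i]))
            return -1;
        value = value * 10 + (digits[i] - '0');
    }

    /* Pad on the right: one digit is tenths, two digits are hundredths. */
    for (size_t i = len; i < 3; i++)
        value *= 10;

    return value;
}

/* Combine whole seconds, milliseconds and microseconds into microseconds. */
static long total_micros(int sec, int ms, int us)
{
    return (long) sec * 1000000L + (long) ms * 1000L + us;
}
```

Under this reading, "3", "30" and "300" all give 300 ms and "003" gives 3 ms. For the more complex example, total_micros(2, 20, 1230) is 2021230 microseconds, that is, 2.021230 seconds.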
@@ -6310,9 +6320,10 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
63106320
<listitem>
63116321
<para>
63126322
<function>to_char(interval)</function> formats <literal>HH</> and
6313-
<literal>HH12</> as shown on a 12-hour clock, i.e. zero hours
6314-
and 36 hours output as <literal>12</>, while <literal>HH24</>
6315-
outputs the full hour value, which can exceed 23 for intervals.
6323+
<literal>HH12</> as shown on a 12-hour clock, for example zero hours
6324+
and 36 hours both output as <literal>12</>, while <literal>HH24</>
6325+
outputs the full hour value, which can exceed 23 in
6326+
an <type>interval</> value.
63166327
</para>
63176328
</listitem>
63186329

src/backend/utils/adt/formatting.c

+67 -27
Original file line number | Diff line number | Diff line change
@@ -3553,9 +3553,6 @@ to_date(PG_FUNCTION_ARGS)
35533553
*
35543554
* The TmFromChar is then analysed and converted into the final results in
35553555
* struct 'tm' and 'fsec'.
3556-
*
3557-
* This function does very little error checking, e.g.
3558-
* to_timestamp('20096040','YYYYMMDD') works
35593556
*/
35603557
static void
35613558
do_to_timestamp(text *date_txt, text *fmt,
@@ -3564,30 +3561,35 @@ do_to_timestamp(text *date_txt, text *fmt,
35643561
FormatNode *format;
35653562
TmFromChar tmfc;
35663563
int fmt_len;
3564+
char *date_str;
3565+
int fmask;
3566+
3567+
date_str = text_to_cstring(date_txt);
35673568

35683569
ZERO_tmfc(&tmfc);
35693570
ZERO_tm(tm);
35703571
*fsec = 0;
3572+
fmask = 0; /* bit mask for ValidateDate() */
35713573

35723574
fmt_len = VARSIZE_ANY_EXHDR(fmt);
35733575

35743576
if (fmt_len)
35753577
{
35763578
char *fmt_str;
3577-
char *date_str;
35783579
bool incache;
35793580

35803581
fmt_str = text_to_cstring(fmt);
35813582

3582-
/*
3583-
* Allocate new memory if format picture is bigger than static cache
3584-
* and not use cache (call parser always)
3585-
*/
35863583
if (fmt_len > DCH_CACHE_SIZE)
35873584
{
3588-
format = (FormatNode *) palloc((fmt_len + 1) * sizeof(FormatNode));
3585+
/*
3586+
* Allocate new memory if format picture is bigger than static
3587+
* cache and not use cache (call parser always)
3588+
*/
35893589
incache = FALSE;
35903590

3591+
format = (FormatNode *) palloc((fmt_len + 1) * sizeof(FormatNode));
3592+
35913593
parse_format(format, fmt_str, DCH_keywords,
35923594
DCH_suff, DCH_index, DCH_TYPE, NULL);
35933595

@@ -3604,33 +3606,27 @@ do_to_timestamp(text *date_txt, text *fmt,
36043606

36053607
if ((ent = DCH_cache_search(fmt_str)) == NULL)
36063608
{
3607-
ent = DCH_cache_getnew(fmt_str);
3608-
36093609
/*
36103610
* Not in the cache, must run parser and save a new
36113611
* format-picture to the cache.
36123612
*/
3613+
ent = DCH_cache_getnew(fmt_str);
3614+
36133615
parse_format(ent->format, fmt_str, DCH_keywords,
36143616
DCH_suff, DCH_index, DCH_TYPE, NULL);
36153617

36163618
(ent->format + fmt_len)->type = NODE_TYPE_END; /* Paranoia? */
3617-
#ifdef DEBUG_TO_FROM_CHAR
3618-
/* dump_node(ent->format, fmt_len); */
3619-
/* dump_index(DCH_keywords, DCH_index); */
3620-
#endif
36213619
}
36223620
format = ent->format;
36233621
}
36243622

36253623
#ifdef DEBUG_TO_FROM_CHAR
36263624
/* dump_node(format, fmt_len); */
3625+
/* dump_index(DCH_keywords, DCH_index); */
36273626
#endif
36283627

3629-
date_str = text_to_cstring(date_txt);
3630-
36313628
DCH_from_char(format, date_str, &tmfc);
36323629

3633-
pfree(date_str);
36343630
pfree(fmt_str);
36353631
if (!incache)
36363632
pfree(format);
@@ -3639,8 +3635,7 @@ do_to_timestamp(text *date_txt, text *fmt,
36393635
DEBUG_TMFC(&tmfc);
36403636

36413637
/*
3642-
* Convert values that user define for FROM_CHAR (to_date/to_timestamp) to
3643-
* standard 'tm'
3638+
* Convert to_date/to_timestamp input fields to standard 'tm'
36443639
*/
36453640
if (tmfc.ssss)
36463641
{
@@ -3696,19 +3691,23 @@ do_to_timestamp(text *date_txt, text *fmt,
36963691
tm->tm_year = (tmfc.cc + 1) * 100 - tm->tm_year + 1;
36973692
}
36983693
else
3694+
{
36993695
/* find century year for dates ending in "00" */
37003696
tm->tm_year = tmfc.cc * 100 + ((tmfc.cc >= 0) ? 0 : 1);
3697+
}
37013698
}
37023699
else
3703-
/* If a 4-digit year is provided, we use that and ignore CC. */
37043700
{
3701+
/* If a 4-digit year is provided, we use that and ignore CC. */
37053702
tm->tm_year = tmfc.year;
37063703
if (tmfc.bc && tm->tm_year > 0)
37073704
tm->tm_year = -(tm->tm_year - 1);
37083705
}
3706+
fmask |= DTK_M(YEAR);
37093707
}
3710-
else if (tmfc.cc) /* use first year of century */
3708+
else if (tmfc.cc)
37113709
{
3710+
/* use first year of century */
37123711
if (tmfc.bc)
37133712
tmfc.cc = -tmfc.cc;
37143713
if (tmfc.cc >= 0)
@@ -3717,10 +3716,14 @@ do_to_timestamp(text *date_txt, text *fmt,
37173716
else
37183717
/* +1 because year == 599 is 600 BC */
37193718
tm->tm_year = tmfc.cc * 100 + 1;
3719+
fmask |= DTK_M(YEAR);
37203720
}
37213721

37223722
if (tmfc.j)
3723+
{
37233724
j2date(tmfc.j, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
3725+
fmask |= DTK_DATE_M;
3726+
}
37243727

37253728
if (tmfc.ww)
37263729
{
@@ -3734,21 +3737,24 @@ do_to_timestamp(text *date_txt, text *fmt,
37343737
isoweekdate2date(tmfc.ww, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
37353738
else
37363739
isoweek2date(tmfc.ww, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
3740+
fmask |= DTK_DATE_M;
37373741
}
37383742
else
37393743
tmfc.ddd = (tmfc.ww - 1) * 7 + 1;
37403744
}
37413745

37423746
if (tmfc.w)
37433747
tmfc.dd = (tmfc.w - 1) * 7 + 1;
3744-
if (tmfc.d)
3745-
tm->tm_wday = tmfc.d - 1; /* convert to native numbering */
37463748
if (tmfc.dd)
3749+
{
37473750
tm->tm_mday = tmfc.dd;
3748-
if (tmfc.ddd)
3749-
tm->tm_yday = tmfc.ddd;
3751+
fmask |= DTK_M(DAY);
3752+
}
37503753
if (tmfc.mm)
3754+
{
37513755
tm->tm_mon = tmfc.mm;
3756+
fmask |= DTK_M(MONTH);
3757+
}
37523758

37533759
if (tmfc.ddd && (tm->tm_mon <= 1 || tm->tm_mday <= 1))
37543760
{
@@ -3771,6 +3777,7 @@ do_to_timestamp(text *date_txt, text *fmt,
37713777
j0 = isoweek2j(tm->tm_year, 1) - 1;
37723778

37733779
j2date(j0 + tmfc.ddd, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
3780+
fmask |= DTK_DATE_M;
37743781
}
37753782
else
37763783
{
@@ -3785,14 +3792,16 @@ do_to_timestamp(text *date_txt, text *fmt,
37853792

37863793
for (i = 1; i <= MONTHS_PER_YEAR; i++)
37873794
{
3788-
if (tmfc.ddd < y[i])
3795+
if (tmfc.ddd <= y[i])
37893796
break;
37903797
}
37913798
if (tm->tm_mon <= 1)
37923799
tm->tm_mon = i;
37933800

37943801
if (tm->tm_mday <= 1)
37953802
tm->tm_mday = tmfc.ddd - y[i - 1];
3803+
3804+
fmask |= DTK_M(MONTH) | DTK_M(DAY);
37963805
}
37973806
}
37983807

@@ -3808,7 +3817,38 @@ do_to_timestamp(text *date_txt, text *fmt,
38083817
*fsec += (double) tmfc.us / 1000000;
38093818
#endif
38103819

3820+
/* Range-check date fields according to bit mask computed above */
3821+
if (fmask != 0)
3822+
{
3823+
/* We already dealt with AD/BC, so pass isjulian = true */
3824+
int dterr = ValidateDate(fmask, true, false, false, tm);
3825+
3826+
if (dterr != 0)
3827+
{
3828+
/*
3829+
* Force the error to be DTERR_FIELD_OVERFLOW even if ValidateDate
3830+
* said DTERR_MD_FIELD_OVERFLOW, because we don't want to print an
3831+
* irrelevant hint about datestyle.
3832+
*/
3833+
DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "timestamp");
3834+
}
3835+
}
3836+
3837+
/* Range-check time fields too */
3838+
if (tm->tm_hour < 0 || tm->tm_hour >= HOURS_PER_DAY ||
3839+
tm->tm_min < 0 || tm->tm_min >= MINS_PER_HOUR ||
3840+
tm->tm_sec < 0 || tm->tm_sec >= SECS_PER_MINUTE ||
3841+
#ifdef HAVE_INT64_TIMESTAMP
3842+
*fsec < INT64CONST(0) || *fsec >= USECS_PER_SEC
3843+
#else
3844+
*fsec < 0 || *fsec >= 1
3845+
#endif
3846+
)
3847+
DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "timestamp");
3848+
38113849
DEBUG_TM(tm);
3850+
3851+
pfree(date_str);
38123852
}
38133853

38143854
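
One hunk above changes the day-of-year lookup from `tmfc.ddd < y[i]` to `tmfc.ddd <= y[i]`. The sketch below is a standalone approximation of that lookup, meant to show why the comparison matters. The function and table names are hypothetical. Unlike the real code, it always sets both month and day, and it performs its own range check.

```c
#include <stdbool.h>

static bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/*
 * Hypothetical helper: convert a 1-based day of year into month and day of
 * month.  Returns false if doy is outside 1..365 (or 1..366 in a leap year).
 */
static bool day_of_year_to_month_day(int year, int doy, int *month, int *mday)
{
    /* Cumulative day counts at the end of each month; index 0 is "before January". */
    static const int cumulative[2][13] = {
        {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365},
        {0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366}
    };
    const int *y = cumulative[is_leap_year(year) ? 1 : 0];
    int i;

    if (doy < 1 || doy > y[12])
        return false;

    /*
     * Find the first month whose cumulative total reaches doy.  Using '<'
     * here would send day 31 to month 2 with a day-of-month of 0.
     */
    for (i = 1; i <= 12; i++)
    {
        if (doy <= y[i])
            break;
    }

    *month = i;
    *mday = doy - y[i - 1];
    return true;
}
```

With `<=`, day 31 of 2009 maps to January 31 and day 32 to February 1. Day 366 of a non-leap year is rejected, in the spirit of the new day-of-year range check.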
