[Bug]: SimpleDirectoryReader metadata handles timezones inconsistently #17715
Comments
Hey @yozlet! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!
To address the inconsistency in timezone handling, the formatting function can be changed to use UTC throughout:
    from datetime import datetime, timezone

    def _format_file_timestamp(
        timestamp: float | None, include_time: bool = False
    ) -> str | None:
        """
        Format file timestamp to a %Y-%m-%d string with timezone awareness.

        Args:
            timestamp (float): timestamp in float
            include_time (bool): whether to include time in the formatted string

        Returns:
            str: formatted timestamp
            None: if the timestamp passed was None
        """
        if timestamp is None:
            return None
        # Build a timezone-aware datetime in UTC for both output formats.
        dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        if include_time:
            return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        return dt.strftime("%Y-%m-%d")
This change ensures that both date-only and full datetime outputs are consistently in UTC, which should resolve the inconsistency you're experiencing [1].
Feel free to open a PR to fix, seems like an easy change.
@logan-markewich Are you happy with defaulting to UTC? It could cause confusion amongst those who aren't expecting it, but it avoids accidental time-travel around daylight saving switchovers. I'll put something in the docs.
…tently run-llama#17715 Add more metadata dict info to docs
Bug Description
SimpleDirectoryReader's methods for returning a file's date & time metadata (e.g. last modified date) will occasionally output timestamps using UTC instead of the local timezone.
The cause is the date-formatting function _format_file_metadata, which uses datetime.fromtimestamp() when it needs to output just the date, but datetime.utcfromtimestamp() for outputting the whole datetime.
I only noticed the problem when this test failed on my machine. The test was merged over a month ago, so it's interesting that it didn't fail for anyone else; I assume it's because most contributors are developing inside Docker containers or other environments which default to UTC.
Version
latest
Steps to Reproduce
My computer and I are on Pacific Time (UTC-08:00). I have a document that I modified at 18:39 local time, which was after midnight UTC. Let's read its metadata with SimpleDirectoryReader:
When running the code against the document, note the difference in dates:
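The same mismatch can be sketched with the standard library alone, without going through SimpleDirectoryReader. Everything specific here is an assumption for illustration: the calendar date is invented, a fixed UTC-08:00 offset stands in for the machine's local zone, and the two helper functions are hypothetical stand-ins for the date-only and full-datetime branches described in the bug description.

```python
from datetime import datetime, timedelta, timezone

# Assumed local zone: fixed UTC-08:00 offset (no DST handling).
PACIFIC = timezone(timedelta(hours=-8))

# Hypothetical file modification time: 18:39 local on an invented date.
modified = datetime(2025, 2, 3, 18, 39, tzinfo=PACIFIC)
ts = modified.timestamp()


def date_only_local(timestamp: float, local_tz: timezone) -> str:
    # Stand-in for the date-only branch, which converted in local time.
    return datetime.fromtimestamp(timestamp, tz=local_tz).strftime("%Y-%m-%d")


def full_datetime_utc(timestamp: float) -> str:
    # Stand-in for the full-datetime branch, which converted in UTC.
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


local_date = date_only_local(ts, PACIFIC)
utc_datetime = full_datetime_utc(ts)

print(local_date)    # 2025-02-03
print(utc_datetime)  # 2025-02-04T02:39:00Z
```

The date-only value falls on the 3rd while the full datetime falls on the 4th, which is the kind of one-day disagreement reported here.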
Relevant Logs/Tracebacks