API: to_datetime(ints, unit) give requested unit #63347
Conversation
jorisvandenbossche left a comment:
Looks good!
```python
# Without this as_unit cast, we would fail to overflow
# and get much-too-large dates
return to_datetime(new_data, errors="raise", unit=date_unit).dt.as_unit(
    "ns"
)
```
I don't quite follow that comment. new_data are integers here? Why does the return unit of this function need to be nanoseconds (to preserve current functionality)? And why would this otherwise give (wrong?) much-too-large dates?
This is inside a block that tries large units and, if they overflow, then tries smaller units. This PR makes the large units not overflow in cases where this piece of code expects them to. Without this edit, e.g. pandas/tests/io/json/test_pandas.py::TestPandasContainer::test_date_unit fails with
```
left  = DatetimeIndex(['30004724859-08-03', '30007462766-08-06', '30010200673-08-08',
                       '30012938580-08-10', '300... '30106027418-11-06', '30108765325-11-08', '30111503232-11-10'],
                      dtype='datetime64[s]', freq=None)
right = DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
                       '2000-01-07', '2000-01-10', '200...2000-02-08', '2000-02-09',
                       '2000-02-10', '2000-02-11'],
                      dtype='datetime64[ns]', freq=None)
```
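The retry loop being discussed can be sketched roughly as follows. This is a simplified, hypothetical stand-in for pandas' internal JSON date conversion (the function name and unit order are assumptions, not the actual implementation), assuming pandas 2.x:

```python
import pandas as pd

def try_convert_to_date(new_data, date_units=("s", "ms", "us", "ns")):
    # Hypothetical sketch: try each candidate unit in turn. If interpreting
    # the integers in that unit produces dates outside the nanosecond bounds,
    # the conversion/cast raises and the next (smaller) unit is tried instead.
    for date_unit in date_units:
        try:
            return pd.to_datetime(
                new_data, errors="raise", unit=date_unit
            ).as_unit("ns")
        except (ValueError, OverflowError):
            # OutOfBoundsDatetime is a ValueError subclass, so it is caught here
            continue
    raise ValueError(f"could not convert {new_data!r} to dates")
```

For example, an integer that actually encodes milliseconds overflows the nanosecond bounds when first interpreted as seconds, so the loop falls through to the "ms" attempt.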
Is my previous comment clear? And if so, any suggestions for how to adapt it into a clearer code comment?
> This is inside a block that tries large units and if they overflow then tries smaller units.
OK, that was the context I was missing. But then I still don't entirely get how this was currently working.
The dates you show above, like '2000-01-03', fit in the range of all units. So how would the integer value for that ever overflow?
If I put a breakpoint specifically for OverflowError on the line below and run the full test_pandas.py file, it is never hit.
Fetched the branch to play a bit with the tests: I was misled by the OverflowError, because it is actually OutOfBoundsDatetime that is being raised when trying to cast to nanoseconds.
So essentially this "infer unit" code assumes that the integer value came from a timestamp that originally had nanosecond resolution (or at least one that fits in nanosecond resolution)? Which makes sense from the time we only supported ns.
(Nowadays though .. but that is for another issue)
We could also do a manual bounds check instead of the casting (I don't know if we have an existing helper function for that)? Then we could keep the logic that infers the date_unit, but return the actual data in that unit, instead of forcing it to nanoseconds.
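To illustrate the OutOfBoundsDatetime point: a value that fits in datetime64[s] but lies outside the nanosecond range only raises when cast to "ns", not when constructed. A minimal sketch (the specific value is an arbitrary illustration, assuming pandas 2.x non-nanosecond support):

```python
import numpy as np
import pandas as pd

# 2**40 seconds since the epoch is roughly the year 36800: representable
# in datetime64[s], but far outside the ~+/-292-year datetime64[ns] range.
idx = pd.DatetimeIndex(np.array([2**40], dtype="M8[s]"))

try:
    idx.as_unit("ns")
    raised = False
except pd.errors.OutOfBoundsDatetime:
    # The cast, not the construction, is what fails
    raised = True
```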
(Also, for the case where the user specifies the unit, so we don't have to infer it, we actually don't need the force-cast to nanoseconds / bounds check, because that restriction is then not needed.)
```diff
  result = read_json(StringIO(json), typ="series")
- expected = ts.copy()
+ expected = ts.copy().dt.as_unit("ns")
```
Not for this PR, but so this is another case where we currently return ns unit but could change to use us by default?
Sure, but I'm inclined to leave that to Will to decide.
In general I think we should use the same default of microseconds whenever we infer / parse strings, and for IO formats that means whenever they don't store any type / unit information (in contrast to e.g. Parquet). We already do that for CSV, HTML, Excel, etc., so I don't think there is a reason not to do that for JSON. But I opened #63442 to track that.
```diff
  index = pd.MultiIndex(
-     levels=[[1, 2, 3], [pd.to_datetime("2000-01-01", unit="ns")]],
+     levels=[[1, 2, 3], [pd.to_datetime("2000-01-01", unit="ns").as_unit("ns")]],
```
Did this PR change that? (that this no longer returns nanoseconds)
Yes, but I didn't realize it until I just checked. I thought this PR only affected integer cases. I also didn't think that on main the unit keyword would have any effect in this case. So there are at least two things I need to look into.
OK, I think I've figured this out. By passing unit we go down a path that on main always gives "ns" (so the fact that unit="ns" here is irrelevant). Adding as_unit just restores the behavior on main.
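A quick way to see why the as_unit addition keeps the test stable: .as_unit pins the resolution regardless of what the parsing path inferred, so the expectation no longer depends on version- or path-specific inference. A hypothetical minimal example, assuming pandas 2.x:

```python
import pandas as pd

# The inferred resolution of a parsed Timestamp may vary by version/path;
# chaining .as_unit("ns") makes the expected resolution explicit.
ts = pd.to_datetime("2000-01-01").as_unit("ns")
```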
```python
for date_unit in date_units:
    try:
        return to_datetime(new_data, errors="raise", unit=date_unit)
# Without this as_unit cast, we would fail to overflow
```
Something like this:

```diff
- # Without this as_unit cast, we would fail to overflow
+ # In case of multiple possible units, infer the likely unit based on the first unit
+ # for which the parsed dates fit within the nanoseconds bounds
+ # -> do as_unit cast to ensure OutOfBounds error
```
to_datetime analogue of #63303.