
ENH: df.to_parquet() should return bytes #37105


Closed
impredicative opened this issue Oct 13, 2020 · 2 comments · Fixed by #37129
Labels
API - Consistency (Internal Consistency of API/Behavior); Enhancement; IO Parquet (parquet, feather); Needs Discussion (Requires discussion from core team before further action)

@impredicative

Is your feature request related to a problem?

I find it useful to write a DataFrame in Parquet format to a bytes object for some unit tests. The code that I currently use to do this is quite verbose.

To provide some background, df.to_csv() (without arguments) just works: it returns a str object, as expected. In the same vein, df.to_parquet() (without arguments) should return a bytes object.

More precisely, the current behavior is:

>>> df = pd.DataFrame()

>>> type(df.to_csv())  # This works
<class 'str'>

>>> df.to_parquet() # This should be made to work
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: to_parquet() missing 1 required positional argument: 'path'

Describe the solution you'd like

The requested behavior is:

>>> df = pd.DataFrame()

>>> type(df.to_parquet())
<class 'bytes'>

Other uses of df.to_parquet should obviously remain unaffected.

API breaking implications

It won't break the documented API.

Describe alternatives you've considered

I currently use this verbose code to get what I want:

import io

import pandas as pd

df = pd.DataFrame()
pq_file = io.BytesIO()
df.to_parquet(pq_file)
pq_bytes = pq_file.getvalue()

This workaround takes too much effort for something this simple.

@impredicative impredicative added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 13, 2020
@rhshadrach (Member)

+1

@rhshadrach rhshadrach added IO Parquet parquet, feather Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 13, 2020
@rmsmani

rmsmani commented Oct 14, 2020

+1

@simonjayhawkins simonjayhawkins added the API - Consistency Internal Consistency of API/Behavior label Oct 14, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Oct 14, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Oct 20, 2020

5 participants