Source MongoDB: customize DISCOVER_LIMIT #9284

Open
alafanechere opened this issue Jan 4, 2022 · 3 comments
Labels
area/connectors Connector related issues area/databases community connectors/source/mongodb connectors/sources-database frozen Not being actively worked on team/db-dw-sources Backlog for Database and Data Warehouse Sources team type/enhancement New feature or request

Comments

@alafanechere
Contributor

Tell us about the problem you're trying to solve

To make schema discovery faster, #8491 limited the schema discovery step to analyzing 10k records.
Some users might have heterogeneous schemas whose variations only show up beyond this hardcoded limit.

Describe the solution you’d like

We could expose DISCOVER_LIMIT as a parameter in the source configuration to allow users to customize this value, or choose to read all documents in the discovery step.
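
To make the proposal concrete, here is a minimal sketch of what a configurable limit could look like. It is written in Python purely for illustration and is not the connector's actual (Java) code: `discover_schema`, `discover_limit`, and `DEFAULT_DISCOVER_LIMIT` are hypothetical names, and passing `None` is an assumed way to express "read all documents".

```python
from itertools import islice
from typing import Any, Dict, Iterable, Optional, Set

# Assumption: mirrors the 10k hardcoded limit described in this issue.
DEFAULT_DISCOVER_LIMIT = 10_000


def discover_schema(
    documents: Iterable[Dict[str, Any]],
    discover_limit: Optional[int] = DEFAULT_DISCOVER_LIMIT,
) -> Dict[str, Set[str]]:
    """Map each top-level field to the set of type names observed for it.

    discover_limit=None means "inspect every document".
    """
    if discover_limit is not None and discover_limit < 1:
        raise ValueError("discover_limit must be a positive integer or None")

    # Only look at the first `discover_limit` documents unless the limit is disabled.
    sample = documents if discover_limit is None else islice(documents, discover_limit)

    schema: Dict[str, Set[str]] = {}
    for doc in sample:
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema
```

With a limit of 1, a field that first appears in the second document is missed; with `discover_limit=None` it is picked up, at the cost of scanning everything.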

Describe the alternative you’ve considered or used

Downgrade to version 0.1.8

Additional context

Requested by a user in this Slack conversation

@alafanechere alafanechere added type/enhancement New feature or request area/connectors Connector related issues community labels Jan 4, 2022
@bleonard bleonard added autoteam team/tse Technical Support Engineers labels Apr 26, 2022
@marcosmarxm marcosmarxm added area/databases team/databases and removed team/tse Technical Support Engineers autoteam labels Jun 14, 2022
@grishick grishick added the team/db-dw-sources Backlog for Database and Data Warehouse Sources team label Sep 27, 2022
@phammer

phammer commented Jan 11, 2023

+1 from my side for this suggestion, but for a different reason: to make discovery faster.

Our database schema is quite clean, but it contains very large documents. During our Airbyte Cloud trial, the MongoDB source connector always timed out during discovery, and we were not able to get it running. Airbyte Cloud support couldn't help with the issue (so I don't have logs).

Hint: making the parameter configurable per MongoDB collection would be the optimal solution, because most collections have simple documents and only a few have very complex, heavy documents.
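
A per-collection setting could be as simple as a global default plus optional overrides. The sketch below is only an illustration in Python, not the connector's code: `resolve_discover_limit` and the shape of the `per_collection_limits` mapping are hypothetical, and a value of `None` is assumed to mean "no limit".

```python
from typing import Dict, Optional


def resolve_discover_limit(
    collection: str,
    default_limit: Optional[int],
    per_collection_limits: Optional[Dict[str, Optional[int]]] = None,
) -> Optional[int]:
    """Return the discovery limit for one collection.

    An entry in per_collection_limits wins over default_limit;
    a value of None means "inspect every document".
    """
    overrides = per_collection_limits or {}
    # dict.get returns the override (even if it is None) when the key exists,
    # and falls back to the global default otherwise.
    return overrides.get(collection, default_limit)
```

For example, a collection with heavy documents could be given a small limit while every other collection keeps the default.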

@mickaelandrieu
Contributor

Do you know how I can override the current connector and change the DISCOVER_LIMIT? Sadly, I'm using the Docker image in production, so I guess I would have to rebuild the connector inside my container, but my container probably doesn't have what is required to build a new Java connector.

WDYT?

@mickaelandrieu
Contributor

Also

Hint: making the parameter configurable per MongoDB collection would be the optimal solution, because most collections have simple documents and only a few have very complex, heavy documents.

Totally agree.

@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024

7 participants