
Allow queries exceeding the 20,000 entries limit of the APIs #32

Open
PascalIrz opened this issue Apr 1, 2023 · 5 comments

Comments

@PascalIrz
Collaborator

It would be a great improvement for the user!

@cedricbriandgithub

cedricbriandgithub commented Apr 2, 2023

Agreed!
This comes from the API itself when trying to go beyond that limit:

Message: Error 400 on query: https://hubeau.eaufrance.fr/api/v0/indicateurs_services/communes?page=11&size=2000
Error on parameters:
page please refine your query. The multiplication of page * size parameters can not be more than 20000

I'm not sure there is an easy way around this.
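
For illustration only (not from this thread), here is a minimal sketch that reproduces the limit with a raw call to the same endpoint, assuming the httr package is available:

# Querying page 11 with size 2000 asks for records beyond the 20,000th,
# which the API refuses with an HTTP 400.
library(httr)

resp <- GET(
  "https://hubeau.eaufrance.fr/api/v0/indicateurs_services/communes",
  query = list(page = 11, size = 2000)  # 11 * 2000 > 20000
)
status_code(resp)  # 400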

@PascalIrz
Collaborator Author

PascalIrz commented Apr 4, 2023

So far I proceed in two steps, so that each call to a get_xx_xx() function does not exceed 20,000 entries:

# map_df() comes from purrr, so both packages are needed
library(hubeau)
library(purrr)

# get sites ids
stations <- get_niveaux_nappes_stations(
  code_departement = '79')

# retrieve the data for each site and stack them into a dataframe
piezo <- map_df(
  .x = stations$code_bss,
  .f = function(x)
    get_niveaux_nappes_chroniques(code_bss = x,
                                  date_debut_mesure = "2015-01-01")
)

@cedricbriandgithub

Thank you! But I was trying to figure out an easy way to do this generically, which is not possible: it requires something specific for each API and endpoint. Unless we create a table within the package describing how to split the query for each endpoint. For many endpoints we could perhaps split by year; here you split by station. But I don't have fine enough knowledge of the structure and volume of the data (outside fish) to tell whether that would be feasible.
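
As a hedged sketch of what a year-based split could look like (the station code and the date_fin_mesure argument are assumptions, not something validated here):

# Hypothetical example: one call per year so that no single call returns
# more than 20,000 records. Assumes the endpoint accepts both
# date_debut_mesure and date_fin_mesure; the code_bss value is made up.
library(hubeau)
library(purrr)

piezo_by_year <- map_df(
  .x = 2015:2022,
  .f = function(y)
    get_niveaux_nappes_chroniques(
      code_bss = "07548X0009/F",  # illustrative station code only
      date_debut_mesure = paste0(y, "-01-01"),
      date_fin_mesure   = paste0(y, "-12-31")
    )
)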

@PascalIrz
Collaborator Author

Sure, it is just a workaround. @DDorch may have better ideas for a generic solution.

@DDorch
Collaborator

DDorch commented Apr 5, 2023

I agree with @cedricbriandgithub: it's impossible to guess how many records a query will return, whatever way you cut it into sub-queries. Even a fine-grained treatment for each API could fail over time...

The best thing would be to add documentation on how to proceed when a query returns more than 20,000 records.
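
Such documentation could, for instance, show a small generic helper along these lines (purely illustrative; the helper and its arguments are not part of the package, and the commented example endpoint and parameter names are assumptions):

# Hypothetical helper: run one call of `fun` per value of a chosen splitting
# parameter and stack the results into a single data frame.
library(hubeau)
library(purrr)

get_in_chunks <- function(fun, split_param, split_values, ...) {
  map_df(
    split_values,
    function(v) do.call(fun, c(setNames(list(v), split_param), list(...)))
  )
}

# e.g. one call per departement instead of one oversized nationwide query
# (assuming the package exposes get_indicateurs_services_communes() and that
# the endpoint accepts code_departement):
# indicateurs <- get_in_chunks(
#   fun          = get_indicateurs_services_communes,
#   split_param  = "code_departement",
#   split_values = c("22", "29", "35", "56")
# )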
