caio.co/de/cantine


"Document" the new `tique::conditional_collector` by Caio 4 years ago (log)

Blob README.markdown

Showing rendered content. Download source code


Cantine

A cooking recipe search JSON API with over a million recipes.

Walkthrough

The API is publicly accessible at https://caio.co/recipes/api/v0.

You can search via POST on /search:

curl -H "Content-Type: application/json" -d'{ "fulltext": "bacon" }' https://caio.co/recipes/api/v0/search

The output will contain an array under items with each item containing fields like name, crawl_url, num_ingredients, image and more.

If you want more details about a specific recipe, you can GET at /recipe/{uuid}.

There's one more useful endpoint you can GET: /info. We'll refer to it in more detail later, but it basically describes some of the features we support.

Now, to make things easier to read we'll create a simple function in bash:

export API=https://caio.co/recipes/api/v0
function search() { curl -XPOST "$API/search" -H "Content-Type: application/json" -d"$1"; echo; }

So we can do a useful search for recipes with bacon, the phrase "deep fry" and without eggs:

search '{ "fulltext": "bacon -egg \"deep fry\"" }'

Pagination

You should have noticed a next field in the output of our previous search. Should look like base64-encoded gibberish.

If you submit the same search, but with an extra after key with the value you got from next, you get (surprise!) the next results:

search '{ "fulltext": "bacon", "after": "AAAAAABAy6c0cM0Rb7VSU3OJkjB7_hHxeA" }'

Notice that the result contains a next field again? So long as a result contains a next you can keep using it as after to paginate through a result set of any size.

Sorting

From the /info endpoint you can learn all the valid sort options. Currently the default is "relevance", you can sort by every feature sans diet-related ones and you can change the order to ascending.

search '{ "sort": "num_ingredients_asc" }'

Querying Features

From the /info endpoint we can also learn about the features we know about each recipe.

Here's a commented example of what you would see by looking at the output under features.num_ingredients:

{
  // Lowest number of ingredients (at least) one indexed recipe has
  "min": 2,
  // Ditto, but highest
  "max": 93,
  // Number of recipes in the index with the "num_ingredients" feature
  "count": 1183461,
}

Filtering

You can query for any feature and value ranges you want. Recipes with calories within the [100,350[ range:

search '{ "fulltext": "picanha", "filter": { "calories": [100, 350] } }'

Aggregating

You can get a breakdown of any/every feature for arbitrary (half-open) ranges.

Maybe you'd like to see a more detailed counts of a search by total time:

search '{ "fulltext": "cheese bacon", "agg": { "total_time": [ [0, 15], [15, 60], [60, 240] ] } }'

The output will contain a new agg field, that looks something like this:

{
  "agg": {
    "total_time": [
      {
        "min": 0,
        "max": 14,
        "count": 3158
      },
      {
        "min": 15,
        "max": 58,
        "count": 8982
      },
      {
        "min": 60,
        "max": 225,
        "count": 1594
      }
    ]
}

Which is, in order, the breakdown of each of the ranges we requested in the search. So if we add a new filter for [15,60] to the search we should expect 8982 matching recipes:

search '{ "fulltext": "cheese bacon", "filter": { "total_time": [15, 60] } }'

Of course, you can filter and aggregate as many features/ranges as you want.

NOTE: For performance reasons, the agg field is omitted from the result if too many recipes are found (300k currently).

"Documentation"

This is mostly an exercise in learning rust, so if you are looking for well-thought-out things you won't have much luck. The code here is organized as a cargo workspace where the business logic and server code are placed inside the cantine crate and isolated functionality such as cursor-based pagination and query/aggregation-related code generation is implemented in tique.

I plan on exploring the whole ecosystem so documentation will come someday, but for now here's a brief outline of the modules:

Instructions

You can use the sample data to run a tiny version of the API:

cargo run --bin load /tmp/cantine < cantine/tests/sample_recipes.jsonlines
RUST_LOG=debug cargo run /tmp/cantine