caio.co/de/cantine


Update dependencies by Caio 3 years ago (log)

Tree cantine/


Cantine

A cooking recipe search JSON API with over a million recipes.

NOTICE: The full data and API are now offline. Contact me if you have a need for it.

Data

Interested in the recipe data only? You can download the unprocessed crawled recipe data from:

https://caio.co/data/recipes.crawl.original.tsv.bz2 (~ 445 MB)

It is a TSV. Headers:

Running Instructions

You can use the sample data to run a tiny version of the API:

cargo run --bin load /tmp/cantine < cantine/tests/sample_recipes.jsonlines
RUST_LOG=debug BASE_DIR=/tmp/cantine cargo run

If you like, you can download the full dataset already cleaned up and augmented from:

https://caio.co/data/cantine_recipes.jsonlines.bz2 (~ 383 MB)

API Tutorial

The API is publicly accessible at https://caio.co/recipes/api/v0.

You can search via POST on /search:

curl -H "Content-Type: application/json" -d'{ "fulltext": "bacon" }' https://caio.co/recipes/api/v0/search

The output will contain an array under items with each item containing fields like name, crawl_url, num_ingredients, image and more.

If you want more details about a specific recipe, you can GET at /recipe/{uuid}.

There's one more useful endpoint you can GET: /info. We'll refer to it in more detail later, but it basically describes some of the features we support.

Now, to make things easier to read we'll create a simple function in bash:

export API=https://caio.co/recipes/api/v0
function search() { curl -XPOST "$API/search" -H "Content-Type: application/json" -d"$1"; echo; }

So we can do a useful search for recipes with bacon, the phrase "deep fry" and without eggs:

search '{ "fulltext": "bacon -egg \"deep fry\"" }'

Pagination

You should have noticed a next field in the output of our previous search. Should look like base64-encoded gibberish.

If you submit the same search, but with an extra after key with the value you got from next, you get (surprise!) the next results:

search '{ "fulltext": "bacon", "after": "AAAAAABAy6c0cM0Rb7VSU3OJkjB7_hHxeA" }'

Notice that the result contains a next field again? So long as a result contains a next you can keep using it as after to paginate through a result set of any size.

Sorting

From the /info endpoint you can learn all the valid sort options. Currently the default is "relevance", you can sort by every feature sans diet-related ones and you can change the order to ascending.

search '{ "sort": "num_ingredients_asc" }'

Querying Features

From the /info endpoint we can also learn about the features we know about each recipe.

Here's a commented example of what you would see by looking at the output under features.num_ingredients:

{
  // Lowest number of ingredients (at least) one indexed recipe has
  "min": 2,
  // Ditto, but highest
  "max": 93,
  // Number of recipes in the index with the "num_ingredients" feature
  "count": 1183461,
}

Filtering

You can query for any feature and value ranges you want. Recipes with calories within the [100,350[ range:

search '{ "fulltext": "picanha", "filter": { "calories": [100, 350] } }'

Aggregating

You can get a breakdown of any/every feature for arbitrary (half-open) ranges.

Maybe you'd like to see a more detailed counts of a search by total time:

search '{ "fulltext": "cheese bacon", "agg": { "total_time": [ [0, 15], [15, 60], [60, 240] ] } }'

The output will contain a new agg field, that looks something like this:

{
  "agg": {
    "total_time": [
      {
        "min": 0,
        "max": 14,
        "count": 3158
      },
      {
        "min": 15,
        "max": 58,
        "count": 8982
      },
      {
        "min": 60,
        "max": 225,
        "count": 1594
      }
    ]
}

Which is, in order, the breakdown of each of the ranges we requested in the search. So if we add a new filter for [15,60] to the search we should expect 8982 matching recipes:

search '{ "fulltext": "cheese bacon", "filter": { "total_time": [15, 60] } }'

Of course, you can filter and aggregate as many features/ranges as you want.

NOTE: For performance reasons, the agg field is omitted from the result if too many recipes are found (300k currently).