caio.co/de/cantine

Add links to download the crawl data

Id: b38f9d8dbddea0fccacc7138ae9ca2f329a38f6f
Author: Caio
Commit time: 2020-04-17T10:21:30+02:00

Modified README.markdown

@@ -23,6 +23,20
[pub]: https://crates.io/crates/tique
[doc]: https://docs.rs/tique

+## Data
+
+Interested in the recipe data only? You can download the unprocessed
+crawled recipe data from:
+
+ https://caio.co/data/recipes.crawl.original.tsv.bz2 (~ 445 MB)
+
+It is a TSV. Headers:
+
+* **source**: The URL the recipe was extracted from
+* **format**: Either `microdata` or `ldjson`, which signals how the
+ recipe data is formatted
+* **json**: The actual data :-)
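
For readers who want to consume the file described above, here is a minimal sketch in Python using only the standard library. It is not part of the commit or of cantine itself. The function name `iter_crawl_rows` is hypothetical, and the sketch assumes that each line holds exactly three tab-separated fields, that the JSON column contains no raw newlines, and that a literal `source`/`format`/`json` header row may or may not be present.

```python
import bz2
import json
from typing import Any, Iterator, Tuple


def iter_crawl_rows(path: str) -> Iterator[Tuple[str, str, Any]]:
    """Yield (source, format, parsed_json) for each row of the bz2-compressed TSV.

    Hypothetical helper: the three-column layout follows the README text,
    everything else (header row handling, UTF-8 encoding) is an assumption.
    """
    # bz2.open in text mode decompresses on the fly, so the ~445 MB
    # archive never has to be unpacked to disk first.
    with bz2.open(path, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split at most twice so any tab inside the JSON column
            # stays part of the third field.
            source, fmt, payload = line.split("\t", 2)
            if (source, fmt, payload) == ("source", "format", "json"):
                continue  # skip a header row, if the file has one
            yield source, fmt, json.loads(payload)
```

Calling `next(iter_crawl_rows("recipes.crawl.original.tsv.bz2"))` would return the first row as a tuple. Streaming line by line keeps memory use flat regardless of file size.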
+

### Running Instructions

@@ -32,6 +46,11
cargo run --bin load /tmp/cantine < cantine/tests/sample_recipes.jsonlines
RUST_LOG=debug BASE_DIR=/tmp/cantine cargo run
```
+
+If you like, you can download the full dataset, already cleaned up
+and augmented, from:
+
+ https://caio.co/data/cantine_recipes.jsonlines.bz2 (~ 383 MB)

## API Tutorial