Epoch
- Id
- b85abafeecf3131822ad540d3319119a883be767
- Author
- Caio
- Commit time
- 2024-02-26T15:30:51+01:00
Created .config/caca.ini
+[meta]
+description = caio's code asylum
+
+[link "Mirror (github)"]
+href = https://github.com/caio/caca
Created .gitignore
+/target
+Cargo.lock
+perf.*
Created Cargo.toml
+[workspace]
+resolver = "2"
+members = [
+ "caca",
+ "urso",
+]
+default-members = ["caca"]
+
+[workspace.dependencies]
+tracing = { version = "0.1.40", default-features = false }
Created LICENSE
+ EUROPEAN UNION PUBLIC LICENCE v. 1.2
+ EUPL © the European Union 2007, 2016
+
+This European Union Public Licence (the ‘EUPL’) applies to the Work (as defined
+below) which is provided under the terms of this Licence. Any use of the Work,
+other than as authorised under this Licence is prohibited (to the extent such
+use is covered by a right of the copyright holder of the Work).
+
+The Work is provided under the terms of this Licence when the Licensor (as
+defined below) has placed the following notice immediately following the
+copyright notice for the Work:
+
+ Licensed under the EUPL
+
+or has expressed by any other means his willingness to license under the EUPL.
+
+1. Definitions
+
+In this Licence, the following terms have the following meaning:
+
+- ‘The Licence’: this Licence.
+
+- ‘The Original Work’: the work or software distributed or communicated by the
+ Licensor under this Licence, available as Source Code and also as Executable
+ Code as the case may be.
+
+- ‘Derivative Works’: the works or software that could be created by the
+ Licensee, based upon the Original Work or modifications thereof. This Licence
+ does not define the extent of modification or dependence on the Original Work
+ required in order to classify a work as a Derivative Work; this extent is
+ determined by copyright law applicable in the country mentioned in Article 15.
+
+- ‘The Work’: the Original Work or its Derivative Works.
+
+- ‘The Source Code’: the human-readable form of the Work which is the most
+ convenient for people to study and modify.
+
+- ‘The Executable Code’: any code which has generally been compiled and which is
+ meant to be interpreted by a computer as a program.
+
+- ‘The Licensor’: the natural or legal person that distributes or communicates
+ the Work under the Licence.
+
+- ‘Contributor(s)’: any natural or legal person who modifies the Work under the
+ Licence, or otherwise contributes to the creation of a Derivative Work.
+
+- ‘The Licensee’ or ‘You’: any natural or legal person who makes any usage of
+ the Work under the terms of the Licence.
+
+- ‘Distribution’ or ‘Communication’: any act of selling, giving, lending,
+ renting, distributing, communicating, transmitting, or otherwise making
+ available, online or offline, copies of the Work or providing access to its
+ essential functionalities at the disposal of any other natural or legal
+ person.
+
+2. Scope of the rights granted by the Licence
+
+The Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
+sublicensable licence to do the following, for the duration of copyright vested
+in the Original Work:
+
+- use the Work in any circumstance and for all usage,
+- reproduce the Work,
+- modify the Work, and make Derivative Works based upon the Work,
+- communicate to the public, including the right to make available or display
+ the Work or copies thereof to the public and perform publicly, as the case may
+ be, the Work,
+- distribute the Work or copies thereof,
+- lend and rent the Work or copies thereof,
+- sublicense rights in the Work or copies thereof.
+
+Those rights can be exercised on any media, supports and formats, whether now
+known or later invented, as far as the applicable law permits so.
+
+In the countries where moral rights apply, the Licensor waives his right to
+exercise his moral right to the extent allowed by law in order to make effective
+the licence of the economic rights here above listed.
+
+The Licensor grants to the Licensee royalty-free, non-exclusive usage rights to
+any patents held by the Licensor, to the extent necessary to make use of the
+rights granted on the Work under this Licence.
+
+3. Communication of the Source Code
+
+The Licensor may provide the Work either in its Source Code form, or as
+Executable Code. If the Work is provided as Executable Code, the Licensor
+provides in addition a machine-readable copy of the Source Code of the Work
+along with each copy of the Work that the Licensor distributes or indicates, in
+a notice following the copyright notice attached to the Work, a repository where
+the Source Code is easily and freely accessible for as long as the Licensor
+continues to distribute or communicate the Work.
+
+4. Limitations on copyright
+
+Nothing in this Licence is intended to deprive the Licensee of the benefits from
+any exception or limitation to the exclusive rights of the rights owners in the
+Work, of the exhaustion of those rights or of other applicable limitations
+thereto.
+
+5. Obligations of the Licensee
+
+The grant of the rights mentioned above is subject to some restrictions and
+obligations imposed on the Licensee. Those obligations are the following:
+
+Attribution right: The Licensee shall keep intact all copyright, patent or
+trademarks notices and all notices that refer to the Licence and to the
+disclaimer of warranties. The Licensee must include a copy of such notices and a
+copy of the Licence with every copy of the Work he/she distributes or
+communicates. The Licensee must cause any Derivative Work to carry prominent
+notices stating that the Work has been modified and the date of modification.
+
+Copyleft clause: If the Licensee distributes or communicates copies of the
+Original Works or Derivative Works, this Distribution or Communication will be
+done under the terms of this Licence or of a later version of this Licence
+unless the Original Work is expressly distributed only under this version of the
+Licence — for example by communicating ‘EUPL v. 1.2 only’. The Licensee
+(becoming Licensor) cannot offer or impose any additional terms or conditions on
+the Work or Derivative Work that alter or restrict the terms of the Licence.
+
+Compatibility clause: If the Licensee Distributes or Communicates Derivative
+Works or copies thereof based upon both the Work and another work licensed under
+a Compatible Licence, this Distribution or Communication can be done under the
+terms of this Compatible Licence. For the sake of this clause, ‘Compatible
+Licence’ refers to the licences listed in the appendix attached to this Licence.
+Should the Licensee's obligations under the Compatible Licence conflict with
+his/her obligations under this Licence, the obligations of the Compatible
+Licence shall prevail.
+
+Provision of Source Code: When distributing or communicating copies of the Work,
+the Licensee will provide a machine-readable copy of the Source Code or indicate
+a repository where this Source will be easily and freely available for as long
+as the Licensee continues to distribute or communicate the Work.
+
+Legal Protection: This Licence does not grant permission to use the trade names,
+trademarks, service marks, or names of the Licensor, except as required for
+reasonable and customary use in describing the origin of the Work and
+reproducing the content of the copyright notice.
+
+6. Chain of Authorship
+
+The original Licensor warrants that the copyright in the Original Work granted
+hereunder is owned by him/her or licensed to him/her and that he/she has the
+power and authority to grant the Licence.
+
+Each Contributor warrants that the copyright in the modifications he/she brings
+to the Work are owned by him/her or licensed to him/her and that he/she has the
+power and authority to grant the Licence.
+
+Each time You accept the Licence, the original Licensor and subsequent
+Contributors grant You a licence to their contributions to the Work, under the
+terms of this Licence.
+
+7. Disclaimer of Warranty
+
+The Work is a work in progress, which is continuously improved by numerous
+Contributors. It is not a finished work and may therefore contain defects or
+‘bugs’ inherent to this type of development.
+
+For the above reason, the Work is provided under the Licence on an ‘as is’ basis
+and without warranties of any kind concerning the Work, including without
+limitation merchantability, fitness for a particular purpose, absence of defects
+or errors, accuracy, non-infringement of intellectual property rights other than
+copyright as stated in Article 6 of this Licence.
+
+This disclaimer of warranty is an essential part of the Licence and a condition
+for the grant of any rights to the Work.
+
+8. Disclaimer of Liability
+
+Except in the cases of wilful misconduct or damages directly caused to natural
+persons, the Licensor will in no event be liable for any direct or indirect,
+material or moral, damages of any kind, arising out of the Licence or of the use
+of the Work, including without limitation, damages for loss of goodwill, work
+stoppage, computer failure or malfunction, loss of data or any commercial
+damage, even if the Licensor has been advised of the possibility of such damage.
+However, the Licensor will be liable under statutory product liability laws as
+far such laws apply to the Work.
+
+9. Additional agreements
+
+While distributing the Work, You may choose to conclude an additional agreement,
+defining obligations or services consistent with this Licence. However, if
+accepting obligations, You may act only on your own behalf and on your sole
+responsibility, not on behalf of the original Licensor or any other Contributor,
+and only if You agree to indemnify, defend, and hold each Contributor harmless
+for any liability incurred by, or claims asserted against such Contributor by
+the fact You have accepted any warranty or additional liability.
+
+10. Acceptance of the Licence
+
+The provisions of this Licence can be accepted by clicking on an icon ‘I agree’
+placed under the bottom of a window displaying the text of this Licence or by
+affirming consent in any other similar way, in accordance with the rules of
+applicable law. Clicking on that icon indicates your clear and irrevocable
+acceptance of this Licence and all of its terms and conditions.
+
+Similarly, you irrevocably accept this Licence and all of its terms and
+conditions by exercising any rights granted to You by Article 2 of this Licence,
+such as the use of the Work, the creation by You of a Derivative Work or the
+Distribution or Communication by You of the Work or copies thereof.
+
+11. Information to the public
+
+In case of any Distribution or Communication of the Work by means of electronic
+communication by You (for example, by offering to download the Work from a
+remote location) the distribution channel or media (for example, a website) must
+at least provide to the public the information requested by the applicable law
+regarding the Licensor, the Licence and the way it may be accessible, concluded,
+stored and reproduced by the Licensee.
+
+12. Termination of the Licence
+
+The Licence and the rights granted hereunder will terminate automatically upon
+any breach by the Licensee of the terms of the Licence.
+
+Such a termination will not terminate the licences of any person who has
+received the Work from the Licensee under the Licence, provided such persons
+remain in full compliance with the Licence.
+
+13. Miscellaneous
+
+Without prejudice of Article 9 above, the Licence represents the complete
+agreement between the Parties as to the Work.
+
+If any provision of the Licence is invalid or unenforceable under applicable
+law, this will not affect the validity or enforceability of the Licence as a
+whole. Such provision will be construed or reformed so as necessary to make it
+valid and enforceable.
+
+The European Commission may publish other linguistic versions or new versions of
+this Licence or updated versions of the Appendix, so far this is required and
+reasonable, without reducing the scope of the rights granted by the Licence. New
+versions of the Licence will be published with a unique version number.
+
+All linguistic versions of this Licence, approved by the European Commission,
+have identical value. Parties can take advantage of the linguistic version of
+their choice.
+
+14. Jurisdiction
+
+Without prejudice to specific agreement between parties,
+
+- any litigation resulting from the interpretation of this License, arising
+ between the European Union institutions, bodies, offices or agencies, as a
+ Licensor, and any Licensee, will be subject to the jurisdiction of the Court
+ of Justice of the European Union, as laid down in article 272 of the Treaty on
+ the Functioning of the European Union,
+
+- any litigation arising between other parties and resulting from the
+ interpretation of this License, will be subject to the exclusive jurisdiction
+ of the competent court where the Licensor resides or conducts its primary
+ business.
+
+15. Applicable Law
+
+Without prejudice to specific agreement between parties,
+
+- this Licence shall be governed by the law of the European Union Member State
+ where the Licensor has his seat, resides or has his registered office,
+
+- this licence shall be governed by Belgian law if the Licensor has no seat,
+ residence or registered office inside a European Union Member State.
+
+Appendix
+
+‘Compatible Licences’ according to Article 5 EUPL are:
+
+- GNU General Public License (GPL) v. 2, v. 3
+- GNU Affero General Public License (AGPL) v. 3
+- Open Software License (OSL) v. 2.1, v. 3.0
+- Eclipse Public License (EPL) v. 1.0
+- CeCILL v. 2.0, v. 2.1
+- Mozilla Public Licence (MPL) v. 2
+- GNU Lesser General Public Licence (LGPL) v. 2.1, v. 3
+- Creative Commons Attribution-ShareAlike v. 3.0 Unported (CC BY-SA 3.0) for
+ works other than software
+- European Union Public Licence (EUPL) v. 1.1, v. 1.2
+- Québec Free and Open-Source Licence — Reciprocity (LiLiQ-R) or Strong
+ Reciprocity (LiLiQ-R+).
+
+The European Commission may update this Appendix to later versions of the above
+licences without producing a new version of the EUPL, as long as they provide
+the rights granted in Article 2 of this Licence and protect the covered Source
+Code from exclusive appropriation.
+
+All other changes or additions to this Appendix require the production of a new
+EUPL version.
Created README.md
+# caio's code asylum
+
+caca - web front end for git repositories
+
+# why
+
+It all started with me trying to understand how `git log -- somefile`
+was picking commits and still feeling confused after reading
+[the docs][docs] (search for "A more detailed explanation follows", it's
+well written). So I picked up [gix][] to hack my own file history walker
+and, well, here we are...
+
+[docs]: https://www.git-scm.com/docs/git-log
+[gix]: https://github.com/Byron/gitoxide
+
+# usage
+
+Configure it by changing the `GlobalConfig` instance within
+[caca/src/main.rs](caca/src/main.rs#L120) then:
+
+ cargo run -- path/to/gitroot
+
+You can use the `RUST_LOG` environment variable to configure logging.
+The cmdline I tend to use when hacking is something like:
+
+ RUST_LOG=debug cargo watch --ignore '*.html' -x "run ."
+
+# features
+
+- Repository metadata (description, url, owner, etc.) is now version
+  controlled: caca reads a `.config/caca.ini` file (git-config format)
+  from the default repository branch and keeps that information
+  up-to-date (path and branch configurable)
+
+- [.mailmap](https://git-scm.com/docs/gitmailmap) support. If you
+ use urls instead of emails, whenever an author name is shown,
+ it'll be a hyperlink. The web ui doesn't show e-mails
+
+- Atom feeds:
+
+ - There's a global one with activities from every repo
+
+ - Each repository has a feed which lists most recent
+ tags and commits (all branches)
+
+- Special "www" view: render markdown files automatically, hyperlinks
+ to "folder" resolve as `folder/index.md` then `folder/index.html`.
+ Other targets are served as-is, with content-type guessed by the
+ filename
+
+- systemd socket activation support
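The "www" lookup order above can be pictured with a tiny standalone sketch; this is a hypothetical illustration using plain filesystem probes (`resolve` is an invented name, not caca's actual API):

```rust
use std::path::{Path, PathBuf};

// Hypothetical sketch of the "www" resolution order described above.
// A direct file hit is served as-is; a "folder" target falls back to
// folder/index.md, then folder/index.html.
fn resolve(root: &Path, target: &str) -> Option<PathBuf> {
    let direct = root.join(target);
    if direct.is_file() {
        // content-type would be guessed from the filename
        return Some(direct);
    }
    for index in ["index.md", "index.html"] {
        let candidate = direct.join(index);
        if candidate.is_file() {
            return Some(candidate);
        }
    }
    None
}
```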
+
+# ideas
+
+- git blame? I took a blind stab at it once, realised way too late that
+  I was assuming a linear history. Most of it is done and behaves like
+  `git annotate --first-parent path/to/file` but it isn't very good
+  (I annotate by rebuilding the file from its very first version, so
+  its perf is always the worst case - `urso::annotate`)... maybe I'll
+  just expose that and call it "git lame"
+
+- Pikchr? Graphviz? I like plantuml and mermaid but I'm not keen on
+ spinning a server up
+
+- Could extend the www/ view with more smartness: allow alternative
+ templates? Let markdown make use of the front-matter?
+
+- Syntax highlight? I don't care for it when looking at patches, but
+  for blobs it's sometimes nice. I just feel like this is the browser's
+  responsibility, not the server's, so I keep avoiding it
+
+# warts
+
+- It's not CGI
+
+- You have to enable the default `$GITDIR/hooks/post-update` script
+ for every repository in the server (or do something similar)
+
+- Many assumptions about data being utf-8 encoded
+
+- Doesn't support `.gitattributes`
+
+- Doesn't serve archives or .patch files
+
+- Doesn't support the "fancy" http clone
+
+- Doesn't claim to be blazingly fast
+
+- Doesn't make you "code 55% faster"
+
+- Doesn't contain "git" in the name
+
+
+# the code
+
+There are 2 crates:
+
+- `caca`, the web server: it accepts requests, manages the state,
+ controls access to the thread pool and renders html
+
+- `urso` is where I started: got rev walk working for any given path
+ and kept adding features on top
+
+When `caca` starts, it builds an in-memory snapshot of every repository
+it finds by traversing a base directory (optionally filtering for
+`git-daemon-export-ok`) and uses this information to answer most
+simple requests (listing, main repository pages and feeds)
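As a rough picture of that startup traversal, here is a hedged, std-only sketch; `find_repos` and its exact probing rules are illustrative guesses, not caca's real discovery code:

```rust
use std::{
    fs,
    path::{Path, PathBuf},
};

// Hypothetical sketch: collect candidate repositories directly under
// `base`, optionally requiring the `git-daemon-export-ok` marker file.
fn find_repos(base: &Path, require_export_ok: bool) -> Vec<PathBuf> {
    let mut found = Vec::new();
    let Ok(entries) = fs::read_dir(base) else {
        return found;
    };
    for entry in entries.flatten() {
        let dir = entry.path();
        if !dir.is_dir() {
            continue;
        }
        // bare repos keep their metadata at the top level;
        // non-bare ones keep it under .git
        let git_dir = if dir.join("HEAD").exists() {
            dir.clone()
        } else {
            dir.join(".git")
        };
        if !git_dir.join("HEAD").exists() {
            continue;
        }
        if require_export_ok && !git_dir.join("git-daemon-export-ok").exists() {
            continue;
        }
        found.push(dir);
    }
    found
}
```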
+
+There's a single admin (`caca::admin`) actor that manages the snapshots
+and whenever a change happens within a repository the actor regenerates
+the snapshot and submits it to the client (`caca::client`)
+
+The client is responsible for matching requests (is the repository name
+correct? branch name valid?) and routing accordingly. It makes use of
+the "business logic" within `caca::repo` to craft the responses
+
+Repository changes are detected by relying on git's [post-update][pu]
+hook being called: `git update-server-info` outputs a file that caca
+can watch for changes (`$GIT_DIR/info/refs`). Alternatively, there's
+a rudimentary admin web "api" that can be used to trigger manual
+updates via http
+
+[pu]: https://git-scm.com/docs/githooks#post-update
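For reference, the stock `post-update` sample that ships with git is essentially a one-liner, so enabling it usually just means renaming `hooks/post-update.sample` and making it executable:

```shell
#!/bin/sh
#
# Regenerate $GIT_DIR/info/refs (among other files) after each push;
# info/refs is the file caca watches for changes.
exec git update-server-info
```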
+
+## alternatives
+
+If this model is not to your liking, there are really good CGI-based
+([cgit][], [cgit-pink][], [gitweb][]) and static-files ([stagit][])
+alternatives. And if you'd like a server, just not this one, I'm
+aware of [gitiles][]: I never operated it, but it looks great and its
+goals are quite similar to this project's
+
+[cgit]: https://git.zx2c4.com/cgit/about/
+[cgit-pink]: https://git.causal.agency/cgit-pink/about/
+[gitweb]: https://git-scm.com/docs/gitweb
+[stagit]: https://codemadness.org/stagit.html
+[gitiles]: https://gerrit.googlesource.com/gitiles/
+
+# license
+
+This software is licensed under the [European Union Public License
+(EUPL) v. 1.2 only][EUPL-1.2]
+
+[EUPL-1.2]: https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12
Created caca/Cargo.toml
+[package]
+name = "caca"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
+urso = { path = "../urso" }
+tracing = { workspace = true }
+serde = { version = "1.0", default-features = false, features = ["derive", "rc"] }
+
+rayon = { version = "1.8.1", default-features = false }
+axum = { version = "0.7.4", default-features = false, features = ["tracing", "http1", "tokio"] }
+tokio = { version = "1.36.0", default-features = false, features = ["net", "fs", "time"] }
+tokio-util = { version = "0.7.10", default-features = false, features = ["io-util"] }
+
+markdown = { version = "1.0.0-alpha.16", default-features = false }
+tracing-subscriber = { version = "0.3.18", default-features = false, features = ["env-filter", "registry", "fmt", "ansi"] }
+rand = { version = "0.8.5", default-features = false }
+tower = { version = "0.4.13", default-features = false }
+tower-http = { version = "0.5.1", default-features = false, features = ["trace", "limit"] }
+lru = { version = "0.12.2", default-features = false }
+minijinja = { version = "1.0.12", default-features = false, features = ["loader", "multi_template", "builtins", "macros"] }
+notify = { version = "6.1.1", default-features = false, features = ["fsevent-sys", "macos_fsevent"] }
+listenfd = { version = "1.0.1", default-features = false }
+chrono = { version = "0.4.34", default-features = false, features = ["serde"] }
+chrono-humanize = { version = "0.2.3", default-features = false }
+url = { version = "2.5.0", default-features = false }
+
+
+[dev-dependencies]
+serde_test = "1.0.176"
Created caca/src/admin.rs
+use std::sync::Arc;
+
+use notify::{RecommendedWatcher, Watcher};
+use tokio::sync::{
+ mpsc::{self, UnboundedSender},
+ oneshot, Mutex,
+};
+
+use crate::{
+ client::{Client, State as ClientState},
+ repo::{RepoState, Repos, Repository},
+ GlobalConfig,
+};
+
+pub(crate) type SharedAdmin = Arc<AdminState>;
+
+pub(crate) async fn launch(
+ repos: Vec<Repository>,
+ config: Arc<GlobalConfig>,
+ pool: Arc<rayon::ThreadPool>,
+) -> Result<(SharedAdmin, Client), Box<dyn std::error::Error>> {
+ let (state, client_state) = prepare_state(
+ repos,
+ config.theme.env()?,
+ Arc::clone(&config),
+ Arc::clone(&pool),
+ );
+
+ let client = crate::client::launch(config, client_state, pool).await;
+ let admin = launch_admin(state, client.clone()).await?;
+
+ let shared_admin = Arc::new(AdminState {
+ admin: Mutex::new(admin),
+ _client: client.clone(),
+ });
+
+ Ok((shared_admin, client))
+}
+
+fn prepare_state(
+ repos: Vec<Repository>,
+ env: minijinja::Environment<'static>,
+ config: Arc<GlobalConfig>,
+ pool: Arc<rayon::ThreadPool>,
+) -> (State, ClientState) {
+ let client_state = ClientState {
+ repos: Repos::new(&config, repos.clone()),
+ env,
+ };
+ let state = State::new(repos, config, pool);
+
+ (state, client_state)
+}
+
+async fn launch_admin(
+ mut state: State,
+ client: Client,
+) -> Result<Admin, Box<dyn std::error::Error>> {
+ let (sender, mut receiver) = mpsc::unbounded_channel();
+
+ let watcher = launch_watcher(&state.repos, &state.config, sender.clone()).await?;
+
+ let admin = Admin {
+ sender,
+ _watcher: watcher,
+ };
+ tokio::spawn(async move {
+ while let Some(msg) = receiver.recv().await {
+ handle_message(msg, &mut state, &client).await;
+ }
+ });
+
+ Ok(admin)
+}
+
+async fn handle_message(msg: Message, state: &mut State, client: &Client) {
+ match msg {
+ Message::Update(name, dst) => {
+ tracing::trace!(name, "received update request");
+ match state.rebuild_snapshot(&name).await {
+ Ok(repo) => {
+ tracing::debug!(
+ name,
+ head = ?repo.snapshot.head,
+ "admin has new state"
+ );
+ if client.catchup(repo).await {
+ tracing::trace!("client caught up");
+ let _ignored = dst.send(Ok(()));
+ } else {
+ tracing::error!("no confirmation from client");
+ let _ignored = dst.send(Err(UpdateError::ClientDown));
+ }
+ }
+ Err(e) => {
+ let _ignored = dst.send(Err(UpdateError::Build(e)));
+ }
+ };
+ }
+ Message::Reload(tmpl, dst) => {
+ tracing::trace!(tmpl, "forwarding reload request to client");
+ if !client.reload_template(tmpl, dst).await {
+ tracing::error!("client looks down");
+ }
+ }
+ }
+}
+
+enum Message {
+ Update(String, oneshot::Sender<Result<(), UpdateError>>),
+ Reload(String, oneshot::Sender<Result<(), minijinja::Error>>),
+}
+
+struct State {
+ repos: Vec<Repository>,
+ pool: Arc<rayon::ThreadPool>,
+ config: Arc<GlobalConfig>,
+}
+
+impl State {
+ fn new(
+ repos: Vec<Repository>,
+ config: Arc<GlobalConfig>,
+ pool: Arc<rayon::ThreadPool>,
+ ) -> Self {
+ Self {
+ repos,
+ pool,
+ config,
+ }
+ }
+
+ async fn rebuild_snapshot(&mut self, name: &str) -> Result<Arc<RepoState>, BuildError> {
+ let Some(pos) = self.repos.iter().position(|r| r.name == name) else {
+ return Err(BuildError::NotFound);
+ };
+
+ let (sender, receiver) = oneshot::channel();
+ let handle = self.repos[pos].handle.clone();
+
+ let config = Arc::clone(&self.config);
+
+ let name = name.to_string();
+ self.pool.spawn(move || {
+ let urso = handle.into_urso();
+ let _ignored = sender.send(RepoState::new(name, &urso, &config));
+ });
+
+ let new_state = Arc::new(
+ receiver
+ .await
+ .map_err(|_discarded| BuildError::PoolReceiveErr)??,
+ );
+
+ self.repos[pos].state = Arc::clone(&new_state);
+ Ok(new_state)
+ }
+}
+
+pub(crate) struct AdminState {
+ pub(crate) admin: Mutex<Admin>,
+ _client: Client,
+}
+
+pub(crate) struct Admin {
+ sender: mpsc::UnboundedSender<Message>,
+ _watcher: notify::RecommendedWatcher,
+}
+
+impl Admin {
+ pub(crate) async fn update(&self, name: String) -> Result<(), UpdateError> {
+ let (sender, receiver) = oneshot::channel();
+ let _ignored = self.sender.send(Message::Update(name, sender));
+ receiver
+ .await
+ .map_err(|_discarded| UpdateError::AdminDown)?
+ }
+}
+
+#[derive(Debug)]
+pub(crate) enum BuildError {
+ Urso(urso::Error),
+ NotFound,
+ PoolReceiveErr,
+}
+
+impl From<urso::Error> for BuildError {
+ fn from(value: urso::Error) -> Self {
+ Self::Urso(value)
+ }
+}
+
+#[derive(Debug)]
+pub(crate) enum UpdateError {
+ Build(BuildError),
+ AdminDown,
+ ClientDown,
+}
+
+async fn launch_watcher(
+ repos: &[Repository],
+ config: &GlobalConfig,
+ admin: UnboundedSender<Message>,
+) -> Result<RecommendedWatcher, Box<dyn std::error::Error>> {
+ let (sender, mut receiver) = tokio::sync::mpsc::unbounded_channel();
+ let watcher = spawn_sync_watcher(repos, sender, config)?;
+
+ tokio::task::spawn(async move {
+ let mut debounced = Vec::new();
+ loop {
+ // If there are enqueued events, wake up early to dispatch;
+ // otherwise, wait forever for new events
+ let timeout = if debounced.is_empty() {
+ tokio::time::Duration::MAX
+ } else {
+ tokio::time::Duration::from_millis(500)
+ };
+ match tokio::time::timeout(timeout, receiver.recv()).await {
+ Ok(Some(event)) => {
+ tracing::trace!(?event, "received update request from watcher");
+ match event {
+ WatcherEvent::Update(name) => {
+ let (tx, rx) = oneshot::channel();
+ admin.send(Message::Update(name, tx)).expect("admin works");
+ match rx.await {
+ Ok(_) => tracing::trace!("update success"),
+ Err(err) => tracing::error!(?err, "failure updating repo"),
+ };
+ }
+ WatcherEvent::ReloadTemplate(tmpl) => {
+ if !debounced.iter().any(|d| d == &tmpl) {
+ debounced.push(tmpl);
+ }
+ }
+ }
+ }
+ Ok(None) => {
+ tracing::error!("watcher sender closed. shutting down");
+ break;
+ }
+ Err(_timeout) => {
+ for tmpl in debounced.drain(..) {
+ tracing::debug!(tmpl, "reloading template");
+ let (tx, rx) = oneshot::channel();
+ let _ignored = admin.send(Message::Reload(tmpl, tx));
+ match rx.await {
+ Ok(_) => tracing::trace!("reload success"),
+ Err(err) => tracing::error!(?err, "failure reloading template"),
+ };
+ }
+ }
+ }
+ }
+ });
+
+ Ok(watcher)
+}
+
+fn spawn_sync_watcher(
+ repos: &[Repository],
+ sender: UnboundedSender<WatcherEvent>,
+ config: &GlobalConfig,
+) -> Result<notify::INotifyWatcher, Box<dyn std::error::Error>> {
+ let mut info_to_name = Vec::with_capacity(repos.len());
+ for repo in repos {
+ info_to_name.push((repo.handle.git_dir().join("info/refs"), repo.name.clone()));
+ }
+
+ let theme_dir = config.theme.dir()?;
+ let watch_theme = config.theme.watch_files();
+ let theme_dir_copy = theme_dir.clone();
+
+ let mut watcher =
+ notify::recommended_watcher(move |res: Result<notify::Event, notify::Error>| {
+ // sigh
+ let Ok(event) = res else {
+ return;
+ };
+
+ // git update-server-info just does a file replace dance
+ // it's very easy to detect the end of the dance: atomic
+ // rename where it swaps the temporary with the dst
+ // (don't really need to be precise here, tho: when the
+ // hook gets called the server _already_ has the update)
+ let dst = event.paths.get(1);
+ if matches!(
+ event.kind,
+ notify::EventKind::Modify(notify::event::ModifyKind::Name(
+ notify::event::RenameMode::Both
+ ))
+ ) && dst.is_some_and(|p| p.ends_with("info/refs"))
+ {
+ debug_assert_eq!(2, event.paths.len());
+ let path = dst.unwrap();
+ if let Some((_, name)) = info_to_name.iter().find(|(inforefs, _)| inforefs == path)
+ {
+ tracing::trace!(name, "detected change on repo");
+ let _ignored = sender.send(WatcherEvent::Update(name.clone()));
+ }
+ return;
+ }
+
+ if !watch_theme || theme_dir_copy.is_none() {
+ return;
+ }
+
+ // for themes, anything can happen since idk what's being
+ // used to edit the files
+ // debouncing is more important here
+ match event.kind {
+ notify::EventKind::Create(notify::event::CreateKind::File)
+ | notify::EventKind::Modify(notify::event::ModifyKind::Data(_)) => {}
+ _ => return,
+ };
+ let Some(path) = event.paths.first() else {
+ return;
+ };
+
+ if !path
+ // path.ends_with() is deceptive AF eh
+ .extension()
+ .and_then(|n| n.to_str())
+ .is_some_and(|n| n == "html")
+ {
+ return;
+ }
+
+ let dir = theme_dir_copy.as_ref().expect("checked for is_some");
+
+ if let Ok(rest) = path.strip_prefix(dir) {
+ tracing::trace!(path=?rest, "detected theme change");
+ let _ignored = sender.send(WatcherEvent::ReloadTemplate(
+ rest.to_string_lossy().into_owned(),
+ ));
+ }
+ })?;
+
+ for repo in repos.iter() {
+ watcher.watch(
+ &repo.handle.git_dir().join("info"),
+ notify::RecursiveMode::NonRecursive,
+ )?;
+ }
+ if watch_theme && theme_dir.is_some() {
+ watcher.watch(
+ theme_dir.as_ref().unwrap(),
+ notify::RecursiveMode::Recursive,
+ )?;
+ }
+
+ Ok(watcher)
+}
+
+#[derive(Debug)]
+enum WatcherEvent {
+ Update(String),
+ ReloadTemplate(String),
+}
Created caca/src/client/handler.rs
+use std::{path::PathBuf, sync::Arc};
+
+use axum::{
+ http::{header::CONTENT_TYPE, HeaderValue, StatusCode, Uri},
+ response::{IntoResponse, Redirect, Response},
+};
+
+use urso::{Error, Result, Urso, UrsoHandle};
+
+use crate::{
+ repo::{Context, RepoState, Repository},
+ view::{render, render_markdown_template, View},
+ GlobalConfig,
+};
+
+use super::{
+ popo::{Command, Popo},
+ State,
+};
+
+pub(super) struct Handler {
+ pub state: State,
+ pub config: Arc<GlobalConfig>,
+ pub popo: Popo<Blocking, Output>,
+ pub reverse_proxy_base: String,
+}
+
+impl Handler {
+ pub fn catch_up(&mut self, repo: Arc<RepoState>) -> bool {
+ tracing::trace!(
+ repo = repo.name,
+ head = repo.snapshot.head.commit.message.title,
+ "received new state"
+ );
+
+ self.state.repos.update(repo)
+ }
+
+ pub async fn handle(&self, uri: Uri) -> Response {
+ match self.route(uri).await {
+ // rustfmt formats `([smth],else).func()` horribly
+ Output::Markdown(data) => {
+ let headers = [(CONTENT_TYPE, HeaderValue::from_static("text/html"))];
+ (headers, render_markdown_template(&self.state.env, data)).into_response()
+ }
+ Output::Serve((mime, data)) => {
+ let headers = [(CONTENT_TYPE, HeaderValue::from_static(mime))];
+ (headers, data).into_response()
+ }
+ Output::Static(file) => {
+ if let Ok(file) = tokio::fs::File::open(file).await {
+ let body =
+ axum::body::Body::from_stream(tokio_util::io::ReaderStream::new(file));
+ (StatusCode::OK, body).into_response()
+ } else {
+ (StatusCode::NOT_FOUND).into_response()
+ }
+ }
+ Output::NotFound => StatusCode::NOT_FOUND.into_response(),
+ Output::Error(msg) => (StatusCode::INTERNAL_SERVER_ERROR, msg).into_response(),
+ Output::Template(tmpl) => render(&self.state.env, tmpl),
+ Output::Redirect(location) => Redirect::permanent(&location).into_response(),
+ }
+ }
+
+ async fn route(&self, uri: Uri) -> Output {
+ // request path without the leading slash
+ let path = {
+ let p = uri.path();
+ debug_assert!(p.starts_with('/'));
+ &p[1..]
+ };
+
+ if !validate_path(path) {
+ return Output::NotFound;
+ }
+
+ match path {
+ "" => {
+ return Output::Template(View::index(self.state.repos.listing()));
+ }
+ "atom.xml" => {
+ return Output::Template(View::global_feed(self.state.repos.global_feed()));
+ }
+ // otherwise it's a request for a repo: continue
+ _ => {}
+ };
+
+ let Some((repo, repo_uri)) = self.match_repo(path) else {
+ return Output::NotFound;
+ };
+
+ // Naked uri to the repo, no trailing slash. i.e.: <host>/repo
+ if repo_uri.is_empty() && !path.ends_with('/') {
+ return Output::Redirect(format!("{}/{path}/", self.reverse_proxy_base));
+ }
+
+ // uris look like:
+ // <host>:<port>/<repo-name>/<view>/<ctx>?/<path>
+ //
+ // `ctx` is optional and is what establishes the
+ // "HEAD" when running git commands
+ //
+ // it always looks something like:
+ //
+ // - /branch/<name>
+ // - /tag/<name>
+ // - /<sha1>
+ //
+ // so a uri like `example.com/repo/blob/branch/main/a.txt`
+ // is requesting the "blob" view, for the "a.txt" file
+ // within the "main" branch of the repo.
+ //
+ // i don't particularly like stuffing this in the path and
+ // initially had a `?r=branch/main` param instead; however,
+ // that complicated things when rendering user content.
+ //
+ // e.g.: relative hyperlinks in markdown files would need
+ // to (sometimes) preserve the parameter otherwise they'd
+ // point at a different version of the resource
+ //
+ // when the context is in the path this problem doesn't
+ // exist because all necessary metadata is in the base
+ // path the browser uses for relative urls
+ let ((view, rest), had_slash) = repo_uri
+ .split_once('/')
+ .map_or(((repo_uri, ""), false), |r| (r, true));
+ match view {
+ "" if rest.is_empty() => Output::Template(View::summary(repo.summary())),
+
+ // tree and www render user markdown; empty path
+ // redirects to / so that the (html) url base is stable
+ "tree" | "www" if !had_slash => {
+ Output::Redirect(format!("{}/{path}/", self.reverse_proxy_base))
+ }
+
+ "tree" => self.repo_tree(repo, rest).await,
+ "atom.xml" if rest.is_empty() => Output::Template(View::feed(repo.feed())),
+ "refs" if rest.is_empty() => Output::Template(View::refs(repo.refs())),
+ "blob" if !rest.is_empty() => self.repo_blob(repo, rest).await,
+ "raw" if !rest.is_empty() => self.repo_raw(repo, rest).await,
+ "log" => self.repo_log(repo, rest).await,
+ "www" => self.repo_www(repo, rest).await,
+ "commit" => self.repo_commit(repo, rest).await,
+
+ _ => self.repo_catchall(repo, repo_uri),
+ }
+ }
+
+ async fn repo_tree(&self, repo: &Repository, uri: &str) -> Output {
+ let Ok((ctx, path)) = repo.split_context(uri) else {
+ return Output::NotFound;
+ };
+
+ if path.ends_with('/') || path.is_empty() {
+ self.exec(repo, ctx, Exec::Tree(path.to_string())).await
+ } else {
+ // XXX smells like future regret
+ // if the guessed mime is "binary" => redirect to raw
+ // if it's textual => redirect to blob
+ // so that a hyperlink to a .md file gets the
+ // pretty version while images can still be
+ // displayed with the img tag
+ let (_, is_text) = urso::guess_mime(path, &[]);
+ if is_text {
+ Output::Redirect(repo.blob_url(&ctx, path))
+ } else {
+ Output::Redirect(repo.raw_url(&ctx, path))
+ }
+ }
+ }
+
+ async fn repo_blob(&self, repo: &Repository, uri: &str) -> Output {
+ let Ok((ctx, path)) = repo.split_context(uri) else {
+ return Output::NotFound;
+ };
+
+ if path.is_empty() {
+ return Output::NotFound;
+ }
+
+ self.exec(repo, ctx, Exec::Blob(path.to_string())).await
+ }
+
+ async fn repo_commit(&self, repo: &Repository, uri: &str) -> Output {
+ let Some(ctx) = Context::from_hex(uri) else {
+ return Output::NotFound;
+ };
+
+ self.exec(repo, ctx, Exec::Show).await
+ }
+
+ async fn repo_log(&self, repo: &Repository, uri: &str) -> Output {
+ let Ok((ctx, path)) = repo.split_context(uri) else {
+ return Output::NotFound;
+ };
+
+ self.exec(
+ repo,
+ ctx,
+ Exec::Log((self.config.log_size.get(), path.into())),
+ )
+ .await
+ }
+
+ async fn repo_raw(&self, repo: &Repository, uri: &str) -> Output {
+ let Ok((ctx, path)) = repo.split_context(uri) else {
+ return Output::NotFound;
+ };
+
+ if path.is_empty() {
+ return Output::NotFound;
+ }
+
+ self.exec(repo, ctx, Exec::Raw(path.to_string())).await
+ }
+
+ async fn repo_www(&self, repo: &Repository, uri: &str) -> Output {
+ let ctx = {
+ // XXX could still allow rendering from any ref
+ // by matching for context afterwards
+ if let Some(head) = repo.snapshot.www_head {
+ Context::from_id(head)
+ } else {
+ return Output::NotFound;
+ }
+ };
+
+ self.exec(repo, ctx, Exec::Www(uri.to_string())).await
+ }
+
+ fn repo_catchall(&self, repo: &Repository, rest: &str) -> Output {
+ if rest.is_empty() {
+ return Output::NotFound;
+ }
+
+ if is_dumb_clone(rest) {
+ if !self.config.allow_http_clone {
+ return Output::NotFound;
+ }
+
+ // overzealous: `rest` has been checked for sanity
+ // XXX maybe put this assurance in a type eh
+ let target = repo.handle.git_dir().join(rest);
+ if target.starts_with(repo.handle.git_dir()) {
+ return Output::Static(target);
+ }
+ return Output::NotFound;
+ }
+
+ let Ok((ctx, path)) = repo.split_context(rest) else {
+ return Output::NotFound;
+ };
+
+ // XXX smells like future regret
+ // ends with / => assume a tree
+ // maybe not text => serve raw
+ // otherwise => assume a blob
+ if path.is_empty() || path.ends_with('/') {
+ Output::Redirect(repo.tree_url(&ctx, path.trim_end_matches('/')))
+ } else {
+ debug_assert!(!path.is_empty());
+ let (_, is_text) = urso::guess_mime(path, &[]);
+ if is_text {
+ Output::Redirect(repo.blob_url(&ctx, path))
+ } else {
+ Output::Redirect(repo.raw_url(&ctx, path))
+ }
+ }
+ }
+
+ async fn exec(&self, repo: &Repository, ctx: Context, op: Exec) -> Output {
+ let cmd = Blocking {
+ repo: repo.state.clone(),
+ handle: repo.handle.clone(),
+ ctx,
+ op,
+ };
+
+ match self.popo.execute(cmd).await {
+ Ok(output) => output,
+ // FIXME better
+ Err(_err) => Output::Error("the pool is ded".into()),
+ }
+ }
+
+ fn match_repo<'u>(&self, uri: &'u str) -> Option<(&Repository, &'u str)> {
+ self.state.repos.split_path(uri)
+ }
+}
+
+fn is_dumb_clone(target: &str) -> bool {
+ target == "HEAD"
+ || (target.starts_with("info/") && target.len() > 5) // /info/.+
+ || (target.starts_with("objects/") && target.len() > 8) // /objects/.+
+}
+
+#[derive(Debug, Clone)]
+pub(crate) enum Output {
+ Markdown(Vec<u8>),
+ Serve((&'static str, Vec<u8>)),
+ Static(PathBuf),
+ NotFound,
+ Error(String),
+ Template(View),
+ Redirect(String),
+}
+
+#[derive(Clone)]
+pub(crate) struct Blocking {
+ repo: Arc<RepoState>,
+ handle: UrsoHandle,
+ ctx: Context,
+ op: Exec,
+}
+
+impl std::fmt::Debug for Blocking {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ f.debug_struct("Command")
+ .field("handle", &self.handle)
+ .field("ctx", &self.ctx)
+ .field("op", &self.op)
+ .field("mailmap_version", &self.repo.snapshot.mailmap_version)
+ .finish_non_exhaustive()
+ }
+}
+
+impl PartialEq for Blocking {
+ fn eq(&self, other: &Self) -> bool {
+ self.handle == other.handle
+ && self.ctx == other.ctx
+ && self.op == other.op
+ && self.repo.snapshot == other.repo.snapshot
+ }
+}
+
+impl Eq for Blocking {}
+
+impl std::hash::Hash for Blocking {
+ fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+ self.handle.git_dir().hash(state);
+ self.ctx.hash(state);
+ self.op.hash(state);
+ self.repo.snapshot.hash(state);
+ }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, std::hash::Hash)]
+enum Exec {
+ Show,
+ Tree(String),
+ Raw(String),
+ Blob(String),
+ Log((usize, String)),
+ Www(String),
+}
+
+impl Command for Blocking {
+ type Output = Output;
+
+ fn exec(self) -> Self::Output {
+ let Self {
+ repo,
+ handle,
+ ctx,
+ op,
+ } = self;
+
+ let urso = handle.into_urso();
+ match op.run(&urso, ctx, &repo) {
+ Ok(output) => output,
+ Err(Error::NotFound | Error::NotAFile(_)) => Output::NotFound,
+ Err(err) => {
+ tracing::error!(?err, "unhandled error within cpu worker");
+ Output::Error(format!("cpu worker error: {err}"))
+ }
+ }
+ }
+}
+
+impl Exec {
+ fn run(self, urso: &Urso, ctx: Context, repo: &RepoState) -> Result<Output> {
+ match self {
+ Self::Show => repo
+ .show_commit(urso, ctx)
+ .map(View::commit)
+ .map(Output::Template),
+ Self::Tree(path) => repo
+ .tree(urso, ctx, path)
+ .map(View::tree)
+ .map(Output::Template),
+ Self::Blob(path) => repo
+ .blob(urso, ctx, path)
+ .map(View::blob)
+ .map(Output::Template),
+ Self::Log((size, path)) => repo
+ .log(urso, ctx, size, path)
+ .map(View::log)
+ .map(Output::Template),
+ Self::Raw(path) => RepoState::raw(urso, ctx, path).map(Output::Serve),
+ Self::Www(path) => exec_www(&path, urso, &ctx),
+ }
+ }
+}
+
+fn exec_www(path: &str, urso: &Urso, ctx: &Context) -> urso::Result<Output> {
+ let mut data = Vec::new();
+ let mime = if path.is_empty() || path.ends_with('/') {
+ urso.read_firstof(ctx.head(), path, &["index.md", "index.html"], &mut data)?
+ .mime
+ } else {
+ let (mime, _) = urso.get_file_contents(ctx.head(), path, &mut data)?;
+ mime
+ };
+
+ if mime == "text/markdown" {
+ Ok(Output::Markdown(data))
+ } else {
+ Ok(Output::Serve((mime, data)))
+ }
+}
+
+// only reasonable paths are valid.
+// assumes already decoded url path
+// TODO wrap this in a type before i get bitten maybe
+pub(crate) fn validate_path(mut input: &str) -> bool {
+ loop {
+ if let Some((comp, tail)) = input.split_once('/') {
+ if tail.starts_with('/') {
+ tracing::trace!(input, "bad uri: repeated slashes detected");
+ return false;
+ }
+ if matches!(comp, "." | "..") {
+ tracing::trace!(input, "bad uri: obvious kid shit");
+ return false;
+ }
+
+ input = tail;
+ } else {
+ if matches!(input, "." | "..") {
+ tracing::trace!(input, "bad uri: obvious kid shit");
+ return false;
+ }
+ break;
+ }
+ }
+
+ true
+}
+
+#[cfg(test)]
+mod tests {
+
+ #[test]
+ fn path_validation() {
+ use super::validate_path;
+ assert!(!validate_path("."));
+ assert!(!validate_path(".."));
+ assert!(!validate_path("a/."));
+ assert!(!validate_path("b/./"));
+ assert!(!validate_path("/c/../"));
+ assert!(!validate_path("/d/../"));
+ assert!(!validate_path("//"));
+ assert!(!validate_path("//e"));
+ assert!(!validate_path("f//"));
+ }
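+
+ // hedged sketch: the cases above only exercise rejection;
+ // these spell out inputs expected to pass validation
+ #[test]
+ fn path_validation_accepts_sane_paths() {
+ use super::validate_path;
+ assert!(validate_path(""));
+ assert!(validate_path("repo"));
+ assert!(validate_path("repo/tree/branch/main/a.txt"));
+ assert!(validate_path(".hidden/file"));
+ assert!(validate_path("trailing/"));
+ }
+
+ // hedged sketch of what the dumb-http allowlist accepts:
+ // HEAD, plus non-empty paths under info/ and objects/
+ #[test]
+ fn dumb_clone_targets() {
+ use super::is_dumb_clone;
+ assert!(is_dumb_clone("HEAD"));
+ assert!(is_dumb_clone("info/refs"));
+ assert!(is_dumb_clone("objects/ab/cdef"));
+ assert!(!is_dumb_clone("info/"));
+ assert!(!is_dumb_clone("objects/"));
+ assert!(!is_dumb_clone("config"));
+ }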
+}
Created caca/src/client/mod.rs
+use axum::{
+ http::{StatusCode, Uri},
+ response::{IntoResponse, Response},
+};
+use std::sync::Arc;
+use tokio::sync::{mpsc, oneshot};
+
+use crate::{
+ repo::{RepoState, Repos},
+ GlobalConfig,
+};
+
+mod handler;
+mod popo;
+
+pub(crate) async fn launch(
+ config: Arc<GlobalConfig>,
+ state: State,
+ pool: Arc<rayon::ThreadPool>,
+) -> Client {
+ let popo = popo::launch(pool, config.cache_size).await;
+ let reverse_proxy_base = config
+ .site
+ .reverse_proxy_base
+ .as_ref()
+ .cloned()
+ .unwrap_or_default();
+ let mut handler = handler::Handler {
+ state,
+ config,
+ popo,
+ reverse_proxy_base,
+ };
+ let (sender, mut receiver) = mpsc::unbounded_channel();
+ let client = Client { sender };
+ tokio::spawn(async move {
+ while let Some(msg) = receiver.recv().await {
+ match msg {
+ Message::Handle(uri, dst) => {
+ if dst.send(handler.handle(uri).await).is_err() {
+ tracing::debug!("requester disconnected before response");
+ }
+ }
+ Message::CatchUp(params, dst) => {
+ let _ignored = dst.send(handler.catch_up(params));
+ }
+ Message::Reload(tmpl, dst) => {
+ handler.state.env.remove_template(&tmpl);
+ let _ignored = dst.send(handler.state.env.get_template(&tmpl).map(|_| ()));
+ }
+ }
+ }
+ });
+
+ client
+}
+
+pub(crate) struct State {
+ pub repos: Repos,
+ pub env: minijinja::Environment<'static>,
+}
+
+enum Message {
+ Handle(Uri, oneshot::Sender<Response>),
+ CatchUp(Arc<RepoState>, oneshot::Sender<bool>),
+ Reload(String, oneshot::Sender<Result<(), minijinja::Error>>),
+}
+
+#[derive(Clone, Debug)]
+pub(crate) struct Client {
+ sender: mpsc::UnboundedSender<Message>,
+}
+
+impl Client {
+ pub(crate) async fn handle(&self, uri: Uri) -> Response {
+ let (sender, receiver) = oneshot::channel();
+ if self.sender.send(Message::Handle(uri, sender)).is_err() {
+ return (StatusCode::INTERNAL_SERVER_ERROR, "client is closed").into_response();
+ }
+ match receiver.await {
+ Ok(output) => output,
+ Err(_) => (
+ StatusCode::INTERNAL_SERVER_ERROR,
+ "client crashed while handling request",
+ )
+ .into_response(),
+ }
+ }
+
+ pub(crate) async fn catchup(&self, params: Arc<RepoState>) -> bool {
+ let (sender, receiver) = oneshot::channel();
+ let _ignored = self.sender.send(Message::CatchUp(params, sender));
+ receiver.await.unwrap_or(false)
+ }
+
+ pub(crate) async fn reload_template(
+ &self,
+ filename: String,
+ dst: oneshot::Sender<Result<(), minijinja::Error>>,
+ ) -> bool {
+ self.sender.send(Message::Reload(filename, dst)).is_ok()
+ }
+}
Created caca/src/client/popo.rs
+use std::{num::NonZeroUsize, sync::Arc};
+
+use lru::LruCache;
+use rayon::ThreadPool;
+use tokio::sync::{mpsc, oneshot};
+
+pub trait Command: PartialEq + std::hash::Hash {
+ type Output;
+
+ fn exec(self) -> Self::Output;
+}
+
+pub(crate) async fn launch<C, O>(pool: Arc<ThreadPool>, cache_size: NonZeroUsize) -> Popo<C, O>
+where
+ C: 'static + Clone + std::fmt::Debug + Send + Eq + std::hash::Hash + Command<Output = O>,
+ O: 'static + Clone + std::fmt::Debug + Send,
+{
+ // The channel receives data from user input via the
+ // Popo and from the threadpool when it completes
+ // commands.
+ let (sender, mut receiver) = mpsc::unbounded_channel::<Message<C, O>>();
+
+ let mut state = State {
+ registry: Registry::new(),
+ cache: LruCache::new(cache_size),
+ loopback: sender.clone(),
+ pool,
+ };
+
+ tokio::spawn(async move {
+ // recv() yields None once every sender is gone; exit then
+ while let Some(msg) = receiver.recv().await {
+ match msg {
+ Message::Dispatch(cmd, result_tx) => {
+ state.dispatch(cmd, result_tx);
+ }
+ Message::Complete(id, res) => {
+ state.complete(id, res);
+ }
+ }
+ }
+ });
+
+ Popo { sender }
+}
+
+#[derive(Debug)]
+enum Message<Command, Output> {
+ // emitted by the user via Popo
+ Dispatch(Command, oneshot::Sender<Output>),
+ // emitted by the pool
+ Complete(Id, Output),
+}
+
+#[derive(Debug, Clone)]
+pub(crate) struct Popo<C, O> {
+ sender: mpsc::UnboundedSender<Message<C, O>>,
+}
+
+#[derive(Debug, Clone)]
+pub(crate) enum Error {
+ PoolClosed,
+ ResponseClosed,
+}
+
+impl<C, O> Popo<C, O> {
+ pub async fn execute(&self, cmd: C) -> Result<O, Error> {
+ let (result_tx, result_rx) = oneshot::channel();
+ self.send(Message::Dispatch(cmd, result_tx))?;
+ self.receive(result_rx).await
+ }
+
+ // UnboundedSender::send never blocks, so this needn't be async
+ fn send(&self, msg: Message<C, O>) -> Result<(), Error> {
+ self.sender
+ .send(msg)
+ .map_err(|_discarded| Error::PoolClosed)
+ }
+
+ async fn receive<T>(&self, result_rx: oneshot::Receiver<T>) -> Result<T, Error> {
+ result_rx.await.map_err(|_discarded| Error::ResponseClosed)
+ }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+struct Id(u32);
+
+struct Registry<K, V> {
+ inner: Vec<Entry<K, V>>,
+ next_id: u32,
+}
+
+#[derive(PartialEq, Debug, Clone)]
+struct Entry<K, V> {
+ id: Id,
+ key: K,
+ value: V,
+}
+
+impl<K: PartialEq, V> Registry<K, V> {
+ fn new() -> Self {
+ Self {
+ inner: Vec::default(),
+ next_id: 0,
+ }
+ }
+
+ fn publish(&mut self, key: K, value: V) -> Id {
+ debug_assert!(self.inner.iter().all(|i| i.key != key));
+ let id = Id(self.next_id);
+ self.next_id = self.next_id.wrapping_add(1);
+ self.inner.push(Entry { key, id, value });
+ id
+ }
+
+ fn unpublish(&mut self, id: Id) -> Option<(K, V)> {
+ self.inner.iter().position(|i| i.id == id).map(|pos| {
+ let entry = self.inner.swap_remove(pos);
+ (entry.key, entry.value)
+ })
+ }
+}
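+
+// hedged sketch of the Registry contract: ids handed out by
+// `publish` are distinct among live entries, and `unpublish`
+// is a one-shot lookup-and-remove
+#[cfg(test)]
+mod registry_tests {
+ use super::Registry;
+
+ #[test]
+ fn publish_unpublish_roundtrip() {
+ let mut reg: Registry<&str, u32> = Registry::new();
+ let a = reg.publish("a", 1);
+ let b = reg.publish("b", 2);
+ assert_ne!(a, b);
+ assert_eq!(reg.unpublish(a), Some(("a", 1)));
+ // a second unpublish of the same id finds nothing
+ assert_eq!(reg.unpublish(a), None);
+ assert_eq!(reg.unpublish(b), Some(("b", 2)));
+ }
+}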
+
+struct State<K, V> {
+ registry: Registry<K, oneshot::Sender<V>>,
+ cache: LruCache<K, V>,
+ loopback: mpsc::UnboundedSender<Message<K, V>>,
+ pool: Arc<rayon::ThreadPool>,
+}
+
+impl<C, O> State<C, O>
+where
+ C: 'static + Clone + std::fmt::Debug + Send + Eq + std::hash::Hash + Command<Output = O>,
+ O: 'static + Clone + std::fmt::Debug + Send,
+{
+ fn complete(&mut self, id: Id, res: O) {
+ if let Some((cmd, dst)) = self.registry.unpublish(id) {
+ let _ignored = dst.send(res.clone());
+ self.cache.put(cmd, res);
+ }
+ }
+
+ fn dispatch(&mut self, cmd: C, dst: oneshot::Sender<O>) {
+ if let Some(cached) = self.cache.get(&cmd) {
+ let _ignored = dst.send(cached.clone());
+ return;
+ }
+
+ // Otherwise, publish it with a single subscriber
+ let id = self.registry.publish(cmd.clone(), dst);
+ // And submit the command to the pool
+ let sender = self.loopback.clone();
+ self.pool.spawn(move || {
+ // XXX on panic this ends up with a dangling enqueued
+ // command and one or more subscribers...
+ // ok since this pool propagates panics?
+ let result = cmd.exec();
+ // this is pool worker -> pool manager
+ // can only fail if the manager goes away
+ let _ignored_err = sender.send(Message::Complete(id, result));
+ });
+ }
+}
Created caca/src/config.rs
+use std::net::SocketAddr;
+
+use std::net::AddrParseError;
+use std::net::TcpListener;
+
+use crate::view::Theme;
+
+use std::num::NonZeroUsize;
+
+use std::path::PathBuf;
+
+#[derive(Debug, Clone)]
+pub(crate) struct GlobalConfig {
+ pub site: Site,
+ pub max_file_size_bytes: u64,
+ pub repo_object_cache_size: Option<usize>,
+ pub rename_similarity_threshold: Option<f32>,
+ pub metadata_config: Option<MetadataConfig>,
+ // TODO merge mailmap_config / global_mailmap
+ pub mailmap_config: MailmapConfig,
+ pub global_mailmap: Option<PathBuf>,
+ pub feed_size: Option<NonZeroUsize>,
+ pub log_size: NonZeroUsize,
+ pub allow_http_clone: bool,
+ pub cache_size: NonZeroUsize,
+ pub theme: Theme,
+ pub num_threads: Option<usize>,
+ pub export_all: bool,
+ pub listen_mode: ListenMode,
+ // blob_encoding?
+}
+
+#[derive(Debug, Clone)]
+pub(crate) struct Site {
+ pub listing_title: String,
+ pub listing_html_header: String,
+
+ pub base_url: String,
+ // for mounting it as a subfolder when reverse proxying
+ pub reverse_proxy_base: Option<String>,
+
+ // override the url displayed for clone
+ // gets the repo name appended
+ pub clone_base_url: Option<String>,
+}
+
+impl GlobalConfig {
+ // XXX iffy but i'm not doing a builder for this thing
+ pub fn check(self) -> crate::Result<Self> {
+ if self
+ .site
+ .clone_base_url
+ .as_ref()
+ .is_some_and(|u| u.ends_with('/'))
+ {
+ return Err("clone url must not end with slash".into());
+ }
+
+ let parsed = url::Url::parse(&self.repo_clone_url("repository"))?;
+ if !matches!(parsed.scheme(), "git" | "http" | "https") {
+ return Err(format!(
+ "clone url scheme must be git or http(s) got: {}",
+ parsed.scheme()
+ )
+ .into());
+ }
+
+ if !self.allow_http_clone && parsed.scheme().starts_with("http") {
+ return Err("clone url is http but http clone is disabled".into());
+ }
+
+ if self
+ .site
+ .reverse_proxy_base
+ .as_ref()
+ .is_some_and(|p| !p.starts_with('/') || p.ends_with('/'))
+ {
+ return Err(
+ "reverse proxy base must start with / and not end with it. ex: /valid".into(),
+ );
+ }
+
+ Ok(self)
+ }
+
+ pub fn repo_url(&self, name: &str) -> String {
+ format!(
+ "{}/{name}",
+ self.site.reverse_proxy_base.as_deref().unwrap_or_default()
+ )
+ }
+
+ pub fn repo_clone_url(&self, name: &str) -> String {
+ if let Some(ref url) = self.site.clone_base_url {
+ format!("{url}/{name}",)
+ } else {
+ format!(
+ "{}{}/{name}",
+ self.site.base_url,
+ self.site.reverse_proxy_base.as_deref().unwrap_or_default()
+ )
+ }
+ }
+
+ pub fn feed_base_url(&self) -> String {
+ // intentionally not using reverse_proxy_base
+ self.site.base_url.clone()
+ }
+}
+
+#[derive(Debug, Clone)]
+pub(crate) struct MetadataConfig {
+ pub spec: String,
+ pub filename: PathBuf,
+}
+
+impl Default for MetadataConfig {
+ fn default() -> Self {
+ Self {
+ spec: "HEAD".to_string(),
+ // gitconfig is not quite .ini eh
+ filename: PathBuf::from(".config/caca.ini"),
+ }
+ }
+}
+
+#[derive(Debug, Clone)]
+pub(crate) struct MailmapConfig {
+ pub spec: String,
+ pub filename: PathBuf,
+}
+
+impl Default for MailmapConfig {
+ fn default() -> Self {
+ Self {
+ spec: "HEAD".to_string(),
+ filename: PathBuf::from(".mailmap"),
+ }
+ }
+}
+
+#[derive(Debug, Clone)]
+#[allow(dead_code)]
+pub(crate) enum ListenMode {
+ External,
+ Bind(BindOptions),
+}
+
+#[allow(dead_code)]
+impl ListenMode {
+ pub fn external() -> Self {
+ Self::External
+ }
+
+ pub fn addr(addr: &str) -> std::result::Result<Self, AddrParseError> {
+ let addr = addr.parse()?;
+ Ok(Self::Bind(BindOptions {
+ addr,
+ admin_addr: None,
+ }))
+ }
+
+ pub fn with_admin(addr: &str, admin_addr: &str) -> std::result::Result<Self, AddrParseError> {
+ let addr = addr.parse()?;
+ let admin_addr = Some(admin_addr.parse()?);
+ Ok(Self::Bind(BindOptions { addr, admin_addr }))
+ }
+
+ pub fn to_non_blocking_sockets(&self) -> crate::Result<(TcpListener, Option<TcpListener>)> {
+ let (app, admin) = match self {
+ ListenMode::External => {
+ let mut env = listenfd::ListenFd::from_env();
+ let app = env
+ .take_tcp_listener(0)?
+ .ok_or("socket activation: need at least one tcp fd from env")?;
+ let admin = env.take_tcp_listener(1)?;
+ (app, admin)
+ }
+ ListenMode::Bind(opts) => {
+ let app = TcpListener::bind(opts.addr)?;
+ let admin = opts.admin_addr.map(TcpListener::bind).transpose()?;
+ (app, admin)
+ }
+ };
+
+ app.set_nonblocking(true)?;
+ if let Some(ref admin) = admin {
+ admin.set_nonblocking(true)?;
+ }
+
+ Ok((app, admin))
+ }
+}
+
+#[derive(Debug, Clone)]
+pub(crate) struct BindOptions {
+ pub(crate) addr: SocketAddr,
+ pub(crate) admin_addr: Option<SocketAddr>,
+}
Created caca/src/main.rs
+#![forbid(unsafe_code)]
+#![warn(
+ clippy::all,
+ clippy::await_holding_lock,
+ clippy::char_lit_as_u8,
+ clippy::checked_conversions,
+ clippy::dbg_macro,
+ clippy::debug_assert_with_mut_call,
+ clippy::doc_markdown,
+ clippy::empty_enum,
+ clippy::enum_glob_use,
+ clippy::exit,
+ clippy::expl_impl_clone_on_copy,
+ clippy::explicit_deref_methods,
+ clippy::explicit_into_iter_loop,
+ clippy::fallible_impl_from,
+ clippy::filter_map_next,
+ clippy::flat_map_option,
+ clippy::float_cmp_const,
+ clippy::fn_params_excessive_bools,
+ clippy::from_iter_instead_of_collect,
+ clippy::if_let_mutex,
+ clippy::implicit_clone,
+ clippy::imprecise_flops,
+ clippy::inefficient_to_string,
+ clippy::invalid_upcast_comparisons,
+ clippy::large_digit_groups,
+ clippy::large_stack_arrays,
+ clippy::large_types_passed_by_value,
+ clippy::let_unit_value,
+ clippy::linkedlist,
+ clippy::lossy_float_literal,
+ clippy::macro_use_imports,
+ clippy::manual_ok_or,
+ clippy::map_err_ignore,
+ clippy::map_flatten,
+ clippy::map_unwrap_or,
+ clippy::match_on_vec_items,
+ clippy::match_same_arms,
+ clippy::match_wild_err_arm,
+ clippy::match_wildcard_for_single_variants,
+ clippy::mem_forget,
+ clippy::mismatched_target_os,
+ clippy::missing_enforced_import_renames,
+ clippy::mut_mut,
+ clippy::mutex_integer,
+ clippy::needless_borrow,
+ clippy::needless_continue,
+ clippy::needless_for_each,
+ clippy::option_option,
+ clippy::path_buf_push_overwrite,
+ clippy::ptr_as_ptr,
+ clippy::rc_mutex,
+ clippy::ref_option_ref,
+ clippy::rest_pat_in_fully_bound_structs,
+ clippy::same_functions_in_if_condition,
+ clippy::semicolon_if_nothing_returned,
+ clippy::single_match_else,
+ clippy::string_add_assign,
+ clippy::string_add,
+ clippy::string_lit_as_bytes,
+ clippy::string_to_string,
+ clippy::trait_duplication_in_bounds,
+ clippy::unimplemented,
+ clippy::unnested_or_patterns,
+ clippy::unused_self,
+ clippy::useless_transmute,
+ clippy::verbose_file_reads,
+ clippy::zero_sized_map_values,
+ future_incompatible,
+ nonstandard_style,
+ rust_2018_idioms
+)]
+
+use std::{
+ num::NonZeroUsize,
+ path::{Path, PathBuf},
+ sync::Arc,
+};
+
+use tokio::net::TcpListener as AsyncTcpListener;
+
+use axum::{
+ extract::{Path as ReqPath, State},
+ http::{StatusCode, Uri},
+ response::{IntoResponse, Response},
+ routing::{get, post},
+ Router,
+};
+
+use tower_http::{limit::RequestBodyLimitLayer, trace::TraceLayer};
+
+use tracing_subscriber::{
+ filter::{EnvFilter, LevelFilter},
+ fmt,
+ prelude::*,
+};
+
+mod admin;
+mod client;
+mod config;
+mod metadata;
+mod repo;
+mod view;
+
+use crate::{client::Client, config::GlobalConfig, repo::RepoState, view::Theme};
+
+type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>; // yolo
+
+fn main() -> Result<()> {
+ tracing_subscriber::registry()
+ .with(fmt::Layer::default().compact().without_time())
+ .with(
+ EnvFilter::builder()
+ .with_default_directive(LevelFilter::INFO.into())
+ .from_env_lossy(),
+ )
+ .init();
+
+ let config = Arc::new(
+ GlobalConfig {
+ site: config::Site {
+ listing_title: String::from("caio's code asylum"),
+ listing_html_header: String::from("<h1>caca</h1>"),
+ base_url: String::from("http://localhost:42080"),
+ clone_base_url: None,
+ // to allow mounting caca as a subdirectory
+ // when set, a repo url is base_url + reverse_proxy_base + / + name
+ reverse_proxy_base: None,
+ },
+ max_file_size_bytes: 2 * 1024 * 1024,
+ rename_similarity_threshold: Some(0.7),
+ repo_object_cache_size: Some(20 * 1024 * 1024),
+ metadata_config: Some(config::MetadataConfig::default()),
+ mailmap_config: config::MailmapConfig::default(),
+ global_mailmap: None,
+ feed_size: NonZeroUsize::new(40),
+ log_size: NonZeroUsize::new(30).unwrap(),
+ allow_http_clone: true,
+ cache_size: NonZeroUsize::new(1000).unwrap(),
+ // theme: Theme::Static,
+ theme: Theme::AutoReload("caca/theme".to_string()),
+ num_threads: None,
+ export_all: true, // false => require git-daemon-export-ok
+ listen_mode: config::ListenMode::addr("[::]:42080")?,
+ // listen_mode: config::ListenMode::with_admin("[::]:42080", "[::1]:42081")?,
+ // listen_mode: config::ListenMode::external(),
+ }
+ .check()?,
+ );
+
+ // May fiddle with env. keep it early at boot
+ let (app_listener, admin_listener) = config.listen_mode.to_non_blocking_sockets()?;
+
+ let num_threads = config
+ .num_threads
+ .unwrap_or(std::thread::available_parallelism()?.get());
+
+ let pool = Arc::new(
+ rayon::ThreadPoolBuilder::new()
+ .thread_name(|number| format!("caca-cpu-{number:02}"))
+ .num_threads(num_threads)
+ .build()?,
+ );
+
+ let basedir =
+ PathBuf::from(std::env::args().nth(1).expect("path as first arg")).canonicalize()?;
+
+ let repos = open_repos(basedir.clone(), &pool, Arc::clone(&config))?;
+ if repos.is_empty() {
+ // XXX not too difficult to allow admin add
+ return Err("No git repositories found".into());
+ }
+ tracing::info!("{} repositories loaded", repos.len());
+
+ let rt = tokio::runtime::Builder::new_current_thread()
+ .enable_all()
+ .build()?;
+
+ rt.block_on(async move {
+ let app_listener = AsyncTcpListener::from_std(app_listener)?;
+ let admin_listener = admin_listener.map(AsyncTcpListener::from_std).transpose()?;
+
+ let (admin, client) = admin::launch(repos, Arc::clone(&config), Arc::clone(&pool)).await?;
+
+ if let Some(listener) = admin_listener {
+ let addr = listener.local_addr()?;
+ tokio::spawn(async move {
+ tracing::info!(?addr, "Admin started");
+ let app = axum::Router::new()
+ .route("/update/*rest", post(admin_update))
+ .with_state(admin);
+ axum::serve(listener, app).await.expect("serves forever");
+ });
+ } else {
+ tracing::debug!("Admin NOT started");
+ }
+
+ let app = Router::new()
+ .route("/", get(handler))
+ .route("/*rest", get(handler))
+ .layer(TraceLayer::new_for_http())
+ .layer(RequestBodyLimitLayer::new(0))
+ .with_state(client);
+
+ let addr = app_listener.local_addr()?;
+ tracing::info!(?addr, "Server started");
+ axum::serve(app_listener, app).await?;
+ Ok(())
+ })
+}
+
+fn discover_git_repos(basedir: PathBuf, export_all: bool) -> Result<Vec<PathBuf>> {
+ let mut queue = vec![basedir];
+ let mut candidates = Vec::new();
+
+ 'queue: while let Some(dir) = queue.pop() {
+ let mut entries = std::fs::read_dir(&dir)?.flatten().collect::<Vec<_>>();
+
+ // backwards so i can swap_remove safely
+ for idx in (0..entries.len()).rev() {
+ let entry = &entries[idx];
+ let filetype = entry.file_type()?;
+ if let Some(name) = entry.path().file_name() {
+ if filetype.is_dir() && name == ".git" {
+ // dir contains a subdir named .git: likely a worktree
+ if !export_all && !dir.join(".git/git-daemon-export-ok").exists() {
+ tracing::debug!(?dir, ".git/git-daemon-export-ok not found");
+ } else {
+ candidates.push(dir);
+ }
+ continue 'queue;
+ }
+ if filetype.is_file() && name == "HEAD" {
+ // dir contains a HEAD file: likely a bare repo
+ if !export_all && !dir.join("git-daemon-export-ok").exists() {
+ tracing::debug!(?dir, "git-daemon-export-ok not found");
+ } else {
+ candidates.push(dir);
+ }
+ continue 'queue;
+ }
+ }
+
+ // retain the subdirectories to keep searching
+ if !filetype.is_dir() {
+ entries.swap_remove(idx);
+ }
+ }
+
+ // if `dir` is not a possible git repo, check its subdirectories
+ queue.extend(entries.into_iter().map(|e| e.path()));
+ }
+
+ Ok(candidates)
+}
+
+async fn admin_update(
+ State(state): State<admin::SharedAdmin>,
+ ReqPath(repo): ReqPath<String>,
+) -> Response {
+ let guard = state.admin.lock().await;
+ match guard.update(repo).await {
+ Ok(()) => StatusCode::OK.into_response(),
+ Err(err) => match err {
+ admin::UpdateError::Build(err) => match err {
+ admin::BuildError::Urso(e) => {
+ (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()).into_response()
+ }
+ admin::BuildError::NotFound => StatusCode::NOT_FOUND.into_response(),
+ admin::BuildError::PoolReceiveErr => (
+ StatusCode::INTERNAL_SERVER_ERROR,
+ "no response from pool during build",
+ )
+ .into_response(),
+ },
+ admin::UpdateError::AdminDown => {
+ (StatusCode::INTERNAL_SERVER_ERROR, "no response from admin").into_response()
+ }
+ admin::UpdateError::ClientDown => {
+ (StatusCode::INTERNAL_SERVER_ERROR, "no response from client").into_response()
+ }
+ },
+ }
+}
+
+fn open_repos(
+ mut basedir: PathBuf,
+ pool: &rayon::ThreadPool,
+ config: Arc<GlobalConfig>,
+) -> Result<Vec<repo::Repository>> {
+ let git_dirs = discover_git_repos(basedir.clone(), config.export_all)?;
+
+ // special case for when one uses a git repo as a basedir:
+ // since basedir is used as a prefix, if it is a repository
+ // it'd end up with an empty name. so replace it with its
+ // parent
+ if git_dirs.len() == 1 && Path::new(&basedir) == git_dirs[0].as_path() {
+ let parent = &git_dirs[0].parent().ok_or("is / a git repo? lel")?;
+ tracing::warn!("basedir may be git repo. will use: {parent:?} as base");
+ basedir = parent.to_path_buf();
+ }
+
+ let prefix = format!("{}/", basedir.display());
+ let (sender, receiver) = std::sync::mpsc::channel();
+ let mut num_dirs = 0;
+ for dir in git_dirs {
+ num_dirs += 1;
+ let sender = sender.clone();
+ let config = Arc::clone(&config);
+
+ let name = dir
+ .to_string_lossy()
+ .strip_prefix(&prefix)
+ .map(|n| n.to_owned())
+ .ok_or_else(|| format!("found repo at {dir:?} not prefixed by {prefix}"))?;
+
+ assert!(
+ !name.is_empty(),
+ "repo name must not be empty. prefix={prefix}"
+ );
+
+ pool.spawn(move || {
+ // Yield back a name, the opened repository and its state
+ // In case of errors it yields a name and which error happened
+ let to_send = match urso::Urso::open(
+ dir,
+ config.max_file_size_bytes,
+ config.rename_similarity_threshold,
+ config.repo_object_cache_size,
+ )
+ .and_then(|urso| {
+ RepoState::new(name.clone(), &urso, &config).map(|state| (urso, state))
+ }) {
+ Ok((urso, state)) => (name, Ok((urso.into_handle(), state))),
+ Err(err) => (name, Err(err)),
+ };
+
+ sender.send(to_send).expect("can send reply");
+ });
+ }
+
+ drop(sender);
+ let mut repos = Vec::with_capacity(num_dirs);
+ while let Ok((name, open_result)) = receiver.recv() {
+ match open_result {
+ Ok((handle, state)) => {
+ tracing::debug!(
+ name,
+ head = state.snapshot.head.commit.message.title,
+ "Repository loaded"
+ );
+ repos.push(repo::Repository {
+ handle,
+ state: Arc::new(state),
+ });
+ }
+ Err(err) => {
+ tracing::error!("Error loading repo {name}: {err}");
+ }
+ }
+ }
+
+ // just so there's some stability
+ repos.sort_unstable_by(|a, b| a.name.cmp(&b.name));
+ Ok(repos)
+}
+
+async fn handler(State(client): State<Client>, uri: Uri) -> Response {
+ client.handle(uri).await
+}
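The loader above fans work out to a pool and funnels results back over an mpsc channel, dropping the original sender so the receive loop ends once every worker is done. A minimal sketch of the same fan-out/fan-in shape using plain `std::thread` (the pool, `Urso`, and `RepoState` are elided; `open_all` and its fake "open" are stand-ins):

```rust
use std::sync::mpsc;
use std::thread;

// Pretend each "dir" needs an expensive open; do them in parallel
// and collect whatever finishes, in completion order.
fn open_all(dirs: Vec<String>) -> Vec<Result<String, String>> {
    let (sender, receiver) = mpsc::channel();
    for dir in dirs {
        let sender = sender.clone();
        thread::spawn(move || {
            let result = if dir.is_empty() {
                Err("cannot open empty path".to_string())
            } else {
                Ok(format!("opened {dir}"))
            };
            // the receiver outlives the workers, so send can't fail here
            sender.send(result).expect("receiver alive");
        });
    }
    // crucial: drop the original sender, otherwise the receiver
    // blocks forever after the last worker finishes
    drop(sender);
    let mut out: Vec<_> = receiver.iter().collect();
    out.sort(); // arrival order is nondeterministic; sort for stability
    out
}

fn main() {
    let results = open_all(vec!["a".into(), "".into(), "b".into()]);
    assert_eq!(results.len(), 3);
    assert!(results.contains(&Ok("opened a".to_string())));
}
```

The `drop(sender)` mirrors the real code: `recv()` only returns `Err` (ending the loop) once every clone of the sender has been dropped.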
Created caca/src/metadata.rs
+#[derive(Debug, Clone, Default, PartialEq, Eq, std::hash::Hash, serde::Serialize)]
+pub(crate) struct Metadata {
+ pub description: Option<String>,
+ pub www: Option<String>,
+ pub links: Vec<Link>,
+ pub state: State,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, std::hash::Hash, serde::Serialize)]
+pub(crate) struct Link {
+ name: String,
+ href: String,
+ title: Option<String>,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq, std::hash::Hash, serde::Serialize)]
+pub(crate) enum State {
+ Archived,
+ Default,
+ Pinned,
+}
+
+impl Default for State {
+ fn default() -> Self {
+ Self::Default
+ }
+}
+
+impl State {
+ fn from_str(s: &str) -> Option<Self> {
+ match s {
+ "Archived" | "archived" => Some(Self::Archived),
+ "Pinned" | "pinned" => Some(Self::Pinned),
+ "Default" | "default" => Some(Self::Default),
+ _ => None,
+ }
+ }
+}
+
+impl Metadata {
+ pub(crate) fn is_any_set(&self) -> bool {
+ self != &Self::default()
+ }
+}
+
+pub(crate) fn read_metadata(
+ urso: &urso::Urso,
+ rev: &str,
+ path: &std::path::Path,
+ buf: &mut Vec<u8>,
+) -> Metadata {
+ let head = match urso.rev_parse(rev) {
+ Ok(head) => head,
+ Err(err) => {
+ tracing::trace!(?err, "unable to parse rev spec");
+ return Metadata::default();
+ }
+ };
+
+ if let Err(err) = urso.get_file_contents(head, path, buf) {
+ tracing::trace!(?err, ?path, "unable to retrieve metadata file");
+ return Metadata::default();
+ };
+
+ match parse_metadata(buf) {
+ Ok(done) => done,
+ Err(err) => {
+ tracing::error!(?err, ?path, "bad metadata file");
+ Metadata::default()
+ }
+ }
+}
+
+#[derive(Debug, PartialEq)]
+pub(crate) enum ParseError {
+ Bug(&'static str),
+ Parse(urso::config::Error),
+}
+
+impl From<urso::config::Error> for ParseError {
+ fn from(value: urso::config::Error) -> Self {
+ Self::Parse(value)
+ }
+}
+
+impl std::fmt::Display for ParseError {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match self {
+ ParseError::Bug(e) => write!(f, "BUG: {}", e),
+ ParseError::Parse(w) => w.fmt(f),
+ }
+ }
+}
+
+impl std::error::Error for ParseError {
+ fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
+ match self {
+ ParseError::Parse(w) => Some(w),
+ ParseError::Bug(_) => None,
+ }
+ }
+}
+
+pub(crate) fn parse_metadata(data: &[u8]) -> Result<Metadata, ParseError> {
+ let mut metadata = Metadata::default();
+
+ let mut err = None;
+ urso::config::parse(data, |section, subsection, key, value| -> bool {
+ let value = String::from_utf8_lossy(value);
+ match section {
+ "meta" => {
+ if subsection.is_some() {
+ err = Some(ParseError::Bug("meta section must not have subsection"));
+ return false;
+ }
+ match key {
+ "description" => metadata.description = Some(value.into_owned()),
+ "www" => metadata.www = Some(value.into_owned()),
+ "state" => {
+ let state = State::from_str(&value).unwrap_or_else(|| {
+ tracing::warn!(state = value.as_ref(), "invalid state");
+ State::Default
+ });
+ metadata.state = state;
+ }
+ _ => {
+ tracing::warn!(section, subsection, key, ?value, "unknown key");
+ }
+ };
+ }
+ "link" => {
+ let Some(name) = subsection else {
+ err = Some(ParseError::Bug("links must have a subsection"));
+ return false;
+ };
+
+ let idx = metadata
+ .links
+ .iter()
+ .position(|l| l.name == name)
+ .unwrap_or_else(|| {
+ metadata.links.push(Link {
+ name: name.to_string(),
+ href: Default::default(),
+ title: None,
+ });
+ metadata.links.len() - 1
+ });
+
+ match key {
+ "href" => metadata.links[idx].href = value.into_owned(),
+ "title" => metadata.links[idx].title = Some(value.into_owned()),
+ _ => {
+ tracing::warn!(section, subsection, key, ?value, "unknown key");
+ }
+ };
+ }
+            // XXX is ignoring unknown sections the best approach?
+ _ => {}
+ };
+
+ true
+ })?;
+
+ if let Some(err) = err {
+ Err(err)
+ } else {
+ Ok(metadata)
+ }
+}
+
+#[cfg(test)]
+mod tests {
+
+ #[test]
+ fn parses_ok() -> Result<(), super::ParseError> {
+ // nothing set is valid
+ for input in ["", "[meta]", "[unknown]"] {
+ let config = super::parse_metadata(input.as_bytes())?;
+ assert_eq!(super::Metadata::default(), config);
+ }
+
+        // an empty value is distinct from omitting the key
+ {
+ let input = "
+ [meta]
+ description=";
+ let config = super::parse_metadata(input.as_bytes())?;
+ assert_eq!(Some("".into()), config.description);
+ }
+
+ // line escape ok
+ {
+ let input = "
+[meta]
+ description = hello \
+ world
+ www= some\
+ thing
+";
+ let config = super::parse_metadata(input.as_bytes())?;
+ assert_eq!(Some("hello world".into()), config.description);
+ assert_eq!(Some("something".into()), config.www);
+ }
+
+ Ok(())
+ }
+}
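`parse_metadata` accumulates `[link "name"]` entries by locating an existing entry for the subsection name or pushing a fresh one, so repeated keys for the same subsection update the same slot. A standalone sketch of that find-or-push pattern (this `Link` is a stripped-down stand-in for the real struct):

```rust
#[derive(Debug, Default, PartialEq)]
struct Link {
    name: String,
    href: String,
}

// Return the index of the link named `name`, creating it on first sight.
// Later keys for the same subsection then land in the same entry.
fn find_or_push(links: &mut Vec<Link>, name: &str) -> usize {
    links.iter().position(|l| l.name == name).unwrap_or_else(|| {
        links.push(Link {
            name: name.to_string(),
            ..Default::default()
        });
        links.len() - 1
    })
}

fn main() {
    let mut links = Vec::new();
    let i = find_or_push(&mut links, "Mirror");
    links[i].href = "https://example.invalid".into();
    // the same subsection name resolves to the same slot
    let j = find_or_push(&mut links, "Mirror");
    assert_eq!(i, j);
    assert_eq!(links.len(), 1);
}
```

Linear scan is fine here because a metadata file has a handful of links at most; a map would only pay off for large inputs.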
Created caca/src/repo/feed.rs
+use std::{
+ cmp::{Ordering, Reverse},
+ collections::BinaryHeap,
+ num::NonZeroUsize,
+};
+
+use super::{DateTime, HexId, Signature};
+
+#[derive(Clone, Debug, PartialEq, Eq, std::hash::Hash, serde::Serialize)]
+#[serde(tag = "kind")]
+pub(crate) enum FeedEntry {
+ Tag(TagActivity),
+ Branch(BranchActivity),
+}
+
+// Tag <tag> created on commit <id> by <author> | <time>
+#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, serde::Serialize)]
+pub(crate) struct TagActivity {
+ // XXX need unique id for feed
+ pub tag_name: String,
+ pub browse_url: String,
+
+ pub tagger: Option<Signature>,
+ pub annotation: Option<String>,
+
+ pub commit: CommitActivity,
+}
+
+impl std::hash::Hash for TagActivity {
+ fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+ self.browse_url.hash(state);
+ }
+}
+
+#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, serde::Serialize)]
+pub(crate) struct CommitActivity {
+ pub author: Signature,
+
+ pub id: HexId,
+ pub url: String,
+
+ pub title: String,
+ pub body: String,
+}
+
+// <id> <commit message> on <branch> by <author> | <time>
+#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, serde::Serialize)]
+pub(crate) struct BranchActivity {
+ pub branch_name: String,
+ pub browse_url: String,
+ pub is_default_branch: bool,
+
+ pub commit: CommitActivity,
+}
+
+impl std::hash::Hash for BranchActivity {
+ fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+ self.browse_url.hash(state);
+ }
+}
+
+impl FeedEntry {
+ pub fn time(&self) -> DateTime {
+ match self {
+ FeedEntry::Tag(t) => t.tagger.as_ref().map_or(t.commit.author.time, |t| t.time),
+ FeedEntry::Branch(b) => b.commit.author.time,
+ }
+ }
+
+ // used for sorting stability
+ fn id(&self) -> HexId {
+ match self {
+ FeedEntry::Tag(t) => t.commit.id,
+ FeedEntry::Branch(b) => b.commit.id,
+ }
+ }
+}
+
+impl PartialOrd for FeedEntry {
+ fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+ Some(self.cmp(other))
+ }
+}
+
+impl Ord for FeedEntry {
+ fn cmp(&self, other: &Self) -> Ordering {
+ self.time().cmp(&other.time()).then(match (self, other) {
+ (FeedEntry::Tag(a), FeedEntry::Tag(b)) => a.tag_name.cmp(&b.tag_name),
+ (FeedEntry::Branch(a), FeedEntry::Branch(b)) => a.branch_name.cmp(&b.branch_name),
+ // Tag > Branch
+ (FeedEntry::Branch(_), FeedEntry::Tag(_)) => Ordering::Less,
+ (FeedEntry::Tag(_), FeedEntry::Branch(_)) => Ordering::Greater,
+ })
+ }
+}
+
+#[derive(Clone, Debug, PartialEq, Eq, serde::Serialize)]
+pub(crate) struct GlobalFeedEntry {
+ pub(crate) repo: String,
+ #[serde(flatten)]
+ pub(crate) entry: FeedEntry,
+}
+
+impl PartialOrd for GlobalFeedEntry {
+ fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+ Some(self.cmp(other))
+ }
+}
+
+impl Ord for GlobalFeedEntry {
+ fn cmp(&self, other: &Self) -> Ordering {
+ // time > name > id
+ self.entry.time().cmp(&other.entry.time()).then_with(|| {
+ self.repo
+ .cmp(&other.repo)
+ .then_with(|| self.entry.id().cmp(&other.entry.id()))
+ })
+ }
+}
+
+#[derive(Debug)]
+pub(crate) struct TopK<T: Ord> {
+ q: BinaryHeap<Reverse<T>>,
+ k: usize,
+}
+
+impl<T: Ord> TopK<T> {
+ pub(crate) fn new(k: NonZeroUsize) -> Self {
+ let k = k.get();
+ let q = BinaryHeap::with_capacity(k);
+ Self { q, k }
+ }
+
+ pub(crate) fn min(&self) -> Option<&T> {
+ self.q.peek().map(|Reverse(i)| i)
+ }
+
+ pub(crate) fn len(&self) -> usize {
+ self.q.len()
+ }
+
+ pub(crate) fn insert(&mut self, info: T) -> bool {
+ if self.q.len() == self.k {
+ if let Some(mut oldest) = self.q.peek_mut() {
+ if oldest.0 < info {
+ *oldest = Reverse(info);
+ return true;
+ }
+ }
+ } else {
+ self.q.push(Reverse(info));
+ return true;
+ }
+ false
+ }
+
+ pub(crate) fn finish(self) -> Vec<T> {
+ self.q
+ .into_sorted_vec()
+ .into_iter()
+ .map(|Reverse(c)| c)
+ .collect()
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn topk_works() {
+ let mut q = TopK::new(NonZeroUsize::new(3).unwrap());
+ for i in 0..6 {
+ q.insert(i);
+ }
+ assert_eq!(vec![5, 4, 3], q.finish());
+ }
+}
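`TopK` keeps the k largest entries by storing `Reverse(T)` in a `BinaryHeap`, so the heap's root is the current minimum and can be evicted in O(log k) when something larger arrives. The same bounded top-k as a self-contained function (a sketch, generic over any `Ord`, without the `NonZeroUsize` guard the real type uses):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Keep only the k largest items seen; O(log k) per insert.
fn top_k<T: Ord>(items: impl IntoIterator<Item = T>, k: usize) -> Vec<T> {
    let mut heap: BinaryHeap<Reverse<T>> = BinaryHeap::with_capacity(k);
    for item in items {
        if heap.len() < k {
            heap.push(Reverse(item));
        } else if let Some(mut min) = heap.peek_mut() {
            // evict the smallest retained item when a larger one arrives
            if min.0 < item {
                *min = Reverse(item);
            }
        }
    }
    // sorting Reverse<T> ascending yields T in descending order
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse(t)| t)
        .collect()
}

fn main() {
    assert_eq!(top_k(0..6, 3), vec![5, 4, 3]);
    assert!(top_k(Vec::<i32>::new(), 3).is_empty());
}
```

`peek_mut` is the key trick: mutating through the returned guard sifts the heap back into shape when the guard drops, avoiding a separate pop/push pair.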
Created caca/src/repo/id.rs
+// A thin layer over gix::ObjectId that serializes
+// as hex string instead of tagged enum + &[u8]
+use urso::ObjectId;
+
+#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, std::hash::Hash)]
+pub struct HexId {
+ pub id: urso::ObjectId,
+}
+
+impl std::fmt::Debug for HexId {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ write!(f, "{}", self.id.to_hex())
+ }
+}
+
+impl std::fmt::Display for HexId {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ self.id.to_hex().fmt(f)
+ }
+}
+
+impl From<ObjectId> for HexId {
+ fn from(id: ObjectId) -> Self {
+ Self { id }
+ }
+}
+
+impl From<&ObjectId> for HexId {
+ fn from(&id: &ObjectId) -> Self {
+ Self { id }
+ }
+}
+
+impl From<HexId> for ObjectId {
+ fn from(val: HexId) -> Self {
+ val.id
+ }
+}
+
+impl serde::Serialize for HexId {
+ fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
+ where
+ S: serde::Serializer,
+ {
+ // FIXME will break on sha256
+ let mut hex = [0u8; 40];
+ let max_len = self.id.hex_to_buf(hex.as_mut());
+ let hex = std::str::from_utf8(&hex[..hex.len().min(max_len)]).expect("ascii only in hex");
+ serializer.serialize_str(hex)
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use serde_test::{assert_ser_tokens, Configure, Token};
+
+ #[test]
+ fn hexid_serializes_as_string() {
+ let hex = super::HexId {
+ id: urso::ObjectId::from_hex(&[b'0'; 40]).expect("null object hash is valid"),
+ };
+ assert_ser_tokens(
+ &hex.compact(),
+ &[Token::Str("0000000000000000000000000000000000000000")],
+ );
+ }
+}
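`HexId` exists so object ids serialize as plain hex strings rather than serde's tagged-enum-plus-bytes form. The encoding itself is just two lowercase hex digits per byte; a minimal sketch of that step (the real code delegates to `ObjectId::hex_to_buf`):

```rust
// Encode raw bytes as a lowercase hex string, as a 20-byte SHA-1
// object id becomes the familiar 40-character form.
fn to_hex(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

fn main() {
    assert_eq!(to_hex(&[0xde, 0xad, 0xbe, 0xef]), "deadbeef");
    assert_eq!(to_hex(&[0u8; 20]).len(), 40);
}
```

This is also why the `FIXME` about sha256 in the serializer matters: a fixed 40-byte buffer only fits SHA-1 ids, while SHA-256 ids need 64 hex characters.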
Created caca/src/repo/mod.rs
+use std::{collections::HashSet, num::NonZeroUsize, sync::Arc};
+
+use chrono::TimeZone;
+use urso::{
+ diff::{Change, Event, Patch},
+ guess_mime, Error, Mailmap, ObjectId, Urso, UrsoHandle,
+};
+
+use crate::{
+ metadata::{read_metadata, Link, Metadata, State},
+ view::{self, render_markdown},
+ GlobalConfig,
+};
+
+mod feed;
+mod id;
+mod util;
+
+use feed::{FeedEntry, GlobalFeedEntry, TopK};
+
+pub(crate) use id::HexId;
+
+pub(crate) type DateTime = chrono::DateTime<chrono::FixedOffset>;
+
+#[derive(Clone)]
+pub(crate) struct Repository {
+ pub handle: UrsoHandle,
+ pub state: std::sync::Arc<RepoState>,
+}
+
+#[derive(Clone)]
+pub(crate) struct RepoState {
+ pub name: String,
+ pub clone_url: String,
+ pub repo_url: String,
+ pub feed_base_url: String,
+ pub reverse_proxy_base: String,
+ pub snapshot: Snapshot,
+ // NOTE gix-mailmap::Snapshot could impl debug+hash
+ pub mailmap: Mailmap,
+ pub feed: Vec<FeedEntry>,
+}
+
+impl RepoState {
+ pub fn raw(urso: &Urso, ctx: Context, path: String) -> urso::Result<(&'static str, Vec<u8>)> {
+ let mut data = Vec::new();
+ let (mime, _is_text) = urso.get_file_contents(ctx.head(), &path, &mut data)?;
+ Ok((mime, data))
+ }
+
+ // this is even worse than info/refs when a repo is large
+ pub fn refs(&self) -> Refs<'_> {
+        // XXX allocates a lot; the result could be persisted
+
+ let branches = self
+ .snapshot
+ .branches
+ .iter()
+ .map(|b| BranchRef {
+ name: b.name.clone(),
+ time: b.commit.author.time,
+ time_relative: b.commit.author.time.into(),
+ browse_url: self.branch_browse_url(&b.name),
+ log_url: self.branch_log_url(&b.name),
+ })
+ .collect();
+
+ let tags = self
+ .snapshot
+ .tags
+ .iter()
+ .map(|t| TagRef {
+ name: t.name.clone(),
+ annotation: t.annotation.as_ref().map(|a| a.message.clone()),
+ time: t.time(),
+ time_relative: t.time().into(),
+ browse_url: self.tag_browse_url(&t.name),
+ log_url: self.tag_log_url(&t.name),
+ })
+ .collect();
+ Refs {
+ repo: self.repo_info(),
+ branches,
+ tags,
+ }
+ }
+
+ pub fn feed(&self) -> RepoFeed<'_> {
+ RepoFeed {
+ repo: self.repo_info(),
+ updated: self.feed.first().map(|e| e.time()).unwrap_or_default(),
+ baseurl: &self.feed_base_url,
+ entries: &self.feed[..],
+ }
+ }
+
+ pub fn log(
+ &self,
+ urso: &Urso,
+ ctx: Context,
+ size: usize,
+ path: String,
+ ) -> urso::Result<Log<'_>> {
+ debug_assert!(size > 0);
+ let mut entries = Vec::with_capacity(size);
+ let mut err = None;
+ let mut next = None;
+
+ urso.log(ctx.head(), &path, |commit, path, _prev, _cur| {
+ match map_commit(&commit, &self.mailmap) {
+ Ok(mapped) => {
+ entries.push(self.commit_as_activity(mapped));
+ }
+ Err(e) => {
+ err = Some(e);
+ return false;
+ }
+ };
+ // if at capacity, figure out the parent so a
+ // pagination url can be generated
+ // FIXME not quite right if the boundary is a merge commit
+ // better to take size+1 and trim
+ if entries.len() == size {
+ if let Some(first_parent) = commit.parent_ids().next() {
+ next = Some((first_parent.detach(), path.display().to_string()));
+ }
+ }
+ entries.len() < size
+ })?;
+
+ let next_url = next.map(|(parent, path)| {
+ let ctx = Context::from_id(parent.into());
+ self.log_url(&ctx, &path)
+ });
+
+ if let Some(err) = err {
+ Err(err)
+ } else {
+ Ok(Log {
+ repo: self.repo_info(),
+ nav: self.build_breadcrumbs(ctx, &path, CrumbKind::Log),
+ entries,
+ path,
+ next_url,
+ })
+ }
+ }
+
+ pub fn new(name: String, urso: &Urso, config: &GlobalConfig) -> urso::Result<Self> {
+ let mut buf = Vec::new();
+
+        // XXX could hold this open and clone(), but then reloading
+        // would require a process restart; definitely won't add a
+        // fswatch for this...
+ let mut global_mailmap = None;
+ if let Some(ref path) = config.global_mailmap {
+ match std::fs::read(path) {
+ Ok(data) => {
+ global_mailmap = Some(Mailmap::from_bytes(&data));
+ }
+ Err(err) => tracing::warn!(?err, "bad global mailmap"),
+ }
+ }
+
+ let (mailmap_version, mailmap) = urso
+ .rev_parse(&config.mailmap_config.spec)
+ .and_then(|id| {
+ urso.get_file_contents(id, &config.mailmap_config.filename, &mut buf)
+ .map(|_| {
+ tracing::debug!(repo=?urso.git_dir(), object=?id, "mailmap found");
+ (Some(id), Mailmap::from_bytes(&buf))
+ })
+ })
+ .unwrap_or((None, Mailmap::default()));
+
+        // keep the repo's own mailmap when there is no global one
+        let mailmap = match global_mailmap {
+            Some(mut global) => {
+                global.merge(mailmap.entries());
+                global
+            }
+            None => mailmap,
+        };
+
+ let (tags, branches) = render_refs(urso, &mailmap)?;
+
+ let default_branch = urso.default_branch()?;
+ let head = branches
+ .iter()
+ .find(|b| b.name == default_branch)
+ // shouldn't happen eh
+ .ok_or_else(|| {
+ tracing::error!("unable to find branch {default_branch}");
+ Error::DetachedHead
+ })?
+ .clone();
+ let head_id = head.commit.id.id;
+
+ let metadata = if let Some(ref conf) = config.metadata_config {
+ buf.clear();
+ read_metadata(urso, &conf.spec, &conf.filename, &mut buf)
+ } else {
+ Metadata::default()
+ };
+
+ if metadata.is_any_set() {
+ tracing::debug!(repo=?urso.git_dir(), ?metadata, "loaded metadata");
+ }
+
+ // XXX could just resolve ref so that HEAD is valid
+ let www_head = metadata
+ .www
+ .as_ref()
+ .and_then(|n| branches.iter().find(|b| &b.name == n))
+ .map(|b| b.commit.id);
+
+ if metadata.www.is_some() && www_head.is_none() {
+ tracing::error!(repo=?urso.git_dir(), ?metadata, "configured www branch does not exist");
+ }
+
+ let snapshot = Snapshot {
+ head,
+ tags,
+ branches,
+ metadata,
+ www_head,
+ mailmap_version,
+ readme: find_readme(urso, head_id, "")?,
+ };
+
+ let clone_url = config.repo_clone_url(&name);
+ let repo_url = config.repo_url(&name);
+
+ let reverse_proxy_base = config
+ .site
+ .reverse_proxy_base
+ .as_ref()
+ .cloned()
+ .unwrap_or_default();
+
+ let mut repo = Self {
+ name,
+ repo_url,
+ clone_url,
+ feed_base_url: config.feed_base_url(),
+ snapshot,
+ mailmap,
+ reverse_proxy_base,
+ feed: Vec::new(),
+ };
+
+ if let Some(k) = config.feed_size {
+ repo.build_feed(urso, k)?;
+ }
+
+ Ok(repo)
+ }
+
+ fn default_context(&self) -> Context {
+ Context {
+ head: self.snapshot.head.commit.id,
+ kind: ContextKind::Branch(self.snapshot.head.name.clone()),
+ }
+ }
+
+ fn tag_as_activity(&self, tag: Tag) -> feed::TagActivity {
+ let (annotation, tagger) = tag
+ .annotation
+ .map(|ann| (Some(ann.message), Some(ann.author)))
+ .unwrap_or_default();
+
+ feed::TagActivity {
+ browse_url: self.tag_browse_url(&tag.name),
+ tag_name: tag.name,
+ tagger,
+ annotation,
+ commit: self.commit_as_activity(tag.commit),
+ }
+ }
+
+ fn commit_as_activity(&self, commit: CommitInfo) -> feed::CommitActivity {
+ feed::CommitActivity {
+ author: commit.author,
+ id: commit.id,
+ url: self.commit_url(commit.id),
+ title: commit.message.title,
+ body: commit.message.body,
+ }
+ }
+
+ fn branch_activity(&self, name: String, commit: CommitInfo) -> feed::BranchActivity {
+ feed::BranchActivity {
+ is_default_branch: self.snapshot.head.name == name,
+ browse_url: self.branch_browse_url(&name),
+ branch_name: name,
+ commit: self.commit_as_activity(commit),
+ }
+ }
+
+ fn build_feed(&mut self, urso: &Urso, k: NonZeroUsize) -> urso::Result<()> {
+ let mut feed = TopK::new(k);
+ // start with tags since there's no need to look further than
+ // their tip
+ for rev in self.snapshot.tags.iter() {
+ if !feed.insert(FeedEntry::Tag(self.tag_as_activity(rev.clone()))) {
+ break;
+ }
+ }
+
+ // commits may appear in multiple branches, but showing
+ // the same commit in the feed where just the branch
+ // name changed doesn't make much sense; so i don't
+ let mut seen = HashSet::new();
+ let limit = k.get();
+ let mut add_to_feed = |rev: &Branch| -> urso::Result<bool> {
+            // when at capacity, can't add anything if `rev`'s tip
+ // is older than the oldest entry in the feed
+ if feed.len() == limit
+ && feed.min().expect("feed is at capacity").time() >= rev.commit.author.time
+ {
+ return Ok(false);
+ }
+
+ let mut taken = 0;
+ let mut err = None;
+ urso.log(rev.commit.id.id, "", |commit, _path, _prev, _cur| {
+ if !seen.insert(commit.id) {
+ return true;
+ }
+ match map_commit(&commit, &self.mailmap) {
+ Ok(mapped) => {
+ let entry =
+ FeedEntry::Branch(self.branch_activity(rev.name.clone(), mapped));
+ if !feed.insert(entry) {
+ return false;
+ }
+ taken += 1;
+ }
+ Err(e) => {
+ err = Some(e);
+ return false;
+ }
+ };
+ taken < limit
+ })?;
+ if let Some(err) = err {
+ Err(err)
+ } else {
+ Ok(true)
+ }
+ };
+
+ // so: first load activity from the default branch
+ // and keep track of the ids seen. this way only brand
+ // new commits on side branches will be listed
+ add_to_feed(&self.snapshot.head)?;
+
+ // now walk through the branches, using the tip as a
+        // guide to know whether there's any chance it contains
+ // recent commits
+ for rev in self.snapshot.branches.iter() {
+ if rev.name == self.snapshot.head.name {
+ continue;
+ }
+ if !add_to_feed(rev)? {
+ tracing::trace!(repo=?urso.git_dir(), branch=?rev.name, "old branch skipped");
+ break;
+ }
+ }
+
+ self.feed = feed.finish();
+ Ok(())
+ }
+
+ fn repo_pages(&self) -> Pages<'_> {
+ // XXX could persist
+ let ctx = self.default_context();
+ Pages {
+ files: self.tree_url(&ctx, ""),
+ history: self.log_url(&ctx, ""),
+ refs: format!("{}/{}/refs", self.reverse_proxy_base, self.name),
+ links: &self.snapshot.metadata.links,
+ }
+ }
+
+ pub fn summary(&self) -> Summary<'_> {
+ Summary {
+ repo: self.repo_info(),
+ pages: self.repo_pages(),
+ head: &self.snapshot.head,
+ readme: self.snapshot.readme.as_ref(),
+ // XXX could be a config eh
+ activity: &self.feed[..self.feed.len().min(10)],
+ }
+ }
+
+ pub fn show_commit(&self, urso: &Urso, ctx: Context) -> urso::Result<Commit<'_>> {
+ debug_assert!(
+ matches!(ctx.kind, ContextKind::Commit),
+ "bad wiring: only commit context should lead to show"
+ );
+ let commit = urso.find_commit(ctx.head())?;
+ let info = map_commit(&urso.find_commit(ctx.head())?, &self.mailmap)?;
+
+ let (parent, parent_ctx) = if let Some(parent) = commit.parent_ids().next() {
+ let pid = parent.detach();
+ (
+ Some(urso.find_commit(pid)?),
+ Some(Context::from_id(pid.into())),
+ )
+ } else {
+ (None, None)
+ };
+
+        // FIXME urls here should check object.kind since
+        // submodules/links currently 404. could fix that...
+ let mut events = Vec::new();
+ urso.diff(commit, parent, |event| {
+ match event {
+ Event::Addition(change) => {
+ let Change {
+ file,
+ object: _,
+ patch,
+ } = change;
+ let path = file.path.to_string_lossy().into_owned();
+ let diff = diff_from_patch(patch, file.mime);
+ let current_url = if matches!(diff, Diff::Image) {
+ Some(self.raw_url(&ctx, &path))
+ } else {
+ Some(self.blob_url(&ctx, &path))
+ };
+ events.push(DiffEvent {
+ kind: DiffEventKind::Created,
+ diff,
+ previous_url: None,
+ current_url,
+ path,
+ old_path: None,
+ });
+ }
+ Event::Deletion(change) => {
+ let Change {
+ file,
+ object: _,
+ patch,
+ } = change;
+ let path = file.path.to_string_lossy().into_owned();
+ let diff = diff_from_patch(patch, file.mime);
+ let previous_url = if matches!(diff, Diff::Image) {
+ parent_ctx.as_ref().map(|ctx| self.raw_url(ctx, &path))
+ } else {
+ parent_ctx.as_ref().map(|ctx| self.blob_url(ctx, &path))
+ };
+ events.push(DiffEvent {
+ kind: DiffEventKind::Deleted,
+ diff,
+ previous_url,
+ current_url: None,
+ path,
+ old_path: None,
+ });
+ }
+ Event::Modification { src: _, change } => {
+ let Change {
+ file,
+ object: _,
+ patch,
+ } = change;
+ let path = file.path.to_string_lossy().into_owned();
+ let diff = diff_from_patch(patch, file.mime);
+ let (current_url, previous_url) = if matches!(diff, Diff::Image) {
+ (
+ Some(self.raw_url(&ctx, &path)),
+ parent_ctx.as_ref().map(|ctx| self.raw_url(ctx, &path)),
+ )
+ } else {
+ (
+ Some(self.blob_url(&ctx, &path)),
+ parent_ctx.as_ref().map(|ctx| self.blob_url(ctx, &path)),
+ )
+ };
+ events.push(DiffEvent {
+ kind: DiffEventKind::Modified,
+ diff,
+ previous_url,
+ current_url,
+ path,
+ old_path: None,
+ });
+ }
+ Event::Rename {
+ src: _,
+ src_path,
+ change,
+ } => {
+ let Change {
+ file,
+ object: _,
+ patch,
+ } = change;
+ let src_path = src_path.to_string_lossy().into_owned();
+ let path = file.path.to_string_lossy().into_owned();
+
+ let diff = diff_from_patch(patch, file.mime);
+ let (current_url, previous_url) = if matches!(diff, Diff::Image) {
+ (
+ Some(self.raw_url(&ctx, &path)),
+ parent_ctx.as_ref().map(|ctx| self.raw_url(ctx, &src_path)),
+ )
+ } else {
+ (
+ Some(self.blob_url(&ctx, &path)),
+ parent_ctx.as_ref().map(|ctx| self.blob_url(ctx, &src_path)),
+ )
+ };
+
+ events.push(DiffEvent {
+ kind: DiffEventKind::Renamed,
+ diff,
+ previous_url,
+ current_url,
+ path,
+ old_path: Some(src_path),
+ });
+ }
+ };
+ })?;
+
+ Ok(Commit {
+ repo: self.repo_info(),
+ commit: info,
+ events,
+ })
+ }
+
+ pub fn blob(&self, urso: &Urso, ctx: Context, path: String) -> urso::Result<Blob<'_>> {
+ let (kind, content) = get_content(urso, &ctx, &path)?;
+ let raw_url = self.raw_url(&ctx, &path);
+
+ let mut num_lines = 0;
+ if let Some(ref data) = content {
+ num_lines = data.matches('\n').count();
+ if !data.ends_with('\n') {
+ num_lines += 1;
+ }
+ }
+
+ let tip = self.tip_from_info(
+ &ctx,
+ &path,
+ render_commit(urso, &self.mailmap, urso.tip(ctx.head(), &path)?)?,
+ );
+
+ Ok(Blob {
+ repo: self.repo_info(),
+ nav: self.build_breadcrumbs(ctx, &path, CrumbKind::Blob),
+ kind,
+ content: content.unwrap_or_default(),
+ raw_url,
+ num_lines,
+ tip,
+ path,
+ })
+ }
+
+ pub fn tree(&self, urso: &Urso, ctx: Context, path: String) -> urso::Result<Tree<'_>> {
+ let mut entries = Vec::new();
+ urso.list_path(ctx.head(), &path, |entry| {
+ let mode = entry.mode;
+ // XXX assuming utf8
+ let name = String::from_utf8_lossy(entry.name).into_owned();
+ let (kind, url) = {
+ if mode.is_tree() {
+ (EntryKind::Dir, self.tree_url_base(&ctx, &path, &name))
+ } else if mode.is_link() {
+ (EntryKind::Symlink, "#".into())
+ } else if mode.is_commit() {
+ (EntryKind::Submodule, "#".into())
+ } else {
+ debug_assert!(mode.is_blob(), "unhandled entry kind {}", mode.as_str());
+ (EntryKind::File, self.blob_url_base(&ctx, &path, &name))
+ }
+ };
+
+ entries.push(Entry { name, kind, url });
+ })?;
+ entries.sort_unstable_by(|a, b| a.kind.cmp(&b.kind).then_with(|| a.name.cmp(&b.name)));
+
+ let mut parent_url = None;
+ if !path.is_empty() {
+ let mut base = path.as_str();
+ if path.ends_with('/') {
+ base = &base[0..(base.len() - 1)];
+ }
+ if let Some((parent, _)) = base.rsplit_once('/') {
+ parent_url = Some(self.tree_url(&ctx, parent));
+ } else {
+ parent_url = Some(self.tree_url(&ctx, ""));
+ }
+ }
+
+ let tip = self.tip_from_info(
+ &ctx,
+ &path,
+ render_commit(urso, &self.mailmap, urso.tip(ctx.head(), &path)?)?,
+ );
+
+ let readme = find_readme(urso, ctx.head(), &path)?;
+
+ Ok(Tree {
+ repo: self.repo_info(),
+ nav: self.build_breadcrumbs(ctx, &path, CrumbKind::Tree),
+ path,
+ entries,
+ parent_url,
+ readme,
+ tip,
+ })
+ }
+
+ fn build_breadcrumbs(&self, ctx: Context, path: &str, kind: CrumbKind) -> Breadcrumbs {
+ let build_url = |path| match kind {
+ CrumbKind::Tree | CrumbKind::Blob => self.tree_url(&ctx, path),
+ CrumbKind::Log => self.log_url(&ctx, path),
+ };
+ let head_url = build_url("");
+ let mut components = Vec::new();
+ let mut tail = None;
+ util::breadcrumbs(path, |crumb| match crumb {
+ util::Crumb::Part { name, path } => {
+ components.push(Component {
+ value: name.to_string(),
+ url: build_url(path),
+ });
+ }
+ util::Crumb::End { name } => {
+ tail = Some(name.to_string());
+ }
+ });
+ Breadcrumbs {
+ head: ctx.into(),
+ head_url,
+ components,
+ tail,
+ kind,
+ }
+ }
+
+ pub fn log_url(&self, ctx: &Context, path: &str) -> String {
+ debug_assert!(!path.starts_with('/'), "bad input: {path}");
+ format!("{}/{}/log{ctx}/{path}", self.reverse_proxy_base, self.name)
+ }
+
+ pub fn tree_url(&self, ctx: &Context, path: &str) -> String {
+ debug_assert!(!path.ends_with('/'), "must not end with /: {path}");
+ if path.is_empty() {
+ format!("{}/{}/tree{ctx}/", self.reverse_proxy_base, self.name)
+ } else {
+ format!(
+ "{}/{}/tree{ctx}/{path}/",
+ self.reverse_proxy_base, self.name
+ )
+ }
+ }
+
+ pub fn commit_url(&self, id: impl Into<HexId>) -> String {
+ format!(
+ "{}/{}/commit/{}",
+ self.reverse_proxy_base,
+ self.name,
+ id.into()
+ )
+ }
+
+ fn tree_url_base(&self, ctx: &Context, base: &str, tail: &str) -> String {
+ debug_assert!(!base.starts_with('/'));
+ debug_assert!(!tail.ends_with('/'));
+ if base.is_empty() {
+ format!(
+ "{}/{}/tree{ctx}/{tail}/",
+ self.reverse_proxy_base, self.name
+ )
+ } else if base.ends_with('/') {
+ format!(
+ "{}/{}/tree{ctx}/{base}{tail}/",
+ self.reverse_proxy_base, self.name
+ )
+ } else {
+ format!(
+ "{}/{}/tree{ctx}/{base}/{tail}/",
+ self.reverse_proxy_base, self.name
+ )
+ }
+ }
+
+ fn blob_url_base(&self, ctx: &Context, base: &str, tail: &str) -> String {
+ debug_assert!(!base.starts_with('/'));
+ debug_assert!(!tail.ends_with('/'));
+ if base.is_empty() {
+ format!("{}/{}/blob{ctx}/{tail}", self.reverse_proxy_base, self.name)
+ } else if base.ends_with('/') {
+ format!(
+ "{}/{}/blob{ctx}/{base}{tail}",
+ self.reverse_proxy_base, self.name
+ )
+ } else {
+ format!(
+ "{}/{}/blob{ctx}/{base}/{tail}",
+ self.reverse_proxy_base, self.name
+ )
+ }
+ }
+
+ pub fn raw_url(&self, ctx: &Context, path: &str) -> String {
+ debug_assert!(!path.starts_with('/'), "bad input: {path}");
+ debug_assert!(!path.ends_with('/'), "bad input: {path}");
+ format!("{}/{}/raw{ctx}/{path}", self.reverse_proxy_base, self.name)
+ }
+
+ pub fn blob_url(&self, ctx: &Context, path: &str) -> String {
+ debug_assert!(!path.starts_with('/'), "bad input: {path}");
+ debug_assert!(!path.ends_with('/'), "bad input: {path}");
+ format!("{}/{}/blob{ctx}/{path}", self.reverse_proxy_base, self.name)
+ }
+
+ fn branch_browse_url(&self, name: &str) -> String {
+ format!(
+ "{}/{}/tree/branch/{name}/",
+ self.reverse_proxy_base, self.name
+ )
+ }
+
+ fn branch_log_url(&self, name: &str) -> String {
+ format!(
+ "{}/{}/log/branch/{name}/",
+ self.reverse_proxy_base, self.name
+ )
+ }
+
+ fn tag_browse_url(&self, name: &str) -> String {
+ format!("{}/{}/tree/tag/{name}/", self.reverse_proxy_base, self.name)
+ }
+
+ fn tag_log_url(&self, name: &str) -> String {
+ format!("{}/{}/log/tag/{name}/", self.reverse_proxy_base, self.name)
+ }
+
+ pub fn idle(&self) -> DateTime {
+ let head = self.snapshot.head.commit.author.time;
+ if let Some(tag) = self.snapshot.tags.first() {
+ let tagged_at = tag.time();
+ if tagged_at > head {
+ return tagged_at;
+ }
+ }
+ head
+ }
+
+ fn repo_info(&self) -> Info<'_> {
+ Info {
+ name: &self.name,
+ url: &self.repo_url,
+ clone_url: &self.clone_url,
+ description: self.snapshot.metadata.description.as_deref(),
+ }
+ }
+
+ fn tip_from_info(&self, ctx: &Context, path: &str, info: CommitInfo) -> Tip {
+ let log_url = self.log_url(ctx, path);
+ let url = self.commit_url(info.id);
+ Tip {
+ id: info.id,
+ author_name: info.author.name,
+ message: info.message,
+ author_time_relative: info.author.time.into(),
+ log_url,
+ url,
+ }
+ }
+
+ fn match_branch<'i>(&self, input: &'i str) -> Option<(&Branch, &'i str)> {
+ util::split_first_prefix(input, &self.snapshot.branches[..], |b| &b.name)
+ }
+
+ fn match_tag<'i>(&self, input: &'i str) -> Option<(&Tag, &'i str)> {
+ util::split_first_prefix(input, &self.snapshot.tags[..], |t| &t.name)
+ }
+
+ pub fn split_context<'i>(&self, input: &'i str) -> Result<(Context, &'i str), MatchError<'i>> {
+ let original_input = input;
+ let (kind, input) = original_input
+ .split_once('/')
+ .unwrap_or((original_input, ""));
+
+ match kind {
+ "branch" => {
+ if let Some((branch, input)) = self.match_branch(input) {
+ Ok((
+ Context {
+ head: branch.commit.id,
+ kind: ContextKind::Branch(branch.name.clone()),
+ },
+ input,
+ ))
+ } else {
+ Err(MatchError::BranchNotFound(input))
+ }
+ }
+ "tag" => {
+ if let Some((tag, input)) = self.match_tag(input) {
+ Ok((
+ Context {
+ head: tag.commit.id,
+ kind: ContextKind::Tag(tag.name.clone()),
+ },
+ input,
+ ))
+ } else {
+ Err(MatchError::TagNotFound(input))
+ }
+ }
+ // No reftype kind, maybe it's a commit
+ kind => {
+ if let Ok(commit_id) = ObjectId::from_hex(kind.as_bytes()) {
+ Ok((Context::from_id(commit_id.into()), input))
+ } else {
+ // otherwise, default context and assume the original
+ // input is a parameter for the view
+ Ok((self.default_context(), original_input))
+ }
+ }
+ }
+ }
+}
+
+fn get_content(
+ urso: &Urso,
+ ctx: &Context,
+ path: &str,
+) -> urso::Result<(ContentKind, Option<String>)> {
+ // guess the mime without loading data so that there's
+ // no need to read stuff that won't be rendered
+ let (mime, is_text) = urso::guess_mime(path, &[]);
+
+ // no need to buffer the content for images,
+ // just link to it
+ if mime.starts_with("image/") {
+ return Ok((ContentKind::Image, None));
+ }
+
+ // can't render otherwise
+ if !is_text {
+ return Ok((ContentKind::Other, None));
+ }
+
+ let Some(header) = urso.find_header(ctx.head(), path)? else {
+ return Err(Error::NotFound);
+ };
+
+ if !header.kind.is_blob() {
+ return Err(Error::NotFound);
+ }
+
+ if header.size > urso.max_bytes {
+ return Ok((ContentKind::TooLarge, None));
+ }
+
+ // actually load the data. mime guess is more reliable now
+ let mut data = Vec::new();
+ urso.read_blob(header.id, &mut data)?;
+ let (mime, is_text) = urso::guess_mime(path, &data);
+
+ if mime == "text/markdown" {
+ Ok((ContentKind::Rendered, Some(render_markdown(&data))))
+ } else if is_text {
+ match String::from_utf8(data) {
+ Ok(valid) => Ok((ContentKind::Text, Some(valid))),
+ Err(err) => {
+ tracing::warn!(
+ err = tracing::field::debug(err),
+ path = tracing::field::debug(&path),
+ "unable to decode"
+ );
+ Ok((ContentKind::Other, None))
+ }
+ }
+ } else {
+        tracing::warn!(path, mime, "mime guess mismatch: path-based guess said it's text");
+ Ok((ContentKind::Other, None))
+ }
+}
+
+#[derive(Clone, Debug, PartialEq, Eq, std::hash::Hash)]
+pub(crate) struct Snapshot {
+ pub head: Branch,
+ pub metadata: Metadata,
+ pub www_head: Option<HexId>,
+ pub mailmap_version: Option<ObjectId>,
+ pub readme: Option<Readme>,
+ pub branches: Vec<Branch>,
+ pub tags: Vec<Tag>,
+ // sniff the license / spdx tag?
+}
+
+#[derive(Clone, Debug, serde::Serialize)]
+pub(crate) struct Readme {
+ id: HexId,
+ path: String,
+ content: String,
+ mime: &'static str,
+}
+
+impl std::hash::Hash for Readme {
+ fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+ self.id.hash(state);
+ }
+}
+
+impl PartialEq for Readme {
+ fn eq(&self, other: &Self) -> bool {
+ self.id.eq(&other.id)
+ }
+}
+
+impl Eq for Readme {}
+
+// XXX fatal errors only; not found should yield none
+fn find_readme<P: AsRef<std::path::Path>>(
+ urso: &Urso,
+ head: ObjectId,
+ basedir: P,
+) -> urso::Result<Option<Readme>> {
+ let mut entries = Vec::new();
+ urso.list_path(head, &basedir, |entry| {
+ if entry.mode.is_blob()
+ && (entry.name.starts_with(b"README") || entry.name.starts_with(b"readme"))
+ {
+ // XXX assuming utf8
+ let name = String::from_utf8_lossy(entry.name).into_owned();
+ entries.push(ReadmeCandidate { id: entry.id, name });
+ }
+ })?;
+
+    // sort by longest name, lower case first, so that
+    // the last entry is the shortest upper-case name
+    // i.e.: if README, readme, and README-lang are all
+    // present, README will be the one that gets picked
+ entries.sort_unstable_by(|a, b| {
+ b.name
+ .len()
+ .cmp(&a.name.len())
+ .then_with(|| b.name.cmp(&a.name))
+ });
+ tracing::trace!(?entries, "ranked readme files");
+
+ if let Some(entry) = entries.pop() {
+ let mut data = Vec::new();
+ urso.read_blob(entry.id, &mut data)?;
+ let (mime, is_text) = guess_mime(&entry.name, &data);
+        tracing::debug!(name = entry.name, mime, "chose a readme");
+
+ if mime == "text/markdown" {
+ Ok(Some(Readme {
+ id: entry.id.into(),
+ path: entry.name,
+ content: view::render_markdown(&data),
+ mime,
+ }))
+ } else if is_text {
+ // FIXME might not be utf8
+ match String::from_utf8(data) {
+ Ok(content) => Ok(Some(Readme {
+ id: entry.id.into(),
+ path: entry.name,
+ content,
+ mime,
+ })),
+ Err(error) => {
+ tracing::error!(?entry, "decoding readme: {}", error);
+ Ok(None)
+ }
+ }
+ } else {
+ tracing::warn!(?entry, "readme not text");
+ Ok(None)
+ }
+ } else {
+ Ok(None)
+ }
+}
+
+fn render_refs(urso: &Urso, mailmap: &Mailmap) -> urso::Result<(Vec<Tag>, Vec<Branch>)> {
+ let mut tags = Vec::new();
+ let mut branches = Vec::new();
+
+ let mut err = None;
+
+ urso.local_refs(|refkind| {
+ match refkind {
+ urso::RefKind::Tag { tag, head } => {
+ let Some(tagger) = tag.tagger else {
+ tracing::warn!(
+ repo = tracing::field::debug(urso.git_dir()),
+ name = tracing::field::debug(tag.name),
+ "skipped tag: missing tagger"
+ );
+ return true;
+ };
+
+ let commit = match maybe_render_commit(urso, mailmap, head) {
+ Ok(Some(commit)) => commit,
+ Ok(None) => {
+ tracing::warn!(
+ repo = tracing::field::debug(urso.git_dir()),
+ name = tracing::field::debug(tag.name),
+ "skipped tag: non-commit object"
+ );
+ return true;
+ }
+ Err(e) => {
+ err = Some(e);
+ return false;
+ }
+ };
+
+ let author = map_signature(tagger, mailmap);
+ let annotation = Some(Annotation {
+ author,
+ message: String::from_utf8_lossy(tag.message).into_owned(),
+ });
+
+ tags.push(Tag {
+ commit,
+ name: String::from_utf8_lossy(tag.name).into_owned(),
+ annotation,
+ });
+ }
+ urso::RefKind::Branch { name, head } => {
+ let name = String::from_utf8_lossy(name).into_owned();
+
+ let commit = match render_commit(urso, mailmap, head) {
+ Ok(commit) => commit,
+ Err(e) => {
+ err = Some(e);
+ return false;
+ }
+ };
+
+ branches.push(Branch { commit, name });
+ }
+ urso::RefKind::PlainTag { name, head } => {
+ let name = String::from_utf8_lossy(name).into_owned();
+
+ let commit = match maybe_render_commit(urso, mailmap, head) {
+ Ok(Some(commit)) => commit,
+ Ok(None) => {
+ tracing::warn!(
+ repo = tracing::field::debug(urso.git_dir()),
+ name,
+ "skipped plain tag: non-commit object"
+ );
+ return true;
+ }
+ Err(e) => {
+ err = Some(e);
+ return false;
+ }
+ };
+
+ tags.push(Tag {
+ commit,
+ name,
+ annotation: None,
+ });
+ }
+ };
+ true
+ })?;
+
+ if let Some(err) = err {
+ Err(err)
+ } else {
+ tags.sort_unstable_by_key(|a| std::cmp::Reverse(a.time()));
+ branches.sort_unstable_by_key(|a| std::cmp::Reverse(a.commit.author.time));
+ Ok((tags, branches))
+ }
+}
+
+#[derive(Debug)]
+pub(crate) enum MatchError<'a> {
+ BranchNotFound(&'a str),
+ TagNotFound(&'a str),
+}
+
+#[derive(Debug, Clone, serde::Serialize)]
+pub(crate) struct CommitInfo {
+ pub id: HexId,
+ pub author: Signature,
+ pub message: Message,
+}
+
+impl std::hash::Hash for CommitInfo {
+ fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+ self.id.hash(state);
+ // self.author.hash(state);
+ }
+}
+
+impl PartialEq for CommitInfo {
+ fn eq(&self, other: &Self) -> bool {
+ self.id.eq(&other.id)
+ }
+}
+
+impl Eq for CommitInfo {}
+
+impl PartialOrd for CommitInfo {
+ fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
+ Some(self.cmp(other))
+ }
+}
+
+impl Ord for CommitInfo {
+ fn cmp(&self, other: &Self) -> std::cmp::Ordering {
+ self.author
+ .time
+ .cmp(&other.author.time)
+ .then_with(|| self.id.cmp(&other.id))
+ }
+}
+
+#[derive(Debug, Clone, serde::Serialize)]
+struct Tip {
+ id: HexId,
+ author_name: String,
+ message: Message,
+ author_time_relative: RelativeDateTime,
+ log_url: String,
+ url: String,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, serde::Serialize)]
+pub(crate) struct Signature {
+ name: String,
+
+ time: DateTime,
+ time_relative: RelativeDateTime,
+
+ email: String,
+ email_is_url: bool, // eurgh
+}
+
+impl std::hash::Hash for Signature {
+ fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+ self.name.hash(state);
+ self.time.hash(state);
+ self.email.hash(state);
+ }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, std::hash::Hash, serde::Serialize)]
+pub(crate) struct Message {
+ pub title: String,
+ pub body: String,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, std::hash::Hash, serde::Serialize)]
+pub(crate) struct Branch {
+ pub commit: CommitInfo,
+ pub name: String,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, std::hash::Hash, serde::Serialize)]
+pub struct Annotation {
+ pub author: Signature,
+ pub message: String,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize)]
+pub(crate) struct Tag {
+ pub commit: CommitInfo,
+ pub name: String,
+ pub annotation: Option<Annotation>,
+}
+
+impl Tag {
+ // most recent known time
+ fn time(&self) -> DateTime {
+ self.annotation
+ .as_ref()
+ .map_or(self.commit.author.time, |a| a.author.time)
+ }
+}
+
+impl std::hash::Hash for Tag {
+ fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+ self.name.hash(state);
+ self.commit.hash(state);
+ }
+}
+
+fn convert(t: urso::Time) -> DateTime {
+ let offset = if t.sign == urso::TimeSign::Plus {
+ chrono::FixedOffset::east_opt(t.offset.abs())
+ } else {
+ chrono::FixedOffset::west_opt(t.offset.abs())
+ };
+ offset
+ .and_then(|o| o.timestamp_opt(t.seconds, 0).earliest())
+ .unwrap_or_else(|| {
+ tracing::error!(time=?t, "unable to convert gix::Time to chrono DateTime");
+ DateTime::default()
+ })
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Pages<'a> {
+ files: String,
+ history: String,
+ refs: String,
+ // FIXME metadata.links
+ links: &'a [Link],
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Info<'a> {
+ name: &'a str,
+ description: Option<&'a str>,
+ url: &'a str,
+ clone_url: &'a str,
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Summary<'a> {
+ repo: Info<'a>,
+ pages: Pages<'a>,
+ head: &'a Branch,
+ readme: Option<&'a Readme>,
+ activity: &'a [FeedEntry],
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Commit<'a> {
+ repo: Info<'a>,
+ commit: CommitInfo,
+ events: Vec<DiffEvent>,
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct DiffEvent {
+ path: String,
+ old_path: Option<String>, // FIXME only set for kind=renamed
+ kind: DiffEventKind,
+ diff: Diff,
+ current_url: Option<String>,
+ previous_url: Option<String>,
+}
+
+#[derive(Debug, serde::Serialize)]
+#[serde(tag = "kind", content = "value")]
+enum Diff {
+ Unified(Vec<Chunk>),
+ NoChange,
+ TooLarge,
+ Binary,
+ Image,
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) enum DiffEventKind {
+ Created,
+ Deleted,
+ Modified,
+ Renamed,
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Blob<'a> {
+ repo: Info<'a>,
+ nav: Breadcrumbs,
+ kind: ContentKind,
+ content: String,
+ raw_url: String,
+ num_lines: usize,
+ tip: Tip,
+ path: String,
+}
+
+#[derive(Debug, PartialEq, Eq, serde::Serialize)]
+pub(crate) enum ContentKind {
+ Text,
+ Image,
+ Rendered,
+ TooLarge,
+ Other,
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Tree<'a> {
+ repo: Info<'a>,
+ // context
+ nav: Breadcrumbs,
+ path: String,
+ entries: Vec<Entry>,
+ parent_url: Option<String>,
+ readme: Option<Readme>,
+ tip: Tip,
+}
+
+#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, serde::Serialize)]
+pub(crate) enum EntryKind {
+ Dir,
+ File,
+ Symlink,
+ Submodule,
+}
+
+#[derive(Debug, serde::Serialize)]
+struct Entry {
+ name: String,
+ kind: EntryKind,
+ url: String,
+}
+
+// maybe a sin
+impl std::ops::Deref for Repository {
+ type Target = RepoState;
+
+ fn deref(&self) -> &Self::Target {
+ &self.state
+ }
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Listing<'a> {
+ title: &'a str,
+ header_html: &'a str,
+ num_pinned: usize,
+ num_archived: usize,
+ repos: Vec<ListEntry<'a>>,
+}
+
+#[derive(Debug, serde::Serialize)]
+struct ListEntry<'a> {
+ name: &'a str,
+ description: Option<&'a str>,
+ state: State,
+ idle: DateTime,
+ idle_relative: RelativeDateTime,
+}
+
+// DateTime that becomes a relative time string
+// when serialized. One-way conversion.
+// XXX One problem I see with this is that copies
+// of the same instant will serialize at slightly
+// different moments, so rendering multiple copies
+// can yield: "now", "12 ms ago", "31 ms ago", ...
+// for the exact same contained DateTime
+// ...OK i guess
+#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) struct RelativeDateTime(DateTime);
+
+impl From<DateTime> for RelativeDateTime {
+ fn from(value: DateTime) -> Self {
+ Self(value)
+ }
+}
+
+impl serde::Serialize for RelativeDateTime {
+ fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
+ where
+ S: serde::Serializer,
+ {
+ let relative = format!("{}", chrono_humanize::HumanTime::from(self.0));
+ serializer.serialize_str(&relative)
+ }
+}
+
+pub(crate) struct Repos {
+ title: String,
+ header_html: String,
+ feed_base_url: String,
+ inner: Vec<Repository>,
+ feed: Vec<GlobalFeedEntry>,
+ feed_size: Option<NonZeroUsize>,
+}
+
+impl Repos {
+ pub fn new(config: &GlobalConfig, inner: Vec<Repository>) -> Self {
+ let mut repos = Self {
+ feed_base_url: config.feed_base_url(),
+ inner,
+ feed: Vec::new(),
+ feed_size: config.feed_size,
+ title: config.site.listing_title.clone(),
+ header_html: config.site.listing_html_header.clone(),
+ };
+ repos.build_global_feed();
+ repos
+ }
+
+ pub fn listing(&self) -> Listing<'_> {
+ // can be kept in memory eh
+ let mut listing = Listing {
+ num_pinned: 0,
+ num_archived: 0,
+ repos: Vec::with_capacity(self.inner.len()),
+ title: &self.title,
+ header_html: &self.header_html,
+ };
+ for r in self.inner.iter() {
+ match r.snapshot.metadata.state {
+ State::Archived => listing.num_archived += 1,
+ State::Default => {}
+ State::Pinned => listing.num_pinned += 1,
+ };
+ listing.repos.push(ListEntry {
+ name: &r.name,
+ description: r.snapshot.metadata.description.as_deref(),
+ state: r.snapshot.metadata.state,
+ idle: r.idle(),
+ idle_relative: r.idle().into(),
+ });
+ }
+ listing
+ .repos
+ .sort_unstable_by_key(|k| std::cmp::Reverse(k.idle));
+ listing
+ }
+
+ pub fn global_feed(&self) -> GlobalFeed<'_> {
+ GlobalFeed {
+ updated: self
+ .feed
+ .first()
+ .map(|e| e.entry.time())
+ .unwrap_or_default(),
+ baseurl: &self.feed_base_url,
+ entries: &self.feed,
+ }
+ }
+
+ // XXX could become add_or_replace() easily
+ pub fn update(&mut self, state: Arc<RepoState>) -> bool {
+ if let Some(found) = self.inner.iter_mut().find(|r| r.name == state.name) {
+ found.state = state;
+ true
+ } else {
+ false
+ }
+ }
+
+ fn build_global_feed(&mut self) {
+ let Some(k) = self.feed_size else {
+ return;
+ };
+ let mut topk = TopK::new(k);
+ for repo in self.inner.iter() {
+ for entry in repo.feed.iter() {
+ if !topk.insert(GlobalFeedEntry {
+ repo: repo.name.clone(),
+ entry: entry.clone(),
+ }) {
+ break;
+ }
+ }
+ }
+ self.feed = topk.finish();
+ }
+
+ pub fn split_path<'a>(&self, path: &'a str) -> Option<(&Repository, &'a str)> {
+ util::split_first_prefix(path, &self.inner[..], |r| &r.name)
+ }
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Log<'a> {
+ repo: Info<'a>,
+ path: String,
+ nav: Breadcrumbs,
+ entries: Vec<feed::CommitActivity>,
+ next_url: Option<String>,
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct Refs<'a> {
+ repo: Info<'a>,
+ branches: Vec<BranchRef>,
+ tags: Vec<TagRef>,
+}
+
+#[derive(Debug, serde::Serialize)]
+struct BranchRef {
+ name: String,
+ browse_url: String,
+ log_url: String,
+ time: DateTime,
+ time_relative: RelativeDateTime,
+}
+
+#[derive(Debug, serde::Serialize)]
+struct TagRef {
+ name: String,
+ browse_url: String,
+ log_url: String,
+ annotation: Option<String>,
+ time: DateTime,
+ time_relative: RelativeDateTime,
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct RepoFeed<'a> {
+ repo: Info<'a>,
+ updated: DateTime,
+ baseurl: &'a str,
+ entries: &'a [FeedEntry],
+}
+
+#[derive(Debug, serde::Serialize)]
+pub(crate) struct GlobalFeed<'a> {
+ updated: DateTime,
+ baseurl: &'a str,
+ entries: &'a [GlobalFeedEntry],
+}
+
+#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, std::hash::Hash)]
+pub(crate) struct Context {
+ head: HexId,
+ kind: ContextKind,
+}
+
+#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, std::hash::Hash)]
+enum ContextKind {
+ Branch(String),
+ Tag(String),
+ Commit,
+}
+
+impl Context {
+ pub fn head(&self) -> ObjectId {
+ self.head.id
+ }
+
+ pub fn from_hex(hex: &str) -> Option<Self> {
+ urso::ObjectId::from_hex(hex.as_bytes())
+ .ok()
+ .map(Into::into)
+ .map(Self::from_id)
+ }
+
+ pub fn from_id(head: HexId) -> Self {
+ Self {
+ head,
+ kind: ContextKind::Commit,
+ }
+ }
+
+ fn format_url(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ // XXX must never end with a slash
+ match self.kind {
+ ContextKind::Branch(ref name) => {
+ write!(f, "/branch/{name}")
+ }
+ ContextKind::Tag(ref name) => write!(f, "/tag/{name}"),
+ ContextKind::Commit => write!(f, "/{}", self.head),
+ }
+ }
+}
+
+impl std::fmt::Display for Context {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ self.format_url(f)
+ }
+}
+
+#[derive(Debug, serde::Serialize)]
+struct Breadcrumbs {
+ head: PlainContext,
+ // points at the start of the path,
+ // when components and tail are empty
+ head_url: String,
+
+ kind: CrumbKind,
+
+ // components
+ components: Vec<Component>,
+
+ tail: Option<String>,
+}
+
+#[derive(Debug, serde::Serialize)]
+struct Component {
+ value: String,
+ url: String,
+}
+
+#[derive(Debug, serde::Serialize)]
+#[serde(tag = "kind", content = "value")]
+enum PlainContext {
+ Commit(HexId),
+ Branch(String),
+ Tag(String),
+}
+
+impl From<Context> for PlainContext {
+ fn from(value: Context) -> Self {
+ match value.kind {
+ ContextKind::Branch(name) => PlainContext::Branch(name),
+ ContextKind::Tag(name) => PlainContext::Tag(name),
+ ContextKind::Commit => PlainContext::Commit(value.head),
+ }
+ }
+}
+
+#[derive(Debug, serde::Serialize)]
+enum CrumbKind {
+ Tree,
+ Log,
+ Blob,
+}
+
+fn render_commit(urso: &Urso, mailmap: &Mailmap, id: ObjectId) -> urso::Result<CommitInfo> {
+ let commit = urso.find_commit(id)?;
+ map_commit(&commit, mailmap)
+}
+
+// tags usually point at commits, but may point at
+// any kind of object (ex: junio-gpg-pub @ git.git)
+// for simplicity, I simply ignore non-commit tags
+fn maybe_render_commit(
+ urso: &Urso,
+ mailmap: &Mailmap,
+ id: ObjectId,
+) -> urso::Result<Option<CommitInfo>> {
+ match urso.find_commit(id) {
+ Ok(commit) => map_commit(&commit, mailmap).map(Some),
+ Err(err) => {
+ tracing::trace!(?err, ?id, "not a commit");
+ Ok(None)
+ }
+ }
+}
+
+fn map_commit(commit: &urso::Commit<'_>, mailmap: &Mailmap) -> urso::Result<CommitInfo> {
+ let id = commit.id;
+ let commit = commit
+ .decode()
+ .map_err(|_discarded| urso::Error::Decode(id))?;
+
+ // commits have an optional encoding tag. when missing, utf8 is assumed
+ // XXX decoder
+ if commit.encoding.is_some() {
+ tracing::debug!(
+ "commit {} encoded with {:?}, decoded as utf-8",
+ id,
+ String::from_utf8_lossy(commit.encoding.unwrap())
+ );
+ }
+
+ let author = map_signature(commit.author(), mailmap);
+
+ let msg = commit.message();
+ let body = msg
+ .body
+ .map(|b| String::from_utf8_lossy(b).into_owned())
+ .unwrap_or_default();
+ let message = Message {
+ title: String::from_utf8_lossy(msg.title).trim_end().to_string(),
+ body,
+ };
+
+ Ok(CommitInfo {
+ id: id.into(),
+ author,
+ message,
+ })
+}
+
+fn map_signature(sig: urso::SignatureRef<'_>, mailmap: &Mailmap) -> Signature {
+ let mut mapped_name = None;
+ let mut mapped_email = None;
+
+ if let Some(mapped) = mailmap.try_resolve_ref(sig) {
+ mapped_name = mapped.name;
+ mapped_email = mapped.email;
+ }
+
+ let email = String::from_utf8_lossy(mapped_email.unwrap_or(sig.email)).into_owned();
+ let name = String::from_utf8_lossy(mapped_name.unwrap_or(sig.name)).into_owned();
+
+ let mut email_is_url = false;
+ if let Ok(valid_url) = url::Url::parse(&email) {
+ if matches!(valid_url.scheme(), "http" | "https") {
+ email_is_url = true;
+ } else {
+ tracing::trace!(email, "email url but no likey: {}", valid_url.scheme());
+ }
+ }
+
+ let time = convert(sig.time);
+ Signature {
+ name,
+ email,
+ time_relative: time.into(),
+ time,
+ email_is_url,
+ }
+}
+
+fn diff_from_patch(patch: Patch, mime: &str) -> Diff {
+ match patch {
+ Patch::Unified(diff) => Diff::Unified(diff.chunks.into_iter().map(Into::into).collect()),
+ Patch::InputTooLarge => Diff::TooLarge,
+ Patch::BinaryData => {
+ if mime.starts_with("image/") {
+ Diff::Image
+ } else {
+ Diff::Binary
+ }
+ }
+ Patch::NoChange => Diff::NoChange,
+ }
+}
+
+#[derive(Debug, serde::Serialize)]
+struct Chunk {
+ before_start: u32,
+ before_len: usize,
+ after_start: u32,
+ after_len: usize,
+ lines: Vec<Line>,
+}
+
+impl From<urso::diff::Chunk> for Chunk {
+ fn from(value: urso::diff::Chunk) -> Self {
+ Self {
+ before_start: value.before_pos.start + 1,
+ before_len: value.before_pos.len(),
+ after_start: value.after_pos.start + 1,
+ after_len: value.after_pos.len(),
+ lines: value.lines.into_iter().map(|x| x.into()).collect(),
+ }
+ }
+}
+
+#[derive(Debug, serde::Serialize)]
+struct Line {
+ kind: LineKind,
+ sign: char,
+ value: String,
+}
+
+#[derive(Debug, serde::Serialize)]
+enum LineKind {
+ Plus,
+ Minus,
+ Ctx,
+}
+
+impl From<urso::diff::Line> for Line {
+ fn from(value: urso::diff::Line) -> Self {
+ match value {
+ urso::diff::Line::Addition(line) => Self {
+ kind: LineKind::Plus,
+ sign: '+',
+ value: line,
+ },
+ urso::diff::Line::Removal(line) => Self {
+ kind: LineKind::Minus,
+ sign: '-',
+ value: line,
+ },
+ urso::diff::Line::Context(line) => Self {
+ kind: LineKind::Ctx,
+ sign: ' ',
+ value: line,
+ },
+ }
+ }
+}
+
+#[derive(Debug)]
+struct ReadmeCandidate {
+ id: ObjectId,
+ name: String,
+}
Created caca/src/repo/util.rs
+// find the first candidate whose name matches the
+// prefix of the input. yield the rest
+pub(crate) fn split_first_prefix<'c, 'input, T, F>(
+ input: &'input str,
+ candidates: &'c [T],
+ get_name: F,
+) -> Option<(&'c T, &'input str)>
+where
+ F: Fn(&'c T) -> &'c str,
+{
+ for r in candidates {
+ let name = get_name(r);
+ if let Some(rest) = input.strip_prefix(name) {
+ // it's only a match if there are no more chars
+ // or if it starts with the separator ('/')
+ if rest.is_empty() {
+ return Some((r, rest));
+ }
+ if let Some(rest) = rest.strip_prefix('/') {
+ return Some((r, rest));
+ }
+ // XXX Could return None here if I require length ordering
+ // But if I get to a point where this matters, there
+ // are better approaches than scanning
+ }
+ }
+ None
+}
+
+// visit every component of a (unix)path-like str
+// XXX expects sane paths as via client::handler::validate_path
+pub(crate) fn breadcrumbs<'a, F>(input: &'a str, mut visitor: F)
+where
+ F: FnMut(Crumb<'a>),
+{
+ debug_assert!(!input.starts_with('/'), "path must be relative");
+ if input.is_empty() {
+ return;
+ }
+
+    // strip at most one trailing slash
+    let input = input.strip_suffix('/').unwrap_or(input);
+
+ debug_assert!(!input.is_empty());
+
+ let mut last = 0;
+ for (idx, _pat) in input.match_indices('/') {
+ debug_assert_ne!(0, idx);
+ let name = &input[last..idx];
+ let path = &input[0..idx];
+
+ visitor(Crumb::Part { name, path });
+ last = idx + 1;
+ }
+
+ let name = &input[last..];
+ debug_assert!(!name.is_empty());
+ visitor(Crumb::End { name });
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub(crate) enum Crumb<'a> {
+ Part { name: &'a str, path: &'a str },
+ End { name: &'a str },
+}
+
+#[cfg(test)]
+mod tests {
+
+ use super::*;
+
+ #[test]
+ fn match_ref_works() {
+ // just a simple wrapper over split_first_prefix
+ // match_ref actually came first, which is why
+ // this whole thing looks funny
+ fn match_ref<'i, 'r>(
+ refs: &'r [&'r str],
+ input: &'i str,
+ ) -> Option<(&'r &'r str, &'i str)> {
+ split_first_prefix(input, refs, |r| r)
+ }
+
+ let refs = ["main", "main2", "bob/bugfix", "alice/feature"];
+
+ // can match itself
+ for r in refs.iter() {
+ let (found, rest) =
+ match_ref(&refs[..], r).expect("should be able to match on exact name");
+ assert_eq!(r, found, "should find itself");
+ assert!(rest.is_empty());
+ }
+
+ // splits correctly
+ let cases = [
+ ("main/", "main", ""),
+ ("bob/bugfix/src/hue.rs", "bob/bugfix", "src/hue.rs"),
+ // longest wins, assuming precondition
+ ("main2/README", "main2", "README"),
+ // only one slash is gone after splitting
+ ("main//", "main", "/"),
+ ];
+
+ for (input, wanted_ref, wanted_rest) in cases.iter() {
+ let (found, rest) = match_ref(&refs[..], input).expect("matches something");
+ assert_eq!(wanted_ref, found, "bad ref for input: {}", input);
+ assert_eq!(*wanted_rest, rest, "bad rest for input: {}", input);
+ }
+
+ // doesn't accept partial matches nor junk
+ let junk = [
+ "/main",
+ "master",
+ "unknown/thing",
+ "",
+ " alice/feature",
+ "bob/bugfixx",
+ ];
+ for input in junk.iter() {
+ assert!(
+ match_ref(&refs[..], input).is_none(),
+ "must not think junk is gold"
+ );
+ }
+ }
+
+ fn as_crumbs(input: &str) -> Vec<Crumb<'_>> {
+ let mut out = Vec::new();
+ breadcrumbs(input, |crumb| out.push(crumb));
+ out
+ }
+
+ #[test]
+ fn crumbs() {
+ assert!(as_crumbs("").is_empty());
+ assert_eq!(vec![Crumb::End { name: "a" }], as_crumbs("a"));
+ assert_eq!(
+ vec![
+ Crumb::Part {
+ name: "a",
+ path: "a"
+ },
+ Crumb::End { name: "b" },
+ ],
+ as_crumbs("a/b")
+ );
+ assert_eq!(
+ vec![
+ Crumb::Part {
+ name: "a",
+ path: "a"
+ },
+ Crumb::Part {
+ name: "b",
+ path: "a/b"
+ },
+ Crumb::End { name: "c" },
+ ],
+ as_crumbs("a/b/c")
+ );
+ }
+}
Created caca/src/view.rs
+use std::path::Path;
+
+use axum::{
+ http::{header::CONTENT_TYPE, HeaderName, HeaderValue, StatusCode},
+ response::IntoResponse,
+};
+use minijinja::Environment;
+
+use crate::repo::{Blob, Commit, GlobalFeed, Listing, Log, Refs, RepoFeed, Summary, Tree};
+
+#[derive(Debug, Clone)]
+#[allow(dead_code)]
+pub(crate) enum Theme {
+ Static,
+ Dir(String),
+ AutoReload(String),
+}
+
+impl Theme {
+ pub(crate) fn dir(&self) -> std::io::Result<Option<std::path::PathBuf>> {
+ match self {
+ Theme::Static => Ok(None),
+ Theme::Dir(n) | Theme::AutoReload(n) => {
+ Some(Path::new(n).to_path_buf().canonicalize()).transpose()
+ }
+ }
+ }
+
+ pub(crate) fn watch_files(&self) -> bool {
+ matches!(self, Theme::AutoReload(_))
+ }
+
+ pub(crate) fn env(&self) -> Result<minijinja::Environment<'static>, minijinja::Error> {
+ match self {
+ Theme::Static => static_env(),
+ Theme::Dir(n) | Theme::AutoReload(n) => dir_env(n),
+ }
+ }
+}
+
+pub(crate) fn static_env() -> Result<Environment<'static>, minijinja::Error> {
+ let mut env = Environment::new();
+
+ // XXX any non-silly way of making this easier?
+ env.add_template("base.html", include_str!("../theme/base.html"))?;
+ env.add_template("repo.html", include_str!("../theme/repo.html"))?;
+ env.add_template("macros.html", include_str!("../theme/macros.html"))?;
+
+ env.add_template("index.html", include_str!("../theme/index.html"))?;
+ env.add_template("summary.html", include_str!("../theme/summary.html"))?;
+ env.add_template("tree.html", include_str!("../theme/tree.html"))?;
+ env.add_template("blob.html", include_str!("../theme/blob.html"))?;
+ env.add_template("log.html", include_str!("../theme/log.html"))?;
+ env.add_template("www.html", include_str!("../theme/www.html"))?;
+ env.add_template("refs.html", include_str!("../theme/refs.html"))?;
+ env.add_template("commit.html", include_str!("../theme/commit.html"))?;
+ env.add_template("404.html", include_str!("../theme/404.html"))?;
+ env.add_template("500.html", include_str!("../theme/500.html"))?;
+
+ env.add_template("atom.xml.html", include_str!("../theme/atom.xml.html"))?;
+ env.add_template(
+ "global_atom.xml.html",
+ include_str!("../theme/global_atom.xml.html"),
+ )?;
+
+ check_env(&env)?;
+
+ Ok(env)
+}
+
+pub(crate) fn dir_env(dir: &str) -> Result<Environment<'static>, minijinja::Error> {
+ let mut env = Environment::new();
+ env.set_loader(minijinja::path_loader(dir));
+ check_env(&env)?;
+ Ok(env)
+}
+
+fn check_env(env: &Environment<'_>) -> Result<(), minijinja::Error> {
+ for kind in Kind::VALUES {
+ env.get_template(kind.path())?;
+ }
+
+ Ok(())
+}
+
+#[derive(Debug, Clone)]
+pub(crate) struct View {
+ kind: Kind,
+ data: minijinja::Value,
+}
+
+impl View {
+ pub(crate) fn tree(data: Tree<'_>) -> Self {
+ Self {
+ kind: Kind::Tree,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn commit(data: Commit<'_>) -> Self {
+ Self {
+ kind: Kind::Commit,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn blob(data: Blob<'_>) -> Self {
+ Self {
+ kind: Kind::Blob,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn summary(data: Summary<'_>) -> Self {
+ Self {
+ kind: Kind::Summary,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn index(data: Listing<'_>) -> Self {
+ Self {
+ kind: Kind::Index,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn feed(data: RepoFeed<'_>) -> Self {
+ Self {
+ kind: Kind::Feed,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn global_feed(data: GlobalFeed<'_>) -> Self {
+ Self {
+ kind: Kind::GlobalFeed,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn refs(data: Refs<'_>) -> Self {
+ Self {
+ kind: Kind::Refs,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+
+ pub(crate) fn log(data: Log<'_>) -> Self {
+ Self {
+ kind: Kind::Log,
+ data: minijinja::Value::from_serializable(&data),
+ }
+ }
+}
+
+#[derive(Debug, Clone)]
+enum Kind {
+ Index,
+ Summary,
+ Tree,
+ Blob,
+ Log,
+ Www,
+ Commit,
+ Refs,
+ NotFound,
+ Error,
+ Feed,
+ GlobalFeed,
+}
+
+impl Kind {
+ const VALUES: [Kind; 12] = [
+ Kind::Index,
+ Kind::Summary,
+ Kind::Tree,
+ Kind::Blob,
+ Kind::Log,
+ Kind::Www,
+ Kind::Commit,
+ Kind::Refs,
+ Kind::NotFound,
+ Kind::Error,
+ Kind::Feed,
+ Kind::GlobalFeed,
+ ];
+
+ const fn path(&self) -> &'static str {
+ match self {
+ Kind::Index => "index.html",
+ Kind::Summary => "summary.html",
+ Kind::Tree => "tree.html",
+ Kind::Blob => "blob.html",
+ Kind::Log => "log.html",
+ Kind::Www => "www.html",
+ Kind::Commit => "commit.html",
+ Kind::Refs => "refs.html",
+ Kind::NotFound => "404.html",
+ Kind::Error => "500.html",
+ Kind::Feed => "atom.xml.html",
+ Kind::GlobalFeed => "global_atom.xml.html",
+ }
+ }
+
+ const fn headers(&self) -> [(HeaderName, HeaderValue); 1] {
+ if matches!(self, Kind::Feed | Kind::GlobalFeed) {
+ [(
+ CONTENT_TYPE,
+ HeaderValue::from_static("application/atom+xml"),
+ )]
+ } else {
+ [(CONTENT_TYPE, HeaderValue::from_static("text/html"))]
+ }
+ }
+}
+
+pub(crate) fn render(env: &Environment<'_>, view: View) -> axum::response::Response {
+ let Ok(tmpl) = env.get_template(view.kind.path()) else {
+ return (
+ StatusCode::INTERNAL_SERVER_ERROR,
+ format!("template not found: {}", view.kind.path()),
+ )
+ .into_response();
+ };
+ match tmpl.render(view.data) {
+ Ok(rendered) => (view.kind.headers(), rendered).into_response(),
+ Err(err) => (
+ StatusCode::INTERNAL_SERVER_ERROR,
+ format!("rendering template: {err:?}"),
+ )
+ .into_response(),
+ }
+}
+
+pub(crate) fn render_markdown_template(env: &Environment<'_>, data: Vec<u8>) -> String {
+ // split frontmatter
+ let (frontmatter, content) = split_frontmatter(&data);
+
+ let mut front = std::collections::HashMap::<String, String>::new();
+ if let Some(matter) = frontmatter {
+ if let Err(err) = urso::config::parse(matter.data, |section, _sub, key, value| -> bool {
+ if section == "page" {
+ front.insert(key.to_string(), String::from_utf8_lossy(value).into_owned());
+ } else {
+ tracing::warn!(
+ section,
+ key,
+ value = String::from_utf8_lossy(value).as_ref(),
+ "unknown frontmatter section"
+ );
+ }
+ true
+ }) {
+ tracing::warn!(?err, "discarded broken frontmatter");
+ };
+ }
+
+ // XXX could let markdown use the frontmatter too, but i don't need
+ // it now
+ let content = render_markdown(content);
+
+ let Ok(tmpl) = env.get_template(Kind::Www.path()) else {
+ // shouldn't happen: startup tests known kinds
+ tracing::error!("missing www template. rendering blank");
+ return String::default();
+ };
+
+ match tmpl.render(minijinja::context! {
+ page => front,
+ content => content,
+ }) {
+ Ok(rendered) => rendered,
+ Err(err) => {
+ tracing::error!(?err, "error rendering user page");
+ String::default()
+ }
+ }
+}
+
+#[derive(Debug, PartialEq, Eq)]
+struct Payload<'a> {
+ data: &'a [u8],
+ kind: PayloadKind,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+enum PayloadKind {
+ Plus,
+ Minus,
+}
+
+fn split_frontmatter(data: &[u8]) -> (Option<Payload<'_>>, &[u8]) {
+ let (needle, kind) = if data.starts_with(b"+++\n") {
+ (b"\n+++\n", PayloadKind::Plus)
+ } else if data.starts_with(b"---\n") {
+ (b"\n---\n", PayloadKind::Minus)
+ } else {
+ return (None, data);
+ };
+
+ let width = needle.len();
+    // start at the newline instead of immediately
+    // after it so that ---\n---\n is valid
+ let rest = &data[3..];
+
+ if let Some(pos) = rest.windows(width).position(|haystack| haystack == needle) {
+ // pos == 0 only happens with a dummy zero-length
+ // frontmatter.
+ // otherwise the offset is 1 to skip the newline
+ // character from the starting needle
+ let data = if pos == 0 { &[] } else { &rest[1..pos] };
+ (Some(Payload { data, kind }), &rest[(pos + width)..])
+ } else {
+ (None, data)
+ }
+}
+
+pub(crate) fn render_markdown(data: &[u8]) -> String {
+ let mut opts = markdown::Options::gfm();
+ opts.compile.allow_dangerous_html = true;
+ // XXX can i make use of frontmatter smh
+ // opts.parse.constructs.frontmatter = true;
+ markdown::to_html_with_options(
+ // FIXME assuming markdown files are always utf8 encoded
+ &String::from_utf8_lossy(data),
+ &opts,
+ )
+ .unwrap_or_default()
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn frontmatter_splitting() {
+ // nothing to be found
+ {
+ let (head, tail) = split_frontmatter(b"");
+ assert_eq!(head, None);
+ assert_eq!(tail, b"");
+
+ let (head, tail) = split_frontmatter(b"+++");
+ assert_eq!(head, None);
+ assert_eq!(tail, b"+++");
+
+ let (head, tail) = split_frontmatter(b"+++\n++");
+ assert_eq!(head, None);
+ assert_eq!(tail, b"+++\n++");
+
+ let (head, tail) = split_frontmatter(b"+++\n+++");
+ assert_eq!(head, None);
+ assert_eq!(tail, b"+++\n+++");
+
+ let (head, tail) = split_frontmatter(b"+++\n +++\n");
+ assert_eq!(head, None);
+ assert_eq!(tail, b"+++\n +++\n");
+ }
+
+ let (head, tail) = split_frontmatter(b"+++\n+++\n");
+ assert_eq!(
+ head,
+ Some(Payload {
+ data: b"",
+ kind: PayloadKind::Plus
+ })
+ );
+ assert_eq!(tail, b"");
+
+ let (head, tail) = split_frontmatter(b"---\nCACA\n---\nREST");
+ assert_eq!(
+ head,
+ Some(Payload {
+ data: b"CACA",
+ kind: PayloadKind::Minus
+ })
+ );
+ assert_eq!(tail, b"REST");
+
+ let (head, tail) = split_frontmatter(b"+++\nCACA\n+++\nREST+++\n");
+ assert_eq!(
+ head,
+ Some(Payload {
+ data: b"CACA",
+ kind: PayloadKind::Plus
+ })
+ );
+ assert_eq!(tail, b"REST+++\n");
+ }
+}
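The fence search in `split_frontmatter` above starts at the delimiter's trailing newline (rather than just past it) so that a zero-length `---\n---\n` block still matches. As a rough standalone sketch of that idea, here is a hypothetical, simplified `str`-based variant; it is not the function from this commit, which works on bytes and also reports which fence kind was used:

```rust
// Hypothetical simplified sketch of the frontmatter-splitting idea:
// a document may open with `+++\n` (TOML-style) or `---\n` (YAML-style),
// and the closing fence is searched from the opening fence's trailing
// newline so an empty `---\n---\n` frontmatter is still recognized.
fn split_front(data: &str) -> (Option<&str>, &str) {
    let fence = if data.starts_with("+++\n") {
        "\n+++\n"
    } else if data.starts_with("---\n") {
        "\n---\n"
    } else {
        return (None, data);
    };
    // skip the three fence characters but keep the newline
    let rest = &data[3..];
    match rest.find(fence) {
        // empty frontmatter: the closing fence starts immediately
        Some(0) => (Some(""), &rest[fence.len()..]),
        // otherwise skip the leading newline kept above
        Some(pos) => (Some(&rest[1..pos]), &rest[pos + fence.len()..]),
        // no closing fence: hand the whole input back untouched
        None => (None, data),
    }
}

fn main() {
    assert_eq!(
        split_front("---\ntitle: x\n---\nbody"),
        (Some("title: x"), "body")
    );
    assert_eq!(split_front("no fence"), (None, "no fence"));
}
```

The byte-level version above follows the same offset bookkeeping, which is why `pos == 0` is the only case that needs special handling.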
Created caca/theme/404.html
Created caca/theme/500.html
Created caca/theme/atom.xml.html
+{%- from "macros.html" import feed_entry -%}
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom">
+ <title>{{ repo.name }} activity feed</title>
+ {% if repo.description %}
+ <subtitle>{{ repo.description }}</subtitle>
+ {% endif %}
+ <updated>{{ updated }}</updated>
+ <link href="{{ baseurl|safe }}" />
+ <id>{{ baseurl|safe }}</id>
+
+ {% for e in entries %}
+ {{ feed_entry(e) }}
+ {% endfor %}
+
+</feed>
Created caca/theme/base.html
+<!DOCTYPE html>
+<html lang="en">
+
+<head>
+ <meta charset="utf-8">
+ <title>{% block title %}caio's code asylum{% endblock title %}</title>
+ <link rel="icon"
+ href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🍄</text></svg>">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <meta name="generator" content="caca - https://caio.co/de/caca">
+ <style>
+ * {
+ box-sizing: border-box;
+ }
+
+ /* derived from https://ultimatemotherfuckingwebsite.com/ bugs and ugly are mine */
+ body {
+ font: 1.2rem/1.5 sans-serif;
+ max-width: 88ch;
+ padding: 2.5rem 1.75rem;
+ word-wrap: break-word;
+ tab-size: 4;
+ }
+
+ h1,
+ h2,
+ h3,
+ h4,
+ h5,
+ h6 {
+ line-height: 1.2;
+ font-family: sans-serif;
+ }
+
+ h1 {
+ font-size: 2.45rem;
+ }
+
+ a:focus {
+ outline: 0.2rem solid;
+ outline-offset: 0.4rem;
+ }
+
+ p {
+ margin-bottom: 2rem;
+ }
+
+ section,
+ aside,
+ footer {
+ margin: 2.5rem auto;
+ }
+
+ ul,
+ ol {
+ margin-top: 0;
+ margin-bottom: 2.25rem;
+ }
+
+ summary {
+ list-style-type: none;
+ }
+
+ /* loudsigh */
+ summary::-webkit-details-marker {
+ display: none;
+ }
+
+ img {
+ max-width: 100%;
+ }
+
+ .idle {
+ margin-left: 0.4rem;
+ display: block;
+ }
+
+ .listing {
+ column-count: 3;
+ column-width: 26ch;
+ }
+
+ .listing>article {
+ margin-left: 1rem;
+ overflow: hidden;
+ }
+
+ nav>ol {
+ list-style-type: none;
+ padding-left: 0;
+ display: inline;
+ }
+
+ nav>ol>li {
+ display: inline;
+ }
+
+ nav>li>a {
+ display: block;
+ }
+
+ nav>ol>li>a::after {
+ content: '/';
+ }
+
+ a.nodec,
+ a.nodec:visited {
+ text-decoration: none;
+ color: #000;
+ }
+
+ .chunk {
+ margin-bottom: 1rem;
+ /* how can i stop fighting with this shit */
+ line-height: 1.4;
+ }
+
+ .unified {
+ border-spacing: 0;
+ border-collapse: separate;
+ box-sizing: border-box;
+ font: 1.2rem/1.275 monospace;
+ vertical-align: top;
+
+ td {
+ padding: 0;
+ }
+ }
+
+ .mb1 {
+ margin-bottom: 1rem;
+ }
+
+
+ .wrap {
+ white-space: pre-wrap;
+ }
+
+ .text {
+ font-family: sans-serif;
+ }
+
+ .code {
+ font-family: monospace;
+ }
+
+ .nowrap {
+ white-space: nowrap;
+ }
+
+ .pre {
+ white-space: pre;
+ }
+
+ .inline {
+ display: inline;
+ }
+
+ .minus {
+ background-color: #E7CCCF;
+ }
+
+ .plus {
+ background-color: #D8E7CE;
+ }
+
+ blockquote {
+ border-left: 2px lightgray solid;
+ padding-left: 0.5rem;
+ }
+
+ article#tip {
+ border-left: 2px lightgray solid;
+
+ details {
+ padding-left: 0.5rem;
+ }
+ }
+
+ @media(max-width: 86ch) {
+ #readme {
+ pre {
+ overflow: auto;
+ }
+ }
+ }
+
+ .line-numbers {
+ display: block;
+ text-align: right;
+ font-family: monospace;
+ max-width: 4rem;
+ margin-right: 0.5rem;
+ border-right: 2px lightgray solid;
+ padding-right: 0.5rem;
+ }
+
+ .blob-content>pre {
+ font-family: monospace;
+ margin: 0;
+ }
+ </style>
+</head>
+
+<body>
+ <header>
+ {% block header %}
+ {% endblock header %}
+ </header>
+ <main>
+ {% block main %}
+ {% endblock main %}
+ </main>
+</body>
+
+
+</html>
Created caca/theme/blob.html
+{% from "macros.html" import tip_details %}
+
+{% extends "repo.html" %}
+
+{% block title %}
+{{ path }} - {{ repo.name }}
+{% endblock title %}
+
+{% block main %}
+<br />
+{{ tip_details(tip, path) }}
+
+<h4>Blob {{path}}</h4>
+{% if kind == "Rendered" %}
+<p>
+ Showing rendered content. <a href="{{ raw_url }}" title="Download source code for {{ path }}">Download source code</a>
+</p>
+<hr />
+<div id="readme">
+ {{ content | safe }}
+</div>
+{% elif kind == "Image" %}
+<img src="{{ raw_url }}">
+{% elif kind == "Other" %}
+<p>Unable to display. <a href="{{ raw_url }}" title="Download {{ path }}">Download</a></p>
+{% elif kind == "TooLarge" %}
+<p>File too large. <a href="{{ raw_url }}" title="Download {{ path }}">Download</a></p>
+{% elif num_lines > 0 %}
+<table>
+ <tbody>
+ <tr>
+ <td class="line-numbers">
+ {% for num in range(1, num_lines + 1) %}
+ <a id="L{{num}}" href="#L{{ num }}">{{ num }}</a><br />
+ {% endfor %}
+ </td>
+ <td class="blob-content">
+ <pre>{{- content -}}</pre>
+ </td>
+ </tr>
+ </tbody>
+</table>
+{% else %}
+<pre>no content</pre>
+{% endif %}
+
+{% endblock main %}
Created caca/theme/commit.html
+{% extends "repo.html" %}
+
+{% block title %}
+{{ commit.message.title }} - {{ commit.id[:10] }} - {{ repo.name }}
+{% endblock title %}
+
+{% block main %}
+<section>
+ <h2>{{ commit.message.title }}</h2>
+ {% if commit.message.body %}
+ <pre class="wrap text">{{ commit.message.body|safe }}</pre>
+ {% endif %}
+</section>
+
+<section>
+ <dl>
+ <dt>Id</dt>
+ <dd>{{ commit.id }}</dd>
+ <dt>Author</dt>
+ {% if commit.author.email_is_url %}
+ <dd><a href="{{ commit.author.email }}">{{ commit.author.name }}</a></dd>
+ {% else %}
+ <dd>{{ commit.author.name }}</dd>
+ {% endif %}
+ <dt>Commit time</dt>
+ <dd>{{ commit.author.time }}</dd>
+ </dl>
+</section>
+
+<section>
+ {% for event in events %}
+ <details class="mb1" open>
+ <summary>
+ {% if event.kind == "Renamed" %}
+ <h4 class="inline">Renamed {{ event.old_path }} to {{ event.path }}</h4>
+ {% else %}
+ <h4 class="inline">{{ event.kind }} {{ event.path }}</h4>
+ {% endif %}
+ {%- if event.diff.kind == "Unified" -%}
+ <nav class="inline">
+ <small>
+ {% if event.previous_url %}
+ <a href="{{ event.previous_url }}" title="view previous version of the file">old</a>
+ {% endif %}
+ {% if event.current_url %}
+ <a href="{{ event.current_url }}" title="view current version of the file">new</a>
+ {% endif %}
+ </small>
+ </nav>
+ {%- endif -%}
+ </summary>
+
+ {%- if event.diff.kind == "Unified" -%}
+ <section>
+ {% for chunk in event.diff.value %}
+ <article class="code chunk">
+ <span class="pre">@@ -{{ chunk.before_start }},{{ chunk.before_len }} +{{ chunk.after_start }},{{
+ chunk.after_len }}</span>
+ <br />
+ {%- for line in chunk.lines -%}
+ <span class="pre {{ line.kind |lower }}">{{ line.sign }}{{ line.value }}</span>
+ <br />
+ {%- endfor -%}
+ </article>
+ {% endfor %}
+ </section>
+
+ {%- elif event.diff.kind == "NoChange" -%}
+ <pre>No visible change</pre>
+
+ {%- elif event.diff.kind == "TooLarge" -%}
+ <pre>File too large</pre>
+
+ {%- elif event.diff.kind == "Binary" -%}
+ <pre>Binary data</pre>
+
+ {%- elif event.diff.kind == "Image" -%}
+ <div>
+ {% if event.previous_url %}
+ <figure>
+ {% if event.current_url %}
+ <figcaption>old</figcaption>
+ {% endif %}
+ <img src="{{ event.previous_url }}" alt="previous version">
+ </figure>
+ {% endif %}
+ {% if event.current_url %}
+ <figure>
+ {% if event.previous_url %}
+ <figcaption>new</figcaption>
+ {% endif %}
+ <img src="{{ event.current_url }}" alt="current version">
+ </figure>
+ {% endif %}
+ </div>
+ {%- endif -%}
+ </details>
+ {% endfor %}
+</section>
+
+{% endblock main %}
Created caca/theme/global_atom.xml.html
+{%- from "macros.html" import feed_entry -%}
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom">
+ <title>{{ baseurl }} activity feed</title>
+ <updated>{{ updated }}</updated>
+ <link href="{{ baseurl|safe }}/" />
+ <id>{{ baseurl|safe }}</id>
+
+ {% for e in entries %}
+ {{ feed_entry(e) }}
+ {% endfor %}
+
+</feed>
Created caca/theme/index.html
+{% from "macros.html" import repo_listing %}
+
+{% extends "base.html" %}
+
+{% block title %}
+{{ title }}
+{% endblock title %}
+
+{% block header %}
+{{ header_html | safe }}
+{% endblock header %}
+
+{% block main %}
+<section>
+ <h2 id="repositories">repositories</h2>
+
+ {% for r in repos if r.state == "Pinned" %}
+ {{ repo_listing(r) }}
+ {% endfor %}
+ {% for r in repos if r.state == "Default" %}
+ {{ repo_listing(r) }}
+ {% endfor %}
+ {% if num_archived > 0 %}
+ <details>
+ <summary>archive ({{ num_archived }})</summary>
+ {% for r in repos if r.state == "Archived" %}
+ {{ repo_listing(r) }}
+ {% endfor %}
+ </details>
+ {% endif %}
+</section>
+{% endblock main %}
Created caca/theme/log.html
+{% extends "repo.html" %}
+
+{% block title %}
+Log{% if path %} for {{ path }}{% endif %} - {{ repo.name }}
+{% endblock title %}
+
+{% block main %}
+{% if path %}
+<h4>Log for {{path}}</h4>
+{% else %}
+<h4>Log</h4>
+{% endif %}
+<ul>
+ {% for e in entries %}
+ <li>
+ <details>
+ <summary>
+ <a href="{{ e.url }}" title="View details for commit {{ e.id }}">{{ e.title }}</a>
+ {% if e.body %}💬{% endif %}
+ <small>
+ by {%- if e.author.email_is_url -%}
+ <a class="nodec" href="{{ e.author.email }}">{{ e.author.name }}</a>
+ {%- else -%}
+ {{ e.author.name }}
+ {%- endif -%}
+ {{ e.author.time_relative }}
+ </small>
+ </summary>
+ {% if e.body %}
+ <pre class="wrap text">{{- e.body -}}</pre>
+ {% endif %}
+ </details>
+ </li>
+ {% endfor %}
+</ul>
+{% if next_url %}
+<nav>
+ <center><a href="{{next_url}}">Older commits</a></center>
+ <br />
+</nav>
+{% endif %}
+
+{% endblock main %}
Created caca/theme/macros.html
+{%- macro tip_details(tip, path="") -%}
+<article id="tip" title="most recent commit{%if path %} for {{ path }}{% endif %}">
+ <details>
+ <summary>
+ <a href="{{ tip.url }}" title="View details for commit {{ tip.id }}">{{
+ tip.message.title }}</a>
+ {% if tip.message.body %}💬{% endif %}
+ <small>
+ by {{ tip.author_name }}
+ {{ tip.author_time_relative }}
+ (<a href="{{ tip.log_url }}" title="View log{%if path %} for {{ path }}{% endif %}">log</a>)
+ </small>
+ </summary>
+ {% if tip.message.body %}
+ <pre class="text wrap">{{- tip.message.body -}}</pre>
+ {% endif %}
+ </details>
+</article>
+{%- endmacro -%}
+
+{%- macro repo_listing(r) -%}
+<article>
+ <p id="name"><b><a href="{{ r.name }}/">{{ r.name }}</a></b>
+ {% if r.description %}
+ <span class="description"><em>{{ r.description }}</em></span>
+ {% endif %}
+ <span class="idle">
+ <small>updated: <time datetime="{{ r.idle }}">{{ r.idle_relative }}</time></small>
+ </span>
+ </p>
+</article>
+{%- endmacro -%}
+
+{%- macro feed_entry(e) -%}
+<entry>
+
+ {% if e.kind == "Tag" %}
+
+ {% if e.repo %}
+ <title>{{ e.repo }}: Tag "{{ e.tag_name }}" created</title>
+ {% else %}
+ <title>Tag "{{ e.tag_name }}" created</title>
+ {% endif %}
+ <link href="{{baseurl | safe }}{{ e.browse_url | safe }}" />
+ <id>{{baseurl | safe }}{{ e.browse_url | safe }}</id>
+
+ {% if e.tagger %}
+ <published>{{ e.tagger.time }}</published>
+ <updated>{{ e.tagger.time }}</updated>
+ {% else %}
+ <published>{{ e.commit.author.time }}</published>
+ <updated>{{ e.commit.author.time }}</updated>
+ {% endif %}
+ {% if e.annotation %}
+ <summary>{{ e.annotation }}</summary>
+ {% else %}
+ <summary>Commit {{ e.commit.id }}: {{ e.commit.title }}</summary>
+ {% endif %}
+ <author>
+ {% if e.tagger %}
+ <name>{{ e.tagger.name }}</name>
+ {% else %}
+ <name>{{ e.commit.author.name }}</name>
+ {% endif %}
+ </author>
+
+ {% elif e.kind == "Branch" %}
+
+ {% if e.repo %}
+ <title>{{ e.repo }}: {{ e.commit.title }}
+ {%- if not e.is_default_branch %} (branch {{ e.branch_name }}){%- endif -%}
+ </title>
+ {% else %}
+ <title>{{ e.commit.title }}
+ {%- if not e.is_default_branch %} (branch {{ e.branch_name }}){%- endif -%}
+ </title>
+ {% endif %}
+ <link href="{{baseurl | safe }}{{ e.commit.url | safe }}" />
+ <id>{{ baseurl | safe }}{{ e.commit.url | safe }}</id>
+ <published>{{ e.commit.author.time }}</published>
+ <updated>{{ e.commit.author.time }}</updated>
+ <summary>{{ e.commit.body }}
+
+ {{- baseurl | safe }}{{ e.browse_url }}
+ </summary>
+ <author>
+ <name>{{ e.commit.author.name }}</name>
+ </author>
+
+ {% endif %} {# e.kind #}
+
+</entry>
+{%- endmacro -%}
Created caca/theme/refs.html
+{% extends "repo.html" %}
+
+{% block main %}
+<section id="branches">
+ <h3>branches</h3>
+ {% for r in branches %}
+ <article>
+ <h4 id="branch-{{r.name}}"><a class="nodec" href="#branch-{{r.name}}">{{ r.name }}</a></h4>
+ <ul>
+ <li><a href="{{ r.browse_url }}">files</a></li>
+ <li><a href="{{ r.log_url }}">log</a></li>
+ </ul>
+ </article>
+ {% endfor %}
+</section>
+
+{% if tags %}
+<section id="tags">
+ <h3>tags</h3>
+ {% for r in tags %}
+ <article>
+ <h4 id="tag-{{r.name}}"><a class="nodec" href="#tag-{{r.name}}">{{ r.name }}</a></h4>
+ {% if r.annotation %}
+ <blockquote class="text wrap">
+ {{- r.annotation -}}
+ </blockquote>
+ {% endif %}
+ <ul>
+ <li><a href="{{ r.browse_url }}">files</a></li>
+ <li><a href="{{ r.log_url }}">log</a></li>
+ </ul>
+ </article>
+ {% endfor %}
+</section>
+{% endif %}
+
+{% endblock main %}
Created caca/theme/repo.html
+{% extends "base.html" %}
+
+{% block title %}{{ repo.name }}{% if repo.description %} - {{ repo.description }}{% endif %} {% endblock title %}
+
+{% block header %}
+<h3><a class="nodec" href="{{ repo.url }}">{{ repo.name }}</a></h3>
+{% if nav %}
+<nav>
+ <ol>
+
+ {# head node: not a link when tail is empty,
+ i.e. when browsing the root of the repo
+ the current context is not clickable #}
+
+ {% if nav.head.kind == "Commit" %}
+
+ {% if nav.tail %}
+ <li><a href="{{ nav.head_url }}">commit/{{ nav.head.value[:10] }}</a></li>
+ {% else %}
+ <li>commit/{{ nav.head.value[:10] }}</li>
+ {% endif %}
+
+ {% elif nav.head.kind == "Tag" %}
+
+ {% if nav.tail %}
+ <li><a href="{{ nav.head_url }}">tag/{{ nav.head.value }}</a></li>
+ {% else %}
+ <li>tag/{{ nav.head.value }}</li>
+ {% endif %}
+
+ {% else %} {# if head.kind = Branch #}
+
+ {% if nav.tail %}
+ <li><a href="{{ nav.head_url }}">branch/{{ nav.head.value }}</a></li>
+ {% else %}
+ <li>branch/{{ nav.head.value }}</li>
+ {% endif %}
+
+ {% endif %}
+
+ {% for comp in nav.components %}
+ <li><a href="{{ comp.url }}">{{ comp.value }}</a></li>
+ {% endfor %}
+
+ {% if nav.tail %}
+ <li>{{ nav.tail }}</li>
+ {% endif %}
+ </ol>
+</nav>
+{% endif %}
+{% endblock header %}
+
+{% block main %}
+{% endblock main %}
Created caca/theme/summary.html
+{% extends "repo.html" %}
+
+{% block title %}
+{{ repo.name }}{% if repo.description %}: {{repo.description}}{% endif %}
+{% endblock title %}
+
+{% block main %}
+<nav>
+ <ul>
+ <li><a href="{{ pages.files }}">Files</a></li>
+ <li><a href="{{ pages.history }}">Log</a></li>
+ <li><a href="{{ pages.refs }}">Refs</a></li>
+ {% for link in pages.links %}
+ <li><a href="{{ link.href }}" {% if link.title %}title="{{ link.title }}"{% endif %}>{{ link.name }}</a></li>
+ {% endfor %}
+ </ul>
+ <pre class="wrap code">git clone {{ repo.clone_url }}</pre>
+</nav>
+
+<h2>Activity</h2>
+
+{# FIXME the laziness of using nbsp and whitespace trimming has become counterproductive #}
+<ul>
+ {% for a in activity %}
+ {% if a.kind == "Branch" %}
+ <li>
+ <details>
+ <summary>
+ <a href="{{ a.commit.url }}" title="View details for commit {{ a.commit.id }}">{{ a.commit.title }}</a>
+ {% if a.commit.body %}💬{% endif %}
+ <small>
+ by {%- if a.commit.author.email_is_url -%}
+ <a class="nodec" href="{{ a.commit.author.email }}">{{ a.commit.author.name }}</a>
+ {%- else -%}
+ {{ a.commit.author.name }}
+ {%- endif -%}
+ {%- if not a.is_default_branch -%}
+ on branch <a href="{{ a.browse_url }}">{{ a.branch_name }}</a>
+ {%- endif -%}
+ {{ a.commit.author.time_relative }}
+ </small>
+ </summary>
+ {% if a.commit.body %}
+ <pre class="wrap text">{{- a.commit.body -}}</pre>
+ {% endif %}
+ </details>
+ </li>
+ {% elif a.kind == "Tag" %}
+ <li>
+ Tag <a href="{{ a.browse_url }}" title="Browse files on tag {{ a.tag_name }}">{{ a.tag_name }}</a> created
+ <small>
+ by {%- if a.tagger and a.tagger.email_is_url -%}
+ <a class="nodec" href="{{ a.tagger.email }}">{{ a.tagger.name }}</a>
+ {%- elif a.tagger -%}
+ {{ a.tagger.name }}
+ {%- elif a.commit.author.email_is_url -%}
+ <a class="nodec" href="{{ a.commit.author.email }}">{{ a.commit.author.name }}</a>
+ {%- else -%}
+ {{ a.commit.author.name }}
+ {%- endif -%}
+ on commit <a href="{{ a.commit.url }}" title="{{ a.commit.title }}">{{ a.commit.id[:10] }}</a>
+ {%- if a.tagger -%} {{ a.tagger.time_relative }}{%- else -%} {{ a.commit.author.time_relative }}
+ {%- endif -%}
+ </small>
+ </li>
+ {% endif %}
+ {% endfor %}
+</ul>
+
+{% if readme %}
+<hr />
+<div id="readme">
+ {% if readme.mime == "text/plain" %}
+ <pre>{{ readme.content }}</pre>
+ {% else %}
+ {{ readme.content | safe }}
+ {% endif %}
+</div>
+{% endif %}
+
+{% endblock main %}
Created caca/theme/tree.html
+{% from "macros.html" import tip_details %}
+
+{% extends "repo.html" %}
+
+{% block title %}
+Files{% if path %} at {{ path }}{% endif %} - {{ repo.name }}
+{% endblock title %}
+
+{% block main %}
+<br />
+{{ tip_details(tip, path) }}
+
+{% if path %}
+<h4>Tree {{path}}</h4>
+{% else %}
+<h4>Tree</h4>
+{% endif %}
+<section class="listing">
+ {%- for entry in entries -%}
+ {%- if entry.kind == "Dir" -%}
+ <article>📁 <a href="{{ entry.url }}">{{ entry.name }}/</a></article>
+ {% else %}
+ <article>📄 <a href="{{ entry.url }}">{{ entry.name }}</a></article>
+ {%- endif -%}
+ {%- endfor -%}
+</section>
+
+{% if readme %}
+<hr />
+<div id="readme">
+ {% if readme.mime == "text/plain" %}
+ <pre class="code">{{ readme.content }}</pre>
+ {% else %}
+ {{ readme.content | safe }}
+ {% endif %}
+</div>
+{% endif %}
+
+{% endblock main %}
Created caca/theme/www.html
+<!DOCTYPE html>
+<html lang="en">
+
+<head>
+ <meta charset="utf-8">
+ <title>{{ page.title }}</title>
+ <link rel="icon"
+ href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🍄</text></svg>">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <meta name="generator" content="caca - https://caio.co/de/caca">
+ <style>
+ body {
+ font: 1.2rem/1.5 sans-serif;
+ max-width: 88ch;
+ padding: 2.5rem 1.75rem;
+ word-wrap: break-word;
+ tab-size: 4;
+ }
+
+ h1,
+ h2,
+ h3,
+ h4,
+ h5,
+ h6 {
+ line-height: 1.2;
+ font-family: sans-serif;
+ }
+
+ h1 {
+ font-size: 2.45rem;
+ }
+
+ a:focus {
+ outline: 0.2rem solid;
+ outline-offset: 0.4rem;
+ }
+
+ p {
+ margin-bottom: 2rem;
+ }
+
+ ul,
+ ol {
+ margin-top: 0;
+ margin-bottom: 2.25rem;
+ }
+
+ img {
+ max-width: 100%;
+ }
+
+ blockquote {
+ border-left: 2px lightgray solid;
+ padding-left: 0.5rem;
+ }
+ </style>
+</head>
+
+<body>
+ {{ content|safe }}
+</body>
+
+</html>
Created urso/Cargo.toml
+[package]
+name = "urso"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
+gix = { version = "0.58.0", default-features = false, features = ["revision", "blob-diff", "mailmap", "parallel"] }
+tracing = { workspace = true }
+mime_guess = { version = "2.0.4", default-features = false }
+infer = { version = "0.15.0", default-features = false }
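The `lines.rs` module created next stores per-line ownership as run-length hunks and locates a line number with `find_pos`, which either lands exactly on a hunk boundary (`Left`) or splits a hunk at an offset (`Mid`, the "mitosis" case). As a rough self-contained sketch of that lookup, with the hunk type simplified to a bare length (this `find_pos` is a hypothetical reimplementation, not the one below):

```rust
// Rough sketch of the run-length position lookup used by `Lines`:
// hunks are just lengths here, and a line number resolves to either
// a boundary between hunks or a split point inside one.
#[derive(Debug, PartialEq)]
enum Pos {
    Left(usize),       // boundary: insert before this hunk index
    Mid(usize, usize), // inside hunk `idx`, `offset` lines in
}

fn find_pos(lens: &[usize], lineno: usize) -> Pos {
    let mut tally = 0;
    for (i, len) in lens.iter().enumerate() {
        if tally == lineno {
            // lineno falls exactly on this hunk's first line
            return Pos::Left(i);
        }
        if tally + len > lineno {
            // lineno falls inside this hunk
            return Pos::Mid(i, lineno - tally);
        }
        tally += len;
    }
    // past the end: append position
    Pos::Left(lens.len())
}

fn main() {
    // three hunks covering lines 0..2, 2..7 and 7..10
    let lens = [2, 5, 3];
    assert_eq!(find_pos(&lens, 2), Pos::Left(1)); // exactly between hunks
    assert_eq!(find_pos(&lens, 4), Pos::Mid(1, 2)); // 2 lines into hunk 1
    assert_eq!(find_pos(&lens, 10), Pos::Left(3)); // append at the end
}
```

The real structure additionally carries an id per hunk and mutates it in place on add/remove, but the boundary-vs-split distinction is the whole trick.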
Created urso/src/blame/lines.rs
+use std::ops::Range;
+
+#[derive(Debug, PartialEq)]
+pub struct Hunk<T> {
+ pub(crate) len: usize,
+ pub(crate) id: T,
+}
+
+#[derive(Debug, PartialEq)]
+pub struct Lines<T>(Vec<Hunk<T>>);
+
+impl<T> Lines<T>
+where
+ T: Copy + std::fmt::Debug,
+{
+ pub(crate) fn new(id: T, lineno: usize) -> Self {
+ Self(vec![Hunk { len: lineno, id }])
+ }
+
+ pub(crate) fn remove(&mut self, removed: Range<u32>) {
+ let delta = removed.len();
+ debug_assert!(delta > 0);
+ match find_range(&self.0, removed) {
+ (Pos::Left(start), Pos::Left(end)) => {
+ self.0.drain(start..end);
+ }
+ (Pos::Left(start), Pos::Mid(end, offset)) => {
+ self.0[end].len -= offset;
+ if end > start {
+ self.0.drain(start..end);
+ }
+ }
+ (Pos::Mid(start, offset), Pos::Left(end)) => {
+ self.0[start].len = offset;
+ if end > start {
+ self.0.drain((start + 1)..end);
+ }
+ }
+ (Pos::Mid(start, start_offset), Pos::Mid(end, end_offset)) => {
+ if start == end {
+ self.0[start].len -= delta;
+ } else {
+ self.0[start].len = start_offset;
+ self.0[end].len -= end_offset;
+ if end > start {
+ self.0.drain((start + 1)..end);
+ }
+ }
+ }
+ }
+ }
+
+ pub(crate) fn add(&mut self, id: T, added: Range<u32>) {
+ debug_assert!(!added.is_empty());
+ match find_pos(&self.0, added.start as usize) {
+ Pos::Left(idx) => {
+ self.0.insert(
+ idx,
+ Hunk {
+ len: added.len(),
+ id,
+ },
+ );
+ }
+ Pos::Mid(idx, offset) => {
+ // splitting the node at `idx` in half and inserting
+ // a node in the middle
+ // XXX can be smarter with the inserts
+ let remainder = self.0[idx].len - offset;
+ debug_assert!(remainder > 0);
+ self.0[idx].len = offset;
+ let mid_id = self.0[idx].id;
+ self.0.insert(
+ idx + 1,
+ Hunk {
+ len: added.len(),
+ id,
+ },
+ );
+ self.0.insert(
+ idx + 2,
+ Hunk {
+ len: remainder,
+ id: mid_id,
+ },
+ );
+ }
+ };
+ }
+
+ pub(crate) fn into_inner(self) -> Vec<Hunk<T>> {
+ self.0
+ }
+}
+
+impl<T: PartialEq> From<Vec<Hunk<T>>> for Lines<T> {
+ fn from(inner: Vec<Hunk<T>>) -> Self {
+ assert!(!inner.is_empty());
+ for h in inner.iter() {
+ assert_ne!(0, h.len);
+ }
+ Self(inner)
+ }
+}
+
+impl<T: PartialEq> PartialEq<Vec<Hunk<T>>> for Lines<T> {
+ fn eq(&self, other: &Vec<Hunk<T>>) -> bool {
+ self.0.eq(other)
+ }
+}
+
+#[derive(Debug, Clone, PartialEq)]
+enum Pos {
+ Left(usize), // insert on pos
+ Mid(usize, usize), // mitosis (idx, offset)
+}
+
+fn find_pos<T>(state: &[Hunk<T>], lineno: usize) -> Pos
+where
+ T: Copy + std::fmt::Debug,
+{
+ let mut tally = 0;
+ let mut i = 0;
+ for hunk in state.iter() {
+ if tally == lineno {
+ return Pos::Left(i);
+ }
+ if (tally + hunk.len) > lineno {
+ let offset = lineno - tally;
+ debug_assert!(offset > 0);
+ return Pos::Mid(i, offset);
+ }
+ tally += hunk.len;
+ i += 1;
+ }
+ // XXX maybe errorable? this is not the right place to verify
+ // such expectations
+ debug_assert!(tally >= lineno, "trying to find lineno that doesn't exist");
+ Pos::Left(i)
+}
+
+fn find_range<T>(state: &[Hunk<T>], remove: Range<u32>) -> (Pos, Pos)
+where
+ T: Copy + std::fmt::Debug,
+{
+ // NEEDSWORK: the second pos is strictly >= the first one
+ // can do a lot better than using find_pos twice
+ (
+ find_pos(state, remove.start as usize),
+ find_pos(state, remove.end as usize),
+ )
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ #[should_panic]
+ fn find_pos_expects_valid_lineno() {
+ find_pos::<u8>(&[], 9);
+ }
+
+ #[test]
+ fn find_pos_makes_sense() {
+ assert_eq!(Pos::Left(0), find_pos::<u8>(&[], 0));
+
+ let state = vec![
+ Hunk { len: 2, id: 1 },
+ Hunk { len: 5, id: 2 },
+ Hunk { len: 3, id: 1 },
+ ];
+
+ assert_eq!(Pos::Left(0), find_pos(&state, 0));
+ assert_eq!(Pos::Left(1), find_pos(&state, 2));
+ assert_eq!(Pos::Left(2), find_pos(&state, 7));
+ assert_eq!(Pos::Left(3), find_pos(&state, 10));
+ assert_eq!(Pos::Mid(0, 1), find_pos(&state, 1));
+ assert_eq!(Pos::Mid(1, 1), find_pos(&state, 3));
+ assert_eq!(Pos::Mid(1, 2), find_pos(&state, 4));
+ assert_eq!(Pos::Mid(1, 3), find_pos(&state, 5));
+ assert_eq!(Pos::Mid(1, 4), find_pos(&state, 6));
+ assert_eq!(Pos::Mid(2, 1), find_pos(&state, 8));
+ assert_eq!(Pos::Mid(2, 2), find_pos(&state, 9));
+ }
+
+ #[test]
+ #[should_panic]
+ fn find_range_expects_valid_range() {
+ find_range::<u8>(&[], 0..255);
+ }
+
+ #[test]
+ fn find_range_base_case() {
+ assert_eq!((Pos::Left(0), Pos::Left(0)), find_range::<u8>(&[], 0..0));
+ }
+
+ #[test]
+ fn find_range_makes_sense() {
+ let state = vec![
+ Hunk { len: 2, id: 1 },
+ Hunk { len: 5, id: 2 },
+ Hunk { len: 1, id: 3 },
+ Hunk { len: 1, id: 1 },
+ Hunk { len: 4, id: 1 },
+ ];
+
+ assert_eq!((Pos::Left(0), Pos::Mid(0, 1)), find_range(&state, 0..1));
+ assert_eq!((Pos::Left(0), Pos::Left(1)), find_range(&state, 0..2));
+ assert_eq!((Pos::Left(0), Pos::Mid(1, 1)), find_range(&state, 0..3));
+ assert_eq!((Pos::Left(0), Pos::Left(5)), find_range(&state, 0..13));
+
+ assert_eq!((Pos::Mid(1, 4), Pos::Left(2)), find_range(&state, 6..7));
+ assert_eq!((Pos::Mid(1, 4), Pos::Left(3)), find_range(&state, 6..8));
+ assert_eq!((Pos::Mid(4, 2), Pos::Mid(4, 3)), find_range(&state, 11..12));
+ }
+
+ #[test]
+ fn lines_initial_state() {
+ assert_eq!(Lines::new(1u8, 100), vec![Hunk { id: 1u8, len: 100 }]);
+ }
+
+ #[test]
+ fn additions() {
+ let mut lines: Lines<_> = vec![Hunk { len: 10, id: 0 }].into();
+
+ // addition to the middle of the original hunk
+ lines.add(1, 1..3);
+
+ assert_eq!(
+ lines,
+ vec![
+ Hunk { len: 1, id: 0 },
+ Hunk { len: 2, id: 1 },
+ Hunk { len: 9, id: 0 }
+ ],
+ "should've split the original hunk"
+ );
+
+ // addition to the left of the state (e.g. new lines added to the
+ // beginning of the file)
+ lines.add(2, 0..5);
+
+ assert_eq!(
+ lines,
+ vec![
+ Hunk { len: 5, id: 2 },
+ Hunk { len: 1, id: 0 },
+ Hunk { len: 2, id: 1 },
+ Hunk { len: 9, id: 0 }
+ ],
+ );
+
+ // additions to the right of the state
+ lines.add(3, 17..21);
+
+ assert_eq!(
+ lines,
+ vec![
+ Hunk { len: 5, id: 2 },
+ Hunk { len: 1, id: 0 },
+ Hunk { len: 2, id: 1 },
+ Hunk { len: 9, id: 0 },
+ Hunk { len: 4, id: 3 }
+ ],
+ );
+ }
+
+ #[test]
+ fn lines_range_removal() {
+ let mut lines: Lines<_> = vec![
+ Hunk { len: 1, id: 0 }, // keep
+ Hunk { len: 3, id: 1 }, // len: 1
+ Hunk { len: 1, id: 0 }, // remove
+ Hunk { len: 1, id: 0 }, // remove
+ Hunk { len: 4, id: 4 }, // len: 1
+ Hunk { len: 2, id: 3 }, // keep
+ ]
+ .into();
+
+ lines.remove(2..7);
+
+ let wanted = vec![
+ Hunk { len: 1, id: 0 },
+ Hunk { len: 1, id: 1 },
+ Hunk { len: 3, id: 4 },
+ Hunk { len: 2, id: 3 },
+ ];
+
+ assert_eq!(lines, wanted);
+ }
+}
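The `annotate` function in the module created next replays pairwise diffs over this run-length structure and then merges consecutive hunks with equal ids. The end result is equivalent to the naive model sketched below, which keeps one owner id per line and compresses runs at the end (`compress` is a hypothetical helper for illustration, not part of this commit):

```rust
// Hypothetical simplified model of the blame idea: track one owner id
// per line, replay each revision's insertions/removals, then compress
// consecutive equal ids into (id, len) hunks - the same shape
// `merge_consecutive` produces over `Lines`.
fn compress(owners: &[u32]) -> Vec<(u32, usize)> {
    let mut out: Vec<(u32, usize)> = Vec::new();
    for &id in owners {
        match out.last_mut() {
            // extend the current run when the owner repeats
            Some((last, len)) if *last == id => *len += 1,
            // otherwise start a new run
            _ => out.push((id, 1)),
        }
    }
    out
}

fn main() {
    // file starts as 3 lines introduced by revision 1
    let mut owners = vec![1u32; 3];
    // revision 2 inserts 2 lines at index 1
    owners.splice(1..1, [2, 2]);
    // revision 3 deletes the line now at index 3
    owners.remove(3);
    assert_eq!(compress(&owners), vec![(1, 1), (2, 2), (1, 1)]);
}
```

The production code avoids materializing a per-line vector: it applies the diff chunks directly to the run-length hunks, which is what `apply_delta` and `merge_consecutive` below are doing.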
Created urso/src/blame/mod.rs
+use std::borrow::Cow;
+
+use gix::diff::blob::{intern::TokenSource, sources::byte_lines};
+
+use crate::diff::{diff, Line, UnifiedDiff};
+
+mod lines;
+pub use lines::{Hunk, Lines};
+
+// Receives a sequence of object ids and yields a sequence
+// of ranges to object-id such that every line of the final
+// object (the last one in the sequence) is annotated with
+// the object id that introduced such line.
+pub fn annotate<T, R, E>(ids: &[T], repo: R) -> Result<Annotated<T>, R::Error>
+where
+ T: Copy + std::fmt::Debug + PartialEq,
+ R: Repo<T, Error = E>,
+{
+ assert!(!ids.is_empty(), "needs at least one id");
+
+ // Shortcut: Single version, just need to know
+ // how many lines it has to describe each line
+ if ids.len() == 1 {
+ let mut buf = Vec::new();
+ let id = ids[0];
+ repo.load(&id, &mut buf)?;
+
+ let lines = byte_lines(&buf);
+ let mut content = Vec::with_capacity(lines.estimate_tokens() as usize);
+ for line in lines {
+ content.push(String::from_utf8_lossy(line).into());
+ }
+
+ return Ok(Annotated {
+ annotations: vec![Annotation {
+ lines: 0..(content.len() as u32),
+ id,
+ }],
+ content,
+ });
+ }
+ assert!(ids.len() > 1);
+
+ let mut before_buf = Vec::new();
+ let mut after_buf = Vec::new();
+
+ // diff the versions pairwise and use it to reconstruct
+ // the final state of the blob.
+ let mut last_lineno;
+ let mut last_id;
+ let mut iter = ids[..].windows(2);
+ // First pair, setup the state
+ let mut state = {
+ let Some([prev, cur]) = iter.next() else {
+ unreachable!("guaranteed to have at least 2")
+ };
+ repo.load(prev, &mut before_buf)?;
+ let before = repo.decode_text(&before_buf)?;
+ repo.load(cur, &mut after_buf)?;
+ let after = repo.decode_text(&after_buf)?;
+
+ let delta = diff(&before, &after);
+ last_lineno = delta.after_lineno;
+ last_id = *cur;
+ let mut s = lines::Lines::new(*prev, delta.before_lineno);
+ apply_delta(*cur, &mut s, delta);
+ s
+ };
+ // Now apply the rest
+ while let Some([prev, cur]) = iter.next() {
+ before_buf.clear();
+ after_buf.clear();
+
+ repo.load(prev, &mut before_buf)?;
+ let before = repo.decode_text(&before_buf)?;
+ repo.load(cur, &mut after_buf)?;
+ let after = repo.decode_text(&after_buf)?;
+
+ let delta = diff(&before, &after);
+
+ last_lineno = delta.after_lineno;
+ last_id = *cur;
+ apply_delta(*cur, &mut state, delta);
+ }
+
+ let mut state = state.into_inner();
+ merge_consecutive(&mut state);
+
+ let mut lineno = 0u32;
+ let mut annotations = Vec::new();
+ for hunk in state {
+ let next = lineno + (hunk.len as u32);
+ let lines = lineno..next;
+ annotations.push(Annotation { lines, id: hunk.id });
+ lineno = next;
+ }
+ assert_eq!(lineno as usize, last_lineno);
+
+ let mut content = Vec::with_capacity(last_lineno);
+ let mut buf_b = Vec::new();
+ repo.load(&last_id, &mut buf_b)?;
+ let lines = byte_lines(&buf_b);
+ for line in lines {
+ content.push(String::from_utf8_lossy(line).into());
+ }
+
+ Ok(Annotated {
+ annotations,
+ content,
+ })
+}
+
+// Since the same version may appear multiple times (say: a commit getting
+// reverted), it's possible that `out` now contains consecutive blocks
+// where the ids are the same
+//
+// Merge them so that:
+//
+// [A{0..1}, A{1..2}, B{2..10}]
+//
+// becomes:
+//
+// [A{0..2}, B{2..10}]
+fn merge_consecutive<T>(out: &mut Vec<Hunk<T>>)
+where
+ T: PartialEq + Copy + std::fmt::Debug,
+{
+ if out.is_empty() {
+ return;
+ }
+
+ let mut look_at = 0;
+ let mut write_at = 0;
+ let mut total_merged = 0;
+
+ while look_at < out.len() - 1 {
+ let mut j = look_at + 1;
+ while j < out.len() {
+ // merge nodes sequentially while their ids are the same
+ if out[look_at].id == out[j].id {
+ out[write_at].len += out[j].len;
+ j += 1;
+ total_merged += 1;
+ } else {
+ break;
+ }
+ }
+
+ write_at += 1;
+ look_at = j;
+
+ // a merge happened, so make sure the write_at head
+ // looks exactly like the look_at one
+ if write_at != look_at && look_at < out.len() {
+ out[write_at].len = out[look_at].len;
+ out[write_at].id = out[look_at].id;
+ }
+ }
+
+ if total_merged > 0 {
+ out.truncate(out.len() - total_merged);
+ }
+}
+
+#[derive(Debug, Clone, PartialEq)]
+pub struct Annotation<T> {
+ pub lines: std::ops::Range<u32>,
+ pub id: T,
+}
+
+#[derive(Debug, Clone)]
+pub struct Annotated<T> {
+ pub content: Vec<String>,
+ pub annotations: Vec<Annotation<T>>,
+}
+
+pub trait Repo<T> {
+ type Error: std::error::Error;
+
+ fn load(&self, id: &T, buf: &mut Vec<u8>) -> Result<(), Self::Error>;
+
+ fn decode_text<'a>(&self, data: &'a [u8]) -> std::result::Result<Cow<'a, str>, Self::Error>;
+}
+
+fn apply_delta<T>(id: T, state: &mut lines::Lines<T>, patch: UnifiedDiff)
+where
+ T: std::fmt::Debug + Copy,
+{
+ // NEEDSWORK: looking back at this after some time away will hurt
+ // could do with some abstraction to make it nicer,
+ // or record all the necessary work during the diffing
+ // so that there's no need to spin? not sure it's worth it...
+ #[derive(Debug)]
+ enum Op {
+ Remove(std::ops::Range<u32>),
+ Add(std::ops::Range<u32>),
+ }
+
+ let mut offset: isize = 0;
+ for chunk in patch.chunks {
+ let mut op = None;
+ let mut at = (chunk.before_pos.start as isize + offset) as u32;
+ assert!(chunk.before_pos.start as isize + offset >= 0);
+ for line in chunk.lines.iter() {
+ match line {
+ Line::Addition(_) => {
+ offset += 1;
+ match op {
+ Some(Op::Remove(range)) => {
+ state.remove(range);
+ op = Some(Op::Add(at..at + 1));
+ }
+ Some(Op::Add(mut range)) => {
+ range.end += 1;
+ op = Some(Op::Add(range));
+ }
+ None => {
+ op = Some(Op::Add(at..at + 1));
+ }
+ };
+ }
+ Line::Removal(_) => {
+ offset -= 1;
+ match op {
+ Some(Op::Remove(mut range)) => {
+ range.end += 1;
+ op = Some(Op::Remove(range));
+ }
+ Some(Op::Add(range)) => {
+ at += range.len() as u32;
+ state.add(id, range);
+ op = Some(Op::Remove(at..at + 1));
+ }
+ None => {
+ op = Some(Op::Remove(at..at + 1));
+ }
+ };
+ }
+ Line::Context(_) => {
+ match op {
+ Some(Op::Add(range)) => {
+ at += range.len() as u32;
+ state.add(id, range);
+ }
+ Some(Op::Remove(range)) => {
+ state.remove(range);
+ }
+ None => {}
+ };
+ at += 1;
+ op = None;
+ }
+ };
+ }
+ match op {
+ Some(Op::Remove(range)) => {
+ state.remove(range);
+ }
+ Some(Op::Add(range)) => {
+ state.add(id, range);
+ }
+ None => {}
+ };
+ }
+}
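The running `offset` above maps positions from "before" coordinates into the buffer as it is being rewritten. The same bookkeeping, sketched on its own with a hypothetical `Edit` type (not the real chunk structures):

```rust
// Sketch of the coordinate bookkeeping in `apply_delta`: edits carry
// positions in "before" coordinates, and a running `offset` translates
// them into indices of the evolving buffer.
// `Edit` is a hypothetical stand-in for the real chunk data.
enum Edit {
    // (before position, number of lines to drop)
    Remove(usize, usize),
    // (before position, lines to insert)
    Add(usize, Vec<String>),
}

// assumes `edits` is sorted by before position, like diff chunks are
fn apply(mut lines: Vec<String>, edits: &[Edit]) -> Vec<String> {
    let mut offset: isize = 0;
    for edit in edits {
        match edit {
            Edit::Remove(at, n) => {
                let at = (*at as isize + offset) as usize;
                lines.drain(at..at + n);
                offset -= *n as isize;
            }
            Edit::Add(at, new) => {
                let at = (*at as isize + offset) as usize;
                for (i, line) in new.iter().enumerate() {
                    lines.insert(at + i, line.clone());
                }
                offset += new.len() as isize;
            }
        }
    }
    lines
}

fn main() {
    let before = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let after = apply(
        before,
        &[Edit::Remove(1, 1), Edit::Add(3, vec!["d".to_string()])],
    );
    assert_eq!(
        after,
        vec!["a".to_string(), "c".to_string(), "d".to_string()]
    );
}
```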
+
+#[cfg(test)]
+mod tests {
+
+ use super::*;
+ use std::{borrow::Cow, collections::HashMap};
+
+ impl Repo<u8> for &HashMap<u8, Vec<u8>> {
+ type Error = std::convert::Infallible;
+
+ fn load(&self, id: &u8, buf: &mut Vec<u8>) -> Result<(), Self::Error> {
+ // there's nothing interesting to test for the error case
+ buf.extend_from_slice(&self[id]);
+ Ok(())
+ }
+
+ fn decode_text<'a>(
+ &self,
+ data: &'a [u8],
+ ) -> std::result::Result<Cow<'a, str>, Self::Error> {
+ Ok(String::from_utf8_lossy(data))
+ }
+ }
+
+ #[test]
+ #[should_panic]
+ fn panics_with_empty_id_set() {
+ annotate(&[], &HashMap::new()).unwrap();
+ }
+
+ #[test]
+ fn handles_single_id() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nb\nc"[..]));
+ let res = annotate(&[1], &repo).unwrap().annotations;
+
+ assert_eq!(1, res.len(), "should have a single entry");
+ assert_eq!(
+ Annotation {
+ lines: 0..3,
+ id: 1u8
+ },
+ res[0],
+ "should map all 3 lines to a single id"
+ );
+ }
+
+ #[test]
+ fn remove_from_right() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nb\nc"[..]));
+ repo.insert(2u8, Vec::from(&b"a"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![Annotation {
+ lines: 0..1,
+ id: 1u8
+ }],
+ res,
+ );
+ }
+
+ #[test]
+ fn remove_from_left() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nb\nc\nd"[..]));
+ repo.insert(2u8, Vec::from(&b"c\nd"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![Annotation {
+ lines: 0..2,
+ id: 1u8
+ }],
+ res,
+ );
+ }
+
+ #[test]
+ fn remove_from_middle() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nb\nc\nd"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nd"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![Annotation {
+ lines: 0..2,
+ id: 1u8
+ }],
+ res,
+ );
+ }
+
+ #[test]
+ fn add_to_right() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nb"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nb\nc\nd"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![
+ Annotation {
+ lines: 0..2,
+ id: 1u8
+ },
+ Annotation {
+ lines: 2..4,
+ id: 2u8
+ }
+ ],
+ res,
+ );
+ }
+
+ #[test]
+ fn add_to_left() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"c\nd"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nb\nc\nd"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![
+ Annotation {
+ lines: 0..2,
+ id: 2u8
+ },
+ Annotation {
+ lines: 2..4,
+ id: 1u8
+ }
+ ],
+ res,
+ );
+ }
+
+ #[test]
+ fn add_to_middle_single() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nc"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nb\nc"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![
+ Annotation {
+ lines: 0..1,
+ id: 1u8
+ },
+ Annotation {
+ lines: 1..2,
+ id: 2u8
+ },
+ Annotation {
+ lines: 2..3,
+ id: 1u8
+ }
+ ],
+ res,
+ );
+ }
+
+ #[test]
+ fn add_to_middle() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nd"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nb\nc\nd"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap();
+ assert_eq!(
+ vec![
+ Annotation {
+ lines: 0..1,
+ id: 1u8
+ },
+ Annotation {
+ lines: 1..3,
+ id: 2u8
+ },
+ Annotation {
+ lines: 3..4,
+ id: 1u8
+ }
+ ],
+ res.annotations,
+ "{:?}",
+ res
+ );
+ }
+
+ #[test]
+ fn add_and_remove() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"Z\nb\nZ\nd"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nb\nc\nd\ne"[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![
+ Annotation {
+ lines: 0..1,
+ id: 2u8
+ },
+ Annotation {
+ lines: 1..2,
+ id: 1u8
+ },
+ Annotation {
+ lines: 2..3,
+ id: 2u8
+ },
+ Annotation {
+ lines: 3..4,
+ id: 1u8
+ },
+ Annotation {
+ lines: 4..5,
+ id: 2u8
+ },
+ ],
+ res,
+ );
+ }
+
+ #[test]
+ fn more_than_two_versions() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nb"[..]));
+ repo.insert(3u8, Vec::from(&b"a\nb\nc"[..]));
+ repo.insert(4u8, Vec::from(&b"a\nb\nc\nd"[..]));
+
+ let res = annotate(&[1, 2, 3, 4], &repo).unwrap();
+ assert_eq!(
+ vec![
+ Annotation {
+ lines: 0..1,
+ id: 1u8
+ },
+ Annotation {
+ lines: 1..2,
+ id: 2u8
+ },
+ Annotation {
+ lines: 2..3,
+ id: 3u8
+ },
+ Annotation {
+ lines: 3..4,
+ id: 4u8
+ },
+ ],
+ res.annotations,
+ );
+
+ assert_eq!(vec!["a", "b", "c", "d"], res.content);
+ }
+
+ #[test]
+ fn removals_do_not_leave_empty_ranges() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nb\nc"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nd\ne"[..]));
+
+ let res = annotate(&[1, 2, 1], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![Annotation {
+ lines: 0..3,
+ id: 1u8
+ },],
+ res,
+ );
+ }
+
+ #[test]
+ fn annotate_empty_file() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::new());
+
+ let res = annotate(&[1], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![Annotation {
+ lines: 0..0,
+ id: 1u8
+ }],
+ res
+ );
+ }
+
+ #[test]
+ fn odd_ranges() {
+ let mut repo = HashMap::new();
+
+ let a = b"1
+2
+3
+4
+5
+6
+7
+8
+9";
+
+ // remove 2; add a,b,c
+ // remove 6,7,8; add d
+ let b = b"1
+a
+b
+c
+3
+4
+5
+d
+9";
+ repo.insert(1u8, Vec::from(&a[..]));
+ repo.insert(2u8, Vec::from(&b[..]));
+
+ let res = annotate(&[1, 2], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![
+ Annotation {
+ lines: 0..1,
+ id: 1u8
+ },
+ Annotation {
+ lines: 1..4,
+ id: 2u8
+ },
+ Annotation {
+ lines: 4..7,
+ id: 1u8
+ },
+ Annotation {
+ lines: 7..8,
+ id: 2u8
+ },
+ Annotation {
+ lines: 8..9,
+ id: 1u8
+ },
+ ],
+ res,
+ );
+ }
+
+ #[test]
+ fn annotate_merges_consecutive_blocks() {
+ let mut repo = HashMap::new();
+ repo.insert(1u8, Vec::from(&b"a\nb\nc"[..]));
+ repo.insert(2u8, Vec::from(&b"a\nb"[..]));
+
+ let res = annotate(&[1, 2, 1], &repo).unwrap().annotations;
+ assert_eq!(
+ vec![Annotation {
+ lines: 0..3,
+ id: 1u8
+ },],
+ res,
+ );
+ }
+
+ #[test]
+ fn hunk_merging() {
+ let mut state = vec![
+ Hunk { len: 1, id: 0u8 },
+ Hunk { len: 1, id: 1u8 },
+ Hunk { len: 3, id: 1u8 },
+ Hunk { len: 1, id: 2u8 },
+ Hunk { len: 1, id: 3u8 },
+ Hunk { len: 2, id: 3u8 },
+ Hunk { len: 2, id: 3u8 },
+ ];
+
+ merge_consecutive(&mut state);
+
+ assert_eq!(
+ vec![
+ Hunk { len: 1, id: 0u8 },
+ Hunk { len: 4, id: 1u8 },
+ Hunk { len: 1, id: 2u8 },
+ Hunk { len: 5, id: 3u8 },
+ ],
+ state
+ );
+ }
+
+ #[test]
+ fn merge_doesnt_mess_tail_up() {
+ let mut x = vec![
+ Hunk { len: 1, id: 1u8 },
+ Hunk { len: 1, id: 1u8 },
+ Hunk { len: 1, id: 2u8 },
+ ];
+ merge_consecutive(&mut x);
+ assert_eq!(vec![Hunk { len: 2, id: 1u8 }, Hunk { len: 1, id: 2u8 },], x);
+ }
+}
Created urso/src/diff/mod.rs
+use std::{borrow::Cow, ops::Range};
+
+use gix::{
+ bstr::{BStr, ByteSlice},
+ diff::Rewrites,
+ object::tree::diff::for_each::Error as ForEachError,
+ object::tree::diff::{change::Event as GixEvent, Action},
+ objs::tree::EntryMode,
+ Commit, ObjectId, Tree,
+};
+
+mod sink;
+pub(crate) use sink::{diff, similarity};
+
+use crate::{
+ error::{wrap_err, WrappedError},
+ mime::{self, File},
+};
+
+pub fn diff_commits<R, E, F>(
+ repo: &R,
+ base: Commit<'_>,
+ parent: Option<Commit<'_>>,
+ mut visitor: F,
+) -> Result<(), E>
+where
+ R: Repo<Error = E>,
+ E: std::error::Error + Send + Sync + 'static,
+ F: FnMut(Event),
+{
+ let tree = base
+ .tree()
+ .map_err(|e| wrap_err(format!("reading tree from commit {}", base.id), e))?;
+ let parent_tree = {
+ if let Some(p) = parent {
+ p.tree()
+ .map_err(|e| wrap_err(format!("reading tree from commit {}", p.id), e))?
+ } else {
+ repo.empty_tree()
+ }
+ };
+
+ let foreach_res = parent_tree
+ .changes()
+ .map_err(|e| {
+ wrap_err(
+ format!(
+ "error preparing to diff tree {} vs {}",
+ tree.id, parent_tree.id
+ ),
+ e,
+ )
+ })?
+ .track_path()
+ // FIXME using gix's rename detection here, but using mine on log
+ .track_rewrites(Some(Rewrites {
+ copies: None,
+ percentage: repo.min_similarity(),
+ limit: 0,
+ }))
+ .for_each_to_obtain_tree(&tree, |change| -> Result<Action, E> {
+ match change.event {
+ GixEvent::Addition { entry_mode, id } => {
+ if let Some(event) =
+ handle_change(&repo, change.location, entry_mode, true, id)?
+ {
+ visitor(Event::Addition(event));
+ }
+ }
+ GixEvent::Deletion { entry_mode, id } => {
+ if let Some(event) =
+ handle_change(&repo, change.location, entry_mode, false, id)?
+ {
+ visitor(Event::Deletion(event));
+ }
+ }
+ GixEvent::Modification {
+ previous_entry_mode,
+ previous_id,
+ entry_mode,
+ id,
+ } => {
+                    // if neither side is a blob, handle the change as if it
+                    // were a Deletion followed by an Addition
+                    // FIXME can yield a Modification when entry modes are equal eh
+ if !(entry_mode.is_blob() || previous_entry_mode.is_blob()) {
+ if let Some(event) = handle_change(
+ repo,
+ change.location,
+ previous_entry_mode,
+ false,
+ previous_id,
+ )? {
+ visitor(Event::Deletion(event));
+ }
+ if let Some(event) =
+ handle_change(&repo, change.location, entry_mode, true, id)?
+ {
+ visitor(Event::Addition(event));
+ }
+ } else {
+ handle_modification(
+ &repo,
+ change.location,
+ previous_entry_mode,
+ previous_id,
+ entry_mode,
+ id,
+ )
+ .map(|(src_object, change)| {
+ visitor(Event::Modification {
+ src: src_object,
+ change,
+ });
+ })?;
+ }
+ }
+ GixEvent::Rewrite {
+ source_location,
+ source_entry_mode,
+ source_id,
+ diff: _,
+ entry_mode,
+ id,
+ copy: _,
+ } => {
+ handle_modification(
+ &repo,
+ change.location,
+ source_entry_mode,
+ source_id,
+ entry_mode,
+ id,
+ )
+ .map(|(src_object, change)| {
+ visitor(Event::Rename {
+ src: src_object,
+ change,
+ src_path: source_location.to_path_lossy().into_owned(),
+ });
+ })?;
+ }
+ };
+
+ Ok(Action::Continue)
+ });
+
+ match foreach_res {
+ Ok(_) => Ok(()),
+ Err(ForEachError::ForEach(erased)) => match erased.downcast::<DiffError<E>>() {
+ Ok(err) => Err(*err),
+ Err(other) => {
+                // the foreach api erases the `'static` marker,
+                // and this branch shouldn't trigger at all, so
+                // yolo it is.
+ Err(wrap_err(format!("{:?}", other), UnknownError))?
+ }
+ },
+ Err(e) => Err(wrap_err(
+ format!("changes between {} and {:?}", tree.id, parent_tree.id,),
+ e,
+ ))?,
+ }
+}
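The error handling above recovers the concrete `DiffError<E>` from a type-erased `Box<dyn Error>` via `downcast`. A minimal illustration of that pattern, with a hypothetical error type standing in for `DiffError<E>`:

```rust
use std::error::Error;
use std::fmt;

// hypothetical concrete error, standing in for `DiffError<E>`
#[derive(Debug, PartialEq)]
struct MyError(&'static str);

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "my error: {}", self.0)
    }
}

impl Error for MyError {}

// a callback-style API hands back a type-erased error; the caller
// recovers the concrete type, or learns it was something else
fn recover(erased: Box<dyn Error + Send + Sync>) -> Option<MyError> {
    erased.downcast::<MyError>().ok().map(|boxed| *boxed)
}

fn main() {
    assert_eq!(recover(Box::new(MyError("boom"))), Some(MyError("boom")));
    let other: Box<dyn Error + Send + Sync> =
        Box::new(std::io::Error::new(std::io::ErrorKind::Other, "nope"));
    assert_eq!(recover(other), None);
}
```

The fallback branch exists because `downcast` returns the erased box unchanged when the type does not match, so nothing is lost if the assumption is wrong.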
+
+#[derive(Clone)]
+pub struct UnifiedDiff {
+ pub before_lineno: usize,
+ pub after_lineno: usize,
+ pub chunks: Vec<Chunk>,
+}
+
+impl std::fmt::Debug for UnifiedDiff {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ for c in self.chunks.iter() {
+ c.format_patch(f)?;
+ }
+ Ok(())
+ }
+}
+
+impl UnifiedDiff {
+ pub fn is_empty(&self) -> bool {
+ self.chunks.is_empty()
+ }
+
+ pub fn as_patch(&self) -> std::result::Result<String, std::fmt::Error> {
+ let mut patch = String::new();
+ for chunk in self.chunks.iter() {
+ chunk.format_patch(&mut patch)?;
+ }
+ Ok(patch)
+ }
+}
+
+#[derive(Debug, Clone)]
+pub struct Chunk {
+ pub before_pos: Range<u32>,
+ pub after_pos: Range<u32>,
+ pub lines: Vec<Line>,
+}
+
+impl Chunk {
+ pub fn format_patch<F: std::fmt::Write>(&self, f: &mut F) -> std::fmt::Result {
+ writeln!(
+ f,
+ "@@ -{},{} +{},{} @@",
+ self.before_pos.start + 1,
+ self.before_pos.len(),
+ self.after_pos.start + 1,
+ self.after_pos.len(),
+ )?;
+ for line in self.lines.iter() {
+ match line {
+ Line::Addition(s) => writeln!(f, "+{}", s)?,
+ Line::Removal(s) => writeln!(f, "-{}", s)?,
+ Line::Context(s) => writeln!(f, " {}", s)?,
+ };
+ }
+ Ok(())
+ }
+}
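`format_patch` emits the conventional unified-diff hunk header; the `+ 1` converts the 0-based ranges kept internally into the 1-based line numbers patch tools expect. The header line in isolation:

```rust
use std::ops::Range;

// the `@@ -start,len +start,len @@` header: starts are 1-based in
// unified diff output, while the ranges kept internally are 0-based
fn hunk_header(before: Range<u32>, after: Range<u32>) -> String {
    format!(
        "@@ -{},{} +{},{} @@",
        before.start + 1,
        before.len(),
        after.start + 1,
        after.len()
    )
}

fn main() {
    assert_eq!(hunk_header(0..3, 0..4), "@@ -1,3 +1,4 @@");
}
```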
+
+#[derive(Clone)]
+pub enum Line {
+ Addition(String),
+ Removal(String),
+ Context(String),
+}
+
+impl std::fmt::Debug for Line {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match self {
+ Line::Addition(s) => write!(f, "\"+{}\"", s),
+ Line::Removal(s) => write!(f, "\"-{}\"", s),
+ Line::Context(s) => write!(f, "\" {}\"", s),
+ }
+ }
+}
+
+#[derive(Debug)]
+pub enum DiffError<E> {
+ Wrapped(WrappedError),
+ Repo(E),
+}
+
+impl<E> From<WrappedError> for DiffError<E> {
+ fn from(value: WrappedError) -> Self {
+ DiffError::Wrapped(value)
+ }
+}
+
+impl<E> std::fmt::Display for DiffError<E>
+where
+ E: std::fmt::Display,
+{
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match self {
+ DiffError::Wrapped(w) => w.fmt(f),
+ DiffError::Repo(w) => w.fmt(f),
+ }
+ }
+}
+
+#[derive(Clone, Copy, Debug)]
+struct UnknownError;
+
+impl std::fmt::Display for UnknownError {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ write!(f, "unknown error")
+ }
+}
+
+impl std::error::Error for UnknownError {}
+
+impl<E> std::error::Error for DiffError<E> where E: std::error::Error {}
+
+pub type Result<T, E> = std::result::Result<T, DiffError<E>>;
+
+pub trait Repo {
+ type Error: std::error::Error + Send + Sync + 'static;
+
+ fn min_similarity(&self) -> Option<f32>;
+
+ fn max_bytes(&self) -> u64;
+
+ fn empty_tree(&self) -> Tree<'_>;
+
+ fn load(&self, id: ObjectId, buf: &mut Vec<u8>) -> std::result::Result<(), Self::Error>;
+
+ fn get_header(&self, id: ObjectId) -> std::result::Result<Header, Self::Error>;
+
+ fn decode_text<'a>(&self, data: &'a [u8]) -> std::result::Result<Cow<'a, str>, Self::Error>;
+}
+
+impl<E, T> Repo for &T
+where
+ T: Repo<Error = E>,
+ E: std::error::Error + Send + Sync + 'static,
+{
+ type Error = E;
+
+ fn max_bytes(&self) -> u64 {
+ T::max_bytes(self)
+ }
+
+ fn min_similarity(&self) -> Option<f32> {
+ T::min_similarity(self)
+ }
+
+ fn empty_tree(&self) -> Tree<'_> {
+ T::empty_tree(self)
+ }
+
+ fn load(&self, id: ObjectId, buf: &mut Vec<u8>) -> std::result::Result<(), Self::Error> {
+ T::load(self, id, buf)
+ }
+
+ fn get_header(&self, id: ObjectId) -> std::result::Result<Header, Self::Error> {
+ T::get_header(self, id)
+ }
+
+ fn decode_text<'a>(&self, data: &'a [u8]) -> std::result::Result<Cow<'a, str>, Self::Error> {
+ T::decode_text(self, data)
+ }
+}
+
+fn handle_change<R, E>(
+ repo: &R,
+ filename: &BStr,
+ entry_mode: EntryMode,
+ is_addition: bool, // FIXME eurgh
+ id: gix::Id<'_>,
+) -> Result<Option<Change>, E>
+where
+ R: Repo<Error = E>,
+ E: std::error::Error + Send + Sync + 'static,
+{
+ if entry_mode.is_tree() {
+ return Ok(None);
+ }
+
+ let path = filename.to_path_lossy().into_owned();
+
+ if entry_mode.is_commit() {
+ let diff = if is_addition {
+ diff("", &format!("{}", id))
+ } else {
+ diff(&format!("{}", id), "")
+ };
+ return Ok(Some(Change {
+ file: File::plain(path, mime::TEXT),
+ object: Header {
+ id: id.detach(),
+ size: 0,
+ kind: gix::objs::Kind::Commit,
+ },
+ patch: Patch::Unified(diff),
+ }));
+ }
+
+ let (guessed, is_text) =
+ mime::guess_from_path(&path).map_or((None, false), |m| (Some(m.0), m.1));
+
+    // bail early only when the path maps to a known non-text type;
+    // when the guess is empty, sniff the content after loading instead
+ if guessed.is_some() && !is_text && !entry_mode.is_link() {
+ return Ok(Some(Change {
+ file: File::plain(path, guessed.unwrap()),
+ object: repo.get_header(id.into()).map_err(DiffError::Repo)?,
+ patch: Patch::BinaryData,
+ }));
+ }
+
+ let header = repo.get_header(id.into()).map_err(DiffError::Repo)?;
+ if header.size > repo.max_bytes() {
+ return Ok(Some(Change {
+ file: File::plain(path, guessed.unwrap_or(mime::BINARY)),
+ object: header,
+ patch: Patch::InputTooLarge,
+ }));
+ }
+
+ let mut buf = Vec::new();
+ repo.load(id.into(), &mut buf).map_err(DiffError::Repo)?;
+
+ // Use the contents to figure out the mime type if the guess
+ // was empty
+ let (mime, is_text) = if let Some(mime) = guessed {
+ (mime, is_text)
+ } else {
+ tracing::trace!(
+ path = tracing::field::debug(&path),
+ guess = tracing::field::debug(mime::guess_from_data(&buf)),
+ "change: had to sniff to guess mime"
+ );
+ mime::guess_from_data(&buf)
+ };
+
+ if !is_text {
+ return Ok(Some(Change {
+ file: File::plain(path, mime),
+ object: repo.get_header(id.into()).map_err(DiffError::Repo)?,
+ patch: Patch::BinaryData,
+ }));
+ }
+
+ let text = match repo.decode_text(&buf) {
+ Ok(decoded) => decoded,
+ Err(err) => {
+ tracing::warn!(
+ err = tracing::field::debug(err),
+ path = tracing::field::debug(&path),
+ "unable to decode"
+ );
+ return Ok(Some(Change {
+ file: File::plain(path, mime),
+ object: repo.get_header(id.into()).map_err(DiffError::Repo)?,
+ patch: Patch::BinaryData,
+ }));
+ }
+ };
+
+ let delta = {
+ if is_addition {
+ diff("", text.as_ref())
+ } else {
+ diff(text.as_ref(), "")
+ }
+ };
+
+ Ok(Some(Change {
+ file: File::plain(path, mime),
+ object: header,
+ patch: Patch::Unified(delta),
+ }))
+}
+
+fn handle_modification<R, E>(
+ repo: &R,
+ filename: &BStr,
+ previous_entry_mode: EntryMode,
+ previous_id: gix::Id<'_>,
+ entry_mode: EntryMode,
+ id: gix::Id<'_>,
+) -> Result<(Header, Change), E>
+where
+ R: Repo<Error = E>,
+ E: std::error::Error + Send + Sync + 'static,
+{
+ debug_assert!(entry_mode.is_blob() && previous_entry_mode.is_blob());
+ let path = filename.to_path_lossy().into_owned();
+ let (guessed, is_known_text) =
+ mime::guess_from_path(&path).map_or((None, false), |m| (Some(m.0), m.1));
+
+ // if the ids are the same, the only thing that
+ // happened was a mode change
+ if id == previous_id {
+ let header = repo.get_header(id.into()).map_err(DiffError::Repo)?;
+ return Ok((
+ header.clone(),
+ Change {
+ file: File::plain(path, guessed.unwrap_or(mime::BINARY)),
+ object: header,
+ patch: Patch::NoChange,
+ },
+ ));
+ }
+
+    // bail early when the guessed mime is a known non-text type;
+    // when the guess is empty, sniff the content after loading instead
+ if guessed.is_some() && !is_known_text {
+ let before = repo
+ .get_header(previous_id.into())
+ .map_err(DiffError::Repo)?;
+ let after = repo.get_header(id.into()).map_err(DiffError::Repo)?;
+ return Ok((
+ before,
+ Change {
+ file: File::plain(path, guessed.unwrap()),
+ object: after,
+ patch: Patch::BinaryData,
+ },
+ ));
+ }
+
+ let before = repo
+ .get_header(previous_id.into())
+ .map_err(DiffError::Repo)?;
+ let after = repo.get_header(id.into()).map_err(DiffError::Repo)?;
+
+ if before.size > repo.max_bytes() || after.size > repo.max_bytes() {
+ return Ok((
+ before,
+ Change {
+ file: File::plain(path, guessed.unwrap_or(mime::BINARY)),
+ object: after,
+ patch: Patch::InputTooLarge,
+ },
+ ));
+ }
+
+ let mut before_buf = Vec::with_capacity(before.size as usize);
+
+ repo.load(previous_id.detach(), &mut before_buf)
+ .map_err(DiffError::Repo)?;
+
+ // sniff the data if guessing from the filename led to nothing
+ let (mime, is_text) = if let Some(mime) = guessed {
+ (mime, true)
+ } else {
+ tracing::trace!(
+ path = tracing::field::debug(&path),
+ "had to sniff to guess mime"
+ );
+ mime::guess_from_data(&before_buf)
+ };
+
+ if !is_text {
+ return Ok((
+ before,
+ Change {
+ file: File::plain(path, mime),
+ object: after,
+ patch: Patch::BinaryData,
+ },
+ ));
+ }
+ // XXX is it worth it to check after_buf too?
+ let mut after_buf = Vec::with_capacity(after.size as usize);
+ repo.load(id.detach(), &mut after_buf)
+ .map_err(DiffError::Repo)?;
+
+    let before_text = repo.decode_text(&before_buf).map_err(DiffError::Repo)?;
+ let after_text = repo.decode_text(&after_buf).map_err(DiffError::Repo)?;
+
+ let delta = diff(&before_text, &after_text);
+
+    // the bytes differ, but a line-wise diff found no changes
+ if delta.is_empty() {
+ Ok((
+ before,
+ Change {
+ file: File::plain(path, mime),
+ object: after,
+ patch: Patch::NoChange,
+ },
+ ))
+ } else {
+ Ok((
+ before,
+ Change {
+ file: File::plain(path, mime),
+ object: after,
+ patch: Patch::Unified(delta),
+ },
+ ))
+ }
+}
+
+#[derive(Debug)]
+pub enum Event {
+ Addition(Change),
+ Deletion(Change),
+ Modification {
+ src: Header,
+ change: Change,
+ },
+    // a Modification that additionally carries the
+    // original path as `src_path`
+ Rename {
+ src: Header,
+ src_path: std::path::PathBuf,
+ change: Change,
+ },
+}
+
+#[derive(Debug)]
+pub struct Change {
+ pub file: crate::mime::File,
+ pub object: Header,
+ pub patch: Patch,
+}
+
+#[derive(Debug)]
+pub enum Patch {
+ Unified(UnifiedDiff),
+ InputTooLarge,
+ BinaryData,
+ NoChange,
+}
+
+#[derive(Clone, Debug)]
+pub struct Header {
+ pub id: ObjectId,
+ pub size: u64,
+ pub kind: gix::object::Kind,
+}
Created urso/src/diff/sink.rs
+use std::ops::Range;
+
+use gix::diff::blob::{diff as imara_diff, intern::InternedInput, Algorithm, Sink};
+
+pub(crate) fn diff(before: &str, after: &str) -> super::UnifiedDiff {
+ let input = InternedInput::new(before, after);
+ let sink = StructuredSink::new(&input);
+
+ imara_diff(Algorithm::Myers, &input, sink)
+}
+
+pub(crate) fn similarity(before: &str, after: &str) -> f32 {
+ let before_len = before.len();
+ let after_len = after.len();
+ debug_assert!(
+ before_len > 0 || after_len > 0,
+        "at least one of the sides must be non-empty"
+ );
+ let input = InternedInput::new(before, after);
+ let sink = RemovedBytes::new(&input);
+
+ let removed_bytes = imara_diff(Algorithm::Myers, &input, sink);
+ (before_len - removed_bytes) as f32 / before_len.max(after_len) as f32
+}
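With the removed byte count in hand, the score reduces to plain arithmetic: the bytes of `before` that survived the diff, over the longer of the two sides. Factored out of the sink plumbing as a sketch:

```rust
// similarity as computed above: surviving bytes of `before` divided
// by the length of the longer side, yielding a value in 0.0..=1.0
fn similarity_score(before_len: usize, after_len: usize, removed_bytes: usize) -> f32 {
    (before_len - removed_bytes) as f32 / before_len.max(after_len) as f32
}

fn main() {
    // identical inputs: nothing removed, same lengths
    assert_eq!(similarity_score(10, 10, 0), 1.0);
    // everything kept, but the other side doubled in size
    assert_eq!(similarity_score(4, 8, 0), 0.5);
    // everything removed
    assert_eq!(similarity_score(4, 4, 4), 0.0);
}
```

Dividing by the longer side is what makes pure growth lower the score: a file that kept all its bytes but doubled in size scores 0.5, not 1.0.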
+
+struct RemovedBytes<'a> {
+ removed_bytes: usize,
+ input: &'a InternedInput<&'a str>,
+}
+
+impl<'a> RemovedBytes<'a> {
+ fn new(input: &'a InternedInput<&'a str>) -> Self {
+ Self {
+ input,
+ removed_bytes: 0,
+ }
+ }
+}
+
+// gix's Statistics sink
+// https://github.com/Byron/gitoxide/blob/72274107fdb8c8faa93a4abbe1382ca3301003c9/gix/src/object/tree/diff/tracked.rs#L407
+impl<'a> Sink for RemovedBytes<'a> {
+ type Out = usize;
+
+ fn process_change(&mut self, before: Range<u32>, _after: Range<u32>) {
+ self.removed_bytes += self.input.before[before.start as usize..before.end as usize]
+ .iter()
+ .map(|token| self.input.interner[*token].len())
+ .sum::<usize>();
+ }
+
+ fn finish(self) -> Self::Out {
+ self.removed_bytes
+ }
+}
+
+// imara-diff's unified diff sink, but writing to structs
+// instead of a std::fmt::Write
+// https://github.com/pascalkuthe/imara-diff/blob/30736cc43f0aa63b340c26b48aa39b98a3930de7/src/unified_diff.rs
+struct StructuredSink<'a> {
+ chunks: Vec<super::Chunk>,
+
+ input: &'a InternedInput<&'a str>,
+ pos: u32,
+ before_hunk_start: u32,
+ before_hunk_len: u32,
+ after_hunk_start: u32,
+ after_hunk_len: u32,
+
+ lines: Vec<super::Line>,
+}
+
+impl<'a> StructuredSink<'a> {
+ fn new(input: &'a InternedInput<&'a str>) -> Self {
+ Self {
+ input,
+ chunks: Vec::new(),
+ pos: 0,
+ before_hunk_start: 0,
+ before_hunk_len: 0,
+ after_hunk_start: 0,
+ after_hunk_len: 0,
+ lines: Vec::new(),
+ }
+ }
+
+ fn register_removals(&mut self, range: Range<u32>) {
+ self.before_hunk_len += range.end - range.start;
+ let range = range.start as usize..range.end as usize;
+
+ for &line in self.input.before[range].iter() {
+ self.lines.push(super::Line::Removal(String::from(
+ self.input.interner[line],
+ )));
+ }
+ }
+
+ fn register_additions(&mut self, range: Range<u32>) {
+ self.after_hunk_len += range.end - range.start;
+ for &line in self.input.after[(range.start as usize)..(range.end as usize)].iter() {
+ self.lines.push(super::Line::Addition(String::from(
+ self.input.interner[line],
+ )));
+ }
+ }
+
+ fn register_context(&mut self, range: Range<u32>) {
+ for &line in self.input.before[(range.start as usize)..(range.end as usize)].iter() {
+ self.lines.push(super::Line::Context(String::from(
+ self.input.interner[line],
+ )));
+ }
+ }
+
+ fn flush(&mut self) {
+ if self.before_hunk_len == 0 && self.after_hunk_len == 0 {
+ return;
+ }
+
+        // Advance to collect the context _after_ the changes,
+        // being careful not to go beyond the original input length
+ let end = (self.pos + 3).min(self.input.before.len() as u32);
+ self.advance(end, end);
+
+ let lines = std::mem::take(&mut self.lines);
+ let before_pos = self.before_hunk_start..(self.before_hunk_start + self.before_hunk_len);
+ let after_pos = self.after_hunk_start..(self.after_hunk_start + self.after_hunk_len);
+
+ self.chunks.push(super::Chunk {
+ before_pos,
+ after_pos,
+ lines,
+ });
+
+ self.before_hunk_len = 0;
+ self.after_hunk_len = 0;
+ }
+
+ fn advance(&mut self, context_to: u32, pos_to: u32) {
+ self.register_context(self.pos..context_to);
+ let len = context_to - self.pos;
+ self.pos = pos_to;
+ self.before_hunk_len += len;
+ self.after_hunk_len += len;
+ }
+}
+
+impl<'a> Sink for StructuredSink<'a> {
+ type Out = super::UnifiedDiff;
+
+ fn process_change(&mut self, before: Range<u32>, after: Range<u32>) {
+ if before.start - self.pos > 6 {
+ self.flush();
+ self.pos = before.start - 3;
+ self.before_hunk_start = self.pos;
+ self.after_hunk_start = after.start - 3;
+ }
+ self.advance(before.start, before.end);
+ self.register_removals(before);
+ self.register_additions(after);
+ }
+
+ fn finish(mut self) -> Self::Out {
+ self.flush();
+
+ super::UnifiedDiff {
+ before_lineno: self.input.before.len(),
+ after_lineno: self.input.after.len(),
+ chunks: self.chunks,
+ }
+ }
+}
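`process_change`, `advance`, and `flush` together implement the usual grouping rule for unified diffs: three lines of context around each change, and changes whose padded context would overlap (a gap under roughly six lines) share one hunk. The rule in isolation, over a sorted list of changed line indices:

```rust
use std::ops::Range;

// pad each changed line with CONTEXT lines on both sides and merge
// groups whose padded ranges overlap or touch
const CONTEXT: usize = 3;

fn group_hunks(changed: &[usize], file_len: usize) -> Vec<Range<usize>> {
    let mut out: Vec<Range<usize>> = Vec::new();
    for &line in changed {
        let start = line.saturating_sub(CONTEXT);
        let end = (line + CONTEXT + 1).min(file_len);
        match out.last_mut() {
            // padded ranges overlap or touch: extend the current hunk
            Some(prev) if start <= prev.end => prev.end = end,
            _ => out.push(start..end),
        }
    }
    out
}

fn main() {
    // lines 5 and 7 are close enough to share a hunk
    assert_eq!(group_hunks(&[5, 7], 20), vec![2..11]);
    // lines 0 and 15 are not, so they become two hunks
    assert_eq!(group_hunks(&[0, 15], 20), vec![0..4, 12..19]);
}
```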
+
+#[cfg(test)]
+mod tests {
+ use gix::diff::blob::{
+ diff as imara_diff, intern::InternedInput, Algorithm, UnifiedDiffBuilder,
+ };
+
+ fn assert_unified_diff_eq(before: &str, after: &str) {
+ let input = InternedInput::new(before, after);
+ let expected = imara_diff(Algorithm::Myers, &input, UnifiedDiffBuilder::new(&input));
+
+ let ours = super::diff(before, after);
+ let got = ours.as_patch().expect("formatting works");
+
+ assert_eq!(
+ got, expected,
+ "patches don't match!\n\nbefore:\n{before}\n\nafter:\n{after}\n"
+ );
+ }
+
+ #[test]
+ fn trivial() {
+ assert_unified_diff_eq("", "");
+ assert_unified_diff_eq("", "a");
+ assert_unified_diff_eq("a", "");
+ assert_unified_diff_eq("a", "b");
+ }
+
+ #[test]
+ fn multiple_chunks() {
+ let before = "
+1
+2
+FIXME
+4
+5
+6
+7
+8
+9
+10
+11
+REMOVEME
+12
+13
+";
+ let after = "
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+";
+
+ assert_unified_diff_eq(before, after);
+ }
+
+ #[test]
+ fn unchanged_within_delta() {
+ let before = "
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+";
+ let after = "
+1
+2
+3
+4
+5
+6
+THIS IS NEW
+7
+HERE TOO
+8
+9
+10
+11
+12
+13
+14";
+
+ assert_unified_diff_eq(before, after);
+ }
+}
Created urso/src/error.rs
+use std::path::PathBuf;
+
+use gix::ObjectId;
+
+// FIXME: deriving Clone here would make things easier
+#[derive(Debug)]
+pub enum Error {
+ Bug(String),
+ InvalidRevSpec(Box<gix::revision::spec::parse::single::Error>),
+ ObjectNotFound(ObjectId),
+ NotFound,
+ Wrapped(WrappedError),
+ NotAFile(PathBuf),
+ NotADir(PathBuf),
+ PathNotRelative(PathBuf),
+ Open(Box<gix::open::Error>),
+ DetachedHead,
+ Peel(ObjectId),
+ Decode(ObjectId),
+ Header((ObjectId, String)),
+ ToString(Box<dyn std::error::Error + 'static + Sync + Send>),
+}
+
+// This used to be a PlatformError, with individual discriminants
+// for every gix error I would be propagating up, but those error
+// types are large and change often, so, at least for now,
+// I'll go for convenience
+#[derive(Debug)]
+pub struct WrappedError {
+ context: String,
+ wrapped: Box<dyn std::error::Error + 'static + Sync + Send>,
+}
+
+impl std::fmt::Display for WrappedError {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ write!(f, "{}: {}", self.context, self.wrapped)
+ }
+}
+
+impl std::error::Error for WrappedError {
+ fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
+ Some(self.wrapped.as_ref())
+ }
+}
+
+pub(crate) fn wrap_err<E>(msg: String, error: E) -> WrappedError
+where
+ E: std::error::Error + 'static + Sync + Send,
+{
+ WrappedError {
+ context: msg,
+ wrapped: Box::new(error),
+ }
+}
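`wrap_err` is the usual context-plus-source pattern: a human-readable message up front, with the original error still reachable through `source()` for callers that want to walk the chain. A standalone sketch (using an `io::Error` as the hypothetical inner error):

```rust
use std::error::Error;
use std::fmt;

// stand-alone version of the wrapping pattern: a context string plus
// a boxed source error
#[derive(Debug)]
struct Wrapped {
    context: String,
    inner: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.inner)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.inner.as_ref())
    }
}

fn wrap(context: &str, err: impl Error + Send + Sync + 'static) -> Wrapped {
    Wrapped {
        context: context.to_string(),
        inner: Box::new(err),
    }
}

fn main() {
    let io = std::io::Error::new(std::io::ErrorKind::NotFound, "missing");
    let wrapped = wrap("reading config", io);
    assert_eq!(wrapped.to_string(), "reading config: missing");
    // the original error stays reachable through the source chain
    assert!(wrapped.source().is_some());
}
```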
+
+impl std::fmt::Display for Error {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ match self {
+ Error::Bug(msg) => write!(f, "BUG: {msg}"),
+ Error::InvalidRevSpec(e) => write!(f, "{}", e),
+ Error::ObjectNotFound(oid) => write!(f, "object not found: {}", oid),
+ Error::NotFound => write!(f, "not found"),
+ Error::NotAFile(path) => write!(f, "not a file: {:?}", path),
+ Error::NotADir(path) => write!(f, "not a dir: {:?}", path),
+ Error::PathNotRelative(path) => write!(f, "path is not relative: {:?}", path),
+ Error::Wrapped(w) => write!(f, "unexpected error: {}", w),
+ Error::Open(e) => write!(f, "{}", e),
+ Error::DetachedHead => write!(f, "repository must have a valid HEAD ref"),
+ Error::Peel(id) => write!(f, "failed to peel reference `{}`", id),
+ Error::Decode(id) => write!(f, "failed to decode object {}", id),
+ Error::Header((oid, msg)) => write!(f, "reading header for {}: {}", oid, msg),
+ Error::ToString(inner) => write!(f, "failed to read bytes as string: {}", inner),
+ }
+ }
+}
+
+impl std::error::Error for Error {
+ fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
+ match self {
+ Error::InvalidRevSpec(e) => Some(e),
+ Error::Open(e) => Some(e),
+ Error::Wrapped(e) => Some(e),
+ Error::ToString(e) => Some(e.as_ref()),
+ _ => None,
+ }
+ }
+}
+
+impl From<WrappedError> for Error {
+ fn from(value: WrappedError) -> Self {
+ Error::Wrapped(value)
+ }
+}
+
+impl From<gix::open::Error> for Error {
+ fn from(value: gix::open::Error) -> Self {
+ Error::Open(Box::new(value))
+ }
+}
+
+impl From<gix::revision::spec::parse::single::Error> for Error {
+ fn from(value: gix::revision::spec::parse::single::Error) -> Self {
+ Self::InvalidRevSpec(Box::new(value))
+ }
+}
Created urso/src/lib.rs
+#![forbid(unsafe_code)]
+#![deny(unreachable_pub)]
+#![warn(
+ clippy::all,
+ clippy::await_holding_lock,
+ clippy::char_lit_as_u8,
+ clippy::checked_conversions,
+ clippy::dbg_macro,
+ clippy::debug_assert_with_mut_call,
+ clippy::doc_markdown,
+ clippy::empty_enum,
+ clippy::enum_glob_use,
+ clippy::exit,
+ clippy::expl_impl_clone_on_copy,
+ clippy::explicit_deref_methods,
+ clippy::explicit_into_iter_loop,
+ clippy::fallible_impl_from,
+ clippy::filter_map_next,
+ clippy::flat_map_option,
+ clippy::float_cmp_const,
+ clippy::fn_params_excessive_bools,
+ clippy::from_iter_instead_of_collect,
+ clippy::if_let_mutex,
+ clippy::implicit_clone,
+ clippy::imprecise_flops,
+ clippy::inefficient_to_string,
+ clippy::invalid_upcast_comparisons,
+ clippy::large_digit_groups,
+ clippy::large_stack_arrays,
+ clippy::large_types_passed_by_value,
+ clippy::let_unit_value,
+ clippy::linkedlist,
+ clippy::lossy_float_literal,
+ clippy::macro_use_imports,
+ clippy::manual_ok_or,
+ clippy::map_err_ignore,
+ clippy::map_flatten,
+ clippy::map_unwrap_or,
+ clippy::match_on_vec_items,
+ clippy::match_same_arms,
+ clippy::match_wild_err_arm,
+ clippy::match_wildcard_for_single_variants,
+ clippy::mem_forget,
+ clippy::mismatched_target_os,
+ clippy::missing_enforced_import_renames,
+ clippy::mut_mut,
+ clippy::mutex_integer,
+ clippy::needless_borrow,
+ clippy::needless_continue,
+ clippy::needless_for_each,
+ clippy::option_option,
+ clippy::path_buf_push_overwrite,
+ clippy::ptr_as_ptr,
+ clippy::rc_mutex,
+ clippy::ref_option_ref,
+ clippy::rest_pat_in_fully_bound_structs,
+ clippy::same_functions_in_if_condition,
+ clippy::semicolon_if_nothing_returned,
+ clippy::single_match_else,
+ clippy::string_add_assign,
+ clippy::string_add,
+ clippy::string_lit_as_bytes,
+ clippy::string_to_string,
+ clippy::todo,
+ clippy::trait_duplication_in_bounds,
+ clippy::unimplemented,
+ clippy::unnested_or_patterns,
+ clippy::unused_self,
+ clippy::useless_transmute,
+ clippy::verbose_file_reads,
+ clippy::zero_sized_map_values,
+ future_incompatible,
+ nonstandard_style,
+ rust_2018_idioms
+)]
+
+use error::wrap_err;
+use gix::{
+ bstr::BStr, object::tree, objs::tree::EntryKind, odb::HeaderExt, prelude::FindExt,
+ traverse::commit::Sorting, Object, Repository, Tree,
+};
+
+// re-export
+pub use gix::{
+ actor::SignatureRef,
+ date::time::Sign as TimeSign,
+ date::Time,
+ mailmap::Snapshot as Mailmap,
+ objs::{tree::EntryMode, CommitRef, TagRef},
+ Commit, ObjectId,
+};
+
+use std::{
+ borrow::Cow,
+ collections::{HashSet, VecDeque},
+ path::{Path, PathBuf},
+};
+
+mod error;
+pub use error::Error;
+
+pub mod diff;
+
+pub mod blame;
+use blame::Annotated;
+
+mod rename;
+use rename::RenameError;
+
+mod mime;
+pub use mime::File;
+
+pub type Result<T> = std::result::Result<T, Error>;
+
+pub struct Urso {
+ repo: Repository,
+ pub max_bytes: u64,
+ similarity_threshold: Option<f32>,
+}
+
+pub fn guess_mime(path: &str, data: &[u8]) -> (&'static str, bool) {
+ mime::guess(path, data)
+}
+
+impl std::fmt::Debug for Urso {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ f.debug_struct("Urso")
+ .field("repo", &self.git_dir())
+ .field("max_bytes", &self.max_bytes)
+ .field("similarity_threshold", &self.similarity_threshold)
+ .finish()
+ }
+}
+
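+/// Thread-safe, cloneable counterpart to `Urso`.
+///
+/// A sketch of the intended round-trip (the path and parameters
+/// below are illustrative, not defaults):
+///
+/// ```no_run
+/// # fn main() -> urso::Result<()> {
+/// let urso = urso::Urso::open("/tmp/repo".into(), 1024 * 1024, Some(0.5), None::<usize>)?;
+/// let handle = urso.into_handle();
+/// let clone = handle.clone(); // cheap to clone, can cross thread boundaries
+/// let urso = clone.into_urso();
+/// # Ok(())
+/// # }
+/// ```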
+#[derive(Clone)]
+pub struct UrsoHandle {
+ inner: gix::ThreadSafeRepository,
+ max_bytes: u64,
+ similarity_threshold: Option<f32>,
+}
+
+impl std::fmt::Debug for UrsoHandle {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ f.debug_struct("UrsoHandle")
+ .field("repo", &self.inner.git_dir())
+ .field("max_bytes", &self.max_bytes)
+ .field("similarity_threshold", &self.similarity_threshold)
+ .finish()
+ }
+}
+
+// options are intentionally NOT part of the equation
+impl PartialEq for UrsoHandle {
+ fn eq(&self, other: &Self) -> bool {
+ self.inner.git_dir().eq(other.inner.git_dir())
+ }
+}
+
+impl UrsoHandle {
+ pub fn into_urso(self) -> Urso {
+ let UrsoHandle {
+ inner,
+ max_bytes,
+ similarity_threshold,
+ } = self;
+ Urso {
+ repo: inner.into(),
+ max_bytes,
+ similarity_threshold,
+ }
+ }
+
+ pub fn git_dir(&self) -> &Path {
+ self.inner.git_dir()
+ }
+}
+
+impl Urso {
+ pub fn git_dir(&self) -> &Path {
+ self.repo.git_dir()
+ }
+
+ pub fn open(
+ dir: PathBuf,
+ max_bytes: u64,
+ similarity_threshold: Option<f32>,
+ object_cache_size: impl Into<Option<usize>>,
+ ) -> Result<Self> {
+ let mut repo = gix::ThreadSafeRepository::open(dir)?.to_thread_local();
+ repo.object_cache_size(object_cache_size);
+
+ Ok(Self {
+ repo,
+ max_bytes,
+ similarity_threshold,
+ })
+ }
+
+ pub fn into_handle(self) -> UrsoHandle {
+ let Urso {
+ repo,
+ max_bytes,
+ similarity_threshold,
+ } = self;
+
+ UrsoHandle {
+ inner: repo.into_sync(),
+ max_bytes,
+ similarity_threshold,
+ }
+ }
+
+ // finds a blob located at base `path` whose filename matches
+ // one of the given `alts`
+ // for when you want to get the first of /path/{a,b,c,d,e} that
+ // exists in the repo
+ // cheaper than machinegunning get_file_contents since it avoids
+ // decoding the intermediary path objects multiple times
+ // (questionable benefit with tls object decoding cache)
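+ // e.g. (hypothetical call; `head` is assumed to resolve to a commit):
+ //   urso.read_firstof(head, "docs", &["readme.md", "README"], &mut buf)
+ //   yields the contents of the first of docs/{readme.md,README} that exists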
+ pub fn read_firstof<P: AsRef<Path>>(
+ &self,
+ head: ObjectId,
+ path: P,
+ alts: &[&str],
+ buf: &mut Vec<u8>,
+ ) -> Result<File> {
+ let tree = self.get_commit_tree(head)?;
+
+ // root: just try to find alts
+ if path.as_ref().as_os_str().is_empty() {
+ for &alt in alts {
+ if let Some(found) = tree.find_entry(alt) {
+ self.read_blob(found.object_id(), buf)?;
+ return Ok(File::new(path.as_ref().join(alt), buf));
+ }
+ }
+ return Err(Error::NotFound);
+ }
+
+ let Some(entry) = find_path(&path, &tree, buf)? else {
+ return Err(Error::NotFound);
+ };
+
+ if entry.mode().is_tree() {
+ let tree = entry
+ .object()
+ .map_err(|_discarded| Error::ObjectNotFound(entry.id().detach()))?
+ .into_tree();
+ for &alt in alts {
+ if let Some(found) = tree.find_entry(alt) {
+ buf.clear();
+ self.read_blob(found.object_id(), buf)?;
+ return Ok(File::new(path.as_ref().join(alt), buf));
+ }
+ }
+ Err(Error::NotFound)
+ } else {
+ Err(Error::NotFound)
+ }
+ }
+
+ pub fn default_branch(&self) -> Result<String> {
+ // XXX swallowing errors
+ if let Some(head) = self.repo.head_ref().ok().flatten() {
+ Ok(String::from_utf8_lossy(head.name().shorten()).into_owned())
+ } else {
+ Err(Error::DetachedHead)
+ }
+ }
+
+ pub fn rev_parse(&self, spec: &str) -> Result<ObjectId> {
+ Ok(self
+ .repo
+ .rev_parse_single(BStr::new(spec.as_bytes()))?
+ .detach())
+ }
+
+ pub fn diff<F>(&self, base: Commit<'_>, parent: Option<Commit<'_>>, visitor: F) -> Result<()>
+ where
+ F: FnMut(diff::Event),
+ {
+ diff::diff_commits(&self, base, parent, visitor).map_err(|err| match err {
+ diff::DiffError::Wrapped(w) => Error::Wrapped(w),
+ diff::DiffError::Repo(e) => e,
+ })
+ }
+
+ pub fn list_path<V, P>(&self, head: ObjectId, dir: P, visitor: V) -> Result<()>
+ where
+ P: AsRef<std::path::Path>,
+ V: FnMut(Entry<'_>),
+ {
+ let tree = self.get_commit_tree(head)?;
+
+ if !dir.as_ref().is_relative() {
+ return Err(Error::PathNotRelative(dir.as_ref().into()));
+ }
+
+ // empty dir == root of the repo: nothing to look up, list it directly
+ if dir.as_ref().as_os_str().is_empty() {
+ return list_tree(tree, visitor);
+ }
+
+ let mut buf = Vec::new();
+
+ if let Some(entry) = find_path(&dir, &tree, &mut buf)? {
+ if !entry.mode().is_tree() {
+ Err(Error::NotADir(dir.as_ref().into()))
+ } else {
+ list_tree(self.find_object(entry.object_id())?.into_tree(), visitor)
+ }
+ } else {
+ Err(Error::NotFound)
+ }
+ }
+
+ // NEEDSWORK: this is expensive AF
+ // its nature is already costly: find the most recent
+ // commit for each file in the tree
+ // worst case IS walking the whole history until the
+ // very first commit
+ // an objectid->commit lookup would be cheap enough, but
+ // i don't control the ingestion, so keeping it up to
+ // date is not trivial
+ pub fn list_with_log<V, P>(&self, commit_id: ObjectId, dir: P, mut visitor: V) -> Result<()>
+ where
+ // F(Entry, id-of-most-recent-commit)
+ V: FnMut(Entry<'_>, ObjectId),
+ P: AsRef<Path>,
+ {
+ let mut wanted = Vec::new();
+ self.list_path(commit_id, dir.as_ref(), |entry| {
+ wanted.push(entry.to_owned());
+ })?;
+
+ if wanted.is_empty() {
+ return Ok(());
+ }
+
+ let mut buf = Vec::new();
+ let base = dir.as_ref();
+ for rev in self
+ .find_commit(commit_id)?
+ .ancestors()
+ .first_parent_only()
+ .sorting(Sorting::ByCommitTimeNewestFirst)
+ .all()
+ .map_err(|e| wrap_err(format!("walking ancestry of commit {}", commit_id), e))?
+ .flatten()
+ {
+ let commit = self.find_commit(rev.id)?;
+ let commit_tree = commit_tree(&commit)?;
+ let parent_tree = {
+ if let Some(id) = commit.parent_ids().next() {
+ self.get_commit_tree(id.into())?
+ } else {
+ // first commit
+ self.repo.empty_tree()
+ }
+ };
+
+ // from last to first so that swap_remove is safe
+ for idx in (0..wanted.len()).rev() {
+ let t = &wanted[idx];
+ let target = base.join(&t.name);
+
+ let cur = find_path(&target, &commit_tree, &mut buf)?.map(|entry| entry.id());
+ let prev = find_path(&target, &parent_tree, &mut buf)?.map(|entry| entry.id());
+
+ if cur != prev {
+ let target = wanted.swap_remove(idx);
+ visitor(target.as_entry(), commit.id);
+ }
+ }
+
+ if wanted.is_empty() {
+ break;
+ }
+ }
+
+ debug_assert!(wanted.is_empty(), "missing tip commit for {wanted:#?}");
+
+ Ok(())
+ }
+
+ // load data into buf; yield the guessed mime and whether it
+ // can be treated as text
+ // a shortcut for find_header + check-if-blob + guess_mime
+ pub fn get_file_contents<P: AsRef<Path>>(
+ &self,
+ head: ObjectId,
+ path: P,
+ buf: &mut Vec<u8>,
+ ) -> Result<(&'static str, bool)> {
+ if path.as_ref().as_os_str().is_empty() {
+ return Err(Error::NotFound);
+ }
+
+ let tree = self.get_commit_tree(head)?;
+
+ if let Some(entry) = find_path(&path, &tree, buf)? {
+ if entry.mode().is_blob() {
+ buf.clear();
+ self.read_blob(entry.object_id(), buf)?;
+ Ok(mime::guess(path, buf))
+ } else {
+ Err(Error::NotAFile(path.as_ref().into()))
+ }
+ } else {
+ Err(Error::NotFound)
+ }
+ }
+
+ pub fn annotate(&self, commit: ObjectId, path: PathBuf) -> Result<()> {
+ // Must be able to find the given path as a blob
+ // using commit as the head
+ let mut buf = Vec::new();
+ let tree = self.get_commit_tree(commit)?;
+ if let Some(e) = find_path(&path, &tree, &mut buf)? {
+ if !e.mode().is_blob() {
+ return Err(Error::NotAFile(path));
+ }
+ } else {
+ return Err(Error::NotFound);
+ }
+
+ let mut versions = Vec::new();
+ self.path_rev_walk(
+ commit,
+ path,
+ WalkOptions {
+ rename_similarity_threshold: self.similarity_threshold,
+ follow_strategy: FollowStrategy::FirstParent,
+ stop_when_path_disappears: true,
+ },
+ |info: Commit<'_>, _: &Path, _prev, cur| -> bool {
+ if let Some(ver) = cur {
+ versions.push(Version {
+ commit: info.id,
+ object: ver,
+ });
+ }
+ true
+ },
+ )?;
+
+ versions.reverse();
+
+ let out = crate::blame::annotate(&versions[..], self)?;
+
+ let mut lineno = 1;
+ let Annotated {
+ content,
+ annotations,
+ } = out;
+ for b in annotations {
+ let commit = b.id.commit.to_hex_with_len(11);
+ for idx in b.lines {
+ println!("{} {:3} {}", commit, lineno, content[idx as usize]);
+ lineno += 1;
+ }
+ }
+
+ Ok(())
+ }
+
+ // looooooooooong perf tail
+ // may walk all the way to the first commit
+ pub fn tip<P: AsRef<Path>>(&self, head: ObjectId, path: P) -> Result<ObjectId> {
+ let mut found = None;
+
+ // empty path is the root of the repo
+ // the most recent commit is whatever head is at
+ if path.as_ref().as_os_str().is_empty() {
+ return Ok(head);
+ }
+
+ self.path_rev_walk(
+ head,
+ path.as_ref(),
+ WalkOptions {
+ follow_strategy: FollowStrategy::Default,
+ // ensures that the path exists in the
+ // tree of the commit `head` points at
+ // i.e. exits early when not found instead of
+ // climbing the ancestry
+ stop_when_path_disappears: true,
+ // want the first commit only
+ rename_similarity_threshold: None,
+ },
+ |info: Commit<'_>, _: &Path, _, _| {
+ found = Some(info.id);
+ false
+ },
+ )?;
+
+ if let Some(commit) = found {
+ Ok(commit)
+ } else {
+ Err(Error::NotFound)
+ }
+ }
+
+ pub fn log<P, F>(&self, head: ObjectId, path: P, visitor: F) -> Result<()>
+ where
+ P: AsRef<Path>,
+ F: FnMut(Commit<'_>, &Path, Option<ObjectId>, Option<ObjectId>) -> bool,
+ {
+ let tree = self.get_commit_tree(head)?;
+ let mut buf = Vec::new();
+ if find_entry(path.as_ref(), &tree, &mut buf)?.is_none() {
+ return Err(Error::NotFound);
+ }
+
+ self.path_rev_walk(
+ head,
+ path,
+ WalkOptions {
+ rename_similarity_threshold: self.similarity_threshold,
+ follow_strategy: FollowStrategy::Default,
+ stop_when_path_disappears: false,
+ },
+ visitor,
+ )?;
+
+ Ok(())
+ }
+
+ pub fn local_refs<F>(&self, mut visitor: F) -> Result<()>
+ where
+ F: FnMut(RefKind<'_>) -> bool,
+ {
+ 'refs: for mut r in self
+ .repo
+ .references()
+ .map_err(|e| wrap_err("preparing to list all refs".into(), e))?
+ .all()
+ .map_err(|e| wrap_err("iterating over all refs".into(), e))?
+ .flatten()
+ {
+ // XXX this depends on the call order:
+ // peel_to_id_in_place() will overwrite the ref id
+ // with the fully peeled one.
+ // for annotated tags the ref before peeling is
+ // interesting, so must get id before peeling
+ let Some(id) = r.try_id().map(|id| id.detach()) else {
+ // don't care about symbolic refs
+ continue 'refs;
+ };
+ r.peel_to_id_in_place()
+ .map_err(|e| wrap_err("peeling ref".into(), e))?;
+
+ let Some((category, name)) = r.name().category_and_short_name() else {
+ continue 'refs;
+ };
+
+ match category {
+ gix::refs::Category::Tag => {
+ let head = r.inner.peeled.unwrap();
+ // i think i can assume that if head != id, the
+ // tag is annotated
+ match self.find_object(id)?.try_into_tag() {
+ Ok(obj) => {
+ let tag = obj.decode().map_err(|_discarded| Error::Decode(id))?;
+ if !visitor(RefKind::Tag { tag, head }) {
+ break 'refs;
+ }
+ }
+ Err(_ignored) => {
+ if !visitor(RefKind::PlainTag { name, head }) {
+ break 'refs;
+ }
+ }
+ }
+ }
+ gix::refs::Category::LocalBranch => {
+ if !visitor(RefKind::Branch {
+ name,
+ head: r.id().detach(),
+ }) {
+ break 'refs;
+ }
+ }
+ _ignored => (),
+ };
+ }
+
+ Ok(())
+ }
+
+ fn find_object(&self, id: ObjectId) -> Result<Object<'_>> {
+ self.repo
+ .try_find_object(id)
+ .map_err(|e| wrap_err(format!("searching for object {}", id), e))?
+ .ok_or(Error::ObjectNotFound(id))
+ }
+
+ pub fn find_commit(&self, id: ObjectId) -> Result<Commit<'_>> {
+ self.find_object(id)?
+ .try_into_commit()
+ .map_err(|e| wrap_err(format!("reading object {}", id), e).into())
+ }
+
+ fn get_commit_tree(&self, id: ObjectId) -> Result<Tree<'_>> {
+ commit_tree(&self.find_commit(id)?)
+ }
+
+ pub fn read_blob(&self, id: ObjectId, buf: &mut Vec<u8>) -> Result<()> {
+ self.repo
+ .objects
+ .find_blob(&id, buf)
+ .map_err(|e| wrap_err(format!("reading blob id {}", id), e))?;
+
+ Ok(())
+ }
+
+ pub fn find_header<P: AsRef<Path>>(
+ &self,
+ head: ObjectId,
+ path: P,
+ ) -> Result<Option<diff::Header>> {
+ let mut tree = self.get_commit_tree(head)?;
+
+ match tree.peel_to_entry_by_path(path.as_ref()) {
+ Ok(found) => {
+ if let Some(entry) = found {
+ if entry.mode().is_commit() {
+ Ok(Some(diff::Header {
+ id: entry.object_id(),
+ size: 0,
+ kind: gix::objs::Kind::Commit,
+ }))
+ } else {
+ Ok(Some(self.get_header(entry.object_id())?))
+ }
+ } else {
+ Ok(None)
+ }
+ }
+ Err(gix::object::find::existing::Error::NotFound { oid }) => {
+ Err(Error::ObjectNotFound(oid))
+ }
+ Err(e) => Err(wrap_err(
+ format!("looking for path {:?} at tree {}", path.as_ref(), tree.id),
+ e,
+ )
+ .into()),
+ }
+ }
+
+ fn compute_similarity(&self, previous: ObjectId, current: ObjectId) -> Result<f32> {
+ let mut prev = Vec::new();
+ self.read_blob(previous, &mut prev)?;
+
+ let mut cur = Vec::new();
+ self.read_blob(current, &mut cur)?;
+
+ let prev_text = self.blob_bytes_to_string(&prev)?;
+ let cur_text = self.blob_bytes_to_string(&cur)?;
+
+ let result = diff::similarity(prev_text.as_ref(), cur_text.as_ref());
+
+ Ok(result)
+ }
+
+ // XXX fails on symlinks, need to know entry mode before calling
+ fn get_header(&self, id: ObjectId) -> Result<diff::Header> {
+ let h = self
+ .repo
+ .objects
+ .header(id)
+ .map_err(|_discarded| Error::ObjectNotFound(id))?;
+
+ Ok(diff::Header {
+ id,
+ size: h.size(),
+ kind: h.kind(),
+ })
+ }
+
+ // XXX walks until the beginning if the path doesn't exist
+ fn path_rev_walk<P, F>(
+ &self,
+ head: ObjectId,
+ path: P,
+ options: WalkOptions,
+ mut visitor: F,
+ ) -> Result<()>
+ where
+ P: AsRef<Path>,
+ F: PathCommitVisitor,
+ {
+ let mut path = path.as_ref().to_path_buf();
+ let mut queue = OnceQueue::new();
+ queue.insert(head);
+
+ let mut parent_ids = Vec::new();
+ let mut buf = Vec::new();
+
+ while let Some(commit_id) = queue.remove() {
+ let commit = self.find_commit(commit_id)?;
+
+ let commit_tree = self.get_commit_tree(commit_id)?;
+ let entry = find_entry(&path, &commit_tree, &mut buf)?;
+ let current = entry.as_ref().map(|e| e.0);
+
+ if options.stop_when_path_disappears && current.is_none() {
+ continue;
+ }
+
+ parent_ids.clear();
+ commit
+ .parent_ids()
+ .for_each(|id| parent_ids.push(id.detach()));
+ let num_parents = parent_ids.len();
+
+ // Decide which parent(s) to follow and yield back the one
+ // to compare against
+ let parent_id = match num_parents {
+ // root commit. no parent, nothing to follow
+ 0 => {
+ if current.is_some() && !visitor.visit(commit, &path, None, current) {
+ break;
+ }
+ continue;
+ }
+ // single parent, follow it
+ 1 => {
+ queue.insert(parent_ids[0]);
+ parent_ids[0]
+ }
+ _merge_commit => {
+ let mut first_treesame_idx = None;
+
+ for (idx, &parent_id) in parent_ids.iter().enumerate() {
+ let parent_tree = self.get_commit_tree(parent_id)?;
+
+ let previous = find_path(&path, &parent_tree, &mut buf)?
+ .map(|entry| entry.object_id());
+
+ if previous == current {
+ first_treesame_idx = Some(idx);
+ break;
+ }
+ }
+
+ match options.follow_strategy {
+ FollowStrategy::Default => {}
+ FollowStrategy::AllParents => {
+ first_treesame_idx = None;
+ }
+ FollowStrategy::FirstParent => {
+ first_treesame_idx = Some(0);
+ }
+ };
+
+ if let Some(idx) = first_treesame_idx {
+ let parent_id = parent_ids[idx];
+ queue.insert(parent_id);
+ parent_id
+ } else {
+ for &id in parent_ids.iter() {
+ queue.insert(id);
+ }
+ parent_ids[0]
+ }
+ }
+ };
+
+ let parent_tree = self.get_commit_tree(parent_id)?;
+
+ let mut previous = find_entry(&path, &parent_tree, &mut buf)?.map(|entry| entry.0);
+
+ let mut has_renamed = false;
+ if options.rename_similarity_threshold.is_some()
+ && previous.is_none()
+ // only try to detect renames for blob types
+ && entry.is_some_and(|e| matches!(e.1, EntryKind::Blob | EntryKind::BlobExecutable))
+ {
+ tracing::trace!(
+ commit_id = tracing::field::debug(commit_id),
+ parent_id = tracing::field::debug(parent_id),
+ path = tracing::field::debug(&path),
+ "initiating rename detection"
+ );
+
+ if let Some((new_path, new_id)) =
+ rename::find_rename(self, &path, current.unwrap(), &commit_tree, &parent_tree)
+ .map_err(|e| match e {
+ RenameError::Repo(inner) => inner,
+ RenameError::Wrapped(w) => Error::Wrapped(w),
+ })?
+ {
+ tracing::debug!(
+ commit_id = tracing::field::debug(commit_id),
+ parent_id = tracing::field::debug(parent_id),
+ new_path = tracing::field::debug(&new_path),
+ old_path = tracing::field::debug(path),
+ "rename detected"
+ );
+
+ path = new_path;
+ previous = Some(new_id);
+ has_renamed = true;
+ } else {
+ tracing::trace!("no rename detected");
+ }
+ }
+
+ // When a blob is renamed but its contents remain unchanged,
+ // `previous` and `current` will be the same. Since the commit
+ // is likely interesting to the callee, the callback is fired anyway
+ if (has_renamed || current != previous)
+ && !visitor.visit(commit, &path, previous, current)
+ {
+ break;
+ }
+ }
+
+ Ok(())
+ }
+
+ // XXX being lossy here can lead to junk for
+ // repositories that hold binary files that can't be
+ // mime-guessed from their name
+ #[allow(clippy::unused_self)] // TODO should know how to decode properly
+ fn blob_bytes_to_string<'a>(&self, data: &'a [u8]) -> Result<Cow<'a, str>> {
+ Ok(Cow::Borrowed(
+ std::str::from_utf8(data).map_err(|e| Error::ToString(Box::new(e)))?,
+ ))
+ }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
+struct Version {
+ commit: ObjectId,
+ object: ObjectId,
+}
+
+impl blame::Repo<Version> for &Urso {
+ type Error = Error;
+
+ fn load(&self, id: &Version, buf: &mut Vec<u8>) -> std::result::Result<(), Self::Error> {
+ self.read_blob(id.object, buf)
+ }
+
+ fn decode_text<'a>(
+ &self,
+ data: &'a [u8],
+ ) -> std::result::Result<std::borrow::Cow<'a, str>, Self::Error> {
+ self.blob_bytes_to_string(data)
+ }
+}
+
+impl diff::Repo for &Urso {
+ type Error = Error;
+
+ fn max_bytes(&self) -> u64 {
+ self.max_bytes
+ }
+
+ fn empty_tree(&self) -> Tree<'_> {
+ self.repo.empty_tree()
+ }
+
+ fn load(&self, id: ObjectId, buf: &mut Vec<u8>) -> std::result::Result<(), Self::Error> {
+ self.read_blob(id, buf)
+ }
+
+ fn get_header(&self, id: ObjectId) -> std::result::Result<diff::Header, Self::Error> {
+ Urso::get_header(self, id)
+ }
+
+ fn decode_text<'a>(
+ &self,
+ data: &'a [u8],
+ ) -> std::result::Result<std::borrow::Cow<'a, str>, Self::Error> {
+ self.blob_bytes_to_string(data)
+ }
+
+ fn min_similarity(&self) -> Option<f32> {
+ self.similarity_threshold
+ }
+}
+
+impl rename::Repo for &Urso {
+ type Error = Error;
+
+ fn similarity(&self, prev: ObjectId, cur: ObjectId) -> std::result::Result<f32, Self::Error> {
+ self.compute_similarity(prev, cur)
+ }
+
+ fn get_header(&self, id: ObjectId) -> std::result::Result<diff::Header, Self::Error> {
+ Urso::get_header(self, id)
+ }
+
+ fn min_similarity(&self) -> f32 {
+ self.similarity_threshold.unwrap_or(1.0)
+ }
+}
+
+struct OnceQueue<T> {
+ other: VecDeque<T>,
+ seen: HashSet<T>,
+}
+
+impl<T: std::hash::Hash + Eq + Copy> OnceQueue<T> {
+ pub(crate) fn new() -> Self {
+ Self {
+ other: Default::default(),
+ seen: Default::default(),
+ }
+ }
+
+ pub(crate) fn insert(&mut self, info: T) -> bool {
+ if self.seen.insert(info) {
+ self.other.push_back(info);
+ true
+ } else {
+ false
+ }
+ }
+
+ pub(crate) fn remove(&mut self) -> Option<T> {
+ self.other.pop_front()
+ }
+}
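+
+#[cfg(test)]
+mod once_queue_tests {
+ use super::OnceQueue;
+
+ // added sanity check: each value is yielded at most once,
+ // no matter how many times it gets inserted
+ #[test]
+ fn yields_each_value_once() {
+ let mut queue = OnceQueue::new();
+ assert!(queue.insert(1u32));
+ assert!(!queue.insert(1u32)); // duplicate is refused
+ assert!(queue.insert(2u32));
+ assert_eq!(queue.remove(), Some(1));
+ // re-inserting an already-removed value is still refused:
+ // `seen` remembers it
+ assert!(!queue.insert(1u32));
+ assert_eq!(queue.remove(), Some(2));
+ assert_eq!(queue.remove(), None);
+ }
+}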
+
+#[derive(Default, Clone)]
+struct WalkOptions {
+ rename_similarity_threshold: Option<f32>,
+ follow_strategy: FollowStrategy,
+ stop_when_path_disappears: bool,
+}
+
+#[derive(PartialEq, Default, Clone)]
+pub enum FollowStrategy {
+ #[default]
+ Default,
+ AllParents,
+ FirstParent,
+}
+
+trait PathCommitVisitor {
+ fn visit(
+ &mut self,
+ commit: Commit<'_>,
+ path: &Path,
+ previous: Option<ObjectId>,
+ current: Option<ObjectId>,
+ ) -> bool;
+}
+
+impl<F> PathCommitVisitor for F
+where
+ F: FnMut(Commit<'_>, &Path, Option<ObjectId>, Option<ObjectId>) -> bool,
+{
+ fn visit(
+ &mut self,
+ commit: Commit<'_>,
+ path: &Path,
+ previous: Option<ObjectId>,
+ current: Option<ObjectId>,
+ ) -> bool {
+ (self)(commit, path, previous, current)
+ }
+}
+
+#[derive(Debug, Clone)]
+pub struct Entry<'a> {
+ pub mode: EntryMode,
+ pub name: &'a [u8],
+ pub id: ObjectId,
+}
+
+impl<'a> Entry<'a> {
+ fn to_owned(&self) -> OwnedEntry {
+ OwnedEntry {
+ mode: self.mode,
+ id: self.id,
+ name: String::from_utf8_lossy(self.name).into_owned(),
+ }
+ }
+}
+
+#[derive(Debug)]
+struct OwnedEntry {
+ mode: EntryMode,
+ name: String,
+ id: ObjectId,
+}
+
+impl OwnedEntry {
+ fn as_entry(&self) -> Entry<'_> {
+ Entry {
+ mode: self.mode,
+ id: self.id,
+ name: self.name.as_bytes(),
+ }
+ }
+}
+
+fn list_tree<V>(tree: Tree<'_>, mut visitor: V) -> Result<()>
+where
+ // F(entry: Entry<'_>)
+ V: FnMut(Entry<'_>),
+{
+ for maybe_entry in tree.iter() {
+ let entry =
+ maybe_entry.map_err(|e| wrap_err(format!("listing items for tree {}", tree.id), e))?;
+ visitor(Entry {
+ mode: entry.mode(),
+ name: entry.filename(),
+ id: entry.object_id(),
+ });
+ }
+ Ok(())
+}
+
+fn commit_tree<'a>(commit: &Commit<'a>) -> Result<Tree<'a>> {
+ commit
+ .tree()
+ .map_err(|e| wrap_err(format!("reading tree for commit {}", commit.id), e).into())
+}
+
+fn find_path<'a, P: AsRef<Path>>(
+ path: P,
+ tree: &Tree<'a>,
+ buf: &mut Vec<u8>,
+) -> Result<Option<tree::Entry<'a>>> {
+ match tree.lookup_entry_by_path(&path, buf) {
+ Ok(found) => Ok(found),
+ Err(gix::object::find::existing::Error::NotFound { oid }) => {
+ Err(Error::ObjectNotFound(oid))
+ }
+ Err(e) => Err(wrap_err(
+ format!("looking for path {:?} at tree {}", path.as_ref(), tree.id),
+ e,
+ )
+ .into()),
+ }
+}
+
+fn find_entry<P: AsRef<Path>>(
+ path: P,
+ tree: &Tree<'_>,
+ buf: &mut Vec<u8>,
+) -> Result<Option<(ObjectId, EntryKind)>> {
+ if path.as_ref().as_os_str().is_empty() {
+ Ok(Some((tree.id, EntryKind::Tree)))
+ } else {
+ Ok(find_path(path, tree, buf)?.map(|e| (e.object_id(), e.mode().kind())))
+ }
+}
+
+#[derive(Debug)]
+pub enum RefKind<'a> {
+ PlainTag { name: &'a [u8], head: ObjectId },
+ Tag { tag: TagRef<'a>, head: ObjectId },
+ Branch { name: &'a [u8], head: ObjectId },
+}
+
+pub mod config {
+
+ pub use gix::config::parse::Error;
+ use gix::config::parse::{from_bytes, Event};
+
+ trait EntryVisitor {
+ fn visit(
+ &mut self,
+ section: &str,
+ subsection: Option<&str>,
+ key: &str,
+ value: &[u8],
+ ) -> bool;
+ }
+
+ pub fn parse<V>(data: &[u8], mut visitor: V) -> Result<(), Error>
+ where
+ // F(section, subsection, key, value) -> continue_parsing
+ V: FnMut(&str, Option<&str>, &str, &[u8]) -> bool,
+ {
+ let mut stop = false;
+
+ let mut section = Default::default();
+ // not a plain option so i can reuse the buffer
+ let mut subsection = Default::default();
+ let mut subsection_is_set = false;
+
+ // XXX section names are verified to be alphanumeric and their
+ // container implements AsRef<str>, but the accessor erases it
+ // by yielding &BStr instead of &Name (or &str)
+ macro_rules! assume_str {
+ ($bytes:expr) => {
+ std::str::from_utf8($bytes).unwrap_or_default()
+ };
+ }
+
+ let mut key = String::default();
+ let mut partial_value = Vec::new();
+
+ from_bytes(data, &mut |event| {
+ if stop {
+ return;
+ }
+
+ match event {
+ Event::SectionHeader(head) => {
+ head.name().clone_into(&mut section);
+ subsection_is_set = head.subsection_name().is_some_and(|sub| {
+ sub.clone_into(&mut subsection);
+ true
+ });
+ }
+ Event::SectionKey(section_key) => {
+ section_key.as_ref().clone_into(&mut key);
+ }
+ Event::Value(value) => {
+ let mut sub = None;
+ if subsection_is_set {
+ sub = Some(assume_str!(&subsection));
+ }
+ stop = !visitor(assume_str!(&section), sub, &key, &value);
+ }
+ Event::ValueNotDone(part) => {
+ partial_value.extend_from_slice(&part);
+ }
+ Event::ValueDone(part) => {
+ partial_value.extend_from_slice(&part);
+
+ let mut sub = None;
+ if subsection_is_set {
+ sub = Some(assume_str!(&subsection));
+ }
+ stop = !visitor(assume_str!(&section), sub, &key, &partial_value);
+
+ partial_value.clear();
+ }
+ _ => (),
+ }
+ })?;
+
+ Ok(())
+ }
+
+ #[cfg(test)]
+ mod tests {
+ use super::*;
+
+ #[test]
+ fn parse_ok() {
+ let input = "
+[urso]
+ setting = hello \
+ world
+ another= HEAD
+[many \"a\"]
+name = first
+
+[many \"b\"]
+name = second
+";
+ let mut expects = vec![
+ ("urso", None, "setting", "hello world"),
+ ("urso", None, "another", "HEAD"),
+ ("many", Some("a"), "name", "first"),
+ ("many", Some("b"), "name", "second"),
+ ];
+
+ let visitor = |section: &str, sub: Option<&str>, key: &str, value: &[u8]| -> bool {
+ let wanted = expects.remove(0);
+ assert_eq!(wanted.0, section, "wrong section");
+ assert_eq!(wanted.1, sub, "wrong subsection");
+ assert_eq!(wanted.2, key, "wrong key");
+ assert_eq!(
+ wanted.3,
+ std::str::from_utf8(value).expect("valid utf8"),
+ "wrong value"
+ );
+ true
+ };
+
+ parse(input.as_bytes(), visitor).expect("no error parsing");
+
+ assert!(expects.is_empty());
+ }
+ }
+}
Created urso/src/mime.rs
+// mime guessing here assumes that version-controlled
+// stuff is text: if it can't guess based on the filename
+// it will sniff the contents (if available) and assume
+// plain text when nothing else matches
+// maybe this will bite me one day...
+
+use std::path::{Path, PathBuf};
+
+#[derive(Debug, Clone, PartialEq)]
+pub struct File {
+ pub path: PathBuf,
+ pub mime: &'static str,
+}
+
+pub(crate) const BINARY: &str = "application/octet-stream";
+
+pub(crate) const TEXT: &str = "text/plain";
+
+pub(crate) fn guess_from_path<P: AsRef<Path>>(path: P) -> Option<(&'static str, bool)> {
+ let Some(filename) = path.as_ref().file_name() else {
+ // shouldn't happen eh
+ return Some((BINARY, false));
+ };
+
+ // These use extensions as if they were some cute little
+ // pointless thing people attach to names
+ if filename == "go.mod" || filename == "go.sum" || filename == "Cargo.lock" {
+ return Some((TEXT, true));
+ }
+
+ // These extensions are mapped incorrectly
+ if let Some(ext) = path.as_ref().extension() {
+ // TODO fix mime_guess: maps to octet-stream
+ if ext == "java" {
+ return Some(("text/x-java", true));
+ }
+
+ // mime_guess has nothing
+ if ext == "go" {
+ return Some((TEXT, true));
+ }
+ }
+
+ if let Some(mime) = mime_guess::from_path(path.as_ref()).first_raw() {
+ return Some((mime, is_text(mime)));
+ }
+
+ // dotfiles (assuming utf8 encoding) are text
+ if filename
+ .as_encoded_bytes()
+ .first()
+ .is_some_and(|first_byte| *first_byte == b'.')
+ {
+ return Some((TEXT, true));
+ }
+
+ if filename == "CONTRIBUTING"
+ || filename == "COPYING"
+ || filename == "INSTALL"
+ || filename == "LICENSE"
+ || filename == "README"
+ || filename == "AUTHORS"
+ || filename == "readme"
+ || filename == "Makefile"
+ || filename == "configure"
+ || filename == "Dockerfile"
+ {
+ return Some((TEXT, true));
+ }
+
+ None
+}
+
+pub(crate) fn guess_from_data(data: &[u8]) -> (&'static str, bool) {
+ infer::get(data).map_or((TEXT, true), |m| (m.mime_type(), is_text(m.mime_type())))
+}
+
+pub(crate) fn guess<P: AsRef<Path>>(path: P, data: &[u8]) -> (&'static str, bool) {
+ guess_from_path(&path).unwrap_or_else(|| {
+ if !data.is_empty() {
+ tracing::trace!(
+ path = tracing::field::debug(path.as_ref()),
+ "had to sniff bytes to infer mime"
+ );
+ }
+ guess_from_data(data)
+ })
+}
+
+fn is_text(mime: &'static str) -> bool {
+ let (mime, _param) = mime.split_once(';').unwrap_or((mime, ""));
+ let (mime, suffix) = mime.split_once('+').unwrap_or((mime, ""));
+ let Some((mtype, subtype)) = mime.split_once('/') else {
+ return false;
+ };
+
+ if mtype == "text" {
+ return true;
+ }
+
+ if mtype != "application" {
+ return false;
+ }
+
+ if matches!(suffix, "xml" | "json") {
+ return true;
+ }
+
+ if matches!(
+ subtype,
+ "javascript"
+ | "json"
+ | "srt"
+ | "t"
+ | "tk"
+ | "xml"
+ | "x-sh"
+ | "x-tcl"
+ | "x-tex"
+ | "x-texinfo"
+ ) {
+ return true;
+ }
+
+ false
+}
+
+impl File {
+ pub(crate) fn plain(path: PathBuf, mime: &'static str) -> Self {
+ Self { path, mime }
+ }
+
+ pub(crate) fn new(path: PathBuf, data: &[u8]) -> Self {
+ let (mime, _) = guess(&path, data);
+ Self::plain(path, mime)
+ }
+}
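+
+#[cfg(test)]
+mod tests {
+ use super::is_text;
+
+ // added example: a few spot checks of the classification above;
+ // the inputs are arbitrary samples, not an exhaustive contract
+ #[test]
+ fn is_text_spot_checks() {
+ assert!(is_text("text/plain"));
+ assert!(is_text("text/html; charset=utf-8"));
+ assert!(is_text("application/json"));
+ assert!(is_text("application/xhtml+xml"));
+ assert!(!is_text("application/octet-stream"));
+ // the +suffix check only applies to application/*
+ assert!(!is_text("image/svg+xml"));
+ assert!(!is_text("nonsense"));
+ }
+}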
Created urso/src/rename/mod.rs
+use gix::{
+ bstr::{BStr, ByteSlice},
+ object::tree::diff::{change::Event, Action},
+ ObjectId, Tree,
+};
+
+use std::path::{Path, PathBuf};
+
+use crate::error::{wrap_err, WrappedError};
+
+pub(crate) enum RenameError<E> {
+ Repo(E),
+ Wrapped(WrappedError),
+}
+
+impl<E> From<WrappedError> for RenameError<E> {
+ fn from(value: WrappedError) -> Self {
+ RenameError::Wrapped(value)
+ }
+}
+
+// Find out if `path` was created via renaming a path
+// from the `parent` tree
+//
+// Takes a path that exists in the current tree and does
+// NOT exist in the parent
+//
+// Yields back the path and object id of a blob in the
+// parent tree that is most similar to the one pointed
+// at by the input path
+pub(crate) fn find_rename<R, E, P>(
+ repo: R,
+ _path: P,
+ // the path at the current tree points at the blob id
+ id: ObjectId,
+ current: &Tree<'_>,
+ parent: &Tree<'_>,
+) -> Result<Option<(PathBuf, ObjectId)>, RenameError<E>>
+where
+ R: Repo<Error = E>,
+ P: AsRef<Path>,
+{
+ #[cfg(debug_assertions)]
+ {
+ let mut buf = Vec::new();
+ assert!(
+ current
+ .lookup_entry_by_path(&_path, &mut buf)
+ .expect("entry lookup via path works")
+ .is_some(),
+ "precondition failure: given path {:?} does not exist in current tree {}",
+ _path.as_ref(),
+ current.id
+ );
+
+ assert!(
+ parent
+ .lookup_entry_by_path(&_path, &mut buf)
+ .expect("entry lookup via path works")
+ .is_none(),
+ "precondition failure: given path {:?} exists in parent tree {}",
+ _path.as_ref(),
+ parent.id
+ );
+ }
+
+ let mut candidates = Vec::new();
+ let target_header = repo.get_header(id).map_err(RenameError::Repo)?;
+
+ // stop conditions:
+ // - error
+ // - identity found
+ let mut map_err = None;
+ let mut by_identity = None;
+
+ map_deleted_blobs(current, parent, |location, id| -> bool {
+ if target_header.id == id {
+ // trivial rename: path changed, blob content didn't
+ by_identity = Some(gix::path::from_bstr(location).into_owned());
+ return false;
+ }
+ match repo.get_header(id) {
+ Ok(header) => {
+ // pruning candidates set:
+ // since similarity is based on the byte length of the
+ // diff, the total sizes can be used to figure out
+ // if there's even a chance they'll be similar enough
+ let max = target_header.size.max(header.size) as f32;
+ let delta = target_header.size.abs_diff(header.size) as f32;
+ if max * repo.min_similarity() >= delta {
+ // XXX should sniff the bytes prior to mime-based filtering
+ let path = location.to_path_lossy();
+ let (guessed, is_text) =
+ crate::mime::guess_from_path(&path).unwrap_or((crate::mime::BINARY, false));
+ if is_text {
+ candidates.push((path.into_owned(), header));
+ } else {
+ tracing::trace!(
+ path = tracing::field::debug(location),
+ mime = guessed,
+ "skipped: not text"
+ );
+ }
+ } else {
+ tracing::trace!(
+ path = tracing::field::debug(location),
+ "skipped: size difference too large"
+ );
+ }
+ true
+ }
+ Err(e) => {
+ map_err = Some(e);
+ false
+ }
+ }
+ })?;
+
+ if let Some(err) = map_err {
+ return Err(RenameError::Repo(err));
+ }
+
+ if let Some(found) = by_identity {
+ return Ok(Some((found, id)));
+ }
+
+ if candidates.is_empty() {
+ return Ok(None);
+ }
+
+    // The set of candidates is potentially massive and
+    // checking the similarity of every single one would
+    // be too costly, so use a heuristic that looks at
+    // the most likely candidates first and stops as soon
+    // as it finds one that's similar enough.
+    //
+    // The worst case is when a rename can't be found.
+    //
+    // Since the byte delta is the criteria, prioritize
+    // blobs with a length similar to the target's.
+    // Break ties by blob id, in order to guarantee
+    // output stability
+ candidates.sort_unstable_by(|a, b| {
+ a.1.size
+ .abs_diff(target_header.size)
+ .cmp(&b.1.size.abs_diff(target_header.size))
+ .then_with(|| a.1.id.cmp(&b.1.id))
+ });
+ tracing::trace!(
+ candidates = tracing::field::debug(&candidates),
+ "rename candidates"
+ );
+
+ for (path, header) in candidates {
+ let similarity = repo.similarity(id, header.id).map_err(RenameError::Repo)?;
+ if similarity >= repo.min_similarity() {
+ return Ok(Some((path, header.id)));
+ } else {
+ tracing::trace!(
+ path = tracing::field::debug(&path),
+ threshold = repo.min_similarity(),
+ similarity,
+ "not similar enough"
+ );
+ }
+ }
+
+ Ok(None)
+}
+
+pub(crate) trait Repo {
+ type Error;
+
+ // FIXME ensure within 0..=1
+ fn min_similarity(&self) -> f32;
+
+ fn get_header(&self, id: ObjectId) -> std::result::Result<crate::diff::Header, Self::Error>;
+
+ fn similarity(&self, prev: ObjectId, cur: ObjectId) -> std::result::Result<f32, Self::Error>;
+}
+
+// maps every blob that would be deleted when transforming
+// `parent` into `current`, invoking `cb` with each deleted
+// blob's location and id; `cb` returns `false` to cancel the walk
+fn map_deleted_blobs<F>(
+ current: &Tree<'_>,
+ parent: &Tree<'_>,
+ mut cb: F,
+) -> Result<(), WrappedError>
+where
+ F: FnMut(&BStr, ObjectId) -> bool,
+{
+ let outcome = parent
+ .changes()
+ .map_err(|e| {
+ wrap_err(
+ format!("preparing to diff tree {} vs {}", parent.id, current.id,),
+ e,
+ )
+ })?
+ .track_path()
+ .track_rewrites(None)
+ .for_each_to_obtain_tree(current, |change| -> std::result::Result<_, WrappedError> {
+ if let Event::Deletion { entry_mode, id } = change.event {
+ if entry_mode.is_blob() && !cb(change.location, id.detach()) {
+ return Ok(Action::Cancel);
+ }
+            }
+ Ok(Action::Continue)
+ });
+
+ match outcome {
+ // If comparing the trees yields no error or gets cancelled
+ // manually, everything went fine
+ Ok(_)
+ | Err(gix::object::tree::diff::for_each::Error::Diff(
+ gix::diff::tree::changes::Error::Cancelled,
+ )) => Ok(()),
+ Err(e) => Err(wrap_err(
+ format!(
+ "walking diff between trees {} and {}",
+ current.id, parent.id
+ ),
+ e,
+ )),
+ }
+}