Post · bonfire.cafe

@sarahjamielewis@mastodon.social · 5 months ago

Let's say I have a collection of documents (pdf or images etc.).

For each document I have the original, for some I have transcribed text (and for some of those an additional layer featuring translated text if the original was non-English).

On top of those I have my own semi-structured notes regarding the document (context, source, links to other documents, meaning).

Does there exist a (foss) file format/program that I can use to view/structure/store/edit those various semantic layers?

🍒🌳 Hartmut Goebel

@kirschwipfel@nerdculture.de · 5 months ago

@ctietze Isn't this a #Zettelkasten thing?
@sarahjamielewis

David Moles

@chronodm@glammr.us · 5 months ago

@sarahjamielewis @brainwane there are definitely best practices / metadata standards / file formats / etc. for this in the library & archives world, but user-friendly tools not so much 😕

Aaron Brick — אהרן בריק

@aarbrk@mstdn.mx · 5 months ago

@sarahjamielewis I can only suggest a file format, not software: the one defined by the Text Encoding Initiative. TEI-XML is a markup language for archiving and editing that structurally encodes all kinds of annotations, e.g. transcriptions, renditions, descriptions, corrections, standardizations, image locations.... Institutions like the TEI standard because it is rigorous and comprehensive. I hope this helps.

Daniel Blake

@Daniel_Blake@mastodon.top · 5 months ago

@sarahjamielewis https://anytype.io/

clew

@clew@ecoevo.social · 5 months ago

I have heard someone discussing how to set up a database to track very technical artworks — they wanted high resolution photos as a “fixed” layer, like your scans, referring to physical objects; and then there were several different things ABOUT the works each of which might have several stages. (How it was made; artistic analysis; business analysis).

It seemed like they had to “roll their own”, but with only a few database concepts that was quite doable.

@sarahjamielewis
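
The "few database concepts" mentioned above could look something like the following. This is only a minimal sketch using Python's built-in `sqlite3` module; the table names, column names and helper functions are all assumptions made for illustration, not anything taken from the conversation described. It keeps one row per document and one row per layer (original, transcription, translation, notes), so a fixed scan and the things written about it stay separate.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative assumption.
SCHEMA = """
CREATE TABLE document (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE layer (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    kind        TEXT NOT NULL,  -- e.g. 'original', 'transcription', 'translation', 'notes'
    language    TEXT,           -- NULL when not applicable (e.g. a scan)
    content     TEXT NOT NULL   -- the text itself, or a file path for scans
);
"""


def open_corpus(path=":memory:"):
    """Open (or create) a corpus database and set up the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, title):
    """Insert a document and return its id."""
    cur = conn.execute("INSERT INTO document (title) VALUES (?)", (title,))
    return cur.lastrowid


def add_layer(conn, document_id, kind, content, language=None):
    """Attach one semantic layer to a document and return the layer id."""
    cur = conn.execute(
        "INSERT INTO layer (document_id, kind, language, content) VALUES (?, ?, ?, ?)",
        (document_id, kind, language, content),
    )
    return cur.lastrowid


def layers_for(conn, document_id):
    """Return (kind, language, content) for every layer of a document, in insertion order."""
    rows = conn.execute(
        "SELECT kind, language, content FROM layer WHERE document_id = ? ORDER BY id",
        (document_id,),
    )
    return rows.fetchall()
```

Calling `add_layer` again with a different `kind` for the same `document_id` adds another layer without touching the original, which is roughly the "fixed layer plus several things about it" arrangement.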

clew

@clew@ecoevo.social · 5 months ago

I just remembered what this was -- it was a conversation with Chihuly who happened to be standing in a film festival line next to some database nerds in the 1990s in Seattle. Can't get much more 1990s Seattle than that.

You could ask the Chihuly studio what their knowledge management system currently is!

@sarahjamielewis

🂡

@bnlandor@mastodon.social · 5 months ago

@sarahjamielewis I'd use Obsidian.md for that, although the core itself is not foss

Sarah Jamie Lewis

@sarahjamielewis@mastodon.social · 5 months ago

As far as I can tell the closest is various "pdf annotation" solutions which are limited to directly marking up a pdf file with notes, which is strictly not what I want.

In an ideal world, I would like to:

1. Load up a corpus
2. Navigate to a specific document
3. Review various semantic layers e.g. the original, transcription, translations, context etc.
4. Edit any of these layers (maybe I find a better quality scan of the original, maybe I take the time to rederive some handwritten scrawl).

Sarah Jamie Lewis

@sarahjamielewis@mastodon.social · 5 months ago

Some additional capabilities I would like:

5. Be able to link documents together to form a semantic thread/collection (e.g. this document is a direct response to this one / these documents are all about one long-term project)

6. Write a new document with linked references to any other document/sub-document/layer/collection
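
Points 5 and 6 above could be modelled roughly as follows. This is a hedged sketch in Python, not an existing tool or format: `Ref`, `Link`, `Corpus` and the `"responds-to"` relation name are all hypothetical. A `Ref` can point at a whole document, at one layer of it, or at a page range, and links between refs are what form a thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Ref:
    """Points at a whole document, one of its layers, or a page range (all names assumed)."""
    document: str
    layer: Optional[str] = None
    pages: Optional[Tuple[int, int]] = None  # inclusive (first, last) page


@dataclass(frozen=True)
class Link:
    source: Ref
    target: Ref
    relation: str  # e.g. "responds-to"


@dataclass
class Corpus:
    links: List[Link] = field(default_factory=list)
    collections: Dict[str, Set[str]] = field(default_factory=dict)

    def link(self, source: Ref, target: Ref, relation: str) -> None:
        """Record a typed link from one reference to another."""
        self.links.append(Link(source, target, relation))

    def collect(self, name: str, document: str) -> None:
        """Add a document to a named collection; a document may be in many."""
        self.collections.setdefault(name, set()).add(document)

    def thread(self, start: str, relation: str) -> List[str]:
        """Follow `relation` links document to document, starting at `start`."""
        seen = [start]
        current = start
        while True:
            nxt = next(
                (l.target.document for l in self.links
                 if l.source.document == current
                 and l.relation == relation
                 and l.target.document not in seen),
                None,
            )
            if nxt is None:
                return seen
            seen.append(nxt)
            current = nxt
```

For example, linking `Ref("reply-2")` to `Ref("reply-1")`, and `Ref("reply-1")` to `Ref("letter", layer="translation", pages=(2, 4))`, both with `"responds-to"`, makes `thread("reply-2", "responds-to")` walk from the latest reply back to the original letter.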

das-g

@das_g@chaos.social · 5 months ago

@sarahjamielewis Maybe enquire with the people behind https://impresso-project.ch

I can very well imagine that it covers at least some of those aspects, but it's quite unclear to me what Impresso does (or will eventually) and what it doesn't (and won't ever) entail.

nadja

@dequbed@mastodon.chaosfield.at · 5 months ago

@sarahjamielewis So Zotero (a reference management tool) can do at least parts of that. You can construct documents from different parts that all still represent the same document in some way, and that's explicitly intended so you can have like scans, transcripts and such. It doesn't really order these parts into 'layers', but you can tag parts so e.g. all transcriptions are tagged as such, maybe solving for what you need anyway?

Sarah Jamie Lewis

@sarahjamielewis@mastodon.social · 5 months ago

@dequbed

I've used Zotero before and just tried it out again, it's almost usable but has two big (and related) blockers:

1. Annotations seem to be the only way to add context to sections / extract sections into new sub-documents, and are strictly limited to sub-page level - I need the ability to add annotations across multiple pages / extract multiple pages into a cohesive sub-document.

2. As far as I can tell extracting annotations into new notes is an all-or-nothing thing.

nadja

@dequbed@mastodon.chaosfield.at · 5 months ago

@sarahjamielewis it also allows you to easily order documents into bigger collections, and a document can be in any number of collections. It doesn't have a built-in editor though, but it keeps the files that make up the parts of the document in your filesystem hierarchy, so you can easily edit them with an external editor

Lemmus

@Lemmus@social.vivaldi.net · 5 months ago

@sarahjamielewis EPUB has a defined annotation format, and ODF has everything and the kitchen sink. LibreOffice supports annotations and versions, but I doubt it supports multiple languages.

Kat

@KatS@chaosfem.tw · 5 months ago

@sarahjamielewis Not that I know of, but I'm interested in the answer.