<!doctype html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <link rel="stylesheet" href="/css/layout.css?1718759055581"> <link rel="stylesheet" href="/css/nav.css?1718759055581"> <link rel="stylesheet" href="/css/style.css?1718759055581"> <link rel="stylesheet" href="/css/prismjs@1.24-themes-prism.css"> <title>IPLD ♦ The Brief Primer</title> </head> <body> <header> <div class="sidebar-button" onclick="document.getElementById('sidebar').classList.toggle('sidebar-open')"> <svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" role="img" viewBox="0 0 448 512" class="icon"> <path fill="currentColor" d="M436 124H12c-6.627 0-12-5.373-12-12V80c0-6.627 5.373-12 12-12h424c6.627 0 12 5.373 12 12v32c0 6.627-5.373 12-12 12zm0 160H12c-6.627 0-12-5.373-12-12v-32c0-6.627 5.373-12 12-12h424c6.627 0 12 5.373 12 12v32c0 6.627-5.373 12-12 12zm0 160H12c-6.627 0-12-5.373-12-12v-32c0-6.627 5.373-12 12-12h424c6.627 0 12 5.373 12 12v32c0 6.627-5.373 12-12 12z"></path> </svg> </div> <a href="/" class="logo">IPLD</a> </header> <aside id=sidebar> <nav> <ul> <li> <a href="/docs/">Docs</a><ul> <li> <a href="/docs/intro/">Intro</a><ul> <li> <a href="/docs/intro/hello-world/">Hello, World</a></li> <li class="active-page"> <a href="/docs/intro/primer/">The Brief Primer</a></li> <li> <a href="/docs/intro/ecosystem/">InterPlanetary Ecosystem Overview</a></li> <li> <a href="/docs/intro/community/">Finding Community</a></li></ul></li> <li> <a href="/docs/motivation/">Motivation</a><ul> <li> <a href="/docs/motivation/benefits-of-content-addressing/">Benefits of Content Addressing</a></li> <li> <a href="/docs/motivation/data-to-data-structures/">From Data to Data Structures</a></li></ul></li> <li> <a href="/docs/codecs/">Codecs</a><ul> <li> <a 
href="/docs/codecs/known/">Known Codecs</a><ul> <li> <a href="/docs/codecs/known/dag-cbor/">DAG-CBOR</a></li> <li> <a href="/docs/codecs/known/dag-json/">DAG-JSON</a></li> <li> <a href="/docs/codecs/known/dag-pb/">DAG-PB</a></li></ul></li></ul></li> <li> <a href="/docs/data-model/">Data Model</a><ul> <li> <a href="/docs/data-model/node/">Nodes</a></li> <li> <a href="/docs/data-model/kinds/">Kinds</a></li> <li> <a href="/docs/data-model/pathing/">Pathing</a></li> <li> <a href="/docs/data-model/traversal/">Traversal</a></li></ul></li> <li> <a href="/docs/advanced-data-layouts/">Advanced Data Layouts</a><ul> <li> <a href="/docs/advanced-data-layouts/intro/">Intro to ADLs</a></li> <li> <a href="/docs/advanced-data-layouts/naming/">ADL Naming</a></li> <li> <a href="/docs/advanced-data-layouts/signalling/">Signalling ADLs</a></li> <li> <a href="/docs/advanced-data-layouts/dynamic-loading/">Dynamic Loading</a></li> <li> <a href="/docs/advanced-data-layouts/known/">Known ADLs</a></li></ul></li> <li> <a href="/docs/schemas/">Schemas</a><ul> <li> <a href="/docs/schemas/intro/">Introduction</a><ul> <li> <a href="/docs/schemas/intro/compare/">compare</a></li> <li> <a href="/docs/schemas/intro/goals/">Goals</a></li> <li> <a href="/docs/schemas/intro/feature-summary/">Feature Summary</a></li></ul></li> <li> <a href="/docs/schemas/features/">Features</a><ul> <li> <a href="/docs/schemas/features/typekinds/">Type Kinds</a></li> <li> <a href="/docs/schemas/features/representation-strategies/">Representation Strategies</a></li> <li> <a href="/docs/schemas/features/links/">Links</a></li> <li> <a href="/docs/schemas/features/indicating-adls/">Using ADLs in Schemas</a></li></ul></li> <li> <a href="/docs/schemas/using/">Using Wisely</a><ul> <li> <a href="/docs/schemas/using/authoring-guide/">Authoring Guide</a></li> <li> <a href="/docs/schemas/using/migrations/">Migrations</a></li></ul></li></ul></li> <li> <a href="/docs/synthesis/">Synthesis</a><ul> <li> <a 
href="/docs/synthesis/gtd/">Getting Things Done</a></li> <li> <a href="/docs/synthesis/building-in-alignment/">Building in Alignment</a></li> <li> <a href="/docs/synthesis/how-ipfs-web-gateways-work/">How IPFS Web Gateways Work</a></li> <li> <a href="/docs/synthesis/encryption/">Working With Encryption</a></li></ul></li></ul></li> <li> <a href="/specs/">Specs</a><ul> <li> <a href="/specs/about/">About the Specifications</a></li> <li> <a href="/specs/codecs/">Codecs</a><ul> <li> <a href="/specs/codecs/dag-cbor/">DAG-CBOR</a><ul> <li> <a href="/specs/codecs/dag-cbor/fixtures/">DAG-CBOR Test Fixtures</a><ul> <li> <a href="/specs/codecs/dag-cbor/fixtures/cross-codec/">cross-codec</a></li></ul></li> <li> <a href="/specs/codecs/dag-cbor/spec/">Spec</a></li></ul></li> <li> <a href="/specs/codecs/dag-cosmos/">DAG-COSMOS</a><ul> <li> <a href="/specs/codecs/dag-cosmos/basic_types/">basic_types</a></li> <li> <a href="/specs/codecs/dag-cosmos/cosmos_state/">cosmos_state</a></li> <li> <a href="/specs/codecs/dag-cosmos/crypto_types/">crypto_types</a></li> <li> <a href="/specs/codecs/dag-cosmos/tendermint_chain/">tendermint_chain</a></li> <li> <a href="/specs/codecs/dag-cosmos/typed_protobuf/">typed_protobuf</a></li></ul></li> <li> <a href="/specs/codecs/dag-eth/">DAG-ETH</a><ul> <li> <a href="/specs/codecs/dag-eth/basic_types/">basic_types</a></li> <li> <a href="/specs/codecs/dag-eth/chain/">chain</a></li> <li> <a href="/specs/codecs/dag-eth/convenience_types/">convenience_types</a></li> <li> <a href="/specs/codecs/dag-eth/state/">state</a></li></ul></li> <li> <a href="/specs/codecs/dag-jose/">DAG-JOSE</a><ul> <li> <a href="/specs/codecs/dag-jose/fixtures/">fixtures</a></li> <li> <a href="/specs/codecs/dag-jose/spec/">Spec</a></li></ul></li> <li> <a href="/specs/codecs/dag-json/">DAG-JSON</a><ul> <li> <a href="/specs/codecs/dag-json/fixtures/">DAG-JSON Test Fixtures</a><ul> <li> <a href="/specs/codecs/dag-json/fixtures/cross-codec/">cross-codec</a></li></ul></li> <li> <a 
href="/specs/codecs/dag-json/spec/">Spec</a></li></ul></li> <li> <a href="/specs/codecs/dag-pb/">DAG-PB</a><ul> <li> <a href="/specs/codecs/dag-pb/fixtures/">DAG-PB Test Fixtures</a><ul> <li> <a href="/specs/codecs/dag-pb/fixtures/cross-codec/">cross-codec</a></li></ul></li> <li> <a href="/specs/codecs/dag-pb/spec/">Spec</a></li></ul></li></ul></li> <li> <a href="/specs/advanced-data-layouts/">Advanced Data Layouts</a><ul> <li> <a href="/specs/advanced-data-layouts/fbl/">FBL ADL</a><ul> <li> <a href="/specs/advanced-data-layouts/fbl/spec/">spec</a></li></ul></li> <li> <a href="/specs/advanced-data-layouts/hamt/">HAMT ADL</a><ul> <li> <a href="/specs/advanced-data-layouts/hamt/fixture/">HashMap (HAMT) Test Fixtures</a><ul> <li> <a href="/specs/advanced-data-layouts/hamt/fixture/alice-words/">alice-words</a></li></ul></li> <li> <a href="/specs/advanced-data-layouts/hamt/spec/">spec</a></li></ul></li></ul></li> <li> <a href="/specs/schemas/">Schemas</a><ul> <li> <a href="/specs/schemas/prelude/">prelude</a></li></ul></li> <li> <a href="/specs/transport/">Transports</a><ul> <li> <a href="/specs/transport/car/">CAR</a><ul> <li> <a href="/specs/transport/car/carv1/">CARv1 Specification</a></li> <li> <a href="/specs/transport/car/carv2/">CARv2 Specification</a></li> <li> <a href="/specs/transport/car/fixture/">CAR Test Fixtures</a><ul> <li> <a href="/specs/transport/car/fixture/carv1-basic/">carv1-basic</a></li> <li> <a href="/specs/transport/car/fixture/carv2-basic/">carv2-basic</a></li></ul></li></ul></li> <li> <a href="/specs/transport/graphsync/">Graphsync</a><ul> <li> <a href="/specs/transport/graphsync/known_extensions/">known_extensions</a></li></ul></li> <li> <a href="/specs/transport/trustless-pathing/">Trustless Pathing</a><ul> <li> <a href="/specs/transport/trustless-pathing/fixtures/">Trustless Pathing Fixtures</a><ul> <li> <a href="/specs/transport/trustless-pathing/fixtures/unixfs_20m_variety/">unixfs_20m_variety</a></li></ul></li></ul></li></ul></li> <li> 
<a href="/specs/selectors/">Selectors</a><ul> <li> <a href="/specs/selectors/fixtures/">fixtures</a><ul> <li> <a href="/specs/selectors/fixtures/selector-fixtures-1/">selector-fixtures-1</a></li> <li> <a href="/specs/selectors/fixtures/selector-fixtures-adl/">selector-fixtures-adl</a></li> <li> <a href="/specs/selectors/fixtures/selector-fixtures-recursion/">selector-fixtures-recursion</a></li></ul></li></ul></li> <li> <a href="/specs/patch/">Patch</a><ul> <li> <a href="/specs/patch/fixtures/">IPLD Patch Test Fixtures</a><ul> <li> <a href="/specs/patch/fixtures/fixtures-1/">fixtures-1</a></li></ul></li></ul></li></ul></li> <li> <a href="/libraries/">Libraries</a><ul> <li> <a href="/libraries/golang/">Golang</a></li> <li> <a href="/libraries/javascript/">JavaScript</a></li> <li> <a href="/libraries/python/">Python</a></li> <li> <a href="/libraries/rust/">Rust</a></li></ul></li> <li> <a href="/design/">Design</a><ul> <li> <a href="/design/objectives/">Objectives</a></li> <li> <a href="/design/concepts/">Concepts</a><ul> <li> <a href="/design/concepts/type-theory-glossary/">type-theory-glossary</a></li></ul></li> <li> <a href="/design/libraries/">Libraries</a><ul> <li> <a href="/design/libraries/nodes-and-kinds/">nodes-and-kinds</a></li></ul></li> <li> <a href="/design/tricky-choices/">Tricky Choices</a><ul> <li> <a href="/design/tricky-choices/dag-pb-forms-impl-and-use/">dag-pb-forms-impl-and-use</a></li> <li> <a href="/design/tricky-choices/map-key-domain/">map-key-domain</a></li> <li> <a href="/design/tricky-choices/numeric-domain/">numeric-domain</a></li> <li> <a href="/design/tricky-choices/ordering/">ordering</a></li> <li> <a href="/design/tricky-choices/string-domain/">string-domain</a></li></ul></li> <li> <a href="/design/open-research/">Open Research</a><ul> <li> <a href="/design/open-research/ADL-autoexecution/">ADL autoexecution</a></li></ul></li></ul></li> <li> <a href="/tools/">Tools</a></li> <li> <a href="/glossary/">Glossary</a></li> <li> <a 
href="/media/">Media</a></li> <li> <a href="/FAQ/">FAQ</a></li></ul> </nav> </aside> <main> <div class=content> <h1>A Terse, Quick IPLD Primer for the Engineer</h1> <div class="callout callout-info"> <blockquote> <p>Foreword: This is meant to be a <em>terse</em> primer -- the aim is just to introduce the basic terms and scope and provide a good foundation for further understanding. The reader will likely still need to seek more information from other documents, or ask questions in support channels, in order to fully learn how to use (or contribute to) IPLD. The primer should simply provide a good grounding for integrating that further information.</p> </blockquote> </div> <div class="callout callout-info"> <blockquote> <p>Key terms are in <strong>Bold Title Case</strong> to highlight them. You should be able to "ctrl-f" for these terms to find other mentions of them on the page.</p> </blockquote> </div> <div class="callout callout-info"> <blockquote> <p>Familiarity with some computer science concepts will be presumed. E.g., if the reader is unfamiliar with the concept of an "AST", it will be necessary to look that up in some other source.</p> </blockquote> </div> <h2 id="key-concepts" tabindex="-1"><a class="header-anchor" href="#key-concepts">Key Concepts</a></h2> <h3 id="core-data-model-&-codecs-&-linking" tabindex="-1"><a class="header-anchor" href="#core-data-model-&-codecs-&-linking">Core: Data Model & Codecs & Linking</a></h3> <ul> <li> <p>The IPLD <strong>Data Model</strong> is like an "AST" for data -- but without the "S"; the Data Model is independent of syntax.</p> </li> <li> <p>The IPLD <strong>Data Model</strong> looks and feels roughly like JSON -- there are maps and lists for building nested structures, and strings and booleans and so forth.
We also add a concept of byte sequences, and a concept of <strong>Link</strong>s (we'll come back to defining <strong>Linking</strong> and <strong>Link</strong>s later).</p> </li> <li> <p>IPLD data held in the <strong>Data Model</strong> can be marshalled into a serial form via a <strong>Codec</strong> -- and vice versa: serialized data can be unmarshalled to <strong>Data Model</strong> form via a <strong>Codec</strong>.</p> </li> <li> <p><strong>Codec</strong>s are pluggable. IPLD supports many codecs. Some common ones include the JSON and CBOR formats, but there are many more. The only constraint on a Codec is that it must produce and consume <strong>Data Model</strong>, converting it to and from the codec's serial format.</p> <ul> <li>If you're thinking to yourself "wow, there are a lot of details that need specification there", you're certainly right! There's a whole writeup on that subject, "Codecs and Completeness", in <a href="https://gist.github.com/warpfork/28f93bee7184a708223274583109f31c">another document</a>.</li> </ul> </li> <li> <p>It is possible to take serialized data (e.g., a <strong>Block</strong>) and pass it through a (cryptographic) hashing function. (This isn't a new idea -- IPLD didn't invent hashing; we're just identifying an important fact here.) The resulting "hash" value is useful as a shorthand identifier for the full data.</p> </li> <li> <p><strong>Linking</strong> in IPLD is done using hashes of data. A <strong>Link</strong> points to a <strong>Block</strong>. (More on <strong>Block</strong>s in the next section.)</p> </li> <li> <p>Because <strong>Linking</strong> is based on cryptographic hashes, graphs (DAGs, more specifically) of data of unlimited size can be built, and can easily reference other subgraphs.
Unlike URLs or other forms of "linking", this works without any central name registration authority. That is useful because it means anyone can create, consume, and traverse linked data without coordination: it works totally offline.</p> </li> <li> <p><strong>Linking</strong> in IPLD is implemented using a spec called <strong>CID</strong> (short for <strong>C</strong>ontent <strong>ID</strong>entifier). CIDs contain both the hash of the target data and an indicator of which <strong>Codec</strong> to use when parsing it, meaning the target data can be turned directly into <strong>Data Model</strong> (!).</p> </li> <li> <p>Now we are ready to produce a thesis statement about the purpose of IPLD:</p> <ul> <li>Given the IPLD libraries and specs, it should be possible to produce a new system "like git" (in that it's content-addressed, decentralized, and excellent) in an order of magnitude less time than it would otherwise take.</li> <li>IPLD takes all of the <em>incidental</em> choices that must be made (but don't "matter", per se), such as choice of codec, choice of hashing function, and so on, and turns them into pluggable components... so you can start developing at the <strong>Data Model</strong> level (where the important semantics are!) and pick the rest later.</li> <li>Furthermore, other people should be able to easily develop their own programs for interacting with your data: they should be able to target the <strong>Data Model</strong> and only worry about the semantics of the data. The codecs and other details should already be handled for them out of the box, so they can get down to business quickly.</li> </ul> </li> </ul> <h3 id="blocks-vs-nodes" tabindex="-1"><a class="header-anchor" href="#blocks-vs-nodes">Blocks vs Nodes</a></h3> <ul> <li> <p>A unit of serialized data is called a <strong>Block</strong> in IPLD.
Marshalling turns <strong>Data Model</strong> into a <strong>Block</strong>; unmarshalling turns a <strong>Block</strong> into <strong>Data Model</strong>.</p> </li> <li> <p>A <strong>Link</strong> targets a <strong>Block</strong>. (This is necessarily true because a <strong>Block</strong> is the unit of granularity that we hash.)</p> </li> <li> <p>The <strong>Data Model</strong> is composed of <strong>Node</strong>s -- each map is a <strong>Node</strong>; each list is a <strong>Node</strong>; each string is a <strong>Node</strong>; each boolean is a <strong>Node</strong>; each <strong>Link</strong> is a <strong>Node</strong>; etc.</p> </li> <li> <p>A <strong>Block</strong> may contain one <strong>Node</strong>, or many <strong>Node</strong>s in a tree.</p> <ul> <li>For example: <code>{"foo":"bar","baz":1}</code> has five nodes: a map node, three string nodes (two are keys in the map; one is a value), and an int node.</li> <li>You could put all of <code>{"foo":"bar","baz":1}</code> in one <strong>Block</strong>.</li> <li>You could equally well put <code>"a single long string"</code> in one <strong>Block</strong>.</li> <li>Note that a single <strong>Block</strong> cannot contain more than one unconnected tree of <strong>Node</strong>s.</li> </ul> </li> </ul> <h3 id="pathing" tabindex="-1"><a class="header-anchor" href="#pathing">Pathing</a></h3> <ul> <li> <p>A key benefit of having a standardized <strong>Data Model</strong> is that we can define <strong>Pathing</strong> over it -- a <strong>Path</strong> is a simple textual description of how to move from one node in the <strong>Data Model</strong> to another node that is a child (or grandchild, or great-grandchild, etc.) of it.</p> </li> <li> <p>A <strong>Path</strong> is composed of <strong>PathSegment</strong>s.</p> </li> <li> <p>Each <strong>PathSegment</strong> is either a key for traversing into a map, or an index for traversing into a list.</p> </li> <li> <p><strong>Path</strong>s are a 1->1 thing: they
start from one position in a DAG and get you to a single destination position. If you want something that visits many nodes in a graph, rather than having a single destination, you want <strong>Selectors</strong>.</p> </li> <li> <p><strong>Pathing</strong> in IPLD can also cross over <strong>Link</strong>s transparently -- meaning a <strong>Path</strong> can get you anywhere in a large graph of data.</p> </li> <li> <p><strong>Pathing</strong> in IPLD doesn't contain a concept of "<code>..</code>" (meaning "go up one") because that's not always a defined operation in a DAG, where a node may have more than one parent. (Also because "<code>..</code>" is a perfectly valid map key!)</p> <ul> <li>To understand how this could be ambiguous, consider a scenario where multiple <strong>Block</strong>s are connected by <strong>Link</strong>s: if the "<code>..</code>" segment stays within one <strong>Block</strong>, it's clear enough; but if it would point outside of a <strong>Block</strong>... there's no way to interpret that other than to look back at the <strong>Path</strong> you were following and chop off a segment. Therefore, one might as well resolve any such segments in the path text itself, before traversing.</li> </ul> </li> <li> <p>The combination of a <strong>CID</strong> and a <strong>Path</strong> is a context-free way to reference any information that's legible via IPLD.
(This is sometimes called a "merklepath".)</p> </li> </ul> <h3 id="advanced-interpretations" tabindex="-1"><a class="header-anchor" href="#advanced-interpretations">Advanced Interpretations</a></h3> <ul> <li> <p>There are two more "advanced" ways of interpreting information that we introduce in IPLD because of their utility: <strong>Schema</strong>s and <strong>ADL</strong>s (short for <strong>A</strong>dvanced <strong>D</strong>ata <strong>L</strong>ayout).</p> </li> <li> <p>Both of these advanced interpretation systems still present their data as <strong>Data Model</strong> -- so <strong>Pathing</strong> and all the other core IPLD concepts still apply over them!</p> </li> </ul> <h3 id="schemas" tabindex="-1"><a class="header-anchor" href="#schemas">Schemas</a></h3> <ul> <li> <p><strong>Schema</strong>s provide developer-friendly ways to describe outlines of data structures that a program wants to handle.</p> <ul> <li>This provides a useful "design language" for IPLD!</li> <li>Implicitly, a Schema also describes the errors that can be generated when attempting to handle data that doesn't match its structural outline.</li> </ul> </li> <li> <p><strong>Schema</strong>s describe two things: <strong>type</strong> information -- which is all about semantics -- and <strong>representation</strong> information -- which is all about how the information is represented at the <strong>Data Model</strong> level.</p> <ul> <li>Separating these two things means that <strong>Schema</strong>s can be used to describe anything in IPLD, even if it was made <em>without</em> use of Schemas.</li> <li>In fact, <strong>Schema</strong>s can even usefully describe data that was created <em>without IPLD at all</em>.</li> <li><strong>Schema</strong>s layer cleanly with <strong>Codec</strong>s. You can describe the semantics of data...
and pick a codec separately.</li> </ul> </li> <li> <p>The kinds of "types" that <strong>Schema</strong>s support are the same as the "kinds" in the <strong>Data Model</strong> (map, list, string, etc.)... plus a few more: "struct", "union" (aka sum types), "enum", etc.</p> </li> <li> <p><strong>Schema</strong>s support a variety of possible representations for each type.</p> <ul> <li>For example, the default representation for a "struct" type is simply a "map", where the keys are the names of the struct's fields.</li> <li>Alternatively, that "struct" could use the "tuple" representation (meaning the representation will be a "list" at the <strong>Data Model</strong> level). In this case, the struct field names aren't present in the representation at all.</li> <li>Some representation strategies contain more redundant data (meaning they're easier to "eyeball" and understand without a schema!); others lean towards higher entropy (meaning that "eyeballing" them might become impractical, and a schema might be necessary to understand them fully). We treat schemaless and schemafull systems as <em>a gradient</em>: you can choose where on the gradient you want to be (and vary that choice with each type -- e.g., one can start a large data graph with schemaless data, and then use increasingly compact/high-entropy/schemafull representations for data deeper in the graph).</li> </ul> </li> <li> <p>Every <strong>Schema</strong> type also has a defined way of being perceived as <strong>Data Model</strong> -- so you can <strong>always</strong> apply concepts like <strong>Pathing</strong> <em>over the parsed Schema data</em> just as you would over the raw data. The view will just be a little different.</p> <ul> <li>For example: you can have a struct with tuple representation, and it will act like a "map"... at the same time as its representation is a "list".
You can traverse and do <strong>Pathing</strong> over either view of the data, at your option.</li> </ul> </li> <li> <p>It's critical to note that <strong>Schema</strong>s are <em>not Turing complete</em>. In fact, they're not even close. Considerable effort has gone into keeping the computation required to decide whether data "matches" a Schema minimal: at most, it is proportional to the complexity of the schema (which in turn can be loosely approximated by its sheer textual size).</p> </li> <li> <p>Because the computation needed to determine whether a <strong>Schema</strong> "matches" data is minimal, it's possible to build <em>version detection</em> and <em>feature detection</em> by simply <em>trying several Schemas in a row</em>.</p> </li> <li> <p>For the same reason, it is not necessary to embed a reference to a <strong>Schema</strong> in a document. In fact, it's desirable <em>not</em> to: explicit versioning is fragile; feature detection allows smooth growth and natural evolution.</p> </li> <li> <p><strong>Schema</strong>s also happen to provide a very useful input for code-generation tools, which can make Very Fast code for handling the structures described by the Schema.</p> </li> </ul> <h3 id="adls" tabindex="-1"><a class="header-anchor" href="#adls">ADLs</a></h3> <ul> <li> <p><strong>ADL</strong> is short for <strong>A</strong>dvanced <strong>D</strong>ata <strong>L</strong>ayout.</p> </li> <li> <p>The purpose of an <strong>ADL</strong> is to make data legible as a <strong>Data Model</strong> <strong>Node</strong> -- but it differs from a <strong>Codec</strong> in that its raw input is <em>also</em> <strong>Data Model</strong> (though it may be one <strong>Node</strong> or <em>many</em>; the input may even span multiple <strong>Block</strong>s).</p> </li> <li> <p>It's probably easiest to understand <strong>ADL</strong>s in terms of some of the problems we've used them to solve:</p> <ul> <li>Having large
maps is important. Directories in filesystem-like applications are one example. When we want maps that might be larger than can fit reasonably well into a single <strong>Block</strong>, we use an ADL! In particular, "HAMT" is an ADL that we have made and recommend for this purpose. A HAMT splits the map data across many blocks, somewhat like a B+ tree, but presents the whole thing as if it's a single map, conforming to all the <strong>Data Model</strong> <strong>Node</strong> interfaces, so you can program against it normally.</li> <li>Having large byte sequences is important. Storing files is one example! When we want to store a byte sequence that might be larger than can fit reasonably well into a single <strong>Block</strong>, we use an ADL! In particular, "FBL" is an ADL that we have made and recommend for this purpose. An FBL can split the byte sequence across many blocks, somewhat like a B+ tree, but presents the whole thing as if it's a single contiguous byte sequence, conforming to all the <strong>Data Model</strong> <strong>Node</strong> interfaces, so you can program against it normally.</li> <li>Encrypting data is neat. We suspect <strong>ADL</strong>s are suitable for describing encryption and making it operate smoothly within IPLD, without baking any particular algorithm or construction into IPLD (which would make them harder to innovate on and change). We don't have any fully worked examples of this yet, however.</li> </ul> </li> <li> <p>In contrast to <strong>Schema</strong>s, <strong>ADL</strong>s support topological transformations of data.
Because there's no obvious way to bound the computational cost of topological transformations, we've given up on making <strong>ADL</strong>s less than Turing-complete.</p> <ul> <li>Accordingly, we've chosen to implement <strong>ADL</strong>s as a "plugin" system: they're usually written in a host language, and must be reimplemented in every IPLD library.</li> <li>In the future, some sort of "bytecode" or "IR" might be introduced to allow more portable definitions of <strong>ADL</strong>s, but no such system has been specified yet.</li> <li><strong>ADL</strong>s remain distinct from <strong>Schema</strong>s because the bounded computational costs of <strong>Schema</strong>s make them suitable for doing "feature detection" with a "try stack"; doing this with ADLs would be prohibitively unpredictable.</li> </ul> </li> <li> <p>Since <strong>ADL</strong>s take raw <strong>Data Model</strong> as their input, this raises the question of how we decide whether an ADL should be applied to some data. We call this the "signalling problem".</p> <ul> <li>One solution to the signalling problem is "Use a <strong>Schema</strong> to provide the signal".</li> <li>Other signalling mechanisms are possible. (But there aren't many examples or fully-worked specifications for them yet.)</li> </ul> </li> </ul> <h3 id="yet-more-stuff" tabindex="-1"><a class="header-anchor" href="#yet-more-stuff">Yet More Stuff</a></h3> <ul> <li> <p><strong>Selectors</strong> are a declarative format for specifying a DAG traversal: they state which parts of the graph to traverse, and provide a way to mark certain nodes as highlighted.
There is a standardized <strong>Data Model</strong> tree structure for expressing Selectors (in fact, it's specified by a <strong>Schema</strong>), and most IPLD libraries will provide a native function for evaluating them (typically this involves callbacks for visiting the "highlighted" nodes).</p> </li> <li> <p><strong>Graphsync</strong> is a protocol that builds on <strong>Selectors</strong> and aims to let two or more communicating agents exchange <strong>Block</strong>s of data efficiently: by describing which parts of the graph they're interested in exchanging, the agents can greatly reduce the number of round-trips required to communicate groups of related <strong>Block</strong>s.</p> </li> </ul> <h3 id="synthesis" tabindex="-1"><a class="header-anchor" href="#synthesis">Synthesis</a></h3> <ul> <li>It should now be clear from all this that the <strong>Data Model</strong> <strong>Node</strong> interface carries much more weight than just that of the <strong>Data Model</strong> itself: it also defines the necessary (and in many ways the possible) behavior of <strong>Schema</strong> systems and <strong>ADL</strong>s; <strong>Codec</strong>s are defined entirely in terms of it; and it is the center of all <strong>Pathing</strong> and other forms of traversal (such as <strong>Selectors</strong>).</li> </ul> <h2 id="getting-things-done-with-ipld" tabindex="-1"><a class="header-anchor" href="#getting-things-done-with-ipld">Getting Things Done with IPLD</a></h2> <p>(There's <a href="/docs/synthesis/gtd/">a fuller document</a> on this in another chapter of the docs.)</p> <p>tl;dr:</p> <ul> <li>Try to do things with the plain <strong>Data Model</strong> first.</li> <li>When you want more rigour and assistive tooling for your data structures, try <strong>Schema</strong>s.</li> <li>If you need multi-block data structures, or some other interesting interpretation of data before you treat it as <strong>Data Model</strong>, then reach for
<strong>ADL</strong>s. (But do this judiciously: your data will become incomprehensible to a client that doesn't have your ADL code. It's always better to use a common ADL than to invent a new one.)</li> <li>As a last resort, you can invent a new <strong>Codec</strong>. If you need to process binary forms of data into <strong>Data Model</strong>, and IPLD has never had a codec for this binary format before, then this is what you need. (But truly do this only as a last resort: ideally, it's only done to bridge additional legacy formats into IPLD; new projects should be able to express their logic via the <strong>Data Model</strong> first, and then <em>pick</em> a suitable codec off the shelf, without additional development effort!)</li> </ul> <h2 id="finer-points" tabindex="-1"><a class="header-anchor" href="#finer-points">Finer Points</a></h2> <ul> <li> <p>Strings are really just bytes: they can contain the same range of data (anything). However, strings carry the implication that they are human-readable.</p> <ul> <li>Strings tend to be treated differently by presentation layers, user-facing tools, and debugging representations.</li> <li>Strings and bytes are also serialized distinctly in many <strong>Codec</strong>s.</li> </ul> </li> <li> <p>Maps have a stable, defined iteration order. But it's not necessarily a <em>sorted</em> order.</p> <ul> <li>This has to be the case, because some <strong>ADL</strong>s (HAMTs, specifically) have a natural iteration order that is not sorted, and imposing any other iteration order on them would be prohibitively expensive.</li> <li>Some <strong>Codec</strong>s specify a sorted order. If they do, note that that's the codec's choice; the <strong>Data Model</strong> is more expressive, since it can persist any order.</li> </ul> </li> <li> <p>Lists are not sparse.
(Although nothing stops an ADL from implementing such a thing.)</p> </li> <li> <p>Map keys are strings.</p> <ul> <li>Not bytes. (But remember, the distinction between strings and bytes is mostly in interpretation. In essence, this statement means: map keys are treated as being printable.)</li> <li>Not integers. (Allowing them would complicate libraries significantly, would be unsupportable by many codecs, and would create a great deal of ambiguity for tools that want to make human-readable presentations of data.)</li> </ul> </li> <li> <p>A <strong>PathSegment</strong> carries no marker of whether it is a string (for map keys) or an integer (for list indexes); that is determined by the data it's applied to.</p> </li> <li> <p>Some <strong>PathSegment</strong>s require escaping when joined into a <strong>Path</strong>. <strong>Path</strong>s are usually encoded using "/" as a separator; however, "/" is also a valid map key (it's just another regular string, after all!), so escaping may be necessary for some values.</p> </li> </ul> <h2 id="more-info" tabindex="-1"><a class="header-anchor" href="#more-info">More Info</a></h2> <ul> <li>Continue with the other <a href="/docs">docs</a>!</li> <li>For more detail, go to the <a href="/specs">specs</a>!</li> <li>For information about known libraries, check out the <a href="/libraries">libraries</a> chapter!</li> </ul> </div> </main> </body> </html>