<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" > <channel> <title>Digital Library Technology Services</title> <atom:link href="https://wp.nyu.edu/library-dlts/feed/" rel="self" type="application/rss+xml" /> <link>https://wp.nyu.edu/library-dlts</link> <description>A unit of NYU Libraries and NYU IT</description> <lastBuildDate>Thu, 14 Dec 2023 19:20:21 +0000</lastBuildDate> <language>en-US</language> <sy:updatePeriod> hourly </sy:updatePeriod> <sy:updateFrequency> 1 </sy:updateFrequency> <generator>https://wordpress.org/?v=6.5.5</generator> <image> <url>https://wp.nyu.edu/library-dlts/files/2018/08/nyuflavicon-150x150.png</url> <title>Digital Library Technology Services</title> <link>https://wp.nyu.edu/library-dlts</link> <width>32</width> <height>32</height> </image> <item> <title>Spatial Data Repository: Upgrade for 2023</title> <link>https://wp.nyu.edu/library-dlts/2023/12/14/spatial-data-repository-upgrade-2023/</link> <comments>https://wp.nyu.edu/library-dlts/2023/12/14/spatial-data-repository-upgrade-2023/#respond</comments> <dc:creator><![CDATA[Marii Nyrop]]></dc:creator> <pubDate>Thu, 14 Dec 2023 19:20:21 +0000</pubDate> <category><![CDATA[News]]></category> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Digital Scholarship Services]]></category> <category><![CDATA[Discoverability]]></category> <category><![CDATA[GeoBlacklight]]></category> <category><![CDATA[infrastructure]]></category> <category><![CDATA[Ruby on Rails]]></category> <category><![CDATA[Spatial Data Repository]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2174</guid> <description><![CDATA[<div class="entry-summary"> The Spatial Data Repository service team has completed a major upgrade of NYU&#8217;s Spatial Data Repository, updating the GeoBlacklight discovery application stack, metadata workflows, authentication, and infrastructure. </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/12/14/spatial-data-repository-upgrade-2023/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Spatial Data Repository: Upgrade for 2023&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<p><a href="https://geo.nyu.edu/">NYU&#8217;s Spatial Data Repository</a> (SDR) makes geospatial data searchable, viewable in the browser, and usable for research. The SDR includes over 97,000 items ranging from <a href="https://geo.nyu.edu/catalog/nyu-2451-34620">a georeferenced 1797 map of Dublin</a> to the <a href="https://geo.nyu.edu/catalog/stanford-tj200ct4975">Ecoregions of the United States</a> and <a href="https://geo.nyu.edu/catalog/facet/gbl_resourceType_sm">much more</a>.&nbsp;</p> <figure id="attachment_2044" aria-describedby="caption-attachment-2044" style="width: 580px" class="wp-caption aligncenter"><img fetchpriority="high" decoding="async" class="wp-image-2044" src="https://wp.nyu.edu/library-dlts_dev/wp-content/uploads/sites/11676/2023/12/Screen-Shot-2023-12-11-at-3.12.24-PM.png" alt="Screenshot of a webpage from the upgraded Spatial Data Repository, showing a map from 1797 of the city of Dublin, Ireland."
width="580" height="758" /><figcaption id="caption-attachment-2044" class="wp-caption-text">Screenshot of a webpage from the upgraded Spatial Data Repository, showing a map from 1797 of the city of Dublin, Ireland.</figcaption></figure> <p>On top of connecting to other NYU services (including the <a href="https://archive.nyu.edu/">Faculty Digital Archive</a>), the SDR also shares geospatial data from 20 collaborating institutions like the <a href="https://geo.btaa.org/">Big Ten Academic Alliance</a>, <a href="https://maps.princeton.edu/">Princeton University</a>, and <a href="https://learn.scholarsportal.info/all-guides/borealis/">Scholars Portal Dataverse</a>.</p> <p>The Spatial Data Repository is supported by a collaborative service team of developers, managers, DevOps engineers, and librarians across NYU Libraries and IT. This year, the team took on the major task of upgrading the repository—a coordinated effort that involved updating the <a href="https://geoblacklight.org/">GeoBlacklight</a> discovery application stack, migrating the metadata schema and processing workflows, swapping authentication endpoints, and managing infrastructure changes (See: <a href="https://github.com/NYULibraries/spatial_data_repository/blob/main/.github/changelogs/SDRv2-0_12-2023.md#sdr-v110v20-changelog">CHANGELOG</a>).</p> <p>The project kicked off in late June, when we contracted Ruby on Rails developer Michael Cain. Since then, the team has met weekly to power through complex tasks and make the most of Michael&#8217;s invaluable time and expertise. We were also lucky to welcome NYU&#8217;s new Metadata Librarian for Science &amp; Geospatial Data, Zehong Liu, to the team in the fall.</p> <p>While there are still minor patches and features to be added, the team celebrated a milestone version 2 deployment of the SDR with GeoBlacklight v4 on November 29th.
<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f389.png" alt="🎉" class="wp-smiley" style="height: 1em; max-height: 1em;" />&nbsp;</p> <p>Thank you to:</p> <ul> <li><a href="https://www.linkedin.com/in/codetrane">Michael Cain</a> and <a href="https://library.nyu.edu/people/marii-nyrop/">Marii Nyrop</a> for development</li> <li><a href="https://library.nyu.edu/people/zehong-liu/">Zehong Liu</a>, <a href="https://library.nyu.edu/people/nicholas-wolf/">Nicholas Wolf</a>, and <a href="https://library.nyu.edu/people/alexander-whelan/">Alex Whelan</a> for metadata curation</li> <li><a href="https://library.nyu.edu/people/derrick-xu/">Derrick Xu</a> and <a href="https://library.nyu.edu/people/ekaterina-pechekhonova/">Kate Pechekhonova</a> for DevOps engineering</li> <li><a href="https://library.nyu.edu/people/michelle-thompson/">Michelle Thompson Gumbs</a> and <a href="https://library.nyu.edu/people/carol-choi/">Carol Choi</a> for QA testing</li> <li><a href="https://library.nyu.edu/people/himanshu-mistry/">Him Mistry</a>, <a href="https://library.nyu.edu/people/debbie-verhoff/">Deb Verhoff</a>, and <a href="https://library.nyu.edu/people/carol-kassel/">Carol Kassel</a> for project leadership and coordination</li> </ul> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/12/14/spatial-data-repository-upgrade-2023/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>Finding Aids Redesign: Understanding the Input</title> <link>https://wp.nyu.edu/library-dlts/2023/11/07/finding-aids-redesign-understanding-the-input/</link> <comments>https://wp.nyu.edu/library-dlts/2023/11/07/finding-aids-redesign-understanding-the-input/#respond</comments> <dc:creator><![CDATA[Joseph Pawletko]]></dc:creator> <pubDate>Tue, 07 Nov 2023 20:17:26 +0000</pubDate> <category><![CDATA[News]]></category> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Discoverability]]></category> 
<category><![CDATA[Finding Aids]]></category> <category><![CDATA[golang]]></category> <category><![CDATA[publishing]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2144</guid> <description><![CDATA[<div class="entry-summary"> DLTS, in partnership with Archival Collections Management (ACM), Special Collections, and Application Architecture and Development (AAD), has launched the redesigned Finding Aids and Finding Aids publication system. </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/11/07/finding-aids-redesign-understanding-the-input/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Finding Aids Redesign: Understanding the Input&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<div class="jgp"> <p>How do you translate more than eleven million lines of XML into modern, elegant web pages? The Finding Aids Redesign (FADESIGN) team had to answer this question as part of a multi-year effort to replace a 20-year-old web publishing process and <a href="https://wp.nyu.edu/library-dlts/2023/09/22/finding-aids-design-process">modernize the design</a> of the <a href="https://en.wikipedia.org/wiki/Finding_aid">finding aids</a> hosted by NYU Libraries. In order to develop a solution, the team had to better understand the input.</p> <p class="jgp">The eleven million lines of XML input into the finding aids publishing pipeline are not, thankfully, all in one file. Instead, the XML is distributed across more than five thousand individual Encoded Archival Description (<a href="https://www2.archivists.org/groups/technical-subcommittee-on-encoded-archival-standards-ts-eas/encoded-archival-description-ead">EAD</a>) files originating from seven organizations housing nine different archival repositories.
These XML files conform to the <a href="https://www.loc.gov/ead/eadschema.html">EAD 2002 XML schema</a>, which is incredibly flexible and supports a wide variety of archival-description styles. This flexibility is an asset when describing archival content, but becomes an impediment when developing data structures and web page layouts used to transform the EAD data into HTML. For example, per the EAD 2002 schema, the <a href="https://www.loc.gov/ead/tglib/elements/c.html">component element</a>&nbsp;(<span style="font-family: 'andale mono', monospace">&lt;c&gt;</span>) can be nested <em>ad infinitum</em>. How do you design a web page that handles potentially infinite nesting? Additionally, certain elements like <a href="https://www.loc.gov/ead/tglib/elements/runner.html">runner</a> (<span style="font-family: 'andale mono', monospace">&lt;runner&gt;</span><a href="https://www.loc.gov/ead/tglib/elements/runner.html">)</a> and <a href="https://www.loc.gov/ead/tglib/elements/imprint.html">imprint (</a><span style="font-family: 'andale mono', monospace">&lt;imprint&gt;</span><a href="https://www.loc.gov/ead/tglib/elements/imprint.html">)</a>&nbsp;are defined in the schema but are not used by our archival repositories. Therefore, we needed to find a middle ground between the flexibility supported by the schema and the practical requirements of day-to-day archival description. Enter the <a href="https://github.com/nyudlts/fadesign_29-data-model/blob/main/models.csv">data model</a>.</p> <p class="jgp">The data model is a document collaboratively developed by the archivists and the development team that specifies the subset of EAD 2002 elements and attributes that we would need to faithfully represent the archival description in NYU’s finding aids. 
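To make the nesting problem concrete, here is a hypothetical, much-simplified fragment with &lt;c&gt; components nested three levels deep (element names are from the EAD 2002 schema; the titles and levels are invented for illustration):

```xml
<dsc>
  <c level="series">
    <did><unittitle>Correspondence</unittitle></did>
    <c level="subseries">
      <did><unittitle>1950-1959</unittitle></did>
      <c level="file">
        <did><unittitle>Letters received</unittitle></did>
      </c>
    </c>
  </c>
</dsc>
```

Real finding aids nest like this to whatever depth the archivist's description requires, which is exactly what a fixed web page layout must somehow accommodate.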
The initial data model was developed from known practices and a review of data types programmatically generated by the <a href="https://github.com/miku/zek">Zek</a> tool, and was then refined by moving EADs through our nascent publication pipeline:<a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/fa2_pipeline.png"><img decoding="async" class="alignnone wp-image-2018 size-large" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/fa2_pipeline-1024x323.png" alt="Flowchart showing content moving from EAD to JSON to HTML" width="1024" height="323" /></a></p> <p>The pipeline consists of a <a href="https://go.dev">Golang</a>-based EAD exporter that collects EADs from <a href="https://archivesspace.org">ArchivesSpace</a> instances and a second Golang application that parses the incoming EAD files and generates JSON files. These JSON files and a set of Hugo template files are then input into the <a href="https://gohugo.io">Hugo static site generator</a> application to create the finding aid web pages. In order to converge on a publishing solution that met stakeholder requirements, we needed to iterate over the steps below:</p> <ol start="1"> <li>write/update Golang data types based on the data model</li> <li>parse a sample set of EADs and output JSON</li> <li>feed the JSON and Hugo templates to the Hugo static site generator application to generate finding aids</li> <li>review the finding aids, noting where changes are required to the design and the data model</li> <li>update the data model</li> </ol> <p>Although the above steps were executed over increasingly large sample sets of EADs, we still had concerns that, given the flexibility of the EAD 2002 schema, there were valid use cases that were not reflected in the data model. We needed more information about the structure and variation of schema elements in use.
Therefore, we developed a set of tools to perform analysis on the entire set of more than 5,000 EADs.</p> <p>The <a href="https://github.com/nyudlts/ead-analysis-tools">EAD Analysis Tools</a>&nbsp;are a set of minimal-functionality scripts we created to answer the following questions:</p> <ul id="questions"> <li><a href="#h.safxzjyyg8qx">How many lines are in the largest EAD?</a></li> <li><a href="#h.kdruioiud8ss">What is the maximum nesting level of each element?</a></li> <li>For every element of interest: <ul> <li><a href="#h.m080vmerto47">What <b>attributes</b> does the element have across all EADs in scope?</a></li> <li><a href="#h.m080vmerto47">What <b>child elements</b> does the element have across all EADs in scope?</a></li> <li><a href="#h.zhuge8dzkwvr">How many times does the element appear in the EAD?</a></li> </ul> </li> <li><a href="#h.m080vmerto47">Does the EAD structure vary widely across archival repositories?</a></li> <li><a href="#h.wvwrlaey3zx5">Across all EADs, what sequences of child elements are present?</a></li> <li><a href="#h.u3g1xxdnjgok">What is the set of <span style="font-family: 'andale mono', monospace">&lt;dao @xlink:role&gt;</span> values across all EADs?</a></li> <li><a href="#h.u3g1xxdnjgok">Are there EADs with <span style="font-family: 'andale mono', monospace">&lt;dao @xlink:role&gt;</span> values that are not in the controlled vocabulary?</a></li> </ul> <p>The sections below will walk you through the scripts we used to answer each of the above questions. We wrote the scripts in Ruby and Bash so that we could prototype rapidly and, in the case of the Ruby scripts, take advantage of the <a href="https://nokogiri.org/index.html">Nokogiri XML Ruby Gem</a>.</p> <h2>Conclusion:</h2> <p>Understanding the input to the finding aids publication pipeline was critical to the project&#8217;s success. 
We developed tools that helped us refine the data model, design appropriate data structures, identify test candidates for various scenarios, gather insight into how the performance of certain operations corresponded to the EAD structure, and identify critical parameters required for the web page design. The data we gathered gave us confidence that when we went to production we were not going to be surprised by a radically different set of EADs incompatible with our publishing infrastructure.&nbsp;Going forward, we can use these same analysis tools to compare EADs from new archival repositories against the data model, allowing us to catch potential issues before publication.</p> <hr /> <h2 id="h.safxzjyyg8qx">Question: How many lines are in the largest EAD?</h2> <h3>Motivation:</h3> <p>We wanted an answer to this question so that we could test our pipeline scalability and throughput. It turns out that this question is rather straightforward to answer using Bash commands and doesn&#8217;t require a stand-alone script.</p> <h3>Script: ad hoc</h3> <p>The following commands will <a href="https://en.wikipedia.org/wiki/Prettyprint">pretty-print</a> the EADs and then will count the number of lines in the pretty-printed files. We needed to pretty-print the files because XML does not require line breaks, which means that a massive, properly formatted EAD may contain only one very long line. After pretty-printing, however, the EAD line counts can be accurately assessed. For example, without pretty-printing, the file in our corpus with the greatest number of lines appears to be <span style="font-family: 'andale mono', monospace">fales/mss_130.xml</span> with 25,986 lines.
After pretty-printing, however, we find that the file with the greatest number of lines is actually <span style="font-family: 'andale mono', monospace">cbh/bcms_0002.xml</span> with 281,168 lines.</p> <p>Please note that the pretty-print command below overwrites the original EADs with the pretty-printed versions. This isn&#8217;t a problem for us because we store the EADs in a Git repository, which allows us to easily restore the EADs to their original state.</p> <h3>Sample Execution:</h3> <pre>
# pretty-print all XML files in a given directory hierarchy
# requires the xmllint command line tool, which is part of the <a href="https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home">libxml2 library</a>
for f in $(find . -type f -name '*.xml' | sort -V); do
  echo "$f"; xmllint --format $f &gt; foo &amp;&amp; mv foo $f
done

# find the ten EADs with the most lines
for f in $(find . -type f -name '*.xml'); do
  wc -l $f
done | sort -nr | head
</pre> <h3>Sample Output:</h3> <pre>
281168 ./cbh/bcms_0002.xml
252915 ./tamwag/photos_223.xml
158820 ./tamwag/photos_023.xml
151900 ./akkasah/ad_mc_007.xml
127682 ./tamwag/tam_132.xml
114794 ./fales/mss_343.xml
112833 ./tamwag/tam_326.xml
109295 ./archives/rism_vf_001.xml
101747 ./fales/mss_208.xml
 96752 ./tamwag/tam_415.xml
</pre> <h3>Takeaway:</h3> <p>Our largest EAD consists of 281,168 lines of XML.</p> <p><a href="#questions">return to questions</a></p> <hr /> <h2 id="h.kdruioiud8ss">Question: What is the maximum nesting level of each element?</h2> <h3>Motivation:</h3> <p>As mentioned above, the EAD 2002 schema allows certain elements to be nested arbitrarily deep. We needed to know how deep the element hierarchies were in practice in order to design the web pages. The <span style="font-family: 'andale mono', monospace">element-depth-analyzer.rb</span> script identifies the EADs with the deepest hierarchies for each element of interest.
This was very useful for identifying candidate EADs for development and testing, because the largest EAD does not necessarily have the deepest nesting level.</p> <h3>Script: <code>element-depth-analyzer.rb</code></h3> <p>The element-depth-analyzer.rb script is a simple Ruby script that relies on the <a href="https://nokogiri.org/rdoc/Nokogiri/XML/SAX/Parser.html">Nokogiri Gem&#8217;s SAX stream parser</a>. The script takes two arguments: the path to an EAD file to be analyzed, and the &#8220;element of interest&#8221; on which to run the depth analysis. The script simply responds to SAX parser events, incrementing a counter for each level of an element-of-interest hierarchy (including the root element). The largest counter value is saved and output at the end of the script execution.</p> <h3>Sample Execution:</h3> <pre>
# Find the ten EADs with the deepest &lt;c&gt; element hierarchies
for f in $(find . -type f -name '*.xml'); do
  bin/element-depth-analyzer.rb $f c
done | sort -nr | head
</pre> <h3>Sample Output:</h3> <pre>
# maximum depth observed, relative path to EAD
5,./vlp/mss_lapietra_001.xml
5,./tamwag/photos_097.xml
5,./tamwag/photos_019.xml
5,./nyhs/pr020_geographic_images.xml
5,./fales/mss_208.xml
5,./fales/mss_191.xml
5,./fales/mss_150.xml
5,./fales/mss_110.xml
5,./cbh/arms_2014_019_packer.xml
5,./cbh/arc_006_bergen.xml
</pre> <h3>Takeaway:</h3> <p>After running this script across the EAD corpus for each element of interest, we knew the depth of the nesting hierarchies that the design would need to accommodate, and we had identified EADs that would be useful for development and testing.</p> <p><a href="#questions">return to questions</a></p> <hr /> <h2 id="h.m080vmerto47">Question: Does the EAD structure vary widely across archival repositories?</h2> <h3>Motivation:</h3> <p>Although we believed that the data model was solid, we were concerned that there might be element attributes and parent-child element
relationships in the EAD corpus that we did not anticipate. Therefore, we developed a pair of &#8220;element-union-analysis&#8221; scripts to gather this information.</p> <h3>Scripts: <code>element-union-single.rb</code> and <code>element-union-multi.rb</code></h3> <p>The element-union-* scripts perform a recursive traversal of the parsed XML file(s). For each element type encountered, e.g., <span style="font-family: 'andale mono', monospace">bioghist</span>, <span style="font-family: 'andale mono', monospace">dao</span>, <span style="font-family: 'andale mono', monospace">archdesc</span>, a new &#8220;AnalysisNode&#8221; is created. The AnalysisNode is used to accumulate all of the attributes, child elements, nesting depth, and whether this element has siblings of the same type. All of the AnalysisNodes are stored in a hash keyed by the element name. During the traversal, the hash is queried to see if an AnalysisNode already exists for the element type. If not, a new AnalysisNode is created, otherwise the existing AnalysisNode is updated. At the end of the traversal, the scripts output all of the information in the AnalysisNodes, sorted by element name.</p> <p>Two scripts were written to perform this analysis: the <span style="font-family: 'andale mono', monospace">element-union-single.rb</span> script is used to analyze individual EAD files, whereas the element-union-multi.rb script performs a union analysis across an arbitrary number of EADs. The script output can be loaded into a spreadsheet and compared against the data model. Additionally, the data from different runs can be loaded into a spreadsheet to compare the results from different sets of EADs. 
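To make the traversal concrete, here is a minimal stand-in for the union analysis in Ruby. It uses the stdlib REXML parser rather than Nokogiri, records only attribute names, child names, same-name nesting depth, and repetition (the real scripts also record child sequences), and the names such as `union_analysis` are illustrative, not the project's code:

```ruby
require "rexml/document"
require "set"

# One record per element name, in the spirit of the element-union-* output:
# needs_array / max_depth / attributes / children
AnalysisNode = Struct.new(:needs_array, :max_depth, :attributes, :children)

def union_analysis(xml)
  nodes = Hash.new { |h, k| h[k] = AnalysisNode.new(false, 0, Set.new, Set.new) }
  visit = lambda do |el, depths| # depths[name] = number of same-named ancestors
    d = depths.fetch(el.name, 0)
    node = nodes[el.name]
    node.max_depth = [node.max_depth, d].max
    el.attributes.each_attribute { |a| node.attributes << a.name }
    children = el.elements.to_a
    children.each { |c| node.children << c.name }
    # an element "needs an array" if it can repeat under a single parent
    children.map(&:name).tally.each { |n, ct| nodes[n].needs_array ||= ct > 1 }
    children.each { |c| visit.call(c, depths.merge(el.name => d + 1)) }
  end
  visit.call(REXML::Document.new(xml).root, {})
  nodes
end

# demo on a tiny hand-made fragment (not a real EAD)
ead = '<ead><archdesc level="collection"><dsc><c id="a"><c id="b"/><c id="c"/></c></dsc></archdesc></ead>'
union_analysis(ead).sort.each do |name, n|
  puts "#{name};#{n.needs_array};#{n.max_depth};#{n.attributes.sort.inspect};#{n.children.sort.inspect}"
end
```

Accumulating into a hash keyed by element name is what lets a single pass over many EADs produce one union record per element type.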
Please see below for an example.</p> <h3>Sample Execution:</h3> <pre>
element-union-single.rb fales/mss_208.xml

element-union-multi.rb file-containing-paths-of-EADs-to-analyze.txt 1&gt; results.txt
</pre> <h3>Sample Output Explanation:</h3> <pre>
# The line shown below
# dao;true;0;["actuate", "href", "role", "show", "title", "type"];["daodesc"]
# is interpreted as follows:
#   dao          &lt;-- the &lt;dao&gt; element...
#   true         &lt;-- has sibling &lt;dao&gt; elements...
#   0            &lt;-- is not nested inside other &lt;dao&gt; elements...
#   ["actuate",
#    "href",
#    "role",
#    "show",
#    "title",
#    "type"]     &lt;-- has the attributes listed inside the brackets, e.g., &lt;dao actuate=...
#   ["daodesc"]  &lt;-- has child &lt;daodesc&gt; elements, i.e., &lt;dao&gt;&lt;daodesc&gt;...&lt;/daodesc&gt;&lt;/dao&gt;
</pre> <h3>Sample Output:</h3> <pre>name;needs_array?;max_depth;attributes;children;child_sequences
abstract;false;0;["id", "label"];["text"];["text"]
accessrestrict;false;0;["id"];["head", "p"];["head_p"]
appraisal;false;0;["id"];["head", "p"];["head_p"]
archdesc;false;0;["level"];["accessrestrict", "appraisal", "arrangement", "bioghist", "controlaccess", "custodhist", "did", "dsc", "prefercite", "processinfo", "relatedmaterial", "scopecontent", "separatedmaterial", "userestrict"];["did_userestrict_accessrestrict_relatedmaterial_arrangement_scopecontent_bioghist_custodhist_prefercite_separatedmaterial_processinfo_appraisal_controlaccess_dsc"]
arrangement;false;0;["id"];["head", "p"];["head_p+"]
author;false;0;[];["text"];["text"]
bioghist;false;0;["id"];["head", "p"];["head_p", "head_p+"]
c;true;4;["id", "level", "otherlevel"];["bioghist", "c", "controlaccess", "did", "odd", "scopecontent", "separatedmaterial"];["did", "did_bioghist_scopecontent_c+", "did_bioghist_scopecontent_controlaccess_c+", "did_bioghist_scopecontent_separatedmaterial_controlaccess_c+", "did_c", "did_c+", "did_odd", "did_odd_scopecontent", "did_scopecontent", "did_scopecontent_c+", "did_scopecontent_controlaccess_c+", "did_separatedmaterial"]
change;false;0;[];["date", "item"];["date_item"]
container;true;0;["altrender", "id", "label", "parent", "type"];["text"];["text"]
controlaccess;false;0;[];["corpname", "genreform", "geogname", "occupation", "persname", "subject"];["genreform+_geogname", "genreform+_geogname+", "genreform+_geogname+_genreform+_geogname_genreform+_geogname", "genreform+_geogname_genreform+_geogname_genreform+", "genreform+_geogname_genreform+_geogname_genreform+_geogname_genreform+_subject_genreform_geogname_genreform+_subject+", "genreform+_persname",
"genreform+_subject+_geogname+_genreform_subject_genreform+_geogname+_genreform_geogname+_subject_geogname+", "genreform_subject+_genreform_geogname_genreform+_subject_geogname+_subject+_geogname+_subject+_geogname_subject_geogname_subject_genreform+_subject_genreform+_subject_genreform_subject+_genreform_subject_occupation_subject_genreform+_subject+_genreform+_subject+_genreform+_geogname_genreform+_geogname_genreform_geogname+_corpname_persname+_corpname_persname+_corpname_persname_corpname", "genreform_subject+_geogname_subject_genreform+_geogname_genreform+_persname", "geogname+_genreform_geogname+_genreform_geogname_genreform+_geogname_genreform+_persname", "geogname+_persname_corpname", "geogname_genreform+_subject_genreform+_geogname_subject_geogname+_genreform+_persname", "occupation_geogname_genreform+_subject_genreform+_subject+_geogname_genreform_subject_geogname_persname+_corpname"]
…
p;true;0;[];["date", "emph", "lb", "text", "title"];["date", "emph", "emph_text", "emph_text_emph_text", "emph_text_emph_text_emph_text", "emph_text_emph_text_emph_text_emph_text", "emph_text_emph_text_emph_text_emph_text_emph_text", "text", "text_emph_text", "text_emph_text_emph_text", "text_lb+_text", "title_text"]
persname;true;0;["role", "rules", "source"];["text"];["text"]
physdesc;false;0;["altrender", "id", "label"];["extent"];["extent", "extent+"]
prefercite;false;0;["id"];["head", "p"];["head_p"]
processinfo;false;0;["id"];["head", "p"];["head_p+"]
profiledesc;false;0;[];["creation", "langusage"];["creation_langusage"]
publicationstmt;false;0;[];["p", "publisher"];["publisher_p"]
publisher;false;0;[];["text"];["text"]
relatedmaterial;false;0;["id"];["head", "p"];["head_p+"]
repository;false;0;[];["corpname"];["corpname"]
revisiondesc;false;0;[];["change"];["change"]
scopecontent;false;0;["id"];["head", "p"];["head_p", "head_p+"]
separatedmaterial;false;0;["id"];["head", "p"];["head_p", "head_p+"]
sponsor;false;0;[];["text"];["text"]
subject;true;0;["source"];["text"];["text"]
text;false;0;[];[];[""]
title;false;0;["render"];["text"];["text"]
titleproper;false;0;[];["num", "text"];["text_num"]
titlestmt;false;0;[];["author", "sponsor", "titleproper"];["titleproper_author_sponsor"]
unitdate;false;0;["datechar", "normal", "type"];["text"];["text"]
unitid;false;0;[];["text"];["text"]
unittitle;false;0;[];["emph", "text", "title"];["emph", "emph_text", "emph_text_emph", "text", "text_emph", "text_emph_text", "title_text"]
userestrict;false;0;["id"];["head", "p"];["head_p"]
</pre> <h3>Sample Analysis Spreadsheet:</h3> <p>Over the course of the FADESIGN project we kept increasing the size of the EAD corpus used for development and testing. Sometimes the additional EADs were from an organization for which we already had EADs, while at other times we received EADs from an organization that we hadn&#8217;t sampled before. We would run the element-union-* scripts on these sample sets and load the results into a spreadsheet for analysis.</p> <p>In the spreadsheet excerpt below, the results of two sample sets from the same organization are compared: Sample Set 3 vs. Sample Set 2 (SS2). As you can see in columns F, H, and I, there are differences for the <span style="font-family: 'andale mono', monospace">&lt;archref&gt;</span>&nbsp;and <span style="font-family: 'andale mono', monospace">&lt;author&gt;</span> elements. This information was used to ensure that the data model and corresponding software data structures accommodated these use cases.</p> <p>Spreadsheet excerpt showing the comparison of data from two EAD sample sets:</p> <p><a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image4.png"><img decoding="async" class="alignnone size-large wp-image-2021" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image4-1024x518.png" alt="" width="1024" height="518" /></a></p> <h3>Takeaway:</h3> <p>The data collected with these scripts was enormously useful.
It not only helped us refine the data model and software data structures as our EAD corpus grew, but also allowed us to compare the EAD structure across archival repositories.</p> <p><a href="#questions">return to questions</a></p> <hr /> <h2 id="h.zhuge8dzkwvr">Question: How many times does an element appear in the EAD?</h2> <h3>Motivation:</h3> <p>We wanted to identify EADs that would be suitable candidates for various testing scenarios.</p> <h3>Script: <code>gen-element-counts.sh</code></h3> <p>The <span style="font-family: 'andale mono', monospace">gen-element-counts.sh</span> script, written in Bash, contains a simple set of commands that leverage the grep utility&#8217;s &#8220;extended regular expression&#8221; and &#8220;count&#8221; functionality. The script prints the results in comma-separated value (CSV) format to STDOUT. By redirecting the script output to a .csv file, one can import and review the results using a spreadsheet application. Sorting the spreadsheet by column in descending order, one can see which elements appear most frequently in the EAD.</p> <p>Examples are shown below.</p> <p>One may ask why the data isn&#8217;t column-oriented instead of row-oriented, e.g., &#8220;Shouldn&#8217;t column A contain the element names and column B contain the element counts?&#8221; The row-oriented output allows one to run the script over an arbitrary number of EADs, directing the output to a file. The file can be processed (see below), and the results loaded into a spreadsheet application. This allows one to determine the EADs with the largest number of a given element across the entire input set.
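The counting idea can be sketched in a few lines of Ruby. This is a rough stand-in for gen-element-counts.sh (which uses grep's extended-regex count mode); the element list is abridged, the file content is invented, and the regex, like the grep approach, counts opening tags:

```ruby
require "tempfile"

# Abridged element list; the real script enumerates every element of interest.
ELEMENTS = %w[abstract accessrestrict bioghist c container dao did dsc unittitle]

# Count opening tags per element, mimicking grep -Ec in the Bash original.
def element_counts(path)
  text = File.read(path)
  ELEMENTS.to_h { |el| [el, text.scan(/<#{el}[\s>\/]/).size] }
end

# demo on a tiny hand-made fragment (not a real EAD)
sample = Tempfile.new(["sample", ".xml"])
sample.write("<ead><dsc><c id='a'><did/><c><did/></c></c></dsc></ead>")
sample.close
row = element_counts(sample.path)
puts(["FILE", *ELEMENTS].join(","))          # CSV header row
puts(["sample.xml", *row.values_at(*ELEMENTS)].join(","))
```

The character class after the element name keeps, say, `<c` from also matching `<container`, the same concern the grep patterns have to handle.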
The third spreadsheet excerpt below demonstrates this use case, e.g., <span style="font-family: 'andale mono', monospace">fales/mss_343.xml</span> has the greatest number of <span style="font-family: 'andale mono', monospace">accessrestrict</span> elements at 2,343.</p> <p>Spreadsheet excerpt for a single EAD after importing the csv file:</p> <p><a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image5.png"><img loading="lazy" decoding="async" class="alignnone size-large wp-image-2022" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image5-1024x69.png" alt="" width="1024" height="69" /></a></p> <p>Spreadsheet excerpt for a single EAD after sorting columns B through CR in descending order by row 2</p> <p><a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image2.png"><img loading="lazy" decoding="async" class="alignnone size-large wp-image-2024" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image2-1024x69.png" alt="" width="1024" height="69" /></a></p> <p>Spreadsheet excerpt containing data from 2,249 EADs. 
The spreadsheet contains a formula that identifies the EAD with the greatest number of instances for a given element.</p> <div class="wp-block-image"><a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image1.png"><img loading="lazy" decoding="async" class="alignnone size-large wp-image-2023" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/11/image1-1024x131.png" alt="" width="1024" height="131" /></a></div> <h3>Sample Execution:</h3> <p><code>gen-element-counts.sh ./tamwag/photos_223.xml &gt; tamwag-photos_223-element_counts.csv</code></p> <h3>Sample Output:</h3> <pre>
FILE,abstract,accessrestrict,accruals,acqinfo,address,addressline,altformavail,appraisal,archdesc,archref,arrangement,bibliography,bibref,bioghist,blockquote,c,change,chronitem,chronlist,colspec,container,controlaccess,corpname,creation,custodhist,dao,daodesc,daogrp,daoloc,date,defitem,did,dimensions,dsc,eadheader,eadid,editionstmt,emph,entry,event,eventgrp,extent,extptr,extref,famname,filedesc,fileplan,function,genreform,geogname,head,index,indexentry,item,langmaterial,language,langusage,legalstatus,list,materialspec,name,note,notestmt,num,occupation,odd,originalsloc,origination,otherfindaid,p,persname,physdesc,physfacet,physloc,phystech,prefercite,processinfo,profiledesc,publicationstmt,relatedmaterial,repository,revisiondesc,row,scopecontent,separatedmaterial,subject,table,tbody,tgroup,title,titleproper,titlestmt,unitdate,unittitle,userestrict,
tamwag/photos_223.xml,1,1,0,1,1,7,0,0,1,0,2,1,4,1,0,26504,2,0,0,0,53005,2,313,1,0,0,0,0,0,4,0,26505,0,1,1,1,0,117,0,0,0,2,1,0,0,1,0,0,3,5,3720,3700,9907,2,1,0,1,0,0,0,6420,0,0,1,0,1,0,3,0,63,76,1,0,0,1,1,2,1,1,2,1,1,0,5,1,3204,0,0,0,2,1,1,26248,26505,1,
</pre> <h3>Sample Data Processing:</h3> <pre>
# ----------------------------------------------------------
# extract-element-count script execution / data processing
# ----------------------------------------------------------

# extract element counts
time for f
in $(cat file-list.txt); do ./bin/gen-element-counts.sh $f | tee -a element-counts.txt; done

# extract header line
head -1 element-counts.txt &gt; element-counts.csv

# extract data and append to csv file
grep -v FILE element-counts.txt &gt;&gt; element-counts.csv

# check that the line count is the EAD file count + 1 (for the header row)
wc -l element-counts.csv
    2112 element-counts.csv
</pre> <h3>Takeaway:</h3> <p>It is useful to know which EAD in a sample set has the maximum occurrences of a given element. The data can be used to identify test candidates and also provides data useful for correlating software performance against EAD element counts. For example, during our EAD indexing we repeatedly observed that certain EADs took significantly longer to index than others. We found that these EADs had the highest number of <a href="https://www.loc.gov/ead/tglib/elements/container.html">container</a>&nbsp;(<span style="font-family: 'andale mono', monospace">&lt;container&gt;</span>) elements. We optimized the code for container element processing, which reduced the application execution time by orders of magnitude.</p> <p><a href="#questions">return to questions</a></p> <hr /> <h2 id="h.wvwrlaey3zx5">Question: Across all EADs, what sequences of child elements are present?</h2> <h3>Motivation:</h3> <p>As we worked on this project, we realized one problem that is actually noted in the <a href="https://pkg.go.dev/encoding/xml%23pkg-note-BUG">Golang documentation</a>:</p> <p><code>"Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values."</code></p> <p>Development was already quite far along when we realized this.
Instead of changing our entire parsing strategy, which would have required significant rework of our Hugo templates, we decided to selectively implement stream parsing to preserve the element order when necessary. To determine which elements needed to be stream parsed, we created scripts that were variations of the <code>element-union-*</code> scripts described above. These new scripts output the same data as the <code>element-union-*</code> scripts, but add child-element sequence information that we used to determine where stream parsing was required.</p> <h3>Scripts: <code>element-union-with-child-sequences-single.rb</code> and <code>element-union-with-child-sequences-multi.rb</code></h3> <h3>Sample Execution:</h3> <p><code>element-union-with-child-sequences-single.rb ./fales/mss_208.xml</code><br /> <code>element-union-with-child-sequences-multi.rb file-containing-paths-of-EADs-to-analyze.txt 1&gt; results.txt</code></p> <h3>Sample Output Explanation:</h3> <pre>
# In the output below, a child sequence containing a "&lt;element name&gt;+"
# indicates that there was more than one consecutive element of that type.
#
# For example, the line
#   bioghist;false;0;["id"];["head", "p"];["head_p", "head_p+"]
# can be broken down as follows:
#   bioghist;              &lt;-- element name
#   false;                 &lt;-- does not have any sibling &lt;bioghist&gt; elements
#   0;                     &lt;-- is not nested inside any other &lt;bioghist&gt; elements
#   ["id"];                &lt;-- has an @id attribute
#   ["head", "p"];         &lt;-- has the child elements &lt;head&gt; and &lt;p&gt;
#   ["head_p", "head_p+"]  &lt;-- the &lt;bioghist&gt; has child elements in
#                              the following sequences:
#                              &lt;head&gt;&lt;/head&gt;&lt;p&gt;&lt;/p&gt;
#                              and
#                              &lt;head&gt;&lt;/head&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;...
</pre> <h3>Sample Output:</h3> <pre>
name;needs_array?;max_depth;attributes;children;child_sequences
abstract;false;0;["id", "label"];["text"];["text"]
accessrestrict;false;0;["id"];["head", "p"];["head_p"]
appraisal;false;0;["id"];["head", "p"];["head_p"]
archdesc;false;0;["level"];["accessrestrict", "appraisal", "arrangement", "bioghist", "controlaccess", "custodhist", "did", "dsc", "prefercite", "processinfo", "relatedmaterial", "scopecontent", "separatedmaterial", "userestrict"];["did_userestrict_accessrestrict_relatedmaterial_arrangement_scopecontent_bioghist_custodhist_prefercite_separatedmaterial_processinfo_appraisal_controlaccess_dsc"]
arrangement;false;0;["id"];["head", "p"];["head_p+"]
author;false;0;[];["text"];["text"]
bioghist;false;0;["id"];["head", "p"];["head_p", "head_p+"]
c;true;4;["id", "level", "otherlevel"];["bioghist", "c", "controlaccess", "did", "odd", "scopecontent", "separatedmaterial"];["did", "did_bioghist_scopecontent_c+", "did_bioghist_scopecontent_controlaccess_c+", "did_bioghist_scopecontent_separatedmaterial_controlaccess_c+", "did_c", "did_c+", "did_odd", "did_odd_scopecontent", "did_scopecontent", "did_scopecontent_c+", "did_scopecontent_controlaccess_c+", "did_separatedmaterial"]
change;false;0;[];["date", "item"];["date_item"]
container;true;0;["altrender", "id", "label", "parent", "type"];["text"];["text"]
controlaccess;false;0;[];["corpname", "genreform", "geogname", "occupation", "persname", "subject"];["genreform+_geogname", "genreform+_geogname+", "genreform+_geogname+_genreform+_geogname_genreform+_geogname", "genreform+_geogname_genreform+_geogname_genreform+", "genreform+_geogname_genreform+_geogname_genreform+_geogname_genreform+_subject_genreform_geogname_genreform+_subject+", "genreform+_persname", "genreform+_subject+_geogname+_genreform_subject_genreform+_geogname+_genreform_geogname+_subject_geogname+", "genreform_subject+_genreform_geogname_genreform+_subject_geogname+_subject+_geogname+_subject+_geogname_subject_geogname_subject_genreform+_subject_genreform+_subject_genreform_subject+_genreform_subject_occupation_subject_genreform+_subject+_genreform+_subject+_genreform+_geogname_genreform+_geogname_genreform_geogname+_corpname_persname+_corpname_persname+_corpname_persname_corpname", "genreform_subject+_geogname_subject_genreform+_geogname_genreform+_persname", "geogname+_genreform_geogname+_genreform_geogname_genreform+_geogname_genreform+_persname", "geogname+_persname_corpname", "geogname_genreform+_subject_genreform+_geogname_subject_geogname+_genreform+_persname", "occupation_geogname_genreform+_subject_genreform+_subject+_geogname_genreform_subject_geogname_persname+_corpname"]
…
p;true;0;[];["date", "emph", "lb", "text", "title"];["date", "emph", "emph_text", "emph_text_emph_text", "emph_text_emph_text_emph_text", "emph_text_emph_text_emph_text_emph_text", "emph_text_emph_text_emph_text_emph_text_emph_text", "text", "text_emph_text", "text_emph_text_emph_text", "text_lb+_text", "title_text"]
persname;true;0;["role", "rules", "source"];["text"];["text"]
physdesc;false;0;["altrender", "id", "label"];["extent"];["extent", "extent+"]
prefercite;false;0;["id"];["head", "p"];["head_p"]
processinfo;false;0;["id"];["head", "p"];["head_p+"]
profiledesc;false;0;[];["creation", "langusage"];["creation_langusage"]
publicationstmt;false;0;[];["p", "publisher"];["publisher_p"]
publisher;false;0;[];["text"];["text"]
relatedmaterial;false;0;["id"];["head", "p"];["head_p+"]
repository;false;0;[];["corpname"];["corpname"]
revisiondesc;false;0;[];["change"];["change"]
scopecontent;false;0;["id"];["head", "p"];["head_p", "head_p+"]
separatedmaterial;false;0;["id"];["head", "p"];["head_p", "head_p+"]
sponsor;false;0;[];["text"];["text"]
subject;true;0;["source"];["text"];["text"]
text;false;0;[];[];[""]
title;false;0;["render"];["text"];["text"]
titleproper;false;0;[];["num", "text"];["text_num"]
titlestmt;false;0;[];["author", "sponsor", "titleproper"];["titleproper_author_sponsor"]
unitdate;false;0;["datechar", "normal", "type"];["text"];["text"]
unitid;false;0;[];["text"];["text"]
unittitle;false;0;[];["emph", "text", "title"];["emph", "emph_text", "emph_text_emph", "text", "text_emph", "text_emph_text", "title_text"]
userestrict;false;0;["id"];["head", "p"];["head_p"]
</pre> <h3>Takeaway:</h3> <p>It was critical to understand the child element sequences so that we could ensure that our code generated HTML that reflected the original order found in the EADs.</p> <p><a href="#questions">return to questions</a></p> <hr /> <h3>Questions:</h3> <h4 id="h.u3g1xxdnjgok">What is the set of <span style="font-family: 'andale mono', monospace">&lt;dao @xlink:role&gt;</span> values across all EADs?</h4> <h4 id="h.u3g1xxdnjgok">Are there EADs with <span style="font-family: 'andale mono', monospace">&lt;dao @xlink:role&gt;</span> values outside of the controlled vocabulary?</h4> <h3>Motivation:</h3> <p>Our finding aids allow patrons to directly access various types of digital content, e.g., audio, video, images, or request access to other content, e.g., electronic records. The finding aids publishing infrastructure needs to know what type of digital content is being served so that the appropriate viewer can be included in the HTML, e.g., an image viewer, audio player, or video player. The EAD Digital Archival Object (DAO) elements have role attributes that indicate what kind of digital object is being described, e.g., image-service, audio-service, video-service, electronic-records-reading-room. These role attribute values come from a controlled vocabulary.
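</p>
<p>The extraction itself can be sketched in a few lines of shell. The following is a hypothetical illustration only (the project&#8217;s <code>dao-role-extractor-multi.rb</code> is a Ruby script); the sample file and its role values are invented for the example.</p>

```shell
#!/usr/bin/env bash
# Hypothetical sketch of extracting unique xlink:role values from EAD files.
# NOTE: an illustrative stand-in, not the real dao-role-extractor-multi.rb.
set -u

# invented sample file with <dao> elements
cat > /tmp/sample-dao.xml <<'EOF'
<ead><dsc>
<dao xlink:role="image-service" xlink:href="a"/>
<dao xlink:role="audio-service" xlink:href="b"/>
<dao xlink:role="image-service" xlink:href="c"/>
</dsc></ead>
EOF

echo "ROLES:"
# pull out each xlink:role="..." attribute, strip the markup, deduplicate
grep -hEo 'xlink:role="[^"]+"' /tmp/sample-dao.xml \
  | sed -E 's/^xlink:role="//; s/"$//' \
  | sort -u
```

<p>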
In order to ensure that our controlled vocabulary contained all valid DAO role values in the EADs, and to identify EADs with DAO roles that required remediation, we wrote a script that extracts all of the unique <span style="font-family: 'andale mono', monospace">&lt;dao @xlink:role&gt;</span> values across a set of EADs.</p> <h3>Script: <code>dao-role-extractor-multi.rb</code></h3> <h3>Sample Execution:</h3> <p><code>dao-role-extractor-multi.rb file-containing-paths-of-EADs.txt 1&gt; results.txt</code></p> <h3>Sample Output:</h3> <pre> processing tamwag/aia_001.xml processing tamwag/aia_002.xml processing tamwag/aia_003.xml processing tamwag/aia_004.xml processing tamwag/aia_005.xml processing tamwag/aia_006.xml processing tamwag/aia_007.xml processing tamwag/aia_008.xml processing tamwag/aia_009.xml processing tamwag/aia_010.xml … processing tamwag/wag_370.xml processing tamwag/wag_371.xml processing tamwag/wag_372.xml processing tamwag/wag_373.xml processing tamwag/wag_375.xml processing tamwag/web_arc_001.xml processing tamwag/web_arc_002.xml processing tamwag/web_arc_003.xml # the contents of the results.txt is as follows: ROLES: audio-reading-room audio-service electronic-records-reading-room external-link image-service video-reading-room video-service </pre> <h3>Takeaway:</h3> <p>Being able to extract the DAO roles across an arbitrary set of EADs helped us identify non-conforming EADs and finalize our <span style="font-family: 'andale mono', monospace">&lt;dao @xlink:role&gt;</span> controlled vocabulary.</p> <p><a href="#questions">return to questions</a></p> <hr /> <h2>Acknowledgements:</h2> <p>Thanks to Laura Henze, Deb Verhoff, and Don Mennerich for reviewing this post and providing feedback.</p> </div> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/11/07/finding-aids-redesign-understanding-the-input/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>Finding Aids Redesign: The Design 
Process</title> <link>https://wp.nyu.edu/library-dlts/2023/09/22/finding-aids-design-process/</link> <comments>https://wp.nyu.edu/library-dlts/2023/09/22/finding-aids-design-process/#respond</comments> <dc:creator><![CDATA[Laura Henze]]></dc:creator> <pubDate>Fri, 22 Sep 2023 20:56:58 +0000</pubDate> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Digitization]]></category> <category><![CDATA[News]]></category> <category><![CDATA[Accessibility]]></category> <category><![CDATA[digital preservation]]></category> <category><![CDATA[Discoverability]]></category> <category><![CDATA[Finding Aids]]></category> <category><![CDATA[publishing]]></category> <category><![CDATA[UX]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2122</guid> <description><![CDATA[<div class="entry-summary"> DLTS, in partnership with Archival Collections Management (ACM), Special Collections, and Application Architecture and Development (AAD), has launched the redesigned Finding Aids and Finding Aids publication system. </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/09/22/finding-aids-design-process/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Finding Aids Redesign: The Design Process&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<p>After much anticipation and much iteration, we recently launched the redesign of NYU’s Special Collection Finding Aids. These comprise thousands of individual sites, each containing a description of an archival collection and its contents, along with relevant histories and metadata. In addition to the user-facing pages, the launch included a reimagined pipeline for publishing. 
It took a partnership between multiple departments: Digital Library Technology Services (DLTS), Archival Collections Management (ACM), Special Collections, and Application Architecture and Development (AAD).</p> <p>There were multiple parallel tracks of work, including:</p> <ul> <li>partner communication</li> <li>data modeling and remediation</li> <li>data transformation tool development</li> <li>publishing tool development</li> <li>dev ops</li> <li>discovery system integration</li> <li>templating</li> </ul> <p>As designer on this project, I’d like to share a bit about the design process. It involved conducting research, holding many conversations, and continually exploring and updating, all within the constraints of a complex technology stack.</p> <h2>Preliminary Research: Reading</h2> <p>To learn from the existing body of knowledge, we studied several published articles, including a literature survey. Key takeaways included the importance of calling out digital objects (images, audio, or video available for immediate access online) and making deliberate use of inclusive language.</p> <h2>Review the Finding Aids of peer institutions</h2> <p>The whole team gathered examples of “finding aids in the wild,” and discussed some of their strengths and which features we might emulate.</p> <h2>Articulate design goals</h2> <p>At this point we were able to name our goals, which emerged as:</p> <ul> <li>Inclusive design.&nbsp; The purpose of the finding aids should be understandable to all, without relying on insider jargon. Site design should use familiar patterns and icons to allow intuitive navigation.&nbsp; Pages should be mobile-friendly and light. They must comply with NYU’s mandate for WCAG 2.0 AA accessibility.</li> <li>Improved presentation of digital objects. Images, audio, and video which are available online should be surfaced, easy to find and easy to navigate. Each object should have its own page and its own URL so it can be bookmarked and cited.</li> <li>Sense of place.
The contents are often arranged in complex hierarchies and it’s essential that researchers always know where they are in the hierarchy.</li> <li>Branding. For NYU Libraries research materials, branding should clearly locate the user in the NYU information ecosystem.</li> </ul> <h2>Wireframes</h2> <p>We created low-fidelity wireframes of our initial ideas, to support communication and to uncover any gaps in our thinking.</p> <p>An example of an early wireframe (click to view larger).&nbsp;&nbsp;</p> <p><a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/09/content_page_early.png"><img loading="lazy" decoding="async" class="alignnone wp-image-2124 size-medium" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/09/content_page_early-187x300.png" alt="Wireframe showing functional page elements" width="187" height="300" srcset="https://wp.nyu.edu/library-dlts/files/2023/09/content_page_early-187x300.png 187w, https://wp.nyu.edu/library-dlts/files/2023/09/content_page_early-639x1024.png 639w, https://wp.nyu.edu/library-dlts/files/2023/09/content_page_early-768x1230.png 768w, https://wp.nyu.edu/library-dlts/files/2023/09/content_page_early-959x1536.png 959w, https://wp.nyu.edu/library-dlts/files/2023/09/content_page_early-400x641.png 400w, https://wp.nyu.edu/library-dlts/files/2023/09/content_page_early.png 1056w" sizes="(max-width: 187px) 100vw, 187px" /></a></p> <h2>Listening sessions with archivists and public service colleagues</h2> <p>Deb, Weatherly and I met with members of Archival Collections Management and Special Collections, two of our key stakeholder groups. We walked through our design goals and the wireframes, while encouraging specific feedback as well as new requests. Several important features came out of this, especially around the goal of broad comprehensibility. 
These included creating a dedicated “Request Materials” page, revising the language on the labels, and adding links in the footers to an explainer on “Using Archives &amp; Manuscripts.”</p> <h2>Revised Wireframes</h2> <p>Here are some final wireframes:</p> <p><a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/09/Landing-page-desktop.png"><img loading="lazy" decoding="async" class="size-medium wp-image-2127 alignnone" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/09/Landing-page-desktop-214x300.png" alt="Wireframe showing functional page elements as displayed" width="214" height="300" srcset="https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-desktop-214x300.png 214w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-desktop-732x1024.png 732w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-desktop-768x1074.png 768w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-desktop-400x559.png 400w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-desktop.png 1056w" sizes="(max-width: 214px) 100vw, 214px" /></a></p> <p><a href="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/09/Landing-page-mobile.png"><img loading="lazy" decoding="async" class="alignnone size-medium wp-image-2128" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/09/Landing-page-mobile-300x286.png" alt="Wireframe showing functional page elements as displayed on a mobile device" width="300" height="286" srcset="https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-mobile-300x286.png 300w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-mobile-1024x975.png 1024w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-mobile-768x731.png 768w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-mobile-1250x1191.png 1250w, https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-mobile-400x381.png 400w,
https://wp.nyu.edu/library-dlts/files/2023/09/Landing-page-mobile.png 1324w" sizes="(max-width: 300px) 100vw, 300px" /></a></p> <h2 style="clear: left">Visual Design / Templating / Accessibility</h2> <p>We designed the sites to be visually uncluttered, fast and accessible. The choice of framework was Hugo, a static site generator, because of the flexibility of the templating language and the relative speed and security of “flat” HTML. We included both the NYU Libraries UX team and the NYU IT Accessibility group in ongoing conversations, to best align with NYU’s requirements for visual identity and accessibility.</p> <h2>Usability tests and conversations with users</h2> <p>During development, Deb, Weatherly and I took the Work In Progress to a number of potential users, including researchers, archivists, and representatives from the New-York Historical Society and Center for Brooklyn History. We gathered their impressions of the design’s usability, brand identity, and its overall ability to meet their needs, and updated some of the theming to reflect requests.</p> <h2>Next Steps</h2> <p>We welcome feature requests and have a few on tap already for phase 2, which starts early next year.</p> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/09/22/finding-aids-design-process/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>New finding aids design launched</title> <link>https://wp.nyu.edu/library-dlts/2023/08/29/new-finding-aids-design-launched/</link> <comments>https://wp.nyu.edu/library-dlts/2023/08/29/new-finding-aids-design-launched/#respond</comments> <dc:creator><![CDATA[Deb Verhoff]]></dc:creator> <pubDate>Tue, 29 Aug 2023 11:57:13 +0000</pubDate> <category><![CDATA[News]]></category> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Discoverability]]></category> <category><![CDATA[Finding Aids]]></category> <category><![CDATA[publishing]]></category> <guid 
isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2132</guid> <description><![CDATA[<div class="entry-summary"> Today, NYU Libraries completed the publication and indexing for more than 5,000 unique finding aids representing NYU’s archival repositories and two partners, The New-York Historical Society and The Center for Brooklyn History.&#160; The project to redesign the NYU Libraries finding&#8230; </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/08/29/new-finding-aids-design-launched/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;New finding aids design launched&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<div class="adL"> <div class="im"> <div> <div dir="ltr"> <p>Today, NYU Libraries completed the publication and indexing for more than 5,000 unique finding aids representing NYU’s archival repositories and two partners, The New-York Historical Society and The Center for Brooklyn History.&nbsp;</p> <div>The project to redesign the NYU Libraries finding aids was led by Digital Library Technology Services in close collaboration with Archival Collections Management and Special Collections. The team addressed three main objectives: to replace an outdated publishing method; to address ongoing issues related to usability for patrons; and to improve upon the presentation for digital archival objects.&nbsp;</div> <div>Those of you who work closely with archival content and its description will appreciate the variability across finding aids, and the amount of work that went into parsing this type of data for HTML web display.
If you haven’t yet looked at any finding aids, now is an excellent time to do so!&nbsp;</div> <div>Search or browse at:&nbsp;</div> <div><a href="https://specialcollections.library.nyu.edu/" target="_blank" rel="noopener" data-saferedirecturl="https://www.google.com/url?q=https://specialcollections.library.nyu.edu/&amp;source=gmail&amp;ust=1695500896893000&amp;usg=AOvVaw1ykRbGwwm8RDAMiOifSLf3"><b>https://specialcollections.library.nyu.edu</b></a></div> <div>&nbsp;</div> <p dir="ltr">Thank you to the project team members who contributed to this success.&nbsp;</p> <div>Joseph Pawletko, Senior Software Developer&nbsp;</div> <div>Laura Henze, Senior UX Designer&nbsp;</div> <div>Don Mennerich, Senior Digital Archivist</div> <div>Weatherly Stephan, Head, Archival Collections Management</div> <div>Deb Verhoff, Digital Collections Manager</div> <div>Rasan Rasch, Software Developer</div> <div>David Arjanik, Senior Web Developer</div> <div>Eric Griffis, Manager, Application Architecture and Development</div> <div>Ekaterina Pechekhonova, Senior Dev Ops Engineer</div> <div>Derrick Xu, Dev Ops Engineer</div> </div> </div> </div> </div> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/08/29/new-finding-aids-design-launched/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>Coca Crystal Video Recordings and Papers published</title> <link>https://wp.nyu.edu/library-dlts/2023/07/11/coca-crystal/</link> <comments>https://wp.nyu.edu/library-dlts/2023/07/11/coca-crystal/#comments</comments> <dc:creator><![CDATA[Carol Kassel]]></dc:creator> <pubDate>Tue, 11 Jul 2023 14:46:24 +0000</pubDate> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Digitization]]></category> <category><![CDATA[News]]></category> <category><![CDATA[counterculture]]></category> <category><![CDATA[digital preservation]]></category> <category><![CDATA[feminism]]></category> <category><![CDATA[NYC History]]></category> 
<category><![CDATA[television]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2103</guid> <description><![CDATA[<div class="entry-summary"> Coca Crystal (1947-2016) was a journalist, television personality, and political activist. She contributed to the East Village Other, writing about counterculture politics with a strong focus on women's issues. </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/07/11/coca-crystal/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Coca Crystal Video Recordings and Papers published&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<div>We just published videos from the Coca Crystal Video Recordings and Papers collection:</div> <div>&nbsp;</div> <div><a href="https://dlib.nyu.edu/findingaids/html/fales/mss_468">https://dlib.nyu.edu/findingaids/html/fales/mss_468/</a></div> <blockquote> <div>Coca Crystal (1947-2016) was a journalist, television personality, and political activist. She contributed to the&nbsp;<em>East Village Other</em>, writing about counterculture politics with a strong focus on women&#8217;s issues. 
From 1977 until 1995, she created and hosted the cable-access variety television program <em>The Coca Crystal Show: If I Can&#8217;t Dance, You Can Keep Your Revolution</em>, which featured political commentary, music, guest interviews, and audience call-ins.</div> </blockquote> <div>&nbsp;</div> <div><a href="https://dlib.nyu.edu/findingaids/html/fales/mss_468/dscaspace_677d309ffe7f29cda55ddd8cd563b8b7.html">Episodes</a> include interviews with <a href="https://hdl.handle.net/2333.1/3tx968mt">Debbie Harry, Chris Stein of Blondie</a> , <a href="https://hdl.handle.net/2333.1/wh70s9fm">Women Against Pornography</a>, <a href="https://hdl.handle.net/2333.1/08kps3jh">Abbie Hoffman</a>, <a href="https://hdl.handle.net/2333.1/gxd25gqp">Michelle Shocked</a>, <a href="https://hdl.handle.net/2333.1/z8w9gwfc">Philip Glass</a> and many more.</div> <div>&nbsp;</div> <div>&nbsp;</div> <div>Thanks to Joe and Alberto for moving this collection through the pipeline, and to Don for helping us remove Aeon links when they were no longer needed.</div> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/07/11/coca-crystal/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item> <title>Mirador IIIF viewer available for all books</title> <link>https://wp.nyu.edu/library-dlts/2023/07/09/mirador-pilot/</link> <comments>https://wp.nyu.edu/library-dlts/2023/07/09/mirador-pilot/#respond</comments> <dc:creator><![CDATA[Carol Kassel]]></dc:creator> <pubDate>Sun, 09 Jul 2023 20:29:53 +0000</pubDate> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Digitization]]></category> <category><![CDATA[News]]></category> <category><![CDATA[International Image Interoperability Framework (IIIF)]]></category> <category><![CDATA[Mirador]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2081</guid> <description><![CDATA[<div class="entry-summary"> We have officially launched our Mirador pilot! 
This open-source, IIIF-ready viewing environment is a worthy alternative to our home-grown viewer. </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/07/09/mirador-pilot/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Mirador IIIF viewer available for all books&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<p>We have officially launched our Mirador pilot! This open-source, IIIF-ready viewing environment is a worthy alternative to our home-grown viewer. Over the coming months, we will be doing some testing, research, and enhancements to Mirador to prepare it to replace our legacy viewer altogether. We will also be deepening our ties to the Mirador community.</p> <p>You can test-drive Mirador yourself on any of our existing books. Try it on the new Verbatim Reports (the link is at the bottom of the metadata pane):</p> <p><a href="https://hdl.handle.net/2333.1/gtht7k3f">https://hdl.handle.net/2333.1/gtht7k3f</a><br /> <a href="https://hdl.handle.net/2333.1/c2fqzjk0">https://hdl.handle.net/2333.1/c2fqzjk0</a></p> <p>A big thanks to the Mirador project team &#8211; Nicole, Alberto, Laura, and Damon &#8211; for their hard work to get us to this milestone.</p> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/07/09/mirador-pilot/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>Conspiracy to blackmail, in 1805 Britain: trial text published</title> <link>https://wp.nyu.edu/library-dlts/2023/06/29/verbatim-report/</link> <comments>https://wp.nyu.edu/library-dlts/2023/06/29/verbatim-report/#respond</comments> <dc:creator><![CDATA[Carol Kassel]]></dc:creator> <pubDate>Thu, 29 Jun 2023 21:00:28 +0000</pubDate> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Digitization]]></category> <category><![CDATA[News]]></category> <category><![CDATA[blackmail]]></category> <category><![CDATA[criminology]]></category> 
<category><![CDATA[indelicacy]]></category> <category><![CDATA[LGBTQ & Sexuality]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2083</guid> <description><![CDATA[<div class="entry-summary"> We just published the Verbatim Report of the proceedings against Edwards and Passingham for conspiracy to blackmail George Townshend Forrester. This is the frank and unexpurgated text of a trial concerning alleged homosexual acts, which were illegal and taboo in Regency England </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/06/29/verbatim-report/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Conspiracy to blackmail, in 1805 Britain: trial text published&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<p>We just published the Verbatim Report of the proceedings against Edwards and Passingham for conspiracy to blackmail George Townshend Forrester:</p> <p><a href="https://hdl.handle.net/2333.1/gtht7k3f">https://hdl.handle.net/2333.1/gtht7k3f</a><br /> <a href="https://hdl.handle.net/2333.1/c2fqzjk0">https://hdl.handle.net/2333.1/c2fqzjk0</a></p> <p>This is the frank and unexpurgated text of a trial concerning alleged homosexual acts, which were illegal and taboo in Regency England &#8211; the last executions for homosexuality taking place in Britain in 1835. 
As such, this courtroom manuscript, probably written up from notes taken by Passingham&#8217;s defence lawyer, which includes prosecution and defence statements, the judge&#8217;s interjections and summing up, and cross-examination of witnesses, represents an unprecedented opportunity to study the full proceedings of a trial that, because it was considered &#8216;so disgusting to every mind possessing even the smallest portion of delicacy&#8217;, could not have been published at the time.</p> <p>Thanks to Michael, Joe, and Alberto for their work on these books.</p> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/06/29/verbatim-report/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>Sir William Jones Manuscripts collection: 18th Century comparative linguistics studies</title> <link>https://wp.nyu.edu/library-dlts/2023/06/14/sir-william-jones-manuscripts-collection-18th-century-comparative-linguistics-studies/</link> <comments>https://wp.nyu.edu/library-dlts/2023/06/14/sir-william-jones-manuscripts-collection-18th-century-comparative-linguistics-studies/#respond</comments> <dc:creator><![CDATA[Carol Kassel]]></dc:creator> <pubDate>Wed, 14 Jun 2023 21:13:22 +0000</pubDate> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Digitization]]></category> <category><![CDATA[News]]></category> <category><![CDATA[digital preservation]]></category> <category><![CDATA[linguistics]]></category> <category><![CDATA[Persian]]></category> <category><![CDATA[Sanskrit]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2089</guid> <description><![CDATA[<div class="entry-summary"> We just published the manuscripts of Sir William Jones (1746-1794), a British lawyer in India and pioneer of comparative linguistic studies. Included are documents in Sanskrit and Persian relating to astrology, botany and law. 
</div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/06/14/sir-william-jones-manuscripts-collection-18th-century-comparative-linguistics-studies/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Sir William Jones Manuscripts collection: 18th Century comparative linguistics studies&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<div>We just published items from the Sir William Jones Manuscripts collection:</div> <div>&nbsp;</div> <div><a href="http://dlib.nyu.edu/findingaids/html/fales/mss_301/">http://dlib.nyu.edu/findingaids/html/fales/mss_301/</a></div> <div>&nbsp;</div> <div>William Jones (1746-1794) was a British lawyer in India and pioneer of comparative linguistic studies. The Sir William Jones Manuscripts consist of various types of documents related to his linguistic studies, many in their original language with translations or notation by Jones. Included are documents in Sanskrit and Persian relating to astrology, botany and law.</div> <div>&nbsp;</div> <div>Thanks to Michael for his careful handling and photography and to Joe and Alberto for moving these images through the pipeline. 
Thanks to Rasan for helping us solve a tricky problem with the workflow.</div> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/06/14/sir-william-jones-manuscripts-collection-18th-century-comparative-linguistics-studies/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>Syria collection published</title> <link>https://wp.nyu.edu/library-dlts/2023/05/10/syria-collection-published/</link> <comments>https://wp.nyu.edu/library-dlts/2023/05/10/syria-collection-published/#respond</comments> <dc:creator><![CDATA[Carol Kassel]]></dc:creator> <pubDate>Wed, 10 May 2023 21:16:33 +0000</pubDate> <category><![CDATA[Archives and Special Collections]]></category> <category><![CDATA[Digitization]]></category> <category><![CDATA[News]]></category> <category><![CDATA[digital preservation]]></category> <category><![CDATA[gelatin silver prints]]></category> <category><![CDATA[Syria]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2091</guid> <description><![CDATA[<div class="entry-summary"> Acquired from photographer Xenia Nikolskaya and researcher Heba Habib, the Syria Collection includes 1,255 photographs including black and white gelatin silver prints and color photographs. 
</div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2023/05/10/syria-collection-published/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;Syria collection published&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<div>Another collection from our partner <a href="https://akkasah.org/en/">Akkasah, the Photography Archive at New York University Abu Dhabi</a>:</div> <div>&nbsp;</div> <div><a href="https://dlib.nyu.edu/findingaids/html/akkasah/ad_mc_036/">https://dlib.nyu.edu/findingaids/html/akkasah/ad_mc_036/</a></div> <div>&nbsp;</div> <div>From the finding aid:&nbsp;&nbsp;</div> <blockquote> <div>Acquired from photographer Xenia Nikolskaya and researcher Heba Habib, the Syria Collection includes 1,255 photographs including black and white gelatin silver prints and color photographs, as well as 25 negatives. The images document the social and cultural life of Syria from the 1940&#8217;s to the 1980&#8217;s. They portray religious ceremonies, daily life, street scenes of Damascus, archaeological sites, schools, picnics, family trips, costume parties, historic sites, and landscapes. 
There are also 10 photographs taken in Lebanon.</div> </blockquote> <div>&nbsp;</div> <div>Thanks to Michael and his team for their QC work and oversight of the digitization process, and to Nicole, Joe, and Alberto for moving these materials through the pipeline.</div> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2023/05/10/syria-collection-published/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item> <title>A new home for Calabash</title> <link>https://wp.nyu.edu/library-dlts/2022/09/14/a-new-home-for-calabash/</link> <comments>https://wp.nyu.edu/library-dlts/2022/09/14/a-new-home-for-calabash/#respond</comments> <dc:creator><![CDATA[Alex]]></dc:creator> <pubDate>Wed, 14 Sep 2022 19:33:33 +0000</pubDate> <category><![CDATA[News]]></category> <category><![CDATA[Data Migration]]></category> <category><![CDATA[Discoverability]]></category> <category><![CDATA[Faculty Digital Archive]]></category> <category><![CDATA[Metadata]]></category> <guid isPermaLink="false">https://wp.nyu.edu/library-dlts/?p=2050</guid> <description><![CDATA[<div class="entry-summary"> In the summer of 2020, Digital Scholarship Services was approached by NYU professor Jacqueline Bishop about finding a new home for Calabash: A Journal of Caribbean Arts and Letters. 
Multilingual and focused on centering unheard voices, Calabash was a&#8230; </div><div class="link-more"><a href="https://wp.nyu.edu/library-dlts/2022/09/14/a-new-home-for-calabash/" class="more-link">Continue reading<span class="screen-reader-text"> &#8220;A new home for Calabash&#8221;</span>&#8230;</a></div>]]></description> <content:encoded><![CDATA[<figure style="width: 939px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-2052" src="https://bpb-us-e1.wpmucdn.com/wp.nyu.edu/dist/a/5498/files/2023/02/3b9995_8e3289be7b0697950d40fd05257fdcc1.jpg" alt="" width="939" height="1320" srcset="https://wp.nyu.edu/library-dlts/files/2023/02/3b9995_8e3289be7b0697950d40fd05257fdcc1.jpg 939w, https://wp.nyu.edu/library-dlts/files/2023/02/3b9995_8e3289be7b0697950d40fd05257fdcc1-213x300.jpg 213w, https://wp.nyu.edu/library-dlts/files/2023/02/3b9995_8e3289be7b0697950d40fd05257fdcc1-728x1024.jpg 728w, https://wp.nyu.edu/library-dlts/files/2023/02/3b9995_8e3289be7b0697950d40fd05257fdcc1-768x1080.jpg 768w, https://wp.nyu.edu/library-dlts/files/2023/02/3b9995_8e3289be7b0697950d40fd05257fdcc1-400x562.jpg 400w" sizes="(max-width: 939px) 100vw, 939px" /><figcaption class="wp-caption-text">Cover of an issue of Calabash: A Journal of Caribbean Arts and Letters.</figcaption></figure> <p>In the summer of 2020, Digital Scholarship Services was approached by NYU professor Jacqueline Bishop about finding <a href="https://archive.nyu.edu/handle/2451/62241">a new home for <i>Calabash: A Journal of Caribbean Arts and Letters</i></a>. Multilingual and focused on centering unheard voices, <i>Calabash</i> was a pioneering journal showcasing poetry, literature, and visual arts from across the Caribbean. The journal, which Dr. 
Bishop edited from 2000-2008, had since ceased publishing, and the NYU server that had been hosting the site was due to be retired.&nbsp;</p> <p>A team consisting of Zach Coble (Head, Digital Scholarship Services), Jonathan Greenberg (Digital Scholarly Publishing Specialist), Marii Nyrop (Digital Humanities Technology Specialist), Kate Pechekhonova (Senior DevOps Engineer), Alexandra Provo (Metadata Librarian for Arts &amp; Cultural Heritage Resources), Nick Wolf (Interim Co-Head of Data Services), and Deb Verhoff (Digital Collections Manager) got together to migrate the article PDFs and metadata. After discussing options, checking with our colleagues about the decision to collect this material for the library, and consulting with the new <a href="https://wiki.library.nyu.edu/display/DLTS/Digital+Collecting+and+Preservation+Planning">Digital Collecting and Preservation Planning working groups</a>, the team decided to move the content to our Faculty Digital Archive. That solution would allow for the Libraries to preserve the journal content while providing access and discovery at the journal, issue, and article level, using the Libraries’ existing infrastructure.</p> <p>While it is not unusual to need to migrate content when systems become obsolete, this request required us to adapt existing workflows and develop some new ones. To kick off the work, we created a <a href="https://docs.google.com/spreadsheets/d/1iU4hmNVhvpGKP-jV9E_x4NjkjmqZ6euK2z69-9-suAw/edit?usp=sharing">metadata application profile</a> outlining the fields we wanted to include in the records. Since there were no existing metadata records, to populate some of the identified fields in a more automated way, Alex modified a web scraping script from a recent NYU Libraries Library Carpentry workshop in order to <a href="https://github.com/aprovoNYU/calabash_webscraping">extract article metadata</a> from the <i>Calabash </i>website. 
The metadata was further prepared using OpenRefine, open-source software for data transformation and cleanup (and Alex’s favorite tool). Meanwhile, Marii and Zach pulled the PDFs from the <i>Calabash </i>site, removed cover pages, and stored the files in a <a href="https://github.com/nyu-dss/calabash/tree/main/data">GitHub repository</a>. With Nick’s help, we used some of the scraped article metadata to register DOIs for each article through Crossref. Finally, we inserted the DOIs into our metadata spreadsheets and created file directories for each issue so that Kate could use <a href="https://github.com/DSpace-Labs/SAFBuilder">SAF builder by Peter Dietz</a> to prepare files and metadata for bulk ingest into our instance of DSpace, NYU’s Faculty Digital Archive.</p> <p>The migration work was multifaceted, iterative, and cross-departmental. To collaborate, we relied on GitHub and Google Sheets. Along the way, we encountered some challenges, such as data that wouldn’t scrape, a need to reorder names, and decisions about which FDA import method to use. These challenges pushed us to learn more about web scraping, OpenRefine, and the DSpace import process. The scripted and semi-scripted methods we used got us part of the way there, but not quite all the way. To reach the finish line, in 2021 and 2022 we had the help of two outstanding students from the <a href="https://wp.nyu.edu/librarydualdegree/">NYU/LIU Palmer Dual Degree program</a>, Vita Kurland and Katherine Santana, who enhanced the descriptive metadata to improve discoverability so that the journal&#8217;s rich content can now reach a wider audience.</p> ]]></content:encoded> <wfw:commentRss>https://wp.nyu.edu/library-dlts/2022/09/14/a-new-home-for-calabash/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
