Arches Getty Brownbag Talk

Ben O’Steen, Getty Digital - Feb 20, 2019
What is “Arches”?

Background
• Developed jointly by the Getty Conservation Institute (GCI) and World
Monuments Fund (WMF), with code development carried out by
Farallon (a San Francisco-based company)
• Arches is an Open Source web-based platform (using the Django web
framework, a Postgres database and the Elasticsearch search service).
• Arches can create digital inventories that describe types, locations,
extent, cultural periods, materials, and conditions of heritage
resources and establish the numerous and complex relationships
between those resources.
• Uses the “CIDOC Conceptual Reference Model” at its core, with the
option to use other ontologies such as the Linked Art ontology.

Linked Open Usable Data – Linked Art
A Linked Open Usable Data model, collaboratively designed to work
across cultural heritage organizations, that is easy to publish and
enables a variety of consuming applications.
Design Principles:
• Focused on Usability, not 100% precision / completeness
• Consistently solves actual challenges from real data
• Development is iterative, as new use cases are found
• Solve 90% of use cases, with 10% of the effort
(Thanks go to Rob Sanderson & David Newbury for slide text)

What is Arches good at?
• Allows users to create complex models to define their data using
a web user interface (UI).
• Creates web forms based on these complex models, which can also
be customized using the web UI.
• Gives users an easy way to enter, find and edit data, again using the
web UI.
• It does not require detailed knowledge of RDF, Linked Data, SPARQL,
Triplestores, and RDF representational formats, paradigms which
other Linked Data tools often implicitly require users to ‘just know’.

What is Arches not good at?
• Currently, Arches does not provide any graph-based query services,
such as GraphQL or SPARQL.
• The Reference Data Manager (that holds and manages Concepts and
Authoritative terms) requires more development to better suit our
needs.
• Arches has not been tested at huge scales.
• The default form ‘widgets’ can be verbose.
• No bulk editing as yet.

A Note on Concepts and Controlled Terms
• These are not handled as Resources in Arches (i.e. there is no Concept
model)
• They are managed by the Reference Data Manager (RDM) side of
Arches and imported and exported as SKOS representations, and
require a textual label as well as an identifier to be used within
Arches.
• Currently, Resources cannot refer to concepts* that are not already
present in the Arches RDM.
* Or other Resources that are not in Arches
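
The constraints above can be sketched as follows. This is not Arches code: the `Concept` class, the `rdm` dictionary and the identifier are hypothetical stand-ins, used only to show that a concept needs both an identifier and a label, and that a reference to a concept the RDM does not hold is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical stand-in for a SKOS concept held in the RDM."""
    identifier: str
    label: str

    def __post_init__(self):
        # Both parts are required before the concept can be used.
        if not self.identifier or not self.label:
            raise ValueError("a concept needs an identifier and a textual label")


def resolve_concept(rdm, identifier):
    """Return the concept for `identifier`, or fail if the RDM does not hold it."""
    try:
        return rdm[identifier]
    except KeyError:
        raise LookupError(f"concept {identifier!r} is not in the RDM") from None


# Illustrative identifier only.
oil_paint = Concept("http://example.org/concepts/oil-paint", "oil paint")
rdm = {oil_paint.identifier: oil_paint}
```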

Application Structure
• Arches consists of a web-application layer,
bound to a client-side JavaScript layer.
• The JavaScript layer relies on a REST-based
API.
• Stores data in a Postgres database, utilizing
Postgres’s native JSON and GIS handling.
• Search is performed using an Elasticsearch
service that indexes key information as
determined by the application and the
configuration.

Working with other systems
• As of version 4.4 (releasing this week, or next), Arches will have
acceptable JSON-LD support for import and export.
• JSON-LD can be thought of as a means to represent and transfer complex
information in a way that developers can make use of.
• Arches will be used for what it is good at, with other suitable systems
or applications being used where sensible. A valid, semantically
accurate export and import is a key component for this.
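
As a rough illustration of what a JSON-LD round trip looks like to a developer, here is a minimal sketch. The record is hypothetical and only loosely follows Linked Art naming conventions; the `id` URL and labels are made up, and this is not actual Arches output.

```python
import json

# Hypothetical record; the "id" URL and the labels are made up for illustration.
record = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Example painting",
    "identified_by": [
        {"type": "Name", "content": "Example painting"},
    ],
}

# Export: serialize to text that another system can receive.
exported = json.dumps(record, indent=2)

# Import: the receiving system parses the text back into the same structure.
imported = json.loads(exported)
```

Because the document is plain JSON, the receiving system needs nothing beyond a JSON parser to read it; the `@context` is what ties the keys to a shared meaning.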

Models in Arches
An Arches Model can be thought of like a template or pattern for a
given type of Resource. It is akin to a database or XML schema in
concept.
The data held in Arches is best thought of as a graph of data rather
than as tabular data (like an Excel or CSV file). A graph is a collection of
points (nodes) connected by relationships (edges).
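
A minimal sketch of the idea, using made-up identifiers and relationship names rather than anything from Arches: the graph is a list of edges, and a query follows one kind of relationship out of a node.

```python
# A graph as a list of (subject, relationship, object) edges.
# All identifiers and relationship names are illustrative.
edges = [
    ("artist:1", "has_name", "Example Artist"),
    ("artist:1", "created", "painting:1"),
    ("artist:1", "created", "painting:2"),
    ("painting:1", "made_of", "oil paint"),
]


def neighbours(edges, node, relationship):
    """Return every node reached from `node` by following `relationship`."""
    return [obj for subj, rel, obj in edges if subj == node and rel == relationship]
```

One artist can have any number of `created` edges, which a single spreadsheet row cannot hold without repeating columns.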

Models are Hierarchical Trees
The compromise that Arches makes is that Models
are defined as hierarchical trees of information.
They are still a graph of information, but a
resource will ‘own’ all of the new information in its
tree, and reference other resources rather than
duplicate the information they own.
Example: An artist resource will hold information
about the artist’s name, but will only point to the
resources in the system that represent their
paintings.
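
The artist example can be sketched like this. The class and field names are hypothetical, chosen for illustration and not taken from Arches: the artist owns its name but holds only the identifiers of its paintings.

```python
from dataclasses import dataclass, field


@dataclass
class Painting:
    resource_id: str
    title: str  # owned by the painting resource


@dataclass
class Artist:
    resource_id: str
    name: str  # owned by the artist resource
    painting_ids: list = field(default_factory=list)  # references only, no copies


def titles_for(artist, paintings_by_id):
    """Follow the artist's references to look up the titles the paintings own."""
    return [paintings_by_id[pid].title for pid in artist.painting_ids]
```

Changing a painting's title then happens in one place, and every artist that points to it sees the change.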

Web forms for data entry/editing in Arches
They are generated from
• ‘Models’ (that structure data for a given Resource you want to describe),
• The Model’s ‘Cards’* (that hold that data within Arches)
* (I’ll come back to what cards are. They are an Arches application concept, not a Linked Data one)

‘Branches’ are reusable structures that you use when building Models.
• For example, you might make a ‘Place’ model, and add a Branch that defines how to
describe a contemporary geographical location.
• Branches are reusable templates. They help avoid multiple differing structures
creeping in for basic things like the geographical location, or a record of the creation
of an artwork.
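
A minimal sketch of branch reuse, assuming a model is a simple tree of named nodes. The structure, the node names and the choice to copy the branch into each model are all assumptions made for illustration, not a description of how Arches stores branches.

```python
import copy

# Hypothetical branch: a reusable subtree describing a geographical location.
LOCATION_BRANCH = {
    "name": "location",
    "children": [
        {"name": "place_name", "children": []},
        {"name": "coordinates", "children": []},
    ],
}


def build_model(root_name, *branches):
    """Build a model tree whose root holds its own copy of each branch."""
    return {"name": root_name, "children": [copy.deepcopy(b) for b in branches]}


def node_names(tree):
    """List node names depth-first."""
    names = [tree["name"]]
    for child in tree["children"]:
        names.extend(node_names(child))
    return names


place_model = build_model("Place", LOCATION_BRANCH)
artwork_model = build_model("Artwork", LOCATION_BRANCH)
```

Both models end up describing a location with exactly the same structure, which is the point of a Branch.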

[Figure: a model graph, with labels for the top node of a model, some nodes, and datatypes]
A model can have this sort of structure,
which can be bespoke, and also pull in
pre-made branches (like templates).
Datatypes are how you define where in
the model data is stored (or ‘collected’),
and what sort of data this must be.
More datatypes can be developed,
added, and have their own behavior.
Some examples: some plain text, a date,
a true or false, a number, a link to
another Resource in Arches, or a link to
a Concept.

[Figure: a Model in ‘Graph’ view and in Card view]

How does Arches present this?
• As a Form view (for data entry)
• By default, relies on the structure of the Model to structure the form (1:1 mapping)
• A Datatype in the model is linked to some code/HTML called a Widget
• A Datatype (a Python class) maintains the data within the system; a Widget is the
HTML/JS component that lets people edit or enter data for it.
• As a Report (to view/interpret the data)
• ‘Preview’ report (for admins)
• ‘Report’ (for everyone else)
This is coordinated by some code called a ‘Card Component’.
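
The split between a Datatype and a Widget can be sketched as below. Both names and their methods are hypothetical and do not reproduce the Arches classes: the datatype decides what counts as valid data, while the widget only produces the HTML used to enter it.

```python
from datetime import date
from html import escape


class DateDatatype:
    """Hypothetical datatype: decides what counts as valid data for a node."""

    def validate(self, value):
        # Raises ValueError when the text is not an ISO date (YYYY-MM-DD).
        date.fromisoformat(value)
        return value


def date_input_widget(name, value=""):
    """Hypothetical widget: only produces the HTML used to enter or edit the value."""
    return f'<input type="date" name="{escape(name)}" value="{escape(value)}">'
```

Swapping `date_input_widget` for a different widget would change the form, but not what `DateDatatype` accepts or stores.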

Aspects to Customize?
• A model can specify:
• What datatypes (text, date, Concept, Resource, etc) are used to hold data
• Whether a branch (e.g. Place, Name, etc) is searchable or required, its
permissions, and how it is represented as a Card (or potentially, Cards)
• A Card can specify:
• What Card Component to use (currently there is only a default one)
• Big functional changes are possible but, equally, big development is required
• What Widget to use to gather data for a datatype
• (including configuration options for the widget)
• Whether a Card can hold multiple values (e.g. multiple Places, Names, etc)
• What Report template to use for a Model.

What we can develop currently (for v 4.3.x):
Widgets and Datatypes are the easiest targets for development
• Datatypes:
• Pro: We can write our own datatypes for holding more complex data
• Pro: We can change how the data is stored and indexed
• Pro: We can code how this data should be represented in or imported from
JSON-LD
• Con: We lose the ability to structure data in the Model in the user interface.
For example, a new datatype for a Name (holding first name, middle
names, etc) would be specified in code, not in the user interface.
Changing it would require changing the code.
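
To make the trade-off concrete, here is a minimal sketch of a Name datatype. The class, its method names and the JSON-LD shape are assumptions made for illustration; they are not the Arches datatype API.

```python
class NameDatatype:
    """Hypothetical datatype holding a structured personal name."""

    # The structure is fixed here, in code, rather than in the model UI.
    FIELDS = ("first_name", "middle_names", "last_name")

    def validate(self, value):
        """Reject any field that this datatype does not define."""
        unknown = set(value) - set(self.FIELDS)
        if unknown:
            raise ValueError(f"unexpected fields: {sorted(unknown)}")
        return value

    def to_jsonld(self, value):
        """Represent the name as a single JSON-LD style Name node."""
        full = " ".join(value[f] for f in self.FIELDS if value.get(f))
        return {"type": "Name", "content": full}
```

Adding a field such as a title or suffix would mean editing `FIELDS` and redeploying, which is the cost described above.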

• Widgets
• Pro: A datatype can have multiple widgets written for it, allowing for easy and
safe customization.
• Pro: What widget is applied to a datatype is set in the card view for a model.
Doesn’t require development time once the widget is written and added!
• Con: Doesn’t affect how the data is held or represented in the search. It is the
interface only.

• Card Component:
• This component orchestrates a lot of things in the form interface.
• Pro: Big possibilities
• Con: Big commitment to rewrite, and does not affect search indexing
• Report:
• Defines how people can view a Resource.
• Can swap report views without unduly affecting anything else.

Forecast of developments
‘Mobile Survey Manager’ (for Arches version 4.4, Q1 2019)
• Allow data entry and upload from mobile devices out in the field and
away from a direct connection to an Arches instance.
• Suited for cataloguing finds and features that have important
geographical data requiring GPS, where an internet connection is either
unavailable or undesired.
• This work will be carried out by Farallon.

‘Workflows’ (for Arches version 4.5, Q4 2019)
• Much more complex and controllable ‘wizards’ for form entry or data
editing.
• Will allow for a more purpose-built and efficient interface to be
developed for our needs. It is currently difficult to edit or create more
than one resource at a time, and this is laborious for some tasks.
• This work will be carried out by Farallon.

Scaling and Performance testing and metrics (Q2-Q3 2019)
• This will be a key factor in informing how we use and deploy Arches
applications and is a priority.
• Adding more integration tests and functional tests, connecting to a
data metrics dashboard and measuring changes in scaling,
performance and load when code or deployment choices are
changed.

‘Notification Hooks’ (by Q3 2019)
• The ability for Arches to notify other users and services that
information it maintains has changed in some way.
• This is essential to keep other non-Arches systems in sync, and to
allow other business processes to be activated when necessary.
• Getty Digital and Farallon have already reached a consensus that this
is necessary and will collaborate with the rest of the user community.
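
The general idea of a notification hook can be sketched as below. This is not the planned Arches design, which had not been settled at the time of the talk; the class and method names are hypothetical. Other systems register a callback, and every change is passed to each of them.

```python
class ChangeNotifier:
    """Hypothetical registry of callbacks to run when a resource changes."""

    def __init__(self):
        self._hooks = []

    def register(self, hook):
        """Add a callable that will receive each change event."""
        self._hooks.append(hook)

    def resource_changed(self, resource_id, change):
        """Build one event and pass it to every registered hook."""
        event = {"resource_id": resource_id, "change": change}
        for hook in self._hooks:
            hook(event)


# A list's append method stands in for an external system here.
received = []
notifier = ChangeNotifier()
notifier.register(received.append)
notifier.resource_changed("painting:1", "updated")
```

In a real deployment the hook would more likely send the event to another service than append to a list.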

Summary:
• Arches has good potential for allowing complex data to be edited and
curated through simple web form interfaces.
• JSON-LD import and export allow for other systems to make better
use of data held within an Arches instance.
• Models in Arches are hierarchical trees of related fields, and Branches
are reusable patterns of fields and relationships.
• Data is collected in Datatypes (Arches components that define and
manage a type of data) through Widgets (HTML/JS code that provides
the web UI for a given datatype).
