The Azimuth Project
Examples of semantic web applications and environment

This page was part of an article in progress, written by Nadja Kutz. The final version of the article can be found at the arXiv. To discuss this article, please visit the corresponding thread at the Azimuth Forum.

idea

The following is an outline of possible strategies for using semantic web techniques and math with regard to environmental issues. The article uses concrete examples and provides a rather basic treatment of semantic web techniques.

semantic web and math

HTML, internet and semantic web

There are currently considerable efforts to describe web content for “machines” (see e.g. the semantic web stack at Wikipedia). That is, the data which is stored on the web is only partially machine processable. For example, given the word “Azimuth”, a computer can potentially list all websites where this term occurs and can thus generate a kind of “meaning context” for that word (i.e. describe it in some sense). Double meanings, however, can make the retrieval and use of data with respect to its meaning difficult: the word “Azimuth” appears in scientific applications in its scientific meaning, but could also refer to the Azimuth Project, the Azimuth blog or the forum.

There are a couple of frameworks which deal with the description of data. There is for example the HTML language, which describes “things” that can e.g. be understood by web browsers. HTML was important for the development of the internet. For example, with the help of HTML one can connect data content (e.g. the text on a website) with a certain “location” (the website), which is indicated by a web address, i.e. a URL. The web address allows one to identify an entity of data, such as a text, as a “data thing” or “data object”; moreover, HTML “tells” the browser how to link to that data object. In particular, this approach allows data (objects) to be mutually linked: the text content (here the “data”) of one website may be linked to the text content of another website. The linking of data is thus an important paradigm within the development of the web (see Linked Data on Wikipedia and LinkedData.org). The semantic web uses a similar approach, but here individual terms, instead of whole websites, may get an “identification”. This identification can be a URL or another type of uniform resource identifier (URI).

The linking of objects is of course a rather old paradigm, which manifests itself, last but not least, in fabric techniques such as knitting and weaving. The mathematical treatment of “linked objects” started with the development of graph theory around 1736.

the resource description framework (RDF)

A popular approach to linking within the (semantic) web is thus to use representations of graph structures. For example, the so-called RDF triples can be thought of as representations of links in a bead chain, i.e. a triple (thread, bead, thread), where for example the threads may be tied together (“linked”), or, as in a “fabric”, the beads may also be tied together. Mathematically a triple can be seen as the graph entity (node 1, arrow, node 2), which is somewhat like the bead link, although the roles of bead and thread are here rather inverted. In other words, a triple consists of three “things”: thing 1, which “is” node 1 and within RDF-related applications is usually called a “resource” or “subject”; thing 2, which “is” the arrow and is called a “property”, “predicate” or “connection”; and thing 3, which “is” node 2 and is usually called an “object”. So the triple can be written as (resource, property, object) or (subject, predicate, object). The resource and the property of a triple should “have” a URI, that is, the subject and the predicate should be represented by their addresses; the object can have a URI or just be some “literal” like a number or a text. An RDF model (i.e. a collection of connected triples with URIs) can thus mathematically be seen as a labeled graph, where labels may however be allowed to be identical.
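As an illustration, here is a minimal sketch with the Python library rdflib (the example.org URIs are made up for illustration) which builds one such triple, where the subject and predicate are URIs and the object is a literal:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/")  # hypothetical namespace, for illustration only

    g = Graph()  # an RDF model, i.e. a labeled graph
    # one triple (subject, predicate, object): subject and predicate are URIs,
    # the object is a plain literal
    g.add((EX.Azimuth, EX.isDiscussedAt, Literal("Azimuth Forum", datatype=XSD.string)))

    for s, p, o in g:  # iterate over the triples of the graph
        print(s, p, o)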

RDF ontologies and possible uses

Assignments of URI’s to triples are usually collected in a “vocabulary” also called “ontology”, which is often stored at one website. A rather wellknown ontology is for example at the project DBpedia, where terms within wikipedia are being “identified” that is individual terms in Wikipedia are linked to a “meaning,” which is stored at an URL. Another example is the yago2 ontology at the project yago-naga. Moreover there are partial commercial projects or fully commercial projects, like the freebase project, Cyc, True Knowledge etc. which provide semantically structured content. It should be mentioned that there is a big commercial interest in having better machine accessible and interpretable data especially in connection with social data. Harvesting consumer/customers data in conjunction with general data (like about products, health and psychological statistics etc.) allows to better target and bind potential customers. The New York times article How Companies Learn Your Secrets gives a good overview and outlook on nowadays practices. It should also be mentioned that there is of course a similar interest from various organisatorial bodies for all sorts of other applications and applications which may not be a priori commercially oriented. So there is interest for process optimizations, organisatorial issues, data mining etc. See e.g. a video of an application from palantirtech.com.

semantic web and math

There are various RDF notations for writing down such a labeled graph, like for example N3, Turtle, RDFa and JSON-LD.
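As a small illustration, the following rdflib sketch (the example.org namespace is again made up) serializes one and the same one-triple graph in two of these notations, so the different surface syntaxes can be compared; note that JSON-LD output is bundled with rdflib only from version 6 onwards:

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")  # hypothetical namespace
    g = Graph()
    g.bind("ex", EX)  # prefix used in the Turtle output
    g.add((EX.iron, EX.density, Literal(7.874)))

    print(g.serialize(format="turtle"))   # Turtle notation
    print(g.serialize(format="json-ld"))  # JSON-LD notation (rdflib >= 6)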

There have been some attempts to give a semantic meaning to math data, like content MathML, OpenMath or OMDoc (see also W3C Math); however, overall there seems to be not much math description within standardized RDF. There are RDF descriptions of mathematical entities for use in libraries (see e.g. here), and there is some research on expressing mathematical expressions in RDF, like by Massimo Marchiori (see also the comments in a corresponding question on Stack Overflow).

There are meanwhile quite a lot of RDF descriptions of general data, like names etc., on the web. The EU project LOD2 gives some overview, also about some powerful tools for accessing the data, like RDF browsers etc. (see also the examples mentioned below). However, an RDF linkability of mathematical content with more general content outside of mathematics seems to be still missing.

Apart from this, a lot of graph theoretical questions reappear in this context.

semantic web, math and environment

The author of this essay is rather critical when it comes to the use of machines and computers; however, it is clear that the observation and assessment of e.g. economic, environmental and social developments is done with the help of computers, just as mathematical evaluation, mathematical modelling, data storage etc. are done with computers.

An important question here is thus, in particular, how to link data sets to mathematical tools. RDF techniques may be rather useful for this.

Before getting too general, a concrete example. Imagine an eco designer or an eco engineer, or a group of eco designers/engineers, who want to design a product which shall be environmentally friendly. There are some eco-taxonomies within ecodesign or environmental engineering (see for example this taxonomy matrix, which can of course be encoded in an RDF graph) which guide this process, as there are a lot of things to consider when designing an environmentally friendly product. Let’s assume the engineer follows the taxonomy and at one point thinks about which materials to use. He or she may need, for certain parts, some specific material which doesn’t deform too much during a temperature rise, which has a certain density, and so on (one could think here also about including data which describes, for example, material sources available in the vicinity of the designer’s geolocation). So he or she could follow a link in an eco-taxonomy which links to “materials” and look for materials, while possibly wanting to include extra data, like the data of a concrete 3D model (volume, texture, puffiness, curvature etc.) or a geolocation.
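As a minimal sketch of such a constraint-based material search, assuming the material data has already been extracted into a table (all names, property keys and threshold values below are made up for illustration):

    # hypothetical material records, e.g. previously extracted from an RDF
    # material ontology; names and values are made up for illustration
    materials = [
        {"name": "steel",     "density": 7.85, "thermal_expansion": 12e-6},
        {"name": "aluminium", "density": 2.70, "thermal_expansion": 23e-6},
        {"name": "PVC",       "density": 1.38, "thermal_expansion": 52e-6},
    ]

    def search(records, max_density, max_expansion):
        """Return the materials satisfying the designer's constraints."""
        return [m for m in records
                if m["density"] <= max_density
                and m["thermal_expansion"] <= max_expansion]

    # e.g. a part which must stay light and not deform too much when heated
    for m in search(materials, max_density=3.0, max_expansion=30e-6):
        print(m["name"])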

For the example of materials, the “mathematical content” (namely the data related to materials) is usually rather simple; that is, the content often consists just of numbers or data tables related to concrete measurements, rather than of complicated functions. The 3D model data may contain Bézier splines etc.

There are commercial tools for assessing e.g. the density of a material, like the tools by grantadesign. These tools even seem to be connected to LCA tools and CAD/CAx tools like Autodesk (see Grantadesign history). However, there seems to be not much variety yet for such kinds of tools.

Moreover, these tools are quite expensive; in particular, they are far too expensive for someone in a developing country, which is of course in itself a problem for the promotion of better environmental solutions.

Thus one would like to have a cheaper, or better yet an open source, equivalent of such tools for assessing material data, LCA data etc. Even without knowing the commercial software, it is rather clear what kinds of tools/applications are needed.

One problem, however, in finding open source variants of such tools is the availability of data. There are some efforts to openly collect material data, like e.g. Core Materials UK, but these efforts are still rather small. So one has in particular to ask whether it might make sense to crowdsource the collection of the respective data, and how to process and use it. In particular it should be asked in which sense departments of materials science and engineering could be involved in the data gathering.

semantic web, Wikipedia and other possible data sources

One rather good crowdsourced data source is Wikipedia. For example, the infoboxes (i.e. the little boxes on the right side of a Wikipedia page) already contain quite some information. Amongst others, the above mentioned DBpedia project has tools to parse a Wikipedia page, especially with regard to the structured content in the infobox, and people can collaboratively assign (“map”) an RDF “vocabulary” (DBpedia) and graph description to the data description in the infobox, which can then become part of the DBpedia database. So if, for example, the infobox of, let’s say, iron contains the description “density”, one could assign a DBpedia subject URI to “density”, a URI for “density of iron near room temperature” and a property URI for “has the g·cm−3 value”. Furthermore one could think about assigning a triple (“density of iron near room temperature”, “has g·cm−3 value”, 7.874). For the word “density” (and a few other items from the chembox, the Wikipedia infobox for chemicals) the mapping has been done; the datatypes are, however, at the time of writing not (yet) copied from the DBpedia Mappings Wiki to the DBpedia ontology.

Doing this for many materials would allow one to get a plot of the density of materials near room temperature (given that one has an appropriate visualization tool for this task). So one could for example surf through DBpedia or Wikipedia (possibly including some prefiltering, especially for DBpedia), store the corresponding semantic surf history as a semantic graph with a corresponding tool (see also the project Pathway), select the data within the graph which is concerned with the material data to be investigated, extract the wanted information with a faceted search and visualize it with the corresponding mathematical tools. For the example this would mean that one e.g. “surfs to materials”, then to, for example, “plastic” and “steel”, and then searches for the densities of plastic and steel and plots them together in a diagram. To some extent it is possible to surf the YAGO database in this way (link to the YAGO browser); however, here too not much content with respect to math/physics/engineering has been added so far, and moreover visualization issues like the investigation of the surf history are so far only possible in a limited way.
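A sketch of such a query-and-plot step, under the assumption that densities are reachable via the raw infobox property dbp:density (the actual property names, datatypes and coverage in DBpedia may well differ):

    from SPARQLWrapper import SPARQLWrapper, JSON
    import matplotlib.pyplot as plt

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    # assumption: chembox densities are reachable via the raw infobox
    # property dbp:density; names, datatypes and coverage may differ
    sparql.setQuery("""
        PREFIX dbp:  <http://dbpedia.org/property/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?name ?density WHERE {
            ?material dbp:density ?density ;
                      rdfs:label  ?name .
            FILTER (lang(?name) = "en" && isNumeric(?density))
        }
        LIMIT 20
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]

    names     = [r["name"]["value"] for r in rows]
    densities = [float(r["density"]["value"]) for r in rows]

    plt.bar(names, densities)
    plt.ylabel("density (as mapped, e.g. g·cm−3)")
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()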

Likewise one could think about having, or linking (in a standardized way) to, datasets in Wikipedia which contain e.g. a 1-n dimensional array (i.e. an ordered list) of measured values of density over temperature. Here one would already need some mathematical RDF description for a “1-n dimensional array”, which probably exists in some ontology, but not necessarily in an explicitly mathematically informed RDF ontology (there are for example a few mathematical entities, as specified in the XML Schema, listed in the DBpedia datatypes ontology; however, the URI for integers at http://www.w3.org/2001/XMLSchema#int, for example, is mathematically not very detailed). It should also be mentioned that, since an “ordered list” is a rather basic graph, its inclusion in the serialization of RDF (like within JSON-LD) is currently being discussed; a sketch of such an ordered list follows after this paragraph. With the concept of named graphs one could rather straightforwardly define something like “mathematical RDF classes”; the author does not know of any approaches where this has been realized for mathematical standard entities. One could for example envisage infoboxes in Wikipedia, or an ontology somewhere else, which detail mathematical entities like “function” as an RDF graph (this graph could contain RDF subgraphs like domain, range etc.).

It is also not clear to the author to what extent the inclusion of datasets like those within the Datahub is envisaged to be linked to Wikipedia. There is a newer initiative called Wikidata which seems to envisage collecting structured data linked to Wikipedia. According to the German computer magazine Heise Online, Wikidata is sponsored with 1.3 million euros by private sponsors: 650,000 euros come from the Allen Institute for Artificial Intelligence (not to be confused with the Allen Institute for Brain Science), 325,000 euros from the Gordon and Betty Moore Foundation and 325,000 euros from Google.
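As a sketch of how such an ordered list of measurements could already be encoded with the standard rdf:List vocabulary (the ex: measurement terms and the numerical values below are made up for illustration):

    from rdflib import BNode, Graph, Literal, Namespace
    from rdflib.collection import Collection

    EX = Namespace("http://example.org/")  # hypothetical measurement vocabulary

    g = Graph()
    g.bind("ex", EX)

    def measurement(temperature_k, density):
        """One measured (temperature, density) pair as an RDF node."""
        node = BNode()
        g.add((node, EX.temperatureK, Literal(temperature_k)))
        g.add((node, EX.density, Literal(density)))
        return node

    # an ordered list (rdf:List) of density-over-temperature measurements;
    # the numeric values are illustrative only
    head = BNode()
    Collection(g, head, [measurement(293.15, 7.874),
                         measurement(373.15, 7.85)])
    g.add((EX.iron, EX.densitySeries, head))

    print(g.serialize(format="turtle"))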

In this context it is interesting to look at the country-wise volunteer participation within the DBpedia project, for example via the language mapping statistics of DBpedia at the DBpedia Mapping Sprint, Summer 2011. At first glance it seems that Greece and Portugal are leading the race.

It should also be mentioned that there seem to be efforts by the International Organization for Standardization to standardize semantic information; see ISO/IEC 19788, called MLR. The works are, however, not freely available; parts can be purchased via ISO/IEC 19788-1:2011. In the description on that site it is outlined:

“The primary purpose of ISO/IEC 19788 is to specify metadata elements and their attributes for the description of learning resources. This includes the rules governing the identification of data elements and the specification of their attributes.”

Likewise one could e.g. think of providing an RDF ontology for Blender input and output (which could e.g. include transformations, in order to monitor e.g. the environmental result of reducing material).
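A minimal sketch of what such a (so far hypothetical) vocabulary could record for a single transformation step; every term below is invented for illustration and exists in no published ontology:

    from rdflib import Graph, Literal, Namespace

    # hypothetical vocabulary for Blender in-/output; none of these
    # terms exist in any published ontology
    BL = Namespace("http://example.org/blender-eco#")

    g = Graph()
    g.bind("bl", BL)

    # one transformation step, recorded with its before/after mesh volumes
    # so that the material saving can be monitored
    g.add((BL.step1, BL.transformation, Literal("reduce wall thickness")))
    g.add((BL.step1, BL.inputVolumeCm3, Literal(120.0)))
    g.add((BL.step1, BL.outputVolumeCm3, Literal(95.0)))

    print(g.serialize(format="turtle"))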

There exist a couple of tools which allow one to visualize RDF data also in non-tabular form, like in visual depictions of graphs. Some of these also allow collaborative editing of RDF data (or semantic web data in a different form); however, there are not overly many collaborative examples, and the examples vary in their use of different computer languages and methodologies. The author of the original version of this essay supervised a student project in which such a tool was created. The name of the tool is Mimirix.

Mimirix and other examples

Mimirix

http://www.daytar.de/art/MIMIRIX/screenshotCar.png
screenshot of a DBpedia graph in the so-called “Circle mode” client view within Mimirix



http://www.daytar.de/art/MIMIRIX/projectsGrid640mal400.png
screenshot of the Mimirix interface for user graph management in the so-called grid view

Mimirix was programmed as a bachelor project in the winter semester 2012 at the Berlin school HTW by the students Thomas Alpers, Martin Bilsing, Igne Degutyte, Florian Demmer, Felix Griewald and Johanna von Raußendorff. It is a collaborative environment which enables the processing of RDF data in a rather visual, intuitive and user friendly way. It is not addressing a semantic-web-informed audience, but is suitable for a general audience. The frontend of Mimirix allows multiple users to log into the environment and to create and share RDF graphs. A demonstration of the functionality of the client is provided in a Mimirix client video. The software is envisaged to get some free/open source licence.

Given the circumstances (a one-semester bachelor project), the environment is of course still in a kind of experimental stage. In particular, within the client application the filtering and perceptional access facilities for RDF data are so far only available in a rather limited form: merging, pruning, zooming in and out, the distortion of subgraphs, faceted searches etc. are not yet possible. It is, for example, also not yet possible to filter numerical RDF data and/or to visualize numerical data (possibly in connection with other data or data applications, like axis names, units, notepads, comment boxes, wikis etc.) within a client application. Applications for the customization of the visualization of RDF data (like for example representing nodes and links by photos, icons etc.), as well as an easier integration of custom clients, are additional features which can likewise be envisaged. This is in particular important as visualizations depend, amongst others, on rather individual aspects, like visibility, cultural and theme related specifications, graphic design awareness/literacy, taste etc.

The client of Mimirix is written in JavaScript and uses, amongst others, d3 and jQuery; the frontend also uses jQuery and is hooked to a CMS which was tailored in CakePHP to suit the special collaborative tasks of Mimirix. The backend consists of the CMS, a LAMP bundle (Debian) and an ARC2 triplestore. An access control plugin for ARC2 was not (yet) finished.

The software is currently on a server of the Berlin school HTW. It is not publicly available, since the client is not yet safe enough against code injection, and so far no JavaScript expert could be found to help out with these very specific questions. There are other issues, like server space limitations, legal issues and economic considerations, which haven’t yet been sufficiently addressed in order to admit full public access to Mimirix’ collaborative environment. Mimirix can be installed on any local server, and thus the collaborative environment is a priori not bound to one special server and database.

DBpedia, YAGO, Freebase and others

There are a few (collaborative) interactive visualizations within DBpedia and YAGO (like the mentioned YAGO2 browser); however, to the author’s knowledge there are, apart from the browser, so far no realized applications which would also be suited for an audience with less experience in semantic web techniques.

There are some (interactive) visualizations for the above mentioned Freebase, and there may be more underway. In particular, Freebase offers RDF access and a JSON based API (some tools are proprietary). The Freebase blog has, however, been inactive for more than a year; likewise the community website currently holds an outdated warning sign.

The above mentioned LOD2 project provides a comprehensive overview of its tools in its LOD2 technology stack. Here, for example, semantic-data-aware browsers, wikis, search tools etc. can be found. To the author’s knowledge there is, however, so far no collaborative environment comparable with Mimirix or the below mentioned Deepa Mehta project.

semanticweb.org also has an overview of tools for exploring semantic data. semanticweb.org itself hosts a semantic wiki (based on Semantic MediaWiki). The semanticweb.org community seems to be linked to the Wikidata project.

Deepa Mehta and Mimirix

The project Deepa Mehta, which is a 12 year old Berlin based non-profit organization, has worked on visualizations of semantic data. It currently provides, like the DBpedia project or the Mimirix project, visualizations of semantic data which are hooked to a collaborative environment. People at Deepa Mehta currently think about using parts of Mimirix, since their underlying data model doesn’t yet support RDF. Deepa Mehta is however GPL licensed, which could make the integration of further software development (like for the above mentioned material applications) into commercial applications difficult. For cooperations with companies which could, in exchange for the use of the software, provide for example more detailed data and services (like material or environmental data, or supply chain impacts; see e.g. the greenbiz article “Puma’s Eco-Impacts Report Kicks the Ball Forward on Transparency”), this could thus be a handicap. On the other hand, if further software development is mainly done by volunteers, as has been the case for Deepa Mehta and to some extent also for Mimirix, then it is of course to be asked why one should “donate” such software to potential profit makers for free.

category: meta