18 May 2021
W3C - Data
Data Catalog Vocabulary (DCAT) v3: 2nd Public Working Draft
This message is to update you on the work of the W3C Dataset Exchange Working Group [1] and to ask for your help in reviewing progress on the third version of the RDF vocabulary for data catalogs, DCAT. The Second Public Working Draft of this revision, published on 04 May 2021, is available at https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/
The revision of DCAT is part of a group of deliverables described in the Charter [2], but it can be read as a stand-alone recommendation on how catalogs of resources should be published on the web.
This version especially focuses on the areas of versioning [3] and dataset series [4]. The list of changes since the first public working draft of 17 December 2020 is available at [5].
In reviewing the draft, it might be helpful for you to keep in mind the initial "Use Cases and Requirements" document that we are working towards [6], the issues log associated with this milestone in the development of the recommendation [7] and the remaining issues [8].
We welcome feedback along the following lines:
- Do you agree with the direction of travel of this revision of DCAT?
- Are there any areas where we could improve what we have done? [please illustrate]
- Are there any areas where you think the proposal is wrong or could lead us into developing proposals that are erroneous? [please give examples and reasons]
- Are there other use cases for data catalogs and dataset descriptions that we have not considered? [please illustrate]
Please also feel free to make any other comments and suggestions regarding the draft.
Please send comments through GitHub issues (https://github.com/w3c/dxwg/issues) or by email to public-dxwg-comments@w3.org
Best wishes
[1] https://www.w3.org/2017/dxwg/wiki/Main_Page
[2] https://www.w3.org/2020/02/dx-wg-charter.html
[3] https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/#dataset-versions
[4] https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/#dataset-series
[5] https://www.w3.org/TR/2021/WD-vocab-dcat-3-20210504/#changes-since-20201217
[6] https://www.w3.org/TR/dcat-ucr/
18 May 2021 9:00pm GMT
04 Feb 2021
W3C - Data
W3C/OGC Publishes Maps for the Web Workshop Report
Making maps a first-class object on the web is a shared goal of W3C and the Open Geospatial Consortium (OGC). The two organizations recently collaborated to hold a W3C-OGC Joint Workshop on Maps for the Web and have published their report.
Workshop co-chair Peter Rushforth, Technology Advisor, Canada Centre for Mapping and Earth Observation, Natural Resources Canada, which sponsored the workshop, said, "Improving browser-based maps on the web through a standards approach will bring benefits to multiple industries, governments and to the Accessibility community. We look forward to continuing our collaboration with W3C and the global maps community to make this vision a reality."
Through live presentations, panel discussions, and pre-recorded videos, workshop participants surfaced and highlighted a wide range of requirements and proposals for Web platform maps, beginning with maps for the web that support real-world, feature-based accessibility requirements for persons with disabilities.
The range of requirements presented also included rendering, performance, internationalization, privacy, styling, discovery, augmented reality and sensor integration, together with the need, and proposals, for standardized declarative markup and associated procedural interfaces supporting these requirements.
Workshop participants acknowledged that incremental staging of specification work, polyfilling and native implementation of requirements are essential ingredients in the initiative's potential for success.
"Maps are a massive enabler to combine geospatial information on the web, across industries, devices, and new application areas such as virtual and augmented reality," said Ted Guild, W3C staff contact for the Spatial Data on the Web Interest Group and workshop co-chair. "A W3C workshop is often the first stage in standardization, leading to the formation of a working group and that is the aspiration here."
As an outcome, the workshop participants seek to initiate a cross-community (W3C, OGC among others) working group that will define a roadmap to specify and implement native Web maps, based on the fundamentals, objectives and characteristics of the open Web. The ongoing work was initiated and has been incubated in the W3C Maps for HTML Community Group since late 2014. Anyone interested in participating in the Maps for the Web discussions should join the free W3C Community Group.
W3C thanks our sponsor, Natural Resources Canada, the Program Committee, our co-host, the Open Geospatial Consortium, and all the participants for making this event a success.
04 Feb 2021 6:57pm GMT
06 Nov 2020
W3C - Data
Cognitive AI - mimicking how we think
Cognitive AI focuses on functionally modelling human memory, reasoning and learning inspired by the evolution of neural systems [...]
06 Nov 2020 11:30am GMT
03 Mar 2020
W3C - Data
Emergence of the Sentient Web and the revolutionary impact of Cognitive AI
The talk starts by looking at the current situation with RDF vying with Labelled Property Graphs (LPG) [...]
03 Mar 2020 1:00pm GMT
04 Feb 2020
W3C - Data
Data Catalog Vocabulary (DCAT) Version 2 Published Today
Today the W3C Dataset Exchange Working Group (DXWG) published version 2 of the Data Catalog Vocabulary (DCAT) as a W3C Recommendation. DCAT gives people and machines a specific, domain-independent way to create catalogs that express the core elements of a dataset description in a standardized form suitable for publication on the Web, and it enables cross-domain interoperability whether used on its own or as a complement to other data catalog standards. Thanks to this, DCAT facilitates effective search and retrieval, and permits easy scaling up of the query process, either through "frictionless" aggregation of dataset descriptions and catalog records from many different sources and domains, or by applying the same query across multiple catalogs and aggregating the results. These patterns can also be varied slightly to give communities tailored approaches to dataset cataloguing that respect the specific nuances of a particular type of data.
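To make the core model concrete, here is a minimal, illustrative sketch of a DCAT description built with the Python rdflib library (an assumption of convenience; any RDF toolkit, or plain Turtle, would do equally well). All URIs, titles and dates are invented examples, not part of the Recommendation.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF, XSD

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

catalog = URIRef("https://example.org/catalog")
dataset = URIRef("https://example.org/dataset/water-quality")
dist = URIRef("https://example.org/dataset/water-quality/csv")

# A catalog that lists one dataset.
g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCTERMS.title, Literal("Example open data catalog", lang="en")))
g.add((catalog, DCAT.dataset, dataset))

# The dataset and one of its distributions.
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("River water quality measurements", lang="en")))
g.add((dataset, DCTERMS.issued, Literal("2020-02-04", datatype=XSD.date)))
g.add((dataset, DCAT.distribution, dist))

g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("https://example.org/data/water-quality.csv")))
g.add((dist, DCAT.mediaType, URIRef("https://www.iana.org/assignments/media-types/text/csv")))

print(g.serialize(format="turtle"))
```

The same three-class pattern (Catalog, Dataset, Distribution) is what allows descriptions from many publishers to be aggregated and queried uniformly.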
Version 2 builds on the initial work published in 2014 by providing, among other things, classes of descriptors that can be used for data services, and a wider set of relationships characterizing datasets and their temporal and spatial aspects. It also removes the constraints that were inherent in the prescribed use of some vocabulary terms for relationships (properties) that were present in its original version, so making their usage pattern more flexible.
Although the expectation is that dataset publishers will want to revise their existing catalogs, as part of their general curation and update activities, to make use of the additional features in version 2, compatibility between the new version and the earlier version of the DCAT vocabulary has been preserved.
The WG has also made an effort to (i) provide multilingual descriptions of the different terms and properties, facilitating their application across the world; and (ii) explain the alignment with the Schema.org vocabulary, the metadata set most widely used by search engines to optimize the indexing of Web content, and now increasingly adopted in data catalogs as well.
Within just a few years of its first release in 2014, DCAT has become recognised as a key interoperability standard for data catalogs in many countries and organizations. Search engine providers are using it to identify data assets to catalog, and publishers are using it to make their materials more findable. Going forward, the WG expects that the incorporation of classes for describing data services will make DCAT an increasingly useful tool in data science and provide a well-trodden path for those implementing the FAIR Principles.
The DXWG appreciates hearing about any implementations of catalogs using DCAT v2. We would also like to know about any errors you find or problems you experience, so that these can be fed into the ongoing management of version 2 and potentially influence changes in version 3, work on which has just started. You can provide feedback on errors or difficulties with DCAT v2 either by email to public-dxwg-comments@w3.org or through the dedicated errata page. For new use cases and other issues, please contact us via email or by submitting an issue in the dedicated GitHub repository. We hope that you find this standard a useful addition to your data publications.
04 Feb 2020 1:28pm GMT
23 Oct 2019
W3C - Data
AIOTI, ISO/IEC JTC1, ETSI, oneM2M and W3C Collaborate on Two Joint White Papers on Semantic Interoperability Targeting Developers and Standardization Engineers
Digital transformation holds huge benefits for enabling organizations to be more efficient, more flexible, and more nimble in responding to changes in business and operating conditions. This involves the need to integrate heterogeneous data and services throughout organizations. Semantic interoperability addresses the need for shared understanding of the meaning and context.
To support this, a cross-organization expert group involving ISO/IEC JTC1, ETSI, oneM2M and W3C is collaborating with AIOTI on accelerating adoption of semantic technologies in the IoT. The group has very recently published two joint white papers on semantic interoperability, entitled "Semantic IoT Solutions - A Developer Perspective" and "Towards semantic interoperability standards based on ontologies". This follows on from the success of the earlier white paper on "Semantic Interoperability for the Web of Things."
The editor of the white papers, Martin Bauer, says, "We identified two groups that are vital to the successful adoption of semantics - developers and standardization engineers. Developers often lack the background, so the white paper gives them a step-by-step introduction on how to develop semantic systems. Standardization engineers can profit from the group's experience on developing ontologies, explaining how ontology experts and domain experts have to work together to develop ontology-based semantic standards."
23 Oct 2019 10:24am GMT
13 Jun 2019
W3C - Data
Dataset Exchange Working Group Is Making Progress
What are the issues?
The history of computing is closely tied to the realisation that aggregated information often has increased value. This led to conflicting positions between those wanting to merge data from diverse sources to distil more value, and those wanting to prevent the merging of information in order to retain privacy or other control over processing, or to prevent inappropriate use of data felt in some way unfit for general processing. The approach taken by those wanting to exchange datasets bilaterally was to prepare a data interchange agreement (DIA) that explained how the data model of one party would fit with the data models of the other. The agreement would also cover licensing and other caveats about the use of the data. A DIA was often a large text document with tables, and often carried hand-written signatures to establish the authority of the agreement between the parties. This approach changed radically with the advent of the World Wide Web and the scope it provides for dataset exchange at global scale between millions of computers and their users. The open data movement was the natural progression of this, where both citizens and administrations were keen to establish the conditions under which significant economic benefit could be obtained from the re-use of public sector information.
The research environment has followed a similar journey as teams and institutions have discovered not only the benefit of being able to aggregate information, but have also been encouraged to make their datasets available as part of the research reproducibility and research transparency agendas. However, in a similar way to the usage agreement aspect of the DIA, Data Sharing Agreements (DSA) have been brought in, particularly in areas such as genomics and other health-related areas where funding bodies such as the US National Institutes of Health have a set of policies for researchers to comply with.
Where is the earlier work?
The provision of guidelines for administrations on how to publish 'open data' was pivotal to the W3C development of the 2017 recommendation on how to publish data on the web, which built on the previously developed first version of the W3C standard vocabulary for publishing data catalogs on the Web (DCAT), published three years earlier. The European Commission and national governments adopted this standard for catalogs. In some cases, however, they felt certain elements were missing, and they often also wanted to specify which controlled vocabularies to use. This led to the creation of 'application profiles' through which a data publisher could supplement the DCAT vocabulary with elements taken from vocabularies developed in other standardisation efforts and, when necessary, add further constraints. There are a large number of individual application profiles centred on DCAT for data catalogs of individual national administrations or specific dataset types, such as statistical (StatDCAT) or geospatial (GeoDCAT).
DCAT Version 2
In 2017 W3C realised that there would be benefit in re-examining the whole situation with dataset exchange on the web and chartered the Dataset Exchange Working Group [DXWG] to revise DCAT and to examine the role and scope of application profiles in requesting and serving data on the Web. The revision of DCAT is now in the late stages of the standards development process. The latest public Working Draft is available, and readers are encouraged to make themselves aware of this work and provide feedback to the public mailing list at public-dxwg-comments@w3.org and/or as GitHub issues.
Anything else to think about?
In addition to DIAs and DSAs, another acronym associated with the process of dataset exchange is "ETL" - the Extraction, Transformation and Loading effort that is often required when a party receives datasets to be merged that use different models or schemas. ETL is often a considerable effort that is only necessary because the parties are using different models; it takes effort but does not add value, so the ideal would be to avoid this essentially nugatory work. There is already a mechanism on the Web for a server to be given an ordered set of choices of the serialisation type for returning a dataset to a client (e.g. preferably XML, if not that then CSV, and if not that then the default HTML). This "content negotiation" depends on providing an ordered list of preferences to the server, generally through the HTTP "Accept" header. Given that the "application profiles" mentioned earlier describe the model that a dataset such as a data catalog has to conform to for it to be valid in a certain context, there is a need for a mechanism by which a client can use a list of profiles to indicate to a web server which profile or profiles it would prefer the returned data to adhere to. Since this provides a contract between a data provider and a data consumer, the indication of profile preferences could, amongst other things, reduce the need for an ETL step in dataset exchange.
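For illustration, the sketch below shows how a client might combine ordinary content negotiation with the profile-based negotiation the DXWG is drafting, using Python's requests library. The Accept-Profile and Content-Profile headers follow one of the mechanisms proposed in the draft; the exact header names, the profile URIs and the server URL here are assumptions to be checked against the specification.

```python
import requests

# Ordinary content negotiation: an ordered list of preferred serialisations.
# The draft "Content Negotiation by Profile" additionally proposes a header
# carrying the profiles (models) the client would prefer the data to follow.
headers = {
    "Accept": "application/ld+json, text/turtle;q=0.8, text/csv;q=0.5",
    "Accept-Profile": "<https://example.org/profiles/stat-dcat>, "
                      "<https://example.org/profiles/dcat>;q=0.7",
}

response = requests.get("https://example.org/catalog", headers=headers)

print(response.headers.get("Content-Type"))
# A conforming server could indicate which profile it actually served,
# for example via a Content-Profile response header.
print(response.headers.get("Content-Profile"))
```

If both sides honour such a contract, the client receives data already conforming to the model it can process, which is exactly where an ETL step would otherwise be needed.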
Content Negotiation By Profile
The DXWG is also making strong progress in developing a recommendation for "Content Negotiation by Profile", and the Second Public Working Draft was published for community review on 30 April 2019. Readers are encouraged to read this draft and to provide their feedback. Thus, for both specifications, we welcome feedback, including positive support for the proposals being developed, to the public mailing list at public-dxwg-comments@w3.org and/or as GitHub issues.
Conclusion
Through the combination of an improved DCAT for facilitating the discovery of datasets, guidance on profiles (still in the early stages of development), and a recommendation on mechanisms that allow a client to provide an ordered set of choices of profile or model for the datasets it wants returned from servers, the DXWG is working to provide a framework of standards and recommended designs/strategies to help developers improve automation in discovering and merging datasets. The aim is to deliver the increased value that people expect from data aggregation, whilst also providing a mechanism to automate the selection of models that might reduce the ETL requirement or deliver another preferred model.
Acknowledgements: Thanks to Alejandra Gonzalez-Beltran and Lars G Svensson for helpful comments
13 Jun 2019 1:46pm GMT
19 Mar 2019
W3C - Data
JSON-LD Collaborative Work and Feature Timeline
The JSON-LD Working Group met in person in Washington D.C. in early February and came to consensus on many of the open issues and requests for enhancement. While standards work is never done, it is clear that a great deal of progress has been made since the group's charter was approved. This led to the decision to freeze the list of features being considered for the 1.1 specifications two weeks after the next public working draft becomes available. That draft is anticipated in the next two weeks, and hence no requests for additional functionality (as opposed to issues with existing functionality) will be considered from the middle of April 2019. It is, therefore, important to get any issues onto our radar as soon as possible in our GitHub repository, for them to be included in the current phase of work.
Work has taken place in the group around features requested by other W3C groups, and also to ensure that our decisions are in keeping with existing best practices and thinking around the consortium. In particular, the Verifiable Claims group has been a constant source of inspiration and of important functionality in ensuring that JSON developers with no knowledge of the underlying graph model can be productive, one of the core missions of the JSON-LD effort. We also worked with the Web of Things WG to ensure that their use cases could be met. Finally, following from the shared discussions at TPAC in Lyon, we worked with the TAG to ensure that expectations around JSON-LD embedded within HTML documents were being met, and were intuitive given that much broader context.
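As a small illustration of the "JSON-LD embedded within HTML" pattern mentioned above, the following Python sketch extracts and parses a JSON-LD block from a script element. The document and vocabulary values are invented examples, and real pages may carry several such blocks.

```python
import json
from html.parser import HTMLParser

html_doc = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "River water quality measurements",
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
</script>
</head><body>...</body></html>
"""

class JsonLdExtractor(HTMLParser):
    """Collects every application/ld+json script block in a page."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

extractor = JsonLdExtractor()
extractor.feed(html_doc)
print(extractor.blocks[0]["name"])
```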
The intent of the WG is to move into Candidate Recommendation before the end of summer, 2019. Given the number of issues that have still to be resolved, we consider this to be very achievable, and will set up our discussions in Japan to focus on horizontal review, engagement with other groups and non-normative documents instead of coming to consensus around technical issues.
The group would like to extend their heartfelt thanks to the TAG, and the Verifiable Claims and Web of Things working groups, for their productive engagement with our joint issues. It is heartening to see that focused, technical groups can quickly align in thinking and action, and is a testament to the W3C's attention to internal culture and process. We would also like to thank the Folger Shakespeare Library for their kind hosting at the last minute, after our original hosts needed to cancel because of challenges caused by the US government shutdown.
Face to Face meeting of the W3C JSON-LD Group, February 6-7, Washington D.C.
From left to right: David Lehn, Robert Sanderson, Harold Solbrig, David Newbury, Ivan Herman, Jeff Mixter, Gregg Kellogg, Adam Soroka.
19 Mar 2019 3:37am GMT
11 Mar 2019
W3C - Data
W3C Strategic Highlights: Strengthening the Core of the Web (Web of Data)
(This post is part of a series recapping the October 2018 W3C Strategic Highlights and does not include significant updates since that report.)
Data is increasingly important for all organizations, especially with the rise of IoT and Big Data. W3C has an extensive suite of standards relating to data that were developed over two decades of experience. These include core standards for RDF, the Semantic Web and Linked Data.
The JSON-LD Working Group has recently started to work on updating the JSON-LD specification which covers a JSON based serialization of RDF. This is assisting the W3C Work on the Web of Things which is seeking to use JSON-LD to describe things as objects with properties, actions and events, independently of the underlying protocols.
A W3C Workshop is being planned for early 2019 on emerging standardization opportunities, e.g. query languages for graph databases and improvements for handling link annotations (property graphs), different forms of reasoning suited to incomplete, uncertain and inconsistent knowledge, support for enterprise knowledge graphs, AI and Machine Learning, approaches for transforming data between different vocabularies with overlapping semantics, signed Linked Data graphs, and work on improving W3C's role with respect to hosting vocabularies and ontologies.
You can also see All Data specifications.
11 Mar 2019 9:00am GMT
11 Feb 2019
W3C - Data
Australians! Who are you, and who has the right to know?
This may be the biggest question yet facing our wealthy, Western democracies in their passionate embrace of digital services, digital consumerism and digital personas. To quote from the Australian Consumer Policy Research Centre (CPRC), "87% of Australians were active internet users in 2017, more than 17 million use social networking sites, and 84% are buying products online." Yet the CPRC also tells us that 95% of people want companies to give us ways to opt out of personal information collection. Most Australians do not want their phone numbers, messages, or device identities shared with others. When asked, they are painfully aware of the unfair trade-off in access to digital services versus their own right to privacy and how their data can be used. Paradoxically, people still use Facebook and other platforms despite privacy fears. People seem very willing to sacrifice privacy for a service they want, or perhaps they do it opportunistically without full awareness of the price they pay.
In stark contrast to what Australians want, suddenly George Orwell's fictitious 1984 is very, very possible. Some would say that China is already there with its Social Credit and mass surveillance systems, now being rolled out. 7 million "untrustworthy" people have already been denied access to state-owned transport services, and others denied schools, home purchase and access to their own financial assets under the system. Shortly, playing too many video games or spending money frivolously may be punished.
In the press, a theme seems to be emerging that computer scientists in the tech industry are somehow the blind evil-doers, and that it is the job of social scientists and government to control their actions and so protect the rights of citizens. Indeed, the very welcome EU GDPR and the proposed Australian Consumer Data Right head in this direction. But while the tech industry may be set up as the bad guy, socially-conscious tech experts are in fact working towards technological solutions.
For example, mathematician Cathy O'Neil brought our attention to the ethical risks of data analysis and machine learning in her very readable 2016 book "Weapons of Math Destruction". In response, computing researchers like Roger Clarke design governance processes to ensure checks and balances are applied judiciously. Here, we highlight research for people with disabilities, research for people in particularly sensitive situations, and W3C activity for people who want to manage their own identity.
While a lot of the concern about privacy relates, quite reasonably, to the potential applications of private information to harm individuals, what may be less well recognised is the fundamental role of identity in relation to privacy. Identity is simply who you are, or at least who the network thinks you are. Identity is what makes it possible to link independent pieces of information about a person, thereby greatly improving the richness of detail and inference about the person, and also to link a digital persona to a physical, natural person. It's why German regulators ordered Facebook to desist from combining Facebook, WhatsApp and Instagram data only last week (ABC Radio News 8/2/2019).
Once, identity assurance was solely the function of churches, the only bodies keeping population records, but later it became a function of government as birth certificates, marriage certificates, passports and drivers' licences rapidly became the government-assured identity credentials. In 1985-6 the Australian government attempted to introduce a streamlined national identity system known as the Australia Card, aimed at government interactions, but it would have become the identity assurance for commercial Australia too. The huge public backlash in the name of "privacy" mothballed the Australia Card, probably forever. But in a more modern form, the concept remains very much alive. Was that the first time we really thought about privacy in national terms?
Nowadays the Australian Commonwealth, through the DTA, is building a Trusted Digital Identity Framework that aims to "achieve something no other trust framework in the world has achieved to date - to support the establishment and reuse of a digital identity across many different contexts, systems and environments." Mandatorily, identity comprises a family name, given names and date of birth, with optional contact details like email address. Some privacy of these core attributes is admitted by computed (derived) attributes in place of core attributes (e.g. "age over 21"). This would be a government-run system that looks a lot like variations on the existing "100 points" system currently required by banks for identity assurance, together with a network of trusted identity providers and relying service providers, all overseen by the government. Although it is a digital approach, and offers a scale of assurance levels, is it fundamentally different to the Australia Card?
We also see, in practice, the dominance of large US IT corporations as digital identity providers (have you ever logged in to a new service via your Google or Facebook credentials?). Mobile telco companies, too, are very keen to become authoritative identity providers, for displaced persons at least. After all, they are equipped with built-in biometric services and follow you everywhere.
The battle for your identity has begun.
The W3C is spearheading an alternative model for "self-sovereign" identity. The idea behind self-sovereign identity is to place many details of a person's online identity in their own hands. A W3C workshop on Strong Authentication and Identity in December last year brought together Web standards developers, government, various industry reps, academics, lawyers and others. The self-sovereign approach, using blockchain-driven verifiable credentials, can be used to verifiably attach to you claims (such as "age over 21") made by various parties that attest to their truth. This is like an unfakeable way of having you transmit an academic transcript from a University to your employer. It relies on decentralised identifiers that are both sound and administered in a decentralised way by multiple independent authorities, so there is no need for your university to know whether you drive a car, or have travelled to Botswana, ever. A citizen who privately uses Ashley Madison services, for example, can separate that digital identity from the one used for their University and employment.
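For a rough sense of what such a claim looks like on the wire, here is a hypothetical Python sketch of a credential in JSON, loosely following the structure of the draft Verifiable Credentials data model. The issuer and subject identifiers and the ageOver property are invented for illustration, and the issuer's cryptographic proof block, which makes the claim verifiable, is omitted.

```python
import json

credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:university",        # the party attesting to the claim
    "issuanceDate": "2019-02-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:alice",             # decentralised identifier of the holder
        "ageOver": 21                          # the derived attribute, not the birth date
    },
    # A real credential also carries a "proof" block signed by the issuer,
    # which is what lets a third party verify the claim without contacting them.
}

print(json.dumps(credential, indent=2))
```

The point of the derived attribute is that the relying party learns only "age over 21", not who issued your passport, where you have travelled, or anything else the issuer happens to know.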
So, who is managing your privacy and identity on the Web?
The W3C Australia Office and the Australian National University are presenting a morning workshop on the topic in Melbourne, Canberra and Sydney in February as part of their Future of the Web roadshow series. Chaired by W3C evangelist Vivienne Conway, we have Marcos Caceres from Mozilla speaking on privacy protection in browsers like Firefox, and James Bligh from Redcrew on the consumer-controlled privacy protection in the new Consumer Data Rights. Kylie Watson from Deloitte will ask what else government can do for its citizens. David Cook from Edith Cowan University speaks up for vulnerable citizens, looking particularly at the trade-offs uniquely required of people with a disability. From the Australian National University we have speakers Lesley Seebeck, Alex Antic, and Alwen Tiu presenting new research on technology in cybersecurity and the privacy it needs to protect for particularly sensitive matters. Grant Noble and David Hyland-Wood of ConsenSys present the emerging notion of self-sovereign identity and the exciting W3C activity making it happen. We round up with a discussion panel chaired by Richard Schutte of startup coLab4, in which we invite your active participation.
For more information and registration, visit https://cecs.anu.edu.au/events/w3c-anu-future-web-who-managing-your-privacy-and-identity-web
11 Feb 2019 6:15am GMT
03 Dec 2018
W3C - Data
The Digital Enterprise - W3C Graph Data Workshop
Data and data services are increasingly strategically important for businesses. This is reflected in initiatives such as the EU's Digitising European Industry Initiative, and claims by McKinsey that by 2025, digitization is expected to contribute $2 trillion to US GDP. Meanwhile, China plans to boost its trillion dollar digital economy to drive job creation in sectors such as big data and artificial intelligence. On 4-6 March 2019, in Berlin, W3C will seek to bridge different communities to create a fresh view of the challenges ahead and the standards that will be needed to overcome them.
The drive to realise the benefits of digitization necessitates addressing the challenge of managing many heterogeneous data sources distributed across the enterprise. Whilst businesses have relied on relational databases for many years, SQL and RDBMS are cumbersome when it comes to rapidly evolving requirements. As a result we have seen the rise of NoSQL databases that address the need for flexible handling of unstructured data. The need to create links across data is fuelling rapid growth in graph database solutions. Unfortunately, there is a lack of portability across these solutions.
The W3C workshop will bring together experts in relational databases, property graphs, RDF/Linked Data, big data, and artificial intelligence and machine learning with a view to forging a shared vision for future needs for graph data, and alignment on graph data query languages. We will discuss what's needed for positioning RDF as an interchange format between different graph database solutions, making RDF easier to use by the vast majority of developers, and opportunities for blending symbolic and statistical approaches for tackling the challenges of real-world data that is incomplete, uncertain, inconsistent and includes errors.
If you are interested in being part of the discussion and helping to shape the future of data on the Web, you are urged to submit a position statement in response to the call for participation, preferably before the seasonal break this month and no later than the hard deadline of Friday, 11 January 2019.
03 Dec 2018 3:15pm GMT
27 Jul 2018
W3C - Data
The World Wide Success That Is XML
Most of the XML Working Groups have been closed by now; this year saw XQuery and XSLT close, their work successfully completed.
As we wind down work on standardizing the XML stack at W3C it's worth looking at some of what we have accomplished and why. W3C XML, the Extensible Markup Language, is one of the world's most widely-used formats for representing and exchanging information. The final XML stack is more powerful and easier to work with than many people know, especially for people who might not have used XML since its early days.
Today, XML tools work with JSON, with linked data, with documents, with large databases (both SQL/relational and NoSQL), with the Internet of Things and in automobiles and aircraft and music players. There are even XML shoes. It's everywhere.
XML can be stored in very efficient databases and processed with a highly optimized query language (XQuery, and its younger cousin JSONiQ), can be transformed with an efficient declarative tree manipulation language (XSLT 3), orchestrated in pipelines (XProc), delivered with one of the most effective compression schemes around (EXI, with low entropy server-side parsing), formatted to PDF with both XSL-FO and CSS, and all of these things can be done both with proprietary applications and with open source software.
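As a taste of the declarative tree manipulation mentioned above, here is a small XSLT transformation driven from Python via the third-party lxml library (which implements XSLT 1.0 rather than XSLT 3); the input document and stylesheet are invented examples.

```python
from lxml import etree  # third-party package: pip install lxml

# Source document: a small list of books.
source = etree.XML("""
<books>
  <book><title>XML in a Nutshell</title></book>
  <book><title>XQuery</title></book>
</books>
""")

# Stylesheet: declaratively turn the book list into an HTML list.
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/books">
    <ul>
      <xsl:for-each select="book">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(str(transform(source)))
```

The stylesheet says what the output should look like for each matching node; the engine worries about how to walk the tree, which is the point of the declarative approach.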
How did we get here?
The Web SGML Working Group was formed to solve a specific problem: to agree on a subset, or profile, of the large and complex SGML specification that could be shared on the Web and displayed in browser plugins. There were two such plugins at the time, one from SoftQuad (Panorama) and one from EBT/Inso that was never released. Unfortunately it was difficult to construct an SGML document that both plugins would display - there was a clear need for a standard.
We were not trying to replace HTML. We weren't even expecting native XML support in Web browsers. Nor were we trying to make a format for interchange of data or for remote procedure calls.
XML has some redundancy in its syntax. We knew from experience with SGML that documents are generally hard to test, unlike program data, and the redundancy helped to catch errors early and could save up to 80% of support costs (we measured it at SoftQuad). The redundancy, combined with grammar-based checking using schemas of various sorts, helped to improve the reliability of XML systems. And the built-in support for multilingual documents with xml:lang was a first, and an enduring success.
XML, XSL-FO, XSLT, XQuery, XML Schema, XProc, EXI, all of these Working Groups included world experts and had strong industry representation. They were guided by experienced chairs.
Most of the work has finished: people are using the specifications in production and the rate of errata has slowed to a crawl. XQuery, XSLT and EXI ended this year. But just because the specification work is ending doesn't mean XML is ending! It means XML is at a stage where the technology is mature and widely deployed. People aren't reporting many new problems because the problems have already been worked out.
For sure some of the more recently-published specifications are still rolling out: XSLT 3 is very recent, but there was good implementation experience when it was published as a Recommendation. EXI Canonicalisation was published as a Recommendation this past June, and because EXI can be used to send just about any stream of parse events over the wire much more efficiently than compressing the interchange syntax, this spec was eagerly awaited.
But for the most part, it's time to sit back and enjoy the ability to represent information, process it, interchange it, with robustness and efficiency. There's lots of opportunities to explore in making good, sensible use of XML technologies.
XML is everywhere.
Thank you to all who have contributed.
Liam Quin, leaving W3C this week after almost 17 years with XML.
27 Jul 2018 2:56am GMT
02 May 2018
W3C - Data
W3C and the W3C Australia Office bring you a GREAT Smart Cities Tour!
If you're in Australia, or have colleagues who are, and haven't registered for this great event, you need to do it soon! We've got a great panel of speakers who are looking at a topic that is HOT in Australia! The good news is if you're a W3C Member, then this event is free - what a deal!
Our world is increasingly being shaped by the vast amount of data being produced in every aspect of our lives. As more devices get connected through the Internet of Things (IoT), harnessing big data in an integrated way can offer valuable insights that can help achieve smart city goals. This comes with important and interesting challenges to solve in order to actualise the smart city vision. Challenges include data collection, integration and privacy.
The World Wide Web Consortium (W3C), in partnership with the Australian National University, invites you to Future of the Web: Data Drives the Smart City. Data Drives the Smart City explores the challenges and progress made in the technology and underpinning standards framework needed to enable smart cities. You will hear from leading experts in the field on how challenges are being tackled.
Dates
- Monday 7 May, Melbourne: ANU House, Level 11, 52 Collins St, Melbourne VIC, 9.00am - 1.30pm (8.30am registration)
- Tuesday 8 May, Canberra: University House, 1 Balmain Cres, Acton ACT, 9.00am - 1.30pm (8.30am registration)
- Thursday 10 May, Sydney: Quest North Ryde, Atlantis, 58-62 Delhi Road, North Ryde NSW, 9.00am - 1.30pm (8.30am registration)
Topics
Topics to be addressed include perspectives from Government, tech industry leadership, Web standards for spatial data and city sensing, technical solutions to privacy management, and smart grid futures. A panel session will discuss capacity building for smart cities.
Speakers
Speakers include Dr Ian Oppermann (NSW Chief Data Scientist), Dr Ole Nielsen (ACT Chief Digital Officer), J. Alan Bird (W3C Global Business Development Lead), Dr Mukesh Mohania (IBM Distinguished Engineer in IBM Research), Dr David Hyland-Wood (Blockchain Protocol Architect, Consensys), Dr Lachlan Blackhall (Entrepreneurial Fellow and Head, Battery Storage and Grid Integration Program), Dr Kerry Taylor (Chair, W3C Spatial Data on the Web), Dr Peter Christen (Professor, Data Mining and Matching, ANU), Christine Cowper (Principal Consultant with Information Integrity Solutions), and Dr Armin Haller (W3C Office Manager, ANU).
Schedule
- 8.30 Registration
- 9:00 Start time
- 12:45 Lunch
- 13:30 Finish
Coffee/tea on arrival, morning tea and lunch will be provided.
See the program details.
Registration: $290. Free to W3C members and ANU staff and alumni.
02 May 2018 12:22am GMT
15 Feb 2018
W3C - Data
ODRL: A Path Well Travelled
In 2000 - the height of the dotcom boom and bust - a small startup in Australia embarked on a journey to build the next generation digital book store (ebooks back then) and promote an open ecosystem for the trading of digital assets. They formed the Open Digital Rights Language (ODRL) Initiative and quickly assembled like-minded organisations to promote ODRL as an open, interoperable industry standard for licence expression.
The Open Digital Rights Language (ODRL) is a policy expression language that provides a flexible and interoperable information model, vocabulary, and encoding mechanisms for representing statements about the usage of content and services. ODRL describes the underlying concepts, entities, relationships, and terms that form the foundational basis for the semantics of ODRL policies. Policies are used to represent permitted and prohibited actions over a certain asset, as well as the agreed obligations required to be met by parties. In addition, policies may be limited by constraints (e.g., temporal or spatial constraints), and duties (e.g. payments) may be imposed on permissions.
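To make the model concrete, the following Python sketch builds a small ODRL policy as JSON-LD: a permission over an asset, limited by a temporal constraint and carrying a duty. The URIs and dates are invented examples; consult the ODRL Information Model and Vocabulary recommendations for the normative terms.

```python
import json

policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",
    "uid": "https://example.org/policy/1010",
    "permission": [{
        "target": "https://example.org/asset/ebook-9983",   # the asset
        "action": "read",                                    # permitted action
        "assigner": "https://example.org/party/publisher",   # granting party
        "assignee": "https://example.org/party/alice",       # receiving party
        "constraint": [{                                     # temporal limit
            "leftOperand": "dateTime",
            "operator": "lt",
            "rightOperand": {"@value": "2019-01-01", "@type": "xsd:date"}
        }],
        "duty": [{"action": "compensate"}]                   # obligation on the assignee
    }]
}

print(json.dumps(policy, indent=2))
```

Permissions, prohibitions, constraints, duties, parties and assets are the entities named in the Information Model; a profile for a particular sector would pin down which actions and operands its community actually uses.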
In 2001, W3C held a Workshop on Digital Rights Management for the Web at which ODRL first floated the idea of forming a Working Group to take the language through the W3C recommendation track, but a Member Submission was the primary outcome.
Years pass by, but the ODRL Initiative continues its work, promoting the language to standards groups and sectors interested in machine-readable rights. The Open Mobile Alliance was the first standards group to adopt the language, for mobile media, and numerous groups in the publishing industry and education followed. A new major version was also drafted to support multiple encodings.
More years pass, and the ODRL Initiative becomes one of the first groups to take up the new W3C Community Group program and becomes the ODRL Community Group (CG) in 2011. The ODRL CG finalises the major new version of the ODRL specifications which now take a broader business view by supporting more generic policies to cover additional industry requirements. The International Press Telecommunications Council (IPTC) becomes one of ODRL's industry partners on the journey.
Finally in 2016, after some "will they, won't they" moments, W3C charters the Permissions & Obligations Expression (POE) Working Group to take the current work of the ODRL CG and gain more industry consensus and semantic improvements with revised specifications. The POE WG worked diligently on their charter and has now delivered the final W3C Recommendations:
The ODRL version 2.2 recommendations include major additions, such as supporting prohibitions and additional duties for obligations, remedies, and consequences of policy usage. Parties and assets now support collections of items, and constraints can now explicitly refine actions and support logical relationships. Semantically, ODRL is now fully based on an RDF ontology, and the vocabulary terms have been updated to meet new use cases. ODRL Profiles - the extensions made by communities - have been greatly improved and are the primary way industry sectors will exchange policies.
The figure below shows the relationship between the key entities of the ODRL Information Model:
The timing of the ODRL recommendation has been fortuitous, as we see so many new opportunities ahead. With the W3C merger with the International Digital Publishing Forum (IDPF) - the organisation that specified the ebook standards - ODRL now can collaborate more closely with the original driver of the language to further address the publishing industry needs.
The European General Data Protection Regulation (GDPR) poses huge challenges for machine-interpretable privacy statements. W3C is hosting an upcoming workshop on Data Privacy Controls and Vocabularies with ODRL Profiles as a topic for modeling personal data and rules in this context.
Using ODRL to express common licenses will provide a popular way for users to instantly express machine-readable licenses. Examples of such licenses are the GNU General Public License, the Creative Commons licenses, and the UK Open Government Licences. ODRL will represent the license text as semantic statements that will also address license compatibility for derivations based on the reuse of multiple digital assets licensed under different terms.
We cannot write a blog post today without mentioning Blockchain, specifically the euphoria around "smart contracts". This could be one of the greatest impacts that ODRL could achieve. Currently, "smart contracts" are written in non-interoperable programming languages that do not support any contract models and are not semantically based. ODRL can fill this void by providing a robust, policy-based business-agreement model to capture the business semantics. There has already been keen interest in using ODRL from the Dot Blockchain Media group and, more recently, KodakONE.
After two years, business-as-usual work now returns to the ODRL Community Group, and they now plan to:
- Promote ODRL V2.2 to existing and new sectors/industries
- Nurture an ODRL implementors community
- Support development of ODRL Profiles (and host for smaller communities)
- Maintain a formal register of ODRL Profiles
- Collaborate with W3C on ODRL errata maintenance
- Plan for future major enhancements to ODRL (V3.0)
It has been a long 18-year journey for some. Every path, every unexpected turn, has been rewarding. As we now savour the final outcomes, we also look forward to the next journey for ODRL.
15 Feb 2018 7:36am GMT
31 Jan 2018
W3C - Data
W3C/ERCIM at Boost 4.0 kick off meeting
W3C/ERCIM is one of fifty organizations participating in the Boost 4.0 European project on big data in Industry 4.0 which kicked off with an initial face to face meeting at the Automotive Intelligence Center in Bilbao on 30-31 January 2018. Boost 4.0 will demonstrate the benefits of big data in Industry 4.0 through pilots by major European manufacturers. W3C's role focuses on standardisation, data governance and certification, with a central role for rich metadata as the basis for semantic interoperability across diverse sources of data. This follows on from W3C's involvement in the Big Data Europe project.
31 Jan 2018 6:31pm GMT
03 Jan 2018
W3C - Data
W3C study on Web data standardization
The Web has had a huge impact on how we exchange and access information. The Web of data is growing rapidly, and interoperability depends upon the availability of open standards, whether intended for interchange within small communities, or for use on a global scale. W3C is pleased to release a W3C study on practices and tooling for Web data standardization, and gratefully acknowledges support from the Open Data Institute and Innovate UK.
A lengthy questionnaire was used to solicit input from a wide range of stakeholders. The feedback will be used as a starting point for making W3C a more effective, more welcoming and sustainable venue for communities seeking to develop Web data standards and exploit them to create value added services.
03 Jan 2018 5:44pm GMT