How to be normal: data models in edtech

What do you call the people who go to school? In real life we use a range of terms - student, pupil, learner, child - and we intuitively understand how they relate to each other. In fact, in written text it’s often considered good form to vary the words used to avoid jarring repetition.

However, in software development, variety is less appealing, particularly when you’re trying to connect multiple products together. Think about it: system X has decided to use the term “learners”, but system Y prefers “students”. So integration would have to involve mapping one concept to another, and that adds cost and complexity. The prize for education technology, therefore, is a normalised data model (meaning a commonly agreed format) which everyone can adopt. This challenge is central to Assembly’s purpose, and this blog post explains why we’re building our own standard, rather than adopting an existing one.
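To make the mapping cost concrete, here is a minimal sketch in Python. Everything in it is hypothetical: "system X" and "system Y" are the placeholder products from the paragraph above, and the field names (`learner_id`, `student_ref` and so on) are invented for illustration rather than taken from any real product.

```python
# Hypothetical records: neither system nor its field names come from a real product.
SYSTEM_X_RECORD = {"learner_id": "X-001", "learner_name": "Ada Lovelace"}
SYSTEM_Y_RECORD = {"student_ref": "Y-777", "full_name": "Alan Turing"}

# Without a shared model, every source system needs its own mapping table
# that translates its field names into a common vocabulary.
FIELD_MAPS = {
    "system_x": {"learner_id": "id", "learner_name": "name"},
    "system_y": {"student_ref": "id", "full_name": "name"},
}


def normalise(source, record):
    """Rename a record's fields using the mapping table for its source system."""
    field_map = FIELD_MAPS[source]
    return {field_map[key]: value for key, value in record.items()}


students = [
    normalise("system_x", SYSTEM_X_RECORD),
    normalise("system_y", SYSTEM_Y_RECORD),
]
```

Each new product that joins the party needs another entry in `FIELD_MAPS`, and someone has to maintain it. If both systems had adopted the same model in the first place, the mapping tables would not be needed at all.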

Before we get to the reasons for our choices, it’s helpful to explain the challenge in a little more detail. When designing a “normal” world for edtech, the concept of “student” is actually the least of the problems. Think for a moment about what students do when they’re at school. First of all they show up in a room and their attendance is recorded. What should a system call that attendance grouping? A tutor group? A registration group? Their class? Should it be different for primary and secondary schools? It’s no mean feat to build a data model that neatly encapsulates these concepts.

At Assembly, we believe that education data has sat in silos for too long (that’s problem 2 on our launch blog, by the way). Schools increasingly want to buy a range of school improvement products, like curriculum and assessment tools, but if these bolt-ons don’t connect to their core Management Information System (or MIS), it can take ages to configure them by re-entering pupil names and so on. This can lead to human error, and doesn’t help schools in their struggle against the data management burden. Moreover, without a central analytics data store, information remains fragmented, limiting the potential for insight.

The Assembly platform solves this problem by connecting education software to school MIS, and by allowing data from multiple systems to be pooled in our platform for deeper data analysis. But to do this we need a standardised data model, so that common concepts can be linked with the minimum of fuss.

We looked long and hard for a data standard that we could use because, well, nobody wants to create a new standard unless they absolutely have to.

_xkcd on how standards proliferate_

However, our conclusion was that while there are some great initiatives out there, none of them is quite right for the problem we’re solving. Take the Systems Interoperability Framework (SIF) for example. This is a global, open-sourced education specification that is gaining increasing traction in the US, Australia and the UK. So you might think that we would just adopt SIF as our data model, right?

Sadly, it’s not that simple, because SIF is not at all simple. It is wonderfully comprehensive, and brilliant for many purposes. But Assembly needs something simple-to-use and school-oriented, whereas SIF has been abstracted to fit a wider range of use cases, and in that process of abstraction it has become more complicated.

To explain what I mean, let’s come back to the question of what to call a student. If I've understood correctly, the SIF 2.0 data model (part of the latest SIF 3.0.0 UK implementation infrastructure) has a generic root concept of a person, and person data is held in an element called PersonalInformation. Then there are three person object types: LearnerPersonal, WorkforcePersonal, and ContactPersonal. So if you’re a developer using SIF, and you want something simple like a list of students along with their names and Unique Pupil Numbers (UPNs), you’d need to look at both the LearnerPersonal object (where you find the UPN) and also the associated PersonalInformation element (which contains the name).

For our customers, that’s overkill. They don’t want to have to type “LearnerPersonal” (15 characters) every time they want to reference a student, and they DEFINITELY don’t want to have to then also join that to PersonalInformation (19 characters) every time they want to include the pupil’s name. Really, they just want the student to be called something intuitive like, say, “Student” (7 characters). And they want the name and UPN fields to be part of the same object.
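Here is a rough sketch of that difference in Python. It is a deliberately simplified illustration, not the real SIF schema or the Assembly data model: only the names LearnerPersonal, PersonalInformation and Student come from the text above, while the field names, the nesting of one inside the other, and the sample UPN are assumptions made for the example.

```python
from dataclasses import dataclass


# Simplified, SIF-style shape (an assumption, not the real schema):
# the name lives in one structure and the UPN in another.
@dataclass
class PersonalInformation:
    name: str


@dataclass
class LearnerPersonal:
    upn: str
    personal_information: PersonalInformation


# The flat shape described above: name and UPN on the same object.
@dataclass
class Student:
    name: str
    upn: str


def to_students(learners):
    """Flatten SIF-style learners into simple Student objects."""
    return [
        Student(name=learner.personal_information.name, upn=learner.upn)
        for learner in learners
    ]


# Made-up sample data; the UPN is a placeholder, not a real pupil number.
learners = [
    LearnerPersonal(
        upn="A000000000001",
        personal_information=PersonalInformation(name="Ada Lovelace"),
    )
]
students = to_students(learners)
```

With the flat shape, a list of names and UPNs is just `[(s.name, s.upn) for s in students]`, with no second lookup.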

To reiterate, we’re big fans of SIF for certain purposes and we like its comprehensiveness - it just doesn’t quite give the school app developer the experience we’re after.

There are other open-source standards out there: for example, Ed-Fi in the US. We like it and we’ve found the team there to be really co-operative - but it’s currently US-focused and therefore not geared towards our initial audience of UK schools.

So we’re coming up with our own common data standard. The Assembly data model is being released in stages, as we add data scopes to our platform, and we’ll also augment it based on customer feedback. We are committed to open-sourcing as much of our IP as possible, and that includes our data model, so you are welcome to borrow and reuse the concepts and terminology within your product as you see fit. We'd also really welcome your feedback - do drop us a line if you have thoughts about how it should develop.

For too long, we’ve all used different terms to describe the same thing. We hope that in edtech, standardisation will become the new normal...