The database that underlies Metaweb is fundamentally different from the relational databases that you may be familiar with. Relational databases store data in the form of tables, but the Metaweb database stores data as a graph of nodes and relationships between those nodes. Relational databases use the SQL query language and accept queries and return results using a specialized network protocol. Metaweb uses the MQL query language and communicates via standard HTTP requests and responses. The bulk of this manual is devoted to explaining MQL and demonstrating its use with the mqlread and mqlwrite web services. But because Metaweb's underlying database technology is new and different, it will be helpful if you understand the fundamental architecture of Metaweb first. This chapter explains the graph-based representation of Metaweb data, then shows how the graph of nodes and relationships can be viewed as a collection of objects. It also covers a number of other important architectural details, explaining properties, types, domains, names, ids, namespaces and access control. If you find this chapter difficult, feel free to skip ahead and skim Chapter 3 and Chapter 4 to get an overview of MQL and the mqlread service. With that context you can return to this chapter to solidify your understanding of the Metaweb architecture.
In addition to the Metaweb graph database, Metaweb servers also implement a content store. The content store is responsible for storing chunks of binary data (in SQL these chunks are called BLOBs) and is tightly integrated with the graph database. Each chunk in the content store has a corresponding node in the graph, and metadata about the content (such as its MIME type) is stored as relationships in the graph. Interestingly, the SHA-2 hash of a chunk is used as the key into the content store, which makes it possible to test whether the store contains a specific chunk without uploading that chunk, and also prevents duplicate entries in the content store. This chapter does not cover the content store; we'll learn how to download data from the content store in Chapter 4 and how to upload data to it in Chapter 6.
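The idea of keying stored chunks by their hash can be sketched in a few lines of Python. This is a minimal in-memory model, not Metaweb's implementation: it assumes SHA-256 as the particular SHA-2 function, and the `ContentStore` class and its methods are hypothetical names chosen for illustration.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory content store keyed by each chunk's SHA-256 hash."""

    def __init__(self) -> None:
        self._chunks = {}  # hex digest -> chunk bytes

    @staticmethod
    def key_for(data: bytes) -> str:
        # The key is computed from the bytes themselves, so a client can compute it too.
        return hashlib.sha256(data).hexdigest()

    def contains(self, key: str) -> bool:
        # A client hashes a chunk locally and asks about the key,
        # without having to upload the chunk first.
        return key in self._chunks

    def put(self, data: bytes) -> str:
        key = self.key_for(data)
        # Identical bytes always hash to the same key, so a second
        # upload of the same chunk does not create a duplicate entry.
        self._chunks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._chunks[key]

    def __len__(self) -> int:
        return len(self._chunks)


store = ContentStore()
first = store.put(b"album cover bytes")
second = store.put(b"album cover bytes")
print(first == second, len(store))  # True 1
print(store.contains(ContentStore.key_for(b"some other chunk")))  # False
```

Because the key depends only on the content, deduplication and the "do you already have this?" check come for free; the store never has to compare chunks byte by byte.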
When viewed at the lowest level, the Metaweb graph is a set of nodes and a set of links or relationships between those nodes. Each node has a unique identifier (so it can be named and referred to) and a record of when and by whom it was created. Other than the id, timestamp and creator, however, the nodes in the graph hold no information themselves. All the interesting data in the database is stored in the form of relationships between nodes (or between nodes and primitive values).
Graphs can be represented visually using circles to represent nodes and arrows between the circles to represent relationships. In this section, however, we will model the relationships in the Metaweb object store as tuples of four pieces of data [2] and we will represent sets of these relationships in tabular form, where each row of a table specifies one tuple.
The table below, for example, represents information about The Police, their album Zenyatta Mondatta, and the song Driven to Tears on that album. The From column identifies the node that is the subject of the relationship. This node could also be called the "source" node or the "left" node. The Property column specifies the kind of relationship being described. (Notice that this complicates the visual representation of a Metaweb graph: each arrow in the diagram must be tagged with a property to specify what kind of relationship it represents.) The To and Value columns specify the object of the relationship. To (which could also be called "target", "destination", or "right") specifies another node, and Value specifies a primitive value such as a string of text, a number or a date. One or both of these columns may have a value, depending on the kind of relationship (the Property column) being described.
| From | Property | To | Value |
|---|---|---|---|
| /en/the_police | /type/object/name | /lang/en | The Police |
| /en/zenyatta_mondatta | /type/object/name | /lang/en | Zenyatta Mondatta |
| /guid/1234 | /type/object/name | /lang/en | Driven to Tears |
| /en/zenyatta_mondatta | /music/album/artist | /en/the_police | |
| /guid/1234 | /music/track/album | /en/zenyatta_mondatta | |
| /guid/1234 | /music/track/length | | 200.266 |
The first three rows of this table describe a /type/object/name relationship, which defines a human-readable name for a node. Thus, the node we've identified as /en/the_police has the name "The Police", and the node /guid/1234 has the name "Driven to Tears". Notice that these rows have a value in the To column as well as the Value column, and that the To column refers to the node identified as /lang/en. That node represents the English language. Human-readable text in Metaweb is always tagged with the language in which it is written. Thus any relationship, such as /type/object/name, that expects a human-readable text value will refer to a language node in the To column and will have a string of text in the Value column.
The fourth row in the table specifies that /en/zenyatta_mondatta has the relationship /music/album/artist with /en/the_police. In English, we might say: "Zenyatta Mondatta is by The Police". The fifth row is similar. It specifies a /music/track/album relationship between two nodes. It says: "Driven to Tears appears on Zenyatta Mondatta". Finally, the sixth row specifies the /music/track/length of Driven to Tears. Note that this relationship is not a link between two nodes, but merely specifies a number in the Value column.
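To make the four-column model concrete, here is a minimal Python sketch that stores the six rows of the table above as records and answers "what does this node have for this property?" by matching the From and Property columns. The `Link` type and `lookup` function are hypothetical illustrations, and the linear scan is used only for clarity.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    """One Metaweb-style tuple. `to` and `value` may each be None."""
    source: str               # the From column
    prop: str                 # the Property column
    to: Optional[str] = None  # the To column (a node id)
    value: object = None      # the Value column (a primitive)


# The six rows of the table above; /guid/1234 is the shorthand id used there.
GRAPH = [
    Link("/en/the_police", "/type/object/name", "/lang/en", "The Police"),
    Link("/en/zenyatta_mondatta", "/type/object/name", "/lang/en", "Zenyatta Mondatta"),
    Link("/guid/1234", "/type/object/name", "/lang/en", "Driven to Tears"),
    Link("/en/zenyatta_mondatta", "/music/album/artist", "/en/the_police"),
    Link("/guid/1234", "/music/track/album", "/en/zenyatta_mondatta"),
    Link("/guid/1234", "/music/track/length", value=200.266),
]


def lookup(node: str, prop: str) -> list:
    """Return (To, Value) pairs for every tuple with this From node and Property."""
    return [(t.to, t.value) for t in GRAPH if t.source == node and t.prop == prop]


print(lookup("/guid/1234", "/type/object/name"))    # [('/lang/en', 'Driven to Tears')]
print(lookup("/guid/1234", "/music/track/album"))   # [('/en/zenyatta_mondatta', None)]
print(lookup("/guid/1234", "/music/track/length"))  # [(None, 200.266)]
```

The three results mirror the three kinds of rows in the table: text (both To and Value), a link between nodes (To only), and a primitive (Value only).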
The Property column of our relationship table has held identifiers like /type/object/name and /music/track/album, and you may have noticed how much these property identifiers look like the node identifiers that appear in the From and To columns. One of the key features of Metaweb is that it does not predefine the kinds of relationships that can exist between nodes. New properties, representing new kinds of relationships, can be defined at any time and by any user. What this means, of course, is that properties are themselves nodes in the Metaweb graph.
Since properties are nodes, they can appear in the From column of a table of tuples, and can have relationships themselves. For example:
| From | Property | To | Value |
|---|---|---|---|
| /type/object/name | /type/object/name | /lang/en | Name |
| /music/album/artist | /type/object/name | /lang/en | Artist |
| /music/track/album | /type/object/name | /lang/en | Appears On |
| /music/album/artist | /type/property/unique | | true |
| /music/track/album | /type/property/unique | | false |
The first row in this table is interestingly self-referential. The /type/object/name property defines its own name as "Name". The second and third rows define names for the /music/album/artist and /music/track/album properties, and it is worth noting that the human-readable names of Metaweb nodes are not always the same as the last component of the node identifier.
The fourth and fifth rows of the table above specify that /music/album/artist is a unique property and that /music/track/album is not. Because /music/album/artist has a /type/property/unique property with value true, Metaweb will not allow any node to have more than one /music/album/artist property. The /type/property/unique property is another unique property: no node can have more than one /type/property/unique relationship.
On the other hand, /music/track/album has /type/property/unique set to false so Metaweb allows a node to have any number of /music/track/album properties (think of songs that are released as singles, then on LPs, and then again on compilation albums). Non-unique properties are common in Metaweb databases, and the group of nodes that are linked to a given node by the same property can be thought of as an unordered set of values.
By default, properties are not unique: that is, a property node that does not have a /type/property/unique property is treated as if it had a /type/property/unique property with the value false.
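The uniqueness rule, including the "not unique" default, can be modeled as a check performed whenever a relationship is added. The sketch below is hypothetical (the `Graph` class, the `is_unique` helper and the ids `/en/hypothetical_compilation` and `/en/hypothetical_artist` are invented for illustration) and only mimics the behavior described above; it does not show how Metaweb actually enforces the constraint.

```python
from collections import defaultdict

# /type/property/unique settings, as in the table above. Properties that are
# absent from this dict have no /type/property/unique tuple at all.
UNIQUE = {
    "/music/album/artist": True,
    "/music/track/album": False,
    "/type/property/unique": True,
}


def is_unique(prop: str) -> bool:
    # No /type/property/unique tuple means "not unique".
    return UNIQUE.get(prop, False)


class Graph:
    """Hypothetical store that refuses a second value for a unique property."""

    def __init__(self) -> None:
        # (From node, Property) -> unordered set of To nodes
        self.links = defaultdict(set)

    def add(self, source: str, prop: str, target: str) -> None:
        existing = self.links[(source, prop)]
        # Re-adding the same target is harmless; a different one is not.
        if is_unique(prop) and existing - {target}:
            raise ValueError(f"{prop} is unique and {source} already has a value")
        existing.add(target)


g = Graph()
g.add("/en/zenyatta_mondatta", "/music/album/artist", "/en/the_police")
# Non-unique: a track may appear on several albums (the second id is hypothetical).
g.add("/guid/1234", "/music/track/album", "/en/zenyatta_mondatta")
g.add("/guid/1234", "/music/track/album", "/en/hypothetical_compilation")
try:
    g.add("/en/zenyatta_mondatta", "/music/album/artist", "/en/hypothetical_artist")
except ValueError as err:
    print(err)
# /music/album/artist is unique and /en/zenyatta_mondatta already has a value
```

Using a set for each (node, property) pair reflects the point above that the values of a non-unique property behave like an unordered set.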
The Metaweb graph is a directed graph: each of the relationships has a direction from the From node to the To node. On a circles-and-lines representation of the nodes and relationships, each of the lines has an arrowhead to indicate the direction of the relationship. Despite the directionality of the links between nodes, Metaweb can traverse those links both forward and backward when searching the database to find results that match a query.
For example, consider the relationship between albums and the tracks that appear on those albums. We saw above that this is represented in the Metaweb graph as a directed link from the track node to the album node. If we ask "what album(s) does Driven to Tears appear on?", the Metaweb database engine searches for tuples that have Driven to Tears in the From column and the /music/track/album property in the Property column, and then reports the value of the To column of any matches it finds.
But it is perfectly possible to write a MQL query that asks "what tracks appear on Zenyatta Mondatta?" (we'll see queries like this many times in Chapter 3 and Chapter 4). In this case, the database engine searches for tuples that have Zenyatta Mondatta in the To column and /music/track/album in the Property column and then reports the value of the From column for any matches it finds.
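Both questions are answered from the same set of tuples; only the column that is matched and the column that is reported change. A minimal Python sketch, assuming the tuples are held in a simple list (`/guid/5678` is a hypothetical id standing in for a second track on the album, and `forward`/`backward` are illustrative names):

```python
# (From, Property, To) tuples. /guid/1234 stands for Driven to Tears;
# /guid/5678 is a hypothetical second track on the same album.
LINKS = [
    ("/guid/1234", "/music/track/album", "/en/zenyatta_mondatta"),
    ("/guid/5678", "/music/track/album", "/en/zenyatta_mondatta"),
    ("/en/zenyatta_mondatta", "/music/album/artist", "/en/the_police"),
]


def forward(node: str, prop: str) -> list:
    """Match the From column, report the To column."""
    return [to for frm, p, to in LINKS if frm == node and p == prop]


def backward(node: str, prop: str) -> list:
    """Match the To column, report the From column."""
    return [frm for frm, p, to in LINKS if to == node and p == prop]


print(forward("/guid/1234", "/music/track/album"))
# ['/en/zenyatta_mondatta']
print(backward("/en/zenyatta_mondatta", "/music/track/album"))
# ['/guid/1234', '/guid/5678']
```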
Because Metaweb is so good at traversing links in either direction, we can usually consider those links to be bi-directional. In fact, properties can be defined to represent links in either direction. The /music/track/album property models the track-to-album link in its forward direction, and it is known as a "master property". But there is also a "reverse property" /music/album/track that represents the same relationship, traversed in the opposite direction. In typical music-related queries, this reverse property is more commonly used – it is the one that allows us to ask about the set of tracks that appear on an album.
A master property and its reverse are known as "reciprocal properties", and the distinction between master and reverse does not usually matter in your queries. We'll see more about reciprocal properties in Section 3.4.4, Section 3.4.5.4 and Section 5.9. For now, we'll just note that the relationship between a property node and its reciprocal property node is defined in the Metaweb graph by the /type/property/reverse_property property [3]:
| From | Property | To | Value |
|---|---|---|---|
| /music/track/album | /type/property/reverse_property | /music/album/track | |
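Because a reverse property names the same links as its master, a lookup on /music/album/track can be answered by a backward search on /music/track/album. The sketch below assumes that only master-property tuples are stored and uses the reverse_property tuple above to translate between the two; `values`, `REVERSE_PROPERTY`, `MASTER_OF` and `/guid/5678` are hypothetical.

```python
# Master-property tuples (From, Property, To); /guid/5678 is hypothetical.
LINKS = [
    ("/guid/1234", "/music/track/album", "/en/zenyatta_mondatta"),
    ("/guid/5678", "/music/track/album", "/en/zenyatta_mondatta"),
]

# The /type/property/reverse_property tuple from the table above:
# master property -> its reverse property.
REVERSE_PROPERTY = {"/music/track/album": "/music/album/track"}
# Inverted, so that a reverse property can be mapped back to its master.
MASTER_OF = {rev: master for master, rev in REVERSE_PROPERTY.items()}


def values(node: str, prop: str) -> list:
    """Follow prop from node, whether prop is a master or a reverse property."""
    if prop in MASTER_OF:
        # Reverse property: search the To column of the master's tuples.
        master = MASTER_OF[prop]
        return [frm for frm, p, to in LINKS if p == master and to == node]
    # Master (or any other) property: search the From column.
    return [to for frm, p, to in LINKS if p == prop and frm == node]


print(values("/en/zenyatta_mondatta", "/music/album/track"))
# ['/guid/1234', '/guid/5678']
print(values("/guid/1234", "/music/track/album"))
# ['/en/zenyatta_mondatta']
```

From the caller's point of view the two reciprocal properties look equally "real", which is why the master/reverse distinction rarely matters in queries.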
Some properties are a fundamental part of the Metaweb architecture and have a well-defined meaning that is enforced by the Metaweb implementation. /type/object/name, for example, defines a human-readable name for a node, and Metaweb enforces an important restriction: nodes can have multiple names, but only one name per language. (We'll learn more about the names of nodes later in this chapter.) /type/property/unique is another property that is part of Metaweb's architecture: this property defines something fundamental about the behavior of another property, and Metaweb's behavior depends on its value. In general, if a property id begins with /type, that is a good hint that it has architectural significance.
The vast majority of properties are not like this, however. The /music/album/artist property, for example, is not part of the Metaweb architecture at all – it was defined on freebase.com to model knowledge about music. The Metaweb implementation knows nothing about this property, and its interpretation is ultimately up to the person who is setting or querying its value (or who is writing the application that sets or queries its value). The id /music/album/artist gives us a hint about how to interpret the property and that hint is reinforced by the fact that the property has "Artist" as its /type/object/name. We can even obtain explicit instructions about the interpretation of /music/album/artist by inspecting its /freebase/documented_object/tip property, whose value is "albums recorded primarily by this artist (direct credit or under a pseudonym, but not as part of a band)".
Another way to say this is that the meaning of /type/property/unique is defined by the behavior of the Metaweb implementation, but that the meaning of /music/album/artist exists only in the minds of its users. As far as Metaweb is concerned, /music/album/artist is just another node in the graph.
In an example above, we used the identifier /en/the_police to refer to a node that has the /type/object/name "The Police". This section explains the important differences between names and identifiers.
The /type/object/name property defines a human-readable name for a node. The value of the property includes both the text and the language of the name. As we saw above:
| From | Property | To | Value |
|---|---|---|---|
| /en/the_police | /type/object/name | /lang/en | The Police |
Nodes can have more than one name, but Metaweb enforces an important constraint: a node can have only one name in any given language. (Use /common/topic/alias to define any number of nicknames, in any language, for a node.) As an example, the following table shows hypothetical English, French and Spanish names (and aliases) for the /lang/en node:
| From | Property | To | Value |
|---|---|---|---|
| /lang/en | /type/object/name | /lang/en | English |
| /lang/en | /type/object/name | /lang/fr | Anglais |
| /lang/en | /type/object/name | /lang/es | Inglés |
| /lang/en | /common/topic/alias | /lang/en | American English |
| /lang/en | /common/topic/alias | /lang/en | British English |
| /lang/en | /common/topic/alias | /lang/en | Canadian English |
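The difference between the one-name-per-language constraint and unrestricted aliases can be sketched as two small operations on a per-node record. This is a hypothetical model (the `NodeNames` class and the conflicting name "Anglo" are invented), which simply rejects a second name in a language the node already has a name for:

```python
from collections import defaultdict


class NodeNames:
    """Hypothetical per-node record: one name per language, any number of aliases."""

    def __init__(self) -> None:
        self.names = {}                   # language id -> the single name in that language
        self.aliases = defaultdict(list)  # language id -> list of aliases

    def set_name(self, lang: str, text: str) -> None:
        current = self.names.get(lang)
        if current is not None and current != text:
            raise ValueError(f"node already has a {lang} name: {current!r}")
        self.names[lang] = text

    def add_alias(self, lang: str, text: str) -> None:
        # /common/topic/alias carries no one-per-language restriction.
        self.aliases[lang].append(text)


english = NodeNames()
english.set_name("/lang/en", "English")
english.set_name("/lang/fr", "Anglais")
english.set_name("/lang/es", "Inglés")
for alias in ("American English", "British English", "Canadian English"):
    english.add_alias("/lang/en", alias)
try:
    english.set_name("/lang/en", "Anglo")  # a hypothetical second English name
except ValueError as err:
    print(err)  # node already has a /lang/en name: 'English'
```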
Names are not unique and not expected to be: it is common for multiple nodes to have the same name. The nodes with ids /lang/en, /en/english_people, and /authority/gnis/57724 represent a language, a nationality, and a town in Arkansas, and all have the name "English".
When you ask Metaweb for the name of a node without specifying a language, it returns only the node's name in your preferred language, fostering the convenient illusion that the node has only a single name. Human-readable names exist for the convenience of the human users of Metaweb, and although they are treated specially by the Metaweb architecture, they are not really fundamental to that architecture.
Identifiers, on the other hand, are fundamental. Identifiers consist of three parts: the node that is identified, a namespace and a key within the namespace. The identifier /en/the_police identifies a node in the graph that represents the band The Police. It has namespace /en and key "the_police". The identifier /type/object/name identifies a property node with namespace /type/object and key "name". /type/object is itself an identifier, with namespace /type and key "object". We can also turn this around and say that the namespace /lang and key "en" combine to define the identifier /lang/en. We'll return to the notion of an identifier as a node, a namespace, and a key in Section 2.5.9.
A key can be considered a "local name" within a namespace, and the key plus namespace pair can be called a "fully-qualified name" to distinguish it from a "human-readable name". Usually, however, the pair is simply called an identifier or id. As you have seen, Metaweb identifiers are typically written in "flat form" as strings that use a slash character to separate the namespace from the key. This means that identifiers look something like Unix filenames or URLs. Identifiers are intended for use by developers and are often used in code, which is why they appear in code font in this manual.
The important thing about identifiers is that they are unique. Metaweb never allows a namespace to contain duplicate keys, which means that two nodes will never have the same identifier. A node can be associated with more than one namespace/key pair, but any given namespace and key can only be used once, and can only refer to a single node. Note that identifiers are not immutable: nodes can be given new identifiers, and identifiers can be altered so that they refer to new nodes.
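The uniqueness guarantee belongs to the (namespace, key) pair rather than to the node, which a small registry keyed by that pair captures. A minimal sketch under stated assumptions: `KeyRegistry` is hypothetical, `#0001` and `#0002` are placeholder guids, and `/en/police` is an invented second key used only to show that one node may have several ids.

```python
class KeyRegistry:
    """Hypothetical registry mapping (namespace, key) pairs to node guids."""

    def __init__(self) -> None:
        self._owner = {}  # (namespace id, key) -> guid

    @staticmethod
    def flat(namespace: str, key: str) -> str:
        # ("/", "en") -> "/en"; ("/en", "the_police") -> "/en/the_police"
        return namespace.rstrip("/") + "/" + key

    def add_key(self, namespace: str, key: str, guid: str) -> None:
        owner = self._owner.get((namespace, key))
        if owner is not None and owner != guid:
            raise ValueError(f"{self.flat(namespace, key)} already identifies {owner}")
        self._owner[(namespace, key)] = guid

    def remove_key(self, namespace: str, key: str) -> None:
        # Ids are not immutable: a key can be dropped and later reused for another node.
        del self._owner[(namespace, key)]

    def ids_of(self, guid: str) -> list:
        return sorted(self.flat(ns, k) for (ns, k), g in self._owner.items() if g == guid)


reg = KeyRegistry()
reg.add_key("/en", "the_police", "#0001")  # placeholder guid
reg.add_key("/en", "police", "#0001")      # hypothetical second key for the same node
print(reg.ids_of("#0001"))                 # ['/en/police', '/en/the_police']
try:
    reg.add_key("/en", "the_police", "#0002")
except ValueError as err:
    print(err)                             # /en/the_police already identifies #0001
```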
Identifiers like /en/the_police are not human-readable names, but are comprehensible to technically-savvy English-speakers, like the readers of this manual. Many nodes in a Metaweb graph (and most nodes on freebase.com) have identifiers of this kind. Those that don't can be referred to using identifiers in the namespace /guid. The key that follows /guid is a string of hexadecimal digits that serves as a globally-unique identifier (or "guid") for a node. The song "Driven to Tears" from the album Zenyatta Mondatta, for example, has the identifier /guid/9202a8c04000641f800000000129a87a (we called it /guid/1234 for brevity earlier in the chapter).
Every node in a Metaweb graph has a numeric guid. The guid uniquely identifies the node and is immutable – the guid of a node can never change. The guid is the fundamental identity of a Metaweb node, and at the implementation level, the tuples that define the Metaweb graph refer to nodes (in the From, Property, and To columns) by their guids, not by the identifiers that we've shown in our tables.
Since the guid is the fundamental identity of a node, the identifier of a node is not fundamental. In fact, identifiers are defined by the /type/object/key property, much as names are defined by the /type/object/name property. Here, for example, are tuples that define the identifiers /en/the_police and /en/zenyatta_mondatta.
| From | Property | To | Value |
|---|---|---|---|
| /en/the_police | /type/object/key | /en | the_police |
| /en/zenyatta_mondatta | /type/object/key | /en | zenyatta_mondatta |
| /en | /type/object/key | / | en |
The /type/object/key property expects values in both the To and Value columns. The To column is a reference to the namespace of the identifier being defined, and the Value column holds the key within the namespace. Namespaces are themselves nodes, and they have identifiers. The third row of the table above shows that the identifier of the /en namespace is defined by the "en" key within the special root namespace /. For clarity, these example tuples use ids in each of the columns, which means that they define and use an identifier at the same time. A more accurate representation of the underlying graph would use guids in place of these ids.
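Since ids are defined by /type/object/key tuples, a flat id such as /en/the_police can be resolved by starting at the root namespace and following one key at a time. In this sketch the guids (`#root`, `#en`, `#police`, `#zenyatta`) are readable placeholders rather than real Metaweb guids, and `resolve` is a hypothetical helper:

```python
# /type/object/key tuples as (node, namespace, key), with nodes named by guid.
# The guids are readable placeholders, not real Metaweb guids.
ROOT = "#root"  # the special root namespace, written "/"
KEY_TUPLES = [
    ("#en", ROOT, "en"),                        # defines /en
    ("#police", "#en", "the_police"),           # defines /en/the_police
    ("#zenyatta", "#en", "zenyatta_mondatta"),  # defines /en/zenyatta_mondatta
]


def resolve(flat_id: str) -> str:
    """Turn a flat id such as /en/the_police into a guid, one key at a time."""
    node = ROOT
    path = flat_id.strip("/")
    if not path:
        return node  # the id "/" names the root namespace itself
    for key in path.split("/"):
        # A namespace never holds duplicate keys, so at most one tuple matches.
        matches = [n for n, ns, k in KEY_TUPLES if ns == node and k == key]
        if not matches:
            raise KeyError(f"no key {key!r} in namespace {node}")
        node = matches[0]
    return node


print(resolve("/en/the_police"))  # #police
print(resolve("/en"))             # #en
```

Each step uses the node found so far as the namespace for the next key, which is exactly the "namespaces are themselves nodes" point made above.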
A proper understanding of names, ids and namespaces is critical to understand Metaweb, and we'll review and explore them in more depth in Section 3.3 and Section 5.11.
Until now we've been describing the Metaweb database as a graph of nodes and relationships, and have been carefully avoiding the word "object". But that word has appeared many times in property ids like /type/object/key and /freebase/documented_object/tip. While it is important to understand that at a low level Metaweb databases consist of tuples that define relationships between nodes, it is usually helpful to view those nodes and relationships at a higher level through an object-oriented filter.
According to this object-oriented view, the nodes in the graph define objects, and the relationships in the graph define properties of those objects. Instead of thinking about the relationship between The Police, Zenyatta Mondatta and Driven to Tears in terms of tuples, we might think of them using pseudo-code [4] like this:
```
{
  id: "/en/the_police",
  name: "The Police",
  /music/artist/album: {
    id: "/en/zenyatta_mondatta",
    name: "Zenyatta Mondatta",
    /music/album/track: {
      name: "Driven to Tears",
      /music/track/length: 200.266
    },
    /music/album/track: {
      name: "Canary in a Coalmine",
      /music/track/length: 146.506
    }
  }
}
```
In this view of the data, we see an object with id /en/the_police and name "The Police". (We're using properties id and name as shorthand for the "universal" /type/object/id and /type/object/name properties that are shared by all objects. We'll have more to say about these universal properties below.) That object has an album named "Zenyatta Mondatta", and that album has two tracks (others are omitted here for simplicity) named "Driven to Tears" and "Canary in a Coalmine". Each of those track objects has a length property that specifies its length in seconds. Note that this particular view of the data relies on the properties /music/artist/album and /music/album/track. These are the reverse of the /music/album/artist and /music/track/album properties that we saw in the nodes-and-relationships view of the data.
In order to complete the object-oriented view of Metaweb nodes, we have to introduce the notion of types. A Metaweb type is a collection of related properties, and a Metaweb object can be an instance of one or more types. /en/the_police is an instance of the type /music/artist, /en/zenyatta_mondatta is an instance of /music/album, and the object that represents the track Driven to Tears is an instance of /music/track. Like properties, types are represented as nodes in the graph. Every property node is an instance of /type/property, and every type node is an instance of /type/type (which means that /type/type is an instance of itself!) Property objects use the type of which they are a part as the namespace in which their id is defined. So the properties of /music/track, such as /music/track/album and /music/track/length, have ids that begin with /music/track.
Types and properties are related via the (unique) /type/property/schema property, which defines the relationship between a property and the type of which it is a part. The reverse property /type/type/properties represents the (non-unique) relationship between a type and all of its properties.
There are two other, more important, properties that involve types. The first is /type/object/type. Like /type/object/name and /type/object/key, this is a universal property that is used on practically all objects in the database. It is non-unique, and it defines the types that an object belongs to. If we ask the Metaweb engine at freebase.com about the types of the /en/the_police object, for example, we find that in addition to /music/artist, that object is also an instance of /common/topic, /music/producer, and /music/musical_group.
The /type/object/type property is important for a couple of reasons. First, the type of an object tells us what properties it is likely to have. For example, if we know that an object is a /music/artist, we know it makes sense to ask about the /music/artist/album property of the object. Second, the type of an object is a useful disambiguator. If we ask Metaweb for objects named "English", we will likely find many. We can substantially narrow our search by asking for objects named "English" that are also instances of /type/lang.
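The disambiguation idea can be sketched as a filter on /type/object/type. The index below is hypothetical: the /type/lang type of /lang/en follows from the text, but the /common/topic types on the other two nodes are assumptions made only for illustration, and `named` is an invented helper.

```python
# Hypothetical mini-index: node id -> (English name, set of type ids).
# Only /lang/en's type comes from the text; /common/topic on the other
# two nodes is an illustrative assumption.
NODES = {
    "/lang/en": ("English", {"/type/lang"}),
    "/en/english_people": ("English", {"/common/topic"}),
    "/authority/gnis/57724": ("English", {"/common/topic"}),
}


def named(name, type_id=None):
    """Ids of nodes with this name, optionally restricted to instances of type_id."""
    return sorted(
        node for node, (node_name, types) in NODES.items()
        if node_name == name and (type_id is None or type_id in types)
    )


print(named("English"))                # ['/authority/gnis/57724', '/en/english_people', '/lang/en']
print(named("English", "/type/lang"))  # ['/lang/en']
```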
In addition to /type/object/type, there is one other very important type-related property. The (unique) /type/property/expected_type property of any property specifies the type of the value of that property. The expected type of the /music/artist/album property, for example, is /music/album, and the expected type of /music/album/track is /music/track.
The addition of types to our object-oriented view of the Metaweb graph allows us to simplify our pseudo-code representation of it. Compare this object representation with the one at the beginning of this section:
```
{
  id: "/en/the_police",
  type: "/music/artist",
  name: "The Police",
  album: {
    id: "/en/zenyatta_mondatta",
    name: "Zenyatta Mondatta",
    track: {
      name: "Driven to Tears",
      length: 200.266
    },
    track: {
      name: "Canary in a Coalmine",
      length: 146.506
    }
  }
}
```
We've added a type property to the outermost object, specifying that it is a /music/artist. This allows us to use the simple property name album instead of /music/artist/album. Furthermore, since we know that the expected type of /music/artist/album is /music/album, we can use the property name track instead of /music/album/track. For the same reason, we've shortened /music/track/length to length.
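The shortening works because each short name can be looked up in the schema of a known type, and the expected type of that property then says which type to use for the nested object. A minimal Python sketch of expanding the short names back to full ids, assuming a hypothetical `SCHEMA` fragment and using a list where the pseudo-code repeats the track entry; `expand` is an illustrative helper, not part of MQL:

```python
from typing import Optional

# Hypothetical schema fragment: type -> {short name: (property id, expected type)}.
SCHEMA = {
    "/music/artist": {"album": ("/music/artist/album", "/music/album")},
    "/music/album": {"track": ("/music/album/track", "/music/track")},
    "/music/track": {"length": ("/music/track/length", "/type/float")},
}
UNIVERSAL = {"id", "name", "type"}  # /type/object properties, always unqualified


def expand(obj: dict, type_id: Optional[str] = None) -> dict:
    """Rewrite short property names as full ids.

    The object's own "type" (if present) says where to look up its properties;
    otherwise the expected type of the property that led here is used.
    """
    type_id = obj.get("type", type_id)
    out = {}
    for short, value in obj.items():
        if short in UNIVERSAL:
            out[short] = value
            continue
        prop_id, expected = SCHEMA[type_id][short]
        if isinstance(value, dict):
            out[prop_id] = expand(value, expected)
        elif isinstance(value, list):
            out[prop_id] = [expand(v, expected) if isinstance(v, dict) else v for v in value]
        else:
            out[prop_id] = value
    return out


police = {
    "id": "/en/the_police",
    "type": "/music/artist",
    "name": "The Police",
    "album": {
        "id": "/en/zenyatta_mondatta",
        "name": "Zenyatta Mondatta",
        "track": [  # a list stands in for the repeated track entries
            {"name": "Driven to Tears", "length": 200.266},
            {"name": "Canary in a Coalmine", "length": 146.506},
        ],
    },
}
full = expand(police)
print(full["/music/artist/album"]["/music/album/track"][1]["/music/track/length"])  # 146.506
```

Only the outermost object needs an explicit type; every nested type is inferred from an expected type.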
Both the /type/object/type and /type/property/expected_type properties are a very useful part of the object-oriented view of Metaweb, but they are not fundamental to the nodes-and-relationships view. A node in the graph can have a relationship described by a property p even if that node does not have a /type/object/type relationship with the type that defines p. That is, an object can use a property without "declaring" itself to be a member of the type that defines the property. Properties like /type/object/name, for example, are commonly used on objects, but /type/object/type is never set to /type/object. Similarly, it is perfectly possible (and not uncommon) to define a /common/topic/alias property on an object without setting /type/object/type to /common/topic.
Also, the expected type of a property is only the expected type. The open and fluid nature of the Metaweb graph means that Metaweb cannot guarantee that the values of a property will always be members of the expected type.
/type/object is a special type that serves just to define a namespace for a set of special properties that can be used with any object. The properties with ids in the /type/object namespace are commonly used in queries on any object and are typically written in MQL queries with unqualified names – we speak of the name property and the key property, for example, instead of /type/object/name and /type/object/key. Because of the universality of this type, objects should not have their type set to /type/object. Similarly, properties should not have their expected type set to /type/object.
The following are the universal properties defined by /type/object. The most important ones have already been introduced, but they are all listed here for completeness:
name
This property defines human-readable names for the object, suitable for display to the end users. Each name is a /type/text value which holds a string and defines the human language in which it is written. The name property is special in two ways:
- An object may have more than one name, but may only have one name per language. That is, it can have only one English name, only one French name, and so on.
- When querying Metaweb, you may treat the name property as if it were a single /type/text value rather than a set of values. Metaweb will automatically return the object's name (if it has one) in your preferred language. Because of this special feature, the name property has /type/property/unique set to true.
key
This property defines identifiers for the object. These identifiers are intended for use by developers and scripts and are not typically displayed to end users. Each key property specifies a namespace object and a name within the namespace. Metaweb guarantees that no two objects will ever have the same identifier.
id
The id property is used to uniquely identify an object using an identifier defined by one of its keys. Identifiers are written as strings with slash characters between namespaces and names. "/type/object" is an id value, as are "/en/the_police" and "/user/docs/music/note". If you query the id of an object that has more than one key, it is unspecified which one is returned. If you query the id of an object with no keys, the value returned is a synthetic id formed by removing the hash character from the object's guid and prefixing it with "/guid/". This property is read-only, but you can define new ids for or alter existing ids of an object with the key property.
type
This property defines the types associated with the object. The object can be viewed as an instance of any of these types. Each type is itself a Metaweb object, of /type/type.
timestamp
This unique read-only property specifies when the object was created.
creator
This unique read-only property specifies which Metaweb user created the object. It has an expected type of /type/user.
permission
This unique read-only property is a link to a /type/permission object. A permission object specifies which Metaweb usergroups are allowed to alter the object. See Section 2.7 for more on users, usergroups and permissions.
guid
Every object in a Metaweb database has a globally unique identifier or guid. The guid property specifies the unique identifier for an object. A guid is a long string of hexadecimal digits following the hash character, and might look like this: #9202a8c04000641f800000000006df1b. No two objects will ever have the same value of the guid property. This property is read-only, and its use is discouraged: you should usually use the id property instead.
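The relationship between the guid property and the synthetic id described under id is simple string manipulation. A sketch, assuming a guid is always "#" followed by hexadecimal digits as in the examples above; `guid_to_id` and `id_to_guid` are hypothetical helpers, not Metaweb functions:

```python
import re

# Assumption: a guid is "#" followed by hexadecimal digits, as in the examples above.
GUID = re.compile(r"#([0-9a-fA-F]+)")


def guid_to_id(guid: str) -> str:
    """Build the synthetic /guid/ id: drop the '#' and prefix '/guid/'."""
    match = GUID.fullmatch(guid)
    if match is None:
        raise ValueError(f"not a guid: {guid!r}")
    return "/guid/" + match.group(1)


def id_to_guid(guid_id: str) -> str:
    """Invert guid_to_id for ids in the /guid namespace."""
    prefix = "/guid/"
    hex_part = guid_id[len(prefix):]
    if not guid_id.startswith(prefix) or GUID.fullmatch("#" + hex_part) is None:
        raise ValueError(f"not a /guid/ id: {guid_id!r}")
    return "#" + hex_part


print(guid_to_id("#9202a8c04000641f800000000129a87a"))
# /guid/9202a8c04000641f800000000129a87a
```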
The Metaweb type system does not include an inheritance mechanism. The /type/object type is not the supertype of other types; it is simply a set of very general properties that are useful on any object. Although Metaweb types do not form an inheritance hierarchy, they can be categorized as illustrated in Figure 2.1.
Figure 2.1. Categories of Metaweb types
```
                      +--/type/id
                      |
                      +--/type/int
                      |
                      +--/type/float
                      |
                      +--/type/boolean
                      |
      +--Value Types--+--/type/text
      |               |
      |               +--/type/rawstring
      |               |                     +--/restaurant domain
      |               +--/type/uri          |
      |               |                     +--/location domain
      |               +--/type/datetime     |
      |               |                     +--/film domain   +-/music/track
Types-+               +--/type/key          |                 |
      |                                     +--/music domain--+-/music/album
      |                                     |                 |
      |               +--Freebase Types-----+--/book domain   +-/music/artist
      |               |                     |
      |               |                     +--etc.
      |               |
      +--Object Types-+--Core Types (/type domain)
                      |
                      +--Common Types (/common domain)
                      |
                      +--User-defined types-+--/user/joe/default_domain
                                            |
                                            +--/user/joe/music
```
Metaweb defines a small set of value types that represent primitives such as numbers, strings, dates and booleans. These value types are described in Section 2.5. All other types are object types. Types are organized into domains, which are simply collections of related types. Like properties and types, domains are represented by Metaweb objects and these domain objects serve as the namespaces for the types they include.
Core types that are fundamental to Metaweb are in the /type domain. This domain includes the value types plus fundamental object types such as /type/type and /type/property. Other commonly useful (but not so fundamental) types are part of the /common domain. Section 2.6 describes the object types in the /type domain plus the most important types in the /common domain. It also discusses a few other domains that contain commonly used types.
In addition to these core and common domains, freebase.com defines many other domains, such as /film, /finance, /government and /chemistry for general knowledge representation. You can browse these domains at http://www.freebase.com/site/data, and we'll continue to make heavy use of types from the /music domain in Chapter 3 and Chapter 4.
Finally, every Metaweb user has a domain in which they can define their own types. (Object types only, however: users cannot define new value types.) If your Metaweb username is "joe", then you have a domain /user/joe/default_domain. The freebase.com client also allows you to define additional domains in the /user/joe namespace. Chapter 5, for example, will ask you to create a personal domain named /user/joe/music.
Like many programming languages, Metaweb draws a distinction between objects and primitive values such as numbers, dates and strings. When we view the Metaweb graph as a set of tuples, we see that some tuples have a reference to another node in the To column and some have a primitive value in the Value column instead. If a property has an expected type that is a value type, then a tuple involving that property will have a value in the Value column. On the other hand, if a property has an expected type that is an object type then a tuple involving that property will have no value in the Value column.
Metaweb defines nine value types. Like all Metaweb types, value types are identified by type objects such as /type/int (for the value type that represents integer values). The sub-sections that follow explain each of the value types in detail. We begin, however, with a short discussion of value types and properties.
In Chapter 3, we'll learn that there are two ways to ask for the value of a property in MQL. Think of the /music/track/length property: it represents the duration of a track and has an expected type of /type/float, which is a value type representing a floating-point number. If we use the first MQL query technique to ask for the length of a particular song, we simply get a single number back. If we use the other technique, MQL will pretend that the value is a simple object with two properties:
value
this property holds the primitive value: a number in this case.
type
this property specifies the type of the primitive: /type/float in this case.
When queried in this way, all value types appear to have these two properties. Keep in mind, however, that these are not true properties: MQL simply allows value types to behave as if they have value and type properties. These properties are represented by the /type/value/value and /type/value/type objects. /type/value is nominally a type, but is never used as one: like /type/object it exists only to group a set of related properties. /type/object defines the universal properties of object types, and /type/value defines the universal properties (only two of them) of value types.
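To make the two shapes concrete, here is a small Python sketch that builds both forms of query for a track's length as plain dictionaries and unwraps whichever result shape comes back. The track name is a placeholder, and the second form asks explicitly for value and type rather than relying on any default expansion; Chapter 3 gives the authoritative MQL syntax.

```python
import json

# Placeholder track name; any /music/track object would do.
TRACK = "Hypothetical Track"

# Technique 1: ask for the property with null and get a bare number back.
simple_query = {"type": "/music/track", "name": TRACK, "length": None}

# Technique 2: ask for the synthetic value and type properties explicitly.
expanded_query = {
    "type": "/music/track",
    "name": TRACK,
    "length": {"value": None, "type": None},
}


def primitive(result_value):
    """Return the primitive value, whichever query shape produced it."""
    if isinstance(result_value, dict):
        return result_value["value"]  # {"value": ..., "type": "/type/float"}
    return result_value  # already a bare number


print(json.dumps(simple_query))
print(json.dumps(expanded_query))
```

Either way the application ends up with the same floating-point number; the expanded form simply carries the type id alongside it.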
If a property has an expected type of /type/text or /type/key, then any tuple involving that property will have values in both the To and Value columns. /type/text and /type/key are considered value types, but are really something of a hybrid between object types and value types. In addition to the synthetic value and type properties, each of these types has a real property as well: /type/text/lang specifies the human language of a string of text and /type/key/namespace specifies the namespace of an id. Further details are in Section 2.5.5 and Section 2.5.9.
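The three cases above can be summarized with a hypothetical in-memory model of a tuple. This is only an illustration of which columns are filled, not Metaweb's storage format; the node labels are invented, and the timestamp and creator elements of each tuple are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Union

Primitive = Union[int, float, bool, str]


@dataclass(frozen=True)
class Link:
    """Hypothetical tuple: source, property, To, Value (timestamp/creator omitted)."""
    source: str
    prop: str
    to: Optional[str] = None            # reference to another node, or None
    value: Optional[Primitive] = None   # primitive value, or None


def column_kind(link: Link) -> str:
    """Classify a tuple by which of the To and Value columns it uses."""
    if link.to is not None and link.value is not None:
        # /type/text (To = language) or /type/key (To = namespace)
        return "hybrid"
    if link.to is not None:
        return "object"   # expected type is an object type
    return "value"        # expected type is a value type


print(column_kind(Link("TRACK_1", "/music/track/length", value=245.0)))
print(column_kind(Link("TRACK_1", "/type/object/name",
                       to="/lang/en", value="Hypothetical Track")))
```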
Values of this type are signed integers. Metaweb uses a 64-bit representation internally, which means that the range of valid values of /type/int is from -9223372036854775808 to 9223372036854775807. An integer literal is simply an optional minus sign followed by a sequence of decimal digits. Metaweb does not support octal or hexadecimal notation for integers, nor does it allow the use of exponential notation for expressing integers.
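The literal syntax and range rules for /type/int are simple enough to check with a regular expression plus a bounds test. The following is a minimal Python sketch of the rules stated above; the function name is hypothetical and not part of any Metaweb API.

```python
import re

INT_MIN, INT_MAX = -(2 ** 63), 2 ** 63 - 1

# Optional minus sign, then ASCII decimal digits only. [0-9] is used instead
# of \d because \d also matches non-ASCII digits in Python str patterns.
_INT_LITERAL = re.compile(r"-?[0-9]+")


def parse_int_literal(text: str) -> int:
    """Parse a /type/int literal, rejecting hex, octal-style prefixes and exponents."""
    if not _INT_LITERAL.fullmatch(text):
        raise ValueError(f"not a /type/int literal: {text!r}")
    n = int(text)
    if not INT_MIN <= n <= INT_MAX:
        raise ValueError(f"outside the 64-bit range: {text}")
    return n
```

The regular expression does the syntactic work, so Python's more permissive int() (which accepts underscores, surrounding whitespace and a leading plus sign) never sees anything Metaweb would reject.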
Values of this type are signed numbers that may include an integer part, a fractional part, and an order of magnitude (a power of ten by which the integer and fractional parts are multiplied). Metaweb uses the 64-bit IEEE-754 floating point representation, which supports magnitudes between 10⁻³²⁴ and 10³⁰⁸. C and Java programmers may recognize this as the double datatype. Metaweb does not support the special values Infinity and NaN, however.
A literal of /type/float consists of an optional minus sign, an optional integer part, an optional decimal point and fractional part, and an optional exponent. The integer and fractional parts are simply strings of decimal digits. The exponent begins with the letter e or E, followed by an optional minus sign and one to three digits. The following are all valid /type/float literals:
1.0       # integer and fractional part
1         # integer part alone
.0        # fractional part alone
-1        # minus sign allowed as first character
1E-5      # exponent: 1 × 10⁻⁵, or 0.00001
5.98e24   # mass of the Earth in kg: 5.98 × 10²⁴
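The grammar above translates directly into a regular expression. This Python sketch assumes that a decimal point must be followed by a fractional part (so 1. is rejected) and rejects literals that overflow to infinity; the helper name is hypothetical.

```python
import math
import re

# Optional minus; then digits with an optional ".digits", or ".digits" alone;
# then an optional exponent: e or E, an optional minus, one to three digits.
_FLOAT_LITERAL = re.compile(
    r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE]-?[0-9]{1,3})?"
)


def parse_float_literal(text: str) -> float:
    """Parse a /type/float literal under the grammar described in the text."""
    if not _FLOAT_LITERAL.fullmatch(text):
        raise ValueError(f"not a /type/float literal: {text!r}")
    value = float(text)
    # Infinity and NaN are not supported; a literal such as 1e999 would
    # otherwise silently become inf.
    if not math.isfinite(value):
        raise ValueError(f"out of range for /type/float: {text}")
    return value
```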
There are an infinite number of real numbers, and a 64-bit representation can only describe a finite subset of them. Any number with 12 or fewer significant digits can be stored and retrieved exactly with no loss of precision. Numbers with more than 12 significant digits may have those digits truncated when they are stored in Metaweb.
There are only two values for this type; they represent the boolean truth values true and false. Note that Metaweb sometimes uses the absence of a value (null) in place of false.
A value of /type/text is a string of text plus a reference to a /type/lang object that specifies the human language of that text. The /type/object/name property is the most frequently used property of this type.
/type/text is unusual. Its value property specifies the text itself, but it also has a lang property that specifies the language in which the text is written. The lang property refers to an object of type /type/lang. The /lang namespace holds many instances of this type, such as /lang/en for English.
The text of a /type/text value must be a string of Unicode characters, encoded using the UTF-8 encoding. The encoded string must not occupy more than 4096 bytes. Longer chunks of text (or binary data) can be stored in the Metaweb content store.
A value of /type/rawstring is a string of bytes with no associated language specification. The length of the string must not exceed 4096 bytes.
Use /type/rawstring instead of /type/text for small amounts of binary data and for textual strings that are not intended to be human readable.
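Because the 4096-byte limit applies to the encoded form, a /type/text string's length in characters is not a reliable guide: each non-ASCII character takes two to four bytes in UTF-8. A minimal Python check, with hypothetical helper names:

```python
MAX_BYTES = 4096


def fits_text(s: str) -> bool:
    """/type/text: the UTF-8 encoding of s must not exceed 4096 bytes."""
    return len(s.encode("utf-8")) <= MAX_BYTES


def fits_rawstring(b: bytes) -> bool:
    """/type/rawstring: the raw bytes must not exceed 4096 bytes."""
    return len(b) <= MAX_BYTES


# 2048 copies of "é" (2 bytes each in UTF-8) exactly fill the limit.
print(fits_text("é" * 2048), fits_text("é" * 2049))
```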
A value of /type/uri represents a URI (Uniform Resource Identifier: see RFC 3986). The value property holds the URI text, which should consist entirely of ASCII characters. Any non-ASCII characters, and any characters that are not allowed in URIs, should be URI-encoded using hexadecimal escapes of the form %XX, each of which represents an arbitrary byte.
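Python's standard urllib.parse.quote function can perform this kind of escaping. The sketch below assumes that non-ASCII characters should be encoded as UTF-8 before escaping (the text above does not name an encoding) and leaves RFC 3986 reserved characters and existing % escapes alone; the wrapper name is hypothetical.

```python
from urllib.parse import quote

# RFC 3986 gen-delims and sub-delims, plus "%" so that existing escapes
# survive. quote() never escapes letters, digits or "_.-~".
_KEEP = ":/?#[]@!$&'()*+,;=%"


def to_ascii_uri(uri: str) -> str:
    """Percent-encode every other character as %XX, one escape per UTF-8 byte."""
    return quote(uri, safe=_KEEP, encoding="utf-8")


print(to_ascii_uri("http://example.com/café menu"))
# http://example.com/caf%C3%A9%20menu
```

Because % is treated as safe, a stray percent sign that does not begin a valid escape passes through unchanged; a stricter version would check for that.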
An instance of /type/datetime represents an instant in time. That instant may be as long as a year or as short as a fraction of a second. The value property is a string representation of a date and time formatted according to a subset [5] of the ISO 8601 standard.
A /type/datetime value that represents the first millisecond of the 21st century looks like this:
2001-01-01T00:00:00.001Z
Notice the following points about this format:
Longer intervals of time (years, months, etc.) are specified before shorter intervals (minutes, seconds, etc.).
Years must be specified with a full four digits, even when the leading digits are zeros. Negative years are allowed, but years with more than four digits are not allowed.
Months and days must always be specified with two digits, starting with 01, even when the first digit is a 0.
The components of a date are separated from each other with hyphens.
A date is separated from the time that follows with a capital letter T.
Times are specified using a 24-hour clock. Midnight is hour 00, not hour 24. Hours and minutes must be specified with two digits, even when the first digit is 0.
Seconds must be specified with two digits, but may also include a decimal point and a fractional second. Metaweb allows up to 9 digits after the decimal point.
The hours, minutes, and seconds components of a time specification are separated from each other with colons.
A time may be followed by a timezone specification. The capital letter Z is special: it specifies that the time is in Universal Time, or UTC (formerly known as GMT). Local timezones that are later than UTC (east of the Greenwich meridian) are expressed as a positive offset of hours and minutes, such as +05:30 for India. Local times earlier than UTC are expressed with a negative offset, such as -08:00 for US Pacific time. If no timezone is specified, then the /type/datetime value is assumed to be a local time in an unknown timezone. Specifying a timezone of +00:00 is the same as specifying Z. Specifying -00:00 is the same as omitting the timezone altogether.
All characters used in the /type/datetime representation are from the ASCII character set, so date and time values can be treated as strings of 8-bit ASCII characters.
A /type/datetime value can represent time at various granularities, and any of the date or time fields on the right-hand side can be omitted to produce a value with a larger granularity. For example, the seconds field can be omitted to specify a day, hour, and minute. Or all the time fields and the day-of-month field can be omitted to specify just a year and a month. Also, the date fields can be omitted to specify a time that is independent of date. A timezone may not be appended to a date alone: there must be at least an hour field specified before a timezone.
Here are some example /type/datetime values that demonstrate the allowed formats:
2001                     # The year 2001
2001-01                  # January 2001
2001-01-01               # January 1st 2001
2001-01-01T01Z           # 1 hour past midnight (UTC), January 1st 2001
2000-12-31T23:59Z        # 1 minute before midnight (UTC), December 31st 2000
2000-12-31T23:59:59Z     # 1 second before midnight (UTC), December 31st 2000
2000-12-31T23:59:59.9Z   # 0.1 second before midnight (UTC), December 31st 2000
00:00:00Z                # Midnight, UTC
12:15                    # Quarter past noon, local time
17-05:00                 # Happy hour, Boston (US Eastern Standard Time)
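The format rules can be collected into one regular expression. This Python sketch checks syntax only (it does not reject dates such as February 30th), assumes that a time may follow only a complete year-month-day date, and does not accept a leap second (a seconds value of 60); the names are hypothetical.

```python
import re

_YEAR = r"-?[0-9]{4}"                        # exactly four digits, may be negative
_MONTH = r"(?:0[1-9]|1[0-2])"
_DAY = r"(?:0[1-9]|[12][0-9]|3[01])"
# Hour 00-23 (never 24), then optional :MM, then optional :SS[.fraction].
_TIME = r"(?:[01][0-9]|2[0-3])(?::[0-5][0-9](?::[0-5][0-9](?:\.[0-9]{1,9})?)?)?"
_TZ = r"(?:Z|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9])"

_DATE_ONLY = f"{_YEAR}(?:-{_MONTH}(?:-{_DAY})?)?"   # no timezone allowed here
_FULL_DATE = f"{_YEAR}-{_MONTH}-{_DAY}"

_DATETIME = re.compile(
    f"{_DATE_ONLY}"                  # 2001, 2001-01, 2001-01-01
    f"|{_FULL_DATE}T{_TIME}{_TZ}?"   # 2001-01-01T01Z, ...
    f"|{_TIME}{_TZ}?"                # 12:15, 17-05:00, ...
)


def is_datetime(text: str) -> bool:
    """True if text matches the /type/datetime subset described above."""
    return _DATETIME.fullmatch(text) is not None
```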
Values of /type/key define object ids. /type/key is the expected type of the /type/object/key property and the /type/namespace/keys property, and is not intended for use by any other properties.
The value property of a /type/key value is the local, or unqualified part of an identifier. It must be a string of ASCII characters, and may include letters, numbers, underscores, hyphens and dollar signs. A key may not begin or end with a hyphen or underscore. The dollar sign is special: it must be followed by four hexadecimal digits (using letters A through F, in uppercase), and is used when it is necessary to map Unicode characters into ASCII so that they can be represented in a key. To represent an extended Unicode character (that does not fit in four hexadecimal digits), encode that character in UTF-16 using a surrogate pair, and then express the surrogate pair using two dollar-sign escapes. Each component of an id used for domains, types and properties is further restricted: they may not include hyphens or dollar signs, they may not include two underscores in a row, and they may not start with a digit.
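The escaping and validation rules for keys can be sketched in Python as follows. The helper names are hypothetical, and escaping every character outside letters and digits (plus a leading or trailing underscore or hyphen) is an assumption about how the dollar-sign scheme is applied, not a description of Metaweb's own implementation.

```python
import re
import string

_ALNUM = set(string.ascii_letters + string.digits)
# First character: letter, digit or $XXXX; then letters, digits, "_", "-" or $XXXX.
_KEY = re.compile(r"(?:[A-Za-z0-9]|\$[0-9A-F]{4})(?:[A-Za-z0-9_-]|\$[0-9A-F]{4})*")


def escape_key(text: str) -> str:
    """Hypothetical helper: map a Unicode string onto the /type/key alphabet."""
    out = []
    for i, ch in enumerate(text):
        interior = 0 < i < len(text) - 1
        if ch in _ALNUM or (interior and ch in "_-"):
            out.append(ch)
            continue
        # UTF-16 big-endian: one 16-bit unit, or a surrogate pair for
        # characters beyond U+FFFF; each unit becomes a $XXXX escape.
        data = ch.encode("utf-16-be")
        for j in range(0, len(data), 2):
            unit = int.from_bytes(data[j:j + 2], "big")
            out.append(f"${unit:04X}")
    return "".join(out)


def is_valid_key(key: str) -> bool:
    """General key rules: allowed alphabet, no leading or trailing _ or -."""
    return _KEY.fullmatch(key) is not None and key[-1] not in "_-"


def is_valid_schema_key(key: str) -> bool:
    """Extra rules for domain, type and property id components."""
    return (is_valid_key(key) and "-" not in key and "$" not in key
            and "__" not in key and not key[0].isdigit())


print(escape_key("café"))          # caf$00E9
print(escape_key("a\U0001F600"))   # a$D83D$DE00 (surrogate pair)
```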
Like /type/text, /type/key has a third property. The /type/key/namespace property refers to an object, but the interpretation of that object depends on whether the key is the value of a /type/object/key property or a /type/namespace/keys property. The /type/object/key property defines an id for the object (let's call it o) that is the subject of that property. The value of the key property of o is a /type/key value which we'll call k. The namespace property of k is a namespace object n. In this case, the id of n plus the value property of k define an id for the original object o.
The /type/namespace/keys property, on the other hand, defines an identifier within the namespace n that is the subject of that property. The value of the keys property is a /type/key value k as before, and the namespace property of k refers to an object o. Here, the id of n plus the value of k define an id for o. These two different uses of /type/key values are somewhat confusing, but we'll see clarifying examples in Section 3.3.3 and Section 5.11.
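It may help to think of each key as a single link that can be read from either end. In this hypothetical Python model, one list of (namespace, key value, object) links serves both the /type/object/key view and the /type/namespace/keys view, and an id is built by appending the key value to the namespace's id. The node labels stand in for real nodes and are invented for illustration.

```python
# Hypothetical node labels; real Metaweb nodes are identified by guids.
ROOT = "ROOT_NAMESPACE"

# Each entry is ONE key link: (namespace n, key value k, named object o).
KEY_LINKS = [
    (ROOT, "type", "TYPE_DOMAIN"),
    ("TYPE_DOMAIN", "int", "INT_TYPE"),
    ("TYPE_DOMAIN", "float", "FLOAT_TYPE"),
]


def object_keys(obj):
    """/type/object/key, read from o: each key's namespace property is n."""
    return [(n, k) for n, k, o in KEY_LINKS if o == obj]


def namespace_keys(ns):
    """/type/namespace/keys, read from n: each key's namespace property is o."""
    return [(k, o) for n, k, o in KEY_LINKS if n == ns]


def id_of(obj):
    """id of the namespace plus the key value (this sketch uses the first key)."""
    if obj == ROOT:
        return "/"
    n, k = object_keys(obj)[0]
    parent = id_of(n)
    return parent + k if parent == "/" else parent + "/" + k


print(id_of("INT_TYPE"))   # /type/int
```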
This section covers the core object types in the /type domain, and also introduces commonly-used types from the /common domain and elsewhere. You do not need to understand these types in detail in order to make productive use of Metaweb. Still, knowing what these basic types are is a helpful orientation to the system. You can learn more about these (or any) Metaweb types at freebase.com. Learn about a type and its properties by appending the type id to the URL http://freebase.com/type/schema. To read more about the type /type/unit, for example, visit: http://freebase.com/type/schema/type/unit .
The /type domain defines the value types listed in Section 2.5 and the other core types that Metaweb depends on. These core types can be loosely divided into categories. The first category consists of the types that define the Metaweb type system. These types have already been introduced, but are summarized here for completeness:
/type/object
This type exists simply to group the properties, such as /type/object/name and /type/object/type, that are shared by all objects. /type/object is not a supertype of other types, and objects are never actually typed with /type/object.
/type/type
This type describes a type, which means that it is the only type that is an instance of itself. The properties property defines the set of properties of the type. The instance property defines the set of instances of the type (it is the reverse property of /type/object/type). The domain property links to the domain that defines the type. The expected_by property is the reverse of /type/property/expected_type: it is the set of properties whose values are of this type.
/type/property
This type defines a property. The properties of a property object include expected_type which specifies the type of the value of the property and unique which specifies whether the property is restricted to a single value. The schema property refers to the type object of which the property is a part: it is the reciprocal of /type/type/properties. The unit property specifies a /type/unit value associated with the property: it is useful for properties whose expected type is /type/int or /type/float. If a property is a master property, then the reverse_property property refers to its reciprocal, if one is defined. And if a property is a reverse property, then the master_property property refers to the reciprocal. The enumeration property of a property has to do with identifiers and namespaces and is explained in Section 3.3.5. The requires_permission property is related to access control and is explained in Section 5.12.3.
/type/domain
A domain represents a set of related types, and also serves as a namespace for those types. The types property (the reciprocal of /type/type/domain) specifies the members of the domain. For access control purposes, each domain has an associated usergroup that "owns" the domain.
/type/namespace
This type represents a namespace, and its keys property specifies a set of named objects that exist within the namespace. /type/namespace/keys is the reciprocal of /type/object/key. See Section 2.5.9 for more about keys and namespaces. Namespaces are more fundamental to the Metaweb architecture than types are, and any object can be used as a namespace, even if it is not typed as a /type/namespace object.
Another category of core types consists of those involved in access control (see Section 2.7 for more on this topic):
/type/user
Each registered Metaweb user is represented by an object of /type/user. User objects have ids in the /user namespace. If your username is joe_developer, then your /type/user object is /user/joe_developer. The usergroup property of a /type/user specifies the usergroups of which the user is a member.
/type/usergroup
This type represents a set of users. The member property is the set of users that belong to the group, and the permitted property is the set of permissions that are granted to the group.
/type/permission
This type is the key to Metaweb access control. The permits property is the set of usergroups that have been granted this permission. And the controls property is the (possibly very large) set of objects controlled by this permission object. The default permission object is /boot/all_permission, which allows access by any user.
The following types represent content, and metadata about that content, in the Metaweb content store:
/type/content
Large chunks of content, such as HTML documents and graphical images, are not stored in regular Metaweb nodes. Instead, these large binary objects (sometimes called blobs) are kept in a separate store. A /type/content object is the bridge between the Metaweb object store and the Metaweb content store. A /type/content object represents an entry in the content store, and the id of the /type/content object is used as an index for retrieving the content.
In addition to providing access to the content store, /type/content defines important properties. The media_type property specifies the MIME type of the content. For textual content, the text_encoding and language properties specify the encoding and language of the text. The length property specifies the size (in bytes) of the content. The source property refers to zero or more /type/content_import objects that specify the source of the content. (If the same content is uploaded multiple times, it may have multiple sources.)
Chapter 4 shows how to download content from Metaweb, and Chapter 6 demonstrates how to upload content.
/type/content_import
This type describes the source of imported content. Its properties include the URI or filename from which the content was obtained, the user who imported the content, and a timestamp that specifies when the content was imported.
/type/media_type
Instances of this type represent a MIME media type such as "text/html" or "image/png". Instances are given fully-qualified names within the /media_type namespace, and can be specified with ids like /media_type/text/html or /media_type/image/png.
/type/text_encoding
Instances of this type represent standard text encodings, such as ASCII and Unicode UTF-8. Instances are given fully-qualified names within the /media_type/text_encoding namespace, and can be specified with ids such as /media_type/text_encoding/ascii.
Finally, the /type domain also includes a few miscellaneous types:
/type/lang
This type represents a human language. Instances, such as /lang/en which represents English, exist in the /lang namespace. The iso639 property specifies the two-letter code (such as "en") of the language. /type/text values have an associated language, as do /type/content objects that represent text.
/type/unit
This type defines a measurement unit, and is the expected type of /type/property/unit. A number of instances, such as /en/meter, /en/kilogram, and /en/second, are defined. This type has no properties.
/type/enumeration
This type is used as the expected type of any property that defines an enumeration. See Section 3.3.5 for details.
/type/link
This special type allows us to view the links in the Metaweb graph as objects. It is used in advanced MQL queries as explained in Section 3.7.
/type/reflect
This is not a true type, but a collection of special properties used in reflective queries. See Section 3.7.3.
The types in the /common domain are not a core part of the Metaweb infrastructure, but, as the name implies, they are commonly useful, and some of them are quite important for the freebase.com client. The five most commonly-used /common types are:
/common/topic
Metaweb objects that are intended for display to end users through freebase.com are called "topics". Such objects typically have some appropriate domain-specific type, such as /music/artist or /food/restaurant, but are also typically instances of the type /common/topic. This type defines properties that allow documents, images, webpages, and nicknames to be associated with the topic. This type is so common that properties like /common/topic/image and /common/topic/alias are sometimes used without actually adding /common/topic to the set of types of an object.
/common/document
This type represents a document of some sort. /common/topic uses this type to associate documents with topics. The most important property is content, which specifies the single /type/content object that refers to the document content. Other properties of /common/document provide meta-information about the document, such as authors, publication date, and so on.
/common/image
/type/content objects that represent images are typically co-typed with this type. /common/image defines a size property that specifies the pixel dimensions of the image.
/common/webpage
This type is simply the URL of a webpage plus a short description of the page's content.
/common/phone_number
This type has two properties of /type/rawstring to hold a phone number and a country code for the phone number.
To view the complete list of types in the /common domain visit http://freebase.com/view/common. In general, you can browse the contents of a domain by appending the domain id to the URL http://freebase.com/view.
The /type and /common domains are not the only ones that include commonly-used types. There are a few others that you should be aware of:
/freebase
This domain defines types used by the freebase.com client. Many of them are quite implementation-specific and not of general interest to applications other than Freebase itself. /freebase/documented_object allows short tips and longer-form documentation to be associated with any other Metaweb object, such as domains, types and properties. /freebase/user_profile instances hold information about Freebase users.
/location
The /location domain contains types related to geographical locations. These include /location/country, /location/administrative_division (as well as country-specific versions such as /location/us_state), /location/citytown, /location/postal_code and /location/mailing_address. The reason that the mailing address type is not in /common along with /common/phone_number is that it depends on other /location types for representing countries, cities, and so on.
/people
This domain defines the important type /people/person as well as related types such as /people/gender and /people/marriage.
/time
This domain defines time and date related types beyond the simple /type/datetime primitive. Types include /time/day_of_week, /time/month, and /time/day_of_year.
/measurement_unit
This domain defines units of measure. The /type/unit type we saw earlier marks an object as a unit so that it can be used with /type/property/unit. But it is this /measurement_unit domain that provides detailed types to represent units. The /en/second object, for example, is both a /type/unit and a /measurement_unit/unit_of_time.
The unit types in this domain are not actually commonly used. The more useful types are the compound value types: these are types that define two or more properties (often of primitive type) so that multiple values can be manipulated together as a single value. For example /measurement_unit/time_interval has two /type/datetime properties and is used to represent the starting and ending point of a period of time. /measurement_unit/integer_range is similar, but has properties of type /type/int instead. /measurement_unit/money_value combines a /type/float with a /finance/currency property. And /measurement_unit/dated_money_value combines those two properties with a /type/datetime, tying the amount of money to a specific date (which is useful when dealing with inflation and time-varying currency exchange rates, for example).
Metaweb is completely open for reading. Anyone who can connect to Metaweb servers can read data from them. When adding or editing data, however, access control comes into play. We've already seen that the types /type/user, /type/usergroup, and /type/permission are used for access control.
Metaweb's access control model is quite simple. Every object has a permission property that refers to a /type/permission object. The permission object specifies a set of usergroups whose members have permission to modify the object. If a user is a member of one or more of the specified groups, then that user can edit [6] the object. Otherwise, the user is not allowed to.
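The permission check itself reduces to a set intersection. In this Python sketch, the classes mirror the permits property of /type/permission and the usergroup property of /type/user, and the group names are hypothetical (a single group that every user belongs to stands in for the open default permission). It ignores the finer points of "edit" and per-property access control mentioned in footnote [6].

```python
from dataclasses import dataclass, field


@dataclass
class Permission:
    """Stand-in for a /type/permission object: permits = usergroup ids."""
    permits: set = field(default_factory=set)


@dataclass
class User:
    """Stand-in for a /type/user object: usergroup = groups the user is in."""
    name: str
    usergroup: set = field(default_factory=set)


def can_edit(user: User, permission: Permission) -> bool:
    """Editing is allowed if the user belongs to at least one permitted group."""
    return not user.usergroup.isdisjoint(permission.permits)


# Hypothetical group names for illustration only.
EVERYONE = "hypothetical_all_users_group"
FRED_GROUP = "hypothetical_fred_domain_group"

open_permission = Permission({EVERYONE})
fred_domain_permission = Permission({FRED_GROUP})

fred = User("fred", {EVERYONE, FRED_GROUP})
jill = User("jill", {EVERYONE})

print(can_edit(jill, open_permission))         # True
print(can_edit(jill, fred_domain_permission))  # False until Fred adds her
```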
This simple access control model is, by default, completely open. In order to allow and encourage free collaboration, most Metaweb objects have a permission object that gives edit permission to all Metaweb users. If Metaweb user Fred creates a new object, his friend Jill can freely edit that object, and any other Metaweb user can edit the object as well. (In Chapter 6 we'll learn how to alter the default permission to create objects with restricted access.) The use of Metaweb types is also completely open: any user can create instances of any type, regardless of who "owns" the type.
Although most objects can be freely edited and all types can be freely used, Metaweb namespaces are not usually wide open like this. System namespaces like /lang and /type are owned by Metaweb administrators, and regular users cannot add keys to them. The most common user-defined namespaces are domains, which serve as the namespace for types, and types, which serve as the namespace for properties. When you create a new domain, a usergroup and permission object are created with it, and the permission property of the domain is set so that only members of the usergroup can define keys in the domain. This means that only members of the group can define types in the domain. Newly created types use the same permission object as their domain, which means that only users in the usergroup can define properties in the namespace of the type. The freebase.com client allows users to create personal domains, and to edit the membership of the associated usergroup. This allows Fred to create a new domain and then add Jill and other collaborators to the usergroup so that they can create and modify types within the domain.
We'll explore the topic of access control in more detail in Section 5.12 after we have learned how to express MQL write queries.
[2] There are actually six elements of each tuple: relationships, like nodes, have a timestamp indicating when they were added to the database and also a reference to the user that defined the relationship. We'll learn more about the timestamp and creator of a relationship in Section 3.7.
[3] And that the reverse of /type/property/reverse_property is /type/property/master_property!
[4] The resemblance to JavaScript object syntax is intentional. The reason will become clear in Chapter 3.
[5] /type/datetime only supports dates specified using month and day of month. It does not support the ISO 8601 day-of-year, week-of-year and day-of-week representations.
[6] The precise meaning of "edit" is a little complicated, and there is also another form of access control known as per-property access control. Details are in Section 5.12.