Chapter 4. Metaweb Read Services

Chapter 3 explained how to express Metaweb queries using MQL. This chapter explains how to deliver those queries to Metaweb servers and retrieve their response using the mqlread service. It also explains how to search Metaweb with the search service and how to retrieve chunks of data (such as images and HTML documents) using the trans service. The chapter includes example applications and libraries written in Perl, Python, PHP, and JavaScript and concludes with a sophisticated Python library for interacting with Metaweb's read services.

4.1. Basic mqlread Queries with Perl

Metaweb's services are all implemented on top of the HTTP protocol. Submitting a MQL query and retrieving the response, therefore, is simply a matter of constructing the appropriate URL and fetching its content via an HTTP request.

The basic URL for submitting MQL queries to freebase.com is:

https://api.freebase.com/api/service/mqlread

To submit a query to the mqlread service, follow these steps:

  • Place the query inside an "envelope" object.

  • Serialize the envelope object to a JSON string.

  • Escape punctuation characters in the JSON string using standard URL encoding.

  • Concatenate the basic URL above with "?query=" and the escaped and encoded JSON string to form a complete mqlread URL.

  • Fetch the contents of the URL with an HTTP GET request.
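The steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the helper name build_mqlread_url is hypothetical (it is not part of any Metaweb library), and the sketch relies on Python 3's standard json and urllib.parse modules. The final HTTP GET is left as a comment so that the sketch runs without network access.

```python
import json
from urllib.parse import quote

# Base URL of the mqlread service, as given in the text above.
MQLREAD_URL = "https://api.freebase.com/api/service/mqlread"

def build_mqlread_url(query, base_url=MQLREAD_URL):
    """Return a complete mqlread URL for a MQL query (hypothetical helper)."""
    envelope = {"query": query}             # Step 1: place the query in an envelope.
    serialized = json.dumps(envelope)       # Step 2: serialize the envelope to JSON.
    escaped = quote(serialized, safe="")    # Step 3: URL-encode punctuation.
    return base_url + "?query=" + escaped   # Step 4: append to the basic URL.

url = build_mqlread_url({"type": "/music/artist", "name": "Spinal Tap", "album": []})
print(url)
# Step 5: fetch the URL with an HTTP GET request, for example with
# urllib.request.urlopen(url).read()
```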

Example 4.1 is a command-line utility that lists the albums released by any band you specify. It uses the Metaweb API to retrieve data from freebase.com. It is written in Perl, and demonstrates how to nest an MQL query within an envelope and send that envelope to the mqlread service. (The structure of the envelope object will be explained in Section 4.2.1.)

Example 4.1. albumlist.pl: submitting MQL queries in Perl

#!/usr/bin/perl
use URI::Escape;  # This module provides the uri_escape function used below.

# Build the Metaweb query, using string manipulation.
# CAUTION: the use of string manipulation here makes this script vulnerable
# to MQL injection attacks when the command-line argument includes JSON.
$band = $ARGV[0]; # This is the band or musician whose albums are to be listed
$query='{"type":"/music/artist","name":"' . $band . '","album":[]}';

# Now place the query in a JSON envelope, and URL encode the envelope.
$envelope = '{"query":' . $query . '}';
$escaped = uri_escape($envelope); 

# Construct the URL that represents the query.
$baseurl='http://api.freebase.com/api/service/mqlread'; # Base URL for queries
$url = $baseurl . "?query=" . $escaped;

# Use the command-line utility curl to fetch the content of the URL.
# The URL is quoted so the shell does not interpret characters such as "?".
$result = `curl -s '$url'`;

# Use regular expressions to extract the album list from the HTTP response.
$result =~ s/^.*"album"\s*:\s*\[\s*([^\]]*)\].*$/$1/s;
$result =~ s/[ \t]*"[ \t,]*//g;

# Finally, display the list of albums.
print "$result\n";

You run this program from the command line. An invocation might look like this:

$ perl albumlist.pl 'Spinal Tap'
Break Like the Wind
This Is Spinal Tap
The Majesty of Rock
Smell the Glove

4.1.1. A Better Perl Album Lister

The first thing to notice about Example 4.1 is that it does not use a JSON serializer or parser: a JSON-encoded MQL query is constructed with string concatenation and the desired results are extracted with regular expressions. These shortcuts keep the example simple and allow us to focus on how the mqlread URL is built and its content fetched. More sophisticated applications, however, use a JSON encoder to serialize the query and a JSON decoder to parse the result. Building queries with string manipulation can be reasonable in the simplest applications (though caution is required to avoid MQL injection attacks), but attempting to extract results with regular expressions is brittle and not a technique to emulate in your own code!
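To make the injection risk concrete, here is a small Python sketch (not part of the original examples) that builds the same query twice: once with string concatenation and once with a JSON serializer. The band name is a contrived, hypothetical input chosen to show how quotation marks in user input can add a property to a concatenated query.

```python
import json

# Hypothetical malicious input: the embedded quotes close the name string
# early and smuggle an extra "id" property into the query.
band = 'Nobody","id":"/en/the_police'

# String concatenation, as in Example 4.1.
concatenated = '{"type":"/music/artist","name":"' + band + '","album":[]}'

# JSON serializer: the quotes are escaped, so the name stays one string.
serialized = json.dumps({"type": "/music/artist", "name": band, "album": []})

print(sorted(json.loads(concatenated)))  # ['album', 'id', 'name', 'type']
print(sorted(json.loads(serialized)))    # ['album', 'name', 'type']
```

With concatenation, the query that reaches the server is no longer the one the programmer wrote. With the serializer, the odd input simply becomes an unusual name that matches nothing.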

Example 4.2 is a higher-level version of Example 4.1. It uses a JSON serializer and parser as well as a higher-level HTTP API. To use it, you must have the JSON.pm module (version 2 or higher) installed[16]. This version of the program also uses a somewhat more sophisticated query that sorts albums by their release date, and it checks for and reports errors in case anything goes wrong with the query. Finally, it adds an input parameter to the query envelope to specify that HTML escaping should not be done on the query results. With this option, ampersands in album names are returned as & rather than as &amp;, which is what we want since this application runs in a terminal rather than in a browser.

Notice that the mqlread service returns the query results in a response envelope object that is similar to the query envelope. This response envelope has a property named result whose value is the result of the MQL query.

Example 4.2. albumlist2.pl: a better Perl album lister

#!/usr/bin/perl -w
use strict;           # Don't allow sloppy syntax
use JSON;             # JSON encoding and decoding with to_json and from_json
use URI::Escape;      # URI encoding with uri_escape
use LWP::UserAgent;   # High-level HTTP API

# Some constants for this script
my $SERVER = 'http://api.freebase.com';            # The Metaweb server
my $QUERYURL = $SERVER . '/api/service/mqlread';   # Path to mqlread service

# What band did the user ask about?
my $band = $ARGV[0];

# Construct a Metaweb query as a Perl data structure.
my $query =  {
    type => "/music/artist",    # We're looking for a band.
    name => $band,              # This is the name of the band.
    album => [{                 # Return some albums.
        name => undef,          # undef is Perl's null.
        sort => "release_date", # Sort by release date.
        release_date => undef   # Return release date, too.
    }]
};

# Put the query in an envelope object.
my $envelope = {        
    query => $query,             # The "query" property holds the query.
    escape => JSON::false        # Don't HTML escape result text
};                      

# Convert the envelope object from Perl hash to JSON string, and URI encode it.
my $encoded = to_json($envelope);    # Serialize object to string.
my $escaped = uri_escape($encoded);  # URI encode the string.

# Build the complete query url.
my $url = $QUERYURL . "?query=" . $escaped;

# Create the HTTP "user agent" we'll use to send the query.
my $ua = LWP::UserAgent->new;

# Send request to the server and get the response.
my $response = $ua->get($url);

# Now handle the mqlread response.
if ($response->is_success) {                      # If we get HTTP 200 OK...
    my $responsetext = $response->content;        # Get result as JSON text.
    my $response = from_json($responsetext);      # Parse text to a Perl hash.

    if ($response->{code} ne "/api/status/ok") {  # If the query was not okay:
        my $err = $response->{messages}[0];       # get the error message obj
        die $err->{code}.': '.$err->{message};    # and exit with error message
    }

    # If there was no error, the MQL result is in the response envelope
    my $result = $response->{result};   # Open response envelope, get result.
    my $albums = $result->{album};      # Get albums array from result.
    for my $album (@$albums) {          # Loop through albums.
        print "$album->{name}";         # Print the name of each.
        if ($album->{release_date}) {   # Print release date, if there is one.
            print " [" . substr($album->{release_date},0,4) . "]"; # Year only.
        }
        print "\n";                     # Add a newline.
    }
}
else {                                  # If query failed...
    die "Server returned error code " . $response->code . "\n";
}

4.2. The mqlread Service

Now that we've seen some working code, this section explains more formally how mqlread works. Like all Metaweb services, mqlread is a web-based service: it takes an HTTP request as input and returns an HTTP response as its output.

The path to the mqlread service on a Metaweb server is /api/service/mqlread. To send a mqlread query to the Metaweb server running at api.freebase.com, for example, you'd use the following URL:

https://api.freebase.com/api/service/mqlread

mqlread works with both GET and POST request methods. GET requests are preferred unless the query is so long that POST must be used instead.

There are two sources of input to mqlread in an HTTP request. The first is the request parameters. For GET requests, these parameters are encoded in the URL itself, following a ? character. For POST requests, the request parameters appear in the body of the request. In both cases, the parameters are URI encoded in the standard way that web browsers encode HTML form submissions. mqlread recognizes request parameters query, queries and callback, and these are documented in sub-sections below. Every mqlread request must include either the query or queries request parameters (but not both). The value of these parameters is a JSON-serialized object known as a query envelope. In addition to holding the actual MQL query that is being submitted, this envelope object may also hold additional mqlread input in the form of "envelope parameters". Envelopes and envelope parameters are covered in detail below.
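The choice between GET and POST might be sketched in Python like this. The length threshold MAX_GET_LENGTH is purely an assumption of this sketch (the text does not specify when a query is "too long"), and build_request is a hypothetical helper. It only constructs the request object; nothing is sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

MQLREAD_URL = "https://api.freebase.com/api/service/mqlread"
MAX_GET_LENGTH = 2000   # Assumed threshold; not specified by Metaweb.

def build_request(envelope):
    """Build (but do not send) a mqlread request for a query envelope."""
    # urlencode applies the same encoding browsers use for form submissions.
    encoded = urlencode({"query": json.dumps(envelope)})
    url = MQLREAD_URL + "?" + encoded
    if len(url) <= MAX_GET_LENGTH:
        return Request(url)                       # GET: parameters in the URL.
    # POST: the same encoded parameters travel in the request body.
    return Request(MQLREAD_URL, data=encoded.encode("ascii"))

short = build_request({"query": {"id": "/en/the_police", "name": None}})
print(short.get_method())   # GET
```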

The second source of mqlread input in an HTTP request is HTTP cookies. mqlread looks for a cookie named mwLastWriteTime. This cookie is only necessary in applications that perform MQL writes as well as MQL reads, and ensures that recent writes are always visible to subsequent read requests performed by the same application (or by the same web browser). The mwLastWriteTime cookie is covered in Chapter 6 rather than in this chapter. In general, Metaweb-enabled applications need not track individual cookies. Instead, they can behave like web browsers do: any cookies returned as output by a Metaweb service should be included as input to subsequent requests.
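One way to get this browser-like behavior in Python is to route every request through a single opener that owns a cookie jar. The sketch below uses only the standard library and sends nothing by itself. Treat it as one possible arrangement rather than a required one.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

# One cookie jar shared by every request the application makes.
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))

# Requests made with opener.open(url) store any cookies the server returns
# (for example, mwLastWriteTime) in cookie_jar and send them back on later
# requests to the same server, so the application never inspects them.
```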

The output of the mqlread service is the HTTP response body. This body is always (even when errors occur because of bad input) a JSON-serialized object, served with the text/plain content type. This JSON-serialized object is known as a response envelope, and it is explained in Section 4.2.3 below.

4.2.1. The query Request Parameter

The simplest mqlread request includes a single request parameter named query. The value of this parameter is a JSON-serialized object known as a query envelope. This envelope object must have a property named query, and may also have additional properties that specify "envelope parameters" – see Section 4.2.4. The value of the query property in the envelope is your JSON-serialized MQL query. Thus a simple mqlread invocation uses a URL like this:

https://api.freebase.com/api/service/mqlread?query={"query":[{Your MQL here}]}

Notice that the word "query" appears twice in this URL. The first (without quotes) is the request parameter, and the second (with quotes) is a property of the envelope object. In this example, the square brackets are part of the MQL query itself, not part of the envelope syntax. Note also that everything after the equals sign should, in practice, be URI-encoded, which transforms quotation marks into %22, and so forth.

4.2.2. The queries Request Parameter

mqlread allows you to submit more than one MQL query at a time. To do this, you must use the queries request parameter instead of query. (Every invocation of mqlread must include one or the other, but not both.) The value of the queries parameter is not a simple query envelope as it is for the query parameter. Instead, it is a JSON-serialized object known as an outer envelope. The outer envelope contains named query envelopes. To submit two queries at once the outer envelope would have two properties. The names of the properties might be q1 and q2, and their values would be the two query envelopes that describe the two queries to be run:

{                                       # Start the outer envelope
  "q1": {                               # Query envelope for query named q1
    "query":{First MQL query here}      # Query property of query envelope
  },                                    # End of first query envelope
  "q2": {                               # Start query envelope for query q2
    "query":[{Second MQL query here}]   # Query property of q2
  }                                     # End of second query envelope
}                                       # End of outer envelope.

The property names used within an outer envelope are arbitrary, but they appear again in the mqlread response.
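As a rough Python sketch, an outer envelope can be assembled from a dictionary that maps query names to MQL queries. The helper name build_queries_url is hypothetical, and the two queries are just placeholders borrowed from elsewhere in this book.

```python
import json
from urllib.parse import quote

MQLREAD_URL = "https://api.freebase.com/api/service/mqlread"

def build_queries_url(named_queries, base_url=MQLREAD_URL):
    """Wrap each named MQL query in a query envelope, then in an outer envelope."""
    outer = {name: {"query": q} for name, q in named_queries.items()}
    # Note the request parameter is "queries", not "query".
    return base_url + "?queries=" + quote(json.dumps(outer), safe="")

url = build_queries_url({
    "q1": {"id": "/en/the_police", "name": None},
    "q2": [{"type": "/music/album", "artist": "The Police", "name": None}],
})
print(url)
```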

4.2.3. The Response Envelope

The output of the mqlread service is an HTTP response, which consists of a set of HTTP response headers and a response body. The headers are not typically interesting (though Metaweb engineers might be interested in the X-Metaweb-TID header if you're submitting a bug report). In particular mqlread is not expected to return cookies as part of its output.

The body is the interesting part of the mqlread response. It is a UTF-8 encoded JSON-serialized object. This object is known as the response envelope, and its structure mirrors that of the query envelope with the query property replaced with a result property. If the request used the query parameter to submit a single query envelope, then the result is a single response envelope. The following are side-by-side views of a query and response envelope:

Query Envelope                            Response Envelope
{                                         {
  "query": [{ MQL Query Here }]             "result": [{ MQL Response Here }],
}                                           "status": "200 OK",
                                            "code": "/api/status/ok",
                                            "transaction_id":[opaque string value]
                                          }

If the request used the queries parameter to submit multiple named query envelopes within an outer envelope, then the response is an outer envelope that uses the same names to refer to multiple response envelopes:

Query Envelopes                           Response Envelopes
{                                         {
  "q1": {                                   "q1": {
    "query":{First MQL query here}            "result":{First MQL result here},
  },                                          "code": "/api/status/ok"
  "q2": {                                   },
    "query":[{Second MQL query here}]       "q2": {
  }                                           "result":[{Second MQL result here}],
}                                             "code": "/api/status/ok"
                                            },
                                            "status": "200 OK",
                                            "code": "/api/status/ok",
                                            "transaction_id":[opaque string value]
                                          }

Notice that response envelopes include code, status, and transaction_id properties. The code property is the most important: it specifies a success or failure status code (as a string) for the query. If the query was successful then the value of this property will be "/api/status/ok". If code does not equal "/api/status/ok", then there was an error of some sort, and the response envelope will include additional details in a messages array. See Section 4.2.6 for further details about status codes and error messages, including details on the status and transaction_id properties.
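The check described above might look like this in Python. The exception class MQLError and the function extract_result are hypothetical names, and the two response bodies are hand-written samples in the shape shown in this chapter, not captured server output.

```python
import json

class MQLError(Exception):
    """Raised when a response envelope reports an error (hypothetical class)."""

def extract_result(response_text):
    """Parse a mqlread response body and return the value of its result property."""
    envelope = json.loads(response_text)
    if envelope.get("code") != "/api/status/ok":
        # On failure the envelope carries a messages array instead of a result.
        first = (envelope.get("messages") or [{}])[0]
        raise MQLError("%s: %s" % (first.get("code"), first.get("message")))
    return envelope["result"]

ok_body = '{"status": "200 OK", "code": "/api/status/ok", "result": [{"name": "Example Album"}]}'
error_body = ('{"status": "200 OK", "code": "/api/status/error", "messages": '
              '[{"code": "/api/status/error/envelope/parse", '
              '"message": "Missing \'query\' parameter"}]}')

print(extract_result(ok_body))
```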

4.2.4. Envelope Parameters

If you use the query request parameter you specify a single query envelope as its value. If you use queries instead, you specify one or more query envelopes within an outer envelope. In either case, each query envelope must include a property named query that specifies the MQL query to be executed. Each envelope may also include additional properties, known as "envelope parameters", that provide additional input to mqlread and specify how the query should be run. mqlread supports envelope parameters named cursor, escape, lang, as_of_time, and uniqueness_failure. These parameters are described below.

4.2.4.1. Fetching Large Result Sets with Cursors

Recall that MQL queries are implicitly limited to returning 100 results. You can use the MQL limit directive to specify a different limit, but when there is a very large result set, specifying a very large limit may cause your query to time out. When you expect that your query will have many results, and you want to retrieve all of those results, you should use a cursor [17]. A cursor is simply a way of keeping track of your position within a large set of results, and it enables you to retrieve the results of a query batch by batch with multiple sequential mqlread invocations. Cursors are demonstrated later in this chapter in Section 4.8.

To begin a new query that uses a cursor, include a cursor property with the value true in the query envelope. The response envelope (see Section 4.2.3) will then contain a cursor property. If the value of the cursor in the response is false, it means that all results have been delivered. Otherwise, that property will be a long string of opaque data. Use this string as the value of the cursor property in the query envelope, and submit that query envelope again (leaving the query itself unchanged). This time the response envelope will contain the second batch of results and a new value for the cursor property. Repeat these steps until the cursor property of the response envelope is false.
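The loop just described can be sketched in Python as follows. To keep the sketch independent of any particular HTTP code, it assumes a caller-supplied function, here called send_envelope, that submits a query envelope to mqlread and returns the parsed response envelope as a dictionary. Both that function and the name fetch_all are hypothetical.

```python
def fetch_all(query, send_envelope):
    """Collect every batch of results for a query, following the cursor.

    send_envelope is a caller-supplied function (an assumption of this
    sketch) that submits a query envelope to mqlread and returns the
    parsed response envelope as a dict.
    """
    results = []
    cursor = True                       # true asks mqlread to start a new cursor.
    while cursor is not False:          # false in the response: nothing left.
        response = send_envelope({"query": query, "cursor": cursor})
        if response.get("code") != "/api/status/ok":
            raise RuntimeError("mqlread error: %r" % response.get("messages"))
        results.extend(response["result"])
        cursor = response["cursor"]     # Opaque string, or False when finished.
    return results
```

Note that the query itself is passed unchanged on every iteration. Only the cursor property of the envelope differs from one request to the next.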

It is important to understand that cursors only work when multiple results are expected at the top-level of the query. The cursor property is part of the mqlread query envelope syntax, not part of MQL itself, and it cannot be applied to sub-queries of a query. Another way to say this is that it only makes sense to include "cursor":true in an envelope if the first character following "query": in the envelope is [. The query must be expressed as an array in order for a cursor to be meaningful. It is legal, but never useful, for example, to use a cursor in this query envelope:

{
  "cursor":true,
  "query": {
    "type":"/music/artist",
    "name":"The Police",
    "album":[]
  }
}

If you want to use a cursor to retrieve a list of albums in batches, the array of albums must be at the top level of the query:

{
  "cursor":true,
  "query": [{
    "type":"/music/album",
    "artist":"The Police",
    "name":null,
    "limit":10
  }]
}

Note the addition of the limit directive to the query to specify the size of each batch we want returned.

Cursor values remain valid after they are used. Once you have downloaded result batches sequentially, you can reuse saved cursor values to download those batches again in whatever order you like (except for the first batch, which does not have a cursor). Results retrieved with cursors are based on the state of the database as it existed when the first query (with "cursor":true) was issued. [18] So results retrieved with a given cursor will always be the same, and will ignore any insertions or deletions that occurred after the original query. Cursors can be assumed to have a lifetime at least as long as that of your application. But updates to Metaweb's backend software can invalidate cursors, so you should not assume that they live forever. Cursors are not intended to be stored in databases or files or encoded into long-lived URLs, for example. They should not be considered "permalinks" or persistent bookmarks to a past state of the database.

4.2.4.2. Disabling HTML Escapes

By default, mqlread uses HTML entities &lt;, &gt;, and &amp; in its responses in place of the characters <, >, and &, and this means that text returned by mqlread is safe for display in web browsers. To disable this escaping, add an escape parameter to the query envelope, and set its value to false. (You can also explicitly request HTML escaping with "escape":"html", but this is the default behavior and is not required.) If you do disable HTML escaping, you should be careful never to display the mqlread output in a web browser, since it could contain <script> tags that execute arbitrary JavaScript code, for example.

The escape envelope parameter was demonstrated in Example 4.2 and we'll see it again in Example 4.3.

4.2.4.3. Specifying Your Preferred Language

As you know, Metaweb objects can have more than one value for the name property, but can have only one value in any given language. When you request the name of an object, mqlread returns the name in your preferred language. The default language is English, but you can specify a different preference with the envelope parameter lang. The value of this parameter should be a language id in the /lang namespace. The following query envelope, for example, asks for the Spanish name of the French language:

{
  "lang":"/lang/es",
  "query": {
    "id":"/lang/fr",
    "name":null
  }
}

At the time of this writing, [19] mqlread supports only a single preferred language. If no name exists in the specified language, then null is returned. In the future, the lang envelope parameter is likely to evolve to support language fallbacks, and you should be able to request, for example, a name in Spanish, or in English if no Spanish name exists.

4.2.4.4. Making Queries in the Past

Use the as_of_time envelope parameter in a mqlread query to specify that the query should be performed historically, against the Metaweb database as it existed at the specified moment in the past. The Metaweb database has a journaled structure, making this kind of historical query relatively simple and efficient to perform. Note, however, that type and schema information used in processing the query is current rather than historical.

The value of the as_of_time parameter should be a timestamp in the ISO 8601 format (see Section 2.5.8) used by /type/datetime values in MQL. For example, the value "2007-02-03" represents midnight on February 3rd, 2007, and the value "2008-01-01T17:00Z" represents 5PM on January 1st, 2008. Metaweb timestamps are always stored in UTC (or GMT) time, and the as_of_time parameter assumes that your timestamp is specified in UTC. You may append Z to your timestamp to make this timezone explicit but you may not explicitly specify any other timezone. That is, you cannot add -08:00 to specify US Pacific time, for example.
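Because as_of_time accepts only UTC, a program that starts from a local time has to convert it first. Here is a minimal Python sketch of that conversion. The helper name as_of_time_value is hypothetical, and the fixed -08:00 offset stands in for US Pacific time for illustration only.

```python
from datetime import datetime, timedelta, timezone

def as_of_time_value(moment):
    """Format a timezone-aware datetime as a UTC timestamp for as_of_time."""
    utc = moment.astimezone(timezone.utc)     # Convert to UTC first.
    return utc.strftime("%Y-%m-%dT%H:%MZ")    # Minute precision, explicit Z.

pacific = timezone(timedelta(hours=-8))       # Fixed offset, for illustration.
local = datetime(2008, 1, 1, 9, 0, tzinfo=pacific)

envelope = {
    "query": {"return": "count", "type": "/type/type"},
    "as_of_time": as_of_time_value(local),
}
print(envelope["as_of_time"])   # 2008-01-01T17:00Z
```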

Here are two queries (in their query envelopes, and named "now" and "then" in an outer envelope) that allow us to see how the number of defined types has grown over time:

{
  "now": {
    "query": {
        "return":"count",
        "type":"/type/type"
    }
  },
  "then": {
    "query": {
        "return":"count",
        "type":"/type/type"
    },
    "as_of_time":"2008-01-01"
  }
}

On 2008-04-25, freebase.com returned the following outer response envelope, showing the addition of over 900 types in under four months:

{
  "status" : "200 OK",
  "code" : "/api/status/ok",
  "now" : {
    "code" : "/api/status/ok",
    "result" : 5548
  },
  "then" : {
    "code" : "/api/status/ok",
    "result" : 4623
  }
}

In addition to the ability to run queries "in the past", Metaweb allows you to query the modification history of any object. See Section 3.7.4 for details.

4.2.4.5. Preventing Uniqueness Errors

When a MQL query or sub-query returns more than one result, but is not enclosed in square brackets to indicate that an array of results is expected, Metaweb normally returns an error code indicating that a uniqueness error has occurred. We saw in Section 3.2.4, for example, that the following query causes a uniqueness error because the object representing The Police has more than one type:

{"id":"/en/the_police", "type":null}

You can prevent this kind of error by setting the envelope parameter uniqueness_failure to "soft". (The default value is "hard"). With this parameter set to "soft", Metaweb simply returns one of the matching results, discards the others, and does not return an error or give any other indication that additional results are available.

Picking one (effectively random) result from a set and discarding all the others is not usually a useful strategy for handling multiple results, and the uniqueness_failure envelope parameter is not intended for use with queries like the one above. As a general rule, if a property is allowed to have more than one value, then queries of that property should be placed within square brackets.

When a property definition is changed to make the property unique after it is initially defined as non-unique, then it is possible (but rare) to find multiple values for a nominally unique property. You may want to use the uniqueness_failure envelope parameter when working with such a theoretically-unique property that is not yet unique in practice.

4.2.5. The callback Request Parameter

Every mqlread request must have a query or queries request parameter. It may also optionally have a callback parameter (this is a request parameter, not an envelope parameter). This parameter is used in JavaScript-based Metaweb applications and allows mqlread to be invoked using dynamically-generated <script> tags. (This <script>-based technique for client-server communication is commonly known as JSONP and we'll see examples in Section 4.5.)

The value of the callback parameter should be the name of a JavaScript function (without parentheses), such as processMQLResponse. Including the callback parameter in a mqlread invocation causes a small but very important change in the mqlread output. In order to understand this change, recall that mqlread always returns a JSON-serialized object in the HTTP response body. Also recall that JSON is a subset of JavaScript, which means that any JSON-serialized object can be parsed by a JavaScript interpreter to re-create the object it represents. If you include a callback parameter in a mqlread invocation, mqlread returns a JSON-serialized object inside a JavaScript function invocation of the callback function you specify. Suppose, for example, that you invoke mqlread with a URL like this:

https://api.freebase.com/api/service/mqlread?callback=cb&query={"query":[{...}]}

In this case, the response will look like this:

cb({
  "status":"200 OK",
  "code":"/api/status/ok",
  "result":[{...MQL result here...}]
})

The JSON-serialized result envelope object is prefixed with the name of the callback function and an open parenthesis and is suffixed with a matching close parenthesis. This seems like a trivial change, but it transforms a bare JSON object into a JavaScript function invocation with that object as its argument. If the mqlread query URL is used as the src attribute of a <script> tag, the mqlread response becomes executable JavaScript content, and the callback function you name gets passed the response envelope object to process however it wants.

There is one other effect of using the callback parameter in a request. It forces mqlread to always return an HTTP status code of "200 OK", even when the query envelope is malformed and unparseable. The true HTTP status (which would otherwise have been returned) is available from the status property of the response envelope, and this enables a callback function to handle errors as well as successful queries.
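Although the callback parameter is meant for JavaScript running in a browser, the shape of the wrapped response is easy to see from any language. The Python sketch below strips the wrapper and inspects the envelope's own status property. It assumes the response has exactly the form shown above, with nothing after the closing parenthesis. The function name and the sample body are hypothetical.

```python
import json

def unwrap_callback(body, callback):
    """Strip the callback wrapper from a mqlread response and parse the envelope."""
    prefix = callback + "("
    body = body.strip()
    if not (body.startswith(prefix) and body.endswith(")")):
        raise ValueError("response is not wrapped in a call to " + callback)
    return json.loads(body[len(prefix):-1])

# Hand-written sample of an invocation error delivered through a callback.
sample = 'cb({"status": "400 Bad request", "code": "/api/status/error", "messages": []})'
envelope = unwrap_callback(sample, "cb")

# With a callback the HTTP status is always "200 OK", so the envelope's
# status property is the place to look for invocation errors.
print(envelope["status"])   # 400 Bad request
```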

See Section 4.5 for practical examples that use the callback request parameter.

4.2.6. mqlread Error Codes

The mqlread response envelope does not always include the results of your query. Errors can occur if you invoke mqlread incorrectly, if you specify an invalid MQL query, if your query times out, or if there is an internal error on the Metaweb server. This section explains how to check for and handle mqlread errors.

If you invoke mqlread without either the query or queries parameter, or if the value of that parameter is not a valid JSON string, it responds with an HTTP status "400 Bad Request". Even though this is an HTTP error code, the response still includes a response body and that body is still a JSON object that you can parse. It looks something like this:

{
  "status": "400 Bad request", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "value": null
      }, 
      "message": "one of query=, or queries= must be provided", 
      "code": "/api/status/error/input/invalid"
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:33:30Z;0001"
}

If you invoke mqlread with correct parameters and a parseable query envelope, then it will always return an HTTP status code of "200 OK". This does not mean, however, that no error has occurred. If the query envelope is valid JSON but does not have a query property, for example, then you get this response:

{
  "status": "200 OK", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "key": "query"
      }, 
      "message": "Missing 'query' parameter", 
      "code": "/api/status/error/envelope/parse"
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:35:19Z;0001"
}

And if the invocation is correct and the envelope is correct but the MQL query is invalid, then you might get a response like this:

{
  "status": "200 OK", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "expected_type": "/music/artist", 
        "property": "albums"
      }, 
      "query": {
        "albums": [], 
        "type": "/music/artist", 
        "id": "/en/the_police", 
        "error_inside": "."
      }, 
      "message": "Type /music/artist does not have property albums", 
      "code": "/api/status/error/mql/type", 
      "path": ""
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:36:51Z;0001"
}

If your query is simply too big (such as asking for the names and discographies of 10,000 bands) the query will time out and mqlread will return a response like this:

{
  "status": "200 OK", 
  "code": "/api/status/error", 
  "messages": [
    {
      "info": {
        "detail": [
          "timed out"
        ], 
        "timeout": 8.0
      }, 
      "message": "Query timeout", 
      "code": "/api/status/error/mql/timeout"
    }
  ],
  "transaction_id":"cache;cache01.sandbox.sfo1:8101;2008-09-12T21:39:01Z;0004"
}

Each of these error response envelopes includes the properties status, transaction_id, code, and messages. The status property simply repeats the HTTP status code. If you check the HTTP status code, you can ignore the status property. But if you use the callback parameter in your request, then mqlread will return an HTTP status of "200 OK", even when invocation errors occur. In this case the status property can tell you that an invocation error occurred.

The transaction_id property is always present in the response envelope whether or not an error occurred. Its value is a unique identifier for your request and enables Metaweb engineers to look it up in their internal log files. You should include the value of this property whenever you report a bug or ask a question about a query on the Metaweb developers mailing list.

The code property specifies the error code for the query. It is present in every response envelope, and you should always check this property after parsing the response from mqlread. The value of this property is always a string: if it is "/api/status/ok", then the query was successful and no error occurred. Otherwise something went wrong.

When the value of the code property is "/api/status/ok", the response envelope contains the query results as the value of a property named result. When code has any other value, the response envelope includes a messages property instead of a result property. The value of the messages property is an array (usually of length 1) of message objects each of which has the following properties:

code

A more detailed error code that more precisely specifies the nature of the error. Note that this code property of the message object is usually distinct from, and more informative than, the code property of the response envelope.

message

A human-readable description of the error.

info

An object that provides additional details about the error. The properties of this object depend on the error code.

query

For errors that result from an invalid MQL query, this property is a copy of the query object with the addition of a special error_inside property, to indicate where the error occurs.

path

When the message object contains a query property, it also contains a path property that specifies the "path" of property names from the root of the MQL query to the location of the error. This is an alternative to the error_inside property for locating the source of the error. If the error is in the outermost object of the query, then this property is just an empty string.
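As an illustration of how these properties fit together, the following Python sketch turns one message object into a single line of text. The function name describe_error and the output format are arbitrary choices of this sketch. The sample message is copied from the invalid-query response shown earlier in this section.

```python
def describe_error(message):
    """Build a one-line description from one entry of the messages array."""
    text = "%s: %s" % (message["code"], message["message"])
    if "query" in message:
        # path is an empty string when the error is in the outermost object.
        text += " (at %s)" % (message.get("path") or "top level of query")
    info = message.get("info")
    if info:
        details = ", ".join("%s=%r" % (k, info[k]) for k in sorted(info))
        text += " [" + details + "]"
    return text

sample = {
    "info": {"expected_type": "/music/artist", "property": "albums"},
    "query": {"albums": [], "type": "/music/artist",
              "id": "/en/the_police", "error_inside": "."},
    "message": "Type /music/artist does not have property albums",
    "code": "/api/status/error/mql/type",
    "path": "",
}
print(describe_error(sample))
```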

The descriptions above of error-related properties are valid when you use the query request parameter. If you use queries instead, there are a few differences you should be aware of. First, the status property appears only in the outer envelope, not in the individual response envelopes it contains. Second, there are code properties both in the outer envelope and in the individual response envelopes. The outer code will only indicate an error, however, if there was an invocation error (such as an unparseable query envelope), and in this case there won't be response envelopes inside the outer envelope. As long as mqlread is correctly invoked, the code property of the outer envelope will be "/api/status/ok", even if there were errors in one or more (or all) of the queries. It is the code property of each individual response envelope that specifies the status of the corresponding query. Third (and this really follows from the second), the messages property only appears in the outer envelope for invocation errors. Otherwise, messages properties always appear within the response envelopes.
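
The two levels of checking described above might be sketched in Python as follows. This is only an illustration: the function name unpack_outer, the query names q0 and q1, and every value in the sample envelope are assumptions made for the example, not output captured from mqlread.

```python
OK = "/api/status/ok"

def unpack_outer(outer, names):
    """Split an outer response envelope into results and errors.

    names lists the query names used in the request. Returns a pair
    (results, errors): results maps query names to query results, and
    errors maps query names to lists of message objects.
    """
    if outer["code"] != OK:
        # Invocation error: there are no inner envelopes to open, and
        # the messages are in the outer envelope itself.
        first = outer["messages"][0]
        raise RuntimeError("%s: %s" % (first["code"], first["message"]))
    results, errors = {}, {}
    for name in names:
        inner = outer[name]            # One response envelope per query
        if inner["code"] == OK:
            results[name] = inner["result"]
        else:
            errors[name] = inner["messages"]
    return results, errors

# A hypothetical outer envelope: q0 succeeded and q1 failed.
sample_outer = {
    "code": OK,
    "q0": {"code": OK, "result": [{"name": "Example Band"}]},
    "q1": {"code": "/api/status/error",
           "messages": [{"code": "/api/status/error/example",
                         "message": "Hypothetical failure"}]},
}

results, errors = unpack_outer(sample_outer, ["q0", "q1"])
print(results)
print(errors)
```

Note that the outer code is "/api/status/ok" in the sample even though q1 failed, which is why each inner envelope must be checked separately.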

Example 4.2 included example code for mqlread error handling, and many examples that follow will also include error handling code. The general rule is to check the code property of any response envelope before "opening" it to extract the result. (And if the response envelope is inside an outer envelope, you must also check the code of that outer envelope before opening it to extract the response envelope.) If you find a code that is not "/api/status/ok", you can typically construct a suitable error message with messages[0].code and messages[0].message. If you have reason to expect a certain class of errors, you can refine your error reporting based on messages[0].code and messages[0].info.

4.3. A Python Album Lister

Now that we've seen how mqlread works in more formal detail, let's return to example code, and re-write our album listing script in Python. Example 4.3 is a Python program that:

  • expresses a MQL query as a Python data structure;

  • wraps the query in an envelope object;

  • sets the value of envelope parameters in the envelope object;

  • serializes the envelope object to a JSON string;

  • URI encodes the serialized envelope;

  • uses the serialized and encoded envelope as the value of the query parameter in an api.freebase.com URL;

  • obtains the query result, in text form, by fetching the contents of the URL;

  • parses the JSON string returned by mqlread into a response envelope object;

  • checks the code property in the response envelope to determine if the query was successful (if not, it extracts the error message from the envelope, prints the message and exits);

  • gets the query result from the envelope, extracts the array of albums, and prints their names and release dates.

This code relies on the simplejson module for JSON encoding and parsing. You can find the simplejson code at http://cheeseshop.python.org/pypi/simplejson.

Example 4.3. albumlist.py: listing albums with Python

import sys            # Command-line arguments, etc.
import simplejson     # JSON encoding.
import urllib         # URI encoding.
import urllib2        # High-level URL content fetching.

# These are some constants we'll use.
SERVER = 'api.freebase.com'              # Metaweb server
SERVICE = '/api/service/mqlread'         # Metaweb service

# Compose our MQL query as a Python data structure.
# The query is an array in case multiple bands share the same name.
band = sys.argv[1]                       # The desired band, from command line.
query = [{'type': '/music/artist',       # Our MQL query in Python.
          'name': band,                  # Place the band in the query.
          'album': [{ 'name': None,      # None is Python's null.
                      'release_date': None,
                      'sort': 'release_date' }]}]

# Put the query in an envelope
envelope = {
    'query': query,              # The query property specifies the query.
    'escape': False              # Turns off HTML escaping.
    }

# These five lines are the key code for using mqlread
encoded = simplejson.dumps(envelope)            # JSON encode the envelope.
params = urllib.urlencode({'query':encoded})    # Escape request parameters.
url ='http://%s%s?%s' % (SERVER,SERVICE,params) # The URL to request.
f = urllib2.urlopen(url)                        # Open the URL as a file.
response = simplejson.load(f)                   # Read and JSON parse response.

# Check for errors and exit with a message if the query failed.
if response['code'] != '/api/status/ok':                   # If not okay...
    error = response['messages'][0]                        # First msg object.
    sys.exit('%s: %s' % (error['code'], error['message'])) # Display code,msg.

# No errors, so handle the result
result = response['result']           # Open the response envelope, get result.

# Check the number of matching bands
if len(result) == 0:
    sys.exit('Unknown band')
elif len(result) > 1:
    print "Warning: multiple bands named " + band + ". Listing first only."

result = result[0]                    # Get first band from array of matches.
if not result['album']:               # Exit if band has no albums
    sys.exit(band + ' has no known albums.')

for album in result['album']:         # Loop through the result albums.
    name = album['name']              # Album name.
    date = album['release_date']      # Release date timestamp or null.
    if not date: date = ''            
    else: date = ' [%s]' % date[0:4]  # Just the 4-digit year in brackets.
    print "%s%s" % (name, date)       # Print name and date.
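
For readers working in Python 3, the envelope-building and encoding steps of Example 4.3 might be sketched as follows. This is our own adaptation, not part of the original script: it substitutes the standard json and urllib.parse modules for simplejson and urllib, the function name build_mqlread_url is hypothetical, and the band name is just sample input. The sketch only constructs the URL and decodes it again to confirm the round trip; it does not contact the server.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

SERVER = "api.freebase.com"          # Metaweb server
SERVICE = "/api/service/mqlread"     # Metaweb service

def build_mqlread_url(query, **envelope_params):
    """Wrap a MQL query in an envelope and return the mqlread URL."""
    envelope = {"query": query}               # Put the query in an envelope
    envelope.update(envelope_params)          # Add any envelope parameters
    encoded = json.dumps(envelope)            # JSON encode the envelope
    params = urlencode({"query": encoded})    # Escape the request parameter
    return "http://%s%s?%s" % (SERVER, SERVICE, params)

query = [{"type": "/music/artist",
          "name": "The Police",               # Sample input only
          "album": [{"name": None,
                     "release_date": None,
                     "sort": "release_date"}]}]

url = build_mqlread_url(query, escape=False)
print(url)

# Decode the URL again to confirm that the envelope survives the trip.
decoded = json.loads(parse_qs(urlsplit(url).query)["query"][0])
print(decoded == {"query": query, "escape": False})
```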

4.4. A Metaweb-enabled PHP Application

In this section, we'll demonstrate how to create an online version of our album-lister application. We'll use the server-side scripting language PHP to create the web application that was shown in Figure 1.2 of Chapter 1. Example 4.4 is a PHP file that defines a class named Metaweb. This class has a single method, named read, that takes a MQL query as a PHP data structure and returns the query result as a PHP data structure. Although the language and library details differ, the code that implements this read method is much like the code you've seen in Example 4.2 and Example 4.3.

The code in Example 4.4 is commented and you should be able to follow it even if you are not familiar with PHP. One point to note is that in PHP the data structure known as an array works as both a sequential array and as an associative array. That is, JSON objects and JSON arrays are both arrays in PHP. Example 4.4 depends on an external module for JSON serialization and parsing. The module used here is from http://pear.php.net.

Example 4.4. metaweb.php: using mqlread with PHP

<?php
/*
 * The Metaweb class defines a read() method for invoking the Metaweb
 * mqlread service on api.freebase.com. read() takes a MQL query (as a
 * PHP array), sends that query to the mqlread service and retrieves 
 * the response. It parses the response to a PHP array, and extracts
 * the query result from the response envelope and returns it. If the
 * query fails, it returns null (without providing useful diagnostics).
 */
require "JSON.php"; // A JSON encoder/decoder from http://pear.php.net

class Metaweb {
  var $json;  // Holds the JSON encoder/decoder object
  var $URL = "http://api.freebase.com/api/service/mqlread";

  // Our constructor function sets up the JSON encoder/decoder
  function Metaweb() {
    // Set up our JSON encoder and decoder object
    $this->json = new Services_JSON(SERVICES_JSON_LOOSE_TYPE);
  }
  
  // This method submits a query and synchronously returns its result.
  function read($queryobj) {
    // Put the query into an envelope object
    $envelope = array("query" => $queryobj);

    // Serialize the envelope object to JSON text
    $serialized = $this->json->encode($envelope);

    // Then URL encode the serialized text
    $encoded = urlencode($serialized);

    // Now build the URL that represents the query
    $url = $this->URL . "?query=" . $encoded;

    // Use the curl library to send the query and get response text
    $request = curl_init($url);

    // Return the result instead of printing it out.
    curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);

    // Now fetch the URL
    $responsetext = curl_exec($request);
    curl_close($request);

    // Parse the server's response from JSON text into a PHP array
    $response = $this->json->decode($responsetext);

    // Return null if the query was not successful
    if ($response["code"] !== "/api/status/ok")
        return null;

    // Otherwise, return the query result from the envelope
    return $response["result"];
  }
}
?>

With the PHP utility function defined in Example 4.4, it becomes easy to write simple Metaweb-enabled web applications in PHP. Example 4.5 demonstrates. It displays an HTML form in which the user can enter the name of a band. When the form is submitted, it lists the albums by that band.

Example 4.5. albumlist.php: A Metaweb-enabled web application in PHP

<html>
<body>
<form>Band: <input type="text" name="band"><input type="submit"></form>
<?php
$band = $_GET["band"];        // What band is specified in the URL?
if ($band) {                  // Only list albums if a band has been specified.
  require "metaweb.php";      // Import Metaweb utility code.
  $metaweb = new Metaweb();   // Create a Metaweb object.

  // Build a MQL request for the list of albums by the band
  $query = array("type" => "/music/artist", // We want a musical artist.
                 "name" => $band,           // This is its name.
                 "album" => array());       // Fill in this empty albums array!
  
  // Submit the query using the utility function defined earlier
  $result = $metaweb->read($query);
  
  // Now output the query results in HTML
  if ($result) {                              // If we got a result...
      // The band name is user input: escape it before writing it into HTML.
      echo "<hr><h1>Albums by " . htmlspecialchars($band) . "</h1>"; // Title.
      $albums = $result["album"];                  // Get the array of albums.
      foreach ($albums as $album)                  // For each album...
          echo $album . "<br>";                    // ...display album name.
  }
  else {                                      // Otherwise, print error msg.
      echo "<hr><b>Unknown band: " . htmlspecialchars($band) . "</b>";
  }
}
?>
</body>
</html>

4.5. Metaweb Queries with JavaScript

Since the MQL syntax is based on JSON, Metaweb queries are most gracefully expressed in JavaScript. We haven't seen a JavaScript-based Metaweb application so far for one important reason: the same-origin policy. The same-origin policy is a sweeping (but necessary) security restriction in JavaScript that says that code embedded in a document that was served by server A can only interact with content that is also served by server A. This restriction applies to the XMLHttpRequest object, which is typically used to fetch the contents of a URL. A web application hosted at api.freebase.com can use XMLHttpRequest to submit MQL queries to the mqlread service on that same server, but this is not allowed for web applications hosted on any other server.

There are two workarounds to this restriction. The first, and most obvious, is to run a proxy script on your own site that behaves like the mqlread service but simply forwards your query to api.freebase.com.
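
A forwarding proxy of this kind might be sketched in Python as follows. This is a minimal illustration under several assumptions: the names upstream_url, MqlreadProxy, and run_proxy are our own, the port number is arbitrary, and the sketch omits error handling, caching, and any restriction on who may use it, all of which a real deployment would need.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit
from urllib.request import urlopen

UPSTREAM = "http://api.freebase.com/api/service/mqlread"

def upstream_url(request_path):
    """Map a request for our proxy onto the equivalent mqlread URL."""
    query_string = urlsplit(request_path).query
    return UPSTREAM + ("?" + query_string if query_string else "")

class MqlreadProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Fetch the upstream response and relay it unchanged.
        with urlopen(upstream_url(self.path)) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "text/plain")
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run_proxy(port=8000):
    """Serve forever on the given port (not called automatically)."""
    HTTPServer(("", port), MqlreadProxy).serve_forever()
```

Because the page and the proxy are served from the same origin, client-side code could then use XMLHttpRequest against the proxy's URL just as it would against mqlread itself.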

The second workaround is known as JSONP and relies on the fact that a query result, in JSON format, is valid JavaScript code. This means that a mqlread URL can be used as the value of the src attribute of a <script> tag. When the server returns its result, the <script> tag evaluates the JSON text as JavaScript code. Evaluating the JSON text creates the JavaScript object we want, but to make this scheme work, the script then has to be able to do something with that object. The solution is to use the callback request parameter (which was described in Section 4.2.5). If the URL for your mqlread invocation includes a callback parameter, then mqlread will take the value of that parameter to be the name of a JavaScript function. Then, instead of simply returning a JSON text, it will return the specified function name, an open parenthesis, the JSON text and a close parenthesis. When used this way with a <script> tag, the JSON text is evaluated, and the object that results is passed to the specified function (which you must have defined previously).
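
To make the shape of a JSONP response concrete, here is a minimal Python sketch of the wrapping just described, together with the inverse operation that a non-browser client might apply. The function names wrap_jsonp and unwrap_jsonp and the callback name handleResponse are our own, and the sample envelope is invented. The real service may differ in details such as whitespace.

```python
import json

def wrap_jsonp(callback, envelope):
    """Return the function name, an open parenthesis, the JSON text
    and a close parenthesis."""
    return "%s(%s)" % (callback, json.dumps(envelope))

def unwrap_jsonp(text, callback):
    """Recover the JSON object from a JSONP response for callback."""
    prefix = callback + "("
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("not a JSONP response for " + callback)
    return json.loads(text[len(prefix):-1])

# A hypothetical response envelope.
envelope = {"code": "/api/status/ok", "result": {"name": "Example Band"}}

text = wrap_jsonp("handleResponse", envelope)
print(text)
print(unwrap_jsonp(text, "handleResponse") == envelope)
```

In a browser no unwrapping step is needed: the <script> tag evaluates the text, which calls the named function with the object as its argument.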

We use the <script>-based JSONP technique in this chapter: it is simple, elegant and in common use across the internet.

One thing you'll notice about our JavaScript examples here is that they are asynchronous: when you submit a query you do not get the result immediately. Instead, the callback function you specify is invoked when the result is available. This asynchronous programming model is common in client-side JavaScript, but is substantially different from the synchronous model demonstrated in Example 4.4 and other examples.

4.5.1. Invoking mqlread with the jQuery Library

Let's jump right in. Example 4.6 is an album lister web application like Example 4.5, but it communicates with freebase.com using JavaScript on the client side instead of PHP on the server side. It uses the popular jQuery library to do JSONP-based interaction with mqlread and to build and traverse the HTML document that displays the query results. Download jquery.js (version 1.2 or later) from http://jquery.com. Example 4.6 also relies on an external library named json2.js. This file defines JavaScript functions for parsing and serializing JSON. The code for this module is in the public domain and is available online at http://www.JSON.org/json2.js.

Example 4.6 is an HTML file that consists mainly of JavaScript code. There is a short HTML body that defines an input field into which you can type the name of a band and an HTML <div> tag into which the names of albums will be inserted. An event handler on the input field calls the JavaScript function listAlbums() whenever a new band name is entered.

There are several points to notice in the listAlbums() function. First, it includes a MQL query enclosed in a mqlread query envelope expressed in JavaScript. Since JSON is a subset of JavaScript, the JavaScript representation is very close to the pure-JSON queries we've seen elsewhere. Note, however, that JavaScript allows but does not require double quotes around most property names, and they've been omitted in this example. Second, listAlbums() uses the jQuery function named $(). This tersely-named function is the central part of the jQuery API. It is used to find elements within an HTML document and also to create new elements. The return value of $() is a jQuery object which represents one or more document elements and defines useful methods for operating on those elements. Third, the code that displays the list of album names uses jQuery.each() to iterate through the elements of the albums array and invoke the specified function on each of the album objects.

Finally, and most importantly, listAlbums() uses jQuery.getJSON() to invoke mqlread. The first argument specifies the URL of mqlread. The second argument specifies the names and values of the query parameters, and the third argument specifies a callback function to be invoked when the query results are ready. The only tricky detail here is that jQuery.getJSON() normally uses an XMLHttpRequest. To force it to use JSONP instead, we have to append the string "callback=?" to the URL. When it sees that string in the URL, it generates a unique JSONP callback name and inserts it into the URL in place of the question mark. The code for this jQuery-based album lister follows. Later, in Example 4.8 we'll see how you can perform JSONP "manually" in your own custom JavaScript library.

Example 4.6. simplelist.html: Listing albums with jQuery

<html>
<head>
<script src="jquery.js"></script>    <!-- Ajax and DOM magic -->
<script src="json2.js"></script>     <!-- JSON.stringify() -->
<script>                                
function listAlbums(band) {     // Display albums by the specified band.
    var envelope = {                       // The mqlread query envelope
        query : {                          // The MQL query 
            type: "/music/artist",         // Find a band
            name: band,                    // With the specified name
            album: [{                      // We want to know about albums
                name:null,                 // Return album names
                release_date:null,         // And release dates
                sort: "release_date",      // Order by release date
                "release_type!=":"single"  // Don't include singles
            }]
        }
    };

    var output = $("#output");                          // Output goes here
    output.html("<h1>Albums by " + band + "</h1>");     // Display a title

    // Invoke mqlread and call the function below when it is done.
    // Adding callback=? to the URL makes jQuery do JSONP instead of XHR.
    jQuery.getJSON("http://api.freebase.com/api/service/mqlread?callback=?",
                   {query: JSON.stringify(envelope)},   // URL parameters
                   displayResults);                     // Callback function

    // This function is invoked when we get the result of our MQL query
    function displayResults(response) {  
        if (response.code == "/api/status/ok" &&        
            response.result && response.result.album) { // Check for success...
            var list = $("<ul>");                       // Make <ul> tag.
            output.append(list.hide());                 // Keep it hidden
            var albums = response.result.album;         // Get albums.
            jQuery.each(albums, function() {            // Loop through albums.
                list.append($("<li>").html(this.name)); // Make <li> for each.
            });
            list.show("normal");                        // Reveal the list
        }
        else {                                          // On failure...
            output.append("Unknown band: " + band);     // Display message.
        }
    }
}
</script>
</head>
<body>
<b>Enter the name of a band: </b>                     <!-- Prompt for input -->
<input type="text" onchange="listAlbums(this.value)"> <!-- Get band name -->
<hr><div id="output"></div>                           <!-- Display output -->
</body>
</html>

4.5.2. Listing Albums and Tracks with JavaScript

Example 4.7 is a more complicated example than Example 4.6: in addition to listing albums by a band, it can also list tracks on an album. Its output is pictured in Figure 4.1. Unlike Example 4.6, it does not rely on the jQuery library; instead, it creates and manipulates the HTML document using lower-level DOM calls.

Example 4.7 depends on two external modules. json2.js is the public-domain JSON parser and serializer that we used in Example 4.6. The second module of external code is metaweb.js. This module, whose code is listed in Example 4.8, defines the utility function Metaweb.read() that submits a MQL query, through a <script> tag, to the mqlread service. This Metaweb.read() function performs its own JSONP networking without relying on jQuery.

Figure 4.1. Listing albums and tracks

Example 4.7 lists the albums (it uses the != constraint to exclude singles) by a specified band, and also displays the tracks on an album when the user clicks on the name of the album. There are several features worth noting in this example. First, notice that this code uses the Metaweb.read() function to send queries to the mqlread service. We'll see how Metaweb.read() is implemented in the next section. Second, note that the code displays a "Loading..." message to the user while the queries are pending and also displays an appropriate message if a query fails. Finally, in addition to querying album and track names, the queries in this example also ask for album release date and track length. The example includes utility functions to massage this data, extracting a year from a /type/datetime string and converting a track length in seconds into the more familiar mm:ss format.
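
The two data-massaging steps can be tried out in isolation. The following Python sketch is our own rendering of the same idea, not a translation taken from the example: the function names are hypothetical, the sample values are invented, and unlike the JavaScript version it requires a four-character date to consist of digits.

```python
import re

def get_year(date):
    """Return the year portion of a /type/datetime string, or None."""
    if not date:
        return None
    # Accept a bare year ("1978") or a longer value ("1978-11-02").
    match = re.match(r"(\d{4})(?:-|$)", date)
    return match.group(1) if match else None

def to_minutes_and_seconds(seconds):
    """Convert a track length in seconds to minutes:seconds format."""
    minutes, secs = divmod(int(seconds), 60)   # Drop fractional seconds
    return "%d:%02d" % (minutes, secs)         # Zero-pad the seconds

print(get_year("1978-11-02"))
print(to_minutes_and_seconds(245.6))
```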

Example 4.7. albumlist.html: a JavaScript album and track lister

<html>
<head>
<script src="json2.js"></script>     <!-- Defines JSON.stringify() -->
<script src="metaweb.js"></script>   <!-- Defines Metaweb.read() -->
<script>                                
/* Display albums by the specified band */
function listAlbums(band) {
    // Find the document elements we need to insert content into
    var title = document.getElementById("title");
    var albumlist = document.getElementById("albumlist");
    var tracklist = document.getElementById("tracklist");

    title.innerHTML = "Albums by " + band;           // Set the page title 
    albumlist.innerHTML = "<b><i>Loading...</i></b>"; // Album list is coming...
    tracklist.style.visibility = "hidden";           // Hide any old tracks
    
    var query = {                      // This is our MQL query 
        type: "/music/artist",         // Find a band
        name: band,                    // With the specified name
        album: [{                      // We want to know about albums         
            name:null,                 // Return album names
            id:null,                   // Also ids
            release_date:null,         // And release dates
            sort: "release_date",      // Order by release date
            "release_type!=":"single"  // Don't include singles
        }]
    };

    // Issue the query and invoke the function below when it is done
    Metaweb.read(query, displayAlbums);

    // This function is invoked when we get the result of our MQL query
    function displayAlbums(result) {  
        // If no result, the band was unknown.
        if (!result || !result.album) {
            albumlist.innerHTML = "<b><i>Unknown band: " + band + "</i></b>";
            return;
        }
        
        // Otherwise, the result object matches our query object, 
        // but has album data filled in.  
        var albums = result.album;  // the array of album data
        // Erase the "Loading..." message we displayed earlier
        albumlist.innerHTML = "";
        // Loop through the albums
        for(var i = 0; i < albums.length; i++) {
            var name = albums[i].name;                   // album name 
            var year = getYear(albums[i].release_date);  // album release year
            var text = name + (year?(" ["+year+"]"):""); // name+year

            // Create HTML elements to display the album name and year.
            var div = document.createElement("div");
            div.className = "album";
            div.appendChild(document.createTextNode(text));
            albumlist.appendChild(div);

            // Add an event handler to display tracks when an album is clicked
            div.onclick = makeHandler(name, albums[i].id);
        }

        // This function returns a function.  We do it this way to create
        // a closure that captures the album name and id.
        function makeHandler(name, id) {
            return function(e) { listTracks(name, id); }
        }
    }

    // A utility to return the year portion of a Metaweb /type/datetime
    function getYear(date) {
        if (!date) return null;
        if (date.length == 4) return date;
        if (date.match(/^\d{4}-/)) return date.substring(0,4);
        return null;
    }
}

/* Display the tracks on the specified album by the specified band */
function listTracks(albumname, albumid) {
    // Begin by displaying a Loading... message
    var tracklist = document.getElementById("tracklist");
    tracklist.innerHTML = "<h2>" + albumname + "</h2><p>Loading...";
    tracklist.style.visibility = "visible";

    // This is the MQL query we will issue
    var query = {
        type: "/music/album",
        id: albumid,
        // Get track names and lengths, sorted by index
        track: [{name:null, length:null, index:null, sort:"index"}]
    };

    // Issue the query, invoke the nested function when the response arrives
    Metaweb.read(query, function(result) {
                     if (result && result.track) { // If result is defined
                         var tracks = result.track;  // array of tracks
                         // Build an array of track names + lengths
                         var listitems = [];
                         for(var i = 0; i < tracks.length; i++) {
                             var n = tracks[i].name + " (" +
                               toMinutesAndSeconds(tracks[i].length) + ")";
                             listitems.push(n);
                         }
                         // Display the track list by setting innerHTML
                         tracklist.innerHTML = "<h2>" + albumname + "</h2>" +
                             "<ol><li>" + listitems.join("<li>") + "</ol>";
                     }
                     else {
                         // If empty result display error message
                         tracklist.innerHTML = "<h2>" + albumname + "</h2>" +
                             "<p>No track list is available.";
                     }
                 });

    // Convert track length in seconds to minutes:seconds format
    function toMinutesAndSeconds(seconds) {
        var minutes = Math.floor(seconds/60);
        var seconds = Math.floor(seconds-(minutes*60));
        if (seconds <= 9) seconds = "0" + seconds;
        return minutes + ":" + seconds;
    }
};
</script>
<!-- A CSS stylesheet to make the output look nice -->
<style> 
#albumlist { width:50%; padding: 5px; }
#tracklist {
  width: 45%; float:right; visibility:hidden;
  padding: 5px; border: solid black 2px; margin-right: 10px;
  background-color: #8a8;
}
#tracklist h2 { font: bold 16pt sans-serif;  text-align: center;}
#tracklist p { text-align: center; font: italic bold 12pt sans-serif; }
#tracklist li { font-style: italic;}
div.album {  font: bold 12pt sans-serif; margin: 2px;}
div.album:hover {text-decoration: underline;}
</style>
</head>
<body>
<!-- The HTML form in which the user can enter the name of a band -->
<!-- It invokes listAlbums() when the user hits Return or clicks the button -->
<form onsubmit="listAlbums(this.band.value); return false;">
<b>Enter the name of a band: </b>
<input type="text" name="band">
<input type="submit" value="List Albums">
</form>
<hr>
<!-- This is where we insert the results of our Metaweb queries -->
<h1 id="title"></h1>        <!-- display band name here -->
<div id="tracklist"></div>  <!-- list tracks here -->
<div id="albumlist"></div>  <!-- list of albums here -->
</body>
</html>

4.5.3. Client-side MQL Queries with <script>

In this section, we develop the Metaweb.read() utility function used by Example 4.7. The code in Example 4.8 is short but somewhat complicated because it performs JSONP networking. The key to understanding it is to realize that each call to Metaweb.read() defines a function with a name like Metaweb._3() (the number is different on each invocation). This function does the work of processing the response from the Metaweb server. In order to get this function invoked, Metaweb.read() adds a callback parameter to the mqlread query URL, like this:

&callback=Metaweb._3

When the mqlread service is invoked with this callback parameter, it does not return the result as a pure JSON object. Instead it returns JavaScript code. The code is simply a function invocation of the function named by the parameter. The invocation includes a JSON object as the single argument to the function:

Metaweb._3(/* JSON object goes here */)

Since JSON is a subset of the JavaScript object and array literal syntax, any JSON object is a valid function argument. By simply wrapping a function invocation around the JSON object, we've converted the mqlread response into a form suitable for use with a <script> tag.

Note that Metaweb.read() uses JSON.stringify() to serialize the query object into JSON form. This utility function is defined in the json2.js module from http://www.JSON.org/json2.js. The accompanying JSON.parse() function is not required, however, since the JavaScript interpreter that processes the <script> tag serves as our JSON parser.

Once you've understood the use of the callback parameter, and the JSON.stringify() function, the other important feature to note about Metaweb.read() is that it can submit multiple MQL queries in a single mqlread request. Each query must have a corresponding function to which results are passed, and you can pass any number of query/function pairs to Metaweb.read().

The implementation of Metaweb.read() shows how this is done: the function uses the queries parameter to mqlread instead of the query parameter, and it places each individual query envelope into an outer envelope object, giving each query a name: q0, q1, and so on.
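
The envelope structure just described can be sketched outside the browser as well. The following Python fragment builds the same kind of URL under stated assumptions: the function name build_queries_url is our own, the two sample queries and the album id are invented, and the sketch only constructs and inspects the URL without sending it.

```python
import json
from urllib.parse import quote, urlsplit, parse_qs

MQLREAD = "http://api.freebase.com/api/service/mqlread"

def build_queries_url(queries, callback=None):
    """Build a mqlread URL that submits several queries at once."""
    outer = {}                                   # The outer envelope
    for i, query in enumerate(queries):
        outer["q%d" % i] = {"query": query}      # One inner envelope each
    url = MQLREAD + "?queries=" + quote(json.dumps(outer), safe="")
    if callback is not None:
        url += "&callback=" + quote(callback, safe="")
    return url

# Two hypothetical queries.
albums = {"type": "/music/artist", "name": "Example Band", "album": []}
tracks = {"type": "/music/album", "id": "/hypothetical/album/id", "track": []}

url = build_queries_url([albums, tracks], callback="Metaweb._3")
print(url)

# Decode the URL again to inspect what would be sent.
params = parse_qs(urlsplit(url).query)
print(sorted(json.loads(params["queries"][0])))
print(params["callback"][0])
```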

Example 4.8. metaweb.js: Metaweb queries with script tags

/**
 * metaweb.js: 
 *
 * This file implements a Metaweb.read() utility function using a <script>
 * tag to generate the HTTP request and the URL callback parameter to
 * route the response to a specified JavaScript function.
 **/
var Metaweb = {};                         // Define our namespace
Metaweb.HOST = "http://api.freebase.com"; // The Metaweb server
Metaweb.MQLREAD = "/api/service/mqlread"; // The mqlread service on that server

// This function submits one or more MQL queries to the mqlread service.
// When the results are available, it asynchronously passes them to 
// the specified callback functions.  The function expects an even number
// of arguments: each pair of arguments consists of a query and a 
// callback function.
Metaweb.read = function(/* q0, f0 [, q1, f1...] */) {
    // Figure out how many queries we've been passed
    if (arguments.length < 2 || arguments.length % 2 == 1)
        throw "Wrong number of arguments to Metaweb.read()";
    var nqueries = arguments.length / 2;

    // Place each query in a query envelope, and put each query envelope
    // in an outer envelope.  Also, store the callbacks in an array for
    // later use.
    var envelope = {};                         // The outer envelope
    var callbacks = new Array(nqueries);       // An array to hold callbacks
    for(var i = 0; i < nqueries; i++) {        // For each query/callback pair
        var inner = {"query": arguments[i*2]}; // Make inner query envelope
        var qname = "q" + i;                   // Property name for the query
        envelope[qname] = inner;               // Put inner envelope in outer
        callbacks[i] = arguments[i*2 + 1];     // Callback for the query
    }

    // Serialize and encode the envelope object.
    var serialized = JSON.stringify(envelope);    // http://json.org/json2.js
    var encoded = encodeURIComponent(serialized); // Core JavaScript function

    // Start building the URL
    var url = Metaweb.HOST + Metaweb.MQLREAD +  // Base mqlread URL
        "?queries=" + encoded;                  // Queries request parameter

    // Get a callback function name for this url
    var callbackName = Metaweb.makeCallbackName(url);

    // Add the callback parameter to the URL
    url += "&callback=Metaweb." + callbackName;

    // Create the script tag that will fetch the contents of the url
    var script = document.createElement("script");

    // Define the function that will be invoked by the script tag.
    // This function expects to be passed an outer response envelope.
    // It extracts query results and passes them to the corresponding callback.
    // The function throws exceptions on errors. Since it is invoked
    // asynchronously, those exceptions can't be caught, but they will
    // appear in the browser's JavaScript console as useful diagnostics.
    Metaweb[callbackName] = function(outer) {
        // Throw an exception if there was an invocation error.
        if (outer.code != "/api/status/ok") {  // Should never happen
            var error = outer.messages[0];
            throw outer.status + ": " + error.code + ": " + error.message;
        }

        var errors = [];  // An array of error messages to be thrown later

        // For each query, get the response envelope, test for success,
        // and pass query results to the corresponding callback function.
        // If any query (or callback) fails, save an error to throw later.
        for(var i = 0; i < nqueries; i++) {
            var qname = "q" + i;            // Query property name
            var inner = outer[qname];       // Extract inner envelope
            // Check for query success or failure
            if (inner.code == "/api/status/ok") {
                try {
                    callbacks[i](inner.result); // On success, call callback
                } catch(ex) {
                    // Remember any exceptions caused by the callback
                    errors.push("Exception from callback #" + i + ": " + ex);
                }
            }
            else {
                // If it failed, add all of its error messages to errors[].
                for(var j = 0; j < inner.messages.length; j++) {
                    var error = inner.messages[j];
                    var msg = "mqlread error in query #" + i +
                        ": " + error.code + ": " + error.message;
                    errors.push(msg);
                }
            }
        }

        // Now perform some cleanup
        document.body.removeChild(script);   // Remove the <script> tag
        delete Metaweb[callbackName];        // Delete this function

        // Finally, if there were any errors, raise an exception now so they
        // at least get reported in the JavaScript console.
        if (errors.length > 0) throw errors.join("\n");
    };

    // Now set the URL of the script tag and add that tag to the document.
    // This triggers the HTTP request and submits the query.
    script.src = url
    document.body.appendChild(script);
};

// This function returns a callback name that is not currently in use.
// Ideally, to support caching, the name ought to be based on the URL so the
// same URL always generates the same name.  For simplicity, however, we
// just increment a counter here.
Metaweb.makeCallbackName = function(url) {
    return "_" + Metaweb.makeCallbackName.counter++;                     
};
Metaweb.makeCallbackName.counter = 0; // Initialize the callback name counter.
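
The envelope-building half of Metaweb.read() can be tried outside a browser. The following is a minimal sketch that isolates that step as a pure function. The name buildReadURL is hypothetical, and the base URL is the one quoted in Section 4.1 rather than the Metaweb.HOST and Metaweb.MQLREAD constants of this module.

```javascript
// Hypothetical helper (not part of metaweb.js): build the mqlread URL for
// one or more queries, using the same q0, q1, ... naming as Metaweb.read().
var MQLREAD_URL = "https://api.freebase.com/api/service/mqlread";

function buildReadURL(queries) {
    var envelope = {};                               // The outer envelope
    for (var i = 0; i < queries.length; i++) {
        envelope["q" + i] = {"query": queries[i]};   // One inner envelope each
    }
    // Serialize the outer envelope and URL-encode it
    return MQLREAD_URL + "?queries=" +
        encodeURIComponent(JSON.stringify(envelope));
}

// Two album queries submitted in a single request
var url = buildReadURL([
    [{type: "/music/artist", name: "The Police", album: []}],
    [{type: "/music/artist", name: "U2", album: []}]
]);
```

Decoding the queries parameter of the resulting URL yields an object with properties q0 and q1, each holding one inner query envelope.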

4.6. The Metaweb Search Service

The freebase.com website includes a search text box in the upper-right corner. Type in some text and hit return, and freebase will list database entries that match your query. While you type, it offers a drop-down menu of suggestions (or auto-completions). Both the search function and the auto-complete function are powered by Metaweb's search service. For every object in the database, the search service indexes the name and aliases (/common/topic/alias) of the object, as well as any other properties of /type/text, and any documents associated with the object (such as through /common/topic/article). Search results are ordered by an opaque but well-tuned ranking system that puts the most relevant results first. At the time of this writing, [20] Metaweb's search engine is not language-aware, and indexes all text without regard to its source language.

This section describes the search API so that you can use it in your own web applications. Note, however, that if you simply want to duplicate on your own website the searching and drop-down suggestion functionality of freebase.com, you should consider using the open-source freebase-suggest library, which is available at http://code.google.com/p/freebase-suggest/.

Keep in mind that the search service is a sophisticated full-text search engine that indexes Metaweb objects and the documents associated with them. Many simpler searches (such as looking for objects with particular words in their names) are best expressed as MQL queries (using the ~= operator for pattern matching) and submitted to the mqlread service.

The search service is a web-based service, like mqlread. The base URL for requesting search results from freebase.com is:

https://api.freebase.com/api/service/search

Unlike mqlread, the search service does not encode queries as JSON objects, and instead passes the query text and other search variables as URL parameters. For example, to search for people named "Smith", you could use this URL:

https://api.freebase.com/api/service/search?query=Smith&type=/people/person

The URL parameters to search are described in Section 4.6.1. If you enter the above URL into a browser, you'll see that the search service, like mqlread, returns its results as a JSON object. The format of the results is covered in Section 4.6.2.

The sub-sections that follow describe the input and output of the search service and then demonstrate its use with a JavaScript example.

4.6.1. Search Input

Input to the search service takes the form of HTML form-encoded parameters appended to the URL. There are four categories of parameters:

  • parameters that specify the text to be matched;

  • parameters that narrow the field of search by domain or type;

  • parameters that specify the number or offset of the desired results; and

  • parameters that affect the format of the returned results.

Every invocation of the search service must include either a query parameter or a prefix parameter (but not both). Both specify text to be searched for. If you use the query parameter, your text will only match complete words. If you specify prefix, however, any word that begins with your text matches. Searches are case-insensitive and ignore punctuation and accents on characters. The search target you specify may include multiple words, but you may not include multiple query or prefix parameters in a search URL.
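
A client can enforce the either-query-or-prefix rule before it sends a request. The sketch below shows one way to do that. The function name searchTextParameter is hypothetical and is not part of any Metaweb library.

```javascript
// Hypothetical helper: return the single text parameter for a search URL.
// Exactly one of params.query or params.prefix must be a string.
function searchTextParameter(params) {
    var hasQuery = typeof params.query == "string";
    var hasPrefix = typeof params.prefix == "string";
    if (hasQuery == hasPrefix)    // Both were given, or neither was given
        throw new Error("Specify either query or prefix, but not both");
    var name = hasQuery ? "query" : "prefix";
    return name + "=" + encodeURIComponent(params[name]);
}

var whole = searchTextParameter({query: "Will Smith"});  // Match complete words
var partial = searchTextParameter({prefix: "Smi"});      // Match word prefixes
```

Here whole is "query=Will%20Smith" and partial is "prefix=Smi". Passing both properties, or neither, throws an Error instead of producing a URL that the service would reject.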

You can narrow the field of search (or at least change the way the search service ranks results) with the domain and type parameters. The value of the domain parameter should be the id of a Metaweb domain. For example:

https://api.freebase.com/api/service/search?query=Smith&domain=/film

This query looks for topics that match the word Smith and have at least one type in the /film domain. Results might include the /film/actor Will Smith, and the /film/film "Mr. Smith Goes to Washington".

It is unusual to do a domain constraint alone as in the example above. Here is an alternative (with the URL truncated to fit on one line) that looks for film-related people named Smith. It matches actors and producers, but not films:

/api/service/search?query=Smith&type=/people/person&domain=/film

The type parameter specifies the id of a Metaweb type. In the search URL above, it specifies that each of the matches should be a /people/person object.

A search URL may include more than one type parameter to specify more than one type. The way that type parameters affect search results is controlled by the type_strict parameter, which must have one of the following three values:

all

Search results will only include objects that are members of all of the specified types.

any

Search results will only include objects that are members of at least one of the specified types. Objects that match more types will have increased relevance scores, and may appear earlier in the search results. This is the default type matching mode when no type_strict parameter is given.

should

The specified types will be used in computing the relevance of the search results, but results may include objects that do not match any of the specified types.

Here are some more search URLs (abbreviated so they fit on one line) with comments indicating what they do:

// Find objects matching "Smith" that are either films or actors.
// type_strict=any is implicit in this search
search?query=Smith&type=/film/film&type=/film/actor

// Find objects matching "Smith" that are both films and actors.
// Results, if any, probably represent typing errors in the database.
search?query=Smith&type=/film/film&type=/film/actor&type_strict=all

// Find objects matching "Smith"; give priority to films and actors
search?query=Smith&type=/film/film&type=/film/actor&type_strict=should
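
URLs like these can be assembled programmatically by repeating the type parameter once per type id. The following is a minimal sketch. The function name typedSearchURL is hypothetical, and it percent-encodes the slashes in type ids, which decodes to the same values as the unescaped URLs shown above.

```javascript
// Hypothetical helper: build a search URL with repeated type parameters.
var SEARCH_URL = "https://api.freebase.com/api/service/search";

function typedSearchURL(text, types, strict) {
    var parts = ["query=" + encodeURIComponent(text)];
    for (var i = 0; i < types.length; i++)           // One parameter per type
        parts.push("type=" + encodeURIComponent(types[i]));
    if (strict) {                                    // Optional matching mode
        if (strict != "all" && strict != "any" && strict != "should")
            throw new Error("Illegal type_strict value: " + strict);
        parts.push("type_strict=" + strict);
    }
    return SEARCH_URL + "?" + parts.join("&");
}

// Give priority to films and actors, but allow other matches
var url = typedSearchURL("Smith", ["/film/film", "/film/actor"], "should");
```

Omitting the third argument leaves type_strict out of the URL, so the service applies its default matching mode.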

By default, the search service returns the 20 most relevant results. You can request a different number of results with the limit parameter. If you want to retrieve another page of less-relevant results, you can use the start parameter:

search?query=Smith                   // Results 1-20
search?query=Smith&limit=10          // Results 1-10
search?query=Smith&start=10&limit=10 // Results 11-20
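
The arithmetic behind paging is simple enough to wrap in a helper. This sketch assumes, as the examples above suggest, that start is a zero-based offset, and further assumes that start=0 behaves the same as omitting the parameter. The function name pageParameters is hypothetical.

```javascript
// Hypothetical helper: the start and limit parameters for a zero-based page.
function pageParameters(page, pageSize) {
    var start = page * pageSize;      // Number of more-relevant results to skip
    return "start=" + start + "&limit=" + pageSize;
}

var first = pageParameters(0, 10);    // "start=0&limit=10":  results 1-10
var second = pageParameters(1, 10);   // "start=10&limit=10": results 11-20
```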

The final category consists of miscellaneous URL parameters that affect the formatting of the results. Use escape=false to turn off HTML escaping of the &, <, and > characters in the results. Use indent=true if you want the JSON string of search results to be pretty-printed so that it is more human-readable. The indent parameter is useful when experimenting with search URLs in a web browser, but is not necessary when executing searches from scripts. The search results shown in the next section are pretty-printed, even though the indent parameter is omitted from the search URLs.

If you want to use the search service from client-side JavaScript code, you'll need to use the callback parameter. This parameter enables JSONP just as it does for mqlread: it wraps a JavaScript function invocation around the JSON result string. Using the search service with JavaScript is demonstrated in Example 4.9 and Example 4.10.

Finally, the mql_output parameter specifies which properties of each matching object are to appear in the search results. See Section 4.6.2 for details and examples.

4.6.2. Search Output

The search service returns a JSON-encoded envelope object just as the mqlread service does. This object has a code property that specifies whether the search succeeded or failed. If this code is anything other than "/api/status/ok", then the search failed. In this case, the envelope object has a messages property whose value is an array of one or more message objects. The messages array returned by search is just like that returned by mqlread: each element of this array includes a code property that identifies the specific kind of error and a message property that can be used to generate a diagnostic message. Certain message code values also have an associated info property that provides error details, but the list of codes and the format of their associated info objects are not documented.

If you get an error from the search service, it typically means that you invoked it incorrectly. Errors can occur if (for example) you don't specify either the query or prefix parameter, or if you specify an illegal value for the type_strict parameter, or if you specify an undefined id as the value of the type or domain parameter. If a search URL is properly constructed, the search is considered a success, even if it matches nothing and returns no results.

If the code property of the envelope object is "/api/status/ok", then the query was a success, and the envelope has a result property that holds a (possibly empty) array of search results.
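
The success and failure cases described above can be handled in one small function. The following is a sketch only. The name unwrapSearchEnvelope is hypothetical, and the error envelope at the end uses made-up code and message values for illustration, not actual output from the service.

```javascript
// Hypothetical helper: return the result array of a search envelope,
// or throw an Error that lists every message the service reported.
function unwrapSearchEnvelope(envelope) {
    if (envelope.code == "/api/status/ok")
        return envelope.result;            // Possibly an empty array
    var lines = [];
    for (var i = 0; i < envelope.messages.length; i++) {
        var m = envelope.messages[i];
        lines.push(m.code + ": " + m.message);
    }
    throw new Error("search failed: " + lines.join("; "));
}

// A successful search that matched nothing is still a success
var none = unwrapSearchEnvelope({code: "/api/status/ok", result: []});

// An illustrative failure envelope (the values here are invented)
var failure = {
    code: "/api/status/error",
    messages: [{code: "/example/error/code", message: "example message"}]
};
```

Calling unwrapSearchEnvelope(failure) throws rather than returning, which keeps an empty result array distinguishable from an incorrectly constructed request.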

The following search URL, for example:

https://api.freebase.com/api/service/search?query=Smith&limit=1

might return these results:

{
  "status": "200 OK", 
  "code": "/api/status/ok", 
  "transaction_id":"cache;cache01.p01.sjc1:8101;2008-09-16T21:47:14Z;0006",
  "result": [
    {
      "id": "/guid/9202a8c04000641f80000000004f16a0",
      "name": "William Smith", 
      "alias": [], 
      "type": [
        { "id": "/common/topic", "name": "Topic" }, 
        { "id": "/people/person", "name": "Person" }, 
        { "id": "/people/deceased_person", "name": "Deceased Person" }
      ], 
      "article": { "id": "/guid/9202a8c04000641f80000000004f16a5" }, 
      "image": null
    }
  ]
}

The envelope object includes status, code and transaction_id properties just as mqlread response envelopes do. The interesting part of the envelope object is the result array. For this example query, we explicitly specified a limit of 1, so the array has only one element, but in general each element of the result array provides information about a single Metaweb object that matched the search query. By default (if you do not specify a mql_output parameter in the search query), each element is an object with the following properties:

id

The Metaweb id of the matched object.

name

The name of the matched object. (At the time of this writing, [21] there is no way to specify the preferred language in which the name should be returned.)

alias

An array of nicknames for the object. These are the values of the /common/topic/alias property.

type

An array that specifies the types of the object. Each element of this array is an object with id and name properties that specify the Metaweb id and the human-readable name of the type.

article

If the matched object has at least one associated document as the value of the /common/topic/article property, then this result property refers to an object with a single id property. This article.id property is the Metaweb id of the most recent document associated with the object. A blurb from this article can be a useful addition to search results. (Section 4.7 shows how to retrieve article blurbs.) If the matched object has no associated documents, then this property is null.

image

If the matched object has at least one associated image as the value of the /common/topic/image property, then this property refers to an object with nothing but an id property. image.id is a Metaweb id of the first image (the image with index:0, see Section 3.5.4). It may be useful to include a thumbnail of this image in a listing of search results. (Section 4.7 shows how to retrieve image thumbnails.) If the matched object has no associated images, then this image property is null.
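
Code that consumes these default properties has to allow for article and image being null. The sketch below reduces one result to the fields a results listing might need. The function name summarizeResult and the shape of the object it returns are hypothetical. The sample data is the William Smith result shown earlier in this section.

```javascript
// Hypothetical helper: reduce one default search result to a display summary.
function summarizeResult(r) {
    var typeNames = [];
    for (var i = 0; i < r.type.length; i++)      // Human-readable type names
        typeNames.push(r.type[i].name);
    return {
        id: r.id,
        label: r.name + " (" + typeNames.join(", ") + ")",
        articleId: r.article ? r.article.id : null,  // null if no document
        imageId: r.image ? r.image.id : null         // null if no image
    };
}

var summary = summarizeResult({
    "id": "/guid/9202a8c04000641f80000000004f16a0",
    "name": "William Smith",
    "alias": [],
    "type": [
        { "id": "/common/topic", "name": "Topic" },
        { "id": "/people/person", "name": "Person" },
        { "id": "/people/deceased_person", "name": "Deceased Person" }
    ],
    "article": { "id": "/guid/9202a8c04000641f80000000004f16a5" },
    "image": null
});
// summary.label is "William Smith (Topic, Person, Deceased Person)"
```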

These default properties of the elements of the result array can be overridden with the mql_output URL parameter. The value of this parameter should be a MQL query in square brackets. The search service adds the following properties to the query you specify:

"guid": null
"guid|=": [guids of all matching objects here]

The guid|= property specifies the guids of all objects that matched the search. The query is passed to mqlread, and the results become the result array of the search. Consider this query (which wraps onto two lines), for example:

https://api.freebase.com/api/service/search?query=love&type=/music/track&limit=3
&mql_output=[{"name":null,"/music/track/artist":null}]

It returns results like these:

{
  "status": "200 OK",
  "code": "/api/status/ok",
  "transaction_id":"cache;cache01.sandbox.sjc1:8101;2008-09-16T23:13:07Z;0001",
  "result": [{
    "guid" : "#9202a8c04000641f8000000001268f44",
    "name" : "Tainted Love",
    "/music/track/artist" : "Soft Cell"
  },{
    "guid" : "#9202a8c04000641f80000000012b0704",
    "name" : "Endless Love",
    "/music/track/artist" : "Diana Ross &amp; Lionel Richie"
  },{
    "guid" : "#9202a8c04000641f800000000129a206",
    "name" : "The Power of Love",
    "/music/track/artist" : "Huey Lewis &amp; the News"
  }]
}
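
When a program builds such a URL, the brackets, braces and quotation marks of the mql_output value should be percent-encoded. The sketch below serializes the MQL query with JSON.stringify() and encodes it with encodeURIComponent(). The function name searchWithOutputURL is hypothetical.

```javascript
// Hypothetical helper: build a search URL that includes an mql_output query.
var SEARCH_URL = "https://api.freebase.com/api/service/search";

function searchWithOutputURL(text, type, limit, mqlOutput) {
    return SEARCH_URL +
        "?query=" + encodeURIComponent(text) +
        "&type=" + encodeURIComponent(type) +
        "&limit=" + limit +
        // Serialize the MQL query to JSON, then URL-encode it
        "&mql_output=" + encodeURIComponent(JSON.stringify(mqlOutput));
}

// The same search as the two-line URL shown above
var url = searchWithOutputURL("love", "/music/track", 3,
    [{"name": null, "/music/track/artist": null}]);
```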

4.6.3. Example: Searching for Band Names

To demonstrate the search service, let's extend the metaweb.js module of Example 4.8. Example 4.9 defines a JavaScript function named Metaweb.search() that invokes the search service. The code in this example is intended as an extension of Example 4.8. It depends on the Metaweb object, Metaweb.HOST constant and Metaweb.makeCallbackName() function defined in that previous example.

Example 4.9. metaweb.js: searching Metaweb

Metaweb.SEARCH = "/api/service/search";  // URL path to the search service

// Invoke the Metaweb search service for the specified query.
// Asynchronously pass the array of results to the specified callback function.
// 
// The first argument can be a string for simple searches or an object
// for more complex searches.  If it is a string, it should take the form
//    [type:]text[*]
// That is: the text to be searched for, optionally prefixed by a type id
// and a colon and optionally suffixed with an asterisk.  Specifying a type 
// sets the type parameter for the search, and adding an asterisk makes it a 
// prefix search.
//
// If the query argument is an object, then its properties are translated into
// search parameters.  In this case, the object must include either 
// a property named query (for an exact match) or a property named prefix
// (for a prefix match).  Other legal properties are the same as the 
// allowed parameters for the search service: type, type_strict, domain, 
// limit, start, and so on.  To specify multiple types, set the 
// type property to an array of type ids.  To specify a single type, set
// the type property to a single id.
Metaweb.search = function(query, callback) {
    var q = {};  // The query object

    if (typeof query == "string") {
        // If the query argument is a string, we must convert it to an object.
        // First, see if there is a type prefix
        var colon = query.indexOf(':');
        if (colon != -1) {
            q.type = query.substring(0, colon);
            query = query.substring(colon + 1);
        }

        // Next see if there is an asterisk suffix
        if (query.charAt(query.length-1) == '*') // prefix match
            q.prefix = query.substring(0, query.length-1);
        else
            q.query = query;
    }
    else { 
        // Otherwise, assume the query argument is an object and 
        // copy its properties into the q object.
        for(var p in query) q[p] = query[p];
    }

    // With mqlread, we would JSON-encode the query object q.  For the search
    // service, we convert the properties of q to an array of URL parameters
    var parameters = [];
    for(var name in q) {
        var value = q[name];

        if (typeof value != "object") { // A single value for the parameter
            var param = name + "=" + encodeURIComponent(value.toString());
            parameters.push(param);
        }
        else { // Otherwise, there is an array of values: multiple types
            for(var index in value) {
                var elt = value[index];
                var param = name + "=" + encodeURIComponent(elt.toString());
                parameters.push(param);
            }
        }
    }

    // Now convert the array of parameters into a URL 
    var url = Metaweb.HOST + Metaweb.SEARCH + "?" + parameters.join('&');

    // Generate a name for the function that will receive the results
    var cb = Metaweb.makeCallbackName(url);

    // Add the JSONP callback parameter to the url
    url += "&callback=Metaweb." + cb;

    // Create the script tag that will fetch that URL
    var script = document.createElement("script");

    // Define the function that handles the results from that URL
    Metaweb[cb] = function(envelope) {
        // Clean up by erasing this function and deleting the script tag
        document.body.removeChild(script);
        delete Metaweb[cb];

        // If the query was successful, pass results to the callback
        // Otherwise, throw an error message
        if (envelope.code == "/api/status/ok")
            callback(envelope.result);
        else {
            throw "Metaweb.search: " + envelope.messages[0].code +
                ": " + envelope.messages[0].message;
        }
    }

    // Now set the URL of the script tag and add that tag to the document.
    // This triggers the HTTP request and submits the search query.
    script.src = url
    document.body.appendChild(script);
};
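
The string form accepted by Metaweb.search() is easiest to understand by example. The sketch below copies the string-parsing step of Example 4.9 into a standalone function so that it can be run without a browser. The name parseSearchString is hypothetical.

```javascript
// Hypothetical helper: the [type:]text[*] parsing step of Metaweb.search(),
// extracted so that it can be tested on its own.
function parseSearchString(query) {
    var q = {};
    var colon = query.indexOf(":");
    if (colon != -1) {                            // Optional type prefix
        q.type = query.substring(0, colon);
        query = query.substring(colon + 1);
    }
    if (query.charAt(query.length - 1) == "*")    // Optional asterisk suffix
        q.prefix = query.substring(0, query.length - 1);
    else
        q.query = query;
    return q;
}

var a = parseSearchString("Smith");               // {query: "Smith"}
var b = parseSearchString("/music/artist:Pol*");  // {type: "/music/artist", prefix: "Pol"}
var c = parseSearchString("Star Trek: Voyager");  // {type: "Star Trek", query: " Voyager"}
```

The last call shows a limitation of the string form: any colon in the search text is treated as the end of a type id. When the text may contain a colon, pass an object with an explicit query or prefix property instead.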

Example 4.10 demonstrates how this Metaweb.search() function might be used in practice. It is a new version of the JavaScript-based album listing application shown in Example 4.7. The relevant new feature of this version is that if the user enters a name that is not a known band name, the application uses that name in a search query and lists the results. For brevity, the album-listing features of this new version have been simplified, and the track-listing features have been removed.

Example 4.10. albumlist2.html: Searching for bands

<html>
<head>
<script src="json2.js"></script>     <!-- Defines JSON.stringify() -->
<script src="metaweb.js"></script>   <!-- Defines Metaweb.read(), search() -->
<script>                                
/* Display albums by the specified band */
function listAlbums(band) {
    // Find the document elements we need to insert content into
    var title = document.getElementById("title");
    var albumlist = document.getElementById("albumlist");

    var query = [{             // This is our simple MQL query 
        type: "/music/artist", // Find a band
        name: band,            // With the specified name
        album: []              // And return all album names       
    }];

    // Issue the query and invoke the function below when it is done
    Metaweb.read(query, function(result) {  
        // If no result, the band was unknown, so search for matches
        if (!result || result.length == 0) searchForBands(band);
        // Otherwise, we found a band, so list its albums
        else {
            
            title.innerHTML = "Albums by " + band; // Display title      
            if (result[0].album.length == 0)       // If no albums, say so
                albumlist.innerHTML = "No albums found.";
            else                                   // Display the list
                albumlist.innerHTML = result[0].album.join("<br>");
        }
    });
}

// Find names of bands matching the user's partial input
function searchForBands(band) {
    // The Metaweb.search function will translate this object into
    // URL parameters for the search service.
    var query = { 
        prefix: band,            // Prefix search
        type: "/music/artist"    // for bands (using default type_strict=any)
    };

    Metaweb.search(query, function(results) {
        // If the search returns no results then we don't know what band
        if (results.length == 0) {
            document.getElementById("title").innerHTML = "Unknown Band"
            document.getElementById("albumlist").innerHTML = "";
        }
        // Otherwise, display a list of links to possible bands
        else {
            document.getElementById("title").innerHTML = "Do you mean..."
            var links = new Array(results.length);
            for(var i = 0; i < results.length; i++) {        // For each result
                var band = results[i].name;                  // Get band name
                links[i] = '<a href="javascript:listAlbums(\'' + // make link
                           band.replace(/'/g, "\\'") + '\')">' +
                           band + '</a>';
            }
            // Output list of links
            document.getElementById("albumlist").innerHTML=links.join("<br>");
        }
    });
}
</script>
</head>
<body>
<!-- The HTML form for entering a band name -->
<form onsubmit="listAlbums(this.band.value); return false;">
<b>Enter the name of a band: </b>
<input type="text" name="band">
<input type="submit" value="List Albums">
</form>
<hr>
<!-- This is where we insert the results of our Metaweb queries -->
<h1 id="title"></h1>        <!-- display band name here -->
<div id="albumlist"></div>  <!-- list of albums here -->
</body>
</html>

4.7. Fetching Content with trans

As explained in Chapter 2, Metaweb is really two databases in one. One database is the graph of nodes and relationships. The second is the content store that holds chunks (or "blobs") of data such as HTML documents and graphical images. We use the mqlread service to retrieve data from the graph, and we use the trans service to retrieve content from the content store.

The trans service is so named because in addition to fetching the requested data, it can also translate it for you. For example, it can "translate" an image to thumbnail size.

The trans service is HTTP based, just as mqlread is. Content is retrieved by specifying the desired translation and the content id, with a URL of this form:

https://api.freebase.com/api/trans/translation/id

Here, for example, is an actual trans URL:

https://api.freebase.com/api/trans/raw/guid/9202a8c04000641f8000000003c1978c

The translation portion of a trans URL must be one of the following:

raw

Use raw to request that no translation be done on the data: it should be returned as is. (Note, however, that HTML content is not completely raw: it is "sanitized" by stripping executable content such as JavaScript.)

image_thumb

Use image_thumb to request a thumbnail-sized version of an image. You can add request parameters maxwidth and maxheight to the URL to specify the desired pixel dimensions of the thumbnail. The aspect ratio of the original image is always preserved, and, by default, the image is not cropped, so if you specify both maxwidth and maxheight, the resulting image will typically match only one of those dimensions. The default value of both parameters is 75.

If you want to specify the exact size of both dimensions of the thumbnail, add mode=fillcrop to the URL: this will cause one dimension to be cropped as necessary, while still preserving the image's aspect ratio.

blurb

Use blurb to request an excerpt of document content. This provides a preview of the kind you might see in a list of search results. In HTML documents, only content within <p> tags is returned, and all HTML tags are normally stripped. The default blurb length is 200 bytes, but you can alter this with the maxlength request parameter. If a blurb spans multiple paragraphs, the paragraph breaks are usually removed (along with other HTML tags). Add the request parameter break_paragraphs=true to preserve HTML paragraph breaks in the blurb.
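
For image_thumb, the remark that an uncropped thumbnail typically matches only one of maxwidth and maxheight follows from the aspect-ratio arithmetic. The sketch below works through that arithmetic. It is an illustration, not the service's implementation: the function name thumbnailSize is hypothetical, and the rounding rule and the treatment of images smaller than the requested box are assumptions.

```javascript
// Sketch of aspect-ratio-preserving scaling. ASSUMPTIONS: dimensions are
// rounded to the nearest pixel, and the same formula applies to images
// of any size. The trans service may behave differently.
function thumbnailSize(width, height, maxwidth, maxheight) {
    maxwidth = maxwidth || 75;      // Documented default
    maxheight = maxheight || 75;    // Documented default
    // Use the smaller scale factor so that both limits are respected
    var scale = Math.min(maxwidth / width, maxheight / height);
    return {
        width: Math.round(width * scale),
        height: Math.round(height * scale)
    };
}

var landscape = thumbnailSize(800, 600, 200, 200);  // {width: 200, height: 150}
var portrait = thumbnailSize(600, 800);             // {width: 56, height: 75}
```

In both cases only one dimension reaches its maximum. The other is smaller because the original image is not square.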

The path component that follows the translation is a Metaweb id. The id passed to trans must identify an object of type /type/content, /common/image or /common/document. These three types are closely related:

/type/content

A /type/content object is the representation in the Metaweb graph of an entry in the Metaweb content store.

/common/image

When an image is added to the content store, the /type/content object for the image is co-typed /common/image, in order to add a size property that supplies the image dimensions. For images, therefore, the ids of the /type/content and /common/image objects are the same.

/common/document

When document content is added to the content store, a /type/content object is created to represent the entry in the content store. Typically a separate /common/document object is also created. The content property of the document object refers to the content object. Other properties of the /common/document object provide additional meta-information about the document. The reason that /common/document and /type/content are separate objects in this case (instead of just one object with two types) is for versioning: the content property of the /common/document can easily be updated to refer to a different /type/content object when the document changes.

/common/document objects can also represent Wikipedia document content (which is not stored in the Metaweb content store). Documents that represent Wikipedia entries have content properties of null (and have a key in the /wikipedia/en_id namespace that defines the Wikipedia id of the document).

The trans service works with both Wikipedia and non-Wikipedia documents. For non-Wikipedia documents, you can use either the id of the /common/document object or of the /type/content object it refers to.

The following MQL query asks Metaweb for the ids of documents and images related to our favorite band, The Police:

Query:

{
  "id": "/en/the_police",
  "type":"/common/topic",
  "article":[{"id":null}],
  "image":[{"id":null}]
}

Result:

{
  "id" : "/en/the_police",
  "type" : "/common/topic",
  "article" : [{
    "id" : "/guid/9202a8c04000641f800000000006df25"
  }],
  "image" : [{
    "id" : "/wikipedia/images/en_id/982873"
  },{
    "id" : "/wikipedia/images/commons_id/3520500"
  }]
}

Given these results, we can retrieve the document and the first image with these URLs:

https://api.freebase.com/api/trans/raw/guid/9202a8c04000641f800000000006df25
https://api.freebase.com/api/trans/raw/wikipedia/images/en_id/982873

And the following URLs (truncated on the left so they fit on a line) retrieve a short blurb and a big thumbnail:

/api/trans/blurb/guid/9202a8c04000641f800000000006df25?maxlength=20
/api/trans/image_thumb/wikipedia/images/en_id/982873?maxwidth=200&maxheight=200
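
All of these URLs follow the same pattern: a translation name, a Metaweb id, and optional request parameters. The following sketch captures that pattern in one function. The name transURL is hypothetical, and the base URL is taken from the URL form given earlier in this section.

```javascript
// Hypothetical helper: build a trans URL from a translation name ("raw",
// "image_thumb" or "blurb"), a Metaweb id, and optional request parameters.
var TRANS_URL = "https://api.freebase.com/api/trans/";

function transURL(translation, id, params) {
    var url = TRANS_URL + translation + id;    // Metaweb ids begin with "/"
    var parts = [];
    for (var name in params) {                 // No iterations if params omitted
        if (Object.prototype.hasOwnProperty.call(params, name))
            parts.push(name + "=" + encodeURIComponent(params[name]));
    }
    if (parts.length > 0) url += "?" + parts.join("&");
    return url;
}

var doc = transURL("raw", "/guid/9202a8c04000641f800000000006df25");
var blurb = transURL("blurb", "/guid/9202a8c04000641f800000000006df25",
                     {maxlength: 20});
var thumb = transURL("image_thumb", "/wikipedia/images/en_id/982873",
                     {maxwidth: 200, maxheight: 200});
```

The three variables hold the full forms of the raw document URL, the blurb URL and the thumbnail URL shown above.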

In web applications it is often easiest to use the trans service with <img> and <iframe> tags. To retrieve and display an image, simply use a trans URL as the src attribute of an <img> tag. And to retrieve and display the HTML content of a document, use a trans URL as the src attribute of an <iframe> or the href attribute of a hyperlink.

It is also possible, of course, for scripts to download the content of trans URLs themselves, and process document or image content in whatever way they want. In JavaScript code, the trans service can be used with a callback parameter just like the mqlread and search services can be. The sub-sections that follow extend our metaweb.js module to add functions for invoking the trans service. The module code is followed by two examples. One uses trans with images, iframes and hyperlinks. The other uses the callback parameter to the trans service and actually downloads document content directly.

4.7.1. JavaScript Functions for Downloading Content

Example 4.11 is JavaScript code that extends our metaweb.js module to handle the trans family of services. Three simple functions, Metaweb.contentURL(), Metaweb.blurbURL(), and Metaweb.thumbnailURL(), accept an object id and return a URL for fetching the specified content. The blurbURL function accepts optional length and paragraph breaking arguments and the thumbnailURL function accepts optional width and height arguments. Both encode these arguments into the returned URL. These URL functions will be demonstrated in Example 4.12.

The Metaweb.download() function is more interesting. It uses the callback parameter and a dynamically generated script tag to asynchronously download document content (or a document blurb) and pass it to a specified function (or insert it into a specified document element). This allows document content to be inserted directly into a document rather than isolated in a separate <iframe>. Metaweb.download() is demonstrated in Example 4.13.

The code in Example 4.11 is an extension to the metaweb.js module of Example 4.8 and Example 4.9, and is intended to be appended to those previous examples. The code shown here assumes that Metaweb, Metaweb.HOST and Metaweb.makeCallbackName() are already defined.

Example 4.11. metaweb.js: an extension for the trans service

Metaweb.RAW = "/api/trans/raw";
Metaweb.BLURB = "/api/trans/blurb";
Metaweb.THUMB = "/api/trans/image_thumb";

// Return a URL for fetching the content specified by id.
// This id must identify a /type/content object, or a /common/document or
// /common/image.  The returned URL is suitable for use as the value of
// the src attribute of <iframe> or <img> or the href attribute of <a>.
Metaweb.contentURL = function(id) {
    return Metaweb.HOST + Metaweb.RAW + id;
};

// Return the URL of an excerpt or "blurb" of the document specified by id.
// The maxlen argument specifies the length of the blurb. If maxlen is
// omitted, the default is 200. If the document is an HTML document, then only
// content within <p> tags is returned.  Normally all HTML tags are stripped
// from the returned blurb, making it plain text.  For long blurbs, this can
// cause paragraphs to run together. Pass true as the third argument to retain
// <p> tags (but strip all others) in the returned blurb.
Metaweb.blurbURL = function(id, maxlen, paragraphs) {
    var url = Metaweb.HOST + Metaweb.BLURB + id;        // Base url
    var args = [];                                      // Optional URL arguments
    if (maxlen) args.push('maxlength=' + maxlen);       // Specify blurb length
    if (paragraphs) args.push('break_paragraphs=true'); // Include <p> tags
    if (args.length > 0) url += '?' + args.join('&');   // First arg needs '?'
    return url;
};

// Return the URL of a scaled-down version of the image specified by id.
// The thumbnail always preserves the aspect ratio of the original image.
// Specify the maximum width and height of the image with the maxwidth and
// maxheight arguments.  The defaults for both are 75, meaning that the
// thumbnail will have one dimension equal to 75 and the other less than or
// equal to 75.
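Metaweb.thumbnailURL = function(id, maxwidth, maxheight) {
    var url = Metaweb.HOST + Metaweb.THUMB + id;
    var args = [];                                      // Optional URL arguments
    if (maxwidth) args.push('maxwidth=' + maxwidth);
    if (maxheight) args.push('maxheight=' + maxheight);
    if (args.length > 0) url += '?' + args.join('&');   // First arg needs '?'
    return url;
};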

// Download the /common/document or /type/content whose id is given by the
// "from" argument. If the "to" argument is a function, pass the document
// content, content type and encoding to that function.  Otherwise, if "to"
// is a DOM element or a string that identifies a DOM element, insert the
// content into that element.
// The third argument is optional. If specified, it should be the length
// of the desired excerpt to be downloaded with /api/trans/blurb.
Metaweb.download = function(from, to, maxlen) {
    // What service are we using?
    var service = maxlen ? Metaweb.BLURB : Metaweb.RAW;

    // This is the URL we must request with a script tag.
    var url = Metaweb.HOST + service + from;

    // Obtain a unique name for the function to receive the download.
    var cb = Metaweb.makeCallbackName(url);

    // Add the JSONP callback parameter to the URL
    url += "?callback=Metaweb." + cb;

    // Add the maxlength argument for blurbs.
    if (maxlen && typeof maxlen == "number") url += "&maxlength=" + maxlen;

    // Create the script tag that will do the download for us.
    var script = document.createElement("script");

    // Define the uniquely-named function that receives the response.
    Metaweb[cb] = function(envelope) {
        // Clean up this function and the script tag.
        document.body.removeChild(script);   // Remove the <script> tag.
        delete Metaweb[cb];                  // Delete this function.

        // If there was an error, throw an error message
        if (envelope.code != "/api/status/ok") {
            var err = envelope.messages[0];
            throw "Metaweb.download: " + envelope.status + ": " +
              err.code + ": " + err.message;
        }
        
        // Otherwise, get the results
        var doc = envelope.result;

        // Now handle the content we've downloaded based on the type of to.
        switch(typeof to) {
        case "function":  // Pass content to a function.
            to(doc.body, doc.media_type, doc.text_encoding);
            break;
        case "string":    // Treat string as element id.
            document.getElementById(to).innerHTML = doc.body;
            break;
        case "object":    // Assume to is a DOM element.
            to.innerHTML = doc.body;
            break;
        }
    }

    // Now set the URL of the script tag and add that tag to the document.
    // This triggers the HTTP request and invokes the trans service.
    script.src = url;
    document.body.appendChild(script);
}
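
The URL functions in this example append optional arguments by hand. A small helper can take care of the separators and of URL-encoding the values. The sketch below is one possible approach, not part of metaweb.js: addParams() is a hypothetical name, and the host and ids are placeholders.

```javascript
// Append name/value pairs to a URL, skipping values that were not supplied.
// The first parameter is introduced with "?" and later ones with "&".
// addParams() is an illustrative helper, not part of metaweb.js.
function addParams(url, params) {
    var pairs = [];
    for (var name in params) {
        if (!params.hasOwnProperty(name)) continue;
        var value = params[name];
        if (value === undefined || value === null) continue;  // Omitted
        pairs.push(encodeURIComponent(name) + "=" + encodeURIComponent(value));
    }
    if (pairs.length == 0) return url;
    var separator = (url.indexOf("?") == -1) ? "?" : "&";
    return url + separator + pairs.join("&");
}

// Example: a thumbnail URL with only a height limit, and a blurb URL
// with both a length and paragraph breaks.
var base = "http://api.freebase.com";   // Placeholder host
var thumb = addParams(base + "/api/trans/image_thumb/some/image/id",
                      { maxwidth: undefined, maxheight: 100 });
var blurb = addParams(base + "/api/trans/blurb/some/document/id",
                      { maxlength: 300, break_paragraphs: true });
```

Because the helper checks whether the URL already contains a question mark, it can also be used to add further arguments to a URL that carries a callback parameter.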

4.7.2. Example: What's New on freebase.com

Example 4.12 is a JavaScript-based example that displays recently added content from freebase.com. It issues mqlread queries to find the five images and five documents most recently added to freebase.com. (It takes advantage of the fact that the Metaweb.read() function defined in Example 4.8 can issue multiple queries at the same time.) It then dynamically generates <img> and <iframe> tags to display image thumbnails and document blurbs, and uses the Metaweb.thumbnailURL() and Metaweb.blurbURL() functions defined in Example 4.11 to create URLs for the src attributes of those tags. It also creates <a> tags, and uses Metaweb.contentURL() to hyperlink to full-sized versions of the images and documents. (These links open new windows to display the image or document.)
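
Issuing several queries in one request relies on the nested envelope format of mqlread: each query is wrapped in its own inner envelope, and the inner envelopes are collected in an outer envelope under names such as q0 and q1, which is sent as the queries parameter. The sketch below builds such a URL. buildQueriesURL() is a hypothetical helper, not the Metaweb.read() function of Example 4.8; it relies on the standard JSON.stringify() function, and the host name is a placeholder.

```javascript
// Build an mqlread URL that carries several queries at once.
// buildQueriesURL() is illustrative; it is not part of metaweb.js.
function buildQueriesURL(host, queries) {
    var outer = {};                               // The outer envelope
    for (var i = 0; i < queries.length; i++) {
        outer["q" + i] = { query: queries[i] };   // One inner envelope each
    }
    return host + "/api/service/mqlread?queries=" +
        encodeURIComponent(JSON.stringify(outer));
}

// Two queries in the style of this example: newest images, newest documents.
var newestImages = [{ type: "/common/image", id: null,
                      timestamp: null, sort: "-timestamp", limit: 5 }];
var newestDocs = [{ type: "/common/document", id: null,
                    timestamp: null, sort: "-timestamp", limit: 5 }];
var queriesURL = buildQueriesURL("http://api.freebase.com",
                                 [newestImages, newestDocs]);
```

The response mirrors this structure: its outer envelope holds one inner envelope per query name, each with its own status code and result.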

Example 4.12. WhatsNew.html: fetching new images and documents from freebase.com

<html>
<head>
<script src="json2.js"></script>    <!-- Required by metaweb.js -->
<script src="metaweb.js"></script>  <!-- Defines Metaweb.read(), etc. -->
<script>
// How many images and how many documents do we display?
var N = 5;                                           // This is the default
if (window.location.search.substring(0,3) == "?n=")  // URL argument overrides
    N = parseInt(window.location.search.substring(3), 10);

// The query to find the N newest images
var imageQuery = [{
    type:"/common/image", id:null,         // Return image ids
    timestamp:null, sort:"-timestamp",     // Most recent first
    limit:N,                               // Only N of them
    "/type/content/media_type":null,       // Check image type, too
    "/type/content/media_type|=":[         // We only want images that are:
        "/media_type/image/gif",             // GIF or
        "/media_type/image/png",             // PNG or 
        "/media_type/image/jpeg"             // JPEG
        ]
}];

// The query to find the N newest documents
var documentQuery = [{
    type:"/common/document", id:null,    // Return document ids
    timestamp:null, sort:"-timestamp",   // Most recent first
    limit:N                              // Only N of them
}];

// When the document has loaded, send the queries above to api.freebase.com.
// This will invoke the functions below when the results arrive.
window.onload = function() { 
    Metaweb.read(imageQuery, displayImages,
                 documentQuery, displayDocuments); 
};

// This function is invoked with the results of the image query.
function displayImages(images) {
    var container=document.getElementById("images"); // Get container element.
    for(var i = 0; i < images.length; i++) {         // Loop through images.
        var id = images[i].id;                       // Get the image id.
        var img = document.createElement("img");     // Create <img> tag 
        img.src = Metaweb.thumbnailURL(id);          // ...for image thumbnail
        img.title = images[i].timestamp;             // ...timestamp tooltip.
        var link = document.createElement("a");      // Create hyperlink
        link.href = Metaweb.contentURL(id);          // ...to a full-size image
        link.target = "_new";                        // ...in new window.
        link.appendChild(img);                       // Put image in link.
        container.appendChild(link);                 // Put link in container.
    }
}

// This function is invoked with the results of the document query.
function displayDocuments(docs) {
    var container = document.getElementById("docs"); // Get container element.
    for(var i = 0; i < docs.length; i++) {           // Loop through docs.
        var id = docs[i].id;                         // Get the document id.
        var blurb =document.createElement("iframe"); // Create an iframe
        blurb.src = Metaweb.blurbURL(id);            // ...to hold doc blurb.
        var link = document.createElement("a");      // Hyperlink
        link.href = Metaweb.contentURL(id);          // ...to full document
        link.target = "_new";                        // ...in a new window.
        link.innerHTML = docs[i].timestamp;          // Use timestamp as link.
        var listitem = document.createElement("li"); // Create list item.
        listitem.appendChild(blurb);                 // Put blurb in item.
        listitem.appendChild(link);                  // Put link in item.
        container.appendChild(listitem);             // Put item in container.
    }
}
</script>
</head>
<body><!--Static document body. Thumbnails and blurbs dynamically inserted-->
  <h2>The Newest Images</h2>                  <!-- Images heading -->
  <i>Click thumbnail for full-size image</i>  <!-- Instructions -->
  <div id="images"></div>                     <!-- Thumbnails will go here -->
  <h2>The Newest Documents</h2>               <!-- Documents heading -->
  <i>Click timestamp for full document</i>    <!-- Instructions -->
  <ol id="docs"></ol>                         <!-- Document blurbs go here -->
</body>
</html>

4.7.3. Example: A Metaweb Type Browser

Example 4.13 is a JavaScript-based web application that demonstrates the Metaweb.download() function defined in Example 4.11. This application is a Metaweb type browser that displays information about any Metaweb type. Figure 4.2 shows a sample page. Clicking on the id of another type (or typing a type id in the upper right) displays information about that type. You may actually find this type browser quite useful for exploring Metaweb system types and the types in other domains.

Figure 4.2. A Metaweb type browser

In addition to demonstrating document download with Metaweb.download(), this example is notable because it uses a more complicated MQL query than the other examples in this chapter and because its HTML output is more complex than previous examples. The code is well-commented, and if you've understood previous JavaScript examples, you should not have trouble following this one. One feature to note is the use of a simple helper class, named DOMStream, for dynamically generated output.

Example 4.13. TypeBrowser.html: a Metaweb type browser

<html>
<head>
<!-- These are the modules we need -->
<script language="javascript" src="json2.js"></script>
<script language="javascript" src="metaweb.js"></script>
<script language="javascript"> 

// This is the query we use to get information about a type.
// Note that we have to fill in the id of the type we're interested in.
var query = {
    id:null,            // The type we're asking about. Filled in below.
    type:"/type/type",  // The type of our type is /type/type :-)

    // What is the human-readable type name?
    name:null,          

    // Objects with documentation are co-typed /freebase/documented_object.
    "/freebase/documented_object/tip":null,       // Get short description
    "/freebase/documented_object/documentation":{ // And full documentation
         "optional":true,                         // ...if there is any
         "id":null                                // Return document id.
    },

    // Types are also co-typed /freebase/type_profile
    // This property gives us the status (published, private, etc.) of the type
    "/freebase/type_profile/published":null,   // Get publication status

    // What properties does this type have?
    properties:[{
        optional:true,                         // Okay if there are none
        name:null,                             // Property name for display
        key:[],                                // Property name for MQL
        expected_type: {name:null, id:null},   // Type of property value
        unique:null                            // Is it unique?
    }],

    // What other types have properties of this type?
    expected_by:[{      
        limit:25,                              // Don't return too many
        optional:true,                         // Okay if there are none
        name:null,                             // The property name
        key:[],                                // The property key
        schema: {name:null, id:null}           // What type defines it?
    }],

    // What are some instances of this type?
    instance:[{         
        limit:25,                              // Don't return too many
        optional:true,                         // Okay if there are none
        id:null,                               // So we can link to it
        name:null                              // Object name
    }]
};

// When we're first loaded, display /common/topic, or the type in the URL
window.onload = function() {
    var type = "/common/topic";                        // Assume /common/topic
    var search = window.location.search;              
    if (search && search.indexOf("?t=") == 0)          // If URL specifies type
       type = decodeURIComponent(search.substring(3)); // Then use that one
    queryType(type);                                   // Display type info.
}

// Query the specified type.  Call displayType() when the results arrive
function queryType(type) {
    query.id = type;                  // Specify the type in the query above
    Metaweb.read(query,               // Issue the query
                 displayType);        // Pass result object to displayType
}

// Generate a page of information based on our query results.
function displayType(result) {
    // DOMStream is a helper class defined below.
    // We use it here to output HTML text to the placeholder element
    var out = new DOMStream("placeholder");
    out.clear();                             // Erase existing content

    // If we didn't get any results then the input was invalid
    if (!result) {
        out.write("No such type");           // Output failure message
        out.flush();                         // Flush output to placeholder
        return;                              // We're done.
    }

    // Now begin generating information about the type.
    // First, display the type id as the page title.
    out.write("<h1>", result.id, "</h1>");   

    // If there is a short description, display it beneath the title.
    var tip = result["/freebase/documented_object/tip"]; 
    if (tip) out.write("<i>", tip, "</i>");

    // Display the human-readable name of the type
    out.write("<h2>Name</h2>", result.name);

    // Display publication status of the type, if available.
    var status = result["/freebase/type_profile/published"];
    if (status) out.write("<h2>Status</h2>" + status);

    // Display type documentation, if there is any
    out.write("<h2>Documentation</h2>");       // Section header
    var doc = result["/freebase/documented_object/documentation"];
    if (doc && doc.id) {
        // If there was a document describing the type, output a placeholder
        // element for its content, and issue a request for the content to
        // be inserted into that element.
        out.write("<div id='docplaceholder'><i>Loading...</i></div>");
        Metaweb.download(doc.id, "docplaceholder"); // Asynchronous!
    }
    else out.write("No documentation available.");

    // Display a table of properties
    out.write("<h2>Properties</h2>");
    if (result.properties.length == 0) out.write("No properties");
    else {
        out.write('<table border="1"><tr>', 
                  '<th>Property Key</th>',
                  '<th>Property Name</th>',
                  '<th>Property Type</th></tr>');

        for(var i = 0; i < result.properties.length; i++) {
            out.write('<tr><td>', result.properties[i].key.join(", "),
                      '</td><td>', result.properties[i].name,
                      '</td><td>');
            if (result.properties[i].unique) out.write("unique ");
            displayTypeLink(out, result.properties[i].expected_type.id,
                            result.properties[i].expected_type.name);
            out.write('</td></tr>');
        }
        out.write("</table>");
    }

    // Display the properties of other types that use this type
    out.write("<h2>Used by</h2>");
    if (result.expected_by.length == 0)
        out.write("There are no properties of this type.");
    else {
        out.write('<table border="1"><tr>', 
                  '<th>Type</th>',
                  '<th>Property Key</th>',
                  '<th>Property Name</th>',
                  '</tr>');

        for(var i = 0; i < result.expected_by.length; i++) {
            out.write('<tr><td>');
            displayTypeLink(out, result.expected_by[i].schema.id,
                            result.expected_by[i].schema.name);
            out.write('</td><td>', result.expected_by[i].key.join(", "),
                      '</td><td>', result.expected_by[i].name,
                      '</td></tr>');
        }
        out.write("</table>");
    }


    // Output a list of the names of instances of this type
    out.write("<h2>Instances</h2>");
    if (result.instance.length == 0) out.write("No instances");
    else {
        for(var i = 0; i < result.instance.length; i++) {
            var id = result.instance[i].id;
            var name = result.instance[i].name;
            if (!name) name = id;
            if (i != 0) out.write(", ");
            out.write("<a target='_new' href='http://freebase.com/view",
                      id, "'>", name, "</a>");
        }
    }

    // Calling flush makes the output visible on the page
    out.flush();
}

// Output a link to a type. Use the type id as the link text, and
// make the type name available as a tooltip
function displayTypeLink(out, id, name) {
    out.write('<a title="', name, '" onclick="queryType(\'', id, '\')">',
              id, '</a>');
}

// This little DOMStream class writes HTML into the element we specify
function DOMStream(elt) {                     // Constructor function
    if (typeof elt == "string")               // Expects a DOM element
        elt = document.getElementById(elt);   // or element id string
    this.elt = elt;                           // Remember the element
    this.buffer = [];                         // Array to buffer output
}
DOMStream.prototype.clear = function() {      // Erase element content
    this.elt.innerHTML = "";
};
DOMStream.prototype.write = function() {      // Buffer up all arguments
    this.buffer.push.apply(this.buffer, arguments); // JavaScript voodoo
};
DOMStream.prototype.flush = function() {      // Output all text to the element
    this.elt.innerHTML += this.buffer.join(""); // Concatenate and display
    this.buffer.length = 0;                     // Empty the buffer
};
</script>

<style>
/* Some CSS styles to make everything look good */
body {
  font-family: Arial, Helvetica, sans-serif; /* We like sans-serif */
  margin-left: .5in;                         /* Indent everything... */
}
h1, h2 {  margin-left: -.25in; }             /* ...except headings */
h2 { margin-bottom: 5px; margin-top:10px; }
/* Make tables look nice */
table { border-collapse: collapse; width: 95%;}
th { background-color: #aaa;}
td { background-color: #ddd; padding: 1px 5px 1px 5px; }

/* Our <a> tags don't have hrefs, so we need to style them ourselves */
a { color: #00a; }
a:hover { text-decoration:underline; cursor:pointer;}

/* Make the input field look nice */
form.inputform { 
  float:right; border: solid black 2px; background-color: #aba;
  margin: 15px 30px 0px 0px; padding: 10px;
}
</style>
</head>
<body>
<!-- A form in which the user can enter a type id -->
<form class='inputform' onsubmit="queryType(this.t.value); return false;">
Enter type id: <input name="t"></form>
<!-- Generated content goes here -->
<div id="placeholder"></div>
</body>
</html>

4.8. Metaweb Services with Python

As a final, advanced example of the use of the mqlread, search, and trans services, Example 4.15 presents a Python module for working with Metaweb services. This module defines a metaweb.Session object that represents the host name of a Metaweb server, and also encapsulates a set of options (such as the lang option to mqlread and the maxwidth option to /api/trans/image_thumb). Each Session object also maintains a "cookie jar" for storing any HTTP cookies returned by Metaweb services, and uses those cookies in any subsequent requests made. (Cookies are used for authentication and caching, and are discussed in Chapter 6.)

One feature of Example 4.15 is of particular note. The results() method of the metaweb.Session class is a generator that returns the results of a MQL query one at a time, and uses the cursor envelope parameter to submit a MQL query as many times as necessary to retrieve all available results. You might use it in code like this:

import metaweb                                 # Use the metaweb module
freebase = metaweb.Session("api.freebase.com") # Create a Session object
albums_by_bob = [{'type':'/music/album',       # This is our MQL query
                  'artist':'Bob Dylan',
                  'name':None }]
for album in freebase.results(albums_by_bob):  # Loop through query results
    print album["name"]                        # Print album names
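
The cursor protocol behind results() is independent of Python. The first request sets the cursor envelope parameter to true; each response carries a cursor value to send with the next request, and a false value means that no results remain. The sketch below shows that loop in JavaScript, the language of the earlier browser examples. To stay self-contained it uses a hypothetical fakeFetch() function that returns canned response envelopes instead of contacting a server; collectAll() is likewise an illustrative name, and the album names are made up.

```javascript
// Collect every result of a query by following the mqlread cursor.
// fetchPage(envelope) must return a response envelope; here it is faked.
function collectAll(query, fetchPage) {
    var all = [];
    var cursor = true;                   // Start with cursor set to true
    while (cursor) {
        var response = fetchPage({ query: query, cursor: cursor });
        for (var i = 0; i < response.result.length; i++)
            all.push(response.result[i]);
        cursor = response.cursor;        // False when there is nothing more
    }
    return all;
}

// A stand-in for the server: two pages of made-up album names.
var pages = {
    "true":  { result: [{ name: "Album A" }, { name: "Album B" }],
               cursor: "page2" },
    "page2": { result: [{ name: "Album C" }], cursor: false }
};
function fakeFetch(envelope) {
    return pages[String(envelope.cursor)];
}

var albums = collectAll([{ type: "/music/album", name: null }], fakeFetch);
```

In a browser, where requests made with script tags are asynchronous, the loop would instead be driven from the response callback: each callback would issue the next request until the cursor is false.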

Example 4.14 is a Python version of Example 4.10: it lists albums by a specified band or, if no such band is found, searches for bands whose names begin with the specified string. It demonstrates the metaweb.py module in more detail, showing how to use the read() method to invoke mqlread, the search() method to invoke the search service, and the blurb() method to invoke the /api/trans/blurb service.

Example 4.14. Using the metaweb.py module to read, search and download

import sys            # Command-line arguments, etc.
import metaweb        # Metaweb services

band = sys.argv[1]                       # The band the user is asking about
query = { 'type': '/music/artist',       # Our MQL query in Python.
          'name': band,                  # Place the band in the query.
          'album': [{ 'name': None,
                      'release_date': None,
                      'sort': 'release_date' }]}

freebase = metaweb.Session("api.freebase.com") # Create a session object
result = freebase.read(query)                  # Submit query, get results
if result:                                     # If we got a result
    print("Albums by %s:" % result['name'])    # print the album names
    print("\n".join([album['name'] for album in result['album']]))
else:                                          # Otherwise: no result
    matches=freebase.search(band + "*",        # Start a search
                            type="/music/artist",  # Only for bands
                            limit=5)               # We only want 5
    if (len(matches) == 0):                    # If no search results
        print "Unknown band."                  # Give up.
    else:                                      # If we got some search results
        print "Did you mean one of these?:"
        for match in matches:                  # Loop through the matches
            print                      
            print match['name']                # Print the name of the match
            article = match['article']         # Get associated article
            if article:                        # If there is an article
                text,type = freebase.blurb(article['id'],  # Download a blurb
                                           maxlength=100)  # 100 chars long
                print text                                 # And print it

With those usage examples behind us, we end this chapter with the metaweb.py code. Note that only read methods are shown here. We'll add methods for making MQL write queries and for uploading content in Chapter 6.

Example 4.15. metaweb.py: a Python module for Metaweb

#
# metaweb.py: A python module for writing Metaweb-enabled applications
#
"""
This module defines classes for working with Metaweb databases.

  metaweb.Session: represents a connection to a database
  metaweb.ServiceError: exception raised by Session methods

Typical usage:

    import metaweb
    freebase = metaweb.Session("api.freebase.com")
    q1 = [{ 'type':'/music/album', 'artist':'Bob Dylan', 'name':None }]
    q2 = [{ 'type':'/music/album', 'artist':'Bruce Springsteen', 'name':None }]
    bob,bruce = freebase.read(q1, q2)  # Submit two queries, get two results
    for album in bob: print album['name']

    # Get query results with a generator method instead
    albums = freebase.results(q2)
    albumnames = (album['name'] for album in albums)
    for name in albumnames: print name

    # Download an image of U2
    result = freebase.read({"id":"/en/u2","/common/topic/image":[{"id":None}]})
    imageid = result["/common/topic/image"][0]["id"]
    data,type = freebase.download(imageid)
    print "%s image, %d bytes long" % (type, len(data))

"""

import urllib        # URL encoding
import simplejson    # JSON serialization and parsing
import urllib2       # URL content fetching
import cookielib     # HTTP Cookie handling

# Metaweb read services
READ = '/api/service/mqlread'    # Path to mqlread service
SEARCH = '/api/service/search'   # Path to search service
DOWNLOAD = '/api/trans/raw'      # Path to download service
BLURB = '/api/trans/blurb'       # Path to document blurb service
THUMB = '/api/trans/image_thumb' # Path to image thumbnail service

# Metaweb write services
LOGIN = '/api/account/login'     # Path to login service
WRITE = '/api/service/mqlwrite'  # Path to mqlwrite service
UPLOAD = '/api/service/upload'   # Path to upload service
TOUCH = '/api/service/touch'     # Path to touch service

# Metaweb services return this code on success
OK = '/api/status/ok'            

class Session(object):
    """
    This class represents a connection to a Metaweb database.

    It defines methods for submitting read, write and search queries to the
    database and methods for uploading and downloading binary data.  
    It encapsulates the database URL (hostname and port), read and write
    options, and maintains authentication and cache-related cookies.

    The Session class defines these methods:

      read(): issue one or more MQL queries to mqlread
      results(): a generator that performs a MQL query using a cursor
      search(): invoke the search service
      download(): retrieve content with trans/raw
      contentURL(): like download(), but just return the URL
      blurb(): retrieve a document blurb with trans/blurb
      blurbURL(): like blurb(), but just return the URL
      thumbnail(): retrieve an image thumbnail with trans/image_thumb
      thumbnailURL(): like thumbnail(), but just return the URL
      login(): establish credentials (as a cookie) for writes
      write(): invoke mqlwrite
      upload(): upload content
      touch(): get a fresh mwLastWriteTime cookie to defeat caching

    Each Session instance has these read/write attributes:

      host: the hostname (and optional port) of the Metaweb server as a string.
         The default is sandbox.freebase.com.  Every Monday, the sandbox is
         erased and it is updated with a fresh copy of data from
         www.freebase.com.  This makes it an ideal place to experiment.

      cookiejar: a cookielib.CookieJar object for storing cookies.
         If none is passed when the class is created, a
         cookielib.FileCookieJar is automatically created.  Note that cookies
         are not automatically loaded into or saved from this cookie jar,
         however. Clients that want to maintain authentication or cache state
         across invocations must save and load cookies themselves.

      options: a dict mapping option names to option values. Key/value
         pairs in this dict are used as envelope or URL parameters by
         methods that need them.  The read() method looks for a lang
         option, for example, and the thumbnail() method looks for a
         maxwidth option. Options may be passed as named parameters to the
         Session() constructor or to the various Session methods.
    """

    def __init__(self, host="sandbox.freebase.com", cookiejar=None, **options):
        """Session constructor method"""
        self.host = host
        self.cookiejar = cookiejar or cookielib.FileCookieJar()
        self.options = options

    def read(self, *queries, **options):
        """
        Submit one or more MQL queries to a Metaweb database, using any
        named options to override the option defaults. If there is
        a single query, return the results of that query. Otherwise, return
        an array of query results. Raises ServiceError if there were problems
        with any of the queries.
        """

        # How many queries are we handling?
        n = len(queries)

        # Gather options that apply to these queries
        opts = self._getopts("lang", "as_of_time", "escape",
                             "uniqueness_failure", **options)

        # Create an outer envelope object
        outer = {}

        # Build the inner envelope for each query and put it in the outer.
        for i in range(0, n):
            inner = {'query': queries[i]} # Inner envelope holds a query.
            inner.update(opts)            # Add envelope options.
            outer['q%d' % i] = inner      # Put inner in outer with name q(n).

        # Convert outer envelope to a string
        json = self._dumpjson(outer)

        # Encode the query string as a URL parameter and create a url
        urlparam = urllib.urlencode({'queries': json}) 
        url = 'http://%s%s?%s' % (self.host, READ, urlparam)
                
        # Fetch the URL contents, parse to a JSON object and check for errors.
        # From here on outer and inner refer to response, not query, envelopes.
        outer = self._check(self._fetch(url))

        # Extract results from the response envelope and return in an array.
        # If any individual query returned an error, raise a ServiceError.
        results = []
        for i in range(0, n):
            inner = outer["q%d" % i]         # Get inner envelope from outer
            self._check(inner)               # Check inner for errors
            results.append(inner['result'])  # Get query result from inner

        # If there was just one query, return its results.  Otherwise
        # return the array of results
        if n == 1:
            return results[0]
        else:
            return results

    def results(self, query, **options):
        """
        A generator version of the read() method. It accepts a single
        query, and yields query results one by one. It uses the envelope
        cursor parameter to return a full set of results even when more
        than one invocation of mqlread is required.
        """

        # Gather options that apply to this query
        opts = self._getopts("lang", "as_of_time", "escape", 
                             "uniqueness_failure", **options)

        # Build the query envelope
        envelope = {'query': query}
        envelope.update(opts)

        # Start with cursor set to true
        cursor = True

        # Loop until cursor is no longer true
        while cursor:
            # Use the cursor as an envelope parameter
            envelope['cursor'] = cursor

            # JSON-encode the envelope and convert it to a URL parameter
            params=urllib.urlencode({'query': self._dumpjson(envelope)}) 
                
            # Build the URL
            url = 'http://%s%s?%s' % (self.host, READ, params)

            # Fetch and parse the URL contents, raising ServiceError on errors
            response = self._check(self._fetch(url))
                
            # Get the results array and yield one result at a time
            results = response['result']
            for r in results:
                yield r

            # Get the new value of the cursor for the next iteration
            cursor = response['cursor']
            

    def search(self, query, **options):
        """
        Invoke the search service for the specified query string.  If that
        string ends with an asterisk, perform a prefix search instead of a
        straight query.  type, domain, type_strict, and other search service
        options may be specified as Session options or may be passed as named
        parameters.
        """
        opts = self._getopts("domain", "type", # Build a dict of search options
                             "type_strict",    # from these session options
                             "limit", "start",  
                             "escape", "mql_output",
                             **options)        # plus any passed to this method

        if query.endswith('*'):            # If search string ends with *
            opts["prefix"] = query[0:-1]   # then this is a prefix search
        else:                              # Otherwise...
            opts["query"] = query          # It is a regular query

        params = urllib.urlencode(opts)                    # Encode options
        url = "http://%s%s?%s" % (self.host,SEARCH,params) # Build URL
        envelope = self._fetch(url)                        # Fetch response
        self._check(envelope)                              # Check that it's OK
        return envelope["result"]                          # Return result


    def download(self, id):
        """
        Return the content and type of the content object identified by id.

        Returns two values: the downloaded content (as a string of characters
        or bytes) and the type of that content (as a MIME-type string, from
        the Content-Type header returned by the Metaweb server). Raises
        ServiceError if the request fails; if the error response is not
        valid JSON, the exception is an InternalServiceError (a subclass
        of ServiceError).  See also the contentURL() method.
        """
        return self._trans(self.contentURL(id))


    def blurb(self, id, **options):
        """
        Return the content and type of a document blurb.  See blurbURL().
        """
        return self._trans(self.blurbURL(id, **options))

    def thumbnail(self, id, **options):
        """
        Return the content (as a binary string) and type of an image 
        thumbnail.  See thumbnailURL().
        """
        return self._trans(self.thumbnailURL(id, **options))


    def contentURL(self, id):
        """
        Return the /api/trans URL of the /type/content, /common/image, 
        or /common/document content identified by the id argument. 
        """
        return self._transURL(id, DOWNLOAD)

    def blurbURL(self, id, **options):
        """
        Return the /api/trans URL of a blurb of the document identified by id.

        The id must refer to a /type/content or /common/document object.
        Blurb length and paragraph breaks are controlled by the maxlength and
        break_paragraphs options, which can be specified on the Session object
        or passed to this method.
        """
        return self._transURL(id, BLURB, ["maxlength", "break_paragraphs"],
                              options)

    def thumbnailURL(self, id, **options):
        """
        Return the URL of a thumbnail of the image identified by id.

        The id must refer to a /type/content or /common/image object.
        Thumbnail width and height are controlled by the maxwidth and
        maxheight options, which can be specified on the Session object, or
        passed to this method.
        """
        return self._transURL(id, THUMBNAIL, ["maxwidth","maxheight"], options)


    # A utility method that returns a dict of options.  
    # It first builds a dict containing only the specified keys and their
    # values, and only if those keys exist in self.options. 
    # Then it augments this dict with the specified options.
    def _getopts(self, *keys, **local_options):
        o = {}
        for k in keys:
            if k in self.options:
                o[k] = self.options[k]
        o.update(local_options)
        return o

    # Fetch the contents of the requested HTTP URL, handling cookies from
    # the cookie jar.  Return a tuple of HTTP status, headers and
    # response body. This is the only method in this module that performs
    # HTTP or manages cookies.  This implementation uses the urllib2 library.
    # You can subclass and override this method if you want to use a
    # different implementation with different performance characteristics.
    def _http(self, url, headers={}, body=None):
        # Store the url in case we need it later for error reporting.
        # Note that this is not safe if multiple threads use the same Session.
        self.lasturl = url

        # Build the request.  Will use POST if a body is supplied
        request = urllib2.Request(url, body, headers)

        # Add any cookies in the cookiejar to this request
        self.cookiejar.add_cookie_header(request)

        try:
            stream = urllib2.urlopen(request) # Try to open the URL, get stream
            self.cookiejar.extract_cookies(stream, request) # Remember cookies
            headers = stream.info()
            body = stream.read()
            return (stream.code, headers, body)
        except urllib2.HTTPError, e:          # If we get an HTTP error code
            return (e.code, e.info(), e.read())  # But return values as above
       

    # Parse a string of JSON text and return an object, or raise
    # InternalServiceError if the text is unparseable.
    # This implementation uses the simplejson library.
    # You can override it in a subclass if you want to use something else.
    def _parsejson(self, s):
        try: 
            return simplejson.loads(s)
        except Exception:
            # If we couldn't parse the response body, then we probably have a
            # low-level HTTP error with no JSON in its response. This should
            # not happen, but if it does, we create a fake response object
            # so that we can raise a ServiceError as we do elsewhere.
            raise InternalServiceError(self.lasturl, s)

    # Encode the object o as JSON and return the encoded text.
    # If pretty is True, use line breaks and indentation to make the output
    # more human-readable.  Override this method if you want to use an
    # implementation other than simplejson.
    def _dumpjson(self, o, pretty=False):
        if pretty:
            return simplejson.dumps(o, indent=4)
        else:
            return simplejson.dumps(o)

    # An internal utility function to fetch the contents of a Metaweb service
    # URL and parse the JSON results and return the resulting object.
    #
    # Metaweb services normally return JSON response bodies even when an HTTP 
    # error occurs, and this function parses and returns those error objects. 
    # It only raises an error on very low-level HTTP errors that do
    # not include a JSON object as their body.
    def _fetch(self, url, headers={}, body=None):
        # Fetch the URL contents
        (status, headers, body) = self._http(url, headers, body)
        # Parse the response body as JSON, and return the resulting object.
        return self._parsejson(body)

    # This is a utility method used by download(), blurb() and thumbnail()
    # to fetch the content and type of a specified URL, performing 
    # cookie management and error handling the way the _fetch function does.
    # Unlike other Metaweb services, trans does not normally return a JSON 
    # object, so we cannot just use _fetch here.
    def _trans(self, url):
        (status,headers,body) = self._http(url)   # Fetch url content
        if status == 200:                         # If successful
            return body,headers['content-type']   # Return content and type
        else:                                     # HTTP status other than 200
            errobj = self._parsejson(body)        # Parse the body
            raise ServiceError(url, errobj)       # And raise ServiceError

    # An internal utility function to check the status code of a Metaweb
    # response envelope and raise a ServiceError if it is not okay.
    # Returns the response if no error.
    def _check(self, response):
        code = response['code']
        if code != OK:
            raise ServiceError(self.lasturl, response)
        else:
            return response

    # This utility method returns a URL for the trans service
    def _transURL(self, id, service, option_keys=[], options={}):
        url = "http://" + self.host + service + id    # Base URL.
        opts = self._getopts(*option_keys, **options) # Get request options.
        if len(opts) > 0:                             # If there are options...
            url += "?" + urllib.urlencode(opts)       # encode and add to url.
        return url


class ServiceError(Exception):
    """
    This exception class represents an error from a Metaweb service.

    When anything goes wrong with a Metaweb service, it returns a response
    object that includes an array of message objects.  When this occurs we
    wrap the entire response object in a ServiceError exception along
    with the URL that was requested.
    
    A ServiceError exception converts to a string that contains the
    requested URL (minus any URL parameters that contain the actual
    query details) plus the status code and message of the first (and
    usually only) message in the response. 

    The details attribute provides direct access to the complete response 
    object. The url attribute provides access to the full url.
    """

    # This constructor expects the URL requested and the parsed response.
    def __init__(self, url, details):
        self.url = url
        self.details = details

    # Convert to a string by printing url + the first error code and message
    def __str__(self):
        prefix = self.url.partition('?')[0]
        msg = self.details['messages'][0]
        return prefix + ": " + msg['code'] + ": " + msg['message']


class InternalServiceError(ServiceError):
    """
    A ServiceError with a fake response object. We raise one of these when
    we get an error so low-level that the HTTP response body is not a
    JSON object.  In this case we report the raw response body as the message.
    An exception of this type probably indicates a bug in this module.
    """
    def __init__(self, url, body):
        ServiceError.__init__(self, url, 
                              {'code':'Internal service error',
                               'messages':[{'code':'Unparseable response',
                                            'message':body}]
                               })
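
The cursor loop in the generator method near the top of this listing is easier to follow in isolation. The sketch below reproduces that loop against a stand-in for mqlread, so it runs without network access. The names fake_mqlread and paged_results, the page size, and the sample data are assumptions made for illustration and are not part of the module above. The fake encodes its cursor as a stringified offset purely for convenience; a cursor returned by the real service should be treated as an opaque value.

```python
# Illustrative sketch only: fake_mqlread and paged_results are hypothetical
# names, not part of the module listed above.

SAMPLE = ["album-1", "album-2", "album-3", "album-4", "album-5"]
PAGE_SIZE = 2   # Assumed page size for the fake service


def fake_mqlread(envelope):
    """Imitate one mqlread call: return a page of SAMPLE plus a cursor."""
    cursor = envelope['cursor']
    # A cursor of True requests the first page.  Any other value is the
    # token handed back by the previous call (here, an offset as a string).
    start = 0 if cursor is True else int(cursor)
    end = start + PAGE_SIZE
    # A cursor of False in the response means there are no more results.
    next_cursor = str(end) if end < len(SAMPLE) else False
    return {'result': SAMPLE[start:end], 'cursor': next_cursor}


def paged_results(query, fetch=fake_mqlread):
    """Yield results one at a time, following cursors from page to page."""
    envelope = {'query': query}
    cursor = True                       # Start with cursor set to true
    while cursor:                       # Stop when the service returns False
        envelope['cursor'] = cursor     # Send the cursor in the envelope
        response = fetch(envelope)
        for r in response['result']:
            yield r
        cursor = response['cursor']     # Cursor for the next iteration


albums = list(paged_results([{'type': '/music/album', 'name': None}]))
```

Because paged_results is a generator, a caller that stops iterating early never requests the remaining pages.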

[16] Find it at http://search.cpan.org or install it automatically by running # cpan -i JSON

[17] A Metaweb cursor is related to, but not the same as, a cursor used in SQL with a relational database.

[18] See the discussion of the as_of_time envelope parameter in Section 4.2.4.4 for an explanation of how past results can be retrieved. The as_of_time parameter also allows you to re-retrieve the first batch of query results even though the first batch does not have a cursor value.

[19] September, 2008

[20] September, 2008

[21] September, 2008