So now we know that our web app can easily get geographic info from the client, assuming of course that the client uses Firefox 3.5 or newer. We’ll stay mostly with the theory of geolocation for now, moving on to the second step mentioned in the first article and doing it the other way around: finding geolocation data on web pages from the client.
A number of objects on a web page can carry geotags. In practice these are the HTML itself and JPEG images.
JPEG images are the easiest to handle, because there is only one standard. JPEG geotags are written into what is called EXIF metadata. EXIF metadata is embedded in the headers of JPEG files and includes, among other things, all the shooting data, such as flash use, speed, ISO value, time and date, camera model and maker, position of the camera, color space… and geolocation data. Of course, the camera has to support all this data for it to be included (e.g. you need a camera with GPS), or you have to add the info later with other software. As an example, you can take all your photos that aren’t geotagged and use software such as iTag, GPSED, HoudahGEO or GeoPhoto to embed the geolocation info in the EXIF data, either by pointing at a map or by using a separate GPS waypoint file. Having all the info about when, where and how your photos were taken inside the JPEG file is very valuable…
Wikipedia has some info on what a dump of the raw EXIF header data looks like, but we can get the full info without messing around with the headers by using any EXIF library or tool.
BTW, EXIF headers can contain a lot of interesting info that most people don’t even know exists, and that can lead to info leaks. Some day I’ll write something about it…
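One detail worth knowing when reading what an EXIF tool hands back: the GPS position is not stored as a single decimal number. EXIF keeps latitude and longitude each as three rational values (degrees, minutes, seconds) plus a reference letter (N/S or E/W). Here is a minimal sketch of the conversion to decimal degrees, assuming the values have already been read by some EXIF library. The function name and the sample values are made up for illustration.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert an EXIF-style (degrees, minutes, seconds) triple to decimal degrees.

    `dms` holds three plain numbers or (numerator, denominator) pairs, which is
    how EXIF rationals are often exposed. `ref` is the hemisphere letter:
    N, S, E or W.
    """
    def to_fraction(value):
        # Accept either a plain number or a (numerator, denominator) pair.
        if isinstance(value, tuple):
            return Fraction(value[0], value[1])
        return Fraction(value)

    degrees, minutes, seconds = (to_fraction(v) for v in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in signed decimal notation.
    if ref.upper() in ("S", "W"):
        decimal = -decimal
    return float(decimal)


# Hypothetical sample values, not taken from a real photo.
lat = dms_to_decimal(((41, 1), (23, 1), (15, 1)), "N")
lon = dms_to_decimal(((2, 1), (10, 1), (30, 1)), "W")
print(lat, lon)
```

Using `Fraction` keeps the arithmetic exact until the final conversion to `float`, which is handy because the EXIF values are rationals to begin with.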
When it comes to HTML, geotagging doesn’t rely on a single standard. Digging around, we can see that the most used formats are:
- ICBM: used by the GeoUrl project (whose blog hasn’t been updated since 2008), it consists of an ICBM META tag containing the latitude and longitude as decimal numbers. As GeoUrl is also a database of geotagged sites, they also index GeoTag-style tags into their database.
- GeoRDF, defined by the W3C, is based on the WGS84 datum (a datum is a reference standard used for locating points on the Earth’s surface). You can check the vocabulary, which is very simple and contains lat, long and alt.
- Microformats, a much-hyped tagging system (not a standard but “open”, if you really care, but it works), used for “machine tagging” sections and objects so that individual ‘things’ can be parsed and recognized by a parser/crawler/indexer. There is a Geo microformat available, also based on WGS84, that has latitude and longitude properties and is a clone of the geo property of the vCard standard.
For example, Flickr lets you say where a photo or an album was taken and will geotag all your photos with that info. It won’t write it into the EXIF data of the image, though; instead it places a pair of tags on that photo’s Flickr page, one ICBM and one GeoTag.
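To give an idea of how little is needed to read such a pair of tags, here is a rough sketch using only Python’s standard library. It assumes the commonly seen forms, an `ICBM` META tag with comma-separated values and a GeoTag-style `geo.position` META tag with semicolon-separated values. The class name and the sample page are invented for the example, and real pages may well deviate from these forms.

```python
from html.parser import HTMLParser
import re


class GeoMetaParser(HTMLParser):
    """Collect (lat, lon) pairs from ICBM and geo.position META tags."""

    def __init__(self):
        super().__init__()
        # Maps the lowercased tag name to a (latitude, longitude) tuple.
        self.positions = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name not in ("icbm", "geo.position"):
            return
        # Assumption: ICBM separates with a comma, geo.position with a semicolon.
        parts = [part.strip() for part in re.split(r"[;,]", content)]
        if len(parts) != 2:
            return
        try:
            self.positions[name] = (float(parts[0]), float(parts[1]))
        except ValueError:
            # Ignore tags whose content is not two decimal numbers.
            pass


# Hypothetical page, written by hand for this example.
page = """
<html><head>
<meta name="ICBM" content="41.3875, -2.175">
<meta name="geo.position" content="41.3875;-2.175">
</head><body></body></html>
"""

parser = GeoMetaParser()
parser.feed(page)
print(parser.positions)
```

Keeping the two tags under separate keys makes it easy to notice when a page carries both and they disagree.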
So, there’s still more to come: in the next article I’ll talk about the data we are extracting, the problems we can run into, and what to do with it…