The GeohashPhrase™ is a new and innovative way to communicate about specific locations. It builds off proven and open location technology – the geohash – and combines it with a deep and introspective understanding of how to adapt technology to human purposes. The GeohashPhrase™ is the first truly humanized technology of its kind.
What is a GeohashPhrase™?
A GeohashPhrase™ is a native language phrase that encodes a particular geohash. A geohash is a relatively new way of expressing geographic coordinates, and directly extends the traditional latitude and longitude coordinate system. For example, the phrase “a young dog kicks the fancy hydrant” decodes to the geohash 9vg5tegd6. Each GeohashPhrase™ can be decoded to a specific geohash and represents the area encompassed by that geohash. One can also encode a geohash as a natural language GeohashPhrase™. Both encoding and decoding can be done completely offline.
How can I start?
We are still working on an initial implementation. When it is ready it will be available at http://search.qalocate.com.
What is the License?
Open source has proven its effectiveness over proprietary solutions in a great many domains. We at QA Locate are committed to making the GeohashPhrase™ open source and free for all to use. Before we can make that official, however, there are a number of internal issues we need to resolve, as well as some external ones.
First, we want to ensure that the community will be able to iterate on and improve what we’ve put together. At this point in time the GeohashPhrase™ is still very…raw. So much so that, well, we’d like to let the paint dry first, as it were.
Second, we recognize that the success of GeohashPhrase™ will be a community effort and that means having an appropriate steward to support those efforts. A Foundation seems the go-to choice, but we aren’t sure yet.
Until we are ready to make the switch, the GeohashPhrase™ and associated Search UI will be covered by our terms of service.
Lastly, please know that we deeply appreciate your support. Contributions and collaborations are welcome and will be accepted once we have an OSS process in place.
The goal of the GeohashPhrase™ is to facilitate clear and precise communication about location in everyday situations. Nothing more. Nothing less.
While mobile devices have greatly simplified and streamlined how we transmit data to one another, they haven’t eliminated the need for a purely human-to-human mechanism for communicating location. That need will always exist. We want to ensure that in every situation where precision of communication is important, from the mundane to the urgent, two people can communicate without ambiguity or confusion.
The GeohashPhrase™ has the potential to save everything from lives to pizza deliveries, and we’ll have succeeded in our mission if we accomplish either.
Pizza for the Soccer Team
Your child’s soccer team has just won its first game! You still need to clean up a bit, but to celebrate you’d like to order pizza delivered directly to the field. You call to place the order, but when asked for the address, which you don’t have, you realize that even if you did have the address it wouldn’t be sufficient. If only there were a way to be more precise…
With the GeohashPhrase™ you can communicate your precise location in seconds by voice over the phone.
Meeting a Friend at the Festival
A festival is on and you and a friend want to attend. You’ve agreed to meet “in front of the pavilion”. Once you get there you realize you forgot that the building had multiple entrances all around. Guess you’ll need to text your friend and specify which front you meant…
With the GeohashPhrase™ you can share your location in real time via SMS/voice/email, as well as annotate calendar events with a precise location.
You and your uncle are enjoying a fine Sunday out fishing the river. You’ve gone a bit outside your normal range to see if you’d have better luck. Disaster strikes and your uncle requires immediate medical attention. You call emergency services. Cell coverage has been spotty but your call goes through. All you see are trees and the river, and your phone is saying it has no SMS or data access. The operator asks for your location…
With the GeohashPhrase™ you share your location entirely verbally, and, should it be necessary, aurally distinct phrases can be generated.
Comparison to Other Geocode Systems
The GeohashPhrase™ is not so different from many other geocode systems, including:
- Open Location Code
- and many more
All of these systems, GeohashPhrase™ included, have developed a way to concisely encode a particular location in a more human friendly format. There are pros and cons to each system, but the primary difference is that other systems have focused, to the detriment of other factors, on this question: “How do we make representations of location extremely concise?“
At QA Locate we don’t believe that that was quite the correct question to ask. We instead asked:
How can people communicate precisely about location?
By emphasizing communication we arrived at a slightly different solution. One that eschewed naive conciseness for a deeper and more sympathetic integration with our human limitations. And one that, we believe, is suited to our increasingly interconnected world.
How does it work?
Our initial release of GeohashPhrase™ targets the English language, but we hope that with contributions from the global community support for other languages can be added. With that in mind, we’ll restrict our discussion in this article to English examples.
A GeohashPhrase™ is a specially constructed sentence that can be decoded to a geohash.
In each phrase there are two types of words used:
- Glue words: used to construct grammatically correct sentences. Examples include words like “the”, “a”, “and”, “but”, etc. During decoding, glue words are ignored. During encoding, different sets of glue words may be used to create unique phrases.
- Base words: each represents particular geohash character(s) at a particular index. Base words may be re-ordered, pluralized, or even change part of speech without impacting their underlying meaning.
There are 4 different phrase “types” corresponding to the number of base words the phrase contains. These are:
- 4 base-words: encoding a 9-character geohash, sparse coverage (i.e. only populated locations have 4 base-word phrases defined), “leading” base-word encodes 3 geohash characters
- 5 base-words: encoding a 9-character geohash, complete coverage, forms our base dictionary, leading base-word encodes a single geohash character
- 6 base-words: encoding an 11-character geohash, complete coverage, extends the 5 base-word dictionary with words to encode geohash characters at positions 10 and 11
- 7 base-words: encoding a 13-character geohash, complete coverage, extends the 6 base-word dictionary with words to encode geohash characters at positions 12 and 13
The different phrase types do not use separate word dictionaries; instead, the dictionaries overlap. The 5-word dictionary forms our base dictionary, with the 6-word and 7-word phrases being simple extensions.
The 4-word dictionary is more complex. Complete coverage would require our “leading” base word to encode 3 geohash characters, which would take 32,768 (32 x 32 x 32) unique words. This is not practical, so instead we have defined only a small number of these base words, mostly for coverage over populated areas. The 3 remaining base words overlap completely with their respective positions in the 5 base-word dictionary.
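A minimal sketch of how an encoder might choose between the 4 base-word and 5 base-word layouts for a 9-character geohash. The set `SPARSE_PREFIXES` and the prefix in it are hypothetical placeholders, since the actual sparse dictionary has not been published; only the grouping logic is illustrated.

```python
# Hypothetical: 3-character prefixes that have a "leading" base word defined.
SPARSE_PREFIXES = {"9vg"}


def group_for_phrase(geohash: str) -> list[str]:
    """Group a 9-character geohash for a 4 or 5 base-word phrase."""
    if len(geohash) != 9:
        raise ValueError("this sketch only handles 9-character geohashes")
    if geohash[:3] in SPARSE_PREFIXES:
        # 4 base-words: the leading word covers 3 characters.
        head, tail = geohash[:3], geohash[3:]
    else:
        # 5 base-words: the leading word covers 1 character.
        head, tail = geohash[:1], geohash[1:]
    # The remaining characters are always taken in pairs.
    return [head] + [tail[i:i + 2] for i in range(0, len(tail), 2)]


print(group_for_phrase("9vg5tegd6"))  # 4 groups
print(group_for_phrase("dr5regw3p"))  # 5 groups
```

Note that the last three groups are the same pairs in either layout, which is why the remaining base words can be shared with the 5 base-word dictionary.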
By default a GeohashPhrase™ has five base words and encodes a 9-character geohash. At that level of precision it maps to a roughly 5m x 5m area (at the equator). A six-word phrase maps to a roughly 15cm x 15cm area, and a seven-word phrase to a roughly 5mm x 5mm area. At seven words we have enough precision to give a GeohashPhrase™ to every housefly in the world!
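The cell sizes quoted above can be checked with a short calculation. This sketch relies on the standard geohash layout (5 bits per character, bits alternating between longitude and latitude, starting with longitude) and assumes roughly 111,320 meters per degree at the equator for both axes, which is an approximation.

```python
METERS_PER_DEGREE = 111_320  # approximate, at the equator


def cell_size_m(n_chars: int) -> tuple[float, float]:
    """Return the approximate (width, height) in meters of a geohash cell."""
    bits = 5 * n_chars
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = bits // 2
    width = 360 / 2 ** lon_bits * METERS_PER_DEGREE
    height = 180 / 2 ** lat_bits * METERS_PER_DEGREE
    return width, height


for n in (9, 11, 13):
    width, height = cell_size_m(n)
    print(f"{n} characters: {width:.4f} m x {height:.4f} m")
```

Under these assumptions the three cases come out at roughly 4.8 m, 15 cm, and 4.7 mm per side, in line with the figures in the text.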
We are working on a formal specification, along with working code. The text below is meant to act as a rough introduction to how GeohashPhrase™s are created and interpreted.
Let’s first dive into how a GeohashPhrase™ is encoded from a geohash.
First, the characters of the geohash are grouped pair-wise, except for the first character which is left as its own group.
For example, the 9-character/5-base-words geohash is grouped as (_)(_ _)(_ _)(_ _)(_ _), resulting in 5 groupings. The 11 and 13-character groupings simply add additional couplets and have 6 and 7 groupings, respectively.
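The grouping step can be sketched in a few lines. The function name `group_geohash` is ours, chosen for illustration, and the sparse 4 base-word layout is not covered here.

```python
def group_geohash(geohash: str) -> list[str]:
    """Split a geohash into one leading character followed by pairs."""
    if len(geohash) not in (9, 11, 13):
        raise ValueError("expected a 9, 11, or 13 character geohash")
    head, tail = geohash[0], geohash[1:]
    return [head] + [tail[i:i + 2] for i in range(0, len(tail), 2)]


print(group_geohash("9vg5tegd6"))  # ['9', 'vg', '5t', 'eg', 'd6']
```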
Second, each grouping of geohash characters is translated into a base word by treating the characters as the key to a static dictionary of words. Each grouping is mapped using a different dictionary, and each base word appears in exactly one dictionary, at one position. As a result, a word’s meaning in a GeohashPhrase™ is never ambiguous, and words may be re-ordered without any effect on the meaning of the phrase.
Third, the base-words are turned into a native language phrase.
In order to ensure a grammatically correct phrase can always be constructed, the dictionaries are arranged according to a very general sentence structure. The sentence structure of an English GeohashPhrase is:
1-Adjective 2-Noun 3-Verb 4-Adjective 5-Noun (6-Adjective) (7-Noun)
This fits the general structure of English sentences. Other languages would use different, but equally general, sentence structures.
For English, the base-words are only ever nouns, verbs, or adjectives. Other types of words (particles, adverbs, etc.) are considered glue-words. Additionally, tense and pluralization are ignored, i.e. kick and kicks and kicked are the same word.
From the base-words and available glue-words we can construct any of many equivalent sentences. To keep things simple there would be no canonical phrase — all phrases would be equally valid and equivalent to the same geohash after decoding.
As an example let’s say we have the geohash: 9vg5tegd6. After grouping and dictionary mapping we have our 5 base-words: [young, dog, kick, fancy, hydrant]. Last, a phrase is constructed: “a young dog kicks the fancy hydrant”.
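The three encoding steps for this example can be sketched as follows. The dictionaries here are toy stand-ins holding only the five entries implied by the example above, not the real dictionaries, and the fixed sentence template in `to_phrase` is an assumption made for illustration (a real encoder would pick glue words more carefully, for instance “a” versus “an”).

```python
# Toy per-position dictionaries: geohash characters -> base-word.
# Only the entries from the article's example are filled in.
POSITION_DICTS = [
    {"9": "young"},     # position 1: adjective, one geohash character
    {"vg": "dog"},      # position 2: noun
    {"5t": "kick"},     # position 3: verb
    {"eg": "fancy"},    # position 4: adjective
    {"d6": "hydrant"},  # position 5: noun
]


def to_base_words(geohash: str) -> list[str]:
    """Group a 9-character geohash and look up one base-word per group."""
    groups = [geohash[0]] + [geohash[i:i + 2] for i in range(1, len(geohash), 2)]
    if len(groups) != len(POSITION_DICTS):
        raise ValueError("this sketch only handles 9-character geohashes")
    return [words[group] for words, group in zip(POSITION_DICTS, groups)]


def to_phrase(base_words: list[str]) -> str:
    """Join base-words with glue words using one fixed, naive template."""
    adj1, noun1, verb, adj2, noun2 = base_words
    return f"a {adj1} {noun1} {verb}s the {adj2} {noun2}"


words = to_base_words("9vg5tegd6")
print(words)             # ['young', 'dog', 'kick', 'fancy', 'hydrant']
print(to_phrase(words))  # a young dog kicks the fancy hydrant
```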
Decoding a GeohashPhrase™ to a geohash merely follows the reverse steps of the encoding procedure.
First, the base-words are extracted from the phrase with the glue-words discarded.
Second, each base-word is mapped to its corresponding geohash characters.
Third, the final geohash is reconstructed from the characters found and presented to the user.
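A minimal decoding sketch for 5 base-word phrases, under a few assumptions: the glue-word set and the word table are toy data built from the article's example, word variants are listed explicitly in the table, and unrecognized words are rejected rather than silently skipped (a design choice of this sketch, not something the text specifies).

```python
GLUE_WORDS = {"a", "an", "the", "and", "but"}

# Toy table: word variant -> (position index, geohash characters).
BASE_WORDS = {
    "young": (0, "9"),
    "dog": (1, "vg"), "dogs": (1, "vg"),
    "kick": (2, "5t"), "kicks": (2, "5t"), "kicked": (2, "5t"),
    "fancy": (3, "eg"),
    "hydrant": (4, "d6"), "hydrants": (4, "d6"),
}


def decode(phrase: str) -> str:
    """Decode a 5 base-word phrase to a 9-character geohash."""
    slots: dict[int, str] = {}
    for token in phrase.lower().split():
        word = token.strip(".,!?;:\"'")
        if not word or word in GLUE_WORDS:
            continue  # glue words carry no location information
        if word not in BASE_WORDS:
            raise ValueError(f"unrecognized word: {word!r}")
        position, chars = BASE_WORDS[word]
        if position in slots:
            raise ValueError(f"two words supplied for position {position + 1}")
        slots[position] = chars
    if sorted(slots) != list(range(5)):
        raise ValueError("expected one base-word for each of the 5 positions")
    # Reassemble by dictionary position, not by order in the sentence.
    return "".join(slots[i] for i in range(5))


print(decode("a young dog kicks the fancy hydrant"))   # 9vg5tegd6
print(decode("The fancy hydrant kicked a young dog."))  # 9vg5tegd6
```

Because each word belongs to exactly one position, the second phrase decodes to the same geohash even though the words appear in a different order and tense.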
Decoding Complexities – Natural Language
Naive decoding of each GeohashPhrase™ will require base-word dictionaries containing all the most common variants of each word. This is how we intend to initially approach decoding. However, we are well aware that the combinatorial explosion due to all possible base-word-variants could generate large and unwieldy dictionaries, and a slow decoding process.
Therefore, it may prove useful to leverage NLP during the decoding process. We are investigating this as a possibility.
Decoding Complexities – Voice
The GeohashPhrase™ is focused initially on written, textual phrases. We would like to eventually support text-to-speech and speech-to-text methods.
There are a total of 8 distinct dictionaries of base-words that form the word corpus from which phrases can be generated. Ignoring the sparse dictionary for the 4 base-word phrases, we require 6,176 unique base-words. (There are 6 dictionaries, each of which needs 1024 words, and one dictionary with merely 32 words.)
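The word counts follow from the 32-character geohash alphabet and the number of geohash characters each position encodes. A quick check, using the English sentence structure given earlier (the variable names are ours):

```python
ALPHABET_SIZE = 32  # geohash uses a base-32 alphabet

# (position, part of speech, geohash characters encoded)
POSITIONS = [
    (1, "adjective", 1),
    (2, "noun", 2),
    (3, "verb", 2),
    (4, "adjective", 2),
    (5, "noun", 2),
    (6, "adjective", 2),
    (7, "noun", 2),
]


def dictionary_size(chars_encoded: int) -> int:
    """Number of words needed to cover every combination of characters."""
    return ALPHABET_SIZE ** chars_encoded


sizes = [dictionary_size(chars) for _, _, chars in POSITIONS]
total_words = sum(sizes)
full_sparse = dictionary_size(3)  # a complete 3-character leading dictionary

print(sizes)        # [32, 1024, 1024, 1024, 1024, 1024, 1024]
print(total_words)  # 6176
print(full_sparse)  # 32768
```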
The dictionaries are constrained to a particular part of speech: noun, verb, or adjective. Words are assigned by which part of speech they are most commonly used as, and their overall frequency in common usage. Ideally common words will be used to populate the dictionaries that encode leading geohash characters, with less common and obscure words populating the dictionaries used to encode trailing characters.
In English a major constraint that we have encountered is the lack of verbs. For that reason we have limited our encoding to a single dictionary of 1024 verbs. English abounds in nouns and adjectives, fortunately, so we have three dictionaries each for them.
A feature that we would like to support would be base-word synonyms. This would introduce one, or more, additional base-words that would encode to the same geohash characters. By having synonyms, we can address issues around speech impediments and even simple phoneme similarity. We want to emphasize aural clarity and synonyms are a good way to approach that goal.
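One way synonyms could be layered on top of the existing dictionaries is a separate lookup that maps each alternate word to its canonical base-word. This is only a sketch of the idea: the synonym “hound” and both tables are hypothetical, and a real implementation would also need to ensure that a synonym does not collide with a word in any other dictionary.

```python
# Toy position-2 noun dictionary: canonical base-word -> geohash characters.
NOUNS = {"dog": "vg"}

# Hypothetical synonym table: alternate word -> canonical base-word.
SYNONYMS = {"hound": "dog"}


def chars_for(word: str) -> str:
    """Resolve a word (or one of its synonyms) to its geohash characters."""
    canonical = SYNONYMS.get(word, word)
    return NOUNS[canonical]


print(chars_for("dog"))    # vg
print(chars_for("hound"))  # vg
```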
Note: our encoding scheme already considers changes in a base-word’s pluralization, tense, etc. to be the same word. However, these changes rarely produce sufficiently aurally distinct words.
Is a GeohashPhrase™ an address?
No. Addresses are much more complex than simple locations. They require context and interpretation. A coordinate provides neither.
The LNS™ is a more appropriate augmentation to, and replacement for, traditional postal addresses.
Is there an Implementation?
We are currently working on an initial implementation. It should be ready soon…
Is there a Specification?
Not yet. Once we’ve released our initial implementation as a proof-of-concept, we would hope that a formal specification could be created.
Is the GeohashPhrase™ open source?
It will be. We want to release our initial implementation first.
Is the GeohashPhrase™ free to use?
It will be. Right now we are still in prototype development and currently working on an initial implementation as a proof-of-concept.
Is there a way for me to contribute?
We would love to accept community contributions, but until we have formally open-sourced our implementation, the overhead of accepting outside contributions would be too large for our small team.