Googling Your Email
by Jon Udell
10/07/2002
Someday we'll tell our grandchildren about those moments of epiphany, back in the last century, when we first glimpsed how the Web would change our relationship to the world. For me, one of those moments came when I was looking for an ODBC driver kit that I knew was on a CD somewhere in my office. After rifling through my piles of clutter to no avail, I tried rifling through AltaVista's index. Bingo! Downloading those couple of megabytes over our 56K leased line to the Internet was, to be sure, way slower than my CD-ROM drive's transfer rate would have been, but since I couldn't lay my hands on the CD, it was a moot point. Through AltaVista I could find, and then possess, things that I already possessed but could not find.
There began an odd inversion that continues to the present day. Any data that's public, and that Google can see, is hardly worth storing and organizing. We simply search for what we need, when we need it: just-in-time information management. But since we don't admit Google to our private data stores -- Intranets [1] and mailboxes, for example -- we're still like the shoemaker's barefoot children. Most of us can find all sorts of obscure things more easily than we can find the file that Tom sent Leslie last week.
What would it be like to Google your email? Raphaël Szwarc's ZOË is a clever piece of software that explores this idea. It's written in Java (source available), so it can be debugged and run everywhere. ZOË is implemented as a collection of services. Startup is as simple as unpacking the zipped tarball and launching ZOË.jar. The services that fire up include a local Web server that handles the browser-based UI, a text indexing engine, a POP client and server, and an SMTP server.
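To make the idea of a text indexing engine concrete, here is a minimal sketch of an inverted index over mail messages. It is not ZOË's implementation (ZOË is written in Java, and its internals aren't described here); the `MailIndex` class, its method names, and the tokenizing rule are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from email import message_from_string


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MailIndex:
    """Toy inverted index (hypothetical): word -> ids of messages containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, msg_id, raw_message):
        """Index the Subject, From, and plain-text body of one message."""
        msg = message_from_string(raw_message)
        # Multipart bodies are skipped to keep the sketch short.
        body = msg.get_payload() if not msg.is_multipart() else ""
        text = " ".join([msg.get("Subject", ""), msg.get("From", ""), body])
        for word in tokenize(text):
            self.postings[word].add(msg_id)

    def search(self, query):
        """Return the ids of messages that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

Because the index is built once, as mail arrives, each search is a handful of set intersections rather than a scan of the whole mail store.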
Because ZOË has a Web-style architecture, you can use it remotely as well as locally. At the moment, for example, I'm running ZOË on a Mac OS X box in my office, but browsing into it from my wirelessly connected laptop outside. I wouldn't recommend this, however, since ZOË's Web server has no access controls in place. By contrast, Radio Userland -- also a local, Web-server-based application, which I'm currently running on a Windows XP box in my office and browsing into remotely -- does offer HTTP basic authentication, though not over SSL. In the WiFi era, you have to be aware of which local services are truly local.
ZOË doesn't aim to replace your email client, but rather to proxy your mail traffic and build useful search and navigation mechanisms. At the moment, I'm using ZOË together with Outlook (on Windows XP) and Entourage (on Mac OS X). ZOË's POP client sucks down and indexes my incoming mail in parallel with my regular clients. (I leave a cache of messages on the server so the clients don't step on one another.) Because I route my outbound mail through ZOË's SMTP server, it gets to capture and index that as well. Here's a typical search result.

ZOË helps by contextualizing the results, then extracting and listing Contributors (the message senders), Attachments, and Links (such as the URL strings found in the messages). These context items are all hyperlinks. Clicking "Doug Dineley" produces the set of messages from Doug, like so:

Following Weblog convention, the # sign preceding Doug's name is a permalink. It assigns a URL to the query "find all of Doug's messages," so you can bookmark it or save it on the desktop.
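A permalink of this kind is just a query serialized into a URL. The sketch below shows one way that could work; the base address, the `sender` parameter, and both function names are hypothetical and do not describe ZOË's actual URL scheme.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of a local search service (not ZOË's real URL).
BASE_URL = "http://localhost:8080/search"


def permalink(**criteria):
    """Encode search criteria as a stable, bookmarkable URL."""
    # Sorting the keys means the same query always yields the same URL.
    return BASE_URL + "?" + urlencode(sorted(criteria.items()))


def criteria_from(url):
    """Recover the search criteria from a permalink."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


link = permalink(sender="Doug Dineley")
# link is "http://localhost:8080/search?sender=Doug+Dineley"
```

The point of the design is that the URL names the query, not a frozen result set, so a bookmarked link keeps returning current results as new mail is indexed.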
Note also the breadcrumb trail that ZOË has built:
ZOË -> Com -> InfoWorld
These are links too, and they lead to directories that ZOË has automatically built. Here's the view after clicking the InfoWorld link:

Nice! Along with the directory of names, ZOË has organized all of the URLs that appear in my InfoWorld-related messages. This would be even more interesting if those URLs were named descriptively, but of course, that's a hard thing to do. Alternatively, ZOË could spider those URLs and produce a view offering contextual summaries of them. We don't normally think of desktop applications doing things like that, but ZOË (like Google) is really a service, working all the time, toiling in ways that computers should and people shouldn't.
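The directory views described above can be approximated in a few lines. This is a rough sketch, not ZOË's code: it assumes the breadcrumb is derived by reversing the sender's domain, uses a deliberately naive URL pattern, ignores attachments and multipart bodies, and every name in it is made up for illustration.

```python
import re
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr

# Naive pattern: good enough for a sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def breadcrumb(address):
    """Turn 'doug@infoworld.com' into ['com', 'infoworld']."""
    domain = address.rsplit("@", 1)[-1].lower()
    return list(reversed(domain.split(".")))


def build_directory(raw_messages):
    """Group contributors and links under each sender's reversed-domain path."""
    directory = defaultdict(lambda: {"contributors": set(), "links": set()})
    for raw in raw_messages:
        msg = message_from_string(raw)
        name, address = parseaddr(msg.get("From", ""))
        if not address:
            continue  # no usable sender, nothing to file this under
        entry = directory[tuple(breadcrumb(address))]
        entry["contributors"].add(name or address)
        body = msg.get_payload() if not msg.is_multipart() else ""
        entry["links"].update(URL_PATTERN.findall(body))
    return directory
```

Looking up `("com", "infoworld")` in the result yields the names and URLs that would populate an InfoWorld page like the one shown above.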
When we talk about distributed Web services, we ought not lose sight of the ones that run on our own machines, and have access to our private data. ZOË reminds us how powerful these personal services can be. It also invites us to imagine even richer uses for them.
Fast, full-text search, for example, is only part of the value that ZOË adds. Equally useful is the context it supplies. That, of course, relies on the standard metadata items available in email: Subject, Date, From. Like all mail archivers, ZOË tries to group messages into threads, and like all of them, it is limited by the unfortunate failure of mail clients to use References or In-Reply-To headers in a consistent way. Threading, therefore, depends on matching the text of Subject headers and sacrifices a lot of useful context.
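Here is a minimal sketch of that trade-off: use In-Reply-To and References when a client supplies them, and fall back to matching normalized Subject text when it doesn't. It is an illustration, not ZOË's algorithm. The function names and the prefix-stripping rule are assumptions, and messages are assumed to arrive oldest first.

```python
import re
from collections import defaultdict
from email import message_from_string

# Leading "Re:", "Fw:", or "Fwd:" prefixes, possibly repeated.
REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, lowercase, and collapse whitespace."""
    stripped = REPLY_PREFIX.sub("", subject or "")
    return " ".join(stripped.lower().split())


def group_into_threads(raw_messages):
    """Group messages (assumed oldest first) into lists of Message-IDs."""
    threads = defaultdict(list)
    key_by_id = {}  # Message-ID -> key of the thread it was filed under
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        parent = (msg.get("In-Reply-To") or "").strip()
        references = (msg.get("References") or "").split()
        # Prefer explicit header links, nearest ancestor first.
        candidates = ([parent] if parent else []) + references[::-1]
        key = next((key_by_id[c] for c in candidates if c in key_by_id), None)
        if key is None:
            # Fallback: match on Subject text, which loses context.
            key = "subject:" + normalize_subject(msg.get("Subject"))
        threads[key].append(msg_id)
        if msg_id:
            key_by_id[msg_id] = key
    return dict(threads)
```

The fallback is where context gets lost: two unrelated conversations titled "Lunch" land in the same thread, and a reply whose subject was edited is kept only if its headers point back to the original.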
For years, I've hoped email clients would begin to support custom metadata tags that would enable more robust contextualization -- even better than accurate threading would provide. My working life is organized around projects, and every project has associated with it a set of email messages. In Outlook, I use filtering and folders to organize messages by project. Unfortunately, there's no way to reuse that effort. The structure I impose on my mail store cannot be shared with other software, or with other people. Neither can the filtering rules that help me maintain that structure. This is crazy! We need to start to think of desktop applications not only as consumers of services, but also as producers of them. If Outlook's filters were Web services, for example, then ZOË -- running on the same or another machine -- could make use of them.
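As a small illustration of what a shareable metadata tag might look like, the sketch below labels messages with a custom header and filters on it. The `X-Project` header name and both helper functions are hypothetical; no mail client discussed here is known to use them.

```python
from email.message import EmailMessage

# Hypothetical custom header; any agreed-upon name would do.
PROJECT_HEADER = "X-Project"


def tag(msg, project):
    """Label a message with the project it belongs to."""
    # Deleting first keeps the header single-valued; deleting a
    # header that isn't present is not an error.
    del msg[PROJECT_HEADER]
    msg[PROJECT_HEADER] = project
    return msg


def in_project(messages, project):
    """Return the messages tagged with the given project."""
    return [m for m in messages if m.get(PROJECT_HEADER) == project]
```

Because the label travels inside the message rather than in one client's folder structure, any other program that reads the message could reuse it.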
Services could flow in the other direction, too. For example, ZOË spends a lot of time doing textual analysis of email. Most of the correlations I perform manually, using Outlook folders, could be inferred by a hypothetical version of ZOË that would group messages based on matching content in their bodies as well as in their headers, then generate titles for these groups by summarizing them. There should be no need for Outlook to duplicate these structures. ZOË could simply offer them as a metadata feed, just as it currently offers an RSS feed that summarizes the current day's messages.
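A feed like that is simple to produce. The following sketch builds an RSS 2.0 document for one day's messages; it is not ZOË's feed format. The channel title, the placeholder link, and the `daily_feed` name are assumptions, and each message is assumed to carry a well-formed Date header.

```python
import xml.etree.ElementTree as ET
from datetime import date
from email import message_from_string
from email.utils import parsedate_to_datetime


def daily_feed(raw_messages, day: date, title="Today's mail"):
    """Build an RSS 2.0 document listing the messages dated on `day`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "http://localhost/"  # placeholder
    ET.SubElement(channel, "description").text = "Messages dated " + day.isoformat()
    for raw in raw_messages:
        msg = message_from_string(raw)
        if "Date" not in msg:
            continue
        # Compare using the date as written by the sender (sender's time zone).
        if parsedate_to_datetime(msg["Date"]).date() != day:
            continue
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = msg.get("Subject", "(no subject)")
        ET.SubElement(item, "author").text = msg.get("From", "")
        ET.SubElement(item, "pubDate").text = msg["Date"]
    return ET.tostring(rss, encoding="unicode")
```

A metadata feed of inferred message groups could use the same shape, with one item per group instead of one per message.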
At InfoWorld's recent Web services conference, Google's cofounder Sergey Brin gave a keynote talk. Afterward, somebody asked him to weigh in on RDF and the semantic Web. "Look," he said, "putting angle brackets around everything is not a technology, by itself. I'd rather make progress by having computers understand what humans write, than to force humans to write in ways computers can understand." I've always thought that we need to find more and better ways to capture metadata when we communicate. But I've got to admit that the filtering and folders I use in Outlook require more effort than most people will ever be willing to invest. There may yet turn out to be ways to make writing the semantic Web easy and natural. Meanwhile, Google and, now, ZOË remind us that we can still add plenty of value to the poorly-structured stuff that we write every day. It's a brute-force strategy, to be sure, but isn't that why we have these 2GHz personal computers?
Jon Udell is an author, information architect, software developer, and new media innovator.
[1] Users of the Google Search Appliance do, of course, invite Google behind the firewall.
Showing messages 1 through 19 of 19.
-
Indexing mail with MG
2004-04-22 00:35:08 Jochen Leidner [View]
-
Microsoft has indexing built in.
2003-08-05 10:45:30 anonymous2 [View]
Just turn on Microsoft indexing:
run compmgmt.msc
Services and Applications
right-click on Indexing Service, select Start
There is a tool in the management app for doing free-text searches.
-
*Fast* Indexed Search
2003-05-16 15:21:47 anonymous2 [View]
Many other applications will let you search your email. The user experience is entirely different if the search is fast and is run against the entire body of text.
Running a search against a 500MB store of emails in Outlook can take minutes. Indexed search can take seconds or even fractions of a second.
I currently use the "Nelson Email Organizer" with Outlook. It provides indexed search, so that I can find a message within seconds. It also creates automatic folders (a kind of standing search) for each correspondent. Indexing makes these operations instantaneous or very quick, so that I no longer need to organize my emails into folders.
If the cost of a search is low to me, I will do it often, changing fundamentally how I organize and retrieve information.
Speed matters. We should be able to Google everything, including our file system. Windows 2000 and XP have a built-in indexing service, but Microsoft does little with it (it does a poor job of speeding up the "Find" function).
-
You people just don't get it...
2003-01-21 18:09:46 anonymous2 [View]
Zoe sounds like a fantastic tool.
I use Notes (which sux!) at work. Yes, I can search for my stuff, and use a web client away from home, but only because my IS infrastructure is running a Notes server.
I also have IMAP accounts, and Mulberry (which I also use) can search through those, but they are on other servers as well.
ZOE runs *on* *my* *computer*. Nobody else's server necessary. Nobody whining about my Notes database or IMAP store being too big. ZOE is *FREE*. Notes? Yeah, right. Maybe on "alt.binaries.warez"...
Zoe is a personal product, not something for the corporate infrastructure (as someone opined .. what a maroon).
I'm gettin' it...
-
Separating indexed local store from client
2002-10-15 07:50:31 anonymous2 [View]
Zoe is obviously not unique in providing search capabilities for archives of email. But one of the coolest things about it is the concept of separating the local indexed mail store from the client (and platform!) liberating me to use whatever mail client I want, while Zoe transparently manages the indexing of incoming and outgoing mail. If I want, the store is local, so I can search through my email on a plane. Or move my years of email from one machine to another (still indexed), whatever the platform, by copying one folder.
-
IMAP search
2002-10-10 21:19:29 anonymous2 [View]
IMAP has built in search capabilities. If you have a decent idea of what you're looking for (sender, key words, date range, etc) and a client that supports IMAP searches, it's relatively trivial to find the exact message you're looking for. Google-style indexing may simply be a waste of resources (imagine trying to do this on a corporate mail server with thousands of users).
-
Re: IronDust Queue
2002-10-10 05:03:43 anonymous2 [View]
> Queue is a useful technology for sharing your research
> with built-in access control. It integrates into
> Internet Explorer and has IMAP and POP support.
This sounds very interesting. Do you know how it compares to Info Select 7 <http://www.miclog.com/is/7/>, which also supports IMAP? Does Queue let you use any IMAP server to store your snippets of information, or do you need to sign up for an IronDust account, which is $5/month?
Thanks
-
error message
2002-10-09 16:53:29 anonymous2 [View]
It doesn't work for me:-(
"[auth] user command only available under a layer"
is what I get? What does that mean? Anyone know how to get "under a layer"?
Bengt
-
IronDust Queue
2002-10-09 15:12:00 anonymous2 [View]
Queue is a useful technology for sharing your research with built-in access control. It integrates into Internet Explorer and has IMAP and POP support.
-
comments miss the point
2002-10-09 11:26:25 anonymous2 [View]
The peanut gallery here seems a little uncritically critical ;)
The article didn't claim that fundamental ground had been broken in the theory of computation, nor that the indexing scheme is completely novel. The point is that Zoe is packaged very nicely to use as a "personal web service".
1) it's portable (java)
2) it's free
3) it's open source
Sure, if I'm a Fortune 500 company with a strong desire to burn money, I can buy Notes. But Zoe will run on my Linux "personal server" (on my home LAN, accessible through cable modem); it requires no setup and it costs me nothing.
Which of these other solutions satisfy those criteria?
-
Zoot
2002-10-09 08:02:20 anonymous2 [View]
Jon,
You can achieve exactly this by reading and replying to your Outlook mail through Zoot (www.zootsoftware.com). Zoot synchronizes from and to Outlook on the fly, so you can mail either from Zoot or from Outlook.
Zoot is a PIM, probably known, with many Swiss-Army-knife-like functions. In Zoot you can have autosorting, assignments to words, categories, etc.
BTW, thanks for another fine article.
- Thees Peereboom
-
AskSam
2002-10-09 05:59:39 anonymous2 [View]
I've kept most of the e-mail I've sent and received since 1984. (Yes, that's '84, and that's not a typo.) It's several hundred megabytes in size.
It's all indexed with AskSam (www.asksam.com), and I can zone in on any message by keyword, sender, etc., much more rapidly than waiting for some Java process and HTML screens to redraw.
I've written articles about it at :
http://www.jimcarroll.com/articles/ebiz41.htm
http://www.jimcarroll.com/articles/camag4.htm
It's a very powerful tool, for contact management, research, and just simply disarming people. ("No, you didn't say that in 1987, see, you're contradicting yourself!")
-
Mulberry does this too and ...
2002-10-09 03:20:27 anonymous2 [View]
... it can simultaneously search through mailboxes that are located on many different IMAP servers as well as in your local message store.
-
Usage with Outlook
2002-10-08 23:54:44 anonymous2 [View]
Jon,
Another brilliant article!
Just one question: You mention in your article that you are using ZOE with Outlook.
I understand that ZOE has a built-in POP3 server available at port 10110. However, Outlook 2000 can't take a port number with more than 4 digits (i.e. it needs a port number lower than or equal to 9999). Where/how can you force Outlook to use 127.0.0.1:10110 as the POP3 server?
Thanks!
Charles Nadeau
http://radio.weblogs.com/0111823/
-
Nothing new
2002-10-08 23:51:48 anonymous2 [View]
ZOE is interesting, I'll pay that. But there are dozens of personal free and payware search tools around that can index files, directories and data stores on your local drive.
A personal Google would be nice; however, prior experiments with this, such as Altavista Personal Search, seem to have failed miserably a few years ago as people just don't get the concept.
I use dtSearch Desktop, which integrates with everything I use: email, apps, etc. I have an index of 2.4 million words, 402MB in size! I find it invaluable and have used it for some time.
Agreed on Lotus Notes, the often overlooked fast and versatile full-text search in Notes is an easily accessible feature for Notes mail users who use it very often. Rarely do you see a Notes mail user scrolling through an archive to find an old email. They're well accustomed to a fast full-text search.
Really feel this is a rehash, but perhaps the article will remind people there are dozens of tools out there worth looking at.
-
authentication
2002-10-08 14:31:57 anonymous2 [View]
Actually, ZOE has the option to enable authentication; you just have to enable it in preferences.
Regards,
Robert
-
Yes, all versions of Lotus Notes already do this
2002-10-08 07:11:38 anonymous2 [View]
I use Lotus Notes at work and have had indexed/searchable email (via boolean commands) for 6+ years. At home I use Outlook for my personal email and while the interface is a little more friendly I hate trying to find something in it.
-
Notes client does this somewhat
2002-10-08 05:54:15 anonymous2 [View]
"For years, I've hoped email clients would begin to support custom metadata tags that would enable more robust contextualization"
Notes/Domino gives access to SMTP X-fields, so you can, for example, categorize on one, but I don't think anyone knows :)
Disclaimer: Not sure if it does today; this was in version 4 (which I think was a while ago)
-
Rate ZOE on OSDir.com
2002-10-07 16:32:43 Steve Mallett [View]
We ran a brief on ZOE not long ago. Rate its usefulness and/or stability.










retrieval package also has built-in support
for indexing local email folders (in MBOX format).
-- Jochen