Ending the Paper Shuffle: Locating Documents
So how can web technology help address the problem of paper shuffling? To recap the previous post, the three key problems are:
1. Tracking documents.
2. Versioning documents.
3. Locating documents.
Let’s tackle these problems in reverse order. The web has a simple solution for locating documents: the Uniform Resource Identifier, or URI. You might have heard it called the Uniform Resource Locator, or URL. The two concepts are closely related, but the differences aren’t important for this discussion. Unique IDs are nothing new; record numbering and library classification schemes like the Dewey Decimal System have been around for a long time. But the URI standard supported by the W3C is the first such system to be truly global.
It’s more than a classification system, however. It’s an addressing system. In addition to assigning a unique ID, a URI tells you how to access the document. It also tells you where it lives and who owns it.
For instance, the URI http://www.microsoft.com/en/us/default.aspx tells us that a unique document named default.aspx lives at the domain named www.microsoft.com. If we look up the domain name, we’ll find it’s registered to Microsoft Corporation of Redmond, Washington, United States of America. It also tells us that we can interact with the document using Hypertext Transfer Protocol, or HTTP. That’s a lot of information in just a few characters. It’s a simple yet powerful system.
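To make that breakdown concrete, here is a minimal sketch that splits the example URI into the pieces described above using Python's standard urllib.parse module. The function name describe_uri and the dictionary keys are made up for illustration. Note that ownership is not part of the URI itself; it comes from a separate lookup of the domain's registration.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a URI into the parts discussed above (illustrative helper)."""
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme,    # how to access the document
        "domain": parts.hostname,    # where the document lives
        "path": parts.path,          # which document on that host
        "document": parts.path.rsplit("/", 1)[-1],  # last path segment
    }


info = describe_uri("http://www.microsoft.com/en/us/default.aspx")
for key, value in info.items():
    print(f"{key}: {value}")
```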
If we want to take advantage of this well-understood and widely adopted international standard in the enterprise, we need a way to assign a URI to every block of information that has importance to the business. Many existing enterprise content management tools already do this; Microsoft SharePoint, EMC Documentum, and Oracle Stellent are the big three. But all three of these tools ask employees to do something very odd from a web perspective: They ask that the employee send their local files to a central server in order to get published with an addressable URI.
This is weird. We don’t email our public web pages to Google in order to have them appear in its index. Why do we send corporate documents to a centralized server? “To give it its unique number,” an organization’s records manager might reply. But this is old-school thinking; we’re not talking about a librarian stamping a number on a physical book.
It’s far easier to move a unique number from a central server to a workstation than to transfer an entire spreadsheet from a workstation to a server, only to pass it back again so that the employee can continue working on it. In other words, have the librarian send a sticker with the URI on it to the worker and let the worker slap it on the document. The label’s much easier to ship than a book.
But that approach is old-school thinking, too. We don’t need to move the book or the unique identifier: We’ll let the workstation create its own URI.
On the Internet, when you want a document from Microsoft, you type http://www.microsoft.com into your browser. Why not go to http://mary.mycompany.com when you want to get a document from Mary in Accounting? The same technology that makes the “big I” Internet work can make our “little i” intranets work, too.
All we need is a small program that assigns a URI to an employee’s documents and allows the employee’s workstation to act as its own webserver. Naturally this program needs to be secure, but we have two advantages. The first is that an organization knows its members. Nobody can log into the corporate network without a valid user account. On the public Internet, everyone is anonymous, but on your intranet, you can identify all of your coworkers. Second is that the same corporate firewall that prevents network attacks from the Internet can also prevent leaks coming from your intranet. (You do have a corporate firewall, don’t you?) With a few sensible precautions baked into this mini-webserver, you’ve neatly solved the addressability problem.
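As a rough illustration of what such a small program might look like, here is a sketch built on Python's standard http.server module. Everything in it is an assumption for the sake of example: the Publisher class, the /docs/ path prefix, the random document IDs, and port 8080 are invented here and do not describe our prototype. It also leaves out the security precautions mentioned above (authentication against the corporate directory, access control, encryption), which a real tool would need.

```python
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


class Publisher:
    """Hypothetical registry mapping document IDs to local files."""

    def __init__(self, host, port=8080):
        self.host = host      # e.g. "mary.mycompany.com" (assumed name)
        self.port = port
        self.documents = {}   # document ID -> local file path

    def publish(self, path):
        """Assign a URI to a local file without moving the file."""
        doc_id = uuid.uuid4().hex
        self.documents[doc_id] = Path(path)
        return f"http://{self.host}:{self.port}/docs/{doc_id}"

    def resolve(self, url_path):
        """Map the path part of a request back to a local file, or None."""
        prefix = "/docs/"
        if not url_path.startswith(prefix):
            return None
        return self.documents.get(url_path[len(prefix):])


def make_handler(publisher):
    """Build a request handler that serves only published documents."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = publisher.resolve(self.path)
            if target is None or not target.is_file():
                self.send_error(404, "Unknown document")
                return
            data = target.read_bytes()
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


def serve(publisher):
    """Run the workstation's mini web server until interrupted."""
    server = HTTPServer(("", publisher.port), make_handler(publisher))
    server.serve_forever()
```

On Mary's workstation, usage might look like creating Publisher("mary.mycompany.com"), calling publish("policy-memo.docx") to get back a URI, and then calling serve(publisher) so coworkers can fetch the file from where it already lives. Only files that were explicitly published are reachable; every other path returns a 404.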
The system is remarkably intuitive. If you know Bill was tasked with coming up with the latest marketing presentation, you won’t have to search a vast corporate repository for it; you’ll find it on his machine. When Steve finishes work on his policy memo, he doesn’t have to upload it to headquarters and remember where to file it. He just asks this little program to publish it. Ding! Now it’s got a unique number and it’s available for all to see.
That’s how we’d solve the problem of locating information: It’s easiest to locate information if you never need to move it in the first place.
Interesting thoughts but I have some reservations about some of your comments:
> the three key problems are: (tracking, versioning & location)
You didn’t explain how versioning would be managed.
> But all three of these tools ask employees to do something very odd from a web perspective: They ask that the employee send their local files to a central server in order to get published with an addressable URI.
Yes, users have to upload content to ECM repositories but how is this different to a user uploading/publishing content to a normal website?
> This is weird. We don’t email our public web pages to Google in order to have them appear in its index.
Users do not upload to Google but they do upload to web addressable spaces that are accessible to search engines such as Google. Many ECM products also include search engines that do much the same thing – they index content in repositories.
I quite like the idea of something like URLs/URIs to make all content addressable in a common way but I see some significant issues with your theory in the current landscape.
Desktops mostly manage transient data. Websites and ECM repositories (also FTP sites, SMB shares, etc.) are used to manage published data. In addition to addressability, such repositories may provide services such as versioning, security, resilience, scalability, workflow, metadata, data transformation, etc. Network performance is getting better but it’s still not perfect. Accessing content at intercontinental, offshore or low-bandwidth sites can still be painful, so content replication is still necessary.
Apologies if I sound too critical. I’ve not really given this much thought so I’m unable to be more positive for now. Thank you for your thoughts.
Regards
Mark
Love it!! But can you address back-up and recovery if my machine has to be re-built? Where do the important corporate documents go then?
Hey Mark,
> You didn’t explain how versioning would be managed.
You beat me to the punch! I plan to talk about how we might address the versioning and tracking aspects of the problem in upcoming posts.
> Desktops mostly manage transient data…
That’s true today. What we’re aiming for is something like a peer-to-peer approach for sharing data, rather than a hub-and-spokes model. This means that each workstation will be caching aspects of the corporate datasphere, much like every employee’s brain carries a little bit of organizational know-how. Done properly, this might make the network more resilient, not less.
> Apologies if I sound too critical.
No worries, Mark. You’ve given us some things to think about as we continue work on our prototype.
Regards,
Dean
Great idea, I was thinking about something similar a while back… URLs replacing files.
Instead of a folder with files, we are represented by a homepage with links (basically a wiki), with invite to collaborate.
Each user has a homepage for their documents, each document a URL.
My Documents 2.0
http://libraryclips.blogsome.com/2006/07/28/my-documents-20/
> What we’re aiming for is something like a peer-to-peer approach for sharing data, rather than a hub-and-spokes model.
I see where you’re going. It sounds similar to MS Groove. It will be interesting to see how you get around the problem of caching/replication. I was involved in a Groove pilot last year but soon gave up on it due to:
1. synchronization on joining new Groove workspaces – it appeared to download 100Mbs which soon slowed my PC down to a crawl (was accessing over global WAN). I’ve no idea if this problem still exists.
2. the features didn’t seem to be that much better than using SharePoint (which I was using anyway).
Good luck with your prototype.
Disclaimer: SharePoint and Groove were not my choice; unfortunately, they were corporate standards.