The file is the basic unit of computer information. From a collaboration perspective, there are plenty of file-centric tools designed for sharing files — network drives, SharePoint, box.net — even the Internet itself isn’t much more than a collection of files, shared through various protocols.
For a knowledge worker, the files they work with daily usually map directly to a particular work task — for example, minutes of a meeting, or a document detailing a deliverable. They can also contain knowledge sourced from other people or places, like news reports or policy documents.
Regardless of what’s in them, files are also the primary unit of plagiarism. I don’t mean plagiarism in a bad way; if these files can be said to “belong” to anyone, they belong to the enterprise as a whole. It’s common for a valuable file within the organization to be repurposed multiple times. Making key documents available to other people in your organization can be a huge productivity enabler.
By default, these files nearly always end up on your local machine. People mail them to you, or leave them lying around on network drives for you to copy. You download them from the web and read them locally. Despite all our efforts to centralize file storage, my computer is full of all kinds of files collected through my work. And as much as I can appreciate the benefits of cloud-based storage, I don’t think that this is going to change anytime soon.
And although files are traditionally thought of as unstructured data, it turns out there’s a lot of indexable and valuable information that can be collected from them. It just takes a bit more work for us software developers than accessing data that’s already been put into a database or some other structured format.
Infovark and Files
Infovark will process any file you give it, but it gets the best results with files that contain meaningful text. By default, Infovark scans all Microsoft Office files, PDFs, and plain text files, but you can include other file formats if you like. Infovark will do the best job it can.
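For the developers reading along, here is a rough sketch of what "scan these file types by default, plus any extras you add" could look like. It is not Infovark's actual code: the extension list, the `DEFAULT_EXTENSIONS` name, and the `files_to_scan` function are all assumptions made up for illustration.

```python
from pathlib import Path

# Assumed defaults: common Microsoft Office, PDF, and plain text extensions.
# The real product's list is not documented here.
DEFAULT_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf", ".txt",
}


def files_to_scan(folder, extra_extensions=()):
    """Return the files under `folder` whose extension is on the scan list.

    `extra_extensions` lets the caller opt in to other formats, e.g. [".md"].
    """
    wanted = DEFAULT_EXTENSIONS | {ext.lower() for ext in extra_extensions}
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

Calling `files_to_scan("reports", extra_extensions=[".md"])` would pick up Markdown notes alongside the default formats, where `reports` is just a placeholder folder name.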
To have your Infovark process a file, you need to tell it where to look. You can do this from the Infovark Manager, on the Files tab:
You just need to specify the folder that the document is in; that’s it. From then on, your Infovark will keep an eye on the directory, and when you save or update a document, it will capture the file and make it available on your local Infovark website. When it’s done, the web page it produces looks like this:
More than Tags
If you click on the picture above (opens in a new window), you can see that your Infovark has provided a text summary of the document, and also tagged the document with what it thinks are relevant keywords.
Infovark’s tags are a bit special, though. You can search for documents by tag, just like a Delicious or Technorati search. But Infovark also uses these tags to develop concepts it associates with your information and contacts. These concepts are used to suggest people you know or other useful email and documents.
You can see this in action with the related content panel, down the right-hand side of the screen. So tagging is not just useful for you, but also improves the recommendations that Infovark makes.
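To make the idea of tags driving recommendations a little more concrete, here is a deliberately simple sketch. It picks tags by word frequency and ranks other documents by how much their tags overlap (Jaccard similarity). Infovark's real concept engine is not described in this post, so treat both techniques, the stop word list, and the names `suggest_tags` and `related_documents` as stand-ins invented for the example.

```python
import re
from collections import Counter

# A tiny, assumed stop word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "that", "this", "with", "from", "were", "have"}


def suggest_tags(text, max_tags=5):
    """Suggest tags by picking the most frequent meaningful words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word for word in words if len(word) > 3 and word not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(max_tags)]


def related_documents(tags, library, limit=3):
    """Rank documents in `library` by tag overlap with `tags`.

    `library` maps a document name to its list of tags. Documents that
    share no tags at all are left out of the result.
    """
    target = set(tags)
    scored = []
    for name, other_tags in library.items():
        other = set(other_tags)
        union = target | other
        score = len(target & other) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, name))
    # Highest overlap first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

The point of the sketch is the feedback loop: every tag you add to a page changes the overlap scores, which is why tagging improves the suggestions as well as the search results.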
You can edit the document summary, and provide more information about the document on this screen. You can add headers, pictures, links to other web pages within your Infovark site as well as on external sites using a friendly WYSIWYG editor. There’s no need to learn confusing wiki markup.
Visitors to this page can leave comments, download the file for themselves, or follow links to other relevant content. They can also rate the file or add tags to the page.
As you can see, files are a hugely important part of how Infovark determines what you know. In our next post, we’ll look at email, and how it’s captured and shared.