Enterprise Content Analysis and Metadata Tagging

Meet the next-generation enterprise metadata tagging tool, with built-in OCR and machine learning

Search And Protect All Enterprise Content


Centralize The MLS Data Transfer Process


Scan Network Locations For Spilled Data

Integrate With Content Management Systems

The Problem


How can a company protect its sensitive information while also making it discoverable?


It’s important to protect a company’s most sensitive secrets. Every organization has information that it considers proprietary or confidential. Many organizations also manage sensitive customer data, such as credit card numbers, social security numbers, account numbers, and health information. Despite the best efforts of technology and policy, sensitive data is occasionally exposed, and the ramifications are sometimes severe. Competitors may gain access to a company’s most valuable secrets. Customers may lose confidence in a firm’s ability to protect their data. And the damage done to a company’s reputation may be expensive and permanent. Sensitive documents can be vulnerable even in highly classified environments.


It’s equally important that organizations be able to access valuable information in a timely fashion. This is an exceptionally tricky problem, as most files are not properly tagged with useful metadata. Most organizations rely on full-text search, which frequently returns an overwhelming number of results. Also, many files (such as pictures, videos, and scanned PDF documents) are not text-searchable, and pattern-based information (such as an account number) cannot be found with a single keyword. And without proper metadata tagging, sophisticated technologies like digital rights management (DRM), attribute-based access control (ABAC), and content targeting become ineffective. As a result, potentially valuable data is lost down deep content gravity wells.

The Complexities

There are a number of moving parts that make the problem of controlling data dissemination especially difficult, including:

Complexity. Complex corporate IT environments are especially vulnerable to misconfiguration.
Decentralization. Documents are stored in different locations — including local hard drives, file shares, web portals, and email.
Classification. Most organizations have not implemented a formal data classification system, which makes it difficult to manage documents appropriately.
Collaboration. Companies can only function when information is shared, but data in motion is always at risk.
Human Error. Despite the best policies and procedures, people make mistakes.
Auditing. It is difficult to track where and when documents are transferred, and especially hard to detect anomalous behavior.
Document Types. Data is stored in many different file formats, most of which require special software to open.
Keyword Tagging. Most data is not tagged correctly with relevant keywords, complicating both enterprise search and the protection of data in motion.
Data Patterns. Some data, especially customer data (like SSNs or credit card numbers), cannot be identified by a single keyword.
Searchability. Many file formats, such as picture files and scanned documents, are not text-searchable.

SIFT™: Helping Keep Sensitive Data Secure

The Solution

SIFT™, from Aerstone Labs, is automated metadata tagging software designed to identify keywords in files of any kind, based on a centrally maintained list. SIFT™ also ships with an advanced machine learning algorithm that supports identifying specific shapes in pictures and video files. SIFT™ is designed to protect an organization from accidentally exposing sensitive data, while making all information properly discoverable. SIFT™ can be used as a stand-alone portal or integrated with existing content management systems. Once configured to search for the kind of data an organization considers sensitive, based on keywords or regular expression (RegEx) patterns, SIFT™ processes files and tags them with useful and specific metadata. SIFT™ natively supports both searchable documents (e.g., MS Office) and non-searchable assets (e.g., pictures, video, and scanned PDFs).

Key Product Features

Supported File Types

  • Support for a wide range of common file types, including most Microsoft Office and Adobe documents.
  • A built-in OCR engine, to support scanning for text in pictures and scanned PDF documents.
  • A modular design, which easily allows extending support to additional asset types.

Systems Integration

  • Several ways to scan documents, including browser-based manual and bulk scanning.
  • A RESTful API for inline deployment with document management systems and high assurance guards.

Keyword Identification

  • A customizable set of scanning rules, which supports scanning documents against a centralized list of keywords.
  • Full support for both static and pattern-based keywords, like SSNs or credit card numbers, based on industry-standard regular expressions.
  • Tagging of file metadata with discovered keywords, to support enterprise search and security solutions like digital rights management, data loss prevention, or attribute-based access control.
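
As a rough illustration of how static and pattern-based rules might combine into metadata tags, the Python sketch below scans already-extracted text against a small ruleset. It is not SIFT™ code: the function names (scan_text, luhn_valid), the sample keywords, and the tag format are hypothetical, and the SSN and card-number patterns are deliberately simplified. A Luhn checksum is added as one common way to cut false positives on card-like digit runs.

```python
import re

# Hypothetical centralized ruleset (illustrative values only).
STATIC_KEYWORDS = ["Project Falcon", "Confidential"]
PATTERN_RULES = {
    # Simplified US SSN shape: 3 digits, 2 digits, 4 digits.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by spaces or hyphens.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> list[str]:
    """Return a sorted list of metadata tags discovered in `text`."""
    tags = set()
    lowered = text.lower()
    for keyword in STATIC_KEYWORDS:
        if keyword.lower() in lowered:
            tags.add("keyword:" + keyword.lower())
    for name, pattern in PATTERN_RULES.items():
        for match in pattern.finditer(text):
            # Card-like matches must also pass the checksum.
            if name == "credit_card" and not luhn_valid(match.group()):
                continue
            tags.add("pattern:" + name)
    return sorted(tags)


if __name__ == "__main__":
    sample = "Confidential: card 4111 1111 1111 1111, SSN 123-45-6789."
    print(scan_text(sample))
```

Run against the sample string, the sketch prints ['keyword:confidential', 'pattern:credit_card', 'pattern:ssn']. The resulting tags could then be written to a file's metadata by whatever mechanism the target system provides.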

Auditing and Reporting

  • Scan network locations for spilled data against enclave keyword rulesets, with recursive file ingest.
  • Highly customizable auditing and historical reporting, with drill-down capability.
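
To make the idea of recursive file ingest concrete, here is a minimal Python sketch of walking a directory tree and reporting which files contain keywords from a ruleset. It only illustrates the general technique and is not how SIFT™ works internally: the find_spills function is hypothetical, and it reads files as plain text only, whereas a real scanner would need format parsers and OCR.

```python
from pathlib import Path


def find_spills(root, keywords):
    """Walk `root` recursively; map each matching file path to the keywords found."""
    wanted = [k.lower() for k in keywords]
    hits = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            # Plain-text read only; undecodable bytes are ignored in this sketch.
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # unreadable file: skipped here, a real tool would log it
        found = sorted(k for k in wanted if k in text)
        if found:
            hits[str(path)] = found
    return hits
```

Calling find_spills on a mounted share path with an enclave's keyword list would return a dictionary suitable for feeding into a report.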

Free Whitepaper

Check out our free whitepaper comparing SIFT™ OCR capabilities against Google Tesseract!


See SIFT In Action!

SIFT Overview

SIFT Integration with Adobe AEM
