
Enterprise Content Analysis and Metadata Tagging

Meet the next-generation enterprise metadata tagging tool, with built-in OCR and machine learning

Search And Protect All Enterprise Content


Centralize The MLS Data Transfer Process


Scan Network Locations For Spilled Data

Integrate With Content Management Systems



It’s important to protect a company’s most sensitive secrets. Every organization has information that it considers proprietary or confidential. Many organizations also manage sensitive customer data, such as credit card numbers, social security numbers, account numbers, and health information. Despite the best efforts of technology and policy, sensitive data is occasionally exposed, and the ramifications are sometimes severe. Competitors may gain access to a company’s most valuable secrets. Customers may lose confidence in a firm’s ability to protect their data. The damage done to a company’s reputation may be expensive and permanent. Sensitive documents can be vulnerable even in highly classified environments.


It’s equally important that organizations be able to access valuable information in a timely fashion. This is an exceptionally tricky problem, as most files are not properly tagged with useful metadata. Most organizations rely on full-text search, which frequently returns an overwhelming number of results. Many files (such as pictures, videos, and scanned PDF documents) are not text-searchable at all, and pattern-based information such as account numbers cannot be found with a single keyword query. Without proper metadata tagging, technologies like digital rights management (DRM), attribute-based access control (ABAC), and content targeting have nothing to act on and become ineffective. As a result, potentially valuable data ends up buried where no one can find it.

The Complexities

There are a number of moving parts that make the problem of controlling data dissemination especially difficult, including:

Complexity. Complex corporate IT environments are especially vulnerable to misconfiguration.
Decentralization. Documents are stored in different locations — including local hard drives, file shares, web portals, and email.
Classification. Most organizations have not implemented a formal data classification system, which makes it difficult to manage documents appropriately.
Collaboration. Companies can only function when information is shared, but data in motion is always a risk.
Human Error. Despite the best policies and procedures, people make mistakes.
Auditing. It is difficult to track where and when documents are transferred, and especially hard to detect anomalous behavior.
Document Types. Data is stored in many different file formats, most of which require special software to open.
Keyword Tagging. Most data is not tagged correctly with relevant keywords, complicating both enterprise search and the protection of data in motion.
Data Patterns. Some data, especially customer data such as SSNs or credit card numbers, cannot be identified by a single keyword.
Searchability. Many file formats, such as picture files and scanned documents, are not text-searchable.

SIFT™: Helping Keep Sensitive Data Secure

The Solution

SIFT™, from Aerstone Labs, is automated metadata tagging software designed to identify keywords in files of any kind, based on a centrally maintained list. SIFT™ also ships with a machine learning algorithm that identifies specific shapes in picture and video files. SIFT™ is designed to protect an organization from accidentally exposing sensitive data, while making all information properly discoverable. SIFT™ can be used as a stand-alone portal or integrated with existing content management systems. Once configured with the keywords or regular expression (RegEx) patterns that describe the data an organization considers sensitive, SIFT™ processes files and tags them with useful and specific metadata. SIFT™ natively supports both searchable documents (e.g., MS Office) and non-searchable assets (e.g., pictures, video, and scanned PDFs).

Key Product Features


Supported Content

  • Out-of-the-box support for a wide range of file types, including most Microsoft Office and Adobe documents.
  • Patented OCR pre-processing capabilities support scanning of pictures and scanned PDF documents.
  • Machine learning supports identification of shapes in image and video files.

Systems Integration

  • Scan documents via file upload or recursive network scanning.
  • RESTful API for inline deployment with content analysis solutions and document management systems.

Keyword Analysis

  • Evaluate scanned assets (and stamp file metadata!) against a centralized list of organizational keywords.
  • Full support for both static and pattern-based keywords, like SSNs or credit card numbers, based on industry-standard regular expressions.
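To make the difference between static and pattern-based keywords concrete, here is a minimal Python sketch. It is not SIFT™ code: the keyword list, the two regular expressions, the tag names, and the `tag_text` helper are all illustrative assumptions. It pairs the credit card pattern with a Luhn checksum, a common way to cut false positives on long digit runs.

```python
import re

# Hypothetical ruleset for illustration only; not SIFT's actual keyword list.
STATIC_KEYWORDS = {"proprietary", "confidential"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(candidate):
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tag_text(text):
    """Return the set of metadata tags that apply to a block of text."""
    tags = set()
    lowered = text.lower()
    for keyword in STATIC_KEYWORDS:
        if re.search(rf"\b{re.escape(keyword)}\b", lowered):
            tags.add(f"keyword:{keyword}")
    if SSN_PATTERN.search(text):
        tags.add("pattern:ssn")
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            tags.add("pattern:credit_card")
    return tags
```

Calling `tag_text` on extracted document text returns tags such as `keyword:confidential` or `pattern:ssn`, which a tagging tool could then write into file metadata.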

Reporting Insight

  • Scan network locations for spilled data against enclave keyword rulesets, with recursive file ingest.
  • Highly customizable auditing and historical reporting, with drill-down capability.
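As a rough picture of what a recursive spill scan involves, the Python sketch below walks a directory tree and reports which files match a ruleset. It is an assumption-laden illustration, not SIFT™ itself: the `RULESET` entries (including the made-up codename "bluebird") and the `scan_tree` function are hypothetical, and the sketch reads files as plain text only, with no OCR or Office format parsing.

```python
import re
from pathlib import Path

# Hypothetical enclave ruleset: tag name -> compiled pattern.
RULESET = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "project_codename": re.compile(r"\bbluebird\b", re.IGNORECASE),
}


def scan_tree(root, ruleset=RULESET):
    """Walk root recursively; return {file path: sorted list of matching tags}."""
    findings = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Plain-text read only; undecodable bytes are dropped.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        tags = sorted(name for name, rx in ruleset.items() if rx.search(text))
        if tags:
            findings[str(path)] = tags
    return findings
```

The returned dictionary lists only files with at least one match, so it can feed directly into an audit report.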

Free Whitepaper

Aerstone’s patented OCR pre-processing technology provides substantially better results than native OCR. Check out our free whitepaper comparing SIFT™ OCR capabilities against Google Tesseract!

U.S. Patent 9,830,508
Systems and Methods for Extracting Text from a Digital Image

See SIFT In Action!

SIFT Overview


SIFT Integration with Adobe AEM


Schedule A Demo

Solution Brochure

Detailed Overview