Enterprise Content Analysis and Metadata Tagging
Meet the next-generation enterprise metadata tagging tool, with built-in OCR and machine learning
Search And Protect All Enterprise Content
Centralize The MLS Data Transfer Process
Scan Network Locations For Spilled Data
Integrate With Content Management Systems
It’s important to protect a company’s most sensitive secrets. Every organization has information that it considers proprietary or confidential. Many organizations also manage sensitive customer data, such as credit card numbers, Social Security numbers, account numbers, and health information. Despite the best efforts of technology and policy, sensitive data is occasionally exposed, and the ramifications are sometimes severe. Competitors may gain access to a company’s most valuable secrets. Customers may lose confidence in a firm’s ability to protect their data. And the damage done to a company’s reputation may be expensive and permanent. Sensitive documents can be vulnerable even in highly classified environments.
It’s equally important that organizations be able to access valuable information in a timely fashion. This is an exceptionally tricky problem, as most files are not properly tagged with useful metadata. Most organizations rely on full-text search, which frequently returns an overwhelming number of results. Many files (such as pictures, videos, and scanned PDF documents) are not text-searchable at all, and keyword search cannot locate pattern-based information such as Social Security numbers. Without proper metadata tagging, sophisticated technologies like digital rights management (DRM), attribute-based access control (ABAC), and content targeting cannot work effectively. These limitations result in potentially valuable data being lost deep in content gravity wells.
There are a number of moving parts that make the problem of controlling data dissemination especially difficult, including:
SIFT™: Helping Keep Sensitive Data Secure
SIFT™, from Aerstone Labs, is automated metadata tagging software, designed to identify keywords in files of any kind, based on a centrally maintained list. SIFT™ also ships with an advanced machine learning algorithm that supports identifying specific shapes in pictures and video files. SIFT™ is designed to protect an organization from accidentally exposing sensitive data, while making all information properly discoverable. SIFT™ can be used as a stand-alone portal, or integrated seamlessly with existing content management systems. Once configured to search for the kind of data an organization considers sensitive, based on keywords or regular expression (RegEx) patterns, SIFT™ processes and tags files with useful and specific metadata. SIFT™ natively supports both searchable documents (e.g., MS Office) and non-searchable assets (e.g., pictures, video, and scanned PDFs).
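To make the idea of keyword and RegEx-based tagging concrete, here is a minimal Python sketch. It is not SIFT™ code and does not reflect its configuration format; the ruleset, the tag names, and the `tag_text` helper are all hypothetical, and the patterns are deliberately simple.

```python
import re

# Hypothetical ruleset: these keywords, tag names, and patterns are
# illustrative assumptions, not SIFT's actual configuration.
STATIC_KEYWORDS = ["proprietary", "confidential"]
PATTERN_KEYWORDS = {
    # SSN-like pattern: three digits, two digits, four digits.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 16-digit card-like pattern, optionally grouped by spaces or hyphens.
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def tag_text(text: str) -> dict:
    """Return a mapping of tag name to match count for the given text."""
    tags = {}
    lowered = text.lower()
    # Static keywords: case-insensitive whole-word matches.
    for keyword in STATIC_KEYWORDS:
        count = len(re.findall(r"\b" + re.escape(keyword) + r"\b", lowered))
        if count:
            tags[keyword] = count
    # Pattern-based keywords: count every regular-expression match.
    for name, pattern in PATTERN_KEYWORDS.items():
        count = len(pattern.findall(text))
        if count:
            tags[name] = count
    return tags


if __name__ == "__main__":
    sample = "Confidential: SSN 123-45-6789, card 4111 1111 1111 1111"
    print(tag_text(sample))
```

In a sketch like this, the resulting tag dictionary is what would be written back to the file as metadata; keeping the ruleset in one place mirrors the idea of a centrally maintained keyword list.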
Key Product Features
- Out-of-the-box support for a wide range of file types, including most Microsoft Office and Adobe documents.
- Patented OCR pre-processing capabilities support scanning of images and scanned PDF documents.
- Machine learning supports identification of shapes in image and video files.
- Scan documents via file upload, or recursive network scanning.
- RESTful API for inline deployment with content analysis solutions and document management systems.
- Evaluate scanned assets (and stamp file metadata!) against a centralized list of organizational keywords.
- Full support for both static and pattern-based keywords, like SSNs or credit card numbers, based on industry-standard regular expressions.
- Scan network locations for spilled data against enclave keyword rulesets, with recursive file ingest.
- Highly customizable auditing and historical reporting, with drill-down capability.
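The recursive scanning and pattern-based matching described in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SIFT™’s implementation: the `scan_tree` and `luhn_valid` names are hypothetical, files are read as plain text only (no OCR or Office parsing), and the Luhn checksum is an added filter, chosen here to reduce false positives on card-like digit runs.

```python
import re
from pathlib import Path

# Card-like pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_tree(root: str) -> dict:
    """Recursively scan `root` and map file paths to Luhn-valid card-like hits."""
    findings = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Assumption: treat every file as text and ignore undecodable bytes.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # Skip unreadable files rather than abort the scan.
        hits = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
        if hits:
            findings[str(path)] = hits
    return findings
```

Calling `scan_tree` on a mounted share path returns only the files that contain at least one checksum-valid match, which is the kind of result an audit report could then drill into.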
Aerstone’s patented OCR pre-processing technology provides substantially better results than native OCR. Check out our free whitepaper comparing SIFT™ OCR capabilities against Google Tesseract!
U.S. Patent 9,830,508
Systems and Methods for Extracting Text from a Digital Image
See SIFT In Action!