
The Computational Data Analytics Group has worked for over 12 years on text analytics systems that quickly discover meaningful information in raw data. These capabilities focus on six key areas, emphasizing high performance over very large sets of raw documents.
Collecting and Extracting: Collecting millions of documents from databases, the Internet, social media, and hard drives; extracting text from hundreds of file formats; and translating this information into multiple languages.
Storing and Indexing: Storing and indexing millions of documents in search servers, distributed file systems (MapReduce), relational databases, and file systems.
Recommending: Filtering the full content of millions of documents to recommend the most valuable and relevant information based on a user’s own information, selections, or interactions with information.
Categorizing: Grouping items based on the full content of documents using supervised and semi-supervised machine learning methods and targeted search lists.
Clustering: Creating a hierarchical grouping of documents based on similarity, using unsupervised learning methods on the full content of each document.
Visualizing: Showing hierarchies, groups, and relationships among documents to help the user quickly understand their value and see new connections.
This work has resulted in eight issued patents (7,072,883; 7,315,858; 7,693,903; 7,805,446; 7,937,389; 8,473,314; 8,825,710; 9,256,649) and one pending patent, several commercial licenses (including Pro2Serve and TextOre), a spin-off company (Global Security Information Analysts LLC (GSIA)), an R&D 100 Award, and scores of peer-reviewed research publications.
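To make the Clustering capability more concrete, here is a minimal sketch of building a hierarchy of documents from their full text with an unsupervised method. It is an illustration only, not the group's patented or licensed approach: the choice of TF-IDF weighting, cosine similarity, and average-linkage agglomerative clustering is an assumption, and the names tfidf_vectors, cosine, cluster_hierarchy, and example_docs are hypothetical. It also compares every pair of documents, so it would not scale to millions of documents as written.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a sparse TF-IDF vector (dict of term -> weight)."""
    tokenized = [re.findall(r"[a-z0-9]+", d.lower()) for d in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms in every document.
        vectors.append({t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
                        for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_hierarchy(docs):
    """Average-linkage agglomerative clustering over full document text.

    Returns a nested tuple tree whose leaves are document indices.
    """
    if not docs:
        raise ValueError("need at least one document")
    vectors = tfidf_vectors(docs)
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    # Each cluster is a pair: (tree built so far, list of member indices).
    clusters = [(i, [i]) for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        return sum(sim[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find and merge the two most similar clusters.
        _, x, y = max(
            ((linkage(clusters[x], clusters[y]), x, y)
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda t: t[0],
        )
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up sample documents, for illustration only.
example_docs = [
    "patent claims for text clustering software",
    "software patent on clustering text documents",
    "recipe for baking sourdough bread",
    "baking bread with a sourdough starter",
]
print(cluster_hierarchy(example_docs))  # prints ((0, 1), (2, 3))
```

The nested tuple is the hierarchy: the two patent-related documents are merged first, then the two baking documents, and the two groups are joined at the root. A tree of this shape is also the kind of structure a visualization layer could render for browsing.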