YottaSearch: SaaS Based Open Source Search Platform

This is the first in a series of Fall 2015 High-Level Product Reviews that I am publishing on the family of Yotta Data Management (YDM) solutions from Yotta Data Technologies (YDT). The first product that we will review in the YDM family is YottaSearch™.

YottaSearch™ is a Software-as-a-Service (SaaS) based big data search platform developed by Yotta Data Technologies (YDT) that can support a wide range of requirements, from small businesses to large enterprises. As part of the Yotta Data Management (YDM) family of solutions, YottaSearch™ utilizes the same powerful, fast and flexible open source technologies that power next-generation features and functionality of the world’s largest web sites and data platforms.

By leveraging open source technology and leading Cloud Service Providers (CSPs), Yotta Data Technologies offers YottaSearch™ at very disruptive price points to corporations, firms and agencies of any size, from a small local firm to a multinational corporation with offices around the globe.

Dynamic Dashboard(s)

YottaSearch™ features YDM Dash, a very easy way to build dynamic data dashboards with interactive search visualizations and drag-and-drop widgets. Users are able to graphically search their data, discern patterns and identify documents using nothing more than a web browser.

Data Widgets

YottaSearch™ supports the following Data Widgets:

  • Heatmap
  • Tree
  • Market Map
  • Text
  • Timeline
  • Pie
  • Line
  • Bar
  • Map
  • Filters
  • Grid
  • HTML

File Crawling

YottaSearch™ enables users to crawl the following platforms:

  • Local File System
  • Hadoop Distributed File System (HDFS)
  • Network File Shares
  • Web Sites, Wikis, Blogs
  • Email Accounts (IMAP, POP3)
  • Search Platforms (Solr, Open Search Server, ElasticSearch)
  • SaaS Platforms (Salesforce, SugarCRM, Alfresco, Google Drive, Dropbox)
  • Enterprise Systems (EMC Documentum, Microsoft SharePoint, Autonomy Meridio, IBM FileNet, OpenText LiveLink)
  • Java Database Connectivity (JDBC)
  • SQL Databases (PostgreSQL, MySQL, Oracle DB, SQL Server)
  • NoSQL Databases (CouchDB, MongoDB, HBase, Cassandra)

File Indexing

YottaSearch™ supports the following File Indexing features:

  • Near Real-Time Indexing – documents are available for search almost immediately after being indexed.
  • Deduplication – efficient, hash-based detection and blocking of exact and near-duplicate documents (MD5, Lookup3, TextProfile).
  • Language Detection – ability to detect more than 50 languages before indexing, including advanced CJK support.
  • Natural Language Processing – define custom pipelines of Analysis Engines & Machine Learning which incrementally add metadata to documents via annotations, including:
    • Parts-of-Speech Tagging (nouns, verbs)
    • Chunking (noun phrases, verb phrases)
    • Named Entity Recognition (names, dates, places, currency)
    • Tokenization (identifies sentences or words)
    • Document Categorizer (classify text into pre-defined categories)
  • Big Data Performance – distributed indexing, high availability, automatic replication, atomic updates, real-time get and no single point of failure.
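
To make the hash-based deduplication idea above more concrete, here is a minimal sketch of exact-duplicate blocking using MD5. It is an illustration only and not YottaSearch™'s actual implementation: the function names, the whitespace and case normalization step, and the sample documents are all assumptions made for the example. Near-duplicate detection (the TextProfile style of matching) uses fuzzier signatures and is not covered here.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Return an MD5 hex digest of the text after lowercasing and
    collapsing whitespace (the normalization is an assumption)."""
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first document seen for each fingerprint and block
    later copies. `documents` is an iterable of (doc_id, text) pairs."""
    seen = set()
    unique_ids = []
    for doc_id, text in documents:
        fingerprint = content_fingerprint(text)
        if fingerprint in seen:
            continue  # duplicate content: block it from the index
        seen.add(fingerprint)
        unique_ids.append(doc_id)
    return unique_ids


# Hypothetical sample documents; b.txt differs from a.txt only in
# letter case and spacing, so it is treated as a duplicate.
docs = [
    ("a.txt", "Quarterly report for Q3"),
    ("b.txt", "quarterly   report for q3"),
    ("c.txt", "Meeting notes"),
]
print(deduplicate(docs))  # ['a.txt', 'c.txt']
```

Storing only the fingerprint, rather than the full text, keeps the memory cost of the duplicate check small even for very large document sets.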

Advanced Search Capabilities

YottaSearch™ supports the following Advanced Search Capabilities:

  • Full-Text Search – powerful matching capabilities including phrases, wildcards, joins, grouping and much more.
  • Multiple Query Types – Dozens of query types (boolean, phrase, term, numeric, fielded and more) provide incredible power when searching.
  • Snippets – search results featuring highlighted text (document excerpts).
  • Faceted Search and Filtering – arrangement of search results into categories based on indexed terms.
  • More Like This – submit new queries that focus on particular terms from facets or clusters.
  • Auto Suggest – provides automatic, read ahead suggestions for query terms.
  • Phonetic Matching – a “soundalike” tool that helps with personal name search by applying phonetic rules.
  • Geospatial Search – location-based search over spatial data.
  • Relevance – balances precision and recall to meet the needs of a particular user community.
  • Hit Highlighting – also called “keyword in context”, this feature returns snippets of documents with matching query terms highlighted.
  • Search Result Clustering – automatically organizes search results into thematic folders, without external knowledge bases or training sets.
  • Query via HTTP GET and receive JSON, XML, CSV or binary results.
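
For readers new to faceted search, the following sketch shows the basic idea behind the Faceted Search and Filtering feature listed above: count how many results fall under each value of an indexed field, then narrow the results when a user picks one value. This is a plain in-memory illustration, not the YottaSearch™ API; the field names (`file_type`, `author`) and the sample results are hypothetical.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of `field`."""
    return Counter(doc[field] for doc in results if field in doc)


def filter_by_facet(results, field, value):
    """Keep only the results whose `field` equals the chosen value."""
    return [doc for doc in results if doc.get(field) == value]


# Hypothetical search results, each with a few indexed fields.
results = [
    {"id": 1, "file_type": "pdf", "author": "Lee"},
    {"id": 2, "file_type": "docx", "author": "Kim"},
    {"id": 3, "file_type": "pdf", "author": "Kim"},
]

counts = facet_counts(results, "file_type")
print(counts.most_common())  # [('pdf', 2), ('docx', 1)]

pdf_only = filter_by_facet(results, "file_type", "pdf")
print([doc["id"] for doc in pdf_only])  # [1, 3]
```

A production search platform computes these counts inside the index rather than over a fetched result list, which is what makes faceting practical on large data sets.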

File Types

YottaSearch™ supports the following File Types:

  • HyperText Markup Language – HTML, XHTML
  • Extensible Markup Language – XML, XHTML, OOXML, ODF
  • Microsoft Office Document Formats – PST, MSG, Excel, PowerPoint, Word, Visio, Publisher
  • OpenDocument Format – ODF
  • iWork Document Formats – Numbers, Pages, Keynote
  • Portable Document Format – PDF
  • Electronic Publication Format – EPUB
  • Rich Text Format – RTF
  • Compressed Packages – TAR, CPIO, ZIP, 7Zip
  • Text Formats
  • Feed Formats – RSS, Atom
  • Help Formats – CHM
  • Audio Formats – MP3, MP4, Vorbis, Speex, Opus, FLAC
  • Image Formats – JPEG, TIFF, PSD, PNG, GIF, BMP, BPG
  • Video Formats – MP4, QuickTime, Ogg
  • Java Files
  • Source Code
  • Mail Formats – MBOX, EML, Outlook PST
  • CAD Format – DWG
  • Font Formats
  • Scientific Formats – HDF, NetCDF, Matlab, GDAL
  • Executable Programs
  • Crypto Formats – PKCS7
  • JSON Format

Advisory and Managed Services

For those users who don’t want to “get their hands dirty” or just need some help and training, YDT provides both Advisory Services and a full Managed Services offering.


If you are looking for a web-based search tool for personal or enterprise use, YottaSearch™ has an impressive set of powerful features that are extremely easy to use yet produce professional results. We recommend that YottaSearch™ be on your short list.

For more information about Yotta Data Management: http://yottadata.management/

About Charles Skamser
Charles Skamser is an internationally recognized technology sales, marketing and product management leader with over 25 years of experience in Information Governance, eDiscovery, Machine Learning, Computer Assisted Analytics, Cloud Computing, Big Data Analytics, IT Automation and ITOA. Charles is the founder and Senior Analyst for eDiscovery Solutions Group, a global provider of information management consulting, market intelligence and advisory services specializing in information governance, eDiscovery, Big Data analytics and cloud computing solutions. Previously, Charles served in various executive roles with disruptive technology startups and well-known industry technology providers. Charles is a prolific author and a regular speaker on the technology that the Global 2000 require to manage the accelerating increase in Electronically Stored Information (ESI). Charles holds a BA in Political Science and Economics from Macalester College.