Big Data and Text Analytics Solutions

Complex Event Processing (CEP) and Prospective Search Engine

We have developed a novel, state-of-the-art engine that changes the way you think about search. Instead of searching already indexed data, this product indexes your search queries. As data is fed into our real-time stream processing pipelines, we evaluate it against the indexed queries and find every query that matches. A piece of information passed to the input pipelines can be evaluated against billions of queries in a matter of milliseconds. That makes our product exceptionally good for real-time stock trading, logging and monitoring, reverse search, sales analytics, fraud prevention, and many other fields. We offer unparalleled performance, scalability and features compared to products like Elasticsearch's percolator.

Our Prospective Search and Event Processing Engine is implemented in both Java and .NET. Its features include:


  • generic format for the input data: key-value pairs
  • multiple operator support: AND, OR, NEAR, NOT
  • multiple condition support: contains, does not contain, equals, does not equal, less than, greater than, etc.
  • grouping of queries: (X and Y) or Z
  • custom query language and grammar, which generates specialized parsers for the query language
  • out of the box scalability
  • you can store and load your queries from multiple persistent data stores - MySQL, PostgreSQL, Oracle, Microsoft SQL Server, MongoDB, etc.
  • state-of-the-art exact phrase search
  • geolocation search and reverse geocoding
  • prefix and stemming search support
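
The idea behind the features above (store the queries, then match each incoming key-value record against them) can be sketched roughly as follows. This is only an illustration, not the engine's implementation: the names `Condition`, `Query`, `QueryIndex` and the operator table are hypothetical, only AND and NOT groups are modeled, and the linear scan stands in for whatever indexing the real engine uses to reach billions of queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical operator table; the real engine supports more conditions.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "contains": lambda value, arg: arg.lower() in value.lower(),
    "does_not_contain": lambda value, arg: arg.lower() not in value.lower(),
    "equals": lambda value, arg: value == arg,
}


@dataclass
class Condition:
    key: str  # key of the input key-value pair to inspect
    op: str   # name of an entry in OPERATORS
    arg: str  # operand to compare against

    def matches(self, doc: Dict[str, str]) -> bool:
        value = doc.get(self.key)
        return value is not None and OPERATORS[self.op](value, self.arg)


@dataclass
class Query:
    query_id: str
    all_of: Sequence[Condition]        # every condition must match (AND)
    none_of: Sequence[Condition] = ()  # no condition may match (NOT)

    def matches(self, doc: Dict[str, str]) -> bool:
        return (all(c.matches(doc) for c in self.all_of)
                and not any(c.matches(doc) for c in self.none_of))


class QueryIndex:
    """Stores queries and reports which of them match an incoming record."""

    def __init__(self) -> None:
        self._queries: List[Query] = []

    def register(self, query: Query) -> None:
        self._queries.append(query)

    def match(self, doc: Dict[str, str]) -> List[str]:
        # Naive linear scan, for illustration only.
        return [q.query_id for q in self._queries if q.matches(doc)]


index = QueryIndex()
index.register(Query("tesla", [Condition("title", "contains", "tesla")]))
index.register(Query("tesla-no-rumors",
                     [Condition("title", "contains", "tesla")],
                     [Condition("title", "contains", "rumor")]))
matched = index.match({"title": "Tesla rumor roundup"})  # ["tesla"]
```

The queries are registered once and every new record is pushed through `match`, which is the reverse of a conventional search where documents are stored and queries arrive later.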


Beam is an iOS app we've developed to showcase the power of our technology stack.

Beam changes the way you consume news: instead of searching for past events and stories, you subscribe to Beams (search topics) and receive real-time push notifications whenever something new appears about your topic.

Beam works on all iOS devices, and you can receive native push notifications on your Apple Watch, too. You can configure the number of notifications you receive, both in total and for each individual Beam.


Semantic Text Processing Library

A library that performs the following text processing tasks: tokenization, stemming, entity extraction, part-of-speech (POS) tagging, and phrase generation. Our phrase generation algorithm uses linguistic models to generate the most semantically meaningful and correct phrases based on everyday usage and part-of-speech rules. The entity recognition feature supports the following entity types: people ("Barack Obama"), locations ("San Francisco", "State of NY"), organizations ("Google", "Coca-Cola"), dates/times, currencies, and events.

Data Feed Normalization Engine

Feed your data from multiple sources in different formats into an API that normalizes the data and passes it to high-performance streaming queues for further processing. Since the engine is highly text-aware, it also automatically extracts artifacts from HTML content: links, article / content text, images, etc. We have adapters built for the most widely used content providers, such as Facebook and Twitter. We abstract away the complexity and internals of the input data feeds (poll vs. push, for example), and provide failover, resilience, scalability and multi-threading in the pipeline. Our architecture allows new input providers to be easily developed and plugged into the feed normalization engine.
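
A pluggable adapter layer of this kind could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `NormalizedItem` fields, the `register_adapter` decorator, and the two sample payload shapes (`rss`, `json_api`) are hypothetical and do not reflect the actual formats of any provider or the engine's real API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class NormalizedItem:
    """Common record that downstream queues would consume (hypothetical)."""
    source: str
    title: str
    body: str
    url: str


Adapter = Callable[[Dict[str, Any]], NormalizedItem]
_ADAPTERS: Dict[str, Adapter] = {}


def register_adapter(source: str) -> Callable[[Adapter], Adapter]:
    """Plug a new input provider into the registry under a source name."""
    def decorator(fn: Adapter) -> Adapter:
        _ADAPTERS[source] = fn
        return fn
    return decorator


@register_adapter("rss")
def from_rss(raw: Dict[str, Any]) -> NormalizedItem:
    # Assumed payload shape: {"title", "description", "link"}
    return NormalizedItem("rss", raw["title"].strip(),
                          raw.get("description", "").strip(), raw["link"])


@register_adapter("json_api")
def from_json_api(raw: Dict[str, Any]) -> NormalizedItem:
    # Assumed payload shape: {"headline", "text", "permalink"}
    return NormalizedItem("json_api", raw["headline"].strip(),
                          raw.get("text", "").strip(), raw["permalink"])


def normalize(source: str, raw: Dict[str, Any]) -> NormalizedItem:
    """Convert a provider-specific payload into the common format."""
    try:
        adapter = _ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for source {source!r}") from None
    return adapter(raw)
```

Adding a provider then means writing one decorated function; the rest of the pipeline only ever sees `NormalizedItem` records.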

Word & Phrase Generalization and Similarity Matching API

This is a highly scalable service that expands a textual term into related topics and entities in order to generalize pattern matching. Put simply, it finds "similar" word vectors for a given word or phrase. For example, if you specify "black hole" as the input, the result will include things like "Stephen Hawking", "relativity theory", "wormhole", "gravity", etc. You can use it for smart auto-suggestions (similar to Google's), for suggesting similar entities or articles, and for semantic matching based on topics.

We have implemented the word2vec algorithm from scratch in C, using a very efficient approach that relies on a spatial database and random projections, and we have exposed it as a service that uses Thrift as the data exchange protocol. We have trained the service on huge datasets: Wikipedia's database, the Google News archive, books, etc. We can also train it on your own dataset and customize it for your needs. What sets our word2vec implementation apart from existing solutions on the market is its search speed: O(log n) in the worst case, and O(1) once the cache is built.
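
The text names random projections without giving details, so the following is only a generic sketch of one well-known variant, random-hyperplane hashing, written in Python rather than C. It is not the implementation described above: `RandomProjectionIndex`, its parameters, and the single-bucket lookup are assumptions, and the sketch makes no claim about the stated complexity figures.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class RandomProjectionIndex:
    """Buckets vectors by the side of each random hyperplane they fall on."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is represented by a random Gaussian normal vector.
        self._planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(num_planes)]
        self._buckets = defaultdict(list)

    def signature(self, vector):
        # One bit per hyperplane: is the projection non-negative?
        return tuple(sum(p * v for p, v in zip(plane, vector)) >= 0.0
                     for plane in self._planes)

    def add(self, label, vector):
        self._buckets[self.signature(vector)].append((label, vector))

    def query(self, vector, top_k=3):
        # Only vectors sharing the query's bucket are scored exactly.
        candidates = self._buckets.get(self.signature(vector), [])
        ranked = sorted(candidates,
                        key=lambda item: cosine_similarity(vector, item[1]),
                        reverse=True)
        return [label for label, _ in ranked[:top_k]]
```

Vectors pointing in similar directions tend to share a signature, so a lookup touches one bucket instead of the whole vocabulary. A production system would typically probe several buckets or use several hash tables to avoid missing near neighbors.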

Mobile UI Framework

Our Mobile UI Framework is used in the Beam app and provides a highly visual data feed for consuming real-time content. You can visualize diverse company data in a common, unified way, and you can visualize news and other news-related data from existing public sources in a common format, without ads.

This is a set of native iOS components with features such as rich text formatting, high-fidelity scrolling, many tweaks to image loading and caching, advanced filters and aggregation support, and the ability to persist your favorite picks. To achieve smooth scrolling regardless of scroll speed, we use various optimizations such as lazy loading of list items (virtualization), asynchronous loading of images while scrolling, and image caching with cache invalidation.

Some of the features of our Mobile UI Framework include:

  • Persist user settings
  • UI for configuring push notifications
  • "Near me" geolocation settings
  • CRUD UI for searches
  • Pull to Refresh
  • Facebook integration - generate searches based on Facebook likes, using our Generalization/Similarity Matching API
  • Trending items feed
  • Favorites feed
  • Different units of measure - metric, imperial

Asset Recognition Library

With this component, you can extract text, images, and other assets (URLs, hashtags, etc.) from HTML code. Imagine an article on NYTimes: it's easy for a human to see what the actual article text is, but very hard for an algorithm to know that, considering how much other text there is in the source. The component also analyzes images and suggests the key (best) image to show in a summary. Images need to meet certain criteria before they are selected; for instance, if the image is to be shown on a mobile phone, its width/height ratio needs to meet specific requirements. We combine multiple existing algorithms to achieve the best result. Since multiple sources can feed the same content, there are features in place that take care of duplicate removal. Duplicate text is not only detected with an "equals" condition; we also use a similarity ranking for titles. If there are two titles, one equal to "The new Tesla Model X is here" and the other equal to "Tesla has released the new Model X", the similarity matching algorithm will treat them as the same text.
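
The similarity measure used for titles is not specified here, so the sketch below substitutes a simple, well-known one: Jaccard similarity over lowercase word sets. The function names and the 0.5 threshold are assumptions chosen for illustration, not the component's actual algorithm or tuning.

```python
import re


def title_tokens(title):
    """Lowercase alphanumeric words of a title, as a set."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard_similarity(a, b):
    """Shared words divided by total distinct words across both titles."""
    tokens_a, tokens_b = title_tokens(a), title_tokens(b)
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_duplicate_title(a, b, threshold=0.5):
    # The threshold is an assumed value, not a documented setting.
    return jaccard_similarity(a, b) >= threshold


score = jaccard_similarity("The new Tesla Model X is here",
                           "Tesla has released the new Model X")
# The titles share 5 of 9 distinct words, so score is 5/9 (about 0.56).
```

With the assumed threshold, the two Tesla titles from the example are treated as duplicates even though they are not equal strings.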

Data Analytics Engine

The engine evaluates real-time data feeds for patterns and summarizes trends and top items. You can use it in any reporting, statistics or trends-related app. We store everything in an RDBMS, and multi-threaded workers perform scheduled analysis of high-volume data on an hourly and daily basis.

This component performs frequency analysis of single words and phrases, taking their part of speech into account. The same is done for entities. The data is then ranked hourly, daily, and in total (since the initial run of the software). Since we use SQL, you can run complex reporting and analytics tasks on the data out of the box. For example: "Give me the person that's trending in the news today", "Which company has the news mentioned most in the past 2 weeks", "What are the top 5 phrases in the current article".
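
A question such as "Give me the person that's trending in the news today" maps naturally onto a grouped SQL query. The sketch below uses an in-memory SQLite database as a stand-in for the RDBMS; the `entity_mentions` table, its columns, and the sample rows are hypothetical, since the real schema is not described here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few made-up mention counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_mentions ("
        "entity TEXT, entity_type TEXT, mention_day TEXT, mention_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entity_mentions VALUES (?, ?, ?, ?)",
        [
            ("Person A", "person", "2024-01-01", 3),
            ("Person B", "person", "2024-01-02", 5),
            ("Person A", "person", "2024-01-02", 4),
            ("Company C", "organization", "2024-01-02", 9),
        ],
    )
    return conn


def top_entities(conn, entity_type, since_day, limit=1):
    """Most-mentioned entities of one type on or after since_day (ISO date)."""
    return conn.execute(
        """
        SELECT entity, SUM(mention_count) AS total
        FROM entity_mentions
        WHERE entity_type = ? AND mention_day >= ?
        GROUP BY entity
        ORDER BY total DESC, entity ASC
        LIMIT ?
        """,
        (entity_type, since_day, limit),
    ).fetchall()


conn = build_demo_db()
trending_today = top_entities(conn, "person", "2024-01-02")
# With the sample rows above: [("Person B", 5)]
```

Widening `since_day` to two weeks back and switching `entity_type` to organizations would answer the second example question with the same query.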

Workflow Trigger Engine

This is a server component for applying workflow business rules such as PTO approval, reminders, and actions triggered by workflow outcomes. It can be hooked into other Stratoray components such as our Complex Event Processing engine. You can apply rules like "Notify me when the title of an incoming document contains word X, and my assignment is due today", or "Notify me two days before my assignment is due".

Notification Engine

A multi-threaded, scalable server component for evaluating and delivering notifications to discrete endpoints and devices. It abstracts the underlying technology (APNS, Android, SMS, etc.).

Real-time Adaptive Result Ranking Filters

A key challenge in sending real-time notifications is to avoid flooding the end user and to deliver only the results that are most relevant and important to that user. We have implemented a novel approach for selecting the top matches to send, based on statistics, moving averages and adaptive ranking.

For instance, if breaking news today would result in multiple alerts being sent to users because everyone is talking about it, our component kicks in and lets through only the most relevant matches. Once the breaking news event calms down, the component starts letting more of the lower-quality matches through. Since notifications should not flood the user, much of the analysis needs to happen in real time, before the decision to send an alert is made. Sending similar matches that are essentially about the same thing is undesirable. Also, some users may limit the number of notifications they receive in a given day, so care must be taken to send only the top-quality notifications and discard the rest.

This component implements many heuristics and algorithms to achieve this. For example, if there are a lot of notifications and matches for a particular topic, the logic switches to sending a notification only if there is a match in the title. It can also switch to sending a notification only if the match is in the first 1/X of the text. Moving averages are calculated for all of these metrics, so that if a particular topic is very hot at a specific time, the bar for notification quality is very high; later, when the topic is no longer as hot, the bar gradually lowers, meaning that it becomes acceptable to send a notification even if the match isn't in the title.
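
The behavior described above can be approximated with an exponential moving average of match volume per topic. This is a simplified sketch of one possible reading, not the component's actual logic: `TopicHeat`, `should_notify`, the smoothing factor, and the hotness threshold are all assumed, and `x` is left as a caller-supplied parameter because the text does not give its value.

```python
from dataclasses import dataclass


@dataclass
class TopicHeat:
    """Tracks how 'hot' a topic is via an exponential moving average."""
    alpha: float = 0.3           # smoothing factor (assumed value)
    hot_threshold: float = 10.0  # average matches per interval (assumed value)
    average: float = 0.0

    def observe(self, matches_in_interval: int) -> None:
        # Recent intervals weigh more; older ones decay geometrically.
        self.average = (self.alpha * matches_in_interval
                        + (1 - self.alpha) * self.average)

    @property
    def is_hot(self) -> bool:
        return self.average >= self.hot_threshold


def should_notify(heat: TopicHeat, match_in_title: bool,
                  match_offset: int, text_length: int, x: float) -> bool:
    """Decide whether a match is good enough to notify about.

    While the topic is quiet, any match passes. While it is hot, only
    title matches or matches within the first 1/x of the text pass.
    """
    if not heat.is_hot:
        return True
    if match_in_title:
        return True
    return text_length > 0 and match_offset < text_length / x
```

Because the average decays when match volume drops, the stricter rules relax on their own once a topic cools down, which mirrors the lowering bar described above.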