GitHub - alexkross/AIR: Real-time data stream classification and knowledge generation engine with no dependencies

AIR

AIR is a real-time data stream classification and knowledge generation engine with no dependencies.

Actual (ubiquitous) dependencies are:

C, Pony-lang;
libcurl, libzmq, libjson-c;
Python, Cython, IPython/Jupyter are optional for CLI, REPL, debugging and visualization.

The core algorithm is lightweight and optimized enough to be implemented on FPGA/ASIC/SoC in a parallel architecture or on embedded systems, as well as on a globally distributed cloud platform, FaaS or SaaS. ZMQ may be replaced with MQTT or RAET. Curl and JSON can also be replaced with other interface APIs for autonomous builds.
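To illustrate how the messaging and serialization layers could stay replaceable, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Transport` interface, `InMemoryTransport` and `run_engine` are hypothetical names, not part of AIR, and the in-memory transport merely stands in for ZMQ, MQTT or RAET.

```python
import json
from typing import Callable, Iterable, Iterator, List, Optional, Protocol


class Transport(Protocol):
    """Hypothetical interface: anything that delivers and accepts raw messages."""

    def receive(self) -> Iterator[bytes]: ...

    def send(self, payload: bytes) -> None: ...


class InMemoryTransport:
    """Stand-in transport for the sketch; a real build would wrap ZMQ, MQTT, etc."""

    def __init__(self, inbound: Iterable[bytes]) -> None:
        self._inbound: List[bytes] = list(inbound)
        self.outbound: List[bytes] = []

    def receive(self) -> Iterator[bytes]:
        yield from self._inbound

    def send(self, payload: bytes) -> None:
        self.outbound.append(payload)


def run_engine(transport: Transport,
               process: Callable[[dict], Optional[dict]]) -> None:
    """Decode each ingress message, process it, and emit any result."""
    for raw in transport.receive():
        event = json.loads(raw)      # the codec is isolated here and could be swapped
        result = process(event)
        if result is not None:
            transport.send(json.dumps(result).encode())


demo = InMemoryTransport([b'{"kind": "fault", "code": 7}'])
run_engine(demo, lambda event: {"seen": event["kind"]})
```

Because the processing function only sees decoded events, swapping the transport or the codec does not touch it.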

Overview

The core of the system, the Processor, should do the smallest thing it is aimed at: generate the best possible knowledge from ingress event/data streams and try to produce predictive models.

The output (egress) knowledge stream is as compact and as thoroughly processed as the platform capabilities (RAM size, processing power, etc.) allow. In other words, you can see this as a lossy data compression black box that tries to guess future events and supports interactive pattern composition or customization.
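The idea of a memory-bounded, lossy summary that guesses the next event can be sketched as follows. This is not the AIR algorithm, which this README does not describe; it swaps in a plain first-order transition table with least-recently-used eviction, and `BoundedPredictor` and `max_contexts` are hypothetical names.

```python
from collections import Counter, OrderedDict


class BoundedPredictor:
    """Toy next-event predictor whose memory is capped at max_contexts entries."""

    def __init__(self, max_contexts=1024):
        self.max_contexts = max_contexts
        self.transitions = OrderedDict()  # previous event -> Counter of next events
        self.previous = None

    def observe(self, event):
        """Record the transition previous -> event, evicting old contexts if needed."""
        if self.previous is not None:
            counts = self.transitions.pop(self.previous, None)
            if counts is None:
                counts = Counter()
            counts[event] += 1
            # Re-insert at the end so the least recently used context stays first.
            self.transitions[self.previous] = counts
            if len(self.transitions) > self.max_contexts:
                # The lossy step: forget the least recently used context.
                self.transitions.popitem(last=False)
        self.previous = event

    def predict(self):
        """Return the most frequent successor of the last event, or None if unknown."""
        counts = self.transitions.get(self.previous)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


predictor = BoundedPredictor(max_contexts=8)
for event in ["a", "b", "a", "b", "a"]:
    predictor.observe(event)
guess = predictor.predict()
```

Lowering `max_contexts` trades prediction quality for memory, which mirrors the dependence on platform capabilities described above.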

Tasks such as data (input and/or output) storage, aggregation and grouping are beyond the scope of this project. There is a wide range of solutions for these; search GitHub.

Application

Currently my focus is on a Cisco ACI (Application Centric Infrastructure, aka APIC or Cisco SDN) deployment for FinTech.

Assumed ingress streams, from the 3rd source on, are (in order of importance):

Audit;
Faults;
Events;
Subscribed MOs;
Atomic counters;
NetFlow (huge, optional).

Note: The fabric configuration and a sufficient subset of the MIT are consumed by the engine at the boot phase on stream 1. User-provided corrections are fed on streams 1-2. See the diagram for the overall view.
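One way to honor the order of importance listed above is to drain pending events through a priority queue. The sketch below is only an assumption about how that could look: `IngressQueue`, the stream identifiers and the tie-breaking by arrival order are all hypothetical and are not taken from the engine.

```python
import heapq
from itertools import count

# Stream names in the order of importance given in the list above
# (identifiers are made up for this sketch).
IMPORTANCE = ["audit", "faults", "events", "subscribed_mos",
              "atomic_counters", "netflow"]
RANK = {name: position for position, name in enumerate(IMPORTANCE)}


class IngressQueue:
    """Pops the most important pending event first; ties go to the earliest arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # monotonically increasing tie-breaker

    def push(self, stream, payload):
        heapq.heappush(self._heap, (RANK[stream], next(self._arrival), stream, payload))

    def pop(self):
        _, _, stream, payload = heapq.heappop(self._heap)
        return stream, payload

    def __len__(self):
        return len(self._heap)


queue = IngressQueue()
queue.push("netflow", "flow record")
queue.push("faults", "first fault")
queue.push("audit", "config change")
queue.push("faults", "second fault")
drained = [queue.pop() for _ in range(len(queue))]
```

The arrival counter keeps events from the same stream in order and prevents the heap from ever comparing payloads.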

Because the Processor can produce acceptable decisions in a restricted environment and within an allowed time frame, the system is also targeted at HFT.
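Producing an acceptable decision within an allowed time frame resembles an anytime algorithm: keep the best answer found so far and return it when the budget runs out. The following is a minimal sketch under that assumption; `decide_within` and its parameters are hypothetical and say nothing about how the Processor is actually implemented.

```python
import time


def decide_within(candidates, score, budget_seconds, clock=time.monotonic):
    """Return the best-scoring candidate evaluated before the time budget expires.

    At least one candidate is always evaluated, so a non-empty input
    never yields None.
    """
    deadline = clock() + budget_seconds
    best, best_score = None, float("-inf")
    for candidate in candidates:
        # Stop once the budget is spent, but only after having some answer.
        if best is not None and clock() >= deadline:
            break
        current = score(candidate)
        if current > best_score:
            best, best_score = candidate, current
    return best


# With a generous budget every candidate is examined.
choice = decide_within([1, 5, 2], score=lambda value: value, budget_seconds=60.0)
```

The `clock` parameter exists only so the deadline behavior can be exercised deterministically with a fake clock.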

Scalability

Although one solid, powerful engine is expected to perform better than a bunch of aggregated smaller ones, it is possible to treat an egress flow as an ingress one and feed it to a higher engine in an N-to-1 directed graph topology, provided that the (almost static) metadata is properly replicated.

On the other hand, by cascading the engines in this way, more conscious reasoning may be achieved on the top node with less demand on resources.

The topology outlined here is not the only one possible. Moreover, the egress stream may be as simple as a serial bitwise flow or rather complicated when interactivity is required (e.g. the subject is a human).
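The N-to-1 cascade can be pictured with a toy model in which each engine reduces a window of inputs to one summary and forwards it to its parent. This is a sketch under loud assumptions: the `Engine` class, the fixed window and the use of `max` as the summary are placeholders chosen for brevity, not the real processing.

```python
class Engine:
    """Toy engine: summarizes every `window` inputs and forwards the summary upward."""

    def __init__(self, name, metadata, window=3, parent=None):
        self.name = name
        self.metadata = dict(metadata)  # each engine holds its own replicated copy
        self.window = window
        self.parent = parent
        self.buffer = []
        self.egress = []

    def ingest(self, value):
        self.buffer.append(value)
        if len(self.buffer) < self.window:
            return
        summary = max(self.buffer)  # placeholder for the real (lossy) processing
        self.buffer.clear()
        self.egress.append(summary)
        if self.parent is not None:
            # The egress of this engine is the ingress of the higher one.
            self.parent.ingest(summary)


shared_metadata = {"fabric": "example"}  # hypothetical near-static metadata
top = Engine("top", shared_metadata, window=2)
leaf_a = Engine("leaf-a", shared_metadata, parent=top)
leaf_b = Engine("leaf-b", shared_metadata, parent=top)

for value in (1, 5, 2):
    leaf_a.ingest(value)
for value in (7, 3, 4):
    leaf_b.ingest(value)
```

Here the top node sees two values instead of six, which is the sense in which the higher engine reasons over less data with fewer resources.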

Current status

Under development. Commits might be sporadic.

Feedback and contribution

Questions and suggestions are highly welcome. Just drop me an e-mail.

Any contribution will support the development.

Repository contents (45 commits):

engine;
LICENSE;
Logical block diagram - data flow and control.pdf;
Logical block diagram - data flow and control.png;
README.md.
