Apache Spot

Apache Spot for Cybersecurity

Spot Architecture Diagram



Data Science Process in 4 words: 
  • Congest :  pull raw data from diverse sources
  • Ingest :  prepare dataset (cleanup, transform)
  • Digest :  explore, model, train, test, learn
  • Suggest :  predict, publish, disseminate
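As a rough illustration of the four words above, here is a minimal Python sketch that runs a toy dataset through each stage. The function names, the sample records, and the 1.5 standard deviation threshold are all made up for this example; they are not part of Apache Spot.

```python
from statistics import mean, pstdev

def congest():
    """Pull raw data. The source here is a hard-coded, hypothetical list of 'ip,bytes' rows."""
    return ["10.0.0.1,120", "10.0.0.2,130", "bad-row",
            "10.0.0.3,110", "10.0.0.4,9000", " 10.0.0.5 , 125 "]

def ingest(raw):
    """Prepare the dataset: strip whitespace, drop malformed rows, cast bytes to int."""
    rows = []
    for line in raw:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[1].isdigit():
            rows.append((parts[0], int(parts[1])))
    return rows

def digest(rows):
    """Learn a trivial model: the mean and population standard deviation of byte counts."""
    values = [b for _, b in rows]
    return {"mean": mean(values), "std": pstdev(values)}

def suggest(rows, model, threshold=1.5):
    """Predict: return hosts whose byte count is more than `threshold` std devs from the mean."""
    if model["std"] == 0:
        return []
    return [ip for ip, b in rows
            if abs(b - model["mean"]) > threshold * model["std"]]

rows = ingest(congest())
model = digest(rows)
print(suggest(rows, model))
```

A real pipeline would replace each function with something far heavier (collectors, a data lake, a trained model), but the shape of the hand-off between stages stays the same.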

Open Data Model



Spot Demo :
https://www.youtube.com/watch?v=tWlRQKI4I6o

One use case - Cloudera Spot-On Security (SOS)
http://cloudera.com/cybersecurity

Open Source at GitHub :
https://github.com/apache/incubator-spot

Documents


Data Science Overview


People + Process

  • Information Pyramid:   Data > Information > Knowledge > Insight
  • 1 Teamwork by 3 Groups
  • Extract business value in a work cycle


Products and Tools



Workflow



Machine Learning Algorithm Matrix


Wikipedia articles

Machine learning := https://en.wikipedia.org/wiki/Machine_learning
Machine learning gives "computers the ability to learn without being explicitly programmed."

Hidden Markov model :=  https://en.wikipedia.org/wiki/Hidden_Markov_model
Artificial neural network :=  https://en.wikipedia.org/wiki/Artificial_neural_network
Cluster analysis :=  https://en.wikipedia.org/wiki/Cluster_analysis
Naive Bayes classifier :=  https://en.wikipedia.org/wiki/Naive_Bayes_classifier
k-nearest neighbors algorithm :=  https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Decision tree learning :=  https://en.wikipedia.org/wiki/Decision_tree_learning
Random forest :=  https://en.wikipedia.org/wiki/Random_forest
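
To make one of the algorithms listed above concrete, here is a minimal sketch of k-nearest neighbors in plain Python. The two-feature training points and the "benign" / "suspicious" labels are invented for illustration and do not come from Spot; a real project would more likely use a library implementation.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (feature vector, label)
train = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((0.9, 1.1), "benign"),
    ((5.0, 5.2), "suspicious"),
    ((5.1, 4.9), "suspicious"),
    ((4.8, 5.0), "suspicious"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 5.0)))
```

The algorithm has no training step: all the work happens at query time, which is simple but becomes slow as the training set grows.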

Predictive analytics := https://en.wikipedia.org/wiki/Predictive_analytics
Predictive analytics devises complex models and algorithms that produce reliable, repeatable decisions and results, and uncovers hidden insights by learning from historical relationships and trends in the data.
Note: this blog has a list of machine learning techniques
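
As a very small example of learning from a historical trend, the sketch below fits a straight line to past observations with ordinary least squares and extrapolates one step ahead. The weekly alert counts are hypothetical numbers chosen for illustration, and a straight line is only one of many possible models.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return intercept + slope * x

# Hypothetical history: alerts observed in weeks 1 to 5
weeks = [1, 2, 3, 4, 5]
alerts = [12, 15, 19, 22, 27]

model = fit_line(weeks, alerts)
print(predict(model, 6))  # forecast for week 6
```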

Become a Data Science Master via MOOC








Dimensions of business needs

  • Increase revenue via social media
  • Enhance productivity via technology innovation
  • Improve efficiency via workflow automation
  • Optimize performance (KPI) via operational analytics
  • Minimize risk (downtime, cybersecurity) via Spot

Hype cycle


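The hype cycle provides a graphical and conceptual presentation of the maturity of emerging technologies through five phases: technology trigger, peak of inflated expectations, trough of disillusionment, slope of enlightenment, and plateau of productivity.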

Amara's law: 
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.
