This document introduces DayF core, an easy-to-use machine learning platform. Key points:
- It is designed for domain experts with minimal machine learning knowledge; models and algorithms are selected automatically.
- It addresses challenges such as the difficulty non-experts face in exploiting data and modern techniques, and the expense of traditional machine learning projects.
- The platform is built on Python, H2O.ai and Apache Spark. Users can run analyses, select recommended models, make predictions, and view results through a web interface or APIs.
2. Key Factors
• Easy to use: visual design and on-line help.
• Reliable: delivers the best result for your objective.
• Low learning curve: state what you want to do and where the objective data is; the platform selects the best model and algorithm for you.
• Trust: audit your experiments on the same platform you work on.
• Agility: when you need it, where you need it.
• Pricing: in line with products offering similar features.
3. What is the main problem?
• Data and modern techniques like Advanced Analytics are the basis for survival in the new scenario ...
• but domain experts are not able to exploit that huge flow of information ...
• and neither experts nor developers understand these models, nor do they value these techniques and what they can add to the business ...
• and nowadays business units find it difficult to trace improvements ...
• and it is actually an expensive investment.
4. What are the needs?
• Forget the complexity of the new technologies' lifecycle.
• Adapt the language to well-known concepts.
• Reports and information that build trust in the results.
• Integration capabilities with current operating platforms, tools and policies.
• Pricing accommodated to different business needs.
Automate your experience …
7. DayF Key Features
• You only need a datasheet and an objective column … and your domain-expert knowledge.
• No previous Machine Learning knowledge needed.
• 5 working modes and 4 selectable performance metrics to avoid overfitting.
• Configuration is 100% parameter-based.
• Automatic algorithms selection:
• Decision Trees
• Probabilistic
• Linear
• Anomalies (supervised / unsupervised)
• Clustering (K-Means)
• Automatic data normalization:
• Missing data
• Analysis improvement
• Automatic full information storage based on 3 different engines:
• MongoDB
• HDFS
• Local filesystem
• Technically independent of the underlying Machine Learning framework, allowing future changes.
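To make the first bullet concrete, the sketch below builds the only two inputs a user supplies: a tabular datasheet and the name of its objective column. It is a toy illustration using pandas (the base technology listed on the next slide); the data and column names are invented and are not part of the gDayF API.

```python
import pandas as pd

# The only user-supplied inputs: a datasheet and an objective column.
# All data and column names here are invented for illustration.
datasheet = pd.DataFrame({
    "age":     [25, 32, 47, 51, 62],
    "income":  [30000, 45000, 60000, 52000, 80000],
    "churned": [0, 0, 1, 0, 1],   # the objective column
})
objective_column = "churned"

# From here the platform takes over, choosing automatically among
# its algorithm families (see the feature list above):
algorithm_families = ["Decision Trees", "Probabilistic", "Linear",
                      "Anomalies", "Clustering (K-Means)"]
```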
8. Technologies
• Base technology: Python 3.6 and pandas
• Integrated ML frameworks: H2O.ai and Apache Spark
• H2O.ai algorithms: GradientBoosting, RandomForest, NaiveBayes, GLM, K-Means, Autoencoders, DeepLearning (ANN)
• Apache Spark 2.2 algorithms (via the ml library): GradientBoosting, RandomForest, NaiveBayes, GLM, K-Means, BisectingKMeans, DecisionTree, Linear regression, LinearSVC, LogisticRegression
12. API: gDayF core (Saving models / Execution tree)

controller.save_models:
• model descriptor list
• saving option: [BEST, BEST_3, ALL, EACH_BEST]

controller.save_models(recomendations, mode=EACH_BEST)

controller.reconstruct_execution_tree:
• model descriptor list
• ordering metric
• store: True/False
• user identifier
• experiment descriptor

execution_tree = controller.reconstruct_execution_tree(arlist=None, metric='rmse', store=False,
                                                       user=controller.user_id,
                                                       experiment=recomendations[0]['model_id'])
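The saving options can be read as selection rules over the best-first list of model descriptors. The helper below is an illustrative sketch of one plausible semantics for each mode, not the gDayF implementation; the descriptor fields (`model_id`, `family`, `rmse`) are invented for the example.

```python
# Illustrative semantics for the saving options -- NOT the gDayF
# implementation, just a sketch of what each mode could select.
def select_models(descriptors, mode):
    # descriptors: list of dicts already sorted best-first by metric
    if mode == "BEST":
        return descriptors[:1]        # single best model
    if mode == "BEST_3":
        return descriptors[:3]        # top three models
    if mode == "ALL":
        return list(descriptors)      # every candidate
    if mode == "EACH_BEST":
        best_per_family = {}
        for d in descriptors:         # first (best) hit per family wins
            best_per_family.setdefault(d["family"], d)
        return list(best_per_family.values())
    raise ValueError("unknown saving option: %s" % mode)

ranked = [
    {"model_id": "gbm_1", "family": "tree",   "rmse": 0.21},
    {"model_id": "drf_1", "family": "tree",   "rmse": 0.24},
    {"model_id": "glm_1", "family": "linear", "rmse": 0.30},
]
```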
13. API: gDayF core (Self-service analysis / Remove models)

controller.exec_sanalysis:
• datapath/dataframe
• model descriptor list
• performance metric (optional)
• analysis depth (optional)

status, recomendations2 = controller.exec_sanalysis(datapath=''.join(source_data),
                                                    list_ar_metadata=recomendations[-3:-2],
                                                    metric='rmse', deep_impact=3)

controller.remove_models:
• model descriptor list
• dropping mode: [BEST, BEST_3, ALL, EACH_BEST]

controller.remove_models(recomendations, mode=ALL)
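The metric argument (here 'rmse') defines the ordering of recommended models: lower rmse means a better fit, so candidates are ranked ascending. A minimal sketch of that ranking, with invented descriptors:

```python
# Sketch of metric-based ranking: lower rmse means a better fit,
# so recommendations are sorted ascending by the chosen metric.
def rank_by_metric(descriptors, metric="rmse"):
    return sorted(descriptors, key=lambda d: d["metrics"][metric])

candidates = [
    {"model_id": "m1", "metrics": {"rmse": 0.42}},
    {"model_id": "m2", "metrics": {"rmse": 0.17}},
    {"model_id": "m3", "metrics": {"rmse": 0.29}},
]
ranking = rank_by_metric(candidates)
# Once ranked, slices such as recomendations[-3:-2] pick specific
# candidates to refine further, as in the exec_sanalysis call above.
```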
14. API: gDayF core (Making predictions / Java standalone component)

controller.exec_prediction:
• datapath/dataframe
• execution model descriptor, or filepath to an execution model descriptor

prediction_frame = controller.exec_prediction(datapath=''.join(source_data),
                                              model_file=recomendations[0]['json_path'][0]['value'])

controller.get_java_model:
• execution model descriptor
• type: [pojo, mojo]

# Save POJO
result = controller.get_java_model(recomendations[0], 'pojo')
print(result)
# Save MOJO
result = controller.get_java_model(recomendations[0], 'mojo')
print(result)
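POJO (Plain Old Java Object) and MOJO (Model Object, Optimized) are H2O's two standalone export formats: both let a trained model score inside a Java application without a running H2O cluster, MOJO being the newer and more compact option. The loop below is a hypothetical convenience wrapper around the get_java_model call shown above; it is not part of the gDayF API.

```python
# Hypothetical wrapper: export both standalone Java formats for the
# top recommendation. 'pojo' and 'mojo' are the two accepted types.
def export_java_models(controller, recomendations, types=("pojo", "mojo")):
    results = {}
    for java_type in types:
        # get_java_model takes an execution model descriptor and a type
        results[java_type] = controller.get_java_model(recomendations[0],
                                                       java_type)
    return results
```

In practice one would call `export_java_models(controller, recomendations)` after a successful analysis and ship the returned artifacts to the Java application.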