2020 11 l3 PC Slides
Agenda
§ Session 1
§ KNIME Software Overview
§ Working with KNIME Server
§ Connect to KNIME Server
§ Server-Side Workflow Execution
§ Remote Workflow Editor
§ Permissions & Versioning
§ Session 2
§ Introduction to Components
§ Component Configuration
§ Composite Views
§ WebPortal Applications
§ Session 3
§ KNIME Server REST API
§ Integrated Deployment
§ KNIME Server Administration
[Diagram: KNIME software overview. KNIME Analytics Platform (KNIME Extensions, KNIME Integrations, Community Extensions, Partner Extensions) and KNIME Server (Data Science as a Service, KNIME WebPortal)]
[Diagram: Data Science with KNIME Analytics Platform. Topics: Data Blending, Data Analytics, Predictive Analytics, Machine Learning, Artificial Intelligence. Roles: Data Engineers, Data Analysts, Data Scientists, ML/AI Engineers, App Developers. Result: Great Model or Report]
IT Operations: Centralized resources / strategies; standards and preferred platforms used, infrastructure options; exit strategies
Financial / Risk Oversight: Costs allocation
Compliance officer: Data/model governance, traceability, GDPR
IT Security: Data, applications
Create: Gather & Wrangle, Model & Visualize
Productionize: Deploy & Manage, Consume & Optimize
https://hub.knime.com/
KNIME Forum
Account Credentials
Features:
§ Self Documenting
§ No limits: All nodes
§ DB, Spark, DL, Python, etc.
§ Task packaging
§ Mix and Match
§ Sharable / Reusable / Instantiated
Workflow
Component
https://www.knime.com/blog/knime-analytics-platform-40-components-are-for-sharing
As a web-based
application
within a workflow
(manual/automated)
https://www.knime.com/blog/knime-meets-knime-will-they-blend
© 2020 KNIME AG. All rights reserved.
Data Science Practice: Multiple Stakeholders’ Needs
Data Engineers
Data Science “coders” (Python, etc.)
Data Science Specialists
Data Science Visual workflow generalists
Smart Business Users (more than Excel)
Application Users – Interaction required
Application Users – Made to spec
Report Consumers
Features:
§ Visualizations
§ Plotly, JavaScript, etc.
§ Reports Creation BIRT
§ Integration with
§ Excel
§ Functionality exploitation, not just CSVs
§ PowerBI
§ Tableau
§ Qlik
§ Spotfire
§ …
https://www.knime.com/community/continental-nodes-for-knime-xls-formatter
Empower Business Users Appropriately
Features:
§ Workflows and WebPortal nodes build interactive applications & dashboards
§ KNIME WebPortal manages access
https://www.knime.com/blog/principles-of-guided-analytics
Features:
§ Scheduled
§ Triggered
§ Called (REST / SaaS)
§ Call Actions based on status
§ Scale and Pin Execution
§ View, edit, execute workflows
remotely
https://docs.knime.com
Features:
§ Many Techniques available
§ LIME
§ SHAP
§ Shapley
§ Partial Dependence / ICE
§ Binary Classification Inspector
https://hub.knime.com/knime/extensions/org.knime.features.mli/latest
Archive
Document
Explore & Analyze
Features:
§ Client Customizations
§ Custom update sites
§ Manage preferences via profiles
§ Node repository & libraries
§ Monitor server activity
§ Running and scheduled jobs
§ Adjust permissions
§ Manage ongoing services
Features:
§ Single sign-on (SSO) to KNIME Server
§ Integrate with multiple identity providers
§ Flexible configuration capabilities to map users and groups
§ Manage all aspects of KNIME usage
[Diagram: Client, KNIME Server, Identity Provider]
https://docs.knime.com
Set Up a New Mount Point
Components
Executing a Workflow on the Server – Remote Execution
Remote Workflow Editor – 1/3
§ Permissions can be set for all types of items: workflows, workflow groups,
components, and data files
§ Permissions are assigned to either individual people or user groups
§ The user who uploads an item automatically becomes its owner
§ Users with admin rights have no restrictions on permissions
§ The owner, plus everyone with admin rights, can assign and change
permissions for an item
§ Permissions can also be set on schedules, so that a team member can maintain or
change a schedule while the owner is, for example, on vacation
Read
§ Workflow: download a workflow job, including data
§ Workflow group: see the content of a workflow group
§ File: download data and execute workflows that use the data
§ Component: use and download
Write
§ Workflow: overwrite, create snapshots, and delete workflows
§ Workflow group: create and upload new items in a workflow group
§ File or component: overwrite a file or component
Everybody Else
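The permission rules above can be sketched as a small model. This is an illustration only, not KNIME Server's actual resolution logic: the `Item` class, the `READ`/`WRITE` flags, and the rule that "Everybody Else" applies only when no user or group entry matches are assumptions made for this sketch.

```python
from dataclasses import dataclass, field

# Hypothetical permission flags, used only in this sketch.
READ, WRITE = "read", "write"


@dataclass
class Item:
    """A server item (workflow, workflow group, component, or data file)."""
    owner: str
    user_permissions: dict = field(default_factory=dict)   # user name -> set of flags
    group_permissions: dict = field(default_factory=dict)  # group name -> set of flags
    everybody_else: set = field(default_factory=set)       # flags for unmatched users


def has_permission(item, user, groups, flag, admins=()):
    """Admins are unrestricted; otherwise use matching user/group entries."""
    if user in admins:
        return True
    matched = []
    if user in item.user_permissions:
        matched.append(item.user_permissions[user])
    matched += [item.group_permissions[g] for g in groups if g in item.group_permissions]
    if not matched:  # assumption: fall back to "Everybody Else"
        matched = [item.everybody_else]
    return any(flag in flags for flags in matched)


def can_change_permissions(item, user, admins=()):
    """Only the owner and users with admin rights may change permissions."""
    return user == item.owner or user in admins


workflow = Item(
    owner="alice",
    user_permissions={"alice": {READ, WRITE}},
    group_permissions={"analysts": {READ}},
)
print(has_permission(workflow, "bob", ["analysts"], READ))   # group entry grants read
print(has_permission(workflow, "bob", ["analysts"], WRITE))  # no write entry for bob
print(can_change_permissions(workflow, "bob"))               # bob is neither owner nor admin
```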
Highlight differences:
§ Nodes included/excluded
§ Node configurations
Config Details – Access KNIME Server
*for double names and double surnames the whitespace has been removed
§ Configure a mount point for KNIME Server with the details provided in the
Config Details – Access KNIME Server slide at the end of the slide deck
§ Download Server Training Material in your LOCAL workspace (hint: drag
and drop or copy and paste the entire folder)
§ Create a snapshot of the workflow and check the created snapshots with Server
History view
§ Overwrite the existing item
§ Check the option Create Snapshot before Overwriting and provide a name to the snapshot
§ The snapshot is listed in the Server History view (View → Other → Server History)
§ Download the snapshot to LOCAL Workspace
§ Use the WorkDiff feature to compare the downloaded workflow with the latest
workflow deployed on KNIME Server:
§ hold the Ctrl (or Cmd) key to select both workflows in KNIME Explorer
§ right click on one of the two workflows
§ click Compare
§ select the two nodes that you wish to compare
§ click Show configuration differences of highlighted nodes
Metanodes vs. Components
WebPortal usage
§ Metanodes: executed in the background
§ Components: JavaScript Views and Widgets inside the component are shown on a WebPortal page
Execution mode
§ Metanodes: normal execution
§ Components: allows Simple Streaming execution
Recommended uses
§ Metanodes: workflow cleaning
§ Components: enabling custom interactions, producing interactive views, sharing functionalities
§ Flow Variables are, by default, only available locally inside the component
§ Configure the component input/output to pass Flow Variables
from/to outside the component
§ A layout can be defined for any Component that contains at least one widget
node or JavaScript-enabled view
§ The layout editor can be accessed from the top toolbar, when inside the
component
§ The “Append the IDs to node names” button on the top bar shows the ID of each
node
§ This is useful to reorder the items in the layout structure for the WebPortal
Extending data science to the Business Analysts
Incorporate domain experts’ knowledge
Amplifies the best data science
KNIME Guided Analytics & Automation
Interaction Points
§ If a workflow is selected in the left section, its details page is shown in the
section on the right
Classic CRM Analytics
Model
§ Use the Text Output Widget node to write the webpage description
Text for the web page (hint: use HTML as text format):
<h2>Define Cluster Parameters</h2>
<p>Set parameters to be taken into account in the following clustering.</p>
<p>Click 'Next' to start the clustering process.</p>
<p>If you do not know what a clustering process is, check <a href="https://en.wikipedia.org/wiki/Cluster_analysis">Cluster Analysis</a> and specifically the <a href="https://en.wikipedia.org/wiki/K-means_clustering">k-Means algorithm</a>.</p>
§ Encapsulate the 4 created nodes in a component and configure 2 outports: one
for the Integer Widget node and one for the Column Filter Widget node
§ Define the layout so that the items are ordered as shown in the figure
Text Output Widget
Column Filter Widget
Input data
Output in server response
§ The workflow can also be executed by external tools such as Postman or curl
for debugging purposes
§ KNIME Server as backend for third party analytical applications
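As a rough sketch of what such an external call could look like from Python, the helper below only builds the HTTP request and does not send it. The server URL, credentials, and input payload are placeholders, and the `rest/v4/repository/<workflow path>:execution` address pattern is an assumption that should be checked against the API definition shown for the workflow on your own server.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request


def build_execution_request(server_url, workflow_path, user, password, inputs=None):
    """Build (but do not send) a request that asks the server to run a workflow.

    Assumption: the execution endpoint follows the pattern
    <server_url>/rest/v4/repository/<workflow path>:execution
    """
    path = quote(workflow_path.strip("/"))  # encodes spaces, keeps "/"
    url = f"{server_url.rstrip('/')}/rest/v4/repository/{path}:execution"

    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}

    data = None
    if inputs is not None:
        # Input data is sent as a JSON body; without inputs a plain GET is built.
        data = json.dumps(inputs).encode("utf-8")
        headers["Content-Type"] = "application/json"

    method = "POST" if data is not None else "GET"
    return Request(url, data=data, headers=headers, method=method)


request = build_execution_request(
    "https://server.example.com/knime",               # placeholder server
    "/Examples/REST/Predict Results Using REST API",  # workflow path from the exercise
    "user", "secret",                                 # placeholder credentials
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a server.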
Issues:
• Development ≠ Deployment
• Needs copy/paste, rewrite
• Transport of models is non-trivial
# Development: build and save the model
import sklearn.preprocessing

# read data
raw_target_data = read_xls_data()
# remove duplicates, handle missing values:
target_data = basic_data_cleanup_with_pandas(raw_target_data)
raw_feature_data = fetch_db_data_using_psycopg2()
# remove duplicates, handle missing values:
feature_data = basic_data_cleanup_with_pandas(raw_feature_data)

# basic feature engineering with sklearn
feature_scaler = sklearn.preprocessing.StandardScaler().fit(feature_data)
standardized_features = feature_scaler.transform(feature_data)
filtered_feature_data = variance_feature_filter_with_sklearn(standardized_features, target_data)

# build model with sklearn
training_feature_data, testing_feature_data, training_target_data, testing_target_data = \
    split_data_with_sklearn(filtered_feature_data, target_data)
RF_params = RF_hyperparameter_search_with_sklearn(training_feature_data, training_target_data)
trained_RF = build_RF_using_params(training_feature_data, training_target_data, RF_params)

# validate model
generate_validation_report(trained_RF, testing_feature_data, testing_target_data)

# save models
save_models_with_joblib((feature_scaler, trained_RF))

# Productionize: predictions, running as a flask service

# load saved components
feature_scaler, trained_RF = load_models_with_joblib()

# read and prepare data
raw_prediction_data = get_dataframe_from_request()
prediction_data = basic_data_cleanup_with_pandas(raw_prediction_data)
scaled_data = feature_scaler.transform(prediction_data)

# generate predictions
predictions = trained_RF.predict(scaled_data)
prediction_probs = trained_RF.predict_proba(scaled_data)
predictions = join_tables_with_pandas(predictions, prediction_probs)

# return results from the service:
return_dataframe_to_service(predictions)
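One way to narrow the gap between development and deployment is to capture preprocessing and model as a single artifact, so the deployment side reloads exactly what was trained instead of re-implementing steps. The sketch below is a toy illustration of that idea in plain Python; the `Standardizer`, `ThresholdModel`, and `Pipeline` classes are hypothetical stand-ins, not sklearn or KNIME APIs.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean, pstdev


@dataclass
class Standardizer:
    mu: float = 0.0
    sigma: float = 1.0

    def fit(self, xs):
        self.mu = mean(xs)
        self.sigma = pstdev(xs) or 1.0  # avoid division by zero
        return self

    def transform(self, xs):
        return [(x - self.mu) / self.sigma for x in xs]


@dataclass
class ThresholdModel:
    threshold: float = 0.0

    def fit(self, xs, ys):
        positives = [x for x, y in zip(xs, ys) if y == 1]
        negatives = [x for x, y in zip(xs, ys) if y == 0]
        # Decision boundary halfway between the two class means.
        self.threshold = (mean(positives) + mean(negatives)) / 2
        return self

    def predict(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]


@dataclass
class Pipeline:
    scaler: Standardizer
    model: ThresholdModel

    def fit(self, xs, ys):
        self.model.fit(self.scaler.fit(xs).transform(xs), ys)
        return self

    def predict(self, xs):
        return self.model.predict(self.scaler.transform(xs))

    def dumps(self):
        return json.dumps(asdict(self))

    @staticmethod
    def loads(text):
        params = json.loads(text)
        return Pipeline(Standardizer(**params["scaler"]), ThresholdModel(**params["model"]))


# Development side: fit scaler and model together, export one artifact.
train_x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
train_y = [0, 0, 0, 1, 1, 1]
pipeline = Pipeline(Standardizer(), ThresholdModel()).fit(train_x, train_y)
artifact = pipeline.dumps()

# Deployment side: reload the artifact; no preprocessing code is rewritten.
deployed = Pipeline.loads(artifact)
new_x = [2.5, 9.0]
print(deployed.predict(new_x))
```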
[Diagram: KNIME Server Large architecture. Components shown: Client, Web Container (Tomcat), Workflow Repository, Message Queue, Executor(s), with requests passed between them]
Features:
§ KNIME Analytics Platform
§ KNIME Server Small & Medium
§ KNIME Server Large BYOL
§ Supports Server Large with multiple Executors
§ Has an embedded Executor so can be stand-alone
§ KNIME Executors
§ Multiple Executors that can be used by KNIME Server Large
§ Pay as you go (PAYG) offering supports elastic scaling
§ Bring your own license (BYOL) offering uses cores from your Server license
https://www.knime.com/knime-software-on-amazon-web-services
https://www.knime.com/knime-software-on-microsoft-azure
Features:
§ Supplement traditionally licensed Executors with Pay-as-you-Go (PAYG) model
§ Meet periodic demand peaks
§ Fulfill need for speciality hardware (e.g. GPUs)
§ Meet budgeting needs
[Diagram: KNIME Server with Executors BYOL and Executors PAYG]
Features:
§ Mix of Enterprise data center and Cloud deployments
§ Meet periodic demand peaks
§ Fulfill need for speciality hardware (e.g. GPUs)
§ Meet budgeting needs
[Diagram: KNIME Server and Executors, connected via VPN to Executors PAYG in an AWS Virtual Private Cloud]
Set Properties
Features:
§ Match workflow needs to Executor capabilities
§ Partition compute resources by capability, department, usage, …
§ Workflow needs determined by workflow publisher
[Diagram: KNIME Server with Executors labeled by properties such as CPU, RAM, and GPU]
Features:
§ Logical groupings of Executors
§ Match users/groups to Executor Groups
§ Partition compute resources by groups, department, …
§ Partitioning managed by Server administrators
[Diagram: KNIME Server with Executor Group 1 (CPU, RAM; Marketing), Executor Group 2 (CPU, Database; Finance), and Executor Group 3 (CPU, GPU; Engineering)]
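The matching idea behind Executor properties and Executor Groups can be sketched as a simple subset check: a workflow runs only where every property it needs is offered. This is a conceptual model, not KNIME Server configuration syntax; the function name and the dictionary layout are assumptions for illustration, with group properties taken from the slide's example.

```python
def eligible_executors(required, executors):
    """Return the names of executors that offer every required property."""
    required = set(required)
    return sorted(name for name, offered in executors.items() if required <= set(offered))


# Executor Groups from the slide, keyed by name with the properties they offer.
executor_groups = {
    "Group 1": {"CPU", "RAM"},       # Marketing
    "Group 2": {"CPU", "Database"},  # Finance
    "Group 3": {"CPU", "GPU"},       # Engineering
}

# The workflow publisher states what the workflow needs.
print(eligible_executors({"GPU"}, executor_groups))         # only the GPU group qualifies
print(eligible_executors({"CPU"}, executor_groups))         # every group offers CPU
print(eligible_executors({"GPU", "RAM"}, executor_groups))  # no group offers both
```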
§ knime-server.config
§ Located in \workflow_repository\config
§ Central configuration file
§ knime.ini
§ Located in Executor installation folder
§ Provides runtime parameters and JVM settings
§ executor.epf
§ Centrally managed in \workflow_repository\config\client-profiles\executor
§ Runtime copy in Executor (and client) workspace
§ Specifies preferences, e.g. database drivers, Python environments, …
§ Is distributed to all Executors that connect to the Server
Features:
§ Easier IT Operations
§ Manage Analytics Platform preferences centrally
§ Include dependencies – e.g. driver files
§ Deliver updates to configurations automatically
§ Different departments/teams have different requirements
§ Client-profiles
§ Python-Linux
§ Python-macOS
§ R-Linux
§ Databases-Win7
§ Big Data-Win10
§ Executor
[Diagram: KNIME Server distributing client-profiles such as Databases-Win7, Python-macOS, Big Data-Win10 (etc.)]
1. knime.ini
Add lines to knime.ini (the file is in the same directory as the KNIME Analytics Platform
executable).
When KNIME Analytics Platform starts, it queries KNIME Server for the specified
preference profiles and applies the preferences before startup finishes.
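A minimal sketch of the knime.ini edit follows. It assumes the two launcher arguments are `-profileLocation` (URL of the server's profiles endpoint) and `-profileList` (comma-separated profile names), each followed by its value on the next line and placed before `-vmargs`; verify the argument names and the URL in the KNIME Server administration guide before relying on them. The sample ini content and the server URL are placeholders.

```python
def add_profile_arguments(ini_text, profile_location, profiles):
    """Insert the profile arguments before the -vmargs line of a knime.ini text.

    Assumption: arguments placed after -vmargs would be read as JVM options,
    so the new lines go directly in front of it (or at the end if it is absent).
    """
    lines = ini_text.splitlines()
    new_args = [
        "-profileLocation", profile_location,  # assumed argument name
        "-profileList", ",".join(profiles),    # assumed argument name
    ]
    try:
        index = lines.index("-vmargs")
    except ValueError:
        index = len(lines)
    return "\n".join(lines[:index] + new_args + lines[index:]) + "\n"


# Placeholder knime.ini content, shortened for illustration.
sample_ini = "-startup\nplugins/launcher.jar\n-vmargs\n-Xmx2048m\n"

updated = add_profile_arguments(
    sample_ini,
    "http://knime-server:8080/knime/rest/v4/profiles/contents",  # placeholder URL
    ["executor"],
)
print(updated)
```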
Path to the workflow: Examples → REST → Predict Results Using REST API
§ Right click menu → Show API Definition
§ Explore Execution Endpoint: GET Request
§ Try out and execute from browser
§ We will keep KNIME Server up and running for an additional week to let you
play around a little bit more with it
§ Interested in a trial license? Just send me an email at