
Commit be81852

added mlops fraud detection docs. small edd change
1 parent 76b6a54 commit be81852

14 files changed: +413 -1 lines changed
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
---
title: MLOps Fraud Detection
date: 2023-11-12
validated: false
summary: This pattern demonstrates how Red Hat OpenShift AI and MLFlow can be used together to build an end-to-end MLOps platform. It demonstrates this using a credit card fraud detection use case.
products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift AI
- Red Hat OpenShift Data Foundation
industries:
- financial services
aliases: /mlops-fraud-detection/
pattern_logo: mlops-fraud-detection.png
links:
  install: mfd-getting-started
  arch: https://www.redhat.com/architect/portfolio/architecturedetail?ppid=6
  help: https://groups.google.com/g/validatedpatterns
  bugs: https://github.com/arslankhanali/mlops-fraud-detection/issues
ci: mfd
contributor:
  name: Arslan Khan
  contact: mailto:arskhan@redhat.com
  git: https://github.com/arslankhanali
---
:toc:
:imagesdir: /images
:_content-type: ASSEMBLY

include::modules/mfd-about-mlops-fraud-detection.adoc[leveloffset=+1]

include::modules/mfd-architecture.adoc[leveloffset=+1]

[id="next-steps_mfd-index"]
== Next steps

* link:mfd-getting-started[Deploy the management hub] using Helm.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
title: Getting started
weight: 10
aliases: /mlops-fraud-detection/getting-started/
---
:toc:
:imagesdir: /images
:_content-type: ASSEMBLY

include::modules/mfd-deploying-mfd-pattern.adoc[leveloffset=+1]

include::modules/mfd-using-mfd-pattern.adoc[leveloffset=+1]

== Next steps

link:https://groups.google.com/g/hybrid-cloud-patterns[Help & Feedback]
link:https://github.com/validatedpatterns/mlops-fraud-detection/issues[Report Bugs]

modules/edd-deploying-edd-pattern.adoc

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ Emerging Disease Detection Validated Pattern.

You can run `make predeploy` to check your values. This will allow you to review your values and change them in
case there are typos or old values. The values files that should be reviewed prior to deploying the
-Medical Diagnosis Validated Pattern are:
+Emerging Disease Detection Validated Pattern are:

|===
| Values File | Description
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
:_content-type: CONCEPT
:imagesdir: ../../images

[id="about-mlops-fraud-detection-pattern"]
= About the MLOps Fraud Detection pattern

MLOps Credit Card Fraud Detection use case::
* Build and train models in RHODS to detect credit card fraud
* Track and store those models with MLFlow
* Serve a model stored in MLFlow using RHODS Model Serving (or MLFlow serving)
* Deploy a model application in OpenShift that sends data to the served model and displays the prediction

Background::
AI technology is already transforming the financial services industry. AI models can make rapid inferences that benefit the financial services institution and its customers. This pattern deploys an AI model to detect fraud in credit card transactions.

[id="about-solution"]
== About the solution

The model is built on a Credit Card Fraud Detection model, which predicts whether a credit card transaction is fraudulent based on a few parameters, such as the distance from home and from the last transaction, the purchase price compared to the median purchase price, whether the retailer has been purchased from before, whether the PIN was used, and whether the order was placed online.
22+
== Technology Highlights:
23+
* Event-Driven Architecture
24+
* Data Science on OpenShift
25+
* Model registry using MLFlow
26+
27+
== Solution Discussion
28+
29+
This architecture pattern demonstrates four strengths:
30+
31+
* *Real-Time Processing*: Analyze transactions in real-time, quickly identifying and flagging potentially fraudulent activities. This speed is crucial in preventing unauthorized transactions before they are completed.
32+
* *Pattern Recognition*: Detect patterns and anomalies in data and learn from historical transaction data to identify typical spending patterns of a cardholder and flag transactions that deviate from these patterns.
33+
* *Cost Efficiency*: By automating the detection process, AI reduces the need for extensive manual review of transactions, which can be time-consuming and costly.
34+
* *Flexibility and Agility*: An cloud native architecture that supports the use of microservices, containers, and serverless computing, allowing for more flexible and agile development and deployment of AI models. This means faster iteration and deployment of new fraud detection algorithms.
35+
36+
// video link to a presentation on the use case
37+
.Overview of the solution in credit card fraud detection
38+
* video link coming soon
39+
// video::VHjpKIeviFE[youtube]

modules/mfd-architecture.adoc

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
:_content-type: CONCEPT
:imagesdir: ../../images

[id="overview-architecture"]
== Overview of the Architecture

Description of each component:

* *Data Set*: The data set contains the data used for training and evaluating the model we build in this demo.
* *RHODS Notebook*: We build and train the model using a Jupyter notebook running in RHODS.
* *MLFlow Experiment tracking*: We use MLFlow to track the parameters and metrics (such as accuracy, loss, and so on) of a model training run. These runs can be grouped under different "experiments", making it easy to keep track of them. See the sketch after this list.
* *MLFlow Model registry*: As we track the experiment we also store the trained model through MLFlow so we can easily version it and assign a stage to it (for example Staging, Production, Archive).
* *S3 (ODF)*: This is where the models are stored and what the MLFlow model registry interfaces with. We use ODF (OpenShift Data Foundation) according to the MLFlow guide, but it can be replaced with another solution.
* *RHODS Model Serving*: We recommend RHODS Model Serving for serving the model. It is based on ModelMesh and lets us easily send requests to an endpoint to get predictions.
* *Application interface*: This is the interface used to run predictions with the model. In our case, we build a visual interface (interactive app) using Gradio and let it load the model from the MLFlow model registry.
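
To make the MLFlow tracking and registry components concrete, here is a minimal sketch of logging a run and registering the model. The tracking URI, experiment name, and model name are illustrative assumptions, and `model` is the trained estimator from the earlier training sketch:

[source,python]
----
# Minimal MLflow sketch; URIs and names are illustrative assumptions.
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://mlflow-server:8080")  # hypothetical in-cluster route
mlflow.set_experiment("credit-card-fraud")

with mlflow.start_run():
    # Track the parameters and metrics of this training run.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.95)  # placeholder value
    # Store the trained model (from the earlier training sketch) and
    # register it so it can be versioned and staged.
    mlflow.sklearn.log_model(model, "model", registered_model_name="fraud-detection")

# Assign a stage to the registered version (for example Staging, Production).
client = MlflowClient()
client.transition_model_version_stage(name="fraud-detection", version="1", stage="Staging")
----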

//figure 1 originally
.Overview of the solution reference architecture
image::mlops-fraud-detection/mfd-reference-architecture.png[link="/images/mlops-fraud-detection/mfd-reference-architecture.png"]
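
Once the model is served through RHODS Model Serving, the application interface calls its REST endpoint. A hedged sketch follows, assuming a KServe v2-style inference route exposed by ModelMesh; the URL, model name, input tensor name, and feature values are illustrative assumptions:

[source,python]
----
# Illustrative request to a KServe v2-style inference endpoint.
# The route, model name, tensor name, and feature values are assumptions.
import requests

URL = "https://fraud-model.example.com/v2/models/fraud-detection/infer"

payload = {
    "inputs": [{
        "name": "dense_input",
        "shape": [1, 6],
        "datatype": "FP32",
        # distance_from_home, distance_from_last_transaction,
        # ratio_to_median_purchase_price, repeat_retailer,
        # used_pin_number, online_order
        "data": [57.9, 0.31, 1.95, 1.0, 0.0, 0.0],
    }]
}

response = requests.post(URL, json=payload, timeout=30)
response.raise_for_status()
print("prediction:", response.json()["outputs"][0]["data"])
----

The Gradio application interface wraps a call like this in a simple web form so that users can try predictions interactively.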

//figure 2 logical
.Logical Architecture
//image::mlops-fraud-detection/mfd-logical-architecture-legend.png[link="/images/mlops-fraud-detection/mfd-logical-architecture-legend.png", width=940]

//figure 3 Schema
.Data Flow Architecture
//image::mlops-fraud-detection/mfd-schema-dataflow.png[link="/images/mlops-fraud-detection/mfd-schema-dataflow.png", width=940]

[id="about-technology"]
== About the technology

The following technologies are used in this solution:

https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[Red Hat OpenShift Platform]::
An enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, public cloud, and edge deployments. It delivers a complete application platform for both traditional and cloud-native applications, allowing them to run anywhere. OpenShift has a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. It also enables the use of external secret management systems, for example, HashiCorp Vault in this case, to securely add secrets into the OpenShift platform.

https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai[Red Hat OpenShift AI]::
Red Hat® OpenShift® AI is an AI-focused portfolio that provides tools to train, tune, serve, monitor, and manage AI/ML experiments and models on Red Hat OpenShift. It brings data scientists, developers, and IT together on a unified platform to deliver AI-enabled applications faster.

https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[Red Hat OpenShift GitOps]::
A declarative application continuous delivery tool for Kubernetes based on the ArgoCD project. Application definitions, configurations, and environments are declarative and version controlled in Git. It can automatically push the desired application state into a cluster, quickly find out if the application state is in sync with the desired state, and manage applications in multi-cluster environments.

https://www.redhat.com/en/technologies/jboss-middleware/amq[Red Hat AMQ Streams]::
Red Hat AMQ Streams is a massively scalable, distributed, and high-performance data streaming platform based on the Apache Kafka project. It offers a distributed backbone that allows microservices and other applications to share data with high throughput and low latency. Red Hat AMQ Streams is available in the Red Hat AMQ product.

HashiCorp Vault (community)::
Provides a secure centralized store for dynamic infrastructure and applications across clusters, including over low-trust networks between clouds and data centers.

MLFlow Model Registry (community)::
A centralized model store, set of APIs, and UI for collaboratively managing the full lifecycle of an MLflow model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, model aliasing, model tagging, and annotations.

Other::
This solution also uses a variety of _observability tools_, including Prometheus monitoring and Grafana dashboards that are integrated with OpenShift, as well as components of the Observatorium meta-project, which includes Thanos and the Loki API.
Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
:_content-type: PROCEDURE
:imagesdir: ../../../images

[id="deploying-mfd-pattern"]
= Deploying the MLOps Fraud Detection pattern

== Prerequisites

. An OpenShift cluster (go to https://console.redhat.com/openshift/create[the OpenShift console]). The cluster must have a dynamic StorageClass to provision PersistentVolumes.
// See also link:../../mlops-fraud-detection/cluster-sizing[sizing your cluster].
. A GitHub account (and a token for it with repository permissions, to read from and write to your forks)

For installation tooling dependencies, see link:https://validatedpatterns.io/learn/quickstart/[Patterns quick start].

The use of this pattern depends on having a Red Hat OpenShift cluster. In this version of the validated pattern
there is no dedicated Hub / Edge cluster for the *MLOps Fraud Detection* pattern. This single-node pattern can be extended with managed clusters attached to a central hub.
// See link:../../mlops-fraud-detection/ideas-for-customization[ideas for customization.]

If you do not have a running Red Hat OpenShift cluster you can start one on a
public or private cloud by using link:https://console.redhat.com/openshift/create[Red Hat's cloud service].

[id="utilities"]
= Utilities

A number of utilities have been built by the validated patterns team to lower the barrier to entry for using the community or Red Hat Validated Patterns. To use these utilities, you need to export some environment variables for your cloud provider.

[id="preparation"]
= Preparation

. Fork the link:https://github.com/validatedpatterns/mlops-fraud-detection[mlops-fraud-detection] repository on GitHub. It is necessary to fork because your fork will be updated as part of the GitOps and DevOps processes.
. Clone the forked copy of this repository.
+
[,sh]
----
git clone git@github.com:<your-username>/mlops-fraud-detection.git
----

. Create a local copy of the Helm secrets values file that can safely include credentials.
+
*DO NOT COMMIT THIS FILE*
+
You do not want to push credentials to GitHub.
+
[,sh]
----
cp values-secret-mlops-fraud-detection.yaml.template ~/values-secret.yaml
vi ~/values-secret.yaml
----

*values-secret.yaml example*

[source,yaml]
----
secrets:
  # Nothing at the time of writing.
----

When you edit the file, you can change the various DB and Grafana passwords if you wish.

. Customize the `values-global.yaml` for your deployment
+
[,sh]
----
git checkout -b my-branch
vi values-global.yaml
----

*Replace instances of PROVIDE_ with your specific configuration*

[source,yaml]
----
global:
  pattern: mlops-fraud-detection
  hubClusterDomain: "AUTO" # For testing only; this value is fetched automatically when invoking against a cluster

  options:
    useCSV: false
    syncPolicy: Automatic
    installPlanApproval: Automatic

main:
  clusterGroupName: hub
  gitOpsSpec:
    operatorChannel: gitops-1.9
----

[,sh]
----
git add values-global.yaml
git commit values-global.yaml
git push origin my-branch
----

. You can deploy the pattern using the link:/infrastructure/using-validated-pattern-operator/[validated pattern operator]. If you use the operator, skip to Validating the Environment below.
. Preview the changes that will be made to the Helm charts.
+
[,sh]
----
./pattern.sh make show
----

. Log in to your cluster by using `oc login` or by exporting `KUBECONFIG`.
+
[,sh]
----
oc login
----
+
Or set `KUBECONFIG` to the path of your `kubeconfig` file. For example:
+
[,sh]
----
export KUBECONFIG=~/my-ocp-env/auth/kubeconfig
----

[id="check-the-values-files-before-deployment-getting-started"]
== Check the values files before deployment

You can run a check before deployment to make sure that you have the required variables to deploy the
MLOps Fraud Detection Validated Pattern.

You can run `make predeploy` to check your values. This allows you to review your values and change them in
case there are typos or old values. The values files that should be reviewed prior to deploying the
MLOps Fraud Detection Validated Pattern are:

|===
| Values File | Description

| values-secret.yaml / values-secret-mlops-fraud-detection.yaml
| The values file that includes the rhpam and fhir-psql-db sections with all database and related secrets

| values-global.yaml
| File that contains all the global values used by Helm
|===

= Deploy

. Apply the changes to your cluster.
+
[,sh]
----
./pattern.sh make install
----
+
If the install fails, go back over the instructions to see what was missed, change it, and then run `make update` to continue the installation.

. This takes some time, especially for the OpenShift Data Foundation operator components to install and synchronize. The `make install` command provides some progress updates during the install and can take up to twenty minutes. Compare your `make install` run progress with the following video showing a successful install.

. Check that the operators have been installed in the UI. The check can also be scripted; see the sketch after this list.
.. To verify, in the OpenShift Container Platform web console, navigate to the *Operators → Installed Operators* page.
.. Check that the operator is installed in the `openshift-operators` namespace and its status is `Succeeded`.
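
If you prefer the command line, a sketch of the same check follows, assuming an authenticated `oc` session; it lists each ClusterServiceVersion and its install phase:

[source,python]
----
# Sketch: list installed operators and their install phase via `oc`.
# Assumes an authenticated `oc` session against the cluster.
import json
import subprocess

out = subprocess.run(
    ["oc", "get", "csv", "-n", "openshift-operators", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

for item in json.loads(out)["items"]:
    name = item["metadata"]["name"]
    phase = item.get("status", {}).get("phase", "Unknown")
    print(f"{name}: {phase}")  # healthy operators report 'Succeeded'
----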

[id="using-openshift-gitops-to-check-on-application-progress-getting-started"]
== Using OpenShift GitOps to check on Application progress

You can also check on the progress using OpenShift GitOps to check on the various applications deployed.

. Obtain the ArgoCD URLs and passwords.
+
The URLs and login credentials for ArgoCD change depending on the pattern
name and the site names they control. Follow the instructions below to find
them, however you choose to deploy the pattern.
+
Display the fully qualified domain names, and matching login credentials, for
all ArgoCD instances:
+
[,sh]
----
ARGO_CMD=`oc get secrets -A -o jsonpath='{range .items[*]}{"oc get -n "}{.metadata.namespace}{" routes; oc -n "}{.metadata.namespace}{" extract secrets/"}{.metadata.name}{" --to=-\\n"}{end}' | grep gitops-cluster`
CMD=`echo $ARGO_CMD | sed 's|- oc|-;oc|g'`
eval $CMD
----
+
The result should look something like:
+
[,text]
----
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
hub-gitops-server hub-gitops-server-mlops-fraud-detection-hub.apps.mfd-cluster.aws.validatedpatterns.com hub-gitops-server https passthrough/Redirect None
# admin.password
xsyYU6eSWtwniEk1X3jL0c2TGfQgVpDH
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
cluster cluster-openshift-gitops.apps.mfd-cluster.aws.validatedpatterns.com cluster 8080 reencrypt/Allow None
kam kam-openshift-gitops.apps.mfd-cluster.aws.validatedpatterns.com kam 8443 passthrough/None None
openshift-gitops-server openshift-gitops-server-openshift-gitops.apps.mfd-cluster.aws.validatedpatterns.com openshift-gitops-server https passthrough/Redirect None
# admin.password
FdGgWHsBYkeqOczE3PuRpU1jLn7C2fD6
----
+
The most important ArgoCD instance to examine at this point is `mlops-fraud-detection-hub`. This is where all the applications for the pattern can be tracked.

. Check that all applications are synchronized. There are thirteen different ArgoCD "applications" deployed as part of this pattern; a scripted check is sketched below.
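
A quick scripted alternative is to query the Argo CD `Application` resources directly. A sketch, assuming cluster access; the field paths follow the Argo CD CRD:

[source,python]
----
# Sketch: report sync and health status of every Argo CD application.
# Assumes an authenticated `oc` session against the cluster.
import json
import subprocess

out = subprocess.run(
    ["oc", "get", "applications.argoproj.io", "-A", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

for app in json.loads(out)["items"]:
    name = app["metadata"]["name"]
    sync = app.get("status", {}).get("sync", {}).get("status", "Unknown")
    health = app.get("status", {}).get("health", {}).get("status", "Unknown")
    print(f"{name}: sync={sync}, health={health}")
----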