AI/ML Archives - Woohoo Services Blog!

Sprucing up a Site with AI – DALL-E

Summary

I like to tinker. It is the way I learn best. Along with that, I love sharing information and documenting it through various mediums like this blog. One of my shortcomings, however, is that I have no artistic sense.

As I started to dive into Generative AI, one of the areas that intrigued me the most was Text To Image. Most of my sites have been extremely bland because I lack any sort of graphic design capabilities. My recent reading of Artificial Intelligence & Generative AI for Beginners helped me work through an area where AI could help me.

Image Generators

For this project of sprucing up my https://tools.woohoosvcs.com site I used DALL-E, but it is not the only one. There are many and they are easy to find. DALL-E does require the Plus subscription of ChatGPT. The Bing Image Creator is a free alternative with limits. It uses the same engine, as I understand it.

Prompting

Prompting is key. We want to ensure that we’re guiding AI in the right direction. Perhaps this will not be as necessary in the future, but for now it really helps. Give DALL-E a simple prompt of “Generate an image logo to depict an ssl certificate” and it will get going. The more specific guidance you can give it the better, though. This will help ensure uniqueness as well as specificity.
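As a rough illustration of what "more specific" can mean, the sketch below assembles a prompt from a few named details. The function name and the style, palette and background fields are my own assumptions for illustration, not anything DALL-E requires. It only builds the text you would paste into the prompt box.

```python
def build_logo_prompt(subject, style="", palette="", background=""):
    """Assemble an image prompt, appending only the details that were given."""
    parts = [f"Generate an image logo to depict {subject}"]
    if style:
        parts.append(f"in a {style} style")
    if palette:
        parts.append(f"using a {palette} color palette")
    if background:
        parts.append(f"on a {background} background")
    return ", ".join(parts) + "."


# The simple prompt from the text, unchanged apart from the trailing period.
vague = build_logo_prompt("an ssl certificate")

# The same request with extra (hypothetical) guidance added.
specific = build_logo_prompt(
    "an ssl certificate",
    style="flat, minimalist",
    palette="blue and white",
    background="plain white",
)

print(vague)
print(specific)
```

Keeping the details as separate fields makes it easy to change one thing at a time and compare the images that come back.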

Site

For this, I updated https://tools.woohoosvcs.com. If you want to see what it used to look like, you can go to https://tools.woohoosvcs.com/old

For a side by side

Further Down AI Powered Chatbot Rabbit Hole

Summary

In my previous article Chatbots, AI and Docker! I talked a little about the theory behind this, but for this article I wanted to go fully down the rabbit hole and produce my own chatbot. To do this, I had to find an updated chatterbot fork, learn a little more Python to handle dependencies better, create my own fork of the corpus/training material and learn Google Cloud Run. Ultimately you can skip straight to the source if you like. That’s the great part of GitOps/IaC.

And Then Some

Previously, I had a workable local instance that I was able to host in podman/kind, but I wanted to put this in my hosting environment on Google. In order to personalize this, I wanted to be able to add some training data and use some better practices. Having previously used Google App Engine, I assumed that would naturally be the landing place for this. I then ran into some hiccups and came across Cloud Run, which was not originally available and seemed like a suitable fit as it is built for containerized workflows. It provided me a way to use my existing Dockerfile to unify the build and deploy. For tools, I have a separate build and test workflow in my cloudbuild.yaml.

Get on with Chatbots!

In my last article, I mentioned I had to find a fork of chatterbot because it has not been recently maintained. In reality, though, it only allowed command line prompting, which is not terribly useful for a wider audience to test. I came across this amazing Medium post which I have to give full credit for (and do in the HTML as well). The skin is pretty amazing. It also provides a wealth of in-depth details.

For the web framework, I opted to use Flask and gunicorn, which were fairly trivial to get going after finding that great Medium post above.
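For anyone curious what gunicorn actually serves: a Flask app is a WSGI application, and the same shape can be sketched with only the standard library. This is not my Flask code. The `/get` path, the `msg` parameter and the `get_reply` stand-in are assumptions for illustration, and the real app calls the chatbot instead of echoing.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

TEXT = [("Content-Type", "text/plain; charset=utf-8")]


def get_reply(message):
    """Hypothetical stand-in for the real chatbot call."""
    return f"You said: {message}"


def app(environ, start_response):
    """WSGI callable: answers GET /get?msg=... with the bot's reply."""
    if environ.get("PATH_INFO") != "/get":
        start_response("404 Not Found", TEXT)
        return [b"not found"]
    message = parse_qs(environ.get("QUERY_STRING", "")).get("msg", [""])[0]
    if not message:
        start_response("400 Bad Request", TEXT)
        return [b"missing msg parameter"]
    start_response("200 OK", TEXT)
    return [get_reply(message).encode("utf-8")]


def call_app(path, query=""):
    """Invoke the app in-process, without a server, and return (status, body)."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body.decode("utf-8")


print(call_app("/get", "msg=What+is+your+quest%3F"))
```

With gunicorn installed, a file containing this callable could be served with something like `gunicorn module_name:app`, where `module_name` is whatever the file is called.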

Training Data

Without any training data, AI/ML does not really exist. It needs to be pre-trained and/or trained “on the job”. For this, chatterbot-corpus comes into play. This is a pre-built training data set for the chatterbot library. It has some decent basic training. I wanted to be able to add my own and, based on the input of casmith, it’s in Python, so shouldn’t it be able to converse with Monty Python quotes? So I created my own section for that.

categories:
- Monty
- humor
conversations:
- - What is your name?
  - My name is Sir Lancelot of Camelot.
- - What is your quest?
  - To seek the Holy Grail.
- - What is the air speed velocity of an unladen swallow?
  - What do you mean? An African or European swallow?
- - How do you know so much about swallows?
  - Well, you have to know these things when you're a king, you know.

I have the real-time training disabled, or rather I put my chatterbot into read-only mode, because the internet can be a cruel place and I don’t need my creation coming home with a foul mouth! For my lab, the training is loaded at image creation time. This is primarily because it’s using the default sqlite back end. I could easily use a database for this and load the training out of band so it doesn’t require a deploy.
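To make the corpus format concrete: as I understand it, each conversation is a list of lines where every line is treated as a response to the one before it. The sketch below mirrors that idea in plain Python using two of the Monty Python entries. `to_pairs` is a hypothetical helper of mine, not part of chatterbot, and the real trainer does more than this (text preprocessing and storage, for a start).

```python
def to_pairs(conversations):
    """Turn each conversation into (prompt, response) pairs of adjacent lines."""
    pairs = []
    for conversation in conversations:
        # zip the conversation with itself shifted by one line
        for prompt, response in zip(conversation, conversation[1:]):
            pairs.append((prompt, response))
    return pairs


# Two conversations in the same nested-list shape as the YAML corpus file.
monty = [
    ["What is your name?", "My name is Sir Lancelot of Camelot."],
    ["What is your quest?", "To seek the Holy Grail."],
]

pairs = to_pairs(monty)
for prompt, response in pairs:
    print(f"{prompt!r} -> {response!r}")
```

A conversation with three lines would yield two pairs, so longer exchanges can be written as one list instead of being split up by hand.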

Logic Adapters

You may be thinking this is a simple bot that’s just doing string matching to figure out how to respond. For the most part you’re correct. This is not deep learning and it doesn’t fully understand what you are asking. With that said, it’s very extensible with multiple logic adapters. The default is a “BestMatch” based on text input. Others allow it to report time and do math. Each adapter reports a confidence for its response, and the response with the highest confidence wins. Pretty neat!

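from chatterbot import ChatBot

# read_only=True keeps the bot from learning from what visitors type.
chatbot = ChatBot(
    "Sure, Not",
    logic_adapters=[
        'chatterbot.logic.BestMatch',
        'chatterbot.logic.MathematicalEvaluation',
        'chatterbot.logic.TimeLogicAdapter'
    ],
    read_only=True
)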
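The "highest confidence wins" idea can be sketched in a few lines. This is a simplified illustration and not chatterbot's actual code: the `Candidate` class, the `pick_response` helper and the confidence numbers are all made up, and I believe the real library also accounts for several adapters agreeing on the same answer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    adapter: str       # which logic adapter produced the response
    text: str          # the response text
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def pick_response(candidates):
    """Return the candidate with the highest confidence (first one wins ties)."""
    if not candidates:
        raise ValueError("no adapter produced a response")
    return max(candidates, key=lambda c: c.confidence)


# Hypothetical results for the input "What is 2 + 2?"
candidates = [
    Candidate("BestMatch", "To seek the Holy Grail.", 0.42),
    Candidate("MathematicalEvaluation", "2 + 2 = 4", 1.0),
    Candidate("TimeLogicAdapter", "The current time is ...", 0.0),
]

winner = pick_response(candidates)
print(winner.adapter, "->", winner.text)
```

The math adapter is sure of itself here, so it beats the fuzzy text match even though BestMatch is listed first.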

Over To The Infrastructure

All of this starts with a Dockerfile. I already had one, but it was a little bloated with build dependencies. Therefore, I created a multistage image using a Python virtual environment as guided by https://pythonspeed.com/articles/multi-stage-docker-python. I am not new to multistage images. My Go images use them. I was, however, new to doing this with Python. Not only did it reduce my image size by 100MB, but it also removed 30 vulnerabilities from the image because git, which is needed to install some of the Python libraries, no longer ships in the final image.

Cloud Run

Getting deployed to Cloud Run was pretty simple, although there was some trial and error due to permissions. The service account needed Cloud Run Admin access. Aside from that, this pumped everything through and let me keep my singular Dockerfile.

steps:
  # Docker Build
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 
           'us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}', '.']

  # Docker push to Google Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push',  'us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}']

  # Deploy to Cloud Run
  - name: google/cloud-sdk
    args: ['gcloud', 'run', 'deploy', 'chatbot', 
           '--image=us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}', 
           '--region', 'us-central1', '--platform', 'managed', 
           '--allow-unauthenticated', '--port', '5000', '--memory', '256Mi',
           '--no-cpu-boost']

# Store images in Google Artifact Registry 
images:
  - us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}

It really was this simple since I had a working local environment and working Dockerfile. Just don’t look at my commit history 🙂 Quite a few silly mistakes were made if you look deep enough.
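One detail worth spelling out is the `--port 5000` flag. Cloud Run sends requests to that container port and also exposes it to the container as the `PORT` environment variable, so the app can read it instead of hard-coding a number. A minimal sketch, where `resolve_port` is a hypothetical helper name and 5000 is simply the value from my deploy step:

```python
import os


def resolve_port(env=None, default=5000):
    """Read the listening port from PORT, falling back to a default."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    # Ignore empty or non-numeric values rather than crashing at startup.
    return int(raw) if raw.isdigit() else default


print(f"would listen on 0.0.0.0:{resolve_port()}")
```

The same value can be handed to gunicorn through its `--bind` option, so the image runs unchanged locally and on Cloud Run.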

Caveat

Google App Engine lets you use custom domain mapping and bring your own certificates. I use Cloudflare to protect my entire environment, so in GAE I installed a Cloudflare Origin certificate. No browser trusts that certificate, which helps prevent the site from being accessed directly, bypassing Cloudflare.

Google Cloud Run has a preview feature for custom domain mapping. The easiest of the options doesn’t support custom certificates and instead wants to issue you a certificate. The temporary workaround is to not proxy through Cloudflare until the certificate is issued and then turn the proxy on. Rinse and repeat yearly when the cert needs to be renewed.

I have to imagine this will get rectified once it is out of preview, to reach feature parity with Google App Engine, since it seems Cloud Run intends to replace GAE.

Credits

For Multi-stage help with Python Docker Images – https://pythonspeed.com/articles/multi-stage-docker-python

For the entire UI of this demo/test – https://medium.com/@kumaramanjha2901/building-a-chatbot-in-python-using-chatterbot-and-deploying-it-on-web-7a66871e1d9b

Chatbots, AI and Docker!

Summary

I have started my learning journey about AI. With that I started reading Artificial Intelligence & Generative AI for Beginners. One of the use cases it went through for NLP (Natural Language Processing) was Chatbots.

To the internet I went, ready to go down a rabbit hole, and came across a Python library called ChatterBot. I knew I did not want to bloat and taint my local environment, so I started using a Docker instance in Podman.

Down the Rabbit Hole

I quickly realized the project has not been actively maintained in a number of years and had some specific and dated dependencies. For example, it seemed to do best with Python 3.6, whereas the latest at the time of this writing is 3.12.

This is where Docker shines though. It is really easy to find older images and declare which versions you want. The syntax of a Dockerfile is such that you can specify the base image and layer the commands you want to run on it. The resulting image should behave the same no matter where it is deployed from there.

I eventually found a somewhat updated fork of it here which simplified things, but it still had its nuances. chatterbot-corpus (the training data) required PyYAML 3.13, but to get this to work it needed 5.

Dockerfile

FROM python:3.6-slim

WORKDIR /usr/src/app

#COPY requirements.txt ./
#RUN pip install --no-cache-dir -r requirements.txt
RUN pip install spacy==2.2.4
RUN pip install pytz pyyaml chatterbot_corpus
RUN python -m spacy download en

RUN pip install --no-cache-dir chatterbot==1.0.8

COPY ./chatter.py .

CMD [ "python", "./chatter.py" ]

Here we can see I needed a specific version of Python (3.6), whereas at the time of writing the latest is 3.12. It also required a specific spacy package version. With this I have a repeatable environment that I can reproduce and share (to peers or even to production!)

Dockerfile2

Just for grins, when I was able to use the updated fork it did not take much!

FROM python:3.12-slim

WORKDIR /usr/src/app

#COPY requirements.txt ./
#RUN pip install --no-cache-dir -r requirements.txt
RUN pip install spacy==3.7.4 --only-binary=:all:
RUN python -m spacy download en_core_web_sm

RUN apt-get update && apt-get install -y git
RUN pip install git+https://github.com/ShoneGK/ChatterPy

RUN pip install chatterbot-corpus

RUN pip uninstall -y PyYaml
RUN pip install --upgrade PyYaml

COPY ./chatter.py .

CMD [ "python", "./chatter.py" ]
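After juggling versions like the PyYAML reinstall above, it can help to ask Python what actually ended up installed inside the image. A small sketch using the standard library's `importlib.metadata` (available from Python 3.8, so it fits this 3.12 image but not the 3.6 one). `installed_versions` is a hypothetical helper, and the package names are just the ones from my Dockerfiles.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["PyYAML", "spacy", "chatterbot-corpus"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this inside the container instead of on the host is the point: it reports what the image contains, not what happens to be on my machine.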

Without Docker

Without Docker (podman) I would have tainted my local environment with many different dependencies. At the point of getting it all working, I couldn’t be sure it would work properly on another machine. Even if it did, was their environment tainted as well? With Docker, I knew I could easily repeat the process from a fresh image to validate.

Previous Python projects I worked on could also have tainted my local environment, causing unexpected results on other machines or excessive hours troubleshooting something unique to my machine. All of that is avoided with a container image!

Declarative Version Management

When it comes time to update to the next version of Python, it will be a really easy feat. Many tools, like Dependabot or Snyk, will even parse these types of files and do dependency management.