Training & Testing Text Classification Models with Google Cloud Vertex AI

By leveraging Google’s AutoML feature, classification models can be created with little to no technical effort.

Cobus Greyling
Mar 28, 2023


For starters, here are a few general observations:

  1. Many elements of Vertex AI in general, and AutoML in particular, remind me of HuggingFace🤗 AutoTrain.
  2. AutoML allows for quick prototyping and exploration of datasets using a no-code studio approach.
  3. Vertex AI has two text classification options: single-label or multi-label classification. Creating class hierarchies or taxonomies is not possible.
  4. Model training time is relatively long. Traditional NLU models have made great strides in incremental training, which is the notion of appending new data to an existing model. In addition, the training time of traditional NLU models has shortened dramatically.
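To make the single-label versus multi-label distinction from the list above concrete, here is a minimal Python sketch. It is a general illustration of the two classification styles, not a description of how Vertex AI works internally. The function names, the 0.5 threshold and the scores are assumptions made up for the example, and apart from enjoy_the_moment the label names are hypothetical.

```python
def single_label(scores):
    """Single-label: pick exactly one label, the one with the highest score."""
    return max(scores, key=scores.get)


def multi_label(scores, threshold=0.5):
    """Multi-label: keep every label whose score reaches the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical per-label confidence scores for one piece of text.
scores = {"enjoy_the_moment": 0.72, "affection": 0.61, "achievement": 0.08}

print(single_label(scores))
print(multi_label(scores))
```

With these made-up scores, the single-label view returns one label, while the multi-label view can return several labels for the same text.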

Even though AutoML is designed to streamline the process of creating an ML model, two elements that are highly configurable are the data split and annotation sets.

Data split: By default, AutoML randomly assigns each item in your dataset to training, validation, and test sets in an 80/10/10 ratio respectively. You can change that ratio or even manually assign each data item to a set.
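As a rough illustration of what a random 80/10/10 split looks like, here is a small Python sketch. It is not Vertex AI's own implementation. The function name assign_splits, the fixed seed and the shuffle-then-slice approach are assumptions chosen for the example.

```python
import random


def assign_splits(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle items and slice them into training, validation and test sets.

    Illustrative only: the exact assignment method AutoML uses is not shown here.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])

    return {
        "training": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # whatever remains
    }


# Hypothetical dataset of 100 item identifiers.
splits = assign_splits(range(100))
print({name: len(members) for name, members in splits.items()})
```

Changing the ratios argument mirrors changing the ratio in the studio, and building the dictionary by hand would correspond to manually assigning each item to a set.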

Annotation sets: Annotation sets store annotations so that you can use the same dataset for other models and objectives. For example, you could use this same “happiness” dataset to train a multi-label classification model instead of a single-label one.

But I hasten to mention, Vertex AI lacks a bottom-up, data-centric approach to curating and structuring training data.

⭐️ Please follow me on LinkedIn for updates on Conversational AI ⭐️

In a previous post I stepped through the process of creating a dataset and training an ML model.

The image below shows the Vertex AI dashboard. Under recent models, the new model is listed together with its average precision.

Once the model is opened, there is a progression bar at the top of the page. From here the newly created model can be evaluated, deployed and tested, and more.

In the image below you can see that the model can be tested with longer input. On the right of the image, the labels are visible, with the enjoy_the_moment label identified.

Jumping back to the evaluate tab, a few quick-view graphic indicators are available per model.

Below you see the confusion matrix:
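For readers who have not worked with a confusion matrix before, the following Python sketch shows how one can be computed from true and predicted labels. It is a generic illustration, not how Vertex AI builds its view. The function name and the sample data are assumptions, and apart from enjoy_the_moment the label names are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix) where matrix[i][j] counts items whose true
    label is labels[i] and whose predicted label is labels[j]."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Made-up evaluation data for illustration.
true_labels = ["enjoy_the_moment", "enjoy_the_moment", "affection", "affection", "affection"]
predicted = ["enjoy_the_moment", "affection", "affection", "affection", "enjoy_the_moment"]

labels, matrix = confusion_matrix(true_labels, predicted)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal holds the correctly classified items, and every off-diagonal cell shows which label was confused with which.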

Below that is the trade-off between precision and recall at different confidence thresholds.

➡️ A lower threshold results in higher recall but typically lower precision.

➡️ A higher threshold results in lower recall but typically with higher precision.
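The trade-off can be reproduced with a few lines of Python. This is a minimal sketch for a single label, assuming made-up confidence scores and ground truth. The function name is hypothetical, and treating precision as 1.0 when nothing is predicted is a convention chosen for the example.

```python
def precision_recall_at(threshold, scores, is_positive):
    """Compute precision and recall when every score >= threshold is
    treated as a positive prediction."""
    predicted = [score >= threshold for score in scores]
    pairs = list(zip(predicted, is_positive))
    tp = sum(p and a for p, a in pairs)        # predicted positive, truly positive
    fp = sum(p and not a for p, a in pairs)    # predicted positive, truly negative
    fn = sum(not p and a for p, a in pairs)    # predicted negative, truly positive
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model confidences and whether each item truly has the label.
scores = [0.95, 0.85, 0.70, 0.55, 0.40, 0.20]
is_positive = [True, True, False, True, False, False]

for threshold in (0.3, 0.8):
    print(threshold, precision_recall_at(threshold, scores, is_positive))
```

On this toy data, the 0.3 threshold finds every positive item but also lets false positives through, while the 0.8 threshold makes no false positive predictions and misses one positive item.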

Read more on threshold, precision and recall here.

In Closing

In an upcoming article I want to consider production deployment of Vertex models.


I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language, ranging from LLMs, Chatbots, Voicebots and Development Frameworks to Data-Centric latent spaces and more.


