How To Use OpenAI Codex To Create Notebooks With Natural Language
The Speed & Accuracy Are Astonishing
Introduction
A chatbot conversation is where unstructured data (human conversation) is structured for processing, so that meaning and intent can be extracted.
When the chatbot replies to the user, the structured data needs to be unstructured again into natural language.
This unstructuring and structuring of data demands overhead and special, detailed attention.
The degree to which data can be entered unstructured determines the degree of complexity: the more unstructured the input, the more overhead is required to structure it for processing.
Some chatbots simplify the process by presenting the user with buttons, menus and other design affordances, hence structuring the user interface to some degree.
And again, the degree to which data is unstructured in the chatbot's return dialog can be limited with cards and similar elements.
What makes Codex interesting is that natural language is structured at input, but no subsequent unstructuring is required: natural language input is structured, and code is derived from this data.
In its essence, OpenAI Codex translates natural language into code.
Interesting fact: Codex is the model that powers GitHub Copilot, which OpenAI built and launched in partnership with GitHub.
Codex is proficient in more than 12 programming languages.
Codex takes simple commands in natural language and executes them on the user’s behalf.
OpenAI Codex is based on GPT-3.
According to OpenAI, Codex’s training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.
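The same natural-language-to-code translation can also be reached programmatically rather than through the playground. The sketch below is my own assumption, not part of the article: it uses the legacy `openai` Python SDK (0.x) and an assumed engine name, so treat both as placeholders and consult OpenAI's documentation for current values.

```python
# Hedged sketch: calling a Codex-style engine through the legacy OpenAI
# completions API. Engine name, parameters and SDK version are assumptions.
import os


def complete(prompt: str) -> str:
    import openai  # pip install openai (legacy 0.x SDK assumed)

    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Completion.create(
        engine="davinci-codex",  # assumed engine name
        prompt=prompt,
        max_tokens=256,
        temperature=0,
        stop='"""',  # stop before the next NLU input block
    )
    return response["choices"][0]["text"]


# The NLU input is framed in triple quotes, mirroring the playground transcript
prompt = '"""\nCreate a dataframe of 12 random US cities\n"""'
if os.environ.get("OPENAI_API_KEY"):
    print(complete(prompt))
```

The `stop='"""'` setting mirrors how the playground transcript is delimited: generation halts when the model begins a new triple-quoted NLU block.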
Python, Notebooks & Codex
Below is the Codex playground where questions can be submitted and Codex generates the code. The environment is very simplistic, with an array of settings on the right-hand pane. In the example below, you will see code copied from this preview into a Notebook.
OpenAI Codex in most cases offers a follow-up question and a description of the result; in a sense, ending with a proposed next question and its answer. When the Submit button is clicked, additional code is generated.
A comparison of different graphs of Temperature and Humidity values. Codex can be asked to create 3 graphs based on the df. Or, apply a crosstab to the df, if you want to see a practical implementation of it.
Or a more ambiguous request like: demonstrates visualization of tabular data in df.
A practical implementation of a crosstab on the df involving City and Temperature.
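Worth noting: a crosstab of City against raw Temperature values mostly produces a grid of zeros and ones, because each city carries a single temperature reading. Binning the temperatures first gives a more readable table. The snippet below is my own illustration, not Codex output; the seed and bin edges are arbitrary.

```python
import random

import pandas as pd

random.seed(42)  # arbitrary seed, purely for reproducibility
cities = ['New York', 'Los Angeles', 'Chicago', 'Houston']
df = pd.DataFrame({
    'City': cities,
    'Temperature': [random.randint(50, 100) for _ in cities],
})

# Bin temperatures into ranges before cross-tabulating, so each column
# covers a band of values rather than a single reading
temp_band = pd.cut(df['Temperature'], bins=[49, 70, 90, 100])
table = pd.crosstab(df['City'], temp_band)
print(table)
```

Each cell now counts how many cities fall into a temperature band, which is the kind of summary a crosstab is actually useful for.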
The full list of questions (NLU input) posed to Codex is denoted by triple quotes (“ “ “). Codex generates a comprehensive comment, denoted by a hash (#), with the code.
Exceptional elements which can easily be overlooked:
- How natural and coherent the comments within the code are. The Natural Language Generation (NLG) is obviously on par with GPT-3.
- Context is maintained within the conversation and where user input is not comprehensive or explicit, accurate implicit assumptions are made by Codex.
- The code works. Users can copy it out of the Codex preview, paste it into a Notebook and execute it. I did not have an instance where the code did not execute.
- The process of generating code is very modular, and a request is broken up into separate sequential steps.
Below is the full transcript of NLU Input, comments and generated code…
There are spelling mistakes in my NLU input, but they did not prove to be any hindrance to the process.
"""
Create a dataframe of 12 random US cities with averate yearly temperature, humidity, rainfall an wind
"""
import pandas as pd
import numpy as np
import random# Create a dataframe of 12 random US cities with averate yearly temperature, humidity, rainfall and wind# Create a list of 12 random US citiescities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose', 'Austin', 'Jacksonville']# Create a list of 12 random average yearly temperatures
temps = [random.randint(50, 100) for i in range(12)]# Create a list of 12 random average yearly humidities
humidities = [random.randint(50, 100) for i in range(12)]# Create a list of 12 random average yearly rainfall
rainfall = [random.randint(50, 100) for i in range(12)]# Create a list of 12 random average yearly wind
wind = [random.randint(50, 100) for i in range(12)]# Create a dictionary of the lists
data = {'City': cities, 'Temperature': temps, 'Humidity': humidities, 'Rainfall': rainfall, 'Wind': wind}# Create a dataframe from the dictionary
df = pd.DataFrame(data)# Display the dataframe
df"""
create 3 graphs based on the df
"""# Create a bar graph of the average yearly temperature for each city
df.plot.bar(x='City', y='Temperature')# Create a bar graph of the average yearly humidity for each city
df.plot.bar(x='City', y='Humidity')# Create a bar graph of the average yearly rainfall for each city
df.plot.bar(x='City', y='Rainfall')# Create a bar graph of the average yearly wind for each city
df.plot.bar(x='City', y='Wind')"""
apply crosstab to the df
"""# Create a crosstab of the average yearly temperature for each city
pd.crosstab(df.City, df.Temperature)# Create a crosstab of the average yearly humidity for each city
pd.crosstab(df.City, df.Humidity)
# Create a crosstab of the average yearly rainfall for each city
pd.crosstab(df.City, df.Rainfall)# Create a crosstab of the average yearly wind for each city
pd.crosstab(df.City, df.Wind)"""
show A histogram can be stacked using df
"""# Create a histogram of the average yearly temperature for each city
df.hist(column='Temperature')# Create a histogram of the average yearly humidity for each city
df.hist(column='Humidity')# Create a histogram of the average yearly rainfall for each city
df.hist(column='Rainfall')# Create a histogram of the average yearly wind for each city
df.hist(column='Wind')"""
show A boxplot can be stacked using df
"""# Create a boxplot of the average yearly temperature for each city
df.boxplot(column='Temperature')# Create a boxplot of the average yearly humidity for each city
df.boxplot(column='Humidity')# Create a boxplot of the average yearly rainfall for each city
df.boxplot(column='Rainfall')"""
demonstrates visualization of tabular data in df
"""# Create a scatter plot of the average yearly temperature vs average yearly humidity for each city
df.plot.scatter(x='Temperature', y='Humidity')# Create a scatter plot of the average yearly temperature vs average yearly rainfall for each city
df.plot.scatter(x='Temperature', y='Rainfall')# Create a scatter plot of the average yearly temperature vs average yearly wind for each city
df.plot.scatter(x='Temperature', y='Wind')# Create a scatter plot of the average yearly humidity vs average yearly rainfall for each city
df.plot.scatter(x='Humidity', y='Rainfall')# Create a scatter plot of the average yearly humidity vs average yearly wind for each city
df.plot.scatter(x='Humidity', y='Wind')
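One detail worth flagging in the transcript above: for the prompts about a "stacked" histogram and boxplot, Codex produced plain per-column plots rather than anything stacked. A genuinely stacked histogram over the same df would look more like the sketch below; this is my own illustration, not Codex output, and the seed and bin count are arbitrary.

```python
import random

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the script runs headless
import pandas as pd

random.seed(7)  # arbitrary seed, purely for reproducibility
cities = ['New York', 'Los Angeles', 'Chicago', 'Houston',
          'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego',
          'Dallas', 'San Jose', 'Austin', 'Jacksonville']
df = pd.DataFrame({
    'City': cities,
    'Temperature': [random.randint(50, 100) for _ in cities],
    'Humidity': [random.randint(50, 100) for _ in cities],
    'Rainfall': [random.randint(50, 100) for _ in cities],
    'Wind': [random.randint(50, 100) for _ in cities],
})

# stacked=True piles all four measures into a single histogram per bin,
# instead of drawing four separate figures as the transcript's df.hist() does
ax = df[['Temperature', 'Humidity', 'Rainfall', 'Wind']].plot.hist(
    stacked=True, bins=10)
ax.set_xlabel('Value')
```

This is also a fair illustration of the point about implicit assumptions: when the NLU input is ambiguous, Codex picks a plausible reading, which may not be the one intended.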
Conclusion
Often, technology at its infancy and inception seems rudimentary, awkward and redundant. Invariably discussions ensue on the new tech’s viability and right to existence, comparing it to technologies steeped in history and innumerable iterations.
When thinking in terms of low-code, or even NLU in this case, it is not an all-or-nothing scenario.
Some comments on low-code in general…
The Good:
- Low-code on its own is not a solution to all problems.
- Smaller applications and utilities are well suited for low-code.
- Low-code is good for prototyping, experimenting and wireframes.
- Low-code is well suited as an extension to existing larger implementations, enabling business units to create their own extensions and customizations.
- Examples of good low-code implementations are IBM Watson Assistant Actions, Microsoft Power Virtual Agents, some of the Amazon Alexa Development Console functionality etc.
Impediments:
- Fine tuning is problematic with low-code.
- Scaling and integration.
- Optimization.
- Performance management.
- Invariably you would want to include functions and extensions not available in your authoring environment.
And the same holds true for Codex. Will enterprise systems be built this way? Most probably not. Will Fortune 500 companies go the Codex route in principle? No.
But there are some definite niche applications; these can include:
- Solving coding challenges and problems in certain routines.
- Establishing best practice.
- Quality assurance.
- Interactive learning.
- Generating specific components for subsequent human review.