Noteable ChatGPT Plugin: A New Era of Data Analysis Beyond the Code Interpreter

In the rapidly evolving world of data science, finding tools that simplify complex tasks while making the process engaging and intuitive is a real game-changer. Enter the Noteable ChatGPT plugin, a powerful fusion of the natural language processing capabilities of OpenAI’s ChatGPT with the robustness and versatility of data notebooks on the Noteable.io platform.

Whether you’re a newcomer to data analysis or a seasoned veteran, this innovative combination of artificial intelligence and data analytics is sure to pique your curiosity and open up a world of new possibilities. From facilitating code-free data analysis, to automatically generating insightful visualizations, to enabling seamless real-time cloud collaboration, the Noteable ChatGPT plugin is setting new standards in the field of data analysis.

In this article, I don’t just tell you about the incredible features of the Noteable ChatGPT plugin—I show you.

Through real-world demonstrations, I’ll explore the advanced data analysis capabilities of the plugin, revealing how it can transform your data analysis workflows and deliver results in a remarkably intuitive and efficient manner. Join me on this journey and discover how the Noteable ChatGPT plugin can revolutionize your data analysis experience.

What Is the Noteable ChatGPT Plugin?

The Noteable ChatGPT plugin is a third-party plugin developed by Noteable.io, a collaborative data notebook platform. It seamlessly integrates ChatGPT’s natural language processing capabilities with data notebooks, a core feature of the Noteable.io platform.

Similar to Jupyter Notebook, Noteable’s data notebook is a web application that allows users to create and share documents containing live code, equations, visualizations, and text annotations. It helps with tasks like data cleaning, data analysis, numerical simulation, statistical modeling, data visualization, and machine learning.

The Noteable ChatGPT plugin lets users chat with ChatGPT to perform tasks such as loading datasets, doing exploratory data analysis, creating visuals, running machine learning models, and more, all within a shareable notebook environment.

This integration simplifies data analysis and makes it accessible. Even without coding skills, you can do professional data analysis using the plugin’s intuitive dialogue interface.

Key Features of the Noteable ChatGPT Plugin

The plugin provides several features:

  1. Dialogue-guided Data Analysis: Guide your data analysis process without coding. Just describe your goals and the techniques you want to use, and ChatGPT will automatically create data notebooks and interpret the analysis results for you.
  2. Data Visualization: Visual charts can be generated during data analysis through ChatGPT dialogue. You can also request specific chart types.
  3. Cloud Collaboration: Notebooks created through ChatGPT are stored on the Noteable cloud platform. Multiple people can access and edit the same notebook at the same time, and notebooks persist even after you delete the ChatGPT conversation. Team members can add cells, edit code, add comments, and see everyone’s contributions. The platform also supports real-time chat and commenting, facilitating discussion during data analysis.
  4. Sharing and Exporting: Notebooks created by ChatGPT can be exported for online analysis or integrated into other projects.
  5. Version Control: Since results are stored in the cells of the data notebook created by Noteable through ChatGPT, you can review the history of your data analysis. You can also recover previous versions when needed.
  6. Connecting to Everything: Noteable can connect to different data sources, such as MySQL, MariaDB, PostgreSQL, BigQuery, Snowflake, Databricks, and external APIs.

Getting Started with the Noteable ChatGPT Plugin

Installing the Noteable Plugin

To use the Noteable ChatGPT plugin, you need an active ChatGPT Plus account. Make sure you’re logged in and check if the ChatGPT plugin function is active.

chatgpt user interface
open chatgpt setting
chatgpt plugin setting beta features

Next, go to the ChatGPT plugin store and look for the Noteable plugin.

search chatgpt plugin noteable

Click “Install” to add it to your list of plugins.

Connecting Noteable to ChatGPT

After installing the plugin, a login page for Noteable will pop up. You need to connect your Noteable account to ChatGPT.

noteable sign up page

If you don’t see the login page, visit Noteable.io to create a free account.

noteable sign up page

After you have a Noteable account, ChatGPT will activate the Noteable plugin. You’ll see the Noteable logo below the GPT version selection button.

noteable chatgpt plugin selected

If it’s not selected, choose it from the plugin list.

Creating a Noteable Project

Noteable projects help you organize your work in the Noteable cloud space. A project holds Jupyter notebooks, folders, and files.

I’ll use a Kaggle dataset of over 8,000 Netflix movies and TV shows to demonstrate how to use the Noteable ChatGPT plugin.

To create a project, enter the following prompt:

Create a new project called “Netflix Movies and TV Shows” and set it as my default project.

create a new noteable project with chatgpt

After ChatGPT confirmed that the project was created, I could find it in the Noteable.io cloud space.

noteable project page

I could also create projects directly in Noteable’s cloud space.

create a new noteable project in noteable

Additionally, setting a default project ensures new notebooks go to the default project, keeping your Noteable account organized.

Creating a Data Notebook

Start data analysis by creating a Noteable notebook.

A Noteable data notebook is a Jupyter-style notebook hosted in the Noteable cloud. It lets you add code and explanatory text, run and test code interactively, and display visual diagrams.

With the Noteable ChatGPT plugin, users unfamiliar with coding can create and edit notebooks through dialogue. To create a notebook, use the following prompt:

Create a new notebook called “Test” and give me the notebook link.

create a new notebook in noteable

Open the provided notebook link to access the notebook in the Noteable cloud space.

Analyzing the Dataset

Importing Data from CSV, Excel, and Databases

Noteable makes it easy to import data from different sources. You can upload CSV files, Excel spreadsheets, and connect to databases like Postgres and MySQL.

To import data in the Noteable cloud space, follow these steps:

Open the notebook link provided by ChatGPT to access the notebook interface.

Click the plus sign in the upper right corner of the interface and select “Add files” from the drop-down menu.

noteable notebook add files

Drag the CSV or Excel file into the dotted box or click “Upload from computer” to browse and upload the file. You can upload up to 25 files at a time, each up to 100MB.

noteable notebook import files
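Once the file is uploaded, the notebook can load it with pandas. The sketch below shows a minimal version of the kind of loading cell the plugin generates (the file name comes from the Kaggle dataset used in this article; the exact generated code may differ):

```python
# Minimal sketch of a data-loading cell, assuming the uploaded file is
# "netflix_titles.csv" in the notebook's working directory.
import pandas as pd

def load_titles(path) -> pd.DataFrame:
    """Load the CSV and print a quick preview, as the plugin's first cell does."""
    df = pd.read_csv(path)
    print(df.head())   # first five rows
    print(df.shape)    # (rows, columns)
    return df
```

Calling `load_titles("netflix_titles.csv")` then gives you the DataFrame the rest of the analysis works from.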

Performing Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is like studying a map before visiting a place. It helps us understand the dataset’s overall characteristics.

Before starting analysis, it’s common to clean and preprocess the data, addressing issues like wrong values, duplicates, missing values, and outliers. Not handling these issues can affect the accuracy of later analysis.

ChatGPT can assist with EDA by:

  • Generating summary statistics for columns.
  • Checking for missing values.
  • Identifying unusual data points.
  • Plotting a column’s distribution.
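In pandas terms, the first three of these checks are one-liners. Here is a minimal sketch (the `df` argument stands in for the loaded dataset; this is illustrative, not the plugin's exact generated code):

```python
# Basic EDA checks the plugin runs behind the scenes, sketched as one function.
import pandas as pd

def quick_eda(df: pd.DataFrame) -> dict:
    """Return summary statistics, missing-value counts, dtypes, and duplicates."""
    return {
        "summary": df.describe(include="all"),    # summary statistics per column
        "missing": df.isnull().sum(),             # missing values per column
        "dtypes": df.dtypes,                      # column data types
        "duplicates": int(df.duplicated().sum()), # count of duplicate rows
    }
```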

Before starting EDA, let’s see if the Noteable ChatGPT plugin can help us perform these steps. The dataset we’ll use is netflix_titles.csv.

To explore the dataset, I gave the following prompt to ChatGPT:

As a professional data analyst, please follow the standard data analysis procedure. Let’s analyze the data in the netflix_titles.csv step by step.

Here’s the response from ChatGPT:

I have created a code cell to load the data from the netflix_titles.csv file and display the first few rows. However, it seems there was an issue starting the kernel. Let’s try running the cell again.

You can follow along with the cell work at this link.

The data has been successfully loaded. Here are the first few rows of the dataset:

| index | show_id | type | title | director | cast | country | date_added | release_year | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | | United States | September 25, 2021 | 2020 | PG-13 |
| 1 | s2 | TV Show | Blood & Water | | Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schul | South Africa | September 24, 2021 | 2021 | TV-MA |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera | | September 24, 2021 | 2021 | TV-MA |
| 3 | s4 | TV Show | Jailbirds New Orleans | | | | September 24, 2021 | 2021 | TV-MA |
| 4 | s5 | TV Show | Kota Factory | | Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar | India | September 24, 2021 | 2021 | TV-MA |

The columns in the dataset are: index, show_id, type, title, director, cast, country, date_added, release_year, rating, duration, listed_in, description.

The next step in the data analysis process is usually to perform some exploratory data analysis (EDA) to understand the data better. Would you like to proceed with that?

After analyzing the data, ChatGPT automatically generated code to display the dataset’s first few rows. This gives a quick overview of the dataset’s structure.

Reviewing the notebook in Noteable’s cloud space, I found the Python code for data analysis.

noteable notebook load dataset

ChatGPT then asked if I wanted to perform exploratory data analysis, to which I agreed. Below is another response from ChatGPT:

Here are some initial findings from the exploratory data analysis:

  1. The dataset contains 8807 rows and 12 columns.
  2. The columns and their respective data types are as follows:

| Column | Data Type |
| --- | --- |
| show_id | object |
| type | object |
| title | object |
| director | object |
| cast | object |
| country | object |
| date_added | object |
| release_year | int64 |
| rating | object |
| duration | object |

  3. There are missing values in the dataset. Here is the count of missing values for each column:

| Column | Missing Values |
| --- | --- |
| show_id | 0 |
| type | 0 |
| title | 0 |
| director | 2634 |
| cast | 825 |
| country | 831 |
| date_added | 10 |
| release_year | 0 |
| rating | 4 |
| duration | 3 |

The next step would typically be data cleaning, where we handle these missing values. Would you like to proceed with that?

ChatGPT proceeded to output the data type of each column and the number of missing values. It suggested handling the missing values, and I allowed it to continue.

ChatGPT then helped me replace missing values with “Unknown”. For columns with only a few missing values, I could instead have dropped the affected rows, but for simplicity I let the process continue unchanged.
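Both options boil down to standard pandas calls. A minimal sketch of the two approaches (illustrative, not the notebook's exact generated code; `df` stands in for the loaded dataset):

```python
# Two ways to handle missing values, as discussed above. ChatGPT took the
# first approach; the second suits columns with only a handful of gaps
# (e.g. date_added, rating, duration).
import pandas as pd

def fill_unknown(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Replace missing values in the given columns with 'Unknown'."""
    out = df.copy()
    out[cols] = out[cols].fillna("Unknown")
    return out

def drop_sparse_missing(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Drop the few rows that are missing values in the given columns."""
    return df.dropna(subset=cols)
```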

chatgpt noteable descriptive statistics for dataset

I asked ChatGPT to continue and here’s the response:

Here is a count plot of the ‘type’ column, showing the number of Movies vs. TV Shows in the dataset:

Count of Movies vs. TV Shows

As we can see, there are more movies than TV shows in the dataset.

Next, let’s visualize the distribution of the ‘release_year’ column. Would you like to proceed with that?

I asked ChatGPT to continue and here’s the response:

Here is a histogram of the ‘release_year’ column, showing the distribution of release years for the movies and TV shows in the dataset:

distribution of released year

As we can see, the number of releases has been increasing over the years, with the most releases happening in recent years.

Next, we could analyze the ‘rating’ column to see the most common ratings for the movies and TV shows. Would you like to proceed with that?

I asked ChatGPT to continue and here’s the response:

Here is a count plot of the ‘rating’ column, showing the number of movies and TV shows for each rating:

count of ratings

As we can see, the most common rating is ‘TV-MA’, followed by ‘TV-14’ and ‘TV-PG’.

Next, we could analyze the ‘country’ column to see the countries with the most content. Would you like to proceed with that?

I asked ChatGPT to continue and here’s the response:

Here is a count plot of the ‘country’ column, showing the top 10 countries with the most content:

Top 10 Countries with the Most Content

As we can see, the United States has the most content, followed by India and the United Kingdom.

This concludes the initial exploratory data analysis. Based on these insights, we can formulate more specific questions and perform further analysis. Do you have any specific questions or analyses you would like to perform on this dataset?

As you can see, ChatGPT performed more analysis and created data insights. It automatically generated charts based on the analysis objectives and presented conclusions, even though we didn’t explicitly ask for them. By default, it creates histograms if the data allows, but you can request different chart types.
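The charts above correspond to only a few lines of plotting code. Here is a sketch of the kind of cells the plugin writes, assuming pandas and matplotlib are installed and `df` holds the dataset (column names follow the article; not the notebook's exact code):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend; renders to file, no display needed
import matplotlib.pyplot as plt
import pandas as pd

def plot_type_counts(df: pd.DataFrame, out_path: str = "type_counts.png"):
    """Bar chart of Movies vs. TV Shows, like the plugin's count plot."""
    counts = df["type"].value_counts()
    counts.plot(kind="bar", title="Count of Movies vs. TV Shows")
    plt.tight_layout()
    plt.savefig(out_path)
    plt.close()
    return counts

def plot_release_years(df: pd.DataFrame, out_path: str = "release_years.png"):
    """Histogram of release_year, like the plugin's distribution plot."""
    df["release_year"].plot(kind="hist", bins=30, title="Distribution of Release Year")
    plt.tight_layout()
    plt.savefig(out_path)
    plt.close()
```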

For further exploration, I gave more prompts to ChatGPT, letting it generate more data insights.

My Prompt:

Analyze duration of movies.

ChatGPT’s response:

chatgpt noteable descriptive statistics duration of movies

I asked ChatGPT to continue and received this:

Here is a histogram of the ‘duration’ column, showing the distribution of movie durations:

Distribution of Movie Duration

As we can see, the most common movie durations are between 80 and 120 minutes.

This concludes the analysis of movie durations. Do you have any other specific questions or analyses you would like to perform on this dataset?

I’ll stop the EDA here for this demonstration. To explore the Noteable ChatGPT plugin’s capabilities further, let’s continue with more advanced data analysis.

Diving Deeper into Data Analysis

Let’s delve deeper into the advanced data analysis capabilities of the Noteable plugin, specifically looking at natural language processing (NLP) and machine learning modeling.

Natural Language Processing (NLP)

NLP is a field of study that enables computers to understand and interact with human language, similar to how we communicate with each other. It allows computers to process, analyze, and gain insights from large amounts of textual data.

For instance, consider the following task:

Analyze the most frequent words in titles and generate a word cloud.

After I assigned this task, ChatGPT generated a word cloud:

Here is a word cloud of the most frequent words in the titles:

word cloud

The size of the words in the word cloud indicates their frequency. As we can see, some of the most frequent words in the titles are ‘Love’, ‘Man’, ‘Life’, ‘World’, ‘Story’, etc.

This concludes the analysis of the most frequent words in titles. Do you have any other specific questions or analyses you would like to perform on this dataset?

Looking at the code produced in the Noteable notebook helps us understand how ChatGPT and Noteable work together.

code for noteable to generate word cloud

I asked ChatGPT to create a word cloud, and it automatically installed the wordcloud library in Python and imported the stop words from the library. I could even specify extra words to be omitted, such as ‘little’, if I wanted.
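The heart of that generated code is a word-frequency count with stop words removed. Here is a dependency-free sketch of that step (the actual notebook used the wordcloud library, whose `WordCloud().generate_from_frequencies(freqs)` would render these counts; the small stop-word set below stands in for the library's `STOPWORDS`):

```python
# Frequency analysis behind the word cloud, with support for extra omitted
# words like 'little', as mentioned above.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "my"}

def title_word_frequencies(titles, extra_stopwords=()):
    """Count words across titles, skipping stop words (case-insensitive)."""
    stop = STOPWORDS | set(extra_stopwords)
    words = []
    for title in titles:
        words += [w for w in re.findall(r"[a-z']+", title.lower()) if w not in stop]
    return Counter(words)
```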

Machine Learning Modeling

Machine learning modeling in data analytics involves teaching computer algorithms to learn patterns and make predictions or decisions based on input data. These models can generalize and make accurate predictions on new, unseen data.

Let’s explore what models the Noteable ChatGPT plugin can construct:

What machine learning models can you build for this dataset?

ChatGPT’s response was:

ask chatgpt noteable what machine learning model to build

I gave ChatGPT the following prompt:

Utilize the text data (like title and description) and categorical data (like director, cast, country, and rating) to build a content-based recommendation system.

ChatGPT’s reply was as follows:

Building a content-based recommendation system involves several steps:

  1. Preprocessing: This involves cleaning the text data and converting it into a format that can be used to calculate similarity. This often involves techniques like tokenization, stemming/lemmatization, and removal of stop words.
  2. Feature Extraction: This involves converting the text data into numerical features. This can be done using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or word embeddings like Word2Vec or BERT.
  3. Similarity Calculation: Once we have numerical features, we can calculate the similarity between different movies/TV shows. This can be done using measures like cosine similarity or Jaccard similarity.
  4. Recommendation: Once we have a measure of similarity between different movies/TV shows, we can recommend similar movies/TV shows to a given movie/TV show.

Let’s start by preprocessing the text data and extracting features using TF-IDF. We’ll focus on the ‘title’, ‘description’, ‘director’, ‘cast’, ‘country’, and ‘rating’ columns.

I have preprocessed the text data and calculated the cosine similarity matrix. The next step is to create a function that takes a movie or TV show title as input and outputs a list of similar movies or TV shows. Let’s do that.

I have created a function called get_recommendations that takes a movie or TV show title as input and outputs a list of the top 10 most similar movies or TV shows based on their content.

You can now input a title and get recommendations. For example, if you want recommendations based on the movie “Inception”, you would call the function like this: get_recommendations('Inception').

Would you like to try it with a specific title?
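The four steps ChatGPT described can be sketched compactly with scikit-learn. This is an illustrative version under the assumption that `df` holds the dataset with the columns named in the article; it is not the notebook's exact generated code:

```python
# Content-based recommender sketch: TF-IDF over combined text fields,
# cosine similarity, top-N lookup by title.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_recommender(df: pd.DataFrame, text_cols=("title", "description")):
    """Return a get_recommendations(title, n) function over df."""
    # Combine the chosen text columns into one document per row.
    corpus = df[list(text_cols)].fillna("").agg(" ".join, axis=1)
    tfidf = TfidfVectorizer(stop_words="english")
    sim = cosine_similarity(tfidf.fit_transform(corpus))
    index = pd.Series(df.index, index=df["title"])

    def get_recommendations(title: str, n: int = 10):
        i = index[title]
        scores = sim[i].copy()
        scores[i] = -1.0                      # exclude the title itself
        ranked = scores.argsort()[::-1][:n]   # top-n most similar rows
        return df["title"].iloc[ranked].tolist()

    return get_recommendations
```

As the article notes, TF-IDF plus cosine similarity measures word overlap, not semantic similarity, which explains some of the eclectic recommendations.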

I asked ChatGPT to call the recommendation function to see how the model would work:

Call the function: get_recommendations('Inception')

As a result, ChatGPT suggested 10 movies and TV shows.

Here are the top 10 movies and TV shows that are most similar to ‘Inception’ based on their content:

  1. Peaky Blinders
  2. Before the Flood
  3. Brick
  4. Catch Me If You Can
  5. Mars Attacks!
  6. Dead in a Week (Or Your Money Back)
  7. Sniper: Legacy
  8. Big Dreams, Small Spaces
  9. Lincoln
  10. Tom Segura: Ball Hog

Please note that the recommendations are based on the content of the movies and TV shows, including their title, description, director, cast, country, and rating. The similarity is calculated using TF-IDF and cosine similarity, which may not always reflect semantic similarity.

Would you like to try it with another title, or is there anything else you would like to do?

I then asked ChatGPT to construct another model:

Predict a movie’s rating based on features such as director, actor, country, release year, and genre.

ChatGPT’s response was:

I have created a new notebook for the rating prediction model. You can follow along with my work here.

Let’s start by loading the data and performing some initial preprocessing.

My apologies for the confusion. It seems that the ‘netflix_titles.csv’ file is not available in the current project. We need to upload the file to the new project. You can do this by going to the project page and clicking on the ‘Upload’ button.

Once the file is uploaded, we can proceed with the analysis. Please let me know when you have uploaded the file.

The Noteable plugin created a new notebook for this task. As the dataset was absent in the new notebook’s environment, I re-uploaded it and let the process continue.

I have loaded the data, selected the features, encoded the categorical features, and split the data into training and test sets.

Next, let’s train a simple logistic regression model as a baseline.

The logistic regression model has been trained and evaluated. Here’s the classification report:

              precision    recall  f1-score   support

      66 min       0.00      0.00      0.00         1
           G       0.00      0.00      0.00        12
          NR       0.00      0.00      0.00        16
          PG       0.00      0.00      0.00        62
       PG-13       0.00      0.00      0.00        87
           R       0.28      0.09      0.14       163
       TV-14       0.51      0.05      0.08       414
        TV-G       0.00      0.00      0.00        43
       TV-MA       0.39      0.98      0.56       662
       TV-PG       0.00      0.00      0.00       185
        TV-Y       0.00      0.00      0.00        52
       TV-Y7       0.00      0.00      0.00        65

    accuracy                           0.39      1762
   macro avg       0.10      0.09      0.07      1762
weighted avg       0.29      0.39      0.24      1762

The model’s overall accuracy is 0.39, which is not very high. This is likely due to the complexity of the task and the limitations of the logistic regression model. More complex models or additional feature engineering might be needed to improve the performance.

Would you like to try a different model, or is there anything else you would like to do?
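The pipeline ChatGPT described (encode categorical features, split, train, evaluate) can be sketched as follows. This is an assumption-laden illustration, not the notebook's exact code: column names follow the article, scikit-learn is assumed installed, and swapping the estimator for `RandomForestClassifier()` gives the improved model tried later:

```python
# Baseline rating-prediction pipeline: label-encode categoricals, split,
# train logistic regression, report test accuracy.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_rating_baseline(df: pd.DataFrame,
                          features=("director", "country", "release_year"),
                          target="rating"):
    """Train a logistic-regression baseline and return (model, accuracy)."""
    X = df[list(features)].copy()
    for col in X.select_dtypes(include="object"):
        X[col] = X[col].astype("category").cat.codes  # simple label encoding
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))
```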

I asked ChatGPT to suggest some models:

Can you recommend some models?

It provided the following response:

Absolutely! Here are some machine learning models that could potentially improve the performance on this task:

  1. Random Forest: This is an ensemble learning method that operates by constructing multiple decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random forests correct for decision trees’ habit of overfitting to their training set.
  2. Gradient Boosting: This is another ensemble machine learning algorithm that constructs new predictors that aim to correct the residual errors of the prior predictor, hence improving the model incrementally. XGBoost and LightGBM are popular gradient boosting libraries that are known for their performance and speed.
  3. Support Vector Machines (SVM): SVMs are powerful for certain types of datasets, especially if the data is not too large and if there’s a clear margin of separation.
  4. Neural Networks: If the dataset is large and complex, neural networks might be a good choice. They can capture complex patterns and interactions between features, but they require more computational resources and are harder to interpret.

Remember, the choice of model often depends on the specific characteristics of the dataset and the problem at hand. It’s often a good idea to try multiple models and see which one works best for your specific use case.

Would you like to proceed with one of these models, or is there something else you would like to do?

I then requested ChatGPT to create a random forest model:

Let’s go with Random Forest.

Here’s ChatGPT’s response:

The Random Forest model has been trained and evaluated. Here’s the classification report:

              precision    recall  f1-score   support

      66 min       0.00      0.00      0.00         1
      84 min       0.00      0.00      0.00         0
           G       0.33      0.17      0.22        12
          NR       0.00      0.00      0.00        16
          PG       0.47      0.44      0.45        62
       PG-13       0.43      0.40      0.42        87
           R       0.50      0.42      0.46       163
       TV-14       0.45      0.49      0.47       414
        TV-G       0.25      0.05      0.08        43
       TV-MA       0.59      0.70      0.64       662
       TV-PG       0.23      0.13      0.17       185
        TV-Y       0.43      0.46      0.44        52
       TV-Y7       0.56      0.62      0.58        65
     Unknown       0.00      0.00      0.00         0

    accuracy                           0.50      1762
   macro avg       0.30      0.28      0.28      1762
weighted avg       0.48      0.50      0.49      1762

The model’s overall accuracy is 0.50, which is an improvement over the logistic regression model. However, the performance on some classes is still quite low. This could be due to a variety of reasons, such as class imbalance, lack of informative features, or the model not being complex enough to capture the patterns in the data.

Would you like to try a different model, or is there anything else you would like to do?

I also requested an attempt at the Support Vector Machines (SVM) model, and here’s the response:

The Support Vector Machine (SVM) model is currently being trained. SVMs are more computationally intensive than some other models, so it may take a while for the training to complete. You can check the progress of the model training in your notebook here. I will notify you as soon as the training is complete and the results are available.

Click the notebook link provided by ChatGPT to observe the training process.

noteable notebook vector machine model training

The red box in the image above shows the training time: about 2.5 hours. If you want to reduce training time, you can join the waitlist for access to better hardware, such as GPUs.

Web Scraping and API Requests

Since Noteable’s data notebooks run on internet-connected servers, you can scrape websites and make API requests to extract external data. This significantly extends the capabilities of ChatGPT.

Noteable can download and install the necessary Python libraries for various tasks, not limited to data analysis.

For instance, I asked ChatGPT to help me generate a word cloud from a Wikipedia page about machine learning:

Generate a word cloud for this web page: https://en.wikipedia.org/wiki/Machine_learning

Let’s examine the results of combining Noteable with ChatGPT:

To generate a word cloud from a webpage, we’ll need to follow these steps:

  1. Fetch the webpage content.
  2. Parse the HTML to extract the text.
  3. Clean the text (remove stopwords, punctuation, etc.).
  4. Generate the word cloud.

For this task, we’ll need to use several Python libraries: requests to fetch the webpage, BeautifulSoup to parse the HTML, nltk to clean the text, and wordcloud to generate the word cloud.

Let’s start by installing these libraries and fetching the webpage content.

I have successfully fetched the webpage content. Now, let’s parse the HTML to extract the text. We’ll use BeautifulSoup for this.

The HTML has been parsed and the text has been extracted. Now, let’s clean the text by removing stopwords and punctuation. We’ll use the Natural Language Toolkit (NLTK) for this.

The text has been cleaned by removing stopwords and punctuation. Now, let’s generate the word cloud. We’ll use the WordCloud function from the wordcloud library for this.

Here is the word cloud generated from the text on the Machine Learning Wikipedia page:

word cloud of machine learning

The size of each word indicates its frequency in the text. Larger words were mentioned more frequently.

The example above is just a small use case, but you can also store the links of webpages to be crawled in a table or a database. You can then interact with ChatGPT to schedule tasks, allowing batch data crawling from webpages and saving the results to another table or database. Noteable also has a scheduling feature that lets users set specific times to run notebooks, whether hourly, daily, weekly, or monthly.
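The middle of that pipeline (parse the HTML, clean the text, count word frequencies) can be sketched with the standard library alone. The notebook itself used requests, BeautifulSoup, nltk, and wordcloud; the version below is an offline-runnable illustration of the same idea, where fetching the page would be something like `urllib.request.urlopen(url).read().decode()`:

```python
# Stdlib-only sketch: extract visible text from HTML, clean it, and count
# word frequencies (the input a word cloud would render).
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def page_word_frequencies(html, stopwords=frozenset({"the", "a", "of", "and", "is", "to", "in", "from"})):
    """Parse HTML, drop stop words, and return word counts."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks).lower()
    words = [w for w in re.findall(r"[a-z]+", text) if w not in stopwords]
    return Counter(words)
```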

Collaborating and Sharing Results

Real-Time Collaboration in Noteable

Noteable boasts potent real-time collaboration capabilities that enhance teamwork when using notebooks generated by ChatGPT:

  • Simultaneous editing: Multiple collaborators can open the same notebook at once, and each collaborator’s changes are displayed in real time. Users can add, edit, and reorder cells, while others can watch the changes as they happen.
  • Cell Locking: Users can lock specific cells they’re working on to prevent others from accidentally overwriting them.
  • Integrated Chat: Noteable includes an embedded chat feature where collaborators can discuss specific cells and refer to them in conversations.
  • @mentions: You can use “@” to mention specific users in the chat, notifying them or drawing their attention to relevant content.

These collaborative features aid teamwork in data analysis workflows. Team members can collaborate on ChatGPT-generated data notebooks, making enhancements and iterations in real time. The history and conversations are automatically recorded, enabling effective collaboration across the organization.

Exporting and Sharing Notebooks

Finished notebooks can be shared and exported in various ways:

  • Share notebooks with specific Noteable users or set access for users in spaces or projects.
  • Download the notebook as a Jupyter (.ipynb) file.
  • Export notebooks to formats like HTML, PNG, PDF, MD, RST, and PY.

Summary

In conclusion, the Noteable ChatGPT plugin revolutionizes the way we handle data analysis.

It merges the sophistication of natural language processing with the power of data notebooks, delivering an intuitive, accessible, and collaborative data analysis platform.

Embrace the future of data analysis with the Noteable ChatGPT plugin.
