Noteable ChatGPT Plugin: A New Era of Data Analysis Beyond the Code Interpreter
In the rapidly evolving world of data science, finding tools that simplify complex tasks while making the process engaging and intuitive is a real game-changer. Enter the Noteable ChatGPT plugin, a powerful fusion of the natural language capabilities of OpenAI's ChatGPT with the robustness and versatility of data notebooks on the Noteable.io platform.
Whether you’re a newcomer to data analysis or a seasoned veteran, this innovative combination of artificial intelligence and data analytics is sure to pique your curiosity and open up a world of new possibilities. From facilitating code-free data analysis, to automatically generating insightful visualizations, to enabling seamless real-time cloud collaboration, the Noteable ChatGPT plugin is setting new standards in the field of data analysis.
In this article, I don’t just tell you about the incredible features of the Noteable ChatGPT plugin—I show you.
Through real-world demonstrations, I’ll explore the advanced data analysis capabilities of the plugin, revealing how it can transform your data analysis workflows and deliver results in a remarkably intuitive and efficient manner. Join me on this journey and discover how the Noteable ChatGPT plugin can revolutionize your data analysis experience.
What Is the Noteable ChatGPT Plugin
The Noteable ChatGPT plugin is a third-party plugin developed by Noteable.io, a collaborative data notebook platform. It seamlessly integrates ChatGPT's natural language processing capabilities with data notebooks, a core feature of the Noteable.io platform.
Similar to Jupyter Notebook, Noteable’s data notebook is a web application that allows users to create and share documents containing live code, equations, visualizations, and text annotations. It helps with tasks like data cleaning, data analysis, numerical simulation, statistical modeling, data visualization, and machine learning.
The Noteable ChatGPT plugin lets users chat with ChatGPT to perform tasks such as loading datasets, doing exploratory data analysis, creating visuals, running machine learning models, and more, all within a shareable notebook environment.
This integration simplifies data analysis and makes it accessible. Even without coding skills, you can do professional data analysis using the plugin’s intuitive dialogue interface.
Key Features of the Noteable ChatGPT Plugin
The plugin provides several features:
- Dialogue-guided Data Analysis: Guide your data analysis process without coding. Just describe your goals and the techniques you want to use, and ChatGPT will automatically create data notebooks and interpret the analysis results for you.
- Data Visualization: Visual charts can be generated during data analysis through ChatGPT dialogue. You can also request specific chart types.
- Cloud Collaboration: Notebooks created through ChatGPT are stored on the Noteable cloud platform. Multiple people can access and edit the same notebook at the same time, and notebooks persist even after you delete the ChatGPT conversation. Team members can add cells, edit code, add comments, and see everyone's contributions. The platform also supports real-time chat and commenting, facilitating discussion during data analysis.
- Sharing and Exporting: Notebooks created by ChatGPT can be exported for online analysis or integrated into other projects.
- Version Control: Since results are stored in the cells of the data notebook created by Noteable through ChatGPT, you can review the history of your data analysis. You can also recover previous versions when needed.
- Connecting to Everything: Noteable can connect to different data sources, such as MySQL, MariaDB, PostgreSQL, BigQuery, Snowflake, Databricks, and external APIs.
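Noteable manages these connections through its own data-connections feature, but to make the idea concrete, here is a generic sketch (not Noteable-specific; the connection URL and table name are placeholders) of what querying a database from a notebook cell with pandas and SQLAlchemy can look like:

```python
# Generic sketch: querying a PostgreSQL database from a notebook cell.
# The connection URL and table name below are placeholders, not real credentials.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host:5432/mydb")

# Pull a sample of rows into a DataFrame for analysis in the notebook.
df = pd.read_sql("SELECT * FROM some_table LIMIT 100", engine)
df.head()
```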
Getting Started with the Noteable ChatGPT Plugin
Installing the Noteable Plugin
To use the Noteable ChatGPT plugin, you need an active ChatGPT Plus account. Make sure you’re logged in and check if the ChatGPT plugin function is active.
Next, go to the ChatGPT plugin store and look for the Noteable plugin.
Click “Install” to add it to your list of plugins.
Connecting Noteable to ChatGPT
After installing the plugin, a login page for Noteable will pop up. You need to connect your Noteable account to ChatGPT.
If you don’t see the login page, visit Noteable.io to create a free account.
Once your Noteable account is connected, ChatGPT will activate the Noteable plugin. You’ll see the Noteable logo below the GPT version selection button.
If it’s not selected, choose it from the plugin list.
Creating a Noteable Project
Noteable projects help you organize your work in the Noteable cloud space. A project holds Jupyter notebooks, folders, and files.
I’ll use a Kaggle dataset of over 8,000 Netflix movies and TV shows to demonstrate how to use the Noteable ChatGPT plugin.
To create a project, enter the following prompt:
Create a new project called “Netflix Movies and TV Shows” and set it as my default project.
After ChatGPT confirmed that the project was created, I could find it in the Noteable.io cloud space.
I could also create projects directly in Noteable’s cloud space.
Additionally, setting a default project ensures new notebooks go to the default project, keeping your Noteable account organized.
Creating a Data Notebook
Start data analysis by creating a Noteable notebook.
A Noteable data notebook is a Jupyter-style notebook hosted in the Noteable cloud. It lets you add code and explanatory text, run and test code interactively, and display visualizations.
With the Noteable ChatGPT plugin, users unfamiliar with coding can create and edit notebooks through dialogue. To create a notebook, use the following prompt:
Create a new notebook called “Test” and give me the notebook link.
Open the provided notebook link to access the notebook in the Noteable cloud space.
Analyzing the Dataset
Importing Data from CSV, Excel, and Databases
Noteable makes it easy to import data from different sources. You can upload CSV files and Excel spreadsheets, and connect to databases like Postgres and MySQL.
To import data in the Noteable cloud space, follow these steps:
Open the notebook link provided by ChatGPT to access the notebook interface.
Click the plus sign in the upper right corner of the interface and select “Add files” from the drop-down menu.
Drag the CSV or Excel file into the dotted box or click “Upload from computer” to browse and upload the file. You can upload up to 25 files at a time, each up to 100MB.
Performing Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is like studying a map before visiting a place. It helps us understand the dataset’s overall characteristics.
Before starting analysis, it’s common to clean and preprocess the data, addressing issues like wrong values, duplicates, missing values, and outliers. Not handling these issues can affect the accuracy of later analysis.
ChatGPT can assist with EDA by:
- Generating summary statistics for columns.
- Checking for missing values.
- Identifying unusual data points.
- Plotting a column’s distribution.
Before starting the EDA, let’s see whether the Noteable ChatGPT plugin can help us perform these steps. The dataset we’ll use is netflix_titles.csv.
To explore the dataset, I gave the following prompt to ChatGPT:
As a professional data analyst, please follow the standard data analysis procedure. Let’s analyze the data in the netflix_titles.csv step by step.
Here’s the response from ChatGPT:
I have created a code cell to load the data from the netflix_titles.csv file and display the first few rows. However, it seems there was an issue starting the kernel. Let’s try running the cell again. You can follow along with the cell work at this link.
The data has been successfully loaded. Here are the first few rows of the dataset:
| index | show_id | type | title | director | cast | country | date_added | release_year | rating |
|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | | United States | September 25, 2021 | 2020 | PG-13 |
| 1 | s2 | TV Show | Blood & Water | | Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schul | South Africa | September 24, 2021 | 2021 | TV-MA |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera | | September 24, 2021 | 2021 | TV-MA |
| 3 | s4 | TV Show | Jailbirds New Orleans | | | | September 24, 2021 | 2021 | TV-MA |
| 4 | s5 | TV Show | Kota Factory | | Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar | India | September 24, 2021 | 2021 | TV-MA |

The columns in the dataset are: index, show_id, type, title, director, cast, country, date_added, release_year, rating, duration, listed_in, description.
The next step in the data analysis process is usually to perform some exploratory data analysis (EDA) to understand the data better. Would you like to proceed with that?
After analyzing the data, ChatGPT automatically generated code to display the dataset’s first few rows. This gives a quick overview of the dataset’s structure.
Reviewing the notebook in Noteable’s cloud space, I found the Python code for data analysis.
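The generated cell is typically just a couple of lines of pandas, along these lines (a minimal sketch; the exact code Noteable writes may differ):

```python
import pandas as pd

# Load the uploaded CSV and preview the first few rows.
df = pd.read_csv("netflix_titles.csv")
df.head()
```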
ChatGPT then asked if I wanted to perform exploratory data analysis, to which I agreed. Below is another response from ChatGPT:
Here are some initial findings from the exploratory data analysis:
- The dataset contains 8807 rows and 12 columns.
- The columns and their respective data types are as follows:
| Column | Data Type |
|---|---|
| show_id | object |
| type | object |
| title | object |
| director | object |
| cast | object |
| country | object |
| date_added | object |
| release_year | int64 |
| rating | object |
| duration | object |
- There are missing values in the dataset. Here is the count of missing values for each column:
| Column | Missing Values |
|---|---|
| show_id | 0 |
| type | 0 |
| title | 0 |
| director | 2634 |
| cast | 825 |
| country | 831 |
| date_added | 10 |
| release_year | 0 |
| rating | 4 |
| duration | 3 |

The next step would typically be data cleaning, where we handle these missing values. Would you like to proceed with that?
ChatGPT proceeded to output the data type of each column and the number of missing values. It suggested handling the missing values, and I allowed it to continue.
ChatGPT then helped me replace the missing values with “Unknown”. For columns with only a handful of missing values, I could instead have dropped the affected rows, but to keep things simple I let the process continue unchanged.
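For reference, both options are one-liners in pandas. A minimal sketch, assuming the DataFrame is named df:

```python
# Count missing values per column, as reported above.
print(df.isnull().sum())

# Option used here: fill every missing value with "Unknown".
df = df.fillna("Unknown")

# Alternative: drop rows only where the sparsely missing columns are empty,
# e.g. date_added (10), rating (4), and duration (3).
# df = df.dropna(subset=["date_added", "rating", "duration"])
```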
I asked ChatGPT to continue and here’s the response:
Here is a count plot of the ‘type’ column, showing the number of Movies vs. TV Shows in the dataset:
As we can see, there are more movies than TV shows in the dataset.
Next, let’s visualize the distribution of the ‘release_year’ column. Would you like to proceed with that?
I asked ChatGPT to continue and here’s the response:
Here is a histogram of the ‘release_year’ column, showing the distribution of release years for the movies and TV shows in the dataset:
As we can see, the number of releases has been increasing over the years, with the most releases happening in recent years.
Next, we could analyze the ‘rating’ column to see the most common ratings for the movies and TV shows. Would you like to proceed with that?
I asked ChatGPT to continue and here’s the response:
Here is a count plot of the ‘rating’ column, showing the number of movies and TV shows for each rating:
As we can see, the most common rating is ‘TV-MA’, followed by ‘TV-14’ and ‘TV-PG’.
Next, we could analyze the ‘country’ column to see the countries with the most content. Would you like to proceed with that?
I asked ChatGPT to continue and here’s the response:
Here is a count plot of the ‘country’ column, showing the top 10 countries with the most content:
As we can see, the United States has the most content, followed by India and the United Kingdom.
This concludes the initial exploratory data analysis. Based on these insights, we can formulate more specific questions and perform further analysis. Do you have any specific questions or analyses you would like to perform on this dataset?
As you can see, ChatGPT performed more analysis and created data insights. It automatically generated charts based on the analysis objectives and presented conclusions, even though we didn’t explicitly ask for them. By default, it creates histograms if the data allows, but you can request different chart types.
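Behind the scenes, these charts are ordinary plotting cells. A minimal sketch of what the ‘type’ count plot and ‘release_year’ histogram might look like with seaborn (the generated code may use matplotlib directly or differ in styling):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Count plot: number of Movies vs. TV Shows.
sns.countplot(data=df, x="type")
plt.title("Movies vs. TV Shows")
plt.show()

# Histogram: distribution of release years.
sns.histplot(data=df, x="release_year", bins=30)
plt.title("Releases by year")
plt.show()
```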
For further exploration, I gave more prompts to ChatGPT, letting it generate more data insights.
My Prompt:
Analyze duration of movies.
ChatGPT’s response:
I asked ChatGPT to continue and received this:
Here is a histogram of the ‘duration’ column, showing the distribution of movie durations:
As we can see, the most common movie durations are between 80 and 120 minutes.
This concludes the analysis of movie durations. Do you have any other specific questions or analyses you would like to perform on this dataset?
I won’t go further with EDA in this demonstration. To explore the Noteable ChatGPT plugin’s capabilities in more depth, let’s move on to more advanced data analysis.
Diving Deeper into Data Analysis
Let’s delve deeper into the advanced data analysis capabilities of the Noteable plugin, specifically looking at natural language processing (NLP) and machine learning modeling.
Natural Language Processing (NLP)
NLP is a field of study that enables computers to understand and interact with human language, similar to how we communicate with each other. It allows computers to process, analyze, and gain insights from large amounts of textual data.
For instance, consider the following task:
Analyze the most frequent words in titles and generate a word cloud.
After assigning this task to ChatGPT, it generated a word cloud:
Here is a word cloud of the most frequent words in the titles:
The size of the words in the word cloud indicates their frequency. As we can see, some of the most frequent words in the titles are ‘Love’, ‘Man’, ‘Life’, ‘World’, ‘Story’, etc.
This concludes the analysis of the most frequent words in titles. Do you have any other specific questions or analyses you would like to perform on this dataset?
Looking at the code produced in the Noteable notebook helps us understand how ChatGPT and Noteable work together.
I asked ChatGPT to create a word cloud, and it automatically installed the wordcloud library in Python and imported the library’s stop words. I could even have specified extra words to omit, such as ‘little’, if I wanted.
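A minimal sketch of a title word cloud of this kind, with ‘little’ added to the stop words only to show how extra words could be excluded (the generated notebook’s code may differ):

```python
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Join all titles into one text blob.
text = " ".join(df["title"].astype(str))

# Standard English stop words, plus any extra words to exclude.
stopwords = set(STOPWORDS) | {"little"}

wc = WordCloud(width=800, height=400, background_color="white",
               stopwords=stopwords).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```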
Machine Learning Modeling
Machine learning modeling in data analytics involves teaching computer algorithms to learn patterns and make predictions or decisions based on input data. These models can generalize and make accurate predictions on new, unseen data.
Let’s explore what models the Noteable ChatGPT plugin can construct:
What machine learning models can you build for this dataset?
ChatGPT’s response was:
I gave ChatGPT the following prompt:
Utilize the text data (like title and description) and categorical data (like director, cast, country, and rating) to build a content-based recommendation system.
ChatGPT’s reply was as follows:
Building a content-based recommendation system involves several steps:
- Preprocessing: This involves cleaning the text data and converting it into a format that can be used to calculate similarity. This often involves techniques like tokenization, stemming/lemmatization, and removal of stop words.
- Feature Extraction: This involves converting the text data into numerical features. This can be done using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or word embeddings like Word2Vec or BERT.
- Similarity Calculation: Once we have numerical features, we can calculate the similarity between different movies/TV shows. This can be done using measures like cosine similarity or Jaccard similarity.
- Recommendation: Once we have a measure of similarity between different movies/TV shows, we can recommend similar movies/TV shows to a given movie/TV show.
Let’s start by preprocessing the text data and extracting features using TF-IDF. We’ll focus on the ‘title’, ‘description’, ‘director’, ‘cast’, ‘country’, and ‘rating’ columns.
I have preprocessed the text data and calculated the cosine similarity matrix. The next step is to create a function that takes a movie or TV show title as input and outputs a list of similar movies or TV shows. Let’s do that.
I have created a function called get_recommendations that takes a movie or TV show title as input and outputs a list of the top 10 most similar movies or TV shows based on their content. You can now input a title and get recommendations. For example, if you want recommendations based on the movie “Inception”, you would call the function like this: get_recommendations('Inception'). Would you like to try it with a specific title?
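To make the pipeline concrete, here is a minimal sketch of a content-based recommender along the lines ChatGPT describes: combine the text and categorical columns, vectorize with TF-IDF, and rank titles by cosine similarity. The code in the generated notebook may differ in its details:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Combine the relevant text and categorical columns into one string per title.
cols = ["title", "description", "director", "cast", "country", "rating"]
df["content"] = df[cols].fillna("").astype(str).agg(" ".join, axis=1)

# TF-IDF features and pairwise cosine similarity between all titles.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(df["content"])
similarity = cosine_similarity(tfidf_matrix)

# Map each title to its row position (keep the first of any duplicated titles).
indices = pd.Series(range(len(df)), index=df["title"]).drop_duplicates()

def get_recommendations(title, n=10):
    """Return the n titles most similar to the given title."""
    idx = indices[title]
    scores = sorted(enumerate(similarity[idx]), key=lambda s: s[1], reverse=True)
    top = [i for i, _ in scores[1:n + 1]]  # skip the title itself at position 0
    return df["title"].iloc[top].tolist()
```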
I asked ChatGPT to call the recommendation function to see how the model would work:
Call the function:
get_recommendations('Inception')
As a result, ChatGPT suggested 10 movies and TV shows.
Here are the top 10 movies and TV shows that are most similar to ‘Inception’ based on their content:
- Peaky Blinders
- Before the Flood
- Brick
- Catch Me If You Can
- Mars Attacks!
- Dead in a Week (Or Your Money Back)
- Sniper: Legacy
- Big Dreams, Small Spaces
- Lincoln
- Tom Segura: Ball Hog
Please note that the recommendations are based on the content of the movies and TV shows, including their title, description, director, cast, country, and rating. The similarity is calculated using TF-IDF and cosine similarity, which may not always reflect semantic similarity.
Would you like to try it with another title, or is there anything else you would like to do?
I then asked ChatGPT to construct another model:
Predict a movie’s rating based on features such as director, actor, country, release year, and genre.
ChatGPT’s response was:
I have created a new notebook for the rating prediction model. You can follow along with my work here.
Let’s start by loading the data and performing some initial preprocessing.
My apologies for the confusion. It seems that the ‘netflix_titles.csv’ file is not available in the current project, so we need to upload it to the new project. You can do this by going to the project page and clicking on the ‘Upload’ button.
Once the file is uploaded, we can proceed with the analysis. Please let me know when you have uploaded the file.
The Noteable plugin created a new notebook for this task. As the dataset was absent in the new notebook’s environment, I re-uploaded it and let the process continue.
I have loaded the data, selected the features, encoded the categorical features, and split the data into training and test sets.
Next, let’s train a simple logistic regression model as a baseline.
The logistic regression model has been trained and evaluated. Here’s the classification report:
precision recall f1-score support
66 min 0.00 0.00 0.00 1
G 0.00 0.00 0.00 12
NR 0.00 0.00 0.00 16
PG 0.00 0.00 0.00 62
PG-13 0.00 0.00 0.00 87
R 0.28 0.09 0.14 163
TV-14 0.51 0.05 0.08 414
TV-G 0.00 0.00 0.00 43
TV-MA 0.39 0.98 0.56 662
TV-PG 0.00 0.00 0.00 185
TV-Y 0.00 0.00 0.00 52
TV-Y7 0.00 0.00 0.00 65
accuracy 0.39 1762
macro avg 0.10 0.09 0.07 1762
weighted avg 0.29 0.39 0.24 1762

The model’s overall accuracy is 0.39, which is not very high. This is likely due to the complexity of the task and the limitations of the logistic regression model. More complex models or additional feature engineering might be needed to improve the performance.
Would you like to try a different model, or is there anything else you would like to do?
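For readers who want to see the shape of such a baseline, here is a minimal sketch of the preprocessing and logistic regression steps ChatGPT describes. The feature list and encoder choices here are assumptions; the generated notebook may have made different ones:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Features named in the prompt (listed_in serves as the genre column).
features = ["director", "cast", "country", "release_year", "listed_in"]
X = df[features].fillna("Unknown").copy()
y = df["rating"].fillna("Unknown")

# A simple baseline choice: integer-encode the categorical columns.
cat_cols = ["director", "cast", "country", "listed_in"]
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols].astype(str))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```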
I asked ChatGPT to suggest some models:
Can you recommend some models?
It provided the following response:
Absolutely! Here are some machine learning models that could potentially improve the performance on this task:
- Random Forest: This is an ensemble learning method that operates by constructing multiple decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random forests correct for decision trees’ habit of overfitting to their training set.
- Gradient Boosting: This is another ensemble machine learning algorithm that constructs new predictors that aim to correct the residual errors of the prior predictor, hence improving the model incrementally. XGBoost and LightGBM are popular gradient boosting libraries that are known for their performance and speed.
- Support Vector Machines (SVM): SVMs are powerful for certain types of datasets, especially if the data is not too large and if there’s a clear margin of separation.
- Neural Networks: If the dataset is large and complex, neural networks might be a good choice. They can capture complex patterns and interactions between features, but they require more computational resources and are harder to interpret.
Remember, the choice of model often depends on the specific characteristics of the dataset and the problem at hand. It’s often a good idea to try multiple models and see which one works best for your specific use case.
Would you like to proceed with one of these models, or is there something else you would like to do?
I then requested ChatGPT to create a random forest model:
Let’s go with Random Forest.
Here’s ChatGPT’s response:
The Random Forest model has been trained and evaluated. Here’s the classification report:
precision recall f1-score support
66 min 0.00 0.00 0.00 1
84 min 0.00 0.00 0.00 0
G 0.33 0.17 0.22 12
NR 0.00 0.00 0.00 16
PG 0.47 0.44 0.45 62
PG-13 0.43 0.40 0.42 87
R 0.50 0.42 0.46 163
TV-14 0.45 0.49 0.47 414
TV-G 0.25 0.05 0.08 43
TV-MA 0.59 0.70 0.64 662
TV-PG 0.23 0.13 0.17 185
TV-Y 0.43 0.46 0.44 52
TV-Y7 0.56 0.62 0.58 65
Unknown 0.00 0.00 0.00 0
accuracy 0.50 1762
macro avg 0.30 0.28 0.28 1762
weighted avg 0.48 0.50 0.49 1762

The model’s overall accuracy is 0.50, which is an improvement over the logistic regression model. However, the performance on some classes is still quite low. This could be due to a variety of reasons, such as class imbalance, lack of informative features, or the model not being complex enough to capture the patterns in the data.
Would you like to try a different model, or is there anything else you would like to do?
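Swapping in the random forest is essentially a one-line change on top of the same preprocessing (a sketch reusing the train/test split from the baseline above):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Same features and split as the baseline sketch; only the model changes.
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print(classification_report(y_test, rf.predict(X_test), zero_division=0))
```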
I also requested an attempt at the Support Vector Machines (SVM) model, and here’s the response:
The Support Vector Machine (SVM) model is currently being trained. SVMs are more computationally intensive than some other models, so it may take a while for the training to complete. You can check the progress of the model training in your notebook here. I will notify you as soon as the training is complete and the results are available.
Click the notebook link provided by ChatGPT to observe the training process.
In the image above, the red box highlights the training time: the SVM took about 2.5 hours to finish. If you want to reduce training time, you can join the waitlist for access to better hardware, such as GPUs.
Web Scraping and API Requests
Since Noteable’s data notebooks run on internet-connected servers, you can scrape websites and make API requests to extract external data. This significantly extends the capabilities of ChatGPT.
Noteable can download and install the necessary Python libraries for various tasks, not limited to data analysis.
For instance, I asked ChatGPT to help me generate a word cloud from a Wikipedia page about machine learning:
Generate a word cloud for this web page: https://en.wikipedia.org/wiki/Machine_learning
Let’s examine the results of combining Noteable with ChatGPT:
To generate a word cloud from a webpage, we’ll need to follow these steps:
- Fetch the webpage content.
- Parse the HTML to extract the text.
- Clean the text (remove stopwords, punctuation, etc.).
- Generate the word cloud.
For this task, we’ll need to use several Python libraries: requests to fetch the webpage, BeautifulSoup to parse the HTML, nltk to clean the text, and wordcloud to generate the word cloud. Let’s start by installing these libraries and fetching the webpage content.
I have successfully fetched the webpage content. Now, let’s parse the HTML to extract the text. We’ll use BeautifulSoup for this.
The HTML has been parsed and the text has been extracted. Now, let’s clean the text by removing stopwords and punctuation. We’ll use the Natural Language Toolkit (NLTK) for this.
The text has been cleaned by removing stopwords and punctuation. Now, let’s generate the word cloud. We’ll use the WordCloud function from the wordcloud library for this.
Here is the word cloud generated from the text on the Machine Learning Wikipedia page:
The size of each word indicates its frequency in the text. Larger words were mentioned more frequently.
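The whole fetch-parse-clean-plot flow described above fits in one short cell. Here is a minimal sketch using requests, BeautifulSoup, NLTK stop words, and wordcloud; the code in the generated notebook may differ in detail:

```python
import requests
from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# 1. Fetch the page and 2. extract its visible text.
url = "https://en.wikipedia.org/wiki/Machine_learning"
html = requests.get(url, timeout=30).text
text = BeautifulSoup(html, "html.parser").get_text(separator=" ")

# 3. Clean: keep alphabetic tokens that are not English stop words.
nltk.download("stopwords", quiet=True)
stops = set(stopwords.words("english"))
words = [w.lower() for w in text.split() if w.isalpha() and w.lower() not in stops]

# 4. Generate and display the word cloud.
wc = WordCloud(width=800, height=400,
               background_color="white").generate(" ".join(words))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```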
The example above is just a small use case, but you can also store the links of webpages to be crawled in a table or a database. You can then interact with ChatGPT to schedule tasks, allowing batch data crawling from webpages and saving the results to another table or database. Noteable also has a scheduling feature that lets users set specific times to run notebooks, whether hourly, daily, weekly, or monthly.
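As a small illustration of that batch idea, a scheduled notebook could read a list of URLs from a file or table, scrape each page, and save the results. In this sketch, urls.csv and scraped_pages.csv are hypothetical file names:

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# urls.csv is a hypothetical file with a single "url" column.
urls = pd.read_csv("urls.csv")["url"]

rows = []
for url in urls:
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ")
    rows.append({"url": url, "text": text[:10000]})  # keep a bounded excerpt

# Write the results out; a database table would work the same way.
pd.DataFrame(rows).to_csv("scraped_pages.csv", index=False)
```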
Collaborating and Sharing Results
Real-Time Collaboration in Noteable
Noteable boasts potent real-time collaboration capabilities that enhance teamwork when using notebooks generated by ChatGPT:
- Simultaneous editing: Multiple collaborators can open the same notebook at once, and each collaborator’s changes are displayed in real time. Users can add, edit, and reorder cells, while others can watch the changes as they happen.
- Cell Locking: Users can lock specific cells they’re working on to prevent others from accidentally overwriting them.
- Integrated Chat: Noteable includes an embedded chat feature where collaborators can discuss specific cells and refer to them in conversations.
- @mentions: You can use “@” to mention specific users in the chat, notifying them or drawing their attention to relevant content.
These collaborative features aid teamwork in data analysis workflows. Team members can collaborate on ChatGPT-generated data notebooks, making enhancements and iterations in real time. The history and conversations are automatically recorded, enabling effective collaboration across the organization.
Exporting and Sharing Notebooks
Finished notebooks can be shared and exported in various ways:
- Share notebooks with specific Noteable users or set access for users in spaces or projects.
- Download the notebook as a Jupyter (.ipynb) file.
- Export notebooks to formats like HTML, PNG, PDF, MD, RST, and PY.
Summary
In conclusion, the Noteable ChatGPT plugin revolutionizes the way we handle data analysis.
It merges the sophistication of natural language processing with the power of data notebooks, delivering an intuitive, accessible, and collaborative data analysis platform.
Embrace the future of data analysis with the Noteable ChatGPT plugin.