What is whatMLmodel?

whatMLmodel is an AI-powered application that recommends machine learning models based on a brief description of a dataset. The project, which I am developing alongside collaborators, is open-source and still in beta. If you are a developer or have knowledge of machine learning, you can join and contribute your ideas and expertise.

How did the idea come about?

At some point, I came across a video by Midudev where he explained how to use an API to generate AI-powered text responses for any chat application. This got me thinking: I could ask the model to include structured JSON in its text response for my backend to consume. I would only need a well-crafted prompt to define that structure and a function that finds the first and last curly braces in the response and parses everything between them into an object my frontend could react to. In other words, instead of using the API to generate simple responses, I could use it to pull the strings of my application.
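The brace-matching idea described above can be sketched in a few lines. This is only an illustration, not the project's actual code: the function name extract_json_object, the sample reply, and its keys are all assumptions made for the example.

```python
import json
from typing import Any


def extract_json_object(response_text: str) -> dict[str, Any]:
    """Parse the JSON object embedded in a free-form AI response.

    Hypothetical helper: it keeps everything from the first "{" to the
    last "}" and hands that slice to the standard JSON parser.
    """
    start = response_text.find("{")   # index of the first opening brace
    end = response_text.rfind("}")    # index of the last closing brace
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the response")
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON
    return json.loads(response_text[start : end + 1])


# Made-up reply where the model wraps the JSON in conversational text
reply = (
    "Sure! Here is the result:\n"
    '{"problem_type": "classification", "models": ["LogisticRegression"]}\n'
    "Hope it helps."
)
result = extract_json_object(reply)
print(result["problem_type"])  # classification
```

Slicing from the first to the last brace tolerates chatter before and after the payload and handles nested objects, but it would fail if the model emitted two separate objects or stray braces in the surrounding text, so the prompt has to ask for exactly one object.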

Some time later, I delved into the world of machine learning and was surprised by the many factors to consider when handling a dataset properly. I used ChatGPT many times to get guidance on which model to apply to a given problem, but the chat interface wasn’t the most practical. I had to take several notes to consolidate my knowledge and make quick lookups easier.

Months later, while looking for inspiration for a new project, both ideas merged: I came up with the idea of creating a platform that would help develop model selection criteria for machine learning, with an accessible interface dynamically powered by AI. Plus, I still needed a lot of practice, and I saw this as an opportunity to store my solved problems in a database that the AI could access to generate its recommendations.

How the app works

The process is simple: the user starts by writing a brief description of their dataset and target variable. A classic example would be: “Knowing the characteristics of the Titanic passengers (age, sex, occupation, etc.), we seek to predict the probability of survival of a given person.”

From that description, the application generates a more detailed interpretation, which includes the names of the most important features, the size of the dataset, etc. The user must check this information and correct it where needed before clicking Get models.

Then, having detected the type of problem (regression, classification, or clustering), the AI generates a series of recommendations and suggests adaptations to other problem types where possible. For example, if survival is expressed as a discrete variable (yes or no), we are dealing with a classification problem, but it can be reframed as a regression problem if survival is expressed as a continuous variable (a 70% probability of survival).
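To make the distinction concrete, here is a small rule-of-thumb sketch. It is not how the app decides, since that step is delegated to the AI; the function guess_problem_type and its max_classes threshold are hypothetical and exist only to show how the shape of the target variable changes the problem type.

```python
from typing import Optional, Sequence, Union

TargetValue = Union[str, bool, int, float]


def guess_problem_type(
    target: Optional[Sequence[TargetValue]], max_classes: int = 10
) -> str:
    """Hypothetical heuristic: infer the problem type from the target column."""
    if target is None:
        return "clustering"  # no target variable: unsupervised problem
    # bool is a subclass of int in Python, so exclude it explicitly
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target
    )
    if numeric:
        has_fractions = any(v % 1 != 0 for v in target)
        # fractional or many distinct values suggest a continuous target
        if has_fractions or len(set(target)) > max_classes:
            return "regression"
    return "classification"


print(guess_problem_type(["yes", "no", "yes"]))  # classification
print(guess_problem_type([0.7, 0.12, 0.95]))     # regression
print(guess_problem_type(None))                  # clustering
```

The same survival question lands in a different category depending only on how the target is encoded, which is the kind of reframing the recommendations point out.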

The result is an interactive analysis in which the user can explore the most suitable machine learning models for their problem, see how they have performed on similar datasets, and inspect the code used. The goal is for users to apply their own criteria when deciding which model to use, considering options they may not have taken into account.

The application also allows you to create an account to save and organize all the analyses carried out, which is very useful for students who want to structure their learning process and keep a quick reference.

Prompt engineering

The application calls the AI twice. First, from the user's initial description, the AI generates a more detailed interpretation. This is the prompt used in that case:

Next, the user corrects the information, leading to a second call to the AI, which generates model recommendations based on the following prompt:

Additionally, the prompt includes response prototypes and lists of similar datasets for the AI to select from. This ensures that the AI has enough context to deliver an effective response. The response is then processed by the backend, which recognizes the JSON keys to structure a sequence of paragraphs and tables.
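As a rough sketch of that last step, the snippet below maps JSON keys to display blocks: string values become paragraphs and lists of objects become tables. The key names (summary, recommended_models), the render_blocks function, and the plain-text table layout are all assumptions for illustration; the real response schema is not shown here.

```python
from typing import Any


def render_blocks(analysis: dict[str, Any]) -> list[str]:
    """Hypothetical renderer: turn parsed JSON keys into text blocks."""
    blocks: list[str] = []
    for key, value in analysis.items():
        title = key.replace("_", " ").capitalize()
        if isinstance(value, str):
            # A string value becomes a titled paragraph
            blocks.append(f"{title}\n{value}")
        elif isinstance(value, list) and value and all(
            isinstance(row, dict) for row in value
        ):
            # A list of objects becomes a table; columns come from the first row
            headers = list(value[0].keys())
            lines = [" | ".join(headers)]
            lines += [
                " | ".join(str(row.get(h, "")) for h in headers) for row in value
            ]
            blocks.append(title + "\n" + "\n".join(lines))
        # Keys with any other value type are skipped in this sketch
    return blocks


# Made-up response, already parsed into a dictionary
analysis = {
    "summary": "Binary classification on tabular data.",
    "recommended_models": [
        {"model": "Logistic Regression", "reason": "Simple baseline"},
        {"model": "Random Forest", "reason": "Handles mixed features"},
    ],
}
for block in render_blocks(analysis):
    print(block, end="\n\n")
```

Driving the layout from the keys means the prompt's response prototype and the renderer must agree on the schema, which is one reason to pin the structure down in the prompt.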

Upcoming Features

We have a series of features we’d like to implement very soon:

Encyclopedia: You’ll be able to explore the definitions and application theories of all models, with very detailed texts and graphics.
Interactive Chat: For each analysis performed, a chat will open where users can further investigate the problem at hand and explore possible solutions in more depth.
Dataset generation: The application will be able to generate fictional datasets based on the problem description. A very useful tool for testing models and algorithms.
Code generation: The application will be able to generate code to implement the suggested models. We recognize that this is a complex and delicate feature, as there is a risk of AI hallucination and users lacking the necessary knowledge to interpret and handle the code properly. However, we believe that offering guidance on data processing could, in the long run, become one of the most valuable features of the application.
Collaborative community: We want to foster the growth of a community where users will be able to share their datasets and help each other in learning and problem-solving.

Once again: this is an open-source project

We want this application to become the best platform for learning about machine learning and exploring different models. So, if you like the idea and you're a developer or have knowledge of machine learning, you're more than welcome to contribute!