This repository contains a program that compares the responses provided by StackOverflow and ChatGPT for software development-related questions. The goal is to analyze the quality, relevance, and completeness of the responses from these two sources to understand their comparative strengths and weaknesses when it comes to providing helpful information to software developers.
The program includes the following features:
This project utilizes a dataset of software development-related questions and answers from StackOverflow. The dataset was obtained from the following source:
Specifically, the dataset used in this project was extracted from the "Posts" table of the StackOverflow data dump. The data contains information such as the question title, body, tags, and the top-voted answer.
To download the dataset, follow these steps:
dataCatalog.py
script to point to the location of the extracted StackOverflow dataset.This project uses the gpt-4o-mini
model, with the following rate limits:
Parameter | Value | Description |
---|---|---|
RATE_LIMIT_TPM | 200,000 tokens/min | Maximum tokens per minute |
RATE_LIMIT_BUFFER | 5,000 tokens | Buffer to avoid hitting the rate limit exactly |
TOKEN_COST_PER_REQUEST | 2,000 tokens (estimated) | Estimated token usage per request |
These settings help maintain efficient API usage within the rate limits provided by OpenAI. For details, refer to OpenAI Platform Settings.
If you want change model you can change parameters in AiRequest.py.
To use the program, follow these steps:
Clone the repository to your local machine:
git clone https://github.com/lorenzopaoria/Comparison-between-StackOverflow-and-ChatGPT-responses-for-software-development-questions.git
Navigate to the cloned repository:
cd Comparison-between-StackOverflow-and-ChatGPT-responses-for-software-development-questions
Install the required dependencies:
pip install -r requirements.txt
Navigate to the programs repository:
cd py
Run the program:
python main.py
Review the analysis results, which will be saved in JSON file for each categories.
If you find any issues or have suggestions for improvements, please feel free to submit a pull request or open an issue in the repository.
This project is licensed under the MIT License.