AI-Dermatologist is an AI-powered medical assistant that combines vision, speech-to-text, and text-to-speech capabilities to simulate a professional doctor–patient interaction. It enables users to record their voice, submit medical images, receive diagnostic insights, and hear spoken responses.
- Voice Interaction: Record patient audio via microphone and transcribe using Groq's Whisper model.
- Image Analysis: Process and evaluate medical images (e.g., skin lesions) using a multimodal LLM (Meta LLaMA Scout).
- AI-Generated Responses: Generate concise, doctor-style text responses without AI disclaimers or formatting.
- Text-to-Speech: Convert the AI-generated diagnosis into natural-sounding speech via gTTS (Google) or ElevenLabs.
- Web UI: Intuitive Gradio interface for seamless voice and image input, and playback of responses.
- Audio Capture: User records their voice; saved as MP3 via `speech_recognition` and `pydub`.
- Transcription: Transcribe audio to text using Groq Whisper (`whisper-large-v3`).
- Prompt Assembly: Combine the system prompt with the transcription for context.
- Image Encoding & Analysis: Base64-encode user image and query Meta LLaMA Scout via the Groq API.
- Response Generation: Receive doctor-style advice from the LLM.
- Speech Synthesis: Generate and play back spoken response using gTTS or ElevenLabs.
- Web Interface: Gradio serves as the frontend for input/output.
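The image-encoding and query-assembly steps above can be sketched as follows. The message shape follows Groq's OpenAI-compatible chat API; the exact model identifier is an assumption and should be checked against Groq's current model list:

```python
import base64

# Assumed model ID for Meta LLaMA Scout on Groq -- verify before use.
MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"

def encode_image(image_path):
    """Base64-encode an image file for inline submission to the LLM."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_image_query(prompt, image_path):
    """Assemble an OpenAI-style multimodal message: text plus a data-URL image."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encode_image(image_path)}"}},
        ],
    }]

# The actual call (requires GROQ_API_KEY and the groq package):
# from groq import Groq
# client = Groq()
# reply = client.chat.completions.create(model=MODEL,
#                                        messages=build_image_query(prompt, "lesion.jpg"))
```

Keeping the payload assembly separate from the network call makes the encoding logic easy to test without an API key.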
- Python 3.8+
- Gradio: Web UI framework for rapid prototyping of ML interfaces.
- Groq SDK: Client for the chat (vision) and audio-transcription API calls.
- gTTS & ElevenLabs: Text-to-speech engines.
- SpeechRecognition & pydub: Audio recording and format conversion.
- dotenv: Environment variable management.
- winsound / afplay / aplay: Platform-specific inline audio playback (Windows, macOS, and Linux respectively).
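The playback entry above can be sketched by selecting the right player per platform. Note that `winsound` is a Windows-only stdlib module (and plays WAV, not MP3), while `afplay` and `aplay` are spawned as subprocesses:

```python
import platform
import subprocess

def playback_command(audio_path, system=None):
    """Pick the playback command for the given OS.

    Returns None on Windows (handled in-process via winsound),
    otherwise a subprocess argument list for afplay (macOS) or aplay (Linux).
    """
    system = system or platform.system()
    if system == "Windows":
        return None
    player = "afplay" if system == "Darwin" else "aplay"
    return [player, audio_path]

def play_audio(audio_path, system=None):
    """Play an audio file with the platform's native player."""
    cmd = playback_command(audio_path, system)
    if cmd is None:
        import winsound  # Windows-only stdlib module; WAV files only
        winsound.PlaySound(audio_path, winsound.SND_FILENAME)
    else:
        subprocess.run(cmd, check=True)
```

Separating command selection from execution keeps the branching logic testable without actually playing audio.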
- Clone the repository:
  git clone https://github.com/pranav-here/AI-Dermatologist.git
  cd AI-Dermatologist
- Install dependencies:
pip install -r requirements.txt
- Create a `.env` file with the following keys:
  ELEVENLABS_API_KEY=<your_elevenlabs_api_key>
  GROQ_API_KEY=<your_groq_api_key>
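In the app, `python-dotenv`'s `load_dotenv()` reads these `KEY=value` pairs into the process environment. A stdlib-only sketch of the same idea (the project itself should just call `load_dotenv()`):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv: read KEY=value
    lines into os.environ, skipping blanks, comments, and keys already set."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# In the app itself:
# from dotenv import load_dotenv
# load_dotenv()
# groq_key = os.getenv("GROQ_API_KEY")
```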
Launch the Gradio interface:
  python gradio_app.py

Open the URL printed in your console (typically http://127.0.0.1:7860).
- Step 1: Record your voice and/or upload an image.
- Step 2: View the transcribed text and AI-generated diagnosis.
- Step 3: Listen to the spoken response.
All dependencies are listed in requirements.txt. Example:
grpcio
gradio
pydub
speechrecognition
gtts
elevenlabs
groq
python-dotenv
Refer to requirements.txt for the full list and exact versions.
This project is licensed under the MIT License. See LICENSE for details.

