Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Pranav-here/AI-Dermatologist

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI-Dermatologist

AI-Dermatologist is an AI-powered medical assistant that combines vision, speech-to-text, and text-to-speech capabilities to simulate a professional doctor–patient interaction. It enables users to record their voice, submit medical images, receive diagnostic insights, and hear spoken responses.


Screenshots

Tested on Gradio


Key Features

  • Voice Interaction: Record patient audio via microphone and transcribe using Groq's Whisper model.
  • Image Analysis: Process and evaluate medical images (e.g., skin lesions) using a multimodal LLM (Meta LLaMA Scout).
  • AI-Generated Responses: Generate concise, doctor-style text responses without AI disclaimers or formatting.
  • Text-to-Speech: Convert the AI-generated diagnosis into natural-sounding speech via gTTS (Google) or ElevenLabs.
  • Web UI: Intuitive Gradio interface for seamless voice and image input, and playback of responses.

Architecture Overview

  1. Audio Capture: User records their voice; saved as MP3 via speech_recognition and pydub.
  2. Transcription: Transcribe audio to text using Groq Whisper (whisper-large-v3).
  3. Prompt Assembly: Combine system prompt with transcription for context.
  4. Image Encoding & Analysis: Base64-encode user image and query Meta LLaMA Scout via the Groq API.
  5. Response Generation: Receive doctor-style advice from the LLM.
  6. Speech Synthesis: Generate and play back spoken response using gTTS or ElevenLabs.
  7. Web Interface: Gradio serves as the frontend for input/output.

Tools & Technologies

  • Python 3.8+
  • Gradio: Web UI framework for rapid prototyping of ML interfaces.
  • Groq SDK: For image and audio transcription API calls.
  • gTTS & ElevenLabs: Text-to-speech engines.
  • SpeechRecognition & pydub: Audio recording and format conversion.
  • dotenv: Environment variable management.
  • winsound / afplay / aplay: Cross-platform inline audio playback.

Installation

  1. Clone the repository:
    git clone https://github.com/pranav-here/AI-Dermatologist.git
    cd AI-Dermatologist
  2. Install dependencies:
    pip install -r requirements.txt
  3. Create a .env file with the following keys:
    ELEVENLABS_API_KEY=<your_elevenlabs_api_key>
    GROQ_API_KEY=<your_groq_api_key>

Usage

Launch the Gradio interface:

python gradio_app.py

Open the URL printed in your console (typically http://127.0.0.1:7860).

  • Step 1: Record your voice and/or upload an image.
  • Step 2: View the transcribed text and AI-generated diagnosis.
  • Step 3: Listen to the spoken response.

Screenshots

Tested on Gradio


Requirements

All dependencies are listed in requirements.txt. Example:

grpcio
gradio
pydub
speechrecognition
gtts
elevenlabs
groq-sdk
dotenv

Refer to requirements.txt for the full list and exact versions.


License

This project is licensed under the MIT License. See LICENSE for details.

About

AI-Dermatologist is an AI-powered medical assistant that combines vision, speech-to-text, and text-to-speech capabilities to simulate a professional doctor–patient interaction. It enables users to record their voice, submit medical images, receive diagnostic insights, and hear spoken responses.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages