Streamline code analysis with AI chatbots using the Source Code Gatherer
Discover how to enhance your code analysis workflow using AI chatbots with a simple Python script that gathers source code from specified project folders. This tool helps you prepare your code for AI analysis while maintaining full control over the context provided to the AI.
Introduction
While integrated development environments (IDEs) offer sophisticated code analysis tools, AI chatbots like Claude and ChatGPT have emerged as powerful alternatives for code understanding, documentation, and problem-solving. However, feeding your codebase to these AI tools can be cumbersome. This article introduces a simple Python script that streamlines this process by gathering all source code from a specified directory into a single text file.
Key Features
The Source Code Gatherer script provides a straightforward way to collect the text files in a project directory, selected by file extension or MIME type, for AI analysis.
The script offers these main benefits:
Selective Context Control:
- Choose specific folders to analyze
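- Exclude binary files automatically (files are selected by extension or MIME type, and anything that cannot be read as UTF-8 is skipped)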
- Control the scope of code being analyzed
Universal Compatibility:
- Works with any AI chatbot
- No special integration required
- Simple text output format
The Script
Options:
- Download the file directly: gather_source_code.py
- Copy the script below and save it as gather_source_code.py
import os
import sys
import mimetypes
import pathlib
from typing import List, Set


def is_text_file(filepath: str, text_extensions: Set[str]) -> bool:
    """
    Determine if a file is a text file based on its extension and mime type.
    """
    # Check if extension is in our allowed list
    ext = pathlib.Path(filepath).suffix.lower()
    if ext in text_extensions:
        return True
    # Use mime type as fallback
    mime_type, _ = mimetypes.guess_type(filepath)
    return mime_type is not None and mime_type.startswith('text/')


def collect_files(directory: str, text_extensions: Set[str]) -> List[str]:
    """
    Recursively collect all text files in the given directory.
    """
    text_files = []
    for root, _, files in os.walk(directory):
        for file in files:
            filepath = os.path.join(root, file)
            if is_text_file(filepath, text_extensions):
                text_files.append(filepath)
    # Sort for consistent output
    return sorted(text_files)


def create_combined_file(files: List[str], output_file: str) -> None:
    """
    Create a single file containing the content of all input files with proper formatting.
    """
    with open(output_file, 'w', encoding='utf-8') as outfile:
        for filepath in files:
            # Read the whole file first, so a file that cannot be decoded
            # does not leave a dangling header and open fence in the output
            try:
                with open(filepath, 'r', encoding='utf-8') as infile:
                    content = infile.read()
            except UnicodeDecodeError:
                print(f"Warning: Could not read {filepath} as text. Skipping.")
                continue
            except Exception as e:
                print(f"Error processing {filepath}: {str(e)}")
                continue
            # Write file header
            outfile.write(f"// Filepath: {filepath}\n\n\n")
            outfile.write("```\n")
            # Write file content
            outfile.write(content)
            # Write file footer
            outfile.write("\n```\n\n\n")


def main():
    # Define the extensions you want to include
    text_extensions = {
        '.h', '.cpp', '.cs', '.py', '.json', '.xml', '.txt', '.md',
        '.ini', '.config', '.yaml', '.yml', '.uplugin', '.build',
        '.html', '.css', '.js', '.java', '.swift', '.m', '.mm',
        '.sh', '.bat', '.cmd', '.ps1', '.gradle', '.properties'
    }
    # Get directory from command line argument or use current directory
    directory = sys.argv[1] if len(sys.argv) > 1 else '.'
    output_file = 'combined_source_code.txt'
    # Collect and process files
    print(f"Scanning directory: {directory}")
    files = collect_files(directory, text_extensions)
    # Leave out a combined file from a previous run so it is not gathered into itself
    output_path = os.path.abspath(output_file)
    files = [f for f in files if os.path.abspath(f) != output_path]
    print(f"Found {len(files)} text files")
    # Create combined file
    create_combined_file(files, output_file)
    print(f"Created combined file: {output_file}")


if __name__ == '__main__':
    main()
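The script decides what counts as text from the file name alone (extension or guessed MIME type). A content-based check can complement that. The sketch below is not part of the original script: `looks_binary` and its `sample_size` parameter are hypothetical names, and the rule it applies (a NUL byte in the first few kilobytes means binary) is a common heuristic rather than a guarantee.

```python
def looks_binary(filepath: str, sample_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte.

    Hypothetical helper, not part of gather_source_code.py.
    """
    # Open in binary mode so no decoding is attempted
    with open(filepath, 'rb') as f:
        sample = f.read(sample_size)
    return b'\x00' in sample
```

One way to use it would be to call it inside `collect_files` before appending a path. Note that UTF-16 text files usually contain NUL bytes and would be classified as binary by this check; the script would skip them anyway, since it reads files as UTF-8.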
Usage Guide
Basic Usage
- Save the script to your local machine
- Open a terminal or command prompt
- Run the script with a directory path:
python gather_source_code.py "C:\MyProject"
The script will create a file named combined_source_code.txt in the current directory, containing the contents of the text files found in the specified folder.
Output Format
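The generated file contains every gathered file in turn. Each one is introduced by a "// Filepath:" header line, and its content is wrapped in triple-backtick fences (omitted in the simplified view below):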
// Filepath: src/main.py
[Content of main.py]
// Filepath: src/utils/helper.py
[Content of helper.py]
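To see the exact separators, the write calls in `create_combined_file` can be condensed into a single expression. `format_entry` below is a hypothetical helper written for illustration only; it mirrors the header, fence, and blank lines that the script emits for one file.

```python
def format_entry(filepath: str, content: str) -> str:
    """Return one file's section in the layout create_combined_file writes.

    Hypothetical helper for illustration; the script writes these pieces
    with separate outfile.write() calls.
    """
    # Header line, two blank lines, opening fence, content, closing fence, two blank lines
    return f"// Filepath: {filepath}\n\n\n```\n{content}\n```\n\n\n"


if __name__ == '__main__':
    print(format_entry("src/main.py", "print('hello')"), end="")
```

Knowing the layout is useful when asking the AI about a specific file, since the "// Filepath:" line is the only marker that identifies where each file begins.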
Best Practices
Optimizing Your Workflow
Choose the Right Scope:
- Select specific subdirectories for focused analysis
- Avoid including unnecessary files like build outputs or dependencies
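The script as published walks every subdirectory, so build outputs and dependency folders have to be avoided by choosing the start directory carefully. A possible variation, sketched below, prunes unwanted folders during the walk. The function name `collect_files_pruned` and the `EXCLUDED_DIRS` set are assumptions for illustration, and this version matches by extension only to keep the example short.

```python
import os
from typing import List, Set

# Hypothetical defaults; adjust to your project layout
EXCLUDED_DIRS = {'.git', 'node_modules', 'build', 'dist', '__pycache__', '.venv'}


def collect_files_pruned(directory: str, extensions: Set[str],
                         excluded_dirs: Set[str] = EXCLUDED_DIRS) -> List[str]:
    """Collect files with the given extensions, skipping excluded folders."""
    matches = []
    for root, dirs, files in os.walk(directory):
        # Editing dirs in place stops os.walk from descending into those folders
        dirs[:] = [d for d in dirs if d not in excluded_dirs]
        for name in files:
            if os.path.splitext(name)[1].lower() in extensions:
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Pruning during the walk also avoids the cost of traversing large dependency trees, which matters more than the filtering itself on big projects.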
Managing Large Codebases:
- Break down large projects into smaller chunks
- Consider AI platform token limits
- Focus on related code files for better context
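Splitting a project into chunks can be done by hand, but a rough size estimate helps decide where to cut. The sketch below is not part of the script. It assumes about four characters per token, which is only a rule of thumb (real tokenizers differ by model and by content), and it groups files greedily in order. `estimate_tokens`, `batch_by_budget`, and `CHARS_PER_TOKEN` are hypothetical names.

```python
from typing import List, Tuple

# Rough assumption; actual tokens per character vary by model and content
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def batch_by_budget(items: List[Tuple[str, int]], budget: int) -> List[List[str]]:
    """Group (path, estimated_tokens) pairs, in order, into batches under budget.

    A single file larger than the budget still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path, tokens in items:
        # Start a new batch when adding this file would exceed the budget
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        batches.append(current)
    return batches
```

Because the input order is preserved, sorting the file list by path first (as `collect_files` already does) tends to keep files from the same folder in the same batch.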
Effective AI Queries:
- Provide clear questions about the gathered code
- Reference specific files or functions in your questions
- Consider including relevant documentation files
Pro Tip
When working with large projects, gather code from specific feature folders or modules separately to stay within AI platform context limits while maintaining relevant context.
Benefits Over IDE Integration
While many IDEs offer direct AI integration, this script provides several unique advantages:
Platform Independence:
- Works with any AI chatbot
- No vendor lock-in
- Simple to modify and customize
Context Control:
- Precise control over what code is included
- Easy to exclude irrelevant files
- Maintain focus on specific code areas
Simplicity:
- No complex setup required
- Works with any project structure
- Easy to understand and modify
Conclusion
The Source Code Gatherer script provides a simple yet effective way to prepare your code for AI analysis. By streamlining the process of collecting source code, it allows developers to focus on getting meaningful insights from AI chatbots rather than wrestling with how to provide code context to them.