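Now that generative AI is being used in system development, the analysis of source repositories and source trees has become a hot area. I often get requests along the lines of "visualize what this source repository is doing." At the same time, even with tools such as GitHub Copilot, analyzing the source code of a product that spans multiple files (which is to say, nearly every product) is still quite hard. So let's read the source code of generative AI agents that claim to do source code analysis and see how they actually go about it.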

Potpie.ai

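Potpie.ai provides a service that integrates AI agents into a codebase to automate and streamline software development tasks such as debugging and testing. For example, once it has analyzed the source code, it can run debugging or testing agents against that code to make the development process more efficient.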

Potpie.ai publishes its source code as open source, so we can work out what it does internally by reading the code.

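To state the conclusion first, Potpie.ai analyzes source code with tree-sitter. A look at the relevant part of the codebase shows tree-sitter in use: a file is parsed into a syntax tree, and a per-language query is then run against that tree.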

Potpie.ai then appears to store the source structure obtained this way in Neo4j, where its agents make use of it.

        language = get_language(lang)
        parser = get_parser(lang)

        query_scm = get_scm_fname(lang)
        if not query_scm.exists():
            return
        query_scm = query_scm.read_text()

        code = self.io.read_text(fname)
        if not code:
            return
        tree = parser.parse(bytes(code, "utf-8"))

        # Run the tags queries
        query = language.query(query_scm)
        captures = query.captures(tree.root_node)
        captures = list(captures)
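The excerpt follows a parse, query, capture sequence: build a syntax tree, then collect the nodes that match a definitions query. As a rough illustration of what such a pass yields, here is a minimal sketch that swaps tree-sitter for Python's built-in `ast` module so it runs without any grammar package. It is not Potpie.ai's code, it only handles Python source, and the names `list_definitions` and `SAMPLE` are hypothetical.

```python
import ast


def list_definitions(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line) for every class and function defined in source.

    Stand-in for a tree-sitter tags query: `ast` replaces the tree-sitter
    parser, and the isinstance checks replace the query captures.
    """
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found.append(("class", node.name, node.lineno))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(("function", node.name, node.lineno))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(found, key=lambda item: item[2])


# Hypothetical input used only for illustration.
SAMPLE = '''
class Greeter:
    def greet(self, name):
        return "hello " + name

def main():
    print(Greeter().greet("world"))
'''

if __name__ == "__main__":
    for kind, name, line in list_definitions(SAMPLE):
        print(f"{line}: {kind} {name}")
```

The advantage of tree-sitter over a language-specific module like `ast` is that the same parse-and-query flow works across many languages by swapping in a different grammar and query file.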

code-base-agent

code-base-agent, an agent that converts a repository into a graph structure, also uses tree-sitter to analyze source code. Its documentation describes the stack as follows:

We used a combination of llama-index, CodeHierarchy module, and tree-sitter-languages for parsing code into a graph structure, Neo4j for storing and querying the graph data, and langchain to create the agents.
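To make "parsing code into a graph structure" more concrete, the following is a minimal sketch under stated assumptions, not code-base-agent's actual implementation. It again uses Python's `ast` module as a stand-in for tree-sitter, builds a small call graph, and renders it as Cypher `MERGE` statements that a Neo4j database could accept. The `Function` label, the `CALLS` relationship, and the names `build_call_graph`, `to_cypher`, and `SAMPLE` are all hypothetical.

```python
import ast


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for sub in ast.walk(node):
                # Only calls of the form name(...); method calls are ignored.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    calls.add(sub.func.id)
            graph[node.name] = calls
    return graph


def to_cypher(graph: dict[str, set[str]]) -> list[str]:
    """Render nodes and edges as Cypher text (hypothetical Function/CALLS schema).

    Values are inlined for readability only; real code should pass them as
    query parameters through a Neo4j driver instead of formatting strings.
    """
    statements = [f"MERGE (:Function {{name: '{name}'}})" for name in sorted(graph)]
    for caller in sorted(graph):
        for callee in sorted(graph[caller]):
            if callee in graph:  # skip builtins and anything defined elsewhere
                statements.append(
                    f"MATCH (a:Function {{name: '{caller}'}}), "
                    f"(b:Function {{name: '{callee}'}}) "
                    "MERGE (a)-[:CALLS]->(b)"
                )
    return statements


# Hypothetical input used only for illustration.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

if __name__ == "__main__":
    for statement in to_cypher(build_call_graph(SAMPLE)):
        print(statement)
```

Once the structure is in a graph database, an agent can answer questions such as "what calls this function" with a graph query instead of rereading every file.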

Cline

Cline, of course, uses tree-sitter as well. In Cline, this code analysis backs the list_code_definition_names tool, which its prompt describes as follows.

## list_code_definition_names
Description: Request to list definition names (classes, functions, methods, etc.) used 
in source code files at the top level of the specified directory. 
This tool provides insights into the codebase structure and important constructs, 
encapsulating high-level concepts and relationships that are crucial for 
understanding the overall architecture.

Parameters:
- path: (required) The path of the directory (relative to the current 
  working directory ${cwd.toPosix()}) to list top level source code definitions for.
Usage:
<list_code_definition_names>
<path>Directory path here</path>
</list_code_definition_names>${
 supportsComputerUse
  ? `