CFG and DFG for ISEQs #272

Merged
merged 14 commits into from
Feb 3, 2023
4 changes: 4 additions & 0 deletions lib/syntax_tree.rb
@@ -29,8 +29,12 @@
require_relative "syntax_tree/index"

require_relative "syntax_tree/yarv"
require_relative "syntax_tree/yarv/basic_block"
require_relative "syntax_tree/yarv/bf"
require_relative "syntax_tree/yarv/calldata"
require_relative "syntax_tree/yarv/compiler"
require_relative "syntax_tree/yarv/control_flow_graph"
require_relative "syntax_tree/yarv/data_flow_graph"
require_relative "syntax_tree/yarv/decompiler"
require_relative "syntax_tree/yarv/disassembler"
require_relative "syntax_tree/yarv/instruction_sequence"
53 changes: 53 additions & 0 deletions lib/syntax_tree/yarv/basic_block.rb
@@ -0,0 +1,53 @@
# frozen_string_literal: true

module SyntaxTree
module YARV
# This object represents a single basic block, wherein all contained
# instructions do not branch except for the last one.
class BasicBlock
# This is the unique identifier for this basic block.
attr_reader :id

# This is the index into the list of instructions where this block starts.
attr_reader :block_start

# This is the list of instructions that this block contains.
attr_reader :insns

# This is an array of basic blocks that lead into this block.
attr_reader :incoming_blocks

# This is an array of basic blocks that this block leads into.
attr_reader :outgoing_blocks

def initialize(block_start, insns)
@id = "block_#{block_start}"

@block_start = block_start
@insns = insns

@incoming_blocks = []
@outgoing_blocks = []
end

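# Yield each instruction in this basic block along with its offset into the
# original instruction sequence. The offset starts at block_start and grows
# by the length of each instruction (opcode plus operands).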
def each_with_length
return enum_for(:each_with_length) unless block_given?

length = block_start
insns.each do |insn|
yield insn, length
length += insn.length
end
end

# This method is used to verify that the basic block is well formed. It
# checks that the only instruction in this basic block that branches is
# the last instruction.
def verify
insns[0...-1].each { |insn| raise unless insn.branch_targets.empty? }
end
end
end
end
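A minimal sketch of how `each_with_length` walks offsets, for readers who want to try it without loading the gem. `SketchInsn` and `SketchBlock` are hypothetical stand-ins, not part of the PR; only the iteration logic is copied from `BasicBlock` above.

```ruby
# Hypothetical stand-in for a YARV instruction. Only the name and the length
# (opcode plus operands) matter for this sketch.
SketchInsn = Struct.new(:name, :length)

# Hypothetical stand-in for BasicBlock that keeps only the iteration logic.
class SketchBlock
  attr_reader :block_start, :insns

  def initialize(block_start, insns)
    @block_start = block_start
    @insns = insns
  end

  # Same body as BasicBlock#each_with_length in the diff above.
  def each_with_length
    return enum_for(:each_with_length) unless block_given?

    length = block_start
    insns.each do |insn|
      yield insn, length
      length += insn.length
    end
  end
end

SKETCH_BLOCK =
  SketchBlock.new(
    4,
    [
      SketchInsn.new(:putobject, 2),
      SketchInsn.new(:putobject, 2),
      SketchInsn.new(:opt_plus, 2)
    ]
  )

# Pair each instruction name with the offset it was yielded at.
SKETCH_OFFSETS =
  SKETCH_BLOCK.each_with_length.map { |insn, offset| [insn.name, offset] }
```

The offsets begin at `block_start` and advance by each instruction's length, so they are positions in the flat instruction array rather than positions in the `insns` list.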
91 changes: 91 additions & 0 deletions lib/syntax_tree/yarv/calldata.rb
@@ -0,0 +1,91 @@
# frozen_string_literal: true

module SyntaxTree
module YARV
# This is an operand to various YARV instructions that represents the
# information about a specific call site.
class CallData
CALL_ARGS_SPLAT = 1 << 0
CALL_ARGS_BLOCKARG = 1 << 1
CALL_FCALL = 1 << 2
CALL_VCALL = 1 << 3
CALL_ARGS_SIMPLE = 1 << 4
CALL_BLOCKISEQ = 1 << 5
CALL_KWARG = 1 << 6
CALL_KW_SPLAT = 1 << 7
CALL_TAILCALL = 1 << 8
CALL_SUPER = 1 << 9
CALL_ZSUPER = 1 << 10
CALL_OPT_SEND = 1 << 11
CALL_KW_SPLAT_MUT = 1 << 12

attr_reader :method, :argc, :flags, :kw_arg

def initialize(
method,
argc = 0,
flags = CallData::CALL_ARGS_SIMPLE,
kw_arg = nil
)
@method = method
@argc = argc
@flags = flags
@kw_arg = kw_arg
end

def flag?(mask)
(flags & mask) > 0
end

def to_h
result = { mid: method, flag: flags, orig_argc: argc }
result[:kw_arg] = kw_arg if kw_arg
result
end

def inspect
names = []
names << :ARGS_SPLAT if flag?(CALL_ARGS_SPLAT)
names << :ARGS_BLOCKARG if flag?(CALL_ARGS_BLOCKARG)
names << :FCALL if flag?(CALL_FCALL)
names << :VCALL if flag?(CALL_VCALL)
names << :ARGS_SIMPLE if flag?(CALL_ARGS_SIMPLE)
names << :BLOCKISEQ if flag?(CALL_BLOCKISEQ)
names << :KWARG if flag?(CALL_KWARG)
names << :KW_SPLAT if flag?(CALL_KW_SPLAT)
names << :TAILCALL if flag?(CALL_TAILCALL)
names << :SUPER if flag?(CALL_SUPER)
names << :ZSUPER if flag?(CALL_ZSUPER)
names << :OPT_SEND if flag?(CALL_OPT_SEND)
names << :KW_SPLAT_MUT if flag?(CALL_KW_SPLAT_MUT)

parts = []
parts << "mid:#{method}" if method
parts << "argc:#{argc}"
parts << "kw:[#{kw_arg.join(", ")}]" if kw_arg
parts << names.join("|") if names.any?

"<calldata!#{parts.join(", ")}>"
end

def self.from(serialized)
new(
serialized[:mid],
serialized[:orig_argc],
serialized[:flag],
serialized[:kw_arg]
)
end
end

# A convenience method for creating a CallData object.
def self.calldata(
method,
argc = 0,
flags = CallData::CALL_ARGS_SIMPLE,
kw_arg = nil
)
CallData.new(method, argc, flags, kw_arg)
end
end
end
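The flag constants in `CallData` are single bits, so a call site's flags are a bitwise OR of them and `flag?` is a bitwise AND test. The sketch below illustrates this with three of the values copied from the diff; `flag_set?` and `SKETCH_FLAGS` are hypothetical names used only here.

```ruby
# Flag values copied from CallData in the diff above.
CALL_ARGS_SPLAT = 1 << 0
CALL_FCALL = 1 << 2
CALL_ARGS_SIMPLE = 1 << 4

# Same check as CallData#flag?: a flag is set when its bit survives the AND.
def flag_set?(flags, mask)
  (flags & mask) > 0
end

# A receiver-less call with simple arguments, such as `puts 1`, would plausibly
# carry both of these flags (an assumption for illustration).
SKETCH_FLAGS = CALL_FCALL | CALL_ARGS_SIMPLE
```

Here `SKETCH_FLAGS` is 20 (4 + 16), `flag_set?(SKETCH_FLAGS, CALL_FCALL)` is true, and `flag_set?(SKETCH_FLAGS, CALL_ARGS_SPLAT)` is false.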
184 changes: 184 additions & 0 deletions lib/syntax_tree/yarv/control_flow_graph.rb
@@ -0,0 +1,184 @@
# frozen_string_literal: true

module SyntaxTree
module YARV
# This class represents a control flow graph of a YARV instruction sequence.
# It constructs a graph of basic blocks that hold subsets of the list of
# instructions from the instruction sequence.
#
# You can use this class by calling the ::compile method and passing it a
# YARV instruction sequence. It will return a control flow graph object.
#
# iseq = RubyVM::InstructionSequence.compile("1 + 2")
# iseq = SyntaxTree::YARV::InstructionSequence.from(iseq.to_a)
# cfg = SyntaxTree::YARV::ControlFlowGraph.compile(iseq)
#
class ControlFlowGraph
# This is the instruction sequence that this control flow graph
# corresponds to.
attr_reader :iseq

# This is the hash of instructions that this control flow graph contains,
# keyed by their offset into the instruction sequence. It holds the same
# instructions as the instruction sequence, but with line numbers and
# events filtered out.
attr_reader :insns

# This is the list of basic blocks that this control flow graph contains.
attr_reader :blocks

def initialize(iseq, insns, blocks)
@iseq = iseq
@insns = insns
@blocks = blocks
end

def disasm
fmt = Disassembler.new(iseq)
fmt.output.puts("== cfg: #{iseq.inspect}")

blocks.each do |block|
fmt.output.puts(block.id)
fmt.with_prefix(" ") do |prefix|
unless block.incoming_blocks.empty?
from = block.incoming_blocks.map(&:id)
fmt.output.puts("#{prefix}== from: #{from.join(", ")}")
end

fmt.format_insns!(block.insns, block.block_start)

to = block.outgoing_blocks.map(&:id)
to << "leaves" if block.insns.last.leaves?
fmt.output.puts("#{prefix}== to: #{to.join(", ")}")
end
end

fmt.string
end

# This method is used to verify that the control flow graph is well
# formed. It does this by checking that each basic block is itself well
# formed.
def verify
blocks.each(&:verify)
end

def self.compile(iseq)
Compiler.new(iseq).compile
end

# This class is responsible for creating a control flow graph from the
# given instruction sequence.
class Compiler
# This is the instruction sequence that is being compiled.
attr_reader :iseq

# This is a hash of indices in the YARV instruction sequence that point
# to their corresponding instruction.
attr_reader :insns

# This is a hash of labels that point to their corresponding index into
# the YARV instruction sequence. Note that this is not the same as the
# index into the list of instructions on the instruction sequence
# object. Instead, this is the index into the C array, so it includes
# operands.
attr_reader :labels

def initialize(iseq)
@iseq = iseq

@insns = {}
@labels = {}

length = 0
iseq.insns.each do |insn|
case insn
when Instruction
@insns[length] = insn
length += insn.length
when InstructionSequence::Label
@labels[insn] = length
end
end
end

# This method is used to compile the instruction sequence into a control
# flow graph. It returns an instance of ControlFlowGraph.
def compile
blocks = connect_basic_blocks(build_basic_blocks)
ControlFlowGraph.new(iseq, insns, blocks.values).tap(&:verify)
end

private

# Finds the indices of the instructions that start a basic block because
# they're either:
#
# * the start of an instruction sequence
# * the target of a branch
# * fallen through to from a branch
#
def find_basic_block_starts
block_starts = Set.new([0])

insns.each do |index, insn|
branch_targets = insn.branch_targets

if branch_targets.any?
branch_targets.each do |branch_target|
block_starts.add(labels[branch_target])
end

block_starts.add(index + insn.length) if insn.falls_through?
end
end

block_starts.to_a.sort
end

# Builds up a set of basic blocks by iterating over the starts of each
# block. They are keyed by the index of their first instruction.
def build_basic_blocks
block_starts = find_basic_block_starts

length = 0
blocks =
iseq
.insns
.grep(Instruction)
.slice_after do |insn|
length += insn.length
block_starts.include?(length)
end

block_starts
.zip(blocks)
.to_h do |block_start, block_insns|
[block_start, BasicBlock.new(block_start, block_insns)]
end
end

# Connect the blocks by letting them know which blocks are incoming and
# outgoing from each block.
def connect_basic_blocks(blocks)
blocks.each do |block_start, block|
insn = block.insns.last

insn.branch_targets.each do |branch_target|
block.outgoing_blocks << blocks.fetch(labels[branch_target])
end

if (insn.branch_targets.empty? && !insn.leaves?) ||
insn.falls_through?
fall_through_start = block_start + block.insns.sum(&:length)
block.outgoing_blocks << blocks.fetch(fall_through_start)
end

block.outgoing_blocks.each do |outgoing_block|
outgoing_block.incoming_blocks << block
end
end
end
end
end
end
end
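The block-start rule used by `find_basic_block_starts` can be tried in isolation. This is a rough sketch under stated assumptions: `CfgSketchInsn`, the sample instructions, their lengths, and the `:else_label` label are all made up for illustration, and only the shape of the loop follows the PR.

```ruby
require "set"

# Hypothetical instruction stand-in: length in slots, the labels it may branch
# to, and whether execution can continue to the next instruction.
CfgSketchInsn = Struct.new(:name, :length, :branch_targets, :falls_through)

# offset => instruction, keyed the same way as Compiler#insns in the diff.
CFG_SKETCH_INSNS = {
  0 => CfgSketchInsn.new(:putobject, 2, [], true),
  2 => CfgSketchInsn.new(:branchunless, 2, [:else_label], true),
  4 => CfgSketchInsn.new(:putobject, 2, [], true),
  6 => CfgSketchInsn.new(:leave, 1, [], false),
  7 => CfgSketchInsn.new(:putobject, 2, [], true),
  9 => CfgSketchInsn.new(:leave, 1, [], false)
}

# label => offset, keyed the same way as Compiler#labels in the diff.
CFG_SKETCH_LABELS = { else_label: 7 }

# A block starts at offset 0, at every branch target, and at the instruction
# that a conditional branch falls through to.
def sketch_block_starts(insns, labels)
  starts = Set.new([0])

  insns.each do |offset, insn|
    next if insn.branch_targets.empty?

    insn.branch_targets.each { |label| starts.add(labels.fetch(label)) }
    starts.add(offset + insn.length) if insn.falls_through
  end

  starts.to_a.sort
end

CFG_SKETCH_STARTS = sketch_block_starts(CFG_SKETCH_INSNS, CFG_SKETCH_LABELS)
```

With this input the starts are offsets 0, 4, and 7: the entry point, the fall-through after `branchunless`, and the branch target. Slicing the instructions at those offsets gives the three basic blocks that `connect_basic_blocks` would then link together.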