Writing A TABLESAMPLE Sampling Method

The TABLESAMPLE clause implementation in PostgreSQL supports custom sampling methods. A sampling method controls which sample of the table is returned when the TABLESAMPLE clause is used.

Tablesample Method Functions

A tablesample method must provide the following set of functions:

    void tsm_init (TableSampleDesc *desc, uint32 seed, ...);

Initializes the tablesample scan. This function is called at the beginning of each relation scan. The first two parameters are required, but you can declare additional parameters, which the TABLESAMPLE clause then uses to determine what input the user must supply in the query itself. If your function declares an additional float4 parameter named percent, the user has to call the tablesample method with an expression that evaluates (or can be coerced) to float4. For example, this definition:

    tsm_init (TableSampleDesc *desc, uint32 seed, float4 pct);

leads to an SQL call like this:

    ... TABLESAMPLE yourmethod(0.5) ...

    BlockNumber tsm_nextblock (TableSampleDesc *desc);

Returns the block number of the next page to be scanned. InvalidBlockNumber should be returned when the sampling has reached the end of the relation.

    OffsetNumber tsm_nexttuple (TableSampleDesc *desc, BlockNumber blockno, OffsetNumber maxoffset);

Returns the next tuple offset for the current page. InvalidOffsetNumber should be returned when the sampling has reached the end of the page.

    void tsm_end (TableSampleDesc *desc);

The scan has finished; clean up any leftover state.

    void tsm_reset (TableSampleDesc *desc);

The scan needs to rescan the relation; reset any tablesample method state.

    void tsm_cost (PlannerInfo *root, Path *path, RelOptInfo *baserel, List *args, BlockNumber *pages, double *tuples);

This function is used by the optimizer to decide on the best plan, and it is also used for the output of EXPLAIN.

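The call order of the scan functions above can be illustrated with a small standalone sketch. It does not include any PostgreSQL headers: the typedefs are simplified stand-ins, TableSampleDesc is reduced to a table size and a state pointer, and the step_* functions, the StepState struct, the step parameter, and the drive_scan() helper are all hypothetical names invented for this illustration. A real module would use palloc() and pfree() and would read the table size from the scan descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for PostgreSQL types so the sketch compiles on its own. */
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;
#define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)
#define InvalidOffsetNumber ((OffsetNumber) 0)

/* ASSUMPTION: simplified descriptor; the real one is defined by PostgreSQL. */
typedef struct TableSampleDesc
{
    BlockNumber nblocks;    /* hypothetical field holding the table size */
    void       *tsmdata;    /* private state of the sampling method */
} TableSampleDesc;

/* Hypothetical private state: visit every step-th block. */
typedef struct StepState
{
    BlockNumber  step;
    BlockNumber  nextblk;   /* next block to hand out */
    OffsetNumber lastoff;   /* last offset returned on the current page */
} StepState;

void
step_init(TableSampleDesc *desc, uint32_t seed, int32_t step)
{
    StepState *st = malloc(sizeof(StepState)); /* palloc() in a real module */

    assert(st != NULL && step > 0);
    (void) seed;            /* this method is deterministic, seed is unused */
    st->step = (BlockNumber) step;
    st->nextblk = 0;
    st->lastoff = InvalidOffsetNumber;
    desc->tsmdata = st;
}

BlockNumber
step_nextblock(TableSampleDesc *desc)
{
    StepState  *st = desc->tsmdata;
    BlockNumber blk = st->nextblk;

    if (blk >= desc->nblocks)
        return InvalidBlockNumber;      /* end of the relation */
    st->nextblk += st->step;
    st->lastoff = InvalidOffsetNumber;  /* start the new page from scratch */
    return blk;
}

OffsetNumber
step_nexttuple(TableSampleDesc *desc, BlockNumber blockno, OffsetNumber maxoffset)
{
    StepState *st = desc->tsmdata;

    (void) blockno;
    if (st->lastoff >= maxoffset)
        return InvalidOffsetNumber;     /* end of the page */
    return ++st->lastoff;               /* offsets 1 .. maxoffset */
}

void
step_reset(TableSampleDesc *desc)
{
    StepState *st = desc->tsmdata;

    st->nextblk = 0;
    st->lastoff = InvalidOffsetNumber;
}

void
step_end(TableSampleDesc *desc)
{
    free(desc->tsmdata);                /* pfree() in a real module */
    desc->tsmdata = NULL;
}

/* Hypothetical driver that mimics the call order; returns offsets visited. */
long
drive_scan(TableSampleDesc *desc, OffsetNumber tuples_per_page)
{
    long        visited = 0;
    BlockNumber blk;

    while ((blk = step_nextblock(desc)) != InvalidBlockNumber)
        while (step_nexttuple(desc, blk, tuples_per_page) != InvalidOffsetNumber)
            visited++;
    return visited;
}
```

In this sketch the extra step argument plays the role of the additional tsm_init parameter described above, so a query would supply it in the TABLESAMPLE call. Keeping the per-page offset in the method's own state lets tsm_nexttuple stay a simple counter that is cleared each time a new block is handed out.
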
There is one more function that a tablesample method can implement to gain more fine-grained control over sampling. This function is optional:

    bool tsm_examinetuple (TableSampleDesc *desc, BlockNumber blockno, HeapTuple tuple, bool visible);

This function lets the sampling method examine the contents of a tuple (for example, to collect some internal statistics). Its return value determines whether the tuple is returned to the client. Note that the function also receives invisible tuples, but it is not allowed to return true for such a tuple (if it does, PostgreSQL will raise an error).

As you can see, most of the tablesample method interfaces take a TableSampleDesc as their first parameter. This structure holds the state of the current scan and also provides storage for the tablesample method's own state. It is defined as follows:

    typedef struct TableSampleDesc {
        HeapScanDesc heapScan;
        TupleDesc    tupDesc;
        void        *tsmdata;
    } TableSampleDesc;

Here heapScan is the descriptor of the physical table scan; table size information can be obtained from it. tupDesc is the tuple descriptor of the tuples returned by the scan and passed to the tsm_examinetuple() interface. tsmdata can be used by the tablesample method itself to store any state it needs during the scan. If the method uses it, the memory should be released with pfree() in the tsm_end() function.
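
The rules for tsm_examinetuple() and for the lifetime of tsmdata can be sketched in standalone C. This is only an illustration under stated assumptions: heapScan, tupDesc, and HeapTuple are replaced by opaque stand-ins so that no PostgreSQL headers are needed, malloc() and free() stand in for palloc() and pfree(), and the examine_* functions, the ExamineState struct, and the keep_every parameter are hypothetical names that are not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for PostgreSQL types so the sketch compiles on its own. */
typedef uint32_t BlockNumber;
typedef struct HeapTupleData *HeapTuple;    /* opaque stand-in */

typedef struct TableSampleDesc
{
    void *heapScan;     /* stand-in for HeapScanDesc */
    void *tupDesc;      /* stand-in for TupleDesc */
    void *tsmdata;      /* private state of the sampling method */
} TableSampleDesc;

/* Hypothetical private state: counters plus a simple keep rule. */
typedef struct ExamineState
{
    uint64_t seen;          /* every tuple examined, visible or not */
    uint64_t invisible;     /* tuples that were not visible */
    uint64_t returned;      /* tuples accepted for the client */
    uint32_t keep_every;    /* accept every keep_every-th visible tuple */
} ExamineState;

void
examine_init(TableSampleDesc *desc, uint32_t seed, int32_t keep_every)
{
    ExamineState *st = calloc(1, sizeof(ExamineState)); /* palloc0() in a real module */

    assert(st != NULL && keep_every > 0);
    (void) seed;            /* this method is deterministic, seed is unused */
    st->keep_every = (uint32_t) keep_every;
    desc->tsmdata = st;
}

bool
examine_examinetuple(TableSampleDesc *desc, BlockNumber blockno,
                     HeapTuple tuple, bool visible)
{
    ExamineState *st = desc->tsmdata;

    (void) blockno;
    (void) tuple;           /* a real method could inspect the contents here */

    st->seen++;             /* statistics cover invisible tuples as well */
    if (!visible)
    {
        st->invisible++;
        return false;       /* never accept an invisible tuple */
    }
    if ((st->seen - st->invisible) % st->keep_every != 0)
        return false;
    st->returned++;
    return true;
}

void
examine_end(TableSampleDesc *desc)
{
    free(desc->tsmdata);    /* pfree() in a real module */
    desc->tsmdata = NULL;
}
```

Checking the visible flag before any other decision keeps the method within the rule that true must never be returned for an invisible tuple, while the counters still record every tuple the function was shown.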