Recently, Bayesian principles have been successfully applied to connectionist networks with an eye toward studying the formation of internal representations. Our current work grows out of an unsupervised, generative framework applied to understanding the representations used in visual cortex (Olshausen & Field, 1996) and to discovering the underlying structure in hierarchical visual domains (Lewicki & Sejnowski, 1997). We modified Lewicki and Sejnowski's approach to study how incorporating two specific constraints, context and sparse coding, affects the development of internal representations in networks learning a feature-based alphabet. Analyses of the trained networks show that (1) the standard framework works well for limited data sets but performs more poorly as data sets grow larger; (2) context alone improves performance while developing minimal internal representations; (3) sparse coding alone improves performance but develops internal representations that are somewhat redundant; (4) the combination of context and sparse coding constraints increases network accuracy and forms more robust internal representations, especially for larger data sets. Furthermore, by manipulating the form of the sparse coding constraint, networks can be encouraged to adopt either distributed or local encodings of surface features. Feedback connections in the brain may provide context information to relatively low-level visual areas, thereby informing their ability to discover structure in their inputs.