Programming Languages
PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Thu, 04 Oct 2012 07:53:16 UTC
Contents
Articles
Computer programming
History of programming languages
Comparison of programming languages
Computer program
Programming language
Abstraction
Programmer
Language primitive
Assembly language
Machine code
Source code
Command
Execution
Theory
Programming language theory Type system Strongly typed programming language Weak typing Syntax Scripting language
References
Article Sources and Contributors
Image Sources, Licenses and Contributors
Article Licenses
License
Computer programming
Computer programming (often shortened to programming or coding) is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages (such as Java, C++, C#, and Python). The purpose of programming is to create a set of instructions that computers use to perform specific operations or to exhibit desired behaviors. The process of writing source code often requires expertise in many different subjects, including knowledge of the application domain, specialized algorithms, and formal logic.
Overview
Within software engineering, programming (the implementation) is regarded as one phase in a software development process. There is an ongoing debate on the extent to which the writing of programs is an art form, a craft, or an engineering discipline.[1] In general, good programming is considered to be the measured application of all three, with the goal of producing an efficient and evolvable software solution (the criteria for "efficient" and "evolvable" vary considerably). The discipline differs from many other technical professions in that programmers, in general, do not need to be licensed or pass any standardized (or governmentally regulated) certification tests in order to call themselves "programmers" or even "software engineers." Because the discipline covers many areas, which may or may not include critical applications, it is debatable whether licensing is required for the profession as a whole. In most cases, the discipline is self-governed by the entities which require the programming, and sometimes very strict environments are defined (e.g. United States Air Force use of AdaCore and security clearance). However, representing oneself as a "Professional Software Engineer" without a license from an accredited institution is illegal in many parts of the world.

Another ongoing debate is the extent to which the programming language used in writing computer programs affects the form that the final program takes. This debate is analogous to that surrounding the Sapir-Whorf hypothesis[2] in linguistics and cognitive science, which postulates that a particular spoken language's nature influences the habitual thought of its speakers: different language patterns yield different patterns of thought. This idea challenges the possibility of representing the world perfectly with language, because it acknowledges that the mechanisms of any language condition the thoughts of its speaker community.
History
Ancient cultures had no conception of computing beyond simple arithmetic. The only mechanical device that existed for numerical computation at the beginning of human history was the abacus, invented in Sumeria circa 2500 BC. Later, the Antikythera mechanism, invented some time around 100 BC in ancient Greece, was the first mechanical calculator utilizing gears of various sizes and configuration to perform calculations,[3] which tracked the metonic cycle still used in lunar-to-solar calendars, and which is consistent for calculating the dates of the Olympiads.[4] The Kurdish medieval scientist Al-Jazari built programmable automata in 1206 AD. One system employed in these devices was the use of pegs and cams placed into a wooden drum at specific locations, which would sequentially trigger levers that in turn operated percussion instruments. The output of this device was a small drummer playing various rhythms and drum patterns.[5][6]

(Image caption: Ada Lovelace created the first algorithm designed for processing by a computer and is usually recognized as history's first computer programmer.)

The Jacquard loom, which Joseph Marie Jacquard developed in 1801, uses a series of pasteboard cards with holes punched in them. The hole pattern represented the pattern that the loom had to follow in weaving cloth. The loom could produce entirely different weaves using different sets of cards. Charles Babbage adopted the use of punched cards around 1830 to control his Analytical Engine. The first computer program was written for the Analytical Engine by mathematician Ada Lovelace to calculate a sequence of Bernoulli numbers.[7] The synthesis of numerical calculation, predetermined operation and output, along with a way to organize and input instructions in a manner relatively easy for humans to conceive and produce, led to the modern development of computer programming. Development of computer programming accelerated through the Industrial Revolution.
In the late 1880s, Herman Hollerith invented the recording of data on a medium that could then be read by a machine. Prior uses of machine-readable media had been for control, not data. "After some initial trials with paper tape, he settled on punched cards..."[8] To process these punched cards, first known as "Hollerith cards", he invented the tabulator and the keypunch machines. These three inventions were the foundation of the modern information processing industry. In 1896 he founded the Tabulating Machine Company (which later became the core of IBM). The addition of a control panel (plugboard) to his 1906 Type I Tabulator allowed it to do different jobs without having to be physically rebuilt. By the late 1940s, there were a variety of control panel programmable machines, called unit record equipment, to perform data-processing tasks.

(Image caption: Data and instructions could be stored on external punched cards, which were kept in order and arranged in program decks.)

The invention of the von Neumann architecture allowed computer programs to be stored in computer memory. Early programs had to be painstakingly crafted using the instructions (elementary operations) of the particular machine, often in binary notation. Every model of computer would likely use different instructions (machine language) to do the same task. Later, assembly languages were developed that let the programmer specify each instruction in a text format, entering abbreviations for each operation code instead of a number and specifying addresses in symbolic form (e.g., ADD X, TOTAL). Entering a program in assembly language is usually more convenient, faster, and less prone to human error than using machine language, but because an assembly language is little more than a different notation for a machine language, any two machines with different instruction sets also have different assembly languages.

In 1954, FORTRAN was invented; it was the first high-level programming language to have a functional implementation, as opposed to just a design on paper.[9][10] (A high-level language is, in very general terms, any programming language that allows the programmer to write programs in terms that are more abstract than assembly language instructions, i.e. at a level of abstraction "higher" than that of an assembly language.) It allowed programmers to specify calculations by entering a formula directly (e.g. Y = X*2 + 5*X + 9). The program text, or source, is converted into machine instructions using a special program called a compiler, which translates the FORTRAN program into machine language. In fact, the name FORTRAN stands for "Formula Translation".

(Image caption: Wired control panel for an IBM 402 Accounting Machine.)

Many other languages were developed, including some for commercial programming, such as COBOL. Programs were mostly still entered using punched cards or paper tape. (See computer programming in the punch card era.) By the late 1960s, data storage devices and computer terminals became inexpensive enough that programs could be created by typing directly into the computers. Text editors were developed that allowed changes and corrections to be made much more easily than with punched cards. (Usually, an error in punching a card meant that the card had to be discarded and a new one punched to replace it.)
As time has progressed, computers have made giant leaps in the area of processing power. This has brought about newer programming languages that are more abstracted from the underlying hardware. Popular programming languages of the modern era include C++, C#, Visual Basic, Pascal, HTML, Java/JavaScript, Perl, PHP, SQL and dozens more. Although these high-level languages usually incur greater overhead, the increase in speed of modern computers has made the use of these languages much more practical than in the past. These increasingly abstracted languages typically are easier to learn and allow the programmer to develop applications much more efficiently and with less source code. However, high-level languages are still impractical for a few programs, such as those where low-level hardware control is necessary or where maximum processing speed is vital.

Computer programming has become a popular career in the developed world, particularly in the United States, Europe, Scandinavia, and Japan. Due to the high labor cost of programmers in these countries, some forms of programming have been increasingly subject to offshore outsourcing (importing software and services from other countries, usually at a lower wage), making programming career decisions in developed countries more complicated, while increasing economic opportunities for programmers in less developed areas, particularly China and India.
Modern programming
Quality requirements
Whatever the approach to software development may be, the final program must satisfy some fundamental properties. The following properties are among the most relevant:

Reliability: how often the results of a program are correct. This depends on conceptual correctness of algorithms, and minimization of programming mistakes, such as mistakes in resource management (e.g., buffer overflows and race conditions) and logic errors (such as division by zero or off-by-one errors).
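The off-by-one errors mentioned above are easy to illustrate. As a minimal sketch (the function names are hypothetical), compare a correct summation loop with one that stops one element early:

```python
def sum_first_n(values, n):
    # Correct: range(n) visits indices 0 .. n-1, exactly n elements.
    total = 0
    for i in range(n):
        total += values[i]
    return total

def sum_first_n_buggy(values, n):
    # Off-by-one: range(n - 1) stops one element early.
    total = 0
    for i in range(n - 1):
        total += values[i]
    return total

data = [10, 20, 30, 40]
print(sum_first_n(data, 4))        # 100
print(sum_first_n_buggy(data, 4))  # 60 -- silently wrong, not an error
```

Note that the buggy version raises no exception; it simply produces a wrong answer, which is why such logic errors undermine reliability.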
Robustness: how well a program anticipates problems not due to programmer error. This includes situations such as incorrect, inappropriate or corrupt data, unavailability of needed resources such as memory, operating system services and network connections, and user error.

Usability: the ergonomics of a program: the ease with which a person can use the program for its intended purpose, or in some cases even unanticipated purposes. Such issues can make or break its success even regardless of other issues. This involves a wide range of textual, graphical and sometimes hardware elements that improve the clarity, intuitiveness, cohesiveness and completeness of a program's user interface.

Portability: the range of computer hardware and operating system platforms on which the source code of a program can be compiled/interpreted and run. This depends on differences in the programming facilities provided by the different platforms, including hardware and operating system resources, expected behaviour of the hardware and operating system, and availability of platform-specific compilers (and sometimes libraries) for the language of the source code.

Maintainability: the ease with which a program can be modified by its present or future developers in order to make improvements or customizations, fix bugs and security holes, or adapt it to new environments. Good practices during initial development make the difference in this regard. This quality may not be directly apparent to the end user but it can significantly affect the fate of a program over the long term.

Efficiency/performance: the amount of system resources a program consumes (processor time, memory space, slow devices such as disks, network bandwidth and to some extent even user interaction): the less, the better. This also includes correct disposal of some resources, such as cleaning up temporary files and avoiding memory leaks.
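The robustness property described above often comes down to anticipating bad input. A minimal Python sketch (the record format and function name are hypothetical) shows a routine that survives corrupt data instead of crashing:

```python
def parse_ages(lines):
    # Robust parsing: skip records that are missing, non-numeric, or out of
    # range instead of letting one corrupt line abort the whole run.
    ages = []
    for line in lines:
        try:
            age = int(line.strip())
        except ValueError:
            continue  # corrupt or non-numeric record: skip it
        if 0 <= age <= 150:
            ages.append(age)
    return ages

raw = ["42", "oops", "", "17", "-3", "99"]
print(parse_ages(raw))  # [42, 17, 99]
```

A non-robust version would simply call int() on every line and crash on the first malformed record; the defensive version degrades gracefully.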
Algorithmic complexity
The academic field and the engineering practice of computer programming are both largely concerned with discovering and implementing the most efficient algorithms for a given class of problem. For this purpose, algorithms are classified into orders using so-called Big O notation, which expresses resource use, such as execution time or memory consumption, in terms of the size of an input. Expert programmers are familiar with a variety of well-established algorithms and their respective complexities and use this knowledge to choose algorithms that are best suited to the circumstances.
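As an illustration of the Big O classification above, scanning a sorted list is O(n) while binary search is O(log n); both find the same answer, but the second examines far fewer elements as the input grows. A minimal sketch:

```python
def linear_search(sorted_list, target):
    # O(n): may examine every element before finding the target.
    for i, value in enumerate(sorted_list):
        if value == target:
            return i
    return -1

def binary_search(sorted_list, target):
    # O(log n): halves the remaining search interval at each step.
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = list(range(0, 1000, 2))   # 500 sorted even numbers
print(linear_search(data, 998))  # 499 (after 500 comparisons)
print(binary_search(data, 998))  # 499 (after about 9 comparisons)
```

For 500 elements the difference is modest; for a billion elements, binary search needs about 30 steps while the linear scan may need a billion, which is exactly the distinction Big O notation captures.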
Methodologies
The first step in most formal software development processes is requirements analysis, followed by testing to determine value modeling, implementation, and failure elimination (debugging). Many different approaches exist for each of these tasks. One approach popular for requirements analysis is Use Case analysis. Nowadays many programmers use forms of Agile software development, where the various stages of formal software development are more integrated together into short cycles that take a few weeks rather than years. There are many approaches to the software development process. Popular modeling techniques include Object-Oriented Analysis and Design (OOAD) and Model-Driven Architecture (MDA). The Unified Modeling Language (UML) is a notation used for both OOAD and MDA. A similar technique used for database design is Entity-Relationship Modeling (ER Modeling). Implementation techniques include imperative languages (object-oriented or procedural), functional languages, and logic languages.
Debugging
Debugging is a very important task in the software development process, because an incorrect program can have significant consequences for its users. Some languages are more prone to some kinds of faults because their specification does not require compilers to perform as much checking as other languages do. Use of a static code analysis tool can help detect some possible problems. Debugging is often done with IDEs such as Eclipse, KDevelop, NetBeans, Code::Blocks, and Visual Studio. Standalone debuggers like gdb are also used; these often provide less of a visual environment, usually using a command line.
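Much day-to-day debugging starts from a failing check rather than a debugger session. A minimal Python sketch (the function is hypothetical) shows how a simple self-check localizes a defect before reaching for gdb or an IDE's visual debugger:

```python
def average(values):
    # A classic fault a static checker or test catches early:
    # dividing by a possibly zero length.
    if not values:
        raise ValueError("average() of empty sequence")
    return sum(values) / len(values)

# A quick assertion narrows any fault to this function.
assert average([2, 4, 6]) == 4

# The guarded error case now fails loudly and predictably,
# instead of raising an obscure ZeroDivisionError deep in a call chain.
try:
    average([])
except ValueError as exc:
    print("caught:", exc)
```

Turning a latent crash into an explicit, named error is a common debugging outcome: the defect surfaces at its source rather than propagating.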
(Image caption: The bug from 1947 which is at the origin of a popular (but incorrect) etymology for the common term for a software defect.)
Programming languages
Different programming languages support different styles of programming (called programming paradigms). The choice of language used is subject to many considerations, such as company policy, suitability to task, availability of third-party packages, or individual preference. Ideally, the programming language best suited for the task at hand will be selected. Trade-offs from this ideal involve finding enough programmers who know the language to build a team, the availability of compilers for that language, and the efficiency with which programs written in a given language execute. Languages form an approximate spectrum from "low-level" to "high-level"; "low-level" languages are typically more machine-oriented and faster to execute, whereas "high-level" languages are more abstract and easier to use but execute less quickly. It is usually easier to code in "high-level" languages than in "low-level" ones.

Allen Downey, in his book How To Think Like A Computer Scientist, writes: The details look different in different languages, but a few basic instructions appear in just about every language:

input: Gather data from the keyboard, a file, or some other device.
output: Display data on the screen or send data to a file or other device.
arithmetic: Perform basic arithmetical operations like addition and multiplication.
conditional execution: Check for certain conditions and execute the appropriate sequence of statements.
repetition: Perform some action repeatedly, usually with some variation.

Many computer languages provide a mechanism to call functions provided by libraries such as in a .so. Provided the functions in a library follow the appropriate run-time conventions (e.g., method of passing arguments), then these functions may be written in any other language.
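The five basic instructions listed above can all be seen in a few lines. A minimal Python sketch, with a hard-coded list standing in for keyboard input:

```python
# Downey's five basic instructions, together in one short program.
numbers = [3, 5, 8]           # input: here gathered from a list instead of the keyboard

total = 0
for n in numbers:             # repetition: act on each value in turn
    total += n                # arithmetic: basic addition

if total > 10:                # conditional execution: branch on a condition
    message = "large total: " + str(total)
else:
    message = "small total: " + str(total)

print(message)                # output: display data on the screen
```

Every mainstream language offers some spelling of these same five operations; only the syntax differs.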
Programmers
Computer programmers are those who write computer software. Their jobs usually involve:

Coding
Compilation
Debugging
Documentation
Integration
Maintenance
Requirements analysis
Software architecture
Software testing
Specification
References
[1] Paul Graham (2003). Hackers and Painters (http://www.paulgraham.com/hp.html). Retrieved 2006-08-22.
[2] Kenneth E. Iverson, the originator of the APL programming language, believed that the Sapir-Whorf hypothesis applied to computer languages (without actually mentioning the hypothesis by name). His Turing Award lecture, "Notation as a tool of thought", was devoted to this theme, arguing that more powerful notations aided thinking about computer algorithms. Iverson K.E., "Notation as a tool of thought (http://elliscave.com/APL_J/tool.pdf)", Communications of the ACM, 23: 444-465 (August 1980).
[3] "Ancient Greek Computer's Inner Workings Deciphered (http://news.nationalgeographic.com/news/2006/11/061129-ancient-greece.html)". National Geographic News. November 29, 2006.
[4] Freeth, Tony; Jones, Alexander; Steele, John M.; Bitsakis, Yanis (July 31, 2008). "Calendars with Olympiad display and eclipse prediction on the Antikythera Mechanism" (http://www.nature.com/nature/journal/v454/n7204/full/nature07130.html). Nature 454 (7204): 614-617. doi:10.1038/nature07130. PMID 18668103.
[5] A 13th Century Programmable Robot (http://www.shef.ac.uk/marcoms/eview/articles58/robot.html), University of Sheffield.
[6] Fowler, Charles B. (October 1967). "The Museum of Music: A History of Mechanical Instruments". Music Educators Journal 54 (2): 45-49. doi:10.2307/3391092. JSTOR 3391092.
[7] Fuegi, J.; Francis, J. (2003). "Lovelace & Babbage and the creation of the 1843 'notes'". IEEE Annals of the History of Computing 25 (4): 16. doi:10.1109/MAHC.2003.1253887.
[8] "Columbia University Computing History - Herman Hollerith" (http://www.columbia.edu/acis/history/hollerith.html). Columbia.edu. Retrieved 2010-04-25.
[9] "Fortran creator John Backus dies - Tech and gadgets - msnbc.com" (http://www.msnbc.msn.com/id/17704662/). MSNBC, 2007-03-20. Retrieved 2010-04-25.
[10] "CSC-302 99S : Class 02: A Brief History of Programming Languages" (http://www.math.grin.edu/~rebelsky/Courses/CS302/99S/Outlines/outline.02.html). Math.grin.edu. Retrieved 2010-04-25.
[11] James L. Elshoff, Michael Marcotty, "Improving computer program readability to aid modification" (http://doi.acm.org/10.1145/358589.358596), Communications of the ACM, v.25 n.8, p.512-521, Aug 1982.
[12] Multiple (wiki). "Readability" (http://docforge.com/wiki/Readability). Docforge. Retrieved 2010-01-30.
[13] Survey of job advertisements mentioning a given language (http://www.computerweekly.com/Articles/2007/09/11/226631/sslcomputer-weekly-it-salary-survey-finance-boom-drives-it-job.htm)
Further reading
A.K. Hartmann, Practical Guide to Computer Simulations (http://www.worldscibooks.com/physics/6988.html), Singapore: World Scientific (2009)
A. Hunt, D. Thomas, and W. Cunningham, The Pragmatic Programmer. From Journeyman to Master, Amsterdam: Addison-Wesley Longman (1999)
Brian W. Kernighan, The Practice of Programming, Pearson (1999)
Weinberg, Gerald M., The Psychology of Computer Programming, New York: Van Nostrand Reinhold (1971)
External links
Software engineering (http://www.dmoz.org/Computers/Software/Software_Engineering//) at the Open Directory Project
Programming Wikia (http://programming.wikia.com/wiki/Main_Page)
How to Think Like a Computer Scientist (http://openbookproject.net/thinkcs) - by Jeffrey Elkner, Allen B. Downey and Chris Meyers
History of programming languages

Before 1940
The first programming languages predate the modern computer. At first, the languages were codes. The Jacquard loom, invented in 1801, used holes in punched cards to represent sewing loom arm movements in order to generate decorative patterns automatically.

During a nine-month period in 1842-1843, Ada Lovelace translated the memoir of Italian mathematician Luigi Menabrea about Charles Babbage's newest proposed machine, the Analytical Engine. With the article, she appended a set of notes which specified in complete detail a method for calculating Bernoulli numbers with the Engine, recognized by some historians as the world's first computer program.[1]

Herman Hollerith realized that he could encode information on punch cards when he observed that train conductors encode the appearance of the ticket holders on the train tickets using the position of punched holes on the tickets. Hollerith then encoded the 1890 census data on punch cards.

The first computer codes were specialized for their applications. In the first decades of the 20th century, numerical calculations were based on decimal numbers. Eventually it was realized that logic could be represented with numbers, not only with words. For example, Alonzo Church was able to express the lambda calculus in a formulaic way. The Turing machine was an abstraction of the operation of a tape-marking machine, for example, in use at the telephone companies. Turing machines set the basis for storage of programs as data in the von Neumann architecture of computers by representing a machine through a finite number. However, unlike the lambda calculus, Turing's code does not serve well as a basis for higher-level languages; its principal use is in rigorous analyses of algorithmic complexity.

Like many "firsts" in history, the first modern programming language is hard to identify. From the start, the restrictions of the hardware defined the language.
Punch cards allowed 80 columns, but some of the columns had to be used for a sorting number on each card. FORTRAN included some keywords which were the same as English words, such as "IF", "GOTO" (go to) and "CONTINUE". The use of a magnetic drum for memory meant that computer programs also had to be interleaved with the rotations of the drum. Thus the programs were more hardware-dependent. To some people, what was the first modern programming language depends on how much power and human-readability is required before the status of "programming language" is granted. Jacquard looms and Charles Babbage's Difference Engine both had simple, extremely limited languages for describing the actions that these machines should perform. One can even regard the punch holes on a player piano scroll as a limited domain-specific language, albeit not designed for human consumption.
The 1940s
In the 1940s, the first recognizably modern, electrically powered computers were created. The limited speed and memory capacity forced programmers to write hand-tuned assembly language programs. It was eventually realized that programming in assembly language required a great deal of intellectual effort and was error-prone. In 1948, Konrad Zuse published a paper about his programming language Plankalkül. However, it was not implemented in his lifetime and his original contributions were isolated from other developments.

Some important languages that were developed in this period include:

1943 - Plankalkül (Konrad Zuse), designed, but unimplemented for a half-century
1943 - ENIAC coding system, machine-specific codeset appearing in 1948.[2]
1949-1954 - a series of machine-specific mnemonic instruction sets, like ENIAC's, beginning in 1949 with C-10 for BINAC (which later evolved into UNIVAC).[3] Each codeset, or instruction set, was tailored to a specific manufacturer.
1952 - Autocode
1954 - IPL (forerunner to LISP)
1955 - FLOW-MATIC (forerunner to COBOL)
1957 - FORTRAN (first compiler)
1957 - COMTRAN (forerunner to COBOL)
1958 - LISP
1958 - ALGOL 58
1959 - FACT (forerunner to COBOL)
1959 - COBOL
1959 - RPG
1962 - APL
1962 - Simula
1962 - SNOBOL
1963 - CPL (forerunner to C)
1964 - BASIC
1964 - PL/I
1967 - BCPL (forerunner to C)
1975 - Scheme
1978 - SQL (initially only a query language, later extended with programming constructs)
object-oriented. These included Object Pascal, Visual Basic, and Java. Java in particular received much attention. More radical and innovative than the RAD languages were the new scripting languages. These did not directly descend from other languages and featured new syntaxes and more liberal incorporation of features. Many consider these scripting languages to be more productive than even the RAD languages, but often because of choices that make small programs simpler but large programs more difficult to write and maintain. Nevertheless, scripting languages came to be the most prominent ones used in connection with the Web.

Some important languages that were developed in this period include:

1990 - Haskell
1991 - Python
1991 - Visual Basic
1991 - HTML (mark-up language)
1993 - Ruby
1993 - Lua
1994 - CLOS (part of ANSI Common Lisp)
1995 - Java
1995 - Delphi (Object Pascal)
1995 - JavaScript
1995 - PHP
1996 - WebDNA
1997 - Rebol
1999 - D
Current trends
Programming language evolution continues, in both industry and research. Some of the current trends include:

Increasing support for functional programming in mainstream languages used commercially, including pure functional programming for making code easier to reason about and easier to parallelise (at both micro- and macro-levels).
Constructs to support concurrent and distributed programming.
Mechanisms for adding security and reliability verification to the language: extended static checking, information flow control, static thread safety.
Alternative mechanisms for modularity: mixins, delegates, aspects.
Component-oriented software development.
Metaprogramming, reflection or access to the abstract syntax tree.
Increased emphasis on distribution and mobility.
Integration with databases, including XML and relational databases.
Support for Unicode so that source code (program text) is not restricted to those characters contained in the ASCII character set, allowing, for example, use of non-Latin-based scripts or extended punctuation.
XML for graphical interfaces (XUL, XAML).
Open source as a developmental philosophy for languages, including the GNU Compiler Collection and recent languages such as Python, Ruby, and Squeak.
Aspect-oriented programming (AOP), allowing developers to insert extended behaviors at designated places in code.
Massively parallel languages for coding 2000-processor GPUs (graphics processing units) and supercomputer arrays, including OpenCL.

Some important languages developed during this period include:

2000 - ActionScript
2001 - C#
2001 - Visual Basic .NET
2002 - F#
2003 - Groovy
2003 - Scala
2003 - Factor
2007 - Clojure
2009 - Go
2011 - Dart
References
[1] J. Fuegi and J. Francis (October-December 2003), "Lovelace & Babbage and the creation of the 1843 'notes'.", Annals of the History of Computing 25 (4), doi:10.1109/MAHC.2003.1253887
[2] R. F. Clippinger (1948), A Logical Coding System Applied to the ENIAC (http://ftp.arl.mil/mike/comphist/48eniac-coding/)
[3] C-10 (http://hopl.murdoch.edu.au/showlanguage2.prx?exp=5442)
Further reading
Rosen, Saul (editor), Programming Systems and Languages, McGraw-Hill, 1967
Sammet, Jean E., Programming Languages: History and Fundamentals, Prentice-Hall, 1969
Sammet, Jean E., "Programming Languages: History and Future", Communications of the ACM, Volume 15, Number 7, July 1972
Richard L. Wexelblat (ed.): History of Programming Languages, Academic Press, 1981
Thomas J. Bergin and Richard G. Gibson (eds.): History of Programming Languages, Addison-Wesley, 1996
External links
History and evolution of programming languages (http://www.scriptol.com/programming/history.php)
Graph of programming language history (http://www.levenez.com/lang/history.html)
Comparison of programming languages

Type identifiers
Integers
The table below lists, for each language, the type identifiers for signed / unsigned integers of each width (8-bit, 16-bit, 32-bit, 64-bit), followed by the word-size and arbitrarily precise (bignum) types; N/A means the language provides no such type.

ALGOL 68 (variable-width): sizes are implementation-dependent[c]; shorter and longer integers are formed with the "short" and "long" prefixes[a]; word size int / N/A; bignum long long int[g]
C (C99 fixed-width), C++ (C++11 fixed-width): 8-bit int8_t / uint8_t; 16-bit int16_t / uint16_t; 32-bit int32_t / uint32_t; 64-bit int64_t / uint64_t; word size N/A; bignum N/A
C (C99 variable-width), C++ (C++11 variable-width), Objective-C: 8-bit signed char / unsigned char; 16-bit short[c] / unsigned short[c]; 32-bit long[c] / unsigned long[c]; 64-bit long long[c] / unsigned long long[c]; word size int (Objective-C: int or NSInteger) / unsigned int; bignum N/A
C#: 8-bit sbyte / byte; 16-bit short / ushort; 32-bit int / uint; 64-bit long / ulong; word size N/A; bignum N/A
Java: 8-bit byte; 16-bit short; 32-bit int; 64-bit long (no unsigned types); word size N/A; bignum java.math.BigInteger
Go: 8-bit int8 / uint8 (byte); 16-bit int16 / uint16; 32-bit int32 / uint32; 64-bit int64 / uint64; word size int / uint; bignum big.Int
D: 8-bit byte / ubyte; 16-bit short / ushort; 32-bit int / uint; 64-bit long / ulong; word size N/A; bignum BigInt
Common Lisp[1]: (signed-byte n) / (unsigned-byte n) for each width; bignum bignum
Scheme: no fixed-width identifiers; exact integers are arbitrarily precise
ISLISP[2]: no fixed-width identifiers; integers are arbitrarily precise (bignum)
Pascal (FPC): 8-bit shortint / byte; 16-bit smallint / word; 32-bit longint / longword; 64-bit int64 / qword; word size integer / cardinal; bignum N/A
Visual Basic: 8-bit N/A / Byte; 16-bit Integer / N/A; 32-bit Long / N/A; 64-bit N/A; bignum N/A
Visual Basic .NET: 8-bit SByte / Byte; 16-bit Short / UShort; 32-bit Integer / UInteger; 64-bit Long / ULong; bignum N/A
Python 2.x: no fixed-width identifiers; word size int; bignum long
Python 3.x: no fixed-width identifiers; bignum int
S-Lang: no fixed-width identifiers; word size integers only
Fortran: INTEGER(KIND = n)[f] for each width; bignum N/A
PHP: word size int; no unsigned or fixed-width types; arbitrary precision via the BCMath or GMP libraries[e]
Perl 5: no distinct integer types[d]; arbitrary precision via Math::BigInt
Smalltalk: word size SmallInteger[i]; bignum LargeInteger[i]
Windows PowerShell: N/A
OCaml: 32-bit int32; 64-bit int64; word size int or nativeint; bignum big_int (open Big_int)
F#: 8-bit sbyte / byte; 16-bit int16 / uint16; 32-bit int32 or int / uint32; 64-bit int64 / uint64; word size nativeint / unativeint; bignum bigint
Standard ML: 8-bit N/A / Word8.word; 32-bit Int32.int / Word32.word; 64-bit Int64.int / Word64.word; word size int / word; bignum LargeInt.int or IntInf.int
Haskell (GHC): 32-bit Int32 / Word32; 64-bit Int64 / Word64 (via import Int and import Word); word size Int / Word; bignum Integer
Eiffel: 8-bit INTEGER_8 / NATURAL_8; 16-bit INTEGER_16 / NATURAL_16; 32-bit INTEGER_32 / NATURAL_32; 64-bit INTEGER_64 / NATURAL_64; word size INTEGER / NATURAL; bignum N/A
COBOL[h]: 8-bit BINARY-CHAR SIGNED / BINARY-CHAR UNSIGNED; 16-bit BINARY-SHORT SIGNED / BINARY-SHORT UNSIGNED; 32-bit BINARY-LONG SIGNED / BINARY-LONG UNSIGNED; 64-bit BINARY-DOUBLE SIGNED / BINARY-DOUBLE UNSIGNED; bignum N/A
a - The standard constants int shorts and int lengths can be used to determine how many "short"s and "long"s can be usefully prefixed to short int and long int. The actual sizes of short int, int and long int are available as the constants short max int, max int, long max int, etc.
b - Commonly used for characters.
c - The ALGOL 68, C and C++ languages do not specify the exact width of the integer types "short", "int", "long", and (C99, C++11) "long long", so they are implementation-dependent. In C and C++ the "short", "long", and "long long" types are required to be at least 16, 32, and 64 bits wide, respectively, but can be more. The "int" type is required to be at least as wide as "short" and at most as wide as "long", and is typically the width of the word size on the processor of the machine (i.e. on a 32-bit machine it is often 32 bits wide; on 64-bit machines it is often 64 bits wide). C99 and C++11 also define the "[u]intN_t" exact-width types in the stdint.h header. See C syntax#Integral types for more information.
d - Perl 5 does not have distinct types. Integers, floating point numbers, strings, etc. are all considered "scalars".
e - PHP has two arbitrary-precision libraries. The BCMath library just uses strings as its datatype. The GMP library uses an internal "resource" type.
f - The value of "n" is provided by the SELECTED_INT_KIND[3] intrinsic function.
g - ALGOL 68G's run time option --precision "number" can set the precision of long long ints to the required "number" of significant digits. The standard constants long long int width and long long max int can be used to determine the actual precision.
h - COBOL allows the specification of a required precision and will automatically select an available type capable of representing the specified precision. "PIC S9999", for example, would require a signed variable of four decimal digits of precision. If specified as a binary field, this would select a 16-bit signed type on most platforms.
i - Smalltalk automatically chooses an appropriate representation for integral numbers. Typically, two representations are present, one for integers fitting the native word size minus any tag bit (SmallInteger) and one supporting arbitrarily sized integers (LargeInteger). Arithmetic operations support polymorphic arguments and return the result in the most appropriate compact representation.
Floating point
Single precision ALGOL 68 C Objective-C C++ (STL) C# Java Go D Common Lisp float32 float float64 double real float N/A real[a] float[b] Double precision long real[a] double N/A[b] Processor dependent short real etc. & long long real etc.[d]
Scheme ISLISP Pascal (Free Pascal) Visual Basic Visual Basic .NET Python JavaScript S-Lang Fortran PHP Perl Perl 6 Ruby Smalltalk Windows PowerShell OCaml F# Standard ML Haskell (GHC) Eiffel COBOL[e] Float REAL_32 N/A float32 N/A real Double REAL_64 float N/A Float num32 N/A num64 Float Double N/A Num REAL(KIND = n)[c] float N/A Number float [4] N/A single Single double Double N/A real
FLOAT-SHORT FLOAT-LONG
a The standard constants real shorts and real lengths can be used to determine how many shorts and longs can be usefully prefixed to short real and long real. The actual size of short real, real and long real is available as the constants short max real, max real and long max real, etc., with the constants short small real, small real and long small real available as each type's machine epsilon.
b Declarations of single precision are often not honored.
c The value of "n" is provided by the SELECTED_REAL_KIND[5] intrinsic function.
d ALGOL 68G's run-time option --precision "number" can set the precision of long long reals to the required "number" of significant digits. The standard constants long long real width and long long max real can be used to determine the actual precision.
e COBOL also supports FLOAT-EXTENDED. The types FLOAT-BINARY-7, FLOAT-BINARY-16 and FLOAT-BINARY-34 specify IEEE 754 binary floating-point variables, and FLOAT-DECIMAL-16 and FLOAT-DECIMAL-34 specify IEEE 754 decimal floating-point variables.
Complex numbers
Integer ALGOL 68 C (C99) [6] N/A N/A N/A N/A N/A N/A N/A N/A
Single precision compl float complex std::complex<float> N/A N/A complex64 cfloat N/A
Double precision long compl etc. double complex std::complex<double> System.Numerics.Complex (.Net 4.0) N/A complex128 cdouble N/A
Half and Quadruple precision etc. short compl etc. & long long compl etc.
C++ (STL) C#
N/A
Java Go D Objective-C Common Lisp Scheme Pascal Visual Basic Visual Basic .NET
N/A N/A N/A System.Numerics.Complex (.Net 4.0) Math::Complex complex64 complex128 complex Complex
Perl Perl 6 Python JavaScript S-Lang Fortran Ruby Smalltalk Windows PowerShell OCaml F# Complex Complex Complex N/A N/A N/A N/A N/A N/A N/A N/A
N/A
N/A
Text Character ALGOL 68 C (C99) C++ (STL) Objective-C C# Java Go rune char char wchar_t unichar char String[a] string & bytes N/A std::string NSString * string String string
Boolean
Enumeration
Object/Universal
[7]
N/A
enum name {item1, item2, void * ... }; id enum name {item1, item2, ... } const ( ) item1 = iota item2 ... object Object interface{}
char
string
bool
std.variant.Variant
Common Lisp Scheme ISLISP Pascal (ISO) Object Pascal (Delphi) Visual Basic N/A char string N/A boolean (item1, item2, ...) variant N/A
String
Boolean
Variant
Char
item
2
Object
...
End Enum Python JavaScript S-Lang Fortran CHARACTER(LEN = *) N/A[d] N/A[d] Char Str Bool enum name <item1 item2 ...> or enum name <<:item1(value) :item2(value) ...>> Mu CHARACTER(LEN = :), allocatable string LOGICAL(KIND = n)[f] bool CLASS(*) N/A[d] N/A[d] str String bool Boolean object Object
object
Ruby
N/A[d]
String
Object[c]
Object
Windows PowerShell OCaml F# char string bool N/A[e] type name = item1 = value | obj item2 = value | ... N/A[e] Char CHARACTER String STRING Bool BOOLEAN N/A[e] N/A ANY N/A N/A N/A
a Specifically, strings of arbitrary length that are automatically managed.
b This language represents a boolean as an integer, where false is represented as zero and true by any non-zero value.
c All values evaluate to either true or false: everything in TrueClass evaluates to true and everything in FalseClass evaluates to false.
d This language does not have a separate character type; characters are represented as strings of length 1.
e Enumerations in this language are algebraic types with only nullary constructors.
f The value of "n" is provided by the SELECTED_INT_KIND[3] intrinsic function.
Derived types
Array
Further information: Comparison of programming languages (array)
fixed size array one-dimensional array multi-dimensional array one-dimensional array dynamic size array multi-dimensional array
ALGOL 68 [first:last]modename [first1:last1,first2:last2]modename flex[first:last]modename or simply: or simply: or flex[size]modename [size]modename [first1:last1][first2:last2]modename etc. C (C99) C++ (STL)
[a] [a]
std::vector<type>
C#
Java D Go
Array[d]
ISLISP Pascal Object Pascal (Delphi) Visual Basic Visual Basic .NET System.Collections.ArrayList or System.Collections.Generic.List(Of type) list array[first..last] of type[c] array[first1..last1] of array[first2..last2] ... of type [c] or array[first1..last1, first2..last2, ...] of type [c] N/A array of type N/A array of array ... of type
Python S-Lang Fortran PHP Perl Perl 6 Ruby Smalltalk Array type[,,...] type :: name(size) type :: name(size1, size2,...)
a In most expressions (except as the operand of the sizeof and & operators), a value of array type in C is automatically converted to a pointer to its first element. C's arrays cannot be described in this format; see C syntax#Arrays.
b The C-like "type x[]" works in Java; however, "type[] x" is the preferred form of array declaration.
c Subranges are used to define the bounds of the array.
d JavaScript's arrays are a special kind of object.
Other types
Simple composite types Records ALGOL 68 struct (modename fieldname, ...); struct name {type name;...}; N/A Objective-C C++ C# Java JavaScript D struct name {type name;...} struct name {type name;...};[b] struct name {type name;...} N/A[a] N/A std::tuple<type1..typen> Tuple expression
Unions
C (C99)
N/A
N/A
std.variant.Algebraic!(type,...)
Go
Common Lisp Scheme ISLISP Pascal record name : type ; ... N/A
N/A
N/A
end
end Visual Basic Visual Basic .NET Structure name Dim name As type ...
End Structure Python S-Lang Fortran N/A[a] struct {name [=value], ...} TYPE name type :: name ... (val1, val2, val3, ... ) N/A
END TYPE
Windows PowerShell OCaml F# Standard ML type name = {mutable name : type;...} type name = {name : type,...} (val1, val2, val3, ... ) type name = Foo of type | Bar of type | ... datatype name = Foo of type | Bar of type | ... data Name = Foo types | Bar types | ... N/A
Haskell
a Only classes are supported.
b structs in C++ are actually classes, but with default public visibility, and are also POD objects. C++11 extended this further, making classes act identically to POD objects in many more cases.
c pair only.
d Although Perl doesn't have records, its type system allows different data types to coexist in an array, so "hashes" (associative arrays) without a variable index effectively behave like records.
e Enumerations in this language are algebraic types with only nullary constructors.
const type name = value; type name = initial_value; or var name = value; type name = initial_value; or auto name = value; type name = initial_value; var name = initial_value; var name type = initial_value or name := initial_value (defparameter name initial_value) or (defvar name initial_value) or (setf (symbol-value 'symbol) initial_value) (define name initial_value) (defglobal name initial_value) or (defdynamic name initial_value) (defconstant name value) N/A const type name = value; or readonly type name = value; const type name = value; or immutable type name = value; final type name = value; N/A const name = value; const name type = initial_value (defconstant name value) type synonym type using synonym = type;
Java JavaScript Go
Common Lisp
Scheme ISLISP
name = value Const name As type = value Imports synonym = type N/A synonym = type[b] typedef struct {...} typename type, PARAMETER :: name = value define("name", value); const name = value (5.3+)
[c]
name: type = initial_value Dim name As type Dim name As type= initial_value name = initial_value name = initial_value;
synonym = type
Fortran
type name
PHP
$name = initial_value;
N/A
Perl Perl 6
my $name = initial_value;
use constant name => value; my type constant name = value; Name = value ::synonym ::= type
synonym = type[b]
let name : type ref = ref value[d] let mutable name : type = value val name : type ref = ref value[d]
val name : type = value name::type; name = value type Synonym = type
VARIABLE name (in some systems use value VARIABLE name instead)
a Pascal has declaration blocks. See Comparison of programming languages (basic instructions)#Functions.
b Types are just regular objects, so they can simply be assigned.
c In Perl, the "my" keyword scopes the variable into the block.
d Technically, this does not declare name to be a mutable variable: in ML, all names can only be bound once. Rather, it declares name to point to a "reference" data structure, which is a simple mutable cell. The data structure can then be read and written using the ! and := operators, respectively.
Control flow
Conditional statements
select case case switch in statements, statements,... out statements esac ( variable | statements,... | statements ) switch (variable) { case case1 : instructions break; ... default: instructions
( condition | statements | statements ) C (C99) Objective-C C++ (STL) D Java JavaScript PHP C# if (condition) {instructions} else {instructions}
( condition | statements |: condition | statements ) if (condition) {instructions} else if (condition) {instructions} ... else {instructions}
switch (variable) { case case1 : instructions ; jump statement ; ... default: instructions ; jump statement ;
} Windows PowerShell if (condition) { instructions } elseif (condition) { instructions } ... else { instructions }
if condition {instructions} else if condition {instructions} ... else {instructions} or switch { case condition : instructions ... default: instructions switch variable { case case1 : instructions ... default: instructions
Go
} Perl if (condition) {instructions} else {instructions} or unless (notcondition) {instructions} else {instructions}
if (condition) {instructions} elsif (condition) {instructions} ... else {instructions} or unless (notcondition) {instructions} elsif (condition) {instructions} ... else {instructions}
use feature "switch"; ... given (variable) { } when ( case1 ){ instructions } ... default { instructions }
Perl 6
Ruby
if condition instructions
if condition instructions
else instructions
end
end
condition ifTrue: trueBlockifFalse: falseBlock
Smalltalk
ifFalse: falseBlock
end Common Lisp (when condition instructions ) (cond (condition1 instructions) ( condition2 instructions ) ... (t instructions ) ) (case expression ( case1 instructions ) ( case2 instructions ) ... (otherwise instructions ) ) (if condition valueIfTrue valueIfFalse)
(cond (condition1 instructions) ( condition2 instructions ) ... (t instructions ) ) (case expression ( case1 instructions ) ( case2 instructions ) ... (t instructions ) ) (if condition valueIfTrue valueIfFalse)
ISLISP
Pascal
end
end[c]
Visual Basic
End If
End Select
End If Python [a] if condition : Tab instructions else: Tab instructions if condition : Tab instructions elif condition : Tab instructions ... else: Tab instructions if (condition) { instructions } else if (condition) { instructions } ... else { instructions } IF (condition) THEN instructions valueIfTrue if condition else valueIfFalse (Python 2.5+) N/A
S-Lang
switch (variable) { case case1: instructions } { case case2: instructions } ... SELECT CASE(variable) CASE ( case1 ) instructions ... CASE DEFAULT instructions
Fortran
ELSE instructions
ENDIF
ENDIF
END SELECT
condition IF instructions ELSE condition IF instructions THEN THEN value CASE case OF instructions ENDOF case OF instructions ENDOF default instructions ENDCASE match value with pattern1 -> expression | pattern2 -> expression ... | _ -> expression[b] condition IF valueIfTrue ELSE valueIfFalse THEN
Forth
OCaml
if condition then begin instructions end else if condition then begin instructions end ... else begin instructions end if condition then Tab instructions elif condition then Tab instructions ... else Tab instructions if condition then (instructions ) else if condition then ( instructions ) ... else ( instructions )
F#
Standard ML
case value of pattern1 => expression | pattern2 => expression ... | _ => expression[b]
Haskell (GHC) if condition then expression else expression or when condition (do instructions) or unless notcondition (do instructions) result | condition = expression | condition = expression | otherwise = expression
case value of { }[b] pattern1 -> expression ; pattern2 -> expression ; ... _ -> expression
if
else if
select case
conditional expression
a A single instruction can be written on the same line following the colon. Multiple instructions are grouped into a block that starts on a new line (the indentation is required). The conditional expression syntax does not follow this rule.
b This is pattern matching, which is similar to select case but not the same; it is usually used to deconstruct algebraic data types.
c In languages of the Pascal family, the semicolon is not part of the statement: it is a separator between statements, not a terminator.
Loop statements
while ALGOL 68 do while for i = first to last foreach for key to upb list do typename val=list[key]; statements od
for index from first by increment to last while condition do statements od while condition do statements od while statements; condition do statements od do { instructions } while (condition) for index from first by increment to last do statements od for (type i = first; i <= last; ++i) { instructions }
N/A for (type item in set) { instructions } std::for_each(start, end, function) (C++11) for (type item : set) { instructions } foreach (type item in set) { instructions } for (type item : set) { instructions }
C#
for (var index in set) { instructions } or for each (var item in set) { instructions } (JS 1.6+) foreach (set as item) { instructions } or foreach (set as key => item) { instructions }
PHP
foreach (range(first, last-1) as $i) { instructions } or for ($i = first; $i <= last; $i++) { instructions }
Windows PowerShell D
for ($i = first; $i -le last; $i++) { foreach (item in set) { instructions instructions } using item } foreach (i; first ... last) { instructions } for condition { instructions } while (condition) { instructions } or until (notcondition) { instructions } while condition { instructions } or until notcondition { instructions } while condition instructions do { instructions } while (condition) or do { instructions } until (notcondition) repeat { instructions } while condition or repeat { instructions } until notcondition begin instructions for i := first; i <= last; i++ { instructions } foreach $i (0 .. N-1) { instructions } or for ($i = first; $i <= last; $i++) { instructions } foreach (type item; set) { instructions } for key, item := range set { instructions } foreach $item (set) { instructions }
Go
Perl
Perl 6
for first..last -> $i { instructions } for set -> $item { instructions } or loop ($i = first; $i <= last; $i++) { instructions } for i in first...last instructions for item in set instructions
Ruby
end until notcondition loopBlock doWhile: conditionBlock first to: last do: loopBlock collection do: loopBlock
(loop do instructions while condition ) (loop for i from first to last by 1 do instructions ) (loop for item in set do instructions )
Common Lisp
or (dotimes (i N) instructions )
or (do ((i first (1+ i))) ((>= i last)) Scheme (do () (notcondition) instructions) or (let loop () (if condition (begin instructions (loop)))) (while condition instructions) while condition do begin instructions (let loop () (instructions (if condition (loop)))) instructions )
(do ((i first (+ i 1))) ((>= i last)) instructions) or (let loop ((i first)) (if (< i last) (begin instructions (loop (+ i 1)))))
ISLISP
(for ((i first (+ i 1))) ((>= i last)) (mapc (lambda (item) instructions) instructions) list) for i := first step 1 to last do begin instructions end;[a]
Pascal
end
until notcondition;
N/A
Visual Basic
Do instructions
Loop Python while condition : Tab instructions else: Tab instructions while (condition) { instructions } then optional-block DO WHILE (condition) instructions
Next item
Next i
N/A
for i in range(first, last): Tab instructions else: Tab instructions for (i = first; i < last; i++) { instructions } then optional-block DO I = first,last instructions
for item in set: Tab instructions else: Tab instructions foreach item(set) using (what) { instructions } then optional-block
S-Lang
Fortran
ENDDO
ENDDO
N/A
ENDDO
BEGIN instructions condition UNTIL limit start DO instructions LOOP
Forth
BEGIN instructions condition WHILE instructions REPEAT while condition do instructions done
N/A
OCaml
N/A
Array.iter (fun item -> instructions) array List.iter (fun item -> instructions) list for item in set do Tab instructions or Seq.iter (fun item -> instructions) set Array.app (fn item => instructions) array app (fn item => instructions) list Control.Monad.forM_ list (\item -> do instructions)
F#
Standard ML
N/A
N/A
until condition
loop instructions
end
a "step n" is used to change the loop interval. If "step" is omitted, then the loop interval is 1.
Exceptions
Further information: Exception handling syntax
throw C (C99) longjmp(state, exception); throw exception; handler switch (setjmp(state)) { case 0: instructions break; case exception: instructions ... } try { instructions } catch (exception) { instructions } ... try { instructions } catch (exception) { instructions } ... finally { instructions } try { instructions } catch (exception) { instructions } ... finally { instructions } try { instructions } catch (exception) { instructions } finally { instructions } try { instructions } catch (exception) { instructions } ... finally { instructions } try { instructions } catch (exception) { instructions } ... try { instructions } catch exception { instructions } ... finally { instructions } trap [exception] { instructions } ... instructions or try { instructions } catch [exception] { instructions } ... finally { instructions } Debug.Assert(condition); assertion assert(condition);
C++ (STL) C#
Java
assert condition;
JavaScript
assert(condition);
PHP S-Lang
assert(condition); ?
Windows PowerShell
[Debug]::Assert(condition)
NSAssert(condition, description); ? ?
Objective-C
@throw exception;
@try { instructions } @catch (exception) { instructions } ... @finally { instructions } eval { instructions }; if ($@) { instructions } try { instructions CATCH { when exception { instructions } ...}}
die exception;
raise exception
begin instructions
ensure instructions
end Smalltalk exception raise instructionBlock on: exception do: handlerBlock assert: conditionBlock (assert condition) or (assert condition ( place ) error )
Common Lisp (error "exception") or (handler-case (error (make-condition (progn type instructions arguments ) )) ( exception instructions ) ... )
Pascal
Debug.Assert condition
Visual Basic
Err.Raise ERRORNUMBER
End Select: End With '*** Try class *** Private mstrDescription As String Private mlngNumber As Long Public Sub Catch() mstrDescription = Err.Description mlngNumber = Err.Number
End Property [8] Visual Basic .NET Throw exception Try instructions Debug.Assert(condition)
Tab instructions
except exception: Tab instructions ... else: Tab instructions finally: Tab instructions Fortran Forth OCaml F# code THROW raise exception xt CATCH ( code or 0 ) try expression with pattern -> expression ... try expression with pattern -> expression ... or try expression finally expression raise exception arg throw exception or throwError expression expression handle pattern => expression ... catch tryExpression catchExpression or catchError tryExpression catchExpression assert condition expression N/A N/A assert condition
a Common Lisp allows with-simple-restart, restart-case and restart-bind to define restarts for use with invoke-restart. Unhandled conditions may cause the implementation to show a restarts menu to the user before unwinding the stack.
ALGOL 68
label: ...
go to label; yield(value) (Callback [9] ... example ) goto label; ... label; ... goto label; N/A
C (C99) Objective-C C++ (STL) D C# Java JavaScript PHP Perl Perl 6 Go Common Lisp
break;
continue;
label:
yield return value; break label; continue label; N/A yield value; break levels; last label; continue levels; next label; goto label;
tag ... )
Scheme ISLISP (return-from block) (tagbody tag Pascal(ISO) Pascal(FPC) Visual Basic Visual Basic .NET Python RPG IV S-Lang Fortran break LEAVE; break; EXIT break; Exit block Continue block continue ITER; continue; CYCLE label[b] GOTO label N/A N/A yield value N/A continue; N/A N/A label: GoTo label ... (go tag)
label:[a]
next continue
a Pascal has declaration blocks. See Comparison of programming languages (basic instructions)#Functions. b label must be a number between 1 and 99999.
Functions
See reflection for calling and declaring functions by strings.
calling a function ALGOL 68 foo(parameters); basic/void function proc foo = (parameters) void: ( instructions ); value-returning function proc foo = (parameters) rettype: ( instructions ...; retvalue ); required main function N/A global declarations int main(int argc, char *argv[]) { C++ (STL) C# } static void Main(string[] args) { instructions } or static int Main(string[] args) { instructions } public static void main(String[] args) { instructions } or public static void main(String... args) { instructions } int main(char[][] args) { instructions} or int main(string[] args) { instructions} or void main(char[][] args) { instructions} or void main(string[] args) { instructions} function foo(parameters) { function foo(parameters) { instructions } or instructions ... return value; } var foo = function (parameters) {instructions } or var foo = new Function ("parameter", ... ,"last parameter" "instructions"); func foo(parameters) { instructions func foo(parameters) type { } instructions ... return value } instructions
C (C99) Objective-C
foo(parameters)
Java
JavaScript
N/A
Go
(defun foo (parameters) ... value )
(define (foo parameters) instructions) or (define foo (lambda (parameters) instructions)) (defun foo (parameters) instructions )
(define (foo parameters) instructions... return_value) or (define foo (lambda (parameters) instructions... return_value)) (defun foo (parameters) ... value )
N/A
ISLISP
Pascal
foo(parameters)
end.
end;
end; Visual Basic Foo(parameters) Sub Foo(parameters) instructions Function Foo(parameters) As type instructions Foo = value Sub Main() instructions
End Sub
End Sub
End Function Visual Basic .NET Function Foo(parameters) As type instructions Return value Sub Main(ByVal CmdArgs() As String) instructions
End Function
End Function
def foo(parameters): Tab instructions Tab return value define foo (parameters) { instructions ... return value; } type FUNCTION foo (arguments) instructions ... foo = value
Python
foo(parameters)
N/A
S-Lang
Fortran
END PROGRAM
END SUBROUTINE
END FUNCTION[c] Forth parameters FOO : FOO stack effect comment: ( before -- ) ; PHP foo(parameters) function foo(parameters) { instructions } sub foo { my (parameters) = @_; instructions } multi sub foo(parameters) { instructions } instructions : FOO stack effect comment: ( before -- after ) ; function foo(parameters) { instructions ... return value; } sub foo { my (parameters) = @_; instructions... return value; } our type multi sub foo(parameters) { instructions... return value; } def foo(parameters) instructions return value N/A instructions N/A
Perl
Perl 6
Ruby
foo(parameters)
end
end Windows PowerShell foo parameters function foo (parameters) { instructions }; or function foo { param(parameters) instructions } function foo (parameters) { instructions return value }; or function foo { param(parameters) instructions return value } let rec foo parameters = instructions... return_value
OCaml F#
foo parameters
Standard ML
fun foo parameters = ( instructions... return_value ) foo parameters = return_value or foo parameters = do Tab instructions Tab return value main :: IO () main = do instructions
Haskell
foo (parameters): type require preconditions do instructions
Eiffel
foo (parameters)
[b]
a Pascal requires "forward;" for forward declarations. b Eiffel allows the specification of an application's root class and feature. c In Fortran, function/subroutine parameters are called arguments (since PARAMETER is a language keyword); the CALL keyword is required for subroutines.
[c]
Type conversions
Where string is a signed decimal number:
string to integer ALGOL 68 with general, and then specific formats string to long integer string to floating point integer to string floating point to string
With prior declarations and association of: string buf := "12345678.9012e34 "; file proxy; associate(proxy, buf); get(proxy, ivar); getf(proxy, ($g$, ivar)); or getf(proxy, ($dddd$, ivar)); get(proxy, livar); getf(proxy, ($g$, livar)); or getf(proxy, ($8d$, livar)); get(proxy, rvar); getf(proxy, ($g$, rvar)); or getf(proxy, ($8d.4dE2d$, rvar)); put(proxy, ival); putf(proxy, ($g$, ival)); or putf(proxy, ($4d$, ival)); put(proxy, rval); putf(proxy, ($g(width, places, exp)$, rval)); or putf(proxy, ($8d.4dE2d$, rval)); etc. sprintf(string, "%f", float);
C (C99)
integer = atoi(string);
long = atol(string);
float = atof(string);
string = [NSString string = [NSString stringWithFormat:@"%i", stringWithFormat:@"%f", integer]; float]; std::ostringstream o; o << number; string = o.str();
std::istringstream(string) >> number; integer = std::stoi(string); integer = int.Parse(string); long = std::stol(string); long = long.Parse(string);
float = std::stof(string); string = std::to_string(number); double = std::stod(string); float = float.Parse(string); or double = double.Parse(string); float = std.conv.to!float(string) or double = std.conv.to!double(string) string = number.ToString();
C#
integer = std.conv.to!int(string)
long = std.conv.to!long(string)
string = std.conv.to!string(number)
Java
integer = long = float = string = Integer.parseInt(string); Long.parseLong(string); Float.parseFloat(string); or Integer.toString(integer); double = Double.parseDouble(string);
float = parseFloat(string); or float = new Number (string) or float = Number (string) or float = string*1; float, error = strconv.ParseFloat(string, 64) string = number.toString (); or string = new String (number); or string = String (number); or string = number+"";
Go
integer, error = long, error = strconv.Atoi(string) or strconv.ParseInt(string, integer, error = 10, 64) strconv.ParseInt(string, 10, 0)
(setf float (read-from-string (setf string (princ-to-string number)) string)) (define string (number->string number)) (setf float (convert string <float>)) float := StrToFloat(string); (setf string (convert number <string>))
Pascal
string := IntToStr(integer);
string := FloatToStr(float);
string = CStr(number)
READ(string,format) number integer = intval(string); or integer = (int)string; float = floatval(string); or float = (float)string;
string = "number"; or string = strval(number); or string = (string)number; string = "number"; string = ~number;
number = 0 + string; number = +string; integer = string.to_i or integer = Integer(string) integer = [int]string long = [long]string float = string.to_f or float = Float(string) float = [float]string
string = number.to_s
Windows PowerShell
string = [string]number; or string = "number"; or string = (number).ToString() let string = string_of_int integer let string = string number val string = Int.toString integer string = show number val string = Real.toString float let string = string_of_float float
OCaml
let integer = int_of_string string let integer = int string val integer = Int.fromString string number = read string let integer = int64 string
let float = float_of_string string let float = float string val float = Real.fromString string
a JavaScript only uses floating-point numbers, so there are some technicalities.[4]
b Perl doesn't have separate types; strings and numbers are interchangeable.
C (C99)
Objective-C
[[NSFileHandle fileHandleWithStandardError] writeData:data]; std::cerr << x; or std::clog << x; Console.Error.Write(format, x); or Console.Error.WriteLine(format, x); stderr.write(x) or stderr.writeln(x) or std.stdio.writef(stderr, format, x) or std.stdio.writefln(stderr, format, x) System.err.print(x); or System.err.printf(format, x); or System.err.println(x); fmt.Fprintln(os.Stderr, x) or fmt.Fprintf(os.Stderr, format, x)
C++
C#
x = std.stdio.readln()
std.stdio.write(x) or std.stdio.writeln(x) or std.stdio.writef(format, x) or std.stdio.writefln(format, x) System.out.print(x); or System.out.printf(format, x); or System.out.println(x); fmt.Println(x) or fmt.Printf(format, x)
Java
Go
JavaScript Web Browser implementation JavaScript Active Server Pages JavaScript x = WScript.StdIn.Read(chars) or Windows Script x = WScript.StdIn.ReadLine() Host Common Lisp (setf x (read-line))
document.write(x)
Response.Write(x)
WScript.StdErr.Write(x) or WScript.StdErr.WriteLine(x)
(princ x *error-output*) or (format *error-output* format x) (display x (current-error-port)) or (format (current-error-port) format x)
Scheme (R6RS)
(define x (read-line))
ISLISP Pascal
(format (standard-output) format x) (format (error-output) format x) write(x); or writeln(x); N/A Print x or ?x
Visual Basic
Console.Write(format, x) or Console.WriteLine(format, x) Console.Error.Write(format, x) or Console.Error.WriteLine(format, x) print >> sys.stderr, x or sys.stderr.write(x) print(x, end="", file=sys.stderr) fputs (x, stderr) WRITE(ERROR_UNIT,format) expressions[e]
x = Console.Read() or x = Console.ReadLine()
Python 2.x
x = raw_input(prompt)
READ(*,format) variable names or WRITE(*,format) expressions or READ(INPUT_UNIT,format) variable names[e] WRITE(OUTPUT_UNIT,format) expressions[e] buffer length ACCEPT ( # chars read ) KEY ( char ) $x = fgets(STDIN); or $x = fscanf(STDIN, format); buffer length TYPE char EMIT print x; or echo x; or printf(format, x); print x; or printf format, x; x.print or x.say
Forth
PHP
Perl
print STDERR x; or printf STDERR format, x; x.note or $*ERR.print(x) or $*ERR.say(x) $stderr.puts(x) or $stderr.printf(format, x) Write-Error x
Perl 6
Ruby
x = gets
puts x or printf(format, x) x; or Write-Output x; or echo x print_int x or print_endline str or Printf.printf format x ... printf format x ... or printfn format x ... print str print x or putStrLn str
Windows PowerShell
$x = Read-Host -Prompt text; or $x = [Console]::Read(); or $x = [Console]::ReadLine() let x = read_int () or let str = read_line () or Scanf.scanf format (fun x ... -> ...) let x = System.Console.ReadLine()
OCaml
prerr_int x or prerr_endline str or Printf.eprintf format x ... eprintf format x ... or eprintfn format x ... TextIO.output (TextIO.stdErr, str) hPrint stderr x or hPutStrLn stderr str
F#
a ALGOL 68 additionally has the "unformatted" transput routines: read, write, get and put.
b gets(x) and fgets(x, length, stdin) read unformatted text from stdin; use of gets is not recommended.
c puts(x) and fputs(x, stdout) write unformatted text to stdout.
d fputs(x, stderr) writes unformatted text to stderr.
e INPUT_UNIT, OUTPUT_UNIT and ERROR_UNIT are defined in the ISO_FORTRAN_ENV module.[10]
Command-line arguments (entries recovered per language; argument values, argument count, program name):

C (C99), Objective-C, C++: argv[n]; count: argc
C#: args[n]; count: args.Length; program name: Assembly.GetEntryAssembly().Location
Java, D: args[n]; count: args.length
JavaScript (Windows Script Host implementation): WScript.Arguments(n)
Pascal: ParamStr(n)
Visual Basic: Command[a]
Visual Basic .NET: CmdArgs(n)
PHP: $argv[n]; count: $argc; program name: the first argument
Perl: $ARGV[n]; count: scalar(@ARGV); program name: $0
Perl 6: @*ARGS[n]; count: @*ARGS.elems; program name: $PROGRAM_NAME
Ruby: ARGV[n]; count: ARGV.size; program name: $0
Windows PowerShell: $args[n]; count: $args.Length; program name: $MyInvocation.MyCommand.Name
OCaml: Sys.argv.(n)
F#: args.[n]
Standard ML: List.nth (CommandLine.arguments (), n)
Haskell (GHC): do { args <- System.getArgs; return args !! n }; program name: System.getProgName

a. The command-line arguments in Visual Basic are not separated. A split function Split(string) is required for separating them.
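In Python the same information lives in sys.argv, following the argv[n]/argc pattern of the C family; a small sketch, with describe an invented helper name:

```python
import sys

def describe(argv):
    # argv[0] is the program name; argv[1:] are the arguments;
    # len(argv) plays the role of C's argc.
    return {"program": argv[0], "args": argv[1:], "argc": len(argv)}

info = describe(sys.argv)
```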
Execution of commands

Running external programs (entries recovered per language; shell command, execute program, replace current program with new executed program):

C, C++: shell: system("command"); replace: execl(path, args); or execv(path, arglist);
Objective-C: execute: [NSTask launchedTaskWithLaunchPath:(NSString *)path arguments:(NSArray *)arguments];
C#: execute: System.Diagnostics.Process.Start(path, argstring);
F#: execute: System.Diagnostics.Process.Start(path, argstring)
Go: execute: exec.Run(path, argv, envv, dir, exec.DevNull, exec.DevNull, exec.DevNull); replace: os.Exec(path, argv, envv)
Visual Basic: shell: Interaction.Shell(command, WindowStyle, isWaitOnReturn)
Visual Basic .NET: shell: Microsoft.VisualBasic.Interaction.Shell(command, WindowStyle, isWaitOnReturn)
D: shell: std.process.system("command"); replace: std.process.execv(path, arglist);
Java: execute: Runtime.exec(command); or new ProcessBuilder(command).start();
JavaScript (Windows Script Host implementation): shell: WScript.CreateObject("WScript.Shell").Run(command, WindowStyle, isWaitOnReturn); execute: WshShell.Exec(command)
Common Lisp: shell: (shell command)
Scheme: shell: (system command)
ISLISP: N/A
Pascal: shell: system(command);
OCaml: shell: Sys.command command or Unix.open_process_full command env (stdout, stdin, stderr), ...; execute: Unix.create_process prog args new_stdin new_stdout new_stderr, ...; replace: Unix.execv prog args or Unix.execve prog args env
Standard ML: shell: OS.Process.system command; execute: Unix.execute (path, args); replace: Posix.Process.exec (path, args)
Haskell (GHC): shell: System.system command; replace: Posix.Process.executeFile path True args ...
Perl: shell: system(command) or $output = `command`
Ruby: shell: system(command) or output = `command`; replace: exec(path, args)
PHP: shell: system(command) or $output = `command` or exec(command) or passthru(command)
Python: shell: os.system(command) or subprocess.Popen(command); replace: os.execv(path, args)
S-Lang: shell: system(command)
Fortran: shell: CALL SYSTEM (command, status) or status = SYSTEM (command)[a]
Windows PowerShell: shell: [Diagnostics.Process]::Start(command); execute: Invoke-Item program arg1 arg2
a Compiler-dependent extension.[11]
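The Python route above (os.system / subprocess) can be sketched with the standard subprocess module; running a second copy of the interpreter as the child keeps the example portable:

```python
import subprocess
import sys

# "Execute program": run a child process and capture its output, the
# subprocess counterpart of the os.system(command) shell route above.
result = subprocess.run(
    [sys.executable, "-c", "print('from child')"],
    capture_output=True,
    text=True,
)
output = result.stdout
```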
References
[1] http://www.lispworks.com/documentation/HyperSpec/Front/index.htm
[2] http://www.islisp.info/specification.html
[3] http://fortranwiki.org/fortran/show/selected_int_kind
[4] 8.5 The Number Type (http://www.mozilla.org/js/language/E262-3.pdf)
[5] http://fortranwiki.org/fortran/show/selected_real_kind
[6] http://www.gnu.org/software/libc/manual/html_node/Complex-Numbers.html#Complex-Numbers
[7] http://rosettacode.org/wiki/Enumerations#ALGOL_68
[8] https://sites.google.com/site/truetryforvisualbasic/
[9] http://rosettacode.org/wiki/Prime_decomposition#ALGOL_68
[10] http://fortranwiki.org/fortran/show/iso_fortran_env
[11] http://gcc.gnu.org/onlinedocs/gfortran/SYSTEM.html#SYSTEM
Computer program
A computer program (also software, or just a program) is a sequence of instructions written to perform a specified task with a computer.[1] A computer requires programs to function, typically executing the program's instructions in a central processor.[2] The program has an executable form that the computer can use directly to execute the instructions. The same program in its human-readable source code form, from which executable programs are derived (e.g., compiled), enables a programmer to study and develop its algorithms. Computer source code is often written by computer programmers. Source code is written in a programming language that usually follows one of two main paradigms: imperative or declarative programming. Source code may be converted into an executable file (sometimes called an executable program or a binary) by a compiler and later executed by a central processing unit. Alternatively, computer programs may be executed with the aid of an interpreter, or may be embedded directly into hardware. Computer programs may be categorized along functional lines: system software and application software. Two or more computer programs may run simultaneously on one computer, a process known as multitasking.
Programming
#include <stdio.h>

int main()
{
    printf("Hello world!\n");
    return 0;
}

Source code of a program written in the C programming language
Computer programming is the iterative process of writing or editing source code. Editing source code involves testing, analyzing, and refining, and sometimes coordinating with other programmers on a jointly developed program. A person who practices this skill is referred to as a computer programmer, software developer or coder. The sometimes lengthy process of computer programming is usually referred to as software development. The term software engineering is becoming popular as the process is seen as an engineering discipline.
Paradigms
Computer programs can be categorized by the programming language paradigm used to produce them. Two of the main paradigms are imperative and declarative.

Programs written using an imperative language specify an algorithm using declarations, expressions, and statements.[3] A declaration couples a variable name to a datatype. For example: var x: integer; An expression yields a value. For example: 2 + 2 yields 4. Finally, a statement might assign an expression to a variable or use the value of a variable to alter the program's control flow. For example: x := 2 + 2; if x = 4 then do_something(); One criticism of imperative languages is the side effect of an assignment statement on a class of variables called non-local variables.[4]

Programs written using a declarative language specify the properties that have to be met by the output. They do not specify details expressed in terms of the control flow of the executing machine but of the mathematical relations between the declared objects and their properties. Two broad categories of declarative languages are functional languages and logical languages. The principle behind functional languages (like Haskell) is to not allow side effects, which makes it easier to reason about programs as one would about mathematical functions.[4] The principle behind logical languages (like Prolog) is to define the problem to be solved (the goal) and leave the detailed solution to the Prolog system itself.[5] The goal is defined by providing a list of subgoals. Then each subgoal is defined by further providing a list of its subgoals, etc. If a path of subgoals fails to find a solution, then that subgoal is backtracked and another path is systematically attempted.

The form in which a program is created may be textual or visual. In a visual language program, elements are graphically manipulated rather than textually specified.
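The contrast between the two paradigms can be sketched in Python, which accommodates both styles; the function names below are invented for illustration:

```python
# Imperative style: an explicit sequence of statements, with an
# assignment that repeatedly alters the state of a local variable.
def total_imperative(numbers):
    total = 0
    for n in numbers:
        total = total + n
    return total

# Declarative (functional) flavor: state the result as a single
# expression over the input, with no visible mutation or control flow.
def total_declarative(numbers):
    return sum(numbers)
```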
Compiling or interpreting
A computer program in the form of a human-readable, computer programming language is called source code. Source code may be converted into an executable image by a compiler or executed immediately with the aid of an interpreter. Either compiled or interpreted programs might be executed in a batch process without human interaction, but interpreted programs allow a user to type commands in an interactive session. In this case the programs are the separate commands, whose execution occurs sequentially, and thus together. When a language is used to give commands to a software application (such as a shell) it is called a scripting language. Compilers are used to translate source code from a programming language into either object code or machine code. Object code needs further processing to become machine code, and machine code is the central processing unit's native code, ready for execution. Compiled computer programs are commonly referred to as executables, binary images, or simply as binaries a reference to the binary file format used to store the executable code. Interpreted computer programs - in a batch or interactive session - are either decoded and then immediately executed or are decoded into some efficient intermediate representation for future execution. BASIC, Perl, and Python are examples of immediately executed computer programs. Alternatively, Java computer programs are compiled ahead of time and stored as a machine independent code called bytecode. Bytecode is then executed on request by an interpreter called a virtual machine.
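Although the paragraph above groups Python with immediately executed languages, CPython in fact compiles source text to an internal bytecode first, which its virtual machine then interprets; the standard dis module makes this intermediate representation visible:

```python
import dis

# Compile a one-line program to a code object, then list the names of
# the bytecode instructions the virtual machine would interpret.
code = compile("x = 2 + 2", "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]
```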
The main disadvantage of interpreters is that computer programs run slower than when compiled. Interpreting code is slower than running the compiled version because the interpreter must decode each statement each time it is loaded and then perform the desired action. However, software development may be faster using an interpreter because testing is immediate when the compiling step is omitted. Another disadvantage of interpreters is that at least one must be present on the computer during computer program execution. By contrast, compiled computer programs need no compiler present during execution. No properties of a programming language require it to be exclusively compiled or exclusively interpreted. The categorization usually reflects the most popular method of language execution. For example, BASIC is thought of as an interpreted language and C a compiled language, despite the existence of BASIC compilers and C interpreters. Some systems use just-in-time compilation (JIT) whereby sections of the source are compiled 'on the fly' and stored for subsequent executions.
Self-modifying programs
A computer program in execution is normally treated as being different from the data the program operates on. However, in some cases this distinction is blurred when a computer program modifies itself. The modified computer program is subsequently executed as part of the same program. Self-modifying code is possible for programs written in machine code, assembly language, Lisp, C, COBOL, PL/1, Prolog and JavaScript (the eval feature) among others.
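Python's exec offers the same facility as the Lisp and JavaScript eval features named above; a minimal sketch in which a program builds and runs new source text at runtime:

```python
# The program constructs source code as a string, compiles and executes
# it, and then calls the function it has just defined for itself.
namespace = {}
source = "def greet(name):\n    return 'Hello, ' + name"
exec(source, namespace)
message = namespace["greet"]("world")
```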
Embedded programs
Some computer programs are embedded into hardware. A stored-program computer requires an initial computer program stored in its read-only memory to boot. The boot process is to identify and initialize all aspects of the system, from processor registers to device controllers to memory contents.[7] Following the initialization process, this initial computer program loads the operating system and sets the program counter to begin normal operations. Independent of the host computer, a hardware device might have embedded firmware to control its operation. Firmware is used when the computer program is rarely or never expected to change, or when the program must not be lost when the power is off.[8]
The microcontroller on the right of this USB flash drive is controlled with embedded firmware.
Manual programming
Computer programs historically were manually input to the central processor via switches. An instruction was represented by a configuration of on/off settings. After setting the configuration, an execute button was pressed. This process was then repeated. Computer programs also historically were manually input via paper tape or punched cards. After the medium was loaded, the starting address was set via switches and the execute button pressed.[9]
Generative programming is a style of computer programming that creates source code through generic classes, prototypes, templates, aspects, and code generators to improve programmer productivity. Source code is generated with programming tools such as a template processor or an integrated development environment. The simplest form of source code generator is a macro processor, such as the C preprocessor, which replaces patterns in source code according to relatively simple rules. Software engines output source code or markup code that simultaneously become the input to another computer process. Application servers are software engines that deliver applications to client computers. For example, a Wiki is an application server that lets users build dynamic content assembled from articles. Wikis generate HTML, CSS, Java, and JavaScript which are then interpreted by a web browser.
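A macro processor in miniature can be sketched with Python's string.Template: like the C preprocessor described above, it replaces named patterns in source text according to simple rules (the template here is an invented example):

```python
import string

# Generate a C declaration from a template by substituting named
# patterns, the simplest form of source code generation.
template = string.Template("int $name[$size];")
generated = template.substitute(name="buffer", size="256")
```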
Simultaneous execution
Many operating systems support multitasking which enables many computer programs to appear to run simultaneously on one computer. Operating systems may run multiple programs through process scheduling a software mechanism to switch the CPU among processes often so users can interact with each program while it runs.[10] Within hardware, modern day multiprocessor computers or computers with multicore processors may run multiple programs.[11] One computer program can calculate simultaneously more than one operation using threads or separate processes. Multithreading processors are optimized to execute multiple threads efficiently.
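A sketch in Python of one program computing more than one operation at a time with threads (the names are invented for illustration):

```python
import threading

results = {}

def worker(key, values):
    # Each thread sums its own slice and records the partial result.
    results[key] = sum(values)

threads = [
    threading.Thread(target=worker, args=("low", range(0, 50))),
    threading.Thread(target=worker, args=("high", range(50, 100))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both partial sums before combining them
grand_total = results["low"] + results["high"]
```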
Functional categories
Computer programs may be categorized along functional lines. The main functional categories are system software and application software. System software includes the operating system which couples computer hardware with application software.[12] The purpose of the operating system is to provide an environment in which application software executes in a convenient and efficient manner.[12] In addition to the operating system, system software includes utility programs that help manage and tune the computer. If a computer program is not system software then it is application software. Application software includes middleware, which couples the system software with the user interface. Application software also includes utility programs that help users solve application problems, like the need for sorting. Sometimes development environments for software development are seen as a functional category on its own, especially in the context of human-computer interaction and programming language design. Development environments gather system software (such as compilers and system's batch processing scripting languages) and application software (such as IDEs) for the specific purpose of helping programmers create new programs.
References
[1] Stair, Ralph M., et al. (2003). Principles of Information Systems, Sixth Edition. Thomson Learning, Inc.. pp.132. ISBN0-619-06489-7. [2] Silberschatz, Abraham (1994). Operating System Concepts, Fourth Edition. Addison-Wesley. pp.58. ISBN0-201-50480-4. [3] Wilson, Leslie B. (1993). Comparative Programming Languages, Second Edition. Addison-Wesley. pp.75. ISBN0-201-56885-3. [4] Wilson, Leslie B. (1993). Comparative Programming Languages, Second Edition. Addison-Wesley. pp.213. ISBN0-201-56885-3. [5] Wilson, Leslie B. (1993). Comparative Programming Languages, Second Edition. Addison-Wesley. pp.244. ISBN0-201-56885-3. [6] Silberschatz, Abraham (1994). Operating System Concepts, Fourth Edition. Addison-Wesley. pp.97. ISBN0-201-50480-4. [7] Silberschatz, Abraham (1994). Operating System Concepts, Fourth Edition. Addison-Wesley. pp.30. ISBN0-201-50480-4. [8] Tanenbaum, Andrew S. (1990). Structured Computer Organization, Third Edition. Prentice Hall. pp.11. ISBN0-13-854662-2. [9] Silberschatz, Abraham (1994). Operating System Concepts, Fourth Edition. Addison-Wesley. pp.6. ISBN0-201-50480-4. [10] Silberschatz, Abraham (1994). Operating System Concepts, Fourth Edition. Addison-Wesley. pp.100. ISBN0-201-50480-4. [11] Akhter, Shameem (2006). Multi-Core Programming. Richard Bowles (Intel Press). pp.1113. ISBN0-9764832-4-6. [12] Silberschatz, Abraham (1994). Operating System Concepts, Fourth Edition. Addison-Wesley. pp.1. ISBN0-201-50480-4.
Further reading
Knuth, Donald E. (1997). The Art of Computer Programming, Volume 1, 3rd Edition. Boston: Addison-Wesley. ISBN0-201-89683-4. Knuth, Donald E. (1997). The Art of Computer Programming, Volume 2, 3rd Edition. Boston: Addison-Wesley. ISBN0-201-89684-2. Knuth, Donald E. (1997). The Art of Computer Programming, Volume 3, 3rd Edition. Boston: Addison-Wesley. ISBN0-201-89685-0.
External links
Definition of "Program" (http://www.webopedia.com/TERM/P/program.html) at Webopedia Definition of "Software" (http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=software) at FOLDOC Definition of "Computer Program" (http://dictionary.reference.com/browse/computer program) at dictionary.com
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely. The earliest programming languages predate the invention of the computer, and were used to direct the behavior of machines such as Jacquard looms and player pianos. Thousands of different programming languages have been created, mainly in the computer field, with many more being created every year. Most programming languages describe computation in an imperative style, i.e., as a sequence of commands, although some languages, such as those that support functional programming or logic programming, use alternative forms of description. The description of a programming language is usually split into the two components of syntax (form) and semantics (meaning). Some languages are defined by a specification document (for example, the C programming language is specified by an ISO Standard), while other languages, such as Perl 5 and earlier, have a dominant implementation that is used as a reference.
Definitions
A programming language is a notation for writing programs, which are specifications of a computation or algorithm.[1] Some, but not all, authors restrict the term "programming language" to those languages that can express all possible algorithms.[1][2] Traits often considered important for what constitutes a programming language include:

Function and target: A computer programming language is a language[3] used to write computer programs, which involve a computer performing some kind of computation[4] or algorithm and possibly control external devices such as printers, disk drives, robots,[5] and so on. For example, PostScript programs are frequently created by another program to control a computer printer or display. More generally, a programming language may describe computation on some, possibly abstract, machine. It is generally accepted that a complete specification for a programming language includes a description, possibly idealized, of a machine or processor for that language.[6] In most practical contexts, a programming language involves a computer; consequently, programming languages are usually defined and studied this way.[7] Programming languages differ from natural languages in that natural languages are only used for interaction between people, while programming languages also allow humans to communicate instructions to machines.

Abstractions: Programming languages usually contain abstractions for defining and manipulating data structures or controlling the flow of execution. The practical necessity that a programming language support adequate abstractions is expressed by the abstraction principle;[8] this principle is sometimes formulated as a recommendation to the programmer to make proper use of such abstractions.[9]

Expressive power: The theory of computation classifies languages by the computations they are capable of expressing. All Turing complete languages can implement the same set of algorithms.
ANSI/ISO SQL and Charity are examples of languages that are not Turing complete, yet often called programming languages.[10][11] Markup languages like XML, HTML or troff, which define structured data, are not generally considered programming languages.[12][13][14] Programming languages may, however, share the syntax with markup languages if a computational semantics is defined. XSLT, for example, is a Turing complete XML dialect.[15][16][17] Moreover, LaTeX, which is mostly used for structuring documents, also contains a Turing complete subset.[18][19] The term computer language is sometimes used interchangeably with programming language.[20] However, the usage of both terms varies among authors, including the exact scope of each. One usage describes programming languages as a subset of computer languages.[21] In this vein, languages used in computing that have a different goal than expressing computer programs are generically designated computer languages. For instance, markup languages are sometimes referred to as computer languages to emphasize that they are not meant to be used for
programming.[22] Another usage regards programming languages as theoretical constructs for programming abstract machines, and computer languages as the subset thereof that runs on physical computers, which have finite hardware resources.[23] John C. Reynolds emphasizes that formal specification languages are just as much programming languages as are the languages intended for execution. He also argues that textual and even graphical input formats that affect the behavior of a computer are programming languages, despite the fact they are commonly not Turing-complete, and remarks that ignorance of programming language concepts is the reason for many flaws in input formats.[24]
Elements
All programming languages have some primitive building blocks for the description of data and the processes or transformations applied to them (like the addition of two numbers or the selection of an item from a collection). These primitives are defined by syntactic and semantic rules which describe their structure and meaning respectively.
Syntax
A programming language's surface form is known as its syntax. Most programming languages are purely textual; they use sequences of text including words, numbers, and punctuation, much like written natural languages. On the other hand, there are some programming languages which are more graphical in nature, using visual relationships between symbols to specify a program. The syntax of a language describes the possible combinations of symbols that form a syntactically correct program. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Since most languages are textual, this article discusses textual syntax.

[Figure: Parse tree of Python code with inset tokenization]

Programming language syntax is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure). Below is a simple grammar, based on Lisp:
Syntax highlighting is often used to aid programmers in recognizing elements of source code. The language above is Python.
expression ::= atom | list
atom       ::= number | symbol
number     ::= [+-]?['0'-'9']+
symbol     ::= ['A'-'Z''a'-'z'].*
list       ::= '(' expression* ')'

This grammar specifies the following: an expression is either an atom or a list; an atom is either a number or a symbol; a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign; a symbol is a letter followed by zero or more of any characters (excluding whitespace); and a list is a matched pair of parentheses, with zero or more expressions inside it.
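A grammar like this can be checked mechanically. The sketch below is a recognizer for it in Python, with one simplification for the tokenizer: symbols are taken to exclude parentheses as well as whitespace:

```python
import re

# Tokens are parentheses, numbers, or symbols, per the grammar above.
TOKEN = re.compile(r"\(|\)|[+-]?[0-9]+|[A-Za-z][^\s()]*")

def tokenize(text):
    return TOKEN.findall(text)

def parse_expression(tokens, i=0):
    # Consume one expression starting at index i; return the index just
    # past it, or raise ValueError if the tokens are not well-formed.
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    if tokens[i] == "(":                      # list ::= '(' expression* ')'
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            i = parse_expression(tokens, i)
        if i >= len(tokens):
            raise ValueError("missing ')'")
        return i + 1
    if tokens[i] == ")":
        raise ValueError("unexpected ')'")
    return i + 1                              # atom ::= number | symbol

def well_formed(text):
    tokens = tokenize(text)
    try:
        end = parse_expression(tokens, 0)
    except ValueError:
        return False
    return end == len(tokens)
```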
The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'

Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules; and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it. Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence or the sentence may be false: "Colorless green ideas sleep furiously." is grammatically well-formed but has no generally accepted meaning. "John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true. The following C language fragment is syntactically correct, but performs operations that are not semantically defined (the operation *p >> 4 has no meaning for a value having a complex type and p->im is not defined because the value of p is the null pointer):

complex *p = NULL;
complex abs_p = sqrt(*p >> 4 + p->im);
If the type declaration on the first line were omitted, the program would trigger an error on compilation, as the variable "p" would not be defined. But the program would still be syntactically correct, since type declarations provide only semantic information. The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.[25] Some languages, including Perl and Lisp, contain constructs that allow execution during the parsing phase. Languages that have constructs that allow the programmer to alter the behavior of the parser make syntax analysis an undecidable problem, and generally blur the distinction between parsing and execution.[26] In contrast to Lisp's macro system and Perl's BEGIN blocks, which may contain general computations, C macros are merely string replacements, and do not require code execution.[27]
Semantics
The term semantics refers to the meaning of languages, as opposed to their form (syntax).

Static semantics

The static semantics defines restrictions on the structure of valid texts that are hard or impossible to express in standard syntactic formalisms.[1] For compiled languages, static semantics essentially include those semantic rules that can be checked at compile time. Examples include checking that every identifier is declared before it is used (in languages that require such declarations) or that the labels on the arms of a case statement are distinct.[28] Many important restrictions of this type, like checking that identifiers are used in the appropriate context (e.g. not adding an integer to a function name), or that subroutine calls have the appropriate number and type of arguments, can be enforced by defining them as rules in a logic called a type system. Other forms of static analyses like data flow analysis may also be part of static semantics. Newer programming languages like Java and C# have definite assignment analysis, a form of data flow analysis, as part of their static semantics.

Dynamic semantics

Once data has been specified, the machine must be instructed to perform operations on the data. For example, the semantics may define the strategy by which expressions are evaluated to values, or the manner in which control structures conditionally execute statements. The dynamic semantics (also known as execution semantics) of a language defines how and when the various constructs of a language should produce a program behavior. There are many ways of defining execution semantics. Natural language is often used to specify the execution semantics of languages commonly used in practice. A significant amount of academic research went into formal semantics of programming languages, which allow execution semantics to be specified in a formal manner.
Results from this field of research have seen limited application to programming language design and implementation outside academia.

Type system

A type system defines how a programming language classifies values and expressions into types, how it can manipulate those types and how they interact. The goal of a type system is to verify and usually enforce a certain level of correctness in programs written in that language by detecting certain incorrect operations. Any decidable type system involves a trade-off: while it rejects many incorrect programs, it can also prohibit some correct, albeit unusual programs. In order to bypass this downside, a number of languages have type loopholes, usually unchecked casts that may be used by the programmer to explicitly allow a normally disallowed operation between different types. In most typed languages, the type system is used only to type check programs, but a number of languages, usually functional ones, infer types, relieving the programmer from the need to write type annotations. The formal design and study of type systems is known as type theory.
Typed versus untyped languages

A language is typed if the specification of every operation defines types of data to which the operation is applicable, with the implication that it is not applicable to other types.[29] For example, the data represented by "this text between the quotes" is a string. In most programming languages, dividing a number by a string has no meaning. Most modern programming languages will therefore reject any program attempting to perform such an operation. In some languages, the meaningless operation will be detected when the program is compiled ("static" type checking), and rejected by the compiler, while in others, it will be detected when the program is run ("dynamic" type checking), resulting in a runtime exception. A special case of typed languages are the single-type languages. These are often scripting or markup languages, such as REXX or SGML, and have only one data type, most commonly character strings, which are used for both symbolic and numeric data. In contrast, an untyped language, such as most assembly languages, allows any operation to be performed on any data, which are generally considered to be sequences of bits of various lengths.[29] High-level languages which are untyped include BCPL and some varieties of Forth. In practice, while few languages are considered typed from the point of view of type theory (verifying or rejecting all operations), most modern languages offer a degree of typing.[29] Many production languages provide means to bypass or subvert the type system (see casting).

Static versus dynamic typing

In static typing, all expressions have their types determined prior to when the program is executed, typically at compile-time. For example, 1 and (2+2) are integer expressions; they cannot be passed to a function that expects a string, or stored in a variable that is defined to hold dates.[29] Statically typed languages can be either manifestly typed or type-inferred.
In the first case, the programmer must explicitly write types at certain textual positions (for example, at variable declarations). In the second case, the compiler infers the types of expressions and declarations based on context. Most mainstream statically typed languages, such as C++, C# and Java, are manifestly typed. Complete type inference has traditionally been associated with less mainstream languages, such as Haskell and ML. However, many manifestly typed languages support partial type inference; for example, Java and C# both infer types in certain limited cases.[30] Dynamic typing, also called latent typing, determines the type-safety of operations at runtime; in other words, types are associated with runtime values rather than textual expressions.[29] As with type-inferred languages, dynamically typed languages do not require the programmer to write explicit type annotations on expressions. Among other things, this may permit a single variable to refer to values of different types at different points in the program execution. However, type errors cannot be automatically detected until a piece of code is actually executed, potentially making debugging more difficult. Lisp, Perl, Python, JavaScript, and Ruby are dynamically typed.

Weak and strong typing

Weak typing allows a value of one type to be treated as another, for example treating a string as a number.[29] This can occasionally be useful, but it can also allow some kinds of program faults to go undetected at compile time and even at run-time. Strong typing prevents the above. An attempt to perform an operation on the wrong type of value raises an error.[29] Strongly typed languages are often termed type-safe or safe. An alternative definition for "weakly typed" refers to languages, such as Perl and JavaScript, which permit a large number of implicit type conversions.
In JavaScript, for example, the expression 2 * x implicitly converts x to a number, and this conversion succeeds even if x is null, undefined, an Array, or a string of letters. Such implicit conversions are often useful, but they can mask programming errors.

Strong and static are now generally considered orthogonal concepts, but usage in the literature differs. Some use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed. Thus C has been called both strongly typed and weakly, statically typed.[31][32]
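These distinctions can be illustrated in Python, which the article lists among the dynamically typed languages. The snippet below is an illustrative sketch (the function name is our own, not drawn from any cited source):

```python
# Python checks types at runtime (dynamic typing): no type annotations are
# required, and one variable may refer to values of different types at
# different points in the program's execution.
x = 42           # x refers to an integer
x = "forty-two"  # now x refers to a string; no error is reported in advance

# Python is also comparatively strongly typed: applying an operation to a
# value of the wrong type raises an error instead of silently converting.
def describe(value):
    try:
        return value + 1          # meaningful only for numbers
    except TypeError:
        return "not a number"     # strings are not implicitly converted

print(describe(41))    # 42
print(describe("41"))  # "not a number": "41" + 1 raises TypeError at runtime
```

Note that the type error in describe("41") is only detected when that line actually executes, which is exactly the debugging trade-off described above.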
less effort from the programmer. This lets them write more functionality per time unit.[36]

Natural language processors have been proposed as a way to eliminate the need for a specialized language for programming. However, this goal remains distant and its benefits are open to debate. Edsger W. Dijkstra took the position that the use of a formal language is essential to prevent the introduction of meaningless constructs, and dismissed natural language programming as "foolish".[37] Alan Perlis was similarly dismissive of the idea.[38] Hybrid approaches have been taken in Structured English and SQL.

A language's designers and users must construct a number of artifacts that govern and enable the practice of programming. The most important of these artifacts are the language specification and implementation.
Specification
The specification of a programming language is intended to provide a definition that the language users and the implementors can use to determine whether the behavior of a program is correct, given its source code. A programming language specification can take several forms, including the following:

An explicit definition of the syntax, static semantics, and execution semantics of the language. While syntax is commonly specified using a formal grammar, semantic definitions may be written in natural language (e.g., as in the C language) or in a formal semantics (e.g., as in the Standard ML[39] and Scheme[40] specifications).

A description of the behavior of a translator for the language (e.g., the C++ and Fortran specifications). The syntax and semantics of the language have to be inferred from this description, which may be written in natural or formal language.

A reference or model implementation, sometimes written in the language being specified (e.g., Prolog or ANSI REXX[41]). The syntax and semantics of the language are explicit in the behavior of the reference implementation.
Implementation
An implementation of a programming language provides a way to execute programs written in that language on one or more configurations of hardware and software. There are, broadly, two approaches to programming language implementation: compilation and interpretation. It is generally possible to implement a language using either technique.

The output of a compiler may be executed by hardware or by a program called an interpreter. In some implementations that make use of the interpreter approach, there is no distinct boundary between compiling and interpreting. For instance, some implementations of BASIC compile and then execute the source a line at a time.

Programs that are executed directly on the hardware usually run several orders of magnitude faster than those that are interpreted in software. One technique for improving the performance of interpreted programs is just-in-time compilation: just before execution, the virtual machine translates the blocks of bytecode which are going to be used to machine code, for direct execution on the hardware.
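The compile-then-interpret pipeline can be observed directly in CPython, which compiles source text to bytecode and then interprets that bytecode on a virtual machine. This sketch uses the standard compile and exec built-ins (the source string is an invented example):

```python
import dis

source = "result = (2 + 3) * 7"

# Step 1: "compilation" - translate the source text into a bytecode object.
code = compile(source, filename="<example>", mode="exec")

# Step 2: "interpretation" - the virtual machine executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # 35

# The intermediate bytecode that the interpreter runs can be inspected:
dis.dis(code)
```

This is why, as the text notes, there is no sharp boundary between compiling and interpreting in such implementations: both stages happen inside one program.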
Usage
Thousands of different programming languages have been created, mainly in the computing field.[42]

Programming languages differ from most other forms of human expression in that they require a greater degree of precision and completeness. When using a natural language to communicate with other people, human authors and speakers can be ambiguous and make small errors, and still expect their intent to be understood. However, figuratively speaking, computers "do exactly what they are told to do", and cannot "understand" what code the programmer intended to write. The combination of the language definition, a program, and the program's inputs must fully specify the external behavior that occurs when the program is executed, within the domain of control of that program. On the other hand, ideas about an algorithm can be communicated to humans without the precision required for execution by using pseudocode, which interleaves natural language with code written in a programming language.

A programming language provides a structured mechanism for defining pieces of data, and the operations or transformations that may be carried out automatically on that data. A programmer uses the abstractions present in the language to represent the concepts involved in a computation. These concepts are represented as a collection of the simplest elements available (called primitives).[43] Programming is the process by which programmers combine these primitives to compose new programs, or adapt existing ones to new uses or a changing environment.

Programs for a computer might be executed in a batch process without human interaction, or a user might type commands in an interactive session of an interpreter. In this case the "commands" are simply programs, whose execution is chained together. When a language is used to give commands to a software application (such as a shell) it is called a scripting language.
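The gap in precision between pseudocode and an executable program can be made concrete with a small sketch (illustrative only; the function name and details are our own):

```python
# Pseudocode communicates the idea to humans and tolerates vagueness:
#
#   to find the largest item:
#       go through the items, remembering the biggest one seen so far
#
# An executable program must pin down every detail the pseudocode glossed
# over: the empty-input case, the comparison, and the order of operations.
def largest(items):
    if not items:
        raise ValueError("no items given")  # the pseudocode never said
    biggest = items[0]
    for item in items[1:]:
        if item > biggest:
            biggest = item
    return biggest

print(largest([3, 1, 4, 1, 5]))  # 5
```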
Taxonomies
There is no overarching classification scheme for programming languages. A given programming language does not usually have a single ancestor language. Languages commonly arise by combining the elements of several predecessor languages with new ideas in circulation at the time. Ideas that originate in one language will diffuse throughout a family of related languages, and then leap suddenly across familial gaps to appear in an entirely different family.

The task is further complicated by the fact that languages can be classified along multiple axes. For example, Java is both an object-oriented language (because it encourages object-oriented organization) and a concurrent language (because it contains built-in constructs for running multiple threads in parallel). Python is an object-oriented scripting language.

In broad strokes, programming languages divide into programming paradigms and a classification by intended domain of use. Traditionally, programming languages have been regarded as describing computation in terms of imperative sentences, i.e. issuing commands. These are generally called imperative programming languages. A great deal of research in programming languages has been aimed at blurring the distinction between a program as a set of instructions and a program as an assertion about the desired answer, which is the main feature of declarative programming.[48] More refined paradigms include procedural programming, object-oriented programming, functional programming, and logic programming; some languages are hybrids of paradigms or multi-paradigmatic. An assembly language is not so much a paradigm as a direct model of an underlying machine architecture.
By purpose, programming languages might be considered general purpose, system programming languages, scripting languages, domain-specific languages, or concurrent/distributed languages (or a combination of these).[49] Some general purpose languages were designed largely with educational goals in mind.[50]

A programming language may also be classified by factors unrelated to programming paradigm. For instance, most programming languages use English-language keywords, while a minority do not. Other languages may be classified as being deliberately esoteric or not.
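The imperative/declarative contrast can be sketched within a single multi-paradigm language. The comparison below is illustrative (both function names are invented), using Python, which supports both styles:

```python
# Imperative style: issue step-by-step commands that mutate state.
def squares_imperative(limit):
    result = []
    for n in range(limit):
        if n % 2 == 0:
            result.append(n * n)  # explicit mutation, one step at a time
    return result

# Declarative (here, functional) style: describe the desired result -
# "the squares of the even numbers below the limit" - and leave the
# stepwise mechanics to the language.
def squares_declarative(limit):
    return [n * n for n in range(limit) if n % 2 == 0]

print(squares_imperative(6))   # [0, 4, 16]
print(squares_declarative(6))  # [0, 4, 16]
```

Both compute the same answer; they differ in whether the program reads as a sequence of instructions or as an assertion about the desired result.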
History
Early developments
The first programming languages predate the modern computer. The 19th century saw the invention of "programmable" looms and player piano scrolls, both of which implemented examples of domain-specific languages. By the beginning of the twentieth century, punch cards encoded data and directed mechanical processing. In the 1930s and 1940s, the formalisms of Alonzo Church's lambda calculus and Alan Turing's Turing machines provided mathematical abstractions for expressing algorithms; the lambda calculus remains influential in language design.[51]

In the 1940s, the first electrically powered digital computers were created. Grace Hopper, one of the first programmers of the Harvard Mark I computer and a pioneer in the field, developed the first compiler for a computer programming language around 1952. However, the idea of a programming language existed earlier: the first high-level programming language to be designed for a computer was Plankalkül, developed for the German Z3 by Konrad Zuse between 1943 and 1945. It was not implemented until 1998 and 2000, however.[52]
A selection of textbooks that teach programming, in languages both popular and obscure. These are only a few of the thousands of programming languages and dialects that have been designed in history.
Programmers of early 1950s computers, notably UNIVAC I and IBM 701, used machine language programs, that is, the first-generation language (1GL). 1GL programming was quickly superseded by similarly machine-specific, but mnemonic, second-generation languages (2GL) known as assembly languages or "assembler". Later in the 1950s, assembly language programming, which had evolved to include the use of macro instructions, was followed by the development of "third-generation" programming languages (3GL), such as FORTRAN, LISP, and COBOL.[53] 3GLs are more abstract and are "portable", or at least implemented similarly on computers that do not support the same native machine code. Updated versions of all of these 3GLs are still in general use, and each has strongly influenced the development of later languages.[54] At the end of the 1950s, the language formalized as ALGOL 60 was introduced, and most later programming languages are, in many respects, descendants of ALGOL.[54] The format and use of the early programming languages were heavily influenced by the constraints of the interface.[55]
Refinement
The period from the 1960s to the late 1970s brought the development of the major language paradigms now in use, though many aspects were refinements of ideas found in the very first third-generation programming languages:

APL introduced array programming and influenced functional programming.[56]
PL/I (NPL) was designed in the early 1960s to incorporate the best ideas from FORTRAN and COBOL.
In the 1960s, Simula was the first language designed to support object-oriented programming; in the mid-1970s, Smalltalk followed with the first "purely" object-oriented language.
C was developed between 1969 and 1973 as a system programming language, and remains popular.[57]
Prolog, designed in 1972, was the first logic programming language.
In 1978, ML built a polymorphic type system on top of Lisp, pioneering statically typed functional programming languages.

Each of these languages spawned an entire family of descendants, and most modern languages count at least one of them in their ancestry.

The 1960s and 1970s also saw considerable debate over the merits of structured programming, and whether programming languages should be designed to support it.[58] Edsger Dijkstra, in a famous 1968 letter published in the Communications of the ACM, argued that GOTO statements should be eliminated from all "higher level" programming languages.[59]

The 1960s and 1970s also saw expansion of techniques that reduced the footprint of a program as well as improved productivity of the programmer and user. The card deck for an early 4GL was a lot smaller for the same functionality expressed in a 3GL deck.
References
[1] Aaby, Anthony (2004). Introduction to Programming Languages (http://www.emu.edu.tr/aelci/Courses/D-318/D-318-Files/plbook/intro.htm).
[2] In mathematical terms, this means the programming language is Turing-complete. MacLennan, Bruce J. (1987). Principles of Programming Languages. Oxford University Press. p. 1. ISBN 0-19-511306-3.
[3] Steven R. Fischer, A History of Language, Reaktion Books, 2003, ISBN 1-86189-080-X, p. 205.
[4] ACM SIGPLAN (2003). "Bylaws of the Special Interest Group on Programming Languages of the Association for Computing Machinery" (http://www.acm.org/sigs/sigplan/sigplan_bylaws.htm). Retrieved 19 June 2006. "The scope of SIGPLAN is the theory, design, implementation, description, and application of computer programming languages - languages that permit the specification of a variety of different computations, thereby providing the user with significant control (immediate or delayed) over the computer's operation."
[5] Dean, Tom (2002). "Programming Robots" (http://www.cs.brown.edu/people/tld/courses/cs148/02/programming.html). Building Intelligent Robots. Brown University Department of Computer Science. Retrieved 23 September 2006.
[6] R. Narasimahan, Programming Languages and Computers: A Unified Metatheory, pp. 189-247 in Franz Alt, Morris Rubinoff (eds.), Advances in Computers, Volume 8, Academic Press, 1994, ISBN 012012108, p. 193: "a complete specification of a programming language must, by definition, include a specification of a processor - idealized, if you will - for that language." [the source cites many references to support this statement]
[7] Ben-Ari, Mordechai (1996). Understanding Programming Languages. John Wiley and Sons. "Programs and languages can be defined as purely formal mathematical objects. However, more people are interested in programs than in other mathematical objects such as groups, precisely because it is possible to use the program (the sequence of symbols) to control the execution of a computer. While we highly recommend the study of the theory of programming, this text will generally limit itself to the study of programs as they are executed on a computer."
[8] David A. Schmidt, The Structure of Typed Programming Languages, MIT Press, 1994, ISBN 0-262-19349-3, p. 32.
[9] Pierce, Benjamin (2002). Types and Programming Languages. MIT Press. p. 339. ISBN 0-262-16209-1.
[10] Digital Equipment Corporation. "Information Technology - Database Language SQL (Proposed revised text of DIS 9075)" (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt). ISO/IEC 9075:1992, Database Language SQL. Retrieved 29 June 2006.
[11] The Charity Development Group (December 1996). "The CHARITY Home Page" (http://pll.cpsc.ucalgary.ca/charity1/www/home.html). Retrieved 29 June 2006. "Charity is a categorical programming language... All Charity computations terminate."
[12] XML in 10 Points (http://www.w3.org/XML/1999/XML-in-10-points.html), W3C, 1999: "XML is not a programming language."
[13] Powell, Thomas (2003). HTML & XHTML: The Complete Reference. McGraw-Hill. p. 25. ISBN 0-07-222942-X. "HTML is not a programming language."
[14] Dykes, Lucinda; Tittel, Ed (2005). XML For Dummies, 4th Edition. Wiley. p. 20. ISBN 0-7645-8845-1. "...it's a markup language, not a programming language."
[15] "What kind of language is XSLT?" (http://www.ibm.com/developerworks/library/x-xslt/). Ibm.com. Retrieved 3 December 2010.
[16] "XSLT is a Programming Language" (http://msdn.microsoft.com/en-us/library/ms767587(VS.85).aspx). Msdn.microsoft.com. Retrieved 3 December 2010.
[17] Scott, Michael (2006). Programming Language Pragmatics. Morgan Kaufmann. p. 802. ISBN 0-12-633951-1. "XSLT, though highly specialized to the transformation of XML, is a Turing-complete programming language."
[18] http://tobi.oetiker.ch/lshort/lshort.pdf
[19] Syropoulos, Apostolos; Antonis Tsolomitis; Nick Sofroniou (2003). Digital Typography Using LaTeX. Springer-Verlag. p. 213. ISBN 0-387-95217-9. "TeX is not only an excellent typesetting engine but also a real programming language."
[20] Robert A. Edmunds, The Prentice-Hall Standard Glossary of Computer Terminology, Prentice-Hall, 1985, p. 91.
[21] Pascal Lando, Anne Lapujade, Gilles Kassel, and Frédéric Fürst, Towards a General Ontology of Computer Programs (http://www.loa-cnr.it/ICSOFT2007_final.pdf), ICSOFT 2007 (http://dblp.uni-trier.de/db/conf/icsoft/icsoft2007-1.html), pp. 163-170.
[22] S.K. Bajpai, Introduction to Computers and C Programming, New Age International, 2007, ISBN 81-224-1379-X, p. 346.
[23] R. Narasimahan, Programming Languages and Computers: A Unified Metatheory, pp. 189-247 in Franz Alt, Morris Rubinoff (eds.), Advances in Computers, Volume 8, Academic Press, 1994, ISBN 012012108, p. 215: "[...] the model [...] for computer languages differs from that [...] for programming languages in only two respects. In a computer language, there are only finitely many names (or registers) which can assume only finitely many values (or states), and these states are not further distinguished in terms of any other attributes. [author's footnote:] This may sound like a truism but its implications are far reaching. For example, it would imply that any model for programming languages, by fixing certain of its parameters or features, should be reducible in a natural way to a model for computer languages."
[24] John C. Reynolds, "Some thoughts on teaching programming and programming languages", SIGPLAN Notices, Volume 43, Issue 11, November 2008, p. 109.
[25] Michael Sipser (1996). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Section 2.2: Pushdown Automata, pp. 101-114.
[26] Jeffrey Kegler, "Perl and Undecidability" (http://www.jeffreykegler.com/Home/perl-and-undecidability), The Perl Review. Papers 2 and 3 prove, using respectively Rice's theorem and direct reduction to the halting problem, that the parsing of Perl programs is in general undecidable.
[27] Marty Hall, 1995, Lecture Notes: Macros (http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.html), PostScript version (http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.ps).
[28] Michael Lee Scott, Programming Language Pragmatics, 2nd edition, Morgan Kaufmann, 2006, ISBN 0-12-633951-1, pp. 18-19.
[29] Andrew Cooke. "Introduction To Computer Languages" (http://www.acooke.org/comp-lang.html). Retrieved 13 July 2012.
[30] Specifically, instantiations of generic types are inferred for certain expression forms. Type inference in Generic Java (the research language that provided the basis for Java 1.5's bounded parametric polymorphism extensions) is discussed in two informal manuscripts from the Types mailing list: Generic Java type inference is unsound (http://www.seas.upenn.edu/~sweirich/types/archive/1999-2003/msg00849.html) (Alan Jeffrey, 17 December 2001) and Sound Generic Java type inference (http://www.seas.upenn.edu/~sweirich/types/archive/1999-2003/msg00921.html) (Martin Odersky, 15 January 2002). C#'s type system is similar to Java's, and uses a similar partial type inference scheme.
[31] "Revised Report on the Algorithmic Language Scheme" (http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-4.html). 20 February 1998. Retrieved 9 June 2006.
[32] Luca Cardelli and Peter Wegner. "On Understanding Types, Data Abstraction, and Polymorphism" (http://citeseer.ist.psu.edu/cardelli85understanding.html). Manuscript (1985). Retrieved 9 June 2006.
[33] Éric Lévénez (2011). "Computer Languages History" (http://www.levenez.com/lang/).
[34] Jing Huang. "Artificial Language vs. Natural Language" (http://www.cs.cornell.edu/info/Projects/Nuprl/cs611/fall94notes/cn2/subsection3_1_3.html).
[35] IBM, in first publishing PL/I, for example, rather ambitiously titled its manual The Universal Programming Language PL/I (IBM Library; 1966). The title reflected IBM's goals for unlimited subsetting capability: "PL/I is designed in such a way that one can isolate subsets from it satisfying the requirements of particular applications." ("PL/I" (http://www.encyclopediaofmath.org/index.php?title=PL/I&oldid=19175). Encyclopedia of Mathematics. Retrieved 29 June 2006.) Ada and UNCOL had similar early goals.
[36] Frederick P. Brooks, Jr.: The Mythical Man-Month, Addison-Wesley, 1982, pp. 93-94.
[37] Dijkstra, Edsger W. On the foolishness of "natural language programming" (http://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/EWD667.html). EWD667.
[38] Perlis, Alan (September 1982). "Epigrams on Programming" (http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html). SIGPLAN Notices Vol. 17, No. 9, pp. 7-13.
[39] Milner, R.; M. Tofte, R. Harper and D. MacQueen (1997). The Definition of Standard ML (Revised). MIT Press. ISBN 0-262-63181-4.
[40] Kelsey, Richard; William Clinger and Jonathan Rees (February 1998). "Section 7.2 Formal semantics" (http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-10.html#%_sec_7.2). Revised(5) Report on the Algorithmic Language Scheme. Retrieved 9 June 2006.
[41] ANSI Programming Language Rexx, X3-274.1996.
[42] "HOPL: an interactive Roster of Programming Languages" (http://hopl.murdoch.edu.au/). Australia: Murdoch University. Retrieved 1 June 2009. "This site lists 8512 languages."
[43] Abelson, Sussman, and Sussman. "Structure and Interpretation of Computer Programs" (http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-10.html). Retrieved 3 March 2009.
[44] http://www.computerweekly.com/Articles/2007/09/11/226631/sslcomputer-weekly-it-salary-survey-finance-boom-drives-it-job.htm
[45] "Counting programming languages by book sales" (http://radar.oreilly.com/archives/2006/08/programming_language_trends_1.html). Radar.oreilly.com. 2 August 2006. Retrieved 3 December 2010.
[46] Bieman, J.M.; Murdock, V., "Finding code on the World Wide Web: a preliminary investigation", Proceedings First IEEE International Workshop on Source Code Analysis and Manipulation, 2001.
[47] "Programming Language Popularity" (http://www.langpop.com/). Langpop.com. Retrieved 3 December 2010.
[48] Carl A. Gunter, Semantics of Programming Languages: Structures and Techniques, MIT Press, 1992, ISBN 0-262-57095-5, p. 1.
[49] "TUNES: Programming Languages" (http://tunes.org/wiki/programming_20languages.html).
[50] Wirth, Niklaus (1993). "Recollections about the development of Pascal" (http://portal.acm.org/citation.cfm?id=155378). Proc. 2nd ACM SIGPLAN Conference on History of Programming Languages: 333-342. doi:10.1145/154766.155378. ISBN 0-89791-570-4. Retrieved 30 June 2006.
[51] Benjamin C. Pierce writes:
"... the lambda calculus has seen widespread use in the specification of programming language features, in language design and implementation, and in the study of type systems."
Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press. p. 52. ISBN 0-262-16209-1.
[52] Rojas, Raúl, et al. (2000). "Plankalkül: The First High-Level Programming Language and its Implementation". Institut für Informatik, Freie Universität Berlin, Technical Report B-3/2000. (full text) (http://www.zib.de/zuse/Inhalt/Programme/Plankalkuel/Plankalkuel-Report/Plankalkuel-Report.htm)
[53] Linda Null, Julia Lobur, The Essentials of Computer Organization and Architecture, 2nd edition, Jones & Bartlett Publishers, 2006, ISBN 0-7637-3769-0, p. 435.
[54] O'Reilly Media. "History of programming languages" (http://www.oreilly.com/news/graphics/prog_lang_poster.pdf) (PDF). Retrieved 5 October 2006.
[55] Frank da Cruz. IBM Punch Cards (http://www.columbia.edu/acis/history/cards.html), Columbia University Computing History (http://www.columbia.edu/acis/history/index.html).
[56] Richard L. Wexelblat: History of Programming Languages, Academic Press, 1981, chapter XIV.
[57] François Labelle. "Programming Language Usage Graph" (http://www.cs.berkeley.edu/~flab/languages.html). SourceForge. Retrieved 21 June 2006. This comparison analyzes trends in the number of projects hosted by a popular community programming repository. During most years of the comparison, C leads by a considerable margin; in 2006, Java overtakes C, but the combination of C/C++ still leads considerably.
[58] Hayes, Brian (2006). "The Semicolon Wars". American Scientist 94 (4): 299-303.
[59] Dijkstra, Edsger W. (March 1968). "Go To Statement Considered Harmful" (http://www.acm.org/classics/oct95/). Communications of the ACM 11 (3): 147-148. doi:10.1145/362929.362947. Retrieved 29 June 2006.
[60] Tetsuro Fujise, Takashi Chikayama, Kazuaki Rokusawa, Akihiko Nakase (December 1994). "KLIC: A Portable Implementation of KL1", Proc. of FGCS '94, ICOT Tokyo, December 1994. KLIC is a portable implementation of the concurrent logic programming language KL1 (http://www.icot.or.jp/ARCHIVE/HomePage-E.html).
[61] Jim Bender (15 March 2004). "Mini-Bibliography on Modules for Functional Programming Languages" (http://readscheme.org/modules/). ReadScheme.org. Retrieved 27 September 2006.
[62] Wall, Programming Perl, ISBN 0-596-00027-8, p. 66.
Further reading
Abelson, Harold; Sussman, Gerald Jay (1996). Structure and Interpretation of Computer Programs (http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-4.html) (2nd ed.). MIT Press.
Raphael Finkel: Advanced Programming Language Design (http://www.nondot.org/sabre/Mirrored/AdvProgLangDesign/), Addison Wesley 1995.
Daniel P. Friedman, Mitchell Wand, Christopher T. Haynes: Essentials of Programming Languages, The MIT Press 2001.
Maurizio Gabbrielli and Simone Martini: Programming Languages: Principles and Paradigms, Springer, 2010.
David Gelernter, Suresh Jagannathan: Programming Linguistics, The MIT Press 1990.
Ellis Horowitz (ed.): Programming Languages, a Grand Tour (3rd ed.), 1987.
Ellis Horowitz: Fundamentals of Programming Languages, 1989.
Shriram Krishnamurthi: Programming Languages: Application and Interpretation, online publication (http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/).
Bruce J. MacLennan: Principles of Programming Languages: Design, Evaluation, and Implementation, Oxford University Press 1999.
John C. Mitchell: Concepts in Programming Languages, Cambridge University Press 2002.
Benjamin C. Pierce: Types and Programming Languages, The MIT Press 2002.
Terrence W. Pratt and Marvin V. Zelkowitz: Programming Languages: Design and Implementation (4th ed.), Prentice Hall 2000.
Peter H. Salus: Handbook of Programming Languages (4 vols.), Macmillan 1998.
Ravi Sethi: Programming Languages: Concepts and Constructs, 2nd ed., Addison-Wesley 1996.
Michael L. Scott: Programming Language Pragmatics, Morgan Kaufmann Publishers 2005.
Robert W. Sebesta: Concepts of Programming Languages, 9th ed., Addison Wesley 2009.
Franklyn Turbak and David Gifford with Mark Sheldon: Design Concepts in Programming Languages, The MIT Press 2009.
Peter Van Roy and Seif Haridi: Concepts, Techniques, and Models of Computer Programming, The MIT Press 2004.
David A. Watt: Programming Language Concepts and Paradigms, Prentice Hall 1990.
David A. Watt and Muffy Thomas: Programming Language Syntax and Semantics, Prentice Hall 1991.
David A. Watt: Programming Language Processors, Prentice Hall 1993.
David A. Watt: Programming Language Design Concepts, John Wiley & Sons 2004.
External links
99 Bottles of Beer (http://www.99-bottles-of-beer.net/) - a collection of implementations in many languages.
Computer Programming Languages (http://www.dmoz.org/Computers/Programming/Languages/) at the Open Directory Project.
Abstraction
In computer science, abstraction is the process by which data and programs are defined with a representation similar in form to their meaning (semantics), while hiding away the implementation details. Abstraction tries to reduce and factor out details so that the programmer can focus on a few concepts at a time. A system can have several abstraction layers whereby different meanings and amounts of detail are exposed to the programmer. For example, low-level abstraction layers expose details of the computer hardware where the program is run, while high-level layers deal with the business logic of the program.

The following English definition of abstraction helps to explain how this term applies to computer science, IT and objects:

abstraction - a concept or idea not associated with any specific instance[1]

Abstraction captures only those details about an object that are relevant to the current perspective. The concept originated by analogy with abstraction in mathematics. The mathematical technique of abstraction begins with mathematical definitions, making it a more technical approach than the general concept of abstraction in philosophy. For example, in both computing and mathematics, numbers are concepts in programming languages, as founded in mathematics. Implementation details depend on the hardware and software, but this is not a restriction because the computing concept of number is still based on the mathematical concept.

In computer programming, abstraction can apply to control or to data: control abstraction is the abstraction of actions, while data abstraction is that of data structures.

Control abstraction involves the use of subprograms and related concepts such as control flows.
Data abstraction allows handling data bits in meaningful ways. For example, it is the basic motivation behind the concept of datatype.

One can regard the notion of an object (from object-oriented programming) as an attempt to combine abstractions of data and code.
The same abstract definition can be used as a common interface for a family of objects with different implementations and behaviors but which share the same meaning. The inheritance mechanism in object-oriented programming can be used to define an abstract class as the common interface. The recommendation that programmers use abstractions whenever suitable in order to avoid duplication (usually of code) is known as the abstraction principle. The requirement that a programming language provide suitable abstractions is also called the abstraction principle.
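A minimal sketch of this idea in Python, using the standard abc module to define an abstract class as the common interface (the Shape hierarchy is invented for illustration):

```python
from abc import ABC, abstractmethod

# One abstract definition serves as the common interface...
class Shape(ABC):
    @abstractmethod
    def area(self):
        """Every concrete shape must say how to compute its area."""

# ...shared by implementations with different behaviors but the same meaning.
class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return 3.14159 * self.radius ** 2

# Client code depends only on the abstraction, not on any implementation.
shapes = [Square(2), Circle(1)]
print([round(s.area(), 2) for s in shapes])
```

Note that Shape itself cannot be instantiated; it exists only as the interface that the inheritance mechanism shares among its subclasses.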
Rationale
Computing mostly operates independently of the concrete world: the hardware implements a model of computation that is interchangeable with others. The software is structured in architectures to enable humans to create enormous systems by concentrating on a few issues at a time. These architectures are made of specific choices of abstractions. Greenspun's Tenth Rule is an aphorism on how such an architecture is both inevitable and complex.

A central form of abstraction in computing is language abstraction: new artificial languages are developed to express specific aspects of a system. Modeling languages help in planning. Computer languages can be processed with a computer. An example of this abstraction process is the generational development of programming languages from machine language to assembly language and then to high-level language. Each stage can be used as a stepping stone for the next stage. The language abstraction continues, for example, in scripting languages and domain-specific programming languages.

Within a programming language, some features let the programmer create new abstractions. These include the subroutine, the module, and the software component. Some other abstractions, such as software design patterns and architectural styles, remain invisible to a programming language and operate only in the design of a system.

Some abstractions try to limit the breadth of concepts a programmer needs by completely hiding the abstractions they in turn are built on. The software engineer and writer Joel Spolsky has criticised these efforts by claiming that all abstractions are leaky, that is, that they can never completely hide the details below; however, this does not negate the usefulness of abstraction.

Some abstractions are designed to interoperate with others; for example, a programming language may contain a foreign function interface for making calls to the lower-level language.
Language features
Programming languages
Different programming languages provide different types of abstraction, depending on the intended applications for the language. For example:

In object-oriented programming languages such as C++, Object Pascal, or Java, the concept of abstraction has itself become a declarative statement, using the keywords virtual (in C++) or abstract (in Java). After such a declaration, it is the responsibility of the programmer to implement a class to instantiate the object of the declaration.

Functional programming languages commonly exhibit abstractions related to functions, such as lambda abstractions (making a term into a function of some variable), higher-order functions (functions whose parameters are themselves functions), and bracket abstraction (making a term into a function of a variable).

Modern Lisps such as Clojure, Scheme and Common Lisp support macro systems to allow syntactic abstraction. This allows a Lisp programmer to eliminate boilerplate code, abstract away tedious function call sequences, implement new control flow structures, and even build domain-specific languages (DSLs), which allow domain-specific concepts to be expressed in an optimised way. All of these, when used correctly, improve both the programmer's efficiency and the clarity of the code by making the intended purpose more explicit. A consequence of syntactic abstraction is also that any Lisp dialect, and in fact almost any programming language, can, in principle, be implemented in any modern Lisp with significantly reduced (but still non-trivial in some cases) effort when compared to "more traditional" programming languages such as Python, C or Java.
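The lambda abstractions and higher-order functions mentioned above can be sketched in Java, the language the later examples in this article use. This is an illustrative sketch (the class and method names are invented, and it relies on java.util.function, available since Java 8), not part of the original article:

```java
import java.util.function.Function;

public class HigherOrder {
    // twice is a higher-order function: its parameter is itself a function,
    // and it returns a new function built from it.
    static Function<Integer, Integer> twice(Function<Integer, Integer> f) {
        return x -> f.apply(f.apply(x));
    }

    public static void main(String[] args) {
        // A lambda abstraction: the term x + 3 made into a function of x.
        Function<Integer, Integer> addThree = x -> x + 3;
        Function<Integer, Integer> addSix = twice(addThree);
        System.out.println(addSix.apply(10)); // prints 16
    }
}
```

The point of the sketch is that functions are treated as values to be abstracted over, exactly as the paragraph describes for functional languages.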
Specification methods
Analysts have developed various methods to formally specify software systems. Some known methods include:

- Abstract-model based methods (VDM, Z)
- Algebraic techniques (Larch, CLEAR, OBJ, ACT ONE, CASL)
- Process-based techniques (LOTOS, SDL, Estelle)
- Trace-based techniques (SPECIAL, TAM)
- Knowledge-based techniques (Refine, Gist)
Specification languages
Specification languages generally rely on abstractions of one kind or another, since specifications are typically defined earlier in a project (and at a more abstract level) than an eventual implementation. The UML specification language, for example, allows the definition of abstract classes, which remain abstract during the architecture and specification phase of the project.
Control abstraction
Programming languages offer control abstraction as one of the main purposes of their use. Computers understand operations at a very low level, such as moving some bits from one location of memory to another and producing the sum of two sequences of bits. Programming languages allow this to be done at a higher level. For example, consider this statement written in a Pascal-like fashion:

a := (1 + 2) * 5

To a human, this seems a fairly simple and obvious calculation ("one plus two is three, times five is fifteen"). However, the low-level steps necessary to carry out this evaluation, return the value "15", and then assign that value to the variable "a" are actually quite subtle and complex. The values need to be converted to binary representation (often a much more complicated task than one would think) and the calculations decomposed (by the compiler or interpreter) into assembly instructions (again, much less intuitive to the programmer: operations such as shifting a binary register left, or adding the binary complement of the contents of one register to another, are simply not how humans think about the abstract arithmetical operations of addition or multiplication). Finally, assigning the resulting value of "15" to the variable labeled "a", so that "a" can be used later, involves additional 'behind-the-scenes' steps of looking up the variable's label and the resultant location in physical or virtual memory, storing the binary representation of "15" to that memory location, and so on.

Without control abstraction, a programmer would need to specify all the register/binary-level steps each time they simply wanted to add or multiply a couple of numbers and assign the result to a variable. Such duplication of effort has two serious negative consequences:

1. it forces the programmer to constantly repeat fairly common tasks every time a similar operation is needed;
2. it forces the programmer to program for the particular hardware and instruction set.
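The kind of low-level work that the expression hides can be made concrete in a small sketch. The following Java fragment (illustrative only; no real compiler works exactly this way) performs addition using only bitwise operations, mimicking hardware carry propagation, and multiplication as repeated addition:

```java
public class BitwiseAdd {
    // Add two integers using only bitwise operations, mimicking the
    // carry propagation that hardware adders perform.
    static int add(int a, int b) {
        while (b != 0) {
            int carry = (a & b) << 1; // positions where both bits are 1
            a = a ^ b;                // sum ignoring carries
            b = carry;                // carries to be added in next round
        }
        return a;
    }

    public static void main(String[] args) {
        // The high-level statement a := (1 + 2) * 5, expressed as
        // low-level additions (multiplication as repeated addition):
        int sum = add(1, 2);
        int result = 0;
        for (int i = 0; i < 5; i++) {
            result = add(result, sum);
        }
        System.out.println(result); // prints 15
    }
}
```

Control abstraction is precisely what lets the programmer write the one-line expression instead of this machinery.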
Structured programming
Structured programming involves the splitting of complex program tasks into smaller pieces with clear flow control and interfaces between components, reducing complexity and the potential for side-effects.

In a simple program, this may aim to ensure that loops have single or obvious exit points and (where possible) that functions and procedures have single exit points.

In a larger system, it may involve breaking down complex tasks into many different modules. Consider a system which handles payroll on ships and at shore offices:

- The uppermost level may feature a menu of typical end-user operations.
- Within that could be standalone executables or libraries for tasks such as signing employees on and off or printing checks.
- Within each of those standalone components there could be many different source files, each containing the program code to handle a part of the problem, with only selected interfaces available to other parts of the program. A sign-on program could have source files for each data entry screen and the database interface (which may itself be a standalone third-party library or a statically linked set of library routines).
- Either the database or the payroll application also has to initiate the process of exchanging data between ship and shore, and that data transfer task will often contain many other components.

These layers produce the effect of isolating the implementation details of one component and its assorted internal methods from the others. Object-oriented programming embraced and extended this concept.
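The idea of exposing only selected interfaces between components can be sketched in Java. All names here (SignOnService and so on) are hypothetical, invented for illustration of the payroll example, not taken from any real system:

```java
// Only this selected interface is visible to other components.
interface SignOnService {
    void signOn(String employeeId);
    void signOff(String employeeId);
    boolean isOnDuty(String employeeId);
}

// Implementation details (in a real system: data entry screens,
// database access) stay hidden behind the interface.
class LocalSignOnService implements SignOnService {
    private final java.util.Set<String> onDuty = new java.util.HashSet<>();

    public void signOn(String employeeId)  { onDuty.add(employeeId); }
    public void signOff(String employeeId) { onDuty.remove(employeeId); }
    public boolean isOnDuty(String employeeId) { return onDuty.contains(employeeId); }
}

public class PayrollDemo {
    public static void main(String[] args) {
        SignOnService service = new LocalSignOnService();
        service.signOn("EMP-42");
        System.out.println(service.isOnDuty("EMP-42")); // prints true
        service.signOff("EMP-42");
        System.out.println(service.isOnDuty("EMP-42")); // prints false
    }
}
```

Other parts of the program depend only on SignOnService, so the sign-on component's internals can change without affecting them.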
Data abstraction
Data abstraction enforces a clear separation between the abstract properties of a data type and the concrete details of its implementation. The abstract properties are those that are visible to client code that makes use of the data type (the interface to the data type), while the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour.

For example, one could define an abstract data type called lookup table, which uniquely associates keys with values, and in which values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list of (key, value) pairs. As far as client code is concerned, the abstract properties of the type are the same in each case.

Of course, this all relies on getting the details of the interface right in the first place, since any changes there can have major impacts on client code. One way to look at this is that the interface forms a contract on agreed behaviour between the data type and client code; anything not spelled out in the contract is subject to change without notice.

Languages that implement data abstraction include Ada and Modula-2. Object-oriented languages are commonly claimed to offer data abstraction; however, their inheritance concept tends to put information in the interface that more properly belongs in the implementation; thus, changes to such information end up impacting client code, leading directly to the fragile binary interface problem.
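The lookup-table example above can be sketched in Java. The interface and class names are invented for illustration; client code sees only the interface, so the two implementations (a hash table and a linear list of pairs) are interchangeable:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The abstract data type: the contract client code relies on.
interface LookupTable<K, V> {
    void put(K key, V value);
    V get(K key); // returns null if the key is absent
}

// One concrete implementation: a hash table.
class HashLookupTable<K, V> implements LookupTable<K, V> {
    private final Map<K, V> map = new HashMap<>();
    public void put(K key, V value) { map.put(key, value); }
    public V get(K key) { return map.get(key); }
}

// Another: a simple linear list of (key, value) pairs.
class ListLookupTable<K, V> implements LookupTable<K, V> {
    private final List<Map.Entry<K, V>> pairs = new ArrayList<>();
    public void put(K key, V value) {
        pairs.add(new AbstractMap.SimpleEntry<>(key, value));
    }
    public V get(K key) {
        // Scan from the end so the most recent binding wins.
        for (int i = pairs.size() - 1; i >= 0; i--) {
            if (pairs.get(i).getKey().equals(key)) {
                return pairs.get(i).getValue();
            }
        }
        return null;
    }
}

public class LookupDemo {
    public static void main(String[] args) {
        List<LookupTable<String, Integer>> tables =
                List.of(new HashLookupTable<>(), new ListLookupTable<>());
        for (LookupTable<String, Integer> t : tables) {
            t.put("apples", 3);
            t.put("pears", 5);
            // The abstract behaviour is identical for both implementations.
            System.out.println(t.get("apples") + " " + t.get("pears"));
        }
    }
}
```

Swapping one implementation for the other changes efficiency characteristics but, as the text notes, has no impact on client code written against LookupTable.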
Abstraction in object-oriented programming

In object-oriented programming, when abstraction proceeds into the operations defined, enabling objects of different types to be substituted, it is called polymorphism; when it proceeds in the opposite direction, inside the types or classes, structuring them to simplify a complex set of relationships, it is called delegation or inheritance.

Various object-oriented programming languages offer similar facilities for abstraction, all to support a general strategy of polymorphism in object-oriented programming, which includes the substitution of one type for another in the same or similar role. Although not as generally supported, a configuration or image or package may predetermine a great many of these bindings at compile time, link time, or load time. This would leave only a minimum of such bindings to change at run time.

Common Lisp Object System and Self, for example, feature less of a class-instance distinction and more use of delegation for polymorphism. Individual objects and functions are abstracted more flexibly to better fit with a shared functional heritage from Lisp. C++ exemplifies another extreme: it relies heavily on templates, overloading and other static bindings at compile time, which in turn has certain flexibility problems.

Although these examples offer alternate strategies for achieving the same abstraction, they do not fundamentally alter the need to support abstract nouns in code: all programming relies on an ability to abstract verbs as functions, nouns as data structures, and either as processes.

Consider for example a sample Java fragment to represent some common farm "animals" to a level of abstraction suitable to model simple aspects of their hunger and feeding.
It defines an Animal class to represent both the state of the animal and its functions:

public class Animal extends LivingThing {
    private Location loc;
    private double energyReserves;

    public boolean isHungry() {
        return energyReserves < 2.5;
    }
    public void eat(Food f) {
        // Consume food
        energyReserves += f.getCalories();
    }
    public void moveTo(Location l) {
        // Move to new location
        loc = l;
    }
}

With the above definition, one could create objects of type Animal and call their methods like this:

thePig = new Animal();
theCow = new Animal();
if (thePig.isHungry()) {
    thePig.eat(tableScraps);
}
if (theCow.isHungry()) {
    theCow.eat(grass);
}
theCow.moveTo(theBarn);
In the above example, the class Animal is an abstraction used in place of an actual animal, and LivingThing is a further abstraction (in this case a generalisation) of Animal.

If one requires a more differentiated hierarchy of animals, to differentiate, say, those which provide milk from those which provide nothing except meat at the end of their lives, that is an intermediary level of abstraction, probably DairyAnimal (cows, goats), which would eat foods suitable for giving good milk, and MeatAnimal (pigs, steers), which would eat foods giving the best meat quality. Such an abstraction could remove the need for the application coder to specify the type of food, so that he or she could concentrate instead on the feeding schedule.

The two classes could be related using inheritance or stand alone, and the programmer could define varying degrees of polymorphism between the two types. These facilities tend to vary drastically between languages, but in general each can achieve anything that is possible with any of the others. A great many operation overloads, data type by data type, can have the same effect at compile time as any degree of inheritance or other means to achieve polymorphism. The class notation is simply a coder's convenience.
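The DairyAnimal/MeatAnimal level of abstraction described above might be sketched as follows. This is a hypothetical, self-contained continuation of the article's example: the class names follow the text, but the method names and calorie figures are invented:

```java
// Simplified stand-in for the article's Animal class (self-contained
// here: food is reduced to a calorie count).
class Animal {
    protected double energyReserves = 0.0;

    boolean isHungry() { return energyReserves < 2.5; }
    void eat(double calories) { energyReserves += calories; }
}

class DairyAnimal extends Animal {
    // The subclass fixes a food suitable for giving good milk, so the
    // application coder can concentrate on the feeding schedule.
    void feed() { eat(3.0); } // 3.0 is an assumed calorie value
}

class MeatAnimal extends Animal {
    // Food chosen to give the best meat quality; likewise assumed.
    void feed() { eat(4.0); }
}

public class BarnyardDemo {
    public static void main(String[] args) {
        DairyAnimal cow = new DairyAnimal();
        System.out.println(cow.isHungry()); // prints true: reserves start at 0
        cow.feed();                         // no food type specified by caller
        System.out.println(cow.isHungry()); // prints false: reserves now 3.0
    }
}
```

Note that the caller never names the food: that decision now lives at the DairyAnimal/MeatAnimal level of abstraction.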
Object-oriented design
Decisions regarding what to abstract and what to keep under the control of the coder become the major concern of object-oriented design and domain analysis; actually determining the relevant relationships in the real world is the concern of object-oriented analysis or legacy analysis.

In general, to determine appropriate abstraction, one must make many small decisions about scope (domain analysis), determine what other systems one must cooperate with (legacy analysis), then perform a detailed object-oriented analysis which is expressed within project time and budget constraints as an object-oriented design.

In our simple example, the domain is the barnyard; the live pigs and cows and their eating habits are the legacy constraints; the detailed analysis is that coders must have the flexibility to feed the animals what is available, and thus there is no reason to code the type of food into the class itself; and the design is a single simple Animal class of which pigs and cows are instances with the same functions. A decision to differentiate DairyAnimal would change the detailed analysis, but the domain and legacy analysis would be unchanged; thus it is entirely under the control of the programmer, and we refer to abstraction in object-oriented programming as distinct from abstraction in domain or legacy analysis.
Considerations
When discussing formal semantics of programming languages, formal methods, or abstract interpretation, abstraction refers to the act of considering a less detailed, but safe, definition of the observed program behaviors. For instance, one may observe only the final result of program executions instead of considering all the intermediate steps of execution. Abstraction is defined relative to a concrete (more precise) model of execution.

Abstraction may be exact or faithful with respect to a property if one can answer a question about the property equally well on the concrete or abstract model. For instance, if we wish to know what the result of the evaluation of a mathematical expression involving only integers and +, -, × is worth modulo n, we need only perform all operations modulo n (a familiar form of this abstraction is casting out nines).

Abstractions, however, though not necessarily exact, should be sound. That is, it should be possible to get sound answers from them, even though the abstraction may simply yield a result of undecidability. For instance, we may abstract the students in a class by their minimal and maximal ages; if one asks whether a certain person belongs to that class, one may simply compare that person's age with the minimal and maximal ages: if his age lies outside the range, one may safely answer that the person does not belong to the class; if it does not, one may only answer "I don't know".

The level of abstraction included in a programming language can influence its overall usability. The Cognitive dimensions framework includes the concept of abstraction gradient in a formalism. This framework allows the designer of a programming language to study the trade-offs between abstraction and other characteristics of the design, and how changes in abstraction influence the language's usability.

Abstractions can prove useful when dealing with computer programs, because non-trivial properties of computer programs are essentially undecidable (see Rice's theorem). As a consequence, automatic methods for deriving information on the behavior of computer programs have to drop either termination (on some occasions, they may fail, crash or never yield a result), soundness (they may provide false information), or precision (they may answer "I don't know" to some questions).

Abstraction is the core concept of abstract interpretation. Model checking generally takes place on abstract versions of the studied systems.
Levels of abstraction
Computer science commonly presents levels (or, less commonly, layers) of abstraction, wherein each level represents a different model of the same information and processes, but uses a system of expression involving a unique set of objects and compositions that apply only to a particular domain.[2] Each relatively abstract, "higher" level builds on a relatively concrete, "lower" level, which tends to provide an increasingly "granular" representation. For example, gates build on electronic circuits, binary on gates, machine language on binary, programming language on machine language, applications and operating systems on programming languages. Each level is embodied, but not determined, by the level beneath it, making it a language of description that is somewhat self-contained.
Database systems
Since many users of database systems lack in-depth familiarity with computer data structures, database developers often hide complexity through the following levels:

- Physical level: The lowest level of abstraction describes how a system actually stores data. The physical level describes complex low-level data structures in detail.
- Logical level: The next higher level of abstraction describes what data the database stores, and what relationships exist among those data. The logical level thus describes an entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. This is referred to as physical data independence. Database administrators, who must decide what information to keep in a database, use the logical level of abstraction.
- View level: The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of a database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

(Figure: data abstraction levels of a database system.)
Layered architecture
The ability to provide a design at different levels of abstraction can:

- simplify the design considerably;
- enable different role players to effectively work at various levels of abstraction.

Systems design and business process design can both use this. Some design processes specifically generate designs that contain various levels of abstraction.

Layered architecture partitions the concerns of the application into stacked groups (layers). It is a technique used in designing computer software, hardware, and communications in which system or network components are isolated in layers so that changes can be made in one layer without affecting the others.
Notes
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
[1] Thefreedictionary.com (http://www.thefreedictionary.com/abstraction)
[2] Luciano Floridi, Levellism and the Method of Abstraction (http://www.philosophyofinformation.net/pdf/latmoa.pdf), IEG Research Report 22.11.04
Further reading
Harold Abelson; Gerald Jay Sussman; Julie Sussman (25 July 1996). Structure and Interpretation of Computer Programs (http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-10.html) (2nd ed.). MIT Press. ISBN 978-0-262-01153-2. Retrieved 22 June 2012.
Spolsky, Joel (11 November 2002). "The Law of Leaky Abstractions" (http://www.joelonsoftware.com/articles/LeakyAbstractions.html). Joel on Software.
Abstraction/information hiding (http://www.cs.cornell.edu/courses/cs211/2006sp/Lectures/L08-abstraction/08_abstraction.html) - CS211 course, Cornell University.
Gorodinski, Lev (31 May 2012). "Abstractions" (http://gorodinski.com/blog/2012/05/31/abstractions/).
Programmer
A programmer, computer programmer, developer, or coder is a person who writes computer software. The term computer programmer can refer to a specialist in one area of computer programming or to a generalist who writes code for many kinds of software. One who practices or professes a formal approach to programming may also be known as a programmer analyst. A programmer's primary computer language (C, C++, Java, Lisp, Python, etc.) is often prefixed to the above titles, and those who work in a Web environment often prefix their titles with Web.

The term programmer can be used to refer to a software developer, Web developer, mobile applications developer, embedded firmware developer, software engineer, computer scientist, or software analyst. However, members of these professions typically possess other software engineering skills beyond programming; for this reason, the term programmer is sometimes considered an insulting or derogatory oversimplification of these other professions. This has sparked much debate amongst developers, analysts, computer scientists, programmers, and outsiders who continue to be puzzled at the subtle differences in the definitions of these occupations.[1][2][3][4][5]

(Photo: student programmers at the Technische Hochschule in Aachen, Germany, in 1970.)

British countess and mathematician Ada Lovelace is popularly credited as history's first programmer, as she was the first to express an algorithm intended for implementation on a computer, Charles Babbage's analytical engine, in October 1842, intended for the calculation of Bernoulli numbers.[6] Her work never ran because Babbage's machine was never completed to a functioning standard in her time; the first programmer to successfully run a program on a functioning modern electronically based computer was pioneer computer scientist Konrad Zuse, who achieved this feat in 1941.
The members of the ENIAC programming team, consisting of Kay McNulty, Betty Jennings, Betty Snyder, Marlyn Wescoff, Fran Bilas and Ruth Lichterman, were the first regularly working programmers.[7][8]

International Programmers' Day is celebrated annually on January 7.[9] In 2009, the government of Russia decreed a professional annual holiday known as Programmers' Day, to be celebrated on September 13 (September 12 in leap years). It had also been an unofficial international holiday before that.
Programs vary widely depending upon the type of information to be accessed or generated. For example, the instructions involved in updating financial records are very different from those required to duplicate conditions on an aircraft for pilots training in a flight simulator. Although simple programs can be written in a few hours, programs that use complex mathematical formulas whose solutions can only be approximated, or that draw data from many existing systems, may require more than a year of work. In most cases, several programmers work together as a team under a senior programmer's supervision.

Programmers write programs according to the specifications determined primarily by more senior programmers and by systems analysts. After the design process is complete, it is the job of the programmer to convert that design into a logical series of instructions that the computer can follow. The programmer codes these instructions in one of many programming languages. Different programming languages are used depending on the purpose of the program. COBOL, for example, is commonly used for business applications that typically run on mainframe and midrange computers, whereas Fortran is used in science and engineering. C++ is widely used for both scientific and business applications. Java, C# and PHP are popular programming languages for Web and business applications. Programmers generally know more than one programming language and, because many languages are similar, they often can learn new languages relatively easily. In practice, programmers often are referred to by the language they know, e.g. as Java programmers, or by the type of function they perform or environment in which they work: for example, database programmers, mainframe programmers, or Web developers.

When making changes to the source code that programs are made up of, programmers need to make other programmers aware of the task that the routine is to perform. They do this by inserting comments in the source code so that others can understand the program more easily. To save work, programmers often use libraries of basic code that can be modified or customized for a specific application.
This approach yields more reliable and consistent programs and increases programmers' productivity by eliminating some routine steps.
Types of software
Programmers in software development companies may work directly with experts from various fields to create software, either programs designed for specific clients or packaged software for general use, ranging from computer and video games to educational software to programs for desktop publishing and financial planning. Programming of packaged software constitutes one of the most rapidly growing segments of the computer services industry. In some organizations, particularly small ones, workers commonly known as programmer analysts are responsible for both the systems analysis and the actual programming work.

The transition from a mainframe environment to one that is based primarily on personal computers (PCs) has blurred the once rigid distinction between the programmer and the user. Increasingly, adept end users are taking over many of the tasks previously performed by programmers. For example, the growing use of packaged software, such as spreadsheet and database management software packages, allows users to write simple programs to access data and perform calculations.

In addition, the rise of the Internet has made Web development a huge part of the programming field. More and more software applications nowadays are Web applications that can be used by anyone with a Web browser. Examples of such applications include the Google search service, the Hotmail e-mail service, and the Flickr photo-sharing service.
Globalization
Market changes in the UK
According to the BBC, 17% of computer science students could not find work in their field seven months after graduation in 2009, which was the highest rate of the university majors surveyed, while 0% of medical students were unemployed in the same survey.[11] The UK category system does, however, class such degrees as information technology and game design as 'computer science', somewhat inflating the actual figure.[12]
References
[1] "No Programmers" (http://www.ericsink.com/No_Programmers.html).
[2] "Developer versus programmer" (http://codebetter.com/blogs/raymond.lewallen/archive/2005/02/22/55812.aspx).
[3] "Developers AND Programmers" (http://weblogs.asp.net/miked/archive/2006/10/13/_2200_Developers_2200_-and-_2200_Programmers_2200_.aspx).
[4] "Programmer vs. Developer vs. Software Engineer" (http://discuss.joelonsoftware.com/default.asp?joel.3.112837.37).
[5] "Programmer vs. Developer vs. Software Engineer" (http://www.xtremevbtalk.com/archive/index.php/t-233780.html).
[6] J. Fuegi and J. Francis, "Lovelace & Babbage and the creation of the 1843 'notes'." Annals of the History of Computing 25 #4 (October-December 2003): 19, 25. Digital Object Identifier (http://dx.doi.org/10.1109/MAHC.2003.1253887)
[7] "ENIAC Programmers Project" (http://eniacprogrammers.org/). Eniacprogrammers.org. Retrieved 2010-10-03.
[8] "ABC News: First Computer Programmers Inspire Documentary" (http://abcnews.go.com/Technology/story?id=3951187&page=1). Abcnews.go.com. 2007-12-04. Retrieved 2010-10-03.
[9] "International Programmers' Day" (http://www.internationalprogrammersday.org).
[10] http://www.bls.gov/oco/ocos110.htm
[11] "'One in 10' UK graduates unemployed" (http://www.bbc.co.uk/news/10477551), from the BBC.
[12] ATAS classifications (http://www.plymouth.ac.uk/pages/view.asp?page=23727) (University of Plymouth).
Further reading
Weinberg, Gerald M., The Psychology of Computer Programming, New York: Van Nostrand Reinhold, 1971
An experiential study of the nature of programming work: Lucas, Rob. "Dreaming in Code" (http://www.newleftreview.org/?view=2836), New Left Review 62, March-April 2010, pp. 125-132.
External links
"The Future of IT Jobs in America" article (http://www.ideosphere.com/fx-bin/Claim?claim=ITJOBS)
How to be a programmer (http://samizdat.mines.edu/howto/HowToBeAProgrammer.html) - an overview of the challenges of being a programmer
The US Department of Labor's description of "Computer Programmer" (http://www.bls.gov/oco/ocos110.htm) and "Computer Software Engineer" (http://www.bls.gov/oco/ocos267.htm), and statistics for employed "Computer Programmers" (http://www.bls.gov/oes/current/oes151021.htm)
Language primitive
In computing, language primitives are the simplest elements available in a programming language. A primitive can be defined as the smallest 'unit of processing' available to a programmer of a particular machine, or as an atomic element of an expression in a language. Primitives are units with a meaning, i.e. a semantic value in the language. They are thus different from tokens in a parser, which are the minimal elements of syntax.
Assembly language
See the terminology section below for information regarding inconsistent use of the terms assembly and assembler. An assembly language is a low-level programming language for a computer, microcontroller, or other programmable device, in which each statement corresponds to a single machine code instruction. Each assembly language is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler; the conversion process is referred to as assembly, or assembling the code. Assembly language uses a mnemonic to represent each low-level machine operation or opcode. Some opcodes require one or more operands as part of the instruction, and most assemblers can take labels and symbols as operands to represent addresses and constants, instead of hard coding them into the program. Macro assemblers include a macroinstruction facility so that assembly language text can be pre-assigned to a name, and that name can be used to insert the text into other code. Many assemblers offer additional mechanisms to facilitate program development, to control the assembly process, and to aid debugging.
Key concepts
Assembler
An assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities.[1] The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g. to generate common short sequences of instructions inline, instead of as called subroutines.

Assemblers have been available since the 1950s and are far simpler to write than compilers for high-level languages, as each mnemonic instruction / address mode combination translates directly into a single machine language opcode. Modern assemblers, especially for RISC architectures such as SPARC or POWER, as well as x86 and x86-64, optimize instruction scheduling to exploit the CPU pipeline efficiently.
Number of passes

There are two types of assemblers, based on how many passes through the source are needed to produce the executable program.

- One-pass assemblers go through the source code once. Any symbol used before it is defined will require "errata" at the end of the object code (or, at least, no earlier than the point where the symbol is defined), telling the linker or the loader to "go back" and overwrite a placeholder which had been left where the as yet undefined symbol was used.
- Multi-pass assemblers create a table with all symbols and their values in the first passes, then use the table in later passes to generate code.

In both cases, the assembler must be able to determine the size of each instruction on the initial passes in order to calculate the addresses of subsequent symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if necessary pad it with one or more "no-operation" instructions in a later pass or the errata. In an assembler with peephole optimization, addresses may be recalculated between passes to allow replacing pessimistic code with code tailored to the exact distance from the target.

The original reason for the use of one-pass assemblers was speed of assembly; often a second pass would require rewinding and rereading a tape or rereading a deck of cards. Modern computers perform multi-pass assembly without unacceptable delay.
The advantage of the multi-pass assembler is that the absence of errata makes the linking process (or the program load, if the assembler directly produces executable code) faster.[2]

High-level assemblers

More sophisticated high-level assemblers provide language abstractions such as:
- Advanced control structures
- High-level procedure/function declarations and invocations
- High-level abstract data types, including structures/records, unions, classes, and sets
- Sophisticated macro processing (although available on ordinary assemblers since the late 1950s for the IBM 700 series, and since the 1960s for the IBM System/360, amongst other machines)
- Object-oriented programming features such as classes, objects, abstraction, polymorphism, and inheritance[3]

See Language design, below, for more details.
Assembly language
A program written in assembly language consists of a series of (mnemonic) processor instructions and meta-statements (known variously as directives, pseudo-instructions and pseudo-ops), comments and data. Assembly language instructions usually consist of an opcode mnemonic followed by a list of data, arguments or parameters.[4] These are translated by an assembler into machine language instructions that can be loaded into memory and executed.

For example, consider the instruction that tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for which register to use. The identifier for the AL register is 000, so the following machine code loads the AL register with the data 01100001:[5]

10110000 01100001

This binary computer code can be made more human-readable by expressing it in hexadecimal as follows:

B0 61

Here, B0 means 'Move a copy of the following value into AL', and 61 is a hexadecimal representation of the value 01100001, which is 97 in decimal. Intel assembly language provides the mnemonic MOV (an abbreviation of move)
for instructions such as this, so the machine code above can be written as follows in assembly language, complete with an explanatory comment if required, after the semicolon. This is much easier to read and to remember:

MOV AL, 61h ; Load AL with 97 decimal (61 hex)
In some assembly languages the same mnemonic, such as MOV, may be used for a family of related instructions for loading, copying and moving data, whether these are immediate values, values in registers, or memory locations pointed to by values in registers. Other assemblers may use separate opcodes such as L for "move memory to register", ST for "move register to memory", LR for "move register to register", MVI for "move immediate operand to memory", etc. The Intel opcode 10110000 (B0) copies an 8-bit value into the AL register, while 10110001 (B1) moves it into CL and 10110010 (B2) does so into DL. Assembly language examples for these follow.[5]

MOV AL, 1h ; Load AL with immediate value 1
MOV CL, 2h ; Load CL with immediate value 2
MOV DL, 3h ; Load DL with immediate value 3
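As a sketch, the B0/B1/B2 encodings above can be generated mechanically: the opcode byte is 0xB0 plus the 3-bit register number, followed by the immediate byte (register numbers per the Intel encoding cited above).

```python
# Encoding the x86 "MOV r8, imm8" family: opcode 0xB0 + register
# number, followed by the immediate value (two bytes total).
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3,
        "AH": 4, "CH": 5, "DH": 6, "BH": 7}

def mov_r8_imm8(reg, imm):
    return bytes([0xB0 + REG8[reg], imm])

print(mov_r8_imm8("AL", 0x61).hex())  # b061 -- the "MOV AL, 61h" example
print(mov_r8_imm8("CL", 0x02).hex())  # b102
```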
The syntax of MOV can also be more complex as the following examples show.[6]
MOV EAX, [EBX] ; Move the 4 bytes in memory at the address contained in EBX into EAX
MOV [ESI+EAX], CL ; Move the contents of CL into the byte at address ESI+EAX
In each case, the MOV mnemonic is translated directly into an opcode in the ranges 88-8E, A0-A3, B0-B8, C6 or C7 by an assembler, and the programmer does not have to know or remember which.[5]

Transforming assembly language into machine code is the job of an assembler, and the reverse can at least partially be achieved by a disassembler. Unlike high-level languages, there is usually a one-to-one correspondence between simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions (essentially macros) which expand into several machine language instructions to provide commonly needed functionality. For example, for a machine that lacks a "branch if greater or equal" instruction, an assembler may provide a pseudoinstruction that expands to the machine's "set if less than" and "branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences.

Each computer architecture has its own machine language. Computers differ in the number and type of operations they support, in the different sizes and numbers of registers, and in the representations of data in storage. While most general-purpose computers are able to carry out essentially the same functionality, the ways they do so differ; the corresponding assembly languages reflect these differences.

Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically instantiated in different assembler programs. In these cases, the most popular one is usually that supplied by the manufacturer and used in its documentation.
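The pseudoinstruction expansion described above can be sketched as a simple rewriting step before encoding. The mnemonics here are loosely MIPS-like (where bge conventionally expands to slt plus beq using the assembler temporary $at), but the register and label names are illustrative:

```python
# Sketch of pseudo-instruction expansion: "branch if greater or equal"
# synthesized from "set if less than" plus "branch if equal to zero".
def expand(instr):
    parts = instr.replace(",", "").split()
    if parts[0] == "bge":                    # pseudo: branch if rs >= rt
        _, rs, rt, label = parts
        return [f"slt $at, {rs}, {rt}",      # $at = 1 if rs < rt, else 0
                f"beq $at, $zero, {label}"]  # branch when rs >= rt
    return [instr]                           # real instructions pass through

print(expand("bge $t0, $t1, loop"))
# ['slt $at, $t0, $t1', 'beq $at, $zero, loop']
```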
Language design
Basic elements
There is a large degree of diversity in the way the authors of assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of three types of instruction statements that are used to define program operations:
- Opcode mnemonics
- Data sections
- Assembly directives

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be immediate (values coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works.

Extended mnemonics are often used to specify a combination of an opcode with a specific operand, e.g., the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15, and NOP for BC with a mask of 0. Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode that encodes the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop.
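A toy sketch of the nop case above: an assembler can implement the extended mnemonic as a pseudo-opcode that is rewritten to the underlying instruction before encoding, so both spellings produce the same byte (0x90 on the 8086). The two small tables are illustrative, not a real assembler's.

```python
# Extended mnemonics as aliases resolved before opcode lookup.
ALIASES = {"nop": "xchg ax,ax"}       # extended mnemonic -> real instruction
ENCODINGS = {"xchg ax,ax": b"\x90"}   # the only encoding this toy knows

def encode(instr):
    instr = ALIASES.get(instr, instr)  # resolve extended mnemonics first
    return ENCODINGS[instr]

print(encode("nop").hex())  # 90 -- identical bytes either way
```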
Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions.[7] Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b.[8] These are sometimes known as pseudo-opcodes.

Data sections

There are instructions used to define data elements to hold data and variables. They define the type of data, and the length and alignment of data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined. Some assemblers classify these as pseudo-ops.

Assembly directives

Assembly directives, also called pseudo-opcodes, pseudo-operations or pseudo-ops, are instructions that are executed by an assembler at assembly time, not by a CPU at run time. They can make the assembly of the program dependent on parameters input by a programmer, so that one program can be assembled in different ways, perhaps for different applications. They can also be used to manipulate the presentation of a program to make it easier to read and maintain. (For example, directives would be used to reserve storage areas and optionally set their initial contents.) The names of directives often start with a dot to distinguish them from machine instructions.
Symbolic assemblers let programmers associate arbitrary names (labels or symbols) with memory locations. Usually, every constant and variable is given a name so instructions can reference those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any calls to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols which are lexically distinct from normal symbols (e.g., the use of "10$" as a GOTO destination).

Some assemblers provide flexible symbol management, letting programmers manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses.

Assembly languages, like most other computer languages, allow comments to be added to assembly source code; these are ignored by the assembler. Good use of comments is even more important with assembly code than with higher-level languages, as the meaning and purpose of a sequence of instructions is harder to decipher from the code itself. Wise use of these facilities can greatly simplify the problems of coding and maintaining low-level code. Raw assembly source code as generated by compilers or disassemblers (code without any comments, meaningful symbols, or data definitions) is quite difficult to read when changes must be made.
Macros
Many assemblers support predefined macros, and others support programmer-defined (and repeatedly re-definable) macros involving sequences of text lines in which variables and constants are embedded. This sequence of text lines may include opcodes or directives. Once a macro has been defined, its name may be used in place of a mnemonic. When the assembler processes such a statement, it replaces the statement with the text lines associated with that macro, then processes them as if they existed in the source code file (including, in some assemblers, expansion of any macros existing in the replacement text).

Note that this definition of "macro" is slightly different from the use of the term in other contexts, like the C programming language. C macros created through the #define directive are typically just one line, or a few lines at most. Assembler macro instructions can be lengthy "programs" by themselves, executed by interpretation by the assembler during assembly.

Since macros can have 'short' names but expand to several or indeed many lines of code, they can be used to make assembly language programs appear to be far shorter, requiring fewer lines of source code, as with higher-level languages. They can also be used to add higher levels of structure to assembly programs, and optionally to introduce embedded debugging code via parameters and other similar features.

Macro assemblers often allow macros to take parameters. Some assemblers include quite sophisticated macro languages, incorporating such high-level language elements as optional parameters, symbolic variables, conditionals, string manipulation, and arithmetic operations, all usable during the execution of a given macro, and allowing macros to save context or exchange information. Thus a macro might generate a large number of assembly language instructions or data definitions, based on the macro arguments.
This could be used to generate record-style data structures or "unrolled" loops, for example, or could generate entire algorithms based on complex parameters. An organization using assembly language that has been heavily extended using such a macro suite can be considered to be working in a higher-level language, since such programmers are not working with a computer's lowest-level conceptual elements. Macros were used to customize large scale software systems for specific customers in the mainframe era and were also used by customer personnel to satisfy their employers' needs by making specific versions of manufacturer operating systems. This was done, for example, by systems programmers working with IBM's Conversational Monitor System / Virtual Machine (VM/CMS) and with IBM's "real time transaction processing" add-ons, Customer
Information Control System (CICS), and ACP/TPF, the airline/financial system that began in the 1970s and still runs many large computer reservations systems (CRS) and credit card systems today.

It was also possible to use solely the macro processing abilities of an assembler to generate code written in completely different languages: for example, to generate a version of a program in COBOL using a pure macro assembler program containing lines of COBOL code inside assembly-time operators instructing the assembler to generate arbitrary code. This was because, as was realized in the 1960s, the concept of "macro processing" is independent of the concept of "assembly", the former being, in modern terms, closer to text processing than to generating object code. The concept of macro processing appeared, and appears, in the C programming language, which supports "preprocessor instructions" to set variables and make conditional tests on their values. Note that unlike certain previous macro processors inside assemblers, the C preprocessor is not Turing-complete because it lacks the ability to either loop or "go to", the latter of which allows programs to loop. Despite the power of macro processing, it fell into disuse in many high-level languages (major exceptions being C/C++ and PL/I) while remaining a perennial for assemblers.

Macro parameter substitution is strictly by name: at macro processing time, the value of a parameter is textually substituted for its name. The most famous class of bugs resulting from this was the use of a parameter that itself was an expression and not a simple name when the macro writer expected a name. In the macro:

foo: macro a
load a*b

the intention was that the caller would provide the name of a variable, and the "global" variable or constant b would be used to multiply "a". If foo is called with the parameter a-c, the macro expansion of load a-c*b occurs.
To avoid any possible ambiguity, users of macro processors can parenthesize formal parameters inside macro definitions, or callers can parenthesize the input parameters.[9]
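The substitution bug described above is easy to reproduce with plain textual replacement. This is a minimal sketch (real macro processors substitute tokens, but the by-name behavior is the same):

```python
import re

def expand_macro(body, param, arg):
    # Pure by-name textual substitution (whole-token match only).
    return re.sub(rf"\b{param}\b", arg, body)

naive = expand_macro("load a*b", "a", "a-c")
safe  = expand_macro("load (a)*b", "a", "a-c")
print(naive)  # load a-c*b   -- parsed as a-(c*b), not (a-c)*b
print(safe)   # load (a-c)*b
```

Because substitution is textual, load a-c*b is parsed as a-(c*b); parenthesizing the formal parameter in the definition yields load (a-c)*b, preserving the caller's intended meaning.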
Current usage
There have always been debates over the usefulness and performance of assembly language relative to high-level languages. Assembly language has specific niche uses where it is important; see below. But in general, modern optimizing compilers are claimed[18] to render high-level languages into code that can run as fast as hand-written assembly, despite the counter-examples that can be found.[19][20][21] The complexity of modern processors and memory sub-systems makes effective optimization increasingly difficult for compilers, as well as for assembly programmers.[22][23] Moreover, and to the dismay of efficiency lovers, increasing processor performance has meant that most CPUs sit idle most of the time, with delays caused by predictable bottlenecks such as I/O operations and paging. This has made raw code execution speed a non-issue for many programmers.

There are some situations in which developers might choose to use assembly language:
- A stand-alone executable of compact size is required that must execute without recourse to the run-time components or libraries associated with a high-level language; this is perhaps the most common situation. For example, firmware for telephones, automobile fuel and ignition systems, air-conditioning control systems, security systems, and sensors.
- Code that must interact directly with the hardware, for example in device drivers and interrupt handlers.
- Programs that need to use processor-specific instructions not implemented in a compiler. A common example is the bitwise rotation instruction at the core of many encryption algorithms.
- Programs that create vectorized functions for programs in higher-level languages such as C. In the higher-level language this is sometimes aided by compiler intrinsic functions which map directly to SIMD mnemonics, but nevertheless result in a one-to-one assembly conversion specific to the given vector processor.
- Programs requiring extreme optimization, for example an inner loop in a processor-intensive algorithm. Game programmers take advantage of the abilities of hardware features in systems, enabling games to run faster. Large scientific simulations also require highly optimized algorithms, e.g. linear algebra with BLAS[19][24] or discrete cosine transformation (e.g. the SIMD assembly version from x264[25]).
- Situations where no high-level language exists, on a new or specialized processor, for example.
- Programs that need precise timing, such as real-time programs: simulations, flight navigation systems, and medical equipment. For example, in a fly-by-wire system, telemetry must be interpreted and acted upon within strict time constraints. Such systems must eliminate sources of unpredictable delays, which may be created by (some) interpreted languages, automatic garbage collection, paging operations, or preemptive multitasking. However, some higher-level languages incorporate run-time components and operating system interfaces that can introduce such delays. Choosing assembly or lower-level languages for such systems gives programmers greater visibility and control over processing details.
- Cryptographic algorithms that must always take strictly the same time to execute, preventing timing attacks.
- Situations where complete control over the environment is required, in extremely high security situations where nothing can be taken for granted.
- Computer viruses, bootloaders, certain device drivers, or other items very close to the hardware or the low-level operating system.
- Instruction set simulators for monitoring, tracing and debugging where additional overhead is kept to a minimum.
- Reverse-engineering and modifying program files such as existing binaries that may or may not have originally been written in a high-level language, for example when trying to recreate programs for which source code is not available or has been lost, or when cracking copy protection of proprietary software.
- Video games (also termed ROM hacking), which is possible via several methods. The most widely employed is altering program code at the assembly language level.
- Self-modifying code, to which assembly language lends itself well.
- Games and other software for graphing calculators.[26]

Assembly language is still taught in most computer science and electronic engineering programs. Although few programmers today regularly work with assembly language as a tool, the underlying concepts remain very important. Such fundamental topics as binary arithmetic, memory allocation, stack processing, character set encoding, interrupt processing, and compiler design would be hard to study in detail without a grasp of how a computer operates at the hardware level. Since a computer's behavior is fundamentally defined by its instruction set, the logical way to learn such concepts is to study an assembly language. Most modern computers have similar instruction sets. Therefore, studying a single assembly language is sufficient to learn (1) the basic concepts, (2) to recognize situations where the use of assembly language might be appropriate, and (3) how efficient executable code can be created from high-level languages.[27] This is analogous to children needing to learn the basic arithmetic operations (e.g., long division), although calculators are widely used for all except the most trivial calculations.
Typical applications
Assembly language is typically used in a system's boot code (the BIOS on IBM-compatible PC systems, and CP/M), the low-level code that initializes and tests the system hardware prior to booting the OS, which is often stored in ROM. Some compilers translate high-level languages into assembly first before fully compiling, allowing the assembly code to be viewed for debugging and optimization purposes. Relatively low-level languages, such as C, allow the programmer to embed assembly language directly in the source code. Programs using such facilities, such as the Linux kernel, can then construct abstractions using different assembly language on each hardware platform. The system's portable code can then use these processor-specific components through a uniform interface.

Assembly language is also valuable in reverse engineering. Many programs are distributed only in machine code form, which is straightforward to translate into assembly language but more difficult to translate into a higher-level language. Tools such as the Interactive Disassembler make extensive use of disassembly for such a purpose. Assemblers can also be used to generate blocks of data, with no high-level language overhead, from formatted and commented source code, to be used by other code.
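The machine-code-to-assembly direction mentioned above can be sketched for the tiny MOV r8, imm8 subset used earlier (opcodes 0xB0 through 0xB7); a real disassembler such as the Interactive Disassembler additionally recovers symbols, cross-references, and control flow.

```python
# Toy disassembler for the x86 "MOV r8, imm8" family only.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

def disassemble(code):
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if 0xB0 <= op <= 0xB7:                 # MOV r8, imm8: two bytes
            out.append(f"MOV {REG8[op - 0xB0]}, {code[i+1]:02X}h")
            i += 2
        else:
            out.append(f"DB {op:02X}h")        # unknown byte: emit as data
            i += 1
    return out

print(disassemble(bytes([0xB0, 0x61, 0xB1, 0x02])))
# ['MOV AL, 61h', 'MOV CL, 02h']
```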
Related terminology
Assembly language or assembler language is commonly called assembly, assembler, ASM, or symbolic machine code. A generation of IBM mainframe programmers called it ALC, for Assembly Language Code, or BAL,[28] for Basic Assembly Language. Calling the language assembler might be considered potentially confusing and ambiguous, since this is also the name of the utility program that translates assembly language statements into machine code. However, this usage has been common among professionals and in the literature for decades.[29] Similarly, some early computers called their assembler their assembly program.[30]

The computational step where an assembler is run, including all macro processing, is termed assembly time. The assembler is said to be "assembling" the source code. The use of the word assembly dates from the early years of computers (cf. short code, speedcode).

A cross assembler (see also cross compiler) is an assembler that is run on a computer or operating system of a different type from the system on which the resulting code is to run. Cross-assembling may be necessary if the target system cannot run an assembler itself, as is typically the case for small embedded systems. The computer on which the cross assembler is run must have some means of transporting the resulting machine code to the target system. Common methods involve transmitting an exact byte-by-byte copy of the machine code, or an ASCII representation of the machine code in a portable format (such as Motorola or Intel hexadecimal), through a compatible interface to the target system for execution.
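As a sketch of the Intel hexadecimal transport format mentioned above: a data record is a colon followed by a byte count, a 16-bit load address, a record type, the data bytes, and a checksum chosen so that all of the record's bytes sum to zero modulo 256.

```python
# Build one Intel HEX record: ":" + count + address + type + data + checksum.
def ihex_record(address, data, rectype=0x00):
    payload = [len(data), (address >> 8) & 0xFF, address & 0xFF, rectype, *data]
    checksum = (-sum(payload)) & 0xFF   # makes the byte sum 0 mod 256
    return ":" + "".join(f"{b:02X}" for b in payload + [checksum])

# Two bytes of machine code (the MOV AL, 61h example) loaded at 0x0100.
print(ihex_record(0x0100, [0xB0, 0x61]))  # :02010000B061EC
```

ihex_record(0, [], 0x01) produces the standard end-of-file record :00000001FF.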
An assembler directive or pseudo-opcode is a command given to an assembler "directing it to perform operations other than assembling instructions."[1] Directives affect how the assembler operates and "may affect the object code, the symbol table, the listing file, and the values of internal assembler parameters." Sometimes the term pseudo-opcode is reserved for directives that generate object code, such as those that generate data.[31]

A meta-assembler is "a program that accepts the syntactic and semantic description of an assembly language, and generates an assembler for that language."[32]
Further details
For any given personal computer, mainframe, embedded system, or game console, both past and present, at least one (and possibly dozens) of assemblers have been written. For some examples, see the list of assemblers.

On Unix systems, the assembler is traditionally called as, although it is not a single body of code, being typically written anew for each port. A number of Unix variants use GAS.

Within processor groups, each assembler has its own dialect. Sometimes, some assemblers can read another assembler's dialect; for example, TASM can read old MASM code, but not the reverse. FASM and NASM have similar syntax, but each supports different macros that can make them difficult to translate to each other. The basics are all the same, but the advanced features differ.[33]

Also, assembly can sometimes be portable across different operating systems on the same type of CPU. Calling conventions between operating systems often differ slightly or not at all, and with care it is possible to gain some portability in assembly language, usually by linking with a C library that does not change between operating systems. An instruction set simulator can process the object code/binary of any assembler to achieve portability even across platforms, with an overhead no greater than a typical bytecode interpreter. This is similar to the use of microcode to achieve compatibility across a processor family.

Some higher-level computer languages, such as C and Borland Pascal, support inline assembly, where sections of assembly code, in practice usually brief, can be embedded into the high-level language code. The Forth language commonly contains an assembler used in CODE words. An emulator can be used to debug assembly-language programs.
[Table omitted: a sample program listing with columns for Address, Label, Instruction, and Object code, showing each instruction alongside the memory address where it will be placed and the binary object code generated for it.][34]

Example of a selection of instructions (for a virtual computer[35]) with the corresponding address in memory where each instruction will be placed. These addresses are not static; see memory management. Accompanying each instruction is the object code, generated by the assembler, that coincides with the virtual computer's architecture (or ISA).
References
[1] David Salomon (1993). Assemblers and Loaders (http://www.davidsalomon.name/assem.advertis/asl.pdf)
[2] Beck, Leland L. (1996). "2". System Software: An Introduction to Systems Programming. Addison Wesley.
[3] Hyde, Randall. "Chapter 12: Classes and Objects". The Art of Assembly Language, 2nd Edition. No Starch Press. 2010.
[4] Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (http://download.intel.com/design/PentiumII/manuals/24319102.PDF). Intel Corporation. 1999. Retrieved 18 November 2010.
[5] Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (http://download.intel.com/design/PentiumII/manuals/24319102.PDF). Intel Corporation. 1999. pp. 442 and 35. Retrieved 18 November 2010.
[6] Evans, David (2006). "x86 Assembly Guide" (http://www.cs.virginia.edu/~evans/cs216/guides/x86.html). University of Virginia. Retrieved 18 November 2010.
[7] "The SPARC Architecture Manual, Version 8" (http://www.sparc.com/standards/V8.pdf). SPARC International. 1992.
[8] http://www.z80.de/z80/z80code.htm
[9] "Macros (C/C++), MSDN Library for Visual Studio 2008" (http://msdn.microsoft.com/en-us/library/503x3e3s(v=VS.90).aspx). Microsoft Corp. Retrieved 2010-06-22.
[10] "Concept 14 Macros" (http://skycoast.us/pscott/software/mvs/concept14.html). MVS Software. Retrieved May 25, 2009.
[11] Answers.com. "assembly language: Definition and Much More from Answers.com" (http://www.answers.com/topic/assembly-language?cat=technology). Retrieved 2008-06-19.
[12] NESHLA: The High Level, Open Source, 6502 Assembler for the Nintendo Entertainment System (http://neshla.sourceforge.net/)
[13] Salomon. Assemblers and Loaders (http://www.davidsalomon.name/assem.advertis/asl.pdf). p. 7. Retrieved 2012-01-17.
[14] "The IBM 650 Magnetic Drum Calculator" (http://www.columbia.edu/cu/computinghistory/650.html). Retrieved 2012-01-17.
[15] Eidolon's Inn: SegaBase Saturn (http://www.eidolons-inn.net/tiki-index.php?page=SegaBase+Saturn)
[16] http://www.theflamearrows.info/homepage.html
[17] Jim Lawless (2004-05-21). "Speaking with Don French: The Man Behind the French Silk Assembler Tools" (http://www.radiks.net/~jimbo/art/int7.htm). Archived from the original on 21 August 2008. Retrieved 2008-07-25.
[18] Rusling, David A. "The Linux Kernel" (http://tldp.org/LDP/tlk/basics/sw.html). Retrieved Mar 11, 2012.
[19] "Writing the Fastest Code, by Hand, for Fun: A Human Computer Keeps Speeding Up Chips" (http://www.nytimes.com/2005/11/28/technology/28super.html?_r=1). New York Times, John Markoff. 2005-11-28. Retrieved 2010-03-04.
[20] "Bit-field-badness" (http://hardwarebug.org/2010/01/30/bit-field-badness/). hardwarebug.org. 2010-01-30. Archived from the original on 5 February 2010. Retrieved 2010-03-04.
[21] "GCC makes a mess" (http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/). hardwarebug.org. 2009-05-13. Archived from the original on 16 March 2010. Retrieved 2010-03-04.
[22] Randall Hyde. "The Great Debate" (http://webster.cs.ucr.edu/Page_TechDocs/GreatDebate/debate1.html). Archived from the original on 16 June 2008. Retrieved 2008-07-03.
[23] "Code sourcery fails again" (http://hardwarebug.org/2008/11/28/codesourcery-fails-again/). hardwarebug.org. 2010-01-30. Archived from the original on 2 April 2010. Retrieved 2010-03-04.
[24] "BLAS Benchmark-August2008" (http://eigen.tuxfamily.org/index.php?title=Benchmark-August2008). eigen.tuxfamily.org. 2008-08-01. Retrieved 2010-03-04.
[25] "x264.git/common/x86/dct-32.asm" (http://git.videolan.org/?p=x264.git;a=tree;f=common/x86;hb=HEAD). git.videolan.org. 2010-09-29. Retrieved 2010-09-29.
[26] "68K Programming in Fargo II" (http://tifreakware.net/tutorials/89/a/calc/fargoii.htm). Archived from the original on 2 July 2008. Retrieved 2008-07-03.
[27] Hyde, Randall (1996-09-30). "Foreword ("Why would anyone learn this stuff?"), op. cit." (http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/fwd.html). Archived from the original on 25 March 2010. Retrieved 2010-03-05.
[28] Technically, BAL was only the assembler for BPS; the others were macro assemblers.
[29] Stroustrup, Bjarne. The C++ Programming Language. Addison-Wesley, 1986. ISBN 0-201-12078-X: "C++ was primarily designed so that the author and his friends would not have to program in assembler, C, or various modern high-level languages." [use of the term assembler to mean assembly language]
[30] Saxon, James, and Plette, William. Programming the IBM 1401. Prentice-Hall, 1962. LoC 62-20615. [use of the term assembly program]
[31] Microsoft Corporation. "MASM: Directives & Pseudo-Opcodes" (http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/CH08.PDF). Retrieved March 19, 2011.
[32] (John Daintith, ed.) A Dictionary of Computing: "meta-assembler" (http://www.encyclopedia.com/doc/1O11-metaassembler.html)
[33] Randall Hyde. "Which Assembler is the Best?" (http://webster.cs.ucr.edu/AsmTools/WhichAsm.html). Archived from the original on 18 October 2007. Retrieved 2007-10-19.
[34] Murdocca, Miles J.; Vincent P. Heuring (2000). Principles of Computer Architecture. Prentice-Hall. ISBN 0-201-43664-7.
[35] Principles of Computer Architecture (http://iiusatech.com/~murdocca/POCA) (POCA); the ARCTools virtual computer is available for download to execute the referenced code. Accessed August 24, 2005.
Further reading
ASM Community Book (http://www.asmcommunity.net/book/) "An online book full of helpful ASM info, tutorials and code examples" by the ASM Community
Jonathan Bartlett: Programming from the Ground Up (http://programminggroundup.blogspot.com/). Bartlett Publishing, 2004. ISBN 0-9752838-4-7. Also available online as PDF (http://download.savannah.gnu.org/releases-noredirect/pgubook/ProgrammingGroundUp-1-0-booksize.pdf)
Robert Britton: MIPS Assembly Language Programming. Prentice Hall, 2003. ISBN 0-13-142044-5
Paul Carter: PC Assembly Language. Free ebook, 2001. Website (http://drpaulcarter.com/pcasm/)
Jeff Duntemann: Assembly Language Step-by-Step. Wiley, 2000. ISBN 0-471-37523-3
Randall Hyde: The Art of Assembly Language. No Starch Press, 2003. ISBN 1-886411-97-2. Draft versions available online (http://webster.cs.ucr.edu/AoA/index.html) as PDF and HTML
Peter Norton, John Socha: Peter Norton's Assembly Language Book for the IBM PC. Brady Books, NY: 1986.
Michael Singer: PDP-11. Assembler Language Programming and Machine Organization. John Wiley & Sons, NY: 1980.
Dominic Sweetman: See MIPS Run. Morgan Kaufmann Publishers, 1999. ISBN 1-55860-410-3
John Waldron: Introduction to RISC Assembly Language Programming. Addison Wesley, 1998. ISBN 0-201-39828-1
External links
Machine language for beginners (http://www.atariarchives.org/mlb/introduction.php)
The ASM Community (http://www.asmcommunity.net/), a programming resource about assembly.
Unix Assembly Language Programming (http://www.int80h.org/)
IBM High Level Assembler (http://www-03.ibm.com/systems/z/os/zos/bkserv/r8pdf/index.html#hlasm) IBM manuals on mainframe assembler language.
PPR: Learning Assembly Language (http://c2.com/cgi/wiki?LearningAssemblyLanguage)
An Introduction to Writing 32-bit Applications Using the x86 Assembly Language (http://siyobik.info/main/documents/view/x86-tutorial/)
Assembly Language Programming Examples (http://www.azillionmonkeys.com/qed/asmexample.html)
Authoring Windows Applications In Assembly Language (http://www.grc.com/smgassembly.htm)
Iczelion's Win32 Assembly Tutorial (http://win32assembly.online.fr/tutorials.html)
Assembly Optimization Tips (http://mark.masmcode.com/) by Mark Larson
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit (CPU). Each instruction performs a very specific task, typically either an operation on a unit of data (in a register or in memory, e.g. add or move), or a jump operation (deciding which instruction executes next, often conditional on the results of a previous instruction). Every executable program is made up of a series of these atomic instructions. Machine code may be regarded as an extremely hardware-dependent programming language or as the lowest-level representation of a compiled and/or assembled computer program. While it is possible to write programs in machine code, because of the tedious difficulty in managing CPU resources, it is rarely done today, except for situations that require the most extreme optimization. Almost all executable programs are written in higher-level languages, and translated to executable machine code by a compiler and linker. Machine code is sometimes called native code when referring to platform-dependent parts of language features or libraries.[1] Programs in interpreted languages[2] are not translated to machine code; however, their interpreter (which may be seen as a processor executing the higher-level program) often is. Machine code should not be confused with so-called "bytecode", which is executed by an interpreter.
Systems may also differ in other details, such as memory arrangement, operating systems, or peripheral devices. Because a program normally relies on such factors, different systems will typically not run the same machine code, even when the same type of processor is used. A machine code instruction set may have all instructions of the same length, or it may have variable-length instructions. How the patterns are organized varies strongly with the particular architecture and often also with the type of instruction. Most instructions have one or more opcode fields which specify the basic instruction type (such as arithmetic, logical, jump, etc.) and the actual operation (such as add or compare), and other fields that may give the type of the operand(s), the addressing mode(s), the addressing offset(s) or index, or the actual value itself (such constant operands contained in an instruction are called immediates).[3]
Programs
A computer program is a sequence of instructions that are executed by a CPU. While simple processors execute instructions one after the other, superscalar processors are capable of executing several instructions at once. Program flow may be influenced by special 'jump' instructions that transfer execution to an instruction other than the numerically following one. Conditional jumps are taken (execution continues at another address) or not (execution continues at the next instruction) depending on some condition.
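The flow described above can be sketched with a toy machine in Python. The instruction names and the single accumulator below are invented for illustration and do not correspond to any real CPU's instruction set; the point is only how a program counter selects the next instruction and how a conditional jump redirects it.

```python
# A toy machine: each instruction is a (mnemonic, operand) pair, and a
# program counter (pc) selects the next instruction to execute.
def run(program):
    acc, pc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "ADD":           # operate on data in the accumulator
            acc += arg
            pc += 1
        elif op == "JNZ":         # conditional jump: taken only if acc != 0
            pc = arg if acc != 0 else pc + 1
        elif op == "HALT":
            break
    return acc

# Count down from 3 to 0 by looping back with a conditional jump.
program = [("ADD", 3), ("ADD", -1), ("JNZ", 1), ("HALT", None)]
print(run(program))  # 0
```

The loop at instruction 1 executes three times before the condition fails and execution falls through to HALT.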
Assembly languages
A much more readable rendition of machine language, called assembly language, uses mnemonic codes to refer to machine code instructions, rather than using the instructions' numeric values directly. For example, on the Zilog Z80 processor, the machine code 00000101, which causes the CPU to decrement the B processor register, would be represented in assembly language as DEC B.
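At its core an assembler is a translation table from mnemonics to numeric opcodes. The deliberately minimal sketch below uses the Z80 DEC B encoding quoted above; a real assembler must also parse operands, resolve labels, and compute addresses.

```python
# A single-instruction "assembler": a lookup table from mnemonic to opcode
# byte. The Z80 opcode for DEC B (0x05 = 0b00000101) is from the text above.
OPCODES = {"DEC B": 0x05}

def assemble(line):
    return OPCODES[line.strip()]

print(f"{assemble('DEC B'):08b}")  # 00000101
```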
Example
The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional field funct to determine the exact operation. The fields used in these types are:

   6      5     5     5     5     6   bits
[  op  |  rs |  rt |  rd |shamt|funct]  R-type
[  op  |  rs |  rt | address/immediate] I-type
[  op  |        target address        ] J-type

rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.

For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:

[  op  |  rs |  rt |  rd |shamt|funct]
    0      1     2     6     0    32    decimal
 000000 00001 00010 00110 00000 100000  binary

Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:

[  op  |  rs |  rt | address/immediate]
   35      3     8                  68  decimal
 100011 00011 01000   0000000001000100  binary

Jump to the address 1024:

[  op  |        target address        ]
    2                             1024  decimal
 000010     00000000000000010000000000  binary
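The encodings above can be reproduced by shifting each field into its position in the 32-bit word. A sketch in Python (field widths as in the format table; the helper names are ours):

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-type fields (6+5+5+5+5+6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """Pack I-type fields (6+5+5+16 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# add: op=0, rs=1, rt=2, rd=6, shamt=0, funct=32
print(f"{encode_r(0, 1, 2, 6, 0, 32):032b}")  # 00000000001000100011000000100000
# load: op=35, rs=3, rt=8, immediate=68
print(f"{encode_i(35, 3, 8, 68):032b}")       # 10001100011010000000000001000100
```

Reading the printed bits in groups of 6, 5, 5, 5, 5, 6 (or 6, 5, 5, 16) recovers the field values from the worked examples.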
Relationship to microcode
In some computer architectures, the machine code is implemented by a more fundamental underlying layer of programs called microprograms, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models. An example of this use is the IBM System/360 family of computers and their successors. With dataflow path widths of 8 bits to 64 bits and beyond, they nevertheless present a common architecture at the machine language level across the entire line. Using a microcode layer to implement an emulator enables the computer to present the architecture of an entirely different computer. The System/360 line used this to allow porting programs from earlier IBM machines to the new family of computers, e.g. an IBM 1401/1440/1460 emulator on the IBM S/360 model 40.
Storing in memory
The Harvard architecture is a computer architecture with physically separate storage and signal pathways for the code (instructions) and data. Today, most processors implement such separate signal pathways for performance reasons but actually implement a Modified Harvard architecture, so they can support tasks like loading an executable program from disk storage as data and then executing it. The Harvard architecture is contrasted with the von Neumann architecture, where data and code are stored in the same memory. From the point of view of a process, the code space is the part of its address space where code in execution is stored. In multitasking systems this comprises the program's code segment and usually shared libraries. In a multi-threading environment, different threads of one process share code space along with data space, which considerably reduces the overhead of context switching as compared to process switching.
Readability by humans
It has been said that machine code is so unreadable that the United States Copyright Office cannot even identify whether a particular encoded program is an original work of authorship.[4] Hofstadter compares machine code with the genetic code: "Looking at a program written in machine language is vaguely comparable to looking at a DNA molecule atom by atom."[5]
Further reading
Hennessy, John L.; Patterson, David A. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann Publishers. ISBN 1-55860-281-X.
Tanenbaum, Andrew S. Structured Computer Organization. Prentice Hall. ISBN 0-13-020435-8.
Brookshear, J. Glenn. Computer Science: An Overview. Addison Wesley. ISBN 0-321-38701-5.
Source code
In computer science, source code is any collection of computer instructions (possibly with comments) written using some human-readable computer language, usually as text. The source code of a program is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source code. The source code is automatically translated at some point to machine code that the computer can directly read and execute. An interpreter translates to machine code and executes it on the fly when the program is run, while a compiler translates the program in advance to machine code that it stores as executable files; these can then be executed as a separate step.
An illustration of Java source code with prologue comments indicated in red, inline comments indicated in green, and program code indicated in blue
Most computer applications are distributed in a form that includes executable files, but not their source code. If the source code were included, it would be useful to a user, programmer, or system administrator, who may wish to modify the program or understand how it works. The source code which constitutes a program is usually held in one or more text files stored on a computer's hard disk; usually these files are carefully arranged into a directory tree, known as a source tree. Source code can also be stored in a database (as is common for stored procedures) or elsewhere. Source code also appears in books and other media; often in the form of small code snippets, but occasionally complete code bases; a well-known case is the source code of PGP. The notion of source code may also be taken more broadly, to include machine code and notations in graphical languages, neither of which are textual in nature. An example from an article presented on the annual IEEE conference on Source Code Analysis and Manipulation:[1] For the purpose of clarity source code is taken to mean any fully executable description of a software system. It is therefore so construed as to include machine code, very high level languages and executable graphical representations of systems.[2]
The code base of a programming project is the larger collection of all the source code of all the computer programs which make up the project. It has become common practice to maintain code bases in version control systems.
Organization
The source code for a particular piece of software may be contained in a single file or many files. Though the practice is uncommon, a program's source code can be written in different programming languages.[3] For example, a program written primarily in the C programming language might have portions written in assembly language for optimization purposes. It is also possible for some components of a piece of software to be written and compiled separately, in an arbitrary programming language, and later integrated into the software using a technique called library linking. This is the case in some languages, such as Java: each class is compiled separately into a file and linked by the interpreter at runtime. Yet another method is to make the main program an interpreter for a programming language, either designed specifically for the application in question or general-purpose, and then write the bulk of the actual user functionality as macros or other forms of add-ins in this language, an approach taken for example by the GNU Emacs text editor. Moderately complex software customarily requires the compilation or assembly of several, sometimes dozens or even hundreds, of different source code files. In these cases, instructions for compilation, such as a Makefile, are included with the source code. These describe the relationships among the source code files, and contain information about how they are to be compiled. The revision control system is another tool frequently used by developers for source code maintenance.
Purposes
Source code is primarily used as input to the process that produces an executable program (i.e., it is compiled or interpreted). It is also used as a method of communicating algorithms between people (e.g., code snippets in books).[4] Programmers often find it helpful to review existing source code to learn about programming techniques.[4] The sharing of source code between developers is frequently cited as a contributing factor to the maturation of their programming skills.[4] Some people consider source code an expressive artistic medium.[5] Porting software to other computer platforms is usually prohibitively difficult without source code. Without the source code for a particular piece of software, portability is generally computationally expensive. Possible porting options include binary translation and emulation of the original platform. Decompilation of an executable program can be used to generate source code, either in assembly code or in a high level language. Programmers frequently adapt source code from one piece of software to use in other projects, a concept known as software reusability.
Licensing
Software, and its accompanying source code, typically falls within one of two licensing paradigms: free software and proprietary software. Generally speaking, software is free if the source code is free to use, distribute, modify and study, and proprietary if the source code is kept secret, or is privately owned and restricted. Note that "free" refers to freedom, not price. Under many licenses it is acceptable to charge for "free software". The first free software license to be published and to explicitly grant these freedoms was the GNU General Public License in 1989. The GNU GPL was originally intended to be used with the GNU operating system. The GNU GPL was later adopted by other non-GNU software
projects such as the Linux kernel. For proprietary software, the provisions of the various copyright laws, trade secrecy and patents are used to keep the source code closed. Additionally, many pieces of retail software come with an end-user license agreement (EULA) which typically prohibits decompilation, reverse engineering, analysis, modification, or circumventing of copy protection. Types of source code protection beyond traditional compilation to object code include code encryption, code obfuscation or code morphing.
Quality
The way a program is written can have important consequences for its maintainers. Coding conventions, which stress readability and some language-specific conventions, are aimed at the maintenance of the software source code, which involves debugging and updating. Other priorities, such as the speed of the program's execution, or the ability to compile the program for multiple architectures, often make code readability a less important consideration; which measures of code quality matter most depends on the program's purpose.
References
[1] SCAM Working Conference (http://www.ieee-scam.org/), 2001-2010.
[2] Why Source Code Analysis and Manipulation Will Always Be Important (http://www.cs.ucl.ac.uk/staff/M.Harman/scam10.pdf) by Mark Harman, 10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010). Timișoara, Romania, 12-13 September 2010.
[3] Extending and Embedding the Python Interpreter, Python v2.6 Documentation (http://docs.python.org/extending/)
[4] Spinellis, D: Code Reading: The Open Source Perspective. Addison-Wesley Professional, 2003. ISBN 0-201-79940-5
[5] "Art and Computer Programming", ONLamp.com (http://www.onlamp.com/pub/a/onlamp/2005/06/30/artofprog.html), (2005)
(VEW04) "Using a Decompiler for Real-World Source Recovery", M. Van Emmerik and T. Waddington, the Working Conference on Reverse Engineering, Delft, Netherlands, 9-12 November 2004. Extended version of the paper (http://www.itee.uq.edu.au/~emmerik/experience_long.pdf).
External links
Source Code Definition (http://www.linfo.org/source_code.html) - by The Linux Information Project (LINFO)
Google public source code search (http://www.google.com/codesearch?)
"Obligatory accreditation system for IT security products (2008-09-22), may start from May 2009, reported by Yomiuri on 2009-04-24." (http://www.metafilter.com/75061/Obligatory-accreditation-system-for-IT-security-products). MetaFilter.com. Retrieved 2009-04-24.
Same program written in multiple languages (http://rosettacode.org/wiki/Main_Page)
Command
In computing, a command is a directive to a computer program acting as an interpreter of some kind, in order to perform a specific task. Most commonly, a command is a directive to some kind of command line interface, such as a shell. The term command is used especially in imperative computer languages, which are so called because their statements are usually written in a manner similar to the imperative mood used in many natural languages. If one views a statement in an imperative language as being like a sentence in a natural language, then a command is generally like a verb. Many programs allow specially formatted arguments, known as flags, which modify the default behaviour of the command, while further arguments describe what the command acts on. Continuing the natural-language comparison: the flags are adverbs, whilst the other arguments are objects.
Examples
Here are some commands given to a command line interpreter (Unix shell).

This command changes the user's place in the directory tree from their current position to the directory /home/pete. cd is the command and /home/pete is the argument:

cd /home/pete

This command prints the text Hello World to the standard output stream, which, in this case, will just print the text on the screen. echo is the command and "Hello World" is the argument. The quotes are used to prevent Hello and World being treated as separate arguments:

echo "Hello World"

These commands are equivalent. They list files in the directory /bin. ls is the command, /bin is the argument and there are three flags: -l, -t and -r:

ls -l -t -r /bin
ls -ltr /bin

This displays the contents of the files ch1.txt and ch2.txt. cat is the command and ch1.txt and ch2.txt are both arguments:

cat ch1.txt ch2.txt
Here are some commands given to a different command line interpreter (the DOS, OS/2 and Microsoft Windows command prompt). Notice that the flags are identified differently but that the concepts are the same:

This lists all the contents of the current directory. dir is the command and "A" is a flag. There is no argument:

dir /A

This displays the contents of the file readme.txt. type is the command, readme.txt is the argument and "P" is a flag:

type /P readme.txt
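The flag-and-argument structure in the examples above can also be seen from the program's side. The sketch below uses Python's standard argparse module to parse an ls-style invocation; the flags and path mirror the Unix example in the text, and the program itself is hypothetical (it parses its command line but lists nothing).

```python
import argparse

# Declare the flags (modifiers) and the positional argument (the object
# the command acts on), then parse a sample command line.
parser = argparse.ArgumentParser(prog="ls")
parser.add_argument("-l", action="store_true", help="long listing")
parser.add_argument("-t", action="store_true", help="sort by modification time")
parser.add_argument("-r", action="store_true", help="reverse the sort order")
parser.add_argument("path", help="directory to list")

args = parser.parse_args(["-l", "-t", "-r", "/bin"])
print(args.l, args.t, args.r, args.path)  # True True True /bin
```

Each flag becomes a boolean attribute, while the remaining argument names what the command operates on.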
External links
command [1] from FOLDOC
References
[1] http://foldoc.org/index.cgi?query=command
Execution
Execution in computer and software engineering is the process by which a computer or a virtual machine carries out the instructions of a computer program. The instructions in the program trigger sequences of simple actions on the executing machine. Those actions produce effects according to the semantics of the instructions in the program. Programs for a computer may execute in a batch process without human interaction, or a user may type commands in an interactive session of an interpreter. In this case the "commands" are simply programs, whose execution is chained together. The term run is used almost synonymously. A related meaning of both "to run" and "to execute" refers to the specific action of a user starting (or launching or invoking) a program, as in "Please run the ... application."
Context of execution
The context in which execution takes place is crucial. Very few programs execute on a bare machine. Programs usually contain implicit and explicit assumptions about resources available at the time of execution. Most programs execute with the support of an operating system and run-time libraries specific to the source language that provide crucial services not supplied directly by the computer itself. This supportive environment, for instance, usually decouples a program from direct manipulation of the computer peripherals, providing more general, abstract services instead.
Interpreter
A system that executes a program is called an interpreter of the program. Loosely speaking, an interpreter actually does what the program says to do. This contrasts with a language translator that converts a program from one language to another. The most common language translators are compilers. Translators typically convert their source from a high-level, human readable language into a lower-level language (sometimes as low as native machine code) that is simpler and faster for the processor to directly execute. The ideal is that the ratio of executions to translations of a program will be large; that is, a program need only be compiled once and can be run any number of times. This can provide a large benefit for translation versus direct interpretation of the source language. One trade-off is that development time is increased, because of the compilation. In some cases, only the changed files must be
recompiled. Then the executable needs to be relinked. For some changes, the executable must be rebuilt from scratch. As computers and compilers become faster, this fact becomes less of an obstacle. Also, the speed of the end product is typically more important to the user than the development time. Translators usually produce an abstract result that is not completely ready to execute. Frequently, the operating system will convert the translator's object code into the final executable form just before execution of the program begins. This usually involves modifying the code to bind it to real hardware addresses and establishing address links between the program and support code in libraries. In some cases this code is further transformed the first time it is executed, for instance by just-in-time compilers, into a more efficient form that persists for some period, usually at least during the current execution run.
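A minimal sketch of the interpreter idea: the "program" is just a list of commands, and the interpreter carries each one out directly, with no translation step. The command names (set, add, print) are invented for illustration.

```python
# A tiny interpreter: walk the program and perform each command's effect
# on an environment of named values.
def interpret(program, env=None):
    env = {} if env is None else env
    for cmd, *operands in program:
        if cmd == "set":              # set <name> <value>
            name, value = operands
            env[name] = value
        elif cmd == "add":            # add <name> <value>
            name, value = operands
            env[name] += value
        elif cmd == "print":          # print <name>
            print(env[operands[0]])
    return env

env = interpret([("set", "x", 40), ("add", "x", 2), ("print", "x")])  # prints 42
```

A compiler, by contrast, would translate this program into another form ahead of time rather than performing its effects while reading it.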
References
Theory
Programming language theory
Programming language theory (PLT) is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of programming languages and their individual features. It falls within the discipline of computer science, both depending on and affecting mathematics, software engineering and linguistics. It is a well-recognized branch of computer science, and an active research area, with results published in numerous journals dedicated to PLT, as well as in general computer science and engineering publications.
History
In some ways, the history of programming language theory predates even the development of programming languages themselves. The lambda calculus, developed by Alonzo Church and Stephen Cole Kleene in the 1930s, is considered by some to be the world's first programming language, even though it was intended to model computation rather than being a means for programmers to describe algorithms to a computer system. Many modern functional programming languages have been described as providing a "thin veneer" over the lambda calculus,[1] and many are easily described in terms of it.

The lowercase Greek letter λ (lambda) is an unofficial symbol of the field of programming language theory. This usage derives from the lambda calculus, a computational model introduced by Alonzo Church in the 1930s and widely used by programming language researchers. It graces the cover of the classic text Structure and Interpretation of Computer Programs, and the title of the so-called Lambda Papers, written by Gerald Jay Sussman and Guy Steele, the developers of the Scheme programming language.

The first programming language to be proposed was Plankalkül, which was designed by Konrad Zuse in the 1940s, but not publicly known until 1972 (and not implemented until 1998). The first widely known and successful programming language was Fortran, developed from 1954 to 1957 by a team of IBM researchers led by John Backus. The success of FORTRAN led to the formation of a committee of scientists to develop a "universal" computer language; the result of their effort was ALGOL 58. Separately, John McCarthy of MIT developed the Lisp programming language (based on the lambda calculus), the first language with origins in academia to be successful. With the success of these initial efforts, programming languages became an active topic of research in the 1960s and beyond. Some other key events in the history of programming language theory since then:
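The "thin veneer" remark can be made concrete: Church numerals from the lambda calculus, which encode natural numbers purely as functions, translate almost verbatim into a modern language's lambda syntax. A sketch in Python:

```python
# Church numerals: a number n is the function that applies f to x, n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting applications of f."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```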
1950s
Noam Chomsky developed the Chomsky hierarchy in the field of linguistics; a discovery which has directly impacted programming language theory and other branches of computer science.
1960s
The Simula language was developed by Ole-Johan Dahl and Kristen Nygaard; it is widely considered to be the first example of an object-oriented programming language; Simula also introduced the concept of coroutines. In 1964, Peter Landin is the first to realize Church's lambda calculus can be used to model programming languages. He introduces the SECD machine which "interprets" lambda expressions. In 1965, Landin introduces the J operator, essentially a form of continuation. In 1966, Landin introduces ISWIM, an abstract computer programming language in his article The Next 700 Programming Languages. It is influential in the design of languages leading to the Haskell programming language. In 1967, Christopher Strachey publishes his influential set of lecture notes Fundamental Concepts in Programming Languages, introducing the terminology R-values, L-values, parametric polymorphism, and ad hoc polymorphism. In 1969, J. Roger Hindley publishes The Principal Type-Scheme of an Object in Combinatory Logic, later generalized into the Hindley-Milner type inference algorithm. In 1969, Tony Hoare introduces the Hoare logic, a form of axiomatic semantics. In 1969, William Alvin Howard observed that a "high-level" proof system, referred to as natural deduction, can be directly interpreted in its intuitionistic version as a typed variant of the model of computation known as lambda calculus. This became known as the Curry-Howard correspondence.
1970s
In 1970, Dana Scott first publishes his work on denotational semantics. In 1972, logic programming and Prolog were developed, thus allowing computer programs to be expressed as mathematical logic. In 1974, John C. Reynolds discovers System F. It had already been discovered in 1971 by the mathematical logician Jean-Yves Girard. From 1975, Sussman and Steele develop the Scheme programming language, a Lisp dialect incorporating lexical scoping, a unified namespace, and elements from the Actor model including first-class continuations. Backus, at the 1977 ACM Turing Award lecture, assailed the current state of industrial languages and proposed a new class of programming languages now known as function-level programming languages. In 1977, Gordon Plotkin introduces Programming Computable Functions, an abstract typed functional language. In 1978, Robin Milner introduces the Hindley-Milner type inference algorithm for the ML programming language. Type theory became applied as a discipline to programming languages; this application has led to tremendous advances in type theory over the years.
1980s
In 1981, Gordon Plotkin publishes his paper on structured operational semantics. In 1988, Gilles Kahn published his paper on natural semantics. A team of scientists at Xerox PARC led by Alan Kay develop Smalltalk, an object-oriented language widely known for its innovative development environment. There emerged process calculi, such as the Calculus of Communicating Systems of Robin Milner, and the Communicating sequential processes model of C. A. R. Hoare, as well as similar models of concurrency such as the Actor model of Carl Hewitt. In 1985, the release of Miranda sparks an academic interest in lazy-evaluated pure functional programming languages. A committee was formed to define an open standard resulting in the release of the Haskell 1.0 standard in 1990.
Bertrand Meyer created the methodology Design by contract and incorporated it into the Eiffel programming language. In the 1990s: Gregor Kiczales, Jim Des Rivieres and Daniel G. Bobrow published the book The Art of the Metaobject Protocol. Eugenio Moggi and Philip Wadler introduced the use of monads for structuring programs written in functional programming languages.
Formal semantics
Formal semantics is the formal specification of the behaviour of computer programs and programming languages. Three common approaches to describe the semantics or "meaning" of a computer program are denotational semantics, operational semantics and axiomatic semantics.
Type theory
Type theory is the study of type systems, which are "tractable syntactic method(s) for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute" (Types and Programming Languages, MIT Press, 2002). Many programming languages are distinguished by the characteristics of their type systems.
Domain-specific languages
Domain-specific languages are languages constructed to efficiently solve problems in a particular problem domain.
Compiler construction
Compiler theory is the theory of writing compilers (or more generally, translators); programs which translate a program written in one language into another form. The actions of a compiler are traditionally broken up into syntax analysis (scanning and parsing), semantic analysis (determining what a program should do), optimization (improving the performance of a program as indicated by some metric; typically execution speed) and code generation (generation and output of an equivalent program in some target language; often the instruction set of a CPU).
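The phases listed above can be sketched at toy scale. The fragment below scans, parses, and generates code for expressions like 1 + 2 * 3, targeting a made-up stack machine; semantic analysis and optimization are omitted, and the function names are ours.

```python
import re

def scan(src):
    """Syntax analysis, part 1: split the source into tokens."""
    return re.findall(r"\d+|[+*]", src)

def parse(tokens):
    """Syntax analysis, part 2: build a tree, giving * higher precedence."""
    def term(i):
        node, i = int(tokens[i]), i + 1
        while i < len(tokens) and tokens[i] == "*":
            rhs, i = int(tokens[i + 1]), i + 2
            node = ("*", node, rhs)
        return node, i
    node, i = term(0)
    while i < len(tokens) and tokens[i] == "+":
        rhs, i = term(i + 1)
        node = ("+", node, rhs)
    return node

def codegen(node):
    """Code generation: emit postfix instructions for a stack machine."""
    if isinstance(node, int):
        return [("push", node)]
    op, lhs, rhs = node
    return codegen(lhs) + codegen(rhs) + [("add" if op == "+" else "mul", None)]

print(codegen(parse(scan("1 + 2 * 3"))))
# [('push', 1), ('push', 2), ('push', 3), ('mul', None), ('add', None)]
```

A production compiler adds a semantic-analysis pass between parsing and code generation, and an optimizer before (or during) code generation.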
Run-time systems
Runtime systems refers to the development of programming language runtime environments and their components, including virtual machines, garbage collection, and foreign function interfaces.
References
[1] http:/ / www. c2. com/ cgi/ wiki?ModelsOfComputation
Further reading
Abadi, Martín and Cardelli, Luca. A Theory of Objects. Springer-Verlag.
Michael J. C. Gordon. Programming Language Theory and Its Implementation. Prentice Hall.
Gunter, Carl and Mitchell, John C. (eds.). Theoretical Aspects of Object Oriented Programming Languages: Types, Semantics, and Language Design. MIT Press.
Harper, Robert. Practical Foundations for Programming Languages (http://www.cs.cmu.edu/~rwh/plbook/book.pdf). Draft version.
Knuth, Donald E. (2003). Selected Papers on Computer Languages (http://www-cs-faculty.stanford.edu/~uno/cl.html). Stanford, California: Center for the Study of Language and Information.
Mitchell, John C. Foundations for Programming Languages.
Mitchell, John C. Introduction to Programming Language Theory.
O'Hearn, Peter W. and Tennent, Robert D. (1997). Algol-like Languages (http://www.eecs.qmul.ac.uk/~ohearn/Algol/algol.html). Progress in Theoretical Computer Science. Birkhauser, Boston.
Pierce, Benjamin C. (2002). Types and Programming Languages (http://www.cis.upenn.edu/~bcpierce/tapl/main.html). MIT Press.
Pierce, Benjamin C. Advanced Topics in Types and Programming Languages.
Pierce, Benjamin C. et al. (2010). Software Foundations (http://www.cis.upenn.edu/~bcpierce/sf/).
External links
Lambda the Ultimate (http://lambda-the-ultimate.org/policies#Purpose), a community weblog for professional discussion and repository of documents on programming language theory.
Great Works in Programming Languages (http://www.cis.upenn.edu/~bcpierce/courses/670Fall04/GreatWorksInPL.shtml). Collected by Benjamin C. Pierce.
Programming Language Research (http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/web/language-research.html). Directory by Mark Leone.
Programming Language Theory Texts Online (http://www.cs.uu.nl/wiki/Techno/ProgrammingLanguageTheoryTextsOnline). At Utrecht University.
λ-Calculus: Then & Now (http://turing100.acm.org/lambda_calculus_timeline.pdf) by Dana S. Scott for the ACM Turing Centenary Celebration.
Grand Challenges in Programming Languages (http://plgrand.blogspot.com/). Panel session at POPL 2009.
Type system
A type system associates a type with each computed value. By examining the flow of these values, a type system attempts to ensure or prove that no type errors can occur. The particular type system in question determines exactly what constitutes a type error, but in general the aim is to prevent operations that expect a certain kind of value from being used with values for which the operation does not make sense (logic errors); many memory errors are prevented as well. Type systems are often specified as part of programming languages and built into their interpreters and compilers, although they can also be implemented as optional tools. In computer science, a type system may be defined as "a tractable syntactic framework for classifying phrases according to the kinds of values they compute".[1] A compiler may also use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. In many C compilers the float data type, for example, is represented in 32 bits, in accord with the IEEE specification for single-precision floating-point numbers; such compilers can thus use floating-point-specific microprocessor operations on those values (floating-point addition, multiplication, etc.). The depth of type constraints and the manner of their evaluation affect the typing of the language. A programming language may further associate an operation with varying concrete algorithms for each type, in the case of type polymorphism. Type theory is the study of type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation, and language design.
Fundamentals
Formally, type theory studies type systems. A programming language must afford the opportunity to check programs against its type system, whether at compile time or at run time, with types either manually annotated or automatically inferred. As Mark Manasse concisely put it:[2]

The fundamental problem addressed by a type theory is to ensure that programs have meaning. The fundamental problem caused by a type theory is that meaningful programs may not have meanings ascribed to them. The quest for richer type systems results from this tension.

Assigning a data type, termed typing, gives meaning to a sequence of bits such as a value in memory or some object such as a variable. The hardware of a general-purpose computer is unable to discriminate between, for example, a memory address and an instruction code, or between a character, an integer, and a floating-point number, because it makes no intrinsic distinction between any of the possible meanings a sequence of bits might carry. Associating a sequence of bits with a type conveys that meaning to the programmable hardware, forming a symbolic system composed of that hardware and some program. A program associates each value with at least one particular type, but it can also occur that one value is associated with many subtypes. Other entities, such as objects, modules, communication channels and dependencies, can become associated with a type; even a type can become associated with a type. An implementation of a type system could in theory associate identifications such as these:

data type - a type of a value
class - a type of an object
kind (type theory) - a type of a type, or metatype

These are the kinds of abstractions typing can go through, on a hierarchy of levels contained in a system.
When a programming language evolves a more elaborate type system, it gains a more finely grained rule set than basic type checking, but this comes at a price when the type inferences (and other properties) become undecidable, and when more attention must be paid by the programmer to annotate code or to consider computer-related operations and functioning. It is challenging to find a sufficiently expressive type system that satisfies all programming practices in a type-safe manner. The more type restrictions that are imposed by the compiler, the more strongly typed a programming language is. Strongly typed languages often require the programmer to make explicit conversions in contexts where an implicit conversion would cause no harm. Pascal's type system has been described as "too strong" because, for example, the size of an array or string is part of its type, making some programming tasks difficult.[3][4] Haskell is also strongly typed, but its types are automatically inferred, so that explicit conversions are often unnecessary.

A programming language compiler can also implement a dependent type or an effect system, which enables even more program specifications to be verified by a type checker. Beyond simple value-type pairs, a virtual "region" of code is associated with an "effect" component describing what is being done with what, enabling, for example, an error report to be "thrown". Thus the symbolic system may be a type and effect system, which endows it with more safety checking than type checking alone.

Whether automated by the compiler or specified by a programmer, a type system makes program behavior illegal if it falls outside the type-system rules. Advantages provided by programmer-specified type systems include:

Abstraction (or modularity) - Types enable programmers to think at a higher level than the bit or byte, not bothering with low-level implementation. For example, programmers can begin to think of a string as a collection of character values instead of as a mere array of bytes. Higher still, types enable programmers to think about and express interfaces between two subsystems of any size. This enables more levels of localization, so that the definitions required for interoperability of the subsystems remain consistent when those two subsystems communicate.

Documentation - In more expressive type systems, types can serve as a form of documentation clarifying the intent of the programmer. For instance, if a programmer declares a function as returning a timestamp type, this documents the function even if the timestamp type is ultimately represented deeper in the code as an integer.

Advantages provided by compiler-specified type systems include:

Optimization - Static type-checking may provide useful compile-time information. For example, if a type requires that a value must align in memory at a multiple of four bytes, the compiler may be able to use more efficient machine instructions.

Safety - A type system enables the compiler to detect meaningless or probably invalid code. For example, it can identify the expression 3 / "Hello, World" as invalid when the rules do not specify how to divide an integer by a string. Strong typing offers more safety, but cannot guarantee complete type safety.

Type safety contributes to program correctness, but can only guarantee correctness at the expense of making the type checking itself an undecidable problem. In a type system with automated type checking, a program may prove to run incorrectly yet be safely typed and produce no compiler errors. Division by zero is an unsafe and incorrect operation, but a type checker running only at compile time does not scan for division by zero in most programming languages; the error is then left to surface at run time. To prove the absence of these more-general-than-types defects, other kinds of formal methods, collectively known as program analyses, are in common use. In addition, software testing is an empirical method for finding errors that the type checker cannot detect.
Type checking
The process of verifying and enforcing the constraints of types, type checking, may occur either at compile time (a static check) or at run time (a dynamic check). If a language specification enforces its typing rules strongly (that is, generally allowing only those automatic type conversions that do not lose information), one can refer to the process as strongly typed; if not, as weakly typed. The terms are not usually used in a strict sense.
Static typing
A programming language is said to use static typing when type checking is performed at compile time as opposed to run time. Statically typed languages include ActionScript 3, Ada, C, D, Eiffel, F#, Fortran, Go, Haskell, haXe, JADE, Java, ML, Objective-C, OCaml, Pascal, Seed7 and Scala. C++ is statically typed, aside from its run-time type information system. The C# type system performs static-like compile-time type checking, but also includes full runtime type checking. Perl is statically typed with respect to distinguishing arrays, hashes, scalars, and subroutines. Static typing is a limited form of program verification (see type safety): accordingly, it allows many type errors to be caught early in the development cycle. Static type checkers evaluate only the type information that can be determined at compile time, but are able to verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient (e.g. faster or using less memory) by omitting runtime type checks and enabling other optimizations. Because they evaluate type information during compilation, and therefore lack type information that is only available at run time, static type checkers are conservative. They will reject some programs that may be well-behaved at run time but that cannot be statically determined to be well-typed. For example, even if an expression <complex test> always evaluates to true at run time, a program containing the code

if <complex test> then <do something> else <type error>

will be rejected as ill-typed, because a static analysis cannot determine that the else branch won't be taken.[1] The conservative behaviour of static type checkers is advantageous when <complex test> evaluates to false infrequently: a static type checker can detect type errors in rarely used code paths.
Without static type checking, even code coverage tests with 100% coverage may be unable to find such type errors. The tests may fail to detect such type errors, because the combination of all places where values are created and all places where a certain value is used must be taken into account. The most widely used statically typed languages are not formally type safe. They have "loopholes" in the programming language specification enabling programmers to write code that circumvents the verification performed by a static type checker and so address a wider range of problems. For example, most C-style languages have type punning, and Haskell has such features as unsafePerformIO: such operations may be unsafe at runtime, in that they can cause unwanted behaviour due to incorrect typing of values when the program runs.
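This conservative rejection can be reproduced with a toy checker over a miniature expression language. The AST shapes and function names below are invented for illustration; the point is that both branches of an if are checked even when one of them can never execute:

```python
def check(node):
    """Return the type of a tiny tuple-encoded AST, or raise TypeError."""
    kind = node[0]
    if kind == "lit":                          # ("lit", value)
        return type(node[1]).__name__
    if kind == "add":                          # ("add", left, right)
        lt, rt = check(node[1]), check(node[2])
        if lt == rt == "int":
            return "int"
        raise TypeError("add expects two ints, got %s and %s" % (lt, rt))
    if kind == "if":                           # ("if", cond, then, else)
        check(node[1])
        lt, rt = check(node[2]), check(node[3])  # both branches are checked
        if lt != rt:
            raise TypeError("branches disagree: %s vs %s" % (lt, rt))
        return lt
    raise TypeError("unknown node kind: %r" % kind)

# The condition is always true, so the ill-typed else branch would never
# run, yet the static check still rejects the whole program.
prog = ("if", ("lit", True),
        ("add", ("lit", 1), ("lit", 2)),
        ("add", ("lit", 1), ("lit", "oops")))
try:
    check(prog)
except TypeError as e:
    print("rejected:", e)
```

A real static checker works the same way in principle: it reasons about every syntactic path, not only the paths a given execution happens to take.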
Dynamic typing
A programming language is said to be dynamically typed when the majority of its type checking is performed at run-time as opposed to at compile-time. In dynamic typing values have types, but variables do not; that is, a variable can refer to a value of any type. Dynamically typed languages include APL, Erlang, Groovy, JavaScript, Lisp, Lua, MATLAB, GNU Octave, Perl (for user-defined types, but not built-in types), PHP, Pick BASIC, Prolog, Python, Ruby, Smalltalk and Tcl. Implementations of dynamically typed languages generally associate run-time objects with "tags" containing their type information. This run-time classification is then used to implement type checks and dispatch overloaded functions, but can also enable pervasive uses of dynamic dispatch, late binding and similar idioms that would be
cumbersome at best in a statically typed language, requiring the use of variant types or similar features. More broadly, as explained below, dynamic typing can improve support for dynamic programming language features, such as generating types and functionality based on run-time data. (Nevertheless, dynamically typed languages need not support any or all such features, and some dynamic programming languages are statically typed.) On the other hand, dynamic typing provides fewer a priori guarantees: a dynamically typed language accepts and attempts to execute some programs that would be ruled invalid by a static type checker, either due to errors in the program or due to static type checking being too conservative. Dynamic typing may result in runtime type errors; that is, at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. Such errors may occur long after the place where the programming mistake was made, that is, the place where the wrong type of data was passed into a place it should not have been. This may make the bug difficult to locate. Dynamically typed language systems' run-time checks can potentially be more sophisticated than those of statically typed languages, as they can use dynamic information as well as any information from the source code. On the other hand, runtime checks only assert that conditions hold in a particular execution of the program, and the checks are repeated for every execution. Development in dynamically typed languages is often supported by programming practices such as unit testing. Testing is a key practice in professional software development, and is particularly important in dynamically typed languages.
In practice, the testing done to ensure correct program operation can detect a much wider range of errors than static type-checking, but full test coverage over all possible executions of a program (including timing, user inputs, etc.), if even possible, would be extremely costly and impractical. Static typing helps by providing strong guarantees of a particular subset of commonly made errors never occurring.
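The behaviour described above, values carrying run-time type tags while variables carry none, is easy to observe in Python; the function name below is made up for the example:

```python
x = 5             # x currently refers to an int value
print(type(x))    # the value's run-time "tag": <class 'int'>
x = "five"        # the same variable may now refer to a str
print(type(x))    # <class 'str'>

def halve(v):
    # No compile-time check that v supports division; the
    # requirement is only discovered when the operation runs.
    return v / 2

halve(10)          # fine
try:
    halve("ten")   # the mistake surfaces only when this call executes
except TypeError as e:
    print("runtime type error:", e)
```

The erroneous call could sit in a rarely exercised code path and go unnoticed until long after the mistake was written, which is exactly the failure mode unit tests are used to guard against.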
Advocates of dependently typed languages such as Dependent ML and Epigram have suggested that almost all bugs can be considered type errors, if the types used in a program are properly declared by the programmer or correctly inferred by the compiler.[6] Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce optimized machine code. Further, compilers for statically typed languages can find assembler shortcuts more easily. Some dynamically typed languages such as Common Lisp allow optional type declarations for optimization for this very reason; static typing makes this pervasive (see optimization). By contrast, dynamic typing may allow compilers to run more quickly and allow interpreters to dynamically load new code, since changes to source code in dynamically typed languages may result in less checking to perform and less code to revisit. This too may shorten the edit-compile-test-debug cycle. Statically typed languages that lack type inference (such as C and Java) require that programmers declare the types they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not permit the programmer to ignore or allow to drift out of synchronization. However, a language can be statically typed without requiring type declarations (examples include Haskell, Scala, OCaml and, to a lesser extent, C#), so explicit type declaration is not a necessary requirement for static typing in all languages. Dynamic typing allows constructs that some static type checking would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible. An eval function is possible with static typing, but requires advanced uses of algebraic data types.
Furthermore, dynamic typing better accommodates transitional code and prototyping, such as allowing a placeholder data structure (mock object) to be transparently used in place of a full-fledged data structure (usually for the purposes of experimentation and testing). Dynamic typing typically allows duck typing (which enables easier code reuse). Many languages with static typing also feature duck typing or other mechanisms like generic programming which also enables easier code reuse. Dynamic typing typically makes metaprogramming easier to use. For example, C++ templates are typically more cumbersome to write than the equivalent Ruby or Python code. More advanced run-time constructs such as metaclasses and introspection are often more difficult to use in statically typed languages. In some languages, such features may also be used e.g. to generate new types and behaviors on the fly, based on run-time data. Such advanced constructs are often provided by dynamic programming languages; many of these are dynamically typed, although dynamic typing need not be related to dynamic programming languages.
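Generating a new type from run-time data, as mentioned above, is directly supported in Python through the three-argument form of type(). The field names and class name below are invented; in practice they might come from a configuration file or a network schema:

```python
def make_record_type(type_name, field_names):
    # Build an __init__ that assigns positional values to the given fields.
    def __init__(self, *values):
        for field, value in zip(field_names, values):
            setattr(self, field, value)
    # type(name, bases, namespace) creates a brand-new class object at run time.
    return type(type_name, (object,), {"__init__": __init__,
                                       "fields": tuple(field_names)})

# Hypothetical schema discovered at run time:
Person = make_record_type("Person", ["name", "age"])
p = Person("Ada", 36)
print(p.name, p.age)     # Ada 36
print(Person.__name__)   # Person
```

A statically typed language can achieve similar effects, but typically only through code generation or reflection APIs rather than ordinary expressions.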
In a weakly typed language, the result of this operation depends on language-specific rules. Visual Basic would convert the string "37" into the number 37, perform addition, and produce the number 42. JavaScript would convert the number 5 to the string "5", perform string concatenation, and produce the string "537". In JavaScript, the conversion to string is applied regardless of the order of the operands (for example, y + x would be "375"), while in AppleScript the left-most operand determines the type of the result, so that x + y is the number 42 but y + x is the string "375". In the same manner, due to JavaScript's dynamic type conversions:

var y = 2 / 0;                        // y now equals a constant for infinity
y == Number.POSITIVE_INFINITY;        // returns true
Infinity == Number.POSITIVE_INFINITY; // returns true
"Infinity" == Infinity;               // returns true
y == "Infinity";                      // returns true
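For contrast, a strongly typed dynamic language such as Python refuses the implicit conversion outright and requires an explicit one:

```python
x = 5
y = "37"
try:
    x + y                   # neither operand is implicitly converted
except TypeError as e:
    print("TypeError:", e)  # unsupported operand types for +

print(str(x) + y)   # explicit conversion to string: "537"
print(x + int(y))   # explicit conversion to number: 42
```

The same expression thus yields 42, "537", or an error depending on the language's conversion rules, which is why x + y alone says little without knowing those rules.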
A C cast gone wrong exemplifies the problems that can occur if strong typing is absent: if a programmer casts a value from one type to another in C, not only must the compiler allow the code at compile time, but the runtime must allow it as well. This may permit more compact and faster C code, but it can make debugging more difficult.
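Python does not allow such reinterpreting casts implicitly, but the effect of a C cast through pointers can be imitated deliberately with the standard struct module, which reinterprets a value's raw bit pattern rather than converting the value. A minimal sketch:

```python
import struct

f = 1.0
# Pack the float into its 4-byte IEEE 754 single-precision representation,
# then unpack those same bytes as an unsigned 32-bit integer. This exposes
# the bit pattern (sign, exponent, mantissa), not the numeric value 1.
bits = struct.unpack("<I", struct.pack("<f", f))[0]
print(hex(bits))  # 0x3f800000
```

Because the reinterpretation must be requested explicitly through a byte-level API, it cannot happen by accident, which is the practical difference from an unchecked C cast.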
, equivalent to three characters after the terminating zero character of the string pointed to by y. The content of that location is undefined, and might lie outside addressable memory. The mere computation of such a pointer may result in undefined behavior (including the program crashing) according to C standards, and in typical systems dereferencing z at this point could cause the program to crash. We have a well-typed but not memory-safe program, a condition that cannot occur in a type-safe language. In some languages, like JavaScript, the use of special numeric values and constants allows type safety for mathematical operations without resulting in runtime errors, for example, when dividing a Number by a String, or a Number by zero:

var x = 32;
var aString = new String("A");
x = x / aString;               // x now equals the constant NaN, meaning Not A Number
isNaN(x);                      // returns true
typeof(x);                     // returns "number"
var y = 2 / 0;                 // y now equals a constant for infinity
y == Number.POSITIVE_INFINITY; // returns true
typeof(y);                     // returns "number"
Duck typing
In "duck typing",[10] a statement calling a method m on an object does not rely on the declared type of the object; only that the object, of whatever type, must supply an implementation of the method called, when called, at run-time. Duck typing differs from structural typing in that, if the part (of the whole module structure) needed for a given local computation is present at runtime, the duck type system is satisfied in its type identity analysis. On the other hand, a structural type system would require the analysis of the whole module structure at compile time to determine type identity or type dependence. Duck typing differs from a nominative type system in a number of aspects. The most prominent ones are that for duck typing, type information is determined at runtime (as contrasted to compile time), and the name of the type is irrelevant to determine type identity or type dependence; only partial structure information is required for that for a given point in the program execution. Duck typing uses the premise that (referring to a value) "if it walks like a duck, and quacks like a duck, then it is a duck" (this is a reference to the duck test that is attributed to James Whitcomb Riley). The term may have been coined by Alex Martelli in a 2000 message[11] to the comp.lang.python newsgroup (see Python).
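The premise is easy to demonstrate in Python, where it originated as an idiom; the class and method names below are chosen to mirror the duck metaphor and are otherwise arbitrary:

```python
class Duck:
    def quack(self):
        return "Quack!"

class Person:
    def quack(self):           # a completely unrelated type, same method
        return "I'm quacking"

def make_it_quack(thing):
    # No declared parameter type: all that matters is whether the
    # object supplies a quack() method at the moment of the call.
    return thing.quack()

print(make_it_quack(Duck()))    # Quack!
print(make_it_quack(Person()))  # I'm quacking
try:
    make_it_quack(42)           # ints don't quack: fails only at run time
except AttributeError as e:
    print("not a duck:", e)
```

Note that Person neither inherits from Duck nor declares any shared interface; the partial structure (one method) present at the call site is all the "type identity" duck typing inspects.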
Dependent types
Dependent types are based on the idea of using scalars or values to more precisely describe the type of some other value. For example, matrix(3, 3) might be the type of a 3×3 matrix. We can then define typing rules such as the following rule for matrix multiplication:

matrix_multiply : matrix(k, m) × matrix(m, n) → matrix(k, n)

where k, m, n are arbitrary positive integer values. A variant of ML called Dependent ML has been created
based on this type system, but because type checking for conventional dependent types is undecidable, not all programs using them can be type-checked without some kind of limits. Dependent ML limits the sort of equality it can decide to Presburger arithmetic. Other languages such as Epigram make the value of all expressions in the language decidable so that type checking can be decidable. It is also possible to make the language Turing-complete at the price of undecidable type checking, as in Cayenne.
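Python has no dependent types, but the shape of such a rule can be simulated with run-time dimension checks; what a dependent type checker would prove before the program runs, this sketch merely verifies as it runs:

```python
def matrix_multiply(a, b):
    # a plays the role of matrix(k, m) and b of matrix(m, n); the shared
    # dimension m must agree. A dependently typed language would reject a
    # mismatch at compile time; here we can only detect it at run time.
    k, m = len(a), len(a[0])
    m2, n = len(b), len(b[0])
    if m != m2:
        raise TypeError("matrix(%d,%d) * matrix(%d,%d) is ill-typed" % (k, m, m2, n))
    return [[sum(a[i][j] * b[j][l] for j in range(m)) for l in range(n)]
            for i in range(k)]

print(matrix_multiply([[1, 2]], [[3], [4]]))  # matrix(1,2) * matrix(2,1) -> [[11]]
try:
    matrix_multiply([[1, 2]], [[3, 4]])        # matrix(1,2) * matrix(1,2): rejected
except TypeError as e:
    print(e)
```

The values k, m, n appearing inside the types is exactly what makes the types "dependent" on terms.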
Linear types
Linear types, based on the theory of linear logic, and closely related to uniqueness types, are types assigned to values having the property that they have one and only one reference to them at all times. These are valuable for describing large immutable values such as files, strings, and so on, because any operation that simultaneously destroys a linear object and creates a similar object (such as 'str = str + "a"') can be optimized "under the hood" into an in-place mutation. Normally this is not possible, as such mutations could cause side effects on parts of the program holding other references to the object, violating referential transparency. They are also used in the prototype operating system Singularity for interprocess communication, statically ensuring that processes cannot share objects in shared memory in order to prevent race conditions. The Clean language (a Haskell-like language) uses this type system in order to gain a lot of speed while remaining safe.
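The "exactly one reference, consumed on use" discipline can be imitated at run time; Python has no linear types, so the wrapper below merely raises on a second use rather than rejecting it statically (the class name is invented):

```python
class Linear:
    """Wrap a value so it can be taken exactly once."""
    def __init__(self, value):
        self._value = value
        self._consumed = False

    def take(self):
        # Consuming the wrapper twice violates linearity.
        if self._consumed:
            raise RuntimeError("linear value already consumed")
        self._consumed = True
        value, self._value = self._value, None
        return value

buf = Linear("hello")
s = buf.take() + "a"   # destroys the old value, yields a new one
print(s)               # helloa
try:
    buf.take()         # a second use is an error
except RuntimeError as e:
    print(e)
```

Because the old value provably has no other live reference, a compiler for a genuinely linear language may implement the str + "a" step as an in-place update, which is the optimization the section describes.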
Intersection types
Intersection types are types describing values that belong to both of two other given types with overlapping value sets. For example, in most implementations of C the signed char has range -128 to 127 and the unsigned char has range 0 to 255, so the intersection type of these two types would have range 0 to 127. Such an intersection type could be safely passed into functions expecting either signed or unsigned chars, because it is compatible with both types. Intersection types are useful for describing overloaded function types: for example, if "int → int" is the type of functions taking an integer argument and returning an integer, and "float → float" is the type of functions taking a float argument and returning a float, then the intersection of these two types can be used to describe functions that do one or the other, based on what type of input they are given. Such a function could be passed into another function expecting an "int → int" function safely; it simply would not use the "float → float" functionality. In a subclassing hierarchy, the intersection of a type and an ancestor type (such as its parent) is the most derived type. The intersection of sibling types is empty. The Forsythe language includes a general implementation of intersection types. A restricted form is refinement types.
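An overloaded function of this kind, usable both at int → int and at float → float, can be approximated in Python with functools.singledispatch, which picks an implementation from the run-time type of the argument (the function name is invented):

```python
from functools import singledispatch

@singledispatch
def double(x):
    # Fallback: the argument's type is outside the "intersection".
    raise TypeError("no overload for %s" % type(x).__name__)

@double.register(int)
def _(x):
    return x * 2          # the int -> int half of the overloaded type

@double.register(float)
def _(x):
    return x * 2.0        # the float -> float half

print(double(21))    # 42
print(double(1.5))   # 3.0
```

A caller that only ever passes ints uses the function safely as an int → int value and simply never exercises the float half, mirroring the subsumption argument in the text.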
Union types
Union types are types describing values that belong to either of two types. For example, in C, the signed char has range -128 to 127, and the unsigned char has range 0 to 255, so the union of these two types would have range -128 to 255. Any function handling this union type would have to deal with integers in this complete range. More generally, the only valid operations on a union type are operations that are valid on both types being unioned. C's "union" concept is similar to union types, but is not typesafe, as it permits operations that are valid on either type, rather than both. Union types are important in program analysis, where they are used to represent symbolic values whose exact nature (e.g., value or type) is not known. In a subclassing hierarchy, the union of a type and an ancestor type (such as its parent) is the ancestor type. The union of sibling types is a subtype of their common ancestor (that is, all operations permitted on their common ancestor are permitted on the union type, but they may also have other valid operations in common).
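A function over such a union must handle both alternatives. In Python this is conventionally written with isinstance checks; the typing.Union annotation documents the contract but does not enforce it (the function name is invented):

```python
from typing import Union

def describe(value: Union[int, str]) -> str:
    # Before narrowing, only operations valid for *both* types may be
    # applied; after an isinstance check, each branch may use
    # type-specific operations.
    if isinstance(value, int):
        return "int: %d" % value
    if isinstance(value, str):
        return "string of length %d" % len(value)
    raise TypeError("expected int or str")

print(describe(-5))       # int: -5
print(describe("hello"))  # string of length 5
```

Statically typed languages with untagged unions impose the same shape through pattern matching or explicit case analysis, which is what makes unions safe where C's union construct is not.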
Existential types
Existential types are frequently used in connection with record types to represent modules and abstract data types, due to their ability to separate implementation from interface. For example, the type "T = ∃X { a: X; f: (X → int); }" describes a module interface that has a data member of type X and a function that takes a parameter of the same type X and returns an integer. This could be implemented in different ways; for example:

intT = { a: int; f: (int → int); }
floatT = { a: float; f: (float → int); }

These types are both subtypes of the more general existential type T and correspond to concrete implementation types, so any value of one of these types is a value of type T. Given a value "t" of type "T", we know that "t.f(t.a)" is well-typed, regardless of what the abstract type X is. This gives flexibility for choosing types suited to a particular implementation, while clients that use only values of the interface type, the existential type, are isolated from these choices. In general it is impossible for the typechecker to infer which existential type a given module belongs to. In the above example intT { a: int; f: (int → int); } could also have the type ∃X { a: X; f: (int → int); }. The simplest solution is to annotate every module with its intended type, e.g.:

intT = { a: int; f: (int → int); } as ∃X { a: X; f: (X → int); }

Although abstract data types and modules had been implemented in programming languages for quite some time, it wasn't until 1988 that John C. Mitchell and Gordon Plotkin established the formal theory under the slogan: "Abstract [data] types have existential type".[12] The theory is a second-order typed lambda calculus similar to System F, but with existential instead of universal quantification.
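The interface/implementation split behind the module type T above can be mimicked in Python with an abstract interface and two implementations whose hidden representation types differ; the class names are invented for the example:

```python
class T:
    """Interface: some hidden type X, a member a() : X, and f : X -> int."""
    def a(self):
        raise NotImplementedError
    def f(self, x):
        raise NotImplementedError

class IntT(T):            # hidden type X = int
    def a(self):
        return 7
    def f(self, x):
        return x + 1

class FloatT(T):          # hidden type X = float
    def a(self):
        return 2.5
    def f(self, x):
        return int(x * 2)

def client(t):
    # Well-typed against the interface alone: t.f(t.a()) yields an int
    # no matter what the hidden representation type X is.
    return t.f(t.a())

print(client(IntT()))     # 8
print(client(FloatT()))   # 5
```

The client never learns whether X is int or float, which is the isolation the existential quantifier provides formally.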
and y must be numbers, since addition is only defined for numbers. Therefore, any call to f elsewhere in the program that specifies a non-numeric type (such as a string or list) as an argument would signal an error. Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, an expression 3.14 might imply a type of floating-point, while [1, 2, 3] might imply a list of integers, typically an array. Type inference is in general possible if it is decidable in the type theory in question. Moreover, even if inference is undecidable in general for a given type theory, inference is often possible for a large subset of real-world programs.
Haskell's type system, a version of Hindley-Milner, is a restriction of System F to so-called rank-1 polymorphic types, in which type inference is decidable. Most Haskell compilers allow arbitrary-rank polymorphism as an extension, but this makes type inference undecidable. (Type checking is decidable, however, and rank-1 programs still have type inference; higher rank polymorphic programs are rejected unless given explicit type annotations.)
Types of types
A type of types is a kind. Kinds appear explicitly in typeful programming, such as a type constructor in the Haskell language. Types fall into several broad categories:

Primitive types - the simplest kind of type; e.g., integer and floating-point number
Boolean
Integral types - types of whole numbers; e.g., integers and natural numbers
Floating point types - types of numbers in floating-point representation
Reference types
Option types
Nullable types
Composite types - types composed of basic types; e.g., arrays or records
Abstract data types
Algebraic types
Subtype
Derived type
Object types; e.g., type variable
Partial type
Recursive type
Function types; e.g., binary functions
Universally quantified types, such as parameterized types
Existentially quantified types, such as modules
Refinement types - types that identify subsets of other types
Dependent types - types that depend on terms (values)
Ownership types - types that describe or constrain the structure of object-oriented systems
Pre-defined types provided for convenience in real-world applications, such as date, time and money
If the type of e and the type of x are the same, and assignment is allowed for that type, then this is a valid expression. In the simplest type systems, therefore, the question of whether two types are compatible reduces to that of whether they are equal (or equivalent). Different languages, however, have different criteria for when two type expressions are understood to denote the same type. These different equational theories of types vary widely, two extreme cases being structural type systems, in which any two types that describe values with the same structure are equivalent, and nominative type systems, in which no two syntactically distinct type expressions denote the same type (i.e., types must have the same "name" in order to be equal). In languages with subtyping, the compatibility relation is more complex. In particular, if A is a subtype of B, then a value of type A can be used in a context where one of type B is expected, even if the reverse is not true. Like equivalence, the subtype relation is defined differently for each programming language, with many variations possible. The presence of parametric or ad hoc polymorphism in a language may also have implications for type compatibility.
Programming style
Some programmers prefer statically typed languages; others prefer dynamically typed languages. Statically typed languages alert programmers to type errors during compilation, and they may perform better at runtime. Advocates of dynamically typed languages claim they better support rapid prototyping and that type errors are only a small subset of errors in a program.[13][14] Likewise, there is often no need to manually declare all types in statically typed languages with type inference, so the burden of explicitly specifying the types of variables is reduced in such languages. Some dynamic languages also have run-time optimisers[15][16] that can generate fast code approaching the speed of static language compilers, often by using partial type inference.
References
[1] Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press. ISBN 0-262-16209-1.
[2] Pierce, Benjamin C. (2002), p. 208.
[3] InfoWorld, 25 April 1983 (http://books.google.co.uk/books?id=7i8EAAAAMBAJ&pg=PA66&dq=pascal+type+system+"too+strong").
[4] Kernighan, Brian: Why Pascal is Not My Favorite Programming Language (http://www.cs.virginia.edu/~cs655/readings/bwk-on-pascal.html).
[5] http://msdn.microsoft.com/en-us/library/dd233052(VS.100).aspx
[6] Xi, Hongwei; Scott, Dana (1998). "Dependent Types in Practical Programming". Proceedings of ACM SIGPLAN Symposium on Principles of Programming Languages (ACM Press): 214-227. CiteSeerX: 10.1.1.41.548 (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.548).
[7] Liskov, B.; Zilles, S. (1974). "Programming with abstract data types". ACM SIGPLAN Notices. CiteSeerX: 10.1.1.136.3043 (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.136.3043).
[8] Jackson, K. (1977). "Parallel processing and modular software construction" (http://www.springerlink.com/content/wq02703237400667/). Lecture Notes in Computer Science 54: 436-443. doi:10.1007/BFb0021435.
[9] Bracha, G.: Pluggable Types (http://bracha.org/pluggableTypesPosition.pdf)
[10] Rozsnyai, S.; Schiefer, J.; Schatten, A. (2007). "Concepts and models for typing events for event-based systems". Proceedings of the 2007 inaugural international conference on Distributed event-based systems - DEBS '07. p. 62. doi:10.1145/1266894.1266904. ISBN 9781595936653.
[11] Martelli, Alex (26 July 2000). "Re: polymorphism (was Re: Type checking in python?)". comp.lang.python (http://groups.google.com/group/comp.lang.python/msg/e230ca916be58835).
[12] Mitchell, John C.; Plotkin, Gordon D.: Abstract Types Have Existential Type (http://theory.stanford.edu/~jcm/papers/mitch-plotkin-88.pdf), ACM Transactions on Programming Languages and Systems, Vol. 10, No. 3, July 1988, pp. 470-502.
[13] Meijer, Erik; Drayton, Peter. "Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages" (http://research.microsoft.com/en-us/um/people/emeijer/Papers/RDL04Meijer.pdf). Microsoft Corporation.
[14] Eckel, Bruce. "Strong Typing vs. Strong Testing" (http://docs.google.com/View?id=dcsvntt2_25wpjvbbhk). Google Docs.
[15] "Adobe and Mozilla Foundation to Open Source Flash Player Scripting Engine" (http://www.mozilla.com/en-US/press/mozilla-2006-11-07.html).
[16] "Psyco, a Python specializing compiler" (http://psyco.sourceforge.net/introduction.html).
Further reading
Smith, Chris. "What To Know Before Debating Type Systems" (http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/)
Tratt, Laurence. "Dynamically Typed Languages" (http://tratt.net/laurie/research/publications/html/tratt__dynamically_typed_languages/), Advances in Computers, Vol. 77, pp. 149-184, July 2009.
Interpretation
Most generally, "strong typing" implies that the programming language places severe restrictions on the intermixing of types that is permitted to occur, preventing the compilation or execution of source code which uses data in what is considered an invalid way. For instance, an addition operation may not allow an integer to be added to a string value, and a procedure which operates upon linked lists may not be used upon numbers. However, the nature and strength of these restrictions varies widely between languages.
Example
Pseudocode, with the results under weak typing and under strong typing:

a = 2
b = "2"
concatenate(a, b)        # Weak typing: "22"   Strong typing: type error
add(a, b)                # Weak typing: 4      Strong typing: type error
concatenate(str(a), b)   # Returns "22" in both
add(a, int(b))           # Returns 4 in both

Languages that behave as in the strong-typing column include ActionScript 3, C++, C#, Java, Python, and OCaml: mixing the types directly is an error, and the conversion must be requested explicitly.
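The strong-typing behaviour in the pseudocode above matches what Python actually does; a minimal runnable sketch:

```python
a = 2
b = "2"

# Mixing an integer and a string directly is rejected at run time.
try:
    a + b  # the add(a, b) case
except TypeError as exc:
    print("Type error:", exc)

# Explicit conversions make the programmer's intent unambiguous.
print(str(a) + b)   # concatenation -> "22"
print(a + int(b))   # addition      -> 4
```

The same operator (`+`) performs either concatenation or addition, but only once both operands have the same type.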
Writers have used "strong typing" to mean several distinct things, including:
- The mandatory requirement, by a language definition, of compile-time checks for type constraint violations; that is, the compiler ensures that operations only occur on operand types that are valid for the operation. Since this is also the definition of static typing, some experts state that "static typing is often confused with strong typing".[2]
- Fixed and invariable typing of data objects: the type of a given data object does not vary over that object's lifetime. For example, class instances may not have their class altered.
- The absence of ways to evade the type system. Such evasions are possible in languages that allow programmer access to the underlying representation of values, i.e., their bit patterns.
- Omission of implicit type conversions, that is, conversions inserted by the compiler on the programmer's behalf. For these authors, a programming language is strongly typed if type conversions are allowed only when an explicit notation, often called a cast, is used to indicate the desire to convert one type to another.
- Disallowing any kind of type conversion: values of one type cannot be converted to another type, explicitly or implicitly.
- A complex, fine-grained type system with compound types.
Brian Kernighan: "[...] each object in a program has a well-defined type which implicitly defines the legal values of and operations on the object. The language guarantees that it will prohibit illegal values and operations, by some mixture of compile- and run-time checking."[3]
Assembly language and Forth have been said to be untyped: there is no type checking, it is up to the programmer to ensure that data given to functions is of the appropriate type, and any type conversion required is explicit. For this reason, writers who wish to write unambiguously about type systems often eschew the term "strong typing" in favor of specific expressions such as "type safety".
References
[1] ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/SRC-045.pdf, page 3.
[2] Cunningham & Cunningham Wiki: StaticTyping (http://c2.com/cgi/wiki?StaticTyping)
[3] Brian Kernighan (http://www.cs.virginia.edu/~cs655/readings/bwk-on-pascal.html), in Why Pascal is Not My Favourite Programming Language.
[4] InfoWorld, April 25, 1983 (http://books.google.co.uk/books?id=7i8EAAAAMBAJ&pg=PA66&dq=pascal+type+system+%22too+strong%22)
[5] Brian Kernighan: Why Pascal is Not My Favourite Programming Language (http://www.cs.virginia.edu/~cs655/readings/bwk-on-pascal.html)
[6] Common Lisp HyperSpec, Types and Classes (http://www.lispworks.com/documentation/HyperSpec/Body/04_.htm)
[7] CMUCL User's Manual: The Compiler, Types in Python (http://common-lisp.net/project/cmucl/doc/cmu-user/compiler.html#toc123)
Weak typing
In computer science, weak typing (also known as loose typing) is a property attributed to the type systems of some programming languages. It is the opposite of strong typing, and consequently the term weak typing has a number of different meanings, just as "strong typing" does.

One of the more common definitions states that weakly typed programming languages are those that support either implicit type conversion (nearly all languages support at least one implicit type conversion), ad-hoc polymorphism (also known as overloading), or both. These less restrictive usage rules can give the impression that strict adherence to typing rules is less important than in strongly typed languages and hence that the type system is "weaker". However, such languages usually have restrictions on what programmers can do with values of a given type; thus it is possible for a weakly typed language to be type safe. Moreover, weakly typed languages may be statically typed, in which case overloading is resolved statically and type conversion operations are inserted by the compiler, or dynamically typed, in which case everything is resolved at run time.

One claimed advantage of weak typing over strong typing is that it requires less effort on the part of the programmer, because the compiler or interpreter implicitly performs certain kinds of conversions. However, one claimed disadvantage is that weakly typed programming systems catch fewer errors at compile time, and some of these might still remain after testing has been completed. Sometimes an implicit conversion occurs that surprises unwary programmers and leads to unexpected bugs. For example, in PHP, the strings "1000" and "1e3" compare equal because they are implicitly cast to floating point numbers, even though they have distinct values as strings.
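The PHP comparison described above can be mimicked in Python, a strongly typed language, only by making the cast explicit. The helper below (a rough sketch, not PHP's exact comparison algorithm) shows what PHP's loose == does behind the scenes for numeric-looking strings:

```python
def loose_equals(x, y):
    """Roughly mimic PHP's loose ==: try to cast both operands to
    float before comparing; fall back to ordinary equality."""
    try:
        return float(x) == float(y)
    except (TypeError, ValueError):
        return x == y

print("1000" == "1e3")              # False: Python compares the strings as strings
print(loose_equals("1000", "1e3"))  # True: both cast to the float 1000.0
```

In Python the conversion must appear in the source; in PHP it is inserted by the language, which is exactly the kind of implicit conversion that can surprise an unwary programmer.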
Syntax
In computer science, the syntax of a programming language is the set of rules that define the combinations of symbols that are considered to be correctly structured programs in that language. The syntax of a language defines its surface form.[1] Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical). The lexical grammar of a textual language specifies how characters must be chunked into tokens. Other syntax rules specify the permissible sequences of these tokens and the process of assigning meaning to these token sequences is part of semantics.
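The chunking of characters into tokens can be sketched with a small tokenizer; the token names and the toy token classes below are invented for illustration, using Python's standard re module:

```python
import re

# A toy lexical grammar: each token class is a named regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),   # whitespace separates tokens but is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Chunk a character sequence into (kind, lexeme) tokens."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("x = 2 + 40"))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '2'), ('OP', '+'), ('NUMBER', '40')]
```

The later syntax rules then operate on this token sequence rather than on raw characters.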
Syntax highlighting and indent style are often used to aid programmers in recognizing elements of source code. Color coded highlighting is used in this piece of code written in Python.
The syntactic analysis of source code usually entails the transformation of the linear sequence of tokens into a hierarchical syntax tree (abstract syntax trees are one convenient form of syntax tree). This process is called parsing, as it is in syntactic analysis in linguistics. Tools have been written that automatically generate parsers from a specification of a language grammar written in Backus-Naur form, e.g., Yacc (yet another compiler compiler).
Syntax definition
The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category.[1] Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.
Below is a simple grammar, based on Lisp, which defines productions for the syntactic categories expression, atom, number, symbol, and list:
expression ::= atom | list
atom ::= number | symbol
number ::= [+-]?['0'-'9']+
symbol ::= ['A'-'Z''a'-'z'].*
list ::= '(' expression* ')'

This grammar specifies the following: an expression is either an atom or a list; an atom is either a number or a symbol; a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign; a symbol is a letter followed by zero or more of any characters (excluding whitespace); and a list is a matched pair of parentheses, with zero or more expressions inside it.
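This grammar is simple enough to parse by hand. The sketch below (with names of our own choosing; it does no error recovery) tokenizes a string and applies the productions by recursive descent, returning atoms as strings and lists as nested Python lists:

```python
import re

# A token is either a parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^()\s]+")

def parse(text):
    """Parse text against the Lisp-like grammar above."""
    tokens = TOKEN.findall(text)
    pos = 0

    def expression():
        nonlocal pos
        if tokens[pos] == "(":          # list ::= '(' expression* ')'
            pos += 1
            items = []
            while tokens[pos] != ")":
                items.append(expression())
            pos += 1                    # consume ')'
            return items
        atom = tokens[pos]              # atom ::= number | symbol
        pos += 1
        return atom

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree

print(parse("(a b c232 (1))"))
# ['a', 'b', 'c232', ['1']]
```

The nested lists returned here are exactly the hierarchical syntax tree that the article's next paragraphs describe parsing as producing.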
Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols. The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'.

The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.[2]

However, there are exceptions. In some languages, such as Perl and Lisp, the specification (or implementation) of the language allows constructs that execute during the parsing phase. Furthermore, these languages have constructs that allow the programmer to alter the behavior of the parser. This combination effectively blurs the distinction between parsing and execution, and makes syntax analysis an undecidable problem in these languages, meaning that the parsing phase may not finish. For example, in Perl it is possible to execute code during parsing using a BEGIN statement, and Perl function prototypes may alter the syntactic interpretation, and possibly even the syntactic validity, of the remaining code.[3] Similarly, Lisp macros introduced by the defmacro syntax also execute during parsing, meaning that a Lisp compiler must have an entire Lisp run-time system present. In contrast, C macros are merely string replacements and do not require code execution.[4][5]
References
[1] Friedman, Daniel P.; Wand, Mitchell; Haynes, Christopher T. (1992). Essentials of Programming Languages (1st ed.). The MIT Press. ISBN 0-262-06145-7.
[2] Sipser, Michael (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Section 2.2: Pushdown Automata, pp. 101-114.
[3] The following discussions give examples: Perl and Undecidability (http://www.jeffreykegler.com/Home/perl-and-undecidability); an LtU comment clarifying that the undecidable problem is membership in the class of Perl programs (http://lambda-the-ultimate.org/node/3564#comment-50578); chromatic's example of Perl code that gives a syntax error depending on the value of a random variable (http://www.modernperlbooks.com/mt/2009/08/on-parsing-perl-5.html)
[4] http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.html
[5] http://cl-cookbook.sourceforge.net/macros.html
External links
Various syntactic constructs used in computer programming languages (http://merd.sourceforge.net/pixel/language-study/syntax-across-languages/)
Scripting language
A scripting language or script language is a programming language that supports the writing of scripts, programs written for a software environment that automate the execution of tasks which could alternatively be executed one-by-one by a human operator. Environments that can be automated through scripting include software applications, web pages within a web browser, the shells of operating systems, and several general purpose and domain-specific languages such as those for embedded systems. Scripts can be written and executed "on-the-fly", without explicit compile and link steps; they are typically created or modified by the person executing them.[1] A scripting language is usually interpreted from source code or bytecode.[2] By contrast, the software environment the scripts are written for is typically written in a compiled language and distributed in machine code form; the user may not have access to its source code, let alone be able to modify it. The spectrum of scripting languages ranges from very small and highly domain-specific languages to general-purpose programming languages. The term script is typically reserved for small programs (up to a few thousand lines of code).
History
Early mainframe computers (in the 1950s) were non-interactive, instead using batch processing. IBM's Job Control Language (JCL) is the archetype of languages used to control batch processing.[3] The first interactive shells were developed in the 1960s to enable remote operation of the first time-sharing systems; these used shell scripts, which ran computer programs within a computer program, the shell. Calvin Mooers, in his TRAC language, is generally credited with inventing command substitution, the ability to embed commands in scripts that, when interpreted, insert a character string into the script.[4] Multics calls these active functions.[5] Louis Pouzin wrote an early processor for command scripts, called RUNCOM, for CTSS around 1964. Stuart Madnick at MIT wrote a scripting language for IBM's CP/CMS in 1966; he originally called this processor COMMAND, and it was later named EXEC.[6] Multics included an offshoot of CTSS RUNCOM, also called RUNCOM.[7] Languages such as Tcl and Lua were specifically designed as general-purpose scripting languages that could be embedded in any application. Other languages, such as Visual Basic for Applications (VBA), provided strong
integration with the automation facilities of an underlying system. Embedding such general-purpose scripting languages, instead of developing a new language for each application, also had obvious benefits: it relieved the application developer of the need to code a language translator from scratch and allowed the user to apply skills learned elsewhere. Some software incorporates several different scripting languages. Modern web browsers typically provide a language for writing extensions to the browser itself, and several standard embedded languages for controlling the browser, including JavaScript (a dialect of ECMAScript) and XUL.
GUI scripting
With the advent of graphical user interfaces, a specialized kind of scripting language emerged for controlling a computer. These languages interact with the same graphic windows, menus, buttons, and so on that a human user would, by simulating the actions of a user, and are typically used to automate user actions. They are also called "macro" languages when control is through simulated key presses or mouse clicks. These languages could in principle be used to control any GUI application, but in practice their use is limited because they need support from the application and from the operating system. There are a few exceptions to this limitation: some GUI scripting languages recognize graphical objects from their display-screen pixels and so do not depend on support from the operating system or application.
Application-specific languages
Many large application programs include an idiomatic scripting language tailored to the needs of the application user. Likewise, many computer game systems use a custom scripting language to express the programmed actions of non-player characters and the game environment. Languages of this sort are designed for a single application, and while they may superficially resemble a specific general-purpose language (e.g. QuakeC, modeled after C), they have custom features that distinguish them. Emacs Lisp, while a fully formed and capable dialect of Lisp, contains many special features that make it most useful for extending the editing functions of Emacs. An application-specific scripting language can be viewed as a domain-specific programming language specialized to a single application.
Extension/embeddable languages
A number of languages have been designed for the purpose of replacing application-specific scripting languages by being embeddable in application programs. The application programmer (working in C or another systems language) includes "hooks" where the scripting language can control the application. These languages may be technically equivalent to an application-specific extension language, but when an application embeds a "common" language, the user gets the advantage of being able to transfer skills from application to application.

JavaScript began as, and primarily still is, a language for scripting inside web browsers; however, the standardization of the language as ECMAScript has made it popular as a general-purpose embeddable language. In particular, the Mozilla implementation SpiderMonkey is embedded in several environments such as the Yahoo! Widget Engine. Other applications embedding ECMAScript implementations include the Adobe products Adobe Flash (ActionScript) and Adobe Acrobat (for scripting PDF files).

Tcl was created as an extension language but has come to be used more frequently as a general-purpose language in roles similar to Python, Perl, and Ruby. Rexx, on the other hand, was originally created as a job control language, but is widely used as an extension language as well as a general-purpose language. Other complex and task-oriented applications may incorporate and expose an embedded programming language to allow their users more control and give them more functionality than can be available through a user interface, no matter how sophisticated. For example, the Autodesk Maya 3D authoring tools embed the MEL scripting language, and Blender uses Python to fill this role. Some other types of applications that need faster feature addition or tweak-and-run cycles (e.g. game engines) also use an embedded language.
During development, this allows them to prototype features faster and tweak more freely, without requiring the user to have intimate knowledge of the inner workings of the application or to rebuild it after each tweak (which can take a significant amount of time). The scripting languages used for this purpose range from the more common and famous Lua and Python to lesser-known ones such as AngelScript and Squirrel. Ch is another C-compatible scripting option for the industry to embed into C/C++ application programs.
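The "hooks" idea described above can be sketched in miniature, using Python as both the host and the embedded language (the names Application, register, on_save, and so on are invented for illustration, not any real API):

```python
class Application:
    """A toy host application that exposes hooks to user scripts."""

    def __init__(self):
        self.hooks = {}          # event name -> list of registered callbacks

    def register(self, event, callback):
        self.hooks.setdefault(event, []).append(callback)

    def run_script(self, source):
        # The embedded script can control the application only through
        # the names the host chooses to expose in this namespace.
        exec(source, {"register": self.register})

    def fire(self, event, payload):
        for callback in self.hooks.get(event, []):
            callback(payload)

app = Application()
app.run_script("""
def shout(text):
    print(text.upper())
register("on_save", shout)
""")
app.fire("on_save", "document saved")   # prints: DOCUMENT SAVED
```

A real embedding (Lua in a game engine, MEL in Maya) works the same way in outline: the host registers script callbacks against named events and invokes them when those events occur, so features can be tweaked without rebuilding the host.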
Market analysis
According to a global survey performed by Evans Data in 2008,[9] the most widespread scripting language is JavaScript. The second most widespread is PHP. Perl is the third most widespread scripting language, but in North America it enjoys significantly more popularity.[10]
References
[1] Loui, Ronald (2008). "In Praise of Scripting". IEEE Computer.
[2] Brown, Vicki. "Scripting Languages" (http://www.mactech.com/articles/mactech/Vol.15/15.09/ScriptingLanguages/index.html). Retrieved 2009-07-22.
[3] IBM Corporation (1967). IBM System/360 Operating System Job Control Language (C28-6529-4) (http://www.bitsavers.org/pdf/ibm/360/os/R01-08/C28-6539-4_OS_JCL_Mar67.pdf).
[4] Mooers, Calvin. "TRAC, A Procedure-Describing Language for the Reactive Typewriter" (http://web.archive.org/web/20010425014914/http://tracfoundation.org/trac64/procedure.htm). Retrieved Mar 9, 2012.
[5] Van Vleck, Thomas (ed.). "Multics Glossary -A- (active function)" (http://www.multicians.org/mga.html). Retrieved Mar 9, 2012.
[6] Varian, Melinda. "VM and the VM Community: Past, Present, and Future" (http://web.me.com/melinda.varian/Site/Melinda_Varians_Home_Page_files/neuvm.pdf). Retrieved Mar 9, 2012.
[7] Van Vleck, Thomas (ed.). "Multics Glossary -R- (RUNCOM)" (http://www.multicians.org/mgr.html#runcom). Retrieved Mar 9, 2012.
[8] Sheppard, Doug (2000-10-16). "Beginner's Introduction to Perl" (http://www.perl.com/pub/2000/10/begperl1.html). dev.perl.org. Retrieved 2011-01-08.
[9] http://www.cio.com/article/446829/PHP_JavaScript_Ruby_Perl_Python_and_Tcl_Today_The_State_of_the_Scripting_Universe?contentId=446829
[10] "PHP, JavaScript, Ruby, Perl, Python, and Tcl Today: The State of the Scripting Universe", CIO.com (http://www.cio.com/article/446829/PHP_JavaScript_Ruby_Perl_Python_and_Tcl_Today_The_State_of_the_Scripting_Universe?contentId=446829)
External links
Patterns for Scripted Applications (http://web.archive.org/web/20041010125419/www.doc.ic.ac.uk/~np2/patterns/scripting/)
A study of the Script-Oriented Programming (SOP) suitability of selected languages (http://merd.sourceforge.net/pixel/language-study/scripting-language/), from The Scriptometer
A Slightly Skeptical View on Scripting Languages (http://www.softpanorama.org/Articles/a_slightly_skeptical_view_on_scripting_languages.shtml), by Dr. Nikolai Bezroukov
Rob van der Woude's Scripting Pages (http://www.robvanderwoude.com/): administrative-scripting information, including examples
Are Scripting Languages Any Good? A Validation of Perl, Python, Rexx, and Tcl against C, C++, and Java (PDF) (http://page.mi.fu-berlin.de/~prechelt/Biblio/jccpprt2_advances2003.pdf), a 2003 study
Use of VBScript in QTP automation (http://knol.google.com/k/rajamanickam-antonimuthu/quick-test-professional-software-test/14dmp09oqdm08/2#Basics_of_vbscript)
Scripting on the Java platform (http://www.javaworld.com/javaworld/jw-11-2007/jw-11-jsr223.html), JavaWorld
"Programming is Hard - Let's Go Scripting" by Larry Wall (http://www.perl.com/pub/a/2007/12/06/soto-11.html), a Perl.com transcript of his State of the Onion speech
iSystemAdmin.com (http://www.isystemadmin.com), a system-administration site sharing tools, scripts, plugins, and books
License
Creative Commons Attribution-Share Alike 3.0 Unported (http://creativecommons.org/licenses/by-sa/3.0/)