Web API

DOs and DON'Ts


Oliver Wolf @owolf

www.innoQ.com @innoQ

Disclaimer
- Some of the discussions around REST and Web APIs are merely a matter of taste and personal preference. I think the topics I'm going to bring up are not, but I'm as biased as everyone else.
- This is by no means a complete compilation and doesn't even claim to be one.
- And, as always, your mileage may vary.

Don't think in terms of endpoints.

SOAP: Facade with a single entry point


POST /soap/customer_service

<soap:envelope>
  <soap:body>
    <cs:create_customer>
      <cs:customer>
        <cs:name>John Doe</cs:name>
        ...
  </soap:body>
</soap:envelope>

The Web: Lots of facades with lots of doors

Do you really want the web to end at your doorstep?

The web is based on relations and interconnections.

Don't let your API be like a black hole with one way in and no way out.
- Use hypermedia controls to link your resource representations together in ways that are meaningful for your audience.
- If your resource representations contain references to concepts and resources outside your domain, use hyperlinks whenever possible. That's what they're meant for!
- Make potential state transitions that apply to your resources visible and navigable via hypermedia controls rather than relying on out-of-band documentation.
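A minimal sketch of what such a representation could look like, assuming a JSON format with a "_links" object (a convention borrowed from HAL-style formats, not something these slides prescribe). The order resource, its fields, the link relation names and the URLs are all hypothetical.

```python
BASE = "http://api.example.org"  # hypothetical API root


def order_representation(order_id, status, customer_id):
    """Build a representation of a (hypothetical) order with hypermedia links."""
    links = {
        "self": {"href": f"{BASE}/orders/{order_id}"},
        # A reference to another resource is expressed as a link, not a bare ID.
        "customer": {"href": f"{BASE}/customers/{customer_id}"},
    }
    # Only advertise the state transitions that are possible right now,
    # so clients discover them from the representation itself.
    if status == "open":
        links["cancel"] = {"href": f"{BASE}/orders/{order_id}/cancellation"}
        links["payment"] = {"href": f"{BASE}/orders/{order_id}/payment"}
    return {"id": order_id, "status": status, "_links": links}
```

A client that follows the "cancel" link only when it is present does not need to know the rule "only open orders can be cancelled"; that knowledge stays on the server.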

Don't just expose your domain model.

Many real-world domain models happen to be anemic.

If you just expose them as-is, you'll inevitably end up with a bunch of CRUD resources.
- This doesn't necessarily have to be a bad thing, but it often is.
- A client that consumes a web API based on an anemic domain model needs to have intimate knowledge about the resources, their relations and the actions that can be performed on them: tight coupling ensues.
- It's almost always better to design APIs for intent rather than slavishly following the domain model.

Designing for intent means that you need to understand how clients will use the API
- That often requires a trade-off between flexibility vs. clarity and conciseness.
- Of course clients could request a list of the top 10 customers by gross margin like so:
  GET /customers?sortBy=grossMargin&order=desc&pageSize=10
- But if that's a frequent and meaningful use case for your API, why not introduce a new resource that explicitly conveys the intent:
  GET /most_profitable_customers
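On the server side, the intent-revealing resource can be a thin wrapper around the generic query. The following is only a sketch under assumptions: the in-memory customer list, the field values and the function names are made up for illustration.

```python
# Hypothetical sample data; a real API would query a data store.
CUSTOMERS = [
    {"name": "Acme", "grossMargin": 120_000},
    {"name": "Globex", "grossMargin": 95_000},
    {"name": "Initech", "grossMargin": 310_000},
]


def query_customers(sort_by, order="asc", page_size=10):
    """Generic handler behind GET /customers?sortBy=...&order=...&pageSize=..."""
    ranked = sorted(CUSTOMERS, key=lambda c: c[sort_by], reverse=(order == "desc"))
    return ranked[:page_size]


def most_profitable_customers():
    """Handler behind GET /most_profitable_customers: same data, explicit intent."""
    return query_customers("grossMargin", order="desc", page_size=10)
```

The client no longer has to know which field and sort order define "most profitable"; if that definition changes, only the server changes.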

Don't overuse GET and POST.

GET /blog/entries/42&action=delete

POST /blog/entries/42/delete

POST /customer/123
<customer>
  <status>Preferred</status>
</customer>

GET /api/create_customer?name=...

The HTTP verbs are there for a reason: they have complementary qualities.

Verb    Safe?  Idempotent?  Semantics
GET     yes    yes          retrieve resource representation
PUT     no     yes          modify resource state or create resource identified by URL
POST    no     no           create new resource, leave assigning identifier to server
DELETE  no     yes          delete resource

You gain a lot by using HTTP as it's intended to be used.


- Using HTTP verbs correctly unambiguously communicates intent.
- The client knows exactly what to expect from the server:
  - Which actions can be safely retried in case of errors?
  - Which results can potentially be cached?
  - Which actions mutate server-side resource state?
- Coupling between client and server is limited to the HTTP contract, no out-of-band knowledge is required.
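To illustrate how a generic client can answer two of those questions from the verb alone, here is a small sketch. The table encodes the standard safe/idempotent properties of the four verbs discussed above; the helper names are hypothetical.

```python
from typing import NamedTuple


class VerbProperties(NamedTuple):
    safe: bool        # does not change server-side resource state
    idempotent: bool  # repeating the request has the same effect as sending it once


HTTP_VERBS = {
    "GET": VerbProperties(safe=True, idempotent=True),
    "PUT": VerbProperties(safe=False, idempotent=True),
    "POST": VerbProperties(safe=False, idempotent=False),
    "DELETE": VerbProperties(safe=False, idempotent=True),
}


def can_retry_blindly(method):
    """An idempotent request can be re-sent after a timeout without extra checks."""
    return HTTP_VERBS[method.upper()].idempotent


def mutates_state(method):
    """Anything that is not safe may change server-side resource state."""
    return not HTTP_VERBS[method.upper()].safe
```

None of this works if the API tunnels deletes through GET or updates through POST: the client's assumptions would then be wrong in exactly the cases that matter.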

Don't limit your choice of error codes to 200 and 500.

Life lesson: Pretending everything's good when in fact it isn't is rarely a good idea.

HTTP/1.1 200 OK
Content-Type: application/json

{
  "success": false,
  "severity": 100,
  "error_message": "Everything's FUBAR!"
}

Srsly?

There are more than 60 status codes for you to choose from.

100 client should continue with request
101 server is switching protocols
102 server has received and is processing the request
103 resume aborted PUT or POST requests
122 URI is longer than a maximum of 2083 characters

200 standard response for successful HTTP requests
201 request has been fulfilled; new resource created
202 request accepted, processing pending
203 request processed, information may be from another source
204 request processed, no content returned
205 request processed, no content returned, reset document view
206 partial resource return due to request header
207 XML, can contain multiple separate responses
208 results previously returned
226 request fulfilled, response is instance-manipulations

300 multiple options for the resource delivered
301 this and all future requests directed to the given URI
302 temporary response to request found via alternative URI
303 response to request can be found via alternative URI, using GET
304 resource has not been modified since last requested
305 content located elsewhere, retrieve from there
306 subsequent requests should use the specified proxy
307 connect again to different URI as provided
308 resumable HTTP requests

400 request cannot be fulfilled due to bad syntax
401 authentication is possible but has failed
402 payment required, reserved for future use
403 server refuses to respond to request
404 requested resource could not be found
405 request method not supported by that resource
406 content not acceptable according to the Accept headers
407 client must first authenticate itself with the proxy
408 server timed out waiting for the request
409 request could not be processed because of conflict
410 resource is no longer available and will not be available again
411 request did not specify the length of its content
412 server does not meet request preconditions
413 request is larger than the server is willing or able to process
414 URI provided was too long for the server to process
415 server does not support media type
416 client has asked for unprovidable portion of the file
417 server cannot meet requirements of Expect request-header field
418 I'm a teapot
420 Twitter rate limiting
422 request unable to be followed due to semantic errors
423 resource that is being accessed is locked
424 request failed due to failure of a previous request
426 client should switch to a different protocol
428 origin server requires the request to be conditional
429 user has sent too many requests in a given amount of time
431 server is unwilling to process the request
444 server returns no information and closes the connection
449 request should be retried after performing action
450 Windows Parental Controls blocking access to webpage
451 The server cannot reach the client's mailbox.
499 connection closed by client while HTTP server is processing

500 generic error message
501 server does not recognise method or lacks ability to fulfill
502 server received an invalid response from upstream server
503 server is currently unavailable
504 gateway did not receive response from upstream server
505 server does not support the HTTP protocol version
506 content negotiation for the request results in a circular reference
507 server is unable to store the representation
508 server detected an infinite loop while processing the request
509 bandwidth limit exceeded
510 further extensions to the request are required
511 client needs to authenticate to gain network access
598 network read timeout behind the proxy
599 network connect timeout behind the proxy

Using the right error category is key to finding the appropriate recovery strategy.
- Even if you're not always sure about the subtleties of using one code over another, at least make sure you get the error category right:
  - 2xx codes indicate successful completion
  - 3xx codes are redirections
  - 4xx codes indicate errors caused by faulty behavior on the client side; these are usually recoverable (just check the request and try again)
  - 5xx codes indicate server-side errors which may or may not be recoverable
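A client can pick its recovery strategy from the category alone, which is why getting the category right matters more than the exact code. A minimal sketch; the strategy labels are hypothetical and a real client would refine them per code.

```python
def recovery_strategy(status):
    """Map an HTTP status code to a coarse recovery strategy by its category."""
    if 200 <= status < 300:
        return "done"                     # successful completion
    if 300 <= status < 400:
        return "follow redirect"          # look at the Location header
    if 400 <= status < 500:
        return "fix request and resend"   # client-side fault, usually recoverable
    if 500 <= status < 600:
        return "retry later or give up"   # server-side fault, may not be recoverable
    raise ValueError(f"unexpected status code: {status}")
```

With the "200 OK plus error payload" style, this function would report "done" for a failed request, and every client would need API-specific parsing to find out otherwise.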

Don't ignore caching.

Fact: There will be caches involved, no matter what.

[Diagram: several clients (one with its own client cache) reach the origin servers across "The Internets" via a chain of proxy caches and a reverse proxy cache. Annotation: "These are the only ones under your control!"]

You can just ignore them, of course.


- If you don't include any caching headers in your responses, well-behaved caches will just do nothing.
- If you want to really make sure that no cache interferes with communication in any way, use
  Cache-Control: no-store
- But is this really what you want?

They're there to help!

(And they come for free.)

Help them so they can help you!


- The least you can do is include either an Expires header or a Cache-Control: max-age=... with a reasonable freshness period for data that changes rarely and/or at regular intervals.
- Better yet, use validators:
  - Include Last-Modified in responses and honor If-Modified-Since in requests.
  - Include ETag in responses and honor If-None-Match in requests.
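As an illustration of the Last-Modified / If-Modified-Since pair combined with a freshness period, here is a framework-independent sketch. The function name, the one-hour default and the placeholder body are assumptions; a real server would also ignore unparseable dates instead of raising.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None, max_age=3600):
    """Return (status, headers, body) for a GET on a resource.

    last_modified must be a timezone-aware UTC datetime.
    """
    headers = {
        "Cache-Control": f"max-age={max_age}",
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if if_modified_since is not None:
        client_copy = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= client_copy:
            return 304, headers, None  # client's copy is still good
    return 200, headers, "<representation>"
```

The 304 response carries no body, so a revalidation costs one round trip but no payload and, ideally, no rendering work.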

ETags are powerful beasts!

Here's the thing: You decide!

The cool thing about ETags


- ETags are opaque to proxies, so they can be just about anything:
  - hashes (not so cool if you need to create the representation to calculate the hash and then throw it away if it's unchanged: no computation effort saved!)
  - timestamps
  - version numbers
  - or anything else that allows your server logic to decide if a representation can still be considered fresh, which means you can be fuzzy here!
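The point about saved computation can be sketched with a version-number ETag: the server compares tags before it renders anything. This is an assumption-laden example, not a complete implementation; the function names are hypothetical, and splitting If-None-Match on commas is a simplification that breaks for tags containing commas.

```python
def etag_for(version):
    """Derive an ETag from a (hypothetical) per-resource version counter."""
    return f'"v{version}"'


def conditional_get(current_version, render, if_none_match=None):
    """Return (status, headers, body); render() is only called when needed."""
    etag = etag_for(current_version)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, None  # nothing rendered, nothing sent
    return 200, headers, render()
```

Because the tag comes from the version counter rather than from a hash of the rendered output, a 304 skips the expensive part entirely.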

There are some caveats to keep in mind, though.


- Be careful if your resources support multiple representations. You might want to include a Vary: Accept header.
- If a resource has both stable and highly volatile state, it can be useful to split it into two separate (sub-)resources (which should be hyperlinked, of course).
- Try to avoid excessive precision in query parameters as it can lead to cache misses. Consider if
  GET /weather?location=52.497N13.428E
  is really that much better than
  GET /weather?location=Berlin
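One way to keep coordinate lookups cacheable is to coarsen them before they end up in a URL, so nearby requests share a cache entry. A sketch assuming the location format from the example above; the function name and the choice of one decimal place are hypothetical.

```python
import re

# Matches values such as "52.497N13.428E" (format taken from the example above).
_COORDINATES = re.compile(r"(\d+(?:\.\d+)?)([NS])(\d+(?:\.\d+)?)([EW])")


def coarse_location(location, decimals=1):
    """Reduce coordinate precision; leave other values (e.g. "Berlin") untouched."""
    match = _COORDINATES.fullmatch(location)
    if match is None:
        return location
    lat, ns, lon, ew = match.groups()
    return f"{float(lat):.{decimals}f}{ns}{float(lon):.{decimals}f}{ew}"
```

Whether one decimal place is precise enough depends on the data; the trade-off is between cache hit rate and spatial accuracy.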

Don't see versioning as a requisite.

As software engineers, we've internalized that versioning is essential to control change.

But a web API is fundamentally different from a piece of installed software.


- Web APIs are singletons: there's only one instance at a time.
- Once a public-facing API is published and starts to gain traction, it becomes increasingly difficult to change.
- Clients are rarely under your control and it's almost impossible for you to enforce version updates.

Often, when you think you're changing a resource, what you're actually changing is just the representation.

- In many real-world cases /v1/customers and /v2/customers still refer to the same thing (business concept, domain object, whatever). Why should it be identified by two distinct URLs?
- If the representation has changed, consider versioning the media type instead of introducing version information into the URL:
  Content-Type: application/vnd.myapi.v2

Better yet, try to get by without any versioning whatsoever.


- If you design your representations with extensibility in mind, you'll probably end up not needing versioning at all.
- The default mustIgnore behaviour of most JSON implementations makes that easier to do in JSON than in XML.
- If backwards compatibility is not possible or adds too much additional complexity, consider introducing an entirely new API (as Facebook did with the Graph API, for instance).
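The mustIgnore idea in practice: a client that picks only the fields it understands keeps working when the server adds new ones. A minimal sketch; the customer fields and the default status value are hypothetical.

```python
import json


def read_customer(body):
    """Parse a customer representation, ignoring any fields we do not know."""
    data = json.loads(body)
    return {
        "name": data["name"],                        # required by this client
        "status": data.get("status", "Regular"),     # optional, with a default
    }
```

A representation that later gains, say, a "loyaltyPoints" field is read exactly as before, so the server can evolve without a new version as long as it only adds fields and never changes the meaning of existing ones.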

Don't mix up searching and identifying.

Searching for resources and identifying resources are fundamentally different things.
- It's often good practice to provide more than one way to search for things, based on clients' intent:
  /countries/germany/states/berlin/cities/berlin
  /cities/berlin
  /cities?name=berlin&state=berlin&country=germany
- Identity, however, should be unique:
  /cities/3874

Try not to mix up these two concepts in your API.


- If possible, identify and refer to resources by their canonical URL.
- Use redirection:
  GET /countries/germany/states/berlin/cities/berlin

  HTTP/1.1 303 See Other
  Location: http://api.example.org/cities/3874
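The redirect from a search-style path to the canonical URL could be handled roughly like this. The lookup table and handler are hypothetical stand-ins for a real router and data store; only the URLs come from the example above.

```python
# Hypothetical index from (country, state, city) to the canonical city ID.
CITY_IDS = {("germany", "berlin", "berlin"): 3874}


def handle_get(path):
    """Return (status, headers) for a GET on a search path or a canonical URL."""
    parts = path.strip("/").split("/")
    # Search path: /countries/{country}/states/{state}/cities/{city}
    if len(parts) == 6 and parts[0::2] == ["countries", "states", "cities"]:
        city_id = CITY_IDS.get(tuple(parts[1::2]))
        if city_id is None:
            return 404, {}
        return 303, {"Location": f"http://api.example.org/cities/{city_id}"}
    # Canonical URL: /cities/{id}
    if len(parts) == 2 and parts[0] == "cities" and parts[1].isdigit():
        return 200, {}
    return 404, {}
```

Clients and caches then end up holding one URL per city, however they found it.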

Don't obsess over URL naming but don't ignore it either.

Fact: There is no such thing as a RESTful URL. All URLs are created equal: they're just identifiers after all.

Fact: With proper use of hypermedia controls, URLs are irrelevant from a technical standpoint.

but

Which of the two logs will help you best with tracking down the problem if things go wrong?

[16/Oct/2013:13:55:36] "GET /customers" 200
[16/Oct/2013:13:56:01] "GET /customer/42" 200
[16/Oct/2013:13:56:47] "PUT /customer/42" 200
[16/Oct/2013:13:56:58] "POST /customer/42/orders" 200
[16/Oct/2013:14:11:13] "POST /orders/4711/items" 200

or

[16/Oct/2013:13:55:36] "GET /xz66fgt5" 200
[16/Oct/2013:13:56:01] "GET /ahgt67ft/42" 200
[16/Oct/2013:13:56:47] "PUT /ahgt67ft/42" 200
[16/Oct/2013:13:56:58] "POST /ahgt67ft/42/jh77hg87" 200
[16/Oct/2013:14:11:13] "POST /bn87xcws/4711/lw33mn45" 200

Machines don't care.

Humans do.

Don't use extensions as the only means of content negotiation.

Name extensions are a convenient way to select media types for representations.
- They're especially useful for testing in a browser (which doesn't provide an easy way to do content negotiation).
- But they introduce multiple URL aliases for the same resource that can lead to confusion and ambiguities when used to link to the resource in hypermedia representations.
- Prefer to use a canonical URL with proper content negotiation as the primary reference:
  /customer/42       (Canonical)
  /customer/42.xml   (Alias)
  /customer/42.json  (Alias)
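A sketch of how the extension aliases and header-based negotiation can coexist, with the extension acting as a shortcut that maps back to the canonical path. The function name and the JSON default are assumptions, and the Accept parsing deliberately ignores q-values, which a real implementation must honor.

```python
MEDIA_TYPES = {"json": "application/json", "xml": "application/xml"}


def select_media_type(path, accept="*/*"):
    """Return (canonical_path, media_type); media_type None means 406."""
    resource, dot, extension = path.rpartition(".")
    if dot and extension in MEDIA_TYPES:
        # Alias such as /customer/42.xml: the extension wins over the header.
        return resource, MEDIA_TYPES[extension]
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()  # simplification: q-values dropped
        if media_type in MEDIA_TYPES.values():
            return path, media_type
        if media_type in ("*/*", "application/*"):
            return path, MEDIA_TYPES["json"]     # assumed server default
    return path, None
```

Links inside representations would always use the canonical path returned here, never the alias, so the ambiguity described above does not leak into hypermedia.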

Recap

Don't think in terms of endpoints.
Don't just expose your domain model.
Don't overuse GET and POST.
Don't limit your choice of error codes to 200 and 500.
Don't ignore caching.
Don't see versioning as a requisite.
Don't mix up searching and identifying.
Don't obsess over URL naming but don't ignore it either.
Don't use extensions as the only means of content negotiation.

That's all I have. Feel free to ask me anything!

@owolf