Geocoding addresses of the victims of political terror in the USSR using Yandex.Maps API: a case study¹
Ivan Lyagushkin, Liudmila Lyagushkina
Contemporary studies in the social sciences and digital humanities often require geocoding people's addresses. This can be done with widely used APIs such as the Google Maps API. But what if one needs to geocode historical rather than current addresses? Are modern maps useful in this case? Our answer is yes, although the standard workflow needs some modifications.
For a study of the Great Terror in the USSR in 1937-1938, we needed to geocode the addresses of 70,000 people who lived in five regions of present-day Russia, which at the time was the Russian Soviet Federative Socialist Republic (RSFSR), one of the republics of the USSR. This information came from the Victims of political terror in the USSR database, created by the Memorial Society. The results of this geocoding will be used in a paper on the economics of the Great Terror by Liudmila Lyagushkina (HSE University, Moscow) and Andrei Markevich (New Economic School, Moscow); the code was prepared by Ivan Lyagushkin.
We used the victims' places of residence before arrest or, when these were missing, their places of birth or work. The addresses typically included the name of the region, the district, and the settlement (city, town, or village). We did not use street names, as they were rarely mentioned in the data. For this study, placing each victim in the correct district of the correct region was the biggest challenge, since most of the other data available in the study was at the district level.
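The choice of which address to geocode for each victim can be expressed as a simple fallback rule. The sketch below is illustrative only: the field names `residence`, `birthplace`, and `workplace` are hypothetical, not the Memorial database's schema, and trying the place of birth before the place of work is our assumption, since the text does not fix that order.

```python
from typing import Optional

# Hypothetical field names; trying birthplace before workplace is an assumption.
FALLBACK_FIELDS = ("residence", "birthplace", "workplace")


def pick_address(record: dict) -> tuple[Optional[str], Optional[str]]:
    """Return (field, address) for the first non-empty address field, else (None, None)."""
    for field in FALLBACK_FIELDS:
        value = (record.get(field) or "").strip()
        if value:
            return field, value
    return None, None  # no usable address for this victim
```

Keeping the name of the source field makes it easy to see later which coordinates came from a place of birth rather than a place of residence.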
In some cases, this kind of geocoding can be done with historical directories (gazetteers), which contain historical names and locations of settlements (see, for example, the Gazetteer of British Place Names). However, we found no such sources for the 1930s, and given the sheer size of Russia's territory, we suspect that even contemporary address lists and other reference books were not as detailed and accurate as we needed.
Another option would be to scan historical maps and locate the settlement names on them. However, the maps we found in the library were good for georeferencing district borders but not detailed enough to find all the settlements. For example, Picture 1 shows a section of the administrative map of the Altai Region (1939). It gives only 20-30 settlement names per district, whereas our database contained many more.
¹ This memo is part of a larger project on the economics of the Great Terror by Liudmila Lyagushkina (HSE University, Moscow) and Andrei Markevich (New Economic School, Moscow). We would like to thank Anna Samodelkina for her excellent research assistance.
Picture 1. A section of the administrative map of the Altai Region (1939). Source: The Russian State
Library (RSL).
Thus, we decided to use modern maps and chose the Yandex.Maps API Geocoder (https://yandex.com/dev/maps/geocoder/), as Yandex.Maps is typically more detailed and better adapted to Russia than Google Maps. The Geocoder works as follows: a user sends an address via the API, and it returns geographical coordinates (latitude and longitude) together with a structured description of the place (modern region, district, settlement name, street name, etc.).
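A single Geocoder request can be sketched with the Python standard library. This is a minimal sketch based on our reading of the public Geocoder HTTP API (version 1.x); the endpoint, the `apikey`, `geocode`, `format`, and `lang` parameters, and the JSON field names used below should be checked against the current documentation at the URL above. Note that the `pos` field lists longitude before latitude.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Geocoder HTTP API, version 1.x (verify against the docs).
GEOCODER_URL = "https://geocode-maps.yandex.ru/1.x/"


def build_request_url(address: str, apikey: str, lang: str = "ru_RU") -> str:
    """Build a GET URL asking for JSON output for one free-text address."""
    params = {"apikey": apikey, "geocode": address, "format": "json", "lang": lang}
    return GEOCODER_URL + "?" + urlencode(params)


def parse_candidates(data: dict) -> list[dict]:
    """Extract coordinates, kind and full address text of every candidate."""
    members = data["response"]["GeoObjectCollection"]["featureMember"]
    candidates = []
    for member in members:
        obj = member["GeoObject"]
        # "pos" is a string of the form "longitude latitude"
        lon, lat = map(float, obj["Point"]["pos"].split())
        meta = obj["metaDataProperty"]["GeocoderMetaData"]
        candidates.append(
            {"lat": lat, "lon": lon, "kind": meta.get("kind"), "text": meta.get("text")}
        )
    return candidates


def geocode(address: str, apikey: str) -> list[dict]:
    """Send one request and return all candidates (an empty list if none)."""
    with urlopen(build_request_url(address, apikey), timeout=30) as resp:
        return parse_candidates(json.load(resp))
```

Separating URL building and parsing from the network call keeps the parsing logic testable on saved responses without spending the daily request quota.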
However, once we started working with the Geocoder, we soon realized the main problem: it returned coordinates in almost every case, even when it had not identified the exact place. If an address contained the name of a Russian region together with a settlement that is not on Yandex Maps, the Geocoder returned the coordinates of the geographic center of the region. Sometimes it returned completely wrong results by matching historical names to unrelated modern objects. For example, Kamchatskaya governorate, which existed on the Kamchatka Peninsula in the 1920s, might be geocoded as Kamchatskaya Street in Moscow.
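One way to catch both failure modes automatically is to look at the `kind` attribute that the Geocoder reports for each candidate (for example, `locality` for a settlement, `province` for a region, `street` for a street). The sketch below is a hypothetical filter, not the rule used in this project: it accepts only settlement-level matches whose full address text mentions the expected region, and the accepted set of kinds is our assumption.

```python
# Assumption: only settlement-level matches are acceptable. A "province" result
# typically signals a fallback to the region centre, and a "street" result a
# spurious match such as a street named after a historical unit.
ACCEPTED_KINDS = {"locality"}


def is_suspicious(candidate: dict, expected_region: str) -> bool:
    """Flag candidates that are not settlements or do not mention the expected region."""
    if candidate.get("kind") not in ACCEPTED_KINDS:
        return True
    # Crude text check: the full address string should name the expected region.
    return expected_region.lower() not in (candidate.get("text") or "").lower()
```

With `lang=ru_RU` the address text comes back in Russian, so `expected_region` would need to be given in Russian as well.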
Therefore, we developed a more sophisticated algorithm. First, we took the georeferenced historical administrative maps of the regions that had been created for the main project on the economics of the Great Terror. Using ArcGIS, we exported the coordinates of the vertices of each district's border and built district polygons. Picture 2 shows the result of this georeferencing overlaid on Yandex Maps.
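With districts stored as polygons, deciding whether a geocoded point lies inside a district is a point-in-polygon test. The sketch below implements the standard ray-casting rule for a single ring of (longitude, latitude) vertices, such as those exported from ArcGIS, plus a bounding-box helper. It assumes simple polygons without holes or multiple parts; a geometry library such as Shapely handles those cases more robustly.

```python
Point = tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, vertices: list[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray from pt crosses."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def bounding_box(vertices: list[Point]) -> tuple[Point, Point]:
    """Lower-left and upper-right corners of the polygon."""
    lons = [v[0] for v in vertices]
    lats = [v[1] for v in vertices]
    return (min(lons), min(lats)), (max(lons), max(lats))
```

Treating longitude and latitude as planar coordinates is adequate at the scale of a single district, where edges are short.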
Picture 2. Historical borders of the districts of the Altai Region on Yandex Maps. Screenshot from
https://ivliag.github.io/
Next, we searched the victims' addresses for the names of historical districts using regular expressions. This let us attribute most addresses to specific districts. We then checked the results, corrected false attributions caused by typos and renamed places, and deleted addresses that, for various reasons, lay outside the studied regions. Finally, we removed the district names from the addresses to reduce the likelihood of wrong matches such as the Kamchatka case mentioned above.
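The district-matching step can be sketched as follows. The pattern form `<name> district` and the English-language examples are assumptions for illustration: the real addresses are in Russian, where district names are usually adjectives (e.g. `Бийский район`), and the patterns would also need variants for typos and renamed districts. Addresses that match no district, or more than one, are returned unattributed so that they can be checked by hand.

```python
import re
from typing import Optional


def compile_district_patterns(names: list[str]) -> dict[str, re.Pattern]:
    """One case-insensitive pattern per district: '<name> district' plus a trailing comma."""
    return {
        name: re.compile(rf"\b{re.escape(name)}\s+district\b,?\s*", re.IGNORECASE)
        for name in names
    }


def attribute_district(
    address: str, patterns: dict[str, re.Pattern]
) -> tuple[Optional[str], str]:
    """Return (district, address with the district phrase removed).

    Returns (None, address) unless exactly one district matches.
    """
    hits = [name for name, pat in patterns.items() if pat.search(address)]
    if len(hits) != 1:
        return None, address
    # Drop the district phrase so the Geocoder cannot match it to an unrelated object
    cleaned = patterns[hits[0]].sub("", address).strip(" ,")
    return hits[0], cleaned
```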
After that, the Yandex Geocoder searched for each settlement name inside the polygon of its assigned district. It found coordinates in about 85% of cases; an exact figure is hard to give because some of these results were false positives. If Yandex did not find the place inside the district, it searched for it inside the whole region.
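The district-then-region search can be sketched with a pluggable `search` function. Our assumption is that the restriction is expressed with the Geocoder's `bbox` and `rspn=1` parameters, which limit results to a rectangle rather than to the district polygon itself, so a point-in-polygon check is still needed afterwards. The `search` callable is hypothetical: it should send the request with the extra parameters and return parsed candidates.

```python
from typing import Callable, Optional

# ((lon_min, lat_min), (lon_max, lat_max))
BBox = tuple[tuple[float, float], tuple[float, float]]


def bbox_params(box: BBox) -> dict[str, str]:
    """Extra query parameters restricting the search to a rectangle (assumed API form)."""
    (lon1, lat1), (lon2, lat2) = box
    return {"bbox": f"{lon1},{lat1}~{lon2},{lat2}", "rspn": "1"}


def geocode_with_fallback(
    settlement: str,
    district_box: BBox,
    region_box: BBox,
    search: Callable[[str, dict[str, str]], list[dict]],
) -> tuple[Optional[str], list[dict]]:
    """Search the district first, then the whole region; report which level matched."""
    for level, box in (("district", district_box), ("region", region_box)):
        candidates = search(settlement, bbox_params(box))
        if candidates:
            return level, candidates
    return None, []  # nothing found at either level: leave for manual checking
```

Recording the level at which a match was found makes it straightforward to route region-level matches to the manual review described next.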
We then moved to the last and most time-consuming stage. We manually checked the geocoding results when the coordinates fell outside the attributed district, when Yandex proposed several candidates for one address, or when no coordinates were found. The last case was rare. We abandoned the idea of re-checking addresses with Google Maps, as it returned many options and it was hard to verify them and choose the correct one.
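The three criteria for manual review can be combined into a single triage function. This is a sketch: the order of the checks, counting only candidates that fall inside the district as "multiple choices", and the `inside_district` callable (for example, a point-in-polygon test against the attributed district) are our assumptions.

```python
from typing import Callable, Optional


def review_reason(
    candidates: list[dict], inside_district: Callable[[float, float], bool]
) -> Optional[str]:
    """Return why a result needs manual checking, or None if it can be accepted."""
    if not candidates:
        return "no coordinates found"
    # Keep only candidates that fall inside the attributed district
    inside = [c for c in candidates if inside_district(c["lon"], c["lat"])]
    if not inside:
        return "outside attributed district"
    if len(inside) > 1:
        return "multiple candidates in district"
    return None
```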
We also used additional resources to find addresses. First, we searched Yandex and Google, then Wikipedia and many other sources, including genealogists' forums. Of all the sources, the historical map website Retromap (http://retromap.ru/) was the most useful. As mentioned above, we did not find detailed maps of the studied regions for the 1930s. However, this website contains detailed georeferenced maps of the world from the 1970s-1980s. Some places, for example abandoned villages, appear on these maps as 'urochishche' (a landscape unit). The names of such small landscape units are not recognized as searchable text on the maps, so they can only be located by visually inspecting the map. The Retromap interface also allows users to copy the coordinates of any point on the map (see Picture 3), which helped a lot.
Picture 3. A screenshot of the Retromap interface, showing the surroundings of Barnaul (the Altai
Region) on the 1975 map. Source: http://retromap.ru/
As a result, we assigned districts to 90% of all victims (see Table 1) and geographical coordinates to 80%. The most common reason for missing coordinates was that the address lay outside the studied regions. This typically happened when we had only a person's place of birth and the person had been born in one region and arrested in another. The next most common reason was that the address itself went no further than the district level. Finally, in only 3% of cases could we not find the place on the maps, either with Yandex Maps or manually.
Outcome                                             Number of cases   Percentage of cases
District attributed                                          62,861                   90%
Geographical coordinates attributed                          55,692                   80%
Coordinates were not obtained because:
  Address was outside the studied region                      6,390                    9%
  Address contained only district-level information           5,197                    7%
  Settlement was not found                                    2,083                    3%
  There was no address in the database                          455                    1%
Table 1. Results of geocoding the addresses of about 70,000 victims of the Great Terror in the USSR.
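The percentages in Table 1 can be reproduced from the counts. The check below assumes that the coordinates row and the four rows explaining missing coordinates are mutually exclusive and together cover every victim, so their sum serves as the denominator; under that assumption the total is 69,817, in line with the roughly 70,000 people mentioned at the start.

```python
# Counts from Table 1.
counts = {
    "district attributed": 62861,
    "coordinates attributed": 55692,
    "outside studied region": 6390,
    "district-level address only": 5197,
    "settlement not found": 2083,
    "no address in database": 455,
}

# Assumption: these five outcomes are mutually exclusive and cover every victim.
exclusive = [
    "coordinates attributed",
    "outside studied region",
    "district-level address only",
    "settlement not found",
    "no address in database",
]
total = sum(counts[k] for k in exclusive)
shares = {k: round(100 * v / total) for k, v in counts.items()}

print(total)  # 69817
for k, pct in shares.items():
    print(f"{k}: {pct}%")
```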
Although this was not the main purpose of the research, geocoding also allowed us to create informative data visualizations. For example, Picture 4 shows the geographical distribution of arrested men (blue) and women (pink), and Picture 5 shows the arrest patterns of people of different nationalities and ethnicities.
Picture 4. Geographical distribution of the arrested men (blue) and women (pink) in the Altai Region,
1937-1938. A screenshot from Tableau.
Picture 5. Geographical distribution of the arrests of different ethnicities in the Altai Region, 1937-1938.
A screenshot from Tableau.
Our code is stored on GitHub and can be used to solve similar problems. We used the Geocoder free of charge because, at the time we did the geocoding, the daily limit was 25,000 addresses. This limit has since dropped to 1,000 addresses a day, so one would need to modify the script to geocode addresses in daily batches of 1,000, or use alternative APIs.
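Working within a 1,000-request daily quota mainly requires the script to be resumable. The sketch below is one hypothetical way to do this: it stores results in a JSON file, skips addresses already geocoded (including duplicates), and processes at most `limit` new addresses per run. The `geocode` argument stands for any function that geocodes one address.

```python
import json
from pathlib import Path
from typing import Callable

DAILY_LIMIT = 1000  # free daily quota quoted above; check the current terms


def run_daily_batch(
    addresses: list[str],
    geocode: Callable[[str], list[dict]],
    progress_file: Path,
    limit: int = DAILY_LIMIT,
) -> int:
    """Geocode at most `limit` new addresses and return how many were processed.

    Results are saved after every request, so the next run (e.g. the next day)
    resumes where this one stopped.
    """
    done = {}
    if progress_file.exists():
        done = json.loads(progress_file.read_text(encoding="utf-8"))
    # dict.fromkeys drops duplicate addresses while keeping their order
    pending = list(dict.fromkeys(a for a in addresses if a not in done))[:limit]
    for address in pending:
        done[address] = geocode(address)
        progress_file.write_text(json.dumps(done, ensure_ascii=False), encoding="utf-8")
    return len(pending)
```

Deduplicating before geocoding also saves quota, since many victims share the same settlement.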