
Geocoding addresses of the victims of political terror in the USSR using Yandex.Maps API: a case study

2022
This memo describes a case study of using modern maps APIs to geocode historical addresses of the victims of political terror of the 1930s in the USSR. Using regular expressions, we assigned a historical district to each address. Afterwards, we used the Yandex.Maps Geocoder to search for each settlement name inside the polygon of the assigned district. If nothing was found inside the district, the Geocoder searched the whole region. We manually checked the results of this geocoding. As a result, we assigned districts to 90% of all the victims and geographical coordinates to 80% of them.
Ivan Lyagushkin, Liudmila Lyagushkina

Contemporary studies in social sciences and digital humanities often require geocoding of people’s addresses. This can be done with commonly used APIs, such as the Google Maps API. However, what if one needs to geocode not current but historical addresses? Are modern maps useful in this case? Our answer is yes, but some modifications of the standard operations might be useful.

For the study of the Great Terror in the USSR in 1937-1938, we needed to geocode the addresses of 70,000 people who lived in five regions of the Russian Federation when it was a republic of the USSR. This information came from the Victims of political terror in the USSR database, created by the Memorial Society. The results of this geocoding will be used in a paper on the economics of the Great Terror by Liudmila Lyagushkina (HSE University, Moscow) and Andrei Markevich (New Economic School, Moscow), and the code was prepared by Ivan Lyagushkin.

We used the victims’ places of residence before arrest or, if there was no information on the place of residence, their places of birth or work. The addresses typically included the name of the region, the district, and the settlement, i.e., a city, town or village. We did not use street names, as they were not typically mentioned in the data. For this study, placing victims inside the correct district of the exact region was the biggest challenge, as most of the other data available in the study was at the district level.

In some cases, this type of geocoding can be done with the help of historical directories, which contain historical names and addresses of the settlements (see, for example, Gazetteer of British Place Names).
However, no such sources were found for the exact period of the 1930s, and given the sheer scale of Russia’s territory we suppose that even the address lists and other reference books of the time were not as detailed and accurate as we needed. Another option might be to scan historical maps and locate the relevant settlement names on them. However, the maps that we found in the library were good for georeferencing the borders of districts, but not detailed enough for finding all the settlements’ names. For example, Picture 1 shows one section of the administrative map of the Altai Region (1939). It provides only 20-30 settlement names per district, whereas our database contained many more.

This memo is a part of a larger project on the economics of the Great Terror by Liudmila Lyagushkina (HSE University, Moscow) and Andrei Markevich (New Economic School, Moscow). We would like to thank Anna Samodelkina for her excellent research assistance.
Picture 1. A section of the administrative map of the Altai Region (1939). Source: The Russian State Library (RSL).

Thus, we decided to use modern maps and chose the Yandex.Maps API Geocoder (https://yandex.com/dev/maps/geocoder/), as Yandex.Maps are typically more detailed and better adapted for Russia than Google Maps. This Geocoder works as follows: a user sends an address via the API, and the Geocoder returns geographical coordinates (latitude and longitude) and a structured description of the place (modern region, district, settlement name, street name etc.).

However, having started to work with the Geocoder, we soon realized what the main problem was: it returned a result, i.e., coordinates, in almost any case, even when it did not detect the exact place. If the address contained the name of a Russian region and a settlement that is not on Yandex Maps, the Geocoder returned the coordinates of the geographic center of the region. Sometimes, it returned completely false results relying on some historical names. For example, Kamchatskaya governorate, which existed in the 1920s on the Kamchatka Peninsula, might be geocoded as Kamchatskaya street in Moscow.

Therefore, we decided to create a more sophisticated algorithm. First, we took the georeferenced historical administrative maps of the regions that were created for the main project on the economics of the Great Terror. Using ArcGIS, we downloaded the coordinates of the vertices of each district and created polygons. The result of this georeferencing imposed on Yandex Maps can be seen in Picture 2.
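The Geocoder behaviour described above (an address goes in, coordinates and a structured description come back, and a region-level answer may hide a failed lookup) can be illustrated with a minimal Python sketch. It is not the authors' code. The endpoint, the parameter names (`apikey`, `geocode`, `format`, `results`) and the response layout follow the Yandex Geocoder HTTP API as we understand it and should be checked against the current documentation. The sample response is invented for illustration, and the function names and the set of "too coarse" kinds are assumptions of this sketch.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Yandex Geocoder HTTP API; verify before use.
GEOCODER_URL = "https://geocode-maps.yandex.ru/1.x/"

# Assumption: answers of these kinds mean that only the region (or a
# larger unit) was recognized, not the settlement itself.
COARSE_KINDS = {"country", "province", "area"}


def build_request_url(address, api_key, results=5):
    """Build a geocoding request URL for one address string."""
    params = {
        "apikey": api_key,
        "geocode": address,
        "format": "json",
        "results": results,
    }
    return GEOCODER_URL + "?" + urlencode(params)


def parse_candidates(payload):
    """Extract coordinates and a short description from a JSON response."""
    members = payload["response"]["GeoObjectCollection"]["featureMember"]
    candidates = []
    for member in members:
        geo = member["GeoObject"]
        # "pos" holds longitude first, then latitude, separated by a space.
        lon, lat = (float(value) for value in geo["Point"]["pos"].split())
        meta = geo["metaDataProperty"]["GeocoderMetaData"]
        candidates.append(
            {"lon": lon, "lat": lat, "kind": meta.get("kind"), "text": meta.get("text")}
        )
    return candidates


def is_too_coarse(candidate):
    """Flag answers that look like a region center rather than a settlement."""
    return candidate["kind"] in COARSE_KINDS


# Invented response fragment, used only to show the parsing step.
SAMPLE_RESPONSE = {
    "response": {"GeoObjectCollection": {"featureMember": [
        {"GeoObject": {
            "Point": {"pos": "83.76 53.35"},
            "metaDataProperty": {"GeocoderMetaData": {
                "kind": "locality", "text": "Russia, Altai Krai, Barnaul"}},
        }},
        {"GeoObject": {
            "Point": {"pos": "82.78 52.69"},
            "metaDataProperty": {"GeocoderMetaData": {
                "kind": "province", "text": "Russia, Altai Krai"}},
        }},
    ]}}
}

if __name__ == "__main__":
    print(build_request_url("Altai Region, Barnaul", "YOUR_API_KEY"))
    for candidate in parse_candidates(SAMPLE_RESPONSE):
        print(candidate, "too coarse:", is_too_coarse(candidate))
```

Checking the kind of each returned object is one simple way to notice that the Geocoder fell back to the region; it does not catch false matches such as the Kamchatskaya street example, which is why the district polygons are needed.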
Picture 2. Historical borders of the districts of the Altai Region on Yandex Maps. Screenshot from https://ivliag.github.io/

Subsequently, we used regular expressions to find the names of the historical districts in the victims’ addresses. We attributed most of the addresses to exact districts and then checked the results: we corrected false attributions caused by typos and renamed places, and deleted the addresses that, for various reasons, were not inside the studied regions. Finally, we removed the district names from the addresses to reduce the likelihood of wrong results like the Kamchatka case mentioned before.

Following that, the Yandex Geocoder searched for each settlement name inside the polygon of the assigned district. It found coordinates in about 85% of the cases; it is hard to give an exact figure, as some of the results were false positives. If Yandex did not find the place inside the district, it searched for it inside the whole region.

Afterwards, we moved to the last and most time-consuming part. We manually checked the geocoding results when the located coordinates were outside the attributed district, when Yandex proposed multiple choices for one address, or when no coordinates were found. The latter case was rare.
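The district attribution and the district-then-region search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The district names, the transliterated address format and all function names are hypothetical (the real addresses are in Russian). The sketch also filters Geocoder candidates on the client side with a ray-casting point-in-polygon test, which may differ from how the original code restricted the search to a polygon.

```python
import re

# Hypothetical district names; a real list would come from the historical maps.
DISTRICTS = ["Barnaulsky", "Biysky", "Rubtsovsky"]
CANONICAL = {name.lower(): name for name in DISTRICTS}
DISTRICT_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in DISTRICTS) + r")"
    r"\s+(?:district|raion|r-n)\b[\s,]*",
    re.IGNORECASE,
)


def split_district(address):
    """Return (district, address without the district part)."""
    match = DISTRICT_RE.search(address)
    if match is None:
        return None, address
    district = CANONICAL[match.group(1).lower()]
    rest = (address[:match.start()] + address[match.end():]).strip(" ,")
    return district, rest


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Does the edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate(candidates, district_polygon, region_polygon):
    """Try the district first, then the region; report what needs review.

    Returns (status, candidate). Status "district" can be accepted;
    "region", "ambiguous" and "not_found" go to manual checking.
    """
    for scope, polygon in (("district", district_polygon), ("region", region_polygon)):
        hits = [c for c in candidates if point_in_polygon(c["lon"], c["lat"], polygon)]
        if len(hits) == 1:
            return scope, hits[0]
        if len(hits) > 1:
            return "ambiguous", None
    return "not_found", None


if __name__ == "__main__":
    # Toy squares standing in for a district and its region.
    district = [(0, 0), (10, 0), (10, 10), (0, 10)]
    region = [(0, 0), (100, 0), (100, 100), (0, 100)]
    print(split_district("Altai Region, Biysky district, Srostki village"))
    print(locate([{"lon": 5, "lat": 5}], district, region))
    print(locate([{"lon": 50, "lat": 50}], district, region))
```

The three non-accepted statuses correspond to the cases that were checked manually: coordinates outside the attributed district, multiple choices for one address, and no coordinates at all.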
For this reason, we abandoned the idea of searching on Google Maps, as there were too many options and it was hard to check them and choose the correct one. We also used some additional resources to find addresses. First, we searched on Yandex and Google; then we checked Wikipedia and many other sources, even genealogists’ forums. Of all the sources, Retromap (http://retromap.ru/), a website with historical maps, was the most useful. As we mentioned before, we did not find detailed maps of the studied regions for the 1930s. However, this website contains detailed georeferenced maps of the world from the 1970s-1980s. Some places, for example abandoned villages, are marked on these maps as ‘urochishe’ (landscape unit). The names of such small landscape units are not optically recognized on the map, so one can locate them only by looking at the map. The Retromap interface also allows users to copy the coordinates of an exact point on the map (see Picture 3), and this option helped a lot.

Picture 3. A screenshot of the Retromap interface, showing the surroundings of Barnaul (the Altai Region) on the 1975 map. Source: http://retromap.ru/

As a result, we assigned districts to 90% of all the victims (see Table 1), and geographical coordinates to 80% of all victims. In most cases, the lack of coordinates was explained by the fact that the address was outside the studied regions. This happened most typically when we had only the place of birth of a person who was born in one region and arrested in another. The next most typical reason was that the address itself was given only at the district level. Finally, in only 3% of the cases did we fail to find the mentioned places on the maps, either using Yandex Maps or manually.
                                                      Number of cases   Percentage of cases
District attributed                                   62861             90%
Geographical coordinates attributed                   55692             80%
Coordinates were not gained because:
  Address was outside the studied region              6390              9%
  Address contained only information about district   5197              7%
  Settlement was not found                            2083              3%
  There was no address in the database                455               1%

Table 1. The results of the geocoding of the addresses of the 65,000 victims of the Great Terror in the USSR.

Geocoding was not the main purpose of the research, but it allowed us to create powerful data visualizations. For example, Picture 4 shows the geographical distribution of the arrested men (blue) and women (pink). Picture 5 shows the arrest patterns of people of different nationalities and ethnicities.

Picture 4. Geographical distribution of the arrested men (blue) and women (pink) in the Altai Region, 1937-1938. A screenshot from Tableau.

Picture 5. Geographical distribution of the arrests of different ethnicities in the Altai Region, 1937-1938. A screenshot from Tableau.

Our code is stored on GitHub and can be used to solve similar problems. We used the Geocoder free of charge, because at the time we did the geocoding the daily limit was 25,000 addresses. This limit has since dropped to 1,000 addresses a day. One can modify the script to geocode the addresses in batches of 1,000 per day, or use alternative APIs.
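Splitting the work into daily batches, as suggested above, could look like the following minimal Python sketch. The function names are hypothetical and are not taken from the authors' repository. Dropping repeated address strings before batching is our own suggestion for saving requests, not a step described in the memo.

```python
from itertools import islice
from math import ceil


def unique_addresses(addresses):
    """Drop repeated address strings while keeping the original order."""
    return list(dict.fromkeys(addresses))


def daily_batches(addresses, daily_limit=1000):
    """Yield consecutive lists of at most daily_limit addresses."""
    iterator = iter(addresses)
    while True:
        batch = list(islice(iterator, daily_limit))
        if not batch:
            return
        yield batch


def days_needed(total, daily_limit=1000):
    """Number of days required to send `total` requests under the limit."""
    return ceil(total / daily_limit)


if __name__ == "__main__":
    # 70,000 addresses at 1,000 a day versus the earlier 25,000 a day.
    print(days_needed(70000, 1000))   # 70
    print(days_needed(70000, 25000))  # 3
    sizes = [len(batch) for batch in daily_batches(range(2500))]
    print(sizes)                      # [1000, 1000, 500]
```

Each batch would be sent on a separate day, with the results saved between runs so that the script can resume where it stopped.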