Best Practices for Geography Baseline Upload on Collect

Collect allows you to upload your base data in bulk so that your data collectors can use it in every response they fill in. This is done through the Monitoring feature, which includes a "Baseline Upload".

Imagine you are surveying the people of a place called Malgudi, which is in the state Narayan, in the country Buk. For every person you survey, you will have to enter Malgudi, Narayan, and Buk over and over again. Think about how tedious typing Malgudi, Narayan, and Buk will get if you survey even 100 people. Now imagine that you have to survey 1000 people. Looking at the brighter side, you now have 10 people to help you out with the task.

Now this task has become repetitive and prone to mistakes. Wondering why? Because these 10 people who are helping you out will each write a different variation of Malgudi: malgoodi, mal gadi, mal gudi; Narayan: narayn, naraayand, Narayana; and Buk: buck, book, Bukh.

We are sure you have better things to do than correcting the spellings of Malgudi, Narayan, and Buk, or even typing in Malgudi, Narayan, and Buk countless times. This is where the baseline and the monitoring question come in!


“A baseline is a complete listing of all the entities at the center of your survey.”

Baseline and monitoring questions can help you avoid the pain of manually entering the geographic area associated with the object of the survey and the wide variation of errors and spellings that subsequently arise.

Similarly, based on the kind of projects you are running, they can be extended further. For example, if you are interested in tracking children through their growing years, the baseline would be a list of all the children under the scope. If you are interested in studying the different colleges around the world, the baseline would be a listing of all the colleges.

This entity list is then used as a monitoring question in your questionnaires, which simplifies your data collection. Once this is done, your auditors can just select the appropriate geography from this master list without having to guess the correct spelling. This is more efficient and ensures consistency, allowing data to be collected in the cleanest way possible.



The baseline is clearly central to data collection, and it will be a part of every survey form. This means that any error in the baseline will carry over into the collected data and distort the findings drawn from it. The correctness of the baseline is therefore essential for reliable and actionable insights.

An ideal baseline looks like:


Creating such a baseline can be a daunting prospect, since India alone has more than 700 districts, 6,500 sub-districts, and over 7 lakh (700,000) villages. Here are some pointers that you should keep in mind before embarking on your Collect journey.

  • Ask us for help: We have a listing of all the sub-districts that exist in India, based on Census 2011. From that list, you can subset the regions that you will be working in. If you are working in an area that is not yet part of the census, we can add it for you! The idea behind using census lists is that they are a comprehensive record of most of the sub-districts that exist in the country. They also help with building choropleths and viewing alternate data in conjunction with your primary data.
  • Add your own context: If you think the geography sub-divisions provided by us do not offer sufficient representation to your entity, feel free to add that extra information. The more context you give an entity, a context that is relevant and uniquely identifies it, the better your baseline will be.
  • Make sure you use codes: It's possible that two entities of the same name exist, like two villages with the same name in one sub-district. You can uniquely identify each village by using the gram panchayat it belongs to in conjunction with the name, but it is wiser to attach a code to an entity. In the unlikely scenario that even the gram panchayats match, the code will be there to tell the two villages apart.

  • Use one code for one entity: Since we use codes to uniquely identify an entity, make sure that once a code has been assigned to an entity, no other entity gets assigned the same code. Using codes is imperative, since all correct data triangulation will hinge on your entity codes. Also, use a different code series for each entity level. If you are using 01 for districts, use 001 for sub-districts and 0001 for villages, so that the series, and not just the number, tells the levels apart.
  • Assign unique names: It is also best to use different names for entities wherever possible. At the very least, the name of an entity should be unique within the entity it belongs to. Imagine you are working in a village that has two anganwadis by the name of Bahubal. Even if you give them separate codes, the person in the field will not be able to tell the two Bahubals apart and will end up mixing the data entry for the two anganwadis. So, using Bahubal-1 and Bahubal-2 is better.

  • Say no to blanks: While creating the format for baseline upload, remember that the baseline is a listing of all entities in your survey, so leaving blanks in it would defeat the purpose. Hence, mark all questions as mandatory. This will prevent any blank values from getting uploaded.
  • Codes should be text: If you are creating the listing in an Excel sheet and an entity repeats across several rows, make sure that when you copy the entity's code onto the next row, the code stays the same. Codes are usually all numeric, and Excel can add one to the value when you copy a code by dragging. (Pro-tip: Convert the column to text and then copy.)
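
The pointers on codes, names, and blanks can be checked mechanically before you upload. The following is a minimal Python sketch of such a check, not a part of Collect itself. The column names, the three-level layout, and the sample rows are all assumptions made for illustration; adapt them to your own sheet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column layout, top level first: (code column, name column).
LEVELS = [
    ("district_code", "district"),
    ("sub_district_code", "sub_district"),
    ("village_code", "village"),
]


def validate_baseline(rows):
    """Return a list of problems found in an iterable of baseline row dicts."""
    problems = []
    entities_by_code = defaultdict(set)  # (level, code) -> {(parent code, name)}
    codes_by_name = defaultdict(set)     # (level, parent code, name) -> {code}

    for row_no, row in enumerate(rows, start=2):  # row 1 is the header
        parent_code = ""
        for code_col, name_col in LEVELS:
            code = (row.get(code_col) or "").strip()
            name = (row.get(name_col) or "").strip()
            # Say no to blanks: every cell must be filled.
            for column, value in ((code_col, code), (name_col, name)):
                if not value:
                    problems.append(f"row {row_no}: blank {column}")
            if code and name:
                entities_by_code[(code_col, code)].add((parent_code, name))
                codes_by_name[(name_col, parent_code, name)].add(code)
            parent_code = code

    # One code for one entity: a code must not point at two different entities.
    for (code_col, code), entities in sorted(entities_by_code.items()):
        if len(entities) > 1:
            problems.append(f"{code_col} {code} is assigned to more than one entity")

    # Unique names: a name must not repeat inside the same parent entity.
    for (name_col, parent_code, name), codes in sorted(codes_by_name.items()):
        if len(codes) > 1:
            problems.append(f"{name_col} '{name}' repeats under parent {parent_code}")

    return problems


# Made-up sample data with three deliberate mistakes.
SAMPLE = """district_code,district,sub_district_code,sub_district,village_code,village
01,North,001,East Block,0001,Rampur
01,North,001,East Block,0002,Rampur
01,North,001,East Block,0003,
01,North,002,West Block,0001,Mempi
"""

# csv.DictReader yields every cell as a string, so "0001" keeps its leading zeros.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = validate_baseline(rows)
for problem in problems:
    print(problem)
```

On this sample the check reports the blank village name, the village code 0001 that is reused across two sub-districts, and the name Rampur that appears twice within sub-district 001. Reading the sheet as CSV text also keeps every code as a string, which is the same idea as converting the Excel column to text.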