To study the dynamics of congress participation and to reconstruct co-participation networks and the geographical origins of participants, we rely on data produced by the congresses themselves. These data consist of the list of participants’ names and affiliations on the one hand, and the list of papers and the composition of the sessions in which they were presented on the other. Most of these data are available online in the event programmes. Registration and abstract submission data can also be obtained by contacting the organisers of the events studied.
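As an illustration of how session composition can be turned into a co-participation network, here is a minimal sketch. It assumes a hypothetical input format (a mapping from session identifier to the names of its participants) and treats two people as linked each time they appear in the same session. The function name and the sample data are ours, not part of the project's tooling.

```python
from collections import Counter
from itertools import combinations


def co_participation_edges(sessions):
    """Count, for each pair of participants, the sessions they share.

    `sessions` maps a session identifier to a list of participant names
    (hypothetical format; real programme data would need cleaning first).
    """
    edges = Counter()
    for participants in sessions.values():
        # Sort and deduplicate so each pair has a single canonical form.
        for a, b in combinations(sorted(set(participants)), 2):
            edges[(a, b)] += 1
    return edges


# Invented example data, for illustration only.
sessions = {
    "S1": ["Alice", "Bob", "Chen"],
    "S2": ["Alice", "Bob"],
    "S3": ["Dara"],
}
edges = co_participation_edges(sessions)
```

The resulting weighted edge list can then be loaded into any network analysis tool. A session with a single participant, such as `S3` here, produces no edge.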
Another part of our research draws on complementary sources to reconstruct the links between participants, whether these links existed beforehand or were formed through participation in the events studied. We are particularly interested in links that can be traced in scientific publication data: co-authorship of publications, joint contributions to journal special issues, and citation links.
To do this, we retrieve bibliographic information from online publication databases through their APIs. Because access to these databases is open to varying degrees and their coverage varies greatly, we draw on several sources. Matching all these data requires disambiguating author names and deduplicating records that have been retrieved more than once from different sources.
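The matching step can be sketched as follows. This is a deliberately crude illustration, not the project's actual procedure: it assumes each retrieved record is a dictionary with hypothetical `doi`, `title`, `year` and `source` fields, builds a simple "surname plus first initial" key for author names, and merges records that share a DOI (or, failing that, a normalised title and year). Real disambiguation would normally also use affiliations, persistent identifiers or co-author information.

```python
import unicodedata


def normalise_name(name):
    """Return a crude matching key: accent-folded, lower-case 'surname initial'."""
    if "," in name:  # "Surname, Given"
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:  # "Given Surname"
        parts = name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    decomposed = unicodedata.normalize("NFKD", f"{surname} {given[:1]}")
    # Drop combining marks so that "Müller" and "Muller" share a key.
    folded = "".join(c for c in decomposed if not unicodedata.combining(c))
    return folded.lower().strip()


def record_key(record):
    """Prefer the DOI as identifier; fall back to normalised title and year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = " ".join((record.get("title") or "").lower().split())
    return f"title:{title}|{record.get('year')}"


def deduplicate(records):
    """Merge records with the same key, remembering every source seen."""
    merged = {}
    for rec in records:
        key = record_key(rec)
        if key in merged:
            merged[key]["sources"].add(rec["source"])
        else:
            merged[key] = {**rec, "sources": {rec["source"]}}
    return list(merged.values())
```

A key of this kind trades precision for recall: it merges spelling variants of the same author, but it will also conflate distinct authors who share a surname and an initial, which is why such keys are usually only a first pass.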
To process the geographical information contained in participation and publication data, we apply the method developed within the framework of the ANR Geoscience project and the Netscience project, presented on the Geoscimo website.
This method assumes a first level of structuring of the geographical information: the triplet “city, province, country”. To standardise this initial information, the triplets are homogenised using gazetteers. An automatic geocoding procedure then associates a pair of geographical coordinates (latitude and longitude) with each triplet, which makes it possible to locate each scientific publication on a map. To work with comparable entities on a global scale, nearby localities are grouped into agglomerations. Once a counting method has been chosen (here, fractional counting), we obtain tables of geographical information on production, collaborations and citations by urban area or by pair of urban areas. In addition to the statistical and network analyses, the final information is represented using complementary visualisation devices: maps, networks and graphs.
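To make the counting step concrete, here is a minimal sketch of one common variant of fractional counting. It assumes that each publication has already been reduced to the list of urban areas of its authors' addresses. Each publication then contributes a total weight of 1 to production, split equally among its distinct urban areas, and a total weight of 1 to collaboration, split equally among its pairs of urban areas. Other weighting schemes exist, and the exact formula used in the method described above may differ; the function name and input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def fractional_counts(publications):
    """Return (production, links) under equal-share fractional counting.

    `publications` is an iterable of lists of urban area names, one list
    per publication (assumed format, after geocoding and grouping).
    """
    production = defaultdict(float)
    links = defaultdict(float)
    for areas in publications:
        distinct = sorted(set(areas))
        if not distinct:
            continue  # publication without any located address
        # Each publication contributes 1 in total to production.
        for area in distinct:
            production[area] += 1.0 / len(distinct)
        # Each multi-area publication contributes 1 in total to links.
        pairs = list(combinations(distinct, 2))
        for pair in pairs:
            links[pair] += 1.0 / len(pairs)
    return dict(production), dict(links)


# Invented example: three publications.
production, links = fractional_counts(
    [["Paris", "Lyon"], ["Paris"], ["Paris", "Lyon", "Toulouse"]]
)
```

With this scheme the production totals sum to the number of located publications, so a paper signed from many urban areas does not weigh more than a paper signed from one.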