- Conference data
Conference participation data is the source on which our method is based. First, we collect, structure and normalise the lists of participants’ names across several editions of the events studied. We then use these same data to reconstruct co-participation and co-presentation links and to trace their evolution.
- Name standardisation
Participants’ names are normalised using automatic text processing, with functions from the R package {stringr}. Additional information such as participants’ email addresses helps us to identify the same person from one edition to the next, even when he or she has registered under a differently spelled name or when the name has become compound following a marriage.
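The matching logic described above can be sketched as follows. This is a minimal illustration in Python, not the {stringr}-based pipeline itself: the function names `normalise_name` and `assign_person_ids`, the record layout (name, email) and the rule that a shared email takes priority over the name are all assumptions made for the example.

```python
import re
import unicodedata


def normalise_name(name):
    """Lower-case, strip accents, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z]+", " ", no_accents.lower()).strip()


def assign_person_ids(records):
    """Assign one id per person; records are (name, email) pairs (hypothetical layout).

    A known email identifies the person even if the name has changed;
    otherwise the normalised name is used as a fallback key.
    """
    by_email, by_name, ids = {}, {}, []
    next_id = 0
    for name, email in records:
        key_name = normalise_name(name)
        key_email = email.strip().lower() if email else None
        pid = by_email.get(key_email) if key_email else None
        if pid is None:
            pid = by_name.get(key_name)
        if pid is None:
            pid = next_id
            next_id += 1
        if key_email:
            by_email.setdefault(key_email, pid)
        by_name.setdefault(key_name, pid)
        ids.append(pid)
    return ids


# Invented example records from two editions of a conference.
records = [
    ("Émilie Dupont", "e.dupont@example.org"),
    ("Emilie Dupont-Martin", "E.Dupont@example.org"),  # compound name, same email
    ("John  O'Neil", None),
    ("john o neil", ""),  # same person, different spelling, no email
]
print(assign_person_ids(records))
```

Falling back on the normalised name alone would merge homonyms, so in practice such matches would probably need manual checking or further attributes.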
- Networks
The networks can be reconstructed in several ways, because co-presence information is available at different levels:
– participation in the same edition of the conference
– participation in the same thematic session
– participation in the same panel of the thematic session
– co-presentation of the same paper
It is also possible to reconstruct links from shared attributes, e.g. the same theme, the same geographical origin, or the same laboratory.
Most of the reconstruction and analysis of the networks is done with functions from the {igraph} package. We also use functions from the {ggplot2}, {visNetwork} and {cartography} packages to visualise the networks.
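To make the choice of level concrete, the sketch below builds a weighted edge list from a participation table, where the level of co-presence is given by the columns that define a shared context. It is written in Python for illustration only (the analysis itself relies on {igraph}); the function name `co_presence_edges` and the column names `person`, `edition` and `session` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_presence_edges(rows, level):
    """Count, for each pair of persons, the contexts they share.

    rows  : list of dicts with a 'person' key plus context columns
    level : tuple of column names defining a shared context,
            e.g. ("edition",) or ("edition", "session")
    """
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in level)
        groups.setdefault(key, set()).add(row["person"])

    weights = Counter()
    for members in groups.values():
        # Every unordered pair present in the same context gains one unit of weight.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights


# Invented participation table.
rows = [
    {"person": "A", "edition": 2019, "session": "S1"},
    {"person": "B", "edition": 2019, "session": "S1"},
    {"person": "C", "edition": 2019, "session": "S2"},
    {"person": "A", "edition": 2021, "session": "S3"},
    {"person": "B", "edition": 2021, "session": "S3"},
]
by_edition = co_presence_edges(rows, ("edition",))
by_session = co_presence_edges(rows, ("edition", "session"))
```

The coarser the level, the denser the network: here the edition-level network links all three persons, while the session-level network keeps only the A–B tie. The same function would apply to attribute-based links by passing, for example, a laboratory column as the level.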
- Fetching data from publications
Sensitivity analysis
In the populations studied, some surnames are extremely common, and a very large number of publications can be retrieved for them. Other names are much less common. Some names pose an additional problem because they appear in numerous forms, as the first name or surname is sometimes compound or supplemented by an initial. To anticipate these problems and gauge how difficult a given population is to process, we first carry out a sensitivity analysis: we count how many publications and name forms are associated with the family names in our populations. This step is based on data from the Web of Science Index.
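The counting step can be illustrated with a short Python sketch. It assumes, purely for the example, that author entries are available as "Surname, Given" strings with one entry per publication; the function name `surname_sensitivity` and the sample data are invented and do not reflect the actual Web of Science export format.

```python
from collections import defaultdict


def surname_sensitivity(author_strings):
    """Return {surname: (number of publications, number of distinct given-name forms)}."""
    stats = defaultdict(lambda: {"publications": 0, "forms": set()})
    for author in author_strings:
        surname, _, given = author.partition(",")
        key = surname.strip().lower()
        stats[key]["publications"] += 1
        # Dots are dropped so that "J." and "J" count as the same form.
        stats[key]["forms"].add(given.strip().lower().replace(".", ""))
    return {s: (v["publications"], len(v["forms"])) for s, v in stats.items()}


# Invented author entries, one per publication.
sample = ["Martin, J.", "Martin, Jean", "Martin, J", "Zwolinski, A."]
result = surname_sensitivity(sample)
```

A surname with many publications and many forms signals a case where retrieval will be noisy and disambiguation costly; a rare surname with a single form is comparatively easy.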
Fetching publications
Once the sensitivity analysis has been performed, queries are formulated to optimise information retrieval. This phase of our work is ongoing and will be described in more detail later.
Disambiguation
Once the publication metadata has been retrieved, we will need to ensure that the publications are indeed those of the individuals in our population. A disambiguation algorithm will be applied. The literature on the subject is very rich and has grown markedly in recent years; we will review it to choose the method most appropriate for our research.
Network reconstruction
When these processing phases are complete, we will be able to reconstruct the networks of co-authorship and citations between participants and overlay them on the networks previously built up from the conference data. We will then have enriched and freely exploitable information on the dynamics of the scholarly networks of a population of active scientists in several disciplines, allowing original comparisons on the modes of socialisation and production of science in distinct communities.
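One simple way to picture the overlay is to compare the sets of ties found in each network. The Python sketch below is only an assumption about how such a comparison might look, since this phase is still to come; the function name `overlay` and the example pairs are hypothetical.

```python
def overlay(conference_edges, coauthor_edges):
    """Classify undirected ties by the network(s) in which they appear."""
    # frozenset makes ("A", "B") and ("B", "A") the same undirected tie.
    conf = {frozenset(edge) for edge in conference_edges}
    coau = {frozenset(edge) for edge in coauthor_edges}
    return {
        "both": conf & coau,
        "conference_only": conf - coau,
        "coauthorship_only": coau - conf,
    }


# Invented ties between participants.
layers = overlay(
    conference_edges=[("A", "B"), ("A", "C")],
    coauthor_edges=[("B", "A"), ("C", "D")],
)
```

Ties found in both layers would point to pairs whose conference co-presence coincides with joint publication, while the two remaining categories isolate relations visible in only one source.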