Lande D.V., Balagura I.V., Andrushchenko V.B.

Institute for Information Recording NAS of Ukraine, Kiev, Ukraine

Construction of Co-authorship Networks from Google Scholar Citations Data

// Open Semantic Technologies for Intelligent Systems (OSTIS-2016): Proceedings of the VI International Scientific and Technical Conference (Minsk, February 18-20, 2016). - Minsk: BSUIR, 2016. - P. 233-237.

This paper presents an algorithm for building a co-authorship network of scientists that is governed by their research interests. The network is formed by sounding (probing) the Google Scholar Citations service. It is shown that the descriptors defining the subject area influence both the size of the resulting network and the dynamics of its growth. It is also shown that clusters in co-authorship networks can serve as a basis for identifying scientific schools.


The objective of this work is to describe the theoretical principles and methods for the automatic formation of co-authorship networks, in particular in the fields of Complex Networks and Text Mining, by sounding a large information network. To this end, a dedicated algorithm for scanning the Google Scholar Citations service is used to obtain a representative set of co-authors, who serve as the base nodes of the future network. By sounding we mean fetching a small sample of the most important content from a large network that cannot be scanned in full because of processing constraints [Lande, 2015].

Evidently, a co-authorship network can become very large unless it is restricted to a defined topic, specified by the tags of the first author, who is the origin of the network formation.

Such growth considerably complicates the perception of the resulting network and leads to "topic drift". Authors with identical last names and initials can also be confused. To cope with these effects, thematic filtering is used: descriptors are assigned to the authors in the scientometric network and define their thematic direction. The degree to which these descriptors match ultimately determines the size of the co-authorship network being formed and the dynamics of its growth. In addition, the identification of clusters in such networks can serve as a basis for detecting scientific schools, expert groups, etc. [Lande et al., 2013].
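The thematic filter described above can be illustrated with a minimal sketch. It assumes that an author is accepted when at least one of their tags coincides with a chosen descriptor after lower-casing and whitespace normalization; this matching rule and the function names `normalize` and `matches_theme` are illustrative assumptions, not part of the original method.

```python
def normalize(tag):
    """Lower-case a tag and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(tag.lower().split())


def matches_theme(author_tags, descriptors):
    """Return True if at least one author tag coincides with a descriptor."""
    wanted = {normalize(d) for d in descriptors}
    return any(normalize(t) in wanted for t in author_tags)


# Descriptors for the two subject fields considered in the paper.
descriptors = ["Complex Networks", "Text Mining"]

print(matches_theme(["complex  networks", "Physics"], descriptors))  # True
print(matches_theme(["Organic Chemistry"], descriptors))             # False
```

A stricter or looser rule (for example, matching on individual words of a tag) would change which authors pass the filter and therefore the size of the resulting network.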

Main Part

It is appropriate to use models that have been validated on peer-to-peer (P2P) networks, which are based on the equality of participants. A peer-to-peer network consists of nodes, each of which interacts only with a certain subset of the other nodes, which corresponds to the structure of a co-authorship network.

The reference model network is sounded according to the following algorithm [Lande, 2015]:
1. A certain number of nodes of the reference (sounded) network are selected as base nodes of the new network that is built from the sounding results (in the general case, a single node is chosen).
2. For each such node of the reference network, its adjacent nodes (co-authors) are determined and added to the new network as a result of the sounding. Arcs (connections) to these nodes are drawn from the root node.
3. From the current node of the reference network, a transition is made to a randomly chosen neighboring node (co-author).
4. If a cycle occurs or the node does not satisfy a given condition, a transition is made to another randomly chosen node. If no such node exists, the network is considered built.
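One possible reading of these four steps is sketched below for a reference network held in memory. The function name `sound_network`, the adjacency-dictionary representation, and the `accept` predicate (standing in for the condition of step 4) are assumptions made for illustration; choosing uniformly among the valid neighbours is used here as an equivalent of redrawing a random neighbour until a valid one is found.

```python
import random


def sound_network(adjacency, root, accept=lambda node: True, seed=None):
    """Sound a reference network by a random walk started at `root`.

    adjacency: dict mapping each node to a list of its neighbours.
    accept:    predicate standing in for the condition of step 4 (assumption).
    Returns the set of discovered nodes and the set of arcs (source, target).
    """
    rng = random.Random(seed)
    nodes, arcs, visited = {root}, set(), set()
    current = root                      # step 1: a single base node
    while current is not None:
        visited.add(current)
        neighbours = adjacency.get(current, [])
        # Step 2: add the neighbours of the current node and the arcs to them.
        for n in neighbours:
            nodes.add(n)
            arcs.add((current, n))
        # Steps 3-4: move to a random neighbour that is neither already
        # visited (a cycle) nor rejected by the condition; stop if none exists.
        candidates = [n for n in neighbours if n not in visited and accept(n)]
        current = rng.choice(candidates) if candidates else None
    return nodes, arcs


# A toy reference network with hypothetical node labels.
toy = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
nodes, arcs = sound_network(toy, "A", seed=1)
print(sorted(nodes))
```

Because the walk stops as soon as the current node has no valid neighbour, the result is typically a small connected fragment of the reference network rather than a full traversal.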

It was on the basis of the qualitative modeling results that a conclusion was drawn about the possibility of forming small branches of connected co-authors according to the tags that users of the Google Scholar Citations service are interested in.

The described algorithm was adapted to the real co-authorship network of Google Scholar Citations as follows:
1. The first (root) author, from whom the sounding starts, is chosen.
2. A list of basic tags covering the most important concepts is defined by expert judgment.
3. The web-service page of the chosen author is opened.
4. All co-authors listed in the chosen author's profile are added to the network being formed.
5. Arcs (connections) are drawn from the root node (author) to these nodes (co-authors).
6. From the list of nodes of the network being formed, a node is chosen at random for the next page transition and further analysis. This node must match the chosen subject field (its tags are among the descriptors defined in step 2) and must not be among the nodes that have already been traced.
7. If such a node is found, the procedure returns to step 3.
8. If there is no such author, the network is considered built.
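Steps 1-8 can be sketched as follows. The sketch does not access the service: the `profiles` dictionary is an in-memory stand-in (an assumption) for the author pages, the author names are invented, and the function name `build_coauthor_network` is hypothetical. Tags are compared case-insensitively, which is also an assumption.

```python
import random


def build_coauthor_network(profiles, root, descriptors, seed=None):
    """Grow a co-authorship network from `root`, following steps 1-8.

    profiles: dict author -> {"tags": [...], "coauthors": [...]}; an in-memory
              stand-in (assumption) for the author pages of the service.
    Returns (nodes, arcs), where arcs are (author, coauthor) pairs.
    """
    rng = random.Random(seed)
    wanted = {d.lower() for d in descriptors}            # step 2
    nodes, arcs, traced = {root}, set(), set()
    current = root                                       # step 1
    while current is not None:
        traced.add(current)
        page = profiles.get(current, {})                 # step 3
        for coauthor in page.get("coauthors", []):
            nodes.add(coauthor)                          # step 4
            arcs.add((current, coauthor))                # step 5
        # Step 6: untraced nodes whose tags intersect the descriptors.
        candidates = sorted(
            a for a in nodes - traced
            if wanted & {t.lower() for t in profiles.get(a, {}).get("tags", [])}
        )
        # Steps 7-8: continue with a random candidate, or stop if none is left.
        current = rng.choice(candidates) if candidates else None
    return nodes, arcs


# Invented example data; none of these authors or tags come from the paper.
profiles = {
    "Root": {"tags": ["Complex Networks"], "coauthors": ["Ann", "Bob"]},
    "Ann": {"tags": ["Text Mining"], "coauthors": ["Root", "Cy"]},
    "Bob": {"tags": ["Botany"], "coauthors": ["Root", "Dee"]},
    "Cy": {"tags": ["complex networks"], "coauthors": ["Ann"]},
    "Dee": {"tags": ["Text Mining"], "coauthors": ["Bob"]},
}
nodes, arcs = build_coauthor_network(
    profiles, "Root", ["Complex Networks", "Text Mining"], seed=0
)
print(sorted(nodes))  # ['Ann', 'Bob', 'Cy', 'Root']
```

In this toy run "Bob" is added as a co-author of the root but is never opened because his tags are off-topic, so "Dee" never enters the network. This illustrates how the choice of descriptors limits the size of the network.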

According to the described algorithm, the sounding of the network from a given (root) node stops when a cycle occurs, that is, when the algorithm would pass to a node that has already been traced, and also when the remaining nodes deviate from the main themes (this is determined from the lexical composition of their tags). It is precisely this "cycle" that triggers the transition to another root author or the end of the sounding process.


The proposed approach is aimed at forming co-authorship networks within a knowledge domain whose boundaries are set by a few tags specified in advance by the scientists who participate in the Google Scholar Citations project. It should be noted that the main difference between the proposed model of automatic network formation and existing ones is that the latter rely on the direct participation of experts in selecting nodes and connections. Here the researcher uses only small pieces of knowledge contributed by the co-authors themselves: the tags they have marked as their main interests. The expert environment is thus widened considerably. The model is applied to the scientific fields of Complex Networks and Text Mining within Google Scholar Citations, but the proposed approach can be used for other knowledge domains and scientometric data sets. The modeling results obtained with the procedure proposed in chapter 3 can also be applied to create a model of the subject field.