search_and_format.Rmd
library(bibou)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.4
#> ✔ forcats 1.0.0 ✔ stringr 1.5.0
#> ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
#> ✔ purrr 1.0.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Suppose we want to retrieve bibliographic data about the Rhone river from Web of Science. We search for the keywords “Rhone” and “river” (or “basin” or “catchment”) in either the title or the topic field.
This search returned slightly more than 2000 references, which is too many for a single export: when exporting full records, including cited references, at most 500 references can be exported at once. We therefore export the bibliographic references in several batches, split by year of publication.
Here, we’re going to repeat this process for periods going to
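The batching by publication year can be sketched in a few lines of base R. This is only an illustration: the yearly counts below are made-up values, not the actual Web of Science results, and the names `counts`, `max_per_export` and `batch` are hypothetical. The idea is to group consecutive years greedily so that no batch exceeds the 500-reference limit.

```r
# Hypothetical number of records per publication year (illustrative values only)
counts <- c(`1997` = 120, `1998` = 200, `1999` = 250,
            `2000` = 300, `2001` = 180, `2002` = 90)
max_per_export <- 500  # export limit for full records

batch <- integer(length(counts))
current_batch <- 1
running_total <- 0
for (i in seq_along(counts)) {
  # Start a new batch when adding this year would exceed the limit
  if (running_total > 0 && running_total + counts[[i]] > max_per_export) {
    current_batch <- current_batch + 1
    running_total <- 0
  }
  running_total <- running_total + counts[[i]]
  batch[i] <- current_batch
}

# Years grouped by export batch, and the size of each batch
batches <- split(names(counts), batch)
batch_sizes <- tapply(counts, batch, sum)
```

Each element of `batches` then gives the range of years to enter in the Web of Science search before exporting that batch.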
We thus obtain five .bib export files, which are provided with the package (once the bibou package is installed, you can find their paths with the function system.file()).
We merge these files into a single file using bibou’s function bib_merge_files():
# Build the paths of the five export files shipped with bibou
bibtex_file_1 <- system.file("savedrecs(1).bib", package = "bibou")
bibtex_files <- purrr::map_chr(as.character(1:5),
                               ~ stringr::str_replace(bibtex_file_1, "1(?=\\)\\.bib$)", .x))
bibtex_files
#> [1] "/tmp/RtmpWs9RVN/temp_libpath84cc28429a80/bibou/savedrecs(1).bib"
#> [2] "/tmp/RtmpWs9RVN/temp_libpath84cc28429a80/bibou/savedrecs(2).bib"
#> [3] "/tmp/RtmpWs9RVN/temp_libpath84cc28429a80/bibou/savedrecs(3).bib"
#> [4] "/tmp/RtmpWs9RVN/temp_libpath84cc28429a80/bibou/savedrecs(4).bib"
#> [5] "/tmp/RtmpWs9RVN/temp_libpath84cc28429a80/bibou/savedrecs(5).bib"
bib_merge_files(bibtex_files,
to="data/savedrecs_merged.bib")
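Conceptually, merging BibTeX exports amounts to concatenating the files. The sketch below shows that idea in base R on two tiny made-up files written to a temporary directory. It is not bibou's implementation of bib_merge_files(), whose internals are not shown here, and all file names and entries are hypothetical.

```r
# Two tiny BibTeX files with made-up entries, written to a temporary directory
tmp <- tempdir()
f1 <- file.path(tmp, "batch1.bib")
f2 <- file.path(tmp, "batch2.bib")
writeLines(c("@article{WOS:A,", "  title = {First},", "}"), f1)
writeLines(c("@article{WOS:B,", "  title = {Second},", "}"), f2)

# Read every file and write all lines, in order, to a single merged file
merged <- file.path(tmp, "merged.bib")
all_lines <- unlist(lapply(c(f1, f2), readLines))
writeLines(all_lines, merged)

# Count the entries in the merged file (one "@" line per entry)
n_entries <- sum(grepl("^@", readLines(merged)))
```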
When exporting from Web of Science in several batches, it is easy to export the same references more than once by mistake. The function bib_remove_duplicates() cleans the merged file by removing duplicated references (detected based on their common identifier).
bib_remove_duplicates(from_file="data/savedrecs_merged.bib",
to_file="data/savedrecs_clean.bib")
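To make the idea of identifier-based deduplication concrete, here is a minimal base R sketch on made-up BibTeX lines. It assumes each entry starts on a line beginning with "@" and that the entry key is the identifier. It illustrates the principle only and is not necessarily how bib_remove_duplicates() works internally.

```r
# Made-up merged BibTeX content in which entry WOS:A appears twice
bib_lines <- c(
  "@article{WOS:A,", "  title = {First},", "}",
  "@article{WOS:B,", "  title = {Second},", "}",
  "@article{WOS:A,", "  title = {First},", "}"
)

# Assign an entry index to each line: a new entry starts at each "@" line
entry_id <- cumsum(grepl("^@", bib_lines))
entries <- split(bib_lines, entry_id)

# Extract the key (text between "{" and the first ",") from each entry's first line
keys <- vapply(entries,
               function(e) sub("^@[^{]+[{]([^,]+),.*$", "\\1", e[1]),
               character(1))

# Keep only the first occurrence of each key
unique_entries <- entries[!duplicated(keys)]
clean_lines <- unlist(unique_entries, use.names = FALSE)
```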
We can now import these references as a table using the function bib_tib_doc(). This import relies on the import method of the bibliometrix package and, as such, formats descriptors in the same way. The complete list of descriptors is given in the bibliometrix documentation.
The most important fields are:
Variable | Description |
---|---|
AU | Authors’ Names |
TI | Document Title |
SO | Journal Name (or Source) |
JI | ISO Source Abbreviation |
DT | Document Type |
DE | Authors’ Keywords |
ID | Keywords associated by WoS database |
AB | Abstract |
C1 | Authors’ Affiliations |
CR | Cited References |
TC | Times Cited |
PY | Publication Year |
SC | Subject Category |
UT | Unique Article Identifier |
library(bibliometrix)
tib_doc <- bib_tib_doc("data/savedrecs_clean.bib")
#>
#> Converting your isi collection into a bibliographic dataframe
#>
#> Done!
#>
#>
#> Generating affiliation field tag AU_UN from C1: Done!
dim(tib_doc)
#> [1] 2097 60
colnames(tib_doc)
#> [1] "id_doc" "AU"
#> [3] "DE" "ID"
#> [5] "C1" "CR"
#> [7] "AB" "PA"
#> [9] "affiliations" "AR"
#> [11] "EM" "book.author"
#> [13] "book.group.author" "BO"
#> [15] "da" "DI"
#> [17] "GA" "eissn"
#> [19] "esi.highly.cited.paper" "esi.hot.paper"
#> [21] "earlyaccessdate" "BE"
#> [23] "FU" "FX"
#> [25] "BN" "SN"
#> [27] "JI" "SO"
#> [29] "LA" "meeting"
#> [31] "month" "note"
#> [33] "NR" "PN"
#> [35] "oa" "orcid.numbers"
#> [37] "organization" "PP"
#> [39] "PU" "SC"
#> [41] "researcherid.numbers" "SE"
#> [43] "TC" "TI"
#> [45] "DT" "UT"
#> [47] "usage.count.last.180.days" "U2"
#> [49] "VL" "web.of.science.categories."
#> [51] "web.of.science.index" "PY"
#> [53] "RP" "DB"
#> [55] "J9" "AU_UN"
#> [57] "AU1_UN" "AU_UN_NR"
#> [59] "SR_FULL" "SR"
AU | TI | SO | DT | DE | TC | PY |
---|---|---|---|---|---|---|
GOLLMANN G;BOUVET Y;KARAKOUSIS Y;TRIANTAPHYLLIDIS C | GENETIC VARIABILITY IN CHONDROSTOMA FROM AUSTRIAN, FRENCH AND GREEK RIVERS (TELEOSTEI, CYPRINIDAE) | JOURNAL OF ZOOLOGICAL SYSTEMATICS AND EVOLUTIONARY RESEARCH | ARTICLE | ALLOZYMES; GENETIC VARIATION; RANGE EXPANSION; RIVER CAPTURE; CYPRINIDAE; CHONDROSTOMA | 17 | 1997 |
FRUGET JF;CENTOFANTI M;OLIVIER JM | THE FISH FAUNA OF THE DOUBS RIVER PRIOR TO COMPLETION OF THE RHINE-RHONE CONNECTION | ENVIRONMENTAL MANAGEMENT | ARTICLE | FISH COMMUNITIES; REGULATION; RESTORATION; FLOODPLAIN; LARGE SHIP CANAL; DOUBS RIVER | 2 | 1998 |
BRAVARD JP | TECTONICS AND FLUVIODYNAMICS AT THE SAONE-RHONE CONFLUENCE FROM THE WURM TO THE HOLOCENE (FRANCE). | GEOGRAPHIE PHYSIQUE ET QUATERNAIRE | ARTICLE; PROCEEDINGS PAPER | NA | 11 | 1997 |
HUGHES FMR | FLOODPLAIN BIOGEOMORPHOLOGY | PROGRESS IN PHYSICAL GEOGRAPHY-EARTH AND ENVIRONMENT | REVIEW | FLOODS; FLOODPLAIN VEGETATION; RESTORATION; SOIL MOISTURE; REGENERATION | 212 | 1997 |
VINSON MR;HAWKINS CP | BIODIVERSITY OF STREAM INSECTS: VARIATION AT LOCAL, BASIN, AND REGIONAL SCALES | ANNUAL REVIEW OF ENTOMOLOGY | REVIEW | BIODIVERSITY; AQUATIC INSECTS; MAYFLIES; STONEFLIES; CADDISFLIES | 349 | 1998 |
FERNANDES MB;SICRE MA;BOIREAU A;TRONCZYNSKI J | POLYAROMATIC HYDROCARBON (PAH) DISTRIBUTIONS IN THE SEINE RIVER AND ITS ESTUARY | MARINE POLLUTION BULLETIN | ARTICLE | NA | 298 | 1997 |
LOIZEAU JL;DOMINIK J;LUZZI T;VERNET JP | SEDIMENT CORE CORRELATION AND MAPPING OF SEDIMENT ACCUMULATION RATES IN LAKE GENEVA (SWITZERLAND, FRANCE) USING VOLUME MAGNETIC SUSCEPTIBILITY | JOURNAL OF GREAT LAKES RESEARCH | ARTICLE | VOLUME MAGNETIC SUSCEPTIBILITY; CORE CORRELATION; CS-137; SEDIMENT; ACCUMULATION RATE; SEDIMENT SUPPLY; LAKE GENEVA | 23 | 1997 |
MARSALEIX P;ESTOURNEL C;KONDRACHOFF V;VEHIL R | A NUMERICAL STUDY OF THE FORMATION OF THE RHONE RIVER PLUME | JOURNAL OF MARINE SYSTEMS | ARTICLE | RIVER PLUME; NORTHWESTERN MEDITERRANEAN; THE RHONE; JEBAR EFFECT; TWOFOLD SIGMA COORDINATE | 66 | 1998 |
COLLINA-GIRARD J | SUBMARINE SKETCHED PROFILES ANALYSIS ALONG PROVENCE COAST, USING SCUBA-DIVING, EUSTATICS AND NEOTECTONICS RESULTS | COMPTES RENDUS DE L ACADEMIE DES SCIENCES SERIE II FASCICULE A-SCIENCES DE LA TERRE ET DES PLANETES | ARTICLE | NEOTECTONIC; EUSTATISM; PROVENCE; FRANCE; SUBMARINE PROFILES; SCUBADIVING; HOLOCENE; COVE OF COSQUER | 10 | 1997 |
GALASSI DMP;DE LAURENTIIS P | TWO NEW SPECIES OF NITOCRELLA FROM GROUNDWATERS OF ITALY (CRUSTACEA, COPEPODA, HARPACTICOIDA) | ITALIAN JOURNAL OF ZOOLOGY | ARTICLE | HARPACTICOIDA; NITOCRELLA; SPRINGWATERS; STYGOBIONT | 11 | 1997 |
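As the rows above show, multi-valued fields such as AU and DE are stored as single strings separated by semicolons. The sketch below works on a small tibble named `docs` (a hypothetical name), rebuilt by hand from three of the rows above with only the AU, PY and TC columns. It shows one possible way to get one row per author and to summarise documents and citations per year with dplyr and tidyr.

```r
library(dplyr)
library(tidyr)

# Three of the records shown above, reduced to authors, year and times cited
docs <- tibble::tibble(
  AU = c("FRUGET JF;CENTOFANTI M;OLIVIER JM", "BRAVARD JP", "VINSON MR;HAWKINS CP"),
  PY = c(1998, 1997, 1998),
  TC = c(2, 11, 349)
)

# One row per (document, author) pair: split AU on the ";" separator
authors <- docs %>%
  separate_rows(AU, sep = ";")

# Number of documents and total citations per publication year
per_year <- docs %>%
  group_by(PY) %>%
  summarise(n_docs = n(), total_citations = sum(TC))
```

The same calls could be applied to the full `tib_doc` table, since it contains the AU, PY and TC columns.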