Articles on Data analytics

Displaying 1 - 20 of 62 articles.

For over a century, baseball’s scouts have been the backbone of America’s pastime – do they have a future?
H. James Gilmore, Flagler College and Tracy Halcomb, Flagler College

Robo-advisers are here – the pros and cons of using AI in investing
Laurence Jones, Bangor University and Heather He, Bangor University

AI threatens to add to the growing wave of fraud but is also helping tackle it
Laurence Jones, Bangor University and Adrian Gepp, Bangor University

Twitter’s new data fees leave scientists scrambling for funding – or cutting research
Jon-Patrick Allem, University of Southern California

Insurance firms can skim your online data to price your insurance — and there’s little in the law to stop this
Zofia Bednarz, University of Sydney; Kayleen Manwaring, UNSW Sydney, and Kimberlee Weatherall, University of Sydney

Two years into the pandemic, why is Australia still short of medicines?
Maryam Ziaee, Victoria University

How we communicate, what we value – even who we are: 8 surprising things data science has revealed about us over the past decade
Paul X. McCarthy, UNSW Sydney and Colin Griffith, CSIRO

3 ways for businesses to fuel innovation and drive performance
Grant Alexander Wilson, University of Regina

Sports card explosion holds promise for keeping kids engaged in math
John Holden, Oklahoma State University

Get ready for the invasion of smart building technologies following COVID-19
Patrick Lecomte, Université du Québec à Montréal (UQAM)

David Chase might hate that ‘The Many Saints of Newark’ is premiering on HBO Max – but it’s the wave of the future
Anthony Palomba, University of Virginia

For these students, using data in sports is about more than winning games
Felesia Stukes, Johnson C. Smith University

New data privacy rules are coming in NZ — businesses and other organisations will have to lift their games
Anca C. Yallop, Auckland University of Technology

The value of the Mountain Equipment Co-op sale lies in its customer data
Michael Parent, Simon Fraser University

Disasters expose gaps in emergency services’ social media use
Tan Yigitcanlar, Queensland University of Technology; Ashantha Goonetilleke, Queensland University of Technology, and Nayomi Kankanamge, Queensland University of Technology

How much coronavirus testing is enough? States could learn from retailers as they ramp up
Siqian Shen, University of Michigan

Tracking your location and targeted texts: how sharing your data could help in New Zealand’s level 4 lockdown
Jon MacKay, University of Auckland, Waipapa Taumata Rau

How sensors and big data can help cut food wastage
Frederic Isingizwe, Stellenbosch University and Umezuruike Linus Opara, Stellenbosch University

Data lakes: where big businesses dump their excess data, and hackers have a field day
Mohiuddin Ahmed, Edith Cowan University

How big data can help residents find transport, jobs and homes that work for them
Sae Chi, The University of Western Australia and Linda Robson, The University of Western Australia

Related Topics

  • Artificial intelligence (AI)
  • Data analysis
  • Data collection
  • Data privacy
  • Data science
  • Social media

Top contributors

  • Professor in Business Information Systems, University of Sydney
  • Lecturer in Finance, Bangor University
  • Adjunct Professor and Industry Fellow, UNSW Sydney
  • Professor of Finance, UNSW Sydney
  • Productivity Growth Program Director, Grattan Institute
  • Senior Lecturer in Applied Ethics & CyberSecurity, Griffith University
  • Professor of Law, University of Sydney
  • Professor of Computing Science, Director of the Digital Institute, Newcastle University
  • Senior Research Fellow, Allens Hub for Technology, Law & Innovation, and Senior Lecturer, School of Private & Commercial Law, UNSW Sydney
  • Strategy & Business Development, CSIRO
  • Professor of Urban and Cultural Geography, Western Sydney University
  • Professor of Organisational Behaviour, Bayes Business School, City, University of London
  • Leader, Machine Learning Research Group, Data61
  • Deputy Vice-Chancellor (Research), University of Tasmania
  • Research Fellow in Science Communication, UNSW Sydney


Data and analytics: Why does it matter and where is the impact?

McKinsey is currently conducting global research to benchmark data analytics maturity levels within and across industries. We encourage you to take our 20-minute survey on the topic at http://esurveydesigns.com/wix/p30952257.aspx (individual results are kept confidential), and to register to receive results showing your organization’s maturity benchmarked against peers and best practices.

The promise of using analytics to enhance decision-making, automate processes and create new business ventures is well established across industries. In fact, many leading organizations are already realizing significant impact by leveraging data and analytics to create business value. Our research indicates, however, that maturity often varies by function or sector (or both), based on a number of contributing factors; for example:

  • Marketing and Sales: Maturity in marketing and sales analytics tends to be more advanced, at least in the B2C context. Customer segmentation and personalization, social signal mining, and experimentation across channels have become mainstream across a number of industries, including retail, banking/insurance, and utilities. Intensity and sophistication vary widely, however, and analytics can still offer a significant competitive advantage when multiple domains such as pricing, loyalty and segmentation are cleverly combined and integrated.
  • Operations: Maturity of advanced analytics in operations tends to be lower. This is usually because opportunities are harder to spot and cross-business domain knowledge is required to create a step change. Also, use cases in operations are often connected with leveraging sensor and equipment data, which can be difficult to expose effectively for analysis. Data and analytics use in operations has traditionally included identification of new oil and gas drilling sites, but has now come to include mining sensor data for predictive maintenance, integrated and demand-driven workforce management, and real-time scheduling optimization.
  • Data-driven ventures: Only a few firms have started to explore the power of big data and advanced analytics to step outside their current business, either by leveraging internal data or developing analytics insights to offer as a service to customers. Examples include credit card companies providing data-driven customer targeting, or telecom companies selling location data for traffic monitoring and fraud detection. We believe that similar opportunities can be identified in the operations space and provide a competitive difference to those who do it well.

While some leading organizations are realizing great success with the emergence of these new capabilities, most companies are still in an exploration and piloting phase and have not scaled them up. McKinsey’s digital survey in 2014 revealed that while respondents felt that data and analytics would be one of the top categories of digital spending in three years’ time, they were also far more likely to believe that they were currently underinvesting in the space. Additionally, nine out of ten executives claimed that their companies would have a pressing need for digital talent in the next year, and nearly 60 percent of CIOs and CTOs polled thought that the need for data and analytics expertise would be more acute than other talent gaps.

Given the value at stake, how do companies ensure an effective data strategy and realize the impact that analytics promises?

Our work helping clients to build robust programs in data analytics suggests that winners have a clear strategy and follow best practices across five key areas:

  • Strategy and value: Understanding the business case for each use case and how it aligns with the company’s overall strategy for value creation is critical to ensuring that whatever is built delivers the expected business impact. Additionally, organizations must ensure that data and analytics is high on the senior management agenda and be prepared to invest in talent, data, and technology at scale.
  • Talent and organization: While the decision to centralize or federate data and analytics capabilities depends largely on the anticipated use cases, the organizational positioning of any central group and the presence of analytics talent both centrally and in domain-specific roles are critical. Commodity services such as data cleansing or data infrastructure management may be outsourced to free up capacity for more proprietary activities, even as companies use capability-building programs to grow talent organically.
  • Governance, access and quality: Analytics leaders ensure that data from disparate systems (finance, customer, supplier, transaction) are linked and available across the organization, while also ensuring that proper accountability and policy management practices are in place and tied to performance metrics. Distribution of reports is often quick and automated, and prominent use is made of external, open and unstructured data.
  • Technology and tools: The broad availability of appropriate advanced tools for data scientists, power business users, and regular business users is critical to staying ahead of competition. New technologies, such as cloud, high-performance workbenches, and distributed data environments (data lakes) are a key component of successful data and analytics platforms.
  • Integration and adoption: A good indication of organizational maturity is how deeply data and analytics have penetrated the various business units, and the speed with which new use cases can be implemented. Leaders in the space are careful to measure effectiveness and to tie incentives and performance metrics to the impact generated through analytics.

While fairly intuitive, all of these factors are difficult to implement effectively, and no single element represents a silver bullet to achieve competitive advantage. Our client work has consistently shown us that the combination of these factors leads to superior maturity, and, in turn, superior decision-making and stronger impact from data and analytics programs.

We are currently building benchmarks on how companies are performing in data and analytics relative to these five key areas. You can contribute by following the link to complete a 20-minute survey (http://esurveydesigns.com/wix/p30952257.aspx); a copy of the results specific to your organization will be made available to participants who register.

Josh Gottlieb is a practice manager in McKinsey's Atlanta office and Matthias Roggendorf is a senior expert in the Berlin office.


Predictions 2022: Data can help address the world's biggest challenges - 5 experts explain how


We need a trusted, global data ecosystem. Image: DCStudio/Freepik.com

Rebecca King


  • Data can help us tackle our largest societal challenges, including climate change, inequality, global health and economic resilience.

  • But how do we ensure that our global data systems are structured to capture the true value of data, not just the financial?
  • Business leaders share their perspectives on the real power of data and how it can be unlocked to address the biggest challenges of 2022.

With every advancement in the digital world, we unlock a limitless resource: data. It is both a by-product and a driver of global development that has transformed how we make decisions. Not only do we have increased granularity and accuracy to inform evidence-based decision making, but through AI and machine learning, we enable technology to make decisions on our behalf.

The value of this data is well established in the private sector. Successful businesses have captured this value through increasingly efficient and targeted advertisements and product design – with the global marketing data market worth an estimated $52 billion in 2021. This is significant financially but fails to capture the true productive power of data.


We can better understand how changing temperatures are impacting our environment and predict global weather disasters. We can measure inequalities to inform the policies that can best “close the gap”. We can limit the spread of global disease. And we can hold businesses and governments accountable to the environment and their citizens’ human rights.

We need to use data to empower the masses, not the few. But how do we ensure that our global data systems are structured to capture the true value of data, not just the financial?

Ahead of this year’s Davos Agenda virtual meeting, we invited leaders to share their perspectives on the real power of data and how it can be unlocked to help address the biggest challenges of 2022.

‘Leverage predictive analytics’

Vijay Guntur, Corporate Vice President and Head, Engineering and R&D Services, HCL Technologies

It is anticipated that 2022 will see a proliferation of COVID-19 mutations and that the power of data will be key to minimizing their impact on the world. Big data and IoT technologies are evolving at an unprecedented pace, enabling us to collect, prepare, analyze, anonymize, and share pandemic-related data at volumes, and at a velocity, that would have been unimaginable a few years ago. Access to a trusted, global data ecosystem enables healthcare professionals, governments, and big business to leverage predictive analytics, model different scenarios, and refine and redeploy those models as more data becomes available.

We expect that the disruption we have seen in the global supply chain will continue throughout 2022 and that, once again, the power of data will be key to relieving much of the stress this has caused. Low-latency data transmission via 5G networks, streaming IoT data, and real-time insights will give demand planners, forecasters, and logistics managers better visibility into the various parts of their supply chain and enable them to react instantly when problems occur.

Data can help us prepare for further COVID-19 variants.

‘Developing a global talent pool’

Igor Tulchinsky, Founder, Chairman and CEO, WorldQuant

If nothing else, 2021 reinforced the inevitability of uncertainty. Thanks to the growth in data and the increasing power of AI and machine learning, we are now in the age of prediction. We have already seen the promise of prediction in sectors like healthcare, where Weill Cornell Medicine enhanced its machine learning capabilities to predict COVID-19 infections within two hours – much faster than is possible with RT-PCR tests.

In the year ahead, I expect the role of predictive analytics to continue growing across the public and private sectors, embedding itself in many aspects of work and life. But grappling with the growing surge of information requires an increased focus on developing a global talent pool with the right technical skills, unified by shared goals, to interpret it and realize prediction’s full potential.

The needs of the future present a massive opportunity, and maintaining a global mindset will enable new sources of talent to contribute significantly. Organizations are already embracing new ways of working, talent sourcing and development, which will be critical to success in the age of prediction. There is tremendous potential for business and society to harness this opportunity and have an exponentially positive impact, globally.

‘Enable data to flow across borders’

Dr. Norihiro Suzuki, Vice President and Executive Officer, Chief Technology Officer, General Manager of the Research & Development Group and General Manager of the Corporate Venturing Office, Hitachi, Ltd.

The answer to many of our unsolved problems lies hidden in the almost unfathomable trove of data in existence. This wealth of knowledge can accelerate solutions to challenges ranging from climate change to urbanization and education. For example, through work with the Centre for the Fourth Industrial Revolution (C4IR) Japan and the G20 Smart Cities Alliance, we are harnessing data to create safer, more viable, and sustainable cities.

However, a lack of coordination in data governance and regulation is restricting international data flows – each country has only a fraction of the information it needs to effectively tackle global challenges. We need to enable data to flow across borders by building trust between businesses and consumers, aligning regulations across jurisdictions, and forming partnerships between governments and large organizations to support small and medium enterprises.

There is a view that technology is the accelerator and governance the brake on innovation, but in truth they are two wheels on either side of the same innovation vehicle. Taking “trust” and “governance” into consideration from the design phase will accelerate the implementation of technology and innovation in society.

‘Inclusive and responsible solutions’

Crystal Rugege, Managing Director, The Centre for the Fourth Industrial Revolution (C4IR) Rwanda

The last two years of the COVID-19 pandemic have amplified the critical role of data and technology in solving the profoundly complex and highly dynamic challenges of our time. As we look forward, we must prioritize building a comprehensive global data ecosystem that balances privacy rights, socio-economic development, and technological advancement. This calls for agile and interoperable data governance frameworks that provide a spectrum of instruments from policies to regulations that can adapt over time as new thinking evolves.


Furthermore, we need open, high-quality data sets to create inclusive and responsible solutions that leverage machine learning, and other emerging technologies, to enhance our ability to deliver at scale. Finally, a global multi-stakeholder approach will be imperative in building the hard and soft infrastructure required to facilitate cross-border data flows and the circulation of knowledge to build more resilient economies and more equitable societies.

‘Change the fate of the ocean’

Kimberly Mathisen, CEO, The Centre for the Fourth Industrial Revolution (C4IR) Ocean

The old slogan “if you can't measure it, you can't manage it” has never been more relevant. Industry 4.0 technology will allow us to manage and measure in ways we are only just beginning to grasp.

By sharing ocean data, we can change the fate of the ocean, unleashing the power of data, technology and collaboration. We see strong indications that the world is getting ready to share more ocean data. That is the starting point for data to become powerful, accelerating good solutions for more sustainable blue foods, more renewable energy sources, and greener transportation – a few of the areas where the power of data will benefit the ocean.

Achieving ocean sustainability requires transformative solutions that are based on data and science. One concrete example is our Ship Emissions Tracker, developed with NOA Ignite and Microsoft. By combining open-source data with the renowned ICCT emission algorithm, the Tracker makes it possible to estimate the greenhouse gas footprint of each of the 250,000 vessels in the global merchant fleet.

This provides compelling insights for progressive leaders in the shipping industry, who face targets to reduce emissions by at least 50% by 2050, and for end customers who want to purchase greener transport.

The power of ocean data to improve ocean health and wealth is immense. But for data to become powerful, sharing ocean data is the first step.


Global analysis of large-scale chemical and biological experiments

Research in the life sciences is increasingly dominated by high-throughput data collection methods that benefit from a global approach to data analysis. Recent innovations that facilitate such comprehensive analyses are highlighted. Several developments enable the study of the relationships between newly derived experimental information, such as biological activity in chemical screens or gene expression studies, and prior information, such as physical descriptors for small molecules or functional annotation for genes. The way in which global analyses can be applied to both chemical screens and transcription profiling experiments using a set of common machine learning tools is discussed.

Introduction

Research in the life sciences has become dominated by high-throughput data collection methods. It is now common to screen many thousands or millions of small molecules in miniaturized biological tests, such as protein-targeted assays or cell-based assays [1•]. In addition, it is common to perform microarray-based transcription profiling, which involves the simultaneous hybridization of thousands of DNA sequences to spatially arrayed targets [2]. An emerging challenge is the analysis and integration of the large datasets generated by these disparate high-throughput techniques.

Until recently, only a few genes or compounds postulated in advance to be modulators of a phenotype or to have activity of interest were selected for study. High-throughput methods now permit use of a hypothesis-generating strategy in which large libraries of genes or chemicals are tested for biological effects of interest. One relies on the large size and diversity of the initial collection to yield active genes or compounds rather than prior knowledge of the screening candidates or the biological processes being studied. This strategy uncovers a large and varied set of active compounds or genes that can then be studied with a targeted, hypothesis-driven approach.

Ideally, the dataset from each new high-throughput experiment is interpreted in the context of all previous results. It then becomes part of the context in which all future screens are analyzed. Building on previous results is not new, but doing so takes on a new level of importance and complexity when datasets are vast and involve extremely inter-related information, and the relevant prior experimental data cannot be stored and organized in the mind of one scientist. We use the term ‘global analysis’ to refer to an emphasis on greater integration and analysis of data from all sources.

Challenges involved in the global analysis of experimental data are illustrated by the new fields of chemical genetics and chemical genomics [1•]. By analogy to classical genetics, chemical genetics uses small molecules in place of mutations as modifiers of protein function. Small molecules that modulate a process or phenotype of interest are identified through large-scale screening and serve as probes of the mechanisms underlying the biological process. Chemical genetics, like other large-scale screening approaches, integrates information from several large datasets. The activity profile of a library of compounds in a particular assay is measured and correlated with structural and chemical properties of the compounds, as well as previously documented biological activities. Chemical genomics involves the integration of chemical and genomic information and technologies. One example of the challenges of a chemical genomic approach is the integration and analysis of both transcription profiling and chemical screening data.

We will review work reported primarily within the last year that is applicable to global analyses of the properties of both small molecules and genes, focusing on: (i) selection and evaluation of physical descriptors for small molecules; (ii) new applications of machine learning algorithms; and (iii) novel approaches for analyzing microarray-based transcription profiling data.

Selecting chemical entities to screen

We restrict our discussion of chemical screens to low molecular weight organic molecules as these compounds are of particular interest in drug discovery efforts and in biological research. Small molecule screens are preferred for drug discovery because the resulting lead compounds can be more easily developed into orally available pharmaceuticals. Many of the tools for global analyses that we describe can also be applied to screens involving peptide, RNA, DNA or protein reagents.

The problem of selecting compounds to screen is a difficult one. The total number of possible organic compounds increases with molecular weight; thus, without a defined molecular weight cut-off, there is an infinite number of possible compounds. Published estimates of the number of theoretical small molecule drugs range as high as 10^66, which is close to the number of atoms in the universe [3].

One strategy for selecting compounds for screening is to purchase or make a representative set of molecules based on physical properties or functional groups. This approach amounts to an attempt to select an optimally diverse subset of the obtainable compounds for an initial screen. Jorgensen et al., for example, developed a method for evaluating the diversity of a compound collection using common subgraphs or substructural elements [4]. Xu et al., on the other hand, developed a drug-like index to aid the selection of compounds for screening. The index was trained on 4836 compounds from the Comprehensive Medicinal Chemistry database [5]. Reynolds et al. evaluated two stochastic sampling algorithms for their ability to select both diverse and representative subsets of a chemical library space [6].
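A minimal sketch of the subset-selection idea, using a generic greedy MaxMin heuristic on synthetic binary fingerprints rather than any of the published algorithms cited above; the fingerprints, library size and Tanimoto metric are illustrative assumptions:

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprint vectors."""
    both = np.sum(a & b)
    either = np.sum(a | b)
    return both / either if either else 1.0

def maxmin_select(fps, k, seed=0):
    """Greedy MaxMin diversity selection: repeatedly add the compound
    whose nearest already-selected neighbor is most dissimilar."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(fps)))]
    min_dist = np.array([1.0 - tanimoto(fp, fps[selected[0]]) for fp in fps])
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))   # farthest from current selection
        selected.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[nxt]))
    return selected

# Toy library: 1000 random 166-bit fingerprints (MACCS-like in size only)
fps = (np.random.default_rng(1).random((1000, 166)) < 0.2).astype(np.int8)
print("diverse picks:", maxmin_select(fps, k=20))
```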

Much effort has also focused on exploring and quantitating the notion of molecular complexity and determining the appropriate level of complexity for small molecules used in high-throughput screens. Barone and Chanon refined a quantitative index of complexity that uses the number and size of the rings in the smallest set of smallest rings and the connectivity of each atom [7•]. Alternatively, complexity can be defined as the number of interactive domains contained in a molecule. A molecule with low complexity has fewer sites of interaction with a target than a molecule with greater complexity. Hann et al. devised a simple model in which complex molecules are more selective than simple compounds and, therefore, yield fewer hits in primary screens [8•]. This model predicts an optimal level of complexity for compounds used in primary screens as the result of a trade-off between sufficient affinity for detection versus sufficient promiscuity to yield a reasonable number of hits. This model is consistent with recent analyses affirming that successful lead compounds are generally less complex than the resulting drugs [8•,9•].
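To give the flavor of such indices, the toy score below sums the ring sizes in the smallest set of smallest rings together with heavy-atom connectivities. It is a sketch only, not the published Barone-Chanon weighting, and it assumes the open-source RDKit toolkit (our choice, not one named in the text) for structure handling:

```python
from rdkit import Chem  # assumption: RDKit is installed

def toy_complexity(smiles):
    """Toy ring/connectivity complexity score (illustrative only)."""
    mol = Chem.MolFromSmiles(smiles)
    ring_term = sum(len(ring) for ring in mol.GetRingInfo().AtomRings())
    connectivity_term = sum(atom.GetDegree() for atom in mol.GetAtoms())
    return ring_term + connectivity_term

# ethanol, benzene, aspirin: the score should rise in roughly that order
for smi in ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]:
    print(smi, toy_complexity(smi))
```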

Given the virtually unlimited sources of small molecules, there has been interest in identifying characteristics of small molecules that are useful for drugs and in creating models that predict the probability that a given compound will be able to function as a drug (vide infra). It is difficult to evaluate the performance of these predictive models because of the great variability in crucial factors, such as the choice of the training sets of compounds and the choice of descriptors that define the actual criteria for discrimination. Furthermore, all empirically derived predictive models are essentially interpolative and extrapolative: models that are better at assigning close structural analogs to members of the training set (interpolation) may be worse at generalizing more abstract properties to novel structures (extrapolation), and vice versa. Thus, one must beware of inferring the overall performance of a predictive model from too limited a set of test compounds.

Nonetheless, several efforts at discriminating drugs from non-drugs have been reported recently. Ertl et al. used polar atom surface area to predict the extent to which small molecules exhibit a single property of drug transport (ie, bioavailability) [10]. Anzali et al. used chemical descriptors consisting of multilevel neighborhoods of atoms to discriminate between drugs and non-drugs with some success; their training and testing sets consisted of 5000 compounds from the World Drug Index and 5000 compounds from the Available Chemicals Directory (ACD) [11]. Muegge et al. developed a simple functional group filter to discriminate between drugs and non-drugs using both the Comprehensive Medicinal Chemistry and MACCS-II Drug Data Report (MDDR) databases for drugs and the ACD for non-drugs [12]. Frimurer et al. used a feed-forward neural network with two-dimensional (2D) descriptors based on atom types to classify compounds from the MDDR and ACD as drug-like or non-drug-like, respectively. They reported 88% correct assignment of a subset of each library that had been excluded from the training set. They also tested their model with a different library and claimed generalizability to compounds structurally dissimilar to those in the training set [13].
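A rough sketch of this style of two-class discrimination; since the MDDR, ACD and related databases are proprietary, random descriptor matrices with shifted means stand in for the drug and non-drug sets here, and scikit-learn's feed-forward network replaces the custom networks used in the studies:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in data: 1000 "drugs" and 1000 "non-drugs", each described by
# 50 atom-type/count descriptors whose means differ slightly by class.
X = np.vstack([rng.normal(0.5, 1.0, (1000, 50)),   # drug-like
               rng.normal(0.0, 1.0, (1000, 50))])  # non-drug-like
y = np.array([1] * 1000 + [0] * 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(20,), max_iter=500,
                                  random_state=0))
clf.fit(X_tr, y_tr)
print("held-out accuracy: %.2f" % clf.score(X_te, y_te))
```

The held-out set plays the role of the excluded library subset in the Frimurer study: accuracy on compounds the model has never seen is the only meaningful figure.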

Drug versus non-drug comparisons emphasize characteristics common to all drugs over those characteristics specific to a particular receptor. Drugs share a number of general characteristics, such as target-binding affinity and the ability to permeate into cells, and they must also have favorable absorption, distribution, metabolism and excretion (ADME) properties. Models that discriminate drugs from non-drugs tend to select for ADME properties rather than properties that correlate with cellular biological activity. If one is interested simply in cellular biological activity rather than the full complement of required drug characteristics, a correspondingly appropriate compound training set must be selected. For example, in chemical genetic approaches, compound libraries with enriched protein-binding affinity are valuable, whereas compounds with favorable ADME properties have little added value.

Finally, it has been noted that many natural products do not conform to the canonical rules for selecting drug-like compounds. Moreover, many natural products have been developed directly as drugs without the need for significant (or any) analog synthesis. This observation has inspired a new strategy of synthesizing natural-product-like compounds using combinatorial, diversity-oriented syntheses [14•,15•].

Descriptors

For comparisons that involve molecular properties, the structural, physicochemical, and/or biological properties of the molecules need to be represented in a consistent form to permit direct comparison. A standardized representation of a molecular feature is referred to as a ‘descriptor’. The choice of descriptors plays a crucial role in the analysis of chemical screening data. A major challenge in descriptor analyses is the identification of the smallest, most easily and reproducibly calculated set of descriptors that retains all the information required to make the distinctions and comparisons of interest. Here, we discuss some general considerations concerning descriptor choice, and highlight some recent developments.

Chemical descriptors

The compounds in a database are normally identified by their 2D structural representations, which consist of a list of the constituent atoms, their interconnectivity and sometimes their relevant stereochemistry. Aside from experimental data, these 2D representations of the molecular structure typically contain all the available information distinguishing the compounds in the library. For each compound, a common set of structural/physical/chemical descriptors is generated from these 2D structures. Choosing this set of descriptors amounts to defining the ‘chemical space’ spanned by all possible descriptor representations. A correlation between regions in this chemical space and bioactivity is assumed to arise from the binding of the chemical to specific biological targets. Here, we concentrate on the case in which there is no specific knowledge of the presumed binding sites and there is a purely empirical relationship between structure and activity.

There is a tremendous range in both the complexity and the reliability of descriptors. Simple descriptors, such as atom counts, may be obtained directly and reliably from the 2D structural representation. At the other extreme of both complexity and reliability are three-dimensional (3D) descriptors that involve 3D geometry-optimization and provide no assurance of producing a conformation with in vivo relevance. A widely varying number of descriptor dimensions have been employed to describe chemical libraries, but these have all involved a reduction in dimensions and, thus, a loss of information versus the original representation. Removing information that does not distinguish molecules by the properties of interest (eg, bioactivity) decreases the computational expense involved in computing and manipulating the descriptor representations and the ‘noise’ associated with the descriptors that do not contribute to the distinction of interest. One family of widely used descriptors consists of database hash keys, which were originally designed to filter compounds quickly in substructure searches. Although experience shows that these keys are unreliable when used alone to represent compounds, they have proven useful when used in conjunction with other descriptors [16•,17•,18].
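By way of illustration, the snippet below derives a few simple 2D descriptors and a hashed circular fingerprint (analogous in role to the database keys mentioned above) from a SMILES string. RDKit is one open-source option for this; the toolkit and the particular descriptor choices are our assumptions, not recommendations from the text:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors

def describe(smiles):
    """Compute a small set of 2D descriptors plus a hashed fingerprint."""
    mol = Chem.MolFromSmiles(smiles)
    descriptors = {
        "mol_wt": Descriptors.MolWt(mol),       # simple and reliable
        "heavy_atoms": mol.GetNumHeavyAtoms(),  # plain atom count
        "tpsa": Descriptors.TPSA(mol),          # topological polar surface area
        "logp": Descriptors.MolLogP(mol),       # calculated lipophilicity
    }
    # 2048-bit circular fingerprint computed from the 2D graph alone
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    return descriptors, fp

desc, fp = describe("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(desc)
```

Note that TPSA is itself a purely topological estimate of polar surface area, in the spirit of Ertl's 2D calculation cited above.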

Considerable effort has been devoted to determining the importance of 3D (conformational) information relative to more simply and reliably obtained 2D information, but the results seem to be highly dependent on the details of the analysis and the nature of the correlation being sought. 3D conformational analysis is generally avoided in the interest of computational speed and reproducibility. Estrada et al. found a significant correlation between 2D topological indices and the dihedral angle in a series of alkylbiphenyls, demonstrating that 3D properties may be implicitly represented without resorting to geometry optimization [19]. In addition, Ertl found that 2D topological information was sufficient to calculate a molecular surface polar area descriptor that was essentially identical to the value obtained with the comparable 3D calculation [10]. One limitation of topological descriptors is that they cannot distinguish between stereoisomers. To help address this problem, Golbraikh et al. [20] and Lukovits and Linert [21] have introduced interesting ways of combining chirality with 2D topological information.

The descriptors chosen to describe a compound library may be very different from one another with respect to their range and distribution. Godden and Bajorath used measures derived from Shannon entropy to quantify the information content of each descriptor within a compound library. They extended this method to compare the distributions of a descriptor between different libraries [22•].
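One simple way to express that idea: bin a descriptor's values over a shared range and compute the Shannon entropy of the histogram, so that near-constant, low-information descriptors score near zero (a sketch, with arbitrary bin counts and synthetic values):

```python
import numpy as np

def descriptor_entropy(values, bins=32, value_range=(0, 600)):
    """Shannon entropy (bits) of a descriptor's distribution over a library.

    A fixed value_range keeps different descriptors (or libraries)
    comparable on the same binning scale."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
spread_out = rng.normal(300, 80, size=5000)      # e.g., molecular weights
nearly_constant = rng.normal(300, 1, size=5000)  # uninformative descriptor
print(descriptor_entropy(spread_out), descriptor_entropy(nearly_constant))
```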

Biological descriptors

There are a number of biologically relevant quantities that can be used as independent variables in a manner directly analogous to the chemical descriptors described above. Biological descriptors can be used in the global analyses of microarray-derived transcription profiling data or to interpret the results of a screen for biological activity in terms of previously known activities of compounds in the library. Chromosomal location can also serve as a descriptor. For example, Wyrick et al. used chromatin immunoprecipitation and subsequent hybridization to genomic DNA microarrays to identify autonomously replicating sequences (ARSs) in yeast cells. Using chromosomal location in the list of generated sequences, these authors determined that ARSs are overrepresented in subtelomeric and intergenic regions of chromosomes [23••].

Properties can be calculated directly from DNA sequence information in a manner analogous to the calculation of physical descriptors for small molecules. For example, enrichment of the fraction of guanine/cytosine base pairs (GC content) in promoter regions can be calculated directly from genomic DNA sequence. Konu et al., for example, found that gene expression levels were correlated with the GC content of the third nucleotide codon position of the message [24]. One can relate the presence of splice site sequences, promoter elements and transcription factor binding sites to gene expression level using similar strategies. For example, Bernstein et al. determined that binding sites for the transcription factor Ume6p were enriched upstream of genes that are induced in sin3 mutant yeast cells [25]. This type of global analysis correlates genomic sequence information with gene expression data.
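Such sequence-derived descriptors are straightforward to compute. For example, overall GC content, and GC content restricted to third codon positions (the quantity correlated with expression in the Konu study), can be calculated directly from sequence strings:

```python
def gc_content(seq):
    """Fraction of G/C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc3(cds):
    """GC content at the third position of each codon of a coding sequence."""
    third = cds.upper()[2::3]   # positions 3, 6, 9, ... (0-based 2, 5, 8)
    return sum(base in "GC" for base in third) / len(third)

print(gc_content("ATGGCGTTTCCC"), gc3("ATGGCGTTTCCC"))
```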

Some properties, such as gene function, may be linked to a DNA sequence through a strategy of annotation. Other possible annotations include chromosomal location, protein interactions and co-regulated expression groups. Each of these descriptors can serve as an independent variable for global analyses. Using functional annotation categories, Bernstein et al. determined that the expression of carbon metabolite and carbohydrate utilization genes was greater in yeast cells with an HDA1 deletion [25].

The construction of a descriptor vector for each gene used in a microarray experiment can be envisaged. Each sequence (eg, gene or chromosomal fragment) would have an associated value for GC content, the number of splice sites, the number and type of promoter elements, the number of binding sites for each of many transcription factors and a quantitative assignment (perhaps binary) for each functional annotation category. Once these vectors are constructed, they allow rapid analysis of the relationship between active and inactive genes for each of these descriptor categories. By applying computational strategies described in the next section, it is possible to extract the relationship between, for example, the number of AP-1 binding sites in a gene promoter and the level of induced expression in an experiment. Moreover, such methods would permit the detection of non-linear and combinatorial relationships among these descriptors, eg, ‘stress-response genes with AP-1 binding sites and >40% GC content in their promoter are enriched in response to stimulus X’. Finally, data from global analyses could be used to develop a predictive model to classify untested genes.
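A minimal sketch of such a vector, combining promoter GC content, a count of one transcription-factor motif (TGACTCA, the commonly cited AP-1 consensus) and binary annotation flags; the category names and the input promoter are hypothetical:

```python
import re

CATEGORIES = ["stress_response", "carbon_metabolism", "cell_cycle"]  # hypothetical

def gene_descriptor_vector(promoter_seq, annotations):
    """Per-gene descriptor vector: sequence features plus annotation flags."""
    seq = promoter_seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    ap1_sites = len(re.findall("TGACTCA", seq))  # non-overlapping motif count
    return [gc, ap1_sites] + [int(c in annotations) for c in CATEGORIES]

v = gene_descriptor_vector("TTGACTCAGCGCGC" * 3, {"stress_response"})
print(v)  # [GC fraction, AP-1 site count, stress, carbon, cell-cycle flags]
```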

Data analysis

It is important to make a distinction between two fundamentally different applications of high-throughput screening data. Such methods may be used simply to identify compounds exceeding a certain activity threshold (hits) or to identify a more comprehensive correlation between the measured activity, molecular structure and/or previously determined biological activity or mechanism. This distinction is important because the acceptable false positive and false negative rates for the two approaches are substantially different. In a ‘threshold’ screen, high false negative and false positive rates are acceptable because secondary screening of the hits is used to distinguish between true positives and false positives. Since the identification of true positives is the ultimate goal in a ‘threshold’ screening approach, false negatives are not a concern as long as a sufficient number of true positives is found. In a global analysis, however, the false positive and false negative rates must be minimized because all results are used in a quantitative or semi-quantitative analysis. Global analyses can be quite powerful but are more expensive in terms of time and money to perform, and may require the use of sophisticated computational methods (vide infra).

Analysis of screening data

Screening results typically exhibit a continuous range of activities, usually with a Gaussian distribution. A cut-off value is chosen for the selection of hits and the active elements are normally confirmed in a secondary assay. The cut-off criteria for determining hits may be based on absolute activity (ie, 2-fold activity versus control), distribution (ie, three standard deviations or greater from the mean) or a desired number of compounds to be retested. Once confirmed actives have been identified, it may be desirable to search for additional active elements by testing or retesting candidates that are related in form or function. In transcription profiling screens, retesting entails performing a search of the original gene set for genes that are related to the active genes in terms of sequence or function. The screen comes to its natural conclusion with the selection of a set of actives that can be pursued in subsequent experiments.
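A distribution-based cut-off of this kind is simple to express. The sketch below (synthetic plate data, arbitrary parameters) flags wells at least three standard deviations above the mean; note that on 10,000 wells roughly a dozen inactive wells will exceed the threshold by chance, which is exactly the false-positive behavior that secondary assays are meant to weed out:

```python
import numpy as np

def call_hits(activities, n_sigma=3.0):
    """Indices of wells >= n_sigma standard deviations above the mean."""
    mu, sigma = activities.mean(), activities.std()
    return np.where(activities >= mu + n_sigma * sigma)[0]

rng = np.random.default_rng(0)
plate = rng.normal(1.0, 0.1, size=10_000)  # mostly inactive wells
plate[[42, 4242]] += 1.0                   # two spiked true actives
print("hits at wells:", call_hits(plate))
```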

Global analyses

Various learning techniques have been used to generate hypotheses and form models of relationships between descriptors and biological activity. These techniques may be divided into two main categories: classification and clustering. For simplicity, we assume that the data to be analyzed are compound descriptors and that the classes of compounds are active and inactive.

The goal of a classifier is to produce a model that can separate new, untested compounds into classes, using a training set of already classified compounds. Classification routines attempt to discover those descriptors or sets of descriptors that distinguish the classes from each other. Neural networks, genetic algorithms and support vector machines attempt to discover regions in descriptor space that separate pre-defined classes; unknown compounds that subsequently fall in these regions can be classified as active or inactive [26-28]. These techniques optimize a learning function to fit the given classes while minimizing an error function based on misassigned compounds. One of the main issues in training is overfitting, in which the initial classes are learned so narrowly that no new members are allowed into a class. The learned model should be specific enough that it seldom misclassifies compounds from the original training set, but general enough to recognize new compounds that should belong to a class.
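The specificity/generality trade-off is easy to demonstrate with a support vector machine from scikit-learn (synthetic descriptors and arbitrary kernel parameters): an overly flexible RBF kernel memorizes the training compounds but generalizes poorly, the overfitting failure described above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy descriptor space: activity depends weakly on the first two descriptors
X = rng.normal(size=(600, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 1.0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for gamma in (0.01, 100.0):  # small gamma: smooth boundary; large: overfit
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")
```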

Recursive partitioning and decision trees first find the best single descriptor to split active and inactive populations into two groups and then successively find the next best descriptor to further divide the newly formed groups. These are known as greedy algorithms because they select the best solution at every step but do not necessarily find the global optimum [29].

Statistical methods can also be used to form probability models or estimate the likelihood of particular descriptors forming the known classes. These approaches generally involve the use of the training set to form a probability model that generates both a classification and a probability of being in a class. Simple statistical methods include k-nearest neighbors and the Naïve Bayes classifier. Support vector machines are also examples of statistical classifiers.

The goal in clustering a dataset is to group similar data together. Clustering forms groups of compounds that maximize internal class similarity while simultaneously minimizing external class similarity. Clustering can be accomplished by either a supervised method, where the number of classes is known, or through unsupervised learning, where the data are not grouped into a fixed set of classes.

In many cases, classes produced by clustering can be used for classification: unknown compounds that group with predominantly active compounds have a higher probability of also being active [30]. One drawback to this strategy is that the higher hit rate applies only to the relatively small number of compounds that lie close to known hits. Furthermore, models of activity are not generated by clustering techniques and must be deduced by expert analysis. Indeed, descriptors that cluster compounds together may not be related to activity at all. As with classification, there are a variety of available clustering algorithms. These include hierarchical methods, such as Ward’s clustering, and non-hierarchical methods, such as Jarvis-Patrick [31] and Self-Organizing Maps [32]. Examples of statistical clustering include the use of a Bayesian neural network to cluster drugs and non-drugs [33] and the use of k-nearest neighbor analysis to cluster compounds at various stages of the screening process [34].
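As a brief illustration of cluster-based prioritization, the sketch below applies Ward's method (via scikit-learn, on synthetic descriptors) and then reports the known-active rate in each cluster; untested compounds falling in active-rich clusters would be the natural retest candidates:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Two synthetic compound families in a 5-D descriptor space
library = np.vstack([rng.normal(0.0, 0.3, (100, 5)),
                     rng.normal(1.0, 0.3, (100, 5))])
known_active = np.array([True] * 100 + [False] * 100)  # first family is active

# Ward's method merges clusters so as to minimize within-cluster variance
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(library)
for c in range(2):
    in_c = labels == c
    print(f"cluster {c}: {in_c.sum()} compounds, "
          f"{known_active[in_c].mean():.0%} known actives")
```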

In a recent global analysis of both compound screening and gene expression data, Staunton et al. used a statistical classifier to identify a correlation between gene expression and cell sensitivity to compounds. Sixty cancer cell lines were exposed to numerous compounds at the National Cancer Institute, and each line was determined to be either sensitive or resistant to each compound. Using a Bayesian statistical classifier, Staunton et al. showed that, for at least one third of the tested compounds, cell sensitivity can be predicted from the gene expression pattern of untreated cells [35••]. This example demonstrates the power of global analyses to identify subtle but important relationships among variables in large-scale datasets.
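The idea can be mimicked on toy data: a Gaussian naive Bayes model (a simple stand-in for the Bayesian classifier used in that study) trained to predict a synthetic sensitivity label from a small random expression matrix, with cross-validation standing in for their train/test design:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Toy stand-in for the NCI-60 setting: 60 cell lines x 200 genes, with a
# handful of genes weakly linked to sensitivity to one compound.
expression = rng.normal(size=(60, 200))
sensitive = (expression[:, :5].sum(axis=1)
             + rng.normal(scale=1.0, size=60) > 0).astype(int)

scores = cross_val_score(GaussianNB(), expression, sensitive, cv=5)
print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```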

Global analyses can be performed on data from compound screening and transcription profiling experiments using similar computational methods. The goal of such analyses is to discern sometimes-subtle relationships within these datasets and to make correlations between large sets of multidimensional data. Recent advances are making global analyses increasingly feasible and powerful.

There are numerous future challenges in this area. First, it will be valuable to identify robust chemical descriptors that best define global chemical space, as well as the ligand-rich regions therein. Second, standardized tests for evaluating classification methods would enable more meaningful comparisons. Finally, methods for automatically incorporating publicly accessible data into such analyses would be enormously powerful, as the range of testable relationships would expand dramatically.

Acknowledgments

Brent R Stockwell, PhD, is a Whitehead Fellow and is supported in part by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund.

• of special interest

•• of outstanding interest


Global economy - Statistics & Facts

The rise of China, unemployment, and rising inflation: key insights.

Detailed statistics

Global gross domestic product (GDP) 2028

Global inflation rate from 2000 to 2028

Countries with the largest gross domestic product (GDP) per capita 2022

Editor’s Picks: current statistics on this topic

Countries with the largest gross domestic product (GDP) 2022

Leading export countries worldwide 2022

Related topics

Global economy

  • Financial markets

Global economic indicators

  • Inflation worldwide
  • Gross Domestic Product (GDP) worldwide
  • Unemployment worldwide
  • Retail employment worldwide

Major economies

  • BRICS countries

Recommended statistics

  • Basic Statistic Global gross domestic product (GDP) 2028
  • Basic Statistic Gross domestic product (GDP) of selected global regions 2022
  • Premium Statistic GDP of the main industrialized and emerging countries 2022
  • Basic Statistic Countries with the largest gross domestic product (GDP) 2022
  • Basic Statistic Share of global regions in the gross domestic product 2022
  • Basic Statistic Global gross domestic product (GDP) per capita 2022
  • Premium Statistic Annual change in CPI 2015-2022, by country

Global gross domestic product (GDP) at current prices from 1985 to 2028 (in billion U.S. dollars)

Gross domestic product (GDP) of selected global regions 2022

Gross domestic product (GDP) of selected global regions at current prices in 2022 (in trillion U.S. dollars)

GDP of the main industrialized and emerging countries 2022

Gross domestic product (GDP) of the main industrialized and emerging countries in current prices in 2022 (in trillion U.S. dollars)

Countries with the largest gross domestic product (GDP) 2022

The 20 countries with the largest gross domestic product (GDP) in 2022 (in billion U.S. dollars)

Share of global regions in the gross domestic product 2022

Share of global regions in the gross domestic product (adjusted for purchasing power) in 2022

Global gross domestic product (GDP) per capita 2022

Global gross domestic product (GDP) per capita from 2012 to 2022, at current prices (in U.S. dollars)

Annual change in CPI 2015-2022, by country

Annual change in Consumer Price Index (CPI) in selected countries worldwide from 2015 to 2022


Global Purchasing Manager Index (PMI) of the industrial sector August 2023

Global Purchasing Manager Index (PMI) of the industrial sector from August 2021 to August 2023 (50 = no change)

Purchasing Managers Index (PMI) in developed and emerging countries 2020-2023

Manufacturing PMI (Industrial PMI) in key developed and emerging economies from January 2020 to December 2023 (50 = no change)

Global consumer confidence index 2020-2023

Global consumer confidence in developed and emerging countries from January 2020 to December 2023

Consumer confidence in developed and emerging countries 2023

Consumer confidence in developed and emerging countries in November 2023

Industrial production growth worldwide 2019-2023, by region

Global industrial production growth between January 2019 and October 2023, by region

Average annual wages in major developed countries 2008-2022

Average annual wages in major developed countries from 2008 to 2022 (in U.S. dollars)

Wage growth in developed countries 2019-2023

Wage growth in major developed countries from January 2020 to November 2023

Global policy uncertainty index monthly 2019-2023

Global economic policy uncertainty index from January 2019 to November 2023

Gross domestic product


Share of the main industrialized and emerging countries in the GDP 2022

Share of the main industrialized and emerging countries in the gross domestic product (adjusted for purchasing power) in 2022

Countries with the largest proportion of global gross domestic product (GDP) 2022

The 20 countries with the largest proportion of the global gross domestic product (GDP) based on Purchasing Power Parity (PPP) in 2022

Gross domestic product (GDP) per capita in the main industrialized and emerging countries

Gross domestic product (GDP) per capita in the main industrialized and emerging countries in current prices in 2022 (in U.S. dollars)

Countries with the largest gross domestic product (GDP) per capita 2022

The 20 countries with the largest gross domestic product (GDP) per capita in 2022 (in U.S. dollars)

Countries with the lowest estimated GDP per capita 2023

The 20 countries with the lowest estimated gross domestic product (GDP) per capita in 2023 (in U.S. dollars)

Share of economic sectors in the global gross domestic product from 2012 to 2022

Share of economic sectors in the global gross domestic product (GDP) from 2012 to 2022

Share of economic sectors in the gross domestic product, by global regions 2022

Share of economic sectors in the gross domestic product (GDP) of selected global regions in 2022

Proportions of economic sectors in GDP in selected countries 2022

Proportions of economic sectors in the gross domestic product (GDP) in selected countries in 2022

Economic growth


Growth of the global gross domestic product (GDP) 2028

Growth of the global gross domestic product (GDP) from 1980 to 2022, with forecasts until 2028 (compared to the previous year)

Forecast on the GDP growth in selected world regions until 2028

Growth of the real gross domestic product (GDP) in selected world regions from 2018 to 2028 (compared to the previous year)

Gross domestic product (GDP) growth forecast in selected countries until 2028

Growth of the gross domestic product (GDP) in selected countries from 2018 to 2028 (compared to the previous year)

Countries with the highest growth of the gross domestic product (GDP) 2022

The 20 countries with the highest growth of the gross domestic product (GDP) in 2022 (compared to the previous year)

The 20 countries with the greatest decrease of the gross domestic product in 2022

The 20 countries with the greatest decrease of the gross domestic product (GDP) in 2022 (compared to the previous year)

GDP growth in the leading industrial and emerging countries 2nd quarter 2023

Growth of the real gross domestic product (GDP) in the leading industrial and emerging countries from 2nd quarter 2021 to 2nd quarter 2023 (compared to the previous quarter)

Unemployment

  • Basic Statistic Number of unemployed persons worldwide 1991-2024
  • Basic Statistic Global unemployment rate 2003-2022
  • Basic Statistic Unemployed persons in selected world regions 2024
  • Basic Statistic Unemployment rate in selected world regions 2022
  • Basic Statistic Youth unemployment rate in selected world regions 2022
  • Premium Statistic Monthly unemployment rate in industrial and emerging countries August 2023
  • Premium Statistic Breakdown of unemployment rates in G20 countries 2023

Number of unemployed persons worldwide 1991-2024

Number of unemployed persons worldwide from 1991 to 2024 (in millions)

Global unemployment rate 2003-2022

Global unemployment rate from 2003 to 2022 (as a share of the total labor force)

Unemployed persons in selected world regions 2024

Number of unemployed persons in selected world regions in 2021 and 2022, with forecasts up to 2024 (in millions)

Unemployment rate in selected world regions 2022

Unemployment rate in selected world regions between 2017 and 2022

Youth unemployment rate in selected world regions 2022

Youth unemployment rate in selected world regions from 2000 to 2022

Monthly unemployment rate in industrial and emerging countries August 2023

Unemployment rate in the leading industrial and emerging countries from August 2022 to August 2023

Breakdown of unemployment rates in G20 countries 2023

Unemployment rate of G20 countries in 2023

Global trade

  • Premium Statistic Monthly change in goods trade globally 2018-2023
  • Basic Statistic Leading export countries worldwide 2022
  • Premium Statistic Leading import countries worldwide 2022
  • Premium Statistic The 20 countries with the highest trade surplus in 2022
  • Premium Statistic The 20 countries with the highest trade balance deficit in 2022
  • Basic Statistic Trade: export value worldwide 1950-2022
  • Premium Statistic Global merchandise imports index 2019-2023, by region
  • Premium Statistic Global merchandise exports index 2019-2023, by region

Monthly change in goods trade globally 2018-2023

Change in global goods trade volume from January 2018 to October 2023

Leading export countries worldwide 2022

Leading export countries worldwide in 2022 (in billion U.S. dollars)

Leading import countries worldwide 2022

Leading import countries worldwide in 2022 (in billion U.S. dollars)

The 20 countries with the highest trade surplus in 2022

The 20 countries with the highest trade surplus in 2022 (in billion U.S. dollars)

The 20 countries with the highest trade balance deficit in 2022

The 20 countries with the highest trade balance deficit in 2022 (in billion U.S. dollars)

Trade: export value worldwide 1950-2022

Trends in global export value of trade in goods from 1950 to 2022 (in billion U.S. dollars)

Global merchandise imports index 2019-2023, by region

Global merchandise imports index from January 2019 to November 2023, by region

Global merchandise exports index 2019-2023, by region

Global merchandise exports index from January 2019 to November 2023, by region

Inflation

  • Basic Statistic Global inflation rate from 2000 to 2028
  • Basic Statistic Inflation rate in selected global regions in 2022
  • Premium Statistic Monthly inflation rates in developed and emerging countries 2021-2024
  • Basic Statistic Inflation rate of the main industrialized and emerging countries 2022
  • Basic Statistic Countries with the highest inflation rate 2022
  • Basic Statistic Countries with the lowest inflation rate 2022

Global inflation rate from 2000 to 2028

Global inflation rate from 2000 to 2022, with forecasts until 2028 (percent change from previous year)

Inflation rate in selected global regions in 2022

Inflation rate in selected global regions in 2022 (compared to previous year)

Monthly inflation rates in developed and emerging countries 2021-2024

Monthly inflation rates in developed and emerging countries from January 2021 to January 2024 (compared to the same month of the previous year)

Inflation rate of the main industrialized and emerging countries 2022

Estimated inflation rate of the main industrialized and emerging countries in 2022 (compared to previous year)

Countries with the highest inflation rate 2022

The 20 countries with the highest inflation rate in 2022 (compared to the previous year)

Countries with the lowest inflation rate 2022

The 20 countries with the lowest inflation rate in 2022 (compared to the previous year)

Further reports

Get the best reports to understand your industry.


  • Global Data Analytics

The first edition of ESOMAR’s Global Data Analytics is a global analysis of the size and segmentation of the insights industry’s evolving data analytics sector.

Part of the Global Insights Overview package.


At a glance

The Global Data Analytics 2023 is the first edition of an annual series that delves into the size, characteristics, and performance of the insights industry’s evolving data analytics sector. The report’s inferences are based on data collected by national research associations, leading companies, independent analysts, and ESOMAR representatives in 15 countries, complemented by ESOMAR’s independent size estimations.

In the upcoming editions, ESOMAR intends to leverage its global presence by working with its partner associations worldwide to survey this sector of the insights industry in more detail.

What can I find inside?

The Global Data Analytics report is being released on the heels of the data analytics sector’s turnover surpassing that of the established market research sector for the first time. In 2022, the global insights industry expanded from US$119 billion to US$129 billion.

ESOMAR attributes 39% of the global insights industry’s turnover to companies primarily engaged in data analytics, including DaaS, SaaS and other research platforms, compared with 36% for market research firms (24% of reporting firms). In 2023, the data analytics sector is expected to maintain this trend and attain a turnover of US$56 billion within an insights industry slated to expand to US$141 billion.

The report covers:

Global and regional overview of the Data Analytics sector

Experts’ views on the sector’s expansion

Self-regulation and its role in data analysis

Insiders’ perspectives on the sector

Overview of the Top 20 Data Analytics companies

Want to get better insight into the report?

Here is a selection of articles based on the Global Data Analytics.

Understanding Europe’s Data Analytics $1bn growth

The $36bn hegemony of Data Analytics in the US

Insights in Asia Pacific, set to dominate in 2024

Rest of the Americas’ journey through the inflationary hurdles

Inflation curtails promising Middle Eastern and African prospects



  • Data Descriptor
  • Open access
  • Published: 24 January 2024

A 31-year (1990–2020) global gridded population dataset generated by cluster analysis and statistical learning

  • Luling Liu 1,2,
  • Xin Cao ORCID: orcid.org/0000-0001-5789-7582 1,2,
  • Shijie Li ORCID: orcid.org/0000-0002-1583-4951 1,2 &
  • Na Jie 1,2

Scientific Data volume 11, Article number: 124 (2024)


Subjects: Social anthropology

Continuously monitoring global population spatial dynamics is crucial for implementing effective policies related to sustainable development, including epidemiology, urban planning, and global inequality. However, existing global gridded population data products lack consistent population estimates, making them unsuitable for time-series analysis. To address this issue, this study designed a data fusion framework based on cluster analysis and statistical learning approaches, which led to the generation of a continuous global gridded population dataset (GlobPOP). The GlobPOP dataset was evaluated through two-tier spatial and temporal validation to demonstrate its accuracy and applicability. The spatial validation results show that the GlobPOP dataset is highly accurate. The temporal validation results also reveal that the GlobPOP dataset performs consistently well across eight representative countries and cities despite their unique population dynamics. With the availability of GlobPOP datasets in both population count and population density formats, researchers and policymakers can leverage the new dataset to conduct time-series analysis of the population and explore the spatial patterns of population development at global, national, and city levels.


Background & Summary

The world’s population is estimated at over 8 billion and is projected to reach around 8.5 billion by 2030 1 . As population growth continues, the ability to monitor population spatial dynamics over long periods becomes increasingly essential for implementing effective policies and initiatives related to sustainable development. Specifically, of the 17 Sustainable Development Goals and 169 targets set by the United Nations 2 in 2015, approximately half of the indicators require accurate and spatially explicit demographic data. The Sustainable Development Goals emphasize ‘leaving no one behind’, which means we need spatio-temporally consistent gridded population data to identify areas and groups that are vulnerable to poverty, disease, and other development challenges, enabling more targeted and effective interventions. A continuous gridded population dataset offers more spatially detailed information and allows the unevenly changing relationship between humans and nature to be analyzed at the pixel scale over time. It is recognized as an essential data source for various applications, such as epidemiology, urban planning, environmental management, assessment of risks to vulnerable populations, energy crises, global inequities, and assessment of progress toward the Sustainable Development Goals (SDGs) 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 .

Gridded population data are originally derived from census data, which are typically collected through a formal enumeration, although other methods such as surveys may also be used. After the census tables of administrative units or enumeration areas are converted to vector format, the counts are reallocated into raster grids 11 , 12 . Raster grids are a series of cells arranged in rows and columns, where each cell represents a geographic area and contains information about the population within that area. There are two main methods for producing top-down gridded population data, area-weighted and dasymetric mapping, while bottom-up population mapping methods are adopted when census data are not available. Area-weighted mapping assumes that the population is evenly distributed within administrative areas and assigns population to each grid cell based on the proportion of the administrative unit covered by that cell. This method is simple and easy to implement but may not accurately reflect the true population distribution, especially in areas with heterogeneous population density 13 . Dasymetric mapping makes assumptions about the relationship between population and various geographic and land cover characteristics and uses ancillary data to determine where and how much population should be assigned to each location. This method may result in more accurate estimates of population distribution, but it requires more detailed ancillary data and expertise to implement.
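To make the area-weighted approach concrete, here is a minimal Python sketch; the unit population and per-cell area fractions are hypothetical, and a real implementation would derive the fractions from intersecting the administrative polygons with the grid.

```python
import numpy as np

def area_weighted_allocation(unit_pop: float, cell_fractions: np.ndarray) -> np.ndarray:
    """Spread one administrative unit's census count over the grid cells it
    intersects, in proportion to the share of the unit's area in each cell."""
    assert np.isclose(cell_fractions.sum(), 1.0), "area fractions must sum to 1"
    return unit_pop * cell_fractions

# A hypothetical unit of 10,000 people overlapping four grid cells:
print(area_weighted_allocation(10_000, np.array([0.50, 0.25, 0.15, 0.10])))
# -> [5000. 2500. 1500. 1000.]
```

Dasymetric mapping follows the same pattern but replaces the area fractions with weights derived from ancillary data such as built-up density or land cover.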

There are five long time-series of global gridded population data products with either density or count measures, including the Global Human Settlements Layer Population (GHS-POP), the Global Rural Urban Mapping Project (GRUMP), the Gridded Population of the World Version 4 (GPWv4), the LandScan Population datasets and the WorldPop datasets, all with a spatial resolution of 30 arcseconds (about 1 km at the equator). Nonetheless, previous research has identified some limitations associated with these datasets.

First of all, there is currently no continuous long-term gridded population dataset available at a spatial resolution of approximately 1 km, particularly before 2000. Among the three datasets (GHS-POP, GRUMP, and GPWv4), the shortest time interval is five years. Continuous gridded population maps are available after 2000 for the other two datasets (LandScan and WorldPop). However, LandScan’s methods and metadata are updated every year, especially during the 2000s 14 . These products are based on correlations between modeling factors and populations at the administrative unit level, which are then used to predict gridded populations. Therefore, the accuracy of population spatialization depends to a large extent on the accuracy of the input factors and the population allocation methods 8 , 15 . Besides, there is a mismatch between the training and predicted data under scale variation, resulting in low accuracy of the overall estimate 11 , 16 .

Secondly, the reliability and uncertainty of population data products are typically described in documentation or validated in specific countries and regions, with methodological and ancillary data uncertainties being the most common sources of uncertainty. Methodological uncertainty issues can arise due to spatial autocorrelation resulting from the equally weighted distribution of the population, leading to overestimation of the population 12 , 17 . Problems associated with ancillary data include common inaccuracies in land cover data, which typically have an accuracy range of 70–85% 18 . Other ancillary data sources, such as nighttime light data, can also introduce cumulative errors in the gridded population data due to saturation effects, blooming effects, and inter-annual inconsistencies 19 . These errors can undermine the reliability of the ancillary data and propagate into the final population estimates, further increasing uncertainties in the results.

Last but not least, one issue that has received limited attention is the global applicability of gridded population data. The five sets of gridded population data products are used extensively in global-scale studies, but their accuracy and suitability for different regions and situations have not been fully evaluated. Currently, there are ongoing efforts to validate and compare the precision of various population data products, although the findings are frequently restricted to specific countries or regions. For example, Archila Bustos et al. 14 used the example of Sweden, where population change is slow, to validate and compare five demographic datasets against statistical data from 1990–2015, and found that no dataset was consistently best across different situations, and that accuracy in uninhabited areas differed across datasets.

Although population data products are fundamental for much research and many applications, long-term and consistently accurate gridded population data for time-series analysis are lacking. As assessments of population data product applicability continue to emerge, it has been found that each population data product has its own domain of applicability and, in some cases, shows a high degree of accuracy 4 , 20 . These findings motivate the research objective of whether it is possible to integrate these five sets of multi-source demographic data and leverage the strengths of each through a statistical learning approach, producing a new demographic product suitable for long time-series analysis at the global grid scale.

Hence, this study proposed a data fusion framework to generate a continuous global gridded population dataset (GlobPOP) from 1990 to 2020 using the five existing products. As shown in Fig.  1 , the whole framework of population data production is divided into three parts. The first part was pre-processing, which harmonized the data by converting the population data formats uniformly and by linear gap-filling. The second part involved model building and estimation based on cluster analysis and statistical learning: the clustering analysis allowed the differences in each population dataset’s performance across countries to be understood, and the estimation model was established through statistical learning, training regression parameters on the regions with better performance. The third part was accuracy validation, which included spatial and temporal validation at two levels. Finally, we examined the model sensitivity and discussed the adaptability of the new data product at the pixel scale.

figure 1

Workflow of the estimation and validation of the global gridded population (GlobPOP).

In this section, we describe the input data and the data fusion framework used to produce the global gridded population data product.

This section summarizes the five global population data products used to produce the continuous gridded population. Table  1 shows the detailed information of original input population data sources.

GPWv4 is the only dataset that uses area weighting for each year based on national census data; a water body mask is first applied before area weighting to ensure that population is not allocated to water bodies or snow- and ice-covered areas 21 . The limitation lies firstly in the assumption that the population is evenly distributed within administrative boundaries; the method is therefore more accurate for smaller input units than for larger ones 22 . Secondly, it can be affected by interpolation, particularly in areas where the population changes dramatically over short periods, leading to population underestimation 23 .

GHS-POP population data are binary dasymetric mapped, with population data derived from the GPWv4 UN-adjusted population dataset at the administrative district level and ancillary data from a gridded dataset of built-up areas, in which each grid cell records the percentage of the cell covered by built-up areas. 95% of the population is allocated to grid cells in proportion to the density of built-up areas using an area-weighted approach 24 . Only when the administrative district is smaller than the 250 m grid area is the entire population aggregated into one grid cell, which may shift the spatial distribution of population to adjacent grids. Because the reallocation of population in GHS-POP is based on built-up density, population may be allocated to non-residential built-up areas, such as commercial, industrial, and recreational areas, rather than to residential areas 24 .

The GRUMP data are based on GPWv3 (version 3) and redistribute the population to urban and rural areas according to a binary mapping method, with rural and urban areas divided mainly on the basis of nighttime light data. Using nighttime light data such as DMSP to delineate urban areas leads to population overestimation there, while the ‘blooming’ effect of nighttime lights means that poorly electrified or un-electrified areas cannot be detected, so their population is underestimated. Moreover, GPWv3 as the older version is less accurate than GPWv4, and consequently the GRUMP data are less accurate than GPWv4 in some regions 12 .

LandScan data use multivariate dasymetric mapping to assign local census data to each grid cell according to likelihood coefficients between the auxiliary data and the population. The cell values represent integer counts of the ambient population, i.e., the average population over a typical 24-hour day across weeks and seasons, and therefore also reflect the distribution of the working and traveling population, which leads to population overestimation in places such as urban areas. The LandScan algorithm is updated annually to introduce more and higher-precision data, which is not conducive to time-series comparisons of LandScan data, as changes can be caused not only by population changes but also by changes in input data or algorithms 25 .

A random forest model is employed in the WorldPop data production process to generate population projections based on ancillary data such as land cover, elevation, nighttime lights, roads, and settlements. Population input data from censuses and official population estimation databases, linked to GIS through the WorldPop initiative and built on GPWv4, are then assigned within each country/region based on the population projections 13 . The random forest projections in the WorldPop data do not exceed the input population range.

Besides the gridded population data, we also used other ancillary data. Vector boundary shapefiles were utilized for zonal statistics at two scales, and census data were used for cluster analysis and model validation. Since census data are still considered more accurate and reliable than gridded population data, country-level census data were used as reference data to identify the regions where each gridded population data product performs better in different years. Meanwhile, we also employed two spatial scales (level-0, the country administrative level, and level-2, a sub-division of the subnational administrative level) to validate the results and for sensitivity analysis. Furthermore, the surface area layer was used for population density calculation. Detailed information is displayed in Supplementary Table  1 .

GADM, the Database of Global Administrative Areas, is a highly accurate global database of administrative boundaries. As we performed the zonal statistics at two levels, we used only these two levels’ boundary shapefiles. For level-0 boundaries, we matched the ISO country codes with the census data and acquired 217 countries’ boundaries. For level-2 boundaries, we chose nine countries’ level-2 administrative units across five continents (Asia, Europe, America, Africa, and Oceania), which were processed and harmonized to match the definitions used in the level-2 census data from 1990 to 2020.

The census data provide detailed information on the population size, age structure, and geographic distribution of a specific area. For the level-0 census data, the World Population Prospects (WPP) 2022 1 provides population estimates and projections for countries and regions worldwide. In this study, only the population estimates for countries from 1990 to 2020 were used, in two ways. On the one hand, the WPP served as reference data in the cluster analysis to identify the regions where each gridded population data product performs better in different years, which helped to improve the accuracy of the population estimates. On the other hand, it was used to validate the spatial-temporal consistency of the results for 217 countries from 1990 to 2020. In addition, we collected level-2 census data from nine countries across five continents, including China and India in Asia, the United Kingdom in Europe, the United States in North America, South Africa, Nigeria, and Angola in Africa, and New Zealand and Vanuatu in Oceania. These data covered the period from 1990 to 2020 and were obtained from each country’s bureau of official statistics.

Data preprocessing

The data preprocessing consists of two steps: data harmonization and linear gap-filling.

Data harmonization

The harmonization process includes raster data conversions and census data regularization. We converted the input population density products to population count layers by overlaying the surface area layer. Because the population data are originally in a geographic coordinate system, grid cells become narrower and smaller toward the Poles; this holds even after the polygons are projected, so raster algebra on counts derived from the surface area layer is more accurate than working with densities directly. In addition, we excluded some uninhabited countries, island countries and regions from the census data, as Supplementary Table  4 shows, and finally acquired census data for 217 countries with matched names.

Linear gapfill

Considering that the gaps in the different population data products are between five and ten years, we adopted a linear population growth assumption to fill the data gaps. The linear gap-filling process included linear interpolation and extrapolation at the pixel level. The linear interpolation formula is as in Eq. (1):

$$y = y_1 + \left( y_2 - y_1 \right) \frac{t - t_1}{t_2 - t_1} \qquad (1)$$

where y signifies the estimated population at a specific time, y 1 corresponds to the population at the first known time, y 2 denotes the population at the second known time, t represents the target time for which we want to estimate the population, t 1 is the time of the first known population value, and t 2 is the time of the second known population value. This formula is essentially a linear interpolation formula: it calculates the population at a particular time t by considering the linear growth between the known population values ( y 1 and y 2 ) at the times t 1 and t 2 .

The data interval is usually five years; if data are not available within five years, a ten-year interval is used. Thus, the five products are divided into three parts, as shown at the top of Fig.  1 . From 1990 to 1999, we performed linear interpolation and extrapolation for GHS-POP, GRUMP, and GPWv4. For the year 2000, we kept the data from all five original population data products. From 2001 to 2020, we carried out linear interpolation for GPWv4.
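As an illustration of this step, the sketch below applies Eq. (1) per pixel and extends the boundary slopes outside the known range; the function name and the single-pixel series are hypothetical.

```python
import numpy as np

def linear_gapfill(years_known, pops_known, years_target):
    """Per-pixel linear interpolation (Eq. (1)), with linear extrapolation
    outside the known range using the boundary slopes."""
    yk = np.asarray(years_known, dtype=float)
    pk = np.asarray(pops_known, dtype=float)
    t = np.asarray(years_target, dtype=float)
    out = np.interp(t, yk, pk)          # interpolates; clamps outside [yk[0], yk[-1]]
    s0 = (pk[1] - pk[0]) / (yk[1] - yk[0])        # left boundary slope
    s1 = (pk[-1] - pk[-2]) / (yk[-1] - yk[-2])    # right boundary slope
    out[t < yk[0]] = pk[0] + s0 * (t[t < yk[0]] - yk[0])
    out[t > yk[-1]] = pk[-1] + s1 * (t[t > yk[-1]] - yk[-1])
    return np.maximum(out, 0.0)         # populations cannot be negative

# A hypothetical single-pixel series with five-year snapshots,
# filled (and extrapolated) to annual values:
print(linear_gapfill([1990, 1995, 2000], [120.0, 131.0, 150.0],
                     np.arange(1988, 2002)))
```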

Model estimation

The key point of the data fusion framework is to fully comprehend and exploit the strengths and weaknesses of the five input population data products and to feed them into the population fusion regression model. Thus, this study performed a clustering analysis, which allowed the differences in each population dataset’s performance across countries to be understood. The estimation model was then established through statistical learning, training regression parameters on the regions with better performance.

Cluster analysis for spatial consistency

Cluster analysis is an unsupervised approach, and the most common method is the K-means clustering method 26 . The statistical software used for cluster analysis was RStudio, and the packages include ‘cluster’, ‘quantreg’ and ‘Metrics’. Clustering allows for the identification and categorization of homogeneous groups within the dataset. Four metrics were selected to quantify the similarity between actual census and product population counts at the country level, and we used these differences to identify areas with less variation for population projections.

First of all, we selected the APE (Absolute Percentage Error), SE (Squared Error), SLE (Squared Logarithmic Error), and Dif (Difference) indexes to compare the different population data products with census data. These indexes were chosen to facilitate a comprehensive comparison between the different population data products and the corresponding census data in the cluster analysis. The four indexes are defined as in Eqs. (2)–(5):

$$\mathrm{APE}_i = \frac{\left| X_i - Y_i \right|}{X_i} \qquad (2)$$

$$\mathrm{SE}_i = \left( X_i - Y_i \right)^2 \qquad (3)$$

$$\mathrm{SLE}_i = \left( \ln(X_i + 1) - \ln(Y_i + 1) \right)^2 \qquad (4)$$

$$\mathrm{Dif}_i = X_i - Y_i \qquad (5)$$

where X i is the actual value of the population count, and Y i is the predicted value of the population count.

Then the data were scaled to a standard range between 0 and 1 to remove any potential bias introduced by different measurement scales. Thirdly, we determined the ideal number of clusters for the datasets and performed the K-means clustering analysis, which involves iteratively assigning data points to clusters based on their similarity and calculating the centroid of each cluster. Finally, the country-level census data were divided into two categories: the better-performing product data, which have higher similarity with the census data, were used for model parameter training, and the worse-performing data took part in model parameter testing.
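The paper performed this step in R with the ‘cluster’, ‘quantreg’ and ‘Metrics’ packages; the following is an equivalent Python sketch of the metric computation, [0, 1] rescaling, and two-cluster K-means, using made-up country totals.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

def similarity_metrics(census: np.ndarray, product: np.ndarray) -> np.ndarray:
    """Country-level APE, SE, SLE and Dif (Eqs. (2)-(5)) between census (X)
    and product (Y) population counts."""
    ape = np.abs(census - product) / census
    se = (census - product) ** 2
    sle = (np.log1p(census) - np.log1p(product)) ** 2
    dif = census - product
    return np.column_stack([ape, se, sle, dif])

# Hypothetical country totals for one product in one year:
census = np.array([5.1e7, 8.3e6, 1.2e8, 6.7e5])
product = np.array([4.9e7, 9.0e6, 1.1e8, 7.2e5])

features = MinMaxScaler().fit_transform(similarity_metrics(census, product))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
# labels separates countries with better vs. worse census agreement
```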

To train regression parameters for population fusion based on the countries with better performance, we selected two statistical regression models for population prediction. Regression methods such as the generalized linear model (GLM) and the quantile regression model (QRM) can be effective in controlling for confounding factors in a research study 27 . The generalized linear model (GLM) extends the linear regression model by widening the possible distribution of residuals to the exponential family, allowing the dependent variable to be non-normal 28 . In a GLM, confounding factors can be included as covariates in the model, along with the independent variables of interest, and the coefficients for the independent variables can then be estimated while controlling for the effects of the confounding factors. The quantile regression model (QRM) is more efficient and robust to outliers 29 . In a QRM, the focus is on estimating the conditional quantiles of the dependent variable rather than the mean, which is useful when the relationship between the independent and dependent variables is not well approximated by a linear relationship; the QRM can likewise estimate the conditional quantiles while controlling for the effects of the confounding factors. The GLM and QRM can both be expressed as given below:

$$Y_t = \sum_{n} a_{n,t} X_{n,t} \qquad (6)$$

where Y t is the predicted population of the target year t , X n,t is the n -th available population data product in the target year t , and a n,t is the weight coefficient of the n -th available population data product in the target year t .

Given that population counts should inherently be non-negative, we employ the L-BFGS-B (Limited-memory Broyden–Fletcher–Goldfarb–Shanno Bound-constrained) algorithm for parameter estimation within the model. The algorithm is a well-established optimization technique, often used in constrained optimization problems 30 . Specifically, we impose lower bounds on the estimated coefficients to ensure their non-negativity.

We trained the two regression models at the national level to obtain the parameters needed for producing the population data product. The model output was used as the coefficients of the linear regression prediction at the pixel scale. During the training process, we used 10-fold cross-validation and, on average, 200 iterations to obtain the optimal parameters.
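A minimal sketch of the bound-constrained estimation is given below. For simplicity it minimizes a squared-error objective (the GLM-style case); the paper’s QRM would instead minimize the quantile (pinball) loss, but the L-BFGS-B call with non-negativity bounds is the same. All names and the synthetic data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def fit_nonnegative_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Estimate weights a >= 0 so that X @ a approximates y.
    X: (n_units, n_products) product populations; y: census counts."""
    n = X.shape[1]

    def loss(a):
        r = X @ a - y
        return r @ r                       # squared-error objective

    def grad(a):
        return 2.0 * X.T @ (X @ a - y)

    res = minimize(loss, x0=np.full(n, 1.0 / n), jac=grad,
                   method="L-BFGS-B", bounds=[(0.0, None)] * n)
    return res.x                           # non-negative coefficients a_{n,t}

# Hypothetical: three products observed over five administrative units.
rng = np.random.default_rng(0)
X = rng.uniform(1e4, 1e6, size=(5, 3))
y = X @ np.array([0.5, 0.3, 0.2])
print(fit_nonnegative_weights(X, y))       # recovers ~[0.5, 0.3, 0.2]
```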

Population adjustment

For quality control, two steps were carried out to ensure the reliability of the GlobPOP dataset. We took the UN World Population Prospects 2022 as the reference standard, adjusting the model projections for each country to the UN’s national population statistics. We applied the adjustment to 217 countries, excluding uninhabited islands and territories.

Adjustment factors for matching national estimates to UN estimates:

$$a_t = \frac{P_{un,t}}{\sum_{x} P_{x,t}} \qquad (7)$$

where a t is the adjustment factor in the target year, P x,t is the pixel population count in the target year within the national administrative region, and P un,t is the UN national estimate for the target year.

Adjustment factors were applied at the pixel level within each country boundary:

$$P_{adj,t} = a_t \cdot P_{x,t} \qquad (8)$$

where P adj,t is the sub-national UN WPP-adjusted estimate, and P x,t and a t are as defined in Eq. ( 7 ).

Furthermore, the projected population for each year is checked for values below zero; any such values are set to zero to ensure that negative population numbers are not recorded.
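A compact sketch of the two adjustment steps (Eqs. (7) and (8), followed by the non-negativity clip) might look as follows; the pixel values and the UN total are hypothetical.

```python
import numpy as np

def un_adjust(pixel_pop: np.ndarray, un_total: float) -> np.ndarray:
    """Scale one country's pixel counts so that they sum to the UN WPP
    national total (Eqs. (7)-(8)), then clip negatives to zero."""
    a_t = un_total / pixel_pop.sum()       # adjustment factor a_t, Eq. (7)
    return np.maximum(a_t * pixel_pop, 0.0)

# Hypothetical pixels of one country, adjusted to a UN total of 300:
print(un_adjust(np.array([10.0, 250.0, 3.0, 37.0]), un_total=300.0))
```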

Accuracy validation

To assess the GlobPOP products fully and thoroughly, we employed validation in three aspects. Table  2 shows the accuracy indexes and their defining equations for the spatial and temporal validation in this study.

For spatial validation, we used four indicators (R 2 , RMSE, MAE, and relative entropy) to explore the overall accuracy in the 217 countries and the nine countries’ level-2 regions. The metric R-squared (R 2 ) represents the proportion of variance in the dependent variable explained by the model. The Root Mean Squared Error (RMSE) is a common measure of the quality of the model fit, and the Mean Absolute Error (MAE) is a common measure of the error between pairs of observations of the same phenomenon. In addition, relative entropy (RE) is used to measure the difference between the probability distributions of the predicted population counts and the census data.
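Given country-level vectors of census and predicted totals, the four spatial-validation indexes can be computed as in the sketch below; relative entropy is computed here as the KL divergence between the normalized distributions, which assumes strictly positive totals.

```python
import numpy as np

def validation_metrics(census: np.ndarray, pred: np.ndarray):
    """R^2, RMSE, MAE and relative entropy between census and predicted totals."""
    ss_res = np.sum((census - pred) ** 2)
    ss_tot = np.sum((census - census.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((census - pred) ** 2))
    mae = np.mean(np.abs(census - pred))
    p = census / census.sum()              # normalize to probability distributions
    q = pred / pred.sum()
    re = np.sum(p * np.log(p / q))         # relative entropy D_KL(p || q)
    return r2, rmse, mae, re

census = np.array([5.1e7, 8.3e6, 1.2e8, 6.7e5])   # hypothetical totals
pred = np.array([5.0e7, 8.5e6, 1.19e8, 7.0e5])
print(validation_metrics(census, pred))
```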

As for the temporal validation, time-series curve similarities and trend analysis were taken into consideration. We chose eight countries and their most populated or capital cities and performed the temporal validation at two levels. The Dynamic Time Warping (DTW) distance is a standard and popular method for measuring the similarity of time-series curves 31 . It aims to find the minimal cumulative distance between two time-series curves.
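A textbook dynamic-programming implementation of the DTW distance is sketched below, using the absolute difference as the local cost (the paper does not state its cost function, so this is an assumption).

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic Time Warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])          # local cost
            D[i, j] = cost + min(D[i - 1, j],        # insertion
                                 D[i, j - 1],        # deletion
                                 D[i - 1, j - 1])    # match
    return float(D[n, m])

# Identical curves give distance 0; a shifted curve gives a positive distance:
t = np.arange(2000, 2021, dtype=float)
print(dtw_distance(t, t), dtw_distance(t, t + 5.0))
```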

The Sen’s slope estimator and non-parametric Mann-Kendall test are widely used in the long time-series trend analysis for many fields, such as meteorology 32 , 33 , 34 . The Mann-Kendall test statistic can be expressed as given below:

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \mathrm{sgn}\left( X_j - X_k \right) \qquad (9)$$

$$\mathrm{Var}(S) = \frac{n(n-1)(2n+5)}{18} \qquad (10)$$

$$Z_s = \begin{cases} \dfrac{S-1}{\sqrt{\mathrm{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S+1}{\sqrt{\mathrm{Var}(S)}}, & S < 0 \end{cases} \qquad (11)$$

where X j and X k are the sequential data values, n is the length of the data series, and Z s is the normalized test statistic.

Then Sen’s slope estimator can be calculated using Eqs. ( 12 ) and ( 13 ):

$$d_k = \frac{X_j - X_i}{j - i}, \quad 1 \le i < j \le n \qquad (12)$$

$$Sen = \mathrm{median}\left( d_k \right) \qquad (13)$$

where d k is the value of the slope, and Sen is the Sen’s slope estimator.
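Both trend statistics are easy to compute directly; the sketch below uses the tie-free variance formula of Eq. (10) (a full implementation would add the correction for tied values) on a hypothetical pixel series.

```python
import numpy as np

def mann_kendall(x: np.ndarray):
    """Mann-Kendall S and normalized Z_s (Eqs. (9)-(11)), ignoring ties."""
    n = len(x)
    s = sum(np.sign(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    return s, z

def sens_slope(x: np.ndarray) -> float:
    """Sen's slope: median of all pairwise slopes (Eqs. (12)-(13))."""
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return float(np.median(slopes))

pop = np.array([100.0, 104.0, 103.0, 110.0, 115.0])  # hypothetical pixel series
print(mann_kendall(pop), sens_slope(pop))
```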

Data Records

The continuous global gridded population data product 35 (GlobPOP 1990–2020), in the WGS84 coordinate system with a spatial resolution of 30 arc-seconds (approximately 1 km at the equator), can be freely accessed on Zenodo at https://doi.org/10.5281/zenodo.10088105 . The data are stored in GeoTIFF format for each year. Two population formats are available: ‘Count’ (population count per grid cell) and ‘Density’ (population count per square kilometer in each grid cell). The current version of the product covers the globe from 90° N to 90° S latitude.

Each GeoTIFF filename has five fields separated by an underscore “_”, followed by the filename extension. The fields are described below with the example filename GlobPOP_Count_30arc_1990_I32:

Field 1: GlobPOP (Global gridded population)

Field 2: Pixel unit is population “Count” or population “Density”

Field 3: Spatial resolution is 30 arc seconds

Field 4: Year “1990”

Field 5: Data type is I32 (Int32) or F32 (Float32)
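Assuming the rasterio package (any GeoTIFF reader works) and a “.tif” extension, a small sketch for parsing the filename fields and reading one yearly grid might look like this:

```python
import rasterio  # assumed GeoTIFF reader

def parse_globpop_name(filename: str) -> dict:
    """Split e.g. 'GlobPOP_Count_30arc_1990_I32.tif' into its five fields."""
    stem = filename.rsplit(".", 1)[0]
    product, unit, resolution, year, dtype = stem.split("_")
    return {"product": product, "unit": unit,
            "resolution": resolution, "year": int(year), "dtype": dtype}

print(parse_globpop_name("GlobPOP_Count_30arc_1990_I32.tif"))

# Band 1 holds the 30 arc-second population values for that year:
with rasterio.open("GlobPOP_Count_30arc_1990_I32.tif") as src:
    pop_1990 = src.read(1)
```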

Technical Validation

Cluster results

The cluster analysis was performed to quantify the reliability of the current five global gridded population data products, represented by the similarity between actual census and product population counts at the country level. In Fig.  2 and Supplementary Table  2 , we provide explicit information on which global gridded population data products are not valid in a specific year for different countries, which can guide users on whether or not they should use these products in their study area of interest. Figure  2 shows that the numbers of trustworthy population data products are distributed unevenly across all 217 countries for the past three decades; it quantifies the reliability of these data products by indicating how many of them can be trusted for each country in a given year. As observed in Fig.  2 , the numbers vary across countries and years. The uneven distribution of valid datasets highlights that the reliability of these products fluctuates over time and is not uniform across all regions.

figure 2

The number of valid sets of population data products for 217 countries from 1990 to 2020.

The greater the valid number, the more product data are involved in the subsequent model training procedures. The three countries with the lowest number of valid products are India, Guadeloupe, and the Republic of Maldives. In total, 12 countries have at least one product set that is unreliable for one or more years in 1990–2020.

Spatial accuracy validation

Level-0 accuracy

The findings of this study reveal that GlobPOP has a high level of accuracy in predicting country-level population estimates, as shown in Table  3 . The overall R 2 of GlobPOP is greater than 0.999 when compared with the World Population Prospects 2022. The Root Mean Squared Error (RMSE) values ranged from 120,423 to 296,066, while the Mean Absolute Error (MAE) values ranged from 48,243 to 84,103. Additionally, the largest relative entropy was less than 0.1. During the model estimation process, the quantile regression model (QRM) exhibited stable performance and outperformed the generalized linear model (GLM) in terms of both predictive accuracy and consistency. Therefore, we selected the QRM as the population prediction model.

Level-2 accuracy

Table  4 demonstrates that the average R 2 is higher than 0.972 for all countries with available census data at the level-2 scale when compared with the corresponding level-2 census data. The RMSE values ranged from 11,158 to 272,229, while the MAE values ranged from 3,065 to 49,844. Moreover, the mean relative entropy was less than 3.406. These findings highlight the strong performance and accuracy of the population prediction model at the level-2 scale.

Temporal accuracy validation

Country-level accuracy

To validate the temporal accuracy of GlobPOP at the country level, we randomly selected eight countries from five continents, consisting of four developed countries (Japan (JPN), Germany (DEU), the United States (USA), and Portugal (PRT)) and four developing countries (China (CHN), Liberia (LBR), Guyana (GUY), and the Lebanese Republic (LBN)). These countries were chosen for their distinct population trends, representing a diverse range of demographic and socioeconomic characteristics. We compared the population count time-series curves of the GlobPOP dataset with the other five available datasets from 1990 to 2020. The results are presented in Fig.  3(a) . In the developed countries, the GlobPOP dataset shows the most consistent curve variations with the census curve, while the other datasets show obvious disparities with the census curve, especially in Germany.

figure 3

Comparison of the GlobPOP and the other datasets over the eight countries. (a) The population count time-series curves in eight countries from 1990 to 2020. (b) The population time-series curve DTW distances of the GlobPOP, LandScan, and WorldPop datasets in eight countries from 2000 to 2020.

It is worth mentioning that there are slight differences between the curves for Japan and Guyana in Fig.  3(a) , even though the curves’ trends match. This is due to the method used to calculate the national adjustment factor, which relies on a mask rasterized from a vector file. For small countries with long coastlines, some small coastal pixels were excluded during the rasterization process, resulting in a curve that does not exactly match the census data curve. This issue may have implications for the accuracy of the population estimates in these small countries, especially at a finer spatial resolution. To address it, our future studies will explore alternative methods for calculating the national adjustment factor that take into account the specific characteristics of small countries with long coastlines. Nonetheless, the overall results suggest that the population estimation models and products evaluated in this study can generate reliable population data at different spatial scales.

Furthermore, we computed the Dynamic Time Warping (DTW) distances between the population time-series curves of the three datasets from 2000 to 2020 in the same eight countries. The DTW distance represents the similarity between two time-series curves, with smaller distances indicating higher similarity. As presented in Fig.  3(b) , GlobPOP’s DTW distances are the smallest in all eight countries. For example, the GlobPOP dataset outperforms the others in Guyana and the Lebanese Republic, where the DTW distances of WorldPop and LandScan are roughly six times larger than GlobPOP’s. The results reveal a large disparity in population change from 2000 to 2020 for WorldPop and LandScan compared with the census data in both countries. These comparisons provide evidence of the high temporal accuracy of the GlobPOP dataset, which consistently outperforms the other datasets across all eight countries, regardless of whether they are classified as developed or developing.

City-level accuracy

More importantly, to validate the temporal accuracy of GlobPOP at the city level, we focused on the most populated or capital cities of the above eight countries. Through trend analysis and exploration of pixel population count curve variations, we aimed to examine the GlobPOP dataset’s performance in capturing population dynamics at the local scale. Specifically, Fig.  4(a),(c),(e),(g) presents the pixel population count curves with both positive and negative slopes, with the curve trends consistently aligned with the trend analysis results.

figure 4

The temporal population trend analysis with significant slopes and pixel population curve variations in eight cities. (a) Tokyo in Japan. (b) Beijing in China. (c) Berlin in Germany. (d) Beirut in the Lebanese Republic. (e) New York in the United States. (f) Monrovia in Liberia. (g) Lisbon in Portugal. (h) Georgetown in Guyana.

Nonetheless, in the cities of developing countries, as Fig.  4(b),(d),(f),(h) shows, the curve fluctuations of pixels are significantly different, particularly in smaller cities such as Beirut in Fig.  4(d) , where there is a clear discontinuity in pixels showing significant growth or decline trends from 2015 to 2020. This phenomenon is caused by the QRM model assigning more weight to LandScan since 2016, making the population distribution of GlobPOP more similar to that of LandScan. Because LandScan is defined as a nighttime population rather than the residential population, it is more realistic in terms of spatial detail but fundamentally different from the other population data products. As a result, the spatial distribution of GlobPOP over the last five years at a finer scale is somewhat inconsistent with earlier years, and further calibration is needed to adjust the parameters of the model.

Spatial distributions

Figure  5 provides a comprehensive overview of global population development over the past three decades. The number of pixels with populations higher than 5,000 has increased significantly in India, China, western Europe, the eastern and southern United States, and sub-Saharan Africa since 1990. As Fig.  5(d) shows, pixels with population counts ranging from five to fifty diminish while pixels with populations of no more than five increase, making it appear as if the population has decreased in these areas. The observed phenomenon can be attributed to the changes in the weighting of the QRM model towards LandScan since 2016, as shown in Supplementary Table  3 . This has resulted in a greater resemblance between the population distributions of the GlobPOP and LandScan datasets. While LandScan provides a more detailed representation of the nighttime population, it differs significantly from other population data products because it is defined as the nighttime rather than the residential population. Consequently, there is a certain degree of inconsistency in the spatial distribution of GlobPOP at a finer scale over the past five years as compared to previous years. Further calibration of the model parameters, necessary to reconcile this disparity, will be considered in following work.

figure 5

The global gridded population distribution from 1990 to 2020. (a) Global population distribution in 1990. (b) Global population distribution in 2000. (c) Global population distribution in 2010. (d) Global population distribution in 2020.

Benchmark test

A benchmark test was performed to evaluate the performance of three population fusion models, namely the QRM, GLM, and a median-composite model, along with the five global gridded population data products. The objective was to compare the models and population data products for the year 2000, the only year in which all five datasets were available in their entirety; other years were unsuitable for benchmark tests because the population data products were interpolated. Figures  6 and 7 display the population count distributions after log10 transformation and accuracy comparisons for the five population data products and the three model-predicted populations at the level-0 and level-2 scales, respectively. The results show that the QRM model performed better than the other two models at the finer scale, with an R-squared value of 0.9963. The QRM model maintains high accuracy at the level-0 scale as well, with an R-squared value of 0.9997, similar to the performance of the GLM model. Based on these results, the QRM model was selected as the final population estimation model for this study.

figure 6

Level-0 population count notched boxplots with data points after log10 transformation, and accuracy comparisons for five population data products and three different population prediction models in 2000.

figure 7

Level-2 population count notched boxplots with data points after log10 transformation, and accuracy comparisons for five population data products and three different population prediction models in 2000.

In summary, the QRM model demonstrates the best performance among the three population fusion models and the existing five population data products. The high accuracy of the QRM model at the level-0 scale also makes it a reliable choice for population estimation.

With a spatial resolution of 30 arc-seconds, GlobPOP provides a more detailed population distribution than conventional census data. The spatial validation results demonstrate the effectiveness of the GlobPOP model in generating reliable and precise population estimates at the level-0 and level-2 scales. We also investigated the reliability of GlobPOP in estimating population in rarely populated land cover areas at the pixel scale; five different land cover types (cropland, forest, wetland, desert, and snow) were selected to test the data. As Fig.  8 shows, GlobPOP performs better at capturing population distribution in cropland than the other products, while its performance is equivalent to the other products for the remaining land cover types. Since real land surface reference data are not available, and land cover/use products typically carry their own uncertainty and bias, there is a lack of reference data for spatial validation of gridded population data at the pixel level. The selected sample areas include five different land cover types, and we believe the visual inspection demonstrates the reliability of GlobPOP to some degree.

figure 8

Examples of population distribution at the pixel level and the corresponding Google Earth images in 2020. (a) Farmland in western China. (b) Forest in northern China. (c) The Sahara Desert in Africa. (d) Snow mountain in west-eastern China. (e) Pantanal wetland in South America.

Moreover, to analyze changes in population distributions and to support long time-series analysis, a data product constructed from data layers representing the relevant period is preferable, yet no continuous global gridded population dataset at approximately 1 km has covered the past three decades. The temporal validation results demonstrate that the GlobPOP dataset performs consistently well across all eight countries, despite their unique population dynamics, and its ability to capture population dynamics at the local scale is also proven. The two-level temporal validation underscores the reliability and versatility of the population prediction model in generating accurate and consistent population estimates over time. Nonetheless, we are obliged to emphasize the disparity in the GlobPOP dataset before and after 2016: the regression model relies on coefficients trained from the cluster results, which assigned more weight to LandScan from 2016 onwards. Further calibration of the model parameters, necessary to reconcile this disparity, will be considered in following work.

Usage Notes

The input datasets and census data are all available on their official websites 36 , 37 , 38 , 39 , 40 , 41 . The programs used to generate and validate the gridded population dataset were GRASS GIS (8.2), Python (3.9) and RStudio (2022.07.2). The zonal statistics were performed in QGIS (3.22). All software was run on Windows 10.

Code availability

The fully reproducible codes are publicly available at GitHub ( https://github.com/lulingliu/GlobPOP ).

UN. World Population Prospects 2022. (United Nations, Department of Economic and Social Affairs, Population Division, 2022).

UN. Transforming our World: The 2030 Agenda for Sustainable Development. (United Nations, Department of Economic and Social Affairs, 2015).

Khavari, B., Sahlberg, A., Usher, W., Korkovelos, A. & Fuso Nerini, F. The effects of population aggregation in geospatial electrification planning. Energy Strategy Reviews. 38 , 100752 (2021).


Leyk, S. et al . The spatial allocation of population: a review of large-scale gridded population data products and their fitness for use. ESSD. 11 , 1385–1409 (2019).


Batista E Silva, F. et al . Uncovering temporal changes in Europe’s population density patterns using a data fusion approach. Nat Commun. 11 , 4631 (2020).


Linard, C. & Tatem, A. J. Large-scale spatial population databases in infectious disease research. Int J Health Geogr. 11 , 7 (2012).


Berger, L. Leave No One Off The Map: a guide for gridded population data for sustainable development. (United Nations, Sustainable Development Solutions Network (SDSN), 2020).

Qiu, Y., Zhao, X., Fan, D., Li, S. & Zhao, Y. Disaggregating population data for assessing progress of SDGs: methods and applications. International Journal of Digital Earth. 15 , 2–29 (2022).


MacManus, K., Balk, D., Engin, H., McGranahan, G. & Inman, R. Estimating population and urban areas at risk of coastal hazards, 1990–2015: how data choices matter. ESSD. 13 , 5747–5801 (2021).

Tellman, B. et al . Satellite imaging reveals increased proportion of population exposed to floods. Nature. 596 , 80–86 (2021).


Wu, S., Qiu, X. & Wang, L. Population Estimation Methods in GIS and Remote Sensing: A Review. GIScience & Remote Sensing. 42 , 80–96 (2005).

Balk, D. L. et al . Determining Global Population Distribution: Methods, Applications and Data. Advances in Parasitology. 62 , 119–156 (2006).


Lloyd, C. T. et al . Global spatio-temporally harmonised datasets for producing high-resolution gridded population distribution datasets. Big Earth Data. 3 , 108–139 (2019).

Archila Bustos, M. F., Hall, O., Niedomysl, T. & Ernstson, U. A pixel level evaluation of five multitemporal global gridded population datasets: a case study in Sweden, 1990–2015. Popul Environ. 42 , 255–277 (2020).

Matthews, S. A. et al . Looking Back, Looking Forward: Progress and Prospect for Spatial Demography. Spat Demogr. 9 , 1–29 (2021).

Kuffer, M., Owusu, M., Oliveira, L., Sliuzas, R. & van Rijn, F. The Missing Millions in Maps: Exploring Causes of Uncertainties in Global Gridded Population Datasets. ISPRS International Journal of Geo-Information. 11 , 403 (2022).

Reed, F. J. et al . Gridded Population Maps Informed by Different Built Settlement Products. Data. 3 , 33 (2018).

Zhang, X. et al . GLC_FCS30: global land-cover product with fine classification system at 30 m using time-series Landsat imagery. ESSD. 13 , 2753–2776 (2021).

Zhao, C., Cao, X., Chen, X. & Cui, X. A consistent and corrected nighttime light dataset (CCNL 1992–2013) from DMSP-OLS data. Sci Data. 9 , 424 (2022).

Chen, R., Yan, H., Liu, F., Du, W. & Yang, Y. Multiple Global Population Datasets: Differences and Spatial Distribution Characteristics. ISPRS International Journal of Geo-Information. 9 , 637 (2020).

Documentation for the Gridded Population of the World, Version 4 (GPWv4), Revision 11 Data Set. (Center for International Earth Science Information Network (CIESIN), Columbia University, 2018).

Doxsey-Whitfield, E. et al . Taking Advantage of the Improved Availability of Census Data: A First Look at the Gridded Population of the World, Version 4. Papers in Applied Geography. 1 , 226–234 (2015).

Deichmann, U., Street, H., Balk, D. & Yetman, G. Transforming Population Data for Interdisciplinary Usages: From census to grid. (Center for International Earth Science Information Network (CIESIN), Columbia University, 2001).

Freire S., MacManus K., Pesaresi M., Doxsey-Whitfield E., Mills J. Development of new open and free multi-temporal global population grids at 250 m resolution. (Geospatial Data in a Changing World; Association of Geographic Information Laboratories in Europe (AGILE), 2016).

Rose, A. N. & Bright, E. The LandScan Global Population Distribution Project: Current State of the Art and Prospective Innovation. (Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 2014).

Likas, A., Vlassis, N. & Verbeek, J. J. The global k-means clustering algorithm. Pattern Recognition. 36 , 451–461 (2003).

Sayegh, A. S., Munir, S. & Habeebullah, T. M. Comparing the Performance of Statistical Models for Predicting PM10 Concentrations. Aerosol Air Qual. Res. 14 , 653–665 (2014).


Coxe, S., West, S. G. & Aiken, L. S. Generalized linear models. in The Oxford Handbook of Quantitative Methods Vol. 2: Statistical Analysis (ed. Little, T. D.) Ch. 3 (Oxford Univ. Press, 2013).

Hao, L. & Naiman, D. Q. Quantile Regression. (SAGE, 2007).

Byrd, R. H., Lu, P., Nocedal, J. & Zhu, C. A Limited Memory Algorithm for Bound Constrained Optimization. SIAM J. Sci. Comput. 16 , 1190–1208 (1995).


Guan, X., Huang, C., Liu, G., Meng, X. & Liu, Q. Mapping Rice Cropping Systems in Vietnam Using an NDVI-Based Time-Series Similarity Measurement Based on DTW Distance. Remote Sensing. 8 , 19 (2016).

Gocic, M. & Trajkovic, S. Analysis of changes in meteorological variables using Mann-Kendall and Sen’s slope estimator statistical tests in Serbia. Global and Planetary Change. 100 , 172–182 (2013).

Gilbert, R. O. Statistical Methods for Environmental Pollution Monitoring . (John Wiley & Sons, 1987).

Sen, P. K. Estimates of the Regression Coefficient Based on Kendall’s Tau. Journal of the American Statistical Association . 63 , 1379–1389 (1968).

Liu, L., Cao, X., Li, S. & Jie, N. GlobPOP: A 31-year (1990-2020) global gridded population dataset generated by cluster analysis and statistical learning. Zenodo https://doi.org/10.5281/zenodo.10088105 .(2023)

Schiavina, M., Freire, S., MacManus, K. GHS population grid multitemporal (1975-1990-2000-2015), R2019A. European Commission, Joint Research Centre (JRC). https://doi.org/10.2905/0C6B9751-A71F-4062-830B-43C9F432370F (2019).

Center For International Earth Science Information Network-CIESIN-Columbia University, International Food Policy Research Institute-IFPRI, The World Bank & Centro Internacional De Agricultura Tropical-CIAT. Global Rural-Urban Mapping Project, Version 1 (GRUMPv1): Population Density Grid. https://doi.org/10.7927/H4R20Z93 (2011).

Center For International Earth Science Information Network-CIESIN-Columbia University. Gridded Population of the World, Version 4 (GPWv4): Population Density, Revision 11. https://doi.org/10.7927/H49C6VHW (2018).

Rose, A., et al LandScan Global 2020. Oak Ridge National Laboratory . https://doi.org/10.48690/1523378 (2021).

WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University. Global High Resolution Population Denominators Project. https://doi.org/10.5258/SOTON/WP00647 (2018).

Center For International Earth Science Information Network-CIESIN-Columbia University. Gridded Population of the World, Version 4 (GPWv4): Land and Water Area, Revision 11. https://doi.org/10.7927/H4Z60M4Z (2018).

Download references

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 42192584 and 42371334) and Open Fund of State Key Laboratory of Remote Sensing Science and Beijing Engineering Research Center for Global Land Remote Sensing Products (Grant No. OF202316).

Author information

Authors and Affiliations

State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing, 100875, China

Luling Liu, Xin Cao, Shijie Li & Na Jie

Beijing Engineering Research Center for Global Land Remote Sensing Products, Faculty of Geographical Science, Beijing Normal University, Beijing, 100875, China


Contributions

XC conceived the research. XC, LL, SL and NJ designed the experiments, and LL carried out the experiments. LL prepared the manuscript. All authors contributed to manuscript discussion and revision.

Corresponding author

Correspondence to Xin Cao .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Liu, L., Cao, X., Li, S. et al. A 31-year (1990–2020) global gridded population dataset generated by cluster analysis and statistical learning. Sci Data 11 , 124 (2024). https://doi.org/10.1038/s41597-024-02913-0

Download citation

Received : 08 June 2023

Accepted : 02 January 2024

Published : 24 January 2024

DOI : https://doi.org/10.1038/s41597-024-02913-0



Discover everything you need to know about your competitors and prospects in one report

View all Company Profiles or identify market leaders with our Advanced Search

See your Competitor from every angle

We offer a 360-degree view of your competitor ecosystem - unearth hidden sentiments across deals, jobs, news, filings and more, at the click of a button

Indispensable Sales Intelligence

Access everything you need to know about your prospects with GlobalData Company Profiles

Deals Data Goldmine Unleashed

Minimize the time spent on research to identify strategic partnerships for your business

Gain unrivalled insight. Take decisive action.

How our Company Profiles help you secure strategic advantage:

Stay ahead of competitors

Uncover strategies, investments, financials, hiring activities and other critical signals. Then take action to stay on top

De-risk sales opportunities

Address pain points directly and turn prospects into deals.

Innovate faster with unique data signals

Be the first to anticipate strategic moves with our proprietary datasets and untapped pools of intelligence.

Never miss a move

Track in real-time all the latest company deals, patents, filings, jobs, news, triggers and alerts in one place.

Save hours of research

Trust our gold standard intelligence and eliminate time-consuming research. Focus instead on sharing actionable insights with your team

Exploit weaknesses

Identify competitor or prospect weaknesses to better position your business and exploit opportunities


Discover the gold standard of company intelligence

Identify market leaders with our Advanced Search, or alternatively browse our Top Lists and Company Bundles

  • Global Tech Giants
  • Wearable Tech
  • Artificial Intelligence


Discover the quality of our company profile reports

Save up to 20% on Multi-Company Profile Bundles

Get in touch to talk to us about our exclusive offers. Email [email protected] or call us on +44 (2) 20 7947 2960.



Global Forecast System (GFS)

The Global Forecast System (GFS) is a National Centers for Environmental Prediction (NCEP) weather forecast model that generates data for dozens of atmospheric and land-soil variables, including temperatures, winds, precipitation, soil moisture, and atmospheric ozone concentration. The system couples four separate models (atmosphere, ocean, land/soil, and sea ice) that work together to accurately depict weather conditions.


Access Methods

GFS Data is available through a variety of access methods and formats.

The NOAA Big Data Program also provides access to gridded 0.25°- and 0.5°-resolution analysis and forecast data in a trailing 30-day window in the AWS Open Data Registry for GFS.
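
As a quick illustration of this access path, the sketch below anonymously lists a few GRIB2 objects from the AWS Open Data bucket. The bucket name noaa-gfs-bdp-pds and the gfs.YYYYMMDD/HH/atmos/ prefix layout are assumptions about the registry's current conventions and should be verified against the AWS Open Data Registry entry for GFS.

```python
# Minimal sketch: anonymously list GFS GRIB2 files on AWS Open Data.
# Bucket name and prefix layout are assumptions; verify against the
# AWS Open Data Registry entry for GFS before relying on them.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))  # no AWS account needed
resp = s3.list_objects_v2(
    Bucket="noaa-gfs-bdp-pds",        # assumed bucket name
    Prefix="gfs.20240101/00/atmos/",  # assumed cycle/path layout
    MaxKeys=10,
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```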

GFS Parameter Sets

GFS Analysis and Forecasts

NCEI provides access to the following gridded analysis and forecast data from GFS.

GFS Analysis (GFS-ANL)

GFS Forecasts (GFS model)

GFS Forecasts (GFS-AVN model)

Before January 2003, the GFS was split into the GFS Aviation (AVN) and GFS Medium Range Forecast (MRF) models. AVN and MRF products are a collection from NCEP's NOAAPort. Grids, domains, run frequencies, and output frequencies have changed over the years.

Specifications

The model is constantly evolving and is regularly adjusted to improve performance and forecast accuracy. GFS is a global model with a base horizontal resolution of 18 miles (28 kilometers) between grid points, and its temporal resolution covers analyses and forecasts out to 16 days. Horizontal resolution drops to 44 miles (70 kilometers) between grid points for forecasts between one and two weeks.

More Information

  • GFS home page
  • WMO Headers


Thales Blog

Data Security Trends: 2024 Report Analysis

March 25, 2024

Todd Moore

Amid ongoing economic uncertainty and a progressively complex threat landscape, businesses are trying to navigate increasingly stringent regulatory requirements while bolstering their security posture.

The 2024 Thales Global Data Threat Report , conducted by S&P Global Market Intelligence, which surveyed almost 3,000 respondents from 18 countries and 37 industries, revealed how decision-makers navigate new threats while trying to overcome old challenges. The report explores their experiences, hurdles, approaches, and achievements and offers insights into the security implications of new technologies and the organizational adaptations necessary for future success.

2024 Data Threat Report

Compliance and Residency Are Key

The study revealed that although risk is volatile and cyber regulations constantly change, nearly half (43%) of businesses did not pass a compliance audit in the past year. Among those failing audits, 31% suffered a breach in the same period, compared to a mere 3% among compliant businesses. This highlights a significant link between compliance adherence and data security.

Challenges also persist in managing operational complexity, leading to data-related issues. A substantial number of organizations struggle to identify and classify their at-risk systems, applications, and data, with only a third (33%) achieving full classification. Alarmingly, 16% admitted to hardly classifying any of their data.

The prevalence of multi-cloud usage across services, along with evolving global data privacy regulations, has underscored the importance of data sovereignty for businesses. According to the report, 28% of respondents consider mandatory external key management the primary method for achieving sovereignty.

A Matter of Trust

The report also revealed that most customers (89%) are willing to share their data with organizations, but this willingness comes with certain non-negotiable conditions. Nearly nine out of ten (87%) expect some level of privacy rights from the companies they engage with online. In addition to these high consumer privacy expectations, respondents highlighted that many customers access their organization's internal systems or assets: up to 16% of those accessing corporate cloud, network, and device resources could be customers.

Similarly, external vendor and contractor access accounted for an average of 15% and 12% of users, respectively. Given the combination of heightened consumer privacy expectations and extensive external user access, Customer Identity and Access Management ( CIAM ) emerged as one of the primary emerging security concerns.

However, while CIAM improvements such as passkeys and password deprecation enhance the user experience, they also introduce new challenges, such as deepfake attacks powered by generative AI. Simplifying this complexity is crucial to reducing opportunities for adversaries while improving usability and engagement.

Emerging Tech: Threats and Opportunities

The report also delved into the emerging technologies that security practitioners are eyeing. More than half (57%) cited Artificial Intelligence (AI) as a major worry, with IoT hot on its heels with 55%. Next came Post Quantum Cryptography with 45%.

Having said that, these technologies also promise a host of benefits. Some 22% of respondents said they were planning to integrate generative artificial intelligence (GenAI) into their security solutions and services over the next year, and another third (33%) plan to experiment with the technology.

Ubiquitous Connectivity, Pervasive Threats

In the era of ubiquitous connectivity, IoT and 5G bring about pervasive threats too. While operational technology (OT) deployments have been criticized for their lax security focus, this year's survey reveals that 75% of IT security teams prioritize OT as a defense against IoT threats.

OT devices like power meters and "smart" sensors in various distributed physical plants are often designed for minimal oversight and reduced operational costs, exacerbating security risks. This means proactive security measures are essential. Despite the increasing connectivity options, traditional methods like physical or network isolation ("air gapping") are less favored for securing IoT/OT environments.

Reflecting zero-trust principles, respondents show reluctance to rely solely on carrier security, with only 33% expressing concern about carrier network security in the context of 5G. However, IoT and OT devices face persistent security challenges.

Establishing Centrally Defined Principles

As enterprises expand, so too will their use and integration of these technologies. This is why establishing centrally defined security principles can improve the likelihood of successful delegation and implementation, mainly when rooted in the fundamental concepts of guidance and agreement.

Like how the rule of law thrives in societies where individuals and institutions understand their rights and obligations, enterprise data security risks can be mitigated by empowering and entrusting other stakeholders to adhere to these principles voluntarily.

Download the full 2024 Thales Data Threat Report now.


NCEP FNL Operational Model Global Tropospheric Analyses, continuing from July 1999

DOI: 10.5065/D6M043C6

These NCEP FNL (Final) Operational Global Analysis data are on 1-degree by 1-degree grids prepared operationally every six hours. This product is from the Global Data Assimilation System (GDAS), which continuously collects observational data from the Global Telecommunications System (GTS), and other sources, for many analyses. The FNLs are made with the same model which NCEP uses in the Global Forecast System (GFS), but the FNLs are prepared about an hour or so after the GFS is initialized. The FNLs are delayed so that more observational data can be used. The GFS is run earlier in support of time critical forecast needs, and uses the FNL from the previous 6 hour cycle as part of its initialization.

The analyses are available on the surface, at 26 mandatory (and other pressure) levels from 1000 millibars to 10 millibars, in the surface boundary layer and at some sigma layers, the tropopause and a few others. Parameters include surface pressure, sea level pressure, geopotential height, temperature, sea surface temperature, soil values, ice cover, relative humidity, u- and v- winds, vertical motion, vorticity and ozone.
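
For readers who want to inspect these fields programmatically, the sketch below opens a single FNL GRIB2 file with xarray and the cfgrib engine. The filename is hypothetical, and the variable and coordinate names follow cfgrib's usual conventions (t, isobaricInhPa) but should be checked against the actual file.

```python
# Minimal sketch: open a locally downloaded NCEP FNL GRIB2 file and
# select the 500 hPa temperature field. The filename is hypothetical;
# variable/coordinate names follow cfgrib conventions and may differ.
import xarray as xr

ds = xr.open_dataset(
    "fnl_20240101_00_00.grib2",  # hypothetical local file
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "isobaricInhPa"}},
)
t500 = ds["t"].sel(isobaricInhPa=500)  # temperature on the 500 hPa surface
print(t500)
```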

The archive time series is continuously extended to a near-current date. It is not maintained in real-time.


Landsat Analysis Ready Data (GLAD ARD) and GLAD Tools

GLAD Landsat ARD Tools

The Landsat Analysis Ready Data (ARD) created by the Global Land Analysis and Discovery (GLAD) team serves as a spatially and temporally consistent input for land cover mapping and change detection at global to local scales. The GLAD ARD represents a 16-day time series of globally consistent, tiled Landsat normalized surface reflectance from 1997 to the present, operationally updated every 16 days. Accessible through a dedicated Application Programming Interface (API) without any associated charges, the GLAD ARD imposes no restrictions on subsequent redistribution or use, provided proper citation is given following the Creative Commons Attribution License (CC BY).

In addition to the ARD dataset, the GLAD team has developed the GLAD Tools software suite. This suite enables users to apply time-series data processing, spectral and temporal data analysis, and machine-learning characterization to the GLAD ARD. The GLAD Tools User’s Manual supports users in applying the latest version of the software for national and regional land cover characterization, change assessment, and area reporting, incorporating state-of-the-art mapping and sample analysis techniques. Together, the global GLAD ARD and GLAD Tools provide an end-to-end solution for no-cost Landsat-based natural resource assessment and monitoring at national to global scales.

GLAD ARD Data Format

The GLAD ARD represents a 16-day time series of globally consistent, tiled Landsat normalized surface reflectance and brightness temperature. Within each 16-day interval, the GLAD ARD preserves Landsat observations with minimal cloud and shadow contamination. These data composites are stored as 8-band, 16-bit unsigned, LZW-compressed GeoTIFF files in geographic coordinates, featuring a spatial resolution of 0.00025 degrees per pixel, equivalent to 27.83 meters per pixel at the Equator. The product uses the World Geodetic System WGS84 (EPSG:4326).
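
A downloaded composite can be inspected with standard GeoTIFF tooling. The sketch below uses rasterio and a hypothetical local filename; the 8-band layout itself is documented in the GLAD ARD reference materials and is not restated here.

```python
# Minimal sketch: inspect a GLAD ARD 16-day composite with rasterio.
# The filename is hypothetical; see the GLAD ARD documentation for
# the meaning of each of the 8 bands.
import rasterio

with rasterio.open("105E_13N_920.tif") as src:  # hypothetical local file
    print(src.count, src.dtypes[0], src.crs)    # expect 8 bands, uint16, EPSG:4326
    band1 = src.read(1)                         # first band as a numpy array
    print(band1.shape)                          # expect (4004, 4004)
```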

The global Landsat ARD product is organized into 1x1 geographic degree tiles, each comprising 4004x4004 pixels with a 2-pixel overlap (at 0.00025 degrees per pixel, 1 degree spans 4,000 pixels, and the overlap adds 2 pixels on each side). Tile nomenclature is derived from the tile center's integer value in degrees, and the global GLAD ARD tile database is accessible online. Landsat image data collected in a 16-day interval are consolidated into a single ARD composite. A year encompasses 23 composites (GeoTIFF files), each assigned a unique numeric ID counted consecutively from the first composite in 1980. The 16-day interval ID table is available online.
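
The numbering scheme implies a simple mapping between calendar years and interval IDs. The helpers below are a hedged sketch of that arithmetic, assuming exactly 23 consecutively numbered intervals per year starting with interval 1 in 1980; verify results against the official interval ID table.

```python
# Hedged helpers for GLAD ARD 16-day interval IDs, assuming 23
# intervals per year numbered consecutively from interval 1 in 1980.
# Always verify against the official GLAD interval ID table.
def interval_id(year: int, interval_in_year: int) -> int:
    assert 1 <= interval_in_year <= 23
    return (year - 1980) * 23 + interval_in_year

def year_and_interval(iid: int) -> tuple:
    years, i = divmod(iid - 1, 23)
    return 1980 + years, i + 1

print(interval_id(2019, 23))   # -> 920 under these assumptions
print(year_and_interval(920))  # -> (2019, 23)
```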

In 2022, the GLAD team updated the ARD product using Landsat Collection 2 data. Collection 2 offers enhanced geometric accuracy and radiometric calibration compared to Collection 1 products. The entire ARD dataset from 1997 to the present has undergone this update, resulting in the decommissioning of the ARD Collection 1-based product.


GLAD ARD Data Access

The global Landsat ARD is available for download using a dedicated API which provides access to 16-day tiled composites (GeoTIFF files). A user must select tiles and 16-day intervals for ARD download from the global metadata. The GLAD Tools software includes tools for automated ARD data download; see the GLAD Tools User's Manual for data download and organization instructions.

To access an individual 16-day composite from GLAD cloud storage, a user may implement API commands in cURL or Wget command-line utilities.

Example of an API data download command for a single ARD tile (105E_13N) and a single interval (920), using the placeholders defined below (a hedged Python equivalent follows the list):

  • <username> and <password> - the default API credentials are: Username: glad; Password: ardpas.
  • <tile> - the ARD tile name.
  • <lat> - tile latitude, the second half of the ARD tile name (e.g., for 105E_13N, it is 13N).
  • <interval> - the 16-day interval ID (e.g., 920).
  • <outfolder> - output path.
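
A minimal Python equivalent of the cURL/Wget call described above is sketched below. The URL pattern is an assumption reconstructed from the documented placeholders, not a confirmed endpoint; consult the GLAD ARD API documentation for the authoritative form.

```python
# Hedged sketch of a GLAD ARD composite download. The URL pattern is
# an assumption reconstructed from the placeholders above; check the
# GLAD ARD API documentation for the exact form.
import requests

tile, lat, interval = "105E_13N", "13N", 920
outfolder = "."
url = f"https://glad.umd.edu/dataset/glad_ard2/{lat}/{tile}/{interval}.tif"  # assumed pattern

resp = requests.get(url, auth=("glad", "ardpas"), timeout=600)  # default API credentials
resp.raise_for_status()
with open(f"{outfolder}/{tile}_{interval}.tif", "wb") as f:
    f.write(resp.content)
```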

Alternatively, GLAD ARD can be accessed from the Amazon Web Services (AWS) S3 cloud storage. Presently, only a portion of the global ARD data is available on AWS S3. New data may be added upon request to Peter Potapov. ARD data on AWS S3 is provided as public access and is free to download or use in the cloud. The path to the ARD data from S3 is:

GLAD Tools is a collection of freeware utilities enabling advanced ARD data analysis and land cover and land use change assessment, including:

  • Landsat ARD Data Management: Facilitates the download and efficient management of Landsat Analysis Ready Data (ARD).
  • Annual Image Composites: Enables the creation of cloud-free annual image composites and multitemporal metrics for land cover mapping and change detection.
  • Supervised Land Cover Classification: Employs advanced machine learning tools for accurate supervised land cover classification and change detection.
  • Map Analysis: Supports map analysis through image algebra modeling, focal analysis, and precise area calculation.
  • Sample Reference Data Collection: Facilitates the collection of sample reference data and their analysis to estimate map accuracy, unbiased area, and area estimation uncertainty.

The combined capabilities of the global GLAD ARD and GLAD Tools offer a comprehensive, no-cost solution for national and regional users engaged in Landsat-based forest and land cover/land use monitoring. The GLAD Tools and ARD were successfully tested in many countries and regions and were adopted for forest and land cover monitoring at the national scale.

The distributed version of GLAD Tools is designed for the Windows 10 and 11 operating systems. The minimum PC requirements are 8GB RAM and at least 50GB of HDD space. The GLAD Tools are built using the MinGW and GDAL software packages. The Tools require the user to install several open-access software packages (Strawberry Perl, QGIS/OSGeo4W, R, and Google Earth Desktop) before software installation, and installation requires administrative privileges on the Windows system. Please follow the GLAD Tools installation instructions in the User's Manual to set up the Tools.

The GLAD Tools are periodically updated, and we recommend checking for and installing updates regularly. Users also need to keep the dependencies (listed in a text file within the software folder) up to date, for example when a new version of QGIS or R is installed.

The latest version of the GLAD Tools is available here.

GLAD Tools User's Manual

The GLAD Team prepared two versions of the User’s Manual. The short Quick Start Guide is the best way to start working with GLAD Tools. It includes instructions for most of the tools and guides a new user through Tools applications using templates provided with the software. The Quick Start Guide is available in the following languages:

  • Español (Translated by Ángela Hernández Moreno, CIEP)
  • Le français (Translated by Patrick Lola Amani, GLAD)
  • Việt (Translated by Vo Viet Cuong, SilvaCarbon)

A complete User’s Manual includes information on all GLAD Tools functions and provides a detailed reference for the ARD data processing and properties. This document is available only in English.

Dataset and Software License

The GLAD Landsat Analysis Ready Data (ARD) data is available online, with no charges for access and no restrictions on subsequent redistribution or use, as long as the proper citation is provided as specified by the Creative Commons Attribution License (CC BY).

The GLAD Tools are likewise available with no charges and no restrictions on subsequent redistribution or use, provided proper citation is given as specified by the Creative Commons Attribution License (CC BY).

Copyright © Global Land Analysis and Discovery Team, University of Maryland

Suggested citation:

Potapov, P., Hansen, M.C., Kommareddy, I., Kommareddy, A., Turubanova, S., Pickens, A., Adusei, B., Tyukavina A., and Ying, Q., 2020. Landsat analysis ready data for global land cover and land cover change mapping. Remote Sens. 2020, 12, 426; doi:10.3390/rs12030426

While the GLAD team makes every effort to ensure the completeness and consistency of the Landsat ARD product, we acknowledge that it may contain faults and unreadable data. We ask that you notify us immediately of any problems with our data. We will make every effort to correct them.

Concerning the Landsat ARD data and GLAD Tools provided through this service, the GLAD team disclaims any warranties, whether express or implied, including but not limited to the warranties of merchantability and fitness for a particular purpose. Additionally, the team assumes no legal liability or responsibility for the accuracy, completeness, or usefulness of the data products.

While the GLAD team is dedicated to ongoing product updates and maintaining open data access, it is important to note that the continuity of this service is subject to the availability of adequate funding and resources. Consequently, interruptions or cancellations of this service may occur at any time without prior notice.

For all questions and comments, please contact Peter Potapov.


NCEP GDAS/FNL 0.25 Degree Global Tropospheric Analyses and Forecast Grids

These NCEP FNL (Final) operational global analysis and forecast data are on 0.25-degree by 0.25-degree grids prepared operationally every six hours. This product is from the Global Data Assimilation System (GDAS), which continuously collects observational data from the Global Telecommunications System (GTS), and other sources, for many analyses. The FNLs are made with the same model which NCEP uses in the Global Forecast System (GFS), but the FNLs are prepared about an hour or so after the GFS is initialized. The FNLs are delayed so that more observational data can be used. The GFS is run earlier in support of time critical forecast needs, and uses the FNL from the previous 6 hour cycle as part of its initialization.

The available levels and parameters match those of the 1-degree FNL product described above, and the archive is likewise continuously extended to a near-current date rather than maintained in real time.

To Access Resource:

  • Resource homepage (DOI): https://doi.org/10.5065/D65Q4T4Z

Questions? Email Resource Support Contact:

Riley Conroy [email protected] UCAR/NCAR - Research Data Archive



Global Connectedness Report shows why globalization remains strong despite turbulent times

In times marked by severe conflict, questions about the role of globalization continue to flourish. But the latest edition of the DHL Global Connectedness Report, released in partnership with New York University’s Stern School of Business , unveils a remarkable finding: Globalization reached a record high in 2022 and has remained near that level in 2023.

2024 DHL Global Connectedness Report

Globalization at a record high

Since DHL’s Global Connectedness Report was last published, some of the strains on globalization have eased while others have intensified. The disruptions caused by the Covid-19 pandemic are clearly in the past, and its economic aftereffects are receding. But the United Nations now reports the largest number of violent conflicts since the Second World War, and geopolitical rivalry over key technologies continues to escalate.

In this dynamic environment, reliable measures of the state and trajectory of globalization are essential for business and public policy decision-making. The 2024 Global Connectedness Report is based on the meticulous analysis of nearly 9 million data points on country-to-country flows, and it measures the globalization of 181 countries, covering 99.7% of the global economy and 98.7% of the world’s population. It provides a unique and comprehensive picture of how goods & services, capital, information, and people are moving around the world.

That picture clearly shows that globalization reached a record high in 2022 and remained close to that level in 2023. This outcome may surprise many readers, but the data are unambiguous: global connectedness remains strong, even as the public policy context has become less conducive to globalization, and conflicts dominate the headlines.  

We invite you to look more closely at the key takeaways and topline results from the 2024 DHL Global Connectedness Report – and download the full report for a more in-depth analysis.

Ample room for growth

The data also refute the idea that we are living in an age of unfettered globalization, as some would claim: International flows are still much smaller than flows within national borders.

This year for the first time, the DHL Global Connectedness Report used a methodology that measures the world’s depth of globalization on a scale from 0% (nothing crosses national borders at all) to 100% (a “frictionless” world where borders and distance have ceased to matter). It currently stands at 25%, which means we are still closer to a world of separate countries than to a fully globalized world.
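
As a toy numeric illustration of that scale (the report's actual methodology aggregates many flow types with its own weighting), the depth measure can be thought of as the cross-border share of total flows:

```python
# Toy illustration only: depth of globalization as the cross-border
# share of total flows. The numbers are invented; the report's real
# methodology weights many different flow types.
def depth_share(cross_border: float, domestic: float) -> float:
    return cross_border / (cross_border + domestic)

print(f"{depth_share(25.0, 75.0):.0%}")  # -> 25%
```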

In other words, globalization may be at an all-time high, but there is ample room for growth.

DHL Global Connectedness Report 2024

  • Complete Report pdf 25.3 MB
  • Key Highlights Brochure pdf 1.9 MB
  • 10 Key Takeaways pdf 1.1 MB
  • GCR Scores and Ranks 2001-2022 (XLS) ms-excel 463.8 KB

Please cite data source as:  Steven A. Altman and Caroline R. Bastian, "DHL Global Connectedness Report 2024," Bonn: DHL Group. (DOI:10.58153/7jt4h-p0738)

10 KEY TAKEAWAYS

  1. A RECORD HIGH: Global connectedness set a record in 2022 and remained close to it in 2023.
  2. SINGAPORE ON TOP: Singapore was the most globally connected country in 2022, followed by the Netherlands and Ireland.
  3. US-CHINA TIES DIMINISHED: The pullback from direct US-China trade accelerated in 2023, but both countries are still significantly connected.
  4. RUSSIA AND EUROPE DECOUPLED: Among major G20 economies, Russia had the largest single-year drop in global connectedness on record.
  5. NO GLOBAL FRAGMENTATION: Global flows show no general split of the world economy between rival geopolitical blocs.
  6. NO REGIONALIZATION TREND: Most international flows take place over stable or longer distances.
  7. MORE CORPORATE GLOBALIZATION: Companies expanded their international presence and earned more sales abroad.
  8. TRADE AT RECORD HIGH: The share of global trade in world GDP was at a record level in 2022.
  9. INFORMATION FLOWS STAGNATE: After strong growth over two decades, the globalization of information flows stalled, partly due to US-China tensions.
  10. GLOBALIZATION REMAINS LIMITED: The world's absolute level of global connectedness is only at 25%.

Summary: Three central questions

Three questions, three insights: The analysis of global trends in the 2024 Global Connectedness Report examines three questions at the center of current debates about globalization.


1. Are global flows still growing?

The evidence strongly rebuts the notion that the growth of global flows has gone into reverse. The world’s overall level of global connectedness reached a record high in 2022, and data suggest it remained at roughly the same level in 2023.

Trade growth played a key role here. The share of global output traded internationally hit a record high in 2022. Early data suggest a modest decline in 2023, but this isn’t a signal of deglobalization. Trade growth normally lags behind GDP growth when the global economy slows.

Furthermore, companies do not appear to have lost their appetite for international expansion: the value of announced greenfield foreign direct investment (FDI) rose, and publicly traded companies from most countries are earning a larger share of their sales abroad.

People flows, which were hard hit by the Covid-19 pandemic, continued a strong recovery trend in 2023. International travel reached 88% of pre-pandemic levels and was on track for a full recovery by the end of 2024.


2. Is geopolitical rivalry fracturing the global economy?

The 2024 DHL Global Connectedness Report notes clear shifts in international flows for countries at the center of current tensions. Nevertheless, there is still no clear evidence of a wider split of the world economy between rival blocs of allied countries.

The United States and China have reduced their direct flows with each other, with an average decline of roughly one-quarter in the share of U.S. flows involving China – and vice versa – since 2016. However, the shifts represent less a decoupling of the world’s two largest economies and more a reduction of what had previously been an unusually high level of integration. Ultimately, the U.S. and China are still connected by larger flows than almost every other pair of countries worldwide.

The term “decoupling” better describes another dramatic shift in international flows: the reorientation of Russia’s flows away from Europe and other Western-aligned economies since its full-scale invasion of Ukraine. In the realm of trade, Russia pivoted to alternative export markets and import sources, but no similar substitution has taken place for international business investment. As a result, announced greenfield FDI into Russia has collapsed.

However, these developments haven’t led to a wider split of the world economy between rival blocs of countries. The data also confirm that there is no general pattern of countries interacting more with other countries that have similar geopolitical perspectives.


3. Are international flows becoming more regional?

Through 2023, there is no robust evidence of international flows generally becoming more regional. In fact, most types of flows have tended to take place over stable or longer distances. There was a small decline in the average distance traversed by trade in 2023, but it’s important not to overstate this development since trade flows in 2023 covered the second-longest distances on record. The only major trading region showing a clear nearshoring trend over multiple years is North America.

The lack of wider evidence of trade regionalization might be another surprise, as several publications have identified a rising trend in the share of trade happening inside regions beginning roughly a decade ago. But that trend turned out to be short-lived, and it appeared only under some ways of defining regions and not others.

It’s also important to keep in mind that international flows are already highly regionalized; roughly half of international trade, capital, information, and people flows take place inside major world regions.

Whether international flows will become more regional in the future remains to be seen. Many companies and governments are working to foster regional supply chains, and such reconfigurations can take several years to execute.

143 countries more connected


The 2024 DHL Global Connectedness Report includes country-level analyses of 181 countries. This provides additional evidence of the resilience of global flows. The gains were widespread across countries and not just the result of a small number of countries becoming more globally connected.

In 2022, the most recent year for which we have full country-level data, 143 countries became more globally connected, while only 38 saw their levels of connectedness decline.

Singapore topped the list of most globally connected countries this year, followed by the Netherlands and Ireland. At the bottom of the list, in ascending order, were Guinea-Bissau, Yemen, and São Tomé and Príncipe. The countries with the largest increases in 2022 were Bahrain, the United Kingdom, and Lebanon; the countries with the largest declines were Belarus and Russia. The drop in Russia's connectedness was more than twice as large as any previous decline on record for a country that ranks among the world's 20 largest economies.


Driven by data, delivered by DHL

To make sound decisions, business leaders need solid information. Each year, the DHL Global Connectedness Report provides a grounded perspective on the state of globalization to help them do just that.

Globalization is at the forefront of many trade and policy discussions around the world, but it remains difficult to quantify. Making it tangible and measuring its development calls for scrutinizing the data and separating facts from fiction. As a leading logistics company, DHL is uniquely positioned to provide orientation and contribute to the globalization debate.

To provide a solid research foundation, DHL partnered with New York University’s Stern School of Business to form the DHL Initiative on Globalization , where a team of scholars conducts the research and analysis. Each edition of the DHL Global Connectedness Report builds on the previous report and the scholars’ decades of globalization research.

Tangible takeaways for sound decision-making

With debates about the merits of international openness continuing, the report is a go-to resource for business leaders and policymakers who wish to have better-informed discussions. Due to its unique focus, the DHL Global Connectedness Report is regularly featured in international media outlets and national publications worldwide. It is also increasingly cited in scholarly journals, consultant reports, and general interest books.

Lessons Learned from 10 Years

DHL GLOBAL CONNECTEDNESS REPORT

  • 10YR Lessons Learned - English version pdf 7.5 MB
  • 10YR Lessons Learned - Chinese version pdf 15.2 MB
  • 10YR Lessons Learned - French version pdf 15.0 MB
  • 10YR Lessons Learned - German version pdf 15.0 MB
  • 10YR Lessons Learned - Italian version pdf 15.1 MB
  • 10YR Lessons Learned - Portuguese version pdf 15.1 MB
  • 10YR Lessons Learned - Spanish version pdf 15.0 MB
  • 10YR Lessons Learned - Vietnamese version pdf 15.1 MB


DHL Initiative on Globalization

The DHL Initiative on Globalization at NYU Stern aims to develop and maintain the academic world’s most comprehensive collection of data on the globalization of trade, capital, information, and people flows and to be a leading center of excellence for data-driven globalization research.

Published: March 2024

