The Impact of Generative Artificial Intelligence on Online Knowledge Communities

Generative AI, particularly large language models (LLMs) such as ChatGPT, has measurably affected online knowledge communities. Analyzing data from Stack Overflow and Reddit (2021-2023), we observe a decline in visits and questions on Stack Overflow after the introduction of ChatGPT, with no comparable effect on Reddit. The decline on Stack Overflow was most pronounced among newer users.

1. Introduction
Advances in generative AI, especially LLMs, could affect online knowledge communities in two directions: positively, by enhancing knowledge sharing, or negatively, by replacing human contributions. This study examines the effects of LLMs on user engagement and the factors that moderate those effects, focusing on the period after the release of ChatGPT in November 2022.

2. Methods
Our research utilized a variety of data sources and methodologies:

  • A proprietary dataset tracking website visitors from September 2022 to March 2023, together with Stack Overflow questions and answers from two windows: October 2021 to March 2022 and October 2022 to March 2023.
  • Data on Reddit posting volumes obtained from subredditstats.com.
  • The impact of ChatGPT on Stack Overflow web traffic was analyzed using Synthetic Control Using LASSO (SCUL).
  • The effect on question volumes was examined using a difference-in-differences design.
  • We considered differences across communities and topics and explored shifts in user and question characteristics.
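
The synthetic control step listed above can be illustrated with a minimal sketch. The idea is to fit L1-penalized weights on other sites' traffic (the "donors") so that their weighted sum tracks the target site before the release, then use those weights to project a counterfactual afterwards. Everything concrete here is an assumption for illustration: the traffic numbers are made up, the two donor sites are hypothetical, the penalty is fixed by hand, and the small coordinate-descent solver stands in for whatever software the study actually used.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_weights(donors, target, penalty, n_iter=200):
    """Coordinate descent for (1/2n)*||target - donors*w||^2 + penalty*||w||_1.

    donors: one row per pre-period day, one column per donor site.
    No intercept and no standardization, to keep the sketch short.
    """
    n, p = len(donors), len(donors[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # sum of donor j times the residual that ignores donor j
            norm = 0.0  # sum of squared values of donor j
            for i in range(n):
                fit_without_j = sum(donors[i][k] * w[k] for k in range(p) if k != j)
                rho += donors[i][j] * (target[i] - fit_without_j)
                norm += donors[i][j] ** 2
            w[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm > 0 else 0.0
    return w


# Made-up daily traffic (millions of visits) for two hypothetical donor sites.
pre_donors = [[10, 5], [12, 3], [11, 6], [13, 2], [12, 4], [14, 3]]
pre_target = [10, 12, 11, 13, 12, 14]   # target tracks donor 0 before the release
post_donors = [[13, 4], [14, 3], [15, 5]]
post_target = [11, 12, 13]              # observed traffic after the release

weights = lasso_weights(pre_donors, pre_target, penalty=0.01)

# Counterfactual: what the donors imply the target would have looked like.
counterfactual = [sum(x * w for x, w in zip(row, weights)) for row in post_donors]
gaps = [obs - cf for obs, cf in zip(post_target, counterfactual)]
avg_effect = sum(gaps) / len(gaps)
print(weights, avg_effect)
```

In this toy data the penalty drives the weight on the uninformative donor to exactly zero, which is the property that makes LASSO convenient when there are many candidate donors. A real analysis would tune the penalty rather than fix it, and would check pre-period fit before trusting the post-period gap.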

3. Results

  • Overall impact on community engagement: Daily web traffic on Stack Overflow declined by approximately 1 million visits (12%) following the release of ChatGPT (Fig. 1).
  • Effect on user content production: There was a decrease in question posting volumes per topic on Stack Overflow, a trend not mirrored on Reddit (Fig. 2).
  • Heterogeneity in ChatGPT's effect on Stack Overflow: Topics related to software coding were the most affected, and the magnitude of the effect correlated with the availability of training data (Figs. 3 & 4).
  • ChatGPT's effect on user accounts and questions: Newer user accounts were less likely to participate actively, and questions posted became more complex after ChatGPT's release (Figs. 5 & 6).
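
To show how a question-volume effect like the one above can be read from a difference-in-differences comparison, here is a minimal 2x2 sketch on log counts. It assumes, as one plausible reading of the two data windows, that the same calendar window one year earlier serves as the control. The counts are invented for illustration and the function name is hypothetical; the study's actual specification may include topic-level terms and other controls not shown here.

```python
import math
from statistics import mean


def did_log(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on log outcomes.

    Returns (change in treated group) minus (change in control group),
    which approximates a relative effect when outcomes are counts.
    """
    def log_mean(values):
        return mean(math.log(v) for v in values)

    treated_change = log_mean(treated_post) - log_mean(treated_pre)
    control_change = log_mean(control_post) - log_mean(control_pre)
    return treated_change - control_change


# Made-up weekly question counts for a single topic.
window_2022_pre = [1000, 1000, 1000]   # treated window, before the release
window_2022_post = [800, 800, 800]     # treated window, after the release
window_2021_pre = [900, 900, 900]      # same weeks one year earlier
window_2021_post = [900, 900, 900]

estimate = did_log(window_2022_pre, window_2022_post,
                   window_2021_pre, window_2021_post)
pct_change = (math.exp(estimate) - 1) * 100  # convert log points to percent
print(f"{pct_change:.1f}%")
```

Subtracting the prior-year change removes seasonal patterns that recur in both windows, so whatever difference remains is attributed to the period after the release, under the usual assumption that the two windows would otherwise have moved in parallel.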

4. Discussion
The release of ChatGPT coincided with declines in both web traffic and question volumes on Stack Overflow, with no comparable effect on Reddit. The social fabric of online communities is crucial, and the presence of LLMs can alter both the characteristics of content and the makeup of membership. A decrease in content production within online communities may in turn reduce the training data available to LLMs.

5. Conclusion
Our study has limitations. Future research could examine whether the findings generalize to other communities and how the effects evolve over the long term. Further analyses are anticipated to clarify the broader impact of generative AI on knowledge sharing and collaboration.

Data availability:
Data from Stack Overflow is accessible via the Stack Exchange Data Explorer, Reddit data is available from subredditstats.com, and analysis scripts can be found in a public repository.