---
title: Web Connector
description: 'Access knowledge from Web Pages'
---
## How it works
The Web Connector scrapes a site starting from a base URL.
- It only indexes pages that are on the same domain and share the same base path.
- It will index pages reachable via hyperlinks from the base URL.
- The text content is cleaned up using heuristics, and metadata such as the page title is extracted.
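
The rules above can be illustrated with a minimal Python sketch. This is not the connector's actual implementation: the names `in_scope`, `crawl`, and `fetch` are hypothetical, and the scope check assumes "same base path" means the page path starts with the base URL's path. The `fetch` callable is passed in so the traversal logic can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


def in_scope(base_url, candidate):
    """True if candidate is on the same domain and under the base path (assumed rule)."""
    base, cand = urlparse(base_url), urlparse(candidate)
    return cand.netloc == base.netloc and cand.path.startswith(base.path)


class _PageParser(HTMLParser):
    """Collects href targets and the <title> text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(base_url, fetch):
    """Breadth-first walk from base_url; fetch(url) returns HTML or None."""
    seen, queue, titles = {base_url}, deque([base_url]), {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to index
        parser = _PageParser()
        parser.feed(html)
        titles[url] = parser.title.strip()
        for href in parser.links:
            # Resolve relative links and drop #fragments before the scope check.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen and in_scope(base_url, target):
                seen.add(target)
                queue.append(target)
    return titles
```

With a base URL of `https://example.com/docs/`, a link to `https://example.com/docs/intro` would be followed, while links to `https://example.com/blog/post` or to another domain would be skipped. A plain prefix match is a simplification: a base path without a trailing slash, such as `/docs`, would also match `/docs-old`.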
## Setting up
### Authorization
- As long as the page is reachable, no additional authorization is necessary.
### Indexing
1. Navigate to the Admin Dashboard and select the **Web** Connector.
2. Enter the base URL to index and click **Index**.
![WebConnector](/images/connectors/web/WebConnector.png)
To check the indexing status, visit the Connectors Status page (top left).
![WebConnectorStatus](/images/connectors/web/WebConnectorStatus.png)