Interface representing the parameters for initializing a SitemapLoader (SitemapLoaderParams).

Properties

allowUrlPatterns: undefined | (string | RegExp)[]
caller: AsyncCaller
chunkSize: number

The number of sitemap URLs to group into each chunk for scraping. Defaults to 300.

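
As a rough illustration of what chunking by chunkSize means, the sketch below splits a list of URLs into batches of at most chunkSize entries. The helper name chunkUrls and the example.com URLs are hypothetical; this is an assumption about the general technique, not the loader's own implementation.

```typescript
// Hypothetical helper: split a URL list into batches of at most `chunkSize`.
function chunkUrls(urls: string[], chunkSize = 300): string[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}

// 700 placeholder URLs with the default chunk size give batches of 300, 300 and 100.
const exampleUrls = Array.from(
  { length: 700 },
  (_, i) => `https://example.com/page-${i}`,
);
const batchSizes = chunkUrls(exampleUrls).map((batch) => batch.length);
```

A smaller chunk size means more, shorter batches; the last batch simply holds whatever remains.
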
timeout: number

The timeout, in milliseconds, for the fetch request. Defaults to 10000 (10 s).

webPath: string
selector?: `${Prefix}${AlphaNumeric}${string}` | `${AlphaNumeric}${string}`

Shorthand used in the type above: Prefix is one of "." | "#" | ":" | "|" | ">" | "+" | "~" | "[", and AlphaNumeric is any letter (a-z, A-Z) or `${number}`.

The selector to use to extract the text from the document. Defaults to "body".
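
The selector type is a union of template-literal string types: an optional leading selector character, then a letter or digit, then any string. The sketch below rebuilds that shape under illustrative names (SelectorPrefix, AlphaNumeric, SelectorLike and isSelectorLike are assumptions made for this example, not names taken from the library) and adds an approximate runtime check.

```typescript
// Illustrative names only; they mirror the shape of the documented union.
type SelectorPrefix = "." | "#" | ":" | "|" | ">" | "+" | "~" | "[";
type Lower =
  | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
  | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z";
type AlphaNumeric = Lower | Uppercase<Lower> | `${number}`;
type SelectorLike =
  | `${SelectorPrefix}${AlphaNumeric}${string}`
  | `${AlphaNumeric}${string}`;

// These literals type-check against the union.
const tagSelector: SelectorLike = "body";
const classSelector: SelectorLike = ".content";
const attributeSelector: SelectorLike = "[data-role]";

// Approximate runtime counterpart: optional prefix character, then a letter or digit.
// It is stricter than `${number}`, which also admits forms such as "-1".
function isSelectorLike(value: string): value is SelectorLike {
  return /^[.#:|>+~\[]?[A-Za-z0-9]/.test(value);
}
```

The type only constrains how the string starts; it does not validate that the whole string is a well-formed CSS selector.
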

textDecoder?: TextDecoder

The text decoder to use to decode the response. Defaults to UTF-8.
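
To show why a custom decoder can matter, the sketch below decodes raw bytes with the standard TextDecoder, falling back to UTF-8 when none is supplied. The helper decodeBody is hypothetical and stands in for whatever the loader does internally with the response bytes.

```typescript
// Hypothetical helper: decode raw response bytes, falling back to UTF-8.
function decodeBody(bytes: Uint8Array, textDecoder?: TextDecoder): string {
  // A TextDecoder constructed without arguments uses the "utf-8" encoding.
  const decoder = textDecoder ?? new TextDecoder();
  return decoder.decode(bytes);
}

// "hé" encoded as UTF-8 (é is the two bytes 0xC3 0xA9).
const utf8Text = decodeBody(new Uint8Array([0x68, 0xc3, 0xa9]));

// "hi" encoded as UTF-16LE needs an explicit decoder to come out right.
const utf16Text = decodeBody(
  new Uint8Array([0x68, 0x00, 0x69, 0x00]),
  new TextDecoder("utf-16le"),
);
```

Pages served in an encoding other than UTF-8 would come out garbled under the default, which is the case an explicit textDecoder addresses.
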

Methods

  • Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata.

    Returns Promise<Document[]>

    A Promise that resolves to an array of Document instances.

  • Loads the documents and splits them using a specified text splitter.

    Parameters

    • splitter: TextSplitter = ...

    Returns Promise<Document[]>

    A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.
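
    The idea of loading documents and then splitting them can be sketched with a deliberately naive splitter that cuts text into fixed-size pieces. The Doc interface and splitBySize function below are hypothetical stand-ins for Document and TextSplitter; real text splitters typically split on separators and may overlap pieces.

```typescript
// Minimal stand-in for a document: text plus arbitrary metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical splitter: cut each document into pieces of at most `maxChars` characters.
function splitBySize(docs: Doc[], maxChars: number): Doc[] {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError("maxChars must be a positive integer");
  }
  const pieces: Doc[] = [];
  for (const doc of docs) {
    for (let start = 0; start < doc.pageContent.length; start += maxChars) {
      pieces.push({
        pageContent: doc.pageContent.slice(start, start + maxChars),
        // Each piece keeps a copy of its source document's metadata.
        metadata: { ...doc.metadata },
      });
    }
  }
  return pieces;
}

// One ten-character document split into pieces of at most four characters.
const loaded: Doc[] = [
  { pageContent: "abcdefghij", metadata: { source: "https://example.com/a" } },
];
const pieces = splitBySize(loaded, 4);
```

    Copying the metadata onto every piece keeps each one traceable to the page it came from.
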

  • A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.

    Returns Promise<{
        load: ((content, options?, isDocument?) => CheerioAPI);
    }>

    A Promise that resolves to an object containing the load function from the Cheerio library.

  • Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.

    Parameters

    • urls: string[]

      An array of URLs to fetch and load.

    • caller: AsyncCaller
    • timeout: undefined | number
    • Optional textDecoder: TextDecoder
    • Optional options: CheerioOptions

    Returns Promise<CheerioAPI[]>

    A Promise that resolves to an array of CheerioAPI instances.
