# Unfurl



A metadata scraper with support for oEmbed, Twitter Cards and Open Graph Protocol for Node.js (>=v8.0.0)

## The what

Unfurl _(spread out from a furled state)_ takes a URL and some options, fetches the URL, extracts the metadata we care about and formats the result in a sane way. It supports the major metadata formats (oEmbed, Twitter Cards and Open Graph), and extending it to others should be trivial.
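Here is a deliberately naive sketch of the "extract the metadata we care about" step. It is not unfurl.js's implementation, and a real scraper should use a proper HTML parser rather than regular expressions. It only collects `og:*` and `twitter:*` `<meta>` tags from an HTML string into a flat map. `extractMetaTags` is a hypothetical name.

```typescript
// Naive illustration only: regular expressions are not a robust HTML parser.
// Collects <meta property="og:..."> / <meta name="twitter:..."> tags.
function extractMetaTags(html: string): Record<string, string> {
  const result: Record<string, string> = {}
  // Match every <meta ...> tag; group 1 holds its attribute text
  const metaTag = /<meta\b([^>]*)>/gi
  let tag: RegExpExecArray | null
  while ((tag = metaTag.exec(html)) !== null) {
    const attrs: Record<string, string> = {}
    // key="value" or key='value', in any order
    const attrPattern = /([a-zA-Z:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g
    let a: RegExpExecArray | null
    while ((a = attrPattern.exec(tag[1])) !== null) {
      attrs[a[1].toLowerCase()] = a[2] !== undefined ? a[2] : a[3]
    }
    // Open Graph uses `property`, Twitter Cards usually use `name`
    const key = attrs['property'] || attrs['name']
    if (key && attrs['content'] !== undefined && /^(og|twitter):/i.test(key)) {
      result[key.toLowerCase()] = attrs['content']
    }
  }
  return result
}
```

Tags outside the `og:` and `twitter:` namespaces (such as a plain `description`) are ignored by this sketch.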

## The why

You know how, when you link to something on Slack, Facebook or Twitter, they typically show a preview of the link? To do that, they crawl the linked website for metadata and enrich the link with more context about it, usually its title, description and an image or player embed.

## The how

```sh
npm install unfurl.js
```

#### `unfurl(url [, opts])`

`url` - `string`

`opts` - `object` of:

- `oembed?: boolean` - whether to retrieve oEmbed metadata
- `timeout?: number` - request/response timeout in ms; it resets on each redirect. `0` disables it (the OS limit still applies)
- `follow?: number` - maximum number of redirects to follow. `0` disables following redirects
- `compress?: boolean` - whether to accept gzip/deflate content encoding
- `size?: number` - maximum response body size in bytes. `0` disables the limit
- `headers?: Headers | Record<string, string> | Iterable<readonly [string, string]> | Iterable<Iterable<string>>` - map of request headers; overrides the defaults
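A sketch of an options object built from the list above. `UnfurlOpts` and `checkOpts` are hypothetical local names (the package may export its own options type under a different name), and `headers` is narrowed to the plain-object form for brevity. The check only enforces what the list states: the numeric limits are non-negative, and `0` switches the limit off (or, for `follow`, disables redirects).

```typescript
// Hypothetical local mirror of the documented options (the package's own
// exported type may be named differently); `headers` reduced to a plain object
interface UnfurlOpts {
  oembed?: boolean
  timeout?: number // ms, resets on redirect; 0 disables
  follow?: number // maximum redirects; 0 means redirects are not followed
  compress?: boolean
  size?: number // maximum body size in bytes; 0 disables
  headers?: Record<string, string>
}

// Hypothetical check: the numeric limits must be non-negative numbers
function checkOpts(opts: UnfurlOpts): void {
  const limits: Array<'timeout' | 'follow' | 'size'> = ['timeout', 'follow', 'size']
  for (const key of limits) {
    const value = opts[key]
    // NaN fails the >= 0 comparison, so it is rejected as well
    if (value !== undefined && !(value >= 0)) {
      throw new RangeError(key + ' must be >= 0 (0 disables it)')
    }
  }
}

// Example: oEmbed on, 5 s timeout, at most 3 redirects, 1 MiB body limit
const exampleOpts: UnfurlOpts = {
  oembed: true,
  timeout: 5000,
  follow: 3,
  compress: true,
  size: 1024 * 1024,
}
checkOpts(exampleOpts)
```

In a real call the object would go in the second argument, `unfurl(url, exampleOpts)`.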

Default headers:
```
{
  'Accept': 'text/html, application/xhtml+xml',
  'User-Agent': 'facebookexternalhit'
}
```
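One plausible reading of "overrides the defaults" is a shallow, per-header merge: a header you pass replaces the default of the same name and every other default is kept. That behaviour is an assumption (the library could instead replace the whole map), and `mergeHeaders` is a hypothetical helper. It lowercases names because HTTP header names are case-insensitive.

```typescript
// The default headers documented above
const DEFAULT_HEADERS: Record<string, string> = {
  'Accept': 'text/html, application/xhtml+xml',
  'User-Agent': 'facebookexternalhit',
}

// Hypothetical helper: shallow merge in which caller headers win.
// Names are lowercased because HTTP header names are case-insensitive.
function mergeHeaders(
  overrides: Record<string, string> = {},
  defaults: Record<string, string> = DEFAULT_HEADERS,
): Record<string, string> {
  const merged: Record<string, string> = {}
  for (const name of Object.keys(defaults)) {
    merged[name.toLowerCase()] = defaults[name]
  }
  for (const name of Object.keys(overrides)) {
    merged[name.toLowerCase()] = overrides[name]
  }
  return merged
}

// Swap the User-Agent but keep the default Accept header
const requestHeaders = mergeHeaders({ 'user-agent': 'my-bot/1.0' })
```

Here `requestHeaders` ends up with `accept` taken from the defaults and `user-agent` set to `my-bot/1.0`.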
```typescript
import { unfurl } from 'unfurl.js'
// unfurl() returns a Promise, so await it (top-level await needs an ES module)
const result = await unfurl('https://github.com/trending')
```
#### `unfurl()` returns `Promise<Metadata>`
```typescript
type Metadata = {
  title?: string
  description?: string
  keywords?: string[]
  favicon?: string
  author?: string
  oEmbed?: {
    type: 'photo' | 'video' | 'link' | 'rich'
    version?: string
    title?: string
    author_name?: string
    author_url?: string
    provider_name?: string
    provider_url?: string
    cache_age?: number
    thumbnails?: [{
      url?: string
      width?: number
      height?: number
    }]
  }
  twitter_card: {
    card: string
    site?: string
    creator?: string
    creator_id?: string
    title?: string
    description?: string
    players?: {
      url: string
      stream?: string
      height?: number
      width?: number
    }[]
    apps: {
      iphone: {
        id: string
        name: string
        url: string
      }
      ipad: {
        id: string
        name: string
        url: string
      }
      googleplay: {
        id: string
        name: string
        url: string
      }
    }
    images: {
      url: string
      alt: string
    }[]
  }
  open_graph: {
    title: string
    type: string
    images?: {
      url: string
      secure_url?: string
      type: string
      width: number
      height: number
    }[]
    url?: string
    audio?: {
      url: string
      secure_url?: string
      type: string
    }[]
    description?: string
    determiner?: string
    site_name?: string
    locale: string
    locale_alt: string
    videos: {
      url: string
      stream?: string
      height?: number
      width?: number
      tags?: string[]
    }[]
    article: {
      published_time?: string
      modified_time?: string
      expiration_time?: string
      author?: string
      section?: string
      tags?: string[]
    }
  }
}
```
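A link preview usually needs only a title, a description and an image. The sketch below picks those from a `Metadata` result, preferring Open Graph, then Twitter Card, then oEmbed, then the plain page fields. That priority order and the names `PreviewSource`, `Preview`, `toPreview` and `firstNonEmpty` are assumptions for illustration, not part of unfurl.js.

```typescript
// Hypothetical subset of the Metadata type: only the fields this sketch reads
type PreviewSource = {
  title?: string
  description?: string
  oEmbed?: { title?: string; thumbnails?: { url?: string }[] }
  twitter_card?: { title?: string; description?: string; images?: { url: string }[] }
  open_graph?: { title?: string; description?: string; images?: { url: string }[] }
}

type Preview = { title?: string; description?: string; image?: string }

// Returns the first value that is a non-empty string
function firstNonEmpty(...values: Array<string | undefined>): string | undefined {
  for (const value of values) {
    if (value) return value
  }
  return undefined
}

// Assumed priority: Open Graph > Twitter Card > oEmbed > plain page metadata
function toPreview(meta: PreviewSource): Preview {
  const og = meta.open_graph
  const tw = meta.twitter_card
  const oe = meta.oEmbed
  return {
    title: firstNonEmpty(og?.title, tw?.title, oe?.title, meta.title),
    description: firstNonEmpty(og?.description, tw?.description, meta.description),
    image: firstNonEmpty(og?.images?.[0]?.url, tw?.images?.[0]?.url, oe?.thumbnails?.[0]?.url),
  }
}
```

Falling back field by field means a page with only partial Open Graph tags can still produce a complete preview from its Twitter Card or HTML metadata.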

## The who 💖

_(If you use unfurl.js too feel free to add your project)_
- vapid/vapid - a template-driven content management system
- beeman/micro-unfurl - a small microservice that unfurls a URL and returns its Open Graph metadata
- probot/unfurl - a GitHub App built with Probot that unfurls links in issue and pull request discussions