Oh PHP-MetaParser, how many hours I spent on you.. years ago.
PHP-MetaParser was inspired by how Facebook pulled data from links when sharing them: it would parse out the title, description and images. That's the goal of PHP-MetaParser.
It allows you to provide a body of text (generally, a webpage's markup) and receive metadata back after it's been parsed.
I developed it in an attempt to duplicate what Facebook was able to do. On a project that has been long-buried (Clearmix), I had user profiles, and naturally, comment walls where users could post comments, images, videos and links.
I wanted to add context to the links as Facebook did, so I created a backend script that would CURL the URL provided, and then parse its contents for relevant information.
I believe that I looked at PHP's DOM functionality, but settled on regular expressions to extract the above-mentioned information, as well as the page's base property, in addition to its favicon and OpenGraph details.
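To give a feel for the regular-expression approach, here is a minimal sketch. It is not the library's actual code: the function name `sketchExtractMeta`, the returned array keys, and the patterns themselves are assumptions made for illustration, and they only handle the common attribute order (`name` or `property` before `content`).

```php
<?php
// Hypothetical helper for illustration only; not part of PHP-MetaParser.
function sketchExtractMeta(string $markup): array
{
    $details = [
        'title' => null,
        'description' => null,
        'base' => null,
        'openGraph' => [],
    ];

    // <title>...</title>; "s" lets "." span newlines, "i" ignores case.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $markup, $m)) {
        $details['title'] = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }

    // <meta name="description" content="...">; \1 and \2 match the opening quote.
    if (preg_match('/<meta\s+name=(["\'])description\1\s+content=(["\'])(.*?)\2/is', $markup, $m)) {
        $details['description'] = trim($m[3]);
    }

    // <base href="...">, useful later for resolving relative links.
    if (preg_match('/<base\s+[^>]*href=(["\'])(.*?)\1/is', $markup, $m)) {
        $details['base'] = $m[2];
    }

    // Every <meta property="og:..." content="..."> tag, keyed by the part after "og:".
    $pattern = '/<meta\s+property=(["\'])og:([a-z:_]+)\1\s+content=(["\'])(.*?)\3/is';
    if (preg_match_all($pattern, $markup, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $details['openGraph'][$match[2]] = $match[4];
        }
    }

    return $details;
}

// Sample markup standing in for a fetched page.
$markup = <<<HTML
<html><head>
<title>Example Page</title>
<base href="http://example.com/docs/">
<meta name="description" content="A short summary.">
<meta property="og:title" content="Example OG Title">
<meta property="og:image" content="http://example.com/cover.png">
</head><body></body></html>
HTML;

print_r(sketchExtractMeta($markup));
```

Regular expressions like these are brittle against unusual markup (reordered attributes, unquoted values), which is the usual trade-off against a DOM-based parser.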
This project includes one class which does not take care of the CURL action itself. For this, please see my PHP-Curler library.
This instantiable class includes the following public methods:
base property for the page. Useful for parsing links in a document
getDescription: Returns the meta tag description for the page, if found
getDetails: Returns an array of all the possible data that could be parsed from the document
getFavicon: Returns the path to the favicon for the page. If one is not explicitly defined, it makes a best guess based on the host and
getKeywords: Returns the page's meta tag keywords, if defined
getOpenGraph: Returns OpenGraph details for the page, if defined
title attribute for the page, if defined
getURL: Returns the parsed URL for the page
This library expects that you already have the body of text that is to be parsed and have its data extracted. By simply creating an instance of a MetaParser object, and passing in the body of text and URL, you can then access all the data through the getDetails method on that instance.
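The usage described above might look roughly like the sketch below. Only the constructor arguments (body of text, then URL) and the `getDetails` call come from this post. The stand-in class exists purely so the snippet runs without the library installed, and the array keys it returns are assumptions, not the library's real output.

```php
<?php
// Stand-in used only when the real library is not loaded, so this sketch runs
// on its own. Its internals and returned keys are assumed for illustration.
if (!class_exists('MetaParser')) {
    class MetaParser
    {
        private $body;
        private $url;

        public function __construct(string $body, string $url)
        {
            $this->body = $body;
            $this->url = $url;
        }

        public function getDetails(): array
        {
            $title = null;
            if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $this->body, $m)) {
                $title = trim($m[1]);
            }
            return ['title' => $title, 'url' => $this->url];
        }
    }
}

// In practice the body would come from a separate CURL request;
// it is hard-coded here to keep the example offline.
$url = 'http://example.com/';
$body = '<html><head><title>Example Domain</title></head><body></body></html>';

$parser = new MetaParser($body, $url);
$details = $parser->getDetails();
print_r($details);
```

Because the parser only ever sees a string and a URL, the fetching step can be swapped out (a cache, a fixture file, a different HTTP client) without touching the parsing code.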
That's an especially relevant question with this library, as it's been decoupled from the actual CURLing.
I abstracted this library out to do just the parsing, as I found I was performing CURL calls elsewhere in my codebase. I didn't want the parsing to be an inherent part of that.
I thought about extending the CURL library for the MetaParser class, but for a reason I can't recall right now, it didn't make sense programmatically or from a business-logic perspective.
PHP-Gravatar, in its beautiful simplicity, is next.