Oh PHP-MetaParser, how many hours I spent on you... years ago.
What is this?
PHP-MetaParser was inspired by how Facebook pulled data from links when they were shared, parsing out the title, description and images. That's the goal of PHP-MetaParser.
Specifically, it allows you to provide a body of text (generally, a webpage's markup), and receive meta data back after it's been parsed.
Why did I develop it?
I developed it in an attempt to duplicate what Facebook was able to do. On a project that has been long-buried (Clearmix), I had user profiles, and naturally, comment walls where users could post comments, images, videos and links.
I wanted to add context to the links as Facebook did, so I created a backend script that would cURL the URL provided, and then parse its contents for relevant information.
I believe that I looked at PHP's DOM functionality, but settled on regular expressions to extract the above-mentioned information, as well as the page's base property, its favicon and its OpenGraph tags.
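The regex-based approach can be sketched roughly as follows. This is an illustrative example only, not the library's actual patterns; the markup, pattern details and variable names here are my own assumptions:

```php
<?php
// A minimal, hypothetical sketch of extracting a page's title and meta
// description with regular expressions, in the spirit described above.
$html = '<html><head><title>Example Page</title>' .
        '<meta name="description" content="A sample description." />' .
        '</head><body></body></html>';

// Extract the contents of the <title> element
$title = null;
if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches)) {
    $title = trim($matches[1]);
}

// Extract the meta description's content attribute
$description = null;
if (preg_match(
    '/<meta[^>]+name=["\']description["\'][^>]+content=["\'](.*?)["\']/is',
    $html,
    $matches
)) {
    $description = trim($matches[1]);
}

echo $title, "\n";        // Example Page
echo $description, "\n";  // A sample description.
```

Regular expressions like these trade robustness for simplicity; a DOM parser handles malformed markup more gracefully, but patterns like the above are quick to write and fast to run.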
What's included?
This project includes one class, which does not take care of the cURL request itself. For this, please see my PHP-Curler library.
This instantiable class includes the following public methods:
- getBase: Returns the base property for the page. Useful for parsing links in a document.
- getDescription: Returns the meta tag description for the page, if found.
- getDetails: Returns an array of all the possible data that could be parsed from the document.
- getFavicon: Returns the path to the favicon for the page. If one is not explicitly defined, it makes a best guess based on the host and base property.
- getKeywords: Returns the page's meta tag keywords, if defined.
- getOpenGraph: Returns OpenGraph details for the page, if defined.
- getTitle: Returns the title element for the page, if defined.
- getURL: Returns the parsed URL for the page.
How do I use it?
This library expects that you already have the body of text which is to be parsed and have its data extracted. Simply create an instance of a MetaParser object, passing in the body of text and the URL, and you can then access all the data through the getDetails method on that instance.
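That flow might look like the following sketch. It assumes the class file is named MetaParser.class.php and that the constructor takes the markup and its source URL in that order, per the description above; the getter names follow the list in the previous section:

```php
<?php
// Hypothetical usage sketch; adjust the include path and constructor
// argument order to match the actual class definition.
require_once 'MetaParser.class.php';

// Markup you've already retrieved (e.g. via PHP-Curler)
$markup = file_get_contents('cached-response.html');
$url = 'https://example.org/article';

// Instantiate with the body of text and the URL it came from
$parser = new MetaParser($markup, $url);

// Grab everything at once...
$details = $parser->getDetails();

// ...or pull individual values
$title = $parser->getTitle();
$favicon = $parser->getFavicon();
```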
Why did I abstract it out?
That's an especially relevant question with this library, as it's been decoupled from the actual CURLing.
I abstracted this library out to do just the parsing because I found I was performing cURL calls elsewhere in my codebase, and I didn't want the parsing to be an inherent part of that.
I thought about having the MetaParser class extend the cURL library, but for a reason I can't recall right now, it didn't make sense programmatically or from a business-logic perspective.
PHP-Gravatar, in its beautiful simplicity, is next.