Oh PHP-MetaParser, how many hours I spent on you... years ago.
What is this?
PHP-MetaParser was inspired by how Facebook pulled data from links when sharing them. Specifically, it would parse out the title, description and images. That's the goal of PHP-MetaParser.
In practice, you provide a body of text (generally, a webpage's markup) and receive metadata back once it has been parsed.
Why did I develop it?
I developed it in an attempt to duplicate what Facebook was able to do. On a project that has been long-buried (Clearmix), I had user profiles, and naturally, comment walls where users could post comments, images, videos and links.
I wanted to add context to the links as Facebook did, so I created a backend script that would CURL the URL provided, and then parse its contents for relevant information.
I believe that I looked at PHP's DOM functionality, but settled on regular expressions in order to extract the above-mentioned information, as well as the page's base property, in addition to its favicon and OpenGraph tags.
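To give a feel for the regular-expression approach, here is a rough sketch of pulling out the title, the base property and the OpenGraph tags. It is not the library's actual code: the function names (sketchTitle, sketchBase, sketchOpenGraph) are made up for illustration, and the OpenGraph pattern assumes the property attribute appears before content inside each meta tag.

```php
<?php
// Hypothetical helpers for illustration; not PHP-MetaParser's actual code.

/** Return the decoded contents of the <title> element, or null if absent. */
function sketchTitle(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

/** Return the href of the <base> element, or null if absent. */
function sketchBase(string $html): ?string
{
    if (preg_match('/<base\b[^>]*\bhref=(["\'])(.*?)\1/is', $html, $m)) {
        return $m[2];
    }
    return null;
}

/**
 * Collect <meta property="og:NAME" content="VALUE"> pairs as NAME => VALUE.
 * Assumption: "property" comes before "content" within the tag.
 */
function sketchOpenGraph(string $html): array
{
    $pattern = '/<meta\b[^>]*\bproperty=(["\'])og:(.*?)\1[^>]*\bcontent=(["\'])(.*?)\3/is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $tags = [];
    foreach ($matches as $m) {
        $tags[$m[2]] = html_entity_decode($m[4], ENT_QUOTES, 'UTF-8');
    }
    return $tags;
}
```

Patterns like these are quick to write but fragile against unusual markup, which is the usual trade-off against a DOM-based parser.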
What's included?
This project includes one class which does not take care of the CURL action itself. For this, please see my PHP-Curler library.
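For context, fetching the markup yourself might look roughly like the following. This is a minimal sketch using PHP's bundled cURL extension directly; it is not PHP-Curler's API, and the function name sketchFetchBody and its timeout parameter are assumptions made for this example.

```php
<?php
// Hypothetical helper; uses the stock cURL extension, not PHP-Curler.

/** Fetch a URL and return its body as a string, or null on failure. */
function sketchFetchBody(string $url, int $timeoutSeconds = 10): ?string
{
    $handle = curl_init($url);
    if ($handle === false) {
        return null;
    }

    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);

    // curl_exec returns false on failure; the handle is released when it goes out of scope.
    $body = curl_exec($handle);

    return is_string($body) ? $body : null;
}
```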
This instantiable class includes the following public methods:
getBase: Returns the base property for the page. Useful for parsing links in a document.
getDescription: Returns the meta tag description for the page, if found.
getDetails: Returns an array of all the possible data that could be parsed from the document.
getFavicon: Returns the path to the favicon for the page. If one is not explicitly defined, it makes a best guess based on the host and base property.
getKeywords: Returns the page's meta tag keywords, if defined.
getOpenGraph: Returns OpenGraph details for the page, if defined.
getTitle: Returns the title for the page, if defined.
getURL: Returns the parsed URL for the page.
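The favicon fallback is the least obvious of these, so here is a small sketch of the idea. It is a simplification and not the library's code: the name sketchFavicon is hypothetical, and the fallback here uses only the URL's host (the conventional /favicon.ico location), ignoring the base property.

```php
<?php
// Hypothetical sketch; the real getFavicon may resolve paths differently.

/** Return an explicitly declared favicon href, else a best guess, else null. */
function sketchFavicon(string $html, string $url): ?string
{
    // Prefer an explicit <link rel="icon"> or <link rel="shortcut icon">.
    // Assumption: "rel" comes before "href" within the tag.
    $pattern = '/<link\b[^>]*\brel=(["\'])(?:shortcut )?icon\1[^>]*\bhref=(["\'])(.*?)\2/is';
    if (preg_match($pattern, $html, $m)) {
        return $m[3];
    }

    // Otherwise guess the conventional /favicon.ico on the page's host.
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    return $parts['scheme'] . '://' . $parts['host'] . '/favicon.ico';
}
```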
How do I use it?
This library expects that you already have the body of text that is to be parsed. Create an instance of a MetaParser object, passing in the body of text and the URL, and you can then access all the data through the getDetails method on that instance.
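The flow looks roughly like this. So that the snippet runs on its own, it uses a tiny stand-in class (MetaParserSketch) rather than the real MetaParser. The constructor's argument order (body first, then URL) and the keys of the returned array are assumptions on my part, so check the class itself before relying on them.

```php
<?php
// Stand-in for illustration only; the real MetaParser class lives in the library.
// Assumption: the constructor takes the body first and the URL second.
class MetaParserSketch
{
    private string $body;
    private string $url;

    public function __construct(string $body, string $url)
    {
        $this->body = $body;
        $this->url = $url;
    }

    /** Return the trimmed contents of the <title> element, or null if absent. */
    public function getTitle(): ?string
    {
        return preg_match('/<title[^>]*>(.*?)<\/title>/is', $this->body, $m)
            ? trim($m[1])
            : null;
    }

    /** Assumption: these keys are made up; the real array's shape may differ. */
    public function getDetails(): array
    {
        return [
            'title' => $this->getTitle(),
            'url'   => $this->url,
        ];
    }
}

// The body would normally come from a CURL call made elsewhere.
$body = '<html><head><title>Example page</title></head><body></body></html>';

$parser = new MetaParserSketch($body, 'http://example.com/');
$details = $parser->getDetails();
```

With the real library, the last two lines would instantiate MetaParser instead of the stand-in.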
Why did I abstract it out?
That's an especially relevant question with this library, as it's been decoupled from the actual CURLing.
I abstracted this library out to do just the parsing, as I found I was performing CURL calls elsewhere in my codebase. I didn't want the parsing to be an inherent part of that.
I thought about extending the CURL library for the MetaParser class, but for a reason I can't recall right now, it didn't make sense programmatically or from a business-logic perspective.
PHP-Gravatar, in its beautiful simplicity, is next.