Useful PHP functions for scraping content
file_get_contents()
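Reads an entire file or URL into a string. Some hosts disable URL access for this function (the allow_url_fopen setting), in which case you'll have to use cURL instead. cURL is more flexible anyway: it lets you set custom headers, cookies, and timeouts.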
Example:
$page_contents = file_get_contents("http://example.com/page.html");
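As a rough sketch of the cURL fallback mentioned above, the helper below tries file_get_contents() first and only switches to cURL when URL access is turned off. The function name fetch_page() and the user-agent string are made up for illustration, and the 10 second timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper (not part of PHP): returns the page body, or null on failure.
function fetch_page(string $source): ?string
{
    $is_url = preg_match('#^https?://#i', $source) === 1;
    $url_fopen_ok = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    // Local files always work; URLs only work when allow_url_fopen is enabled.
    if (!$is_url || $url_fopen_ok) {
        $body = @file_get_contents($source);
        return $body === false ? null : $body;
    }

    // Fall back to cURL if the extension is available.
    if (!function_exists('curl_init')) {
        return null;
    }
    $ch = curl_init($source);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,                 // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,                 // follow redirects
        CURLOPT_TIMEOUT        => 10,                   // assumed timeout in seconds
        CURLOPT_USERAGENT      => 'ExampleScraper/1.0', // placeholder user agent
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

Calling fetch_page("http://example.com/page.html") would then give you the same string as the example above, or null if the request failed.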
preg_match()
Captures pieces of content that match a regular expression. It stops at the first match.
Example:
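preg_match('/href="(.*?)"\s+title="(.*?)"/',$page_contents,$match);
$match[0] will hold the whole matched text, and $match[1], $match[2], and so on will hold the content captured by each (.*?) in order. The closing quotes matter: a lazy (.*?) at the very end of a pattern has nothing to stop at, so it captures an empty string. This pattern assumes the attributes are double-quoted and that title comes right after href.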
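To see what ends up in $match, here is a small self-contained sketch run against a made-up link. The sample HTML is invented for illustration, and the pattern assumes double-quoted attributes with title directly after href.

```php
<?php
// Sample markup (invented for this example).
$page_contents = '<a href="/games/42" title="Example Game">link</a>';

// preg_match() returns 1 when the pattern matches, 0 when it does not.
if (preg_match('/href="(.*?)"\s+title="(.*?)"/', $page_contents, $match) === 1) {
    $link  = $match[1]; // first (.*?)  -> /games/42
    $title = $match[2]; // second (.*?) -> Example Game
} else {
    $link = $title = null; // nothing matched
}

echo $link . " | " . $title . "\n";
```

Checking the return value before touching $match avoids notices about missing array keys when the page layout changes.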
preg_match_all()
Runs your regular expression repeatedly across the content and produces an array of every match. Useful for getting a list of items that share the same surrounding pattern, such as your Steam games list.
Example:
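preg_match_all("/<h4>(.*?)<\/h4>/",$page_contents,$matches);
$matches[1] will be an array of the text inside every h4 heading, and $matches[0] will hold the full matches including the tags. Note that the / in the closing tag has to be escaped because / is also the pattern delimiter.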
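A minimal sketch of the heading example, using invented sample markup. It uses # as the pattern delimiter so the / in the closing tag needs no escaping, and the s modifier so a heading that spans several lines would still match.

```php
<?php
// Sample markup (invented for this example).
$page_contents = "<h4>Half-Life</h4>\n<p>...</p>\n<h4>Portal</h4>\n<h4>Team Fortress 2</h4>";

// preg_match_all() returns the number of full matches found.
$count = preg_match_all('#<h4>(.*?)</h4>#s', $page_contents, $matches);

// $matches[0] holds the full matches (with tags), $matches[1] the captured text.
$headings = $matches[1];

foreach ($headings as $heading) {
    echo $heading . "\n";
}
```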
print_r() or var_dump()
Display your data and arrays to test if your code is getting the content you want.
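A short sketch of both functions on a made-up array. Passing true as the second argument to print_r() makes it return the text instead of printing it, which is handy when you want to wrap it in pre tags for viewing in a browser.

```php
<?php
// Sample data (invented for this example).
$games = ['Half-Life', 'Portal'];

// Capture the readable dump as a string instead of printing it directly.
$dump = print_r($games, true);
echo "<pre>" . htmlspecialchars($dump) . "</pre>\n";

// var_dump() prints directly and also shows types and lengths.
var_dump($games);
```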
str_replace() or preg_replace()
Replaces strings in content. Good for cleaning up extra junk or modifying content. str_replace() matches literal strings, while preg_replace() matches regular expressions.
Example:
$content = str_replace("<h4>","<h3>",$content);
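The sketch below puts the two side by side on an invented snippet. str_replace() accepts arrays, so the opening and closing tags can be swapped in one call (replacing only the opening tag would leave a mismatched closing h4 behind), and preg_replace() then collapses runs of whitespace.

```php
<?php
// Sample markup (invented for this example).
$content = "<h4>Title</h4>  <p>Some   text</p>";

// Literal replacement: swap both the opening and closing h4 tags for h3.
$content = str_replace(['<h4>', '</h4>'], ['<h3>', '</h3>'], $content);

// Regex replacement: collapse any run of whitespace into a single space.
$content = preg_replace('/\s+/', ' ', $content);

echo $content . "\n";
```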