<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Source Allies Blog &#187; Mechanize</title>
	<atom:link href="http://blogs.sourceallies.com/tag/mechanize/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.sourceallies.com</link>
	<description>Technical and process thinking from Source Allies employees</description>
	<lastBuildDate>Mon, 26 Jul 2010 15:01:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Automating the Web with WWW::Mechanize</title>
		<link>http://blogs.sourceallies.com/2010/01/automating-the-web-with-wwwmechanize/</link>
		<comments>http://blogs.sourceallies.com/2010/01/automating-the-web-with-wwwmechanize/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 14:00:19 +0000</pubDate>
		<dc:creator>Joel Leyh</dc:creator>
				<category><![CDATA[Perl]]></category>
		<category><![CDATA[Mechanize]]></category>

		<guid isPermaLink="false">http://blogs.sourceallies.com/?p=953</guid>
		<description><![CDATA[And yes, the double colon does mean Perl. However, Python has a mechanize library modeled after the Perl module. So even if py- is your favorite prefix, this should still be useful.
WWW::Mechanize gives you basic access to a &#8220;web browser&#8221; from your Perl scripts. It has the concept of getting, putting, [...]]]></description>
			<content:encoded><![CDATA[<p>And yes, the double colon does mean Perl. However, Python has a mechanize library modeled after the Perl module. So even if py- is your favorite prefix, this should still be useful.</p>
<p><a href="http://search.cpan.org/~petdance/WWW-Mechanize/lib/WWW/Mechanize.pm" target="_blank">WWW::Mechanize</a> gives you basic access to a &#8220;web browser&#8221; from your Perl scripts. It has the concept of getting, putting, ticking and clicking. Use an image map, or enter text into a text box. It even has a back button! Using all these and more, one can make quite the script to do most anything. I&#8217;ve used this before to create a script that logged into a Google Search Appliance and downloaded a backup file. (For some reason, there is no way to push backups from within a GSA.)</p>
<p>More recently, I decided to automate the downloading of PDF statements from my bank&#8217;s website. This is a popular use for WWW::Mechanize, and I&#8217;ll go through a quick script which will do just this.<br />
<span id="more-953"></span><br />
Let&#8217;s start like any good Perl script should, and also include some needed modules&#8230;</p>

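<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">use</span> strict<span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">use</span> warnings<span style="color: #339933;">;</span>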
&nbsp;
<span style="color: #000000; font-weight: bold;">use</span> WWW<span style="color: #339933;">::</span><span style="color: #006600;">Mechanize</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">use</span> HTTP<span style="color: #339933;">::</span><span style="color: #006600;">Cookies</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$robut</span> <span style="color: #339933;">=</span> WWW<span style="color: #339933;">::</span><span style="color: #006600;">Mechanize</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">new</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;"># look like a real person</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">agent</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'User-Agent=Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;"># we need cookies</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">cookie_jar</span><span style="color: #009900;">&#40;</span>HTTP<span style="color: #339933;">::</span><span style="color: #006600;">Cookies</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">new</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>This simply creates a new Mechanize object and sets a browser-like user-agent string. We also need to keep cookies between requests, so we give it a new cookie jar.</p>
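<p>As a rough sketch of the same setup, the options can also be passed straight to the constructor. The user-agent is a Firefox string like the one above. The <code>autocheck</code> option is an addition that is not part of the original script: as far as I know it makes Mechanize die on a failed request, which would turn the explicit <code>success</code> checks used later into a safety net rather than a requirement.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;">use strict;
use warnings;
use WWW::Mechanize;

my $robut = WWW::Mechanize-&gt;new(
    # value of the User-Agent header only, without the header name
    agent      =&gt; 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5',
    cookie_jar =&gt; {},  # empty, in-memory cookie jar
    autocheck  =&gt; 1,   # assumption: die automatically when a request fails
);

# show the user-agent string the object will send
print $robut-&gt;agent, &quot;\n&quot;;</pre></div></div>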
<p>Next, we&#8217;ll load the first page and set the credentials. Hopefully, the rest of the code (and the bank I use) is self-explanatory&#8230;</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #666666; font-style: italic;"># we start at login</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">get</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'https://www.yourbankURLhere.com/banking/router?requestCmd=LoginPage'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">success</span> <span style="color: #b1b100;">or</span> <span style="color: #000066;">die</span> <span style="color: #ff0000;">&quot;login GET fail&quot;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$user</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">'woooobar'</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$pass</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">'piglet'</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># find and fill out the login form</span>
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$login</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">form_name</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;logon&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$login</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">value</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'USERID'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">$user</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">submit</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">success</span> <span style="color: #b1b100;">or</span> <span style="color: #000066;">die</span> <span style="color: #ff0000;">&quot;login POST fail&quot;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;Login done<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span></pre></div></div>

<p>I get a &#8220;form&#8221; object from the <code>form_name</code> call, which selects the form on the current page by the &#8220;name&#8221; attribute of its <code>&lt;form&gt;</code> tag. Using Firebug in Firefox, this name is easy to find in the HTML of the page.</p>
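<p>To see what the form object does without touching a real site, here is a small offline sketch using HTML::Form, the class of the objects Mechanize hands back for forms. The markup, the hidden <code>requestCmd</code> field, the submit button and the example.com base URL are all made up for illustration; only the form name and the <code>USERID</code> field come from the script above.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;">use strict;
use warnings;
use HTML::Form;

# Hypothetical login page markup; the real form comes from the bank's HTML.
my $html = &lt;&lt;'HTML';
&lt;form name=&quot;logon&quot; method=&quot;post&quot; action=&quot;/banking/router&quot;&gt;
  &lt;input type=&quot;hidden&quot; name=&quot;requestCmd&quot; value=&quot;Login&quot;&gt;
  &lt;input type=&quot;text&quot; name=&quot;USERID&quot; value=&quot;&quot;&gt;
  &lt;input type=&quot;submit&quot; name=&quot;go&quot; value=&quot;Continue&quot;&gt;
&lt;/form&gt;
HTML

# Parse the page and pick the form by its name attribute,
# which is roughly what form_name(&quot;logon&quot;) does on the current page.
my ($form) = grep { ($_-&gt;attr('name') || '') eq 'logon' }
             HTML::Form-&gt;parse($html, 'https://www.example.com/');
defined $form or die &quot;no form named logon&quot;;

# Fill in one field; the hidden field keeps the value the page supplied.
$form-&gt;value('USERID' =&gt; 'woooobar');

# Build (but do not send) the request a submit would produce.
my $request = $form-&gt;make_request;
print $request-&gt;method, ' ', $request-&gt;uri, &quot;\n&quot;;
print $request-&gt;content, &quot;\n&quot;;</pre></div></div>

<p>Printing the request body is a quick way to check which fields would actually be posted before pointing the script at a real login page.</p>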
<p>And because my bank has the really annoying feature of prompting me to answer yet another question, I have the next bit of code to handle that&#8230;</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$response</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">response</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">content</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># we have another step</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000066;">index</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$response</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'Answer your other Question'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
	<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;Answer needed...&quot;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$ans</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;"># and we need to figure out which question was asked</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000066;">index</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$response</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'what goes well with foo'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000ff;">$ans</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">'more foo'</span><span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000066;">index</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$response</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'where does your mother live'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&gt;</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #0000ff;">$ans</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">'not here'</span><span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #0000ff;">$login</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">form_name</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'challenge'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #0000ff;">$login</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">value</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'ANSWER'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">$ans</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #0000ff;">$login</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">value</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'CHALLENGEANSWER'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">$ans</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">submit</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">success</span> <span style="color: #b1b100;">or</span> <span style="color: #000066;">die</span> <span style="color: #ff0000;">&quot;challenge POST fail&quot;</span><span style="color: #339933;">;</span>
	<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;Question done<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>This part uses <code>$robut-&gt;response-&gt;content</code>, the HTML of the response, which I search for various strings to decide what to do next. Again, Firebug&#8217;s Net panel was helpful in determining what I needed to submit to the form. WWW::Mechanize will use any default values provided by the page, so you don&#8217;t need to repeat every form item. <strong>One important thing to note</strong> is that WWW::Mechanize doesn&#8217;t run JavaScript. So, if the page runs a bunch of JS before the form actually posts, your script has to reproduce whatever that JS does, for example by setting the same form fields itself.</p>
<p>Next comes the password page, which is filled in and submitted the same way.</p>
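<p>The string matching itself can be tried without a bank. The sketch below keeps the question and answer pairs in a hash instead of one <code>if</code> per question, which is a design choice of this example rather than the original script. The page fragment is made up; the questions and answers are the placeholder ones from above. Note that <code>index</code> returns -1 when the string is not found and 0 for a match at the very start, so the test is <code>&gt;= 0</code>.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;">use strict;
use warnings;

# Hypothetical fragment of a challenge page.
my $page = '&lt;h1&gt;Answer your other Question&lt;/h1&gt;&lt;p&gt;So, where does your mother live?&lt;/p&gt;';

# Question fragment =&gt; answer, using the placeholder pairs from the script.
my %answers = (
    'what goes well with foo'     =&gt; 'more foo',
    'where does your mother live' =&gt; 'not here',
);

my $ans;
for my $question (keys %answers) {
    # index returns -1 when the fragment is absent from the page
    if (index($page, $question) &gt;= 0) {
        $ans = $answers{$question};
        last;
    }
}

# refuse to continue when the page asks something we have no answer for
defined $ans or die &quot;unrecognised challenge question&quot;;
print &quot;$ans\n&quot;;   # prints &quot;not here&quot;</pre></div></div>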

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #666666; font-style: italic;"># time for the password</span>
<span style="color: #0000ff;">$login</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">form_name</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'password'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$login</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">value</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'PSWD'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">$pass</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">submit</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">success</span> <span style="color: #b1b100;">or</span> <span style="color: #000066;">die</span> <span style="color: #ff0000;">&quot;password POST fail&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;Password done<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Now we can &#8220;click&#8221; on the link on the main account page. The <code>follow_link</code> method finds the first link with the given &#8220;text&#8221;, meaning the text between the opening and closing <code>&lt;a&gt;</code> tags, and follows it.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">follow_link</span><span style="color: #009900;">&#40;</span>text <span style="color: #339933;">=&gt;</span> <span style="color: #ff0000;">'Online Statements'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #0000ff;">$robut</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">success</span> <span style="color: #b1b100;">or</span> <span style="color: #000066;">die</span> <span style="color: #ff0000;">&quot;stmts LINK fail&quot;</span><span style="color: #339933;">;</span></pre></div></div>

<p>From here on out, you can code up logic to determine whether you are missing any statements, and download them. Or maybe navigate to a page which offers an <a href="http://en.wikipedia.org/wiki/Open_Financial_Exchange">OFX</a> file download, which you then load into <a href="http://www.gnucash.org/">GnuCash</a> because you do a very good job of keeping track of your finances.</p>
<p><em>The human must click<br />
With mechanize write a script<br />
Now CRON does the work<br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.sourceallies.com/2010/01/automating-the-web-with-wwwmechanize/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
