<h1 id="extracting-email-addresses-from-outlook">Extracting email addresses from Outlook</h1>
<p><small>Cyotek Development Blog, 2012-09-26</small></p>
<p>The cyotek.com domain receives an awful lot of spam, and a lot of this
is sent to email addresses that don't exist. However, as we
currently have catch-alls enabled, we receive it
regardless. This is compounded by the fact that I tend to create
a unique email address for each website or service I interact
with. And it's impossible to remember them all!</p>
<p>As a first step to deleting the catch-alls, I wanted to see how
many unique @cyotek.com addresses were in use. The simplest way
of picking these up would be scanning PST files - we have email
going back to 2002 in these files, and there's the odd backup
elsewhere going back even further. The last time I used OLE
Automation with Outlook was back in the days of VB6, and I well
recall being plagued with permission dialogs each time I dreamed
of trying to access the API. Still, I thought I'd take a look.</p>
<figure class="screenshot" ><a href="https://images.cyotek.com/image/devblog/outlook.png" class="gallery" title="A console application merrily extracting my personal email addresses from my Outlook store" ><img src="https://images.cyotek.com/image/thumbnail/devblog/outlook.png" alt="A console application merrily extracting my personal email addresses from my Outlook store" decoding="async" loading="lazy" /></a><figcaption>A console application merrily extracting my personal email addresses from my Outlook store</figcaption></figure><h2 id="setting-up">Setting up</h2>
<blockquote>
<p>Note: I tested this project on an Outlook profile which has
loaded a primary PST, an archive PST, and a Gmail account. I
haven't tested this with any other type of account (for
example Exchange) or with accounts using non-SMTP email
addresses. Caveat emptor!</p>
</blockquote>
<p>The first thing to do is add a reference to the Outlook COM
objects. I have VS2010 and VS2012 installed on this machine, and
one of them has installed a bunch of prepared Office Interop
DLLs into the GAC. Handy, I won't have to create my own! Adding
a reference to the <strong>Microsoft Outlook 14.0 Object Library</strong>
added three references to my project:
<strong>Microsoft.Office.Interop.Outlook.dll</strong>, <strong>Office.dll</strong> and
<strong>stdole</strong>.</p>
<blockquote>
<p>Note: Depending on your version of VS / .NET Framework, the
references may have a property named <strong>Embed Interop Types</strong>
which defaults to <code>true</code>. When left at this, you may have
problems debugging as you won't be able to access the objects
properly through the Immediate window, instead getting an
error similar to</p>
<blockquote>
<p>&quot;Member 'To' on embedded interop type
'Microsoft.Office.Interop.Outlook.MailItem' cannot be
evaluated while debugging since it is never referenced in the
program. Consider casting the source object to type 'dynamic'
first or building with the 'Embed Interop Types' property set
to false when debugging&quot;</p>
</blockquote>
<p>Probably a good idea to set this to <strong>false</strong> before debugging
your code!</p>
</blockquote>
<h2 id="connecting-to-outlook">Connecting to Outlook</h2>
<blockquote>
<p>All the code below assumes that you have a <code>using Microsoft.Office.Interop.Outlook;</code> statement at the top of
your code file.</p>
</blockquote>
<p>Connecting to Outlook is easy enough: just create a new instance
of the <code>Application</code> interface. We'll use this as the root for everything
else.</p>
<figure class="lang-csharp highlight"><figcaption><span>csharp</span></figcaption><pre class="code">
Application application<span class="symbol">;</span>

application <span class="symbol">=</span> <span class="keyword">new</span> Application<span class="symbol">(</span><span class="symbol">)</span><span class="symbol">;</span>
</pre>
</figure>
<blockquote>
<p>Remember I mentioned permission dialogs? Older versions of
Outlook used to prompt for permissions. Outlook 2010 just
seems to quietly get on with things. The only thing I've
noticed is that if you try and create a new <code>Application</code> when
Outlook isn't currently running, it will be silently started
and the system tray will show a slightly different icon
with a tooltip informing you that another program is using
Outlook. Much nicer than previous behaviours!</p>
</blockquote>
<h2 id="getting-account-folders">Getting Account Folders</h2>
<p>The <code>Session</code> property of the <code>Application</code> interface returns a
<code>NameSpace</code> that details your Outlook setup, and allows access
to accounts, profile details etc. However, for this project, the
only thing I care about is the <code>Folders</code> property which returns
a collection of <code>MAPIFolder</code> objects. In my case, it was the
three top level folders for my profile - I was actually somewhat
surprised that the Gmail account was loaded.</p>
<p>Now that we have a folder, we can scan it by enumerating the
<code>Items</code> property. As Outlook folders can contain items of
various types, you need to check the item type - I'm looking for
<code>MailItem</code> objects in order to extract those addresses.</p>
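<p>As a rough illustration of the two paragraphs above, the sketch
below lists each top level folder in the profile and counts the
<code>MailItem</code> objects directly inside it. It is not taken from
the example project: the <code>FolderLister</code> class name and the
output format are assumptions, and it needs a reference to the Outlook
interop assembly and a machine with Outlook installed.</p>
<figure class="lang-csharp highlight"><figcaption><span>csharp</span></figcaption><pre class="code">
using System;
using Microsoft.Office.Interop.Outlook;

// hypothetical class name, used for illustration only
internal static class FolderLister
{
  private static void Main()
  {
    Application application;
    NameSpace session;

    application = new Application();
    session = application.Session;

    // one entry per top level folder in the profile (PST files, accounts)
    foreach (MAPIFolder folder in session.Folders)
    {
      int mailCount;

      mailCount = 0;

      // folders can hold appointments, contacts and so on, so only count emails
      foreach (object item in folder.Items)
      {
        if (item is MailItem)
          mailCount++;
      }

      Console.WriteLine(&quot;{0}: {1} email(s) at the top level&quot;, folder.Name, mailCount);
    }
  }
}
</pre>
</figure>
<p>This only looks at items sitting directly in each top level
folder; walking sub folders needs recursion over the
<code>Folders</code> property of each <code>MAPIFolder</code>.</p>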
<h2 id="pulling-out-email-addresses">Pulling out email addresses</h2>
<p>Each <code>MailItem</code> has <code>Sender</code>, <code>To</code> and <code>Recipients</code> properties.
<code>To</code> seems to be just a string version of <code>Recipients</code> and so
shall be completely ignored - why bother parsing it manually
when <code>Recipients</code> already does it for you? The <code>Sender</code> property
returns an <code>AddressEntry</code>, and each item in the <code>Recipients</code>
collection (a <code>Recipient</code>) offers an <code>AddressEntry</code> property. So
we're all set!</p>
<p>The following code snippet is from the example project, and
basically shows how I scan a source <code>MAPIFolder</code> looking for
<code>MailItem</code> objects.</p>
<figure class="lang-csharp highlight"><figcaption><span>csharp</span></figcaption><pre class="code">
<span class="keyword">protected</span> <span class="keyword">virtual</span> <span class="keyword">void</span> ScanFolder<span class="symbol">(</span>MAPIFolder folder<span class="symbol">)</span>
<span class="symbol">{</span>
  <span class="keyword">this</span><span class="symbol">.</span>CurrentFolderIndex<span class="symbol">++</span><span class="symbol">;</span>
  <span class="keyword">this</span><span class="symbol">.</span>OnFolderScanning<span class="symbol">(</span><span class="keyword">new</span> MAPIFolderEventArgs<span class="symbol">(</span>folder<span class="symbol">,</span> <span class="keyword">this</span><span class="symbol">.</span>FolderCount<span class="symbol">,</span> <span class="keyword">this</span><span class="symbol">.</span>CurrentFolderIndex<span class="symbol">)</span><span class="symbol">)</span><span class="symbol">;</span>

  <span class="comment">// items</span>
  <span class="keyword">foreach</span> <span class="symbol">(</span><span class="keyword">object</span> item <span class="keyword">in</span> folder<span class="symbol">.</span>Items<span class="symbol">)</span>
  <span class="symbol">{</span>
    <span class="keyword">if</span> <span class="symbol">(</span>item <span class="keyword">is</span> MailItem<span class="symbol">)</span>
    <span class="symbol">{</span>
      MailItem email<span class="symbol">;</span>

      email <span class="symbol">=</span> <span class="symbol">(</span>MailItem<span class="symbol">)</span>item<span class="symbol">;</span>

      <span class="comment">// add the sender of the email</span>
      <span class="keyword">if</span> <span class="symbol">(</span><span class="keyword">this</span><span class="symbol">.</span>Options<span class="symbol">.</span>HasFlag<span class="symbol">(</span>Options<span class="symbol">.</span>Sender<span class="symbol">)</span><span class="symbol">)</span>
        <span class="keyword">this</span><span class="symbol">.</span>ProcessAddress<span class="symbol">(</span>email<span class="symbol">.</span>Sender<span class="symbol">)</span><span class="symbol">;</span>

      <span class="comment">// add the recipients of the email</span>
      <span class="keyword">if</span> <span class="symbol">(</span><span class="keyword">this</span><span class="symbol">.</span>Options<span class="symbol">.</span>HasFlag<span class="symbol">(</span>Options<span class="symbol">.</span>Recipient<span class="symbol">)</span><span class="symbol">)</span>
      <span class="symbol">{</span>
        <span class="keyword">foreach</span> <span class="symbol">(</span>Recipient recipient <span class="keyword">in</span> email<span class="symbol">.</span>Recipients<span class="symbol">)</span>
          <span class="keyword">this</span><span class="symbol">.</span>ProcessAddress<span class="symbol">(</span>recipient<span class="symbol">.</span>AddressEntry<span class="symbol">)</span><span class="symbol">;</span>
      <span class="symbol">}</span>
    <span class="symbol">}</span>
  <span class="symbol">}</span>

  <span class="comment">// sub folders</span>
  <span class="keyword">if</span> <span class="symbol">(</span><span class="keyword">this</span><span class="symbol">.</span>Options<span class="symbol">.</span>HasFlag<span class="symbol">(</span>Options<span class="symbol">.</span>SubFolders<span class="symbol">)</span><span class="symbol">)</span>
  <span class="symbol">{</span>
    <span class="keyword">foreach</span> <span class="symbol">(</span>MAPIFolder childFolder <span class="keyword">in</span> folder<span class="symbol">.</span>Folders<span class="symbol">)</span>
      <span class="keyword">this</span><span class="symbol">.</span>ScanFolder<span class="symbol">(</span>childFolder<span class="symbol">)</span><span class="symbol">;</span>
  <span class="symbol">}</span>
<span class="symbol">}</span>
</pre>
</figure>
<p>When I find an <code>AddressEntry</code> to process, I call the following
functions:</p>
<figure class="lang-csharp highlight"><figcaption><span>csharp</span></figcaption><pre class="code">
<span class="keyword">protected</span> <span class="keyword">virtual</span> <span class="keyword">void</span> ProcessAddress<span class="symbol">(</span>AddressEntry addressEntry<span class="symbol">)</span>
<span class="symbol">{</span>
  <span class="keyword">if</span> <span class="symbol">(</span>addressEntry <span class="symbol">!=</span> <span class="keyword">null</span> <span class="symbol">&amp;&amp;</span> <span class="symbol">(</span>addressEntry<span class="symbol">.</span>AddressEntryUserType <span class="symbol">==</span> OlAddressEntryUserType<span class="symbol">.</span>olSmtpAddressEntry <span class="symbol">||</span> addressEntry<span class="symbol">.</span>AddressEntryUserType <span class="symbol">==</span> OlAddressEntryUserType<span class="symbol">.</span>olOutlookContactAddressEntry<span class="symbol">)</span><span class="symbol">)</span>
    <span class="keyword">this</span><span class="symbol">.</span>ProcessAddress<span class="symbol">(</span>addressEntry<span class="symbol">.</span>Address<span class="symbol">)</span><span class="symbol">;</span>
  <span class="keyword">else</span> <span class="keyword">if</span> <span class="symbol">(</span>addressEntry <span class="symbol">!=</span> <span class="keyword">null</span><span class="symbol">)</span>
    Debug<span class="symbol">.</span>Print<span class="symbol">(</span><span class="string">&quot;Unknown address type: {0} ({1})&quot;</span><span class="symbol">,</span> addressEntry<span class="symbol">.</span>AddressEntryUserType<span class="symbol">,</span> addressEntry<span class="symbol">.</span>Address<span class="symbol">)</span><span class="symbol">;</span>
<span class="symbol">}</span>

<span class="keyword">protected</span> <span class="keyword">virtual</span> <span class="keyword">void</span> ProcessAddress<span class="symbol">(</span><span class="keyword">string</span> emailAddress<span class="symbol">)</span>
<span class="symbol">{</span>
  <span class="keyword">int</span> domainStartPosition<span class="symbol">;</span>

  <span class="comment">// check for null or empty first, as calling IndexOf on a null string would throw</span>
  domainStartPosition <span class="symbol">=</span> <span class="keyword">string</span><span class="symbol">.</span>IsNullOrEmpty<span class="symbol">(</span>emailAddress<span class="symbol">)</span> <span class="symbol">?</span> <span class="symbol">-</span><span class="number">1</span> <span class="symbol">:</span> emailAddress<span class="symbol">.</span>IndexOf<span class="symbol">(</span><span class="string">&quot;@&quot;</span><span class="symbol">)</span><span class="symbol">;</span>

  <span class="keyword">if</span> <span class="symbol">(</span>domainStartPosition <span class="symbol">!=</span> <span class="symbol">-</span><span class="number">1</span><span class="symbol">)</span>
  <span class="symbol">{</span>
    <span class="keyword">bool</span> canAdd<span class="symbol">;</span>

    <span class="keyword">if</span> <span class="symbol">(</span><span class="keyword">this</span><span class="symbol">.</span>Options<span class="symbol">.</span>HasFlag<span class="symbol">(</span>Options<span class="symbol">.</span>FilterByDomain<span class="symbol">)</span><span class="symbol">)</span>
      canAdd <span class="symbol">=</span> <span class="keyword">this</span><span class="symbol">.</span>IncludedDomains<span class="symbol">.</span>Contains<span class="symbol">(</span>emailAddress<span class="symbol">.</span>Substring<span class="symbol">(</span>domainStartPosition <span class="symbol">+</span> <span class="number">1</span><span class="symbol">)</span><span class="symbol">)</span><span class="symbol">;</span>
    <span class="keyword">else</span>
      canAdd <span class="symbol">=</span> <span class="keyword">true</span><span class="symbol">;</span>

    <span class="keyword">if</span> <span class="symbol">(</span>canAdd<span class="symbol">)</span>
      <span class="keyword">this</span><span class="symbol">.</span>EmailAddresses<span class="symbol">.</span>Add<span class="symbol">(</span>emailAddress<span class="symbol">)</span><span class="symbol">;</span>
  <span class="symbol">}</span>
<span class="symbol">}</span>
</pre>
</figure>
<p>Although I'm scanning my entire PST, I don't want every single
email address in there - I ran it once and it brought back just
over 5000 addresses. What I want is addresses tied to the
domains I own, so I added some filtering for this. With this
filtering enabled it returned a more manageable 497 unique
addresses. Not that I'm going to create 497 aliases on the email
server!</p>
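<p>The domain filtering and the &quot;unique addresses&quot; part can be
sketched on their own, without Outlook. This is a minimal
illustration rather than the example project's code: the
<code>DomainFilter</code> class, its <code>GetDomain</code> and
<code>Filter</code> methods, and the choice of case-insensitive
<code>HashSet</code> collections are all assumptions.</p>
<figure class="lang-csharp highlight"><figcaption><span>csharp</span></figcaption><pre class="code">
using System;
using System.Collections.Generic;

// hypothetical class, used for illustration only
public static class DomainFilter
{
  // returns the text after the first @, or null if there isn't one
  public static string GetDomain(string emailAddress)
  {
    int domainStartPosition;

    if (string.IsNullOrEmpty(emailAddress))
      return null;

    domainStartPosition = emailAddress.IndexOf('@');

    return domainStartPosition != -1 ? emailAddress.Substring(domainStartPosition + 1) : null;
  }

  // keeps one copy of each address whose domain is in includedDomains
  public static HashSet&lt;string&gt; Filter(IEnumerable&lt;string&gt; addresses, IEnumerable&lt;string&gt; includedDomains)
  {
    // assumption: addresses and domains are compared without regard to case
    HashSet&lt;string&gt; domains = new HashSet&lt;string&gt;(includedDomains, StringComparer.OrdinalIgnoreCase);
    HashSet&lt;string&gt; results = new HashSet&lt;string&gt;(StringComparer.OrdinalIgnoreCase);

    foreach (string address in addresses)
    {
      string domain = GetDomain(address);

      if (domain != null &amp;&amp; domains.Contains(domain))
        results.Add(address);
    }

    return results;
  }
}
</pre>
</figure>
<p>For example, filtering <code>sales@example.com</code>,
<code>Sales@Example.com</code> and <code>someone@example.org</code>
against the single domain <code>example.com</code> leaves one address
in the set, because the set treats the first two as duplicates and
the third belongs to a different domain.</p>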
<h2 id="wrapping-up">Wrapping up</h2>
<p>This was a lot easier than I was expecting, and in fact this
is probably the smoothest piece of COM interop I've done with
.NET yet. No strange errors, no being forced to compile in 32-bit mode,
It Just Works.</p>
<p>You can find the example project in the link below.</p>
<h2 id="update-history">Update History</h2>
<ul>
<li>2012-09-26 - First published</li>
<li>2020-11-21 - Updated formatting</li>
</ul>

<p><small>
All content <a href="https://devblog.cyotek.com/copyright-and-trademarks">Copyright (c) by Cyotek Ltd</a> or its respective writers. Permission to reproduce news and web log entries and other RSS feed content in unmodified form without notice is granted provided they are not used to endorse or promote any products or opinions (other than what was expressed by the author) and without taking them out of context. Written permission from the copyright owner must be obtained for everything else.<br />Original URL of this content is https://devblog.cyotek.com/post/extracting-email-addresses-from-outlook .
</small></p>
<p>Richard Moss, https://www.cyotek.com/, richard.moss@cyotek.com</p>