Detecting and Dealing with Website Theft

Hero image for Detecting and Dealing with Website Theft. Image by James Pond.

It was Oscar Wilde who once said:

Imitation is the sincerest form of flattery...

And often, when quoted, that is as far as the quoter will go, intending to use the quote as a modern way to brush off mimicry as little more than paying a compliment. The full quote, however, is a little more devastating, although perhaps a lot less modern and a lot more clunky to use:

Imitation is the sincerest form of flattery that mediocrity can pay to greatness.

In context, what he actually says is that should imitation have to occur, then we can be confident in the fact that the individual doing the imitation is at best mediocre. The act of imitating is flattering by the very fact that it demonstrates just how unable and second-rate the imitator is when compared to the imitated.

In the world of web design and web development, the act of imitating, copying, or outright stealing another person's work is rife. I have found versions of my personal work more times than I care to remember, and I have never found it flattering.

For me, this has all risen to the surface again very recently when I stumbled across an almost exact replica of my (very recently released) personal website. They had not only replicated the look and feel down to matching structure, colours, and fonts, but had gone as far as to outright copy the text I'd written about myself.

Apparently, I had changed my name, moved to Aberdeen, and gotten a lot worse at both spelling and web development judging by that portfolio...

Due to the nature of the web, this sort of thing is incredibly common: your server delivers a copy of your work straight to their browser every time it is requested by anybody. Once it's there in their browser, it is often as easy as a File > Save click away, although there are some steps you can take to avoid making it too easy.

This has been happening with my own personal websites as far back as 2007. Seeing your hard work misappropriated or used as a marketing tool for somebody else never gets easier.

Although the real-world likelihood of my website ever getting confused with either of the above examples is very slim, it still very much amounts to copyright theft. An author would not allow you to publish a copy (or even part) of their book without permission, Apple spent seven years protecting their IP against Samsung, and in 1978, Twentieth Century Fox sued Universal Studios because Battlestar Galactica stepped just a little too far into the Star Wars universe. That video is absolutely fascinating, by the way!

In the world of web development, there are practical concerns at play, too: copying the text from somebody else's website (even just portions of it) causes duplicate content, which can lead to poorer search engine performance for both sites. In situations where the search engine is unable to ascertain which version is the original, it will simply filter both out altogether. Although not an officially-imposed penalty, what this then does is push both sites down the rankings or, even worse, rank the copied version higher than the original.

This sort of copying is virtually ubiquitous amongst younger, rookie developers starting out; my About page is a very common source for other developers' biographies and LinkedIn profiles.

Whilst some are happy for their work to be reused by anybody (even going as far as to publish it in free-to-access Git repositories), there are many who take ownership of their intellectual property very seriously. For them, there are a few steps that can be taken to help minimise copies, and to quickly detect and deal with those who copy it anyway.


Detecting Copied Text

As I mentioned above, it is impossible to stop people from copying and reusing your work if it is published online, and although there are ways you can intercept, track, and educate potential copiers, the fact is that if you get enough visitors to your site, your work will eventually find its way onto another website, advertising another (competing) person or company.

There are two key tools you can use to detect when this happens.

Google Alerts

As I've talked about previously, I track clipboard events on my websites and inject a copyright notice into them if something is copied off the site. It's a very simple process which does two things for me:

  • Makes sure that the person copying off the site is aware that the stuff they are copying is copyright protected;
  • Keeps a log of specifically what text was copied and when.

This second point is very useful: it gives me a heads-up of exactly what text I might later find being reused without permission.

Google Alerts is a free tool which you can use to shadow just these cases of copied text. Creating an alert for an excerpt of your text in quotes is very straightforward. It will immediately show you any other matching sites within the Google index, and you can then use options there to select how often and how many results you want Google to alert you about going forwards.

Screenshot of Google Alerts showing an existing case for "an advocate of fundamentally well-engineered, maintainable, testable and scalable", which has been copied and used on azharr.com.

The two important settings to pay attention to when setting up an alert are:

  • How often: As-it-happens
  • How many: All results

This ensures that you'll get an email more-or-less immediately when a matching website is indexed and that it sends you every single match rather than filtering any out.
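If you keep a log of copied excerpts, turning each one into a quoted phrase ready to paste into a new alert is easy to script. The following is only a rough sketch in Python: the function name `alert_phrase` and the twelve-word default are my own assumptions for illustration, not anything Google Alerts requires.

```python
def alert_phrase(text: str, max_words: int = 12) -> str:
    """Build a quoted, exact-match phrase from the first words of `text`."""
    # Remove double quotes (they would end the exact-match phrase early),
    # then split on any run of whitespace, including line breaks.
    words = text.replace('"', "").split()
    if not words:
        raise ValueError("cannot build an alert phrase from empty text")
    # Keep the phrase short: a handful of distinctive words is usually
    # enough to find a copy without relying on the whole paragraph matching.
    return '"' + " ".join(words[:max_words]) + '"'


if __name__ == "__main__":
    copied = "an advocate of fundamentally well-engineered,\nmaintainable, testable and scalable"
    print(alert_phrase(copied, max_words=8))
```

Shorter phrases are more forgiving if the copier has lightly edited the text, whilst longer ones produce fewer false matches; the cap is there so you can tune that trade-off.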

Copyscape

Copyscape is the other tool I use semi-regularly. This isn't free, but it is very affordable and worth the money: each scan of my personal website, including the blog articles, tends to come in under a couple of quid.

Their 'Batch Search' tool allows you to input a list of URLs (line-separated), which it then treats as the content source during its scan. This is one of the main reasons that I still maintain and generate a urllist.txt sitemap: you can literally just copy the whole list into Copyscape and hit the 'Start the batch' button.
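If your site only publishes an XML sitemap, a line-separated list can be derived from it. This is a minimal sketch in Python rather than a description of how my own urllist.txt is generated: it assumes a standard sitemap using the sitemaps.org namespace, and the function name `urllist_from_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urllist_from_sitemap(sitemap_xml: str) -> str:
    """Return every <loc> URL in the sitemap as line-separated text."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]
    # Remove duplicates whilst keeping the original order.
    unique_urls = list(dict.fromkeys(urls))
    return "\n".join(unique_urls) + "\n" if unique_urls else ""


if __name__ == "__main__":
    # Assumes a sitemap.xml file in the current directory.
    with open("sitemap.xml", encoding="utf-8") as source:
        print(urllist_from_sitemap(source.read()), end="")
```

The output can be saved as urllist.txt or pasted straight into the batch tool.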

What it will then do is scan the web looking for matches. A few moments later, you will be looking at a list of definitive sources and copies (broken down by URL), showing you exactly where Copyscape has found duplicate content:

Screenshot from the Copyscape Batch Search tool displaying found duplicate content on yahiaaljanabi.com and the same developer's Upwork profile.

Copyscape does also offer a more automated scanning and protection service where it does these scans for you on a regular basis. For me, I feel that simply logging on and running a manual batch scan occasionally is more than enough to keep track of things.


Detecting Copied Code

Generally, the most surefire indicator you can have that somebody else is attempting to use your code online is hotlinking: where they have a copy of your code but haven't yet edited out all the links back to your domain and your server.

It falls a little outside the scope of this article to delve too deeply into this one, but suffice it to say: it is well worth keeping an eye on your server error logs. If you start to see remote URLs attempting to call assets from your site, especially static assets like CSS or JavaScript, then it is well worth taking a deeper look into those referrals to see why another site might be loading in your code.
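The same check can also be run against an access log, where each request records the page that referred it. As a hedged sketch in Python: this assumes your server writes the common 'combined' log format (with the referrer in quotes after the response size), and both `find_hotlinks` and the hostnames in the example are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined log format: ... "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)
STATIC_EXTENSIONS = (".css", ".js")


def find_hotlinks(log_lines, own_hosts):
    """Count (referring host, asset path) pairs for CSS/JS requested by other sites."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a GET/HEAD request, or not in the assumed format
        path = urlparse(match["path"]).path  # drop any query string
        if not path.lower().endswith(STATIC_EXTENSIONS):
            continue
        referer_host = urlparse(match["referer"]).hostname
        # An empty or "-" referer has no hostname, so it is skipped here.
        if referer_host and referer_host not in own_hosts:
            hits[(referer_host, path)] += 1
    return hits


if __name__ == "__main__":
    # Assumes an access.log file in the current directory.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        own = {"example.com", "www.example.com"}
        for (host, path), count in find_hotlinks(log, own).most_common():
            print(f"{count:5d}  {host}  {path}")
```

Anything this reports is only a lead: search engine caches, translation services, and web archives can all load your assets legitimately, so it is worth visiting the referring page before drawing conclusions.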


What to Do Once You've Found a Copy

If you do find somebody else has copied your work, there are a number of different options you can take, ranging from very soft-touch all the way up to the more drastic, permanent action.

Reach Out to Them

This is the easy one. During your first communication with the copier, there is no need to be confrontational; just a simple message saying, "Hey I'm pleased you found inspiration in my work. Could you not copy it outright though, please?" is often all it takes to make the problem go away:

Screenshot of Twitter chat between @johnkavanagh and @azhrzafar about the text he had copied from johnkavanagh.co.uk/about.

Sometimes actually finding their contact details can be a little more difficult: if there's no contact form or email address on the website itself then try Twitter or LinkedIn, or even take a look at the domain's WHOIS registration data.

Failing That...

Sometimes you will fail to get hold of them, or won't be able to find contact details, or they simply won't respond, or (as I've experienced a few times) they will take it as a personal affront and tell you to unceremoniously insert your copyright where the sun don't shine. I would always advocate the softer approach above, but if they really aren't willing to remove your content, then this is where the Digital Millennium Copyright Act comes into play.

Although strictly an American law, what it does is build upon two treaties from the World Intellectual Property Organisation, so its scope is worldwide. On top of that, the majority of hosting and big-tech companies have at least some stake in the US, so they will respond to these notices whether you are a US citizen or not.

Shift4Shop has a very helpful single-page generator which you can use to generate your notice. There are a few places you can send this:

To the Website Owner

If you have a way of contacting them, send it to them. Chances are that they won't react favourably to this given that you have already reached this point, but it is a solid step to take.

To the Website Host

You are more likely to get a response here: no web host wants to be liable for hosting copyrighted material. Who Hosts This will tell you who the website hosts are, and it is then a relatively straightforward process to reach out to them directly with your DMCA notice.

I've found that searching for the hostname with 'DMCA' appended will often lead you to their legal documentation and process. For example, here is DigitalOcean's. If searching in this way does not work, try using Command-F on the host's Policy pages (usually linked in their footer) to find an email address, or even use their social media accounts to ask for their legal contact details.

Expect the host to respond to you within a few days and either take the materials down themselves or give their customer an ultimatum to do so.

To Google

The concern over search engine performance and duplicate content can be mitigated by posting a DMCA notice to Google too. They hold a copy of your copyrighted content and will want to remove it if you make them aware.

Like every other process in this article, this is relatively straightforward once you know how. A form within Google Search Console allows you to post a formal notice to them. Make sure you note every infringing URL within your notice and be very specific about the nature of your claim to ownership. Their response times can vary, but if your case is clear and all else has failed, you can expect that they will remove the URL from their search index altogether within a couple of weeks.

To Bing

Like Google, Bing has a very similar form and procedure for getting your copyrighted work removed from their search results. I have found that they tend to respond quite a bit quicker than Google does, but the end result is much the same.

To Others

If you find, as I have in the past, that your work is being used on platforms like LinkedIn or Facebook, they also have very clear copyright notification processes in place which you can follow to get the content removed.

After All That...

At the end of all of this, the copied website owner has hopefully responded and the material has been taken down. If not, then with luck the host has. If not, then at least you can be confident that their website is not competing with your own in the search engine rankings.

I have usually found that by the time Google has come back to me about a notice, the owner or host has already taken action to remove it themselves.


What If I Want to Be the Copier?

I should say that people's personal feelings on this type of copying and imitation vary drastically: we live in a society where developers very freely release their work under open-source licences or in free-to-use Git repos. On the flip side, a lot of the projects I work on are very stringently protected and I go well out of my way to make this as clear as possible to my visitors.

Somebody somewhere has often spent a significant amount of money on the works you see online; on every website, it is entirely their prerogative to protect their ownership if they so wish.

If you come across a website or a piece of work that you find particularly inspirational then the first thing you should always do is reach out to the website owner and ask. It is a simple, easy, quick, and respectful thing to do. Even if the owner doesn't want you to copy their work outright, they may be happy to point you in the right direction or towards close alternatives. I am certainly always happy to offer guidance and advice, even though I really do not want you using my personal website to try and sell your wares.


Categories:

  1. Copyright
  2. CSS
  3. Development
  4. Front‑End Development
  5. Guides
  6. JavaScript
  7. Sass
  8. Search Engine Optimisation