Bryan Wilcutt's Curious Internet: Plagiarizing the Internet

Saturday, August 11, 2012

Plagiarizing the Internet

By Bryan Wilcutt

The German government recently took a refreshing turn in addressing the plagiarism of content by Content Aggregators. We believe that addressing the issue properly will bring serious, necessary change to the internet.

What is a Content Aggregator? These are organizations that take streams of information from RSS feeds and combine them into a single stream, usually after stripping and modifying the content themselves. The RSS feed itself is a feed of all content (articles, comments, etc.) from a website. Curious Internet’s content is published on an RSS feed as well; please feel free to subscribe! Once this data has been stripped of information leading to its origin, the Content Aggregator slaps its name on it and tries to peddle it as its own original content. Of course, the content is usually scrubbed and mangled through a re-writer, an application that changes words around. For example, that last phrase would be changed to “words changed around an application.” This is obvious plagiarism.

Since Google is the king of the search engine world, its policies have the biggest impact on the web. Google’s search ranking update, dubbed ‘Panda’, puts a lot of emphasis on the content of websites in order to judge their ranking. This is the crux of the issue. You see, without fresh, original content Google will leave you by the wayside, like roadkill on the information superhighway. It isn’t all Google’s fault, either. Many times it is a site owner who either has no idea how to create original content or lacks the money to buy original content.

So how bad is this problem?


You can find duplicated, plagiarized content all over the internet if you look around enough. Websites like Curious Internet are limited in their output capabilities; we publish once per week. Other sites attempt to publish daily and, unless they have a substantial budget, they commonly publish plagiarized work that has been processed through a Content Aggregator. Although Google’s Panda algorithm may lack the intelligence to tell the difference, we can. This all comes down to money: fresh content means more visitors, and that may translate into more sales. Simple.

Ultimately the blame falls on site owners who knowingly publish plagiarized works. We do not necessarily blame Google for the issue; we blame the clever human beings who know how to get around Google’s well-publicized requirements. We believe there are ways around the content problem, however.

At Curious Internet, we don’t just whine about the issues; we try to provide solutions to them as well. Sure, many of our solutions are hyperbole, but you get what you pay for, right? (Insert a cheesy text smiley here.)

Here’s our solution to the plagiarizing problem. Note that this solution would definitely work: the technology already exists, and only minor changes to web browsers may be required.

SIGNED AUTHORED CONTENT (SAC)


Solution: Have authors digitally sign their work, which will lead to content protection. This is somewhat tougher to approach, sure, but the technology is there to start the effort. Web browsers would have to be modified to know how to display signed authored content, and authors will need tools to assist them with the publishing and signing process as well. In the future, most websites will display only signed authored content and nothing else. The signature of the author is negotiated, perhaps even purchased, and traceable. To enforce this, all search engines will need to agree not to index sites that do not follow the SAC standard (which, we’ll admit, doesn’t exist yet).

We envision the future of web authoring to look like this:

A web author gets their own digital signature, a long string of hexadecimal digits similar in appearance to an RSA key or a SHA hash. These signatures are purchased only once from the DSA (Digital Signature Authority). Existing DSA keys, used for public-private key cryptography, are expensive, so SAC keys will need to be cheaper to make use of the DSA; otherwise another organization will need to be created to compete with the DSA, perhaps the WAC (Web Authorization Committee).

The author then creates their own work, whether it is a web page, a document, etc., and uses a tool that does not yet exist to sign that content with their special SAC code.

Web browsers recognize the SAC format and, using another key found at the DSA (or WAC), verify the content and display it.

The web author may have to pay a yearly fee to the WAC or DSA to continue support of their content protection. Should your content show up on someone else’s website without your permission, you can easily find them and follow the prescribed process for getting it removed or having the plagiarizer removed; we’ll leave the details to the lawyers.
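To make the sign-then-verify steps above a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the SAC standard (which, again, doesn’t exist): it swaps in a Lamport one-time signature built on SHA-256 for whatever scheme a DSA or WAC might actually issue, and the names `generate_keypair`, `sign`, and `verify` are hypothetical. A Lamport key can safely sign only one piece of content, so a real tool would need a scheme that supports many signatures per key.

```python
import hashlib
import secrets

HASH_BITS = 256  # SHA-256 digest size in bits


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Return (private_key, public_key), good for ONE signature (Lamport scheme)."""
    # Two random 32-byte secrets per digest bit: one for bit value 0, one for 1.
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(HASH_BITS)]
    # The public key is the hash of every secret, in the same layout.
    public_key = [(_sha256(zero), _sha256(one)) for zero, one in private_key]
    return private_key, public_key


def _digest_bits(content: bytes):
    """Return the SHA-256 digest of the content as a list of 256 bits."""
    digest = int.from_bytes(_sha256(content), "big")
    return [(digest >> (HASH_BITS - 1 - i)) & 1 for i in range(HASH_BITS)]


def sign(content: bytes, private_key):
    """Author side: reveal one secret for each bit of the content's digest."""
    return [private_key[i][bit] for i, bit in enumerate(_digest_bits(content))]


def verify(content: bytes, signature, public_key) -> bool:
    """Browser side: hash each revealed secret and compare with the public key."""
    if len(signature) != HASH_BITS:
        return False
    return all(_sha256(signature[i]) == public_key[i][bit]
               for i, bit in enumerate(_digest_bits(content)))
```

In this sketch the author would call `generate_keypair()` once, publish the public key (the part a browser would fetch from the DSA or WAC), and attach the output of `sign(content, private_key)` to the page. A browser calling `verify` on the untouched content gets `True`; if a re-writer has changed even one word, the digest changes and verification fails.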

Conclusion


Digitally signing content and banning non-signed content will be the only real way of controlling plagiarism in a worldwide setting. Of course, this approach doesn’t keep an author completely protected, but with the use of other tools, such as content duplication checkers, we believe the majority of plagiarism can be eliminated and the internet may actually (!) become a place of original thought.
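For the curious, a content duplication checker can be sketched in a few lines of Python. This is only one simple approach, assumed here for illustration (we are not describing how any particular checker works): break each text into overlapping word sequences, often called shingles, and measure how much the two sets overlap using Jaccard similarity. The function names `shingles` and `similarity` are hypothetical.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the shingle sets: 0.0 (no overlap) to 1.0 (identical)."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a or not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)
```

With `size=1` the comparison ignores word order, which is exactly what a re-writer scrambles. Comparing “an application that changes words around” with “words changed around an application” this way finds four shared words out of seven distinct ones, a score of 4/7, which is suspiciously high for two supposedly unrelated sentences. Larger shingle sizes are stricter and better suited to catching text copied verbatim.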

2 comments:

  1. Well, the giant search engine Google itself takes care of incidents of plagiarism, and it penalizes websites for using copied content. However, efforts must be made to stop plagiarizers, because plagiarism harms the reputation of any website in the long run. Using a free plagiarism checker can be a helpful and cost-effective way to combat plagiarism on the internet.

  2. Agreed, there are a number of plagiarism checkers out there. I always recommend running your papers through them before turning them in to your professors... good way to save a headache!
