[License-discuss] Copyright on APIs

Thorsten Glaser tg at mirbsd.de
Tue Jul 2 17:45:18 UTC 2019


VanL dixit:

>There are two issues here. I don't think anyone would argue that APIs are

I would, but IANAL. European here, though.

>If the API is part of the "Work" for copyright purposes, then copying the
>API

That’s just the thing. The API is the outwards-facing interface,
designed for interoperating with other components, and a deliberate
barrier between modules, so I believe strongly that the EU exemption
applies.

But there’s another point, which might work internationally:

	You don’t copy APIs, you implement them (independently).

Repeat that after me, it’s really important:

	You don’t copy APIs, you implement them (independently).

An API is a description of interfaces you have to implement to
implement the API (on one side) or can call if you’re the caller
(the other side). It’s not code, although it is often described
in code. It’s the interface that’s described by code or other
accompanying documentation.

Let’s use the example of Java™. There’s a class file disassembler,
called javap, which you can use to obtain a fingerprint of sorts of
an API in Java™. With a bit of touching up, its output can be used to
construct an independent implementation of the same API, without
copying any code (or more precisely, any of the creative work):

# extract one class to use as example
$ unzip -p /usr/lib/jvm/java-8-openjdk-i386/jre/lib/rt.jar \
    java/util/logging/Logger.class >Logger.class
# generate its API fingerprint
$ javap Logger.class >Logger.txt

Now, Logger.txt looks like this:

-----BEGIN cutting here may damage your screen surface-----
Compiled from "Logger.java"
public class java.util.logging.Logger {
  static final java.lang.String SYSTEM_LOGGER_RB_NAME;
[…]
  public static java.util.logging.Logger getLogger(java.lang.String);
[…]
  public void log(java.util.logging.LogRecord);
  public void log(java.util.logging.Level, java.lang.String);
[…]
  public void warning(java.lang.String);
  public void info(java.lang.String);
[…]
  static {};
}
-----END cutting here may damage your screen surface-----

Just ignore the first line. The second-to-last line is a static
initialiser. The rest serves as a stub for your own implementation,
although you can mostly skip the non-public members if you
don’t need them. And, one hard thing I’ve learnt: a public static
final string constant is copied, at compile time, into the classes
that use it, so you may be able to ignore it, too; otherwise, its
content is most likely API as well.

Let’s make a Java™ source file out of this, which will then
consist solely of the API description, with no creative parts yet:

-----BEGIN cutting here may damage your screen surface-----
package java.util.logging;
public class Logger {
	public static Logger getLogger(final String string) {}
	public void log(final LogRecord logRecord) {}
	public void log(final Level level, final String string) {}
	public void warning(final String string) {}
	public void info(final String string) {}
}
-----END cutting here may damage your screen surface-----

Now you’ll need to fill in the empty methods, do the same for any
other classes necessary (here, LogRecord and Level of the same package,
and perhaps String), and you’re done.

Well, not quite. There’s JavaDoc. This is API documentation that’s
embedded in the source files, then collected at compile time into
a form usable by IDEs and another suitable for publication on web
sites for developers to peruse, delivered as “JavaDoc JAR archives”
together with the binary class JARs. It is something you’d normally
want.

Let’s look at the original OpenJDK 8 java/util/logging/Logger.java
file (copyrighted content ahead, excerpted for the purpose of this
discussion; it is REALLY long, but that matters for the argument;
skip about 230 lines to read on):

-----BEGIN cutting here may damage your screen surface-----
[… copyright header …]
package java.util.logging;
[…]
import java.util.ArrayList;
import java.util.Iterator;
[…]
/**
 * A Logger object is used to log messages for a specific
 * system or application component.  Loggers are normally named,
 * using a hierarchical dot-separated namespace.  Logger names
 * can be arbitrary strings, but they should normally be based on
 * the package name or class name of the logged component, such
 * as java.net or javax.swing.  In addition it is possible to create
 * "anonymous" Loggers that are not stored in the Logger namespace.
 * <p>
 * Logger objects may be obtained by calls on one of the getLogger
 * factory methods.  These will either create a new Logger or
 * return a suitable existing Logger. It is important to note that
 * the Logger returned by one of the {@code getLogger} factory methods
 * may be garbage collected at any time if a strong reference to the
 * Logger is not kept.
 * <p>
 * Logging messages will be forwarded to registered Handler
 * objects, which can forward the messages to a variety of
 * destinations, including consoles, files, OS logs, etc.
 * <p>
 * Each Logger keeps track of a "parent" Logger, which is its
 * nearest existing ancestor in the Logger namespace.
 * <p>
 * Each Logger has a "Level" associated with it.  This reflects
 * a minimum Level that this logger cares about.  If a Logger's
 * level is set to <tt>null</tt>, then its effective level is inherited
 * from its parent, which may in turn obtain it recursively from its
 * parent, and so on up the tree.
 * <p>
 * The log level can be configured based on the properties from the
 * logging configuration file, as described in the description
 * of the LogManager class.  However it may also be dynamically changed
 * by calls on the Logger.setLevel method.  If a logger's level is
 * changed the change may also affect child loggers, since any child
 * logger that has <tt>null</tt> as its level will inherit its
 * effective level from its parent.
 * <p>
 * On each logging call the Logger initially performs a cheap
 * check of the request level (e.g., SEVERE or FINE) against the
 * effective log level of the logger.  If the request level is
 * lower than the log level, the logging call returns immediately.
 * <p>
 * After passing this initial (cheap) test, the Logger will allocate
 * a LogRecord to describe the logging message.  It will then call a
 * Filter (if present) to do a more detailed check on whether the
 * record should be published.  If that passes it will then publish
 * the LogRecord to its output Handlers.  By default, loggers also
 * publish to their parent's Handlers, recursively up the tree.
 * <p>
 * Each Logger may have a {@code ResourceBundle} associated with it.
 * The {@code ResourceBundle} may be specified by name, using the
 * {@link #getLogger(java.lang.String, java.lang.String)} factory
 * method, or by value - using the {@link
 * #setResourceBundle(java.util.ResourceBundle) setResourceBundle} method.
 * This bundle will be used for localizing logging messages.
 * If a Logger does not have its own {@code ResourceBundle} or resource bundle
 * name, then it will inherit the {@code ResourceBundle} or resource bundle name
 * from its parent, recursively up the tree.
 * <p>
 * Most of the logger output methods take a "msg" argument.  This
 * msg argument may be either a raw value or a localization key.
 * During formatting, if the logger has (or inherits) a localization
 * {@code ResourceBundle} and if the {@code ResourceBundle} has a mapping for
 * the msg string, then the msg string is replaced by the localized value.
 * Otherwise the original msg string is used.  Typically, formatters use
 * java.text.MessageFormat style formatting to format parameters, so
 * for example a format string "{0} {1}" would format two parameters
 * as strings.
 * <p>
 * A set of methods alternatively take a "msgSupplier" instead of a "msg"
 * argument.  These methods take a {@link Supplier}{@code <String>} function
 * which is invoked to construct the desired log message only when the message
 * actually is to be logged based on the effective log level thus eliminating
 * unnecessary message construction. For example, if the developer wants to
 * log system health status for diagnosis, with the String-accepting version,
 * the code would look like:
 <pre><code>

   class DiagnosisMessages {
     static String systemHealthStatus() {
       // collect system health information
       ...
     }
   }
   ...
   logger.log(Level.FINER, DiagnosisMessages.systemHealthStatus());
</code></pre>
 * With the above code, the health status is collected unnecessarily even when
 * the log level FINER is disabled. With the Supplier-accepting version as
 * below, the status will only be collected when the log level FINER is
 * enabled.
 <pre><code>

   logger.log(Level.FINER, DiagnosisMessages::systemHealthStatus);
</code></pre>
 * <p>
 * When looking for a {@code ResourceBundle}, the logger will first look at
 * whether a bundle was specified using {@link
 * #setResourceBundle(java.util.ResourceBundle) setResourceBundle}, and then
 * only whether a resource bundle name was specified through the {@link
 * #getLogger(java.lang.String, java.lang.String) getLogger} factory method.
 * If no {@code ResourceBundle} or no resource bundle name is found,
 * then it will use the nearest {@code ResourceBundle} or resource bundle
 * name inherited from its parent tree.<br>
 * When a {@code ResourceBundle} was inherited or specified through the
 * {@link
 * #setResourceBundle(java.util.ResourceBundle) setResourceBundle} method, then
 * that {@code ResourceBundle} will be used. Otherwise if the logger only
 * has or inherited a resource bundle name, then that resource bundle name
 * will be mapped to a {@code ResourceBundle} object, using the default Locale
 * at the time of logging.
 * <br id="ResourceBundleMapping">When mapping resource bundle names to
 * {@code ResourceBundle} objects, the logger will first try to use the
 * Thread's {@linkplain java.lang.Thread#getContextClassLoader() context class
 * loader} to map the given resource bundle name to a {@code ResourceBundle}.
 * If the thread context class loader is {@code null}, it will try the
 * {@linkplain java.lang.ClassLoader#getSystemClassLoader() system class loader}
 * instead.  If the {@code ResourceBundle} is still not found, it will use the
 * class loader of the first caller of the {@link
 * #getLogger(java.lang.String, java.lang.String) getLogger} factory method.
 * <p>
 * Formatting (including localization) is the responsibility of
 * the output Handler, which will typically call a Formatter.
 * <p>
 * Note that formatting need not occur synchronously.  It may be delayed
 * until a LogRecord is actually written to an external sink.
 * <p>
 * The logging methods are grouped in five main categories:
 * <ul>
 * <li><p>
 *     There are a set of "log" methods that take a log level, a message
 *     string, and optionally some parameters to the message string.
 * <li><p>
 *     There are a set of "logp" methods (for "log precise") that are
 *     like the "log" methods, but also take an explicit source class name
 *     and method name.
 * <li><p>
 *     There are a set of "logrb" method (for "log with resource bundle")
 *     that are like the "logp" method, but also take an explicit resource
 *     bundle object for use in localizing the log message.
 * <li><p>
 *     There are convenience methods for tracing method entries (the
 *     "entering" methods), method returns (the "exiting" methods) and
 *     throwing exceptions (the "throwing" methods).
 * <li><p>
 *     Finally, there are a set of convenience methods for use in the
 *     very simplest cases, when a developer simply wants to log a
 *     simple string at a given log level.  These methods are named
 *     after the standard Level names ("severe", "warning", "info", etc.)
 *     and take a single argument, a message string.
 * </ul>
 * <p>
 * For the methods that do not take an explicit source name and
 * method name, the Logging framework will make a "best effort"
 * to determine which class and method called into the logging method.
 * However, it is important to realize that this automatically inferred
 * information may only be approximate (or may even be quite wrong!).
 * Virtual machines are allowed to do extensive optimizations when
 * JITing and may entirely remove stack frames, making it impossible
 * to reliably locate the calling class and method.
 * <P>
 * All methods on Logger are multi-thread safe.
 * <p>
 * <b>Subclassing Information:</b> Note that a LogManager class may
 * provide its own implementation of named Loggers for any point in
 * the namespace.  Therefore, any subclasses of Logger (unless they
 * are implemented in conjunction with a new LogManager class) should
 * take care to obtain a Logger instance from the LogManager class and
 * should delegate operations such as "isLoggable" and "log(LogRecord)"
 * to that instance.  Note that in order to intercept all logging
 * output, subclasses need only override the log(LogRecord) method.
 * All the other logging methods are implemented as calls on this
 * log(LogRecord) method.
 *
 * @since 1.4
 */
public class Logger {
[…]
    /**
     * Find or create a logger for a named subsystem.  If a logger has
     * already been created with the given name it is returned.  Otherwise
     * a new logger is created.
     * <p>
     * If a new logger is created its log level will be configured
     * based on the LogManager configuration and it will configured
     * to also send logging output to its parent's Handlers.  It will
     * be registered in the LogManager global namespace.
     * <p>
     * Note: The LogManager may only retain a weak reference to the newly
     * created Logger. It is important to understand that a previously
     * created Logger with the given name may be garbage collected at any
     * time if there is no strong reference to the Logger. In particular,
     * this means that two back-to-back calls like
     * {@code getLogger("MyLogger").log(...)} may use different Logger
     * objects named "MyLogger" if there is no strong reference to the
     * Logger named "MyLogger" elsewhere in the program.
     *
     * @param   name            A name for the logger.  This should
     *                          be a dot-separated name and should normally
     *                          be based on the package name or class name
     *                          of the subsystem, such as java.net
     *                          or javax.swing
     * @return a suitable Logger
     * @throws NullPointerException if the name is null.
     */
[… omitted annotation and comment not relevant for the discussion …]
    public static Logger getLogger(String name) {
[… function implementation …]
    }
[…]
    /**
     * Log a WARNING message.
     * <p>
     * If the logger is currently enabled for the WARNING message
     * level then the given message is forwarded to all the
     * registered output Handler objects.
     * <p>
     * @param   msg     The string message (or a key in the message catalog)
     */
    public void warning(String msg) {
        log(Level.WARNING, msg);
    }
[…]
}
-----END cutting here may damage your screen surface-----

As you can see, this is really expressive documentation of the API, how
it behaves, what the callers need to use, and sometimes even
implementation details, in a specific format (HTML snippets and tagged
extras like @param, @return, @throws).

I’m not questioning that these are copyrightable, and that, if Google
copied them from Oracle, they’ve violated copyright. I could understand
the desire (these are something you’d really want) though.

However, these are *also*, not fully but still in a non-trivial amount,
part of the API description. IANAL, but I’d consider paraphrasing the
description, and adjusting it for your own implementation or stub
thereof, acceptable… even if it ends up sounding close to the original,
because there’s only a finite number of ways something can be
described… the fewer ways there are, the less “creativity” (threshold
of originality) can be applied to it, and without ToO there is no
copyright protection.

>"strong copyleft." As argued  by the FSF FAQ
><https://www.gnu.org/licenses/gpl-faq.en.html>, the inclusion of *any* code
>element from a copylefted source makes the entire work a derived work. (See

The FSF and even the Linux kernel community have been trying a technical
trick: in the C language, so-called “header files” that supposedly only
contain the neutral API description are included into users of library
code. This might look like this:

-----BEGIN cutting here may damage your screen surface-----
   23 #ifndef _CTYPE_H_
   24 #define _CTYPE_H_
   25
   26 #include <sys/cdefs.h>

   43 __BEGIN_DECLS
   44 int     isalnum(int);
   45 int     isalpha(int);
   46 int     iscntrl(int);
   47 int     isdigit(int);
[…]
   56 int     tolower(int);
   57 int     totitle(int);
   58 int     toupper(int);
[…]
   60 #if !defined(_ANSI_SOURCE) && !defined(_POSIX_SOURCE)
   61 int     isascii(int);
[…]
   67 #endif /* !_ANSI_SOURCE && !_POSIX_SOURCE */
[…]
  142 __END_DECLS
  143
  144 #endif /* !_CTYPE_H_ */
-----END cutting here may damage your screen surface-----

I’ve line-numbered them here in order to be able to describe them better
and skipped some comments.

Lines 23‥24 and 144 are an “inclusion guard” that wraps around the same
header file, in order to make the C præprocessor not explode when the
header file is included twice. This is an artefact of the C language.

Line 26 includes a header this header depends on or exposes as part of
its API (nowadays considered bad design, users should include all
headers they use directly).

Lines 43 and 142 are a Berkeley Unix (BSD) macro (from <sys/cdefs.h>)
making the header usable for C++ compilers as well.

Lines 44‥47, 56‥58 and 61 are actual APIs: they say “there’s a function
with this name, that takes an integer argument and returns an integer”.

Lines 60 and 67 make line 61 invisible if strict ANSI or POSIX
compliance is requested (basically, line 61 is an extension over the
standards and code that calls itself standards-compliant should not
be using it).


The trick the GNU project and Linux are using is now this, still
in the same header file (taken from MirBSD):

-----BEGIN cutting here may damage your screen surface-----
  103 #ifdef __GNUC__
  104 #define tolower(c)      __extension__({                 \
  105         int __CTYPE_Tl = (c);                           \
  106                                                         \
  107         (__CTYPE_Tl >= 'A') && (__CTYPE_Tl <= 'Z') ?    \
  108             __CTYPE_Tl - 'A' + 'a' : __CTYPE_Tl;        \
  109 })
  110 #define toupper(c)      __extension__({                 \
  111         int __CTYPE_Tu = (c);                           \
  112                                                         \
  113         (__CTYPE_Tu >= 'a') && (__CTYPE_Tu <= 'z') ?    \
  114             __CTYPE_Tu - 'a' + 'A' : __CTYPE_Tu;        \
  115 })
  116 #else
  117 #define tolower(c)      (((c) >= 'A') && ((c) <= 'Z') ? (c) - 'A' + 'a' : (c))
  118 #define toupper(c)      (((c) >= 'a') && ((c) <= 'z') ? (c) - 'a' + 'A' : (c))
  119 #endif
  120 #define totitle(c)      toupper(c)
-----END cutting here may damage your screen surface-----

This defines præprocessor macros that expand to the *implementation*
of functions. As long as these macros are defined, code that simply
calls the function as in…

	int lowercased = tolower(somecharacter);

… will expand to the macro’s content, instead of calling the tolower()
function from the library, thus embedding its code.

This is generally a speed trick. In this case, the implementation is
short enough (and does not use any non-exported data structures) that
embedding it is much faster than a function call to the library. The
library function must still be available for uses like…

	int (*fp)(int) = &tolower;	// take address of function

… or…

	int lowercased = (tolower)(somecharacter);

… which explicitly avoids the macro, as does this:

	#undef tolower
	int lowercased = tolower(somecharacter);

This speed trick can have the adverse effect of making the user of
such a header file into a derived work of the library, by means of
embedding (the compiler variant of static linking) the library.

In the case of Linux, kernel modules are *expected* to call such
macros that inline themselves, in a half-hearted attempt to force
all kernel modules into GPL rules (I won’t go into details).

In both GNU and Linux cases, skipping the actual library function
and embedding its implementation into a header makes it impossible
to use the library or write a kernel module without embedding some
code from the GPL’d library/kernel into your own code, using the
documented interfaces.

There are two ways around this:

① Do not include the respective headers, or undefine their macros
  and come up with your own equivalent calls.

  This is used on the Sharp Zaurus ARM platform, which natively
  boots into Linux from ROM. To run other operating systems like
  OpenBSD, you basically have to boot Linux first, then do trickery
  to replace the running Linux with your own operating system, and
  this can only be done as a Linux kernel module.

  The OpenBSD bootloader for zaurus is a Linux kernel module that
  does not include any Linux headers. (Needless to say it only works
  with the shipped Linux version and not if you upgraded your ROM
  to a newer kernel…)

② Ship your own code as source code only. Leave it to the user to
  compile your code with the offending GPL code. If your licence is
  not GPL compatible but still Open Source (or even Shared Source),
  this will work… it’ll merely make the *compiled* form a derivative
  of both your work and the GPL’d work, which therefore cannot be
  distributed.

Another possibility is writing an alternative library that implements
the same interface. Your program can then be compiled against it and,
at run time, use either the GPL or the alternative library, depending
on which is installed (or found via LD_LIBRARY_PATH). This has been
done with GNU libreadline (GPL) vs. BSD libedit (BSD).

You can compile programs against libedit, which implements the most
relevant functions of GNU libreadline, and then use it with libedit
or GNU libreadline at runtime; if you do the latter as user, you’ll
not benefit from most extra libreadline functions save one: it will
read the ~/.inputrc file used by libreadline to customise options
and keybindings, whereas libedit has static keybindings.

Note that this only works with dynamic linking, and Patrice-Emmanuel
has made his case that dynamic linking in the EU, contrary to FSF
propaganda, is a GPL “infection” barrier.


>2) That brings us to the second point: Patents.

I’m not even going there.


bye,
//mirabilos
-- 
I believe no one can invent an algorithm. One just happens to hit upon it
when God enlightens him. Or only God invents algorithms, we merely copy them.
If you don't believe in God, just consider God as Nature if you won't deny
existence.		-- Coywolf Qi Hunt


